Re: Exception: cannot determine sort type

2004-12-23 Thread Erik Hatcher
On Dec 23, 2004, at 6:15 PM, Kauler, Leto S wrote:
> Erik Hatcher wrote:
> > *Everything* in Lucene is indexed as a string.  But how a date looks
> > as a string is a topic unto itself.  I prefer to use YYYYMMDD as a
> > date formatted as a string (but when sorting, this could be treated
> > as a numeric).
>
> Will RangeQuery still work with that?  We do have separate date fields
> which are indexed like the following code, but a move to the YYYYMMDD
> format might be good as then we could apply a blanket String-type sort.
>
> public static Date parseDate( String s ) throws ParseException {
>    DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd hh:mm:ss");
>    return dateFormat.parse(s);
> }
> doc[0].add(Field.Keyword(field, parseDate( dateInString )));

Using YYYYMMDD works better for RangeQuery than Field.Keyword(String, 
Date) does.  The built-in Date field goes down to the millisecond 
level.  If you have lots of documents on the same day, but at different 
milliseconds, you end up with lots of terms.  RangeQuery expands into a 
BooleanQuery that ORs together all the matching terms.  BooleanQuery has 
a built-in default limit of 1,024 clauses; exceed it and you get a 
TooManyClauses exception.
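
To make the term-count argument concrete, here is a rough sketch in plain Java (no Lucene involved). The class and method names are made up for illustration, the sample data is hypothetical, and UTC is an arbitrary choice. It only counts how many distinct terms each date encoding would produce for the same set of documents.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashSet;
import java.util.Set;
import java.util.TimeZone;

public class DateTermCount {
    /** Counts distinct day-resolution terms (yyyyMMdd, UTC) for the given timestamps. */
    static int distinctDayTerms(long[] timestamps) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        Set<String> terms = new HashSet<String>();
        for (long t : timestamps) {
            terms.add(fmt.format(new Date(t)));
        }
        return terms.size();
    }

    /** Counts distinct millisecond-resolution terms: one per distinct timestamp. */
    static int distinctMillisTerms(long[] timestamps) {
        Set<Long> terms = new HashSet<Long>();
        for (long t : timestamps) {
            terms.add(t);
        }
        return terms.size();
    }

    /** Hypothetical data: n documents stamped one second apart, starting at startMillis. */
    static long[] sampleTimestamps(long startMillis, int n) {
        long[] out = new long[n];
        for (int i = 0; i < n; i++) {
            out[i] = startMillis + i * 1000L;
        }
        return out;
    }

    public static void main(String[] args) {
        long start = 1103760000000L; // 2004-12-23 00:00:00 UTC
        long[] ts = sampleTimestamps(start, 2000);
        System.out.println("millisecond terms: " + distinctMillisTerms(ts));
        System.out.println("day terms:         " + distinctDayTerms(ts));
    }
}
```

With 2,000 documents stamped within the same day, the millisecond encoding yields 2,000 distinct terms, already past the 1,024-clause default for a range covering that day, while the day encoding yields a single term.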

YYYYMMDD is numeric, and to sort by that field I'd recommend you use 
a numeric type as it'll use much less memory.  But it is certainly 
advisable to test both numeric and String sorting and see how each 
performs.
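
As a small illustration of why either sort type can work here, the sketch below formats a date as an eight-digit key and checks that string order and integer order agree. It is plain Java with invented names (DayKey, toDayKey, sameOrderAsInts), and the UTC time zone is an assumption; pick whatever zone your data actually uses.

```java
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.TimeZone;

public class DayKey {
    /** Formats a date as an eight-digit yyyyMMdd string in UTC (the zone is an assumption). */
    static String toDayKey(Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(date);
    }

    /** Returns true if sorting the keys as strings gives the same order as sorting them as ints. */
    static boolean sameOrderAsInts(String[] keys) {
        String[] byString = keys.clone();
        Arrays.sort(byString); // lexicographic order
        int[] byInt = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            byInt[i] = Integer.parseInt(keys[i]);
        }
        Arrays.sort(byInt); // numeric order
        for (int i = 0; i < keys.length; i++) {
            if (Integer.parseInt(byString[i]) != byInt[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] keys = { "20041223", "20040101", "19991231", "20050315" };
        System.out.println(toDayKey(new Date(1103760000000L)));
        System.out.println(sameOrderAsInts(keys));
    }
}
```

Because the keys are fixed-width digits, lexicographic order matches chronological order, which is what lets range queries over the string terms behave as expected. Each key also fits comfortably in an int.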

Erik
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


RE: Exception: cannot determine sort type

2004-12-23 Thread Kauler, Leto S
Thanks for the replies!  It would seem best for us to move to specifying
the sort type; it is good practice anyway and prevents possible field
problems.  I plan to run the stress test again today with sorting turned
off (just using the default SCORE) and see how that goes.

Seasons greetings to you all.
--Leto


Daniel Naber wrote:
> Is it a certain query that causes this? Does it really only happen
> under load or does the same query also give this without load?

Each page on our website gathers content from Lucene using predefined
queries, kind of like a database.  The odd thing: I cannot replicate
the problem if I browse the site casually.  It's only under this stress
testing that the problem occurs.  It does not happen on specific
pages/queries, but more or less at random: about every second to fourth
query throws the exception.

Makes me wonder if our code is crossing over somewhere when multiple
queries are performed at the same time.


Erik Hatcher wrote:
> The issue occurs if the first field it accesses parses as a numeric 
> value and then successive fields are Strings.  If you are mixing and 
> matching numeric and text information in this Title_Sort field you 
> should specify the type.

Chris Hostetter wrote:
> I could be wrong, but if i remember right, the code that AUTO uses
> to determine what sort type to use will treat it as a number if it
> *starts* with something that looks like a number ... so look for
> titles like "1000 year plan" in your data.

That makes sense. Our titles would sometimes contain, even start with,
numbers.


Erik Hatcher wrote:
> *Everything* in Lucene is indexed as a string.  But how a date looks
> as a string is a topic unto itself.  I prefer to use YYYYMMDD as a
> date formatted as a string (but when sorting, this could be treated
> as a numeric).

Will RangeQuery still work with that?  We do have separate date fields
which are indexed like the following code, but a move to the YYYYMMDD
format might be good as then we could apply a blanket String-type sort.

public static Date parseDate( String s ) throws ParseException {
   DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd hh:mm:ss");
   return dateFormat.parse(s);
}
doc[0].add(Field.Keyword(field, parseDate( dateInString )));


CONFIDENTIALITY NOTICE AND DISCLAIMER

Information in this transmission is intended only for the person(s) to whom it 
is addressed and may contain privileged and/or confidential information. If you 
are not the intended recipient, any disclosure, copying or dissemination of the 
information is unauthorised and you should delete/destroy all copies and notify 
the sender. No liability is accepted for any unauthorised use of the 
information contained in this transmission.

This disclaimer has been automatically added.




Re: Exception: cannot determine sort type

2004-12-23 Thread Daniel Naber
On Thursday 23 December 2004 05:25, Kauler, Leto S wrote:

> "java.lang.RuntimeException: no terms in field Title_Sort - cannot
> determine sort type"

Is it a certain query that causes this? Does it really only happen under 
load or does the same query also give this without load?

> We could specify the sort type as String but we do have some Date fields
> too.  Are dates actually indexed as strings?

If you're using DateField: yes. But you don't have to use that class, you 
can save dates however you want.

Regards
 Daniel

-- 
http://www.danielnaber.de




Re: Exception: cannot determine sort type

2004-12-23 Thread Chris Hostetter
: The issue occurs if the first field it accesses parses as a numeric
: value and then successive fields are Strings.  If you are mixing and

: > I am wondering why this exception might occur when the server/index is
: > under load.  I do realise there are many 'variables in the equation', so
: > there probably is not an easy answer to this.

Knowing what i know about stress testing environments, i'm guessing you're
using some sort of automated load generating application, which is
generating "random" input from a dictionary of some kind -- possibly from
access logs of an existing system?  I'm also guessing that in some
configurations your load generator picks a random sort order independent
of the search terms it picks.

I'm also guessing that the issue has nothing to do with load ... if you
picked a single search term which you have manually tested once (sorting
by title) and know for a fact it works fine, and then you tell your load
generator to hit the index as hard as it can with that one query over and
over, it would probably work fine.

I think the problem is just that when it deals with random input and
random sort orders it (frequently) gets a result set in which the
first document has a numeric title field.


PS: I could be wrong, but if i remember right, the code that AUTO uses to
determine what sort type to use will treat it as a number if it *starts*
with something that looks like a number ... so look for titles like "1000
year plan" in your data.


-Hoss





Re: Exception: cannot determine sort type

2004-12-23 Thread Erik Hatcher
On Dec 22, 2004, at 11:25 PM, Kauler, Leto S wrote:
> "java.lang.RuntimeException: no terms in field Title_Sort - cannot
> determine sort type"
>
> Title_Sort is a sort-specific field (Store=false, Index=true,
> Tokenise=false).  I do not have access to the actual Lucene-calling
> code, but I do not believe that the creation of the SortField defines a
> type (so just defaults to AUTO).

The issue occurs if the first field it accesses parses as a numeric 
value and then successive fields are Strings.  If you are mixing and 
matching numeric and text information in this Title_Sort field you 
should specify the type.
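
A rough way to picture the failure mode described above, in plain Java: guess a type from the first term only, then see whether the remaining terms fit that guess. This is not Lucene's actual code; the class, enum and method names are invented for illustration, and the int-then-float-then-string order is an assumption about how such a guess might be made.

```java
public class AutoSortGuess {
    enum SortType { INT, FLOAT, STRING }

    /** Guesses a type from a single term: int first, then float, otherwise string. */
    static SortType guessType(String firstTerm) {
        String text = firstTerm.trim();
        try {
            Integer.parseInt(text);
            return SortType.INT;
        } catch (NumberFormatException notAnInt) {
            try {
                Float.parseFloat(text);
                return SortType.FLOAT;
            } catch (NumberFormatException notAFloat) {
                return SortType.STRING;
            }
        }
    }

    /** Returns true if every term can be read as the given type. */
    static boolean allTermsFit(SortType type, String[] terms) {
        for (String term : terms) {
            SortType t = guessType(term);
            if (type == SortType.INT && t != SortType.INT) {
                return false;
            }
            if (type == SortType.FLOAT && t == SortType.STRING) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical titles: the first one happens to look like a number.
        String[] titles = { "1984", "animal farm", "brave new world" };
        SortType guessed = guessType(titles[0]);
        System.out.println(guessed + " fits all terms: " + allTermsFit(guessed, titles));
    }
}
```

In this sketch the guess made from "1984" is INT, which the other titles cannot satisfy, while STRING fits every term. Specifying the type on the SortField skips the guess entirely.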

> We could specify the sort type as String but we do have some Date fields
> too.  Are dates actually indexed as strings?

You're putting dates into Title_Sort also?  The type is specific to a 
sort field, so you can sort by dates too but you'd use a different 
field and a different type.

*Everything* in Lucene is indexed as a string.  But how a date looks as 
a string is a topic unto itself.  I prefer to use YYYYMMDD as a date 
formatted as a string (but when sorting, this could be treated as a 
numeric).

> I am wondering why this exception might occur when the server/index is
> under load.  I do realise there are many 'variables in the equation', so
> there probably is not an easy answer to this.

I'm at a loss on this one without further details, that's for sure.

Erik


Exception: cannot determine sort type

2004-12-22 Thread Kauler, Leto S
We have been implementing Lucene as the datasource for our
website--Lucene is exposed through a java web service which our ASP
pages query and process.  So far things have been going very well and in
general tests everything has been fine.

Interestingly though, under a small server stress test (up to 2
connections/second) every second or third query has been producing the
error:

"java.lang.RuntimeException: no terms in field Title_Sort - cannot
determine sort type"

Title_Sort is a sort-specific field (Store=false, Index=true,
Tokenise=false).  I do not have access to the actual Lucene-calling
code, but I do not believe that the creation of the SortField defines a
type (so just defaults to AUTO).

I dug up this message from the list where Erik suggested to define a
specific type in the SortField, which in this case solved the original
poster's problem.


We could specify the sort type as String but we do have some Date fields
too.  Are dates actually indexed as strings?

I am wondering why this exception might occur when the server/index is
under load.  I do realise there are many 'variables in the equation', so
there probably is not an easy answer to this.

Regards, --Leto
