Re: BooleanQuery - Too Many Clauses on date range.

2004-10-05 Thread Erik Hatcher
On Oct 4, 2004, at 2:12 PM, Chris Fraschetti wrote:
Absolutely, limiting the user's query is no problem here. I've
currently implemented the Lucene JavaScript to catch a lot of user
queries that could cause issues: blank queries, ? or * at the
beginning of a query, etc. But I couldn't think of a way to
prevent the user from doing a* while still allowing comment* for someone
wanting comments or commentary. Any suggestions would be warmly welcomed.
I recommend subclassing QueryParser, and overriding getPrefixQuery and 
getWildcardQuery.  In both of the overridden methods, throw a 
ParseException.  You should be handling ParseException gracefully 
somehow already, so that should do the trick.
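A minimal sketch of the check such an override could run, assuming the goal from the quoted question (reject a* but allow comment*): it counts the literal characters before the first wildcard and refuses terms that are too short. It uses only the Java standard library and does not call Lucene; the class name WildcardGuard, the 3-character minimum in the tests, and IllegalArgumentException standing in for Lucene's ParseException are assumptions for illustration.

```java
/**
 * Hypothetical helper, not part of Lucene: decides whether a prefix or
 * wildcard term is specific enough to expand.
 */
final class WildcardGuard {
    private WildcardGuard() {}

    /** Number of literal characters before the first '*' or '?'. */
    static int literalPrefixLength(String termStr) {
        for (int i = 0; i < termStr.length(); i++) {
            char c = termStr.charAt(i);
            if (c == '*' || c == '?') {
                return i;
            }
        }
        return termStr.length();
    }

    /** True if at least minPrefix literal characters precede any wildcard. */
    static boolean isAllowed(String termStr, int minPrefix) {
        return literalPrefixLength(termStr) >= minPrefix;
    }

    /** Throws with a user-facing message when the term is too vague. */
    static void check(String termStr, int minPrefix) {
        if (!isAllowed(termStr, minPrefix)) {
            throw new IllegalArgumentException("\"" + termStr
                + "\" is too vague: please type at least " + minPrefix
                + " characters before a wildcard.");
        }
    }
}
```

In a QueryParser subclass, the overridden getPrefixQuery and getWildcardQuery would call WildcardGuard.check(termStr, 3) and rethrow a rejection as a ParseException, so the existing error handling shows the message. Counting literal characters rather than looking for a trailing * means the helper works whether or not the parser has already stripped the * from a prefix term.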

Erik
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: BooleanQuery - Too Many Clauses on date range.

2004-10-05 Thread Che Dong
How about using an integer-based filter instead of a datetime-based filter?
A datetime can be converted to a Unix timestamp for comparison.
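A minimal sketch of the idea, assuming dates are indexed as Unix timestamps in seconds (UTC). Lucene compares index terms as strings, so the sketch zero-pads the number to a fixed width; without padding, "999" would sort after "1000". The class name TimestampKeys and the 10-digit width are assumptions for illustration, and java.time is used only for brevity.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

/** Hypothetical helper: dates as fixed-width Unix-timestamp keys. */
final class TimestampKeys {
    private TimestampKeys() {}

    /** 10 digits cover every non-negative epoch second up to 9999999999 (year 2286). */
    static final int WIDTH = 10;

    /** Unix timestamp in seconds (UTC) for midnight at the start of the given day. */
    static long toEpochSeconds(int year, int month, int day) {
        return LocalDate.of(year, month, day).atStartOfDay(ZoneOffset.UTC).toEpochSecond();
    }

    /** Zero-padded key, so String.compareTo agrees with numeric order. */
    static String toKey(long epochSeconds) {
        if (epochSeconds < 0 || epochSeconds > 9_999_999_999L) {
            throw new IllegalArgumentException("out of range: " + epochSeconds);
        }
        return String.format("%0" + WIDTH + "d", epochSeconds);
    }
}
```

For example, toKey(toEpochSeconds(2004, 6, 8)) yields "1086652800", and any later date produces a key that also sorts after it as a string, so a range filter can compare keys directly.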

Thanks
Che Dong
http://www.chedong.com/
Chris Fraschetti wrote:
Surely some folks out there have used Lucene on a large scale and have
had to compensate for this somehow; any other solutions? Morus, thank
you very much for your input. I am looking into your solution,
just putting my feelers out there once more.
The Lucene API is very limited in its descriptions of its
components. Short of digging into the code, is there a good doc
somewhere out there that explains the workings of Lucene?
On Mon, 4 Oct 2004 01:57:06 -0700, Chris Fraschetti
[EMAIL PROTECTED] wrote:
So before I spend a significant amount of time digging into the Lucene
code: how does your experience with Lucene shed light on my
situation? Our current index is pretty huge, and with each
increase in size I've had, I've experienced a problem like this...
Without taking up too much of your time (obviously this is my
task), I thought I'd ask whether you'd had any experience with this
boolean clause nonsense... Of course it can be overcome, but if you
know a quick hack, awesome; otherwise, no big deal, and off to work I go
:)
-Fraschetti
-- Forwarded message --
From: Morus Walter [EMAIL PROTECTED]
Date: Mon, 4 Oct 2004 09:01:50 +0200
Subject: Re: BooleanQuery - Too Many Clauses on date range.
To: Lucene Users List [EMAIL PROTECTED], Chris
Fraschetti [EMAIL PROTECTED]
Chris Fraschetti writes:
So I decided to move my start date from the epoch to 20040608, which fixed
my boolean query problem for my current data size (approx.
600,000),
but now as soon as I do a query like ... a*
I get the boolean error again. Google obviously can handle this query,
and I'm pretty sure Lucene can handle it... any ideas? With or
without a date range specified, I still get the TooManyClauses error.

I tried cranking maxClauseCount up to Integer.MAX_VALUE, but Java gave me
an out-of-memory error. Is this because the boolean search tried to
allocate that many clauses by default, or because my query actually
needed that many clauses?
The boolean search allocates a clause for every token that has the prefix or
matches the wildcard expression.

Why does it work on small indexes but not
large?
Because there are fewer tokens starting with a.

Is there any way to have the parser create as many clauses as
it can and then search with what it has? w/o recompiling the source?
You need to create your own versions of WildcardQuery and PrefixQuery
that take a maximum number of terms and ignore further clauses.
And you need a variant of the query parser that uses these queries.
This can be done even without recompiling Lucene, but you will have to
do some programming at the level of Lucene queries.
It shouldn't be hard, since you can use the sources as a starting point.
I guess this does not exist because the Lucene developers decided to prefer
a query error to incomplete results.
Morus
--
___
Chris Fraschetti, Student CompSci System Admin
University of San Francisco
e [EMAIL PROTECTED] | http://meteora.cs.usfca.edu





Re: BooleanQuery - Too Many Clauses on date range.

2004-10-04 Thread Morus Walter
Chris Fraschetti writes:
 So I decided to move my start date from the epoch to 20040608, which fixed
 my boolean query problem for my current data size (approx.
 600,000),

 but now as soon as I do a query like ... a*
 I get the boolean error again. Google obviously can handle this query,
 and I'm pretty sure jguru.com can handle it too... any ideas? With or
 without a date range specified, I still get the TooManyClauses error.
 I tried cranking maxClauseCount up to Integer.MAX_VALUE, but Java gave me
 an out-of-memory error. Is this because the boolean search tried to
 allocate that many clauses by default, or because my query actually
 needed that many clauses?

The boolean search allocates a clause for every token that has the prefix or
matches the wildcard expression.

 Why does it work on small indexes but not
 large? 
Because there are fewer tokens starting with a.
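To make the two answers above concrete, the following stdlib-only model (not Lucene code) treats the term dictionary as a sorted set: a prefix query becomes one clause per matching term, and the query fails once that count passes a limit. DEFAULT_MAX_CLAUSES mirrors Lucene's default BooleanQuery.maxClauseCount of 1024; the class name PrefixExpansion and IllegalStateException standing in for BooleanQuery.TooManyClauses are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

/** Hypothetical model of prefix expansion over a sorted term dictionary. */
final class PrefixExpansion {
    private PrefixExpansion() {}

    /** Mirrors Lucene's default BooleanQuery.maxClauseCount. */
    static final int DEFAULT_MAX_CLAUSES = 1024;

    /** Every dictionary term that starts with prefix, in sorted term order. */
    static List<String> expand(NavigableSet<String> terms, String prefix) {
        List<String> out = new ArrayList<>();
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) {
                break; // matching terms are contiguous, so no later term can match
            }
            out.add(t);
        }
        return out;
    }

    /** One clause per matching term; fails like TooManyClauses when over the limit. */
    static int clauseCount(NavigableSet<String> terms, String prefix, int maxClauses) {
        int n = expand(terms, prefix).size();
        if (n > maxClauses) {
            throw new IllegalStateException("too many clauses: " + n + " > " + maxClauses);
        }
        return n;
    }
}
```

The same a* that fits under the limit on a small dictionary exceeds it on a large one, which is why raising the limit to Integer.MAX_VALUE only moves the failure from the clause check to memory.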

 Is there any way to have the parser create as many clauses as
 it can and then search with what it has? w/o recompiling the source?
 
You need to create your own versions of WildcardQuery and PrefixQuery
that take a maximum number of terms and ignore further clauses.
And you need a variant of the query parser that uses these queries.

This can be done even without recompiling Lucene, but you will have to
do some programming at the level of Lucene queries.
It shouldn't be hard, since you can use the sources as a starting point.
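A minimal, stdlib-only sketch of the capped expansion described above: it walks the sorted terms that start with the prefix, keeps the first maxTerms of them in term order, and reports whether any were ignored. It models the idea rather than subclassing Lucene's PrefixQuery; the class and field names are hypothetical, and keeping the first terms is only one possible policy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

/** Hypothetical capped prefix expansion that ignores terms past a limit. */
final class CappedPrefixExpansion {
    private CappedPrefixExpansion() {}

    /** The kept terms and whether further matches were dropped. */
    static final class Result {
        final List<String> terms;
        final boolean truncated;

        Result(List<String> terms, boolean truncated) {
            this.terms = terms;
            this.truncated = truncated;
        }
    }

    /** Collects at most maxTerms matching terms instead of failing on the rest. */
    static Result expand(NavigableSet<String> terms, String prefix, int maxTerms) {
        List<String> kept = new ArrayList<>();
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) {
                return new Result(kept, false); // no more matching terms
            }
            if (kept.size() == maxTerms) {
                return new Result(kept, true); // another match exists but is ignored
            }
            kept.add(t);
        }
        return new Result(kept, false);
    }
}
```

The truncated flag lets the caller tell the user that the results may be incomplete, which is exactly the trade-off the Lucene developers chose not to make silently.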

I guess this does not exist because the Lucene developers decided to prefer
a query error to incomplete results.

Morus





Re: BooleanQuery - Too Many Clauses on date range.

2004-10-04 Thread Chris Fraschetti
Surely some folks out there have used Lucene on a large scale and have
had to compensate for this somehow; any other solutions? Morus, thank
you very much for your input. I am looking into your solution,
just putting my feelers out there once more.

The Lucene API is very limited in its descriptions of its
components. Short of digging into the code, is there a good doc
somewhere out there that explains the workings of Lucene?





Re: BooleanQuery - Too Many Clauses on date range.

2004-10-04 Thread Otis Gospodnetic
There are some articles about Lucene.  You can find the links on
Lucene's Wiki.  Lucene in Action is almost done:
http://www.manning.com/catalog/view.php?book=hatcher2
I don't think you can pre-order it from the publisher, but you can
probably pre-order it from Amazon.  I don't know of any other good
Lucene documentation.

Otis





Re: BooleanQuery - Too Many Clauses on date range.

2004-10-04 Thread Stephane James Vaucher
BTW, what's wrong with the DateFilter solution I mentioned earlier?

I've used it before (before Lucene 1.4, though) without memory problems,
so I always assumed that it avoided the allocation problems of prefix
queries.
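Why a filter sidesteps the clause limit can be modeled in a few lines. The sketch below is not Lucene's DateFilter source: it walks every term in an inclusive range and sets one bit per matching document, so memory is bounded by the number of documents rather than the number of terms, and no query clauses are built. The postings map (term to document ids) and the class name RangeFilterModel are assumptions for illustration.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.NavigableMap;

/** Hypothetical model of a term-range filter over an inverted index. */
final class RangeFilterModel {
    private RangeFilterModel() {}

    /**
     * Marks every document containing a term in [lower, upper]. The result
     * needs one bit per document, however many distinct terms the range spans.
     */
    static BitSet bits(NavigableMap<String, int[]> postings,
                       String lower, String upper, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        for (Map.Entry<String, int[]> e : postings.subMap(lower, true, upper, true).entrySet()) {
            for (int doc : e.getValue()) {
                result.set(doc);
            }
        }
        return result;
    }
}
```

Because the filter only restricts which documents may match, it does not help a standalone a* query, which still has to expand into clauses.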

sv




Re: BooleanQuery - Too Many Clauses on date range.

2004-10-04 Thread Chris Fraschetti
The date portion of my code works great now, no problems there, so
let me thank you for your date filter solution... but my current
problem is that a standalone a* query gives me
the TooManyClauses exception.





Re: BooleanQuery - Too Many Clauses on date range.

2004-10-04 Thread Stephane James Vaucher
Ok, got it, got a small comment though.

For large wildcard queries, please note that Google does not support
wildcards. Search for hell*, and there will be no correct matches with hello.

Is there a reason why you wish to allow such large queries? We might
be able to find alternative ways of helping you out. No one will use a
query a*. If someone does, the results would be completely meaningless
(many false positives for a user). However, a query like program* might be
interesting to a user.

The problem with hacking term expansion is that the rules of this
expansion might be hard to define (e.g. should one keep the first terms,
the most frequent terms, or even the least frequent ones? That depends
on your app).
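A minimal sketch of the three policies mentioned above, assuming each candidate term comes with its document frequency. The class name ExpansionPolicy and the map-based input are hypothetical; a real implementation would read terms and frequencies from the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

/** Hypothetical selection of which expanded terms to keep. */
final class ExpansionPolicy {
    private ExpansionPolicy() {}

    enum Order { FIRST, MOST_FREQUENT, LEAST_FREQUENT }

    /**
     * Picks at most n candidates (term -> document frequency, in term order).
     * List.sort is stable, so ties in frequency keep term order.
     */
    static List<String> pick(SortedMap<String, Integer> candidates, Order order, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(candidates.entrySet());
        if (order == Order.MOST_FREQUENT) {
            entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        } else if (order == Order.LEAST_FREQUENT) {
            entries.sort(Map.Entry.comparingByValue());
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            out.add(entries.get(i).getKey());
        }
        return out;
    }
}
```

Each policy returns a different subset for the same query, which is the point: the "right" truncation depends on what the application's users expect.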

sv


Re: BooleanQuery - Too Many Clauses on date range.

2004-10-04 Thread Chris Fraschetti
Absolutely, limiting the user's query is no problem here. I've
currently implemented the Lucene JavaScript to catch a lot of user
queries that could cause issues: blank queries, ? or * at the
beginning of a query, etc. But I couldn't think of a way to
prevent the user from doing a* while still allowing comment* for someone
wanting comments or commentary. Any suggestions would be warmly welcomed.



Re: BooleanQuery - Too Many Clauses on date range.

2004-10-04 Thread Stephane James Vaucher
I've simply shown a message saying that the user's request was too vague and
that he should modify it. I haven't had many complaints about this,
especially when I explained why to a client:

If one user of many does a*, the whole system will grind to a halt, as that
one request will use up all of the available memory (wildcards aren't very
scalable...).

Here is an example of a working system:
http://theserverside.com/search/search.tss

I don't know if many people complain that no results appear when they do a*,
but a request for javap* returns javapro, javaplus, javapolis...
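A sketch of such a "too vague" check, assuming a sorted set stands in for the index's term dictionary: count the terms that match the prefix but stop as soon as the count passes a limit, so that checking a* is itself cheap. The class name VagueQueryCheck, the limit, and the message text are assumptions for illustration.

```java
import java.util.NavigableSet;

/** Hypothetical pre-check run before building a prefix query. */
final class VagueQueryCheck {
    private VagueQueryCheck() {}

    /** Counts matching terms but stops after limit + 1, so a* stays cheap to check. */
    static boolean tooVague(NavigableSet<String> terms, String prefix, int limit) {
        int seen = 0;
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) {
                break; // past the contiguous block of matching terms
            }
            if (++seen > limit) {
                return true;
            }
        }
        return false;
    }

    /** Message shown instead of running the search; null means the query may run. */
    static String messageFor(NavigableSet<String> terms, String prefix, int limit) {
        return tooVague(terms, prefix, limit)
            ? "Your request \"" + prefix + "*\" is too vague; please add more characters."
            : null;
    }
}
```

With a limit of 3, javap* (javaplus, javapolis, javapro) runs, while a* against a dictionary holding more than three terms starting with a gets the message instead.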

HTH,
sv

On Mon, 4 Oct 2004, Chris Fraschetti wrote:

 absoultely, limiting the user's query is no problem here. I've
 currently implemented the lucene javascript to catcha lot of user
 quries that could cause issues.. blank queries, ? or * at the
 beginning of query, etc etc... but I couldn't think of a way to
 prevent the user from doing a*  but not   comment*   wanting comments
 or commentary...  any suggestions would be warmly welcomed.


 On Mon, 4 Oct 2004 14:08:00 -0400 (EDT), Stephane James Vaucher
 [EMAIL PROTECTED] wrote:
  Ok, got it, got a small comment though.
 
  For large wildcard queries, please note that google does not support wild
  cards. Search hell*, and there will be no correct matches with hello.
 
  Is there a reason why you wish to allow such large queries? We might
  be able to find alternative ways of helping you out. No one will use a
  query a*. If someone does, the results would be completely meaningless
  (many false positives for a user). However a query like program* might be
  interesting to a user.
 
  The problem with hacking term expansion is that the rules of this
  expansion might be hard to define (maybe one should keep the first
  terms, the most frequent, or even the least frequent, depending
  on your app).
 
  sv
 
  On Mon, 4 Oct 2004, Chris Fraschetti wrote:
 
   The date portion of my code works great now, no problems there, so
   let me thank you now for your date filter solution. But my current
   problem is with a stand-alone a* query giving me
   the TooManyClauses exception.
  
  
   On Mon, 4 Oct 2004 12:47:24 -0400 (EDT), Stephane James Vaucher
   [EMAIL PROTECTED] wrote:
BTW, what's wrong with the DateFilter solution I mentioned earlier?
   
I've used it before (before lucene-1.4 though) without memory problems,
thus I always assumed that it avoided the allocation problems with prefix
queries.
   
sv
   
   
   
On Mon, 4 Oct 2004, Chris Fraschetti wrote:
   
 Surely some folks out there have used Lucene on a large scale and have
 had to compensate for this somehow. Any other solutions? Morus, thank
 you very much for your input; I am looking into your solution,
 just putting my feelers out there once more.

 The Lucene API documentation is very limited in its descriptions of its
 components. Short of digging into the code, is there a good doc
 somewhere out there that explains the workings of Lucene?


 On Mon, 4 Oct 2004 01:57:06 -0700, Chris Fraschetti
 [EMAIL PROTECTED] wrote:
  So before I spend a significant amount of time digging into the Lucene
  code, how does your experience with Lucene shed light on my
  situation? Our current index is pretty huge, and with each
  increase in size I've had, I've experienced a problem like this...
  Without taking up too much of your time (because obviously this is my
  task), I thought I'd ask you if you'd had any experience with this
  boolean clause nonsense... Of course it can be overcome, but if you
  know a quick hack, awesome; otherwise, no big deal, but off to work I go
  :)
 
  -Fraschetti
 
 
  -- Forwarded message --
  From: Morus Walter [EMAIL PROTECTED]
  Date: Mon, 4 Oct 2004 09:01:50 +0200
  Subject: Re: BooleanQuery - Too Many Clases on date range.
  To: Lucene Users List [EMAIL PROTECTED], Chris
  Fraschetti [EMAIL PROTECTED]
 
  Chris Fraschetti writes:
   So I decided to move my epoch date to the 20040608 date, which fixed
   my boolean query problem for my current data size (approx.
   600,000).

   But now as soon as I do a query like ... a*
   I get the boolean error again. Google obviously can handle this query,
   and I'm pretty sure Lucene can handle it. Any ideas? With or
   without a date range specified, I still get the TooManyClauses error.
 
 
   I tried cranking the max clause count up to Integer.MAX_VALUE, but Java gave me
   an out-of-memory error. Is this because the boolean search tried to
   allocate that many clauses by default, or because my query actually
   needed that many clauses?
 
  Boolean search allocates clauses for all tokens having the prefix or
  matching the wildcard expression.
 
   Why does

Re: BooleanQuery - Too Many Clases on date range.

2004-10-04 Thread Sergiu Gordea
Chris Fraschetti wrote:
Absolutely, limiting the user's query is no problem here. I've
currently implemented the Lucene JavaScript to catch a lot of user
queries that could cause issues: blank queries, ? or * at the
beginning of a query, etc. But I couldn't think of a way to
prevent the user from doing a* but not comment* (wanting comments
or commentary). Any suggestions would be warmly welcomed.
 

One cheap solution is to require the user to enter at least 3 alphanumeric
characters.
What do you think about that?

 All the best,
 Sergiu
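A minimal Java sketch of the minimum-prefix check suggested above. The class name `WildcardGuard`, the method `isAcceptable`, and the threshold of 3 are illustrative assumptions, not part of Lucene. In a Lucene 1.4 application such a check would presumably be called from an overridden `QueryParser.getPrefixQuery`/`getWildcardQuery` that throws a `ParseException` for rejected terms; the sketch only shows the check itself.

```java
// Hypothetical helper, not a Lucene class: decides whether a wildcard term
// has enough literal characters before its first wildcard to be allowed.
final class WildcardGuard {
    // True if the term contains no wildcard, or if at least minLiteral
    // letters/digits precede the first '*' or '?'.
    static boolean isAcceptable(String term, int minLiteral) {
        int firstWildcard = firstWildcardIndex(term);
        if (firstWildcard < 0) {
            return true; // plain term, nothing to expand
        }
        int literal = 0;
        for (int i = 0; i < firstWildcard; i++) {
            if (Character.isLetterOrDigit(term.charAt(i))) {
                literal++;
            }
        }
        return literal >= minLiteral;
    }

    // Index of the first '*' or '?', or -1 if neither occurs.
    private static int firstWildcardIndex(String term) {
        int star = term.indexOf('*');
        int question = term.indexOf('?');
        if (star < 0) return question;
        if (question < 0) return star;
        return Math.min(star, question);
    }

    public static void main(String[] args) {
        String[] samples = {"a*", "comment*", "*foo", "ja?a", "javap*", "hello"};
        for (String s : samples) {
            System.out.println(s + " -> " + (isAcceptable(s, 3) ? "ok" : "too vague"));
        }
    }
}
```

With a threshold of 3, `a*`, `*foo`, and `ja?a` are rejected while `comment*`, `javap*`, and the plain term `hello` pass.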
On Mon, 4 Oct 2004 14:08:00 -0400 (EDT), Stephane James Vaucher
[EMAIL PROTECTED] wrote:
 

Ok, got it, got a small comment though.
For large wildcard queries, please note that Google does not support
wildcards. Search hell*, and there will be no matches for hello.
Is there a reason why you wish to allow such large queries? We might
be able to find alternative ways of helping you out. No one will use a
query a*. If someone does, the results would be completely meaningless
(many false positives for a user). However a query like program* might be
interesting to a user.
The problem with hacking term expansion is that the rules of this
expansion might be hard to define (maybe one should keep the first
terms, the most frequent, or even the least frequent, depending
on your app).
sv
On Mon, 4 Oct 2004, Chris Fraschetti wrote:
   

The date portion of my code works great now, no problems there, so
let me thank you now for your date filter solution. But my current
problem is with a stand-alone a* query giving me
the TooManyClauses exception.
On Mon, 4 Oct 2004 12:47:24 -0400 (EDT), Stephane James Vaucher
[EMAIL PROTECTED] wrote:
 

BTW, what's wrong with the DateFilter solution I mentioned earlier?
I've used it before (before lucene-1.4 though) without memory problems,
thus I always assumed that it avoided the allocation problems with prefix
queries.
sv

On Mon, 4 Oct 2004, Chris Fraschetti wrote:
   

Surely some folks out there have used Lucene on a large scale and have
had to compensate for this somehow. Any other solutions? Morus, thank
you very much for your input; I am looking into your solution,
just putting my feelers out there once more.
The Lucene API documentation is very limited in its descriptions of its
components. Short of digging into the code, is there a good doc
somewhere out there that explains the workings of Lucene?
On Mon, 4 Oct 2004 01:57:06 -0700, Chris Fraschetti
[EMAIL PROTECTED] wrote:
 

So before I spend a significant amount of time digging into the Lucene
code, how does your experience with Lucene shed light on my
situation? Our current index is pretty huge, and with each
increase in size I've had, I've experienced a problem like this...
Without taking up too much of your time (because obviously this is my
task), I thought I'd ask you if you'd had any experience with this
boolean clause nonsense... Of course it can be overcome, but if you
know a quick hack, awesome; otherwise, no big deal, but off to work I go
:)
-Fraschetti
-- Forwarded message --
From: Morus Walter [EMAIL PROTECTED]
Date: Mon, 4 Oct 2004 09:01:50 +0200
Subject: Re: BooleanQuery - Too Many Clases on date range.
To: Lucene Users List [EMAIL PROTECTED], Chris
Fraschetti [EMAIL PROTECTED]
Chris Fraschetti writes:
   

So I decided to move my epoch date to the 20040608 date, which fixed
my boolean query problem for my current data size (approx.
600,000).
But now as soon as I do a query like ... a*
I get the boolean error again. Google obviously can handle this query,
and I'm pretty sure Lucene can handle it. Any ideas? With or
without a date range specified, I still get the TooManyClauses error.
 

   

I tried cranking the max clause count up to Integer.MAX_VALUE, but Java gave me
an out-of-memory error. Is this because the boolean search tried to
allocate that many clauses by default, or because my query actually
needed that many clauses?
 

Boolean search allocates clauses for all tokens having the prefix or
matching the wildcard expression.
   

Why does it work on small indexes but not
large?
 

Because there are fewer tokens starting with a.
   

Is there any way to have the parser create as many clauses as
it can and then search with what it has, without recompiling the source?
 

You need to create your own version of Wildcard- and Prefix-Query
that takes a maximum term number and ignores further clauses.
And you need a variant of the query parser that uses these queries.
This can be done even without recompiling Lucene, but you will have to
do some programming at the level of Lucene queries.
It shouldn't be hard, since you can use the sources as a starting point.
I guess this does not exist because the Lucene developers decided to prefer
a query error rather than incomplete results.
Morus
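Morus's suggestion of a prefix query that stops after a maximum number of terms can be sketched in plain Java. Here a `TreeSet` stands in for the index's sorted term dictionary, and `CappedPrefixExpansion.expand` (a hypothetical name) walks it from the prefix onwards, the way a prefix query enumerates terms, but keeps at most `maxTerms` of them instead of failing. In a real Lucene 1.4 subclass one would presumably enumerate terms starting at `IndexReader.terms(new Term(field, prefix))` and add one `TermQuery` per kept term to a `BooleanQuery`; that wiring is an assumption and is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

final class CappedPrefixExpansion {
    // Walks the sorted term dictionary starting at the prefix and collects
    // matching terms, stopping after maxTerms matches instead of throwing.
    // Any remaining matching terms are silently ignored.
    static List<String> expand(NavigableSet<String> sortedTerms, String prefix, int maxTerms) {
        List<String> kept = new ArrayList<>();
        for (String term : sortedTerms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix) || kept.size() >= maxTerms) {
                break; // past the prefix range, or cap reached
            }
            kept.add(term);
        }
        return kept;
    }

    public static void main(String[] args) {
        NavigableSet<String> dictionary = new TreeSet<>(Arrays.asList(
                "apple", "apply", "april", "arc", "banana", "comment", "commentary"));
        System.out.println(expand(dictionary, "a", 2));        // [apple, apply]
        System.out.println(expand(dictionary, "comment", 10)); // [comment, commentary]
    }
}
```

Keeping the first terms in sort order is only one possible policy; as Stephane notes, keeping the most or least frequent terms may suit some applications better, and any cap means that a* silently returns incomplete results.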

Re: BooleanQuery - Too Many Clases on date range.

2004-10-03 Thread Chris Fraschetti
So I decided to move my epoch date to the 20040608 date, which fixed
my boolean query problem for my current data size (approx.
600,000).

But now as soon as I do a query like ... a*
I get the boolean error again. Google obviously can handle this query,
and I'm pretty sure jguru.com can handle it too. Any ideas? With or
without a date range specified, I still get the TooManyClauses error.
I tried cranking the max clause count up to Integer.MAX_VALUE, but Java gave me
an out-of-memory error. Is this because the boolean search tried to
allocate that many clauses by default, or because my query actually
needed that many clauses? Why does it work on small indexes but not
large ones? Is there any way to have the parser create as many clauses as
it can and then search with what it has, without recompiling the source?

Thanks!





-- 
___
Chris Fraschetti, Student CompSci System Admin
University of San Francisco
e [EMAIL PROTECTED] | http://meteora.cs.usfca.edu




Re: BooleanQuery - Too Many Clases on date range.

2004-10-01 Thread Scott Ganyo
You can use:
BooleanQuery.setMaxClauseCount(int maxClauseCount);
to increase the limit.


Re: BooleanQuery - Too Many Clases on date range.

2004-10-01 Thread Damian Gajda
On Fri, 01-10-2004 at 07:57 -0500, Scott Ganyo wrote:
 You can use:
 
 BooleanQuery.setMaxClauseCount(int maxClauseCount);

I had a similar problem with date ranges. Someone on the list suggested
a solution to me that was more clever than the above one, which
helps but makes searches slower and is memory hungry (many terms
are loaded into memory and then searched).

The suggested solution was to split dates into subfields during
indexing and use those fields while searching. This makes searching more
efficient but queries harder to create (personally I prefer building
queries with the Lucene API rather than parsing them with QueryParser).

For instance, the timestamp 2004-10-01 15:34:26.001 may be split into
the following fields:
some-date_year: 2004
some-date_month: 10
some-date_day: 01
some-date_time: 153426001

The above fields should be indexed so they can be searched. They open up
some nice possibilities, for instance fast and easy querying for all
documents that have a date in a particular year, month or day of month.
For convenience, one could also store weekdays.
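A minimal Java sketch of the splitting step, assuming `java.time` and the field names from the example above. `DateFieldSplitter` is a hypothetical helper; each resulting name/value pair would then be indexed as an untokenized field (for example with `Field.Keyword` in Lucene 1.4), which is not shown here.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

final class DateFieldSplitter {
    private static final DateTimeFormatter YEAR = DateTimeFormatter.ofPattern("yyyy");
    private static final DateTimeFormatter MONTH = DateTimeFormatter.ofPattern("MM");
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("dd");
    private static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("HHmmssSSS");

    // Returns field name -> zero-padded value. Zero padding keeps the values
    // lexicographically sortable, which range queries over them rely on.
    static Map<String, String> split(String baseName, LocalDateTime ts) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put(baseName + "_year", ts.format(YEAR));
        fields.put(baseName + "_month", ts.format(MONTH));
        fields.put(baseName + "_day", ts.format(DAY));
        fields.put(baseName + "_time", ts.format(TIME));
        return fields;
    }

    public static void main(String[] args) {
        // 2004-10-01 15:34:26.001 (1 ms = 1,000,000 ns)
        LocalDateTime ts = LocalDateTime.of(2004, 10, 1, 15, 34, 26, 1_000_000);
        split("some-date", ts).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

Running `main` prints the four values from the example: `2004`, `10`, `01`, and `153426001`.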

A query for a date range from 15 August to 10 October 2004 (in no
particular query language; this just gives an idea):
some-date_year = 2004 AND (
   (some-date_month = 08 AND some-date_day >= 15) OR
   (some-date_month = 09) OR
   (some-date_month = 10 AND some-date_day <= 10)
)

As you can see, it is easy to build such a query with the Lucene API. The
equalities are Term queries, the inequalities are Range queries, and the
AND and OR operators can be provided by Boolean queries.
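The decomposition of a date range into per-month clauses can be sketched as follows. `DateRangeDecomposer` is hypothetical and only builds the pseudo-query above as a string; turning each equality into a `TermQuery`, each inequality into a `RangeQuery`, and each AND/OR into a `BooleanQuery` is left out. The sketch assumes both dates fall in the same year.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

final class DateRangeDecomposer {
    // Builds the per-month clauses for [from, to] in the pseudo query
    // language used above. Limitation of this sketch: both dates must be
    // in the same year; a multi-year range would need an extra OR over years.
    static String decompose(String base, LocalDate from, LocalDate to) {
        if (from.getYear() != to.getYear() || from.isAfter(to)) {
            throw new IllegalArgumentException("sketch handles one ordered year only");
        }
        String month = base + "_month";
        String day = base + "_day";
        List<String> clauses = new ArrayList<>();
        if (from.getMonthValue() == to.getMonthValue()) {
            // Whole range inside one month: a single day-range clause.
            clauses.add(String.format("(%s = %02d AND %s >= %02d AND %s <= %02d)",
                    month, from.getMonthValue(), day, from.getDayOfMonth(), day, to.getDayOfMonth()));
        } else {
            // Partial first month: day >= start day.
            clauses.add(String.format("(%s = %02d AND %s >= %02d)",
                    month, from.getMonthValue(), day, from.getDayOfMonth()));
            // Whole months in between: one equality each.
            for (int m = from.getMonthValue() + 1; m < to.getMonthValue(); m++) {
                clauses.add(String.format("(%s = %02d)", month, m));
            }
            // Partial last month: day <= end day.
            clauses.add(String.format("(%s = %02d AND %s <= %02d)",
                    month, to.getMonthValue(), day, to.getDayOfMonth()));
        }
        return String.format("%s_year = %d AND (%s)", base, from.getYear(), String.join(" OR ", clauses));
    }

    public static void main(String[] args) {
        System.out.println(decompose("some-date",
                LocalDate.of(2004, 8, 15), LocalDate.of(2004, 10, 10)));
    }
}
```

For 15 August to 10 October 2004 this produces the query shown above on one line. The point of the design is that each day-field inequality can match at most 31 distinct terms and the month equalities at most 12, so the clause count no longer grows with the number of distinct timestamps in the index.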

Have fun implementing the solution. It has only one disadvantage: it
makes sorting results harder. The fix is to use multiple sort fields,
or another stored field containing the full date (one will almost
surely need to store a date for each hit anyway, unless you want to
write some baroque code to calculate the date from the split field
values).

Have fun,
-- 
Damian Gajda
Caltha Sp. j.





Re: BooleanQuery - Too Many Clases on date range.

2004-09-30 Thread Stephane James Vaucher
How about a DateFilter?

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/DateFilter.html

I don't believe it's got the same restrictions as boolean queries.

HTH,
sv
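Why a filter does not hit the clause limit can be illustrated with a small simulation. As far as I understand it, `DateFilter` walks the date terms in the requested range and sets one bit per matching document, instead of rewriting the range into a `BooleanQuery` with one clause per term. The sketch mimics that with a `TreeMap` standing in for the term-to-documents postings and a `java.util.BitSet` as the result; `RangeBitsFilter` is a hypothetical name, not Lucene's class, and the `yyyyMMdd` terms are an assumed date encoding.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class RangeBitsFilter {
    // Marks every document whose date term lies in [lower, upper].
    // Terms are visited in sorted order, but no per-term query clause is
    // created, so there is no clause limit; the result costs one bit per document.
    static BitSet bits(NavigableMap<String, List<Integer>> postings, int maxDoc,
                       String lower, String upper) {
        BitSet result = new BitSet(maxDoc);
        for (Map.Entry<String, List<Integer>> e :
                postings.subMap(lower, true, upper, true).entrySet()) {
            for (int doc : e.getValue()) {
                result.set(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<String, List<Integer>> postings = new TreeMap<>();
        postings.put("20031231", Arrays.asList(0));
        postings.put("20040115", Arrays.asList(1, 3));
        postings.put("20040608", Arrays.asList(2));
        postings.put("20050101", Arrays.asList(4));
        System.out.println(bits(postings, 5, "20040101", "20041231")); // {1, 2, 3}
    }
}
```

Memory use is one bit per document in the index regardless of how many terms the range covers; in Lucene the filter would be passed alongside the text query (e.g. `searcher.search(query, filter)`) rather than being part of it.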

On Thu, 30 Sep 2004, Chris Fraschetti wrote:

 I recently read, regarding my problem, that date_field:[0820483200
 TO 110448]
 is evaluated into a series of boolean queries, which has a cap of
 1024. Considering my documents will have dates spanning many
 years, and I need the granularity of 'by day' searching, are there
 any recommendations on how to make this work?

 Currently with query: +content_field:sometext +date_field:[0820483200
 TO 110448]
 I get the following exception:
 org.apache.lucene.search.BooleanQuery$TooManyClauses


 Any suggestions on how I can still keep the by-day granularity
 without limiting my search results? Are there any date formats that I
 can change those numbers to that would allow me to complete the search
 (e.g. Feb, 15 2004)? Can Lucene's range do a proper search on
 formatted dates?

 Is there a combination of RangeQuery and Query/MultiTermQuery that I can use?

 Your help is greatly appreciated.
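On the date-format question: as I understand it, a range over terms only works if the terms sort in date order as plain strings, which a fixed-width `yyyyMMdd` format does and a format like `Feb, 15 2004` does not. The sketch below (hypothetical `DayTerms` helper, UTC assumed) converts epoch seconds to such a day term and counts how many distinct day terms a range can expand to. The end date 2004-12-31 is an illustrative assumption, since the upper bound in the query above is incomplete.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

final class DayTerms {
    private static final DateTimeFormatter DAY_TERM =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    // Epoch seconds -> fixed-width yyyyMMdd term (UTC assumed). Fixed-width,
    // most-significant-first strings sort lexicographically in date order.
    static String toDayTerm(long epochSeconds) {
        return DAY_TERM.format(Instant.ofEpochSecond(epochSeconds));
    }

    // Upper bound on the distinct day terms a range can expand to
    // (one per calendar day, if every day has at least one document).
    static long maxDayTerms(LocalDate from, LocalDate to) {
        return ChronoUnit.DAYS.between(from, to) + 1;
    }

    public static void main(String[] args) {
        System.out.println(toDayTerm(820483200L)); // 19960101
        // "Feb, 15 2004" style strings do not sort chronologically:
        System.out.println("Mar, 01 2003".compareTo("Feb, 15 2004") > 0); // true
        System.out.println(maxDayTerms(LocalDate.of(1996, 1, 1), LocalDate.of(2004, 12, 31))); // 3288
    }
}
```

Even at day granularity, nine years can expand to 3288 terms, above the 1024-clause cap mentioned above, which is why the DateFilter and split-field suggestions in this thread remain relevant.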




