Re: BooleanQuery - Too Many Clases on date range.
On Oct 4, 2004, at 2:12 PM, Chris Fraschetti wrote:
> Absolutely, limiting the user's query is no problem here. I've currently
> implemented the Lucene JavaScript to catch a lot of user queries that could
> cause issues: blank queries, ? or * at the beginning of the query, etc. But
> I couldn't think of a way to prevent the user from doing a* while still
> allowing comment* for someone wanting comments or commentary. Any
> suggestions would be warmly welcomed.

I recommend subclassing QueryParser and overriding getPrefixQuery and getWildcardQuery. In both of the overridden methods, throw a ParseException. You should be handling ParseException gracefully somehow already, so that should do the trick.

Erik

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
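A minimal sketch of the override-and-throw pattern Erik describes. To keep it compilable without Lucene on the classpath, it uses hypothetical stand-in classes (SketchQueryParser, SketchParseException) that only mimic the shape of the two hooks; they are assumptions for illustration, not Lucene's classes or signatures. In a real application the overrides would go on a QueryParser subclass.

```java
class NoWildcardDemo {
    // Stand-in for a parser exception (hypothetical, not Lucene's ParseException).
    static class SketchParseException extends Exception {
        SketchParseException(String message) {
            super(message);
        }
    }

    // Stand-in parser: routes one term to the matching factory method.
    static class SketchQueryParser {
        final String parse(String field, String text) throws SketchParseException {
            String term = text.trim();
            boolean hasStar = term.indexOf('*') >= 0;
            boolean hasQuestionMark = term.indexOf('?') >= 0;
            if (!hasStar && !hasQuestionMark) {
                return field + ":" + term;
            }
            // A single trailing '*' with no other wildcard is treated as a prefix query.
            if (!hasQuestionMark && term.indexOf('*') == term.length() - 1) {
                return getPrefixQuery(field, term.substring(0, term.length() - 1));
            }
            return getWildcardQuery(field, term);
        }

        protected String getPrefixQuery(String field, String prefix) throws SketchParseException {
            return "prefix(" + field + ":" + prefix + ")";
        }

        protected String getWildcardQuery(String field, String pattern) throws SketchParseException {
            return "wildcard(" + field + ":" + pattern + ")";
        }
    }

    // The subclass rejects both query types by throwing from the overridden hooks.
    static class NoWildcardQueryParser extends SketchQueryParser {
        @Override
        protected String getPrefixQuery(String field, String prefix) throws SketchParseException {
            throw new SketchParseException("Prefix queries are not allowed: " + prefix + "*");
        }

        @Override
        protected String getWildcardQuery(String field, String pattern) throws SketchParseException {
            throw new SketchParseException("Wildcard queries are not allowed: " + pattern);
        }
    }

    // The caller already has to handle parse errors, so the rejection surfaces as a message.
    static String search(String userInput) {
        try {
            return new NoWildcardQueryParser().parse("contents", userInput);
        } catch (SketchParseException e) {
            return "Please refine your query. " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(search("comment"));
        System.out.println(search("a*"));
    }
}
```

The overrides here reject every prefix and wildcard query. To allow comment* while rejecting a*, the same hooks could throw only when some condition on the term holds and otherwise delegate to the parent method.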
Re: BooleanQuery - Too Many Clases on date range.
How about using an integer-based filter instead of a datetime-based filter? The datetime can be converted to a Unix timestamp for comparison.

Thanks,
Che Dong
http://www.chedong.com/

Chris Fraschetti wrote:
> Surely some folks out there have used Lucene on a large scale and have had
> to compensate for this somehow. Any other solutions?
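To make the integer comparison idea concrete, here is a small sketch. It is plain in-memory Java, not a Lucene Filter: the class name, the array of per-document timestamps, and the sample dates are all made up for illustration. The point it shows is that a date range becomes two numeric comparisons per document, so nothing is expanded into clauses.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;

class TimestampRangeSketch {
    // Converts a calendar date to a Unix timestamp (seconds since 1970-01-01 UTC).
    static long toEpochSeconds(LocalDate date) {
        return date.atStartOfDay(ZoneOffset.UTC).toEpochSecond();
    }

    // Returns the ids of documents whose timestamp lies in [fromInclusive, toExclusive).
    static List<Integer> filter(long[] docTimestamps, long fromInclusive, long toExclusive) {
        List<Integer> hits = new ArrayList<>();
        for (int docId = 0; docId < docTimestamps.length; docId++) {
            long ts = docTimestamps[docId];
            if (ts >= fromInclusive && ts < toExclusive) {
                hits.add(docId);
            }
        }
        return hits;
    }

    // Hypothetical sample data: one timestamp per document id.
    static List<Integer> demo() {
        long[] docTimestamps = {
            toEpochSeconds(LocalDate.of(2004, 5, 30)),
            toEpochSeconds(LocalDate.of(2004, 6, 8)),
            toEpochSeconds(LocalDate.of(2004, 9, 15)),
            toEpochSeconds(LocalDate.of(2004, 10, 4)),
        };
        long from = toEpochSeconds(LocalDate.of(2004, 6, 8));
        long to = toEpochSeconds(LocalDate.of(2004, 10, 4));
        return filter(docTimestamps, from, to);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```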
Re: BooleanQuery - Too Many Clases on date range.
Chris Fraschetti writes:
> So I decided to move my epoch date to the 20040608 date format, which fixed
> my boolean query problem for my current data size (approx. 600,000), but now
> as soon as I do a query like a* I get the boolean error again. Google
> obviously can handle this query, and I'm pretty sure jguru.com can handle it
> too. Any ideas? With or without a date range specified, I still get the
> TooManyClauses error. I tried cranking the max clause count up to
> Integer.MAX_VALUE, but Java gave me an out of memory error. Is this because
> the boolean search tried to allocate that many clauses by default, or
> because my query actually needed that many clauses?

The boolean search allocates clauses for all tokens having the prefix or matching the wildcard expression.

> Why does it work on small indexes but not large?

Because there are fewer tokens starting with a.

> Is there any way to have the parser create as many clauses as it can and
> then search with what it has, without recompiling the source?

You need to create your own versions of WildcardQuery and PrefixQuery that take a maximum term number and ignore further clauses. And you need a variant of the query parser that uses these queries. This can be done, even without recompiling Lucene, but you will have to do some programming at the level of Lucene queries. It shouldn't be hard, since you can use the sources as a starting point.

I guess this does not exist because the Lucene developers decided to prefer a query error rather than incomplete results.

Morus
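The two behaviours discussed in this message (fail when the expansion is too large, or stop at a maximum and search with what has been collected) can be sketched on a toy term dictionary. This is not Lucene code: the sorted set of terms, the class and method names, and the exception type are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class PrefixExpansionSketch {
    // Hypothetical stand-in for a "too many clauses" error.
    static class TooManyTermsException extends RuntimeException {
        TooManyTermsException(String message) {
            super(message);
        }
    }

    // Strict expansion: one clause per matching term, error once the limit is exceeded.
    static List<String> expandStrict(SortedSet<String> terms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        // Terms sharing a prefix are contiguous in sorted order, starting at the prefix itself.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            if (clauses.size() == maxClauses) {
                throw new TooManyTermsException("More than " + maxClauses + " terms match " + prefix + "*");
            }
            clauses.add(term);
        }
        return clauses;
    }

    // Capped expansion: keep the first maxClauses matching terms and ignore the rest.
    static List<String> expandCapped(SortedSet<String> terms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix) || clauses.size() == maxClauses) {
                break;
            }
            clauses.add(term);
        }
        return clauses;
    }

    // Made-up sample dictionary.
    static SortedSet<String> sampleTerms() {
        return new TreeSet<>(Arrays.asList(
            "apple", "apply", "banana", "comment", "commentary", "comments"));
    }

    public static void main(String[] args) {
        System.out.println(expandStrict(sampleTerms(), "comment", 10));
        System.out.println(expandCapped(sampleTerms(), "a", 1));
    }
}
```

The capped variant silently drops matching terms, which is the incomplete-results trade-off Morus mentions; the strict variant shows why a larger index fails where a small one does not, since the number of clauses grows with the number of distinct matching terms.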
Re: BooleanQuery - Too Many Clases on date range.
Surely some folks out there have used Lucene on a large scale and have had to compensate for this somehow. Any other solutions?

Morus, thank you very much for your input. I am looking into your solution, just putting my feelers out there once more.

The Lucene API documentation is very limited in its descriptions of its components. Short of digging into the code, is there a good doc somewhere out there that explains the workings of Lucene?

On Mon, 4 Oct 2004 01:57:06 -0700, Chris Fraschetti [EMAIL PROTECTED] wrote:
> So before I spend a significant amount of time digging into the Lucene code,
> how does your experience with Lucene shed light on my situation? Our current
> index is pretty huge, and with each increase in size I've had, I've
> experienced a problem like this. Without taking up too much of your time,
> because obviously this is my task, I thought I'd ask you if you'd had any
> experience with this boolean clause nonsense. Of course it can be overcome,
> but if you know a quick hack, awesome; otherwise, no big deal, but off to
> work I go :)
>
> -Fraschetti
>
> -- Forwarded message --
> From: Morus Walter [EMAIL PROTECTED]
> Date: Mon, 4 Oct 2004 09:01:50 +0200
> Subject: Re: BooleanQuery - Too Many Clases on date range.
> To: Lucene Users List [EMAIL PROTECTED], Chris Fraschetti [EMAIL PROTECTED]
> [...]

--
Chris Fraschetti, Student CompSci System Admin
University of San Francisco
e [EMAIL PROTECTED] | http://meteora.cs.usfca.edu
Re: BooleanQuery - Too Many Clases on date range.
There are some articles about Lucene. You can find the links on Lucene's Wiki.

Lucene in Action is almost done: http://www.manning.com/catalog/view.php?book=hatcher2
I don't think you can pre-order it from the publisher, but you can probably pre-order it from Amazon.

I don't know of any other good Lucene documentation.

Otis

--- Chris Fraschetti [EMAIL PROTECTED] wrote:
> The Lucene API documentation is very limited in its descriptions of its
> components. Short of digging into the code, is there a good doc somewhere
> out there that explains the workings of Lucene?
Re: BooleanQuery - Too Many Clases on date range.
BTW, what's wrong with the DateFilter solution I mentioned earlier? I've used it before (before lucene-1.4 though) without memory problems, so I always assumed that it avoided the allocation problems with prefix queries.

sv

On Mon, 4 Oct 2004, Chris Fraschetti wrote:
> Surely some folks out there have used Lucene on a large scale and have had
> to compensate for this somehow. Any other solutions?
Re: BooleanQuery - Too Many Clases on date range.
The date portion of my code works great now, no problems there, so let me thank you now for your date filter solution. My current problem is with a standalone a* query giving me the too many clauses exception.

On Mon, 4 Oct 2004 12:47:24 -0400 (EDT), Stephane James Vaucher [EMAIL PROTECTED] wrote:
> BTW, what's wrong with the DateFilter solution I mentioned earlier? I've
> used it before (before lucene-1.4 though) without memory problems, so I
> always assumed that it avoided the allocation problems with prefix queries.
Re: BooleanQuery - Too Many Clases on date range.
OK, got it. A small comment, though: regarding large wildcard queries, please note that Google does not support wildcards. Search for hell* and there will be no correct matches with hello.

Is there a reason why you wish to allow such large queries? We might be able to find alternative ways of helping you out. No one will use a query like a*. If someone does, the results would be completely meaningless (many false positives for the user). However, a query like program* might be interesting to a user.

The problem with hacking term expansion is that the rules of this expansion might be hard to define (that is, maybe one should use the first terms, the most frequent terms, or even the least frequent, depending on your app).

sv

On Mon, 4 Oct 2004, Chris Fraschetti wrote:
> The date portion of my code works great now, no problems there, so let me
> thank you now for your date filter solution. My current problem is with a
> standalone a* query giving me the too many clauses exception.
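To illustrate why the expansion rule matters, here is a rough sketch that picks a limited number of matching terms by document frequency, either most frequent first or least frequent first. The frequency map, the class name, and the numbers are invented for the example; nothing here comes from Lucene's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ExpansionPolicySketch {
    // Returns up to 'limit' terms starting with 'prefix', ordered by frequency.
    // Ties are broken alphabetically so the result is deterministic.
    static List<String> select(Map<String, Integer> docFreq, String prefix,
                               int limit, boolean mostFrequentFirst) {
        List<Map.Entry<String, Integer>> matches = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : docFreq.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                matches.add(entry);
            }
        }
        matches.sort((a, b) -> {
            int byFreq = mostFrequentFirst
                ? Integer.compare(b.getValue(), a.getValue())
                : Integer.compare(a.getValue(), b.getValue());
            return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
        });
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : matches.subList(0, Math.min(limit, matches.size()))) {
            selected.add(entry.getKey());
        }
        return selected;
    }

    // Made-up document frequencies for a handful of terms.
    static Map<String, Integer> sampleFrequencies() {
        Map<String, Integer> docFreq = new HashMap<>();
        docFreq.put("program", 120);
        docFreq.put("programs", 80);
        docFreq.put("programmer", 45);
        docFreq.put("programming", 200);
        docFreq.put("progress", 300);
        return docFreq;
    }

    public static void main(String[] args) {
        System.out.println(select(sampleFrequencies(), "program", 2, true));
        System.out.println(select(sampleFrequencies(), "program", 2, false));
    }
}
```

With the same prefix and the same limit, the two policies return disjoint term sets on this sample, which is the difficulty raised in this message: the right rule depends on the application.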
Re: BooleanQuery - Too Many Clases on date range.
Absolutely, limiting the user's query is no problem here. I've currently implemented the Lucene JavaScript to catch a lot of user queries that could cause issues: blank queries, ? or * at the beginning of the query, etc. But I couldn't think of a way to prevent the user from doing a* while still allowing comment* for someone wanting comments or commentary. Any suggestions would be warmly welcomed.

On Mon, 4 Oct 2004 14:08:00 -0400 (EDT), Stephane James Vaucher [EMAIL PROTECTED] wrote:
> Is there a reason why you wish to allow such large queries? We might be able
> to find alternative ways of helping you out. No one will use a query like
> a*. If someone does, the results would be completely meaningless (many false
> positives for the user). However, a query like program* might be interesting
> to a user.
Re: BooleanQuery - Too Many Clases on date range.
I've used the simple message that the user's request was too vague and that he should modify it. I haven't had too many complaints about this especially when I explained why to a client: If one user of many does a*, the whole system will grind to a halt as that one request will use up all of the available memory (wildcards aren't very scalable...). Here is an example of a working system: http://theserverside.com/search/search.tss I don't know if many people complain that when they do a*, that no results appear, but a request for javap* returns javapro, javaplus, javapolis... HTH, sv On Mon, 4 Oct 2004, Chris Fraschetti wrote: absoultely, limiting the user's query is no problem here. I've currently implemented the lucene javascript to catcha lot of user quries that could cause issues.. blank queries, ? or * at the beginning of query, etc etc... but I couldn't think of a way to prevent the user from doing a* but not comment* wanting comments or commentary... any suggestions would be warmly welcomed. On Mon, 4 Oct 2004 14:08:00 -0400 (EDT), Stephane James Vaucher [EMAIL PROTECTED] wrote: Ok, got it, got a small comment though. For large wildcard queries, please note that google does not support wild cards. Search hell*, and there will be no correct matches with hello. Is there a reason why you wish to allow such large queries? We might be able to find alternative ways of helping you out. No one will use a query a*. If someone does, the results would be completely meaningless (many false positives for a user). However a query like program* might be interesting to a user. The problem with hacking term expansion is that the rules of this expansion might be hard to define (as is maybe one should use the first, the most frequent terms or the even the least frequent, depending on your app). sv On Mon, 4 Oct 2004, Chris Fraschetti wrote: The date portion of my code works great now.. no problems there, so let me thank you now for your date filter solution... 
but my current problem is in regards to a stand alone a* query giving me the too many clauses exception On Mon, 4 Oct 2004 12:47:24 -0400 (EDT), Stephane James Vaucher [EMAIL PROTECTED] wrote: BTW, what's wrong with the DateFilter solution, I mentionned earlier? I've used it before (before lucene-1.4 though) without memory problems, thus I always assumed that it avoided the allocation problems with prefix queries. sv On Mon, 4 Oct 2004, Chris Fraschetti wrote: Surely some folks out there have used lucene on a large scale and have had to compensate for this somehow, any other solutions? Morus, thank you very more for your imput, and I am looking into your solution, just putting my feelers out there once more. The lucene API is very limited as to it's descriptions of it's components, short of digging into the code, is there a good doc somewhere out there that explains the workins of lucene? On Mon, 4 Oct 2004 01:57:06 -0700, Chris Fraschetti [EMAIL PROTECTED] wrote: So before I spend a significant amount of time digging into the lucene code, how does your experience with lucene give light to my situation Our current index is pretty huge, and with each increase in side i've had, i've experienced a problem like this... Without taking up too much of your time.. because obviously this i my task, I thought i'd ask you if you'd had any experience with this boolean clause nonsense... of course it can be overcome, but if you know a quick hack, awesome, otherwise.. no big, but off to work i go :) -Fraschetti -- Forwarded message -- From: Morus Walter [EMAIL PROTECTED] Date: Mon, 4 Oct 2004 09:01:50 +0200 Subject: Re: BooleanQuery - Too Many Clases on date range. To: Lucene Users List [EMAIL PROTECTED], Chris Fraschetti [EMAIL PROTECTED] Chris Fraschetti writes: So i decicded to move my epoch date to the 20040608 date which fixed my boolean query problem in regards to my current data size (approx 600,000) but now as soon as I do a query like ... 
a* I get the boolean error again. Google obviously can handle this query, and I'm pretty sure Lucene can handle it. Any ideas? With or without a date range specified, I still get the TooManyClauses error. I tried cranking the max clause count up to Integer.MAX_VALUE, but Java gave me an out of memory error. Is this because the boolean search tried to allocate that many clauses by default, or because my query actually needed that many clauses?

Boolean search allocates clauses for all tokens having the prefix or matching the wildcard expression.

Why does
Re: BooleanQuery - Too Many Clases on date range.
Chris Fraschetti wrote:

Absolutely, limiting the user's query is no problem here. I've currently implemented the Lucene JavaScript to catch a lot of user queries that could cause issues: blank queries, ? or * at the beginning of a query, etc. But I couldn't think of a way to prevent the user from doing a* while still allowing comment* for someone wanting comments or commentary. Any suggestions would be warmly welcomed.

One cheap solution is to ask the user to enter at least 3 alphanumeric characters. What do you say about that?

All the best,
Sergiu

On Mon, 4 Oct 2004 14:08:00 -0400 (EDT), Stephane James Vaucher wrote:

OK, got it. I have a small comment, though. For large wildcard queries, please note that Google does not support wildcards. Search for hell*, and there will be no correct matches with hello. Is there a reason why you wish to allow such large queries? We might be able to find alternative ways of helping you out. No one will use a query like a*. If someone does, the results would be completely meaningless (many false positives for the user). However, a query like program* might be interesting to a user. The problem with hacking term expansion is that the rules of the expansion might be hard to define (for example, maybe one should use the first terms, the most frequent terms, or even the least frequent, depending on your app).

sv

On Mon, 4 Oct 2004, Chris Fraschetti wrote:

The date portion of my code works great now, no problems there, so let me thank you now for your date filter solution... but my current problem is a standalone a* query giving me the too many clauses exception.

On Mon, 4 Oct 2004 12:47:24 -0400 (EDT), Stephane James Vaucher wrote:

BTW, what's wrong with the DateFilter solution I mentioned earlier? I've used it before (before lucene-1.4, though) without memory problems, so I always assumed that it avoided the allocation problems with prefix queries.
sv

On Mon, 4 Oct 2004, Chris Fraschetti wrote:

Surely some folks out there have used Lucene on a large scale and have had to compensate for this somehow. Any other solutions? Morus, thank you very much for your input; I am looking into your solution, just putting my feelers out there once more. The Lucene API docs are very limited in their descriptions of its components. Short of digging into the code, is there a good doc somewhere out there that explains the workings of Lucene?

On Mon, 4 Oct 2004 01:57:06 -0700, Chris Fraschetti wrote:

So before I spend a significant amount of time digging into the Lucene code, how does your experience with Lucene shed light on my situation? Our current index is pretty huge, and with each increase in size I've had, I've experienced a problem like this. Without taking up too much of your time, because obviously this is my task, I thought I'd ask if you'd had any experience with this boolean clause nonsense. Of course it can be overcome, but if you know a quick hack, awesome; otherwise, no big deal, and off to work I go :)

-Fraschetti

-- Forwarded message --
From: Morus Walter
Date: Mon, 4 Oct 2004 09:01:50 +0200
Subject: Re: BooleanQuery - Too Many Clases on date range.
To: Lucene Users List, Chris Fraschetti

Chris Fraschetti writes:

So I decided to move my epoch date to the 20040608 date format, which fixed my boolean query problem for my current data size (approx. 600,000), but now as soon as I do a query like ... a* I get the boolean error again. Google obviously can handle this query, and I'm pretty sure Lucene can handle it. Any ideas? With or without a date range specified, I still get the TooManyClauses error. I tried cranking the max clause count up to Integer.MAX_VALUE, but Java gave me an out of memory error. Is this because the boolean search tried to allocate that many clauses by default, or because my query actually needed that many clauses?
Boolean search allocates clauses for all tokens having the prefix or matching the wildcard expression.

Why does it work on small indexes but not large?

Because there are fewer tokens starting with a.

Is there any way to have the parser create as many clauses as it can and then search with what it has, without recompiling the source?

You need to create your own versions of WildcardQuery and PrefixQuery that take a maximum term number and ignore further clauses. And you need a variant of the query parser that uses these queries. This can be done, even without recompiling Lucene, but you will have to do some programming at the level of Lucene queries. It shouldn't be hard, since you can use the sources as a starting point. I guess this does not exist because the Lucene developers decided to prefer a query error over incomplete results.

Morus
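Editorial sketch: the two ideas in this message, Sergiu's minimum of 3 alphanumeric characters and Morus's prefix expansion that stops at a maximum term number, can be illustrated without touching Lucene at all. The class and method names below (CappedPrefixExpansion, isSpecificEnough, expand) are hypothetical and are not part of the Lucene API; the sorted set of strings merely stands in for the index's term dictionary. A real solution would still need the custom query classes and query parser variant that Morus describes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

// Hypothetical helper, NOT part of Lucene: it only models the two ideas
// on a plain sorted set of terms.
class CappedPrefixExpansion {

    /** True if the text before a trailing '*' holds at least minChars letters or digits. */
    static boolean isSpecificEnough(String userTerm, int minChars) {
        String prefix = userTerm.endsWith("*")
                ? userTerm.substring(0, userTerm.length() - 1)
                : userTerm;
        int count = 0;
        for (int i = 0; i < prefix.length(); i++) {
            if (Character.isLetterOrDigit(prefix.charAt(i))) {
                count++;
            }
        }
        return count >= minChars;
    }

    /** Collects at most maxTerms terms that start with prefix, in sorted order. */
    static List<String> expand(SortedSet<String> indexTerms, String prefix, int maxTerms) {
        List<String> matches = new ArrayList<>();
        // tailSet starts at the first term >= prefix, so earlier terms are never scanned.
        for (String term : indexTerms.tailSet(prefix)) {
            // Stop at the first non-matching term or once the cap is reached;
            // any further matching terms are silently ignored.
            if (!term.startsWith(prefix) || matches.size() >= maxTerms) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }
}
```

Note the trade-off Morus points out: once the cap is hit, the remaining terms are dropped, so the results are incomplete rather than the query failing. Which terms to keep (here simply the first in sort order) is an arbitrary choice in this sketch.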
Re: BooleanQuery - Too Many Clases on date range.
So I decided to move my epoch date to the 20040608 date format, which fixed my boolean query problem for my current data size (approx. 600,000), but now as soon as I do a query like ... a* I get the boolean error again. Google obviously can handle this query, and I'm pretty sure jguru.com can handle it too. Any ideas? With or without a date range specified, I still get the TooManyClauses error. I tried cranking the max clause count up to Integer.MAX_VALUE, but Java gave me an out of memory error. Is this because the boolean search tried to allocate that many clauses by default, or because my query actually needed that many clauses? Why does it work on small indexes but not large? Is there any way to have the parser create as many clauses as it can and then search with what it has, without recompiling the source?

Thanks!

On Fri, 01 Oct 2004 15:48:36 +0200, Damian Gajda wrote:

On 01-10-2004 (Fri) at 07:57 -0500, Scott Ganyo wrote:

You can use: BooleanQuery.setMaxClauseCount(int maxClauseCount);

I had a similar problem with date ranges. Someone on the list suggested a solution to me that was cleverer than the one above, which helps but makes searches slower and is memory hungry (many terms are loaded into memory and then searched). The suggested solution was to split dates into sub-fields during indexing and use those fields while searching. This is more efficient, but it makes queries harder to create (personally, I prefer working with queries built using the Lucene API over ones parsed by QueryParser).

For instance, the timestamp 2004-10-01 15:34:26.001 may be split into the following fields:

some-date_year: 2004
some-date_month: 10
some-date_day: 01
some-date_time: 153426001

The above fields should be indexed so they can be searched. They give some nice possibilities, for instance fast and easy querying for all documents that have a date in a particular year, month or day of month. For convenience, one could also store weekdays.
A query for a date range from 15th August to 10th October 2004 (in no particular query language, this just gives an idea):

some-date_year = 2004 AND (
    (some-date_month = 08 AND some-date_day >= 15) OR
    (some-date_month = 09) OR
    (some-date_month = 10 AND some-date_day <= 10)
)

As you can see, it is easy to build such a query with the Lucene API. The equalities are Term queries. The inequalities are Range queries. The AND and OR operators can be provided by Boolean queries.

This solution has only one disadvantage: it makes sorting results less easy. The remedy is to use multiple sort fields, or another stored field containing the full date (one will almost surely need to store a date for each hit anyway, unless you want to write some baroque code to calculate the date from the split field values).

Have fun,
--
Damian Gajda
Caltha Sp. j.

--
Chris Fraschetti, Student CompSci System Admin
University of San Francisco
http://meteora.cs.usfca.edu
Re: BooleanQuery - Too Many Clases on date range.
You can use:

BooleanQuery.setMaxClauseCount(int maxClauseCount);

to increase the limit.

On Sep 30, 2004, at 8:24 PM, Chris Fraschetti wrote:

I recently read, in regard to my problem, that date_field:[0820483200 TO 110448] is evaluated into a series of boolean queries, which has a cap of 1024. Considering my documents will have dates spanning many years, and I need the granularity of 'by day' searching, are there any recommendations on how to make this work?

Currently, with the query:

+content_field:sometext +date_field:[0820483200 TO 110448]

I get the following exception:

org.apache.lucene.search.BooleanQuery$TooManyClauses

Any suggestions on how I can keep the by-day granularity without limiting my search results? Are there any date formats that I can change those numbers to that would allow me to complete the search (e.g. Feb 15, 2004)? Can Lucene's range do a proper search on formatted dates? Is there a combination of RangeQuery and Query/MultiTermQuery that I can use?

Your help is greatly appreciated.

--
Chris Fraschetti
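Editorial sketch: raising the clause limit treats the symptom. Since the range is expanded into clauses, the number of distinct date terms in the index is what matters, and a second-resolution epoch value produces far more distinct terms than a by-day key. Elsewhere in the thread Chris reports moving to a 20040608-style format. A minimal way to derive such a key from epoch seconds is shown below. The class name DayKey and the choice of UTC are assumptions made for illustration, and java.time is a modern Java API that postdates the Lucene version discussed here.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Lucene.
class DayKey {

    // UTC is an assumption; use whatever zone defines "a day" for your data.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    /** Truncates an epoch timestamp (seconds) to a sortable yyyyMMdd string. */
    static String fromEpochSeconds(long epochSeconds) {
        return DAY.format(Instant.ofEpochSecond(epochSeconds));
    }
}
```

Because the key is fixed-width and zero-padded, plain string order matches date order, which is what a term-based range needs. A span of many years can still exceed 1024 distinct days, so this reduces the problem rather than removing it.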
Re: BooleanQuery - Too Many Clases on date range.
On 01-10-2004 (Fri) at 07:57 -0500, Scott Ganyo wrote:

You can use: BooleanQuery.setMaxClauseCount(int maxClauseCount);

I had a similar problem with date ranges. Someone on the list suggested a solution to me that was cleverer than the one above, which helps but makes searches slower and is memory hungry (many terms are loaded into memory and then searched). The suggested solution was to split dates into sub-fields during indexing and use those fields while searching. This is more efficient, but it makes queries harder to create (personally, I prefer working with queries built using the Lucene API over ones parsed by QueryParser).

For instance, the timestamp 2004-10-01 15:34:26.001 may be split into the following fields:

some-date_year: 2004
some-date_month: 10
some-date_day: 01
some-date_time: 153426001

The above fields should be indexed so they can be searched. They give some nice possibilities, for instance fast and easy querying for all documents that have a date in a particular year, month or day of month. For convenience, one could also store weekdays.

A query for a date range from 15th August to 10th October 2004 (in no particular query language, this just gives an idea):

some-date_year = 2004 AND (
    (some-date_month = 08 AND some-date_day >= 15) OR
    (some-date_month = 09) OR
    (some-date_month = 10 AND some-date_day <= 10)
)

As you can see, it is easy to build such a query with the Lucene API. The equalities are Term queries. The inequalities are Range queries. The AND and OR operators can be provided by Boolean queries.

This solution has only one disadvantage: it makes sorting results less easy. The remedy is to use multiple sort fields, or another stored field containing the full date (one will almost surely need to store a date for each hit anyway, unless you want to write some baroque code to calculate the date from the split field values).

Have fun,
--
Damian Gajda
Caltha Sp. j.
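Editorial sketch: the field split and the range logic described above can be modelled in plain Java. Nothing here uses the Lucene API: SplitDateFields, split and inRange are hypothetical names, the map stands in for a document's indexed fields, and java.time is a modern API used only for convenience. The day comparisons at the two boundary months are treated as inequalities (on or after the start day, on or before the end day), which is how I read "The inequalities are Range queries". The sketch only handles ranges inside a single year.

```java
import java.time.LocalDateTime;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of the split-field idea, not Lucene code.
class SplitDateFields {

    /** Splits a timestamp into zero-padded year, month, day and time fields. */
    static Map<String, String> split(String baseName, LocalDateTime ts) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put(baseName + "_year", String.format(Locale.ROOT, "%04d", ts.getYear()));
        fields.put(baseName + "_month", String.format(Locale.ROOT, "%02d", ts.getMonthValue()));
        fields.put(baseName + "_day", String.format(Locale.ROOT, "%02d", ts.getDayOfMonth()));
        fields.put(baseName + "_time", String.format(Locale.ROOT, "%02d%02d%02d%03d",
                ts.getHour(), ts.getMinute(), ts.getSecond(), ts.getNano() / 1_000_000));
        return fields;
    }

    /** Evaluates the range expression for a range that lies within one year. */
    static boolean inRange(Map<String, String> fields, String baseName, int year,
                           int fromMonth, int fromDay, int toMonth, int toDay) {
        int y = Integer.parseInt(fields.get(baseName + "_year"));
        int m = Integer.parseInt(fields.get(baseName + "_month"));
        int d = Integer.parseInt(fields.get(baseName + "_day"));
        if (y != year) {
            return false;
        }
        if (fromMonth == toMonth) {
            return m == fromMonth && d >= fromDay && d <= toDay;
        }
        return (m == fromMonth && d >= fromDay)   // first month: from the start day on
                || (m > fromMonth && m < toMonth) // whole months in between
                || (m == toMonth && d <= toDay);  // last month: up to the end day
    }
}
```

Each branch of inRange corresponds to one parenthesised group of the query above, so the number of clauses grows with the number of months and boundary days rather than with the number of distinct timestamps in the index.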
Re: BooleanQuery - Too Many Clases on date range.
How about a DateFilter?

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/DateFilter.html

I don't believe it has the same restrictions as boolean queries.

HTH,
sv

On Thu, 30 Sep 2004, Chris Fraschetti wrote:

I recently read, in regard to my problem, that date_field:[0820483200 TO 110448] is evaluated into a series of boolean queries, which has a cap of 1024. Considering my documents will have dates spanning many years, and I need the granularity of 'by day' searching, are there any recommendations on how to make this work?

Currently, with the query:

+content_field:sometext +date_field:[0820483200 TO 110448]

I get the following exception:

org.apache.lucene.search.BooleanQuery$TooManyClauses

Any suggestions on how I can keep the by-day granularity without limiting my search results? Are there any date formats that I can change those numbers to that would allow me to complete the search (e.g. Feb 15, 2004)? Can Lucene's range do a proper search on formatted dates? Is there a combination of RangeQuery and Query/MultiTermQuery that I can use?

Your help is greatly appreciated.
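Editorial sketch: the reason a filter can sidestep the clause limit is that it represents the date restriction as one bit per document instead of one query clause per matching term. The code below is a conceptual model only, not Lucene's DateFilter implementation. The class DateRangeBits, its methods, and the sorted map used as a stand-in for the index's postings are all hypothetical.

```java
import java.util.BitSet;
import java.util.NavigableMap;

// Conceptual model of a date filter, NOT Lucene's implementation.
class DateRangeBits {

    /**
     * Sets one bit per document whose date term lies in [from, to].
     * postings maps each date term to the ids of the documents containing it.
     */
    static BitSet bitsFor(NavigableMap<String, int[]> postings,
                          String from, String to, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        // However many terms fall in the range, the result is still one bit set.
        for (int[] docs : postings.subMap(from, true, to, true).values()) {
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }

    /** Keeps only the query hits that the filter allows. */
    static BitSet restrict(BitSet queryHits, BitSet filterBits) {
        BitSet result = (BitSet) queryHits.clone();
        result.and(filterBits);
        return result;
    }
}
```

Memory use in this model is bounded by the number of documents rather than by the number of distinct date terms in the range, which is consistent with sv's experience of using DateFilter without memory problems.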