Re: BooleanQuery - Too Many Clases on date range.

2004-10-04 Thread Morus Walter
Chris Fraschetti writes: So I decided to move my epoch date to the 20040608 date which fixed my boolean query problem in regards to my current data size (approx 600,000) but now as soon as I do a query like ... a* I get the boolean error again. Google obviously can handle this
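As a rough illustration of why the coarser date helped: a range query is expanded into one clause per distinct indexed term in the range, so a second-resolution epoch value yields far more clauses than a day-resolution yyyyMMdd string. The sketch below is only an assumption about how the conversion might look; the class and method names are hypothetical, the UTC time zone is an arbitrary choice, and no Lucene classes are involved.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Set;
import java.util.TimeZone;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: maps epoch seconds to a
// day-resolution term such as "20040608".
final class DateTermGranularity {

    // Formats an epoch timestamp (seconds) as yyyyMMdd in UTC (assumed zone).
    static String toDayTerm(long epochSeconds) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(epochSeconds * 1000L));
    }

    // Counts how many distinct day terms a set of timestamps collapses to.
    // Each distinct term in a range would become one clause when the
    // range query is expanded.
    static int distinctDayTerms(long[] epochSeconds) {
        Set<String> terms = new TreeSet<>();
        for (long t : epochSeconds) {
            terms.add(toDayTerm(t));
        }
        return terms.size();
    }
}
```

With day-resolution terms, a range spanning a year expands to at most 366 clauses no matter how many documents fall inside it. This does nothing for a bare prefix query such as a*, which expands over every term starting with that prefix.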

Re: BooleanQuery - Too Many Clases on date range.

2004-10-04 Thread Chris Fraschetti
Surely some folks out there have used lucene on a large scale and have had to compensate for this somehow, any other solutions? Morus, thank you very much for your input, and I am looking into your solution, just putting my feelers out there once more. The lucene API is very limited as to it's

different analyzer all produce the same index?

2004-10-04 Thread Daan Hoogland
Hi all, I try to create different indices using different Analyzer-classes. I tried standard, german, russian, and cjk. They all produce exactly the same index file (md5-wise). There are over 280 pages so I expected at least some differences. Any ideas anyone? -- The information contained

Re: BooleanQuery - Too Many Clases on date range.

2004-10-04 Thread Otis Gospodnetic
There are some articles about Lucene. You can find the links on Lucene's Wiki. Lucene in Action is almost done: http://www.manning.com/catalog/view.php?book=hatcher2 I don't think you can pre-order it from the publisher, but you can probably pre-order it from Amazon. I don't know of any other

Re: different analyzer all produce the same index?

2004-10-04 Thread sergiu gordea
Daan Hoogland wrote: Hi all, I try to create different indices using different Analyzer-classes. I tried standard, german, russian, and cjk. They all produce exactly the same index file (md5-wise). There are over 280 pages so I expected at least some differences. Take a look in the lucene

Re: Prevent Lucene from returning short length text...

2004-10-04 Thread Otis Gospodnetic
Kevin, You could try setting index-time field length-dependent boosts. Another possibility may be your own sorting, that takes field length in consideration, but I'm not sure how well that would work. Finally, you could use your own Similarity and implement your own

Re: Prevent Lucene from returning short length text...

2004-10-04 Thread John Moylan
...or just set a lower boost on fields with less than $x amount of characters while indexing. John Otis Gospodnetic wrote: Kevin, You could try setting index-time field length-dependent boosts. Another possibility may be your own sorting, that takes field length in consideration, but I'm not
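The length-dependent boost suggested above could be computed along these lines. This is a minimal sketch under stated assumptions: the threshold (standing in for $x), the boost values, and the class name are all hypothetical, and the code only computes the factor without touching any Lucene API.

```java
// Hypothetical helper: picks an index-time boost from the text length.
final class LengthBoost {

    // Assumed threshold standing in for "$x amount of characters".
    static final int MIN_CHARS = 200;

    // Assumed boost for short texts; 1.0f is treated as the neutral boost.
    static final float SHORT_BOOST = 0.5f;

    // Returns a reduced boost for texts shorter than MIN_CHARS.
    static float boostFor(String text) {
        return text.length() < MIN_CHARS ? SHORT_BOOST : 1.0f;
    }
}
```

The returned value would then be applied to the field (or document) at indexing time, before the document is added to the index.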

Re: different analyzer all produce the same index?

2004-10-04 Thread Morus Walter
sergiu gordea writes: Daan Hoogland wrote: Hi all, I try to create different indices using different Analyzer-classes. I tried standard, german, russian, and cjk. They all produce exactly the same index file (md5-wise). There are over 280 pages so I expected at least some differences.

Re: accessing Term Vector info

2004-10-04 Thread Grant Ingersoll
See IndexReader#getTermFreqVector() in the javadocs [EMAIL PROTECTED] 10/4/2004 10:29:30 AM hi all i am indexing documents consisting of fields for a database id, and text the text field is created as new Field(FULL_TEXT,text, false,true, true, true) in order to store the Term Vector

RE: multiple threads

2004-10-04 Thread Aviran
You should not have more than one IndexWriter. (You can have multiple IndexReaders, but only one IndexWriter). Aviran -Original Message- From: Justin Swanhart [mailto:[EMAIL PROTECTED] Sent: Friday, October 01, 2004 19:14 PM To: [EMAIL PROTECTED] Subject: multiple threads As I

Re: BooleanQuery - Too Many Clases on date range.

2004-10-04 Thread Stephane James Vaucher
BTW, what's wrong with the DateFilter solution I mentioned earlier? I've used it before (before lucene-1.4 though) without memory problems, thus I always assumed that it avoided the allocation problems with prefix queries. sv On Mon, 4 Oct 2004, Chris Fraschetti wrote: Surely some folks out

Re: BooleanQuery - Too Many Clases on date range.

2004-10-04 Thread Chris Fraschetti
The date portion of my code works great now.. no problems there, so let me thank you now for your date filter solution... but my current problem is in regards to a stand alone a* query giving me the too many clauses exception On Mon, 4 Oct 2004 12:47:24 -0400 (EDT), Stephane James

Re: BooleanQuery - Too Many Clases on date range.

2004-10-04 Thread Stephane James Vaucher
Ok, got it, got a small comment though. For large wildcard queries, please note that google does not support wild cards. Search hell*, and there will be no correct matches with hello. Is there a reason why you wish to allow such large queries? We might be able to find alternative ways of helping

Re: BooleanQuery - Too Many Clases on date range.

2004-10-04 Thread Chris Fraschetti
Absolutely, limiting the user's query is no problem here. I've currently implemented the lucene javascript to catch a lot of user queries that could cause issues.. blank queries, ? or * at the beginning of query, etc etc... but I couldn't think of a way to prevent the user from doing a* but not

Re: BooleanQuery - Too Many Clases on date range.

2004-10-04 Thread Stephane James Vaucher
I've used the simple message that the user's request was too vague and that he should modify it. I haven't had too many complaints about this especially when I explained why to a client: If one user of many does a*, the whole system will grind to a halt as that one request will use up all of the
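One way to produce the "too vague" message described above is to check, before the query is parsed, how many literal characters precede the first wildcard in each term. The following is only a sketch under assumptions: the minimum prefix length of 3, the class name, and the method names are hypothetical, and the check works on the raw query string rather than on any Lucene query object.

```java
// Hypothetical pre-parse check: rejects terms whose wildcard comes too early.
final class WildcardGuard {

    // Assumed minimum number of literal characters before a wildcard.
    static final int MIN_PREFIX_LENGTH = 3;

    // Returns true if any whitespace-separated term has '*' or '?'
    // before MIN_PREFIX_LENGTH literal characters (e.g. "a*", "*foo").
    static boolean isTooVague(String query) {
        for (String token : query.trim().split("\\s+")) {
            // Ignore a leading field name such as "title:".
            String term = token.substring(token.lastIndexOf(':') + 1);
            int wildcardAt = firstWildcard(term);
            if (wildcardAt >= 0 && wildcardAt < MIN_PREFIX_LENGTH) {
                return true;
            }
        }
        return false;
    }

    // Index of the first '*' or '?' in the term, or -1 if there is none.
    private static int firstWildcard(String term) {
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '*' || c == '?') {
                return i;
            }
        }
        return -1;
    }
}
```

A caller would show the "please narrow your request" message whenever isTooVague returns true and only pass the query on otherwise. A fixed prefix length is a blunt heuristic: a three-letter prefix can still match many terms in a large index, so catching the too-many-clauses exception remains a sensible fallback.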

Re: BooleanQuery - Too Many Clases on date range.

2004-10-04 Thread Sergiu Gordea
Chris Fraschetti wrote: Absolutely, limiting the user's query is no problem here. I've currently implemented the lucene javascript to catch a lot of user queries that could cause issues.. blank queries, ? or * at the beginning of query, etc etc... but I couldn't think of a way to prevent the user

RE: Question regarding using Lucene or not

2004-10-04 Thread AmitShukla
Thanks Daniel Can you tell me two more things. 1. How difficult it is to implement our own Similarity class that can do the things we want ? 2. If there are more than one field that are percentage match like HP, can we also specify which field gets the preference while search. For example, in the

Re: Question regarding using Lucene or not

2004-10-04 Thread Daniel Naber
On Monday 04 October 2004 22:22, you wrote: 1. How difficult it is to implement our own Similarity class that can do the things we want ? It should be very easy. The API is described here: http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html I think in your case

Re: Orphan segment files

2004-10-04 Thread Edwin Tang
Dmitry, Thanks for the help and pointers thus far. I know (or believe at the least) that the files are not referenced by opening segments and deletable with a hex editor. I've explored the possibility of an exception that is not recorded to a log file or written out to screen, so have double

RE: Orphan segment files

2004-10-04 Thread Armbrust, Daniel C.
There was a broken version of Lucene in there - (I think the 1.4 release?) which was not cleaning up old files after you did an optimize in certain cases. For me, upgrading to 1.4.1, and re-optimizing automatically cleaned up the index. You may have to add and remove a dummy document first,

Re: WildCardQuery

2004-10-04 Thread Stephane James Vaucher
On Fri, 1 Oct 2004, Robinson Raju wrote: analyzer is StandardAnalyzer. i use MultiFieldQueryParser to parse. The flow is this: I have indexed a Database view. Now i need to search against a few columns i take in the search criteria and search field , construct a wildcard query and add it

Re: MultiFieldQueryParser seems broken... Fix attached.

2004-10-04 Thread Bill Janssen
Doug Cutting writes: http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]msgId=1798116 Yes, the approach there is similar. I attempted to complete the solution and provide a working replacement for MultiFieldQueryParser. But, inspired by that message, couldn't MultiFieldQueryParser