Re: best way of reusing IndexSearcher objects

2003-12-19 Thread Morus Walter
Doug Cutting writes: Dror Matalon wrote: There are two issues: 1. Having new searches start using the new index only when it's ready, not in a half-baked state, which means that you have to synchronize the switch from the old index to the new one. That's true. If you're doing updates
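
One way to picture the "synchronize the switch" point above is a small holder that publishes a new searcher only after it is fully built. This is a minimal sketch, not code from the thread: SearcherHolder and its methods are hypothetical names, and the type parameter S merely stands in for whatever searcher object is shared (for example Lucene's IndexSearcher).

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper, not part of Lucene. S stands in for the shared
// searcher type (an assumption for illustration).
final class SearcherHolder<S> {
    private final AtomicReference<S> current;

    SearcherHolder(S initial) {
        this.current = new AtomicReference<>(initial);
    }

    // Every search obtains its searcher here, so it sees either the old
    // searcher or the new one, never a partially prepared one.
    S get() {
        return current.get();
    }

    // Call only after the new index is complete and its searcher is open.
    // Returns the previous searcher so the caller can close it once
    // searches still using it have finished.
    S swap(S fresh) {
        return current.getAndSet(fresh);
    }
}
```

The swap itself is a single atomic reference update. Deciding when the old searcher is safe to close is a separate problem that this sketch leaves to the caller.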

Benchmark (WAS: Indexing Speed: Documents vs. Sentences)

2003-12-19 Thread Jochen Frey
Hello, Here is a benchmark. I am not sure if that is proper etiquette, but I will just paste it into this mail and hope that it gets funneled into the right channels. Cheers! Jochen Benchmark. Hardware Environment: Dedicated machine for

FW: Indexing Speed: Documents vs. Sentences

2003-12-19 Thread Jochen Frey
Stephane, The indexing is actually less glamorous than it sounds. When you index 1TB across 10 machines you end up with 100GB on each machine. We do not merge the indexes either, since we get better speed on indexing as well as querying when we keep indexes smaller and distributed

DoubleMetaphoneQuery

2003-12-19 Thread David Spencer
I've seen discussions about using the double metaphone algorithm with Lucene (basically: like soundex, used to find words that sound similar, in English at least) but couldn't find an implementation, so I spent a few minutes and wrote a Query and TermEnum object for this. I may have missed the

Re: syntax of queries.

2003-12-19 Thread Ernesto De Santis
Erik, Thanks! The article is very good. I have new questions: - apiQuery.add(new TermQuery(new Term("contents", "dot")), false, true); new Term("contents", "dot") Does the Term class work for only one word? Is this right? new Term("contents", "dot java") to search for dot OR java in contents. My

Sentence Endings: IndexWriter.maxFieldLength and Token.setPositionIncrement()

2003-12-19 Thread Jochen Frey
Hi! I hope this is the right forum for this post. I was wondering if other people would consider this a bug (it might be a feature and I am missing the point of it): The default IndexWriter.maxFieldLength is 10,000. The point of maxFieldLength is to limit memory usage. The current position

Re: DoubleMetaphoneQuery

2003-12-19 Thread Erik Hatcher
Interestingly, I used a MetaphoneAnalyzer as an example in our book in progress. I'm curious if you have measured performance with doing it at analysis time versus query time. Enumerating all terms at query time is basically the same as doing a WildcardQuery or FuzzyQuery and involves a
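
The analysis-time versus query-time trade-off mentioned above can be sketched without Lucene. This is only an illustration under stated assumptions: PhoneticLookup is a hypothetical class, and key() is a toy stand-in encoder (keep the first letter, drop later vowels and repeated letters), not double metaphone or Metaphone.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration; none of this is Lucene API.
final class PhoneticLookup {
    // Toy phonetic key (an assumption, NOT double metaphone): uppercase,
    // keep the first letter, drop later vowels and adjacent repeats.
    static String key(String word) {
        String w = word.toUpperCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < w.length(); i++) {
            char c = w.charAt(i);
            if (i > 0 && "AEIOUY".indexOf(c) >= 0) continue;
            if (sb.length() > 0 && sb.charAt(sb.length() - 1) == c) continue;
            sb.append(c);
        }
        return sb.toString();
    }

    // Analysis-time approach: encode each term once while indexing, so a
    // query becomes a single map lookup on the encoded key.
    static Map<String, List<String>> buildIndex(Collection<String> terms) {
        Map<String, List<String>> index = new HashMap<>();
        for (String t : terms) {
            index.computeIfAbsent(key(t), k -> new ArrayList<>()).add(t);
        }
        return index;
    }

    // Query-time approach: enumerate every term and encode it per query,
    // so the cost grows with the number of distinct terms.
    static List<String> scan(Collection<String> terms, String query) {
        String wanted = key(query);
        List<String> hits = new ArrayList<>();
        for (String t : terms) {
            if (key(t).equals(wanted)) hits.add(t);
        }
        return hits;
    }
}
```

Both approaches return the same matches. The difference is where the encoding work is paid: once per term at indexing time, or once per term on every query.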

Re: syntax of queries.

2003-12-19 Thread Erik Hatcher
On Friday, December 19, 2003, at 05:42 PM, Ernesto De Santis wrote: I have new questions: - apiQuery.add(new TermQuery(new Term("contents", "dot")), false, true); new Term("contents", "dot") Does the Term class work for only one word? Careful with terminology here. It works for only one term. What is

Re: Sentence Endings: IndexWriter.maxFieldLength and Token.setPositionIncrement()

2003-12-19 Thread Doug Cutting
Jochen, Someone else recently made a similar, reasonable complaint. I agree that this should be fixed. The fastest way to get it fixed would be to submit a patch to lucene-dev, with a test case, etc. Doug Jochen Frey wrote: Hi! I hope this is the right forum for this post. I was wondering

Lucene and JavaHelp

2003-12-19 Thread Mark R. Diggory
Has anyone thought about or used Lucene to build an indexed, searchable help system? Either server- or application-based? -M. -- Mark Diggory Software Developer Harvard MIT Data Center http://osprey.hmdc.harvard.edu