Lucene index

2010-12-29 Thread King JKing
Dear all, I use Lucene to index document content: 6 field (int) and 1 file (string). I log the indexing process. The log says: INFO [CONTENT-FILTER INDEX-TIMER] 2010-12-29 15:45:52,707 Index 55 item in 10576 miliseconds INFO [CONTENT-FILTER INDEX-TIMER] 2010-12-29 15:46:13,378 Index 19 item in 206

Re: Lucene index

2010-12-29 Thread Anshum
Lucene intermittently takes longer because (1) it flushes the buffered docs from memory to disk and (2) it merges smaller index segments into a larger segment at regular intervals, as per the IndexWriter settings. You may have a look at the various IndexWriter params in the javadoc on the lucene pa
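
To make the flush-and-merge explanation concrete, here is a toy model in Python. It is not Lucene code: the function `simulate` and its parameters `max_buffered_docs` and `merge_factor` are names assumed for this sketch, loosely echoing the kind of IndexWriter settings mentioned above. It only shows why most add calls are cheap while a few trigger a flush or a cascade of merges.

```python
def simulate(num_docs, max_buffered_docs=10, merge_factor=10):
    """Toy model: buffer docs in memory, flush them as a segment,
    and merge whenever merge_factor equally sized segments pile up."""
    segments = []   # sizes (in docs) of the segments "on disk"
    buffered = 0    # docs currently held in the in-memory buffer
    events = []     # (doc_id, kind, docs_written) for every expensive step

    for doc_id in range(1, num_docs + 1):
        buffered += 1
        if buffered < max_buffered_docs:
            continue  # cheap path: the doc only lands in the buffer

        # Expensive path 1: flush the buffer as a new small segment.
        segments.append(buffered)
        events.append((doc_id, "flush", buffered))
        buffered = 0

        # Expensive path 2: merge while the newest merge_factor
        # segments all have the same size (this can cascade).
        while (len(segments) >= merge_factor
               and len(set(segments[-merge_factor:])) == 1):
            merged = sum(segments[-merge_factor:])
            del segments[-merge_factor:]
            segments.append(merged)
            events.append((doc_id, "merge", merged))

    return events, segments


events, segments = simulate(1000)
for doc_id, kind, size in events:
    if kind == "merge":
        print(f"doc {doc_id}: merge rewrote {size} docs")
```

In this model the add call for every tenth document pays for a flush, and every hundredth also pays for a merge, which is one plausible reason why indexing a batch of similar size can take very different amounts of time.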

Re: Lucene index

2010-12-29 Thread Ian Lea
You should also make sure that it is Lucene that is taking the time. You don't say where your data is coming from, but it is often slower to read the source data than to index it with Lucene. See also http://wiki.apache.org/lucene-java/ImproveIndexingSpeed -- Ian. On Wed, Dec 29, 201
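
One way to check where the time goes is to time the read step and the index step separately. The sketch below is in Python and uses only the standard library; `read_item` and `index_item` are hypothetical stand-ins for whatever fetches a record from the data source and whatever hands it to the indexer, so only the timing pattern itself is the point.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(float)  # phase name -> accumulated seconds


@contextmanager
def timed(phase):
    """Add the wall-clock time spent inside the block to timings[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase] += time.perf_counter() - start


def read_item(i):
    # Hypothetical stand-in for fetching one record from the data source.
    return {"id": i, "body": "text %d" % i}


def index_item(item, index):
    # Hypothetical stand-in for the call that hands the record to the indexer.
    index[item["id"]] = item["body"]


def run(n):
    index = {}
    for i in range(n):
        with timed("read"):
            item = read_item(i)
        with timed("index"):
            index_item(item, index)
    return index


index = run(55)
for phase, seconds in sorted(timings.items()):
    print(f"{phase}: {seconds * 1000:.3f} ms")
```

If the "read" total dominates, tuning the indexer will not help much; if "index" dominates, the writer settings are the place to look.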

Re: relevant score calculation

2010-12-29 Thread Ian Lea
Some of the factors that go into the score calculation are encoded as a byte, with inevitable loss of precision. Maybe length is one of these and Lucene is not differentiating between your 3 and 4 word docs. Try indexing a document that is significantly longer than 3 or 4 words. Further reading:
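
The precision loss can be illustrated with a small Python sketch. It assumes a length norm of 1/sqrt(number of terms) and a one-byte float encoding modeled from memory on the scheme Lucene 3.x used for norms; the functions `float_to_byte`, `byte_to_float`, and `stored_norm` are names made up for this example, so treat it as an approximation rather than Lucene's actual code.

```python
import math
import struct

F_ZERO = (63 - 15) << 3  # offset that maps the supported exponent range to 0..255


def float_to_byte(f):
    """Squeeze a positive float into one byte by keeping only the high bits
    of its 32-bit representation (everything below bit 21 is dropped)."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> 21
    if small <= F_ZERO:
        return 0 if bits <= 0 else 1  # zero/negative -> 0, underflow -> smallest
    if small >= F_ZERO + 0x100:
        return 255                    # overflow -> largest representable value
    return small - F_ZERO


def byte_to_float(b):
    """Inverse of float_to_byte; the dropped low bits come back as zeros."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def stored_norm(num_terms):
    """Assumed length norm 1/sqrt(num_terms), after a byte round trip."""
    return byte_to_float(float_to_byte(1.0 / math.sqrt(num_terms)))


for n in range(1, 11):
    print(n, round(1.0 / math.sqrt(n), 4), stored_norm(n))
```

Under these assumptions a 3-term and a 4-term field round-trip to the same stored value, so length alone cannot separate them, and several other lengths between 1 and 10 collapse in the same way.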

Can lucene index survives a machine crash during the merge or optimize operation?

2010-12-29 Thread Jiang mingyuan
Can a Lucene index survive a machine crash during a merge or optimize operation? Or can I stop the running indexing program during the merge or optimize period? I saw an introduction to this a few months ago, but now that I need it, I can't remember where it was. Can you help me? Thanks very

Re: Can lucene index survives a machine crash during the merge or optimize operation?

2010-12-29 Thread Anshum
It pretty much can. Generally, those operations happen on a copy of the index and hence are pretty much atomic. That is the reason why 2X the size of the index is required only for the optimize operation. You should get (buy) a copy of Lucene in Action, 2nd ed., from Manning for a lot of such info.
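
The "work on a copy, then switch over" idea can be sketched in Python as follows. This is a generic write-then-rename pattern, not Lucene's own on-disk format or commit protocol; the function `publish_atomically` and the file name used in the example are assumptions made for illustration.

```python
import os
import tempfile


def publish_atomically(directory, name, data):
    """Write data to a temporary file in the same directory, then rename it
    over the target. A crash before the rename leaves the old file untouched."""
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk first
        # The rename is the single step that makes the new version visible.
        os.replace(tmp_path, os.path.join(directory, name))
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # discard the half-written copy
        raise


with tempfile.TemporaryDirectory() as d:
    publish_atomically(d, "index.dat", b"old index")
    publish_atomically(d, "index.dat", b"merged index")
    with open(os.path.join(d, "index.dat"), "rb") as f:
        print(f.read())
```

While the new copy is being written, the old and new versions exist side by side, which is consistent with the extra disk space mentioned above.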

Re: Can lucene index survives a machine crash during the merge or optimize operation?

2010-12-29 Thread Jiang mingyuan
Thanks for your patient answer and advice. I will pick up my Lucene in Action 2nd ed. soon. On Wed, Dec 29, 2010 at 8:12 PM, Anshum wrote: > It pretty much can. > Generally, those operations happen on a copy of the index and hence are > pretty much atomic. That is the reason why 2X the size of th

Re: relevant score calculation

2010-12-29 Thread Ahmet Arslan
> Test case > doc1 : test -- one two > three > doc2 : test, one two three > doc3 : one two three > > Search query : "one two three" by QueryParser and > StandardAnalyzer > > Question: why all of three documents have the same > score? As Ian said, length norm values of your a

Re: relevant score calculation

2010-12-29 Thread Qi Li
Ahmet and Ian: Thanks to both of you very much. I will try the patch. Qi On Wed, Dec 29, 2010 at 9:00 AM, Ahmet Arslan wrote: > > Test case > > doc1 : test -- one two > > three > > doc2 : test, one two three > > doc3 : one two three > > > > Search query : "one two three" by

Re: Using Lucene to search live, being-edited documents

2010-12-29 Thread adam . saltiel
This is interesting. What are we driving at here? A single user? That doesn't make sense to me unless you want to flag certain things as they construct the document. Or else why don't they know what is in their own document? There must be other ways apart from Lucene. It seems to me you want each l

Re: Using Lucene to search live, being-edited documents

2010-12-29 Thread software visualization
I am writing a text editor and have to provide a certain search functionality. The use case is for a single user. A single document is potentially very large, and numerous such documents may be open and unflushed at any given time. Think many files of an IDE, except the files are larger. The user

Re: Using Lucene to search live, being-edited documents

2010-12-29 Thread adam . saltiel
What has this to do with Lucene? You're thinking its index would be faster than your own search algorithm. Would it though? Do you really need an index or a good pattern matcher? I can't see what the stream buffer being flushed by the user has to do with it. Don't you have to control that behavi

Re: relevant score calculation

2010-12-29 Thread Qi Li
I tried to override the default lengthNorm method following the suggestion in this link: https://issues.apache.org/jira/browse/LUCENE-2187. But it will not work, because not every number of terms from 1 to 10 has a unique score. Here is my solution, which only works for shorter fields. I welcome any crit

Re: Using Lucene to search live, being-edited documents

2010-12-29 Thread Lance Norskog
Check out the Instantiated contrib for Lucene. This is an alternative in-memory data structure that does not need commits and is faster (and larger) than the Lucene Directory system. On Wed, Dec 29, 2010 at 9:15 AM, wrote: > What has this to do with Lucene? You're thinking its index would be fas