Dear all,
I use Lucene to index document content: 6 fields (int) and 1 file (string).
I log the indexing process. The log says:
INFO [CONTENT-FILTER INDEX-TIMER] 2010-12-29 15:45:52,707 Index 55 item in
10576 miliseconds
INFO [CONTENT-FILTER INDEX-TIMER] 2010-12-29 15:46:13,378 Index 19 item in
206
Lucene intermittently takes longer because:
1. It flushes the buffered docs from memory to the disk and
2. It merges the smaller index segments to form a larger segment on regular
intervals as per the index writer settings.
You may have a look at various IndexWriter params in the javadoc on the
lucene pa
You should also make sure that it is Lucene that is taking the time.
You don't say where your data is coming from, but it is often slower to
read the source data than to index it with Lucene.
See also http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
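One way to check where the time goes is to time the read and the index step separately. The following is only a minimal sketch in plain Java: the class IndexTimingSketch and its methods are hypothetical names, and the reader and indexer are stand-ins for whatever fetches the source data and whatever wraps the real indexing call (for example IndexWriter.addDocument).

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper, not part of Lucene: separates the time spent fetching
// source data from the time spent handing it to the indexer.
class IndexTimingSketch {
    long readNanos;   // total time spent in the reader
    long indexNanos;  // total time spent in the indexer
    int items;        // number of items processed

    <T> void process(Supplier<T> reader, Consumer<T> indexer) {
        long t0 = System.nanoTime();
        T item = reader.get();      // e.g. a database or file read
        long t1 = System.nanoTime();
        indexer.accept(item);       // e.g. a wrapper around IndexWriter.addDocument
        long t2 = System.nanoTime();
        readNanos += t1 - t0;
        indexNanos += t2 - t1;
        items++;
    }

    String summary() {
        return String.format("Index %d item(s): read %d ms, index %d ms",
                items, readNanos / 1_000_000, indexNanos / 1_000_000);
    }

    public static void main(String[] args) {
        Iterator<String> source = Arrays.asList("doc one", "doc two").iterator();
        IndexTimingSketch timer = new IndexTimingSketch();
        while (source.hasNext()) {
            // Stand-ins: replace with the real data source and indexing call.
            timer.process(source::next, text -> { /* index text here */ });
        }
        System.out.println(timer.summary());
    }
}
```

If the read total dominates, tuning Lucene itself is unlikely to help much.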
--
Ian.
On Wed, Dec 29, 201
Some of the factors that go in to the score calculation are encoded as
a byte with inevitable loss of precision. Maybe length is one of
these and lucene is not differentiating between your 3 and 4 word
docs. Try indexing a document that is significantly longer than 3 or
4 words.
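To illustrate the precision loss, here is a rough sketch in plain Java. It assumes the default length norm is 1/sqrt(number of terms) and that the norm is stored with the one-byte float encoding used in Lucene 3.x. The encode and decode methods are re-created from memory and are not the library's own code; the class and method names are made up for this example.

```java
// Sketch only: a plain-Java re-creation of the one-byte norm encoding,
// written from memory of Lucene 3.x; it does not call Lucene itself.
class NormPrecisionDemo {

    // Assumed default length norm: 1 / sqrt(number of terms in the field).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Pack a float into one byte by keeping only its highest bits.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;      // drop the low 21 mantissa bits
        int zero = (63 - 15) << 3;   // offset of the smallest exponent
        if (small <= zero) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;  // zero/negative, or underflow
        }
        if (small >= zero + 0x100) {
            return -1;               // overflow: clamp to the largest value
        }
        return (byte) (small - zero);
    }

    // Expand the byte back into a float (the value used at search time).
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << 21;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    static float storedNorm(int numTerms) {
        return decode(encode(lengthNorm(numTerms)));
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 10; n++) {
            System.out.printf("%2d terms: exact %.4f, stored %.4f%n",
                    n, lengthNorm(n), storedNorm(n));
        }
    }
}
```

Under these assumptions, fields of 3 and 4 terms both end up with a stored norm of 0.5, so length alone cannot separate them.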
Further reading:
Can a Lucene index survive a machine crash during a merge or optimize
operation?
Or can I stop the running indexing program during the merge or optimize period?
I saw an introduction to this a few months ago, but now that I need it,
I can't remember where it was.
Can you help me?
Thanks very
It pretty much can.
Generally, those operations happen on a copy of the index and hence are
pretty much atomic. That is the reason why 2X the size of the index is
required only for the optimize operation.
You should get (buy) a copy of Lucene in Action, 2nd ed., from Manning for a
lot of such info.
Thanks for your patient answer and advice.
I will pick up my copy of Lucene in Action, 2nd ed., soon.
On Wed, Dec 29, 2010 at 8:12 PM, Anshum wrote:
> It pretty much can.
> Generally, those operations happen on a copy of the index and hence are
> pretty much atomic. That is the reason why 2X the size of th
> Test case
> doc1 : test -- one two
> three
> doc2 : test, one two three
> doc3 : one two three
>
> Search query : "one two three" by QueryParser and
> StandardAnalyzer
>
> Question: why all of three documents have the same
> score?
As Ian said, length norm values of your a
Ahmet and Ian:
Thanks to both of you very much. I will try the patch.
Qi
On Wed, Dec 29, 2010 at 9:00 AM, Ahmet Arslan wrote:
> > Test case
> > doc1 : test -- one two
> > three
> > doc2 : test, one two three
> > doc3 : one two three
> >
> > Search query : "one two three" by
This is interesting. What are we driving at here? A single user? That doesn't
make sense to me unless you want to flag certain things as they construct the
document. Or else why don't they know what is in their own document? There must
be other ways apart from Lucene. It seems to me you want each l
I am writing a text editor and have to provide certain search
functionality.
The use case is for a single user. A single document is potentially very
large and numerous such documents may be open and unflushed at any given
time. Think many files of an IDE, except the files are larger. The user
What has this to do with Lucene? You're thinking its index would be faster than
your own search algorithm. Would it though? Do you really need an index or a
good pattern matcher? I can't see what the stream buffer being flushed by the
user has to do with it? Don't you have to control that behavi
I tried to override the default lengthNorm method with the suggestion in
this link
https://issues.apache.org/jira/browse/LUCENE-2187.
But it will not work because not every number of terms from 1 to 10 has a
unique score.
Here is my solution, which only works for shorter fields. Welcome any
crit
Check out the Instantiated contrib for Lucene. This is an alternative
in-memory data structure that does not need commits and is faster (and
larger) than the Lucene Directory system.
On Wed, Dec 29, 2010 at 9:15 AM, wrote:
> What has this to do with Lucene? You're thinking its index would be fas