Hello.
Using DirectSpellChecker is not serving my purpose. It seems to return
word suggestions from a dictionary, whereas I wish to return search
suggestions from indexes I created by supplying my own files (these files
are generally log files).
I created indexes for certain files in
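One way to get suggestions from your own indexed content, rather than from a fixed dictionary, is to draw candidates from the index's own term dictionary by prefix. Below is a minimal plain-Java sketch of that idea; the class name and sample terms are made up for illustration, and in real Lucene code you would walk the index's TermsEnum rather than a TreeSet:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: suggestions drawn from terms you indexed yourself.
// A sorted set stands in for the term dictionary of a Lucene index.
public class IndexTermSuggester {
    private final TreeSet<String> terms = new TreeSet<>();

    public void addTerm(String term) {
        terms.add(term.toLowerCase());
    }

    // Return up to maxResults indexed terms that start with the given prefix.
    public List<String> suggest(String prefix, int maxResults) {
        String p = prefix.toLowerCase();
        List<String> out = new ArrayList<>();
        // tailSet yields all terms >= prefix in sorted order;
        // stop at the first term that no longer matches the prefix.
        for (String t : terms.tailSet(p)) {
            if (!t.startsWith(p) || out.size() >= maxResults) {
                break;
            }
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        IndexTermSuggester s = new IndexTermSuggester();
        s.addTerm("exception");
        s.addTerm("exceeded");
        s.addTerm("executor");
        s.addTerm("timeout");
        System.out.println(s.suggest("exc", 10)); // [exceeded, exception]
    }
}
```

Because the candidates come from terms that actually occur in your log files, every suggestion is guaranteed to match at least one indexed document, which a general-purpose dictionary cannot promise.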
I think that's ~ 110 billion, not trillion, tokens :)
Are you certain you don't have any term vectors?
Even if your index has no term vectors, CheckIndex goes through all
docIDs trying to load them, but that ought to be very fast, and then
you should see test: doc values... after that.
Mike
Just to close the loop on this, I upgraded to 4.4 and the improvements to the
NGramTokenizer were just what I needed. I switched to using 1-2 grams (the
default), and now that the tokenizer emits the tokens in an order that makes
sense I'm in business. At search time I split on whitespace,
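For reference, a plain-Java sketch of what a tokenizer configured with minGram=1, maxGram=2 emits when it walks the input left to right, which is the ordering described above. This is an illustration of the gramming, not the Lucene NGramTokenizer class itself:

```java
import java.util.ArrayList;
import java.util.List;

// Emit all n-grams of length minGram..maxGram, grouped by start position,
// walking the input left to right.
public class OneTwoGrams {
    public static List<String> grams(String text, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                out.add(text.substring(start, start + len));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // At search time, split the query on whitespace first,
        // then gram each token separately.
        for (String token : "log error".split("\\s+")) {
            System.out.println(token + " -> " + grams(token, 1, 2));
        }
        // log -> [l, lo, o, og, g]
        // error -> [e, er, r, rr, r, ro, o, or, r]
    }
}
```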
Hi, I have a stored and tokenized field, and I want to cache all the field
values.
I have one document in the index, with the field.value = hello world
and with tokens = hello, world.
I try to extract the field's content:
String [] cachedFields =
Hi,
On Tue, Jul 30, 2013 at 4:09 PM, andi rexha a_re...@hotmail.com wrote:
Hi, I have a stored and tokenized field, and I want to cache all the field
values.
I have one document in the index, with the field.value = hello world
and with tokens = hello, world.
I try to extract the
Hi Adrien,
Thank you very much. I will have a look at your suggestion ;)
From: jpou...@gmail.com
Date: Tue, 30 Jul 2013 16:16:03 +0200
Subject: Re: Cache Field Lucene 3.6.0
To: java-user@lucene.apache.org
Hi,
On Tue, Jul 30, 2013 at 4:09 PM, andi rexha a_re...@hotmail.com wrote:
Hi,
Thanks Mike,
Billion, not trillion. Doh!
Wasn't thinking it through when I titled the e-mail. The total number of
tokens shouldn't be unusual compared to our other indexes, since whether we
index pages or whole docs, the number of tokens shouldn't change
significantly. The main difference
On Tue, Jul 30, 2013 at 8:41 AM, Michael McCandless
luc...@mikemccandless.com wrote:
I think that's ~ 110 billion, not trillion, tokens :)
Are you certain you don't have any term vectors?
Even if your index has no term vectors, CheckIndex goes through all
docIDs trying to load them, but
Hi,
On Tue, Jul 30, 2013 at 5:34 PM, Robert Muir rcm...@gmail.com wrote:
I'm not sure if there is a similar one for vectors.
There is; it has been done for stored fields and term vectors at the
same time [1].
[1] https://issues.apache.org/jira/browse/LUCENE-4928
--
Adrien
Thanks Mike, Robert and Adrien,
Unfortunately, I killed the processes, so it's too late to get a stack
trace. One thing that was suspicious was that top was reporting memory use
as 20GB res even though I invoked the JVM with java -Xmx10g -Xms10g.
I'm going to double the memory, turn on GC
You should also upgrade your Java!
1.6.0_16 is really ancient and has exciting bugs ...
Mike McCandless
http://blog.mikemccandless.com
On Tue, Jul 30, 2013 at 1:06 PM, Tom Burton-West tburt...@umich.edu wrote:
Thanks Mike, Robert and Adrien,
Unfortunately, I killed the processes, so its
Hi,
we are using some of the latest features of lucene for sorting which are
very cool but we are facing some issues with the numerical sort:
We need two kinds of sort: numerical and lexical.
For the lexical we are using SortedDocValuesField and for the numerical we
use NumericDocValuesField.
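The distinction between the two field types matters because lexical and numeric order disagree as soon as values have different digit counts. A small self-contained sketch of the two orderings (plain Java comparators standing in for the two doc-values sort types, not the Lucene sort API itself):

```java
import java.util.Arrays;
import java.util.Comparator;

// Lexical (string) order vs numeric order over the same values.
public class SortOrders {
    // What a string-based sort (SortedDocValuesField) produces.
    public static String[] lexical(String[] values) {
        String[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    // What a numeric sort (NumericDocValuesField) produces.
    public static String[] numeric(String[] values) {
        String[] copy = values.clone();
        Arrays.sort(copy, Comparator.comparingLong(Long::parseLong));
        return copy;
    }

    public static void main(String[] args) {
        String[] values = {"9", "10", "2", "100"};
        System.out.println(Arrays.toString(lexical(values))); // [10, 100, 2, 9]
        System.out.println(Arrays.toString(numeric(values))); // [2, 9, 10, 100]
    }
}
```

So a numeric field sorted lexically puts "10" before "9", which is one symptom to check for when a numeric sort looks wrong.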
Thanks Mike,
Got my sysadmins to upgrade our test machine to 1.7.0_09
Will ask them to upgrade production which is currently 1.6.0_45-b06 on the
indexing machines and 1.6.0_16-b01 on the serving machines.
Tom
On Tue, Jul 30, 2013 at 1:47 PM, Michael McCandless
luc...@mikemccandless.com
Hello,
In Lucene 4.x is there a way to get the number of documents that were deleted
from calling IndexWriter.deleteDocuments(Query)?
Another question, if we call IndexWriter.tryDeleteDocument(Reader, docId)
utilizing a near-real-time reader, what is the appropriate order to close the
reader
Hi Mike,
I did more tests with realistic text from different languages (typical
text for 8 different languages; the English one is the novel Animal Farm).
What I found seems to be:
## Indexing:
3.6 and 4.3 are comparable (your previous comment is very correct).
## Search:
4.3 seems to be slower (~30%),
Hi,
On Tue, Jul 30, 2013 at 8:19 PM, Nicolas Guyot sfni...@gmail.com wrote:
When sorting numerically, the search seems to take a while longer than
the lexically sorted search.
Also, when sorting numerically the result is sorted within each page but not
globally, as opposed to the
Any help on this will be highly appreciated. I have been trying all
possible options but to no avail.
I also tried LuceneDictionary, but this also does not seem to help...
Please guide.
On 7/30/2013 4:49 PM, Ankit Murarka wrote:
Hello.
Using DirectSpellChecker is not serving my