Re: Did you Mean search on Indexes created by Different Files.

2013-07-30 Thread Ankit Murarka
Hello. Using DirectSpellChecker is not serving my purpose. It seems to return word suggestions from a dictionary, whereas I want to return search suggestions from indexes I created from my own files (these are generally log files). I created indexes for certain files in

Re: Lucene 4.3.1 CheckIndex limitation 100 trillion tokens?

2013-07-30 Thread Michael McCandless
I think that's ~ 110 billion, not trillion, tokens :) Are you certain you don't have any term vectors? Even if your index has no term vectors, CheckIndex goes through all docIDs trying to load them, but that ought to be very fast, and then you should see test: doc values... after that. Mike

RE: Partial word match using n-grams

2013-07-30 Thread Becker, Thomas
Just to close the loop on this, I upgraded to 4.4 and the improvements to the NGramTokenizer were just what I needed. I switched to using 1-2 grams (the default), and now that the tokenizer emits the tokens in an order that makes sense I'm in business. At search time I split on whitespace,
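
A minimal sketch of the approach described above, in plain Java with no Lucene dependency. It is not the NGramTokenizer implementation. The emission order (by start offset, then by gram length) is an assumption about what "an order that makes sense" refers to, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Lucene.
class NGramSketch {

    // Emit every gram of length minGram..maxGram, ordered by start
    // offset first and by gram length second (assumed order).
    // Works on UTF-16 code units, which is enough for a sketch.
    static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram;
                 len <= maxGram && start + len <= text.length();
                 len++) {
                grams.add(text.substring(start, start + len));
            }
        }
        return grams;
    }

    // Search-time side: split the query on whitespace, then gram each
    // term separately so grams never span a word boundary.
    static List<List<String>> queryGrams(String query) {
        List<List<String>> perTerm = new ArrayList<>();
        for (String term : query.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                perTerm.add(ngrams(term, 1, 2));
            }
        }
        return perTerm;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("abc", 1, 2));   // [a, ab, b, bc, c]
        System.out.println(queryGrams("foo ba"));  // [[f, fo, o, oo, o], [b, ba, a]]
    }
}
```

How the per-term grams are combined into a query is not shown in the message, so the sketch stops at producing them.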

Cache Field Lucene 3.6.0

2013-07-30 Thread andi rexha
Hi, I have a stored and tokenized field, and I want to cache all the field values. I have one document in the index, with the field.value = hello world and with tokens = hello, world. I try to extract the field's content: String[] cachedFields =

Re: Cache Field Lucene 3.6.0

2013-07-30 Thread Adrien Grand
Hi, On Tue, Jul 30, 2013 at 4:09 PM, andi rexha a_re...@hotmail.com wrote: Hi, I have a stored and tokenized field, and I want to cache all the field values. I have one document in the index, with the field.value = hello world and with tokens = hello, world. I try to extract the

RE: Cache Field Lucene 3.6.0

2013-07-30 Thread andi rexha
Hi Adrien, Thank you very much. I will have a look at your suggestion ;) From: jpou...@gmail.com Date: Tue, 30 Jul 2013 16:16:03 +0200 Subject: Re: Cache Field Lucene 3.6.0 To: java-user@lucene.apache.org Hi, On Tue, Jul 30, 2013 at 4:09 PM, andi rexha a_re...@hotmail.com wrote: Hi,

Re: Lucene 4.3.1 CheckIndex limitation 100 trillion tokens?

2013-07-30 Thread Tom Burton-West
Thanks Mike, Billion not Trillion. Doh! Wasn't thinking it through when I titled the e-mail. The total number of tokens shouldn't be unusual compared to our other indexes since whether we index pages or whole docs, the number of tokens shouldn't change significantly. The main difference

Re: Lucene 4.3.1 CheckIndex limitation 100 trillion tokens?

2013-07-30 Thread Robert Muir
On Tue, Jul 30, 2013 at 8:41 AM, Michael McCandless luc...@mikemccandless.com wrote: I think that's ~ 110 billion, not trillion, tokens :) Are you certain you don't have any term vectors? Even if your index has no term vectors, CheckIndex goes through all docIDs trying to load them, but

Re: Lucene 4.3.1 CheckIndex limitation 100 trillion tokens?

2013-07-30 Thread Adrien Grand
Hi, On Tue, Jul 30, 2013 at 5:34 PM, Robert Muir rcm...@gmail.com wrote: I'm not sure if there is a similar one for vectors. There is, it has been done for stored fields and term vectors at the same time[1]. [1] https://issues.apache.org/jira/browse/LUCENE-4928 -- Adrien

Re: Lucene 4.3.1 CheckIndex limitation 100 trillion tokens?

2013-07-30 Thread Tom Burton-West
Thanks Mike, Robert and Adrien, Unfortunately, I killed the processes, so it's too late to get a stack trace. One thing that was suspicious was that top was reporting memory use as 20GB res even though I invoked the JVM with java -Xmx10g -Xms10g. I'm going to double the memory, turn on GC
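
For context on the numbers above: -Xmx bounds only the Java heap, while the resident size shown by top also counts memory outside the heap, such as thread stacks, direct buffers, and resident pages of memory-mapped files. Whether that accounts for the 20GB seen here is not established in the thread. The sketch below only prints the JVM's own view of its heap for comparison against top; the class name is hypothetical.

```java
// Hypothetical diagnostic, for illustration only.
class HeapReport {

    // Convert a byte count to whole mebibytes.
    static long bytesToMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects the heap ceiling (roughly the -Xmx value).
        System.out.println("max heap (MiB):       " + bytesToMiB(rt.maxMemory()));
        // totalMemory() is the heap currently reserved by the JVM.
        System.out.println("committed heap (MiB): " + bytesToMiB(rt.totalMemory()));
        // Used heap = reserved heap minus the free part of it.
        System.out.println("used heap (MiB):      "
                + bytesToMiB(rt.totalMemory() - rt.freeMemory()));
    }
}
```

Running it with the same flags (for example `java -Xmx10g -Xms10g HeapReport.java` on a JVM that supports single-file launch) shows the heap limit the JVM actually applied; anything top reports beyond that lives outside the heap.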

Re: Lucene 4.3.1 CheckIndex limitation 100 trillion tokens?

2013-07-30 Thread Michael McCandless
You should also upgrade your Java! 1.6.0_16 is really ancient and has exciting bugs ... Mike McCandless http://blog.mikemccandless.com On Tue, Jul 30, 2013 at 1:06 PM, Tom Burton-West tburt...@umich.edu wrote: Thanks Mike, Robert and Adrien, Unfortunately, I killed the processes, so it's

sorting with lucene 4.3

2013-07-30 Thread Nicolas Guyot
Hi, we are using some of the latest features of lucene for sorting which are very cool but we are facing some issues with the numerical sort: We need two kinds of sort: numerical and lexical. For the lexical we are using SortedDocValuesField and for the numerical we use NumericDocValuesField.

Re: Lucene 4.3.1 CheckIndex limitation 100 trillion tokens?

2013-07-30 Thread Tom Burton-West
Thanks Mike, Got my sysadmins to upgrade our test machine to 1.7.0_09. Will ask them to upgrade production, which is currently 1.6.0_45-b06 on the indexing machines and 1.6.0_16-b01 on the serving machines. Tom On Tue, Jul 30, 2013 at 1:47 PM, Michael McCandless luc...@mikemccandless.com

deleting documents

2013-07-30 Thread ikoelliker
Hello, In Lucene 4.x is there a way to get the number of documents that were deleted from calling IndexWriter.deleteDocuments(Query)? Another question, if we call IndexWriter.tryDeleteDocument(Reader, docId) utilizing a near-real-time reader, what is the appropriate order to close the reader

RE: lucene 4.3 seems to be much slower in indexing than lucene 3.6?

2013-07-30 Thread Zhang, Lisheng
Hi Mike, I did more tests with realistic text from different languages (typical text for 8 different languages; the English one is the novel Animal Farm). What I found seems to be: ## Indexing: 3.6 and 4.3 are comparable (your previous comment is very correct). ## Search: 4.3 seems to be slower (30%),

Re: sorting with lucene 4.3

2013-07-30 Thread Adrien Grand
Hi, On Tue, Jul 30, 2013 at 8:19 PM, Nicolas Guyot sfni...@gmail.com wrote: When sorting numerically, the search seems to take a bit of a while compared to the lexically sorted search. Also when sorting numerically the result is sorted within each page but not globally as opposed to the

Re: Did you Mean search on Indexes created by Different Files.

2013-07-30 Thread Ankit Murarka
Any help on this will be highly appreciated. I have been trying every option I could find, but to no avail. I also tried LuceneDictionary, but this does not seem to be helping either. Please guide. On 7/30/2013 4:49 PM, Ankit Murarka wrote: Hello. Using DirectSpellChecker is not serving my