Re: Did you Mean search on Indexes created by Different Files.

2013-07-30 Thread Ankit Murarka
Any help on this will be highly appreciated. I have been trying all the different options but to no avail. Also tried LuceneDictionary, but this also does not seem to be helping... Please guide. On 7/30/2013 4:49 PM, Ankit Murarka wrote: Hello. Using DirectSpellChecker is not serving my p
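For anyone following along, here is a rough sketch of feeding the spell checker from an existing index rather than a word-list dictionary, using LuceneDictionary over a field of the log-file index. The field name "contents" and the paths are placeholders, and this assumes Lucene 4.3 with the spellchecker module on the classpath:

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.search.spell.LuceneDictionary;
    import org.apache.lucene.search.spell.SpellChecker;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    // Reader over the index built from the log files (path is a placeholder).
    DirectoryReader reader = DirectoryReader.open(FSDirectory.open(new File("/path/to/log-index")));

    // Build a separate spell index from the terms of the "contents" field.
    SpellChecker spell = new SpellChecker(FSDirectory.open(new File("/path/to/spell-index")));
    spell.indexDictionary(new LuceneDictionary(reader, "contents"),
        new IndexWriterConfig(Version.LUCENE_43, new StandardAnalyzer(Version.LUCENE_43)),
        true); // fullMerge

    // Suggestions now come from terms that actually occur in the indexed files.
    String[] didYouMean = spell.suggestSimilar("excepton", 5);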

Re: sorting with lucene 4.3

2013-07-30 Thread Adrien Grand
Hi, On Tue, Jul 30, 2013 at 8:19 PM, Nicolas Guyot wrote: > When sorting numerically, the search seems to take a bit of a while > compared to the lexically sorted search. > Also when sorting numerically the result is sorted within each page but not > globally, as opposed to the lexically sorted searc
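One pattern that keeps a single global order across pages when sorting numerically is to pass the same Sort object to searchAfter for every page. A minimal sketch, assuming an IndexSearcher named searcher, a Query named query, a NumericDocValuesField named "price", and the searchAfter-with-Sort variant of IndexSearcher available in recent 4.x releases:

    Sort byPrice = new Sort(new SortField("price", SortField.Type.LONG));
    TopDocs page = searcher.search(query, 20, byPrice);              // first page
    while (page.scoreDocs.length > 0) {
        // ... process page.scoreDocs ...
        ScoreDoc last = page.scoreDocs[page.scoreDocs.length - 1];   // a FieldDoc when a Sort was used
        page = searcher.searchAfter(last, query, 20, byPrice);       // next page, same Sort
    }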

RE: lucene 4.3 seems to be much slower in indexing than lucene 3.6?

2013-07-30 Thread Zhang, Lisheng
Hi Mike, I did more tests with realistic text from different languages (typical text for 8 different languages; the English one is the novel "Animal Farm"). What I found seems to be: ## Indexing: 3.6 and 4.3 comparable (your previous comment is very correct). ## Search: 4.3 seems to be slower (~30%), che

deleting documents

2013-07-30 Thread ikoelliker
Hello, In Lucene 4.x is there a way to get the number of documents that were deleted from calling IndexWriter.deleteDocuments(Query)? Another question, if we call IndexWriter.tryDeleteDocument(Reader, docId) utilizing a near-real-time reader, what is the appropriate order to close the reader
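For reference, deleteDocuments(Query) returns void in 4.x, so the deleted count is not reported directly; comparing numDocs() on readers opened before and after the deletes have been applied is one common workaround. For the second question, a sketch of one ordering with tryDeleteDocument and a near-real-time reader (writer and query are assumed to exist):

    DirectoryReader nrtReader = DirectoryReader.open(writer, true);   // NRT reader, sees uncommitted docs
    try {
        IndexSearcher searcher = new IndexSearcher(nrtReader);
        for (ScoreDoc sd : searcher.search(query, 100).scoreDocs) {
            // false means the segment containing sd.doc changed and the delete did not apply
            boolean applied = writer.tryDeleteDocument(nrtReader, sd.doc);
        }
    } finally {
        nrtReader.close();   // close the reader once its doc IDs are no longer needed
    }
    writer.commit();         // the writer stays open; commit (or reopen an NRT reader) afterwards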

Re: Lucene 4.3.1 CheckIndex limitation 100 trillion tokens?

2013-07-30 Thread Tom Burton-West
Thanks Mike, Got my sysadmins to upgrade our test machine to "1.7.0_09". Will ask them to upgrade production, which is currently 1.6.0_45-b06 on the indexing machines and 1.6.0_16-b01 on the serving machines. Tom On Tue, Jul 30, 2013 at 1:47 PM, Michael McCandless < luc...@mikemccandless.com> w

sorting with lucene 4.3

2013-07-30 Thread Nicolas Guyot
Hi, we are using some of the latest features of lucene for sorting, which are very cool, but we are facing some issues with the numerical sort: We need two kinds of sort: numerical and lexical. For the lexical we are using SortedDocValuesField and for the numerical we use NumericDocValuesField. The
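For reference, a minimal sketch of the two doc-values sort keys this thread is about, indexed side by side and then used with SortField (field names are made up, and writer, searcher and query are assumed):

    Document doc = new Document();
    doc.add(new TextField("body", "some content", Field.Store.NO));
    doc.add(new SortedDocValuesField("title_sort", new BytesRef("some title")));  // lexical sort key
    doc.add(new NumericDocValuesField("price", 42L));                             // numeric sort key
    writer.addDocument(doc);

    TopDocs lexical = searcher.search(query, 10,
        new Sort(new SortField("title_sort", SortField.Type.STRING)));
    TopDocs numeric = searcher.search(query, 10,
        new Sort(new SortField("price", SortField.Type.LONG)));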

Re: Lucene 4.3.1 CheckIndex limitation 100 trillion tokens?

2013-07-30 Thread Michael McCandless
You should also upgrade your Java! 1.6.0_16 is really ancient and has exciting bugs ... Mike McCandless http://blog.mikemccandless.com On Tue, Jul 30, 2013 at 1:06 PM, Tom Burton-West wrote: > Thanks Mike, Robert and Adrien, > > Unfortunately, I killed the processes, so it's too late to get a

Re: Lucene 4.3.1 CheckIndex limitation 100 trillion tokens?

2013-07-30 Thread Tom Burton-West
Thanks Mike, Robert and Adrien, Unfortunately, I killed the processes, so it's too late to get a stack trace. One thing that was suspicious was that top was reporting memory use as 20GB resident even though I invoked the JVM with java -Xmx10g -Xms10g. I'm going to double the memory, turn on GC logging,

Re: Lucene 4.3.1 CheckIndex limitation 100 trillion tokens?

2013-07-30 Thread Adrien Grand
Hi, On Tue, Jul 30, 2013 at 5:34 PM, Robert Muir wrote: > I'm not sure if there is a similar one for vectors. There is, it has been done for stored fields and term vectors at the same time [1]. [1] https://issues.apache.org/jira/browse/LUCENE-4928 -- Adrien ---

Re: Lucene 4.3.1 CheckIndex limitation 100 trillion tokens?

2013-07-30 Thread Robert Muir
On Tue, Jul 30, 2013 at 8:41 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > I think that's ~ 110 billion, not trillion, tokens :) > > Are you certain you don't have any term vectors? > > Even if your index has no term vectors, CheckIndex goes through all > docIDs trying to load them,

Re: Lucene 4.3.1 CheckIndex limitation 100 trillion tokens?

2013-07-30 Thread Michael McCandless
Can you get a stack trace so we can see where the thread is stuck? Mike McCandless http://blog.mikemccandless.com On Tue, Jul 30, 2013 at 11:08 AM, Tom Burton-West wrote: > Thanks Mike, > > Billion not Trillion Doh! > > Wasn't thinking it through when I titled the e-mail. The total number

Re: Lucene 4.3.1 CheckIndex limitation 100 trillion tokens?

2013-07-30 Thread Tom Burton-West
Thanks Mike, Billion not Trillion Doh! Wasn't thinking it through when I titled the e-mail. The total number of tokens shouldn't be unusual compared to our other indexes, since whether we index pages or whole docs, the number of tokens shouldn't change significantly. The main difference betw

RE: Cache Field Lucene 3.6.0

2013-07-30 Thread andi rexha
Hi Adrien, Thank you very much. I will have a look at your suggestion ;) > From: jpou...@gmail.com > Date: Tue, 30 Jul 2013 16:16:03 +0200 > Subject: Re: Cache Field Lucene 3.6.0 > To: java-user@lucene.apache.org > > Hi, > > On Tue, Jul 30, 2013 at 4:09 PM, andi rexha wrote: > > Hi, I have a s

Re: Cache Field Lucene 3.6.0

2013-07-30 Thread Adrien Grand
Hi, On Tue, Jul 30, 2013 at 4:09 PM, andi rexha wrote: > Hi, I have a stored and tokenized field, and I want to cache all the field > values. > > I have one document in the index, with the "field.value" => "hello world" > and with tokens => "hello", "world". > I try to extract the field

Cache Field Lucene 3.6.0

2013-07-30 Thread andi rexha
Hi, I have a stored and tokenized field, and I want to cache all the field values. I have one document in the index, with the "field.value" => "hello world" and with tokens => "hello", "world". I try to extract the field's content: String[] cachedFields = FieldCache.DEFAULT.getStri
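For context, FieldCache un-inverts the indexed terms and holds one term per document, so on a tokenized field it will not hand back the original stored string; the stored value has to come from the document itself. A small sketch against the 3.6 APIs (reader and docId are assumed):

    // One indexed term per document -- not the original "hello world" for a tokenized field.
    String[] cached = FieldCache.DEFAULT.getStrings(reader, "field");

    // The stored value is fetched per document instead.
    String stored = reader.document(docId).get("field");   // "hello world"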

RE: Partial word match using n-grams

2013-07-30 Thread Becker, Thomas
Just to close the loop on this, I upgraded to 4.4 and the improvements to the NGramTokenizer were just what I needed. I switched to using 1-2 grams (the default), and now that the tokenizer emits the tokens in an order that makes sense, I'm in business. At search time I split on whitespace, ngr
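In case it helps other readers, a sketch of an Analyzer wrapping the 4.4 NGramTokenizer with 1-2 grams (the arguments shown are the defaults):

    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.ngram.NGramTokenizer;
    import org.apache.lucene.util.Version;

    Analyzer ngramAnalyzer = new Analyzer() {
        @Override
        protected TokenStreamComponents createComponents(String fieldName, Reader input) {
            // As of 4.4 the tokenizer emits grams in document order; minGram=1, maxGram=2
            return new TokenStreamComponents(new NGramTokenizer(Version.LUCENE_44, input, 1, 2));
        }
    };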

Re: Lucene 4.3.1 CheckIndex limitation 100 trillion tokens?

2013-07-30 Thread Michael McCandless
I think that's ~ 110 billion, not trillion, tokens :) Are you certain you don't have any term vectors? Even if your index has no term vectors, CheckIndex goes through all docIDs trying to load them, but that ought to be very fast, and then you should see "test: doc values..." after that. Mike M
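For reference, CheckIndex can also be driven programmatically, which makes it easy to watch the per-test output Mike mentions; a sketch with a placeholder path:

    import java.io.File;
    import org.apache.lucene.index.CheckIndex;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    Directory dir = FSDirectory.open(new File("/path/to/index"));
    CheckIndex checker = new CheckIndex(dir);
    checker.setInfoStream(System.out);              // prints the per-segment "test: ..." lines
    CheckIndex.Status status = checker.checkIndex();
    System.out.println("clean: " + status.clean);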

Re: Did you Mean search on Indexes created by Different Files.

2013-07-30 Thread Ankit Murarka
Hello. Using DirectSpellChecker is not serving my purpose. This seems to return word suggestions from a dictionary, whereas I wish to return search suggestions from the indexes I created by supplying my own files (these are generally log files). I created indexes for certain files in D:\\Indexe