Re: Lucene migrate to 6.6.5 from 5.5.3

2019-04-01 Thread brahmam
Hi, In continuation of the previous mail: for the second point, we see that data newly added after the upgrade is not searchable if we use LegacyNumericRangeQuery.newLongRange(), so the second point is invalid now. Please suggest, for point 1, how to search old and new data using

Re: Lucene migrate to 6.6.5 from 5.5.3

2019-04-01 Thread brahmam
Hi, The main reason to migrate from 5.5.3 to 6.6.5 (rather than to 7.6.x) is to have a smooth upgrade that avoids re-indexing the data at this point. Regarding the issue of not being able to search the old data (which was indexed with 5.5.3) after upgrading to 6.6.5: > The old data search works fine if

RE: Clarification regarding BlockTree implementation of IntersectTermsEnum

2019-04-01 Thread Uwe Schindler
Hi again, > The problem with TermRangeQueries is actually not the iteration over the term index. The slowness comes from the fact that all terms between start and end have to be iterated and their postings be fetched and those postings be merged together. If the "source of terms" for doing

RE: Clarification regarding BlockTree implementation of IntersectTermsEnum

2019-04-01 Thread Uwe Schindler
Hi, in fact I was also wondering why the TermRangeQuery is now a subclass of AutomatonQuery, but this was changed for the reasons that Robert mentioned in his e-mail (https://issues.apache.org/jira/browse/LUCENE-5879). For sure, the easiest is to just start and seek to the first term in the

Re: Clarification regarding BlockTree implementation of IntersectTermsEnum

2019-04-01 Thread Robert Muir
The regular TermsEnum is really designed for walking terms in linear order. It does have some ability to seek/leapfrog. But this means paths in a query automaton that match no terms result in a wasted seek and CPU, because the API is designed to return the next term after regardless. On the other

Re: Clarification regarding BlockTree implementation of IntersectTermsEnum

2019-04-01 Thread Stamatis Zampetakis
Yes, it is used. I think there are simpler and possibly more efficient ways to implement a TermRangeQuery and that is why I am looking into this. But I am also curious to understand what IntersectTermsEnum is supposed to do. On Mon, Apr 1, 2019 at 5:34 PM, Robert Muir wrote: > Is this

Re: Clarification regarding BlockTree implementation of IntersectTermsEnum

2019-04-01 Thread Robert Muir
Is this IntersectTermsEnum really being used for term range query? Seems like using a standard TermsEnum, seeking to the start of the range, then calling next until the end would be easier. On Mon, Apr 1, 2019, 10:05 AM Stamatis Zampetakis wrote: > Hi all, I am currently working on
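
The seek-then-next idea in the message above can be sketched without any Lucene dependency. The following is only an illustration under stated assumptions: a `java.util.TreeSet` stands in for a sorted term dictionary, and the class and method names (`RangeScanSketch`, `termsInRange`) are hypothetical. It is not Lucene's TermsEnum API and not how TermRangeQuery is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical stand-in: a sorted set plays the role of a term dictionary.
class RangeScanSketch {

    // Collects every term t with lower <= t <= upper (both bounds inclusive).
    static List<String> termsInRange(NavigableSet<String> terms, String lower, String upper) {
        List<String> hits = new ArrayList<>();
        // "Seek": tailSet positions the iteration at the first term >= lower.
        for (String term : terms.tailSet(lower, true)) {
            // "Next until the end": stop at the first term past the upper bound.
            if (term.compareTo(upper) > 0) {
                break;
            }
            hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms =
                new TreeSet<>(Arrays.asList("apple", "banana", "cherry", "date", "fig"));
        System.out.println(termsInRange(terms, "b", "d"));
    }
}
```

The cost of such a scan grows with the number of terms inside the range, regardless of how cheap the initial seek is.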

Clarification regarding BlockTree implementation of IntersectTermsEnum

2019-04-01 Thread Stamatis Zampetakis
Hi all, I am currently working on improving the performance of range queries on strings. I've noticed that using TermRangeQuery with low-selectivity queries is a very bad idea in terms of performance but I cannot clearly explain why since it seems related to how the IntersectTermsEnum#next method

Re: Getting Exception : java.nio.channels.ClosedByInterruptException

2019-04-01 Thread Robert Muir
Some code interrupted (Thread.interrupt) a Java thread while it was blocked on I/O. This is not safe to do with Lucene, because unfortunately in this situation Java's NIO code closes file descriptors and releases locks. The second exception is because the IndexWriter tried to write when it no
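
The JDK behaviour described in the message above can be reproduced with plain `java.nio`, no Lucene involved. This is a minimal sketch: the class and method names (`InterruptClosesChannel`, `interruptClosesChannel`) are made up for illustration, and the interrupt status is set before the read rather than during a blocked read, which the `InterruptibleChannel` contract treats the same way.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; uses only JDK APIs.
class InterruptClosesChannel {

    // Returns true if reading on an interrupted thread threw
    // ClosedByInterruptException and left the channel closed.
    static boolean interruptClosesChannel() {
        try {
            Path tmp = Files.createTempFile("interrupt-demo", ".bin");
            try {
                Files.write(tmp, new byte[] {1, 2, 3, 4});
                try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                    // Set the interrupt status of the current thread.
                    Thread.currentThread().interrupt();
                    try {
                        ch.read(ByteBuffer.allocate(4));
                        return false; // the read was not affected by the interrupt
                    } catch (ClosedByInterruptException e) {
                        // The channel was closed as a side effect of the interrupt.
                        return !ch.isOpen();
                    } finally {
                        // Clear the interrupt status so later code is unaffected.
                        Thread.interrupted();
                    }
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(interruptClosesChannel());
    }
}
```

Once the channel is closed this way, any other code that holds the same channel will fail on its next I/O call.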

Why does Lucene 7.4.0 commit() Increase Memory Usage x2

2019-04-01 Thread thturk
Hello, - For a while I have been trying to figure out why RAM usage doubles after committing a single document. - Lucene version: 7.4.0 - Writer directory: FSDirectory - Reader directory: MMapDirectory - I create a new IndexWriter instance per update, add, and delete, and commit after each operation.