IndexReader.document method - Why is this made final?

2014-09-10 Thread Buddhavarapu, Suresh
Hi, I'm working on an upgrade of a project from Lucene 2.9.3 to 4.10. We have a need to implement the IndexReader interface to create an abstraction over two disparate indexes. First, I found that IndexReader can no longer be extended. Instead I chose to extend the CompositeReader abstract
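For reference, a minimal sketch of one way to present two disparate indexes as a single view in Lucene 4.x without subclassing IndexReader itself, using the stock MultiReader (which is itself a CompositeReader). The directory paths and setup below are assumptions for illustration, not the poster's actual code:

    import java.io.File;

    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.MultiReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class TwoIndexView {
        public static void main(String[] args) throws Exception {
            // Placeholder paths for the two disparate indexes.
            Directory dir1 = FSDirectory.open(new File("/path/to/index1"));
            Directory dir2 = FSDirectory.open(new File("/path/to/index2"));

            DirectoryReader r1 = DirectoryReader.open(dir1);
            DirectoryReader r2 = DirectoryReader.open(dir2);

            // MultiReader presents both indexes as one logical reader;
            // searches over it see the union of the two.
            MultiReader combined = new MultiReader(r1, r2);
            IndexSearcher searcher = new IndexSearcher(combined);
            System.out.println("maxDoc across both indexes: " + combined.maxDoc());

            combined.close(); // by default this also closes the sub-readers
        }
    }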

4.10.0: java.lang.IllegalStateException: cannot write 3x SegmentInfo unless codec is Lucene3x (got: Lucene40)

2014-09-10 Thread Ian Lea
Hi. On running a quick test after a handful of minor code changes to deal with 4.10 deprecations, a program that updates an existing index failed with Exception in thread "main" java.lang.IllegalStateException: cannot write 3x SegmentInfo unless codec is Lucene3x (got: Lucene40) at

Re: 4.10.0: java.lang.IllegalStateException: cannot write 3x SegmentInfo unless codec is Lucene3x (got: Lucene40)

2014-09-10 Thread Robert Muir
Ian, this looks terrible, thanks for reporting this. Is there any possible way I could have a copy of that working index to make it easier to reproduce?

RE: 4.10.0: java.lang.IllegalStateException: cannot write 3x SegmentInfo unless codec is Lucene3x (got: Lucene40)

2014-09-10 Thread Uwe Schindler
Hi Ian, this index was created with the BETA version of Lucene 4.0: Segments file=segments_2 numSegments=1 version=4.0.0.2 format= 1 of 1: name=_0 docCount=15730 4.0.0.2 was the index version number of Lucene 4.0-BETA. This is not a supported version and may not open correctly. In Lucene
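The segments summary quoted above looks like CheckIndex output; assuming a lucene-core jar that can still read the index (e.g. 4.9.x) on the classpath, the index version can be inspected like this (jar name and path are placeholders):

    java -cp lucene-core-4.9.0.jar org.apache.lucene.index.CheckIndex /path/to/index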

RE: 4.10.0: java.lang.IllegalStateException: cannot write 3x SegmentInfo unless codec is Lucene3x (got: Lucene40)

2014-09-10 Thread Uwe Schindler
If you want to upgrade the index, you may try to run IndexUpgrader on Lucene 4.9 to bring it up to date. But index upgrading may fail because of the beta status of the version that originally created it. Uwe
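A rough sketch of what running IndexUpgrader from the command line looks like; the jar name and index path are placeholders, and backing up the index first is advisable given that the upgrade may fail:

    java -cp lucene-core-4.9.0.jar org.apache.lucene.index.IndexUpgrader -verbose /path/to/index

The same can be done programmatically in 4.9 via new IndexUpgrader(dir, Version.LUCENE_4_9).upgrade(), which rewrites all segments into the newest format using an upgrade-only merge policy.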

Re: 4.10.0: java.lang.IllegalStateException: cannot write 3x SegmentInfo unless codec is Lucene3x (got: Lucene40)

2014-09-10 Thread Ian Lea
Sent to your personal email address. -- Ian.

Re: 4.10.0: java.lang.IllegalStateException: cannot write 3x SegmentInfo unless codec is Lucene3x (got: Lucene40)

2014-09-10 Thread Ian Lea
Yes, quite possible. I do sometimes download and test beta versions. This isn't really a problem for me - it has only happened on test indexes that I don't care about, but there might be live indexes out there that are also affected and having them made unusable would be undesirable, to put it

Re: 4.10.0: java.lang.IllegalStateException: cannot write 3x SegmentInfo unless codec is Lucene3x (got: Lucene40)

2014-09-10 Thread Robert Muir
Ian, it's a supported version. It wouldn't matter if it's 4.0 alpha or beta anyway, because we support index back compat for those. In your case, it's actually the final version. I will open an issue. Thank you for reporting this!

RE: 4.10.0: java.lang.IllegalStateException: cannot write 3x SegmentInfo unless codec is Lucene3x (got: Lucene40)

2014-09-10 Thread Uwe Schindler
Hi, we looked into earlier releases: the index version number of 4.0-ALPHA was 4.0, of 4.0-BETA was 4.0.0.1, and of 4.0 final was 4.0.0.2. Ian's index is therefore a real official 4.0 index. Unfortunately the version comparison logic in Lucene 4.10

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-09-10 Thread Michael McCandless
Thanks, I'll look at the issue soon. Right, segment merging won't spontaneously create deletes. Deletes are only made if you explicitly delete OR (tricky) there is a non-aborting exception (e.g. an analysis problem) hit while indexing a document; in that case IW indexes a portion of the document
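For what it's worth, a minimal sketch of the "explicitly delete" case mentioned above; the field, term, and path are made up for illustration (updateDocument counts too, since it is a delete plus an add under the hood):

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class ExplicitDelete {
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.open(new File("/path/to/index")); // placeholder path
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_4_10_0,
                    new StandardAnalyzer(Version.LUCENE_4_10_0));
            IndexWriter writer = new IndexWriter(dir, iwc);
            // An explicit delete: the usual way deletes enter an index.
            writer.deleteDocuments(new Term("id", "42"));
            writer.commit();
            writer.close();
        }
    }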

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-09-10 Thread Vitaly Funstein
"Normally, reopens only go forwards in time, so if you could ensure that when you reopen one reader to another, the 2nd one is always newer, then I think you should never hit this issue" Mike, I'm not sure if I fully understand your suggestion. In a nutshell, the use case here is as follows: I
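A small sketch of the "forwards in time" reopen pattern being referred to, using DirectoryReader.openIfChanged; the helper name is made up:

    import java.io.IOException;

    import org.apache.lucene.index.DirectoryReader;

    public class ForwardReopen {
        /** Reopen a reader so the result is never older than the input. */
        static DirectoryReader reopenForward(DirectoryReader current) throws IOException {
            // openIfChanged returns null if nothing changed, otherwise a reader
            // on a newer view of the index -- it never goes backwards in time.
            DirectoryReader newer = DirectoryReader.openIfChanged(current);
            if (newer == null) {
                return current;      // unchanged: keep using the old reader
            }
            current.close();         // release the old point-in-time view
            return newer;
        }
    }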

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-09-10 Thread Vitaly Funstein
One other observation - if instead of a reader opened at a later commit point (T1), I pass in an NRT reader *without* first doing the second commit on the index, then there is no exception. This probably also hinges on the assumption that no buffered docs have been flushed after T0, thus creating
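To make the two kinds of readers in this scenario concrete, a rough 4.x sketch (not the actual test code) of opening a reader pinned to an old commit point versus an NRT reader taken straight from the writer; keeping old commits around requires a suitable IndexDeletionPolicy:

    import java.io.IOException;
    import java.util.List;

    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexCommit;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;

    public class ReaderViews {
        static void openViews(Directory dir, IndexWriter writer) throws IOException {
            // Reader pinned to the oldest surviving commit point (e.g. T0).
            List<IndexCommit> commits = DirectoryReader.listCommits(dir);
            DirectoryReader atT0 = DirectoryReader.open(commits.get(0));

            // NRT reader: sees the writer's uncommitted changes, no second commit needed.
            DirectoryReader nrt = DirectoryReader.open(writer, true);

            // ... use both readers ...
            nrt.close();
            atT0.close();
        }
    }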

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-09-10 Thread Robert Muir
That's because there are 3 constructors in SegmentReader: 1. one used for opening new segments (checks hasDeletions, only reads liveDocs if so); 2. one used for non-NRT reopen -- the problem one for you; 3. one used for NRT reopen (takes liveDocs as a param, so no bug). So personally I think you should be able

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-09-10 Thread Vitaly Funstein
Seems to me the bug occurs regardless of whether the passed-in newer reader is NRT or non-NRT. This is because the user operates at the level of DirectoryReader, not SegmentReader, and modifying the test code to do the following reproduces the bug: writer.commit(); DirectoryReader latest =

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-09-10 Thread Robert Muir
Yes, there is also a safety check, but IMO it should be removed. See the patch on the issue; the test passes now.