Using Lucene 1.4 rc2, I've run into a fatal problem: certain
PhraseQueries cause a "read past EOF" exception (see below), while other
PhraseQueries enter an infinite loop due to a negative bufferLength
field in CSInputStream. The environment is WinXP with JDK 1.4.2. The
index is large: 1,000,000 documents, each of which has 3 stored,
indexed fields of 10-100 chars.

The problem does not occur with Lucene 1.3 indexing the exact same set
of Documents. Nor does it occur with 1.4 rc2 using various smaller sets
of documents. Right now my workaround is to use Lucene 1.3.

For the PhraseQuery "a y" (that's right, two single-letter terms), the
read-past-EOF exception is as follows:

java.io.IOException: read past EOF
    at org.apache.lucene.store.InputStream.refill(InputStream.java:154)
    at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
    at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
    at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:59)
    at org.apache.lucene.index.SegmentTermDocs.skipTo(SegmentTermDocs.java:187)
    at org.apache.lucene.search.PhrasePositions.skipTo(PhrasePositions.java:47)
    at org.apache.lucene.search.PhraseScorer.next(PhraseScorer.java:69)
    at org.apache.lucene.search.Scorer.score(Scorer.java:37)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:81)
    at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
    at org.apache.lucene.search.Hits.init(Hits.java:43)
    at org.apache.lucene.search.Searcher.search(Searcher.java:33)
    at org.apache.lucene.search.Searcher.search(Searcher.java:27)
    at...

For the PhraseQuery "z y", the search enters an infinite loop. The loop
occurs due to a similar condition to read-past-EOF: at line 153 of
org.apache.lucene.store.InputStream, the value of bufferLength goes
negative due to the value of start exceeding the value of end. This in
turn seems to be a consequence of a seek to a position past the end of
the stream.
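
To make the arithmetic concrete, here is a minimal sketch of how I understand the failure. It is a simplified model, not the Lucene source: the class name RefillSketch, the BUFFER_SIZE of 1024, and both method names are assumptions made for illustration. It shows that clamping end to the stream length yields a negative length once start lies past the end, and that a guard testing only for exactly zero (an assumption about the surrounding check, not something I have confirmed) would let that negative value through.

```java
import java.io.IOException;

public class RefillSketch {
    // Assumed buffer size for this model; not taken from the Lucene source.
    public static final int BUFFER_SIZE = 1024;

    /**
     * Number of bytes a refill would try to read, given the current
     * stream position (start) and the total stream length.
     */
    public static long bufferLength(long start, long length) {
        long end = start + BUFFER_SIZE;
        if (end > length) {
            end = length;          // clamp to the end of the stream
        }
        return end - start;        // negative whenever start > length
    }

    /**
     * Hypothetical guard: treats any non-positive length as EOF, so a
     * seek past the end fails fast instead of yielding a negative length.
     */
    public static void checkedRefill(long start, long length) throws IOException {
        long n = bufferLength(start, length);
        if (n <= 0) {
            throw new IOException("read past EOF: start=" + start + ", length=" + length);
        }
    }

    public static void main(String[] args) {
        System.out.println(bufferLength(0, 5000));     // full buffer: 1024
        System.out.println(bufferLength(4500, 5000));  // partial buffer: 500
        System.out.println(bufferLength(5000, 5000));  // exactly at EOF: 0
        System.out.println(bufferLength(6000, 5000));  // past EOF: -1000
    }
}
```

In this model only the third case produces a length of zero; the fourth, which corresponds to a seek past the end of the stream, produces a negative length.
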
Something is clearly corrupt somewhere in the index structure. I'd love
to post the files that reproduce the problem, but it's about 100 MB of
data. If someone on the Lucene dev team wants to give me an upload
destination, I can post the index somewhere and you can play with the
problem.

Regards, and thanks for any assistance,
Joe Berkovitz
Chief Architect
Ruckus Network, Inc.