Very interesting: FieldsWriter thinks it's written 12 bytes to the fdx
file, yet the directory says the file does not exist.

Can you re-run with this new patch?  I suspect FieldsWriter wrote to
one segment, but we are then somehow looking at the wrong segment.  The
attached patch prints out which segment FieldsWriter actually wrote to.
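
As a sanity check on the numbers: in 2.4.x the fdx file is a 4-byte
format header plus one 8-byte pointer per stored doc, so
indexFilePointer=12 is exactly right for 1 doc -- the writer's own
bookkeeping is consistent, and it's the directory's view that looks
wrong.  The invariant being enforced is roughly this (a sketch of the
2.4.x check, not the literal patch):

    // Sketch of the check StoredFieldsWriter.closeDocStore makes, assuming
    // the 2.4.x .fdx layout: 4-byte format header + 8 bytes per stored doc.
    String fdxName = segment + "." + IndexFileNames.FIELDS_INDEX_EXTENSION;
    long expected = 4L + 8L * numDocsInStore;
    long actual = directory.fileExists(fdxName) ? directory.fileLength(fdxName) : 0;
    if (expected != actual)
      throw new RuntimeException("after flush: fdx size mismatch: "
          + numDocsInStore + " docs vs " + actual + " length in bytes of "
          + fdxName);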

What filesystem & underlying IO system/device are you using?

Mike

On Thu, May 28, 2009 at 10:53 PM, James X
<hello.nigerian.spamm...@gmail.com> wrote:
> My apologies for the delay in running this patched Lucene build - I was
> temporarily pulled onto another piece of work.
>
> Here is a sample 'fdx size mismatch' exception using the patch Mike
> supplied:
>
> SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs
> vs 0 length in bytes of _1i.fdx exists=false didInit=false inc=0 dSO=1
> fieldsWriter.doClose=true fieldsWriter.indexFilePointer=12
> fieldsWriter.fieldsFilePointer=2395
>        at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:96)
>        at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
>        at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
>        at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
>        at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
>        at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
>        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
>        at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
>        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
>        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
>        at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)
>
>
> Will now run with assertions enabled and see how that affects the behaviour!
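>
> (For anyone else following: rather than turning assertions on for the whole
> JVM, they can be scoped to just the Lucene packages -- the trailing "..." is
> literal JVM syntax meaning "this package and its subpackages". For the stock
> Jetty example setup that would be something like:
>
>     java -ea:org.apache.lucene... -jar start.jar
>
> -- adjust to taste for your own container's startup script.)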
>
> Thanks,
> James
>
> ---------- Forwarded message ----------
> From: James X <hello.nigerian.spamm...@gmail.com>
> Date: Thu, May 21, 2009 at 2:24 PM
> Subject: Re: java.lang.RuntimeException: after flush: fdx size mismatch
> To: solr-user@lucene.apache.org
>
>
> Hi Mike,
>
> Documents are web pages: about 20 fields, mostly strings, a couple of
> integers, booleans, and one HTML field (for the document body content).
>
> I do have a multi-threaded client pushing docs to Solr, so yes, I suppose
> that would mean I have several active Solr worker threads.
>
> The only exceptions I have are the RuntimeException flush errors, followed
> by a handful (normally 10-20) of LockObtainFailedExceptions, which I
> presumed were being caused by the faulty threads dying and failing to
> release locks.
>
> Oh wait, I am getting WstxUnexpectedCharException exceptions every now and
> then:
> SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character
> ((CTRL-CHAR, code 8))
>  at [row,col {unknown-source}]: [1,26070]
>        at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
>        at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
>        at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
>        at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
>        at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
>        at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
>        at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:327)
>
> I presumed these were caused by character encoding issues, but haven't
> looked into them at all yet.
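>
> CTRL-CHAR code 8 is a backspace, which XML 1.0 forbids regardless of
> encoding, so Woodstox is right to reject the document. If it helps anyone,
> this is roughly the kind of filter I'm planning to run field values through
> before posting -- a sketch only, the helper name is made up, and for
> simplicity it also drops surrogate pairs:
>
>     // Keep only characters XML 1.0 allows: #x9, #xA, #xD, #x20-#xD7FF,
>     // #xE000-#xFFFD. Supplementary chars (surrogate pairs) are dropped too.
>     static String stripInvalidXmlChars(String s) {
>       StringBuilder sb = new StringBuilder(s.length());
>       for (int i = 0; i < s.length(); i++) {
>         char c = s.charAt(i);
>         if (c == 0x9 || c == 0xA || c == 0xD
>             || (c >= 0x20 && c <= 0xD7FF)
>             || (c >= 0xE000 && c <= 0xFFFD)) {
>           sb.append(c);
>         }
>       }
>       return sb.toString();
>     }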
>
> Thanks again for your help! I'll make some time this afternoon to build the
> patched Lucene jars and post the results.
>
>
> On Thu, May 21, 2009 at 5:06 AM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> Another question: are there any other exceptions in your logs?  E.g.
>> problems adding certain documents, or anything?
>>
>> Mike
>>
>> On Wed, May 20, 2009 at 11:18 AM, James X
>> <hello.nigerian.spamm...@gmail.com> wrote:
>> > Hi Mike, thanks for the quick response:
>> >
>> > $ java -version
>> > java version "1.6.0_11"
>> > Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
>> > Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)
>> >
>> > I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not
>> > hitting that yet!
>> >
>> > The exception always reports 0 length, but the number of docs varies,
>> > heavily weighted towards one or two docs. Of the last 130 or so exceptions:
>> >     89 1 docs vs 0 length
>> >     20 2 docs vs 0 length
>> >      9 3 docs vs 0 length
>> >      1 4 docs vs 0 length
>> >      3 5 docs vs 0 length
>> >      2 6 docs vs 0 length
>> >      1 7 docs vs 0 length
>> >      1 9 docs vs 0 length
>> >      1 10 docs vs 0 length
>> >
>> > The only unusual thing I can think of that we're doing with Solr is
>> > aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot a
>> > pattern between core admin operations and these exceptions, however...
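>> >
>> > For concreteness, the core admin operations are the stock CoreAdmin HTTP
>> > calls, along these lines (the core name here is just a placeholder):
>> >
>> >     http://localhost:8983/solr/admin/cores?action=CREATE&name=site-123&instanceDir=site-123
>> >     http://localhost:8983/solr/admin/cores?action=UNLOAD&core=site-123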
>> >
>> > James
>> >
>> > On Wed, May 20, 2009 at 2:37 AM, Michael McCandless <
>> > luc...@mikemccandless.com> wrote:
>> >
>> >> Hmm... somehow Lucene is flushing a new segment on closing the
>> >> IndexWriter, and thinks 1 doc had been added to the stored fields
>> >> file, yet the fdx file is the wrong size (0 bytes).  This check (and
>> >> exception) is designed to prevent corruption from entering the index,
>> >> so it's at least good to see that CheckIndex passes after this.
>> >>
>> >> I don't think you're hitting LUCENE-1521: that issue only happens if a
>> >> single segment has more than ~268 million docs.
>> >>
>> >> Which exact JRE version are you using?
>> >>
>> >> When you hit this exception, is it always "1 docs vs 0 length in bytes"?
>> >>
>> >> Mike
>> >>
>> >> On Wed, May 20, 2009 at 3:19 AM, James X
>> >> <hello.nigerian.spamm...@gmail.com> wrote:
>> >> > Hello all,
>> >> >
>> >> > I'm running Solr 1.3 in a multi-core environment. There are up to 2000
>> >> > active cores in each Solr webapp instance at any given time.
>> >> >
>> >> > I've noticed occasional errors such as:
>> >> > SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs
>> >> > vs 0 length in bytes of _h.fdx
>> >> >        at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94)
>> >> >        at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
>> >> >        at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
>> >> >        at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
>> >> >        at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
>> >> >        at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
>> >> >        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
>> >> >        at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
>> >> >        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
>> >> >        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
>> >> >        at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)
>> >> >
>> >> > during commit / optimise operations.
>> >> >
>> >> > These errors then cause cascading errors during updates on the offending
>> >> > cores:
>> >> > SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
>> >> > out: SingleInstanceLock: write.lock
>> >> >        at org.apache.lucene.store.Lock.obtain(Lock.java:85)
>> >> >        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1070)
>> >> >        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:924)
>> >> >        at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:116)
>> >> >        at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122)
>> >> >
>> >> > This looks like http://issues.apache.org/jira/browse/LUCENE-1521, but
>> >> > when I upgraded Lucene to 2.4.1 under Solr 1.3, the issue still remained.
>> >> >
>> >> > CheckIndex doesn't find any problems with the index, and the problems
>> >> > disappear after an (inconvenient for me) restart of Solr.
>> >> >
>> >> > Firstly, as the symptoms are so close to those in 1521, can I check that
>> >> > my Lucene upgrade method should work:
>> >> > - unzip the Solr 1.3 war
>> >> > - remove the Lucene 2.4dev jars
>> >> > (lucene-core, lucene-spellchecker, lucene-snowball, lucene-queries,
>> >> > lucene-memory,lucene-highlighter, lucene-analyzers)
>> >> > - move in the Lucene 2.4.1 jars
>> >> > - rezip the directory structure as solr.war (rough commands below).
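>> >> >
>> >> > In case the exact commands matter, the rebuild amounts to roughly this
>> >> > (jar names from memory and may not match exactly):
>> >> >
>> >> >     jar xf apache-solr-1.3.0.war
>> >> >     rm WEB-INF/lib/lucene-*.jar
>> >> >     cp lucene-2.4.1/*.jar WEB-INF/lib/
>> >> >     jar cf solr.war .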
>> >> >
>> >> > I think this has worked, as solr/default/admin/registry.jsp shows:
>> >> >  <lucene-spec-version>2.4.1</lucene-spec-version>
>> >> >  <lucene-impl-version>2.4.1 750176 - 2009-03-04 21:56:52</lucene-impl-version>
>> >> >
>> >> > Secondly, if this Lucene fix isn't the right solution to this problem,
>> >> > can anyone suggest an alternative approach? The only problem I've had up
>> >> > to now is with the number of allowed file handles, which was fixed by
>> >> > changing limits.conf (RHEL machine).
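>> >> >
>> >> > (For anyone hitting the same limit, the limits.conf change is along these
>> >> > lines -- the user name and values are placeholders:
>> >> >
>> >> >     # /etc/security/limits.conf
>> >> >     solr  soft  nofile  65536
>> >> >     solr  hard  nofile  65536
>> >> >
>> >> > where "solr" is whatever user the servlet container runs as.)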
>> >> >
>> >> > Many thanks!
>> >> > James
>> >> >
>> >>
>> >
>>
>
