Hi Mike,

I don't see a patch file here? Could another explanation be that the fdx file doesn't exist yet, or has been deleted from underneath Lucene?
I'm constantly CREATE-ing and UNLOAD-ing Solr cores, and more importantly, moving the bundled cores around between machines. I find it much more likely that there's something wrong with my core admin code than there is with the Lucene internals :) It's possible that I'm occasionally removing files which are currently in use by a live core...

I'm using an ext3 filesystem on a large EC2 instance's own hard disk. I'm not sure how Amazon implement the local hard disk, but I assume it's a real hard disk exposed by the hypervisor.

Thanks,
James

On Fri, May 29, 2009 at 3:41 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> Very interesting: FieldsWriter thinks it's written 12 bytes to the fdx
> file, yet the directory says the file does not exist.
>
> Can you re-run with this new patch? I'm suspecting that FieldsWriter
> wrote to one segment, but somehow we are then looking at the wrong
> segment. The attached patch prints out which segment FieldsWriter
> actually wrote to.
>
> What filesystem & underlying IO system/device are you using?
>
> Mike
>
> On Thu, May 28, 2009 at 10:53 PM, James X
> <hello.nigerian.spamm...@gmail.com> wrote:
> > My apologies for the delay in running this patched Lucene build - I was
> > temporarily pulled onto another piece of work.
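For reference, the CREATE/UNLOAD cycling James describes is driven through Solr's CoreAdmin HTTP API. A minimal sketch of the two calls follows; the host, port and core name are assumptions, and the curl commands are printed rather than executed so the sketch stands alone:

```shell
# CoreAdmin CREATE/UNLOAD calls as used in the workflow above.
# SOLR and CORE are assumed values; adjust for your deployment.
SOLR=http://localhost:8983/solr
CORE=core_example
echo "curl '${SOLR}/admin/cores?action=CREATE&name=${CORE}&instanceDir=${CORE}'"
echo "curl '${SOLR}/admin/cores?action=UNLOAD&core=${CORE}'"
```

UNLOAD in Solr 1.3 removes the core from the running webapp but does not delete its files, which is why external cleanup code can race against a still-live core.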
> >
> > Here is a sample 'fdx size mismatch' exception using the patch Mike
> > supplied:
> >
> > SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs
> > vs 0 length in bytes of _1i.fdx exists=false didInit=false inc=0 dSO=1
> > fieldsWriter.doClose=true fieldsWriter.indexFilePointer=12
> > fieldsWriter.fieldsFilePointer=2395
> >     at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:96)
> >     at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
> >     at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
> >     at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
> >     at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
> >     at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
> >     at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
> >     at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
> >     at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
> >     at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
> >     at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)
> >
> > Will now run with assertions enabled and see how that affects the behaviour!
> >
> > Thanks,
> > James
> >
> > ---------- Forwarded message ----------
> > From: James X <hello.nigerian.spamm...@gmail.com>
> > Date: Thu, May 21, 2009 at 2:24 PM
> > Subject: Re: java.lang.RuntimeException: after flush: fdx size mismatch
> > To: solr-user@lucene.apache.org
> >
> > Hi Mike,
> >
> > Documents are web pages, about 20 fields, mostly strings, a couple of
> > integers, booleans and one html field (for document body content).
> >
> > I do have a multi-threaded client pushing docs to Solr, so yes, I suppose
> > that would mean I have several active Solr worker threads.
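Running "with assertions enabled", as James says above, means passing the JVM's `-ea` flag when starting the container that hosts Solr. A sketch for the Jetty example distribution follows; the start.jar location is an assumption, and the command is printed rather than executed:

```shell
# Enable Java assertions for Lucene's packages when starting Solr under Jetty.
# -ea:org.apache.lucene... scopes assertions to that package and its
# subpackages (the trailing "..." is literal JVM flag syntax).
JAVA_ASSERTS="-ea:org.apache.lucene..."
echo "java $JAVA_ASSERTS -jar start.jar"
```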
> >
> > The only exceptions I have are the RuntimeException flush errors, followed
> > by a handful (normally 10-20) of LockObtainFailedExceptions, which I
> > presumed were being caused by the faulty threads dying and failing to
> > release locks.
> >
> > Oh wait, I am getting WstxUnexpectedCharException exceptions every now and
> > then:
> > SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character
> > ((CTRL-CHAR, code 8))
> >  at [row,col {unknown-source}]: [1,26070]
> >     at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
> >     at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
> >     at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
> >     at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
> >     at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
> >     at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
> >     at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:327)
> >
> > I presumed these were caused by character encoding issues, but haven't
> > looked into them at all yet.
> >
> > Thanks again for your help! I'll make some time this afternoon to build
> > some patched Lucene jars and get the results.
> >
> > On Thu, May 21, 2009 at 5:06 AM, Michael McCandless
> > <luc...@mikemccandless.com> wrote:
> >
> >> Another question: are there any other exceptions in your logs? Eg
> >> problems adding certain documents, or anything?
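The CTRL-CHAR (code 8) failure above is Woodstox correctly rejecting a character that the XML 1.0 specification forbids, so one fix is to scrub such characters on the client before posting documents to the XML update handler. A minimal sketch, assuming a client-side cleanup step (the class and method names are ours, not part of Solr):

```java
// Hedged sketch: strip characters that are illegal in XML 1.0 before
// sending documents to Solr's XmlUpdateRequestHandler.
public class XmlCleaner {
    public static String stripInvalidXmlChars(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            // Legal XML 1.0 characters: tab, LF, CR, and the three
            // printable ranges from the spec's Char production.
            boolean valid = cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
            if (valid) out.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // CTRL-CHAR code 8 (backspace) is the character from the exception above.
        String dirty = "body\btext";
        System.out.println(stripInvalidXmlChars(dirty)); // prints "bodytext"
    }
}
```

Scrubbing at document-build time (rather than catching the exception) keeps partially-parsed updates from ever reaching the index.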
> >>
> >> Mike
> >>
> >> On Wed, May 20, 2009 at 11:18 AM, James X
> >> <hello.nigerian.spamm...@gmail.com> wrote:
> >> > Hi Mike, thanks for the quick response:
> >> >
> >> > $ java -version
> >> > java version "1.6.0_11"
> >> > Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
> >> > Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)
> >> >
> >> > I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not
> >> > hitting that yet!
> >> >
> >> > The exception always reports 0 length, but the number of docs varies,
> >> > heavily weighted towards one or two docs. Of the last 130 or so exceptions:
> >> >  89   1 docs vs 0 length
> >> >  20   2 docs vs 0 length
> >> >   9   3 docs vs 0 length
> >> >   1   4 docs vs 0 length
> >> >   3   5 docs vs 0 length
> >> >   2   6 docs vs 0 length
> >> >   1   7 docs vs 0 length
> >> >   1   9 docs vs 0 length
> >> >   1  10 docs vs 0 length
> >> >
> >> > The only unusual thing I can think of that we're doing with Solr is
> >> > aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot a
> >> > pattern between core admin operations and these exceptions, however...
> >> >
> >> > James
> >> >
> >> > On Wed, May 20, 2009 at 2:37 AM, Michael McCandless
> >> > <luc...@mikemccandless.com> wrote:
> >> >
> >> >> Hmm... somehow Lucene is flushing a new segment on closing the
> >> >> IndexWriter, and thinks 1 doc had been added to the stored fields
> >> >> file, yet the fdx file is the wrong size (0 bytes). This check (&
> >> >> exception) are designed to prevent corruption from entering the index,
> >> >> so it's at least good to see CheckIndex passes after this.
> >> >>
> >> >> I don't think you're hitting LUCENE-1521: that issue only happens if a
> >> >> single segment has more than ~268 million docs.
> >> >>
> >> >> Which exact JRE version are you using?
> >> >>
> >> >> When you hit this exception, is it always "1 docs vs 0 length in bytes"?
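A tally like the one James lists can be pulled straight from the Solr log with standard tools. A sketch, assuming the exception text appears in a flat log file; the printf lines below stand in for the real log so the pipeline is self-contained:

```shell
# Count 'N docs vs 0 length' variants of the flush exception, as in the
# table above. The sample lines are stand-ins for a real solr.log
# (log path and exact message layout are assumptions).
printf '%s\n' \
  'SEVERE: ... fdx size mismatch: 1 docs vs 0 length in bytes of _h.fdx' \
  'SEVERE: ... fdx size mismatch: 1 docs vs 0 length in bytes of _i.fdx' \
  'SEVERE: ... fdx size mismatch: 2 docs vs 0 length in bytes of _j.fdx' \
  | grep -o '[0-9][0-9]* docs vs [0-9][0-9]* length' | sort | uniq -c | sort -rn
# counts: 2 x '1 docs vs 0 length', 1 x '2 docs vs 0 length'
```

Against a live installation, replace the printf with `cat solr.log`.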
> >> >>
> >> >> Mike
> >> >>
> >> >> On Wed, May 20, 2009 at 3:19 AM, James X
> >> >> <hello.nigerian.spamm...@gmail.com> wrote:
> >> >> > Hello all,
> >> >> >
> >> >> > I'm running Solr 1.3 in a multi-core environment. There are up to
> >> >> > 2000 active cores in each Solr webapp instance at any given time.
> >> >> >
> >> >> > I've noticed occasional errors such as:
> >> >> > SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs
> >> >> > vs 0 length in bytes of _h.fdx
> >> >> >     at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94)
> >> >> >     at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
> >> >> >     at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
> >> >> >     at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
> >> >> >     at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
> >> >> >     at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
> >> >> >     at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
> >> >> >     at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
> >> >> >     at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
> >> >> >     at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
> >> >> >     at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)
> >> >> >
> >> >> > during commit / optimise operations.
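When an 'fdx size mismatch' like the one above appears, Lucene's bundled CheckIndex tool can confirm whether any corruption actually reached the on-disk index (James reports elsewhere in the thread that it finds nothing). A sketch of the invocation; the jar name and index path are assumptions, and the command is printed rather than run since it needs a real index:

```shell
# Verify a core's index directory with Lucene's CheckIndex tool.
# INDEX_DIR and the lucene-core jar name are assumed values; only run
# CheckIndex with -fix against a backed-up index.
INDEX_DIR=/var/solr/cores/core_example/data/index
echo "java -cp lucene-core-2.4.1.jar org.apache.lucene.index.CheckIndex $INDEX_DIR"
```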
> >> >> >
> >> >> > These errors then cause cascading errors during updates on the offending
> >> >> > cores:
> >> >> > SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
> >> >> > out: SingleInstanceLock: write.lock
> >> >> >     at org.apache.lucene.store.Lock.obtain(Lock.java:85)
> >> >> >     at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1070)
> >> >> >     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:924)
> >> >> >     at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:116)
> >> >> >     at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122)
> >> >> >
> >> >> > This looks like http://issues.apache.org/jira/browse/LUCENE-1521, but when I
> >> >> > upgraded Lucene to 2.4.1 under Solr 1.3, the issue still remains.
> >> >> >
> >> >> > CheckIndex doesn't find any problems with the index, and the problems
> >> >> > disappear after an (inconvenient, for me) restart of Solr.
> >> >> >
> >> >> > Firstly, as the symptoms are so close to those in 1521, can I check that my
> >> >> > Lucene upgrade method should work:
> >> >> > - unzip the Solr 1.3 war
> >> >> > - remove the Lucene 2.4dev jars
> >> >> >   (lucene-core, lucene-spellchecker, lucene-snowball, lucene-queries,
> >> >> >   lucene-memory, lucene-highlighter, lucene-analyzers)
> >> >> > - move in the Lucene 2.4.1 jars
> >> >> > - rezip the directory structure as solr.war
> >> >> >
> >> >> > I think this has worked, as solr/default/admin/registry.jsp shows:
> >> >> > <lucene-spec-version>2.4.1</lucene-spec-version>
> >> >> > <lucene-impl-version>2.4.1 750176 - 2009-03-04 21:56:52</lucene-impl-version>
> >> >> >
> >> >> > Secondly, if this Lucene fix isn't the right solution to this problem, can
> >> >> > anyone suggest an alternative approach?
> >> >> > The only problem I've had up to now is to do with the number of allowed
> >> >> > file handles, which was fixed by changing limits.conf (RHEL machine).
> >> >> >
> >> >> > Many thanks!
> >> >> > James
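For reference, the limits.conf change James mentions for raising the open-file limit usually looks like the fragment below. The "solr" user name and the 65536 value are assumptions; on RHEL the pam_limits module must also be active for the new limits to apply at login.

```
# /etc/security/limits.conf -- raise the open-file limit for the Solr user.
# User name and value are assumptions; adjust for your deployment.
solr  soft  nofile  65536
solr  hard  nofile  65536
```

After logging the Solr user back in, `ulimit -n` should report the new soft limit.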