Hi Mike,

I don't see a patch file here? Could another explanation be that the fdx file doesn't exist yet, or has been deleted from underneath Lucene?
I'm constantly CREATE-ing and UNLOAD-ing Solr cores, and more importantly, moving the bundled cores around between machines. I find it much more likely that there's something wrong with my core admin code than there is with the Lucene internals :) It's possible that I'm occasionally removing files which are currently in use by a live core...

I'm using an ext3 filesystem on a large EC2 instance's own hard disk. I'm not sure how Amazon implement the local hard disk, but I assume it's a real hard disk exposed by the hypervisor.

Thanks,
James

On Fri, May 29, 2009 at 3:41 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> Very interesting: FieldsWriter thinks it's written 12 bytes to the fdx
> file, yet the directory says the file does not exist.
>
> Can you re-run with this new patch? I'm suspecting that FieldsWriter
> wrote to one segment, but somehow we are then looking at the wrong
> segment. The attached patch prints out which segment FieldsWriter
> actually wrote to.
>
> What filesystem & underlying IO system/device are you using?
>
> Mike
>
> On Thu, May 28, 2009 at 10:53 PM, James X
> <hello.nigerian.spamm...@gmail.com> wrote:
> > My apologies for the delay in running this patched Lucene build - I was
> > temporarily pulled onto another piece of work.
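For reference, the CREATE/UNLOAD cycling James describes is driven through Solr's CoreAdmin HTTP API. A minimal sketch of the two calls follows; the host, port and core name are assumptions, and the curl commands are printed rather than executed so the sketch stands alone:

```shell
# CoreAdmin CREATE/UNLOAD calls as used in the workflow above.
# SOLR and CORE are assumed values; adjust for your deployment.
SOLR=http://localhost:8983/solr
CORE=core_example
echo "curl '${SOLR}/admin/cores?action=CREATE&name=${CORE}&instanceDir=${CORE}'"
echo "curl '${SOLR}/admin/cores?action=UNLOAD&core=${CORE}'"
```

UNLOAD in Solr 1.3 removes the core from the running webapp but does not delete its files, which is why external cleanup code can race against a still-live core.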
> >
> > Here is a sample 'fdx size mismatch' exception using the patch Mike
> > supplied:
> >
> > SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs
> > vs 0 length in bytes of _1i.fdx exists=false didInit=false inc=0 dSO=1
> > fieldsWriter.doClose=true fieldsWriter.indexFilePointer=12
> > fieldsWriter.fieldsFilePointer=2395
> >     at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:96)
> >     at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
> >     at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
> >     at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
> >     at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
> >     at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
> >     at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
> >     at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
> >     at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
> >     at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
> >     at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)
> >
> > Will now run with assertions enabled and see how that affects the behaviour!
> >
> > Thanks,
> > James
> >
> > ---------- Forwarded message ----------
> > From: James X <hello.nigerian.spamm...@gmail.com>
> > Date: Thu, May 21, 2009 at 2:24 PM
> > Subject: Re: java.lang.RuntimeException: after flush: fdx size mismatch
> > To: solr-user@lucene.apache.org
> >
> > Hi Mike,
> >
> > Documents are web pages, about 20 fields, mostly strings, a couple of
> > integers, booleans and one html field (for document body content).
> >
> > I do have a multi-threaded client pushing docs to Solr, so yes, I suppose
> > that would mean I have several active Solr worker threads.
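Running "with assertions enabled", as James says above, means passing the JVM's `-ea` flag when starting the container that hosts Solr. A sketch for the Jetty example distribution follows; the start.jar location is an assumption, and the command is printed rather than executed:

```shell
# Enable Java assertions for Lucene's packages when starting Solr under Jetty.
# -ea:org.apache.lucene... scopes assertions to that package and its
# subpackages (the trailing "..." is literal JVM flag syntax).
JAVA_ASSERTS="-ea:org.apache.lucene..."
echo "java $JAVA_ASSERTS -jar start.jar"
```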
> >
> > The only exceptions I have are the RuntimeException flush errors, followed
> > by a handful (normally 10-20) of LockObtainFailedExceptions, which I
> > presumed were being caused by the faulty threads dying and failing to
> > release locks.
> >
> > Oh wait, I am getting WstxUnexpectedCharException exceptions every now and
> > then:
> > SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character
> > ((CTRL-CHAR, code 8))
> >  at [row,col {unknown-source}]: [1,26070]
> >     at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
> >     at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
> >     at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
> >     at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
> >     at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
> >     at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
> >     at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:327)
> >
> > I presumed these were caused by character encoding issues, but haven't
> > looked into them at all yet.
> >
> > Thanks again for your help! I'll make some time this afternoon to build
> > some patched Lucene jars and get the results.
> >
> > On Thu, May 21, 2009 at 5:06 AM, Michael McCandless
> > <luc...@mikemccandless.com> wrote:
> >
> >> Another question: are there any other exceptions in your logs? Eg
> >> problems adding certain documents, or anything?
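The CTRL-CHAR (code 8) failure above is Woodstox correctly rejecting a character that the XML 1.0 specification forbids, so one fix is to scrub such characters on the client before posting documents to the XML update handler. A minimal sketch, assuming a client-side cleanup step (the class and method names are ours, not part of Solr):

```java
// Hedged sketch: strip characters that are illegal in XML 1.0 before
// sending documents to Solr's XmlUpdateRequestHandler.
public class XmlCleaner {
    public static String stripInvalidXmlChars(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            // Legal XML 1.0 characters: tab, LF, CR, and the three
            // printable ranges from the spec's Char production.
            boolean valid = cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
            if (valid) out.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // CTRL-CHAR code 8 (backspace) is the character from the exception above.
        String dirty = "body\btext";
        System.out.println(stripInvalidXmlChars(dirty)); // prints "bodytext"
    }
}
```

Scrubbing at document-build time (rather than catching the exception) keeps partially-parsed updates from ever reaching the index.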
> >>
> >> Mike
> >>
> >> On Wed, May 20, 2009 at 11:18 AM, James X
> >> <hello.nigerian.spamm...@gmail.com> wrote:
> >> > Hi Mike, thanks for the quick response:
> >> >
> >> > $ java -version
> >> > java version "1.6.0_11"
> >> > Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
> >> > Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)
> >> >
> >> > I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not
> >> > hitting that yet!
> >> >
> >> > The exception always reports 0 length, but the number of docs varies,
> >> > heavily weighted towards one or two docs. Of the last 130 or so exceptions:
> >> >  89   1 docs vs 0 length
> >> >  20   2 docs vs 0 length
> >> >   9   3 docs vs 0 length
> >> >   1   4 docs vs 0 length
> >> >   3   5 docs vs 0 length
> >> >   2   6 docs vs 0 length
> >> >   1   7 docs vs 0 length
> >> >   1   9 docs vs 0 length
> >> >   1  10 docs vs 0 length
> >> >
> >> > The only unusual thing I can think of that we're doing with Solr is
> >> > aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot a
> >> > pattern between core admin operations and these exceptions, however...
> >> >
> >> > James
> >> >
> >> > On Wed, May 20, 2009 at 2:37 AM, Michael McCandless
> >> > <luc...@mikemccandless.com> wrote:
> >> >
> >> >> Hmm... somehow Lucene is flushing a new segment on closing the
> >> >> IndexWriter, and thinks 1 doc had been added to the stored fields
> >> >> file, yet the fdx file is the wrong size (0 bytes). This check (&
> >> >> exception) are designed to prevent corruption from entering the index,
> >> >> so it's at least good to see CheckIndex passes after this.
> >> >>
> >> >> I don't think you're hitting LUCENE-1521: that issue only happens if a
> >> >> single segment has more than ~268 million docs.
> >> >>
> >> >> Which exact JRE version are you using?
> >> >>
> >> >> When you hit this exception, is it always "1 docs vs 0 length in bytes"?
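A tally like the one James lists can be pulled straight from the Solr log with standard tools. A sketch, assuming the exception text appears in a flat log file; the printf lines below stand in for the real log so the pipeline is self-contained:

```shell
# Count 'N docs vs 0 length' variants of the flush exception, as in the
# table above. The sample lines are stand-ins for a real solr.log
# (log path and exact message layout are assumptions).
printf '%s\n' \
  'SEVERE: ... fdx size mismatch: 1 docs vs 0 length in bytes of _h.fdx' \
  'SEVERE: ... fdx size mismatch: 1 docs vs 0 length in bytes of _i.fdx' \
  'SEVERE: ... fdx size mismatch: 2 docs vs 0 length in bytes of _j.fdx' \
  | grep -o '[0-9][0-9]* docs vs [0-9][0-9]* length' | sort | uniq -c | sort -rn
# counts: 2 x '1 docs vs 0 length', 1 x '2 docs vs 0 length'
```

Against a live installation, replace the printf with `cat solr.log`.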
> >> >>
> >> >> Mike
> >> >>
> >> >> On Wed, May 20, 2009 at 3:19 AM, James X
> >> >> <hello.nigerian.spamm...@gmail.com> wrote:
> >> >> > Hello all,
> >> >> >
> >> >> > I'm running Solr 1.3 in a multi-core environment. There are up to
> >> >> > 2000 active cores in each Solr webapp instance at any given time.
> >> >> >
> >> >> > I've noticed occasional errors such as:
> >> >> > SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs
> >> >> > vs 0 length in bytes of _h.fdx
> >> >> >     at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94)
> >> >> >     at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
> >> >> >     at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
> >> >> >     at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
> >> >> >     at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
> >> >> >     at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
> >> >> >     at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
> >> >> >     at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
> >> >> >     at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
> >> >> >     at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
> >> >> >     at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)
> >> >> >
> >> >> > during commit / optimise operations.
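When an 'fdx size mismatch' like the one above appears, Lucene's bundled CheckIndex tool can confirm whether any corruption actually reached the on-disk index (James reports elsewhere in the thread that it finds nothing). A sketch of the invocation; the jar name and index path are assumptions, and the command is printed rather than run since it needs a real index:

```shell
# Verify a core's index directory with Lucene's CheckIndex tool.
# INDEX_DIR and the lucene-core jar name are assumed values; only run
# CheckIndex with -fix against a backed-up index.
INDEX_DIR=/var/solr/cores/core_example/data/index
echo "java -cp lucene-core-2.4.1.jar org.apache.lucene.index.CheckIndex $INDEX_DIR"
```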
> >> >> >
> >> >> > These errors then cause cascading errors during updates on the offending
> >> >> > cores:
> >> >> > SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
> >> >> > out: SingleInstanceLock: write.lock
> >> >> >     at org.apache.lucene.store.Lock.obtain(Lock.java:85)
> >> >> >     at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1070)
> >> >> >     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:924)
> >> >> >     at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:116)
> >> >> >     at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122)
> >> >> >
> >> >> > This looks like http://issues.apache.org/jira/browse/LUCENE-1521, but when I
> >> >> > upgraded Lucene to 2.4.1 under Solr 1.3, the issue still remains.
> >> >> >
> >> >> > CheckIndex doesn't find any problems with the index, and the problems
> >> >> > disappear after an (inconvenient, for me) restart of Solr.
> >> >> >
> >> >> > Firstly, as the symptoms are so close to those in 1521, can I check that my
> >> >> > Lucene upgrade method should work:
> >> >> > - unzip the Solr 1.3 war
> >> >> > - remove the Lucene 2.4dev jars
> >> >> >   (lucene-core, lucene-spellchecker, lucene-snowball, lucene-queries,
> >> >> >   lucene-memory, lucene-highlighter, lucene-analyzers)
> >> >> > - move in the Lucene 2.4.1 jars
> >> >> > - rezip the directory structure as solr.war
> >> >> >
> >> >> > I think this has worked, as solr/default/admin/registry.jsp shows:
> >> >> > <lucene-spec-version>2.4.1</lucene-spec-version>
> >> >> > <lucene-impl-version>2.4.1 750176 - 2009-03-04 21:56:52</lucene-impl-version>
> >> >> >
> >> >> > Secondly, if this Lucene fix isn't the right solution to this problem, can
> >> >> > anyone suggest an alternative approach?
> >> >> > The only problem I've had up to now is to do with the number of allowed
> >> >> > file handles, which was fixed by changing limits.conf (RHEL machine).
> >> >> >
> >> >> > Many thanks!
> >> >> > James
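For reference, the limits.conf change James mentions for raising the open-file limit usually looks like the fragment below. The "solr" user name and the 65536 value are assumptions; on RHEL the pam_limits module must also be active for the new limits to apply at login.

```
# /etc/security/limits.conf -- raise the open-file limit for the Solr user.
# User name and value are assumptions; adjust for your deployment.
solr  soft  nofile  65536
solr  hard  nofile  65536
```

After logging the Solr user back in, `ulimit -n` should report the new soft limit.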