Re: java.lang.RuntimeException: after flush: fdx size mismatch
Whoops, here's the patch (added you, directly, on the To: so that you get the patch; Apache's list manager strips patches). Yes, if the fdx file is getting deleted out from under Lucene, that'd also explain what's happening, though the timing would have to be very quick. What's happening is Lucene had opened _X.fdx for writing, written some small number of bytes to it, and then closed it but found the file no longer exists. I'm not familiar with what exactly happens when you create/unload Solr cores and move them around machines; does this involve moving files from one machine to another (ie, deleting files)? If so, is there some way to log when such migrations take place and try to correlate them to this exception?

Mike

On Fri, May 29, 2009 at 8:29 PM, James X hello.nigerian.spamm...@gmail.com wrote:

Hi Mike, I don't see a patch file here? Could another explanation be that the fdx file doesn't exist yet / has been deleted from underneath Lucene? I'm constantly CREATE-ing and UNLOAD-ing Solr cores, and more importantly, moving the bundled cores around between machines. I find it much more likely that there's something wrong with my core admin code than there is with the Lucene internals :) It's possible that I'm occasionally removing files which are currently in use by a live core... I'm using an ext3 filesystem on a large EC2 instance's own hard disk. I'm not sure how Amazon implement the local hard disk, but I assume it's a real hard disk exposed by the hypervisor.

Thanks, James

On Fri, May 29, 2009 at 3:41 AM, Michael McCandless luc...@mikemccandless.com wrote:

Very interesting: FieldsWriter thinks it's written 12 bytes to the fdx file, yet the directory says the file does not exist. Can you re-run with this new patch? I'm suspecting that FieldsWriter wrote to one segment, but somehow we are then looking at the wrong segment. The attached patch prints out which segment FieldsWriter actually wrote to. What filesystem / underlying IO system/device are you using?

Mike

On Thu, May 28, 2009 at 10:53 PM, James X hello.nigerian.spamm...@gmail.com wrote:

My apologies for the delay in running this patched Lucene build - I was temporarily pulled onto another piece of work. Here is a sample 'fdx size mismatch' exception using the patch Mike supplied:

SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs vs 0 length in bytes of _1i.fdx exists=false didInit=false inc=0 dSO=1 fieldsWriter.doClose=true fieldsWriter.indexFilePointer=12 fieldsWriter.fieldsFilePointer=2395
    at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:96)
    at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
    at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
    at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
    at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
    at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
    at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)

Will now run with assertions enabled and see how that affects the behaviour!
Thanks, James

-- Forwarded message --
From: James X hello.nigerian.spamm...@gmail.com
Date: Thu, May 21, 2009 at 2:24 PM
Subject: Re: java.lang.RuntimeException: after flush: fdx size mismatch
To: solr-user@lucene.apache.org

Hi Mike, Documents are web pages, about 20 fields, mostly strings, a couple of integers, booleans and one html field (for document body content). I do have a multi-threaded client pushing docs to Solr, so yes, I suppose that would mean I have several active Solr worker threads. The only exceptions I have are the RuntimeException flush errors, followed by a handful (normally 10-20) of LockObtainFailedExceptions, which I presumed were being caused by the faulty threads dying and failing to release locks. Oh wait, I am getting WstxUnexpectedCharException exceptions every now and then:

SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 8)) at [row,col {unknown-source}]: [1,26070]
    at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
    at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
    at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
    at com.ctc.wstx.sr.BasicStreamReader.finishToken
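A minimal, self-contained sketch of the failure mode Mike describes at the top of this message (this is not the attached patch), written against the Lucene 2.4-era Directory API: a few bytes go into an fdx file, the file is removed externally while the handle is still open, as a core unload or move might do on ext3, and the close-time check then sees exists=false with 0 length. The path and file name are made up for illustration.

import java.io.File;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IndexOutput;

public class FdxDeleteRace {
    public static void main(String[] args) throws Exception {
        File path = new File("/tmp/fdx-race");
        Directory dir = FSDirectory.getDirectory(path);

        IndexOutput out = dir.createOutput("_x.fdx");
        out.writeInt(0);   // write a few bytes...
        out.writeLong(0L); // ...12 in total, matching fieldsWriter.indexFilePointer=12
        long written = out.getFilePointer();

        // Simulate a core unload / rsync / rm removing the file out from under Lucene;
        // on ext3 the open handle keeps working, so nothing fails until the size check.
        new File(path, "_x.fdx").delete();

        out.close();
        System.out.println("wrote=" + written
            + " exists=" + dir.fileExists("_x.fdx")
            + " length=" + (dir.fileExists("_x.fdx") ? dir.fileLength("_x.fdx") : 0));
        // Prints wrote=12 exists=false length=0 -- the same shape as the
        // "N docs vs 0 length ... exists=false" RuntimeException.
    }
}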
Re: java.lang.RuntimeException: after flush: fdx size mismatch
Very interesting: FieldsWriter thinks it's written 12 bytes to the fdx file, yet the directory says the file does not exist. Can you re-run with this new patch? I'm suspecting that FieldsWriter wrote to one segment, but somehow we are then looking at the wrong segment. The attached patch prints out which segment FieldsWriter actually wrote to. What filesystem underlying IO system/device are you using? Mike On Thu, May 28, 2009 at 10:53 PM, James X hello.nigerian.spamm...@gmail.com wrote: My apologies for the delay in running this patched Lucene build - I was temporarily pulled onto another piece of work. Here is a sample 'fdx size mismatch' exception using the patch Mike supplied: SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs vs 0 length in bytes of _1i.fdx exists=false didInit=false inc=0 dSO=1 fieldsWriter.doClose=true fieldsWriter.indexFilePointer=12 fieldsWriter.fieldsFilePointer=2395 at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:96) at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83) at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47) at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367) at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567) at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540) at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450) at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578) at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153) Will now run with assertions enabled and see how that affects the behaviour! Thanks, James -- Forwarded message -- From: James X hello.nigerian.spamm...@gmail.com Date: Thu, May 21, 2009 at 2:24 PM Subject: Re: java.lang.RuntimeException: after flush: fdx size mismatch To: solr-user@lucene.apache.org Hi Mike,Documents are web pages, about 20 fields, mostly strings, a couple of integers, booleans and one html field (for document body content). I do have a multi-threaded client pushing docs to Solr, so yes, I suppose that would mean I have several active Solr worker threads. The only exceptions I have are the RuntimeException flush errors, followed by a handful (normally 10-20) of LockObtainFailedExceptions, which i presumed were being caused by the faulty threads dying and failing to release locks. Oh wait, I am getting WstxUnexpectedCharException exceptions every now and then: SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 8)) at [row,col {unknown-source}]: [1,26070] at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675) at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668) at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126) at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701) at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649) at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809) at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:327) I presumed these were caused by character encoding issues, but haven't looked into them at all yet. Thanks again for your help! 
I'll make some time this afternoon to build some patched Lucene jars and get the results.

On Thu, May 21, 2009 at 5:06 AM, Michael McCandless luc...@mikemccandless.com wrote:

Another question: are there any other exceptions in your logs? Eg problems adding certain documents, or anything?

Mike

On Wed, May 20, 2009 at 11:18 AM, James X hello.nigerian.spamm...@gmail.com wrote:

Hi Mike, thanks for the quick response:

$ java -version
java version 1.6.0_11
Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)

I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not hitting that yet! The exception always reports 0 length, but the number of docs varies, heavily weighted towards one or two docs. Of the last 130 or so exceptions:

89  1 docs vs 0 length
20  2 docs vs 0 length
 9  3 docs vs 0 length
 1  4 docs vs 0 length
 3  5 docs vs 0 length
 2  6 docs vs 0 length
 1  7 docs vs 0 length
 1  9 docs vs 0 length
 1  10 docs vs 0 length

The only unusual thing I can think of that we're doing with Solr is aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot a pattern between core
Re: java.lang.RuntimeException: after flush: fdx size mismatch
Hi Mike,I don't see a patch file here? Could another explanation be that the fdx file doesn't exist yet / has been deleted from underneath Lucene? I'm constantly CREATE-ing and UNLOAD-ing Solr cores, and more importantly, moving the bundled cores around between machines. I find it much more likely that there's something wrong with my core admin code than there is with the Lucene internals :) It's possible that I'm occasionally removing files which are currently in use by a live core... I'm using an ext3 filesystem on a large EC2 instance's own hard disk. I'm not sure how Amazon implement the local hard disk, but I assume it's a real hard disk exposed by the hypervisor. Thanks, James On Fri, May 29, 2009 at 3:41 AM, Michael McCandless luc...@mikemccandless.com wrote: Very interesting: FieldsWriter thinks it's written 12 bytes to the fdx file, yet the directory says the file does not exist. Can you re-run with this new patch? I'm suspecting that FieldsWriter wrote to one segment, but somehow we are then looking at the wrong segment. The attached patch prints out which segment FieldsWriter actually wrote to. What filesystem underlying IO system/device are you using? Mike On Thu, May 28, 2009 at 10:53 PM, James X hello.nigerian.spamm...@gmail.com wrote: My apologies for the delay in running this patched Lucene build - I was temporarily pulled onto another piece of work. Here is a sample 'fdx size mismatch' exception using the patch Mike supplied: SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs vs 0 length in bytes of _1i.fdx exists=false didInit=false inc=0 dSO=1 fieldsWriter.doClose=true fieldsWriter.indexFilePointer=12 fieldsWriter.fieldsFilePointer=2395 at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:96) at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83) at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47) at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367) at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567) at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540) at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450) at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578) at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153) Will now run with assertions enabled and see how that affects the behaviour! Thanks, James -- Forwarded message -- From: James X hello.nigerian.spamm...@gmail.com Date: Thu, May 21, 2009 at 2:24 PM Subject: Re: java.lang.RuntimeException: after flush: fdx size mismatch To: solr-user@lucene.apache.org Hi Mike,Documents are web pages, about 20 fields, mostly strings, a couple of integers, booleans and one html field (for document body content). I do have a multi-threaded client pushing docs to Solr, so yes, I suppose that would mean I have several active Solr worker threads. The only exceptions I have are the RuntimeException flush errors, followed by a handful (normally 10-20) of LockObtainFailedExceptions, which i presumed were being caused by the faulty threads dying and failing to release locks. 
Oh wait, I am getting WstxUnexpectedCharException exceptions every now and then: SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 8)) at [row,col {unknown-source}]: [1,26070] at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675) at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668) at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126) at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701) at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649) at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809) at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:327) I presumed these were caused by character encoding issues, but haven't looked into them at all yet. Thanks again for your help! I'll make some time this afternoon to build some patched Lucene jars and get the results On Thu, May 21, 2009 at 5:06 AM, Michael McCandless luc...@mikemccandless.com wrote: Another question: are there any other exceptions in your logs? Eg problems adding certain documents, or anything? Mike On Wed, May 20, 2009 at 11:18 AM, James X hello.nigerian.spamm
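A hedged sketch of the kind of guard James mentions wanting in the message above: UNLOAD the core through Solr's CoreAdmin handler and only then move or delete its index directory, so no live IndexWriter still holds the files. The host, port, admin path, and core name are assumptions for illustration; adjust to the actual deployment.

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class UnloadBeforeMove {
    public static void main(String[] args) throws Exception {
        String core = "customer-1234"; // hypothetical core name
        URL unload = new URL("http://localhost:8983/solr/admin/cores?action=UNLOAD&core=" + core);

        HttpURLConnection conn = (HttpURLConnection) unload.openConnection();
        int status = conn.getResponseCode();
        if (status != 200) {
            throw new IllegalStateException("UNLOAD failed for " + core + ": HTTP " + status);
        }
        InputStream in = conn.getInputStream();
        while (in.read() != -1) { /* drain the XML response */ }
        in.close();

        // Only after a successful UNLOAD is it safe to rsync/move/delete the core's
        // index directory; doing so while the core is live is exactly the race that
        // can make _X.fdx vanish under an open writer.
    }
}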
Re: java.lang.RuntimeException: after flush: fdx size mismatch
On Wed, May 20, 2009 at 11:18 AM, James X hello.nigerian.spamm...@gmail.com wrote:

Hi Mike, thanks for the quick response:

$ java -version
java version 1.6.0_11
Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)

I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not hitting that yet!

The issue didn't spell this out very well -- I've added a comment.

The exception always reports 0 length, but the number of docs varies, heavily weighted towards one or two docs. Of the last 130 or so exceptions:

89  1 docs vs 0 length
20  2 docs vs 0 length
 9  3 docs vs 0 length
 1  4 docs vs 0 length
 3  5 docs vs 0 length
 2  6 docs vs 0 length
 1  7 docs vs 0 length
 1  9 docs vs 0 length
 1  10 docs vs 0 length

Hmm... odd that it's always 0 file length. What filesystem and IO devices is the index being written to?

The only unusual thing I can think of that we're doing with Solr is aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot a pattern between core admin operations and these exceptions, however...

I think from Lucene's standpoint this just means creating and closing lots of IndexWriters? (Which should be just fine.)

What are your documents like? Ie, how many and what type of fields? Are you adding docs from multiple threads? (Solr would do so, I believe, so I guess: is your client that's submitting docs to a given core doing so with multiple threads?)

Mike
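As a point of reference for the threading question above, here is a small, self-contained sketch of the pattern under discussion: several client threads feeding documents with around 20 simple fields into one shared IndexWriter, roughly what Solr does per core, using the Lucene 2.4-era API. Field names, counts, and the index path are made up.

import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MultiThreadedIndexing {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.getDirectory(new File("/tmp/mt-index"));
        final IndexWriter writer =
            new IndexWriter(dir, new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);

        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 4; t++) {
            final int thread = t;
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        for (int i = 0; i < 100; i++) {
                            Document doc = new Document();
                            doc.add(new Field("id", thread + "-" + i,
                                    Field.Store.YES, Field.Index.NOT_ANALYZED));
                            for (int f = 0; f < 20; f++) { // ~20 fields per web page
                                doc.add(new Field("field" + f, "value " + f,
                                        Field.Store.YES, Field.Index.ANALYZED));
                            }
                            writer.addDocument(doc); // IndexWriter is safe to share across threads
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.MINUTES);
        writer.close(); // the flush-on-close is where the fdx size check fires
    }
}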
Re: java.lang.RuntimeException: after flush: fdx size mismatch
Another question: are there any other exceptions in your logs? Eg problems adding certain documents, or anything?

Mike

On Wed, May 20, 2009 at 11:18 AM, James X hello.nigerian.spamm...@gmail.com wrote:

Hi Mike, thanks for the quick response:

$ java -version
java version 1.6.0_11
Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)

I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not hitting that yet! The exception always reports 0 length, but the number of docs varies, heavily weighted towards one or two docs. Of the last 130 or so exceptions:

89  1 docs vs 0 length
20  2 docs vs 0 length
 9  3 docs vs 0 length
 1  4 docs vs 0 length
 3  5 docs vs 0 length
 2  6 docs vs 0 length
 1  7 docs vs 0 length
 1  9 docs vs 0 length
 1  10 docs vs 0 length

The only unusual thing I can think of that we're doing with Solr is aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot a pattern between core admin operations and these exceptions, however...

James

On Wed, May 20, 2009 at 2:37 AM, Michael McCandless luc...@mikemccandless.com wrote:

Hmm... somehow Lucene is flushing a new segment on closing the IndexWriter, and thinks 1 doc had been added to the stored fields file, yet the fdx file is the wrong size (0 bytes). This check (and exception) are designed to prevent corruption from entering the index, so it's at least good to see CheckIndex passes after this. I don't think you're hitting LUCENE-1521: that issue only happens if a single segment has more than ~268 million docs. Which exact JRE version are you using? When you hit this exception, is it always 1 docs vs 0 length in bytes?

Mike

On Wed, May 20, 2009 at 3:19 AM, James X hello.nigerian.spamm...@gmail.com wrote:

Hello all, I'm running Solr 1.3 in a multi-core environment. There are up to 2000 active cores in each Solr webapp instance at any given time. I've noticed occasional errors such as:

SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs vs 0 length in bytes of _h.fdx
    at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94)
    at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
    at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
    at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
    at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
    at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
    at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)

during commit / optimise operations.
These errors then cause cascading errors during updates on the offending cores:

SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SingleInstanceLock: write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:85)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1070)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:924)
    at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:116)
    at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122)

This looks like http://issues.apache.org/jira/browse/LUCENE-1521, but when I upgraded Lucene to 2.4.1 under Solr 1.3, the issue still remains. CheckIndex doesn't find any problems with the index, and the problems disappear after an (inconvenient, for me) restart of Solr.

Firstly, as the symptoms are so close to those in 1521, can I check that my Lucene upgrade method should work:
- unzip the Solr 1.3 war
- remove the Lucene 2.4dev jars (lucene-core, lucene-spellchecker, lucene-snowball, lucene-queries, lucene-memory, lucene-highlighter, lucene-analyzers)
- move in the Lucene 2.4.1 jars
- rezip the directory structure as solr.war

I think this has worked, as solr/default/admin/registry.jsp shows:

<lucene-spec-version>2.4.1</lucene-spec-version>
<lucene-impl-version>2.4.1 750176 - 2009-03-04 21:56:52</lucene-impl-version>

Secondly, if this Lucene fix isn't the right solution to this problem, can anyone suggest an alternative approach? The only problem I've had up to now is to do with the number of allowed file handles, which was fixed by changing limits.conf (RHEL machine).

Many thanks! James
Re: java.lang.RuntimeException: after flush: fdx size mismatch
If you're able to run a patched version of Lucene, can you apply the attached patch, run it, get the issue to happen again, and post back the resulting exception? It only adds further diagnostics to that RuntimeException you're hitting. Another thing to try is turning on assertions, which may very well catch the issue sooner. Mike On Wed, May 20, 2009 at 11:18 AM, James X hello.nigerian.spamm...@gmail.com wrote: Hi Mike, thanks for the quick response: $ java -version java version 1.6.0_11 Java(TM) SE Runtime Environment (build 1.6.0_11-b03) Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode) I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not hitting that yet! The exception always reports 0 length, but the number of of docs varies, heavily weighted towards 1 or two docs. Of the last 130 or so exceptions: 89 1 docs vs 0 length 20 2 docs vs 0 length 9 3 docs vs 0 length 1 4 docs vs 0 length 3 5 docs vs 0 length 2 6 docs vs 0 length 1 7 docs vs 0 length 1 9 docs vs 0 length 1 10 docs vs 0 length The only unusual thing I can think of that we're doing with Solr is aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot a pattern between core admin operations and these exceptions, however... James On Wed, May 20, 2009 at 2:37 AM, Michael McCandless luc...@mikemccandless.com wrote: Hmm... somehow Lucene is flushing a new segment on closing the IndexWriter, and thinks 1 doc had been added to the stored fields file, yet the fdx file is the wrong size (0 bytes). This check ( exception) are designed to prevent corruption from entering the index, so it's at least good to see CheckIndex passes after this. I don't think you're hitting LUCENE-1521: that issue only happens if a single segment has more than ~268 million docs. Which exact JRE version are you using? When you hit this exception, is it always 1 docs vs 0 length in bytes? Mike On Wed, May 20, 2009 at 3:19 AM, James X hello.nigerian.spamm...@gmail.com wrote: Hello all,I'm running Solr 1.3 in a multi-core environment. There are up to 2000 active cores in each Solr webapp instance at any given time. I've noticed occasional errors such as: SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs vs 0 length in bytes of _h.fdx at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94) at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83) at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47) at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367) at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567) at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540) at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450) at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578) at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153) during commit / optimise operations. 
These errors then cause cascading errors during updates on the offending cores: SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SingleInstanceLock: write.lock at org.apache.lucene.store.Lock.obtain(Lock.java:85) at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1070) at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:924) at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:116) at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122) This looks like http://issues.apache.org/jira/browse/LUCENE-1521, but when I upgraded Lucene to 2.4.1 under Solr 1.3, the issue still remains. CheckIndex doesn't find any problems with the index, and problems disappear after an (inconvenient, for me) restart of Solr. Firstly, can I as the symptoms are so close to those in 1521, can I check my Lucene upgrade method should work: - unzip the Solr 1.3 war - remove the Lucene 2.4dev jars (lucene-core, lucene-spellchecker, lucene-snowball, lucene-queries, lucene-memory,lucene-highlighter, lucene-analyzers) - move in the Lucene 2.4.1 jars - rezip the directory structures as solr.war. I think this has worked, as solr/default/admin/registry.jsp shows: lucene-spec-version2.4.1/lucene-spec-version lucene-impl-version2.4.1 750176 - 2009-03-04 21:56:52/lucene-impl-version Secondly, if this Lucene fix isn't the right solution to this problem, can anyone suggest an alternative approach? The only problems I've
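On the suggestion at the top of the message above to turn on assertions: Lucene's extra internal checks only run when the JVM is started with assertions enabled (for example a flag along the lines of -ea:org.apache.lucene... on the Solr JVM). A quick, generic JDK-only way to confirm that -ea actually took effect, for instance by logging it at startup, is the standard side-effect idiom below; this is just a sketch, not part of the patch.

public class AssertCheck {
    public static void main(String[] args) {
        boolean assertsEnabled = false;
        // The assignment inside the assert only executes when assertions are enabled.
        assert assertsEnabled = true;
        System.out.println("assertions enabled: " + assertsEnabled);
    }
}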
Re: java.lang.RuntimeException: after flush: fdx size mismatch
Hi Mike,Documents are web pages, about 20 fields, mostly strings, a couple of integers, booleans and one html field (for document body content). I do have a multi-threaded client pushing docs to Solr, so yes, I suppose that would mean I have several active Solr worker threads. The only exceptions I have are the RuntimeException flush errors, followed by a handful (normally 10-20) of LockObtainFailedExceptions, which i presumed were being caused by the faulty threads dying and failing to release locks. Oh wait, I am getting WstxUnexpectedCharException exceptions every now and then: SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 8)) at [row,col {unknown-source}]: [1,26070] at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675) at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668) at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126) at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701) at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649) at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809) at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:327) I presumed these were caused by character encoding issues, but haven't looked into them at all yet. Thanks again for your help! I'll make some time this afternoon to build some patched Lucene jars and get the results On Thu, May 21, 2009 at 5:06 AM, Michael McCandless luc...@mikemccandless.com wrote: Another question: are there any other exceptions in your logs? Eg problems adding certain documents, or anything? Mike On Wed, May 20, 2009 at 11:18 AM, James X hello.nigerian.spamm...@gmail.com wrote: Hi Mike, thanks for the quick response: $ java -version java version 1.6.0_11 Java(TM) SE Runtime Environment (build 1.6.0_11-b03) Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode) I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not hitting that yet! The exception always reports 0 length, but the number of of docs varies, heavily weighted towards 1 or two docs. Of the last 130 or so exceptions: 89 1 docs vs 0 length 20 2 docs vs 0 length 9 3 docs vs 0 length 1 4 docs vs 0 length 3 5 docs vs 0 length 2 6 docs vs 0 length 1 7 docs vs 0 length 1 9 docs vs 0 length 1 10 docs vs 0 length The only unusual thing I can think of that we're doing with Solr is aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot a pattern between core admin operations and these exceptions, however... James On Wed, May 20, 2009 at 2:37 AM, Michael McCandless luc...@mikemccandless.com wrote: Hmm... somehow Lucene is flushing a new segment on closing the IndexWriter, and thinks 1 doc had been added to the stored fields file, yet the fdx file is the wrong size (0 bytes). This check ( exception) are designed to prevent corruption from entering the index, so it's at least good to see CheckIndex passes after this. I don't think you're hitting LUCENE-1521: that issue only happens if a single segment has more than ~268 million docs. Which exact JRE version are you using? When you hit this exception, is it always 1 docs vs 0 length in bytes? Mike On Wed, May 20, 2009 at 3:19 AM, James X hello.nigerian.spamm...@gmail.com wrote: Hello all,I'm running Solr 1.3 in a multi-core environment. There are up to 2000 active cores in each Solr webapp instance at any given time. 
I've noticed occasional errors such as: SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs vs 0 length in bytes of _h.fdx at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94) at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83) at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47) at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367) at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567) at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540) at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450) at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578) at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153) during commit / optimise operations. These errors then cause cascading errors during updates on the
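Regarding the Woodstox errors James mentions above: XML 1.0 does not allow most control characters (CTRL-CHAR code 8 is a backspace), so one option is to scrub field values on the client before building the XML update request. A minimal illustrative helper, not part of Solr; supplementary characters above U+FFFF would need extra surrogate handling, which is skipped here for brevity.

public final class XmlSafe {
    private XmlSafe() {}

    /** Drop characters that are not legal in XML 1.0 documents. */
    public static String scrub(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean legal = c == '\t' || c == '\n' || c == '\r'
                    || (c >= 0x20 && c <= 0xD7FF)
                    || (c >= 0xE000 && c <= 0xFFFD);
            if (legal) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The stray backspace (code 8) is removed before the value reaches the XML parser.
        System.out.println(scrub("page body with a stray \b control character"));
    }
}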
Re: java.lang.RuntimeException: after flush: fdx size mismatch
Hmm... somehow Lucene is flushing a new segment on closing the IndexWriter, and thinks 1 doc had been added to the stored fields file, yet the fdx file is the wrong size (0 bytes). This check ( exception) are designed to prevent corruption from entering the index, so it's at least good to see CheckIndex passes after this. I don't think you're hitting LUCENE-1521: that issue only happens if a single segment has more than ~268 million docs. Which exact JRE version are you using? When you hit this exception, is it always 1 docs vs 0 length in bytes? Mike On Wed, May 20, 2009 at 3:19 AM, James X hello.nigerian.spamm...@gmail.com wrote: Hello all,I'm running Solr 1.3 in a multi-core environment. There are up to 2000 active cores in each Solr webapp instance at any given time. I've noticed occasional errors such as: SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs vs 0 length in bytes of _h.fdx at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94) at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83) at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47) at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367) at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567) at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540) at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450) at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578) at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153) during commit / optimise operations. These errors then cause cascading errors during updates on the offending cores: SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SingleInstanceLock: write.lock at org.apache.lucene.store.Lock.obtain(Lock.java:85) at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1070) at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:924) at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:116) at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122) This looks like http://issues.apache.org/jira/browse/LUCENE-1521, but when I upgraded Lucene to 2.4.1 under Solr 1.3, the issue still remains. CheckIndex doesn't find any problems with the index, and problems disappear after an (inconvenient, for me) restart of Solr. Firstly, can I as the symptoms are so close to those in 1521, can I check my Lucene upgrade method should work: - unzip the Solr 1.3 war - remove the Lucene 2.4dev jars (lucene-core, lucene-spellchecker, lucene-snowball, lucene-queries, lucene-memory,lucene-highlighter, lucene-analyzers) - move in the Lucene 2.4.1 jars - rezip the directory structures as solr.war. I think this has worked, as solr/default/admin/registry.jsp shows: lucene-spec-version2.4.1/lucene-spec-version lucene-impl-version2.4.1 750176 - 2009-03-04 21:56:52/lucene-impl-version Secondly, if this Lucene fix isn't the right solution to this problem, can anyone suggest an alternative approach? The only problems I've had up to now is to do with the number of allowed file handles, which was fixed by changing limits.conf (RHEL machine). Many thanks! James
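For reference, the CheckIndex verification mentioned above can also be run programmatically against a core's index directory. The sketch below assumes the instance-based CheckIndex API shipped with Lucene 2.4.x (the command-line form, java org.apache.lucene.index.CheckIndex <indexDir>, does the same job); the index path is made up.

import java.io.File;

import org.apache.lucene.index.CheckIndex;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class VerifyIndex {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.getDirectory(new File("/var/solr/cores/core0/data/index"));
        CheckIndex checker = new CheckIndex(dir);
        CheckIndex.Status status = checker.checkIndex(); // read-only verification
        System.out.println(status.clean ? "index is clean" : "index has problems");
    }
}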
Re: java.lang.RuntimeException: after flush: fdx size mismatch
Hi Mike, thanks for the quick response: $ java -version java version 1.6.0_11 Java(TM) SE Runtime Environment (build 1.6.0_11-b03) Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode) I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not hitting that yet! The exception always reports 0 length, but the number of of docs varies, heavily weighted towards 1 or two docs. Of the last 130 or so exceptions: 89 1 docs vs 0 length 20 2 docs vs 0 length 9 3 docs vs 0 length 1 4 docs vs 0 length 3 5 docs vs 0 length 2 6 docs vs 0 length 1 7 docs vs 0 length 1 9 docs vs 0 length 1 10 docs vs 0 length The only unusual thing I can think of that we're doing with Solr is aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot a pattern between core admin operations and these exceptions, however... James On Wed, May 20, 2009 at 2:37 AM, Michael McCandless luc...@mikemccandless.com wrote: Hmm... somehow Lucene is flushing a new segment on closing the IndexWriter, and thinks 1 doc had been added to the stored fields file, yet the fdx file is the wrong size (0 bytes). This check ( exception) are designed to prevent corruption from entering the index, so it's at least good to see CheckIndex passes after this. I don't think you're hitting LUCENE-1521: that issue only happens if a single segment has more than ~268 million docs. Which exact JRE version are you using? When you hit this exception, is it always 1 docs vs 0 length in bytes? Mike On Wed, May 20, 2009 at 3:19 AM, James X hello.nigerian.spamm...@gmail.com wrote: Hello all,I'm running Solr 1.3 in a multi-core environment. There are up to 2000 active cores in each Solr webapp instance at any given time. I've noticed occasional errors such as: SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs vs 0 length in bytes of _h.fdx at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94) at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83) at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47) at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367) at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567) at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540) at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450) at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578) at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153) during commit / optimise operations. These errors then cause cascading errors during updates on the offending cores: SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SingleInstanceLock: write.lock at org.apache.lucene.store.Lock.obtain(Lock.java:85) at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1070) at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:924) at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:116) at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122) This looks like http://issues.apache.org/jira/browse/LUCENE-1521, but when I upgraded Lucene to 2.4.1 under Solr 1.3, the issue still remains. CheckIndex doesn't find any problems with the index, and problems disappear after an (inconvenient, for me) restart of Solr. 
Firstly, as the symptoms are so close to those in 1521, can I check that my Lucene upgrade method should work:
- unzip the Solr 1.3 war
- remove the Lucene 2.4dev jars (lucene-core, lucene-spellchecker, lucene-snowball, lucene-queries, lucene-memory, lucene-highlighter, lucene-analyzers)
- move in the Lucene 2.4.1 jars
- rezip the directory structure as solr.war

I think this has worked, as solr/default/admin/registry.jsp shows:

<lucene-spec-version>2.4.1</lucene-spec-version>
<lucene-impl-version>2.4.1 750176 - 2009-03-04 21:56:52</lucene-impl-version>

Secondly, if this Lucene fix isn't the right solution to this problem, can anyone suggest an alternative approach? The only problem I've had up to now is to do with the number of allowed file handles, which was fixed by changing limits.conf (RHEL machine).

Many thanks! James
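Besides registry.jsp, another way to confirm which lucene-core the jar swap actually put on Solr's classpath is to ask the running JVM: LucenePackage reports the version attributes from the loaded jar's manifest. A small sketch, which could just as well be logged from a startup hook inside the webapp; the class name here is made up.

import org.apache.lucene.LucenePackage;

public class WhichLucene {
    public static void main(String[] args) {
        Package p = LucenePackage.get();
        // Both values come from the lucene-core jar manifest and may be null if it lacks them.
        System.out.println("spec version: " + p.getSpecificationVersion());   // e.g. 2.4.1
        System.out.println("impl version: " + p.getImplementationVersion());  // e.g. 2.4.1 750176 - ...
    }
}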