Re: java.lang.RuntimeException: after flush: fdx size mismatch

2009-05-30 Thread Michael McCandless
Woops, here's the patch (added you directly on the To: so that you
get the patch; Apache's list manager strips patches).

Yes, if the fdx file is getting deleted out from under Lucene, that'd
also explain what's happening.  Though the timing would have to be
very quick.  What's happening is Lucene had opened _X.fdx for writing,
written a small # of bytes to it, and then closed it but found the
file no longer exists.

I'm not familiar with what exactly happens when you create/unload Solr
cores and move them around machines; does this involve moving files
from one machine to another?  (Ie, deleting files)?  If so, is there
some way to log when such migrations take place and try to correlate
to this exception?
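
For example (just a sketch -- the log file, core name and paths here are
invented), even something as simple as timestamping every migration and
then lining those timestamps up against the Solr log would narrow it down:

  # hypothetical wrapper around an unload-and-move of one core
  echo "$(date -u +%FT%TZ) UNLOAD+move core_0042 -> host-b:/solr/cores/" >> /var/log/core-migrations.log
  rsync -a /solr/cores/core_0042/ host-b:/solr/cores/core_0042/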

Mike

On Fri, May 29, 2009 at 8:29 PM, James X
hello.nigerian.spamm...@gmail.com wrote:
 Hi Mike, I don't see a patch file here?

 Could another explanation be that the fdx file doesn't exist yet / has been
 deleted from underneath Lucene?

 I'm constantly CREATE-ing and UNLOAD-ing Solr cores, and more importantly,
 moving the bundled cores around between machines. I find it much more likely
 that there's something wrong with my core admin code than there is with the
 Lucene internals :) It's possible that I'm occasionally removing files which
 are currently in use by a live core...

 I'm using an ext3 filesystem on a large EC2 instance's own hard disk. I'm
 not sure how Amazon implement the local hard disk, but I assume it's a real
 hard disk exposed by the hypervisor.

 Thanks,
 James

 On Fri, May 29, 2009 at 3:41 AM, Michael McCandless 
 luc...@mikemccandless.com wrote:

 Very interesting: FieldsWriter thinks it's written 12 bytes to the fdx
 file, yet the directory says the file does not exist.

 Can you re-run with this new patch?  I'm suspecting that FieldsWriter
 wrote to one segment, but somehow we are then looking at the wrong
 segment.  The attached patch prints out which segment FieldsWriter
 actually wrote to.

  What filesystem & underlying IO system/device are you using?

 Mike

 On Thu, May 28, 2009 at 10:53 PM, James X
 hello.nigerian.spamm...@gmail.com wrote:
  My apologies for the delay in running this patched Lucene build - I was
  temporarily pulled onto another piece of work.
 
  Here is a sample 'fdx size mismatch' exception using the patch Mike
  supplied:
 
  SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs
  vs 0 length in bytes of _1i.fdx exists=false didInit=false inc=0 dSO=1
  fieldsWriter.doClose=true fieldsWriter.indexFilePointer=12
  fieldsWriter.fieldsFilePointer=2395
         at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:96)
         at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
         at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
         at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
         at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
         at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
         at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
         at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
         at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
         at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
         at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)
 
 
  Will now run with assertions enabled and see how that affects the
 behaviour!
 
  Thanks,
  James
 
  -- Forwarded message --
  From: James X hello.nigerian.spamm...@gmail.com
  Date: Thu, May 21, 2009 at 2:24 PM
  Subject: Re: java.lang.RuntimeException: after flush: fdx size mismatch
  To: solr-user@lucene.apache.org
 
 
  Hi Mike, Documents are web pages, about 20 fields, mostly strings, a
 couple
  of integers, booleans and one html field (for document body content).
 
  I do have a multi-threaded client pushing docs to Solr, so yes, I suppose
  that would mean I have several active Solr worker threads.
 
  The only exceptions I have are the RuntimeException flush errors,
 followed
  by a handful (normally 10-20) of LockObtainFailedExceptions, which I
  presumed were being caused by the faulty threads dying and failing to
  release locks.
 
  Oh wait, I am getting WstxUnexpectedCharException exceptions every now
 and
  then:
  SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character
  ((CTRL-CHAR, code 8))
   at [row,col {unknown-source}]: [1,26070]
         at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
         at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
         at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
         at com.ctc.wstx.sr.BasicStreamReader.finishToken

Re: java.lang.RuntimeException: after flush: fdx size mismatch

2009-05-29 Thread Michael McCandless
Very interesting: FieldsWriter thinks it's written 12 bytes to the fdx
file, yet the directory says the file does not exist.

Can you re-run with this new patch?  I'm suspecting that FieldsWriter
wrote to one segment, but somehow we are then looking at the wrong
segment.  The attached patch prints out which segment FieldsWriter
actually wrote to.

What filesystem & underlying IO system/device are you using?

Mike

On Thu, May 28, 2009 at 10:53 PM, James X
hello.nigerian.spamm...@gmail.com wrote:
 My apologies for the delay in running this patched Lucene build - I was
 temporarily pulled onto another piece of work.

 Here is a sample 'fdx size mismatch' exception using the patch Mike
 supplied:

 SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs
 vs 0 length in bytes of _1i.fdx exists=false didInit=false inc=0 dSO=1
 fieldsWriter.doClose=true fieldsWriter.indexFilePointer=12
 fieldsWriter.fieldsFilePointer=2395
        at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:96)
        at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
        at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
        at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
        at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
        at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
        at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
        at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)


 Will now run with assertions enabled and see how that affects the behaviour!

 Thanks,
 James

 -- Forwarded message --
 From: James X hello.nigerian.spamm...@gmail.com
 Date: Thu, May 21, 2009 at 2:24 PM
 Subject: Re: java.lang.RuntimeException: after flush: fdx size mismatch
 To: solr-user@lucene.apache.org


 Hi Mike, Documents are web pages, about 20 fields, mostly strings, a couple
 of integers, booleans and one html field (for document body content).

 I do have a multi-threaded client pushing docs to Solr, so yes, I suppose
 that would mean I have several active Solr worker threads.

 The only exceptions I have are the RuntimeException flush errors, followed
 by a handful (normally 10-20) of LockObtainFailedExceptions, which I
 presumed were being caused by the faulty threads dying and failing to
 release locks.

 Oh wait, I am getting WstxUnexpectedCharException exceptions every now and
 then:
 SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character
 ((CTRL-CHAR, code 8))
  at [row,col {unknown-source}]: [1,26070]
        at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
        at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
        at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
        at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
        at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
        at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
        at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:327)

 I presumed these were caused by character encoding issues, but haven't
 looked into them at all yet.

 Thanks again for your help! I'll make some time this afternoon to build some
 patched Lucene jars and get the results


 On Thu, May 21, 2009 at 5:06 AM, Michael McCandless 
 luc...@mikemccandless.com wrote:

 Another question: are there any other exceptions in your logs?  Eg
 problems adding certain documents, or anything?

 Mike

 On Wed, May 20, 2009 at 11:18 AM, James X
 hello.nigerian.spamm...@gmail.com wrote:
  Hi Mike, thanks for the quick response:
 
  $ java -version
  java version 1.6.0_11
  Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
  Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)
 
  I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not
  hitting that yet!
 
   The exception always reports 0 length, but the number of docs varies,
   heavily weighted towards one or two docs. Of the last 130 or so exceptions:
      89 1 docs vs 0 length
      20 2 docs vs 0 length
       9 3 docs vs 0 length
       1 4 docs vs 0 length
       3 5 docs vs 0 length
       2 6 docs vs 0 length
       1 7 docs vs 0 length
       1 9 docs vs 0 length
       1 10 docs vs 0 length
 
  The only unusual thing I can think of that we're doing with Solr is
  aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot
 a
  pattern between core

Re: java.lang.RuntimeException: after flush: fdx size mismatch

2009-05-29 Thread James X
Hi Mike, I don't see a patch file here?

Could another explanation be that the fdx file doesn't exist yet / has been
deleted from underneath Lucene?

I'm constantly CREATE-ing and UNLOAD-ing Solr cores, and more importantly,
moving the bundled cores around between machines. I find it much more likely
that there's something wrong with my core admin code than there is with the
Lucene internals :) It's possible that I'm occasionally removing files which
are currently in use by a live core...
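
(One sanity check I should probably add before each move -- just a
sketch, the core path is invented -- is asking lsof whether anything
still holds files open under that core's index directory:

  lsof +D /solr/cores/core_0042/data/index

If that prints anything, the core is presumably still live and its files
shouldn't be touched yet.)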

I'm using an ext3 filesystem on a large EC2 instance's own hard disk. I'm
not sure how Amazon implement the local hard disk, but I assume it's a real
hard disk exposed by the hypervisor.

Thanks,
James

On Fri, May 29, 2009 at 3:41 AM, Michael McCandless 
luc...@mikemccandless.com wrote:

 Very interesting: FieldsWriter thinks it's written 12 bytes to the fdx
 file, yet the directory says the file does not exist.

 Can you re-run with this new patch?  I'm suspecting that FieldsWriter
 wrote to one segment, but somehow we are then looking at the wrong
 segment.  The attached patch prints out which segment FieldsWriter
 actually wrote to.

  What filesystem & underlying IO system/device are you using?

 Mike

 On Thu, May 28, 2009 at 10:53 PM, James X
 hello.nigerian.spamm...@gmail.com wrote:
  My apologies for the delay in running this patched Lucene build - I was
  temporarily pulled onto another piece of work.
 
  Here is a sample 'fdx size mismatch' exception using the patch Mike
  supplied:
 
   SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs
   vs 0 length in bytes of _1i.fdx exists=false didInit=false inc=0 dSO=1
   fieldsWriter.doClose=true fieldsWriter.indexFilePointer=12
   fieldsWriter.fieldsFilePointer=2395
          at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:96)
          at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
          at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
          at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
          at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
          at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
          at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
          at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
          at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
          at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
          at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)
 
 
  Will now run with assertions enabled and see how that affects the
 behaviour!
 
  Thanks,
  James
 
  -- Forwarded message --
  From: James X hello.nigerian.spamm...@gmail.com
  Date: Thu, May 21, 2009 at 2:24 PM
  Subject: Re: java.lang.RuntimeException: after flush: fdx size mismatch
  To: solr-user@lucene.apache.org
 
 
  Hi Mike, Documents are web pages, about 20 fields, mostly strings, a
 couple
  of integers, booleans and one html field (for document body content).
 
  I do have a multi-threaded client pushing docs to Solr, so yes, I suppose
  that would mean I have several active Solr worker threads.
 
  The only exceptions I have are the RuntimeException flush errors,
 followed
  by a handful (normally 10-20) of LockObtainFailedExceptions, which I
  presumed were being caused by the faulty threads dying and failing to
  release locks.
 
  Oh wait, I am getting WstxUnexpectedCharException exceptions every now
 and
  then:
  SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character
  ((CTRL-CHAR, code 8))
   at [row,col {unknown-source}]: [1,26070]
          at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
          at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
          at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
          at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
          at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
          at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
          at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:327)
 
  I presumed these were caused by character encoding issues, but haven't
  looked into them at all yet.
 
  Thanks again for your help! I'll make some time this afternoon to build
 some
  patched Lucene jars and get the results
 
 
  On Thu, May 21, 2009 at 5:06 AM, Michael McCandless 
  luc...@mikemccandless.com wrote:
 
  Another question: are there any other exceptions in your logs?  Eg
  problems adding certain documents, or anything?
 
  Mike
 
  On Wed, May 20, 2009 at 11:18 AM, James X
  hello.nigerian.spamm

Re: java.lang.RuntimeException: after flush: fdx size mismatch

2009-05-21 Thread Michael McCandless
On Wed, May 20, 2009 at 11:18 AM, James X
hello.nigerian.spamm...@gmail.com wrote:
 Hi Mike, thanks for the quick response:

 $ java -version
 java version 1.6.0_11
 Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
 Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)

 I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not
 hitting that yet!

The issue didn't spell this out very well -- I've added a comment.

 The exception always reports 0 length, but the number of docs varies,
 heavily weighted towards one or two docs. Of the last 130 or so exceptions:
     89 1 docs vs 0 length
     20 2 docs vs 0 length
      9 3 docs vs 0 length
      1 4 docs vs 0 length
      3 5 docs vs 0 length
      2 6 docs vs 0 length
      1 7 docs vs 0 length
      1 9 docs vs 0 length
      1 10 docs vs 0 length

Hmm... odd that it's always 0 file length.  What filesystem & IO
devices is the index being written to?

 The only unusual thing I can think of that we're doing with Solr is
 aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot a
 pattern between core admin operations and these exceptions, however...

I think from Lucene's standpoint this just means creating & closing
lots of IndexWriters?  (Which should be just fine).

What are your documents like?  Ie, how many and what type of fields?
Are you adding docs from multiple threads?  (Solr would do so, I
believe, so I guess: is your client that's submitting docs to a given
core, doing so with multiple threads?).

Mike


Re: java.lang.RuntimeException: after flush: fdx size mismatch

2009-05-21 Thread Michael McCandless
Another question: are there any other exceptions in your logs?  Eg
problems adding certain documents, or anything?

Mike

On Wed, May 20, 2009 at 11:18 AM, James X
hello.nigerian.spamm...@gmail.com wrote:
 Hi Mike, thanks for the quick response:

 $ java -version
 java version 1.6.0_11
 Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
 Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)

 I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not
 hitting that yet!

 The exception always reports 0 length, but the number of docs varies,
 heavily weighted towards one or two docs. Of the last 130 or so exceptions:
     89 1 docs vs 0 length
     20 2 docs vs 0 length
      9 3 docs vs 0 length
      1 4 docs vs 0 length
      3 5 docs vs 0 length
      2 6 docs vs 0 length
      1 7 docs vs 0 length
      1 9 docs vs 0 length
      1 10 docs vs 0 length

 The only unusual thing I can think of that we're doing with Solr is
 aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot a
 pattern between core admin operations and these exceptions, however...

 James

 On Wed, May 20, 2009 at 2:37 AM, Michael McCandless 
 luc...@mikemccandless.com wrote:

 Hmm... somehow Lucene is flushing a new segment on closing the
 IndexWriter, and thinks 1 doc had been added to the stored fields
  file, yet the fdx file is the wrong size (0 bytes).  This check (&
 exception) are designed to prevent corruption from entering the index,
 so it's at least good to see CheckIndex passes after this.

 I don't think you're hitting LUCENE-1521: that issue only happens if a
 single segment has more than ~268 million docs.

 Which exact JRE version are you using?

 When you hit this exception, is it always 1 docs vs 0 length in bytes?

 Mike

 On Wed, May 20, 2009 at 3:19 AM, James X
 hello.nigerian.spamm...@gmail.com wrote:
  Hello all, I'm running Solr 1.3 in a multi-core environment. There are up
 to
  2000 active cores in each Solr webapp instance at any given time.
 
  I've noticed occasional errors such as:
   SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs
   vs 0 length in bytes of _h.fdx
          at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94)
          at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
          at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
          at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
          at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
          at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
          at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
          at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
          at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
          at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
          at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)
 
  during commit / optimise operations.
 
  These errors then cause cascading errors during updates on the offending
  cores:
   SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
   out: SingleInstanceLock: write.lock
          at org.apache.lucene.store.Lock.obtain(Lock.java:85)
          at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1070)
          at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:924)
          at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:116)
          at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122)
 
  This looks like http://issues.apache.org/jira/browse/LUCENE-1521, but
 when I
  upgraded Lucene to 2.4.1 under Solr 1.3, the issue still remains.
 
  CheckIndex doesn't find any problems with the index, and problems
 disappear
  after an (inconvenient, for me) restart of Solr.
 
   Firstly, as the symptoms are so close to those in 1521, can I check that my
   Lucene upgrade method should work:
  - unzip the Solr 1.3 war
  - remove the Lucene 2.4dev jars
  (lucene-core, lucene-spellchecker, lucene-snowball, lucene-queries,
  lucene-memory,lucene-highlighter, lucene-analyzers)
  - move in the Lucene 2.4.1 jars
  - rezip the directory structures as solr.war.
 
  I think this has worked, as solr/default/admin/registry.jsp shows:
    <lucene-spec-version>2.4.1</lucene-spec-version>
    <lucene-impl-version>2.4.1 750176 - 2009-03-04 21:56:52</lucene-impl-version>
 
  Secondly, if this Lucene fix isn't the right solution to this problem,
 can
   anyone suggest an alternative approach? The only problem I've had up to
 now
  is to do with the number of allowed file handles, which was fixed by
  changing limits.conf (RHEL machine).
 
  Many thanks!
  James
 




Re: java.lang.RuntimeException: after flush: fdx size mismatch

2009-05-21 Thread Michael McCandless
If you're able to run a patched version of Lucene, can you apply the
attached patch, run it, get the issue to happen again, and post back
the resulting exception?

It only adds further diagnostics to that RuntimeException you're hitting.
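
Roughly, for the record (the patch file name is just whatever you save
the attachment as, and the ant target is from memory -- adjust to your
checkout):

  cd lucene-2.4.1          # unpacked Lucene source
  patch -p0 < fdx-diagnostics.patch
  ant jar-core             # then drop the rebuilt lucene-core jar into solr.war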

Another thing to try is turning on assertions, which may very well
catch the issue sooner.
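
E.g. (exactly where you set this depends on your servlet container; the
JAVA_OPTS variable below is just the usual Tomcat/Jetty convention):

  JAVA_OPTS="$JAVA_OPTS -ea:org.apache.lucene..."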

Mike

On Wed, May 20, 2009 at 11:18 AM, James X
hello.nigerian.spamm...@gmail.com wrote:
 Hi Mike, thanks for the quick response:

 $ java -version
 java version 1.6.0_11
 Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
 Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)

 I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not
 hitting that yet!

 The exception always reports 0 length, but the number of docs varies,
 heavily weighted towards one or two docs. Of the last 130 or so exceptions:
     89 1 docs vs 0 length
     20 2 docs vs 0 length
      9 3 docs vs 0 length
      1 4 docs vs 0 length
      3 5 docs vs 0 length
      2 6 docs vs 0 length
      1 7 docs vs 0 length
      1 9 docs vs 0 length
      1 10 docs vs 0 length

 The only unusual thing I can think of that we're doing with Solr is
 aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot a
 pattern between core admin operations and these exceptions, however...

 James

 On Wed, May 20, 2009 at 2:37 AM, Michael McCandless 
 luc...@mikemccandless.com wrote:

 Hmm... somehow Lucene is flushing a new segment on closing the
 IndexWriter, and thinks 1 doc had been added to the stored fields
  file, yet the fdx file is the wrong size (0 bytes).  This check (&
 exception) are designed to prevent corruption from entering the index,
 so it's at least good to see CheckIndex passes after this.

 I don't think you're hitting LUCENE-1521: that issue only happens if a
 single segment has more than ~268 million docs.

 Which exact JRE version are you using?

 When you hit this exception, is it always 1 docs vs 0 length in bytes?

 Mike

 On Wed, May 20, 2009 at 3:19 AM, James X
 hello.nigerian.spamm...@gmail.com wrote:
  Hello all, I'm running Solr 1.3 in a multi-core environment. There are up
 to
  2000 active cores in each Solr webapp instance at any given time.
 
  I've noticed occasional errors such as:
   SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs
   vs 0 length in bytes of _h.fdx
          at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94)
          at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
          at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
          at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
          at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
          at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
          at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
          at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
          at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
          at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
          at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)
 
  during commit / optimise operations.
 
  These errors then cause cascading errors during updates on the offending
  cores:
   SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
   out: SingleInstanceLock: write.lock
          at org.apache.lucene.store.Lock.obtain(Lock.java:85)
          at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1070)
          at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:924)
          at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:116)
          at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122)
 
  This looks like http://issues.apache.org/jira/browse/LUCENE-1521, but
 when I
  upgraded Lucene to 2.4.1 under Solr 1.3, the issue still remains.
 
  CheckIndex doesn't find any problems with the index, and problems
 disappear
  after an (inconvenient, for me) restart of Solr.
 
   Firstly, as the symptoms are so close to those in 1521, can I check that my
   Lucene upgrade method should work:
  - unzip the Solr 1.3 war
  - remove the Lucene 2.4dev jars
  (lucene-core, lucene-spellchecker, lucene-snowball, lucene-queries,
  lucene-memory,lucene-highlighter, lucene-analyzers)
  - move in the Lucene 2.4.1 jars
  - rezip the directory structures as solr.war.
 
  I think this has worked, as solr/default/admin/registry.jsp shows:
    <lucene-spec-version>2.4.1</lucene-spec-version>
    <lucene-impl-version>2.4.1 750176 - 2009-03-04 21:56:52</lucene-impl-version>
 
  Secondly, if this Lucene fix isn't the right solution to this problem,
 can
  anyone suggest an alternative approach? The only problems I've 

Re: java.lang.RuntimeException: after flush: fdx size mismatch

2009-05-21 Thread James X
Hi Mike, Documents are web pages, about 20 fields, mostly strings, a couple
of integers, booleans and one html field (for document body content).

I do have a multi-threaded client pushing docs to Solr, so yes, I suppose
that would mean I have several active Solr worker threads.

The only exceptions I have are the RuntimeException flush errors, followed
by a handful (normally 10-20) of LockObtainFailedExceptions, which I
presumed were being caused by the faulty threads dying and failing to
release locks.

Oh wait, I am getting WstxUnexpectedCharException exceptions every now and
then:
SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character
((CTRL-CHAR, code 8))
 at [row,col {unknown-source}]: [1,26070]
        at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
        at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
        at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
        at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
        at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
        at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
        at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:327)

I presumed these were caused by character encoding issues, but haven't
looked into them at all yet.
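
If it does turn out to be stray control characters, one crude workaround
would be to strip the XML-illegal ones out of the update message before
it reaches Solr -- just a sketch, the file name and core URL here are
made up:

  tr -d '\000-\010\013\014\016-\037' < doc.xml | \
    curl -s 'http://localhost:8983/solr/core_0042/update' \
         -H 'Content-Type: text/xml' --data-binary @-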

Thanks again for your help! I'll make some time this afternoon to build some
patched Lucene jars and get the results


On Thu, May 21, 2009 at 5:06 AM, Michael McCandless 
luc...@mikemccandless.com wrote:

 Another question: are there any other exceptions in your logs?  Eg
 problems adding certain documents, or anything?

 Mike

 On Wed, May 20, 2009 at 11:18 AM, James X
 hello.nigerian.spamm...@gmail.com wrote:
  Hi Mike, thanks for the quick response:
 
  $ java -version
  java version 1.6.0_11
  Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
  Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)
 
  I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not
  hitting that yet!
 
  The exception always reports 0 length, but the number of docs varies,
  heavily weighted towards one or two docs. Of the last 130 or so exceptions:
  89 1 docs vs 0 length
  20 2 docs vs 0 length
   9 3 docs vs 0 length
   1 4 docs vs 0 length
   3 5 docs vs 0 length
   2 6 docs vs 0 length
   1 7 docs vs 0 length
   1 9 docs vs 0 length
   1 10 docs vs 0 length
 
  The only unusual thing I can think of that we're doing with Solr is
  aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot
 a
  pattern between core admin operations and these exceptions, however...
 
  James
 
  On Wed, May 20, 2009 at 2:37 AM, Michael McCandless 
  luc...@mikemccandless.com wrote:
 
  Hmm... somehow Lucene is flushing a new segment on closing the
  IndexWriter, and thinks 1 doc had been added to the stored fields
   file, yet the fdx file is the wrong size (0 bytes).  This check (&
  exception) are designed to prevent corruption from entering the index,
  so it's at least good to see CheckIndex passes after this.
 
  I don't think you're hitting LUCENE-1521: that issue only happens if a
  single segment has more than ~268 million docs.
 
  Which exact JRE version are you using?
 
  When you hit this exception, is it always 1 docs vs 0 length in bytes?
 
  Mike
 
  On Wed, May 20, 2009 at 3:19 AM, James X
  hello.nigerian.spamm...@gmail.com wrote:
   Hello all, I'm running Solr 1.3 in a multi-core environment. There are
 up
  to
   2000 active cores in each Solr webapp instance at any given time.
  
   I've noticed occasional errors such as:
    SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs
    vs 0 length in bytes of _h.fdx
           at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94)
           at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
           at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
           at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
           at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
           at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
           at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
           at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
           at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
           at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
           at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)
  
   during commit / optimise operations.
  
   These errors then cause cascading errors during updates on the
 

Re: java.lang.RuntimeException: after flush: fdx size mismatch

2009-05-20 Thread Michael McCandless
Hmm... somehow Lucene is flushing a new segment on closing the
IndexWriter, and thinks 1 doc had been added to the stored fields
file, yet the fdx file is the wrong size (0 bytes).  This check (&
exception) are designed to prevent corruption from entering the index,
so it's at least good to see CheckIndex passes after this.
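
(In case it's useful: CheckIndex can be run standalone against a core's
index directory, something like the following -- the index path is just
an example.  Don't add -fix unless you really want it to drop corrupted
segments.)

  java -cp lucene-core-2.4.1.jar org.apache.lucene.index.CheckIndex /path/to/solr/data/index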

I don't think you're hitting LUCENE-1521: that issue only happens if a
single segment has more than ~268 million docs.

Which exact JRE version are you using?

When you hit this exception, is it always 1 docs vs 0 length in bytes?

Mike

On Wed, May 20, 2009 at 3:19 AM, James X
hello.nigerian.spamm...@gmail.com wrote:
 Hello all, I'm running Solr 1.3 in a multi-core environment. There are up to
 2000 active cores in each Solr webapp instance at any given time.

 I've noticed occasional errors such as:
 SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs
 vs 0 length in bytes of _h.fdx
        at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94)
        at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
        at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
        at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
        at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
        at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
        at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
        at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)

 during commit / optimise operations.

 These errors then cause cascading errors during updates on the offending
 cores:
 SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
 out: SingleInstanceLock: write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:85)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1070)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:924)
        at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:116)
        at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122)

 This looks like http://issues.apache.org/jira/browse/LUCENE-1521, but when I
 upgraded Lucene to 2.4.1 under Solr 1.3, the issue still remains.

 CheckIndex doesn't find any problems with the index, and problems disappear
 after an (inconvenient, for me) restart of Solr.

 Firstly, as the symptoms are so close to those in 1521, can I check that my
 Lucene upgrade method should work:
 - unzip the Solr 1.3 war
 - remove the Lucene 2.4dev jars
 (lucene-core, lucene-spellchecker, lucene-snowball, lucene-queries,
 lucene-memory,lucene-highlighter, lucene-analyzers)
 - move in the Lucene 2.4.1 jars
 - rezip the directory structures as solr.war.
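
 Concretely, the repack looks roughly like this (war name, jar names and
 paths are from memory -- adjust to whatever is actually in WEB-INF/lib):

   mkdir solr-war && cd solr-war
   unzip -q ../apache-solr-1.3.0.war
   rm WEB-INF/lib/lucene-*.jar
   cp /path/to/lucene-2.4.1-jars/lucene-*-2.4.1.jar WEB-INF/lib/
   zip -qr ../solr.war .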

 I think this has worked, as solr/default/admin/registry.jsp shows:
   <lucene-spec-version>2.4.1</lucene-spec-version>
   <lucene-impl-version>2.4.1 750176 - 2009-03-04 21:56:52</lucene-impl-version>

 Secondly, if this Lucene fix isn't the right solution to this problem, can
 anyone suggest an alternative approach? The only problem I've had up to now
 is to do with the number of allowed file handles, which was fixed by
 changing limits.conf (RHEL machine).
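
 (For reference, the limits.conf change was along these lines -- the user
 name and numbers here are only illustrative:

   # /etc/security/limits.conf
   solr  soft  nofile  32768
   solr  hard  nofile  65536

 followed by logging the Solr user out/in, or restarting the service, so
 the new limits take effect.)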

 Many thanks!
 James



Re: java.lang.RuntimeException: after flush: fdx size mismatch

2009-05-20 Thread James X
Hi Mike, thanks for the quick response:

$ java -version
java version 1.6.0_11
Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)

I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not
hitting that yet!

The exception always reports 0 length, but the number of docs varies,
heavily weighted towards one or two docs. Of the last 130 or so exceptions:
 89 1 docs vs 0 length
 20 2 docs vs 0 length
  9 3 docs vs 0 length
  1 4 docs vs 0 length
  3 5 docs vs 0 length
  2 6 docs vs 0 length
  1 7 docs vs 0 length
  1 9 docs vs 0 length
  1 10 docs vs 0 length
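
(In case it's useful to reproduce: a tally like that can be pulled
straight out of the log with something along these lines, where the log
path and exact message format are illustrative.)

  grep -o '[0-9]* docs vs 0 length' solr.log | sort -n | uniq -c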

The only unusual thing I can think of that we're doing with Solr is
aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot a
pattern between core admin operations and these exceptions, however...

James

On Wed, May 20, 2009 at 2:37 AM, Michael McCandless 
luc...@mikemccandless.com wrote:

 Hmm... somehow Lucene is flushing a new segment on closing the
 IndexWriter, and thinks 1 doc had been added to the stored fields
  file, yet the fdx file is the wrong size (0 bytes).  This check (&
 exception) are designed to prevent corruption from entering the index,
 so it's at least good to see CheckIndex passes after this.

 I don't think you're hitting LUCENE-1521: that issue only happens if a
 single segment has more than ~268 million docs.

 Which exact JRE version are you using?

 When you hit this exception, is it always 1 docs vs 0 length in bytes?

 Mike

 On Wed, May 20, 2009 at 3:19 AM, James X
 hello.nigerian.spamm...@gmail.com wrote:
  Hello all, I'm running Solr 1.3 in a multi-core environment. There are up
 to
  2000 active cores in each Solr webapp instance at any given time.
 
  I've noticed occasional errors such as:
  SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs
  vs 0 length in bytes of _h.fdx
         at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94)
         at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
         at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
         at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
         at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
         at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
         at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
         at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
         at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
         at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
         at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)
 
  during commit / optimise operations.
 
  These errors then cause cascading errors during updates on the offending
  cores:
  SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
  out: SingleInstanceLock: write.lock
         at org.apache.lucene.store.Lock.obtain(Lock.java:85)
         at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1070)
         at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:924)
         at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:116)
         at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122)
 
  This looks like http://issues.apache.org/jira/browse/LUCENE-1521, but
 when I
  upgraded Lucene to 2.4.1 under Solr 1.3, the issue still remains.
 
  CheckIndex doesn't find any problems with the index, and problems
 disappear
  after an (inconvenient, for me) restart of Solr.
 
   Firstly, as the symptoms are so close to those in 1521, can I check that my
   Lucene upgrade method should work:
  - unzip the Solr 1.3 war
  - remove the Lucene 2.4dev jars
  (lucene-core, lucene-spellchecker, lucene-snowball, lucene-queries,
  lucene-memory,lucene-highlighter, lucene-analyzers)
  - move in the Lucene 2.4.1 jars
  - rezip the directory structures as solr.war.
 
  I think this has worked, as solr/default/admin/registry.jsp shows:
    <lucene-spec-version>2.4.1</lucene-spec-version>
    <lucene-impl-version>2.4.1 750176 - 2009-03-04 21:56:52</lucene-impl-version>
 
  Secondly, if this Lucene fix isn't the right solution to this problem,
 can
   anyone suggest an alternative approach? The only problem I've had up to
 now
  is to do with the number of allowed file handles, which was fixed by
  changing limits.conf (RHEL machine).
 
  Many thanks!
  James