Re: IOException: read past EOF during optimize phase

2008-01-16 Thread Otis Gospodnetic
Kevin,

Don't have the answer to EOF, but I'm wondering why the index is moving. You
don't need to do that as far as Solr is concerned.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: Kevin Osborn [EMAIL PROTECTED]
To: Solr solr-user@lucene.apache.org
Sent: Wednesday, January 16, 2008 3:07:23 PM
Subject: IOException: read past EOF during optimize phase

I am using the embedded Solr API for my indexing process. I created a
brand new index with my application without any problem. I then ran my
indexer in incremental mode. This process copies the working index to a
temporary Solr location, adds/updates any records, optimizes the index,
and then copies it back to the working location. There are currently no
instances of Solr reading this index. Also, I commit after every 10 rows.
The schema.xml and solrconfig.xml files have not changed.

Here is my function call:

protected void optimizeProducts() throws IOException {
    UpdateHandler updateHandler = m_SolrCore.getUpdateHandler();

    // Build the commit command and flag it as an optimize.
    CommitUpdateCommand commitCmd = new CommitUpdateCommand(true);
    commitCmd.optimize = true;

    updateHandler.commit(commitCmd);

    log.info("Optimized index");
}

So, during the optimize phase, I get the following stack trace:
java.io.IOException: read past EOF
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:89)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:34)
    at org.apache.lucene.store.IndexInput.readChars(IndexInput.java:107)
    at org.apache.lucene.store.IndexInput.readString(IndexInput.java:93)
    at org.apache.lucene.index.FieldsReader.addFieldForMerge(FieldsReader.java:211)
    at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:119)
    at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:323)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:206)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:96)
    at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:1835)
    at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1195)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:508)
    at ...

There are no exceptions or anything else that appears to be incorrect
during the adds or commits. After this, the index files are still
non-optimized.

I know there is not a whole lot to go on here. Anything in particular
that I should look at?

Re: IOException: read past EOF during optimize phase

2008-01-16 Thread Otis Gospodnetic
Kevin,

Perhaps you want to look at how Solr can be used in a master-slave setup.  This 
will separate your indexing from searching.  Don't have the URL, but it's on 
zee Wiki.

Otis 

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: Kevin Osborn [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, January 16, 2008 5:25:34 PM
Subject: Re: IOException: read past EOF during optimize phase

It is more of a file structure thing for our application. We build in
 one place and do our index syncing in a different place. I doubt it is
 relevant to this issue, but figured I would include this information
 anyway.

Re: IOException: read past EOF during optimize phase

2008-01-16 Thread Kevin Osborn
I did see that bug, which made me suspect Lucene. In my case, I tracked down
the problem: it was my own application. I was using Java's
FileChannel.transferTo method to copy my index from one location to another,
and one of the files is bigger than 2^31-1 bytes (about 2 GB). That file was
corrupted during the copy because I was doing the transfer in a single pass.
I now loop the copy until the entire file is transferred, and everything
works fine.

DOH!
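
For anyone who hits the same corruption, here is a minimal sketch of the
looped copy described above (illustrative only: the class and method names
are assumptions, not Kevin's actual code). The key point is that
FileChannel.transferTo returns the number of bytes it actually moved, which
can be less than the count requested, so the position must be advanced in a
loop until the whole file is copied.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

public final class FileCopyUtil {

    // Copies src to dst using FileChannel.transferTo, looping because a
    // single call may transfer fewer bytes than requested (e.g. it can
    // stop short on files larger than 2^31-1 bytes on some platforms).
    public static void copyFile(File src, File dst) throws IOException {
        FileChannel in = null;
        FileChannel out = null;
        try {
            in = new FileInputStream(src).getChannel();
            out = new FileOutputStream(dst).getChannel();
            long size = in.size();
            long position = 0;
            while (position < size) {
                // transferTo returns the number of bytes actually moved;
                // never assume the full count was copied in one pass.
                position += in.transferTo(position, size - position, out);
            }
        } finally {
            if (in != null) in.close();
            if (out != null) out.close();
        }
    }
}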

- Original Message 
From: Yonik Seeley [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, January 16, 2008 4:57:08 PM
Subject: Re: IOException: read past EOF during optimize phase


This may be a Lucene bug... IIRC, I saw at least one other lucene user
with a similar stack trace.  I think the latest lucene version (2.3
dev) should fix it if that's the case.

-Yonik
