Solr-7 CorruptIndexException: checksum failed (hardware problem?)

2020-08-04 Thread slly
Hello everyone, I am using Solr 7.7.3.


The following error occurred during the index write phase, but after restarting
the Solr service the file was deleted and access to the index was restored.


Has anyone encountered this error before?

Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:680)
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:694)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1613)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1608)
at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:969)
at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:341)
at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:288)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:235)
... 67 more
Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=dd0fcc0f actual=2317b057 (resource=BufferedChecksumIndexInput(_2g_Lucene50_0.tim))
at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:419)
at org.apache.lucene.codecs.CodecUtil.checksumEntireFile(CodecUtil.java:526)
at org.apache.lucene.codecs.blocktree.BlockTreeTermsReader.checkIntegrity(BlockTreeTermsReader.java:309)
at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.checkIntegrity(PerFieldPostingsFormat.java:339)
at org.apache.lucene.codecs.perfield.PerFieldMergeState$FilterFieldsProducer.checkIntegrity(PerFieldMergeState.java:271)
at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:96)
at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:164)
at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:231)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:116)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4482)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4077)
at org.apache.solr.update.SolrIndexWriter.merge(SolrIndexWriter.java:224)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662)
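
For context on what fails here: every Lucene index file ends with a footer
containing a CRC32 checksum of the file's contents. CodecUtil.checkFooter
recomputes and compares it on read, and merges verify whole files via
checksumEntireFile, which is why the failure surfaces in the merge thread and
then closes the IndexWriter. Below is a rough, self-contained Java sketch of
the idea; the bare trailing 8-byte CRC layout is a simplification for
illustration, not Lucene's actual footer format (which also carries a magic
number and algorithm ID):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.CRC32;

public class FooterCheckSketch {
    public static void main(String[] args) throws IOException {
        // Read the suspect file (name taken from the error above).
        byte[] bytes = Files.readAllBytes(Paths.get("_2g_Lucene50_0.tim"));
        int footerLen = 8; // simplified: assume a bare trailing 8-byte CRC
        // Recompute a CRC32 over everything except the stored checksum,
        // analogous to what CodecUtil.checkFooter does.
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length - footerLen);
        long expected = readLong(bytes, bytes.length - footerLen);
        long actual = crc.getValue();
        if (expected != actual) {
            throw new IOException(String.format(
                "checksum failed: expected=%x actual=%x", expected, actual));
        }
    }

    // Reads a big-endian long, the byte order Lucene uses for its footer.
    private static long readLong(byte[] b, int off) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (b[off + i] & 0xFF);
        }
        return v;
    }
}

A mismatch between the stored and recomputed values means the bytes on disk
changed after the file was written, which is why the message points first at
hardware.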


Thanks.

Re: checksum failed (hardware problem?)

2018-10-10 Thread Stephen Bianamara

Re: checksum failed (hardware problem?)

2018-10-09 Thread Susheel Kumar

Re: checksum failed (hardware problem?)

2018-10-08 Thread Stephen Bianamara

Re: checksum failed (hardware problem?)

2018-10-05 Thread Susheel Kumar

Re: checksum failed (hardware problem?)

2018-10-04 Thread Stephen Bianamara
>> (zkCallback-5-thread-91-processing-n:server54:8080_solr-SendThread(server120:2182)) [   ] o.a.z.ClientCnxn Client session timed out, have not heard from server in 8812ms for sessionid 0x0
>> 2018-09-29 17:31:14.049 INFO  (zkCallback-5-thread-91-processing-n:server54:8080_solr-EventThread) [   ] o.a.s.c.c.ConnectionManager Connection with ZooKeeper reestablished.
>> 2018-09-29 17:31:14.049 INFO  (zkCallback-5-thread-91-processing-n:server54:8080_solr-EventThread) [   ] o.a.s.c.ZkController ZooKeeper session re-connected ... refreshing core states after session expiration.
>> 2018-09-29 17:31:14.051 INFO  (zkCallback-5-thread-91-processing-n:server54:8080_solr-EventThread) [   ] o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (16) -> (15)
>> 2018-09-29 17:31:14.144 INFO  (qtp834133664-520378) [c:COLL s:shard4 r:core_node8 x:COLL_shard4_replica2] o.a.s.c.S.Request [COLL_shard4_replica2]  webapp=/solr path=/admin/ping params={distrib=false&df=wordTokens&_stateVer_=COLL:1246&preferLocalShards=false&qt=/admin/ping&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=http://server54:8080/solr/COLL_shard4_replica2/|http://server53:8080/solr/COLL_shard4_replica1/&rows=10&version=2&q={!lucene}*:*&NOW=1538242274139&isShard=true&wt=javabin} hits=4989979 status=0 QTime=0

Re: checksum failed (hardware problem?)

2018-10-03 Thread Stephen Bianamara

Re: checksum failed (hardware problem?)

2018-09-30 Thread Susheel Kumar

Re: checksum failed (hardware problem?)

2018-09-26 Thread simon
I saw something like this a year ago, which I reported as a possible bug
(https://issues.apache.org/jira/browse/SOLR-10840, which has a full
description and stack traces).

This occurred very randomly on an AWS instance; moving the index directory
to a different file system did not fix the problem. Eventually I cloned our
environment to a new AWS instance, which proved to be the solution. Why, I
have no idea...

-Simon
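
A lighter-weight variant of the same idea, when only one SolrCloud replica is
corrupt, is to drop that replica and add a fresh one, which recovers by
pulling a full copy of the index from the shard leader. A minimal SolrJ
sketch; the ZooKeeper address is a placeholder, the collection/shard/replica
names echo the logs in this thread but are illustrative, and it assumes
another healthy replica exists to serve as leader:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class ReplaceCorruptReplica {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper address.
        try (CloudSolrClient client =
                 new CloudSolrClient.Builder().withZkHost("zkhost:2181").build()) {
            // Remove the replica whose index failed its checksum...
            CollectionAdminRequest.deleteReplica("COLL", "shard1", "core_node1")
                .process(client);
            // ...then add a replacement; it recovers by copying the full
            // index from the current shard leader.
            CollectionAdminRequest.addReplicaToShard("COLL", "shard1")
                .process(client);
        }
    }
}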


Re: checksum failed (hardware problem?)

2018-09-24 Thread Susheel Kumar
Got it. I'll have the hardware folks check first, and if they don't see/find
anything suspicious then I'll return here.

Wondering if anybody has seen a similar error and whether they were able to
confirm it was a hardware fault.

Thnx


Re: checksum failed (hardware problem?)

2018-09-24 Thread Erick Erickson
Mind you it could _still_ be Solr/Lucene, but let's check the hardware first ;)

Re: checksum failed (hardware problem?)

2018-09-24 Thread Susheel Kumar
Hi Erick,

Thanks so much for your reply. I'll now look mostly into possible hardware
issues rather than Solr/Lucene.

Thanks again.


Re: checksum failed (hardware problem?)

2018-09-24 Thread Erick Erickson
There are several reasons this would "suddenly" start appearing.
1> Your disk went bad and some sector is no longer faithfully
recording the bits. In this case the checksum will be wrong.
2> You ran out of disk space at some point and the index was corrupted.
This isn't really a hardware problem.
3> Your disk controller is going wonky and not reading reliably.

The "possible hardware issue" message is to alert you that this is
highly unusual and you should at least consider doing integrity
checks on your disk before assuming it's a Solr/Lucene problem.
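
A concrete first step on the index side is Lucene's CheckIndex tool, which
ships in lucene-core (the same jar Solr bundles) and re-verifies every
segment's checksums. A minimal sketch via the Java API; the index path
mirrors the one in the error in this thread and is illustrative, and the Solr
node must be stopped first because CheckIndex takes the index write lock:

import java.nio.file.Paths;
import org.apache.lucene.index.CheckIndex;
import org.apache.lucene.store.FSDirectory;

public class VerifySolrIndex {
    public static void main(String[] args) throws Exception {
        // Illustrative path; point at the suspect core's data/index directory.
        try (FSDirectory dir = FSDirectory.open(
                 Paths.get("/app/solr/data/COLL_shard1_replica1/data/index"));
             CheckIndex checker = new CheckIndex(dir)) {
            checker.setInfoStream(System.out); // print per-segment details
            CheckIndex.Status status = checker.checkIndex();
            System.out.println(status.clean
                ? "index is clean"
                : "index has broken segments; see report above");
        }
    }
}

The same class can also be run from the command line (java -cp
lucene-core-*.jar org.apache.lucene.index.CheckIndex <indexDir>). Its
-exorcise option rewrites the index without the broken segments and loses the
documents in them, so treat that as a last resort.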

Best,
Erick

checksum failed (hardware problem?)

2018-09-24 Thread Susheel Kumar
Hello,

I am still trying to understand the corrupt index exception we saw in our
logs. What does the hardware problem comment indicate here? Does it mean
the corruption was most likely caused by a hardware issue?

We never had this problem in the last couple of months. Solr is 6.6.2 and
ZK is 3.4.10.

Please share your thoughts.

Thanks,
Susheel

Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed
(hardware problem?) : expected=db243d1a actual=7a00d3d2
(resource=BufferedChecksumIndexInput(MMapIndexInput(path="/app/solr/data/COLL_shard1_replica1/data/index/_i27s.cfs") [slice=_i27s_Lucene50_0.tim])

It suddenly started appearing in the logs; before that there was no such
error. Searches & ingestions all seemed to be working prior to that.



2018-09-03 17:16:49.056 INFO  (qtp834133664-519872) [c:COLL s:shard1 r:core_node1 x:COLL_shard1_replica1] o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd: newid=G31MXMRZESC0CYPR!A-G31MXMRZESC0CYPR.2552019802_1-2552008480_1-en_US
2018-09-03 17:16:49.057 ERROR (qtp834133664-519872) [c:COLL s:shard1 r:core_node1 x:COLL_shard1_replica1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Exception writing document id G31MXMRZESC0CYPR!A-G31MXMRZESC0CYPR.2552019802_1-2552008480_1-en_US to the index; possible analysis error.
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:206)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:979)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1192)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:748)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
at org.apache.solr.update.processor.StatelessScriptUpdateProcessorFactory$ScriptUpdateProcessor.processAdd(StatelessScriptUpdateProcessorFactory.java:380)
at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:98)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:180)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:136)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:306)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:122)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:271)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:173)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:187)
at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:108)
at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:55)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollectio