I understand less and less what is happening to my Solr.

I did a checkIndex (without -fix) and there was an error...

So I did another checkIndex with -fix, and then the error was gone. The segment was fine.


During checkIndex I do not shut down the Solr server; I just make sure no clients connect to it.

Should I shut down the Solr server during checkIndex?
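For reference, this is roughly how I run it (the jar and index paths below are examples from my setup, adjust to yours):

  java -cp /path/to/lucene-core-2.9.3.jar \
      org.apache.lucene.index.CheckIndex /var/solr/data/index

  # with -fix, CheckIndex drops any segment that fails the check
  java -cp /path/to/lucene-core-2.9.3.jar \
      org.apache.lucene.index.CheckIndex /var/solr/data/index -fix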



First checkIndex:

  4 of 17: name=_phe docCount=264148
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=928.977
    diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_phe_p3.del]
    test: open reader.........OK [44824 deleted docs]
    test: fields..............OK [51 fields]
    test: field norms.........OK [51 fields]
    test: terms, freq, prox...ERROR [term post_id:562 docFreq=1 != num docs seen 0 + num docs deleted 0]
java.lang.RuntimeException: term post_id:562 docFreq=1 != num docs seen 0 + num docs deleted 0
        at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
    test: stored fields.......OK [7206878 total field count; avg 32.86 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
    FAILED
    WARNING: fixIndex() would remove reference to this segment; full exception:
java.lang.RuntimeException: Term Index test failed
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)


A few minutes later:

  4 of 18: name=_phe docCount=264148
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=928.977
    diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_phe_p4.del]
    test: open reader.........OK [44828 deleted docs]
    test: fields..............OK [51 fields]
    test: field norms.........OK [51 fields]
    test: terms, freq, prox...OK [3200899 terms; 26804334 terms/docs pairs; 28919124 tokens]
    test: stored fields.......OK [7206764 total field count; avg 32.86 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]


On 12/01/2011 16:50, Michael McCandless wrote:
Curious... is it always a docFreq=1 != num docs seen 0 + num docs deleted 0?

It looks like new deletions were flushed against the segment (del file
changed from _ncc_22s.del to _ncc_24f.del).

Are you hitting any exceptions during indexing?

Mike

On Wed, Jan 12, 2011 at 10:33 AM, Stéphane Delprat
<stephane.delp...@blogspirit.com>  wrote:
I got another corruption.

It sure looks like the same type of error (on a different field).

It's also not linked to a merge, since the segment size did not change.


*** good segment:

  1 of 9: name=_ncc docCount=1841685
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=6,683.447
    diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_ncc_22s.del]
    test: open reader.........OK [275881 deleted docs]
    test: fields..............OK [51 fields]
    test: field norms.........OK [51 fields]
    test: terms, freq, prox...OK [17952652 terms; 174113812 terms/docs pairs; 204561440 tokens]
    test: stored fields.......OK [45511958 total field count; avg 29.066 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]


A few hours later:

*** broken segment:

  1 of 17: name=_ncc docCount=1841685
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=6,683.447
    diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_ncc_24f.del]
    test: open reader.........OK [278167 deleted docs]
    test: fields..............OK [51 fields]
    test: field norms.........OK [51 fields]
    test: terms, freq, prox...ERROR [term post_id:1599104 docFreq=1 != num docs seen 0 + num docs deleted 0]
java.lang.RuntimeException: term post_id:1599104 docFreq=1 != num docs seen 0 + num docs deleted 0
        at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
    test: stored fields.......OK [45429565 total field count; avg 29.056 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
    FAILED
    WARNING: fixIndex() would remove reference to this segment; full exception:
java.lang.RuntimeException: Term Index test failed
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)


I'll activate infoStream for next time.
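In case it helps, this is what I plan to enable in solrconfig.xml (the infoStream element ships commented out in the stock 1.4 config; the file name below is just the stock example):

  <indexDefaults>
    ...
    <!-- have the underlying IndexWriter write its debug output to this file -->
    <infoStream file="INFOSTREAM.txt">true</infoStream>
  </indexDefaults>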


Thanks,


On 12/01/2011 00:49, Michael McCandless wrote:

When you hit corruption is it always this same problem?:

   java.lang.RuntimeException: term source:margolisphil docFreq=1 != num docs seen 0 + num docs deleted 0

Can you run with Lucene's IndexWriter infoStream turned on, and catch
the output leading to the corruption?  If something is somehow messing
up the bits in the deletes file, that could cause this.

Mike

On Mon, Jan 10, 2011 at 5:52 AM, Stéphane Delprat
<stephane.delp...@blogspirit.com>    wrote:

Hi,

We are using :
Solr Specification Version: 1.4.1
Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42
Lucene Specification Version: 2.9.3
Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55

# java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)

We want to index 4M docs in one core (and once that works fine we will add
other cores with 2M docs each on the same server). One doc is roughly 1 kB.

We use Solr replication every 5 minutes to update the slave server (queries
are executed on the slave only).
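For context, our slave replication config looks roughly like this (the host
name and core path are placeholders, not our real ones):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/core0/replication</str>
      <str name="pollInterval">00:05:00</str>
    </lst>
  </requestHandler>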

Documents change very quickly; during a normal day we will have
approximately:
* 200,000 updated docs
* 1,000 new docs
* 200 deleted docs


I attached the last good checkIndex output: solr20110107.txt
And the corrupted one: solr20110110.txt


This is not the first time a segment has gotten corrupted on this server;
that's why I run checkIndex frequently. (But as you can see, the first
segment holds 1,800,000 docs and it checks out fine!)


I can't find any "SEVERE", "FATAL" or "Exception" entries in the Solr logs.
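I searched with something like the following (the log path is specific to
our setup):

  grep -Ei 'SEVERE|FATAL|Exception' /var/solr/logs/*.log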


I also attached my schema.xml and solrconfig.xml


Is there something wrong with what we are doing? Do you need any other
info?


Thanks,



