So I ran checkIndex (without -fix) 5 times in a row :

SOLR was running, but no client connected to it. (just the slave which was synchronizing every 5 minutes)

summary :

1: all good
2: 2 errors: (seg 1 & 2) terms, freq, prox...ERROR [term blog_id:104150: doc 324697 <= lastDoc 324697] & terms, freq, prox...ERROR [term blog_keywords:SPORT: doc 174808 <= lastDoc 174808]
3: 1 error: (seg 2) terms, freq, prox...ERROR [Index: 105, Size: 51]
4: all good
5: 1 error: (seg 7) terms, freq, prox...ERROR [term blog_comments: %X docFreq=1 != num docs seen 0 + num docs deleted 0]

Seams to me that some random things are happening here.

File system is ext3, on a physical server.


Here are the logs of the interesting segments :

** 1 **

  1 of 17: name=_nqt docCount=431889
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=1,671.375
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_nqt_1y2.del]
    test: open reader.........OK [41918 deleted docs]
    test: fields..............OK [51 fields]
    test: field norms.........OK [51 fields]
test: terms, freq, prox...OK [5211271 terms; 39824029 terms/docs pairs; 59357374 tokens] test: stored fields.......OK [11505678 total field count; avg 29.504 fields per doc] test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

  2 of 17: name=_ol7 docCount=913886
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=3,567.739
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_ol7_1mc.del]
    test: open reader.........OK [74076 deleted docs]
    test: fields..............OK [51 fields]
    test: field norms.........OK [51 fields]
test: terms, freq, prox...OK [9825896 terms; 93954470 terms/docs pairs; 132337348 tokens] test: stored fields.......OK [26933113 total field count; avg 32.07 fields per doc] test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

** 2 **

  1 of 17: name=_nqt docCount=431889
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=1,671.375
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0
_20, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_nqt_1y2.del]
    test: open reader.........OK [41918 deleted docs]
    test: fields..............OK [51 fields]
    test: field norms.........OK [51 fields]
test: terms, freq, prox...ERROR [term blog_id:104150: doc 324697 <= lastDoc 324697] java.lang.RuntimeException: term blog_id:104150: doc 324697 <= lastDoc 324697 at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:644) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
test: stored fields.......OK [11505678 total field count; avg 29.504 fields per doc] test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
FAILED
WARNING: fixIndex() would remove reference to this segment; full exception:
java.lang.RuntimeException: Term Index test failed
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)

  2 of 17: name=_ol7 docCount=913886
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=3,567.739
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_ol7_1mc.del]
    test: open reader.........OK [74076 deleted docs]
    test: fields..............OK [51 fields]
    test: field norms.........OK [51 fields]
test: terms, freq, prox...ERROR [term blog_keywords:SPORT: doc 174808 <= lastDoc 174808] java.lang.RuntimeException: term blog_keywords:SPORT: doc 174808 <= lastDoc 174808 at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:644) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
test: stored fields.......OK [26933113 total field count; avg 32.07 fields per doc] test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
FAILED
WARNING: fixIndex() would remove reference to this segment; full exception:
java.lang.RuntimeException: Term Index test failed
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)

** 3 **

  1 of 17: name=_nqt docCount=431889
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=1,671.375
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_nqt_1y2.del]
    test: open reader.........OK [41918 deleted docs]
    test: fields..............OK [51 fields]
    test: field norms.........OK [51 fields]
test: terms, freq, prox...OK [5211271 terms; 39824029 terms/docs pairs; 59357374 tokens] test: stored fields.......OK [11505678 total field count; avg 29.504 fields per doc] test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

  2 of 17: name=_ol7 docCount=913886
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=3,567.739
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_ol7_1mc.del]
    test: open reader.........OK [74076 deleted docs]
    test: fields..............OK [51 fields]
    test: field norms.........OK [51 fields]
    test: terms, freq, prox...ERROR [Index: 105, Size: 51]
java.lang.IndexOutOfBoundsException: Index: 105, Size: 51
        at java.util.ArrayList.RangeCheck(ArrayList.java:547)
        at java.util.ArrayList.get(ArrayList.java:322)
at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:288) at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:277)
        at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:86)
at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:129) at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:631) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
test: stored fields.......OK [26933113 total field count; avg 32.07 fields per doc] test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
FAILED
WARNING: fixIndex() would remove reference to this segment; full exception:
java.lang.RuntimeException: Term Index test failed
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)

** 4 **

  1 of 17: name=_nqt docCount=431889
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=1,671.375
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_nqt_1y2.del]
    test: open reader.........OK [41918 deleted docs]
    test: fields..............OK [51 fields]
    test: field norms.........OK [51 fields]
test: terms, freq, prox...OK [5211271 terms; 39824029 terms/docs pairs; 59357374 tokens] test: stored fields.......OK [11505678 total field count; avg 29.504 fields per doc] test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

  2 of 17: name=_ol7 docCount=913886
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=3,567.739
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_ol7_1mc.del]
    test: open reader.........OK [74076 deleted docs]
    test: fields..............OK [51 fields]
    test: field norms.........OK [51 fields]
test: terms, freq, prox...OK [9825896 terms; 93954470 terms/docs pairs; 132337349 tokens] test: stored fields.......OK [26933113 total field count; avg 32.07 fields per doc] test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

  7 of 17: name=_qqy docCount=69109
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=468.861
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_qqy_18.del]
    test: open reader.........OK [22672 deleted docs]
    test: fields..............OK [51 fields]
    test: field norms.........OK [51 fields]
test: terms, freq, prox...OK [1431363 terms; 10684603 terms/docs pairs; 13097892 tokens] test: stored fields.......OK [1688599 total field count; avg 36.363 fields per doc] test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

** 5 **

  1 of 17: name=_nqt docCount=431889
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=1,671.375
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_nqt_1y2.del]
    test: open reader.........OK [41918 deleted docs]
    test: fields..............OK [51 fields]
    test: field norms.........OK [51 fields]
test: terms, freq, prox...OK [5211271 terms; 39824029 terms/docs pairs; 59357374 tokens] test: stored fields.......OK [11505678 total field count; avg 29.504 fields per doc] test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

  2 of 17: name=_ol7 docCount=913886
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=3,567.739
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_ol7_1mc.del]
    test: open reader.........OK [74076 deleted docs]
    test: fields..............OK [51 fields]
    test: field norms.........OK [51 fields]
test: terms, freq, prox...OK [9825896 terms; 93954470 terms/docs pairs; 132337348 tokens] test: stored fields.......OK [26933113 total field count; avg 32.07 fields per doc] test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

  7 of 17: name=_qqy docCount=69109
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=468.861
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0
_20, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_qqy_18.del]
    test: open reader.........OK [22672 deleted docs]
    test: fields..............OK [51 fields]
    test: field norms.........OK [51 fields]
test: terms, freq, prox...ERROR [term blog_comments: %X docFreq=1 != num docs seen 0 + num docs deleted 0] java.lang.RuntimeException: term blog_comments: %X docFreq=1 != num docs seen 0 + num docs deleted 0 at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
test: stored fields.......OK [1688599 total field count; avg 36.363 fields per doc] test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
FAILED
WARNING: fixIndex() would remove reference to this segment; full exception:
java.lang.RuntimeException: Term Index test failed
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)



Thanks,


Le 14/01/2011 12:59, Michael McCandless a écrit :
Right, but removing a segment out from under a live IW (when you run
CheckIndex with -fix) is deadly, because that other IW doesn't know
you've removed the segment, and will later commit a new segment infos
still referencing that segment.

The nature of this particular exception from CheckIndex is very
strange... I think it can only be a bug in Lucene, a bug in the JRE or
a hardware issue (bits are flipping somewhere).

I don't think an error in the IO system can cause this particular
exception (it would cause others), because the deleted docs are loaded
up front when SegmentReader is init'd...

This is why I'd really like to see if a given corrupt index always
hits precisely the same exception if you run CheckIndex more than
once.

Mike

On Thu, Jan 13, 2011 at 10:56 PM, Lance Norskog<goks...@gmail.com>  wrote:
1) CheckIndex is not supposed to change a corrupt segment, only remove it.
2) Are you using local hard disks, or do run on a common SAN or remote
file server? I have seen corruption errors on SANs, where existing
files have random changes.

On Thu, Jan 13, 2011 at 11:06 AM, Michael McCandless
<luc...@mikemccandless.com>  wrote:
Generally it's not safe to run CheckIndex if a writer is also open on the index.

It's not safe because CheckIndex could hit FNFE's on opening files,
or, if you use -fix, CheckIndex will change the index out from under
your other IndexWriter (which will then cause other kinds of
corruption).

That said, I don't think the corruption that CheckIndex is detecting
in your index would be caused by having a writer open on the index.
Your first CheckIndex has a different deletes file (_phe_p3.del, with
44824 deleted docs) than the 2nd time you ran it (_phe_p4.del, with
44828 deleted docs), so it must somehow have to do with that change.

One question: if you have a corrupt index, and run CheckIndex on it
several times in a row, does it always fail in the same way?  (Ie the
same term hits the below exception).

Is there any way I could get a copy of one of your corrupt cases?  I
can then dig...

Mike

On Thu, Jan 13, 2011 at 10:52 AM, Stéphane Delprat
<stephane.delp...@blogspirit.com>  wrote:
I understand less and less what is happening to my solr.

I did a checkIndex (without -fix) and there was an error...

So a did another checkIndex with -fix and then the error was gone. The
segment was alright


During checkIndex I do not shut down the solr server, I just make sure no
client connect to the server.

Should I shut down the solr server during checkIndex ?



first checkIndex :

  4 of 17: name=_phe docCount=264148
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=928.977
    diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20,
java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_phe_p3.del]
    test: open reader.........OK [44824 deleted docs]
    test: fields..............OK [51 fields]
    test: field norms.........OK [51 fields]
    test: terms, freq, prox...ERROR [term post_id:562 docFreq=1 != num docs
seen 0 + num docs deleted 0]
java.lang.RuntimeException: term post_id:562 docFreq=1 != num docs seen 0 +
num docs deleted 0
        at
org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
    test: stored fields.......OK [7206878 total field count; avg 32.86 fields
per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq
vector fields per doc]
FAILED
    WARNING: fixIndex() would remove reference to this segment; full
exception:
java.lang.RuntimeException: Term Index test failed
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)


a few minutes latter :

  4 of 18: name=_phe docCount=264148
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=928.977
    diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0
_20, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_phe_p4.del]
    test: open reader.........OK [44828 deleted docs]
    test: fields..............OK [51 fields]
    test: field norms.........OK [51 fields]
    test: terms, freq, prox...OK [3200899 terms; 26804334 terms/docs pairs;
28919124 tokens]
    test: stored fields.......OK [7206764 total field count; avg 32.86 fields
per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq
vector fields per doc]


Le 12/01/2011 16:50, Michael McCandless a écrit :

Curious... is it always a docFreq=1 != num docs seen 0 + num docs deleted
0?

It looks like new deletions were flushed against the segment (del file
changed from _ncc_22s.del to _ncc_24f.del).

Are you hitting any exceptions during indexing?

Mike

On Wed, Jan 12, 2011 at 10:33 AM, Stéphane Delprat
<stephane.delp...@blogspirit.com>    wrote:

I got another corruption.

It sure looks like it's the same type of error. (on a different field)

It's also not linked to a merge, since the segment size did not change.


*** good segment :

  1 of 9: name=_ncc docCount=1841685
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=6,683.447
    diagnostics = {optimize=false, mergeFactor=10,
os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0
_20, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_ncc_22s.del]
    test: open reader.........OK [275881 deleted docs]
    test: fields..............OK [51 fields]
    test: field norms.........OK [51 fields]
    test: terms, freq, prox...OK [17952652 terms; 174113812 terms/docs
pairs;
204561440 tokens]
    test: stored fields.......OK [45511958 total field count; avg 29.066
fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq
vector fields per doc]


a few hours latter :

*** broken segment :

  1 of 17: name=_ncc docCount=1841685
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=6,683.447
    diagnostics = {optimize=false, mergeFactor=10,
os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0
_20, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_ncc_24f.del]
    test: open reader.........OK [278167 deleted docs]
    test: fields..............OK [51 fields]
    test: field norms.........OK [51 fields]
    test: terms, freq, prox...ERROR [term post_id:1599104 docFreq=1 != num
docs seen 0 + num docs deleted 0]
java.lang.RuntimeException: term post_id:1599104 docFreq=1 != num docs
seen
0 + num docs deleted 0
        at
org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
        at
org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
    test: stored fields.......OK [45429565 total field count; avg 29.056
fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq
vector fields per doc]
FAILED
    WARNING: fixIndex() would remove reference to this segment; full
exception:
java.lang.RuntimeException: Term Index test failed
        at
org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)


I'll activate infoStream for next time.


Thanks,


Le 12/01/2011 00:49, Michael McCandless a écrit :

When you hit corruption is it always this same problem?:

   java.lang.RuntimeException: term source:margolisphil docFreq=1 !=
num docs seen 0 + num docs deleted 0

Can you run with Lucene's IndexWriter infoStream turned on, and catch
the output leading to the corruption?  If something is somehow messing
up the bits in the deletes file that could cause this.

Mike

On Mon, Jan 10, 2011 at 5:52 AM, Stéphane Delprat
<stephane.delp...@blogspirit.com>      wrote:

Hi,

We are using :
Solr Specification Version: 1.4.1
Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42
Lucene Specification Version: 2.9.3
Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55

# java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)

We want to index 4M docs in one core (and when it works fine we will
add
other cores with 2M on the same server) (1 doc ~= 1kB)

We use SOLR replication every 5 minutes to update the slave server
(queries
are executed on the slave only)

Documents are changing very quickly, during a normal day we will have
approx
:
* 200 000 updated docs
* 1000 new docs
* 200 deleted docs


I attached the last good checkIndex : solr20110107.txt
And the corrupted one : solr20110110.txt


This is not the first time a segment gets corrupted on this server,
that's
why I ran frequent "checkIndex". (but as you can see the first segment
is
1.800.000 docs and it works fine!)


I can't find any "SEVER" "FATAL" or "exception" in the Solr logs.


I also attached my schema.xml and solrconfig.xml


Is there something wrong with what we are doing ? Do you need other
info
?


Thanks,









--
Lance Norskog
goks...@gmail.com


Reply via email to