Re: segment gets corrupted (after background merge ?)

2011-01-18 Thread Stéphane Delprat
I ran other tests: when I execute checkIndex on the master I get
random errors, but when I scp the files to another server (exactly the same
software) no errors occur...


We will start using another server.


Just one question concerning checkIndex:

What does "tokens" mean?
How is it possible that the number of tokens changes while the files were
not modified at all? (This is from the faulty server; on the other
server the token counts do not change at all.)

(solr was stopped during the whole checkIndex process)


#diff 20110118_141257_checkIndex.log 20110118_142356_checkIndex.log
15c15
< test: terms, freq, prox...OK [5211271 terms; 39824029 terms/docs pairs; 58236510 tokens]
---
> test: terms, freq, prox...OK [5211271 terms; 39824029 terms/docs pairs; 58236582 tokens]
43c43
< test: terms, freq, prox...OK [3947589 terms; 34468256 terms/docs pairs; 36740496 tokens]
---
> test: terms, freq, prox...OK [3947589 terms; 34468256 terms/docs pairs; 36740533 tokens]
85c85
< test: terms, freq, prox...OK [2600874 terms; 21272098 terms/docs pairs; 10862212 tokens]
---
> test: terms, freq, prox...OK [2600874 terms; 21272098 terms/docs pairs; 10862221 tokens]
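
As I understand it, "tokens" is the total number of term occurrences
CheckIndex visits while walking the postings (and "terms/docs pairs" the
number of distinct term-document pairs), so on files that were not touched
the totals should never move. One way to prove the files really are
byte-identical between two runs, or between the master and the scp'd copy,
is to checksum them. A minimal sketch (plain JDK; the index path is just a
placeholder for this setup):

import java.io.File;
import java.io.FileInputStream;
import java.security.MessageDigest;

public class IndexChecksums {
  public static void main(String[] args) throws Exception {
    // Placeholder path; point it at the core's index directory.
    File dir = new File(args.length > 0 ? args[0] : "/solr/multicore/core1/data/index");
    for (File f : dir.listFiles()) {
      if (!f.isFile()) continue;
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      FileInputStream in = new FileInputStream(f);
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        md5.update(buf, 0, n);
      }
      in.close();
      StringBuilder hex = new StringBuilder();
      for (byte b : md5.digest()) {
        hex.append(String.format("%02x", b));
      }
      System.out.println(hex + "  " + f.length() + "  " + f.getName());
    }
  }
}

If the checksums are identical between the two runs while the token totals
differ, the corruption is happening on the read path (RAM, CPU or JRE)
rather than in the files on disk, which would fit the bit-flipping theory
below.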



Thanks,


On 14/01/2011 12:59, Michael McCandless wrote:

Right, but removing a segment out from under a live IW (when you run
CheckIndex with -fix) is deadly, because that other IW doesn't know
you've removed the segment, and will later commit a new segment infos
still referencing that segment.

The nature of this particular exception from CheckIndex is very
strange... I think it can only be a bug in Lucene, a bug in the JRE or
a hardware issue (bits are flipping somewhere).
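
For context on what that exception is checking: for every term, CheckIndex
compares the docFreq recorded in the term dictionary against the documents
it actually finds in the postings, plus any deleted ones. A rough sketch of
the same comparison for a single flagged term, using a placeholder path and
the post_id:562 term from the logs further down (Lucene 2.9 API):

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.store.FSDirectory;

public class TermDocFreqCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder index path.
    IndexReader reader = IndexReader.open(
        FSDirectory.open(new File("/solr/multicore/core1/data/index")));
    Term term = new Term("post_id", "562");   // the term CheckIndex flagged
    int docFreq = reader.docFreq(term);       // what the term dictionary claims
    int seen = 0;
    TermDocs docs = reader.termDocs(term);    // postings, skipping deleted docs
    while (docs.next()) {
      seen++;
    }
    docs.close();
    // CheckIndex expects docFreq == docs found in the postings + deleted docs
    // carrying the term; "docFreq=1 != 0 + 0" means the term dictionary and
    // the postings/.del data disagree about this term.
    System.out.println("docFreq=" + docFreq + ", non-deleted docs seen=" + seen);
    reader.close();
  }
}

On a healthy segment the two sides line up once the deleted docs are counted
in.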

I don't think an error in the IO system can cause this particular
exception (it would cause others), because the deleted docs are loaded
up front when SegmentReader is init'd...

This is why I'd really like to see if a given corrupt index always
hits precisely the same exception if you run CheckIndex more than
once.

Mike

On Thu, Jan 13, 2011 at 10:56 PM, Lance Norskog  wrote:

1) CheckIndex is not supposed to change a corrupt segment, only remove it.
2) Are you using local hard disks, or do you run on a common SAN or remote
file server? I have seen corruption errors on SANs, where existing
files have random changes.

On Thu, Jan 13, 2011 at 11:06 AM, Michael McCandless
  wrote:

Generally it's not safe to run CheckIndex if a writer is also open on the index.

It's not safe because CheckIndex could hit FNFE's on opening files,
or, if you use -fix, CheckIndex will change the index out from under
your other IndexWriter (which will then cause other kinds of
corruption).
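
For anyone who wants to script this, a small guard in that spirit: refuse to
run the check while something holds the write lock (with Solr the simplest
safe thing is still to stop the server first). Just a sketch against the
Lucene 2.9 API, with the index path passed as an argument:

import java.io.File;
import org.apache.lucene.index.CheckIndex;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class SafeCheckIndex {
  public static void main(String[] args) throws Exception {
    FSDirectory dir = FSDirectory.open(new File(args[0]));
    if (IndexWriter.isLocked(dir)) {
      // A writer (e.g. Solr's) still holds the lock: checking now risks FNFEs,
      // and a -fix style repair would pull the index out from under it.
      System.err.println("write.lock is held - refusing to check a live index");
      return;
    }
    CheckIndex checker = new CheckIndex(dir);
    checker.setInfoStream(System.out);
    CheckIndex.Status status = checker.checkIndex();  // read-only, like CheckIndex without -fix
    System.out.println(status.clean ? "Index is clean" : "Index has problems");
  }
}

(The lock test is only advisory; a writer could still open the index a
moment later, so stopping Solr, or checking a copy of the index, remains the
reliable option.)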

That said, I don't think the corruption that CheckIndex is detecting
in your index would be caused by having a writer open on the index.
Your first CheckIndex has a different deletes file (_phe_p3.del, with
44824 deleted docs) than the 2nd time you ran it (_phe_p4.del, with
44828 deleted docs), so it must somehow have to do with that change.

One question: if you have a corrupt index, and run CheckIndex on it
several times in a row, does it always fail in the same way?  (Ie the
same term hits the below exception).

Is there any way I could get a copy of one of your corrupt cases?  I
can then dig...

Mike

On Thu, Jan 13, 2011 at 10:52 AM, Stéphane Delprat
  wrote:

I understand less and less what is happening to my Solr.

I did a checkIndex (without -fix) and there was an error...

So I did another checkIndex with -fix, and then the error was gone. The
segment was alright.


During checkIndex I do not shut down the Solr server, I just make sure no
clients connect to the server.

Should I shut down the Solr server during checkIndex?



first checkIndex :

  4 of 17: name=_phe docCount=264148
compound=false
hasProx=true
numFiles=9
size (MB)=928.977
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20,
java.vendor=Sun Microsystems Inc.}
has deletions [delFileName=_phe_p3.del]
test: open reader.OK [44824 deleted docs]
test: fields..OK [51 fields]
test: field norms.OK [51 fields]
test: terms, freq, prox...ERROR [term post_id:562 docFreq=1 != num docs
seen 0 + num docs deleted 0]
java.lang.RuntimeException: term post_id:562 docFreq=1 != num docs seen 0 +
num docs deleted 0
at
org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
test: stored fields...OK [7206878 total field count; avg 32.86 fields
per doc]
test: term vectorsOK [0 total vector count; avg 0 term/freq
vector fields per doc]
FAILED
WARNING: fixIndex() would remove reference to this segment; full
exception:
java.lang.RuntimeException: Term Index test failed
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)

Re: segment gets corrupted (after background merge ?)

2011-01-14 Thread Stéphane Delprat
It's not safe because CheckIndex could hit FNFE's on opening files,
or, if you use -fix, CheckIndex will change the index out from under
your other IndexWriter (which will then cause other kinds of
corruption).

That said, I don't think the corruption that CheckIndex is detecting
in your index would be caused by having a writer open on the index.
Your first CheckIndex has a different deletes file (_phe_p3.del, with
44824 deleted docs) than the 2nd time you ran it (_phe_p4.del, with
44828 deleted docs), so it must somehow have to do with that change.

One question: if you have a corrupt index, and run CheckIndex on it
several times in a row, does it always fail in the same way?  (Ie the
same term hits the below exception).

Is there any way I could get a copy of one of your corrupt cases?  I
can then dig...

Mike

On Thu, Jan 13, 2011 at 10:52 AM, Stéphane Delprat
  wrote:

I understand less and less what is happening to my Solr.

I did a checkIndex (without -fix) and there was an error...

So I did another checkIndex with -fix, and then the error was gone. The
segment was alright.


During checkIndex I do not shut down the Solr server, I just make sure no
clients connect to the server.

Should I shut down the Solr server during checkIndex?



first checkIndex :

  4 of 17: name=_phe docCount=264148
compound=false
hasProx=true
numFiles=9
size (MB)=928.977
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20,
java.vendor=Sun Microsystems Inc.}
has deletions [delFileName=_phe_p3.del]
test: open reader.OK [44824 deleted docs]
test: fields..OK [51 fields]
test: field norms.OK [51 fields]
test: terms, freq, prox...ERROR [term post_id:562 docFreq=1 != num docs
seen 0 + num docs deleted 0]
java.lang.RuntimeException: term post_id:562 docFreq=1 != num docs seen 0 +
num docs deleted 0
at
org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
test: stored fields...OK [7206878 total field count; avg 32.86 fields
per doc]
test: term vectorsOK [0 total vector count; avg 0 term/freq
vector fields per doc]
FAILED
WARNING: fixIndex() would remove reference to this segment; full
exception:
java.lang.RuntimeException: Term Index test failed
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)


a few minutes later:

  4 of 18: name=_phe docCount=264148
compound=false
hasProx=true
numFiles=9
size (MB)=928.977
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
has deletions [delFileName=_phe_p4.del]
test: open reader.OK [44828 deleted docs]
test: fields..OK [51 fields]
test: field norms.OK [51 fields]
test: terms, freq, prox...OK [3200899 terms; 26804334 terms/docs pairs;
28919124 tokens]
test: stored fields...OK [7206764 total field count; avg 32.86 fields
per doc]
test: term vectorsOK [0 total vector count; avg 0 term/freq
vector fields per doc]


On 12/01/2011 16:50, Michael McCandless wrote:


Curious... is it always a docFreq=1 != num docs seen 0 + num docs deleted
0?

It looks like new deletions were flushed against the segment (del file
changed from _ncc_22s.del to _ncc_24f.del).

Are you hitting any exceptions during indexing?

Mike

On Wed, Jan 12, 2011 at 10:33 AM, Stéphane Delprat
wrote:


I got another corruption.

It sure looks like it's the same type of error. (on a different field)

It's also not linked to a merge, since the segment size did not change.


*** good segment :

  1 of 9: name=_ncc docCount=1841685
compound=false
hasProx=true
numFiles=9
size (MB)=6,683.447
diagnostics = {optimize=false, mergeFactor=10,
os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
has deletions [delFileName=_ncc_22s.del]
test: open reader.OK [275881 deleted docs]
test: fields..OK [51 fields]
test: field norms.OK [51 fields]
test: terms, freq, prox...OK [17952652 terms; 174113812 terms/docs
pairs;
204561440 tokens]
test: stored fields...OK [45511958 total field count; avg 29.066
fields per doc]
test: term vectorsOK [0 total vector count; avg 0 term/freq
vector fields per doc]


a few hours later:

*** broken segment :

  1 of 17: name=_ncc docCount=1841685

Re: segment gets corrupted (after background merge ?)

2011-01-13 Thread Stéphane Delprat

I understand less and less what is happening to my Solr.

I did a checkIndex (without -fix) and there was an error...

So I did another checkIndex with -fix, and then the error was gone. The
segment was alright.



During checkIndex I do not shut down the Solr server, I just make sure
no clients connect to the server.


Should I shut down the Solr server during checkIndex?



first checkIndex :

  4 of 17: name=_phe docCount=264148
compound=false
hasProx=true
numFiles=9
size (MB)=928.977
diagnostics = {optimize=false, mergeFactor=10, 
os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, 
lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, 
os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}

has deletions [delFileName=_phe_p3.del]
test: open reader.OK [44824 deleted docs]
test: fields..OK [51 fields]
test: field norms.OK [51 fields]
test: terms, freq, prox...ERROR [term post_id:562 docFreq=1 != num 
docs seen 0 + num docs deleted 0]
java.lang.RuntimeException: term post_id:562 docFreq=1 != num docs seen 
0 + num docs deleted 0
at 
org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
at 
org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)

at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
test: stored fields...OK [7206878 total field count; avg 32.86 
fields per doc]
test: term vectorsOK [0 total vector count; avg 0 term/freq 
vector fields per doc]

FAILED
WARNING: fixIndex() would remove reference to this segment; full 
exception:

java.lang.RuntimeException: Term Index test failed
at 
org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)

at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)


a few minutes later:

  4 of 18: name=_phe docCount=264148
compound=false
hasProx=true
numFiles=9
size (MB)=928.977
diagnostics = {optimize=false, mergeFactor=10, 
os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, 
lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, 
os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
has deletions [delFileName=_phe_p4.del]
test: open reader.OK [44828 deleted docs]
test: fields..OK [51 fields]
test: field norms.OK [51 fields]
test: terms, freq, prox...OK [3200899 terms; 26804334 terms/docs 
pairs; 28919124 tokens]
test: stored fields...OK [7206764 total field count; avg 32.86 
fields per doc]
test: term vectorsOK [0 total vector count; avg 0 term/freq 
vector fields per doc]



On 12/01/2011 16:50, Michael McCandless wrote:

Curious... is it always a docFreq=1 != num docs seen 0 + num docs deleted 0?

It looks like new deletions were flushed against the segment (del file
changed from _ncc_22s.del to _ncc_24f.del).

Are you hitting any exceptions during indexing?

Mike

On Wed, Jan 12, 2011 at 10:33 AM, Stéphane Delprat
  wrote:

I got another corruption.

It sure looks like it's the same type of error. (on a different field)

It's also not linked to a merge, since the segment size did not change.


*** good segment :

  1 of 9: name=_ncc docCount=1841685
compound=false
hasProx=true
numFiles=9
size (MB)=6,683.447
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
has deletions [delFileName=_ncc_22s.del]
test: open reader.OK [275881 deleted docs]
test: fields..OK [51 fields]
test: field norms.OK [51 fields]
test: terms, freq, prox...OK [17952652 terms; 174113812 terms/docs pairs;
204561440 tokens]
test: stored fields...OK [45511958 total field count; avg 29.066
fields per doc]
test: term vectorsOK [0 total vector count; avg 0 term/freq
vector fields per doc]


a few hours later:

*** broken segment :

  1 of 17: name=_ncc docCount=1841685
compound=false
hasProx=true
numFiles=9
size (MB)=6,683.447
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
has deletions [delFileName=_ncc_24f.del]
test: open reader.OK [278167 deleted docs]
test: fields..OK [51 fields]
test: field norms.OK [51 fields]
test: terms, freq, prox...ERROR [term post_id:1599104 docFreq=1 != num
docs seen 0 + num docs deleted 0]
java.lang.RuntimeException: term post_id:1599104 docFreq=1 != num docs seen
0 + num docs deleted 0
at
org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)

Re: segment gets corrupted (after background merge ?)

2011-01-12 Thread Stéphane Delprat

I got another corruption.

It sure looks like it's the same type of error. (on a different field)

It's also not linked to a merge, since the segment size did not change.
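
A cheap way to watch for exactly that (a segment whose size, doc count or
deletes file changes between a good and a broken check) is to dump the
segments file directly instead of paying for a full CheckIndex each time. A
sketch against the Lucene 2.9 SegmentInfos/SegmentInfo API (name and
docCount are public fields there, if I remember right; the path is a
placeholder):

import java.io.File;
import org.apache.lucene.index.SegmentInfo;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.store.FSDirectory;

public class ListSegments {
  public static void main(String[] args) throws Exception {
    FSDirectory dir = FSDirectory.open(new File(args[0]));
    SegmentInfos infos = new SegmentInfos();
    infos.read(dir);                              // reads the current segments_N
    for (int i = 0; i < infos.size(); i++) {
      SegmentInfo si = infos.info(i);
      System.out.println(si.name
          + " docCount=" + si.docCount
          + " sizeMB=" + (si.sizeInBytes() / (1024.0 * 1024.0))
          + " hasDeletions=" + si.hasDeletions());
    }
  }
}

Logging that every few minutes alongside the replication schedule would show
whether the bad segment was rewritten, merged, or merely had new deletes
flushed against it when the corruption appears.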


*** good segment :

  1 of 9: name=_ncc docCount=1841685
compound=false
hasProx=true
numFiles=9
size (MB)=6,683.447
diagnostics = {optimize=false, mergeFactor=10, 
os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, 
lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, 
os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
has deletions [delFileName=_ncc_22s.del]
test: open reader.OK [275881 deleted docs]
test: fields..OK [51 fields]
test: field norms.OK [51 fields]
test: terms, freq, prox...OK [17952652 terms; 174113812 terms/docs 
pairs; 204561440 tokens]
test: stored fields...OK [45511958 total field count; avg 
29.066 fields per doc]
test: term vectorsOK [0 total vector count; avg 0 term/freq 
vector fields per doc]



a few hours later:

*** broken segment :

  1 of 17: name=_ncc docCount=1841685
compound=false
hasProx=true
numFiles=9
size (MB)=6,683.447
diagnostics = {optimize=false, mergeFactor=10, 
os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, 
lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, 
os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
has deletions [delFileName=_ncc_24f.del]
test: open reader.OK [278167 deleted docs]
test: fields..OK [51 fields]
test: field norms.OK [51 fields]
test: terms, freq, prox...ERROR [term post_id:1599104 docFreq=1 != 
num docs seen 0 + num docs deleted 0]
java.lang.RuntimeException: term post_id:1599104 docFreq=1 != num docs 
seen 0 + num docs deleted 0
at 
org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
at 
org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)

at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
test: stored fields...OK [45429565 total field count; avg 
29.056 fields per doc]
test: term vectorsOK [0 total vector count; avg 0 term/freq 
vector fields per doc]

FAILED
WARNING: fixIndex() would remove reference to this segment; full 
exception:

java.lang.RuntimeException: Term Index test failed
at 
org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)

at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)


I'll activate infoStream for next time.


Thanks,


On 12/01/2011 00:49, Michael McCandless wrote:

When you hit corruption is it always this same problem?:

   java.lang.RuntimeException: term source:margolisphil docFreq=1 !=
num docs seen 0 + num docs deleted 0

Can you run with Lucene's IndexWriter infoStream turned on, and catch
the output leading to the corruption?  If something is somehow messing
up the bits in the deletes file that could cause this.
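
At the Lucene level that is IndexWriter.setInfoStream(); with Solr 1.4 I
believe the same thing can be switched on via the <infoStream> element under
indexDefaults in solrconfig.xml, so no code change should be needed. For
reference, a sketch of the raw Lucene 2.9 call (illustration only; don't
open a second writer against the live Solr index):

import java.io.File;
import java.io.PrintStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class InfoStreamExample {
  public static void main(String[] args) throws Exception {
    FSDirectory dir = FSDirectory.open(new File("/tmp/test-index"));  // placeholder
    IndexWriter writer = new IndexWriter(dir,
        new StandardAnalyzer(Version.LUCENE_29),
        IndexWriter.MaxFieldLength.UNLIMITED);
    // Every flush, merge, commit and deletion gets logged here; the output
    // leading up to a corruption is what Mike is asking for.
    writer.setInfoStream(new PrintStream(new File("/tmp/infostream.log")));
    // ... index documents as usual ...
    writer.close();
  }
}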

Mike

On Mon, Jan 10, 2011 at 5:52 AM, Stéphane Delprat
  wrote:

Hi,

We are using :
Solr Specification Version: 1.4.1
Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42
Lucene Specification Version: 2.9.3
Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55

# java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)

We want to index 4M docs in one core (and once that works fine we will add
other cores with 2M docs each on the same server) (1 doc ~= 1 kB).

We use Solr replication every 5 minutes to update the slave server (queries
are executed on the slave only).

Documents change very quickly; during a normal day we have approximately:
* 200 000 updated docs
* 1000 new docs
* 200 deleted docs


I attached the last good checkIndex : solr20110107.txt
And the corrupted one : solr20110110.txt


This is not the first time a segment has gotten corrupted on this server,
which is why I run checkIndex frequently. (But as you can see, the first
segment has 1,800,000 docs and it checks out fine!)


I can't find any "SEVERE", "FATAL" or "exception" entries in the Solr logs.


I also attached my schema.xml and solrconfig.xml


Is there something wrong with what we are doing? Do you need other info?


Thanks,





Re: What can cause segment corruption?

2011-01-11 Thread Stéphane Delprat

Thanks for your answer,

It's not a disk space problem here :

# df -h
FilesystemSize  Used Avail Use% Mounted on
/dev/sda4 280G   22G  244G   9% /


We will try to install Solr on a different server (we just need a little
time for that).



Stéphane


On 11/01/2011 15:42, Jason Rutherglen wrote:

Stéphane,

I've only seen production index corruption when the process ran out of
disk space during a merge, or when there is an underlying hardware-related
issue.

On Tue, Jan 11, 2011 at 5:06 AM, Stéphane Delprat
  wrote:

Hi,


I'm using Solr 1.4.1 (Lucene 2.9.3)

And some segments get corrupted:

  4 of 11: name=_p40 docCount=470035
compound=false
hasProx=true
numFiles=9
size (MB)=1,946.747
diagnostics = {optimize=true, mergeFactor=6, os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20,
java.vendor=Sun Microsystems Inc.}
has deletions [delFileName=_p40_bj.del]
test: open reader.OK [9299 deleted docs]
test: fields..OK [51 fields]
test: field norms.OK [51 fields]
test: terms, freq, prox...ERROR [term source:margolisphil docFreq=1 !=
num docs seen 0 + num docs deleted 0]
java.lang.RuntimeException: term source:margolisphil docFreq=1 != num docs
seen 0 + num docs deleted 0
at
org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
test: stored fields...OK [15454281 total field count; avg 33.543
fields per doc]
test: term vectorsOK [0 total vector count; avg 0 term/freq
vector fields per doc]
FAILED
WARNING: fixIndex() would remove reference to this segment; full
exception:
java.lang.RuntimeException: Term Index test failed
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)


What might cause this corruption?


I detailed my configuration here:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201101.mbox/%3c4d2ae506.7070...@blogspirit.com%3e

Thanks,





What can cause segment corruption?

2011-01-11 Thread Stéphane Delprat

Hi,


I'm using Solr 1.4.1 (Lucene 2.9.3)

And some segments get corrupted:

  4 of 11: name=_p40 docCount=470035
compound=false
hasProx=true
numFiles=9
size (MB)=1,946.747
diagnostics = {optimize=true, mergeFactor=6, 
os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, 
lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, 
os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}

has deletions [delFileName=_p40_bj.del]
test: open reader.OK [9299 deleted docs]
test: fields..OK [51 fields]
test: field norms.OK [51 fields]
test: terms, freq, prox...ERROR [term source:margolisphil docFreq=1 
!= num docs seen 0 + num docs deleted 0]
java.lang.RuntimeException: term source:margolisphil docFreq=1 != num 
docs seen 0 + num docs deleted 0
at 
org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
at 
org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)

at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
test: stored fields...OK [15454281 total field count; avg 
33.543 fields per doc]
test: term vectorsOK [0 total vector count; avg 0 term/freq 
vector fields per doc]

FAILED
WARNING: fixIndex() would remove reference to this segment; full 
exception:

java.lang.RuntimeException: Term Index test failed
at 
org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)

at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)


What might cause this corruption?


I detailed my configuration here:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201101.mbox/%3c4d2ae506.7070...@blogspirit.com%3e

Thanks,


segment gets corrupted (after background merge ?)

2011-01-10 Thread Stéphane Delprat

Hi,

We are using :
Solr Specification Version: 1.4.1
Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42
Lucene Specification Version: 2.9.3
Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55

# java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)

We want to index 4M docs in one core (and once that works fine we will add
other cores with 2M docs each on the same server) (1 doc ~= 1 kB).


We use Solr replication every 5 minutes to update the slave server
(queries are executed on the slave only).


Documents change very quickly; during a normal day we have approximately:

* 200 000 updated docs
* 1000 new docs
* 200 deleted docs


I attached the last good checkIndex : solr20110107.txt
And the corrupted one : solr20110110.txt


This is not the first time a segment has gotten corrupted on this server,
which is why I run checkIndex frequently. (But as you can see, the first
segment has 1,800,000 docs and it checks out fine!)



I can't find any "SEVERE", "FATAL" or "exception" entries in the Solr logs.


I also attached my schema.xml and solrconfig.xml


Is there something wrong with what we are doing? Do you need other info?


Thanks,

Opening index @ /solr/multicore/core1/data/index/

Segments file=segments_i7t numSegments=9 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
  1 of 9: name=_ncc docCount=1841685
compound=false
hasProx=true
numFiles=9
size (MB)=6,683.447
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64, 
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06 
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun 
Microsystems Inc.}
has deletions [delFileName=_ncc_13m.del]
test: open reader.OK [105940 deleted docs]
test: fields..OK [51 fields]
test: field norms.OK [51 fields]
test: terms, freq, prox...OK [17952652 terms; 174113812 terms/docs pairs; 
248678841 tokens]
test: stored fields...OK [51585300 total field count; avg 29.719 fields 
per doc]
test: term vectorsOK [0 total vector count; avg 0 term/freq vector 
fields per doc]

  2 of 9: name=_nqt docCount=431889
compound=false
hasProx=true
numFiles=9
size (MB)=1,671.375
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64, 
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06 
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun 
Microsystems Inc.}
has deletions [delFileName=_nqt_gt.del]
test: open reader.OK [10736 deleted docs]
test: fields..OK [51 fields]
test: field norms.OK [51 fields]
test: terms, freq, prox...OK [5211271 terms; 39824029 terms/docs pairs; 
67787288 tokens]
test: stored fields...OK [12562924 total field count; avg 29.83 fields 
per doc]
test: term vectorsOK [0 total vector count; avg 0 term/freq vector 
fields per doc]

  3 of 9: name=_ol7 docCount=913886
compound=false
hasProx=true
numFiles=9
size (MB)=3,567.63
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64, 
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06 
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun 
Microsystems Inc.}
has deletions [delFileName=_ol7_3.del]
test: open reader.OK [11 deleted docs]
test: fields..OK [51 fields]
test: field norms.OK [51 fields]
test: terms, freq, prox...OK [9825896 terms; 93954470 terms/docs pairs; 
152947518 tokens]
test: stored fields...OK [29587930 total field count; avg 32.376 fields 
per doc]
test: term vectorsOK [0 total vector count; avg 0 term/freq vector 
fields per doc]

  4 of 9: name=_ol2 docCount=1011
compound=false
hasProx=true
numFiles=8
size (MB)=6.959
diagnostics = {os.version=2.6.26-2-amd64, os=Linux, lucene.version=2.9.3 
951790 - 2010-06-06 01:30:55, source=flush, os.arch=amd64, 
java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
no deletions
test: open reader.OK
test: fields..OK [38 fields]
test: field norms.OK [38 fields]
test: terms, freq, prox...OK [54205 terms; 220705 terms/docs pairs; 389336 
tokens]
test: stored fields...OK [27402 total field count; avg 27.104 fields 
per doc]
test: term vectorsOK [0 total vector count; avg 0 term/freq vector 
fields per doc]

  5 of 9: name=_ol3 docCount=1000
compound=false
hasProx=true
numFiles=8
size (MB)=6.944
diagnostics = {os.version=2.6.26-2-amd64, os=Linux, lucene.version=2.9.3 
951790 - 2010-06-06 01:30:55, source=flush, os.arch=amd64, 
java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
no deletions
test: open reader.OK
test: fields..OK [33 fields]
t