RE: Strange index corruption related to numeric fields when upgrading from 6.0.1

2016-09-21 Thread Jan-Willem van den Broek
Hi Erick,

Isn't that a Solr restriction? I can't find anything about it in the Lucene 
docs.

If it applies to Lucene as well, then we have some work to do, since the 
brackets are indeed part of the field name (as is a leading space). We use 
names like that a lot to avoid collisions between generated and user-supplied 
names.

I don't think it's the key to this issue, though. The change that fixed my 
test case still uses brackets and spaces: the Point and StoredField still use 
the name " [1]calculon", but the DoubleDocValuesField is renamed to 
" [p]calculon".
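
The renaming described above can be sketched as a small helper. This is only 
an illustration: `TaggedFieldName` and its `of` method are hypothetical names, 
not part of Lucene or of the application discussed here, and the thread does 
not say what the tags `1` and `p` stand for. The sketch shows only how giving 
each field kind its own tag yields distinct names for the same base name.

```java
/** Hypothetical helper: builds names of the form " [tag]base", as seen in this thread. */
final class TaggedFieldName {
    private TaggedFieldName() {}

    /**
     * Returns a leading space, the tag in square brackets, then the base name.
     * Different tags give different field names for the same base name.
     */
    static String of(char tag, String baseName) {
        return " [" + tag + "]" + baseName;
    }
}
```

Here `TaggedFieldName.of('1', "calculon")` gives the name shared by the Point 
and StoredField, and `TaggedFieldName.of('p', "calculon")` gives the separate 
name used for the DoubleDocValuesField.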

Thanks for the suggestion, though. I'd never even considered that we might be 
using illegal field names.

Regards,
Jan-Willem

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, September 20, 2016 19:02
To: java-user <java-user@lucene.apache.org>
Subject: Re: Strange index corruption related to numeric fields when upgrading 
from 6.0.1

A wild shot in the dark: Are the square brackets really part of the field name? 
They have never officially been supported, from the Ref
Guide:

"Field names should consist of alphanumeric or underscore characters only and 
not start with a digit. This is not currently strictly enforced, but other 
field names will not have first class support from all components and back 
compatibility is not guaranteed"

Your statement "I cannot reproduce the issue if I give the DoubleDocValuesField 
a different name" seems to indicate that it's not a code problem with Lucene if 
you don't put the brackets in.

Best,
Erick

On Tue, Sep 20, 2016 at 9:04 AM, Jan-Willem van den Broek 
<jan-willem.van.den.br...@valuecare.nl> wrote:
> Hi all,
>
> I have an application that works fine with 6.0.1, but if I go to 6.1.0 or 
> 6.2.0 then I occasionally get a corrupted index where the SegmentMerger keeps 
> breaking on a numeric field.
>
> This is the exception I get:
>
> ... (stack of application code) ...
> Caused by: java.lang.IllegalArgumentException: field=" [1]calculon" did not index point values
>     at org.apache.lucene.codecs.lucene60.Lucene60PointsReader.getBKDReader(Lucene60PointsReader.java:126)
>     at org.apache.lucene.codecs.lucene60.Lucene60PointsReader.size(Lucene60PointsReader.java:224)
>     at org.apache.lucene.codecs.lucene60.Lucene60PointsWriter.merge(Lucene60PointsWriter.java:169)
>     at org.apache.lucene.index.SegmentMerger.mergePoints(SegmentMerger.java:173)
>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:122)
>     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4312)
>     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3889)
>     at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
>     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
>
> The field " [1]calculon" is always either a LongPoint or DoublePoint with 1 
> dimension. The documents containing this field always also contain both a 
> StoredField, and a DoubleDocValuesField with the same name.
>
> I cannot reproduce the issue if I give the DoubleDocValuesField a different 
> name. Is that something that I should be doing in general? I was under the 
> impression that it is OK to use the same name for all three related fields.
>
> Here is the infostream from a test that reproduces the issue: 
> http://wikisend.com/download/613238/merges.log
>
> Unfortunately, while I can reproduce the issue consistently in the full 
> application, I don't yet have a clean test case with just/mostly Lucene code.
>
> Any feedback is much appreciated!
>
> Jan-Willem v/d Broek
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>



Re: Strange index corruption related to numeric fields when upgrading from 6.0.1

2016-09-20 Thread Erick Erickson
A wild shot in the dark: Are the square brackets really part of the
field name? They have never officially been supported, from the Ref
Guide:

"Field names should consist of alphanumeric or underscore characters
only and not start with a digit. This is not currently strictly
enforced, but other field names will not have first class support from
all components and back compatibility is not guaranteed"
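
As a rough illustration of the rule quoted above, the check below accepts only
names made of letters, digits, and underscores that do not start with a digit.
`FieldNameCheck` and `isRecommended` are hypothetical names, and reading
"alphanumeric" as ASCII-only is an assumption; this is not a check that Lucene
or Solr is claimed to ship.

```java
import java.util.regex.Pattern;

/** Hypothetical check for the field-name recommendation quoted above. */
final class FieldNameCheck {
    // Letters, digits, or underscore only; the first character must not be a digit.
    // Assumption: "alphanumeric" is taken to mean ASCII letters and digits.
    private static final Pattern RECOMMENDED =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private FieldNameCheck() {}

    /** Returns true if the name follows the quoted recommendation. */
    static boolean isRecommended(String name) {
        return name != null && RECOMMENDED.matcher(name).matches();
    }
}
```

With this check, a name such as " [1]calculon" is rejected because of the
leading space and the brackets, while a name such as "calculon_1" passes.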

Your statement "I cannot reproduce the issue if I give the
DoubleDocValuesField a different name" seems to indicate that it's not
a code problem with Lucene if you don't put the brackets in.

Best,
Erick

On Tue, Sep 20, 2016 at 9:04 AM, Jan-Willem van den Broek wrote:
> Hi all,
>
> I have an application that works fine with 6.0.1, but if I go to 6.1.0 or 
> 6.2.0 then I occasionally get a corrupted index where the SegmentMerger keeps 
> breaking on a numeric field.
>
> This is the exception I get:
>
> ... (stack of application code) ...
> Caused by: java.lang.IllegalArgumentException: field=" [1]calculon" did not index point values
>     at org.apache.lucene.codecs.lucene60.Lucene60PointsReader.getBKDReader(Lucene60PointsReader.java:126)
>     at org.apache.lucene.codecs.lucene60.Lucene60PointsReader.size(Lucene60PointsReader.java:224)
>     at org.apache.lucene.codecs.lucene60.Lucene60PointsWriter.merge(Lucene60PointsWriter.java:169)
>     at org.apache.lucene.index.SegmentMerger.mergePoints(SegmentMerger.java:173)
>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:122)
>     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4312)
>     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3889)
>     at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
>     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
>
> The field " [1]calculon" is always either a LongPoint or DoublePoint with 1 
> dimension. The documents containing this field always also contain both a 
> StoredField, and a DoubleDocValuesField with the same name.
>
> I cannot reproduce the issue if I give the DoubleDocValuesField a different 
> name. Is that something that I should be doing in general? I was under the 
> impression that it is OK to use the same name for all three related fields.
>
> Here is the infostream from a test that reproduces the issue: 
> http://wikisend.com/download/613238/merges.log
>
> Unfortunately, while I can reproduce the issue consistently in the full 
> application, I don't yet have a clean test case with just/mostly Lucene code.
>
> Any feedback is much appreciated!
>
> Jan-Willem v/d Broek



Strange index corruption related to numeric fields when upgrading from 6.0.1

2016-09-20 Thread Jan-Willem van den Broek
Hi all,

I have an application that works fine with 6.0.1, but if I go to 6.1.0 or 6.2.0 
then I occasionally get a corrupted index where the SegmentMerger keeps 
breaking on a numeric field.

This is the exception I get:

... (stack of application code) ...
Caused by: java.lang.IllegalArgumentException: field=" [1]calculon" did not index point values
    at org.apache.lucene.codecs.lucene60.Lucene60PointsReader.getBKDReader(Lucene60PointsReader.java:126)
    at org.apache.lucene.codecs.lucene60.Lucene60PointsReader.size(Lucene60PointsReader.java:224)
    at org.apache.lucene.codecs.lucene60.Lucene60PointsWriter.merge(Lucene60PointsWriter.java:169)
    at org.apache.lucene.index.SegmentMerger.mergePoints(SegmentMerger.java:173)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:122)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4312)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3889)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)

The field " [1]calculon" is always either a LongPoint or a DoublePoint with 1 
dimension. Documents containing this field also always contain both a 
StoredField and a DoubleDocValuesField with the same name.

I cannot reproduce the issue if I give the DoubleDocValuesField a different 
name. Is that something that I should be doing in general? I was under the 
impression that it is OK to use the same name for all three related fields.

Here is the infostream from a test that reproduces the issue: 
http://wikisend.com/download/613238/merges.log

Unfortunately, while I can reproduce the issue consistently in the full 
application, I don't yet have a clean test case with just/mostly Lucene code.

Any feedback is much appreciated!

Jan-Willem v/d Broek
