Re: What does "found existing value for PerFieldPostingsFormat.format" mean?

2016-10-18 Thread Adrien Grand
We already have CheckIndex that verifies that Fields.iterator() returns a
sorted iterator so I think we should improve the javadocs of
Fields.iterator() to make it explicit.
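For readers following along, the ordering contract under discussion can be sketched as a simple check. This is only an illustration: isStrictlySorted is a hypothetical helper, not CheckIndex's actual code, and it assumes field names are compared with String.compareTo.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class SortedFieldNamesCheck {
    // Hypothetical helper (not part of Lucene): returns true only if the
    // names come out in strictly increasing String.compareTo order.
    static boolean isStrictlySorted(Iterator<String> names) {
        String previous = null;
        while (names.hasNext()) {
            String current = names.next();
            if (previous != null && previous.compareTo(current) >= 0) {
                return false; // out of order, or a repeated name
            }
            previous = current;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList("body", "path", "path-position");
        List<String> unsorted = Arrays.asList("path-position", "body", "path");
        System.out.println(isStrictlySorted(sorted.iterator()));   // true
        System.out.println(isStrictlySorted(unsorted.iterator())); // false
    }
}
```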

On Tue, Oct 18, 2016 at 05:15, Trejkaz wrote:

> Continuation, found a bug but I'm not sure whether it's in Lucene or
> Lucene's Javadoc.
>
> In MultiFields:
>
>   @SuppressWarnings({"unchecked","rawtypes"})
>   @Override
>   public Iterator<String> iterator() {
>     Iterator<String> subIterators[] = new Iterator[subs.length];
>     for(int i=0;i<subs.length;i++) {
>       subIterators[i] = subs[i].iterator();
>     }
>     return new MergedIterator<>(subIterators);
>   }
>
> MergedIterator says in the Javadoc:
>
> "The behavior is undefined if the iterators are not actually sorted."
>
> And indeed, the iterators are _not_ actually sorted. So I look at
> where they come from, Fields#iterator(), which is documented fairly
> tersely:
>
> "Returns an iterator that will step through all fields names.
> This will not return null."
>
> Which doesn't say anything about the names being in order. So I assume
> that either:
>
>   (a) Fields#iterator() is actually supposed to be sorted and the
> documentation should specify it but doesn't, or
>
>   (b) Fields#iterator() is not supposed to be sorted, but either
> MultiFields#iterator() or MergedIterator is supposed to be handling
> this better.
>
> Either way, I think it's a bug in Lucene. But since I don't know which
> direction it's in, and I don't have a reproducible test case I can
> just hand over, I can't easily file it. :/
>
> TX
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: What does "found existing value for PerFieldPostingsFormat.format" mean?

2016-10-17 Thread Trejkaz
Continuation, found a bug but I'm not sure whether it's in Lucene or
Lucene's Javadoc.

In MultiFields:

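  @SuppressWarnings({"unchecked","rawtypes"})
  @Override
  public Iterator<String> iterator() {
    Iterator<String> subIterators[] = new Iterator[subs.length];
    for(int i=0;i<subs.length;i++) {
      subIterators[i] = subs[i].iterator();
    }
    return new MergedIterator<>(subIterators);
  }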

MergedIterator says in the Javadoc:

"The behavior is undefined if the iterators are not actually sorted."

And indeed, the iterators are _not_ actually sorted. So I look at
where they come from, Fields#iterator(), which is documented fairly
tersely:

"Returns an iterator that will step through all fields names.
This will not return null."

Which doesn't say anything about the names being in order. So I assume
that either:

  (a) Fields#iterator() is actually supposed to be sorted and the
documentation should specify it but doesn't, or

  (b) Fields#iterator() is not supposed to be sorted, but either
MultiFields#iterator() or MergedIterator is supposed to be handling
this better.

Either way, I think it's a bug in Lucene. But since I don't know which
direction it's in, and I don't have a reproducible test case I can
just hand over, I can't easily file it. :/

TX
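
To illustrate why the ordering matters for the message above, here is a minimal sketch of a merge over several iterators that drops duplicates by comparing each value with the last one emitted. It is not Lucene's MergedIterator; MergeSketch, Head and mergeDistinct are hypothetical names, and the field names "body" and "title" are made up for the example. With sorted inputs each name comes out once; with one unsorted input the same name can come out twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class MergeSketch {
    // One pending value per sub-iterator, plus the rest of that iterator.
    private static final class Head {
        final String value;
        final Iterator<String> rest;
        Head(String value, Iterator<String> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    // Merges the inputs, skipping a value only when it equals the value
    // emitted immediately before it. That is only correct if every input
    // is sorted.
    static List<String> mergeDistinct(List<Iterator<String>> subs) {
        PriorityQueue<Head> queue =
            new PriorityQueue<>(Comparator.comparing((Head h) -> h.value));
        for (Iterator<String> sub : subs) {
            if (sub.hasNext()) {
                queue.add(new Head(sub.next(), sub));
            }
        }
        List<String> out = new ArrayList<>();
        String last = null;
        while (!queue.isEmpty()) {
            Head head = queue.poll();
            if (!head.value.equals(last)) {
                out.add(head.value);
                last = head.value;
            }
            if (head.rest.hasNext()) {
                queue.add(new Head(head.rest.next(), head.rest));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("body", "path-position");
        List<String> secondSorted = Arrays.asList("path-position", "title");
        List<String> secondUnsorted = Arrays.asList("title", "path-position");

        // prints [body, path-position, title]
        System.out.println(mergeDistinct(
            Arrays.asList(first.iterator(), secondSorted.iterator())));
        // prints [body, path-position, title, path-position]
        System.out.println(mergeDistinct(
            Arrays.asList(first.iterator(), secondUnsorted.iterator())));
    }
}
```

In the second call "path-position" is emitted twice, so any consumer that assumes each field name is visited exactly once would see the field a second time.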




Re: What does "found existing value for PerFieldPostingsFormat.format" mean?

2016-10-17 Thread Trejkaz
Additional investigation:

The index has two segments. Each segment has the "path-position" field
in its FieldInfos only once. The settings look the same:

FieldInfo in first sub-reader:
  name = "path-position"
  number = 6
  docValuesType = NONE
  storeTermVector = false
  omitNorms = true
  indexOptions = DOCS_AND_FREQS_AND_POSITIONS
  storePayloads = false
  attributes =
    "PerFieldPostingsFormat.format" -> "Lucene50"
    "PerFieldPostingsFormat.suffix" -> "0"
  dvGen = -1

FieldInfo in second sub-reader:
  name = "path-position"
  number = 6
  docValuesType = NONE
  storeTermVector = false
  omitNorms = true
  indexOptions = DOCS_AND_FREQS_AND_POSITIONS
  storePayloads = false
  attributes =
    "PerFieldPostingsFormat.format" -> "Lucene50"
    "PerFieldPostingsFormat.suffix" -> "0"
  dvGen = -1

So I'm confused. I thought addIndexes merged the data from the given
readers into the destination writer. Here I have two fields with the
same name, number and every other setting, yet the merge fails on the
second one because the first already exists... which, to me, seems
like the whole point of merging, but maybe that's just me.

TX
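
One reading consistent with the error text and the stack trace is that the write step records the format name once per field and refuses to overwrite an existing value, so visiting the same field name twice in a single pass would trip it even when old and new are identical. The sketch below simulates that with plain maps. It is an assumption for illustration, not Lucene's code; PutOnceSketch and assignFormats are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PutOnceSketch {
    static final String FORMAT_KEY = "PerFieldPostingsFormat.format";

    // Walks the field names once and records the format for each field.
    // A second visit to the same field finds the value already set and
    // fails, regardless of whether the new value matches the old one.
    static void assignFormats(List<String> fieldNames, String formatName) {
        Map<String, Map<String, String>> attributesByField = new HashMap<>();
        for (String field : fieldNames) {
            Map<String, String> attributes =
                attributesByField.computeIfAbsent(field, k -> new HashMap<>());
            String previous = attributes.put(FORMAT_KEY, formatName);
            if (previous != null) {
                throw new IllegalStateException(
                    "found existing value for " + FORMAT_KEY
                        + ", field=" + field
                        + ", old=" + previous
                        + ", new=" + formatName);
            }
        }
    }

    public static void main(String[] args) {
        // Each field visited once: no error.
        assignFormats(Arrays.asList("body", "path-position"), "Lucene50");

        // The same field visited twice: fails on the second visit.
        try {
            assignFormats(
                Arrays.asList("body", "path-position", "path-position"),
                "Lucene50");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this assumption, the question is less about the two segments disagreeing and more about why the same field name would be handed to the writer twice.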




What does "found existing value for PerFieldPostingsFormat.format" mean?

2016-10-17 Thread Trejkaz
Hi all.

Does anyone know what this error message means?

found existing value for PerFieldPostingsFormat.format,
field=path-position, old=Lucene50, new=Lucene50
java.lang.IllegalStateException: found existing value for PerFieldPostingsFormat.format, field=path-position, old=Lucene50, new=Lucene50
at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:170)
at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:193)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:95)
at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:2629)

We're doing fancy migrations which perform index changes by overriding
FilterCodecReader and copying into a new index, but in this particular
case the migration is only *deleting* values from the index, so it
seems odd that I'd get this particular error.

TX
