Re: What does "found existing value for PerFieldPostingsFormat.format" mean?

2016-10-17 Thread Trejkaz
Continuing from my earlier messages: I found a bug, but I'm not sure
whether it's in Lucene itself or in Lucene's Javadoc.

In MultiFields:

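  @SuppressWarnings({"unchecked","rawtypes"})
  @Override
  public Iterator<String> iterator() {
    Iterator<String> subIterators[] = new Iterator[subs.length];
    for(int i=0;i<subs.length;i++) {
      subIterators[i] = subs[i].iterator();
    }
    return new MergedIterator<>(subIterators);
  }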

MergedIterator says in the Javadoc:

"The behavior is undefined if the iterators are not actually sorted."

And indeed, the iterators are _not_ actually sorted. So I look at
where they come from, Fields#iterator(), which is documented fairly
tersely:

"Returns an iterator that will step through all fields names.
This will not return null."

Which doesn't say anything about the names being in order. So I assume
that either:

  (a) Fields#iterator() is actually supposed to be sorted and the
documentation should specify it but doesn't, or

  (b) Fields#iterator() is not supposed to be sorted, but either
MultiFields#iterator() or MergedIterator is supposed to be handling
this better.

Either way, I think it's a bug in Lucene. But since I don't know which
direction it's in, and I don't have a reproducible test case I can
just hand over, I can't easily file it. :/
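
To illustrate why unsorted input matters here, the following is a minimal,
self-contained sketch. It is not Lucene's MergedIterator: the class
SortedMergeSketch and the method mergeAssumingSorted are hypothetical names
made up for this example, and the merge is a simplified stand-in that assumes
duplicates are dropped by comparing each element with the one emitted just
before it. Under that assumption, an unsorted sub-iterator can make the same
field name come out twice, which would be consistent with a "found existing
value" failure further down the line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class SortedMergeSketch {

    // One sub-iterator together with its current head element.
    private static final class Head {
        final Iterator<String> it;
        String value;

        Head(Iterator<String> it) {
            this.it = it;
            this.value = it.next();
        }
    }

    /**
     * Merges inputs that are each ASSUMED to be sorted. Duplicates are
     * dropped only when they are adjacent in the merged stream, so the
     * result is unreliable if an input is not actually sorted.
     */
    public static List<String> mergeAssumingSorted(List<List<String>> inputs) {
        PriorityQueue<Head> queue =
            new PriorityQueue<>((a, b) -> a.value.compareTo(b.value));
        for (List<String> input : inputs) {
            Iterator<String> it = input.iterator();
            if (it.hasNext()) {
                queue.add(new Head(it));
            }
        }
        List<String> out = new ArrayList<>();
        String previous = null;
        while (!queue.isEmpty()) {
            Head head = queue.poll();
            if (!head.value.equals(previous)) {
                out.add(head.value);
                previous = head.value;
            }
            if (head.it.hasNext()) {
                // Advance this sub-iterator and put it back in the queue.
                head.value = head.it.next();
                queue.add(head);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sorted inputs: the shared name is emitted once.
        System.out.println(mergeAssumingSorted(Arrays.asList(
            Arrays.asList("path-position", "title"),
            Arrays.asList("path-position"))));
        // prints [path-position, title]

        // First input is NOT sorted: the shared name is emitted twice.
        System.out.println(mergeAssumingSorted(Arrays.asList(
            Arrays.asList("title", "path-position"),
            Arrays.asList("path-position"))));
        // prints [path-position, title, path-position]
    }
}
```

The field names are borrowed from this thread purely for illustration; the
real segments may well hit the problem with a different ordering.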

TX

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Creating Queries agnostic to Lucene Versions

2016-10-17 Thread Rajnish kamboj
Any thoughts on the question below?


On Friday 14 October 2016, Rajnish Kamboj wrote:

> Hi
>
> How can I make my Lucene queries agnostic to Lucene Versions?
>
> e.g. NumericRangeQuery in 5.3.1 is LegacyNumericRangeQuery in 6.0.0
> (NumericRangeQuery is completely removed)
>
>
>
> --
> Rajnish
>


Re: What does "found existing value for PerFieldPostingsFormat.format" mean?

2016-10-17 Thread Trejkaz
Additional investigation:

The index has two segments. Each segment has the "path-position" field
in its FieldInfos exactly once, and the settings look the same:

FieldInfo in first sub-reader:
  name = "path-position"
  number = 6
  docValuesType = NONE
  storeTermVector = false
  omitNorms = true
  indexOptions = DOCS_AND_FREQS_AND_POSITIONS
  storePayloads = false
  attributes =
    "PerFieldPostingsFormat.format" -> "Lucene50"
    "PerFieldPostingsFormat.suffix" -> "0"
  dvGen = -1

FieldInfo in second sub-reader:
  name = "path-position"
  number = 6
  docValuesType = NONE
  storeTermVector = false
  omitNorms = true
  indexOptions = DOCS_AND_FREQS_AND_POSITIONS
  storePayloads = false
  attributes =
    "PerFieldPostingsFormat.format" -> "Lucene50"
    "PerFieldPostingsFormat.suffix" -> "0"
  dvGen = -1

So I'm confused. I thought addIndexes merged the data from the given
readers into the destination writer. Here I have two fields with the
same name, number and every other setting, yet the merge fails when it
reaches the second one because the first one already exists. To me,
handling that case seems like the whole point of merging, but maybe
that's just me.

TX




What does "found existing value for PerFieldPostingsFormat.format" mean?

2016-10-17 Thread Trejkaz
Hi all.

Does anyone know what this error message means?

found existing value for PerFieldPostingsFormat.format,
field=path-position, old=Lucene50, new=Lucene50
java.lang.IllegalStateException: found existing value for PerFieldPostingsFormat.format, field=path-position, old=Lucene50, new=Lucene50
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:170)
    at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
    at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:193)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:95)
    at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:2629)

We're doing fancy migrations which perform index changes by overriding
FilterCodecReader and copying into a new index, but in this particular
case the migration is only *deleting* values from the index, so it
seems odd that I'd get this particular error.

TX




Re: Performance of Prefix, Wildcard and Regex queries?

2016-10-17 Thread Michael McCandless
It doesn't matter at all if you try to e.g. optimize a WildcardQuery
like foo* into a PrefixQuery, because Lucene turns all of these
queries into an AutomatonQuery anyway, which efficiently intersects a
term automaton with the terms dictionary.
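
As a rough illustration of the syntax point discussed in this thread, here is
a small self-contained sketch. It uses java.util.regex as a stand-in and does
not touch Lucene's automaton classes; PatternSyntaxSketch, fromWildcard and
fromPrefix are hypothetical names, and the wildcard translation assumes only
'*' and '?' are special (no escape handling). It shows that a prefix and the
wildcard foo* describe the same set of strings, while abc* read as a regex
describes a different set than abc* read as a wildcard.

```java
import java.util.regex.Pattern;

public class PatternSyntaxSketch {

    /** Wildcard syntax: '*' is any run of characters, '?' is any single character. */
    public static Pattern fromWildcard(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                // Quote everything else so it is matched literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Prefix syntax: the literal prefix followed by anything. */
    public static Pattern fromPrefix(String prefix) {
        return Pattern.compile(Pattern.quote(prefix) + ".*", Pattern.DOTALL);
    }

    public static void main(String[] args) {
        // Prefix "foo" and wildcard "foo*" accept the same term.
        System.out.println(fromPrefix("foo").matcher("foobar").matches());    // prints true
        System.out.println(fromWildcard("foo*").matcher("foobar").matches()); // prints true

        // "abc*" as a wildcard requires the literal "abc" ...
        System.out.println(fromWildcard("abc*").matcher("ab").matches());     // prints false
        // ... but as a regex it means "ab" followed by zero or more "c".
        System.out.println(Pattern.compile("abc*").matcher("ab").matches());  // prints true
    }
}
```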

Mike McCandless

http://blog.mikemccandless.com


On Sun, Oct 16, 2016 at 8:54 PM, Trejkaz wrote:
> On Sat, Oct 15, 2016 at 1:21 AM, Rajnish Kamboj wrote:
>> Hi
>>
>> Performance of Prefix, Wildcard and Regex queries?
>> Does Lucene optimize these internally (using rewrite or something else), or
>> do I have to manually create specific queries depending on the input pattern?
>>
>> Example
>> if input is 78* create Prefix query
>> if input is 87?98* create Wildcard query
>> if input is 87[7-5]* create Regex query.
>
> I think QueryParser already takes care of converting to PrefixQuery
> when possible.
>
> Regexes aren't really possible, though. Consider this:
>
> abc* (wildcard query, matching abc followed by anything)
>
> Versus this:
>
> abc*  (regex query, matching ab followed by 0 or more c)
>
> I think for that, you're going to want additional syntax in your query parser.
>
> TX




Boost fields for more like this query

2016-10-17 Thread Jürgen Albert

Hi,

I'm using Lucene 4.8.1 and am trying to get MLT (More Like This) to give
certain fields a bigger weight in the similarity calculation. Is this
even possible? I only saw that I can give a boost to the MLTQuery itself,
but not to a field. Does anybody have an idea?


Regards,

Jürgen.


--
Jürgen Albert
Geschäftsführer

Data In Motion UG (haftungsbeschränkt)

Kahlaische Str. 4
07745 Jena

Mobil:  0157-72521634
E-Mail: j.alb...@datainmotion.de
Web: www.datainmotion.de

XING:   https://www.xing.com/profile/Juergen_Albert5

Rechtliches

Jena HBR 507027
USt-IdNr: DE274553639
St.Nr.: 162/107/04586

