Re: What does "found existing value for PerFieldPostingsFormat.format" mean?
Continuation: I found a bug, but I'm not sure whether it's in Lucene or in Lucene's Javadoc.

In MultiFields:

    @SuppressWarnings({"unchecked","rawtypes"})
    @Override
    public Iterator<String> iterator() {
      Iterator<String> subIterators[] = new Iterator[subs.length];
      for(int i=0;i<subs.length;i++) {
        subIterators[i] = subs[i].iterator();
      }
      return new MergedIterator<>(subIterators);
    }

MergedIterator says in its Javadoc: "The behavior is undefined if the iterators are not actually sorted." And indeed, the iterators are _not_ actually sorted.

So I looked at where they come from, Fields#iterator(), which is documented fairly tersely: "Returns an iterator that will step through all fields names. This will not return null." That doesn't say anything about the names being in order.

So I assume that either:

(a) Fields#iterator() is actually supposed to be sorted, and the documentation should specify that but doesn't, or

(b) Fields#iterator() is not supposed to be sorted, but either MultiFields#iterator() or MergedIterator is supposed to be handling this better.

Either way, I think it's a bug in Lucene. But since I don't know which direction it's in, and I don't have a reproducible test case I can just hand over, I can't easily file it. :/

TX

To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
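To illustrate why the sortedness requirement matters, here is a minimal, self-contained sketch of a de-duplicating k-way merge. It is not Lucene's MergedIterator; the class `SortedMergeSketch`, the helper `Cursor`, and the field names are made up for this example. It only assumes the same general technique: take the smallest head element from a priority queue and drop it if it equals the previously emitted element. With sorted inputs each name comes out once. With an unsorted input the same name can come out twice, which would be one way for a downstream consumer to see a field a second time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not Lucene's MergedIterator.
class SortedMergeSketch {

    /** One input iterator plus the element it is currently positioned on. */
    private static final class Cursor implements Comparable<Cursor> {
        final Iterator<String> it;
        String current;

        Cursor(Iterator<String> it) {
            this.it = it;
            this.current = it.next(); // caller guarantees hasNext()
        }

        /** Moves to the next element; returns false when exhausted. */
        boolean advance() {
            if (it.hasNext()) {
                current = it.next();
                return true;
            }
            return false;
        }

        @Override
        public int compareTo(Cursor other) {
            return current.compareTo(other.current);
        }
    }

    /** Merges the inputs, skipping an element equal to the one just emitted. */
    static List<String> mergeDistinct(List<List<String>> inputs) {
        PriorityQueue<Cursor> queue = new PriorityQueue<>();
        for (List<String> input : inputs) {
            Iterator<String> it = input.iterator();
            if (it.hasNext()) {
                queue.add(new Cursor(it));
            }
        }
        List<String> out = new ArrayList<>();
        String previous = null;
        while (!queue.isEmpty()) {
            Cursor top = queue.poll();
            // De-duplication only compares against the last emitted element,
            // which is sufficient only if every input is sorted.
            if (!top.current.equals(previous)) {
                out.add(top.current);
                previous = top.current;
            }
            if (top.advance()) {
                queue.add(top);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("body", "path", "title");
        List<String> sortedB = Arrays.asList("body", "path", "title");
        List<String> unsortedB = Arrays.asList("title", "body", "path");

        // Sorted inputs: prints [body, path, title]
        System.out.println(mergeDistinct(Arrays.asList(a, sortedB)));
        // Unsorted input: "body" and "path" are emitted more than once
        System.out.println(mergeDistinct(Arrays.asList(a, unsortedB)));
    }
}
```

The second call shows the failure mode: the duplicate check never sees the earlier "body" again, so the name is emitted a second time.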
Re: Creating Queries agnostic to Lucene Versions
Any thoughts on the question below?

On Friday 14 October 2016, Rajnish Kamboj wrote:

> Hi
>
> How can I make my Lucene queries agnostic to Lucene Versions?
>
> e.g. NumericRangeQuery in 5.3.1 is LegacyNumericRangeQuery in 6.0.0
> (NumericRangeQuery is completely removed)
>
> --
> Rajnish
Re: What does "found existing value for PerFieldPostingsFormat.format" mean?
Additional investigation:

The index has two segments. Both segments have this "path-position" in the FieldInfo only once. The settings look the same:

FieldInfo in first sub-reader:

    name = "path-position"
    number = 6
    docValuesType = NONE
    storeTermVector = false
    omitNorms = true
    indexOptions = DOCS_AND_FREQS_AND_POSITIONS
    storePayloads = false
    attributes =
        "PerFieldPostingsFormat.format" -> "Lucene50"
        "PerFieldPostingsFormat.suffix" -> "0"
    dvGen = -1

FieldInfo in second sub-reader:

    name = "path-position"
    number = 6
    docValuesType = NONE
    storeTermVector = false
    omitNorms = true
    indexOptions = DOCS_AND_FREQS_AND_POSITIONS
    storePayloads = false
    attributes =
        "PerFieldPostingsFormat.format" -> "Lucene50"
        "PerFieldPostingsFormat.suffix" -> "0"
    dvGen = -1

So I'm confused. addIndexes, I thought, merged the data from the given readers into the destination writer. Here I have two fields with the same name, number and every other setting, and somehow the merge fails when it gets to the second one because the first one already exists... which, to me, seems like the whole point of merging, but maybe that's just me.

TX
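The wording of the exception ("old=Lucene50, new=Lucene50") is consistent with a put-then-check pattern: store the format name under the attribute key and refuse to continue if the key already held a value, even an identical one. The sketch below is a simplified model under that assumption and is not copied from Lucene; `AttributeCheckSketch`, `FieldAttributes`, and `recordFormat` are hypothetical names. It shows that such a check would fire whenever the same field object is written twice in one pass, regardless of whether the two values agree.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a "refuse to overwrite an attribute" check.
class AttributeCheckSketch {

    static final String FORMAT_KEY = "PerFieldPostingsFormat.format";

    /** Stand-in for a field's attribute map; not Lucene's FieldInfo. */
    static final class FieldAttributes {
        final String name;
        private final Map<String, String> attributes = new HashMap<>();

        FieldAttributes(String name) {
            this.name = name;
        }

        /** Stores the value and returns the previous one, or null if none. */
        String putAttribute(String key, String value) {
            return attributes.put(key, value);
        }
    }

    /** Records the format for a field; fails if one was already recorded. */
    static void recordFormat(FieldAttributes field, String formatName) {
        String previous = field.putAttribute(FORMAT_KEY, formatName);
        if (previous != null) {
            // Equal old and new values still count as a conflict here.
            throw new IllegalStateException(
                "found existing value for " + FORMAT_KEY
                    + ", field=" + field.name
                    + ", old=" + previous
                    + ", new=" + formatName);
        }
    }

    public static void main(String[] args) {
        FieldAttributes field = new FieldAttributes("path-position");
        recordFormat(field, "Lucene50"); // first visit succeeds
        try {
            recordFormat(field, "Lucene50"); // second visit of the same field
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this model, identical settings in both segments would not help: the check is about the field being visited twice, not about the values disagreeing.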
What does "found existing value for PerFieldPostingsFormat.format" mean?
Hi all.

Does anyone know what this error message means?

    found existing value for PerFieldPostingsFormat.format, field=path-position, old=Lucene50, new=Lucene50
    java.lang.IllegalStateException: found existing value for PerFieldPostingsFormat.format, field=path-position, old=Lucene50, new=Lucene50
        at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:170)
        at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:193)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:95)
        at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:2629)

We're doing fancy migrations which perform index changes by overriding FilterCodecReader and copying into a new index, but in this particular case the migration is only *deleting* values from the index, so it seems odd that I'd get this particular error.

TX
Re: Performance of Prefix, Wildcard and Regex queries?
It doesn't matter at all if you try to e.g. optimize a WildcardQuery like foo* into a PrefixQuery, because Lucene turns all of these queries into an AutomatonQuery anyway, which efficiently intersects a term automaton with the terms dictionary.

Mike McCandless
http://blog.mikemccandless.com

On Sun, Oct 16, 2016 at 8:54 PM, Trejkaz wrote:

> On Sat, Oct 15, 2016 at 1:21 AM, Rajnish Kamboj wrote:
>> Hi
>>
>> Performance of Prefix, Wildcard and Regex queries?
>> Does Lucene internally optimizes this (using rewrite or something else) or
>> I have to manually create specific queries depending on input pattern.
>>
>> Example
>> if input is 78* create Prefix query
>> if input is 87?98* create Wildcard query
>> if input is 87[7-5]* create Regex query.
>
> I think QueryParser already takes care of converting to PrefixQuery
> when possible.
>
> Regexes aren't really possible, though. Consider this:
>
>     abc* (wildcard query, matching abc followed by anything)
>
> Versus this:
>
>     abc* (regex query, matching ab followed by 0 or more c)
>
> I think for that, you're going to want additional syntax in your query parser.
>
> TX
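The ambiguity of abc* mentioned in the quoted reply can be checked with a small sketch. It uses java.util.regex as a stand-in; Lucene's own regular expression syntax differs in details, so this illustrates the semantics only and says nothing about how Lucene's query classes behave. The class `WildcardVsRegexSketch` and its methods are hypothetical names, and the wildcard rules assumed are the usual ones: `*` matches any sequence of characters and `?` matches exactly one.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; java.util.regex stands in for a regex engine.
class WildcardVsRegexSketch {

    /** Translates a wildcard pattern ('*' = any run, '?' = one char) to a regex. */
    static Pattern wildcardToPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                // Quote literal characters so they carry no regex meaning.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** True if the whole term matches the input read as a wildcard pattern. */
    static boolean matchesWildcard(String pattern, String term) {
        return wildcardToPattern(pattern).matcher(term).matches();
    }

    /** True if the whole term matches the input read as a regular expression. */
    static boolean matchesRegex(String pattern, String term) {
        return Pattern.compile(pattern).matcher(term).matches();
    }

    public static void main(String[] args) {
        String input = "abc*";
        for (String term : new String[] {"ab", "abccc", "abcd"}) {
            System.out.println(term
                + " wildcard=" + matchesWildcard(input, term)
                + " regex=" + matchesRegex(input, term));
        }
    }
}
```

The same input string accepts "abcd" only as a wildcard and "ab" only as a regex, so a parser cannot infer the intended query type from the pattern text alone.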
Boost fields for more like this query
Hi,

I'm using Lucene 4.8.1 and am trying to get MLT to give certain fields a bigger weight in the similarity calculation. Is this even possible? I only saw that I can give a boost to the MLTQuery itself, but not to a field.

Does anybody have any idea?

Regards,

Jürgen.

--
Jürgen Albert
Geschäftsführer
Data In Motion UG (haftungsbeschränkt)
Kahlaische Str. 4
07745 Jena
Mobil: 0157-72521634
E-Mail: j.alb...@datainmotion.de
Web: www.datainmotion.de
XING: https://www.xing.com/profile/Juergen_Albert5
Rechtliches
Jena HBR 507027
USt-IdNr: DE274553639
St.Nr.: 162/107/04586