Re: DocValues questions

2013-04-04 Thread Wei Wang
We have now started using NumericDocValuesField. The Javadoc says the old types, such as ShortDocValuesField, are deprecated. So even though the values of a field are shorts, we still use NumericDocValuesField, as the Javadoc suggests. However, when we called setIntValue() on NumericDocValuesField, we got an e

Re: DocValues questions

2013-04-04 Thread Wei Wang
Thanks! Good to know the codec uses a variable-length encoding mechanism here. On Thu, Apr 4, 2013 at 3:36 PM, Adrien Grand wrote: > On Thu, Apr 4, 2013 at 11:03 PM, Wei Wang wrote: > > Given the new Lucene 4.2 DocValues API, it seems no matter it is byte, > > short, int, or long, they are all st
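As Wei summarizes Adrien's reply, the codec uses a variable-length encoding, so declaring a value as long does not by itself mean 64 bits per document. The plain-Java sketch below illustrates one such scheme (bit packing): each value is stored as its offset from the minimum, using just enough bits for the range. PackedLongs is a hypothetical class; Lucene 4.2's actual doc-values format is not claimed to work exactly this way.

```java
/**
 * Hypothetical sketch: long values stored as (value - min), bit-packed with
 * just enough bits to cover max - min. Not Lucene's on-disk format.
 */
final class PackedLongs {
    private final long min;          // every value is stored as an offset from this
    private final int bitsPerValue;  // bits needed for the largest offset (at least 1)
    private final long[] blocks;     // packed offsets, 64 bits per block

    PackedLongs(long[] values) {
        long lo = 0, hi = 0;
        if (values.length > 0) {
            lo = Long.MAX_VALUE;
            hi = Long.MIN_VALUE;
            for (long v : values) {
                lo = Math.min(lo, v);
                hi = Math.max(hi, v);
            }
        }
        min = lo;
        // hi - lo is read as an unsigned 64-bit range, so even the full long range fits.
        bitsPerValue = Math.max(1, 64 - Long.numberOfLeadingZeros(hi - lo));
        blocks = new long[(int) (((long) values.length * bitsPerValue + 63) / 64)];
        for (int i = 0; i < values.length; i++) {
            set(i, values[i] - min);
        }
    }

    /** Shorts are widened to long first; the value range, not the declared type, sets the width. */
    static PackedLongs fromShorts(short... values) {
        long[] widened = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            widened[i] = values[i];
        }
        return new PackedLongs(widened);
    }

    int bitsPerValue() { return bitsPerValue; }

    int storedLongs() { return blocks.length; }

    private void set(int index, long offset) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        blocks[block] |= offset << shift;
        int spill = shift + bitsPerValue - 64;   // bits that overflow into the next block
        if (spill > 0) {
            blocks[block + 1] |= offset >>> (64 - shift);
        }
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
        long offset = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            offset |= blocks[block + 1] << (64 - shift);
        }
        return min + (offset & mask);
    }
}
```

For example, shorts in the range 100 to 107 need only 3 bits each, so 30 of them fit in two longs instead of thirty.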

Re: Why does index boosting a field to 2.0f on a document have such a dramatic effect

2013-04-04 Thread Paul Taylor
On 04/04/2013 23:26, Chris Hostetter wrote: : At index time I boost the alias field of a small set of documents, setting the : boost to 2.0f, which I thought meant equivalent to doubling the score this doc : would get over another doc, everything else being equal. 1) you haven't shown us enough

Re: DocValues questions

2013-04-04 Thread Adrien Grand
On Thu, Apr 4, 2013 at 11:03 PM, Wei Wang wrote: > Given the new Lucene 4.2 DocValues API, it seems no matter it is byte, > short, int, or long, they are all stored as NumericDocValuesField. Does > this mean "long" values are always stored regardless of the initial type? > If so, do we still save

Re: Why does index boosting a field to 2.0f on a document have such a dramatic effect

2013-04-04 Thread Chris Hostetter
: At index time I boost the alias field of a small set of documents, setting the : boost to 2.0f, which I thought meant equivalent to doubling the score this doc : would get over another doc, everything else being equal. 1) you haven't shown us enough details to be certain, but based on the code
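For context on where an index-time boost ends up: in Lucene 4.x, DefaultSimilarity multiplies the field boost by a length factor, 1/sqrt(number of terms), and stores the product as a one-byte norm. The sketch below reproduces that one-byte encoding (modeled on SmallFloat.floatToByte315 and byte315ToFloat) in a hypothetical NormSketch class so the arithmetic can be checked in isolation. It ignores the other scoring factors (tf, idf, coord, queryNorm), so it illustrates the norm only, not the full score Paul is seeing.

```java
/**
 * Hypothetical sketch of Lucene 4.x-style norms: boost * 1/sqrt(numTerms),
 * quantized to one byte with the "315" small-float layout.
 */
final class NormSketch {
    private static final int ZERO_EXP_OFFSET = (63 - 15) << 3;

    /** Lossy float-to-byte encoding: keeps the exponent and only the top mantissa bits. */
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallFloat = bits >> (24 - 3);
        if (smallFloat <= ZERO_EXP_OFFSET) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;   // zero, negative, or underflow
        }
        if (smallFloat >= ZERO_EXP_OFFSET + 0x100) {
            return (byte) -1;                            // overflow: largest encodable value
        }
        return (byte) (smallFloat - ZERO_EXP_OFFSET);
    }

    /** Inverse of encode; decoded values are rounded down to the nearest representable value. */
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    /** The norm value a field would be scored with after a round trip through the index. */
    static float storedNorm(float boost, int numTerms) {
        float lengthNorm = (float) (1.0 / Math.sqrt(numTerms));
        return decode(encode(boost * lengthNorm));
    }
}
```

A boost of exactly 2.0f only shifts the float exponent, so it survives encoding and doubles the stored norm (for a three-term field, 0.5 becomes 1.0). A boost such as 1.2f on a one-term field, by contrast, decodes back to 1.0f and is lost.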

Re: DocValues questions

2013-04-04 Thread Wei Wang
Given the new Lucene 4.2 DocValues API, it seems that whether a value is a byte, short, int, or long, it is always stored as a NumericDocValuesField. Does this mean "long" values are always stored regardless of the initial type? If so, do we still save space if the value range is small? Do we need to give some

RE: Uable to extends TopTermsRewrite in Lucene 4.1

2013-04-04 Thread Uwe Schindler
Hi, this also looks fine. If the generics in the FuzzyRewrite from the last mail are correct, the cast in this rewrite is not needed either (and DisjunctionMaxQuery implements Iterable, so you can use a simple for-each loop): @Override public Query rewrite(final IndexReader reader

RE: Uable to extends TopTermsRewrite in Lucene 4.1

2013-04-04 Thread Uwe Schindler
> Okay, think I have it now. Now have a working rewrite method for Fuzzy > Queries > > public static class FuzzyTermRewrite > extends TopTermsRewrite { > > public FuzzyTermRewrite(int size) { > super(size); > } > > @Override > protected int g
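TopTermsRewrite, as its name suggests, keeps only a bounded number of the best-scoring expanded terms (the `size` passed to the constructor in the quoted code). The general idea can be sketched in plain Java with a bounded min-heap; TopTermsSketch and its method are hypothetical, and Lucene's own implementation differs in detail.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical sketch: keep the `size` highest-scoring terms using a bounded min-heap. */
final class TopTermsSketch {
    static List<String> topTerms(Map<String, Float> scoredTerms, int size) {
        // Head of the heap is the weakest kept term: lowest score, ties broken by larger term.
        PriorityQueue<Map.Entry<String, Float>> heap = new PriorityQueue<>((a, b) -> {
            int byScore = Float.compare(a.getValue(), b.getValue());
            return byScore != 0 ? byScore : b.getKey().compareTo(a.getKey());
        });
        for (Map.Entry<String, Float> term : scoredTerms.entrySet()) {
            heap.offer(term);
            if (heap.size() > size) {
                heap.poll();   // evict the current weakest term
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());   // weakest first
        }
        Collections.reverse(result);            // best first
        return result;
    }
}
```

The heap never holds more than `size + 1` entries, so memory stays bounded no matter how many terms the multi-term query expands to.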

Re: Scoring function in LMDirichletSimilarity Class

2013-04-04 Thread Peter Organisciak
I think this is the problem that you're running into, though maybe a person with more expertise can confirm... ZP, if you look at section 5.1 of the Zhai Lafferty paper (http://www.cs.cmu.edu/~lafferty/pub/smooth-tois.ps), they note that the "term weight is log(1+(1-\lambda)p_ml(q_i|d) / \lambda
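For comparison with the Jelinek-Mercer weight quoted above, Zhai and Lafferty's Dirichlet-smoothed query likelihood ranks a document by summing, over matched query terms, log(1 + c(w;d) / (mu * p(w|C))), plus a length penalty log(mu / (|d| + mu)) for each query term. The sketch computes that per-term quantity; the method name and sample numbers are illustrative, and it is not claimed to match LMDirichletSimilarity line for line (for example, in how negative values are handled).

```java
/** Hypothetical sketch of the per-term Dirichlet-smoothed weight from Zhai and Lafferty. */
final class DirichletSketch {
    /**
     * log(1 + freq / (mu * pCollection)) + log(mu / (docLen + mu)).
     *
     * @param freq        occurrences of the term in the document
     * @param docLen      document length in terms
     * @param mu          Dirichlet prior (smoothing strength)
     * @param pCollection probability of the term in the whole collection
     */
    static double termWeight(double freq, double docLen, double mu, double pCollection) {
        double match = Math.log(1 + freq / (mu * pCollection));
        double lengthPenalty = Math.log(mu / (docLen + mu));   // always <= 0
        return match + lengthPenalty;
    }
}
```

With freq = 1, docLen = 1000, mu = 2000 and pCollection = 0.01, the match part is log(1.05), about 0.049, while the length penalty is log(2/3), about -0.405, so the weight is negative. A frequent term in a short document (freq = 100, docLen = 100, pCollection = 0.001) comes out positive.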

Re: DocValues questions

2013-04-04 Thread Wei Wang
Hi Adrien, Thanks for the clarification. It is very helpful. We will try Lucene 4.2 and the AtomicReader API. Wei On Thu, Apr 4, 2013 at 11:22 AM, Adrien Grand wrote: > Hi, > > On Thu, Apr 4, 2013 at 10:30 AM, Wei Wang wrote: > > A few quick questions about DocValues: > > > > 1. If only small number

Re: DocValues questions

2013-04-04 Thread Adrien Grand
Hi, On Thu, Apr 4, 2013 at 10:30 AM, Wei Wang wrote: > A few quick questions about DocValues: > > 1. If only small number of documents have a ShortDocValueField defined, > should each document in the index has this field filled with some value? > The add() function of Document seems not enforce a

Re: MLT Using a Query created in a different index

2013-04-04 Thread Jack Krupansky
The heart of MLT is examining the top result of a query (or maybe more than one) and identifying the "top" terms from the top document(s) and then simply using those top terms for a subsequent query. The term ranking would of course depend on term frequency, and other relevancy considerations -
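The term-selection step Jack describes can be sketched in plain Java: score each term of the top document by term frequency times an inverse document frequency, keep the best few, and turn them into the follow-up query. The class, the classic Lucene-style idf of 1 + ln(numDocs / (docFreq + 1)), and the OR-string output are assumptions for illustration, not the MoreLikeThis implementation; docFreqs and numDocs stand for whatever collection statistics are available to the index that runs the query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of MLT-style term selection by tf * idf. */
final class MoreLikeThisSketch {
    /** Classic Lucene-style idf: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    /** Ranks the document's terms by tf * idf and returns the best maxTerms, best first. */
    static List<String> topTerms(Map<String, Integer> termFreqs, Map<String, Integer> docFreqs,
                                 int numDocs, int maxTerms) {
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            int df = docFreqs.getOrDefault(e.getKey(), 0);
            score.put(e.getKey(), e.getValue() * idf(df, numDocs));
        }
        List<String> terms = new ArrayList<>(score.keySet());
        terms.sort((a, b) -> {
            int byScore = Double.compare(score.get(b), score.get(a));   // descending score
            return byScore != 0 ? byScore : a.compareTo(b);             // stable tie-break
        });
        return new ArrayList<>(terms.subList(0, Math.min(maxTerms, terms.size())));
    }

    /** Renders the selected terms as a simple disjunction for the subsequent query. */
    static String toQuery(String field, List<String> terms) {
        List<String> clauses = new ArrayList<>();
        for (String t : terms) {
            clauses.add(field + ":" + t);
        }
        return String.join(" OR ", clauses);
    }
}
```

A very common word such as "the" gets an idf near 1 and drops out even when its raw frequency is the highest in the document.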

MLT Using a Query created in a different index

2013-04-04 Thread Peter Lavin
Dear Users, I am doing some research where Lucene is integrated into agent technology. Part of this work involves using an MLT query in an index which was not created from a document in that index (i.e. the query is created, serialised and sent to the remote agent). Can anyone point me towa

RE: Document scoring order?

2013-04-04 Thread Uwe Schindler
> Hi Otis, > > It depends on the Scorer implementation. The default iterates through > matching documents by calling nextDoc(), which just moves along the > postings lists in-order, but you could roll your own. You're pretty > constrained > by the fact that the low-level DocIdSetIterators only

Re: Uable to extends TopTermsRewrite in Lucene 4.1

2013-04-04 Thread Paul Taylor
On 04/04/2013 10:59, Paul Taylor wrote: On 27/02/2013 10:28, Uwe Schindler wrote: Hi Paul, QueryParser and MTQ's rewrite method have nothing to do with each other. The rewrite method is (explained as simple as possible) a class that is responsible to "rewrite" a MultiTermQuery to another que

Re: Uable to extends TopTermsRewrite in Lucene 4.1

2013-04-04 Thread Paul Taylor
On 27/02/2013 10:28, Uwe Schindler wrote: Hi Paul, QueryParser and MTQ's rewrite method have nothing to do with each other. The rewrite method is (explained as simple as possible) a class that is responsible to "rewrite" a MultiTermQuery to another query type (generally a query that allows to
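The rewrite Uwe describes turns a MultiTermQuery into a query over concrete terms. A plain-Java sketch of that expansion step for a prefix pattern, using a sorted set as a stand-in for the term dictionary; PrefixRewriteSketch and its string output are hypothetical, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.StringJoiner;

/** Hypothetical sketch: expand a prefix against a sorted term dictionary, then OR the terms. */
final class PrefixRewriteSketch {
    /** Terms sharing the prefix form one contiguous run starting at the prefix itself. */
    static List<String> expand(NavigableSet<String> termDictionary, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String term : termDictionary.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;   // sorted order: no later term can match
            }
            matches.add(term);
        }
        return matches;
    }

    /** Renders the expansion as a disjunction of term clauses. */
    static String rewrite(String field, NavigableSet<String> termDictionary, String prefix) {
        StringJoiner joiner = new StringJoiner(" OR ");
        for (String term : expand(termDictionary, prefix)) {
            joiner.add(field + ":" + term);
        }
        return joiner.toString();
    }
}
```

Because the dictionary is sorted, the scan stops at the first non-matching term instead of visiting the whole vocabulary.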

DocValues questions

2013-04-04 Thread Wei Wang
A few quick questions about DocValues: 1. If only a small number of documents have a ShortDocValueField defined, should every document in the index have this field filled with some value? The add() function of Document does not seem to enforce that a DocValues field is added to every document. 2. Is there

Re: Document scoring order?

2013-04-04 Thread Alan Woodward
Hi Otis, It depends on the Scorer implementation. The default iterates through matching documents by calling nextDoc(), which just moves along the postings lists in-order, but you could roll your own. You're pretty constrained by the fact that the low-level DocIdSetIterators only move forward
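Alan's point is that doc-id iterators only move forward, which is why matching documents are visited in increasing doc-id order. A plain-Java sketch of that contract and of a forward-only conjunction (leapfrog intersection) over two postings lists; SortedDocIterator is hypothetical and only mirrors the shape of Lucene's DocIdSetIterator (docID(), nextDoc(), advance(), NO_MORE_DOCS), not its implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical forward-only iterator over a sorted postings list. */
final class SortedDocIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs;   // strictly increasing doc ids
    private int index = -1;     // -1 means "not started"

    SortedDocIterator(int... docs) {
        this.docs = docs;
    }

    int docID() {
        if (index < 0) {
            return -1;
        }
        return index < docs.length ? docs[index] : NO_MORE_DOCS;
    }

    int nextDoc() {
        index++;
        return docID();
    }

    /** Moves to the first doc >= target (target must exceed the current doc); never moves back. */
    int advance(int target) {
        int doc;
        do {
            doc = nextDoc();
        } while (doc < target);
        return doc;
    }

    /** Leapfrog intersection: whichever iterator is behind advances to the other's doc. */
    static List<Integer> intersect(SortedDocIterator a, SortedDocIterator b) {
        List<Integer> hits = new ArrayList<>();
        int docA = a.nextDoc();
        int docB = b.nextDoc();
        while (docA != NO_MORE_DOCS && docB != NO_MORE_DOCS) {
            if (docA == docB) {
                hits.add(docA);   // matches come out in increasing doc-id order
                docA = a.nextDoc();
                docB = b.nextDoc();
            } else if (docA < docB) {
                docA = a.advance(docB);
            } else {
                docB = b.advance(docA);
            }
        }
        return hits;
    }
}
```

A real implementation would make advance() skip ahead (for example with skip lists) rather than step one posting at a time; the forward-only contract is what makes that possible.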