Re: Rich positions (was "boosting fields")

2006-04-29 Marvin Humphrey
On Apr 29, 2006, at 12:40 AM, Marvin Humphrey wrote: One file, the "PostingsFile", which merges the FreqFile, ProxFile, and Boost/Norm for each posting into a single contiguous block, with an eye towards aggressively minimizing disk seeks. Interpolating the positions between the Freqs is in
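To make the PostingsFile idea more concrete, here is a minimal sketch of one possible record layout. It assumes each posting is written as a doc-number delta, the freq, a single boost/norm byte, and then the position deltas, with everything except the boost byte stored as a VInt. The class name, method names, and field order are hypothetical. This is neither Lucene's file format nor Marvin's actual design.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical "PostingsFile" layout: for each posting, write the doc delta,
// the freq, a one-byte boost/norm, and the position deltas back to back, so a
// single sequential read yields everything the scorer needs for that doc.
class PostingsFileSketch {

    // Variable-length int: 7 bits per byte, high bit set on every byte but the last.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // docs must be ascending, each positions[i] must be ascending,
    // and norms[i] is the already-encoded boost byte for docs[i].
    static byte[] encode(int[] docs, int[][] positions, byte[] norms) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int lastDoc = 0;
        for (int i = 0; i < docs.length; i++) {
            writeVInt(out, docs[i] - lastDoc);    // doc number as a delta
            lastDoc = docs[i];
            writeVInt(out, positions[i].length);  // freq
            out.write(norms[i]);                  // boost/norm byte, inline with the posting
            int lastPos = 0;
            for (int pos : positions[i]) {
                writeVInt(out, pos - lastPos);    // positions as deltas
                lastPos = pos;
            }
        }
        return out.toByteArray();
    }
}
```

Because the boost byte sits inside the posting, a scorer that reads doc, freq, and positions sequentially picks up the multiplier in the same pass. It never needs a separate seek into a norms file.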

Re: Rich positions (was "boosting fields")

2006-04-29 Marvin Humphrey
    score *= normDecoder[norms[doc] & 0xFF]; // normalize for field

If we're talking NORMS_IN_FREQ, then you'd replace that line with one call to getBoost() against the TermDocs (or maybe getNorm? getMultiplier?). I'll start there. Considering I don't have to worry about any index f
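The quoted line looks up a float in a 256-entry table, keyed by one byte per document. Below is a sketch of that kind of codec, modeled on the lossy one-byte norm encoding Lucene uses (3-bit mantissa, 5-bit exponent). The class and field names are illustrative. The `& 0xFF` is needed because Java bytes are signed: without it, any byte value of 128 or more would become a negative array index.

```java
// Sketch of a one-byte float codec in the spirit of Lucene's norm encoding
// (3 mantissa bits, 5 exponent bits), plus the 256-entry decode table that a
// line like `score *= normDecoder[norms[doc] & 0xFF]` indexes into.
class NormCodecSketch {

    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);            // drop all but the top 3 mantissa bits
        if (smallfloat < (63 - 15) << 3) {            // too small: zero, or the smallest positive code
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) { // too large: clamp to the maximum code
            return (byte) -1;
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    static float decode(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xFF) << (24 - 3);            // unsigned byte back into mantissa/exponent position
        bits += (63 - 15) << 24;                      // restore the exponent bias
        return Float.intBitsToFloat(bits);
    }

    // Precomputed table, indexed by the unsigned value of a norm byte.
    static final float[] NORM_DECODER = new float[256];
    static {
        for (int i = 0; i < 256; i++) {
            NORM_DECODER[i] = decode((byte) i);
        }
    }
}
```

The encoding is lossy. With this sketch, encode(1.0f) round-trips exactly, but encode(0.9f) decodes back to 0.875f.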

Re: Rich positions (was "boosting fields")

2006-04-27 karl wettin
On 28 Apr 2006, at 00:30, Marvin Humphrey wrote: On Apr 27, 2006, at 2:35 PM, karl wettin wrote: What will be required in the IndexReader? Is it enough to add getBoost() in the TermEnum? How would the value be sent to the scorer? It wouldn't be the TermEnum, it would be a TermDocs subclass.

Re: Rich positions (was "boosting fields")

2006-04-27 Marvin Humphrey
On Apr 27, 2006, at 2:35 PM, karl wettin wrote: What will be required in the IndexReader? Is it enough to add getBoost() in the TermEnum? How would the value be sent to the scorer? It wouldn't be the TermEnum, it would be a TermDocs subclass. If we're talking BOOST_PER_POSITION, it would
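Marvin's answer puts the multiplier on the TermDocs rather than the TermEnum. Here is a minimal sketch of what that could look like. It assumes a hypothetical `BoostedTermDocs` interface (not a real Lucene type) and a deliberately simplified tf * idf * boost score. The `getBoost()` call takes the place of the cached-norms lookup `score *= normDecoder[norms[doc] & 0xFF]`.

```java
// Hypothetical TermDocs-style iterator that also exposes a per-posting
// multiplier, so the scorer reads it from the postings instead of from a
// cached norms array. None of these names are real Lucene APIs.
interface BoostedTermDocs {
    boolean next();   // advance to the next matching doc; false when exhausted
    int doc();        // current doc number
    int freq();       // term frequency in the current doc
    float getBoost(); // multiplier stored alongside this posting
}

// Array-backed implementation, for illustration only.
class ArrayBoostedTermDocs implements BoostedTermDocs {
    private final int[] docs;
    private final int[] freqs;
    private final float[] boosts;
    private int i = -1;

    ArrayBoostedTermDocs(int[] docs, int[] freqs, float[] boosts) {
        this.docs = docs;
        this.freqs = freqs;
        this.boosts = boosts;
    }

    public boolean next() { return ++i < docs.length; }
    public int doc() { return docs[i]; }
    public int freq() { return freqs[i]; }
    public float getBoost() { return boosts[i]; }
}

class BoostedScorerSketch {
    // Scores every matching doc as sqrt(freq) * idf * boost. sqrt(freq) mirrors
    // Lucene's default tf; everything else is simplified for illustration.
    static float[] scoreAll(BoostedTermDocs td, float idf, int maxDoc) {
        float[] scores = new float[maxDoc];
        while (td.next()) {
            float score = (float) Math.sqrt(td.freq()) * idf;
            score *= td.getBoost(); // instead of: score *= normDecoder[norms[doc] & 0xFF];
            scores[td.doc()] = score;
        }
        return scores;
    }
}
```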

Re: Rich positions (was "boosting fields")

2006-04-27 Doug Cutting
Marvin Humphrey wrote: Incidentally, how about calling it BOOST_PER_POSITION instead? +1, that is more consistent with other naming. Doug

Re: Rich positions (was "boosting fields")

2006-04-27 karl wettin
On 27 Apr 2006, at 18:41, Doug Cutting wrote: karl wettin wrote: Boost per position, etc. sounds very expensive. Indeed. It will probably nearly double the size of indexes and also increase search time. But it is also very powerful. Consider the posting representation Google describes on
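Here is a back-of-the-envelope version of Doug's size estimate. It assumes positions are stored as VInt deltas, which are usually one byte each, and that boost-per-position appends one extra byte to every position. Under those assumptions the position data exactly doubles. The whole index grows by somewhat less, since the other files are unaffected. The names are illustrative.

```java
// Rough arithmetic for the cost of boost-per-position, assuming VInt-encoded
// position deltas plus (optionally) one boost byte per position.
class ProxSizeSketch {

    // Number of bytes a VInt (7 data bits per byte) needs for this value.
    static int vIntLength(int value) {
        int n = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            n++;
        }
        return n;
    }

    // Bytes needed for ascending positions stored as deltas,
    // optionally with one boost byte after each position.
    static int proxBytes(int[] positions, boolean boostPerPosition) {
        int total = 0;
        int last = 0;
        for (int pos : positions) {
            total += vIntLength(pos - last);
            if (boostPerPosition) total += 1;
            last = pos;
        }
        return total;
    }
}
```

With positions 3, 10, 40, and 100, every delta fits in one byte, so the data grows from 4 bytes to 8.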

Re: Rich positions (was "boosting fields")

2006-04-27 Marvin Humphrey
Now that I think about it, putting the score-multiplier into the FreqFile does offer a benefit I hadn't considered before. It makes it possible to tie the score multiplier to a term within a doc, rather than a field within a doc. Say you have a doc with a "body" field that's 1000 terms l

Re: Rich positions (was "boosting fields")

2006-04-27 Marvin Humphrey
On Apr 27, 2006, at 12:17 PM, Doug Cutting wrote: Marvin Humphrey wrote: Moving away from cached norms was the second of three major changes to the file format on my agenda, and the one I was all but certain I wouldn't be able to sell to the Lucene community. The first was using bytec

Re: Rich positions (was "boosting fields")

2006-04-27 Doug Cutting
Marvin Humphrey wrote: Moving away from cached norms was the second of three major changes to the file format on my agenda, and the one I was all but certain I wouldn't be able to sell to the Lucene community. The first was using bytecounts at the head of Strings. The third was storing st
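Here is a sketch of what "bytecounts at the head of Strings" means in practice. The length prefix counts UTF-8 bytes instead of characters, so a reader can skip or bulk-copy a string without decoding it first. The class and method names are hypothetical, and the VInt is the usual 7-bits-per-byte encoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical string encoding: VInt byte count, then the UTF-8 bytes.
class ByteCountStringSketch {

    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static byte[] writeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, utf8.length);      // length in bytes, not characters
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    // Reads the VInt byte count, then decodes exactly that many bytes.
    static String readString(byte[] data) {
        int pos = 0, length = 0, shift = 0;
        byte b;
        do {
            b = data[pos++];
            length |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return new String(data, pos, length, StandardCharsets.UTF_8);
    }
}
```

For "héllo", the prefix is 6 (bytes), whereas a character count would be 5.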

Rich positions (was "boosting fields")

2006-04-27 Marvin Humphrey
On Apr 27, 2006, at 9:41 AM, Doug Cutting wrote: karl wettin wrote: My own immediate thought is to compromise by allowing boost per term in document. Simply remove the norms-methods from the IndexReader and add a new one to the TermEnum and fall back on the field boost. How would the v

Re: boosting fields

2006-04-27 Doug Cutting
karl wettin wrote: My own immediate thought is to compromise by allowing boost per term in document. Simply remove the norms-methods from the IndexReader and add a new one to the TermEnum and fall back on the field boost. How would the value be picked up by the scorer? Boost per position,
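karl's compromise is a boost per term in a document that falls back on the field boost. One way to model it is as a two-level lookup. This is a hypothetical sketch: the class and its methods are not Lucene APIs. It also leaves open how the value would reach the scorer, which is exactly Doug's question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-document boost lookup: a per-term boost if one was set,
// otherwise the field boost, otherwise 1.0.
class TermBoostSketch {
    private final Map<String, Float> fieldBoosts = new HashMap<>();
    private final Map<String, Map<String, Float>> termBoosts = new HashMap<>();

    void setFieldBoost(String field, float boost) {
        fieldBoosts.put(field, boost);
    }

    void setTermBoost(String field, String term, float boost) {
        termBoosts.computeIfAbsent(field, f -> new HashMap<>()).put(term, boost);
    }

    float getBoost(String field, String term) {
        Map<String, Float> perTerm = termBoosts.get(field);
        if (perTerm != null && perTerm.containsKey(term)) {
            return perTerm.get(term);                  // term-level boost wins
        }
        return fieldBoosts.getOrDefault(field, 1.0f);  // fall back on the field boost
    }
}
```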

Re: boosting fields

2006-04-27 karl wettin
On 26 Apr 2006, at 19:18, Doug Cutting wrote: karl wettin wrote: How about refactoring fields to something like: [Document](fieldName)<#> {0..1} ->[Field +boost]<#> {0..*} -> [FieldValue +store +index +termVector] If you think you have a simple, back-compatible way to do this, pleas

Re: boosting fields

2006-04-26 Doug Cutting
karl wettin wrote: karl wettin wrote: This could lead me to believe I can use different boost for fields with the same name within one document. You can. The values are multiplied to produce the final boost value for the field. It's not really the same thing as I tried to describe thou

Re: boosting fields

2006-04-25 karl wettin
On 25 Apr 2006, at 19:34, Doug Cutting wrote: karl wettin wrote: This could lead me to believe I can use different boost for fields with the same name within one document. You can. The values are multiplied to produce the final boost value for the field. This is described in: http://luce

Re: boosting fields

2006-04-25 Doug Cutting
karl wettin wrote: This could lead me to believe I can use different boost for fields with the same name within one document. You can. The values are multiplied to produce the final boost value for the field. This is described in: http://lucene.apache.org/java/docs/api/org/apache/lucene/d
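Doug says the boosts of same-named fields "are multiplied to produce the final boost value for the field." That rule can be sketched as a fold over a document's fields. `NamedBoost` is a hypothetical stand-in for a field. The sketch also omits any document boost or length normalization that may be folded into the stored norm.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: boosts of same-named fields in one document combine by multiplication.
class CombinedBoostSketch {

    // Hypothetical (name, boost) pair standing in for a field.
    static final class NamedBoost {
        final String name;
        final float boost;

        NamedBoost(String name, float boost) {
            this.name = name;
            this.boost = boost;
        }
    }

    static Map<String, Float> combine(List<NamedBoost> fields) {
        Map<String, Float> result = new HashMap<>();
        for (NamedBoost f : fields) {
            result.merge(f.name, f.boost, (a, b) -> a * b); // multiply into the running value
        }
        return result;
    }
}
```

Two `foo` fields boosted 1.5 and 2.0 collapse into a single 3.0. After that, the individual per-value boosts can no longer be told apart, and that is the distinction karl is after.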

Re: boosting fields

2006-04-25 karl wettin
On 25 Apr 2006, at 18:56, karl wettin wrote: How about refactoring fields to something like: [Document](fieldName)<#> {0..1} ->[Field +boost]<#> {0..*} ->[FieldValue +store +index +termVector] instead of as now: [Document](fieldName)<#> {0..1} ->[Field +boost +store +index +term
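Here is one reading of karl's proposed model as plain Java classes. A Document holds at most one Field per name, the boost lives on that Field, and each FieldValue carries its own store/index/termVector settings. This is a hypothetical sketch of the diagram, not an existing Lucene API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of:
// [Document](fieldName) {0..1} -> [Field +boost] {0..*} -> [FieldValue +store +index +termVector]
class ProposedDocument {

    static final class FieldValue {
        final String text;
        final boolean store, index, termVector;

        FieldValue(String text, boolean store, boolean index, boolean termVector) {
            this.text = text;
            this.store = store;
            this.index = index;
            this.termVector = termVector;
        }
    }

    static final class Field {
        float boost = 1.0f;                          // one boost per field name
        final List<FieldValue> values = new ArrayList<>();
    }

    private final Map<String, Field> fields = new LinkedHashMap<>();

    // Returns the single Field for this name, creating it on first use.
    Field getField(String name) {
        return fields.computeIfAbsent(name, n -> new Field());
    }
}
```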

boosting fields

2006-04-25 karl wettin
I don't like how fields are configured.

    Document doc = new Document();
    Field f;
    f = new Field("foo", "bar tzar", Field.Store.NO, Field.Index.TOKENIZED, Field.TermVector.YES);
    f.setBoost(1.5f);
    doc.add(f);
    f = new Field("foo", "blah yada", Field.Store.NO, Field.Index.T