: > Hoss guessed that we could override Term Frequency with PreAnalyzedField[1]
: > for the per-keyword scores, since keywords (tags) always have a Term
: > Frequency of 1 and the TF calculation is very fast. However, it turns out
: > that you can't[2] specify TF in the PreAnalyzedField.

Yeah ... sorry for steering you in the wrong direction there.

Mikhail's suggestion is dead on: it's exactly what I thought you could
already do with PreAnalyzedField...

: if "manipulating tf" is a possible approach, why not extend
: KeywordTokenizer to make it work in the following manner:
: 
: "3|wheel" -> {wheel,wheel,wheel}
: 
: it would allow you to supply your per-term-per-doc boosts as prefixes on
: field values and have them multiplied internally during indexing.
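To make the proposed "3|wheel" encoding concrete, here is a minimal stdlib-only Java sketch of the mapping from a prefixed value to the tokens it would produce. `WeightedKeyword` and `expand` are hypothetical names, not part of Solr or Lucene, and it assumes the prefix is a positive integer and that a value with no `|` counts once.

```java
import java.util.Collections;
import java.util.List;

public class WeightedKeyword {
    // Hypothetical helper: "3|wheel" -> [wheel, wheel, wheel].
    // Assumption: the prefix before '|' is a positive integer count;
    // a value with no '|' is treated as a single occurrence.
    static List<String> expand(String value) {
        int bar = value.indexOf('|');
        if (bar < 0) {
            return Collections.singletonList(value);
        }
        int count = Integer.parseInt(value.substring(0, bar).trim());
        String term = value.substring(bar + 1);
        // nCopies returns an immutable view that repeats one reference,
        // so the term string itself is not duplicated.
        return Collections.nCopies(count, term);
    }

    public static void main(String[] args) {
        System.out.println(expand("3|wheel")); // [wheel, wheel, wheel]
        System.out.println(expand("tire"));    // [tire]
    }
}
```

This only illustrates the value-to-tokens mapping; it says nothing about how a Tokenizer would actually emit those tokens at index time.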

...to be clear, this won't (or at least shouldn't) be as inefficient and
memory-bloated as it sounds, because you don't actually have to copy the
"Term" N times. You should be able to have the TokenStream returned by your
Tokenizer implement incrementToken() by simply incrementing a counter and
returning true until it's been called N times, without modifying any other
state.
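As a rough, untested model of that theory, here is a stdlib-only Java class that mimics the `incrementToken()` contract (return true while a token is available) without depending on Lucene. `RepeatingTermStream` is a hypothetical name. The assumption is that in a real Lucene `Tokenizer` subclass you would write the term into its `CharTermAttribute` once, on the first call, and later calls would only bump the counter and return true, without calling `clearAttributes()`, so the attribute keeps its value.

```java
public class RepeatingTermStream {
    private final String term;     // parsed once, never copied per token
    private final int repetitions; // N, e.g. 3 for "3|wheel"
    private int emitted = 0;       // the only state that changes per token

    // Hypothetical constructor: parses "N|term"; a value with no '|' repeats once.
    RepeatingTermStream(String value) {
        int bar = value.indexOf('|');
        this.repetitions = bar < 0 ? 1 : Integer.parseInt(value.substring(0, bar).trim());
        this.term = bar < 0 ? value : value.substring(bar + 1);
    }

    // Models TokenStream.incrementToken(): true while another token is available.
    // Nothing but the counter is touched, so the "current term" stays the same.
    boolean incrementToken() {
        if (emitted >= repetitions) {
            return false;
        }
        emitted++;
        return true;
    }

    String term() {
        return term;
    }

    public static void main(String[] args) {
        RepeatingTermStream ts = new RepeatingTermStream("3|wheel");
        int tf = 0;
        while (ts.incrementToken()) {
            tf++; // an indexer would record one more occurrence of ts.term()
        }
        System.out.println(ts.term() + " tf=" + tf); // wheel tf=3
    }
}
```

Since every "token" is the same term in the same document, the indexer's per-document count for that term ends up as N, which is the tf value the per-keyword score is meant to ride on.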

Or at least ... that's my theory ... I've been wrong before.

-Hoss
