: > Hoss guessed that we could override Term Frequency with PreAnalyzedField[1]
: > for the per-keyword scores, since keywords (tags) always have a Term
: > Frequency of 1 and the TF calculation is very fast. However it turns out
: > that you can't[2] specify TF in the PreAnalyzedField.

Yeah ... sorry for steering you in the wrong direction there.

Mikhail's suggestion is dead on what i thought you could already do with
PreAnalyzedField...

: if "manipulating tf" is a possible approach, why don't extend
: KeywordTokenizer to make it work in the following manner:
:
: "3|wheel" -> {wheel,wheel,wheel}
:
: it will allow supply your per-term-per-doc boosts as a prefixes for field
: values and multiply them during indexing internally.

..to be clear, this won't/shouldn't be as inefficient and memory bloated as
it sounds, because you don't actually have to copy the "Term" N times. You
should just be able to have the TokenStream you return from your Tokenizer
implement incrementToken() by simply incrementing a counter and returning
true until it's been called N times, w/o modifying any other state.

Or at least ... that's my theory ... i've been wrong before.

-Hoss
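
A minimal sketch of that counter idea, in plain Java so it runs without
Lucene on the classpath. RepeatingKeywordSketch, currentTerm() and drain()
are hypothetical names, and this is NOT the real Tokenizer/TokenStream API:
it only mirrors the incrementToken() contract (return true while there is
another token, false afterwards). The "N|term" prefix parsing is an
assumption based on the "3|wheel" example above.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a TokenStream-like object; hypothetical, not a Lucene class.
final class RepeatingKeywordSketch {
    private final String term;  // the keyword, stored exactly once
    private final int repeat;   // N from the "N|term" prefix
    private int emitted = 0;    // the only state that changes between calls

    // Accepts "3|wheel" style values; a value without '|' repeats once.
    RepeatingKeywordSketch(String value) {
        int sep = value.indexOf('|');
        int n = (sep < 0) ? 1 : Integer.parseInt(value.substring(0, sep));
        if (n < 1) {
            throw new IllegalArgumentException("repeat count must be >= 1: " + value);
        }
        this.repeat = n;
        this.term = (sep < 0) ? value : value.substring(sep + 1);
    }

    // Same shape as incrementToken(): bump a counter, return true N times.
    boolean incrementToken() {
        if (emitted < repeat) {
            emitted++;
            return true;
        }
        return false;
    }

    // The term is never copied; every "token" exposes the same String.
    String currentTerm() {
        return term;
    }

    // Allows the stream to be consumed again from the start.
    void reset() {
        emitted = 0;
    }

    // Convenience for inspection only: collects what a consumer would see.
    static List<String> drain(String value) {
        RepeatingKeywordSketch stream = new RepeatingKeywordSketch(value);
        List<String> seen = new ArrayList<>();
        while (stream.incrementToken()) {
            seen.add(stream.currentTerm());
        }
        return seen;
    }
}
```

Calling RepeatingKeywordSketch.drain("3|wheel") yields the term "wheel"
three times while only one String is ever held. In an actual Lucene
Tokenizer the term would presumably be set once on the term attribute and
left alone between calls; whether indexing then counts it as TF=3 is the
untested part of the theory.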