For picking terms from a document that stand apart from those in a large
corpus, this tf*idf trick is nearly identical to using the log-likelihood
ratio test.  It produces pretty darned good results.
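
For comparison, here is a rough sketch of the log-likelihood ratio (G^2)
score for a single term, computed from a 2x2 table of counts. The function
names and the table layout are my own illustration, not taken from any
particular library: k11 is the term's count in the document, k12 the count
of all other terms in the document, k21 the term's count in the rest of the
corpus, and k22 the count of all other terms in the rest of the corpus.

```python
import math


def _x_log_x(x):
    # x * ln(x), with the usual convention that 0 * ln(0) = 0.
    return 0.0 if x == 0 else x * math.log(x)


def _entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts.
    return _x_log_x(sum(counts)) - sum(_x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    Assumed layout (illustrative):
      k11 = term count in the document
      k12 = other-term count in the document
      k21 = term count in the rest of the corpus
      k22 = other-term count in the rest of the corpus
    """
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    # Clamp at zero to absorb small negative round-off.
    return max(0.0, 2.0 * (row + col - mat))
```

A term that occurs at about the same rate in the document as in the corpus
scores near zero; a term that is much more frequent in the document than
elsewhere scores high, which is the same kind of ranking tf*idf gives.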

On Tue, Jul 17, 2012 at 8:22 PM, Ken Krugler <[email protected]> wrote:

> The simplistic approach I used was to extract the top 50 terms (with
> TF*IDF weights) from the target document, then use those terms (with
> weights as boosts) to do a regular Lucene OR query & request the top 20
> hits.
>
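
The approach quoted above could be sketched roughly as follows. This is only
an illustration under stated assumptions: the tf*idf variant (raw term count
times ln(N / (1 + df))), the function names, and the "body" field name are
all hypothetical, and the tokens are assumed to be plain words that need no
query escaping. The output is a query string in the classic Lucene query
parser syntax, where term^weight applies a boost.

```python
import math
from collections import Counter


def top_tfidf_terms(doc_tokens, doc_freq, num_docs, k=50):
    """Return the k highest-weighted (term, tf*idf) pairs for one document.

    doc_freq maps a term to the number of corpus documents containing it.
    The weighting used here is one common variant, chosen for illustration.
    """
    tf = Counter(doc_tokens)
    scored = {
        term: count * math.log(num_docs / (1 + doc_freq.get(term, 0)))
        for term, count in tf.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:k]


def boosted_or_query(weighted_terms, field="body"):
    """Build an OR query string with each term boosted by its weight."""
    return " OR ".join(
        f"{field}:{term}^{weight:.2f}"
        for term, weight in weighted_terms
        if weight > 0  # skip terms with no positive weight
    )


# Example: pick the top terms, then build the query to run for the top hits.
doc = ["lucene", "lucene", "the", "query"]
df = {"the": 999, "lucene": 9, "query": 99}
query = boosted_or_query(top_tfidf_terms(doc, df, num_docs=1000, k=50))
```

The resulting string would then be handed to the query parser and the search
run with a limit of 20 hits, as in the description above.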
