On Sun, Nov 13, 2011 at 10:09 PM, Ted Dunning <[email protected]> wrote:

> That handles coherent.
>
> It doesn't handle usable.
>
> Storing the vectors as binary payloads handles the situation for
> projection-like applications, but that doesn't help retrieval.
>

It's not just projection, it's for added relevance: if you are already
using Lucene for your scoring needs, you are already getting reasonably
good precision and recall.

The idea is this: you take results you are *already* scoring, and add to
that scoring function an LSI cosine as one feature among many.  Hopefully
it will improve precision, even if it will do nothing for recall (as it's
only being applied to results already retrieved by the text query).
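
To make that concrete, here is a minimal sketch in plain Python (not Lucene
code). It assumes you can look up a stored LSI vector per document and
project the query into the same space; the names rescore, doc_vecs and
lsi_weight are made up for illustration, and the linear combination is just
one simple way to mix the cosine in as a feature.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rescore(hits, query_vec, doc_vecs, lsi_weight=0.5):
    """Re-rank hits already returned by the text query.

    hits: list of (doc_id, text_score) pairs from the text search.
    doc_vecs: hypothetical lookup of doc_id -> stored LSI vector.
    lsi_weight: assumed weight of the LSI cosine feature.
    """
    rescored = [
        (doc_id, text_score + lsi_weight * cosine(query_vec, doc_vecs[doc_id]))
        for doc_id, text_score in hits
    ]
    # Only the order changes; no document is added or dropped.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Toy data: "a" wins on text score alone, "b" is closer in LSI space.
hits = [("a", 1.0), ("b", 0.9)]
doc_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
query_vec = [0.0, 1.0]
ranked = rescore(hits, query_vec, doc_vecs)
```

The result set is the same size before and after, which is why this can
only move precision and not recall.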

Alternatively, to improve recall, at index time, supplement each document
with terms in a new field "lsi_expanded": the terms closest to the document
in the SVD-projected space that aren't already in it.  Then at query time,
add an "... OR lsi_expanded:<query>" clause onto your query.
Instant query expansion for recall enhancement.
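
Roughly, in plain Python (again not Lucene code): this sketch assumes you
have term vectors and a document vector in the same projected space, and
the helper names lsi_expansion_terms and expand_query, the cutoff k, and
the parenthesized query string are illustrative choices, not anything
Lucene or Mahout provides under those names.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def lsi_expansion_terms(doc_terms, doc_vec, term_vecs, k=2):
    """Pick the k terms nearest the document that it does not contain.

    doc_terms: set of terms already present in the document.
    term_vecs: hypothetical lookup of term -> vector in the projected space.
    """
    candidates = [
        (cosine(doc_vec, vec), term)
        for term, vec in term_vecs.items()
        if term not in doc_terms
    ]
    candidates.sort(reverse=True)  # highest cosine first
    return [term for _, term in candidates[:k]]

def expand_query(query):
    """Append the OR clause against the expanded field at query time."""
    return f"({query}) OR lsi_expanded:({query})"

# Toy data: "automobile" sits near a document that only mentions "car".
term_vecs = {
    "car": [1.0, 0.0],
    "automobile": [0.9, 0.1],
    "banana": [0.0, 1.0],
}
expansion = lsi_expansion_terms({"car"}, [1.0, 0.0], term_vecs, k=1)
expanded_query = expand_query("automobile")
```

The terms returned at index time are what you would write into the
"lsi_expanded" field, so a query for "automobile" can now reach a document
that only ever said "car".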

Or do both, and play with both your precision and recall.

  -jake
