Just to forestall some effort on this: LLR is very good as a threshold, but
its value makes a poor score, so substituting TF or TFIDF is entirely
appropriate.

There may be use cases for keeping LLR, if only for diagnostic purposes.
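
To make the threshold-versus-score distinction concrete, here is a minimal
sketch in Python. It is not Mahout's code; the function names, the counts,
and the cutoff of 10.0 are all made up for illustration. It computes the
usual log-likelihood ratio for a 2x2 contingency table, uses it only to
decide which bigrams to keep, and then weights the survivors by plain TF.

```python
import math

def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    # Unnormalized entropy: N*log(N) - sum(k*log(k)).
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    # k11: A and B together, k12: A without B,
    # k21: B without A,      k22: neither A nor B.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))

def select_and_weight(bigrams, min_llr):
    # Hypothetical helper: LLR decides membership, TF (k11) is the weight.
    return {gram: k[0] for gram, k in bigrams.items() if llr(*k) >= min_llr}

# Made-up counts: (k11, k12, k21, k22) per bigram.
counts = {
    "new york": (20, 1, 1, 378),    # strongly associated
    "of the":   (25, 75, 75, 225),  # exactly what independence predicts
}
kept = select_and_weight(counts, min_llr=10.0)
```

Here "of the" has the higher raw frequency but an LLR of zero, so it is
dropped, while "new york" passes the cutoff and is weighted by its TF. The
LLR value could be carried alongside the TF weight if it is wanted for
diagnostics.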

On Thu, May 27, 2010 at 8:52 AM, Drew Farris <[email protected]> wrote:

> > 2. How can I, given a vector, get the top collocations for that Vector,
> > as ranked by LLR?
> >
>
> If I recall correctly, the LLR score gets dropped in seq2sparse in favor of
> TF or TFIDF depending on the nature of the vectors being generated.
> Meanwhile, CollocDriver simply emits a list of collocations in a collection
> ranked by llr, so neither is strictly what you are interested in. Is there
> a good way to include both something like TF >and< LLR in the output of
> seq2sparse -- would it be necessary to resort to emitting 2 separate sets
> of vectors?
>
