Just to forestall some effort on this: LLR is very good as a threshold, but its value makes a poor score, so substituting TF or TFIDF is entirely appropriate.
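
To make the distinction concrete, here is a rough sketch of using LLR only as a filter and carrying TF as the score. This is not the seq2sparse or CollocDriver code. The function names, the 2x2 count layout (k11 = both words together, k12 and k21 = one word without the other, k22 = neither) and the min_llr cutoff are all assumptions made for illustration.

```python
import math


def x_log_x(x):
    # Convention: 0 * log(0) is taken to be 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    # Log-likelihood ratio for a 2x2 contingency table of co-occurrence counts.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp at zero to absorb floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def score_collocations(counts, min_llr):
    # counts: hypothetical dict mapping a collocation to (k11, k12, k21, k22).
    # LLR decides what is kept; TF (here simply k11) is what gets used as the score.
    # The LLR value is retained next to it purely for inspection.
    kept = {}
    for gram, (k11, k12, k21, k22) in counts.items():
        value = llr(k11, k12, k21, k22)
        if value >= min_llr:
            kept[gram] = {"tf": k11, "llr": value}
    return kept
```

Calling score_collocations with a dict of counts and some cutoff returns only the pairs that pass the LLR test, each with its TF as the weight and the LLR kept alongside it.
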
There may be use cases for keeping LLR if only for diagnostic purposes.

On Thu, May 27, 2010 at 8:52 AM, Drew Farris <[email protected]> wrote:

> > 2. How can I, given a vector, get the top collocations for that Vector, as
> > ranked by LLR?
> >
>
> If I recall correctly, the LLR score gets dropped in seq2sparse in favor of
> TF or TFIDF depending on the nature of the vectors being generated.
> Meanwhile, CollocDriver simply emits a list of collocations in a collection
> ranked by llr, so neither is strictly what you are interested in. Is there a
> good way to include both something like TF >and< LLR in the output of
> seq2sparse -- would it be necessary to resort to emitting 2 separate sets of
> vectors?
>
