On May 27, 2010, at 11:58 AM, Ted Dunning wrote:

> Just to forestall some effort on this, LLR is very good as a threshold, but
> its value is a poor score, so substituting TF or TFIDF is entirely
> appropriate.

Good to know.

> 
> There may be use cases for keeping LLR if only for diagnostic purposes.

I just want to supplement my docs with some "high quality" collocations.
TF-IDF is good enough as a score; it's just not clear to me how best to get
them out at this point on a per-doc basis.
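
For illustration only, here is a minimal sketch of that idea in plain Python
rather than Mahout: use LLR purely as a keep/drop threshold over corpus-wide
bigram counts, then rank the surviving bigrams within each document by TF-IDF.
The function names (llr, top_collocations), the default threshold and the
log(N/df) IDF are assumptions made for this example; none of this is the
seq2sparse or CollocDriver API.

```python
import math
from collections import Counter


def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: count of bigram (a, b); k12: (a, not b); k21: (not a, b);
    k22: (not a, not b).
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    total = 0.0
    for k, i, j in cells:
        if k > 0:  # empty cells contribute nothing (0 * log 0 -> 0)
            total += k * math.log(k * n / (rows[i] * cols[j]))
    return 2.0 * total


def top_collocations(docs, llr_threshold=1.0, top_n=5):
    """docs is a list of token lists (hypothetical input format).

    Returns one list per document of (bigram, tfidf, llr) tuples, filtered
    by LLR and sorted by TF-IDF, highest first.
    """
    per_doc = [Counter(zip(d, d[1:])) for d in docs]
    corpus, first, second, df = Counter(), Counter(), Counter(), Counter()
    for counts in per_doc:
        corpus.update(counts)
        df.update(counts.keys())  # document frequency of each bigram
        for (a, b), c in counts.items():
            first[a] += c   # bigrams with a in first position
            second[b] += c  # bigrams with b in second position
    n = sum(corpus.values())

    # LLR is used only to decide which bigrams survive.
    keep = {}
    for (a, b), k11 in corpus.items():
        k12 = first[a] - k11
        k21 = second[b] - k11
        k22 = n - k11 - k12 - k21
        score = llr(k11, k12, k21, k22)
        if score >= llr_threshold:
            keep[(a, b)] = score

    # TF-IDF is the score used for ranking within each document.
    results = []
    for counts in per_doc:
        ranked = sorted(
            ((bg, tf * math.log(len(docs) / df[bg]), keep[bg])
             for bg, tf in counts.items() if bg in keep),
            key=lambda item: item[1],
            reverse=True,
        )
        results.append(ranked[:top_n])
    return results
```

Each returned tuple carries both the TF-IDF weight and the LLR value, so the
LLR stays available for diagnostics without being used for ranking.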

> 
> On Thu, May 27, 2010 at 8:52 AM, Drew Farris <[email protected]> wrote:
> 
>>> 2. How can I, given a vector, get the top collocations for that
>>> Vector, as ranked by LLR?
>>> 
>> 
>> If I recall correctly, the LLR score gets dropped in seq2sparse in favor of
>> TF or TFIDF depending on the nature of the vectors being generated.
>> Meanwhile, CollocDriver simply emits a list of collocations in a collection
>> ranked by LLR, so neither is strictly what you are interested in. Is there
>> a good way to include both something like TF >and< LLR in the output of
>> seq2sparse -- would it be necessary to resort to emitting 2 separate sets
>> of vectors?
>> 

