Wordcount Document Frequency Extension

James Comfort Sun, 08 Oct 2017 10:53:45 -0700

Hi all,

I am looking to extend the Python wordcount example
<https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount.py>
to track not only the total count of each word across all sentences, but
also the number of distinct documents (i.e. sentences) in which the word
appears.  This could be used to calculate the inverse document frequency
for tf-idf or similar.  Are there any suggestions or examples that
illustrate the most efficient way to do this with the Python SDK?
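
To make the target concrete, here is a minimal plain-Python sketch (not
Beam code) of the two numbers wanted per word. The function name
count_and_doc_freq, the lowercase/whitespace tokenization, the sample
sentences, and the unsmoothed log(N / df) form of idf are all assumptions
made for illustration only.

```python
import math
from collections import Counter


def count_and_doc_freq(sentences):
    """Return (total_count, doc_freq) Counters for whitespace-split words.

    Hypothetical helper: each sentence is treated as one document.
    """
    total = Counter()
    doc_freq = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        total.update(words)          # counts every occurrence
        doc_freq.update(set(words))  # counts each word at most once per sentence
    return total, doc_freq


sentences = ["the cat sat", "the cat saw the dog", "a dog barked"]
total, doc_freq = count_and_doc_freq(sentences)

# Unsmoothed inverse document frequency: log(number of documents / doc freq).
idf = {word: math.log(len(sentences) / df) for word, df in doc_freq.items()}
```

The key step is de-duplicating words within a sentence before counting;
in a pipeline the same idea would presumably mean emitting each distinct
word once per sentence and then counting per word, though that mapping is
untested here.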


Thanks,
Jimmy
