Hi all,

I am looking to extend the Python wordcount example
<https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount.py>
to track not only the total count of each word across all sentences, but
also the number of unique documents (i.e. sentences) that each word
appears in. This could be used to calculate the inverse document
frequency for tf-idf or similar. Are there any suggestions or examples that
illustrate the most efficient way to do this with the Python SDK?
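
To make the goal concrete, here is a rough plain-Python sketch (not Beam
code) of the per-word output I am after. The function name word_stats and
the tokenizing regex are only illustrative assumptions, not anything taken
from the wordcount example itself.

```python
import re
from collections import Counter


def word_stats(documents):
    """Return {word: (total_count, document_count)} for a list of sentences.

    Hypothetical helper for illustration only; each input string is
    treated as one document.
    """
    total = Counter()     # occurrences of each word across all documents
    doc_freq = Counter()  # number of documents containing each word
    for doc in documents:
        # Assumed tokenization: lowercase runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", doc.lower())
        total.update(words)
        # set() ensures a word is counted at most once per document.
        doc_freq.update(set(words))
    return {word: (total[word], doc_freq[word]) for word in total}


stats = word_stats(["the cat sat", "the cat saw the dog"])
```

With these two sentences, "the" occurs three times in total but in only two
documents. It is that second number I would like the pipeline to produce
alongside the existing count.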

Thanks,
Jimmy
