I have added a question
<https://stackoverflow.com/questions/46636034/apache-beam-python-word-count-and-document-frequency>
to Stack Overflow showing what I have put together so far. While it works,
the approach still seems fairly hacky. Could anybody suggest improvements
or best practices that I should adopt?
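
To make the goal concrete, here is a minimal plain-Python sketch of the two
numbers wanted per word: the total count across all sentences, and the number
of sentences that contain the word at least once. It is deliberately not Beam
code and not the pipeline from the Stack Overflow post. The function name
count_words_and_docs and the sample sentences are made up for illustration,
and tokenization is assumed to be a simple lowercase whitespace split.

```python
from collections import Counter


def count_words_and_docs(sentences):
    """Return {word: (total_count, document_count)}.

    Each sentence is treated as one document (an assumption taken from
    the original question). Tokenization is a naive lowercase split.
    """
    total = Counter()     # occurrences of each word over all sentences
    doc_freq = Counter()  # number of sentences containing each word
    for sentence in sentences:
        words = sentence.lower().split()
        total.update(words)
        # set() removes repeats, so a word adds at most 1 per sentence
        doc_freq.update(set(words))
    return {word: (total[word], doc_freq[word]) for word in total}


# Hypothetical sample input, for illustration only
sentences = ["the cat sat on the mat", "the dog barked", "a cat meowed"]
stats = count_words_and_docs(sentences)
print(stats["the"])  # (3, 2): three occurrences, in two sentences
print(stats["cat"])  # (2, 2): two occurrences, in two sentences
```

One untested way to carry this over to Beam would be to emit a
(word, (count_in_sentence, 1)) pair for each distinct word in a sentence and
then sum both tuple fields per key, for example with beam.CombinePerKey and a
custom combine function.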

On Sun, Oct 8, 2017 at 1:53 PM, James Comfort <[email protected]> wrote:

> Hi all,
>
> I am looking to extend the wordcount Python example
> <https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount.py>
> to track not only the count of each word across all sentences, but also
> the number of unique documents (i.e. sentences) in which the word
> appears. This could be used to calculate the inverse document frequency
> for tf-idf or similar. Are there any suggestions or examples that
> illustrate the most efficient way to do this with the Python SDK?
>
> Thanks,
> Jimmy
>
