Nice. Of course for ultimate conciseness, you should have gone with Python :)
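import re

import apache_beam as beam

with beam.Pipeline() as p:
    (p
     | beam.io.textio.ReadFromText("playing_cards.tsv")
     # FlatMap, not Map: re.split returns a list, and each word has to
     # become its own element before it can be counted.
     | beam.FlatMap(lambda s: re.split(r"\W+", s))
     | beam.combiners.Count.PerElement()
     # Count.PerElement emits (word, count) tuples.
     | beam.Map(lambda wc: "%s: %d" % wc)
     | beam.io.textio.WriteToText("output/stringcounts"))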
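If you want to sanity-check what the pipeline should emit without a Beam runner, here's a rough single-process equivalent in plain Python. It's only a sketch: it assumes the whole input fits in memory, and `count_tokens` plus the sample rows are made up for illustration, not taken from the post.

```python
import re
from collections import Counter


def count_tokens(lines):
    """Local stand-in for the Beam pipeline (hypothetical helper).

    Mirrors FlatMap(re.split) -> Count.PerElement() -> format, but runs
    over an in-memory list instead of a distributed PCollection.
    """
    counts = Counter()
    for line in lines:
        # Same split as the pipeline: break on runs of non-word characters.
        counts.update(re.split(r"\W+", line))
    # Same "word: count" formatting as the final Map step, sorted so the
    # result is deterministic (Beam makes no ordering promise).
    return sorted("%s: %d" % (w, c) for w, c in counts.items())


if __name__ == "__main__":
    sample = ["ace\tspades", "king\tspades", "ace\thearts"]  # hypothetical TSV rows
    for out in count_tokens(sample):
        print(out)
```

Note that, just like the pipeline, a line that starts or ends with a non-word character yields an empty token, so you may want to filter out "" in both versions.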
On Wed, Dec 7, 2016 at 10:14 AM, Jean-Baptiste Onofré <[email protected]> wrote:
> Good idea, Neelesh!
>
> Definitely something we can add to the beam-samples (a great complement to
> what I have on my GitHub).
>
> Regards
> JB
>
> On 12/07/2016 07:10 PM, Neelesh Salian wrote:
>>
>> Perhaps we can add this to our examples.
>> Thank you, Jesse. :)
>>
>> On Wed, Dec 7, 2016 at 10:07 AM, Jean-Baptiste Onofré <[email protected]> wrote:
>>
>> Awesome!
>>
>> Thanks, Jesse!
>>
>> Regards
>> JB
>>
>> On 12/07/2016 06:22 PM, Jesse Anderson wrote:
>>
>> I wrote a post on the smallest WordCount I could write
>> <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/>. I go
>> through everything line by line and talk about some of the newest DoFns
>> that allow you to easily run regular expressions in a distributed way.
>>
>> Thanks,
>>
>> Jesse
>>
>>
>>
>> --
>> Jean-Baptiste Onofré
>> [email protected]
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>>
>>
>>
>> --
>> Neelesh Srinivas Salian
>> Customer Operations Engineer
>>