This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git
commit f09f363e744b4fdfb5472a2d2c4d27e2b9338e0c
Author: Jean-Baptiste Onofré <jbono...@apache.org>
AuthorDate: Sun Jan 28 14:50:20 2018 +0100

    [BEAM-3430] Update workdcount example for Java8 for Java SDK
---
 src/get-started/wordcount-example.md | 33 ++++++++++++++-------------------
 1 file changed, 14 insertions(+), 19 deletions(-)

diff --git a/src/get-started/wordcount-example.md b/src/get-started/wordcount-example.md
index 408ce5b..9947330 100644
--- a/src/get-started/wordcount-example.md
+++ b/src/get-started/wordcount-example.md
@@ -126,6 +126,10 @@ example, your pipeline executes locally using the `DirectRunner`. In the next
 sections, we will specify the pipeline's runner.
 
 ```java
+    // Create a PipelineOptions object. This object lets us set various execution
+    // options for our pipeline, such as the runner you wish to use. This example
+    // will run with the DirectRunner by default, based on the class path configured
+    // in its dependencies.
     PipelineOptions options = PipelineOptionsFactory.create();
 
     // In order to run your pipeline, you need to make following runner specific changes:
@@ -190,7 +194,10 @@ The MinimalWordCount pipeline contains five transforms:
     {% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py tag:examples_wordcount_minimal_read %}```
 
-2.  A [ParDo]({{ site.baseurl }}/documentation/programming-guide/#pardo)
+2.  This transform splits the lines in PCollection<String>, where each element
+    is an individual word in Shakespeare's collected texts.
+    As an alternative, it would have been possible to use a
+    [ParDo]({{ site.baseurl }}/documentation/programming-guide/#pardo)
     transform that invokes a `DoFn` (defined in-line as an anonymous class) on
     each element that tokenizes the text lines into individual words.
     The input for this transform is the `PCollection` of text lines generated by the
@@ -198,18 +205,9 @@ The MinimalWordCount pipeline contains five transforms:
     `PCollection`, where each element represents an individual word in the text.
 
     ```java
-    .apply("ExtractWords", ParDo.of(new DoFn<String, String>() {
-        @ProcessElement
-        public void processElement(ProcessContext c) {
-            // \p{L} denotes the category of Unicode letters,
-            // so this pattern will match on everything that is not a letter.
-            for (String word : c.element().split("[^\\p{L}]+")) {
-                if (!word.isEmpty()) {
-                    c.output(word);
-                }
-            }
-        }
-    }))
+    .apply("ExtractWords", FlatMapElements
+        .into(TypeDescriptors.strings())
+        .via((String word) -> Arrays.asList(word.split("[^\\p{L}]+"))))
     ```
 
     ```py
@@ -245,12 +243,9 @@ The MinimalWordCount pipeline contains five transforms:
     transform applies a function that produces exactly one output element.
 
     ```java
-    .apply("FormatResults", MapElements.via(new SimpleFunction<KV<String, Long>, String>() {
-        @Override
-        public String apply(KV<String, Long> input) {
-            return input.getKey() + ": " + input.getValue();
-        }
-    }))
+    .apply("FormatResults", MapElements
+        .into(TypeDescriptors.strings())
+        .via((KV<String, Long> wordCount) -> wordCount.getKey() + ": " + wordCount.getValue()))
     ```
 
     ```py
-- 
To stop receiving notification emails like this one, please contact
mergebot-r...@apache.org.
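
Review note on the ExtractWords hunk: the removed `DoFn` skipped empty strings (`if (!word.isEmpty())`), while the new lambda passes the result of `split` through unchanged. `String.split` keeps a leading empty string when the input starts with a delimiter, and returns a single empty string for an empty line, so the lambda version may emit empty "words" that the old version dropped. The sketch below reproduces the two lambda bodies from the diff with plain JDK types only; it is not Beam code. The class name, the method names, the `extractNonEmptyWords` variant, and the use of `Map.Entry` in place of Beam's `KV` are assumptions made for illustration.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCountLambdaSketch {

    // Same expression as the lambda passed to FlatMapElements.via(...) in the diff.
    public static List<String> extractWords(String line) {
        return Arrays.asList(line.split("[^\\p{L}]+"));
    }

    // Hypothetical variant that restores the empty-string check of the removed DoFn.
    public static List<String> extractNonEmptyWords(String line) {
        return Arrays.stream(line.split("[^\\p{L}]+"))
            .filter(word -> !word.isEmpty())
            .collect(Collectors.toList());
    }

    // Same expression as the lambda passed to MapElements.via(...) in the diff,
    // with Map.Entry standing in for Beam's KV (an assumption of this sketch).
    public static String formatResult(Map.Entry<String, Long> wordCount) {
        return wordCount.getKey() + ": " + wordCount.getValue();
    }

    public static void main(String[] args) {
        String line = "  to be, or not";
        // The leading spaces match the delimiter pattern, so split() keeps
        // an empty string as the first token.
        System.out.println(extractWords(line));
        System.out.println(extractNonEmptyWords(line));
        System.out.println(formatResult(new AbstractMap.SimpleEntry<>("king", 311L)));
    }
}
```

If the empty tokens matter for the documented output, one possible follow-up would be to filter them in a separate step after ExtractWords; whether that is wanted in the docs is a question for the author.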