[beam-site] 01/02: [BEAM-3430] Update workdcount example for Java8 for Java SDK

mergebot-role Thu, 22 Feb 2018 11:07:43 -0800

This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git


commit f09f363e744b4fdfb5472a2d2c4d27e2b9338e0c
Author: Jean-Baptiste Onofré <jbono...@apache.org>
AuthorDate: Sun Jan 28 14:50:20 2018 +0100

    [BEAM-3430] Update workdcount example for Java8 for Java SDK
---
 src/get-started/wordcount-example.md | 33 ++++++++++++++-------------------
 1 file changed, 14 insertions(+), 19 deletions(-)
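
A point that may be worth checking when reading the `ExtractWords` hunk below: the removed `DoFn` skipped empty strings, while the new lambda passes the result of `String.split` through unchanged. With the same regular expression, `split` returns an empty leading token whenever a line starts with a non-letter character. The following is a minimal sketch in plain Java, without Beam; the class and method names are hypothetical and only stand in for the two variants.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the Beam examples.
public class ExtractWordsSketch {

    // Same pattern as in the diff: one or more characters that are not Unicode letters.
    private static final String NON_LETTERS = "[^\\p{L}]+";

    // Mirrors the body of the new lambda: split only, no filtering.
    static List<String> splitOnly(String line) {
        return Arrays.asList(line.split(NON_LETTERS));
    }

    // Mirrors the loop of the removed DoFn: split, then drop empty tokens.
    static List<String> splitAndFilter(String line) {
        List<String> words = new ArrayList<>();
        for (String word : line.split(NON_LETTERS)) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Assumed sample input: a line that begins with punctuation.
        String line = ", to be";
        System.out.println(splitOnly(line).size());      // counts the empty leading token
        System.out.println(splitAndFilter(line).size()); // counts only real words
    }
}
```

If empty tokens matter for the counts, filtering after the split would be one option; whether the pipeline already does so elsewhere is not visible in this diff.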

diff --git a/src/get-started/wordcount-example.md b/src/get-started/wordcount-example.md
index 408ce5b..9947330 100644
--- a/src/get-started/wordcount-example.md
+++ b/src/get-started/wordcount-example.md
@@ -126,6 +126,10 @@ example, your pipeline executes locally using the `DirectRunner`. In the next
 sections, we will specify the pipeline's runner.
 
 ```java
+ // Create a PipelineOptions object. This object lets us set various execution
+ // options for our pipeline, such as the runner you wish to use. This example
+ // will run with the DirectRunner by default, based on the class path configured
+ // in its dependencies.
  PipelineOptions options = PipelineOptionsFactory.create();
 
     // In order to run your pipeline, you need to make following runner specific changes:
@@ -190,7 +194,10 @@ The MinimalWordCount pipeline contains five transforms:
     {% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py tag:examples_wordcount_minimal_read
     %}```
 
-2.  A [ParDo]({{ site.baseurl }}/documentation/programming-guide/#pardo)
+2.  This transform splits the lines in PCollection<String>, where each element
+    is an individual word in Shakespeare's collected texts.
+    As an alternative, it would have been possible to use a 
+    [ParDo]({{ site.baseurl }}/documentation/programming-guide/#pardo)
     transform that invokes a `DoFn` (defined in-line as an anonymous class) on
     each element that tokenizes the text lines into individual words. The input
     for this transform is the `PCollection` of text lines generated by the
@@ -198,18 +205,9 @@ The MinimalWordCount pipeline contains five transforms:
     `PCollection`, where each element represents an individual word in the text.
 
     ```java
-    .apply("ExtractWords", ParDo.of(new DoFn<String, String>() {
-        @ProcessElement
-        public void processElement(ProcessContext c) {
-            // \p{L} denotes the category of Unicode letters,
-            // so this pattern will match on everything that is not a letter.
-            for (String word : c.element().split("[^\\p{L}]+")) {
-                if (!word.isEmpty()) {
-                    c.output(word);
-                }
-            }
-        }
-    }))
+        .apply("ExtractWords", FlatMapElements
+            .into(TypeDescriptors.strings())
+            .via((String word) -> Arrays.asList(word.split("[^\\p{L}]+"))))
     ```
 
     ```py
@@ -245,12 +243,9 @@ The MinimalWordCount pipeline contains five transforms:
     transform applies a function that produces exactly one output element.
 
     ```java
-    .apply("FormatResults", MapElements.via(new SimpleFunction<KV<String, Long>, String>() {
-        @Override
-        public String apply(KV<String, Long> input) {
-            return input.getKey() + ": " + input.getValue();
-        }
-    }))
+    .apply("FormatResults", MapElements
+        .into(TypeDescriptors.strings())
+        .via((KV<String, Long> wordCount) -> wordCount.getKey() + ": " + wordCount.getValue()))
     ```
 
     ```py

