spark git commit: [MINOR][SQL][STREAMING][DOCS] Fix minor typos, punctuations and grammar
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 eb1c20fa0 -> 929fa287e

[MINOR][SQL][STREAMING][DOCS] Fix minor typos, punctuations and grammar

## What changes were proposed in this pull request?

Minor fixes correcting some typos, punctuation, and grammar. Adding more anchors for easy navigation. Fixing minor issues with code snippets.

## How was this patch tested?

`jekyll serve`

Author: Ahmed Mahran

Closes #14234 from ahmed-mahran/b-struct-streaming-docs.

(cherry picked from commit 6caa22050e221cf14e2db0544fd2766dd1102bda)
Signed-off-by: Sean Owen

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/929fa287
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/929fa287
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/929fa287

Branch: refs/heads/branch-2.0
Commit: 929fa287e700c0e112f43e0c7b9bc746b5546c64
Parents: eb1c20f
Author: Ahmed Mahran
Authored: Tue Jul 19 12:01:54 2016 +0100
Committer: Sean Owen
Committed: Tue Jul 19 12:06:26 2016 +0100

--
 docs/structured-streaming-programming-guide.md | 154 +---
 1 file changed, 71 insertions(+), 83 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/929fa287/docs/structured-streaming-programming-guide.md
--

diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index 3ef39e4..aac8817 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -22,14 +22,49 @@ Let's say you want to maintain a running word count of text data received from
+{% highlight scala %}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.SparkSession
+
+val spark = SparkSession
+  .builder
+  .appName("StructuredNetworkWordCount")
+  .getOrCreate()
+
+import spark.implicits._
+{% endhighlight %}
+{% highlight java %}
+import org.apache.spark.api.java.function.FlatMapFunction;
+import org.apache.spark.sql.*;
+import org.apache.spark.sql.streaming.StreamingQuery;
+
+import java.util.Arrays;
+import java.util.Iterator;
+
+SparkSession spark = SparkSession
+  .builder()
+  .appName("JavaStructuredNetworkWordCount")
+  .getOrCreate();
+{% endhighlight %}
+{% highlight python %}
+from pyspark.sql import SparkSession
+from pyspark.sql.functions import explode
+from pyspark.sql.functions import split
+
+spark = SparkSession\
+  .builder()\
+  .appName("StructuredNetworkWordCount")\
+  .getOrCreate()
+{% endhighlight %}
+
@@ -39,18 +74,6 @@ Next, let's create a streaming DataFrame that represents text data received fr
 {% highlight scala %}
-import org.apache.spark.sql.functions._
-import org.apache.spark.sql.SparkSession
-
-val spark = SparkSession
-  .builder
-  .appName("StructuredNetworkWordCount")
-  .getOrCreate()
-{% endhighlight %}
-
-Next, let's create a streaming DataFrame that represents text data received from a server listening on localhost:, and transform the DataFrame to calculate word counts.
-
-{% highlight scala %}
 // Create DataFrame representing the stream of input lines from connection to localhost:
 val lines = spark.readStream
   .format("socket")
@@ -65,30 +88,12 @@ val words = lines.as[String].flatMap(_.split(" "))
 val wordCounts = words.groupBy("value").count()
 {% endhighlight %}
-This `lines` DataFrame represents an unbounded table containing the streaming text data. This table contains one column of strings named "value", and each line in the streaming text data becomes a row in the table. Note, that this is not currently receiving any data as we are just setting up the transformation, and have not yet started it. Next, we have converted the DataFrame to a Dataset of String using `.as(Encoders.STRING())`, so that we can apply the `flatMap` operation to split each line into multiple words. The resultant `words` Dataset contains all the words. Finally, we have defined the `wordCounts` DataFrame by grouping by the unique values in the Dataset and counting them. Note that this is a streaming DataFrame which represents the running word counts of the stream.
+This `lines` DataFrame represents an unbounded table containing the streaming text data. This table contains one column of strings named "value", and each line in the streaming text data becomes a row in the table. Note, that this is not currently receiving any data as we are just setting up the transformation, and have not yet started it. Next, we have converted the DataFrame to a Dataset of String using `.as[String]`, so that we can apply the `flatMap` operation to split each line into multiple words. The resultant `words` Dataset contains all the words. Finally, we have defined the `wordCounts` DataFrame by grouping by the unique values in the Dataset and counting them. Note that this is a streaming DataFrame which represents the running word counts of the stream.
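The `flatMap` -> `groupBy` -> `count` pipeline that the patched paragraph describes can be sketched without Spark at all. The following is a minimal plain-Python stand-in (the `word_counts` helper name and the batch list input are illustrative assumptions, not part of the patch; a real streaming query would update these counts incrementally as new lines arrive):

```python
from collections import Counter

def word_counts(lines):
    # Equivalent of lines.as[String].flatMap(_.split(" ")):
    # split each input line into words and flatten into one sequence.
    words = [word for line in lines for word in line.split(" ")]
    # Equivalent of words.groupBy("value").count():
    # group identical words and count occurrences of each.
    return Counter(words)

counts = word_counts(["apache spark", "apache hadoop"])
print(counts["apache"])  # 2
```

This batch version captures only the transformation logic; the point of the Structured Streaming example is that the same declarative pipeline runs over an unbounded input table.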
spark git commit: [MINOR][SQL][STREAMING][DOCS] Fix minor typos, punctuations and grammar

Repository: spark
Updated Branches:
  refs/heads/master 21a6dd2ae -> 6caa22050

Author: Ahmed Mahran

Closes #14234 from ahmed-mahran/b-struct-streaming-docs.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6caa2205
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6caa2205
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6caa2205

Branch: refs/heads/master
Commit: 6caa22050e221cf14e2db0544fd2766dd1102bda
Parents: 21a6dd2
Author: Ahmed Mahran
Authored: Tue Jul 19 12:01:54 2016 +0100
Committer: Sean Owen
Committed: Tue Jul 19 12:01:54 2016 +0100

--
 docs/structured-streaming-programming-guide.md | 154 +---
 1 file changed, 71 insertions(+), 83 deletions(-)
--