spark git commit: [MINOR][SQL][STREAMING][DOCS] Fix minor typos, punctuations and grammar

2016-07-19 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 eb1c20fa0 -> 929fa287e


[MINOR][SQL][STREAMING][DOCS] Fix minor typos, punctuations and grammar

## What changes were proposed in this pull request?

Minor fixes correcting some typos, punctuations, grammar.
Adding more anchors for easy navigation.
Fixing minor issues with code snippets.

## How was this patch tested?

`jekyll serve`

Author: Ahmed Mahran 

Closes #14234 from ahmed-mahran/b-struct-streaming-docs.

(cherry picked from commit 6caa22050e221cf14e2db0544fd2766dd1102bda)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/929fa287
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/929fa287
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/929fa287

Branch: refs/heads/branch-2.0
Commit: 929fa287e700c0e112f43e0c7b9bc746b5546c64
Parents: eb1c20f
Author: Ahmed Mahran 
Authored: Tue Jul 19 12:01:54 2016 +0100
Committer: Sean Owen 
Committed: Tue Jul 19 12:06:26 2016 +0100

--
 docs/structured-streaming-programming-guide.md | 154 +---
 1 file changed, 71 insertions(+), 83 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/929fa287/docs/structured-streaming-programming-guide.md
--
diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index 3ef39e4..aac8817 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -22,14 +22,49 @@ Let’s say you want to maintain a running word count of text data received from
 
 
 
+{% highlight scala %}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.SparkSession
+
+val spark = SparkSession
+  .builder
+  .appName("StructuredNetworkWordCount")
+  .getOrCreate()
+  
+import spark.implicits._
+{% endhighlight %}
 
 
 
 
+{% highlight java %}
+import org.apache.spark.api.java.function.FlatMapFunction;
+import org.apache.spark.sql.*;
+import org.apache.spark.sql.streaming.StreamingQuery;
+
+import java.util.Arrays;
+import java.util.Iterator;
+
+SparkSession spark = SparkSession
+.builder()
+.appName("JavaStructuredNetworkWordCount")
+.getOrCreate();
+{% endhighlight %}
 
 
 
 
+{% highlight python %}
+from pyspark.sql import SparkSession
+from pyspark.sql.functions import explode
+from pyspark.sql.functions import split
+
+spark = SparkSession\
+    .builder\
+    .appName("StructuredNetworkWordCount")\
+    .getOrCreate()
+{% endhighlight %}
+
 
 
 
@@ -39,18 +74,6 @@ Next, let’s create a streaming DataFrame that represents text data received fr
 
 
 {% highlight scala %}
-import org.apache.spark.sql.functions._
-import org.apache.spark.sql.SparkSession
-
-val spark = SparkSession
-  .builder
-  .appName("StructuredNetworkWordCount")
-  .getOrCreate()
-{% endhighlight %}
-
-Next, let’s create a streaming DataFrame that represents text data received 
from a server listening on localhost:, and transform the DataFrame to 
calculate word counts.
-
-{% highlight scala %}
 // Create DataFrame representing the stream of input lines from connection to 
localhost:
 val lines = spark.readStream
   .format("socket")
@@ -65,30 +88,12 @@ val words = lines.as[String].flatMap(_.split(" "))
 val wordCounts = words.groupBy("value").count()
 {% endhighlight %}
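For readers skimming the diff, the transformation in the snippet above (split each line on spaces, flatten, group by word, count) is logically the same aggregation as this plain-Python sketch. No Spark is required to run it, and `sample_lines` is invented here purely for illustration:

```python
from collections import Counter

def word_counts(lines):
    """Mimic lines.as[String].flatMap(_.split(" ")).groupBy("value").count():
    split each line on spaces, flatten into words, then count per word."""
    words = [w for line in lines for w in line.split(" ") if w]
    return dict(Counter(words))

# Hypothetical input standing in for lines read from the socket source
sample_lines = ["apache spark", "spark streaming"]
print(word_counts(sample_lines))  # {'apache': 1, 'spark': 2, 'streaming': 1}
```

In the actual guide, Spark performs this grouping incrementally over the stream rather than over an in-memory list.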
 
-This `lines` DataFrame represents an unbounded table containing the streaming 
text data. This table contains one column of strings named “value”, and 
each line in the streaming text data becomes a row in the table. Note, that 
this is not currently receiving any data as we are just setting up the 
transformation, and have not yet started it. Next, we have converted the 
DataFrame to a  Dataset of String using `.as(Encoders.STRING())`, so that we 
can apply the `flatMap` operation to split each line into multiple words. The 
resultant `words` Dataset contains all the words. Finally, we have defined the 
`wordCounts` DataFrame by grouping by the unique values in the Dataset and 
counting them. Note that this is a streaming DataFrame which represents the 
running word counts of the stream.
+This `lines` DataFrame represents an unbounded table containing the streaming text data. This table contains one column of strings named “value”, and each line in the streaming text data becomes a row in the table. Note, that this is not currently receiving any data as we are just setting up the transformation, and have not yet started it. Next, we have converted the DataFrame to a Dataset of String using `.as[String]`, so that we can apply the `flatMap` operation to split each line into multiple words. The resultant `words` Dataset contains all the words. Finally, we have defined the `wordCounts` DataFrame by grouping by the unique values in the Dataset and counting them. Note that this is a streaming DataFrame which represents the running word counts of the stream.
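The “unbounded table” framing in the paragraph above can be simulated without Spark: each arriving batch appends rows to the input table, and the running word count is the aggregation recomputed over everything seen so far. This is a sketch of the conceptual model only, not of Spark’s incremental execution:

```python
from collections import Counter

class RunningWordCount:
    """Toy model of the unbounded input table: every arriving line is
    appended as a row, and word counts are maintained over all rows so far."""
    def __init__(self):
        self.rows = []           # the ever-growing "value" column
        self.counts = Counter()  # the result table of running counts

    def add_batch(self, lines):
        for line in lines:
            self.rows.append(line)
            self.counts.update(w for w in line.split(" ") if w)
        return dict(self.counts)

wc = RunningWordCount()
print(wc.add_batch(["apache spark"]))   # {'apache': 1, 'spark': 1}
print(wc.add_batch(["spark is fast"]))  # {'apache': 1, 'spark': 2, 'is': 1, 'fast': 1}
```

Each call to `add_batch` plays the role of a new trigger: the input table has grown, and the result table reflects counts over the whole stream so far.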

spark git commit: [MINOR][SQL][STREAMING][DOCS] Fix minor typos, punctuations and grammar

2016-07-19 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 21a6dd2ae -> 6caa22050


[MINOR][SQL][STREAMING][DOCS] Fix minor typos, punctuations and grammar

## What changes were proposed in this pull request?

Minor fixes correcting some typos, punctuations, grammar.
Adding more anchors for easy navigation.
Fixing minor issues with code snippets.

## How was this patch tested?

`jekyll serve`

Author: Ahmed Mahran 

Closes #14234 from ahmed-mahran/b-struct-streaming-docs.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6caa2205
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6caa2205
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6caa2205

Branch: refs/heads/master
Commit: 6caa22050e221cf14e2db0544fd2766dd1102bda
Parents: 21a6dd2
Author: Ahmed Mahran 
Authored: Tue Jul 19 12:01:54 2016 +0100
Committer: Sean Owen 
Committed: Tue Jul 19 12:01:54 2016 +0100
