[GitHub] spark pull request #16468: [SPARK-19074][SS][DOCS] Updated Structured Stream...

thomaso-mirodin Thu, 05 Jan 2017 15:13:04 -0800

Github user thomaso-mirodin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16468#discussion_r94866235
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -954,49 +1014,93 @@ There are a few types of built-in output sinks.
     
     - **File sink** - Stores the output to a directory. 
     
    +{% highlight scala %}
    +writeStream
    +    .format("parquet")        // can be "orc", "json", "csv", etc.
    +    .option("path", "path/to/destination/dir")
    +    .start()
    +{% endhighlight %}
    +
     - **Foreach sink** - Runs arbitrary computation on the records in the 
output. See later in the section for more details.
     
    +{% highlight scala %}
    +writeStream
    +    .foreach(...)
    +    .start()
    +{% endhighlight %}
    +
     - **Console sink (for debugging)** - Prints the output to the 
console/stdout every time there is a trigger. Both, Append and Complete output 
modes, are supported. This should be used for debugging purposes on low data 
volumes as the entire output is collected and stored in the driver's memory 
after every trigger.
     
    -- **Memory sink (for debugging)** - The output is stored in memory as an 
in-memory table.  Both, Append and Complete output modes, are supported. This 
should be used for debugging purposes on low data volumes as the entire output 
is collected and stored in the driver's memory after every trigger.
    +{% highlight scala %}
    +writeStream
    +    .format("console")
    +    .start()
    +{% endhighlight %}
    +
    +- **Memory sink (for debugging)** - The output is stored in memory as an 
in-memory table.
    +Both, Append and Complete output modes, are supported. This should be used 
for debugging purposes
    +on low data volumes as the entire output is collected and stored in the 
driver's memory after
    --- End diff --
    
    This is slightly repetitive: "[...] the entire output is collected 
and stored in the driver's memory [...]" is said again in the next sentence as 
well: "Note that the current implementations saves all the data in the driver 
memory".
    
    If we want to say this twice to make sure people read it, maybe we can move 
the "note" reminder into the `Notes` column in the table a few lines down? :D



