Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16856#discussion_r104255799
  
    --- Diff: docs/quick-start.md ---
    @@ -65,41 +66,41 @@ res3: Long = 15
     
         ./bin/pyspark
     
    -Spark's primary abstraction is a distributed collection of items called a Resilient Distributed Dataset (RDD). RDDs can be created from Hadoop InputFormats (such as HDFS files) or by transforming other RDDs. Let's make a new RDD from the text of the README file in the Spark source directory:
    +Spark's primary abstraction is a distributed collection of items called a Dataset. Datasets can be created from Hadoop InputFormats (such as HDFS files) or by transforming other Datasets. Due to Python's dynamic nature, we don't need the Dataset to be strongly typed in Python. As a result, all Datasets in Python are Dataset[Row], and we call it `DataFrame` to be consistent with the data frame concept in Pandas and R. Let's make a new DataFrame from the text of the README file in the Spark source directory:
    --- End diff ---
    
    strongly typed -> strongly-typed
    (Same above, maybe elsewhere)
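
    For reference, the step this new paragraph leads into looks roughly like the following in the PySpark shell (a minimal sketch, assuming a Spark 2.x build where the shell predefines a `spark` SparkSession and README.md sits in the current directory):

        >>> # Read the README into a DataFrame of Rows, one row per line of text
        >>> textFile = spark.read.text("README.md")
        >>> textFile.count()  # number of rows (lines) in this DataFrame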

