GitHub user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/16856#discussion_r104255711
--- Diff: docs/quick-start.md ---
@@ -29,28 +30,28 @@ or Python. Start it by running the following in the
Spark directory:
./bin/spark-shell
-Spark's primary abstraction is a distributed collection of items called a
Resilient Distributed Dataset (RDD). RDDs can be created from Hadoop
InputFormats (such as HDFS files) or by transforming other RDDs. Let's make a
new RDD from the text of the README file in the Spark source directory:
+Spark's primary abstraction is a distributed collection of items called a
Dataset. Datasets can be created from Hadoop InputFormats (such as HDFS files)
or by transforming other Datasets. Let's make a new Dataset from the text of
the README file in the Spark source directory:
{% highlight scala %}
-scala> val textFile = sc.textFile("README.md")
-textFile: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD[1]
at textFile at <console>:25
+scala> val textFile = spark.read.textFile("README.md")
+textFile: org.apache.spark.sql.Dataset[String] = [value: string]
{% endhighlight %}
-RDDs have _[actions](programming-guide.html#actions)_, which return
values, and _[transformations](programming-guide.html#transformations)_, which
return pointers to new RDDs. Let's start with a few actions:
+You can get values from Dataset directly, by calling some actions, or
transform the Dataset to get a new one. For more details, please read the _[API
doc](api/scala/index.html#org.apache.spark.sql.Dataset)_.
{% highlight scala %}
-scala> textFile.count() // Number of items in this RDD
+scala> textFile.count() // Number of items in this Dataset
res0: Long = 126 // May be different from yours as README.md will change
over time, similar to other outputs
-scala> textFile.first() // First item in this RDD
+scala> textFile.first() // First item in this Dataset
res1: String = # Apache Spark
{% endhighlight %}
-Now let's use a transformation. We will use the
[`filter`](programming-guide.html#transformations) transformation to return a
new RDD with a subset of the items in the file.
+Now let's transform this Dataset to a new one. We will call the `filter`
to return a new Dataset with a subset of the items in the file.
--- End diff --
Could this just be "call `filter`"? The article in "call the `filter`" reads awkwardly.
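
For context, the sentence under review introduces a single `filter` call on the Dataset. A minimal sketch of what that call might look like as a standalone program follows. It assumes a local SparkSession and uses a small in-memory Dataset as a stand-in for the README text; the names `FilterSketch` and `linesWithSpark` are hypothetical and do not appear in the PR.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object FilterSketch {
  // Return a new Dataset holding only the lines that contain "Spark"
  // (case-sensitive). The input Dataset is left unchanged.
  def linesWithSpark(lines: Dataset[String]): Dataset[String] =
    lines.filter((line: String) => line.contains("Spark"))

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, purely for illustration.
    val spark = SparkSession.builder()
      .appName("FilterSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for spark.read.textFile("README.md"); the lines are made up.
    val textFile: Dataset[String] =
      Seq("# Apache Spark", "Spark is a cluster computing system", "## Building").toDS()

    val filtered = linesWithSpark(textFile)
    println(filtered.count()) // two of the three sample lines mention "Spark"

    spark.stop()
  }
}
```

In the shell session shown in the diff, the equivalent would presumably be a one-liner on `textFile`, which is why the shorter wording "call `filter`" seems sufficient.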