Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/5442#discussion_r28179136
--- Diff: docs/programming-guide.md ---
@@ -477,8 +541,28 @@ the [Converter examples]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main
for examples of using Cassandra / HBase ```InputFormat``` and ```OutputFormat``` with custom converters.
</div>
+<div data-lang="r" markdown="1">
+
+SparkR can create distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, [Amazon S3](http://wiki.apache.org/hadoop/AmazonS3), etc. Spark supports text files and [SequenceFiles](http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html).
+
+Text file RDDs can be created using the `textFile` method. This method takes a URI for the file (either a local path on the machine, or a `hdfs://`, `s3n://`, etc. URI) and reads it as a collection of lines. Here is an example invocation:
+
+{% highlight r %}
+distFile <- textFile(sc, "data.txt")
+{% endhighlight %}
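+
+As a quick sanity check, you can look at what was loaded. This is only a sketch: it assumes the SparkR RDD helpers `count` and `take`, which are not introduced in this section.
+
+{% highlight r %}
+count(distFile)    # number of lines in data.txt
+take(distFile, 2)  # first two lines, returned as a local R list
+{% endhighlight %}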
+
+Once created, `distFile` can be acted on by dataset operations. For example, we can add up the sizes of all the lines using the `map` and `reduce` operations as follows: `reduce(map(distFile, nchar), function(a, b) {a + b})`.
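+
+Written out step by step, a rough equivalent (a sketch using base R's `nchar` to get each line's character count) is:
+
+{% highlight r %}
+lineLengths <- map(distFile, nchar)                           # RDD of per-line character counts
+totalLength <- reduce(lineLengths, function(a, b) { a + b })  # sum the counts
+{% endhighlight %}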
+
+Some notes on reading files with Spark:
+
+* If using a path on the local filesystem, the file must also be accessible at the same path on worker nodes. Either copy the file to all workers or use a network-mounted shared file system.
--- End diff --
The EC2 link applies to all the languages, so I'd like to leave it out of this section.