Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/5442#discussion_r28179136
--- Diff: docs/programming-guide.md ---
@@ -477,8 +541,28 @@ the [Converter examples]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main
for examples of using Cassandra / HBase ```InputFormat``` and ```OutputFormat``` with custom converters.
</div>
+<div data-lang="r" markdown="1">
+
+SparkR can create distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, [Amazon S3](http://wiki.apache.org/hadoop/AmazonS3), etc. Spark supports text files and [SequenceFiles](http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html).
+
+Text file RDDs can be created using the `textFile` method. This method takes a URI for the file (either a local path on the machine, or a `hdfs://`, `s3n://`, etc. URI) and reads it as a collection of lines. Here is an example invocation:
+
+{% highlight r %}
+distFile <- textFile(sc, "data.txt")
+{% endhighlight %}
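+
+As a quick sanity check, you can look at what was loaded. This is only a sketch: it assumes the SparkR RDD helpers `count` and `take`, which are not introduced in this section.
+
+{% highlight r %}
+count(distFile)    # number of lines in data.txt
+take(distFile, 2)  # first two lines, returned as a local R list
+{% endhighlight %}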
+
+Once created, `distFile` can be acted on by dataset operations. For example, we can add up the sizes of all the lines using the `map` and `reduce` operations as follows: `reduce(map(distFile, nchar), function(a, b) {a + b})`.
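+
+Written out step by step, a rough equivalent (a sketch using base R's `nchar` to get each line's character count) is:
+
+{% highlight r %}
+lineLengths <- map(distFile, nchar)                           # RDD of per-line character counts
+totalLength <- reduce(lineLengths, function(a, b) { a + b })  # sum the counts
+{% endhighlight %}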
+
+Some notes on reading files with Spark:
+
+* If using a path on the local filesystem, the file must also be accessible at the same path on worker nodes. Either copy the file to all workers or use a network-mounted shared file system.
--- End diff --
The EC2 link applies to all the languages, so I'd like to leave it out of this section.