Hi,
The data (in this example, README.md) is stored in the Hadoop Distributed
File System (HDFS), spread across the datanodes of the Hadoop cluster. The
metadata describing where the pieces of this file are stored is kept on the namenode.
Your data is always stored in HDFS.
Spark is an application that can acce
HDFS, as the name implies, is a distributed file system. A file stored on HDFS
is already distributed. So if you create an RDD from an HDFS file, the created
RDD just points to the file's partitions (HDFS blocks) on the different nodes;
the data itself is not copied when the RDD is created.
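
To make the "RDD just points to the file partitions" idea concrete, here is a small toy model in plain Python. It is only a sketch, not the HDFS or Spark API: the names store_file, make_partitions, read_partition and BlockLocation are made up for illustration, the block size is tiny on purpose, and blocks are placed round-robin with no replication (real HDFS uses much larger blocks, replicates them, and has its own placement policy).

```python
from dataclasses import dataclass

BLOCK_SIZE = 16  # bytes; toy value chosen for illustration only


@dataclass(frozen=True)
class BlockLocation:
    """Metadata entry as a namenode might keep it (hypothetical shape)."""
    block_id: int
    offset: int    # byte offset of the block within the file
    length: int    # number of bytes in the block
    datanode: str  # node that holds the bytes


def store_file(data, datanodes):
    """Split data into blocks and place them round-robin on the datanodes.

    Returns (metadata, storage): metadata plays the role of the namenode,
    storage plays the role of the datanodes' local disks.
    """
    metadata = []
    storage = {node: {} for node in datanodes}
    for block_id, offset in enumerate(range(0, len(data), BLOCK_SIZE)):
        chunk = data[offset:offset + BLOCK_SIZE]
        node = datanodes[block_id % len(datanodes)]
        storage[node][block_id] = chunk
        metadata.append(BlockLocation(block_id, offset, len(chunk), node))
    return metadata, storage


def make_partitions(metadata):
    """RDD-like view: one partition per block, holding only a pointer."""
    return list(metadata)


def read_partition(partition, storage):
    """Fetch the bytes of one partition from the node that owns them."""
    return storage[partition.datanode][partition.block_id]


data = b"Apache Spark is a fast and general engine for data processing."
datanodes = ["datanode-1", "datanode-2", "datanode-3"]

metadata, storage = store_file(data, datanodes)
partitions = make_partitions(metadata)  # no file bytes are copied here
reassembled = b"".join(read_partition(p, storage) for p in partitions)

for p in partitions:
    print(p)
```

In real Spark the equivalent step would be something like sc.textFile("hdfs://.../README.md") followed by rdd.getNumPartitions() (the path here is a placeholder). By default the number of partitions usually follows the number of HDFS blocks, but it can differ depending on the input format and the minPartitions argument.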
You can read more about HDFS here.
http://hadoop.apache.org/docs/stable/hadoop-