Re: Clarification on RDD

2016-02-27 Thread Mich Talebzadeh
Hi, The data (in this case the example README.md) is kept in the Hadoop Distributed File System (HDFS), spread across the datanodes in the Hadoop cluster. The metadata used to look up where this file is stored is kept in the namenode. Your data is always stored in HDFS. Spark is an application that can acce

RE: Clarification on RDD

2016-02-26 Thread Mohammed Guller
HDFS, as the name implies, is a distributed file system. A file stored on HDFS is already distributed. So if you create an RDD from an HDFS file, the created RDD just points to the file partitions on different nodes. You can read more about HDFS here. http://hadoop.apache.org/docs/stable/hadoop-
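
To make the "RDD just points to the file partitions" idea concrete, here is a rough sketch in plain Python. It is a toy model only, not Spark's or HDFS's API: the names Block, split_into_blocks and make_partitions, the node names, the sizes, and the round-robin replica placement are all assumptions made up for illustration. It shows the file being cut into blocks spread over datanodes, and the "RDD" as a list of descriptors that reference those blocks without copying any data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One block of a file: a byte range plus the datanodes holding a replica."""
    offset: int
    length: int
    hosts: tuple


def split_into_blocks(file_size, block_size, datanodes, replication=2):
    """Toy stand-in for namenode metadata: which byte ranges live on which nodes.

    Replica placement here is simple round-robin (an assumption for this sketch).
    """
    blocks = []
    for i, offset in enumerate(range(0, file_size, block_size)):
        length = min(block_size, file_size - offset)  # last block may be short
        hosts = tuple(datanodes[(i + r) % len(datanodes)] for r in range(replication))
        blocks.append(Block(offset, length, hosts))
    return blocks


def make_partitions(blocks):
    """Toy stand-in for creating an RDD: one descriptor per block, no data moved."""
    return [
        {"index": i, "offset": b.offset, "length": b.length, "preferred_hosts": b.hosts}
        for i, b in enumerate(blocks)
    ]


# Hypothetical file of 300 bytes, 128-byte blocks, three datanodes.
blocks = split_into_blocks(file_size=300, block_size=128,
                           datanodes=["node1", "node2", "node3"])
partitions = make_partitions(blocks)
for p in partitions:
    print(p)
```

Each descriptor only records where a block lives, which is why nothing has to be moved when the RDD is created; the data is read from the datanodes when a job actually runs.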