As long as the path is present and accessible on all machines, you can leverage distribution. HDFS is one way to make that happen; NFS is another, and simply replicating the file to the same path on every node is a third.
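For example, a minimal sketch (the master URL and paths below are placeholders for illustration, not from your setup):

    import org.apache.spark.{SparkConf, SparkContext}

    // Placeholder master URL, for illustration only.
    val conf = new SparkConf()
      .setAppName("CountExample")
      .setMaster("spark://master-host:7077")
    val sc = new SparkContext(conf)

    // Without HDFS this only distributes if /home/scalatest.txt exists at
    // the same path on every worker (via NFS, rsync, manual copy, etc.).
    val doc = sc.textFile("/home/scalatest.txt", 5)
    println(doc.count())

    // With HDFS the data is visible to all workers by design:
    // val hdfsDoc = sc.textFile("hdfs://namenode:9000/user/carter/scalatest.txt", 5)

Either way the count executes on the workers, not as local threads on the driver.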
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Wed, Apr 23, 2014 at 12:12 PM, Carter <gyz...@hotmail.com> wrote:

> Hi, I am a beginner with Hadoop and Spark, and want some help in
> understanding how Hadoop works.
>
> If we have a cluster of 5 computers and install Spark on the cluster
> WITHOUT Hadoop, and then run this code on one computer:
>
>   val doc = sc.textFile("/home/scalatest.txt", 5)
>   doc.count
>
> can the "count" task be distributed to all 5 computers? Or is it only
> run by 5 parallel threads on the current computer?
>
> On the other hand, if we install Hadoop on the cluster and upload the
> data into HDFS, when running the same code will this "count" task be
> done by 25 threads?
>
> Thank you very much for your help.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Need-help-about-how-hadoop-works-tp4638.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.