As long as the path is present and accessible on all machines, you should be
able to leverage distribution. HDFS is one way to make that happen; an NFS
mount or simply replicating the file to every node are others.


Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Wed, Apr 23, 2014 at 12:12 PM, Carter <gyz...@hotmail.com> wrote:

> Hi, I am a beginner with Hadoop and Spark, and would like some help in
> understanding how Hadoop works.
>
> Suppose we have a cluster of 5 computers and install Spark on the cluster
> WITHOUT Hadoop, and then run this code on one computer:
> val doc = sc.textFile("/home/scalatest.txt",5)
> doc.count
> Can the "count" task be distributed to all 5 computers? Or is it only
> run by 5 parallel threads on the current computer?
>
> On the other hand, if we install Hadoop on the cluster and upload the data
> into HDFS, when running the same code will this "count" task be done by 25
> threads?
>
> Thank you very much for your help.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Need-help-about-how-hadoop-works-tp4638.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>