Re: Spark and HDFS

2015-07-15 Thread Marcelo Vanzin
On Wed, Jul 15, 2015 at 5:36 AM, Jeskanen, Elina wrote:
> I have Spark 1.4 on my local machine and I would like to connect to our
> local 4-node Cloudera cluster. But how?
>
> In the example it says text_file = spark.textFile("hdfs://..."), but can
> you advise me where to get this "hdfs

Re: Spark and HDFS

2015-07-15 Thread Naveen Madhire
Yes, I did this recently. You need to copy the Cloudera cluster's configuration files to the local machine and set HADOOP_CONF_DIR or YARN_CONF_DIR. The local machine should also be able to ssh to the Cloudera cluster.

On Wed, Jul 15, 2015 at 8:51 AM, ayan guha wrote:
> Assuming you run spark lo

Re: Spark and HDFS

2015-07-15 Thread ayan guha
Assuming you run Spark locally (i.e. either local mode or a standalone cluster on your local machine):

1. You need to have the Hadoop binaries locally.
2. You need to have hdfs-site on the Spark classpath of your local machine.

I would suggest you start off with local files to play around. If you need to run spark on

Spark and HDFS

2015-07-15 Thread Jeskanen, Elina
I have Spark 1.4 on my local machine and I would like to connect to our local 4-node Cloudera cluster. But how?

In the example it says text_file = spark.textFile("hdfs://..."), but can you advise me where to get this "hdfs://..." address?

Thanks!
Elina
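On the question of where the "hdfs://..." address comes from: with a Hadoop 2 client configuration (the files the replies suggest copying and pointing HADOOP_CONF_DIR at), the NameNode URI is normally the fs.defaultFS property in core-site.xml, or fs.default.name on older Hadoop 1 setups. Here is a minimal Python sketch that reads it, assuming that standard directory layout; default_fs and hdfs_path are hypothetical helper names, not part of Spark or Hadoop.

```python
import os
import xml.etree.ElementTree as ET


def default_fs(conf_dir=None):
    """Return the default filesystem URI from core-site.xml, or None if unset.

    conf_dir defaults to the HADOOP_CONF_DIR environment variable. Assumes a
    standard Hadoop client config directory copied from the cluster.
    """
    conf_dir = conf_dir or os.environ.get("HADOOP_CONF_DIR")
    if not conf_dir:
        raise ValueError("pass conf_dir or set HADOOP_CONF_DIR")
    root = ET.parse(os.path.join(conf_dir, "core-site.xml")).getroot()
    # Each setting is <property><name>...</name><value>...</value></property>.
    values = {
        prop.findtext("name"): (prop.findtext("value") or "").strip()
        for prop in root.iter("property")
    }
    # fs.defaultFS is the Hadoop 2 key; fs.default.name is the older, deprecated one.
    return values.get("fs.defaultFS") or values.get("fs.default.name")


def hdfs_path(path, conf_dir=None):
    """Join an absolute HDFS path onto the cluster's default filesystem URI."""
    fs = default_fs(conf_dir)
    if not fs:
        raise LookupError("no default filesystem configured in core-site.xml")
    return fs.rstrip("/") + "/" + path.lstrip("/")
```

With the configuration copied over, something like text_file = spark.textFile(hdfs_path("/user/elina/input.txt")) would then point at the cluster (the path is made up). When Spark picks up HADOOP_CONF_DIR, a path without a scheme is usually resolved against fs.defaultFS anyway, so the explicit prefix mainly helps to check which NameNode you are actually talking to.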

Re: Spark and HDFS ( Worker and Data Nodes Combination )

2015-06-23 Thread Akhil Das
I think this is how it works: RDDs have partitions, which are made up of blocks, and the blockManager knows where these blocks are available. Based on that availability (PROCESS_LOCAL, NODE_LOCAL, etc.), Spark will launch the tasks on those nodes. This behaviour can be controlled with spark.
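To make the locality levels mentioned above concrete, here is a toy Python model of how one partition's locality to a given executor could be classified, and how the most local executor could be chosen. It is a simplification, not Spark's scheduler: the function and parameter names are hypothetical, the NO_PREF level is omitted, and the real scheduler works from resource offers and waits a configurable time before falling back to a less local level.

```python
from enum import IntEnum


class Locality(IntEnum):
    """Simplified locality levels, best (lowest value) first."""
    PROCESS_LOCAL = 0  # partition cached in this very executor
    NODE_LOCAL = 1     # block replica on the same host (e.g. a co-located DataNode)
    RACK_LOCAL = 2     # block replica on another host in the same rack
    ANY = 3            # data must come over the network from elsewhere


def locality_for(executor_id, executor_host, cached_on, block_hosts, rack_of):
    """Classify how local one partition is to one executor (toy model).

    cached_on:   executor ids holding the partition in memory
    block_hosts: hosts holding an HDFS replica of the partition's block
    rack_of:     mapping host -> rack name (hosts missing from it have no rack)
    """
    if executor_id in cached_on:
        return Locality.PROCESS_LOCAL
    if executor_host in block_hosts:
        return Locality.NODE_LOCAL
    rack = rack_of.get(executor_host)
    if rack is not None and any(rack_of.get(h) == rack for h in block_hosts):
        return Locality.RACK_LOCAL
    return Locality.ANY


def pick_executor(executors, cached_on, block_hosts, rack_of):
    """Return the (executor_id, host) pair with the best locality for a partition."""
    return min(
        executors,
        key=lambda e: locality_for(e[0], e[1], cached_on, block_hosts, rack_of),
    )
```

In this model, a Worker that shares a host with a DataNode holding the block gets NODE_LOCAL, which is why co-locating Workers and DataNodes matters for the topology question in this thread.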

Re: Spark and HDFS ( Worker and Data Nodes Combination )

2015-06-22 Thread ayan guha
I have a basic question: how does Spark assign partitions to executors? Does it respect data locality? Does this behaviour depend on the cluster manager, i.e. YARN vs standalone?

On 22 Jun 2015 22:45, "Akhil Das" wrote:
> Option 1 should be fine; Option 2 would be bound a lot by the network as the data
> increases in t

Re: Spark and HDFS ( Worker and Data Nodes Combination )

2015-06-22 Thread Akhil Das
Option 1 should be fine; Option 2 would be bound a lot by the network as the data increases over time.

Thanks
Best Regards

On Mon, Jun 22, 2015 at 5:59 PM, Ashish Soni wrote:
> Hi All,
>
> What is the best way to install a Spark cluster alongside a Hadoop
> cluster? Any recommendation for below

Spark and HDFS ( Worker and Data Nodes Combination )

2015-06-22 Thread Ashish Soni
Hi All,

What is the best way to install a Spark cluster alongside a Hadoop cluster? Any recommendation for the deployment topology below would be a great help.

*Also, is it necessary to put the Spark Worker on DataNodes, since then, when it reads a block from HDFS, it will be local to the server/worker, or
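Akhil's reply that Option 2 becomes network-bound can be put in rough numbers. The options themselves are cut off in the archive, so this sketch assumes Option 1 runs Spark Workers on the DataNodes and Option 2 runs them on separate machines, meaning every byte read in Option 2 crosses the network while Option 1 can read most blocks locally. The function name, the single-link model, and the example figures are hypothetical.

```python
def remote_read_seconds(input_gb, remote_fraction, link_gbit_per_s):
    """Rough time to pull the non-local share of an input over one network link.

    input_gb:         total input size in gigabytes (10**9 bytes)
    remote_fraction:  share of bytes that is not local to the reading worker (0..1)
    link_gbit_per_s:  usable bandwidth in gigabits per second
    Ignores disk speed, parallel links, protocol overhead and compression.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    if link_gbit_per_s <= 0:
        raise ValueError("link_gbit_per_s must be positive")
    gigabits = input_gb * 8 * remote_fraction  # bytes -> bits, remote share only
    return gigabits / link_gbit_per_s
```

For example, 100 GB read entirely over a 1 Gbit/s link takes about 800 seconds, versus about 80 seconds if only a tenth of it is remote. Both figures are illustrative, not measurements, but the gap grows linearly with input size, which matches the concern about data increasing over time.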

Re: Questions about Spark and HDFS co-location

2015-01-09 Thread Andrew Ash
>> > through the spark code) which do a good job of explaining the Spark/HDFS
>> > data workflow (ie. how data moves from disk -> HDFS -> Spark -> HDFS)?
>> >
>> > Thanks!
>> > Zach

Re: Questions about Spark and HDFS co-location

2015-01-09 Thread Ted Yu
> > Also, do you know of any papers/books/other resources (other than trying to
> > dig through the spark code) which do a good job of explaining the Spark/HDFS
> > data workflow (ie. how data moves from disk -> HDFS -> Spark -> HDFS)?
> >
> > Thanks!
> > Zach

Re: Questions about Spark and HDFS co-location

2015-01-09 Thread Sean Owen
> the Spark/HDFS
> data workflow (ie. how data moves from disk -> HDFS -> Spark -> HDFS)?
>
> Thanks!
> Zach
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Questions-about-Spark-and-

Questions about Spark and HDFS co-location

2015-01-09 Thread zfry
do a good job of explaining the Spark/HDFS data workflow (ie. how data moves from disk -> HDFS -> Spark -> HDFS)?

Thanks!
Zach

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Questions-about-Spark-and-HDFS-co-location-tp21070.html