On Wed, Jul 15, 2015 at 5:36 AM, Jeskanen, Elina wrote:
> I have Spark 1.4 on my local machine and I would like to connect to our
> local 4-node Cloudera cluster. But how?
>
> In the example it says text_file = spark.textFile("hdfs://..."), but can
> you advise me on where to get this "hdfs://..." address?
Yes, I did this recently. You need to copy the Cloudera cluster's conf files
to the local machine and set HADOOP_CONF_DIR or YARN_CONF_DIR.
The local machine should also be able to ssh to the Cloudera cluster.
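For example, a minimal sketch (the conf path, NameNode host/port and file
path are all placeholders; the real "hdfs://..." address is the fs.defaultFS
value in the core-site.xml you copied over):

import os
from pyspark import SparkConf, SparkContext

# Point Spark at the copied cluster conf files; normally you would
# export this in the shell before launching, it is set here only
# for illustration (placeholder path)
os.environ["HADOOP_CONF_DIR"] = "/home/elina/cloudera-conf"

sc = SparkContext(conf=SparkConf().setAppName("hdfs-connect-test"))

# "namenode-host:8020" is a placeholder; use your fs.defaultFS value
text_file = sc.textFile("hdfs://namenode-host:8020/user/elina/input.txt")
print(text_file.count())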
On Wed, Jul 15, 2015 at 8:51 AM, ayan guha wrote:
Assuming you run Spark locally (i.e. either local mode or a standalone cluster
on your local m/c):
1. You need to have the Hadoop binaries locally.
2. You need to have hdfs-site.xml on the Spark classpath of your local m/c.
I would suggest you start off with local files to play around (see the sketch
below). If you need to run Spark on
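For example, a minimal sketch of reading a local file first (the path is just
an example):

from pyspark import SparkContext

sc = SparkContext("local[2]", "local-file-demo")

# The file:// scheme forces the local filesystem even when an
# hdfs-site.xml is on the classpath; /tmp/sample.txt is a placeholder
rdd = sc.textFile("file:///tmp/sample.txt")
print(rdd.take(5))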
I think this is how it works: RDDs have partitions which are made up of
blocks, and the BlockManager knows where these blocks are available. Based on
that availability (PROCESS_LOCAL, NODE_LOCAL, etc.), Spark will launch the
tasks on those nodes. This behaviour can be tuned with the
spark.locality.wait settings.
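For example, a minimal sketch of tuning that (the 10s value is illustrative,
not a recommendation):

from pyspark import SparkConf, SparkContext

# Wait up to 10s for an executor slot on a data-local node before
# Spark falls back to launching the task at a less local level
conf = (SparkConf()
        .setAppName("locality-demo")
        .set("spark.locality.wait", "10s"))
sc = SparkContext(conf=conf)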
I have a basic question: how does Spark assign partitions to executors? Does
it respect data locality? Does this behaviour depend on the cluster manager,
i.e. YARN vs standalone?
On 22 Jun 2015 22:45, "Akhil Das" wrote:
Option 1 should be fine; Option 2 would be heavily bound on the network as
the data increases over time.
Thanks
Best Regards
On Mon, Jun 22, 2015 at 5:59 PM, Ashish Soni wrote:
Hi All,
What is the best way to install a Spark cluster alongside a Hadoop cluster?
Any recommendation for the below deployment topology would be a great help.
Also, is it necessary to put the Spark Workers on the DataNodes, so that when
they read blocks from HDFS the reads are local to the server / worker, or can
they run on separate nodes?
Also, do you know of any papers/books/other resources (other than trying to
dig through the Spark code) which do a good job of explaining the Spark/HDFS
data workflow (i.e. how data moves from disk -> HDFS -> Spark -> HDFS)?
Thanks!
Zach
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Questions-about-Spark-and-HDFS-co-location-tp21070.html