I want to run k-means from MLlib on a big dataset. It seems that for big datasets we need to perform a pre-clustering method such as canopy clustering. By starting with an initial clustering, the number of more expensive distance measurements can be significantly reduced by ignoring points outside of the initial canopies.
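Canopy clustering is not part of MLlib's KMeans API, so it has to be done as a separate pre-pass. A minimal plain-Python sketch of the canopy idea (the thresholds `t1`/`t2` and the 1-D distance function are illustrative assumptions, not MLlib parameters):

```python
def canopy(points, t1, t2, dist=lambda a, b: abs(a - b)):
    """Group points into (possibly overlapping) canopies.

    t1 (loose threshold) must exceed t2 (tight threshold): a point within
    t1 of a canopy center joins that canopy; a point within t2 is also
    removed from further consideration, so it cannot seed a new canopy.
    """
    assert t1 > t2
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)          # pick an arbitrary remaining point
        members = [center]
        still_remaining = []
        for p in remaining:
            d = dist(p, center)
            if d < t1:
                members.append(p)          # loosely belongs to this canopy
            if d >= t2:
                still_remaining.append(p)  # tightly-bound points are consumed
        remaining = still_remaining
        canopies.append((center, members))
    return canopies
```

The canopy centers can then seed the k-means initialization, and the expensive exact-distance computations can be restricted to points that share a canopy.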
I have one master and two slave nodes, and I did not set an IP for the Spark driver. My question is: should I set an IP for the Spark driver, and can I host the driver inside the cluster on the master node? If so, how do I host it? Will it be hosted automatically on the node from which we submit the application with spark-submit?
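In standalone mode with the default client deploy mode, the driver runs on the machine where spark-submit is invoked, so submitting from the master node does host the driver there without any extra configuration. A sketch (the hostname `master` and file name `app.py` are assumptions for illustration):

```shell
# Client mode (the default): the driver runs on the machine where you
# invoke spark-submit, so running this on the master hosts the driver there.
spark-submit --master spark://master:7077 --deploy-mode client app.py

# If the executors cannot reach the driver, you can set the driver's
# advertised address explicitly instead of relying on auto-detection:
spark-submit --master spark://master:7077 \
  --conf spark.driver.host=master app.py
```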
I am running spark-1.0.0, connecting to a Spark standalone cluster which has one master and two slaves. I ran wordcount.py with spark-submit; it reads data from HDFS and also writes the results into HDFS. So far everything is fine and the results are written correctly.
Can anyone explain to me the difference between a worker and a slave? I have one master and two slaves which are connected to each other. Using the jps command I can see a Master process on the master node and Worker processes on the slave nodes, but I don't have any worker on my master node when using this command:
/bin/spark-class
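In Spark standalone terminology, a "slave" is simply a node that runs a Worker process; the master node runs only a Master process unless you also start a Worker on it, which is why jps shows no Worker there. A Worker can be launched on any node, including the master (the hostname `master` below is an assumption):

```shell
# Start a Master; it logs the spark://... URL that workers connect to.
./bin/spark-class org.apache.spark.deploy.master.Master

# Start a Worker on any node -- including the master node -- by pointing
# it at the master's URL; jps will then show a Worker process there too.
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://master:7077
```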