Hi,
I've had Spark/Shark running successfully on my Hadoop cluster. For various
reasons I had to change the IP addresses of my 6 Hadoop nodes, and since
then I have been unable to create a cached table in memory using Shark.
While 10.14.xx.xx in the first line below is the new address, Shark/Spark is
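In case it helps anyone hitting the same problem: after an IP change, the old addresses tend to linger in a few places. The list below is a sketch of where I'd look (paths assume a Spark 0.8 standalone deployment plus Hadoop 1.x; adjust to your layout):

```shell
# conf/spark-env.sh on every node: bind the master to the new address
export SPARK_MASTER_IP=10.14.xx.xx   # new master address (placeholder)

# conf/slaves on the master node: one worker address per line,
# updated to the new IPs

# /etc/hosts on every node: update any entries still mapping
# hostnames to the old addresses

# On the Hadoop side: core-site.xml's fs.default.name must point at the
# new NameNode address, since Shark reads table data through HDFS
```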
Many thanks for the replies.
My current setup is as follows:
6 nodes running Hadoop, with each node holding approximately 5 GB of data.
I launched the Spark master (and Shark via ./shark) on one of the Hadoop
nodes, and 5 Spark workers on the remaining 5 Hadoop nodes.
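For what it's worth, the launch sequence above roughly corresponds to the standalone-mode commands below (paths assume the Spark 0.8 layout; the master URL placeholder is whatever start-master.sh reports):

```shell
# On the chosen master node (any Hadoop node works):
./bin/start-master.sh   # logs a URL like spark://<master-host>:7077

# On each of the 5 worker nodes, register against that URL:
./spark-class org.apache.spark.deploy.worker.Worker spark://<master-host>:7077

# Then start Shark on the master node:
./shark
```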
So
Hi,
Should the Spark master run on the Hadoop JobTracker node (and Spark
workers on the TaskTrackers), or can the Spark master reside on any Hadoop
node?
Thanks
Majd
Hi,
I've experimented with the parameters provided, but we are still seeing the
same problem: data is still spilling to disk even though there is clearly
enough memory on the worker nodes.
Please note that the data is distributed equally amongst the 6 Hadoop nodes
(about 5 GB per node).
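One thing worth checking before blaming the cluster: how much of each worker's heap Spark actually reserves for caching. A back-of-the-envelope sketch (the 0.66 storage fraction is, to my understanding, Spark 0.8's default for spark.storage.memoryFraction; the 4 GB heap is a placeholder — substitute whatever SPARK_MEM is set to on your workers):

```python
def cache_capacity_gb(worker_heap_gb, storage_fraction=0.66):
    """Approximate memory available for cached RDD blocks on one worker.

    storage_fraction mirrors spark.storage.memoryFraction; the remainder
    of the heap is reserved for task execution and shuffle buffers.
    """
    return worker_heap_gb * storage_fraction

data_per_node_gb = 5.0
heap_gb = 4.0  # placeholder worker heap size

print(cache_capacity_gb(heap_gb))                      # 2.64 GB usable for caching
print(cache_capacity_gb(heap_gb) >= data_per_node_gb)  # False -> spilling expected
```

Note too that a deserialized in-memory table can occupy noticeably more space than its on-disk size, so "enough memory" on paper is not always enough in practice.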
Any workarounds or clues?
Hi All,
I'm creating a cached table in memory via Shark using the command:
create table tablename_cached as select * from tablename;
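For reference, the `_cached` suffix is what triggers caching in Shark; the same behaviour can also be requested explicitly through the `shark.cache` table property (syntax below is my understanding of the 0.8-era property — check the Shark docs for your build):

```sql
-- Equivalent to relying on the _cached naming convention:
CREATE TABLE tablename_cached
  TBLPROPERTIES ("shark.cache" = "true")
  AS SELECT * FROM tablename;
```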
Monitoring this via the Spark UI, I have noticed that data is being written
to disk even though there is clearly enough available memory on 2 of the worker
nodes. Please re
Hi,
I'm using Spark 0.8.0 and Shark 0.8.0. Upon creating a cached table in
memory using Shark, the Spark UI indicates that the type of storage level
used is 'Disk Memory Deserialized 1x Replicated'. I was under the impression
that Memory Only is the default storage level in Spark. Did that change in 0.8?
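For anyone decoding that UI string: "Disk Memory Deserialized 1x Replicated" is how the Spark UI renders MEMORY_AND_DISK, while MEMORY_ONLY renders without the leading "Disk". A small sketch of how the storage-level flags map to the label (mirroring, to my understanding, the format of StorageLevel's description):

```python
def storage_level_label(use_disk, use_memory, deserialized, replication=1):
    """Render a storage level the way the Spark UI does (sketch)."""
    parts = []
    if use_disk:
        parts.append("Disk")
    if use_memory:
        parts.append("Memory")
    parts.append("Deserialized" if deserialized else "Serialized")
    parts.append("%dx Replicated" % replication)
    return " ".join(parts)

# MEMORY_AND_DISK: disk + memory, deserialized, one replica
print(storage_level_label(True, True, True))   # Disk Memory Deserialized 1x Replicated
# MEMORY_ONLY: memory only, deserialized
print(storage_level_label(False, True, True))  # Memory Deserialized 1x Replicated
```

So the UI string suggests the cached table was stored at MEMORY_AND_DISK rather than MEMORY_ONLY, which would also explain blocks appearing on disk.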