Spark connecting to wrong Filesystem.uri

2014-01-25 Thread mharwida
Hi, I've had Spark/Shark running successfully on my Hadoop cluster. For various reasons I had to change the IP addresses of my 6 Hadoop nodes, and since then I have been unable to create a cached table in memory using Shark. While 10.14.xx.xx in the first line below is the new address, Shark/Spark is
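(A minimal sketch of where a stale address usually comes from, assuming the standard Hadoop client configuration: Spark/Shark resolves HDFS paths through the Hadoop Configuration on its classpath, so an old fs.default.name in core-site.xml keeps pointing jobs at the previous namenode even after the node IPs change. The host and port below are placeholders.)

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.FileSystem

    // Load whatever core-site.xml is on the classpath and check which
    // namenode URI the Spark/Shark processes will actually resolve.
    val hadoopConf = new Configuration()
    println(hadoopConf.get("fs.default.name"))

    // Overriding it in code (placeholder address) confirms whether a stale
    // URI is the problem; the durable fix is updating core-site.xml on every node.
    hadoopConf.set("fs.default.name", "hdfs://10.14.xx.xx:8020")
    println(FileSystem.get(hadoopConf).getUri)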

Re: Spark Master on Hadoop Job Tracker?

2014-01-21 Thread mharwida
Many thanks for the replies. The way I currently have my setup is as follows: 6 nodes running Hadoop, with each node holding approximately 5 GB of data. I launched a Spark Master (and Shark via ./shark) on one of the Hadoop nodes and launched 5 Spark worker nodes on the remaining 5 Hadoop nodes. So
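(A rough sanity check under that setup, with a hypothetical master URL: once the five workers have registered with the standalone master, a small job with more partitions than workers should show tasks landing on every worker in the Spark UI rather than on a single node.)

    import org.apache.spark.SparkContext

    // Master URL is a placeholder; use the host where the standalone master was started.
    val sc = new SparkContext("spark://hadoop-node-1:7077", "worker-sanity-check")

    // 30 partitions across 5 workers; the UI's stages/executors pages should show
    // tasks spread over all registered workers.
    val counts = sc.parallelize(1 to 1000000, 30).map(_ % 6).countByValue()
    println(counts)
    sc.stop()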

Spark Master on Hadoop Job Tracker?

2014-01-20 Thread mharwida
Hi, Should the Spark Master run on the Hadoop Job Tracker node (and Spark workers on the Task Trackers), or can the Spark Master reside on any Hadoop node? Thanks Majd
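(A hedged sketch of why the placement question mostly concerns the workers rather than the master: tasks reading from HDFS prefer the nodes holding the blocks, so co-locating workers with the DataNodes/TaskTrackers matters for locality, while the master only coordinates scheduling and can sit on any reachable node. The master URL and path below are placeholders.)

    import org.apache.spark.SparkContext

    val sc = new SparkContext("spark://any-hadoop-node:7077", "locality-example")

    // Reading an HDFS file: tasks are scheduled preferentially on workers
    // co-located with the blocks, regardless of where the master runs.
    val lines = sc.textFile("hdfs://namenode:8020/path/to/data")
    println(lines.count())
    sc.stop()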

Re: Spark writing to disk when there's enough memory?!

2014-01-20 Thread mharwida
Hi, I've experimented with the parameters provided, but we are still seeing the same problem: data is still spilling to disk when there's clearly enough memory on the worker nodes. Please note that data is distributed equally amongst the 6 Hadoop nodes (about 5 GB per node). Any workarounds or clu
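(One knob worth checking, assuming Spark 0.8-era properties apply to this setup: cached blocks only get spark.storage.memoryFraction of each executor JVM's heap, not the machine's full RAM, so a small heap or a low fraction can push blocks to disk even when the node itself has memory to spare. The value below is illustrative.)

    // Set before the SparkContext/Shark session is created (0.8-style system properties).
    System.setProperty("spark.storage.memoryFraction", "0.66") // share of heap reserved for the block store
    // The worker/executor heap itself is fixed when the workers are launched;
    // the cache sees memoryFraction * heap, not total machine memory.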

Spark writing to disk when there's enough memory?!

2014-01-13 Thread mharwida
Hi All, I'm creating a cached table in memory via Shark using the command: create table tablename_cached as select * from tablename; Monitoring this via the Spark UI, I have noticed that data is being written to disk when there's clearly enough available memory on 2 of the worker nodes. Please re
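(A hedged Scala illustration of the distinction at play, using plain RDDs rather than Shark tables: with MEMORY_ONLY, partitions that don't fit are dropped and recomputed rather than written to disk, whereas MEMORY_AND_DISK spills them, which is what disk writes in the UI usually reflect. The master URL and path below are placeholders.)

    import org.apache.spark.SparkContext
    import org.apache.spark.storage.StorageLevel

    val sc = new SparkContext("spark://any-hadoop-node:7077", "storage-level-demo")
    val rdd = sc.textFile("hdfs://namenode:8020/path/to/table")

    // Equivalent to rdd.cache(): nothing is spilled to disk; partitions that
    // don't fit in memory are simply recomputed when needed.
    rdd.persist(StorageLevel.MEMORY_ONLY)
    println(rdd.count())
    sc.stop()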

Default Storage Level in Spark

2014-01-10 Thread mharwida
Hi, I'm using Spark 0.8.0 and Shark 0.8.0. Upon creating a cached table in memory using Shark, the Spark UI indicates that the storage level used is 'Disk Memory Deserialized 1x Replicated'. I was under the impression that Memory Only is the default storage level in Spark. Did that change in 0.8
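(For reference, assuming the UI string maps the usual way: 'Disk Memory Deserialized 1x Replicated' is how the Spark UI renders StorageLevel.MEMORY_AND_DISK, so the cached table appears to use MEMORY_AND_DISK; Spark's MEMORY_ONLY default applies to RDD.cache(), while Shark may pick its own level for cached tables. A small check:)

    import org.apache.spark.storage.StorageLevel

    // MEMORY_AND_DISK keeps deserialized objects in memory and spills overflow to disk.
    val level = StorageLevel.MEMORY_AND_DISK
    println((level.useDisk, level.useMemory, level.deserialized, level.replication))
    // (true, true, true, 1) -- rendered by the UI as
    // "Disk Memory Deserialized 1x Replicated"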