Spark ignores SPARK_WORKER_MEMORY?

2016-01-13 Thread Barak Yaish
Hello,

Although I'm setting SPARK_WORKER_MEMORY in spark-env.sh, it looks like this
setting is ignored. I can't find any indication in the scripts under
bin/sbin of where -Xms/-Xmx are set.
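
For reference, the relevant lines in conf/spark-env.sh look like the ones below; the 4g value is only an example, and my (possibly wrong) understanding is that SPARK_WORKER_MEMORY only caps what the worker may hand out to executors, while the worker daemon's own heap comes from SPARK_DAEMON_MEMORY, which defaults to 1g:

export SPARK_WORKER_MEMORY=4g    # total memory the worker may allocate to executors
# export SPARK_DAEMON_MEMORY=1g  # heap of the master/worker daemon processes themselves (default: 1g)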

If I ps the worker pid, it looks like the memory is set to 1G:

[hadoop@sl-env1-hadoop1 spark-1.5.2-bin-hadoop2.6]$ ps -ef | grep 20232
hadoop   20232 1  0 02:01 ?00:00:22 /usr/java/latest//bin/java
-cp
/workspace/3rd-party/spark/spark-1.5.2-bin-hadoop2.6/sbin/../conf/:/workspace/3rd-party/spark/spark-1.5.2-bin-hadoop2.6/lib/spark-assembly-1.5.2-hadoop2.6.0.jar:/workspace/3rd-party/spark/spark-1.5.2-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/workspace/3rd-party/spark/spark-1.5.2-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/workspace/3rd-party/spark/spark-1.5.2-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/workspace/3rd-party/hadoop/2.6.3//etc/hadoop/
-Xms1g -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081
spark://10.52.39.92:7077

Am I missing something?

Thanks.


Lost tasks due to OutOfMemoryError (GC overhead limit exceeded)

2016-01-12 Thread Barak Yaish
Hello,

I have a 5-node cluster which hosts both HDFS datanodes and Spark workers.
Each node has 8 CPUs and 16G of memory. The Spark version is 1.5.2, and
spark-env.sh is as follows:

export SPARK_MASTER_IP=10.52.39.92
export SPARK_WORKER_INSTANCES=4
export SPARK_WORKER_CORES=8
export SPARK_WORKER_MEMORY=4g
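
As a sanity check using only the numbers above (whether this is actually what triggers the GC pressure is just a guess on my part):

  4 worker instances x 4g (SPARK_WORKER_MEMORY) = 16g per node, i.e. all of the 16G of RAM
  4 worker instances x 8  (SPARK_WORKER_CORES)  = 32 cores advertised on an 8-CPU node

so nothing is left aside for the HDFS datanode or the OS.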

Additional settings are applied in the application code:

sparkConf.set("spark.serializer","org.apache.spark.serializer.KryoSerializer");
sparkConf.set("spark.kryo.registrator",InternalKryoRegistrator.class.getName());
sparkConf.set("spark.kryo.registrationRequired","true");
sparkConf.set("spark.kryoserializer.buffer.max.mb","512");
sparkConf.set("spark.default.parallelism","300");
sparkConf.set("spark.rpc.askTimeout","500");

I'm trying to load data from HDFS and run some SQL queries on it (mostly
group-by) using DataFrames. The logs keep saying that tasks are lost due to
OutOfMemoryError (GC overhead limit exceeded).

Can you advise on the recommended settings (memory, cores, partitions,
etc.) for the given hardware?

Thanks!


Spark 1.5.1 standalone cluster - wrong Akka remoting config?

2015-10-08 Thread Barak Yaish
Taking my first steps with Spark, I'm facing problems submitting jobs to the
cluster from the application code. Digging through the logs, I noticed some
periodic WARN messages in the master log:

15/10/08 13:00:00 WARN remote.ReliableDeliverySupervisor: Association with
remote system [akka.tcp://sparkDriver@192.168.254.167:64014] has failed,
address is now gated for [5000] ms. Reason: [Disassociated]

The problem is that this IP address does not exist on our network and wasn't
configured anywhere. The same wrong IP is shown in the worker log when it
tries to execute the task (the wrong IP is passed to --driver-url):

15/10/08 12:58:21 INFO worker.ExecutorRunner: Launch command:
"/usr/java/latest//bin/java" "-cp" "/path/spark/spark-1.5.1-bin-ha
doop2.6/sbin/../conf/:/path/spark/spark-1.5.1-bin-hadoop2.6/lib/spark-assembly-1.5.1-hadoop2.6.0.jar:/path/spark/
spark-1.5.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/path/spark/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.ja
r:/path/spark/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/path/hadoop/2.6.0//etc/hadoop/"
"-Xms102
4M" "-Xmx1024M" "-Dspark.driver.port=64014" "-Dspark.driver.port=53411"
"org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url"
"akka.tcp://sparkDriver@192.168.254.167:64014/user/CoarseGrainedScheduler"
"--executor-id" "39" "--hostname" "192.168.10.214" "--cores" "16"
"--app-id"  "app-20151008123702-0003" "--worker-url" "akka.tcp://
sparkWorker@192.168.10.214:37625/user/Worker"
15/10/08 12:59:28 INFO worker.Worker: Executor app-20151008123702-0003/39
finished with state EXITED message Command exited with code 1 exitStatus 1

Any idea what I did wrong, and how can this be fixed?
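
For what it's worth, the only workaround I could come up with so far (a rough sketch with a placeholder address, not a verified fix) is to force the driver to advertise a reachable address, either in the application code or via the driver machine's environment:

sparkConf.set("spark.driver.host", "<reachable-driver-ip>"); // placeholder for the submitting machine's real address
// or, in the environment of the machine running the driver:
// export SPARK_LOCAL_IP=<reachable-driver-ip>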

The Java version is 1.8.0_20, and I'm using the pre-built Spark binaries.

Thanks!