Hi all,
I have the following cluster configuration:
- 5 nodes in a cloud environment
- Hadoop 2.5.0
- HBase 0.98.6
- Spark 1.2.0
- 8 cores and 16 GB of RAM on each host
- 1 NFS disk with 300 IOPS mounted on hosts 1 and 2
- 1 NFS disk with 300 IOPS mounted on hosts 3, 4 and 5

I tried to run a Spark job in cluster mode that computes the left outer join between two HBase tables. The first table stores about 4.1 GB of data spread across 3 regions with Snappy compression. The second one stores about 1.2 GB of data spread across 22 regions with Snappy compression.

I sometimes lose executors during the shuffle phase of the last stage (saveAsHadoopDataset).

Below is my Spark conf:

num-cpu-cores = 20
memory-per-node = 10G
spark.scheduler.mode = FAIR
spark.scheduler.pool = production
spark.shuffle.spill = true
spark.rdd.compress = true
spark.core.connection.auth.wait.timeout = 2000
spark.sql.shuffle.partitions = 100
spark.default.parallelism = 50
spark.speculation = false
spark.shuffle.memoryFraction = 0.1
spark.cores.max = 30
spark.driver.memory = 10g

Are the resources too low to handle this kind of operation? If so, could you share with me the right configuration to perform this kind of task?

Thank you in advance.
F.
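
P.S. In case it helps, the job is structured roughly like the simplified sketch below. Table names, the column family, and the output value logic are placeholders, not my actual code:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Put, Result}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.{SparkConf, SparkContext}

object HBaseLeftOuterJoin {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hbase-left-outer-join"))

    // Read an HBase table as an RDD keyed by row key.
    def readTable(name: String) = {
      val conf = HBaseConfiguration.create()
      conf.set(TableInputFormat.INPUT_TABLE, name)
      sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
          classOf[ImmutableBytesWritable], classOf[Result])
        .map { case (key, result) => (Bytes.toString(key.get()), result) }
    }

    val left  = readTable("table1")   // ~4.1 GB, 3 regions
    val right = readTable("table2")   // ~1.2 GB, 22 regions

    // Left outer join on row key; this shuffle is where executors get lost.
    val joined = left.leftOuterJoin(right, 50)

    // Write the joined result back to HBase via saveAsHadoopDataset.
    val jobConf = new JobConf(HBaseConfiguration.create())
    jobConf.setOutputFormat(classOf[TableOutputFormat])
    jobConf.set(TableOutputFormat.OUTPUT_TABLE, "output_table")

    joined.map { case (rowKey, (leftResult, rightOpt)) =>
      val put = new Put(Bytes.toBytes(rowKey))
      // Placeholder value: just record whether the right side matched.
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("matched"),
        Bytes.toBytes(rightOpt.isDefined.toString))
      (new ImmutableBytesWritable(Bytes.toBytes(rowKey)), put)
    }.saveAsHadoopDataset(jobConf)

    sc.stop()
  }
}
```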