Hi everyone,

I have the following configuration, and I am currently running my app in local mode:
val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("ApproxStrMatch")
  .set("spark.executor.memory", "3g")
  .set("spark.storage.memoryFraction", "0.1")
I am getting the errors shown in the log below. I tried setting spark.executor.memory and the storage memory fraction, but the UI does not show any increase and I still get these errors. I am loading a TSV file from HDFS (around 5 GB). Does this mean I should update these settings and add more memory, or is it something else? The Spark master has 24 GB of physical memory and the workers have 16 GB, but we are running other services (CDH 5.1) on these nodes as well.
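Concretely, would something like this be the right way to raise the heap in local mode? (A minimal sketch, not tested; my understanding is that in local mode the executor runs inside the driver JVM, so the heap has to be sized before the JVM starts and spark.executor.memory has no effect there. The 6g figure and the class/jar names below are placeholders, not my actual values.)

// Sketch: in local mode the executor shares the driver JVM, so the heap
// is set at launch time via spark-submit, not via SparkConf afterwards:
//
//   spark-submit --driver-memory 6g --class ApproxStrMatch app.jar
//
// The SparkConf then only carries the non-heap settings:
val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("ApproxStrMatch")
  .set("spark.storage.memoryFraction", "0.1")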
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 6 ms
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 6 ms
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 1 ms
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 1 ms
14/07/31 09:48:17 ERROR Executor: Exception in task ID 5
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2271)
    at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
14/07/31 09:48:17 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-3,5,main]
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2271)
    at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
14/07/31 09:48:17 WARN TaskSetManager: Lost TID 5 (task 1.0:0)
14/07/31 09:48:17 WARN TaskSetManager: Loss was due to java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2271)
    at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
14/07/31 09:48:17 ERROR TaskSetManager: Task 1.0:0 failed 1 times; aborting job
14/07/31 09:48:17 INFO TaskSchedulerImpl: Cancelling stage 1
14/07/31 09:48:17 INFO DAGScheduler: Failed to run collect at ComputeScores.scala:76
14/07/31 09:48:17 INFO Executor: Executor is trying to kill task 6
14/07/31 09:48:17 INFO TaskSchedulerImpl: Stage 1 was cancelled
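For context, the failure is on a collect() at ComputeScores.scala:76. Roughly what that part of the job does is sketched below (illustrative only, not the actual code; the HDFS path and the scoring step are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

object ComputeScores {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("ApproxStrMatch")
      .set("spark.storage.memoryFraction", "0.1")
    val sc = new SparkContext(conf)

    // ~5 GB TSV loaded from HDFS (placeholder path)
    val records = sc.textFile("hdfs:///path/to/input.tsv")

    // stand-in for the real approximate string-matching score
    val scores = records.map(line => (line, line.length.toDouble))

    // the collect at line 76 (illustrative): each task serializes its
    // entire partition result, which matches the OOM inside
    // JavaSerializerInstance.serialize in the trace above
    val all = scores.collect()

    sc.stop()
  }
}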