Hi All,
I am having the same issue.
My EMR setup is a 3-node cluster of m3.2xlarge instances, and I'm trying to read a 100 GB file in Spark SQL.
I have set the following on Spark:
export SPARK_EXECUTOR_MEMORY=4G
export SPARK_DRIVER_MEMORY=12G
export SPARK_EXECUTOR_INSTANCES=16
export SPARK_EXECUTOR_CORES=16
spark.kryoserializer.buffer.max 2000m
spark.driver.maxResultSize 0
-XX:MaxPermSize=1024M
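For reference, a minimal PySpark sketch that collects the same settings into one SparkConf; the app name is hypothetical, and placing -XX:MaxPermSize on the driver is my assumption. Note also that an m3.2xlarge has 8 vCPUs, so 16 cores per executor oversubscribes each node:

from pyspark import SparkConf, SparkContext

# Driver-side values (spark.driver.memory, driver JVM options) only take
# effect if set before the driver JVM starts, e.g. via spark-submit or
# spark-defaults.conf; they appear here just to keep everything in one place.
conf = (SparkConf()
        .setAppName("read-100gb-sql")                      # hypothetical name
        .set("spark.executor.memory", "4g")
        .set("spark.driver.memory", "12g")
        .set("spark.executor.instances", "16")
        .set("spark.executor.cores", "16")                 # m3.2xlarge has only 8 vCPUs
        .set("spark.kryoserializer.buffer.max", "2000m")
        .set("spark.driver.maxResultSize", "0")            # 0 means unlimited
        .set("spark.driver.extraJavaOptions", "-XX:MaxPermSize=1024M"))
sc = SparkContext(conf=conf)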
Please find below the error:
16/02/11 15:32:00 WARN DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1257713490-xx.xx.xx.xx-1455121562682:blk_1073742405_10984
java.io.EOFException: Premature EOF: no length prefix available
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2280)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:745)
Kindly help me understand the configuration.
Thanks in advance.
Regards
Arun.
________________________________
From: Kuchekar [[email protected]]
Sent: 11 February 2016 09:42
To: Nirav Patel
Cc: spark users
Subject: Re: Spark executor Memory profiling
Hi Nirav,
I faced a similar issue with YARN on EMR 1.5.2, and the following
Spark conf helped me. You can set the values accordingly:
conf = (SparkConf()
        .set("spark.master", "yarn-client")
        .setAppName("HalfWay")
        .set("spark.driver.memory", "15G")
        .set("spark.yarn.am.memory", "15G")
        .set("spark.driver.maxResultSize", "10G")
        .set("spark.storage.memoryFraction", "0.6")
        .set("spark.shuffle.memoryFraction", "0.6")
        .set("spark.yarn.executor.memoryOverhead", "4000")
        .set("spark.executor.cores", "4")
        .set("spark.executor.memory", "15G")
        .set("spark.executor.instances", "6"))
It might also be possible to use reduceByKey in place of groupByKey, which should help with the shuffling too; see the sketch below.
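A minimal runnable sketch of that substitution; the pair RDD and the sum aggregation are hypothetical stand-ins for the real job:

from pyspark import SparkConf, SparkContext

sc = SparkContext(conf=SparkConf().setAppName("reduce-vs-group"))

# Hypothetical (key, value) pairs standing in for the real data.
pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])

# groupByKey ships every value for a key across the shuffle ...
grouped_sums = pairs.groupByKey().mapValues(sum)

# ... while reduceByKey combines values map-side first, so far less data is
# shuffled and no single key's values have to be held in one executor at once.
reduced_sums = pairs.reduceByKey(lambda a, b: a + b)

print(reduced_sums.collect())  # e.g. [('a', 4), ('b', 2)]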
Kuchekar, Nilesh
On Wed, Feb 10, 2016 at 8:09 PM, Nirav Patel <[email protected]> wrote:
We have been trying to solve a memory issue with a Spark job that processes 150 GB
of data (on disk). It does a groupBy operation; some of the executors will
receive somewhere around 2-4M Scala case objects to work with. We are using the
following Spark config:
"executorInstances": "15",
"executorCores": "1", (we reduced it to one so a single task gets all the
executorMemory; at least that's the assumption here)
"executorMemory": "15000m",
"minPartitions": "2000",
"taskCpus": "1",
"executorMemoryOverhead": "1300",
"shuffleManager": "tungsten-sort",
"storageFraction": "0.4"
This is a snippet of what we see in the Spark UI for a job that fails; this is the stage of that job that fails:

Stage Id: 5 (retry 15)
Pool Name: prod
Description: map at SparkDataJobs.scala:210
Submitted: 2016/02/09 21:30:06
Duration: 13 min
Tasks: Succeeded/Total: 130/389 (16 failed)
Shuffle Read: 1982.6 MB
Shuffle Write: 818.7 MB
Failure Reason: org.apache.spark.shuffle.FetchFailedException: Error in opening FileSegmentManagedBuffer{file=/tmp/hadoop/nm-local-dir/usercache/fasd/appcache/application_1454975800192_0447/blockmgr-abb77b52-9761-457a-b67d-42a15b975d76/0c/shuffle_0_39_0.data, offset=11421300, length=2353}
This is one of the task attempts from the above stage that threw the OOM:

Index: 2, Task Id: 22361, Attempt: 0, Status: FAILED, Locality: PROCESS_LOCAL, Executor: 38 / nd1.mycom.local, Launched: 2016/02/09 22:10:42, Duration: 5.2 min, GC Time: 1.6 min, Shuffle Read: 7.4 MB / 375509
java.lang.OutOfMemoryError: Java heap space
        at java.util.IdentityHashMap.resize(IdentityHashMap.java:469)
        at java.util.IdentityHashMap.put(IdentityHashMap.java:445)
        at org.apache.spark.util.SizeEstimator$SearchState.enqueue(SizeEstimator.scala:159)
        at org.apache.spark.util.SizeEstimator$$anonfun$visitSingleObject$1.apply(SizeEstimator.scala:203)
        at org.apache.spark.util.SizeEstimator$$anonfun$visitSingleObject$1.apply(SizeEstimator.scala:202)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:202)
        at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:186)
        at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:54)
        at org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
        at org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
        at org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:3
None of the above suggests that it exceeded the 15 GB of memory that I initially
allocated. So what am I missing here? What's eating my memory?
We tried executorJavaOpts to get a heap dump, but it doesn't seem to work:
-XX:-HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError='kill -3 %p'
-XX:HeapDumpPath=/opt/cores/spark
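Looking at it again, -XX:-HeapDumpOnOutOfMemoryError (with a minus after -XX:) actually disables the heap dump; it takes a plus to enable it. A sketch with the option enabled, set through the standard spark.executor.extraJavaOptions property (the dump path is copied from above and has to exist and be writable on every worker node):

from pyspark import SparkConf

conf = (SparkConf()
        .set("spark.executor.extraJavaOptions",
             "-XX:+HeapDumpOnOutOfMemoryError "     # '+' enables the dump on OOM
             "-XX:HeapDumpPath=/opt/cores/spark "   # must exist on each worker node
             "-XX:OnOutOfMemoryError='kill -3 %p'"))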
I don't see any core files being generated, nor can I find a heap dump anywhere
in the logs.
Also, how do I find the YARN container ID for a given Spark executor ID, so that I
can investigate the YARN NodeManager and ResourceManager logs for that particular container?
PS - The job does not do any caching of intermediate RDDs, as each RDD is used just
once by the subsequent step. We use Spark 1.5.2 on YARN in yarn-client mode.
Thanks