Hi Nirav, I recently attended Spark Summit East 2016, and almost every talk on errors the community runs into and on tuning Spark mentioned this as the main problem (executor lost and JVM out of memory).
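Since the failing job hinges on a skewed groupBy, the usual mitigation is map-side combining when the aggregation is associative. A minimal pure-Python sketch of the difference (the data and the `reduce_by_key` helper are illustrative, not from the thread):

```python
from itertools import groupby

# Illustrative (key, value) pairs, standing in for records spread
# across partitions; not data from the thread.
pairs = [("a", 1), ("b", 2), ("a", 3), ("a", 4), ("b", 5)]

# groupByKey-style: every value for a key is shuffled and materialized
# in memory at once -- this is what hurts on a heavily skewed key.
grouped = {k: [v for _, v in g]
           for k, g in groupby(sorted(pairs), key=lambda kv: kv[0])}

# reduceByKey-style: values are folded pairwise with an associative
# function, so partitions can pre-aggregate before the shuffle and
# only one running value per key is ever held.
def reduce_by_key(kv_pairs, f):
    out = {}
    for k, v in kv_pairs:
        out[k] = f(out[k], v) if k in out else v
    return out

summed = reduce_by_key(pairs, lambda x, y: x + y)
print(grouped)  # {'a': [1, 3, 4], 'b': [2, 5]}
print(summed)   # {'a': 8, 'b': 7}
```

When the combining function is not associative (as Nirav says his is not), reduceByKey is indeed off the table, but aggregateByKey/combineByKey can sometimes still pre-aggregate if a partial-result representation exists.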
Check out these blogs that explain how to tune Spark: <http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/>, and this cheat sheet for tuning Spark: <http://techsuppdiva.github.io/spark1.6.html>. Hope this helps; keep the community posted on what resolved your issue if it does.

Thanks.
Kuchekar, Nilesh

On Sat, Feb 20, 2016 at 11:29 AM, Nirav Patel <npa...@xactlycorp.com> wrote:

> Thanks Nilesh. I don't think there's heavy communication between driver
> and executor. However, I'll try the settings you suggested.
>
> I cannot replace groupBy with reduceBy as it is not an associative
> operation.
>
> It is very frustrating, to be honest. It was a piece of cake with map
> reduce compared to the amount of time I am putting into tuning Spark to
> make things work. To remove the doubt that an executor might be running
> multiple tasks (executor.cores) and hence having to share memory, I set
> executor.cores to 1 so a single task has all 15 GB at its disposal, which
> is already 3 times what it needs for the most skewed key. I am going to
> need to profile for sure to understand what the Spark executors are doing
> there. For sure they are not willing to explain the situation, but rather
> will say 'use reduceBy'.
>
> On Thu, Feb 11, 2016 at 9:42 AM, Kuchekar <kuchekar.nil...@gmail.com>
> wrote:
>
>> Hi Nirav,
>>
>> I faced a similar issue with Yarn, EMR 1.5.2, and the
>> following Spark conf helped me.
You can set the values accordingly:
>>
>> conf = (SparkConf().set("spark.master", "yarn-client")
>>         .setAppName("HalfWay")
>>         .set("spark.driver.memory", "15G")
>>         .set("spark.yarn.am.memory", "15G"))
>>
>> conf = conf.set("spark.driver.maxResultSize", "10G") \
>>            .set("spark.storage.memoryFraction", "0.6") \
>>            .set("spark.shuffle.memoryFraction", "0.6") \
>>            .set("spark.yarn.executor.memoryOverhead", "4000")
>>
>> conf = conf.set("spark.executor.cores", "4") \
>>            .set("spark.executor.memory", "15G") \
>>            .set("spark.executor.instances", "6")
>>
>> Is it also possible to use reduceBy in place of groupBy? That might help
>> with the shuffling too.
>>
>> Kuchekar, Nilesh
>>
>> On Wed, Feb 10, 2016 at 8:09 PM, Nirav Patel <npa...@xactlycorp.com>
>> wrote:
>>
>>> We have been trying to solve a memory issue with a Spark job that
>>> processes 150 GB of data (on disk). It does a groupBy operation; some of
>>> the executors will receive somewhere around 2-4M Scala case objects to
>>> work with. We are using the following Spark config:
>>>
>>> "executorInstances": "15",
>>> "executorCores": "1", (we reduced it to one so a single task gets all
>>> the executorMemory! At least that's the assumption here)
>>> "executorMemory": "15000m",
>>> "minPartitions": "2000",
>>> "taskCpus": "1",
>>> "executorMemoryOverhead": "1300",
>>> "shuffleManager": "tungsten-sort",
>>> "storageFraction": "0.4"
>>>
>>> This is a snippet of what we see in the Spark UI for a job that fails.
>>>
>>> This is a *stage* of this job that fails.
>>>
>>> Stage Id: 5 (retry 15), Pool Name: prod
>>> <http://hdn7:18080/history/application_1454975800192_0447/stages/pool?poolname=prod>
>>> Description: map at SparkDataJobs.scala:210
>>> <http://hdn7:18080/history/application_1454975800192_0447/stages/stage?id=5&attempt=15>
>>> Submitted: 2016/02/09 21:30:06, Duration: 13 min
>>> Tasks: Succeeded/Total: 130/389 (16 failed)
>>> Shuffle Read: 1982.6 MB, Shuffle Write: 818.7 MB
>>> Failure Reason: org.apache.spark.shuffle.FetchFailedException: Error
>>> in opening
>>> FileSegmentManagedBuffer{file=/tmp/hadoop/nm-local-dir/usercache/fasd/appcache/application_1454975800192_0447/blockmgr-abb77b52-9761-457a-b67d-42a15b975d76/0c/shuffle_0_39_0.data,
>>> offset=11421300, length=2353}
>>>
>>> This is one of the *task* attempts from the above stage that threw the OOM:
>>>
>>> Index: 2, Task Id: 22361, Attempt: 0, Status: FAILED, Locality: PROCESS_LOCAL
>>> Executor: 38 / nd1.mycom.local, Launch Time: 2016/02/09 22:10:42
>>> Duration: 5.2 min, GC Time: 1.6 min, Shuffle Read: 7.4 MB / 375509
>>> Errors: java.lang.OutOfMemoryError: Java heap space
>>>
>>> java.lang.OutOfMemoryError: Java heap space
>>> at java.util.IdentityHashMap.resize(IdentityHashMap.java:469)
>>> at java.util.IdentityHashMap.put(IdentityHashMap.java:445)
>>> at org.apache.spark.util.SizeEstimator$SearchState.enqueue(SizeEstimator.scala:159)
>>> at org.apache.spark.util.SizeEstimator$$anonfun$visitSingleObject$1.apply(SizeEstimator.scala:203)
>>> at org.apache.spark.util.SizeEstimator$$anonfun$visitSingleObject$1.apply(SizeEstimator.scala:202)
>>> at scala.collection.immutable.List.foreach(List.scala:318)
>>> at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:202)
>>> at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:186)
>>> at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:54)
>>> at org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
>>> at
org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
>>> at org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:3
>>>
>>> None of the above suggests that it went beyond the 15 GB of memory that I
>>> initially allocated. So what am I missing here? What's eating my memory?
>>>
>>> We tried executorJavaOpts to get a heap dump, but it doesn't seem to work:
>>>
>>> -XX:-HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError='kill -3 %p'
>>> -XX:HeapDumpPath=/opt/cores/spark
>>>
>>> I don't see any cores being generated, nor can I find a heap dump
>>> anywhere in the logs.
>>>
>>> Also, how do I find the YARN container ID from the Spark executor ID, so
>>> that I can investigate the YARN NodeManager and ResourceManager logs for
>>> a particular container?
>>>
>>> PS - The job does not do any caching of intermediate RDDs, as each RDD is
>>> just used once for the subsequent step. We use Spark 1.5.2 over YARN in
>>> yarn-client mode.
>>>
>>> Thanks
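One thing worth checking in the numbers above: with spark.executor.memory at 15000m, the YARN container must also fit the off-heap overhead on top of the heap. A back-of-the-envelope sketch (the max(384 MB, 10%) default is how Spark 1.x computes the overhead when it is unset; treat the exact formula as an assumption):

```python
# Back-of-the-envelope YARN container sizing for the executor settings
# quoted in the thread.
executor_memory_mb = 15000     # spark.executor.memory = "15000m"
explicit_overhead_mb = 1300    # spark.yarn.executor.memoryOverhead = "1300"

# Spark 1.x default when the overhead is not set explicitly
# (assumption: max of 384 MB and 10% of executor memory):
default_overhead_mb = max(384, executor_memory_mb // 10)

container_mb = executor_memory_mb + explicit_overhead_mb
print(container_mb)         # memory YARN must grant per executor
print(default_overhead_mb)  # what the default overhead would have been
```

Note that the explicit 1300 MB overhead is below what the default would give for a 15 GB heap (about 1500 MB); if off-heap use during the shuffle exceeds it, YARN kills the container, which surfaces in the driver as "executor lost" rather than as an executor-side OOM, and the heap dump never fires.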
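On the heap-dump flags quoted above: `-XX:-HeapDumpOnOutOfMemoryError` (minus) actually *disables* the dump; the flag needs a plus to enable it. A sketch of corrected options (the extraJavaOptions wiring is the standard Spark setting; the dump path is the one from the thread and must exist on every node and be writable by the YARN container user):

```python
# Corrected JVM options: '+' enables HeapDumpOnOutOfMemoryError
# ('-' in the original disables it).
opts = " ".join([
    "-XX:+HeapDumpOnOutOfMemoryError",
    "-XX:HeapDumpPath=/opt/cores/spark",
    "-XX:OnOutOfMemoryError='kill -3 %p'",
])
print(opts)
# Passed to executors via the standard Spark setting, e.g.:
#   conf = conf.set("spark.executor.extraJavaOptions", opts)
```

On mapping executors to containers: the executor log URLs on the Spark UI executors page typically point into the container's log directory, whose path includes the container ID, and `yarn logs -applicationId application_1454975800192_0447` (the application ID from the thread) dumps the logs of every container for the app, which can then be searched for the executor's host.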