Hi, thanks. I usually see the following errors in the Spark logs, and I think
that is why the executors get lost. All of this happens because of a huge data
shuffle that I cannot avoid. I don't know what to do, please guide.

15/08/16 12:26:46 WARN spark.HeartbeatReceiver: Removing executor 10 with no recent heartbeats: 1051638 ms exceeds timeout 1000000 ms

Or

org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0
    at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$1.apply(MapOutputTracker.scala:384)
    at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$1.apply(MapOutputTracker.scala:381)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
    at org.apache.spark.MapOutputTracker$.org$apache$spark$MapOutputTracker$$convertMapStatuses(MapOutputTracker.scala:380)
    at org.apache.spark.MapOutputTracker.getServerStatuses(MapOutputTracker.scala:176)
    at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.fetch(BlockStoreShuffleFetcher.scala:42)
    at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:40)
    at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
    at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)



Or YARN kills the container because of:

Container [pid=26783,containerID=container_1389136889967_0009_01_000002]
is running beyond physical memory limits. Current usage: 30.2 GB of 30
GB physical memory used; Killing container.


On Mon, Oct 5, 2015 at 8:00 AM, Alex Rovner <alex.rov...@magnetic.com>
wrote:

> Can you at least copy paste the error(s) you are seeing when the job
> fails? Without the error message(s), it's hard to even suggest anything.
>
> *Alex Rovner*
> *Director, Data Engineering *
> *o:* 646.759.0052
>
> * <http://www.magnetic.com/>*
>
> On Sat, Oct 3, 2015 at 9:50 AM, Umesh Kacha <umesh.ka...@gmail.com> wrote:
>
>> Hi, thanks. I can't share the YARN logs because of privacy rules at my
>> company, but I can tell you that I have gone through them and found nothing
>> except YARN killing the container because it exceeds its physical memory limit.
>>
>> I am using the following command line. The job launches around 1500
>> ExecutorService threads from the driver through a thread pool of 15, so 15
>> jobs run at a time, as shown in the UI (roughly the pattern sketched after
>> the command below).
>>
>> ./spark-submit --class com.xyz.abc.MySparkJob \
>>   --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=512M" \
>>   --driver-java-options -XX:MaxPermSize=512m \
>>   --driver-memory 4g --master yarn-client \
>>   --executor-memory 27G --executor-cores 2 \
>>   --num-executors 40 \
>>   --jars /path/to/others-jars \
>>   /path/to/spark-job.jar
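>>
>> For illustration, the driver-side pattern is roughly the following (a minimal
>> sketch only; the queries and table names are made up, just the 15-thread pool
>> and the ~1500 submitted jobs match what I described above):
>>
>> import java.util.concurrent.Executors
>> import scala.concurrent.duration.Duration
>> import scala.concurrent.{Await, ExecutionContext, Future}
>> import org.apache.spark.{SparkConf, SparkContext}
>> import org.apache.spark.sql.hive.HiveContext
>>
>> object MySparkJob {
>>   def main(args: Array[String]): Unit = {
>>     val sc = new SparkContext(new SparkConf().setAppName("MySparkJob"))
>>     val hiveContext = new HiveContext(sc)
>>
>>     // fixed pool of 15 driver threads => at most 15 Spark jobs run concurrently
>>     val pool = Executors.newFixedThreadPool(15)
>>     implicit val ec = ExecutionContext.fromExecutorService(pool)
>>
>>     // ~1500 units of work; the real group-by queries are not shown here
>>     val queries: Seq[String] =
>>       (1 to 1500).map(i => s"SELECT key, count(*) FROM some_table_$i GROUP BY key")
>>
>>     // each Future runs one query, i.e. triggers one Spark job on the shared context
>>     val jobs = queries.map(q => Future { hiveContext.sql(q).count() })
>>     jobs.foreach(j => Await.ready(j, Duration.Inf))  // wait for all jobs to finish
>>
>>     pool.shutdown()
>>     sc.stop()
>>   }
>> }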
>>
>>
>> On Sat, Oct 3, 2015 at 7:11 PM, Alex Rovner <alex.rov...@magnetic.com>
>> wrote:
>>
>>> Can you send over your yarn logs along with the command you are using to
>>> submit your job?
>>>
>>> *Alex Rovner*
>>> *Director, Data Engineering *
>>> *o:* 646.759.0052
>>>
>>> * <http://www.magnetic.com/>*
>>>
>>> On Sat, Oct 3, 2015 at 9:07 AM, Umesh Kacha <umesh.ka...@gmail.com>
>>> wrote:
>>>
>>>> Hi Alex, thanks very much for the reply. Please read the following for
>>>> more details about my problem.
>>>>
>>>>
>>>> http://stackoverflow.com/questions/32317285/spark-executor-oom-issue-on-yarn
>>>>
>>>> Each of my containers has 8 cores and 30 GB of memory at most, so in
>>>> yarn-client mode I use 40 executors with 27 GB / 2 cores each. If I use
>>>> more cores, the job starts losing even more executors. I tried setting
>>>> spark.yarn.executor.memoryOverhead to around 2 GB, and even 8 GB, but it
>>>> does not help; I lose executors no matter what. The reason is that my jobs
>>>> shuffle a lot of data, as much as 20 GB per job, as I have seen in the UI.
>>>> The shuffle happens because of the group by, and I cannot avoid it in my case.
>>>>
>>>>
>>>>
>>>> On Sat, Oct 3, 2015 at 6:27 PM, Alex Rovner <alex.rov...@magnetic.com>
>>>> wrote:
>>>>
>>>>> This sounds like you need to increase YARN overhead settings with the 
>>>>> "spark.yarn.executor.memoryOverhead"
>>>>> parameter. See
>>>>> http://spark.apache.org/docs/latest/running-on-yarn.html for more
>>>>> information on the setting.
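>>>>>
>>>>> On the driver side it could look roughly like this (a minimal sketch; the
>>>>> memory numbers are just placeholders you would tune to your containers):
>>>>>
>>>>> import org.apache.spark.{SparkConf, SparkContext}
>>>>>
>>>>> // The executor container request is spark.executor.memory plus the
>>>>> // overhead, so together they must stay under the YARN container limit.
>>>>> val conf = new SparkConf()
>>>>>   .setAppName("MySparkJob")
>>>>>   .set("spark.executor.memory", "24g")                // executor heap
>>>>>   .set("spark.yarn.executor.memoryOverhead", "3072")  // MB of off-heap headroom
>>>>> val sc = new SparkContext(conf)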
>>>>>
>>>>> If that does not work for you, please provide the error messages and
>>>>> the command line you are using to submit your jobs for further
>>>>> troubleshooting.
>>>>>
>>>>>
>>>>> *Alex Rovner*
>>>>> *Director, Data Engineering *
>>>>> *o:* 646.759.0052
>>>>>
>>>>> * <http://www.magnetic.com/>*
>>>>>
>>>>> On Sat, Oct 3, 2015 at 6:19 AM, unk1102 <umesh.ka...@gmail.com> wrote:
>>>>>
>>>>>> Hi, I have a couple of Spark jobs that use a group by query fired from
>>>>>> hiveContext.sql(). I know group by is evil, but in my use case I cannot
>>>>>> avoid it; I have around 7-8 fields on which I need to group. I am also
>>>>>> using df1.except(df2), which also seems to be a heavy operation and does
>>>>>> a lot of shuffling; please see my UI snapshot:
>>>>>> <
>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/file/n24914/IMG_20151003_151830218.jpg
>>>>>> >
>>>>>>
>>>>>> I have tried almost every optimisation, including Spark 1.5, but nothing
>>>>>> seems to work: the job fails or hangs because an executor reaches the
>>>>>> physical memory limit and YARN kills it. I have around 1 TB of data to
>>>>>> process and it is skewed. Please guide.
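>>>>>>
>>>>>> Roughly, the shape of the job is the following (a minimal sketch; the
>>>>>> table and column names are invented, only the multi-field group by via
>>>>>> hiveContext.sql and the except() correspond to my real code):
>>>>>>
>>>>>> import org.apache.spark.sql.DataFrame
>>>>>> import org.apache.spark.sql.hive.HiveContext
>>>>>>
>>>>>> def buildResult(hiveContext: HiveContext): DataFrame = {
>>>>>>   // group by on 7-8 fields => a wide shuffle; skewed keys make some partitions huge
>>>>>>   val df1 = hiveContext.sql(
>>>>>>     """SELECT f1, f2, f3, f4, f5, f6, f7, f8, count(*) AS cnt
>>>>>>       |FROM source_table
>>>>>>       |GROUP BY f1, f2, f3, f4, f5, f6, f7, f8""".stripMargin)
>>>>>>
>>>>>>   // except() compares against a second DataFrame (assumed to have the
>>>>>>   // same columns) and triggers another full shuffle
>>>>>>   val df2 = hiveContext.table("already_processed")
>>>>>>   df1.except(df2)
>>>>>> }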
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-optimize-group-by-query-fired-using-hiveContext-sql-tp24914.html
>>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>>> Nabble.com.
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
