Please check the NodeManager logs to see why the container was killed.
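
If log aggregation is enabled, something like the following may help pull the container logs and search them for the kill reason. This is only a sketch: find_kill_reason is a hypothetical helper, <application_id> is a placeholder for your job's ID, and the search patterns are the messages YARN commonly writes when a container exceeds its memory limits.

```shell
# Hypothetical helper (not part of YARN or Spark): scan a log file for the
# messages YARN typically writes when it kills a container over memory limits.
find_kill_reason() {
  # -i: ignore case, -n: show line numbers, -E: extended regex
  grep -inE 'running beyond (physical|virtual) memory limits|killing container' "$1"
}

# Usage, assuming log aggregation is enabled on the cluster:
#   yarn logs -applicationId <application_id> > app.log
#   find_kill_reason app.log
```

If nothing matches, the NodeManager's own log file on the host that ran the lost executor is the next place to look.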

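One common reason YARN kills an executor container is that the process grows past the container's allocation, which is the executor memory plus an off-heap overhead. Below is a rough sketch of that arithmetic for the executor sizes mentioned in the quoted messages. It assumes the overhead is max(384 MB, 10% of executor memory); please verify that against the documentation for your Spark version. container_mb is a hypothetical helper.

```shell
# Hypothetical helper: estimate the container size (in MB) requested for one executor.
# Assumption: overhead = max(384 MB, 10% of executor memory); check your Spark version.
container_mb() {
  executor_mb="$1"
  overhead_mb=$(( executor_mb / 10 ))
  if [ "$overhead_mb" -lt 384 ]; then
    overhead_mb=384
  fi
  echo $(( executor_mb + overhead_mb ))
}

container_mb 2048    # --executor-memory 2G  -> prints 2432
container_mb 25600   # --executor-memory 25G -> prints 28160
```

The result has to fit within what each NodeManager offers to YARN, and YARN may round the request up further, so a 25 GB executor on a 30 GB node leaves little headroom.
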
On Mon, Aug 3, 2015 at 11:59 PM, Umesh Kacha <umesh.ka...@gmail.com> wrote:

> Hi all, any help would be much appreciated. My Spark job runs fine at first,
> but partway through it starts losing executors because of a
> MetadataFetchFailedException saying the shuffle output was not found at the
> location, since the executor is lost.
> On Jul 31, 2015 11:41 PM, "Umesh Kacha" <umesh.ka...@gmail.com> wrote:
>
>> Hi, thanks for the response. It looks like the YARN container is getting
>> killed, but I don't know why. I see the shuffle MetadataFetchFailedException
>> mentioned in the following SO link. I have enough memory: 8 nodes, each with
>> 8 cores and 30 GB of memory. Because of this MetadataFetchFailedException,
>> YARN is killing the container running the executor. How can it overrun
>> memory? I tried giving each executor 25 GB, but that is still not sufficient
>> and the job fails. Please guide me; I don't understand what is going on. I
>> am using Spark 1.4.0, with spark.shuffle.memory set to 0.0 and
>> spark.storage.memory set to 0.5. I have set almost all the recommended
>> properties: the Kryo serializer, an Akka frame size of 500, and 20 Akka
>> threads. I am stuck; I have been trying to recover from this issue for two
>> days.
>>
>>
>> http://stackoverflow.com/questions/29850784/what-are-the-likely-causes-of-org-apache-spark-shuffle-metadatafetchfailedexcept
>>
>>
>>
>> On Thu, Jul 30, 2015 at 9:56 PM, Ashwin Giridharan <
>> ashwin.fo...@gmail.com> wrote:
>>
>>> What is your cluster configuration (size and resources)?
>>>
>>> If you do not have enough resources, then your executor will not run.
>>> Moreover, allocating 8 cores to an executor is too much.
>>>
>>> If you have a cluster with four nodes running NodeManagers, each
>>> equipped with 4 cores and 8 GB of memory,
>>> then an optimal configuration would be:
>>>
>>> --num-executors 8 --executor-cores 2 --executor-memory 2G
>>>
>>> Thanks,
>>> Ashwin
>>>
>>> On Thu, Jul 30, 2015 at 12:08 PM, unk1102 <umesh.ka...@gmail.com> wrote:
>>>
>>>> Hi, I have a Spark job which runs fine locally with less data, but when I
>>>> schedule it on YARN I keep getting the following ERROR, and slowly all
>>>> executors get removed from the UI and my job fails:
>>>>
>>>> 15/07/30 10:18:13 ERROR cluster.YarnScheduler: Lost executor 8 on
>>>> myhost1.com: remote Rpc client disassociated
>>>> 15/07/30 10:18:13 ERROR cluster.YarnScheduler: Lost executor 6 on
>>>> myhost2.com: remote Rpc client disassociated
>>>>
>>>> I use the following command to schedule the Spark job in yarn-client mode:
>>>>
>>>>  ./spark-submit --class com.xyz.MySpark \
>>>>    --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=512M" \
>>>>    --driver-java-options -XX:MaxPermSize=512m \
>>>>    --driver-memory 3g --master yarn-client \
>>>>    --executor-memory 2G --executor-cores 8 --num-executors 12 \
>>>>    /home/myuser/myspark-1.0.jar
>>>>
>>>> I don't know what the problem is; please guide me. I am new to Spark.
>>>> Thanks in advance.
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-control-Spark-Executors-from-getting-Lost-when-using-YARN-client-mode-tp24084.html
>>>
>>>
>>> --
>>> Thanks & Regards,
>>> Ashwin Giridharan
>>>
>>
>>


-- 
Best Regards

Jeff Zhang
