Please check the NodeManager logs to see why the container was killed.

On Mon, Aug 3, 2015 at 11:59 PM, Umesh Kacha <umesh.ka...@gmail.com> wrote:
> Hi all, any help will be much appreciated. My Spark job runs fine, but in
> the middle it starts losing executors because of a
> MetadataFetchFailedException saying the shuffle was not found at the
> location, since the executor was lost.
>
> On Jul 31, 2015 11:41 PM, "Umesh Kacha" <umesh.ka...@gmail.com> wrote:
>
>> Hi, thanks for the response. It looks like the YARN container is getting
>> killed, but I don't know why; I see a shuffle MetadataFetchFailedException
>> as mentioned in the SO link below. I have enough memory: 8 nodes with 8
>> cores and 30 GB of memory each. Because of this exception YARN kills the
>> container running the executor; how can it overrun memory? I tried to give
>> each executor 25 GB and it is still not sufficient; it fails. Please guide
>> me, I don't understand what is going on. I am using Spark 1.4.0 with
>> spark.shuffle.memoryFraction set to 0.0 and spark.storage.memoryFraction
>> set to 0.5. I have almost all the recommended properties, like the Kryo
>> serializer, an Akka frame size of 500, and 20 Akka threads. I don't know
>> what else to try; I am trapped. It's been two days that I have been trying
>> to recover from this issue.
>>
>> http://stackoverflow.com/questions/29850784/what-are-the-likely-causes-of-org-apache-spark-shuffle-metadatafetchfailedexcept
>>
>> On Thu, Jul 30, 2015 at 9:56 PM, Ashwin Giridharan <ashwin.fo...@gmail.com> wrote:
>>
>>> What is your cluster configuration (size and resources)?
>>>
>>> If you do not have enough resources, then your executor will not run.
>>> Moreover, allocating 8 cores to an executor is too much.
>>> If you have a cluster with four nodes running NodeManagers, each
>>> equipped with 4 cores and 8 GB of memory, then an optimal configuration
>>> would be:
>>>
>>> --num-executors 8 --executor-cores 2 --executor-memory 2G
>>>
>>> Thanks,
>>> Ashwin
>>>
>>> On Thu, Jul 30, 2015 at 12:08 PM, unk1102 <umesh.ka...@gmail.com> wrote:
>>>
>>>> Hi, I have one Spark job which runs fine locally with less data, but
>>>> when I schedule it on YARN I keep getting the following ERROR, and
>>>> slowly all executors get removed from the UI and my job fails:
>>>>
>>>> 15/07/30 10:18:13 ERROR cluster.YarnScheduler: Lost executor 8 on
>>>> myhost1.com: remote Rpc client disassociated
>>>> 15/07/30 10:18:13 ERROR cluster.YarnScheduler: Lost executor 6 on
>>>> myhost2.com: remote Rpc client disassociated
>>>>
>>>> I use the following command to submit the Spark job in yarn-client mode:
>>>>
>>>> ./spark-submit --class com.xyz.MySpark --conf
>>>> "spark.executor.extraJavaOptions=-XX:MaxPermSize=512M"
>>>> --driver-java-options -XX:MaxPermSize=512m --driver-memory 3g
>>>> --master yarn-client --executor-memory 2G --executor-cores 8
>>>> --num-executors 12 /home/myuser/myspark-1.0.jar
>>>>
>>>> I don't know what the problem is; please guide me. I am new to Spark.
>>>> Thanks in advance.
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-control-Spark-Executors-from-getting-Lost-when-using-YARN-client-mode-tp24084.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> --
>>> Thanks & Regards,
>>> Ashwin Giridharan

--
Best Regards,
Jeff Zhang
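[Editor's note: Ashwin's sizing rule of thumb above can be sketched as a small helper. This is an illustration, not part of the thread; in particular, the choice to leave about half of each node's memory as headroom for the OS, the NodeManager, and YARN's per-executor memory overhead is an assumption made here so that the helper reproduces his example figures, and real clusters should tune it.]

```python
def size_executors(nodes, cores_per_node, mem_gb_per_node, cores_per_executor=2):
    """Rough YARN executor sizing: small executors (2 cores each), with
    roughly half of each node's memory left as headroom (assumed ratio)."""
    execs_per_node = cores_per_node // cores_per_executor
    num_executors = nodes * execs_per_node
    # Split half of each node's memory across its executors; the other
    # half is headroom for the OS, NodeManager, and YARN overhead.
    executor_mem_gb = max(int(mem_gb_per_node / execs_per_node * 0.5), 1)
    return num_executors, executor_mem_gb

# Ashwin's example cluster: 4 nodes, 4 cores and 8 GB each
print(size_executors(4, 4, 8))    # -> (8, 2), i.e. --num-executors 8
                                  #    --executor-cores 2 --executor-memory 2G
```

Applied to the 8-node, 8-core, 30 GB cluster described earlier in the thread, the same rule suggests 32 small executors of 2 cores and about 3 GB each, rather than a few very large 25 GB executors.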