Re: RECEIVED SIGNAL 15: SIGTERM
Konstantinos,

Sure, if you have a resource leak then the collector can't free up memory and the process will use more memory. Time to break out the profiler and see where the memory is going. The usual suspects are handles to resources (open file streams, sockets, etc.) kept in containers (arrays, lists, etc.). If they're in a container, they can't be collected. Another one is keeping handlers in a container, where a handler may keep an internal handle to an open resource. If the handler (aka listener, aka observer) refers to an open resource and the handler is in a container, then the underlying resource can't be collected. Use a profiler to find out where the memory is going.

FWIW, hitting 1 million or 5 million inodes is likely to be a bottleneck (profile to check). If you find the file system to be a bottleneck here, consider bundling the files up into archives that you access together. HDFS, for example, was designed for larger files. Even if you're not using HDFS, millions of small files are kryptonite for parallel file systems (Panasas, Lustre, GPFS, etc.). Old Cloudera blog post, but it may be relevant here: http://blog.cloudera.com/blog/2009/02/the-small-files-problem/

-Ewan

On 13/07/15 10:19, Konstantinos Kougios wrote: [...]
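Ewan's point about handlers kept in a container pinning their resources can be sketched in a few lines of Java. This is an illustrative toy, not Spark code: the listener list and the byte-array streams are made up stand-ins for real observers and file/socket handles.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class ListenerLeak {
    // The container that silently pins resources: as long as a listener
    // is registered here, everything it references stays GC-reachable.
    static final List<Runnable> listeners = new ArrayList<>();

    public static void main(String[] args) {
        for (int i = 0; i < 10_000; i++) {
            // Stand-in for an open file stream or socket.
            InputStream in = new ByteArrayInputStream(new byte[1024]);
            // The lambda captures 'in', so registering the handler keeps
            // the underlying stream (and its buffer) alive.
            listeners.add(() -> System.out.println(in.hashCode()));
        }
        // Nothing was closed or deregistered, so none of the 10,000
        // buffers can be collected -- the leak Ewan describes.
        System.out.println(listeners.size()); // 10000
    }
}
```

A profiler would show this as a single `ArrayList` dominating the heap; the fix is deregistering handlers (and closing their resources) when they are done.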
On 13/07/15 09:15, Ewan Higgs wrote: [...]
On 13/07/15 10:11, Konstantinos Kougios wrote: [...]
On 13/07/15 06:29, Jong Wook Kim wrote: [...]
On Jul 7, 2015, at 19:16, Kostas Kougios kostas.koug...@googlemail.com wrote: [...]
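On the !ENTITY risk raised in this thread (the billion-laughs document), JVM-side XML parsers can be told to refuse DOCTYPE declarations outright, so no entity expansion ever happens. A minimal sketch with the JDK's bundled SAX parser; note the disallow-doctype-decl feature is Xerces-specific, so this assumes the default JDK parser implementation.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class NoDtd {
    public static void main(String[] args) throws Exception {
        SAXParserFactory f = SAXParserFactory.newInstance();
        // Refuse DOCTYPE declarations entirely, so a "billion laughs"
        // entity bomb is rejected before any expansion can occur.
        f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

        // A tiny entity-declaring document standing in for the real bomb.
        String bomb = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE lolz [<!ENTITY lol \"lol\">]>"
                + "<lolz>&lol;</lolz>";
        try {
            f.newSAXParser().parse(
                new ByteArrayInputStream(bomb.getBytes("UTF-8")),
                new DefaultHandler());
            System.out.println("parsed");
        } catch (SAXParseException e) {
            System.out.println("rejected"); // DOCTYPE refused, no expansion
        }
    }
}
```

If legitimate inputs need DTDs, a softer alternative is capping expansion with the JDK's entity expansion limit rather than banning DOCTYPEs.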
Re: RECEIVED SIGNAL 15: SIGTERM
Yes, YARN was terminating the executor because the off-heap memory limit was exceeded.

On 13/07/15 06:55, Ruslan Dautkhanov wrote: [...]
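For reference, the limit in question is set per job at submission time. A hedged example (the executor memory value and jar name are placeholders; the property name is the Spark 1.x one used in this thread, renamed to spark.executor.memoryOverhead in Spark 2.3+):

```shell
# memoryOverhead is specified in MB; 4096 mirrors the 4g that worked here.
# Tune it down per workload rather than paying 4g on every small-file job.
spark-submit \
  --master yarn-cluster \
  --conf spark.executor.memory=8g \
  --conf spark.yarn.executor.memoryOverhead=4096 \
  my-job.jar
```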
Re: RECEIVED SIGNAL 15: SIGTERM
It was the memoryOverhead. It runs OK with more of that, but do you know which libraries could affect this? I find it strange that it needs 4g for a task that processes some xml files. The tasks themselves require less Xmx.

Cheers

On 13/07/15 06:29, Jong Wook Kim wrote: [...]
Re: RECEIVED SIGNAL 15: SIGTERM
I do have other non-xml tasks and I was getting the same SIGTERM on all of them. I think the issue might be due to me processing small files via binaryFiles or wholeTextFiles. Initially I had issues with Xmx memory because I have more than 1 million files (and in one occasion it is 5 million files). I sorted that out by processing them in batches of 32k, but then this started happening. I've set the memoryOverhead to 4g for most of the tasks and it is OK now. But 4g is too much for tasks that process small files. I do have 32 threads per executor on some tasks, but 32 MB of thread stack overhead should do. Maybe the issue is sockets or some memory leak in the network communication.

On 13/07/15 09:15, Ewan Higgs wrote: It depends on how large the xml files are and how you're processing them. If you're using !ENTITY tags then you don't need a very large piece of xml to consume a lot of memory, e.g. the billion laughs xml: https://en.wikipedia.org/wiki/Billion_laughs -Ewan

On 13/07/15 10:11, Konstantinos Kougios wrote: [...]
On 13/07/15 06:29, Jong Wook Kim wrote: [...]
On Jul 7, 2015, at 19:16, Kostas Kougios kostas.koug...@googlemail.com wrote: [...]
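The 32k batching Konstantinos describes can be sketched generically: split a huge listing into fixed-size chunks and hand each chunk to a separate job (e.g. one wholeTextFiles call per batch). The batch size and the integer stand-ins for file paths are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class Batching {
    // Split a large list of inputs into fixed-size batches so each job
    // sees a bounded file count (bounding driver-side path metadata).
    static <T> List<List<T>> batches(List<T> items, int size) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            // subList is a view; copy it if batches outlive the source list.
            out.add(items.subList(i, Math.min(i + size, items.size())));
        }
        return out;
    }

    public static void main(String[] args) {
        // 100,000 stand-ins for file paths, batched 32,000 at a time.
        List<Integer> files = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) files.add(i);
        List<List<Integer>> b = batches(files, 32_000);
        System.out.println(b.size() + " " + b.get(3).size()); // 4 batches, last holds 4000
    }
}
```

Note that batching bounds per-job memory but does not reduce the total number of inodes touched, which is why bundling small files into archives can still pay off.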
Re: RECEIVED SIGNAL 15: SIGTERM
Based on my experience, YARN containers can get SIGTERM when:

- the container produces too many logs and uses up the hard drive
- it uses more off-heap memory than is given by the spark.yarn.executor.memoryOverhead configuration. It might be due to too many classes loaded (less than MaxPermGen but more than memoryOverhead), or some other off-heap memory allocated by a networking library, etc.
- it opens too many file descriptors, which you can check in /proc/<executor JVM's pid>/fd/ on the executor node

Does any of these apply to your situation?

Jong Wook

On Jul 7, 2015, at 19:16, Kostas Kougios kostas.koug...@googlemail.com wrote:

I am still receiving these weird sigterms on the executors. The driver claims it lost the executor; the executor receives a SIGTERM (from whom???). It doesn't seem a memory-related issue, though increasing memory takes the job a bit further or completes it. But why? There is no memory pressure on either driver or executor, and nothing in the logs indicating so.

driver:

15/07/07 10:47:04 INFO scheduler.TaskSetManager: Starting task 14762.0 in stage 0.0 (TID 14762, cruncher03.stratified, PROCESS_LOCAL, 13069 bytes)
15/07/07 10:47:04 INFO scheduler.TaskSetManager: Finished task 14517.0 in stage 0.0 (TID 14517) in 15950 ms on cruncher03.stratified (14507/42240)
15/07/07 10:47:04 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. cruncher05.stratified:32976
15/07/07 10:47:04 ERROR cluster.YarnClusterScheduler: Lost executor 1 on cruncher05.stratified: remote Rpc client disassociated
15/07/07 10:47:04 INFO scheduler.TaskSetManager: Re-queueing tasks for 1 from TaskSet 0.0
15/07/07 10:47:04 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. cruncher05.stratified:32976
15/07/07 10:47:04 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@cruncher05.stratified:32976] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/07/07 10:47:04 WARN scheduler.TaskSetManager: Lost task 14591.0 in stage 0.0 (TID 14591, cruncher05.stratified): ExecutorLostFailure (executor 1 lost)

gc log for driver (it doesn't look like it ran out of memory):

2015-07-07T10:45:19.887+0100: [GC (Allocation Failure) 1764131K->1391211K(3393024K), 0.0102839 secs]
2015-07-07T10:46:00.934+0100: [GC (Allocation Failure) 1764971K->1391867K(3405312K), 0.0099062 secs]
2015-07-07T10:46:45.252+0100: [GC (Allocation Failure) 1782011K->1392596K(3401216K), 0.0167572 secs]

executor:

15/07/07 10:47:03 INFO executor.Executor: Running task 14750.0 in stage 0.0 (TID 14750)
15/07/07 10:47:03 INFO spark.CacheManager: Partition rdd_493_14750 not found, computing it
15/07/07 10:47:03 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
15/07/07 10:47:03 INFO storage.DiskBlockManager: Shutdown hook called

executor gc log (no out-of-memory, as it seems):

2015-07-07T10:47:02.332+0100: [GC (GCLocker Initiated GC) 24696750K->23712939K(33523712K), 0.0416640 secs]
2015-07-07T10:47:02.598+0100: [GC (GCLocker Initiated GC) 24700520K->23722043K(33523712K), 0.0391156 secs]
2015-07-07T10:47:02.862+0100: [GC (Allocation Failure) 24709182K->23726510K(33518592K), 0.0390784 secs]

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/RECEIVED-SIGNAL-15-SIGTERM-tp23668.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
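Jong Wook's file-descriptor check can also be done from inside a JVM. A small sketch, Linux-only since it reads /proc; pass "self" for the current process, or substitute a running executor's pid (e.g. found via jps) to inspect it:

```java
import java.io.File;

public class FdCount {
    // Count open file descriptors for a process by listing /proc/<pid>/fd.
    // Returns -1 when /proc is unavailable (e.g. on macOS) or pid is gone.
    static int openFds(String pid) {
        String[] entries = new File("/proc/" + pid + "/fd").list();
        return entries == null ? -1 : entries.length;
    }

    public static void main(String[] args) {
        int n = openFds("self");
        // On Linux a JVM always holds at least stdin/stdout/stderr (3 fds);
        // -1 means /proc isn't available on this platform.
        System.out.println(n >= 3 || n == -1 ? "ok" : "unexpected");
    }
}
```

Comparing this count against the container's limit (ulimit -n) shows how close an executor is to exhausting descriptors.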
Re: RECEIVED SIGNAL 15: SIGTERM
> the executor receives a SIGTERM (from whom???)

From the YARN Resource Manager. Check whether YARN fair scheduler preemption and/or speculative execution are turned on; if so, this is quite possible and not a bug.

--
Ruslan Dautkhanov

On Sun, Jul 12, 2015 at 11:29 PM, Jong Wook Kim jongw...@nyu.edu wrote: [...]