Re: Spark Cluster: RECEIVED SIGNAL 15: SIGTERM

2015-04-13 Thread Guillaume Pitel
That's why I think it's the OOM killer. There are several kinds of 
memory overuse / errors:


1 - The application tries to allocate more than the heap limit and the GC 
cannot free enough memory => OutOfMemoryError: Java heap space, thrown by 
the JVM.
2 - The JVM is configured with a max heap size larger than the available 
memory. At some point the application needs to allocate memory in the JVM, 
the JVM tries to extend its heap and obtain real memory (or maybe the 
OS is configured to overcommit virtual memory), but fails => "Kill 
process or sacrifice child" (or other errors, depending on various 
factors: https://plumbr.eu/outofmemoryerror)
3 - The JVM allocated its memory at startup and the allocation was 
served, but other processes start starving for memory; the pressure on 
memory grows beyond the threshold configured in the OOM killer, and 
boom, the Java process is selected for sacrifice because it is the main 
memory consumer.
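
One way to check cases 2 and 3 is to look for the OOM killer's report in the kernel log (for example the output of dmesg). The Python sketch below only parses log text that is passed to it; the function name find_oom_kills and the sample lines are illustrative assumptions, not output from the machine discussed in this thread, and the exact kernel message wording can vary between kernel versions.

```python
import re

# Matches the kernel's "Killed process <pid> (<name>)" report line.
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log: str):
    """Return (pid, process name) pairs for each kill reported in the log text."""
    return [(int(m.group(1)), m.group(2)) for m in KILLED_RE.finditer(kernel_log)]


# Illustrative sample only (hypothetical values, not from the poster's worker).
SAMPLE = """\
[12345.678901] Out of memory: Kill process 4242 (java) score 917 or sacrifice child
[12345.678950] Killed process 4242 (java) total-vm:8123456kB, anon-rss:6543210kB, file-rss:0kB
"""

if __name__ == "__main__":
    for pid, name in find_oom_kills(SAMPLE):
        print(f"OOM killer terminated pid {pid} ({name})")
```

If nothing of this kind shows up in the kernel log around the time of the SIGTERM, the OOM killer becomes a less likely explanation.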


Guillaume
Linux OOM throws SIGTERM, but if I remember correctly JVM handles heap 
memory limits differently and throws OutOfMemoryError and eventually 
sends SIGINT.


Not sure what happened but the worker simply received a SIGTERM 
signal, so perhaps the daemon was terminated by someone or a parent 
process. Just my guess.


Tim

On Mon, Apr 13, 2015 at 2:28 AM, Guillaume Pitel 
guillaume.pi...@exensa.com wrote:


Very likely to be this :

http://www.linuxdevcenter.com/pub/a/linux/2006/11/30/linux-out-of-memory.html?page=2

Your worker ran out of memory => maybe you're asking for too much
memory for the JVM, or something else is running on the worker.

Guillaume

Any idea what this means? Many thanks.

==
logs/spark-.-org.apache.spark.deploy.worker.Worker-1-09.out.1
==
15/04/13 07:07:22 INFO Worker: Starting Spark worker 09:39910 with 4 cores, 6.6 GB RAM
15/04/13 07:07:22 INFO Worker: Running Spark version 1.3.0
15/04/13 07:07:22 INFO Worker: Spark home: /remote/users//work/tools/spark-1.3.0-bin-hadoop2.4
15/04/13 07:07:22 INFO Server: jetty-8.y.z-SNAPSHOT
15/04/13 07:07:22 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:8081
15/04/13 07:07:22 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
15/04/13 07:07:22 INFO WorkerWebUI: Started WorkerWebUI at http://09:8081
15/04/13 07:07:22 INFO Worker: Connecting to master akka.tcp://sparkMaster@nceuhamnr08:7077/user/Master...
15/04/13 07:07:22 INFO Worker: Successfully registered with master spark://08:7077
*15/04/13 08:35:07 ERROR Worker: RECEIVED SIGNAL 15: SIGTERM*




-- 
eXenSa



*Guillaume PITEL, Président*
+33(0)626 222 431

eXenSa S.A.S. http://www.exensa.com/
41, rue Périer - 92120 Montrouge - FRANCE
Tel +33(0)184 163 677 / Fax +33(0)972 283 705








Re: Spark Cluster: RECEIVED SIGNAL 15: SIGTERM

2015-04-13 Thread Guillaume Pitel

Very likely to be this :
http://www.linuxdevcenter.com/pub/a/linux/2006/11/30/linux-out-of-memory.html?page=2

Your worker ran out of memory => maybe you're asking for too much memory 
for the JVM, or something else is running on the worker.
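
To make the "asking for too much memory" check concrete, here is a minimal Python sketch that compares a requested JVM heap size with the total memory reported in /proc/meminfo-style text. The function names, the 1 GiB headroom and the sample MemTotal value are assumptions chosen for illustration; they are not taken from the worker in question.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Field: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def heap_fits(heap_gib: float, meminfo: dict, headroom_gib: float = 1.0) -> bool:
    """True if the heap plus some headroom for the OS fits in MemTotal (kB)."""
    total_gib = meminfo["MemTotal"] / (1024 * 1024)
    return heap_gib + headroom_gib <= total_gib


# Hypothetical sample; on a real host, read the text from /proc/meminfo.
SAMPLE_MEMINFO = """\
MemTotal:        8000000 kB
MemFree:          500000 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE_MEMINFO)
    print("6.6 GiB heap fits:", heap_fits(6.6, info))
    print("7.0 GiB heap fits:", heap_fits(7.0, info))
```

This only covers the first half of the hypothesis; memory taken by other processes on the same machine would have to be checked separately.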


Guillaume

Any idea what this means? Many thanks.

== 
logs/spark-.-org.apache.spark.deploy.worker.Worker-1-09.out.1 
==
15/04/13 07:07:22 INFO Worker: Starting Spark worker 09:39910 with 4 cores, 6.6 GB RAM
15/04/13 07:07:22 INFO Worker: Running Spark version 1.3.0
15/04/13 07:07:22 INFO Worker: Spark home: /remote/users//work/tools/spark-1.3.0-bin-hadoop2.4
15/04/13 07:07:22 INFO Server: jetty-8.y.z-SNAPSHOT
15/04/13 07:07:22 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:8081
15/04/13 07:07:22 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
15/04/13 07:07:22 INFO WorkerWebUI: Started WorkerWebUI at http://09:8081
15/04/13 07:07:22 INFO Worker: Connecting to master akka.tcp://sparkMaster@nceuhamnr08:7077/user/Master...
15/04/13 07:07:22 INFO Worker: Successfully registered with master spark://08:7077
*15/04/13 08:35:07 ERROR Worker: RECEIVED SIGNAL 15: SIGTERM*







Re: Spark Cluster: RECEIVED SIGNAL 15: SIGTERM

2015-04-13 Thread Tim Chen
Linux OOM throws SIGTERM, but if I remember correctly JVM handles heap
memory limits differently and throws OutOfMemoryError and eventually sends
SIGINT.

Not sure what happened but the worker simply received a SIGTERM signal, so
perhaps the daemon was terminated by someone or a parent process. Just my
guess.
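
As an illustration of why the log line matters: SIGTERM can be caught and logged by the receiving process, presumably through a handler of the kind the worker used to print "RECEIVED SIGNAL 15: SIGTERM", whereas SIGKILL cannot be caught at all. On recent Linux kernels the OOM killer is generally understood to deliver SIGKILL, which would leave no such log line. The Python sketch below shows the difference on a POSIX system; the function names are illustrative only.

```python
import os
import signal
import time

received = []


def on_signal(signum, frame):
    """Record the name of the signal that was delivered."""
    received.append(signal.Signals(signum).name)


def demo_sigterm() -> list:
    """Install a SIGTERM handler, send SIGTERM to this process, report what was seen."""
    received.clear()
    previous = signal.signal(signal.SIGTERM, on_signal)
    try:
        os.kill(os.getpid(), signal.SIGTERM)
        # Give the interpreter a moment to run the Python-level handler.
        deadline = time.monotonic() + 1.0
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        # Restore whatever handler was in place before the demo.
        signal.signal(signal.SIGTERM,
                      previous if previous is not None else signal.SIG_DFL)
    return list(received)


def can_catch_sigkill() -> bool:
    """Try to install a handler for SIGKILL; the OS is expected to refuse."""
    try:
        signal.signal(signal.SIGKILL, on_signal)
    except (OSError, ValueError):
        return False
    return True


if __name__ == "__main__":
    print("caught:", demo_sigterm())
    print("SIGKILL catchable:", can_catch_sigkill())
```

So a logged SIGTERM says that something asked the process to stop politely; it does not by itself identify the sender.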

Tim

On Mon, Apr 13, 2015 at 2:28 AM, Guillaume Pitel
guillaume.pi...@exensa.com wrote:

  Very likely to be this :

 http://www.linuxdevcenter.com/pub/a/linux/2006/11/30/linux-out-of-memory.html?page=2

 Your worker ran out of memory => maybe you're asking for too much memory
 for the JVM, or something else is running on the worker.

 Guillaume

  Any idea what this means? Many thanks.

  ==
 logs/spark-.-org.apache.spark.deploy.worker.Worker-1-09.out.1
 ==
 15/04/13 07:07:22 INFO Worker: Starting Spark worker 09:39910 with 4 cores, 6.6 GB RAM
 15/04/13 07:07:22 INFO Worker: Running Spark version 1.3.0
 15/04/13 07:07:22 INFO Worker: Spark home: /remote/users//work/tools/spark-1.3.0-bin-hadoop2.4
 15/04/13 07:07:22 INFO Server: jetty-8.y.z-SNAPSHOT
 15/04/13 07:07:22 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:8081
 15/04/13 07:07:22 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
 15/04/13 07:07:22 INFO WorkerWebUI: Started WorkerWebUI at http://09:8081
 15/04/13 07:07:22 INFO Worker: Connecting to master akka.tcp://sparkMaster@nceuhamnr08:7077/user/Master...
 15/04/13 07:07:22 INFO Worker: Successfully registered with master spark://08:7077
 *15/04/13 08:35:07 ERROR Worker: RECEIVED SIGNAL 15: SIGTERM*


