Re: container fails to start with malloc error

2015-04-16 Thread Johannes Zillmann
Hi Hitesh,

will check the memory situation!
Java version is:
  java version 1.7.0_76
 Java(TM) SE Runtime Environment (build 1.7.0_76-tdc1-b13)
 Java HotSpot(TM) 64-Bit Server VM (build 24.76-b04, mixed mode)

I don’t think it’s necessarily the same Java version that was used to compile 
(Hadoop, Tez, Datameer?)
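
One way to compare the runtime JVM with what a jar was built for is to read the class-file header, which stores the target version (51 corresponds to Java 7). This is only a minimal sketch: the class name ClassVersionCheck and its helper are made up for illustration, and the header shows the compiler's target level, not the exact JDK build that produced the class.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hadoop or Tez.
public class ClassVersionCheck {

    /** Returns the class-file major version (50 = Java 6, 51 = Java 7, 52 = Java 8). */
    public static int majorVersion(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        buf.getShort();                 // minor version, ignored here
        return buf.getShort() & 0xFFFF; // major version as an unsigned value
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java ClassVersionCheck <path-to-.class-file>");
            return;
        }
        // Path to a .class file extracted from the jar in question.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("runtime java.version: " + System.getProperty("java.version"));
        System.out.println("class major version:  " + majorVersion(bytes));
    }
}
```

Running it against a class extracted from each jar (Hadoop, Tez, Datameer) on the affected node would show whether any of them targets a newer level than the 1.7.0_76 runtime.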

Johannes 



Re: container fails to start with malloc error

2015-04-15 Thread Hitesh Shah
Hi Johannes 

Not sure if anyone has seen this before. Do you know if the machines have 
enough memory to run the number of tasks/containers that you are launching? 
Also, I am assuming that you are compiling and running against the same JDK 
version?

Would you mind sharing details on which Java version you are running? 

— Hitesh

On Apr 14, 2015, at 1:19 AM, Johannes Zillmann jzillm...@googlemail.com wrote:

 Hey guys,
 
 In a customer environment, certain Tez jobs fail to start.
 
 On the client side it looks like:
 ——
 INFO [2015-04-08 15:19:30.213] [MrPlanRunnerV2] (YarnClientImpl.java:204) - 
 Submitted application application_1428177121154_0065
 INFO [2015-04-08 15:19:30.214] [MrPlanRunnerV2] (TezClient.java:357) - The 
 url to track the Tez Session: 
 http://master:8088/proxy/application_1428177121154_0065/
 INFO [2015-04-08 15:19:33.219] [MrPlanRunnerV2] (TezClient.java:556) - App 
 did not succeed. Diagnostics: Application application_1428177121154_0065 
 failed 2 times due to AM Container for appattempt_1428177121154_0065_02 
 exited with  exitCode: 134 due to: Exception from container-launch: 
 org.apache.hadoop.util.Shell$ExitCodeException: /bin/bash: line 1: 10818 
 Aborted (core dumped) /opt/teradata/jvm64/jdk7/bin/java 
 -Xmx819m -server -Djava.net.preferIPv4Stack=true 
 -Dhadoop.metrics.log.level=WARN -XX:+PrintGCDetails -verbose:gc 
 -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC 
 -Dapple.awt.UIElement=true -Djava.awt.headless=true 
 -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator 
 -Dlog4j.configuration=tez-container-log4j.properties 
 -Dyarn.app.container.log.dir=/data3/hadoop/yarn/log/application_1428177121154_0065/container_1428177121154_0065_02_01
  -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel='' 
 org.apache.tez.dag.app.DAGAppMaster --session 
 1>/data3/hadoop/yarn/log/application_1428177121154_0065/container_1428177121154_0065_02_01/stdout
 2>/data3/hadoop/yarn/log/application_1428177121154_0065/container_1428177121154_0065_02_01/stderr
 
 org.apache.hadoop.util.Shell$ExitCodeException: /bin/bash: line 1: 10818 
 Aborted (core dumped) /opt/teradata/jvm64/jdk7/bin/java 
 -Xmx819m -server -Djava.net.preferIPv4Stack=true 
 -Dhadoop.metrics.log.level=WARN -XX:+PrintGCDetails -verbose:gc 
 -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC 
 -Dapple.awt.UIElement=true -Djava.awt.headless=true 
 -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator 
 -Dlog4j.configuration=tez-container-log4j.properties 
 -Dyarn.app.container.log.dir=/data3/hadoop/yarn/log/application_1428177121154_0065/container_1428177121154_0065_02_01
  -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel='' 
 org.apache.tez.dag.app.DAGAppMaster --session 
 1>/data3/hadoop/yarn/log/application_1428177121154_0065/container_1428177121154_0065_02_01/stdout
 2>/data3/hadoop/yarn/log/application_1428177121154_0065/container_1428177121154_0065_02_01/stderr
 
   at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
   at org.apache.hadoop.util.Shell.run(Shell.java:418)
   at 
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 
 
 Container exited with a non-zero exit code 134
 .Failing this attempt.. Failing the application.
 ——
 
 
 The task log then shows:
 ——
 Log Type: stderr
 Log Length: 429
 java: malloc.c:3090: sYSMALLOc: Assertion `(old_top == (((mbinptr) (((char *) 
 &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, 
 fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned 
 long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 * 
 (sizeof(size_t))) - 1)) & ~((2 * (sizeof(size_t))) - 1))) && ((old_top)->size 
 & 0x1) && ((unsigned long)old_end & pagemask) == 0)' failed.
 
 Log Type: stdout
 Log Length: 0
 ——
 
 Any ideas?
 
 Johannes
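
For reference, the exit code 134 in the diagnostics follows the shell convention of 128 plus the signal number, so it corresponds to signal 6 (SIGABRT on Linux). That is consistent with the "Aborted (core dumped)" text and with a failed glibc assertion calling abort(). A small sketch of that arithmetic; the class and method names are made up for illustration and are not part of YARN or Tez:

```java
// Hypothetical helper for reading shell-style exit codes.
public class ExitCodeDecoder {

    /**
     * Returns the signal number if the exit code follows the shell's
     * "128 + signal" convention, or -1 if it does not.
     */
    public static int signalFromExitCode(int exitCode) {
        return (exitCode > 128 && exitCode < 256) ? exitCode - 128 : -1;
    }

    public static void main(String[] args) {
        int exitCode = 134; // value reported for the AM container
        int signal = signalFromExitCode(exitCode);
        System.out.println("exit code " + exitCode + " -> signal " + signal);
        // prints "exit code 134 -> signal 6"
    }
}
```

So the JVM process was killed by its own abort() inside native code rather than exiting through a Java exception, which is why the container stdout is empty and only the malloc assertion appears in stderr.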