Did you try something like
-Dmapred.child.java.opts="-Xss64k"?
(see GIRAPH-12)
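
For a rough sense of why a smaller thread stack might help: the log below shows the worker starting 100 IPC handler threads before failing with "unable to create new native thread", and -Xss sets the stack size each JVM thread reserves. The sketch below is only back-of-the-envelope arithmetic. The 1 MB default stack size is an assumption (it varies by JVM and platform), and the class and method names are made up for illustration.

```java
public class ThreadStackEstimate {

    /** Stack space reserved by `threads` threads at `stackKb` KB per thread, in bytes. */
    static long reservedStackBytes(int threads, long stackKb) {
        return (long) threads * stackKb * 1024L;
    }

    public static void main(String[] args) {
        int handlers = 100;          // from the log: "with 100 handlers"
        long assumedDefaultKb = 1024; // assumption: 1 MB default stack per thread
        long reducedKb = 64;          // the -Xss64k value suggested above

        System.out.println("default stacks: "
                + reservedStackBytes(handlers, assumedDefaultKb) + " bytes");
        System.out.println("reduced stacks: "
                + reservedStackBytes(handlers, reducedKb) + " bytes");
    }
}
```

This counts only the handler threads visible in the log; any other threads a worker creates would add to the total. Some JVMs also enforce a minimum stack size larger than 64k, and the same error can come from an OS limit on the number of threads rather than from memory, so treat this as one thing to try rather than a diagnosis.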

Christian

On Oct 10, 2011, at 11:08 AM, Zhiwei Gu wrote:

> Hi all,
>   In my Giraph job, setting the number of workers to 200 works, but setting 
> it to 500 fails with an early-stage OOM exception in one or more workers. 
> Once such a worker fails, the other workers that want to talk to it keep 
> waiting until they have retried 5 times, and then they fail as well.
> 
> Have you ever faced such an issue?
> 
> Best,
> -z
> 
> 
> Here is the exception:
> 2011-10-08 09:26:59,108 INFO org.apache.giraph.comm.RPCCommunications: 
> getRPCServer: Added jobToken Ident: 17 6a 6f 62 5f 32 30 31 31 30 38 32 36 30 
> 39 31 31 5f 36 36 37 30 39 30, Pass: 12 26 1a f1 d2 51 e1 bf 2d 36 63 11 26 
> 18 17 3d 53 b3 15 f6, Kind: mapreduce.job, Service: job_201108260911_667090
> 2011-10-08 09:26:59,116 INFO org.apache.hadoop.ipc.Server: Starting 
> SocketReader
> 2011-10-08 09:26:59,116 INFO org.apache.hadoop.ipc.Server: Starting 
> SocketReader
> 2011-10-08 09:26:59,117 INFO org.apache.hadoop.ipc.Server: Starting 
> SocketReader
> 2011-10-08 09:26:59,117 INFO org.apache.hadoop.ipc.Server: Starting 
> SocketReader
> 2011-10-08 09:26:59,117 INFO org.apache.hadoop.ipc.Server: Starting 
> SocketReader
> 2011-10-08 09:26:59,120 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source 
> RpcDetailedActivityForPort31250 registered.
> 2011-10-08 09:26:59,121 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source 
> RpcActivityForPort31250 registered.
> 2011-10-08 09:26:59,123 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> 2011-10-08 09:26:59,123 INFO org.apache.hadoop.ipc.Server: IPC Server 
> listener on 31250: starting
> 2011-10-08 09:26:59,127 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 0 on 31250: starting
> 2011-10-08 09:26:59,127 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 1 on 31250: starting
> 2011-10-08 09:26:59,133 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 2 on 31250: starting
> 2011-10-08 09:26:59,133 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 3 on 31250: starting
> 2011-10-08 09:26:59,137 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 4 on 31250: starting
> 2011-10-08 09:26:59,144 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 5 on 31250: starting
> 2011-10-08 09:26:59,144 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 6 on 31250: starting
> 2011-10-08 09:26:59,144 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 7 on 31250: starting
> 2011-10-08 09:26:59,144 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 8 on 31250: starting
> 2011-10-08 09:26:59,144 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 9 on 31250: starting
> 2011-10-08 09:26:59,145 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 10 on 31250: starting
> 2011-10-08 09:26:59,145 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 11 on 31250: starting
> 2011-10-08 09:26:59,145 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 12 on 31250: starting
> 2011-10-08 09:26:59,145 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 13 on 31250: starting
> 2011-10-08 09:26:59,145 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 14 on 31250: starting
> 2011-10-08 09:26:59,145 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 15 on 31250: starting
> 2011-10-08 09:26:59,146 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 16 on 31250: starting
> 2011-10-08 09:26:59,146 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 17 on 31250: starting
> 2011-10-08 09:26:59,146 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 18 on 31250: starting
> 2011-10-08 09:26:59,146 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 19 on 31250: starting
> 2011-10-08 09:26:59,146 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 20 on 31250: starting
> 2011-10-08 09:26:59,146 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 21 on 31250: starting
> 2011-10-08 09:26:59,147 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 22 on 31250: starting
> 2011-10-08 09:26:59,147 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 23 on 31250: starting
> 2011-10-08 09:26:59,147 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 24 on 31250: starting
> 2011-10-08 09:26:59,147 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 25 on 31250: starting
> 2011-10-08 09:26:59,147 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 26 on 31250: starting
> 2011-10-08 09:26:59,147 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 27 on 31250: starting
> 2011-10-08 09:26:59,148 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 28 on 31250: starting
> 2011-10-08 09:26:59,148 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 29 on 31250: starting
> 2011-10-08 09:26:59,148 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 30 on 31250: starting
> 2011-10-08 09:26:59,148 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 31 on 31250: starting
> 2011-10-08 09:26:59,148 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 32 on 31250: starting
> 2011-10-08 09:26:59,148 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 33 on 31250: starting
> 2011-10-08 09:26:59,149 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 34 on 31250: starting
> 2011-10-08 09:26:59,149 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 35 on 31250: starting
> 2011-10-08 09:26:59,149 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 36 on 31250: starting
> 2011-10-08 09:26:59,149 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 37 on 31250: starting
> 2011-10-08 09:26:59,149 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 38 on 31250: starting
> 2011-10-08 09:26:59,149 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 39 on 31250: starting
> 2011-10-08 09:26:59,150 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 40 on 31250: starting
> 2011-10-08 09:26:59,150 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 41 on 31250: starting
> 2011-10-08 09:26:59,150 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 42 on 31250: starting
> 2011-10-08 09:26:59,150 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 43 on 31250: starting
> 2011-10-08 09:26:59,150 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 44 on 31250: starting
> 2011-10-08 09:26:59,150 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 45 on 31250: starting
> 2011-10-08 09:26:59,151 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 46 on 31250: starting
> 2011-10-08 09:26:59,151 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 47 on 31250: starting
> 2011-10-08 09:26:59,151 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 48 on 31250: starting
> 2011-10-08 09:26:59,151 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 49 on 31250: starting
> 2011-10-08 09:26:59,151 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 50 on 31250: starting
> 2011-10-08 09:26:59,152 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 51 on 31250: starting
> 2011-10-08 09:26:59,152 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 52 on 31250: starting
> 2011-10-08 09:26:59,152 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 53 on 31250: starting
> 2011-10-08 09:26:59,152 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 54 on 31250: starting
> 2011-10-08 09:26:59,153 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 55 on 31250: starting
> 2011-10-08 09:26:59,153 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 56 on 31250: starting
> 2011-10-08 09:26:59,153 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 57 on 31250: starting
> 2011-10-08 09:26:59,153 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 58 on 31250: starting
> 2011-10-08 09:26:59,153 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 59 on 31250: starting
> 2011-10-08 09:26:59,154 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 60 on 31250: starting
> 2011-10-08 09:26:59,154 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 61 on 31250: starting
> 2011-10-08 09:26:59,154 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 62 on 31250: starting
> 2011-10-08 09:26:59,154 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 63 on 31250: starting
> 2011-10-08 09:26:59,155 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 64 on 31250: starting
> 2011-10-08 09:26:59,155 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 65 on 31250: starting
> 2011-10-08 09:26:59,155 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 66 on 31250: starting
> 2011-10-08 09:26:59,155 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 67 on 31250: starting
> 2011-10-08 09:26:59,155 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 68 on 31250: starting
> 2011-10-08 09:26:59,155 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 69 on 31250: starting
> 2011-10-08 09:26:59,156 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 70 on 31250: starting
> 2011-10-08 09:26:59,156 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 71 on 31250: starting
> 2011-10-08 09:26:59,156 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 72 on 31250: starting
> 2011-10-08 09:26:59,156 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 73 on 31250: starting
> 2011-10-08 09:26:59,156 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 74 on 31250: starting
> 2011-10-08 09:26:59,156 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 75 on 31250: starting
> 2011-10-08 09:26:59,157 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 76 on 31250: starting
> 2011-10-08 09:26:59,157 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 77 on 31250: starting
> 2011-10-08 09:26:59,157 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 78 on 31250: starting
> 2011-10-08 09:26:59,157 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 79 on 31250: starting
> 2011-10-08 09:26:59,157 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 80 on 31250: starting
> 2011-10-08 09:26:59,157 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 81 on 31250: starting
> 2011-10-08 09:26:59,158 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 82 on 31250: starting
> 2011-10-08 09:26:59,158 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 83 on 31250: starting
> 2011-10-08 09:26:59,158 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 84 on 31250: starting
> 2011-10-08 09:26:59,158 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 85 on 31250: starting
> 2011-10-08 09:26:59,158 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 86 on 31250: starting
> 2011-10-08 09:26:59,158 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 87 on 31250: starting
> 2011-10-08 09:26:59,159 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 88 on 31250: starting
> 2011-10-08 09:26:59,159 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 89 on 31250: starting
> 2011-10-08 09:26:59,159 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 90 on 31250: starting
> 2011-10-08 09:26:59,159 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 91 on 31250: starting
> 2011-10-08 09:26:59,159 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 92 on 31250: starting
> 2011-10-08 09:26:59,159 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 93 on 31250: starting
> 2011-10-08 09:26:59,160 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 94 on 31250: starting
> 2011-10-08 09:26:59,160 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 95 on 31250: starting
> 2011-10-08 09:26:59,160 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 96 on 31250: starting
> 2011-10-08 09:26:59,160 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 97 on 31250: starting
> 2011-10-08 09:26:59,161 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 98 on 31250: starting
> 2011-10-08 09:26:59,161 INFO org.apache.giraph.comm.BasicRPCCommunications: 
> BasicRPCCommunications: Started RPC communication server: 
> gsta33033.tan.ygrid.yahoo.com/10.216.176.59:31250 with 100 handlers
> 2011-10-08 09:26:59,161 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 99 on 31250: starting
> 2011-10-08 09:27:05,234 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
> Initializing logs' truncater with mapRetainSize=102400 and 
> reduceRetainSize=102400
> 2011-10-08 09:27:05,236 FATAL org.apache.hadoop.mapred.Child: Error running 
> child : java.lang.OutOfMemoryError: unable to create new native thread
>       at java.lang.Thread.start0(Native Method)
>       at java.lang.Thread.start(Thread.java:597)
>       at java.lang.UNIXProcess$1.run(UNIXProcess.java:141)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at java.lang.UNIXProcess.<init>(UNIXProcess.java:103)
>       at java.lang.ProcessImpl.start(ProcessImpl.java:65)
>       at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
>       at org.apache.hadoop.util.Shell.runCommand(Shell.java:200)
>       at org.apache.hadoop.util.Shell.run(Shell.java:182)
>       at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
>       at org.apache.hadoop.util.Shell.execCommand(Shell.java:461)
>       at org.apache.hadoop.util.Shell.execCommand(Shell.java:444)
>       at 
> org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:540)
>       at 
> org.apache.hadoop.fs.RawLocalFileSystem.access$100(RawLocalFileSystem.java:37)
>       at 
> org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:417)
>       at 
> org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getOwner(RawLocalFileSystem.java:400)
>       at org.apache.hadoop.mapred.TaskLog.obtainLogDirOwner(TaskLog.java:275)
>       at 
> org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:124)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>       at org.apache.hadoop.mapred.Child.main(Child.java:255)
> 
> 2011-10-08 09:27:05,272 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics 
> system...
> 2011-10-08 09:27:05,272 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping metrics source 
> ugi(org.apache.hadoop.security.UgiInstrumentation)
> 2011-10-08 09:27:05,272 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping metrics source 
> jvm(org.apache.hadoop.metrics2.source.JvmMetricsSource)
> 2011-10-08 09:27:05,272 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping metrics source 
> RpcDetailedActivityForPort31250(org.apache.hadoop.ipc.metrics.RpcInstrumentation$Detailed)
> 2011-10-08 09:27:05,272 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping metrics source 
> RpcActivityForPort31250(org.apache.hadoop.ipc.metrics.RpcInstrumentation)
> 2011-10-08 09:27:05,272 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system 
> stopped.
> 
> -- 
> Best Regards
> Zhiwei Gu
