[jira] [Commented] (HADOOP-9654) IPC timeout doesn't seem to be kicking in
[ https://issues.apache.org/jira/browse/HADOOP-9654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681335#comment-14681335 ]

Ajith S commented on HADOOP-9654:
---------------------------------

As per [~rohithsharma]'s comment, this issue is the same as HADOOP-11252. I suggest we close this one, since most of the discussion is in the other JIRA.

IPC timeout doesn't seem to be kicking in
-----------------------------------------

Key: HADOOP-9654
URL: https://issues.apache.org/jira/browse/HADOOP-9654
Project: Hadoop Common
Issue Type: Bug
Components: ipc
Affects Versions: 2.1.0-beta
Reporter: Roman Shaposhnik
Assignee: Ajith S

During my Bigtop testing I made the NN OOM. This, in turn, made all of the clients stuck in the IPC call (even the new clients that I run *after* the NN went OOM). Here's an example of a jstack output on the client that was running:
{noformat}
$ hadoop fs -lsr /
{noformat}
Stacktrace:
{noformat}
/usr/java/jdk1.6.0_21/bin/jstack 19078
2013-06-19 23:14:00
Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode):

Attach Listener daemon prio=10 tid=0x7fcd8c8c1800 nid=0x5105 waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

IPC Client (1223039541) connection to ip-10-144-82-213.ec2.internal/10.144.82.213:17020 from root daemon prio=10 tid=0x7fcd8c7ea000 nid=0x4aa0 runnable [0x7fcd443e2000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked 0x7fcd7529de18 (a sun.nio.ch.Util$1)
    - locked 0x7fcd7529de00 (a java.util.Collections$UnmodifiableSet)
    - locked 0x7fcd7529da80 (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:421)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    - locked 0x7fcd752aaf18 (a java.io.BufferedInputStream)
    at java.io.DataInputStream.readInt(DataInputStream.java:370)
    at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:943)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:840)

Low Memory Detector daemon prio=10 tid=0x7fcd8c09 nid=0x4a9b runnable [0x]
   java.lang.Thread.State: RUNNABLE

CompilerThread1 daemon prio=10 tid=0x7fcd8c08d800 nid=0x4a9a waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

CompilerThread0 daemon prio=10 tid=0x7fcd8c08a800 nid=0x4a99 waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

Signal Dispatcher daemon prio=10 tid=0x7fcd8c088800 nid=0x4a98 runnable [0x]
   java.lang.Thread.State: RUNNABLE

Finalizer daemon prio=10 tid=0x7fcd8c06a000 nid=0x4a97 in Object.wait() [0x7fcd902e9000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on 0x7fcd75fc0470 (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
    - locked 0x7fcd75fc0470 (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

Reference Handler daemon prio=10 tid=0x7fcd8c068000 nid=0x4a96 in Object.wait() [0x7fcd903ea000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on 0x7fcd75fc0550 (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked 0x7fcd75fc0550 (a java.lang.ref.Reference$Lock)

main prio=10 tid=0x7fcd8c00a800 nid=0x4a92 in Object.wait() [0x7fcd91b06000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on 0x7fcd752528e8 (a org.apache.hadoop.ipc.Client$Call)
    at java.lang.Object.wait(Object.java:485)
[jira] [Commented] (HADOOP-9654) IPC timeout doesn't seem to be kicking in
[ https://issues.apache.org/jira/browse/HADOOP-9654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681334#comment-14681334 ]

Ajith S commented on HADOOP-9654:
---------------------------------

+1
[jira] [Commented] (HADOOP-9654) IPC timeout doesn't seem to be kicking in
[ https://issues.apache.org/jira/browse/HADOOP-9654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681262#comment-14681262 ]

Rohith Sharma K S commented on HADOOP-9654:
-------------------------------------------

Is it same as HADOOP-11252?
[jira] [Commented] (HADOOP-9654) IPC timeout doesn't seem to be kicking in
[ https://issues.apache.org/jira/browse/HADOOP-9654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681206#comment-14681206 ]

Ajith S commented on HADOOP-9654:
---------------------------------

+1 for [~rvs]. I think we can introduce a new *ipc.client.timeout* property with a default value, to be used when ipc.client.ping=false (which is default now). -1 is not a reasonable timeout value; we could set the new property to, say, 3600 seconds. Does that sound reasonable?
[jira] [Commented] (HADOOP-9654) IPC timeout doesn't seem to be kicking in
[ https://issues.apache.org/jira/browse/HADOOP-9654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13689946#comment-13689946 ]

Roman Shaposhnik commented on HADOOP-9654:
------------------------------------------

[~jagane] as a matter of fact I didn't know that -- thanks a million for bringing this up! I can definitely give your suggestion a try (the NN keeps OOMing -- which gives me a perfect testbed for this). I do have a question for the rest of the folks here though -- a client that never times out doesn't strike me as a great default. Am I missing something? Should we change the default for the client to actually time out?
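
To make Roman's question about a client that never times out concrete, here is a minimal sketch in plain Java. It is not Hadoop code: the class and method names are hypothetical, and it uses only java.net sockets. A server socket that never replies stands in for the unresponsive NameNode, and the client bounds its blocking read with a socket read timeout instead of waiting forever.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical illustration only; none of these names exist in Hadoop.
final class ReadTimeoutSketch {

    /**
     * Connects to a local server that never sends a byte and tries to read.
     * Returns true if the read gave up after soTimeoutMs milliseconds.
     * soTimeoutMs must be greater than 0; a value of 0 would block forever.
     */
    static boolean readTimesOut(int soTimeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The server socket is bound but never accepts or writes, which
        // mimics a peer that is alive at the TCP level yet unresponsive.
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, silentServer.getLocalPort())) {
            client.setSoTimeout(soTimeoutMs); // bound every blocking read
            try {
                client.getInputStream().read();
                return false; // data or end of stream arrived: no timeout
            } catch (SocketTimeoutException expected) {
                return true;  // the read was abandoned after the timeout
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

Without the setSoTimeout call the read would block indefinitely, which is the situation the IPC Client thread in the dump appears to be in.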
[jira] [Commented] (HADOOP-9654) IPC timeout doesn't seem to be kicking in
[ https://issues.apache.org/jira/browse/HADOOP-9654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13688652#comment-13688652 ]

Jagane Sundar commented on HADOOP-9654:
---------------------------------------

Roman - pardon me if you already know this and are configuring your BigTop test correctly. If you take a look at HDFS-4646 and HDFS-4858, I have observed similar failure-to-time-out issues with both the HDFS Client to NameNode IPC (HDFS-4646) and the DataNode to NameNode IPC (HDFS-4858). By default ipc.client.ping is true. This means that the IPC layer sends out a periodic ping but never times out. In order to time out, ipc.client.ping needs to be set to false and ipc.ping.interval needs to be set to some value, e.g. 14000. With this configuration the IPC client should time out after 14000 ms. Is BigTop configuring Hadoop this way?
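
The relationship Jagane describes between ipc.client.ping and ipc.ping.interval can be sketched roughly as follows. This is an illustration of the described behaviour, not Hadoop's actual code: java.util.Properties stands in for Hadoop's Configuration class, the class and method names are hypothetical, and the 60000 ms fallback for ipc.ping.interval is an assumption.

```java
import java.util.Properties;

// Hypothetical model of the behaviour described in the comment above.
final class IpcTimeoutSketch {
    static final String PING_KEY = "ipc.client.ping";
    static final String PING_INTERVAL_KEY = "ipc.ping.interval";

    /**
     * Returns the effective RPC timeout in milliseconds, or -1 for
     * "never time out".
     */
    static int effectiveTimeoutMs(Properties conf) {
        // ipc.client.ping defaults to true according to the comment.
        boolean pingEnabled =
            Boolean.parseBoolean(conf.getProperty(PING_KEY, "true"));
        if (pingEnabled) {
            // Pings keep the connection alive; the call itself never times out.
            return -1;
        }
        // With ping disabled, the ping interval doubles as the timeout.
        // The 60000 fallback is an assumed default for this sketch.
        return Integer.parseInt(conf.getProperty(PING_INTERVAL_KEY, "60000"));
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        System.out.println("default config: " + effectiveTimeoutMs(defaults));

        Properties tuned = new Properties();
        tuned.setProperty(PING_KEY, "false");
        tuned.setProperty(PING_INTERVAL_KEY, "14000");
        System.out.println("ping disabled:  " + effectiveTimeoutMs(tuned));
    }
}
```

Under this model the default configuration yields -1, which matches the client in the thread dump waiting indefinitely, while the suggested settings yield a 14000 ms timeout.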