[jira] [Commented] (HADOOP-9654) IPC timeout doesn't seem to be kicking in

2015-08-11 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681335#comment-14681335
 ] 

Ajith S commented on HADOOP-9654:
-

As per [~rohithsharma] comments, the issue is same as HADOOP-11252
So suggest we can close this, as much of the discussions are in other jira.

 IPC timeout doesn't seem to be kicking in
 -

 Key: HADOOP-9654
 URL: https://issues.apache.org/jira/browse/HADOOP-9654
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Affects Versions: 2.1.0-beta
Reporter: Roman Shaposhnik
Assignee: Ajith S

 During my Bigtop testing I made the NN OOM. This, in turn, made all of the 
 clients stuck in the IPC call (even the new clients that I run *after* the NN 
 went OOM). Here's an example of a jstack output on the client that was 
 running:
 {noformat}
 $ hadoop fs -lsr /
 {noformat}
 Stacktrace:
 {noformat}
 /usr/java/jdk1.6.0_21/bin/jstack 19078
 2013-06-19 23:14:00
 Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode):
 Attach Listener daemon prio=10 tid=0x7fcd8c8c1800 nid=0x5105 waiting on 
 condition [0x]
java.lang.Thread.State: RUNNABLE
 IPC Client (1223039541) connection to 
 ip-10-144-82-213.ec2.internal/10.144.82.213:17020 from root daemon prio=10 
 tid=0x7fcd8c7ea000 nid=0x4aa0 runnable [0x7fcd443e2000]
java.lang.Thread.State: RUNNABLE
   at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
   at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
   at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
   at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
   - locked 0x7fcd7529de18 (a sun.nio.ch.Util$1)
   - locked 0x7fcd7529de00 (a java.util.Collections$UnmodifiableSet)
   - locked 0x7fcd7529da80 (a sun.nio.ch.EPollSelectorImpl)
   at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
   at java.io.FilterInputStream.read(FilterInputStream.java:116)
   at java.io.FilterInputStream.read(FilterInputStream.java:116)
   at 
 org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:421)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
   - locked 0x7fcd752aaf18 (a java.io.BufferedInputStream)
   at java.io.DataInputStream.readInt(DataInputStream.java:370)
   at 
 org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:943)
   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:840)
 Low Memory Detector daemon prio=10 tid=0x7fcd8c09 nid=0x4a9b 
 runnable [0x]
java.lang.Thread.State: RUNNABLE
 CompilerThread1 daemon prio=10 tid=0x7fcd8c08d800 nid=0x4a9a waiting on 
 condition [0x]
java.lang.Thread.State: RUNNABLE
 CompilerThread0 daemon prio=10 tid=0x7fcd8c08a800 nid=0x4a99 waiting on 
 condition [0x]
java.lang.Thread.State: RUNNABLE
 Signal Dispatcher daemon prio=10 tid=0x7fcd8c088800 nid=0x4a98 runnable 
 [0x]
java.lang.Thread.State: RUNNABLE
 Finalizer daemon prio=10 tid=0x7fcd8c06a000 nid=0x4a97 in Object.wait() 
 [0x7fcd902e9000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x7fcd75fc0470 (a java.lang.ref.ReferenceQueue$Lock)
   at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
   - locked 0x7fcd75fc0470 (a java.lang.ref.ReferenceQueue$Lock)
   at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
   at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
 Reference Handler daemon prio=10 tid=0x7fcd8c068000 nid=0x4a96 in 
 Object.wait() [0x7fcd903ea000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x7fcd75fc0550 (a java.lang.ref.Reference$Lock)
   at java.lang.Object.wait(Object.java:485)
   at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
   - locked 0x7fcd75fc0550 (a java.lang.ref.Reference$Lock)
 main prio=10 tid=0x7fcd8c00a800 nid=0x4a92 in Object.wait() 
 [0x7fcd91b06000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 

[jira] [Commented] (HADOOP-9654) IPC timeout doesn't seem to be kicking in

2015-08-11 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681334#comment-14681334
 ] 

Ajith S commented on HADOOP-9654:
-

+1

 IPC timeout doesn't seem to be kicking in
 -

 Key: HADOOP-9654
 URL: https://issues.apache.org/jira/browse/HADOOP-9654
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Affects Versions: 2.1.0-beta
Reporter: Roman Shaposhnik
Assignee: Ajith S

 During my Bigtop testing I made the NN OOM. This, in turn, made all of the 
 clients stuck in the IPC call (even the new clients that I run *after* the NN 
 went OOM). Here's an example of a jstack output on the client that was 
 running:
 {noformat}
 $ hadoop fs -lsr /
 {noformat}
 Stacktrace:
 {noformat}
 /usr/java/jdk1.6.0_21/bin/jstack 19078
 2013-06-19 23:14:00
 Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode):
 Attach Listener daemon prio=10 tid=0x7fcd8c8c1800 nid=0x5105 waiting on 
 condition [0x]
java.lang.Thread.State: RUNNABLE
 IPC Client (1223039541) connection to 
 ip-10-144-82-213.ec2.internal/10.144.82.213:17020 from root daemon prio=10 
 tid=0x7fcd8c7ea000 nid=0x4aa0 runnable [0x7fcd443e2000]
java.lang.Thread.State: RUNNABLE
   at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
   at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
   at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
   at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
   - locked 0x7fcd7529de18 (a sun.nio.ch.Util$1)
   - locked 0x7fcd7529de00 (a java.util.Collections$UnmodifiableSet)
   - locked 0x7fcd7529da80 (a sun.nio.ch.EPollSelectorImpl)
   at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
   at java.io.FilterInputStream.read(FilterInputStream.java:116)
   at java.io.FilterInputStream.read(FilterInputStream.java:116)
   at 
 org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:421)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
   - locked 0x7fcd752aaf18 (a java.io.BufferedInputStream)
   at java.io.DataInputStream.readInt(DataInputStream.java:370)
   at 
 org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:943)
   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:840)
 Low Memory Detector daemon prio=10 tid=0x7fcd8c09 nid=0x4a9b 
 runnable [0x]
java.lang.Thread.State: RUNNABLE
 CompilerThread1 daemon prio=10 tid=0x7fcd8c08d800 nid=0x4a9a waiting on 
 condition [0x]
java.lang.Thread.State: RUNNABLE
 CompilerThread0 daemon prio=10 tid=0x7fcd8c08a800 nid=0x4a99 waiting on 
 condition [0x]
java.lang.Thread.State: RUNNABLE
 Signal Dispatcher daemon prio=10 tid=0x7fcd8c088800 nid=0x4a98 runnable 
 [0x]
java.lang.Thread.State: RUNNABLE
 Finalizer daemon prio=10 tid=0x7fcd8c06a000 nid=0x4a97 in Object.wait() 
 [0x7fcd902e9000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x7fcd75fc0470 (a java.lang.ref.ReferenceQueue$Lock)
   at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
   - locked 0x7fcd75fc0470 (a java.lang.ref.ReferenceQueue$Lock)
   at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
   at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
 Reference Handler daemon prio=10 tid=0x7fcd8c068000 nid=0x4a96 in 
 Object.wait() [0x7fcd903ea000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x7fcd75fc0550 (a java.lang.ref.Reference$Lock)
   at java.lang.Object.wait(Object.java:485)
   at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
   - locked 0x7fcd75fc0550 (a java.lang.ref.Reference$Lock)
 main prio=10 tid=0x7fcd8c00a800 nid=0x4a92 in Object.wait() 
 [0x7fcd91b06000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x7fcd752528e8 (a org.apache.hadoop.ipc.Client$Call)
   at java.lang.Object.wait(Object.java:485)
   at 

[jira] [Commented] (HADOOP-9654) IPC timeout doesn't seem to be kicking in

2015-08-10 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681262#comment-14681262
 ] 

Rohith Sharma K S commented on HADOOP-9654:
---

Is it same as HADOOP-11252?

 IPC timeout doesn't seem to be kicking in
 -

 Key: HADOOP-9654
 URL: https://issues.apache.org/jira/browse/HADOOP-9654
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Affects Versions: 2.1.0-beta
Reporter: Roman Shaposhnik
Assignee: Ajith S

 During my Bigtop testing I made the NN OOM. This, in turn, made all of the 
 clients stuck in the IPC call (even the new clients that I run *after* the NN 
 went OOM). Here's an example of a jstack output on the client that was 
 running:
 {noformat}
 $ hadoop fs -lsr /
 {noformat}
 Stacktrace:
 {noformat}
 /usr/java/jdk1.6.0_21/bin/jstack 19078
 2013-06-19 23:14:00
 Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode):
 Attach Listener daemon prio=10 tid=0x7fcd8c8c1800 nid=0x5105 waiting on 
 condition [0x]
java.lang.Thread.State: RUNNABLE
 IPC Client (1223039541) connection to 
 ip-10-144-82-213.ec2.internal/10.144.82.213:17020 from root daemon prio=10 
 tid=0x7fcd8c7ea000 nid=0x4aa0 runnable [0x7fcd443e2000]
java.lang.Thread.State: RUNNABLE
   at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
   at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
   at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
   at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
   - locked 0x7fcd7529de18 (a sun.nio.ch.Util$1)
   - locked 0x7fcd7529de00 (a java.util.Collections$UnmodifiableSet)
   - locked 0x7fcd7529da80 (a sun.nio.ch.EPollSelectorImpl)
   at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
   at java.io.FilterInputStream.read(FilterInputStream.java:116)
   at java.io.FilterInputStream.read(FilterInputStream.java:116)
   at 
 org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:421)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
   - locked 0x7fcd752aaf18 (a java.io.BufferedInputStream)
   at java.io.DataInputStream.readInt(DataInputStream.java:370)
   at 
 org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:943)
   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:840)
 Low Memory Detector daemon prio=10 tid=0x7fcd8c09 nid=0x4a9b 
 runnable [0x]
java.lang.Thread.State: RUNNABLE
 CompilerThread1 daemon prio=10 tid=0x7fcd8c08d800 nid=0x4a9a waiting on 
 condition [0x]
java.lang.Thread.State: RUNNABLE
 CompilerThread0 daemon prio=10 tid=0x7fcd8c08a800 nid=0x4a99 waiting on 
 condition [0x]
java.lang.Thread.State: RUNNABLE
 Signal Dispatcher daemon prio=10 tid=0x7fcd8c088800 nid=0x4a98 runnable 
 [0x]
java.lang.Thread.State: RUNNABLE
 Finalizer daemon prio=10 tid=0x7fcd8c06a000 nid=0x4a97 in Object.wait() 
 [0x7fcd902e9000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x7fcd75fc0470 (a java.lang.ref.ReferenceQueue$Lock)
   at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
   - locked 0x7fcd75fc0470 (a java.lang.ref.ReferenceQueue$Lock)
   at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
   at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
 Reference Handler daemon prio=10 tid=0x7fcd8c068000 nid=0x4a96 in 
 Object.wait() [0x7fcd903ea000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x7fcd75fc0550 (a java.lang.ref.Reference$Lock)
   at java.lang.Object.wait(Object.java:485)
   at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
   - locked 0x7fcd75fc0550 (a java.lang.ref.Reference$Lock)
 main prio=10 tid=0x7fcd8c00a800 nid=0x4a92 in Object.wait() 
 [0x7fcd91b06000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x7fcd752528e8 (a org.apache.hadoop.ipc.Client$Call)
   at java.lang.Object.wait(Object.java:485)
 

[jira] [Commented] (HADOOP-9654) IPC timeout doesn't seem to be kicking in

2015-08-10 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681206#comment-14681206
 ] 

Ajith S commented on HADOOP-9654:
-

+1 for [~rvs]

I think we can introduce a new default *ipc.client.timeout* property which can 
be used in case the ipc.client.ping=false(which is default now)
-1 is not a reasonable timeout value, we can set the new property to may be say 
3600 seconds.? reasonable.?

 IPC timeout doesn't seem to be kicking in
 -

 Key: HADOOP-9654
 URL: https://issues.apache.org/jira/browse/HADOOP-9654
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Affects Versions: 2.1.0-beta
Reporter: Roman Shaposhnik
Assignee: Ajith S

 During my Bigtop testing I made the NN OOM. This, in turn, made all of the 
 clients stuck in the IPC call (even the new clients that I run *after* the NN 
 went OOM). Here's an example of a jstack output on the client that was 
 running:
 {noformat}
 $ hadoop fs -lsr /
 {noformat}
 Stacktrace:
 {noformat}
 /usr/java/jdk1.6.0_21/bin/jstack 19078
 2013-06-19 23:14:00
 Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode):
 Attach Listener daemon prio=10 tid=0x7fcd8c8c1800 nid=0x5105 waiting on 
 condition [0x]
java.lang.Thread.State: RUNNABLE
 IPC Client (1223039541) connection to 
 ip-10-144-82-213.ec2.internal/10.144.82.213:17020 from root daemon prio=10 
 tid=0x7fcd8c7ea000 nid=0x4aa0 runnable [0x7fcd443e2000]
java.lang.Thread.State: RUNNABLE
   at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
   at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
   at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
   at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
   - locked 0x7fcd7529de18 (a sun.nio.ch.Util$1)
   - locked 0x7fcd7529de00 (a java.util.Collections$UnmodifiableSet)
   - locked 0x7fcd7529da80 (a sun.nio.ch.EPollSelectorImpl)
   at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
   at java.io.FilterInputStream.read(FilterInputStream.java:116)
   at java.io.FilterInputStream.read(FilterInputStream.java:116)
   at 
 org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:421)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
   - locked 0x7fcd752aaf18 (a java.io.BufferedInputStream)
   at java.io.DataInputStream.readInt(DataInputStream.java:370)
   at 
 org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:943)
   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:840)
 Low Memory Detector daemon prio=10 tid=0x7fcd8c09 nid=0x4a9b 
 runnable [0x]
java.lang.Thread.State: RUNNABLE
 CompilerThread1 daemon prio=10 tid=0x7fcd8c08d800 nid=0x4a9a waiting on 
 condition [0x]
java.lang.Thread.State: RUNNABLE
 CompilerThread0 daemon prio=10 tid=0x7fcd8c08a800 nid=0x4a99 waiting on 
 condition [0x]
java.lang.Thread.State: RUNNABLE
 Signal Dispatcher daemon prio=10 tid=0x7fcd8c088800 nid=0x4a98 runnable 
 [0x]
java.lang.Thread.State: RUNNABLE
 Finalizer daemon prio=10 tid=0x7fcd8c06a000 nid=0x4a97 in Object.wait() 
 [0x7fcd902e9000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x7fcd75fc0470 (a java.lang.ref.ReferenceQueue$Lock)
   at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
   - locked 0x7fcd75fc0470 (a java.lang.ref.ReferenceQueue$Lock)
   at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
   at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
 Reference Handler daemon prio=10 tid=0x7fcd8c068000 nid=0x4a96 in 
 Object.wait() [0x7fcd903ea000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x7fcd75fc0550 (a java.lang.ref.Reference$Lock)
   at java.lang.Object.wait(Object.java:485)
   at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
   - locked 0x7fcd75fc0550 (a java.lang.ref.Reference$Lock)
 main prio=10 tid=0x7fcd8c00a800 nid=0x4a92 in Object.wait() 
 [0x7fcd91b06000]

[jira] [Commented] (HADOOP-9654) IPC timeout doesn't seem to be kicking in

2013-06-20 Thread Roman Shaposhnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13689946#comment-13689946
 ] 

Roman Shaposhnik commented on HADOOP-9654:
--

[~jagane] as a matter of fact I didn't know that -- thanks a million for 
bringing this up! I can definitely give your suggestion a try (the NN keeps 
OOMing -- which gives me a perfect testbed for this).

I do have a question for the rest of the folks here though -- a client that 
never times out doesn't strike me as a great default. Am I missing something? 
Should we change the default for the client to actually timeout?

 IPC timeout doesn't seem to be kicking in
 -

 Key: HADOOP-9654
 URL: https://issues.apache.org/jira/browse/HADOOP-9654
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Affects Versions: 2.1.0-beta
Reporter: Roman Shaposhnik

 During my Bigtop testing I made the NN OOM. This, in turn, made all of the 
 clients stuck in the IPC call (even the new clients that I run *after* the NN 
 went OOM). Here's an example of a jstack output on the client that was 
 running:
 {noformat}
 $ hadoop fs -lsr /
 {noformat}
 Stacktrace:
 {noformat}
 /usr/java/jdk1.6.0_21/bin/jstack 19078
 2013-06-19 23:14:00
 Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode):
 Attach Listener daemon prio=10 tid=0x7fcd8c8c1800 nid=0x5105 waiting on 
 condition [0x]
java.lang.Thread.State: RUNNABLE
 IPC Client (1223039541) connection to 
 ip-10-144-82-213.ec2.internal/10.144.82.213:17020 from root daemon prio=10 
 tid=0x7fcd8c7ea000 nid=0x4aa0 runnable [0x7fcd443e2000]
java.lang.Thread.State: RUNNABLE
   at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
   at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
   at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
   at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
   - locked 0x7fcd7529de18 (a sun.nio.ch.Util$1)
   - locked 0x7fcd7529de00 (a java.util.Collections$UnmodifiableSet)
   - locked 0x7fcd7529da80 (a sun.nio.ch.EPollSelectorImpl)
   at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
   at java.io.FilterInputStream.read(FilterInputStream.java:116)
   at java.io.FilterInputStream.read(FilterInputStream.java:116)
   at 
 org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:421)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
   - locked 0x7fcd752aaf18 (a java.io.BufferedInputStream)
   at java.io.DataInputStream.readInt(DataInputStream.java:370)
   at 
 org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:943)
   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:840)
 Low Memory Detector daemon prio=10 tid=0x7fcd8c09 nid=0x4a9b 
 runnable [0x]
java.lang.Thread.State: RUNNABLE
 CompilerThread1 daemon prio=10 tid=0x7fcd8c08d800 nid=0x4a9a waiting on 
 condition [0x]
java.lang.Thread.State: RUNNABLE
 CompilerThread0 daemon prio=10 tid=0x7fcd8c08a800 nid=0x4a99 waiting on 
 condition [0x]
java.lang.Thread.State: RUNNABLE
 Signal Dispatcher daemon prio=10 tid=0x7fcd8c088800 nid=0x4a98 runnable 
 [0x]
java.lang.Thread.State: RUNNABLE
 Finalizer daemon prio=10 tid=0x7fcd8c06a000 nid=0x4a97 in Object.wait() 
 [0x7fcd902e9000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x7fcd75fc0470 (a java.lang.ref.ReferenceQueue$Lock)
   at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
   - locked 0x7fcd75fc0470 (a java.lang.ref.ReferenceQueue$Lock)
   at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
   at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
 Reference Handler daemon prio=10 tid=0x7fcd8c068000 nid=0x4a96 in 
 Object.wait() [0x7fcd903ea000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x7fcd75fc0550 (a java.lang.ref.Reference$Lock)
   at java.lang.Object.wait(Object.java:485)
   at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
   

[jira] [Commented] (HADOOP-9654) IPC timeout doesn't seem to be kicking in

2013-06-19 Thread Jagane Sundar (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688652#comment-13688652
 ] 

Jagane Sundar commented on HADOOP-9654:
---

Roman - pardon me if you already know this and are configuring your BigTop test 
correctly. If you take a look at HDFS-4646 and HDFS-4858, I have observed 
similar failure to timeout issues with both the HDFS Client to NameNode ipc 
(HDFS-4646) and the Datanode to NameNode ipc (HDFS-4858).

By default ipc.client.ping is true. The meaning of this is that the IPC layer 
is to send out a periodic ping but to never timeout.

In order to timeout, ipc.client.ping needs to be configured false and 
ipc.ping.interval needs to be set to some value e.g. 14000. This configuration 
means that the IPC Client should timeout in 14000. Is BigTop configuring hadoop 
so?


 IPC timeout doesn't seem to be kicking in
 -

 Key: HADOOP-9654
 URL: https://issues.apache.org/jira/browse/HADOOP-9654
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Affects Versions: 2.1.0-beta
Reporter: Roman Shaposhnik

 During my Bigtop testing I made the NN OOM. This, in turn, made all of the 
 clients stuck in the IPC call (even the new clients that I run *after* the NN 
 went OOM). Here's an example of a jstack output on the client that was 
 running:
 {noformat}
 $ hadoop fs -lsr /
 {noformat}
 Stacktrace:
 {noformat}
 /usr/java/jdk1.6.0_21/bin/jstack 19078
 2013-06-19 23:14:00
 Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode):
 Attach Listener daemon prio=10 tid=0x7fcd8c8c1800 nid=0x5105 waiting on 
 condition [0x]
java.lang.Thread.State: RUNNABLE
 IPC Client (1223039541) connection to 
 ip-10-144-82-213.ec2.internal/10.144.82.213:17020 from root daemon prio=10 
 tid=0x7fcd8c7ea000 nid=0x4aa0 runnable [0x7fcd443e2000]
java.lang.Thread.State: RUNNABLE
   at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
   at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
   at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
   at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
   - locked 0x7fcd7529de18 (a sun.nio.ch.Util$1)
   - locked 0x7fcd7529de00 (a java.util.Collections$UnmodifiableSet)
   - locked 0x7fcd7529da80 (a sun.nio.ch.EPollSelectorImpl)
   at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
   at java.io.FilterInputStream.read(FilterInputStream.java:116)
   at java.io.FilterInputStream.read(FilterInputStream.java:116)
   at 
 org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:421)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
   - locked 0x7fcd752aaf18 (a java.io.BufferedInputStream)
   at java.io.DataInputStream.readInt(DataInputStream.java:370)
   at 
 org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:943)
   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:840)
 Low Memory Detector daemon prio=10 tid=0x7fcd8c09 nid=0x4a9b 
 runnable [0x]
java.lang.Thread.State: RUNNABLE
 CompilerThread1 daemon prio=10 tid=0x7fcd8c08d800 nid=0x4a9a waiting on 
 condition [0x]
java.lang.Thread.State: RUNNABLE
 CompilerThread0 daemon prio=10 tid=0x7fcd8c08a800 nid=0x4a99 waiting on 
 condition [0x]
java.lang.Thread.State: RUNNABLE
 Signal Dispatcher daemon prio=10 tid=0x7fcd8c088800 nid=0x4a98 runnable 
 [0x]
java.lang.Thread.State: RUNNABLE
 Finalizer daemon prio=10 tid=0x7fcd8c06a000 nid=0x4a97 in Object.wait() 
 [0x7fcd902e9000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x7fcd75fc0470 (a java.lang.ref.ReferenceQueue$Lock)
   at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
   - locked 0x7fcd75fc0470 (a java.lang.ref.ReferenceQueue$Lock)
   at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
   at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
 Reference Handler daemon prio=10 tid=0x7fcd8c068000 nid=0x4a96 in 
 Object.wait() [0x7fcd903ea000]
java.lang.Thread.State: WAITING (on object monitor)
   at