[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove
[ https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625918#comment-14625918 ]

Heng Chen commented on HBASE-14062:
---

You are welcome.

> RpcServer.Listener.doAccept get blocked by LinkedList.remove
>
>                 Key: HBASE-14062
>                 URL: https://issues.apache.org/jira/browse/HBASE-14062
>             Project: HBase
>          Issue Type: Bug
>          Components: IPC/RPC
>    Affects Versions: 0.98.12
>            Reporter: Victor Xu
>         Attachments: hbase.log, jstack.log
>
> We saw this blocked thread in our jstack output:
> {noformat}
> "RpcServer.listener,port=60020" daemon prio=10 tid=0x7f158097b800 nid=0x2cd05 waiting for monitor entry [0x46374000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doAccept(RpcServer.java:833)
>         - waiting to lock <0x0002bb094ac8> (a java.util.Collections$SynchronizedList)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.run(RpcServer.java:748)
> {noformat}
> The lock is held by a thread inside LinkedList.remove:
> {noformat}
> "RpcServer.reader=9,port=60020" daemon prio=10 tid=0x7f1580394000 nid=0x2cc19 runnable [0x43b4c000]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.LinkedList.remove(LinkedList.java:363)
>         at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639)
>         - locked <0x0002bb094ac8> (a java.util.Collections$SynchronizedList)
>         at org.apache.hadoop.hbase.ipc.RpcServer.closeConnection(RpcServer.java:1992)
>         - locked <0x0002bb094ac8> (a java.util.Collections$SynchronizedList)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:867)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:645)
>         - locked <0x0002bae09a30> (a org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:620)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> {noformat}
> This issue blocks the RS once in a while, and I have to restart it whenever it happens. It seems like a bug. Any suggestions?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove
[ https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625914#comment-14625914 ]

Victor Xu commented on HBASE-14062:
---

A variety of applications use this HBase cluster, and they do not share the same client configuration or retry logic. I'll use tcpdump to find the guilty application the next time I come across this issue.
Thanks for your help, Heng Chen!
[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove
[ https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625902#comment-14625902 ]

Victor Xu commented on HBASE-14062:
---

We can see from the RS log that the META table is located on that RS. My guess is that some applications use a very short client RPC timeout, or queue requests locally before actually sending them to this RS, so that by the time the requests reach the RS they have almost exceeded the timeout. When the clients retry, this request-and-fail loop continues. This could happen when a big job (tens of thousands of maps using TableInputFormat) starts.
[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove
[ https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625897#comment-14625897 ]

Heng Chen commented on HBASE-14062:
---

I think so. What is your HBase client logic?
[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove
[ https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625882#comment-14625882 ]

Heng Chen commented on HBASE-14062:
---

So I think the lock is held because of the large number of exceptions thrown by doRead.
When an exception is thrown, doRead calls closeConnection, and closeConnection takes the lock.
When there are too many exceptions, the lock is almost always held by closeConnection, so doAccept is always waiting for it.

Why are the exceptions thrown?
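Editorial sketch of the contention described in the comment above. The stack traces show that the connection list is a Collections$SynchronizedList wrapping a LinkedList, so every remove(Object) walks the list from the head while holding the list's monitor, which is the same monitor doAccept waits for. The class below is not HBase code: `ConnectionRegistrySketch`, `Conn` and `closeAll` are hypothetical names, and the concurrent hash set is shown only as one possible alternative, not as the fix adopted for this issue.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConnectionRegistrySketch {
    /** Hypothetical stand-in for RpcServer.Connection (identity equals/hashCode). */
    static final class Conn {
        final int id;
        Conn(int id) { this.id = id; }
    }

    /** Removes each connection from the registry and returns how many were actually removed. */
    static int closeAll(Collection<Conn> registry, List<Conn> toClose) {
        int removed = 0;
        for (Conn c : toClose) {
            if (registry.remove(c)) {
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        int n = 10_000;
        List<Conn> conns = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            conns.add(new Conn(i));
        }

        // Pattern seen in the stack trace: synchronized wrapper around a LinkedList.
        // remove(Object) is a linear scan performed while holding the wrapper's monitor.
        List<Conn> listRegistry = Collections.synchronizedList(new LinkedList<>(conns));

        // Assumed alternative for comparison: a concurrent hash set with
        // expected constant-time removal and no single global monitor.
        Set<Conn> setRegistry = ConcurrentHashMap.newKeySet();
        setRegistry.addAll(conns);

        // Close the most recently added connections first, which is the
        // worst case for a scan that starts at the head of the list.
        Collections.reverse(conns);

        long t0 = System.nanoTime();
        closeAll(listRegistry, conns);
        long t1 = System.nanoTime();
        closeAll(setRegistry, conns);
        long t2 = System.nanoTime();

        System.out.printf("synchronized LinkedList: %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("concurrent hash set:     %d ms%n", (t2 - t1) / 1_000_000);
    }
}
```

With many open connections and many readers closing connections at once, the time spent inside the synchronized remove grows with the list length, which is consistent with the listener thread being found BLOCKED in every dump.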
[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove
[ https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625877#comment-14625877 ]

Victor Xu commented on HBASE-14062:
---

There might be lots of requests arriving together, with only 10 readers to handle them. Whenever a reader starts to read the data, the client quits. All readers are busy repeating this read/fail loop, so the lock seems to be always held, and other normal requests are blocked (or served slowly). Am I right?
[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove
[ https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625860#comment-14625860 ]

Victor Xu commented on HBASE-14062:
---

Yes, you are right. Different threads held the same lock in different jstack outputs:
{noformat}
jstack.log-"RpcServer.reader=9,port=60020" daemon prio=10 tid=0x7f1580394000 nid=0x2cc19 runnable [0x43b4c000]
jstack.log-   java.lang.Thread.State: RUNNABLE
jstack.log-     at java.util.LinkedList.remove(LinkedList.java:363)
jstack.log-     at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639)
jstack.log:     - locked <0x0002bb094ac8> (a java.util.Collections$SynchronizedList)
jstack.log-     at org.apache.hadoop.hbase.ipc.RpcServer.closeConnection(RpcServer.java:1992)
jstack.log:     - locked <0x0002bb094ac8> (a java.util.Collections$SynchronizedList)
--
jstack.log.1-"RpcServer.reader=0,port=60020" daemon prio=10 tid=0x7f1580263000 nid=0x2cc10 runnable [0x43243000]
jstack.log.1-   java.lang.Thread.State: RUNNABLE
jstack.log.1-   at java.util.LinkedList.remove(LinkedList.java:363)
jstack.log.1-   at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639)
jstack.log.1:   - locked <0x0002bb094ac8> (a java.util.Collections$SynchronizedList)
jstack.log.1-   at org.apache.hadoop.hbase.ipc.RpcServer.closeConnection(RpcServer.java:1992)
jstack.log.1:   - locked <0x0002bb094ac8> (a java.util.Collections$SynchronizedList)
--
jstack.log.2-"RpcServer.reader=6,port=60020" daemon prio=10 tid=0x7f1580342800 nid=0x2cc16 runnable [0x43849000]
jstack.log.2-   java.lang.Thread.State: RUNNABLE
jstack.log.2-   at java.util.LinkedList.remove(LinkedList.java:363)
jstack.log.2-   at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639)
jstack.log.2:   - locked <0x0002bb094ac8> (a java.util.Collections$SynchronizedList)
jstack.log.2-   at org.apache.hadoop.hbase.ipc.RpcServer.closeConnection(RpcServer.java:1992)
jstack.log.2:   - locked <0x0002bb094ac8> (a java.util.Collections$SynchronizedList)
{noformat}
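Editorial sketch of the check performed in the comment above: given several thread dumps, list which thread holds a particular monitor in each one. The monitor id in a dump identifies the locked object, not the thread holding it, so the same id appearing in three dumps does not by itself mean that one call never returned. The class name `JstackLockOwners` and the method `ownersOf` are hypothetical, and the parser assumes the plain jstack text layout shown in this thread (a quoted thread name at the start of a line, followed by "- locked <0x...>" lines).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JstackLockOwners {
    // A thread header starts with the quoted thread name.
    private static final Pattern THREAD = Pattern.compile("^\"([^\"]+)\"");
    // Matches "- locked <0x...>" but not "- waiting to lock <0x...>".
    private static final Pattern LOCKED = Pattern.compile("-\\s+locked\\s+<(0x[0-9a-fA-F]+)>");

    /** Returns the names of the threads that hold the given monitor in one dump, in order of appearance. */
    static List<String> ownersOf(String monitor, String dump) {
        List<String> owners = new ArrayList<>();
        String current = null;
        for (String line : dump.split("\\R")) {
            String trimmed = line.trim();
            Matcher thread = THREAD.matcher(trimmed);
            if (thread.find()) {
                current = thread.group(1);
                continue;
            }
            Matcher locked = LOCKED.matcher(trimmed);
            if (current != null && locked.find()
                    && locked.group(1).equalsIgnoreCase(monitor)
                    && !owners.contains(current)) {
                owners.add(current);
            }
        }
        return owners;
    }

    public static void main(String[] args) {
        // Two shortened dumps modelled on the excerpts posted in this thread.
        String first =
              "\"RpcServer.reader=9,port=60020\" daemon prio=10\n"
            + "   java.lang.Thread.State: RUNNABLE\n"
            + "\tat java.util.LinkedList.remove(LinkedList.java:363)\n"
            + "\t- locked <0x0002bb094ac8> (a java.util.Collections$SynchronizedList)\n";
        String second =
              "\"RpcServer.reader=0,port=60020\" daemon prio=10\n"
            + "   java.lang.Thread.State: RUNNABLE\n"
            + "\tat java.util.LinkedList.remove(LinkedList.java:363)\n"
            + "\t- locked <0x0002bb094ac8> (a java.util.Collections$SynchronizedList)\n";

        String monitor = "0x0002bb094ac8";
        System.out.println("dump 1: " + ownersOf(monitor, first));
        System.out.println("dump 2: " + ownersOf(monitor, second));
    }
}
```

If the owner differs from dump to dump, as it does in the three excerpts above, the lock is being released and re-acquired rather than held by one stuck call.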
[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove
[ https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625792#comment-14625792 ]

Heng Chen commented on HBASE-14062:
---

In the other two jstack outputs, is the ID of the thread holding the lock the same as in the jstack you posted?

I notice that your region server's log contains a lot of exceptions thrown by the doRead function.
The exception is caught and count is set to -1, so the connection is closed in the closeConnection function.
In closeConnection, the lock on connectionList is acquired.

PS: The exception is below. It seems the client closes the connection during the read; is that correct?

2015-07-13 05:42:12,735 WARN org.apache.hadoop.ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
        at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2310)
        at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1480)
        at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:854)
        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:645)
        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:620)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove
[ https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625408#comment-14625408 ]

Victor Xu commented on HBASE-14062:
---

Thanks. What confused me most is that the lock is held inside the java.util.LinkedList.remove method and never seems to be released: I took another two jstack outputs several minutes after the first one and still found the same lock id (<0x0002bb094ac8>), which suggests that LinkedList.remove never finished. Maybe a bug in the JVM?
[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove
[ https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624354#comment-14624354 ]

Heng Chen commented on HBASE-14062:
---

Your threads are blocked while acquiring the connectionList lock, which is taken in closeConnection, called from the doRead function.

In doRead, closeConnection() is called when the read returns -1 or throws an exception.

Can you post your region server's log?

PS: this is the doRead function:

    void doRead(SelectionKey key) throws InterruptedException {
      int count = 0;
      Connection c = (Connection)key.attachment();
      if (c == null) {
        return;
      }
      c.setLastContact(System.currentTimeMillis());
      try {
        count = c.readAndProcess();
      } catch (InterruptedException ieo) {
        throw ieo;
      } catch (Exception e) {
        LOG.warn(getName() + ": count of bytes read: " + count, e);
        count = -1; //so that the (count < 0) block is executed
      }
      if (count < 0) {
        if (LOG.isDebugEnabled()) {
          LOG.debug(getName() + ": DISCONNECTING client " + c.toString() +
              " because read count=" + count +
              ". Number of active connections: " + numConnections);
        }
        closeConnection(c);
        // c = null;
      } else {
        c.setLastContact(System.currentTimeMillis());
      }
    }
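Editorial sketch of the control flow in the doRead function quoted above, reduced to the part that matters for this issue: a read that throws is handled exactly like a read that returns a negative count, and both end in closeConnection. This is not the HBase class. `DoReadFlowSketch`, the `Conn` interface and the `closed` list are hypothetical, and closeConnection here only records the call instead of touching a shared connection list.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class DoReadFlowSketch {
    /** Hypothetical stand-in for RpcServer.Connection; only the read call matters here. */
    interface Conn {
        int readAndProcess() throws IOException;
    }

    /** Connections passed to closeConnection, in call order. */
    final List<Conn> closed = new ArrayList<>();

    /** Stand-in for RpcServer.closeConnection: records the call only. */
    void closeConnection(Conn c) {
        closed.add(c);
    }

    /** Same decision structure as the quoted doRead: an exception becomes count = -1. */
    void doRead(Conn c) {
        int count;
        try {
            count = c.readAndProcess();
        } catch (IOException e) {
            count = -1; // so that the (count < 0) branch runs
        }
        if (count < 0) {
            closeConnection(c);
        }
    }

    public static void main(String[] args) {
        DoReadFlowSketch server = new DoReadFlowSketch();

        Conn healthy = () -> 128;                 // bytes were read, connection stays open
        Conn endOfStream = () -> -1;              // peer closed cleanly
        Conn reset = () -> {                      // peer vanished mid-read
            throw new IOException("Connection reset by peer");
        };

        server.doRead(healthy);
        server.doRead(endOfStream);
        server.doRead(reset);

        System.out.println("connections closed: " + server.closed.size());
    }
}
```

Every client that disconnects mid-read therefore costs one closeConnection call, which in the real server is the path that takes the connectionList lock.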