[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove

2015-07-13 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625918#comment-14625918
 ] 

Heng Chen commented on HBASE-14062:
---

You are welcome.

> RpcServer.Listener.doAccept get blocked by LinkedList.remove
> 
>
>                 Key: HBASE-14062
>                 URL: https://issues.apache.org/jira/browse/HBASE-14062
>             Project: HBase
>          Issue Type: Bug
>          Components: IPC/RPC
>    Affects Versions: 0.98.12
>            Reporter: Victor Xu
>         Attachments: hbase.log, jstack.log
>
>
> We saw this blocked thread in our jstack output:
> {noformat}
> "RpcServer.listener,port=60020" daemon prio=10 tid=0x7f158097b800 nid=0x2cd05 waiting for monitor entry [0x46374000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doAccept(RpcServer.java:833)
>         - waiting to lock <0x0002bb094ac8> (a java.util.Collections$SynchronizedList)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.run(RpcServer.java:748)
> {noformat}
> And the owner of the lock is in LinkedList.remove:
> {noformat}
> "RpcServer.reader=9,port=60020" daemon prio=10 tid=0x7f1580394000 nid=0x2cc19 runnable [0x43b4c000]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.LinkedList.remove(LinkedList.java:363)
>         at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639)
>         - locked <0x0002bb094ac8> (a java.util.Collections$SynchronizedList)
>         at org.apache.hadoop.hbase.ipc.RpcServer.closeConnection(RpcServer.java:1992)
>         - locked <0x0002bb094ac8> (a java.util.Collections$SynchronizedList)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:867)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:645)
>         - locked <0x0002bae09a30> (a org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:620)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> {noformat}
> This issue blocks the RS once in a while, and I have to restart it whenever 
> it happens. It seems like a bug. Any suggestions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove

2015-07-13 Thread Victor Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625914#comment-14625914
 ] 

Victor Xu commented on HBASE-14062:
---

A variety of applications use this HBase cluster, and they do not share the 
same client configuration or retry logic. The next time I come across this 
issue, I'll use tcpdump to find the guilty application. Thanks for your help, 
Heng Chen!



[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove

2015-07-13 Thread Victor Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625902#comment-14625902
 ] 

Victor Xu commented on HBASE-14062:
---

The RS log shows that the META table is located on that RS. My guess is that 
some applications use a very short client RPC timeout, or hold requests 
locally before actually sending them to this RS, so that by the time the 
requests reach the RS they have almost exceeded the timeout. When the clients 
retry, this request-and-fail loop continues. This could happen when a big job 
(tens of thousands of map tasks using TableInputFormat) starts.



[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove

2015-07-13 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625897#comment-14625897
 ] 

Heng Chen commented on HBASE-14062:
---

I think so.

What is your HBase client logic?



[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove

2015-07-13 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625882#comment-14625882
 ] 

Heng Chen commented on HBASE-14062:
---

So I think the lock is held because of the many exceptions thrown in doRead.

When an exception is thrown, doRead calls closeConnection, and closeConnection 
takes the lock.

With too many exceptions, the lock is almost always held by closeConnection, 
so doAccept is always waiting for it.
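
To illustrate the contention described above, here is a minimal, self-contained Java sketch. It does not use any HBase code: the `connectionList` variable, the thread names, and the latch that stands in for a slow `remove` are assumptions made for the illustration. It only shows that while one thread holds the monitor of a `Collections.synchronizedList`, another thread that needs the same monitor reports `BLOCKED`, which is the state jstack shows for the listener thread.

```java
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

final class MonitorContentionDemo {

    /** Returns the state observed for the "acceptor" while the "reader" holds the list monitor. */
    static Thread.State observeAcceptorState() {
        // Hypothetical stand-in for the server's connection list.
        List<String> connectionList = Collections.synchronizedList(new LinkedList<>());
        CountDownLatch readerHoldsLock = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread reader = new Thread(() -> {
            // The synchronized wrapper uses itself as the mutex, so this is the same monitor.
            synchronized (connectionList) {
                readerHoldsLock.countDown();
                try {
                    release.await(); // stands in for a long-running remove()
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "reader");

        // add() on the wrapper must enter the same monitor, so it has to wait.
        Thread acceptor = new Thread(() -> connectionList.add("new-connection"), "acceptor");

        try {
            reader.start();
            readerHoldsLock.await();
            acceptor.start();

            // Poll until the acceptor is parked on the monitor.
            Thread.State state = acceptor.getState();
            while (state != Thread.State.BLOCKED) {
                Thread.sleep(1);
                state = acceptor.getState();
            }

            release.countDown();
            reader.join();
            acceptor.join();
            return state;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("acceptor state while reader holds the lock: " + observeAcceptorState());
    }
}
```

If many readers take turns holding the monitor, the acceptor can spend most of its time in this state even though no single holder keeps the lock forever.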


Why are the exceptions thrown?



[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove

2015-07-13 Thread Victor Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625877#comment-14625877
 ] 

Victor Xu commented on HBASE-14062:
---

There might be lots of requests arriving together, with only 10 readers to 
handle them. Whenever a reader starts to read the data, the client quits. All 
readers are busy repeating this read/fail loop, so the lock seems to be always 
held, and other normal requests are blocked (or served slowly). Am I right?



[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove

2015-07-13 Thread Victor Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625860#comment-14625860
 ] 

Victor Xu commented on HBASE-14062:
---

Yes, you are right. Different threads held the same lock in different jstack 
outputs:
{noformat}
jstack.log-"RpcServer.reader=9,port=60020" daemon prio=10 tid=0x7f1580394000 nid=0x2cc19 runnable [0x43b4c000]
jstack.log-   java.lang.Thread.State: RUNNABLE
jstack.log-   at java.util.LinkedList.remove(LinkedList.java:363)
jstack.log-   at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639)
jstack.log:   - locked <0x0002bb094ac8> (a java.util.Collections$SynchronizedList)
jstack.log-   at org.apache.hadoop.hbase.ipc.RpcServer.closeConnection(RpcServer.java:1992)
jstack.log:   - locked <0x0002bb094ac8> (a java.util.Collections$SynchronizedList)
--
jstack.log.1-"RpcServer.reader=0,port=60020" daemon prio=10 tid=0x7f1580263000 nid=0x2cc10 runnable [0x43243000]
jstack.log.1-   java.lang.Thread.State: RUNNABLE
jstack.log.1-   at java.util.LinkedList.remove(LinkedList.java:363)
jstack.log.1-   at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639)
jstack.log.1:   - locked <0x0002bb094ac8> (a java.util.Collections$SynchronizedList)
jstack.log.1-   at org.apache.hadoop.hbase.ipc.RpcServer.closeConnection(RpcServer.java:1992)
jstack.log.1:   - locked <0x0002bb094ac8> (a java.util.Collections$SynchronizedList)
--
jstack.log.2-"RpcServer.reader=6,port=60020" daemon prio=10 tid=0x7f1580342800 nid=0x2cc16 runnable [0x43849000]
jstack.log.2-   java.lang.Thread.State: RUNNABLE
jstack.log.2-   at java.util.LinkedList.remove(LinkedList.java:363)
jstack.log.2-   at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639)
jstack.log.2:   - locked <0x0002bb094ac8> (a java.util.Collections$SynchronizedList)
jstack.log.2-   at org.apache.hadoop.hbase.ipc.RpcServer.closeConnection(RpcServer.java:1992)
jstack.log.2:   - locked <0x0002bb094ac8> (a java.util.Collections$SynchronizedList)
{noformat}
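
The same comparison can be scripted instead of grepped. The following is a rough sketch only: the class and method names (`LockOwners`, `relationsTo`) are made up for this example, and it assumes plain jstack text in which thread headers start with a quoted name and lock lines look like `- locked <0x...>` or `- waiting to lock <0x...>`, as in the dumps above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LockOwners {
    // A thread header line starts with the quoted thread name.
    private static final Pattern THREAD = Pattern.compile("^\"([^\"]+)\"");
    // Lock lines as printed by jstack for object monitors.
    private static final Pattern LOCK =
            Pattern.compile("- (locked|waiting to lock) <(0x[0-9a-fA-F]+)>");

    /** Maps thread name to "locked" or "waiting to lock" for the given monitor id. */
    static Map<String, String> relationsTo(String lockId, String dump) {
        Map<String, String> result = new LinkedHashMap<>();
        String currentThread = null;
        for (String line : dump.split("\\R")) {
            Matcher header = THREAD.matcher(line);
            if (header.find()) {
                currentThread = header.group(1);
                continue;
            }
            Matcher lock = LOCK.matcher(line);
            if (currentThread != null && lock.find() && lock.group(2).equals(lockId)) {
                // A thread may list the same monitor twice (re-entrant); keep the first entry.
                result.putIfAbsent(currentThread, lock.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String dump =
                "\"RpcServer.listener,port=60020\" daemon prio=10\n"
              + "   java.lang.Thread.State: BLOCKED (on object monitor)\n"
              + "\t- waiting to lock <0x0002bb094ac8> (a java.util.Collections$SynchronizedList)\n"
              + "\"RpcServer.reader=9,port=60020\" daemon prio=10\n"
              + "   java.lang.Thread.State: RUNNABLE\n"
              + "\t- locked <0x0002bb094ac8> (a java.util.Collections$SynchronizedList)\n";
        relationsTo("0x0002bb094ac8", dump)
                .forEach((thread, relation) -> System.out.println(thread + " -> " + relation));
    }
}
```

Running this over each dump in turn would show whether the holder of the monitor changes between dumps, which is the question being asked here.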



[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove

2015-07-13 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625792#comment-14625792
 ] 

Heng Chen commented on HBASE-14062:
---

In the other two jstack outputs, is the id of the thread that holds the lock 
the same as in the jstack you posted?


I noticed that in your region server's log there are a lot of exceptions 
thrown in the doRead function. 
The exception is caught and count is set to -1, so the connection is closed 
in the closeConnection function. 
And closeConnection acquires the lock on connectionList. 


PS: The exception is below. It seems the client closed the connection during 
the read; is that correct?

2015-07-13 05:42:12,735 WARN org.apache.hadoop.ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
        at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2310)
        at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1480)
        at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:854)
        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:645)
        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:620)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)



[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove

2015-07-13 Thread Victor Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625408#comment-14625408
 ] 

Victor Xu commented on HBASE-14062:
---

Thanks. What confuses me most is that the lock seems to be held by the 
java.util.LinkedList.remove method and never released: I took another two 
jstack outputs several minutes after the first one and still found the same 
lock id (<0x0002bb094ac8>), which would mean that LinkedList.remove never 
finished. Maybe a bug in the JVM?
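
One thing that can be checked without a JVM bug in the picture: `LinkedList.remove(Object)` walks the list from the head, so its cost grows with the number of entries, and the id printed by jstack identifies the monitor object rather than one particular acquisition. The sketch below is only an illustration under assumptions: `Conn` is a made-up stand-in for a connection object, and the `equals()` counter exists purely to make the scan visible. It compares the synchronized `LinkedList` seen in the stack trace with a hash-based concurrent set; it is not presented as a fix for HBase.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedList;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class RemoveCost {
    static final AtomicLong EQUALS_CALLS = new AtomicLong();

    /** Hypothetical stand-in for a server connection; counts equals() invocations. */
    static final class Conn {
        final int id;

        Conn(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            EQUALS_CALLS.incrementAndGet();
            return o instanceof Conn && ((Conn) o).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    /** Adds n connections, removes the last one added, and returns the equals() calls made by remove(). */
    static long equalsCallsToRemoveLast(Collection<Conn> connections, int n) {
        Conn last = null;
        for (int i = 0; i < n; i++) {
            last = new Conn(i);
            connections.add(last);
        }
        EQUALS_CALLS.set(0); // count only the work done by the removal itself
        connections.remove(last);
        return EQUALS_CALLS.get();
    }

    public static void main(String[] args) {
        int n = 100_000;
        long listCalls = equalsCallsToRemoveLast(
                Collections.synchronizedList(new LinkedList<Conn>()), n);
        long setCalls = equalsCallsToRemoveLast(ConcurrentHashMap.<Conn>newKeySet(), n);
        System.out.println("synchronized LinkedList: " + listCalls + " equals() calls");
        System.out.println("concurrent hash set:     " + setCalls + " equals() calls");
    }
}
```

With many open connections and a high rate of closes, each removal scans the list while holding the monitor, so a thread dump is likely to catch some reader inside `remove` even if every individual call finishes.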



[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove

2015-07-13 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624354#comment-14624354
 ] 

Heng Chen commented on HBASE-14062:
---

Your threads are blocked acquiring the connectionList lock in closeConnection, 
which is called from the doRead function.

But in doRead, closeConnection() is called only when the read returns -1 or 
throws an exception.

Can you post your regionserver's log?


PS: this is the doRead function:
void doRead(SelectionKey key) throws InterruptedException {
  int count = 0;
  Connection c = (Connection)key.attachment();
  if (c == null) {
    return;
  }
  c.setLastContact(System.currentTimeMillis());
  try {
    count = c.readAndProcess();
  } catch (InterruptedException ieo) {
    throw ieo;
  } catch (Exception e) {
    LOG.warn(getName() + ": count of bytes read: " + count, e);
    count = -1; //so that the (count < 0) block is executed
  }
  if (count < 0) {
    if (LOG.isDebugEnabled()) {
      LOG.debug(getName() + ": DISCONNECTING client " + c.toString() +
          " because read count=" + count +
          ". Number of active connections: " + numConnections);
    }
    closeConnection(c);
    // c = null;
  } else {
    c.setLastContact(System.currentTimeMillis());
  }
}
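
The decision made in the function above can be modeled in isolation. This is a simplified sketch, not HBase code: `DoReadFlow`, `Readable`, and `wouldClose` are names invented for the example, and logging and the real `Connection` type are left out. It shows that both a negative read count and any exception other than `InterruptedException` end up on the close path.

```java
import java.io.IOException;

final class DoReadFlow {

    /** Hypothetical stand-in for Connection.readAndProcess(). */
    interface Readable {
        int readAndProcess() throws Exception;
    }

    /** Returns true when the doRead logic would go on to call closeConnection. */
    static boolean wouldClose(Readable c) throws InterruptedException {
        int count = 0;
        try {
            count = c.readAndProcess();
        } catch (InterruptedException e) {
            throw e; // interruption is propagated, not treated as a failed read
        } catch (Exception e) {
            count = -1; // any other failure is folded into the "negative count" case
        }
        return count < 0;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(wouldClose(() -> 42)); // false: a normal read keeps the connection
        System.out.println(wouldClose(() -> -1)); // true: the read reported end of stream
        System.out.println(wouldClose(() -> {
            throw new IOException("Connection reset by peer");
        })); // true: the exception is turned into count = -1
    }
}
```

So a burst of clients that disconnect mid-read sends every affected reader through closeConnection, and therefore through the connectionList lock.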



