Switching to user@

What version of HBase / Hadoop are you using?
Before issuing "kill -9", did you capture a stack trace of the region server process?

Have you read 'Limits on Number of Files and Processes' under http://hbase.apache.org/book.html#basic.prerequisites ?

On Tue, Jan 3, 2017 at 6:56 AM, Weizhan Zeng <[email protected]> wrote:

> Hi guys:
> I met an issue on one of my RS.
> After a SocketException happened, it should have shut down, but after 8 hours I
> found it still alive and used kill -9 to end the process.
>
> Here is my RegionServer log:
>
> At 01:58 AM, the SocketException happened:
>
> [2017-01-02T01:58:00.469+08:00] [INFO] hdfs.DFSClient : Exception in createBlockOutputStream java.net.SocketException: Too many open files
>     at sun.nio.ch.Net.socket0(Native Method)
>     at sun.nio.ch.Net.socket(Net.java:423)
>     at sun.nio.ch.Net.socket(Net.java:416)
>     at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:104)
>
> Also at 01:58 AM, the RegionServer aborted itself and began to close regions:
>
> [2017-01-02T01:58:00.632+08:00] [INFO] regionserver.HRegionServer : aborting server HBASE-VENUS-149106.hadoop.local,16020,1482236933819
> [2017-01-02T01:58:00.632+08:00] [INFO] client.ConnectionManager$HConnectionImplementation : Closing zookeeper sessionid=0x456f9b55fda457b
> [2017-01-02T01:58:00.632+08:00] [INFO] regionserver.HStore : Closed f
>
> [2017-01-02T01:59:18.067+08:00] [INFO] regionserver.HRegionServer$MovedRegionsCleaner : Chore: MovedRegionsCleaner for region HBASE-VENUS-149106.hadoop.local,16020,1482236933819 was stopped
> [2017-01-02T01:59:18.225+08:00] [INFO] regionserver.Replication : Normal source for cluster 1: Total replicated edits: 39081044, currently replicating from: hdfs://venus/hbase/oldWALs/HBASE-VENUS-149106.hadoop.local%2C16020%2C1482236933819.default.1483293299516 at position: 0
> [2017-01-02T01:59:18.225+08:00] [INFO] regionserver.Replication : Sink: age in ms of last applied edit: 0, total replicated edits: 160769427
>
> After one hour, it was still logging:
>
> [2017-01-02T02:04:18.225+08:00] [INFO] regionserver.Replication : Normal source for cluster 1: Total replicated edits: 39081044, currently replicating from: hdfs://venus/hbase/oldWALs/HBASE-VENUS-149106.hadoop.local%2C16020%2C1482236933819.default.1483293299516 at position: 0
>
> At 8 AM:
>
> [2017-01-02T08:09:18.225+08:00] [INFO] regionserver.Replication : Sink: age in ms of last applied edit: 0, total replicated edits: 160769427
> [2017-01-02T08:14:18.225+08:00] [INFO] regionserver.Replication : Normal source for cluster 1: Total replicated edits: 39081044, currently replicating
>
> Can anyone give me some tips to track this down? Thanks.
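
Since the trace shows "Too many open files", it may help to compare the number of descriptors the region server holds against its limit. Below is a minimal sketch, assuming a Linux host with /proc mounted; the function names are made up for illustration and are not part of HBase or Hadoop.

```python
import os


def parse_max_open_files(limits_text):
    """Return (soft, hard) 'Max open files' limits from /proc/<pid>/limits text.

    A limit reported as 'unlimited' is returned as None.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max open files  <soft>  <hard>  files"
            fields = line.split()
            soft, hard = fields[3], fields[4]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' line found")


def count_open_fds(pid):
    """Count descriptors currently open in the process (Linux only).

    Reading another user's process usually requires running as that user or root.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_usage(pid):
    """Return (open_count, soft_limit, hard_limit) for the given pid."""
    with open(f"/proc/{pid}/limits") as f:
        soft, hard = parse_max_open_files(f.read())
    return count_open_fds(pid), soft, hard
```

Calling fd_usage() with the region server's pid, as the user that runs HBase, returns the current descriptor count next to the soft and hard limits. A count close to the soft limit would be consistent with the SocketException in the log above.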
