Sorry!!! I sent that by mistake.

2017-01-03 23:48 GMT+08:00 Weizhan Zeng <[email protected]>:
> Machine IP:
>
> LF-HBASE-VENUS-149106.hadoop.jd.local
>
> jstack info: /data0/hbase-logs/46384.out
>
> 2017-01-03 23:44 GMT+08:00 Weizhan Zeng <[email protected]>:
>
>> My HBase version is 1.1.6 and my Hadoop version is 2.6.1. I have the jstack
>> info and can give it to you tomorrow after I arrive at my company.
>>
>> I guess the reason for "Too many open files" is too many store files. I
>> checked my monitoring and found storeFileCount is 33K, while the ulimit is
>> 65535. There seem to be so many store files because compaction was not
>> working.
>>
>> What confuses me is why the RS did not exit.
>>
>> 2017-01-03 23:05 GMT+08:00 Ted Yu <[email protected]>:
>>
>>> Switching to user@
>>>
>>> What version of hbase / hadoop are you using?
>>>
>>> Before issuing "kill -9", did you capture a stack trace of the region
>>> server process?
>>>
>>> Have you read 'Limits on Number of Files and Processes' under
>>> http://hbase.apache.org/book.html#basic.prerequisites ?
>>>
>>> On Tue, Jan 3, 2017 at 6:56 AM, Weizhan Zeng <[email protected]> wrote:
>>>
>>> > Hi guys,
>>> > I hit an issue on one of my RSs.
>>> > After a SocketException happened, it should have shut down, but after
>>> > 8 hours I found it was still alive and had to use kill -9 to end the
>>> > process.
>>> >
>>> > Here is my RegionServer log.
>>> >
>>> > At 01:58 AM, the SocketException happened:
>>> >
>>> > [2017-01-02T01:58:00.469+08:00] [INFO] hdfs.DFSClient : Exception in createBlockOutputStream java.net.SocketException: Too many open files
>>> >     at sun.nio.ch.Net.socket0(Native Method)
>>> >     at sun.nio.ch.Net.socket(Net.java:423)
>>> >     at sun.nio.ch.Net.socket(Net.java:416)
>>> >     at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:104)
>>> >
>>> > Also at 01:58 AM, the RegionServer aborted itself and began to close
>>> > regions:
>>> >
>>> > [2017-01-02T01:58:00.632+08:00] [INFO] regionserver.HRegionServer : aborting server HBASE-VENUS-149106.hadoop.local,16020,1482236933819
>>> > [2017-01-02T01:58:00.632+08:00] [INFO] client.ConnectionManager$HConnectionImplementation : Closing zookeeper sessionid=0x456f9b55fda457b
>>> > [2017-01-02T01:58:00.632+08:00] [INFO] regionserver.HStore : Closed f
>>> >
>>> > [2017-01-02T01:59:18.067+08:00] [INFO] regionserver.HRegionServer$MovedRegionsCleaner : Chore: MovedRegionsCleaner for region HBASE-VENUS-149106.hadoop.local,16020,1482236933819 was stopped
>>> > [2017-01-02T01:59:18.225+08:00] [INFO] regionserver.Replication : Normal source for cluster 1: Total replicated edits: 39081044, currently replicating from: hdfs://venus/hbase/oldWALs/HBASE-VENUS-149106.hadoop.local%2C16020%2C1482236933819.default.1483293299516 at position: 0
>>> >
>>> > [2017-01-02T01:59:18.225+08:00] [INFO] regionserver.Replication : Sink: age in ms of last applied edit: 0, total replicated edits: 160769427
>>> >
>>> > After one hour, it was still logging:
>>> >
>>> > [2017-01-02T02:04:18.225+08:00] [INFO] regionserver.Replication : Normal source for cluster 1: Total replicated edits: 39081044, currently replicating from: hdfs://venus/hbase/oldWALs/HBASE-VENUS-149106.hadoop.local%2C16020%2C1482236933819.default.1483293299516 at position: 0
>>> >
>>> > At 8 AM:
>>> >
>>> > [2017-01-02T08:09:18.225+08:00] [INFO] regionserver.Replication : Sink: age in ms of last applied edit: 0, total replicated edits: 160769427
>>> > [2017-01-02T08:14:18.225+08:00] [INFO] regionserver.Replication : Normal source for cluster 1: Total replicated edits: 39081044, currently replicating
>>> >
>>> > Can anyone give me some tips for tracking this down? Thanks.
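
For anyone comparing a storeFileCount against the ulimit as in the thread above, here is a rough sketch of how the real descriptor count of a process could be checked instead of inferred. It assumes Linux with /proc available and enough permission to read the target process's entries. The function names (parse_max_open_files, count_open_fds, fd_usage) are made up for illustration and are not part of HBase or Hadoop.

```python
import os


def parse_max_open_files(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None if the line is missing or the limit is 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: Max, open, files, <soft>, <hard>, <units>
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    return None


def count_open_fds(pid):
    """Count the descriptors a process holds (Linux only, needs permission)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_usage(open_fds, soft_limit):
    """Fraction of the soft limit currently in use."""
    return open_fds / soft_limit
```

With the numbers quoted in the thread, fd_usage(33000, 65535) comes out at roughly 0.50, so the store file count by itself is only about half the limit. Sockets count against the same limit (the exception above was raised while creating one), so the figure from count_open_fds(pid) for the region server process is the one worth comparing with the limit read from /proc/<pid>/limits.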
