Excuse me for my poor English... I meant that neither the M/R jobs nor the thrift servers would execute HBaseAdmin.tableExists...
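
For what it's worth, here is a minimal sketch (my own illustration, not the thrift server's actual code) of the kind of pattern Lars's HBASE-5073 pointer is about: a long-lived client that calls HBaseAdmin.tableExists() once per table and caches the answer, instead of asking on every request. The class name and table name are hypothetical.

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Sketch: cache tableExists() results so a long-running client does not
// re-check the table on every request.
public class TableExistsCache {
    private final HBaseAdmin admin;
    private final Map<String, Boolean> cache = new ConcurrentHashMap<String, Boolean>();

    public TableExistsCache(Configuration conf) throws IOException {
        this.admin = new HBaseAdmin(conf);
    }

    public boolean tableExists(String tableName) throws IOException {
        Boolean cached = cache.get(tableName);
        if (cached != null) {
            return cached.booleanValue();
        }
        boolean exists = admin.tableExists(tableName);
        cache.put(tableName, Boolean.valueOf(exists));
        return exists;
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        TableExistsCache check = new TableExistsCache(conf);
        // "my_table" is a hypothetical table name.
        System.out.println("exists: " + check.tableExists("my_table"));
    }
}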
2011/12/29 Yi Liang <[email protected]>

> Sorry, I forgot there's another kind of client process, the Java MapReduce
> jobs to write data. I don't restart them either. They're usually
> short-lived.
>
> I think either the M/R jobs or thrift servers would execute the
> HBaseAdmin.tableExists, because we use them only to do get or put
> operations. The M/R jobs are used to put and get data, the thrift servers
> are used to get rows of data. All tables were created once, and never
> altered/deleted any more.
>
>
> 2011/12/29 Yi Liang <[email protected]>
>
>> Lars, Ram:
>>
>> I don't restart client processes (in my case, they're thrift servers), I
>> only restart the master and rs. Do you mean I should also restart the
>> thrift servers?
>>
>> I'm now checking the code of the thrift server; it seems that it does use
>> HBaseAdmin.tableExists somewhere, like createTable() and deleteTable().
>>
>> Jinchao:
>> I don't see any clue when checking rs with jstack; which states/threads
>> should I check more carefully? When the problem occurs, we see bigger IO
>> than usual; the memory and network look ok.
>>
>> Thank you for your suggestions!
>> Yi
>>
>> On Wed, Dec 28, 2011 at 4:21 PM, Gaojinchao <[email protected]> wrote:
>>
>>> I think you need to check the thread dumps (client and RS) and
>>> resources (memory, IO and network) of your cluster.
>>>
>>> -----Original Message-----
>>> From: Lars H [mailto:[email protected]]
>>> Sent: December 28, 2011, 0:32
>>> To: [email protected]
>>> Cc: [email protected]
>>> Subject: Re: Read speed down after long running
>>>
>>> When you restart HBase are you also restarting the client process?
>>> Are you using HBaseAdmin.tableExists?
>>> If so you might be running into HBASE-5073
>>>
>>> -- Lars
>>>
>>> Yi Liang <[email protected]> schrieb:
>>>
>>> >Hi all,
>>> >
>>> >We're running hbase 0.90.3 for one read-intensive application.
>>> >
>>> >We find that after long running (2 weeks or 1 month or longer), the read
>>> >speed becomes much lower.
>>> >
>>> >For example, a get_rows operation of thrift to fetch 20 rows (about 4k
>>> >size every row) could take >2 seconds, sometimes even >5 seconds. When it
>>> >happens, we can see cpu_wio keeps at about 10.
>>> >
>>> >But if we restart hbase (only master and regionservers) with
>>> >stop-hbase.sh and start-hbase.sh, we can see the read speed back to
>>> >normal immediately, which is <200 ms for every get_rows operation, and
>>> >the cpu_wio drops to about 2.
>>> >
>>> >When the problem appears, there's no exception in logs, and no
>>> >flush/compaction, nothing abnormal except a few warning logs sometimes
>>> >like below:
>>> >2011-12-27 15:50:20,307 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
>>> >IPC Server handler 52 on 60020 took 1546 ms appending an edit to hlog;
>>> >editcount=1, len~=9.8k
>>> >
>>> >Our cluster has 10 region servers, each with 25g heap size, 64% of which
>>> >is used for cache. There are some m/r jobs that keep running in another
>>> >cluster to feed data into this hbase. Every night, we do flush and major
>>> >compaction. Usually there's no flush or compaction in the daytime.
>>> >
>>> >Could anybody explain why the read speed could become lower after long
>>> >running, and why it goes back to normal immediately after restarting hbase?
>>> >
>>> >Any advice will be highly appreciated.
>>> >
>>> >Thanks,
>>> >Yi
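
As an aside, here is a rough, hypothetical Java equivalent of the get_rows call described in the original report: scan 20 rows from a start key and time the round trip, so the result can be compared against the <200 ms baseline mentioned above. This is not the thrift server's actual implementation, and the table name and start key are made up.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: fetch 20 rows starting from a key and report how long it took.
public class TimedRowFetch {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "my_table");      // hypothetical table name
        Scan scan = new Scan(Bytes.toBytes("row-0001"));  // hypothetical start key
        scan.setCaching(20); // try to pull the 20 rows in a single RPC

        long start = System.currentTimeMillis();
        ResultScanner scanner = table.getScanner(scan);
        int count = 0;
        try {
            for (Result r : scanner) {
                count++;
                if (count >= 20) {
                    break;
                }
            }
        } finally {
            scanner.close();
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("fetched " + count + " rows in " + elapsed + " ms");
        table.close();
    }
}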
