Excuse me for my poor English... I meant that neither the M/R jobs nor the thrift servers would execute HBaseAdmin.tableExists...
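
For what it's worth, here is a minimal sketch (my own illustration, not the thrift server's actual code) of the kind of pattern Lars's HBASE-5073 pointer is about: a long-lived client that calls HBaseAdmin.tableExists() once per table and caches the answer, instead of asking on every request. The class name and table name are hypothetical.

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Sketch: cache tableExists() results so a long-running client does not
// re-check the table on every request.
public class TableExistsCache {
    private final HBaseAdmin admin;
    private final Map<String, Boolean> cache = new ConcurrentHashMap<String, Boolean>();

    public TableExistsCache(Configuration conf) throws IOException {
        this.admin = new HBaseAdmin(conf);
    }

    public boolean tableExists(String tableName) throws IOException {
        Boolean cached = cache.get(tableName);
        if (cached != null) {
            return cached.booleanValue();
        }
        boolean exists = admin.tableExists(tableName);
        cache.put(tableName, Boolean.valueOf(exists));
        return exists;
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        TableExistsCache check = new TableExistsCache(conf);
        // "my_table" is a hypothetical table name.
        System.out.println("exists: " + check.tableExists("my_table"));
    }
}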
2011/12/29 Yi Liang <[email protected]>

> Sorry, I forgot there's another kind of client process, the Java MapReduce
> jobs to write data. I don't restart them either. They're usually
> short-lived.
>
> I think either the M/R jobs or thrift servers would execute the
> HBaseAdmin.tableExists, because we use them only to do get or put
> operations. The M/R jobs are used to put and get data, the thrift servers
> are used to get rows of data. All tables were created once, and never
> altered/deleted any more.
>
>
> 2011/12/29 Yi Liang <[email protected]>
>
>> Lars, Ram:
>>
>> I don't restart client processes (in my case, they're thrift servers), I
>> only restart the master and rs. Do you mean I should also restart the
>> thrift servers?
>>
>> I'm now checking the code of the thrift server; it seems that it does use
>> HBaseAdmin.tableExists somewhere, like createTable() and deleteTable().
>>
>> Jinchao:
>> I don't see any clue when checking rs with jstack; which states/threads
>> should I check more carefully? When the problem occurs, we see bigger IO
>> than usual; the memory and network look ok.
>>
>> Thank you for your suggestions!
>> Yi
>>
>> On Wed, Dec 28, 2011 at 4:21 PM, Gaojinchao <[email protected]> wrote:
>>
>>> I think you need to check the thread dumps (client and RS) and
>>> resources (memory, IO and network) of your cluster.
>>>
>>> -----Original Message-----
>>> From: Lars H [mailto:[email protected]]
>>> Sent: December 28, 2011, 0:32
>>> To: [email protected]
>>> Cc: [email protected]
>>> Subject: Re: Read speed down after long running
>>>
>>> When you restart HBase are you also restarting the client process?
>>> Are you using HBaseAdmin.tableExists?
>>> If so you might be running into HBASE-5073
>>>
>>> -- Lars
>>>
>>> Yi Liang <[email protected]> schrieb:
>>>
>>> >Hi all,
>>> >
>>> >We're running hbase 0.90.3 for one read-intensive application.
>>> >
>>> >We find that after long running (2 weeks or 1 month or longer), the read
>>> >speed becomes much lower.
>>> >
>>> >For example, a get_rows operation of thrift to fetch 20 rows (about 4k
>>> >size every row) could take >2 seconds, sometimes even >5 seconds. When it
>>> >happens, we can see cpu_wio keeps at about 10.
>>> >
>>> >But if we restart hbase (only master and regionservers) with
>>> >stop-hbase.sh and start-hbase.sh, we can see the read speed back to
>>> >normal immediately, which is <200 ms for every get_rows operation, and
>>> >the cpu_wio drops to about 2.
>>> >
>>> >When the problem appears, there's no exception in logs, and no
>>> >flush/compaction, nothing abnormal except a few warning logs sometimes
>>> >like below:
>>> >2011-12-27 15:50:20,307 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
>>> >IPC Server handler 52 on 60020 took 1546 ms appending an edit to hlog;
>>> >editcount=1, len~=9.8k
>>> >
>>> >Our cluster has 10 region servers, each with 25g heap size, 64% of which
>>> >is used for cache. There are some m/r jobs that keep running in another
>>> >cluster to feed data into this hbase. Every night, we do flush and major
>>> >compaction. Usually there's no flush or compaction in the daytime.
>>> >
>>> >Could anybody explain why the read speed could become lower after long
>>> >running, and why it goes back to normal immediately after restarting hbase?
>>> >
>>> >Any advice will be highly appreciated.
>>> >
>>> >Thanks,
>>> >Yi
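
As an aside, here is a rough, hypothetical Java equivalent of the get_rows call described in the original report: scan 20 rows from a start key and time the round trip, so the result can be compared against the <200 ms baseline mentioned above. This is not the thrift server's actual implementation, and the table name and start key are made up.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: fetch 20 rows starting from a key and report how long it took.
public class TimedRowFetch {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "my_table");      // hypothetical table name
        Scan scan = new Scan(Bytes.toBytes("row-0001"));  // hypothetical start key
        scan.setCaching(20); // try to pull the 20 rows in a single RPC

        long start = System.currentTimeMillis();
        ResultScanner scanner = table.getScanner(scan);
        int count = 0;
        try {
            for (Result r : scanner) {
                count++;
                if (count >= 20) {
                    break;
                }
            }
        } finally {
            scanner.close();
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("fetched " + count + " rows in " + elapsed + " ms");
        table.close();
    }
}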
