Re: High iowait in idle hbase cluster

Adrien Mogenet Thu, 03 Sep 2015 08:56:00 -0700

Is your HDFS healthy (hdfs fsck /)?

Same for hbase hbck?


What's your replication level?

Can you see constant network use as well?

Anything that might be triggered by the HBase master (something like a
virtually dead RS due to a ZK race condition, etc.)?

Your balancer run from 3 weeks ago shouldn't have any effect if you ran a
major compaction successfully yesterday.
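
To see who is actually driving those reads, it may help to aggregate the
clienttrace lines quoted below by destination and client ID. This is only a
rough sketch: the regex is based on the few log lines pasted in this thread
(assuming each entry is a single line in the real datanode log), and the names
TRACE and summarize are mine, not anything from Hadoop.

```python
import re
from collections import Counter

# Field layout taken from the clienttrace lines quoted in this thread
# (assumption: one entry per line in the actual datanode log file).
TRACE = re.compile(
    r"src: /(?P<src>[\d.]+):\d+, dest: /(?P<dest>[\d.]+):\d+, "
    r"bytes: (?P<bytes>\d+), op: (?P<op>\w+), cliID: (?P<cli>[^,\s]+),"
)


def summarize(lines):
    """Count operations and sum bytes per (dest IP, client ID, op)."""
    ops, volume = Counter(), Counter()
    for line in lines:
        m = TRACE.search(line)
        if m is None:
            continue  # not a clienttrace entry
        key = (m.group("dest"), m.group("cli"), m.group("op"))
        ops[key] += 1
        volume[key] += int(m.group("bytes"))
    return ops, volume
```

Feeding it the open datanode log file (any iterable of lines works) and
printing ops.most_common(10) should show whether a single client, such as the
master's DFSClient, accounts for most of the entries.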

On 3 September 2015 at 16:32, Akmal Abbasov <[email protected]>
wrote:

> I started the HDFS balancer, but stopped it immediately after learning
> that it is not a good idea.
> That was around 3 weeks ago, though. Is it possible that it influenced
> the cluster behaviour I’m seeing now?
> Thanks.
>
> On 03 Sep 2015, at 14:23, Akmal Abbasov <[email protected]> wrote:
>
> Hi Ted,
> No, there is no short-circuit read configured.
> The datanode logs on 10.10.8.55 are full of the following messages:
> 2015-09-03 12:03:56,324 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 10.10.8.55:50010, dest: /10.10.8.53:58622, bytes: 77, op: HDFS_READ,
> cliID: DFSClient_NONMAPREDUCE_-483065515_1, offset: 0, srvID:
> ee7d0634-89a3-4ada-a8ad-7848214397be, blockid:
> BP-439084760-10.32.0.180-1387281790961:blk_1075349331_1612273, duration:
> 276448307
> 2015-09-03 12:03:56,494 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 10.10.8.55:50010, dest: /10.10.8.53:58622, bytes: 538, op: HDFS_READ,
> cliID: DFSClient_NONMAPREDUCE_-483065515_1, offset: 0, srvID:
> ee7d0634-89a3-4ada-a8ad-7848214397be, blockid:
> BP-439084760-10.32.0.180-1387281790961:blk_1075349334_1612276, duration:
> 60550244
> 2015-09-03 12:03:59,561 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 10.10.8.55:50010, dest: /10.10.8.53:58622, bytes: 455, op: HDFS_READ,
> cliID: DFSClient_NONMAPREDUCE_-483065515_1, offset: 0, srvID:
> ee7d0634-89a3-4ada-a8ad-7848214397be, blockid:
> BP-439084760-10.32.0.180-1387281790961:blk_1075351814_1614757, duration:
> 755613819
> There are more than 100,000 of them just for today. The situation on the
> other region servers is similar.
> Node 10.10.8.53 is the hbase-master node, and the process on that port is
> also hbase-master.
> So if there is no load on the cluster, why is there so much I/O happening?
> Any thoughts?
> Thanks.
>
> On 02 Sep 2015, at 21:57, Ted Yu <[email protected]> wrote:
>
> I assume you have enabled short-circuit read.
>
> Can you capture region server stack trace(s) and pastebin them ?
>
> Thanks
>
> On Wed, Sep 2, 2015 at 12:11 PM, Akmal Abbasov <[email protected]>
> wrote:
>
>> Hi Ted,
>> I’ve checked when the addresses were changed, and this strange
>> behaviour started weeks before that.
>>
>> Yes, 10.10.8.55 is a region server and 10.10.8.54 is an HBase master.
>> Any thoughts?
>>
>> Thanks
>>
>> On 02 Sep 2015, at 18:45, Ted Yu <[email protected]> wrote:
>>
>> bq. change the ip addresses of the cluster nodes
>>
>> Did this happen recently ? If high iowait was observed after the change
>> (you can look at ganglia graph), there is a chance that the change was
>> related.
>>
>> BTW I assume 10.10.8.55 <http://10.10.8.55:50010/> is where your region
>> server resides.
>>
>> Cheers
>>
>> On Wed, Sep 2, 2015 at 9:39 AM, Akmal Abbasov <[email protected]>
>> wrote:
>>
>>> Hi Ted,
>>> Sorry, I forgot to mention:
>>>
>>> release of hbase / hadoop you're using
>>>
>>> hbase hbase-0.98.7-hadoop2, hadoop hadoop-2.5.1
>>>
>>> were region servers doing compaction ?
>>>
>>> I’ve run major compactions manually earlier today, but it seems that
>>> they already completed, looking at the compactionQueueSize.
>>>
>>> have you checked region server logs ?
>>>
>>> The datanode logs are full of this kind of message:
>>> 2015-09-02 16:37:06,950 INFO
>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
>>> 10.10.8.55:50010, dest: /10.10.8.54:32959, bytes: 19673, op: HDFS_READ,
>>> cliID: DFSClient_NONMAPREDUCE_1225374853_1, offset: 0, srvID:
>>> ee7d0634-89a3-4ada-a8ad-7848217327be, blockid:
>>> BP-329084760-10.32.0.180-1387281790961:blk_1075277914_1540222, duration:
>>> 7881815
>>>
>>> P.S. We had to change the IP addresses of the cluster nodes. Is that
>>> relevant?
>>>
>>> Thanks.
>>>
>>> On 02 Sep 2015, at 18:20, Ted Yu <[email protected]> wrote:
>>>
>>> Please provide some more information:
>>>
>>> release of hbase / hadoop you're using
>>> were region servers doing compaction ?
>>> have you checked region server logs ?
>>>
>>> Thanks
>>>
>>> On Wed, Sep 2, 2015 at 9:11 AM, Akmal Abbasov <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>> I’m seeing strange behaviour in an HBase cluster. It is almost idle,
>>>> with only <5 puts and gets.
>>>> But the data in HDFS keeps growing, and the region servers have very
>>>> high iowait (>100, on a 2-core CPU).
>>>> iotop shows that the datanode process is reading and writing all the time.
>>>> Any suggestions?
>>>>
>>>> Thanks.
>>>
>>>
>>>
>>>
>>
>>
>
>
>


-- 

*Adrien Mogenet*
Head of Backend/Infrastructure
[email protected]
(+33)6.59.16.64.22
http://www.contentsquare.com
50, avenue Montaigne - 75008 Paris
