I started the HDFS balancer, but stopped it immediately after learning that it was not a good idea. That was around 3 weeks ago, though. Is it possible that it influenced the cluster behaviour I’m seeing now? Thanks.
> On 03 Sep 2015, at 14:23, Akmal Abbasov <[email protected]> wrote:
>
> Hi Ted,
> No there is no short-circuit read configured.
> The logs of datanode of the 10.10.8.55 are full of following messages
> 2015-09-03 12:03:56,324 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> /10.10.8.55:50010, dest: /10.10.8.53:58622, bytes: 77, op: HDFS_READ, cliID:
> DFSClient_NONMAPREDUCE_-483065515_1, offset: 0, srvID:
> ee7d0634-89a3-4ada-a8ad-7848214397be, blockid:
> BP-439084760-10.32.0.180-1387281790961:blk_1075349331_1612273, duration:
> 276448307
> 2015-09-03 12:03:56,494 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> /10.10.8.55:50010, dest: /10.10.8.53:58622, bytes: 538, op: HDFS_READ, cliID:
> DFSClient_NONMAPREDUCE_-483065515_1, offset: 0, srvID:
> ee7d0634-89a3-4ada-a8ad-7848214397be, blockid:
> BP-439084760-10.32.0.180-1387281790961:blk_1075349334_1612276, duration:
> 60550244
> 2015-09-03 12:03:59,561 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> /10.10.8.55:50010, dest: /10.10.8.53:58622, bytes: 455, op: HDFS_READ, cliID:
> DFSClient_NONMAPREDUCE_-483065515_1, offset: 0, srvID:
> ee7d0634-89a3-4ada-a8ad-7848214397be, blockid:
> BP-439084760-10.32.0.180-1387281790961:blk_1075351814_1614757, duration:
> 755613819
> There are >100.000 of them just for today. The situation with other
> regionservers are similar.
> Node 10.10.8.53 is hbase-master node, and the process on the port is also
> hbase-master.
> So if there is no load on the cluster, why there are so much IO happening?
> Any thoughts.
> Thanks.
>
>> On 02 Sep 2015, at 21:57, Ted Yu <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>> I assume you have enabled short-circuit read.
>>
>> Can you capture region server stack trace(s) and pastebin them ?
>>
>> Thanks
>>
>> On Wed, Sep 2, 2015 at 12:11 PM, Akmal Abbasov <[email protected]
>> <mailto:[email protected]>> wrote:
>> Hi Ted,
>> I’ve checked the time when addresses were changed, and this strange
>> behaviour started weeks before it.
>>
>> yes, 10.10.8.55 is region server and 10.10.8.54 is a hbase master.
>> any thoughts?
>>
>> Thanks
>>
>>> On 02 Sep 2015, at 18:45, Ted Yu <[email protected]
>>> <mailto:[email protected]>> wrote:
>>>
>>> bq. change the ip addresses of the cluster nodes
>>>
>>> Did this happen recently ? If high iowait was observed after the change
>>> (you can look at ganglia graph), there is a chance that the change was
>>> related.
>>>
>>> BTW I assume 10.10.8.55 <http://10.10.8.55:50010/> is where your region
>>> server resides.
>>>
>>> Cheers
>>>
>>> On Wed, Sep 2, 2015 at 9:39 AM, Akmal Abbasov <[email protected]
>>> <mailto:[email protected]>> wrote:
>>> Hi Ted,
>>> sorry forget to mention
>>>
>>>> release of hbase / hadoop you're using
>>>
>>> hbase hbase-0.98.7-hadoop2, hadoop hadoop-2.5.1
>>>
>>>> were region servers doing compaction ?
>>>
>>> I’ve run major compactions manually earlier today, but it seems that they
>>> already completed, looking at the compactionQueueSize.
>>>
>>>> have you checked region server logs ?
>>> The logs of datanode is full of this kind of messages
>>> 2015-09-02 16:37:06,950 INFO
>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>> /10.10.8.55:50010 <http://10.10.8.55:50010/>, dest: /10.10.8.54:32959
>>> <http://10.10.8.54:32959/>, bytes: 19673, op: HDFS_READ, cliID:
>>> DFSClient_NONMAPREDUCE_1225374853_1, offset: 0, srvID:
>>> ee7d0634-89a3-4ada-a8ad-7848217327be, blockid:
>>> BP-329084760-10.32.0.180-1387281790961:blk_1075277914_1540222, duration:
>>> 7881815
>>>
>>> p.s. we had to change the ip addresses of the cluster nodes, is it relevant?
>>>
>>> Thanks.
>>>
>>>> On 02 Sep 2015, at 18:20, Ted Yu <[email protected]
>>>> <mailto:[email protected]>> wrote:
>>>>
>>>> Please provide some more information:
>>>>
>>>> release of hbase / hadoop you're using
>>>> were region servers doing compaction ?
>>>> have you checked region server logs ?
>>>>
>>>> Thanks
>>>>
>>>> On Wed, Sep 2, 2015 at 9:11 AM, Akmal Abbasov <[email protected]
>>>> <mailto:[email protected]>> wrote:
>>>> Hi,
>>>> I’m having strange behaviour in hbase cluster. It is almost idle, only <5
>>>> puts and gets.
>>>> But the data in hdfs is increasing, and region servers have very high
>>>> iowait(>100, in 2 core CPU).
>>>> iotop shows that datanode process is reading and writing all the time.
>>>> Any suggestions?
>>>>
>>>> Thanks.
>>>>
>>>
>>>
>>
>>
>
