Is your HDFS healthy (fsck /)? Same for hbase hbck?
What's your replication level? Can you see constant network use as well?
Anything that might be triggered by the hbasemaster? (something like a
virtually dead RS, due to a ZK race condition, etc.)
Your 3-weeks-ago balancer shouldn't have any effect if you ran a major
compaction, successfully, yesterday.

On 3 September 2015 at 16:32, Akmal Abbasov <[email protected]> wrote:

> I’ve started HDFS balancer, but then stopped it immediately after knowing
> that it is not a good idea.
> but it was around 3 weeks ago, is it possible that it had an influence on
> the cluster behaviour I’m having now?
> Thanks.
>
> On 03 Sep 2015, at 14:23, Akmal Abbasov <[email protected]> wrote:
>
> Hi Ted,
> No there is no short-circuit read configured.
> The logs of datanode of the 10.10.8.55 are full of following messages
> 2015-09-03 12:03:56,324 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 10.10.8.55:50010, dest: /10.10.8.53:58622, bytes: 77, op: HDFS_READ,
> cliID: DFSClient_NONMAPREDUCE_-483065515_1, offset: 0, srvID:
> ee7d0634-89a3-4ada-a8ad-7848214397be, blockid:
> BP-439084760-10.32.0.180-1387281790961:blk_1075349331_1612273, duration:
> 276448307
> 2015-09-03 12:03:56,494 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 10.10.8.55:50010, dest: /10.10.8.53:58622, bytes: 538, op: HDFS_READ,
> cliID: DFSClient_NONMAPREDUCE_-483065515_1, offset: 0, srvID:
> ee7d0634-89a3-4ada-a8ad-7848214397be, blockid:
> BP-439084760-10.32.0.180-1387281790961:blk_1075349334_1612276, duration:
> 60550244
> 2015-09-03 12:03:59,561 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 10.10.8.55:50010, dest: /10.10.8.53:58622, bytes: 455, op: HDFS_READ,
> cliID: DFSClient_NONMAPREDUCE_-483065515_1, offset: 0, srvID:
> ee7d0634-89a3-4ada-a8ad-7848214397be, blockid:
> BP-439084760-10.32.0.180-1387281790961:blk_1075351814_1614757, duration:
> 755613819
> There are >100.000 of them just for today. The situation with other
> regionservers are similar.
> Node 10.10.8.53 is hbase-master node, and the process on the port is also
> hbase-master.
> So if there is no load on the cluster, why there are so much IO happening?
> Any thoughts.
> Thanks.
>
> On 02 Sep 2015, at 21:57, Ted Yu <[email protected]> wrote:
>
> I assume you have enabled short-circuit read.
>
> Can you capture region server stack trace(s) and pastebin them ?
>
> Thanks
>
> On Wed, Sep 2, 2015 at 12:11 PM, Akmal Abbasov <[email protected]>
> wrote:
>
>> Hi Ted,
>> I’ve checked the time when addresses were changed, and this strange
>> behaviour started weeks before it.
>>
>> yes, 10.10.8.55 is region server and 10.10.8.54 is a hbase master.
>> any thoughts?
>>
>> Thanks
>>
>> On 02 Sep 2015, at 18:45, Ted Yu <[email protected]> wrote:
>>
>> bq. change the ip addresses of the cluster nodes
>>
>> Did this happen recently ? If high iowait was observed after the change
>> (you can look at ganglia graph), there is a chance that the change was
>> related.
>>
>> BTW I assume 10.10.8.55 <http://10.10.8.55:50010/> is where your region
>> server resides.
>>
>> Cheers
>>
>> On Wed, Sep 2, 2015 at 9:39 AM, Akmal Abbasov <[email protected]>
>> wrote:
>>
>>> Hi Ted,
>>> sorry forget to mention
>>>
>>> release of hbase / hadoop you're using
>>>
>>> hbase hbase-0.98.7-hadoop2, hadoop hadoop-2.5.1
>>>
>>> were region servers doing compaction ?
>>>
>>> I’ve run major compactions manually earlier today, but it seems that
>>> they already completed, looking at the compactionQueueSize.
>>>
>>> have you checked region server logs ?
>>>
>>> The logs of datanode is full of this kind of messages
>>> 2015-09-02 16:37:06,950 INFO
>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
>>> 10.10.8.55:50010, dest: /10.10.8.54:32959, bytes: 19673, op: HDFS_READ,
>>> cliID: DFSClient_NONMAPREDUCE_1225374853_1, offset: 0, srvID:
>>> ee7d0634-89a3-4ada-a8ad-7848217327be, blockid:
>>> BP-329084760-10.32.0.180-1387281790961:blk_1075277914_1540222, duration:
>>> 7881815
>>>
>>> p.s. we had to change the ip addresses of the cluster nodes, is it
>>> relevant?
>>>
>>> Thanks.
>>>
>>> On 02 Sep 2015, at 18:20, Ted Yu <[email protected]> wrote:
>>>
>>> Please provide some more information:
>>>
>>> release of hbase / hadoop you're using
>>> were region servers doing compaction ?
>>> have you checked region server logs ?
>>>
>>> Thanks
>>>
>>> On Wed, Sep 2, 2015 at 9:11 AM, Akmal Abbasov <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>> I’m having strange behaviour in hbase cluster. It is almost idle, only
>>>> <5 puts and gets.
>>>> But the data in hdfs is increasing, and region servers have very high
>>>> iowait(>100, in 2 core CPU).
>>>> iotop shows that datanode process is reading and writing all the time.
>>>> Any suggestions?
>>>>
>>>> Thanks.
>>>
>>
>

--
*Adrien Mogenet*
Head of Backend/Infrastructure
[email protected]
(+33)6.59.16.64.22
http://www.contentsquare.com
50, avenue Montaigne - 75008 Paris
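
P.S. To put numbers on the ">100.000 of them just for today" observation, something like the following might help. It is only a rough sketch, not something anyone in this thread ran: it tallies DataNode clienttrace lines per destination host and operation, so you can see which client is doing the reads and how many bytes it pulls. The regex is an assumption based solely on the log lines quoted in this thread, and the names CLIENTTRACE, summarize and SAMPLE are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, taken only from the clienttrace lines quoted in this thread:
#   ... src: /IP:PORT, dest: /IP:PORT, bytes: N, op: HDFS_READ, ...
# "\s*" after the slash tolerates the line wrap the mail client inserted.
CLIENTTRACE = re.compile(
    r"src: /\s*(?P<src>[\d.]+):\d+, dest: /\s*(?P<dest>[\d.]+):\d+, "
    r"bytes: (?P<bytes>\d+), op: (?P<op>\w+)"
)


def summarize(lines):
    """Return (request counts, byte totals), both keyed by (dest host, op)."""
    counts = Counter()
    total_bytes = Counter()
    for line in lines:
        match = CLIENTTRACE.search(line)
        if match is None:
            continue  # not a clienttrace line, or a layout this sketch does not know
        key = (match["dest"], match["op"])
        counts[key] += 1
        total_bytes[key] += int(match["bytes"])
    return counts, total_bytes


# Two lines abridged from the ones quoted in the thread (trailing fields dropped).
SAMPLE = [
    "2015-09-03 12:03:56,324 INFO org.apache.hadoop.hdfs.server.datanode."
    "DataNode.clienttrace: src: /10.10.8.55:50010, dest: /10.10.8.53:58622, "
    "bytes: 77, op: HDFS_READ, cliID: DFSClient_NONMAPREDUCE_-483065515_1",
    "2015-09-03 12:03:56,494 INFO org.apache.hadoop.hdfs.server.datanode."
    "DataNode.clienttrace: src: /10.10.8.55:50010, dest: /10.10.8.53:58622, "
    "bytes: 538, op: HDFS_READ, cliID: DFSClient_NONMAPREDUCE_-483065515_1",
]

if __name__ == "__main__":
    counts, total_bytes = summarize(SAMPLE)
    for key, n in counts.most_common():
        print(key, n, "requests,", total_bytes[key], "bytes")
```

To run it over a real DataNode log, pass an open file object to summarize() instead of SAMPLE. If nearly all requests come from a single dest and the byte counts stay as small as in the quoted lines, that points at one client repeatedly opening small files rather than at compaction or balancer traffic.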
