Hi there, 

The HBase RefGuide has a comprehensive case study on a similar situation.
This might not be the exact problem, but the diagnostic approach should help.

http://hbase.apache.org/book.html#casestudies.slownode
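
Before digging into the node itself, it may help to confirm which host is
actually slow for each table. The following is only a rough sketch in Python,
not something taken from the RefGuide: the function name `slow_hosts`, the
(host, seconds) input format, and the 1.5 threshold (taken from the 150%
figure in your mail) are all my own assumptions, and the durations would have
to be copied in by hand from the job's task list.

```python
from collections import defaultdict
from statistics import mean, median


def slow_hosts(task_times, threshold=1.5):
    """Return {host: ratio} for hosts whose mean map task time is at least
    `threshold` times the median over all tasks.

    `task_times` is a list of (host, seconds) pairs. This is a hypothetical
    format, filled in by hand from the job's task list.
    """
    if not task_times:
        return {}

    # Group task durations by the host that ran them.
    by_host = defaultdict(list)
    for host, seconds in task_times:
        by_host[host].append(seconds)

    # Use the median of all tasks as the baseline so that a single slow
    # host does not drag the baseline up with it.
    baseline = median(seconds for _, seconds in task_times)
    if baseline <= 0:
        return {}

    ratios = {host: mean(times) / baseline for host, times in by_host.items()}
    return {host: ratio for host, ratio in ratios.items() if ratio >= threshold}


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    sample = [("rs1", 100), ("rs1", 100), ("rs2", 200), ("rs2", 220),
              ("rs3", 100), ("rs3", 104), ("rs4", 96), ("rs4", 100)]
    print(slow_hosts(sample))
```

Running this once per table would show whether the slow host really changes
from table to table, which is the part of your report that rules out a simple
hardware fault.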





On 1/4/13 10:37 PM, "Liu, Raymond" <raymond....@intel.com> wrote:

>Hi
>
>I am seeing a weird issue where some map tasks lag behind:
>
>I have a small Hadoop/HBase cluster with 1 master node and 4 regionserver
>nodes. All have 16 CPUs, with the map and reduce slots set to 24.
>
>A few tables were created with regions distributed evenly across the
>regionserver nodes (say 16 regions per regionserver). Each region also
>has almost the same number of KVs, of very similar size. All tables have
>had major_compact run to ensure data locality.
>
>I have an MR job that simply does a local region scan in every map task
>(so 16 map tasks per regionserver node).
>
>In theory, every map task should finish in a similar time.
>
>But in practice, the map tasks for some regions on the same regionserver
>always lag far behind, taking 150%~250% of the average time of the other
>map tasks.
>
>If this happened to a single regionserver for every table, I would
>suspect a disk issue or something else degrading the performance of
>that regionserver.
>
>But the weird thing is that, although for any single table almost all
>the map tasks on one particular regionserver lag behind, the lagging
>regionserver is different for different tables! And the regions and
>region sizes are evenly distributed, which I have double-checked many
>times. (I even tried setting the replica count to 4 to ensure every node
>has a copy of the local data.)
>
>Say for table 1, all map tasks on regionserver node 2 are slow, while for
>table 2, maybe all map tasks on regionserver node 3 are slow. With table
>1, it is always regionserver node 2 that is slow, regardless of cluster
>restarts, and the slowest map task is always the very same one. It does
>not go away even if I run a major compaction again.
>
>So, could anyone give me a clue as to what might lead to this weird
>behavior? Any wild guess is welcome!
>
>(BTW, I did not see this issue a few days ago with the same tables. I
>did restart the cluster and make a few changes to the config files during
>that period, but restoring the config files did not help.)
>
>
>Best Regards,
>Raymond Liu
>
>

