Since a custom InputFormat was used, I assume you have verified that the
map tasks ran on the region servers hosting the regions being scanned.

If you are doing aggregation through this MR job, you could consider using
the AggregateProtocol.
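
For example, a server-side row count under that approach might look roughly like the sketch below (written against the 0.94-era client API; the table name "mytable" and column family "cf" are placeholders, and it assumes the AggregateImplementation coprocessor has been loaded on the region servers via hbase.coprocessor.region.classes):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
import org.apache.hadoop.hbase.util.Bytes;

public class RowCountSketch {
  public static void main(String[] args) throws Throwable {
    Configuration conf = HBaseConfiguration.create();
    // AggregationClient fans the computation out to the regions, so the
    // aggregation runs where the data lives instead of in map tasks.
    AggregationClient aggClient = new AggregationClient(conf);
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("cf"));      // placeholder column family
    long count = aggClient.rowCount(
        Bytes.toBytes("mytable"),             // placeholder table name
        new LongColumnInterpreter(), scan);
    System.out.println("rows: " + count);
  }
}
```

This needs a running cluster with the coprocessor registered; it is meant as a rough starting point, not a drop-in replacement for your job.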

Cheers

On Fri, Jan 4, 2013 at 8:08 PM, Liu, Raymond <raymond....@intel.com> wrote:

> Hi Ted
>
> Thanks for your reply
>
> >
> > Did you use TableInputFormat in your MR job ?
> No, a custom one which does the same split work, but the input for each map
> task is the split, and the map task opens an HTable and reads the specific
> region by itself.
>
> > Did you use the one from mapred or mapreduce ?
> All related stuff is from mapreduce.
>
> >
> > What version of HBase are you using ?
> 0.94.1
>
> >
> > Did you take a look at Ganglia to see if there is any bottleneck in your
> cluster ?
>
> I didn't, but I did check CPU and disk usage simply with dstat -cdnm; no
> CPU, disk, or network IO bottleneck was observed.
>
> >
> > You mentioned a few changes to the config file shortly before this problem
> > appeared; can you let us know which parameters you modified?
>
> Mainly increasing dfs.datanode.handler.count /
> hbase.regionserver.handler.count from the defaults to around 30, etc. This
> was done on every node, and I changed them back later. Hmm...
>
>
> >
> > Cheers
> >
> > On Fri, Jan 4, 2013 at 7:37 PM, Liu, Raymond <raymond....@intel.com>
> wrote:
> >
> > > Hi
> > >
> > > I am encountering a weird lagging map task issue here:
> > >
> > > I have a small hadoop/hbase cluster with 1 master node and 4
> > > regionserver nodes; each has 16 CPUs, with map and reduce slots set to 24.
> > >
> > > A few tables are created with regions distributed evenly across the
> > > region nodes (say 16 regions per region server). Each region also has
> > > almost the same number of KVs of very similar size. All tables had
> > > major_compact done to ensure data locality.
> > >
> > > I have an MR job which simply does a local region scan in every map task
> > > (so 16 map tasks for each regionserver node).
> > >
> > > In theory, every map task should finish in a similar time.
> > >
> > > But in reality, some regions on the same region server always lag
> > > behind a lot, say taking 150~250% of the other map tasks' average time.
> > >
> > > If this happened to the same single region server for every table, I
> > > might suspect a disk issue or some other reason that brings down the
> > > performance of that region server.
> > >
> > > But the weird thing is that, for each single table, almost all the map
> > > tasks on the same single regionserver lag behind; yet for different
> > > tables, the lagging regionserver is different! And the regions and
> > > region sizes are distributed evenly, which I double checked many
> > > times. (I even tried setting the replica count to 4 to ensure every
> > > node has a copy of the local data.)
> > >
> > > Say for table 1, all map tasks on regionserver node 2 are slow, while
> > > for table 2, maybe all map tasks on regionserver node 3 are slow. For
> > > table 1, it will always be regionserver node 2 which is slow
> > > regardless of cluster restarts, and the slowest map task will always be
> > > the very same one. It won't go away even if I do major compaction
> > > again.....
> > >
> > > So, could anyone give me some clue on what reason might possibly lead
> > > to this weird behavior? Any wild guess is welcome!
> > >
> > > (BTW, I didn't encounter this issue a few days ago with the same tables.
> > > I did restart the cluster and make a few changes to the config file
> > > during that period, but restoring the config file didn't help.)
> > >
> > >
> > > Best Regards,
> > > Raymond Liu
> > >
> > >
>
