Hi There,

Thank you for your reply. The tasks are processing a lot of data. Each RS
is hosting 15 regions, and I have a total of ~4000 regions.
 * The biggest regions are ~10 GB Snappy-compressed (I think that's pretty
big, and these are the slow ones). There are ~1000 of these.
 * Then there are about 2000 regions at ~5 GB compressed.
 * About 500 at 2-3 GB compressed.
 * And about 500 empty (0 GB) ones. :( Not sure how these got created;
maybe over-splitting at some point way back.

The distribution is not ideal. I can try to split the big ones and then
merge the empty ones.
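
In case it helps, here is roughly what I have in mind for that pass. It's
only a rough sketch assuming a 0.98/1.0-era Java client; the 8 GB threshold
and the class name are placeholders, and the merge call is left as a comment
since pairing up adjacent empty regions needs a bit more bookkeeping:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.RegionLoad;
import org.apache.hadoop.hbase.ServerLoad;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class RegionMaintenance {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      ClusterStatus status = admin.getClusterStatus();
      for (ServerName server : status.getServers()) {
        ServerLoad load = status.getLoad(server);
        for (RegionLoad region : load.getRegionsLoad().values()) {
          // Ask for a split on the big regions; the RegionServer picks the
          // split point itself (midkey of the largest store file).
          // The 8 GB threshold is just a guess, not something I've tuned.
          if (region.getStorefileSizeMB() > 8 * 1024) {
            admin.split(region.getName());
          }
          // The 0 GB regions would get collected here and online-merged in
          // adjacent pairs afterwards, e.g.:
          // admin.mergeRegions(encodedNameOfEmptyA, encodedNameOfEmptyB, false);
        }
      }
    } finally {
      admin.close();
    }
  }
}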


The longest-running tasks are taking 2-3 hours and, from my initial
observation, are running against the RegionServers hosting the big regions.

Unfortunately, I don't have very good data on what the read rate on the RS
hosting .META. was back when tasks were running faster. I did, however,
decommission the node it was running on; .META. automatically moved to
another RS, and the read requests there spiked very high right after the
move.
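
To get better numbers going forward, I'm planning to poll the per-server and
per-region read counters, something along these lines (again a rough sketch
assuming a 0.98/1.0-era client; the counters are cumulative since
RegionServer start, so I'd sample twice and diff to get a rate):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.RegionLoad;
import org.apache.hadoop.hbase.ServerLoad;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ReadRatePoller {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      ClusterStatus status = admin.getClusterStatus();
      for (ServerName server : status.getServers()) {
        ServerLoad load = status.getLoad(server);
        // Cumulative read requests served by this RegionServer since start.
        System.out.println(server.getHostname() + " reads="
            + load.getReadRequestsCount());
        // Per-region breakdown; hbase:meta shows up under whichever RS is
        // hosting it, so its read load can be tracked across moves.
        for (RegionLoad region : load.getRegionsLoad().values()) {
          System.out.println("  " + region.getNameAsString() + " reads="
              + region.getReadRequestsCount());
        }
      }
    } finally {
      admin.close();
    }
  }
}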


* Do you think the size and number of regions may be the real issue here?
* Do you know if there is a way to host the .META. region on a dedicated
machine?

Thanks in advance!


On Mon, Jan 26, 2015 at 11:40 PM, Stack <[email protected]> wrote:

> On Sat, Jan 24, 2015 at 5:15 PM, Pun Intended <[email protected]>
> wrote:
>
> > Hello,
> >
> > I have noticed lately that my apps started running longer.
>
>
> You are processing more data now?
>
>
>
> > The longest-running tasks all seem to be requesting data from a single
> > region server.
> >
>
> Has the rate at which you access hbase:meta gone up since the job ran
> faster? Has anything changed in your processing? Is the trip to hbase:meta
> what is slowing your jobs? (You could add some printout in your map task,
> but meta lookups should be fast, usually served out of cache.) How long do
> the tasks last? Are they short or long? If long-running, they'll build up
> a local cache of locations and won't have to go to hbase:meta.
>
> St.Ack
>
>
>
> > That region server's read rate is very high compared to the read rate of
> > all the other region servers (~1000 reqs/sec vs 4-5 reqs/sec elsewhere).
> > That region server has about the same number of regions as all the rest:
> > 26-27 regions. Number of store files, total region size, everything else
> > on the region server seems ok and in the same ranges as the rest of the
> > region servers. The keys should be evenly distributed - randomly
> > generated 38-digit numbers. I am doing a simple HBase scan from all my
> > MR jobs.
> >
> > I'd appreciate any suggestions on what to look into it or if you have any
> > ideas how I can solve this issue.
> >
> > Thanks!
> >
>
