On Tue, Jan 27, 2015 at 9:34 AM, Pun Intended <[email protected]> wrote:
> Hi There,
>
> Thank you for your reply. The tasks are processing a lot of data. Each RS
> is hosting 15 regions. I have a total of ~4000 regions.

4000/15 = ~270 servers?

> * The biggest ones are ~10Gb Snappy-compressed (I think that's pretty big
> and these are the slow ones). There are ~1000 of these.

It may take a while to churn through these, yeah, given say 3-5x compression.

> * Then there are about 2000 5Gb-compressed regions.
> * About 500 2-3Gb-compressed ones.
> * And about 500 0Gb ones. (:( not sure how these got created, maybe
> over-splitting at some point way back).

OK. We should clean these up, but they are probably not an issue at the moment.

> The distribution is not ideal. I can try to split the big ones and then
> merge the empty ones.

Your table is 'lumpy' because you disabled splitting?

> The longest running tasks are taking 2-3hrs and from initial observation
> are running on the RegionServers hosting the big regions.

That would make sense, given the default is one map task (input split) per
region. You can get stats by region. What sort of monitoring setup do you
have here? What do you see for the slow tasks? Are you doing lots of seeks
on these regions? Are they CPU-bound? What sort of MR job is it? Lots of
reading or writing?

> Unfortunately, I don't have very good data to see what the read rate on the
> RS hosting the .META. was when tasks were running faster awhile back, but I
> did decommission the node where it was running and the .META. automatically
> moved to another RS and the read requests there spiked up very high right
> after it moved.

A high read rate against hbase:meta is going to happen. How many MR tasks do
you have running at any one time? Each task on startup is going to go to
hbase:meta to figure out where the region it is to operate against is
located.

> * Do you think the size and number of regions may be the real issue here?

The issue being that your jobs are taking longer? It sounds like a few of
your tasks are taking longer than the others to complete, probably because
their regions are bigger than the rest. Is that so? Can you take a look at
recent runs to see which tasks are the stragglers and then figure out which
regions they were reading? If they are the big ones, then you could try
splitting those regions (since it seems you have automatic splitting
disabled) so their processing gets divided across more MR tasks. But perhaps
it is always a particular region and it has a really big row or some other
sort of anomaly that is taking time to process.
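If you go the manual route, the Java Admin API can drive both the splits and
the merges. A rough, untested sketch against the 1.0 client (the 8GB cutoff
is a made-up threshold, and the merge step is only indicated in a comment):

  import java.util.Map;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.ClusterStatus;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.RegionLoad;
  import org.apache.hadoop.hbase.ServerName;
  import org.apache.hadoop.hbase.client.Admin;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;

  public class SplitFatRegions {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      try (Connection conn = ConnectionFactory.createConnection(conf);
           Admin admin = conn.getAdmin()) {
        ClusterStatus status = admin.getClusterStatus();
        for (ServerName sn : status.getServers()) {
          Map<byte[], RegionLoad> loads = status.getLoad(sn).getRegionsLoad();
          for (RegionLoad rl : loads.values()) {
            // Made-up cutoff: split anything carrying more than ~8GB of
            // store files. The master picks the midpoint key for us.
            if (rl.getStorefileSizeMB() > 8 * 1024) {
              admin.splitRegion(rl.getName());
            }
            // For the empty regions you would instead collect adjacent pairs
            // and call admin.mergeRegions(encodedNameA, encodedNameB, false).
          }
        }
      }
    }
  }

splitRegion() is asynchronous, so let the splits and the compactions that
follow them settle before you start merging.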
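And on the hbase:meta read rate: if you want to convince yourself the
per-task location lookup is cheap, a quick probe like the below shows the
cost of a single lookup (again an untested sketch; the table name and row
key are placeholders, and the same call can be dropped into your map task
setup as a printout):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HRegionLocation;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.RegionLocator;
  import org.apache.hadoop.hbase.util.Bytes;

  public class MetaLookupProbe {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      try (Connection conn = ConnectionFactory.createConnection(conf);
           RegionLocator locator =
               conn.getRegionLocator(TableName.valueOf("my_table"))) {
        // This is the hbase:meta round trip each task makes on startup; the
        // client caches the location afterwards.
        long start = System.currentTimeMillis();
        HRegionLocation loc =
            locator.getRegionLocation(Bytes.toBytes("some-row-key"));
        System.out.println("lookup took "
            + (System.currentTimeMillis() - start) + "ms; region is on "
            + loc.getServerName());
      }
    }
  }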
> * Do you know if there is a way to host the .META. region on a dedicated
> machine?

There is not a means for doing this.

Is the high read rate to meta an actual problem?

St.Ack

> Thanks in advance!
>
>
> On Mon, Jan 26, 2015 at 11:40 PM, Stack <[email protected]> wrote:
>
> > On Sat, Jan 24, 2015 at 5:15 PM, Pun Intended <[email protected]>
> > wrote:
> >
> > > Hello,
> > >
> > > I have noticed lately that my apps started running longer.
> >
> > You are processing more data now?
> >
> > > The longest running tasks seem all to be requesting data from a single
> > > region server.
> >
> > Has the rate at which you access hbase:meta gone up since when the job
> > ran faster? Anything changed in your processing? Is the trip to
> > hbase:meta what is slowing your jobs? (You could add some printout in
> > your map task, but meta lookups should be fast, usually out of cache.)
> > How long do the tasks last? Are they short or long? If long running,
> > then they'll build up a local cache of locations and won't have to go to
> > hbase:meta.
> >
> > St.Ack
> >
> > > That region server read rate is very high in comparison to the read
> > > rate of all the other region servers (1000 reqs/sec vs 4-5 reqs/sec
> > > elsewhere). That region server has about the same number of regions as
> > > all the rest: 26-27 regions. Number of store files, total region size,
> > > everything else on the region server seems ok and in the same ranges as
> > > the rest of the region servers. The keys should be evenly distributed -
> > > randomly generated 38-digit numbers. I am doing a simple HBase scan
> > > from all my MR jobs.
> > >
> > > I'd appreciate any suggestions on what to look into or if you have any
> > > ideas how I can solve this issue.
> > >
> > > Thanks!
