bq. The 4000 keys are likely contiguous and therefore probably represent entire regions
In that case you can convert the multi-gets to a Scan with a proper batch size and start/stop rows (a sketch follows below the quoted thread).

Cheers

On Wed, Feb 25, 2015 at 10:16 AM, Ted Tuttle <[email protected]> wrote:

> Heaps are 16G w/ hfile.block.cache.size = 0.5
>
> Machines have 32G onboard and we used to run w/ 24G heaps but reduced them to lower GC times.
>
> Not so sure about which regions were hot. And I don't want to repeat and take down my cluster again :)
>
> What I know:
>
> 1) The request was about 4000 gets.
> 2) The 4000 keys are likely contiguous and therefore probably represent entire regions.
> 3) Once we batched the gets (so as not to kill the cluster) the result was >10G of data in the client. We blew the heap there :(
> 4) Our regions are 10G (hbase.hregion.max.filesize = 10737418240).
>
> Distributing these keys via salting is not in our best interest, as we typically do these types of timeseries queries (though only recently at this scale).
>
> I think I understand the failure mode; I guess I am just surprised that a greedy client can kill the cluster and that we are required to batch our gets in order to protect the cluster.
>
> From: Nick Dimiduk [mailto:[email protected]]
> Sent: Wednesday, February 25, 2015 9:40 AM
> To: hbase-user
> Cc: Ted Yu; Development
> Subject: Re: Table.get(List<Get>) overwhelms several RSs
>
> How large is your region server heap? What's your setting for hfile.block.cache.size? Can you identify which region is being burned up (i.e., is it META?)
>
> It is possible for a hot region to act as a "death pill" that roams around the cluster. We see this with the meta region with poorly-behaved clients.
>
> -n
>
> On Wed, Feb 25, 2015 at 8:38 AM, Ted Tuttle <[email protected]> wrote:
>
> Hard to say how balanced the table is.
>
> We have a mixed requirement where we want some locality for timeseries queries against "clusters" of information. However, the "clusters" in a table should be well distributed if the dataset is large enough.
>
> The query in question killed 5 RSs, so I am inferring either:
>
> 1) the table was spread across these 5 RSs
> 2) the query moved around on the cluster as RSs failed
>
> Perhaps you could tell me if #2 is possible.
>
> We are running v0.94.9
>
> From: Ted Yu [mailto:[email protected]]
> Sent: Wednesday, February 25, 2015 7:24 AM
> To: [email protected]
> Cc: Development
> Subject: Re: Table.get(List<Get>) overwhelms several RSs
>
> Was the underlying table balanced (meaning its regions spread evenly across region servers)?
>
> What release of HBase are you using?
>
> Cheers
>
> On Wed, Feb 25, 2015 at 7:08 AM, Ted Tuttle <[email protected]> wrote:
>
> Hello-
>
> In the last week we had multiple times where we lost 5 of 8 RSs in the space of a few minutes because of slow GCs.
>
> We traced this back to a client calling Table.get(List<Get> gets) with a collection containing ~4000 individual gets.
>
> We've worked around this by limiting the number of Gets we send in a single call to Table.get(List<Get>).
>
> Is there some configuration parameter that we are missing here?
>
> Thanks,
> Ted
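A minimal sketch of the Scan conversion suggested above, written against the 0.94-style client API. The class name, table name, row keys, and the caching/batch values are illustrative, not taken from the thread:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ContiguousRangeScan {

  // Replace ~4000 contiguous Gets with one bounded Scan.
  // tableName, firstKey and lastKey are illustrative parameters,
  // not values from the thread.
  public static void scanRange(Configuration conf, String tableName,
                               byte[] firstKey, byte[] lastKey) throws IOException {
    // The stop row is exclusive, so append a 0x00 byte to keep lastKey in range.
    Scan scan = new Scan(firstKey, Bytes.add(lastKey, new byte[] { 0x00 }));
    scan.setCaching(100); // rows fetched per RPC; keeps each response small
    scan.setBatch(100);   // cells per Result; guards against very wide rows

    HTable table = new HTable(conf, tableName);
    try {
      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result result : scanner) {
          // Process each row as it streams in instead of holding the whole
          // range in the client at once.
        }
      } finally {
        scanner.close();
      }
    } finally {
      table.close();
    }
  }

  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    scanRange(conf, "my_table", Bytes.toBytes("row-0000"), Bytes.toBytes("row-3999"));
  }
}

setCaching bounds the rows returned per RPC and setBatch bounds the cells per Result, so neither the region servers nor the client has to materialize the whole contiguous range at once.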

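Likewise, a minimal sketch of the workaround described in the thread, capping the number of Gets sent in a single call to get(List<Get>). The chunk size and names are illustrative:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;

public class ChunkedMultiGet {

  // Cap how many Gets go into a single multi-get call so no one request
  // pins too much data in region-server or client heap.
  // chunkSize is illustrative; tune it against your row sizes and heap.
  public static void getInChunks(Configuration conf, String tableName,
                                 List<Get> gets, int chunkSize) throws IOException {
    HTable table = new HTable(conf, tableName);
    try {
      for (int i = 0; i < gets.size(); i += chunkSize) {
        List<Get> chunk = new ArrayList<Get>(
            gets.subList(i, Math.min(i + chunkSize, gets.size())));
        Result[] results = table.get(chunk);
        for (Result result : results) {
          // Process (or discard) each chunk before issuing the next one.
        }
      }
    } finally {
      table.close();
    }
  }
}

Processing each chunk before issuing the next keeps client heap use bounded, at the cost of extra round trips.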