On Thu, Oct 30, 2014 at 8:20 AM, Andrejs Dubovskis <dubis...@gmail.com>
wrote:

> Hi!
>
> We have a bunch of rows on HBase which store varying sizes of data
> (1-50MB). We use HBase versioning and keep up to 10000 column
> versions. Typically each column has only a few versions, but in rare
> cases it may have thousands of versions.
>
> The MapReduce job uses a full scan, and our algorithm requires all
> versions to produce the result. So we call scan.setMaxVersions().
>
> In the worst case the Region Server returns only one row, but a huge
> one. The size is unpredictable and cannot be controlled, because the
> scan parameters only let us control the row count. And the MR task can
> throw an OOME even if it has a 50GB heap.
>
> Is it possible to handle this situation? For example, the RS should
> not send the row to the client if the latter has no memory to handle
> it. In this case the client can handle the error and fetch each of the
> row's versions in a separate get request.
>

See HBASE-11544 "[Ergonomics] hbase.client.scanner.caching is dogged and
will try to return batch even if it means OOME".
St.Ack
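
For readers landing on this thread: the problem described above is that
the scan is bounded by row count, not by bytes. On the client side,
Scan.setBatch(int) caps the number of cells per Result, so a very wide
row arrives as several Results that share one row key. That is still a
count-based limit, though. The sketch below illustrates the size-based
idea in plain Java, with no HBase dependency. It is not HBase's
implementation and not the patch from HBASE-11544; the class name
SizeBoundedBatcher and the method chunkBySize are hypothetical, and each
cell is modelled as a bare byte[].

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: splits the cells of one wide row
// into chunks whose payload stays within a byte budget.
final class SizeBoundedBatcher {

    private SizeBoundedBatcher() {}

    // Groups cells in their original order so that each chunk holds at most
    // maxBytes of payload. A single cell larger than maxBytes still gets a
    // chunk of its own, so the caller always makes progress.
    static List<List<byte[]>> chunkBySize(List<byte[]> cells, long maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        List<List<byte[]>> chunks = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        long currentBytes = 0;
        for (byte[] cell : cells) {
            // Close the current chunk if adding this cell would exceed the budget.
            if (!current.isEmpty() && currentBytes + cell.length > maxBytes) {
                chunks.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(cell);
            currentBytes += cell.length;
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }
}
```

A consumer would process one chunk at a time and discard it before
asking for the next, so peak memory depends on the byte budget rather
than on how many versions a column happens to have.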
