Hi!

We have a bunch of rows in HBase that store data of varying sizes
(1-50 MB). We use HBase versioning and keep up to 10000 versions per
column. Typically each column has only a few versions, but in rare
cases it may have thousands.

Our MapReduce job uses a full scan, and our algorithm needs all
versions to produce its result, so we call scan.setMaxVersions().

In the worst case the region server returns just one row, but a huge
one. Its size is unpredictable and cannot be controlled, because the
scan parameters only let us limit the number of rows. As a result, the
MR task can throw an OOME even with a 50 GB heap.
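To put rough numbers on the worst case, here is a back-of-the-envelope sketch. It assumes the 1-50 MB figure is the size of a single cell version and ignores serialization and object overhead, so real memory use would be higher. The class and method names are made up for illustration.

```java
public class WorstCaseRowSize {
    // Assumption: a single stored version of a cell can be as large as 50 MB
    static final long MAX_VERSION_BYTES = 50L * 1024 * 1024;

    // Assumption: the task heap is 50 GB (50 * 1024^3 bytes)
    static final long HEAP_BYTES = 50L * 1024 * 1024 * 1024;

    // Raw value bytes the client must hold when a whole row with the given
    // number of versions arrives in one result (overhead not counted)
    static long rowBytes(long versions, long bytesPerVersion) {
        return versions * bytesPerVersion;
    }

    public static void main(String[] args) {
        for (long versions : new long[] {10, 1_000, 10_000}) {
            long bytes = rowBytes(versions, MAX_VERSION_BYTES);
            System.out.printf("%,d versions -> %,d MB (%s the heap)%n",
                versions, bytes / (1024 * 1024),
                bytes > HEAP_BYTES ? "exceeds" : "fits in");
        }
    }
}
```

Under these assumptions, 1,000 versions already need about 50,000 MB of raw value bytes, close to the entire 50 GB heap before any overhead, and 10,000 versions need ten times that.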

Is it possible to handle this situation? For example, the RS could
refuse to send the row to the client if the client does not have
enough memory to handle it. The client could then catch the error and
fetch each of the row's versions in a separate Get request.
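The version-by-version fallback could look roughly like the sketch below. It walks a column's versions newest-first, one request per version, so the client holds at most one version in memory. The `VersionReader` interface and its `newestBefore` method are hypothetical stand-ins; the assumption is that each call would map to an HBase Get with `setTimeRange(0, maxExclusive)` and `setMaxVersions(1)` (the time range's upper bound is exclusive). An in-memory `TreeMap` replaces the cluster so the sketch runs on its own.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.BiConsumer;

public class VersionPager {
    /** Hypothetical stand-in for one single-version Get (see lead-in). */
    interface VersionReader {
        /** Newest version with timestamp < maxExclusive, or null if none. */
        Map.Entry<Long, byte[]> newestBefore(long maxExclusive);
    }

    /**
     * Visits every version newest-first, issuing one request per version,
     * so only one version is held in memory at a time.
     * Returns the number of versions visited.
     */
    static int forEachVersion(VersionReader reader, BiConsumer<Long, byte[]> visitor) {
        long upper = Long.MAX_VALUE;
        int count = 0;
        while (true) {
            Map.Entry<Long, byte[]> v = reader.newestBefore(upper);
            if (v == null) {
                return count; // no older versions left
            }
            visitor.accept(v.getKey(), v.getValue());
            count++;
            upper = v.getKey(); // next request asks only for strictly older versions
        }
    }

    /** In-memory fake reader backed by a sorted map of timestamp -> value. */
    static VersionReader fakeReader(NavigableMap<Long, byte[]> versions) {
        return maxExclusive -> versions.lowerEntry(maxExclusive);
    }

    public static void main(String[] args) {
        NavigableMap<Long, byte[]> versions = new TreeMap<>();
        for (long ts = 1; ts <= 5; ts++) {
            versions.put(ts * 100, new byte[(int) ts]);
        }
        StringBuilder order = new StringBuilder();
        int n = forEachVersion(fakeReader(versions),
            (ts, value) -> order.append(ts).append(':').append(value.length).append(' '));
        System.out.println(n + " versions: " + order.toString().trim());
    }
}
```

The trade-off is one round trip per version, which is slow for rows with thousands of versions. That is why it makes sense only as a fallback for the rare oversized rows, not as the normal scan path.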


Best wishes,
--
Andrejs Dubovskis
