Here’s the simple answer. 

Don’t do it. 

The way you are abusing versioning is bad design.

Redesign your schema. 
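One possible redesign, sketched under the assumption that the versions are really a time-ordered series of values: move the version into the row key, so each version becomes its own row and each row holds a single 1-50 MB value. The key layout and the names `VersionRowKey`, `rowKey`, `startKey` and `stopKey` are hypothetical; storing `Long.MAX_VALUE - timestamp` is the usual reverse-timestamp trick so that newer versions sort first.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical row-key layout: [entityId UTF-8][0x00][Long.MAX_VALUE - timestamp, big-endian]. */
public class VersionRowKey {
    private static final byte SEPARATOR = 0x00;

    /** Key for one version; assumes entityId never contains a 0x00 byte and timestamp >= 0. */
    static byte[] rowKey(String entityId, long timestamp) {
        byte[] id = entityId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(id.length + 1 + Long.BYTES) // ByteBuffer is big-endian by default
                .put(id)
                .put(SEPARATOR)
                .putLong(Long.MAX_VALUE - timestamp) // reversed so newer versions sort first
                .array();
    }

    /** Inclusive start key covering every version of entityId. */
    static byte[] startKey(String entityId) {
        return prefix(entityId, SEPARATOR);
    }

    /** Exclusive stop key: the separator byte incremented by one. */
    static byte[] stopKey(String entityId) {
        return prefix(entityId, (byte) (SEPARATOR + 1));
    }

    private static byte[] prefix(String entityId, byte last) {
        byte[] id = entityId.getBytes(StandardCharsets.UTF_8);
        byte[] key = Arrays.copyOf(id, id.length + 1);
        key[id.length] = last;
        return key;
    }

    public static void main(String[] args) {
        byte[] older = rowKey("doc-42", 1_000L);
        byte[] newer = rowKey("doc-42", 2_000L);
        // HBase orders row keys as unsigned bytes; Arrays.compareUnsigned (Java 9+) does the same.
        System.out.println(Arrays.compareUnsigned(newer, older) < 0);               // true
        System.out.println(Arrays.compareUnsigned(startKey("doc-42"), newer) <= 0); // true
        System.out.println(Arrays.compareUnsigned(older, stopKey("doc-42")) < 0);   // true
    }
}
```

A scan from `startKey` to `stopKey` then streams one version per row, so the row-count limits you already have bound memory again.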



On Oct 30, 2014, at 10:20 AM, Andrejs Dubovskis <dubis...@gmail.com> wrote:

> Hi!
> 
> We have a bunch of rows in HBase that store data of varying size
> (1-50 MB). We use HBase versioning and keep up to 10000 column
> versions. Typically each column has only a few versions, but in rare
> cases it may have thousands.
> 
> The MapReduce algorithm uses a full scan and requires all versions
> to produce the result, so we call scan.setMaxVersions().
> 
> In the worst case the Region Server returns only one row, but a huge
> one. Its size is unpredictable and cannot be controlled, because the
> scan parameters only let us control the row count. The MR task can
> throw an OOME even with a 50 GB heap.
> 
> Is it possible to handle this situation? For example, the RS should
> not send a row to the client if the client does not have enough
> memory to handle it. The client could then handle the error and fetch
> each of the row's versions in a separate Get request.
> 
> 
> Best wishes,
> --
> Andrejs Dubovskis
> 
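If the schema cannot be changed right away: scanner caching limits rows, not bytes, so one wide row still arrives as a single Result. HBase's `Scan.setBatch(int)` caps the number of cells per `Result`, which splits a wide row into several pieces, assuming each version counts as a separate cell toward the cap. The sketch below is a dependency-free model of that effect, not HBase code; `CellBatching`, `batch` and `largestBatchBytes` are hypothetical names, and the 3000 versions of 50 MB are an illustrative assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Dependency-free model of a per-Result cell cap (conceptually like Scan.setBatch); not HBase code. */
public class CellBatching {

    /** Splits one row's cells, given as sizes in bytes in scan order, into chunks of at most maxCells. */
    static List<List<Long>> batch(List<Long> cellSizes, int maxCells) {
        if (maxCells <= 0) {
            throw new IllegalArgumentException("maxCells must be positive");
        }
        List<List<Long>> batches = new ArrayList<>();
        for (int i = 0; i < cellSizes.size(); i += maxCells) {
            int end = Math.min(i + maxCells, cellSizes.size());
            batches.add(new ArrayList<>(cellSizes.subList(i, end)));
        }
        return batches;
    }

    /** Size in bytes of the largest chunk, i.e. the largest single Result in this model. */
    static long largestBatchBytes(List<List<Long>> batches) {
        long max = 0;
        for (List<Long> chunk : batches) {
            long sum = 0;
            for (long size : chunk) {
                sum += size;
            }
            max = Math.max(max, sum);
        }
        return max;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        List<Long> row = new ArrayList<>();
        for (int v = 0; v < 3000; v++) {
            row.add(50 * mb); // hypothetical: 3000 versions of a 50 MB value
        }
        List<List<Long>> chunks = batch(row, 10);
        System.out.println(chunks.size());                  // 300
        System.out.println(largestBatchBytes(chunks) / mb); // 500
    }
}
```

With a cap of 10 cells the largest piece is 10 × 50 MB = 500 MB, whereas the whole row would be 3000 × 50 MB = 150,000 MB (roughly 146 GB). A cell-count cap bounds memory only because each cell is bounded, and scanner caching (Results per RPC) multiplies the per-Result size, so it would need to stay small as well.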

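For the fallback proposed in the quoted message (fetching versions separately), one possible client-side plan, sketched under assumptions: first list the row's timestamps and value lengths without the payload (HBase's `KeyOnlyFilter`, constructed with `lenAsVal = true`, returns each value's length in place of the value, assuming your HBase version provides it), then issue Gets whose half-open `setTimeRange(min, max)` windows each stay under a byte budget. Only the planning step is shown, in plain Java; `VersionFetchPlanner`, `Range`, `plan` and the budget are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical planner: groups consecutive versions into half-open time ranges under a byte budget. */
public class VersionFetchPlanner {

    /** One planned fetch: timestamps in [minStamp, maxStamp), totalling `bytes`. */
    static final class Range {
        final long minStamp;
        final long maxStamp;
        final long bytes;

        Range(long minStamp, long maxStamp, long bytes) {
            this.minStamp = minStamp;
            this.maxStamp = maxStamp;
            this.bytes = bytes;
        }

        @Override
        public String toString() {
            return "[" + minStamp + ", " + maxStamp + ") " + bytes + " bytes";
        }
    }

    /**
     * @param valueLengths timestamp -> value length in bytes (assumed to come from a key-only scan)
     * @param budgetBytes  soft cap per fetch; a single version larger than this gets its own range
     */
    static List<Range> plan(Map<Long, Integer> valueLengths, long budgetBytes) {
        List<Range> ranges = new ArrayList<>();
        boolean open = false;
        long start = 0, last = 0, bytes = 0;
        for (Map.Entry<Long, Integer> e : new TreeMap<>(valueLengths).entrySet()) {
            long ts = e.getKey();
            long size = e.getValue();
            if (open && bytes + size > budgetBytes) {
                ranges.add(new Range(start, last + 1, bytes)); // close the current window
                open = false;
            }
            if (!open) {
                start = ts;
                bytes = 0;
                open = true;
            }
            last = ts;
            bytes += size;
        }
        if (open) {
            ranges.add(new Range(start, last + 1, bytes));
        }
        return ranges;
    }

    public static void main(String[] args) {
        Map<Long, Integer> lengths = new TreeMap<>();
        lengths.put(100L, 40);
        lengths.put(200L, 40);
        lengths.put(300L, 40);
        lengths.put(400L, 100);
        for (Range r : plan(lengths, 100)) {
            System.out.println(r);
        }
        // [100, 201) 80 bytes
        // [300, 301) 40 bytes
        // [400, 401) 100 bytes
    }
}
```

Each `Range` would then map to one Get with `setTimeRange(r.minStamp, r.maxStamp)` and `setMaxVersions()`, so the client holds roughly one budget's worth of data (or one oversized version) at a time. Greedy grouping in timestamp order is just one simple choice; one Get per version, as originally proposed, is the special case of a zero budget.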