Here’s the simple answer. Don’t do it.
The way you are abusing versioning is bad design. Redesign your schema.

On Oct 30, 2014, at 10:20 AM, Andrejs Dubovskis <dubis...@gmail.com> wrote:

> Hi!
>
> We have a bunch of rows in HBase which store varying sizes of data
> (1-50MB). We use HBase versioning and keep up to 10000 column
> versions. Typically each column has only a few versions. But in rare
> cases it may have thousands of versions.
>
> The MapReduce job uses a full scan, and our algorithm requires all
> versions to produce the result. So, we call scan.setMaxVersions().
>
> In the worst case the Region Server returns one row only, but a huge
> one. The size is unpredictable and cannot be controlled, because the
> parameters let us control the row count only. And the MR task can
> throw an OOME even if it has a 50GB heap.
>
> Is it possible to handle this situation? For example, the RS should
> not send the row to the client if the latter has no memory to handle
> the row. In this case the client can handle the error and fetch each
> of the row's versions in a separate get request.
>
>
> Best wishes,
> --
> Andrejs Dubovskis
>
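
To make "redesign your schema" a bit more concrete, here is a minimal sketch of one possible layout. It is not necessarily the redesign meant at the top of this reply. The idea is to move the version out of the cell timestamp and into the row key, so that every version becomes its own row and the row-count limits mentioned in the question start to bound how much data comes back at once. The class name, the `entityId` parameter, the 0x00 separator, and the reversed-timestamp encoding are all assumptions made for illustration. The sketch uses only the JDK and builds key bytes; it does not call the HBase client.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class VersionedRowKey {

    // Hypothetical key layout: <entity id bytes> 0x00 <8-byte reversed timestamp>.
    // Assumes the entity id never contains a 0x00 byte and the timestamp is non-negative.
    static byte[] rowKey(String entityId, long timestampMillis) {
        byte[] id = entityId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(id.length + 1 + Long.BYTES)
                .put(id)
                .put((byte) 0)
                // Reversed timestamp: newer versions get smaller values and sort first.
                .putLong(Long.MAX_VALUE - timestampMillis)
                .array();
    }

    // Recovers the original timestamp from the last 8 bytes of the key.
    static long timestampOf(byte[] rowKey) {
        ByteBuffer tail = ByteBuffer.wrap(rowKey, rowKey.length - Long.BYTES, Long.BYTES);
        return Long.MAX_VALUE - tail.getLong();
    }

    public static void main(String[] args) {
        byte[] older = rowKey("doc42", 1414664400000L);
        byte[] newer = rowKey("doc42", 1414664460000L);

        // Row keys are compared as unsigned bytes, so compare the same way here.
        System.out.println("newer sorts first: " + (Arrays.compareUnsigned(newer, older) < 0));
        System.out.println("decoded timestamp: " + timestampOf(newer));
    }
}
```

With keys like these, all versions of one entity sit next to each other in the table, newest first, and a scan over the entity's key prefix returns them one row at a time instead of as a single huge row.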