Sweet! Thanks for the tip :)
On Mon, Nov 4, 2013 at 5:10 PM, Dhaval Shah <[email protected]> wrote:

> You can use scan.setBatch() to limit the number of columns returned. Note
> that it will split up a row into multiple rows from a client's perspective,
> and client code might need to be modified to make use of the setBatch
> feature.
>
> Regards,
> Dhaval
>
>
> ________________________________
> From: Patrick Schless <[email protected]>
> To: user <[email protected]>
> Sent: Monday, 4 November 2013 6:03 PM
> Subject: Scanner Caching with wildly varying row widths
>
>
> We have an application where a row can contain anywhere between 1 and
> 3,600,000 cells (there's only 1 column family). In practice, most rows have
> under 100 cells.
>
> Now we want to run some mapreduce jobs that touch every cell within a range
> (e.g. count how many cells we have). With scanner caching set to something
> like 250, the job will chug along for a long time until it hits a row with
> a lot of data, and then it will die. Setting the cache size down to 1 (row)
> would presumably work, but it would take forever to run.
>
> We have addressed this by writing some jobs that use coprocessors, which
> allow us to pull back sets of cells instead of sets of rows, but this means
> we can't use any of the built-in jobs that come with hbase (e.g. CopyTable).
> Is there any way around this? Have other people had to deal with such high
> variability in their row sizes?
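
In case it helps anyone else following the thread, here's a rough, untested sketch of what I'm planning to try: combining setCaching with setBatch on a plain client-side scan to count cells. The table name "mytable" and the batch/caching numbers are just placeholders, not anything from our actual setup.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class CellCountSketch {
    public static void main(String[] args) throws Exception {
        // "mytable" is a placeholder table name
        HTable table = new HTable(HBaseConfiguration.create(), "mytable");

        Scan scan = new Scan();
        scan.setCaching(250); // Results fetched per RPC
        scan.setBatch(100);   // at most 100 cells per Result, so a very wide row
                              // comes back as several Results instead of one

        long cells = 0;
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result result : scanner) {
                // count cells rather than rows, since wide rows are now split up
                cells += result.size();
            }
        } finally {
            scanner.close();
            table.close();
        }
        System.out.println("total cells: " + cells);
    }
}

For the mapreduce jobs, I'm assuming the same Scan (with setBatch/setCaching set) can be passed to TableMapReduceUtil.initTableMapperJob, as long as the mapper counts cells instead of rows; the built-in jobs like CopyTable would presumably still need their own handling.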
