Hey tsuna. I changed the algorithm significantly and eliminated the "nested"
loop, and it now works lightning fast. I do the scans separately instead of
nesting them.
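Since the change is "scan separately instead of nesting," here's a minimal sketch of that restructuring, with plain HashMaps standing in for the HBase tables (the table contents and the join key are made up for illustration): instead of opening an inner scanner for every outer row, each table is scanned once and the results are joined in memory.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: Maps stand in for two HBase tables. The old code
// opened a fresh inner scanner per outer row (O(rows * scan cost)); this
// version walks each "table" exactly once and joins via a hash lookup.
public class SeparateScans {

    static Map<String, String> join(Map<String, String> pages,
                                    Map<String, String> meta) {
        Map<String, String> out = new HashMap<>();
        // First scan: index the metadata table by row key.
        Map<String, String> metaByKey = new HashMap<>(meta);
        // Second scan: walk the pages once, looking up matching metadata,
        // instead of re-scanning the metadata table for every page row.
        for (Map.Entry<String, String> page : pages.entrySet()) {
            String m = metaByKey.get(page.getKey());
            if (m != null) {
                out.put(page.getKey(), page.getValue() + "|" + m);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> pages = Map.of("row1", "html1", "row2", "html2");
        Map<String, String> meta  = Map.of("row1", "metaA", "row3", "metaC");
        System.out.println(join(pages, meta)); // only row1 has both sides
    }
}
```

With real HBase tables, each pass would be a single Scan/ResultScanner over the table, opened and closed once outside any loop.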

Anyway, I have retained the old code to revisit later and find out why
nested scans perform so poorly (perhaps only on a single machine, in
pseudo-distributed mode).

> Does your table fit entirely in one region?  How big are the rows?
> Are you writing a lot to your table?  Are you typically inserting
> cells or overwriting stuff in existing ones?

No, it doesn't. It has spawned several regions.
The rows are sparse: sometimes as huge as storing a whole web page in a
column, and sometimes very small, just metadata.
Yes! I do overwrite entire rows often (after the proof of concept, this
won't happen).

> Is your pseudo-distributed HBase running on a single machine?  If yes,
> why not use a non-distributed HBase setup (without HDFS)?

Yes, it is running on a single machine.
Good suggestion. I should set that up separately.
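For reference, a standalone (non-distributed) HBase runs everything, including ZooKeeper, in one JVM against the local filesystem, so no HDFS is needed. A minimal hbase-site.xml would just point the root directory at a local path (the path below is a made-up example):

```xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- Local filesystem path instead of an hdfs:// URL -->
    <value>file:///home/dani/hbase-data</value>
  </property>
</configuration>
```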

-Thanks,
Dani

On Tue, Jan 25, 2011 at 11:41 PM, tsuna <[email protected]> wrote:

> On Tue, Jan 25, 2011 at 2:14 PM, Dani Rayan <[email protected]> wrote:
> > But opening and closing the scanner inside this nested loop is taking
> > multiple seconds to complete on just 3000 rows :(
>
> Something is wrong with your cluster or the way you use it.  The
> overhead of opening / closing the scanner is normally absolutely
> negligible compared to the overhead to scan the full table, even with
> a table as small as just 3000 rows.
>
> Does your table fit entirely in one region?  How big are the rows?
> Are you writing a lot to your table?  Are you typically inserting
> cells or overwriting stuff in existing ones?
>
> Is your pseudo-distributed HBase running on a single machine?  If yes,
> why not use a non-distributed HBase setup (without HDFS)?
>
> --
> Benoit "tsuna" Sigoure
> Software Engineer @ www.StumbleUpon.com
>
