This is quite odd because I do the same thing on a multi-million-row table and get multiple regions ... You do have multiple regions, right? What happens if you only specify the -loadKey parameter and none of the others?
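A minimal sketch of that test, reusing the (placeholder) table and column names from Lukas's script below; this is just the same LOAD with the -caster and -caching options dropped, to check whether one of them is what collapses the scan to a single mapper:

```pig
-- Same load as before, but with only -loadKey; no -caster or -caching.
-- If this still runs with a single mapper, the problem is not the extra options.
items = LOAD 'hbase://some-table' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('family:column', '-loadKey')
    AS (key:bytearray, a_column:long);
item_count = FOREACH (GROUP items ALL) GENERATE COUNT_STAR($1);
DUMP item_count;
```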
On Thu, Jan 20, 2011 at 8:24 AM, Mr. Lukas <[email protected]> wrote:
> Hi pig users,
> I'm also using Pig 0.8 together with HBase 0.20.6 and think my problem is
> related to Ian's. When processing a table with millions of rows (stored in
> multiple regions), HBaseStorage won't scan the full table but only a few
> hundred records.
>
> The following minimal example reproduces my problem (for this table):
>
> REGISTER '/path/to/guava-r07.jar';
> SET DEFAULT_PARALLEL 30;
> items = LOAD 'hbase://some-table' USING
>     org.apache.pig.backend.hadoop.hbase.HBaseStorage('family:column',
>     '-caster HBaseBinaryConverter -caching 500 -loadKey')
>     AS (key:bytearray, a_column:long);
> items = GROUP items ALL;
> item_count = FOREACH items GENERATE COUNT_STAR($1);
> DUMP item_count;
>
> Pig issues just one mapper, and I guess it scans just one region of the
> table. Or did I miss some fundamental configuration option?
>
> Best regards,
> Lukas
