Hi pig users,
I'm also using Pig 0.8 together with HBase 0.20.6, and I think my problem is
related to Ian's. When processing a table with millions of rows (stored in
multiple regions), HBaseStorage won't scan the full table but returns only a
few hundred records.
The following minimal example reproduces my problem (for this table):
REGISTER '/path/to/guava-r07.jar';
SET DEFAULT_PARALLEL 30;
items = LOAD 'hbase://some-table'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'family:column',
        '-caster HBaseBinaryConverter -caching 500 -loadKey')
    AS (key:bytearray, a_column:long);
items = GROUP items ALL;
item_count = FOREACH items GENERATE COUNT_STAR($1);
DUMP item_count;
Pig issues just one mapper, and I guess it therefore scans just one region of
the table. Or did I miss some fundamental configuration option?
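For what it's worth, one way to check whether the mapper count matches the
region count is to ask the HBase client directly. This is only a sketch
against the 0.20-era API (the class `RegionCount`, the table name
'some-table', and a reachable cluster are assumptions), not part of my job:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

// Hypothetical helper: prints how many regions the table spans,
// assuming hbase-site.xml is on the classpath and the cluster is up.
public class RegionCount {
    public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();
        HTable table = new HTable(conf, "some-table");
        // getStartKeys() returns one start key per region,
        // so its length is the number of regions.
        System.out.println("regions: " + table.getStartKeys().length);
    }
}
```

If this prints a number much larger than the number of mappers Pig launches,
the input splits are not being generated per region as expected.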
Best regards,
Lukas