Hi Dmitriy,

Sorry for the late reply, I was out of the office. Dropping the -caster and -caching options (i.e. using only the -loadKey option) does not change anything, except that some FIELD_DISCARDED_TYPE_CONVERSION_FAILED warnings are issued.
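For reference, the stripped-down LOAD statement I used for this test (same table and column as in the minimal example quoted below) is simply:

    items = LOAD 'hbase://some-table' USING
        org.apache.pig.backend.hadoop.hbase.HBaseStorage('family:column', '-loadKey')
        AS (key:bytearray, a_column:long);

The warnings presumably come from the long conversion failing once the binary caster is no longer specified, but the mapper/region behaviour is unchanged.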
On Fri, Jan 21, 2011 at 1:42 AM, Dmitriy Ryaboy <[email protected]> wrote:
>
> This is quite odd, because I do the same thing on a multi-million row table
> and get multiple regions ...
> You do have multiple regions, right? What happens if you only specify the
> -loadKey parameter and none of the others?
>
> On Thu, Jan 20, 2011 at 8:24 AM, Mr. Lukas <[email protected]> wrote:
>
> > Hi pig users,
> >
> > I'm also using Pig 0.8 together with HBase 0.20.6 and think my problem is
> > related to Ian's. When processing a table with millions of rows (stored in
> > multiple regions), HBaseStorage won't scan the full table but only a few
> > hundred records.
> >
> > The following minimal example reproduces my problem (for this table):
> >
> > REGISTER '/path/to/guava-r07.jar';
> > SET DEFAULT_PARALLEL 30;
> > items = LOAD 'hbase://some-table' USING
> >     org.apache.pig.backend.hadoop.hbase.HBaseStorage('family:column',
> >     '-caster HBaseBinaryConverter -caching 500 -loadKey')
> >     AS (key:bytearray, a_column:long);
> > items = GROUP items ALL;
> > item_count = FOREACH items GENERATE COUNT_STAR($1);
> > DUMP item_count;
> >
> > Pig issues just one mapper and I guess that it scans just one region of
> > the table. Or did I miss some fundamental configuration options?
> >
> > Best regards,
> > Lukas
> >
