Hi Dmitriy,
Sorry for the late reply, I was out of office.
Discarding the caster and caching options (i.e. using only the -loadKey
option) does not change anything, except that some
FIELD_DISCARDED_TYPE_CONVERSION_FAILED warnings are issued.
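
For reference, this is roughly what the -loadKey-only load looks like (a
minimal sketch; table, column, and schema are the same as in my original
script further down):

items = LOAD 'hbase://some-table'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'family:column', '-loadKey')
    AS (key:bytearray, a_column:long);
-- without '-caster HBaseBinaryConverter' the default (UTF-8) conversion of
-- the binary cell values to long presumably fails, which would explain the
-- FIELD_DISCARDED_TYPE_CONVERSION_FAILED warnings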

On Fri, Jan 21, 2011 at 1:42 AM, Dmitriy Ryaboy <[email protected]> wrote:
>
> This is quite odd because I do the same thing on a multi-million row table
> and get multiple regions ...
> You do have multiple regions, right? What happens if you only specify the
> -loadKey parameter and none of the others?
>
> On Thu, Jan 20, 2011 at 8:24 AM, Mr. Lukas <[email protected]> wrote:
>
> > Hi pig users,
> > I'm also using Pig 0.8 together with HBase 0.20.6 and think my problem is
> > related to Ian's. When processing a table with millions of rows (stored in
> > multiple regions), HBaseStorage won't scan the full table but only a few
> > hundred records.
> >
> > The following minimal example reproduces my problem (for this table):
> >
> > REGISTER '/path/to/guava-r07.jar';
> > SET DEFAULT_PARALLEL 30;
> > items = LOAD 'hbase://some-table'
> >     USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
> >         'family:column',
> >         '-caster HBaseBinaryConverter -caching 500 -loadKey')
> >     AS (key:bytearray, a_column:long);
> > items = GROUP items ALL;
> > item_count = FOREACH items GENERATE COUNT_STAR($1);
> > DUMP item_count;
> >
> > Pig issues just one mapper, and I guess it scans just one region of the
> > table. Or did I miss some fundamental configuration option?
> >
> > Best regards,
> > Lukas
> >
