I created a JIRA for this issue: https://issues.apache.org/jira/browse/PIG-1828

Best,
Lukas

On Tue, Jan 25, 2011 at 2:52 PM, Mr. Lukas <[email protected]> wrote:
> Hello again,
> I just found something interesting in the logs:
>
> INFO org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat:
> setScan with ranges: 5192296858534827628530496329220096 -
> 5192343374370748142029900260897474 ( 46515835920513499403931677378)
>
> But in my case, it should rather be from 1020576114013268896970538800 to
> 72576215356229636519498348368 (when interpreting those numbers as the
> arbitrary-precision integer representation of the row key).
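>
> For reference, here is a minimal sketch of how I interpret the raw
> row-key bytes as an unsigned arbitrary-precision integer (the class
> name and the key bytes below are placeholders; real keys would come
> from Result.getRow()):
>
> import java.math.BigInteger;
>
> public class RowKeyAsInt {
>     public static void main(String[] args) {
>         // Placeholder key bytes; real keys come from the scan results.
>         byte[] rowKey = {0x00, 0x01, 0x02, 0x03};
>         // The (signum, magnitude) constructor reads the bytes as an
>         // unsigned big-endian magnitude, so a key whose first byte is
>         // >= 0x80 is not misread as a negative number.
>         BigInteger keyAsInt = new BigInteger(1, rowKey);
>         System.out.println(keyAsInt);
>     }
> }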
>
> Best regards,
> Lukas
>
> On Mon, Jan 24, 2011 at 10:07 AM, Mr. Lukas <[email protected]> wrote:
>> Hi Dmitriy,
>> Sorry for the late reply; I was out of the office.
>> Discarding the caster and caching options (i.e. using only the -loadKey
>> option) does not change anything, except that some
>> FIELD_DISCARDED_TYPE_CONVERSION_FAILED warnings are issued.
>>
>> On Fri, Jan 21, 2011 at 1:42 AM, Dmitriy Ryaboy <[email protected]> wrote:
>>>
>>> This is quite odd because I do the same thing on a multi-million row table
>>> and get multiple regions ...
>>> You do have multiple regions, right? What happens if you only specify the
>>> -loadKey parameter and none of the others?
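>>>
>>> If you want to double-check, something along these lines against the
>>> 0.20 client API should list the regions of the table (the table name
>>> is a placeholder; it reads hbase-site.xml from the classpath):
>>>
>>> import java.util.Map;
>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>> import org.apache.hadoop.hbase.HRegionInfo;
>>> import org.apache.hadoop.hbase.HServerAddress;
>>> import org.apache.hadoop.hbase.client.HTable;
>>>
>>> public class RegionCount {
>>>     public static void main(String[] args) throws Exception {
>>>         HTable table = new HTable(new HBaseConfiguration(), "some-table");
>>>         // Map of region metadata to the server hosting each region.
>>>         Map<HRegionInfo, HServerAddress> regions = table.getRegionsInfo();
>>>         System.out.println("regions: " + regions.size());
>>>         for (HRegionInfo info : regions.keySet()) {
>>>             System.out.println(info.getRegionNameAsString());
>>>         }
>>>     }
>>> }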
>>>
>>> On Thu, Jan 20, 2011 at 8:24 AM, Mr. Lukas <[email protected]> wrote:
>>>
>>> > Hi pig users,
>>> > I'm also using Pig 0.8 together with HBase 0.20.6 and think my problem is
>>> > related to Ian's. When processing a table with millions of rows (stored in
>>> > multiple regions), HBaseStorage won't scan the full table but only a few
>>> > hundred records.
>>> >
>>> > The following minimal example reproduces my problem (for this table):
>>> >
>>> > REGISTER '/path/to/guava-r07.jar';
>>> > SET DEFAULT_PARALLEL 30;
>>> > items = LOAD 'hbase://some-table'
>>> >     USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
>>> >         'family:column',
>>> >         '-caster HBaseBinaryConverter -caching 500 -loadKey')
>>> >     AS (key:bytearray, a_column:long);
>>> > items = GROUP items ALL;
>>> > item_count = FOREACH items GENERATE COUNT_STAR($1);
>>> > DUMP item_count;
>>> >
>>> > Pig launches just one mapper, and I guess it scans just one region of
>>> > the
>>> > table. Or did I miss some fundamental configuration option?
>>> >
>>> > Best regards,
>>> > Lukas
>>> >
>>
>
