I created a JIRA for this issue: https://issues.apache.org/jira/browse/PIG-1828
Best,
Lukas

On Tue, Jan 25, 2011 at 2:52 PM, Mr. Lukas <[email protected]> wrote:
> Hello again,
> I just found something interesting in the logs:
>
> INFO org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat:
> setScan with ranges: 5192296858534827628530496329220096 -
> 5192343374370748142029900260897474 ( 46515835920513499403931677378)
>
> But in my case, it should rather be from 1020576114013268896970538800 to
> 72576215356229636519498348368 (when interpreting those numbers as the
> arbitrary-precision integer representation of the row key).
>
> Best regards,
> Lukas
>
> On Mon, Jan 24, 2011 at 10:07 AM, Mr. Lukas <[email protected]> wrote:
>> Hi Dmitriy,
>> Sorry for the late reply, I was out of office.
>> Discarding the caster and caching options (i.e. using only the -loadKey
>> option) does not change anything, except that some
>> FIELD_DISCARDED_TYPE_CONVERSION_FAILED warnings are issued.
>>
>> On Fri, Jan 21, 2011 at 1:42 AM, Dmitriy Ryaboy <[email protected]> wrote:
>>>
>>> This is quite odd, because I do the same thing on a multi-million-row table
>>> and get multiple regions ...
>>> You do have multiple regions, right? What happens if you only specify the
>>> -loadKey parameter and none of the others?
>>>
>>> On Thu, Jan 20, 2011 at 8:24 AM, Mr. Lukas <[email protected]> wrote:
>>>
>>> > Hi pig users,
>>> > I'm also using Pig 0.8 together with HBase 0.20.6 and think my problem is
>>> > related to Ian's. When processing a table with millions of rows (stored in
>>> > multiple regions), HBaseStorage won't scan the full table but only a few
>>> > hundred records.
>>> >
>>> > The following minimal example reproduces my problem (for this table):
>>> >
>>> > REGISTER '/path/to/guava-r07.jar';
>>> > SET default_parallel 30;
>>> > items = LOAD 'hbase://some-table' USING
>>> > org.apache.pig.backend.hadoop.hbase.HBaseStorage('family:column', '-caster
>>> > HBaseBinaryConverter -caching 500 -loadKey') AS (key:bytearray,
>>> > a_column:long);
>>> > items = GROUP items ALL;
>>> > item_count = FOREACH items GENERATE COUNT_STAR($1);
>>> > DUMP item_count;
>>> >
>>> > Pig issues just one mapper, and I guess it scans just one region of the
>>> > table. Or did I miss some fundamental configuration options?
>>> >
>>> > Best regards,
>>> > Lukas
>>> >
>>
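To repeat the cross-check Lukas describes, one can compare a raw row key against the bounds that HBaseTableInputFormat logs by treating both as unsigned big-endian integers. Below is a minimal Java sketch of the idea; the class name and the sample key bytes are made up for illustration, and only the first logged bound is taken from the thread:

import java.math.BigInteger;

// Diagnostic sketch (assumed names, not from the thread): interpret a
// binary HBase row key as the unsigned arbitrary-precision integer that
// the setScan log line prints, and decode a logged range bound back into
// raw key bytes for comparison.
public class RowKeyRangeCheck {
    public static void main(String[] args) {
        // Hypothetical row key bytes; substitute a real key from the table.
        byte[] rowKey = {0x00, 0x34, 0x12, (byte) 0xAB};

        // Signum 1 reads the big-endian bytes as an unsigned integer,
        // i.e. the "arbitrary-precision integer representation" above.
        BigInteger asInt = new BigInteger(1, rowKey);
        System.out.println("row key as integer: " + asInt);

        // Reverse direction: decode the start bound logged by
        // HBaseTableInputFormat back into bytes (toByteArray may prepend
        // a zero byte to keep the two's-complement value positive).
        BigInteger scanStart = new BigInteger("5192296858534827628530496329220096");
        StringBuilder hex = new StringBuilder("scan start bytes:");
        for (byte b : scanStart.toByteArray()) {
            hex.append(String.format(" %02x", b));
        }
        System.out.println(hex);
    }
}

If the integer printed for a real row key falls outside the logged range, that confirms the scan is restricted to a slice of the table, which would match the single mapper observed above.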
