In your eval function, you can use the HBase Get/Scan API to retrieve the
data rather than using the MapReduce API.
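
A rough sketch of what I mean, written so that it runs without a cluster. The HBase call is hidden behind a hypothetical RowLookup interface, and an in-memory map stands in for mytable2. In a real UDF the lookup would presumably wrap an HTable and call table.get(new Get(Bytes.toBytes(rowKey))), and exec(Tuple) would return a DataBag. The class name, the interface, and the sample data below are assumptions made for illustration, not part of your code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PointLookupSketch {

    /** Hypothetical stand-in for a single HBase Get: row key in, column map out (null if absent). */
    public interface RowLookup {
        Map<String, String> get(String rowKey);
    }

    /** Fetches only the rows whose keys were passed in, instead of scanning the whole table. */
    public static List<Map<String, String>> fetchRows(List<String> rowKeys, RowLookup lookup) {
        List<Map<String, String>> result = new ArrayList<>();
        for (String key : rowKeys) {
            Map<String, String> row = lookup.get(key);
            if (row != null) {          // skip keys that have no matching row
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // In-memory stand-in for 'mytable2' (assumed sample data).
        Map<String, Map<String, String>> table2 = new HashMap<>();
        table2.put("row1", Collections.singletonMap("VALUE:a", "1"));
        table2.put("row2", Collections.singletonMap("VALUE:b", "2"));
        table2.put("row3", Collections.singletonMap("VALUE:c", "3"));

        // Keys as they would arrive from the bag produced by the first load.
        List<String> keysFromBag = Arrays.asList("row1", "row3", "missing");

        List<Map<String, String>> rows = fetchRows(keysFromBag, table2::get);
        System.out.println(rows);
    }
}
```

If you go this way, open the table once and reuse it across exec() calls rather than opening it for every input tuple.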


On Wed, Aug 21, 2013 at 7:12 AM, John <[email protected]> wrote:

> I'm currently writing a Pig Latin program:
>
>     A = load 'hbase://mytable1' using
>         my.packages.CustomizeHBaseStorage('VALUE', '-loadKey true', 'myrowkey1')
>         as (rowkey:chararray, columncontent:map[]);
>     ABag = foreach A generate flatten(my.packages.MapToBag($1))
>         as (output:chararray);
>
> CustomizeHBaseStorage loads the row "myrowkey1", and the map for this
> row key is then transformed into a bag. That works fine so far.
>
> ABag now holds some entries. With these entries I try to load new rows
> (every entry in the bag identifies a row key that I have to load next).
> So I tried something like this:
>
>     X = FOREACH ABag {
>         TMP = load 'hbase://mytable2' using
>             my.packages.CustomizeHBaseStorage('VALUE', '-loadKey true', '$0')
>             as (rowkey:chararray, columncontent:map[]);
>         GENERATE (TMP.$0);
>     }
>
> This does not work because, as far as I know, the load statement is not
> allowed inside a FOREACH block.
>
> So I tried to build my own EvalFunc:
>
>     X = FOREACH ABag GENERATE my.packages.MyNewUDF($0);
>
> Here is the Java code for MyNewUDF:
>
>     ...
>     public DataBag exec(Tuple input) throws IOException {
>         DataBag result = null;
>         try {
>             result = bagFactory.newDefaultBag();
>             CustomizeHBaseStorage loader = new CustomizeHBaseStorage("VALUE",
>                     "-loadKey true", input.get(0).toString());
>             loader.getInputFormat();
>             Tuple curTuple = loader.getNext();
>             while (curTuple != null) {
>                 result.add(curTuple);
>                 curTuple = loader.getNext();
>             }
>         } catch (ParseException e) {
>             e.printStackTrace();
>         }
>         return result;
>     }
>     ...
>
> I think this would work, but I get a NullPointerException because the
> RecordReader in the HBaseStorage is not initialized when getNext() is
> executed. If anybody can tell me how to initialize the RecordReader (and,
> I think, the PigSplit too, because it is required by
> CustomizeHBaseStorage.prepareToRead(RecordReader reader, PigSplit split)),
> or can suggest another approach, I would be thankful.
>
>
> BTW, I know that I can load the whole of mytable2 into a new alias and
> then JOIN ABag with that alias, but I am trying to optimize my program,
> because it is not necessary to load the whole of mytable2. I am trying to
> build a "join" with information passing.
>
> Thanks
>
