Also, since I encountered all of these issues when loading and storing and needed to 'get-shit-done', here's the modified version of HBase storage:
https://github.com/infochimps/HbaseBulkloader

It might help for now (it's not a long-term solution). In it you'll find it can read full column families into Pig bags (e.g. "USING com.infochimps.hbase.HBaseStorage('my_col_fam:')"), as well as the normal read and write functionality of HBaseStorage from the Pig trunk, but with the nulls taken care of. Writing to the WAL is turned off and hardcoded. Lastly, there's an additional class (for writing to HBase only) called 'DynamicFamilyStorage' that gets the column family and column name from the records themselves, which you might find useful.

--jacob
@thedatachef

On Mon, 2011-02-14 at 15:03 -0700, Matt Davies wrote:
> Thank you! We are looking forward to it.
>
> On Mon, Feb 14, 2011 at 3:01 PM, Dmitriy Ryaboy <[email protected]> wrote:
>
> > I have a fix for that, just discovered it last night myself. The patch for
> > 0.89 doesn't work on storage (it only works for loading). Will update the
> > ticket later tonight once I get some Ivy ugliness out of the way.
> >
> > D
> >
> > On Mon, Feb 14, 2011 at 1:57 PM, Matt Davies <[email protected]> wrote:
> >
> > > Hey All,
> > >
> > > Running into a problem with a Pig script storing results into HBase.
> > >
> > > We are getting the following error:
> > >
> > > java.lang.NullPointerException
> > >     at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:126)
> > >     at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:81)
> > >     at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:364)
> > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
> > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
> > >     at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:523)
> > >     at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
> > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238)
> > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
> > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> > >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:639)
> > >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:315)
> > >     at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
> > >     at java.security.AccessController.doPrivileged(Native Method)
> > >     at javax.security.auth.Subject.doAs(Subject.java:396)
> > >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
> > >     at org.apache.hadoop.mapred.Child.main(Child.java:211)
> > >
> > > We are using
> > > CDH3b3 and HBase 0.90.0 (directly from Apache). We've followed
> > > thedatachef's instructions to get Pig 0.8.0 working with CDH3 (thanks!):
> > >
> > > http://thedatachef.blogspot.com/2011/01/apache-pig-08-with-cloudera-cdh3.html
> > >
> > > The relevant line from the Pig script is below. We've applied the
> > > patch to get "-noWAL" working:
> > >
> > > STORE links INTO 'p' USING
> > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('a:t a:t2 a:g', '-noWAL');
> > >
> > > Does anyone know what could be causing this problem?
> > >
> > > Thanks in advance,
> > >
> > > Matt
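For anyone following along, a minimal sketch of the HbaseBulkloader usage jacob describes above. The table name 'mytable', the alias names, and the (row_key, family, column, value) tuple layout for DynamicFamilyStorage are illustrative assumptions — check the repo for the exact record format it expects:

```pig
-- Read an entire column family into a Pig bag with the modified loader;
-- per jacob's note, a family name ending in ':' requests the whole family.
full_fam = LOAD 'mytable'
    USING com.infochimps.hbase.HBaseStorage('my_col_fam:');

-- DynamicFamilyStorage is write-only: the column family and column name
-- come from the records themselves rather than from constructor arguments.
-- The assumed layout here is (row_key, family, column, value) tuples.
STORE records INTO 'mytable'
    USING com.infochimps.hbase.DynamicFamilyStorage();
```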
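Until the patch Dmitriy mentions lands, one possible workaround — assuming the NullPointerException in putNext is triggered by null field values, per the "nulls taken care of" note above — is to filter out tuples with null fields before the STORE. The field names t, t2, and g below are guesses based on the column names in Matt's script:

```pig
-- Drop tuples where any field destined for HBase is null,
-- so HBaseStorage never tries to write a null cell value.
links_clean = FILTER links BY t IS NOT NULL AND t2 IS NOT NULL AND g IS NOT NULL;

STORE links_clean INTO 'p' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('a:t a:t2 a:g', '-noWAL');
```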
