Hello there,

Based on http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/ I want to add geolocation to my raw HAProxy logs stored in an HBase table.

Here is my Pig script (wrapper.sh is a self-extracting bash archive that deploys the Perl script and its dependencies, very close to the one in the reference above, and launches it):

DEFINE iplookup `wrapper.sh GeoIP`
        SHIP('wrapper.sh')
        CACHE('/GeoIP/GeoIPcity.dat#GeoIP');

A = load 'log' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('default:body','-gt=_f:squid_t:201109151630 -loadKey') AS (rowkey, data);
B = LIMIT A 10;
C = FOREACH B {
        t = REGEX_EXTRACT(data, '([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}):([0-9]+) ', 1);
        GENERATE rowkey, t;
}
D = STREAM C THROUGH iplookup AS (rowkey, ip, country_code, country, state, city);
STORE D INTO 'geoip_pig' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('location:ip location:country_code location:country location:state location:city');
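For reference, the wrapper's inner script follows Pig's streaming contract: each input tuple arrives as one tab-separated line on stdin, and each tab-separated line written to stdout becomes an output tuple. Here is a simplified Python stand-in for the Perl GeoIP script (the lookup table and its entries are purely illustrative, not the real GeoIPcity.dat data):

```python
#!/usr/bin/env python
# Simplified stand-in for the Perl GeoIP streaming script.
# Pig STREAM sends each tuple as one tab-separated line on stdin and
# expects one tab-separated line per output tuple on stdout.
import sys

# Hypothetical in-memory lookup standing in for the GeoIPcity.dat database.
GEO = {"80.13.204.64": ("FR", "France", "A8", "Paris")}

def lookup(rowkey, ip):
    # Unknown IPs yield empty fields, matching the empty state/city
    # columns visible in the DUMP output below.
    cc, country, state, city = GEO.get(ip, ("", "", "", ""))
    return "\t".join((rowkey, ip, cc, country, state, city))

if __name__ == "__main__":
    for line in sys.stdin:
        rowkey, ip = line.rstrip("\n").split("\t")
        print(lookup(rowkey, ip))
```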

I can DUMP D; without any problem and get the promised geolocation data:
(_f:squid_t:20110916103000_b:squid_s:200-+PH/I6eJ9h8Sy8/1+yz2kw==,77.192.16.143,FR,France,B9,Lyon)
(_f:squid_t:20110916103000_b:squid_s:200-+XSr1ZpMyLGmi8iDvZ4lLQ==,80.13.204.64,FR,France,A8,Paris)
(_f:squid_t:20110916103000_b:squid_s:200-+gl66vwlvPL9Di1zzut9Bg==,178.250.1.40,FR,France,,)
(_f:squid_t:20110916103000_b:squid_s:200-+qAtjeGfssc2vkwWR4fmJQ==,86.73.78.25,FR,France,A8,La Courneuve)
(_f:squid_t:20110916103000_b:squid_s:200-+wgQq1q8H/vp52//EIevzA==,80.13.204.64,FR,France,A8,Paris)
(_f:squid_t:20110916103000_b:squid_s:200-/3J9EosV46v521VBlb6zxQ==,82.127.103.161,FR,France,B6,Erquery)
(_f:squid_t:20110916103000_b:squid_s:200-+gl66vwlvPL9Di1zzut9Bg==,178.250.1.40,FR,France,,)
(_f:squid_t:20110916103000_b:squid_s:200-+qAtjeGfssc2vkwWR4fmJQ==,86.73.78.25,FR,France,A8,La Courneuve)
(_f:squid_t:20110916103000_b:squid_s:200-+wgQq1q8H/vp52//EIevzA==,80.13.204.64,FR,France,A8,Paris)
(_f:squid_t:20110916103000_b:squid_s:200-/3J9EosV46v521VBlb6zxQ==,82.127.103.161,FR,France,B6,Erquery)
(_f:squid_t:20110916103000_b:squid_s:200-/3okAiWeWMmpm54Qlk7JyQ==,86.75.127.253,FR,France,B5,La Daguenière)
(_f:squid_t:20110916103000_b:squid_s:200-/yZ09fLNWflcBlWX1BjEkA==,83.200.13.146,FR,France,A8,Villiers-le-bel)
(_f:squid_t:20110916103000_b:squid_s:200-0/HiVaFE6b1zrUTtHkV05Q==,193.228.156.10,FR,France,,)
(_f:squid_t:20110916103000_b:squid_s:200-0CTc6LQ9jGpgQQLwmJZxQQ==,195.93.102.10,FR,France,,)
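As a sanity check, the REGEX_EXTRACT pattern in the script is meant to pull the client IP (capture group 1) out of the leading "ip:port " prefix of the log body. The same pattern works in Python (the sample line here is illustrative, not a real log record):

```python
import re

# Same pattern as in the Pig script; Pig's doubled backslashes
# become single backslashes in a Python raw string.
# Group 1 is the client IP, group 2 the source port.
PATTERN = re.compile(r'([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}):([0-9]+) ')

sample = "77.192.16.143:52100 [16/Sep/2011:10:30:00] frontend ..."  # illustrative
m = PATTERN.search(sample)
ip = m.group(1)  # the extracted client IP
```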



But when I store the relation into an existing HTable (the last line of the script), I get the following error in the reducer's JobTracker UI:

java.io.IOException: java.lang.IllegalArgumentException: No columns to insert
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:439)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.cleanup(PigMapReduce.java:492)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:178)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
        at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.IllegalArgumentException: No columns to insert
        at org.apache.hadoop.hbase.client.HTable.validatePut(HTable.java:845)
        at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:677)
        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:667)
        at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:127)
        at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:82)
        at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:431)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
        at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:514)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:437)
        ... 9 more


Another question: what is the syntax for comments in a Pig script (other than /* ... */) to quickly exclude a single line?

I am using the CDH3u1 packages.

Thank you for helping.

Regards,

--
Damien
