Hello there,

Based on http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/ I want to add geolocation to my raw HAProxy logs stored in an HBase table.

Here is my Pig script (wrapper.sh is a self-extracting bash archive that deploys the Perl script and its dependencies, very close to the one in the reference above, and launches it):

DEFINE iplookup `wrapper.sh GeoIP`
        SHIP('wrapper.sh')
        CACHE('/GeoIP/GeoIPcity.dat#GeoIP');

A = load 'log' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('default:body','-gt=_f:squid_t:201109151630 -loadKey') AS (rowkey, data);
B = LIMIT A 10;
C = FOREACH B {
        t = REGEX_EXTRACT(data, '([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}):([0-9]+) ', 1);
        GENERATE rowkey, t;
}
D = STREAM C THROUGH iplookup AS (rowkey, ip, country_code, country, state, city);
STORE D INTO 'geoip_pig' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('location:ip location:country_code location:country location:state location:city');
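For reference, the wrapper's inner script follows Pig's streaming contract: each input tuple arrives as one tab-separated line on stdin, and each tab-separated line written to stdout becomes an output tuple. Here is a simplified Python stand-in for the Perl GeoIP script (the lookup table and its entries are purely illustrative, not the real GeoIPcity.dat data):

```python
#!/usr/bin/env python
# Simplified stand-in for the Perl GeoIP streaming script.
# Pig STREAM sends each tuple as one tab-separated line on stdin and
# expects one tab-separated line per output tuple on stdout.
import sys

# Hypothetical in-memory lookup standing in for the GeoIPcity.dat database.
GEO = {"80.13.204.64": ("FR", "France", "A8", "Paris")}

def lookup(rowkey, ip):
    # Unknown IPs yield empty fields, matching the empty state/city
    # columns visible in the DUMP output below.
    cc, country, state, city = GEO.get(ip, ("", "", "", ""))
    return "\t".join((rowkey, ip, cc, country, state, city))

if __name__ == "__main__":
    for line in sys.stdin:
        rowkey, ip = line.rstrip("\n").split("\t")
        print(lookup(rowkey, ip))
```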

I can DUMP D; without any problem and get the promised geolocation data:
(_f:squid_t:20110916103000_b:squid_s:200-+PH/I6eJ9h8Sy8/1+yz2kw==,77.192.16.143,FR,France,B9,Lyon)
(_f:squid_t:20110916103000_b:squid_s:200-+XSr1ZpMyLGmi8iDvZ4lLQ==,80.13.204.64,FR,France,A8,Paris)
(_f:squid_t:20110916103000_b:squid_s:200-+gl66vwlvPL9Di1zzut9Bg==,178.250.1.40,FR,France,,)
(_f:squid_t:20110916103000_b:squid_s:200-+qAtjeGfssc2vkwWR4fmJQ==,86.73.78.25,FR,France,A8,La Courneuve)
(_f:squid_t:20110916103000_b:squid_s:200-+wgQq1q8H/vp52//EIevzA==,80.13.204.64,FR,France,A8,Paris)
(_f:squid_t:20110916103000_b:squid_s:200-/3J9EosV46v521VBlb6zxQ==,82.127.103.161,FR,France,B6,Erquery)
(_f:squid_t:20110916103000_b:squid_s:200-+gl66vwlvPL9Di1zzut9Bg==,178.250.1.40,FR,France,,)
(_f:squid_t:20110916103000_b:squid_s:200-+qAtjeGfssc2vkwWR4fmJQ==,86.73.78.25,FR,France,A8,La Courneuve)
(_f:squid_t:20110916103000_b:squid_s:200-+wgQq1q8H/vp52//EIevzA==,80.13.204.64,FR,France,A8,Paris)
(_f:squid_t:20110916103000_b:squid_s:200-/3J9EosV46v521VBlb6zxQ==,82.127.103.161,FR,France,B6,Erquery)
(_f:squid_t:20110916103000_b:squid_s:200-/3okAiWeWMmpm54Qlk7JyQ==,86.75.127.253,FR,France,B5,La Daguenière)
(_f:squid_t:20110916103000_b:squid_s:200-/yZ09fLNWflcBlWX1BjEkA==,83.200.13.146,FR,France,A8,Villiers-le-bel)
(_f:squid_t:20110916103000_b:squid_s:200-0/HiVaFE6b1zrUTtHkV05Q==,193.228.156.10,FR,France,,)
(_f:squid_t:20110916103000_b:squid_s:200-0CTc6LQ9jGpgQQLwmJZxQQ==,195.93.102.10,FR,France,,)
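As a sanity check, the REGEX_EXTRACT pattern in the script is meant to pull the client IP (capture group 1) out of the leading "ip:port " prefix of the log body. The same pattern works in Python (the sample line here is illustrative, not a real log record):

```python
import re

# Same pattern as in the Pig script; Pig's doubled backslashes
# become single backslashes in a Python raw string.
# Group 1 is the client IP, group 2 the source port.
PATTERN = re.compile(r'([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}):([0-9]+) ')

sample = "77.192.16.143:52100 [16/Sep/2011:10:30:00] frontend ..."  # illustrative
m = PATTERN.search(sample)
ip = m.group(1)  # the extracted client IP
```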



But when I store the relation into an existing HTable (the last line of the script), I get the following error in the reducer's JobTracker UI:

java.io.IOException: java.lang.IllegalArgumentException: No columns to insert
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:439)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.cleanup(PigMapReduce.java:492)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:178)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
        at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.IllegalArgumentException: No columns to insert
        at org.apache.hadoop.hbase.client.HTable.validatePut(HTable.java:845)
        at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:677)
        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:667)
        at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:127)
        at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:82)
        at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:431)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
        at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:514)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:437)
        ... 9 more


Another question: what is the syntax for comments in a Pig script (other than /* ... */) to quickly exclude a single line?

I am using the CDH3u1 packages.

Thank you for helping.

Regards,

--
Damien
