Hello there,
Based on
http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/ I
want to add geolocation data to my raw haproxy logs, which are stored in
an HBase table.
Here is my Pig script (wrapper.sh is a self-extracting bash archive that
deploys the Perl script and its dependencies, very close to the one in my
reference, and then launches it):
DEFINE iplookup `wrapper.sh GeoIP`
    ship('wrapper.sh')
    cache('/GeoIP/GeoIPcity.dat#GeoIP');
A = load 'log' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('default:body', '-gt=_f:squid_t:201109151630 -loadKey') AS (rowkey, data);
B = LIMIT A 10;
C = FOREACH B {
    t = REGEX_EXTRACT(data, '([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}):([0-9]+) ', 1);
    generate rowkey, t;
}
D = STREAM C THROUGH iplookup AS (rowkey, ip, country_code, country, state, city);
STORE D INTO 'geoip_pig' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('location:ip location:country_code location:country location:state location:city');
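To make clear what the streamed command is expected to do, here is a minimal sketch of its input/output contract. It is written in Python for illustration only and is not the actual Perl script: FAKE_GEOIP, geolocate_line and main are hypothetical names, the lookup table stands in for GeoIPcity.dat, and the sketch assumes Pig's default streaming format (one record per line, fields separated by tabs).

```python
import sys

# Hypothetical stand-in for the GeoIP city database; the real script
# reads GeoIPcity.dat from Perl instead.
FAKE_GEOIP = {
    "80.13.204.64": ("FR", "France", "A8", "Paris"),
}


def geolocate_line(line, db=FAKE_GEOIP):
    """Turn one 'rowkey<TAB>ip' line into six tab-separated fields."""
    rowkey, _, ip = line.rstrip("\n").partition("\t")
    # Unknown addresses yield empty strings so the field count stays at six.
    country_code, country, state, city = db.get(ip, ("", "", "", ""))
    return "\t".join([rowkey, ip, country_code, country, state, city])


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read records from stdin and write one geolocated record per line."""
    for line in stdin:
        if line.strip():  # skip blank lines instead of emitting empty records
            stdout.write(geolocate_line(line) + "\n")
```

Calling main() at the end of such a script would connect it to the stdin and stdout that STREAM ... THROUGH provides. The point of the sketch is only that every output line carries the row key followed by five tab-separated values, matching the AS clause of D.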
I can run DUMP D; without any problem, and I get the expected
geolocation output:
(_f:squid_t:20110916103000_b:squid_s:200-+PH/I6eJ9h8Sy8/1+yz2kw==,77.192.16.143,FR,France,B9,Lyon)
(_f:squid_t:20110916103000_b:squid_s:200-+XSr1ZpMyLGmi8iDvZ4lLQ==,80.13.204.64,FR,France,A8,Paris)
(_f:squid_t:20110916103000_b:squid_s:200-+gl66vwlvPL9Di1zzut9Bg==,178.250.1.40,FR,France,,)
(_f:squid_t:20110916103000_b:squid_s:200-+qAtjeGfssc2vkwWR4fmJQ==,86.73.78.25,FR,France,A8,La Courneuve)
(_f:squid_t:20110916103000_b:squid_s:200-+wgQq1q8H/vp52//EIevzA==,80.13.204.64,FR,France,A8,Paris)
(_f:squid_t:20110916103000_b:squid_s:200-/3J9EosV46v521VBlb6zxQ==,82.127.103.161,FR,France,B6,Erquery)
(_f:squid_t:20110916103000_b:squid_s:200-/3okAiWeWMmpm54Qlk7JyQ==,86.75.127.253,FR,France,B5,La Daguenière)
(_f:squid_t:20110916103000_b:squid_s:200-/yZ09fLNWflcBlWX1BjEkA==,83.200.13.146,FR,France,A8,Villiers-le-bel)
(_f:squid_t:20110916103000_b:squid_s:200-0/HiVaFE6b1zrUTtHkV05Q==,193.228.156.10,FR,France,,)
(_f:squid_t:20110916103000_b:squid_s:200-0CTc6LQ9jGpgQQLwmJZxQQ==,195.93.102.10,FR,France,,)
But when I try to store the result (the last line of the script) into a
new HBase table that has already been created, I get the following error
message for the reduce task in the JobTracker UI:
java.io.IOException: java.lang.IllegalArgumentException: No columns to insert
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:439)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.cleanup(PigMapReduce.java:492)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:178)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.IllegalArgumentException: No columns to insert
    at org.apache.hadoop.hbase.client.HTable.validatePut(HTable.java:845)
    at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:677)
    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:667)
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:127)
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:82)
    at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:431)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:514)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:437)
    ... 9 more
Another question: what is the syntax for single-line comments in a Pig
script (other than /* ... */), so that I can quickly disable one line?
I am using the CDH3u1 packages.
Thank you for your help.
Regards,
--
Damien