Hello there,
Based on http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/ I
want to add geolocation to my haproxy raw logs stored in an HBase table.
Here is my Pig script (wrapper.sh is a self-extracting bash archive that
deploys the Perl script and its dependencies, very close to the one in my
reference, and launches it):
DEFINE iplookup `wrapper.sh GeoIP`
    ship ('wrapper.sh')
    cache('/GeoIP/GeoIPcity.dat#GeoIP');
A = load 'log' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('default:body', '-gt=_f:squid_t:201109151630 -loadKey') AS (rowkey, data);
B = LIMIT A 10;
C = FOREACH B {
    t = REGEX_EXTRACT(data, '([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}):([0-9]+)', 1);
    generate rowkey, t;
}
D = STREAM C THROUGH iplookup AS (rowkey, ip, country_code, country, state, city);
STORE D INTO 'geoip_pig' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('location:ip location:country_code location:country location:state location:city');
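As an aside, the REGEX_EXTRACT pattern is meant to capture the client IP (group 1) out of the log body. Here is the same pattern exercised in Python, just to show what it matches (the sample log line below is made up, not a real entry from my logs):

```python
import re

# The pattern from the Pig script (Pig's doubled backslashes become single ones here).
pattern = r'([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}):([0-9]+)'

# A made-up haproxy-style fragment, only to exercise the regex.
sample = 'Sep 16 10:30:00 haproxy[123]: 77.192.16.143:51234 ...'

m = re.search(pattern, sample)
print(m.group(1))  # the client IP -- what REGEX_EXTRACT(..., 1) returns
print(m.group(2))  # the source port
```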
I can DUMP D; without any problem and get the geolocation data as promised:
(_f:squid_t:20110916103000_b:squid_s:200-+PH/I6eJ9h8Sy8/1+yz2kw==,77.192.16.143,FR,France,B9,Lyon)
(_f:squid_t:20110916103000_b:squid_s:200-+XSr1ZpMyLGmi8iDvZ4lLQ==,80.13.204.64,FR,France,A8,Paris)
(_f:squid_t:20110916103000_b:squid_s:200-+gl66vwlvPL9Di1zzut9Bg==,178.250.1.40,FR,France,,)
(_f:squid_t:20110916103000_b:squid_s:200-+qAtjeGfssc2vkwWR4fmJQ==,86.73.78.25,FR,France,A8,La Courneuve)
(_f:squid_t:20110916103000_b:squid_s:200-+wgQq1q8H/vp52//EIevzA==,80.13.204.64,FR,France,A8,Paris)
(_f:squid_t:20110916103000_b:squid_s:200-/3J9EosV46v521VBlb6zxQ==,82.127.103.161,FR,France,B6,Erquery)
(_f:squid_t:20110916103000_b:squid_s:200-/3okAiWeWMmpm54Qlk7JyQ==,86.75.127.253,FR,France,B5,La Daguenière)
(_f:squid_t:20110916103000_b:squid_s:200-/yZ09fLNWflcBlWX1BjEkA==,83.200.13.146,FR,France,A8,Villiers-le-bel)
(_f:squid_t:20110916103000_b:squid_s:200-0/HiVaFE6b1zrUTtHkV05Q==,193.228.156.10,FR,France,,)
(_f:squid_t:20110916103000_b:squid_s:200-0CTc6LQ9jGpgQQLwmJZxQQ==,195.93.102.10,FR,France,,)
But when I try the STORE (the last line of the script) into another,
already-created HTable, I get the following error message in the reduce task
logs in the JobTracker UI:
java.io.IOException: java.lang.IllegalArgumentException: No columns to insert
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:439)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.cleanup(PigMapReduce.java:492)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:178)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
        at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.IllegalArgumentException: No columns to insert
        at org.apache.hadoop.hbase.client.HTable.validatePut(HTable.java:845)
        at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:677)
        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:667)
        at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:127)
        at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:82)
        at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:431)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
        at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:514)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:437)
        ... 9 more
One other question: what is the syntax for comments in a Pig script (other
than /* ... */) to quickly comment out a single line?
I am using the CDH3u1 packages.
Thank you for your help.
Regards,
--
Damien