1) Please try trunk. 2) Like in SQL, a single-line comment is preceded by two dashes: "--"
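A quick illustration of that comment syntax (a minimal sketch, not taken from the thread):

```pig
-- single-line comment: everything after the two dashes is ignored
A = LOAD 'log';   -- trailing comments on statement lines work too
/* block comments
   remain available for larger spans */
```

Prefixing a statement with `--` is the quickest way to exclude one line from a script.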
D

On Fri, Sep 16, 2011 at 2:10 AM, Damien Hardy <[email protected]> wrote:
> Hello there,
>
> Based on http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/
> I want to add geolocalization to my haproxy raw logs stored in an HBase table.
>
> Here is my pig script (wrapper.sh is a self-extracting bash archive that
> deploys the perl script and its dependencies, very close to the one in my
> reference, and launches it):
>
> DEFINE iplookup `wrapper.sh GeoIP`
>     ship ('wrapper.sh')
>     cache('/GeoIP/GeoIPcity.dat#GeoIP');
>
> A = load 'log' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('default:body', '-gt=_f:squid_t:201109151630 -loadKey') AS (rowkey, data);
> B = LIMIT A 10;
> C = FOREACH B {
>     t = REGEX_EXTRACT(data, '([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}):([0-9]+) ', 1);
>     generate rowkey, t;
> }
> D = STREAM C THROUGH iplookup AS (rowkey, ip, country_code, country, state, city);
> STORE D INTO 'geoip_pig' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('location:ip location:country_code location:country location:state location:city');
>
> I can DUMP D; without problem, and I get what is promised, with my
> geolocalization:
>
> (_f:squid_t:20110916103000_b:squid_s:200-+PH/I6eJ9h8Sy8/1+yz2kw==,77.192.16.143,FR,France,B9,Lyon)
> (_f:squid_t:20110916103000_b:squid_s:200-+XSr1ZpMyLGmi8iDvZ4lLQ==,80.13.204.64,FR,France,A8,Paris)
> (_f:squid_t:20110916103000_b:squid_s:200-+gl66vwlvPL9Di1zzut9Bg==,178.250.1.40,FR,France,,)
> (_f:squid_t:20110916103000_b:squid_s:200-+qAtjeGfssc2vkwWR4fmJQ==,86.73.78.25,FR,France,A8,La Courneuve)
> (_f:squid_t:20110916103000_b:squid_s:200-+wgQq1q8H/vp52//EIevzA==,80.13.204.64,FR,France,A8,Paris)
> (_f:squid_t:20110916103000_b:squid_s:200-/3J9EosV46v521VBlb6zxQ==,82.127.103.161,FR,France,B6,Erquery)
> (_f:squid_t:20110916103000_b:squid_s:200-/3okAiWeWMmpm54Qlk7JyQ==,86.75.127.253,FR,France,B5,La Daguenière)
> (_f:squid_t:20110916103000_b:squid_s:200-/yZ09fLNWflcBlWX1BjEkA==,83.200.13.146,FR,France,A8,Villiers-le-bel)
> (_f:squid_t:20110916103000_b:squid_s:200-0/HiVaFE6b1zrUTtHkV05Q==,193.228.156.10,FR,France,,)
> (_f:squid_t:20110916103000_b:squid_s:200-0CTc6LQ9jGpgQQLwmJZxQQ==,195.93.102.10,FR,France,,)
>
> But when I want to store (last line) into an existing HTable, I get the
> following error message in the reducer's JobTracker UI:
>
> java.io.IOException: java.lang.IllegalArgumentException: No columns to insert
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:439)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.cleanup(PigMapReduce.java:492)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:178)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>     at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: java.lang.IllegalArgumentException: No columns to insert
>     at org.apache.hadoop.hbase.client.HTable.validatePut(HTable.java:845)
>     at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:677)
>     at org.apache.hadoop.hbase.client.HTable.put(HTable.java:667)
>     at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:127)
>     at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:82)
>     at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:431)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
>     at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:514)
>     at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:437)
>     ... 9 more
>
> Other question: what is the syntax for comments in a Pig script (other than
> /* ... */), to exclude one line quickly?
>
> I use cdh3u1 packages.
>
> Thank you for helping.
>
> Regards,
>
> --
> Damien
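One hedged guess at the "No columns to insert" error (an assumption, not confirmed anywhere in this thread): Pig's default streaming (de)serializer splits the streamed script's stdout on tabs, so if wrapper.sh emits comma-separated lines, D ends up as a single-field tuple. DUMP can still look correct, because the commas are inside that one field, but at STORE time HBaseStorage finds no column values after the row key, and HTable.validatePut rejects the empty Put. If that is the case, one workaround is to split the field explicitly (STRSPLIT is available in Pig 0.8+, which cdh3u1 ships; the relation names below are illustrative):

```pig
-- Assumption: wrapper.sh prints comma-separated lines, while Pig's default
-- stream deserializer expects tab-separated fields. Read the whole line as
-- one chararray, then split it into the six expected fields.
D  = STREAM C THROUGH iplookup AS (line:chararray);
D2 = FOREACH D GENERATE FLATTEN(STRSPLIT(line, ',', 6))
         AS (rowkey:chararray, ip:chararray, country_code:chararray,
             country:chararray, state:chararray, city:chararray);
STORE D2 INTO 'geoip_pig' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'location:ip location:country_code location:country location:state location:city');
```

Alternatively, changing wrapper.sh's perl script to emit tab-separated output would let the original STREAM ... AS (...) clause work unchanged.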
