Thank you, Dmitriy.

The 0.9.1-SNAPSHOT version of Pig runs the same script without error ...
Is there a bug open at Cloudera?

Thank you.

Regards.

--
Damien


On 16/09/2011 11:26, Dmitriy Ryaboy wrote:
1) Please try trunk.

2) As in SQL, a single-line comment is preceded by two dashes: "--"
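
A minimal sketch of both comment styles in Pig Latin (relation and path names here are made up for illustration):

```pig
-- single-line comment: everything after the two dashes is ignored
A = load 'input' AS (f1:chararray);
/* block comment:
   can span several lines */
-- B = LIMIT A 10;   -- prefixing a line with "--" is a quick way to disable it
DUMP A;
```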


D

On Fri, Sep 16, 2011 at 2:10 AM, Damien Hardy <[email protected]> wrote:

Hello there,

Based on http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/
I want to add geolocation to my haproxy raw logs stored in an HBase table.

Here is my Pig script (wrapper.sh is a self-extracting bash archive that
deploys the Perl script and its dependencies, very close to the one in my
reference above, and launches it):

DEFINE iplookup `wrapper.sh GeoIP`
    ship ('wrapper.sh')
    cache('/GeoIP/GeoIPcity.dat#GeoIP');

A = load 'log' using org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'default:body', '-gt=_f:squid_t:201109151630 -loadKey')
    AS (rowkey, data);
B = LIMIT A 10;
C = FOREACH B {
        t = REGEX_EXTRACT(data,
            '([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}):([0-9]+)', 1);
        generate rowkey, t;
}
D = STREAM C THROUGH iplookup AS (rowkey, ip, country_code, country, state,
    city);
STORE D INTO 'geoip_pig' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
    'location:ip location:country_code location:country
     location:state location:city');
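
For context, my understanding of the HBaseStorage contract (an assumption on my part, from the 0.8-era documentation) is that fields are mapped positionally: the first field of the relation becomes the row key and is not listed among the columns, and each remaining field maps, in order, to one named column:

```pig
-- Assumed mapping (not verified against the source):
-- D: (rowkey, ip, country_code, country, state, city)
--     ^ row key, then 5 fields for the 5 'location:*' columns below
STORE D INTO 'geoip_pig' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
    'location:ip location:country_code location:country
     location:state location:city');
```

so the number of columns listed must equal the number of fields after the key.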

I can DUMP D; without problem, and I get the promised geolocation:
(_f:squid_t:20110916103000_b:squid_s:200-+PH/I6eJ9h8Sy8/1+yz2kw==,77.192.16.143,FR,France,B9,Lyon)
(_f:squid_t:20110916103000_b:squid_s:200-+XSr1ZpMyLGmi8iDvZ4lLQ==,80.13.204.64,FR,France,A8,Paris)
(_f:squid_t:20110916103000_b:squid_s:200-+gl66vwlvPL9Di1zzut9Bg==,178.250.1.40,FR,France,,)
(_f:squid_t:20110916103000_b:squid_s:200-+qAtjeGfssc2vkwWR4fmJQ==,86.73.78.25,FR,France,A8,La Courneuve)
(_f:squid_t:20110916103000_b:squid_s:200-+wgQq1q8H/vp52//EIevzA==,80.13.204.64,FR,France,A8,Paris)
(_f:squid_t:20110916103000_b:squid_s:200-/3J9EosV46v521VBlb6zxQ==,82.127.103.161,FR,France,B6,Erquery)
(_f:squid_t:20110916103000_b:squid_s:200-/3okAiWeWMmpm54Qlk7JyQ==,86.75.127.253,FR,France,B5,La Daguenière)
(_f:squid_t:20110916103000_b:squid_s:200-/yZ09fLNWflcBlWX1BjEkA==,83.200.13.146,FR,France,A8,Villiers-le-bel)
(_f:squid_t:20110916103000_b:squid_s:200-0/HiVaFE6b1zrUTtHkV05Q==,193.228.156.10,FR,France,,)
(_f:squid_t:20110916103000_b:squid_s:200-0CTc6LQ9jGpgQQLwmJZxQQ==,195.93.102.10,FR,France,,)



But when I try to STORE (last line) into a new, already-existing HTable, I
get the following error message in the reduce task in the JobTracker UI:

java.io.IOException: java.lang.IllegalArgumentException: No columns to insert
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:439)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.cleanup(PigMapReduce.java:492)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:178)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
        at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.IllegalArgumentException: No columns to insert
        at org.apache.hadoop.hbase.client.HTable.validatePut(HTable.java:845)
        at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:677)
        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:667)
        at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:127)
        at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:82)
        at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:431)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
        at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:514)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:437)
        ... 9 more


Another question: what is the syntax for comments in a Pig script (other
than /* ... */) to quickly disable a single line?

I use cdh3u1 packages.

Thank you for helping.

Regards,

--
Damien

