Hello,

This is my Pig script:

DEFINE iplookup `wrapper.sh GeoIP`
    ship('wrapper.sh')
    cache('/GeoIP/GeoIPcity.dat#GeoIP')
    input(stdin using PigStreaming(','))
    output(stdout using PigStreaming(','));

A = LOAD 'log' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('default:body', '-gt=_f:squid_t:20110920 -loadKey') AS (rowkey, data);
B = FILTER A BY rowkey matches '.*_s:204-.*';
C = FOREACH B {
        t = REGEX_EXTRACT(data, '([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}):([0-9]+) ', 1);
        generate rowkey, t;
}
D = STREAM C THROUGH iplookup;
STORE D INTO 'geoip_pig' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('location:ip,location:country_code,location:country_code3,location:country_name,location:region,location:city,location:postal_code,location:latitude,location:longitude,location:area_code,location:metro_code');


There are 11 columns in my final table/column family (the STORE clause).

Some of the tasks (2 out of 46) fail with:

java.lang.IndexOutOfBoundsException: Index: 11, Size: 11
        at java.util.ArrayList.RangeCheck(ArrayList.java:547)
        at java.util.ArrayList.get(ArrayList.java:322)
        at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:666)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:531)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:269)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
        at org.apache.hadoop.mapred.Child.main(Child.java:264)

Most of the tasks ended successfully.

In src/org/apache/pig/backend/hadoop/hbase/HBaseStorage.java, around line 666 (damn!):

        for (int i = 1; i < t.size(); ++i) {
            ColumnInfo columnInfo = columnInfo_.get(i - 1);
            if (LOG.isDebugEnabled()) {
                LOG.debug("putNext - tuple: " + i + ", value=" + t.get(i) +
                        ", cf:column=" + columnInfo);
            }

Is it possible that columnInfo_ and t are not the same size? In which cases?
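Reading the loop above, "Index: 11, Size: 11" from columnInfo_.get(i - 1) means i reached 12, i.e. the tuple had at least 13 fields while only 11 columns were configured. One way I could imagine the sizes diverging (a hypothetical illustration, not taken from the Pig source): PigStreaming(',') splits the wrapper's output on every comma, so a value that itself contains a comma, e.g. a city name, would add an extra field to the tuple. A minimal sketch with made-up data:

```java
import java.util.Arrays;

public class FieldCountCheck {
    // Hypothetical helper: count the fields PigStreaming(',') would
    // produce from one line of the wrapper's stdout.
    public static int fieldCount(String streamedLine) {
        // -1 keeps trailing empty fields, like a naive delimiter split would
        return streamedLine.split(",", -1).length;
    }

    public static void main(String[] args) {
        int numStoreColumns = 11;           // columns listed in the STORE clause
        int expected = numStoreColumns + 1; // + 1 for the row key

        // Made-up GeoIP output: city without / with an embedded comma
        String good = "key,1.2.3.4,US,USA,United States,VA,Reston,20191,38.9,-77.3,703,511";
        String bad  = "key,1.2.3.4,US,USA,United States,DC,\"Washington, D.C.\",20001,38.9,-77.0,202,511";

        System.out.println(fieldCount(good) == expected); // matches: stores fine
        System.out.println(fieldCount(bad)  == expected); // one field too many
    }
}
```

If that is the cause, a comma inside any GeoIP field (city names, or latitude/longitude formatted with a comma decimal separator) on a handful of records would explain why only 2 of 46 tasks fail.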

Regards,

--
Damien
