Can you dump D and examine it manually to see if there are cases where the
number of columns is not what you expect?

On Tue, Sep 20, 2011 at 9:02 AM, Damien Hardy <dha...@figarocms.fr> wrote:

> Hello,
>
> This is my Pig script:
> DEFINE iplookup `wrapper.sh GeoIP` ship ('wrapper.sh')
> cache('/GeoIP/GeoIPcity.dat#GeoIP') input (stdin using
> PigStreaming(',')) output (stdout using PigStreaming(','));
>
> A = load 'log' using
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('default:body',
> '-gt=_f:squid_t:20110920 -loadKey') AS (rowkey, data);
> B = FILTER A BY rowkey matches '.*_s:204-.*';
> C = FOREACH B {
>        t = REGEX_EXTRACT(data,
>            '([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}):([0-9]+) ', 1);
>        generate rowkey, t;
> }
> D = STREAM C THROUGH iplookup;
> STORE D INTO 'geoip_pig' USING
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('location:ip,
> location:country_code,location:country_code3,location:country_name,
> location:region,location:city,location:postal_code,location:latitude,
> location:longitude,location:area_code,location:metro_code');
>
>
> There are 11 columns in my final table/column family (STORE).
>
> Some jobs (2/46) end with:
>
> java.lang.IndexOutOfBoundsException: Index: 11, Size: 11
>        at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>        at java.util.ArrayList.get(ArrayList.java:322)
>        at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:666)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:531)
>        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:269)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>        at org.apache.hadoop.mapred.Child.main(Child.java:264)
>
> Most of the jobs ended successfully.
>
> In src/org/apache/pig/backend/hadoop/hbase/HBaseStorage.java, around line
> 666 (damn!):
>
>        for (int i = 1; i < t.size(); ++i) {
>            ColumnInfo columnInfo = columnInfo_.get(i - 1);
>            if (LOG.isDebugEnabled()) {
>                LOG.debug("putNext - tuple: " + i + ", value=" + t.get(i) +
>                        ", cf:column=" + columnInfo);
>            }
>
>
> Is it possible that columnInfo_ and t are not the same size? In which
> cases?
>
> Regards,
>
> --
> Damien
>
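For what it's worth, one scenario that would produce exactly "Index: 11, Size: 11" is the wrapper emitting an extra delimiter, e.g. a GeoIP city name that itself contains a comma, so PigStreaming(',') splits the line into 13 fields against 11 configured columns. That cause is only a guess; here is a rough Python sketch of the putNext loop shape (the real code is the Java in HBaseStorage.java quoted above, and the sample line is made up):

```python
# Illustrative sketch, not Pig source: if the streamed line carries one
# extra delimiter, the tuple has more fields than columnInfo_, and a loop
# indexed like HBaseStorage.putNext reads past the end of the column list.
columns = ["location:%s" % c for c in (
    "ip", "country_code", "country_code3", "country_name", "region",
    "city", "postal_code", "latitude", "longitude", "area_code",
    "metro_code")]  # the 11 columns from the STORE statement

# Hypothetical wrapper output where the city field contains a comma,
# yielding 13 fields instead of rowkey + 11 values = 12.
line = "rowkey,1.2.3.4,FR,FRA,France,A8,Boulogne,Billancourt,92100,48.8,2.2,0,0"
tuple_ = line.split(",")

def put_next(t, column_info):
    # mirrors: for (int i = 1; i < t.size(); ++i) columnInfo_.get(i - 1)
    for i in range(1, len(t)):
        if i - 1 >= len(column_info):
            raise IndexError("Index: %d, Size: %d" % (i - 1, len(column_info)))
        # ... would write t[i] under column_info[i - 1] here ...
```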
