Can you dump D and examine it manually to see if there are cases where the number of columns is not what you expect?
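For example (the output path below is just an illustration), you could write D out as plain text right before the HBase STORE and check the field counts by hand:

    STORE D INTO '/tmp/geoip_pig_debug' USING PigStorage(',');
    -- or, for a small sample in grunt:
    -- DUMP D;

HBaseStorage expects each record of D to be the row key plus one value per column listed in the STORE, i.e. 12 fields in your case; any record that comes back from the streaming wrapper with more fields than that will fail in putNext.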
On Tue, Sep 20, 2011 at 9:02 AM, Damien Hardy <dha...@figarocms.fr> wrote:
> Hello,
>
> This is my pig script:
>
> DEFINE iplookup `wrapper.sh GeoIP` ship ('wrapper.sh')
>     cache('/GeoIP/GeoIPcity.dat#GeoIP')
>     input (stdin using PigStreaming(','))
>     output (stdout using PigStreaming(','));
>
> A = load 'log' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('default:body','-gt=_f:squid_t:20110920 -loadKey') AS (rowkey, data);
> B = FILTER A BY rowkey matches '.*_s:204-.*';
> C = FOREACH B {
>     t = REGEX_EXTRACT(data,'([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}):([0-9]+) ',1);
>     generate rowkey, t;
> }
> D = STREAM C THROUGH iplookup;
> STORE D INTO 'geoip_pig' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('location:ip,location:country_code,location:country_code3,location:country_name,location:region,location:city,location:postal_code,location:latitude,location:longitude,location:area_code,location:metro_code');
>
> There are 11 columns in my final table/column family (the STORE).
>
> I get some jobs (2/46) ending with:
>
> java.lang.IndexOutOfBoundsException: Index: 11, Size: 11
>     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>     at java.util.ArrayList.get(ArrayList.java:322)
>     at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:666)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>     at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:531)
>     at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:269)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>     at org.apache.hadoop.mapred.Child.main(Child.java:264)
>
> Most of the jobs ended successfully.
>
> In src/org/apache/pig/backend/hadoop/hbase/HBaseStorage.java around line 666 (Damn!):
>
>     for (int i=1;i< t.size();++i){
>         ColumnInfo columnInfo = columnInfo_.get(i-1);
>         if (LOG.isDebugEnabled()) {
>             LOG.debug("putNext - tuple: " + i + ", value=" + t.get(i) +
>                 ", cf:column=" + columnInfo);
>         }
>
> Is it possible that columnInfo_ and t are not the same size? In which case?
>
> Regards,
>
> --
> Damien
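To answer the last question: yes, it can happen. The quoted loop calls columnInfo_.get(i-1) for i = 1 .. t.size()-1, so it assumes every tuple is exactly one row key plus one value per configured column (12 fields for your 11 columns). "Index: 11, Size: 11" therefore means at least one streamed record came back with 13 or more fields; with a ',' delimiter, one common way that happens is a field value (a city or region name, say) that itself contains a comma. A quick, rough way to spot such records, assuming your Pig build has the SIZE and TOTUPLE built-ins:

    D_counts = FOREACH D GENERATE SIZE(TOTUPLE(*));
    DUMP D_counts;
    -- every well-formed record should print (12); anything larger is a record
    -- that pushes columnInfo_.get(i-1) past the end of the column list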