Hello everyone,
I'm having trouble with Pig's FLATTEN operator: I get several exceptions
when trying to flatten a bag.
I read in Jira and on the mailing lists that other people had problems
flattening a bag embedded within a tuple, but that should have been fixed in
versions 0.8.1 and 0.9.0. I've tried both 0.9.0 and 0.9.1, and both give me
the same result.
Any help is greatly appreciated,
Daan Gerits
==================== Snippet Start ====================
parserResults =
FOREACH fetchResultsFlattened {
parsed = parse('GoogleSearchItems.xml', content);
GENERATE queryString, FLATTEN(parsed);
}
DESCRIBE parserResults;
parserResults: {null::queryString: chararray,fields::id: chararray,fields::path: chararray,fields::selector: chararray,fields::type: chararray,fields::values: {(value: chararray)}}
parserValues =
FOREACH parserResults
GENERATE queryString, id, path, selector, type, FLATTEN(values);
DESCRIBE parserValues;
parserValues: {null::queryString: chararray,fields::id: chararray,fields::path: chararray,fields::selector: chararray,fields::type: chararray,fields::values::value: chararray}
DUMP parserValues;
java.lang.IndexOutOfBoundsException: Index: 5, Size: 2
	at java.util.ArrayList.RangeCheck(ArrayList.java:547)
	at java.util.ArrayList.get(ArrayList.java:322)
	at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:158)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:575)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:248)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:316)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:459)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:427)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:407)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:256)
==================== Snippet End ====================
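For reference, below is a small Python model of what I expect the second FOREACH to produce. It is not Pig code and not how Pig implements FLATTEN internally. It follows the documented FLATTEN semantics: each input row becomes one output row per tuple in its `values` bag, each output row has six fields, and a row with an empty bag produces no output. Modelling a row as a dict keyed by the DESCRIBE field names is my own assumption for illustration, and the sample data is hypothetical. Index 5 in the exception is the position of `values` in the parserResults schema. So the tuple being read at runtime appears to hold only two fields instead of the six that DESCRIBE reports.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Assumption: one parserResults row modelled as a dict keyed by the
# DESCRIBE field names; "values" is a bag, i.e. a list of 1-tuples.
Row = Dict[str, Any]


def flatten_values(rows: List[Row]) -> Iterator[Tuple[str, ...]]:
    """Model of: GENERATE queryString, id, path, selector, type, FLATTEN(values);"""
    for row in rows:
        # FLATTEN on a bag emits one output tuple per tuple in the bag.
        # An empty bag yields nothing, so that input row is dropped.
        for (value,) in row["values"]:
            yield (row["queryString"], row["id"], row["path"],
                   row["selector"], row["type"], value)


# Hypothetical sample data, not taken from my actual input.
sample = [
    {"queryString": "q1", "id": "id1", "path": "/a", "selector": "div",
     "type": "text", "values": [("v1",), ("v2",)]},
    {"queryString": "q2", "id": "id2", "path": "/b", "selector": "span",
     "type": "text", "values": []},
]

for out in flatten_values(sample):
    print(out)  # each tuple has six fields
```

With this sample, the first row expands into two output rows and the second row, whose bag is empty, disappears.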
--
kind regards,
Gerits Daan
BigData Consultant
Foundation.be
Hoekskensweg 12, 9290 Berlare
btw: BE 0872.859.349
gsm: +32 477 759533
web1: http://www.foundation.be
web2: http://www.relacss.com