I found that Pig gets confused about the schema after a complicated but correct
nested FOREACH operation.
My script is attached unmodified, and it produces the error messages below:
Picked up _JAVA_OPTIONS: -Xmx1G
2014-03-24 13:05:18,662 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0 (r1529718) compiled Oct 07 2013, 12:20:14
2014-03-24 13:05:18,663 [main] INFO org.apache.pig.Main - Logging error messages to: /mnt/tera/workspace/OmnilabMisc/sjtuwifi/activities/pig_1395637518659.log
2014-03-24 13:05:18,897 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/chenxm/.pigbootup not found
2014-03-24 13:05:18,990 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
activities: {group: chararray,brief: {(activityID: chararray,reqHost: chararray,rspPylByt: long,pylByt: long,reqTime: double,reqDur: double,rspTime: double,rspDur: double)}}
2014-03-24 13:05:19,766 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_DOUBLE 5 time(s).
features: {activityID: chararray,service: chararray,volume: long,size: long,ADur: double,MWTime: double,MEdur: double,VMR: double,CI: double,PABw: double}
2014-03-24 13:05:19,904 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_DOUBLE 11 time(s).
2014-03-24 13:05:19,904 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_LONG 2 time(s).
filtered: {activityID: chararray,service: chararray,volume: long,size: long,ADur: double,MWTime: double,MEdur: double,VMR: double,CI: double,PABw: double}
2014-03-24 13:05:20,049 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: <file /home/chenxm/tera/workspace/OmnilabMisc/sjtuwifi/activities/features_perf.pig, line 47, column 142> Out of bound access. Trying to access non-existent column: 8. Schema activityID:chararray,reqHost:chararray,rspPylByt:long,pylByt:long,reqTime:double,reqDur:double,rspTime:double,rspDur:double has 8 column(s).
Details at logfile: ************/pig_1395637518659.log
[Finished in 1.7s with exit code 6]
In the output, the schema of the 'filtered' projection is correct, but in the
following FOREACH [line 47] Pig treats 'filtered' as if it had the same schema
as 'brief' [line 16].
I do not know why Pig is confused about this. Is this a bug, or am I using it
incorrectly?
Best,
Jamin