Thanks Dmitriy, I will mention it to Amazon for sure. That was the first thing I tried, and it didn't seem to work. Not sure what I could be doing wrong. I get an index-out-of-bounds error where the index corresponds to the first instance of the optional field. Here is the stack trace:

Pig Stack Trace
---------------
ERROR 2999: Unexpected internal error. Index: 29, Size: 29

java.lang.IndexOutOfBoundsException: Index: 29, Size: 29
        at java.util.ArrayList.RangeCheck(ArrayList.java:547)
        at java.util.ArrayList.get(ArrayList.java:322)
        at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
        at org.apache.pig.pen.util.ExampleTuple.get(ExampleTuple.java:80)
        at org.apache.pig.pen.AugmentBaseDataVisitor.visit(AugmentBaseDataVisitor.java:427)
        at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:210)
        at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:52)
        at org.apache.pig.pen.util.PreOrderDepthFirstWalker.depthFirst(PreOrderDepthFirstWalker.java:70)
        at org.apache.pig.pen.util.PreOrderDepthFirstWalker.depthFirst(PreOrderDepthFirstWalker.java:72)
        at org.apache.pig.pen.util.PreOrderDepthFirstWalker.walk(PreOrderDepthFirstWalker.java:55)
        at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
        at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:121)
        at org.apache.pig.PigServer.getExamples(PigServer.java:731)
        at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:557)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:246)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
        at org.apache.pig.Main.main(Main.java:374)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
================================================================================

On Sun, Nov 13, 2011 at 12:30 AM, Dmitriy Ryaboy <[email protected]> wrote:

> If you change the load statement to "load '$input' as (f1, f2, f3, f4,
> f5), f4 and f5 will be treated as null if they are absent in the raw
> logs.
>
> If you start relying on Pig heavily, lobby Amazon to upgrade their
> version of Pig (or at least provide both 0.6 and 0.9.1). At this
> point, 0.6 is positively ancient. But the extra field behavior worked
> that way then, too.
>
> D
>
> On Sat, Nov 12, 2011 at 4:08 PM, B M D Gill <[email protected]> wrote:
> > I'm a newbie running Pig 0.6 on Amazon Elastic Map Reduce. I need to make
> > a change to add additional fields to the log files that I run my pig jobs
> > on and am wondering how do I handle this schema in pig.
> >
> > My current inputs are tab separated fields that I input using the standard
> > pig storage function:
> >
> > LOAD '$INPUT' USING PigStorage('\t') as (f1, f2, f3);
> >
> > However some input files will now have additional fields f4, f5, f6 etc. at
> > the trailing edge of each line. How do I set up the load function to
> > handle these optional fields? Do I need to make changes to my logic to
> > deal with these fields possibly being empty or will Pig simply record their
> > value as null if they are absent?
> >
> > Thanks to anyone who can share some insight.
> >
>
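
For readers following the thread, the null-padding behavior described in the reply above (trailing fields declared in the schema but absent from a line are treated as null) can be sketched in plain Python. This is only an illustration of the semantics, not Pig's implementation; the names `load_line` and `FIELDS` are hypothetical and do not come from Pig or from the thread.

```python
def load_line(line, field_names, delimiter="\t"):
    """Split one delimited log line and map it onto the declared field names.

    Fields that are missing at the trailing edge of the line become None,
    which mirrors the "treated as null" behavior described in the thread.
    """
    values = line.rstrip("\n").split(delimiter)
    # Pad short rows with None; a non-positive count yields an empty list,
    # so rows that already have enough fields are left unchanged.
    padded = values + [None] * (len(field_names) - len(values))
    # zip stops at the shorter input, so any extra trailing values are dropped.
    return dict(zip(field_names, padded))


# Hypothetical schema: three original fields plus two optional ones.
FIELDS = ["f1", "f2", "f3", "f4", "f5"]

short_row = load_line("a\tb\tc\n", FIELDS)       # old-style line, 3 fields
full_row = load_line("a\tb\tc\td\te\n", FIELDS)  # new-style line, 5 fields
```

With this sketch, `short_row["f4"]` and `short_row["f5"]` are `None`, so downstream logic only needs a null check for the optional fields rather than a separate code path per file format.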
