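I bet you are using ILLUSTRATE in your Pig script: the stack trace goes through GruntParser.processIllustrate and ExampleGenerator, and ILLUSTRATE may have a problem with the optional fields. Use either DUMP or STORE instead, and your script should work fine.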
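The top frames of the trace in the quoted message show why ILLUSTRATE trips over the short lines. ExampleTuple.get delegates to DefaultTuple.get, which calls ArrayList.get with no range check, so asking a 29-field record for index 29 (the 30th field, zero-based) throws. Below is a minimal Java sketch of that difference. It assumes a record is just a List of field values, and `rawGet` and `nullSafeGet` are hypothetical helpers, not Pig's API. `rawGet` mirrors the unchecked access in the trace. `nullSafeGet` returns null for positions past the end of the record, matching Dmitriy's description of how absent trailing fields are treated.

```java
import java.util.ArrayList;
import java.util.List;

public class MissingFieldDemo {

    // Unchecked access, like the ArrayList.get call at the top of the stack trace.
    // Throws IndexOutOfBoundsException when index >= fields.size().
    static Object rawGet(List<Object> fields, int index) {
        return fields.get(index);
    }

    // Hypothetical null-safe access: a position past the end of the record
    // is reported as null instead of throwing.
    static Object nullSafeGet(List<Object> fields, int index) {
        return index < fields.size() ? fields.get(index) : null;
    }

    public static void main(String[] args) {
        // A record with 29 fields (indexes 0..28), as in "Size: 29".
        List<Object> tuple = new ArrayList<>();
        for (int i = 0; i < 29; i++) {
            tuple.add("v" + i);
        }

        System.out.println(nullSafeGet(tuple, 28)); // prints v28 (last present field)
        System.out.println(nullSafeGet(tuple, 29)); // prints null (first optional field)

        try {
            rawGet(tuple, 29);
        } catch (IndexOutOfBoundsException e) {
            // The message wording differs by JDK; Java 6 printed "Index: 29, Size: 29".
            System.out.println("rawGet threw: " + e.getClass().getSimpleName());
        }
    }
}
```

The design point is simply where the bounds check lives: code that reads a declared field must be prepared for the record to be shorter than the schema.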
Ashutosh

On Sat, Nov 12, 2011 at 17:03, B M D Gill <[email protected]> wrote:
> Thanks Dmitriy, I will mention it to Amazon for sure.
>
> That was the first thing I tried, and it didn't seem to work. I'm not
> sure what I could be doing wrong. I get an IndexOutOfBoundsException where
> the index corresponds to the position of the first optional field. Here is
> the stack trace:
>
> Pig Stack Trace
> ---------------
> ERROR 2999: Unexpected internal error. Index: 29, Size: 29
>
> java.lang.IndexOutOfBoundsException: Index: 29, Size: 29
>         at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>         at java.util.ArrayList.get(ArrayList.java:322)
>         at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
>         at org.apache.pig.pen.util.ExampleTuple.get(ExampleTuple.java:80)
>         at org.apache.pig.pen.AugmentBaseDataVisitor.visit(AugmentBaseDataVisitor.java:427)
>         at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:210)
>         at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:52)
>         at org.apache.pig.pen.util.PreOrderDepthFirstWalker.depthFirst(PreOrderDepthFirstWalker.java:70)
>         at org.apache.pig.pen.util.PreOrderDepthFirstWalker.depthFirst(PreOrderDepthFirstWalker.java:72)
>         at org.apache.pig.pen.util.PreOrderDepthFirstWalker.walk(PreOrderDepthFirstWalker.java:55)
>         at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
>         at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:121)
>         at org.apache.pig.PigServer.getExamples(PigServer.java:731)
>         at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:557)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:246)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>         at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
>         at org.apache.pig.Main.main(Main.java:374)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> ================================================================================
>
> On Sun, Nov 13, 2011 at 12:30 AM, Dmitriy Ryaboy <[email protected]> wrote:
>
> > If you change the load statement to
> > "load '$input' as (f1, f2, f3, f4, f5)", f4 and f5 will be treated as
> > null if they are absent in the raw logs.
> >
> > If you start relying on Pig heavily, lobby Amazon to upgrade their
> > version of Pig (or at least provide both 0.6 and 0.9.1). At this
> > point, 0.6 is positively ancient. But the extra-field behavior worked
> > that way then, too.
> >
> > D
> >
> > On Sat, Nov 12, 2011 at 4:08 PM, B M D Gill <[email protected]> wrote:
> > > I'm a newbie running Pig 0.6 on Amazon Elastic MapReduce. I need to
> > > add fields to the log files that my Pig jobs run on, and I'm wondering
> > > how to handle the new schema in Pig.
> > >
> > > My current inputs are tab-separated fields that I load using the
> > > standard PigStorage function:
> > >
> > > LOAD '$INPUT' USING PigStorage('\t') as (f1, f2, f3);
> > >
> > > However, some input files will now have additional fields f4, f5, f6,
> > > etc. at the end of each line. How do I set up the load function to
> > > handle these optional fields? Do I need to change my logic to deal
> > > with these fields possibly being empty, or will Pig simply record
> > > their value as null if they are absent?
> > >
> > > Thanks to anyone who can share some insight.
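To make Dmitriy's answer concrete, here is a small Java model of what declaring the wider schema means for each input line: the tab-separated line is split and mapped onto the declared field names, and any declared field the line does not contain comes back as null. `loadLine` and the f1..f5 names are illustrative assumptions; this is not how PigStorage is implemented, only a model of the null-for-absent-field behavior described in the thread. Ignoring fields beyond the declared schema (such as f6) is also a choice of this model, not a claim about Pig. It touches the second half of the original question as well: downstream logic sees nulls, so filters and counts that depend on the new fields have to allow for them.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OptionalFieldsDemo {

    // Hypothetical model of LOAD ... AS (f1, f2, f3, f4, f5): maps one
    // tab-separated line onto the declared field names.
    static Map<String, String> loadLine(String line, List<String> schema) {
        // A limit of -1 keeps trailing empty strings instead of discarding them.
        String[] parts = line.split("\t", -1);
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < schema.size(); i++) {
            // Declared fields beyond the end of the line are absent: store null.
            // Extra fields past the declared schema are ignored in this model.
            record.put(schema.get(i), i < parts.length ? parts[i] : null);
        }
        return record;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("f1", "f2", "f3", "f4", "f5");

        Map<String, String> oldFormat = loadLine("a\tb\tc", schema);
        Map<String, String> newFormat = loadLine("a\tb\tc\td\te", schema);

        System.out.println(oldFormat); // {f1=a, f2=b, f3=c, f4=null, f5=null}
        System.out.println(newFormat); // {f1=a, f2=b, f3=c, f4=d, f5=e}

        // Downstream logic must tolerate the nulls, e.g. only count rows that
        // actually carry the new f4 field.
        long withF4 = Arrays.asList(oldFormat, newFormat).stream()
                .filter(r -> r.get("f4") != null)
                .count();
        System.out.println(withF4); // 1
    }
}
```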
