Re: How to handle optional fields in schema

B M D Gill Sat, 12 Nov 2011 17:04:46 -0800

Thanks Dmitriy, I will mention it to Amazon for sure.

That was the first thing I tried, and it did not work.  I am not sure what
I could be doing wrong.  I get an IndexOutOfBoundsException where the index
corresponds to the first instance of the optional field.  Here is the
stack trace:


Pig Stack Trace
---------------
ERROR 2999: Unexpected internal error. Index: 29, Size: 29

java.lang.IndexOutOfBoundsException: Index: 29, Size: 29
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
    at org.apache.pig.pen.util.ExampleTuple.get(ExampleTuple.java:80)
    at org.apache.pig.pen.AugmentBaseDataVisitor.visit(AugmentBaseDataVisitor.java:427)
    at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:210)
    at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:52)
    at org.apache.pig.pen.util.PreOrderDepthFirstWalker.depthFirst(PreOrderDepthFirstWalker.java:70)
    at org.apache.pig.pen.util.PreOrderDepthFirstWalker.depthFirst(PreOrderDepthFirstWalker.java:72)
    at org.apache.pig.pen.util.PreOrderDepthFirstWalker.walk(PreOrderDepthFirstWalker.java:55)
    at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
    at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:121)
    at org.apache.pig.PigServer.getExamples(PigServer.java:731)
    at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:557)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:246)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
    at org.apache.pig.Main.main(Main.java:374)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
================================================================================



On Sun, Nov 13, 2011 at 12:30 AM, Dmitriy Ryaboy <[email protected]> wrote:

> If you change the load statement to "load '$input' as (f1, f2, f3, f4,
> f5)", f4 and f5 will be treated as null if they are absent in the raw
> logs.
>
> If you start relying on Pig heavily, lobby Amazon to upgrade their
> version of Pig (or at least provide both 0.6 and 0.9.1). At this
> point, 0.6 is positively ancient. But the extra field behavior worked
> that way then, too.
>
> D
>
> On Sat, Nov 12, 2011 at 4:08 PM, B M D Gill <[email protected]> wrote:
> > I'm a newbie running Pig 0.6 on Amazon Elastic MapReduce.  I need to make
> > a change to add additional fields to the log files that I run my Pig jobs
> > on, and I am wondering how to handle this schema in Pig.
> >
> > My current inputs are tab-separated fields that I load using the standard
> > PigStorage function:
> >
> > LOAD '$INPUT' USING PigStorage('\t') as (f1, f2, f3);
> >
> > However, some input files will now have additional fields (f4, f5, f6, etc.)
> > at the trailing edge of each line.  How do I set up the load function to
> > handle these optional fields?  Do I need to change my logic to deal with
> > these fields possibly being empty, or will Pig simply record their value
> > as null if they are absent?
> >
> > Thanks to anyone who can share some insight.
> >
>
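
For reference, the approach suggested in the quoted reply can be sketched
roughly as follows.  This is only an illustration, not a tested fix for the
error above: the aliases (logs, has_f4, lacks_f4) are made up, the field list
stops at f5 as in the reply, and it assumes short rows really do come back
with null trailing fields on Pig 0.6.  Since the stack trace passes through
GruntParser.processIllustrate, the sketch inspects the data with DUMP rather
than ILLUSTRATE; whether that avoids the exception on 0.6 is an assumption.

```
-- Declare the optional trailing fields in the schema as well.
logs = LOAD '$INPUT' USING PigStorage('\t') AS (f1, f2, f3, f4, f5);

-- Rows from older files are expected to carry null in f4 and f5,
-- so the logic can branch on IS NULL / IS NOT NULL.
has_f4   = FILTER logs BY f4 IS NOT NULL;
lacks_f4 = FILTER logs BY f4 IS NULL;

-- Inspect the result with DUMP (not ILLUSTRATE).
DUMP has_f4;
```

Any downstream expression that uses f4 or f5 should then tolerate nulls, for
example by filtering them out first as shown.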
