If you change the load statement to LOAD '$INPUT' USING PigStorage('\t') AS (f1, f2, f3, f4, f5), then f4 and f5 will be treated as null when they are absent from the raw logs.
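Concretely, something like this (a sketch — the field names and the 'missing' placeholder are illustrative, not from your actual schema):

```pig
-- Declare the widest schema; older lines simply lack the trailing fields.
logs = LOAD '$INPUT' USING PigStorage('\t')
       AS (f1, f2, f3, f4, f5);

-- Rows from older files get null for f4/f5, so guard any logic that
-- depends on them, e.g. with a bincond:
guarded = FOREACH logs GENERATE
    f1, f2, f3,
    (f4 is null ? 'missing' : f4) AS f4_safe;
```

Filters and joins on f4/f5 will silently drop the null rows, so decide explicitly whether that is what you want.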
If you start relying on Pig heavily, lobby Amazon to upgrade their version of Pig (or at least provide both 0.6 and 0.9.1). At this point, 0.6 is positively ancient. But the extra-field behavior worked the same way then, too.

D

On Sat, Nov 12, 2011 at 4:08 PM, B M D Gill <[email protected]> wrote:
> I'm a newbie running Pig 0.6 on Amazon Elastic MapReduce. I need to make
> a change to add additional fields to the log files that I run my Pig jobs
> on, and am wondering how to handle this schema in Pig.
>
> My current inputs are tab-separated fields that I load using the standard
> PigStorage function:
>
> LOAD '$INPUT' USING PigStorage('\t') AS (f1, f2, f3);
>
> However, some input files will now have additional fields f4, f5, f6, etc.
> at the trailing edge of each line. How do I set up the load function to
> handle these optional fields? Do I need to change my logic to deal with
> these fields possibly being empty, or will Pig simply record their value
> as null if they are absent?
>
> Thanks to anyone who can share some insight.
