Just to be sure - what are the correct APIs to use here when implementing this change in a UDF?
See: https://issues.apache.org/jira/browse/PIG-2614

On Sun, Mar 25, 2012 at 7:44 AM, Stan Rosenberg <[email protected]> wrote:

> I typically increment a counter and have a bounded log of randomly sampled
> erroneous data.
>
> stan
> On Mar 24, 2012 6:50 PM, "[email protected]" <[email protected]>
> wrote:
>
> > Can do a counter and log the first few thousand rows or something ...
> >
> > On Mar 24, 2012, at 10:33 AM, Bill Graham <[email protected]> wrote:
> >
> > > The pattern I use with bad data is to increment a counter and return
> > > null. Logging an error message is also good, but that could turn into
> > > a massive log file if there's a large dataset of bad data. Would be
> > > curious to hear others' thoughts re the logging bit.
> > >
> > > Either way, I think this is a good change to make to AvroStorage.
> > >
> > > On Fri, Mar 23, 2012 at 7:03 PM, Russell Jurney
> > > <[email protected]> wrote:
> > >
> > >> One record in a 125MB avro file is killing my script. I could patch
> > >> AvroStorage() to catch the exception and return null after logging
> > >> an error - I think. Should I?
> > >>
> > >> --
> > >> Russell Jurney twitter.com/rjurney [email protected]
> > >> datasyndrome.com
> > >
> > > --
> > > *Note that I'm no longer using my Yahoo! email address. Please email
> > > me at [email protected] going forward.*

--
Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
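
For reference, the pattern described in this thread (increment a counter, keep a bounded random sample of the bad rows, return null) could be sketched roughly as below. This is plain Java with no Pig dependency: BadRecordTracker and parseOrNull are hypothetical names made up for illustration, and they are not part of Pig, AvroStorage, or the PIG-2614 patch. In a real UDF the count would presumably be reported through a Hadoop counter rather than a local field. I believe Pig exposes that via PigStatusReporter and EvalFunc.warn, but that is an assumption to check against the docs for your Pig version.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical helper (not a Pig class): counts bad records and keeps a
 * bounded, uniformly random sample of them so the log cannot grow unbounded.
 */
final class BadRecordTracker {
    private final int maxSamples;
    private final Random random;
    private final List<String> samples;
    private long count = 0;

    BadRecordTracker(int maxSamples, long seed) {
        if (maxSamples < 0) {
            throw new IllegalArgumentException("maxSamples must be >= 0");
        }
        this.maxSamples = maxSamples;
        this.random = new Random(seed);
        this.samples = new ArrayList<>(maxSamples);
    }

    /** Counts one bad record and possibly keeps it in the sample. */
    void record(String description) {
        count++;
        if (samples.size() < maxSamples) {
            // Sample not full yet: keep everything.
            samples.add(description);
        } else if (maxSamples > 0) {
            // Reservoir sampling: the count-th record replaces a kept one
            // with probability maxSamples / count.
            long slot = (long) (random.nextDouble() * count);
            if (slot < maxSamples) {
                samples.set((int) slot, description);
            }
        }
    }

    long count() {
        return count;
    }

    List<String> samples() {
        return Collections.unmodifiableList(samples);
    }

    /** Stand-in for a UDF body: returns null on bad input instead of throwing. */
    Integer parseOrNull(String raw) {
        try {
            return Integer.valueOf(raw.trim());
        } catch (RuntimeException e) {
            // Covers NumberFormatException and a null input.
            record(raw + " (" + e + ")");
            return null;
        }
    }
}
```

The tracker keeps at most maxSamples entries however many bad rows show up, which addresses Bill's concern about a massive log file. Logging only the first N rows, as suggested earlier in the thread, is simpler but biases the sample toward the start of the input.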
