Just to be sure - what are the correct APIs to use here when implementing
this change in a UDF?

See: https://issues.apache.org/jira/browse/PIG-2614

On Sun, Mar 25, 2012 at 7:44 AM, Stan Rosenberg <[email protected]> wrote:

> I typically increment a counter and have a bounded log of randomly sampled
> erroneous data.
>
> stan
> On Mar 24, 2012 6:50 PM, "[email protected]" <[email protected]>
> wrote:
>
> > Can do a counter and log the first few thousand rows or something ...
> >
> >
> >
> > On Mar 24, 2012, at 10:33 AM, Bill Graham <[email protected]> wrote:
> >
> > > The pattern I use with bad data is to increment a counter and return
> > > null. Logging an error message is also good, but that could turn into a
> > > massive log file if there's a large dataset of bad data. Would be curious
> > > to hear others' thoughts re the logging bit.
> > >
> > > Either way, I think this is a good change to make to AvroStorage.
> > >
> > > On Fri, Mar 23, 2012 at 7:03 PM, Russell Jurney <[email protected]> wrote:
> > >
> > >> One record in a 125MB Avro file is killing my script.  I could patch
> > >> AvroStorage() to catch the exception and return null after logging an
> > >> error - I think.  Should I?
> > >>
> > >> --
> > >> Russell Jurney twitter.com/rjurney [email protected]
> > >> datasyndrome.com
> > >>
> > >
> > >
> > >
> > > --
> > > *Note that I'm no longer using my Yahoo! email address. Please email me
> > at
> > > [email protected] going forward.*
> >
>
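
For reference, the pattern suggested in the quoted replies (count the bad record, return null, and keep only a bounded random sample for the log) might be sketched roughly as below. This is plain Java with no Pig dependency. The class name BadRecordTracker and its methods are hypothetical, made up for illustration, and are not part of Pig or AvroStorage. In a real UDF or loader the in-memory count would presumably be replaced by a Hadoop counter obtained through Pig's reporter; that wiring is an assumption and is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Function;

/**
 * Hypothetical helper (not a Pig API): counts bad records and keeps a
 * bounded, uniformly random sample of them via reservoir sampling.
 */
class BadRecordTracker {
    private final int capacity;               // maximum number of sampled records kept
    private final Random random;
    private final List<String> sample = new ArrayList<>();
    private long badCount = 0;

    BadRecordTracker(int capacity, long seed) {
        this.capacity = capacity;
        this.random = new Random(seed);
    }

    /** Counts one bad record and possibly stores it in the bounded sample. */
    void record(String rawRecord) {
        badCount++;
        if (sample.size() < capacity) {
            sample.add(rawRecord);
        } else {
            // Reservoir sampling: keep the new record with probability capacity / badCount.
            long slot = (long) (random.nextDouble() * badCount);
            if (slot < capacity) {
                sample.set((int) slot, rawRecord);
            }
        }
    }

    /** Applies the caller's parser; on failure counts the record and returns null. */
    <T> T parseOrNull(String rawRecord, Function<String, T> parser) {
        try {
            return parser.apply(rawRecord);
        } catch (RuntimeException e) {
            record(rawRecord);
            return null;
        }
    }

    long badCount() {
        return badCount;
    }

    /** Returns a copy of the sampled bad records, suitable for logging once at the end. */
    List<String> sample() {
        return new ArrayList<>(sample);
    }
}
```

A caller would create one tracker per task, route each record through parseOrNull, and log sample() plus badCount() once when the task finishes, so the log size stays bounded no matter how much of the input is bad.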



-- 
Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
