The pattern I use with bad data is to increment a counter and return null. Logging an error message is also good, but that could turn into a massive log file if a large share of the dataset is bad. I'd be curious to hear others' thoughts on the logging bit.
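A minimal sketch of that pattern, with a cap on logging as one possible answer to the log-size worry. `SkippingReader`, `maxLoggedErrors`, and the `decoder` function are hypothetical names made up for illustration; this is not AvroStorage code and not a Pig API, and a real loader would probably report the count through the framework's own counters rather than a plain field.

```java
import java.util.function.Function;
import java.util.logging.Logger;

/** Hypothetical wrapper: counts bad records and returns null instead of failing. */
class SkippingReader<I, O> {
    private static final Logger LOG = Logger.getLogger(SkippingReader.class.getName());

    private final Function<I, O> decoder;  // assumed to throw on a corrupt record
    private final int maxLoggedErrors;     // cap so bad data cannot flood the log
    private long badRecords = 0;
    private long loggedErrors = 0;

    SkippingReader(Function<I, O> decoder, int maxLoggedErrors) {
        this.decoder = decoder;
        this.maxLoggedErrors = maxLoggedErrors;
    }

    /** Returns the decoded record, or null if decoding threw. */
    O read(I raw) {
        try {
            return decoder.apply(raw);
        } catch (RuntimeException e) {
            badRecords++;
            if (loggedErrors < maxLoggedErrors) {
                // Log only the first few failures in detail.
                loggedErrors++;
                LOG.warning("Skipping bad record #" + badRecords + ": " + e);
            } else if (badRecords == (long) maxLoggedErrors + 1) {
                // One final notice, then stay quiet; the counter keeps counting.
                LOG.warning("Further bad-record messages suppressed");
            }
            return null;
        }
    }

    long getBadRecords() { return badRecords; }

    long getLoggedErrors() { return loggedErrors; }
}
```

For example, `new SkippingReader<String, Integer>(Integer::parseInt, 2)` returns null for every unparseable string, logs the first two in detail, and still reports the full total via `getBadRecords()`.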
Either way, I think this is a good change to make to AvroStorage.

On Fri, Mar 23, 2012 at 7:03 PM, Russell Jurney <[email protected]> wrote:

> One record in a 125MB avro file is killing my script. I could patch
> AvroStorage() to catch the exception and return null after logging an error
> - I think. Should I?
>
> --
> Russell Jurney twitter.com/rjurney [email protected]
> datasyndrome.com

--
*Note that I'm no longer using my Yahoo! email address. Please email me at [email protected] going forward.*
