Thanks, I sometimes get a date like 0001-01-01. This is a valid date format, but when I try to get the seconds between this and another date, say 2011-01-01, I get an error that the value is too large to fit into an int, and the process stops. Do we have something like ifError(x-y, null, x-y)? Or would I have to implement this as a UDF?
Thanks

On Tue, Jan 11, 2011 at 11:40 AM, Dmitriy Ryaboy <[email protected]> wrote:

> Create a UDF that verifies the format, and go through a filtering step
> first.
> If you would like to save the malformed records so you can look at them
> later, you can use the SPLIT operator to route the good records to your
> regular workflow, and the bad records to some place on HDFS.
>
> -D
>
> On Mon, Jan 10, 2011 at 9:58 PM, hadoop n00b <[email protected]> wrote:
>
> > Hello,
> >
> > I have a Pig script that uses Piggybank to calculate date differences.
> > Sometimes, when I get a weird date or wrong format in the input, the
> > script throws an error and aborts.
> >
> > Is there a way I could trap these errors and move on without stopping the
> > execution?
> >
> > Thanks
> >
> > PS: I'm using CDH2 with Pig 0.5
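
For the UDF route discussed in this thread, here is a minimal sketch of the date logic only, not a definitive implementation. It is plain Java and is not wired into Pig; the class name `SafeDateDiff` and both method names are hypothetical. It assumes dates in `yyyy-MM-dd` form and a JVM with `java.time` (Java 8 or later), which is newer than what a Pig 0.5 install would typically run on. The idea is to return null instead of throwing, both for unparseable input and for differences that do not fit into an int, so that a later filtering or SPLIT step can route those records aside.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: null-returning date difference, not part of Pig or Piggybank.
class SafeDateDiff {

    // Seconds from start to end (midnight to midnight) as a long,
    // or null if either input is missing or not a valid yyyy-MM-dd date.
    static Long secondsBetween(String start, String end) {
        if (start == null || end == null) {
            return null;
        }
        try {
            // LocalDate.parse is strict: "2011-02-30" is rejected, not rolled over.
            LocalDate s = LocalDate.parse(start.trim());
            LocalDate e = LocalDate.parse(end.trim());
            return ChronoUnit.DAYS.between(s, e) * 86_400L;
        } catch (DateTimeParseException ex) {
            return null; // malformed record: let the caller filter it out
        }
    }

    // Same difference narrowed to an int, or null when it would overflow.
    // This mimics the ifError(x-y, null, x-y) behaviour asked about above.
    static Integer secondsBetweenAsInt(String start, String end) {
        Long secs = secondsBetween(start, end);
        if (secs == null || secs > Integer.MAX_VALUE || secs < Integer.MIN_VALUE) {
            return null;
        }
        return secs.intValue();
    }

    public static void main(String[] args) {
        System.out.println(secondsBetween("0001-01-01", "2011-01-01"));      // fits in a long
        System.out.println(secondsBetweenAsInt("0001-01-01", "2011-01-01")); // too large for an int
        System.out.println(secondsBetween("not-a-date", "2011-01-01"));      // malformed input
    }
}
```

Returning a long avoids the overflow for dates as far apart as 0001-01-01 and 2011-01-01. If the result really has to be an int, the second method returns null for such pairs instead of aborting the job.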
