Thanks. I sometimes get a date like 0001-01-01. This is a valid date
format, but when I try to get the seconds between it and another date, say
2011-01-01, I get an error that the value is too large to fit into an int,
and the process stops. Is there something like ifError(x-y, null, x-y)? Or
would I have to implement this as a UDF?
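
For scale, 0001-01-01 to 2011-01-01 is 734,137 days, or 63,429,436,800 seconds, far above Integer.MAX_VALUE (2,147,483,647). So the overflow comes from the size of the gap, not from a parse problem. As far as I know Pig 0.5 has no built-in ifError, so a small UDF seems the likely route. Below is a rough sketch of the core logic in plain Java. The class and method names are made up, and it uses java.time, which needs Java 8+ and postdates Pig 0.5. Inside Pig, this would go in the exec() method of an EvalFunc, and returning null there gives a null field instead of killing the job.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: the "ifError(x-y, null, x-y)" behaviour done by hand.
public class SafeSecondsBetween {

    /**
     * Returns (end - start) in seconds, or null if either date is not
     * yyyy-MM-dd or the difference does not fit in an int.
     */
    public static Integer secondsBetween(String start, String end) {
        try {
            LocalDate s = LocalDate.parse(start); // strict ISO yyyy-MM-dd
            LocalDate e = LocalDate.parse(end);
            // Compute in long so the subtraction itself cannot overflow.
            long seconds = ChronoUnit.DAYS.between(s, e) * 86_400L;
            if (seconds < Integer.MIN_VALUE || seconds > Integer.MAX_VALUE) {
                return null; // too large for an int: emit null instead of failing
            }
            return (int) seconds;
        } catch (DateTimeParseException ex) {
            return null; // malformed date: emit null instead of failing
        }
    }

    public static void main(String[] args) {
        System.out.println(secondsBetween("2010-12-31", "2011-01-01")); // 86400
        System.out.println(secondsBetween("0001-01-01", "2011-01-01")); // null
        System.out.println(secondsBetween("2011-13-45", "2011-01-01")); // null
    }
}
```

If the downstream field can be a long rather than an int, the simpler fix may be to return the long value directly, because 0001-01-01 then no longer overflows.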

Thanks

On Tue, Jan 11, 2011 at 11:40 AM, Dmitriy Ryaboy <[email protected]> wrote:

> Create a UDF that verifies the format, and go through a filtering step
> first.
> If you would like to save the malformed records so you can look at them
> later, you can use the SPLIT operator to route the good records to your
> regular workflow and the bad records to some place on HDFS.
>
> -D
>
> On Mon, Jan 10, 2011 at 9:58 PM, hadoop n00b <[email protected]> wrote:
>
> > Hello,
> >
> > I have a Pig script that uses PiggyBank to calculate date differences.
> > Sometimes, when I get a weird date or a wrong format in the input, the
> > script throws an error and aborts.
> >
> > Is there a way I could trap these errors and move on without stopping the
> > execution?
> >
> > Thanks
> >
> > PS: I'm using CDH2 with Pig 0.5
> >
>
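
Here is a sketch of the format-checking step Dmitriy describes, again in plain Java, with a made-up class name. It treats a date as valid only if it parses as strict yyyy-MM-dd and falls inside a plausible range. The 1900 to 2100 bounds are an assumption chosen to catch placeholders like 0001-01-01, so adjust them to your data. In Pig, the boolean would come from the exec() method of a FilterFunc. You could then use it in a FILTER, or as the IF conditions of SPLIT (SPLIT alias INTO good IF ..., bad IF ...) to keep the rejected rows.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical validity check to run before any date arithmetic.
public class IsValidDate {

    // Assumed sane range; dates outside it are routed to the "bad" branch.
    private static final LocalDate MIN = LocalDate.of(1900, 1, 1);
    private static final LocalDate MAX = LocalDate.of(2100, 12, 31);

    public static boolean isValid(String text) {
        if (text == null) {
            return false; // missing field counts as malformed
        }
        try {
            LocalDate d = LocalDate.parse(text.trim()); // strict yyyy-MM-dd
            return !d.isBefore(MIN) && !d.isAfter(MAX);
        } catch (DateTimeParseException ex) {
            return false; // wrong format, e.g. "2011-1-1" or free text
        }
    }

    public static void main(String[] args) {
        String[] samples = {"2011-01-01", "0001-01-01", "2011-1-1", "garbage"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```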
