Sorry, I read "custom log" and thought you had a custom loader. You can extend PigStorage and do the field replacement in its putNext method.
I'll do an example later.

On 11/8/11, Rauan Maemirov wrote:
> Yes, you understood my task correctly. What is putNext? I'm new to Pig
> and haven't written custom UDFs.
>
> 2011/11/8 pablomar
>
>> Sorry, I didn't understand completely.
>>
>> Do you want to read a line and, if the date is invalid (running
>> IsoToUnix directly instead of a regex first), skip it? Is that it?
>> If yes, you can replace the field with your converted date (Unix
>> format), and if the conversion fails, put a null or nothing.
>>
>> I mean, in your overridden putNext you have your individual columns;
>> you can try to convert the date there and put your Unix date in the
>> output.
>>
>> Sorry if I misunderstood your problem again.
>>
>> On 11/8/11, Rauan Maemirov wrote:
>> > Sure, but right now I'm just omitting the rows _after_ regex matching.
>> > What I want is to avoid the additional regex filtering and ignore
>> > invalid rows right after an unsuccessful IsoToUnix().
>> >
>> > 2011/11/8 pablomar
>> >
>> >> Can you write something else (a null, for example) in your putNext
>> >> method for that field when the date is invalid?
>> >>
>> >> On 11/8/11, Rauan Maemirov wrote:
>> >> > Well, I solved this issue via regex matching, but I wonder if it's
>> >> > too costly. Is there any way to ignore the exceptions and move on
>> >> > by just omitting the wrong tuples?
>> >> >
>> >> > 2011/11/8 Rauan Maemirov
>> >> >
>> >> >> Hi, all. I've got a custom log (CSV, comma-delimited) with ISO
>> >> >> dates. Sometimes log writing lags, and I get exceptions about a
>> >> >> wrong ISO date format.
>> >> >> Here's the exception: https://gist.github.com/1347406. (The date is
>> >> >> the last "parameter" in the row, and its end is incorrectly
>> >> >> overwritten by another string.)
>> >> >>
>> >> >> The question is: how can I filter out all the wrong dates, or at
>> >> >> least force Pig to ignore them instead of failing?
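A minimal sketch of the per-row logic discussed in this thread. It is written in plain Java so that it compiles without Pig on the classpath. Two caveats apply. First, in Pig's API `putNext(Tuple)` belongs to `StoreFunc` and is called when writing output. A loader such as PigStorage hands rows to Pig through `LoadFunc.getNext()`, so a custom loader would do this conversion in an overridden `getNext()`. Second, the class `IsoRowFilter`, its methods, and the sample rows are hypothetical. The sketch also assumes the log dates carry a zone offset (e.g. `2011-11-08T10:15:30Z` or `+06:00`). It uses `java.time` instead of piggybank's IsoToUnix and returns epoch milliseconds.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.util.Arrays;

/**
 * Hypothetical helper (not part of Pig or piggybank): the per-row logic one
 * might call from an overridden getNext() in a PigStorage subclass.
 */
public class IsoRowFilter {

    /** Epoch milliseconds, or null if the value is not an ISO-8601 date-time with an offset. */
    public static Long toEpochMillis(String iso) {
        if (iso == null) {
            return null;
        }
        try {
            return OffsetDateTime.parse(iso.trim()).toInstant().toEpochMilli();
        } catch (DateTimeParseException e) {
            return null; // malformed date, e.g. its end overwritten by another string
        }
    }

    /**
     * Splits a comma-delimited line and replaces the last field (assumed to be
     * the ISO date) with its epoch-millisecond value. When the date is invalid:
     * dropInvalid == true returns null (skip the row); otherwise the row is
     * kept with a null in the date column.
     */
    public static Object[] convertRow(String line, boolean dropInvalid) {
        String[] parts = line.split(",", -1); // -1 keeps trailing empty fields
        Object[] out = new Object[parts.length];
        System.arraycopy(parts, 0, out, 0, parts.length);
        int last = parts.length - 1;
        Long millis = toEpochMillis(parts[last]);
        if (millis == null && dropInvalid) {
            return null;
        }
        out[last] = millis;
        return out;
    }

    public static void main(String[] args) {
        String[] lines = {
            "user1,click,2011-11-08T10:15:30Z",
            "user2,view,2011-11-08T10:15:31Zgarbage" // date overwritten at the end
        };
        for (String line : lines) {
            // Prints the converted row, or "null" for a dropped row
            System.out.println(Arrays.toString(convertRow(line, true)));
        }
    }
}
```

One design point if this goes into a real loader: returning null from `getNext()` tells Pig that the input is exhausted. A loader that drops bad rows should therefore keep reading until it finds a valid line, or reaches the real end of input, instead of returning null for a bad row. The alternative is to keep the row with a null date (`dropInvalid = false`) and remove it afterwards in Pig Latin with `FILTER ... BY ... IS NOT NULL`.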
