As before, this will be determined by the loader. I agree that it should not be an error. You should not break an entire job over one bad row.

I haven't specified the appropriate behavior for PigLoader in this case. My thinking is that the best solution is to emit a null and issue a warning. If the user wants to throw the rows out, he can add a filter immediately after the load that removes rows with nulls. As a later enhancement we could also allow the user to specify throwing out rows with bad conversions, though this could be tricky. Is it per column or any column?
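To make the proposal concrete, here is a minimal sketch in Python of "emit a null and issue a warning", followed by a filter that drops rows containing nulls. It is not PigLoader code: the function names (`to_int_or_none`, `load_rows`, `drop_rows_with_nulls`), the tab-delimited input, and the use of Python's `None` to stand in for a Pig null are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger("loader_sketch")  # hypothetical logger name


def to_int_or_none(raw, row_no, col):
    """Convert raw text to int; on failure, warn and return None (the 'null')."""
    try:
        return int(raw)
    except (TypeError, ValueError):
        logger.warning(
            "row %d, column %d: cannot convert %r to int; emitting null",
            row_no, col, raw,
        )
        return None


def load_rows(lines, int_cols):
    """Split tab-delimited lines; convert the columns listed in int_cols."""
    rows = []
    for row_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        rows.append([
            to_int_or_none(value, row_no, col) if col in int_cols else value
            for col, value in enumerate(fields)
        ])
    return rows


def drop_rows_with_nulls(rows):
    """The 'filter immediately after the load': keep only fully populated rows."""
    return [row for row in rows if all(value is not None for value in row)]


# One bad row ("thirty") does not stop the load; it becomes a null.
lines = ["alice\t31", "bob\tthirty", "carol\t27"]
loaded = load_rows(lines, int_cols={1})
filtered = drop_rows_with_nulls(loaded)
```

In this sketch the bad row survives the load with a null in the numeric column, and it is the separate filter step that removes it, so the user decides whether to keep or drop such rows.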

When I clarify the loaders in the type spec I'll put this in too.

Alan.

David (Ciemo) Ciemiewicz wrote:

Alan,

When reading files with load, what happens if a user tries to load a file that has string data in a field expected to be numeric?

I couldn’t find it described in the spec. http://wiki.apache.org/pig/PigTypesFunctionalSpec

My concern is that this will throw an error. I don’t think this is an acceptable outcome.

“Bad” data rows are inevitable.

For some prior art - Oracle loader functions allow you to ignore these errant rows. They also permit logging the data row to an error file so that you can go back and diagnose whether there’s a bug or just a data error.

I think it would be useful to control whether the data is discarded in the case of a cast failure or to opt to make the data NULL.
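As a rough illustration of that kind of control, the Python sketch below lets the caller choose between discarding a row and nulling the field on a cast failure, and optionally writes the raw row to an error log for later diagnosis. Everything here is hypothetical: `load_with_policy`, the `on_bad` and `reject_log` parameters, and the tab-delimited format are assumptions for the example, not part of Pig or of the Oracle loader. It handles a single numeric column only.

```python
import io


def load_with_policy(lines, int_col, on_bad="null", reject_log=None):
    """Load tab-delimited rows, converting column int_col to int.

    on_bad="null"    -> keep the row, with None in the failed column
    on_bad="discard" -> drop the row entirely
    reject_log       -> optional file-like object; bad raw rows are written to it
    """
    if on_bad not in ("null", "discard"):
        raise ValueError("on_bad must be 'null' or 'discard'")
    rows = []
    for line in lines:
        raw_line = line.rstrip("\n")
        fields = raw_line.split("\t")
        # Pad short rows so a missing column is treated like a failed cast.
        fields += [None] * (int_col + 1 - len(fields))
        try:
            fields[int_col] = int(fields[int_col])
        except (TypeError, ValueError):
            if reject_log is not None:
                reject_log.write(raw_line + "\n")
            if on_bad == "discard":
                continue
            fields[int_col] = None
        rows.append(fields)
    return rows


lines = ["alice\t31", "bob\tthirty", "carol\t27"]

reject_log = io.StringIO()  # stands in for an error file
discarded = load_with_policy(lines, 1, on_bad="discard", reject_log=reject_log)
nulled = load_with_policy(lines, 1, on_bad="null")
```

Either way the job finishes, and the error log keeps the original text of each rejected row so it can be checked for a bug versus a plain data error.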

--- Ciemo
