As before, this will be determined by the loader. I agree that it should
not be an error. You should not break an entire job over one bad row.
I haven't specified the appropriate behavior for PigLoader in this case.
My thinking is that the best solution is to emit a null and issue a
warning. If the user wants to throw the rows out, he can add a filter
immediately after the load that removes rows with nulls. As a later
enhancement we could also allow the user to specify that rows with bad
conversions be thrown out, though this could be tricky. Would it apply
per column, or to a bad conversion in any column?
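
To make the proposal concrete, here is a rough sketch in Python of the
"emit a null and warn" behavior followed by a filter that drops rows with
nulls. This is not PigLoader code. The function names (to_int_or_null,
load_rows, drop_rows_with_nulls), the tab delimiter, and the assumption
that every field is expected to be an integer are all made up for
illustration.

```python
import logging
from typing import List, Optional

logger = logging.getLogger("loader_sketch")  # hypothetical logger name


def to_int_or_null(raw: str, row_num: int, col: int) -> Optional[int]:
    """Convert one field to int; on failure, warn and return None (null)."""
    try:
        return int(raw)
    except ValueError:
        logger.warning("row %d, column %d: cannot convert %r to int; using null",
                       row_num, col, raw)
        return None


def load_rows(lines: List[str], delimiter: str = "\t") -> List[List[Optional[int]]]:
    """Load every row; a bad field becomes None instead of failing the job."""
    rows = []
    for row_num, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        rows.append([to_int_or_null(f, row_num, col)
                     for col, f in enumerate(fields)])
    return rows


def drop_rows_with_nulls(rows: List[List[Optional[int]]]) -> List[List[int]]:
    """The user-side filter: keep only rows in which no column is null."""
    return [row for row in rows if all(value is not None for value in row)]


if __name__ == "__main__":
    loaded = load_rows(["1\t2", "3\tabc", "5\t6"])
    print(loaded)
    print(drop_rows_with_nulls(loaded))
```

The filter here drops a row if any column is null, which is only one of
the two possible answers to the per-column question; a per-column variant
would check just the columns the user names.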
When I clarify the loaders in the type spec I'll put this in too.
Alan.
David (Ciemo) Ciemiewicz wrote:
Alan,
When reading files with load, what happens if a user tries to load a
file that has string data in a field expected to be numeric?
I couldn’t find it described in the spec.
http://wiki.apache.org/pig/PigTypesFunctionalSpec
My concern is that this will throw an error. I don’t think this is an
acceptable outcome.
“Bad” data rows are inevitable.
As prior art, Oracle's loader functions allow you to ignore these
errant rows. They also permit logging the data row to an error file so
that you can go back and diagnose whether there’s a bug or just a data
error.
I think it would be useful to let the user control whether, on a cast
failure, the row is discarded or the value is made NULL.
--- Ciemo