Bad Data in Files

John Omernik Thu, 20 Aug 2015 05:52:33 -0700

Hey all,

I am trying to read some data in csv files that is pretty rough.  I am
getting errors similar to

https://issues.apache.org/jira/browse/DRILL-3428

when bad data is encountered. In doing data exploration, I think the
ability to be made aware of where the bad data is is VERY important. But in
addition to this JIRA, it would be nice if Drill could nicely "move on"
from bad lines. For example, if it comes across a line that throws an
error, perhaps it could stop, show the line, which file, and the location,
and then somehow find a way to "exclude" that line. Perhaps as I am reading
it, I just say "yep, garbage, ignore it". Not sure how to do this, but
perhaps a SKIP(filename, lineno) that I can add to a WHERE clause?
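
To make the idea concrete, here is a rough Python sketch of the behavior I
have in mind, done outside of Drill. This is not an existing Drill feature
or API: scan_rows, its skip argument, and the "wrong field count means bad
line" rule are all hypothetical, just for illustration. It also assumes one
record per physical line, so quoted fields that span lines are not handled.

```python
import csv


def scan_rows(name, lines, expected_fields, skip=frozenset()):
    """Split lines into good rows and bad-line reports.

    name            -- label for the source file (hypothetical, for reporting)
    lines           -- iterable of raw text lines
    expected_fields -- field count a well-formed row should have (assumption)
    skip            -- set of (name, lineno) pairs the user chose to ignore
    """
    good, bad = [], []
    for lineno, raw in enumerate(lines, start=1):
        # The user already said "yep, garbage, ignore it" for this line.
        if (name, lineno) in skip:
            continue
        try:
            # Parse one physical line as one CSV record.
            row = next(csv.reader([raw], strict=True))
        except csv.Error as exc:
            bad.append((name, lineno, raw.rstrip("\n"), str(exc)))
            continue
        if len(row) != expected_fields:
            reason = f"expected {expected_fields} fields, got {len(row)}"
            bad.append((name, lineno, raw.rstrip("\n"), reason))
            continue
        good.append(row)
    return good, bad


if __name__ == "__main__":
    sample = ["id,name,age\n", "1,alice,30\n", "garbage line\n", "2,bob,41\n"]
    rows, problems = scan_rows("people.csv", sample, expected_fields=3)
    for fname, lineno, text, reason in problems:
        print(f"{fname}:{lineno}: {reason}: {text!r}")
```

The first pass reports each bad line with its file and line number; passing
those (filename, lineno) pairs back in as skip on the next pass excludes
them, which is roughly what a SKIP(filename, lineno) predicate would do.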


This could be expanded even further, for better functionality, but it would
be very helpful as a data explorer in these cases. I'd be interested in
others' thoughts on the subject.

John
