We had similar issues and had to reprocess our data so that everything had a consistent schema; otherwise Drill would break, sometimes in unexpected ways. We started on 1.2, so some of those issues may have been fixed since. Drill is awesome and can do a lot, but it cannot currently do on-the-fly type conversion or cleanup.

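For reference, here is a minimal sketch of that kind of one-off cleanup pass. It is not the script we actually used. It assumes newline-delimited JSON (one record per line) and a top-level field named `field1`, as in John's example. The function names are hypothetical, and wrapping a non-empty object in a one-element list is an assumption about what the data should mean.

```python
import json
from typing import Iterable, Iterator


def normalize_record(record: dict, field: str = "field1") -> dict:
    """Coerce `field` to a list so every record presents the same type.

    Assumption: an empty object {} means "no entries", and a non-empty
    object is a single entry that should be wrapped in a list.
    """
    value = record.get(field)
    if isinstance(value, dict):
        # {} becomes []; {"k": v} becomes [{"k": v}]
        record[field] = [value] if value else []
    return record


def normalize_lines(lines: Iterable[str], field: str = "field1") -> Iterator[str]:
    """Yield normalized JSON lines, skipping blank lines.

    Assumes newline-delimited JSON input (one record per line).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        yield json.dumps(normalize_record(json.loads(line), field))
```

You would stream each file through `normalize_lines` (e.g. from `sys.stdin`) and write the results to a new file before pointing Drill at it. This is exactly the reprocessing John was hoping to avoid, so it only makes sense if no read-time option turns up.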
On Mon, Jan 18, 2016 at 2:11 PM, John Omernik wrote:
> I am working with a LARGE volume of data (I state that because even my first
> reaction was "I'll just write a simple sed command and fix this data up
> lickety-split").
>
> However, lots of files, lots of data, so let's avoid that as the initial
> answer if possible. (Ideally I am looking for an "on read" solution in
> Drill.)
>
> Basically, when I try to read a file, I get this error:
>
> Error: DATA_READ ERROR: You tried to start when you are using a ValueWriter
> of type SingleMapWriter.
>
> The field in question had a silly setup: if it's empty they use {}; if it's
> not empty, then it's an array of data.
>
> So:
>
> "field1":{}
> or
> "field1":[{"foo":"bar"}, {"bar":"foo"}]
>
> I am pretty sure this is the error. Point: I am not sure the error message
> I provided helps me to understand intuitively; perhaps some TLC on the
> error messages could help less Drill-aware users to know what's actually
> breaking (in fairness, the message in 1.4 showed me the line, column, and
> field, which helped me to infer what could POSSIBLY be wrong).
>
> So, is there a way to address this without reprocessing a lot of data? An
> option in Drill that would allow a dirty read of some sort?
>
> Thanks in advance!!
>
> John
