Re: Working with non-sane data - JSON Types

Brent Payne Mon, 18 Jan 2016 14:59:12 -0800

We had similar issues and had to reprocess our data so that everything
had a consistent schema; otherwise queries would break, sometimes in
unexpected ways.  We started on 1.2, so some of those issues may no longer
be there.  Drill is awesome and can do a lot, but it cannot currently do
on-the-fly type conversion or cleanup.
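
For illustration, here is a minimal sketch of the kind of reprocessing
described above. It assumes one JSON object per line and uses the field name
"field1" from the quoted example below; the function names are hypothetical
and not part of Drill. It only rewrites an empty object {} to an empty array
[] so the field always holds a list, and it does not verify how Drill handles
the result.

```python
import json


def normalize_record(record, field="field1"):
    """Return a copy of record where an empty-object value for `field`
    is replaced by an empty list (hypothetical helper, not a Drill API)."""
    fixed = dict(record)
    value = fixed.get(field)
    # Only touch the field when it is exactly an empty JSON object.
    if isinstance(value, dict) and not value:
        fixed[field] = []
    return fixed


def normalize_lines(lines, field="field1"):
    """Yield normalized JSON strings, assuming one JSON object per line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        yield json.dumps(normalize_record(json.loads(line), field))
```

Usage would be something like writing each string from
normalize_lines(open(path)) to a new file, one per line, and pointing Drill
at the rewritten files.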


On Mon, Jan 18, 2016 at 2:11 PM, John Omernik <[email protected]> wrote:

> I am working with a LARGE volume of data (I state that because even my first
> reaction was "I'll just write a simple sed command and fix this data up
> lickety-split").
>
> However, lots of files, lots of data, so let's avoid that as the initial
> answer if possible. (Ideally I am looking for an "on read" solution in
> Drill)
>
> Basically, when I try to read a file, I get this error:
>
> Error: DATA_READ ERROR: You tried to start when you are using a ValueWriter
> of type SingleMapWriter.
>
> The field in question has a silly setup: if it's empty they use {}, and if
> it's not empty it's an array of data.
>
> So:
>
> "field1":{}
> or
> "field1":[{"foo":"bar"}, {"bar":"foo"}]
>
> I am pretty sure this is the cause of the error. Side point: the error
> message above did not help me understand the problem intuitively; some TLC
> on the error messages could help less Drill-aware users know what's actually
> breaking (in fairness, the message in 1.4 showed me the line, column, and
> field, which helped me infer what could POSSIBLY be wrong).
>
> So, is there a way to address this without reprocessing a lot of data?  An
> option in Drill that would allow a dirty read of some sort?
>
> Thanks in advance!!
>
> John
>
