Re: Looking for workaround to Schema detection problems

Zelaine Fong Fri, 08 Jul 2016 08:15:13 -0700

Have you tried using

ALTER SESSION SET `store.json.all_text_mode` = true;


-- Zelaine

On Fri, Jul 8, 2016 at 6:37 AM, Holy Alexander <
[email protected]> wrote:

> Hi Vitalii!
>
>
> This is what I tried:
>
> Altered the setting system-wide:
>
> ALTER SYSTEM SET `exec.enable_union_type` = true
>
> Verified that the setting is really altered
>
> SELECT *
> FROM sys.options
> WHERE type in ('SYSTEM','SESSION') order by name
>
> And re-run the query
>
> Unfortunately this does not solve the problem.
> It just causes a different error:
>
> [30027]Query execution error. Details:[
> SYSTEM ERROR: NullPointerException
> Fragment 0:0
> [Error Id: 0f9cb7ae-d2d5-474c-ad57-2d558041e2c6 on
>
> (I tried this on Drill 1.7 and 1.6)
>
> Best regards,
> Alexander
>
>
> -----Original Message-----
> From: Vitalii Diravka [mailto:[email protected]]
> Sent: 08 July 2016 13:30
> To: [email protected]
> Subject: Re: Looking for workaround to Schema detection problems
>
> Hi Alexander,
>
> Please try with turning on the union type:
>
> ALTER SESSION SET `exec.enable_union_type` = true;
>
> Kind regards
> Vitalii
>
> 2016-07-08 10:50 GMT+00:00 Holy Alexander <[email protected]>:
>
> > My JSON data looks - simplified - like this
> >
> > {"ID":1,"a":"some text"}
> > {"ID":2,"a":"some text","b":"some other text"} {"ID":3,"a":"some
> > text"}
> >
> > Column b is only physically serialized when it is not null.
> > It is the equivalent of a NULLable VARCHAR() column in SQL.
> >
> > I run queries like these:
> >
> > SELECT b
> > FROM dfs.`D:\MyData\test.json`
> > WHERE b IS NOT NULL
> >
> > And normally all is fine.
> > However, among my thousands of data files, I have two files where the
> > first occurrence of b happens a few thousand records down the file.
> > These two data files would look like this:
> >
> > {"ID":1,"a":"some text"}
> > {"ID":2,"a":"some text"}
> > ... 5000 more records without column b ...
> > {"ID":5002,"a":"some text","b":"some other text"} {"ID":5003,"a":"some
> > text"}
> >
> > In this case, my simple SQL query above fails:
> >
> > [30027]Query execution error. Details:[ DATA_READ ERROR: Error parsing
> > JSON - You tried to write a VarChar type when you are using a
> > ValueWriter of type NullableIntWriterImpl.
> > File  /D:/MyData/test.json
> > Record 5002 Fragment ...
> >
> > It seems that the Schema inference mechanism of Drill only samples a
> > certain amount of bytes (or records) to determine the schema.
> > If the first occurrence of a schema detail happens to far down things
> > go boom.
> >
> > I am now looking for a sane way to work around this.
> > Preferred by extending the query and not by altering my massive
> > amounts of data.
> >
> > BTW, I tried altering the data by chaning the first line:
> > {"ID":1,"a":"some text","b":null}
> > does not help.
> >
> > Of course, changing the first line to
> > {"ID":1,"a":"some text","b":""}
> > solves the problem, but this is not a practical solution.
> >
> > Any help appreciated.
> > Alexander
> >
>

Re: Looking for workaround to Schema detection problems

Reply via email to