Re: Looking for workaround to Schema detection problems

Zelaine Fong Fri, 08 Jul 2016 09:18:25 -0700

The issues you're seeing once GROUP BY is added sound like DRILL-4503.

-- Zelaine


On Fri, Jul 8, 2016 at 9:11 AM, Holy Alexander <
[email protected]> wrote:

> Solved!
>
> Actually ALTER SESSION SET `store.json.all_text_mode` = true; improves
> things a bit, but only for a simple select.
> As soon as I added GROUP BY and a second column things went haywire again.
>
> CASTing everything everywhere did solve the problem.
>
> It seems to me that CASTing everything is a best practice as soon as you
> have optional/NULLable columns
>
> Best regards & thanks to everyone!
> Alexander
>
> -----Original Message-----
> From: Zelaine Fong [mailto:[email protected]]
> Sent: 08 July 2016 17:42
> To: [email protected]
> Subject: Re: Looking for workaround to Schema detection problems
>
> Note that the 1.7 release now has the fix for DRILL-4479.  So, that may
> explain why in the past the setting didn't help.
>
> -- Zelaine
>
> On Fri, Jul 8, 2016 at 8:38 AM, rahul challapalli <
> [email protected]> wrote:
>
> > In the past setting the below parameter still did not fix the issue.
> > But still worth a try
> >
> > ALTER SESSION SET `store.json.all_text_mode` = true;
> >
> > You might also want to try explicit casting to varchar for this
> > specific column.
> >
> > On Fri, Jul 8, 2016 at 8:14 AM, Zelaine Fong <[email protected]> wrote:
> >
> > > Have you tried using
> > >
> > > ALTER SESSION SET `store.json.all_text_mode` = true;
> > >
> > > -- Zelaine
> > >
> > > On Fri, Jul 8, 2016 at 6:37 AM, Holy Alexander <
> > > [email protected]> wrote:
> > >
> > > > Hi Vitalii!
> > > >
> > > >
> > > > This is what I tried:
> > > >
> > > > Altered the setting system-wide:
> > > >
> > > > ALTER SYSTEM SET `exec.enable_union_type` = true
> > > >
> > > > Verified that the setting is really altered
> > > >
> > > > SELECT *
> > > > FROM sys.options
> > > > WHERE type in ('SYSTEM','SESSION') order by name
> > > >
> > > > And re-run the query
> > > >
> > > > Unfortunately this does not solve the problem.
> > > > It just causes a different error:
> > > >
> > > > [30027]Query execution error. Details:[ SYSTEM ERROR:
> > > > NullPointerException Fragment 0:0 [Error Id:
> > > > 0f9cb7ae-d2d5-474c-ad57-2d558041e2c6 on
> > > >
> > > > (I tried this on Drill 1.7 and 1.6)
> > > >
> > > > Best regards,
> > > > Alexander
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Vitalii Diravka [mailto:[email protected]]
> > > > Sent: 08 July 2016 13:30
> > > > To: [email protected]
> > > > Subject: Re: Looking for workaround to Schema detection problems
> > > >
> > > > Hi Alexander,
> > > >
> > > > Please try with turning on the union type:
> > > >
> > > > ALTER SESSION SET `exec.enable_union_type` = true;
> > > >
> > > > Kind regards
> > > > Vitalii
> > > >
> > > > 2016-07-08 10:50 GMT+00:00 Holy Alexander <
> > [email protected]
> > > >:
> > > >
> > > > > My JSON data looks - simplified - like this
> > > > >
> > > > > {"ID":1,"a":"some text"}
> > > > > {"ID":2,"a":"some text","b":"some other text"} {"ID":3,"a":"some
> > > > > text"}
> > > > >
> > > > > Column b is only physically serialized when it is not null.
> > > > > It is the equivalent of a NULLable VARCHAR() column in SQL.
> > > > >
> > > > > I run queries like these:
> > > > >
> > > > > SELECT b
> > > > > FROM dfs.`D:\MyData\test.json`
> > > > > WHERE b IS NOT NULL
> > > > >
> > > > > And normally all is fine.
> > > > > However, among my thousands of data files, I have two files
> > > > > where the first occurrence of b happens a few thousand records
> down the file.
> > > > > These two data files would look like this:
> > > > >
> > > > > {"ID":1,"a":"some text"}
> > > > > {"ID":2,"a":"some text"}
> > > > > ... 5000 more records without column b ...
> > > > > {"ID":5002,"a":"some text","b":"some other text"}
> > {"ID":5003,"a":"some
> > > > > text"}
> > > > >
> > > > > In this case, my simple SQL query above fails:
> > > > >
> > > > > [30027]Query execution error. Details:[ DATA_READ ERROR: Error
> > parsing
> > > > > JSON - You tried to write a VarChar type when you are using a
> > > > > ValueWriter of type NullableIntWriterImpl.
> > > > > File  /D:/MyData/test.json
> > > > > Record 5002 Fragment ...
> > > > >
> > > > > It seems that the Schema inference mechanism of Drill only
> > > > > samples a certain amount of bytes (or records) to determine the
> schema.
> > > > > If the first occurrence of a schema detail happens to far down
> > > > > things go boom.
> > > > >
> > > > > I am now looking for a sane way to work around this.
> > > > > Preferred by extending the query and not by altering my massive
> > > > > amounts of data.
> > > > >
> > > > > BTW, I tried altering the data by chaning the first line:
> > > > > {"ID":1,"a":"some text","b":null} does not help.
> > > > >
> > > > > Of course, changing the first line to {"ID":1,"a":"some
> > > > > text","b":""} solves the problem, but this is not a practical
> > > > > solution.
> > > > >
> > > > > Any help appreciated.
> > > > > Alexander
> > > > >
> > > >
> > >
> >
>

Re: Looking for workaround to Schema detection problems

Reply via email to