The issues you're seeing once GROUP BY is added sound like DRILL-4503. -- Zelaine
On Fri, Jul 8, 2016 at 9:11 AM, Holy Alexander < [email protected]> wrote: > Solved! > > Actually ALTER SESSION SET `store.json.all_text_mode` = true; improves > things a bit, but only for a simple select. > As soon as I added GROUP BY and a second column things went haywire again. > > CASTing everything everywhere did solve the problem. > > It seems to me that CASTing everything is a best practice as soon as you > have optional/NULLable columns > > Best regards & thanks to everyone! > Alexander > > -----Original Message----- > From: Zelaine Fong [mailto:[email protected]] > Sent: 08 July 2016 17:42 > To: [email protected] > Subject: Re: Looking for workaround to Schema detection problems > > Note that the 1.7 release now has the fix for DRILL-4479. So, that may > explain why in the past the setting didn't help. > > -- Zelaine > > On Fri, Jul 8, 2016 at 8:38 AM, rahul challapalli < > [email protected]> wrote: > > > In the past setting the below parameter still did not fix the issue. > > But still worth a try > > > > ALTER SESSION SET `store.json.all_text_mode` = true; > > > > You might also want to try explicit casting to varchar for this > > specific column. > > > > On Fri, Jul 8, 2016 at 8:14 AM, Zelaine Fong <[email protected]> wrote: > > > > > Have you tried using > > > > > > ALTER SESSION SET `store.json.all_text_mode` = true; > > > > > > -- Zelaine > > > > > > On Fri, Jul 8, 2016 at 6:37 AM, Holy Alexander < > > > [email protected]> wrote: > > > > > > > Hi Vitalii! > > > > > > > > > > > > This is what I tried: > > > > > > > > Altered the setting system-wide: > > > > > > > > ALTER SYSTEM SET `exec.enable_union_type` = true > > > > > > > > Verified that the setting is really altered > > > > > > > > SELECT * > > > > FROM sys.options > > > > WHERE type in ('SYSTEM','SESSION') order by name > > > > > > > > And re-run the query > > > > > > > > Unfortunately this does not solve the problem. > > > > It just causes a different error: > > > > > > > > [30027]Query execution error. Details:[ SYSTEM ERROR: > > > > NullPointerException Fragment 0:0 [Error Id: > > > > 0f9cb7ae-d2d5-474c-ad57-2d558041e2c6 on > > > > > > > > (I tried this on Drill 1.7 and 1.6) > > > > > > > > Best regards, > > > > Alexander > > > > > > > > > > > > -----Original Message----- > > > > From: Vitalii Diravka [mailto:[email protected]] > > > > Sent: 08 July 2016 13:30 > > > > To: [email protected] > > > > Subject: Re: Looking for workaround to Schema detection problems > > > > > > > > Hi Alexander, > > > > > > > > Please try with turning on the union type: > > > > > > > > ALTER SESSION SET `exec.enable_union_type` = true; > > > > > > > > Kind regards > > > > Vitalii > > > > > > > > 2016-07-08 10:50 GMT+00:00 Holy Alexander < > > [email protected] > > > >: > > > > > > > > > My JSON data looks - simplified - like this > > > > > > > > > > {"ID":1,"a":"some text"} > > > > > {"ID":2,"a":"some text","b":"some other text"} {"ID":3,"a":"some > > > > > text"} > > > > > > > > > > Column b is only physically serialized when it is not null. > > > > > It is the equivalent of a NULLable VARCHAR() column in SQL. > > > > > > > > > > I run queries like these: > > > > > > > > > > SELECT b > > > > > FROM dfs.`D:\MyData\test.json` > > > > > WHERE b IS NOT NULL > > > > > > > > > > And normally all is fine. > > > > > However, among my thousands of data files, I have two files > > > > > where the first occurrence of b happens a few thousand records > down the file. > > > > > These two data files would look like this: > > > > > > > > > > {"ID":1,"a":"some text"} > > > > > {"ID":2,"a":"some text"} > > > > > ... 5000 more records without column b ... > > > > > {"ID":5002,"a":"some text","b":"some other text"} > > {"ID":5003,"a":"some > > > > > text"} > > > > > > > > > > In this case, my simple SQL query above fails: > > > > > > > > > > [30027]Query execution error. Details:[ DATA_READ ERROR: Error > > parsing > > > > > JSON - You tried to write a VarChar type when you are using a > > > > > ValueWriter of type NullableIntWriterImpl. > > > > > File /D:/MyData/test.json > > > > > Record 5002 Fragment ... > > > > > > > > > > It seems that the Schema inference mechanism of Drill only > > > > > samples a certain amount of bytes (or records) to determine the > schema. > > > > > If the first occurrence of a schema detail happens to far down > > > > > things go boom. > > > > > > > > > > I am now looking for a sane way to work around this. > > > > > Preferred by extending the query and not by altering my massive > > > > > amounts of data. > > > > > > > > > > BTW, I tried altering the data by chaning the first line: > > > > > {"ID":1,"a":"some text","b":null} does not help. > > > > > > > > > > Of course, changing the first line to {"ID":1,"a":"some > > > > > text","b":""} solves the problem, but this is not a practical > > > > > solution. > > > > > > > > > > Any help appreciated. > > > > > Alexander > > > > > > > > > > > > > > >
