Parquet is the target format, but getting to Parquet is difficult. Using Drill's CTAS to create Parquet is really easy, but converting JSON to Parquet that way has pitfalls, because the CTAS still has to read the JSON through Drill's schema inference first. I think the only libraries that support nested Parquet creation with a schema are in C++, so there aren't many options out there for generating Parquet.
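For the mixed-type list error quoted below, one short-term workaround is to clean the JSON before Drill reads it. Here is a minimal Python sketch, assuming the file can be preprocessed outside Drill. It casts every integer that sits directly inside a list to a float and drops keys whose value is null. The function name `normalize` and the `Name` key in the sample record are hypothetical, used only for illustration; the `Coordinates` values are the ones from the original question.

```python
import json


def normalize(value, in_list=False):
    """Recursively clean a parsed JSON value before handing it to Drill."""
    if isinstance(value, dict):
        # Drop keys whose value is null so no type has to be guessed for them.
        return {k: normalize(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        # Elements of a list are normalized with in_list=True.
        return [normalize(v, in_list=True) for v in value]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if in_list and isinstance(value, int) and not isinstance(value, bool):
        return float(value)
    return value


# Sample record: "Coordinates" comes from the thread, "Name" is hypothetical.
record = {
    "Coordinates": [[23.53, 4.99, 11], [35.09, 7.7, 16]],
    "Name": None,
}

cleaned = normalize(record)
print(json.dumps(cleaned))
# → {"Coordinates": [[23.53, 4.99, 11.0], [35.09, 7.7, 16.0]]}
```

Integers outside of lists are left alone, so only the values that trigger the "lists of different types" error are changed. This is not a substitute for a real schema; it only removes the two cases discussed in this thread.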

-----Original Message-----
From: Paul Rogers [mailto:[email protected]]
Sent: Thursday, May 31, 2018 11:42 AM
To: [email protected]
Subject: Re: Read complex json file gives list type doesn't support different data types

+1

We had a long discussion on this topic on the dev list a month or so. The conclusion seemed to be that Drill is intended to pull schema out of data without an external schema; the data must support Drill's schema inference. There remain holes where knowing the schema up front would be a huge win. For now, the solution is to ETL to Parquet, which does carry a schema.

Thanks,
- Paul

On Thursday, May 31, 2018, 8:25:00 AM PDT, Lee, David <[email protected]> wrote:

I think I opened an enhancement ticket to pass in a json schema object to a query to bypass schema learning to avoid problems like this. Coordinates could be typed as float in a schema object so drill can cast it to float without converting everything to doubles.

It also addresses the issues if some key value is NULL in the entire file. Drill will cast NULL to an int which results in a schema error if the next file read has non-Null string values.

Turning on read everything as string is a hack and even that fails once you start hitting Null key values which are nested keys or nested arrays.

An alternative short term solution would be to not include NULLs.

Sent from my iPad

> On May 31, 2018, at 2:23 AM, Divya Gehlot <[email protected]> wrote:
>
> I tried exec.enable_union_type it didnt work for me ,however below helped :
>
> ALTER SESSION SET `store.json.read_numbers_as_double` = true;
>
>> On 31 May 2018 at 11:28, Padma Penumarthy <[email protected]> wrote:
>>
>> yes, that is correct.
>> You can try setting the option “exec.enable_union_type” for that to
>> work with the caveat that union type is not fully supported in drill.
>>
>> Thanks
>> Padma
>>
>>> On May 30, 2018, at 7:56 PM, Divya Gehlot <[email protected]> wrote:
>>>
>>> Hi,
>>> I am reading a complex json file, I am getting format doesn't support while
>>> reading below :
>>> "Coordinates":[
>>>   [
>>>     23.53,
>>>     4.99,
>>>     11
>>>   ],
>>>   [
>>>     35.09,
>>>     7.7,
>>>     16
>>>   ]
>>> ]
>>>
>>> Error : Query execution error. Details:[
>>>> UNSUPPORTED_OPERATION ERROR: In a list of type FLOAT8, encountered a value
>>>> of type BIGINT. Drill does not support lists of different types.
>>>> Line 15
>>>> Column 19
>>>> Field Coordinates
>>>> Line 15
>>>> Column 19
>>>> Field Coordinates
>>>> Line 15
>>>> Column 19
>>>> Field Coordinates
>>>> Fragment 0:0
>>>
>>> If I remove the third coordinates(11,16) which is integer it works
>>> like charm .
>>>
>>> Does that means Drill doesn't support values of different data types
>>> in array list?
>>>
>>> Appreciate the help !
>>>
>>> Thanks,
>>> Divya
