Parquet is the target format, but getting to Parquet is difficult. Using 
Drill's CTAS to create Parquet is really easy, but there are pitfalls when 
converting JSON to Parquet. I think the only libraries that support creating 
nested Parquet with a schema are in C++, so there aren't many options out there 
for generating Parquet.

-----Original Message-----
From: Paul Rogers [mailto:[email protected]] 
Sent: Thursday, May 31, 2018 11:42 AM
To: [email protected]
Subject: Re: Read complex json file gives list type doesn't support different 
data types


+1

We had a long discussion on this topic on the dev list a month or so ago. The 
conclusion seemed to be that Drill is intended to pull schema out of data 
without an external schema; the data must support Drill's schema inference. 
There remain holes where knowing the schema up front would be a huge win. For 
now, the solution is to ETL to Parquet, which does carry a schema.

Thanks,
- Paul



    On Thursday, May 31, 2018, 8:25:00 AM PDT, Lee, David 
<[email protected]> wrote:

I think I opened an enhancement ticket to allow passing a JSON schema object to 
a query, bypassing schema learning and avoiding problems like this. Coordinates 
could be typed as float in the schema object so Drill can cast them to float 
without converting every number to double.
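
Until something like that exists, a similar effect can be approximated by pre-processing the JSON before Drill reads it. The following is a minimal Python sketch, not anything Drill provides: `coerce_numbers` is a hypothetical helper, and it assumes each record can be loaded into memory.

```python
import json


def coerce_numbers(value):
    """Recursively turn every JSON integer into a float.

    Hypothetical pre-processing helper, not part of Drill.
    """
    # bool is a subclass of int in Python, so leave true/false untouched.
    if isinstance(value, bool):
        return value
    if isinstance(value, int):
        return float(value)
    if isinstance(value, list):
        return [coerce_numbers(item) for item in value]
    if isinstance(value, dict):
        return {key: coerce_numbers(item) for key, item in value.items()}
    return value


record = {"Coordinates": [[23.53, 4.99, 11], [35.09, 7.7, 16]]}
# Only the field that should be float is rewritten; other fields keep their types.
record["Coordinates"] = coerce_numbers(record["Coordinates"])
print(json.dumps(record))
```

The integers are written back as 11.0 and 16.0, so every element of the list is a decimal literal, and Drill should then see a single type in the list.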

It also addresses the issue where a key's value is NULL throughout an entire 
file. Drill will cast NULL to an int, which results in a schema error if the 
next file read has non-NULL string values for that key.

Turning on the read-everything-as-string option is a hack, and even that fails 
once you start hitting NULL values for keys that elsewhere hold nested objects 
or nested arrays.

An alternative short term solution would be to not include NULLs.
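
One way to leave the NULLs out is to strip them before Drill sees the file. This is a rough Python sketch: `drop_nulls` is a hypothetical helper, the sample record is made up for illustration, and it assumes a missing key is acceptable wherever a null key was.

```python
import json


def drop_nulls(value):
    """Recursively remove null-valued keys and null list elements.

    Hypothetical pre-processing helper, not part of Drill.
    """
    if isinstance(value, dict):
        return {
            key: drop_nulls(item)
            for key, item in value.items()
            if item is not None
        }
    if isinstance(value, list):
        return [drop_nulls(item) for item in value if item is not None]
    return value


# Made-up record with nulls at the top level, in a nested object, and in a list.
record = {"name": None, "address": {"city": "Oslo", "zip": None}, "tags": [None, "a"]}
print(json.dumps(drop_nulls(record)))
```

Note that an object whose keys are all null becomes an empty object rather than disappearing.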


> On May 31, 2018, at 2:23 AM, Divya Gehlot <[email protected]> wrote:
>
>
> I tried exec.enable_union_type, but it didn't work for me. However, the setting below helped:
>
> ALTER SESSION SET `store.json.read_numbers_as_double` = true;
>
>
>> On 31 May 2018 at 11:28, Padma Penumarthy <[email protected]> wrote:
>>
>> Yes, that is correct.
>> You can try setting the option “exec.enable_union_type” for that to 
>> work, with the caveat that the union type is not fully supported in Drill.
>>
>> Thanks
>> Padma
>>
>>
>>> On May 30, 2018, at 7:56 PM, Divya Gehlot <[email protected]> wrote:
>>>
>>> Hi,
>>> I am reading a complex JSON file, and I get an unsupported-type error
>>> while reading the array below:
>>> "Coordinates":[
>>>          [
>>>              23.53,
>>>              4.99,
>>>              11
>>>          ],
>>>          [
>>>              35.09,
>>>              7.7,
>>>              16
>>>          ]
>>> ]
>>>
>>>
>>> Error : Query execution error. Details:[
>>>> UNSUPPORTED_OPERATION ERROR: In a list of type FLOAT8, encountered a value
>>>> of type BIGINT. Drill does not support lists of different types.
>>>> Line  15
>>>> Column  19
>>>> Field  Coordinates
>>>> Line  15
>>>> Column  19
>>>> Field  Coordinates
>>>> Line  15
>>>> Column  19
>>>> Field  Coordinates
>>>> Fragment 0:0
>>>
>>>
>>> If I remove the third value from each coordinate array (11 and 16), which
>>> are integers, it works like a charm.
>>>
>>> Does that mean Drill doesn't support values of different data types 
>>> in an array?
>>>
>>> Appreciate the help!
>>>
>>> Thanks,
>>> Divya
>>
>>

