Hi David,
Your b.json file has only nulls; there is no way for Drill to determine what 
type of null is in your file. Drill requires each NULL to be a null of some 
type. Often, Drill guesses nullable int, which is why you saw the problem in 
your query.

If all your fields are strings, there is a workaround: use the "all-text-mode" 
when reading your JSON file. [1] Another workaround is to design the files so 
that the first record always has a non-null value for every field so Drill can 
guess the type correctly.

If any fields contain other types, you can use all text mode, then cast the 
non-text fields from text to the proper type.
 This is a known limitation: not just with Drill, but with any schema-free 
system. See DRILL-6035 [2].

At the present type, Drill supports no type of schema to help resolve this kind 
of ambiguity. (Though, having one would be the ideal solution to this kind of 
problem.)

Thanks,
- Paul

[1] https://drill.apache.org/docs/json-data-model/

 [2] https://issues.apache.org/jira/browse/DRILL-6035



    On Friday, February 23, 2018, 12:04:01 PM PST, Lee, David 
<david....@blackrock.com> wrote:  
 
 Using Drill's CTAS statements I've run into a schema inconsistency issue and 
I'm not sure how to solve it..

CREATE TABLE name [ (column list) ] AS query;  

If I have a directory called Cities which have JSON files which look like:

a.json:
{ "city":"San Francisco", "zip":"94105"}
{ "city":"San Jose", "zip":"94088"}

b.json:
{ "city":"Toronto ", "zip": null}
{ "city":"Montreal", "zip" null}

If I create a parquet file out of the Cities directory I will end up with files 
called:

1_0_0.parquet through 1_5_1.parquet

Now I got a problem:

Most of the parquet files have a column type of char for zip.
Some of the parquet files have a column type of int for zip because the zip 
value for a group of records was NULL..

This produces schema change errors later when trying to query the parquet 
directory.

Is it possible for Drill to do a better job learning schemas across all json 
files in a directory before creating parquet?





This message may contain information that is confidential or privileged. If you 
are not the intended recipient, please advise the sender immediately and delete 
this message. See 
http://www.blackrock.com/corporate/en-us/compliance/email-disclaimers for 
further information.  Please refer to 
http://www.blackrock.com/corporate/en-us/compliance/privacy-policy for more 
information about BlackRock’s Privacy Policy.

For a list of BlackRock's office addresses worldwide, see 
http://www.blackrock.com/corporate/en-us/about-us/contacts-locations.

© 2018 BlackRock, Inc. All rights reserved.  

Reply via email to