They would end up as two different columns.

You would have to merge them into a single column using a SQL statement

Select COALESCE(taste , type) as taste_or_type from ...

With that said I avoid using JSON these days with Apache Drill (or any SQL 
engine). Drill appears to sample like 30,0000? Json records to figure out what 
the schema is. If you have "new columns" after the first 30,000 json records 
they either don't show up will bomb your query.

Today I just load up JSON files into a python program using json.loads() and 
then convert it into a table format (pandas or list of dictionaries) before 
writing it out to a tabular parquet file using pyarrow.

https://arrow.apache.org/docs/python/parquet.html

The only nested "json" format that is bullet proof is nested parquet which 
includes a full schema definition in the header of the file so you don't have 
to rely on hit or miss data sampling..

-----Original Message-----
From: Salil Sanghvi <[email protected]> 
Sent: Thursday, February 28, 2019 12:35 AM
To: [email protected]
Subject: How to write dynamic queries in Apache Drill to query Nested JSON ...

External Email: Use caution with links and attachments


Hi,

We have a requirement to query HBASE table which has single column with nested 
json:
1. What will be the query to convert the JSON data  into table format.
2. Also the query should be generic enough to pick any new columns that may 
vary between different json blocks.
3. In below example the 1st JSON block has attribute "taste" and second JSON 
block has attribute "type", but the query should be generic enough to return 
both type and taste 4. Query should be able to extract any new attribute 
introduced in future.

{
        "id": "0001",
        "taste": "donut",
        "name": "Cake",
        "ppu": 0.55,
        "batters":
                {
                        "batter":
                                [
                                        { "id": "1001", "type": "Regular" },},
                                        { "id": "1004", "type": "Devil's Food" }
                                ]
                },
        "topping":
                [
                        { "id": "5001", "type": "None" },
                        { "id": "5002", "type": "Glazed" }
                ]
}

{
        "id": "0002",
        "type": "donut",
        "name": "Cake",
        "ppu": 0.55,
        "batters":
                {
                        "batter":
                                [
                                        { "id": "1001", "type": "Regular" },
                                        { "id": "1004", "type": "Devil's Food" }
                                ]
                },
        "topping":
                [
                        { "id": "5001", "type": "None" },
                        { "id": "5004", "type": "Maple" }
                ]
}


This message may contain information that is confidential or privileged. If you 
are not the intended recipient, please advise the sender immediately and delete 
this message. See 
http://www.blackrock.com/corporate/compliance/email-disclaimers for further 
information.  Please refer to 
http://www.blackrock.com/corporate/compliance/privacy-policy for more 
information about BlackRock’s Privacy Policy.

For a list of BlackRock's office addresses worldwide, see 
http://www.blackrock.com/corporate/about-us/contacts-locations.

© 2019 BlackRock, Inc. All rights reserved.

Reply via email to