Is there a fundamental difference between the following two queries? I can't 
get the second one to work against Parquet files that contain 400,000+ nested 
records.

It seems as though the system flattens every possible record first and only 
then applies the SQL WHERE clause to the flattened data.

Example 1:

select b.* from dfs.`test1.json` b where b.item = 3

[
    {
      "item": 1,
      "item_name": "name_for_1"
    },
    {
      "item": 2,
      "item_name": "name_for_2"
    },
    {
      "item": 3,
      "item_name": "name_for_3"
    },
    {
      "item": 4,
      "item_name": "name_for_4"
    }
]

Example 2:

select b.* from 
(select flatten(a.details) as details
from dfs.`test2.json` a) b
where b.details.item = 3

{
  "header": "my_header_info",
  "details": [
    {
      "item": 1,
      "item_name": "name_for_1"
    },
    {
      "item": 2,
      "item_name": "name_for_2"
    },
    {
      "item": 3,
      "item_name": "name_for_3"
    },
    {
      "item": 4,
      "item_name": "name_for_4"
    }
  ]
}
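
To make clear what I expect Example 2 to return, here is a rough sketch in 
plain Python (not Drill) of the flatten-then-filter behaviour I described 
above. The names flatten_details and query are mine, purely for illustration; 
this only mirrors the semantics I am assuming, not how Drill actually plans or 
executes the query.

```python
import json

# Same document as test2.json above.
doc = json.loads("""
{
  "header": "my_header_info",
  "details": [
    {"item": 1, "item_name": "name_for_1"},
    {"item": 2, "item_name": "name_for_2"},
    {"item": 3, "item_name": "name_for_3"},
    {"item": 4, "item_name": "name_for_4"}
  ]
}
""")


def flatten_details(record):
    # Emit one output row per element of the "details" array,
    # roughly what I understand flatten(a.details) to do.
    for element in record["details"]:
        yield {"details": element}


def query(records, item):
    # Step 1: flatten every record (the inner subquery).
    rows = (row for record in records for row in flatten_details(record))
    # Step 2: filter the flattened rows (the outer WHERE clause).
    return [row for row in rows if row["details"]["item"] == item]


result = query([doc], 3)
print(result)
```

With the sample document this leaves a single row, the one whose item is 3. 
My concern is that with 400,000+ nested records, step 1 has to produce every 
flattened row before step 2 can discard any of them.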



