Matt, Having multiple flatten's in your query leads to cross-join between the output of each flatten. So a performance hit is expected with the addition of each flatten. And there could also be a genuine performance bug for this scenario. To be sure it is a bug we need more information as Abhishek pointed out.
However if you want to do some computations after you flattened out your query, it might be helpful sometimes to rewrite the query such that multiple flatten's fall in multiple sub-queries. You may see some performance improvement. Let me know how it goes. - Rahul On Tue, Jul 19, 2016 at 1:22 PM, Abhishek Girish <[email protected]> wrote: > Hi Matt, > > Can you please share more information on your setup, specifically the size > of your dataset, including an approximate average size of individual JSON > files, the number of nodes, including Drillbit memory config. > > Also can you share the query profiles for the few scenarios you mention. > > Regards, > Abhishek > > On Friday, July 15, 2016, Matt <[email protected]> wrote: > > > I have JSON data with with a nested list and am using FLATTEN to extract > > two of three list elements as: > > > > ~~~ > > SELECT id, FLATTEN(data)[0] AS dttm, FLATTEN(data)[1] AS result FROM ... > > ~~~ > > > > This works, but each FLATTEN seems to slow the query down dramatically, > 3x > > slower with the second flatten. > > > > Is there a better approach to extracting list elements? > > > > ~~~ > > [ > > { > > "id": 16, > > "data": [ > > [ > > "2016-07-13 00:00", > > 509, > > "OK" > > ], > > [ > > "2016-07-13 00:01", > > 461, > > "OK" > > ], > > [ > > "2016-07-13 00:02", > > 508, > > "OK" > > ], > > ~~~ > > >
