Re: Performance with multiple FLATTENs

rahul challapalli Tue, 19 Jul 2016 13:41:43 -0700

Matt,

Having multiple flatten's in your query leads to cross-join between the
output of each flatten. So a performance hit is expected with the addition
of each flatten. And there could also be a genuine performance bug for this
scenario. To be sure it is a bug we need more information as Abhishek
pointed out.


However if you want to do some computations after you flattened out your
query, it might be helpful sometimes to rewrite the query such that
multiple flatten's fall in multiple sub-queries. You may see some
performance improvement. Let me know how it goes.

- Rahul

On Tue, Jul 19, 2016 at 1:22 PM, Abhishek Girish <[email protected]>
wrote:

> Hi Matt,
>
> Can you please share more information on your setup, specifically the size
> of your dataset, including an approximate average size of individual JSON
> files, the number of nodes, including Drillbit memory config.
>
> Also can you share the query profiles for the few scenarios you mention.
>
> Regards,
> Abhishek
>
> On Friday, July 15, 2016, Matt <[email protected]> wrote:
>
> > I have JSON data with with a nested list and am using FLATTEN to extract
> > two of three list elements as:
> >
> > ~~~
> > SELECT id, FLATTEN(data)[0] AS dttm, FLATTEN(data)[1] AS result FROM ...
> > ~~~
> >
> > This works, but each FLATTEN seems to slow the query down dramatically,
> 3x
> > slower with the second flatten.
> >
> > Is there a better approach to extracting list elements?
> >
> > ~~~
> > [
> >   {
> >     "id": 16,
> >     "data": [
> >       [
> >         "2016-07-13 00:00",
> >         509,
> >         "OK"
> >       ],
> >       [
> >         "2016-07-13 00:01",
> >         461,
> >         "OK"
> >       ],
> >       [
> >         "2016-07-13 00:02",
> >         508,
> >         "OK"
> >       ],
> > ~~~
> >
>

Re: Performance with multiple FLATTENs

Reply via email to