thank you, I got it :).

Completely understanding the Having clause "limitation" and just missing
the column aliasing a bit.
When I looked at the null results I was so sure that there were no nulls in
the data-set that I did not bother to double check and just wrote the email.

Thanks for the assistance,
 -Stefan


On Thu, Jul 30, 2015 at 4:11 PM, Jinfeng Ni <[email protected]> wrote:

> This is related to using alias in having clause [1].  Drill does not allow
> using alias in having clause.
>
>
> Q1.
> select p.client_ip.country_code country_code, count(*) event_count from
> dfs.tmp.`/events` as p where event_count > 30000 group by
> p.client_ip.country_code;
>
> In Q1, the event-count in the where clause is not refer to the alias
> "event_count". In stead, it is resolved to refer to a column from the table
> "events".  In the table, "event_count" does not exists, and is treated as
> null. That's why you get 0 rows.
>
> Q2.
> select p.client_ip.country_code country_code, count(*) event_count from
> dfs.tmp.`/events` as p group by p.client_ip.country_code having event_count
> > 30000;
>
> Again, "event_count" in having clause is not resolved to alias
> "event_count" in select list. That's why SQL validation fail.
>
>
> Q3
> select p.client_ip.country_code country_code, count(*) event_count from
> dfs.tmp.`/events` as p group by p.client_ip.country_code having count(*) >
> 30000;
>
> This is the right approach.
>
> Regarding the null value and the total count, I tried with a slightly
> modified input ( adding two rows with country_code as null).
>
> {"client_ip":{"country_code":"US"}}
> {"client_ip":{"country_code":"US"}}
> {"client_ip":{"country_code":"US"}}
> {"client_ip":{"country_code":"GB"}}
> {"client_ip":{"country_code":"US"}}
> {"client_ip":{}}
> {"client_ip":{}}
>
>
> select p.client_ip.country_code country_code, count(*) event_count
> from dfs.tmp.`events.json` as p
> group by p.client_ip.country_code
> having count(*) > 1;
>
> +---------------+--------------+
> | country_code  | event_count  |
> +---------------+--------------+
> | US            | 4            |
> | null          | 2            |
> +---------------+--------------+
> 2 rows selected (0.285 seconds)
>
> Looks like the query gets the correct answer.
>
>
> [1]
>
> http://stackoverflow.com/questions/2068682/why-cant-i-use-alias-in-a-count-column-and-reference-it-in-a-having-clause
>
>
> On Thu, Jul 30, 2015 at 8:40 AM, Stefán Baxter <[email protected]>
> wrote:
>
> > Hi,
> >
> > event_count is a column alias for count(*) (without the as).
> >
> > Regards,
> >  -Stefan
> >
> > On Thu, Jul 30, 2015 at 3:32 PM, Jacques Nadeau <[email protected]>
> > wrote:
> >
> > > select p.client_ip.country_code country_code, count(*) event_count from
> > > dfs.tmp.`/events` as p where event_count > 30000 group by
> > > p.client_ip.country_code;
> > > +---------------+--------------+
> > > | country_code  | event_count  |
> > > +---------------+--------------+
> > > +---------------+--------------+
> > >
> > > In this situation, a fully-schemaed system would have stated that the
> > > event_count column does not exist.  In the case of json, we don't know
> if
> > > you're unlying data has an event_count column.  When we actually try to
> > get
> > > it, we find it doesn't exist and is therefore null.  Null is never
> > greater
> > > than 30000 so we exclude all records.  Modify your data to look like
> > this:
> > >
> > > {event_count: 30001, "client_ip":{"country_code":"US"}}
> > > {event_count: 30001, "client_ip":{"country_code":"US"}}
> > > {event_count: 30001, "client_ip":{"country_code":"US"}}
> > > {event_count: 30001, "client_ip":{"country_code":"GB"}}
> > > {event_count: 30001, "client_ip":{"country_code":"US"}}
> > >
> > > And you'll see what your query is actually doing.  In SQL, an aggregate
> > > function is applied after the WHERE clause.  The only way to apply a
> > filter
> > > after an aggregate is with the HAVING clause.
> > >
> > >
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Thu, Jul 30, 2015 at 8:20 AM, Andries Engelbrecht <
> > > [email protected]> wrote:
> > >
> > > > Last I checked group by and having clauses do not support column
> > aliases.
> > > >
> > > > —Andries
> > > >
> > > >
> > > > > On Jul 30, 2015, at 7:19 AM, Stefán Baxter <
> > [email protected]>
> > > > wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > That last case works as expected, sorry, this test data does have
> > null
> > > > > values for country_code.
> > > > >
> > > > > That means that I have a working solution but that it would be nice
> > if
> > > v1
> > > > > (above) would work.
> > > > >
> > > > > Thank you,
> > > > > -Stefán
> > > > >
> > > > > On Thu, Jul 30, 2015 at 2:15 PM, Stefán Baxter <
> > > > [email protected]>
> > > > > wrote:
> > > > >
> > > > >> make that "Count in where or having clause does not work as
> > expected"
> > > > >>
> > > > >> On Thu, Jul 30, 2015 at 2:14 PM, Stefán Baxter <
> > > > [email protected]>
> > > > >> wrote:
> > > > >>
> > > > >>> Hi,
> > > > >>>
> > > > >>> I have data that can be reduced to this:
> > > > >>>
> > > > >>>   - {"client_ip":{"country_code":"US"}}
> > > > >>>   - {"client_ip":{"country_code":"US"}}
> > > > >>>   - {"client_ip":{"country_code":"US"}}
> > > > >>>   - {"client_ip":{"country_code":"GB"}}
> > > > >>>   - {"client_ip":{"country_code":"US"}}
> > > > >>>
> > > > >>> This works fine:
> > > > >>>
> > > > >>> select p.client_ip.country_code country_code, count(*)
> event_count
> > > from
> > > > >>> dfs.tmp.`/events` as p group by p.client_ip.country_code;
> > > > >>>
> > > > >>>
> > > > >>> +---------+--------------+
> > > > >>> | country_code  | event_count  |
> > > > >>> +---------+--------------+
> > > > >>> | US      | 21516        |
> > > > >>> | GB      | 323594       |
> > > > >>> +---------+--------------+
> > > > >>>
> > > > >>>
> > > > >>> This simple where clause returns nothing:
> > > > >>>
> > > > >>> select p.client_ip.country_code country_code, count(*)
> event_count
> > > from
> > > > >>> dfs.tmp.`/events` as p where event_count > 30000 group by
> > > > >>> p.client_ip.country_code;
> > > > >>> +---------------+--------------+
> > > > >>> | country_code  | event_count  |
> > > > >>> +---------------+--------------+
> > > > >>> +---------------+--------------+
> > > > >>>
> > > > >>>
> > > > >>> Using a having clause (v1):
> > > > >>>
> > > > >>> select p.client_ip.country_code country_code, count(*)
> event_count
> > > from
> > > > >>> dfs.tmp.`/events` as p group by p.client_ip.country_code having
> > > > event_count
> > > > >>>> 30000;
> > > > >>>
> > > > >>> Error: PARSE ERROR: From line 1, column 169 to line 1, column
> 179:
> > > > >>> Expression 'event_count' is not being grouped
> > > > >>>
> > > > >>>
> > > > >>> Using a having clause (v2):
> > > > >>>
> > > > >>> select p.client_ip.country_code country_code, count(*)
> event_count
> > > from
> > > > >>> dfs.tmp.`/events` as p group by p.client_ip.country_code having
> > > > count(*) >
> > > > >>> 30000;
> > > > >>> +---------------+--------------+
> > > > >>> | country_code  | event_count  |
> > > > >>> +---------------+--------------+
> > > > >>> | GB            | 323594       |
> > > > >>> | null*          | 566667       |
> > > > >>> +---------------+--------------+
> > > > >>>
> > > > >>>
> > > > >>> * there are no null values in the database and this looks to be
> the
> > > > >>> "total for the rest"
> > > > >>>
> > > > >>>
> > > > >>> Does anyone know how this can be made to work?
> > > > >>>
> > > > >>> Regards,
> > > > >>> -Stefan
> > > > >>>
> > > > >>
> > > > >>
> > > >
> > > >
> > >
> >
>

Reply via email to