Re: [SQL] aggregation question

Samuel Gendler Wed, 01 Dec 2010 01:07:00 -0800

On Tue, Nov 30, 2010 at 7:47 PM, Samuel Gendler
<[email protected]>wrote:


> I have a fact table with a number of foreign keys to dimension data and
> some measure columns.  Let's assume I've got dimension columns a,b,c,d,e,
> and f and measure columns x, and y.
>
> I need to be able to find the value of f, x, and y for the row with
> min(x/y) when e in (1,2) and max(x/y) when e not in (1,2) with grouping by
> a,b,c, and d.
>
> A query which would identify the row in question would look like this:
>
> select min(x/y), a, b, c, d, e from my_table group by a,b,c,d,e order by
> a,b,c,d
>
> But I don't know how to go about also extracting the value of x, y, and f
> for the row which is matched - I suppose I could join to a subquery, but
> that strikes me as probably not being the most efficient way to accomplish
> it.
>
> Actually, it occurs to me that I've potentially got more than one row for
> any given a,b,c,d,e,f tuple, but I want a min for all rows aggregated by f
>
> Basically, for each (a,b,c,d,e) I want to return sum(x), sum(y), and f (as
> well as a,b,c,d,e) for the f with min(sum(x)/sum(y))
>
>
I solved this, but I'm not sure if my solution is ideal

    -- select distinct causes multiple rows that tie for rank = 1

    -- to condense down to a single row. Outer query also strips

    -- any rows from windows with less than 3 entries.

    select distinct count, total_ms, time, fk_e,
           fk_a, fk_b, fk_c
    from (
        -- grab count and total_ms for each fk_g, grouped by
        -- a,b,c, time, and e.  Use rank() window function
        -- to rank each row in the window, ordered by total_ms/count
        -- and use count(*) as window function to get total rows in
        -- the window.
        select sum(f.x) as count, sum(f.y) as total_ms,
               f.time as time,
               f.fk_a, f.fk_b, f.fk_c,
               f.fk_e, f.fk_f,
               rank() over (w_agg order by sum(f.x)/sum(f.y)) as rank,
               count(*) over (w_agg) as rows
        from fact_table f
        group by 3,4,5,6,7,8
        window w_agg as (partition by time,
                         fk_a, fk_b, fk_c,
                         fk_e)
    ) q
    where q.rank = 1 and q.rows >= 3



I will need to run this twice - once with rank() ordered descending and the
inner query actually limited by fk_e in (4,14) and once with rank() ordered
ascending for fk_e not in (4,14).   It is possible I can put a case
statement in the inner select in order to handle that.  I haven't tried it
yet.

Is there a better solution than the outer 'select distinct' in order to only
see rows that have rank = 1 and to never have duplicate rows?

Re: [SQL] aggregation question

Reply via email to