Ian,

Nice stuff!

Since this subject was brought back up, I thought maybe some would be
interested in the following. I've never had a need to calculate a median,
but, I knew Celko's SQL for Smarties had a few variations and examples from
various people, each with caveats. And then there were differences between
what he termed statistical and financial mean, and some other things as
well. Anyway, a google search turned up another Celko solution. And this one
also brings up the concept of weighted median.

Here is his example,
http://www.intelligententerprise.com/db_area/archives/1999/992004/celko.shtm
l, modified by me to use the  standard EMP table's SAL column:

SQL> SELECT AVG(DISTINCT x.sal)
  2    FROM (SELECT F1.sal
  3            FROM emp F1, emp F2
  4           GROUP BY F1.empno, F1.sal
  5          HAVING SUM(CASE WHEN F2.sal = F1.sal
  6                           THEN 1 ELSE 0 END)
  7          >= ABS(SUM(CASE WHEN F2.sal < F1.sal THEN 1
  8                          WHEN F2.sal > F1.sal THEN -1
  9                           ELSE 0 END)))
 10          X
 11  /

AVG(DISTINCTX.SAL)
------------------
              1550

The link above goes into some detail regarding the logic behind the query
and how his query finally reached the form above. I may never need to do a
median, but, this subject has been a good opportunity for learning. I've
tested the above with even, odd, multiple occurences of SAL, null,s etc. It
seems to work, but, everyone have a whack at it if you like.

Regards,

Larry G. Elkins
[EMAIL PROTECTED]


> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of MacGregor,
> Ian A.
> Sent: Saturday, June 30, 2001 9:00 PM
> To: Multiple recipients of list ORACLE-L
> Subject: Calculating The Median: Error Discovered in Oracle SQL 101 Code
>
>
> I ran the code from Oracle SQL 101 which Jared posted modifying
> it to find the median of the sal  column on that table
>
> SQL> select
>   2   rownum,
>   3   sal
>   4  from (
>   5   select sal
>   6   from scott.emp
>   7   where sal is not null
>   8   union
>   9   select 1 from dual where 1=2
>  10  )
>  11  group by sal, rownum
>  12  having rownum >= (
>  13   select decode( mod(total_freq,2),
>  14    1,trunc(total_freq/2 + 1),
>  15    0,trunc(total_freq/2)
>  16   )
>  17   from (
>  18    select count(*) total_freq
>  19    from scott.emp
>  20    where sal is not null
>  21   )
>  22  )
>  23  and rownum <= (
>  24   select decode( mod(total_freq,2),
>  25    1,trunc(total_freq/2 + 1),
>  26    0,trunc(total_freq/2 + 1)
>  27   )
>  28   from (
>  29    select count(*) total_freq
>  30    from scott.emp
>  31    where sal is not null
>  32   )
>  33  )
>  34  /
>
>                values
>              averaged
>     ROWNUM  in median
> ---------- ----------
>          7       1600
>          8       2450
>            ----------
> Median           2025
>
> ------------------------------------------------------------------
> --------------------
> This answer is different from the result of the code I posted
> which uses the new analytical functions.
>
>  select
>  case
>      when mod(number_salaried,2) = 0 then
>         (select sum(sal)/2 from(select sal, row_number()
>         over ( order by sal) as salrank
>         from scott.emp)
>         where salrank  = number_salaried/2
>         or salrank = number_salaried/2 +1)
>      else
>         (select sal from(select sal, row_number()
>         over ( order by sal) as salrank
>         from scott.emp)
>         where salrank = ceil(number_salaried/2))
>  end median
>  from (select sal,rank() over (order by sal) as rk from scott.emp),
>  (select count(sal) number_salaried from scott.emp)
>  where rk = 1
> /
>
>    MEDIAN
> ---------
>      1550
> ------------------------------------------------------------------
> ------------------------------
> Emp is a 14 row table .   The median should be the average of the
> seventh and eighth values.
> I cleared the computes and columns and ran the first part of the
> SQL 101 code
>
>  clear computes
> utes cleared
>  select
>   rownum,
>   sal
>  from (
>   select sal
>   from scott.emp
>   where sal is not null
>   union
>   select 1 from dual where 1=2
>  )
>  group by sal, rownum
>  /
>
> ROWNUM        SAL
> ------ ----------
>      1        800
>      2        950
>      3       1100
>      4       1250
>      5       1300
>      6       1500
>      7       1600
>      8       2450
>      9       2850
>     10       2975
>     11       3000
>
> ROWNUM        SAL
> ------ ----------
>     12       5000
>
> and also ran the part of my code which corresponded.  I changed
> my code slightly so the salrank column would print.
>
> SQL> select salrank, sal from(select sal, row_number()
>   2          over ( order by sal) as salrank
>   3          from scott.emp)
>   4  /
>
>    SALRANK        SAL
> ---------- ----------
>          1        800
>          2        950
>          3       1100
>          4       1250
>          5       1250
>          6       1300
>          7       1500
>          8       1600
>          9       2450
>         10       2850
>         11       2975
>
>    SALRANK        SAL
> ---------- ----------
>         12       3000
>         13       3000
>         14       5000
>
> 14 rows selected.
>
> ------------------------------------------------------------------
> ----------------------------
> The reason for the different answers  is now apparent.  The SQL
> 101 code is tossing duplicate
> records.   It's been a long time since my stats classes, but I'm
> about 99.999999% confident you don't purge duplicates when
> computing a median.  But even if I'm wrong about this, the SQL
> 101 code has reduced the set to 12 members, but it is still
> computing the median as if there were 14 members;
> that is, it is taking the average of the 7th and 8th values and
> not the average of the 6th and 7th.
>
> I hope there was a caveat in SQL 101 book stating the code only
> worked against columns with unique values, not including nulls.
>
> Ian MacGregor
> Stanford Linear Accelerator Center
> [EMAIL PROTECTED]
>
>
>
>
>
> --
> Please see the official ORACLE-L FAQ: http://www.orafaq.com
> --
> Author: MacGregor, Ian A.
>   INET: [EMAIL PROTECTED]
>
> Fat City Network Services    -- (858) 538-5051  FAX: (858) 538-5051
> San Diego, California        -- Public Internet access / Mailing Lists
> --------------------------------------------------------------------
> To REMOVE yourself from this mailing list, send an E-Mail message
> to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
> the message BODY, include a line containing: UNSUB ORACLE-L
> (or the name of mailing list you want to be removed from).  You may
> also send the HELP command for other information (like subscribing).

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.com
-- 
Author: Larry Elkins
  INET: [EMAIL PROTECTED]

Fat City Network Services    -- (858) 538-5051  FAX: (858) 538-5051
San Diego, California        -- Public Internet access / Mailing Lists
--------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).

Reply via email to