The following is based on Keith Smillie's stats companion.
NB. Median and quartiles
midpt=: -:@<:@#
median=: -:@(+/)@((<.,>.)@midpt { /:~)
Q1=: [: median ] #~ median > ]
Q3=: [: median ] #~ median < ]
quartiles=: Q1 , median , Q3
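As a quick check of the definitions above, the sketch below applies them to the 20 scores (scrs) used in the mail thread that follows; the data are copied from that thread. Q1 and Q3 keep only values strictly below or strictly above the median, so a value equal to the median (for example the middle element of an odd-length list) falls in neither half. For scrs the results match the 52.5 61 70.5 that the Leicester page is reported to give further down.

```j
midpt=: -:@<:@#                          NB. 0-based middle position (fractional for even counts)
median=: -:@(+/)@((<.,>.)@midpt { /:~)   NB. average of the two middle sorted values
Q1=: [: median ] #~ median > ]           NB. median of the values strictly below the median
Q3=: [: median ] #~ median < ]           NB. median of the values strictly above the median
quartiles=: Q1 , median , Q3

scrs=: 43 48 50 50 52 53 56 58 59 60 62 65 66 68 70 71 74 76 78 80
quartiles scrs    NB. 52.5 61 70.5
quartiles i.5     NB. 0.5 2 3.5
```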
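Another definition of median, for integer data: instead of averaging, it returns the middle value itself, or both middle values when the count is even and they differ, so the result stays in the domain of the data. Because Q1 and Q3 above refer to median by name, defining this version after them changes Q1 and Q3 as well.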
median=: ~.@((<.,>.)@midpt { /:~)
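The integer-domain version can be tried the same way. In this sketch it is bound to the hypothetical name medianI, so that the averaging median, and the Q1 and Q3 that call it by name, are left untouched.

```j
midpt=: -:@<:@#
NB. medianI is a hypothetical name; the definition is the one given above
medianI=: ~.@((<.,>.)@midpt { /:~)   NB. distinct middle value(s), no averaging

scrs=: 43 48 50 50 52 53 56 58 59 60 62 65 66 68 70 71 74 76 78 80
medianI scrs               NB. 60 62  even count: both middle values
medianI i.5                NB. 2      odd count: the single middle value
medianI 1 1 1 1 1 2 3 4    NB. 1      equal middle values collapse to one
```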
> From: Devon McCormick
>
> Don - I like yours better than the one I have now, though I'll probably
> generalize it into an "Ntiler".
>
> Part of the problem is that there are multiple correct answers if we define
> quartile numbers as those which divide the set as evenly as possible into
> four groups, e.g.
>
> quartileCt=: 4 : '+/"1 (y>:/~x,_) *. y< /~__,x' NB. Count elements/quartile
> NB. All these different answers work correctly:
> (52.75 61 70.25) quartileCt scrs NB. Excel
> 5 5 5 5
> (52.5 61 70.5) quartileCt scrs NB. web site
> 5 5 5 5
> (52.1 61.1 70.1) quartileCt scrs NB. another answer...
> 5 5 5 5
>
> One way to test, as you suggest, is to look at the behavior when we have an
> odd number of elements, i.e. "odd" with respect to four:
>
> NB. Two different ways of counting number of elements/quartile:
> quartileCt=: 4 : '+/"1 (y>:/~x,_) *. y< /~__,x'
> quartileCt2=: 4 : '+/"1 (y> /~x,_) *. y<:/~__,x'
> NB. Two different quartilers:
> test0=: 1 : '(3{.4 ntilebps y) u y' NB. Mine
> test1=: 1 : '(qr y) u y' NB. Don's
>
> NB. Both work OK for even and odd cases counted one way...
> quartileCt test0&>0 1 2 3 4}.&.><scrs
> 5 5 5 5
> 4 5 5 5
> 4 5 4 5
> 4 4 4 5
> 4 4 4 4
> quartileCt test1&>0 1 2 3 4}.&.><scrs
> 5 5 5 5
> 5 5 5 4
> 5 4 5 4
> 5 4 4 4
> 4 4 4 4
>
> NB. Mine falls down for a couple of cases counted the other way:
> quartileCt2 test0&>0 1 2 3 4}.&.><scrs
> 4 5 5 6
> 4 5 5 5
> 4 4 5 5
> 4 4 4 5
> 3 4 4 5
> NB. but Don's works OK under different counting method as well:
> quartileCt2 test1&>0 1 2 3 4}.&.><scrs
> 5 5 5 5
> 5 5 5 4
> 5 4 5 4
> 5 4 4 4
> 4 4 4 4
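The two counting verbs differ only at the interval boundaries: quartileCt counts each group as the half-open interval (lo,hi], so a value equal to a breakpoint goes to the lower group, while quartileCt2 uses [lo,hi) and sends it to the upper group. The sketch below, using the verbs as quoted and the scrs data from later in the thread, suggests that the choice matters only when a breakpoint coincides with a data value; 52 60 70 are the breakpoints that Devon's quartile returns for scrs further down.

```j
scrs=: 43 48 50 50 52 53 56 58 59 60 62 65 66 68 70 71 74 76 78 80
quartileCt=:  4 : '+/"1 (y>:/~x,_) *. y< /~__,x'   NB. groups are (lo,hi]
quartileCt2=: 4 : '+/"1 (y> /~x,_) *. y<:/~__,x'   NB. groups are [lo,hi)

NB. Breakpoints that fall between data values: both counts agree
52.75 61 70.25 quartileCt scrs     NB. 5 5 5 5
52.75 61 70.25 quartileCt2 scrs    NB. 5 5 5 5
NB. Breakpoints equal to data values (52, 60, 70): the counts differ
52 60 70 quartileCt scrs           NB. 5 5 5 5
52 60 70 quartileCt2 scrs          NB. 4 5 5 6
```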
>
> Thanks for your suggestions.
>
> Regards,
>
> Devon
>
> On Fri, Oct 16, 2009 at 3:47 PM, Don Guinn wrote:
>
> > Looked up the definition of "median" and it appears that there are several
> > definitions of "median". And, according to
> > http://en.wikipedia.org/wiki/Median median and quartiles can be messy with
> > badly skewed data. Best I can tell this is a measurement that should be used
> > with care.
> > I wrote a quick verb which gives the same answers as the site you referenced
> > and it does strange things, depending on the data. If the count of the set
> > is odd, which group should have the extra number? What if the data is really
> > skewed?
> >
> > qr=.([:([:(+/%#)]{~[:(<:,:])[:>.0.25 0.5 0.75"_*#)]/:]) NB. Needs cleaning up.
> > qr scrs
> > 52.5 61 70.5
> > qr i.4
> > 0.5 1.5 2.5
> > qr i.5
> > 1.5 2.5 3.5
> > qr i.12
> > 2.5 5.5 8.5
> > qr i.11
> > 2.5 5.5 8.5
> > qr i.13
> > 3.5 6.5 9.5
> > -~/0 2{qr scrs
> > 18
> > qr 1 1 1 1 1 2 3 4
> > 1 1 2.5
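Don's tacit qr may be easier to follow written out explicitly. The version below is a hypothetical restatement (the name qrx is not from the thread): sort, take the quarter positions p = ceiling(0.25 0.5 0.75 * n), and average the sorted values at 0-based indices p-1 and p. One consequence is that for an odd count the middle result is not the median (2.5 rather than 2 for i.5). As written it also needs at least four values, since for n < 4 the index p can equal n and fall off the end of the list.

```j
NB. qrx: hypothetical explicit restatement of Don's qr, for reading only
qrx=: 3 : 0
 s=. /:~ y                     NB. sort ascending
 p=. >. 0.25 0.5 0.75 * # s    NB. round the quarter positions up
 -: (s {~ <: p) + s {~ p       NB. average the values at 0-based p-1 and p
)

scrs=: 43 48 50 50 52 53 56 58 59 60 62 65 66 68 70 71 74 76 78 80
qrx scrs    NB. 52.5 61 70.5
qrx i.5     NB. 1.5 2.5 3.5
```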
> >
> >
> > On Fri, Oct 16, 2009 at 1:21 PM, Devon McCormick wrote:
> >
> > > Members of the forum -
> > >
> > > while looking up some statistical definitions, I came across this example
> > > http://www2.le.ac.uk/offices/ssds/sd/ld/resources/numeracy/variability
> > > in which the calculation of the median disagrees with the result of the one
> > > listed as "m0=: median=: <.@-:@# { /:~" in "MathStats" on the J wiki.
> > >
> > > I was actually looking at the definition of quartiles when I noticed this.
> > >
> > > For the series
> > >
> > > #scrs=. 43 48 50 50 52 53 56 58 59 60 62 65 66 68 70 71 74 76 78 80
> > > 20
> > > m0=: <.@-:@# { /:~
> > > m0 scrs
> > > 62
> > > median scrs NB. my own definition
> > > 61
> > > median
> > > -:@(+/)@((<. , >.)@midpt { /:~)
> > > midpt
> > > -:@<:@#
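m0 and median differ only for even counts. Reading m0 as <.@-:@# { /:~ (the element at 0-based index floor(n/2) of the sorted list), it picks the upper of the two middle values, whereas median averages them; for odd counts both return the middle element.

```j
m0=: <.@-:@# { /:~                       NB. sorted element at index floor(n/2)
midpt=: -:@<:@#
median=: -:@(+/)@((<.,>.)@midpt { /:~)   NB. average of the two middle values

scrs=: 43 48 50 50 52 53 56 58 59 60 62 65 66 68 70 71 74 76 78 80
(m0 , median) i.5     NB. 2 2     odd count: same middle element
(m0 , median) i.4     NB. 2 1.5   even count: m0 takes the upper middle value
(m0 , median) scrs    NB. 62 61
```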
> > >
> > > Also, this site's answers disagree with Excel and with my own quartile
> > > function, applied to "scrs" above, but I think the site is correct:
> > > NB. Quartiles 1-3 according to Excel:
> > > 52.75 61 70.25
> > >
> > > NB. According to http://www2.le.ac.uk/offices/ssds/sd/ld/resources/numeracy/variability:
> > > 52.5 61 70.5
> > >
> > > 0 1 2 quartile&><scrs
> > > 52 60 70
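Excel's QUARTILE function is commonly described as interpolating linearly between adjacent sorted values at the 0-based position p*(n-1), for p = 0.25 0.5 0.75. Assuming that rule, a minimal sketch using a hypothetical dyad qxl (not part of the thread) reproduces the Excel figures quoted above for scrs.

```j
NB. qxl: hypothetical; x = fractions (e.g. 0.25 0.5 0.75), y = data
qxl=: 4 : 0
 s=. /:~ y
 pos=. x * <: # s                                NB. 0-based fractional positions p*(n-1)
 n=. <. pos                                      NB. index of the lower neighbour
 f=. pos - n                                     NB. weight given to the upper neighbour
 ((1 - f) * n { s) + f * s {~ (<: # s) <. >: n   NB. upper index clamped to the last item
)

scrs=: 43 48 50 50 52 53 56 58 59 60 62 65 66 68 70 71 74 76 78 80
0.25 0.5 0.75 qxl scrs    NB. 52.75 61 70.25
```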
> > >
> > > NB. My "quartile" disagrees with my "median": the middle quartile should be
> > > NB. the same as the median.
> > > quartile
> > > 4 : 'x{4 ntilebps y'
> > > ntilebps
> > > 4 : 0
> > > NB.* ntilebps: return breakpoint values of x-tiles of y; e.g. 4 ntilebps y
> > > NB. -> quartiles; 0-based so "1st" quartile is 0{4 ntilebps y.
> > > quant=. x
> > > y=. /:~y
> > > wh=. 0 1#:(i.quant)*quant%~#y  NB. Where partition points are exactly
> > > 'n f'=. |:wh                   NB. whole and fractional part of partitions
> > > 1|.+/"1 ((1-f),.f)*(n+/_1 0){y  NB. "1|." moves top quantile to end.
> > > )
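Tracing ntilebps on scrs shows where it parts company with median. With 20 values every partition position k*20/4 is a whole number, so each fractional weight f is 0 and each breakpoint is simply the lower neighbour, the value at 0-based index k*20/4 - 1. The middle breakpoint is therefore 60, the 10th value, rather than the mean of the 10th and 11th values (61). The first position wraps to index _1, the largest value, which 1|. then rotates to the end. The lines below repeat the body of ntilebps step by step at the top level; the names s, wh, n and f are just for the trace.

```j
s=: /:~ 43 48 50 50 52 53 56 58 59 60 62 65 66 68 70 71 74 76 78 80
wh=: 0 1 #: (i.4) * 4 %~ # s     NB. one row per breakpoint: whole part, fractional part
'n f'=: |: wh                     NB. n is 0 5 10 15, f is 0 0 0 0
(n +/ _1 0) { s                   NB. neighbour pairs: 80 43, 52 53, 60 62, 70 71
1 |. +/"1 ((1-f) ,. f) * (n +/ _1 0) { s   NB. 52 60 70 80
```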
> > >
> > > Anyone care to weigh in on this?
> > >
> > > Regards,
> > >
> > > Devon
> > >
> > >
> > > --
> > > Devon McCormick, CFA
> > > ^me^ at acm. org is my preferred e-mail
> > > ----------------------------------------------------------------------
> > > For information about J forums see http://www.jsoftware.com/forums.htm
> > >