[matplotlib-devel] should mlab.prctile(x,50) == np.median(x)?

2009-12-15 Thread Andrew Straw
The following (uncommitted) test currently fails. The reason is that
mlab.prctile(x,50) doesn't handle even-length sequences according to the
NumPy and Wikipedia convention for the definition of the median (the mean
of the two middle values). Do we agree that it should pass?

Not only would I commit the test, but I also have a fix to make it pass, 
derived from scipy.stats.scoreatpercentile().

This would affect boxplot, and possibly other functions.
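For comparison, here is a minimal sketch of the linear-interpolation convention, which reproduces np.median at p=50 for both odd and even lengths. The name `prctile_interp` is hypothetical. This is not the scipy-derived fix mentioned above, only an illustration of the convention it targets:

```python
import numpy as np

def prctile_interp(x, p):
    """Hypothetical helper: percentile(s) of x, interpolating linearly
    between sorted values (agrees with np.median when p == 50)."""
    values = np.sort(np.asarray(x, dtype=float).ravel())
    p = np.asarray(p, dtype=float)
    # fractional rank of each requested percentile, from 0 to len(values)-1
    rank = p / 100.0 * (len(values) - 1)
    lo = np.floor(rank).astype(int)
    hi = np.ceil(rank).astype(int)
    frac = rank - lo
    # weighted average of the two neighbouring order statistics
    return values[lo] + frac * (values[hi] - values[lo])

# even length: the 50th percentile is the mean of the two middle values
for x in ([1, 2, 3], [1, 2, 3, 4]):
    assert prctile_interp(x, 50) == np.median(x)
```

With this convention, prctile_interp([1, 2, 3, 4], 50) gives 2.5 rather than one of the two middle elements.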

import numpy as np
import matplotlib.mlab as mlab

def test_prctile():
    # test odd lengths
    x = [1, 2, 3]
    assert mlab.prctile(x, 50) == np.median(x)

    # test even lengths
    x = [1, 2, 3, 4]
    assert mlab.prctile(x, 50) == np.median(x)

    # derived from email sent by jason-sage to MPL-user on 20090914
    ob1 = [1, 1, 2, 2, 1, 2, 4, 3, 2, 2, 2, 3, 4, 5, 6, 7, 8, 9, 7, 6, 4, 5, 5]
    p = [75]
    expected = [5.5]

    # test vectorized
    actual = mlab.prctile(ob1, p)
    assert np.allclose(expected, actual)

    # test scalar
    for pi, expectedi in zip(p, expected):
        actuali = mlab.prctile(ob1, pi)
        assert np.allclose(expectedi, actuali)

--
This SF.Net email is sponsored by the Verizon Developer Community
Take advantage of Verizon's best-in-class app development support
A streamlined, 14 day to market process makes app distribution fast and easy
Join now and get one step closer to millions of Verizon customers
http://p.sf.net/sfu/verizon-dev2dev 
___
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel


[matplotlib-devel] boxplot notch

2009-12-15 Thread Andrew Straw
Hi,

I've been reading about box plots and examining the source code for
boxplot() lately. While there doesn't seem to be a convention about what
the notch specifies, I can't find any justification for (or text
describing) what exactly the MPL notch is. The source code is:

   # get median and quartiles
   q1, med, q3 = mlab.prctile(d,[25,50,75])
   iq = q3 - q1

   notch_max = med + 1.57*iq/np.sqrt(row)
   notch_min = med - 1.57*iq/np.sqrt(row)

Is this code actually calculating a meaningful value? If so, what?

The original commit was r1098, which doesn't offer a useful comment
either (only "aaplied several sf patches"). Looking through the SF bug
tracker, I couldn't find anything relevant from before the commit date
of 2005-03-28.





Re: [matplotlib-devel] boxplot notch

2009-12-15 Thread Fernando Perez
On Tue, Dec 15, 2009 at 9:57 AM, Andrew Straw  wrote:
>
>   notch_max = med + 1.57*iq/np.sqrt(row)
>   notch_min = med - 1.57*iq/np.sqrt(row)
>
> Is this code actually calculating a meaningful value? If so, what?
>

From the statistics ignoramus in the room, so take this with a grain
of salt... I'd write that code as

notch_max = med + (iq/2) * (np.pi/np.sqrt(row))

and it makes more sense. The notch limits are an estimate of an
interval around the median: each half-width (up and down) is half the
q3-q1 range times a normalization factor of pi/sqrt(n), where
n == row == len(d). The 1/sqrt(n) makes some sense, as it's the usual
statistical error normalization factor. The multiplication by pi I'm
less sure about, and I can't find that exact formula in any quick stats
reference, but I'm sure someone who actually knows stats can point out
where it comes from.
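The rearrangement above relies only on 1.57 being π/2 to two decimal places (π/2 = 1.5708), so the two forms agree to within about 0.05%:

```latex
\mathrm{notch} = \mathrm{med} \pm 1.57\,\frac{q_3 - q_1}{\sqrt{n}}
\;\approx\; \mathrm{med} \pm \frac{q_3 - q_1}{2}\cdot\frac{\pi}{\sqrt{n}},
\qquad n = \texttt{row} = \texttt{len(d)}
```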

Note that the code that follows in boxplot() does:

   if notch_max > q3:
       notch_max = q3
   if notch_min < q1:
       notch_min = q1

though matlab explicitly states in:

http://www.mathworks.com/access/helpdesk/help/toolbox/stats/boxplot.html

that

"""
Interval endpoints are the extremes of the notches or the centers of
the triangular markers. When the sample size is small, notches may
extend beyond the end of the box.
"""

So it seems to me that the more principled thing to do would be to
leave those notch markers outside the box if they land there, because
that's a warning about the robustness of the estimate. Clipping them to
q1/q3 effectively hides a problem...
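A small sketch makes the difference concrete. The helper `notch_interval` is hypothetical. It uses np.percentile for the quartiles instead of mlab.prctile, which is an assumption, and it keeps the 1.57 constant from the boxplot() snippet. With clip=False the notch is allowed to leave the box:

```python
import numpy as np

def notch_interval(d, clip=False):
    """Hypothetical helper: median notch bounds as in the boxplot() snippet.

    Quartiles come from np.percentile (an assumption; the quoted source
    uses mlab.prctile). clip=True mimics the current clipping to [q1, q3].
    """
    d = np.asarray(d, dtype=float)
    q1, med, q3 = np.percentile(d, [25, 50, 75])
    half_width = 1.57 * (q3 - q1) / np.sqrt(len(d))
    notch_min, notch_max = med - half_width, med + half_width
    if clip:
        # current behaviour: never let the notch extend past the box
        notch_min, notch_max = max(notch_min, q1), min(notch_max, q3)
    return notch_min, notch_max

# small sample: box is [2, 4], but the unclipped notch spans roughly [1.60, 4.40]
print(notch_interval([1, 2, 3, 4, 10]))
print(notch_interval([1, 2, 3, 4, 10], clip=True))
```

With only five points the unclipped notch extends past both quartiles. This is the small-sample warning that clipping would hide.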


cheers,

f



[matplotlib-devel] imshow without resampling in the ps backend.

2009-12-15 Thread Jae-Joon Lee
A patch that enables drawing images in the ps backend without resampling
has been committed in r8035, so please test it if you're interested.

The raw image is used only when interpolation=="nearest" and there is
only one image.
While extending this to other backends such as pdf and svg should be
straightforward, I'd like to hear what others think about the overall
approach, e.g., API changes and such. The current approach is to
minimize changes to the backends.

There are a few minor issues whose solutions are not clear to me.

* If there are multiple images (and the ps backend is used), the
images are resampled as they are composited into a single image.

* The current solution never resamples when interpolation=="nearest";
I think this is generally good behavior. However, in the extreme case
of a very high-resolution image (e.g., image dpi > 1000?), it might be
better to resample (i.e., downsample) the image.

One option would be to introduce a new "resample" keyword in the
imshow command (which would become an attribute of the image).
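The rule described above, together with the proposed keyword, can be sketched as a small predicate. Everything here is hypothetical: `ImageInfo` is a stand-in rather than a matplotlib class, and `use_raw_image` is an assumed reading of the described behaviour, not the code committed in r8035:

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class ImageInfo:
    """Hypothetical stand-in for an image artist (not a matplotlib class)."""
    interpolation: str
    resample: Optional[bool] = None  # the proposed keyword; None means "not set"

def use_raw_image(images: Sequence[ImageInfo]) -> bool:
    """Assumed rule: emit the raw, unresampled image only for a single
    image drawn with nearest interpolation, unless it asks to be resampled."""
    if len(images) != 1:
        # multiple images are composited into one, and so are resampled
        return False
    img = images[0]
    if img.resample:
        # e.g. a very high-dpi image that the user wants downsampled
        return False
    return img.interpolation == "nearest"
```

Keeping the default `resample=None` preserves the current "never resample for nearest" behaviour unless the user opts in.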

Regards,

-JJ
