[matplotlib-devel] Boxplots with Bootstrapped Intervals

2010-02-10 Thread PHobson
Hey folks,

I recently modified the Axes method boxplot so that the confidence intervals 
around the median are computed not with a static formula, but by bootstrapping 
the median as many times as the user specifies. Also, I commented out the lines 
that prevent the boxplots from folding around the hinges (but that's obviously 
minor and in the current SVN if I'm not mistaken).

Is this something that would be worth including in matplotlib? I've never 
contributed to a project like this before and my code is probably pretty sloppy 
by MPL standards. I'm not really sure what's appropriate to contribute and 
what's not.

Regards,
-paul h.


--
SOLARIS 10 is the OS for Data Centers - provides features such as DTrace,
Predictive Self Healing and Award Winning ZFS. Get Solaris 10 NOW
http://p.sf.net/sfu/solaris-dev2dev
___
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel


Re: [matplotlib-devel] Boxplots with Bootstrapped Intervals

2010-02-10 Thread PHobson
> phob...@geosyntec.com wrote:
> > Hey folks,
> >
> > I recently modified the Axes method boxplot so that the confidence
> > intervals around the median are computed not with a static formula, but by
> > bootstrapping the median as many times as the user specifies. Also, I
> > commented out the lines that prevent the boxplots from folding around the
> > hinges (but that's obviously minor and in the current SVN if I'm not
> > mistaken).
> >
> > Is this something that would be worth including in matplotlib? I've
> > never contributed to a project like this before and my code is probably
> > pretty sloppy by MPL standards. I'm not really sure what's appropriate to
> > contribute and what's not.
> >


> -----Original Message-----
> From: Andrew Straw [mailto:straw...@astraw.com]
> Sent: Wednesday, February 10, 2010 2:20 PM
> To: Paul Hobson
> Cc: matplotlib-devel@lists.sourceforge.net
> Subject: Re: [matplotlib-devel] Boxplots with Bootstrapped Intervals
> ...
> I think the best thing to do is to post the patch so that it can be
> reviewed. Sending the output of "svn diff" as an attachment to this
> email list would be easy from our end. (A github based submission --
> fork the repo and push your commits -- would also work well for me, but
> I'm not sure about the other MPL devs.)

Andrew,

Thanks for the reply. At the risk of embarrassment, I'm going to admit that I'm 
not at all familiar with SVN other than I know that it's version control 
software. Nonetheless I gave it a shot.

I guess I should add that I didn't account for the fact that the user might 
want to have the CIs output with the other boxplot properties. Shouldn't be too 
hard to add in though. Also, I'm using the percentile method -- meaning that 
after I get my "normal" distribution of medians, I simply use mlab's percentile 
function to get the 2.5th and 97.5th percentiles of that distribution. The other 
method (bias-corrected and accelerated) was too complex for me to code up 
quickly without using Rpy2, and that just seemed silly.
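
For readers following along, the percentile method described above can be sketched roughly as follows. This is not the attached patch; it is a minimal illustration that uses only the standard library instead of mlab. The names bootstrap_median_ci, n_boot, alpha, and seed are hypothetical and chosen here for illustration only.

```python
import math
import random
import statistics


def _percentile(sorted_vals, pct):
    """Linearly interpolated percentile of an already sorted list."""
    pos = (len(sorted_vals) - 1) * pct / 100.0
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_vals) - 1)
    frac = pos - lower
    return sorted_vals[lower] + frac * (sorted_vals[upper] - sorted_vals[lower])


def bootstrap_median_ci(data, n_boot=1000, alpha=0.05, seed=None):
    """Percentile-method bootstrap confidence interval for the median.

    Hypothetical helper: resample the data with replacement n_boot times,
    take the median of each resample, then read the interval off the
    alpha/2 and 1 - alpha/2 percentiles of those medians.
    """
    rng = random.Random(seed)
    n = len(data)
    # One median per bootstrap resample, sorted for the percentile lookup.
    medians = sorted(
        statistics.median(rng.choices(data, k=n)) for _ in range(n_boot)
    )
    low = _percentile(medians, 100.0 * alpha / 2.0)
    high = _percentile(medians, 100.0 * (1.0 - alpha / 2.0))
    return low, high
```

With the default alpha=0.05 this reads off the 2.5th and 97.5th percentiles of the resampled medians, which is the interval the notches would then be drawn from.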

Thanks again,
-paul


boxplot.patch
Description: boxplot.patch


[matplotlib-devel] Improvements to boxplots

2010-07-06 Thread PHobson
Looks like my evenings this week (after today) will be open. I was thinking 
about coding up a potentially major overhaul of axes.Axes.boxplot. Here's a 
rough outline of what I was thinking:

1) Improve the bootstrapping of the confidence intervals around the median
2) Add support for masked arrays (i.e., let user specify if masked values 
should be considered or not -- currently they are always considered, IIRC)
3) Improve the calculation of the percentiles to be consistent with SciPy and R.

#1 seems like something that'll be nice. #2 seems pretty essential to me. The 
third improvement is something for which I would want y'all's blessing before 
moving ahead. However, I think it's pretty critical. Compare the 25th and 75th 
percentiles in the output below:

import numpy as np
import matplotlib.mlab as mlab
import scipy.stats as stats

def comparePercentiles(x):
    # mlab.prctile defaults to the 0th, 25th, 50th, 75th, and 100th percentiles
    mlp = mlab.prctile(x)
    # Compute the same percentiles with scipy for a side-by-side comparison
    stp = np.array([stats.scoreatpercentile(x, p)
                    for p in (0.0, 25.0, 50.0, 75.0, 100.0)])
    outstring = """
    mlab \t scipy
    --------------
    %0.3f \t %0.3f (0th)
    %0.3f \t %0.3f (25th)
    %0.3f \t %0.3f (50th)
    %0.3f \t %0.3f (75th)
    %0.3f \t %0.3f (100th)
    """ % (mlp[0], stp[0], mlp[1], stp[1], mlp[2], stp[2],
           mlp[3], stp[3], mlp[4], stp[4])
    print(outstring)

>>> comparePercentiles(x)

mlab     scipy
--------------
-1.245   -1.245 (0th)
-0.950   -0.802 (25th)
-0.162   -0.162 (50th)
0.571    0.266 (75th)
1.067    1.067 (100th)

Copying and pasting the exact same data into R I get:
> quantile(x, probs=c(0.0, 0.25, 0.50, 0.75, 1.0))
        0%        25%        50%        75%       100%
-1.2448508 -0.8022337 -0.1617812  0.2661112  1.0666244


It seems clear that something needs to be done: mlab.prctile disagrees with 
both scipy and R at the 25th and 75th percentiles. AFAICT, scipy is not 
listed as a dependency of matplotlib, so it'll probably just be easier to 
retool mlab.prctile to return values that agree with scipy and R. What do you 
think? Would this be a welcome contribution?
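
To make the target behavior concrete, here is a rough, dependency-free sketch of the linear-interpolation rule that, as far as I can tell, both R's default quantile() (type 7) and scipy's scoreatpercentile follow. The function name percentile_linear is hypothetical, and this is not a proposed replacement for mlab.prctile, only an illustration of the rule.

```python
import math


def percentile_linear(values, pct):
    """Percentile by linear interpolation between the two nearest ranks.

    Hypothetical helper: the fractional position in the sorted data is
    (n - 1) * pct / 100, and the result is interpolated between the
    values on either side of that position.
    """
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lower = math.floor(pos)
    upper = min(lower + 1, len(ordered) - 1)  # clamp at the last element
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
quartiles = [percentile_linear(data, p) for p in (25, 50, 75)]
print(quartiles)  # [3.25, 5.5, 7.75]
```

For the ten values above, Python's own statistics.quantiles(data, n=4, method='inclusive') returns the same three quartiles, which makes for a handy cross-check that does not need scipy.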

Thanks,
-Paul Hobson

