[matplotlib-devel] Boxplots with Bootstrapped Intervals
Hey folks,

I recently modified the Axes method boxplot so that the confidence intervals around the median are no longer computed with a static formula, but by bootstrapping the median as many times as the user specifies. Also, I commented out the lines that prevent the boxplots from folding around the hinges (but that's obviously minor and already in the current SVN, if I'm not mistaken).

Is this something that would be worth including in matplotlib? I've never contributed to a project like this before, and my code is probably pretty sloppy by MPL standards. I'm not really sure what's appropriate to contribute and what's not.

Regards,
-paul h.

___
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel
Re: [matplotlib-devel] Boxplots with Bootstrapped Intervals
> phob...@geosyntec.com wrote:
> > Hey folks,
> >
> > I recently modified the Axes method boxplot so that the confidence
> > intervals around the median are computed not with a static formula, but by
> > bootstrapping the median as many times as the user specifies. Also, I
> > commented out the lines that prevent the boxplots from folding around the
> > hinges (but that's obviously minor and in the current SVN if I'm not
> > mistaken).
> >
> > Is this something that would be worth including in matplotlib? I've
> > never contributed to a project like this before and my code is probably
> > pretty sloppy by MPL standards. I'm not really sure what's appropriate to
> > contribute and what's not.
>
> -----Original Message-----
> From: Andrew Straw [mailto:straw...@astraw.com]
> Sent: Wednesday, February 10, 2010 2:20 PM
> To: Paul Hobson
> Cc: matplotlib-devel@lists.sourceforge.net
> Subject: Re: [matplotlib-devel] Boxplots with Bootstrapped Intervals
> ...
> I think the best thing to do is to post the patch so that it can be
> reviewed. Sending the output of "svn diff" as an attachment to this
> email list would be easy from our end. (A github based submission --
> fork the repo and push your commits -- would also work well for me, but
> I'm not sure about the other MPL devs.)

Andrew,

Thanks for the reply. At the risk of embarrassment, I'm going to admit that I'm not at all familiar with SVN beyond knowing that it's version control software. Nonetheless, I gave it a shot.

I should add that I didn't account for the fact that the user might want the CIs returned along with the other boxplot properties. That shouldn't be too hard to add, though.

Also, I'm using the percentile method -- meaning that after I get my "normal" distribution of bootstrapped medians, I simply use mlab's percentile function (prctile) to get the 2.5th and 97.5th percentiles of that distribution. The other method (bias-corrected and accelerated) was too complex for me to code up quickly without using Rpy2, and that just seemed silly.
Thanks again,
-paul

[Attachment: boxplot.patch]
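[Editor's illustration] The percentile-method bootstrap described in the message above can be sketched roughly as follows. This is not the contents of boxplot.patch; the function name bootstrap_median_ci and its parameters (n_boot, alpha, seed) are hypothetical, and NumPy's np.percentile stands in for mlab's percentile function.

```python
import numpy as np

def bootstrap_median_ci(data, n_boot=1000, alpha=0.05, seed=None):
    """Percentile-method bootstrap CI for the median (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Draw n_boot resamples with replacement; each row is one bootstrap replicate.
    idx = rng.integers(0, data.size, size=(n_boot, data.size))
    medians = np.median(data[idx], axis=1)
    # Percentile method: read the interval directly off the distribution of
    # bootstrapped medians (2.5th and 97.5th percentiles when alpha = 0.05).
    lo, hi = np.percentile(medians, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Example with made-up data (not the data from the thread).
sample = np.random.default_rng(0).normal(size=200)
lo, hi = bootstrap_median_ci(sample, n_boot=2000, seed=1)
print("95%% CI for the median: [%0.3f, %0.3f]" % (lo, hi))
```

The interval here is taken straight from the resampled medians, which is what makes this the simpler "percentile" variant; the bias-corrected and accelerated method mentioned above would adjust those two percentile levels before looking them up.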
[matplotlib-devel] Improvements to boxplots
Looks like my evenings this week (after today) will be open. I was thinking about coding up a potentially major overhaul of axes.Axes.boxplot. Here's a rough outline of what I was thinking:

1) Improve the bootstrapping of the confidence intervals around the median
2) Add support for masked arrays (i.e., let the user specify whether masked values should be considered or not -- currently they are always considered, IIRC)
3) Improve the calculation of the percentiles to be consistent with SciPy and R.

#1 seems like something that'll be nice. #2 seems pretty essential to me. The third improvement is something for which I would want y'all's blessing before moving ahead. However, I think it's pretty critical. See the 25th and 75th percentiles below:

    import numpy as np
    import matplotlib.mlab as mlab
    import scipy.stats as stats

    def comparePercentiles(x):
        # x: 1-D array of sample data (values not included in this message)
        # mlab.prctile defaults to the 0th, 25th, 50th, 75th and 100th percentiles
        mlp = mlab.prctile(x)

        # Same percentiles, one at a time, from scipy
        stp = np.array([])
        for p in (0.0, 25.0, 50.0, 75.0, 100.0):
            stp = np.hstack([stp, stats.scoreatpercentile(x, p)])

        outstring = """
        mlab \t scipy
        --------------
        %0.3f \t %0.3f (0th)
        %0.3f \t %0.3f (25th)
        %0.3f \t %0.3f (50th)
        %0.3f \t %0.3f (75th)
        %0.3f \t %0.3f (100th)
        """ % (mlp[0], stp[0], mlp[1], stp[1], mlp[2], stp[2],
               mlp[3], stp[3], mlp[4], stp[4])
        print(outstring)

    >>> comparePercentiles(x)
        mlab     scipy
        --------------
        -1.245   -1.245 (0th)
        -0.950   -0.802 (25th)
        -0.162   -0.162 (50th)
         0.571    0.266 (75th)
         1.067    1.067 (100th)

Copying and pasting the exact same data into R I get:

    > quantile(x, probs=c(0.0, 0.25, 0.50, 0.75, 1.0))
            0%        25%        50%        75%       100%
    -1.2448508 -0.8022337 -0.1617812  0.2661112  1.0666244

It seems clear that something needs to be done. AFAICT, scipy is not listed as a dependency of matplotlib, so it'll probably be easier to retool mlab.prctile to return values that agree with scipy and R.

What do you think? Would this be a welcome contribution?

Thanks,
-Paul Hobson
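[Editor's illustration] One way to get percentiles that match R's default quantile() output is linear interpolation between order statistics (R's "type 7" rule), which is also what np.percentile does by default. The sketch below is a hypothetical helper, not a proposed replacement for mlab.prctile, and it uses made-up data because the sample from the message above was not posted.

```python
import numpy as np

def percentile_linear(x, p):
    """p-th percentile (0-100) by linear interpolation between order statistics.

    Illustrative sketch of the rule used by R's default quantile() (type 7).
    """
    xs = np.sort(np.asarray(x, dtype=float))
    # Fractional 0-based position of the requested percentile in the sorted data.
    pos = (xs.size - 1) * p / 100.0
    lower = int(np.floor(pos))
    upper = min(lower + 1, xs.size - 1)
    frac = pos - lower
    # Interpolate between the two neighbouring order statistics.
    return xs[lower] + frac * (xs[upper] - xs[lower])

# Made-up data, compared against NumPy's default (linear) percentile.
x = np.array([3.0, 1.0, 4.0, 1.5, 9.0, 2.6])
for p in (0, 25, 50, 75, 100):
    print("%3d  %0.3f  %0.3f" % (p, percentile_linear(x, p), np.percentile(x, p)))
```

The 0th, 50th and 100th percentiles are where the two columns in the message already agree; the interpolation rule matters most for percentiles such as the 25th and 75th that fall between data points, which are the ones a boxplot uses for its hinges.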