[Matplotlib-users] Histogram without binning

2011-01-08 Thread OKB (not okblacke)
A straightforward question: Is there a simple way to get a 
histogram that does not bin any values together at all, but simply 
creates one bar for each distinct value in the dataset?
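One way to get this is to skip hist() and count the distinct values yourself, then draw them with a bar chart. Below is a minimal sketch of that idea. It assumes the data are hashable and sortable (e.g. plain numbers), and the helper names distinct_value_counts and plot_unbinned_histogram are hypothetical. It relies only on collections.Counter and the standard Axes.bar / set_xticks / set_xticklabels calls.

```python
from collections import Counter


def distinct_value_counts(data):
    """Return (values, counts) with one entry per distinct value, sorted by value.

    Hypothetical helper; assumes the values are hashable and mutually sortable.
    """
    counts = Counter(data)
    values = sorted(counts)
    return values, [counts[v] for v in values]


def plot_unbinned_histogram(data, ax=None):
    """Draw one bar per distinct value, with no binning at all (hypothetical helper)."""
    # Imported lazily so distinct_value_counts() can be used without matplotlib.
    import matplotlib.pyplot as plt

    values, counts = distinct_value_counts(data)
    if ax is None:
        ax = plt.gca()
    # Bars sit at evenly spaced positions 0..n-1 so they never overlap,
    # and each tick is labeled with the distinct value it stands for.
    positions = list(range(len(values)))
    ax.bar(positions, counts)
    ax.set_xticks(positions)
    ax.set_xticklabels([str(v) for v in values])
    ax.set_ylabel("count")
    return ax
```

For example, calling plot_unbinned_histogram([3, 1, 3, 2, 3, 1]) and then plt.show() should draw three bars labeled 1, 2 and 3. Passing the values themselves as the x positions to ax.bar also works, but bars can overlap when distinct values are closer together than the bar width.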

-- 
--OKB (not okblacke)
Brendan Barnwell
Do not follow where the path may lead.  Go, instead, where there is
no path, and leave a trail.
--author unknown


___
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users


[Matplotlib-users] Bug in boxplot/mlab.prctile

2011-01-01 Thread OKB (not okblacke)
I noticed that the boxplot function calculates the location of the 
median line in each box incorrectly.  As a simple example, plotting 
the dataset [1, 2, 3, 4] places the median line at 3 instead of 2.5.

It also seems that the quartile calculations for the box are a 
little peculiar.  I have seen some discussion in old mailing list 
postings about mlab.prctile and its way of calculating percentiles, 
which differs from that of some other software.

I'm aware that there is legitimate disagreement about the best 
way to calculate quartiles.  However, it seems to me that mlab's method 
still doesn't match any of these possibly-correct methods, because it 
uses int() or ndarray.astype(int) to coerce the percentile result to an 
integer index.  This TRUNCATES the floating-point result.  No accepted 
quantile-calculating method that I'm aware of does this.  Methods that 
coerce to an integer index at all (in order to produce a quantile value 
that is an element of the data set) ROUND instead of truncating, or in 
some cases round uniformly up for the lower quartile and down for the 
upper.  You can see a summary of different methods at 
http://www.amstat.org/publications/jse/v14n3/langford.html ; the method 
used by mlab does not appear to agree with any of these.
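To illustrate why truncating and rounding a fractional index give different quartiles, here is a small sketch with hypothetical helpers. It ASSUMES the 0-based fractional position p/100 * (n - 1). This is one common convention and is not claimed to be mlab.prctile's exact formula. The sketch compares truncation, round-half-up, and linear interpolation between neighbouring values.

```python
import math


def percentile_position(n, p):
    # ASSUMED convention: 0-based fractional position p/100 * (n - 1).
    return p / 100 * (n - 1)


def percentile_truncate(sorted_data, p):
    # Coerce the fractional position to an index by truncation (int() drops the fraction).
    return sorted_data[int(percentile_position(len(sorted_data), p))]


def percentile_round(sorted_data, p):
    # Round half up; math.floor(h + 0.5) avoids Python's round-half-to-even behaviour.
    return sorted_data[math.floor(percentile_position(len(sorted_data), p) + 0.5)]


def percentile_interpolate(sorted_data, p):
    # Linear interpolation between the two neighbouring order statistics.
    h = percentile_position(len(sorted_data), p)
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_data) - 1)
    return sorted_data[lo] + (h - lo) * (sorted_data[hi] - sorted_data[lo])


data = [1, 2, 3, 4, 5, 6, 7]  # already sorted
for p in (25, 50, 75):
    print(p, percentile_truncate(data, p), percentile_round(data, p),
          percentile_interpolate(data, p))
```

With these seven values the lower quartile position is 1.5. Truncation returns 2, rounding returns 3, and interpolation returns 2.5. The rounded and interpolated results agree with common textbook rules, while the truncated one is always biased toward the lower neighbour.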

I would suggest that mlab.prctile be fixed to conform to one 
or another of these methods, rather than adding to the proliferation of 
approaches to quantile calculation.  Is there any motivation for always 
truncating to an integer (other than that it's quicker to type :-)?

Also, regardless of these quartile issues, as far as I'm aware no 
one disputes that the median of a (sorted) data set with an even 
number of values is the mean of the two middle values.  Since numpy 
is already a dependency of matplotlib, boxplot shouldn't use 
mlab.prctile at all to decide where to plot the median line; it 
should just use numpy.median.
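The median discrepancy can be reproduced in a few lines. The truncated-index rule below, int(0.5 * n), is an ASSUMPTION chosen because it reproduces the reported result of 3 for [1, 2, 3, 4]. It is not necessarily mlab.prctile's exact code. To keep the sketch dependency-free, it uses the standard library's statistics.median, which applies the same mean-of-the-two-middle-values rule as numpy.median for even-length data.

```python
import statistics


def truncated_index_median(sorted_data):
    # Hypothetical helper. ASSUMED index rule int(0.5 * n), used only to
    # illustrate truncation; not guaranteed to match mlab.prctile exactly.
    return sorted_data[int(0.5 * len(sorted_data))]


data = [1, 2, 3, 4]
print(truncated_index_median(data))  # picks a single element of the data
print(statistics.median(data))       # mean of the two middle values
```

For this data set, the truncated-index rule picks data[2] == 3, while statistics.median (like numpy.median) returns (2 + 3) / 2 == 2.5.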

Thanks,
-- 
--OKB (not okblacke)
Brendan Barnwell

