Increasing the sample count from 200 to 2000 and beyond drives the 25%/median/75% numbers toward 25/50/75.
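As a rough illustration of why a streaming estimate can look poor at 200 samples and improve with more data, here is a minimal sketch of a generic stochastic-approximation (Robbins-Monro style) quantile estimator. This is a swapped-in technique, not Mahout's OnlineSummarizer algorithm. The class name `StreamingQuantileSketch`, the step constant `c = 200` and the choice to start from the first sample are all assumptions made for this example. Each sample moves the estimate down by `eta * (1 - p)` when it falls below the current estimate, and up by `eta * p` otherwise. The step `eta = c / n` shrinks as the count grows, so later samples refine the estimate instead of pushing it around.

```java
/**
 * Hypothetical example (not Mahout's algorithm): Robbins-Monro style
 * streaming quantile estimate with a step size that shrinks like 1/n.
 */
public class StreamingQuantileSketch {

    private final double p;   // target quantile, e.g. 0.5 for the median
    private final double c;   // step-size constant (assumed tuning value)
    private double estimate;  // current quantile estimate
    private long count;       // number of samples seen so far

    public StreamingQuantileSketch(double p, double c) {
        this.p = p;
        this.c = c;
    }

    public void add(double x) {
        count++;
        if (count == 1) {
            estimate = x;  // assumption: start from the first sample
            return;
        }
        double eta = c / count;  // step size shrinks like 1/n
        // Down by eta*(1-p) if x is below the estimate, otherwise up by eta*p.
        estimate += eta * (p - (x < estimate ? 1.0 : 0.0));
    }

    public double estimate() {
        return estimate;
    }

    /** Feeds i % 100 for i in [0, n), as in the quoted test, and returns {q25, median, q75}. */
    public static double[] quartilesAfter(int n) {
        StreamingQuantileSketch q1 = new StreamingQuantileSketch(0.25, 200.0);
        StreamingQuantileSketch q2 = new StreamingQuantileSketch(0.50, 200.0);
        StreamingQuantileSketch q3 = new StreamingQuantileSketch(0.75, 200.0);
        for (int i = 0; i < n; i++) {
            double x = i % 100;
            q1.add(x);
            q2.add(x);
            q3.add(x);
        }
        return new double[] {q1.estimate(), q2.estimate(), q3.estimate()};
    }

    public static void main(String[] args) {
        for (int n : new int[] {200, 2_000, 20_000, 1_000_000}) {
            double[] q = quartilesAfter(n);
            System.out.printf("n=%d 25%%=%.4f median=%.4f 75%%=%.4f%n", n, q[0], q[1], q[2]);
        }
    }
}
```

With this step size, the estimates for large n should settle within about one unit of 25/50/75. A fixed step would instead keep oscillating with each 0-99 cycle of the input.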
On Sat, Apr 16, 2011 at 11:53 PM, Lance Norskog <[email protected]> wrote:
> If you add the Java methods at the bottom to the
> org.apache.mahout.stats.OnlineSummarizer and run the main(), a funny
> thing prints out:
>
> [(count=200.0),(sd=28.8660),(mean=49.5000),(min=0.0),(25%=34.1312),(median=60.2104),(75%=83.8722),(max=99.0),]
>
> I added the numbers 0-99 twice to the summarizer. I would have
> expected the 25%=25 +/- 1, median=50 +/- 1, and 75%=75 +/- 1
> Note that the mean is correct.
> ---------------------------------------------------------------------------
>
>     @Override
>     public String toString() {
>         return "[" +
>             pair("count", getCount()) + pair("sd", getSD()) + pair("mean", getMean()) +
>             pair("min", getMin()) + pair("25%", getQuartile(1)) +
>             pair("median", getMedian()) +
>             pair("75%", getQuartile(3)) + pair("max", getMax()) + "]";
>     }
>
>     private String pair(String tag, double value) {
>         String s = Double.toString(value);
>         if (s.length() > 8)
>             s = s.substring(0, 7);
>         return "(" + tag + "=" + s + "),";
>     }
>
>     public static void main(String[] args) {
>         OnlineSummarizer osQ = new OnlineSummarizer();
>         for (int i = 0; i < 200; i++) {
>             osQ.add(i % 100);
>         }
>         System.out.println(osQ.toString());
>     }
>
> --
> Lance Norskog
> [email protected]
>

--
Lance Norskog
[email protected]
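For comparison, you can compute the exact quartiles of the 200 samples from the quoted test (0-99 added twice) offline by sorting. The helper below is hypothetical and not part of Mahout. It uses linear interpolation between order statistics at position `p * (n - 1)`, which is only one of several common quantile conventions. On this input it gives 24.75, 49.5 and 74.25, all within the +/- 1 the quoted message expected.

```java
import java.util.Arrays;

/** Hypothetical helper: exact quantiles of a finite sample, for reference. */
public class ExactQuartiles {

    /** Linear interpolation between order statistics at position p * (n - 1). */
    public static double quantile(double[] sorted, double p) {
        double pos = p * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = Math.min(lo + 1, sorted.length - 1);
        double frac = pos - lo;
        return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
    }

    /** Builds the quoted test input (i % 100 for i in [0, n)) and sorts it. */
    public static double[] sortedSample(int n) {
        double[] data = new double[n];
        for (int i = 0; i < n; i++) {
            data[i] = i % 100;
        }
        Arrays.sort(data);
        return data;
    }

    public static void main(String[] args) {
        double[] data = sortedSample(200);
        System.out.printf("25%%=%.4f median=%.4f 75%%=%.4f%n",
                quantile(data, 0.25), quantile(data, 0.5), quantile(data, 0.75));
    }
}
```

Sorting needs all samples in memory, which is exactly what an online summarizer avoids. It is useful here only as a ground truth to check streaming estimates against.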
