If you add the Java methods at the bottom to
org.apache.mahout.stats.OnlineSummarizer and run main(), something
odd prints out:
[(count=200.0),(sd=28.8660),(mean=49.5000),(min=0.0),(25%=34.1312),(median=60.2104),(75%=83.8722),(max=99.0),]
I added the numbers 0-99 twice to the summarizer. I would have
expected 25%=25 +/- 1, median=50 +/- 1, and 75%=75 +/- 1, but the
reported quartiles are all well above those values.
Note that the mean is correct.
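
For reference, here is a minimal sketch of what exact quartiles look like for this data when computed by sorting instead of by the online estimate. ExactQuartiles, sample() and quantile() are hypothetical names, not part of Mahout, and linear interpolation between order statistics is an assumed convention; other quantile definitions shift the values slightly.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Mahout: exact quartiles by sorting.
public class ExactQuartiles {

  // Quantile q in [0, 1] of an already sorted array, using linear
  // interpolation between the two nearest order statistics (an assumed
  // convention; other definitions exist).
  public static double quantile(double[] sorted, double q) {
    double pos = q * (sorted.length - 1);
    int lo = (int) Math.floor(pos);
    int hi = (int) Math.ceil(pos);
    double frac = pos - lo;
    return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
  }

  // The same input as in the report: the numbers 0-99, added twice.
  public static double[] sample() {
    double[] data = new double[200];
    for (int i = 0; i < 200; i++) {
      data[i] = i % 100;
    }
    Arrays.sort(data);
    return data;
  }

  public static void main(String[] args) {
    double[] data = sample();
    System.out.println("25%=" + quantile(data, 0.25)
        + " median=" + quantile(data, 0.5)
        + " 75%=" + quantile(data, 0.75));
  }
}
```

With this convention the program prints 25%=24.75 median=49.5 75%=74.25, all inside the ranges I expected.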
---------------------------------------------------------------------------
@Override
public String toString() {
  return "[" +
      pair("count", getCount()) + pair("sd", getSD()) + pair("mean", getMean()) +
      pair("min", getMin()) + pair("25%", getQuartile(1)) +
      pair("median", getMedian()) +
      pair("75%", getQuartile(3)) + pair("max", getMax()) + "]";
}

private String pair(String tag, double value) {
  String s = Double.toString(value);
  if (s.length() > 8) {
    s = s.substring(0, 7);
  }
  return "(" + tag + "=" + s + "),";
}

public static void main(String[] args) {
  OnlineSummarizer osQ = new OnlineSummarizer();
  for (int i = 0; i < 200; i++) {
    osQ.add(i % 100);
  }
  System.out.println(osQ.toString());
}
--
Lance Norskog