The implementation is intentionally an approximation that uses
constant memory. An exact answer would require tracking the entire
data set. You should find that the estimates converge toward the
expected values as you add more data.
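
For comparison, here is a minimal sketch of the exact approach, which
keeps every sample and sorts it. It is not Mahout code: the class name
ExactQuartiles and the quantile() helper are made up for illustration,
and linear interpolation between order statistics is only one of
several common quantile conventions.

```java
import java.util.Arrays;

// Exact baseline: stores every sample, so memory grows with the input.
// Hypothetical helper for illustration; not part of Mahout.
final class ExactQuartiles {

  // Quantile by linear interpolation between the two nearest order
  // statistics. p must be in [0, 1].
  static double quantile(double[] data, double p) {
    if (data.length == 0) {
      throw new IllegalArgumentException("no data");
    }
    if (p < 0.0 || p > 1.0) {
      throw new IllegalArgumentException("p must be in [0, 1]");
    }
    double[] sorted = data.clone();   // leave the caller's array untouched
    Arrays.sort(sorted);
    double pos = p * (sorted.length - 1);
    int lo = (int) Math.floor(pos);
    int hi = (int) Math.ceil(pos);
    return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
  }

  public static void main(String[] args) {
    // Same input as the quoted test: the numbers 0-99, added twice.
    double[] data = new double[200];
    for (int i = 0; i < 200; i++) {
      data[i] = i % 100;
    }
    System.out.println("25%=" + quantile(data, 0.25)
        + " median=" + quantile(data, 0.5)
        + " 75%=" + quantile(data, 0.75));
  }
}
```

Under this convention the quartiles of that input are 24.75, 49.5 and
74.25, which gives a reference point for judging how far the
constant-memory estimates are off at a given sample count.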

On Sun, Apr 17, 2011 at 7:53 AM, Lance Norskog <[email protected]> wrote:
> If you add the Java methods at the bottom to the
> org.apache.mahout.stats.OnlineSummarizer and run the main(), a funny
> thing prints out:
>
> [(count=200.0),(sd=28.8660),(mean=49.5000),(min=0.0),(25%=34.1312),(median=60.2104),(75%=83.8722),(max=99.0),]
>
> I added the numbers 0-99 twice to the summarizer. I would have
> expected 25%=25 +/- 1, median=50 +/- 1, and 75%=75 +/- 1.
> Note that the mean is correct.
> ---------------------------------------------------------------------------
>
>  @Override
>  public String toString() {
>    return "[" +
>      pair("count", getCount()) + pair("sd", getSD()) + pair("mean", getMean()) +
>      pair("min", getMin()) + pair("25%", getQuartile(1)) +
>      pair("median", getMedian()) +
>      pair("75%", getQuartile(3)) + pair("max", getMax()) + "]";
>  }
>
>  private String pair(String tag, double value) {
>    String s = Double.toString(value);
>    if (s.length() > 8)
>      s = s.substring(0, 7);
>    return "(" + tag + "=" + s + "),";
>  }
>
>  public static void main(String[] args) {
>    OnlineSummarizer osQ = new OnlineSummarizer();
>    for(int i = 0; i < 200; i++) {
>      osQ.add(i % 100);
>    }
>    System.out.println(osQ.toString());
>  }
>
> --
> Lance Norskog
> [email protected]
>
