Sam Mason <s...@samason.me.uk> wrote:

> All we're saying is that we're less than 90% confident that there's
> something "significant" going on. All the fiddling with standard
> deviations and sample sizes is just easiest way (that I know of)
> that statistics currently gives us of determining this more formally
> than a hand-wavy "it looks OK to me". Science tells us that humans
> are liable to say things are OK when they're not, as well as vice
> versa; statistics gives us a way to work past these limitations in
> some common and useful situations.

Following up, I took the advice offered in the referenced article,
and used a spreadsheet with a TDIST function for more accurate
results than available through the table included in the article.
That allows what I think is a more meaningful number: the
probability that taking a sample that big would have resulted in a
t-statistic larger than was actually achieved if there was no real
difference.

With the 20 samples from that last round of tests, the answer
(rounded to the nearest percent) is 60%, so "probably noise" is a
good summary. Combined with the 12 samples from earlier comparable
runs with the prior version of the patch, it goes to a 90%
probability that noise would generate a difference at least that
large, so I think we've gotten to "almost certainly noise". :-)

To me, that seems more valuable for this situation than saying "we
haven't reached 90% confidence that it's a real difference." I used
the same calculations up through the t-statistic.

The one question I have left for this technique is why you went
with

  ((avg1 - avg2) / (stddev * sqrt(2/samples)))

instead of

  ((avg1 - avg2) / (stddev / sqrt(samples)))

I assume that it's because the baseline was a set of samples rather
than a fixed mark, but I couldn't pick out a specific justification
for this in the literature (although I might have just missed it),
so I'd feel more comfy if you could clarify.
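For concreteness, here is a minimal Python sketch of the two candidate
denominators and of a TDIST-style two-tailed probability. It is only an
illustration under stated assumptions, not the calculation anyone in
this thread actually ran. The function names, the input numbers, and the
choice of 2 * samples - 2 degrees of freedom are all hypothetical. The
tail probability is obtained by numerically integrating the Student's t
density with Simpson's rule, which is a stand-in for whatever the
spreadsheet does internally. One observation that is consistent with the
guess above: if both averages come from `samples` independent runs with
a common standard deviation, each average has variance stddev^2/samples,
so their difference has variance 2 * stddev^2/samples.

```python
import math


def t_two_sample(avg1, avg2, stddev, samples):
    """t-statistic when both averages are estimated from `samples` runs each."""
    return (avg1 - avg2) / (stddev * math.sqrt(2.0 / samples))


def t_one_sample(avg, fixed_mark, stddev, samples):
    """t-statistic when one average is compared against a fixed mark."""
    return (avg - fixed_mark) / (stddev / math.sqrt(samples))


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2.0 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|) under the null hypothesis, via Simpson's rule.

    `steps` must be even. This plays the role of a two-tailed TDIST call.
    """
    t = abs(t)
    if t == 0.0:
        return 1.0
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3.0          # integral of the density over [0, t]
    return max(0.0, 1.0 - 2.0 * central)


# Hypothetical inputs, made up purely for illustration.
samples = 20
t = t_two_sample(avg1=101.0, avg2=100.0, stddev=5.0, samples=samples)
df = 2 * samples - 2                   # assumption: equal-size two-sample test
print("t-statistic:", t)
print("probability noise alone gives |t| this large:", two_tailed_p(t, df))
```

For the same inputs the one-sample form yields a t-statistic larger by a
factor of sqrt(2), so it would make the same difference look more
significant than the two-sample form does.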
Given the convenience of capturing benchmarking data in a database,
has anyone tackled implementation of something like the spreadsheet
TDIST function within PostgreSQL?

-Kevin