On Fri, Aug 07, 2009 at 10:19:20AM -0500, Kevin Grittner wrote:
> Sam Mason <s...@samason.me.uk> wrote: 
>  
> > What do people do when testing this?  I think I'd look to something
> > like Student's t-test to check for statistical significance.  My
> > working would go something like:
> > 
> >   I assume the variance is the same because it's being tested on the
> >   same machine.
> > 
> >   samples = 20
> >   stddev  = 144.26
> >   avg1    = 4783.13
> >   avg2    = 4758.46
> >   t       = 0.54  ((avg1 - avg2) / (stddev * sqrt(2/samples)))
> > 
> > We then have to choose how certain we want to be that they're
> > actually different, 90% is a reasonably easy level to hit (i.e. one
> > part in ten, with 95% being more commonly quoted).  For 20 samples
> > we have 19 degrees of freedom--giving us a cut-off[1] of 1.328. 
> > 0.54 is obviously well below this allowing us to say that there's no
> > "statistical significance" between the two samples at a 90% level.
>  
> Thanks for the link; that looks useful.  To confirm that I understand
> what this has established (or get a bit of help putting it in
> perspective), what this says to me, in the least technical jargon I
> can muster, is "With this many samples and this degree of standard
> deviation, the average difference is not large enough to have a 90%
> confidence level that the difference is significant."  In fact,
> looking at the chart, it isn't enough to reach a 75% confidence level
> that the difference is significant.  Significance here would seem to
> mean that at least the given percentage of the time, picking this many
> samples from an infinite set with an average difference that really
> was this big or bigger would generate a value for t this big or
> bigger.
>  
> Am I close?

Yes, all that sounds as though you've got it.  Note that running the
test more times will tend to shrink the standard error (the
stddev * sqrt(2/samples) term) and add degrees of freedom, so the
difference may well become significant.  In this case it's unlikely to
affect things much though.
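
If it helps to reproduce the arithmetic, here's a rough Python sketch
(this assumes SciPy is installed, and just plugs in the sample
statistics quoted above, using the same df = 19 as the working):

    from math import sqrt
    from scipy import stats

    # sample statistics quoted above
    samples = 20
    stddev  = 144.26
    avg1    = 4783.13
    avg2    = 4758.46

    # same formula as before: (avg1 - avg2) / (stddev * sqrt(2/samples))
    t = (avg1 - avg2) / (stddev * sqrt(2.0 / samples))

    # one-tailed 90% cut-off for 19 degrees of freedom
    cutoff = stats.t.ppf(0.90, df=samples - 1)

    print("t = %.2f, 90%% cut-off = %.3f" % (t, cutoff))
    # prints: t = 0.54, 90% cut-off = 1.328

which agrees with the figures above: 0.54 is well below 1.328.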

> I like to be clear, because it's easy to get confused and take the
> above to mean that there's a 90% confidence that there is no actual
> significant difference in performance based on that sampling.  (Given
> Tom's assurance that this version of the patch should have similar
> performance to the last, and the samples from the prior patch went the
> other direction, I'm convinced there is not a significant difference,
> but if I'm going to use the referenced calculations, I want to be
> clear how to interpret the results.)

All we're saying is that we're less than 90% confident that there's
something "significant" going on.  All the fiddling with standard
deviations and sample sizes is just the easiest way (that I know of)
that statistics currently gives us to determine this more formally than a
hand-wavy "it looks OK to me".  Science tells us that humans are liable
to say things are OK when they're not, as well as vice versa; statistics
gives us a way to work past these limitations in some common and useful
situations.
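
To put a rough number on "less than 90% confident": the same t value
can be turned into a one-tailed p-value (again just a sketch with
SciPy, using t = 0.54 and df = 19 from above):

    from scipy import stats

    t, df = 0.54, 19

    # probability of seeing a t this big or bigger when there is
    # really no difference between the two runs
    p = stats.t.sf(t, df)
    print("one-tailed p-value = %.2f" % p)
    # prints roughly 0.30

i.e. we could only claim a difference at about the 70% level, which
matches the observation above that it doesn't even reach 75%.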

-- 
  Sam  http://samason.me.uk/

