Tom Lane <t...@sss.pgh.pa.us> writes: > Hm. It's been a long time since college statistics, but doesn't the > entire concept of standard deviation depend on the assumption that the > underlying distribution is more-or-less normal (Gaussian)? Is there a
I just had a quick chat with a statistician friends of mine on that topic, and it seems that the only way to make sense of an average is if you know already the distribution. In our case, what I keep experiencing with tuning queries is that we have like 99% of them running under acceptable threshold and 1% of them taking more and more time. In a normal (Gaussian) distribution, there would be no query time farther away from the average than any other, so my experience tells me that the query time distribution is anything BUT normal (Gaussian). > good reason to suppose that query runtime is Gaussian? (I'd bet not; > in particular, multimodal behavior seems very likely due to things like > plan changes.) If not, how much does that affect the usefulness of > a standard-deviation calculation? I don't know what multi-modal is. What I've been gathering from my quick chat this morning is that either you know how to characterize the distribution and then the min max and average are useful on their own, or you need to keep track of an histogram where all the bins are of the same size to be able to learn what the distribution actually is. We didn't get to the point where I could understand if storing histogram with a constant size on log10 of the data rather than the data itself is going to allow us to properly characterize the distribution. The main question I want to answer here would be the percentiles one, I want to get the query max execution timing for 95% of the executions, then 99%, then 99.9% etc. There's no way to answer that without knowing the distribution shape, so we need enough stats to learn what the distribution shape is (hence, histograms). Of course keeping enough stats seems to always begin with keeping the min, max and average, so we can just begin there. We would just be unable to answer interesting questions with just that. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers