On Fri, 2007-09-07 at 11:48 -0400, Greg Smith wrote:
> On Fri, 7 Sep 2007, Simon Riggs wrote:
> > I think that is what we should be measuring, perhaps in a simple way
> > such as calculating the 90th percentile of the response time
> > distribution.
> I do track the 90th percentile numbers, but in these pgbench tests where
> I'm writing as fast as possible they're actually useless--in many cases
> they're *smaller* than the average response, because there are enough
> cases where there is a really, really long wait that they skew the average
> up really hard. Take a look at any of the individual test graphs and
> you'll see what I mean.
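To make that effect concrete, here's a tiny sketch with made-up numbers (not taken from the actual pgbench runs): a handful of multi-second stalls is enough to drag the mean well above the 90th percentile.

```python
# Illustrative only: 95 fast transactions plus 5 pathological stalls.
# These latencies are invented, not measured from the tests discussed here.
latencies_ms = [10.0] * 95 + [4000.0] * 5

mean = sum(latencies_ms) / len(latencies_ms)                # 209.5 ms
p90 = sorted(latencies_ms)[int(0.9 * len(latencies_ms))]   # 10.0 ms

print(f"mean={mean:.1f} ms, 90th percentile={p90:.1f} ms")
# The few long waits skew the mean upward, so the 90th percentile
# ends up *smaller* than the average, as described above.
```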
I've looked at the graphs now, but I'm sorry to say I'm none the wiser.
We need something like a frequency distribution curve, not just the raw
times. The bottom line is that we need a good way to visualise the
detailed effects of the patch.
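Even a crude text histogram over latency buckets would show the shape of the distribution. A sketch along these lines (the bucket boundaries are arbitrary choices, and the sample data is invented):

```python
from collections import Counter

def latency_histogram(latencies_ms, buckets=(1, 10, 100, 1000, 10000)):
    """Count transactions per latency bucket (upper bounds in ms)."""
    counts = Counter()
    for lat in latencies_ms:
        # Place each sample in the first bucket whose upper bound covers it.
        bound = next((b for b in buckets if lat < b), float("inf"))
        counts[bound] += 1
    return counts

samples = [5.0] * 80 + [50.0] * 15 + [3000.0] * 5   # made-up data
for bound, n in sorted(latency_histogram(samples).items()):
    print(f"< {bound:>6} ms : {'#' * n} ({n})")
```

Run over the per-transaction latencies from a test, this makes the long tail visible at a glance in a way a list of raw times does not.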
I think we should do some more basic tests to see where those outliers
come from. We need to establish a clear link between the number of dirty
writes and response time. If there is one, which we all believe there
is, then it is worth minimising those writes with these techniques;
otherwise we might just be chasing the wrong thing.
Perhaps output the number of dirty blocks written on the same line as
the log_min_duration_statement output, so that we can correlate response
time with dirty-block writes for that statement.
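If such a field were added, correlating the two columns would be straightforward. A sketch, assuming a hypothetical log line format with the extra field appended to the standard duration line (no such field exists today):

```python
import re

# Hypothetical format: the usual duration line plus a dirty-writes count,
# e.g. "LOG:  duration: 1523.4 ms  dirty writes: 37  statement: ..."
LINE = re.compile(r"duration: ([\d.]+) ms\s+dirty writes: (\d+)")

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def correlate_log(lines):
    """Extract (duration, dirty writes) pairs and correlate them."""
    pairs = [(float(m.group(1)), int(m.group(2)))
             for m in map(LINE.search, lines) if m]
    durations, writes = zip(*pairs)
    return pearson(durations, writes)
```

A correlation near 1.0 across a Beta run would be the clear link we are looking for; a weak one would suggest the outliers come from somewhere else.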
For me, we can enter Beta while this is still partially up in the air.
We won't be able to get this right without lots of outside feedback, so
I think we should concentrate now on making sure we've got the logging
in place to check whether your patch works once it's out there. I'd say
let's include what you've done and then see how it behaves during Beta.
We've been trying to get this right for years now, so we have to allow
some slack to make sure of it this time. We can reduce or strip out the
logging once we go RC.