On 30 August 2012, 17:46, Robert Haas wrote:
> On Sun, Aug 26, 2012 at 1:04 PM, Tomas Vondra <t...@fuzzy.cz> wrote:
>> Attached is an improved patch, with a call to rand() replaced with
>> getrand().
>>
>> I was thinking about the counter but I'm not really sure how to handle
>> cases like "39%" - I'm not sure a plain (counter % 100 < 39) is a
>> good sampling, because it always keeps continuous sequences of
>> transactions. Maybe there's a clever way to use a counter, but let's
>> stick to getrand() unless we can prove it's causing issues. Especially
>> considering that a lot of data won't be written at all with low
>> sampling rates.
>
> I like this patch, and I think sticking with a random number is a good
> idea. But I have two suggestions. Number one, I think the sampling
> rate should be stored as a float, not an integer, because I can easily
> imagine wanting a sampling rate that is not an integer percentage -
> especially, one that is less than one percent, like half a percent or
> a tenth of a percent. Also, I suggest that the command-line option
> should be a long option rather than a single character option. That
> will be more mnemonic and avoid using up too many single letter
> options, of which there is a limited supply. So to sample every
> hundredth result, you could do something like this:
>
> pgbench --latency-sample-rate 0.01

Right, I was thinking about that too. I'll do that in the next version
of the patch.

> Another option I personally think would be useful is an option to
> record only those latencies that are above some minimum bound, like
> this:
>
> pgbench --latency-only-if-more-than $MICROSECONDS
>
> The problem with recording all the latencies is that it tends to have
> a material impact on throughput. Your patch should address that for
> the case where you just want to characterize the latency, but it would
> also be nice to have a way of recording the outliers.

That sounds like a pretty trivial patch. I've been thinking about yet
another option - histograms (regular or with exponential bins).

What I'm not sure about is which of these options should be allowed at
the same time - to me, combinations like 'sampling + aggregation' don't
make much sense. Maybe except 'latency-only-if-more-than + aggregation'.

Tomas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
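
For illustration only, here is a rough sketch of the logging rules discussed
in this thread: counter-based sampling, random sampling with a float rate, a
minimum-latency threshold, and exponential histogram bins. It is written in
Python rather than pgbench's C, and none of it is taken from the patch. All
function names are hypothetical, and the power-of-two bin layout is just one
possible reading of "exponential bins".

```python
import random


def keep_by_counter(counter, percent):
    # Counter-based rule: keeps the first `percent` transactions of every
    # block of 100, so the kept transactions form contiguous runs.
    return counter % 100 < percent


def keep_by_random(rng, rate):
    # Random rule: each transaction is kept independently with probability
    # `rate`, a float fraction (0.01 is 1%, 0.005 is half a percent).
    return rng.random() < rate


def keep_if_slow(latency_us, min_latency_us):
    # Threshold rule: record only latencies strictly above a minimum bound.
    return latency_us > min_latency_us


def exp_bin(latency_us):
    # Exponential bins (assumed layout): bin k covers [2**k, 2**(k+1))
    # microseconds; anything below 1 microsecond falls into bin 0.
    return max(int(latency_us), 1).bit_length() - 1


if __name__ == "__main__":
    rng = random.Random(0)
    kept = sum(keep_by_random(rng, 0.01) for _ in range(100000))
    print("random sampling kept", kept, "of 100000")
    print("latency 1500 us falls into bin", exp_bin(1500))
```

With the counter rule, transactions 0..38 of each block of 100 are always
kept and 39..99 always dropped, which is the "continuous sequences" concern
raised above. The random rule spreads the kept transactions across the run
and accepts fractional rates without any special handling.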