On 22/11/12 12:15, Greg Smith wrote:
On 11/8/12 2:16 PM, Josh Berkus wrote:
Also, logging only the long-running queries is less useful than people
on this list seem to think. When I'm doing real performance analysis, I
need to see *everything* which was run, not just the slow stuff. Often
the real problem is a query which used to take 1.1ms, now takes 1.8ms,
and gets run 400 times/second. Looking just at the slow queries won't
tell you that.
No argument here. I've tried to be clear that some of these
high-speed but lossy things I'm hacking on are not going to be useful
for everyone. A solution that found most of these 1.8ms queries, but
dropped some percentage under the highest peak load conditions, would
still be very useful to me.
An anecdote on this topic seems relevant. I have a troublesome
production server that has moved log_min_duration_statement from 100ms
to 500ms to 2000ms as the system grew. Even the original setting
wasn't short enough to catch everything we would like to watch *now*,
but seeing sub-second data is a dream at this point. The increases
have been forced by logging contention becoming unmanageable when every
client has to fight for the log to write to disk. I can see the
buggers stack up waiting for locks if I try to log shorter
statements, stalling enough that it drags the whole server down under
peak load.
If I could turn logging off just during those periods--basically,
throwing statements away only when some output-rate-throttled component
hit its limit--I could still find them in the data collected the rest of
the time. There are some types of problems that only occur under
peak load, which this idea would miss, but you'd still be likely to catch
*some* of them, statistically.
There's a really hard trade-off here:
-Sometimes you must save data on every query to capture fine details
-Making every query wait for disk I/O is impractical
The sort of ideas you threw out for making things like auto_explain
logging per-session can work. The main limitation there, though, is
that it presumes you already know where the problem is in the first
place. I am not optimistic that covers a whole lot of ground either.
Upthread I talked about a background process that persists shared
memory queues as a future consumer of timing events--one that might
consume slow query data too. That is effectively acting as the ideal
component I described above, one that only loses things when it
exceeds the system's write capacity for saving them. I wouldn't want
to try to retrofit the existing PostgreSQL logging facility for such
a thing, though. Log parsing as a way to collect data is filled with
headaches anyway.
I don't see any other good way to resolve this trade-off. To help
with the real-world situation you describe, ideally you dump all the
query data somewhere, fast, and have the client move on. You can't
make queries wait for storage; something else (with a buffer!) needs
to take over that job.
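
To make that concrete, here's a minimal sketch of the sort of
drop-on-full buffer such a component could sit behind. This is
illustrative only, not PostgreSQL code: the names (LogQueue,
logqueue_push) and sizes are made up, and a real shared-memory version
would need locking or atomics rather than assuming a single producer
and a single consumer. The point is just that the client never waits
on disk, and records are dropped and counted only when the consumer
falls behind:

/*
 * Illustrative sketch only -- not PostgreSQL code.  A fixed-size queue
 * that clients push log records into without ever waiting on disk.  A
 * separate consumer drains it; when the consumer falls behind and the
 * queue fills, records are dropped and counted instead of stalling the
 * client.  Assumes one producer and one consumer for simplicity.
 */
#include <stdbool.h>
#include <string.h>

#define QUEUE_SLOTS   1024
#define RECORD_BYTES   256

typedef struct LogQueue
{
    char          records[QUEUE_SLOTS][RECORD_BYTES];
    int           head;      /* next slot the producer writes */
    int           tail;      /* next slot the consumer reads  */
    unsigned long dropped;   /* records lost under peak load  */
} LogQueue;

/* Producer side: never blocks; drops the record when the buffer is full. */
static bool
logqueue_push(LogQueue *q, const char *record)
{
    int next = (q->head + 1) % QUEUE_SLOTS;

    if (next == q->tail)
    {
        q->dropped++;        /* out of space: accept the loss */
        return false;
    }
    strncpy(q->records[q->head], record, RECORD_BYTES - 1);
    q->records[q->head][RECORD_BYTES - 1] = '\0';
    q->head = next;
    return true;
}
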
I can't find the URL right now, but at PG.EU someone was showing me a
module that grabbed the new 9.2 logging hook and shipped the result to
another server. Clients burn a little CPU and network time, then
move on, and disk I/O is no longer their concern. How much
overhead persisting the data takes isn't the database server's job at
all then. That's the sort of flexibility people need to have with
logging eventually: something so efficient that every client can
afford to do it, capable of saving all events under ideal
conditions, but willing to keep going and accept the loss under
adverse ones.
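
For reference, the 9.2 hook being talked about is emit_log_hook, and a
module along those lines would be shaped roughly like the sketch below.
The function names and the ship_to_collector() transport are
placeholders I made up; the real module presumably pushed the data over
the network. It would be loaded through shared_preload_libraries.

/*
 * Rough sketch of an extension using the 9.2 emit_log_hook.  The
 * ship_to_collector() transport is a placeholder; the body is
 * deliberately omitted.
 */
#include "postgres.h"
#include "fmgr.h"
#include "utils/elog.h"

PG_MODULE_MAGIC;

void _PG_init(void);

static emit_log_hook_type prev_emit_log_hook = NULL;

static void
ship_to_collector(ErrorData *edata)
{
    /* e.g. push edata->message onto a socket; omitted in this sketch */
}

static void
my_emit_log_hook(ErrorData *edata)
{
    if (edata->output_to_server)
        ship_to_collector(edata);

    /* chain to any hook someone else installed before us */
    if (prev_emit_log_hook)
        prev_emit_log_hook(edata);
}

void
_PG_init(void)
{
    prev_emit_log_hook = emit_log_hook;
    emit_log_hook = my_emit_log_hook;
}
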
Would it be useful to be able to specify a fixed sampling rate, say 1 in
100? That way you could get a well-defined statistical sample, yet not
cause excessive I/O.
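
Purely as a sketch of the decision I have in mind (should_log_statement
and the fixed rate are made up for illustration; a real version would
presumably hang off a GUC and reuse the backend's own random number
source):

/*
 * Illustrative only: a per-statement sampling decision at a fixed rate,
 * e.g. sample_rate = 0.01 for roughly 1 statement in 100.
 */
#include <stdbool.h>
#include <stdlib.h>

static bool
should_log_statement(double sample_rate)
{
    return ((double) rand() / (double) RAND_MAX) < sample_rate;
}
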
Cheers,
Gavin