Tom Lane wrote:
Should we redesign the stats signaling logic to work around this,
or just hope we can nag kernel people into fixing it?

Even if there were something to be done in kernel space, how many years would it be before it made this problem go away for the majority of near-future 9.0 users? We've been seeing a fairly regular stream of "pgstat wait timeout" reports come in. The one I reported was from recent hardware and a very mainstream Linux setup. I'm not very optimistic that this one can be punted to the OS and fixed in time to head off problems in the field.

This particularly pathological case with jaguar is great because it's made it possible to nail down how to report the problem. I don't think it's possible to draw a strong conclusion about how to resolve this just from that data, though. What we probably need is for your additional logging code to catch this again on some systems that are not so obviously broken, to get a better idea of what a normal (rather than extreme) manifestation looks like: how much skew is showing up, whether those instances do in fact correspond with the wait timeouts, that sort of thing.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
