On 11/28/2012 3:33 PM, Kevin Grittner wrote:
> Kevin Grittner wrote:
>> I still need to review the timing calls, since I'm not familiar
>> with them so it wasn't immediately obvious to me whether they
>> were being used correctly. I have no reason to believe that they
>> aren't, but feel I should check.
>
> It seems odd to me that assignment of one instr_time to another is
> done with INSTR_TIME_SET_ZERO() of the target followed by
> INSTR_TIME_ADD() with the target and the source. It seems to me
> that simple assignment would be more readable, and I can't see any
> down side.
>
> Why shouldn't:
>
>     INSTR_TIME_SET_ZERO(elapsed);
>     INSTR_TIME_ADD(elapsed, currenttime);
>     INSTR_TIME_SUBTRACT(elapsed, starttime);
>
> instead be?:
>
>     elapsed = currenttime;
>     INSTR_TIME_SUBTRACT(elapsed, starttime);
>
> And why shouldn't:
>
>     INSTR_TIME_SET_ZERO(starttime);
>     INSTR_TIME_ADD(starttime, currenttime);
>
> instead be?:
>
>     starttime = currenttime;
>
> Resetting starttime this way seems especially odd.
instr_time is LARGE_INTEGER on Win32 but struct timeval on Unix. Is
starttime = currenttime;
portable if those are structs?
>> Also, I want to do another pass looking just for off-by-one
>> errors on blkno. Again, I have no reason to believe that there is
>> a problem; it just seems like it would be a particularly easy
>> type of mistake to make and miss when a key structure has this
>> field:
>>
>>     BlockNumber nonempty_pages;
>>     /* actually, last nonempty page + 1 */
>
> No problems found with that.
>> And I want to test more.
> The patch worked as advertised in all my tests, but I became
> uncomfortable with the games being played with the last autovacuum
> timestamp and the count of dead tuples. Sure, that causes
> autovacuum to kick back in until it has dealt with the truncation,
> but it makes it hard for a human looking at the stat views to see
> what's going on, and I'm not sure it wouldn't lead to bad plans due
> to stale statistics.
>
> Personally, I would rather see us add a boolean to indicate that
> autovacuum was needed (regardless of the math on the other columns)
> and use that to control the retries -- leaving the other columns
> free to reflect reality.
You mean to extend the stats by yet another column? The thing is that
this whole case happens rarely. We see this every other month or so and
only on a rolling window table after it got severely bloated due to some
abnormal use (purging disabled for some operational reason). The whole
situation resolves itself after a few minutes to maybe an hour or two.
Personally I don't think living with a few wrong stats on one table for
that time is so bad that it warrants changing that much more code.
Lower casing TRUE/FALSE will be done.
If the LW_SHARED is enough in LockHasWaiters(), that's fine too.
I think we have a consensus that the check interval should be derived
from deadlock_timeout/10 clamped to 10ms. I'm going to remove that GUC.
About the other two, I'm just not sure. Maybe use half of the lock
waiter check interval as the lock retry interval, again clamped to
10ms, and leave just a single GUC that controls how long autovacuum
should keep retrying. I'm not too worried about the retry interval.
However, the problem with that overall retry duration is that this value
very much depends on the usage pattern of the database. If long-running
transactions (say, >5 seconds) are the norm, then a hard-coded value of
2 seconds before autovacuum gives up isn't going to help much.
Jan
--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin