Re: [HACKERS] (auto)vacuum truncate exclusive lock

2013-04-18 Thread Jan Wieck

On 4/12/2013 1:57 PM, Tom Lane wrote:

Kevin Grittner kgri...@ymail.com writes:

Tom Lane t...@sss.pgh.pa.us wrote:

I think that the minimum appropriate fix here is to revert the hunk
I quoted, ie take out the suppression of stats reporting and analysis.



I'm not sure I understand -- are you proposing that is all we do
for both the VACUUM command and autovacuum?


No, I said that was the minimum fix.

Looking again at the patch, I note this comment:

/*
+* We failed to establish the lock in the specified number of
+* retries. This means we give up truncating. Suppress the
+* ANALYZE step. Doing an ANALYZE at this point will reset the
+* dead_tuple_count in the stats collector, so we will not get
+* called by the autovacuum launcher again to do the truncate.
+*/

and I suppose the rationale for suppressing the stats report was this
same idea of lying to the stats collector in order to encourage a new
vacuum attempt to happen right away.  Now I'm not sure that that's a
good idea at all --- what's the reasoning for thinking the table will be
any less hot in thirty seconds?  But if it is reasonable, we need a
redesign of the reporting messages, not just a hack to not tell the
stats collector what we did.


Yes, that was the rationale behind it, combined with "don't change 
function call sequences" and more, all over the place.


The proper solution would eventually be to add a block number to the 
stats held by the stats collector, which preserves the information about 
the last filled block of the table. Then decouple the truncate from 
vacuum/autovacuum: vacuum/autovacuum will set that block number when 
they detect the trailing free space. The analyze step can happen just as 
usual and reset the stats, which would not touch that block number. The 
autovacuum launcher goes through its normal logic for launching autovac 
or autoanalyze. If it doesn't find any of those to do but the 
last-used-block is set, it launches the separate lazy truncate step.


This explicitly moves the truncate, which inherently requires the 
exclusive lock and therefore is undesirable even in a manual vacuum, 
into the background.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] (auto)vacuum truncate exclusive lock

2013-04-18 Thread Jan Wieck

On 4/12/2013 2:08 PM, Alvaro Herrera wrote:

Tom Lane escribió:


Are you saying you intend to revert that whole concept?  That'd be
okay with me, I think.  Otherwise we need some thought about how to
inform the stats collector what's really happening.


Maybe what we need is to consider table truncation as a separate
activity from vacuuming.  Then autovacuum could invoke it without having
to do a full-blown vacuum.  For this to work, I guess we would like to
separately store the status of the back-scan in pgstat somehow (I think
a boolean flag suffices: were we able to truncate all pages that
appeared to be empty?)


Should have read the entire thread before responding :)


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin




Re: [HACKERS] (auto)vacuum truncate exclusive lock

2013-04-18 Thread Jan Wieck

On 4/18/2013 11:44 AM, Jan Wieck wrote:


Yes, that was the rationale behind it combined with don't change
function call sequences and more all over the place.


function call signatures

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin




Re: [HACKERS] (auto)vacuum truncate exclusive lock

2013-04-12 Thread Jeff Janes
On Thursday, April 11, 2013, Kevin Grittner wrote:


  I also log the number of pages truncated at the time it gave up,
  as it would be nice to know if it is completely starving or
  making some progress.

 If we're going to have the message, we should make it useful.  My
 biggest question here is not whether we should add this info, but
 whether it should be DEBUG instead of LOG


I like it being LOG.  If it were DEBUG, I don't think anyone would be
likely to see it when they needed to, as it happens sporadically on busy
servers and I don't think people would run those with DEBUG on.  I figure
it is analogous to an autovacuum cancel message it partially replaces, and
those are LOG.





  Also, I think that permanently boycotting doing autoanalyze
  because someone is camping out on an access share lock (or
  because there are a never-ending stream of overlapping locks) and
  so the truncation cannot be done is a bit drastic, especially for
  inclusion in a point release.

 That much is not a change in the event that the truncation does not
 complete.


OK, I see that now.  In the old behavior, if the lock was acquired, but
then we were shoved off from it, the analyze was not done.  But, in the old
behavior if the lock was never acquired at all, then it would go ahead to
do the autoanalyze, and that has changed.   That is the way I was testing
it (camping out on an access share lock so the access exclusive could
never be granted in the first place, because intercepting it during the
truncate phase was hard to do) and I just assumed the behavior I saw would
apply to both cases, but it does not.


 I have seen cases where the old logic head-banged for
 hours or days without succeeding at the truncation attempt in
 autovacuum, absolutely killing performance until the user ran an
 explicit VACUUM.  And in the meantime, since the deadlock detection
 logic was killing autovacuum before it got to the analyze phase,
 the autoanalyze was never done.


OK, so there are three problems.  It would take a second to yield, in doing so
it would abandon all the progress it had made in that second rather than
saving it, and it would tight loop (restricted by naptime) on this because
of the lack of analyze.  So it fixed the first two in a way that seems an
absolute improvement for the auto case, but it made the third one worse in
a common case, where it never acquires the lock in the first place, and so
doesn't analyze when before it did in that one case.



 Perhaps the new logic should go ahead and get its lock even on a
 busy system (like the old logic),


As far as I can tell, the old logic was always conditional on the
AccessExclusive lock acquisition, whether it was manual or auto.

Cheers,

Jeff


Re: [HACKERS] (auto)vacuum truncate exclusive lock

2013-04-12 Thread Kevin Grittner
Jeff Janes jeff.ja...@gmail.com wrote:

 If we're going to have the message, we should make it useful.
 My biggest question here is not whether we should add this info,
 but whether it should be DEBUG instead of LOG

 I like it being LOG.  If it were DEBUG, I don't think anyone
 would be likely to see it when they needed to, as it happens
 sporadically on busy servers and I don't think people would run
 those with DEBUG on.  I figure it is analogous to an autovacuum
 cancel message it partially replaces, and those are LOG.

OK, sold.

 Also, I think that permanently boycotting doing autoanalyze
 because someone is camping out on an access share lock (or
 because there are a never-ending stream of overlapping locks)
 and so the truncation cannot be done is a bit drastic,
 especially for inclusion in a point release.

 That much is not a change in the event that the truncation does
 not complete.  

 OK, I see that now.  In the old behavior, if the lock was
 acquired, but then we were shoved off from it, the analyze was
 not done.  But, in the old behavior if the lock was never
 acquired at all, then it would go ahead to do the autoanalyze,
 and that has changed.   That is the way I was testing it
 (camping out on an access share lock so the access exclusive
 could never be granted in the first place, because intercepting
 it during the truncate phase was hard to do) and I just assumed
 the behavior I saw would apply to both cases, but it does not.

Ah, I see now.  So the actual worst case for the old code, in terms
of both head-banging and statistics, was if autovacuum was able to
acquire the lock but then many tasks all piled up behind its lock.
If the system was even *more* busy it would not acquire the lock at
all, and would behave better.

 I have seen cases where the old logic head-banged for
 hours or days without succeeding at the truncation attempt in
 autovacuum, absolutely killing performance until the user ran an
 explicit VACUUM.  And in the meantime, since the deadlock
 detection logic was killing autovacuum before it got to the
 analyze phase, the autoanalyze was never done.

 OK, so there are three problems.  It would take a second to yield, in
 doing so it would abandon all the progress it had made in that
 second rather than saving it, and it would tight loop (restricted
 by naptime) on this because of the lack of analyze.  So it fixed
 the first two in a way that seems an absolute improvement for the
 auto case, but it made the third one worse in a common case,
 where it never acquires the lock in the first place, and so
 doesn't analyze when before it did in that one case.

Yeah, I see that now.

 Perhaps the new logic should go ahead and get its lock even on a
 busy system (like the old logic),

 As far as I can tell, the old logic was always conditional on the
 AccessExclusive lock acquisition, whether it was manual or auto.

OK, will review that to confirm; but assuming that's right, and
nobody else is already working on a fix, I propose to do the
following:

(1)  Restore the prior behavior of the VACUUM command.  This was
only ever intended to be a fix for a serious autovacuum problem
which caused many users serious performance problems -- in some
cases including unscheduled down time.  I also saw sites where,
having been bitten by this, they disabled autovacuum and later ran
into problems with bloat and/or xid wraparound.

(2)  If autovacuum decides to try to truncate but the lock cannot
be initially acquired, and analyze is requested, skip the
truncation and do the autoanalyze.  If the table is so hot that we
cannot get the lock, the space may get re-used soon, and if not
there is a good chance another autovacuum will trigger soon.  If
the user really wants the space released to the OS immediately,
they can run a manual vacuum to force the issue.

If I don't hear anything within the next day or two, I'll write
that up and post it here before applying (and back-patching to
affected branches).

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] (auto)vacuum truncate exclusive lock

2013-04-12 Thread Tom Lane
Kevin Grittner kgri...@ymail.com writes:
 OK, will review that to confirm; but assuming that's right, and
 nobody else is already working on a fix, I propose to do the
 following:

 (1)  Restore the prior behavior of the VACUUM command.  This was
 only ever intended to be a fix for a serious autovacuum problem
 which caused many users serious performance problems -- in some
 cases including unscheduled down time.  I also saw sites where,
 having been bitten by this, they disabled autovacuum and later ran
 into problems with bloat and/or xid wraparound.

 (2)  If autovacuum decides to try to truncate but the lock cannot
 be initially acquired, and analyze is requested, skip the
 truncation and do the autoanalyze.  If the table is so hot that we
 cannot get the lock, the space may get re-used soon, and if not
 there is a good chance another autovacuum will trigger soon.  If
 the user really wants the space released to the OS immediately,
 they can run a manual vacuum to force the issue.

I think that the minimum appropriate fix here is to revert the hunk
I quoted, ie take out the suppression of stats reporting and analysis.

However, we're still thinking too small.  I've been wondering whether we
couldn't entirely remove the dirty, awful kluges that were installed in
the lock manager to kill autovacuum when somebody blocked behind it.
This mechanism should ensure that AV never takes an exclusive lock
for long enough to be a serious problem, so do we need that anymore?

regards, tom lane




Re: [HACKERS] (auto)vacuum truncate exclusive lock

2013-04-12 Thread Andres Freund
On 2013-04-12 13:09:02 -0400, Tom Lane wrote:
 Kevin Grittner kgri...@ymail.com writes:
  OK, will review that to confirm; but assuming that's right, and
  nobody else is already working on a fix, I propose to do the
  following:
 
  (1)  Restore the prior behavior of the VACUUM command.  This was
  only ever intended to be a fix for a serious autovacuum problem
  which caused many users serious performance problems -- in some
  cases including unscheduled down time.  I also saw sites where,
  having been bitten by this, they disabled autovacuum and later ran
  into problems with bloat and/or xid wraparound.
 
  (2)  If autovacuum decides to try to truncate but the lock cannot
  be initially acquired, and analyze is requested, skip the
  truncation and do the autoanalyze.  If the table is so hot that we
  cannot get the lock, the space may get re-used soon, and if not
  there is a good chance another autovacuum will trigger soon.  If
  the user really wants the space released to the OS immediately,
  they can run a manual vacuum to force the issue.
 
 I think that the minimum appropriate fix here is to revert the hunk
 I quoted, ie take out the suppression of stats reporting and analysis.
 
 However, we're still thinking too small.  I've been wondering whether we
 couldn't entirely remove the dirty, awful kluges that were installed in
 the lock manager to kill autovacuum when somebody blocked behind it.
 This mechanism should ensure that AV never takes an exclusive lock
 for long enough to be a serious problem, so do we need that anymore?

Wouldn't that make DROP TABLE stop working while autovac is processing
the table? Considering how long e.g. a full table vacuum on a huge table
can take that doesn't seem to be acceptable.
Sure, you can manually kill the autovac process, but that will soon be
restarted. So you have to know that you need to start the DROP TABLE
beforehand so it will get the lock quicker and such.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] (auto)vacuum truncate exclusive lock

2013-04-12 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kgri...@ymail.com writes:
 OK, will review that to confirm; but assuming that's right, and
 nobody else is already working on a fix, I propose to do the
 following:

 (1)  Restore the prior behavior of the VACUUM command.  This was
 only ever intended to be a fix for a serious autovacuum problem
 which caused many users serious performance problems -- in some
 cases including unscheduled down time.  I also saw sites where,
 having been bitten by this, they disabled autovacuum and later ran
 into problems with bloat and/or xid wraparound.

 (2)  If autovacuum decides to try to truncate but the lock cannot
 be initially acquired, and analyze is requested, skip the
 truncation and do the autoanalyze.  If the table is so hot that we
 cannot get the lock, the space may get re-used soon, and if not
 there is a good chance another autovacuum will trigger soon.  If
 the user really wants the space released to the OS immediately,
 they can run a manual vacuum to force the issue.

 I think that the minimum appropriate fix here is to revert the hunk
 I quoted, ie take out the suppression of stats reporting and analysis.

I'm not sure I understand -- are you proposing that is all we do
for both the VACUUM command and autovacuum?  (i.e., we don't try to
fully restore the old VACUUM command behavior; just fix the troublesome
failure to generate statistics?)

 However, we're still thinking too small.  I've been wondering whether we
 couldn't entirely remove the dirty, awful kluges that were installed in
 the lock manager to kill autovacuum when somebody blocked behind it.
 This mechanism should ensure that AV never takes an exclusive lock
 for long enough to be a serious problem, so do we need that anymore?

Are you suggesting that just in master/HEAD or back to 9.0?  If the
latter, what existing problem does it cure (besides ugly code)?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] (auto)vacuum truncate exclusive lock

2013-04-12 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-04-12 13:09:02 -0400, Tom Lane wrote:
 However, we're still thinking too small.  I've been wondering whether we
 couldn't entirely remove the dirty, awful kluges that were installed in
 the lock manager to kill autovacuum when somebody blocked behind it.
 This mechanism should ensure that AV never takes an exclusive lock
 for long enough to be a serious problem, so do we need that anymore?

 Wouldn't that make DROP TABLE stop working while autovac is processing
 the table?

Meh ... I guess you're right.  I wasn't thinking about exclusive locks
being taken elsewhere.

regards, tom lane




Re: [HACKERS] (auto)vacuum truncate exclusive lock

2013-04-12 Thread Tom Lane
Kevin Grittner kgri...@ymail.com writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 I think that the minimum appropriate fix here is to revert the hunk
 I quoted, ie take out the suppression of stats reporting and analysis.

 I'm not sure I understand -- are you proposing that is all we do
 for both the VACUUM command and autovacuum?

No, I said that was the minimum fix.

Looking again at the patch, I note this comment:

   /*
+* We failed to establish the lock in the specified number of
+* retries. This means we give up truncating. Suppress the
+* ANALYZE step. Doing an ANALYZE at this point will reset the
+* dead_tuple_count in the stats collector, so we will not get
+* called by the autovacuum launcher again to do the truncate.
+*/

and I suppose the rationale for suppressing the stats report was this
same idea of lying to the stats collector in order to encourage a new
vacuum attempt to happen right away.  Now I'm not sure that that's a
good idea at all --- what's the reasoning for thinking the table will be
any less hot in thirty seconds?  But if it is reasonable, we need a
redesign of the reporting messages, not just a hack to not tell the
stats collector what we did.

Are you saying you intend to revert that whole concept?  That'd be
okay with me, I think.  Otherwise we need some thought about how to
inform the stats collector what's really happening.

regards, tom lane




Re: [HACKERS] (auto)vacuum truncate exclusive lock

2013-04-12 Thread Alvaro Herrera
Tom Lane escribió:

 Are you saying you intend to revert that whole concept?  That'd be
 okay with me, I think.  Otherwise we need some thought about how to
 inform the stats collector what's really happening.

Maybe what we need is to consider table truncation as a separate
activity from vacuuming.  Then autovacuum could invoke it without having
to do a full-blown vacuum.  For this to work, I guess we would like to
separately store the status of the back-scan in pgstat somehow (I think
a boolean flag suffices: were we able to truncate all pages that
appeared to be empty?)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] (auto)vacuum truncate exclusive lock

2013-04-12 Thread Kevin Grittner
[some relevant dropped bits of the thread restored]

Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kgri...@ymail.com writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kgri...@ymail.com writes:
 Jeff Janes jeff.ja...@gmail.com wrote:

 I propose to do the following:

 (1)  Restore the prior behavior of the VACUUM command.  This
 was only ever intended to be a fix for a serious autovacuum
 problem which caused many users serious performance problems

 (2)  If autovacuum decides to try to truncate but the lock
 cannot be initially acquired, and analyze is requested, skip
 the truncation and do the autoanalyze.

 I think that the minimum appropriate fix here is to [...] take
 out the suppression of stats reporting and analysis.
 
 I'm not sure I understand -- are you proposing that is all we do
 for both the VACUUM command and autovacuum?
 
 No, I said that was the minimum fix.

OK, I suggested that and more, so I wasn't sure what you were
getting at.

  OK, I see that now.  In the old behavior, if the lock was
 acquired, but then we were shoved off from it, the analyze
 was not done.  But, in the old behavior if the lock was never
 acquired at all, then it would go ahead to do the
 autoanalyze,

 Ah, I see now.  So the actual worst case for the old code, in
 terms of both head-banging and statistics, was if autovacuum
 was able to acquire the lock but then many tasks all piled up
 behind its lock.  If the system was even *more* busy it would
 not acquire the lock at all, and would behave better.

 and I suppose the rationale for suppressing the stats report was
 this same idea of lying to the stats collector in order to
 encourage a new vacuum attempt to happen right away.

I think Jan expressed some such sentiment back during the original
discussion.  I was not persuaded by that; but he pointed out that
if the deadlock killer killed an autovacuum process which was doing
a truncate, then it did not get to the statistics phase; so I agreed
that any change in that behavior should be a separate patch.  I
missed the fact that if it failed to initially get the lock it did
proceed to the statistics phase.  I explained this earlier in this
thread.  No need to cast about for hypothetical explanations.

 Now I'm not sure that that's a good idea at all

I'm pretty sure it isn't; that's why I proposed changing it.

 But if it is reasonable, we need a redesign of the reporting
 messages, not just a hack to not tell the stats collector what we
 did.

The idea was to try to make as small a change in previous behavior
as possible.  Jan pointed out that when the deadlock detection code
killed an autovacuum worker which was trying to truncate, the
statistics were not updated, leading to retries.  This was an
attempt to *not change* existing behavior.  It was wrong, because
we both missed the fact that if it didn't get the lock in the first
place it went ahead with statistics generation.  That being the
case, I was proposing we always generate statistics if we were
supposed to.  That would be a change toward *more* up-to-date
statistics and *fewer* truncation retries than we've had.  I'm OK
with that because a table hot enough to hit the issue will likely
need the space again or need another vacuum soon.

 Are you saying you intend to revert that whole concept?

No.  I was merely asking what you were suggesting.  As I said
earlier:

 I have seen cases where the old logic head-banged for hours or
 days without succeeding at the truncation attempt in
 autovacuum, absolutely killing performance until the user ran
 an explicit VACUUM.  And in the meantime, since the deadlock
 detection logic was killing autovacuum before it got to the
 analyze phase, the autoanalyze was never done.

 Otherwise we need some thought about how to inform the stats
 collector what's really happening.

I think we can probably improve that on some future release.  I
don't think a new scheme for that makes sense for back-patching or
9.3.

For now what I'm suggesting is generating statistics in all the
cases it did before, plus the case where it starts truncation but
does not complete it.  The fact that before this patch there were
cases where the autovacuum worker was killed, resulting in not
generating needed statistics seems like a bug, not a behavior we
need to preserve.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] (auto)vacuum truncate exclusive lock

2013-04-12 Thread Tom Lane
Kevin Grittner kgri...@ymail.com writes:
 For now what I'm suggesting is generating statistics in all the
 cases it did before, plus the case where it starts truncation but
 does not complete it.  The fact that before this patch there were
 cases where the autovacuum worker was killed, resulting in not
 generating needed statistics seems like a bug, not a behavior we
 need to preserve.

Well, in the case where it gets killed, it's still not gonna generate
statistics.  What we've really got here is a new case that did not exist
before, namely that it voluntarily stops truncating.  But I agree that
modeling that case's behavior on the kill case was poorly thought out.

In other words, yes, I think we're on the same page.

regards, tom lane




[HACKERS] (auto)vacuum truncate exclusive lock

2013-04-11 Thread Jeff Janes
I guess I'm a couple releases late to review the autovacuum truncate
exclusive lock patch (a79ae0bc0d454b9f2c95a), but this patch did not only
affect autovac, it affects manual vacuum as well (as did the original
behavior it is a modification of).  So the compiler constants are misnamed,
and the elog message when it triggers is misleading.  (Is it also
misleading to just say "vacuum"?  Does it need to say "(auto)vacuum"?)

I've attached a patch that changes that. I also log the number of pages
truncated at the time it gave up, as it would be nice to know if it is
completely starving or making some progress.

Also, I think that permanently boycotting doing autoanalyze because someone
is camping out on an access share lock (or because there are a never-ending
stream of overlapping locks) and so the truncation cannot be done is a bit
drastic, especially for inclusion in a point release.  Is there a way to
have the autoanalyze happen, but then still arrange for the autovacuum to
be triggered again next naptime?

Cheers,

Jeff
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
new file mode 100644
index 0401b7f..fda2656
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
***
*** 75,83 
   * that the potential for improvement was great enough to merit the cost of
   * supporting them.
   */
! #define AUTOVACUUM_TRUNCATE_LOCK_CHECK_INTERVAL		20	/* ms */
! #define AUTOVACUUM_TRUNCATE_LOCK_WAIT_INTERVAL		50	/* ms */
! #define AUTOVACUUM_TRUNCATE_LOCK_TIMEOUT			5000		/* ms */
  
  /*
   * Guesstimation of number of dead tuples per page.  This is used to
--- 75,83 
   * that the potential for improvement was great enough to merit the cost of
   * supporting them.
   */
! #define VACUUM_TRUNCATE_LOCK_CHECK_INTERVAL		20	/* ms */
! #define VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL		50	/* ms */
! #define VACUUM_TRUNCATE_LOCK_TIMEOUT			5000		/* ms */
  
  /*
   * Guesstimation of number of dead tuples per page.  This is used to
*** lazy_truncate_heap(Relation onerel, LVRe
*** 1306,1313 
  			 */
  			CHECK_FOR_INTERRUPTS();
  
! 			if (++lock_retry > (AUTOVACUUM_TRUNCATE_LOCK_TIMEOUT /
! 								AUTOVACUUM_TRUNCATE_LOCK_WAIT_INTERVAL))
  			{
  /*
   * We failed to establish the lock in the specified number of
--- 1306,1313 
  			 */
  			CHECK_FOR_INTERRUPTS();
  
! 			if (++lock_retry > (VACUUM_TRUNCATE_LOCK_TIMEOUT /
! 								VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL))
  			{
  /*
   * We failed to establish the lock in the specified number of
*** lazy_truncate_heap(Relation onerel, LVRe
*** 1318,1333 
   */
  				vacrelstats->lock_waiter_detected = true;
  				ereport(LOG,
! 						(errmsg("automatic vacuum of table \"%s.%s.%s\": "
  								"could not (re)acquire exclusive "
! 								"lock for truncate scan",
  								get_database_name(MyDatabaseId),
  								get_namespace_name(RelationGetNamespace(onerel)),
! 								RelationGetRelationName(onerel))));
  return;
  			}
  
! 			pg_usleep(AUTOVACUUM_TRUNCATE_LOCK_WAIT_INTERVAL);
  		}
  
  		/*
--- 1318,1334 
   */
  				vacrelstats->lock_waiter_detected = true;
  				ereport(LOG,
! 						(errmsg("vacuum of table \"%s.%s.%s\": "
  								"could not (re)acquire exclusive "
! 								"lock for truncate scan (%d pages already truncated)",
  								get_database_name(MyDatabaseId),
  								get_namespace_name(RelationGetNamespace(onerel)),
! 								RelationGetRelationName(onerel),
! 								vacrelstats->pages_removed)));
  return;
  			}
  
! 			pg_usleep(VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL);
  		}
  
  		/*
*** count_nondeletable_pages(Relation onerel
*** 1437,1443 
  			elapsed = currenttime;
  			INSTR_TIME_SUBTRACT(elapsed, starttime);
  			if ((INSTR_TIME_GET_MICROSEC(elapsed) / 1000)
! 				>= AUTOVACUUM_TRUNCATE_LOCK_CHECK_INTERVAL)
  			{
  if (LockHasWaitersRelation(onerel, AccessExclusiveLock))
  {
--- 1438,1444 
  			elapsed = currenttime;
  			INSTR_TIME_SUBTRACT(elapsed, starttime);
  			if ((INSTR_TIME_GET_MICROSEC(elapsed) / 1000)
! 				>= VACUUM_TRUNCATE_LOCK_CHECK_INTERVAL)
  			{
  if (LockHasWaitersRelation(onerel, AccessExclusiveLock))
  {



Re: [HACKERS] (auto)vacuum truncate exclusive lock

2013-04-11 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 I guess I'm a couple releases late to review the autovacuum truncate
 exclusive lock patch (a79ae0bc0d454b9f2c95a), but this patch did not only
 affect autovac, it affects manual vacuum as well (as did the original
 behavior it is a modification of).  So the compiler constants are misnamed,
 and the elog message when it triggers is misleading.  (Is it also
 misleading to just say "vacuum"?  Does it need to say "(auto)vacuum"?)

I just came to look at this via a complaint in pgsql-admin.  I'm not
convinced that we should consider the new behavior to be sane.
Automatic exclusive-lock abandonment makes sense for autovacuum, but
when the user has told us to vacuum, ISTM we should do it.  I can see
there might be differing opinions on that though.

 Also, I think that permanently boycotting doing autoanalyze because someone
 is camping out on an access share lock (or because there are a never-ending
 stream of overlapping locks) and so the truncation cannot be done is a bit
 drastic, especially for inclusion in a point release.

It's worse than that, it breaks manual VACUUM ANALYZE too (as per -admin
complaint).  I think this aspect is completely wrong, whether or not
you consider that dropping the exclusive lock early is sane for manual
vacuum.  If we were told to do an analyze, we should press on and do it.

Thoughts?

regards, tom lane




Re: [HACKERS] (auto)vacuum truncate exclusive lock

2013-04-11 Thread Kevin Grittner
Jeff Janes jeff.ja...@gmail.com wrote:

 I guess I'm a couple releases late to review the autovacuum
 truncate exclusive lock patch (a79ae0bc0d454b9f2c95a), but this
 patch did not only affect autovac, it affects manual vacuum as
 well (as did the original behavior it is a modification of).  So
 the compiler constants are misnamed, and the elog message when it
 triggers is misleading.  (Is it also misleading to just say
 "vacuum"?  Does it need to say "(auto)vacuum"?)

I think it should because I don't think this heuristic is
appropriate for an explicit VACUUM command.  It's one thing for
autovacuum to leave the file sizes alone if it can't truncate
without blocking foreground processes; it's another for a
foreground VACUUM request to not do all of the work it was expected
to do.

 I also log the number of pages truncated at the time it gave up,
 as it would be nice to know if it is completely starving or
 making some progress.

If we're going to have the message, we should make it useful.  My
biggest question here is not whether we should add this info, but
whether it should be DEBUG instead of LOG.

 Also, I think that permanently boycotting doing autoanalyze
 because someone is camping out on an access share lock (or
 because there are a never-ending stream of overlapping locks) and
 so the truncation cannot be done is a bit drastic, especially for
 inclusion in a point release.

That much is not a change in the event that the truncation does not
complete.  I have seen cases where the old logic head-banged for
hours or days without succeeding at the truncation attempt in
autovacuum, absolutely killing performance until the user ran an
explicit VACUUM.  And in the meantime, since the deadlock detection
logic was killing autovacuum before it got to the analyze phase,
the autoanalyze was never done.

Perhaps the new logic should go ahead and get its lock even on a
busy system (like the old logic), and bail out after some
incremental progress.  Based on the current reports, it seems
likely that at least some workloads are not allowing the current
code to get a toe-hold.

 Is there a way to have the autoanalyze happen, but then still
 arrange for the autovacuum to be triggered again next naptime?  

I suggested that myself, but that was a bigger change from the old
behavior (at least if we allow that initial lock so we guarantee
some progress, per the above), and I didn't think that should be
back-patched.  A separate patch for just that, applied to
master/HEAD, seems like a good idea to me.

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] (auto)vacuum truncate exclusive lock

2013-04-11 Thread Jeff Janes
On Thursday, April 11, 2013, Tom Lane wrote:

 Jeff Janes jeff.ja...@gmail.com writes:
  I guess I'm a couple releases late to review the autovacuum truncate
  exclusive lock patch (a79ae0bc0d454b9f2c95a), but this patch did not
 only
  affect autovac, it affects manual vacuum as well (as did the original
  behavior it is a modification of).  So the compiler constants are
 misnamed,
  and the elog message when it triggers is misleading.  (Is it also
  misleading to just say "vacuum"?  Does it need to say "(auto)vacuum"?)

 I just came to look at this via a complaint in pgsql-admin.  I'm not
 convinced that we should consider the new behavior to be sane.
 Automatic exclusive-lock abandonment makes sense for autovacuum, but
 when the user has told us to vacuum, ISTM we should do it.  I can see
 there might be differing opinions on that though.


It does do the vacuum (so that internal space can be re-used, etc.); it is
the truncation of > 8MB of terminal free space that it may or may not do.

In this regard, the old behavior of manual vacuum was not all that
consistent, either.  If it could not get the lock immediately, it would
abandon the attempt at truncation (but still do the analyze, unlike now).
But if the lock was immediately available, it would take it and hold it
for as long as it took.  It is an odd mixture of courtesy and
aggressiveness to not wait for a lock that is unavailable, yet to cling
to it once obtained (justified by the ease of implementation, I guess).

I think the manual vacuum should try to obtain the lock for 5 seconds, as
in the current behavior.  Whether once obtained it should cling to it, or
surrender it, I don't really know.

  Also, I think that permanently boycotting doing autoanalyze because
 someone
  is camping out on an access share lock (or because there are a
 never-ending
  stream of overlapping locks) and so the truncation cannot be done is a
 bit
  drastic, especially for inclusion in a point release.

 It's worse than that, it breaks manual VACUUM ANALYZE too (as per -admin
 complaint).  I think this aspect is completely wrong, whether or not
 you consider that dropping the exclusive lock early is sane for manual
 vacuum.  If we were told to do an analyze, we should press on and do it.

 Thoughts?


I don't think auto-analyze should get cancelled either.

Cheers,

Jeff