Re: [HACKERS] lock_timeout GUC patch - Review

2010-08-02 Thread Boszormenyi Zoltan
Marc Cousin írta:
 The Thursday 29 July 2010 13:55:38, Boszormenyi Zoltan wrote :
   
 I fixed this by adding CheckLockTimeout() function that works like
 CheckStatementTimeout() and ensuring that the same start time is
 used for both deadlock_timeout and lock_timeout if both are active.
 The preference of errors if their timeout values are equal is:
 statement_timeout  lock_timeout  deadlock_timeout
 

 As soon as lock_timeout is bigger than deadlock_timeout, it doesn't
 work, with this new version.

 Keeping the deadlock_timeout to 1s, when lock_timeout = 1001, 
 lock_timeout doesn't trigger anymore.
   

I missed one case when the lock_timeout_active should have been set
but the timer must not have been re-set, this caused the problem.
I blame the hot weather and having no air conditioning. The second is
now fixed. :-)

I also added one line in autovacuum.c to disable lock_timeout,
in case it's globally set in postgresq.conf as per Alvaro's comment.

Also, I made sure that only one or two timeout causes (one of
deadlock_timeout
and lock_timeout in the first case or statement_timeout plus one of the
other two)
can be active at a time. Previously I was able to trigger a segfault
with the default
1sec deadlock_timeout and lock_timeout = 999 or 1001. Effectively, the
system's
clock resolution makes the lock_timeout and deadlock_timeout equal and
RemoveFromWaitQueue() was called twice. This way it's a lot more robust.

Best regards,
Zoltán Böszörményi

diff -dcrpN pgsql.orig/doc/src/sgml/config.sgml pgsql/doc/src/sgml/config.sgml
*** pgsql.orig/doc/src/sgml/config.sgml	2010-07-26 10:05:37.0 +0200
--- pgsql/doc/src/sgml/config.sgml	2010-07-29 11:58:56.0 +0200
*** COPY postgres_log FROM '/full/path/to/lo
*** 4479,4484 
--- 4479,4508 
/listitem
   /varlistentry
  
+  varlistentry id=guc-lock-timeout xreflabel=lock_timeout
+   termvarnamelock_timeout/varname (typeinteger/type)/term
+   indexterm
+primaryvarnamelock_timeout/ configuration parameter/primary
+   /indexterm
+   listitem
+para
+ Abort any statement that tries to acquire a heavy-weight lock (e.g. rows,
+ pages, tables, indices or other objects) and the lock has to wait more
+ than the specified number of milliseconds, starting from the time the
+ command arrives at the server from the client.
+ If varnamelog_min_error_statement/ is set to literalERROR/ or lower,
+ the statement that timed out will also be logged. A value of zero
+ (the default) turns off the limitation.
+/para
+ 
+para
+ Setting varnamelock_timeout/ in
+ filenamepostgresql.conf/ is not recommended because it
+ affects all sessions.
+/para  
+   /listitem   
+  /varlistentry
+ 
   varlistentry id=guc-vacuum-freeze-table-age xreflabel=vacuum_freeze_table_age
termvarnamevacuum_freeze_table_age/varname (typeinteger/type)/term
indexterm
diff -dcrpN pgsql.orig/doc/src/sgml/ref/lock.sgml pgsql/doc/src/sgml/ref/lock.sgml
*** pgsql.orig/doc/src/sgml/ref/lock.sgml	2010-04-03 09:23:01.0 +0200
--- pgsql/doc/src/sgml/ref/lock.sgml	2010-07-29 11:58:56.0 +0200
*** LOCK [ TABLE ] [ ONLY ] replaceable cla
*** 39,46 
 literalNOWAIT/literal is specified, commandLOCK
 TABLE/command does not wait to acquire the desired lock: if it
 cannot be acquired immediately, the command is aborted and an
!error is emitted.  Once obtained, the lock is held for the
!remainder of the current transaction.  (There is no commandUNLOCK
 TABLE/command command; locks are always released at transaction
 end.)
/para
--- 39,49 
 literalNOWAIT/literal is specified, commandLOCK
 TABLE/command does not wait to acquire the desired lock: if it
 cannot be acquired immediately, the command is aborted and an
!error is emitted. If varnamelock_timeout/varname is set to a value
!higher than 0, and the lock cannot be acquired under the specified
!timeout value in milliseconds, the command is aborted and an error
!is emitted. Once obtained, the lock is held for the remainder of  
!the current transaction.  (There is no commandUNLOCK
 TABLE/command command; locks are always released at transaction
 end.)
/para
diff -dcrpN pgsql.orig/doc/src/sgml/ref/select.sgml pgsql/doc/src/sgml/ref/select.sgml
*** pgsql.orig/doc/src/sgml/ref/select.sgml	2010-06-20 13:59:13.0 +0200
--- pgsql/doc/src/sgml/ref/select.sgml	2010-07-29 11:58:56.0 +0200
*** FOR SHARE [ OF replaceable class=param
*** 1160,1165 
--- 1160,1173 
 /para
  
 para
+ If literalNOWAIT/ option is not specified and varnamelock_timeout/varname
+ is set to a value higher than 0, and the lock needs to wait more than
+ the specified value in milliseconds, the command reports an error after
+ timing out, 

Re: [HACKERS] lock_timeout GUC patch - Review

2010-08-02 Thread Boszormenyi Zoltan
Boszormenyi Zoltan írta:
 Marc Cousin írta:
   
 The Thursday 29 July 2010 13:55:38, Boszormenyi Zoltan wrote :
   
 
 I fixed this by adding CheckLockTimeout() function that works like
 CheckStatementTimeout() and ensuring that the same start time is
 used for both deadlock_timeout and lock_timeout if both are active.
 The preference of errors if their timeout values are equal is:
 statement_timeout  lock_timeout  deadlock_timeout
 
   
 As soon as lock_timeout is bigger than deadlock_timeout, it doesn't
 work, with this new version.

 Keeping the deadlock_timeout to 1s, when lock_timeout = 1001, 
 lock_timeout doesn't trigger anymore.
   
 

 I missed one case when the lock_timeout_active should have been set
 but the timer must not have been re-set, this caused the problem.
 I blame the hot weather and having no air conditioning. The second is
 now fixed. :-)

 I also added one line in autovacuum.c to disable lock_timeout,
 in case it's globally set in postgresq.conf as per Alvaro's comment.

 Also, I made sure that only one or two timeout causes (one of
 deadlock_timeout
 and lock_timeout in the first case or statement_timeout plus one of the
 other two)
 can be active at a time.

A little clarification is needed. The above statement is not entirely true.
There can be a case when all three timeout causes can be active, but only
when deadlock_timeout is the smallest of the three. If the fin_time value
for the another timeout cause is smaller than for deadlock_timeout then
there's no point to make deadlock_timeout active. And in case
deadlock_timeout
triggers and CheckDeadLock() finds that there really is a deadlock then the
possibly active lock_timeout gets disabled to avoid calling
RemoveFromWaitQueue() twice. I hope the comments in the code explain it
well.

  Previously I was able to trigger a segfault
 with the default
 1sec deadlock_timeout and lock_timeout = 999 or 1001. Effectively, the
 system's
 clock resolution makes the lock_timeout and deadlock_timeout equal and
 RemoveFromWaitQueue() was called twice. This way it's a lot more robust.
   

Best regards,
Zoltán Böszörményi


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lock_timeout GUC patch - Review

2010-08-02 Thread Marc Cousin
The Monday 02 August 2010 13:59:59, Boszormenyi Zoltan wrote :
 
  Also, I made sure that only one or two timeout causes (one of
  deadlock_timeout
  and lock_timeout in the first case or statement_timeout plus one of the
  other two)
  can be active at a time.
 
 A little clarification is needed. The above statement is not entirely true.
 There can be a case when all three timeout causes can be active, but only
 when deadlock_timeout is the smallest of the three. If the fin_time value
 for the another timeout cause is smaller than for deadlock_timeout then
 there's no point to make deadlock_timeout active. And in case
 deadlock_timeout
 triggers and CheckDeadLock() finds that there really is a deadlock then the
 possibly active lock_timeout gets disabled to avoid calling
 RemoveFromWaitQueue() twice. I hope the comments in the code explain it
 well.
 
   Previously I was able to trigger a segfault
  with the default
  1sec deadlock_timeout and lock_timeout = 999 or 1001. Effectively, the
  system's
  clock resolution makes the lock_timeout and deadlock_timeout equal and
  RemoveFromWaitQueue() was called twice. This way it's a lot more robust.

 
Ok, I've tested this new version:

This time, it's this case that doesn't work :

Session 1 : lock a table

Session 2 : set lock_timeout to a large value, just to make it obvious (10 
seconds).
It times out after 1 s (the deadlock timeout), returning 'could not obtain 
lock'.
Of course, it should wait 10 seconds.


I really feel that the timeout framework is the way to go here. I know what
you said about it before, but I think that the current code is getting
too complicated, with too many special cases all over.


As this is only my second review and I'm sure I am missing things here, could
someone with more experience have a look and give advice ?


Cheers

Marc

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lock_timeout GUC patch - Review

2010-08-02 Thread Kevin Grittner
Marc Cousin cousinm...@gmail.com wrote:
 
 This time, it's this case that doesn't work :
 
 I really feel that the timeout framework is the way to go here.
 
Since Zoltán also seems to feel this way:
 
http://archives.postgresql.org/message-id/4c516c3a.6090...@cybertec.at
 
I wonder whether this patch shouldn't be rejected with a request
that the timeout framework be submitted to the next CF.  Does anyone
feel this approach (without the framework) should be pursued
further?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lock_timeout GUC patch - Review

2010-08-02 Thread Boszormenyi Zoltan
Hi,

Kevin Grittner írta:
 Marc Cousin cousinm...@gmail.com wrote:
  
   
 This time, it's this case that doesn't work :
 
  
   
 I really feel that the timeout framework is the way to go here.
 
  
 Since Zoltán also seems to feel this way:
  
 http://archives.postgresql.org/message-id/4c516c3a.6090...@cybertec.at
  
 I wonder whether this patch shouldn't be rejected with a request
 that the timeout framework be submitted to the next CF.  Does anyone
 feel this approach (without the framework) should be pursued
 further?
   

I certainly think so, the current scheme seems to be very fragile
and doesn't want to be extended.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lock_timeout GUC patch - Review

2010-08-02 Thread Kevin Grittner
Boszormenyi Zoltan z...@cybertec.at wrote:
 Kevin Grittner írta:
 
 I wonder whether this patch shouldn't be rejected with a request
 that the timeout framework be submitted to the next CF.  Does
 anyone feel this approach (without the framework) should be
 pursued further?
 
 I certainly think so, the current scheme seems to be very fragile
 and doesn't want to be extended.
 
Sorry, I phrased that question in a rather confusing way; I'm not
sure, but it sounds like you're in favor of dropping this approach
and pursuing the timeout framework in the next CF -- is that right?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lock_timeout GUC patch - Review

2010-08-02 Thread Robert Haas
On Mon, Aug 2, 2010 at 3:09 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Marc Cousin cousinm...@gmail.com wrote:

 This time, it's this case that doesn't work :

 I really feel that the timeout framework is the way to go here.

 Since Zoltán also seems to feel this way:

 http://archives.postgresql.org/message-id/4c516c3a.6090...@cybertec.at

 I wonder whether this patch shouldn't be rejected with a request
 that the timeout framework be submitted to the next CF.  Does anyone
 feel this approach (without the framework) should be pursued
 further?

I think Returned with Feedback would be more appropriate than
Rejected, since we're asking for a rework, rather than saying - we
just don't want this.  But otherwise, +1.

Generally, I'm of the opinion that patches needing significant rework
should be set to Returned with Feedback and resubmitted for the next
CF; otherwise, it just gets unmanageable.  Our goal for a CF should be
to review everything thoroughly, not to get everything committed.
Otherwise, we end up with a never-ending train of what are effectively
new patches popping up, and it becomes impossible to close out the
CommitFest on time.  Reviewers and committers end up getting slammed,
and it's not very much fun for patch authors (who are trying to
frantically do last-minute rewrites) either; nor is it good for the
overall quality of code the ends up in our tree.  Better to take a
breather and then start fresh.

(If you don't believe in committer fatigue, look at the review I gave
Itagaki Takahiro on the partitioning patch in January vs. the review I
gave in July.  One of those reviews is a whole lot more specific,
detailed, and accurate than the other one...)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lock_timeout GUC patch - Review

2010-08-02 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 I wonder whether this patch shouldn't be rejected with a request
 that the timeout framework be submitted to the next CF.
 
 I think Returned with Feedback would be more appropriate than
 Rejected, since we're asking for a rework, rather than saying -
 we just don't want this.  But otherwise, +1.
 
Done.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lock_timeout GUC patch - Review

2010-07-30 Thread Marc Cousin
The Thursday 29 July 2010 13:55:38, Boszormenyi Zoltan wrote :
 I fixed this by adding CheckLockTimeout() function that works like
 CheckStatementTimeout() and ensuring that the same start time is
 used for both deadlock_timeout and lock_timeout if both are active.
 The preference of errors if their timeout values are equal is:
 statement_timeout  lock_timeout  deadlock_timeout

As soon as lock_timeout is bigger than deadlock_timeout, it doesn't
work, with this new version.

Keeping the deadlock_timeout to 1s, when lock_timeout = 1001, 
lock_timeout doesn't trigger anymore.


 
  * Consider the changes to the code in the context of the project as a whole:
* Is everything done in a way that fits together coherently with
  other features/modules?
  I have a feeling that
  enable_sig_alarm/enable_sig_alarm_for_lock_timeout tries to solve a
  very specific problem, and it gets complicated because there is no
  infrastructure in the code to handle several timeouts at the same time
  with sigalarm, so each timeout has its own dedicated and intertwined
  code. But I'm still discovering this part of the code.

 

 This WIP patch is also attached for reference, too. I would prefer
 this way, but I don't have more time to work on it and there are some
 interdependencies in the signal handler when e.g. disable_sig_alarm(true);
 means to disable ALL timers not just the statement_timeout.
 The specifically coded lock_timeout patch goes with the flow and doesn't
 change the semantics and works. If someone wants to pick up the timer
 framework patch and can make it work, fine. But I am not explicitely
 submitting it for the commitfest. The original patch with the fixes works
 and needs only a little more review.

Ok, understood. But I like the principle of this framework much more (the rest 
of the code seems simpler to me as a consequence of this framework).

But it goes far beyond the initial intent of the patch.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lock_timeout GUC patch - Review

2010-07-30 Thread Alvaro Herrera
Excerpts from Boszormenyi Zoltan's message of jue jul 29 07:55:38 -0400 2010:
 Hi,
 
 Marc Cousin írta:
  Hi, I've been reviewing this patch for the last few days. Here it is :

 ...
* Are there dangers?
  As it is a guc, it could be set globally, is that a danger ?

 
 I haven't added any new code covering supplemental (e.g. autovacuum)
 processes,
 the interactions are yet to be discovered. Setting it globally is not
 recommended.

FWIW there is some code in autovacuum and other auxiliary processes that
forcibly sets statement_timeout to 0.  I think this patch should do
likewise.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lock_timeout GUC patch - Review

2010-07-23 Thread zb
Hi,


first, thanks for the review.

 Hi, I've been reviewing this patch for the last few days. Here it is :

 * Submission review
   * Is the patch in context diff format?
 Yes

   * Does it apply cleanly to the current CVS HEAD?
 Yes

   * Does it include reasonable tests, necessary doc patches, etc?
 Doc patches are there.
 There are no regression tests. Should there be ?

IIRC, there was a discussion/patch about parallel psql that can
hold more than one connections open. With that feature, a regression
test can be added. Reading the 9.0beta3 docs, it's not there
and this patch is not on the current commitfest either.
Is there anyone who knows the status of this feature?

 * Usability review
   * Does the patch actually implement that?
 Yes

   * Do we want that?
 I think so. At least I'd like to have this feature :)

:-)

   * Do we already have it?
 No

   * Does it follow SQL spec, or the community-agreed behavior?
 I didn't see a clear conclusion from the -hackers thread on this (GUC
 vs SQL statement extension)

   * Does it include pg_dump support (if applicable)?
 Not applicable. Or should pg_dump and/or pg_restore put lock_timeout to 0
 ?

   * Are there dangers?
 As it is a guc, it could be set globally, is that a danger ?

It could be set globally, but it will exhibit a new global behaviour.
Which is not unexpected. The problem may come from [auto]vacuum processes
can get stopped. Maybe others, too. A previous version contained a
checking function that refused to set it from postgresql.conf for this
reason, but it was frowned upon by Tom. :-) The proper fix would be
that every such processes should set this GUC to zero for them locally.

   * Have all the bases been covered?
 I honestly don't know. It touches alarm signal handling.

 * Apply the patch, compile it and test:
   * Does the feature work as advertised?
 I only tested it with Linux. The code is very OS-dependent. It works
 as advertised with Linux. I found only one corner case (see below)

The setitimer() function is implemented in backend/port/win32/timer.c,
so it's abstracted away. With that in mind, I think there's no
OS-dependent in this patch.

   * Are there corner cases the author has failed to consider?
 The feature almost works as advertised : it fails when lock_timeout =
 deadlock_timeout. Then the lock_timeout isn't detected. I think this
 case isn't considered in handle_sig_alarm().

I will look into this, thanks for spotting it.

   * Are there any assertion failures or crashes?
 No


 * Performance review
   * Does the patch slow down simple tests?
 No

   * If it claims to improve performance, does it?
 Not applicable

 * Does it slow down other things?
 No. Maybe alarm signal handling and enabling will be slower, as there
 is more work done there  (for instance, a GetCurrentTimestamp, that
 was only done when log_lock_waits was activated until now. But I
 couldn't measure a slowdown.

 * Read the changes to the code in detail and consider:
   * Does it follow the project coding guidelines?
 I think so

   * Are there portability issues?
 It seems to have already been adressed, from the previous discussion
 in the thread.

   * Will it work on Windows/BSD etc?
 It should. I couldn't test it though. Infrastructure is there.

 * Are the comments sufficient and accurate?
 Yes

 * Does it do what it says, correctly?
 Yes

 * Does it produce compiler warnings?
 No

 * Can you make it crash?
 No

 * Consider the changes to the code in the context of the project as a
 whole:
   * Is everything done in a way that fits together coherently with
 other features/modules?
 I have a feeling that
 enable_sig_alarm/enable_sig_alarm_for_lock_timeout tries to solve a
 very specific problem, and it gets complicated because there is no
 infrastructure in the code to handle several timeouts at the same time
 with sigalarm, so each timeout has its own dedicated and intertwined
 code. But I'm still discovering this part of the code.

There is a problem with setitimer(): only one timer can be alive
at one time with the same timer id (ITIMER_REAL).

   * Are there interdependencies that can cause problems?
 I don't think so.

Thanks for the review, I will post a new patch sometime next week.

Best regards,
Zoltán Böszörményi



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lock_timeout GUC patch - Review

2010-07-20 Thread Marc Cousin
Hi, I've been reviewing this patch for the last few days. Here it is :

* Submission review
  * Is the patch in context diff format?
Yes

  * Does it apply cleanly to the current CVS HEAD?
Yes

  * Does it include reasonable tests, necessary doc patches, etc?
Doc patches are there.
There are no regression tests. Should there be ?

* Usability review
  * Does the patch actually implement that?
Yes

  * Do we want that?
I think so. At least I'd like to have this feature :)

  * Do we already have it?
No

  * Does it follow SQL spec, or the community-agreed behavior?
I didn't see a clear conclusion from the -hackers thread on this (GUC
vs SQL statement extension)

  * Does it include pg_dump support (if applicable)?
Not applicable. Or should pg_dump and/or pg_restore put lock_timeout to 0 ?

  * Are there dangers?
As it is a guc, it could be set globally, is that a danger ?

  * Have all the bases been covered?
I honestly don't know. It touches alarm signal handling.

* Apply the patch, compile it and test:
  * Does the feature work as advertised?
I only tested it with Linux. The code is very OS-dependent. It works
as advertised with Linux. I found only one corner case (see below)

  * Are there corner cases the author has failed to consider?
The feature almost works as advertised : it fails when lock_timeout =
deadlock_timeout. Then the lock_timeout isn't detected. I think this
case isn't considered in handle_sig_alarm().

  * Are there any assertion failures or crashes?
No


* Performance review
  * Does the patch slow down simple tests?
No

  * If it claims to improve performance, does it?
Not applicable

* Does it slow down other things?
No. Maybe alarm signal handling and enabling will be slower, as there
is more work done there  (for instance, a GetCurrentTimestamp, that
was only done when log_lock_waits was activated until now. But I
couldn't measure a slowdown.

* Read the changes to the code in detail and consider:
  * Does it follow the project coding guidelines?
I think so

  * Are there portability issues?
It seems to have already been adressed, from the previous discussion
in the thread.

  * Will it work on Windows/BSD etc?
It should. I couldn't test it though. Infrastructure is there.

* Are the comments sufficient and accurate?
Yes

* Does it do what it says, correctly?
Yes

* Does it produce compiler warnings?
No

* Can you make it crash?
No

* Consider the changes to the code in the context of the project as a whole:
  * Is everything done in a way that fits together coherently with
other features/modules?
I have a feeling that
enable_sig_alarm/enable_sig_alarm_for_lock_timeout tries to solve a
very specific problem, and it gets complicated because there is no
infrastructure in the code to handle several timeouts at the same time
with sigalarm, so each timeout has its own dedicated and intertwined
code. But I'm still discovering this part of the code.

  * Are there interdependencies that can cause problems?
I don't think so.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers