Re: [GENERAL] SELECT ... FOR UPDATE performance costs? alternatives?

2007-08-16 Thread Douglas McNaught
D. Dante Lorenso [EMAIL PROTECTED] writes:

 I need logic like atomic test and set or pop 1 item off the queue
 atomically and tell me what that item was.

 In my situation, there are a dozen or so machines polling this queue
 periodically looking for work to do.  As more polling is occurring,
 the locks seem to be taking longer so I was worried table-level locks
 might be occurring.

How quickly after you update the row status are you comitting (and
releasing locks)?  I have apps that basically do:

SELECT id FROM job_table WHERE status = 'New' FOR UPDATE;
UPDATE job_table SET status = 'Processing' WHERE id IN (set of IDs);
COMMIT; -- releases all locks

process each job in the list we got and update its status

This has worked very well for me.

-Doug

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] SELECT ... FOR UPDATE performance costs? alternatives?

2007-08-16 Thread Erik Jones


On Aug 15, 2007, at 9:21 PM, D. Dante Lorenso wrote:


Erik Jones wrote:

On Aug 15, 2007, at 2:39 PM, [EMAIL PROTECTED] wrote:

Erik Jones wrote:

On Aug 15, 2007, at 1:09 PM, D. Dante Lorenso wrote:

...to ensure that only one server is processing the queue item,  
so inside PGSQL, use SELECT ... FOR UPDATE to lock the row...
When my server is under severe load, however, this function  
begins to take a long time to execute and I begin to suspect  
that the FOR UPDATE lock might be locking the whole table and  
not just the row.  How do I know if this is the case, how can I  
optimize this procedure, and how should I be doing this  
differently?  ...


Thoughts?


SELECT ... FOR UPDATE should only be locking the rows returned  
by your the select statement, in this case the one row.  You can  
check what locks exist on a table (and their type) with the  
pg_locks system view.




Is that correct? Documentation section 12.3.1. Table-Level Locks  
states 'The list below shows the available lock modes ...Remember  
that all of these lock modes are table-level locks, even if the  
name contains the word row'.
You will notice that SELECT ... FOR UPDATE is not in that list.   
It's covered in the next section on row level locks.


I wonder why bother with the SELECT statement at all. Why not  
just go straight to the UPDATE statement with something like


   UPDATE queue SET
 status = in_new_status,
 ts_start = NOW(),
 ts_end = NULL,
 hostname = COALESCE(in_hostname, hostname)
   WHERE tcq_id = (SELECT tcq_id  FROM queue q WHERE q.status =  
in_status ORDER BY tcq_id ASC LIMIT 1);


He may need to trap an exception for the not found case, but  
what's the big deal with that?


UPDATE statements acquire a ROW EXCLUSIVE on the table, which  
conflicts, among other things, with ROW EXCLUSIVE, so it will  
block other UPDATE statements initiated by other transactions.
That won't work because the update won't lock the row until the  
select returns.  So, if two process execute that at the same time  
they will both execute the subquery and return the same result,  
the first will update it and the second will then (redundantly)  
update it.


It also won't work because I need to change AND read the row.  If I  
only do the update, I don't know what was updated.  I still need to  
return the tcq_id to my application.


Maybe the update could look like this:

UPDATE queue SET
  status = in_new_status,
  ts_start = NOW(),
  ts_end = NULL,
  hostname = COALESCE(in_hostname, hostname),
WHERE status = in_status;

But there I don't have the LIMIT 1, and I also don't know which  
rows got updated.  I supposed there might be some magic to find the  
OID of the affected rows, but I don't know how what would be done.


I need logic like atomic test and set or pop 1 item off the queue  
atomically and tell me what that item was.


If you're using 8.2.x there is the RETURNING clause that can be used  
with UPDATE and INSERT queries.  So, you could alter his version to  
add the FOR UPDATE to the subquery and tack on the RETURNING clause  
to the UPDATE and you'd have semantically equivalent.




In my situation, there are a dozen or so machines polling this  
queue periodically looking for work to do.  As more polling is  
occurring, the locks seem to be taking longer so I was worried  
table-level locks might be occurring.


As I said, watch pg_locks.  Also, make sure that this function call  
isn't happening inside a larger transaction that's not committing  
right away when the function returns.  The lock from the SELECT ...  
FOR UPDATE isn't released until the enclosing transaction commits or  
rolls back.


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate  market in style.
Visit us online at http://www.myemma.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] SELECT ... FOR UPDATE performance costs? alternatives?

2007-08-16 Thread D. Dante Lorenso

Douglas McNaught wrote:

D. Dante Lorenso [EMAIL PROTECTED] writes:
How quickly after you update the row status are you comitting (and
releasing locks)?


I am calling a stored proc from PHP.  Since I do not begin a 
transaction, I assume that my call is automatically committed 
immediately after invocation.


SELECT reserve_next_tcqueue(?, ?, ?) AS result


SELECT id FROM job_table WHERE status = 'New' FOR UPDATE;
UPDATE job_table SET status = 'Processing' WHERE id IN (set of IDs);
COMMIT; -- releases all locks
process each job in the list we got and update its status
This has worked very well for me.


Yes, this does work well for me also most of the time.  It is only when 
the database server begins to suffer from severe load (like 3+) that 
PostgreSQL begins to log the reserve_next_tcqueue(...) queries as taking 
a long time to complete.  Here are some examples:


...

Aug 13 16:00:42 shed03 postgres[20264]: [5-2]  reserve_next_tcqueue($1, 
$2, $3) AS tcq_id]
Aug 13 16:00:53 shed03 postgres[17338]: [5-1] 17338 dbxxx 10.10.20.163 
LOG:  duration: 3159.208 ms  statement: EXECUTE unnamed  [PREPARE:  SELECT


Aug 13 16:00:54 shed03 postgres[20447]: [5-2]  reserve_next_tcqueue($1, 
$2, $3) AS tcq_id]
Aug 13 16:00:54 shed03 postgres[20470]: [5-1] 20470 dbxxx 10.10.20.51 
LOG:  duration: 4162.031 ms  statement: EXECUTE unnamed  [PREPARE:  SELECT


Aug 13 16:00:54 shed03 postgres[20470]: [5-2]  reserve_next_tcqueue($1, 
$2, $3) AS tcq_id]
Aug 13 16:00:59 shed03 postgres[20530]: [5-1] 20530 dbxxx 10.10.20.51 
LOG:  duration: 3672.077 ms  statement: EXECUTE unnamed  [PREPARE:  SELECT


...

-- Dante

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] SELECT ... FOR UPDATE performance costs? alternatives?

2007-08-16 Thread Tom Lane
D. Dante Lorenso [EMAIL PROTECTED] writes:
 Douglas McNaught wrote:
 How quickly after you update the row status are you comitting (and
 releasing locks)?

 I am calling a stored proc from PHP.  Since I do not begin a 
 transaction, I assume that my call is automatically committed 
 immediately after invocation.

Have you actually verified that, or are you just assuming it?
I believe that PHP has some sort of autocommit on/off option,
which might possibly be doing things behind your back.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] SELECT ... FOR UPDATE performance costs? alternatives?

2007-08-16 Thread D. Dante Lorenso

Tom Lane wrote:

D. Dante Lorenso [EMAIL PROTECTED] writes:

Douglas McNaught wrote:

How quickly after you update the row status are you comitting (and
releasing locks)?


I am calling a stored proc from PHP.  Since I do not begin a 
transaction, I assume that my call is automatically committed 
immediately after invocation.


Have you actually verified that, or are you just assuming it?


Just assuming.  I'm not really sure HOW to verify it, though.


I believe that PHP has some sort of autocommit on/off option,
which might possibly be doing things behind your back.


I am using PHP / PDO and all my statements are prepared.  Sometimes I 
will begin a transaction using PDO and do either commit or rollback, but 
I don't explicitly use transactions if I don't intend to exec more than 
one statement.  Here is what PHP says about auto-commit in PDO:


 8  8  8 
http://www.php.net/manual/en/ref.pdo.php
Unfortunately, not every database supports transactions, so PDO needs to 
run in what is known as auto-commit mode when you first open the 
connection. Auto-commit mode means that every query that you run has its 
own implicit transaction, if the database supports it, or no transaction 
if the database doesn't support transactions.


If you need a transaction, you must use the PDO-beginTransaction() 
method to initiate one. If the underlying driver does not support 
transactions, a PDOException will be thrown (regardless of your error 
handling settings: this is always a serious error condition). Once you 
are in a transaction, you may use PDO-commit() or PDO-rollBack() to 
finish it, depending on the success of the code you run during the 
transaction.

 8  8  8 

So, I feel safe enough with my assumption.  I'm not entirely sure about 
the stored procedure, though.  I've recently rewritten the procedure as 
separate queries, but don't know if that will help until I hit a high 
peak load again.


-- Dante


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] SELECT ... FOR UPDATE performance costs? alternatives?

2007-08-16 Thread Tom Lane
D. Dante Lorenso [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 D. Dante Lorenso [EMAIL PROTECTED] writes:
 I am calling a stored proc from PHP.  Since I do not begin a 
 transaction, I assume that my call is automatically committed 
 immediately after invocation.
 
 Have you actually verified that, or are you just assuming it?

 Just assuming.  I'm not really sure HOW to verify it, though.

Enable query logging on the server and look for BEGIN commands?

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


[GENERAL] SELECT ... FOR UPDATE performance costs? alternatives?

2007-08-15 Thread D. Dante Lorenso

All,

I have a stored procedure that I use to manage a queue.  I want to pop 
an item off the queue to ensure that only one server is processing the 
queue item, so inside PGSQL, use SELECT ... FOR UPDATE to lock the row. 
 Here's how I pop the queue item:


--
CREATE OR REPLACE FUNCTION public.reserve_next_tcqueue (in_hostname 
varchar, in_status char, in_new_status char) RETURNS bigint AS

$body$
DECLARE
my_reserved_id BIGINT;
BEGIN
 /* find and lock a row with the indicated status */
 SELECT tcq_id
 INTO my_reserved_id
 FROM queue q
 WHERE q.status = in_status
 ORDER BY tcq_id ASC
 LIMIT 1
 FOR UPDATE;

/* we didn't find anything matching */
IF NOT FOUND THEN
RETURN 0;
END IF;

/* change the status to the new status */
UPDATE queue SET
  status = in_new_status,
  ts_start = NOW(),
  ts_end = NULL,
  hostname = COALESCE(in_hostname, hostname)
WHERE tcq_id = my_reserved_id;

/* send back our reserved ID */
RETURN my_reserved_id;
END;
$body$
LANGUAGE 'plpgsql' VOLATILE RETURNS NULL ON NULL INPUT SECURITY INVOKER;
--

When my server is under severe load, however, this function begins to 
take a long time to execute and I begin to suspect that the FOR UPDATE 
lock might be locking the whole table and not just the row.


How do I know if this is the case, how can I optimize this procedure, 
and how should I be doing this differently?  I'm guessing stacks and 
queues would be common patterns handled in the PostgreSQL community.


Thoughts?

-- Dante

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] SELECT ... FOR UPDATE performance costs? alternatives?

2007-08-15 Thread Erik Jones

On Aug 15, 2007, at 1:09 PM, D. Dante Lorenso wrote:


All,

I have a stored procedure that I use to manage a queue.  I want to  
pop an item off the queue to ensure that only one server is  
processing the queue item, so inside PGSQL, use SELECT ... FOR  
UPDATE to lock the row.  Here's how I pop the queue item:


--
CREATE OR REPLACE FUNCTION  
public.reserve_next_tcqueue (in_hostname varchar, in_status  
char, in_new_status char) RETURNS bigint AS

$body$
DECLARE
my_reserved_id BIGINT;
BEGIN
 /* find and lock a row with the indicated status */
 SELECT tcq_id
 INTO my_reserved_id
 FROM queue q
 WHERE q.status = in_status
 ORDER BY tcq_id ASC
 LIMIT 1
 FOR UPDATE;

/* we didn't find anything matching */
IF NOT FOUND THEN
RETURN 0;
END IF;

/* change the status to the new status */
UPDATE queue SET
  status = in_new_status,
  ts_start = NOW(),
  ts_end = NULL,
  hostname = COALESCE(in_hostname, hostname)
WHERE tcq_id = my_reserved_id;

/* send back our reserved ID */
RETURN my_reserved_id;
END;
$body$
LANGUAGE 'plpgsql' VOLATILE RETURNS NULL ON NULL INPUT SECURITY  
INVOKER;

--

When my server is under severe load, however, this function begins  
to take a long time to execute and I begin to suspect that the FOR  
UPDATE lock might be locking the whole table and not just the row.


How do I know if this is the case, how can I optimize this  
procedure, and how should I be doing this differently?  I'm  
guessing stacks and queues would be common patterns handled in the  
PostgreSQL community.


Thoughts?


SELECT ... FOR UPDATE should only be locking the rows returned by  
your the select statement, in this case the one row.  You can check  
what locks exist on a table (and their type) with the pg_locks system  
view.


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate  market in style.
Visit us online at http://www.myemma.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] SELECT ... FOR UPDATE performance costs? alternatives?

2007-08-15 Thread btober

Erik Jones wrote:

On Aug 15, 2007, at 1:09 PM, D. Dante Lorenso wrote:

...to ensure that only one server is processing the queue item, so 
inside PGSQL, use SELECT ... FOR UPDATE to lock the row... 

When my server is under severe load, however, this function begins to 
take a long time to execute and I begin to suspect that the FOR 
UPDATE lock might be locking the whole table and not just the row.  
How do I know if this is the case, how can I optimize this procedure, 
and how should I be doing this differently?  ...


Thoughts?


SELECT ... FOR UPDATE should only be locking the rows returned by your 
the select statement, in this case the one row.  You can check what 
locks exist on a table (and their type) with the pg_locks system view.




Is that correct? Documentation section 12.3.1. Table-Level Locks states 
'The list below shows the available lock modes ...Remember that all of 
these lock modes are table-level locks, even if the name contains the 
word row'.


I wonder why bother with the SELECT statement at all. Why not just go 
straight to the UPDATE statement with something like


   UPDATE queue SET
 status = in_new_status,
 ts_start = NOW(),
 ts_end = NULL,
 hostname = COALESCE(in_hostname, hostname)
   WHERE tcq_id = (SELECT tcq_id  FROM queue q WHERE q.status = 
in_status ORDER BY tcq_id ASC LIMIT 1);


He may need to trap an exception for the not found case, but what's 
the big deal with that?


UPDATE statements acquire a ROW EXCLUSIVE on the table, which conflicts, 
among other things, with ROW EXCLUSIVE, so it will block other UPDATE 
statements initiated by other transactions.




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] SELECT ... FOR UPDATE performance costs? alternatives?

2007-08-15 Thread Erik Jones


On Aug 15, 2007, at 2:39 PM, [EMAIL PROTECTED] wrote:


Erik Jones wrote:

On Aug 15, 2007, at 1:09 PM, D. Dante Lorenso wrote:

...to ensure that only one server is processing the queue item,  
so inside PGSQL, use SELECT ... FOR UPDATE to lock the row...
When my server is under severe load, however, this function  
begins to take a long time to execute and I begin to suspect that  
the FOR UPDATE lock might be locking the whole table and not just  
the row.  How do I know if this is the case, how can I optimize  
this procedure, and how should I be doing this differently?  ...


Thoughts?


SELECT ... FOR UPDATE should only be locking the rows returned by  
your the select statement, in this case the one row.  You can  
check what locks exist on a table (and their type) with the  
pg_locks system view.




Is that correct? Documentation section 12.3.1. Table-Level Locks  
states 'The list below shows the available lock modes ...Remember  
that all of these lock modes are table-level locks, even if the  
name contains the word row'.


You will notice that SELECT ... FOR UPDATE is not in that list.  It's  
covered in the next section on row level locks.


I wonder why bother with the SELECT statement at all. Why not just  
go straight to the UPDATE statement with something like


   UPDATE queue SET
 status = in_new_status,
 ts_start = NOW(),
 ts_end = NULL,
 hostname = COALESCE(in_hostname, hostname)
   WHERE tcq_id = (SELECT tcq_id  FROM queue q WHERE q.status =  
in_status ORDER BY tcq_id ASC LIMIT 1);


He may need to trap an exception for the not found case, but  
what's the big deal with that?


UPDATE statements acquire a ROW EXCLUSIVE on the table, which  
conflicts, among other things, with ROW EXCLUSIVE, so it will block  
other UPDATE statements initiated by other transactions.


That won't work because the update won't lock the row until the  
select returns.  So, if two process execute that at the same time  
they will both execute the subquery and return the same result, the  
first will update it and the second will then (redundantly) update it.


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate  market in style.
Visit us online at http://www.myemma.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] SELECT ... FOR UPDATE performance costs? alternatives?

2007-08-15 Thread D. Dante Lorenso

Erik Jones wrote:


On Aug 15, 2007, at 2:39 PM, [EMAIL PROTECTED] wrote:


Erik Jones wrote:

On Aug 15, 2007, at 1:09 PM, D. Dante Lorenso wrote:

...to ensure that only one server is processing the queue item, so 
inside PGSQL, use SELECT ... FOR UPDATE to lock the row...
When my server is under severe load, however, this function begins 
to take a long time to execute and I begin to suspect that the FOR 
UPDATE lock might be locking the whole table and not just the row.  
How do I know if this is the case, how can I optimize this 
procedure, and how should I be doing this differently?  ...


Thoughts?


SELECT ... FOR UPDATE should only be locking the rows returned by 
your the select statement, in this case the one row.  You can check 
what locks exist on a table (and their type) with the pg_locks system 
view.




Is that correct? Documentation section 12.3.1. Table-Level Locks 
states 'The list below shows the available lock modes ...Remember that 
all of these lock modes are table-level locks, even if the name 
contains the word row'.


You will notice that SELECT ... FOR UPDATE is not in that list.  It's 
covered in the next section on row level locks.


I wonder why bother with the SELECT statement at all. Why not just go 
straight to the UPDATE statement with something like


   UPDATE queue SET
 status = in_new_status,
 ts_start = NOW(),
 ts_end = NULL,
 hostname = COALESCE(in_hostname, hostname)
   WHERE tcq_id = (SELECT tcq_id  FROM queue q WHERE q.status = 
in_status ORDER BY tcq_id ASC LIMIT 1);


He may need to trap an exception for the not found case, but what's 
the big deal with that?


UPDATE statements acquire a ROW EXCLUSIVE on the table, which 
conflicts, among other things, with ROW EXCLUSIVE, so it will block 
other UPDATE statements initiated by other transactions.


That won't work because the update won't lock the row until the select 
returns.  So, if two process execute that at the same time they will 
both execute the subquery and return the same result, the first will 
update it and the second will then (redundantly) update it.


It also won't work because I need to change AND read the row.  If I only 
do the update, I don't know what was updated.  I still need to return 
the tcq_id to my application.


Maybe the update could look like this:

UPDATE queue SET
  status = in_new_status,
  ts_start = NOW(),
  ts_end = NULL,
  hostname = COALESCE(in_hostname, hostname),
WHERE status = in_status;

But there I don't have the LIMIT 1, and I also don't know which rows got 
updated.  I supposed there might be some magic to find the OID of the 
affected rows, but I don't know how what would be done.


I need logic like atomic test and set or pop 1 item off the queue 
atomically and tell me what that item was.


In my situation, there are a dozen or so machines polling this queue 
periodically looking for work to do.  As more polling is occurring, the 
locks seem to be taking longer so I was worried table-level locks might 
be occurring.


-- Dante


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] SELECT ... FOR UPDATE performance costs? alternatives?

2007-08-15 Thread Tom Lane
D. Dante Lorenso [EMAIL PROTECTED] writes:
 ... When my server is under severe load, however, this function begins to 
 take a long time to execute and I begin to suspect that the FOR UPDATE 
 lock might be locking the whole table and not just the row.

FOR UPDATE only locks the selected row(s).  However, the way you've got
this coded, every incoming server will try to select the same row, which
means that whichever one gets there first will block all the others
until it commits (and releases its row lock).  Are you careful to commit
immediately after making this update?

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend