Re: [PERFORM] maintaining a reference to a fetched row

2009-11-09 Thread Dimitri Fontaine
Brian Karlak  writes:
> I have a simple queuing application written on top of postgres which I'm
> trying to squeeze some more performance out of.

Have you tried to write a custom PGQ consumer yet?
  http://wiki.postgresql.org/wiki/PGQ_Tutorial
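
For reference, the SQL side of a PGQ consumer loop looks roughly like this
(function names are from the pgq schema installed by Skytools; the queue and
consumer names are made up for illustration):

```sql
-- One-time setup (names are illustrative):
SELECT pgq.create_queue('task_queue');
SELECT pgq.register_consumer('task_queue', 'worker_1');

-- Consumer loop, repeated by the daemon:
SELECT pgq.next_batch('task_queue', 'worker_1');   -- returns NULL if no work
SELECT * FROM pgq.get_batch_events(:batch_id);     -- process each event
SELECT pgq.finish_batch(:batch_id);                -- mark the batch done
```

See the tutorial above for the exact signatures and the ticker setup.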

Regards,
-- 
dim

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] maintaining a reference to a fetched row

2009-11-04 Thread Tom Lane
Brian Karlak  writes:
> On Nov 4, 2009, at 8:47 AM, Jeff Janes wrote:
>> Why would the index scan take 1 ms two of the times it is done but 5ms
>> the third time?  Isn't it the same index scan each time?  Or does the
>> change in queue.status change the plan?

> The final update is a different query -- just a plain old update by ID:
> UPDATE queue_proc set status = 'proc' where id = %s ;
> This update by ID takes ~2.5ms, which means it's where the framework
> is spending most of its overhead.

Well, if SELECT FROM queue_proc where id = %s takes 1ms and the update
takes 2.5ms, then you've got 1.5ms going into updating the row, which
means it's not going to get a whole lot faster by switching to some
other WHERE condition.  Maybe you should look at cutting back on indexes
and/or triggers attached to this table.
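
A quick way to see what extra work each UPDATE carries is to list the table's
indexes and triggers (assuming the table is named queue_proc as in the posted
query):

```sql
-- Indexes: each one must be maintained on every non-HOT update
SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'queue_proc';

-- Triggers fired on the table (includes internal constraint triggers)
SELECT tgname FROM pg_trigger WHERE tgrelid = 'queue_proc'::regclass;
```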

regards, tom lane



Re: [PERFORM] maintaining a reference to a fetched row

2009-11-04 Thread Brian Karlak


On Nov 4, 2009, at 8:47 AM, Jeff Janes wrote:

>> Worker daemons do a bounded, ordered, limited SELECT to grab a row,
>> which they lock by setting a value in the queue.status column.
>
> So you do a select, and then an update?


I do a select for update in a stored proc:

FOR queue_item IN
  SELECT * FROM queue
   WHERE status IS NULL AND id >= low_bound_id
   ORDER BY id LIMIT batch_size
     FOR UPDATE
LOOP
  UPDATE queue_proc SET status = 'proc' WHERE id = queue_item.id;
END LOOP;

The daemons keep track of their last position in the queue with
low_bound_id.  Also, as you probably noticed, I fetch a batch of (100)
items at a time.  In practice, it's pretty fast.  The job I'm running now
is showing an average fetch time of 30ms per 100 actions, which ain't bad.


>> However, the writing of results back to the row takes ~5ms, which is
>> slower than I'd like.
>
> It seems you have a select, an update, and another update.  Where in
> this process do you commit?  Are you using fsync=off or
> synchronous_commit=off?


First commit occurs after the stored proc to select/update a batch of
items is complete.  Second commit occurs on the writing of results back
for each particular action.  Two commits are required because the time it
takes to complete the intervening action can vary wildly: anywhere between
20ms and 45min.



>> It seems that this is because I need to do an index scan on the queue
>> table to find the row I just fetched.
>
> Why would the index scan take 1 ms two of the times it is done but 5ms
> the third time?  Isn't it the same index scan each time?  Or does the
> change in queue.status change the plan?


The final update is a different query -- just a plain old update by ID:

UPDATE queue_proc set status = 'proc' where id = %s ;

This update by ID takes ~2.5ms, which means it's where the framework is
spending most of its overhead.


Brian


Re: [PERFORM] maintaining a reference to a fetched row

2009-11-04 Thread Brian Karlak


On Nov 3, 2009, at 9:31 PM, Tom Lane wrote:


> Brian Karlak writes:
>> My question is this: is there some way that I can keep a cursor /
>> pointer / reference / whatever to the row I fetched originally, so
>> that I don't have to search for it again when I'm ready to write
>> results?
>
> If you don't expect any updates to the row meanwhile, ctid might serve.


Ahhh ... that's the magic I'm looking for.  Thanks!

Brian



Re: [PERFORM] maintaining a reference to a fetched row

2009-11-04 Thread Jeff Janes
On Tue, Nov 3, 2009 at 12:30 PM, Brian Karlak  wrote:
> Hello All --
>
> I have a simple queuing application written on top of postgres which I'm
> trying to squeeze some more performance out of.
>
> The setup is relatively simple: there is a central queue table in postgres.
>  Worker daemons do a bounded, ordered, limited SELECT to grab a row, which
> they lock by setting a value in the queue.status column.

So you do a select, and then an update?

> When the task is
> complete, results are written back to the row.  The system is designed to
> allow multiple concurrent daemons to access a queue.  At any one time, we
> expect 1-5M active items on the queue.
>
> Now this design is never going to win any performance awards against a true
> queuing system like Active/Rabbit/Zero MQ, but it's tolerably fast for our
> applications.  Fetch/mark times are about 1ms, independent of the number of
> items on the queue.  This is acceptable considering that our tasks take
> ~50ms to run.
>
> However, the writing of results back to the row takes ~5ms, which is slower
> than I'd like.

It seems you have a select, an update, and another update.  Where in
this process do you commit?  Are you using fsync=off or
synchronous_commit=off?
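
(Of the two, synchronous_commit = off is the safer experiment: a crash can
lose the last few commits but cannot corrupt the database, whereas fsync =
off can.  It can be tried per session:)

```sql
-- Affects this session only; set cluster-wide in postgresql.conf if desired
SET synchronous_commit = off;
```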

> It seems that this is because I need to do an index scan
> on the queue table to find the row I just fetched.

Why would the index scan take 1 ms two of the times it is done but 5ms
the third time?  Isn't it the same index scan each time?  Or does the
change in queue.status change the plan?

Cheers,

Jeff



Re: [PERFORM] maintaining a reference to a fetched row

2009-11-03 Thread Tom Lane
Brian Karlak  writes:
> My question is this: is there some way that I can keep a cursor /  
> pointer / reference / whatever to the row I fetched originally, so  
> that I don't have to search for it again when I'm ready to write  
> results?

If you don't expect any updates to the row meanwhile, ctid might serve.
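
A minimal sketch of the idea (the literal tid value and the 'done' status are
illustrative; note that a row's ctid is only valid until the row is updated
again or moved by VACUUM, so it must be re-captured after every write):

```sql
-- Mark the row and capture the new row version's physical address:
UPDATE queue SET status = 'proc' WHERE id = 42 RETURNING ctid;

-- Later, write results back by physical location -- no index scan needed:
UPDATE queue SET status = 'done' WHERE ctid = '(1234,5)';
```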

regards, tom lane



Re: [PERFORM] maintaining a reference to a fetched row

2009-11-03 Thread Brian Karlak


On Nov 3, 2009, at 4:03 PM, Craig Ringer wrote:

> I don't have a good answer for you there. Perhaps using Pg's locking to
> do your queueing, rather than updating a status flag, might let you use
> a cursor? Have a look at the list archives - there's been a fair bit of
> discussion of queuing mechanisms.


This is an interesting idea.  I'll see what I can find in the archives.
It will likely take a bit of refactoring, but such is life ...


Thanks!
Brian



Re: [PERFORM] maintaining a reference to a fetched row

2009-11-03 Thread Craig Ringer
Brian Karlak wrote:

> The setup is relatively simple: there is a central queue table in
> postgres.  Worker daemons do a bounded, ordered, limited SELECT to grab
> a row, which they lock by setting a value in the queue.status column. 

You can probably do an UPDATE ... RETURNING to turn that into one
operation - but that won't work with a cursor :-(
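
Something along these lines, assuming the id/status columns from the original
post (the payload column is made up; without FOR UPDATE in the subquery two
workers can still race for the same row, so it may need a retry loop):

```sql
UPDATE queue SET status = 'proc'
 WHERE id = (SELECT id FROM queue
              WHERE status IS NULL
              ORDER BY id LIMIT 1)
RETURNING id, payload;
```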

> My question is this: is there some way that I can keep a cursor /
> pointer / reference / whatever to the row I fetched originally, so that
> I don't have to search for it again when I'm ready to write results?

You could use a cursor, but it won't work if you're locking rows by
testing a 'status' flag, because that requires the worker to commit the
transaction (so others can see the status flag) before starting work. A
cursor only exists within a transaction.

BEGIN;
DECLARE curs CURSOR FOR SELECT * FROM queue ORDER BY queue_id LIMIT 1;
FETCH NEXT FROM curs;
--
-- Set the status - but nobody else can see the change yet because we
-- haven't committed! We'll have a Pg row lock on the record due to the
-- UPDATE, preventing other UPDATEs but not other SELECTs.
--
-- We can't start work until the transaction commits, but committing
-- will close the cursor.
--
UPDATE queue SET status = 1 WHERE CURRENT OF curs;


I don't have a good answer for you there. Perhaps using Pg's locking to
do your queueing, rather than updating a status flag, might let you use
a cursor? Have a look at the list archives - there's been a fair bit of
discussion of queuing mechanisms.
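
One pattern that comes up in those discussions uses advisory locks instead of
a status write to claim a row, sketched here against the poster's queue table
(caveat: the planner may evaluate pg_try_advisory_lock() for more rows than
the query returns, acquiring extra locks that then need releasing):

```sql
-- Claim one unprocessed row without writing to it:
SELECT id FROM queue
 WHERE status IS NULL AND pg_try_advisory_lock(id)
 ORDER BY id LIMIT 1;

-- ... do the work, write results back ...

SELECT pg_advisory_unlock(id);  -- release the claim (id from above)
```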

--
Craig Ringer
