Re: [PERFORM] maintaining a reference to a fetched row
Brian Karlak writes:

> I have a simple queuing application written on top of postgres which I'm
> trying to squeeze some more performance out of.

Have you tried to write a custom PGQ consumer yet?

  http://wiki.postgresql.org/wiki/PGQ_Tutorial

Regards,
--
dim

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
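For context, the basic shape of a PGQ consumer (from the Skytools `pgq` schema) is a batch loop; the queue and consumer names below are made-up placeholders, and this is only a sketch of the calls involved, not a complete consumer:

```sql
-- One-time registration of this consumer on the queue:
SELECT pgq.register_consumer('task_queue', 'worker_1');

-- Worker main loop:
-- 1. Ask for the next batch of events (returns a batch_id, or NULL
--    if nothing is pending yet):
SELECT pgq.next_batch('task_queue', 'worker_1');

-- 2. Fetch the events in that batch (:batch_id from the previous call):
SELECT ev_id, ev_data FROM pgq.get_batch_events(:batch_id);

-- 3. After processing every event, close out the batch:
SELECT pgq.finish_batch(:batch_id);
```

PGQ handles the claim/ack bookkeeping itself, which is exactly the part the status-flag scheme pays ~2.5ms per row for.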
Re: [PERFORM] maintaining a reference to a fetched row
Brian Karlak writes:

> On Nov 4, 2009, at 8:47 AM, Jeff Janes wrote:
>> Why would the index scan take 1 ms two of the times it is done but 5ms
>> the third time?  Isn't it the same index scan each time?  Or does the
>> change in queue.status change the plan?

> The final update is a different query -- just a plain old update by ID:
>
>   UPDATE queue_proc SET status = 'proc' WHERE id = %s;
>
> This update by ID takes ~2.5ms, which means it's where the framework
> is spending most of its overhead.

Well, if SELECT FROM queue_proc WHERE id = %s takes 1ms and the update
takes 2.5ms, then you've got 1.5ms going into updating the row, which
means it's not going to get a whole lot faster by switching to some
other WHERE condition.  Maybe you should look at cutting back on
indexes and/or triggers attached to this table.

			regards, tom lane
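To check where that 1.5ms of update overhead goes, one approach is to time the update itself and then look at what the table carries per row; the `id = 42` value below is an arbitrary placeholder, and `\d queue_proc` in psql shows the same index/trigger information interactively:

```sql
-- Time the row update (run inside a transaction and ROLLBACK if you
-- don't want the side effect):
EXPLAIN ANALYZE UPDATE queue_proc SET status = 'proc' WHERE id = 42;

-- Every index on the table must be maintained on each UPDATE, unless
-- the update qualifies as a HOT update:
SELECT indexname, indexdef FROM pg_indexes
WHERE tablename = 'queue_proc';
```

If the `EXPLAIN ANALYZE` shows the index scan is fast but the total is not, the time is going into row-version and index maintenance (or triggers), which is what Tom's suggestion targets.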
Re: [PERFORM] maintaining a reference to a fetched row
On Nov 4, 2009, at 8:47 AM, Jeff Janes wrote:

>> Worker daemons do a bounded, ordered, limited SELECT to grab a row,
>> which they lock by setting a value in the queue.status column.
>
> So you do a select, and then an update?

I do a select for update in a stored proc:

  FOR queue_item IN
      SELECT * FROM queue
      WHERE status IS NULL AND id >= low_bound_id
      ORDER BY id LIMIT batch_size
      FOR UPDATE
  LOOP
      UPDATE queue_proc SET status = 'proc' WHERE id = queue_item.id;
  END LOOP;

The daemons keep track of their last position in the queue with
low_bound_id.  Also, as you probably noticed, I fetch a batch of (100)
items at a time.  In practice, it's pretty fast.  The job I'm running
now is showing an average fetch time of 30ms per 100 actions, which
ain't bad.

>> However, the writing of results back to the row takes ~5ms, which is
>> slower than I'd like.
>
> It seems you have a select, an update, and another update.  Where in
> this process do you commit?  Are you using fsync=off or
> synchronous_commit=off?

First commit occurs after the stored proc to select/update a batch of
items is complete.  Second commit occurs on the writing of results back
for each particular action.  Two commits are required because the time
it takes to complete the intervening action can vary wildly: anywhere
between 20ms and 45min.

>> It seems that this is because I need to do an index scan on the queue
>> table to find the row I just fetched.
>
> Why would the index scan take 1 ms two of the times it is done but 5ms
> the third time?  Isn't it the same index scan each time?  Or does the
> change in queue.status change the plan?

The final update is a different query -- just a plain old update by ID:

  UPDATE queue_proc SET status = 'proc' WHERE id = %s;

This update by ID takes ~2.5ms, which means it's where the framework is
spending most of its overhead.

Brian
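A self-contained version of the claim procedure sketched above might look like the following; the function name `claim_batch` and the single-table name `queue` are assumptions (the thread uses both `queue` and `queue_proc` for what appears to be the same table):

```sql
-- Hypothetical complete form of the batch-claim stored procedure.
CREATE OR REPLACE FUNCTION claim_batch(low_bound_id integer,
                                       batch_size integer)
RETURNS SETOF queue AS $$
DECLARE
    queue_item queue%ROWTYPE;
BEGIN
    FOR queue_item IN
        SELECT * FROM queue
        WHERE status IS NULL AND id >= low_bound_id
        ORDER BY id
        LIMIT batch_size
        FOR UPDATE              -- row locks held until commit
    LOOP
        UPDATE queue SET status = 'proc' WHERE id = queue_item.id;
        RETURN NEXT queue_item;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```

The worker would call `SELECT * FROM claim_batch(last_id, 100);` and commit, at which point the `status = 'proc'` marks become visible to the other daemons.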
Re: [PERFORM] maintaining a reference to a fetched row
On Nov 3, 2009, at 9:31 PM, Tom Lane wrote:

> Brian Karlak writes:
>> My question is this: is there some way that I can keep a cursor /
>> pointer / reference / whatever to the row I fetched originally, so
>> that I don't have to search for it again when I'm ready to write
>> results?
>
> If you don't expect any updates to the row meanwhile, ctid might serve.

Ahhh ... that's the magic I'm looking for.  Thanks!

Brian
Re: [PERFORM] maintaining a reference to a fetched row
On Tue, Nov 3, 2009 at 12:30 PM, Brian Karlak wrote:
> Hello All --
>
> I have a simple queuing application written on top of postgres which I'm
> trying to squeeze some more performance out of.
>
> The setup is relatively simple: there is a central queue table in
> postgres.  Worker daemons do a bounded, ordered, limited SELECT to grab
> a row, which they lock by setting a value in the queue.status column.

So you do a select, and then an update?

> When the task is complete, results are written back to the row.  The
> system is designed to allow multiple concurrent daemons to access a
> queue.  At any one time, we expect 1-5M active items on the queue.
>
> Now this design is never going to win any performance awards against a
> true queuing system like Active/Rabbit/Zero MQ, but it's tolerably fast
> for our applications.  Fetch/mark times are about 1ms, independent of
> the number of items on the queue.  This is acceptable considering that
> our tasks take ~50ms to run.
>
> However, the writing of results back to the row takes ~5ms, which is
> slower than I'd like.

It seems you have a select, an update, and another update.  Where in
this process do you commit?  Are you using fsync=off or
synchronous_commit=off?

> It seems that this is because I need to do an index scan on the queue
> table to find the row I just fetched.

Why would the index scan take 1 ms two of the times it is done but 5ms
the third time?  Isn't it the same index scan each time?  Or does the
change in queue.status change the plan?

Cheers,

Jeff
Re: [PERFORM] maintaining a reference to a fetched row
Brian Karlak writes:

> My question is this: is there some way that I can keep a cursor /
> pointer / reference / whatever to the row I fetched originally, so
> that I don't have to search for it again when I'm ready to write
> results?

If you don't expect any updates to the row meanwhile, ctid might serve.

			regards, tom lane
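A minimal sketch of the ctid idea, using the `queue_proc` table from earlier in the thread (the `(1234,5)` value is an illustrative placeholder):

```sql
-- Claim a row and capture its physical address.  The UPDATE creates a
-- new row version, so take the ctid from RETURNING, not from a prior
-- SELECT -- the old version's ctid no longer points at live data:
UPDATE queue_proc SET status = 'proc'
WHERE id = (SELECT id FROM queue_proc
            WHERE status IS NULL ORDER BY id LIMIT 1)
RETURNING ctid, id;
-- suppose this returns ctid = '(1234,5)'

-- Later, write results back via a TID scan -- straight to the page
-- and line pointer, no index scan:
UPDATE queue_proc SET status = 'done' WHERE ctid = '(1234,5)';
```

This is exactly the "no updates to the row meanwhile" caveat: any intervening UPDATE (or VACUUM FULL) moves the row and invalidates the saved ctid, so the write-back would silently match nothing.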
Re: [PERFORM] maintaining a reference to a fetched row
On Nov 3, 2009, at 4:03 PM, Craig Ringer wrote:

> I don't have a good answer for you there.  Perhaps using Pg's locking
> to do your queueing, rather than updating a status flag, might let you
> use a cursor?  Have a look at the list archives - there's been a fair
> bit of discussion of queuing mechanisms.

This is an interesting idea.  I'll see what I can find in the archives.
It will likely take a bit of refactoring, but such is life ...

Thanks!
Brian
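One common shape for lock-based queueing uses advisory locks; this is a hypothetical variation on Craig's suggestion, not something spelled out in the thread. The claim takes a session-level lock on the row's id instead of committing a status change, so no write (and no commit) is needed before starting work:

```sql
-- Claim: find an unprocessed row nobody else holds a lock on.
-- pg_try_advisory_lock() returns true only if the lock was free;
-- the lock persists for the session, across transactions.
SELECT id FROM queue
WHERE status IS NULL
  AND pg_try_advisory_lock(id)
ORDER BY id
LIMIT 1;

-- ... perform the (possibly 45-minute) task ...

-- Finish: record the result, then release the advisory lock.
UPDATE queue SET status = 'done' WHERE id = :claimed_id;
SELECT pg_advisory_unlock(:claimed_id);
```

One caveat with this sketch: calling `pg_try_advisory_lock()` in a WHERE clause can acquire locks on rows the plan examines but then discards, so a production version needs to release any extra locks it picked up.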
Re: [PERFORM] maintaining a reference to a fetched row
Brian Karlak wrote:

> The setup is relatively simple: there is a central queue table in
> postgres.  Worker daemons do a bounded, ordered, limited SELECT to grab
> a row, which they lock by setting a value in the queue.status column.

You can probably do an UPDATE ... RETURNING to turn that into one
operation - but that won't work with a cursor :-(

> My question is this: is there some way that I can keep a cursor /
> pointer / reference / whatever to the row I fetched originally, so that
> I don't have to search for it again when I'm ready to write results?

You could use a cursor, but it won't work if you're locking rows by
testing a 'status' flag, because that requires the worker to commit the
transaction (so others can see the status flag) before starting work.  A
cursor only exists within a transaction.

  BEGIN;
  DECLARE curs CURSOR FOR SELECT * FROM queue ORDER BY queue_id LIMIT 1;
  FETCH NEXT FROM curs;
  --
  -- Set the status - but nobody else can see the change yet because we
  -- haven't committed!  We'll have a Pg row lock on the record due to the
  -- UPDATE, preventing other UPDATEs but not other SELECTs.
  --
  -- We can't start work until the transaction commits, but committing
  -- will close the cursor.
  --
  UPDATE queue SET status = 1 WHERE CURRENT OF curs;

I don't have a good answer for you there.  Perhaps using Pg's locking to
do your queueing, rather than updating a status flag, might let you use
a cursor?  Have a look at the list archives - there's been a fair bit of
discussion of queuing mechanisms.

--
Craig Ringer
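The UPDATE ... RETURNING idea mentioned above can be sketched concretely; the `payload` column is an assumption, and this is an illustration of the one-statement claim, not the thread's actual schema:

```sql
-- Claim and mark the next row in a single statement, returning the
-- data the worker needs -- no separate SELECT-then-UPDATE round trip:
UPDATE queue SET status = 'proc'
WHERE id = (SELECT id FROM queue
            WHERE status IS NULL
            ORDER BY id LIMIT 1)
RETURNING id, payload;
```

Note that under READ COMMITTED, two concurrent workers can pick the same id in the subquery; the second blocks on the row lock and then re-claims the already-claimed row, so this pattern still needs additional locking (or, in much later PostgreSQL releases, `SELECT ... FOR UPDATE SKIP LOCKED`) to be fully race-free.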