Re: [HACKERS] Review: compact fsync request queue on overflow
On Mon, Jan 17, 2011 at 8:23 PM, Greg Smith g...@2ndquadrant.com wrote: Quite. It's taken me 12 days of machine time running pgbench to find the spots where this problem occurs on a system with a reasonably sized shared_buffers (I'm testing against 256MB). It's one of those things it's hard to reproduce with test data. Thanks for the thorough code review. I've got a clear test plan I'm progressing through this week to beat on the performance measurement aspects of the patch. Any update on this? I think the test results you've posted previously - particularly, the fact that when the queue fills up, there are always many duplicates - is pretty much sufficient for us to convince ourselves that this will provide a benefit in cases where that occurs. And, in cases where the queue doesn't fill up, we'll never hit the test that triggers this code, so it seems pretty clear there won't be a negative impact there either. I don't want to rush your testing process, but if it's already fairly clear that this will have some benefit, I think it would be good to get it committed and move on to working on the parts we're less sure about, like sorting writes and spreading fsyncs, where we will probably need a lot more testing than here to be sure that we have the right behavior. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: compact fsync request queue on overflow
robertmh...@gmail.com (Robert Haas) writes: On Mon, Jan 17, 2011 at 8:23 PM, Greg Smith g...@2ndquadrant.com wrote: Quite. It's taken me 12 days of machine time running pgbench to find the spots where this problem occurs on a system with a reasonably sized shared_buffers (I'm testing against 256MB). It's one of those things it's hard to reproduce with test data. Thanks for the thorough code review. I've got a clear test plan I'm progressing through this week to beat on the performance measurement aspects of the patch. Any update on this? I think the test results you've posted previously - particularly, the fact that when the queue fills up, there are always many duplicates - is pretty much sufficient for us to convince ourselves that this will provide a benefit in cases where that occurs. Agreed. This showed up eminently nicely when beating up the database using pgbench. I imagine it would be interesting to run it against a different test than pgbench, particularly one which involves a larger number of tables. From the behavior I have seen thus far, I'm expecting that the queue essentially gets compressed to the size indicating the number of active tables. With pgbench, there are 4 tables, and the queue kept getting compressed to 3 or 4 entries that nicely corresponds with that. And, in cases where the queue doesn't fill up, we'll never hit the test that triggers this code, so it seems pretty clear there won't be a negative impact there either. I don't want to rush your testing process, but if it's already fairly clear that this will have some benefit, I think it would be good to get it committed and move on to working on the parts we're less sure about, like sorting writes and spreading fsyncs, where we will probably need a lot more testing than here to be sure that we have the right behavior. I'm pretty happy with what I've seen thus far; I don't want to be over-antsy about getting it all dealt with Right Quick Instantly, but it seems like a change that doesn't have a terribly bad risk of a big downside. -- (reverse (concatenate 'string ofni.secnanifxunil @ enworbbc)) The statistics on sanity are that one out of every four Americans is suffering from some form of mental illness. Think of your three best friends. If they're okay, then it's you. -- Rita Mae Brown -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: compact fsync request queue on overflow
On Mon, Jan 17, 2011 at 1:43 PM, Chris Browne cbbro...@acm.org wrote: (I observe that it wasn't all that obvious that hash_search() *adds* elements that are missing. I got confused and went looking for hash_add() or similar. It's permissible to say dumb Chris.) I didn't invent that API. It is a little strange, but you get the hang of it. Question: Is there any further cleanup needed for the entries that got dropped out of BGWriterShmem-requests? It seems not, but a leak seems conceivable. They're just holding integers, so what's to clean up? - Concurrent access... Is there anything that can write extra elements to BGWriterShmem-requests while this is running? I wonder if we need to have any sort of lock surrounding this? It's called with the BgWriterCommLock already held, and there's an assertion to this effect as well. With higher shared memory, I couldn't readily induce compaction, which is probably a concurrency matter of not having enough volume of concurrent work going on. You can do it artificially by attaching gdb to the bgwriter. In principle, the only case where it should worsen performance is if the amount of time required to: - Set up a hash table - Insert an entry for each buffer - Walk the skip_slot array, shortening the request queue for each duplicate found exceeded the amount of time required to do the duplicate fsync() requests. That cost should be mighty low. It would be interesting to instrument CompactBgwriterRequestQueue() to see how long it runs. But note that this cost is also spread into a direction where it likely wouldn't matter, as it is typically invoked by the background writer process, so this would frequently not be paid by on-line active processes. This is not correct. The background writer only ever does AbsorbFsyncRequests(); this would substitute (to some degree) in cases where the background writer fell down on the job. bgwriter.c: In function 'CompactBgwriterRequestQueue': bgwriter.c:1142: warning: 'num_skipped' may be used uninitialized in this function OK, I can fix that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: compact fsync request queue on overflow
Chris Browne wrote: It was a little troublesome inducing it. I did so by cutting shared memory to minimum (128kB)... With higher shared memory, I couldn't readily induce compaction, which is probably a concurrency matter of not having enough volume of concurrent work going on. Quite. It's taken me 12 days of machine time running pgbench to find the spots where this problem occurs on a system with a reasonably sized shared_buffers (I'm testing against 256MB). It's one of those things it's hard to reproduce with test data. Thanks for the thorough code review. I've got a clear test plan I'm progressing through this week to beat on the performance measurement aspects of the patch. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers