Re: [HACKERS] Review: compact fsync request queue on overflow

2011-01-21 Thread Robert Haas
On Mon, Jan 17, 2011 at 8:23 PM, Greg Smith g...@2ndquadrant.com wrote:
 Quite.  It's taken me 12 days of machine time running pgbench to find the
 spots where this problem occurs on a system with a reasonably sized
 shared_buffers (I'm testing against 256MB).  It's one of those things it's
 hard to reproduce with test data.

 Thanks for the thorough code review.  I've got a clear test plan I'm
 progressing through this week to beat on the performance measurement aspects
 of the patch.

Any update on this?  I think the test results you've posted previously
- particularly, the fact that when the queue fills up, there are
always many duplicates - are pretty much sufficient for us to convince
ourselves that this will provide a benefit in cases where that occurs.
 And, in cases where the queue doesn't fill up, we'll never hit the
test that triggers this code, so it seems pretty clear there won't be
a negative impact there either.  I don't want to rush your testing
process, but if it's already fairly clear that this will have some
benefit, I think it would be good to get it committed and move on to
working on the parts we're less sure about, like sorting writes and
spreading fsyncs, where we will probably need a lot more testing than
here to be sure that we have the right behavior.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: compact fsync request queue on overflow

2011-01-21 Thread Chris Browne
robertmh...@gmail.com (Robert Haas) writes:
 On Mon, Jan 17, 2011 at 8:23 PM, Greg Smith g...@2ndquadrant.com wrote:
 Quite.  It's taken me 12 days of machine time running pgbench to find the
 spots where this problem occurs on a system with a reasonably sized
 shared_buffers (I'm testing against 256MB).  It's one of those things it's
 hard to reproduce with test data.

 Thanks for the thorough code review.  I've got a clear test plan I'm
 progressing through this week to beat on the performance measurement aspects
 of the patch.

 Any update on this?  I think the test results you've posted previously
 - particularly, the fact that when the queue fills up, there are
 always many duplicates - are pretty much sufficient for us to convince
 ourselves that this will provide a benefit in cases where that occurs.

Agreed.  This showed up nicely when beating up the database using
pgbench.

I imagine it would be interesting to run it against a different test
than pgbench, particularly one which involves a larger number of tables.

From the behavior I have seen thus far, I'm expecting that the queue
essentially gets compressed down to one entry per active table.  With
pgbench, there are 4 tables, and the queue kept getting compressed to
3 or 4 entries, which corresponds nicely with that.

  And, in cases where the queue doesn't fill up, we'll never hit the
 test that triggers this code, so it seems pretty clear there won't be
 a negative impact there either.  I don't want to rush your testing
 process, but if it's already fairly clear that this will have some
 benefit, I think it would be good to get it committed and move on to
 working on the parts we're less sure about, like sorting writes and
 spreading fsyncs, where we will probably need a lot more testing than
 here to be sure that we have the right behavior.

I'm pretty happy with what I've seen thus far; I don't want to be
over-antsy about getting it all dealt with Right Quick Instantly, but it
seems like a change that doesn't have a terribly bad risk of a big
downside.
-- 
(reverse (concatenate 'string ofni.secnanifxunil @ enworbbc))
The statistics on  sanity are that one out of  every four Americans is
suffering from some  form of mental illness. Think  of your three best
friends. If they're okay, then it's you. -- Rita Mae Brown



[HACKERS] Review: compact fsync request queue on overflow

2011-01-17 Thread Chris Browne
I have been taking a peek at the following commitfest item:
   https://commitfest.postgresql.org/action/patch_view?id=497

Submission:

- I had to trim a little off the end of the patch to apply it, but
  that's likely the fault of how I cut'n'pasted it. It applied cleanly
  against HEAD.

- I observe that it doesn't include any alterations to documentation
  or to regression tests.

Both aspects seem apropos, as the behaviour is entirely internal to
the backend. I wouldn't expect a GUC variable for this, or SQL
commands to control it.

Usability Review:

Does the patch actually implement that?

  - Establishes a hash table
  - Establishes skip slot array
  - Walks through all BGWriter requests
- Adds to hash table.

  (I observe that it wasn't all that obvious that hash_search()
  *adds* elements that are missing.  I got confused and went
  looking for hash_add() or similar.  It's permissible to say "dumb
  Chris".)

- If it's a collision, then mark collision in skip slot array, and
  add to count

  - After the walk
- Clean up hash table
- If nothing found, clean up skip slot array, and return

- If collisions found, then clear them out.

  Question: Is there any further cleanup needed for the entries
  that got dropped out of BGWriterShmem->requests?  It seems
  not, but a leak seems conceivable.

Do we want that?

  Eliminating a bunch of fsync() calls that are already being
  induced by other backends seems like a good thing, yep.

Do we already have it?

  Evidently not!

Does it follow SQL spec, or the community-agreed behavior?

  That doesn't seem relevant; this is well outside the scope of
  what SQL spec should have to say.

Does it include pg_dump support (if applicable)?

  Definitely not applicable.

Are there dangers?

  Possibilities...

  - Mentioned in the patch is the possibility of processing the
set of requests in reverse order, which might in principle
reduce work.  But there is some danger of this changing
semantics, so that reversal is not done.

  - Concurrent access...

Is there anything that can write extra elements to
BGWriterShmem->requests while this is running?

I wonder if we need to have any sort of lock surrounding this?

Have all the bases been covered?

  It is a comparatively simple change, so I wouldn't think things
  are missing.

Feature test:

   - Compiled and ran regression test; no problems found.

   Need to do...
- Validate it works as advertised
  - Hook up pgbench
  - Turn on DEBUG1 level
  - Watch for "compacted fsync request queue from %d entries to %d
    entries" messages coming up

  It was a little troublesome inducing it.  I did so by cutting
  shared memory to minimum (128kB).

  I'd regularly get entries like the following:  (Note that I
  changed the error level to WARNING to induce logging this without
  getting all sorts of other stuff).

CONTEXT:  writing block 1735 of relation base/11933/16396
WARNING:  compacted fsync request queue from 16 entries to 3 entries - lost [13] entries
CONTEXT:  writing block 14 of relation base/11933/16387
WARNING:  compacted fsync request queue from 16 entries to 3 entries - lost [13] entries
CONTEXT:  writing block 4 of relation base/11933/16387
WARNING:  compacted fsync request queue from 16 entries to 3 entries - lost [13] entries
CONTEXT:  writing block 6 of relation base/11933/16387
WARNING:  compacted fsync request queue from 16 entries to 3 entries - lost [13] entries
CONTEXT:  writing block 1625 of relation base/11933/16396
WARNING:  compacted fsync request queue from 16 entries to 4 entries - lost [12] entries
CONTEXT:  writing block 880 of relation base/11933/16396
WARNING:  compacted fsync request queue from 16 entries to 4 entries - lost [12] entries
CONTEXT:  writing block 133 of relation base/11933/16396

  With higher shared memory, I couldn't readily induce compaction,
  which is probably a matter of not having enough concurrent work
  going on.

- Corner cases?

  It's already a corner case ;-).

- Assertion failures?

  None seen thus far.

Performance test

- Does it slow down simple cases?

  It surely shouldn't; compaction is only considered if the fsync
  queue is larger than the number of shared buffers.  That doesn't
  seem like a simple case to me!

- Does it improve performance?

  I haven't been able to induce it at a level that would make the
  improvement visible.  But a database that is busy enough to have a
  'full' fsync queue should surely be helped by reducing the number
  of fsync requests.

- Does it slow down other things?

  In principle, the only case where it should worsen performance
  is if the amount of time required to:
- Set up a hash table
- Insert an entry for each buffer
- Walk the skip_slot array, shortening the request queue
  for each duplicate found
  exceeded the amount of time required to do the duplicate fsync()
  requests.

  That cost should be mighty low.  It would be interesting to
  instrument CompactBgwriterRequestQueue() to see how long it runs.

  But note that this cost is also spread into a direction where it
  likely wouldn't matter, as it is typically invoked by the
  background writer process, so this would frequently not be paid
  by on-line active processes.

bgwriter.c: In function 'CompactBgwriterRequestQueue':
bgwriter.c:1142: warning: 'num_skipped' may be used uninitialized in this function

Re: [HACKERS] Review: compact fsync request queue on overflow

2011-01-17 Thread Robert Haas
On Mon, Jan 17, 2011 at 1:43 PM, Chris Browne cbbro...@acm.org wrote:
      (I observe that it wasn't all that obvious that hash_search()
      *adds* elements that are missing.  I got confused and went
      looking for hash_add() or similar.  It's permissible to say dumb
      Chris.)

I didn't invent that API.  It is a little strange, but you get the hang of it.

      Question: Is there any further cleanup needed for the entries
      that got dropped out of BGWriterShmem->requests?  It seems
      not, but a leak seems conceivable.

They're just holding integers, so what's to clean up?

      - Concurrent access...

        Is there anything that can write extra elements to
        BGWriterShmem->requests while this is running?

        I wonder if we need to have any sort of lock surrounding this?

It's called with the BgWriterCommLock already held, and there's an
assertion to this effect as well.

      With higher shared memory, I couldn't readily induce compaction,
      which is probably a concurrency matter of not having enough volume
      of concurrent work going on.

You can do it artificially by attaching gdb to the bgwriter.

      In principle, the only case where it should worsen performance
      is if the amount of time required to:
        - Set up a hash table
        - Insert an entry for each buffer
        - Walk the skip_slot array, shortening the request queue
          for each duplicate found
      exceeded the amount of time required to do the duplicate fsync()
      requests.

      That cost should be mighty low.  It would be interesting to
      instrument CompactBgwriterRequestQueue() to see how long it runs.

      But note that this cost is also spread into a direction where it
      likely wouldn't matter, as it is typically invoked by the
      background writer process, so this would frequently not be paid
      by on-line active processes.

This is not correct.  The background writer only ever does
AbsorbFsyncRequests(); this would substitute (to some degree) in cases
where the background writer fell down on the job.

 bgwriter.c: In function 'CompactBgwriterRequestQueue':
 bgwriter.c:1142: warning: 'num_skipped' may be used uninitialized in this function

OK, I can fix that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Review: compact fsync request queue on overflow

2011-01-17 Thread Greg Smith

Chris Browne wrote:

  It was a little troublesome inducing it.  I did so by cutting
  shared memory to minimum (128kB)...
  With higher shared memory, I couldn't readily induce compaction,
  which is probably a concurrency matter of not having enough volume
  of concurrent work going on.
  


Quite.  It's taken me 12 days of machine time running pgbench to find 
the spots where this problem occurs on a system with a reasonably sized 
shared_buffers (I'm testing against 256MB).  It's one of those things 
it's hard to reproduce with test data.


Thanks for the thorough code review.  I've got a clear test plan I'm 
progressing through this week to beat on the performance measurement 
aspects of the patch.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books

