Re: [HACKERS] limiting hint bit I/O

2011-02-07 Thread Bruce Momjian
Robert Haas wrote:
 On Sat, Feb 5, 2011 at 4:31 PM, Bruce Momjian br...@momjian.us wrote:
  Uh, in this C comment:
 
  +        * or not we want to take the time to write it.  We allow up to 5% of
  +        * otherwise-not-dirty pages to be written due to hint bit changes,
 
  5% of what?  5% of all buffers?  5% of all hint-bit-dirty ones?  Can you
  clarify this in the patch?
 
 5% of buffers that are hint-bit-dirty but not otherwise dirty.  ISTM
 that's exactly what the comment you just quoted says on its face, but
 I'm open to some other wording you want to propose.

How about:

otherwise-not-dirty -> only-hint-bit-dirty

So 95% of your hint bit modifications are discarded if the page is not
otherwise dirtied?  That seems pretty radical.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] limiting hint bit I/O

2011-02-07 Thread Robert Haas
On Mon, Feb 7, 2011 at 10:48 AM, Bruce Momjian br...@momjian.us wrote:
 How about:

        otherwise-not-dirty -> only-hint-bit-dirty

 So 95% of your hint bit modifications are discarded if the page is not
 otherwise dirtied?  That seems pretty radical.

No, it's more subtle than that, although I admit it *is* radical.
There are three ways that pages can get written out to disk:

1. Checkpoints.
2. Background writer activity.
3. Backends writing out dirty buffers because there are no clean
buffers available to allocate.

What the latest version of the patch implements is:

1. Checkpoints no longer write only-hint-bit-dirty pages to disk.
Since a checkpoint doesn't evict pages from memory, the hint bits are
still there to be written out (or not) by (2) or (3), below.

2. When the background writer's cleaning scan hits an
only-hint-bit-dirty page, it writes it, same as before.  This
definitely doesn't result in the loss of any hint bits.

3. When a backend writes out dirty buffers itself, because there are
no clean buffers available to allocate, it initially writes
only-hint-bit-dirty pages as before.  But if there are more than 100
such pages per block of 2000 allocations, it recycles any after the
first 100 without writing them.
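
The rule in item 3 can be sketched as a small counter.  This is only an
illustration of the "100 per block of 2000 allocations" policy described
above, not the patch's actual code: the variable names
hint_bit_write_allowance and buffer_allocation_count appear in the patch,
but the helper should_write_victim(), the two constants, and the exact
reset logic are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed constants, taken from the "100 per block of 2000" description. */
#define ALLOCATIONS_PER_BLOCK 2000
#define HINT_WRITES_PER_BLOCK 100

/* Names borrowed from the patch; how the patch maintains them is assumed. */
static int hint_bit_write_allowance = HINT_WRITES_PER_BLOCK;
static int buffer_allocation_count = 0;

/*
 * Hypothetical helper, called once per buffer allocation in a backend.
 * Returns true if the victim buffer should be written before it is reused.
 */
static bool
should_write_victim(bool really_dirty, bool hint_bit_dirty)
{
    bool write_buffer = false;

    /* Start a new accounting block every ALLOCATIONS_PER_BLOCK allocations. */
    if (buffer_allocation_count >= ALLOCATIONS_PER_BLOCK)
    {
        buffer_allocation_count = 0;
        hint_bit_write_allowance = HINT_WRITES_PER_BLOCK;
    }
    buffer_allocation_count++;

    assert(hint_bit_write_allowance >= 0);

    if (really_dirty)
        write_buffer = true;            /* real changes are always written */
    else if (hint_bit_dirty && hint_bit_write_allowance > 0)
    {
        hint_bit_write_allowance--;     /* spend one unit of the allowance */
        write_buffer = true;
    }
    /* otherwise: clean, or only-hint-bit-dirty with the allowance spent */

    return write_buffer;
}
```

In this sketch, 2000 consecutive allocations of only-hint-bit-dirty pages
produce 100 writes, and a really dirty page is written regardless of the
allowance.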

In normal operation, I suspect that there will be very little impact
from this change.  The change described in #1 may slightly reduce the
size of some checkpoints, but it's unclear that it will be enough to
be material.  The change described in #3 will probably also not
matter, because, in a well-tuned system, the background writer should
be set aggressively enough to provide a supply of clean pages, and
therefore backends shouldn't be doing many writes themselves, and
therefore most buffer allocations will be of already-clean pages, and
the logic described in #3 will probably never kick in.  Even if they
are writing a lot of buffers themselves, the logic in #3 still won't
kick in if many of the pages being written are actually dirty - it
will only matter if the backends are writing out lots and lots of
pages *solely because they are only-hint-bit-dirty*.

Where I expect this to make a big difference is on sequential scans of
just-loaded tables.  In that case, the BufferAccessStrategy machinery
will force the backend to reuse the same buffers over and over again,
and all of those pages will be only-hint-bit-dirty.  So the backend
has to do a write for every page it allocates, and even though those
writes are being absorbed by the OS cache, it's still slow.  With this
patch, what will happen is that the backend will write about 100
pages, then perform the next 1900 allocations without writing, then
write another 100 pages, etc.  So at the end of the scan, instead of
having written an amount of data equal to the size of the table, we
will have written 5% of that amount, and 5% of the hint bits will be
on disk.  Each subsequent scan will get another 5% of the hint bits on
disk until after 20 scans they are all set.  So the work of setting
the hint bits is spread out across the first 20 table scans instead of
all being done the first time through.
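
The "20 scans" figure can be checked with a toy model.  The sketch below
is an assumption-laden simplification, not PostgreSQL code: it assumes one
buffer allocation per page read, that the per-backend counters start fresh
at the beginning of each scan, and that a page whose hint bits have reached
disk comes back clean on later scans.  The function name and constants are
made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define ALLOCATIONS_PER_BLOCK 2000
#define HINT_WRITES_PER_BLOCK 100

/*
 * Toy model of repeated sequential scans over a freshly loaded table.
 * hints_on_disk[i] records whether page i has been written with its
 * hint bits.  Returns the number of full scans needed until every page
 * has its hint bits on disk.
 */
static int
scans_until_all_hints_on_disk(int n_pages)
{
    bool *hints_on_disk;
    int   remaining = n_pages;
    int   scans = 0;

    assert(n_pages > 0);
    hints_on_disk = calloc((size_t) n_pages, sizeof(bool));
    assert(hints_on_disk != NULL);

    while (remaining > 0)
    {
        int allocations = 0;
        int allowance = HINT_WRITES_PER_BLOCK;

        for (int page = 0; page < n_pages; page++)
        {
            /* New block of allocations: refill the write allowance. */
            if (allocations == ALLOCATIONS_PER_BLOCK)
            {
                allocations = 0;
                allowance = HINT_WRITES_PER_BLOCK;
            }
            allocations++;

            /* Only pages still lacking on-disk hint bits cost a write. */
            if (!hints_on_disk[page] && allowance > 0)
            {
                hints_on_disk[page] = true;
                allowance--;
                remaining--;
            }
        }
        scans++;
    }

    free(hints_on_disk);
    return scans;
}
```

Under these assumptions, a table of 20,000 pages needs 20 scans: each scan
flushes the next 100 pages of every 2000-page stretch, i.e. 5% of the table.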

Clearly, there's further jiggering that can be done here.  But the
overall goal is simply that some of our users don't seem to like it
when the first scan of a newly loaded table generates a huge storm of
*write* traffic.  Given that the hint bits appear to be quite
important from a performance perspective (see benchmark numbers
upthread), we don't really have the option of just not writing them -
but we can try not to do it all at once, if we think that's an
improvement, which I think is likely.

Overall, I'm inclined to move this patch to the next CommitFest and
forget about it for now.  I don't think we're going to get enough
testing of this in the next week to be really confident that it's
right.  I might be willing to commit with some more moderate amount of
testing if we were right at the beginning of a development cycle,
figuring that we'd shake out any warts as the cycle went along, but
this isn't seeming like the right time for this kind of a change.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-02-07 Thread Cédric Villemain
2011/2/7 Robert Haas robertmh...@gmail.com:
 Clearly, there's further jiggering that can be done here.  But the
 overall goal is simply that some of our users don't seem to like it
 when the first scan of a newly loaded table generates a huge storm of
 *write* traffic.  Given that the hint bits appear to be quite
 important from a performance perspective (see benchmark numbers
 upthread),

Those are not real benchmarks, just a quick check of the behavior.
(And I agree it looks good, but I also got inconsistent results: the
patched PostgreSQL hardly reaches the speed of the original
9.1devel, even after 200 hundreds SELECTs of your test case.)


 we don't really have the option of just not writing them -
 but we can try to not to do it all at once, if we think that's an
 improvement, which I think is likely.


Re: [HACKERS] limiting hint bit I/O

2011-02-07 Thread Cédric Villemain
2011/2/7 Cédric Villemain cedric.villemain.deb...@gmail.com:

Re: [HACKERS] limiting hint bit I/O

2011-02-05 Thread Cédric Villemain
2011/1/19 Robert Haas robertmh...@gmail.com:
 On Wed, Jan 19, 2011 at 11:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 ... So what we
 want to do is write a percentage of them, in a way that guarantees
 that they'll all eventually get written if people continue to access
 the same data.

 The word guarantee seems quite inappropriate here, since as far as I
 can see this approach provides no such guarantee --- even after many
 cycles you'd never be really certain all the bits were set.

 What I asked for upthread was that we continue to have some
 deterministic, practical way to force all hint bits in a table to be
 set.  This is not *remotely* responding to that request.  It's still not
 deterministic, and even if it were, vacuuming a large table 20 times
 isn't a very practical solution.

 I get the impression you haven't spent as much time reading my email
 as I spent writing it.  Perhaps I'm wrong, but in any case the code
 doesn't do what you're suggesting.  In the most recently posted
 version of this patch, which is v2, if VACUUM hits a page that is

Please update the commitfest with the accurate patch; there is only
the old, immature v1 of the patch in it.
I was about to review it...

https://commitfest.postgresql.org/action/patch_view?id=500

 hint-bit-dirty, it always writes it.  Full stop.  The 20 times bit
 applies to a SELECT * FROM table, which is a rather different case.

 As I write this, I realize that there is a small fly in the ointment
 here, which is that neither VACUUM nor SELECT forces out all the pages
 they modify to disk.  So there is some small amount of remaining
 nondeterminism, even if you VACUUM, because VACUUM will leave the last
 few pages it dirties in shared_buffers, and whether those hint bits
 hit the disk will depend on a decision made at the time they're
 evicted, not at the time they were dirtied.  Possibly I could fix that
 by making SetBufferCommitInfoNeedsSave() set BM_DIRTY during vacuum
 and BM_HINT_BITS at other times.  That would nail the lid shut pretty
 tight.


-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] limiting hint bit I/O

2011-02-05 Thread Robert Haas
On Sat, Feb 5, 2011 at 10:37 AM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 Please update the commitfest with the accurate patch; there is only
 the old, immature v1 of the patch in it.
 I was about to review it...

 https://commitfest.postgresql.org/action/patch_view?id=500

Woops, sorry about that.  Here's an updated version, which I will also
add to the CommitFest application.

The need for this patch has been somewhat ameliorated by the fsync
queue compaction patch.  I tested with:

create table s as select g,
random()::text||random()::text||random()::text||random()::text from
generate_series(1,100) g;
checkpoint;

The table was large enough not to fit in shared_buffers.  Then, repeatedly:

select sum(1) from s;

At the time I first posted this patch, running against git master, the
first run took about 1600 ms vs. ~207-216 ms for subsequent runs.  But
that was actually running up against the fsync queue problem.
Retesting today, the first run took 360 ms, and subsequent runs took
197-206 ms.  I doubt that the difference in the steady-state is
significant, since the tests were done on different days and not
controlled all that carefully, but clearly the response time spike for
the first scan is far lower than previously.  Setting the log level to
DEBUG1 revealed that the first scan did two fsync queue compactions.

The patch still does help to smooth things out, though.  Here are the
times for one series of selects, with the patch applied, after setting
up as described above:

257.108
259.245
249.181
245.896
250.161
241.559
240.538
241.091
232.727
232.779
232.543
226.265
225.029
222.015
217.106
216.426
217.724
210.604
209.630
203.507
197.521
204.448
196.809

Without the patch, as seen above, the first run is about 80% slower.
With the patch applied, the first run is about 25% slower than the
steady state, and subsequent scans decline steadily from there.  Runs
21 and following flush no further data and run at full speed.  These
numbers aren't representative of all real-world scenarios, though.
On a system with many concurrent clients, CLOG contention might be an
issue; on the flip side, if this table were larger than RAM (not just
larger than shared_buffers) the decrease in write traffic as we scan
through the table might actually be a more significant benefit than it
is here, where it's mostly a question of kernel time; the I/O system
isn't actually taxed.  So I think this probably needs more testing
before we decide whether or not it's a good idea.
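
The percentages quoted above can be reproduced from the timings with a
little arithmetic.  The helpers below are a sketch for that purpose only;
their names are invented, and treating the mean of the last three runs
(197.521, 204.448, 196.809 ms) as "the steady state" is an assumption of
this example, not something stated in the test description.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of n timings, in milliseconds. */
static double
mean_ms(const double *timings, size_t n)
{
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += timings[i];
    return sum / (double) n;
}

/* Percentage by which run_ms is slower than baseline_ms. */
static double
percent_slower(double run_ms, double baseline_ms)
{
    assert(baseline_ms > 0.0);
    return (run_ms / baseline_ms - 1.0) * 100.0;
}
```

With that baseline, percent_slower(360.0, steady) comes out at roughly 80
for the unpatched first run, and percent_slower(257.108, steady) at a bit
under 30 for the patched first run; measured against the upper end of the
197-206 ms range instead, the patched figure is about 25.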

I adopted a few suggestions made previously in this version of the
patch.  Tom Lane recommended not messing with BM_JUST_DIRTY and
leaving that for another day.  I did that.  Also, per my previous
musings, I've adjusted this version so that vacuum behaves differently
when dirtying pages rather than when flushing them.  In versions 1 and
2, vacuum would always write pages that were dirty-only-for-hint-bits
when allocating a new buffer; in this version the buffer allocation
logic is the same for vacuum, but it marks pages dirty even when only
hint bits have changed.  The result is that VACUUM followed by
CHECKPOINT is enough to make sure all hint bits are set on disk, just
as is the case today.
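
The vacuum change described above amounts to choosing a different buffer
flag at the moment the hint bits are set.  The following is a minimal
sketch of that idea under stated assumptions: BM_DIRTY and BM_HINT_BITS are
the flag names used in this thread and VacuumActive is the variable added
by the patch, but the bit values and the helper mark_hint_bits_changed()
are hypothetical stand-ins, not the real SetBufferCommitInfoNeedsSave().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; the real assignments are not shown here. */
#define BM_DIRTY     (1u << 0)
#define BM_HINT_BITS (1u << 1)

/* In the patch, vacuum() sets this on entry and clears it on error. */
static bool VacuumActive = false;

/*
 * Hypothetical stand-in for the flag update done when a hint bit changes.
 * Returns the buffer's new flag word.
 */
static uint32_t
mark_hint_bits_changed(uint32_t flags)
{
    if (VacuumActive)
        flags |= BM_DIRTY;      /* really dirty: a checkpoint will write it */
    else
        flags |= BM_HINT_BITS;  /* only-hint-bit-dirty: subject to throttling */
    return flags;
}
```

Because pages touched by VACUUM carry the ordinary dirty flag in this
scheme, a following CHECKPOINT writes them like any other dirty page.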

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 5663711..e8f8781 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -209,11 +209,12 @@ vacuum(VacuumStmt *vacstmt, Oid relid, bool do_toast,
 		CommitTransactionCommand();
 	}
 
-	/* Turn vacuum cost accounting on or off */
+	/* Adjust vacuum cost accounting state and update VacuumActive flag */
 	PG_TRY();
 	{
 		ListCell   *cur;
 
+		VacuumActive = true;
 		VacuumCostActive = (VacuumCostDelay > 0);
 		VacuumCostBalance = 0;
 
@@ -254,8 +255,9 @@ vacuum(VacuumStmt *vacstmt, Oid relid, bool do_toast,
 	}
 	PG_CATCH();
 	{
-		/* Make sure cost accounting is turned off after error */
+		/* Make sure vacuum state variables are fixed up on error */
 		VacuumCostActive = false;
+		VacuumActive = false;
 		PG_RE_THROW();
 	}
 	PG_END_TRY();
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 1f89e52..1146dee 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -81,6 +81,10 @@ static bool IsForInput;
 /* local state for LockBufferForCleanup */
 static volatile BufferDesc *PinCountWaitBuf = NULL;
 
+/* local state for BufferAlloc */
+static int hint_bit_write_allowance;
+static int buffer_allocation_count;
+
 
 static Buffer ReadBuffer_common(SMgrRelation reln, char relpersistence,
   ForkNumber forkNum, BlockNumber blockNum,
@@ -578,6 +582,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	for (;;)
 	{
 		bool		lock_held;
+		bool		write_buffer = false;
 
 		/*
 		 * Select a victim buffer.	The buffer is returned with its header
@@ -600,13 

Re: [HACKERS] limiting hint bit I/O

2011-02-05 Thread Cédric Villemain
2011/2/5 Robert Haas robertmh...@gmail.com:
 Without the patch, as seen above, the first run is about ~80% slower.
 With the patch applied, the first run is about 25% slower than the
 steady state, and subsequent scans decline steadily from there.  Runs
 21 and following flush no further data and run at full speed.  These
 numbers aren't representative of all real-world scenarios, though.
 On a system with many concurrent clients, CLOG contention might be an
 issue; on the flip side, if this table were larger than RAM (not just
 larger than shared_buffers) the decrease in write traffic as we scan
 through the table might actually be a more significant benefit than it
 is here, where it's mostly a question of kernel time; the I/O system
 isn't actually taxed.  So I think this probably needs more testing
 before we decide whether or not it's a good idea.

I *may* have an opportunity to test that in a real world application
where this hint bit was an issue.


 I adopted a few suggestions made previously in this version of the
 patch.  Tom Lane recommended not messing with BM_JUST_DIRTY and
 leaving that for another day.

yes, good.

 I did that.  Also, per my previous
 musings, I've adjusted this version so that vacuum behaves differently
 when dirtying pages rather than when flushing them.  In versions 1 and
 2, vacuum would always write pages that were dirty-only-for-hint-bits
 when allocating a new buffer; in this version the buffer allocation
 logic is the same for vacuum, but it marks pages dirty even when only
 hint bits have changed.  The result is that VACUUM followed by
 CHECKPOINT is enough to make sure all hint bits are set on disk, just
 as is the case today.

For now it looks better to reduce this impact, yes.
Keeping the logic from v1 or v2 would imply a VACUUM FREEZE to 'fix' the
hint bits, right?

-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] limiting hint bit I/O

2011-02-05 Thread Robert Haas
On Sat, Feb 5, 2011 at 3:07 PM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 So I think this probably needs more testing
 before we decide whether or not it's a good idea.

 I *may* have an opportunity to test that in a real world application
 where this hint bit was an issue.

That would be great.  But note that you'll also need to compare it
against an unpatched 9.1devel; otherwise we won't be able to tell
whether it's this helping, or some other 9.1 patch (particularly, the
fsync compaction patch).

 I did that.  Also, per my previous
 musings, I've adjusted this version so that vacuum behaves differently
 when dirtying pages rather than when flushing them.  In versions 1 and
 2, vacuum would always write pages that were dirty-only-for-hint-bits
 when allocating a new buffer; in this version the buffer allocation
 logic is the same for vacuum, but it marks pages dirty even when only
 hint bits have changed.  The result is that VACUUM followed by
 CHECKPOINT is enough to make sure all hint bits are set on disk, just
 as is the case today.

 For now it looks better to reduce this impact, yes.
 Keeping the logic from v1 or v2 would imply a VACUUM FREEZE to 'fix' the
 hint bits, right?

In v1, you'd need to actually dirty the pages, so yeah, VACUUM
(FREEZE) would be pretty much the only way.  In v2, regular VACUUM
would mostly work, except it might miss a smattering of hint bits at
the very end of its scan.  In this version (v3), that's been fixed as
well and now just plain VACUUM should be entirely sufficient.  (The
last few pages examined might not get evicted to disk right away, just
as in the current code, but they're guaranteed to be written
eventually unless a system crash intervenes, again just as in the
current code.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-02-05 Thread Cédric Villemain
2011/2/5 Robert Haas robertmh...@gmail.com:
 On Sat, Feb 5, 2011 at 3:07 PM, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
 So I think this probably needs more testing
 before we decide whether or not it's a good idea.

 I *may* have an opportunity to test that in a real world application
 where this hint bit was an issue.

 That would be great.  But note that you'll also need to compare it
 against an unpatched 9.1devel; otherwise we won't be able to tell
 whether it's this helping, or some other 9.1 patch (particularly, the
 fsync compaction patch).

mmhh, sure.


Just reading the patch...
I understand the idea of the 5% flush.
*Maybe* it makes sense to use the effective_io_concurrency GUC here to
improve the ratio, but that might be perceived as a misuse:
currently effective_io_concurrency is for planning purposes.


-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] limiting hint bit I/O

2011-02-05 Thread Bruce Momjian
Robert Haas wrote:
 On Sat, Feb 5, 2011 at 10:37 AM, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
  Please update the commitfest with the accurate patch, there is only
  the old immature v1 of the patch in it.
  I was about to review it...
 
  https://commitfest.postgresql.org/action/patch_view?id=500
 
 Woops, sorry about that.  Here's an updated version, which I will also
 add to the CommitFest application.
 
 The need for this patch has been somewhat ameliorated by the fsync
 queue compaction patch.  I tested with:

Uh, in this C comment:

+* or not we want to take the time to write it.  We allow up to 5% of
+* otherwise-not-dirty pages to be written due to hint bit changes,

5% of what?  5% of all buffers?  5% of all hint-bit-dirty ones?  Can you
clarify this in the patch?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] limiting hint bit I/O

2011-02-05 Thread Robert Haas
On Sat, Feb 5, 2011 at 4:19 PM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 just reading the patch...
 I understand the idea of the 5% flush.
 *maybe* it makes sense to use the effective_io_concurrency GUC here to
 improve the ratio, but it might be perceived as a misuse:
 currently effective_io_concurrency is for planning purposes.

effective_io_concurrency is supposed to be set based on how many
spindles your RAID array has.  There's no reason to think that the
correct flush percentage is in any way related to that value. The
reasons we might not want backends to write out too many
dirty-only-for-hint-bits buffers during a large sequential scan are
that (a) the actual write() system calls take time to copy the buffers
into kernel space, slowing the scan, and (b) flushing too many buffers
this way could lead to I/O spikes.  Increasing the flush percentage
slows down the first few scans, but takes fewer scans to reach optimal
performance (all hint bits set on disk).  Decreasing the flush
percentage speeds up the first few scans, but is overall less
efficient.
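
As a back-of-the-envelope sketch of that trade-off, assume each scan flushes a fixed percentage of the table's pages and never re-flushes a page whose hint bits have already reached disk. Under that assumption (made here for illustration only, not measured in this thread), the number of scans needed is the ceiling of 100 divided by the flush percentage. The function name is hypothetical.

```c
#include <assert.h>

/*
 * Rough number of full scans until every page's hint bits are on disk,
 * assuming each scan flushes flush_percent of the pages and the flushed
 * sets never overlap.  Hypothetical helper, not part of any patch.
 */
static int
scans_to_converge(int flush_percent)
{
    assert(flush_percent > 0 && flush_percent <= 100);
    return (100 + flush_percent - 1) / flush_percent;   /* ceil(100 / percent) */
}
```

With the 5% figure discussed here this gives 20 scans, and flushing everything gives 1.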

We could make this a tunable, but I'm not clear that there is much
point.  If writing 100% of the pages that have only hint-bit updates
slows the scan by 80% and writing 5% of the pages slows the scan by
25%, then dropping below 5% doesn't seem likely to buy much further
improvement.  You could argue for raising the flush percentage above
5%, but if you go too much higher then it's not clear that you're
gaining anything over just flushing them all.  I don't think we
necessarily have enough experience to know whether this is a good idea
at all, so worrying about whether different people need different
percentages seems a bit premature.

Another point here is that no matter how many times you
sequential-scan the table, you never get performance as good as what
you would get if you vacuumed it, even if the table contains no dead
tuples.  I believe this is because VACUUM will not only set the
HEAP_XMIN_COMMITTED hint bits; it'll also set PD_ALL_VISIBLE on the
page.  I wonder if we shouldn't be autovacuuming even tables that are
insert-only for precisely this reason, as well as to prevent the case
where someone inserts small batches of records for a long time and
then finally deletes some stuff.  There are no visibility map bits set
so, boom, you get this huge, expensive vacuum.  This will, of course,
be even more of an issue when we get index-only scans.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-02-05 Thread Robert Haas
On Sat, Feb 5, 2011 at 4:31 PM, Bruce Momjian br...@momjian.us wrote:
 Uh, in this C comment:

 +        * or not we want to take the time to write it.  We allow up to 5% of
 +        * otherwise-not-dirty pages to be written due to hint bit changes,

 5% of what?  5% of all buffers?  5% of all hint-bit-dirty ones?  Can you
 clarify this in the patch?

5% of buffers that are hint-bit-dirty but not otherwise dirty.  ISTM
that's exactly what the comment you just quoted says on its face, but
I'm open to some other wording you want to propose.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-02-05 Thread Cédric Villemain
2011/2/5 Bruce Momjian br...@momjian.us:
 Robert Haas wrote:
 On Sat, Feb 5, 2011 at 10:37 AM, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
  Please update the commitfest with the accurate patch, there is only
  the old immature v1 of the patch in it.
  I was about to review it...
 
  https://commitfest.postgresql.org/action/patch_view?id=500

 Woops, sorry about that.  Here's an updated version, which I will also
 add to the CommitFest application.

 The need for this patch has been somewhat ameliorated by the fsync
 queue compaction patch.  I tested with:

 Uh, in this C comment:

 +        * or not we want to take the time to write it.  We allow up to 5% of
 +        * otherwise-not-dirty pages to be written due to hint bit changes,

 5% of what?  5% of all buffers?  5% of all hint-bit-dirty ones?  Can you
 clarify this in the patch?


The patch currently allows 100 buffers to be written consecutively every
2000 BufferAlloc calls.
mmmhhh

Robert, I am unsure about the hint_bit_write_allowance counter. It
looks a bit fragile because nothing prevents hint_bit_write_allowance
from increasing a lot, so that not 100 but X*100 of the next
hint-bit-only pages will be written. Isn't that so?

Also, won't buffer_allocation_count hit the INT limit?

-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] limiting hint bit I/O

2011-02-05 Thread Robert Haas
On Sat, Feb 5, 2011 at 5:04 PM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 Robert, I am unsure about the hint_bit_write_allowance counter. It
 looks a bit fragile because nothing prevents hint_bit_write_allowance
 from increasing a lot, so that not 100 but X*100 of the next
 hint-bit-only pages will be written. Isn't that so?

hint_bit_write_allowance can never be more than 100.  The only things
we ever do are set it to exactly 100, and decrease it by 1 if it's
positive.

 Also, won't buffer_allocation_count hit the INT limit?

Sure, if the backend sticks around long enough, but it's no big deal
if it overflows.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-01-19 Thread Andrea Suisani

On 01/18/2011 06:44 PM, Robert Haas wrote:

On Tue, Jan 18, 2011 at 9:24 AM, Merlin Moncure mmonc...@gmail.com wrote:

a few weeks back I hacked an experimental patch that removed the hint
bit action completely.  the results were very premature and/or
incorrect, but my initial findings suggested that hint bits might not
be worth the cost from performance standpoint.  i'd like to see some
more investigation in this direction before going with a complex
application mechanism (although that would be beneficial vs the status
quo).


I think it's not very responsible to allege that hint bits aren't
providing a benefit without providing the patch that you used and the
tests that you ran.


maybe I'm wrong but it seems it did post an experimental patch and also
a tests used, see:

http://archives.postgresql.org/pgsql-hackers/2010-12/msg01897.php

 This is a topic that needs careful analysis, and
 I think that saying hint bits don't provide a benefit... maybe...
 doesn't do anything but confuse the issue.  How about doing some tests
 with the patch from my OP and posting the results?  If removing hint
 bits entirely doesn't degrade performance, then surely the
 less-drastic approach I've taken here ought to be OK too.  But in my
 testing, it didn't look too good.



Andrea





Re: [HACKERS] limiting hint bit I/O

2011-01-19 Thread Andrea Suisani

On 01/19/2011 09:03 AM, Andrea Suisani wrote:

On 01/18/2011 06:44 PM, Robert Haas wrote:

On Tue, Jan 18, 2011 at 9:24 AM, Merlin Moncure mmonc...@gmail.com wrote:

a few weeks back I hacked an experimental patch that removed the hint
bit action completely. the results were very premature and/or
incorrect, but my initial findings suggested that hint bits might not
be worth the cost from performance standpoint. i'd like to see some
more investigation in this direction before going with a complex
application mechanism (although that would be beneficial vs the status
quo).


I think it's not very responsible to allege that hint bits aren't
providing a benefit without providing the patch that you used and the
tests that you ran.


maybe I'm wrong but it seems it did post an experimental patch and also
                             ^^
                             he
a tests used, see:
^
the

sorry for the typos (not enough caffeine I suppose :)



Andrea



Re: [HACKERS] limiting hint bit I/O

2011-01-19 Thread Merlin Moncure
On Tue, Jan 18, 2011 at 12:44 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jan 18, 2011 at 9:24 AM, Merlin Moncure mmonc...@gmail.com wrote:
 a few weeks back I hacked an experimental patch that removed the hint
 bit action completely.  the results were very premature and/or
 incorrect, but my initial findings suggested that hint bits might not
 be worth the cost from performance standpoint.  i'd like to see some
 more investigation in this direction before going with a complex
 application mechanism (although that would be beneficial vs the status
 quo).

 I think it's not very responsible to allege that hint bits aren't
 providing a benefit without providing the patch that you used and the
 tests that you ran.  This is a topic that needs careful analysis, and
 I think that saying hint bits don't provide a benefit... maybe...
 doesn't do anything but confuse the issue.  How about doing some tests
 with the patch from my OP and posting the results?  If removing hint
 bits entirely doesn't degrade performance, then surely the
 less-drastic approach I've taken here ought to be OK too.  But in my
 testing, it didn't look too good.

hm. well, I would have to agree on the performance hit -- I figure 5%
scan penalty should be about the maximum you'd want to pay to get the
i/o reduction.  Odds are you're correct and I blew something...I'd be
happy to test your patch.

merlin



Re: [HACKERS] limiting hint bit I/O

2011-01-19 Thread Merlin Moncure
On Wed, Jan 19, 2011 at 7:57 AM, Merlin Moncure mmonc...@gmail.com wrote:
 On Tue, Jan 18, 2011 at 12:44 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jan 18, 2011 at 9:24 AM, Merlin Moncure mmonc...@gmail.com wrote:
 a few weeks back I hacked an experimental patch that removed the hint
 bit action completely.  the results were very premature and/or
 incorrect, but my initial findings suggested that hint bits might not
 be worth the cost from performance standpoint.  i'd like to see some
 more investigation in this direction before going with a complex
 application mechanism (although that would be beneficial vs the status
 quo).

 I think it's not very responsible to allege that hint bits aren't
 providing a benefit without providing the patch that you used and the
 tests that you ran.  This is a topic that needs careful analysis, and
 I think that saying hint bits don't provide a benefit... maybe...
 doesn't do anything but confuse the issue.  How about doing some tests
 with the patch from my OP and posting the results?  If removing hint
 bits entirely doesn't degrade performance, then surely the
 less-drastic approach I've taken here ought to be OK too.  But in my
 testing, it didn't look too good.

 hm. well, I would have to agree on the performance hit -- I figure 5%
 scan penalty should be about the maximum you'd want to pay to get the
 i/o reduction.  Odds are you're correct and I blew something...I'd be
 happy to test your patch.

Ah, I tested your patch vs stock postgres vs my patch, basically your
results are unhappily correct (mine was just a hair faster than yours
which you'd expect).  The differential was even wider on my laptop
class hardware, maybe 26%.  I also agree that even if the penalty was
reduced or determined to be worth it anyways, your approach to move
the setting/i/o around to appropriate places is the way to go vs
wholesale removal, unless some way is found to reduce clog lookup
penalty to a fraction of what it is now (not likely, I didn't profile
but I bet a lot of the problem is the lw lock).  Interesting I didn't
notice this on my original test :(.

merlin



Re: [HACKERS] limiting hint bit I/O

2011-01-19 Thread Heikki Linnakangas

On 19.01.2011 15:56, Merlin Moncure wrote:

On Wed, Jan 19, 2011 at 7:57 AM, Merlin Moncure mmonc...@gmail.com wrote:

On Tue, Jan 18, 2011 at 12:44 PM, Robert Haas robertmh...@gmail.com wrote:

On Tue, Jan 18, 2011 at 9:24 AM, Merlin Moncure mmonc...@gmail.com wrote:

a few weeks back I hacked an experimental patch that removed the hint
bit action completely.  the results were very premature and/or
incorrect, but my initial findings suggested that hint bits might not
be worth the cost from performance standpoint.  i'd like to see some
more investigation in this direction before going with a complex
application mechanism (although that would be beneficial vs the status
quo).


I think it's not very responsible to allege that hint bits aren't
providing a benefit without providing the patch that you used and the
tests that you ran.  This is a topic that needs careful analysis, and
I think that saying hint bits don't provide a benefit... maybe...
doesn't do anything but confuse the issue.  How about doing some tests
with the patch from my OP and posting the results?  If removing hint
bits entirely doesn't degrade performance, then surely the
less-drastic approach I've taken here ought to be OK too.  But in my
testing, it didn't look too good.


hm. well, I would have to agree on the performance hit -- I figure 5%
scan penalty should be about the maximum you'd want to pay to get the
i/o reduction.  Odds are you're correct and I blew something...I'd be
happy to test your patch.


Ah, I tested your patch vs stock postgres vs my patch, basically your
results are unhappily correct (mine was just a hair faster than yours
which you'd expect).  The differential was even wider on my laptop
class hardware, maybe 26%.  I also agree that even if the penalty was
reduced or determined to be worth it anyways, your approach to move
the setting/i/o around to appropriate places is the way to go vs
wholesale removal, unless some way is found to reduce clog lookup
penalty to a fraction of what it is now (not likely, I didn't profile
but I bet a lot of the problem is the lw lock).  Interesting I didn't
notice this on my original test :(.


One thing to note is that the current visibility-checking code is 
optimized for the case that the hint bit is set, and the codepath where 
it's not is not particularly fast. HeapTupleSatisfiesMVCC does a lot of 
things besides checking the clog. For xmin:


1. Check HEAP_MOVED_OFF / HEAP_MOVED_IN
2. Check if xmin is the current transaction with
   TransactionIdIsCurrentTransactionId()
3. Check if xmin is still in progress with TransactionIdIsInProgress()
4. And finally, check the clog with TransactionIdDidCommit()

It would be nice to profile the code to see where the time really is
spent. Most of it is probably in the clog access, but the
TransactionIdIsInProgress() call can be quite expensive too if there are
a lot of concurrent backends.
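
The ordering described above can be sketched as follows. This is a toy model, not the actual HeapTupleSatisfiesMVCC code: every toy_ name and the TOY_XMIN_COMMITTED flag are hypothetical stand-ins, step 1 (the HEAP_MOVED checks) and the snapshot comparison are left out, and the stubbed lookups simply return fixed answers. It only illustrates why the hint-bit-set path is cheap and the other path is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an "xmin known committed" hint bit. */
#define TOY_XMIN_COMMITTED 0x0001

typedef struct ToyTuple
{
    uint32_t xmin;       /* inserting transaction id */
    uint16_t infomask;   /* hint bits live here in this toy model */
} ToyTuple;

/* Counts how often the expensive commit-status lookup runs. */
static int toy_slow_lookups = 0;

/* Stubs for steps 2-4; a real system would consult backend state and the clog. */
static bool toy_is_current_xact(uint32_t xid) { (void) xid; return false; }
static bool toy_is_in_progress(uint32_t xid)  { (void) xid; return false; }
static bool toy_did_commit(uint32_t xid)
{
    (void) xid;
    toy_slow_lookups++;
    return true;
}

/* Returns true if the inserting transaction is known to have committed. */
static bool
toy_xmin_committed(ToyTuple *tup)
{
    /* Fast path: the hint bit already records the answer. */
    if (tup->infomask & TOY_XMIN_COMMITTED)
        return true;

    /* Slow path, in the order listed above. */
    if (toy_is_current_xact(tup->xmin))
        return false;           /* our own transaction has not committed yet */
    if (toy_is_in_progress(tup->xmin))
        return false;           /* still running elsewhere */
    if (toy_did_commit(tup->xmin))
    {
        tup->infomask |= TOY_XMIN_COMMITTED;   /* remember the answer */
        return true;
    }
    return false;               /* aborted */
}
```

Calling toy_xmin_committed() twice on the same tuple runs the slow lookup only once; the second call returns from the first branch.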


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] limiting hint bit I/O

2011-01-19 Thread Robert Haas
On Wed, Jan 19, 2011 at 8:56 AM, Merlin Moncure mmonc...@gmail.com wrote:
 Ah, I tested your patch vs stock postgres vs my patch, basically your
 results are unhappily correct (mine was just a hair faster than yours
 which you'd expect).  The differential was even wider on my laptop
 class hardware, maybe 26%.  I also agree that even if the penalty was
 reduced or determined to be worth it anyways, your approach to move
 the setting/i/o around to appropriate places is the way to go vs
 wholesale removal, unless some way is found to reduce clog lookup
 penalty to a fraction of what it is now (not likely, I didn't profile
 but I bet a lot of the problem is the lw lock).  Interesting I didn't
 notice this on my original test :(.

OK.  My apologies for the email yesterday in which I forgot that you
actually HAD posted a patch, but thanks for testing mine and posting
your results (and thanks also to Andrea for pointing out the oversight
to me).

Here's a new version of the patch based on some experimentation with
ideas I posted yesterday.  At least on my Mac laptop, this is pretty
effective at blunting the response time spike for the first table
scan, and it converges to steady-state after about 20 table scans.
Rather than write every 20th page, what I've done here is make every
2000th buffer allocation grant an allowance of 100 hint-bit-only
writes.  All dirty pages and the next 100 pages that are
dirty-only-for-hint-bits get written out.  Then we stop writing the
dirty-only-for-hint-bits-pages until we get our next allowance of
writes.  The idea is to try to avoid creating a lot of random writes
on each scan through the table.  At least here, that seems to work
pretty well - the initial scan is only about 25% slower than the
steady state (rather than 6x or more slower).
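
A minimal sketch of that allowance scheme, assuming a per-backend counter pair. The two field names echo the ones discussed in this thread, but the struct, the function name, the constants' names and the choice of an unsigned counter are assumptions for illustration and are not taken from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Figures taken from the description above. */
#define ALLOC_INTERVAL   2000   /* allocations between allowance grants */
#define WRITE_ALLOWANCE  100    /* hint-bit-only writes permitted per grant */

/* Hypothetical per-backend state; the layout is assumed. */
typedef struct HintWriteState
{
    unsigned int buffer_allocation_count;   /* unsigned, so wraparound is well defined */
    int          hint_bit_write_allowance;
} HintWriteState;

/*
 * Decide whether a buffer being evicted should be written out.
 * "dirty" means dirty for real changes; "hint_dirty" means only hint
 * bits changed.
 */
static bool
should_write_on_evict(HintWriteState *st, bool dirty, bool hint_dirty)
{
    /* On the first allocation and every ALLOC_INTERVAL allocations after
     * that, reset the allowance to exactly WRITE_ALLOWANCE. */
    if (st->buffer_allocation_count++ % ALLOC_INTERVAL == 0)
        st->hint_bit_write_allowance = WRITE_ALLOWANCE;

    /* The allowance is only ever reset or decremented, never accumulated. */
    assert(st->hint_bit_write_allowance <= WRITE_ALLOWANCE);

    if (dirty)
        return true;            /* really dirty pages are always written */
    if (!hint_dirty)
        return false;           /* clean page: nothing to write */
    if (st->hint_bit_write_allowance > 0)
    {
        st->hint_bit_write_allowance--;
        return true;            /* spend one unit of the allowance */
    }
    return false;               /* allowance exhausted: skip the write */
}
```

Feeding this 20000 consecutive hint-bit-only pages yields 1000 writes, i.e. 100 out of every 2000, which is where the 5% figure comes from.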

I am seeing occasional latency spikes that appear to be the result of
the OS write cache filling up and deciding that it has to flush
everything to disk before writing anything more.  I'm not too
concerned about that because this is a fairly artificial test case
(one doesn't usually sit around doing consecutive SELECT sum(1) FROM s
commands) but it seems like pretty odd behavior.  The system sits
there doing no writes at all as I'm sending more and more dirty pages
into the system buffer cache and then, boom, write storm.  I haven't
yet tested to see if the same behavior occurs on Linux.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


bm-hint-bits-v2.patch
Description: Binary data



Re: [HACKERS] limiting hint bit I/O

2011-01-19 Thread Merlin Moncure
On Wed, Jan 19, 2011 at 10:44 AM, Robert Haas robertmh...@gmail.com wrote:
 Here's a new version of the patch based on some experimentation with
 ideas I posted yesterday.  At least on my Mac laptop, this is pretty
 effective at blunting the response time spike for the first table
 scan, and it converges to steady-state after about 20 table scans.
 Rather than write every 20th page, what I've done here is make every
 2000th buffer allocation grant an allowance of 100 hint-bit-only
 writes.  All dirty pages and the next 100 pages that are
 dirty-only-for-hint-bits get written out.  Then we stop writing the
 dirty-only-for-hint-bits-pages until we get our next allowance of
 writes.  The idea is to try to avoid creating a lot of random writes
 on each scan through the table.  At least here, that seems to work
 pretty well - the initial scan is only about 25% slower than the
 steady state (rather than 6x or more slower).

does this only impact the scan case?  in oltp scenarios you want to
write out the bits asap, i would imagine.   what about time based
flushing, so that only x dirty hint bit pages can be written out per
time unit y?

merlin



Re: [HACKERS] limiting hint bit I/O

2011-01-19 Thread Merlin Moncure
On Wed, Jan 19, 2011 at 9:13 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 19.01.2011 15:56, Merlin Moncure wrote:

 On Wed, Jan 19, 2011 at 7:57 AM, Merlin Moncure mmonc...@gmail.com
  wrote:

 On Tue, Jan 18, 2011 at 12:44 PM, Robert Haas robertmh...@gmail.com
  wrote:

 On Tue, Jan 18, 2011 at 9:24 AM, Merlin Moncure mmonc...@gmail.com
  wrote:

 a few weeks back I hacked an experimental patch that removed the hint
 bit action completely.  the results were very premature and/or
 incorrect, but my initial findings suggested that hint bits might not
 be worth the cost from performance standpoint.  i'd like to see some
 more investigation in this direction before going with a complex
 application mechanism (although that would be beneficial vs the status
 quo).

 I think it's not very responsible to allege that hint bits aren't
 providing a benefit without providing the patch that you used and the
 tests that you ran.  This is a topic that needs careful analysis, and
 I think that saying hint bits don't provide a benefit... maybe...
 doesn't do anything but confuse the issue.  How about doing some tests
 with the patch from my OP and posting the results?  If removing hint
 bits entirely doesn't degrade performance, then surely the
 less-drastic approach I've taken here ought to be OK too.  But in my
 testing, it didn't look too good.

 hm. well, I would have to agree on the performance hit -- I figure 5%
 scan penalty should be about the maximum you'd want to pay to get the
 i/o reduction.  Odds are you're correct and I blew something...I'd be
 happy to test your patch.

 Ah, I tested your patch vs stock postgres vs my patch, basically your
 results are unhappily correct (mine was just a hair faster than yours
 which you'd expect).  The differential was even wider on my laptop
 class hardware, maybe 26%.  I also agree that even if the penalty was
 reduced or determined to be worth it anyways, your approach to move
 the setting/i/o around to appropriate places is the way to go vs
 wholesale removal, unless some way is found to reduce clog lookup
 penalty to a fraction of what it is now (not likely, I didn't profile
 but I bet a lot of the problem is the lw lock).  Interesting I didn't
 notice this on my original test :(.

 One thing to note is that the current visibility-checking code is optimized
 for the case that the hint bit is set, and the codepath where it's not is
 not particularly fast. HeapTupleSatisfiesMVCC does a lot of things besides
 checking the clog. For xmin:

 1. Check HEAP_MOVED_OFF / HEAP_MOVED_IN
 2. Check if xmin is the current transaction with
 TransactionIdIsCurrentTransactionId()
 3. Check if xmin is still in progress with TransactionIdIsInProgress()
 4. And finally, check the clog with TransactionIdDidCommit()

 It would be nice to profile the code to see where the time really is spent.
 Most of it is probably in the clog access, but the TransactionIdIsInProgress()
 call can be quite expensive too if there are a lot of concurrent backends.

Nice thought -- it's worth checking out. I'll play around with it some
more -- I think you're right and the first step is to profile.  If the
bottleneck is in fact the lock there's not much that can be done
afaict.

merlin



Re: [HACKERS] limiting hint bit I/O

2011-01-19 Thread Robert Haas
On Wed, Jan 19, 2011 at 11:18 AM, Merlin Moncure mmonc...@gmail.com wrote:
 On Wed, Jan 19, 2011 at 10:44 AM, Robert Haas robertmh...@gmail.com wrote:
 Here's a new version of the patch based on some experimentation with
 ideas I posted yesterday.  At least on my Mac laptop, this is pretty
 effective at blunting the response time spike for the first table
 scan, and it converges to steady-state after about 20 table scans.
 Rather than write every 20th page, what I've done here is make every
 2000th buffer allocation grant an allowance of 100 hint-bit-only
 writes.  All dirty pages and the next 100 pages that are
 dirty-only-for-hint-bits get written out.  Then we stop writing the
 dirty-only-for-hint-bits-pages until we get our next allowance of
 writes.  The idea is to try to avoid creating a lot of random writes
 on each scan through the table.  At least here, that seems to work
 pretty well - the initial scan is only about 25% slower than the
 steady state (rather than 6x or more slower).

 does this only impact the scan case?  in oltp scenarios you want to
 write out the bits asap, i would imagine.   what about time based
 flushing, so that only x dirty hint bit pages can be written out per
 time unit y?

No, it doesn't only affect the scan case.  But I don't think that's
bad.  The goal is for the background writer to provide enough clean
pages that backends don't have to write anything at all.  If that's
not happening, the backends will be slowed by the need to write out
pages themselves in order to create a sufficient supply of clean pages
to satisfy their allocation needs.  The easiest way for that situation
to occur is if the backend is doing a large sequential scan of a table
- in that case, it's by definition cycling through pages at top speed,
and the fact that it's cycling through them in a ring buffer rather
than using all of shared_buffers makes the loop even tighter.  But if
it's possible under some other set of circumstances, the behavior is
still reasonable.  This behavior kicks in if more than 100 out of some
set of 2000 page allocations would require a write only for the
purpose of flushing hint bits.

Time-based flushing would be problematic in several respects.  First,
it would require a kernel call, which would be vastly more expensive
than what I'm doing now, and might have undesirable performance
implications for that reason.  Second, I don't think it would be the
right way to tune it even if that were not an issue.  It doesn't
really matter whether the system takes a millisecond or a microsecond
or a nanosecond to write each buffer - what matters is that writing
all the buffers is a lot slower than writing none of them.  So what we
want to do is write a percentage of them, in a way that guarantees
that they'll all eventually get written if people continue to access
the same data.  This does that, and a time-based setting would not; it
would also almost certainly require tuning based on the I/O capacities
of the system it's running on, which isn't necessary with this
approach.

Before we get too deeply involved in theory, can you give this a test
drive on your system and see how it looks?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-01-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 ... So what we
 want to do is write a percentage of them, in a way that guarantees
 that they'll all eventually get written if people continue to access
 the same data.

The word guarantee seems quite inappropriate here, since as far as I
can see this approach provides no such guarantee --- even after many
cycles you'd never be really certain all the bits were set.

What I asked for upthread was that we continue to have some
deterministic, practical way to force all hint bits in a table to be
set.  This is not *remotely* responding to that request.  It's still not
deterministic, and even if it were, vacuuming a large table 20 times
isn't a very practical solution.

regards, tom lane



Re: [HACKERS] limiting hint bit I/O

2011-01-19 Thread Robert Haas
On Wed, Jan 19, 2011 at 11:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 ... So what we
 want to do is write a percentage of them, in a way that guarantees
 that they'll all eventually get written if people continue to access
 the same data.

 The word guarantee seems quite inappropriate here, since as far as I
 can see this approach provides no such guarantee --- even after many
 cycles you'd never be really certain all the bits were set.

 What I asked for upthread was that we continue to have some
 deterministic, practical way to force all hint bits in a table to be
 set.  This is not *remotely* responding to that request.  It's still not
 deterministic, and even if it were, vacuuming a large table 20 times
 isn't a very practical solution.

I get the impression you haven't spent as much time reading my email
as I spent writing it.  Perhaps I'm wrong, but in any case the code
doesn't do what you're suggesting.  In the most recently posted
version of this patch, which is v2, if VACUUM hits a page that is
hint-bit-dirty, it always writes it.  Full stop.  The 20 times bit
applies to a SELECT * FROM table, which is a rather different case.

As I write this, I realize that there is a small fly in the ointment
here, which is that neither VACUUM nor SELECT force out all the pages
they modify to disk.  So there is some small amount of remaining
nondeterminism, even if you VACUUM, because VACUUM will leave the last
few pages it dirties in shared_buffers, and whether those hint bits
hit the disk will depend on a decision made at the time they're
evicted, not at the time they were dirtied.  Possibly I could fix that
by making SetBufferCommitInfoNeedsSave() set BM_DIRTY during vacuum
and BM_HINT_BITS at other times.  That would nail the lid shut pretty
tight.
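
The idea in that last paragraph might look roughly like this. It is a sketch only: the flag values, the ToyBuffer struct and the function name are hypothetical, and the real SetBufferCommitInfoNeedsSave() takes different arguments and does more work (locking, for one).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values, named after the flags mentioned above. */
#define TOY_BM_DIRTY      0x01   /* page must be written before eviction */
#define TOY_BM_HINT_BITS  0x02   /* only hint bits changed; write is optional */

typedef struct ToyBuffer
{
    unsigned int flags;
} ToyBuffer;

/*
 * Record that hint bits on this buffer changed.  During vacuum the page
 * is marked fully dirty so it is guaranteed to reach disk; otherwise it
 * is marked hint-bit-dirty only, leaving the decision to eviction time.
 */
static void
toy_mark_hint_bits_changed(ToyBuffer *buf, bool in_vacuum)
{
    if (in_vacuum)
        buf->flags |= TOY_BM_DIRTY;
    else
        buf->flags |= TOY_BM_HINT_BITS;
}
```

With this split, the decision for vacuum is made when the page is dirtied rather than when it is evicted, which is what removes the remaining nondeterminism.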

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-01-18 Thread Jim Nasby
On Jan 16, 2011, at 4:37 PM, Kevin Grittner wrote:
 Robert Haas  wrote:
 
 a quick-and-dirty attempt to limit the amount of I/O caused by hint
 bits. I'm still very interested in knowing what people think about
 that.
 
 I found the elimination of the response-time spike promising.  I
 don't think I've seen enough data yet to feel comfortable endorsing
 it, though.  I guess the question in my head is: how much of the
 lingering performance hit was due to having to go to clog and how
 much was due to competition with the deferred writes?  If much of it
 is due to repeated recalculation of visibility based on clog info, I
 think there would need to be some way to limit how many times that
 happened before the hint bits were saved.

What if we sped up the case where hint bits aren't set? Has anyone collected 
data on the actual pain points of checking visibility when hint bits aren't 
set? How about when setting hint bits is intentionally delayed? I wish we had 
some more infrastructure around the XIDCACHE counters; having that info 
available for people's general workloads might be extremely valuable. Even if I 
was to compile with it turned on, it seems the only way to get at it is via 
stderr, which is very hard to deal with.

Lacking performance data (and for my own education), I've spent the past few 
hours studying HeapTupleSatisfiesNow(). If I'm understanding it correctly, the 
three critical functions from a performance standpoint are 
TransactionIdIsCurrentTransactionId, TransactionIdIsInProgress and 
TransactionIdDidCommit. Note that all 3 can potentially be called twice; once 
to check xmin and once to check xmax.

ISTM TransactionIdIsCurrentTransactionId is missing a shortcut: shouldn't we be 
able to immediately return false if the XID we're checking is older than some 
value, like global xmin? Maybe it's only worth checking that case if we hit a 
subtransaction, but if the check is faster than one or two loops through the 
binary search... I would think this at least warrants a one-XID cache a la 
cachedFetchXidStatus (though it would need to be a different cache...). Another 
issue is that TransactionIdIsInProgress will call this function as well, unless 
it skips out because the transaction is < RecentXmin.
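
A minimal sketch of the shortcut being asked about, assuming the premise holds that no XID older than global xmin can belong to the current transaction. It is Python for illustration only: the function name, the global_xmin argument and the sorted list of the backend's own XIDs are hypothetical, and XIDs are treated as plain integers, whereas real XID comparisons are wraparound-aware (TransactionIdPrecedes).

```python
from bisect import bisect_left


def is_current_transaction_id(xid, global_xmin, my_xids_sorted):
    """Return True if xid is one of this backend's own (sub)transaction XIDs.

    my_xids_sorted is a sorted list of the XIDs assigned to the current
    transaction and its subtransactions (a stand-in for the real state).
    """
    # Proposed shortcut: anything older than global xmin cannot be ours,
    # so skip the search entirely.
    if xid < global_xmin:
        return False
    # Otherwise fall back to the binary search over our own XIDs.
    i = bisect_left(my_xids_sorted, xid)
    return i < len(my_xids_sorted) and my_xids_sorted[i] == xid
```

Whether the extra comparison pays for itself is exactly the open question: it only helps if it is cheaper than the first step or two of the binary search it avoids.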

TransactionIdIsInProgress does a fair amount of easy checking already... the 
biggest thing is that if it's less than RecentXmin we bounce out immediately. 
If we can't bounce out immediately though, this routine gets pretty expensive 
unless the XID is currently running and is top-level. It's worse if there are 
subxacts and can be horribly bad if any subxact caches have overflowed. Note 
that if anything has overflowed, then we end up going to clog and possibly 
pg_subtrans.

Finally, TransactionIdDidCommit hits clog.

So the degenerate cases seem to be:

- Really old XIDs. These suck because there's a good chance we'll have to read 
from clog.
- XIDs > RecentXmin that are not currently running top-level transactions. The 
pain here increases with subtransaction use.

For the second case, if we can ensure that RecentXmin is not very old then 
there's generally a smaller chance that TransactionIdIsInProgress has to do a 
lot of work. My experience is that most systems that have a high transaction 
rate don't end up with a lot of long-running transactions. Storing a list of 
the X oldest transactions would allow us to keep RecentXmin closer to the most 
recent XID.

For the first case, we should be able to create a more optimized clog lookup 
method that works for older XIDs. If we restrict this to XIDs that are older 
than GlobalXmin then we can simplify things because we don't have to worry 
about transactions that are in-progress. We also don't need to differentiate 
between subtransactions and their parents (though, we obviously need to figure 
out whether a subtransaction is considered to be committed or not). Because 
we're restricting this to XIDs that we know we can determine the state of, we 
only need to store a maximum of 1 bit per XID. That's already half the size of 
clog. But because we don't have to build this list on the fly (we don't need 
to update it on every commit/abort as long as we know the range of XIDs that 
are stored), we don't have to support random writes. That means we can use a 
structure that's more complex to maintain than a simple bitmap. Or maybe we 
stick with a bitmap but compress it.
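
As a rough illustration of the one-bit-per-XID idea, here is an uncompressed version sketched in Python. The class name CommitBitmap and its interface are hypothetical; it assumes every XID in the covered range is already known to be either committed or aborted, which is what restricting the range to XIDs older than GlobalXmin is meant to buy us.

```python
class CommitBitmap:
    """One bit per XID for a fixed range [base_xid, base_xid + count)."""

    def __init__(self, base_xid, statuses):
        # statuses: iterable of booleans, True meaning "committed".
        statuses = list(statuses)
        self.base_xid = base_xid
        self.count = len(statuses)
        self.bits = bytearray((self.count + 7) // 8)
        # Built once in bulk; there is no per-commit update path.
        for offset, committed in enumerate(statuses):
            if committed:
                self.bits[offset // 8] |= 1 << (offset % 8)

    def did_commit(self, xid):
        offset = xid - self.base_xid
        if not 0 <= offset < self.count:
            raise KeyError("xid %d is outside the covered range" % xid)
        return bool(self.bits[offset // 8] & (1 << (offset % 8)))
```

Because the structure is built in one pass and then only read, it could be swapped for something compressed without changing the lookup interface.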
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] limiting hint bit I/O

2011-01-18 Thread Merlin Moncure
On Tue, Jan 18, 2011 at 3:47 AM, Jim Nasby j...@nasby.net wrote:
 On Jan 16, 2011, at 4:37 PM, Kevin Grittner wrote:
 Robert Haas  wrote:

 a quick-and-dirty attempt to limit the amount of I/O caused by hint
 bits. I'm still very interested in knowing what people think about
 that.

 I found the elimination of the response-time spike promising.  I
 don't think I've seen enough data yet to feel comfortable endorsing
 it, though.  I guess the question in my head is: how much of the
 lingering performance hit was due to having to go to clog and how
 much was due to competition with the deferred writes?  If much of it
 is due to repeated recalculation of visibility based on clog info, I
 think there would need to be some way to limit how many times that
 happened before the hint bits were saved.

 What if we sped up the case where hint bits aren't set? Has anyone collected 
 data on the actual pain points of checking visibility when hint bits aren't 
 set? How about when setting hint bits is intentionally delayed? I wish we had 
 some more infrastructure around the XIDCACHE counters; having that info 
 available for people's general workloads might be extremely valuable. Even if 
 I was to compile with it turned on, it seems the only way to get at it is via 
 stderr, which is very hard to deal with.

 Lacking performance data (and for my own education), I've spent the past few 
 hours studying HeapTupleSatisfiesNow(). If I'm understanding it correctly, 
 the three critical functions from a performance standpoint are 
 TransactionIdIsCurrentTransactionId, TransactionIdIsInProgress and 
 TransactionIdDidCommit. Note that all 3 can potentially be called twice; once 
 to check xmin and once to check xmax.

hint bits give you two benefits: you don't have to lwlock the clog and
you don't have to go look them up.  a lookup is either a lru cache
lookup or an i/o lookup on the clog.  the cost of course is extra
writing out the bits.  in most workloads they are not even noticed but
in particular cases they are an i/o multiplier.

a few weeks back I hacked an experimental patch that removed the hint
bit action completely.  the results were very premature and/or
incorrect, but my initial findings suggested that hint bits might not
be worth the cost from performance standpoint.  i'd like to see some
more investigation in this direction before going with a complex
application mechanism (although that would be beneficial vs the status
quo).

an ideal testing environment to compare would be a mature database
(full clog) with some verifiable performance tests and a mixed
olap/oltp workload.

merlin



Re: [HACKERS] limiting hint bit I/O

2011-01-18 Thread Jim Nasby
On Jan 18, 2011, at 8:24 AM, Merlin Moncure wrote:
 a few weeks back I hacked an experimental patch that removed the hint
 bit action completely.  the results were very premature and/or
 incorrect, but my initial findings suggested that hint bits might not
 be worth the cost from performance standpoint.  i'd like to see some
 more investigation in this direction before going with a complex
 application mechanism (although that would be beneficial vs the status
 quo).

If you're not finding much benefit to hint bits, that's *very* interesting. 
Everything I outlined certainly looks like a pretty damn expensive code path; 
it's really surprising that hint bits don't help.

I think it would be very valuable to profile the cost of the different code 
paths involved in the HeapTupleSatisfies* functions, even if the workload is 
just pgbench.

 an ideal testing environment to compare would be a mature database
 (full clog) with some verifiable performance tests and a mixed
 olap/oltp workload.

We're working on setting such a framework up. Unfortunately it will only be 8.3 
to start, but we hope to be on 9.0 soon.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] limiting hint bit I/O

2011-01-18 Thread Robert Haas
On Tue, Jan 18, 2011 at 3:47 AM, Jim Nasby j...@nasby.net wrote:
 On Jan 16, 2011, at 4:37 PM, Kevin Grittner wrote:
 Robert Haas  wrote:

 a quick-and-dirty attempt to limit the amount of I/O caused by hint
 bits. I'm still very interested in knowing what people think about
 that.

 I found the elimination of the response-time spike promising.  I
 don't think I've seen enough data yet to feel comfortable endorsing
 it, though.  I guess the question in my head is: how much of the
 lingering performance hit was due to having to go to clog and how
 much was due to competition with the deferred writes?  If much of it
 is due to repeated recalculation of visibility based on clog info, I
 think there would need to be some way to limit how many times that
 happened before the hint bits were saved.

 What if we sped up the case where hint bits aren't set? Has anyone collected 
 data on the actual pain points of checking visibility when hint bits aren't 
 set?

I think that's worth looking into, but I don't have any present plan
to actually do it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-01-18 Thread Robert Haas
On Tue, Jan 18, 2011 at 9:24 AM, Merlin Moncure mmonc...@gmail.com wrote:
 a few weeks back I hacked an experimental patch that removed the hint
 bit action completely.  the results were very premature and/or
 incorrect, but my initial findings suggested that hint bits might not
 be worth the cost from performance standpoint.  i'd like to see some
 more investigation in this direction before going with a complex
 application mechanism (although that would be beneficial vs the status
 quo).

I think it's not very responsible to allege that hint bits aren't
providing a benefit without providing the patch that you used and the
tests that you ran.  This is a topic that needs careful analysis, and
I think that saying "hint bits don't provide a benefit... maybe..."
doesn't do anything but confuse the issue.  How about doing some tests
with the patch from my OP and posting the results?  If removing hint
bits entirely doesn't degrade performance, then surely the
less-drastic approach I've taken here ought to be OK too.  But in my
testing, it didn't look too good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-01-18 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I think you may be confused about what the patch does - currently,
 pages with hint bit changes are considered dirty, period.
 Therefore, they are written whenever any other dirty page would be
 written: by the background writer cleaning scan, at checkpoints,
 and when a backend must write a dirty buffer before reallocating it
 to hold a different page. The patch keeps the first of these and
 changes the second two

While I was trying to performance-test the texteq patch, it occurred to
me that this proposed hint-bit change has got a serious drawback.  To
wit, that it will totally destroy reproducibility of any performance
test that involves table scans.  Right now, you know that you can take
hint bits out of the equation by doing a vacuum analyze and checkpoint;
after that, all hint bits in the table are known to be set and written
to disk.  Then you can get on with comparing the effects of some patch
or other.  With the proposed patch, it will never be clear whether
all the hint bits are set, because the patch specifically removes the
deterministic ways to get a hint bit written out.  So you'll never be
very sure whether a performance difference you think you see is real,
or whether one case or the other got affected by extra clog lookups.
It's hard enough already to be sure about performance changes on the
order of 1%, but this will make it impossible.

regards, tom lane



Re: [HACKERS] limiting hint bit I/O

2011-01-18 Thread Robert Haas
On Tue, Jan 18, 2011 at 1:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I think you may be confused about what the patch does - currently,
 pages with hint bit changes are considered dirty, period.
 Therefore, they are written whenever any other dirty page would be
 written: by the background writer cleaning scan, at checkpoints,
 and when a backend must write a dirty buffer before reallocating it
 to hold a different page. The patch keeps the first of these and
 changes the second two

 While I was trying to performance-test the texteq patch, it occurred to
 me that this proposed hint-bit change has got a serious drawback.  To
 wit, that it will totally destroy reproducibility of any performance
 test that involves table scans.  Right now, you know that you can take
 hint bits out of the equation by doing a vacuum analyze and checkpoint;
 after that, all hint bits in the table are known to be set and written
 to disk.  Then you can get on with comparing the effects of some patch
 or other.  With the proposed patch, it will never be clear whether
 all the hint bits are set, because the patch specifically removes the
 deterministic ways to get a hint bit written out.  So you'll never be
 very sure whether a performance difference you think you see is real,
 or whether one case or the other got affected by extra clog lookups.
 It's hard enough already to be sure about performance changes on the
 order of 1%, but this will make it impossible.

True.  You could perhaps fix that by adding a GUC, but that feels
awfully like making it the user's problem to fix our broken
implementation.  Maybe we could live with it if the GUC were only
something developers ever needed to use, but I expect different people
would have different ideas about the correct setting in production.

If I'm not failing to understand the situation, the problem with the
first sequential scan after a bulk load is that we're cycling through
a ring of buffers that all have hint-bit changes and therefore all
have to be written.  The first pass through the ring is OK, but after
that every new buffer we bring in requires evicting a buffer that we
first have to write.  Of course, with the patch, this bottleneck is
removed by skipping all those writes, but that now causes a second
problem: the pages only get written if the background writer happens
to notice them before the backend gets all the way around the ring,
and that's pretty hit-or-miss, so we basically dribble hint bits out
to disk here and there but the steady state never really converges to
all hint bits on disk.

Maybe we could work around this by making the algorithm a little more
sophisticated.  Instead of the rather unilateral policy "backends
don't write pages that are only dirty due to hint bit changes!" we
could have some more nuanced rules.  For example, we might decree that
a backend will maintain a counter of the number of non-dirty pages
it's allocated.  Once it's allocated 20 pages that are either clean or
dirty-only-for-hint-bits, it writes that (or the next)
dirty-only-for-hint-bits page it encounters.  That way, the effort of hint
bit setting would be spread out over the first 20 table scans, and
after that you converge to steady state.  We could also possibly
special-case vacuum to always write dirty-only-for-hint-bits pages, on
the theory that the work is going to have to be done at some point,
and we're better off doing it during a maintenance task than
elsewhere.
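
A toy simulation of the counter rule, to show the convergence I have in mind. This is Python for illustration, not patch code: the function name and the model are assumptions. Every page starts out dirty-only-for-hint-bits, a written page stays clean, and a single backend scans the table repeatedly.

```python
def scans_until_all_written(num_pages, write_interval=20):
    """Count sequential scans needed before every page's hint bits hit disk."""
    written = [False] * num_pages  # True once the page has been written out
    allocated = 0                  # clean or hint-only pages seen since last write
    scans = 0
    while not all(written):
        scans += 1
        for page in range(num_pages):
            allocated += 1
            # Once the counter reaches the interval, write the next page
            # that is still dirty-only-for-hint-bits, then start over.
            if allocated >= write_interval and not written[page]:
                written[page] = True
                allocated = 0
    return scans
```

Each write needs at least write_interval allocations before it, so a 100-page table with the default interval takes at least 20 scans; the exact count depends on how the counter lines up with pages that are already clean.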

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-01-18 Thread Heikki Linnakangas

On 18.01.2011 21:16, Robert Haas wrote:

 On Tue, Jan 18, 2011 at 1:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 While I was trying to performance-test the texteq patch, it occurred to
 me that this proposed hint-bit change has got a serious drawback.  To
 wit, that it will totally destroy reproducibility of any performance
 test that involves table scans.  Right now, you know that you can take
 hint bits out of the equation by doing a vacuum analyze and checkpoint;
 after that, all hint bits in the table are known to be set and written
 to disk.  Then you can get on with comparing the effects of some patch
 or other.  With the proposed patch, it will never be clear whether
 all the hint bits are set, because the patch specifically removes the
 deterministic ways to get a hint bit written out.  So you'll never be
 very sure whether a performance difference you think you see is real,
 or whether one case or the other got affected by extra clog lookups.
 It's hard enough already to be sure about performance changes on the
 order of 1%, but this will make it impossible.

 True.  You could perhaps fix that by adding a GUC, but that feels
 awfully like making it the user's problem to fix our broken
 implementation.  Maybe we could live with it if the GUC were only
 something developers ever needed to use, but I expect different people
 would have different ideas about the correct setting in production.


VACUUM (SET HINT BITS) table

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] limiting hint bit I/O

2011-01-18 Thread Robert Haas
On Tue, Jan 18, 2011 at 3:00 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 18.01.2011 21:16, Robert Haas wrote:

 On Tue, Jan 18, 2011 at 1:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 While I was trying to performance-test the texteq patch, it occurred to
 me that this proposed hint-bit change has got a serious drawback.  To
 wit, that it will totally destroy reproducibility of any performance
 test that involves table scans.  Right now, you know that you can take
 hint bits out of the equation by doing a vacuum analyze and checkpoint;
 after that, all hint bits in the table are known to be set and written
 to disk.  Then you can get on with comparing the effects of some patch
 or other.  With the proposed patch, it will never be clear whether
 all the hint bits are set, because the patch specifically removes the
 deterministic ways to get a hint bit written out.  So you'll never be
 very sure whether a performance difference you think you see is real,
 or whether one case or the other got affected by extra clog lookups.
 It's hard enough already to be sure about performance changes on the
 order of 1%, but this will make it impossible.

 True.  You could perhaps fix that by adding a GUC, but that feels
 awfully like making it the user's problem to fix our broken
 implementation.  Maybe we could live with it if the GUC were only
 something developers ever needed to use, but I expect different people
 would have different ideas about the correct setting in production.

 VACUUM (SET HINT BITS) table

Something along those lines could work too, but I don't see much
problem with making VACUUM do it unconditionally.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-01-17 Thread Jim Nasby
On Jan 14, 2011, at 7:24 PM, Josh Berkus wrote:
 On 1/14/11 11:51 AM, Tom Lane wrote:
 The people whose tables are mostly insert-only complain about it, but
 that's not the majority of our userbase IMO.  We just happen to have a
 couple of particularly vocal ones, like Berkus.
 
 It might or might not be the majority, but it's an extremely common case
 affecting a lot of users.  Many, if not most, software applications have
 a log table (or two, or three) which just accumulates rows, and when
 that log table gets a vacuum freeze it pretty much halts the database in
 its tracks.  Between my client practice and IRC, I run across complaints
 about this issue around 3 times a month.
 
 And data warehousing is a significant portion of our user base, and
 *all* DW users are affected by this.  In some cases, vacuum issues are
 sufficient to prevent people from using PostgreSQL for data warehousing.

This also affects us every time we stand up a new londiste replica, and I 
expect Slony folks would suffer the same thing. When you copy everything over, 
that's going to happen in a relatively short range of XIDs, so when those XIDs 
start hitting freeze age suddenly *everything* needs to get frozen.

As for hint bits, you're generally not going to have anyone reading from a 
slave that's still being built, so you won't see any hint bit setting until you 
actually open up for users. So for the first X amount of time, performance 
takes a big hit because you have to write all the hints out. Granted, you can 
technically VACUUM FREEZE after the slave is built, but that means more time 
before you can start using the slave and it's something you have to remember to 
do.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] limiting hint bit I/O

2011-01-16 Thread Robert Haas
On Sat, Jan 15, 2011 at 6:28 PM, Josh Berkus j...@agliodbs.com wrote:
 If the problem is that all the freezing happens at once, then ISTM the
 solution is to add a random factor. Say, when a tuple just passes the
 lower threshold it has a 1% chance of being frozen. The chance grows
 until it is 100% as it reaches the upper threshold.

 Doesn't have to be random; it could be deterministic.  That is, we could
 have a vacuum_freeze_max_size parameter ... and accompanying autovacuum
 parameter ... which allowed the user to limit freezing scans to, say,
 1GB of the table at a time.  If I could, say, call a manual freeze of
 10% of the largest tables every night, then I might actually be able to
 schedule it.  It's a full scan of the whole table which is fatal.

I think this is worth pursuing at some point, though of course one
needs to devise an algorithm that spreads out the freezing enough but
not too much.  But it's fairly off-topic from the original subject of
this thread, which was a quick-and-dirty attempt to limit the amount
of I/O caused by hint bits.  I'm still very interested in knowing what
people think about that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-01-16 Thread Kevin Grittner
Robert Haas  wrote:
 
 a quick-and-dirty attempt to limit the amount of I/O caused by hint
 bits. I'm still very interested in knowing what people think about
 that.
 
I found the elimination of the response-time spike promising.  I
don't think I've seen enough data yet to feel comfortable endorsing
it, though.  I guess the question in my head is: how much of the
lingering performance hit was due to having to go to clog and how
much was due to competition with the deferred writes?  If much of it
is due to repeated recalculation of visibility based on clog info, I
think there would need to be some way to limit how many times that
happened before the hint bits were saved.
 
-Kevin



Re: [HACKERS] limiting hint bit I/O

2011-01-16 Thread Robert Haas
On Sun, Jan 16, 2011 at 5:37 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Robert Haas  wrote:
 a quick-and-dirty attempt to limit the amount of I/O caused by hint
 bits. I'm still very interested in knowing what people think about
 that.

 I found the elimination of the response-time spike promising.  I
 don't think I've seen enough data yet to feel comfortable endorsing
 it, though.  I guess the question in my head is: how much of the
 lingering performance hit was due to having to go to clog and how
 much was due to competition with the deferred writes?  If much of it
 is due to repeated recalculation of visibility based on clog info, I
 think there would need to be some way to limit how many times that
 happened before the hint bits were saved.

I think you may be confused about what the patch does - currently,
pages with hint bit changes are considered dirty, period.  Therefore,
they are written whenever any other dirty page would be written: by
the background writer cleaning scan, at checkpoints, and when a
backend must write a dirty buffer before reallocating it to hold a
different page.  The patch keeps the first of these and changes the
second two: pages with only hint bit changes are dirty for purposes of
the background writer, but are considered clean for checkpoint
purposes and buffer recycling.  IOW, I'm not adding any new mechanism
for these pages to get written.
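
Spelled out as a decision table, the behavior looks roughly like the sketch below. It is Python for illustration only; the three page states, the path names and the function must_write are invented for this sketch and are not identifiers from the patch.

```python
def must_write(page_state, path, patched=True):
    """Decide whether a write path treats the page as needing a write.

    page_state: "clean", "hint_only" (only hint bits changed) or "dirty"
    path:       "bgwriter", "checkpoint" or "eviction"
    patched:    False models current behavior, True models the patch
    """
    if page_state == "dirty":
        return True
    if page_state == "hint_only":
        if not patched:
            # Today a hint-bit change dirties the page, period.
            return True
        # With the patch only the background writer cleaning scan
        # still counts such a page as dirty.
        return path == "bgwriter"
    return False
```

The only row that changes is hint_only: with the patch, checkpoints and buffer recycling skip those pages, and nothing new is added to make up for it.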

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-01-16 Thread Kevin Grittner
Robert Haas  wrote:
 
 I think you may be confused about what the patch does - currently,
 pages with hint bit changes are considered dirty, period.
 Therefore, they are written whenever any other dirty page would be
 written: by the background writer cleaning scan, at checkpoints,
 and when a backend must write a dirty buffer before reallocating it
 to hold a different page. The patch keeps the first of these and
 changes the second two
 
No, I understood that.  I'm just concerned that if you eliminate the
other two, we may be recomputing visibility based on clog often
enough to kill performance.
 
In other words, I'm asking that you show that the other two methods
aren't really needed for decent overall performance.
 
-Kevin



Re: [HACKERS] limiting hint bit I/O

2011-01-16 Thread Robert Haas
On Sun, Jan 16, 2011 at 8:41 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Robert Haas  wrote:
 I think you may be confused about what the patch does - currently,
 pages with hint bit changes are considered dirty, period.
 Therefore, they are written whenever any other dirty page would be
 written: by the background writer cleaning scan, at checkpoints,
 and when a backend must write a dirty buffer before reallocating it
 to hold a different page. The patch keeps the first of these and
 changes the second two

 No, I understood that.  I'm just concerned that if you eliminate the
 other two, we may be recomputing visibility based on clog often
 enough to kill performance.

 In other words, I'm asking that you show that the other two methods
 aren't really needed for decent overall performance.

Admittedly I've only done one test, but on the basis of that test I'd
say the other two methods ARE really needed for decent overall
performance.  I think it'd be interesting to see this tested on a
machine with large shared buffers, where the background writer might
succeed in cleaning a higher fraction of the pages before the bulk
read buffer access strategy starts recycling buffers.  But I'm not
very optimistic.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-01-15 Thread Martijn van Oosterhout
On Fri, Jan 14, 2011 at 05:24:31PM -0800, Josh Berkus wrote:
 On 1/14/11 11:51 AM, Tom Lane wrote:
  The people whose tables are mostly insert-only complain about it, but
  that's not the majority of our userbase IMO.  We just happen to have a
  couple of particularly vocal ones, like Berkus.
 
 It might or might not be the majority, but it's an extremely common case
 affecting a lot of users.  Many, if not most, software applications have
 a log table (or two, or three) which just accumulates rows, and when
 that log table gets a vacuum freeze it pretty much halts the database in
 its tracks.  Between my client practice and IRC, I run across complaints
 about this issue around 3 times a month.

If the problem is that all the freezing happens at once, then ISTM the
solution is to add a random factor. Say, when a tuple just passes the
lower threshold it has a 1% chance of being frozen. The chance grows
until it is 100% as it reaches the upper threshold.

This should reduce the freezing traffic to a constant (hopefully
manageable) stream, since as the chance of freezing increases the
amount of data to be frozen goes down, so they should cancel somewhat.
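
A small sketch of what such a rule could look like, in Python for illustration. The linear ramp between the two thresholds is an assumption (the idea above only says the chance grows), and the function names and the lower/upper arguments are hypothetical rather than existing settings.

```python
import random


def freeze_probability(age, lower, upper, floor=0.01):
    """Chance of freezing a tuple whose XID age lies between two thresholds."""
    if age < lower:
        return 0.0
    if age >= upper:
        return 1.0
    # Ramp linearly from `floor` at the lower threshold up to 1.0.
    fraction = (age - lower) / (upper - lower)
    return floor + (1.0 - floor) * fraction


def should_freeze(age, lower, upper, rng=random):
    """Roll the dice for one tuple."""
    return rng.random() < freeze_probability(age, lower, upper)
```

With thresholds of 100 and 200, a tuple of age 150 would be frozen a little over half the time, and anything at or past 200 always.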

To avoid rewriting pages multiple times, if one tuple can be frozen on
a page, we should freeze as many as possible, but the logic may do that
already.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle




Re: [HACKERS] limiting hint bit I/O

2011-01-15 Thread Josh Berkus

 If the problem is that all the freezing happens at once, then ISTM the
 solution is to add a random factor. Say, when a tuple just passes the
 lower threshold it has a 1% chance of being frozen. The chance grows
 until it is 100% as it reaches the upper threshold.

Doesn't have to be deterministic in name only; it doesn't have to be random at all.  That is, we could
have a vacuum_freeze_max_size parameter ... and accompanying autovacuum
parameter ... which allowed the user to limit freezing scans to, say,
1GB of the table at a time.  If I could, say, call a manual freeze of
10% of the largest tables every night, then I might actually be able to
schedule it.  It's a full scan of the whole table which is fatal.
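
To illustrate the kind of bounded pass I mean, here is a sketch in Python. vacuum_freeze_max_size is only the parameter proposed above, not something that exists; the function next_freeze_range and the idea of remembering a resume point between runs are likewise assumptions, and 8kB is just the default block size.

```python
PAGE_SIZE = 8192                            # default block size, in bytes
MAX_PAGES_PER_PASS = 1024 ** 3 // PAGE_SIZE  # 1GB worth of pages


def next_freeze_range(table_pages, start_page, max_pages=MAX_PAGES_PER_PASS):
    """Return (first, end_exclusive, next_start) for one bounded freeze pass."""
    if table_pages == 0:
        return (0, 0, 0)
    first = start_page % table_pages
    end = min(first + max_pages, table_pages)
    # Wrap around once the end of the table is reached, so the next
    # scheduled run starts over from the first page.
    next_start = 0 if end == table_pages else end
    return (first, end, next_start)
```

A 300000-page table would then be covered in three nightly runs, each touching at most 1GB, instead of one scan of the whole thing.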

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] limiting hint bit I/O

2011-01-14 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Jan 13, 2011 at 10:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 This appears to remove the BM_JUST_DIRTIED logic.  Please explain why
 that's not completely broken.  Even if it isn't completely broken,
 it would seem better to do something like that as a separate patch.

 Well, the only point of BM_JUST_DIRTIED is to detect whether BM_DIRTY
 has been set while a buffer write is in progress.  With this patch,
 only BM_HINT_BITS can be set while the buffer write is in progress;
 BM_DIRTY cannot.  Perhaps one could make the argument that this would
 be a good cleanup anyway: in the unpatched code, BM_DIRTY can only be
 set while a buffer I/O is in progress if it is set due to a hint-bit
 update, and then we don't really care if the update gets lost.
 Although that seems a bit confusing...

[ thinks some more... ]  If memory serves, the BM_JUST_DIRTIED mechanism
dates from a time when checkpoints would write dirty buffers without
taking any lock on them; if somebody changed the page meanwhile, the
buffer was just considered to remain dirty.  We later decided that was
a bad idea and set up the current arrangement whereby only hint-bit
changes are allowed while a write is in progress.  So you're right that
it would be dead code if we don't consider that a hint-bit change is
really dirtying the page.  I'm not for removing it altogether though,
because it seems like something we could possibly want again in the
future (for instance, we might decide to go back to write-without-lock
to reduce lock contention).  It's not like we are short of buffer flag
bits.  Moreover this whole business of not treating hint-bit setting as
a page-dirtying operation is completely experimental/unproven IMO, so it
would be better to keep the patch footprint as small as possible.  I'd
suggest leaving BM_JUST_DIRTIED as-is and just adding BM_HINT_BITS_DIRTY
as a new flag.
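
For readers following the flag discussion, a small Python model of the
mechanism described above may help. It is not the bufmgr code: the bit
positions are illustrative, BM_HINT_BITS_DIRTY is just the flag name
suggested in this message, and what the patch does with that flag when a
write completes is deliberately left unmodeled.

```python
BM_DIRTY = 1 << 0            # page has changes that must reach disk
BM_JUST_DIRTIED = 1 << 1     # page was dirtied since the current write began
BM_HINT_BITS_DIRTY = 1 << 2  # only hint bits changed (name from this thread)


def mark_dirty(flags):
    """A real page modification sets both BM_DIRTY and BM_JUST_DIRTIED."""
    return flags | BM_DIRTY | BM_JUST_DIRTIED


def mark_hint_bits_dirty(flags):
    """A hint-bit-only change is tracked separately and never sets BM_DIRTY."""
    return flags | BM_HINT_BITS_DIRTY


def start_write(flags):
    """Clear BM_JUST_DIRTIED so a re-dirtying during the write is detectable."""
    return flags & ~BM_JUST_DIRTIED


def finish_write(flags):
    """Clear BM_DIRTY only if nobody dirtied the page while it was written."""
    if not flags & BM_JUST_DIRTIED:
        flags &= ~BM_DIRTY
    return flags
```

If only hint-bit changes can arrive during a write, and they set the new
flag instead of BM_DIRTY, the BM_JUST_DIRTIED test in finish_write can
never fire, which is why it would be dead (but harmless) code.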

regards, tom lane



Re: [HACKERS] limiting hint bit I/O

2011-01-14 Thread Robert Haas
On Fri, Jan 14, 2011 at 12:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Jan 13, 2011 at 10:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 This appears to remove the BM_JUST_DIRTIED logic.  Please explain why
 that's not completely broken.  Even if it isn't completely broken,
 it would seem better to do something like that as a separate patch.

 Well, the only point of BM_JUST_DIRTIED is to detect whether BM_DIRTY
 has been set while a buffer write is in progress.  With this patch,
 only BM_HINT_BITS can be set while the buffer write is in progress;
 BM_DIRTY cannot.  Perhaps one could make the argument that this would
 be a good cleanup anyway: in the unpatched code, BM_DIRTY can only be
 set while a buffer I/O is in progress if it is set due to a hint-bit
 update, and then we don't really care if the update gets lost.
 Although that seems a bit confusing...

 [ thinks some more... ]  If memory serves, the BM_JUST_DIRTIED mechanism
 dates from a time when checkpoints would write dirty buffers without
 taking any lock on them; if somebody changed the page meanwhile, the
 buffer was just considered to remain dirty.  We later decided that was
 a bad idea and set up the current arrangement whereby only hint-bit
 changes are allowed while a write is in progress.  So you're right that
 it would be dead code if we don't consider that a hint-bit change is
 really dirtying the page.  I'm not for removing it altogether though,
 because it seems like something we could possibly want again in the
 future (for instance, we might decide to go back to write-without-lock
 to reduce lock contention).  It's not like we are short of buffer flag
 bits.  Moreover this whole business of not treating hint-bit setting as
 a page-dirtying operation is completely experimental/unproven IMO, so it
 would be better to keep the patch footprint as small as possible.  I'd
 suggest leaving BM_JUST_DIRTIED as-is and just adding BM_HINT_BITS_DIRTY
 as a new flag.

I have some concerns about that proposal, but it might be the right
way to go.  Before we get too far off into the weeds, though, let's
back up and talk about something more fundamental: this seems to be
speeding up the first run by 6x at the expense of slowing down many
subsequent runs by 10-15%.  Does that make this whole idea dead on
arrival?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-01-14 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 this seems to be speeding up the first run by 6x at the expense of
 slowing down many subsequent runs by 10-15%.
 
If the overall throughput when measured far enough out to have hit a
steady state again is anywhere in the neighborhood of the unpatched
throughput, the leveling of the response times has enough value to
merit the change.  At least in my world.
 
-Kevin



Re: [HACKERS] limiting hint bit I/O

2011-01-14 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Jan 14, 2011 at 12:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Moreover this whole business of not treating hint-bit setting as
 a page-dirtying operation is completely experimental/unproven IMO, so it
 would be better to keep the patch footprint as small as possible.

 I have some concerns about that proposal, but it might be the right
 way to go.  Before we get too far off into the weeds, though, let's
 back up and talk about something more fundamental: this seems to be
 speeding up the first run by 6x at the expense of slowing down many
 subsequent runs by 10-15%.  Does that make this whole idea dead on
 arrival?

Well, it reinforces my opinion that it's experimental ;-).  But first
run of what, exactly?  And are you sure you're taking a wholistic view
of the costs/benefits?

regards, tom lane



Re: [HACKERS] limiting hint bit I/O

2011-01-14 Thread Robert Haas
On Fri, Jan 14, 2011 at 1:02 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Robert Haas robertmh...@gmail.com wrote:
 this seems to be speeding up the first run by 6x at the expense of
 slowing down many subsequent runs by 10-15%.

 If the overall throughput when measured far enough out to have hit a
 steady state again is anywhere in the neighborhood of the unpatched
 throughput, the leveling of the response times has enough value to
 merit the change.  At least in my world.

I think it would eventually settle down to the same speed, but it
might take a really long time.  I got impatient before I got that far.
I'm hoping someone will pick it up and play with it some more (hint,
hint).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-01-14 Thread Robert Haas
On Fri, Jan 14, 2011 at 1:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Jan 14, 2011 at 12:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Moreover this whole business of not treating hint-bit setting as
 a page-dirtying operation is completely experimental/unproven IMO, so it
 would be better to keep the patch footprint as small as possible.

 I have some concerns about that proposal, but it might be the right
 way to go.  Before we get too far off into the weeds, though, let's
 back up and talk about something more fundamental: this seems to be
 speeding up the first run by 6x at the expense of slowing down many
 subsequent runs by 10-15%.  Does that make this whole idea dead on
 arrival?

 Well, it reinforces my opinion that it's experimental ;-).  But first
 run of what, exactly?

See the test case in my OP.  The runs in question are select sum(1) from s.

 And are you sure you're taking a wholistic view
 of the costs/benefits?

No.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-01-14 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Jan 14, 2011 at 1:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, it reinforces my opinion that it's experimental ;-).  But first
 run of what, exactly?

 See the test case in my OP.  The runs in question are select sum(1) from s.

 And are you sure you're taking a wholistic view
 of the costs/benefits?

 No.

Well, IMO it would be a catastrophic mistake to evaluate a patch like
this on the basis of any single test case, let alone one as simplistic
as that.  I would observe in particular that your test case creates a
table containing only one distinct value of xmin, which means that the
single-transaction cache in transam.c is 100% effective, which doesn't
seem to me to be a very realistic test condition.  I think this is
vastly understating the cost of missing hint bits.

So what it needs now is a lot more testing.  pgbench might be worth
trying if you want something with minimal development effort, though
I'm not sure if its clog access pattern is particularly realistic
either.
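
The point about the single-transaction cache can be illustrated with a
toy Python model. This is not the transam.c code: the class, the counters,
and the stand-in lookup function are all invented for the example. It only
shows why a table with one distinct xmin makes a one-entry cache look far
better than a workload with interleaved XIDs would.

```python
class SingleXidCache:
    """One-entry cache in front of a slow transaction-status lookup."""

    def __init__(self, lookup):
        self.lookup = lookup        # stand-in for the expensive clog lookup
        self.cached_xid = None
        self.cached_status = None
        self.misses = 0

    def status(self, xid):
        if xid != self.cached_xid:
            # Cache miss: pay for the slow lookup and remember the answer.
            self.misses += 1
            self.cached_xid = xid
            self.cached_status = self.lookup(xid)
        return self.cached_status


def miss_count(xmins):
    """Count slow lookups needed to check every tuple's xmin in order."""
    cache = SingleXidCache(lambda xid: "committed")
    for xid in xmins:
        cache.status(xid)
    return cache.misses


one_xmin = miss_count([100] * 1000)           # bulk load in one transaction
alternating = miss_count([100, 101] * 500)    # two XIDs interleaved
```

With a single xmin only the very first tuple misses; with two alternating
XIDs every tuple does, so the cost of missing hint bits shows up in full.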

regards, tom lane



Re: [HACKERS] limiting hint bit I/O

2011-01-14 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 I'm hoping someone will pick it up and play with it some more (hint,
 hint).
 
That was a bit of a pun, eh?
 
Anyway, there are so many ideas in this area, it's hard to keep them
all straight.  Personally, if I was going to start with something,
it would probably be to better establish what the impact is on
various workloads of *eliminating* hint bits.  If the impact is
negative to a significant degree, my next step might be to try
background *freezing* of tuples (in a manner somewhat similar to
what you've done in this test) with the hint bits gone.
 
I know some people find them useful for forensics to a degree that
they would prefer not to see this, but I think it makes sense to
establish what cost people are paying every day to maintain forensic
information in this format.  In previous discussions there has been
some talk about being able to get better forensics from WAL files if
certain barriers could be overcome -- having hard numbers on the
performance benefits which might also accrue might put that work in
a different perspective.
 
-Kevin



Re: [HACKERS] limiting hint bit I/O

2011-01-14 Thread Robert Haas
On Fri, Jan 14, 2011 at 1:34 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Robert Haas robertmh...@gmail.com wrote:

 I'm hoping someone will pick it up and play with it some more (hint,
 hint).

 That was a bit of a pun, eh?

Unintentional...

 Anyway, there are so many ideas in this area, it's hard to keep them
 all straight.  Personally, if I was going to start with something,
 it would probably be to better establish what the impact is on
 various workloads of *eliminating* hint bits.  If the impact is
 negative to a significant degree, my next step might be to try
 background *freezing* of tuples (in a manner somewhat similar to
 what you've done in this test) with the hint bits gone.

Background freezing plays havoc with Hot Standby, and this test is
sufficient to show that eliminating hint bits altogether would be a
significant regression on some workloads.  I don't think either of
those ideas can get off the ground.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-01-14 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Anyway, there are so many ideas in this area, it's hard to keep them
 all straight.  Personally, if I was going to start with something,
 it would probably be to better establish what the impact is on
 various workloads of *eliminating* hint bits.
 
 I know some people find them useful for forensics to a degree that
 they would prefer not to see this,

Um, yeah, I think you're having a problem keeping all the ideas straight
;-).  The argument about forensics has to do with how soon we're willing
to freeze tuples, ie replace the XID with a constant.  Not about hint
bits.

regards, tom lane



Re: [HACKERS] limiting hint bit I/O

2011-01-14 Thread Robert Haas
On Fri, Jan 14, 2011 at 1:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Anyway, there are so many ideas in this area, it's hard to keep them
 all straight.  Personally, if I was going to start with something,
 it would probably be to better establish what the impact is on
 various workloads of *eliminating* hint bits.

 I know some people find them useful for forensics to a degree that
 they would prefer not to see this,

 Um, yeah, I think you're having a problem keeping all the ideas straight
 ;-).  The argument about forensics has to do with how soon we're willing
 to freeze tuples, ie replace the XID with a constant.  Not about hint
 bits.

Those things are related, though.  Freezing sooner could be viewed as
an alternative to hint bits.  Trouble is, it breaks Hot Standby,
badly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-01-14 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 Background freezing plays havoc with Hot Standby
 
I must have missed or forgotten the issue of background vacuums
and hot standby.  Can you summarize why that's worse than hitting
thresholds where autovacuum is freezing things?
 
 this test is sufficient to show that eliminating hint bits
 altogether would be a significant regression on some workloads.
 
That wasn't clear to me from what you posted -- I thought that the
reduced performance might be partly (largely? mostly?) due to
competition with the background writer's work pushing the hinted
pages out.  Maybe I'm missing something or you didn't post
everything you observed in this regard.
 
-Kevin



Re: [HACKERS] limiting hint bit I/O

2011-01-14 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 Freezing sooner could be viewed as an alternative to hint bits.
 
Exactly.  And as your test showed, things run faster frozen than
unfrozen with hint bits set.
 
 Trouble is, it breaks Hot Standby, badly.
 
You're really starting to worry me here.  Both for performance and
to reduce the WAN bandwidth demands of our backup strategy we are
very aggressive with our freezing.  Do off-hours VACUUM (FREEZE)
runs break hot standby?  Autovacuum freezing?  What are the
symptoms?
 
-Kevin



Re: [HACKERS] limiting hint bit I/O

2011-01-14 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Jan 14, 2011 at 1:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Um, yeah, I think you're having a problem keeping all the ideas straight
 ;-).  The argument about forensics has to do with how soon we're willing
 to freeze tuples, ie replace the XID with a constant.  Not about hint
 bits.

 Those things are related, though.  Freezing sooner could be viewed as
 an alternative to hint bits.

Freezing sooner isn't likely to reduce I/O compared to hint bits.  What
that does is create I/O that you *have* to execute ... both in the pages
themselves, and in WAL.

regards, tom lane



Re: [HACKERS] limiting hint bit I/O

2011-01-14 Thread Robert Haas
On Fri, Jan 14, 2011 at 2:01 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Trouble is, it breaks Hot Standby, badly.

 You're really starting to worry me here.  Both for performance and
 to reduce the WAN bandwidth demands of our backup strategy we are
 very aggressive with our freezing.  Do off-hours VACUUM (FREEZE)
 runs break hot standby?  Autovacuum freezing?  What are the
 symptoms?

Freezing removes XIDs, so latestRemovedXid advances.  VACUUM (FREEZE)
is fine if you do it when there are no queries running on your Hot
Standby server, but if there ARE queries running on the Hot Standby
server, they'll be cancelled once max_standby_delay expires.
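
As a rough illustration of the conflict described above, here is a Python
sketch. It is a simplified model, not the recovery-conflict code: XIDs are
compared as plain integers (wraparound is ignored), the query names are
made up, and the grace period before cancellation is left out.

```python
def conflicting_queries(standby_xmins, latest_removed_xid):
    """Return standby queries whose snapshot may still need removed XIDs.

    standby_xmins maps a query name to the xmin of its snapshot.
    """
    return [
        query
        for query, xmin in standby_xmins.items()
        if xmin <= latest_removed_xid
    ]


running = {"nightly_report": 900, "quick_lookup": 1200}
at_risk = conflicting_queries(running, latest_removed_xid=1000)
```

Only the query with the older snapshot is at risk here; a standby with no
running queries never sees a conflict, which matches the off-hours case.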

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] limiting hint bit I/O

2011-01-14 Thread Robert Haas
On Fri, Jan 14, 2011 at 1:52 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Robert Haas robertmh...@gmail.com wrote:

 Background freezing plays havoc with Hot Standby

 I must have missed or forgotten the issue of background vacuums
 and hot standby.  Can you summarize why that's worse than hitting
 thresholds where autovacuum is freezing things?

The critical issue is whether the tuples get frozen while they're
still invisible to some transactions on the standby server.  That's
when you get query cancellations.

 this test is sufficient to show that eliminating hint bits
 altogether would be a significant regression on some workloads.

 That wasn't clear to me from what you posted -- I thought that the
 reduced performance might be partly (largely? mostly?) due to
 competition with the background writer's work pushing the hinted
 pages out.  Maybe I'm missing something or you didn't post
 everything you observed in this regard

Well, let me put together a quick patch that obliterates hint bits
entirely, and we can measure that.  The background writer has always
pushed out hint bit pages; I think the reduced performance was
probably due to needing to reset hint bits on pages that we threw away
without pushing them out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-01-14 Thread Robert Haas
On Fri, Jan 14, 2011 at 2:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Jan 14, 2011 at 1:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Um, yeah, I think you're having a problem keeping all the ideas straight
 ;-).  The argument about forensics has to do with how soon we're willing
 to freeze tuples, ie replace the XID with a constant.  Not about hint
 bits.

 Those things are related, though.  Freezing sooner could be viewed as
 an alternative to hint bits.

 Freezing sooner isn't likely to reduce I/O compared to hint bits.  What
 that does is create I/O that you *have* to execute ... both in the pages
 themselves, and in WAL.

It depends on which way you tilt your head - right now, we rewrite
each table 3x - once to populate, once to hint, and once to freeze.
If the table is doomed to survive long enough to go through all three
of those, then freezing is better than hinting.  Of course, that's not
always the case, but people keep complaining about the way this shakes
out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-01-14 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 
 Those things are related, though.  Freezing sooner could be
 viewed as an alternative to hint bits.
 
 Freezing sooner isn't likely to reduce I/O compared to hint bits. 
 What that does is create I/O that you *have* to execute ... both
 in the pages themselves, and in WAL.
 
In an environment where the vast majority of tuples live long enough
to need to be frozen anyway, freezing sooner doesn't really do that
to you.  Granted, explicit freezing off-hours prevents autovacuum
from doing that to you in large bursts at unexpected times, but if
you're comparing background writer freezing to autovacuum freezing,
I'm not clear on where the extra pain comes from.
 
I am assuming that the background writer would be sane about how it
did this, of course.  We could all set up straw man implementations
which would clobber performance.  I suspect that you can envision a
heuristic which would be no more bothersome than autovacuum.
 
-Kevin



Re: [HACKERS] limiting hint bit I/O

2011-01-14 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 The critical issue is whether the tuples get frozen while they're
 still invisible to some transactions on the standby server. 
 That's when you get query cancellations.
 
Oh, OK; I get that.  That seems easy enough to at least mitigate to
a large degree by some threshold GUC.  But of course, the longer you
wait to freeze so that you don't cancel queries on the standby, the
more you pay to recalculate visibility, so it'd be a fussy thing to
tune.  Perhaps such freeze information could be queued until a safe
time on the standby.  (Now that I've learned the joys of SLRU, I can
see all sorts of possible uses for it)
 
 Well, let me put together a quick patch that obliterates hint bits
 entirely, and we can measure that.  The background writer has
 always pushed out hint bit pages; I think the reduced performance
 was probably due to needing to reset hint bits on pages that we
 threw away without pushing them out.
 
It would be good to confirm and quantify.
 
-Kevin



Re: [HACKERS] limiting hint bit I/O

2011-01-14 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Jan 14, 2011 at 2:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Freezing sooner isn't likely to reduce I/O compared to hint bits.  What
 that does is create I/O that you *have* to execute ... both in the pages
 themselves, and in WAL.

 It depends on which way you tilt your head - right now, we rewrite
 each table 3x - once to populate, once to hint, and once to freeze.
 If the table is doomed to survive long enough to go through all three
 of those, then freezing is better than hinting.  Of course, that's not
 always the case, but people keep complaining about the way this shakes
 out.

The people whose tables are mostly insert-only complain about it, but
that's not the majority of our userbase IMO.  We just happen to have a
couple of particularly vocal ones, like Berkus.

regards, tom lane



Re: [HACKERS] limiting hint bit I/O

2011-01-14 Thread Robert Haas
On Fri, Jan 14, 2011 at 2:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Jan 14, 2011 at 2:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Freezing sooner isn't likely to reduce I/O compared to hint bits.  What
 that does is create I/O that you *have* to execute ... both in the pages
 themselves, and in WAL.

 It depends on which way you tilt your head - right now, we rewrite
 each table 3x - once to populate, once to hint, and once to freeze.
 If the table is doomed to survive long enough to go through all three
 of those, then freezing is better than hinting.  Of course, that's not
 always the case, but people keep complaining about the way this shakes
 out.

 The people whose tables are mostly insert-only complain about it, but
 that's not the majority of our userbase IMO.  We just happen to have a
 couple of particularly vocal ones, like Berkus.

True.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-01-14 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Robert Haas robertmh...@gmail.com wrote:
 The critical issue is whether the tuples get frozen while they're
 still invisible to some transactions on the standby server. 
 That's when you get query cancellations.
 
 Oh, OK; I get that.  That seems easy enough to at least mitigate to
 a large degree by some threshold GUC.  But of course, the longer you
 wait to freeze so that you don't cancel queries on the standby, the
 more you pay to recalculate visibility, so it'd be a fussy thing to
 tune.

Yeah.  Also, most of the argument for early freezing hinges on the hope
that it could happen before the tuples go to disk the first time, which
makes the window even narrower.

regards, tom lane



Re: [HACKERS] limiting hint bit I/O

2011-01-14 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Robert Haas robertmh...@gmail.com wrote:
 The critical issue is whether the tuples get frozen while
 they're still invisible to some transactions on the standby
 server.  That's when you get query cancellations.
  
 Oh, OK; I get that.  That seems easy enough to at least mitigate
 to a large degree by some threshold GUC.  But of course, the
 longer you wait to freeze so that you don't cancel queries on the
 standby, the more you pay to recalculate visibility, so it'd be a
 fussy thing to tune.
 
 Yeah.  Also, most of the argument for early freezing hinges on the
 hope that it could happen before the tuples go to disk the first
 time, which makes the window even narrower.
 
Is there any merit to the idea that the hot standbys could be
enhanced (in some post-9.1 version) to stash a list of tuples to
freeze in a persistent SLRU, applying them when GlobalXmin passes
the associated xid?  It seems as though this would eliminate the
need to roll back transactions based on freezing without slowing
down the master or compromising the usability of the standby
(assuming that any pending ones get applied as part of promotion,
although I suppose if that time could be non-negligible, that might
be the fatal flaw).
 
This is more of a brainstorming thought than a well-researched
proposal, so I won't be too surprised if there's a hole in the idea
big enough to drive a truck through.
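
To show the shape of the idea, here is a brainstorming-level Python
sketch. It is not a design for the standby: an in-memory heap stands in
for the persistent SLRU, XIDs are plain integers with no wraparound
handling, and the class and method names are invented for illustration.

```python
import heapq


class PendingFreezes:
    """Queue of freeze requests held back until they are safe to apply."""

    def __init__(self):
        self._heap = []  # entries are (xid, tuple_id), ordered by xid

    def stash(self, xid, tuple_id):
        """Remember a freeze that arrived before it was safe to apply."""
        heapq.heappush(self._heap, (xid, tuple_id))

    def apply_up_to(self, global_xmin):
        """Pop and return the tuples whose xid global_xmin has passed."""
        applied = []
        while self._heap and self._heap[0][0] < global_xmin:
            applied.append(heapq.heappop(self._heap)[1])
        return applied


pending = PendingFreezes()
pending.stash(105, "t2")
pending.stash(100, "t1")
pending.stash(120, "t3")
safe_now = pending.apply_up_to(110)
```

Anything still queued at promotion time would have to be applied then,
which is the step flagged above as the possible fatal flaw.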
 
-Kevin



Re: [HACKERS] limiting hint bit I/O

2011-01-14 Thread Josh Berkus
On 1/14/11 11:51 AM, Tom Lane wrote:
 The people whose tables are mostly insert-only complain about it, but
 that's not the majority of our userbase IMO.  We just happen to have a
 couple of particularly vocal ones, like Berkus.

It might or might not be the majority, but it's an extremely common case
affecting a lot of users.  Many, if not most, software applications have
a log table (or two, or three) which just accumulates rows, and when
that log table gets a vacuum freeze it pretty much halts the database in
its tracks.  Between my client practice and IRC, I run across complaints
about this issue around 3 times a month.

And data warehousing is a significant portion of our user base, and
*all* DW users are affected by this.  In some cases, vacuum issues are
sufficient to prevent people from using PostgreSQL for data warehousing.

I'd dare say that there are more users who would like autovacuum to
handle big tables better than want synchronous replication, for example.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] limiting hint bit I/O

2011-01-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I whipped up the attached patch tonight.

This appears to remove the BM_JUST_DIRTIED logic.  Please explain why
that's not completely broken.  Even if it isn't completely broken,
it would seem better to do something like that as a separate patch.

regards, tom lane



Re: [HACKERS] limiting hint bit I/O

2011-01-13 Thread Robert Haas
On Thu, Jan 13, 2011 at 10:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I whipped up the attached patch tonight.

 This appears to remove the BM_JUST_DIRTIED logic.  Please explain why
 that's not completely broken.  Even if it isn't completely broken,
 it would seem better to do something like that as a separate patch.

Well, the only point of BM_JUST_DIRTIED is to detect whether BM_DIRTY
has been set while a buffer write is in progress.  With this patch,
only BM_HINT_BITS can be set while the buffer write is in progress;
BM_DIRTY cannot.  Perhaps one could make the argument that this would
be a good cleanup anyway: in the unpatched code, BM_DIRTY can only be
set while a buffer I/O is in progress if it is set due to a hint-bit
update, and then we don't really care if the update gets lost.
Although that seems a bit confusing...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
