Re: [HACKERS] WALInsertLock tuning

2011-10-14 Thread Robert Haas
On Thu, Oct 13, 2011 at 9:35 PM, Bruce Momjian br...@momjian.us wrote:

 I assume this was addressed with this commit:

        commit 465883b0a2b4236ba6b31b648a9eabef3b7cdddb
        Author: Simon Riggs si...@2ndquadrant.com
        Date:   Tue Jun 28 22:58:17 2011 +0100

            Introduce compact WAL record for the common case of commit 
 (non-DDL).
            XLOG_XACT_COMMIT_COMPACT leaves out invalidation messages and 
 relfilenodes,
            saving considerable space for the vast majority of transaction 
 commits.
            XLOG_XACT_COMMIT keeps same definition as XLOG_PAGE_MAGIC 0xD067 
 and earlier.

            Leonardo Francalanci and Simon Riggs

No, that's completely unrelated.  I think it just never quite made it
to prime time - it was analyzed theoretically but some of the testing
discussed on the thread doesn't seem to have been done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WALInsertLock tuning

2011-10-13 Thread Bruce Momjian

I assume this was addressed with this commit:

commit 465883b0a2b4236ba6b31b648a9eabef3b7cdddb
Author: Simon Riggs si...@2ndquadrant.com
Date:   Tue Jun 28 22:58:17 2011 +0100

Introduce compact WAL record for the common case of commit 
(non-DDL).
XLOG_XACT_COMMIT_COMPACT leaves out invalidation messages and 
relfilenodes,
saving considerable space for the vast majority of transaction 
commits.
XLOG_XACT_COMMIT keeps same definition as XLOG_PAGE_MAGIC 0xD067 
and earlier.

Leonardo Francalanci and Simon Riggs

---

Simon Riggs wrote:
 On Mon, Jun 6, 2011 at 11:25 PM, Robert Haas robertmh...@gmail.com wrote:
 
  As to the question of whether it's safe, I think I'd agree that the
  chances of this backfiring are pretty remote. ?I think that with the
  zeroing they are exactly zero, because (now that we start XLOG
  positions at 0/1) there is now way that xl_prev = {0, 0} can ever be
  valid. ?Without the zeroing, well, it's remotely possible that xl_prev
  could happen to appear valid and that xl_crc could happen to match...
  but the chances are presumably quite remote. ?Just the chances of the
  CRC matching should be around 2^-32. ?The chances of an xl_prev match
  are harder to predict, because the matching values for CRCs should be
  pretty much uniformly distributed, while xl_prev isn't random. ?But
  even given that the chance of a match is should be very small, so in
  practice there is likely no harm.
 
 And if such a thing did actually happen we would also need to have an
 accidentally correct value of all of the rest of the header values.
 And even if we did we would apply at most one junk WAL record. Then we
 are onto the next WAL record where we would need have to the same luck
 all over again.
 
 The probability of these occurrences is well below the acceptable
 threshold for other problems occurring.
 
  It strikes me, though, that we
  could probably get nearly all of the benefit of this patch by being
  willing to zero the first sizeof(XLogRecord) bytes following a record,
  but not the rest of the buffer. ?That would pretty much wipe out any
  chance of an xl_prev match, I think, and would likely still get nearly
  all of the performance benefit.
 
 Which adds something onto the path of every XlogInsert(), rather than
 once per page, so I'm a little hesitant to agree.
 
 If we did that, we would only need to write out an additional 12 bytes
 per WAL record, not the full sizeof(XLogRecord).
 
 But even so, I think its wasted effort.
 
 Measuring the benefit of a performance patch is normal, but I'm not
 proposing this as a risk trade-off. It's just a straight removal of
 multiple cycles from a hot code path. The exact benefit will depend
 upon whether the WALInsertLock is the hot lock, which it likely will
 be when other patches are applied.
 
 -- 
 ?Simon Riggs?? http://www.2ndQuadrant.com/
 ?PostgreSQL Development, 24x7 Support, Training  Services
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WALInsertLock tuning

2011-06-07 Thread Simon Riggs
On Mon, Jun 6, 2011 at 11:25 PM, Robert Haas robertmh...@gmail.com wrote:

 As to the question of whether it's safe, I think I'd agree that the
 chances of this backfiring are pretty remote.  I think that with the
 zeroing they are exactly zero, because (now that we start XLOG
 positions at 0/1) there is now way that xl_prev = {0, 0} can ever be
 valid.  Without the zeroing, well, it's remotely possible that xl_prev
 could happen to appear valid and that xl_crc could happen to match...
 but the chances are presumably quite remote.  Just the chances of the
 CRC matching should be around 2^-32.  The chances of an xl_prev match
 are harder to predict, because the matching values for CRCs should be
 pretty much uniformly distributed, while xl_prev isn't random.  But
 even given that the chance of a match is should be very small, so in
 practice there is likely no harm.

And if such a thing did actually happen we would also need to have an
accidentally correct value of all of the rest of the header values.
And even if we did we would apply at most one junk WAL record. Then we
are onto the next WAL record where we would need have to the same luck
all over again.

The probability of these occurrences is well below the acceptable
threshold for other problems occurring.

 It strikes me, though, that we
 could probably get nearly all of the benefit of this patch by being
 willing to zero the first sizeof(XLogRecord) bytes following a record,
 but not the rest of the buffer.  That would pretty much wipe out any
 chance of an xl_prev match, I think, and would likely still get nearly
 all of the performance benefit.

Which adds something onto the path of every XlogInsert(), rather than
once per page, so I'm a little hesitant to agree.

If we did that, we would only need to write out an additional 12 bytes
per WAL record, not the full sizeof(XLogRecord).

But even so, I think its wasted effort.

Measuring the benefit of a performance patch is normal, but I'm not
proposing this as a risk trade-off. It's just a straight removal of
multiple cycles from a hot code path. The exact benefit will depend
upon whether the WALInsertLock is the hot lock, which it likely will
be when other patches are applied.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WALInsertLock tuning

2011-06-07 Thread Heikki Linnakangas

On 07.06.2011 10:21, Simon Riggs wrote:

On Mon, Jun 6, 2011 at 11:25 PM, Robert Haasrobertmh...@gmail.com  wrote:

It strikes me, though, that we
could probably get nearly all of the benefit of this patch by being
willing to zero the first sizeof(XLogRecord) bytes following a record,
but not the rest of the buffer.  That would pretty much wipe out any
chance of an xl_prev match, I think, and would likely still get nearly
all of the performance benefit.


Which adds something onto the path of every XlogInsert(), rather than
once per page, so I'm a little hesitant to agree.


You would only need to do it just before you write out the WAL. I guess 
you'd need to grab WALInsertLock in XLogWrite() to prevent more WAL 
records from being inserted on the page until you're done zeroing it, 
though.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WALInsertLock tuning

2011-06-07 Thread Simon Riggs
On Tue, Jun 7, 2011 at 8:27 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 07.06.2011 10:21, Simon Riggs wrote:

 On Mon, Jun 6, 2011 at 11:25 PM, Robert Haasrobertmh...@gmail.com
  wrote:

 It strikes me, though, that we
 could probably get nearly all of the benefit of this patch by being
 willing to zero the first sizeof(XLogRecord) bytes following a record,
 but not the rest of the buffer.  That would pretty much wipe out any
 chance of an xl_prev match, I think, and would likely still get nearly
 all of the performance benefit.

 Which adds something onto the path of every XlogInsert(), rather than
 once per page, so I'm a little hesitant to agree.

 You would only need to do it just before you write out the WAL. I guess
 you'd need to grab WALInsertLock in XLogWrite() to prevent more WAL records
 from being inserted on the page until you're done zeroing it, though.

How would that help?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WALInsertLock tuning

2011-06-07 Thread Robert Haas
On Tue, Jun 7, 2011 at 3:21 AM, Simon Riggs si...@2ndquadrant.com wrote:
 It strikes me, though, that we
 could probably get nearly all of the benefit of this patch by being
 willing to zero the first sizeof(XLogRecord) bytes following a record,
 but not the rest of the buffer.  That would pretty much wipe out any
 chance of an xl_prev match, I think, and would likely still get nearly
 all of the performance benefit.

 Which adds something onto the path of every XlogInsert(), rather than
 once per page, so I'm a little hesitant to agree.

Urk.  Well, we don't want that, for sure.   The previous discussion
was talking about moving the zeroing around somehow, rather than
getting rid of it, so maybe there's some way to make it work...

One other thought is that I think that this patch might cause a
user-visible behavior change.  Right now, when you hit the end of
recovery, you most typically get a message saying - record with zero
length.  Not always, but often.  If we adopt this approach, you'll get
a wider variety of error messages there, depending on exactly how the
new record fails validation.  I dunno if that's important to be worth
caring about, or not.

 If we did that, we would only need to write out an additional 12 bytes
 per WAL record, not the full sizeof(XLogRecord).

 But even so, I think its wasted effort.

 Measuring the benefit of a performance patch is normal, but I'm not
 proposing this as a risk trade-off. It's just a straight removal of
 multiple cycles from a hot code path. The exact benefit will depend
 upon whether the WALInsertLock is the hot lock, which it likely will
 be when other patches are applied.

I don't think it's too hard to construct a test case where it is, even
now.  pgbench on a medium-sized machine ought to do it, with
synchronous_commit turned off.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WALInsertLock tuning

2011-06-07 Thread Simon Riggs
On Tue, Jun 7, 2011 at 1:24 PM, Robert Haas robertmh...@gmail.com wrote:

 One other thought is that I think that this patch might cause a
 user-visible behavior change.  Right now, when you hit the end of
 recovery, you most typically get a message saying - record with zero
 length.  Not always, but often.  If we adopt this approach, you'll get
 a wider variety of error messages there, depending on exactly how the
 new record fails validation.  I dunno if that's important to be worth
 caring about, or not.

Not.

We've never said what the message would be, only that it would fail.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WALInsertLock tuning

2011-06-07 Thread Heikki Linnakangas

On 07.06.2011 10:55, Simon Riggs wrote:

On Tue, Jun 7, 2011 at 8:27 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

You would only need to do it just before you write out the WAL. I guess
you'd need to grab WALInsertLock in XLogWrite() to prevent more WAL records
from being inserted on the page until you're done zeroing it, though.


How would that help?


It doesn't matter whether the pages are zeroed while they sit in memory. 
And if you write a full page of WAL data, any wasted bytes at the end of 
the page don't matter, because they're ignored at replay anyway. The 
possibility of mistaking random garbage for valid WAL only occurs when 
we write a partial WAL page to disk. So, it is enough to zero the 
remainder of the partial WAL page (or just the next few words) when we 
write it out.


That's a lot cheaper than fully zeroing every page. (except for the fact 
that you'd need to hold WALInsertLock while you do it)


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WALInsertLock tuning

2011-06-07 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 07.06.2011 10:55, Simon Riggs wrote:
 How would that help?

 It doesn't matter whether the pages are zeroed while they sit in memory. 
 And if you write a full page of WAL data, any wasted bytes at the end of 
 the page don't matter, because they're ignored at replay anyway. The 
 possibility of mistaking random garbage for valid WAL only occurs when 
 we write a partial WAL page to disk. So, it is enough to zero the 
 remainder of the partial WAL page (or just the next few words) when we 
 write it out.

 That's a lot cheaper than fully zeroing every page. (except for the fact 
 that you'd need to hold WALInsertLock while you do it)

I think avoiding the need to hold both locks at once is probably exactly
why the zeroing was done where it is.

An interesting alternative is to have XLogInsert itself just plop down a
few more zeroes immediately after the record it's inserted, before it
releases WALInsertLock.  This will be redundant work once the next
record gets added, but it's cheap enough to not matter IMO.  As was
mentioned upthread, zeroing out the bytes that will eventually hold the
next record's xl_prev field ought to be enough to maintain a guarantee
that we won't believe the next record is valid.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WALInsertLock tuning

2011-06-07 Thread Simon Riggs
On Tue, Jun 7, 2011 at 4:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 07.06.2011 10:55, Simon Riggs wrote:
 How would that help?

 It doesn't matter whether the pages are zeroed while they sit in memory.
 And if you write a full page of WAL data, any wasted bytes at the end of
 the page don't matter, because they're ignored at replay anyway. The
 possibility of mistaking random garbage for valid WAL only occurs when
 we write a partial WAL page to disk. So, it is enough to zero the
 remainder of the partial WAL page (or just the next few words) when we
 write it out.

 That's a lot cheaper than fully zeroing every page. (except for the fact
 that you'd need to hold WALInsertLock while you do it)

 I think avoiding the need to hold both locks at once is probably exactly
 why the zeroing was done where it is.

 An interesting alternative is to have XLogInsert itself just plop down a
 few more zeroes immediately after the record it's inserted, before it
 releases WALInsertLock.  This will be redundant work once the next
 record gets added, but it's cheap enough to not matter IMO.  As was
 mentioned upthread, zeroing out the bytes that will eventually hold the
 next record's xl_prev field ought to be enough to maintain a guarantee
 that we won't believe the next record is valid.

Lets see what the overheads are with a continuous stream of short WAL
records, say xl_heap_delete records.

xl header is 32 bytes, xl_heap_delete is 24 bytes.

So there would be ~145 records per page. 12 byte zeroing overhead per
record gives 1740 total zero bytes written per page.

The overhead is at worst case less than 25% of current overhead, plus
its spread out across multiple records.

When we get lots of full pages into WAL just after checkpoint we don't
get as much overhead - nearly every full page forces a page switch. So
we're removing overhead from where it hurts the most and amortising
across other records.

Maths work for me.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WALInsertLock tuning

2011-06-07 Thread Fujii Masao
On Tue, Jun 7, 2011 at 9:54 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, Jun 7, 2011 at 1:24 PM, Robert Haas robertmh...@gmail.com wrote:

 One other thought is that I think that this patch might cause a
 user-visible behavior change.  Right now, when you hit the end of
 recovery, you most typically get a message saying - record with zero
 length.  Not always, but often.  If we adopt this approach, you'll get
 a wider variety of error messages there, depending on exactly how the
 new record fails validation.  I dunno if that's important to be worth
 caring about, or not.

 Not.

 We've never said what the message would be, only that it would fail.

BTW, walreceiver doesn't zero the page before writing the WAL. So,
if zeroing the page is *really* required for safe recovery, we might need
to change walreceiver.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WALInsertLock tuning

2011-06-06 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 In earlier discussions of how to improve WALInsertLock contention, it
 was observed that we must zero each new page before we advance the WAL
 insertion point.
 http://postgresql.1045698.n5.nabble.com/Reworking-WAL-locking-td1983647.html

 IMHO the page zeroing is completely unnecessary,

I don't believe it's completely unnecessary.  It does in fact offer
additional protection against mistakenly taking stale data as valid.
You could maybe argue that the degree of safety increase isn't
sufficient to warrant the cost of zeroing the page, but you've not
offered any quantification of either the risk or the cost savings.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WALInsertLock tuning

2011-06-06 Thread Simon Riggs
On Mon, Jun 6, 2011 at 5:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 In earlier discussions of how to improve WALInsertLock contention, it
 was observed that we must zero each new page before we advance the WAL
 insertion point.
 http://postgresql.1045698.n5.nabble.com/Reworking-WAL-locking-td1983647.html

 IMHO the page zeroing is completely unnecessary,

 I don't believe it's completely unnecessary.  It does in fact offer
 additional protection against mistakenly taking stale data as valid.
 You could maybe argue that the degree of safety increase isn't
 sufficient to warrant the cost of zeroing the page, but you've not
 offered any quantification of either the risk or the cost savings.

If we did ever reference stale data, it would need to have a value of
xl_prev that exactly matched what we expected AND would also need a
CRC that exactly matched also. That would be fairly difficult to
arrange deliberately and pretty close to zero chance of happening
otherwise.

But that even assumes we write the unzeroed data at the end of the
buffer. We don't. We only write data up to the end of the WAL record
on the current page, unless we do a continuation record, which means
only replay we would read in another page and check page headers. So
for this to cause an error, we'd have to have an overall matching CRC,
a matching xl_prev and at least one matching page header, which
contains a pageaddress.

But that even assumes we would read data in a different way to which
it was written.

So the only way we could ever reference the stale data is if the WAL
reading code didn't match the WAL writing code (1) because of a bug or
(2) because of a corrupt pg_control record that caused random access
to an otherwise unreachable part of WAL
AND
the CRC matched, and the xl_prev matched and the next page header matched.

Risk == zero. If it wasn't zero I would even mention it because this
is not a trade-off optimization.

Cost savings are those already identified by yourself in our previous
discussion on this, link given upthread. It's the biggest single
removable item from within WALInsertLock. We can measure them if you
like, but I don't see the value in that. It will clearly be a useful
saving on what we all agree is a heavily contended lock.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WALInsertLock tuning

2011-06-06 Thread Jeff Janes
On Mon, Jun 6, 2011 at 11:27 AM, Simon Riggs si...@2ndquadrant.com wrote:

 But that even assumes we write the unzeroed data at the end of the
 buffer. We don't. We only write data up to the end of the WAL record
 on the current page, unless we do a continuation record,

I see no codepath in XLogWrite which writes anything other than full
buffer pages.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WALInsertLock tuning

2011-06-06 Thread Simon Riggs
On Mon, Jun 6, 2011 at 10:09 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Mon, Jun 6, 2011 at 11:27 AM, Simon Riggs si...@2ndquadrant.com wrote:

 But that even assumes we write the unzeroed data at the end of the
 buffer. We don't. We only write data up to the end of the WAL record
 on the current page, unless we do a continuation record,

 I see no codepath in XLogWrite which writes anything other than full
 buffer pages.

Second line, boolean variable called ispartialpage.

As I mentioned, even if spare bytes at the end of page were written,
they aren't ever read except in corner case bugs that would be trapped
by other logic put there to protect us.

We're safe. If I didn't believe it and hadn't tested it, I wouldn't speak.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WALInsertLock tuning

2011-06-06 Thread Robert Haas
On Mon, Jun 6, 2011 at 6:05 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, Jun 6, 2011 at 10:09 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Mon, Jun 6, 2011 at 11:27 AM, Simon Riggs si...@2ndquadrant.com wrote:

 But that even assumes we write the unzeroed data at the end of the
 buffer. We don't. We only write data up to the end of the WAL record
 on the current page, unless we do a continuation record,

 I see no codepath in XLogWrite which writes anything other than full
 buffer pages.

 Second line, boolean variable called ispartialpage.

That affects our tracking of the write and flush positions, but not
what we actually write to the file:

/* OK to write the page(s) */
from = XLogCtl-pages + startidx * (Size) XLOG_BLCKSZ;
nbytes = npages * (Size) XLOG_BLCKSZ;
errno = 0;
if (write(openLogFile, from, nbytes) != nbytes)

As you can see, nbytes is always a multiple of XLOG_BLCKSZ.

 As I mentioned, even if spare bytes at the end of page were written,
 they aren't ever read except in corner case bugs that would be trapped
 by other logic put there to protect us.

 We're safe. If I didn't believe it and hadn't tested it, I wouldn't speak.

I think you need to do some analysis of how much this actually
improves performance.  I don't like the idea of applying performance
patches without performance testing; sometimes the results are
counterintuitive.  And even when they're not, it's nice to know how
much you've gained.

As to the question of whether it's safe, I think I'd agree that the
chances of this backfiring are pretty remote.  I think that with the
zeroing they are exactly zero, because (now that we start XLOG
positions at 0/1) there is now way that xl_prev = {0, 0} can ever be
valid.  Without the zeroing, well, it's remotely possible that xl_prev
could happen to appear valid and that xl_crc could happen to match...
but the chances are presumably quite remote.  Just the chances of the
CRC matching should be around 2^-32.  The chances of an xl_prev match
are harder to predict, because the matching values for CRCs should be
pretty much uniformly distributed, while xl_prev isn't random.  But
even given that the chance of a match is should be very small, so in
practice there is likely no harm.  It strikes me, though, that we
could probably get nearly all of the benefit of this patch by being
willing to zero the first sizeof(XLogRecord) bytes following a record,
but not the rest of the buffer.  That would pretty much wipe out any
chance of an xl_prev match, I think, and would likely still get nearly
all of the performance benefit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers