Re: [HACKERS] WAL format changes break the suppression of do-nothing checkpoints.

2015-04-15 Thread Heikki Linnakangas

On 03/31/2015 09:09 PM, Jeff Janes wrote:

On Tue, Mar 31, 2015 at 6:19 AM, Heikki Linnakangas hlinn...@iki.fi wrote:


On 03/30/2015 09:01 PM, Jeff Janes wrote:


commit 2c03216d831160bedd72d45f7 has invalidated the part of the docs
saying If no WAL has been written since the previous checkpoint, new
checkpoints will be skipped even if checkpoint_timeout has passed,
presumably by accident.

It seems that this part is no longer true when it should be true:

  if (curInsert == ControlFile-checkPoint +
  MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint))

MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint) is now 96, but the amount
by
which curInsert gets advanced is still 104, like it was before the commit.


Hmm. Wasn't this a bit broken before too, when the checkpoint record
crosses a page boundary?


But once the next one it was written, it wouldn't across a boundary
anymore.  So you would get one extra checkpoint, rather than an infinite
stream of them.  So a bit broken, but just a bit.


Yeah. It's not serious.

It occurs to me that that's actually nice from a robustness point of 
view: if something goes badly wrong and corrupt, you're more likely to 
be able to recover the last checkpoint record if it doesn't cross a page 
boundary. And more importantly, a segment boundary. OTOH, when you 
create a spurious checkpoint, you lose the previous checkpoint 
location. A good compromise is probably to not skip the checkpoint if 
the last one crossed a segment boundary. (Committed that way)



Instead of trying to calculate where the checkpoint record ends, I think
we could check that the prev-pointer points to the last checkpoint record.


Wouldn't doing that involve reading WAL while not in recovery?


No. We keep track of the beginning position of the last inserted record 
at all times, because when the next record is inserted, we have to put 
that in the xl_prev field.


Committed a patch to do that.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes break the suppression of do-nothing checkpoints.

2015-03-31 Thread Heikki Linnakangas

On 03/30/2015 09:01 PM, Jeff Janes wrote:

commit 2c03216d831160bedd72d45f7 has invalidated the part of the docs
saying If no WAL has been written since the previous checkpoint, new
checkpoints will be skipped even if checkpoint_timeout has passed,
presumably by accident.

It seems that this part is no longer true when it should be true:

 if (curInsert == ControlFile-checkPoint +
 MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint))

MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint) is now 96, but the amount by
which curInsert gets advanced is still 104, like it was before the commit.


Hmm. Wasn't this a bit broken before too, when the checkpoint record 
crosses a page boundary?


Instead of trying to calculate where the checkpoint record ends, I think 
we could check that the prev-pointer points to the last checkpoint record.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-07-06 Thread Peter Eisentraut
On fre, 2012-06-15 at 00:01 +0300, Heikki Linnakangas wrote:
 1. Use a 64-bit segment number, instead of the log/seg combination. And 
 don't waste the last segment on each logical 4 GB log file. The concept 
 of a logical log file is now completely gone. XLogRecPtr is unchanged, 
 but it should now be understood as a plain 64-bit value, just split into 
 two 32-bit integers for historical reasons. On disk, this means that 
 there will be log files ending in FF, those were skipped before.

A thought on this.  There were some concerns that this would silently
break tools that pretend to have detailed knowledge of WAL file
numbering and this previous behavior of skipping the FF files.  We could
address this by fixing the overall file naming from something like

000108D000FD
000108D000FE
000108D000FF
000108D1

to

000108D0FD00
000108D0FE00
000108D0FF00
000108D1

which represents the new true WAL stream numbering as opposed to the old
two-part numbering.

Thus, any tool that thinks it knows how the WAL files are sequenced will
break very obviously, but any tool that just looks for 24 hexadecimal
digits will be fine.

I wonder if any tools in the former category would also break if one
changes XLOG_SEG_SIZE.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-07-06 Thread Greg Stark
On Thu, Jun 14, 2012 at 10:01 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 This has the advantage that you can calculate the CRC for all the other
 fields before acquiring WALInsertLock. For xl_prev, you need to know where
 exactly the record is inserted, so it's handy that it's the last field
 before CRC.

It may be late to mention this but fwiw you don't need to reorder the
fields to do this. CRC has the property that you can easily adjust it
for any changes to the data covered by it. Regardless of where the
xl_prev link is you can calculate the CRC as if xl_prev is 0 and then
once you get the lock add in the correct xl_prev. This is an
argument in favour of using CRC over other checksums for which that
would be hard or impossible.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Fujii Masao
On Mon, Jun 25, 2012 at 1:24 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Ok, committed all the WAL format changes now.

This breaks pg_resetxlog -l at all. When I ran pg_resetxlog -l 0x01,0x01,0x01
in the HEAD, I got the following error message though the same command
successfully completed in 9.1.

pg_resetxlog: invalid argument for option -l
Try pg_resetxlog --help for more information.

I think the attached patch needs to be applied.

Regards,

-- 
Fujii Masao


resetxlog_bugfix_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Fujii Masao
On Mon, Jun 25, 2012 at 1:24 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Ok, committed all the WAL format changes now.

I found the typo.

In walsender.c
-reply.write.xlogid, reply.write.xrecoff,
-reply.flush.xlogid, reply.flush.xrecoff,
-reply.apply.xlogid, reply.apply.xrecoff);
+(uint32) (reply.write  32), (uint32) reply.write,
+(uint32) (reply.flush  32), (uint32) reply.flush,
+(uint32) (reply.apply  32), (uint32) reply.apply);

 should be . The attached patch fixes this typo.

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Fujii Masao
On Tue, Jun 26, 2012 at 2:53 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Jun 25, 2012 at 1:24 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Ok, committed all the WAL format changes now.

 I found the typo.

 In walsender.c
 -                reply.write.xlogid, reply.write.xrecoff,
 -                reply.flush.xlogid, reply.flush.xrecoff,
 -                reply.apply.xlogid, reply.apply.xrecoff);
 +                (uint32) (reply.write  32), (uint32) reply.write,
 +                (uint32) (reply.flush  32), (uint32) reply.flush,
 +                (uint32) (reply.apply  32), (uint32) reply.apply);

  should be . The attached patch fixes this typo.

Oh, I forgot to attach the patch.. Here is the patch.

Regards,

-- 
Fujii Masao


walsender_typo_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 1:57 PM, Fujii Masao masao.fu...@gmail.com wrote:
  should be . The attached patch fixes this typo.

 Oh, I forgot to attach the patch.. Here is the patch.

I committed both of the patches you posted to this thread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Heikki Linnakangas

On 25.06.2012 21:01, Robert Haas wrote:

On Mon, Jun 25, 2012 at 1:57 PM, Fujii Masaomasao.fu...@gmail.com  wrote:

 should be. The attached patch fixes this typo.


Oh, I forgot to attach the patch.. Here is the patch.


I committed both of the patches you posted to this thread.


Thanks Robert. I was thinking that pg_resetxlog -l would accept a WAL 
file name, instead of comma-separated tli, xlogid, segno arguments. The 
latter is a bit meaningless now that we don't use the xlogid+segno 
combination anywhere else. Alvaro pointed out that pg_upgrade was broken 
by the change in pg_resetxlog -n output - I changed that too to print 
the First log segment after reset information as a WAL file name, 
instead of logid+segno. Another option would be to print the 64-bit 
segment number, but I think that's worse, because the 64-bit 	segment 
number is harder to associate with a physical WAL file.


So I think we should change pg_resetxlog -l option to take a WAL file 
name as argument, and fix pg_upgrade accordingly.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Alvaro Herrera

Excerpts from Heikki Linnakangas's message of lun jun 25 20:09:34 -0400 2012:
 On 25.06.2012 21:01, Robert Haas wrote:
  On Mon, Jun 25, 2012 at 1:57 PM, Fujii Masaomasao.fu...@gmail.com  wrote:
   should be. The attached patch fixes this typo.
 
  Oh, I forgot to attach the patch.. Here is the patch.
 
  I committed both of the patches you posted to this thread.
 
 Thanks Robert. I was thinking that pg_resetxlog -l would accept a WAL 
 file name, instead of comma-separated tli, xlogid, segno arguments. The 
 latter is a bit meaningless now that we don't use the xlogid+segno 
 combination anywhere else. Alvaro pointed out that pg_upgrade was broken 
 by the change in pg_resetxlog -n output - I changed that too to print 
 the First log segment after reset information as a WAL file name, 
 instead of logid+segno. Another option would be to print the 64-bit 
 segment number, but I think that's worse, because the 64-bit segment 
 number is harder to associate with a physical WAL file.
 
 So I think we should change pg_resetxlog -l option to take a WAL file 
 name as argument, and fix pg_upgrade accordingly.

The only thing pg_upgrade does with the tli/logid/segno combo, AFAICT,
is pass it back to pg_resetxlog -l, so this plan seems reasonable.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 So I think we should change pg_resetxlog -l option to take a WAL file 
 name as argument, and fix pg_upgrade accordingly.

Seems reasonable I guess.  It's really specifying a starting WAL
location, but only to file granularity, so treating the argument as a
file name is sort of a type cheat but seems convenient.

If we do it that way, we'd better validate that the argument is a legal
WAL file name, so as to catch any cases where somebody tries to do it
old-style.

BTW, does pg_resetxlog's logic for setting the default -l value (from
scanning pg_xlog to find the largest existing file name) still work?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Amit Kapila
From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
  So I think we should change pg_resetxlog -l option to take a WAL file 
  name as argument, and fix pg_upgrade accordingly.

 Seems reasonable I guess.  It's really specifying a starting WAL
 location, but only to file granularity, so treating the argument as a
 file name is sort of a type cheat but seems convenient.

 If we do it that way, we'd better validate that the argument is a legal
 WAL file name, so as to catch any cases where somebody tries to do it
 old-style.

 BTW, does pg_resetxlog's logic for setting the default -l value (from
 scanning pg_xlog to find the largest existing file name) still work?
  

It finds the segment number for largest existing file name from pg_xlog and
then compare it with input provided by the 
user for -l Option, if input is greater it will use the input to set in
control file.

With Regards,
Amit Kapila.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-24 Thread Heikki Linnakangas

Ok, committed all the WAL format changes now.

On 19.06.2012 18:57, Robert Haas wrote:

Should we keep the old representation in the replication protocol messages?
That would make it simpler to write a client that works with different
server versions (like pg_receivexlog). Or, while we're at it, perhaps we
should mandate network-byte order for all the integer and XLogRecPtr fields
in the replication protocol. That would make it easier to write a client
that works across different architectures, in= 9.3. The contents of the
WAL would of course be architecture-dependent, but it would be nice if
pg_receivexlog and similar tools could nevertheless be
architecture-independent.


I share Andres' question about how we're doing this already.  I think
if we're going to break this, I'd rather do it in 9.3 than 5 years
from now.  At this point it's just a minor annoyance, but it'll
probably get worse as people write more tools that understand WAL.


I didn't touch the replication protocol yet, but I think we should do it 
some time during 9.3.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-24 Thread Simon Riggs
On 24 June 2012 17:24, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

 Ok, committed all the WAL format changes now.

Nice!

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-20 Thread Magnus Hagander
On Tue, Jun 19, 2012 at 5:57 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jun 19, 2012 at 4:14 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Well, that was easier than I thought. Attached is a patch to make XLogRecPtr
 a uint64, on top of my other WAL format patches. I think we should go ahead
 with this.

 +1.

 The LSNs on pages are still stored in the old format, to avoid changing the
 on-disk format and breaking pg_upgrade. The XLogRecPtrs stored the control
 file and WAL are changed, however, so an initdb (or at least pg_resetxlog)
 is required.

 Seems fine.

 Should we keep the old representation in the replication protocol messages?
 That would make it simpler to write a client that works with different
 server versions (like pg_receivexlog). Or, while we're at it, perhaps we
 should mandate network-byte order for all the integer and XLogRecPtr fields
 in the replication protocol. That would make it easier to write a client
 that works across different architectures, in = 9.3. The contents of the
 WAL would of course be architecture-dependent, but it would be nice if
 pg_receivexlog and similar tools could nevertheless be
 architecture-independent.

 I share Andres' question about how we're doing this already.  I think
 if we're going to break this, I'd rather do it in 9.3 than 5 years
 from now.  At this point it's just a minor annoyance, but it'll
 probably get worse as people write more tools that understand WAL.

If we are looking at breaking it, and we are especially concerned
about something like pg_receivexlog... Is it something we could/should
change in the protocl *now* for 9.2, to make it non-broken in any
released version? As in, can we extract just the protocol change and
backpatch that to 9.2beta?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-20 Thread Fujii Masao
On Wed, Jun 20, 2012 at 8:19 PM, Magnus Hagander mag...@hagander.net wrote:
 On Tue, Jun 19, 2012 at 5:57 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jun 19, 2012 at 4:14 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Well, that was easier than I thought. Attached is a patch to make XLogRecPtr
 a uint64, on top of my other WAL format patches. I think we should go ahead
 with this.

 +1.

 The LSNs on pages are still stored in the old format, to avoid changing the
 on-disk format and breaking pg_upgrade. The XLogRecPtrs stored the control
 file and WAL are changed, however, so an initdb (or at least pg_resetxlog)
 is required.

 Seems fine.

 Should we keep the old representation in the replication protocol messages?
 That would make it simpler to write a client that works with different
 server versions (like pg_receivexlog). Or, while we're at it, perhaps we
 should mandate network-byte order for all the integer and XLogRecPtr fields
 in the replication protocol. That would make it easier to write a client
 that works across different architectures, in = 9.3. The contents of the
 WAL would of course be architecture-dependent, but it would be nice if
 pg_receivexlog and similar tools could nevertheless be
 architecture-independent.

 I share Andres' question about how we're doing this already.  I think
 if we're going to break this, I'd rather do it in 9.3 than 5 years
 from now.  At this point it's just a minor annoyance, but it'll
 probably get worse as people write more tools that understand WAL.

 If we are looking at breaking it, and we are especially concerned
 about something like pg_receivexlog... Is it something we could/should
 change in the protocl *now* for 9.2, to make it non-broken in any
 released version? As in, can we extract just the protocol change and
 backpatch that to 9.2beta?

pg_receivexlog in 9.2 cannot handle correctly the WAL location FF
(which was skipped in 9.2 or before). For example, pg_receivexlog calls
XLByteAdvance() which always skips FF. So even if we change the protocol,
ISTM pg_receivexlog in 9.2 cannot work well with the server in 9.3 which
might send FF. No?

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-20 Thread Heikki Linnakangas

On 20.06.2012 20:43, Fujii Masao wrote:

On Wed, Jun 20, 2012 at 8:19 PM, Magnus Hagandermag...@hagander.net  wrote:

On Tue, Jun 19, 2012 at 5:57 PM, Robert Haasrobertmh...@gmail.com  wrote:

On Tue, Jun 19, 2012 at 4:14 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

Well, that was easier than I thought. Attached is a patch to make XLogRecPtr
a uint64, on top of my other WAL format patches. I think we should go ahead
with this.


+1.


The LSNs on pages are still stored in the old format, to avoid changing the
on-disk format and breaking pg_upgrade. The XLogRecPtrs stored the control
file and WAL are changed, however, so an initdb (or at least pg_resetxlog)
is required.


Seems fine.


Should we keep the old representation in the replication protocol messages?
That would make it simpler to write a client that works with different
server versions (like pg_receivexlog). Or, while we're at it, perhaps we
should mandate network-byte order for all the integer and XLogRecPtr fields
in the replication protocol. That would make it easier to write a client
that works across different architectures, in= 9.3. The contents of the
WAL would of course be architecture-dependent, but it would be nice if
pg_receivexlog and similar tools could nevertheless be
architecture-independent.


I share Andres' question about how we're doing this already.  I think
if we're going to break this, I'd rather do it in 9.3 than 5 years
from now.  At this point it's just a minor annoyance, but it'll
probably get worse as people write more tools that understand WAL.


If we are looking at breaking it, and we are especially concerned
about something like pg_receivexlog... Is it something we could/should
change in the protocl *now* for 9.2, to make it non-broken in any
released version? As in, can we extract just the protocol change and
backpatch that to 9.2beta?


pg_receivexlog in 9.2 cannot handle correctly the WAL location FF
(which was skipped in 9.2 or before). For example, pg_receivexlog calls
XLByteAdvance() which always skips FF. So even if we change the protocol,
ISTM pg_receivexlog in 9.2 cannot work well with the server in 9.3 which
might send FF. No?


Yeah, you can't use pg_receivexlog from 9.2 against a 9.3 server. We 
can't really promise compatibility when using an older client against a 
newer server, but we can try to be backwards-compatible in the other 
direction. I'm thinking of using a 9.3 pg_receivexlog against a 9.2 server.


But I guess Robert is right and we shouldn't worry about 
backwards-compatibility at this point. Instead, let's try to get the 
protocol right, so that we can more easily provide 
backwards-compatibility in the future. Like, using a 9.4 pg_receivexlog 
against a 9.3 server.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-19 Thread Andres Freund
On Tuesday, June 19, 2012 10:14:08 AM Heikki Linnakangas wrote:
 On 18.06.2012 21:08, Heikki Linnakangas wrote:
  On 18.06.2012 21:00, Robert Haas wrote:
  On Thu, Jun 14, 2012 at 5:58 PM, Andres Freundand...@2ndquadrant.com
  
  wrote:
  1. Use a 64-bit segment number, instead of the log/seg combination.
  And don't waste the last segment on each logical 4 GB log file. The
  concept of a logical log file is now completely gone. XLogRecPtr is
  unchanged,
  but it should now be understood as a plain 64-bit value, just split
  into
  two 32-bit integers for historical reasons. On disk, this means that
  there will be log files ending in FF, those were skipped before.
  
  Whats the reason for keeping that awkward split now? There aren't
  that many
  users of xlogid/xcrecoff and many of those would be better served by
  using
  helper macros.
  
  I wondered that, too. There may be a good reason for keeping it split
  up that way, but we at least oughta think about it a bit.
  
  The page header contains an XLogRecPtr (LSN), so if we change it we'll
  have to deal with pg_upgrade. I guess we could still keep XLogRecPtr
  around as the on-disk representation, and convert between the 64-bit
  integer and XLogRecPtr in PageGetLSN/PageSetLSN. I can try that out -
  many xlog calculations would admittedly be simpler if it was an uint64.
 
 Well, that was easier than I thought. Attached is a patch to make
 XLogRecPtr a uint64, on top of my other WAL format patches. I think we
 should go ahead with this.
Cool. You plan to merge XLogSegNo with XLogRecPtr in that case? I am not sure 
if having two representations which just have a constant factor inbetween 
really makes sense.

 The LSNs on pages are still stored in the old format, to avoid changing
 the on-disk format and breaking pg_upgrade. The XLogRecPtrs stored the
 control file and WAL are changed, however, so an initdb (or at least
 pg_resetxlog) is required.
Sounds sensible.

 Should we keep the old representation in the replication protocol
 messages? That would make it simpler to write a client that works with
 different server versions (like pg_receivexlog). Or, while we're at it,
 perhaps we should mandate network-byte order for all the integer and
 XLogRecPtr fields in the replication protocol.
The replication protocol uses pq_sendint for integers which should take care 
of converting to big endian already. I don't think anything but the wal itself 
is otherwise transported in a binary fashion? So I don't think there is any 
such architecture dependency in the protocol currently?

I don't really see a point in keeping around a backward-compatible 
representation just for the sake of running such tools on multiple versions. I 
might not be pragmatic enough, but: Why would you want to do that *at the 
moment*? Many of the other tools are already version specific, so...
When the protocol starts to be used by more tools, maybe, but imo were not 
there yet.

But then its not hard to convert to the old representation for that.

 I kept the %X/%X representation in error messages etc. I'm quite used to
 that output, so reluctant to change it, although it's a bit silly now
 that it represents just 64-bit value. Using UINT64_FORMAT would also
 make the messages harder to translate.
No opinion on that. Its easier to see for me whether two values are exactly 
the same or very similar with the 64bit representation but its harder to gauge 
bigger differences. So ...

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-19 Thread Robert Haas
On Tue, Jun 19, 2012 at 4:14 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Well, that was easier than I thought. Attached is a patch to make XLogRecPtr
 a uint64, on top of my other WAL format patches. I think we should go ahead
 with this.

+1.

 The LSNs on pages are still stored in the old format, to avoid changing the
 on-disk format and breaking pg_upgrade. The XLogRecPtrs stored the control
 file and WAL are changed, however, so an initdb (or at least pg_resetxlog)
 is required.

Seems fine.

 Should we keep the old representation in the replication protocol messages?
 That would make it simpler to write a client that works with different
 server versions (like pg_receivexlog). Or, while we're at it, perhaps we
 should mandate network-byte order for all the integer and XLogRecPtr fields
 in the replication protocol. That would make it easier to write a client
 that works across different architectures, in = 9.3. The contents of the
 WAL would of course be architecture-dependent, but it would be nice if
 pg_receivexlog and similar tools could nevertheless be
 architecture-independent.

I share Andres' question about how we're doing this already.  I think
if we're going to break this, I'd rather do it in 9.3 than 5 years
from now.  At this point it's just a minor annoyance, but it'll
probably get worse as people write more tools that understand WAL.

 I kept the %X/%X representation in error messages etc. I'm quite used to
 that output, so reluctant to change it, although it's a bit silly now that
 it represents just 64-bit value. Using UINT64_FORMAT would also make the
 messages harder to translate.

I could go either way on this one, but I have no problem with the way
you did it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-19 Thread Heikki Linnakangas

On 19.06.2012 18:46, Andres Freund wrote:

On Tuesday, June 19, 2012 10:14:08 AM Heikki Linnakangas wrote:

Well, that was easier than I thought. Attached is a patch to make
XLogRecPtr a uint64, on top of my other WAL format patches. I think we
should go ahead with this.

Cool. You plan to merge XLogSegNo with XLogRecPtr in that case? I am not sure
if having two representations which just have a constant factor inbetween
really makes sense.


I wasn't planning to, it didn't even occur to me that we might be able 
to get rid of XLogSegNo to be honest. There's places that deal whole 
segments, rather than with specific byte positions in the WAL, so I 
think XLogSegNo makes more sense in that context. Take 
XLogArchiveNotifySeg(), for example. It notifies the archiver that a 
given segment is ready for archiving, so we pass an XLogSegNo to 
identify that segment as an argument. I suppose we could pass an 
XLogRecPtr that points to the beginning of the segment instead, but it 
doesn't really feel like an improvement to me.



Should we keep the old representation in the replication protocol
messages? That would make it simpler to write a client that works with
different server versions (like pg_receivexlog). Or, while we're at it,
perhaps we should mandate network-byte order for all the integer and
XLogRecPtr fields in the replication protocol.

The replication protocol uses pq_sendint for integers which should take care
of converting to big endian already. I don't think anything but the wal itself
is otherwise transported in a binary fashion? So I don't think there is any
such architecture dependency in the protocol currently?


We only use pg_sendint() for the few values exchanged in the handshake 
before we start replicating, but once we begin, we just send structs 
around. For example, in ProcessStandbyReplyMessage():



static void
ProcessStandbyReplyMessage(void)
{
StandbyReplyMessage reply;

pq_copymsgbytes(reply_message, (char *) reply, 
sizeof(StandbyReplyMessage));

 ...

After that, we just the fields in the reply struct like in any other struct.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-19 Thread Andres Freund
Hi,

On Wednesday, June 20, 2012 12:24:54 AM Heikki Linnakangas wrote:
 On 19.06.2012 18:46, Andres Freund wrote:
  On Tuesday, June 19, 2012 10:14:08 AM Heikki Linnakangas wrote:
  Well, that was easier than I thought. Attached is a patch to make
  XLogRecPtr a uint64, on top of my other WAL format patches. I think we
  should go ahead with this.
  
  Cool. You plan to merge XLogSegNo with XLogRecPtr in that case? I am not
  sure if having two representations which just have a constant factor
  inbetween really makes sense.

 I wasn't planning to, it didn't even occur to me that we might be able
 to get rid of XLogSegNo to be honest. There's places that deal whole
 segments, rather than with specific byte positions in the WAL, so I
 think XLogSegNo makes more sense in that context. Take
 XLogArchiveNotifySeg(), for example. It notifies the archiver that a
 given segment is ready for archiving, so we pass an XLogSegNo to
 identify that segment as an argument. I suppose we could pass an
 XLogRecPtr that points to the beginning of the segment instead, but it
 doesn't really feel like an improvement to me.
I am not sure its a win either, was just wondering because they now are that 
similar.

  Should we keep the old representation in the replication protocol
  messages? That would make it simpler to write a client that works with
  different server versions (like pg_receivexlog). Or, while we're at it,
  perhaps we should mandate network-byte order for all the integer and
  XLogRecPtr fields in the replication protocol.
  
  The replication protocol uses pq_sendint for integers which should take
  care of converting to big endian already. I don't think anything but the
  wal itself is otherwise transported in a binary fashion? So I don't
  think there is any such architecture dependency in the protocol
  currently?
 We only use pg_sendint() for the few values exchanged in the handshake
 before we start replicating, but once we begin, we just send structs
 
 around. For example, in ProcessStandbyReplyMessage():
  static void
  ProcessStandbyReplyMessage(void)
  {
  
  StandbyReplyMessage reply;
  
  pq_copymsgbytes(reply_message, (char *) reply,
  sizeof(StandbyReplyMessage));
  
   ...
 
 After that, we just the fields in the reply struct like in any other
 struct.
Yes, forgot that, true. I guess the best fix would be to actually send normal 
messages instead of CopyData ones? Much more to type though...

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-18 Thread Robert Haas
On Thu, Jun 14, 2012 at 5:58 PM, Andres Freund and...@2ndquadrant.com wrote:
 1. Use a 64-bit segment number, instead of the log/seg combination. And
 don't waste the last segment on each logical 4 GB log file. The concept
 of a logical log file is now completely gone. XLogRecPtr is unchanged,
 but it should now be understood as a plain 64-bit value, just split into
 two 32-bit integers for historical reasons. On disk, this means that
 there will be log files ending in FF, those were skipped before.
 Whats the reason for keeping that awkward split now? There aren't that many
 users of xlogid/xcrecoff and many of those would be better served by using
 helper macros.

I wondered that, too.  There may be a good reason for keeping it split
up that way, but we at least oughta think about it a bit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-18 Thread Heikki Linnakangas

On 18.06.2012 21:00, Robert Haas wrote:

On Thu, Jun 14, 2012 at 5:58 PM, Andres Freundand...@2ndquadrant.com  wrote:

1. Use a 64-bit segment number, instead of the log/seg combination. And
don't waste the last segment on each logical 4 GB log file. The concept
of a logical log file is now completely gone. XLogRecPtr is unchanged,
but it should now be understood as a plain 64-bit value, just split into
two 32-bit integers for historical reasons. On disk, this means that
there will be log files ending in FF, those were skipped before.

Whats the reason for keeping that awkward split now? There aren't that many
users of xlogid/xcrecoff and many of those would be better served by using
helper macros.


I wondered that, too.  There may be a good reason for keeping it split
up that way, but we at least oughta think about it a bit.


The page header contains an XLogRecPtr (LSN), so if we change it we'll 
have to deal with pg_upgrade. I guess we could still keep XLogRecPtr 
around as the on-disk representation, and convert between the 64-bit 
integer and XLogRecPtr in PageGetLSN/PageSetLSN. I can try that out - 
many xlog calculations would admittedly be simpler if it was an uint64.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-18 Thread Andres Freund
On Monday, June 18, 2012 08:08:14 PM Heikki Linnakangas wrote:
 On 18.06.2012 21:00, Robert Haas wrote:
  On Thu, Jun 14, 2012 at 5:58 PM, Andres Freundand...@2ndquadrant.com  
wrote:
  1. Use a 64-bit segment number, instead of the log/seg combination. And
  don't waste the last segment on each logical 4 GB log file. The concept
  of a logical log file is now completely gone. XLogRecPtr is
  unchanged, but it should now be understood as a plain 64-bit value,
  just split into two 32-bit integers for historical reasons. On disk,
  this means that there will be log files ending in FF, those were
  skipped before.
  
  Whats the reason for keeping that awkward split now? There aren't that
  many users of xlogid/xcrecoff and many of those would be better served
  by using helper macros.
  
  I wondered that, too.  There may be a good reason for keeping it split
  up that way, but we at least oughta think about it a bit.
 
 The page header contains an XLogRecPtr (LSN), so if we change it we'll
 have to deal with pg_upgrade. I guess we could still keep XLogRecPtr
 around as the on-disk representation, and convert between the 64-bit
 integer and XLogRecPtr in PageGetLSN/PageSetLSN. I can try that out -
 many xlog calculations would admittedly be simpler if it was an uint64.
I am out of my depth here, not having read any of the relevant code, but 
couldn't we simply replace the lsn from disk with InvalidXLogRecPtr without 
marking the page dirty?

There is the valid argument that you would loose some information when pages 
with hint bits are written out again, but on the other hand you would also 
gain the information that it was a hint-bit write...

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-18 Thread Robert Haas
On Mon, Jun 18, 2012 at 2:08 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 18.06.2012 21:00, Robert Haas wrote:
 On Thu, Jun 14, 2012 at 5:58 PM, Andres Freundand...@2ndquadrant.com
  wrote:

 1. Use a 64-bit segment number, instead of the log/seg combination. And
 don't waste the last segment on each logical 4 GB log file. The concept
 of a logical log file is now completely gone. XLogRecPtr is unchanged,
 but it should now be understood as a plain 64-bit value, just split into
 two 32-bit integers for historical reasons. On disk, this means that
 there will be log files ending in FF, those were skipped before.

 Whats the reason for keeping that awkward split now? There aren't that
 many
 users of xlogid/xcrecoff and many of those would be better served by
 using
 helper macros.

 I wondered that, too.  There may be a good reason for keeping it split
 up that way, but we at least oughta think about it a bit.

 The page header contains an XLogRecPtr (LSN), so if we change it we'll have
 to deal with pg_upgrade. I guess we could still keep XLogRecPtr around as
 the on-disk representation, and convert between the 64-bit integer and
 XLogRecPtr in PageGetLSN/PageSetLSN. I can try that out - many xlog
 calculations would admittedly be simpler if it was an uint64.

Ugh.  Well, that's a good reason for thinking twice before changing
it, if not abandoning the idea altogether.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-18 Thread Heikki Linnakangas

On 18.06.2012 21:13, Andres Freund wrote:

On Monday, June 18, 2012 08:08:14 PM Heikki Linnakangas wrote:

The page header contains an XLogRecPtr (LSN), so if we change it we'll
have to deal with pg_upgrade. I guess we could still keep XLogRecPtr
around as the on-disk representation, and convert between the 64-bit
integer and XLogRecPtr in PageGetLSN/PageSetLSN. I can try that out -
many xlog calculations would admittedly be simpler if it was an uint64.

I am out of my depth here, not having read any of the relevant code, but
couldn't we simply replace the lsn from disk with InvalidXLogRecPtr without
marking the page dirty?

There is the valid argument that you would loose some information when pages
with hint bits are written out again, but on the other hand you would also
gain the information that it was a hint-bit write...


Sorry, I don't understand that. Where would you replace the LSN from 
disk with InvalidXLogRecPtr ? (and no, it probably won't work ;-) )


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-18 Thread Andres Freund
On Monday, June 18, 2012 08:32:54 PM Heikki Linnakangas wrote:
 On 18.06.2012 21:13, Andres Freund wrote:
  On Monday, June 18, 2012 08:08:14 PM Heikki Linnakangas wrote:
  The page header contains an XLogRecPtr (LSN), so if we change it we'll
  have to deal with pg_upgrade. I guess we could still keep XLogRecPtr
  around as the on-disk representation, and convert between the 64-bit
  integer and XLogRecPtr in PageGetLSN/PageSetLSN. I can try that out -
  many xlog calculations would admittedly be simpler if it was an uint64.
  
  I am out of my depth here, not having read any of the relevant code, but
  couldn't we simply replace the lsn from disk with InvalidXLogRecPtr
  without marking the page dirty?
  
  There is the valid argument that you would loose some information when
  pages with hint bits are written out again, but on the other hand you
  would also gain the information that it was a hint-bit write...
 
 Sorry, I don't understand that. Where would you replace the LSN from
 disk with InvalidXLogRecPtr ? (and no, it probably won't work ;-) )
In ReadBuffer_common or such, after reading a page from disk and verifying 
that the page has a valid header it seems to be possible to replace pd_lsn *in 
memory* by InvalidXLogRecPtr without marking the page dirty.
If the page isn't modified the lsn on disk won't be changed so you don't loose 
debugging information in that case. If will be zero if it has been written by 
a hint-bit write and thats arguable a win.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-18 Thread Heikki Linnakangas

On 18.06.2012 21:45, Andres Freund wrote:

On Monday, June 18, 2012 08:32:54 PM Heikki Linnakangas wrote:

On 18.06.2012 21:13, Andres Freund wrote:

On Monday, June 18, 2012 08:08:14 PM Heikki Linnakangas wrote:

The page header contains an XLogRecPtr (LSN), so if we change it we'll
have to deal with pg_upgrade. I guess we could still keep XLogRecPtr
around as the on-disk representation, and convert between the 64-bit
integer and XLogRecPtr in PageGetLSN/PageSetLSN. I can try that out -
many xlog calculations would admittedly be simpler if it was an uint64.


I am out of my depth here, not having read any of the relevant code, but
couldn't we simply replace the lsn from disk with InvalidXLogRecPtr
without marking the page dirty?

There is the valid argument that you would loose some information when
pages with hint bits are written out again, but on the other hand you
would also gain the information that it was a hint-bit write...


Sorry, I don't understand that. Where would you replace the LSN from
disk with InvalidXLogRecPtr ? (and no, it probably won't work ;-) )

In ReadBuffer_common or such, after reading a page from disk and verifying
that the page has a valid header it seems to be possible to replace pd_lsn *in
memory* by InvalidXLogRecPtr without marking the page dirty.
If the page isn't modified the lsn on disk won't be changed so you don't loose
debugging information in that case. If will be zero if it has been written by
a hint-bit write and thats arguable a win.


We use the LSN to decide whether a full-page image need to be xlogged if 
the page is modified. If you reset LSN every time you read in a page, 
you'll be doing full page writes every time a page is read from disk and 
modified, whether or not it's the first time the page is modified after 
the last checkpoint.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-18 Thread Andres Freund
On Monday, June 18, 2012 09:19:48 PM Heikki Linnakangas wrote:
 On 18.06.2012 21:45, Andres Freund wrote:
  On Monday, June 18, 2012 08:32:54 PM Heikki Linnakangas wrote:
  On 18.06.2012 21:13, Andres Freund wrote:
  On Monday, June 18, 2012 08:08:14 PM Heikki Linnakangas wrote:
  The page header contains an XLogRecPtr (LSN), so if we change it we'll
  have to deal with pg_upgrade. I guess we could still keep XLogRecPtr
  around as the on-disk representation, and convert between the 64-bit
  integer and XLogRecPtr in PageGetLSN/PageSetLSN. I can try that out -
  many xlog calculations would admittedly be simpler if it was an
  uint64.
  
  I am out of my depth here, not having read any of the relevant code,
  but couldn't we simply replace the lsn from disk with
  InvalidXLogRecPtr without marking the page dirty?
  
  There is the valid argument that you would loose some information when
  pages with hint bits are written out again, but on the other hand you
  would also gain the information that it was a hint-bit write...
  
  Sorry, I don't understand that. Where would you replace the LSN from
  disk with InvalidXLogRecPtr ? (and no, it probably won't work ;-) )
  
  In ReadBuffer_common or such, after reading a page from disk and
  verifying that the page has a valid header it seems to be possible to
  replace pd_lsn *in memory* by InvalidXLogRecPtr without marking the page
  dirty.
  If the page isn't modified the lsn on disk won't be changed so you don't
  loose debugging information in that case. If will be zero if it has been
  written by a hint-bit write and thats arguable a win.
 
 We use the LSN to decide whether a full-page image need to be xlogged if
 the page is modified. If you reset LSN every time you read in a page,
 you'll be doing full page writes every time a page is read from disk and
 modified, whether or not it's the first time the page is modified after
 the last checkpoint.
Yea, I somehow disregarded that hint-bit writes would make a problem with full 
page writes in that case. Normal writes wouldn't be a problem...
This should be fixable but the result would be too ugly. So its either 
converting the on-disk representation or keeping the duplicated 
representation.

pd_lsn seems to be well enough abstracted, doing the conversion in 
PageSet/GetLSN seems to be easy enough.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-14 Thread Andres Freund
On Thursday, June 14, 2012 11:01:42 PM Heikki Linnakangas wrote:
 As I threatened earlier
 (http://archives.postgresql.org/message-id/4fd0b1ab.3090...@enterprisedb.co
 m), here are three patches that change the WAL format. The goal is to
 change the format so that when you're inserting a WAL record of a given
 size, you know exactly how much space it requires in the WAL.
I fear the patches need rebasing after the pgindent run... Even before that 
(60801944fa105252b48ea5688d47dfc05c695042) it only applies with offsets?

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-14 Thread Andres Freund
On Thursday, June 14, 2012 11:01:42 PM Heikki Linnakangas wrote:
 As I threatened earlier
 (http://archives.postgresql.org/message-id/4fd0b1ab.3090...@enterprisedb.co
 m), here are three patches that change the WAL format. The goal is to
 change the format so that when you're inserting a WAL record of a given
 size, you know exactly how much space it requires in the WAL.
 
 1. Use a 64-bit segment number, instead of the log/seg combination. And
 don't waste the last segment on each logical 4 GB log file. The concept
 of a logical log file is now completely gone. XLogRecPtr is unchanged,
 but it should now be understood as a plain 64-bit value, just split into
 two 32-bit integers for historical reasons. On disk, this means that
 there will be log files ending in FF, those were skipped before.
Whats the reason for keeping that awkward split now? There aren't that many 
users of xlogid/xcrecoff and many of those would be better served by using 
helper macros.
API compatibility isn't a great argument either as code manually playing 
around with those needs to be checked anyway. I think there might be some code 
around that does XLogRecPtr addition manuall and such.

 2. Always include the xl_rem_len field, used for continuation records,
 in the xlog page header. A continuation log record only contained that
 one field, it's now included straight in the page header, so the concept
 of a continuation record doesn't exist anymore. Because of alignment,
 this wastes 4 bytes on every page that contains continued data from a
 previous record, and 8 bytes on pages that don't. That's not very much,
 and the next step will buy that back:
 
 3. Allow WAL record header to be split across pages. Per Tom's
 suggestion, move xl_tot_len to be the first field in XLogRecord, so that
 even if the header is split, xl_tot_len is always on the first page.
 xl_crc is moved to be the last field, and xl_prev is the second to last.
 This has the advantage that you can calculate the CRC for all the other
 fields before acquiring WALInsertLock. For xl_prev, you need to know
 where exactly the record is inserted, so it's handy that it's the last
 field before CRC. This patch doesn't try to take advantage of that,
 however, and I'm not sure if that makes any difference once I finish the
 patch to make XLogInsert scale better, which is the ultimate goal of all
 this.
 
 Those are the three patches I'd like to get committed in this
 commitfest. To see where all this is leading to, I've included a rough
 WIP version of the XLogInsert scaling patch. This version is quite
 different from the one I posted in spring, it takes advantage of the WAL
 format changes, and I'm also experimenting with a different method of
 tracking how far each WAL insertion has progressed. But more on that later.
 
 (Note to self: remember to bump XLOG_PAGE_MAGIC)
Will review.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers