Re: [HACKERS] Avoid full page images in streaming replication?

2015-10-26 Thread Bernd Helmle


--On 22. Oktober 2015 22:23:58 -0300 Alvaro Herrera
 wrote:

>> You can? The xlog format between 9.4 and 9.5 changed, so I can't see how
>> that'd work?
> 
> Oh, crap.  Must have been some other cross-version trial run I did,
> then.  I would hope it's at least not terribly difficult to back-patch
> that commit locally, anyway.

You might want to try 



Adrian created it basically to track down archive replay performance issues
a while ago. Maybe it's useful somehow.

-- 
Thanks

Bernd


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoid full page images in streaming replication?

2015-10-26 Thread Bernd Helmle


--On 23. Oktober 2015 00:03:30 +0200 Andres Freund 
wrote:

> 
> Note that FPIs are often pretty good for replay performance, avoiding
> lots of synchronous random reads.

That's a very import argument, i think. The difference can be significant,
even if you have a decent storage, rendering a replica unusable. We had
examples in the past where only optimizing aggressive writes of FPIs lead
to an acceptable lag of the replica.

-- 
Thanks

Bernd


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoid full page images in streaming replication?

2015-10-22 Thread Jim Nasby

On 10/22/15 4:42 PM, Magnus Hagander wrote:

 > How are you going to make that work without LSNs in the WAL received by
 > the replica diverging from those in the master's WAL?
 >

We could in theory send a "this would be been a fpi but it's skipped"
record which would only exist in streaming and just make the standby
write a noop of some kind? It would still be on the standby but it would
at least not consume the bandwidth. Just skip sending the actual
contents of the fpi.


I don't think it can be a noop on the receiver though... doesn't the 
receiver still need full page images in case of a crash? (Assuming 
full_page_writes is enabled...)


The other issue is chained replicas, where one of the children may need 
full page writes (during initial copy).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoid full page images in streaming replication?

2015-10-22 Thread Andres Freund
On 2015-10-22 16:34:38 -0500, Jim Nasby wrote:
> ISTM it should be possible to avoid sending full page writes to a streaming
> replica once the replica has reached a consistent state. I assume that the
> replica would still need to write full pages to it's disk in case of a
> crash, but the sender could insert special WAL records to tell it when to do
> so, instead of sending the full page image. Presumably this would be a big
> win for replication over a WAN.

Note that FPIs are often pretty good for replay performance, avoiding
lots of synchronous random reads.

I think FPI compression is a better solution for now. I found it to be
extremely effective in some benchmarks I recently ran.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoid full page images in streaming replication?

2015-10-22 Thread Tom Lane
Jim Nasby  writes:
> ISTM it should be possible to avoid sending full page writes to a 
> streaming replica once the replica has reached a consistent state. I 
> assume that the replica would still need to write full pages to it's 
> disk in case of a crash, but the sender could insert special WAL records 
> to tell it when to do so, instead of sending the full page image. 
> Presumably this would be a big win for replication over a WAN.

How are you going to make that work without LSNs in the WAL received by
the replica diverging from those in the master's WAL?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoid full page images in streaming replication?

2015-10-22 Thread Magnus Hagander
On Oct 22, 2015 23:38, "Tom Lane"  wrote:
>
> Jim Nasby  writes:
> > ISTM it should be possible to avoid sending full page writes to a
> > streaming replica once the replica has reached a consistent state. I
> > assume that the replica would still need to write full pages to it's
> > disk in case of a crash, but the sender could insert special WAL records
> > to tell it when to do so, instead of sending the full page image.
> > Presumably this would be a big win for replication over a WAN.
>
> How are you going to make that work without LSNs in the WAL received by
> the replica diverging from those in the master's WAL?
>

We could in theory send a "this would be been a fpi but it's skipped"
record which would only exist in streaming and just make the standby write
a noop of some kind? It would still be on the standby but it would at least
not consume the bandwidth. Just skip sending the actual contents of the
fpi.

/Magnus


Re: [HACKERS] Avoid full page images in streaming replication?

2015-10-22 Thread Robert Haas
On Thu, Oct 22, 2015 at 5:57 PM, Jim Nasby  wrote:
>> We could in theory send a "this would be been a fpi but it's skipped"
>> record which would only exist in streaming and just make the standby
>> write a noop of some kind? It would still be on the standby but it would
>> at least not consume the bandwidth. Just skip sending the actual
>> contents of the fpi.
>
> I don't think it can be a noop on the receiver though... doesn't the
> receiver still need full page images in case of a crash? (Assuming
> full_page_writes is enabled...)

Yes. If the standby is in the middle of writing a page updated by a
WAL record and crashes, it can end up with a torn page.  We restart
from a restartpoint at a location where the master checkpointed so
that we can be certain that replay of the FPI will fix the problem.
If you got rid of the FPIs, you'd be dead.

This is true both before and after reaching a consistent state, which
seems like a fatal flaw in this plan.

It's true that if the standby didn't have the master's FPIs, it could
generate its own in some side location that behaves like a separate
WAL stream or a double-write buffer.  But that would be a heck of a
lot of work to implement for an uncertain benefit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoid full page images in streaming replication?

2015-10-22 Thread Jim Nasby

On 10/22/15 5:03 PM, Andres Freund wrote:

On 2015-10-22 16:34:38 -0500, Jim Nasby wrote:

ISTM it should be possible to avoid sending full page writes to a streaming
replica once the replica has reached a consistent state. I assume that the
replica would still need to write full pages to it's disk in case of a
crash, but the sender could insert special WAL records to tell it when to do
so, instead of sending the full page image. Presumably this would be a big
win for replication over a WAN.


Note that FPIs are often pretty good for replay performance, avoiding
lots of synchronous random reads.


Right. You'd only want to use this if you're streaming over a slow link 
(like a WAN).



I think FPI compression is a better solution for now. I found it to be
extremely effective in some benchmarks I recently ran.


What I'm wondering is how compressible a 'normal' FPI is. Certainly if 
the hole is zero'd out and the page is mostly empty you'll get great 
compression. What about other workloads? For reference, if a 'FPI 
placeholder' WAL record is 16 bytes, that's 51,200% compression. If it's 
12 bytes, it's 68,200% compression. (I'm assuming we write the hole too, 
but maybe that's not true?)


FWIW, I started thinking about this when a client overwhelmed a remote 
slave doing VACUUM FREEZE after a Slony upgrade to 9.4. Granted, that's 
not normal, but it looks like normal vacuuming generates 2-6 bytes per 
modified tuple (depending on what was done). So even if you vacuumed 100 
rows on a page (which seems pretty high for most cases) that's only 
~200-600 bytes, compared to ~8200 bytes for the FPI.


The other interesting thing is that even their local slaves (with 20Gbps 
bandwidth) fell behind with vacuum_cost_delay=0, because replay was 
CPU-bound. But presumably that's not due to FPIs.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoid full page images in streaming replication?

2015-10-22 Thread Jim Nasby

On 10/22/15 5:11 PM, Robert Haas wrote:

It's true that if the standby didn't have the master's FPIs, it could
generate its own in some side location that behaves like a separate
WAL stream or a double-write buffer.  But that would be a heck of a
lot of work to implement for an uncertain benefit.


If the receiver didn't write the WAL before processing it then it can 
just stick the page image into the WAL it's writing for itself. Probably 
not good for syncrep, but I don't think you'd want this on for syncrep 
anyway.


But yes, this is all very hand-wavy without any actual data on what 
percentage of the WAL stream is FPIs. Looks like pageinspect doesn't 
work for WAL... does anyone have a script/tool that breaks out what 
percentage of a WAL file is FPIs?

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoid full page images in streaming replication?

2015-10-22 Thread Jim Nasby

On 10/22/15 5:53 PM, Alvaro Herrera wrote:

Jim Nasby wrote:


But yes, this is all very hand-wavy without any actual data on what
percentage of the WAL stream is FPIs. Looks like pageinspect doesn't work
for WAL... does anyone have a script/tool that breaks out what percentage of
a WAL file is FPIs?


pg_xlogdump -z


Hrm, any option for 9.4? I was hoping to get numbers from some real 
workloads...

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoid full page images in streaming replication?

2015-10-22 Thread Alvaro Herrera
Jim Nasby wrote:
> On 10/22/15 5:53 PM, Alvaro Herrera wrote:
> >Jim Nasby wrote:
> >
> >>But yes, this is all very hand-wavy without any actual data on what
> >>percentage of the WAL stream is FPIs. Looks like pageinspect doesn't work
> >>for WAL... does anyone have a script/tool that breaks out what percentage of
> >>a WAL file is FPIs?
> >
> >pg_xlogdump -z
> 
> Hrm, any option for 9.4? I was hoping to get numbers from some real
> workloads...

You can run the new one in old pg_xlog ...

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoid full page images in streaming replication?

2015-10-22 Thread Andres Freund
On 2015-10-22 20:27:20 -0300, Alvaro Herrera wrote:
> You can run the new one in old pg_xlog ...

You can? The xlog format between 9.4 and 9.5 changed, so I can't see how
that'd work?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoid full page images in streaming replication?

2015-10-22 Thread Michael Paquier
On Fri, Oct 23, 2015 at 7:37 AM, Jim Nasby wrote:
> What I'm wondering is how compressible a 'normal' FPI is. Certainly if the
> hole is zero'd out and the page is mostly empty you'll get great
> compression. What about other workloads? For reference, if a 'FPI
> placeholder' WAL record is 16 bytes, that's 51,200% compression. If it's 12
> bytes, it's 68,200% compression. (I'm assuming we write the hole too, but
> maybe that's not true?)

Well, to begin with FPI usually avoid to include the page hole in the
middle. Now, regarding the compressibility of a page taken without its
hole, that's highly schema-dependent. Based on some measurements I did
some time ago a page with repetitive data could compress up to 40%,
with less compressible stuff like UUID I recall it to be 20~25%. I
hacked out for the FPW compression patch a module able to work
directly on raw pages to test their compressibility:
https://github.com/michaelpq/pg_plugins/tree/master/compress_test
get_raw_page() has been taken from pageinspect and I added to it an
option to remove the hole in the middle of the page. Using that you
are able to guess how much pages can get compressed.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoid full page images in streaming replication?

2015-10-22 Thread Michael Paquier
On Fri, Oct 23, 2015 at 8:42 AM, Andres Freund  wrote:
> On 2015-10-22 20:27:20 -0300, Alvaro Herrera wrote:
>> You can run the new one in old pg_xlog ...
>
> You can? The xlog format between 9.4 and 9.5 changed, so I can't see how
> that'd work?

That's not going to work.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoid full page images in streaming replication?

2015-10-22 Thread Andres Freund
On 2015-10-22 17:47:01 -0500, Jim Nasby wrote:
> On 10/22/15 5:11 PM, Robert Haas wrote:
> >It's true that if the standby didn't have the master's FPIs, it could
> >generate its own in some side location that behaves like a separate
> >WAL stream or a double-write buffer.  But that would be a heck of a
> >lot of work to implement for an uncertain benefit.
> 
> If the receiver didn't write the WAL before processing it then it can just
> stick the page image into the WAL it's writing for itself. Probably not good
> for syncrep, but I don't think you'd want this on for syncrep anyway.

To me this sounds like a recipe for disaster (i.e. complex bugs). WAL
(and thus CRC checksums) differing between nodes. Ugh.

> But yes, this is all very hand-wavy without any actual data on what
> percentage of the WAL stream is FPIs. Looks like pageinspect doesn't work
> for WAL... does anyone have a script/tool that breaks out what percentage of
> a WAL file is FPIs?

pg_xlogdump --stats


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoid full page images in streaming replication?

2015-10-22 Thread Alvaro Herrera
Jim Nasby wrote:

> But yes, this is all very hand-wavy without any actual data on what
> percentage of the WAL stream is FPIs. Looks like pageinspect doesn't work
> for WAL... does anyone have a script/tool that breaks out what percentage of
> a WAL file is FPIs?

pg_xlogdump -z

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoid full page images in streaming replication?

2015-10-22 Thread Jim Nasby

On 10/22/15 5:52 PM, Andres Freund wrote:

If the receiver didn't write the WAL before processing it then it can just
>stick the page image into the WAL it's writing for itself. Probably not good
>for syncrep, but I don't think you'd want this on for syncrep anyway.

To me this sounds like a recipe for disaster (i.e. complex bugs). WAL
(and thus CRC checksums) differing between nodes. Ugh.


The WAL would *not* differ. This would only affect streaming 
replication, and only the stream itself.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoid full page images in streaming replication?

2015-10-22 Thread Alvaro Herrera
Andres Freund wrote:
> On 2015-10-22 20:27:20 -0300, Alvaro Herrera wrote:
> > You can run the new one in old pg_xlog ...
> 
> You can? The xlog format between 9.4 and 9.5 changed, so I can't see how
> that'd work?

Oh, crap.  Must have been some other cross-version trial run I did,
then.  I would hope it's at least not terribly difficult to back-patch
that commit locally, anyway.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoid full page images in streaming replication?

2015-10-22 Thread Andres Freund
On 2015-10-22 17:59:06 -0500, Jim Nasby wrote:
> The WAL would *not* differ.

It would. Hint bits and all.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers