subject:"\[HACKERS\] Replication"

On 1 June 2017 at 09:23, Andres Freund  wrote:

> Even if we decide this is necessary, I *strongly* suggest trying to get
> the existing standby decoding etc wrapped up before starting something
> nontrival afresh.

Speaking of such, I had a thought about how to sync logical slot state
to physical replicas without requiring all logical downstreams to know
about and be able to connect to all physical replicas. Interested in
your initial reaction. Basically, enlist the walreceiver's help.

Extend the walsender so in physical rep mode it periodically writes
the upstream's logical slot state into the stream as a message with
special lsn 0/0. Then the walreceiver uses that to make decoding calls
on the downstream to advance the downstream logical slots to the new
confirmed_flush_lsn, or hands the info off to a helper proc that does
it. It could set up a decoding context and do it via a proper decoding
session, discarding output, and later we could probably optimise that
decoding session to do even less work than ReorderBufferSkip()ing
xacts.

The alternative at this point since we nixed writing logical slot
state to WAL seems to be a bgworker on the upstream that periodically
writes logical slot state into generic WAL messages in a special
table, then another on the downstream that processes the table and
makes appropriate decoding calls to advance the downstream slot state.
(Safely, not via directly setting catalog_xmin etc). Which is pretty
damn ugly, but has the advantage that it'd work for PITR restores,
unlike the walsender/walreceiver based approach. Failover slots in
extension-space, basically.

I'm really, really not sold on all logical downstreams having to know
about and be able to connect to all physical standbys of the upstream
to maintain slots on them. Some kind of solution that runs entirely on
the standby will be needed. It's more a question of whether it's
something built-in, easy, and nice, or some out of tree extension.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication origins and timelines

2017-05-31 Thread Andres Freund

On 2017-05-31 21:27:56 -0400, Stephen Frost wrote:
> Craig,
> 
> * Craig Ringer (cr...@2ndquadrant.com) wrote:
> > TL;DR: replication origins track LSN without timeline. This is
> > ambiguous when physical failover is present since /
> > can now represent more than one state due to timeline forks with
> > promotions. Replication origins should track timelines so we can tell
> > the difference, I propose to patch them accordingly for pg11.
> 
> Uh, TL;DR, wow?  Why isn't this something which needs to be addressed
> before PG10 can be released?

Huh?  Slots are't present on replicas, ergo there's no way for the whole
issue to occur in v10.


> The further comments in your email seem to state that logical
> replication will just fail if a replica is promoted.  While not ideal,
> that might barely reach the point of it being releasable, but turns it
> into a feature that I'd have a really hard time recommending to
> anyone,

Meh^10


> and are we absolutely sure that there aren't any cases where there might
> be an issue of undetected promotion, leading to the complications which
> you describe?

Yes, unless you manipulate things by hand, copying files around or such.


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication origins and timelines

On 1 June 2017 at 09:36, Andres Freund  wrote:
> On 2017-05-31 21:33:26 -0400, Stephen Frost wrote:
>> > This only starts becoming an issue once logical replication slots can
>> > exist on replicas and be maintained to follow the master's slot state.
>> > Which is incomplete in Pg10 (not exposed to users) but I plan to
>> > finish getting in for pg11, making this a possible issue to be
>> > addressed.
>>
>> Fair enough.  I'm disappointed that we ended up with that as the
>> solution for PG10
>
> This has widely been debated, and it's not exactly new that development
> happens incrementally, so I don't have particularly much sympathy for
> that POV.

Yeah. Even if we'd had completed support for decoding on standby,
there's no way we could've implemented the required gymnastics for
getting logical replication to actually use it to support physical
failover for pg10, so it was always going to land in pg11.

This is very much a "how do we do it right when we do do it" topic,
not a pg10 issue.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication origins and timelines

Craig,

* Craig Ringer (cr...@2ndquadrant.com) wrote:
> On 1 June 2017 at 09:27, Stephen Frost  wrote:
> > * Craig Ringer (cr...@2ndquadrant.com) wrote:
> >> TL;DR: replication origins track LSN without timeline. This is
> >> ambiguous when physical failover is present since /
> >> can now represent more than one state due to timeline forks with
> >> promotions. Replication origins should track timelines so we can tell
> >> the difference, I propose to patch them accordingly for pg11.
> >
> > Uh, TL;DR, wow?  Why isn't this something which needs to be addressed
> > before PG10 can be released?  I hope I'm missing something that makes
> > the current approach work in PG10, or that there's some reason that this
> > isn't a big deal for PG10, but I'd like a bit of info as to why that's
> > the case, if it is.
> 
> In Pg 10, if you promote a physical replica then logical replication
> falls apart entirely and stops working. So there's no corruption
> hazard because it just ... stops.

I see.

> This only starts becoming an issue once logical replication slots can
> exist on replicas and be maintained to follow the master's slot state.
> Which is incomplete in Pg10 (not exposed to users) but I plan to
> finish getting in for pg11, making this a possible issue to be
> addressed.

Fair enough.  I'm disappointed that we ended up with that as the
solution for PG10, but so be it, the main thing is that we avoid any
corruption risk.

Thanks!

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] Replication origins and timelines

Andres,

* Andres Freund (and...@anarazel.de) wrote:
> On 2017-05-31 21:33:26 -0400, Stephen Frost wrote:
> > > This only starts becoming an issue once logical replication slots can
> > > exist on replicas and be maintained to follow the master's slot state.
> > > Which is incomplete in Pg10 (not exposed to users) but I plan to
> > > finish getting in for pg11, making this a possible issue to be
> > > addressed.
> > 
> > Fair enough.  I'm disappointed that we ended up with that as the
> > solution for PG10
> 
> This has widely been debated, and it's not exactly new that development
> happens incrementally, so I don't have particularly much sympathy for
> that POV.

I do understand that, of course, but hadn't quite realized yet that
we're talking only about replication slots on replicas.  Apologies for
the noise.

Thanks!

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] Replication origins and timelines

2017-05-31 Thread Andres Freund

On 2017-05-31 21:33:26 -0400, Stephen Frost wrote:
> > This only starts becoming an issue once logical replication slots can
> > exist on replicas and be maintained to follow the master's slot state.
> > Which is incomplete in Pg10 (not exposed to users) but I plan to
> > finish getting in for pg11, making this a possible issue to be
> > addressed.
> 
> Fair enough.  I'm disappointed that we ended up with that as the
> solution for PG10

This has widely been debated, and it's not exactly new that development
happens incrementally, so I don't have particularly much sympathy for
that POV.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication origins and timelines

On 1 June 2017 at 09:23, Andres Freund  wrote:
> Hi,
>
> On 2017-06-01 09:12:04 +0800, Craig Ringer wrote:
>> TL;DR: replication origins track LSN without timeline. This is
>> ambiguous when physical failover is present since /
>> can now represent more than one state due to timeline forks with
>> promotions. Replication origins should track timelines so we can tell
>> the difference, I propose to patch them accordingly for pg11.
>
> I'm not quite convinced that this should be tracked at the origin level.
> If you fail over physically, shouldn't we also reconfigure logical
> replication?
>
> Even if we decide this is necessary, I *strongly* suggest trying to get
> the existing standby decoding etc wrapped up before starting something
> nontrival afresh.

Yeah, I'm not thinking of leaping straight to a patch before we've got
the rep on standby stuff nailed down. I just wanted to raise early
discussion to make sure it's not entirely the wrong path and/or
totally hopeless for core.

Logical decoding output plugins would need to keep track of the
timeline and send an extra message informing the downstream of a
timeline change whenever they see a new timeline. Or include it in all
messages (see: extra overhead). Since we don't stop a decoding session
when we hit a timeline boundary and force re-connection. (Nor can we,
since at some point our restart_lsn will be on the old timeline but
the first commits will be on the new timeline). I'll need to think
about if/how the decoding plugin can reliably do that.

>> Take master A, its physical replica B, and logical decoding client X
>> streaming changes from A. B is lagging. A is at lsn 1/1000, B is only
>> at 1/500. C has replicated from A up to 1/1000, when A fails. We
>> promote B to replace A. Now C connects to B, and requests to resume at
>> LSN 1/1000.
>
> Wouldn't it be better to solve this by querying the new master's
> timeline history, and checking whether the current replay point is
> pre/post fork?

That could work.

The decoding client would need to track the last-commit timeline in
its own metadata if we're not letting it put it in the replication
origin. Manageable, if awkward.

Clients would need to know how to fetch and parse timeline history
files, which is an irritating thing for every decoding client that
wants to support failover to have to support. But I guess it's
manageable, if not friendly. And non-Pg-based downstreams would have
to do it anyway.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication origins and timelines

Andres,

* Andres Freund (and...@anarazel.de) wrote:
> On 2017-05-31 21:27:56 -0400, Stephen Frost wrote:
> > Uh, TL;DR, wow?  Why isn't this something which needs to be addressed
> > before PG10 can be released?
> 
> Huh?  Slots are't present on replicas, ergo there's no way for the whole
> issue to occur in v10.

O, ok, now that makes more sense, not sure how I missed that.

Thanks!

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] Replication origins and timelines

On 1 June 2017 at 09:27, Stephen Frost  wrote:
> Craig,
>
> * Craig Ringer (cr...@2ndquadrant.com) wrote:
>> TL;DR: replication origins track LSN without timeline. This is
>> ambiguous when physical failover is present since /
>> can now represent more than one state due to timeline forks with
>> promotions. Replication origins should track timelines so we can tell
>> the difference, I propose to patch them accordingly for pg11.
>
> Uh, TL;DR, wow?  Why isn't this something which needs to be addressed
> before PG10 can be released?  I hope I'm missing something that makes
> the current approach work in PG10, or that there's some reason that this
> isn't a big deal for PG10, but I'd like a bit of info as to why that's
> the case, if it is.

In Pg 10, if you promote a physical replica then logical replication
falls apart entirely and stops working. So there's no corruption
hazard because it just ... stops.

This only starts becoming an issue once logical replication slots can
exist on replicas and be maintained to follow the master's slot state.
Which is incomplete in Pg10 (not exposed to users) but I plan to
finish getting in for pg11, making this a possible issue to be
addressed.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication origins and timelines

Craig,

* Craig Ringer (cr...@2ndquadrant.com) wrote:
> TL;DR: replication origins track LSN without timeline. This is
> ambiguous when physical failover is present since /
> can now represent more than one state due to timeline forks with
> promotions. Replication origins should track timelines so we can tell
> the difference, I propose to patch them accordingly for pg11.

Uh, TL;DR, wow?  Why isn't this something which needs to be addressed
before PG10 can be released?  I hope I'm missing something that makes
the current approach work in PG10, or that there's some reason that this
isn't a big deal for PG10, but I'd like a bit of info as to why that's
the case, if it is.

The further comments in your email seem to state that logical
replication will just fail if a replica is promoted.  While not ideal,
that might barely reach the point of it being releasable, but turns it
into a feature that I'd have a really hard time recommending to anyone,
and are we absolutely sure that there aren't any cases where there might
be an issue of undetected promotion, leading to the complications which
you describe?

Thanks!

Stephen

signature.asc
Description: Digital signature

Re: [HACKERS] Replication origins and timelines

2017-05-31 Thread Andres Freund

Hi,

On 2017-06-01 09:12:04 +0800, Craig Ringer wrote:
> TL;DR: replication origins track LSN without timeline. This is
> ambiguous when physical failover is present since /
> can now represent more than one state due to timeline forks with
> promotions. Replication origins should track timelines so we can tell
> the difference, I propose to patch them accordingly for pg11.

I'm not quite convinced that this should be tracked at the origin level.
If you fail over physically, shouldn't we also reconfigure logical
replication?

Even if we decide this is necessary, I *strongly* suggest trying to get
the existing standby decoding etc wrapped up before starting something
nontrival afresh.

> Why?
> 
> Take master A, its physical replica B, and logical decoding client X
> streaming changes from A. B is lagging. A is at lsn 1/1000, B is only
> at 1/500. C has replicated from A up to 1/1000, when A fails. We
> promote B to replace A. Now C connects to B, and requests to resume at
> LSN 1/1000.

Wouldn't it be better to solve this by querying the new master's
timeline history, and checking whether the current replay point is
pre/post fork?

I'm more than bit doubtful that adding more overhead to every relevant
record is worth it here.

- Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Replication origins and timelines

Hi all

TL;DR: replication origins track LSN without timeline. This is
ambiguous when physical failover is present since /
can now represent more than one state due to timeline forks with
promotions. Replication origins should track timelines so we can tell
the difference, I propose to patch them accordingly for pg11.

-

When replication origins were introduced, they deliberately left out
tracking of the upstream node's timeline. Logical decoding couldn't
follow a timeline switch anyway, and replicas (still) have no facility
for logical decoding so everything completely breaks on promotion of a
physical replica.

I'm working on fixing that so that logical decoding and logical
replication integrates properly with physical replication and
failover. But when that works we'll face the same problem in logical
rep that timelines were introduced to solve for physical rep.

To prevent undetected misreplication we'll need to keep track of the
timeline of the last-replicated LSN in our downstream replication
origin. So I propose to add a timeline field to replication origins
for pg11.

Why?

Take master A, its physical replica B, and logical decoding client X
streaming changes from A. B is lagging. A is at lsn 1/1000, B is only
at 1/500. C has replicated from A up to 1/1000, when A fails. We
promote B to replace A. Now C connects to B, and requests to resume at
LSN 1/1000.

If B has since done enough work for its insert position to pass
1/1000, C will completely skip whatever B did between 1/500 and
1/1000, thinking (incorrectly) that it already replayed it. And it
will have *extra data* from A from the 1/500 to 1/1000 range that B
lost. It'll pick up from B's 1/1000 and try to apply that on top of
A's 1/1000 state, potentially leading to a mangled mess.

In physical rep this would lead to serious data corruption and
crashes. In logical rep it'll most likely lead to conflicts, apply
errors, inconsistent data, broken FKs, etc. It could be drastic, or
quite subtle, depending on app and workload.

But we really should still detect it. To do that, we need to remember
that our last replay position was (1/1000, 1) . And when we request to
start replay from 1/1000 at timeline 1 on B, it'll ERROR, telling us
that its timeline 1 ends at 1/500.

We could still *choose* to continue as if all was well, but by default
we'll detect the error.

But we can't do that unless replication origins on the downstream can
track the timeline.



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication status in logical replication

2017-05-30 Thread Peter Eisentraut

On 5/29/17 22:56, Noah Misch wrote:
> On Fri, May 19, 2017 at 11:33:48AM +0900, Masahiko Sawada wrote:
>> On Wed, Apr 12, 2017 at 5:31 AM, Simon Riggs  wrote:
>>> Looks like a bug that we should fix in PG10, with backpatch to 9.4 (or
>>> as far as it goes).
>>>
>>> Objections to commit?
>>>
>>
>> Seems we still have this issue. Any update or comment on this? Barring
>> any objections, I'll add this to the open item so it doesn't get
>> missed.
> 
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Peter,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.

I would ask Simon to go ahead with this patch if he feels comfortable
with it.

I'm disclaiming this open item, since it's an existing bug from previous
releases (and I have other open items to focus on).

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication status in logical replication

2017-05-29 Thread Noah Misch

On Fri, May 19, 2017 at 11:33:48AM +0900, Masahiko Sawada wrote:
> On Wed, Apr 12, 2017 at 5:31 AM, Simon Riggs  wrote:
> > On 22 March 2017 at 02:50, Masahiko Sawada  wrote:
> >
> >> When using logical replication, I ran into a situation where the
> >> pg_stat_replication.state is not updated until any wal record is sent
> >> after started up. For example, I set up logical replication with 2
> >> subscriber and restart the publisher server, but I see the following
> >> status for a while (maybe until autovacuum run).
> > ...
> >
> >> Attached patch fixes this behavior by updating WalSndCaughtUp before
> >> trying to read next WAL if already caught up.
> >
> > Looks like a bug that we should fix in PG10, with backpatch to 9.4 (or
> > as far as it goes).
> >
> > Objections to commit?
> >
> 
> Seems we still have this issue. Any update or comment on this? Barring
> any objections, I'll add this to the open item so it doesn't get
> missed.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication status in logical replication

2017-05-18 Thread Masahiko Sawada

On Wed, Apr 12, 2017 at 5:31 AM, Simon Riggs  wrote:
> On 22 March 2017 at 02:50, Masahiko Sawada  wrote:
>
>> When using logical replication, I ran into a situation where the
>> pg_stat_replication.state is not updated until any wal record is sent
>> after started up. For example, I set up logical replication with 2
>> subscriber and restart the publisher server, but I see the following
>> status for a while (maybe until autovacuum run).
> ...
>
>> Attached patch fixes this behavior by updating WalSndCaughtUp before
>> trying to read next WAL if already caught up.
>
> Looks like a bug that we should fix in PG10, with backpatch to 9.4 (or
> as far as it goes).
>
> Objections to commit?
>

Seems we still have this issue. Any update or comment on this? Barring
any objections, I'll add this to the open item so it doesn't get
missed.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication status in logical replication

2017-04-11 Thread Simon Riggs

On 22 March 2017 at 02:50, Masahiko Sawada  wrote:

> When using logical replication, I ran into a situation where the
> pg_stat_replication.state is not updated until any wal record is sent
> after started up. For example, I set up logical replication with 2
> subscriber and restart the publisher server, but I see the following
> status for a while (maybe until autovacuum run).
...

> Attached patch fixes this behavior by updating WalSndCaughtUp before
> trying to read next WAL if already caught up.

Looks like a bug that we should fix in PG10, with backpatch to 9.4 (or
as far as it goes).

Objections to commit?

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Replication status in logical replication

2017-03-21 Thread Masahiko Sawada

Hi all,

When using logical replication, I ran into a situation where the
pg_stat_replication.state is not updated until any wal record is sent
after started up. For example, I set up logical replication with 2
subscriber and restart the publisher server, but I see the following
status for a while (maybe until autovacuum run).

=# select application_name, state, sent_location, write_location,
flush_location, replay_location, sync_state from pg_stat_replication ;
 application_name |  state  | sent_location | write_location |
flush_location | replay_location | sync_state
--+-+---+++-+
 node1| catchup | 0/16329F8 | 0/16329F8  |
0/16329F8  | 0/16329F8   | potential
 node2| catchup | 0/16329F8 | 0/16329F8  |
0/16329F8  | 0/16329F8   | async
(2 rows)

It seems that all wal senders have caught up but
pg_stat_replication.state is still "catchup". The reason of this
behavior is that WalSndCaughtUp is updated only in WalSndWaitForWal in
logical replication during running, and in logical_read_xlog_page
always try to read next wal record (i.g. it calls
WalSndWaitForWal(targetPagePtr + reqLen)). So WalSndWaitForWal cannot
update WalSndCaughtUp until any new wal record is created after
started up and wal sender read it.

Attached patch fixes this behavior by updating WalSndCaughtUp before
trying to read next WAL if already caught up.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


logical_repl_caught_up.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-27 Thread Andres Freund

On 2017-02-27 17:00:23 -0800, Joshua D. Drake wrote:
> On 02/22/2017 02:45 PM, Tom Lane wrote:
> > Andres Freund  writes:
> > > On 2017-02-22 08:43:28 -0500, Tom Lane wrote:
> > > > (To be concrete, I'm suggesting dropping --disable-integer-datetimes
> > > > in HEAD, and just agreeing that in the back branches, use of replication
> > > > protocol with float-timestamp servers is not supported and we're not
> > > > going to bother looking for related bugs there.  Given the lack of field
> > > > complaints, I do not believe anyone cares.)
> > 
> > What I *am* willing to spend time on is removing float-timestamp code
> > in HEAD.  I've not yet heard anybody speak against doing that (or at
> > least, nothing I interpreted as a vote against it).  If I've not heard
> > any complaints by tomorrow, I'll get started on that.
> 
> Rip it out!

Already happened: 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b6aa17e0ae367afdcea07118e016111af4fa6bc3


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-27 Thread Joshua D. Drake


On 02/22/2017 02:45 PM, Tom Lane wrote:

Andres Freund  writes:

On 2017-02-22 08:43:28 -0500, Tom Lane wrote:

(To be concrete, I'm suggesting dropping --disable-integer-datetimes
in HEAD, and just agreeing that in the back branches, use of replication
protocol with float-timestamp servers is not supported and we're not
going to bother looking for related bugs there.  Given the lack of field
complaints, I do not believe anyone cares.)


What I *am* willing to spend time on is removing float-timestamp code
in HEAD.  I've not yet heard anybody speak against doing that (or at
least, nothing I interpreted as a vote against it).  If I've not heard
any complaints by tomorrow, I'll get started on that.


Rip it out!

Sincerely,

JD


--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-27 Thread Bruce Momjian

On Mon, Feb 20, 2017 at 09:07:33AM -0500, Tom Lane wrote:
> The question to be asked is whether there is still anybody out there
> using float timestamps.  I'm starting to get dubious about it myself.
> Certainly, no packager that I'm aware of has shipped a float-timestamp
> build since we switched the default in 8.4.  Maybe there is somebody
> who's faithfully built a float-timestamp custom build every year so they
> can pg_upgrade in place from their 8.3 installation, but there have got
> to be darn few such people.
> 
> As for "proper deprecation period", the documentation has called the
> option deprecated since 8.4:
> 
> -disable-integer-datetimes
>   Disable support for 64-bit integer storage for timestamps and
>   intervals, and store datetime values as floating-point numbers
>   instead. Floating-point datetime storage was the default in
>   PostgreSQL releases prior to 8.4, but it is now deprecated,
>   because it does not support microsecond precision for the full
>   range of timestamp values.
> 
> I think the strongest reason why we didn't move to kill it sooner was
> that we were not then assuming that every platform had 64-bit ints;
> but we've required that since 9.0.

I agree with removing support for float timestamps in PG 10.  It would
be tempting to think we can reuse the bits we stopped using in 9.0 for
VACUUM FULL, e.g.:

#define HEAP_MOVED_OFF  0x4000  /* moved to another place by 
pre-9.0
 * VACUUM FULL; kept for binary
 * upgrade support */
#define HEAP_MOVED_IN   0x8000  /* moved from another place by 
pre-9.0
 * VACUUM FULL; kept for binary
 * upgrade support */

However, if users built Postgres 8.4 with integer timestamps, they could
still be in the data files.  Does anyone see a fault in my logic, I
hope?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

Andres Freund  writes:
> On 2017-02-22 08:43:28 -0500, Tom Lane wrote:
>> (To be concrete, I'm suggesting dropping --disable-integer-datetimes
>> in HEAD, and just agreeing that in the back branches, use of replication
>> protocol with float-timestamp servers is not supported and we're not
>> going to bother looking for related bugs there.  Given the lack of field
>> complaints, I do not believe anyone cares.)

> I think we should just fix it in the back branches, and drop the float
> timestamp code in 10 or 11.  I.e. 1) and then 4).

I'm not personally interested in touching this issue in the back branches,
but if you want to spend time on it, I surely won't stand in your way.

What I *am* willing to spend time on is removing float-timestamp code
in HEAD.  I've not yet heard anybody speak against doing that (or at
least, nothing I interpreted as a vote against it).  If I've not heard
any complaints by tomorrow, I'll get started on that.

I envision the following work plan:

* Reject --disable-integer-datetimes in configure.  To avoid breaking
existing build scripts unnecessarily, --enable-integer-datetimes will
still be accepted.

* Convert HAVE_INT64_TIMESTAMP to a fixed, always-defined symbol.
(We shouldn't remove it entirely because that would break third-party
code that might be testing it.  Perhaps shove it in pg_config_manual.h.)

* Possibly remove the enableIntTimes field from pg_control as well
as associated logic in places like pg_controldata.  There might be some
argument for keeping this, though ... anyone have an opinion?  pg_upgrade,
at least, would need a bespoke test instead.

* Remove appropriate user documentation.

* Remove all core-code tests for HAVE_INT64_TIMESTAMP, along with the
negative sides of those #if tests.

* Change the places in the replication code that declare timestamp
variables to declare them as TimestampTz not int64, and adjust nearby
comments accordingly.  (This will just be code beautification at this
point, but it still seems like a good idea.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-22 Thread David Fetter

On Mon, Feb 20, 2017 at 11:58:12AM +0100, Petr Jelinek wrote:
> On 20/02/17 08:03, Andres Freund wrote:
> > On 2017-02-19 10:49:29 -0500, Tom Lane wrote:
> >> Robert Haas  writes:
> >>> On Sun, Feb 19, 2017 at 3:31 AM, Tom Lane  wrote:
>  Thoughts?  Should we double down on trying to make this work according
>  to the "all integer timestamps" protocol specs, or cut our losses and
>  change the specs?
> >>
> >>> I vote for doubling down.  It's bad enough that we have so many
> >>> internal details that depend on this setting; letting that cascade
> >>> into the wire protocol seems like it's just letting the chaos spread
> >>> farther and wider.
> >>
> >> How do you figure that it's not embedded in the wire protocol already?
> >> Not only the replicated data for a timestamp column, but also the
> >> client-visible binary I/O format, depend on this.  I think having some
> >> parts of the protocol use a different timestamp format than other parts
> >> is simply weird, and as this exercise has shown, it's bug-prone as all
> >> get out.
> > 
> > I don't think it's that closely tied together atm. Things like
> > pg_basebackup, pg_receivexlog etc should work, without having to match
> > timestamp storage.  Logical replication, unless your output plugin dumps
> > data in binary / "raw" output, also works just fine across the timestamp
> > divide.
> > 
> > It doesn't sound that hard to add a SystemToIntTimestamp() function,
> > given it only needs to do something if float timestamps are enabled?
> > 
> 
> It's definitely not hard, we already have
> IntegerTimestampToTimestampTz() which does the opposite conversion anyway.
> 
> That being said, I did wonder myself if we should just deprecate float
> timestamps as well.

+1 for deprecating them.  If we need a timestamp(tz) with a wider
range, we are getting options we didn't have before for implementing
it.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

Andrew Dunstan  writes:
> On 02/22/2017 10:21 AM, Jim Nasby wrote:
>> Only in the catalog though, not the datums, right? I would think you
>> could just change the oid in the catalog the same as you would for a
>> table column.

> No, in the datums.

Yeah, I don't see any way that we could fob off float timestamps to an
extension that would be completely transparent at the pg_upgrade level.
And even a partial solution would be an enormous amount of fundamentally
dead-end work.

There has never been any project policy that promises everybody will be
able to pg_upgrade until the end of time.  I think it's not unreasonable
for us to say that anyone still using float timestamps has to go through
a dump and reload to get to v10.  The original discussion about getting
rid of them was ten years ago come May; that seems long enough.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-22 Thread Andrew Dunstan



On 02/22/2017 10:21 AM, Jim Nasby wrote:
> On 2/22/17 9:12 AM, Andres Freund wrote:
>>> That would allow an in-place upgrade of
>>> a really large cluster. A user would still need to modify their code
>>> to use
>>> the new type.
>>>
>>> Put another way: add ability for pg_upgrade to change the type of a
>>> field.
>>> There might be other uses for that as well.
>> Type oids are unfortunately embedded into composite and array type data
>> - we can do such changes for columns themselves, but it doesn't work if
>> there's any array/composite members containing the to-be-changed type
>> that are used as columns.
>
> Only in the catalog though, not the datums, right? I would think you
> could just change the oid in the catalog the same as you would for a
> table column.

No, in the datums.

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-22 Thread Jim Nasby


On 2/22/17 9:12 AM, Andres Freund wrote:

That would allow an in-place upgrade of
a really large cluster. A user would still need to modify their code to use
the new type.

Put another way: add ability for pg_upgrade to change the type of a field.
There might be other uses for that as well.

Type oids are unfortunately embedded into composite and array type data
- we can do such changes for columns themselves, but it doesn't work if
there's any array/composite members containing the to-be-changed type
that are used as columns.


Only in the catalog though, not the datums, right? I would think you 
could just change the oid in the catalog the same as you would for a 
table column.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-22 Thread Andres Freund

On 2017-02-22 09:06:38 -0600, Jim Nasby wrote:
> On 2/22/17 7:56 AM, Andres Freund wrote:
> > It sounded more like Jim suggested a full blown SQL type, given that he
> > replied to my concern about the possible need for a deprecation period
> > due to pg_upgrade concerns.  To be useful for that, we'd need a good
> > chunk of magic, so all existing uses of timestamp[tz] are replaced with
> > floattimestamp[tz], duplicate some code, add implicit casts, and accept
> > that composites/arrays won't be fixed.  That sounds like a fair amount
> > of work to me, and we'd still have no way to remove the code without
> > causing pain.
> 
> Right, but I was thinking more in line with just providing the type (as an
> extension, perhaps not even in core) and making it possible for pg_upgrade
> to switch fields over to that type.

Isn't the above what you'd need to do that?


> That would allow an in-place upgrade of
> a really large cluster. A user would still need to modify their code to use
> the new type.
> 
> Put another way: add ability for pg_upgrade to change the type of a field.
> There might be other uses for that as well.

Type oids are unfortunately embedded into composite and array type data
- we can do such changes for columns themselves, but it doesn't work if
there's any array/composite members containing the to-be-changed type
that are used as columns.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-22 Thread Jim Nasby


On 2/22/17 7:56 AM, Andres Freund wrote:

On 2017-02-22 08:43:28 -0500, Tom Lane wrote:

Andres Freund  writes:

On 2017-02-22 00:10:35 -0600, Jim Nasby wrote:

I wounder if a separate "floatstamp" data type might fit the bill there. It
might not be completely seamless, but it would be binary compatible.

I don't really see what'd that solve.

Seems to me this is a different name for what I already tried in
<27694.1487456...@sss.pgh.pa.us>.  It would be much better than doing
nothing, IMO, but it would still leave lots of opportunities for mistakes.

It sounded more like Jim suggested a full blown SQL type, given that he
replied to my concern about the possible need for a deprecation period
due to pg_upgrade concerns.  To be useful for that, we'd need a good
chunk of magic, so all existing uses of timestamp[tz] are replaced with
floattimestamp[tz], duplicate some code, add implicit casts, and accept
that composites/arrays won't be fixed.  That sounds like a fair amount
of work to me, and we'd still have no way to remove the code without
causing pain.


Right, but I was thinking more in line with just providing the type (as 
an extension, perhaps not even in core) and making it possible for 
pg_upgrade to switch fields over to that type. That would allow an 
in-place upgrade of a really large cluster. A user would still need to 
modify their code to use the new type.


Put another way: add ability for pg_upgrade to change the type of a 
field. There might be other uses for that as well.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> While I'm generally not one to vote for dropping backwards-compatibility
>> features, I have to say that I find #4 the most attractive of these
>> options.  It would result in getting rid of boatloads of under-tested
>> code, whereas #2 would really just add more, and #3 at best maintains
>> the status quo complexity-wise.

> +1.

BTW, I did a bit of digging in the archives, and found a couple of
previous discussions around this:

https://www.postgresql.org/message-id/flat/1178476313.18303.164.camel%40goldbach
https://www.postgresql.org/message-id/flat/1287334597.8516.372.camel%40jdavis

as well as numerous discussions of specific bugs associated with float
timestamps, eg this isn't even the first time we've hit a float-timestamp
replication bug:

https://www.postgresql.org/message-id/flat/CAHGQGwFbqTDBrw95jek6_RvYG2%3DE-6o0HOpbeEcP6oWHJTLkUw%40mail.gmail.com

In one or another of those threads, I opined that we'd have to keep float
timestamps available for as long as we were supporting pg_upgrade from
8.3.  However, I think that that argument hasn't really stood the test of
time, because of the lack of packagers shipping builds with float
timestamps turned on.  There may be some number of installations that
still have float timestamps active, but it's got to be a tiny number.
And the list of reasons not to like float timestamps is very long.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-22 Thread Stephen Frost

Tom, all,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> While I'm generally not one to vote for dropping backwards-compatibility
> features, I have to say that I find #4 the most attractive of these
> options.  It would result in getting rid of boatloads of under-tested
> code, whereas #2 would really just add more, and #3 at best maintains
> the status quo complexity-wise.

+1.

Thanks!

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-22 Thread Andres Freund

On 2017-02-22 08:43:28 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-02-22 00:10:35 -0600, Jim Nasby wrote:
> >> I wounder if a separate "floatstamp" data type might fit the bill there. It
> >> might not be completely seamless, but it would be binary compatible.
> 
> > I don't really see what'd that solve.
> 
> Seems to me this is a different name for what I already tried in
> <27694.1487456...@sss.pgh.pa.us>.  It would be much better than doing
> nothing, IMO, but it would still leave lots of opportunities for mistakes.

It sounded more like Jim suggested a full blown SQL type, given that he
replied to my concern about the possible need for a deprecation period
due to pg_upgrade concerns.  To be useful for that, we'd need a good
chunk of magic, so all existing uses of timestamp[tz] are replaced with
floattimestamp[tz], duplicate some code, add implicit casts, and accept
that composites/arrays won't be fixed.  That sounds like a fair amount
of work to me, and we'd still have no way to remove the code without
causing pain.

> 1. Just patch the already-identified float-vs-int-timestamp bugs in as
> localized a fashion as possible, and hope that there aren't any more and
> that we never introduce any more.  I find this hope foolish :-(,
> especially in view of the fact that what started this discussion is a
> newly-introduced bug of this ilk.

Well, the newly introduced bug was just a copy of the existing code.
I'm far less negative about this than you - we're not dealing with a
whole lot of code here. I don't get why it'd be foolish to verify the
limited amount of code dealing with replication protocol timestamp.

> 4. Give up on float timestamps altogether.
> 
> While I'm generally not one to vote for dropping backwards-compatibility
> features, I have to say that I find #4 the most attractive of these
> options.  It would result in getting rid of boatloads of under-tested
> code, whereas #2 would really just add more, and #3 at best maintains
> the status quo complexity-wise.

> (To be concrete, I'm suggesting dropping --disable-integer-datetimes
> in HEAD, and just agreeing that in the back branches, use of replication
> protocol with float-timestamp servers is not supported and we're not
> going to bother looking for related bugs there.  Given the lack of field
> complaints, I do not believe anyone cares.)

I think we should just fix it in the back branches, and drop the float
timestamp code in 10 or 11.  I.e. 1) and then 4).

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

Andres Freund  writes:
> On 2017-02-22 00:10:35 -0600, Jim Nasby wrote:
>> I wounder if a separate "floatstamp" data type might fit the bill there. It
>> might not be completely seamless, but it would be binary compatible.

> I don't really see what'd that solve.

Seems to me this is a different name for what I already tried in
<27694.1487456...@sss.pgh.pa.us>.  It would be much better than doing
nothing, IMO, but it would still leave lots of opportunities for mistakes.

To review the bidding a bit, it seems to me we've got four possible ways
of dealing with this issue:

1. Just patch the already-identified float-vs-int-timestamp bugs in as
localized a fashion as possible, and hope that there aren't any more and
that we never introduce any more.  I find this hope foolish :-(,
especially in view of the fact that what started this discussion is a
newly-introduced bug of this ilk.

2. Introduce some new notation, perhaps <27694.1487456...@sss.pgh.pa.us>
plus new "sendtimestamp" and "recvtimestamp" functions, to try to provide
some compiler-assisted support for getting it right.

3. Give up on "protocol timestamps are always integer style".

4. Give up on float timestamps altogether.

While I'm generally not one to vote for dropping backwards-compatibility
features, I have to say that I find #4 the most attractive of these
options.  It would result in getting rid of boatloads of under-tested
code, whereas #2 would really just add more, and #3 at best maintains
the status quo complexity-wise.

I think it was clear from the day we switched to integer timestamps as the
default that the days of float timestamps were numbered, and that we were
only going to continue to support them as long as it wasn't costing a lot
of effort.  We have now reached a point at which it's clear that
continuing to support them will have direct and significant costs.
I say it's time to pull the plug.

(To be concrete, I'm suggesting dropping --disable-integer-datetimes
in HEAD, and just agreeing that in the back branches, use of replication
protocol with float-timestamp servers is not supported and we're not
going to bother looking for related bugs there.  Given the lack of field
complaints, I do not believe anyone cares.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-21 Thread Andres Freund

On 2017-02-22 00:10:35 -0600, Jim Nasby wrote:
> On 2/20/17 5:04 AM, Andres Freund wrote:
> > On 2017-02-20 11:58:12 +0100, Petr Jelinek wrote:
> > > That being said, I did wonder myself if we should just deprecate float
> > > timestamps as well.
> > 
> > I think we need a proper deprecation period for that, given that the
> > conversion away will be painful for pg_upgrade using people with big
> > clusters.  So I think we should fix this regardless... :(
> 
> I wounder if a separate "floatstamp" data type might fit the bill there. It
> might not be completely seamless, but it would be binary compatible.

I don't really see what'd that solve.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-21 Thread Jim Nasby


On 2/20/17 5:04 AM, Andres Freund wrote:

On 2017-02-20 11:58:12 +0100, Petr Jelinek wrote:

That being said, I did wonder myself if we should just deprecate float
timestamps as well.


I think we need a proper deprecation period for that, given that the
conversion away will be painful for pg_upgrade using people with big
clusters.  So I think we should fix this regardless... :(


I wounder if a separate "floatstamp" data type might fit the bill there. 
It might not be completely seamless, but it would be binary compatible.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-21 Thread Jim Nasby


On 2/21/17 4:52 PM, James Cloos wrote:

"TL" == Tom Lane  writes:

TL> The question to be asked is whether there is still anybody out there
TL> using float timestamps.

Gentoo's ebuild includes:

   $(use_enable !pg_legacytimestamp integer-datetimes) \


FWIW, last time I looked it was also an option in FreeBSD's ports, 
though I think it's defaulted to int since forever ago (like, 7.4 era).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-21 Thread James Cloos

> "TL" == Tom Lane  writes:

TL> The question to be asked is whether there is still anybody out there
TL> using float timestamps.

Gentoo's ebuild includes:

   $(use_enable !pg_legacytimestamp integer-datetimes) \

meaning that by default --enable-integer-datetimes is passed to configure,
but if the pg_legacytimestamp use flag is set, then --disable-integer-datetimes
is passed instead.

They document it as:


Use double precision floating-point numbers instead of 64-bit
integers for timestamp storage.


Ie, w/o any kind of deprecation notice.

I don't know how many (how few?) add pg_legacytimestamp to USE when
merging postgresql.  But it is still available as of 9.6 and also
with their live build of git://git.postgresql.org/git/postgresql.git.

-JimC
-- 
James Cloos  OpenPGP: 0x997A9F17ED7DAEA6


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-20 Thread Tom Lane

Robert Haas  writes:
> On Mon, Feb 20, 2017 at 7:37 PM, Tom Lane  wrote:
>> The question to be asked is whether there is still anybody out there
>> using float timestamps.  I'm starting to get dubious about it myself.

> I'm wondering if it has any effect that pg_config.h.win32 says

> /* Define to 1 if you want 64-bit integer timestamp and interval support.
>(--enable-integer-datetimes) */
> /* #undef USE_INTEGER_DATETIMES */

> Whereas pg_config.h.win32 says:

> /* Define to 1 if you want 64-bit integer timestamp and interval support.
>(--enable-integer-datetimes) */
> #define USE_INTEGER_DATETIMES 1

Er, what?  For me, grep finds

src/include/pg_config.h.in: 836: #undef USE_INTEGER_DATETIMES
src/include/pg_config.h.win32: 630: /* #undef USE_INTEGER_DATETIMES */

> It looks like it was commit 2169e42bef9db7e0bdd1bea00b81f44973ad83c8
> that enabled integer datetimes by default, but that commit seems to
> not to have touched the Windows build scripts.  Commit
> fcf053d7829f2d83829256153e856f9a36c83ffd changed MSVC over to use
> integer datetimes by default, but I'm not clear if there's any build
> environment where we rely on config.h.win32 but not Solution.pm?

Any such build would find itself without a defined value of BLCKSZ,
to mention just one of the settings that get appended by Solution.pm.

It does look like we expect pg_config.h.win32 to be usable standalone
for libpq-only Windows compiles, but it would never work for building
the backend.  (I dunno whether the libpq-only scripts still work at
all or have bitrotted, but it's irrelevant for the question at hand.)

> If not, what exactly is pg_config.h.win32 for and to what degree does it
> need to be in sync with pg_config.h.in?  The list of differences
> appears to be far more extensive than the header comment at the top of
> pg_config.h.win32 would lead one to believe.

Yeah, I think the maintenance of pg_config.h.win32 has been a bit more
haphazard than one could wish.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-20 Thread Robert Haas

On Mon, Feb 20, 2017 at 7:37 PM, Tom Lane  wrote:
> The question to be asked is whether there is still anybody out there
> using float timestamps.  I'm starting to get dubious about it myself.
> Certainly, no packager that I'm aware of has shipped a float-timestamp
> build since we switched the default in 8.4.  Maybe there is somebody
> who's faithfully built a float-timestamp custom build every year so they
> can pg_upgrade in place from their 8.3 installation, but there have got
> to be darn few such people.

I'm wondering if it has any effect that pg_config.h.win32 says

/* Define to 1 if you want 64-bit integer timestamp and interval support.
   (--enable-integer-datetimes) */
/* #undef USE_INTEGER_DATETIMES */

Whereas pg_config.h.win32 says:

/* Define to 1 if you want 64-bit integer timestamp and interval support.
   (--enable-integer-datetimes) */
#define USE_INTEGER_DATETIMES 1

It looks like it was commit 2169e42bef9db7e0bdd1bea00b81f44973ad83c8
that enabled integer datetimes by default, but that commit seems to
not to have touched the Windows build scripts.  Commit
fcf053d7829f2d83829256153e856f9a36c83ffd changed MSVC over to use
integer datetimes by default, but I'm not clear if there's any build
environment where we rely on config.h.win32 but not Solution.pm?  If
not, what exactly is pg_config.h.win32 for and to what degree does it
need to be in sync with pg_config.h.in?  The list of differences
appears to be far more extensive than the header comment at the top of
pg_config.h.win32 would lead one to believe.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-20 Thread Tom Lane

Petr Jelinek  writes:
> It's definitely not hard, we already have
> IntegerTimestampToTimestampTz() which does the opposite conversion anyway.

It's not the functions that are hard, it's making sure that you have used
them in the correct places, and declared relevant variables with the
appropriate types.  Evidence on the ground is that that is very hard;
I have little faith that we'll get it right without compiler support.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-20 Thread Tom Lane

Andres Freund  writes:
> On 2017-02-20 11:58:12 +0100, Petr Jelinek wrote:
>> That being said, I did wonder myself if we should just deprecate float
>> timestamps as well.

> I think we need a proper deprecation period for that, given that the
> conversion away will be painful for pg_upgrade using people with big
> clusters.  So I think we should fix this regardless... :(

The question to be asked is whether there is still anybody out there
using float timestamps.  I'm starting to get dubious about it myself.
Certainly, no packager that I'm aware of has shipped a float-timestamp
build since we switched the default in 8.4.  Maybe there is somebody
who's faithfully built a float-timestamp custom build every year so they
can pg_upgrade in place from their 8.3 installation, but there have got
to be darn few such people.

As for "proper deprecation period", the documentation has called the
option deprecated since 8.4:

-disable-integer-datetimes
Disable support for 64-bit integer storage for timestamps and
intervals, and store datetime values as floating-point numbers
instead. Floating-point datetime storage was the default in
PostgreSQL releases prior to 8.4, but it is now deprecated,
because it does not support microsecond precision for the full
range of timestamp values.

I think the strongest reason why we didn't move to kill it sooner was
that we were not then assuming that every platform had 64-bit ints;
but we've required that since 9.0.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-20 Thread Petr Jelinek

On 20/02/17 12:04, Andres Freund wrote:
> On 2017-02-20 11:58:12 +0100, Petr Jelinek wrote:
>> That being said, I did wonder myself if we should just deprecate float
>> timestamps as well.
> 
> I think we need a proper deprecation period for that, given that the
> conversion away will be painful for pg_upgrade using people with big
> clusters.  So I think we should fix this regardless... :(
> 

That's a good point.

Attached should fix the logical replication problems. I am not quite
sure if there is anything in physical that needs changing.

I opted for GetCurrentIntegerTimestamp() in the reply code as that's the
same coding walreceiver uses.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 142cd99..dd5bdcc 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -46,7 +46,7 @@ logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn)
 
 	/* fixed fields */
 	pq_sendint64(out, txn->final_lsn);
-	pq_sendint64(out, txn->commit_time);
+	pq_sendint64(out, TimestampTzToIntegerTimestamp(txn->commit_time));
 	pq_sendint(out, txn->xid, 4);
 }
 
@@ -60,7 +60,7 @@ logicalrep_read_begin(StringInfo in, LogicalRepBeginData *begin_data)
 	begin_data->final_lsn = pq_getmsgint64(in);
 	if (begin_data->final_lsn == InvalidXLogRecPtr)
 		elog(ERROR, "final_lsn not set in begin message");
-	begin_data->committime = pq_getmsgint64(in);
+	begin_data->committime = IntegerTimestampToTimestampTz(pq_getmsgint64(in));
 	begin_data->xid = pq_getmsgint(in, 4);
 }
 
@@ -82,7 +82,7 @@ logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
 	/* send fields */
 	pq_sendint64(out, commit_lsn);
 	pq_sendint64(out, txn->end_lsn);
-	pq_sendint64(out, txn->commit_time);
+	pq_sendint64(out, TimestampTzToIntegerTimestamp(txn->commit_time));
 }
 
 /*
@@ -100,7 +100,7 @@ logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
 	/* read fields */
 	commit_data->commit_lsn = pq_getmsgint64(in);
 	commit_data->end_lsn = pq_getmsgint64(in);
-	commit_data->committime = pq_getmsgint64(in);
+	commit_data->committime = IntegerTimestampToTimestampTz(pq_getmsgint64(in));
 }
 
 /*
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 0b19fec..225ea4c 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1183,7 +1183,7 @@ send_feedback(XLogRecPtr recvpos, bool force, bool requestReply)
 	pq_sendint64(reply_message, recvpos);		/* write */
 	pq_sendint64(reply_message, flushpos);		/* flush */
 	pq_sendint64(reply_message, writepos);		/* apply */
-	pq_sendint64(reply_message, now);			/* sendTime */
+	pq_sendint64(reply_message, GetCurrentIntegerTimestamp());	/* sendTime */
 	pq_sendbyte(reply_message, requestReply);	/* replyRequested */
 
 	elog(DEBUG2, "sending feedback (force %d) to recv %X/%X, write %X/%X, flush %X/%X",
diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 9b4c012..1dd469d 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -1749,6 +1749,20 @@ IntegerTimestampToTimestampTz(int64 timestamp)
 #endif
 
 /*
+ * TimestampTzToIntegerTimestamp -- convert a native format timestamp to int64
+ *
+ * When compiled with --enable-integer-datetimes, this is implemented as a
+ * no-op macro.
+ */
+#ifndef HAVE_INT64_TIMESTAMP
+int64
+TimestampTzToIntegerTimestamp(TimestampTz timestamp)
+{
+	return timestamp * USECS_PER_SEC;
+}
+#endif
+
+/*
  * GetSQLCurrentTimestamp -- implements CURRENT_TIMESTAMP, CURRENT_TIMESTAMP(n)
  */
 TimestampTz
diff --git a/src/include/utils/timestamp.h b/src/include/utils/timestamp.h
index 21651b1..765fa81 100644
--- a/src/include/utils/timestamp.h
+++ b/src/include/utils/timestamp.h
@@ -108,9 +108,11 @@ extern bool TimestampDifferenceExceeds(TimestampTz start_time,
 #ifndef HAVE_INT64_TIMESTAMP
 extern int64 GetCurrentIntegerTimestamp(void);
 extern TimestampTz IntegerTimestampToTimestampTz(int64 timestamp);
+extern int64 TimestampTzToIntegerTimestamp(TimestampTz timestamp);
 #else
 #define GetCurrentIntegerTimestamp()	GetCurrentTimestamp()
 #define IntegerTimestampToTimestampTz(timestamp) (timestamp)
+#define TimestampTzToIntegerTimestamp(timestamp) (timestamp)
 #endif
 
 extern TimestampTz time_t_to_timestamptz(pg_time_t tm);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-20 Thread Andres Freund

On 2017-02-20 11:58:12 +0100, Petr Jelinek wrote:
> That being said, I did wonder myself if we should just deprecate float
> timestamps as well.

I think we need a proper deprecation period for that, given that the
conversion away will be painful for pg_upgrade using people with big
clusters.  So I think we should fix this regardless... :(

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-20 Thread Petr Jelinek

On 20/02/17 08:03, Andres Freund wrote:
> On 2017-02-19 10:49:29 -0500, Tom Lane wrote:
>> Robert Haas  writes:
>>> On Sun, Feb 19, 2017 at 3:31 AM, Tom Lane  wrote:
 Thoughts?  Should we double down on trying to make this work according
 to the "all integer timestamps" protocol specs, or cut our losses and
 change the specs?
>>
>>> I vote for doubling down.  It's bad enough that we have so many
>>> internal details that depend on this setting; letting that cascade
>>> into the wire protocol seems like it's just letting the chaos spread
>>> farther and wider.
>>
>> How do you figure that it's not embedded in the wire protocol already?
>> Not only the replicated data for a timestamp column, but also the
>> client-visible binary I/O format, depend on this.  I think having some
>> parts of the protocol use a different timestamp format than other parts
>> is simply weird, and as this exercise has shown, it's bug-prone as all
>> get out.
> 
> I don't think it's that closely tied together atm. Things like
> pg_basebackup, pg_receivexlog etc should work, without having to match
> timestamp storage.  Logical replication, unless your output plugin dumps
> data in binary / "raw" output, also works just fine across the timestamp
> divide.
> 
> It doesn't sound that hard to add a SystemToIntTimestamp() function,
> given it only needs to do something if float timestamps are enabled?
> 

It's definitely not hard, we already have
IntegerTimestampToTimestampTz() which does the opposite conversion anyway.

That being said, I did wonder myself if we should just deprecate float
timestamps as well.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-19 Thread Andres Freund

On 2017-02-19 10:49:29 -0500, Tom Lane wrote:
> Robert Haas  writes:
> > On Sun, Feb 19, 2017 at 3:31 AM, Tom Lane  wrote:
> >> Thoughts?  Should we double down on trying to make this work according
> >> to the "all integer timestamps" protocol specs, or cut our losses and
> >> change the specs?
> 
> > I vote for doubling down.  It's bad enough that we have so many
> > internal details that depend on this setting; letting that cascade
> > into the wire protocol seems like it's just letting the chaos spread
> > farther and wider.
> 
> How do you figure that it's not embedded in the wire protocol already?
> Not only the replicated data for a timestamp column, but also the
> client-visible binary I/O format, depend on this.  I think having some
> parts of the protocol use a different timestamp format than other parts
> is simply weird, and as this exercise has shown, it's bug-prone as all
> get out.

I don't think it's that closely tied together atm. Things like
pg_basebackup, pg_receivexlog etc should work, without having to match
timestamp storage.  Logical replication, unless your output plugin dumps
data in binary / "raw" output, also works just fine across the timestamp
divide.

It doesn't sound that hard to add a SystemToIntTimestamp() function,
given it only needs to do something if float timestamps are enabled?

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-19 Thread Robert Haas

On Sun, Feb 19, 2017 at 9:19 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Sun, Feb 19, 2017 at 3:31 AM, Tom Lane  wrote:
>>> Thoughts?  Should we double down on trying to make this work according
>>> to the "all integer timestamps" protocol specs, or cut our losses and
>>> change the specs?
>
>> I vote for doubling down.  It's bad enough that we have so many
>> internal details that depend on this setting; letting that cascade
>> into the wire protocol seems like it's just letting the chaos spread
>> farther and wider.
>
> How do you figure that it's not embedded in the wire protocol already?
> Not only the replicated data for a timestamp column, but also the
> client-visible binary I/O format, depend on this.  I think having some
> parts of the protocol use a different timestamp format than other parts
> is simply weird, and as this exercise has shown, it's bug-prone as all
> get out.

You've got a point, but if we don't make the replication protocol
consistently use integers, then every client that cares about those
fields has to be able to handle either format, or at least detect that
it got a different format than it was expecting and fail cleanly.
How's that going to work?

Also, I feel like there's some difference between the data stored in a
cluster and the metadata which describes the cluster.  Making the
interpretation of the metadata depend on the cluster configuration
feels circular, like using non-ASCII characters in the name of an
encoding.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-19 Thread Tom Lane

Robert Haas  writes:
> On Sun, Feb 19, 2017 at 3:31 AM, Tom Lane  wrote:
>> Thoughts?  Should we double down on trying to make this work according
>> to the "all integer timestamps" protocol specs, or cut our losses and
>> change the specs?

> I vote for doubling down.  It's bad enough that we have so many
> internal details that depend on this setting; letting that cascade
> into the wire protocol seems like it's just letting the chaos spread
> farther and wider.

How do you figure that it's not embedded in the wire protocol already?
Not only the replicated data for a timestamp column, but also the
client-visible binary I/O format, depend on this.  I think having some
parts of the protocol use a different timestamp format than other parts
is simply weird, and as this exercise has shown, it's bug-prone as all
get out.

> Also, I wonder if we could consider deprecating and removing
> --disable-integer-datetimes at some point.

Seems like a different conversation.  Although given the lack of
replication bug reports so far, maybe nobody is using
--disable-integer-datetimes anymore.

Certainly, fixing these bugs by removing the --disable-integer-datetimes
option would be a lot less painful than trying to make it actually work
per protocol spec.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-19 Thread Robert Haas

On Sun, Feb 19, 2017 at 3:31 AM, Tom Lane  wrote:
> Thoughts?  Should we double down on trying to make this work according
> to the "all integer timestamps" protocol specs, or cut our losses and
> change the specs?

I vote for doubling down.  It's bad enough that we have so many
internal details that depend on this setting; letting that cascade
into the wire protocol seems like it's just letting the chaos spread
farther and wider.

Also, I wonder if we could consider deprecating and removing
--disable-integer-datetimes at some point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Replication vs. float timestamps is a disaster

2017-02-18 Thread Tom Lane

Both the streaming replication and logical replication areas of the code
are, approximately, utterly broken when !HAVE_INT64_TIMESTAMPS.  (The fact
that "make check-world" passes anyway is an indictment of the quality of
the regression tests.)

I started poking around in this area after Thomas Munro reported that
pg_recvlogical.c no longer even compiles with --disable-integer-datetimes,
but the problems go much deeper than that.

replication/logical/worker.c's send_feedback() is sending a backend
timestamp directly as sendTime.  This is wrong per the protocol spec,
which says the field is an integer timestamp.  I'm not sure if it's
OK to just replace

-pq_sendint64(reply_message, now);/* sendTime */
+pq_sendint64(reply_message, GetCurrentIntegerTimestamp());  /* sendTime */

... is it important for the sent timestamp to exactly match the
timestamp that was used earlier in the function for deciding
whether to send or not?  Should we instead try to convert the
earlier logic to use integer timestamps?

As for logical replication, logicalrep_write_begin and
logicalrep_write_commit are likewise sending TimestampTz values using
pq_sendint64, and this is much harder to patch locally since the values
were taken from WAL records that contain TimestampTz.  Although the sent
values are nonsensical according to the protocol spec,
logicalrep_read_begin and logicalrep_read_commit read them with
pq_getmsgint64 and assign the results back to TimestampTz, which means
that things sort of appear to work as long as you don't mind losing
all sub-second precision in the timestamps.  (But a decoder that was
written to spec would think the values are garbage.)

I'm inclined to think that at least for logical replication, we're going
to have to give up the protocol spec wording that says that timestamps
are integers, and instead admit that they are dependent on
disable-integer-datetimes.  Maybe we should do the same in the streaming
replication protocol, because this area is going to be a very fertile
source of bugs for the foreseeable future if we don't.  It's not like
replication between enable-integer-datetimes and disable-integer-datetimes
hosts has any chance of being useful anyway.

Thoughts?  Should we double down on trying to make this work according
to the "all integer timestamps" protocol specs, or cut our losses and
change the specs?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down

2017-01-26 Thread Simon Riggs

On 13 January 2017 at 10:17, Ants Aasma  wrote:
> On 5 Jan 2017 2:54 a.m., "Craig Ringer"  wrote:
>> Ants, do you think you'll have a chance to convert your shell script
>> test into a TAP test in src/test/recovery?
>>
>> Simon has said he would like to commit this fix. I'd personally be
>> happier if it had a test to go with it.
>>
>> You could probably just add to src/test/recover/t/001 which now
>> contains my additions for hot standby.
>
> Do you feel the test in the attached patch is enough or would you like
> to see anything else covered?

That looks good, thanks for the patch.

Applying in next few hours, barring objections; then backpatching code
(without tests).

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-14 Thread Magnus Hagander

On Wed, Jan 11, 2017 at 2:09 AM, Michael Paquier 
wrote:

> On Wed, Jan 11, 2017 at 10:06 AM, David Steele 
> wrote:
> > On 1/10/17 3:06 PM, Stephen Frost wrote:
> >> * Magnus Hagander (mag...@hagander.net) wrote:
> >>> On Tue, Jan 10, 2017 at 8:03 PM, Robert Haas 
> wrote:
> >
>  I may be outvoted, but I'm still not in favor of changing the default
>  wal_level.  That caters only to people who lack sufficient foresight
>  to know that they need a replica before the system becomes so critical
>  that they can't bounce it to update the configuration.
> >>>
> >>> True. But the current level only caters to those people who run large
> ETL
> >>> jobs without doing any tuning on their system (at least none that would
> >>> require a restart), or another one of the fairly specific workloads.
> >>>
> >>> And as I keep re-iterating, it's not just about replicas, it's also
> about
> >>> the ability to make proper backups. Which is a pretty fundamental
> feature.
> >>>
> >>> I do think you are outvoted, yes :) At least that's the result of my
> >>> tallying up the people who have spoken out on the thread.
> >>
> >> I tend to agree with Magnus on this, being able to perform an online
> >> backup is pretty darn important.
> >
> > Agreed and +1.
>
> +1'ing.
>

I've pushed this.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down

2017-01-13 Thread Ants Aasma

On 5 Jan 2017 2:54 a.m., "Craig Ringer"  wrote:
> Ants, do you think you'll have a chance to convert your shell script
> test into a TAP test in src/test/recovery?
>
> Simon has said he would like to commit this fix. I'd personally be
> happier if it had a test to go with it.
>
> You could probably just add to src/test/recover/t/001 which now
> contains my additions for hot standby.

Do you feel the test in the attached patch is enough or would you like
to see anything else covered?

Regards,
Ants Aasma
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index c6b54ec..0ab936f 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1157,7 +1157,10 @@ XLogWalRcvSendReply(bool force, bool requestReply)
  * in case they don't have a watch.
  *
  * If the user disables feedback, send one final message to tell sender
- * to forget about the xmin on this standby.
+ * to forget about the xmin on this standby. We also send this message
+ * on first connect because a previous connection might have set xmin
+ * on a replication slot. (If we're not using a slot it's harmless to
+ * send a feedback message explicitly setting InvalidTransactionId).
  */
 static void
 XLogWalRcvSendHSFeedback(bool immed)
@@ -1167,7 +1170,8 @@ XLogWalRcvSendHSFeedback(bool immed)
 	uint32		nextEpoch;
 	TransactionId xmin;
 	static TimestampTz sendTime = 0;
-	static bool master_has_standby_xmin = false;
+	/* initially true so we always send at least one feedback message */
+	static bool master_has_standby_xmin = true;
 
 	/*
 	 * If the user doesn't want status to be reported to the master, be sure
@@ -1192,14 +1196,17 @@ XLogWalRcvSendHSFeedback(bool immed)
 	}
 
 	/*
-	 * If Hot Standby is not yet active there is nothing to send. Check this
-	 * after the interval has expired to reduce number of calls.
+	 * If Hot Standby is not yet accepting connections there is nothing to
+	 * send. Check this after the interval has expired to reduce number of
+	 * calls.
+	 *
+	 * Bailing out here also ensures that we don't send feedback until we've
+	 * read our own replication slot state, so we don't tell the master to
+	 * discard needed xmin or catalog_xmin from any slots that may exist
+	 * on this replica.
 	 */
 	if (!HotStandbyActive())
-	{
-		Assert(!master_has_standby_xmin);
 		return;
-	}
 
 	/*
 	 * Make the expensive call to get the oldest xmin once we are certain
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index eef512d..7fb2e9e 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 22;
+use Test::More tests => 24;
 
 # Initialize master node
 my $node_master = get_new_node('master');
@@ -161,3 +161,22 @@ is($catalog_xmin, '', 'non-cascaded slot xmin still null with hs_feedback reset'
 ($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2);
 is($xmin, '', 'cascaded slot xmin null with hs feedback reset');
 is($catalog_xmin, '', 'cascaded slot xmin still null with hs_feedback reset');
+
+diag "re-enabling hot_standby_feedback and disabling while stopped";
+$node_standby_2->safe_psql('postgres', 'ALTER SYSTEM SET hot_standby_feedback = on;');
+$node_standby_2->reload;
+
+$node_master->safe_psql('postgres', qq[INSERT INTO tab_int VALUES (11000);]);
+replay_check();
+
+$node_standby_2->safe_psql('postgres', 'ALTER SYSTEM SET hot_standby_feedback = off;');
+$node_standby_2->stop;
+
+($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2);
+isnt($xmin, '', 'cascaded slot xmin non-null with postgres shut down');
+
+# Xmin from a previous run should be cleared on startup.
+$node_standby_2->start;
+
+($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2);
+is($xmin, '', 'cascaded slot xmin reset after startup with hs feedback reset');

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-10 Thread Michael Paquier

On Wed, Jan 11, 2017 at 10:06 AM, David Steele  wrote:
> On 1/10/17 3:06 PM, Stephen Frost wrote:
>> * Magnus Hagander (mag...@hagander.net) wrote:
>>> On Tue, Jan 10, 2017 at 8:03 PM, Robert Haas  wrote:
>
 I may be outvoted, but I'm still not in favor of changing the default
 wal_level.  That caters only to people who lack sufficient foresight
 to know that they need a replica before the system becomes so critical
 that they can't bounce it to update the configuration.
>>>
>>> True. But the current level only caters to those people who run large ETL
>>> jobs without doing any tuning on their system (at least none that would
>>> require a restart), or another one of the fairly specific workloads.
>>>
>>> And as I keep re-iterating, it's not just about replicas, it's also about
>>> the ability to make proper backups. Which is a pretty fundamental feature.
>>>
>>> I do think you are outvoted, yes :) At least that's the result of my
>>> tallying up the people who have spoken out on the thread.
>>
>> I tend to agree with Magnus on this, being able to perform an online
>> backup is pretty darn important.
>
> Agreed and +1.

+1'ing.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-10 Thread David Steele

On 1/10/17 3:06 PM, Stephen Frost wrote:

> * Magnus Hagander (mag...@hagander.net) wrote:
>> On Tue, Jan 10, 2017 at 8:03 PM, Robert Haas  wrote:

>>> I may be outvoted, but I'm still not in favor of changing the default
>>> wal_level.  That caters only to people who lack sufficient foresight
>>> to know that they need a replica before the system becomes so critical
>>> that they can't bounce it to update the configuration.
>>
>> True. But the current level only caters to those people who run large ETL
>> jobs without doing any tuning on their system (at least none that would
>> require a restart), or another one of the fairly specific workloads.
>>
>> And as I keep re-iterating, it's not just about replicas, it's also about
>> the ability to make proper backups. Which is a pretty fundamental feature.
>>
>> I do think you are outvoted, yes :) At least that's the result of my
>> tallying up the people who have spoken out on the thread.
> 
> I tend to agree with Magnus on this, being able to perform an online
> backup is pretty darn important.

Agreed and +1.

-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature

Re: [HACKERS] Replication/backup defaults

2017-01-10 Thread Stephen Frost

Greetings,

* Magnus Hagander (mag...@hagander.net) wrote:
> On Tue, Jan 10, 2017 at 8:03 PM, Robert Haas  wrote:
> 
> > On Mon, Jan 9, 2017 at 11:02 AM, Peter Eisentraut
> >  wrote:
> > > On 1/9/17 7:44 AM, Magnus Hagander wrote:
> > >> So based on that, I suggest we go ahead and make the change to make both
> > >> the values 10 by default. And that we do that now, because that lets us
> > >> get it out through more testing on different platforms, so that we catch
> > >> issues earlier on if they do arise.
> > >
> > > Sounds good.
> >
> > I may be outvoted, but I'm still not in favor of changing the default
> > wal_level.  That caters only to people who lack sufficient foresight
> > to know that they need a replica before the system becomes so critical
> > that they can't bounce it to update the configuration.
> 
> True. But the current level only caters to those people who run large ETL
> jobs without doing any tuning on their system (at least none that would
> require a restart), or another one of the fairly specific workloads.
> 
> And as I keep re-iterating, it's not just about replicas, it's also about
> the ability to make proper backups. Which is a pretty fundamental feature.
> 
> I do think you are outvoted, yes :) At least that's the result of my
> tallying up the people who have spoken out on the thread.

I tend to agree with Magnus on this, being able to perform an online
backup is pretty darn important.

Thanks!

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] Replication/backup defaults

2017-01-10 Thread Magnus Hagander

On Tue, Jan 10, 2017 at 8:03 PM, Robert Haas  wrote:

> On Mon, Jan 9, 2017 at 11:02 AM, Peter Eisentraut
>  wrote:
> > On 1/9/17 7:44 AM, Magnus Hagander wrote:
> >> So based on that, I suggest we go ahead and make the change to make both
> >> the values 10 by default. And that we do that now, because that lets us
> >> get it out through more testing on different platforms, so that we catch
> >> issues earlier on if they do arise.
> >
> > Sounds good.
>
> I may be outvoted, but I'm still not in favor of changing the default
> wal_level.  That caters only to people who lack sufficient foresight
> to know that they need a replica before the system becomes so critical
> that they can't bounce it to update the configuration.
>

True. But the current level only caters to those people who run large ETL
jobs without doing any tuning on their system (at least none that would
require a restart), or another one of the fairly specific workloads.

And as I keep re-iterating, it's not just about replicas, it's also about
the ability to make proper backups. Which is a pretty fundamental feature.

I do think you are outvoted, yes :) At least that's the result of my
tallying up the people who have spoken out on the thread.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] Replication/backup defaults

2017-01-10 Thread Robert Haas

On Mon, Jan 9, 2017 at 11:02 AM, Peter Eisentraut
 wrote:
> On 1/9/17 7:44 AM, Magnus Hagander wrote:
>> So based on that, I suggest we go ahead and make the change to make both
>> the values 10 by default. And that we do that now, because that lets us
>> get it out through more testing on different platforms, so that we catch
>> issues earlier on if they do arise.
>
> Sounds good.

I may be outvoted, but I'm still not in favor of changing the default
wal_level.  That caters only to people who lack sufficient foresight
to know that they need a replica before the system becomes so critical
that they can't bounce it to update the configuration.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-09 Thread Peter Eisentraut

On 1/9/17 7:44 AM, Magnus Hagander wrote:
> So based on that, I suggest we go ahead and make the change to make both
> the values 10 by default. And that we do that now, because that lets us
> get it out through more testing on different platforms, so that we catch
> issues earlier on if they do arise.

Sounds good.

> It would be interesting to find out why it's limited as well, of course,
> but I don't think we need to wait for that. 

After some testing and searching for documentation, it seems that at
least the BSD platforms have a very low default semmns setting
(apparently 60, which leads to max_connections=30).

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-09 Thread Magnus Hagander

On Sun, Jan 8, 2017 at 2:19 AM, Jim Nasby  wrote:

> On 1/5/17 2:50 PM, Tomas Vondra wrote:
>
>> Ultimately, the question is whether the number of people running into
>> "Hey, I can't take pg_basebackup or setup a standby with the default
>> config!" is higher or lower than number of people running into "Hey,
>> CREATE TABLE + COPY is slower now!"
>>
>
> I'm betting it's way higher. Loads of folks use Postgres and never do any
> kind of ETL.


I'm willing to say "the majority".



> That is not to say there are no other cases benefiting from those
>> optimizations, but we're talking about the default value - we're not
>> removing the wal_level=minimal.
>>
>
> This would be a non-issue if we provided example configs for a few
> different workloads. Obviously those would never be optimal either, but
> they *would* show users what settings they should immediately look at
> changing in their environment.


It might also be worthwhile to provide a section in the docs just saying
"these are the parameters you probably want to look at for workload "
rather than an actual example configuration. Including a short sentence or
two about why.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] Replication/backup defaults

2017-01-09 Thread Magnus Hagander

On Sat, Jan 7, 2017 at 7:57 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 1/7/17 6:23 AM, Magnus Hagander wrote:
> > In the build farm, I have found 6 critters that do not end up with
> the
> > 100/128MB setting: sidewinder, curculio, coypu, brolga, lorikeet,
> > opossum.  I wonder what limitations initdb is bumping against.
> >
> >
> > Since you lookeda t the data -- they did not end up with 100, but what's
> > the lowest they did end up with? Did they go all the way down to 10?
>
> They all ended up on 30 or 40.
>
> The only documented way this could happen is if the semaphore
> configuration does not allow enough room, but this would have to be a
> very particular setting on all these quite different boxes.
>
>
So based on that, I suggest we go ahead and make the change to make both
the values 10 by default. And that we do that now, because that lets us get
it out through more testing on different platforms, so that we catch issues
earlier on if they do arise.

It would be interesting to find out why it's limited as well, of course,
but I don't think we need to wait for that.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] Replication/backup defaults

2017-01-07 Thread Jim Nasby


On 1/5/17 2:50 PM, Tomas Vondra wrote:

Ultimately, the question is whether the number of people running into
"Hey, I can't take pg_basebackup or setup a standby with the default
config!" is higher or lower than number of people running into "Hey,
CREATE TABLE + COPY is slower now!"


I'm betting it's way higher. Loads of folks use Postgres and never do 
any kind of ETL.



That is not to say there are no other cases benefiting from those
optimizations, but we're talking about the default value - we're not
removing the wal_level=minimal.


This would be a non-issue if we provided example configs for a few 
different workloads. Obviously those would never be optimal either, but 
they *would* show users what settings they should immediately look at 
changing in their environment.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-07 Thread Peter Eisentraut

On 1/7/17 6:23 AM, Magnus Hagander wrote:
> In the build farm, I have found 6 critters that do not end up with the
> 100/128MB setting: sidewinder, curculio, coypu, brolga, lorikeet,
> opossum.  I wonder what limitations initdb is bumping against.
> 
> 
> Since you lookeda t the data -- they did not end up with 100, but what's
> the lowest they did end up with? Did they go all the way down to 10? 

They all ended up on 30 or 40.

The only documented way this could happen is if the semaphore
configuration does not allow enough room, but this would have to be a
very particular setting on all these quite different boxes.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-07 Thread Magnus Hagander

On Sat, Jan 7, 2017 at 1:27 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 1/5/17 12:01 PM, Andres Freund wrote:
> > On 2017-01-05 08:38:32 -0500, Peter Eisentraut wrote:
> >> I also suggest making the defaults for both 20 instead of 10.  That
> >> leaves enough room that almost nobody ever has to change them, whereas
> >> 10 can be a bit tight for some not-outrageous installations (8 standbys
> >> plus backup?).
> >
> > I'm afraid we need to add initdb integration / testing for those. I mean
> > we have initdb test down to 10 connections to deal with limited
> > resources...
>
> Those initdb defaults were last touched in 2005, before the use of
> System V shared memory was reduced to a minimum.  It might be worth
> revisiting that.  The only way to end up with a low number of connection
> slots would seem to be a very low semaphore configuration.
>
> In the build farm, I have found 6 critters that do not end up with the
> 100/128MB setting: sidewinder, curculio, coypu, brolga, lorikeet,
> opossum.  I wonder what limitations initdb is bumping against.
>
>
Since you lookeda t the data -- they did not end up with 100, but what's
the lowest they did end up with? Did they go all the way down to 10?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] Replication/backup defaults

2017-01-06 Thread Peter Eisentraut

On 1/5/17 12:01 PM, Andres Freund wrote:
> On 2017-01-05 08:38:32 -0500, Peter Eisentraut wrote:
>> I also suggest making the defaults for both 20 instead of 10.  That
>> leaves enough room that almost nobody ever has to change them, whereas
>> 10 can be a bit tight for some not-outrageous installations (8 standbys
>> plus backup?).
> 
> I'm afraid we need to add initdb integration / testing for those. I mean
> we have initdb test down to 10 connections to deal with limited
> resources...

Those initdb defaults were last touched in 2005, before the use of
System V shared memory was reduced to a minimum.  It might be worth
revisiting that.  The only way to end up with a low number of connection
slots would seem to be a very low semaphore configuration.

In the build farm, I have found 6 critters that do not end up with the
100/128MB setting: sidewinder, curculio, coypu, brolga, lorikeet,
opossum.  I wonder what limitations initdb is bumping against.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-06 Thread Peter Eisentraut

On 1/5/17 4:56 PM, Michael Banck wrote:
>> You can't actually change the other two without changing wal_level.
> That actually goes both ways: I recently saw a server not start cause we
> were experimenting with temporarily setting wal_level to minimal for
> initial bulk loading, but did not reduce max_wal_senders back to zero.
> So it failed at startup with 'FATAL:  WAL streaming (max_wal_senders >
> 0) requires wal_level "replica" or "logical"'.

I think that was the point:  You can't change the default of
max_wal_senders without also changing the default of wal_level.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-05 Thread Michael Banck

On Mon, Jan 02, 2017 at 10:21:41AM +0100, Magnus Hagander wrote:
> On Mon, Jan 2, 2017 at 10:17 AM, Simon Riggs  wrote:
> > On 31 December 2016 at 15:00, Magnus Hagander  wrote:
> > > max_wal_senders=10
> > > max_replication_slots=20

[...]

> > > wal_level=replica
> >
> > This is more problematic because it changes behaviours.
> 
> You can't actually change the other two without changing wal_level.

That actually goes both ways: I recently saw a server not start cause we
were experimenting with temporarily setting wal_level to minimal for
initial bulk loading, but did not reduce max_wal_senders back to zero.
So it failed at startup with 'FATAL:  WAL streaming (max_wal_senders >
0) requires wal_level "replica" or "logical"'.

I don't want to hijack this thread, but I wonder whether the name
"*max*_wal_senders" really conveys that dependence on wal_level (there's
no comment to that end in the postgresql.conf sample) and/or whether
maybe the admin should just be notified that WAL streaming is turned off
cause wal_level < 'replica'?

Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-05 Thread Tomas Vondra


On 01/05/2017 05:37 PM, Stephen Frost wrote:

Tomas,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:

On 01/05/2017 02:23 PM, Magnus Hagander wrote:

It's easy enough to construct a benchmark specifically to show the
difference, but of any actual "normal workload" for it. Typically the
optimization applies to things like bulk loading, which typically never
done alone and does not lend itself to that type of benchmarking very
easily.


Not sure if I understand correctly what you're saying. You're saying
that although it'd be easy to construct a benchmark showing
significant performance impact, it won't represent a common
workload. Correct?


I think he's saying that it's not very easy to construct a good example
of typical bulk-loading workloads using just pgbench.  Bulk loading
certainly happens with PG and I don't think we'll make very many friends
if we break optimizations when wal_level is set to minimal like those
you get using:

BEGIN;
CREATE TABLE x (c1 int);
COPY x FROM STDIN;
COMMIT;

or:

BEGIN;
TRUNCATE x;
COPY x FROM STDIN;
COMMIT;

Changing the wal_level from 'minimal' to 'replica' or 'logical' with
such a benchmark is going to make the WAL go from next-to-nothing to
size-of-database.


Sure, I do know how to construct such workloads - and it's trivial even 
with pgbench custom scripts. The question is whether such workloads are 
common or not.


Most importantly, no one is proposing to break the optimizations, but 
changing the defaults - users relying on the optimizations are free to 
switch back to wal_level=minimal if needed.


>

One doesn't typically *just* do bulk loads, however,
often it's a bulk load into a table and then the contents of that table
are merged with another table or perhaps joined to it to produce some
report or something along those lines.  In many of those cases, our
more-recently added capability to have UNLOGGED tables will work, but
not all (in particular, it can be very handy to load everything in using
the above technique and then switch the wal_level to replica, which
avoids having to have the bulk of the data sent through WAL, something
you can't avoid if you want to turn an unlogged table into a logged
one).



Ultimately, the question is whether the number of people running into 
"Hey, I can't take pg_basebackup or setup a standby with the default 
config!" is higher or lower than number of people running into "Hey, 
CREATE TABLE + COPY is slower now!"


I haven't seen many systems relying on such load optimizations, for a 
number of reasons:


1) The important/critical systems usually have replicas, so are 
inherently incompatible with wal_level=minimal.


2) The batch jobs usually don't truncate the main table, but load the 
increment into a temporary/unlogged table first, then merge it into the 
main one.


That is not to say there are no other cases benefiting from those 
optimizations, but we're talking about the default value - we're not 
removing the wal_level=minimal.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down

2017-01-05 Thread Ants Aasma

On 5 Jan 2017 2:54 a.m., "Craig Ringer"  wrote:

On 2 January 2017 at 22:24, Craig Ringer  wrote:
>
>
> On 2 Jan. 2017 20:20, "Simon Riggs"  wrote:
>
> On 21 December 2016 at 13:23, Simon Riggs  wrote:
>
>> Fix it up and I'll commit. Thanks for the report.
>
> I was hoping for some more effort from Ants to correct this.
>
> I'll commit Craig's new tests for hs feedback before this, so it can
> go in with a Tap test, so I imagine we're about a week away from
> commit on this.
>
>
> I posted a revised version of Ants's patch that passes the shell script
> test.
>
> A TAP  test would likely make sene though, I agree.

Ants, do you think you'll have a chance to convert your shell script
test into a TAP test in src/test/recovery?

Simon has said he would like to commit this fix. I'd personally be
happier if it had a test to go with it.

You could probably just add to src/test/recover/t/001 which now
contains my additions for hot standby.

I'm travelling right now, but I should be able to give it a shot next week.

Regards,
Ants Aasma

Re: [HACKERS] Replication/backup defaults

2017-01-05 Thread Andres Freund

On 2017-01-05 09:12:49 -0800, Andres Freund wrote:
> On 2017-01-05 18:08:36 +0100, Magnus Hagander wrote:
> > On Thu, Jan 5, 2017 at 6:01 PM, Andres Freund  wrote:
> > 
> > > On 2017-01-05 08:38:32 -0500, Peter Eisentraut wrote:
> > > > I also suggest making the defaults for both 20 instead of 10.  That
> > > > leaves enough room that almost nobody ever has to change them, whereas
> > > > 10 can be a bit tight for some not-outrageous installations (8 standbys
> > > > plus backup?).
> > >
> > > I'm afraid we need to add initdb integration / testing for those. I mean
> > > we have initdb test down to 10 connections to deal with limited
> > > resources...
> > >
> > 
> > If we make both 10 by default we should be OK though, no?
> 
> I'm a bit doubtful about that. On the other hand, we've increased
> max_parallel_workers without anybody complaining.

Err, I mean max_worker_processes.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-05 Thread Andres Freund

On 2017-01-05 18:08:36 +0100, Magnus Hagander wrote:
> On Thu, Jan 5, 2017 at 6:01 PM, Andres Freund  wrote:
> 
> > On 2017-01-05 08:38:32 -0500, Peter Eisentraut wrote:
> > > I also suggest making the defaults for both 20 instead of 10.  That
> > > leaves enough room that almost nobody ever has to change them, whereas
> > > 10 can be a bit tight for some not-outrageous installations (8 standbys
> > > plus backup?).
> >
> > I'm afraid we need to add initdb integration / testing for those. I mean
> > we have initdb test down to 10 connections to deal with limited
> > resources...
> >
> 
> If we make both 10 by default we should be OK though, no?

I'm a bit doubtful about that. On the other hand, we've increased
max_parallel_workers without anybody complaining.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-05 Thread Magnus Hagander

On Thu, Jan 5, 2017 at 6:01 PM, Andres Freund  wrote:

> On 2017-01-05 08:38:32 -0500, Peter Eisentraut wrote:
> > I also suggest making the defaults for both 20 instead of 10.  That
> > leaves enough room that almost nobody ever has to change them, whereas
> > 10 can be a bit tight for some not-outrageous installations (8 standbys
> > plus backup?).
>
> I'm afraid we need to add initdb integration / testing for those. I mean
> we have initdb test down to 10 connections to deal with limited
> resources...
>

If we make both 10 by default we should be OK though, no?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] Replication/backup defaults

2017-01-05 Thread Andres Freund

On 2017-01-05 08:38:32 -0500, Peter Eisentraut wrote:
> I also suggest making the defaults for both 20 instead of 10.  That
> leaves enough room that almost nobody ever has to change them, whereas
> 10 can be a bit tight for some not-outrageous installations (8 standbys
> plus backup?).

I'm afraid we need to add initdb integration / testing for those. I mean
we have initdb test down to 10 connections to deal with limited
resources...

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-05 Thread Stephen Frost

Tomas,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> On 01/05/2017 02:23 PM, Magnus Hagander wrote:
> >It's easy enough to construct a benchmark specifically to show the
> >difference, but of any actual "normal workload" for it. Typically the
> >optimization applies to things like bulk loading, which typically never
> >done alone and does not lend itself to that type of benchmarking very
> >easily.
> 
> Not sure if I understand correctly what you're saying. You're saying
> that although it'd be easy to construct a benchmark showing
> significant performance impact, it won't represent a common
> workload. Correct?

I think he's saying that it's not very easy to construct a good example
of typical bulk-loading workloads using just pgbench.  Bulk loading
certainly happens with PG and I don't think we'll make very many friends
if we break optimizations when wal_level is set to minimal like those
you get using:

BEGIN;
CREATE TABLE x (c1 int);
COPY x FROM STDIN;
COMMIT;

or:

BEGIN;
TRUNCATE x;
COPY x FROM STDIN;
COMMIT;

Changing the wal_level from 'minimal' to 'replica' or 'logical' with
such a benchmark is going to make the WAL go from next-to-nothing to
size-of-database.  One doesn't typically *just* do bulk loads, however,
often it's a bulk load into a table and then the contents of that table
are merged with another table or perhaps joined to it to produce some
report or something along those lines.  In many of those cases, our
more-recently added capability to have UNLOGGED tables will work, but
not all (in particular, it can be very handy to load everything in using
the above technique and then switch the wal_level to replica, which
avoids having to have the bulk of the data sent through WAL, something
you can't avoid if you want to turn an unlogged table into a logged
one).

Thanks!

Stephen

signature.asc
Description: Digital signature

Re: [HACKERS] Replication/backup defaults

2017-01-05 Thread Tomas Vondra


On 01/05/2017 02:23 PM, Magnus Hagander wrote:



On Thu, Jan 5, 2017 at 12:44 AM, Tomas Vondra
mailto:tomas.von...@2ndquadrant.com>> wrote:

On 01/03/2017 11:56 PM, Tomas Vondra wrote:

Hi,

...

I'll push results for larger ones once those tests complete
(possibly
tomorrow).


I just pushed additional results (from the additional scales) to the
git repositories. On the larger (16/32-cores) machine with 2x
e5-2620, the results look like this

   scale minimal   replica logical
  -
   100 23968 24393   24393
   100023412 23656   23794
   15283  53205197

and on the smaller one (i5-2500k with 4 cores) I got this:

   scale minimal   replica logical
  -
   50   5884  58965873
   400  5324  53425478
   1000 5341  54395425

The scales were chosen so that the smallest one fits into shared
buffers, the medium exceeds shared buffers but still fits into RAM,
and the largest scale exceeds RAM.

The results seem to confirm that for this workload (regular
pgbench), there's very little difference between the different WAL
levels. Actually, the 'replica' seems a tad faster than 'minimal',
but the difference may be easily due to noise.

I've also looked at the amount of WAL actually produced, by doing
pgbench runs throttled to the same throughput, and counting the
number of archived WAL segments & running pg_xlogdump. Interestingly
enough, those two metrics differ quite a bit - for example for scale
1000 (on the 32-core machine), the 2h runs produced these number of
WAL segments:

   minimal: 5515 (88.2GB)
   replica: 5587 (89.4GB)
   logical: 6058 (96.9GB)

so 'replica' adds ~1.3% and 'logical' ~9.8%. But per pg_xlogdump,
the WAL amounts are only 73.3GB, 73.9GB and 74.4GB - a difference of
only ~1.5% between minimal and logical. The values are also much
lower than raw WAL size, so I assume it's because pg_xlogdump
ignores some extra overhead, present in the segments. Moreover, the
sequential nature of WAL writes means even the +10% is not a big
deal (unless it results in saturating the bandwidth, but running on
>90% is a bad idea anyway).


If you are using log archiving, it also means your log archive grows by
10% (well, 8% assuming it was 9.8% on top of 0, not on top of replica).



... and that the standby has to chew through the additional 10% of WAL. 
We already have standbys that occasionally struggle to keep up with the 
master, and adding more load won't make them happy (even if just 10%).





My conclusion from these results is that using 'wal_level=replica'
by default seems fine. Perhaps even wal_level=logical would be OK,
but that's probably a too big step for 10.0.



I think it sounds like 'replica' is the safe default.

If we can make it possible to go replica<->logical without a restart,
that makes it easy enough to increase it if necessary, and the default
still applies to most people (most people take backups, most people
probably don't do logical replication).



My thoughts, exactly.




Any ideas how to construct a plausible workload where the
differences are significantly larger? Running the tests on non-SSD
storage might also be useful.


It's easy enough to construct a benchmark specifically to show the
difference, but of any actual "normal workload" for it. Typically the
optimization applies to things like bulk loading, which typically never
done alone and does not lend itself to that type of benchmarking very
easily.



Not sure if I understand correctly what you're saying. You're saying 
that although it'd be easy to construct a benchmark showing significant 
performance impact, it won't represent a common workload. Correct?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-05 Thread Peter Eisentraut

On 1/4/17 2:44 PM, Peter Eisentraut wrote:
> On 1/4/17 9:46 AM, Magnus Hagander wrote:
>> How about we default max_replication_slots to -1, which means to use the
>> same value as max_wal_senders?
> 
>> But you don't necessarily want to adjust them together, do you? They are
>> both capped by max_connections, but I don't think they have any other
>> direct relation between each other? 
> 
> I think the most usual case is that you use approximately one
> replication slot per wal sender slot.  So it would be a good default to
> make them equal.

Well, let's worry about that later.

I think everyone is now in agreement with your original change proposal.

My suggestion would be to make the defaults of max_wal_senders and
max_replication_slots the same, so we don't open an opportunity for
ongoing discussions about why they are different and how different they
should be.  Ideally, we would make max_replication_slots go away at some
point similar to my suggestion above.

I also suggest making the defaults for both 20 instead of 10.  That
leaves enough room that almost nobody ever has to change them, whereas
10 can be a bit tight for some not-outrageous installations (8 standbys
plus backup?).

Your patch looks OK, documentation changes look good.  I wouldn't be
surprised if we found another place or two that will need updating, but
that is not a big deal.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-05 Thread Magnus Hagander

On Thu, Jan 5, 2017 at 12:44 AM, Tomas Vondra 
wrote:

> On 01/03/2017 11:56 PM, Tomas Vondra wrote:
>
>> Hi,
>>
>> ...
>
>> I'll push results for larger ones once those tests complete (possibly
>> tomorrow).
>>
>>
> I just pushed additional results (from the additional scales) to the git
> repositories. On the larger (16/32-cores) machine with 2x e5-2620, the
> results look like this
>
>scale minimal   replica logical
>   -
>100 23968 24393   24393
>100023412 23656   23794
>15283  53205197
>
> and on the smaller one (i5-2500k with 4 cores) I got this:
>
>scale minimal   replica logical
>   -
>50   5884  58965873
>400  5324  53425478
>1000 5341  54395425
>
> The scales were chosen so that the smallest one fits into shared buffers,
> the medium exceeds shared buffers but still fits into RAM, and the largest
> scale exceeds RAM.
>
> The results seem to confirm that for this workload (regular pgbench),
> there's very little difference between the different WAL levels. Actually,
> the 'replica' seems a tad faster than 'minimal', but the difference may be
> easily due to noise.
>
> I've also looked at the amount of WAL actually produced, by doing pgbench
> runs throttled to the same throughput, and counting the number of archived
> WAL segments & running pg_xlogdump. Interestingly enough, those two metrics
> differ quite a bit - for example for scale 1000 (on the 32-core machine),
> the 2h runs produced these number of WAL segments:
>
>minimal: 5515 (88.2GB)
>replica: 5587 (89.4GB)
>logical: 6058 (96.9GB)
>
> so 'replica' adds ~1.3% and 'logical' ~9.8%. But per pg_xlogdump, the WAL
> amounts are only 73.3GB, 73.9GB and 74.4GB - a difference of only ~1.5%
> between minimal and logical. The values are also much lower than raw WAL
> size, so I assume it's because pg_xlogdump ignores some extra overhead,
> present in the segments. Moreover, the sequential nature of WAL writes
> means even the +10% is not a big deal (unless it results in saturating the
> bandwidth, but running on >90% is a bad idea anyway).
>

If you are using log archiving, it also means your log archive grows by 10%
(well, 8% assuming it was 9.8% on top of 0, not on top of replica).


>
> My conclusion from these results is that using 'wal_level=replica' by
> default seems fine. Perhaps even wal_level=logical would be OK, but that's
> probably a too big step for 10.0.
>


I think it sounds like 'replica' is the safe default.

If we can make it possible to go replica<->logical without a restart, that
makes it easy enough to increase it if necessary, and the default still
applies to most people (most people take backups, most people probably
don't do logical replication).



> Any ideas how to construct a plausible workload where the differences are
> significantly larger? Running the tests on non-SSD storage might also be
> useful.
>
>
It's easy enough to construct a benchmark specifically to show the
difference, but of any actual "normal workload" for it. Typically the
optimization applies to things like bulk loading, which typically never
done alone and does not lend itself to that type of benchmarking very
easily.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] Replication/backup defaults

2017-01-04 Thread Tomas Vondra


On 01/03/2017 11:56 PM, Tomas Vondra wrote:

Hi,


...

I'll push results for larger ones once those tests complete (possibly
tomorrow).



I just pushed additional results (from the additional scales) to the git 
repositories. On the larger (16/32-cores) machine with 2x e5-2620, the 
results look like this


   scale minimal   replica logical
  -
   100 23968 24393   24393
   100023412 23656   23794
   15283  53205197

and on the smaller one (i5-2500k with 4 cores) I got this:

   scale minimal   replica logical
  -
   50   5884  58965873
   400  5324  53425478
   1000 5341  54395425

The scales were chosen so that the smallest one fits into shared 
buffers, the medium exceeds shared buffers but still fits into RAM, and 
the largest scale exceeds RAM.


The results seem to confirm that for this workload (regular pgbench), 
there's very little difference between the different WAL levels. 
Actually, the 'replica' seems a tad faster than 'minimal', but the 
difference may be easily due to noise.


I've also looked at the amount of WAL actually produced, by doing 
pgbench runs throttled to the same throughput, and counting the number 
of archived WAL segments & running pg_xlogdump. Interestingly enough, 
those two metrics differ quite a bit - for example for scale 1000 (on 
the 32-core machine), the 2h runs produced these number of WAL segments:


   minimal: 5515 (88.2GB)
   replica: 5587 (89.4GB)
   logical: 6058 (96.9GB)

so 'replica' adds ~1.3% and 'logical' ~9.8%. But per pg_xlogdump, the 
WAL amounts are only 73.3GB, 73.9GB and 74.4GB - a difference of only 
~1.5% between minimal and logical. The values are also much lower than 
raw WAL size, so I assume it's because pg_xlogdump ignores some extra 
overhead, present in the segments. Moreover, the sequential nature of 
WAL writes means even the +10% is not a big deal (unless it results in 
saturating the bandwidth, but running on >90% is a bad idea anyway).


My conclusion from these results is that using 'wal_level=replica' by 
default seems fine. Perhaps even wal_level=logical would be OK, but 
that's probably a too big step for 10.0.


Any ideas how to construct a plausible workload where the differences 
are significantly larger? Running the tests on non-SSD storage might 
also be useful.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down

2017-01-04 Thread Craig Ringer

On 2 January 2017 at 22:24, Craig Ringer  wrote:
>
>
> On 2 Jan. 2017 20:20, "Simon Riggs"  wrote:
>
> On 21 December 2016 at 13:23, Simon Riggs  wrote:
>
>> Fix it up and I'll commit. Thanks for the report.
>
> I was hoping for some more effort from Ants to correct this.
>
> I'll commit Craig's new tests for hs feedback before this, so it can
> go in with a Tap test, so I imagine we're about a week away from
> commit on this.
>
>
> I posted a revised version of Ants's patch that passes the shell script
> test.
>
> A TAP  test would likely make sene though, I agree.


Ants, do you think you'll have a chance to convert your shell script
test into a TAP test in src/test/recovery?

Simon has said he would like to commit this fix. I'd personally be
happier if it had a test to go with it.

You could probably just add to src/test/recover/t/001 which now
contains my additions for hot standby.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-04 Thread Simon Riggs

On 3 January 2017 at 12:34, Michael Paquier  wrote:
> On Mon, Jan 2, 2017 at 10:55 PM, Simon Riggs  wrote:
>> In the hope of making things better in 10.0, I remove my objection. If
>> people want to use wal_level = minimal they can restart their server
>> and they can find that out in the release notes.
>>
>> Should we set wal_level = replica or wal_level = logical as the
>> default for 10.0?
>
> replica sounds like a better default to me as most users use at least
> archiving. Logical decoding is still fresh though, and its use is not
> that wide. Have there been any study on its performance impact
> compared to replica by the way?

Magnus' arguments should also be applied to wal_level = logical since
users will be surprised if they cannot use the logical replication
features we are adding as a main feature of 10.0. Why go through the
same pain again?

And if preventing their use is acceptable for the user, we should
treat it as a performance feature to reduce the wal_level.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-04 Thread Peter Eisentraut

On 1/4/17 9:46 AM, Magnus Hagander wrote:
> How about we default max_replication_slots to -1, which means to use the
> same value as max_wal_senders?

> But you don't necessarily want to adjust them together, do you? They are
> both capped by max_connections, but I don't think they have any other
> direct relation between each other? 

I think the most usual case is that you use approximately one
replication slot per wal sender slot.  So it would be a good default to
make them equal.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-04 Thread Peter Eisentraut

On 12/31/16 10:00 AM, Magnus Hagander wrote:
> max_wal_senders=10
> max_replication_slots=20

How about we default max_replication_slots to -1, which means to use the
same value as max_wal_senders?

I think this would address the needs of 99% of users.  If we do like you
suggest, there are going to be very many users who forget to adjust
these two values together, and very many who will do it but will be
puzzled and annoyed by it.  Since we're now pushing the use of
replication slots even more (your pg_basebackup change, upcoming logical
replication), I think this could be a major source of misconfigurations.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-04 Thread Magnus Hagander

On Wed, Jan 4, 2017 at 3:43 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 12/31/16 10:00 AM, Magnus Hagander wrote:
> > max_wal_senders=10
> > max_replication_slots=20
>
> How about we default max_replication_slots to -1, which means to use the
> same value as max_wal_senders?
>
> I think this would address the needs of 99% of users.  If we do like you
> suggest, there are going to be very many users who forget to adjust
> these two values together, and very many who will do it but will be
> puzzled and annoyed by it.  Since we're now pushing the use of
> replication slots even more (your pg_basebackup change, upcoming logical
> replication), I think this could be a major source of misconfigurations.
>

But you don't necessarily want to adjust them together, do you? They are
both capped by max_connections, but I don't think they have any other
direct relation between each other?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] Replication/backup defaults

2017-01-03 Thread Tomas Vondra


On 01/03/2017 01:34 PM, Michael Paquier wrote:

On Mon, Jan 2, 2017 at 10:55 PM, Simon Riggs  wrote:

In the hope of making things better in 10.0, I remove my objection.
If people want to use wal_level = minimal they can restart their
server and they can find that out in the release notes.

Should we set wal_level = replica or wal_level = logical as the
default for 10.0?


replica sounds like a better default to me as most users use at
least archiving. Logical decoding is still fresh though, and its use
is not that wide. Have there been any study on its performance
impact compared to replica by the way?



I've just posted results for some benchmarks I'm running. Those are some 
simple pgbench tests, nothing special, and according to those results 
the performance impact (logical vs. replica) is negligible. I can run 
additional tests with other workloads, of course.


While we can probably construct workloads where the difference is 
measurable, I'm not sure performance impact is the main concern here. As 
you point out, we have 'replica' since 9.0 effectively, while logical is 
much newer, so perhaps there are some hidden bugs? It'd be embarrassing 
to pick 'logical' and hurt everyone, even if they don't get any benefit 
from wal_level=logical.


So +1 to 'replica' and allowing switching to 'logical' without restart. 
That should not be extremely difficult, as the main culprit seems to be 
max_wal_senders/max_replication_slots requiring shared memory. But with 
'replica' we already have those enabled/allocated, unlike when switching 
from 'minimal' to 'replica'.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-03 Thread Tomas Vondra

Hi,

On 12/31/2016 04:00 PM, Magnus Hagander wrote:

Cycling back to this topic again, but this time at the beginning of a CF.

Here's an actual patch to change:

wal_level=replica
max_wal_senders=10
max_replication_slots=20

Based on feedback from last year
(https://www.postgresql.org/message-id/CABUevEwfV7zDutescm2PHGvsJdYA0RWHFMTRGhwrJPGgSbzZDQ%40mail.gmail.com):

There were requests for benchmarks of performance difference. Tomas
has promised to run a couple of benchmarks on his standard
benchmarking setups to give numbers on that. Thanks Tomas, please
pipe in with your results when you have them!

As promised, I'm running some benchmarks, and I have some early results
to report. And perhaps we can discuss whether we need to test some
additional workloads.

I'm 100% on board with the idea that we should switch to wal_level which
allows taking backups or setting-up a streaming replica, as long as it
does not cause severe performance regression in common workloads.

So while it'd be trivial to construct workloads demonstrating the
optimizations in wal_level=minimal (e.g. initial loads doing CREATE
TABLE + COPY + CREATE INDEX in a single transaction), but that would be
mostly irrelevant I guess.

Instead, I've decided to run regular pgbench TPC-B-like workload on a
bunch of different scales, and measure throughput + some xlog stats with
each of the three wal_level options.

Note: I tweaked the code a bit to allow archiving with "minimal" WAL
level, to allow computing WAL stats on the archived segments (instead of
keeping all segments in the data directory).

As usual, I'm running it on two machines - a small old one (i5-2500k box
with 4 cores and 8GB of RAM) and a new one (2x e5-2620v4 with 16/32
cores, 64GB of RAM). Both machines have SSD-based storage.

The clusters on both machines were reasonably tuned, see 'settings.log'
for each run. The tests are fairly long, covering multiple checkpoints
etc. In other words, the results should be fairly stable.

The scripts/results/stats/configs are available here:

* https://bitbucket.org/tvondra/wal-levels-e2620-v4/src
* https://bitbucket.org/tvondra/wal-levels-i5/src

So far I only have results for the smallest data sets (50 on i5 and 100
on e5), which easily fits into shared_buffers in both cases, and the
numbers look like this:

minimal replica standby

i5-2500k 5884 5896 5873
e5-2620v4 239682439324259

So the performance penalty of replica/standby WAL levels on this
workload is pretty much non-existent - for the larger machine those
levels are actually a tad faster than 'minimal', but the difference is
within 2% (so might easily be noise).

I'll push results for larger ones once those tests complete (possibly
tomorrow).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-03 Thread Michael Paquier

On Mon, Jan 2, 2017 at 10:55 PM, Simon Riggs  wrote:
> In the hope of making things better in 10.0, I remove my objection. If
> people want to use wal_level = minimal they can restart their server
> and they can find that out in the release notes.
>
> Should we set wal_level = replica or wal_level = logical as the
> default for 10.0?

replica sounds like a better default to me as most users use at least
archiving. Logical decoding is still fresh though, and its use is not
that wide. Have there been any study on its performance impact
compared to replica by the way?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down

2017-01-02 Thread Craig Ringer

On 2 Jan. 2017 20:20, "Simon Riggs"  wrote:

On 21 December 2016 at 13:23, Simon Riggs  wrote:

> Fix it up and I'll commit. Thanks for the report.

I was hoping for some more effort from Ants to correct this.

I'll commit Craig's new tests for hs feedback before this, so it can
go in with a Tap test, so I imagine we're about a week away from
commit on this.


I posted a revised version of Ants's patch that passes the shell script
test.

A TAP  test would likely make sene though, I agree.

Re: [HACKERS] Replication/backup defaults

On 2 January 2017 at 09:39, Magnus Hagander  wrote:

> The conclusion has been that our defaults should really allow people to take
> backups of their systems, and they currently don't.
>
> Making things run faster is tuning, and people should expect to do that if
> they need things to run faster. But being able to make a backup is pretty
> fundamental.

In the hope of making things better in 10.0, I remove my objection. If
people want to use wal_level = minimal they can restart their server
and they can find that out in the release notes.

Should we set wal_level = replica or wal_level = logical as the
default for 10.0?

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down

On 21 December 2016 at 13:23, Simon Riggs  wrote:

> Fix it up and I'll commit. Thanks for the report.

I was hoping for some more effort from Ants to correct this.

I'll commit Craig's new tests for hs feedback before this, so it can
go in with a Tap test, so I imagine we're about a week away from
commit on this.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

On 2 January 2017 at 09:48, Simon Riggs  wrote:

> I'm willing to assist in a project to allow changing wal_level online
> in this release. Please let's follow that path.

wal_level looks like one of the easier ones to change without a server restart

There are actions to take in either direction, up or down. My initial
thoughts on the pseudocode would be...

reset wal_level so all new transactions see that value
/* actions after setting new value */
if (old_wal_level  < new_wal_level) /* going up */
   get list of running transactions (perhaps only those using no-WAL-opt)
else /* coming down */
{
 if (old_wal_level == logical)
  disconnect logical replication and disallow logical slots
 if (new_wal_level == minimal)
  disconnect streaming replication and disallow physical slots
}
wait for a checkpoint (fast checkpoint if no other transactions actions active)
if (list)
   wait for list of running xacts to complete
wait for a checkpoint (fast checkpoint if no other transactions actions active)
XLogReportParameters()

So it looks easier to go up than down, which is good since that is the
important direction.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-02 Thread Andres Freund

On 2017-01-02 10:31:28 +, Simon Riggs wrote:
> We must listen to feedback, not just try to blast through it.

Not agreeing with your priorities isn't "blasting through feedback".


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

On 2 January 2017 at 10:13, Andres Freund  wrote:
> On 2017-01-02 11:05:05 +0100, Magnus Hagander wrote:
>> My claim here is that a lot *fewer* people have come to expect this
>> performance optimization, than would (quite reasonably) expect that backups
>> should work on a system without taking it down for restart to reconfigure
>> it to support that.
>
> +1
>
> As evidenced by the fact that a large fraction of those optimizations
> are actually currently entirely broken. Without anybody noticing for
> years:
> http://archives.postgresql.org/message-id/20150702220524.GA9392%40svana.org

No, the optimization works, but there is a bug in it that makes it
unsafe, not the same thing as entirely broken. That clearly needs to
be fixed, but it does not prevent the performance benefit, so that
argument is invalid.

We must listen to feedback, not just try to blast through it.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-02 Thread Andres Freund

On 2017-01-02 11:05:05 +0100, Magnus Hagander wrote:
> My claim here is that a lot *fewer* people have come to expect this
> performance optimization, than would (quite reasonably) expect that backups
> should work on a system without taking it down for restart to reconfigure
> it to support that.

+1

As evidenced by the fact that a large fraction of those optimizations
are actually currently entirely broken. Without anybody noticing for
years:
http://archives.postgresql.org/message-id/20150702220524.GA9392%40svana.org

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-02 Thread Magnus Hagander

On Mon, Jan 2, 2017 at 10:48 AM, Simon Riggs  wrote:

> On 2 January 2017 at 09:39, Magnus Hagander  wrote:
>
> > Please do submit a patch for it.
>
> The way this is supposed to go is someone submits a patch and they
> receive feedback, then act on that feedback. If I was able to get away
> with deflecting all review comments with a simple "you fix it if you
> don't like it" there would be considerably more patches with my name
> on it accepted, but probably no further forward in real terms because
> of the loose ends it creates.
>

Fair enough.

It's just that people keep saying that this is easy, and have said so for a
long time, but nobody has written a patch for it.

> In this case, simply changing the default will remove a whole class of
> performance optimization that we have educated people to expect. I'm
> sorry to point this out but removing that will cause many real changes
> for people's systems that we will not be thanked for, even though I
> understand your reasoning and wish the same goals to be achieved.
>

My claim here is that a lot *fewer* people have come to expect this
performance optimization, than would (quite reasonably) expect that backups
should work on a system without taking it down for restart to reconfigure
it to support that.

I run into that all the time. I hear complaints about that all the time. I
have not heard a single user complain about performance loss after enabling
backups.

And how many people that rely on this optimization don't do any *other*
optimization on their system *anyway*, that would cause them to require a
restart anyway? It's not like we're taking away their ability to enable the
optimization, it's just not on by default.

> I'm willing to assist in a project to allow changing wal_level online
> in this release. Please let's follow that path.
>

Sure thing, I will be happy to help test and review such a patch.

I will still argue that the *default* should be wal_level=replica though.
Because once we have such a patch, it's trivial to re-enable this
performance optimization (at the cost of backups and replication).

//Magnus

Re: [HACKERS] Replication/backup defaults

On 2 January 2017 at 09:39, Magnus Hagander  wrote:

> Please do submit a patch for it.

The way this is supposed to go is someone submits a patch and they
receive feedback, then act on that feedback. If I was able to get away
with deflecting all review comments with a simple "you fix it if you
don't like it" there would be considerably more patches with my name
on it accepted, but probably no further forward in real terms because
of the loose ends it creates.

In this case, simply changing the default will remove a whole class of
performance optimization that we have educated people to expect. I'm
sorry to point this out but removing that will cause many real changes
for people's systems that we will not be thanked for, even though I
understand your reasoning and wish the same goals to be achieved.

I'm willing to assist in a project to allow changing wal_level online
in this release. Please let's follow that path.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication/backup defaults

2017-01-02 Thread Magnus Hagander

On Mon, Jan 2, 2017 at 10:32 AM, Simon Riggs  wrote:

> On 2 January 2017 at 09:21, Magnus Hagander  wrote:
> >
> >
> > On Mon, Jan 2, 2017 at 10:17 AM, Simon Riggs 
> wrote:
> >>
> >> On 31 December 2016 at 15:00, Magnus Hagander 
> wrote:
> >> > Cycling back to this topic again, but this time at the beginning of a
> >> > CF.
> >> >
> >> > Here's an actual patch to change:
> >> >
> >> >
> >> > max_wal_senders=10
> >> > max_replication_slots=20
> >>
> >> +1
> >>
> >> If that doesn't fly, it seems easy enough to introduce a
> >> "min_reserve_limit" GUC that defaults to 10 that gives a lower bound
> >> on the amount of memory we reserve for many of those shmem allocs;
> >> that can be set to 0 for people that want the old behaviour. Then we
> >> can allow changes up to the memory limit without a restart.
> >>
> >> > wal_level=replica
> >>
> >> This is more problematic because it changes behaviours.
> >
> >
> > You can't actually change the other two without changing wal_level.
>
> You could, but we currently disallow it.
>

I always assumed we disallowed it because we would have to write actual
code to make it safe.

> >> A more useful approach would be to bring all the things needed to
> >> enable replication into one ALTER SYSTEM command, so people have just
> >> one thing they need to execute and it will work out the details and
> >> locations for you.
> >> That way we can maintain the defaults yet make it easier to enable in
> >> a useful way.
> >
> >
> > Sure, that would be great - the core being the ability to change these
> > things without a restart. But I would argue for not letting perfection
> get
> > in the way of progress, and do this anyway. I doubt there is any way the
> > bigger change is going to get done for 10 at this point, so we should
> give
> > people the ability to do backups off a default installation already.
>
> We could fairly easily change wal_level without restart; its been
> discussed for many years.
>
> The problem from my perspective is that you're immediately turning off
> the performance benefits for initial bulk loads.
>

We've had this discussion many times over. Please see for example the
thread I referenced.

The conclusion has been that our defaults should really allow people to
take backups of their systems, and they currently don't.

Making things run faster is tuning, and people should expect to do that if
they need things to run faster. But being able to make a backup is pretty
fundamental.

Arguing how that isn't a problem looks at least as time consuming as
> fixing the problem.

Please do submit a patch for it. I don't know exactly what's involved in
that part, I just know that people have been complaining about this at
least since 9.0 was released so our track record of actually fixing it
isn't very good.

I'm not arguing that it's not a problem, btw. I'm arguing that until we can
solve the problem, we're much better off letting people do backups and set
up things like replication than optimizing for a usecase that many never
hit.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] Replication/backup defaults