Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256

2017-05-27 Thread Michael Paquier
On Sat, May 27, 2017 at 5:59 PM, Álvaro Hernández Tortosa
 wrote:
> On 25/05/17 17:17, Michael Paquier wrote:
>> Please find attached a patch to add support for channel binding for
>> SCRAM, to mitigate MITM attacks using this protocol. Per RFC5802
>> (https://tools.ietf.org/html/rfc5802), servers supporting channel
>> binding need to add support for tls-unique, and this is what this
>> patch does.
>
> This is awesome, Michael :) Thank you! You have prevented me eventually
> writing the patch and having then PostgreSQL source tainted with Java-to-C
> "transpiled" code ^_^

The thing would have been reviewed and rewritten a couple of times :)
So that would have unlikely (hopefully) finished in the shape of a
Java-C-like patch.

> After a deeper analysis, I have some concerns/comments here:
>
> - tls-unique, as you mentioned, uses two undocumented APIs. This raises a
> small flag about the stability and future of those APIs.

Those are 5 lines of code each in OpenSSL, not that hard to maintain.
Those APIs are clunky by the way as there is no way to know in advance
the length of the message for memory allocation.

> - More importantly, some drivers (not libpq-based) may have unexpected
> difficulties implementing tls-unique. In particular, there is not an API in
> Java to get the finished message. I expect other languages to maybe have
> similar limitations. For Java I see two options:
> * Also implement tls-server-end-point, which rather relies on the server
> certificate. This information seems to be exposed as part of the Java APIs:
> https://docs.oracle.com/javase/8/docs/api/java/security/cert/Certificate.html#getEncoded--
> * Use the BouncyCastle library (http://bouncycastle.org/), which
> explicitly supports tls-unique
> (https://www.bouncycastle.org/docs/pkixdocs1.5on/org/bouncycastle/est/TLSUniqueProvider.html#getTLSUnique()
> ,
> https://www.bouncycastle.org/docs/tlsdocs1.5on/org/bouncycastle/tls/ChannelBinding.html#tls_unique
> ). This would require, however, non-trivial changes to the driver and, I
> expect, a lot of extra effort.

I am wondering how other languages are regarding that. Python has
added a method in 2011:
https://hg.python.org/cpython/rev/cb44fef5ea1d

> - It seems to me that tls-unique might be a bit fragile. In particular, it
> requires the server to be aware of TSL renegotiations and avoid performing
> one while the SCRAM authentication is being performed. I don't know if this
> is already guaranteed by the current patch, but it seems to me it requires
> complex interactions between levels of abstraction that are quite separate
> (SSL and SCRAM). This is explained by the RFC:

RFC 5802, section 6 gives a good level of details on the matter:
https://tools.ietf.org/html/rfc5802#section-6
There is indeed interaction between SSL and scram, and the patch makes
use of USE_SSL for this purpose.

> "
> server applications MUST NOT request TLS renegotiation during phases of the
> application protocol during which application-layer authentication occurs
> "
> (https://tools.ietf.org/html/rfc5929#section-3.1)

The SSL hanshake happens only once at connection obtention, before the
authentication exchange happens. So there is no renegociation. SSL
renegotiation has been removed in PG anyway, disabled on
back-branches, and removed as well in TLS 1.3 (right? I don't recall
the spec in details).

> Based on this facts, I'd suggest to either implement
> tls-server-end-point or implement both tls-server-end-point and tls-unique.
> The latter would require a more complete protocol message to advertise
> separately SCRAM mechanisms and channel binding names. One of such
> structures could be the one I suggested earlier:
> https://www.postgresql.org/message-id/df8c6e27-4d8e-5281-96e5-131a4e638...@8kdata.com

The RFC says that any server implementing channel binding must
implement tls-unique, so having only end-point is not compliant.

>> Before even that, the server needs to send to the client the list of
>> SASL mechanisms that are supported. This adds SCRAM-SHA-256-PLUS if
>> the server has been built with SSL support to the list.
>
> And I'd say the list of channel binding names supported.

 It seems to me as well that this gives us an indication that it
should be the default channel binding used, or if the exchange list
has no channel names included, the client can assume that tls-unique
will be used. Such an approach has as well the merit to keep the patch
simple. In short, I would advocate an incremental approach that adds
tls-unique first. This has value anyway as there is no need for
certificate configuration. This also does not require a message format
extension.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher

2017-05-27 Thread Michael Paquier
On Fri, May 26, 2017 at 4:47 PM, Peter Eisentraut
 wrote:
> On 5/26/17 14:16, Michael Paquier wrote:
>> So, now that the last round of minor releases has happened and that
>> some dust has settled on this patch, shouldn't there be a backpatch?
>> If yes, do you need patches for all branches? This problems goes down
>> to 9.2 anyway as BASE_BACKUP can generate end-of-backup records.
>
> Yes, this could be backpatched now.  It looks like it will need a bit of
> fiddling to get it into all the backbranches.  If you want to give it a
> closer look, go ahead please.

Attached are patches for 9.2~9.6. There are a couple of conflicts
across each version. Particularly in 9.2, I have made the choice to
not rename walsender_ready_to_stop to got_SIGINT as this is used as
well in basebackup.c to make clearer the code. In 9.3~ the use of this
flag is located only within walsender.c.
-- 
Michael


walsender-shutdown-96.patch
Description: Binary data


walsender-shutdown-95.patch
Description: Binary data


walsender-shutdown-93.patch
Description: Binary data


walsender-shutdown-94.patch
Description: Binary data


walsender-shutdown-92.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] execGrouping.c limit on work_mem

2017-05-27 Thread Jeff Janes
In BuildTupleHashTable


/* Limit initial table size request to not more than work_mem */
nbuckets = Min(nbuckets, (long) ((work_mem * 1024L) / entrysize));


Is this a good idea?  If the caller of this code has no respect for
work_mem, they are still going to blow it out of the water.  Now we will
just do a bunch of hash-table splitting in the process.  That is only going
to add to the pain.

Also:

* false if it existed already.  ->additional_data in the new entry has

The field is just ->additional, not ->additional_data

Cheers,

Jeff


[HACKERS] simplehash.h typo

2017-05-27 Thread Jeff Janes
/* round up size to the next power of 2, that's the bucketing works  */


That should probably be "that's the **way** bucketing works".  Or maybe it
is an idiom I don't grok.

Cheers,

Jeff


Re: [HACKERS] Broken hint bits (freeze)

2017-05-27 Thread Michael Paquier
On Sat, May 27, 2017 at 12:56 PM, Andres Freund  wrote:
> On 2017-05-27 19:48:24 +0300, Vladimir Borodin wrote:
>> Well, actually clean shutdown of master with exit code 0 from `pg_ctl
>> stop -m fast` guarantees that all WAL has been replicated to standby.
>
> It does not.  It makes it likely, but the connection to the standby
> could be not up just then, you could run into walsender timeout, and a
> bunch of other scenarios.

Amen.

>> And if something would go wrong in above logic, postgres will not let you 
>> attach old master as a standby of new master. So it is highly probable not a 
>> setup problem.
>
> There's no such guarantee.  There's a bunch of checks that'll somewhat
> likely trigger, but nothing more than that.

Yes. Take for example the case where the host with a primary is
plugged off, and another host with a standby is promoted. If at next
restart you add directly for the old primary a recovery.conf and
attempt to use it as a standby to the new primary it may be able to
connect and to begin replication. That will result in a corrupted
standby.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication - still unstable after all these months

2017-05-27 Thread Erik Rijkers

On 2017-05-28 01:15, Mark Kirkwood wrote:


Also, any idea which rows are different? If you want something out of
the box that will do that for you see DBIx::Compare.


I used to save the content-diffs too but in the end decided they were 
useless (to me, anyway).



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication - still unstable after all these months

2017-05-27 Thread Erik Rijkers

On 2017-05-28 01:21, Mark Kirkwood wrote:

Sorry - I see you have done this already.

On 28/05/17 11:15, Mark Kirkwood wrote:
Interesting - might be good to see your test script too (so we can 
better understand how you are deciding if the runs are successful or 
not).



Yes, in pgbench_derail2.sh in the cb function it says:

  if [[ "${md5_total[$port1]}" == "${md5_total[$port2]}" ]]
  then
echo " ok"
  else
echo " NOK"
  fi

This is the final decision about success ('ok') or failure ('NOK'). (NOK 
stands for 'Not OK')


The two compared md5's (on the two ports: primary and replica) are each 
taken over a concatenation of the 4 separate md5's of the table-content 
(taken earlier in cb()).  If one or more of the 4 md5's differs, then 
that concatation-md5 will differ too.


Sorry, there is not a lot of comment






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication - still unstable after all these months

2017-05-27 Thread Mark Kirkwood

Sorry - I see you have done this already.

On 28/05/17 11:15, Mark Kirkwood wrote:
Interesting - might be good to see your test script too (so we can 
better understand how you are deciding if the runs are successful or 
not).






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication - still unstable after all these months

2017-05-27 Thread Mark Kirkwood
Interesting - might be good to see your test script too (so we can 
better understand how you are deciding if the runs are successful or not).



Also, any idea which rows are different? If you want something out of 
the box that will do that for you see DBIx::Compare.


regards

Mark


On 28/05/17 04:12, Erik Rijkers wrote:


ok, ok...

( The thing is, I am trying to pre-digest the output but it takes time )

I can do this now: attached some output that belongs with this group 
of 100  1-minute runs:


-- out_20170525_1426.txt
100 -- pgbench -c 64 -j 8 -T 60 -P 12 -n   --  scale 25
 82 -- All is well.
 18 -- Not good.

That is the worst set of runs of what I showed earlier.

that is:  out_20170525_1426.txt  and
2x18 logfiles that the 18 failed runs produced.
Those logfiles have names like:
logrep.20170525_1426.1436.1.scale_25.clients_64.NOK.log
logrep.20170525_1426.1436.2.scale_25.clients_64.NOK.log

.1.=primary
.2.=replica

Please disregard the errors around pg_current_wal_location().  (it was 
caused by some code to dump some wal into zipfiles which obviously 
stopped working after the function was removed/renamed) There are also 
some uninportant errors from the test-harness where I call with the 
wrong port.  Not interesting, I don't think.







--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Channel binding support for SCRAM-SHA-256

2017-05-27 Thread Álvaro Hernández Tortosa


[cross-posting to pgjdbc]


On 25/05/17 17:17, Michael Paquier wrote:

Hi all,

Please find attached a patch to add support for channel binding for
SCRAM, to mitigate MITM attacks using this protocol. Per RFC5802
(https://tools.ietf.org/html/rfc5802), servers supporting channel
binding need to add support for tls-unique, and this is what this
patch does.


This is awesome, Michael :) Thank you! You have prevented me 
eventually writing the patch and having then PostgreSQL source tainted 
with Java-to-C "transpiled" code ^_^


As defined in RFC5056 (exactly here =>
https://tools.ietf.org/html/rfc5056#section-4.1), servers can use the
TLS finish message to determine if the client is actually the same as
the one who initiated the connection, to eliminate the risk of MITM
attacks. OpenSSL offers two undocumented APIs for this purpose:
- SSL_get_finished() to get the latest TLS finish message that the
client has sent.
- SSL_get_peer_finished(), to get the latest TLS finish message
expected by the server.
So basically what is needed here is saving the latest message
generated by client in libpq using get_finished(), send it to the
server using the SASL exchange messages (value sent with the final
client message), and then compare it with what the server expected.
Connection is successful if what the client has sent and what the
server expects match.

There is also a clear documentation about what is expected from the
client and the server in terms of how both should behave using the
first client message https://tools.ietf.org/html/rfc5802#section-6. So
I have tried to follow it, reviews are welcome particularly regarding
that. The implementation is done in such a way that channel binding is
used in the context of an SSL connection, which is what the RFCs
expect from an implementation.


After a deeper analysis, I have some concerns/comments here:

- tls-unique, as you mentioned, uses two undocumented APIs. This raises 
a small flag about the stability and future of those APIs.


- More importantly, some drivers (not libpq-based) may have unexpected 
difficulties implementing tls-unique. In particular, there is not an API 
in Java to get the finished message. I expect other languages to maybe 
have similar limitations. For Java I see two options:
* Also implement tls-server-end-point, which rather relies on the 
server certificate. This information seems to be exposed as part of the 
Java APIs: 
https://docs.oracle.com/javase/8/docs/api/java/security/cert/Certificate.html#getEncoded--
* Use the BouncyCastle library (http://bouncycastle.org/), which 
explicitly supports tls-unique 
(https://www.bouncycastle.org/docs/pkixdocs1.5on/org/bouncycastle/est/TLSUniqueProvider.html#getTLSUnique() 
, 
https://www.bouncycastle.org/docs/tlsdocs1.5on/org/bouncycastle/tls/ChannelBinding.html#tls_unique 
). This would require, however, non-trivial changes to the driver and, I 
expect, a lot of extra effort.


- It seems to me that tls-unique might be a bit fragile. In particular, 
it requires the server to be aware of TSL renegotiations and avoid 
performing one while the SCRAM authentication is being performed. I 
don't know if this is already guaranteed by the current patch, but it 
seems to me it requires complex interactions between levels of 
abstraction that are quite separate (SSL and SCRAM). This is explained 
by the RFC:


"
server applications MUST NOT request TLS renegotiation during phases of 
the application protocol during which application-layer authentication 
occurs

"
(https://tools.ietf.org/html/rfc5929#section-3.1)


Based on this facts, I'd suggest to either implement 
tls-server-end-point or implement both tls-server-end-point and 
tls-unique. The latter would require a more complete protocol message to 
advertise separately SCRAM mechanisms and channel binding names. One of 
such structures could be the one I suggested earlier: 
https://www.postgresql.org/message-id/df8c6e27-4d8e-5281-96e5-131a4e638...@8kdata.com





Before even that, the server needs to send to the client the list of
SASL mechanisms that are supported. This adds SCRAM-SHA-256-PLUS if
the server has been built with SSL support to the list.


And I'd say the list of channel binding names supported.


Álvaro


--

Álvaro Hernández Tortosa


---
<8K>data



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Surjective functional indexes

2017-05-27 Thread Peter Eisentraut
On 5/25/17 12:30, Konstantin Knizhnik wrote:
> Functions like (info->>'name') are named "surjective" ni mathematics.

A surjective function is one where each value in the output type can be
obtained by some input value.  That's not what you are after here.  The
behavior you are describing is a not-injective function.

I think you are right that in practice most functions are not injective.
 But I think there is still quite some difference between a function
like the one you showed that selects a component from a composite data
structure and, for example, round(), where in practice any update is
likely to change the result of the function.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extra Vietnamese unaccent rules

2017-05-27 Thread Kha Nguyen
Does this mean that the python script has to be updated to be recursive too?

> On 27 May 2017, at 0.48, Thomas Munro  wrote:
> 
> On Sat, May 27, 2017 at 9:09 AM, Kha Nguyen  wrote:
>> Could you explain to me what this line means:
>> “
>> 1EA5;LATIN SMALL LETTER A WITH CIRCUMFLEX AND ACUTE;Ll;0;L;00E2
>> 0301N;;;1EA4;;1EA4
>> “
>> 
>> If you could give me an example of adding a rule for “recursive” case, I can 
>> do the rest. I am not familiar with this unaccent format generation yet.
> 
> So contrib/unaccent/generate_unaccent_rules.py is a Python script that
> takes UnicodeData.txt, a list of information about all Unicode
> codepoints available at a URL that is shown in a comment, and
> generates unaccent.rules.  The idea was to avoid having to change it
> manually every time someone finds characters that should be in there
> (as you have just done!) by doing it systematically.
> 
> Unicode has two ways to represent characters with accents: either with
> composed codepoints like "é" or decomposed codepoints where you say
> "e" and then "´".  The field "00E2 0301" is the decomposed form of
> that character above.  Our job here is to identify the basic letter
> that each composed character contains, by analysing the decomposed
> field that you see in that line.  I failed to realise that characters
> with TWO accents are described as a composed character with ONE accent
> plus another accent.
> 
> You don't have to worry about decoding that line, it's all done in
> that Python script.  The problem is just in the function
> is_letter_with_marks().  Instead of just checking if combining_ids[0]
> is a plain letter, it looks like it should also check if
> combining_ids[0] itself is a letter with marks.  Also get_plain_letter
> would need to be able to recurse to extract the "a".
> 
> I hope that helps!
> 
> -- 
> Thomas Munro
> http://www.enterprisedb.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extra Vietnamese unaccent rules

2017-05-27 Thread Kha Nguyen
Could you explain to me what this line means:
“
1EA5;LATIN SMALL LETTER A WITH CIRCUMFLEX AND ACUTE;Ll;0;L;00E2
0301N;;;1EA4;;1EA4
“

If you could give me an example of adding a rule for “recursive” case, I can do 
the rest. I am not familiar with this unaccent format generation yet.

Thanks
Kha

> On 26 May 2017, at 21.19, Thomas Munro  wrote:
> 
> On Sat, May 27, 2017 at 5:13 AM, Tom Lane  wrote:
>> I wrote:
>>> Nguyen Le Hoang Kha  writes:
 Most of the time in Vietnamese language, there are up to 2 accents in a
 character. These unaccent rules are added to handle such cases (which are
 very common).
>> 
>>> I can't see any reason not to add these --- any objections out there?
>> 
>> Oh, wait a minute.  Patching unaccent.rules directly isn't the way
>> to do this; that file is supposed to be generated by
>> generate_unaccent_rules.py.  Can you see how to modify that script
>> to produce these rules?
> 
> Looking at one example from this patch:
> 
> UTF8: 
> Codepoint: 1EA5
> Name: LATIN SMALL LETTER A WITH CIRCUMFLEX AND ACUTE
> 
> In UnicodData.txt it's this line:
> 
> 1EA5;LATIN SMALL LETTER A WITH CIRCUMFLEX AND ACUTE;Ll;0;L;00E2
> 0301N;;;1EA4;;1EA4
> 
> The problem is that generate_unaccent_rules.py assumes that the
> composing data is a plain letter followed by some number of
> diacritical modifiers.  That's true for the characters with a single
> accent, but in this multi-accent case it's *composed* character 00E2
> (LATIN SMALL LETTER A WITH CIRCUMFLEX) and a diacritical marker 0301
> (COMBINING ACCENT ACUTE).  So we need to teach it to be recursive.
> 
> -- 
> Thomas Munro
> http://www.enterprisedb.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Broken hint bits (freeze)

2017-05-27 Thread Vladimir Borodin

> 27 мая 2017 г., в 19:56, Andres Freund  написал(а):
> 
> On 2017-05-27 19:48:24 +0300, Vladimir Borodin wrote:
>> Well, actually clean shutdown of master with exit code 0 from `pg_ctl
>> stop -m fast` guarantees that all WAL has been replicated to standby.
> 
> It does not.  It makes it likely, but the connection to the standby
> could be not up just then, you could run into walsender timeout, and a
> bunch of other scenarios.

AFAIK in this case exit code would not be zero. Even if archiver has not been 
able to archive all WALs before timeout for shutting down happened, exit code 
will not be zero.

> 
> 
>> But just in case we also check that "Latest checkpoint's REDO
>> location" from control file on old master after shutdown is less than
>> pg_last_xlog_replay_location() on standby to be promoted.
> 
> The *redo* location? Or the checkpoint location itself?  Because the
> latter is what needs to be *equal* than the replay location not less
> than.  Normally there won't be other records inbetween, but that's not
> guaranteed.

I've asked about it some time ago [1]. In that case checkpoint location and 
redo location were equal after shutdown and last replay location on standby was 
higher on 104 bytes (the size of shutdown checkpoint record).

But we do check exactly redo location. Should we change it for checking 
checkpoint location?

[1] 
https://www.postgresql.org/message-id/A7683985-2EC2-40AD-AAAC-B44BD0F29723%40simply.name

> 
> 
>> And if something would go wrong in above logic, postgres will not let you 
>> attach old master as a standby of new master. So it is highly probable not a 
>> setup problem.
> 
> There's no such guarantee.  There's a bunch of checks that'll somewhat
> likely trigger, but nothing more than that.
> 
> - Andres


--
May the force be with you…
https://simply.name



Re: [HACKERS] Broken hint bits (freeze)

2017-05-27 Thread Andres Freund
On 2017-05-27 19:48:24 +0300, Vladimir Borodin wrote:
> Well, actually clean shutdown of master with exit code 0 from `pg_ctl
> stop -m fast` guarantees that all WAL has been replicated to standby.

It does not.  It makes it likely, but the connection to the standby
could be not up just then, you could run into walsender timeout, and a
bunch of other scenarios.


> But just in case we also check that "Latest checkpoint's REDO
> location" from control file on old master after shutdown is less than
> pg_last_xlog_replay_location() on standby to be promoted.

The *redo* location? Or the checkpoint location itself?  Because the
latter is what needs to be *equal* than the replay location not less
than.  Normally there won't be other records inbetween, but that's not
guaranteed.


> And if something would go wrong in above logic, postgres will not let you 
> attach old master as a standby of new master. So it is highly probable not a 
> setup problem.

There's no such guarantee.  There's a bunch of checks that'll somewhat
likely trigger, but nothing more than that.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Broken hint bits (freeze)

2017-05-27 Thread Vladimir Borodin

> 26 мая 2017 г., в 21:39, Amit Kapila  написал(а):
> 
>> And LSN on replica is greater that LSN on master (838D/C4A0D280 > 
>> 8092/6A26DD08)
>> How can this be possible?
>> 
> 
> Yeah, I think this is quite suspicious.  This seems to indicate that
> not all WAL records are replicated before the switchover.  What is the
> value of "synchronous_commit" you are using?

synchronous_commit = on.

>  I think you somehow need
> to ensure before switchover that all the WAL is replicated to ensure
> this is not a setup problem.

Well, actually clean shutdown of master with exit code 0 from `pg_ctl stop -m 
fast` guarantees that all WAL has been replicated to standby. But just in case 
we also check that "Latest checkpoint's REDO location" from control file on old 
master after shutdown is less than pg_last_xlog_replay_location() on standby to 
be promoted.

And if something would go wrong in above logic, postgres will not let you 
attach old master as a standby of new master. So it is highly probable not a 
setup problem.

--
May the force be with you…
https://simply.name



Re: [HACKERS] logical replication - still unstable after all these months

2017-05-27 Thread Erik Rijkers

On 2017-05-27 17:11, Andres Freund wrote:
On May 27, 2017 6:13:19 AM EDT, Simon Riggs  
wrote:

On 27 May 2017 at 09:44, Erik Rijkers  wrote:


I am very curious at your results.


We take your bug report on good faith, but we still haven't seen
details of the problem or how to recreate it.

Please post some details. Thanks.


?


ok, ok...

( The thing is, I am trying to pre-digest the output but it takes time )

I can do this now: attached some output that belongs with this group of 
100  1-minute runs:


-- out_20170525_1426.txt
100 -- pgbench -c 64 -j 8 -T 60 -P 12 -n   --  scale 25
 82 -- All is well.
 18 -- Not good.

That is the worst set of runs of what I showed earlier.

that is:  out_20170525_1426.txt  and
2x18 logfiles that the 18 failed runs produced.
Those logfiles have names like:
logrep.20170525_1426.1436.1.scale_25.clients_64.NOK.log
logrep.20170525_1426.1436.2.scale_25.clients_64.NOK.log

.1.=primary
.2.=replica

Please disregard the errors around pg_current_wal_location().  (it was 
caused by some code to dump some wal into zipfiles which obviously 
stopped working after the function was removed/renamed)  There are also 
some uninportant errors from the test-harness where I call with the 
wrong port.  Not interesting, I don't think.


sent_20170527_1745.tar.bz2
Description: BZip2 compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication busy-waiting on a lock

2017-05-27 Thread Andres Freund


On May 27, 2017 9:48:22 AM EDT, Petr Jelinek  
wrote:
>Actually, I guess it's the pid 47457 (COPY process) who is actually
>running the xid 73322726. In that case that's the same thing Masahiko
>Sawada reported [1]. Which basically is result of snapshot builder
>waiting for transaction to finish, that's normal if there is a long
>transaction running when the snapshot is being created (and the COPY is
>a long transaction).

Hm.  I suspect the issue is that the exported snapshot needs an xid for some 
crosscheck, and that's what we're waiting for.  Could you check what happens if 
you don't assign one and just content the error checks out?   Not at my 
computer, just theorizing.

Andres

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication - still unstable after all these months

2017-05-27 Thread Andres Freund


On May 27, 2017 6:13:19 AM EDT, Simon Riggs  wrote:
>On 27 May 2017 at 09:44, Erik Rijkers  wrote:
>
>> I am very curious at your results.
>
>We take your bug report on good faith, but we still haven't seen
>details of the problem or how to recreate it.
>
>Please post some details. Thanks.

?
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Regarding Postgres Dynamic Shared Memory (DSA)

2017-05-27 Thread Thomas Munro
On Tue, May 23, 2017 at 10:42 PM, Mahi Gurram  wrote:
> Hello everyone,
>
> I'm building In-Memory index extension for Postgres, for which i'm trying to
> use DSA. But ended with some issues, as it is not allowing me to create
> DSA(Dynamic Shared Area) in _PG_init function.
>
> Please refer my_PG_init code below:
>>
>> void
>> _PG_init(void)
>> {
>> area = dsa_create(LWLockNewTrancheId(), "CustomIndex_DSA");
>> area_handle = dsa_get_handle(area);
>> }
>
>
> Because of this code, Postgres is not starting. Not even giving any error
> messages in pg logs. Hence, i'm totally clue less :(
>
>
> Please let me know how to proceed. Your help is highly appreciated.

Hi Mahi

If your plan is to write a preloaded library, then I think your
_PG_init() function needs to register a callback with
shmem_startup_hook.  See pgss_shmem_startup for an example.  You may
need to set up a piece of traditional named shared memory that
backends can use to find either (1) the handle for your DSA area or
(2) your DSA area itself (if you use the 'in place' constructor), and
also allocate and share a lock tranche number.

Another approach would be to create the DSA area on demand (ie the
first time you need it for your new index feature), if you don't want
to have to preload the library, but there is a small problem with
that, at least in theory.  You probably still need to use a small bit
of named traditional shmem for discovery purposes, and it's slightly
against the rules to do that when you haven't called
RequestAddinShmemSpace, and it's too late to do that.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication busy-waiting on a lock

2017-05-27 Thread Petr Jelinek
On 27/05/17 15:44, Petr Jelinek wrote:
> On 27/05/17 01:25, Jeff Janes wrote:
>> When I create a subscription in the disabled state, and then later doing
>> "alter subscription sub enable;", on the master I sometimes get a tight
>> loop of the deadlock detector:
>>
>> (log_lock_waits is on, of course)
>>
>> deadlock_timeout is set to 1s, so I don't know why it seems to be
>> running several times per millisecond.
>>
>> .
>>
>> And so on out to "after 9616.814", when it finally acquires the lock.
>>
>> The other process, 47457, is doing the initial COPY of another table as
>> part of the same publisher/subscriber set.
>>
> 
> We lock wait for running transactions in snapshot builder while the
> snapshot is being built so I guess that's what you are seeing. I am not
> quite sure why the snapshot builder would hold the xid lock for
> prolonged period of time though, the XactLockTableWait releases the lock
> immediately after acquiring it. In fact AFAICS everything that acquires
> ShareLock on xid releases it immediately after acquiring as it's only
> used for waits.
> 

Actually, I guess it's the pid 47457 (COPY process) who is actually
running the xid 73322726. In that case that's the same thing Masahiko
Sawada reported [1]. Which basically is result of snapshot builder
waiting for transaction to finish, that's normal if there is a long
transaction running when the snapshot is being created (and the COPY is
a long transaction).

[1]
https://www.postgresql.org/message-id/flat/CAD21AoC2KJdavS7MFffmSsRc1dn3Vg_0xmuc%3DUpBrZ-_MUxh-Q%40mail.gmail.com

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication busy-waiting on a lock

2017-05-27 Thread Petr Jelinek
On 27/05/17 01:25, Jeff Janes wrote:
> When I create a subscription in the disabled state, and then later doing
> "alter subscription sub enable;", on the master I sometimes get a tight
> loop of the deadlock detector:
> 
> (log_lock_waits is on, of course)
> 
> deadlock_timeout is set to 1s, so I don't know why it seems to be
> running several times per millisecond.
> 
> .
> 
> And so on out to "after 9616.814", when it finally acquires the lock.
> 
> The other process, 47457, is doing the initial COPY of another table as
> part of the same publisher/subscriber set.
> 

We lock wait for running transactions in snapshot builder while the
snapshot is being built so I guess that's what you are seeing. I am not
quite sure why the snapshot builder would hold the xid lock for
prolonged period of time though, the XactLockTableWait releases the lock
immediately after acquiring it. In fact AFAICS everything that acquires
ShareLock on xid releases it immediately after acquiring as it's only
used for waits.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER SUBSCRIPTION ..SET PUBLICATION refresh is not throwing error.

2017-05-27 Thread Petr Jelinek
On 27/05/17 04:00, Euler Taveira wrote:
> 2017-05-26 21:29 GMT-03:00 Petr Jelinek  >:
> 
> 
> Actually another possibility would be to remove the REFRESH keyword
> completely and just have [ WITH (...) ] and have the refresh option
> there, ie simplified version of what you have suggested (without the
> ugliness of specifying refresh twice to disable).
> 
> 
> It will cause confusion. It seems that WITH sets ALTER SUBSCRIPTION
> properties. Indeed, they are REFRESH properties. I think we shouldn't
> exclude REFRESH keyword. Syntax leaves no doubt that WITH are REFRESH
> properties.
> 

Maybe, I don't know, it might not be that confusing when SET PUBLICATION
and REFRESH PUBLICATION have same set of WITH options.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication - still unstable after all these months

2017-05-27 Thread Simon Riggs
On 27 May 2017 at 09:44, Erik Rijkers  wrote:

> I am very curious at your results.

We take your bug report on good faith, but we still haven't seen
details of the problem or how to recreate it.

Please post some details. Thanks.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication - still unstable after all these months

2017-05-27 Thread Erik Rijkers

On 2017-05-27 10:30, Erik Rijkers wrote:

On 2017-05-27 01:35, Mark Kirkwood wrote:



Here is what I have:

instances.sh:
testset.sh
pgbench_derail2.sh
pubsub.sh



To be clear:

( Apart from that standalone call like
./pgbench_derail2.sh $scale $clients $duration $date_str
)

I normally run by editing the parameters in testset.sh, then run:

./testset.sh

that then shows a tail -F of the output-logfile (to paste into another 
screen).


in yet another screen the 'watch -n20 results.sh' line

The output=files are the .txt files.
The logfiles of the instances are (at the end of each test) copied to 
directory  logfiles/
under a meaningful name that shows the parameters, and with an extension 
like '.ok.log' or '.NOK.log'.


I am very curious at your results.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication - still unstable after all these months

2017-05-27 Thread Erik Rijkers

On 2017-05-27 01:35, Mark Kirkwood wrote:

On 26/05/17 20:09, Erik Rijkers wrote:



The idea is simple enough:

startup instance1
startup instance2 (on same machine)
primary: init pgbench tables
primary: add primary key to pgbench_history
copy empty tables to replica by dump/restore
primary: start publication
replica: start subscription
primary: run 1-minute pgbench
wait till the 4 md5's of primary pgbench tables
  are the same as the 4 md5's of replica pgbench
  tables (this will need a time-out).
log 'ok' or 'not ok'
primary: clean up publication
replica: clean up subscription
shutdown primary
shutdown replica

this whole thing 100x



Here is what I have:

instances.sh:
  starts up 2 assert enabled sessions

instances_fast.sh:
  alternative to instances.sh
  starts up 2 assert disabled 'fast' sessions

testset.sh
  loop to call pgbench_derail2.sh with varying params

pgbench_derail2.sh
  main test program
  can be called 'standalone'
./pgbench_derail2.sh $scale $clients $duration $date_str
  so for instance this should work:
./pgbench_derail2.sh 25 64 60 20170527_1019
  to remove publication and subscription from sessions, add a 5th 
parameter 'clean'

./pgbench_derail2.sh 1 1 1 1 'clean'

pubsub.sh
  displays replication state. also called by pgbench_derail2.sh
  must be in path

result.sh
  display results
  I keep this in a screen-session as:
  watch -n 20 './result.sh 201705'


Peculiar to my setup also:
  server version at compile time stamped with date + commit hash
  I misuse information_schema.sql_packages  at compile time to store 
patch information

  instances are in  $pg_stuff_dir/pg_installations/pgsql.

So you'll have to outcomment a line here and there, and adapt paths, 
ports, and things like that.


It's a bit messy, I should have used perl from the beginning...

Good luck :)

Erik Rijkers







#!/bin/sh

#assertions on  in $pg_stuff_dir/pg_installations/pgsql./bin
#assertions off in $pg_stuff_dir/pg_installations/pgsql./bin.fast

port1=6972 project1=logical_replication
port2=6973 project2=logical_replication2
pg_stuff_dir=$HOME/pg_stuff
PATH1=$pg_stuff_dir/pg_installations/pgsql.$project1/bin:$PATH
PATH2=$pg_stuff_dir/pg_installations/pgsql.$project2/bin:$PATH
server_dir1=$pg_stuff_dir/pg_installations/pgsql.$project1
server_dir2=$pg_stuff_dir/pg_installations/pgsql.$project2
data_dir1=$server_dir1/data
data_dir2=$server_dir2/data

options1="
-c wal_level=logical
-c max_replication_slots=10
-c max_worker_processes=12
-c max_logical_replication_workers=10
-c max_wal_senders=10
-c logging_collector=on
-c log_directory=$server_dir1
-c log_filename=logfile.${project1}
-c log_replication_commands=on "

# -c wal_sender_timeout=18
# -c client_min_messages=DEBUG1 "
# -c log_connections=on
# -c max_sync_workers_per_subscription=6

options2="
-c wal_level=replica
-c max_replication_slots=10
-c max_worker_processes=12
-c max_logical_replication_workers=10
-c max_wal_senders=10
-c logging_collector=on
-c log_directory=$server_dir2
-c log_filename=logfile.${project2}
-c log_replication_commands=on "

# -c wal_sender_timeout=18
# -c client_min_messages=DEBUG1 "
# -c log_connections=on
# -c max_sync_workers_per_subscription=6

export PATH=$PATH1; export PG=$( which postgres ); $PG -D $data_dir1 -p $port1 ${options1} &
sleep 1
export PATH=$PATH2; export PG=$( which postgres ); $PG -D $data_dir2 -p $port2 ${options2} &
sleep 1
#!/bin/sh

#assertions on  in $pg_stuff_dir/pg_installations/pgsql./bin
#assertions off in $pg_stuff_dir/pg_installations/pgsql./bin.fast

port1=6972 project1=logical_replication
port2=6973 project2=logical_replication2
pg_stuff_dir=$HOME/pg_stuff
PATH1=$pg_stuff_dir/pg_installations/pgsql.$project1/bin.fast:$PATH
PATH2=$pg_stuff_dir/pg_installations/pgsql.$project2/bin.fast:$PATH
server_dir1=$pg_stuff_dir/pg_installations/pgsql.$project1
server_dir2=$pg_stuff_dir/pg_installations/pgsql.$project2
data_dir1=$server_dir1/data
data_dir2=$server_dir2/data
options1="
-c wal_level=logical
-c max_replication_slots=10
-c max_worker_processes=12
-c max_logical_replication_workers=10
-c max_wal_senders=14
-c wal_sender_timeout=18
-c logging_collector=on
-c log_directory=$server_dir1
-c log_filename=logfile.${project1} 
-c log_replication_commands=on "

options2="
-c wal_level=replica
-c max_replication_slots=10
-c max_worker_processes=12
-c max_logical_replication_workers=10
-c max_wal_senders=14
-c wal_sender_timeout=18
-c logging_collector=on
-c log_directory=$server_dir2
-c log_filename=logfile.${project2} 
-c log_replication_commands=on "

export PATH=$PATH1; PG=$(which postgres); $PG -D $data_dir1 -p $port1 ${options1} &
export PATH=$PATH2; PG=$(which postgres); $PG -D $data_dir2 -p $port2 ${options2} &

#!/bin/bash
pg_stuff_dir=$HOME/pg_stuff
port1=6972 project1=logical_replication
port2=6973 project2=logical_replication2
db=testdb
rc=0
duration=60
while [[ $rc -eq 0 ]]
do
for scale in 25 5
do
  for clients in 90 64 8
  do