Re: Add "password_protocol" connection parameter to libpq

2019-08-09 Thread Jeff Davis
On Sat, 2019-08-10 at 10:24 +0800, Craig Ringer wrote:
> Before we go too far along with this, let's look at how other
> established protocols do things and the flaws that've been discovered
> in their approaches. If this isn't done with extreme care then
> there's a large risk of negating the benefits offered by adopting
> recent things like SCRAM.

Agreed. I'm happy to hear any proposals better informed by history.

> Frankly I kind of wish we could just use SASL, but there are many
> (many) reasons not to.

I'm curious what the reasons are not to use SASL; do you have a
reference?

> Off the top of my head I can think of these risks:
> 
> * Protocols that allow naïve pre-auth client/server auth negotiation
> (e.g. by finding the overlap in exchanged supported auth-mode lists)
> are subject to MiTM downgrade attacks where the attacker filters out
> protocols it cannot intercept and break from the proposed
> alternatives.

We already have the downgrade problem. That's what I'm trying to solve.

> * Protocols that specify a hierarchy tend to be inflexible and result
> in hard to predict auth mode selections as the options grow. If my
> app wants GSSAPI or SuperStrongAuth but doesn't accept SSPI, and the
> hierarchy is  GSSAPI > SSPI > SuperStrongAuth, it has to fall back to
> a disconnect and retry model like now.

What do you mean "disconnect and retry model"?

I agree that hierarchies are unwieldy as the options grow. Then again,
as options grow, we need new versions of the client to support them,
and those new versions might offer more flexible ways to choose between
them.

Of course, we should try to think ahead to avoid needing to constantly
change client connection syntax, but I'm just pointing out that it's
not a one-way door.

> * Protocols that announce supported auth methods before any kind of
> trust is established make life easier for vulnerability scanners and
> worms

This discussion is about the client so I don't see how vulnerability
scanners are relevant.

Regards,
Jeff Davis







Re: POC: converting Lists into arrays

2019-08-09 Thread David Rowley
On Sat, 10 Aug 2019 at 09:03, Tom Lane  wrote:
>
> David Rowley  writes:
> > On Fri, 9 Aug 2019 at 09:55, Tom Lane  wrote:
> >> I still have hopes for getting rid of es_range_table_array though,
> >> and will look at that tomorrow or so.
>
> > Yes, please. I've measured that to be quite an overhead with large
> > partitioning setups. However, that was with some additional code which
> > didn't lock partitions until it was ... well  too late... as it
> > turned out. But it seems pretty good to remove code that could be a
> > future bottleneck if we ever manage to do something else with the
> > locking of all partitions during UPDATE/DELETE.
>
> I poked at this, and attached is a patch, but again I'm not seeing
> that there's any real performance-based argument for it.  So far
> as I can tell, if we've got a lot of RTEs in an executable plan,
> the bulk of the startup time is going into lock (re) acquisition in
> AcquirePlannerLocks, and/or permissions scanning in ExecCheckRTPerms;
> both of those have to do work for every RTE including ones that
> run-time pruning drops later on.  ExecInitRangeTable just isn't on
> the radar.

In the code I tested with locally I ended up with a Bitmapset that
marked which RTEs required permission checks so that
ExecCheckRTPerms() could quickly skip RTEs with requiredPerms == 0.
The Bitmapset was set in the planner. Note:
expand_single_inheritance_child sets childrte->requiredPerms = 0, so
there's nothing to do there for partitions, which is the most likely
reason that the rtable list would be big.  Sadly the locking is still
a big overhead even with that fixed. Robert threw around some ideas in
[1], but that seems like a pretty big project.
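
A minimal sketch of that idea, assuming a planner-built Bitmapset of the
RT indexes that actually need checking (the names here are invented for
illustration, not the code I tested):

#include "postgres.h"

#include "nodes/bitmapset.h"
#include "nodes/parsenodes.h"
#include "parser/parsetree.h"

/* the existing per-RTE check; today it is a static helper in execMain.c */
bool ExecCheckRTEPerms(RangeTblEntry *rte);

/*
 * Check permissions only for the RTEs the planner flagged, skipping the
 * (potentially thousands of) partition RTEs with requiredPerms == 0.
 */
bool
ExecCheckRTPermsFiltered(List *rangeTable, Bitmapset *perm_check_rtis)
{
    int         rti = -1;

    while ((rti = bms_next_member(perm_check_rtis, rti)) >= 0)
    {
        RangeTblEntry *rte = rt_fetch(rti, rangeTable);

        Assert(rte->requiredPerms != 0);    /* planner filtered the rest */
        if (!ExecCheckRTEPerms(rte))
            return false;       /* caller reports the ACL error */
    }
    return true;
}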

I don't think removing future bottlenecks is such a bad idea if it can
be done in such a way that the code remains clean.  It may serve to
increase our motivation later to solve the remaining issues. We tend
to go to greater lengths when there are more gains, and the gains
become easier to see as more bottlenecks are removed.

Another reason to remove the es_range_table_array is that the reason
it was added in the first place is no longer valid.  We'd never have
added it if we had array-based lists back then. (Reading below, it
looks like Alvaro agrees with this too)

[1] 
https://www.postgresql.org/message-id/CA%2BTgmoYbtm1uuDne3rRp_uNA2RFiBwXX1ngj3RSLxOfc3oS7cQ%40mail.gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Locale support

2019-08-09 Thread Thomas Munro
On Sat, Aug 10, 2019 at 11:50 AM Thomas Munro  wrote:
> On Fri, Aug 9, 2019 at 6:15 PM Yonatan Misgan  wrote:
> > Can I implement it as a locale support? When the user wants to change
> > lc_time = am_ET (Amharic, Ethiopia), the date and time representation
> > of the database system would be in the Ethiopian calendar.
>
> I'm not an expert in this stuff, but it seems to me that both the
> operating system and the date/time localisation code in PostgreSQL use
> Gregorian calendar logic, even though they can use local language
> names for them.  That is, they allow you to display the French, Greek,
> Ethiopian ... names for the Gregorian months, but not a completely
> different calendar system.  For example, according to my operating
> system:
> ...

Reading about that led me to the ICU model of calendars.  Here's the C
API (there are C++ and Java APIs too, but to maximise the possibility
of producing something that could ever be part of core PostgreSQL, I'd
stick to pure C):

http://userguide.icu-project.org/datetime/calendar
http://icu-project.org/apiref/icu4c/ucal_8h.html
http://icu-project.org/apiref/icu4c/udat_8h.html

It does in fact work using locales to select calendars as you were
suggesting (unlike POSIX, or at least the POSIX systems I tried), and
it knows that am_ET is associated with the Ethiopic calendar, as
you were suggesting (and likewise for nearly a dozen other calendars).
When you call ucal_open() you have to say whether you want
UCAL_TRADITIONAL (the traditional calendar associated with the locale)
or UCAL_GREGORIAN.  In the usual ICU fashion it lets you mix and match
more explicitly, so you can use locale names like
"fr_FR@calendar=buddhist".  Then you can use the udat.h stuff to format
dates and so forth.
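
To make that concrete, here's a minimal standalone sketch (plain C, not
PostgreSQL code) that formats today's date using the Ethiopic calendar
selected through the locale string; error handling is kept to a bare
minimum:

/*
 * Minimal ICU calendar demo.  Build with something like:
 *   cc icu_demo.c $(pkg-config --cflags --libs icu-i18n icu-uc)
 */
#include <stdio.h>

#include <unicode/ucal.h>
#include <unicode/udat.h>
#include <unicode/ustring.h>

int
main(void)
{
    UErrorCode  status = U_ZERO_ERROR;
    const char *locale = "am_ET@calendar=ethiopic";  /* or fr_FR@calendar=buddhist */
    UChar       ubuf[128];
    char        out[256];

    /* udat_open() picks the calendar from the locale's @calendar keyword */
    UDateFormat *fmt = udat_open(UDAT_NONE, UDAT_FULL, locale,
                                 NULL, -1,   /* default time zone */
                                 NULL, -1,   /* no custom pattern */
                                 &status);

    udat_format(fmt, ucal_getNow(), ubuf, 128, NULL, &status);
    u_austrcpy(out, ubuf);      /* UTF-16 -> default codepage, fine for a demo */

    if (U_FAILURE(status))
        fprintf(stderr, "ICU error: %s\n", u_errorName(status));
    else
        printf("%s\n", out);

    udat_close(fmt);
    return 0;
}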

So now I'm wondering if the best idea would be to write an extension
that provides a handful of carefully crafted functions to expose the
ICU calendar date/time formatting/parsing/manipulation stuff to SQL.
I think that should be doable in a way that is not specific to
Ethiopic, so that users of Indian, Persian, Islamic etc calendars can
also benefit from this.  I'm not sure if it should use LC_TIME --
possibly not, because that would interfere with libc-based stuff and
the locale names don't match up; that might only make sense if this
facility completely replaced the built in date/time stuff, which seems
unlikely.  Perhaps you'd just want to pass in the complete ICU locale
string to each of the functions, to_char_icu(current_date, 'Month',
'fr_FR@calendar=buddhist'), or perhaps you'd want a GUC to control it
(extensions can have their own GUCs).  I'm not sure.

There has certainly been interest in exposing other bits of ICU to
PostgreSQL, either in core or in extensions: collations, Unicode
normalisation, and now this.  Hmm.

Another thing you might want to look into is whether the SQL standard
has anything to say about non-Gregorian calendars, and whether DB2 has
anything to support them (it certainly has ICU inside it, as does
PostgreSQL (optionally), so I wonder if they've exposed this part of
it to SQL and if so how).

-- 
Thomas Munro
https://enterprisedb.com




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-08-09 Thread Bruce Momjian
On Thu, Aug  8, 2019 at 10:17:53PM -0400, Sehrope Sarkuni wrote:
> On Thu, Aug 8, 2019 at 2:16 PM Bruce Momjian  wrote:
> 
> On Wed, Aug  7, 2019 at 08:56:18AM -0400, Sehrope Sarkuni wrote:
> > Simplest approach for derived keys would be to use immutable attributes
> of the
> > WAL files as an input to the key derivation. Something like HKDF(MDEK,
> "WAL:" |
> 
> So, I am thinking we should use "WAL:" for WAL and "REL:" for heap/index
> files.
> 
> 
> Sounds good. Any unique convention is fine. Main thing to keep in mind is that
> they're directly tied to the master key so it's not possible to rotate them
> without changing the master key.

A recent email talked about using two different encryption keys for
heap/index and WAL, which allows for future features, and allows for key
rotation of the two independently.  (I already stated how hard key
rotation would be with WAL and pg_rewind.)
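
For reference, a minimal sketch of the HKDF-style derivation with a
"WAL:" label as discussed above (a per-WAL-segment key derived from the
master key, with the label plus timeline and segment number as the HKDF
"info"), using OpenSSL 1.1.1's EVP interface.  The label, hash, and
field encoding here are illustrative assumptions, not a settled format:

#include <stdint.h>
#include <string.h>

#include <openssl/evp.h>
#include <openssl/kdf.h>

/*
 * out_key = HKDF(mdek, info = "WAL:" || timeline_id || segno)
 * A real design would pin down byte order and lengths, since both
 * sides must derive identical keys.
 */
int
derive_wal_key(const uint8_t *mdek, size_t mdek_len,
               uint32_t timeline_id, uint64_t segno,
               uint8_t *out_key, size_t out_len)
{
    EVP_PKEY_CTX *pctx = EVP_PKEY_CTX_new_id(EVP_PKEY_HKDF, NULL);
    uint8_t     info[4 + sizeof(uint32_t) + sizeof(uint64_t)] = "WAL:";
    int         ok;

    memcpy(info + 4, &timeline_id, sizeof(timeline_id));
    memcpy(info + 4 + sizeof(timeline_id), &segno, sizeof(segno));

    ok = pctx != NULL
        && EVP_PKEY_derive_init(pctx) > 0
        && EVP_PKEY_CTX_set_hkdf_md(pctx, EVP_sha256()) > 0
        && EVP_PKEY_CTX_set1_hkdf_key(pctx, mdek, mdek_len) > 0
        && EVP_PKEY_CTX_add1_hkdf_info(pctx, info, sizeof(info)) > 0
        && EVP_PKEY_derive(pctx, out_key, &out_len) > 0;

    EVP_PKEY_CTX_free(pctx);
    return ok;
}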

> This is in contrast to saving a WDEK key to a file (similar to how the MDEK 
> key
> would be saved) and unlocking it with the MDEK. That has more moving parts but
> would allow that key to be independent of the MDEK. In a later message Stephen
> refers to an example of a replica receiving encrypted WAL and applying it with
> a different MDEK for the page buffers. That's doable with an independent WDEK.

I assumed we would call a command on boot to get a passphrase, which
would unlock the encryption keys.  Is there more being described above?

> > | timeline_id || wal_segment_num) should be fine for this as it is:
> 
> I considered using the timeline in the nonce, but then remembered that
> in a timeline switch, we _copy_ the part of the old WAL up to the timeline
> switch to the new timeline;  see:
> 
>     https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/
> 
> backend/access/transam/xlog.c;h=f55352385732c6b0124eff5265462f3883fe7435;hb
> =HEAD#l5502
> 
>    * Initialize the starting WAL segment for the new timeline. If the
> switch
>    * happens in the middle of a segment, copy data from the last WAL
> segment
>    * of the old timeline up to the switch point, to the starting WAL
> segment
>    * on the new timeline.
> 
> We would need to decrypt/encrypt to do the copy, and I just wasn't sure
> of the value of the timeline in the nonce.  One value to it is that if
> there is some WAL generated after the timeline switch in the old
> primary that isn't transferred, there would potentially be new data
> encrypted with the same key/nonce in the new primary, but if that WAL is
> not used, odds are it is gone/destroyed/inaccessible, or it would have
> been used during the switchover, so it didn't seem worth worrying about.
> 
> One _big_ reason to add the timeline is if you had a standby that you
> recovered and rolled forward only to a specific transaction, then
> continued running it as a new primary.  In that case, you would have
> different WAL encrypted with the same key/nonce, but that sounds like
> the same as promoting two standbys, and we should just document not to
> do it.
> 
> Maybe we need to consider this further.
> 
> 
> Good points. Yes, anything short of generating a new key at promotion time 
> will
> have these issues. If we're not going to do that, no point in adding the
> timeline id if it does not change anything. I had thought only the combo was
> unique but sounds like the segment number is unique on its own. One thing I
> like about a unique per-file key is that it simplifies the IV generation (i.e.
> can start at zero).

Uh, well, the segment number is unique, _except_ for a timeline switch,
where the segment number exists in the old timeline file, and in the new
timeline file, though the new timeline file has more WAL because it has
writes only in the new timeline.  So, in a way, the WAL data is the same
in the old and new timeline files, e.g. 00010001 and
00020001, but 00010001 stops at the
timeline switch and 00020001 has more WAL data that
represents cluster activity since the WAL switch, though the two files
are both 16MB.

> What about discarding the rest of the WAL file at promotion and skipping to a
> new file? With a random per-file key in the first page header would ensure 
> that
> going forward all WAL data is encrypted differently. Combine that with
> independent WAL and MDEK keys and everything would be different between two
> replicas promoted from the same point.

Yes, we could do that, but I am hesitant to change the WAL format just
for this, unless there is value to the random number.  We already
discussed that the LSN could be duplicated in different heap/index
files, so we would still have issues.
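
As an aside, the "IV can start at zero" property Sehrope mentions falls
out naturally if each segment gets its own key: with AES-CTR the counter
block can simply track the block offset within the segment.  Purely an
illustration of that point, not a proposed on-disk format:

#include <stdint.h>
#include <string.h>

#define ENCRYPTION_BLOCK 16     /* AES block size */

/*
 * Build the 16-byte AES-CTR counter block for a byte offset within a
 * WAL segment, assuming a unique per-segment key.  High 8 bytes are
 * zero, low 8 bytes are the big-endian block number, so the counter
 * starts at zero at the beginning of each segment.
 */
static void
wal_ctr_block(uint64_t offset_in_segment, uint8_t ctr[ENCRYPTION_BLOCK])
{
    uint64_t    blockno = offset_in_segment / ENCRYPTION_BLOCK;

    memset(ctr, 0, ENCRYPTION_BLOCK);
    for (int i = 0; i < 8; i++)
        ctr[ENCRYPTION_BLOCK - 1 - i] = (uint8_t) (blockno >> (8 * i));
}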

> > A unique WDEK per WAL file that is derived from the segment number would
> not
> > have that problem. A unique key per-file means the IVs can all start at

Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-08-09 Thread Bruce Momjian
On Fri, Aug  9, 2019 at 05:01:47PM +0200, Tomas Vondra wrote:
> I know there were proposals to keep it encrypted in shared buffers, but
> I'm not sure that's what we'll end up doing (I have not followed the
> recent discussion all that closely, though).

There is no plan to encrypt shared buffers.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Add "password_protocol" connection parameter to libpq

2019-08-09 Thread Craig Ringer
On Fri, 9 Aug 2019 at 11:00, Michael Paquier  wrote:

> On Thu, Aug 08, 2019 at 03:38:20PM -0700, Jeff Davis wrote:
> > Libpq doesn't have a way to control which password protocols are used.
> > For example, the client might expect the server to be using SCRAM, but
> > it actually ends up using plain password authentication instead.
>
> Thanks for working on this!
>
> > I'm not 100% happy with the name "password_protocol", but other names I
> > could think of seemed likely to cause confusion.
>
> What about auth_protocol then?  It seems to me that it could be useful
> to have the restriction on AUTH_REQ_MD5 as well.
>
> > Sets the least-secure password protocol allowable when using password
> > authentication. Options are: "plaintext", "md5", "scram-sha-256", or
> > "scram-sha-256-plus".
>
> This makes it sound like there is a linear hierarchy among all those
> protocols, which is true in this case, but if the list of supported
> protocols is extended in the future it may be not.
>

Before we go too far along with this, let's look at how other established
protocols do things and the flaws that've been discovered in their
approaches. If this isn't done with extreme care then there's a large risk
of negating the benefits offered by adopting recent things like SCRAM.
Frankly I kind of wish we could just use SASL, but there are many (many)
reasons not to.

Off the top of my head I can think of these risks:

* Protocols that allow naïve pre-auth client/server auth negotiation (e.g.
by finding the overlap in exchanged supported auth-mode lists) are subject
to MiTM downgrade attacks where the attacker filters out protocols it
cannot intercept and break from the proposed alternatives.

* Protocols that specify a hierarchy tend to be inflexible and result in
hard to predict auth mode selections as the options grow. If my app wants
GSSAPI or SuperStrongAuth but doesn't accept SSPI, and the hierarchy is
GSSAPI > SSPI > SuperStrongAuth, it has to fall back to a disconnect and
retry model like now.

* Protocols that announce supported auth methods before any kind of trust
is established make life easier for vulnerability scanners and worms

and I'm sure there are more when it comes to auth handshakes.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-08-09 Thread Bruce Momjian
On Thu, Aug  8, 2019 at 10:34:26PM -0400, Sehrope Sarkuni wrote:
> On Thu, Aug 8, 2019 at 6:31 PM Stephen Frost  wrote:
> 
> Strictly speaking, that isn't actually crash recovery, it's physical
> replication / HA, and while those are certainly nice to have it's no
> guarantee that they're required or that you'd want to have the same keys
> for them- conceptually, at least, you could have WAL with one key that
> both sides know and then different keys for the actual data files, if we
> go with the approach where the WAL is encrypted with one key and then
> otherwise is plaintext.
> 
> 
> I like the idea of separating the WAL key from the rest of the data files. 
> It'd all be unlocked by the MDEK and you'd still need derived keys per
> WAL-file, but disconnecting all that from the data files solves a lot of the
> problems with promoted replicas.
> 
> This would complicate cloning a replica as using a different MDEK would 
> involve
> decrypting / encrypting everything rather than just copying the files. Even if
> that's not baked in a first version, the separation allows for eventually
> supporting that.

OK, I can get behind that idea.  One cool idea would be for the WAL on
primary and standbys to use the same WAL key, but to use different
heap/index keys.  When the standby is promoted, there would be a way for
the WAL to start using a new encryption key, and the heap/index would
already be using its own encryption key.

Setting up such a system seems complicated.  The big problem is that the
base backup would use the primary's key, unless we allowed pg_basebackup
to decrypt/encrypt with a new heap/index key.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-08-09 Thread Bruce Momjian
On Fri, Aug  9, 2019 at 11:51:23PM +0900, Masahiko Sawada wrote:
> I agree that cluster-wide is simpler, but I'm not sure that it meets
> the real needs of users. One example is re-encryption; when key
> leakage happens, with cluster-wide encryption we end up re-encrypting
> the whole database regardless of the amount of sensitive user data in
> the database. I think it's a big constraint for users because it's
> common that the data that needs to be encrypted, such as a master
> table, doesn't account for a large portion of the database. That's one
> reason why I think finer-granularity encryption such as
> table/tablespace is required.
> 
> And in terms of feature development, would we implement
> fine-granularity encryption in the future even if the first step is
> cluster-wide encryption? And would both TDEs encrypt the same kinds of
> database objects (i.e. only tables, indexes and WAL)? If so, how would
> users choose between them depending on the case?
> 
> I imagined the case where we had the cluster-wide encryption as the
> first TDE feature. We will enable TDE at initdb time by specifying the
> command-line parameter for TDE. Then TDE is enabled cluster-wide
> and all tables/indexes and WAL are automatically encrypted. Then, if
> we want to implement the finer-granularity encryption, how can we
> make users use it? WAL encryption and table/index encryption are
> enabled at the same time, but we want to enable encryption for
> particular tables/indexes after initdb. If the cluster-wide encryption
> is something like a short-cut for encrypting all tables/indexes, I
> personally think that implementing the finer-granularity one first
> and then using it to achieve the coarser granularity would be easier.

I don't know how to move this thread forward, so I am going to try to
explain how I view it.  People want feature X and feature Y.  Feature X
seems straight-forward to implement, use, and seems secure.  Feature Y
is none of those, so far.

When I explain that feature X is the direction we should go in for PG
13, people give more reasons they want feature Y, but don't give any
details about how the problems of implementation, use, and security can be
addressed.

I just don't see that continuing to discuss feature Y with no details on
how to implement it really helps, so I have no reply to these ideas. 
People can talk about whatever feature they want on these lists, but I
have no way to help discussions that don't address facts.

I will close with what I have already stated, that what people want, and
what we can reasonably accomplish, are two different things.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Global temporary tables

2019-08-09 Thread Craig Ringer
On Fri, 9 Aug 2019 at 22:07, Konstantin Knizhnik 
wrote:

>
>
> Ok, here it is: global_private_temp-1.patch
>

Fantastic.

I'll put that high on my queue.

I'd love to see something like this get in.

Doubly so if it brings us closer to being able to use temp tables on
physical read replicas, though I know there are plenty of other barriers
there (not least of which being temp tables using persistent txns not
vtxids)

Does it have a CF entry?


> Also I have attached updated version of the global temp tables with shared
> buffers - global_shared_temp-1.patch
>

Nice to see that split out. In addition to giving the first patch more hope
of being committed this time around, it'll help with readability and
testability too.

To be clear, I have long wanted to see PostgreSQL have the "session" state
abstraction you have implemented. I think it's really important for high
client count OLTP workloads, working with the endless collection of ORMs
out there, etc. So I'm all in favour of it in principle so long as it can
be made to work reliably with limited performance impact on existing
workloads and without making life lots harder when adding new core
functionality, for extension authors etc. The same goes for built-in
pooling. I think PostgreSQL has needed some sort of separation of
"connection", "backend", "session" and "executor" for a long time and I'm
glad to see you working on it.

With that said: How do you intend to address the likelihood that this will
cause performance regressions for existing workloads that use temp tables
*without* relying on your session state and connection pooler? Consider
workloads that use temp tables for mid-long txns where txn pooling is
unimportant, where they also do plenty of read and write activity on
persistent tables. Classic OLAP/DW stuff. e.g.:

* four clients, four backends, four connections, session-level connections
that stay busy with minimal client sleeps
* All sessions run the same bench code
* transactions all read plenty of data from a medium to large persistent
table (think fact tables, etc)
* transactions store a filtered, joined dataset with some pre-computed
window results or something in temp tables
* benchmark workload makes big-ish temp tables to store intermediate data
for its medium-length transactions
* transactions also write to some persistent relations, say to record their
summarised results

How does it perform with and without your patch? I'm concerned that:

* the extra buffer locking and various IPC may degrade performance of temp
tables
* the temp table data in shared_buffers may put pressure on shared_buffers
space and on the cached pages for persistent tables that all sessions are
sharing;
* the temp table data in shared_buffers may put pressure on shared_buffers
space for dirty buffers, forcing writes of persistent tables out earlier and
therefore reducing write-combining opportunities;


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Add "password_protocol" connection parameter to libpq

2019-08-09 Thread Stephen Frost
Greetings,

* Jeff Davis (pg...@j-davis.com) wrote:
> On Sat, 2019-08-10 at 00:17 +0300, Heikki Linnakangas wrote:
> > auth_methods = 'MITM, -password, -md5'
> 
> Keep in mind this is client configuration, so something reasonable in
> postgresql.conf might not be so reasonable in the form:

Yeah, that's a really good point.

> postgresql://foo:secret@myhost/mydb?auth_methods=MITM%2C%20-
> password%2C%20-md5
> 
> Another thing to consider is that there's less control configuring on
> the client than on the server. The server will send at most one
> authentication request based on its own rules, and all the client can
> do is either answer it, or disconnect. And the SSL stuff all happens
> before that, and won't use an authentication request message at all.

Note that GSSAPI Encryption works the same as SSL in this regard.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Add "password_protocol" connection parameter to libpq

2019-08-09 Thread Jeff Davis
On Sat, 2019-08-10 at 00:17 +0300, Heikki Linnakangas wrote:
> This is a multi-dimensional problem. "channel_binding=require" is
> one 
> way to prevent MITM attacks, but sslmode=verify-ca is another. (Does 
> Kerberos also prevent MITM?) Or you might want to enable plaintext 
> passwords over SSL, but not without SSL.
> 
> I think we'll need something like the 'ssl_ciphers' GUC, where you
> can 
> choose from a few reasonable default rules, but also enable/disable 
> specific methods:

..

> auth_methods = 'MITM, -password, -md5'

Keep in mind this is client configuration, so something reasonable in
postgresql.conf might not be so reasonable in the form:

postgresql://foo:secret@myhost/mydb?auth_methods=MITM%2C%20-
password%2C%20-md5

Another thing to consider is that there's less control configuring on
the client than on the server. The server will send at most one
authentication request based on its own rules, and all the client can
do is either answer it, or disconnect. And the SSL stuff all happens
before that, and won't use an authentication request message at all.

Some protocols allow negotiation within them, like SASL, which gives
the client a bit more freedom. But FE/BE doesn't allow for arbitrary
subsets of authentication methods to be negotiated between client and
server, so I'm worried trying to express it that way will just lead to
clients that break when you upgrade your server.

Regards,
Jeff Davis






Re: Locale support

2019-08-09 Thread Thomas Munro
On Fri, Aug 9, 2019 at 6:15 PM Yonatan Misgan  wrote:
> Can I implement it as a locale support? When the user wants to change
> lc_time = am_ET (Amharic, Ethiopia), the date and time representation of
> the database system would be in the Ethiopian calendar.

Hi Yonatan,

I'm not an expert in this stuff, but it seems to me that both the
operating system and the date/time localisation code in PostgreSQL use
Gregorian calendar logic, even though they can use local language
names for them.  That is, they allow you to display the French, Greek,
Ethiopian ... names for the Gregorian months, but not a completely
different calendar system.  For example, according to my operating
system:

$ LC_TIME=en_GB.UTF-8 date
Sat 10 Aug 2019 10:58:42 NZST
$ LC_TIME=fr_FR.UTF-8 date
sam. 10 août 2019 10:58:48 NZST
$ LC_TIME=el_GR.UTF-8 date
Σάβ 10 Αυγ 2019 10:58:51 NZST
$ LC_TIME=am_ET.UTF-8 date
ቅዳሜ ኦገስ 10 10:58:55 NZST 2019

These all say it's Saturday (ቅዳሜ) the 10th of August (ኦገስ) on the
Gregorian calendar.  Looking at POSIX date[1] you can see they
contemplated the existence of non-Gregorian calendars in a very
limited way, but no operating system I have access to does anything
other than Gregorian with %x and %c:

"The date string formatting capabilities are intended for use in
Gregorian-style calendars, possibly with a different starting year (or
years). The %x and %c conversion specifications, however, are intended
for local representation; these may be based on a different,
non-Gregorian calendar."

PostgreSQL behaves the same way when you ask for the localised month in am_ET:

tmunro=> set lc_time to 'am_ET.UTF-8';
SET
tmunro=> select to_char(now(), 'TMMon');
  to_char
---
 ኦገስ
(1 row)

This is hard coded into the system, as you can see from
src/backend/utils/adt/pg_locale.c where the *twelve* month names are
loaded into localized_abbrev_months and other similar arrays.  That's
the first clue that this system can't handle the thirteen Ethiopian
months, not to mention the maths required to work with them.

That's why I think you need a new, different to_char() function (and
probably more functions).  In that skeleton code I posted, you can see
I defined a function to_char(date, format, calendar) that takes a
third argument, for the calendar name.  You might also wonder if that
new function should respect the locale settings, but I'm not sure if
it could in general; you'd have to be able to get (say) the Greek
names for the Ethiopian calendar's months, which the OS won't be able
to give you.  Though perhaps you'd want some way to select between the
Ethiopian script and the transliterations into Latin script, which
would presumably be hard coded into the extension, I have no idea if
that's useful to anyone...

BTW there have been earlier discussions of this:

https://www.postgresql.org/message-id/flat/CAM7dW9iBXDJuwZrEXW%2Bdsa_%3Dew%3D%2BFdv7mcF51nQLGSkTkQp2MQ%40mail.gmail.com

It shows that Apple has an Objective-C NSCalendar class that
understands Ethiopian, Persian, Hebrew, ... calendars, which made me
wonder if LC_TIME=am_ET.UTF-8 would trigger something special on a
Mac, but nope, its libc still just gives Ethiopian names for Gregorian
months as I expected.

[1] http://pubs.opengroup.org/onlinepubs/009695399/utilities/date.html

-- 
Thomas Munro
https://enterprisedb.com




Shrinking tuplesort.c's SortTuple struct (Was: More ideas for speeding up sorting)

2019-08-09 Thread Peter Geoghegan
On Fri, Sep 9, 2016 at 6:14 AM Heikki Linnakangas  wrote:
> 1. SortTuple.tupindex is not used when the sort fits in memory. If we
> also got rid of the isnull1 flag, we could shrink SortTuple from 24
> bytes to 16 bytes (on 64-bit systems). That would allow you to pack more
> tuples into memory, which generally speeds things up. We could for
> example steal the least-significant bit from the 'tuple' pointer for
> that, because the pointers are always at least 4-byte aligned. (But see
> next point)

I implemented what you sketched here, back in 2016. That is, I found a
way to make SortTuples 16 bytes on 64-bit platforms, down from 24
bytes (24 bytes includes alignment padding). Note that it only became
possible to do this after commit 8b304b8b72b removed replacement
selection sort, which reduced the number of things that use the
SortTuple.tupindex field to just one: it is now only used for tapenum
during merging. (I also had to remove SortTuple.isnull1 to get down to
16 bytes.)

The easy part was removing SortTuple.tupindex itself -- it was fairly
natural to stash that in the slab allocation for each tape. I used the
aset.c trick of having a metadata "chunk" immediately prior to the address
that represents the allocation proper -- we can just back up by a few
bytes from stup.tuple to find the place to stash the tape number
during merging. The worst thing about this change was that it makes a
tape slab allocation mandatory in cases that previously didn't have
any need for a stup.tuple allocation (e.g. datum tuplesorts of
pass-by-value types), though only during merging. Since we must always
store the tapenum when merging, we always need a slab buffer for each
tape when merging. This aspect wasn't so bad.
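
In rough form, that metadata-chunk trick looks something like this (an
illustration of the technique with made-up names, not the actual
tuplesort.c change):

#include <stdlib.h>

typedef struct SlabTupleHeader
{
    int         tapenum;        /* which tape this tuple came from */
} SlabTupleHeader;

/*
 * Hand out tuple space with a hidden header in front of it.  Real code
 * would MAXALIGN the header and carve the space out of the tape's slab
 * buffer rather than malloc(); error handling omitted.
 */
static void *
slab_alloc_tuple(size_t size, int tapenum)
{
    char       *raw = malloc(sizeof(SlabTupleHeader) + size);

    ((SlabTupleHeader *) raw)->tapenum = tapenum;
    return raw + sizeof(SlabTupleHeader);   /* caller sees only the tuple */
}

/* Recover the tape number by backing up from stup.tuple. */
static int
slab_tuple_tapenum(void *tuple)
{
    return ((SlabTupleHeader *) ((char *) tuple - sizeof(SlabTupleHeader)))->tapenum;
}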

The hard/ugly part was getting rid of the remaining "unnecessary"
SortTuple field, isnull1. This involved squeezing an extra bit out of
the stup.tuple pointer, by stealing the least-significant bit. This
was invasive in about the way you'd expect it to be. It wasn't awful,
but it also wasn't something I'd countenance pursuing without getting
a fairly noticeable benefit for users. (Actually, the code that I
wrote so far *is* pretty awful, but I could certainly clean it up some
more if I felt like it.)
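
The pointer-tagging part, reduced to its essentials, is just this (again
the technique in isolation, not the patch itself): because the tuple
pointers are at least 4-byte aligned, the low bit is free to carry
isnull1.

#include <stdint.h>

#define NULL1_BIT   ((uintptr_t) 1)

static inline void *
tuple_tag(void *tuple, int isnull1)
{
    return (void *) ((uintptr_t) tuple | (isnull1 ? NULL1_BIT : 0));
}

static inline void *
tuple_untag(void *tagged)
{
    return (void *) ((uintptr_t) tagged & ~NULL1_BIT);  /* strip the flag */
}

static inline int
tuple_isnull1(void *tagged)
{
    return (int) ((uintptr_t) tagged & NULL1_BIT);
}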

I think that the rough patch that I came up with gives us an accurate
picture of what the benefits of having SortTuples that are only 16
bytes wide are. The benefits seem kind of underwhelming at this point.
For something like a "SELECT COUNT(distinct my_int4_col) FROM tab"
query, which uses the qsort_ssup() qsort specialization, we can easily
go from getting an external sort to getting an internal sort. We can
maybe end up sorting about 20% faster if things really work out for
the patch. But in cases that users really care about, such as REINDEX,
the difference is in the noise. ISTM that this is simply not worth the
trouble at this time. These days, external sorts are often slightly
faster than internal sorts in practice, due to the fact that we can do
an on-the-fly merge with external sorts, so we could easily hurt
performance by making more memory available!

I don't want to completely close the door on the idea of shrinking
SortTuple to 16 bytes. I can imagine a world in which that matters.
For example, perhaps there will one day be a strong case to be made
for SIMD-based internal sort algorithms for simple cases, such as the
"COUNT(distinct my_int4_col)" query case that I mentioned. Probably
SIMD-based multiway merge sort. I understand that such algorithms are
very sensitive to things like SIMD CPU register sizes, and were only
considered plausible competitors to quicksort due to the advent of
512-bit SIMD registers. 512-bit SIMD registers haven't been available
in mainstream CPUs for that long.

I have to admit that this SIMD sorting stuff seems like a bit of a
stretch, though. The papers in this area all seem to make rather
optimistic assumptions about the distribution of values. And, I think
we'd have to be even more aggressive about shrinking SortTuples in
order to realize the benefits of SIMD-based sorting. Besides, sorting
itself is the bottleneck for tuplesort-using operations less and less
these days -- the only remaining interesting bottleneck is probably in
code like index_form_tuple(), which is probably a good target for JIT.
In general, it's much harder to make tuplesort.c noticeably faster
than it used to be -- we've picked all the low-hanging fruit.

--
Peter Geoghegan




Re: POC: converting Lists into arrays

2019-08-09 Thread Alvaro Herrera
On 2019-Aug-09, Tom Lane wrote:

> I poked at this, and attached is a patch, but again I'm not seeing
> that there's any real performance-based argument for it.  So far
> as I can tell, if we've got a lot of RTEs in an executable plan,
> the bulk of the startup time is going into lock (re) acquisition in
> AcquirePlannerLocks, and/or permissions scanning in ExecCheckRTPerms;
> both of those have to do work for every RTE including ones that
> run-time pruning drops later on.  ExecInitRangeTable just isn't on
> the radar.

I'm confused.  I thought that the point of doing this wasn't that we
wanted to improve performance, but rather that we're now able to remove
the array without *losing* performance.  I mean, those arrays were there
to improve performance for code that wanted fast access to specific list
items, but now we have fast access to the list items without it.  So a
measurement that finds no performance difference is good news, and we
can get rid of the now-pointless optimization ...
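
(For context: with array-based Lists, exec_rt_fetch() can presumably just
index the list directly, along these lines -- a guess at the shape of the
change, not a quote of Tom's patch.)

static inline RangeTblEntry *
exec_rt_fetch(Index rti, EState *estate)
{
    Assert(rti > 0 && rti <= estate->es_range_table_size);
    return list_nth_node(RangeTblEntry, estate->es_range_table, rti - 1);
}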

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add "password_protocol" connection parameter to libpq

2019-08-09 Thread Jeff Davis
On Fri, 2019-08-09 at 16:27 -0400, Jonathan S. Katz wrote:
> Seems to be a lot to configure. I'm more of a fan of the previous
> method; it'd work nicely with how we've presently defined things and
> should be easy to put into a DSN/URI/env variable.

Proposals on the table:

1. Hierarchical semantics, where you specify the least-secure
acceptable method:

  password_protocol = {any,md5,scram-sha-256,scram-sha-256-plus}

2. Comma-list approach, where you specify exactly which protocols are
acceptable, or "any" to mean that we don't care.

3. three-setting approach:

  channel_binding = {disable|prefer|require}
  password_plaintext = {disable|enable}
  password_md5 = {disable|enable}

It looks like Jonathan prefers #1.

#1 seems direct and clearly applies today, and corresponds to auth
methods on the server side.

I'm not a fan of #2; it seems likely to result in a bunch of clients
with overly-specific lists of things with long names that can never
really go away.

#3 is a little more abstract, but also seems more future-proof, and may
tie in to what Stephen is talking about with respect to controlling
auth methods from the client, or moving more protocols within SASL.

Regards,
Jeff Davis






Re: Add "password_protocol" connection parameter to libpq

2019-08-09 Thread Heikki Linnakangas

On 09/08/2019 23:27, Jonathan S. Katz wrote:

> On 8/9/19 11:51 AM, Jeff Davis wrote:
>> On Fri, 2019-08-09 at 09:28 -0400, Stephen Frost wrote:
>>> Having an 'any' option, as mentioned before, could be an alternative
>>> though.
>>
>> ...
>>
>>> I agree with the point that there isn't any guarantee that it'll always
>>> be clear-cut as to which of two methods is "better".
>>>
>>> From a user perspective, it seems like the main things are "don't send
>>> my password in the clear to the server", and "require channel binding
>>> to prove there isn't a MITM".  I have to admit that I like the idea of
>>> requiring scram to be used and not allowing md5 though.
>>
>> So it seems like we are leaning toward:
>>
>>    password_protocol = any | {plaintext,md5,scram-sha-256,scram-sha-256-plus}[,...]
>
> First, thanks for proposing / working on this, I like the idea! :) Happy
> to test/review.
>
> As long as this one can handle the current upgrade path that's in place
> for going from md5 to SCRAM (which AIUI it should) this makes sense to
> me. As stated above, there is a clear hierarchy.
>
> I would almost argue that "plaintext" shouldn't even be an option...if
> you have "any" set (arguably default?) then plaintext is available. With
> our currently supported versions + driver ecosystem, I hope no one needs
> to support a forced plaintext setup.

Keep in mind that RADIUS, LDAP and PAM authentication methods are
'plaintext' over the wire. It's not that bad, when used with
sslmode=verify-ca/full.

>> Or maybe:
>>
>>    channel_binding = {disable|prefer|require}
>>    password_plaintext = {disable|enable}
>>    password_md5 = {disable|enable}
>>
>> That seems reasonable. It's three options, but no normal use case would
>> need to set more than two, because channel binding forces
>> scram-sha-256-plus.
>
> Seems to be a lot to configure. I'm more of a fan of the previous
> method; it'd work nicely with how we've presently defined things and
> should be easy to put into a DSN/URI/env variable.


This is a multi-dimensional problem. "channel_binding=require" is one 
way to prevent MITM attacks, but sslmode=verify-ca is another. (Does 
Kerberos also prevent MITM?) Or you might want to enable plaintext 
passwords over SSL, but not without SSL.


I think we'll need something like the 'ssl_ciphers' GUC, where you can 
choose from a few reasonable default rules, but also enable/disable 
specific methods:


# anything goes (the default)
auth_methods = 'ANY'

# Disable plaintext password authentication. Anything else is accepted.
auth_methods = '-password'

# Only authentication methods that protect from
# Man-in-the-Middle attacks. This allows anything if the server's SSL
# certificate can be verified, and otherwise only SCRAM with
# channel binding
auth_methods = 'MITM'

# The same, but disable plaintext passwords and md5 altogether
auth_methods = 'MITM, -password, -md5'


I'm tempted to also allow 'SSL' and 'SSL-verify-full' as part of the 
same string, so that you could configure everything related to 
connection security in the same option. Not sure if that would make 
things simpler for users, or create more confusion.


- Heikki




Re: POC: converting Lists into arrays

2019-08-09 Thread Tom Lane
David Rowley  writes:
> On Fri, 9 Aug 2019 at 09:55, Tom Lane  wrote:
>> I still have hopes for getting rid of es_range_table_array though,
>> and will look at that tomorrow or so.

> Yes, please. I've measured that to be quite an overhead with large
> partitioning setups. However, that was with some additional code which
> didn't lock partitions until it was ... well  too late... as it
> turned out. But it seems pretty good to remove code that could be a
> future bottleneck if we ever manage to do something else with the
> locking of all partitions during UPDATE/DELETE.

I poked at this, and attached is a patch, but again I'm not seeing
that there's any real performance-based argument for it.  So far
as I can tell, if we've got a lot of RTEs in an executable plan,
the bulk of the startup time is going into lock (re) acquisition in
AcquirePlannerLocks, and/or permissions scanning in ExecCheckRTPerms;
both of those have to do work for every RTE including ones that
run-time pruning drops later on.  ExecInitRangeTable just isn't on
the radar.

If we wanted to try to improve things further, it seems like we'd
have to find a way to not lock unreferenced partitions at all,
as you suggest above.  But combining that with run-time pruning seems
like it'd be pretty horrid from a system structural standpoint: if we
acquire locks only during execution, what happens if we find we must
invalidate the query plan?

Anyway, the attached might be worth committing just on cleanliness
grounds, to avoid two-sources-of-truth issues in the executor.
But it seems like there's no additional performance win here
after all ... unless you've got a test case that shows differently?

regards, tom lane

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index dbd7dd9..7f494ab 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2790,7 +2790,6 @@ EvalPlanQualStart(EPQState *epqstate, EState *parentestate, Plan *planTree)
 	estate->es_snapshot = parentestate->es_snapshot;
 	estate->es_crosscheck_snapshot = parentestate->es_crosscheck_snapshot;
 	estate->es_range_table = parentestate->es_range_table;
-	estate->es_range_table_array = parentestate->es_range_table_array;
 	estate->es_range_table_size = parentestate->es_range_table_size;
 	estate->es_relations = parentestate->es_relations;
 	estate->es_queryEnv = parentestate->es_queryEnv;
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index c1fc0d5..afd9beb 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -113,7 +113,6 @@ CreateExecutorState(void)
 	estate->es_snapshot = InvalidSnapshot;	/* caller must initialize this */
 	estate->es_crosscheck_snapshot = InvalidSnapshot;	/* no crosscheck */
 	estate->es_range_table = NIL;
-	estate->es_range_table_array = NULL;
 	estate->es_range_table_size = 0;
 	estate->es_relations = NULL;
 	estate->es_rowmarks = NULL;
@@ -720,29 +719,17 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
  * ExecInitRangeTable
  *		Set up executor's range-table-related data
  *
- * We build an array from the range table list to allow faster lookup by RTI.
- * (The es_range_table field is now somewhat redundant, but we keep it to
- * avoid breaking external code unnecessarily.)
- * This is also a convenient place to set up the parallel es_relations array.
+ * In addition to the range table proper, initialize arrays that are
+ * indexed by rangetable index.
  */
 void
 ExecInitRangeTable(EState *estate, List *rangeTable)
 {
-	Index		rti;
-	ListCell   *lc;
-
 	/* Remember the range table List as-is */
 	estate->es_range_table = rangeTable;
 
-	/* Set up the equivalent array representation */
+	/* Set size of associated arrays */
 	estate->es_range_table_size = list_length(rangeTable);
-	estate->es_range_table_array = (RangeTblEntry **)
-		palloc(estate->es_range_table_size * sizeof(RangeTblEntry *));
-	rti = 0;
-	foreach(lc, rangeTable)
-	{
-		estate->es_range_table_array[rti++] = lfirst_node(RangeTblEntry, lc);
-	}
 
 	/*
 	 * Allocate an array to store an open Relation corresponding to each
@@ -753,8 +740,8 @@ ExecInitRangeTable(EState *estate, List *rangeTable)
 		palloc0(estate->es_range_table_size * sizeof(Relation));
 
 	/*
-	 * es_rowmarks is also parallel to the es_range_table_array, but it's
-	 * allocated only if needed.
+	 * es_rowmarks is also parallel to the es_range_table, but it's allocated
+	 * only if needed.
 	 */
 	estate->es_rowmarks = NULL;
 }
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 1fb28b4..39c8b3b 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -535,8 +535,7 @@ extern void ExecInitRangeTable(EState *estate, List *rangeTable);
 static inline RangeTblEntry *
 exec_rt_fetch(Index rti, EState *estate)
 {
-	Assert(rti > 0 && rti <= estate->es_range_table_size);
-	return estate->es_range_table_array[

Re: Add "password_protocol" connection parameter to libpq

2019-08-09 Thread Jonathan S. Katz
On 8/9/19 11:51 AM, Jeff Davis wrote:
> On Fri, 2019-08-09 at 09:28 -0400, Stephen Frost wrote:
>> Having an 'any' option, as mentioned before, could be an alternative
>> though.
> 
> ...
> 
>> I agree with the point that there isn't any guarantee that it'll
>> always
>> be clear-cut as to which of two methods is "better".
>>
>> From a user perspective, it seems like the main things are "don't
>> send
>> my password in the clear to the server", and "require channel binding
>> to
>> prove there isn't a MITM".  I have to admit that I like the idea of
>> requiring scram to be used and not allowing md5 though.
> 
> So it seems like we are leaning toward:
> 
>password_protocol = any | {plaintext,md5,scram-sha-256,scram-sha-
> 256-plus}[,...]

First, thanks for proposing / working on this, I like the idea! :) Happy
to test/review.

As long as this one can handle the current upgrade path that's in place
for going from md5 to SCRAM (which AIUI it should) this makes sense to
me. As stated above, there is a clear hierarchy.

I would almost argue that "plaintext" shouldn't even be an option...if
you have "any" set (arguably default?) then plaintext is available. With
our currently supported versions + driver ecosystem, I hope no one needs
to support a forced plaintext setup.

> 
> Or maybe:
> 
>channel_binding = {disable|prefer|require}
>password_plaintext = {disable|enable}
>password_md5 = {disable|enable}
> 
> That seems reasonable. It's three options, but no normal use case would
> need to set more than two, because channel binding forces scram-sha-
> 256-plus.

Seems to be a lot to configure. I'm more of a fan of the previous
method; it'd work nicely with how we've presently defined things and
should be easy to put into a DSN/URI/env variable.

Thanks,

Jonathan



signature.asc
Description: OpenPGP digital signature


Re: Some memory not freed at the exit of RelationBuildPartitionDesc()

2019-08-09 Thread David Fetter
On Thu, Aug 08, 2019 at 05:42:21PM +0900, Amit Langote wrote:
> On Thu, Aug 8, 2019 at 5:33 PM amul sul  wrote:
> > On Thu, Aug 8, 2019 at 1:27 PM Amit Langote  wrote:
> 
> >> Thanks for the patch.  This was discussed recently in the "hyrax vs.
> >> RelationBuildPartitionDesc()" thread [1] and I think Alvaro proposed
> >> an approach that's similar to yours.  Not sure why it wasn't pursued
> >> though.  Maybe the reason is buried somewhere in that discussion.
> >
> > Oh, quite similar, thanks Amit for pointing that out.
> >
> > Looks like "hyrax vs. RelationBuildPartitionDesc()" is in discussion for
> > the master branch only, not sure though, but we need a similar fix for
> > the back branches as well.
> 
> Well, this is not a bug as such, so it's very unlikely that a fix like
> this will be back-patched.  Also, if this becomes an issue only for
> more than 1000 partitions, then it's not very relevant for PG 10
> and PG 11, because we don't recommend using so many partitions with
> them.  Maybe we can consider fixing PG 12 though.

A fix for the thousands-of-partitions case would be very welcome for 12.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: block-level incremental backup

2019-08-09 Thread Jeevan Ladhe
Hi Robert,

On Fri, Aug 9, 2019 at 6:40 PM Robert Haas  wrote:

> On Thu, Aug 8, 2019 at 8:37 PM Jeevan Ladhe
>  wrote:
> > +   if (!XLogRecPtrIsInvalid(previous_lsn))
> > +   appendStringInfo(labelfile, "PREVIOUS WAL LOCATION: %X/%X\n",
> > +(uint32) (previous_lsn >> 32), (uint32)
> previous_lsn);
> >
> > Maybe we should rename it to something like:
> > "INCREMENTAL BACKUP START WAL LOCATION" or simply "INCREMENTAL BACKUP
> START LOCATION"
> > to make it more intuitive?
>
> So, I think that you are right that PREVIOUS WAL LOCATION might not be
> entirely clear, but at least in my view, INCREMENTAL BACKUP START WAL
> LOCATION is definitely not clear.  This backup is an incremental
> backup, and it has a start WAL location, so you'd end up with START
> WAL LOCATION and INCREMENTAL BACKUP START WAL LOCATION and those sound
> like they ought to both be the same thing, but they're not.  Perhaps
> something like REFERENCE WAL LOCATION or REFERENCE WAL LOCATION FOR
> INCREMENTAL BACKUP would be clearer.
>

Agreed. How about INCREMENTAL BACKUP REFERENCE WAL LOCATION?


> > File header structure is defined in both the files basebackup.c and
> > pg_combinebackup.c. I think it is better to move this to
> replication/basebackup.h.
>
> Or some other header, but yeah, definitely don't duplicate the struct
> definition (or any other kind of definition).
>

Thanks.


> > IMHO, while labels are not advisable in general, it may be better to use
> a label
> > here rather than a while(1) loop, so that we can move to the label in
> case we
> > want to retry once. I think here it opens doors for future bugs if
> someone
> > happens to add code here, ending up adding some condition and then the
> > break becomes conditional. That will leave us in an infinite loop.
>
> I'm not sure which style is better here, but I don't really buy this
> argument.


No issues. I am ok either way.

Regards,
Jeevan Ladhe


Re: POC: Cleaning up orphaned files using undo logs

2019-08-09 Thread Robert Haas
On Wed, Aug 7, 2019 at 6:57 AM Heikki Linnakangas  wrote:
> Yeah, that's also a problem with complicated WAL record types. Hopefully
> the complex cases are an exception, not the norm. A complex case is
> unlikely to fit any pre-defined set of fields anyway. (We could look at
> how e.g. protobuf works, if this is really a big problem. I'm not
> suggesting that we add a dependency just for this, but there might be
> some patterns or interfaces that we could mimic.)

I think what you're calling the complex cases are going to be pretty
normal cases, not something exotic, but I do agree with you that
making the infrastructure more generic is worth considering.  One idea
I had is to use the facilities from pqformat.h; have the generic code
read whatever the common fields are, and then pass the StringInfo to
the AM which can do whatever it wants with the rest of the record, but
probably these facilities would make it pretty easy to handle either a
series of fixed-length fields or alternatively variable-length data.
What do you think of that idea?
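
To make that concrete, a hedged sketch of the split: generic code
reads/writes a few common fields through pqformat, then the AM consumes
the rest of the StringInfo itself.  The particular field set here is
invented for illustration, not a proposal for the actual record layout:

#include "postgres.h"

#include "lib/stringinfo.h"
#include "libpq/pqformat.h"

typedef struct UndoRecordCommon
{
    TransactionId xid;          /* transaction that created the record */
    uint8       rmid;           /* which AM / resource manager */
    uint16      payload_len;    /* length of AM-specific data */
} UndoRecordCommon;

/* Generic serialization of the common header. */
static void
undo_serialize_common(StringInfo buf, const UndoRecordCommon *hdr)
{
    pq_sendint32(buf, hdr->xid);
    pq_sendint8(buf, hdr->rmid);
    pq_sendint16(buf, hdr->payload_len);
    /* ...caller then appends the AM-specific payload with pq_sendbytes()... */
}

/* Generic deserialization; the AM consumes the rest of 'msg' itself. */
static void
undo_deserialize_common(StringInfo msg, UndoRecordCommon *hdr)
{
    hdr->xid = (TransactionId) pq_getmsgint(msg, 4);
    hdr->rmid = (uint8) pq_getmsgint(msg, 1);
    hdr->payload_len = (uint16) pq_getmsgint(msg, 2);
}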

(That would not preclude doing compression on top, although I think
that feeding everything through pglz or even lz4/snappy may eat more
CPU cycles than we can really afford. The option is there, though.)

> If you remember, we did a big WAL format refactoring in 9.5, which moved
> some information from AM-specific structs to the common headers. Namely,
> the information on the relation blocks that the WAL record applies to.
> That was a very handy refactoring, and allowed tools like pg_waldump to
> print more detailed information about all WAL record types. For WAL
> records, moving the block information was natural, because there was
> special handling for full-page images anyway. However, I don't think we
> have enough experience with UNDO log yet, to know which fields would be
> best to include in the common undo header, and which to leave as
> AM-specific payload. I think we should keep the common header slim, and
> delegate to the AM routines.

Yeah, I remember. I'm not really sure I totally buy your argument that
we don't know what besides XID should go into an undo record: tuples
are a pretty important concept, and although there might be some
exceptions here and there, I have a hard time imagining that undo is
going to be primarily about anything other than identifying a tuple
and recording something you did to it.  On the other hand, you might
want to identify several tuples, or identify a tuple with a TID that's
not 6 bytes, so that's a good reason for allowing more flexibility.

Another point in favor of being more flexible is that it's not
clear that there's any use case for third-party tools that work using
undo.  WAL drives replication and logical decoding and could be used
to drive incremental backup, but it's not really clear that similar
applications exist for undo.  If it's just private to the AM, the AM
might as well be responsible for it.  If that leads to code
duplication, we can create a library of common routines and AM users
can use them if they want.

> Hmm. If you're following an UNDO chain, from newest to oldest, I would
> assume that the newer record has enough information to decide whether
> you need to look at the previous record. If the previous record is no
> longer interesting, it might already be discarded away, after all.

I actually thought zedstore might need this pattern.  If you store an
XID with each undo pointer, as the current zheap code mostly does,
then you have enough information to decide whether you care about the
previous undo record before you fetch it. But if a tuple stores only an
undo pointer and you determine that the undo isn't discarded, you
have to fetch the record first and then possibly decide that you had
the right version in the first place.  Now, maybe that pattern doesn't
repeat, because the undo records could be set up to contain both an
XMIN and an XMAX, but not necessarily.  I don't know exactly what you
have in mind, but it doesn't seem totally crazy that an undo record
might contain the XID that created that version but not the XID that
created the prior version, and if so, you'll iterate backwards until
you either hit the end of undo or go one undo record past the version
you can see.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add "password_protocol" connection parameter to libpq

2019-08-09 Thread Jeff Davis
On Fri, 2019-08-09 at 09:28 -0400, Stephen Frost wrote:
> Having an 'any' option, as mentioned before, could be an alternative
> though.

...

> I agree with the point that there isn't any guarantee that it'll
> always
> be clear-cut as to which of two methods is "better".
> 
> From a user perspective, it seems like the main things are "don't
> send
> my password in the clear to the server", and "require channel binding
> to
> prove there isn't a MITM".  I have to admit that I like the idea of
> requiring scram to be used and not allowing md5 though.

So it seems like we are leaning toward:

   password_protocol = any | {plaintext,md5,scram-sha-256,scram-sha-
256-plus}[,...]

Or maybe:

   channel_binding = {disable|prefer|require}
   password_plaintext = {disable|enable}
   password_md5 = {disable|enable}

That seems reasonable. It's three options, but no normal use case would
need to set more than two, because channel binding forces scram-sha-
256-plus.

Regards,
Jeff Davis






Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-08-09 Thread Tomas Vondra

On Fri, Aug 09, 2019 at 11:51:23PM +0900, Masahiko Sawada wrote:

On Fri, Aug 9, 2019 at 10:25 AM Bruce Momjian  wrote:


On Thu, Aug  8, 2019 at 06:31:42PM -0400, Stephen Frost wrote:
> > >Crash recovery doesn't happen "all the time" and neither does vacuum
> > >freeze, and autovacuum processes are independent of individual client
> > >backends- we don't need to (and shouldn't) have the keys in shared
> > >memory.
> >
> > Don't people do physical replication / HA pretty much all the time?
>
> Strictly speaking, that isn't actually crash recovery, it's physical
> replication / HA, and while those are certainly nice to have it's no
> guarantee that they're required or that you'd want to have the same keys
> for them- conceptually, at least, you could have WAL with one key that
> both sides know and then different keys for the actual data files, if we
> go with the approach where the WAL is encrypted with one key and then
> otherwise is plaintext.

Uh, yes, you could have two encryption keys in the data directory, one
for heap/indexes, one for WAL, both unlocked with the same passphrase,
but what would be the value in that?

> > >>That might allow crash recovery and the freeze part of VACUUM FREEZE to
> > >>work.  (I don't think we could vacuum since we couldn't read the index
> > >>pages to find the matching rows since the index values would be encrypted
> > >>too.  We might be able to not encrypt the tid in the index tuple.)
> > >
> > >Why do we need the indexed values to vacuum the index..?  We don't
> > >today, as I recall.  We would need the tids though, yes.
> >
> > Well, we also do collect statistics on the data, for example. But even
> > if we assume we wouldn't do that for encrypted indexes (which seems like
> > a pretty bad idea to me), you'd probably end up leaking information
> > about ordering of the values. Which is generally a pretty serious
> > information leak, AFAICS.
>
> I agree entirely that order information would be bad to leak- but this
> is all new ground here and we haven't actually sorted out what such a
> partially encrypted btree would look like.  We don't actually have to
> have the down-links in the tree be unencrypted to allow vacuuming of
> leaf pages, after all.

Agreed, but I think we kind of know that the value in cluster-wide
encryption is different from multi-key encryption --- both have their
value, but right now cluster-wide is the easiest and simplest, and
probably meets more user needs than multi-key encryption.  If others
want to start scoping out what multi-key encryption would look like, we
can discuss it.  I personally would like to focus on cluster-wide
encryption for PG 13.


I agree that cluster-wide is simpler, but I'm not sure that it
meets users' real needs. One example is re-encryption: if the key is
leaked, with cluster-wide encryption we end up having to re-encrypt the
whole database regardless of the amount of sensitive user data in it. I
think that is a big constraint for users, because it's common that the
data that actually needs to be encrypted, such as a master table, doesn't
account for a large portion of the database. That's one reason why I
think finer-granularity encryption, such as per table/tablespace, is
required.



TBH I think it's mostly pointless to design for key leakage.

My understanding is that all this work is motivated by the assumption that
Bob can obtain access to the data directory (say, a backup of it). So if
he also manages to get access to the encryption key, we probably have to
assume he already has access to a current snapshot of the data directory,
which means any re-encryption is pretty futile.

What we can (and should) optimize for is key rotation, but as that only
changes the master key and not the actual encryption keys, the overhead is
pretty low.
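
As a rough sketch of why rotation is cheap, assuming a two-tier scheme in
which the master key only wraps the data encryption key (kek_encrypt and
kek_decrypt are placeholders here, not PostgreSQL or OpenSSL APIs):

typedef struct WrappedKey
{
    uint8    wrapped_dek[48];    /* data key, encrypted with the master key */
} WrappedKey;

static void
rotate_master_key(WrappedKey *wk, const uint8 *old_kek, const uint8 *new_kek)
{
    uint8    dek[32];

    kek_decrypt(old_kek, wk->wrapped_dek, dek);     /* unwrap with old master key */
    kek_encrypt(new_kek, dek, wk->wrapped_dek);     /* rewrap with new master key */

    /* no table, index or WAL pages are rewritten here */
}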

We can of course support "forced" re-encryption, but I think it's
acceptable if that's fairly expensive as long as it can be throttled and
executed in the background (kinda similar to the patch to enable checksums
in the background).


And in terms of feature development, would we implement
fine-granularity encryption in the future even if the first step is
cluster-wide encryption? And would both TDEs encrypt the same kinds of
database objects (i.e. only tables, indexes and WAL)? If so, how would
users choose between them depending on the case?

I imagined the case where we had the cluster-wide encryption as the
first TDE feature. We will enable TDE at initdb time by specifying the
command-line parameter for TDE. Then TDE is enabled in cluster wide
and all tables/indexes and WAL are automatically encrypted. Then, if
we want to implement the more fine granularity encryption how can we
make users use it? WAL encryption and tables/index encryption are
enabled at the same time but we want to enable encryption for
particular tables/indexes after initdb. If the cluster-wide encryption
is something like a short-cut of encrypting all tables/indexes, I
personally think that implementing the more fine gra

Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-08-09 Thread Tomas Vondra

On Thu, Aug 08, 2019 at 06:31:42PM -0400, Stephen Frost wrote:

Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:

On Thu, Aug 08, 2019 at 03:07:59PM -0400, Stephen Frost wrote:
>* Bruce Momjian (br...@momjian.us) wrote:
>>On Tue, Jul  9, 2019 at 11:09:01AM -0400, Bruce Momjian wrote:
>>> On Tue, Jul  9, 2019 at 10:59:12AM -0400, Stephen Frost wrote:
>>> > * Bruce Momjian (br...@momjian.us) wrote:
>>> > I agree that all of that isn't necessary for an initial implementation,
>>> > I was rather trying to lay out how we could improve on this in the
>>> > future and why having the keying done at a tablespace level makes sense
>>> > initially because we can then potentially move forward with further
>>> > segregation to improve the situation.  I do believe it's also useful in
>>> > its own right, to be clear, just not as nice since a compromised backend
>>> > could still get access to data in shared buffers that it really
>>> > shouldn't be able to, even broadly, see.
>>>
>>> I think TDE is feature of questionable value at best and the idea that
>>> we would fundmentally change the internals of Postgres to add more
>>> features to it seems very unlikely.  I realize we have to discuss it so
>>> we don't block reasonable future feature development.
>>
>>I have a new crazy idea.  I know we concluded that allowing multiple
>>independent keys, e.g., per user, per table, didn't make sense since
>>they have to be unlocked all the time, e.g., for crash recovery and
>>vacuum freeze.
>
>I'm a bit confused as I never agreed that made any sense and I continue
>to feel that it doesn't make sense to have one key for everything.
>
>Crash recovery doesn't happen "all the time" and neither does vacuum
>freeze, and autovacuum processes are independent of individual client
>backends- we don't need to (and shouldn't) have the keys in shared
>memory.

Don't people do physical replication / HA pretty much all the time?


Strictly speaking, that isn't actually crash recovery, it's physical
replication / HA, and while those are certainly nice to have it's no
guarantee that they're required or that you'd want to have the same keys
for them- conceptually, at least, you could have WAL with one key that
both sides know and then different keys for the actual data files, if we
go with the approach where the WAL is encrypted with one key and then
otherwise is plaintext.



Uh? IMHO not breaking physical replication / HA should be pretty much
required for any new feature, unless it's somehow obviously clear that
it's not needed for that particular feature. I very much doubt we can make
that conclusion for encrypted instances (at least I don't see why it would
be the case in general).

One reason is that those features are also used for backups, which I hope
we both agree is not an optional feature. Maybe it's possible to modify
pg_basebackup to re-encrypt all the data, but to do that it clearly needs
to know all encryption keys (although not necessarily on the same side).


>>However, that assumes that all heap/index pages are encrypted, and all
>>of WAL.  What if we encrypted only the user-data part of the page, i.e.,
>>tuple data.  We left xmin/xmax unencrypted, and only stored the
>>encrypted part of that data in WAL, and didn't encrypt any more of WAL.
>
>This is pretty much what Alvaro was suggesting a while ago, isn't it..?
>Have just the user data be encrypted in the table and in the WAL stream.

It's also moving us much closer to pgcrypto-style encryption ...


Yes, it is, and there's good parts and bad parts to that, to be sure.


>>That might allow crash recovery and the freeze part of VACUUM FREEZE to
>>work.  (I don't think we could vacuum since we couldn't read the index
>>pages to find the matching rows since the index values would be encrypted
>>too.  We might be able to not encrypt the tid in the index tuple.)
>
>Why do we need the indexed values to vacuum the index..?  We don't
>today, as I recall.  We would need the tids though, yes.

Well, we also do collect statistics on the data, for example. But even
if we assume we wouldn't do that for encrypted indexes (which seems like
a pretty bad idea to me), you'd probably end up leaking information
about ordering of the values. Which is generally a pretty serious
information leak, AFAICS.


I agree entirely that order information would be bad to leak- but this
is all new ground here and we haven't actually sorted out what such a
partially encrypted btree would look like.  We don't actually have to
have the down-links in the tree be unencrypted to allow vacuuming of
leaf pages, after all.



Well, I'm not all that familiar with the btree code, but I still think you
can deduce an awful lot of information from having the leaf pages alone
(not sure if we could deduce a total order, but presumably yes).


>>Is this something considering in version one of this feature?  Probably
>>not, but later?  Never?  Would the information leakage be too great,
>>particularly from

Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-08-09 Thread Masahiko Sawada
On Fri, Aug 9, 2019 at 10:25 AM Bruce Momjian  wrote:
>
> On Thu, Aug  8, 2019 at 06:31:42PM -0400, Stephen Frost wrote:
> > > >Crash recovery doesn't happen "all the time" and neither does vacuum
> > > >freeze, and autovacuum processes are independent of individual client
> > > >backends- we don't need to (and shouldn't) have the keys in shared
> > > >memory.
> > >
> > > Don't people do physical replication / HA pretty much all the time?
> >
> > Strictly speaking, that isn't actually crash recovery, it's physical
> > replication / HA, and while those are certainly nice to have it's no
> > guarantee that they're required or that you'd want to have the same keys
> > for them- conceptually, at least, you could have WAL with one key that
> > both sides know and then different keys for the actual data files, if we
> > go with the approach where the WAL is encrypted with one key and then
> > otherwise is plaintext.
>
> Uh, yes, you could have two encryption keys in the data directory, one
> for heap/indexes, one for WAL, both unlocked with the same passphrase,
> but what would be the value in that?
>
> > > >>That might allow crash recovery and the freeze part of VACUUM FREEZE to
> > > >>work.  (I don't think we could vacuum since we couldn't read the index
> > > >>pages to find the matching rows since the index values would be 
> > > >>encrypted
> > > >>too.  We might be able to not encrypt the tid in the index tuple.)
> > > >
> > > >Why do we need the indexed values to vacuum the index..?  We don't
> > > >today, as I recall.  We would need the tids though, yes.
> > >
> > > Well, we also do collect statistics on the data, for example. But even
> > > if we assume we wouldn't do that for encrypted indexes (which seems like
> > > a pretty bad idea to me), you'd probably end up leaking information
> > > about ordering of the values. Which is generally a pretty serious
> > > information leak, AFAICS.
> >
> > I agree entirely that order information would be bad to leak- but this
> > is all new ground here and we haven't actually sorted out what such a
> > partially encrypted btree would look like.  We don't actually have to
> > have the down-links in the tree be unencrypted to allow vacuuming of
> > leaf pages, after all.
>
> Agreed, but I think we kind of know that the value in cluster-wide
> encryption is different from multi-key encryption --- both have their
> value, but right now cluster-wide is the easiest and simplest, and
> probably meets more user needs than multi-key encryption.  If others
> want to start scoping out what multi-key encryption would look like, we
> can discuss it.  I personally would like to focus on cluster-wide
> encryption for PG 13.

I agree that cluster-wide is simpler, but I'm not sure that it
meets users' real needs. One example is re-encryption: if the key is
leaked, with cluster-wide encryption we end up having to re-encrypt the
whole database regardless of the amount of sensitive user data in it. I
think that is a big constraint for users, because it's common that the
data that actually needs to be encrypted, such as a master table, doesn't
account for a large portion of the database. That's one reason why I
think finer-granularity encryption, such as per table/tablespace, is
required.

And in terms of feature development, would we implement
fine-granularity encryption in the future even if the first step is
cluster-wide encryption? And would both TDEs encrypt the same kinds of
database objects (i.e. only tables, indexes and WAL)? If so, how would
users choose between them depending on the case?

I imagined the case where we had cluster-wide encryption as the
first TDE feature. We would enable TDE at initdb time by specifying a
command-line parameter for TDE. Then TDE is enabled cluster-wide
and all tables/indexes and WAL are automatically encrypted. Then, if
we later implement finer-granularity encryption, how can we
let users use it? WAL encryption and table/index encryption are
enabled at the same time, but we may want to enable encryption for
particular tables/indexes after initdb. If cluster-wide encryption
is something like a shortcut for encrypting all tables/indexes, I
personally think that implementing the finer-granularity one first
and then using it to achieve the coarser granularity would be
easier.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: Problem with default partition pruning

2019-08-09 Thread Alvaro Herrera
On 2019-Aug-09, Amit Langote wrote:

> Hmm, oops.  I think that judgement was a bit too rushed on my part.  I
> unintentionally ended up making the partition constraint to *always*
> be fetched, whereas we don't need it in most cases.  I've reverted
> that change.

Yeah, I was quite confused about this point yesterday while I was trying
to make sense of your patches.

> RelOptInfo.partition_qual is poorly named in retrospect.
> :(  It's not set for all partitions, only those that are partitioned
> themselves.

Oh.  Hmm, I think this realization further clarifies things.

Since we're only changing this in the master branch anyway, maybe we can
find a better name for it.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SQL/JSON path: collation for comparisons, minor typos in docs

2019-08-09 Thread Alexander Korotkov
On Thu, Aug 8, 2019 at 11:30 PM Alexander Korotkov
 wrote:
> On Thu, Aug 8, 2019 at 11:53 AM Markus Winand  wrote:
> > The patch makes my tests pass.
>
> Cool.
>
> > I wonder about a few things:
> >
> > - Isn’t there any code that could be re-used for that (the one triggered by 
> > ‘a’ < ‘A’ COLLATE ucs_basic)?
>
> PostgreSQL supports ucs_basic, but it's an alias for the C collation and works
> only for utf-8.  Jsonpath code may work in different encodings.  New
> string comparison code can work in different encodings.
>
> > - For object key members, the standard also refers to unicode code point 
> > collation (SQL-2:2016 4.46.3, last paragraph).
> > - I guess it also applies to the “starts with” predicate, but I cannot find 
> > this explicitly stated in the standard.
>
> For object keys we don't actually care about whether strings are less
> or greater.  We only search for equal keys.  So, the per-byte comparison
> we currently use should be fine.  The same holds for the "starts with"
> predicate.
>
> > My tests check whether those cases do case-sensitive comparisons. With my 
> > default collation "en_US.UTF-8” I cannot discover potential issues there. I 
> > haven’t played around with nondeterministic ICU collations yet :(
>
> That's OK. There should be other beta testers around :)

So, I'm going to push this if no objections.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Rethinking opclass member checks and dependency strength

2019-08-09 Thread Komяpa
> But none of our contrib modules do it like that, and I'd lay long odds
> against any third party code doing it either.
>
> Thoughts?


PostGIS has some rarely used box operations as part of GiST opclass, like
"overabove".

These are a source of misunderstanding, as they hinge on the fact that
non-square geometry will be coerced into a box for the call, which is not
obvious when you call them on something like diagonal linestrings.
It may happen that we decide to remove them. In such circumstances, I
expect that ALTER OPERATOR CLASS DROP OPERATOR will work.

The other thing I would expect is that DROP FUNCTION ... CASCADE will
remove the operator and unregister it from the operator class without
dropping the operator class itself.


Re: Global temporary tables

2019-08-09 Thread Konstantin Knizhnik



On 09.08.2019 8:34, Craig Ringer wrote:
On Thu, 8 Aug 2019 at 15:03, Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote:




On 08.08.2019 5:40, Craig Ringer wrote:

On Tue, 6 Aug 2019 at 16:32, Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote:

New version of the patch with several fixes is attached.
Many thanks to Roman Zharkov for testing.


FWIW I still don't understand your argument with regards to using
shared_buffers for temp tables having connection pooling
benefits. Are you assuming the presence of some other extension
in your extended  version of PostgreSQL ? In community PostgreSQL
a temp table's contents in one backend will not be visible in
another backend. So if your connection pooler in transaction
pooling mode runs txn 1 on backend 42 and it populates temp table
X, then the pooler runs the same app session's txn 2 on backend
45, the contents of temp table X are not visible anymore.


Certainly here I mean built-in connection pooler which is not
currently present in Postgres,
but it is part of PgPRO-EE and there is my patch for vanilla at
commitfest:
https://commitfest.postgresql.org/24/2067


OK, that's what I assumed.

You're trying to treat this change as if it's a given that the other 
functionality you want/propose is present in core or will be present 
in core. That's far from given. My suggestion is to split it up so 
that the parts can be reviewed and committed separately.


In PgPRO-EE this problem was solved by binding session to backend.
I.e. one backend can manage multiple sessions,
but session can not migrate to another backend. The drawback of
such solution is obvious: one long living transaction can block
transactions of all other sessions scheduled to this backend.
Possibility to migrate session to another backend is one of the
obvious solutions of the problem. But the main show stopper for it
is temporary tables.
This is why  I consider moving temporary tables to shared buffers
as very important step.


I can see why it's important for your use case.

I am not disagreeing.

I am however strongly suggesting that your patch has two fairly 
distinct functional changes in it, and you should separate them out.


* Introduce global temp tables, a new relkind that works like a temp 
table but doesn't require catalog changes. Uses per-backend 
relfilenode and cleanup like existing temp tables. You could extend 
the relmapper to handle the mapping of relation oid to per-backend 
relfilenode.


* Associate global temp tables with session state and manage them in 
shared_buffers so they can work with the in-core connection pooler (if 
committed)


Historically we've had a few efforts to get in-core connection pooling 
that haven't gone anywhere. Without your pooler patch the changes you 
make to use shared_buffers etc are going to be unhelpful at best, if 
not actively harmful to performance, and will add unnecessary 
complexity. So I think there's a logical series of patches here:


* global temp table relkind and support for it
* session state separation
* connection pooling
* pooler-friendly temp tables in shared_buffers

Make sense?

But even if we forget about built-in connection pooler, don't you
think that possibility to use parallel query plans for temporary
tables is itself strong enough motivation to access global temp
table through shared buffers?


I can see a way to share temp tables across parallel query backends 
being very useful for DW/OLAP workloads, yes. But I don't know if 
putting them in shared_buffers is the right answer for that. We have 
DSM/DSA, we have shm_mq, various options for making temp buffers 
share-able with parallel worker backends.


I'm suggesting that you not tie the whole (very useful) global temp 
tables feature to this, but instead split it up into logical units 
that can be understood, reviewed and committed separately.


I would gladly participate in review.


Ok, here it is: global_private_temp-1.patch
Also I have attached updated version of the global temp tables with 
shared buffers - global_shared_temp-1.patch
It is certainly larger (~2k lines vs. 1.5k lines) because it is changing 
BufferTag and related functions.

But I do not think that this difference is so critical.
I still have a wish to kill two birds with one stone. :)









--


Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/pg_buffercache/pg_buffercache_pages.c b/contrib/pg_buffercache/pg_buffercache_pages.c
index 1bd579f..2d93f6f 100644
--- a/contrib/pg_buffercache/pg_buffercache_pages.c
+++ b/contrib/pg_buffercache/pg_buffercache_pages.c
@@ -153,9 +153,9 @@ pg_buffercache_pages(PG_FUNCTION_ARGS)
 			buf_state = LockBufHdr(bufHdr);
 
 			fctx->record[i].bufferid = BufferDescriptorGetBuffer(bufHdr);
-			fctx->record[i].relfilenode = bufHdr->tag.

Re: Store FullTransactionId in TwoPhaseFileHeader/GlobalTransactionData

2019-08-09 Thread Robert Haas
On Wed, Aug 7, 2019 at 6:56 AM vignesh C  wrote:
> Going by the discussion shall we conclude that we don't need to
> convert the subxids into fxid's as part of this fix.
> Let me know if any further changes need to be done.

I'm not sure, but I think the prior question is whether we want this
patch at all, and I'm not sure we've achieved consensus on that.
Thomas's point, at least as I understand it, is that if we start doing
32-bit => 64-bit XID conversions all over the place, it's not going to
be long before some incautious developer inserts one that is not
actually safe. On the other hand, Andres's point, at least as I
understand it, is that putting information that we don't really need
into the twophase state file because of some overly rigid coding rule
is not smart.  Both of those arguments sound right to me, but they
lead to opposite conclusions.

I am somewhat inclined to Andres's conclusion on balance.  I think
that we can probably define a set of policies about 32 => 64 bit XID
conversions both in terms of when you can do them and what comments
you have to include justifying them and how the API actually works
that makes it safe. It might help to think about defining the API in
terms of a reference FullTransactionId that must be OLDER than the XID
you're promoting to an FXID.  For instance, we know that all of the
relfrozenxid and datfrozenxids are from the current era because we've
got freezing machinery to enforce that. So if you got a tuple from the
heap, it's XID has got to be new, at least modulo bugs. In other
cases, we may be able to say, hey, look, this XID can't be from before
the CLOG cutoff.  I'm not sure of all the details here, but I'm
tentatively inclined to think that trying to lay down policies for
when promotion can be done safely is more promising than holding our
breath and saying we're never going to promote.
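
To make that concrete, a promotion helper along those lines might look
roughly like this (the name and exact arithmetic are illustrative; it
assumes the reference fxid is known to be no newer than the xid being
promoted, and within one epoch of it):

static inline FullTransactionId
FullTransactionIdFromReference(FullTransactionId ref, TransactionId xid)
{
    uint32      epoch = EpochFromFullTransactionId(ref);

    /*
     * The xid is known to be no older than the reference; if its 32-bit
     * value is numerically smaller, it must have wrapped into the next
     * epoch.
     */
    if (xid < XidFromFullTransactionId(ref))
        epoch++;

    return FullTransactionIdFromEpochAndXid(epoch, xid);
}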

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add "password_protocol" connection parameter to libpq

2019-08-09 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Thu, Aug 08, 2019 at 11:16:24PM -0700, Jeff Davis wrote:
> > On Fri, 2019-08-09 at 12:00 +0900, Michael Paquier wrote:
> > > What about auth_protocol then?  It seems to me that it could be useful
> > > to have the restriction on AUTH_REQ_MD5 as well.
> > 
> > auth_protocol does sound like a good name. I'm not sure what you mean
> > regarding MD5 though.

I don't really care for auth_protocol as that's pretty close to
"auth_method" and that isn't what we're talking about here- this isn't
the user picking the auth method, per se, but rather saying which of the
password-based mechanisms for communicating that the user knows the
password is acceptable.  Letting users choose which auth methods are
allowed might also be interesting (as in- we are in a Kerberized
environment and therefore no client should ever be using any auth method
except GSS, could be a reasonable ask) but it's not the same thing.

> Sorry, I meant krb5 here.

What restriction are you suggesting here wrt krb5..?

> > We already have that concept to a lesser extent, with the md5
> > authentication method also permitting scram-sha-256.
> 
> That's present to ease upgrades, and once the AUTH_REQ part is
> received the client knows what it needs to go through.

I don't think there's any question that, of the methods available to
prove that you know what the password is, simply sending the password to
the server as cleartext is the least secure.  If I, as a user, decide
that I don't really care what method is used, it's certainly simpler to
just say 'plaintext' than to have to list out every possible option.

Having an 'any' option, as mentioned before, could be an alternative
though.

I agree with the point that there isn't any guarantee that it'll always
be clear-cut as to which of two methods is "better".

From a user perspective, it seems like the main things are "don't send
my password in the clear to the server", and "require channel binding to
prove there isn't a MITM".  I have to admit that I like the idea of
requiring scram to be used and not allowing md5 though.

> > That sounds good, but there are a lot of possibilities and I can't
> > quite decide which way to go.
> > 
> > We could expose it as an SASL option like:
> > 
>    saslmode = {disable|prefer|require-scram-sha-256|require-scram-sha-256-plus}
> 
> Or we could shape password_protocol so that it takes a list of
> protocols, in short a whitelist of authorized methods.

I'm not really a fan of "saslmode" or anything else involving SASL
for this because we don't *really* do SASL- if we did, we'd have that as
an auth method and we'd have our Kerberos support be through SASL along
with potentially everything else...  I'm not against going there but I
don't think that's what you were suggesting here.

Thanks,

Stephen




Re: block-level incremental backup

2019-08-09 Thread Robert Haas
On Thu, Aug 8, 2019 at 8:37 PM Jeevan Ladhe
 wrote:
> +   if (!XLogRecPtrIsInvalid(previous_lsn))
> +   appendStringInfo(labelfile, "PREVIOUS WAL LOCATION: %X/%X\n",
> +(uint32) (previous_lsn >> 32), (uint32) 
> previous_lsn);
>
> Maybe we should rename it to something like:
> "INCREMENTAL BACKUP START WAL LOCATION" or simply "INCREMENTAL BACKUP START 
> LOCATION"
> to make it more intuitive?

So, I think that you are right that PREVIOUS WAL LOCATION might not be
entirely clear, but at least in my view, INCREMENTAL BACKUP START WAL
LOCATION is definitely not clear.  This backup is an incremental
backup, and it has a start WAL location, so you'd end up with START
WAL LOCATION and INCREMENTAL BACKUP START WAL LOCATION and those sound
like they ought to both be the same thing, but they're not.  Perhaps
something like REFERENCE WAL LOCATION or REFERENCE WAL LOCATION FOR
INCREMENTAL BACKUP would be clearer.
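
For example, with that rename the hunk quoted above would emit something
like the following (exact wording still up for discussion, of course):

    if (!XLogRecPtrIsInvalid(previous_lsn))
        appendStringInfo(labelfile, "REFERENCE WAL LOCATION: %X/%X\n",
                         (uint32) (previous_lsn >> 32), (uint32) previous_lsn);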

> File header structure is defined in both the files basebackup.c and
> pg_combinebackup.c. I think it is better to move this to 
> replication/basebackup.h.

Or some other header, but yeah, definitely don't duplicate the struct
definition (or any other kind of definition).

> IMHO, while labels are not advisable in general, it may be better to use a 
> label
> here rather than a while(1) loop, so that we can move to the label in case we
> want to retry once. I think here it opens doors for future bugs if someone
> happens to add code here, ending up adding some condition and then the
> break becomes conditional. That will leave us in an infinite loop.

I'm not sure which style is better here, but I don't really buy this argument.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: block-level incremental backup

2019-08-09 Thread Robert Haas
On Wed, Aug 7, 2019 at 5:46 AM Jeevan Chalke
 wrote:
> So, do you mean we should just do fread() and fwrite() for the whole file?
>
> I thought it is better if it was done by the OS itself instead of reading 1GB
> into the memory and writing the same to the file.

Well, 'cp' is just a C program.  If they can write code to copy a
file, so can we, and then we're not dependent on 'cp' being installed,
working properly, being in the user's path or at the hard-coded
pathname we expect, etc.  There's an existing copy_file() function in
src/backend/storage/file/copydir.c which I'd probably look into
adapting for frontend use.  I'm not sure whether it would be important
to adapt the data-flushing code that's present in that routine or
whether we could get by with just the loop to read() and write() data.
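
A minimal sketch of such a frontend loop, leaving out the data-flushing
logic and with error handling abbreviated (this is not the existing
copy_file() code):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static void
copy_file_simple(const char *src, const char *dst)
{
    char        buf[65536];
    ssize_t     nread;
    int         srcfd = open(src, O_RDONLY);
    int         dstfd = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0600);

    if (srcfd < 0 || dstfd < 0)
    {
        fprintf(stderr, "could not open file: %s\n", strerror(errno));
        exit(1);
    }

    while ((nread = read(srcfd, buf, sizeof(buf))) > 0)
    {
        if (write(dstfd, buf, nread) != nread)
        {
            fprintf(stderr, "could not write to \"%s\": %s\n",
                    dst, strerror(errno));
            exit(1);
        }
    }
    if (nread < 0)
    {
        fprintf(stderr, "could not read from \"%s\": %s\n",
                src, strerror(errno));
        exit(1);
    }

    close(srcfd);
    close(dstfd);
}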

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: SegFault on 9.6.14

2019-08-09 Thread Robert Haas
On Wed, Aug 7, 2019 at 5:45 AM vignesh C  wrote:
> I have made a patch based on the above lines.
> I have tested the scenarios which Thomas had shared in the earlier
> mail and few more tests based on Thomas's tests.
> I'm not sure if we will be going ahead with this solution or not.
> Let me know your opinion on the same.
> If you feel this approach is ok, we can add few of this tests into pg tests.

I think this patch is bizarre:

- It introduces a new function called getParallelModeLevel(), which is
randomly capitalized differently from the other functions that do
similar things, and then uses it to do the same thing that could have
been done with the existing function IsInParallelMode().
- It contains an "if" statement whose only content is an Assert().
Don't write if (a) Assert(b); write Assert(!a || b).
- It contains zero lines of comment changes, which is obviously not
enough for a patch that proposes to fix a very thorny issue.  This
failure has two parts.  First, it adds no new comments to explain the
bug being fixed or the theory of operation of the new code. Second, it
does not even bother updating existing comments that are falsified by
the patch, such as the function header comments for
ExecParallelCleanup and ExecShutdownGather.
- It changes what ExecParallelCleanup does while adjusting only one of
the two callers to match the behavior change.  nodeGatherMerge.c
manages to be completely untouched by this patch.  If you change what a
function does, you really need to grep for all the calls to that
function and adjust all callers to match the new set of expectations.

It's a little hard to get past all of those issues and look at what
the patch actually does, but I'm going to try: the theory of operation
of the patch seems to be that we can skip destroying the parallel
context when performing ExecParallelCleanup and in fact when exiting
parallel mode, and then when we get to executor end time the context
will still be there and we can fish the instrumentation out of it. But
this seems problematic for several reasons.  For one thing, as Amit
already observed, the code currently contains an assertion which
ensures that a ParallelContext can't outlive the time spent in parallel
mode, and it doesn't seem desirable to relax that assertion (this
patch removes it).

But beyond that, the issue here is that the Limit node is shutting
down the Gather node too early, and the right fix must be to stop
doing that, not to change the definition of what it means to shut down
a node, as this patch does.  So maybe a possible approach here - which
I think is more or less what Tom is proposing - is:

1. Remove the code from ExecLimit() that calls ExecShutdownNode().
2. Adjust ExecutePlan() so that it ensures that ExecShutdownNode()
gets called at the very end of execution, at least when execute_once
is set, before exiting parallel mode.
3. Figure out, possibly at a later time or only in HEAD, how to make
the early call to ExecShutdownNode() in ExecLimit() safe, and then put it
back. I think we could do this by passing down some information
indicating which nodes are potentially rescanned by other nodes higher
up in the tree; there's the separate question of whether rescans can
happen due to cursor operations, but the execute_once stuff can handle
that aspect of it, I think.

I'm not quite sure that approach is altogether correct so I'd
appreciate some analysis on that point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add "password_protocol" connection parameter to libpq

2019-08-09 Thread Michael Paquier
On Thu, Aug 08, 2019 at 11:16:24PM -0700, Jeff Davis wrote:
> On Fri, 2019-08-09 at 12:00 +0900, Michael Paquier wrote:
> > What about auth_protocol then?  It seems to me that it could be useful
> > to have the restriction on AUTH_REQ_MD5 as well.
> 
> auth_protocol does sound like a good name. I'm not sure what you mean
> regarding MD5 though.

Sorry, I meant krb5 here.

> We already have that concept to a lesser extent, with the md5
> authentication method also permitting scram-sha-256.

That's present to ease upgrades, and once the AUTH_REQ part is
received the client knows what it needs to go through.

> That sounds good, but there are a lot of possibilities and I can't
> quite decide which way to go.
> 
> We could expose it as an SASL option like:
> 
>    saslmode = {disable|prefer|require-scram-sha-256|require-scram-sha-256-plus}

Or we could shape password_protocol so that it takes a list of
protocols, in short a whitelist of authorized methods.
--
Michael




Re: POC: Cleaning up orphaned files using undo logs

2019-08-09 Thread Amit Kapila
On Fri, Jul 26, 2019 at 9:57 PM Amit Khandekar  wrote:
>
> On Fri, 26 Jul 2019 at 12:25, Amit Kapila  wrote:
> > I agree with all your other comments.
>
> Thanks for addressing the comments. Below is the continuation of my
> comments from 0014-Allow-execution-and-discard-of-undo-by-background-wo.patch
> :
>
>
> + * Perform rollback request.  We need to connect to the database for first
> + * request and that is required because we access system tables while
> for first request and that is required => for the first request. This
> is required
>

Changed as per suggestion.

> ---
>
> +UndoLauncherShmemSize(void)
> +{
> +Sizesize;
> +
> +/*
> + * Need the fixed struct and the array of LogicalRepWorker.
> + */
> +size = sizeof(UndoApplyCtxStruct);
>
> The fixed structure size should be  offsetof(UndoApplyCtxStruct,
> workers) rather than sizeof(UndoApplyCtxStruct)
>
> ---
>

Why? I see similar code in ApplyLauncherShmemSize.  If there is
any problem with this, then we might have a similar problem in the
existing code as well.

> In UndoWorkerCleanup(), we set individual fields of the
> UndoApplyWorker structure, whereas in UndoLauncherShmemInit(), for all
> the UndoApplyWorker array elements, we just memset all the
> UndoApplyWorker structure elements to 0. I think we should be
> consistent for the two cases. I guess we can just memset to 0 as you
> do in UndoLauncherShmemInit(), but this will cause the
> worker->undo_worker_queue to be 0 i.e. XID_QUEUE , whereas in
> UndoWorkerCleanup(), it is set to -1. Is the -1 value essential, or we
> can just set it to XID_QUEUE initially ?

Either is fine, because before launching the worker we set the valid
value.  It is better to set it as InvalidUndoWorkerQueue though.

> Also, if we just use memset in UndoWorkerCleanup(), we need to first
> save generation into a temp variable, and then after memset(), restore
> it back.
>

This sounds like unnecessary trickery.

> That brought me to another point :
> We already have a macro ResetUndoRequestInfo(), so UndoWorkerCleanup()
> can just call ResetUndoRequestInfo().
> 
>

Hmm, both (UndoRequestInfo and UndoApplyWorker) are separate
structures, so how can we reuse them?

> +boolallow_peek;
> +
> +CHECK_FOR_INTERRUPTS();
> +
> +allow_peek = !TimestampDifferenceExceeds(started_at,
> Some comments would be good about what is allow_peek  used for. Something 
> like :
> "Arrange to prevent the worker from restarting quickly to switch databases"
>

Added a slightly different comment.

> -
> +++ b/src/backend/access/undo/README.UndoProcessing
> -
>
> +worker then start reading from one of the queues the requests for that
> start=>starts
> ---
>
> +work, it lingers for UNDO_WORKER_LINGER_MS (10s as default).  This avoids
> As per the latest definition, it is 20s. IMHO, there's no need to
> mention the default value in the readme.
> ---
>
> +++ b/src/backend/access/undo/discardworker.c
> ---
>
> + * portion of transaction that is overflowed into a separate log can
> be processed
> 80-col crossed.
>
> +#include "access/undodiscard.h"
> +#include "access/discardworker.h"
> Not in alphabetical order
>

Fixed all of the above four points.

>
> +++ b/src/backend/access/undo/undodiscard.c
> ---
>
> +next_insert = UndoLogGetNextInsertPtr(logno);
> I checked UndoLogGetNextInsertPtr() definition. It calls
> find_undo_log_slot() to get back the slot from logno. Why not make it
> accept slot as against logno ? At all other places, the slot->logno is
> passed, so it is convenient to just pass the slot there. And in
> UndoDiscardOneLog(), first call find_undo_log_slot() just before the
> above line (or call it at the end of the do-while loop).
>

I am not sure if this is a good idea because find_undo_log_slot is
purely undolog module internals; exposing it to the outside doesn't seem
like a good idea to me.

> This way,
> during each of the UndoLogGetNextInsertPtr() calls in undorequest.c,
> we will have one less find_undo_log_slot() call.
>

I am not sure there is any performance benefit either because there is
a cache for the slots and the lookup should get it from there very quickly.  I
think we can avoid this repeated call and I have done so in the
attached patch.

> -
>
> In UndoDiscardOneLog(), there are at least 2 variable declarations
> that can be moved inside the do-while loop : uur and next_insert. I am
> not sure about the other variables viz : undofxid and
> latest_discardxid. Values of these variables in one iteration continue
> across to the second iteration. For latest_discardxid, it looks like
> we do want its value to be carried forward, but is it also true for
> undofxid ?
>

undofxid can be moved inside the loop, fixed that and other variables
pointed out by you.

> + /* If we reach here, this means there is something to discard. */
> + need_di

Re: POC: Cleaning up orphaned files using undo logs

2019-08-09 Thread Amit Kapila
On Tue, Jul 23, 2019 at 8:12 PM Amit Khandekar  wrote:
>
> 
>
> Some further review comments for undoworker.c :
>
>
> +/* Sets the worker's lingering status. */
> +static void
> +UndoWorkerIsLingering(bool sleep)
> The function name sounds like "is the worker lingering ?". Can we
> rename it to something like "UndoWorkerSetLingering" ?
>

makes sense, changed as per suggestion.

> -
>
> + errmsg("undo worker slot %d is empty, cannot attach",
> + slot)));
>
> + }
> +
> + if (MyUndoWorker->proc)
> + {
> + LWLockRelease(UndoWorkerLock);
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("undo worker slot %d is already used by "
> + "another worker, cannot attach", slot)));
>
> These two error messages can have a common error message "could not
> attach to worker slot", with errdetail separate for each of them :
> slot %d is empty.
> slot %d is already used by another worker.
>
> --
>

Changed as per suggestion.
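
For the record, the combined form being suggested would look something
like this (wording illustrative):

    /* empty slot */
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("could not attach to undo worker slot %d", slot),
             errdetail("The worker slot is empty.")));

    /* slot already in use */
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("could not attach to undo worker slot %d", slot),
             errdetail("The worker slot is already used by another worker.")));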

> +static int
> +IsUndoWorkerAvailable(void)
> +{
> + int i;
> + int alive_workers = 0;
> +
> + LWLockAcquire(UndoWorkerLock, LW_EXCLUSIVE);
>
> Should have bool return value.
>
> Also, why is it keeping track of number of alive workers ? Sounds like
> earlier it used to return number of alive workers ? If it indeed needs
> to just return true/false, we can do away with alive_workers.
>
> Also, *exclusive* lock is unnecessary.
>
> --
>

Changed as per suggestion.  Additionally, I changed the name of the
function to UndoWorkerIsAvailable(), so that it is similar to other
functions in the file.

> +if (UndoGetWork(false, false, &urinfo, NULL) &&
> +IsUndoWorkerAvailable())
> +UndoWorkerLaunch(urinfo);
>
> There is no lock acquired between IsUndoWorkerAvailable() and
> UndoWorkerLaunch(); that means even though IsUndoWorkerAvailable()
> returns true, there is a small window where UndoWorkerLaunch() does
> not find any worker slot with in_use false, causing assertion failure
> for (worker != NULL).
> --
>

I have removed the assert and instead added a warning.  I have also
added a comment from the place where we call UndoWorkerLaunch to
mention the race condition.

> +UndoWorkerGetSlotInfo(int slot, UndoRequestInfo *urinfo)
> +{
> + /* Block concurrent access. */
> + LWLockAcquire(UndoWorkerLock, LW_EXCLUSIVE);
> *Exclusive* lock is unnecessary.
> -
>

Right, changed to share lock.

> + LWLockRelease(UndoWorkerLock);
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("undo worker slot %d is empty",
> + slot)));
> I believe there is no need to explicitly release an lwlock before
> raising an error, since the lwlocks get released during error
> recovery. Please check all other places where this is done.
> -
>

Fixed.

> + * Start new undo apply background worker, if possible otherwise return 
> false.
> worker, if possible otherwise => worker if possible, otherwise
> -
>
> +static bool
> +UndoWorkerLaunch(UndoRequestInfo urinfo)
> We don't check UndoWorkerLaunch() return value. Can't we make it's
> return value type void ?
>

Now, the function returns void and accordingly I have adjusted the
comment which should address both the above comments.

> Also, it would be better to have urinfo as pointer to UndoRequestInfo
> rather than UndoRequestInfo, so as to avoid structure copy.
> -
>

Okay, changed as per suggestion.

> +{
> + BackgroundWorker bgw;
> + BackgroundWorkerHandle *bgw_handle;
> + uint16 generation;
> + int i;
> + int slot = 0;
> We can remove variable i, and use slot variable in place of i.
> ---
>
> + snprintf(bgw.bgw_name, BGW_MAXLEN, "undo apply worker");
> I think it would be trivial to also append the worker->generation in
> the bgw_name.
> -
>

I am not sure adding 'generation' is useful. It might be better
to add the database id, as each worker works on a particular database,
so that could be useful information.

>
> + if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
> + {
> + /* Failed to start worker, so clean up the worker slot. */
> + LWLockAcquire(UndoWorkerLock, LW_EXCLUSIVE);
> + UndoWorkerCleanup(worker);
> + LWLockRelease(UndoWorkerLock);
> +
> + return false;
> + }
>
> Is it intentional that there is no (warning?) message logged when we
> can't register a bg worker ?
> -
>

Added a warning in that code path.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




default_table_access_method is not in sample config file

2019-08-09 Thread Heikki Linnakangas

On 11/04/2019 19:49, Andres Freund wrote:

On 2019-04-11 14:52:40 +0300, Heikki Linnakangas wrote:

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index f7f726b5aec..bbcab9ce31a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3638,7 +3638,7 @@ static struct config_string ConfigureNamesString[] =
{"default_table_access_method", PGC_USERSET, 
CLIENT_CONN_STATEMENT,
gettext_noop("Sets the default table access method for new 
tables."),
NULL,
-   GUC_IS_NAME
+   GUC_NOT_IN_SAMPLE | GUC_IS_NAME
},
&default_table_access_method,
DEFAULT_TABLE_ACCESS_METHOD,


Hm, I think we should rather add it to sample. That's an oversight, not
intentional.


I just noticed that this is still an issue. default_table_access_method 
is not in the sample config file, and it's not marked with 
GUC_NOT_IN_SAMPLE. I'll add this to the open items list so we don't forget.


- Heikki




Re: POC: Cleaning up orphaned files using undo logs

2019-08-09 Thread Amit Kapila
On Mon, Jul 22, 2019 at 8:39 PM Amit Khandekar  wrote:
>
> On Mon, 22 Jul 2019 at 14:21, Amit Kapila  wrote:
>
> I have started review of
> 0014-Allow-execution-and-discard-of-undo-by-background-wo.patch. Below
> are some quick comments to start with:
>
> +++ b/src/backend/access/undo/undoworker.c
>
> +#include "access/xact.h"
> +#include "access/undorequest.h"
> Order is not alphabetical
>

Fixed this and a few others.

> + * Each undo worker then start reading from one of the queue the requests for
> start=>starts
> queue=>queues
>
> -
>
> + rc = WaitLatch(MyLatch,
> +WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
> +10L, WAIT_EVENT_BGWORKER_STARTUP);
> +
> + /* emergency bailout if postmaster has died */
> + if (rc & WL_POSTMASTER_DEATH)
> + proc_exit(1);
> I think now, thanks to commit cfdf4dc4fc9635a, you don't have to
> explicitly handle postmaster death; instead you can use
> WL_EXIT_ON_PM_DEATH. Please check at all such places where this is
> done in this patch.
>
> -
>

Fixed both of the above issues.

> +UndoWorkerGetSlotInfo(int slot, UndoRequestInfo *urinfo)
> +{
> + /* Block concurrent access. */
> + LWLockAcquire(UndoWorkerLock, LW_EXCLUSIVE);
> +
> + MyUndoWorker = &UndoApplyCtx->workers[slot];
> Not sure why MyUndoWorker is used here. Can't we use a local variable
> ? Or do we intentionally attach to the slot as a side-operation ?
>
> -
>

I have changed the code around this such that we first attach to the
slot and then get the required info.  Also, I don't see the need for an
exclusive lock here, so I changed it to a shared lock.

> + * Get the dbid where the wroker should connect to and get the worker
> wroker=>worker
>
> -
>
> + BackgroundWorkerInitializeConnectionByOid(urinfo.dbid, 0, 0);
> 0, 0 => InvalidOid, 0
>
> + * Set the undo worker request queue from which the undo worker start
> + * looking for a work.
> start => should start
> a work => work
>
> --
>

Fixed both of these.

> + if (!InsertRequestIntoErrorUndoQueue(urinfo))
> I was thinking what happens if for some reason
> InsertRequestIntoErrorUndoQueue() itself errors out. In that case, the
> entry will not be marked invalid, and so there will be no undo action
> carried out because I think the undo worker will exit. What happens
> next with this entry ?

I think this will change after integration with Robert's latest patch
on queues, so I will address it along with that if required.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Small const correctness patch

2019-08-09 Thread Kyotaro Horiguchi
At Thu, 8 Aug 2019 22:56:02 +0300, Mark G  wrote in 

> On Thu, Aug 8, 2019 at 8:51 PM Peter Eisentraut <
> peter.eisentr...@2ndquadrant.com> wrote:
> 
> 
> > How did you find this?  Any special compiler settings?
> >
> 
> 16 hours stuck in a plane on an international flight. I was just eyeballing
> the code to kill the boredom.

Similar loose typing can be seen, for example:

-const char *
+const char * const

src/backend/access/rmgrdesc/*.c
 relmap_identify(uint8 info)
 seq_identify(uint8 info)
 smgr_identify(uint8 info)
 (many)...

src/backend/access/transam/xact.c:
 BlockStateAsString(TBlockState blockState)


I found them by

find $(TOP) -type f -exec egrep -nH -e '^(static )?const char \*' {} +

then eyeballing the first ones. I don't know an automated way
to detect such possibly-loose constness of variables or
functions.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Small patch to fix build on Windows

2019-08-09 Thread Dmitry Igrishin
Fri, 9 Aug 2019 at 10:23, Kyotaro Horiguchi :
>
> At Fri, 9 Aug 2019 09:56:27 +0300, Dmitry Igrishin  wrote 
> in 
> Fri, 9 Aug 2019 at 05:45, Michael Paquier :
> > >
> > > On Thu, Aug 08, 2019 at 10:46:07PM +0300, Dmitry Igrishin wrote:
> > > > This looks nice for a Perl hacker :-). As for me, it looks unusual and
> > > > a bit confusing. I never
> > > > programmed in Perl, but I was able to quickly understand where the
> > > > problem lies due to the
> > > >  style adopted in other languages, when the contents are enclosed in
> > > > quotation marks, and
> > > > the quotation marks are escaped if they are part of the contents.
> > > > So, should I fix it? Any thoughts?
> > >
> > > FWIW, I like Alvaro's suggestion about qq{} in this case, as it makes
> > > sure that double-quotes are correctly applied where they should.
> > The attached 4th version of the patch uses qq||. I used qq|| instead
> > of qq{} for consistency because qq|| is already used in Solution.pm:
> >
> >   return qq|VisualStudioVersion = $self->{VisualStudioVersion}
> >   MinimumVisualStudioVersion = $self->{MinimumVisualStudioVersion}
> >   |;
>
> Hmm. qq is nice but '|' makes my eyes twitch (a bit).  Couldn't we
> use other delimiters like (), ##, or // ? (I like {} for use in
> this patch.)
>
> Any opinions?
Personally I don't care. I used the || notation only in order to be
consistent, since this notation is already used in Solution.pm. If
this consistency is not required, let me provide a patch with the {}
notation. What do you think?




Re: Problem with default partition pruning

2019-08-09 Thread Amit Langote
Thanks for the comments.

On Fri, Aug 9, 2019 at 2:44 PM Kyotaro Horiguchi
 wrote:
> +++ b/src/backend/optimizer/util/plancat.c
> @@ -1267,10 +1267,14 @@ get_relation_constraints(PlannerInfo *root,
> */
>if (include_partition && relation->rd_rel->relispartition)
>{
> ...
> +else
>  {
> +  /* Nope, fetch from the relcache. */
>
> It seems to me that include_partition is true both and only for
> modern and old-fashioned partition parents.
> set_relation_partition_info() is currently called only for modern
> partition parents. If we need that at the place above,
> set_relation_partition_info can be called also for old-fashioned
> partition parent, and get_relation_constraints may forget the
> else case in a broad way.

"include_partition" doesn't have anything to do with what kind the
partition parent is.  It is true when the input relation that is a
partition is directly mentioned in the query (RELOPT_BASEREL) and
constraint_exclusion is on (inheritance_planner considerations makes
the actual code a bit hard to follow but we'll hopefully simplify that
in the near future).  That is also the only case where we need to
consider the partition constraint when doing constraint exclusion.
Regarding how this relates to partition_qual:

* get_relation_constraints() can use it if it's set, which would be
the case if the partition in question is partitioned itself

* It wouldn't be set if the partition in question is a leaf partition,
so it will have to get it directly from the relcache

> +  /* Nope, fetch from the relcache. */
> +  List   *pcqual = RelationGetPartitionQual(relation);
>
> If the comment above is right, this would be a duplicate. What we
> should do instead is only eval_const_expression. And we could
> move it to set_relation_partition_info completely. Constify must
> be useful in both cases.

As described above, this block of code is not really duplicative in
the sense that when it runs, that would be the first time in a query
to fetch the partition constraint of the relation in question.

Also, note that expression_planner() calls eval_const_expressions(),
so constification happens in both cases.  I guess different places
have grown different styles of processing constraint expressions as
the APIs have evolved over time.

Thanks,
Amit




Re: Small patch to fix build on Windows

2019-08-09 Thread Kyotaro Horiguchi
At Fri, 9 Aug 2019 09:56:27 +0300, Dmitry Igrishin  wrote in 

> Fri, 9 Aug 2019 at 05:45, Michael Paquier :
> >
> > On Thu, Aug 08, 2019 at 10:46:07PM +0300, Dmitry Igrishin wrote:
> > > This looks nice for a Perl hacker :-). As for me, it looks unusual and
> > > a bit confusing. I never
> > > programmed in Perl, but I was able to quickly understand where the
> > > problem lies due to the
> > >  style adopted in other languages, when the contents are enclosed in
> > > quotation marks, and
> > > the quotation marks are escaped if they are part of the contents.
> > > So, should I fix it? Any thoughts?
> >
> > FWIW, I like Alvaro's suggestion about qq{} in this case, as it makes
> > sure that double-quotes are correctly applied where they should.
> The attached 4th version of the patch uses qq||. I used qq|| instead
> of qq{} for consistency because qq|| is already used in Solution.pm:
> 
>   return qq|VisualStudioVersion = $self->{VisualStudioVersion}
>   MinimumVisualStudioVersion = $self->{MinimumVisualStudioVersion}
>   |;

Hmm. qq is nice but '|' makes my eyes twitch (a bit).  Couldn't we
use other delimiters like (), ##, or // ? (I like {} for use in
this patch.)

Any opinions?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center