Re: Fix typos and inconsistencies for v11+

2019-06-17 Thread Michael Paquier
On Sat, Jun 15, 2019 at 06:00:00PM +0300, Alexander Lakhin wrote:
> Two summary patches for REL_11_STABLE and master are attached.

Thanks.  I have committed to HEAD most of the inconsistencies you have
pointed out.
--
Michael


signature.asc
Description: PGP signature


Re: Do we expect tests to work with default_transaction_isolation=serializable

2019-06-17 Thread Michael Paquier
On Sat, Jun 15, 2019 at 11:47:39AM -0700, Noah Misch wrote:
> On Sun, May 19, 2019 at 03:55:06PM -0700, Andres Freund wrote:
>> Currently that's not the case. When running check-world with PGOPTIONS
>> set to -c default_transaction_isolation=serializable I get easy to fix
>> failures (isolation, plpgsql) but also some apparently hanging tests
>> (003_recovery_targets.pl, 003_standby_2.pl).

These sound strange and may point to actual bugs.

>> Do we expect this to work? If it's desirable I'll set up an animal that
>> forces it to on.
> 
> I'm +1 for making it a project expectation, with an animal to confirm.  It's
> not expected to work today.

+1.
--
Michael


signature.asc
Description: PGP signature


Re: Fix typos and inconsistencies for v11+

2019-06-17 Thread Alexander Lakhin
17.06.2019 10:16, Michael Paquier wrote:
> On Sat, Jun 15, 2019 at 06:00:00PM +0300, Alexander Lakhin wrote:
>> Two summary patches for REL_11_STABLE and master are attached.
> Thanks.  I have committed to HEAD most of the inconsistencies you have
> pointed out.
Thank you, Michael.
Then I will dig deeper into v10 and older branches. If the older versions are not
going to be fixed, I will prepare patches only for the master branch.

Best regards,
Alexander





Still some references to configure-time WAL segment size option in msvc scripts

2019-06-17 Thread Michael Paquier
Hi all,

I have just bumped into $subject, which makes no sense now as this is
an initdb-time option.  Any objections if I remove this code as per the
attached?

Thanks,
--
Michael
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index 33ba15c15e..cbe019e524 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -55,10 +55,6 @@ sub _new
 	die "Bad wal_blocksize $options->{wal_blocksize}"
 	  unless grep { $_ == $options->{wal_blocksize} }
 	  (1, 2, 4, 8, 16, 32, 64);
-	$options->{wal_segsize} = 16
-	  unless $options->{wal_segsize};  # undef or 0 means default
-	die "Bad wal_segsize $options->{wal_segsize}"
-	  unless grep { $_ == $options->{wal_segsize} } (1, 2, 4, 8, 16, 32, 64);
 
 	return $self;
 }


signature.asc
Description: PGP signature


RE: [PATCH] Speedup truncates of relation forks

2019-06-17 Thread Jamison, Kirk
Hi all,

Attached is the v2 of the patch. I added the optimization that Sawada-san
suggested for DropRelFileNodeBuffers, although I did not acquire the lock
when comparing the minBlock and target block. 

There's actually a comment written in the source code that we could
pre-check buffer tag for forkNum and blockNum, but given that FSM and VM
blocks are small compared to main fork's, the additional benefit of doing so 
would be small.

>* We could check forkNum and blockNum as well as the rnode, but the
>* incremental win from doing so seems small.
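
For reference, a rough sketch of what that pre-check could look like inside the
buffer scan of DropRelFileNodeBuffers is below. Field names follow bufmgr.c, but
the loop is abbreviated and this is only an illustration, not part of the
attached patch:

for (i = 0; i < NBuffers; i++)
{
    BufferDesc *bufHdr = GetBufferDescriptor(i);
    uint32      buf_state;

    /* Existing unlocked pre-check: skip buffers of other relations. */
    if (!RelFileNodeEquals(bufHdr->tag.rnode, rnode.node))
        continue;

    /*
     * Suggested additional unlocked pre-check on fork and block number.
     * Like the rnode test, it is only a hint; the authoritative check is
     * repeated below while holding the buffer header lock.
     */
    if (bufHdr->tag.forkNum != forkNum ||
        bufHdr->tag.blockNum < firstDelBlock)
        continue;

    buf_state = LockBufHdr(bufHdr);
    if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
        bufHdr->tag.forkNum == forkNum &&
        bufHdr->tag.blockNum >= firstDelBlock)
        InvalidateBuffer(bufHdr);   /* releases spinlock */
    else
        UnlockBufHdr(bufHdr, buf_state);
}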

I personally think it's alright not to include the suggested pre-checking.
In that case, we can just keep the v1 version of the patch.

Thoughts?

Comments and reviews on other parts of the patch are also very much welcome.

Regards,
Kirk Jamison


v2-0001-Speedup-truncates-of-relation-forks.patch
Description: v2-0001-Speedup-truncates-of-relation-forks.patch


Strange error message in xlog.c

2019-06-17 Thread Kyotaro Horiguchi
Hello,

I received a trouble report in which the reporter got the following error
messages.

FATAL:  XX000: requested timeline 175 is not a child of this server's history
DETAIL:  Latest checkpoint is at 1A/D628 on timeline 172, but in
the history of the requested timeline, the server forked off from that
timeline at 1C/29074DB8.

This message doesn't make sense. Perhaps timeline 172 started
after 1A/D628 instead.

The attached patch makes the error messages for both cases make sense.

FATAL:  requested timeline 4 is not a child of this server's history
DETAIL:  Latest checkpoint is at 0/360 on timeline 2, but in the
history of the requested timeline, the server forked off from that
timeline at 0/22000A0.

FATAL:  requested timeline 4 is not a child of this server's history
DETAIL:  Latest checkpoint is at 0/360 on timeline 2, but in the
history of the requested timeline, the server entered that timeline at
0/4A0.
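
Just to illustrate the shape of the change (the attached patch is what
actually counts): the existing single ereport() in xlog.c can choose between
the two errdetail() strings. In the sketch below, tli_begin is a hypothetical
variable holding the LSN at which the checkpoint's timeline begins in the
requested timeline's history, and the condition is only illustrative.

if (tliOfPointInHistory(ControlFile->checkPoint, expectedTLEs) !=
    ControlFile->checkPointCopy.ThisTimeLineID)
{
    XLogRecPtr  switchpoint;

    switchpoint = tliSwitchPoint(ControlFile->checkPointCopy.ThisTimeLineID,
                                 expectedTLEs, NULL);
    ereport(FATAL,
            (errmsg("requested timeline %u is not a child of this server's history",
                    recoveryTargetTLI),
             switchpoint < ControlFile->checkPoint ?
             errdetail("Latest checkpoint is at %X/%X on timeline %u, but in the history of the requested timeline, the server forked off from that timeline at %X/%X.",
                       (uint32) (ControlFile->checkPoint >> 32),
                       (uint32) ControlFile->checkPoint,
                       ControlFile->checkPointCopy.ThisTimeLineID,
                       (uint32) (switchpoint >> 32), (uint32) switchpoint) :
             errdetail("Latest checkpoint is at %X/%X on timeline %u, but in the history of the requested timeline, the server entered that timeline at %X/%X.",
                       (uint32) (ControlFile->checkPoint >> 32),
                       (uint32) ControlFile->checkPoint,
                       ControlFile->checkPointCopy.ThisTimeLineID,
                       (uint32) (tli_begin >> 32), (uint32) tli_begin)));
}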

Intentional corruption of the timeline history is required to
exercise this. Do we need a regression test that does that?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


0001-Fix-error-message-for-timeline-history-mismatch.patch
Description: Binary data


Re: SQL/JSON path issues/questions

2019-06-17 Thread Alexander Korotkov
I'm going to push the attached 3 patches if there are no objections.

Regarding 0003-Separate-two-distinctive-json-errors.patch, I think it
requires more thoughts.

RETURN_ERROR(ereport(ERROR,
 (errcode(ERRCODE_SINGLETON_JSON_ITEM_REQUIRED),
  errmsg("left operand of jsonpath
operator %s is not a single numeric value",
-jspOperationName(jsp->type);
+jspOperationName(jsp->type)),
+ (llen != 1 ?
+  errdetail("It was an array with %d
elements.", llen):
+  errdetail("The only element was not a
numeric.");

When we have more than 1 value, it's not exactly an array.  Jsonpath can
extract values from various parts of a json document, which never
constitute an array.  Should we say something like "There are %d
values"?  Also, we should probably display the type of the single element
if it's not numeric.  jsonb_path_match() also throws
ERRCODE_SINGLETON_JSON_ITEM_REQUIRED; should we add a similar
errdetail() there?
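
As a rough sketch of that alternative (JsonbTypeName() and jb below are
placeholders for whatever helper and variable would give us the type name of
the single value; this is a wording illustration, not a tested patch):

RETURN_ERROR(ereport(ERROR,
                     (errcode(ERRCODE_SINGLETON_JSON_ITEM_REQUIRED),
                      errmsg("left operand of jsonpath operator %s is not a single numeric value",
                             jspOperationName(jsp->type)),
                      llen != 1 ?
                      errdetail("There are %d values.", llen) :
                      errdetail("The only value is of type %s.",
                                JsonbTypeName(jb)))));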

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-implement-like_regex-flag-q-in-jsonpath-2.patch
Description: Binary data


0002-improve-jsonpath-like_regex-documentation-2.patch
Description: Binary data


0003-fix-jsonpath-varname-description-2.patch
Description: Binary data


Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

2019-06-17 Thread Antonin Houska
Bruce Momjian  wrote:

> On Mon, Jun  3, 2019 at 12:04:54PM -0400, Robert Haas wrote:
> > 
> > What I'm talking about here is that it also has to be reasonably
> > possible to write an encryption key command that does something
> > useful.  I don't have a really clear vision for how that's going to
> > work.  Nobody wants the server, which is probably being launched by
> > pg_ctl or systemd or both, to prompt using its own stdin/stderr, but
> > the we need to think about how the prompting is actually going to
> > work.  One idea is to do it via the FEBE protocol: connect to the
> > server using libpq, issue a new command that causes the server to
> > enter COPY mode, and then send the encryption key as COPY data.
> > However, that idea doesn't really work, because we've got to be able
> > to get the key before we run recovery or reach consistency, so the
> > server won't be listening for connections at that point.  Also, we've
> > got to have a way for this to work in single-user mode, which also
> > can't listen for connections.  It's probably all fine if your
> > encryption key command is something like 'curl
> > https://my.secret.keyserver.me/sekcret.key' because then there's an
> > external server which you can just go access - but I don't quite
> > understand how you'd do interactive prompting from here.  Sorry to get
> > hung up on what may seem like a minor point, but I think it's actually
> > fairly fundamental: we've got to have a clear vision of how end-users
> > will really be able to make use of this.
> 
> pgcryptoey has an example of prompting from /dev/tty:
> 
>   http://momjian.us/download/pgcryptokey/
> 
> Look at pgcryptokey_default.sample.

It's a nice exercise of shell features but does not seem to be easily
portable, and it has some other restrictions. One is mentioned in a comment:

# Use 'pg_ctl -s' so starting dots do not interfere with password entry

I think that besides the dots, log messages can also be disturbing if the logging
collector is not enabled. Another corner case is when the user runs
pg_ctl with the --no-wait option and wants to run some other command
immediately.

I'm thinking about how to teach the postmaster to accept FEBE protocol connections
temporarily, just to receive the key. User applications like pg_ctl,
initdb or pg_upgrade would retrieve the key / password from the DBA, then
start the postmaster and send it the key.

Perhaps the message format should be a bit generic so that extensions like
this can use it to receive their keys too.

(The idea of a unix socket or named pipe that I proposed upthread is not good
because it's harder to implement in a portable way.)

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Fix up grouping sets reorder

2019-06-17 Thread Richard Guo
Hi all,

During the reorder of grouping sets into correct prefix order, if only
one aggregation pass is needed, we follow the order of the ORDER BY
clause to the extent possible, to minimize the chance that we add
unnecessary sorts. This is implemented in preprocess_grouping_sets -->
reorder_grouping_sets.

However, the current code fails to do that. For instance:

# set enable_hashagg to off;
SET

# explain verbose select * from t group by grouping sets((a,b,c),(c)) order
by c,b,a;
  QUERY PLAN
---
 Sort  (cost=184.47..185.48 rows=404 width=12)
   Output: a, b, c
   Sort Key: t.c, t.b, t.a
   ->  GroupAggregate  (cost=142.54..166.98 rows=404 width=12)
 Output: a, b, c
 Group Key: t.c, t.a, t.b
 Group Key: t.c
 ->  Sort  (cost=142.54..147.64 rows=2040 width=12)
   Output: a, b, c
   Sort Key: t.c, t.a, t.b
   ->  Seq Scan on public.t  (cost=0.00..30.40 rows=2040
width=12)
 Output: a, b, c
(12 rows)

This sort node in the above plan can be avoided if we reorder the
grouping sets more properly.

Attached is a patch for the fixup. With the patch, the above plan would
become:

# explain verbose select * from t group by grouping sets((a,b,c),(c)) order
by c,b,a;
   QUERY PLAN
-
 GroupAggregate  (cost=142.54..166.98 rows=404 width=12)
   Output: a, b, c
   Group Key: t.c, t.b, t.a
   Group Key: t.c
   ->  Sort  (cost=142.54..147.64 rows=2040 width=12)
 Output: a, b, c
 Sort Key: t.c, t.b, t.a
 ->  Seq Scan on public.t  (cost=0.00..30.40 rows=2040 width=12)
   Output: a, b, c
(9 rows)

The fix happens in reorder_grouping_sets and is very simple. In each
iteration to reorder one grouping set, if the next item in sortclause
matches one element in new_elems, we add that item to the grouping set
list and remove it from the new_elems list. Once all the elements in
new_elems have been removed, we know we are done with the current
grouping set and should break out to continue with the next grouping
set, as sketched below.
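
A simplified sketch of that loop follows (the attached patch is authoritative;
"previous" stands for the already-placed prefix, and how the position in
sortclause is carried over between grouping sets is glossed over here):

foreach(lc, sortclause)
{
    SortGroupClause *sc = (SortGroupClause *) lfirst(lc);
    int         ref = sc->tleSortGroupRef;

    if (!list_member_int(new_elems, ref))
        break;                  /* diverged from the ORDER BY prefix */

    previous = lappend_int(previous, ref);
    new_elems = list_delete_int(new_elems, ref);

    /* All elements of this grouping set are placed; move to the next one. */
    if (new_elems == NIL)
        break;
}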

Any thoughts?

Thanks
Richard


v1-0001-Fix-up-grouping-sets-reorder.patch
Description: Binary data


Re: Generating partitioning tuple conversion maps faster

2019-06-17 Thread didier
Hi,
Any chance for a backport to PG 11?

The cherry-pick applies cleanly, and with a 100-column table it improves
performance nicely (20%).

Regards
Didier


On Sat, Jul 14, 2018 at 1:25 AM David Rowley
 wrote:
>
> On 14 July 2018 at 04:57, Heikki Linnakangas  wrote:
> > Pushed, thanks!
>
> Thanks for pushing, and thanks again for reviewing it, Alexander.
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>




Re: POC: Cleaning up orphaned files using undo logs

2019-06-17 Thread Amit Kapila
On Thu, Jun 13, 2019 at 3:13 AM Robert Haas  wrote:
>
> On Mon, May 27, 2019 at 5:59 AM Amit Kapila  wrote:
> > Apart from fixing the above comments, the patch is rebased on latest
> > undo patchset.  As of now, I have split the binaryheap.c changes into
> > a separate patch.  We are stilll enhancing the patch to compute
> > oldestXidHavingUnappliedUndo which touches various parts of patch, so
> > splitting further without completing that can make it a bit difficult
> > to work on that.
>
> Some review comments around execute_undo_actions:
>
> The 'nopartial' argument to execute_undo_actions is confusing.  First,
> it would probably be worth spelling it out instead of abbreviation:
> not_partial_transaction rather than nopartial.  Second, it is usually
> better to phrase parameter names in terms of what they are rather than
> in terms of what they are not: complete_transaction rather than
> not_partial_transaction.  Third, it's unclear from these comments why
> we'd be undoing something other than a complete transaction.  It looks
> as though the answer is that this flag will be false when we're
> undoing a subxact -- in which case, why not invert the sense of the
> flag and call it 'bool subxact'?  I might be wrong, but it seems like
> that would be a whole lot clearer.
>

The idea was that it could be used for multiple purposes like 'rolling
back complete xact', 'rolling back subxact', 'rollback at page-level'
or any similar future need even though not all code paths use that
function.  I am not wedded to any particular name here, but among your
suggestions complete_transaction sounds better to me.  Are you okay
going with that?

> Fourth, the block at the top of
> this function, guarded by nopartial, seems like it must be vulnerable
> to race conditions.  If we're undoing the complete transaction, then
> it checks whether UndoFetchRecord() still returns anything.  But that
> could change not just at the beginning of the function, but also any
> time in the middle, or so it seems to me.
>

It won't change in between because we have ensured at top-level that
no two processes can start executing pending undo at the same time.
Basically, anyone who wants to execute the undo actions will have an entry
in the rollback hash table and that will be marked as in-progress.  As
mentioned in comments, the race is only "after discard worker
fetches the record and found that this transaction need to be rolled
back, backend might concurrently execute the actions and remove the
request from rollback hash table."

>  I doubt that this is the
> right level at which to deal with this sort of interlocking. I think
> there should be some higher-level mechanism that prevents two
> processes from trying to undo the same transaction at the same time,
> like a heavyweight lock or some kind of flag in the shared memory data
> structure that keeps track of pending undo, so that we never even
> reach this code unless we know that this XID needs undo work
>

Introducing a heavyweight lock can create a different sort of problem
because we would need to hold it till all the actions are applied to avoid
what I have mentioned above.  The problem will be that the discard worker
will be blocked till the backend/undo worker applies the complete set of actions,
unless we just take this lock conditionally in the discard worker.

Another way could be to re-fetch the undo record when we are
registering the undo request under RollbackRequestLock and check its
status again, because in that case a backend or another undo worker won't be
able to remove the request from the hash table concurrently.  However, the
advantage of checking it in execute_undo_actions is that we can
optimize it in the future to avoid re-fetching this record when
actually fetching the records to apply undo actions.

> and no
> other process is already doing it.
>

This part is already ensured in the current code.

>
> The 'blk_chain_complete' variable which is set in this function and
> passed down to execute_undo_actions_page() and then to the rmgr's
> rm_undo callback also looks problematic.
>

I agree this parameter should go away from the generic interface
considering the requirements from zedstore.

>  First, not every AM that
> uses undo may even have the notion of a 'block chain'; zedstore for
> example uses TIDs as a 48-bit integer, not a block + offset number, so
> it's really not going to have a 'block chain.'  Second, even in
> zheap's usage, it seems to me that the block chain could be complete
> even when this gets set to false. It gets set to true when we're
> undoing a toplevel transaction (not a subxact) and we were able to
> fetch all of the undo for that toplevel transaction. But even if
> that's not true, the chain for an individual block could still be
> complete, because all the remaining undo for the block at issue
> might've been in the chunk of undo we already read; the remaining undo
> could be for other blocks.  For that reason, I can't see how the zheap
> code that relies on this value ca

Re: SQL/JSON path issues/questions

2019-06-17 Thread Liudmila Mantrova


On 6/17/19 11:36 AM, Alexander Korotkov wrote:

I'm going to push attached 3 patches if no objections.

Regarding 0003-Separate-two-distinctive-json-errors.patch, I think it
requires more thoughts.

 RETURN_ERROR(ereport(ERROR,
  (errcode(ERRCODE_SINGLETON_JSON_ITEM_REQUIRED),
   errmsg("left operand of jsonpath
operator %s is not a single numeric value",
-jspOperationName(jsp->type);
+jspOperationName(jsp->type)),
+ (llen != 1 ?
+  errdetail("It was an array with %d
elements.", llen):
+  errdetail("The only element was not a
numeric.");

When we have more than 1 value, it's not exactly an array.  Jsonpath can
extract values from various parts of a json document, which never
constitute an array.  Should we say something like "There are %d
values"?  Also, we should probably display the type of the single element
if it's not numeric.  jsonb_path_match() also throws
ERRCODE_SINGLETON_JSON_ITEM_REQUIRED; should we add a similar
errdetail() there?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Hi Alexander,

While I have no objections to the proposed fixes, I think we can further 
improve patch 0003 and the text it refers to.
In an attempt to clarify the jsonpath docs and address the concern that ? is
hard to trace in the current text, I'd also like to propose patch 0004.

Please see both of them attached.

--
Liudmila Mantrova
Technical writer at Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e918133..39ba18d 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -12281,7 +12281,7 @@ table2-mapping

 @?
 jsonpath
-Does JSON path returns any item for the specified JSON value?
+Does JSON path return any item for the specified JSON value?
 '{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] ? (@ > 2)'


@@ -12309,8 +12309,8 @@ table2-mapping
   

 The @? and @@ operators suppress
-errors including: lacking object field or array element, unexpected JSON
-item type and numeric errors.
+the following errors: lacking object field or array element, unexpected
+JSON item type, and numeric errors.
 This behavior might be helpful while searching over JSON document
 collections of varying structure.

@@ -13166,26 +13166,25 @@ table2-mapping
 jsonb_path_query, jsonb_path_query_array and
 jsonb_path_query_first
 functions have optional vars and silent
-argument.
+arguments.


-When vars argument is specified, it constitutes an object
-contained variables to be substituted into jsonpath
-expression.
+If the vars argument is specified, it provides an
+object containing named variables to be substituted into a
+jsonpath expression.


-When silent argument is specified and has
-true value, the same errors are suppressed as it is in
-the @? and @@ operators.
+If the silent argument is specified and has the
+true value, these functions suppress the same errors
+as the @? and @@ operators.

   
 
   
-See also  for the aggregate
-function json_agg which aggregates record
-values as JSON, and the aggregate function
-json_object_agg which aggregates pairs of values
-into a JSON object, and their jsonb equivalents,
+See also  for details on
+json_agg function that aggregates record
+values as JSON, json_object_agg function
+that aggregates pairs of values into a JSON object, and their jsonb equivalents,
 jsonb_agg and jsonb_object_agg.
   
 
diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml
index b8246ba..daebb4f 100644
--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -733,10 +733,12 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"tags": ["qu
  
  
   $varname
-  A named variable. Its value must be set in the
-  PASSING clause of an SQL/JSON query function.
- 
-  for details.
+  
+A named variable. Its value can be set by the parameter
+vars of several JSON processing functions.
+See  and
+its notes for details.
+
   
  
  
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 39ba18d..fa5afc1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -11638,10 +11638,17 @@ table2-mapping
   
When defining the path, you can also use one or more
filter expressions, which work similar to
-   the WHERE clause in SQL. Each filter expression
-   can provide one or more filtering conditions that are applied
-   to the result of the path evaluation. Each filter expression must
-   be enclosed in parenthese

Re: Obsolete comments about semaphores in proc.c

2019-06-17 Thread Daniel Gustafsson
> On 14 Jun 2019, at 01:00, Thomas Munro  wrote:

> Commit 675f switched from a semaphore-based waiting to latch-based
> waiting for ProcSleep()/ProcWakeup(), but left behind some stray
> references to semaphores.  PSA.

LGTM

cheers ./daniel




Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-17 Thread Christoph Berg
Re: Tom Lane 2019-06-14 <26948.1560517...@sss.pgh.pa.us>
> > /usr/lib/postgresql/12/bin/initdb -D pgdata
> > $ grep timezone pgdata/postgresql.conf
> > log_timezone = 'Etc/UTC'
> > timezone = 'Etc/UTC'
> 
> That's what I'd expect.  Do you think your upthread report of HEAD
> picking "Etc/UCT" was a typo?  Or maybe you actually had /etc/localtime
> set that way?

That was likely a typo, yes. Sorry for the confusion, there's many
variables...

Christoph




Re: CREATE STATISTICS documentation bug

2019-06-17 Thread Robert Haas
On Sat, Jun 15, 2019 at 7:22 PM Tomas Vondra
 wrote:
> I've pushed a fix for the docs issue.

Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-17 Thread Stephen Frost
Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> On Sun, Jun 16, 2019 at 02:10:23PM -0400, Stephen Frost wrote:
> >* Joe Conway (m...@joeconway.com) wrote:
> >>On 6/16/19 9:45 AM, Bruce Momjian wrote:
> >>> On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
>  In any case it doesn't address my first point, which is limiting the
>  volume encrypted with the same key. Another valid reason is you might
>  have data at varying sensitivity levels and prefer different keys be
>  used for each level.
> >>>
> >>> That seems quite complex.
> >>
> >>How? It is no more complex than encrypting at the tablespace level
> >>already gives you - in that case you get this property for free if you
> >>care to use it.
> >
> >Perhaps not surprising, but I'm definitely in agreement with Joe
> >regarding having multiple keys when possible and (reasonably)
> >straight-forward to do so.  I also don't buy off on the OpenSSL
> >argument; their more severe issues certainly haven't been due to key
> >management issues such as what we're discussing here, so I don't think
> >the argument applies.
> 
> I'm not sure what exactly is the "OpenSSL argument" you're disagreeing
> with? IMHO Bruce is quite right that the risk of vulnerabilities grows
> with the complexity of the system (both due to implementation bugs and
> general design weaknesses). I don't think it's tied to the key
> management specifically, except that it's one of the parts that may
> contribute to the complexity.

While I understand that the complexity of the system can lead to
vulnerabilities, I don't agree that it's appropriate or sensible to
treat the issues that OpenSSL has had to deal with as being similar to
what we might have to deal with, given this additional proposed
complexity.

> (It's often claimed that key management is one of the weakest points of
> current crypto systems - we have safe (a)symmetric algorithms, but safe
> handling of keys is an issue. I don't have data / papers supporting this
> claim, I kinda believe it.)

Yes, I agree entirely that key management is absolutely one of the
hardest things to get right in crypto systems.  It is, however, not what
the OpenSSL issues have been about and so while I agree that we may have
some bugs there, it's not fair to say that they're equivalent to what
OpenSSL has been dealing with.

> Now, I'm not opposed to eventually implementing something more
> elaborate, but I also think just encrypting the whole cluster (not
> necessarily with a single key, but with one master key) would be enough
> for vast majority of users. Plus it's less error prone and easier to
> operate (backups, replicas, crash recovery, ...).

I agree that it'd be better than nothing but if we have the opportunity
now to introduce what would hopefully be a relatively small capability,
then I'm certainly all for it.

> But there's about 0% chance we'll get that in v1, of course, so we need
> s "minimum viable product" to build on anyway.

There seems like a whole lot of space between something very elaborate
and only supporting one key.

Thanks,

Stephen


signature.asc
Description: PGP signature


pg_log_fatal vs pg_log_error

2019-06-17 Thread Antonin Houska
Can anyone please give me a hint (and possibly add some comments to the code)
when pg_log_fatal() should be used in frontend code and when it's appropriate
to call pg_log_error()? The current use does not seem very consistent.

I'd expect that the pg_log_fatal() should be called when the error is serious
enough to cause premature exit, but I can see cases where even pg_log_error()
is followed by exit(1). pg_waldump makes me feel that pg_log_error() is used
to handle incorrect user input (before the actual execution started) while
pg_log_fatal() handles error conditions that user does not fully control
(things that happen during the actual execution). But this is rather a guess.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-17 Thread Joe Conway
On 6/17/19 8:12 AM, Stephen Frost wrote:
>> But there's about 0% chance we'll get that in v1, of course, so we need
>> s "minimum viable product" to build on anyway.
> 
> There seems like a whole lot of space between something very elaborate
> and only supporting one key.

I think this is exactly the point -- IMHO one key per tablespace is a
nice and very sensible compromise. I can imagine all kinds of more
complex things that would be "nice to have" but that gets us most of the
flexibility needed with minimal additional complexity.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-17 Thread Masahiko Sawada
On Fri, Jun 14, 2019 at 7:41 AM Tomas Vondra
 wrote:
> I personally find the idea of encrypting tablespaces rather strange.
> Tablespaces are meant to define physical location for objects, but this
> would also use them to "mark" objects as encrypted or not. That just
> seems misguided and would make the life harder for many users.
>
> For example, what if I don't have any tablespaces (except for the
> default one), but I want to encrypt only some objects? Suddenly I have
> to create a tablespace, which will however cause various difficulties
> down the road (during pg_basebackup, etc.).

I guess that we could have an encrypted tablespace by default (e.g.
pg_default_enc). Or we could encrypt per table while having encryption keys
per tablespace.

On Mon, Jun 17, 2019 at 6:54 AM Tomas Vondra
 wrote:
>
> On Sun, Jun 16, 2019 at 02:10:23PM -0400, Stephen Frost wrote:
> >Greetings,
> >
> >* Joe Conway (m...@joeconway.com) wrote:
> >> On 6/16/19 9:45 AM, Bruce Momjian wrote:
> >> > On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
> >> >> In any case it doesn't address my first point, which is limiting the
> >> >> volume encrypted with the same key. Another valid reason is you might
> >> >> have data at varying sensitivity levels and prefer different keys be
> >> >> used for each level.
> >> >
> >> > That seems quite complex.
> >>
> >> How? It is no more complex than encrypting at the tablespace level
> >> already gives you - in that case you get this property for free if you
> >> care to use it.
> >
> >Perhaps not surprising, but I'm definitely in agreement with Joe
> >regarding having multiple keys when possible and (reasonably)
> >straight-forward to do so.  I also don't buy off on the OpenSSL
> >argument; their more severe issues certainly haven't been due to key
> >management issues such as what we're discussing here, so I don't think
> >the argument applies.
> >
>
> I'm not sure what exactly is the "OpenSSL argument" you're disagreeing
> with? IMHO Bruce is quite right that the risk of vulnerabilities grows
> with the complexity of the system (both due to implementation bugs and
> general design weaknesses). I don't think it's tied to the key
> management specifically, except that it's one of the parts that may
> contribute to the complexity.
>
> (It's often claimed that key management is one of the weakest points of
> current crypto systems - we have safe (a)symmetric algorithms, but safe
> handling of keys is an issue. I don't have data / papers supporting this
> claim, I kinda believe it.)
>
> Now, I'm not opposed to eventually implementing something more
> elaborate, but I also think just encrypting the whole cluster (not
> necessarily with a single key, but with one master key) would be enough
> for vast majority of users. Plus it's less error prone and easier to
> operate (backups, replicas, crash recovery, ...).
>
> But there's about 0% chance we'll get that in v1, of course, so we need
> s "minimum viable product" to build on anyway.
>

I agree that we need a minimum viable product first. But I'm not sure
it's true that implementing the cluster-wide TDE first would be
the right first step toward per-tablespace/table TDE.

The purposes of cluster-wide TDE and table/tablespace TDE are slightly
different in terms of encryption target objects. The cluster-wide TDE
would be a good solution for users who want to encrypt everything,
while the table/tablespace TDE would help more demanding use cases in
terms of both security and performance.

The cluster-wide TDE eventually encrypts SLRU data and all WAL
including non-user data related WAL while table/tablespace TDE doesn't
unless we develop such functionality. In addition, the cluster-wide
TDE also encrypts system catalogs but in table/tablespace TDE user
would be able to control that somewhat. That is, if we developed the
cluster-wide TDE first, when we develop table/tablespace TDE on top of
that we would need to change TDE so that table/tablespace TDE can
encrypt even non-user data related data while retaining its simple
user interface, which would rather make the feature complex, I'm
concerned. We can support them as different TDE features but I'm not
sure it's a good choice for users.

From perspective of  cryptographic, I think the fine grained TDE would
be better solution. Therefore if we eventually want the fine grained
TDE I wonder if it might be better to develop the table/tablespace TDE
first while keeping it simple as much as possible in v1, and then we
can provide the functionality to encrypt other data in database
cluster to satisfy the encrypting-everything requirement. I guess that
it's easier to incrementally add encryption target objects rather than
making it fine grained while not changing encryption target objects.

FWIW I'm writing a draft patch of per tablespace TDE and will submit
it in this month. We can more discuss the complexity of the proposed
TDE using it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT 

Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-17 Thread Joe Conway
On 6/17/19 8:29 AM, Masahiko Sawada wrote:
> From perspective of  cryptographic, I think the fine grained TDE would
> be better solution. Therefore if we eventually want the fine grained
> TDE I wonder if it might be better to develop the table/tablespace TDE
> first while keeping it simple as much as possible in v1, and then we
> can provide the functionality to encrypt other data in database
> cluster to satisfy the encrypting-everything requirement. I guess that
> it's easier to incrementally add encryption target objects rather than
> making it fine grained while not changing encryption target objects.
> 
> FWIW I'm writing a draft patch of per tablespace TDE and will submit
> it in this month. We can more discuss the complexity of the proposed
> TDE using it.

+1

Looking forward to it.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: pg_log_fatal vs pg_log_error

2019-06-17 Thread Michael Paquier
On Mon, Jun 17, 2019 at 02:19:30PM +0200, Antonin Houska wrote:
> I'd expect that the pg_log_fatal() should be called when the error is serious
> enough to cause premature exit, but I can see cases where even pg_log_error()
> is followed by exit(1). pg_waldump makes me feel that pg_log_error() is used
> to handle incorrect user input (before the actual execution started) while
> pg_log_fatal() handles error conditions that user does not fully control
> (things that happen during the actual execution). But this is rather a guess.

I agree with what you say: pg_log_fatal should be used for an
error bad enough that the binary should exit immediately.  In the case
of pg_waldump, not using pg_log_fatal() makes the code more readable
because there is no need to repeat the "Try --help for more
information on a bad argument".  Have you spotted other areas of the
code where it makes sense to change a pg_log_error() + exit to a
single pg_log_fatal()?
--
Michael


signature.asc
Description: PGP signature


Re: Batch insert in CTAS/MatView code

2019-06-17 Thread Paul Guo
Hi all,

I had been working on other things until recently, when I restarted this work,
profiling & refactoring the code.
It's been a long time since the last patch was proposed. The new patch has
first been refactored due to commit
4da597edf1bae0cf0453b5ed6fc4347b6334dfe1 (Make TupleTableSlots extensible,
finish split of existing slot type).

Now that a TupleTableSlot, instead of a HeapTuple, is the argument of
intorel_receive(), we cannot get the tuple length directly. This patch now
computes the tuple length if we know all columns have fixed widths; otherwise
we calculate an average tuple length using the first MAX_MULTI_INSERT_SAMPLES
(defined as 1000) tuples and use that to estimate the total length of the
tuples in a batch.

I noticed that to do batch insert, we sometimes need an additional memory copy
compared with "single insert" (that is probably why we previously saw slight
regressions), so a good solution seems to be to fall back to "single insert"
if the tuple length is larger than a threshold. I set this to 2000 after
quick testing.
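
Roughly, the decision looks like the sketch below. The names are made up for
illustration and are not the ones used in the attached patch:

#define MAX_MULTI_INSERT_SAMPLES    1000    /* sample size mentioned above */
#define MULTI_INSERT_LEN_THRESHOLD  2000    /* fallback threshold from quick testing */

static bool
prefer_multi_insert(TupleDesc tupdesc, double avg_len, uint64 sampled)
{
    int         fixed_len = 0;
    int         i;

    for (i = 0; i < tupdesc->natts; i++)
    {
        Form_pg_attribute att = TupleDescAttr(tupdesc, i);

        if (att->attlen <= 0)   /* varlena etc.: width not known up front */
        {
            fixed_len = -1;
            break;
        }
        fixed_len += att->attlen;
    }

    if (fixed_len >= 0)         /* all columns are fixed width */
        return fixed_len <= MULTI_INSERT_LEN_THRESHOLD;

    if (sampled < MAX_MULTI_INSERT_SAMPLES)
        return true;            /* still collecting the sample */

    return avg_len <= MULTI_INSERT_LEN_THRESHOLD;
}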

To make the test stable and strict, I run a checkpoint before each CTAS; the test
script looks like this:

checkpoint;
\timing
create table tt as select a,b,c from t11;
\timing
drop table tt;

Previously I just tested BufferHeapTupleTableSlot (i.e. create table tt as
select * from t11); this time I additionally test VirtualTupleTableSlot
(i.e. create table tt as select a,b,c from t11).
It seems that VirtualTupleTableSlot is very common in real cases.

I tested four kinds of tables, see below SQLs.

-- tuples with small size.
create table t11 (a int, b int, c int, d int);
insert into t11 select s,s,s,s from generate_series(1, 1000) s;
analyze t11;

-- tuples that are untoasted and tuple size is 1984 bytes.
create table t12 (a name, b name, c name, d name, e name, f name, g name, h
name, i name, j name, k name, l name, m name, n name, o name, p name, q
name, r name, s name, t name, u name, v name, w name, x name, y name, z
name, a1 name, a2 name, a3 name, a4 name, a5 name);
insert into t12 select 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y',
'z', 'a', 'b', 'c', 'd', 'e' from generate_series(1, 50);
analyze t12;

-- tuples that are untoasted and tuple size is 2112 bytes.
create table t13 (a name, b name, c name, d name, e name, f name, g name, h
name, i name, j name, k name, l name, m name, n name, o name, p name, q
name, r name, s name, t name, u name, v name, w name, x name, y name, z
name, a1 name, a2 name, a3 name, a4 name, a5 name, a6 name, a7 name);
insert into t13 select 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y',
'z', 'a', 'b', 'c', 'd', 'e', 'f', 'g' from generate_series(1, 50);
analyze t13;

-- tuples that are toastable and tuple compressed size is 1084.
create table t14 (a text, b text, c text, d text, e text, f text, g text, h
text, i text, j text, k text, l text, m text, n text, o text, p text, q
text, r text, s text, t text, u text, v text, w text, x text, y text, z
text);
insert into t14 select i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i,
i, i, i, i, i, i, i, i, i from (select repeat('123456789', 1) from
generate_series(1,5000)) i;
analyze t14;


I also tested two scenarios for each test.

In the first one, I clean up all kernel caches (page & inode & dentry on Linux)
using the command below and then run the test:
sync; echo 3 > /proc/sys/vm/drop_caches
After running all tests, all relation files will be in the kernel cache (my test
system memory is large enough to accommodate all relation files),
and then I run the tests again. I do this because in a real scenario the
result should fall somewhere between the two. I also rerun
each test and finally calculate the average as the experiment
result. Below are some results:


scenario1: All related kernel caches are cleaned up (note the first two
columns are times in seconds).

baseline   patch   diff%    SQL

10.1       5.57    44.85%   create table tt as select * from t11;
10.7       5.52    48.41%   create table tt as select a,b,c from t11;
9.57       10.2    -6.58%   create table tt as select * from t12;
9.64       8.63    10.48%   create table tt as select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,a1,a2,a3,a4 from t12;
14.2       14.46   -1.83%   create table tt as select * from t13;
11.88      12.05   -1.43%   create table tt as select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,a1,a2,a3,a4,a5,a6 from t13;
3.17       3.25    -2.52%   create table tt as select * from t14;
2.93       3.12    -6.48%   create table tt as select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y from t14;



scenario2: all related kernel caches are populated after previous testing.





baseline   patch   diff%    SQL

9.6        4.97    48.23%   create table tt as select * from t11;
10.41      5.32    48.90%   create table tt as sel

Re: Obsolete comments about semaphores in proc.c

2019-06-17 Thread Michael Paquier
On Mon, Jun 17, 2019 at 01:28:42PM +0200, Daniel Gustafsson wrote:
>> On 14 Jun 2019, at 01:00, Thomas Munro  wrote:
> 
>> Commit 675f switched from a semaphore-based waiting to latch-based
>> waiting for ProcSleep()/ProcWakeup(), but left behind some stray
>> references to semaphores.  PSA.
> 
> LGTM

Fine seen from here as well.  I am not spotting other areas, FWIW.
--
Michael


signature.asc
Description: PGP signature


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-17 Thread Antonin Houska
Masahiko Sawada  wrote:

> The cluster-wide TDE eventually encrypts SLRU data and all WAL
> including non-user data related WAL while table/tablespace TDE doesn't
> unless we develop such functionality. In addition, the cluster-wide
> TDE also encrypts system catalogs but in table/tablespace TDE user
> would be able to control that somewhat. That is, if we developed the
> cluster-wide TDE first, when we develop table/tablespace TDE on top of
> that we would need to change TDE so that table/tablespace TDE can
> encrypt even non-user data related data while retaining its simple
> user interface, which would rather make the feature complex, I'm
> concerned.

Isn't this only a problem of pg_upgrade? If the whole instance (including
catalog) is encrypted and user wants to adopt the table/tablespace TDE, then
pg_upgrade can simply decrypt the catalog, plus tables/tablespaces which
should no longer be encrypted. Conversely, if only some tables/tablespaces are
encrypted and user wants to encrypt the whole cluster, pg_upgrade will encrypt
the non-encrypted files.

> We can support them as different TDE features but I'm not sure it's a good
> choice for users.

IMO it does not matter which approach (cluster vs table/tablespace) is
implemented first. What matters is to design the user interface so that
addition of the other of the two features does not make users confused.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: pg_log_fatal vs pg_log_error

2019-06-17 Thread Antonin Houska
Michael Paquier  wrote:

> On Mon, Jun 17, 2019 at 02:19:30PM +0200, Antonin Houska wrote:
> > I'd expect that the pg_log_fatal() should be called when the error is 
> > serious
> > enough to cause premature exit, but I can see cases where even 
> > pg_log_error()
> > is followed by exit(1). pg_waldump makes me feel that pg_log_error() is used
> > to handle incorrect user input (before the actual execution started) while
> > pg_log_fatal() handles error conditions that user does not fully control
> > (things that happen during the actual execution). But this is rather a 
> > guess.
> 
> I agree with what you say when pg_log_fatal should be used for an
> error bad enough that the binary should exit immediately.  In the case
> of pg_waldump, not using pg_log_fatal() makes the code more readable
> because there is no need to repeat the "Try --help for more
> information on a bad argument".

I'd understand this if pg_log_fatal() called exit() itself, but it does not
(unless I miss something).

> Have you spotted other areas of the code where it makes sense to change a
> pg_log_error() + exit to a single pg_log_fatal()?

I haven't done an exhaustive search so far, but as I mentioned above,
pg_log_fatal() does not seem to be "pg_log_error() + exit()".

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: pg_log_fatal vs pg_log_error

2019-06-17 Thread Michael Paquier
On Mon, Jun 17, 2019 at 03:39:49PM +0200, Antonin Houska wrote:
> I'd understand this if pg_log_fatal() called exit() itself, but it does not
> (unless I miss something).

Oops.  My apologies.  I have my own wrapper of pg_log_fatal() for an
internal tool which does an exit on top of the logging in this case.
You are right the PG code does not exit() in this case.
--
Michael


signature.asc
Description: PGP signature


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-17 Thread Antonin Houska
Antonin Houska  wrote:

> Masahiko Sawada  wrote:
> 
> > The cluster-wide TDE eventually encrypts SLRU data and all WAL
> > including non-user data related WAL while table/tablespace TDE doesn't
> > unless we develop such functionality. In addition, the cluster-wide
> > TDE also encrypts system catalogs but in table/tablespace TDE user
> > would be able to control that somewhat. That is, if we developed the
> > cluster-wide TDE first, when we develop table/tablespace TDE on top of
> > that we would need to change TDE so that table/tablespace TDE can
> > encrypt even non-user data related data while retaining its simple
> > user interface, which would rather make the feature complex, I'm
> > concerned.
> 
> Isn't this only a problem of pg_upgrade?

Sorry, this is not a use case for pg_upgrade. Rather it's about a separate
encryption/decryption utility.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: POC: Cleaning up orphaned files using undo logs

2019-06-17 Thread Robert Haas
On Mon, Jun 17, 2019 at 6:03 AM Amit Kapila  wrote:
> The idea was that it could be used for multiple purposes like 'rolling
> back complete xact', 'rolling back subxact', 'rollback at page-level'
> or any similar future need even though not all code paths use that
> function.  I am not wedded to any particular name here, but among your
> suggestions complete_transaction sounds better to me.  Are you okay
> going with that?

Sure, let's try that for now and see how it looks.  We can always
change it again if it seems to be a good idea later.

> It won't change in between because we have ensured at top-level that
> no two processes can start executing pending undo at the same time.
> Basically, anyone who wants to execute the undo actions will have an entry
> in the rollback hash table and that will be marked as in-progress.  As
> mentioned in comments, the race is only "after discard worker
> fetches the record and found that this transaction need to be rolled
> back, backend might concurrently execute the actions and remove the
> request from rollback hash table."
>
> [ discussion of alternatives ]

I'm not precisely sure what the best thing to do here is, but I'm
skeptical that the code in question belongs in this function.  There
are two separate things going on here: one is this revalidation that
the undo hasn't been discarded, and the other is executing the undo
actions. Those are clearly separate tasks, and they are not tasks that
always get done together: sometimes we do only one, and sometimes we
do both.  Any function that looks like this is inherently suspicious:

whatever(., bool flag)
{
if (flag)
{
 // lengthy block of code
}

// another lengthy block of code
}

There has to be a reason not to just split this into two functions and
let the caller decide whether to call one or both.
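
In other words, something shaped like the following. The names, signatures,
and the complete_transaction() wrapper are invented purely to illustrate the
split, not taken from the patch set:

static bool
undo_request_still_valid(FullTransactionId fxid, UndoRecPtr start_urecptr)
{
    /* Re-fetch the first undo record and report whether it still exists. */
    return true;                /* placeholder body */
}

static void
apply_undo_actions(FullTransactionId fxid,
                   UndoRecPtr start_urecptr, UndoRecPtr end_urecptr)
{
    /* Walk the undo chain and invoke the rmgr's rm_undo callback. */
}

/*
 * A caller rolling back a complete transaction does both; a caller that
 * only needs the actions applied (say, page-level rollback) calls just
 * apply_undo_actions().
 */
static void
complete_transaction(FullTransactionId fxid,
                     UndoRecPtr start_urecptr, UndoRecPtr end_urecptr)
{
    if (!undo_request_still_valid(fxid, start_urecptr))
        return;
    apply_undo_actions(fxid, start_urecptr, end_urecptr);
}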

> For Error case, it is fine to report failure, but there can be cases
> where we don't need to apply undo actions like when the relation is
> dropped/truncated, undo actions are already applied.  The original
> idea was to cover such cases by the return value.  I agree that
> currently, caller ignores this value, but there is some value in
> keeping it.  So, I am in favor of a signature with bool as the return
> value.

OK.  So then the callers can't keep ignoring it... and there should be
some test framework that verifies the behavior when the return value
is false.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-17 Thread Tomas Vondra

On Mon, Jun 17, 2019 at 08:39:27AM -0400, Joe Conway wrote:

On 6/17/19 8:29 AM, Masahiko Sawada wrote:

From perspective of  cryptographic, I think the fine grained TDE would
be better solution. Therefore if we eventually want the fine grained
TDE I wonder if it might be better to develop the table/tablespace TDE
first while keeping it simple as much as possible in v1, and then we
can provide the functionality to encrypt other data in database
cluster to satisfy the encrypting-everything requirement. I guess that
it's easier to incrementally add encryption target objects rather than
making it fine grained while not changing encryption target objects.

FWIW I'm writing a draft patch of per tablespace TDE and will submit
it in this month. We can more discuss the complexity of the proposed
TDE using it.


+1

Looking forward to it.



Yep. In particular, I'm interested in those aspects:

(1) What's the proposed minimum viable product, and how do we expect to
extend it with the more elaborate features. I don't expect perfect
specification, but we should have some idea so that we don't paint
ourselves in the corner.

(2) How does it affect recovery, backups and replication (both physical
and logical)? That is, which other parts need to know the encryption keys
to function properly?

(3) What does it mean for external tools (pg_waldump, pg_upgrade,
pg_rewind etc.)? 



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-17 Thread Tomas Vondra

On Mon, Jun 17, 2019 at 08:29:02AM -0400, Joe Conway wrote:

On 6/17/19 8:12 AM, Stephen Frost wrote:

But there's about 0% chance we'll get that in v1, of course, so we need
s "minimum viable product" to build on anyway.


There seems like a whole lot of space between something very elaborate
and only supporting one key.


I think this is exactly the point -- IMHO one key per tablespace is a
nice and very sensible compromise. I can imagine all kinds of more
complex things that would be "nice to have" but that gets us most of the
flexibility needed with minimal additional complexity.



Not sure.

I think it's clear the main challenge is encryption of shared resources,
particularly WAL (when each WAL record gets encrypted with the same key as
the object). Considering the importance of WAL, that complicates all sorts
of stuff (recovery, replication, ...).

Encrypting WAL with a single key would be way easier, because we could
(probably) just encrypt whole WAL pages. That may not be appropriate for
some advanced use cases, of course. It would work when used as a db-level
replacement for FDE, which I think was the primary motivation for TDE.

In any case, if we end up with a more complex/advanced design, I've
already voiced my opinion that binding the keys to tablespaces is the
wrong abstraction, and I think we'll regret it eventually. For example,
why have we invented publications instead of using tablespaces?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Binary support for pgoutput plugin

2019-06-17 Thread Dave Cramer
On Fri, 14 Jun 2019 at 15:42, Dave Cramer  wrote:

>
> Dave Cramer
>
>
> On Fri, 14 Jun 2019 at 14:36, Tomas Vondra 
> wrote:
>
>> On Wed, Jun 12, 2019 at 10:35:48AM -0400, Dave Cramer wrote:
>> >On Mon, 10 Jun 2019 at 07:49, Petr Jelinek > >
>> >wrote:
>> >
>> >> Hi,
>> >>
>> >> On 10/06/2019 13:27, Dave Cramer wrote:
>> >> > So back to binary output.
>> >> >
>> >> > From what I can tell the place to specify binary options would be in
>> the
>> >> > create publication and or in replication slots?
>> >> >
>> >> > The challenge as I see it is that the subscriber would have to be
>> able
>> >> > to decode binary output.
>> >> >
>> >> > Any thoughts on how to handle this? At the moment I'm assuming that
>> this
>> >> > would only work for subscribers that knew how to handle binary.
>> >> >
>> >>
>> >> Given that we don't need to write anything extra to WAL on upstream to
>> >> support binary output, why not just have the request for binary data as
>> >> an option for the pgoutput and have it chosen dynamically? Then it's
>> the
>> >> subscriber who asks for binary output via option(s) to
>> START_REPLICATION.
>> >>
>> >
>> >If I understand this correctly this would add something to the
>> CREATE/ALTER
>> >SUBSCRIPTION commands in the WITH clause.
>> >Additionally another column would be required for pg_subscription for the
>> >binary option.
>> >Does it make sense to add an options column which would just be a comma
>> >separated string?
>> >Not that I have future options in mind but seems like something that
>> might
>> >come up in the future.
>> >
>>
>> I'd just add a boolean column to the catalog. That's what I did in the
>> patch adding support for streaming in-progress transactions. I don't think
>> we expect many additional parameters, so it makes little sense to optimize
>> for that case.
>>
>
> Which is what I have done. Thanks
>
> I've attached both patches for comments.
> I still have to add documentation.
>
> Regards,
>
> Dave
>

Additional patch for documentation.


Dave Cramer


0003-document-new-binary-option-for-CREATE-SUBSCRIPTION.patch
Description: Binary data


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-17 Thread Stephen Frost
Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> In any case, if we end up with a more complex/advanced design, I've
> already voiced my opinion that binding the keys to tablespaces is the
> wrong abstraction, and I think we'll regret it eventually. For example,
> why have we invented publications instead of using tablespaces?

I would certainly hope that we don't stop at tablespaces; they just seem
like a much simpler piece to bite off than going to table-level
right off, and they make sense for some environments where there's a
relatively small number of levels of separation, which are already being
segregated into different filesystems (or at least directories) for the
same reason that you want different encryption keys.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: pg_log_fatal vs pg_log_error

2019-06-17 Thread Peter Eisentraut
On 2019-06-17 14:19, Antonin Houska wrote:
> Can anyone please give me a hint (and possibly add some comments to the code)
> when pg_log_fatal() should be used in frontend code and when it's appropriate
> to call pg_log_error()? The current use does not seem very consistent.

For a program that runs in a loop, like for example psql or
pg_receivewal, use error if the program keeps running and fatal if not.
For one-shot programs like for example createdb, there is no difference,
so we have used error in those cases.
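
A minimal illustration of that rule, with hypothetical frontend code (not
taken from any existing binary) using the common/logging.h API:

#include "postgres_fe.h"
#include "common/logging.h"

/* In a long-running loop, log the error and keep going. */
static void
process_item(const char *item)
{
    if (item[0] == '\0')
    {
        pg_log_error("skipping empty item");
        return;                 /* the program keeps running */
    }
    /* ... do the actual work ... */
}

/* When the program cannot continue, log fatal and exit. */
static void
open_input_or_die(const char *path)
{
    if (path == NULL)
    {
        pg_log_fatal("no input file given");
        exit(1);                /* pg_log_fatal() does not exit by itself */
    }
    /* ... open the file ... */
}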

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Allow an alias to be attached directly to a JOIN ... USING

2019-06-17 Thread Peter Eisentraut
A small new feature in SQL:2016 allows attaching a table alias to a
JOIN/USING construct:

 <named columns join> ::=
  USING <left paren> <join column list> <right paren>
  [ AS <join correlation name> ]

(The part in brackets is new.)

This seems quite useful, and it seems the code would already support
this if we allow the grammar to accept this syntax.

Patch attached.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From e1c9e7b7a12f0f7aba8f5c88a7909a61171dee27 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 17 Jun 2019 15:35:32 +0200
Subject: [PATCH] Allow an alias to be attached directly to a JOIN ... USING

This allows something like

SELECT ... FROM t1 JOIN t2 USING (a, b, c) AS x

instead of requiring parentheses for the alias like

SELECT ... FROM (t1 JOIN t2 USING (a, b, c)) AS x

per SQL:2016 feature F404 "Range variable for common column names".

The parse analysis guts already support this, so this patch only
has to adjust the grammar a bit.
---
 doc/src/sgml/ref/select.sgml |  2 +-
 src/backend/catalog/sql_features.txt |  2 +-
 src/backend/parser/gram.y| 69 ++--
 src/test/regress/expected/join.out   | 18 
 src/test/regress/sql/join.sql|  5 ++
 5 files changed, 69 insertions(+), 27 deletions(-)

diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 06d611b64c..628b67a11a 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -59,7 +59,7 @@
 [ LATERAL ] function_name ( [ 
argument [, ...] ] ) AS ( 
column_definition [, ...] )
 [ LATERAL ] ROWS FROM( function_name ( [ argument [, ...] ] ) [ AS ( column_definition [, ...] ) ] [, ...] )
 [ WITH ORDINALITY ] [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
-from_item [ NATURAL ] 
join_type from_item [ ON join_condition | USING ( join_column [, ...] ) ]
+from_item [ NATURAL ] 
join_type from_item [ ON join_condition | USING ( join_column [, ...] ) [ AS alias ] ]
 
 and grouping_element can 
be one of:
 
diff --git a/src/backend/catalog/sql_features.txt 
b/src/backend/catalog/sql_features.txt
index ae874f38ee..15188ee970 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -264,7 +264,7 @@ F401Extended joined table   02  FULL OUTER JOIN 
YES
 F401   Extended joined table   04  CROSS JOIN  YES 
 F402   Named column joins for LOBs, arrays, and multisets  
YES 
 F403   Partitioned joined tables   NO  
-F404   Range variable for common column names  NO  
+F404   Range variable for common column names  YES 
 F411   Time zone specification YES differences regarding 
literal interpretation
 F421   National character  YES 
 F431   Read-only scrollable cursorsYES 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 8311b1dd46..844a6eab82 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -435,7 +435,7 @@ static Node *makeRecursiveViewSelect(char *relname, List 
*aliases, Node *query);
 %typelocked_rels_list
 %type all_or_distinct
 
-%typejoin_outer join_qual
+%typejoin_outer
 %type   join_type
 
 %typeextract_list overlay_list position_list
@@ -488,7 +488,7 @@ static Node *makeRecursiveViewSelect(char *relname, List 
*aliases, Node *query);
 %typesub_type opt_materialized
 %type   NumericOnly
 %typeNumericOnly_list
-%type   alias_clause opt_alias_clause
+%type   alias_clause opt_alias_clause opt_alias_clause_for_join_using
 %typefunc_alias_clause
 %type  sortby
 %type   index_elem
@@ -11967,20 +11967,28 @@ joined_table:
n->quals = NULL;
$$ = n;
}
-   | table_ref join_type JOIN table_ref join_qual
+   | table_ref join_type JOIN table_ref ON a_expr
{
JoinExpr *n = makeNode(JoinExpr);
n->jointype = $2;
n->isNatural = false;
n->larg = $1;
n->rarg = $4;
-   if ($5 != NULL && IsA($5, List))
-   n->usingClause = (List *) $5; 
/* USING clause */
-   else
-   n->quals = $5; /* ON clause */
+   n->quals = $6;
+   $$ = n;
+   }
+   | table_ref join_type JOIN table_ref USING '(' 
name_list ')' opt_alias_clause_for_join_using
+   

Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

2019-06-17 Thread Juan José Santamaría Flecha
On Mon, Jun 17, 2019 at 12:57 AM Gavin Flower
 wrote:
>
>
> I think fixing this silently should be at least a hanging offence.
>

Maybe adding an MD5 header to the file to check if it has been altered
outside guc.c might be enough.

Regards,

Juan José Santamaría Flecha




Re: Race conditions with TAP test for syncrep

2019-06-17 Thread Alvaro Herrera
On 2019-Jun-17, Michael Paquier wrote:

> Attached is a patch to improve the stability of the test.  The fix I
> am proposing is very simple: in order to make sure that a standby is
> added into the WAL sender array of the primary, let's check
> pg_stat_replication after a standby is started.  This can be done
> consistently with a small wrapper in the tests.
> 
> Any thoughts?

Hmm, this introduces a bit of latency: it waits for each standby to be
fully up before initializing the next standby.  Maybe it would be more
convenient to split the primitives: keep the current one to start the
standby, and add a separate one to wait for it to be registered.  Then
we could do
standby1->start;
standby2->start;
standby3->start;
foreach my $sby (@standbys) {
$sby->wait_for_standby
}

so they all start in parallel, saving a bit of time.

> + print "### Waiting for standby \"$standby_name\" on \"$master_name\"\n";

I think this should be note() rather than print(), or maybe diag().  (I
see that we have a couple of other cases which use print() in the tap
tests, which I think should be note() as well.)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

2019-06-17 Thread Ian Barwick

Hi

On 6/15/19 1:08 AM, Stephen Frost wrote:
> Greetings,
>
> * Ian Barwick (ian.barw...@2ndquadrant.com) wrote:
>> Consider the following cascading standby setup with PostgreSQL 12:
>>
>> - there exists a running primary "A"
>> - standby "B" is cloned from primary "A" using "pg_basebackup 
--write-recovery-conf"
>> - standby "C" is cloned from standby "B" using "pg_basebackup 
--write-recovery-conf"
(...)
>> However, executing "ALTER SYSTEM SET primary_conninfo = 
'host=someothernode'" left
>> standby "C"'s "postgresql.auto.conf" file looking like this:
>>
>># Do not edit this file manually!
>># It will be overwritten by the ALTER SYSTEM command.
>>primary_conninfo = 'host=someothernode'
>>primary_conninfo = 'host=node_B'
>>
>> which seems somewhat broken, to say the least.
>
> Yes, it's completely broken, which I've complained about at least twice
> on this list to no avail.
>
> Thanks for putting together an example case pointing out why it's a
> serious issue.  The right thing to do here it so create an open item for
> PG12 around this.

Done.

>> Attached patch attempts to rectify this situation by having 
replace_auto_config_value()
>> deleting any duplicate entries first, before making any changes to the last 
entry.
>
> While this might be a good belt-and-suspenders kind of change to
> include, I don't think pg_basebackup should be causing us to have
> multiple entries in the file in the first place..
(...)
>> Also attached is a set of TAP tests to check ALTER SYSTEM works as expected 
(or
>> at least as seems correct to me).
>
> In my view, at least, we should have a similar test for pg_basebackup to
> make sure that it doesn't create an invalid .auto.conf file.

Indeed... I'd be happy to create tests... but first we need a definition of what
constitutes a valid .auto.conf file.

If that definition includes requiring that a parameter may occur only once, then
we need to provide a way for utilities such as pg_basebackup to write the 
replication
configuration to a configuration file in such a way that it doesn't somehow
render that file invalid.

In Pg11 and earlier, it was just a case of writing (or overwriting) 
recovery.conf.

In Pg12, the code in pg_basebackup implies the correct thing to do is append to 
.auto.conf,
but as demonstrated that can cause problems with duplicate entries.

Having pg_basebackup, or any other utility which clones a standby, parse the 
file
itself to remove duplicates seems like a recipe for pain and badly duplicated 
effort
(unless there's some way of calling the configuration parsing code while the
server is not running).

We could declare that the .auto.conf file will be reset to the default state 
when
a standby is cloned, but the implicit behaviour so far has been to copy the file
as-is (as would happen with any other configuration files in the data 
directory).

We could avoid the need for modifying the .auto.conf file by declaring that a
configuration with a specific name in the data directory (let's call it
"recovery.conf" or "replication.conf") can be used by any utilities writing
replication configuration (though of course in contrast to the old recovery.conf
it would be included, if exists, as a normal configuration file, though then the
precedence would need to be defined, etc..). I'm not sure off the top of my head
whether something like that has already been discussed, though it's presumably a
bit late in the release cycle to make such changes anyway?

>>> This is absolutely the fault of the system for putting in multiple
>>> entries into the auto.conf, which it wasn't ever written to handle.
>>
> * Amit Kapila (amit.kapil...@gmail.com) wrote:
>> Right.  I think if possible, it should use existing infrastructure to
>> write to postgresql.auto.conf rather than inventing a new way to
>> change it.  Apart from this issue, if we support multiple ways to edit
>> postgresql.auto.conf, we might end up with more problems like this in
>> the future where one system is not aware of the way file being edited
>> by another system.
>
> > I agree that there should have been some effort put into making the way
> ALTER SYSTEM is modified be consistent between the backend and utilities
> like pg_basebackup (which would also help third party tools understand
> how a non-backend application should be modifying the file).

Did you mean to say "the way postgresql.auto.conf is modified"?

I suggest explicitly documenting postgresql.auto.conf behaviour (and the 
circumstances
where it's acceptable to modify it outside of ALTER SYSTEM calls) in the 
documentation
(and possibly in the code), so anyone writing utilities which need to
append to postgresql.auto.conf knows what the situation is.

Something along the following lines?

- postgresql.auto.conf is maintained by PostgreSQL and can be rewritten at will
  by the system at any time
- there is no guarantee that items in postgresql.auto.conf will be in a 
particular order
- it  must never be manu

Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

2019-06-17 Thread Ian Barwick

On 6/17/19 2:58 AM, Magnus Hagander wrote:

On Sun, Jun 16, 2019 at 7:43 PM Stephen Frost  wrote:


* Tom Lane (t...@sss.pgh.pa.us) wrote:
 > Stephen Frost  writes:

 > > what we should do is clean them up (and possibly
 > > throw a WARNING or similar at the user saying "something modified your
 > > postgresql.auto.conf in an unexpected way").  I'd suggest we do that on
 > > every ALTER SYSTEM call.
 >
 > +1 for having ALTER SYSTEM clean out duplicates.  Not sure whether
 > a WARNING would seem too in-your-face.

I'd hope for a warning from basically every part of the system when it
detects, clearly, that a file was changed in a way that it shouldn't
have been.  If we don't throw a warning, then we're implying that it's
acceptable, but then cleaning up the duplicates, which seems pretty
confusing.

> +1. Silently "fixing" the file by cleaning up duplicates is going to be even
> more confusing to users who had seen them be there before.

Some sort of notification is definitely appropriate here.

However, going back to the original scenario (cascaded standby set up using
"pg_basebackup --write-recovery-conf") there would now be a warning emitted
the first time anyone executes ALTER SYSTEM (about duplicate "primary_conninfo"
entries) which would not have occurred in Pg11 and earlier (and which will
no doubt cause consternation along the lines "how did my postgresql.auto.conf
get modified in an unexpected way? OMG? Bug? Was I hacked?").


Regards

Ian Barwick
--
 Ian Barwick   https://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-17 Thread Tomas Vondra

On Mon, Jun 17, 2019 at 10:33:11AM -0400, Stephen Frost wrote:

Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:

In any case, if we end up with a more complex/advanced design, I've
already voiced my opinion that binding the keys to tablespaces is the
wrong abstraction, and I think we'll regret it eventually. For example,
why have we invented publications instead of using tablespaces?


I would certainly hope that we don't stop at tablespaces, they just seem
like a much simpler piece to bite off piece than going to table-level
right off, and they make sense for some environments where there's a
relatively small number of levels of separation, which are already being
segregated into different filesystems (or at least directories) for the
same reason that you want different encryption keys.



Why not use the right abstraction from the beginning? I already
mentioned publications, which I think we can use as an inspiration. So
it's not like this would be a major design task, IMHO.

In my experience it's pretty difficult to change abstractions the design
is based on, not just because it tends to be invasive implementation-wise,
but also because users get used to it.

FWIW this is one of the reasons why I advocate for v1 not to allow this,
because it's much easier to extend the design

   single group -> multiple groups

compared to

   one way to group objects -> different way to group objects



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





how to run encoding-dependent tests by default

2019-06-17 Thread Peter Eisentraut
There is a fair amount of collation-related functionality that is only
being tested by sql/collate.icu.utf8.sql and sql/collate.linux.utf8.sql,
which are not run by default.  There is more functionality planned in
this area, so making the testing more straightforward would be useful.

The reason these tests cannot be run by default (other than that they
don't apply to each build, which is easy to figure out) is that

a) They contain UTF8 non-ASCII characters that might not convert to
every server-side encoding, and

b) The error messages mention the encoding name ('ERROR:  collation
"foo" for encoding "UTF8" does not exist')

The server encoding can be set more-or-less arbitrarily for each test
run, and moreover it is computed from the locale, so it's not easy to
determine ahead of time from a makefile, say.

What would be a good way to sort this out?  None of these problems are
terribly difficult on their own, but I'm struggling to come up with a
coherent solution.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

2019-06-17 Thread Stephen Frost
Greetings,

* Ian Barwick (ian.barw...@2ndquadrant.com) wrote:
> However, going back to the original scenario (cascaded standby set up using
> "pg_basebackup --write-recovery-conf") there would now be a warning emitted
> the first time anyone executes ALTER SYSTEM (about duplicate 
> "primary_conninfo"
> entries) which would not have occured in Pg11 and earlier (and which will
> no doubt cause consternation along the lines "how did my postgresql.auto.conf
> get modified in an unexpected way? OMG? Bug? Was I hacked?").

No, I don't think we should end up in a situation where this happens.

I agree that this implies making pg_basebackup more intelligent when
it's dealing with that file but I simply don't have a lot of sympathy
about that, it's not news to anyone who has been paying attention.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: how to run encoding-dependent tests by default

2019-06-17 Thread Tom Lane
Peter Eisentraut  writes:
> There is a fair amount of collation-related functionality that is only
> being tested by sql/collate.icu.utf8.sql and sql/collate.linux.utf8.sql,
> which are not run by default.  There is more functionality planned in
> this area, so making the testing more straightforward would be useful.
> The reason these tests cannot be run by default (other than that they
> don't apply to each build, which is easy to figure out) is that
> a) They contain UTF8 non-ASCII characters that might not convert to
> every server-side encoding, and
> b) The error messages mention the encoding name ('ERROR:  collation
> "foo" for encoding "UTF8" does not exist')
> The server encoding can be set more-or-less arbitrarily for each test
> run, and moreover it is computed from the locale, so it's not easy to
> determine ahead of time from a makefile, say.

> What would be a good way to sort this out?  None of these problems are
> terribly difficult on their own, but I'm struggling to come up with a
> coherent solution.

Perhaps set up a separate test run (not part of the core tests) in which
the database is forced to have UTF8 encoding?  That could be expanded
to other encodings too if anyone cares.

regards, tom lane




Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

2019-06-17 Thread Stephen Frost
Greetings,

* Ian Barwick (ian.barw...@2ndquadrant.com) wrote:
> On 6/15/19 1:08 AM, Stephen Frost wrote:
> > * Ian Barwick (ian.barw...@2ndquadrant.com) wrote:
> >> Consider the following cascading standby setup with PostgreSQL 12:
> >>
> >> - there exists a running primary "A"
> >> - standby "B" is cloned from primary "A" using "pg_basebackup 
> >> --write-recovery-conf"
> >> - standby "C" is cloned from standby "B" using "pg_basebackup 
> >> --write-recovery-conf"
> (...)
> >> However, executing "ALTER SYSTEM SET primary_conninfo = 
> >> 'host=someothernode'" left
> >> standby "C"'s "postgresql.auto.conf" file looking like this:
> >>
> >># Do not edit this file manually!
> >># It will be overwritten by the ALTER SYSTEM command.
> >>primary_conninfo = 'host=someothernode'
> >>primary_conninfo = 'host=node_B'
> >>
> >> which seems somewhat broken, to say the least.
> >
> > Yes, it's completely broken, which I've complained about at least twice
> > on this list to no avail.
> >
> > Thanks for putting together an example case pointing out why it's a
> > serious issue.  The right thing to do here it so create an open item for
> > PG12 around this.
> 
> Done.

Thanks.

> >> Attached patch attempts to rectify this situation by having 
> >> replace_auto_config_value()
> >> deleting any duplicate entries first, before making any changes to the 
> >> last entry.
> >
> > While this might be a good belt-and-suspenders kind of change to
> > include, I don't think pg_basebackup should be causing us to have
> > multiple entries in the file in the first place..
> (...)
> >> Also attached is a set of TAP tests to check ALTER SYSTEM works as 
> >> expected (or
> >> at least as seems correct to me).
> >
> > In my view, at least, we should have a similar test for pg_basebackup to
> > make sure that it doesn't create an invalid .auto.conf file.
> 
> Indeed... I'd be happy to create tests... but first we need a definition of 
> what
> constitutes a valid .auto.conf file.
> 
> If that definition includes requiring that a parameter may occur only once, 
> then
> we need to provide a way for utilities such as pg_basebackup to write the 
> replication
> configuration to a configuration file in such a way that it doesn't somehow
> render that file invalid.

Yes, I think that we do need to require that a parameter only occur once
and pg_basebackup and friends need to be able to manage that.

> In Pg11 and earlier, it was just a case of writing (or overwriting) 
> recovery.conf.

Right.

> In Pg12, the code in pg_basebackup implies the correct thing to do is append 
> to .auto.conf,
> but as demonstrated that can cause problems with duplicate entries.

Code can have bugs. :)  I'd argue that this is such a bug that needs to
be fixed..

> Having pg_basebackup, or any other utility which clones a standby, parse the 
> file
> itself to remove duplicates seems like a recipe for pain and badly duplicated 
> effort
> (unless there's some way of calling the configuration parsing code while the
> server is not running).

I don't really see that there's much hope for it.

> We could declare that the .auto.conf file will be reset to the default state 
> when
> a standby is cloned, but the implicit behaviour so far has been to copy the 
> file
> as-is (as would happen with any other configuration files in the data 
> directory).
> 
> We could avoid the need for modifying the .auto.conf file by declaring that a
> configuration with a specific name in the data directory (let's call it
> "recovery.conf" or "replication.conf") can be used by any utilities writing
> replication configuration (though of course in contrast to the old 
> recovery.conf
> it would be included, if exists, as a normal configuration file, though then 
> the
> precedence would need to be defined, etc..). I'm not sure off the top of my 
> head
> whether something like that has already been discussed, though it's 
> presumably a
> bit late in the release cycle to make such changes anyway?

This was discussed a fair bit, including suggestions along exactly those
lines.  There were various arguments for and against, so you might want
to review the threads where that discussion happened to see what the
reasoning was for not having such an independent file.

> >>> This is absolutely the fault of the system for putting in multiple
> >>> entries into the auto.conf, which it wasn't ever written to handle.
> >>
> > * Amit Kapila (amit.kapil...@gmail.com) wrote:
> >> Right.  I think if possible, it should use existing infrastructure to
> >> write to postgresql.auto.conf rather than inventing a new way to
> >> change it.  Apart from this issue, if we support multiple ways to edit
> >> postgresql.auto.conf, we might end up with more problems like this in
> >> the future where one system is not aware of the way file being edited
> >> by another system.
> >
> > I agree that there should have been some effort put into making the way
> > ALTER SYSTEM is modified be consisten

Re: how to run encoding-dependent tests by default

2019-06-17 Thread Andrew Dunstan


On 6/17/19 11:32 AM, Tom Lane wrote:
> Peter Eisentraut  writes:
>> There is a fair amount of collation-related functionality that is only
>> being tested by sql/collate.icu.utf8.sql and sql/collate.linux.utf8.sql,
>> which are not run by default.  There is more functionality planned in
>> this area, so making the testing more straightforward would be useful.
>> The reason these tests cannot be run by default (other than that they
>> don't apply to each build, which is easy to figure out) is that
>> a) They contain UTF8 non-ASCII characters that might not convert to
>> every server-side encoding, and
>> b) The error messages mention the encoding name ('ERROR:  collation
>> "foo" for encoding "UTF8" does not exist')
>> The server encoding can be set more-or-less arbitrarily for each test
>> run, and moreover it is computed from the locale, so it's not easy to
>> determine ahead of time from a makefile, say.
>> What would be a good way to sort this out?  None of these problems are
>> terribly difficult on their own, but I'm struggling to come up with a
>> coherent solution.
> Perhaps set up a separate test run (not part of the core tests) in which
> the database is forced to have UTF8 encoding?  That could be expanded
> to other encodings too if anyone cares.
>
>   



I should point out that the buildfarm does run these tests for every
utf8 locale it's configured for if the TestICU module is enabled. At the
moment the only animal actually running those tests is prion, for
en_US.utf8.


cheers


andrew


-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: how to run encoding-dependent tests by default

2019-06-17 Thread Andres Freund
Hi,

On 2019-06-17 16:56:00 +0200, Peter Eisentraut wrote:
> There is a fair amount of collation-related functionality that is only
> being tested by sql/collate.icu.utf8.sql and sql/collate.linux.utf8.sql,
> which are not run by default.  There is more functionality planned in
> this area, so making the testing more straightforward would be useful.
> 
> The reason these tests cannot be run by default (other than that they
> don't apply to each build, which is easy to figure out) is that
> 
> a) They contain UTF8 non-ASCII characters that might not convert to
> every server-side encoding, and
> 
> b) The error messages mention the encoding name ('ERROR:  collation
> "foo" for encoding "UTF8" does not exist')
> 
> The server encoding can be set more-or-less arbitrarily for each test
> run, and moreover it is computed from the locale, so it's not easy to
> determine ahead of time from a makefile, say.
> 
> What would be a good way to sort this out?  None of these problems are
> terribly difficult on their own, but I'm struggling to come up with a
> coherent solution.

I wonder if using alternative output files and psql's \if could be good
enough here. It's not that hard to maintain an alternative output file
if it's nearly empty.

Basically something like:

SELECT my_encodings_are_compatible() AS compatible \gset
\if :compatible
test;
contents;
\endif

That won't get rid of b) in its entirety, but even just running the test
automatically on platforms where it works without problems would be an
improvement.

We probably also could just have a wrapper function in those tests that
catches the exception and prints a more anodyne message.
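
A rough sketch of what such a wrapper could look like, just to illustrate
the idea (the function name and the fallback message are invented here):

CREATE FUNCTION try_collation_ddl(cmd text) RETURNS text
LANGUAGE plpgsql AS $$
BEGIN
    EXECUTE cmd;
    RETURN 'ok';
EXCEPTION WHEN undefined_object THEN
    -- swallows "collation ... for encoding ... does not exist", so the
    -- expected output never has to spell out the server encoding name
    RETURN 'collation not available here';
END
$$;

A test would then wrap the statements that reference encoding-specific
collations, e.g.
SELECT try_collation_ddl('CREATE TABLE collate_tbl (t text COLLATE "sv_SE.utf8")');
and the expected file only ever contains 'ok' or the fallback string.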

Greetings,

Andres Freund




Re: initial random incompatibility

2019-06-17 Thread Komяpa
I cannot find traces, but I believe there was a Twitter poll on which
random sequence people get after setseed() in Postgres, and we found at least
three distinct sequences across different builds.

On Mon, Jun 10, 2019 at 5:52 PM Alvaro Herrera 
wrote:

> On 2019-Jun-08, Euler Taveira wrote:
>
> > While fixing the breakage caused by the default number of trailing
> > digits output for real and double precision, I noticed that first
> > random() call after setseed(0) doesn't return the same value as 10 and
> > earlier (I tested 9.4 and later). It changed an expected behavior and
> > it should be listed in incompatibilities section of the release notes.
> > Some applications can rely on such behavior.
>
> Hmm.  Tom argued about the backwards-compatibility argument in
> the discussion that led to that commit:
> https://www.postgresql.org/message-id/3859.1545849...@sss.pgh.pa.us
> I think this is worth listing in the release notes.  Can you propose
> some wording?
>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
>

-- 
Darafei Praliaskouski
Support me: http://patreon.com/komzpa


Re: initial random incompatibility

2019-06-17 Thread Alvaro Herrera
On 2019-Jun-17, Darafei "Komяpa" Praliaskouski wrote:

> I cannot find traces, but I believe there was a Twitter poll on which
> random do people get after setseed() in postgres, and we found at least
> three distinct sequences across different builds.

On different machines or with different build options?  I suppose that's
acceptable ...  the problem is the sequence changing from one release to
the next on the same machine with the same build options.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: assertion at postmaster start

2019-06-17 Thread Andres Freund
Hi,

On 2019-06-15 12:09:50 -0400, Alvaro Herrera wrote:
> Once in a blue moon I get this assertion failure on server start:
> 
> 2019-06-15 12:00:29.650 -04 [30080] LOG:  iniciando PostgreSQL 12beta1 on 
> x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0, 
> 64-bit
> 2019-06-15 12:00:29.650 -04 [30080] LOG:  escuchando en la dirección IPv4 
> «127.0.0.1», port 55432
> 2019-06-15 12:00:29.650 -04 [30080] LOG:  escuchando en el socket Unix 
> «/tmp/.s.PGSQL.55432»
> 2019-06-15 12:00:29.658 -04 [30956] LOG:  el sistema de bases de datos fue 
> apagado en 2019-06-15 12:00:24 -04
> 2019-06-15 12:00:29.659 -04 [30080] LOG:  proceso de servidor (PID 30107) 
> terminó con código de salida 15
> 2019-06-15 12:00:29.659 -04 [30080] LOG:  terminando todos los otros procesos 
> de servidor activos
> TRAP: FailedAssertion(«!(AbortStartTime == 0)», Archivo: 
> «/pgsql/source/master/src/backend/postmaster/postmaster.c», Línea: 2957)
> Aborted (core dumped)
> 
> Apologies for the Spanish -- I cannot readily reproduce this.  In
> essence, this shows a normal startup, until suddenly process 30107
> terminates with exit code 15, and then while shutting everything down,
> postmaster hits the aforementioned assertion and terminates.

I assume this is on master as of a few days ago? This doesn't even look
to be *after* a crash-restart? And I assume core files weren't enabled?


> One problem with debugging this is that I don't know what process 30107
> is, since the logs don't mention it.

Hm - it probably can't be that many processes, it looks like 30107 has
to have started pretty soon after the startup process (which IIRC is the
one emitting "el sistema de bases de datos fue apagado en"), and as soon
as that's done 30107 is noticed as having crashed.

Unfortunately, as this appears to be a start in a clean database, we
don't really know which phase of startup this is. There's IIRC no
messages to be expected before "database system is ready to accept
connections" in a clean start.

What is a bit odd is that:

> 2019-06-15 12:00:29.659 -04 [30080] LOG:  proceso de servidor (PID 30107) 
> terminó con código de salida 15

comes from:
#. translator: %s is a noun phrase describing a child process, such as
#. "server process"
#: postmaster/postmaster.c:3656
#, c-format
msgid "%s (PID %d) exited with exit code %d"
msgstr "%s (PID %d) terminó con código de salida %d"

#: postmaster/postmaster.c:3301 postmaster/postmaster.c:3321
#: postmaster/postmaster.c:3328 postmaster/postmaster.c:3346
msgid "server process"
msgstr "proceso de servidor"

And "server process" is afaict only used for actual backends, not other
types of processes. But we've not yet seen "database system is ready to
accept connections", so IIRC it could only be a "dead_end" type
backend? But we didn't yet see an error from that...


> No idea what is going on.

Seems to indicate a logic error in postmaster's state machine. Perhaps
something related to dead_end processes?

Greetings,

Andres Freund




Re: Fix up grouping sets reorder

2019-06-17 Thread Andres Freund
Hi,

On 2019-06-17 17:23:11 +0800, Richard Guo wrote:
> During the reorder of grouping sets into correct prefix order, if only
> one aggregation pass is needed, we follow the order of the ORDER BY
> clause to the extent possible, to minimize the chance that we add
> unnecessary sorts. This is implemented in preprocess_grouping_sets -->
> reorder_grouping_sets.
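
For anyone following along, the kind of query this code path concerns is
roughly the following (t being a hypothetical table); the point is that the
(b, a) set can be reordered to (a, b), so the sort requested by ORDER BY can
also serve the single aggregation pass:

SELECT a, b, count(*)
FROM t
GROUP BY GROUPING SETS ((b, a), (a))
ORDER BY a, b;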

Thanks for finding!

Andrew, could you take a look at that?

- Andres




Re: initial random incompatibility

2019-06-17 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Jun-17, Darafei "Komяpa" Praliaskouski wrote:
>> I cannot find traces, but I believe there was a Twitter poll on which
>> random do people get after setseed() in postgres, and we found at least
>> three distinct sequences across different builds.

> In different machines or different build options?  I suppose that's
> acceptable ...  the problem is changing the sequence in one release to
> the next in the same machine with the same build options.

FWIW, I agree that this change should be called out as a possible
compatibility hazard, even though anybody who was expecting repeatable
behavior from the old code was playing with fire.
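
The hazard is easy to demonstrate: the first value returned by

SELECT setseed(0);
SELECT random();

differs on v12 from what v11 and earlier produced, so anyone diffing such
output across versions will notice.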

regards, tom lane




Re: assertion at postmaster start

2019-06-17 Thread Tom Lane
Andres Freund  writes:
> On 2019-06-15 12:09:50 -0400, Alvaro Herrera wrote:
>> Once in a blue moon I get this assertion failure on server start:
>> TRAP: FailedAssertion(«!(AbortStartTime == 0)», Archivo: 
>> «/pgsql/source/master/src/backend/postmaster/postmaster.c», Línea: 2957)

> And "server process" is afaict only used for actual backends, not other
> types of processes. But we've not yet seen "database system is ready to
> accept accept connections", so IIRC it could only be a "dead_end" type
> backend? But we didn't yet see an error from that...
> Seems to indicate a logic error in postmaster's state machine. Perhaps
> something related to dead_end processes?

So if Andres is guessing right, this must be from something trying to
connect before the postmaster is ready?  Seems like that could be
tested for ...

regards, tom lane




Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-17 Thread Andres Freund
On 2019-06-14 23:14:09 +0100, Andrew Gierth wrote:
> So here is my current proposed fix.

Before pushing a commit that's controversial - and this clearly seems to
somewhat be - it'd be good to give others a heads up that you intend to
do so, so they can object. Rather than just pushing less than 24h later,
without a warning.

- Andres




Re: SQL/JSON path issues/questions

2019-06-17 Thread Thom Brown
On Fri, 14 Jun 2019 at 08:16, Kyotaro Horiguchi  wrote:
>
> Hi, Thom.
>
> At Thu, 13 Jun 2019 14:59:51 +0100, Thom Brown  wrote
> in 
> > Hi,
> >
> > I've been reading through the documentation regarding jsonpath and
> > jsonb_path_query etc., and I have found it lacking explanation for
> > some functionality, and I've also had some confusion when using the
> > feature.
> >
> > ? operator
> > ==
> > The first mention of '?' is in section 9.15, where it says:
> >
> > "Suppose you would like to retrieve all heart rate values higher than
> > 130. You can achieve this using the following expression:
> > '$.track.segments[*].HR ? (@ > 130)'"
> >
> > So what is the ? operator doing here?  Sure, there's the regular ?
>
> It is described just above as:
>
> | Each filter expression must be enclosed in parentheses and
> | preceded by a question mark.

Can I suggest that, rather than using "question mark", we use the "?"
symbol, or provide a syntax structure which shows something like:

<path_expression> ? (<filter_expression>)

This not only makes this key information clearer and more prominent,
but it also makes the "?" symbol searchable in a browser for anyone
wanting to find out what that symbol is doing.
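
For instance, turning the documentation snippet into a complete, runnable
query (with a minimal stand-in for the docs' track data) makes the role of
"?" obvious:

SELECT jsonb_path_query(
    '{"track": {"segments": [{"HR": 120}, {"HR": 135}]}}',
    '$.track.segments[*].HR ? (@ > 130)');
-- returns 135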

> > operator, which is given as an example further down the page:
> >
> > '{"a":1, "b":2}'::jsonb ? 'b'
> >
> > But this doesn't appear to have the same purpose.
>
> The section is mentioning path expressions and the '?' is a jsonb
> operator. It's somewhat confusing but not so much comparing with
> around..
>
> > like_regex
> > ==
> > Then there's like_regex, which shows an example that uses the keyword
> > "flag", but that is the only instance of that keyword being mentioned,
> > and the flags available to this expression aren't anywhere to be seen.
>
> It is described as POSIX regular expressions. So '9.7.3 POSIX
> Regular Expressions' is that. But linking to it would be
> helpful. (attached 0001)
>
> > is unknown
> > ==
> > "is unknown" suggests a boolean output, but the example shows an
> > output of "infinity".  While I understand what it does, this appears
> > inconsistent with all other "is..." functions (e.g. is_valid(lsn),
> > pg_is_other_temp_schema(oid), pg_opclass_is_visible(opclass_oid),
> > pg_is_in_backup() etc.).
>
> It's the right behavior. Among them, only "infinity" gives
> "unknown' for the test (@ > 0). -1 gives false, 2 and 3 true.

I still find it counter-intuitive.
>
> > $varname
> > ==
> > The jsonpath variable, $varname, has an incomplete description: "A
> > named variable. Its value must be set in the PASSING clause of an
> > SQL/JSON query function. for details."
>
> Yeah, it is apparently chopped off in the middle. In the sgml source, the
> missing part is "", and the PASSING clause is
> not implemented yet. On the other hand, similar functionality is
> currently implemented as the vars parameter in some jsonb
> functions. Linking to that might be helpful (Attached 0002).
>
>
> > Binary operation error
> > ==
> > I get an error when I run this query:
> >
> > postgres=# SELECT jsonb_path_query('[2]', '2 + $[1]');
> > psql: ERROR:  right operand of jsonpath operator + is not a single numeric 
> > value
> >
> > While I know it's correct to get an error in this scenario as there is
> > no element beyond 0, the message I get is confusing.  I'd expect this
> > if it encountered another array in that position, but not for
> > exceeding the upper bound of the array.
>
> Something like the attached makes it clearer? (Attached 0003)
>
> | ERROR:  right operand of jsonpath operator + is not a single numeric value
> | DETAIL:  It was an array with 0 elements.

My first thought upon seeing this error message would be, "I don't see
an array with 0 elements."

>
> > Cryptic error
> > ==
> > postgres=# SELECT jsonb_path_query('[1, "2",
> > {},[{"a":2}],2.3,null,"2019-06-05T13:25:43.511Z"]','$[**].type()');
> > psql: ERROR:  syntax error, unexpected ANY_P at or near "**" of jsonpath 
> > input
> > LINE 1: ...},[{"a":2}],2.3,null,"2019-06-05T13:25:43.511Z"]','$[**].typ...
> >  ^
> > Again, I expect an error, but the message produced doesn't help me.
> > I'll remove the ANY_P if I can find it.
>
> Yeah, I had a similar error:
>
> =# select jsonb_path_query('[-1,2,7, "infinity"]', '$[*] ? (($hoge) is
> unknown)', '{"hoge": (@ > 0)}');
> ERROR:  syntax error, unexpected IS_P at or near " " of jsonpath input
>
> When the errors are issued, the caller side is commented as:
>
> jsonpath_scan.l:481
> > jsonpath_yyerror(NULL, "bogus input"); /* shouldn't happen */
>
> The error message would be reasonable if it really shouldn't
> happen, but it quite easily happens. I don't have an idea of how
> to fix it for the present..
>
> > Can't use nested arrays with jsonpath
> > ==
> >
> > I encounter an error in this scenario:
> >
> > postgres=# select jsonb_path_query('[1, 2, 1, [1,2], 3]','$[*] ? (@ == 
> > [1,2])');
> > psql: ERROR:  syntax error, unexpected '[' at or 

Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-17 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2019-06-14 23:14:09 +0100, Andrew Gierth wrote:
> > So here is my current proposed fix.
> 
> Before pushing a commit that's controversial - and this clearly seems to
> somewhat be - it'd be good to give others a heads up that you intend to
> do so, so they can object. Rather than just pushing less than 24h later,
> without a warning.

Seems like that would have meant a potentially very late commit to avoid
having a broken (for some value of broken anyway) point release (either
with new code, or with reverting the timezone changes previously
committed), which isn't great either.

In general, I agree with you, and we should try to give everyone time to
discuss when something is controversial, but this seems like it was at
least a bit of a tough call.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-17 Thread Andres Freund
Hi,

On 2019-06-17 14:34:58 -0400, Stephen Frost wrote:
> * Andres Freund (and...@anarazel.de) wrote:
> > On 2019-06-14 23:14:09 +0100, Andrew Gierth wrote:
> > > So here is my current proposed fix.
> > 
> > Before pushing a commit that's controversial - and this clearly seems to
> > somewhat be - it'd be good to give others a heads up that you intend to
> > do so, so they can object. Rather than just pushing less than 24h later,
> > without a warning.
> 
> Seems like that would have meant a potentially very late commit to avoid
> having a broken (for some value of broken anyway) point release (either
> with new code, or with reverting the timezone changes previously
> committed), which isn't great either.

> In general, I agree with you, and we should try to give everyone time to
> discuss when something is controversial, but this seems like it was at
> least a bit of a tough call.

Hm? All I'm saying is that Andrew's email should have included something
to the effect of "Due to the upcoming release, I'm intending to push and
backpatch the attached fix in ~20h".

Greetings,

Andres Freund




Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-17 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2019-06-17 14:34:58 -0400, Stephen Frost wrote:
> > * Andres Freund (and...@anarazel.de) wrote:
> > > On 2019-06-14 23:14:09 +0100, Andrew Gierth wrote:
> > > > So here is my current proposed fix.
> > > 
> > > Before pushing a commit that's controversial - and this clearly seems to
> > > somewhat be - it'd be good to give others a heads up that you intend to
> > > do so, so they can object. Rather than just pushing less than 24h later,
> > > without a warning.
> > 
> > Seems like that would have meant a potentially very late commit to avoid
> > having a broken (for some value of broken anyway) point release (either
> > with new code, or with reverting the timezone changes previously
> > committed), which isn't great either.
> 
> > In general, I agree with you, and we should try to give everyone time to
> > discuss when something is controversial, but this seems like it was at
> > least a bit of a tough call.
> 
> Hm? All I'm saying is that Andrew's email should have included something
> to the effect of "Due to the upcoming release, I'm intending to push and
> backpatch the attached fix in ~20h".

Ah, ok, I agree that would have been good to do.  Of course, hindsight
being 20/20 and all that.  Something to keep in mind for the future
though.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Fix up grouping sets reorder

2019-06-17 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 >> During the reorder of grouping sets into correct prefix order, if
 >> only one aggregation pass is needed, we follow the order of the
 >> ORDER BY clause to the extent possible, to minimize the chance that
 >> we add unnecessary sorts. This is implemented in
 >> preprocess_grouping_sets --> reorder_grouping_sets.

 Andres> Thanks for finding!

 Andres> Andrew, could you take a look at that?

Yes.

-- 
Andrew (irc:RhodiumToad)




Re: SQL/JSON path issues/questions

2019-06-17 Thread Alexander Korotkov
On Mon, Jun 17, 2019 at 8:40 PM Thom Brown  wrote:
> On Fri, 14 Jun 2019 at 08:16, Kyotaro Horiguchi  
> wrote:
> >
> > Hi, Thom.
> >
> > At Thu, 13 Jun 2019 14:59:51 +0100, Thom Brown  wrote
> > in 
> > > Hi,
> > >
> > > I've been reading through the documentation regarding jsonpath and
> > > jsonb_path_query etc., and I have found it lacking explanation for
> > > some functionality, and I've also had some confusion when using the
> > > feature.
> > >
> > > ? operator
> > > ==
> > > The first mention of '?' is in section 9.15, where it says:
> > >
> > > "Suppose you would like to retrieve all heart rate values higher than
> > > 130. You can achieve this using the following expression:
> > > '$.track.segments[*].HR ? (@ > 130)'"
> > >
> > > So what is the ? operator doing here?  Sure, there's the regular ?
> >
> > It is described just above as:
> >
> > | Each filter expression must be enclosed in parentheses and
> > | preceded by a question mark.
>
> Can I suggest that, rather than using "question mark", we use the "?"
> symbol, or provide a syntax structure which shows something like:
>
> <path_expression> ? (<filter_expression>)
>
> This not only makes this key information clearer and more prominent,
> but it also makes the "?" symbol searchable in a browser for anyone
> wanting to find out what that symbol is doing.

Sounds like a good point to me.

> > > operator, which is given as an example further down the page:
> > >
> > > '{"a":1, "b":2}'::jsonb ? 'b'
> > >
> > > But this doesn't appear to have the same purpose.
> >
> > The section is mentioning path expressions and the '?' is a jsonb
> > operator. It's somewhat confusing but not so much comparing with
> > around..
> >
> > > like_regex
> > > ==
> > > Then there's like_regex, which shows an example that uses the keyword
> > > "flag", but that is the only instance of that keyword being mentioned,
> > > and the flags available to this expression aren't anywhere to be seen.
> >
> > It is described as POSIX regular expressions. So '9.7.3 POSIX
> > Regular Expressions' is that. But linking to it would be
> > helpful. (attached 0001)
> >
> > > is unknown
> > > ==
> > > "is unknown" suggests a boolean output, but the example shows an
> > > output of "infinity".  While I understand what it does, this appears
> > > inconsistent with all other "is..." functions (e.g. is_valid(lsn),
> > > pg_is_other_temp_schema(oid), pg_opclass_is_visible(opclass_oid),
> > > pg_is_in_backup() etc.).
> >
> > It's the right behavior. Among them, only "infinity" gives
> > "unknown' for the test (@ > 0). -1 gives false, 2 and 3 true.
>
> I still find it counter-intuitive.

It might be so.  But it's defined so in SQL Standard 2016.  Following
an SQL standard was always a project priority.  We're unlikely to
say: "We don't want to follow a standard, because it doesn't look
similar to our home-brew functions."

> > > $varname
> > > ==
> > > The jsonpath variable, $varname, has an incomplete description: "A
> > > named variable. Its value must be set in the PASSING clause of an
> > > SQL/JSON query function. for details."
> >
> > Yeah, it is apparently chopped off in the middle. In the sgml source, the
> > missing part is "", and the PASSING clause is
> > not implemented yet. On the other hand, similar functionality is
> > currently implemented as the vars parameter in some jsonb
> > functions. Linking to that might be helpful (Attached 0002).
> >
> >
> > > Binary operation error
> > > ==
> > > I get an error when I run this query:
> > >
> > > postgres=# SELECT jsonb_path_query('[2]', '2 + $[1]');
> > > psql: ERROR:  right operand of jsonpath operator + is not a single 
> > > numeric value
> > >
> > > While I know it's correct to get an error in this scenario as there is
> > > no element beyond 0, the message I get is confusing.  I'd expect this
> > > if it encountered another array in that position, but not for
> > > exceeding the upper bound of the array.
> >
> > Something like the attached makes it clearer? (Attached 0003)
> >
> > | ERROR:  right operand of jsonpath operator + is not a single numeric value
> > | DETAIL:  It was an array with 0 elements.
>
> My first thought upon seeing this error message would be, "I don't see
> an array with 0 elements."

Yes, it looks counter-intuitive to me too.  There is really no array
with 0 elements.  Actually, the jsonpath subexpression selects no items.
We probably should adjust the message accordingly.
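
For comparison, the in-range case does what one would expect:

SELECT jsonb_path_query('[2]', '2 + $[0]');   -- returns 4
SELECT jsonb_path_query('[2]', '2 + $[1]');   -- errors: $[1] selects nothing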

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: idea: log_statement_sample_rate - bottom limit for sampling

2019-06-17 Thread Pavel Stehule
Hi

On Thu, Jun 6, 2019 at 10:48 AM Gilles Darold  wrote:

> On 06/06/2019 at 10:38, Pavel Stehule wrote:
> > Hi
> >
> > I like the idea of sampling slow statements via
> > log_statement_sample_rate. But I miss some parameter that can ensure
> > so every query executed over this limit is logged.
> >
> > Can we introduce new option
> >
> > log_statement_sampling_limit
> >
> > The query with execution time over this limit is logged every time.
> >
> > What do you think about this?
> >
> > Regards
> >
> > Pavel
>
>
> +1, log_min_duration_statement is modulated by log_statement_sample_rate
> that mean that there is no more way to log all statements over a certain
> duration limit. log_statement_sampling_limit might probably always be
> upper than log_min_duration_statement.
>
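
For reference, the intended interaction is roughly this (values purely
illustrative; log_statement_sample_limit is the new GUC the patch adds):

SET log_min_duration_statement = '500ms';  -- log statements slower than 500 ms ...
SET log_statement_sample_rate = 0.1;       -- ... but only a 10% sample of them
SET log_statement_sample_limit = '5s';     -- proposed: over 5 s, always log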

Here is a patch

Regards

Pavel


>
> --
> Gilles Darold
> http://www.darold.net/
>
>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 84341a30e5..690f54f731 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5903,6 +5903,21 @@ local0.*/var/log/postgresql

   
 
+ 
+  log_statement_sample_limit (integer)
+  
+   log_statement_sample_limit configuration parameter
+  
+  
+   
+
+ Only if query duration is less than this value, the log sampling is active.
+ When log_statement_sample_limit is -1
+ then log sampling is without limit.
+
+   
+  
+
  
   log_transaction_sample_rate (real)
   
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 44a59e1d4f..2d8cc7d411 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2193,7 +2193,9 @@ check_log_statement(List *stmt_list)
 /*
  * check_log_duration
  *		Determine whether current command's duration should be logged.
- *		If log_statement_sample_rate < 1.0, log only a sample.
+ *		If log_statement_sample_rate < 1.0, log only a sample and duration
+ *		is less than log_statement_sample_limit. log_statement_sample_limit
+ *		is used only if this value is different than -1.
  *		We also check if this statement in this transaction must be logged
  *		(regardless of its duration).
  *
@@ -2217,6 +2219,7 @@ check_log_duration(char *msec_str, bool was_logged)
 		int			usecs;
 		int			msecs;
 		bool		exceeded;
+		bool		every_time;
 		bool		in_sample;
 
 		TimestampDifference(GetCurrentStatementStartTimestamp(),
@@ -2234,6 +2237,15 @@ check_log_duration(char *msec_str, bool was_logged)
 	 (secs > log_min_duration_statement / 1000 ||
 	  secs * 1000 + msecs >= log_min_duration_statement)));
 
+		/*
+		 * When log_statement_sample_limit is exceeded, then query is
+		 * logged every time.
+		 */
+		every_time = (log_statement_sample_limit == 0 ||
+	  (log_statement_sample_limit > 0 &&
+	   (secs > log_statement_sample_limit / 1000 ||
+		secs * 1000 + msecs >= log_statement_sample_limit)));
+
 		/*
 		 * Do not log if log_statement_sample_rate = 0. Log a sample if
 		 * log_statement_sample_rate <= 1 and avoid unnecessary random() call
@@ -2245,7 +2257,7 @@ check_log_duration(char *msec_str, bool was_logged)
 			(log_statement_sample_rate == 1 ||
 			 random() <= log_statement_sample_rate * MAX_RANDOM_VALUE);
 
-		if ((exceeded && in_sample) || log_duration || xact_is_sampled)
+		if ((exceeded && (in_sample || every_time)) || log_duration || xact_is_sampled)
 		{
 			snprintf(msec_str, 32, "%ld.%03d",
 	 secs * 1000 + msecs, usecs % 1000);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 1208eb9a68..b0c7d9df0e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -510,6 +510,7 @@ int			log_min_error_statement = ERROR;
 int			log_min_messages = WARNING;
 int			client_min_messages = NOTICE;
 int			log_min_duration_statement = -1;
+int			log_statement_sample_limit = -1;
 int			log_temp_files = -1;
 double		log_statement_sample_rate = 1.0;
 double		log_xact_sample_rate = 0;
@@ -2715,6 +2716,19 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"log_statement_sample_limit", PGC_SUSET, LOGGING_WHEN,
+			gettext_noop("Sets the maximum execution time for sampling "
+		 "logged statements."),
+			gettext_noop("Zero effective disables sampling. "
+		 "-1 use sampling every time (without limit)."),
+			GUC_UNIT_MS
+		},
+		&log_statement_sample_limit,
+		-1, -1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"log_autovacuum_min_duration", PGC_SIGHUP, LOGGING_WHAT,
 			gettext_noop("Sets the minimum execution time above which "
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index e709177c37..8f0c2c59d4 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -252,6 +252,7 @@ extern int	log_min_error_statement;
 extern PGDLLIMPORT int log_min_messages;
 extern PGDLLIMPORT int client_min_messages;
 extern int	log_min_duration_statement;
+extern int	log_statement_sample_limit;
 

Re: SQL/JSON path issues/questions

2019-06-17 Thread Chapman Flack
On 6/17/19 4:13 PM, Alexander Korotkov wrote:
> On Mon, Jun 17, 2019 at 8:40 PM Thom Brown  wrote:
 "is unknown" suggests a boolean output, but the example shows an
 output of "infinity".  While I understand what it does, this appears
 inconsistent with all other "is..." functions (e.g. is_valid(lsn),
 pg_is_other_temp_schema(oid), pg_opclass_is_visible(opclass_oid),
 pg_is_in_backup() etc.).
>>>
>>> It's the right behavior. Among them, only "infinity" gives
>>> "unknown' for the test (@ > 0). -1 gives false, 2 and 3 true.
>>
>> I still find it counter-intuitive.
> 
> It might be so.  But it's defined so in SQL Standard 2016.

IIUC, this comes about simply because the JSON data model for numeric
values does not have any infinity or NaN.

So the example given in our doc is sort of a trick example that does
double duty: it demonstrates that (@ > 0) is Unknown when @ is a string,
because numbers and strings are incomparable, and it *also* sort of
slyly reminds the reader that JSON numbers have no infinity, and
therefore "infinity" is nothing but a run-of-the-mill string.

But maybe it is just too brow-furrowingly clever to ask one example
to make both of those points. Maybe it would be clearer to use some
string other than "infinity" to make the first point:

[-1, 2, 7, "some string"] | $[*] ? ((@ > 0) is unknown) | "some string"

... and then if the reminder about infinity is worth making, repeat
the example:

[-1, 2, 7, "infinity"] | $[*] ? ((@ > 0) is unknown) | "infinity"

with a note that it's a trick example as a reminder that JSON numbers
don't have infinity or NaN and so it is no different from any other
string.
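
Both variants are runnable as-is for anyone who wants to check:

SELECT jsonb_path_query('[-1, 2, 7, "some string"]', '$[*] ? ((@ > 0) is unknown)');
-- returns "some string"
SELECT jsonb_path_query('[-1, 2, 7, "infinity"]', '$[*] ? ((@ > 0) is unknown)');
-- returns "infinity"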

Regards,
-Chap




Re: Fix typos and inconsistencies for v11+

2019-06-17 Thread Michael Paquier
On Mon, Jun 17, 2019 at 10:32:13AM +0300, Alexander Lakhin wrote:
> Then I will go deeper for v10 and beyond. If older versions are not
> going to be fixed, I will prepare patches only for the master branch.

When it comes to fixing typos in anything which is not directly
user-visible like the documentation or error strings, my take is to
bother only about HEAD.  There is always a risk of conflicts with
back-branches, but I have never actually bumped into this as being a
problem.  There is an argument for me to be less lazy of course..
--
Michael


signature.asc
Description: PGP signature


Re: Race conditions with TAP test for syncrep

2019-06-17 Thread Michael Paquier
On Mon, Jun 17, 2019 at 10:50:39AM -0400, Alvaro Herrera wrote:
> Hmm, this introduces a bit of latency: it waits for each standby to be
> fully up before initializing the next standby.  Maybe it would be more
> convenient to split the primitives: keep the current one to start the
> standby, and add a separate one to wait for it to be registered.  Then
> we could do
> standby1->start;
> standby2->start;
> standby3->start;
> foreach my $sby (@standbys) {
>   $sby->wait_for_standby
> }

It seems to me that this sequence could still lead to inconsistencies:
1) standby 1 starts, reaches consistency so pg_ctl start -w exits.
2) standby 2 starts, reaches consistency.
3) standby 2 starts a WAL receiver, gets the first WAL sender slot of
the primary.
4) standby 1 starts a WAL receiver, gets the second slot.
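
(Put differently, regardless of start order, the condition each test
ultimately needs to poll for on the primary is something like

SELECT count(*) FROM pg_stat_replication WHERE application_name = 'standby_1';

with 'standby_1' standing in for whatever name the test assigns, waiting
until it reports 1.)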

> I think this should be note() rather than print(), or maybe diag().  (I
> see that we have a couple of other cases which use print() in the tap
> tests, which I think should be note() as well.)

OK.  Let's change it for this patch.  For the rest, I can always send
a different patch.  Just writing down your comment..
--
Michael


signature.asc
Description: PGP signature


Re: PG 12 beta 1 segfault during analyze

2019-06-17 Thread Steve Singer

On 6/15/19 10:18 PM, Tom Lane wrote:

Steve Singer  writes:

I encountered the following segfault when running against a  PG 12 beta1
during a analyze against a table.

Nobody else has reported this, so you're going to have to work on
producing a self-contained test case, or else debugging it yourself.

regards, tom lane




The attached patch fixes the issue.


Steve


diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index b7d2ddbbdcf..fc19f40a2e3 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1113,11 +1113,11 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
  * concurrent transaction never commits.
  */
 if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(targtuple->t_data)))
-	deadrows += 1;
+	*deadrows += 1;
 else
 {
 	sample_it = true;
-	liverows += 1;
+	*liverows += 1;
 }
 break;
 


Re: idea: log_statement_sample_rate - bottom limit for sampling

2019-06-17 Thread Pavel Stehule
Hi

On Mon, Jun 17, 2019 at 10:40 PM Pavel Stehule  wrote:

> Hi
>
> On Thu, Jun 6, 2019 at 10:48 AM Gilles Darold  wrote:
>
>> On 06/06/2019 at 10:38, Pavel Stehule wrote:
>> > Hi
>> >
>> > I like the idea of sampling slow statements via
>> > log_statement_sample_rate. But I miss some parameter that can ensure
>> > so every query executed over this limit is logged.
>> >
>> > Can we introduce new option
>> >
>> > log_statement_sampling_limit
>> >
>> > The query with execution time over this limit is logged every time.
>> >
>> > What do you think about this?
>> >
>> > Regards
>> >
>> > Pavel
>>
>>
>> +1, log_min_duration_statement is modulated by log_statement_sample_rate
>> that mean that there is no more way to log all statements over a certain
>> duration limit. log_statement_sampling_limit might probably always be
>> upper than log_min_duration_statement.
>>
>
> Here is a patch
>

I made an error in the logic - fixed



> Regards
>
> Pavel
>
>
>>
>> --
>> Gilles Darold
>> http://www.darold.net/
>>
>>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 84341a30e5..690f54f731 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5903,6 +5903,21 @@ local0.*/var/log/postgresql

   
 
+ 
+  log_statement_sample_limit (integer)
+  
+   log_statement_sample_limit configuration parameter
+  
+  
+   
+
+ Only if query duration is less than this value, the log sampling is active.
+ When log_statement_sample_limit is -1
+ then log sampling is without limit.
+
+   
+  
+
  
   log_transaction_sample_rate (real)
   
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 44a59e1d4f..11ad1dd012 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2193,7 +2193,9 @@ check_log_statement(List *stmt_list)
 /*
  * check_log_duration
  *		Determine whether current command's duration should be logged.
- *		If log_statement_sample_rate < 1.0, log only a sample.
+ *		If log_statement_sample_rate < 1.0, log only a sample and duration
+ *		is less than log_statement_sample_limit. log_statement_sample_limit
+ *		is used only if this value is different than -1.
  *		We also check if this statement in this transaction must be logged
  *		(regardless of its duration).
  *
@@ -2217,6 +2219,7 @@ check_log_duration(char *msec_str, bool was_logged)
 		int			usecs;
 		int			msecs;
 		bool		exceeded;
+		bool		every_time;
 		bool		in_sample;
 
 		TimestampDifference(GetCurrentStatementStartTimestamp(),
@@ -2234,6 +2237,14 @@ check_log_duration(char *msec_str, bool was_logged)
 	 (secs > log_min_duration_statement / 1000 ||
 	  secs * 1000 + msecs >= log_min_duration_statement)));
 
+		/*
+		 * When log_statement_sample_limit is exceeded, then query is
+		 * logged every time.
+		 */
+		every_time = (log_statement_sample_limit >= 0 &&
+	   (secs > log_statement_sample_limit / 1000 ||
+		secs * 1000 + msecs >= log_statement_sample_limit));
+
 		/*
 		 * Do not log if log_statement_sample_rate = 0. Log a sample if
 		 * log_statement_sample_rate <= 1 and avoid unnecessary random() call
@@ -2245,7 +2256,7 @@ check_log_duration(char *msec_str, bool was_logged)
 			(log_statement_sample_rate == 1 ||
 			 random() <= log_statement_sample_rate * MAX_RANDOM_VALUE);
 
-		if ((exceeded && in_sample) || log_duration || xact_is_sampled)
+		if ((exceeded && (in_sample || every_time)) || log_duration || xact_is_sampled)
 		{
 			snprintf(msec_str, 32, "%ld.%03d",
 	 secs * 1000 + msecs, usecs % 1000);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 1208eb9a68..b0c7d9df0e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -510,6 +510,7 @@ int			log_min_error_statement = ERROR;
 int			log_min_messages = WARNING;
 int			client_min_messages = NOTICE;
 int			log_min_duration_statement = -1;
+int			log_statement_sample_limit = -1;
 int			log_temp_files = -1;
 double		log_statement_sample_rate = 1.0;
 double		log_xact_sample_rate = 0;
@@ -2715,6 +2716,19 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"log_statement_sample_limit", PGC_SUSET, LOGGING_WHEN,
+			gettext_noop("Sets the maximum execution time for sampling "
+		 "logged statements."),
+			gettext_noop("Zero effective disables sampling. "
+		 "-1 use sampling every time (without limit)."),
+			GUC_UNIT_MS
+		},
+		&log_statement_sample_limit,
+		-1, -1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"log_autovacuum_min_duration", PGC_SIGHUP, LOGGING_WHAT,
 			gettext_noop("Sets the minimum execution time above which "
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index e709177c37..8f0c2c59d4 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -252,6 +252,7 @@ extern int	log_min_error_statement;
 extern PGDLLIMPORT int log_min_messages;
 extern PGDLLIMPORT int cli

Re: PG 12 beta 1 segfault during analyze

2019-06-17 Thread Tom Lane
Steve Singer  writes:
> The attached patch fixes the issue.

Hmm, that's a pretty obvious mistake :-( but after some fooling around
I've not been able to cause a crash with it.  I wonder what test case
you were using, on what platform?

regards, tom lane