Updating parallel.sgml's treatment of parallel joins

2018-02-23 Thread Thomas Munro
Hi hackers,

Here is an attempt at updating parallel.sgml to cover Parallel Hash.
I will be neither surprised nor offended if Robert would like to put
it differently!

-- 
Thomas Munro
http://www.enterprisedb.com


update-parallel-join-docs.patch
Description: Binary data


Re: Online enabling of checksums

2018-02-23 Thread Tomas Vondra
Hi,

I see the patch also does throttling by calling vacuum_delay_point().
Being able to throttle the checksum workers not to affect user activity
definitely seems like a useful feature, no complaints here.

But perhaps binding it to vacuum_cost_limit/vacuum_cost_delay is not the
best idea? I mean, enabling checksums seems rather unrelated to vacuum,
so it seems a bit strange to configure it by vacuum_* options.

Also, how am I supposed to set the cost limit? Perhaps I'm missing
something, but the vacuum_delay_point call happens in the bgworker, so
setting the cost limit before running pg_enable_data_checksums() won't
get there, right? I really don't want to be setting it in the config
file, because then it will suddenly affect all user VACUUM commands.

And if this patch gets improved to use multiple parallel workers, we'll
need a global limit (something like we have for autovacuum workers).

In any case, I suggest mentioning this in the docs.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online enabling of checksums

2018-02-23 Thread Stephen Frost
Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> On 02/24/2018 03:11 AM, Andres Freund wrote:
> > On 2018-02-24 03:07:28 +0100, Tomas Vondra wrote:
> >> I agree having to restart the whole operation after a crash is not
> >> ideal, but I don't see how adding a flag actually solves it. The problem
> >> is the large databases often store most of the data (>80%) in one or two
> >> central tables (think fact tables in star schema, etc.). So if you
> >> crash, it's likely half-way while processing this table, so the whole
> >> table would still have relchecksums=false and would have to be processed
> >> from scratch.
> > 
> > I don't think it's quite as large a problem as you make it out to
> > be. Even in those cases you'll usually have indexes, toast tables and so
> > forth.
> 
> Hmmm, right. I've been focused on tables and kinda forgot that the other
> objects need to be transformed too ... :-/

There's also something of a difference between just scanning a table or
index, where you don't have to do much in the way of actual writes
because most of the table already has valid checksums, and having to
actually write out all the changes.

> >> But perhaps you meant something like "position" instead of just a simple
> >> true/false flag?
> > 
> > I think that'd incur a much larger complexity cost.
> 
> Yep, that was part of the point that I was getting to - that actually
> addressing the issue would be more expensive than simple flags. But as
> you pointed out, that was not quite ... well thought through.

No, but it's also not entirely wrong.  Huge tables aren't uncommon.

That said, I'm not entirely convinced that these new flags would be as
unnoticed as is being suggested here, but rather than focus on either
side of that, I'm thinking about what we want to have *next* - we know
that enabling/disabling checksums is an issue that needs to be solved,
and this patch is making progress towards that, but the next question is
what does one do when a page has been detected as corrupted?  Are there
flag fields which would be useful to have at a per-relation level to
support some kind of corrective action or setting that says "don't care
about checksums on this table, even though the entire database is
supposed to have valid checksums, but instead do X with failed pages" or
similar.

Beyond dealing with corruption-recovery cases, are there other use cases
for having a given table not have checksums?

Would it make sense to introduce a flag or field which indicates that an
entire table's pages have some set of attributes, of which 'checksums' is
just one attribute?  Perhaps a page version, which potentially allows us
to have a way to change page layouts in the future?

I'm happy to be told that we simply don't have enough information at
this point to make anything larger than a relchecksums field-level
decision, but perhaps these thoughts will spark an idea about how we
could define something a bit broader with clear downstream usefulness
that happens to also cover the "does this relation have checksums?"
question.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Online enabling of checksums

2018-02-23 Thread Tomas Vondra


On 02/24/2018 03:11 AM, Andres Freund wrote:
> Hi,
> 
> On 2018-02-24 03:07:28 +0100, Tomas Vondra wrote:
>> I agree having to restart the whole operation after a crash is not
>> ideal, but I don't see how adding a flag actually solves it. The problem
>> is the large databases often store most of the data (>80%) in one or two
>> central tables (think fact tables in star schema, etc.). So if you
>> crash, it's likely half-way while processing this table, so the whole
>> table would still have relchecksums=false and would have to be processed
>> from scratch.
> 
> I don't think it's quite as large a problem as you make it out to
> be. Even in those cases you'll usually have indexes, toast tables and so
> forth.
> 

Hmmm, right. I've been focused on tables and kinda forgot that the other
objects need to be transformed too ... :-/

>> But perhaps you meant something like "position" instead of just a simple
>> true/false flag?
> 
> I think that'd incur a much larger complexity cost.
> 

Yep, that was part of the point that I was getting to - that actually
addressing the issue would be more expensive than simple flags. But as
you pointed out, that was not quite ... well thought through.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Translations contributions urgently needed

2018-02-23 Thread Alvaro Herrera
Tels wrote:

> Last but not least I'd be able to help with the German translation, but I
> have no clear idea how, or what the actual status is. Are German
> translators actually needed?

German, along with Russian and French, are pretty much the only
translations kept fully up to date almost all the time.  It used to be
the case for Spanish also, but I've been slacking lately.  The status
tables are here:
https://babel.postgresql.org/

> And while I have no problem entering some texts on a website or edit a
> file with an editor, the process described here:
> 
>   https://www.postgresql.org/docs/10/static/nls-translator.html
> 
> sounds complicated [...]

I think we should link to our Babel site in that page, and also to the
translator-level docs which are here:
https://wiki.postgresql.org/wiki/NLS


Clearly, for a few languages the current facilities are sufficient.
If we make the process easier, we're likely to have more translations.
Will there be people verifying the quality also?

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online enabling of checksums

2018-02-23 Thread Andres Freund
Hi,

On 2018-02-24 03:07:28 +0100, Tomas Vondra wrote:
> I agree having to restart the whole operation after a crash is not
> ideal, but I don't see how adding a flag actually solves it. The problem
> is the large databases often store most of the data (>80%) in one or two
> central tables (think fact tables in star schema, etc.). So if you
> crash, it's likely half-way while processing this table, so the whole
> table would still have relchecksums=false and would have to be processed
> from scratch.

I don't think it's quite as large a problem as you make it out to
be. Even in those cases you'll usually have indexes, toast tables and so
forth.

> But perhaps you meant something like "position" instead of just a simple
> true/false flag?

I think that'd incur a much larger complexity cost.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-02-23 Thread Tomas Vondra


On 02/24/2018 01:34 AM, Robert Haas wrote:
> On Thu, Feb 22, 2018 at 3:28 PM, Magnus Hagander  wrote:
>> I would prefer that yes. But having to re-read 9TB is still significantly
>> better than not being able to turn on checksums at all (state today). And
>> adding a catalog column for it will carry the cost of the migration
>> *forever*, both for clusters that never have checksums and those that had it
>> from the beginning.
>>
>> Accepting that the process will start over (but only read, not re-write, the
>> blocks that have already been processed) in case of a crash does
>> significantly simplify the process, and reduce the long-term cost of it in
>> the form of entries in the catalogs. Since this is a one-time operation (or
>> for many people, a zero-time operation), paying that cost that one time is
>> probably better than paying a much smaller cost but constantly.
> 
> That's not totally illogical, but to be honest I'm kinda surprised
> that you're approaching it that way.  I would have thought that
> relchecksums and datchecksums columns would have been a sort of
> automatic design choice for this feature.  The thing to keep in mind
> is that nobody's going to notice the overhead of adding those columns
> in practice, but someone will surely notice the pain that comes from
> having to restart the whole operation.  You're talking about trading
> an effectively invisible overhead for a very noticeable operational
> problem.
> 

I agree having to restart the whole operation after a crash is not
ideal, but I don't see how adding a flag actually solves it. The problem
is the large databases often store most of the data (>80%) in one or two
central tables (think fact tables in star schema, etc.). So if you
crash, it's likely half-way while processing this table, so the whole
table would still have relchecksums=false and would have to be processed
from scratch.

But perhaps you meant something like "position" instead of just a simple
true/false flag?

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online enabling of checksums

2018-02-23 Thread Robert Haas
On Thu, Feb 22, 2018 at 3:28 PM, Magnus Hagander  wrote:
> I would prefer that yes. But having to re-read 9TB is still significantly
> better than not being able to turn on checksums at all (state today). And
> adding a catalog column for it will carry the cost of the migration
> *forever*, both for clusters that never have checksums and those that had it
> from the beginning.
>
> Accepting that the process will start over (but only read, not re-write, the
> blocks that have already been processed) in case of a crash does
> significantly simplify the process, and reduce the long-term cost of it in
> the form of entries in the catalogs. Since this is a one-time operation (or
> for many people, a zero-time operation), paying that cost that one time is
> probably better than paying a much smaller cost but constantly.

That's not totally illogical, but to be honest I'm kinda surprised
that you're approaching it that way.  I would have thought that
relchecksums and datchecksums columns would have been a sort of
automatic design choice for this feature.  The thing to keep in mind
is that nobody's going to notice the overhead of adding those columns
in practice, but someone will surely notice the pain that comes from
having to restart the whole operation.  You're talking about trading
an effectively invisible overhead for a very noticeable operational
problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Treating work_mem as a shared resource (Was: Parallel Hash take II)

2018-02-23 Thread Robert Haas
On Fri, Feb 23, 2018 at 6:06 PM, Thomas Munro
 wrote:
> If we switched to policy 3 and (say) work_mem were somehow
> automagically adjusted to be divided by number of participants at
> planning and execution time, then Parallel Hash wouldn't have to
> change at all to conform to the policy.  It would use at most work_mem
> per Parallel Hash node, no matter how many workers and no matter which
> of its strategies it picked (either it receives a budget of work_mem /
> participants, and then multiplies it by participants to create a
> no-partition hash table combining the participants' budgets, or it
> lets each participant chew on smaller hash tables adding up to at most
> work_mem).  Just the same total per-node budget as any other executor
> node gets.

That's true, but what you'd have instead is a whole lot of additional
planning overhead.  Right now, if we choose to do a merge-join or a
parallel-oblivious hash join or a nested loop with a materialize node
on the inner side, we can join the parallel-aware path on the outer
side to the same parallel-oblivious path on the inner side that we
would use if we decided against parallel query altogether.  If you
wanted all of the copies of a node across all parallel participants
to stick to work_mem as a budget, then you'd need one set of paths for
each rel planned with the default work_mem setting and a second set
planned with less work_mem.  And if you imagine a future where we
create various paths for the same relation with various different
numbers of workers, then you'd need to have even more different sets
of paths for each relation.

If we're OK with making planning more expensive to solve this problem,
then I think we should forget about #3 and go straight to #2.  What we
would do is just teach add_path() that "amount of memory used" is
another independent dimension of merit, so that a more expensive plan
might be kept if it uses less memory. Then if at the end of planning
you want to pick the fastest plan that uses less than X amount of
memory, or if you want to pick the plan for which weight * cost +
weight * memory usage is minimal, or whatever it is you want, you can.
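
To make that concrete, here is a minimal, purely illustrative sketch of
treating memory as a second dimension of merit when deciding whether to
keep a path; the SketchPath struct, its fields, and keep_new_path() are
invented for illustration and are not the planner's actual data
structures or code.

#include <stdbool.h>

/* Illustrative stand-in for a path: cost plus estimated memory (kB). */
typedef struct SketchPath
{
    double      total_cost;
    double      mem_used;
} SketchPath;

/* "a" dominates "b" if it is no worse on both dimensions of merit. */
static bool
dominates(const SketchPath *a, const SketchPath *b)
{
    return a->total_cost <= b->total_cost && a->mem_used <= b->mem_used;
}

/* add_path-like decision: keep the new path unless an existing one
 * dominates it, so a costlier plan survives if it uses less memory. */
static bool
keep_new_path(const SketchPath *old_paths, int nold, const SketchPath *new_path)
{
    for (int i = 0; i < nold; i++)
    {
        if (dominates(&old_paths[i], new_path))
            return false;
    }
    return true;
}

int
main(void)
{
    SketchPath old_paths[] = { { 100.0, 8192.0 } };  /* cheap but memory-hungry */
    SketchPath new_path    = { 120.0, 1024.0 };      /* costlier but lean */

    /* Kept, because neither path dominates the other on both dimensions. */
    return keep_new_path(old_paths, 1, &new_path) ? 0 : 1;
}

At the end of planning one could then filter the surviving paths by a
memory cap, or rank them by a weighted sum of cost and memory, as
described above.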

I think the only one from your list that's really boil-the-ocean is
#1.  For that one, you presumably need to create multiple plans and
switch between them based on how much memory is available right now
and maybe how much you think will be available in the near future and
I guess impose some kind of admission control when system memory gets
too low...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Treating work_mem as a shared resource (Was: Parallel Hash take II)

2018-02-23 Thread Thomas Munro
On Sun, Nov 26, 2017 at 3:04 AM, Robert Haas  wrote:
> On Tue, Nov 21, 2017 at 5:38 PM, Peter Geoghegan  wrote:
>>> That having been said, I think the place where our plans most commonly
>>> go wrong is where we incorrectly estimate the number of tuples by
>>> multiple orders of magnitude - 100x is common, 1000x is common, a
>>> million x is not uncommon, even a billion x is not unheard-of.  And I
>>> don't think there's any way to make a hash join happy if it thinks
>>> it's going to need 1 batch and it ends up needing a million batches.
>>
>> What about dynamic role reversal? That could make a big difference.
>
> In the best case it's great, but it looks to me like there are a lot
> of thorny problems.

There are loads of inter-related topics discussed in this thread,
including some operator-specific stuff like the above, and some more
general stuff, all requiring more research.  In the meantime, I wonder
if there are some simpler incremental improvements we could consider.

Since work_mem currently acts as a kind of per executor node instance
limit, the system-wide peak memory usage could be described as number
of concurrent queries * number of executor nodes * number of parallel
participants * work_mem.  In the past I think the number of executor
nodes was practically anchored to the ground by the number of
relations in the query (not necessarily linearly, but not far off it),
and the number of parallel participants was one.  With the advent of
parallel query we have this new multiplying term, and with the advent
of partitions and partition-wise join we have exciting new ways to
explode the number of executor nodes when the user only explicitly
named a few relations.

We could imagine various levels of memory budgeting:

1.  work_mem_per_system (global budget).
2.  work_mem_per_query (work_mem somehow shared out between executor nodes).
3.  Per planned executor node budget (workers get a fraction of
work_mem for each node).
4.  What we have today: per executor node instance budget (workers get
to use work_mem for each node).

1 and 2 seem like they might be boil-the-ocean problems.  But as far
as I know moving from 4 to 3 would merely require warming up a minor
lake.  That would take out one of the multipliers, and would remove a
perverse incentive from any potential cost-based parallel degree
choosing algorithms (you can print free memory by adding new workers.)

Parallel Hash either combines the memory budgets of all participants
to make one large no-partition hash table, or partitions the inner
relation into work_mem sized batches and loads several of them into
memory at the same time (at most one per participant).  Either way the
total memory usage is participants * work_mem, consistent with policy
4 and consistent with the total budget given to equivalent
parallel-oblivious hash join, sort-merge join or any other node.

If we switched to policy 3 and (say) work_mem were somehow
automagically adjusted to be divided by number of participants at
planning and execution time, then Parallel Hash wouldn't have to
change at all to conform to the policy.  It would use at most work_mem
per Parallel Hash node, no matter how many workers and no matter which
of its strategies it picked (either it receives a budget of work_mem /
participants, and then multiplies it by participants to create a
no-partition hash table combining the participants' budgets, or it
lets each participant chew on smaller hash tables adding up to at most
work_mem).  Just the same total per-node budget as any other executor
node gets.
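
As a purely illustrative arithmetic sketch (the work_mem value and
participant count below are made up, not taken from any benchmark), the
per-node totals under today's policy 4 versus the proposed policy 3 work
out like this:

#include <stdio.h>

int
main(void)
{
    int work_mem_kb  = 4096;    /* example: work_mem = 4MB */
    int participants = 5;       /* example: leader + 4 workers */

    /* Policy 4 (today): each participant may use work_mem for the node,
     * so the combined no-partition hash table can grow to this much. */
    int policy4_total = work_mem_kb * participants;            /* 20480 kB */

    /* Policy 3 (proposed): each participant gets work_mem / participants,
     * so combining the budgets yields roughly work_mem for the node. */
    int per_participant = work_mem_kb / participants;          /* 819 kB */
    int policy3_total   = per_participant * participants;      /* 4095 kB */

    printf("policy 4: %d kB per Parallel Hash node\n", policy4_total);
    printf("policy 3: %d kB per Parallel Hash node\n", policy3_total);
    return 0;
}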

-- 
Thomas Munro
http://www.enterprisedb.com



Re: BUG #15044: materialized views incompatibility with logical replication in postgres 10

2018-02-23 Thread David G. Johnston
Adding -hackers to this in the interest of getting it committed by Monday's
wrap-up.

https://www.postgresql.org/message-id/6e375316-91a4-7825-ef8b-9b8915ab6980%402ndquadrant.com

David J.

On Sat, Feb 17, 2018 at 8:43 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 2/5/18 10:33, Petr Jelinek wrote:
> >> Exactly.  The matview does not show up in pg_publication_tables but it's
> >> registered at some level.
> >
> > Indeed this is a bug. For normal publications we take care of this when
> > adding the relation to the publication but since ALL TABLES publications
> > don't check for membership we have to filter this directly in the output
> > plugin.
>
> I think the filtering in pgoutput ought to make use of
> is_publishable_class() in some way.  That takes care of non-tables such
> as materialized views, but it also filters out the information_schema
> tables for example.  Right now, if you insert something into one of the
> IS tables, it gets shipped over the wire but is then dropped by the
> apply because there is no pg_subscription_rel entry for the table.  That
> doesn't quite have the same user-visible effect as this bug, but it's bogus
> nonetheless.
>
> So I propose this alternative patch that covers all these cases.
>
>


0001-Fix-filtering-of-unsupported-relations-in-pgoutput.patch
Description: Binary data


Re: parallel append vs. simple UNION ALL

2018-02-23 Thread Robert Haas
On Sat, Dec 23, 2017 at 4:53 PM, Robert Haas  wrote:
> As I mentioned in the commit message for the Parallel Append commit
> (ab72716778128fb63d54ac256adf7fe6820a1185), it's kind of sad that this
> doesn't work with UNION ALL queries, which are an obvious candidate
> for such parallelization.  It turns out that it actually does work to
> a limited degree: assuming that the UNION ALL query can be converted
> to a simple appendrel, it can consider a parallel append of
> non-partial paths only.  The attached patch lets it consider a
> parallel append of partial paths ...

Here's an extended series of patches that now handles both the simple
UNION ALL case (where we flatten it) and the unflattened case:

0001 is pretty much the same as the subquery-smarts.patch file I
attached to the previous email.  I don't see much reason not to go
ahead and commit this, although it could use a test case.  It makes
the simple/flattened case work.  After some study I think that the
gather-parameter handling is correct, although if somebody felt like
reviewing that portion especially I wouldn't say no.

0002 rewrites recurse_union_children to work iteratively rather than
recursively and renames it to plan_union_children.  This probably
isn't 100% necessary, but it seems to me that the resulting code is
easier to understand, and it reduces the risk of blowing out the
stack.  There should be no user-visible behavior change.

0003 rewrites the setop planner to create a separate upper rel for
each stage of setop planning and uses them to return paths instead of
returning paths directly.  This is necessary preparatory work for
anything that wants to consider multiple possible paths for queries
that go through the full setop planner, but it shouldn't have any
visible impact all by itself.

0004 causes generate_union_path() to consider both the traditional
method and also Gather -> Parallel Append -> [partial path for each
subquery].  This is still a bit rough around the edges and there's a
lot more that could be done here, but I'm posting what I have for now
in the (perhaps vain) hope of getting some feedback.  With this, you
can use Parallel Append for the UNION ALL step of a query like SELECT
.. UNION ALL .. SELECT ... EXCEPT SELECT ...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


0004-Consider-Parallel-Append-as-a-way-to-implement-a-uni.patch
Description: Binary data


0003-Generate-a-separate-upper-relation-for-each-stage-of.patch
Description: Binary data


0002-Rewrite-recurse_union_children-to-iterate-rather-tha.patch
Description: Binary data


0001-Let-Parallel-Append-over-simple-UNION-ALL-have-parti.patch
Description: Binary data


Re: Translations contributions urgently needed

2018-02-23 Thread Robert Haas
On Fri, Feb 23, 2018 at 2:09 PM, Peter Eisentraut
 wrote:
> That section is meant to say, if a translation ships in 10.0 with 81%
> but then drops to 79% because some backpatching changes strings, we
> won't drop it in 10.2.
>
> In the opposite case, it would be added to a minor release.

Good to know, thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Translations contributions urgently needed

2018-02-23 Thread Peter Eisentraut
On 2/23/18 10:48, Robert Haas wrote:
> On Fri, Feb 23, 2018 at 10:41 AM, Tom Lane  wrote:
>> Thom Brown  writes:
>>> Something that isn't clear to me is, for a language that didn't meet
>>> 80% translation for a component, if it does reach 80% after the major
>>> version release, does it then get shipped in a minor release, or is
>>> out of that version completely until the next major version?
>>
>> No, it'll be added to the next minor release as soon as it reaches
>> 80%.  That's happened routinely in the past.  I have no idea how
>> automated that policy is -- you could ask Peter E. -- but a trawl
>> through the commit logs shows .po files getting added in minor
>> releases from time to time.
> 
> https://wiki.postgresql.org/wiki/NLS#Minimum_Translation implies the
> opposite, because it refers to inclusion "in a PostgreSQL major
> release".

That section is meant to say, if a translation ships in 10.0 with 81%
but then drops to 79% because some backpatching changes strings, we
won't drop it in 10.2.

In the opposite case, it would be added to a minor release.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: check error messages in SSL tests

2018-02-23 Thread Peter Eisentraut
On 2/22/18 23:58, Michael Paquier wrote:
> One of the tests is failing:
> t/001_ssltests.pl .. 1/62
> #   Failed test 'certificate authorization fails with revoked client cert: 
> matches'
> #   at /home//git/postgres/src/test/ssl/../../../src/test/perl/TestLib.pm 
> line 354.
> #   'psql: private key file "ssl/client-revoked.key" has
> group or world access; permissions should be u=rw (0600) or less
> # '
> # doesn't match '(?^:SSL error)'
> # Looks like you failed 1 test of 62.
> t/001_ssltests.pl .. Dubious, test returned 1 (wstat 256, 0x100)
> Failed 1/62 subtests
> 
> This comes from libpq itself, and the tree uses 0644 on this file.  You
> just need to update this test so that ssl/client-revoked_tmp.key is used
> instead of ssl/client-revoked.key, and then the tests pass.

Oh.  I actually had that file as 0600 in my checked-out tree, probably
from earlier experiments.  Fixed that.  And I also changed it to make
another temp file that is explicitly 0644, because we can't rely on that
being the default either.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From cdcd1a6d4ed835610c7577f21955268ee90cc9a9 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 23 Feb 2018 13:54:45 -0500
Subject: [PATCH v2] Check error messages in SSL tests

In tests that check whether a connection fails, also check the error
message.  That makes sure that the connection was rejected for the right
reason.

This discovered that two tests had their connection failing for the
wrong reason.  One test failed because pg_hba.conf was not set up to
allow that user, one test failed because the client key file did not
have the right permissions.  Fix those tests and add a new one that is
really supposed to check the file permission issue.
---
 src/test/ssl/ServerSetup.pm| 42 -
 src/test/ssl/ssl/.gitignore|  2 +-
 src/test/ssl/t/001_ssltests.pl | 43 ++
 src/test/ssl/t/002_scram.pl|  4 +++-
 4 files changed, 59 insertions(+), 32 deletions(-)

diff --git a/src/test/ssl/ServerSetup.pm b/src/test/ssl/ServerSetup.pm
index 45991d61a2..27a676b65c 100644
--- a/src/test/ssl/ServerSetup.pm
+++ b/src/test/ssl/ServerSetup.pm
@@ -27,7 +27,6 @@ use Test::More;
 use Exporter 'import';
 our @EXPORT = qw(
   configure_test_server_for_ssl
-  run_test_psql
   switch_server_cert
   test_connect_fails
   test_connect_ok
@@ -35,37 +34,28 @@ our @EXPORT = qw(
 
 # Define a couple of helper functions to test connecting to the server.
 
-# Attempt connection to server with given connection string.
-sub run_test_psql
-{
-   my $connstr   = $_[0];
-
-   my $cmd = [
-   'psql', '-X', '-A', '-t', '-c', "SELECT \$\$connected with $connstr\$\$",
-   '-d', "$connstr" ];
-
-   my $result = run_log($cmd);
-   return $result;
-}
-
 # The first argument is a base connection string to use for connection.
 # The second argument is a complementary connection string.
 sub test_connect_ok
 {
-   my $common_connstr = $_[0];
-   my $connstr = $_[1];
-   my $test_name = $_[2];
+   my ($common_connstr, $connstr, $test_name) = @_;
 
-   ok(run_test_psql("$common_connstr $connstr"), $test_name);
+   my $cmd = [
+   'psql', '-X', '-A', '-t', '-c', "SELECT \$\$connected with $connstr\$\$",
+   '-d', "$common_connstr $connstr" ];
+
+   command_ok($cmd, $test_name);
 }
 
 sub test_connect_fails
 {
-   my $common_connstr = $_[0];
-   my $connstr = $_[1];
-   my $test_name = $_[2];
+   my ($common_connstr, $connstr, $expected_stderr, $test_name) = @_;
+
+   my $cmd = [
+   'psql', '-X', '-A', '-t', '-c', "SELECT \$\$connected with $connstr\$\$",
+   '-d', "$common_connstr $connstr" ];
 
-   ok(!run_test_psql("$common_connstr $connstr"), $test_name);
+   command_fails_like($cmd, $expected_stderr, $test_name);
 }
 
 # Copy a set of files, taking into account wildcards
@@ -169,12 +159,12 @@ sub configure_hba_for_ssl
print $hba
 "# TYPE  DATABASEUSERADDRESS METHOD\n";
print $hba
-"hostssl trustdb ssltestuser $serverhost/32
$authmethod\n";
+"hostssl trustdb all $serverhost/32
$authmethod\n";
print $hba
-"hostssl trustdb ssltestuser ::1/128 
$authmethod\n";
+"hostssl trustdb all ::1/128 
$authmethod\n";
print $hba
-"hostssl certdb  ssltestuser $serverhost/32cert\n";
+"hostssl certdb  all $serverhost/32cert\n";
print $hba
-"hostssl certdb  ssltestuser ::1/128 cert\n";
+"hostssl certdb  all ::1/128 cert\n";
close $hba;
 }
diff --gi

Re: Allow workers to override datallowconn

2018-02-23 Thread Magnus Hagander
On Fri, Feb 23, 2018 at 7:52 PM, Tom Lane  wrote:

> Magnus Hagander  writes:
> > Here's another attempt at moving this one forward. Basically this adds a
> > new GucSource being GUC_S_CLIENT_EARLY. It now runs through the
> > parameters once before CheckMyDatabase, with source set to
> > GUC_S_CLIENT_EARLY. In this source, *only* parameters that are flagged
> > as GUC_ALLOW_EARLY will be set, any other parameters are ignored
> > (without error). For now, only the ignore_connection_restriction is
> > allowed at this stage. Then it runs CheckMyDatabase(), and after that
> > it runs through all the parameters again, now with the GUC_S_CLIENT
> > source as usual, which will now process all other variables.
>
> Ick.  This is an improvement over the other way of attacking the problem?
> I do not think so.
>

Nope, I'm far from sure that it is. I just wanted to show what it'd look
like.

I personally think the second patch (the one adding a parameter to
BackendWorkerInitializeConnection) is the cleanest one. It doesn't solve
Andres' problem, but perhaps that should be the job of a different patch.


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Allow workers to override datallowconn

2018-02-23 Thread Tom Lane
Magnus Hagander  writes:
> Here's another attempt at moving this one forward. Basically this adds a
> new GucSource being GUC_S_CLIENT_EARLY. It now runs through the parameters
> once before CheckMyDatabase, with source set to GUC_S_CLIENT_EARLY. In this
> source, *only* parameters that are flagged as GUC_ALLOW_EARLY will be set,
> any other parameters are ignored (without error). For now, only the
> ignore_connection_restriction is allowed at this stage. Then it runs
> CheckMyDatabase(), and after that it runs through all the parameters again,
> now with the GUC_S_CLIENT source as usual, which will now process all
> other  variables.

Ick.  This is an improvement over the other way of attacking the problem?
I do not think so.

regards, tom lane



Re: ERROR: left and right pathkeys do not match in mergejoin

2018-02-23 Thread Tom Lane
Alexander Kuzmenkov  writes:
> In create_mergejoin_plan:
>      * implied inner ordering is then "ORDER BY x, y, x", but the pathkey
>      * drops the second sort by x as redundant, and this code must cope.
> -- should this read "the pathkey machinery drops"?

I changed that and improved some other comments, and pushed it.
Thanks for reviewing!

regards, tom lane



Re: Online enabling of checksums

2018-02-23 Thread Magnus Hagander
On Thu, Feb 22, 2018 at 9:09 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 2/22/18 12:38, Magnus Hagander wrote:
> > I'm not entirely sure which the others ones are. Auto-Vacuum obviously
> > is one, which doesn't use the worker infrastructure. But I'm not sure
> > which the others are referring to?
>
> autovacuum, subscription workers, auto prewarm
>
>
Oh, for some reason I thought you were thinking of pending patches. Yeah,
for those it makes sense -- though autovacuum isn't (currently) using
background workers for what it does, the rest certainly makes sense to do
something with.

But as you say, that's a separate patch :)


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Allow workers to override datallowconn

2018-02-23 Thread Magnus Hagander
On Thu, Feb 22, 2018 at 9:24 PM, Tom Lane  wrote:

> Magnus Hagander  writes:
> > I hacked up an attempt to do this. It does seem to work in the very
> > simple case, but it does require changing the order in InitPostgres()
> > to load the startup packet before validating those.
>
> I doubt that's safe.  It requires, to name just one thing, an assumption
> that no processing done in process_startup_options has any need to know
> the database encoding, which is established by CheckMyDatabase.  Thus
> for instance, if any GUC settings carried in the startup packet include
> non-ASCII characters, the wrong things will happen.
>
> You could possibly make it work with more aggressive refactoring, but
> I remain of the opinion that this is a fundamentally bad idea anyhow.
> A GUC of this kind is just ripe for abuse, and I don't think it's solving
> any problem we really need solved.
>

Here's another attempt at moving this one forward. Basically this adds a
new GucSource being GUC_S_CLIENT_EARLY. It now runs through the parameters
once before CheckMyDatabase, with source set to GUC_S_CLIENT_EARLY. In this
source, *only* parameters that are flagged as GUC_ALLOW_EARLY will be set,
any other parameters are ignored (without error). For now, only the
ignore_connection_restriction is allowed at this stage. Then it runs
CheckMyDatabase(), and after that it runs through all the parameters again,
now with the GUC_S_CLIENT source as usual, which will now process all
other  variables.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
index 44535f9976..997bffa416 100644
--- a/src/backend/postmaster/checksumhelper.c
+++ b/src/backend/postmaster/checksumhelper.c
@@ -586,6 +588,8 @@ ChecksumHelperWorkerMain(Datum arg)
 	ereport(DEBUG1,
 			(errmsg("Checksum worker starting for database oid %d", dboid)));
 
+	SetConfigOption("ignore_connection_restriction", "true", PGC_SU_BACKEND, PGC_S_OVERRIDE);
+
 	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid);
 
 	/*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 6dc2095b9a..f08669ddb3 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3358,7 +3358,7 @@ get_stats_option_name(const char *arg)
  */
 void
 process_postgres_switches(int argc, char *argv[], GucContext ctx,
-		  const char **dbname)
+		  const char **dbname, bool early_processing)
 {
 	bool		secure = (ctx == PGC_POSTMASTER);
 	int			errs = 0;
@@ -3376,6 +3376,10 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
 			argc--;
 		}
 	}
+	else if (early_processing)
+	{
+		gucsource = PGC_S_CLIENT_EARLY;
+	}
 	else
 	{
 		gucsource = PGC_S_CLIENT;	/* switches came from client */
@@ -3641,7 +3645,7 @@ PostgresMain(int argc, char *argv[],
 	/*
 	 * Parse command-line options.
 	 */
-	process_postgres_switches(argc, argv, PGC_POSTMASTER, &dbname);
+	process_postgres_switches(argc, argv, PGC_POSTMASTER, &dbname, false);
 
 	/* Must have gotten a database name, or have a default (the username) */
 	if (dbname == NULL)
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 484628987f..2b00cdebdb 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -73,7 +73,7 @@ static void StatementTimeoutHandler(void);
 static void LockTimeoutHandler(void);
 static void IdleInTransactionSessionTimeoutHandler(void);
 static bool ThereIsAtLeastOneRole(void);
-static void process_startup_options(Port *port, bool am_superuser);
+static void process_startup_options(Port *port, bool am_superuser, bool early_processing);
 static void process_settings(Oid databaseid, Oid roleid);
 
 
@@ -326,7 +326,7 @@ CheckMyDatabase(const char *name, bool am_superuser)
 		/*
 		 * Check that the database is currently allowing connections.
 		 */
-		if (!dbform->datallowconn)
+		if (!dbform->datallowconn && !IgnoreDatAllowConn)
 			ereport(FATAL,
 	(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 	 errmsg("database \"%s\" is not currently accepting connections",
@@ -811,7 +811,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	{
 		/* process any options passed in the startup packet */
 		if (MyProcPort != NULL)
-			process_startup_options(MyProcPort, am_superuser);
+			process_startup_options(MyProcPort, am_superuser, false);
 
 		/* Apply PostAuthDelay as soon as we've read all options */
 		if (PostAuthDelay > 0)
@@ -999,6 +999,10 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	/* set up ACL framework (so CheckMyDatabase can check permissions) */
 	initialize_acl();
 
+	/* Process "early" GUCs in startup packet */
+	if (MyProcPort != NULL)
+		process_startup_options(MyProcPort, am_superuser, true);
+
 	/*
 	 * Re-read the pg_database row for our dat

Re: [HACKERS] WIP: Aggregation push-down

2018-02-23 Thread Antonin Houska
Robert Haas  wrote:

> On Mon, Jan 29, 2018 at 3:32 AM, Antonin Houska  wrote:
> > I think of a variant of this: implement an universal function that tests the
> > binary values for equality (besides the actual arguments, caller would have 
> > to
> > pass info like typlen, typalign, typbyval for each argument, and cache these
> > for repeated calls) and set pg_proc(oprcode) to 0 wherever this function is
> > sufficient. Thus the problematic cases like numeric, citext, etc. would be
> > detected by their non-zero oprcode.
> 
> I don't think that's a good option, because we don't want int4eq to
> have to go through a code path that has branches to support varlenas
> and cstrings.  Andres is busy trying to speed up expression evaluation
> by removing unnecessary code branches; adding new ones would be
> undoing that valuable work.

I spent some more time thinking about this. What about adding a new strategy
number for hash index operator classes, e.g. HTBinaryEqualStrategyNumber? For
most types both the HTEqualStrategyNumber and HTBinaryEqualStrategyNumber
strategies would point to the same operator, but types like numeric would
naturally have them different.

Thus the pushed-down partial aggregation can only use the
HTBinaryEqualStrategyNumber's operator to compare grouping expressions. In the
initial version (until we have useful statistics for the binary values) we can
avoid the aggregation push-down if the grouping expression output type has the
two strategies implemented using different functions because, as you noted
upthread, grouping based on binary equality can result in an excessive
number of groups.

One open question is whether the binary equality operator needs a separate
operator class or not. If an opclass cares only about the binary equality, its
hash function(s) can be a lot simpler.
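
For illustration only, here is a minimal, self-contained sketch of the
decision rule described above. HTEqualStrategyNumber exists in core today
with value 1; HTBinaryEqualStrategyNumber, its value, the
lookup_hash_strategy_op() helper, and all OIDs below are assumptions made
up for this sketch, not anything in PostgreSQL.

#include <stdbool.h>

typedef unsigned int Oid;   /* stand-in for the real typedef */

/* Existing hash AM strategy number. */
#define HTEqualStrategyNumber           1
/* Proposed here; the name and the value 2 are assumptions, not core code. */
#define HTBinaryEqualStrategyNumber     2

/* Toy stub standing in for a catalog lookup of the operator that
 * implements a given strategy in a type's default hash opclass; the OIDs
 * returned here are invented for the example. */
static Oid
lookup_hash_strategy_op(Oid type_oid, int strategy)
{
    if (type_oid == 1001 && strategy == HTBinaryEqualStrategyNumber)
        return 2002;        /* a distinct binary-equality operator */
    return 2001;            /* the type's ordinary equality operator */
}

/* Sketch of the proposed rule: only push the partial aggregation down
 * when "equal" and "binary equal" are implemented by the same operator,
 * so binary grouping cannot create more groups than plain grouping. */
static bool
can_push_down_grouping(Oid type_oid)
{
    Oid eq_op     = lookup_hash_strategy_op(type_oid, HTEqualStrategyNumber);
    Oid bin_eq_op = lookup_hash_strategy_op(type_oid, HTBinaryEqualStrategyNumber);

    return eq_op != 0 && eq_op == bin_eq_op;
}

int
main(void)
{
    return can_push_down_grouping(1001) ? 1 : 0;   /* 0: not pushed down */
}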

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com



Re: Translations contributions urgently needed

2018-02-23 Thread Tels

On Thu, February 22, 2018 11:04 pm, Tom Lane wrote:
> Alvaro Herrera  writes:
>> Please join pgsql-translat...@postgresql.org.
>
> What surprises me about this thread is that apparently the sad state
> of the v10 translations wasn't already discussed on that list?

The list is more or less dead - over the past 6 months there were only
between 1 and 5 messages per month.

> I have no objection to calling for more translation volunteers on
> this list --- in fact, probably it'd be a good idea to call for
> more help on pgsql-general, too.  But it doesn't seem like it's
> quite on-topic for -hackers otherwise.

I do think it would be a good idea to at least send a translation summary
once a quarter to the translators list and maybe include hackers and
general - many might not be aware of the 80% rule.

Also an automated summary one or two months before a major release as a
"call for action" would be good to avoid such mishaps.

Last but not least I'd be able to help with the German translation, but I
have no clear idea how, or what the actual status is. Are German
translators actually needed?

And while I have no problem entering some texts on a website or edit a
file with an editor, the process described here:

  https://www.postgresql.org/docs/10/static/nls-translator.html

sounds complicated and leaves me unclear on quite a few things, for
instance how would I submit whatever work I've done? Do I need git? An
account? Where? And how does it all work?

I guess a lot of potential translators who aren't programmers would be
left baffled, too.

Best regards,

Tels



Re: RTLD_GLOBAL (& JIT inlining)

2018-02-23 Thread Tom Lane
Andres Freund  writes:
> First question:
> Why do we currently use RTLD_GLOBAL when loading extension libraries, but
> take pains ([1]) to make things work without RTLD_GLOBAL?

As mentioned in that very message, the point was better missing-symbol
error detection, not RTLD_GLOBAL/LOCAL per se.

> I think using RTLD_LOCAL on most machines would be a much better
> idea. I've not found proper explanations why GLOBAL is used. We started
> using it ages ago, with [2], but that commit contains no explanation,
> and a quick search didn't show up anything either. Peter?

https://www.postgresql.org/message-id/7142.122...@sss.pgh.pa.us

My position is the same as then: I'm happy to remove it if it doesn't
break things anywhere ... but it seems like it would cause problems for
plpython, unless their behavior has changed since 2001 which is
surely possible.

regards, tom lane



Re: Translations contributions urgently needed

2018-02-23 Thread Robert Haas
On Fri, Feb 23, 2018 at 10:41 AM, Tom Lane  wrote:
> Thom Brown  writes:
>> Something that isn't clear to me is, for a language that didn't meet
>> 80% translation for a component, if it does reach 80% after the major
>> version release, does it then get shipped in a minor release, or is
>> out of that version completely until the next major version?
>
> No, it'll be added to the next minor release as soon as it reaches
> 80%.  That's happened routinely in the past.  I have no idea how
> automated that policy is -- you could ask Peter E. -- but a trawl
> through the commit logs shows .po files getting added in minor
> releases from time to time.

https://wiki.postgresql.org/wiki/NLS#Minimum_Translation implies the
opposite, because it refers to inclusion "in a PostgreSQL major
release".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Translations contributions urgently needed

2018-02-23 Thread Tom Lane
Thom Brown  writes:
> Something that isn't clear to me is, for a language that didn't meet
> 80% translation for a component, if it does reach 80% after the major
> version release, does it then get shipped in a minor release, or is
> out of that version completely until the next major version?

No, it'll be added to the next minor release as soon as it reaches
80%.  That's happened routinely in the past.  I have no idea how
automated that policy is -- you could ask Peter E. -- but a trawl
through the commit logs shows .po files getting added in minor
releases from time to time.

regards, tom lane



Re: Online enabling of checksums

2018-02-23 Thread Robert Haas
On Thu, Feb 22, 2018 at 9:48 PM, Michael Paquier  wrote:
> On Thu, Feb 22, 2018 at 11:24:37AM -0800, Andres Freund wrote:
>> I suspect I'm going to get some grief for this, but I think the time has
>> come to bite the bullet and support changing databases in the same
>> process...
>
> I'd like to see that.  Last time this has been discussed, and Robert
> complained to me immediately when I suggested it, is that this is not
> worth it with the many complications around syscache handling and
> resource cleanup.  It is in the long term more stable to use a model
> where a parent process handles a set of children and decides to which
> database each child should spawn, which is what autovacuum does.

My position is that allowing processes to change databases is a good
idea but (1) it will probably take some work to get correct and (2) it
probably won't be super-fast due to the need to flush absolutely every
bit of state in sight that might've been influenced by the choice of
database.

I also agree with Andres that this email is not very easy to
understand, although my complaint is not so much that I don't see how
the parts relate as that you seem to be contradicting yourself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] path toward faster partition pruning

2018-02-23 Thread Robert Haas
On Wed, Feb 21, 2018 at 5:44 AM, Amit Langote
 wrote:
> Please find attached updated patches.

Committed 0001 and 0002.

I'm having some difficulty wrapping my head around 0003 because it has
minimal comments and no useful commit message.  I think, though, that
it's actually broken.  Pre-patch, the logic in find_partition_scheme
compares partopfamily, partopcintype, and parttypcoll and then asserts
equality for parttyplen and parttypbyval; not coincidentally,
PartitionSchemeData describes the latter two fields only as "cached
data", so that the segregation of fields in PartitionSchemeData into
two groups exactly matches what find_partition_scheme is actually
doing.  However, with the patch, it turns into a sort of hodgepodge.
parttypid is added into the "cached" section of PartitionSchemeData
and partcollation to the primary section, but both values are
compared, not asserted; parttypcoll moves from the "compared" section
to the "asserted" section but the declaration in PartitionSchemeData
stays where it was.

Moreover, there's no explanation of why this is getting changed.
There's an existing comment that explains the motivation for what the
code does today, which the patch does not modify:

  * We store the opclass-declared input data types instead of the partition key
  * datatypes since the former rather than the latter are used to compare
  * partition bounds. Since partition key data types and the opclass declared
  * input data types are expected to be binary compatible (per ResolveOpClass),
  * both of those should have same byval and length properties.

Obviously, this raises the issue of whether changing this is really
the right thing to do in the first place, but at any rate it's
certainly necessary for the comments to match what the code actually
does.

Also, I find this not very helpful:

+ * The collation of the partition key can differ from the collation of the
+ * underlying column, so we must store this separately.

If the comments about parttypcoll and partcollation were clear enough
(and I think they could use some work to distinguish them better),
then this would be pretty much unnecessary -- clearly the only reason
to store two things is if they might be different from each other.

It might be more useful to somehow explain how parttypid and
partsupfunc are intended to work/be used, but actually I don't
think any satisfactory explanation is possible.  Either we have one
partition scheme per partopcintype -- in which case parttypid is
ill-defined because it could vary among relations with the same
PartitionScheme -- or we have one per parttypid -- in which case,
without some other change, partition-wise join will stop working
between relations with different parttypids but the same
partopcintype.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-02-23 Thread Robert Haas
On Fri, Feb 16, 2018 at 12:14 AM, Ashutosh Bapat
 wrote:
> Appreciate you taking time for review.
>
> PFA updated version.

Committed 0001.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: SSL passphrase prompt external command

2018-02-23 Thread Michael Paquier
On Fri, Feb 23, 2018 at 08:16:12AM -0500, Robert Haas wrote:
> On Thu, Feb 22, 2018 at 10:14 PM, Peter Eisentraut
>  wrote:
>> Here is a patch that adds a way to specify an external command for
>> obtaining SSL passphrases.  There is a new GUC setting
>> ssl_passphrase_command.
>>
>> Right now, we rely on the OpenSSL built-in prompting mechanism, which
>> doesn't work in some situations, including under systemd.  This patch
>> allows a configuration to make that work, e.g., with systemd-ask-password.
> 
> I have not reviewed the patch, but count me as an enthusiastic +1 for
> the concept.

+1 as well, as someone who actually began looking at the patch :)
--
Michael


signature.asc
Description: PGP signature


Re: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-02-23 Thread Michael Paquier
On Fri, Feb 23, 2018 at 11:26:31AM +0900, Michael Paquier wrote:
> An other, evil, idea that I have on top of all those things is to
> directly hexedit the WAL segment of the standby just at the limit where
> it would receive a record from the primary and insert in it garbage
> data which would make the validation tests to blow up in xlogreader.c
> for the record allocation.

OK, I have been playing with such methods and finally I have been able
to check the theory of Tsunakawa-san here.  At first I played with
hexedit to pollute the data after the last record received by a standby,
a record which is not at the end of a WAL page.  This can easily happen by
stopping the primary which will most likely only send data up to the
middle of a page.  Using that I have easily faced errors like that:
LOG: invalid resource manager ID 255 at 0/75353B8
And I have been able to see that garbage could be read as the length of
a record before the validation of the header is done.  With the patch
attached you can easily see a collection of garbage:
LOG: length of 77913214 fetched for record 0/30009D8
And this happens if a standby is reading a page with the last record in
the middle of it.

At this state, the XLOG reader is dealing with random data, so this
would most likely fail in ValidXLogRecordHeader(), which is what
happened with the invalid rmgr for example.  However, if one is really
unlucky, and the probability of facing that is really low, the random
data being read *can* pass the validation checks of the header, which
would cause a set of problems:
- In the backend, this means a failure on palloc_extended if the garbage
read is higher than 1GB.  This causes a standby to stop immediately
while it should normally continue to poll the primary by spawning a new
WAL receiver and begin streaming from the beginning of the segment where
it needs its data (assuming that there is no archive).
- In the frontend, this can cause problems for pg_waldump and
pg_rewind.  For pg_waldump, this is not actually a big deal because a
failure means the end of the data that can be decoded.  However the
problem is more serious for pg_rewind.  At the initial phase of the
tool, pg_rewind scans the WAL segments of the target server to look for
the modified blocks since the last checkpoint before promotion up to the
end of the stream.  If at the end of the stream it finds garbage data,
then there is a risk that it allocates a couple of GBs of data, likely
ending with the process being killed by the automatic OOM killer,
preventing the rewind from completing :(

After some thought, I have also come up with a simpler way to test
the stability of the XLOG reader:
1) Create a primary/standby cluster.
2) Stop the standby.
3) Add to the standby's pg_wal a set of junk WAL segments generated with
if=/dev/urandom of=junk_walseg bs=16MB count=1.  Note that recycled WAL
segments are simply copies of past ones, so when a standby writes new
records into a recycled segment it writes on top of old garbage data.
Hence it is possible to face this problem anyway; the junk segments just
make it show up faster.
4) Start the standby again, and let it alive.
5) Generate on the primary enough records (roughly 8kB worth) to fill a
complete WAL page.
6) Restart the primary cleanly, sleep about 5s to give the standby time
to stream the new partial page, and return to 5).

When 5) and 6) loop, you will see the monitoring log of the attached
patch complain after a couple of restarts.

Tsunakawa-san has proposed upthread to fix the problem by zeroing the
page read in the WAL receiver.  While I agree that zeroing the page is
the way to go, doing so in the WAL receiver does not take care of
problems with the frontends.  I don't have the time to test that yet as
it is late here, but based on my code lookups, tweaking
ReadPageInternal() so that the page is zeroed correctly should do it for
all the cases.  This way, the checks happening after a page read would
fail consistently because of the zeroed fields instead of the
accumulated garbage.
--
Michael
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 3a86f3497e..13b25e5236 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -303,6 +303,16 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 	record = (XLogRecord *) (state->readBuf + RecPtr % XLOG_BLCKSZ);
 	total_len = record->xl_tot_len;
 
+#ifndef FRONTEND
+	/*
+	 * XXX: remove from final patch , just here to check that garbage
+	 * data can be fetched from blocks read.
+	 */
+	if (total_len > 5 * XLOG_BLCKSZ)
+		elog(LOG, "length of %u fetched for record %X/%X", total_len,
+			 (uint32) (RecPtr >> 32), (uint32) RecPtr);
+#endif
+
 	/*
 	 * If the whole record header is on this page, validate it immediately.
 	 * Otherwise do just a basic sanity check on xl_tot_len, and validate the


signature.asc

Re: Translations contributions urgently needed

2018-02-23 Thread Thom Brown
On 23 February 2018 at 04:04, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Please join pgsql-translat...@postgresql.org.
>
> What surprises me about this thread is that apparently the sad state
> of the v10 translations wasn't already discussed on that list?
>
> I have no objection to calling for more translation volunteers on
> this list --- in fact, probably it'd be a good idea to call for
> more help on pgsql-general, too.  But it doesn't seem like it's
> quite on-topic for -hackers otherwise.

Something that isn't clear to me is, for a language that didn't meet
80% translation for a component, if it does reach 80% after the major
version release, does it then get shipped in a minor release, or is
out of that version completely until the next major version?

Thom



Re: SSL passphrase prompt external command

2018-02-23 Thread Robert Haas
On Thu, Feb 22, 2018 at 10:14 PM, Peter Eisentraut
 wrote:
> Here is a patch that adds a way to specify an external command for
> obtaining SSL passphrases.  There is a new GUC setting
> ssl_passphrase_command.
>
> Right now, we rely on the OpenSSL built-in prompting mechanism, which
> doesn't work in some situations, including under systemd.  This patch
> allows a configuration to make that work, e.g., with systemd-ask-password.

I have not reviewed the patch, but count me as an enthusiastic +1 for
the concept.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: FOR EACH ROW triggers on partitioned tables

2018-02-23 Thread Robert Haas
On Thu, Feb 22, 2018 at 6:52 PM, Alvaro Herrera  wrote:
> Alvaro Herrera wrote:
>> Another option is to rethink this feature from the ground up: instead of
>> cloning catalog rows for each children, maybe we should have the trigger
>> lookup code, when running DML on the child relation (the partition),
>> obtain trigger entries not only for the child relation itself but also
>> for its parents recursively -- so triggers defined in the parent are
>> fired for the partitions, too.
>
> I have written this, and it seems to work fine; it's attached.
>
> Generally speaking, I like this better than my previously proposed
> patch: having duplicate pg_trigger rows seems lame, in hindsight.
>
> I haven't measured the performance loss, but we now scan pg_inherits
> each time we build a relcache entry and relhastriggers=on.  Can't be
> good.  In general, the pg_inherits stuff looks generally unnatural --
> manually doing scans upwards (find parents) and downwards (find
> partitions).  It's messy and there are no nice abstractions.
> Partitioning looks too much bolted-on still.

Elsewhere, we've put a lot of blood, sweat, and tears into making sure
that we only traverse the inheritance hierarchy from top to bottom.
Otherwise, we're adding deadlock hazards.  I think it's categorically
unacceptable to do traversals in the opposite order -- if you do, then
an UPDATE on a child could deadlock with a LOCK TABLE on the parent.
That will not win us any awards.
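
To spell the hazard out, it's the classic opposite-order locking
problem.  A toy standalone illustration, with plain mutexes standing in
for relation locks (obviously not the real locking code):

#include <pthread.h>

static pthread_mutex_t parent_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t child_lock = PTHREAD_MUTEX_INITIALIZER;

/* "UPDATE on the child": locks the partition first, then wants the parent */
void *
session_update_child(void *arg)
{
	pthread_mutex_lock(&child_lock);
	pthread_mutex_lock(&parent_lock);	/* bottom-up traversal */
	pthread_mutex_unlock(&parent_lock);
	pthread_mutex_unlock(&child_lock);
	return arg;
}

/* "LOCK TABLE on the parent": locks the parent first, then its partitions */
void *
session_lock_parent(void *arg)
{
	pthread_mutex_lock(&parent_lock);
	pthread_mutex_lock(&child_lock);	/* top-down traversal */
	pthread_mutex_unlock(&child_lock);
	pthread_mutex_unlock(&parent_lock);
	return arg;
}

/*
 * Run concurrently (e.g. via pthread_create), each session can end up
 * holding one lock while waiting forever for the other -- the deadlock
 * that the strict top-to-bottom rule avoids.
 */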

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: FOR EACH ROW triggers on partitioned tables

2018-02-23 Thread Alvaro Herrera
Amit Langote wrote:
> On Fri, Feb 23, 2018 at 11:32 AM, Alvaro Herrera
>  wrote:

> > Uh, wow, how have I missed that all this time!  Yes, it probably does.
> > I'll rework this tomorrow ... and the already committed index patch too,
> > I think.
> 
> BTW, not sure if you'd noticed but I had emailed about setting
> relispartition on index partitions after you committed the first
> indexes patch.

I hadn't noticed.  These days sadly I'm not able to keep up with all
pgsql-hackers traffic, and I'm quite likely to miss things unless I'm
CCed.  This seems true for many others, too.

Thanks for pointing it out.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-02-23 Thread Simon Riggs
On 23 February 2018 at 11:40, David Rowley  wrote:
> On 23 February 2018 at 04:11, Jesper Pedersen
>  wrote:
>> Are UPDATE and DELETE supposed to be supported?
>
> To be honest, I had not even considered those.

I can say that I had considered those. Handling of UPDATE and DELETE
with partitions is significantly different, so it's not just an
oversight; it's a different branch of the project.

We need SELECT to work first and then we have a chance of making
UPDATE/DELETE work.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-02-23 Thread David Rowley
On 23 February 2018 at 04:11, Jesper Pedersen
 wrote:
> Are UPDATE and DELETE supposed to be supported?

To be honest, I had not even considered those. Without looking in
detail, I imagine it may be possible to allow this simply by setting
AppendPath->trypartitionprune in the correct cases in
inheritance_planner(). I would need to look into this in some detail
to find out for sure.

Another case which is likely simple to implement is the exact same
processing for MergeAppend nodes. I currently see no reason why the
same pruning cannot be done for the subnodes of that node type too;
I've just not done so yet. I'd rather get more sanity-check reviews on
the current scope of the patch before I widen it out to other areas,
but at the same time I don't want to leave to PG12 very simple things
which can easily be done in PG11. So I'll try to look at this and get
back to you, or perhaps release a new set of patches to support the
additional features.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: ERROR: left and right pathkeys do not match in mergejoin

2018-02-23 Thread Alexander Kuzmenkov
Ah, I understand now. We can lie to the executor about the order,
because when we are moving based on the second outer column, we have a
stretch of equal values in the inner column, so we can consider them to
be sorted in whatever way we need.


The patch applies cleanly, make check-world passes.

In create_mergejoin_plan:
    * implied inner ordering is then "ORDER BY x, y, x", but the pathkey
    * drops the second sort by x as redundant, and this code must cope.
-- should this read "the pathkey machinery drops"?

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] SERIALIZABLE with parallel query

2018-02-23 Thread Thomas Munro
On Fri, Feb 23, 2018 at 6:05 AM, Robert Haas  wrote:
> On Thu, Feb 22, 2018 at 7:54 AM, Thomas Munro
>  wrote:
>> The best solution I have come up with so far is to add a reference
>> count to SERIALIZABLEXACT.  I toyed with putting the refcount into the
>> DSM instead, but then I ran into problems making that work when you
>> have a query with multiple Gather nodes.  Since the refcount is in
>> SERIALIZABLEXACT I also had to add a generation counter so that I
>> could detect the case where you try to attach too late (the leader has
>> already errored out, the refcount has reached 0 and the
>> SERIALIZABLEXACT object has been recycled).
>
> I don't know whether that's safe or not.  It certainly sounds like
> it's solving one category of problem, but is that the only issue?  If
> some backends haven't noticed that we're safe, they might keep
> acquiring SIREAD locks or doing other manipulations of shared state,
> which maybe could cause confusion.  I haven't looked into this deeply
> enough to understand whether there's actually a possibility of trouble
> there, but I can't rule it out off-hand.

After some testing, I think the refcount approach could be made to
work, but it seems quite complicated, and some weird edge cases showed
up that made it look like more trouble than it was worth.  One downside
of refcounts is that you never get to free the SERIALIZABLEXACT until
the end of the transaction with parallel_leader_participation = off.

I'm testing another version that is a lot simpler: like v10, it relies
on the knowledge that the leader's transaction will always end after
the workers have finished, but it handles the RO_SAFE optimisation by
keeping the SERIALIZABLEXACT alive but freeing its locks etc.  More
soon.
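
For the record, the refcount-plus-generation scheme I experimented with
had roughly the shape below.  The names and the plain-mutex locking are
invented for the sketch; the real structures live in the predicate
locking code, so treat this only as an illustration of the protocol.

#include <stdbool.h>
#include <stdint.h>
#include <pthread.h>

/*
 * Workers may attach only while the generation matches and the refcount
 * is still nonzero; the last detacher recycles the slot and bumps the
 * generation so that a too-late attach attempt can be detected.
 */
typedef struct SketchSxact
{
	pthread_mutex_t lock;
	int			refcount;		/* leader + attached workers */
	uint64_t	generation;		/* bumped whenever the slot is recycled */
} SketchSxact;

bool
sketch_attach(SketchSxact *sxact, uint64_t expected_generation)
{
	bool		ok = false;

	pthread_mutex_lock(&sxact->lock);
	if (sxact->generation == expected_generation && sxact->refcount > 0)
	{
		sxact->refcount++;
		ok = true;				/* attached; caller may now take SIREAD locks */
	}
	pthread_mutex_unlock(&sxact->lock);
	return ok;
}

void
sketch_detach(SketchSxact *sxact)
{
	pthread_mutex_lock(&sxact->lock);
	if (--sxact->refcount == 0)
	{
		sxact->generation++;	/* any attach arriving now will fail */
		/* here: release SIREAD locks, return the slot to the free list */
	}
	pthread_mutex_unlock(&sxact->lock);
}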



Isolation tester result formatting

2018-02-23 Thread Thomas Munro
Hi hackers,

Due to recent naming inflation, the isolation tester output has become
jagged (when viewed in a monospace font):

test lock-update-delete   ... ok
test lock-update-traversal... ok
test insert-conflict-do-nothing ... ok
test insert-conflict-do-nothing-2 ... ok
test insert-conflict-do-update ... ok
test insert-conflict-do-update-2 ... ok
test insert-conflict-do-update-3 ... ok
test insert-conflict-toast... ok
test delete-abort-savept  ... ok

If we widened the field by 4 spaces it'd fit all current test names:

test lock-update-delete   ... ok
test lock-update-traversal... ok
test insert-conflict-do-nothing   ... ok
test insert-conflict-do-nothing-2 ... ok
test insert-conflict-do-update... ok
test insert-conflict-do-update-2  ... ok
test insert-conflict-do-update-3  ... ok
test insert-conflict-toast... ok
test delete-abort-savept  ... ok

Patch attached.  I realise this isn't the most pressing problem faced
by computer science...
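
The change itself is just a wider left-justified field in the status
line; illustratively (made-up widths, not the exact hunk):

#include <stdio.h>

int
main(void)
{
	const char *testname = "insert-conflict-do-nothing-2";

	/* old, narrower field: long names overflow and break the alignment */
	printf("test %-24s ... ok\n", testname);
	/* widened field: all of the current test names line up again */
	printf("test %-28s ... ok\n", testname);
	return 0;
}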

-- 
Thomas Munro
http://www.enterprisedb.com


test-name-inflation-adjustment.patch
Description: Binary data


Re: RTLD_GLOBAL (& JIT inlining)

2018-02-23 Thread Andres Freund
Hi,

On 2018-02-22 23:11:07 -0800, Andres Freund wrote:
> I think using RTLD_LOCAL on most machines would be a much better
> idea. I've not found proper explanations why GLOBAL is used. We started
> using it ages ago, with [2], but that commit contains no explanation,
> and a quick search didn't show up anything either. Peter?

Hm. No idea if that was the reason back then, but I think it's
unfortunately not easy to change.  The reason why some plperlu tests
fail is that plperl extension libraries rely on plperl being loaded
into the global namespace.

When plperl gets loaded with RTLD_LOCAL, all the dependent shared
libraries (i.e. libperl.so) also get loaded with it. That works
perfectly for plperl itself, but once perl loads an extension library,
that library fails to resolve libperl.so symbols...  I can "fix" this
by dlopen()ing libperl.so with RTLD_GLOBAL, but that's obviously not a
practical solution.
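
Here's a minimal standalone illustration of the difference (library
names are just examples, link with -ldl; this is a demonstration of the
failure and the workaround, not proposed code):

#include <dlfcn.h>
#include <stdio.h>

int
main(void)
{
	/*
	 * Loading plperl.so with RTLD_LOCAL keeps its dependency libperl.so
	 * out of the global namespace, so a perl XS module (its own shared
	 * library) loaded later cannot resolve libperl.so symbols -- the
	 * "undefined symbol" failure shown further down.
	 */
	void	   *plperl = dlopen("plperl.so", RTLD_NOW | RTLD_LOCAL);

	if (plperl == NULL)
		fprintf(stderr, "dlopen plperl.so: %s\n", dlerror());

	/*
	 * The experiment mentioned above: explicitly re-open libperl.so with
	 * RTLD_GLOBAL so that its symbols become visible to XS modules.
	 * Workable as a demonstration, not as a real fix.
	 */
	void	   *libperl = dlopen("libperl.so", RTLD_NOW | RTLD_GLOBAL);

	if (libperl == NULL)
		fprintf(stderr, "dlopen libperl.so: %s\n", dlerror());

	return 0;
}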


FWIW, something in plperl's error handling appears to be busted. Instead
of a helpful error report we get:
ERROR:  22021: invalid byte sequence for encoding "UTF8": 0x00
which doesn't help much to nail down the issue.

With a breakpoint it becomes clear what the issue is:

#0  report_invalid_encoding (encoding=6, mbstr=0x557ae880416b "", len=252)
at /home/andres/src/postgresql/src/backend/utils/mb/wchar.c:1997
#1  0x557ae73d4473 in pg_verify_mbstr_len (encoding=6, mbstr=0x557ae880416b 
"", len=252, noError=0 '\000')
at /home/andres/src/postgresql/src/backend/utils/mb/wchar.c:1936
#2  0x557ae73d4354 in pg_verify_mbstr (encoding=6, 
mbstr=0x557ae8804080 "Can't load 
'/usr/lib/x86_64-linux-gnu/perl5/5.26/auto/List/Util/Util.so' for module 
List::Util: /usr/lib/x86_64-linux-gnu/perl5/5.26/auto/List/Util/Util.so: 
undefined symbol: PL_memory_wrap at /usr/share/perl/5.26/XSLoader.pm line 
96.\n", len=487, noError=0 '\000') at 
/home/andres/src/postgresql/src/backend/utils/mb/wchar.c:1879
#3  0x557ae73d119a in pg_any_to_server (
s=0x557ae8804080 "Can't load 
'/usr/lib/x86_64-linux-gnu/perl5/5.26/auto/List/Util/Util.so' for module 
List::Util: /usr/lib/x86_64-linux-gnu/perl5/5.26/auto/List/Util/Util.so: 
undefined symbol: PL_memory_wrap at /usr/share/perl/5.26/XSLoader.pm line 
96.\n", len=487, encoding=6) at 
/home/andres/src/postgresql/src/backend/utils/mb/mbutils.c:572
#4  0x7f4cabb6b123 in utf_u2e (
utf8_str=0x557ae8804080 "Can't load 
'/usr/lib/x86_64-linux-gnu/perl5/5.26/auto/List/Util/Util.so' for module 
List::Util: /usr/lib/x86_64-linux-gnu/perl5/5.26/auto/List/Util/Util.so: 
undefined symbol: PL_memory_wrap at /usr/share/perl/5.26/XSLoader.pm line 
96.\n", len=487) at 
/home/andres/src/postgresql/src/pl/plperl/plperl_helpers.h:16
#5  0x7f4cabb6b2f6 in sv2cstr (sv=0x557ae8769568) at 
/home/andres/src/postgresql/src/pl/plperl/plperl_helpers.h:96
#6  0x7f4cabb732dc in plperl_call_perl_func (desc=0x557ae86cf6b0, 
fcinfo=0x557ae875fc20)
at /home/andres/src/postgresql/src/pl/plperl/plperl.c:2250
#7  0x7f4cabb74a3b in plperl_func_handler (fcinfo=0x557ae875fc20)
at /home/andres/src/postgresql/src/pl/plperl/plperl.c:2438

I don't know the perl APIs at all, but it's clearly wrong that frames 3
and 4 show len=487, which comes from SvPVutf8() in sv2cstr().  It
appears that somehow perl emits two error messages separated by a null
byte...

(gdb) p *(char[487]*)utf8_str
$42 = "Can't load '/usr/lib/x86_64-linux-gnu/perl5/5.26/auto/List/Util/Util.so' 
for module List::Util: 
/usr/lib/x86_64-linux-gnu/perl5/5.26/auto/List/Util/Util.so: undefined symbol: 
PL_memory_wrap at /usr/share/perl/5.26/XSLoader.pm line 96.\n\000 at 
/usr/lib/x86_64-linux-gnu/perl5/5.26/List/Util.pm line 23.\nCompilation failed 
in require at /usr/lib/x86_64-linux-gnu/perl5/5.26/Scalar/Util.pm line 
23.\nCompilation failed in require at 
/usr/lib/x86_64-linux-gnu/perl/5.26/Data/Dumper.pm line 303.\n"
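
A tiny standalone illustration of why an embedded NUL plus a byte count
that runs past it surfaces as that unhelpful 0x00 complaint (strings
made up, obviously):

#include <stdio.h>
#include <string.h>

int
main(void)
{
	/* two concatenated messages with an embedded NUL, as in the dump */
	const char	buf[] = "first error message.\n\0second error message.\n";
	size_t		claimed = sizeof(buf) - 1;	/* counts past the NUL: 44 */
	size_t		visible = strlen(buf);		/* stops at the NUL: 21 */

	/*
	 * Handing all "claimed" bytes to the server-encoding verifier makes
	 * it walk past the first message, hit the 0x00 byte and report
	 * "invalid byte sequence for encoding UTF8: 0x00", masking the real
	 * error text entirely.
	 */
	printf("claimed length %zu, visible length %zu\n", claimed, visible);
	return 0;
}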


Greetings,

Andres Freund