Re: Performance regression with PostgreSQL 11 and partitioning

2018-06-08 Thread Ashutosh Bapat
On Fri, Jun 8, 2018 at 1:52 AM, Tom Lane wrote:
>
> I'm still of the opinion that find_appinfos_by_relids() needs to be
> nuked from orbit.  It has nothing to recommend it either from the
> standpoint of performance or that of intellectual coherency (or maybe
> that problem is just inadequate documentation).  The places consuming
> its results are no better.

Here's a patch with some comments added to find_appinfos_by_relids(),
adjust_appendrel_attrs() and adjust_appendrel_attrs_context(). There was no
explanation of the AppendRelInfo argument to adjust_appendrel_attrs() or
its context in pre-v11 code. Maybe that was implicit in the first
paragraph, but then that implicitness holds true for the AppendRelInfo
array as well. So, it was not changed when we changed the signature of
adjust_appendrel_attrs().

>
> I was also pretty unhappy to discover, as I poked around in the code, that
> recent partitioning patches have introduced various assumptions about the
> ordering of the append_rel_list.  It's bad enough that those exist at all;
> it's worse that they're documented, if at all, only beside the code that
> will fail (usually silently) if they're violated.

Not silently, exactly; a build with asserts would trip the assertion in
inheritance_planner() at line 1295, so any change to that assumption
would be caught by our regression tests first. I agree that is not so
useful in production, but it wouldn't go unnoticed, thanks to our
regression tests.

> I do not find this
> acceptable.  If we cannot avoid these assumptions, they need to be
> documented more centrally, like in the relation.h block comment for struct
> AppendRelInfo.
>

The attached patch adds the assumption to the block comment you mention above. Please check.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


pg_arl.patch
Description: Binary data


Re: Performance regression with PostgreSQL 11 and partitioning

2018-06-08 Thread Robert Haas
On Fri, Jun 8, 2018 at 12:56 AM, Tom Lane  wrote:
> The pre-v11 incarnation of those functions took a single AppendRelInfo,
> specifying an exact translation from one parent relid to one child
> relid.  The fundamental problem I've got with the current code, entirely
> independently of any performance issues, is that it's completely unclear
> -- or at least undocumented -- which translation(s) are supposed to occur.

I don't understand this complaint.  Before, the code took one
AppendRelInfo, and according to you, it was clear what was supposed to
happen.  Now it takes an array of AppendRelInfos and, according to
you, it's completely unclear.  Yet that seems, to me at least, to be a
straightforward generalization.  If 1 AppendRelInfo is an adequate
specification of one translation, why are N AppendRelInfos not an
adequate specification of N translations?
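
For reference, the signature change in question is roughly this (paraphrased
from the v10 and v11 sources):

    /* pre-v11: one translation per call */
    Node *adjust_appendrel_attrs(PlannerInfo *root, Node *node,
                                 AppendRelInfo *appinfo);

    /* v11: an array of translations, applied in one pass */
    Node *adjust_appendrel_attrs(PlannerInfo *root, Node *node,
                                 int nappinfos, AppendRelInfo **appinfos);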

(On another note, I don't have a strong opinion about what should be done about the
fact that all AppendRelInfos are stored in a flat list or whether it
should be done in v11 or v12, but I completely agree with the idea
that something should be done about it at some point.  I'm sure it's
possible to design a data structure that is less efficient to search
than a singly-linked list, but you'd probably have to be explicitly
aiming for that goal.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Needless additional partition check in INSERT?

2018-06-08 Thread Alvaro Herrera
On 2018-Jun-07, David Rowley wrote:

> On 7 June 2018 at 15:57, Alvaro Herrera  wrote:
> > Hm, I was thinking this new function would be a companion to ExecConstraints
> > (a fact I used in the name I proposed), so it'd be in the same file
> > (probably right after it).
> 
> Okay. v5 (attached) does it that way.

Thanks, looks nice (function name is stupidly long, my fault).

I wondered about the refactoring that Amit Khandekar is proposing.
Given the improved API you gave the new function, it appears we can
write it like this (untested):

bool
ExecConstraintsPartConstrNeedsRecheck(ResultRelInfo *resultRelInfo,
                                      bool tupleRouting)
{
    if (tupleRouting)
    {
        if (resultRelInfo->ri_TrigDesc &&
            resultRelInfo->ri_TrigDesc->trig_insert_before_row)
            return true;
    }
    else if (resultRelInfo->ri_PartitionCheck != NIL)
        return true;

    return false;               /* no need to recheck */
}

I was also wondering about introducing a new function call in this path
where previously there was none.  Given the amount of other stuff that's
happening when a tuple is inserted, I suppose it's not worth worrying
about in terms of making this an inline function in the header.
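
For comparison, a minimal sketch of the static-inline variant being weighed
here (same logic as the function above, condensed; the header placement and
exact form are illustrative, untested):

    /* hypothetical header (e.g. executor.h) version */
    static inline bool
    ExecConstraintsPartConstrNeedsRecheck(ResultRelInfo *resultRelInfo,
                                          bool tupleRouting)
    {
        if (tupleRouting)
            return resultRelInfo->ri_TrigDesc &&
                resultRelInfo->ri_TrigDesc->trig_insert_before_row;

        return resultRelInfo->ri_PartitionCheck != NIL;
    }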

BTW I noticed that ExecConstraints() indicates that the given
resultRelInfo is "the original resultRelInfo, before tuple routing", but
that's demonstrably false.  What's up with that?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Bug in either collation docs or code

2018-06-08 Thread Melanie Plageman
>
> Data, apparently... I got the same non-error result before inserting a
> record into test1; then I got the expected error.
>
> It's the function/operator that fails when faced with invalid input, not the
> planner, so the error requires data to provoke.
>
> David J.
>
>
>
I tried inserting data and did not get an error:

CREATE TABLE test1 (
a text COLLATE "de_DE",
b text COLLATE "es_ES"
);

INSERT INTO test1 VALUES('b','b'), ('c','c'), ('g','g'), ('h','h');
SELECT a < (select 'foo' COLLATE "fr_FR") FROM test1;

-- 
Melanie Plageman


Re: Bug in either collation docs or code

2018-06-08 Thread David G. Johnston
On Fri, Jun 8, 2018 at 9:24 AM, Melanie Plageman 
wrote:

> It seems like this would not allow the function/operator to decide if it
> cares about a determinate collation during execution, since it would
> already have errored out during planning.
>

In the case where the function/operator doesn't care, one shouldn't be
attaching explicit collation clauses to its inputs anyway - it is a
semantic bug if nothing else, and an error during planning pointing that
out seems useful.

David J.


Re: Bug in either collation docs or code

2018-06-08 Thread David G. Johnston
On Fri, Jun 8, 2018 at 9:12 AM, Melanie Plageman 
wrote:

> I tried inserting data and did not get an error:
>
> CREATE TABLE test1 (
> a text COLLATE "de_DE",
> b text COLLATE "es_ES"
> );
>
> INSERT INTO test1 VALUES('b','b'), ('c','c'), ('g','g'), ('h','h');
> SELECT a < (select 'foo' COLLATE "fr_FR") FROM test1;
>

Suggest providing a self-contained script (set echo to all and capture the
output to a file), changing the table name to ensure no test pollution, and
including the version of the server in one of the queries.

I did my test on 9.6.5 (Ubuntu 16.04) with:

CREATE TABLE test_col (
a text COLLATE "en_CA.utf8",
b text COLLATE "en_US.utf8"
);
INSERT INTO test_col VALUES ('A', 'A');
SELECT a < (SELECT 'foo'::text COLLATE "en_GB.utf8") FROM test_col;

SQL Error: ERROR:  could not determine which collation to use for string
comparison
HINT:  Use the COLLATE clause to set the collation explicitly.

Note, I had to include the cast to text in order for the query to execute...

David J.


Re: Bug in either collation docs or code

2018-06-08 Thread Melanie Plageman
> IIRC this was an intentional decision, made on the grounds that we
> can't tell whether the function/operator actually cares about having
> a determinate collation or not, so we have to leave it to execution of
> that function/operator to complain or not.
>
>
In this case, why treat implicit and explicit collation conflicts
differently? A conflicting explicit collation will produce an error during
planning, whereas a conflicting implicit collation will not produce an
error until execution.

create table foo(a text COLLATE "es_ES");
create table bar(b text COLLATE "de_DE");
insert into foo values('a'), ('b'), ('c'), ('d');
insert into bar values('b'), ('c'), ('g'), ('h');

SELECT * FROM foo WHERE a > (SELECT b FROM bar LIMIT 1); -- error during
execution
EXPLAIN SELECT * FROM foo WHERE a > (SELECT b FROM bar LIMIT 1); -- but no
error during planning
EXPLAIN SELECT 'c' COLLATE "de_DE" > 'ç' COLLATE "es_ES"; -- error during
planning

It seems like this would not allow the function/operator to decide if it
cares about a determinate collation during execution, since it would
already have errored out during planning.

-- 
Melanie Plageman


Re: Transform for pl/perl

2018-06-08 Thread Tom Lane
I wrote:
> ... So if we think that \undef ought to
> produce a SQL null, the thing to do is move the dereferencing loop to
> the beginning of plperl_sv_to_datum, and then \undef would produce NULL
> whether this transform is installed or not.  I don't have a well-informed
> opinion on whether that's a good idea, but I tend to the view that it is.
> Right now the case produces an error, and not even a very sane one:

> regression=# create function foo() returns int language plperlu 
> regression-# as '\undef';
> CREATE FUNCTION
> regression=# select foo();
> ERROR:  PL/Perl function must return reference to hash or array
> CONTEXT:  PL/Perl function "foo"

> so there's not really a compatibility break if we redefine it.

After further thought, the only argument I can think of for preserving
this existing behavior is if we wanted to reserve returning a reference-
to-scalar for some future purpose, ie make it do something different
from returning the referenced value.  I can't think of any likely use
of that kind, but maybe I'm just insufficiently creative today.

However, if one makes that argument, then it is clearly a bad idea for
jsonb_plperl to be forcibly dereferencing such references: once we do make
a change of that sort, jsonb_plperl will be out of step with the behavior
for every other datatype, or else we will need to make a subtle
compatibility break to align it with whatever the new behavior is.

So it seems that whichever way you stand on that, it's wrong to have
that dereference loop in SV_to_JsonbValue().  I'm forced to the
conclusion that that's just a hack to band-aid over a bug in the
transform's other direction.

Now, if we did decide that auto-dereferencing should be the general
rule in perl->SQL conversions, I'd be inclined to leave that loop
in place in SV_to_JsonbValue(), because it would be covering the case
where jsonb_plperl is recursively disassembling an AV or HV and finds
a reference-to-scalar.  But we also need a dereference loop in at least
one place in plperl.c itself if that's the plan.
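
If so, a sketch of the kind of dereference loop in question, using the Perl
API as plperl.c does (the placement is the open point; this is illustrative,
not committed code):

    /* follow chains of references down to a non-reference value,
     * stopping at references to arrays/hashes, which are handled
     * by the container-conversion code */
    while (SvROK(sv))
    {
        SV *referenced = SvRV(sv);

        if (SvTYPE(referenced) == SVt_PVAV ||
            SvTYPE(referenced) == SVt_PVHV)
            break;
        sv = referenced;
    }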

I'm inclined to think that auto-dereference is indeed a good idea,
and am tempted to go make that change to make all this consistent.
Comments?

regards, tom lane



Re: Concurrency bug in UPDATE of partition-key

2018-06-08 Thread Dilip Kumar
On Tue, Jun 5, 2018 at 8:03 PM, Amit Khandekar 
wrote:

> Attached is a rebased patch version. Also included it in the upcoming
> commitfest :
> https://commitfest.postgresql.org/18/1660/
>
> In the rebased version, the new test cases are added in the existing
> isolation/specs/partition-key-update-1.spec test.


/*
+  * If this is part of an UPDATE of partition-key, the
+  * epq tuple will contain the changes from this
+  * transaction over and above the updates done by the
+  * other transaction. The caller should now use this
+  * tuple as its NEW tuple, rather than the earlier NEW
+  * tuple.
+  */
+ if (epqslot)
+ {
+ *epqslot = my_epqslot;
+ return NULL;
+ }

I think we need a similar fix if there is a BR delete trigger and
ExecDelete is blocked on heap_lock_tuple because a concurrent transaction
is updating the same row, because in such a case it would have already got
the final tuple, so heap_delete will return HeapTupleMayBeUpdated.

Below test can reproduce the issue.

CREATE TABLE pa_target (key integer, val text) PARTITION BY LIST (key);
CREATE TABLE part1 PARTITION OF pa_target FOR VALUES IN (1);
CREATE TABLE part2 PARTITION OF pa_target FOR VALUES IN (2);

CREATE TABLE deleted_row (count int);
CREATE OR REPLACE FUNCTION br_delete() RETURNS trigger AS
$$BEGIN
insert into deleted_row values(OLD.key);
RETURN OLD;
END;$$ LANGUAGE plpgsql;

CREATE TRIGGER test_br_trig BEFORE DELETE ON part1 FOR EACH ROW EXECUTE
PROCEDURE br_delete();

INSERT INTO pa_target VALUES (1, 'initial1');

session1:
postgres=# BEGIN;
BEGIN
postgres=# UPDATE pa_target SET val = val || ' updated by update1' WHERE
key = 1;
UPDATE 1

session2:
postgres=# UPDATE pa_target SET val = val || ' updated by update2', key =
key + 1 WHERE key =1;


session1:
postgres=# commit;
COMMIT

UPDATE 1
postgres=# select * from pa_target ;
 key |             val
-----+------------------------------
   2 | initial1 updated by update2   --> session1's update is overwritten.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: why partition pruning doesn't work?

2018-06-08 Thread David Rowley
On 8 June 2018 at 18:14, David Rowley  wrote:
> On 8 June 2018 at 15:22, Tom Lane  wrote:
>> David Rowley  writes:
>>> On 8 June 2018 at 03:43, Tom Lane  wrote:
>>>> Maybe there's something I'm missing here, but I sort of hoped that this
>>>> patch would nuke all the special-case code for Params in this area.
>>>> Why is there any need to distinguish them from other stable expressions?
>>
>>> We need to know which Params exist in the Expr as if there are no
>>> Params, or only external Params, then we can run-time prune during
>>> startup of the executor.
>>
>> This does not refute my question.  Why doesn't the same logic apply
>> to any stable expression?  That is, ISTM a Param is a special case
>> of that.
>
> Okay, maybe we don't need to know which external params exist, but we
> need to know if there are any exec params so that we don't try to
> evaluate an expression with any of those during executor startup.
>
> I'll produce a patch which simplifies things in that area.

Okay, I've gotten rid of the tracking of external params. We now just
track exec params. We still need to know about these so we know if a
re-prune is required during ExecReScanAppend(). Obviously, we don't
want to prune on any random Param change, so I'm fairly sure it's a
good idea to keep track of these.

I've changed the code inside partkey_datum_from_expr so that it's a
simple bool array lookup to decide if we can evaluate the expression
or not. This bool array is populated during planning, which I think is
rather nice so we don't have to go and do it each time the plan is
executed.

I also discovered that I was needlessly running the pruning code again
during executor run in some cases where there was no possibility of
doing any further pruning there.  I've had to add some new code to set
the present_parts inside ExecFindInitialMatchingSubPlans(). It now
properly removes the member for any sub-partition which has had all of
its partitions pruned. This allows us to calculate the subnodes from
'present_parts' alone, rather than calling the pruning code again.

Technically PartitionPruningData does not really need the
do_exec_prune field. A non-empty execparams could indicate this, but I
felt it was better to have the bool so that we have one for each
method of run-time pruning. This also saves a bms_is_empty() call
inside find_subplans_for_params_recurse(). This could be a bit of a
hotspot during parameterized nested loops which cause partition
pruning.

I'm really hoping this is what you meant about the special-case code for Params.

Does this look any better?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


run-time_pruning_for_exprs_v4.patch
Description: Binary data


Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

2018-06-08 Thread Tom Lane
Phil Florent  writes:
> explain analyze select * from v where v.k1 > date '2017-01-01';
> ERROR:  XX000: did not find all requested child rels in append_rel_list
> LOCATION:  find_appinfos_by_relids, prepunion.c:2643

Reproduced here, thanks for the report!  This is very timely since
we were just in process of rewriting that code anyway ...

regards, tom lane



Re: Needless additional partition check in INSERT?

2018-06-08 Thread Alvaro Herrera
On 2018-Jun-09, David Rowley wrote:

> On 9 June 2018 at 03:24, Alvaro Herrera  wrote:
> > I was also wondering about introducing a new function call in this path
> > where previously there was none.  Given the amount of other stuff that's
> > happening when a tuple is inserted, I suppose it's not worth worrying
> > about in terms of making this an inline function in the header.
> 
> I wondered about that too. I've not tested it again, but I do have
> another patch locally which can about double the speed of COPY FROM
> for partitioned tables, so I have to admit I did gawk at the
> additional function call idea, but I'd rather see this fixed than on
> the shelf, so I went with it.
> 
> I'll leave it up to you to how you'd like to format the if statement.
> I've written it the way I'm happy with.

Truth is, the more I look at this, the more I think it should be done in
the way Amit Langote was proposing: do away with the extra function, and
check all those conditions inside ExecConstraints itself.  We can add a
new boolean tupleRouting argument, which I think is the only missing bit.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Bug in either collation docs or code

2018-06-08 Thread Melanie Plageman
>
> I did my test on 9.6.5 ​(Ubuntu 16.04) with:
>
> CREATE TABLE test_col (
> a text COLLATE "en_CA.utf8",
> b text COLLATE "en_US.utf8"
> );
> INSERT INTO test_col VALUES ('A', 'A');
> SELECT a < (SELECT 'foo'::text COLLATE "en_GB.utf8") FROM test_col;
>
> SQL Error: ERROR:  could not determine which collation to use for string
> comparison
> HINT:  Use the COLLATE clause to set the collation explicitly.
>
> Note, I had to include the cast to text in order for the query to
> execute...
>
> David J.
>
>
On Postgres built from master on my Mac (Sierra), the following is the
output:

DROP TABLE IF EXISTS test_col_mac;
DROP TABLE
CREATE TABLE test_col_mac (
a text COLLATE "de_DE",
b text COLLATE "es_ES"
);
CREATE TABLE
INSERT INTO test_col_mac VALUES('A','A');
INSERT 0 1
SELECT a < (SELECT 'foo'::TEXT COLLATE "fr_FR") FROM test_col_mac;
 ?column?
----------
 t
(1 row)


-- 
Melanie Plageman


Re: Bug in either collation docs or code

2018-06-08 Thread Tom Lane
Melanie Plageman  writes:
> In this case, why treat implicit and explicit collation conflicts
> differently?

Um ... because the SQL standard says so?

regards, tom lane



Re: Needless additional partition check in INSERT?

2018-06-08 Thread David Rowley
On 9 June 2018 at 03:24, Alvaro Herrera  wrote:
> I was also wondering about introducing a new function call in this path
> where previously there was none.  Given the amount of other stuff that's
> happening when a tuple is inserted, I suppose it's not worth worrying
> about in terms of making this an inline function in the header.

I wondered about that too. I've not tested it again, but I do have
another patch locally which can about double the speed of COPY FROM
for partitioned tables, so I have to admit I did gawk at the
additional function call idea, but I'd rather see this fixed than on
the shelf, so I went with it.

I'll leave it up to you to how you'd like to format the if statement.
I've written it the way I'm happy with.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Performance regression with PostgreSQL 11 and partitioning

2018-06-08 Thread Ashutosh Bapat
On Fri, Jun 8, 2018 at 10:26 AM, Tom Lane  wrote:
> David Rowley  writes:
>> On 8 June 2018 at 08:22, Tom Lane  wrote:
>>> I'm still of the opinion that find_appinfos_by_relids() needs to be
>>> nuked from orbit.
>
>> Yeah, I agree it's not nice that it pallocs an array then pfrees it
>> again. adjust_appendrel_attrs and adjust_child_relids could probably
> just accept a Relids of child rels and find the AppendRelInfos itself,
>> similar to how I coded find_appinfos_by_relids.
>
> Why do they need to accept any additional parameters at all?
>
> The pre-v11 incarnation of those functions took a single AppendRelInfo,
> specifying an exact translation from one parent relid to one child
> relid.  The fundamental problem I've got with the current code, entirely
> independently of any performance issues, is that it's completely unclear
> -- or at least undocumented -- which translation(s) are supposed to occur.
> If we assume that the code isn't 100% broken (which I'm hardly convinced
> of, but we'll take it as read for the moment) then it must be that the
> translations are constrained to be well-determined by the query structure
> as represented by the totality of the AppendRelInfo data.  So I'm thinking
> maybe we don't need those parameters at all.  In the pre-v11 situation,
> we were saving lookups by passing the only applicable AppendRelInfo to
> the translation functions.  But if the translation functions have to
> look up the right translation anyway, let's just make them do that,
> and create whatever infrastructure we have to have to make it fast.
>

Pre-v11, we needed to translate an expression only for one
parent-child pair using adjust_appendrel_attrs(), since only base
relations could have "other" relations. With partition-wise join and
aggregates, we have to do that for multiple parent-child pairs. If we
were to use the same pre-v11 interface, we would have to do those
adjustments one pair at a time. But the way adjust_appendrel_attrs()
works, it creates a copy of the whole expression tree, replacing only
the nodes that adjust_appendrel_attrs() cares about. Doing adjustments
one pair at a time meant that every translated expression tree leaked
memory except the last one, which would be used. It was natural to pass
multiple AppendRelInfos, one per parent-child pair, to
adjust_appendrel_attrs() so that it carries out the translations in
one go. This was discussed at length over a year ago; I can see some
references starting at [1] and in the following mails. The original
mail thread started at [2]. You will also find discussion there about
why we chose to pass an array instead of a list. We didn't do the
lookup inside adjust_appendrel_attrs() or adjust_child_relids() because
in some functions those were called multiple times, and scanning the
list that many times for the same set of AppendRelInfos would have been
a waste of CPU cycles.

[1] 
https://postgrespro.com/list/id/cafjfprcmwwepj-do1otxq-gapgpsz1fmh7yqvqtwzqogczq...@mail.gmail.com
[2] 
https://postgrespro.com/list/id/cafjfprcmwwepj-do1otxq-gapgpsz1fmh7yqvqtwzqogczq...@mail.gmail.com

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

2018-06-08 Thread Phil Florent
Hi,


I obtained an XX000 error testing my DSS application with PostgreSQL 11 beta 1.

Here is a simplified version of my test; no data in the tables:


-- 11
select version();
                                   version
---------------------------------------------------------------------------
 PostgreSQL 11beta1 (Debian 11~beta1-2.pgdg+1) on x86_64-pc-linux-gnu,
 compiled by gcc (Debian 7.3.0-19) 7.3.0, 64-bit
(1 row)

-- connected superuser -- postgres
create user a password 'a';
create schema a authorization a;
create user b password 'b';
create schema b authorization b;
create user c password 'c';
create schema c authorization c;
create user z password 'z';
create schema z authorization z;

-- connected a
create table t1(k1 timestamp, c1 int);
create view v as select k1, c1 from t1;
grant usage on schema a to z;
grant select on all tables in schema a to z;

-- connected b
create table t2(k1 timestamp, c1 int) partition by range(k1);
create table t2_2016 partition of t2 for values from ('2016-01-01') to ('2017-01-01');
create table t2_2017 partition of t2 for values from ('2017-01-01') to ('2018-01-01');
create table t2_2018 partition of t2 for values from ('2018-01-01') to ('2019-01-01');
create view v as select k1, c1 from t2;
grant select on all tables in schema b to z;
grant usage on schema b to z;


-- connected c
create table t3(k1 timestamp, c1 int) partition by range(k1);
create table t3_2016 partition of t3 for values from ('2016-01-01') to ('2017-01-01');
create table t3_2017 partition of t3 for values from ('2017-01-01') to ('2018-01-01');
create table t3_2018 partition of t3 for values from ('2018-01-01') to ('2019-01-01');
create view v as select k1, c1 from t3;
grant select on all tables in schema c to z;
grant usage on schema c to z;

-- connected z
create view v as
select k1, c1 from
(select * from a.v
UNION ALL
select * from b.v
UNION ALL
select * from c.v) vabc ;

explain analyze select * from v where v.k1 > date '2017-01-01';
ERROR:  XX000: did not find all requested child rels in append_rel_list
LOCATION:  find_appinfos_by_relids, prepunion.c:2643

set enable_partition_pruning=off;
SET

explain analyze select * from v where v.k1 > date '2017-01-01';
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..272.30 rows=4760 width=12) (actual time=0.217..0.217 rows=0 loops=1)
   ->  Seq Scan on t1  (cost=0.00..35.50 rows=680 width=12) (actual time=0.020..0.020 rows=0 loops=1)
         Filter: (k1 > '2017-01-01'::date)
   ->  Seq Scan on t2_2016  (cost=0.00..35.50 rows=680 width=12) (actual time=0.035..0.035 rows=0 loops=1)
         Filter: (k1 > '2017-01-01'::date)
   ->  Seq Scan on t2_2017  (cost=0.00..35.50 rows=680 width=12) (actual time=0.016..0.016 rows=0 loops=1)
         Filter: (k1 > '2017-01-01'::date)
   ->  Seq Scan on t2_2018  (cost=0.00..35.50 rows=680 width=12) (actual time=0.015..0.015 rows=0 loops=1)
         Filter: (k1 > '2017-01-01'::date)
   ->  Seq Scan on t3_2016  (cost=0.00..35.50 rows=680 width=12) (actual time=0.040..0.040 rows=0 loops=1)
         Filter: (k1 > '2017-01-01'::date)
   ->  Seq Scan on t3_2017  (cost=0.00..35.50 rows=680 width=12) (actual time=0.016..0.016 rows=0 loops=1)
         Filter: (k1 > '2017-01-01'::date)
   ->  Seq Scan on t3_2018  (cost=0.00..35.50 rows=680 width=12) (actual time=0.016..0.016 rows=0 loops=1)
         Filter: (k1 > '2017-01-01'::date)
 Planning Time: 0.639 ms
 Execution Time: 0.400 ms

set enable_partition_pruning=on;
SET

explain analyze select * from v where v.k1 > date '2017-01-01';
ERROR:  XX000: did not find all requested child rels in append_rel_list
LOCATION:  find_appinfos_by_relids, prepunion.c:2643



-- 10
select version();
                                   version
---------------------------------------------------------------------------
 PostgreSQL 10.4 (Ubuntu 10.4-2.pgdg16.04+1) on x86_64-pc-linux-gnu,
 compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609, 64-bit
(1 row)


-- connected superuser -- postgres
create user a password 'a';
create schema a authorization a;

create user b password 'b';
create schema b authorization b;

create user c password 'c';
create schema c authorization c;

create user z password 'z';
create schema z authorization z;

-- connected a
create table t1(k1 timestamp, c1 int);
create view v as select k1, c1 from t1;
grant usage on schema a to z;
grant select on all tables in schema a to z;

-- connected b
create table t2(k1 timestamp, c1 int) partition by range(k1);
create table t2_2016 partition of t2 for values from ('2016-01-01') to ('2017-01-01');
create table t2_2017 partition of t2 for values from ('2017-01-01') to ('2018-01-01');
create table 

Re: High CPU load caused by the autovacuum launcher process

2018-06-08 Thread Jeff Janes
On Fri, Jun 8, 2018 at 3:24 AM, Owayss Kabtoul  wrote:

> Hi folks,
>
> I ran into an issue where, on Postgres instances that have a very large
> number of databases per cluster (~15K), the autovacuum process seems to
> have a very high impact on CPU usage. Specifically, it is the autovacuum
> launcher process, not the workers. The launcher process eats a whole CPU
> (attached is a screenshot of htop).
> ...
> So auto-vacuum never really sleeps. Even changing the autovacuum_naptime
> and setting it to a much higher value (from 1min to 50min) did not have any
> effect at all.
>

After changing autovacuum_naptime, did you give it enough time to stabilize
at the new setting?  Say, at least 3 * 50 = 150 minutes?

But overall, I would say that if you want to have 15,000 databases, you
should just resign yourself to having one CPU dedicated to this task.

Cheers,

Jeff


Re: why partition pruning doesn't work?

2018-06-08 Thread David Rowley
On 8 June 2018 at 15:22, Tom Lane  wrote:
> David Rowley  writes:
>> On 8 June 2018 at 03:43, Tom Lane  wrote:
>>> Maybe there's something I'm missing here, but I sort of hoped that this
>>> patch would nuke all the special-case code for Params in this area.
>>> Why is there any need to distinguish them from other stable expressions?
>
>> We need to know which Params exist in the Expr as if there are no
>> Params, or only external Params, then we can run-time prune during
>> startup of the executor.
>
> This does not refute my question.  Why doesn't the same logic apply
> to any stable expression?  That is, ISTM a Param is a special case
> of that.

Okay, maybe we don't need to know which external params exist, but we
need to know if there are any exec params so that we don't try to
evaluate an expression with any of those during executor startup.

I'll produce a patch which simplifies things in that area.

Thanks for looking at this.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: hot_standby_feedback vs excludeVacuum and snapshots

2018-06-08 Thread Amit Kapila
On Fri, Jun 8, 2018 at 2:55 AM, Andres Freund  wrote:
>
> On 2018-06-07 14:19:18 -0700, Andres Freund wrote:
> > Hi,
> >
> > On 2018-03-29 12:17:24 +0100, Greg Stark wrote:
> > > I'm poking around to see debug a vacuuming problem and wondering if
> > > I've found something more serious.
> > >
> > > As far as I can tell the snapshots on HOT standby are built using a
> > > list of running xids that the primary builds and puts in the WAL and
> > > that seems to include all xids from transactions running in all
> > > databases. The HOT standby would then build a snapshot and eventually
> > > send the xmin of that snapshot back to the primary in the hot standby
> > > feedback and that would block vacuuming tuples that might be visible
> > > to the standby.
> >
> > > Many ages ago Alvaro sweated blood to ensure vacuums could run for
> > > long periods of time without holding back the xmin horizon and
> > > blocking other vacuums from cleaning up tuples. That's the purpose of
> > > the excludeVacuum flag in GetCurrentVirtualXIDs(). That's possible
> > > because we know vacuums won't insert any tuples that queries might try
> > > to view and also vacuums won't try to perform any sql queries on other
> > > tables.
> >
> > > I can't find anywhere that the standby snapshot building mechanism
> > > gets this same information about which xids are actually vacuums that
> > > can be ignored when building a snapshot. So I'm concerned that the hot
> > > standby sending back its xmin would be effectively undermining this
> > > mechanism and forcing vacuum xids to be included in the xmin horizon
> > > and prevent vacuuming of tuples.
> >
> > > Am I missing something obvious? Is this a known problem?
> >
> > Maybe I'm missing something, but the running transaction data reported
> > to the standby does *NOT* include anything about lazy vacuums - they
> > don't have an xid. The reason there's PROC_IN_VACUUM etc isn't the xid,
> > it's *xmin*, no?
> >
> > We currently do acquire an xid when truncating the relation - but I
> > think it'd be somewhat fair to argue that that's somewhat of a bug. The
> > reason an xid is acquired is that we need to log AEL locks, and that
> > currently means they have to be assigned to a transaction.
> >
> > Given that the truncation happens at the end of VACUUM and it *NEEDS* to
> > be present on the standby - otherwise the locking stuff is useless - I
> > don't think the fix committed in this thread is correct.
> >
> > Wonder if the right thing here wouldn't be to instead transiently
> > acquire an AEL lock during replay when truncating a relation?
>

If we go that route, then don't we need to worry about when to
release the lock, which is right now done at commit/abort of the
transaction?  In master, for vacuum, we do release the lock
immediately after truncate, but I think that is not true generally.
So, if we release the lock at a time other than commit/abort, we might
end up releasing them at different times on master and standby.

> Isn't the fact that vacuum truncation requires an AEL, and that the
> change committed today excludes those transactions from running xacts
> records, flat out broken?
>
> Look at:
>
> void
> ProcArrayApplyRecoveryInfo(RunningTransactions running)
> ...
> /*
>  * Remove stale locks, if any.
>  *
>  * Locks are always assigned to the toplevel xid so we don't need to 
> care
>  * about subxcnt/subxids (and by extension not about ->suboverflowed).
>  */
> StandbyReleaseOldLocks(running->xcnt, running->xids);
>
> by excluding running transactions you have, as far as I can tell,
> effectively removed the vacuum truncation AEL from the standby. Now that
> only happens when a running xact record is logged, but as that happens
> in the background...
>

Yeah, that seems problematic.  I think we can avoid that if before
releasing the lock in StandbyReleaseOldLocks, we also have an
additional check to see whether the transaction is committed or
aborted as we do before adding it to KnownAssignedXids.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



High CPU load caused by the autovacuum launcher process

2018-06-08 Thread Owayss Kabtoul
Hi folks,

I ran into an issue where, on Postgres instances that have a very large
number of databases per cluster (~15K), the autovacuum process seems to
have a very high impact on CPU usage. Specifically, it is the autovacuum
launcher process, not the workers. The launcher process eats a whole CPU
(attached is a screenshot of htop).

I tried to look into what that process is actually doing; below is the
output of strace:

# strace -c -p 17252
strace: Process 17252 attached
^Cstrace: Process 17252 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 94.16    0.030485           2     18878           read
  4.42    0.001431           6       239           brk
  0.35    0.000113           1        91           close
  0.28    0.000091           4        24         4 epoll_wait
  0.19    0.000060           1        60           epoll_ctl
  0.15    0.000050           1        60           fstat
  0.12    0.000040           1        30           epoll_create1
  0.12    0.000039           1        60           open
  0.08    0.000026           1        21         4 rt_sigreturn
  0.05    0.000017           1        21           lseek
  0.05    0.000016           4         4           write
  0.01    0.000003           0        10           sendto
  0.01    0.000002           0        10           select
  0.01    0.000002           2         1         1 futex
  0.00    0.000000           0        10           kill
------ ----------- ----------- --------- --------- ----------------
100.00    0.032375                 19519         9 total


All of those reads look like the following:

15:20:12 read(8,
"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0M\232q\20"...,
4096) = 4096
15:20:12 read(8,
"\0\0\314\316\237\275\v\21\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
4096) = 4096
...


That file, happens to be the global.stat file:

# ls -la /proc/17252/fd/8
lr-x------ 1 postgres postgres 64 Jun  7 15:22 /proc/17252/fd/8 ->
/mnt/pg_stat_mem_tmp/global.stat



On all instances where we have these huge ~15K-database clusters, this file's size is
about 3MB:

# ls -lha /mnt/pg_stat_mem_tmp/global.stat
-rw------- 1 postgres postgres 3.0M Jun  7 15:23
/mnt/pg_stat_mem_tmp/global.stat


On instances where we have just one or a couple of databases per cluster, the
file is about one kilobyte in size. This, of course, is to be expected, as
I understand that the contents of this file are cluster-wide statistics
that are gathered by the stats collector process.


I tried activating DEBUG1 logs and reloaded the postgres server, from the
logs it was clear that auto-vacuuming was always going on:

Jun  7 15:16:19 dbdf04 postgres[6455]: [944-1] 2018-06-07 15:16:19 UTC
   DEBUG:  autovacuum: processing database
"3c8e81b6-d94a-45c5-9ec2-27ab3192cd3b_db"
Jun  7 15:16:19 dbdf04 postgres[6457]: [944-1] 2018-06-07 15:16:19 UTC
   DEBUG:  autovacuum: processing database
"8d7b130a-67ce-47aa-96a6-359d6c14fb24_db"
Jun  7 15:16:20 dbdf04 postgres[6462]: [944-1] 2018-06-07 15:16:20 UTC
   DEBUG:  autovacuum: processing database
"134c5c51-a441-46a0-a2ca-15f08f37649e_db"
Jun  7 15:16:20 dbdf04 postgres[6463]: [944-1] 2018-06-07 15:16:20 UTC
[unknown] [unknown] [unknown] LOG:  incomplete startup packet
Jun  7 15:16:21 dbdf04 postgres[6464]: [944-1] 2018-06-07 15:16:21 UTC
   DEBUG:  autovacuum: processing database
"973b7be4-fd06-4c98-a078-f7a5e355d218_db"
Jun  7 15:16:21 dbdf04 postgres[6466]: [944-1] 2018-06-07 15:16:21 UTC
   DEBUG:  autovacuum: processing database
"6b831edf-f3e4-4d3b-ae7e-68def59d6c91_db"
Jun  7 15:16:21 dbdf04 postgres[6468]: [944-1] 2018-06-07 15:16:21 UTC
   DEBUG:  autovacuum: processing database
"8cfbf388-d30b-4a7d-b9ea-953352c0e947_db"


So auto-vacuum never really sleeps. Even changing the autovacuum_naptime
and setting it to a much higher value (from 1min to 50min) did not have any
effect at all. Both strace and the postgres logs showed similar
behaviour: lots of reads of the global.stat file, and constant iteration
through all the databases, executing autovacuum non-stop.


Is there anything that I can do to minimize the CPU load impact that this
process is having?

Many thanks in advance,
Owayss.


Re: hot_standby_feedback vs excludeVacuum and snapshots

2018-06-08 Thread Simon Riggs
On 7 June 2018 at 22:25, Andres Freund  wrote:
> On 2018-06-07 14:19:18 -0700, Andres Freund wrote:

> Look at:
>
> void
> ProcArrayApplyRecoveryInfo(RunningTransactions running)
> ...
> /*
>  * Remove stale locks, if any.
>  *
>  * Locks are always assigned to the toplevel xid so we don't need to 
> care
>  * about subxcnt/subxids (and by extension not about ->suboverflowed).
>  */
> StandbyReleaseOldLocks(running->xcnt, running->xids);
>
> by excluding running transactions you have, as far as I can tell,
> effectively removed the vacuum truncation AEL from the standby.

I agree, that is correct. There is a bug in my recent commit that
causes a small race window: someone could read the size of a relation
just before it is truncated and then fall off the end of the scan,
resulting in a block access ERROR, potentially much later than the
truncate.

I have also found another bug which affects what we do next.

For context, AEL locks are normally removed by COMMIT or ABORT.
StandbyReleaseOldLocks() is just a sweeper to catch anything that
didn't send an abort before it died, so it hardly ever activates. The
coding of StandbyReleaseOldLocks() is backwards... if it ain't in the
running list, then we remove it.

But that wasn't working correctly either, since as of 49bff5300d527 we
assigned AELs to subxids. Subxids weren't in the running list and so
AELs held by them would have been removed at the wrong time, an extant
bug in PG10. It looks to me like they would have been removed either
early or late, up to the next runningxact info record. They would be
removed, so no leakage, but the late timing wouldn't be noticeable for
tests or most usage, since it would look just like lock contention.
Early release might give the same issue of block access to a
non-existent block/file.

So the attached patch fixes both the bug in the recent commit and the
one I just found by observation of 49bff5300d527, since they are
related.

StandbyReleaseOldLocks() can sweep in the same way as
ExpireOldKnownAssignedTransactionIds().

> I also don't understand why this change would be backpatched in the
> first place. It's a relatively minor efficiency thing, no?

As for everything, that is open to discussion. Yes, it seems minor to
me until it affects you; then it's not. It seems to have affected
Greg.

The attached patch, or a later revision, needs to be backpatched to
PG10 independently of the recent committed patch.

I have yet to test this manually, but will do so tomorrow morning.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


remove_standby_subxid_locks.v1.patch
Description: Binary data


Re: hot_standby_feedback vs excludeVacuum and snapshots

2018-06-08 Thread Amit Kapila
On Fri, Jun 8, 2018 at 1:53 PM, Simon Riggs  wrote:
>
> So the attached patch fixes both the bug in the recent commit and the
> one I just found by observation of 49bff5300d527, since they are
> related.
>
> StandbyReleaseOldLocks() can sweep in the same way as
> ExpireOldKnownAssignedTransactionIds().
>

-StandbyReleaseOldLocks(int nxids, TransactionId *xids)
+StandbyReleaseOldLocks(TransactionId oldxid)
 {
  ListCell   *cell,
 *prev,
@@ -741,26 +741,8 @@ StandbyReleaseOldLocks(int nxids, TransactionId *xids)

  if (StandbyTransactionIdIsPrepared(lock->xid))
  remove = false;
- else
- {
- int i;
- bool found = false;
-
- for (i = 0; i < nxids; i++)
- {
- if (lock->xid == xids[i])
- {
- found = true;
- break;
- }
- }
-
- /*
- * If its not a running transaction, remove it.
- */
- if (!found)
- remove = true;
- }
+ else if (TransactionIdPrecedes(lock->xid, oldxid))
+ remove = true;


How will this avoid the bug introduced by your recent commit
(32ac7a11)?  After that commit, we no longer consider vacuum's xid in
RunningTransactions->oldestRunningXid calculation, so it is possible
that we still remove/release its lock on standby earlier than
required.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Concurrency bug in UPDATE of partition-key

2018-06-08 Thread Dilip Kumar
On Fri, Jun 8, 2018 at 2:23 PM, Dilip Kumar  wrote:

> On Tue, Jun 5, 2018 at 8:03 PM, Amit Khandekar 
> wrote:
>
>> Attached is a rebased patch version. Also included it in the upcoming
>> commitfest :
>> https://commitfest.postgresql.org/18/1660/
>>
>> In the rebased version, the new test cases are added in the existing
>> isolation/specs/partition-key-update-1.spec test.
>
>
> /*
> +  * If this is part of an UPDATE of partition-key, the
> +  * epq tuple will contain the changes from this
> +  * transaction over and above the updates done by the
> +  * other transaction. The caller should now use this
> +  * tuple as its NEW tuple, rather than the earlier NEW
> +  * tuple.
> +  */
> + if (epqslot)
> + {
> + *epqslot = my_epqslot;
> + return NULL;
> + }
>
> I think we need a similar fix if there is a BR delete trigger and
> ExecDelete is blocked on heap_lock_tuple because a concurrent transaction
> is updating the same row, because in such a case it would have already got
> the final tuple, so heap_delete will return HeapTupleMayBeUpdated.
>

Amit Kapila had pointed out (offlist) that you are already trying to fix
this issue.  I have one more question: suppose the first time we come to
ExecDelete we fire the BR delete trigger and then realize that we need to
retry because of a concurrent update.  Now, during the retry, we realize
that because of the latest value we don't need to route the tuple to a
different partition, so we will call ExecUpdate and may fire the BR
update triggers as well?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: hot_standby_feedback vs excludeVacuum and snapshots

2018-06-08 Thread Andres Freund
On 2018-06-08 11:29:17 +0530, Amit Kapila wrote:
> On Fri, Jun 8, 2018 at 2:55 AM, Andres Freund  wrote:
> >
> > On 2018-06-07 14:19:18 -0700, Andres Freund wrote:
> > > We currently do acquire an xid when truncating the relation - but I
> > > think it'd be somewhat fair to argue that that's somewhat of a bug. The
> > > reason an xid is acquired is that we need to log AEL locks, and that
> > > currently means they have to be assigned to a transaction.
> > >
> > > Given that the truncation happens at the end of VACUUM and it *NEEDS* to
> > > be present on the standby - otherwise the locking stuff is useless - I
> > > don't think the fix committed in this thread is correct.
> > >
> > > Wonder if the right thing here wouldn't be to instead transiently
> > > acquire an AEL lock during replay when truncating a relation?
> >
> 
> If we go that route, then don't we need to worry about when to
> release the lock, which is right now done at commit/abort of the
> transaction?  In master, for vacuum, we do release the lock
> immediately after truncate, but I think that is not true generally.
> So, if we release the lock at a time other than commit/abort, we might
> end up releasing them at different times on master and standby.

I don't think that'd be an actual problem. For TRUNCATE we'd just
temporarily hold two locks: one for the DDL command itself, and one for
the truncation. The same is true for VACUUM, except that only one of
them is an AEL.


> > Isn't the fact that vacuum truncation requires an AEL, and that the
> > change committed today excludes those transactions from running xacts
> > records, flat out broken?
> >
> > Look at:
> >
> > void
> > ProcArrayApplyRecoveryInfo(RunningTransactions running)
> > ...
> > /*
> >  * Remove stale locks, if any.
> >  *
> >  * Locks are always assigned to the toplevel xid so we don't need 
> > to care
> >  * about subxcnt/subxids (and by extension not about 
> > ->suboverflowed).
> >  */
> > StandbyReleaseOldLocks(running->xcnt, running->xids);
> >
> > by excluding running transactions you have, as far as I can tell,
> > effectively removed the vacuum truncation AEL from the standby. Now that
> > only happens when a running xact record is logged, but as that happens
> > in the background...

> Yeah, that seems problematic.  I think we can avoid that if before
> releasing the lock in StandbyReleaseOldLocks, we also have an
> additional check to see whether the transaction is committed or
> aborted as we do before adding it to KnownAssignedXids.

I think the right fix is to simply revert the change here, rather than
do unplanned open heart surgery.  No convincing explanation has been
made why the change is necessary in the first place - the xid assignment
comes directly before vacuum finishes. That holds back concurrent
vacuums / the standby's reported horizon much less than a lot of
other normal transactions do.

Greetings,

Andres Freund



Re: file cloning in pg_upgrade and CREATE DATABASE

2018-06-08 Thread Robert Haas
On Wed, Jun 6, 2018 at 11:58 AM, Peter Eisentraut
 wrote:
> --reflink={always,auto,never}.  (This option name is adapted from GNU
...
>  The setting always requires the use of relinks.  If

Is it supposed to be relinks or reflinks?  The two lines above don't agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Performance regression with PostgreSQL 11 and partitioning

2018-06-08 Thread Tom Lane
Robert Haas  writes:
> That being said, I don't mind a bit if you want to look for further
> speedups here, but if you do, keep in mind that a lot of queries won't
> even use partition-wise join, so all of the arrays will be of length
> 1.  Even when partition-wise join is used, it is quite likely not to
> be used for every table in the query, in which case it will still be
> of length 1 in some cases.  So pessimizing nappinfos = 1 even slightly
> is probably a regression overall.

TBH, I am way more concerned about the pessimization introduced for
every pre-existing usage of these functions by putting search loops
into them at all.  I'd like very much to revert that.  If we can
replace those with something along the line of root->index_array[varno]
we'll be better off across the board.
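
A sketch of the direct-lookup idea (illustrative only -- no such array
exists as of v11, though later releases added an append_rel_array field
along these lines):

    /* O(1) lookup of the translation targeting one child rel, instead
     * of scanning root->append_rel_list; the array would be indexed by
     * child relid and built once, when the AppendRelInfos are. */
    static AppendRelInfo *
    find_appinfo_by_child_relid(PlannerInfo *root, Index child_relid)
    {
        return root->append_rel_array[child_relid];
    }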

regards, tom lane



Re: pl/tcl function to detect when a request has been canceled

2018-06-08 Thread Peter Da Silva
On 6/8/18, 2:21 PM, "Andres Freund"  wrote:
> Not sure I quite understand what you mean. You're thinking of the case
> where you're processing rows one-by-one with a cursor? Or that a single
> spi call takes a long while to process the query?

The former, I believe. One example (lightly obfuscated):

spi_exec -array a "SELECT this,that from table where stuff..." {
    if {$use_cancel_pending && [cancel_pending]} {
        error "cancelled in ..."
    }
    do_something_with_array a
    more_business_code_here
    and_so_on...
}



Re: Loaded footgun open_datasync on Windows

2018-06-08 Thread Magnus Hagander
On Fri, Jun 8, 2018 at 4:55 AM, Amit Kapila  wrote:

> On Fri, Jun 8, 2018 at 7:48 AM, Laurenz Albe 
> wrote:
>
>> Amit Kapila wrote:
>> > On Wed, Jun 6, 2018 at 3:06 PM, Kuntal Ghosh <
>> kuntalghosh.2...@gmail.com> wrote:
>> > > It seems the "#ifndef FRONTEND" restriction was added around
>> > > pgwin32_open() for building libpq with Visual C++ or Borland C++.  The
>> > > restriction was added in commit 422d4819 to build libpq with VC++[1].
>> > > Later, in commit fd7c3f67e0bc4, the support for Borland C++ was also
>> > > added.
>> > >
>> > > So, I'm not sure whether removing that restriction will work for all
>> > > front-end modules.
>> > >
>> >
>> > Thanks for doing investigation.  I agree with you that it appears from
>> the old
>> > discussion that we have added this restriction to build libpq/psql
>> (basically frontend)
>> > modules.  However, I tried to build those modules on windows by
>> removing that
>> > restriction and it doesn't give me any problem (except one extra
>> argument error,
>> > which can be easily resolved).  It might be that it gives problem with
>> certain
>> > version of MSVC.  I think if we want to pursue this direction, one
>> needs to verify
>> on various supported versions (of Windows) to see if things are okay.
>>
>> That's what the buildfarm is for, right?
>>
>> > Speaking about the issue at-hand, one way to make pg_test_fsync work in
>> to just
>> > include c.h and remove inclusion of postgres_fe.h, but not sure if that
>> is the best way.
>> > I am not sure if we have any other options to proceed in this case.
>> >
>> > Thoughts?
>>
>> I thought of a better way.
>> pg_test_fsync is properly listed as a server application in the
>> documentation,
>> so I think it should be built as such, as per the attached patch.
>>
>
> -#include "postgres_fe.h"
> +#include "postgres.h"
>
> I don't think that a server application can use the backend environment unless
> it is adapted to do so.  I think the application should have the capability
> to deal with backend stuff like ereport, elog, etc., if we don't want to use
> the FRONTEND environment.  The other server applications like pg_ctl.c,
> initdb.c, etc. also use the FRONTEND environment.
>
>
Not having researched this particular case, but yes, there are a number of
things expected in a non-FRONTEND environment, things that may also go
unnoticed until too late (e.g. signal handling and such that needs explicit
initialization). Standalone binaries should pretty much always be
frontend.


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


pl/tcl function to detect when a request has been canceled

2018-06-08 Thread Peter Da Silva
We have occasional need to run very long-running pl/tcl scripts. If the request
is cancelled (say, by the user hitting ^C in psql), the server-side script still
runs to completion.

There is a C-level variable QueryCancelPending that can be used to monitor for 
this case, but it’s not visible at the pl/tcl scripting level. This is a simple 
new command that returns the current state of this variable to Tcl.
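
For reference, a minimal sketch of the idea (not the attached patch itself;
the command name and registration point are illustrative):

    #include <signal.h>
    #include <tcl.h>

    extern volatile sig_atomic_t QueryCancelPending;   /* from miscadmin.h */

    /* Tcl command: returns 1 if a cancel request is pending, else 0 */
    static int
    pltcl_cancel_pending(ClientData cdata, Tcl_Interp *interp,
                         int objc, Tcl_Obj *const objv[])
    {
        Tcl_SetObjResult(interp, Tcl_NewBooleanObj(QueryCancelPending != 0));
        return TCL_OK;
    }

    /* registered during interpreter setup, along the lines of:
     * Tcl_CreateObjCommand(interp, "cancel_pending",
     *                      pltcl_cancel_pending, NULL, NULL);
     */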

We are currently maintaining a fork of pl/tcl at 
https://github.com/flightaware/pltcl that has this mod, but it would be useful 
to get the functionality into mainline PostgreSQL.


cancel_pending.diff
Description: cancel_pending.diff


Re: Needless additional partition check in INSERT?

2018-06-08 Thread Alvaro Herrera
On 2018-Jun-09, David Rowley wrote:

> Of course, that could be fixed by adding something like "bool
> isinsert" to ExecConstraints(), so that it does not do the needless
> check on UPDATE,

Yeah, that was my actual suggestion rather than using Amit's patch
verbatim.

> but I'm strongly against the reduction in modularity.
> I quite like ExecConstraints() the way it is.

I see.

Maybe this business of sticking the partition constraint check in
ExecConstraints was a bad idea all along.  How about we rip it out and
move the responsibility of doing ExecPartitionCheck to the caller
instead?  See attached.  This whole business looks much cleaner to me
this way.


BTW, while working on this, I was a bit disturbed by the
execReplication.c changes (namely: if the partitioning is not identical
on both sides, things are likely to blow up pretty badly), but that's a
separate topic.  I didn't see any tests of logical replication with
partitioned tables ... Is the current understanding that we don't
promise those things to work terribly well together?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 5d57489c6d8c5e1b10413362005ae333acd6c243 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 6 Jun 2018 15:08:23 -0400
Subject: [PATCH] Fix situation with partition constraints

blah blah ... commits 15ce775faa4 19c47e7c820 ... blah blah
---
 src/backend/commands/copy.c| 30 +--
 src/backend/executor/execMain.c| 32 -
 src/backend/executor/execPartition.c   |  5 ++---
 src/backend/executor/execReplication.c |  8 ++--
 src/backend/executor/nodeModifyTable.c | 37 +++---
 src/include/executor/executor.h|  5 ++---
 6 files changed, 49 insertions(+), 68 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 0a1706c0d1..5f4ffd2975 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2726,29 +2726,17 @@ CopyFrom(CopyState cstate)
 			else
 			{
 				/*
-				 * We always check the partition constraint, including when
-				 * the tuple got here via tuple-routing.  However we don't
-				 * need to in the latter case if no BR trigger is defined on
-				 * the partition.  Note that a BR trigger might modify the
-				 * tuple such that the partition constraint is no longer
-				 * satisfied, so we need to check in that case.
-				 */
-				bool		check_partition_constr =
-					(resultRelInfo->ri_PartitionCheck != NIL);
-
-				if (saved_resultRelInfo != NULL &&
-					!(resultRelInfo->ri_TrigDesc &&
-					  resultRelInfo->ri_TrigDesc->trig_insert_before_row))
-					check_partition_constr = false;
-
-				/*
-				 * If the target is a plain table, check the constraints of
-				 * the tuple.
+				 * Check the tuple is valid against all constraints, and the
+				 * partition constraint, as required.
 				 */
 				if (resultRelInfo->ri_FdwRoutine == NULL &&
-					(resultRelInfo->ri_RelationDesc->rd_att->constr ||
-					 check_partition_constr))
-					ExecConstraints(resultRelInfo, slot, estate, true);
+					resultRelInfo->ri_RelationDesc->rd_att->constr)
+					ExecConstraints(resultRelInfo, slot, estate);
+				if (resultRelInfo->ri_PartitionCheck &&
+					(saved_resultRelInfo == NULL ||
+					 (resultRelInfo->ri_TrigDesc &&
+					  resultRelInfo->ri_TrigDesc->trig_insert_before_row)))
+					ExecPartitionCheck(resultRelInfo, slot, estate, true);
 
 				if (useHeapMultiInsert)
 				{
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 3d12f9c76f..c394b5dbed 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1856,14 +1856,17 @@ ExecRelCheck(ResultRelInfo *resultRelInfo,
 /*
  * ExecPartitionCheck --- check that tuple meets the partition constraint.
  *
- * Exported in executor.h for outside use.
- * Returns true if it meets the partition constraint, else

Re: Needless additional partition check in INSERT?

2018-06-08 Thread Alvaro Herrera
On 2018-Jun-09, David Rowley wrote:

> Of course, that could be fixed by adding something like "bool
> isinsert" to ExecConstraints(), so that it does not do the needless
> check on UPDATE,

Yeah, that was my actual suggestion rather than using Amit's patch
verbatim.

> but I'm strongly against the reduction in modularity.
> I quite like ExecConstraints() the way it is.

I see.

Maybe this business of sticking the partition constraint check in
ExecConstraints was a bad idea all along.  How about we rip it out and
move the responsibility of doing ExecPartitionCheck to the caller
instead?  See attached.  This whole business looks much cleaner to me
this way.


BTW, while working on this, I was a bit disturbed by the
execReplication.c changes (namely: if the partitioning is not identical
on both sides, things are likely to blow up pretty badly), but that's a
separate topic.  I didn't see any tests of logical replication with
partitioned tables ... Is the current understanding that we don't
promise those things to work terribly well together?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 5d57489c6d8c5e1b10413362005ae333acd6c243 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 6 Jun 2018 15:08:23 -0400
Subject: [PATCH] Fix situation with partition constraints

---
 src/backend/commands/copy.c            | 30 +--
 src/backend/executor/execMain.c        | 32 -
 src/backend/executor/execPartition.c   |  5 ++---
 src/backend/executor/execReplication.c |  8 ++--
 src/backend/executor/nodeModifyTable.c | 37 +++---
 src/include/executor/executor.h        |  5 ++---
 6 files changed, 49 insertions(+), 68 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 0a1706c0d1..5f4ffd2975 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2726,29 +2726,17 @@ CopyFrom(CopyState cstate)
 		else
 		{
 			/*
-			 * We always check the partition constraint, including when
-			 * the tuple got here via tuple-routing.  However we don't
-			 * need to in the latter case if no BR trigger is defined on
-			 * the partition.  Note that a BR trigger might modify the
-			 * tuple such that the partition constraint is no longer
-			 * satisfied, so we need to check in that case.
-			 */
-			bool		check_partition_constr =
-			(resultRelInfo->ri_PartitionCheck != NIL);
-
-			if (saved_resultRelInfo != NULL &&
-				!(resultRelInfo->ri_TrigDesc &&
-				  resultRelInfo->ri_TrigDesc->trig_insert_before_row))
-				check_partition_constr = false;
-
-			/*
-			 * If the target is a plain table, check the constraints of
-			 * the tuple.
+			 * Check the tuple is valid against all constraints, and the
+			 * partition constraint, as required.
 			 */
 			if (resultRelInfo->ri_FdwRoutine == NULL &&
-				(resultRelInfo->ri_RelationDesc->rd_att->constr ||
-				 check_partition_constr))
-				ExecConstraints(resultRelInfo, slot, estate, true);
+				resultRelInfo->ri_RelationDesc->rd_att->constr)
+				ExecConstraints(resultRelInfo, slot, estate);
+			if (resultRelInfo->ri_PartitionCheck &&
+				(saved_resultRelInfo == NULL ||
+				 (resultRelInfo->ri_TrigDesc &&
+				  resultRelInfo->ri_TrigDesc->trig_insert_before_row)))
+				ExecPartitionCheck(resultRelInfo, slot, estate, true);
 
 			if (useHeapMultiInsert)
 			{
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 3d12f9c76f..c394b5dbed 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1856,14 +1856,17 @@ ExecRelCheck(ResultRelInfo *resultRelInfo,
 /*
  * ExecPartitionCheck --- check that tuple meets the partition constraint.
  *
- * Exported in executor.h for outside use.
- * Returns true if it meets the partition constraint, else

Re: pl/tcl function to detect when a request has been canceled

2018-06-08 Thread Tom Lane
Andres Freund  writes:
> On 2018-06-08 18:08:14 +, Peter Da Silva wrote:
>> There is a C-level variable QueryCancelPending that can be used to
>> monitor for this case, but it’s not visible at the pl/tcl scripting
>> level. This is a simple new command that returns the current state of
>> this variable to Tcl.

> I'm not terribly opposed to this, but I wonder if the much more
> pragmatic solution is to just occasionally call a database function that
> checks this?  You could just run SELECT 1 occasionally :/

That would be quite expensive for the purpose, surely.

My complaint about it is more like "if we're going to put this into
pltcl, why not also plperl and plpython?"  It might be unfair to
ask this patch to cover all three, but it would be odd not to try to
maintain feature parity ... especially since pltcl is probably the
least-used of the three these days.

regards, tom lane



Re: SHOW ALL does not honor pg_read_all_settings membership

2018-06-08 Thread Alvaro Herrera
On 2018-Jun-08, Alvaro Herrera wrote:

> BTW a further bug here seems to be that "select * from pg_settings"
> does not show the source file/line for members of the role, which seems
> to be documented to occur.

And I think this fixes it.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From c71a14bdac838efc09d01b60383fa6e631606c57 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 8 Jun 2018 15:12:58 -0400
Subject: [PATCH] fix sourcefile/line display for pg_read_all_settings role

---
 src/backend/utils/misc/guc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 274034d88b..e84d56dae6 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -8595,7 +8595,8 @@ GetConfigOptionByNum(int varnum, const char **values, bool *noshow)
 				 * security reasons, we don't show source file/line number for
 				 * non-superusers.
 				 */
-				if (conf->source == PGC_S_FILE && superuser())
+				if (conf->source == PGC_S_FILE &&
+					is_member_of_role(GetUserId(), DEFAULT_ROLE_READ_ALL_SETTINGS))
 				{
 					values[14] = conf->sourcefile;
 					snprintf(buffer, sizeof(buffer), "%d", conf->sourceline);
-- 
2.11.0



Re: pl/tcl function to detect when a request has been canceled

2018-06-08 Thread Tom Lane
Andres Freund  writes:
> Either way, I'm not convinced that handling query cancels in isolation
> is really the right thing. I think pretty much all forms of interrupt
> would need to be processed, not just cancels.

+1

regards, tom lane



Re: pl/tcl function to detect when a request has been canceled

2018-06-08 Thread Andres Freund
Hi,

On 2018-06-08 18:08:14 +, Peter Da Silva wrote:
> We have occasional need to run very long-running pl/tcl scripts. If
> the request is cancelled (say, by the user hitting ^c in psql) the
> server-side script still runs to completion.

> 
> There is a C-level variable QueryCancelPending that can be used to
> monitor for this case, but it’s not visible at the pl/tcl scripting
> level. This is a simple new command that returns the current state of
> this variable to Tcl.
> 
> We are currently maintaining a fork of pl/tcl at
> https://github.com/flightaware/pltcl that has this mod, but it would
> be useful to get the functionality into mainline PostgreSQL.

I'm not terribly opposed to this, but I wonder if the much more
pragmatic solution is to just occasionally call a database function that
checks this?  You could just run SELECT 1 occasionally :/

- Andres
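
For reference, a minimal sketch of the kind of command Peter describes,
built on the Tcl C API and the QueryCancelPending flag from miscadmin.h.
The command name and the registration helper here are illustrative
stand-ins, not the actual code from the flightaware fork:

#include "postgres.h"
#include "miscadmin.h"			/* declares QueryCancelPending */
#include <tcl.h>

/* Tcl command: return 1 if a query cancel has been requested, else 0 */
static int
pltcl_query_cancel_pending(ClientData cdata, Tcl_Interp *interp,
						   int objc, Tcl_Obj *const objv[])
{
	Tcl_SetObjResult(interp, Tcl_NewBooleanObj(QueryCancelPending != 0));
	return TCL_OK;
}

/* called from wherever pltcl installs its other interpreter commands */
static void
pltcl_register_cancel_command(Tcl_Interp *interp)
{
	Tcl_CreateObjCommand(interp, "query_cancel_pending",
						 pltcl_query_cancel_pending, NULL, NULL);
}

A long-running pltcl loop could then poll the command every N iterations
and bail out with a Tcl error when it returns true.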



Re: pl/tcl function to detect when a request has been canceled

2018-06-08 Thread Peter Da Silva
On 6/8/18, 1:12 PM, "Andres Freund"  wrote:
I'm not terribly opposed to this, but I wonder if the much more
pragmatic solution is to just occasionally call a database function that
checks this?  You could just run SELECT 1 occasionally :/

That seems to work, and I suppose in most cases the overhead could be mitigated 
by only calling it every N times through a loop, but it's not as clean.  



Re: pl/tcl function to detect when a request has been canceled

2018-06-08 Thread Peter Da Silva
On 6/8/18, 1:12 PM, "Andres Freund"  wrote:
I'm not terribly opposed to this, but I wonder if the much more
pragmatic solution is to just occasionally call a database function that
checks this?  You could just run SELECT 1 occasionally :/

After further discussion with our team:

Would this work if the reason for it ignoring the cancel request is that it is 
already performing a long-running spi_exec with a large response?



Re: hot_standby_feedback vs excludeVacuum and snapshots

2018-06-08 Thread Andres Freund
Hi,

On 2018-06-08 09:23:02 +0100, Simon Riggs wrote:
> I have also found another bug which affects what we do next.
> 
> For context, AEL locks are normally removed by COMMIT or ABORT.
> StandbyReleaseOldLocks() is just a sweeper to catch anything that
> didn't send an abort before it died, so it hardly ever activates. The
> coding of StandbyReleaseOldLocks() is backwards... if it ain't in the
> running list, then we remove it.
> 
> But that wasn't working correctly either, since as of 49bff5300d527 we
> assigned AELs to subxids. Subxids weren't in the running list and so
> AELs held by them would have been removed at the wrong time, an extant
> bug in PG10. It looks to me like they would have been removed either
> early or late, up to the next runningxact info record. They would be
> removed, so no leakage, but the late timing wouldn't be noticeable for
> tests or most usage, since it would look just like lock contention.
> Early release might give same issue of block access to non-existent
> block/file.

Yikes, that's kinda bad. It can also just cause plain crashes, no? The
on-disk / catalog state isn't necessarily consistent during DDL, which
is why we hold AE locks. At the very least it can cause corruption of
in-use relcache entries and such.  In practice this probably hits only
a limited number of people, because it requires DDL to happen in
subtransactions, which probably isn't *that* common.


> So the attached patch fixes both the bug in the recent commit and the
> one I just found by observation of 49bff5300d527, since they are
> related.

Can we please keep them separate?


> StandbyReleaseOldLocks() can sweep in the same way as
> ExpireOldKnownAssignedTransactionIds().
> 
> > I also don't understand why this change would be backpatched in the
> > first place. It's a relatively minor efficiency thing, no?
> 
> As for everything, that is open to discussion. Yes, it seems minor to
> me until it affects you, then its not.

How is it any worse than any other normal short-lived write transaction?
The truncation is done shortly before commit.


> It seems to have affected Greg.

As far as I can tell Greg was just theorizing?


Greetings,

Andres Freund
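
To make the proposed sweep concrete, here is a rough sketch modeled on
the cutoff test that ExpireOldKnownAssignedTransactionIds() uses.  The
function and the "locks" list are hypothetical stand-ins for the
standby.c internals (PG11-era List API), not Simon's actual patch:

static List *
sweep_old_recovery_locks(List *locks, TransactionId oldxid)
{
	ListCell   *cell,
			   *prev,
			   *next;

	prev = NULL;
	for (cell = list_head(locks); cell != NULL; cell = next)
	{
		xl_standby_lock *lock = (xl_standby_lock *) lfirst(cell);
		LOCKTAG		locktag;

		next = lnext(cell);

		/*
		 * Instead of testing membership in the running-xacts list (which
		 * misses subxids), release everything older than the cutoff.
		 */
		if (!TransactionIdPrecedes(lock->xid, oldxid))
		{
			prev = cell;		/* possibly still running: keep it */
			continue;
		}

		SET_LOCKTAG_RELATION(locktag, lock->dbOid, lock->relOid);
		(void) LockRelease(&locktag, AccessExclusiveLock, true);
		locks = list_delete_cell(locks, cell, prev);
		pfree(lock);
	}
	return locks;
}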



Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

2018-06-08 Thread David Rowley
On 9 June 2018 at 04:57, Tom Lane  wrote:
> Phil Florent  writes:
>> explain analyze select * from v where v.k1 > date '2017-01-01';
>> ERREUR:  XX000: did not find all requested child rels in append_rel_list
>> EMPLACEMENT : find_appinfos_by_relids, prepunion.c : 2643
>
> Reproduced here, thanks for the report!  This is very timely since
> we were just in process of rewriting that code anyway ...

Yeah. Thanks for the report Phil.

It looks like this was 499be013de6, which was one of mine.

A more simple case to reproduce is:

drop table listp;
create table listp (a int, b int) partition by list(a);
create table listp1 partition of listp for values in (1);
select * from (select * from listp union all select * from listp) t where a = 1;

I'll look in more detail after sleeping.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: SHOW ALL does not honor pg_read_all_settings membership

2018-06-08 Thread Alvaro Herrera
BTW a further bug here seems to be that "select * from pg_settings"
does not show the source file/line for members of the role, which seems
to be documented to occur.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pl/tcl function to detect when a request has been canceled

2018-06-08 Thread Andres Freund
On 2018-06-08 14:41:41 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-06-08 18:08:14 +, Peter Da Silva wrote:
> >> There is a C-level variable QueryCancelPending that can be used to
> >> monitor for this case, but it’s not visible at the pl/tcl scripting
> >> level. This is a simple new command that returns the current state of
> >> this variable to Tcl.
> 
> > I'm not terribly opposed to this, but I wonder if the much more
> > pragmatic solution is to just occasionally call a database function that
> > checks this?  You could just run SELECT 1 occasionally :/
> 
> That would be quite expensive for the purpose, surely.

Sure, but it works today...

Either way, I'm not convinced that handling query cancels in isolation
is really the right thing. I think pretty much all forms of interrupt
would need to be processed, not just cancels. It's imo at least as
important to process session termination (including recovery conflicts),
and catchup interrupts.  I think we largely require that the PLs handle
exceptions anyway, so just having a 'pg_process_interrupts()' function
that then is wrapped by the individual PLs would make sense imo.

Greetings,

Andres Freund
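
A minimal sketch of such a wrapper, using the placeholder name from
Andres's suggestion (no function of this name exists in core today):

#include "postgres.h"
#include "miscadmin.h"

/*
 * Service any pending interrupts: query cancel, session termination
 * (including recovery conflicts), catchup interrupts, and so on.  On a
 * cancel or termination this ereport()s, which the calling PL is
 * expected to catch and convert into its own exception mechanism.
 */
void
pg_process_interrupts(void)
{
	CHECK_FOR_INTERRUPTS();
}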



Re: pl/tcl function to detect when a request has been canceled

2018-06-08 Thread Andres Freund
On 2018-06-08 19:16:49 +, Peter Da Silva wrote:
> On 6/8/18, 1:12 PM, "Andres Freund"  wrote:
> I'm not terribly opposed to this, but I wonder if the much more
> pragmatic solution is to just occasionally call a database function that
> checks this?  You could just run SELECT 1 occasionally :/
> 
> After further discussion with our team:
> 
> Would this work if the reason for it ignoring the cancel request is
> that it is already performing a long-running spi_exec with a large
> response?

Not sure I quite understand what you mean. You're thinking of the case
where you're processing rows one-by-one with a cursor? Or that a single
spi call takes a long while to process the query?

Greetings,

Andres Freund



Re: Bug in either collation docs or code

2018-06-08 Thread Tom Lane
Melanie Plageman  writes:
> On postgres built off of master on my mac (sierra), the following is the
> output:

[ scratches head ... ]  I get the same results on either Mac or Linux:

regression=# create database u8 encoding utf8 template template0;
CREATE DATABASE
regression=# \c u8
You are now connected to database "u8" as user "tgl".
u8=# CREATE TABLE test_col_mac (
u8(# a text COLLATE "de_DE",
u8(# b text COLLATE "es_ES"
u8(# );
CREATE TABLE
u8=# SELECT a < (SELECT 'foo'::TEXT COLLATE "fr_FR") FROM test_col_mac;
 ?column? 
----------
(0 rows)

u8=# INSERT INTO test_col_mac VALUES('A','A');
INSERT 0 1
u8=# SELECT a < (SELECT 'foo'::TEXT COLLATE "fr_FR") FROM test_col_mac;
ERROR:  could not determine which collation to use for string comparison
HINT:  Use the COLLATE clause to set the collation explicitly.

Now, without the sub-select, it works:

u8=# SELECT a < ('foo'::TEXT COLLATE "fr_FR") FROM test_col_mac;
 ?column? 
----------
 t
(1 row)

because the explicit COLLATE is considered to determine the
collation of the comparison operator.

I wonder whether you're using stock Postgres, or something that
flattens scalar sub-selects?

regards, tom lane



Re: Needless additional partition check in INSERT?

2018-06-08 Thread David Rowley
On 9 June 2018 at 04:52, Alvaro Herrera  wrote:
> Truth is, the more I look at this, the more I think it should be done in
> the way Amit Langote was proposing: do away with the extra function, and
> check all those conditions inside ExecConstraints itself.  We can add a
> new boolean tupleRouting argument, which I think is the only missing bit.

As far as I see it, what Amit Langote is proposing is to shift the
needless additional partition check from INSERT to UPDATE. I've tested
his patch and confirmed that this is the case. Amit has even written:

On 7 June 2018 at 21:37, Amit Langote  wrote:
> There is one twist though.  It may happen that ExecUpdate ends up
> performing tuple routing for one row and thus will have ri_PartitionRoot
> set.  For another row, tuple routing may not be needed as the updated
> version of the row satisfies the partition constraint and now because
> ri_PartitionRoot has been set, ExecConstraint would end up performing the
> partition constraint check redundantly if trig_insert_before_row is true.

I'm really struggling to understand why this is being suggested at all.

Of course, that could be fixed by adding something like "bool
isinsert" to ExecConstraints(), so that it does not do the needless
check on UPDATE, but I'm strongly against the reduction in modularity.
I quite like ExecConstraints() the way it is.

FWIW, my test case that shows Amit's patch is wrong is:

drop table listp;
create table listp (a int, b int) partition by list(a);
create table listp12 partition of listp for values in(1,2);
create table listp34 partition of listp for values in(3,4);
create or replace function listp_insert_trigger() returns trigger as
$$ begin raise notice '%', NEW.a; return NEW; end; $$ language
plpgsql;
create trigger listp12_insert before insert on listp12 for each row
execute procedure listp_insert_trigger();
insert into listp values(1,2);
update listp set b=1;

stick a breakpoint in ExecPartitionCheck() and watch it being hit
twice on the update. Revert Amit's patch and you'll hit it once.

I'm fine with my v3 patch, or your idea with the function in v5, but
with respect to Amit L, I'm strongly against his patch on the grounds
that it just moves the problem somewhere else.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Performance regression with PostgreSQL 11 and partitioning

2018-06-08 Thread Tom Lane
Robert Haas  writes:
> On Fri, Jun 8, 2018 at 12:56 AM, Tom Lane  wrote:
>> The pre-v11 incarnation of those functions took a single AppendRelInfo,
>> specifying an exact translation from one parent relid to one child
>> relid.  The fundamental problem I've got with the current code, entirely
>> independently of any performance issues, is that it's completely unclear
>> -- or at least undocumented -- which translation(s) are supposed to occur.

> I don't understand this complaint.  Before, the code took one
> AppendRelInfo, and according to you, it was clear what was supposed to
> happen.  Now it takes an array of AppendRelInfos and, according to
> you, it's completely unclear.  Yet that seems, to me at least, to be a
> straightforward generalization.  If 1 AppendRelInfo is an adequate
> specification of one translation, why are N AppendRelInfos not an
> adequate specification of N translations?

Because the relationships between the transforms are unclear.  Are we
supposed to apply those N transformations to the expression in sequence?
It doesn't look to me like that's what the code does.  I think --- I might
be wrong --- that the code is relying on the transformations to be
non-overlapping, that is a change made by any one of them cannot be
further affected by another one.  This is, however, undocumented.

regards, tom lane
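
A sketch of that presumed invariant: each Var is rewritten by whichever
single AppendRelInfo matches its varno, in one pass, never by several
applied in sequence.  translate_var_using() is a stand-in for the real
translation logic, not an actual function:

static Node *
translate_var(Var *var, AppendRelInfo **appinfos, int nappinfos)
{
	int			i;

	for (i = 0; i < nappinfos; i++)
	{
		if (var->varno == appinfos[i]->parent_relid)
			return translate_var_using(var, appinfos[i]);
	}
	return (Node *) copyObject(var);	/* untouched if nothing matches */
}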



Re: Compromised postgresql instances

2018-06-08 Thread Andrew Dunstan




On 06/08/2018 06:13 PM, Andrew Gierth wrote:

"Tom" == Tom Lane  writes:

  > Andrew Dunstan  writes:
  >> Please cite actual instances of such reports. Vague queries like
  >> this help nobody.

We do also get them on the IRC channel every once in a while, not
very frequently but enough to notice (maybe 2-3 so far this year?).

  Tom> Unless there's some evidence that these attacks are getting in
  Tom> through a heretofore unknown PG security vulnerability, rather
  Tom> than user misconfiguration (such as weak/no password), I'm not
  Tom> sure what the security list would have to offer. Right now it
  Tom> seems like Steve's move to try to gather more evidence is quite
  Tom> the right thing to do.

Right. All the instances on IRC that I'm personally aware of have
followed this pattern: either the user has used "host all all 0.0.0.0/0
trust", or they used "host all all 0.0.0.0/0 md5" where the password for
the postgres user has been one where it's plausible that a simple
automated dictionary attack could have guessed it (e.g. "654321" in one
report).

Excuses for doing this have varied (but I'm pretty sure I've heard "we
put that in while trying to fix a problem and forgot to take it out"
more than once - so it's worth a reminder that one should never, ever,
suggest this on any help forum).

The reports are similar enough and generic enough that it seems pretty
certain that the scanning and subsequent compromise is all automated;
some reports have included a (failed) attempt to use local root exploits
to escalate from the postgres user to root access. Compromised systems
have been reportedly used as DDoS traffic sources and for cryptocurrency
mining, but obviously other uses can't be ruled out. I don't know of any
reports of data being exfiltrated or modified, but that doesn't mean it
doesn't happen.

I have (private) logs of the channel going back a while, but I haven't
made any attempt to track how often it happens - the above is basically
all from memory.





Well, I guess every service is going to be vulnerable to 
misconfiguration. The fact that they are automated (presumably knowing 
how to speak the wire protocol or using a library that does) is a bit 
scary, but not surprising.



My view is that generally DB servers should not be reachable from the 
internet in the first place, certainly not without a client cert, but 
ideally not at all. Then misconfigurations like those Andrew refers to 
above would not be so lethal.


cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Transform for pl/perl

2018-06-08 Thread Tom Lane
I wrote:
> I'm inclined to think that auto-dereference is indeed a good idea,
> and am tempted to go make that change to make all this consistent.
> Comments?

Here's a draft patch for that.  I ended up only changing
plperl_sv_to_datum.  There is maybe a case for doing something
similar in plperl_return_next_internal's fn_retistuple code path,
so that you can return a reference-to-reference-to-hash there;
but I was unable to muster much enthusiasm for touching that.

regards, tom lane

diff --git a/src/pl/plperl/expected/plperl.out b/src/pl/plperl/expected/plperl.out
index ebfba3e..d8a1ff5 100644
*** a/src/pl/plperl/expected/plperl.out
--- b/src/pl/plperl/expected/plperl.out
*************** $$ LANGUAGE plperl;
*** 763,776 ****
  SELECT text_obj();
  ERROR:  cannot convert Perl hash to non-composite type text
  CONTEXT:  PL/Perl function "text_obj"
! -- make sure we can't return a scalar ref
  CREATE OR REPLACE FUNCTION text_scalarref() RETURNS text AS $$
  	my $str = 'str';
  	return \$str;
  $$ LANGUAGE plperl;
  SELECT text_scalarref();
! ERROR:  PL/Perl function must return reference to hash or array
! CONTEXT:  PL/Perl function "text_scalarref"
  -- check safe behavior when a function body is replaced during execution
  CREATE OR REPLACE FUNCTION self_modify(INTEGER) RETURNS INTEGER AS $$
 spi_exec_query('CREATE OR REPLACE FUNCTION self_modify(INTEGER) RETURNS INTEGER AS \'return $_[0] * 3;\' LANGUAGE plperl;');
--- 763,779 ----
  SELECT text_obj();
  ERROR:  cannot convert Perl hash to non-composite type text
  CONTEXT:  PL/Perl function "text_obj"
! -- test looking through a scalar ref
  CREATE OR REPLACE FUNCTION text_scalarref() RETURNS text AS $$
  	my $str = 'str';
  	return \$str;
  $$ LANGUAGE plperl;
  SELECT text_scalarref();
!  text_scalarref 
! ----------------
!  str
! (1 row)
! 
  -- check safe behavior when a function body is replaced during execution
  CREATE OR REPLACE FUNCTION self_modify(INTEGER) RETURNS INTEGER AS $$
 spi_exec_query('CREATE OR REPLACE FUNCTION self_modify(INTEGER) RETURNS INTEGER AS \'return $_[0] * 3;\' LANGUAGE plperl;');
diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c
index 4342c02..4cfc506 100644
*** a/src/pl/plperl/plperl.c
--- b/src/pl/plperl/plperl.c
*************** plperl_sv_to_datum(SV *sv, Oid typid, in
*** 1402,1412 ****
  			return ret;
  		}
  
! 		/* Reference, but not reference to hash or array ... */
! 		ereport(ERROR,
! (errcode(ERRCODE_DATATYPE_MISMATCH),
!  errmsg("PL/Perl function must return reference to hash or array")));
! 		return (Datum) 0;		/* shut up compiler */
  	}
  	else
  	{
--- 1402,1414 ----
  			return ret;
  		}
  
! 		/*
! 		 * If it's a reference to something else, such as a scalar, just
! 		 * recursively look through the reference.
! 		 */
! 		return plperl_sv_to_datum(SvRV(sv), typid, typmod,
!   fcinfo, finfo, typioparam,
!   isnull);
  	}
  	else
  	{
diff --git a/src/pl/plperl/sql/plperl.sql b/src/pl/plperl/sql/plperl.sql
index c36da0f..b0d950b 100644
*** a/src/pl/plperl/sql/plperl.sql
--- b/src/pl/plperl/sql/plperl.sql
*************** $$ LANGUAGE plperl;
*** 504,510 ****
  
  SELECT text_obj();
  
! -- make sure we can't return a scalar ref
  CREATE OR REPLACE FUNCTION text_scalarref() RETURNS text AS $$
  	my $str = 'str';
  	return \$str;
--- 504,510 ----
  
  SELECT text_obj();
  
! -- test looking through a scalar ref
  CREATE OR REPLACE FUNCTION text_scalarref() RETURNS text AS $$
  	my $str = 'str';
  	return \$str;


Re: Compromised postgresql instances

2018-06-08 Thread Andrew Dunstan




On 06/08/2018 04:54 PM, Steve Atkins wrote:

On Jun 8, 2018, at 1:47 PM, Tom Lane  wrote:

Andrew Dunstan  writes:

On 06/08/2018 04:34 PM, Steve Atkins wrote:

I've noticed a steady trickle of reports of postgresql servers being 
compromised via being left available to the internet with insecure or default 
configuration, or brute-forced credentials. The symptoms are randomly named 
binaries being uploaded to the data directory and executed with the permissions 
of the postgresql user, apparently via an extension or an untrusted PL.

Is anyone tracking or investigating this?

Please cite actual instances of such reports. Vague queries like this
help nobody.

I imagine Steve is reacting to this report from today:
https://www.postgresql.org/message-id/CANozSKLGgWDpzfua2L=OGFN=dg3po98ujqjj18gbvfr1-yk...@mail.gmail.com

I recall something similar being reported a few weeks ago,

https://www.postgresql.org/message-id/020901d3f14c%24512a46d0%24f37ed470%24%40gmail.com



OK, those appeared on other mailing lists I don't subscribe to, so I was 
missing context.



cheers

andrew




--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SHOW ALL does not honor pg_read_all_settings membership

2018-06-08 Thread Alvaro Herrera
On 2018-Mar-01, Laurenz Albe wrote:

> I noticed that commit 25fff40798fc4ac11a241bfd9ab0c45c085e2212 forgot
> to teach SHOW ALL to show all GUCs when the user belongs to 
> pg_read_all_settings.
> 
> Patch attached; I think this should be backpatched.

Done, with the further fixes downthread.  Thanks!

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Compromised postgresql instances

2018-06-08 Thread Tom Lane
Andrew Dunstan  writes:
> On 06/08/2018 04:34 PM, Steve Atkins wrote:
>> I've noticed a steady trickle of reports of postgresql servers being 
>> compromised via being left available to the internet with insecure or 
>> default configuration, or brute-forced credentials. The symptoms are 
>> randomly named binaries being uploaded to the data directory and executed 
>> with the permissions of the postgresql user, apparently via an extension or 
>> an untrusted PL.
>> 
>> Is anyone tracking or investigating this?

> Please cite actual instances of such reports. Vague queries like this 
> help nobody.

I imagine Steve is reacting to this report from today:
https://www.postgresql.org/message-id/CANozSKLGgWDpzfua2L=OGFN=dg3po98ujqjj18gbvfr1-yk...@mail.gmail.com

I recall something similar being reported a few weeks ago, but am
too lazy to trawl the archives right now.

> Furthermore, security concerns are best addressed to the security 
> mailing list.

Unless there's some evidence that these attacks are getting in through
a heretofore unknown PG security vulnerability, rather than user
misconfiguration (such as weak/no password), I'm not sure what the
security list would have to offer.  Right now it seems like Steve's move
to try to gather more evidence is quite the right thing to do.

regards, tom lane



Re: Compromised postgresql instances

2018-06-08 Thread Steve Atkins


> On Jun 8, 2018, at 1:47 PM, Tom Lane  wrote:
> 
> Andrew Dunstan  writes:
>> On 06/08/2018 04:34 PM, Steve Atkins wrote:
>>> I've noticed a steady trickle of reports of postgresql servers being 
>>> compromised via being left available to the internet with insecure or 
>>> default configuration, or brute-forced credentials. The symptoms are 
>>> randomly named binaries being uploaded to the data directory and executed 
>>> with the permissions of the postgresql user, apparently via an extension or 
>>> an untrusted PL.
>>> 
>>> Is anyone tracking or investigating this?
> 
>> Please cite actual instances of such reports. Vague queries like this 
>> help nobody.
> 
> I imagine Steve is reacting to this report from today:
> https://www.postgresql.org/message-id/CANozSKLGgWDpzfua2L=OGFN=dg3po98ujqjj18gbvfr1-yk...@mail.gmail.com
> 
> I recall something similar being reported a few weeks ago,

https://www.postgresql.org/message-id/020901d3f14c%24512a46d0%24f37ed470%24%40gmail.com

> but am
> too lazy to trawl the archives right now.

Yes, plus I recall a couple of discussions on IRC with similar behaviour, and
a few more details about how the binaries were being uploaded.

> 
>> Furthermore, security concerns are best addressed to the security 
>> mailing list.
> 
> Unless there's some evidence that these attacks are getting in through
> a heretofore unknown PG security vulnerability, rather than user
> misconfiguration (such as weak/no password), I'm not sure what the
> security list would have to offer.  Right now it seems like Steve's move
> to try to gather more evidence is quite the right thing to do.

Yeah. It's not a security issue with postgresql itself, I don't believe, so not
really something that has to go to the security alias. It's more of an ops
issue, but I thought I'd ask here to see if anyone was already looking at it,
and to raise a flag if they weren't.

Cheers,
  Steve




Re: Compromised postgresql instances

2018-06-08 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 > Andrew Dunstan  writes:
 >> Please cite actual instances of such reports. Vague queries like
 >> this help nobody.

We do also get them on the IRC channel every once in a while, not
very frequently but enough to notice (maybe 2-3 so far this year?).

 Tom> Unless there's some evidence that these attacks are getting in
 Tom> through a heretofore unknown PG security vulnerability, rather
 Tom> than user misconfiguration (such as weak/no password), I'm not
 Tom> sure what the security list would have to offer. Right now it
 Tom> seems like Steve's move to try to gather more evidence is quite
 Tom> the right thing to do.

Right. All the instances on IRC that I'm personally aware of have
followed this pattern: either the user has used "host all all 0.0.0.0/0
trust", or they used "host all all 0.0.0.0/0 md5" where the password for
the postgres user has been one where it's plausible that a simple
automated dictionary attack could have guessed it (e.g. "654321" in one
report).

Excuses for doing this have varied (but I'm pretty sure I've heard "we
put that in while trying to fix a problem and forgot to take it out"
more than once - so it's worth a reminder that one should never, ever,
suggest this on any help forum).

The reports are similar enough and generic enough that it seems pretty
certain that the scanning and subsequent compromise is all automated;
some reports have included a (failed) attempt to use local root exploits
to escalate from the postgres user to root access. Compromised systems
have been reportedly used as DDoS traffic sources and for cryptocurrency
mining, but obviously other uses can't be ruled out. I don't know of any
reports of data being exfiltrated or modified, but that doesn't mean it
doesn't happen.

I have (private) logs of the channel going back a while, but I haven't
made any attempt to track how often it happens - the above is basically
all from memory.

-- 
Andrew (irc:RhodiumToad)



Re: Remove mention in docs that foreign keys on partitioned tables are not supported

2018-06-08 Thread Alvaro Herrera
On 2018-Jun-08, Alvaro Herrera wrote:

> Actually, the column order doesn't matter for a trigger function,
> because these don't refer to columns by number but by name.  So unless
> users write trigger functions in C and use hardcoded column numbers
> (extremely unlikely), I think this is not an issue.  In other words, in
> the interesting cases the trigger functions are the same for all
> partitions -- therefore upgrading from separate per-partition triggers
> to one master trigger in the partitioned table is not going to be that
> difficult, ISTM.

One last thing.  This wording has been there since pg10; the only thing
that changed is that it now says "BEFORE ROW triggers" instead of "Row
triggers".  I don't think leaving it there one more year is going to get
us too much grief, TBH.

I'm going to leave this as an open item for one or two more weeks and
see if there's more support for Ashutosh's PoV; but if not, my proposal
is to close it without changing anything further.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Compromised postgresql instances

2018-06-08 Thread Steve Atkins
I've noticed a steady trickle of reports of postgresql servers being 
compromised via being left available to the internet with insecure or default 
configuration, or brute-forced credentials. The symptoms are randomly named 
binaries being uploaded to the data directory and executed with the permissions 
of the postgresql user, apparently via an extension or an untrusted PL.

Is anyone tracking or investigating this?

Cheers,
  Steve




Re: Compromised postgresql instances

2018-06-08 Thread Andrew Dunstan




On 06/08/2018 04:34 PM, Steve Atkins wrote:

I've noticed a steady trickle of reports of postgresql servers being 
compromised via being left available to the internet with insecure or default 
configuration, or brute-forced credentials. The symptoms are randomly named 
binaries being uploaded to the data directory and executed with the permissions 
of the postgresql user, apparently via an extension or an untrusted PL.

Is anyone tracking or investigating this?





Please cite actual instances of such reports. Vague queries like this 
help nobody.


Furthermore, security concerns are best addressed to the security 
mailing list.


cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Remove mention in docs that foreign keys on partitioned tables are not supported

2018-06-08 Thread Alvaro Herrera
On 2018-Jun-07, Ashutosh Bapat wrote:

> On Thu, Jun 7, 2018 at 10:58 AM, Amit Langote
>  wrote:

> > I don't understand why you think it's too troublesome to let the users
> > know that there is some way to use BR triggers with partitioning.  We
> > didn't do that for indexes, for example, before PG 11 introduced the
> > ability to create them on partitioned tables.
> 
> By looking at the index keys it's easy to decide whether two
> indexes are the same. When we add an index on a partitioned table in v11,
> we skip creating an index on the partition if there exists an index
> similar to the one being created. So, a user can have indexes on
> partitions created in v10, upgrade to v11, and create an index on the
> partitioned table. Nothing changes. But that's not true for a trigger.
> It's not easy to check whether two triggers are the same unless the
> underlying function is the same. The user may or may not be using the
> same trigger function for all the partitions, which is more likely if
> the column order differs between partitions. So, if a user has created
> triggers on the partitions in v10 and upgrades to v11, they may have to
> drop them all and recreate the trigger on the partitioned table.

Actually, the column order doesn't matter for a trigger function,
because these don't refer to columns by number but by name.  So unless
users write trigger functions in C and use hardcoded column numbers
(extremely unlikely), I think this is not an issue.  In other words, in
the interesting cases the trigger functions are the same for all
partitions -- therefore upgrading from separate per-partition triggers
to one master trigger in the partitioned table is not going to be that
difficult, ISTM.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Compromised postgresql instances

2018-06-08 Thread Thomas Kellerer
> Please cite actual instances of such reports. Vague queries like this help nobody.

There were several questions on SO 

https://stackoverflow.com/questions/49815460
https://stackoverflow.com/questions/47499766
https://stackoverflow.com/questions/47741077
https://dba.stackexchange.com/questions/184540

And a blog post going into details on how that specific attack works.

https://www.imperva.com/blog/2018/03/deep-dive-database-attacks-scarlett-johanssons-picture-used-for-crypto-mining-on-postgre-database/





--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

2018-06-08 Thread David Rowley
On 9 June 2018 at 06:50, David Rowley  wrote:
> It looks like this was 499be013de6, which was one of mine.
>
> A more simple case to reproduce is:
>
> drop table listp;
> create table listp (a int, b int) partition by list(a);
> create table listp1 partition of listp for values in (1);
> select * from (select * from listp union all select * from listp) t where a = 
> 1;
>
> I'll look in more detail after sleeping.

So it looks like I've assumed that the Append path's partitioned_rels
will only ever be set for partitioned tables, but it can, in fact, be
set for UNION ALL parents too when the union children are partitioned
tables.

As a discussion topic, I've attached a patch which does resolve the
error, but it also disables run-time pruning in this case.

There might be some way we can treat UNION ALL parents differently
when building the PartitionPruneInfos; I just haven't thought of what
that would be yet. If nothing comes to mind, I wonder if this is a rare
enough case not to bother with run-time partition pruning.
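
One conceivable shape for such special treatment (illustrative only, not
the attached patch): only build run-time pruning state when the Append's
parent rel is itself a partitioned table, which a UNION ALL parent is
not.  Field names follow PG11's RelOptInfo:

static bool
rel_supports_runtime_pruning(RelOptInfo *parentrel)
{
	return parentrel->reloptkind == RELOPT_BASEREL &&
		parentrel->part_scheme != NULL;
}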


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


run-time_prune_only_for_partitioned_tables.patch
Description: Binary data