Re: [sqlsmith] Failed assertion in create_gather_path

2018-04-10 Thread Jeevan Chalke
On Mon, Apr 9, 2018 at 5:52 PM, Amit Kapila  wrote:

> On Sun, Apr 8, 2018 at 1:04 PM, Jeevan Chalke
>  wrote:
> > Hi,
> >
> > At some places, I have observed that we are adding a partial path even
> when
> > rel's consider_parallel is false. Due to this, the partial path added has
> > parallel_safe set to false and then later in create_gather_path()
> assertion
> > fails.
> >
>
> Few Comments:
> 1.
> @@ -2196,6 +2196,10 @@ set_subquery_pathlist(PlannerInfo *root,
> RelOptInfo *rel,
> pathkeys, required_outer));
>   }
>
> + /* If parallelism is not possible, return. */
> + if (!rel->consider_parallel || !bms_is_empty(required_outer))
> + return;
>
> In this case shouldn't we set the rel's consider_parallel flag
> correctly rather than avoiding adding the path to it as we do in
> recurse_set_operations?
>

In recurse_set_operations() we are building a new rel and setting its
properties from the final_rel; consider_parallel there is simply copied from
the final_rel.
However, in set_subquery_pathlist(), rel is an input parameter, and we were
adding a partial path to it without looking at its consider_parallel field.
This patch adds that check.

And if we want to set consider_parallel for the rel, that should have been
done before this function is reached. I am not sure exactly where, but not
in this function, I guess.


> 2.
> @@ -331,7 +331,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
>   * to build a partial path for this relation.  But there's no point in
>   * considering any path but the cheapest.
>   */
> - if (final_rel->partial_pathlist != NIL)
> + if (final_rel->consider_parallel && final_rel->partial_pathlist != NIL)
>
> What problem did you see here or is it just for additional safety?
> Ideally, if the consider_parallel is false for a rel, it's partial
> path list should also be NULL.
>

I actually wanted to have rel->consider_parallel in the condition (yes, for
additional safety), since we are adding a partial path to rel. But I then
observed that it is the same as final_rel->consider_parallel, and thus used
it along with the other condition.

I have observed in many places that we check the consider_parallel flag
before adding a partial path to a rel. I added the check here too for
consistency, but yes, it only adds extra safety here.


> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [sqlsmith] Failed assertion on pfree() via perform_pruning_combine_step

2018-04-10 Thread Michael Paquier
On Tue, Apr 10, 2018 at 03:15:47PM +0900, Amit Langote wrote:
> I too thought the same yesterday, as I couldn't see the commit even after
> about 35 minutes Alvaro actually had sent the email to -hackers (quoted
> above).

Bah, of course. Sorry for the noise.
--
Michael




Partitioned tables and covering indexes

2018-04-10 Thread Jaime Casanova
Hi,

Trying covering indexes on partitioned tables, I get this error:
"""
postgres=# create index on t1_part (i) include (t);
ERROR:  cache lookup failed for opclass 0
"""

To reproduce:

create table t1_part (i int, t text) partition by hash (i);
create table t1_part_0 partition of t1_part for values with (modulus
2, remainder 0);
create table t1_part_1 partition of t1_part for values with (modulus
2, remainder 1);
insert into t1_part values (1, repeat('abcdefquerty', 20));

create index on t1_part (i) include (t);

-- 
Jaime Casanova  www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgsql: Store 2PC GID in commit/abort WAL recs for logical decoding

2018-04-10 Thread Michael Paquier
On Mon, Apr 09, 2018 at 03:30:39PM +0300, Heikki Linnakangas wrote:
> There seems to be some confusion on the padding here. Firstly, what's the
> point of zero-padding the GID length to the next MAXALIGN boundary, which
> would be 8 bytes on 64-bit systems, if the start is only guaranteed 4-byte
> alignment, per the comment at the memcpy() above. Secondly, if we're
> memcpying the fields that follow anyway, why bother with any alignment
> padding at all?

It seems to me that you are right here: those complications are not
necessary.

 if (replorigin)
+{
 /* Move LSNs forward for this replication origin */
 replorigin_session_advance(replorigin_session_origin_lsn,
gxact->prepare_end_lsn);
+}
This is better style.

+   /* twophase_gid follows if XINFO_HAS_GID. As a null-terminated string. */
+   /* xl_xact_origin follows if XINFO_HAS_ORIGIN, stored unaligned! */

Worth mentioning that the first one is also unaligned with your patch?
And that all the last fields of xl_xact_commit and xl_xact_abort are
kept as such on purpose?
--
Michael




Re: pgsql: Merge catalog/pg_foo_fn.h headers back into pg_foo.h headers.

2018-04-10 Thread Julien Rouhaud
Hi,

On Mon, Apr 9, 2018 at 10:46 PM, Tom Lane  wrote:
> After further contemplation I decided that that was, in fact, the only
> reasonable way to improve matters.  If we have multiple subdirectories
> independently firing the "make generated-headers" action, then we have
> parallel make hazards of just the same sort I was trying to prevent.
> So it's really an all-or-nothing proposition.  The MAKELEVEL hack
> plus wiring the prerequisite into the recursion rules is the best way
> to make that happen.
>
> Hence, done that way.

Compilation of external extensions using PGXS appears to be broken
since this commit:

make[1]: *** /tmp/pgbuild/lib/postgresql/pgxs/src/makefiles/../../src/backend:
No such file or directory.  Stop.
make: *** 
[/tmp/pgbuild/lib/postgresql/pgxs/src/makefiles/../../src/Makefile.global:370:
submake-generated-headers] Error 2


I think the best fix is to define NO_GENERATED_HEADERS in pgxs.mk,
patch attached.
diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk
index c038836e73..a97ad2e0f5 100644
--- a/src/makefiles/pgxs.mk
+++ b/src/makefiles/pgxs.mk
@@ -60,6 +60,8 @@ $(error pgxs error: makefile variable PGXS or NO_PGXS must be set)
 endif
 endif
 
+# External extensions can't build the headers
+NO_GENERATED_HEADERS=yes
 
 ifdef PGXS
 # We assume that we are in src/makefiles/, so top is ...


crash with sql language partition support function

2018-04-10 Thread Amit Langote
Hi.

I noticed that RelationBuildPartitionKey() is not doing the right thing
with respect to which context it passes to fmgr.c when setting the (cached)
partition support function's FmgrInfo's fn_mcxt.

I noticed the crash while trying to change the partition pruning tests to
use a manually created hash operator class, per [1].

CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
$$SELECT coalesce($1)::int8$$ LANGUAGE sql IMMUTABLE STRICT;

CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);

CREATE OR REPLACE FUNCTION hashtext_length(text, int8) RETURNS int8 AS
$$SELECT length(coalesce($1))::int8$$ LANGUAGE sql IMMUTABLE STRICT;

CREATE OPERATOR CLASS test_text_ops FOR TYPE text USING HASH AS
OPERATOR 1 = , FUNCTION 2 hashtext_length(text, int8);

create table hp (a int, b text) partition by hash (a test_int4_ops, b
test_text_ops);
create table hp0 partition of hp for values with (modulus 4, remainder 0);
create table hp3 partition of hp for values with (modulus 4, remainder 3);
create table hp1 partition of hp for values with (modulus 4, remainder 1);
create table hp2 partition of hp for values with (modulus 4, remainder 2);

insert into hp values (1, 'abcde');
INSERT 0 1
Time: 13.604 ms
postgres=# insert into hp values (null, 'abcde');
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

The attached patch fixes it.  It teaches RelationBuildPartitionKey() to use
fmgr_info_cxt and pass rd_partkeycxt to it.

Since this bug also exists in the released PG 10 branch, I also created a
patch for that.  It's slightly different than the one for PG 11dev,
because there were some changes recently to how the memory context is
manipulated in RelationBuildPartitionKey.  That's
v1-PG10-0001-Fix-a-memory-context-bug-in-RelationBuildPartitio.patch.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmoZ0D5kJbt8eKXtvVdvTcGGWn6ehWCRSZbWytD-uzH92mQ%40mail.gmail.com
From 81d8337760be175b37b5ffe84df06903173787c5 Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 10 Apr 2018 15:38:19 +0900
Subject: [PATCH v1] Fix a memory context bug in RelationBuildPartitionKey

We should pass rd_partkeycxt to set fn_mcxt, not CurrentMemoryContex.
---
 src/backend/utils/cache/relcache.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/cache/relcache.c 
b/src/backend/utils/cache/relcache.c
index a69b078f91..be7d5b7eef 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -849,6 +849,10 @@ RelationBuildPartitionKey(Relation relation)
if (!HeapTupleIsValid(tuple))
return;
 
+   partkeycxt = AllocSetContextCreate(CacheMemoryContext,
+  
RelationGetRelationName(relation),
+  
ALLOCSET_SMALL_SIZES);
+
key = (PartitionKey) palloc0(sizeof(PartitionKeyData));
 
/* Fixed-length attributes */
@@ -951,7 +955,7 @@ RelationBuildPartitionKey(Relation relation)
 BTORDER_PROC, opclassform->opcintype, 
opclassform->opcintype,
 opclassform->opcfamily);
 
-   fmgr_info(funcid, &key->partsupfunc[i]);
+   fmgr_info_cxt(funcid, &key->partsupfunc[i], partkeycxt);
 
/* Collation */
key->partcollation[i] = collation->values[i];
@@ -985,9 +989,6 @@ RelationBuildPartitionKey(Relation relation)
ReleaseSysCache(tuple);
 
/* Success --- now copy to the cache memory */
-   partkeycxt = AllocSetContextCreate(CacheMemoryContext,
-  
RelationGetRelationName(relation),
-  
ALLOCSET_SMALL_SIZES);
relation->rd_partkeycxt = partkeycxt;
oldcxt = MemoryContextSwitchTo(relation->rd_partkeycxt);
relation->rd_partkey = copy_partition_key(key);
-- 
2.11.0

From 57056b1ca511ef549ee3f28a8e17d2bdfbf757a5 Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 10 Apr 2018 15:38:19 +0900
Subject: [PATCH v1] Fix a memory context bug in RelationBuildPartitionKey

We should pass rd_partkeycxt to set fn_mcxt, not CurrentMemoryContex.
---
 src/backend/utils/cache/relcache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/cache/relcache.c 
b/src/backend/utils/cache/relcache.c
index e81c4691ec..2a74601fc1 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1034,7 +1034,7 @@ RelationBuildPartitionKey(Relation relation)
procnum,
  

Re: crash with sql language partition support function

2018-04-10 Thread Amit Langote
I have added this to the Older Bugs section of the open items page.

https://wiki.postgresql.org/wiki/PostgreSQL_11_Open_Items#Older_Bugs

Thanks,
Amit




power() function in Windows: "value out of range: underflow"

2018-04-10 Thread Huong Dangminh
Hi,

There are some cases where the power() function does not work
correctly with 'NaN' arguments in a Windows environment.
Something like,

postgres=# select power('NaN',11);
ERROR:  value out of range: underflow
postgres=# select power('NaN','NaN');
ERROR:  value out of range: underflow
postgres=# select power(11,'NaN');
ERROR:  value out of range: underflow

In Linux environment, instead of ERROR it returns 'NaN'.

The reason is that when pow() is called from float.c:dpow() with 'NaN'
arguments, pow() returns 'NaN', but in the Windows environment errno is set
to EDOM (domain error).
So PostgreSQL updates "result" and raises an ERROR in the CHECKFLOATVAL
macro.

I think it should return 'NaN' in all of the above cases.
I have tried to create a patch to fix it.
Please see the attached file.


---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/


power_NaN.PATCH
Description: power_NaN.PATCH


Re: [HACKERS] path toward faster partition pruning

2018-04-10 Thread Amit Langote
On 2018/04/10 13:27, Ashutosh Bapat wrote:
> On Mon, Apr 9, 2018 at 8:56 PM, Robert Haas  wrote:
>> On Fri, Apr 6, 2018 at 11:41 PM, Tom Lane  wrote:
>>> David Rowley  writes:
 Sounds like you're saying that if we have too many alternative files
 then there's a chance that one could pass by luck.
>>>
>>> Yeah, exactly: it passed, but did it pass for the right reason?
>>>
>>> If there's just two expected-files, it's likely not a big problem,
>>> but if you have a bunch it's something to worry about.
>>>
>>> I'm also wondering how come we had hash partitioning before and
>>> did not have this sort of problem.  Is it just that we added a
>>> new test that's more sensitive to the details of the hashing
>>> (if so, could it be made less so)?  Or is there actually more
>>> platform dependence now than before (and if so, why is that)?
>>
>> The existing hash partitioning tests did have some dependencies on the
>> hash function, but they took care not to use the built-in hash
>> functions.  Instead they did stuff like this:
>>
>> CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
>> $$SELECT coalesce($1,0)::int8$$ LANGUAGE sql IMMUTABLE;
>> CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
>> OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);
>> CREATE TABLE mchash (a int, b text, c jsonb)
>>   PARTITION BY HASH (a test_int4_ops, b test_text_ops);
>>
>> I think that this approach should also be used for the new tests.
>> Variant expected output files are a pain to maintain, and you
>> basically just have to take whatever output you get as the right
>> answer, because nobody knows what output a certain built-in hash
>> function should produce for a given input except by running the code.
>> If you do the kind of thing shown above, though, then you can easily
>> see by inspection that you're getting the right answer.

Thanks for the idea.  I think it makes sense, and I also agree that the
alternative-outputs approach is not perfectly reliable or maintainable.

> +1.

Attached is a patch that rewrites the hash partition pruning tests that
way.  It creates two hash operator classes, one for int4 and another for
text, and uses them to create the hash-partitioned table used in the tests,
as done in the existing tests in hash_part.sql.  Since that (hopefully)
makes the tests reliably return the same results, I no longer see the need
to keep them in a separate partition_prune_hash.sql.  The reasoning behind
having the separate file was to keep the alternative output file small, as
David explained in [1].

However, I noticed that there is a bug in RelationBuildPartitionKey that
causes a crash when using a SQL function as the partition support function,
as the revised tests do, so please refer to and apply the patches I posted
here before running the revised tests:

https://www.postgresql.org/message-id/3041e853-b1dd-a0c6-ff21-7cc5633bffd0%40lab.ntt.co.jp

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CAKJS1f-SON_hAekqoV4_WQwJBtJ_rvvSe68jRNhuYcXqQ8PoQg%40mail.gmail.com
From c1508fc715a7783108f626c67c76fcc1f2303719 Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 10 Apr 2018 16:06:33 +0900
Subject: [PATCH v1] Rewrite hash partition pruning tests to use custom opclass

Relying on platform-provided hashing functions makes tests unreliable
as shown by buildfarm recently.

This adds adjusted tests to partition_prune.sql itself and hence
partition_prune_hash.sql is deleted along with two expected output
files.

Discussion: 
https://postgr.es/m/CA%2BTgmoZ0D5kJbt8eKXtvVdvTcGGWn6ehWCRSZbWytD-uzH92mQ%40mail.gmail.com
---
 src/test/regress/expected/partition_prune.out  | 202 -
 src/test/regress/expected/partition_prune_hash.out | 189 ---
 .../regress/expected/partition_prune_hash_1.out| 187 ---
 src/test/regress/parallel_schedule |   2 +-
 src/test/regress/serial_schedule   |   1 -
 src/test/regress/sql/partition_prune.sql   |  59 +-
 src/test/regress/sql/partition_prune_hash.sql  |  41 -
 7 files changed, 259 insertions(+), 422 deletions(-)
 delete mode 100644 src/test/regress/expected/partition_prune_hash.out
 delete mode 100644 src/test/regress/expected/partition_prune_hash_1.out
 delete mode 100644 src/test/regress/sql/partition_prune_hash.sql

diff --git a/src/test/regress/expected/partition_prune.out 
b/src/test/regress/expected/partition_prune.out
index df3fca025e..935e7dc79b 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1330,7 +1330,207 @@ explain (costs off) select * from rparted_by_int2 where 
a > 100;
  Filter: (a > '100'::bigint)
 (3 rows)
 
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, 
coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+--
+-- Test Partition pruning for HASH partitioning
+-- We roll our own operator classes to use f

Re: pgsql: Merge catalog/pg_foo_fn.h headers back into pg_foo.h headers.

2018-04-10 Thread Michael Paquier
On Tue, Apr 10, 2018 at 10:08:16AM +0200, Julien Rouhaud wrote:
> Compilation of external extensions using PGXS appears to be broken
> since this commit:
> 
> make[1]: *** /tmp/pgbuild/lib/postgresql/pgxs/src/makefiles/../../src/backend:
> No such file or directory.  Stop.
> make: *** 
> [/tmp/pgbuild/lib/postgresql/pgxs/src/makefiles/../../src/Makefile.global:370:
> submake-generated-headers] Error 2

Indeed.  Any external module is blowing up...
--
Michael




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2018-04-10 Thread Masahiko Sawada
On Fri, Mar 30, 2018 at 4:43 PM, Michael Paquier  wrote:
> On Thu, Mar 01, 2018 at 04:01:28PM -0500, Robert Haas wrote:
>> If you have a clever idea how to make this work with as few atomic
>> operations as the current patch uses while at the same time reducing
>> the possibility of contention, I'm all ears.  But I don't see how to
>> do that.
>
> This thread has no activity since the beginning of the commit fest, and
> it seems that it would be hard to reach something committable for v11,
> so I am marking it as returned with feedback.

Thank you.

The probability of performance degradation can be reduced by increasing
N_RELEXTLOCK_ENTS. But as Robert mentioned, while keeping the
implementation fast and simple, acquiring the lock with just a few atomic
operations, it's hard to improve, or even just preserve, the current
performance in all cases. I was thinking that this patch would be needed
for parallel DML operations and parallel vacuum, but if the community
cannot accept this approach it might be better to mark it as "Rejected",
and then I should reconsider the design of parallel vacuum.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Partitioned tables and covering indexes

2018-04-10 Thread Amit Langote
On 2018/04/10 16:07, Jaime Casanova wrote:
> Hi,
> 
> Trying covering indexes on partitioned tables i get this error
> """
> postgres=# create index on t1_part (i) include (t);
> ERROR:  cache lookup failed for opclass 0
> """
> 
> To reproduce:
> 
> create table t1_part (i int, t text) partition by hash (i);
> create table t1_part_0 partition of t1_part for values with (modulus
> 2, remainder 0);
> create table t1_part_1 partition of t1_part for values with (modulus
> 2, remainder 1);
> insert into t1_part values (1, repeat('abcdefquerty', 20));
> 
> create index on t1_part (i) include (t);

It seems that the bug is caused by the original IndexStmt that
DefineIndex receives being overwritten when processing the INCLUDE
columns.  Also, a copy of the original should be used when recursing to
define the index on partitions.

Does the attached fix look correct?  I haven't checked the fix with ATTACH
PARTITION though.
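
For instance, a detach/re-attach round trip on the same setup might be a
reasonable extra check (just a sketch, not run here):

alter table t1_part detach partition t1_part_1;
alter table t1_part attach partition t1_part_1
  for values with (modulus 2, remainder 1);
\d t1_part_1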

Maybe add this to the open items list?

Thanks,
Amit
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 860a60d109..109d48bb2f 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -366,6 +366,7 @@ DefineIndex(Oid relationId,
LOCKMODElockmode;
Snapshotsnapshot;
int i;
+   IndexStmt  *orig_stmt = copyObject(stmt);
 
if (list_intersection(stmt->indexParams, stmt->indexIncludingParams) != 
NIL)
ereport(ERROR,
@@ -886,8 +887,8 @@ DefineIndex(Oid relationId,
memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
 
parentDesc = CreateTupleDescCopy(RelationGetDescr(rel));
-   opfamOids = palloc(sizeof(Oid) * numberOfAttributes);
-   for (i = 0; i < numberOfAttributes; i++)
+   opfamOids = palloc(sizeof(Oid) * numberOfKeyAttributes);
+   for (i = 0; i < numberOfKeyAttributes; i++)
opfamOids[i] = 
get_opclass_family(classObjectId[i]);
 
heap_close(rel, NoLock);
@@ -987,11 +988,11 @@ DefineIndex(Oid relationId,
 */
if (!found)
{
-   IndexStmt  *childStmt = 
copyObject(stmt);
+   IndexStmt  *childStmt = 
copyObject(orig_stmt);
boolfound_whole_row;
 
childStmt->whereClause =
-   
map_variable_attnos(stmt->whereClause, 1, 0,
+   
map_variable_attnos(orig_stmt->whereClause, 1, 0,

attmap, maplen,

InvalidOid, &found_whole_row);
if (found_whole_row)


submake-errcodes

2018-04-10 Thread Devrim Gündüz

Hi,

I used to run

"
cd src/backend
make submake-errcodes
"

in the RPM spec file, but it looks like it was removed recently. Was it
replaced with something else, or removed completely?

Regards,
-- 
Devrim Gündüz
EnterpriseDB: https://www.enterprisedb.com
PostgreSQL Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR



Re: submake-errcodes

2018-04-10 Thread Christoph Berg
Re: Devrim Gündüz 2018-04-10 <1523353963.8169.26.ca...@gunduz.org>
> I used to run
> 
> "
> cd src/backend
> make submake-errcodes
> "
> 
> in the RPM spec file, but looks like it was removed recently. Is that replaced
> with something else, or removed completely?

It is gone:
https://git.postgresql.org/pg/commitdiff/3b8f6e75f3c8c6d192621f21624cc8cee04ec3cb

... but that doesn't seem to work transparently either, in an out-of-tree build:

   debian/rules override_dh_auto_build-arch
make[1]: Verzeichnis „/home/cbe/projects/postgresql/pg/master“ wird betreten
/usr/bin/make -C build-py3/src/pl/plpython
make[2]: Verzeichnis 
„/home/cbe/projects/postgresql/pg/master/build-py3/src/pl/plpython“ wird 
betreten
/usr/bin/msgfmt -c -o po/cs.mo 
/home/cbe/projects/postgresql/pg/master/build-py3/../src/pl/plpython/po/cs.po
/usr/bin/msgfmt -c -o po/de.mo 
/home/cbe/projects/postgresql/pg/master/build-py3/../src/pl/plpython/po/de.po
/usr/bin/msgfmt -c -o po/es.mo 
/home/cbe/projects/postgresql/pg/master/build-py3/../src/pl/plpython/po/es.po
/usr/bin/msgfmt -c -o po/fr.mo 
/home/cbe/projects/postgresql/pg/master/build-py3/../src/pl/plpython/po/fr.po
/usr/bin/msgfmt -c -o po/it.mo 
/home/cbe/projects/postgresql/pg/master/build-py3/../src/pl/plpython/po/it.po
/usr/bin/msgfmt -c -o po/ja.mo 
/home/cbe/projects/postgresql/pg/master/build-py3/../src/pl/plpython/po/ja.po
/usr/bin/msgfmt -c -o po/ko.mo 
/home/cbe/projects/postgresql/pg/master/build-py3/../src/pl/plpython/po/ko.po
/usr/bin/msgfmt -c -o po/pl.mo 
/home/cbe/projects/postgresql/pg/master/build-py3/../src/pl/plpython/po/pl.po
/usr/bin/msgfmt -c -o po/pt_BR.mo 
/home/cbe/projects/postgresql/pg/master/build-py3/../src/pl/plpython/po/pt_BR.po
/usr/bin/msgfmt -c -o po/ru.mo 
/home/cbe/projects/postgresql/pg/master/build-py3/../src/pl/plpython/po/ru.po
/usr/bin/msgfmt -c -o po/sv.mo 
/home/cbe/projects/postgresql/pg/master/build-py3/../src/pl/plpython/po/sv.po
/usr/bin/msgfmt -c -o po/zh_CN.mo 
/home/cbe/projects/postgresql/pg/master/build-py3/../src/pl/plpython/po/zh_CN.po
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement 
-Wendif-labels -Wmissing-format-attribute -Wformat-security 
-fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -g -O2 
-fdebug-prefix-map=/home/cbe/projects/postgresql/pg/master=. 
-fstack-protector-strong -Wformat -Werror=format-security 
-fno-omit-frame-pointer -fPIC -I. 
-I/home/cbe/projects/postgresql/pg/master/build-py3/../src/pl/plpython 
-I/usr/include/python3.6m -I../../../src/include 
-I/home/cbe/projects/postgresql/pg/master/build-py3/../src/include  -Wdate-time 
-D_FORTIFY_SOURCE=2 -D_GNU_SOURCE   -c -o plpy_cursorobject.o 
/home/cbe/projects/postgresql/pg/master/build-py3/../src/pl/plpython/plpy_cursorobject.c
In file included from 
/home/cbe/projects/postgresql/pg/master/build-py3/../src/include/postgres.h:47:0,
 from 
/home/cbe/projects/postgresql/pg/master/build-py3/../src/pl/plpython/plpy_cursorobject.c:7:
/home/cbe/projects/postgresql/pg/master/build-py3/../src/include/utils/elog.h:71:10:
 fatal error: utils/errcodes.h: Datei oder Verzeichnis nicht gefunden
 #include "utils/errcodes.h"
  ^~
compilation terminated.
make[2]: *** [: plpy_cursorobject.o] Fehler 1
make[2]: Verzeichnis 
„/home/cbe/projects/postgresql/pg/master/build-py3/src/pl/plpython“ wird 
verlassen

Christoph



Re: [HACKERS] Replication status in logical replication

2018-04-10 Thread Masahiko Sawada
On Fri, Mar 30, 2018 at 5:37 AM, Fujii Masao  wrote:
> On Wed, Jan 17, 2018 at 2:16 AM, Simon Riggs  wrote:
>> On 16 January 2018 at 06:21, Michael Paquier  
>> wrote:
>>> On Tue, Jan 16, 2018 at 10:40:43AM +0900, Masahiko Sawada wrote:
 On Sun, Jan 14, 2018 at 12:43 AM, Simon Riggs  
 wrote:
> On 9 January 2018 at 04:36, Masahiko Sawada  wrote:
> We're not talking about standbys, so the message is incorrect.

 Ah, I understood. How about "\"%s\"  has now caught up with upstream
 server"?
>>>
>>> +1.
>>
>> upstream is what I would have suggested, so +1 here also.
>>
>> Will commit.
>
> ping?
>

Should I add this item to the "Older Bugs" section of the open items list,
since we regarded it as a bug?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Bugs in TOAST handling, OID assignment and redo recovery

2018-04-10 Thread Pavan Deolasee
Hello,

One of our 2ndQuadrant support customers recently reported a sudden rush of
TOAST errors following crash recovery, nearly causing an outage. Most errors
read like this:

ERROR: unexpected chunk number 0 (expected 1) for toast value 

While we could quickly bring the cluster back to normal using some
workarounds, I investigated this in more detail and identified two
long-standing bugs, in TOAST handling as well as in redo recovery.

Immediately after crash recovery, the server started issuing OIDs that
conflicted with the OIDs issued just prior to the crash. The XLOG_NEXTOID
and buffer LSN interlock is supposed to guard against that, but a bug in
redo recovery breaks that protection. We shall see that in a minute.

When duplicate OIDs are issued, either because of a bug in redo recovery or
because of OID counter wraparound, GetNewOidWithIndex() is supposed to
protect us against that by first scanning the toast table to check if a
duplicate chunk_id or toast value already exists. If so, a new OID is
requested and we repeat until a non-conflicting OID is found.

This mostly works, except in one corner case. GetNewOidWithIndex() uses
SnapshotDirty to look for a conflicting OID. But this fails if the existing
toast tuple was recently deleted by another committed transaction. If the
OldestXmin has not advanced enough, such tuples are reported as
RECENTLY_DEAD and fail the SnapshotDirty qualifications.
So toast_save_datum() happily goes ahead and inserts a conflicting toast
tuple. The UNIQUE index on  does not complain because
it sees the other tuple as RECENTLY_DEAD too. At this point, we have two
toast tuples with the same  - one of these is
RECENTLY_DEAD and the other is LIVE. Later, when toast_fetch_datum() and
friends scan the toast table for the chunk_id (toast value), they use
SnapshotToast to retrieve the tuple. This special snapshot doesn't do
any visibility checks on the toast tuple, assuming that since the main heap
tuple is visible, the toast tuple(s) should be visible too. This snapshot
unfortunately also sees the RECENTLY_DEAD toast tuple and thus returns
duplicate toast tuples, leading to errors such as the above. This same
issue can also lead to other kinds of errors in tuptoaster.c.

ISTM that the right fix is to teach toast_save_datum() to check for
existence of duplicate chunk_id by scanning the table with the same
SnapshotToast that it later uses to fetch the tuples. We already do that in
case of toast rewrite, but not for regular inserts. I propose to do that
for regular path too, as done in the attached patch.

Duplicate OIDs may be generated when the counter wraps around, and that can
cause this problem:

- OID counter wraps around
- Old toast tuple is deleted and the deleting transaction commits
- OldestXmin is held back by an open transaction
- The same OID is then inserted again into the toast table
- The toasted value is fetched before the table is vacuumed or the index
pointer is killed. This leads to duplicates being returned.

This is probably quite a corner case, but not impossible given that we
already have reports of such problems. But the problem gets accentuated in
the case of crash recovery or on a hot standby. This is because of another,
somewhat unrelated bug which manifested in our case.

An interesting case to consider is the following sequence of events:

1. ONLINE checkpoint starts. We note down the nextOid counter, after
rightly accounting for the cached values too. CreateCheckPoint() does this:

LWLockAcquire(OidGenLock, LW_SHARED);
checkPoint.nextOid = ShmemVariableCache->nextOid;
if (!shutdown)
checkPoint.nextOid += ShmemVariableCache->oidCount;
LWLockRelease(OidGenLock);

2. The cached OIDs are then consumed, and a new set of OIDs are allocated,
generating XLOG_NEXTOID WAL record.

3. More OIDs are consumed from the newly allocated range. Those OIDs make
it to disk after following the buffer LSN interlock, so the XLOG_NEXTOID
record gets flushed to disk as well.

4. The checkpoint finishes and writes an ONLINE checkpoint record with the
nextOid value saved in step 1.

5. CRASH

The redo recovery starts at some LSN written before step 1. It initialises
the ShmemVariableCache->nextOid with the value stored in the checkpoint
record and starts replaying WAL records. When it sees the XLOG_NEXTOID WAL
record generated at step 2, it correctly advances the OID counter to cover
the next range.

But while applying ONLINE checkpoint written at step 4, we do this:

/* ... but still treat OID counter as exact */
LWLockAcquire(OidGenLock, LW_EXCLUSIVE);
ShmemVariableCache->nextOid = checkPoint.nextOid;
ShmemVariableCache->oidCount = 0;
LWLockRelease(OidGenLock);

So effectively we overwrite the nextOid counter and drag it backwards
again. If we don't see another XLOG_NEXTOID record before the crash
recovery ends, then we shall start the server with a stale value,
generating duplicate OIDs immediately after the restart.

I don't think this i

Re: Custom PGC_POSTMASTER GUC variables ... feasible?

2018-04-10 Thread Sergei Kornilov
Hello
See, for example, the contrib extension pg_stat_statements: its setting
pg_stat_statements.max is defined as PGC_POSTMASTER.
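
A quick way to see the PGC_POSTMASTER behaviour from SQL (a sketch, assuming
pg_stat_statements is already listed in shared_preload_libraries):

SHOW pg_stat_statements.max;
-- Being PGC_POSTMASTER, the value cannot be changed in a running session;
-- a new value (e.g. set via ALTER SYSTEM) only takes effect after a restart.
ALTER SYSTEM SET pg_stat_statements.max = 10000;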

regards, Sergei



Re: Boolean partitions syntax

2018-04-10 Thread Kyotaro HORIGUCHI
Hello.

Note: This is not intended to be committed this time but just for
information.

At Tue, 10 Apr 2018 10:34:27 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20180410.103427.244142052.horiguchi.kyot...@lab.ntt.co.jp>
> Just adding negation would work as a_expr is doing.
> 
> > | '-' a_expr%prec UMINUS
> > { $$ = doNegate($2, @1); }

a_expr fits partbound_datum_list as is, but it cannot sit
side-by-side with MAXVALUE/MINVALUE at all. However, c_expr can if
columnref is excluded. The attached patch does that, and the partition
bound then accepts the following syntax. (I didn't look at the transform
side at all.)

create table p2c1 partition of p2 for values from (log(1000),0+1,0/1) to (10, 
10, ('1'||'0')::int);

=# \d p2c1
...
Partition of: p2 FOR VALUES FROM (3, 1, 0) TO (10, 10, 10)


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index dd0c26c11b..2a3d6a3bb8 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -180,6 +180,8 @@ static Node *makeXmlExpr(XmlExprOp op, char *name, List *named_args,
 static List *mergeTableFuncParameters(List *func_args, List *columns);
 static TypeName *TableFuncTypeName(List *columns);
 static RangeVar *makeRangeVarFromAnyName(List *names, int position, core_yyscan_t yyscanner);
+static Node *makePartRangeDatum(PartitionRangeDatumKind kind, Node *value,
+int location);
 static void SplitColQualList(List *qualList,
 			 List **constraintList, CollateClause **collClause,
 			 core_yyscan_t yyscanner);
@@ -472,7 +474,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	columnDef columnOptions
 %type 	def_elem reloption_elem old_aggr_elem operator_def_elem
 %type 	def_arg columnElem where_clause where_or_current_clause
-a_expr b_expr c_expr AexprConst indirection_el opt_slice_bound
+a_expr l_expr b_expr b0_expr c_expr c0_expr AexprConst indirection_el opt_slice_bound
 columnref in_expr having_clause func_table xmltable array_expr
 ExclusionWhereClause operator_def_arg
 %type 	rowsfrom_item rowsfrom_list opt_col_def_list
@@ -585,7 +587,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	part_elem
 %type 		part_params
 %type  PartitionBoundSpec
-%type 		partbound_datum PartitionRangeDatum
+%type 		PartitionRangeDatum
 %type 		hash_partbound partbound_datum_list range_datum_list
 %type 		hash_partbound_elem
 
@@ -2804,15 +2806,9 @@ hash_partbound:
 			}
 		;
 
-partbound_datum:
-			Sconst			{ $$ = makeStringConst($1, @1); }
-			| NumericOnly	{ $$ = makeAConst($1, @1); }
-			| NULL_P		{ $$ = makeNullAConst(@1); }
-		;
-
 partbound_datum_list:
-			partbound_datum		{ $$ = list_make1($1); }
-			| partbound_datum_list ',' partbound_datum
+			a_expr		{ $$ = list_make1($1); }
+			| partbound_datum_list ',' a_expr
 { $$ = lappend($1, $3); }
 		;
 
@@ -2825,33 +2821,18 @@ range_datum_list:
 PartitionRangeDatum:
 			MINVALUE
 {
-	PartitionRangeDatum *n = makeNode(PartitionRangeDatum);
-
-	n->kind = PARTITION_RANGE_DATUM_MINVALUE;
-	n->value = NULL;
-	n->location = @1;
-
-	$$ = (Node *) n;
+	$$ = makePartRangeDatum(PARTITION_RANGE_DATUM_MINVALUE,
+			NULL, @1);
 }
 			| MAXVALUE
 {
-	PartitionRangeDatum *n = makeNode(PartitionRangeDatum);
-
-	n->kind = PARTITION_RANGE_DATUM_MAXVALUE;
-	n->value = NULL;
-	n->location = @1;
-
-	$$ = (Node *) n;
+	$$ = makePartRangeDatum(PARTITION_RANGE_DATUM_MAXVALUE,
+			NULL, @1);
 }
-			| partbound_datum
+			| l_expr
 {
-	PartitionRangeDatum *n = makeNode(PartitionRangeDatum);
-
-	n->kind = PARTITION_RANGE_DATUM_VALUE;
-	n->value = $1;
-	n->location = @1;
-
-	$$ = (Node *) n;
+	$$ = makePartRangeDatum(PARTITION_RANGE_DATUM_VALUE,
+			$1, @1);
 }
 		;
 
@@ -13478,9 +13459,20 @@ a_expr:		c_expr	{ $$ = $1; }
  * cause trouble in the places where b_expr is used.  For simplicity, we
  * just eliminate all the boolean-keyword-operator productions from b_expr.
  */
-b_expr:		c_expr
-{ $$ = $1; }
-			| b_expr TYPECAST Typename
+b_expr:		c_expr { $$ = $1; }
+			| b0_expr { $$ = $1; }
+		;
+
+/*
+ * l_expr is a subset of b_expr so as to fit in comma-separated list based on
+ * b_expr.
+ */
+l_expr:		c0_expr { $$ = $1; }
+			| b0_expr { $$ = $1; }
+		;
+
+/* common part of b_expr and l_expr */
+b0_expr: b_expr TYPECAST Typename
 { $$ = makeTypeCast($1, $3, @2); }
 			| '+' b_expr	%prec UMINUS
 { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "+", NULL, $2, @1); }
@@ -13554,7 +13546,11 @@ b_expr:		c_expr
  * ambiguity to the b_expr syntax.
  */
 c_expr:		columnref{ $$ = $1; }
-			| AexprConst			{ $$ = $1; }
+			| c0_expr{ $$ = $1; }
+		;
+
+/* common part of c_expr and l_expr */
+c0_expr:		 AexprConst			{ $$ = $1; }
 			| PARA

Re: Transform for pl/perl

2018-04-10 Thread Dagfinn Ilmari Mannsåker
Tom Lane  writes:

> ilm...@ilmari.org (Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?=) writes:
>> Tom Lane  writes:
>>> I think you'd have to convert to text and back.  That's kind of icky,
>>> but it beats failing.
>
>> I had a look, and that's what the PL/Python transform does.  Attached is
>> a patch that does that for PL/Perl too, but only if the value is
>> actually > PG_INT64_MAX.
>
>> The secondary output files are for Perls with 32bit IV/UV types, but I
>> haven't been able to test them, since Debian's Perl uses 64bit integers
>> even on 32bit platforms.
>
> Ugh.  I really don't want to maintain a separate expected-file for this,
> especially not if it's going to be hard to test.  Can we choose another
> way of exercising the code path?

How about a plperl function that returns ~0 as numeric, and doing

select testuvtojsonb()::numeric = plperlu_maxuint();

in the test?
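
Something along these lines, perhaps (a rough sketch; plperlu_maxuint is the
helper suggested above, and testuvtojsonb stands for whatever the test
function ends up being called):

CREATE FUNCTION plperlu_maxuint() RETURNS numeric
LANGUAGE plperlu IMMUTABLE
AS $$ return ~0; $$;

SELECT testuvtojsonb()::numeric = plperlu_maxuint();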

> Another issue with this code as written is that on 32-bit-UV platforms,
> at least some vompilers will give warnings about the constant-false
> predicate.  Not sure about a good solution for that.  Maybe it's a
> sufficient reason to invent uint8_numeric so we don't need a range check.

Yes, that does push the needle towards it being worth doing.

While playing around some more with the extension, I discovered a few
more issues:

1) both the jsonb_plperl and jsonb_plperlu extensions contain the SQL
   functions jsonb_to_plperl and plperl_to_jsonb, so can't be installed
   simultaneously

2) jsonb scalar values are passed to the plperl function wrapped in not
   one, but _two_ layers of references

3) jsonb numeric values are passed as perl's NV (floating point) type,
   losing precision if they're integers that would fit in an IV or UV.

4) SV_to_JsonbValue() throws an error for infinite NVs, but not NaNs

Attached is a patch for the first issue.  I'll look at fixing the rest
later this week.

- ilmari
-- 
"The surreality of the universe tends towards a maximum" -- Skud's Law
"Never formulate a law or axiom that you're not prepared to live with
 the consequences of."  -- Skud's Meta-Law

>From cf5771f98aa33bc7f12054d65857a3cc347465dc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Tue, 10 Apr 2018 10:57:50 +0100
Subject: [PATCH] Fix clashing function names bettween jsonb_plperl and
 jsonb_plperlu

---
 contrib/jsonb_plperl/jsonb_plperlu--1.0.sql | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/contrib/jsonb_plperl/jsonb_plperlu--1.0.sql b/contrib/jsonb_plperl/jsonb_plperlu--1.0.sql
index 99b8644e3b..5a5e475ad3 100644
--- a/contrib/jsonb_plperl/jsonb_plperlu--1.0.sql
+++ b/contrib/jsonb_plperl/jsonb_plperlu--1.0.sql
@@ -3,17 +3,17 @@
 -- complain if script is sourced in psql, rather than via CREATE EXTENSION
 \echo Use "CREATE EXTENSION jsonb_plperlu" to load this file. \quit
 
-CREATE FUNCTION jsonb_to_plperl(val internal) RETURNS internal
+CREATE FUNCTION jsonb_to_plperlu(val internal) RETURNS internal
 LANGUAGE C STRICT IMMUTABLE
-AS 'MODULE_PATHNAME';
+AS 'MODULE_PATHNAME', 'jsonb_to_plperl';
 
-CREATE FUNCTION plperl_to_jsonb(val internal) RETURNS jsonb
+CREATE FUNCTION plperlu_to_jsonb(val internal) RETURNS jsonb
 LANGUAGE C STRICT IMMUTABLE
-AS 'MODULE_PATHNAME';
+AS 'MODULE_PATHNAME', 'plperl_to_jsonb';
 
 CREATE TRANSFORM FOR jsonb LANGUAGE plperlu (
-FROM SQL WITH FUNCTION jsonb_to_plperl(internal),
-TO SQL WITH FUNCTION plperl_to_jsonb(internal)
+FROM SQL WITH FUNCTION jsonb_to_plperlu(internal),
+TO SQL WITH FUNCTION plperlu_to_jsonb(internal)
 );
 
 COMMENT ON TRANSFORM FOR jsonb LANGUAGE plperlu IS 'transform between jsonb and Perl';
-- 
2.17.0



Re: [HACKERS] path toward faster partition pruning

2018-04-10 Thread David Rowley
On 10 April 2018 at 20:56, Amit Langote  wrote:
> On 2018/04/10 13:27, Ashutosh Bapat wrote:
>> On Mon, Apr 9, 2018 at 8:56 PM, Robert Haas  wrote:
>>> CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
>>> $$SELECT coalesce($1,0)::int8$$ LANGUAGE sql IMMUTABLE;
>>> CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
>>> OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);
>>> CREATE TABLE mchash (a int, b text, c jsonb)
>>>   PARTITION BY HASH (a test_int4_ops, b test_text_ops);
>
> Thanks for the idea.  I think it makes sense and also agree that alternate
> outputs approach is not perfectly reliable and maintainable.
>
>> +1.
>
> Attached find a patch that rewrites hash partition pruning tests that
> away.  It creates two hash operator classes, one for int4 and another for
> text type and uses them to create hash partitioned table to be used in the
> tests, like done in the existing tests in hash_part.sql.  Since that makes
> tests (hopefully) reliably return the same result always, I no longer see
> the need to keep them in a separate partition_prune_hash.sql.  The
> reasoning behind having the separate file was to keep the alternative
> output file small as David explained in [1].
> [1]
> https://www.postgresql.org/message-id/CAKJS1f-SON_hAekqoV4_WQwJBtJ_rvvSe68jRNhuYcXqQ8PoQg%40mail.gmail.com

I had a quick look, but I'm still confused about why a function like
hash_uint32_extended() is susceptible to varying results depending on
CPU endianness but hash_combine64 is not.

Apart from that confusion, looking at the patch:

+CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
+$$SELECT coalesce($1)::int8$$ LANGUAGE sql IMMUTABLE STRICT;
+CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
+OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
+CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
+$$SELECT length(coalesce($1))::int8$$ LANGUAGE sql IMMUTABLE STRICT;


Why coalesce here? Maybe I've not thought of something, but coalesce
only seems useful to me if there's > 1 argument. Plus the function is
strict, so I'm not sure it's really doing anything even if you added a default.

I know this one was there before, but I only just noticed it:

+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;

The comment is a bit misleading given the first test below it is
testing for nulls. Maybe it can be changed to

+-- pruning should work if values or is null clauses are provided for
all partition keys.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [PATCH] pg_hba.conf : new auth option : clientcert=verify-full

2018-04-10 Thread Julian Markwort
On Fri, 2018-04-06 at 20:31 +0200, Magnus Hagander wrote:
> I've been through this one again.
Thanks for taking the time!

> There is one big omission from it -- it fails to work with the view
> pg_hba_file_rules. When fixing that, things started to look kind of
> ugly with the "two booleans to indicate one thing", so I went ahead
> and changed it to instead be an enum of 3 values.
Oh, I missed that view. To be honest, it wasn't even on my mind that
there could be a view depending on pg_hba.

> Also, now when using verify-full or verify-ca, I figured its a lot
> easier to misspell the value, and we don't want that to mean "off".
> Instead, I made it give an error in this case instead of silently
> treating it as 0.
Good catch!

> The option "2" for clientcert was never actually documented, and I
> think we should get rid of that. We need to keep "1" for backwards
> compatibility (which arguably was a bad choice originally, but that
> was many years ago), but let's not add another one. 
> I also made a couple of very small changes to the docs.
> 
> Attached is an updated patch with these changes. I'd appreciate it if
> you can run it through your tests to confirm that it didn't break any
> of those usecases.
I've tested a couple of things with this and it seems to work as
expected. Unfortunately, there are no tests for libpq, afaik. But
testing such features would become complicated quite quickly, with the
need to generate certificates and such...

I've noticed that the error message for mismatching CN is now printed
twice when using password prompts, as all authentication details are
checked upon initiation of a connection and again later, after sending
the password.

> 2018-04-10 13:54:19.882 CEST [8735] LOG:  provided user name
> (testuser) and authenticated user name (nottestuser) do not match
> 2018-04-10 13:54:19.882 CEST [8735] LOG:  certificate validation
> failed for user "testuser": common name in client certificate does
> not match user name or mapping, but clientcert=verify-full is
> enabled.
> 2018-04-10 13:54:21.515 CEST [8736] LOG:  provided user name
> (testuser) and authenticated user name (nottestuser) do not match
> 2018-04-10 13:54:21.515 CEST [8736] LOG:  certificate validation
> failed for user "testuser": common name in client certificate does
> not match user name or mapping, but clientcert=verify-full is
> enabled.
> 2018-04-10 13:54:21.515 CEST [8736] FATAL:  password authentication
> failed for user "testuser"
> 2018-04-10 13:54:21.515 CEST [8736] DETAIL:  Connection matched
> pg_hba.conf line 94: "hostssl all testuser
> 127.0.0.1/32  password clientcert=verify-full"

Also, as you can see, there are two LOG messages indicating the
mismatch -- "provided user name (testuser) and authenticated user name
(nottestuser) do not match" comes from hba.c:check_usermap() and
"certificate validation failed for user "testuser": common name in
client certificate does not match user name or mapping, but
clientcert=verify-full is enabled." is the message added in
auth.c:CheckCertAuth()  by the patch.
Maybe we should shorten the latter LOG message?

regards
Julian



Re: [HACKERS] path toward faster partition pruning

2018-04-10 Thread Ashutosh Bapat
On Tue, Apr 10, 2018 at 5:32 PM, David Rowley
 wrote:
> On 10 April 2018 at 20:56, Amit Langote  wrote:
>> On 2018/04/10 13:27, Ashutosh Bapat wrote:
>>> On Mon, Apr 9, 2018 at 8:56 PM, Robert Haas  wrote:
 CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
 $$SELECT coalesce($1,0)::int8$$ LANGUAGE sql IMMUTABLE;
 CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
 OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);
 CREATE TABLE mchash (a int, b text, c jsonb)
   PARTITION BY HASH (a test_int4_ops, b test_text_ops);
>>
>> Thanks for the idea.  I think it makes sense and also agree that alternate
>> outputs approach is not perfectly reliable and maintainable.
>>
>>> +1.
>>
>> Attached find a patch that rewrites hash partition pruning tests that
>> away.  It creates two hash operator classes, one for int4 and another for
>> text type and uses them to create hash partitioned table to be used in the
>> tests, like done in the existing tests in hash_part.sql.  Since that makes
>> tests (hopefully) reliably return the same result always, I no longer see
>> the need to keep them in a separate partition_prune_hash.sql.  The
>> reasoning behind having the separate file was to keep the alternative
>> output file small as David explained in [1].
>> [1]
>> https://www.postgresql.org/message-id/CAKJS1f-SON_hAekqoV4_WQwJBtJ_rvvSe68jRNhuYcXqQ8PoQg%40mail.gmail.com
>
> I had a quick look, but I'm still confused about why a function like
> hash_uint32_extended() is susceptible to varying results depending on
> CPU endianness but hash_combine64 is not.
>
> Apart from that confusion, looking at the patch:
>
> +CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
> +$$SELECT coalesce($1)::int8$$ LANGUAGE sql IMMUTABLE STRICT;
> +CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
> +OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
> +CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
> +$$SELECT length(coalesce($1))::int8$$ LANGUAGE sql IMMUTABLE STRICT;
>
>
> Why coalesce here? Maybe I've not thought of something, but coalesce
> only seems useful to me if there's > 1 argument. Plus the function is
> strict, so not sure it's really doing even if you added a default.

I think Amit Langote wanted to write coalesce($1, $2), $2 being the
seed for the hash function. See how the hash operator class functions are
defined in sql/insert.sql. Maybe we should just use the same
functions, or even the same tables.
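
That is, presumably something like this (a sketch of the intended
definition, not taken from the patch):

CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
$$SELECT coalesce($1, $2)::int8$$ LANGUAGE sql IMMUTABLE STRICT;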

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-10 Thread Craig Ringer
On 10 April 2018 at 14:10, Michael Paquier  wrote:

> Well, I think that there is place for improving reporting of failure
> in file_utils.c for frontends, or at worst have an exit() for any kind
> of critical failures equivalent to a PANIC.

Yup.

In the meantime, speaking of PANIC, here's a first-cut patch to
make Pg panic on fsync() failures. I need to do some closer review and
testing, but it's presented here for anyone interested.

I intentionally left some failures as ERROR not PANIC, where the
entire operation is done as a unit, and an ERROR will cause us to
retry the whole thing.

For example, when we fsync() a temp file before we move it into place,
there's no point panicking on failure, because we'll discard the temp
file on ERROR and retry the whole thing.

I've verified that it works as expected with some modifications to the
test tool I've been using (pushed).

The main downside is that if we panic in redo, we don't try again. We
throw our toys and shut down. But arguably if we get the same I/O
error again in redo, that's the right thing to do anyway, and quite
likely safer than continuing to ERROR on checkpoints indefinitely.

Patch attached.

To be clear, this patch only deals with the issue of us retrying
fsyncs when it turns out to be unsafe. This does NOT address any of
the issues where we won't find out about writeback errors at all.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 09d23dcd5ed87f5f01a43978942d8867e712e280 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Tue, 10 Apr 2018 14:08:32 +0800
Subject: [PATCH v1] PANIC when we detect a possible fsync I/O error instead of
 retrying fsync

Panic the server on fsync failure in places where we can't simply repeat the
whole operation on retry. Most imporantly, panic when fsync fails during a
checkpoint.

This will result in log messages like:

PANIC:  58030: could not fsync file "base/12367/16386": Input/output error
LOG:  0: checkpointer process (PID 10799) was terminated by signal 6: Aborted

and, if the condition persists during redo:

LOG:  0: checkpoint starting: end-of-recovery immediate
PANIC:  58030: could not fsync file "base/12367/16386": Input/output error
LOG:  0: startup process (PID 10808) was terminated by signal 6: Aborted

Why?

In a number of places PostgreSQL we responded to fsync() errors by retrying the
fsync(), expecting that this would force the operating system to repeat any
write attempts. The code assumed that fsync() would return an error on all
subsequent calls until any I/O error was resolved.

This is not what actually happens on some platforms, including Linux. The
operating system may give up and drop dirty buffers for async writes on the
floor and mark the page mapping as bad. The first fsync() clears any error flag
from the page entry and/or our file descriptor. So a subsequent fsync() returns
success, even though the data PostgreSQL wrote was really discarded.

We have no way to find out which writes failed, and no way to ask the kernel to
retry indefinitely, so all we can do is PANIC. Redo will attempt the write
again, and if it fails again, it will also PANIC.

This doesn't completely prevent fsync reliability issues, because it only
handles cases where the kernel actually reports the error to us. It's entirely
possible for a buffered write to be lost without causing fsync to report an
error at all (see discussion below). Work on addressing those issues and
documenting them is ongoing and will be committed separately.

See https://www.postgresql.org/message-id/CAMsr%2BYHh%2B5Oq4xziwwoEfhoTZgr07vdGG%2Bhu%3D1adXx59aTeaoQ%40mail.gmail.com
---
 src/backend/access/heap/rewriteheap.c   |  6 +++---
 src/backend/access/transam/timeline.c   |  4 ++--
 src/backend/access/transam/twophase.c   |  2 +-
 src/backend/access/transam/xlog.c   |  4 ++--
 src/backend/replication/logical/snapbuild.c |  3 +++
 src/backend/storage/file/fd.c   |  6 ++
 src/backend/storage/smgr/md.c   | 22 --
 src/backend/utils/cache/relmapper.c |  2 +-
 8 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index 8d3c861a33..0320baffec 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -965,7 +965,7 @@ logical_end_heap_rewrite(RewriteState state)
 	while ((src = (RewriteMappingFile *) hash_seq_search(&seq_status)) != NULL)
 	{
 		if (FileSync(src->vfd, WAIT_EVENT_LOGICAL_REWRITE_SYNC) != 0)
-			ereport(ERROR,
+			ereport(PANIC,
 	(errcode_for_file_access(),
 	 errmsg("could not fsync file \"%s\": %m", src->path)));
 		FileClose(src->vfd);
@@ -1180,7 +1180,7 @@ heap_xlog_logical_rewrite(XLogReaderState *r)
 	 */
 	pgstat_report_wait_start(WAIT_EVENT_LOGICAL_REWRITE_MAPPING_SYNC);
 	if (pg_fsync(fd) != 0)
-		er

Re: [PATCH] pg_hba.conf : new auth option : clientcert=verify-full

2018-04-10 Thread Peter Eisentraut
On 4/10/18 08:10, Julian Markwort wrote:
>> Attached is an updated patch with these changes. I'd appreciate it if
>> you can run it through your tests to confirm that it didn't break any
>> of those usecases.
> I've tested a couple of things with this and it seems to work as
> expected. Unforunately, there are no tests for libpq, afaik. But testing
> such features would become complicated quite quickly, with the need to
> generate certificates and such...

There are tests in src/test/ssl/ that would probably be a good fit to
extend for this.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Bugs in TOAST handling, OID assignment and redo recovery

2018-04-10 Thread Bernd Helmle
Am Dienstag, den 10.04.2018, 15:59 +0530 schrieb Pavan Deolasee:
> One of our 2ndQuadrant support customers recently reported a sudden
> rush of
> TOAST errors post a crash recovery, nearly causing an outage. Most
> errors
> read like this:
> 
> ERROR: unexpected chunk number 0 (expected 1) for toast value 
> 
> While we could bring back the cluster to normal quickly using some
> workarounds, I investigated this in more detail and identified two
> long
> standing bugs in TOAST as well as redo recovery.

Wow! I'm currently investigating issues on a customer system, too,
where TOAST errors suddenly arose after a shutdown immediate.

I haven't dug into your findings yet, but they seem to fit perfectly
with the reports I got.

Thanks,
Bernd




Re: [PATCH][PROPOSAL] Refuse setting toast.* reloptions when TOAST table does not exist

2018-04-10 Thread David Steele
On 1/25/18 12:27 PM, Nikolay Shaplov wrote:
> In a message of 25 January 2018 11:29:34, Tom Lane wrote:
>> Alvaro Herrera  writes:
>>> Tom Lane wrote:
 Well, maybe the right answer is to address that.  It's clear to me
 why that would happen if we store these things as reloptions on the
 toast table, but can't they be stored on the parent table?
>>>
>>> Actually, Nikolay provided a possible solution: if you execute ALTER
>>> TABLE SET (toast.foobar = xyz), and a toast table doesn't exist, create
>>> one at that point.
>>
>> That adds a lot of overhead if you never actually need the toast table.
> I think this overhead case is not that terrible if it is properly warned ;-)
> 
>> Still, maybe it's an appropriate amount of effort compared to the size
>> of the use-case for this.
> If you came to some final conclustion, please close the commiffest item with 
> "Return with feedback" resolution, and I write another patch... 

I think this patch should be marked Returned with Feedback since it
appears there is no consensus on whether it is useful or correct, so I
have done that.

If I got it wrong I'm happy to move it to the next CF in Waiting for
Author state instead.

Thanks,
-- 
-David
da...@pgmasters.net



Re: [PATCH][PROPOSAL] Refuse setting toast.* reloptions when TOAST table does not exist

2018-04-10 Thread Alvaro Herrera
Nikolay Shaplov wrote:
> In a message of 25 January 2018 11:29:34, Tom Lane wrote:
> > Alvaro Herrera  writes:

> > > Actually, Nikolay provided a possible solution: if you execute ALTER
> > > TABLE SET (toast.foobar = xyz), and a toast table doesn't exist, create
> > > one at that point.
> > 
> > That adds a lot of overhead if you never actually need the toast table.
>
> I think this overhead case is not that terrible if it is properly warned ;-)

Let's go with this idea, which seems to me better than the status quo.
There are a couple of other proposals, but they seem to require a lot of
work in exchange for unclear benefits.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Function to track shmem reinit time

2018-04-10 Thread David Steele
On 3/29/18 9:40 AM, Tomas Vondra wrote:
> On 03/28/2018 08:55 PM, David Steele wrote:
> 
>> I'm setting this entry to Waiting on Author, but based on the discussion
>> I think it should be Returned with Feedback.
>>
> 
> Fine with me.

This entry has been marked Returned with Feedback.

Regards,
-- 
-David
da...@pgmasters.net



Re: [PATCH][PROPOSAL] Refuse setting toast.* reloptions when TOAST table does not exist

2018-04-10 Thread Nikolay Shaplov
In a message of 10 April 2018 08:55:52, David Steele wrote:
> On 1/25/18 12:27 PM, Nikolay Shaplov wrote:
> > In a message of 25 January 2018 11:29:34, Tom Lane wrote:
> >> Alvaro Herrera  writes:
> >>> Tom Lane wrote:
>  Well, maybe the right answer is to address that.  It's clear to me
>  why that would happen if we store these things as reloptions on the
>  toast table, but can't they be stored on the parent table?
> >>> 
> >>> Actually, Nikolay provided a possible solution: if you execute ALTER
> >>> TABLE SET (toast.foobar = xyz), and a toast table doesn't exist, create
> >>> one at that point.
> >> 
> >> That adds a lot of overhead if you never actually need the toast table.
> > 
> > I think this overhead case is not that terrible if it is properly warned
> > ;-)> 
> >> Still, maybe it's an appropriate amount of effort compared to the size
> >> of the use-case for this.
> > 
> > If you came to some final conclusion, please close the commitfest item
> > with "Return with feedback" resolution, and I will write another patch...
> 
> I think this patch should be marked Returned with Feedback since it
> appears there is no consensus on whether it is useful or correct, so I
> have done that.
Exactly!

But I'd like to know what kind of feedback it is :-)
What conclusion has been reached? (I did not get it.)

Otherwise I would not know how to rewrite this patch.

I would suggest: create a TOAST relation whenever a toast.* option is set, but
give a warning if this relation will not be used for data storage (i.e. there
are no toastable columns in the table).
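
For illustration, the user-visible behaviour I have in mind would be roughly
this (only a sketch -- the table name and the exact warning text are made up,
they are not from any existing patch):

    -- table with no toastable columns, so today it gets no TOAST table
    create table plain_ints (a int, b int);

    -- proposed: setting a toast.* reloption creates the TOAST table so the
    -- option has somewhere to live, and warns that it will not store data
    alter table plain_ints set (toast.autovacuum_enabled = off);
    -- WARNING:  TOAST table created for "plain_ints" will not be used for data storage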

But I need some confirmation, in order not to write patch in vain again :-)

> 
> If I got it wrong I'm happy to move it to the next CF in Waiting for
> Author state instead.
> 
> Thanks,

-- 
Do code for fun.

signature.asc
Description: This is a digitally signed message part.


Re: [HACKERS] [PATCH] Incremental sort

2018-04-10 Thread David Steele
On 4/9/18 11:56 AM, Alexander Korotkov wrote:
> 
> Thank you very much for your efforts on reviewing this patch.
> I'm looking forward work with you on this feature for PG12.

Since there's a new patch I have changed the status to Needs Review and
moved the entry to the next CF.

Still, it seems to me that discussion and new patches will be required
to address Tom's concerns.

Regards,
-- 
-David
da...@pgmasters.net



Re: [PATCH][PROPOSAL] Refuse setting toast.* reloptions when TOAST table does not exist

2018-04-10 Thread Alvaro Herrera
Nikolay Shaplov wrote:

> But I need some confirmation, in order not to write patch in vain again :-)

Don't worry, rest assured that you will still write *many* patches in
vain, not just this one.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: crash with sql language partition support function

2018-04-10 Thread Ashutosh Bapat
On Tue, Apr 10, 2018 at 1:44 PM, Amit Langote
 wrote:
>
> Attached fixes it.  It teaches RelationBuildPartitionKey() to use
> fmgr_info_cxt and pass rd_partkeycxt to it.

The patch is using partkeycxt and not rd_partkeycxt. Probably a typo
in the mail. But a wider question, why that context? I guess that
cache context will vanish when that cache entry is thrown away. That's
the reason we have to copy partition key information in
find_partition_scheme() instead of just pointing to it and also use
fmgr_info_copy() there. But if that's the case, buildfarm animal run
with -DCLOBBER_CACHE_ALWAYS should show failure. I am missing
something here.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] logical decoding of two-phase transactions

2018-04-10 Thread David Steele
On 4/9/18 2:01 AM, Nikhil Sontakke wrote:
> 
> Anyways, will now wait for the next commitfest/opportunity to try to
> get this in.

It looks like this patch should be in the Needs Review state so I have
done that and moved it to the next CF.

Regards,
-- 
-David
da...@pgmasters.net



Re: jsonpath

2018-04-10 Thread David Steele
On 3/27/18 11:13 AM, Nikita Glukhov wrote:
>
> Attached 14th version of the patches:

It appears this entry should be marked Needs Review, so I have done that
and moved it to the next CF.

Regards,
-- 
-David
da...@pgmasters.net



Re: [PATCH] Add missing type conversion functions for PL/Python

2018-04-10 Thread David Steele
On 3/26/18 12:07 PM, Nikita Glukhov wrote:
> On 26.03.2018 17:19, David Steele wrote:
> 
>> On 2/20/18 10:14 AM, Haozhou Wang wrote:
>>> Thank you very much for your review!
>>>
>>> I attached a new patch with typo fixed.
>> I think it's a bit premature to mark this Ready for Committer after a
>> review consisting of a few typos.  Anthony only said that he started
>> looking at it so I've marked it Needs Review.
> 
> Hi.
> 
> I also have looked at this patch and found some problems.
> Attached is the fixed 3rd version of the patch:
>  * initialization of arg->u.scalar was moved into PLy_output_setup_func()
>  * added range checks for int16 and int32 types
>  * added subroutine PLyInt_AsLong() for correct handling of OverflowError
>    that can be thrown from PyInt_AsLong()
>  * casting from Python float to PostgreSQL numeric using PyFloat_AsDouble()
>    was removed, because it can return an incorrect result for Python long,
>    and float8_numeric() uses float8 and numeric I/O functions
>  * fixed whitespace

There's a new patch on this thread but since it was not submitted by the
author I've moved this entry to the next CF in Waiting for Author state.

Regards,
-- 
-David
da...@pgmasters.net



Re: Bugs in TOAST handling, OID assignment and redo recovery

2018-04-10 Thread Heikki Linnakangas

On 10/04/18 13:29, Pavan Deolasee wrote:

Hello,

One of our 2ndQuadrant support customers recently reported a sudden rush of
TOAST errors post a crash recovery, nearly causing an outage. Most errors
read like this:

ERROR: unexpected chunk number 0 (expected 1) for toast value 

While we could bring back the cluster to normal quickly using some
workarounds, I investigated this in more detail and identified two long
standing bugs in TOAST as well as redo recovery.


Great detective work!


ISTM that the right fix is to teach toast_save_datum() to check for
existence of duplicate chunk_id by scanning the table with the same
SnapshotToast that it later uses to fetch the tuples. We already do that in
case of toast rewrite, but not for regular inserts. I propose to do that
for regular path too, as done in the attached patch.


It would seem more straightforward to add a snapshot parameter to 
GetNewOidWithIndex(), so that this one caller could pass 
SnapshotToast, while others pass SnapshotDirty. With your patch, you 
check the index twice: first with SnapshotDirty, in 
GetNewOidWithIndex(), and then with SnapshotToast, in the caller.


If I'm reading the rewrite case correctly, it's a bit different and 
quite special. In the loop with GetNewOidWithIndex(), it needs to check 
that the OID is unused in two tables, the old TOAST table, and the new 
one. You can only pass one index to GetNewOidWithIndex(), so it needs to 
check the second index manually. It's not because of the snapshot issue. 
Although I wonder if we should be using SnapshotToast in that 
GetNewOidWithIndex() call, too. I.e. if we should be checking both the 
old and the new toast table with SnapshotToast.



I think the right fix is to simply ignore the nextOid counter while
replaying ONLINE checkpoint record. We must have already initialised
ShmemVariableCache->nextOid
to the value stored in this (or some previous) checkpoint record when redo
recovery is started. As and when we see XLOG_NEXTOID record, we would
maintain the shared memory counter correctly. If we don't see any
XLOG_NEXTOID record, the value we started with must be the correct value. I
see no problem even when OID wraps around during redo recovery.
XLOG_NEXTOID should record that correctly.


Agreed. With nextXid, we advance ShmemVariableCache->nextXid if the 
value in the online checkpoint record is greater than 
ShmemVariableCache->nextXid. But we don't have such a wraparound-aware 
concept of "greater than" for OIDs. Ignoring the online checkpoint 
record's nextOid value seem fine to me.


- Heikki



Re: pgsql: Merge catalog/pg_foo_fn.h headers back into pg_foo.h headers.

2018-04-10 Thread Tom Lane
Julien Rouhaud  writes:
> Compilation of external extensions using PGXS appears to be broken
> since this commit:

Ouch, sorry.

> I think the best fix if to define NO_GENERATED_HEADERS in pgxs.mk,
> patch attached.

Hm.  I wonder if we don't also want NO_TEMP_INSTALL, like the doc/src/sgml
makefile.  I don't know whether "make check" could be useful in a PGXS
build, but certainly that recipe for making a temp install isn't gonna
work.

regards, tom lane



Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2018-04-10 Thread David Steele
Hi Mark,

On 3/26/18 4:50 PM, Mark Rofail wrote:
> On 3/7/18 5:43 AM, Alvaro Herrera wrote:
> 
> I searched for the new GIN operator so that I
> could brush it up for commit ahead of the rest -- only to find out that
> it was eviscerated from the patch between v5 and v5.1.
> 
> The latest version of the patch which contained the new GIN operator is
> version `Array-ELEMENT-foreign-key-v6.patch`
> which works fine and passed all the regression tests at the time
> (2018-01-21). We abandoned the GIN operator since it couldn't follow the
> same logic as the rest of GIN operators use since it operates on a Datum
> not an array. Not because of any error.
> 
> just as Andreas said
> 
> On Wed, Jan 31, 2018 at 1:52 AM, Andreas Karlsson  
> wrote:
> 
>  The issue I see is that 
> ginqueryarrayextract() needs to make a copy of the search key but to do 
> so it needs to know the type of anyelement (to know if it needs to 
> detoast, etc). But there is as far as I can tell no way to check the 
> type of anyelement in this context.
> 
>  
> If there is any way to  obtain a copy of the datum I would be more than
> happy to integrate the GIN operator to the patch.

Since you have expressed a willingness to continue work on this patch I
have moved it to the next CF in Waiting on Author state.

You should produce a new version by then that addresses Alvaro's
concerns and I imagine that will require quite a bit of discussion and
work.  Everyone is a bit fatigued at the moment so it would best to hold
off on that for a while.

Regards,
-- 
-David
da...@pgmasters.net



Re: Reopen logfile on SIGHUP

2018-04-10 Thread David Steele
Hi Anastasia,

On 3/28/18 1:46 PM, David Steele wrote:
> On 2/27/18 8:54 PM, Michael Paquier wrote:
>> On Tue, Feb 27, 2018 at 05:52:20PM -0500, Tom Lane wrote:
>>> It already does treat SIGUSR1 as a log rotation request.  Apparently
>>> the point of this patch is that some people don't find that easy enough
>>> to use, which is fair, because finding out the collector's PID from
>>> outside isn't very easy.
>>
>> True enough.  The syslogger does not show up in pg_stat_activity either,
>> so I think that being able to do so would help for this case.
> 
> There does not seem to be any consensus on this patch so I'm marking it
> Waiting on Author for the time being.  At the end of the CF I'll mark it
> Returned with Feedback if there is no further activity.

I have marked this entry Returned with Feedback since there has been no
further activity and no opinions to the contrary.

Regards,
-- 
-David
da...@pgmasters.net



Re: Transform for pl/perl

2018-04-10 Thread Tom Lane
ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
> While playing around some more with the extension, I discoverered a few
> more issues:
> ...
> 4) SV_to_JsonbValue() throws an error for infinite NVs, but not NaNs

The others sound like bugs, but that one's intentional, since type
numeric does have a concept of NaN.  If you're arguing that we should
disallow that value in the context of jsonb, maybe so, but it'd likely
take changes in quite a few more places than here.

regards, tom lane



Re: Bugs in TOAST handling, OID assignment and redo recovery

2018-04-10 Thread Pavan Deolasee
Hi Heikki,

On Tue, Apr 10, 2018 at 7:07 PM, Heikki Linnakangas  wrote:

>
>>
> It would seem more straightforward to add a snapshot parameter to
> GetNewOidWithIndex(), so that the this one caller could pass SnapshotToast,
> while others pass SnapshotDirty. With your patch, you check the index
> twice: first with SnapshotDirty, in GetNewOidWithIndex(), and then with
> SnapshotToast, in the caller.
>

Hmm. I actually wrote my first patch exactly like that. I am trying to
remember why I discarded that approach. Was it to do with the fact that
SnapshotToast can't see all toast tuples either? Yeah, I think so. For
example, it won't see tuples with an uncommitted xmin, leading to different
issues. Now it's unlikely that we will have an OID conflict where the old
tuple has an uncommitted xmin, but I am not sure we can completely rule that
out. For example, if we had not uncovered the redo recovery bug, we could
have had a prepared transaction having inserted the old tuple, which now
conflicts with the new tuple and is not detected by SnapshotToast.



>
> If I'm reading the rewrite case correctly, it's a bit different and quite
> special. In the loop with GetNewOidWithIndex(), it needs to check that the
> OID is unused in two tables, the old TOAST table, and the new one. You can
> only pass one index to GetNewOidWithIndex(), so it needs to check the
> second index manually. It's not because of the snapshot issue. Although I
> wonder if we should be using SnapshotToast in that GetNewOidWithIndex()
> call, too. I.e. if we should be checking both the old and the new toast
> table with SnapshotToast.


As I said, I am not sure if checking with just SnapshotToast is enough
because it can't see "dirty" tuples.


>
> Agreed. With nextXid, we advance ShmemVariableCache->nextXid if the value
> in the online checkpoint record is greater than
> ShmemVariableCache->nextXid. But we don't have such a wraparound-aware
> concept of "greater than" for OIDs. Ignoring the online checkpoint record's
> nextOid value seem fine to me.
>

Ok. Thanks for checking.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2018-04-10 Thread Mark Rofail
Hello David,

On Tue, Apr 10, 2018 at 3:47 PM, David Steele  wrote:

> You should produce a new version by then that addresses Alvaro's
> concerns and I imagine that will require quite a bit of discussion and
> work.


I'll get working.
I'll produce a patch with two alternate versions, one with and one without
the GIN operators.

On 3/7/18 5:43 AM, Alvaro Herrera wrote:

> so the performance was measured to see if the GIN operator was
> worth it, and the numbers are pretty confusing (the test don't seem
> terribly exhaustive; some numbers go up, some go down, it doesn't appear
> that the tests were run more than once for each case therefore the
> numbers are pretty noisy
>
Any ideas on how to perform more exhaustive tests?

On 3/26/18 4:50 PM, Mark Rofail wrote:

> > On Wed, Jan 31, 2018 at 1:52 AM, Andreas Karlsson
>  wrote:
> >
> >  The issue I see is that
> > ginqueryarrayextract() needs to make a copy of the search key but to
> do
> > so it needs to know the type of anyelement (to know if it needs to
> > detoast, etc). But there is as far as I can tell no way to check the
> > type of anyelement in this context.
> >
> > If there is any way to  obtain a copy of the datum I would be more than
> > happy to integrate the GIN operator to the patch.


As I said before, we need a way to obtain a copy of the datum to comply with
the context of ginqueryarrayextract().

Best Regards


Re: crash with sql language partition support function

2018-04-10 Thread Alvaro Herrera
Ashutosh Bapat wrote:
> On Tue, Apr 10, 2018 at 1:44 PM, Amit Langote
>  wrote:
> >
> > Attached fixes it.  It teaches RelationBuildPartitionKey() to use
> > fmgr_info_cxt and pass rd_partkeycxt to it.
> 
> The patch is using partkeycxt and not rd_partkeycxt. Probably a typo
> in the mail. But a wider question, why that context? I guess that
> cache context will vanish when that cache entry is thrown away. That's
> the reason we have to copy partition key information in
> find_partition_scheme() instead of just pointing to it and also use
> fmgr_info_copy() there. But if that's the case, buildfarm animal run
> with -DCLOBBER_CACHE_ALWAYS should show failure. I am missing
> something here.

Good point.  Yeah, it looks like it should cause problems.  I think it
would be better to have RelationGetPartitionKey() return a copy of the
relcache data in the caller's context, rather than the relcache data
itself, no?  Seems like this would not fail if there is never a CCI
between the RelationGetPartitionKey call and the last access of the
returned key struct ... but if this is the explanation, then it's a
pretty brittle setup and we should stop relying on it.

I wonder why do we RelationBuildPartitionDesc and
RelationBuildPartitionKey directly in RelationBuildDesc.  Wouldn't it be
better to delay constructing those until the first access to them?

Finally: I don't quite understand why we need two memory contexts there,
one for the partkey and another for the partdesc.  Surely one context
for both is sufficient.  (And while at it, why not have the whole
RelationData inside the same memory context, so that it can be blown
away without so much retail pfree'ing?)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2018-04-10 Thread Alvaro Herrera
Mark Rofail wrote:
> Hello David,
> 
> On Tue, Apr 10, 2018 at 3:47 PM, David Steele  wrote:
> 
> > You should produce a new version by then that addresses Alvaro's
> > concerns and I imagine that will require quite a bit of discussion and
> > work.
> 
> I'll get working.
> I'll produce a patch with two alternate versions, one with and one without
> the GIN operators.

I'd rather see one patch with just the GIN operator (you may think it's
a very small patch, but that's only because you forget to add
documentation to it and a few extensive tests to ensure it works well);
then the array-fk stuff as a follow-on patch.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Partition-wise aggregation/grouping

2018-04-10 Thread David Steele
Hi Jeevan,

On 4/2/18 10:57 AM, Robert Haas wrote:
> On Thu, Mar 29, 2018 at 9:02 AM, Jeevan Chalke
>  wrote:
>> Yep, I see the risk.
> 
> Committed 0001 last week and 0002 just now.  I don't really see 0003 as
> a critical need.  If somebody demonstrates that this saves a
> meaningful amount of planning time, we can consider that part for a
> future release.

The bulk of this patch was committed so I have marked it that way.

If you would like to pursue patch 03 I think it would be best to start a
new thread and demonstrate how the patch will improve performance.

Regards,
-- 
-David
da...@pgmasters.net



Re: submake-errcodes

2018-04-10 Thread Tom Lane
Devrim Gündüz  writes:
> I used to run
> "
> cd src/backend
> make submake-errcodes
> "
> in the RPM spec file, but looks like it was removed recently. Is that replaced
> with something else, or removed completely?

You could replace it with submake-generated-headers, since that's more
general, but in principle you shouldn't need anything because that
target is invoked automatically as of yesterday.  What's the larger
context here --- why do you need any of this?

regards, tom lane



Re: power() function in Windows: "value out of range: underflow"

2018-04-10 Thread Euler Taveira
2018-04-10 5:30 GMT-03:00 Huong Dangminh :
> There are some cases that power() function does not work
> correctly with 'NaN' arguments in Windows environment.
> Something like,
>
What is your exact OS version? What is your postgres version? I tested
with Windows 10 (10.0.16299) x64 and couldn't reproduce the error.

> postgres=# select power('NaN',11);
> ERROR:  value out of range: underflow
> postgres=# select power('NaN','NaN');
> ERROR:  value out of range: underflow
> postgres=# select power(11,'NaN');
> ERROR:  value out of range: underflow
>
> In Linux environment, instead of ERROR it returns 'NaN'.
>
Could you show us a simple test case? Print arguments, result and errno.


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: lazy detoasting

2018-04-10 Thread Tom Lane
Chapman Flack  writes:
> I'm becoming increasingly glad I asked (or less embarrassed that I hadn't
> figured it all out yet).  :)

> Am I right in thinking that, for my original purpose of detoasting something
> later in a transaction, all that matters is that I registered a snapshot
> from the time at which I copied the toasted datum, and the resource owner
> I registered it to has not been released yet, so rows referred to in the
> snapshot haven't been vacuumed away? Is that a sufficient condition for
> detoast to work?

I believe so.

> Or would I need to do something more, like push and pop that snapshot
> around the detoast call?

You shouldn't need that; toast reads intentionally use a non-MVCC-aware
"snapshot" to handle exactly this type of situation, where somebody is
trying to pull data out of a tuple that would no longer be visible to
the transaction's current snapshot.

Wouldn't be a bad idea to test this, of course ;-)

regards, tom lane



Re: Transform for pl/perl

2018-04-10 Thread Dagfinn Ilmari Mannsåker
Tom Lane  writes:

> ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
>> While playing around some more with the extension, I discoverered a few
>> more issues:
>> ...
>> 4) SV_to_JsonbValue() throws an error for infinite NVs, but not NaNs
>
> The others sound like bugs, but that one's intentional, since type
> numeric does have a concept of NaN.  If you're arguing that we should
> disallow that value in the context of jsonb, maybe so, but it'd likely
> take changes in quite a few more places than here.

The numeric type that's used internally to represent numbers in jsonb
might have the concept of NaN, but JSON itself does not:

Numeric values that cannot be represented in the grammar below (such
as Infinity and NaN) are not permitted.

  - https://tools.ietf.org/html/rfc7159#section-6

And it cannot be cast to json:

=# create or replace function jsonbnan() returns jsonb immutable language 
plperlu transform for type jsonb as '0+"NaN"';
CREATE FUNCTION
=# select jsonbnan();
┌──┐
│ jsonbnan │
├──┤
│ NaN  │
└──┘

=# select jsonb_typeof(jsonbnan());
┌──┐
│ jsonb_typeof │
├──┤
│ number   │
└──┘

=# select jsonbnan()::json;
ERROR:  invalid input syntax for type json
DETAIL:  Token "NaN" is invalid.
CONTEXT:  JSON data, line 1: NaN

- ilmari
-- 
"The surreality of the universe tends towards a maximum" -- Skud's Law
"Never formulate a law or axiom that you're not prepared to live with
 the consequences of."  -- Skud's Meta-Law



Re: Faster inserts with mostly-monotonically increasing values

2018-04-10 Thread Heikki Linnakangas

    /* XLOG stuff */
    if (RelationNeedsWAL(rel))
    {
        ...

        if (P_ISLEAF(lpageop))
        {
            xlinfo = XLOG_BTREE_INSERT_LEAF;

            /*
             * Cache the block information if we just inserted into the
             * rightmost leaf page of the index.
             */
            if (P_RIGHTMOST(lpageop))
                RelationSetTargetBlock(rel, BufferGetBlockNumber(buf));
        }
        ...



Why is this RelationSetTargetBlock() call inside the "XLOG stuff" block? 
ISTM that we're failing to take advantage of this optimization for 
unlogged tables, for no particular reason. Just an oversight?


- Heikki



Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2018-04-10 Thread Mark Rofail
On Tue, Apr 10, 2018 at 3:59 PM, Alvaro Herrera 
wrote:
>
> documentation to it and a few extensive tests to ensure it works well);


I think the existing regression tests verify that the patch works as
expected, correct?

We need more *exhaustive* tests to test performance, not functionality.

Regards


Re: [HACKERS] Replication status in logical replication

2018-04-10 Thread David Steele
On 4/10/18 6:14 AM, Masahiko Sawada wrote:
> On Fri, Mar 30, 2018 at 5:37 AM, Fujii Masao  wrote:
>> On Wed, Jan 17, 2018 at 2:16 AM, Simon Riggs  wrote:
>>> On 16 January 2018 at 06:21, Michael Paquier  
>>> wrote:
 On Tue, Jan 16, 2018 at 10:40:43AM +0900, Masahiko Sawada wrote:
> On Sun, Jan 14, 2018 at 12:43 AM, Simon Riggs  
> wrote:
>> On 9 January 2018 at 04:36, Masahiko Sawada  
>> wrote:
>> We're not talking about standbys, so the message is incorrect.
>
> Ah, I understood. How about "\"%s\"  has now caught up with upstream
> server"?

 +1.
>>>
>>> upstream is what I would have suggested, so +1 here also.
>>>
>>> Will commit.
>>
>> ping?
>>
> 
> Should I add this item to "Older Bugs" of the open item since we
> regarded it as a bug?

I'm going to reclassify this as a bug since everyone seems to agree it
is one.

Simon, are you still planning to commit this?

Thanks,
-- 
-David
da...@pgmasters.net



Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2018-04-10 Thread Alvaro Herrera
Mark Rofail wrote:
> On Tue, Apr 10, 2018 at 3:59 PM, Alvaro Herrera 
> wrote:
> >
> > documentation to it and a few extensive tests to ensure it works well);
> 
> I think the existing regression tests verify that the patch works as
> expectations, correct?

I meant for the GIN operator. (Remember, these are two patches, and each
of them needs its own tests.)

> We need more *exhaustive* tests to test performance, not functionality.

True.  So my impression from the numbers you posted last time is that
you need to run each measurement case several times, and provide
averages/stddevs/etc for the resulting numbers, and see about outliers
(maybe throw them away, or maybe they indicate some problem in the test
or in the code); then we can make an informed decision about whether the
variations between the several different scenarios are real improvements
(or pessimizations) or just measurement noise.
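
Something along these lines would be enough (just a sketch; the table and
column names are made up):

    -- one row per benchmark run
    create table fk_bench_results (scenario text, run int, ms numeric);

    -- aggregate across repeated runs so noise and outliers become visible
    select scenario,
           count(*)                  as runs,
           round(avg(ms), 2)         as avg_ms,
           round(stddev_samp(ms), 2) as stddev_ms,
           min(ms)                   as min_ms,
           max(ms)                   as max_ms
    from fk_bench_results
    group by scenario
    order by scenario;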

In particular: it seemed to me that you decided to throw away the idea
of the new GIN operator without sufficient evidence that it was
unnecessary.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Faster inserts with mostly-monotonically increasing values

2018-04-10 Thread Claudio Freire
On Tue, Apr 10, 2018 at 11:10 AM, Heikki Linnakangas  wrote:
>> /* XLOG stuff */
>> if (RelationNeedsWAL(rel))
>> {
>> ...
>>
>> if (P_ISLEAF(lpageop))
>> {
>> xlinfo = XLOG_BTREE_INSERT_LEAF;
>>
>> /*
>>  * Cache the block information if we just
>> inserted into the
>>  * rightmost leaf page of the index.
>>  */
>> if (P_RIGHTMOST(lpageop))
>> RelationSetTargetBlock(rel,
>> BufferGetBlockNumber(buf));
>> }
>> ...
>
>
>
> Why is this RelationSetTargetBlock() call inside the "XLOG stuff" block?
> ISTM that we're failing to take advantage of this optimization for unlogged
> tables, for no particular reason. Just an oversight?
>
> - Heikki

Indeed.

Maybe Pavan knows of one, but I don't see any reason not to apply this
to unlogged tables as well. It slipped the review.



Re: Faster inserts with mostly-monotonically increasing values

2018-04-10 Thread Pavan Deolasee
On Tue, Apr 10, 2018 at 7:49 PM, Claudio Freire 
wrote:

> On Tue, Apr 10, 2018 at 11:10 AM, Heikki Linnakangas 
> wrote:
>
> >
> > Why is this RelationSetTargetBlock() call inside the "XLOG stuff" block?
> > ISTM that we're failing to take advantage of this optimization for
> unlogged
> > tables, for no particular reason. Just an oversight?
> >
> > - Heikki
>
> Indeed.
>
> Maybe Pavan knows of one, but I don't see any reason not to apply this
> to unlogged tables as well. It slipped the review.
>

Yes, looks like an oversight :-( I will fix it along with the other changes
that Peter requested.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Partition-wise aggregation/grouping

2018-04-10 Thread Jeevan Chalke
On Tue, Apr 10, 2018 at 7:30 PM, David Steele  wrote:

> Hi Jeevan,
>
> On 4/2/18 10:57 AM, Robert Haas wrote:
> > On Thu, Mar 29, 2018 at 9:02 AM, Jeevan Chalke
> >  wrote:
> >> Yep, I see the risk.
> >
> > Committed 0001 last week and 0002 just now.  I don't really see 0003 as
> > a critical need.  If somebody demonstrates that this saves a
> > meaningful amount of planning time, we can consider that part for a
> > future release.
>
> The bulk of this patch was committed so I have marked it that way.
>

Thanks, David.


>
> If you would like to pursue patch 03 I think it would be best to start a
> new thread and demonstrate how the patch will improve performance.
>

Sure.


>
> Regards,
> --
> -David
> da...@pgmasters.net
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: Transform for pl/perl

2018-04-10 Thread Dagfinn Ilmari Mannsåker
ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:

> Tom Lane  writes:
>
>> ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
>>> While playing around some more with the extension, I discoverered a few
>>> more issues:
>>> ...
>>> 4) SV_to_JsonbValue() throws an error for infinite NVs, but not NaNs
>>
>> The others sound like bugs, but that one's intentional, since type
>> numeric does have a concept of NaN.  If you're arguing that we should
>> disallow that value in the context of jsonb, maybe so, but it'd likely
>> take changes in quite a few more places than here.
>
> The numeric type that's used internally to represent numbers in jsonb
> might have the concept of NaN, but JSON itself does not:
>
> Numeric values that cannot be represented in the grammar below (such
> as Infinity and NaN) are not permitted.
>
>   - https://tools.ietf.org/html/rfc7159#section-6
[…]
>=# create or replace function jsonbnan() returns jsonb immutable language 
> plperlu transform for type jsonb as '0+"NaN"';
>CREATE FUNCTION
[…]
>=# select jsonbnan()::json;
>ERROR:  invalid input syntax for type json
>DETAIL:  Token "NaN" is invalid.
>CONTEXT:  JSON data, line 1: NaN

Also, it doesn't parse back in as jsonb either:

=# select jsonbnan()::text::json;
ERROR:  invalid input syntax for type json
DETAIL:  Token "NaN" is invalid.
CONTEXT:  JSON data, line 1: NaN

And it's inconsistent with to_jsonb():

=# select to_jsonb('nan'::numeric);
┌──┐
│ to_jsonb │
├──┤
│ "NaN"│
└──┘
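
That is, to_jsonb() renders a numeric NaN as a JSON *string*, not as a
number.  A quick way to see that (output is what I'd expect; I haven't
re-checked it here):

    =# select jsonb_typeof(to_jsonb('nan'::numeric));
    -- string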

It would be highly weird if PL transforms (jsonb_plpython does the same
thing) let you create spec-violating jsonb values that don't round-trip
via jsonb_out/in.

- ilmari
-- 
"The surreality of the universe tends towards a maximum" -- Skud's Law
"Never formulate a law or axiom that you're not prepared to live with
 the consequences of."  -- Skud's Meta-Law



Re: Boolean partitions syntax

2018-04-10 Thread David Rowley
On 3 February 2018 at 12:04, Tom Lane  wrote:
> Perhaps more useful to discuss: would that truly be the semantics we want,
> or should we just evaluate the expression and have done?  It's certainly
> arguable that "IN (random())" ought to draw an error, not compute some
> random value and use that.  But if you are insistent on partition bounds
> being immutable in any strong sense, you've already got problems, because
> e.g. a timestamptz literal's interpretation isn't necessarily fixed.
> It's only after we've reduced the original input to Datum form that we
> can make any real promises about the value not moving.  So I'm not seeing
> where is the bright line between "IN ('today')" and "IN (random())".

I see there's been some progress on this thread that has probably gone a
bit beyond this point without discussion of the desired semantics.

To kick that off, I'm wondering, with regard to the comment about
'today' vs random(): how does this differ from something like:

CREATE VIEW ... AS SELECT ... FROM ... WHERE datecol = 'today'; ?

In this case 'today' is going to be evaluated during the parse
analysis that's done during CREATE VIEW. Why would partitioning need
to be treated differently?
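
For example (just an illustration; the date shown in the deparsed definition
is whatever 'today' happened to resolve to at CREATE VIEW time):

    create table t (datecol date);
    create view v as select * from t where datecol = 'today';

    select pg_get_viewdef('v'::regclass);
    -- ... WHERE (t.datecol = '2018-04-10'::date)
    -- the literal was frozen during parse analysis of CREATE VIEW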

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Boolean partitions syntax

2018-04-10 Thread David Rowley
On 10 April 2018 at 23:13, Kyotaro HORIGUCHI
 wrote:
> Note: This is not intended to be committed this time but just for
> information.
>
> At Tue, 10 Apr 2018 10:34:27 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
>  wrote in 
> <20180410.103427.244142052.horiguchi.kyot...@lab.ntt.co.jp>
>> Just adding negation would work as a_expr is doing.
>>
>> > | '-' a_expr%prec UMINUS
>> > { $$ = doNegate($2, @1); }
>
> a_expr fits partbound_datum_list as is but it cannot sit
> side-by-side with MAX/MINVALUE at all. However c_expr can if
> columnref is not excluded. The attached patch does that and
> partition bound accepts the following syntax. (I didn't see the
> transform side at all)
>
> create table p2c1 partition of p2 for values from (log(1000),0+1,0/1) to (10, 
> 10, ('1'||'0')::int);

I imagined this would have had a check for volatile functions and some
user-friendly error message to say partition bounds must be immutable,
but instead, it does:

postgres=# create table d_p1 partition of d for values in (Random());
ERROR:  specified value cannot be cast to type double precision for column "d"
LINE 1: create table d_p1 partition of d for values in (Random());
^
DETAIL:  The cast requires a non-immutable conversion.
HINT:  Try putting the literal value in single quotes.

For inspiration, maybe you could follow the lead of CREATE INDEX:

postgres=# create index on d ((random()));
ERROR:  functions in index expression must be marked IMMUTABLE

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] logical decoding of two-phase transactions

2018-04-10 Thread Nikhil Sontakke
>> Anyways, will now wait for the next commitfest/opportunity to try to
>> get this in.
>
> It looks like this patch should be in the Needs Review state so I have
> done that and moved it to the next CF.
>
Thanks David,

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services



Re: Custom PGC_POSTMASTER GUC variables ... feasible?

2018-04-10 Thread Tom Lane
Jim Finnerty  writes:
> What were the possible failure scenarios that throwing a fatal error was
> intended to avoid, i.e. what sort of "hooking into" is the comment below
> referring to that was regarded as a fate worse than death?

The point is that if the extension is marking the variable as
PGC_POSTMASTER, it's presumably relying on that variable having the same
value in every process.  It might be using it as the size of an array in
shared memory, say.  If some processes have a different value, that could
end in a memory stomp, or some other crash that's substantially less clean
than a FATAL exit.

regards, tom lane



Re: Partitioned tables and covering indexes

2018-04-10 Thread Alexander Korotkov
On Tue, Apr 10, 2018 at 12:47 PM, Amit Langote <
langote_amit...@lab.ntt.co.jp> wrote:

> On 2018/04/10 16:07, Jaime Casanova wrote:
> > Hi,
> >
> > Trying covering indexes on partitioned tables i get this error
> > """
> > postgres=# create index on t1_part (i) include (t);
> > ERROR:  cache lookup failed for opclass 0
> > """
> >
> > To reproduce:
> >
> > create table t1_part (i int, t text) partition by hash (i);
> > create table t1_part_0 partition of t1_part for values with (modulus
> > 2, remainder 0);
> > create table t1_part_1 partition of t1_part for values with (modulus
> > 2, remainder 1);
> > insert into t1_part values (1, repeat('abcdefquerty', 20));
> >
> > create index on t1_part (i) include (t);
>
> It seems that the bug is caused due to the original IndexStmt that
> DefineIndex receives being overwritten when processing the INCLUDE
> columns.  Also, the original copy of it should be used when recursing for
> defining the index in partitions.
>
> Does the attached fix look correct?  Haven't checked the fix with ATTACH
> PARTITION though.
>

The attached patch seems to fix the problem.  However, I would rather get
rid of modifying stmt->indexParams.  That seems more logical
to me.  Also, it would be good to check some covering indexes on
partitioned tables.  See the attached patch.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


DefineIndex-fix-covering-index-partitioned-2.patch
Description: Binary data


Re: [HACKERS] [PATCH] Incremental sort

2018-04-10 Thread Alexander Korotkov
On Tue, Apr 10, 2018 at 4:15 PM, David Steele  wrote:

> On 4/9/18 11:56 AM, Alexander Korotkov wrote:
> >
> > Thank you very much for your efforts on reviewing this patch.
> > I'm looking forward work with you on this feature for PG12.
>
> Since there's a new patch I have changed the status to Needs Review and
> moved the entry to the next CF.
>

Right, thank you.

> Still, it seems to me that discussion and new patches will be required
> to address Tom's concerns.
>

Sounds correct to me.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: pgsql: Support partition pruning at execution time

2018-04-10 Thread Alvaro Herrera
Changed CC to pgsql-hackers.

Tom Lane wrote:
> Alvaro Herrera  writes:

> > Frankly, I don't like this.  I would rather have an instrument->ntuples2
> > rather than these "divide this by nloops, sometimes" schizoid counters.
> > This is already being misused by ON CONFLICT (see "other_path" in
> > show_modifytable_info).  But it seems like a correct fix would require
> > more code.
> 
> The question then becomes whether to put back nfiltered3, or to do
> something more to your liking.  Think I'd vote for the latter.

Doing it properly is not a lot of code actually.  Patch attached.  ON
CONFLICT is not changed by this patch, but that's a straightforward
change.

Questions:
1. Do we want to back-patch this to 10?  I suppose (without checking)
that EXPLAIN ANALYZE is already reporting bogus numbers for parallel
index-only scans, so I think we should do that.

2. Do we want to revert Andrew's test stabilization patch?  If I
understand correctly, the problem is the inverse of what was diagnosed:
"any running transaction at the time of the test could prevent pages
from being set as all-visible".  That's correct, but the test doesn't
depend on pages being all-visible -- quite the contrary, it depends on
the pages NOT being all-visible (which is why the HeapFetches counts are
all non-zero).  Since the pages contain very few tuples, autovacuum
should never process the tables anyway.
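
As a quick illustration of that dependency (the plan and the counters shown
are what I'd expect, not something I've re-verified while writing this):

    create table hf (a int primary key);
    insert into hf select generate_series(1, 1000);
    set enable_seqscan = off;
    set enable_bitmapscan = off;

    -- before VACUUM no page is marked all-visible, so every returned tuple
    -- needs a heap fetch
    explain (analyze, costs off, timing off, summary off)
      select a from hf where a < 100;
    --  Index Only Scan using hf_pkey on hf (actual rows=99 loops=1)
    --    Index Cond: (a < 100)
    --    Heap Fetches: 99

    -- after VACUUM sets the visibility map, the same query should report
    -- Heap Fetches: 0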

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 989b6aad67..fe0419311b 100644
*** a/src/backend/commands/explain.c
--- b/src/backend/commands/explain.c
***
*** 1459,1470  ExplainNode(PlanState *planstate, List *ancestors,
show_instrumentation_count("Rows Removed by 
Filter", 1,

   planstate, es);
if (es->analyze)
!   {
!   longheapFetches =
!   ((IndexOnlyScanState *) 
planstate)->ioss_HeapFetches;
! 
!   ExplainPropertyInteger("Heap Fetches", NULL, 
heapFetches, es);
!   }
break;
case T_BitmapIndexScan:
show_scan_qual(((BitmapIndexScan *) 
plan)->indexqualorig,
--- 1459,1466 
show_instrumentation_count("Rows Removed by 
Filter", 1,

   planstate, es);
if (es->analyze)
!   ExplainPropertyInteger("Heap Fetches", NULL,
!  
planstate->instrument->ntuples2, es);
break;
case T_BitmapIndexScan:
show_scan_qual(((BitmapIndexScan *) 
plan)->indexqualorig,
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 86252cee1f..fe5d55904d 100644
*** a/src/backend/executor/instrument.c
--- b/src/backend/executor/instrument.c
***
*** 156,161  InstrAggNode(Instrumentation *dst, Instrumentation *add)
--- 156,162 
dst->startup += add->startup;
dst->total += add->total;
dst->ntuples += add->ntuples;
+   dst->ntuples2 += add->ntuples2;
dst->nloops += add->nloops;
dst->nfiltered1 += add->nfiltered1;
dst->nfiltered2 += add->nfiltered2;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index ddc0ae9061..3a02a99621 100644
*** a/src/backend/executor/nodeIndexonlyscan.c
--- b/src/backend/executor/nodeIndexonlyscan.c
***
*** 162,168  IndexOnlyNext(IndexOnlyScanState *node)
/*
 * Rats, we have to visit the heap to check visibility.
 */
!   node->ioss_HeapFetches++;
tuple = index_fetch_heap(scandesc);
if (tuple == NULL)
continue;   /* no visible tuple, 
try next index entry */
--- 162,168 
/*
 * Rats, we have to visit the heap to check visibility.
 */
!   InstrCountTuples2(node, 1);
tuple = index_fetch_heap(scandesc);
if (tuple == NULL)
continue;   /* no visible tuple, 
try next index entry */
***
*** 509,515  ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int 
eflags)
indexstate->ss.ps.plan = (Plan *) node;
indexstate->ss.ps.state = estate;
indexstate->ss.ps.ExecProcNode = ExecIndexOnlyScan;
-   indexstate->ioss_HeapFetches = 0;
  
/*
 * Miscellane

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-10 Thread Robert Haas
On Mon, Apr 9, 2018 at 3:13 PM, Andres Freund  wrote:
> Let's lower the pitchforks a bit here.  Obviously a grand rewrite is
> absurd, as is some of the proposed ways this is all supposed to
> work. But I think the case we're discussing is much closer to a near
> irresolvable corner case than anything else.

Well, I admit that I wasn't entirely serious about that email, but I
wasn't entirely not-serious either.  If you can't reliably find
out whether the contents of the file on disk are the same as the
contents that the kernel is giving you when you call read(), then you
are going to have a heck of a time building a reliable system.  If the
kernel developers are determined to insist on these semantics (and,
admittedly, I don't know whether that's the case - I've only read
Anthony's remarks), then I don't really see what we can do except give
up on buffered I/O (or on Linux).

> We're talking about the storage layer returning an irresolvable
> error. You're hosed even if we report it properly.  Yes, it'd be nice if
> we could report it reliably.  But that doesn't change the fact that what
> we're doing is ensuring that data is safely fsynced unless storage
> fails, in which case it's not safely fsynced anyway.

I think that reliable error reporting is more than "nice" -- I think
it's essential.  The only argument for the current Linux behavior that
has been so far advanced on this thread, at least as far as I can see,
is that if it kept retrying the buffers forever, it would be pointless
and might run the machine out of memory, so we might as well discard
them.  But previous comments have already illustrated that the kernel
is not really up against a wall there -- it could put individual
inodes into a permanent failure state when it discards their dirty
data, as you suggested, or it could do what others have suggested, and
what I think is better, which is to put the whole filesystem into a
permanent failure state that can be cleared by remounting the FS.
That could be done on an as-needed basis -- if the number of dirty
buffers you're holding onto for some filesystem becomes too large, put
the filesystem into infinite-fail mode and discard them all.  That
behavior would be pretty easy for administrators to understand and
would resolve the entire problem here provided that no PostgreSQL
processes survived the eventual remount.

I also don't really know what we mean by an "unresolvable" error.  If
the drive is beyond all hope, then it doesn't really make sense to
talk about whether the database stored on it is corrupt.  In general
we can't be sure that we'll even get an error - e.g. the system could
be idle and the drive could be on fire.  Maybe this is the case you
meant by "it'd be nice if we could report it reliably".  But at least
in my experience, that's typically not what's going on.  You get some
I/O errors and so you remount the filesystem, or reboot, or rebuild
the array, or ... something.  And then the errors go away and, at that
point, you want to run recovery and continue using your database.  In
this scenario, it matters *quite a bit* what the error reporting was
like during the period when failures were occurring.  In particular,
if the database was allowed to think that it had successfully
checkpointed when it didn't, you're going to start recovery from the
wrong place.

I'm going to shut up now because I'm telling you things that you
obviously already know, but this doesn't sound like a "near
irresolvable corner case".  When the storage goes bonkers, either
PostgreSQL and the kernel can interact in such a way that a checkpoint
can succeed without all of the relevant data getting persisted, or
they don't.  It sounds like right now they do, and I'm not really
clear that we have a reasonable idea how to fix that.  It does not
sound like a PANIC is sufficient.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



2018-03 CF Cleanup

2018-04-10 Thread David Steele
Hackers,

I have gone through all the remaining non-bug entries in CF 2018-03 and
pushed them or closed them as appropriate.  The majority of the patches
were in Needs Review or Ready for Committer status and I did a brief
review of each to be sure the state was reasonable.

I moved a few Waiting on Author patches where it looked like the patch
was being actively developed even if there was no new patch.

For the moment I have left all bugs in the 2018-03 CF.  I can add
them to the "Older Bugs" section of the PG 11 Open Items but I'm not
convinced that is the best way to track them.  If they are added to
"Older Bugs", does that mean they get closed?  Or should we just add a
link to the CF entry in "Older Bugs"?

Thanks,
-- 
-David
da...@pgmasters.net



Re: 2018-03 CF Cleanup

2018-04-10 Thread Alvaro Herrera
David Steele wrote:
> Hackers,
> 
> I have gone through all the remaining non-bug entries in CF 2018-03 and
> pushed them or closed them as appropriate.  The majority of the patches
> were in Needs Review or Ready for Committer status and I did a brief
> review of each to be sure the state was reasonable.

Thanks!

> For the moment I have left all bugs in the 2018-03 CF.  I can can add
> them to the "Older Bugs" section of the PG 11 Open Items but I'm not
> convinced that is the best way to track them.  If they are added to
> "Older Bugs", does that mean they get closed?  Or should we just add a
> link to the CF entry in "Older Bugs"?

The Open Items page's "Older Bugs" section is generally a way to ignore items
forever (i.e., they don't represent a true Open Item for this release).
I think it's better to move these patches to the next commitfest
instead.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: lazy detoasting

2018-04-10 Thread Chapman Flack
On 04/10/2018 10:06 AM, Tom Lane wrote:
> Chapman Flack  writes:

>> Am I right in thinking that, for my original purpose of
>> detoasting something later in a transaction, all that matters
>> is that I registered a snapshot from the time at which I copied
>> the toasted datum, and the resource owner I registered it to
>> has not been released yet, ...
> 
> I believe so.

Ok.

I see I have about half a dozen choices for the snapshot that
I choose to squirrel away at the same time as the copied datum.
(That's not counting SnapshotToast, which I didn't know was a thing
until Pavan's thread this morning, but it's not a thing I can get,
just something constructed on the fly in the tuptoaster.)

Out of the six GetFooSnapshot()s, would I want to squirrel away
Active? Oldest? Transaction? This should be happening in a PL
function not doing anything arcane; the datum in question might
be passed to it as a parameter or retrieved from a query done
within the function.

GetOldestSnapshot() is what the tuptoaster will eventually use
to construct SnapshotToast when looking for the data, but it's
probably overkill for me to save the oldest one in sight at the
time I save the datum. Or is it? Should I be confident that, at
the time I'm handed this datum, its toasted content must be
retrievable through the (Active? Transaction?) snapshot at that
moment, and it's enough to register that, while to register the
Oldest would be to pin something unnecessarily far back?

> Wouldn't be a bad idea to test this, of course ;-)

Mm-hmm. (Thunderbird has just corrected my spelling of mm-hmm.)
I'm still not sure I know enough to construct a meaningful test...

I assume that, while I'm doing this for a backend PL at the
moment, some of the same considerations will apply if a future
wire protocol is to support Craig's "v4 protocol TODO item -
Lazy fetch/stream of TOASTed valued" suggestion in
https://www.postgresql.org/message-id/53ff0ef8@2ndquadrant.com

-Chap



Re: [PATCH][PROPOSAL] Refuse setting toast.* reloptions when TOAST table does not exist

2018-04-10 Thread David Steele
On 4/10/18 9:17 AM, Alvaro Herrera wrote:
> Nikolay Shaplov wrote:
> 
>> But I need some confirmation, in order not to write patch in vain again :-)
> 
> Don't worry, rest assured that you will still write *many* patches in
> vain, not just this one.

Despite the rather dubious pep talk, Álvaro is right.  You will get more
response with a new patch and a new thread.

But everyone is pretty fatigued right now, so I would recommend waiting
a while.  If you would like to pursue this, plan to have a new patch
ready in August for the next CF.  It sounds like you have an idea about
what needs to be done.

Regards,
-- 
-David
da...@pgmasters.net



Re: crash with sql language partition support function

2018-04-10 Thread Tom Lane
Alvaro Herrera  writes:
> Ashutosh Bapat wrote:
>> On Tue, Apr 10, 2018 at 1:44 PM, Amit Langote
>>  wrote:
>>> Attached fixes it.  It teaches RelationBuildPartitionKey() to use
>>> fmgr_info_cxt and pass rd_partkeycxt to it.

>> The patch is using partkeycxt and not rd_partkeycxt. Probably a typo
>> in the mail.

No, it's correct as written, because rd_partkeycxt hasn't been set yet.

> Good point.  Yeah, it looks like it should cause problems.  I think it
> would be better to have RelationGetPartitionKey() return a copy in the
> caller's context of the data of the relcache data, rather than the
> relcache data itself, no?  Seems like this would not fail if there never
> is a CCI between the RelationGetPartitionKey call and the last access of
> the returned key struct ... but if this is the explanation, then it's a
> pretty brittle setup and we should stop relying on that.

Yeah, all of the callers of RelationGetPartitionKey seem to assume that
the pointer they get is guaranteed valid and will stay so forever.
This is pretty dangerous, independently of the bug Amit mentions.

However, I'm not sure that copy-on-read is a good solution here, because
of exactly the point at stake that the FmgrInfos may have infrastructure.
We don't have a way to copy that, and even if we did, copying on every
reference would be really expensive.

We might try to make this work like the relcache infrastructure for
indexes, which also contains FmgrInfos.  However, in the index case
we may safely assume that that stuff never changes for the life of the
index.  I'm afraid that's not true for the partitioning data, is it?

How much does it actually buy us to store FmgrInfos here rather than,
say, function OIDs?  If we backed off to that, then the data structure
might be simple enough that copy-on-read would work.

Otherwise we might need some kind of refcount mechanism.  We built one
for domain constraints in the typcache, and it's not horrid, but it is a
fair amount of code.

> Finally: I don't quite understand why we need two memory contexts there,
> one for the partkey and another for the partdesc.  Surely one context
> for both is sufficient.

It'd only matter if there were a reason to delete/rebuild one but not the
other within a particular relcache entry, which probably there isn't.
So using one context for both would likely be a bit simpler and more
efficient.

BTW, another thing in the same general area that I was planning to get
on the warpath about sooner or later is that the code managing
rd_partcheck is really cruddy.  It spews a lot of random data structure
into the CacheMemoryContext with no way to release it at relcache inval,
resulting in a session-lifespan memory leak.  (pfree'ing just the List
header, as RelationDestroyRelation does, is laughably inadequate.)
Perhaps that could be fixed by likewise storing it in a sub-context
used for partition information.

regards, tom lane



Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-10 Thread Robert Haas
On Tue, Apr 10, 2018 at 1:37 AM, Craig Ringer  wrote:
> ... but *only if they hit an I/O error* or they're on a FS that
> doesn't reserve space and hit ENOSPC.
>
> It still does 99% of the job. It still flushes all buffers to
> persistent storage and maintains write ordering. It may not detect and
> report failures to the user how we'd expect it to, yes, and that's not
> great. But it's hardly throw up our hands and give up territory
> either. Also, at least for initdb, we can make initdb fsync() its own
> files before close(). Annoying but hardly the end of the world.

I think we'd need every child postgres process started by initdb to do
that individually, which I suspect would slow down initdb quite a lot.
Now admittedly for anybody other than a PostgreSQL developer that's
only a minor issue, and our regression tests mostly run with fsync=off
anyway.  But I have a strong suspicion that our assumptions about how
fsync() reports errors are baked into an awful lot of parts of the
system, and by the time we get unbaking them I think it's going to be
really surprising if we haven't done real harm to overall system
performance.

BTW, I took a look at the MariaDB source code to see whether they've
got this problem too and it sure looks like they do.
os_file_fsync_posix() retries the fsync in a loop with an 0.2 second
sleep after each retry.  It warns after 100 failures and fails an
assertion after 1000 failures.  It is hard to understand why they
would have written the code this way unless they expect errors
reported by fsync() to continue being reported until the underlying
condition is corrected.  But, it looks like they wouldn't have the
problem that we do with trying to reopen files to fsync() them later
-- I spot checked a few places where this code is invoked and in all
of those it looks like the file is already expected to be open.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Including SQL files in extension scripts

2018-04-10 Thread Jeremy Finzel
In writing extension update scripts, I find it really difficult to grok
diffs, for example in changed view or function definitions, when a new
extension script has to include the whole definition in a new file.  I would
rather use separate files for these objects and then use something like
psql's \i to include the definitions, so that version control (and
management of these files) becomes easier, but this does not appear to be
supported.  Does anyone have any recommendations here?

Currently, I often simply opt to create a script that builds out the
extension SQL files from separated sql files so that I can more easily
maintain them.  But I was hoping there is a better way.  Any ideas would be
much appreciated.
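For concreteness, what I do today looks roughly like this (extension and file
names made up), using PGXS's DATA_built to concatenate the pieces so the
individual files stay small and diffable while the shipped script is generated:

EXTENSION  = myext
DATA_built = myext--1.1.sql

# Keep tables/views/functions in their own files; the installed script
# is just their concatenation, rebuilt whenever a piece changes.
myext--1.1.sql: sql/tables.sql sql/views.sql sql/functions.sql
	cat $^ > $@

PG_CONFIG ?= pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)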

Thanks,
Jeremy


Re: Boolean partitions syntax

2018-04-10 Thread Tom Lane
David Rowley  writes:
> I imagined this would have had a check for volatile functions and some
> user-friendly error message to say partition bounds must be immutable,
> but instead, it does:

> postgres=# create table d_p1 partition of d for values in (Random());
> ERROR:  specified value cannot be cast to type double precision for column "d"
> LINE 1: create table d_p1 partition of d for values in (Random());
> ^
> DETAIL:  The cast requires a non-immutable conversion.
> HINT:  Try putting the literal value in single quotes.

> For inspiration, maybe you could follow the lead of CREATE INDEX:

> postgres=# create index on d ((random()));
> ERROR:  functions in index expression must be marked IMMUTABLE

Well, that just begs the question: why do these expressions need to
be immutable?  What we really want, I think, is to evaluate them
and reduce them to constants.  After that, it hardly matters whether
the original expression was volatile.  I see absolutely no moral
difference between "for values in (random())" and cases like
this, which works today:

regression=# create table pp(d1 date) partition by range(d1);
CREATE TABLE
regression=# create table cc partition of pp for values from ('today') to ('tomorrow');
CREATE TABLE
regression=# \d+ cc
                                   Table "public.cc"
 Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+------+-----------+----------+---------+---------+--------------+-------------
 d1     | date |           |          |         | plain   |              |
Partition of: pp FOR VALUES FROM ('2018-04-10') TO ('2018-04-11')
Partition constraint: ((d1 IS NOT NULL) AND (d1 >= '2018-04-10'::date) AND (d1 < '2018-04-11'::date))

If we're willing to reduce 'today'::date to a fixed constant,
why not random()?

regards, tom lane



Re: Partitioned tables and covering indexes

2018-04-10 Thread Teodor Sigaev

Does the attached fix look correct?  Haven't checked the fix with ATTACH
PARTITION though.


Attached patch seems to fix the problem.  However, I would rather get
rid of modifying stmt->indexParams.  That seems to be more logical
for me.  Also, it would be good to check some covering indexes on
partitioned tables.  See the attached patch.
That seems like the right way: do not modify the incoming object, and do not
copy a rather large and deeply nested structure, as Amit suggested.


But it would be better to have an ATTACH PARTITION test too.
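Something along these lines, perhaps (just a sketch; table and index names are
invented), to check that an attached partition ends up with a matching
covering index:

create table covidx_parted (a int, b text) partition by range (a);
create index on covidx_parted (a) include (b);

create table covidx_part1 (a int, b text);
alter table covidx_parted attach partition covidx_part1
    for values from (1) to (1000);

-- the attached partition should now have an index on (a) INCLUDE (b)
\d covidx_part1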

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-10 Thread Anthony Iliopoulos
Hi Robert,

On Tue, Apr 10, 2018 at 11:15:46AM -0400, Robert Haas wrote:
> On Mon, Apr 9, 2018 at 3:13 PM, Andres Freund  wrote:
> > Let's lower the pitchforks a bit here.  Obviously a grand rewrite is
> > absurd, as is some of the proposed ways this is all supposed to
> > work. But I think the case we're discussing is much closer to a near
> > irresolvable corner case than anything else.
> 
> Well, I admit that I wasn't entirely serious about that email, but I
> wasn't entirely not-serious either.  If you can't find reliably find
> out whether the contents of the file on disk are the same as the
> contents that the kernel is giving you when you call read(), then you
> are going to have a heck of a time building a reliable system.  If the
> kernel developers are determined to insist on these semantics (and,
> admittedly, I don't know whether that's the case - I've only read
> Anthony's remarks), then I don't really see what we can do except give
> up on buffered I/O (or on Linux).

I think it would be interesting to get in touch with some of the
respective linux kernel maintainers and open up this topic for
more detailed discussions. LSF/MM'18 is upcoming and it would
have been the perfect opportunity but it's past the CFP deadline.
It may still be worth contacting the organizers to bring forward
the issue, and see if there is a chance to have someone from
Pg invited for further discussions.

Best regards,
Anthony



Re: pgsql: Support partition pruning at execution time

2018-04-10 Thread Tom Lane
Alvaro Herrera  writes:
> Questions:
> 1. Do we want to back-patch this to 10?  I suppose (without checking)
> that EXPLAIN ANALYZE is already reporting bogus numbers for parallel
> index-only scans, so I think we should do that.

You can't back-patch a change in struct Instrumentation; that'd be
an ABI break.  Maybe there are no third parties directly touching that
struct, but I wouldn't bet on it.

> 2. Do we want to revert Andrew's test stabilization patch?  If I
> understand correctly, the problem is the inverse of what was diagnosed:
> "any running transaction at the time of the test could prevent pages
> from being set as all-visible".  That's correct, but the test doesn't
> depend on pages being all-visible -- quite the contrary, it depends on
> the pages NOT being all-visible (which is why the HeapFetches counts are
> all non-zero).  Since the pages contain very few tuples, autovacuum
> should never process the tables anyway.

I did not especially like the original test output, because even without
the bug at hand, it seemed to me the number of heap fetches might vary
depending on BLCKSZ.  Given that the point of the test is just to check
partition pruning, seems like IOS vs regular indexscan isn't a critical
difference.  I'd keep Andrew's change but fix the comment.

regards, tom lane



Re: pgsql: Support partition pruning at execution time

2018-04-10 Thread David Rowley
On 11 April 2018 at 03:14, Alvaro Herrera  wrote:
> 2. Do we want to revert Andrew's test stabilization patch?  If I
> understand correctly, the problem is the inverse of what was diagnosed:
> "any running transaction at the time of the test could prevent pages
> from being set as all-visible".  That's correct, but the test doesn't
> depend on pages being all-visible -- quite the contrary, it depends on
> the pages NOT being all-visible (which is why the HeapFetches counts are
> all non-zero).  Since the pages contain very few tuples, autovacuum
> should never process the tables anyway.

I think it's probably a good idea to revert it once the
instrumentation is working correctly. It appears this found a bug in
that code, so is probably useful to keep just in case something else
breaks it in the future.

I don't think there is too much risk of instability from other
sources. There's no reason an auto-vacuum would trigger and cause a
change in heap fetches. We only delete one row from lprt_a, that's not
going to trigger an auto-vacuum.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Support partition pruning at execution time

2018-04-10 Thread David Rowley
On 11 April 2018 at 03:42, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> 2. Do we want to revert Andrew's test stabilization patch?  If I
>> understand correctly, the problem is the inverse of what was diagnosed:
>> "any running transaction at the time of the test could prevent pages
>> from being set as all-visible".  That's correct, but the test doesn't
>> depend on pages being all-visible -- quite the contrary, it depends on
>> the pages NOT being all-visible (which is why the HeapFetches counts are
>> all non-zero).  Since the pages contain very few tuples, autovacuum
>> should never process the tables anyway.
>
> I did not especially like the original test output, because even without
> the bug at hand, it seemed to me the number of heap fetches might vary
> depending on BLCKSZ.  Given that the point of the test is just to check
> partition pruning, seems like IOS vs regular indexscan isn't a critical
> difference.  I'd keep Andrew's change but fix the comment.

hmm, I don't feel strongly about reverting or not, but if
[auto-]vacuum has not visited the table, then I don't see why BLCKSZ
would have an effect here.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Support partition pruning at execution time

2018-04-10 Thread Robert Haas
On Tue, Apr 10, 2018 at 11:14 AM, Alvaro Herrera
 wrote:
> Questions:
> 1. Do we want to back-patch this to 10?  I suppose (without checking)
> that EXPLAIN ANALYZE is already reporting bogus numbers for parallel
> index-only scans, so I think we should do that.

I haven't looked at this closely, but have you considered adding
bespoke code for IndexOnlyScan that works like
ExecSortRetrieveInstrumentation and ExecHashRetrieveInstrumentation
already do rather than jamming this into struct Instrumentation?

I'm inclined to view any node-specific instrumentation that's not
being pulled back to the leader as a rough edge to be filed down when
it bothers somebody more than an outright bug, but perhaps that is an
unduly lenient view.  Still, if we take the view that it's an outright
bug, I suspect we find that there may be at least a few more of those.
I was pretty much oblivious to this problem during the initial
parallel query development and mistakenly assumed that bringing over
struct Instrumentation was good enough.  It emerged late in the game
that this wasn't really the case, but holding up the whole feature
because some nodes might have details not reported on a per-worker
basis didn't really seem to make sense.  Whether that was the right
call is obviously arguable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: WIP: Covering + unique indexes.

2018-04-10 Thread Teodor Sigaev

* There is no pfree() within _bt_buildadd() for truncated tuples, even
though that's a context where it's clearly not okay.

Agreed.



* It might be a good idea to also pfree() the truncated tuple for most
other _bt_buildadd() callers. Even though it's arguably okay in other
cases, it seems worth being consistent about it (consistent with old
nbtree code).

Hmm, I don't see other call sites that pfree it afterward.


* There should probably be some documentation around why it's okay
that we call index_truncate_tuple() with an exclusive buffer lock held
(during a page split). For example, there should probably be a comment
on the VARATT_IS_EXTERNAL() situation.

I have no objection to improving the docs/comments.



* Not sure that all calls to BTreeInnerTupleGetDownLink() are limited
to inner tuples, which might be worth doing something about (perhaps
just renaming the macro).

Which place looks suspicious, in your opinion?



I do not have the time to write a patch right away, but I should be
able to post one in a few days. I want to avoid sending several small
patches.

No problem, we can wait.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: pgsql: Merge catalog/pg_foo_fn.h headers back into pg_foo.h headers.

2018-04-10 Thread Julien Rouhaud
On Tue, Apr 10, 2018 at 3:46 PM, Tom Lane  wrote:
>
> Julien Rouhaud  writes:
>> I think the best fix if to define NO_GENERATED_HEADERS in pgxs.mk,
>> patch attached.
>
> Hm.  I wonder if we don't also want NO_TEMP_INSTALL, like the doc/src/sgml
> makefile.  I don't know whether "make check" could be useful in a PGXS
> build, but certainly that recipe for making a temp install isn't gonna
> work.

If I understand correctly, PGXS.mk already forbids a "make check" if
PGXS is defined.  And it seems that postgres' contribs rely on
including PGXS.mk, setting NO_PGXS and doing a "make check", so
NO_TEMP_INSTALL shouldn't be needed.



Re: Online enabling of checksums

2018-04-10 Thread Robert Haas
On Fri, Apr 6, 2018 at 8:59 PM, Andres Freund  wrote:
> This is PROPARALLEL_RESTRICTED. That doesn't strike me right, shouldn't
> they be PROPARALLEL_UNSAFE? It might be fine, but I'd not want to rely
> on it.

Just a fine-grained note on this particular point:

It's totally fine for parallel-restricted operations to write WAL,
write to the filesystem, or launch nukes at ${ENEMY_NATION}.  Well, I
mean, the last one might be a bad idea for geopolitical reasons, but
it's not a problem for parallel query.  It is a problem to insert or
update heap tuples because it might extend the relation; mutual
exclusion doesn't work properly there yet (there was a patch to fix
that, but you had some concerns and it didn't go in).  It is a problem
to update or delete heap tuples which might create new combo CIDs; not
all workers will have the same view (there's no patch for this yet
AFAIK, but the fix probably doesn't look that different from
cc5f81366c36b3dd8f02bd9be1cf75b2cc8482bd and could probably use most
of the same infrastructure).

TL;DR: Writing pages (e.g. to set a checksum) doesn't make something
non-parallel-safe.  Writing heap tuples makes it parallel-unsafe.
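As a purely illustrative example at the SQL level (hypothetical functions, not
from the patch), the distinction shows up when labeling functions:

create table audit_log (msg text);

-- PARALLEL RESTRICTED: may run in a parallel plan, but only in the leader;
-- writing WAL or files would be acceptable in such a function.
create function leader_only_stamp() returns timestamptz
    as $$ select clock_timestamp() $$
    language sql parallel restricted;

-- PARALLEL UNSAFE: inserts heap tuples, so any query calling it cannot use
-- parallelism at all.
create function log_audit_row(m text) returns void
    as $$ insert into audit_log (msg) values (m) $$
    language sql parallel unsafe;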

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgsql: Support partition pruning at execution time

2018-04-10 Thread Alvaro Herrera
Robert Haas wrote:
> On Tue, Apr 10, 2018 at 11:14 AM, Alvaro Herrera
>  wrote:
> > Questions:
> > 1. Do we want to back-patch this to 10?  I suppose (without checking)
> > that EXPLAIN ANALYZE is already reporting bogus numbers for parallel
> > index-only scans, so I think we should do that.
> 
> I haven't looked at this closely, but have you considered adding
> bespoke code for IndexOnlyScan that works like
> ExecSortRetrieveInstrumentation and ExecHashRetrieveInstrumentation
> already do rather than jamming this into struct Instrumentation?

Thanks for pointing these out -- I hadn't come across these.

My initial impression is that those two are about transferring
instrumentation state that's quite a bit more complicated than what my
patch proposes for indexonly scan, which is just a single integer
counter.  For example, in the case of Sort, for each worker we display
separately the method and type and amount of memory.  In the hash case,
we aggregate all of them together for some reason (though I'm not clear
about this one:
/*
 * In a parallel-aware hash join, for now we report the
 * maximum peak memory reported by any worker.
 */
hinstrument.space_peak =
Max(hinstrument.space_peak, worker_hi->space_peak);
 -- why shouldn't we sum these values?)

In contrast, in an indexonly scan you have a single counter and it
doesn't really matter the distribution of fetches done by workers, so it
seems okay to aggregate them all in a single counter.  And it being so
simple, it seems reasonable to me to put it in Instrumentation rather
than have a dedicated struct.

> I'm inclined to view any node-specific instrumentation that's not
> being pulled back to the leader as a rough edge to be filed down when
> it bothers somebody more than an outright bug, but perhaps that is an
> unduly lenient view.  Still, if we take the view that it's an outright
> bug, I suspect we find that there may be at least a few more of those.

OK.  As Tom indicated, it's not possible to backpatch this anyway.
Given that nobody has complained to date, it seems okay to be lenient
about this.  Seeking a backpatchable solution seems more trouble than it's
worth.

> I was pretty much oblivious to this problem during the initial
> parallel query development and mistakenly assumed that bringing over
> struct Instrumentation was good enough.  It emerged late in the game
> that this wasn't really the case, but holding up the whole feature
> because some nodes might have details not reported on a per-worker
> basis didn't really seem to make sense.  Whether that was the right
> call is obviously arguable.

I certainly don't blame you for that.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Boolean partitions syntax

2018-04-10 Thread David Rowley
On 11 April 2018 at 03:34, Tom Lane  wrote:
> David Rowley  writes:
>> I imagined this would have had a check for volatile functions and some
>> user-friendly error message to say partition bounds must be immutable,
>> but instead, it does:
>
>> postgres=# create table d_p1 partition of d for values in (Random());
>> ERROR:  specified value cannot be cast to type double precision for column 
>> "d"
>> LINE 1: create table d_p1 partition of d for values in (Random());
>> ^
>> DETAIL:  The cast requires a non-immutable conversion.
>> HINT:  Try putting the literal value in single quotes.
>
>> For inspiration, maybe you could follow the lead of CREATE INDEX:
>
>> postgres=# create index on d ((random()));
>> ERROR:  functions in index expression must be marked IMMUTABLE
>
> Well, that just begs the question: why do these expressions need to
> be immutable?  What we really want, I think, is to evaluate them
> and reduce them to constants.  After that, it hardly matters whether
> the original expression was volatile.  I see absolutely no moral
> difference between "for values in (random())" and cases like
> this, which works today:

I'd personally be pretty surprised if this worked. What other DDL will
execute a volatile function? What if the volatile function has side
effects? What if the user didn't want the function evaluated and
somehow thought they wanted the evaluation to take place on INSERT?

I imagine if someone does this then they're probably doing something
wrong, and we should tell them, rather than perhaps silently doing
something they don't want. Perhaps someone might think they can
randomly distribute records into a set of partitions with something
like: for values in(((random() * 1000)::int between 0 and 100)), they
might be surprised when all their records end up in the same (random)
partition.

If we did this, then it seems like we're more likely to live to regret
doing it, rather than regret not doing it.  If someone comes along and
gives us some valid use case in the future, then maybe it can be
considered then. I just can't imagine what that use case would be...

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [PATCH] pg_hba.conf : new auth option : clientcert=verify-full

2018-04-10 Thread Magnus Hagander
On Tue, Apr 10, 2018 at 2:10 PM, Julian Markwort <
julian.markw...@uni-muenster.de> wrote:

> On Fri, 2018-04-06 at 20:31 +0200, Magnus Hagander wrote:
>
> I've been through this one again.
>
> Thanks for taking the time!
>
> There is one big omission from it -- it fails to work with the view
> pg_hba_file_rules. When fixing that, things started to look kind of ugly
> with the "two booleans to indicate one thing", so I went ahead and changed
> it to instead be an enum of 3 values.
>
> Oh, I missed that view. To be honest, it wasn't even on my mind that there
> could be a view depending on pg_hba
>
> Also, now when using verify-full or verify-ca, I figured its a lot easier
> to misspell the value, and we don't want that to mean "off". Instead, I
> made it give an error in this case instead of silently treating it as 0.
>
> Good catch!
>
> The option "2" for clientcert was never actually documented, and I think
> we should get rid of that. We need to keep "1" for backwards compatibility
> (which arguably was a bad choice originally, but that was many years ago),
> but let's not add another one.
> I also made a couple of very small changes to the docs.
>
> Attached is an updated patch with these changes. I'd appreciate it if you
> can run it through your tests to confirm that it didn't break any of those
> usecases.
>
> I've tested a couple of things with this and it seems to work as expected.
> Unforunately, there are no tests for libpq, afaik. But testing such
> features would become complicated quite quickly, with the need to generate
> certificates and such...
>

As Peter mentioned, there are tests in src/test/ssl. I forgot about those, but
yes, it would be useful to have that.
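For ad-hoc testing, something along these lines is usually enough to produce a
client certificate whose CN matches a database user (file names arbitrary,
assuming an existing test CA key pair in root.crt/root.key):

openssl req -new -nodes -keyout client.key -out client.csr -subj "/CN=testuser"
openssl x509 -req -in client.csr -CA root.crt -CAkey root.key \
    -CAcreateserial -out client.crt -days 365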



I've noticed that the error message for mismatching CN is now printed twice
> when using password prompts, as all authentication details are checked upon
> initiation of a connection and again later, after sending the password.
>

That depends on your authentication method. Specifically for passwords,
what happens is there are actually two separate connections -- the first
one with no password supplied, and the second one with a password in it.

We could directly reject the connection after the first one and not ask for
a password. I'm not sure if that's better though -- that would leak the
information on *why* we rejected the connection.


2018-04-10 13:54:19.882 CEST [8735] LOG:  provided user name (testuser) and
> authenticated user name (nottestuser) do not match
> 2018-04-10 13:54:19.882 CEST [8735] LOG:  certificate validation failed
> for user "testuser": common name in client certificate does not match user
> name or mapping, but clientcert=verify-full is enabled.
> 2018-04-10 13:54:21.515 CEST [8736] LOG:  provided user name (testuser)
> and authenticated user name (nottestuser) do not match
> 2018-04-10 13:54:21.515 CEST [8736] LOG:  certificate validation failed
> for user "testuser": common name in client certificate does not match user
> name or mapping, but clientcert=verify-full is enabled.
> 2018-04-10 13:54:21.515 CEST [8736] FATAL:  password authentication failed
> for user "testuser"
> 2018-04-10 13:54:21.515 CEST [8736] DETAIL:  Connection matched
> pg_hba.conf line 94: "hostssl all testuser 127.0.0.1/32 password
> clientcert=verify-full"
>
>
> Also, as you can see, there are two LOG messages indicating the mismatch
> -- "provided user name (testuser) and authenticated user name (nottestuser)
> do not match" comes from hba.c:check_usermap() and "certificate validation
> failed for user "testuser": common name in client certificate does not
> match user name or mapping, but clientcert=verify-full is enabled." is the
> message added in auth.c:CheckCertAuth() by the patch.
> Maybe we should shorten the latter LOG message?
>


It might definitely be worth shortening it, yeah. For one, we can just use
"cn" :)


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: pgsql: Support partition pruning at execution time

2018-04-10 Thread Robert Haas
On Tue, Apr 10, 2018 at 12:29 PM, Alvaro Herrera
 wrote:
> In contrast, in an indexonly scan you have a single counter and it
> doesn't really matter the distribution of fetches done by workers, so it
> seems okay to aggregate them all in a single counter.  And it being so
> simple, it seems reasonable to me to put it in Instrumentation rather
> than have a dedicated struct.

I don't have a strong opinion on that.  Since we know how many tuples
were processed by each worker, knowing how many heap fetches we have
on a per-worker basis seems like a good thing to have, too.  On the
other hand, maybe EXPLAIN (ANALYZE, VERBOSE) would give us that
anyway, since it knows about displaying per-worker instrumentation
(look for /* Show worker detail */).  If it doesn't, then that's
probably bad, because I'm pretty sure that when I installed that code
the stuff that got displayed for worker instrumentation pretty much
matched the stuff that got displayed for overall instrumentation.

In any case part of the point is that Instrumentation is supposed to
be a generic structure that contains things that are for the most part
common to all node types.  So what MERGE did to that structure looks
like an inadvisable kludge to me.  I'd get rid of that and do it a
different way rather than propagate the idea that nfilteredX is
scratch space that can mean something different in every separate
node.

>> I was pretty much oblivious to this problem during the initial
>> parallel query development and mistakenly assumed that bringing over
>> struct Instrumentation was good enough.  It emerged late in the game
>> that this wasn't really the case, but holding up the whole feature
>> because some nodes might have details not reported on a per-worker
>> basis didn't really seem to make sense.  Whether that was the right
>> call is obviously arguable.
>
> I certainly don't blame you for that.

Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Optimization of range queries

2018-04-10 Thread Konstantin Knizhnik



On 09.04.2018 20:05, Teodor Sigaev wrote:

Hi!

12 years ago I proposed a patch which could "union" OR clauses into
one range clause where possible. At that time pgsql could not use IS
NULL as an index clause, so the patch doesn't support that


https://www.postgresql.org/message-id/flat/45742C51.9020602%40sigaev.ru

option number 4), all other are already committed.


It seems to be a slightly different optimization.
Attached please find a small patch which extends simplify_and_arguments in
clauses.c to eliminate redundant checks.
It doesn't perform complete constraint propagation and doesn't use
predicate_implied_by/predicate_refuted_by, because they seem too expensive
and would noticeably increase query optimization time. Instead, it just does
a strict match comparison of predicates, with some extra logic for handling
negators.


With this patch, the constructed query plans are optimal:

postgres=# create table foo(x integer primary key, y integer);
CREATE TABLE
postgres=# insert into foo (x) values (generate_series(1,10));
INSERT 0 10
postgres=# insert into foo (x,y) values (generate_series(11,20), 1);
INSERT 0 10
postgres=# vacuum analyze foo;
VACUUM
postgres=# explain select * from foo where not (x < 9 and y is not null) and (x <= 11 and y is not null);
                             QUERY PLAN
---------------------------------------------------------------------
 Index Scan using foo_pkey on foo  (cost=0.42..8.48 rows=2 width=8)
   Index Cond: ((x <= 11) AND (x >= 9))
   Filter: (y IS NOT NULL)
(3 rows)

postgres=# explain select * from foo where x <= 11 and y is not null and not (x < 9 and y is not null);
                             QUERY PLAN
---------------------------------------------------------------------
 Index Scan using foo_pkey on foo  (cost=0.42..8.48 rows=2 width=8)
   Index Cond: ((x <= 11) AND (x >= 9))
   Filter: (y IS NOT NULL)
(3 rows)






Konstantin Knizhnik wrote:

Hi hackers,

Postgres optimizer is not able to build efficient execution plan for 
the following query:


explain select * from people_raw where not ("ID"<2068113880 AND "INN" is not null) and "ID"<=2068629726 AND "INN" is not null;

                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Bitmap Heap Scan on people_raw  (cost=74937803.72..210449640.49 rows=121521030 width=336)
   Recheck Cond: ("ID" <= 2068629726)
   Filter: (("INN" IS NOT NULL) AND (("ID" >= 2068113880) OR ("INN" IS NULL)))
   ->  Bitmap Index Scan on "People_pkey"  (cost=0.00..74907423.47 rows=2077021718 width=0)
         Index Cond: ("ID" <= 2068629726)
(5 rows)


Here the table is very large, but the query affects only a relatively small
number of rows, located in the range [2068113880, 2068629726].

But unfortunately the optimizer doesn't take that into account.
Moreover, its handling of "is not null" and "not null" in the two operands of
the AND is not smart:
  (("INN" IS NOT NULL) AND (("ID" >= 2068113880) OR ("INN" IS NULL)))


If I remove the "is not null" condition, then the plan is perfect:

explain select * from people_raw where not ("ID"<2068113880) and "ID"<=2068629726;

                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Index Scan using "People_pkey" on people_raw  (cost=0.58..196745.57 rows=586160 width=336)
   Index Cond: (("ID" >= 2068113880) AND ("ID" <= 2068629726))
(2 rows)

Before starting an investigation of the problem, I would like to know the
opinion, and perhaps some advice, of people familiar with the optimizer:

how difficult will it be to handle this case, and where should I look?

Thanks in advance,





--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index ed6b680..488d0e6 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -322,6 +322,133 @@ and_clause(Node *clause)
 }
 
 /*
+ * Returns t iff its argument is a 'true' constant
+ */
+static bool
+is_true(Node *clause)
+{
+	return IsA(clause, Const) &&
+		!((Const *) clause)->constisnull &&
+		DatumGetBool(((Const *) clause)->constvalue);
+}
+
+/*
+ * Returns t iff its argument is a 'not' clause: (NOT { expr }).
+ */
+static bool
+is_not(Node *clause)
+{
+	return IsA(clause, BoolExpr) &&
+		((BoolExpr *) clause)->boolop == NOT_EXPR;
+}
+
+static bool is_negator(Node *c1, Node *c2);
+
+
+/*
+ * Returns t iff two expressions are the same, or one of them is a 'not' clause whose argument is the negator of the other expression
+ */
+static bool
+is_equal(Node *c1, Node *c2)
+{
+	if (equal(c1, c2))
+		return true;
+
+	if ((is_not(c1) && is_negator(linitial(((BoolExpr*)c1)->args), c2)) ||
+		(is_not(c2) && is_negator(linitial(((BoolExpr*)c2)->args), c1)))
+		return true;
+
+	return false;
+}
+
+/*
+ *

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-10 Thread Greg Stark
On 9 April 2018 at 11:50, Anthony Iliopoulos  wrote:
> On Mon, Apr 09, 2018 at 09:45:40AM +0100, Greg Stark wrote:
>> On 8 April 2018 at 22:47, Anthony Iliopoulos  wrote:

> To make things a bit simpler, let us focus on EIO for the moment.
> The contract between the block layer and the filesystem layer is
> assumed to be that of, when an EIO is propagated up to the fs,
> then you may assume that all possibilities for recovering have
> been exhausted in lower layers of the stack.

Well Postgres is using the filesystem. The interface between the block
layer and the filesystem may indeed need to be more complex, I
wouldn't know.

But I don't think "all possibilities" is a very useful concept.
Neither layer here is going to be perfect. They can only promise that
all possibilities that have actually been implemented have been
exhausted. And even among those only to the degree they can be done
automatically within the engineering tradeoffs and constraints. There
will always be cases like thin provisioned devices that an operator
can expand, or degraded raid arrays that can be repaired after a long
operation and so on. A network device can't be sure whether a remote
server may eventually come back or not and have to be reconfigured by
a human or system automation tool to point to the new server or new
network configuration.

> Right. This implies though that apart from the kernel having
> to keep around the dirtied-but-unrecoverable pages for an
> unbounded time, that there's further an interface for obtaining
> the exact failed pages so that you can read them back.

No, the interface we have is fsync which gives us that information
with the granularity of a single file. The database could in theory
recognize that fsync is not completing on a file and read that file
back and write it to a new file. More likely we would implement a
feature Oracle has of writing key files to multiple devices. But
currently in practice that's not what would happen, what would happen
would be a human would recognize that the database has stopped being
able to commit and there are hardware errors in the log and would stop
the database, take a backup, and restore onto a new working device.
The current interface is that there's one error and then Postgres
would pretty much have to say, "sorry, your database is corrupt and
the data is gone, restore from your backups". Which is pretty dismal.

> There is a clear responsibility of the application to keep
> its buffers around until a successful fsync(). The kernels
> do report the error (albeit with all the complexities of
> dealing with the interface), at which point the application
> may not assume that the write()s where ever even buffered
> in the kernel page cache in the first place.

Postgres cannot just store the entire database in RAM. It writes
things to the filesystem all the time. It calls fsync only when it
needs a write barrier to ensure consistency. That's only frequent on
the transaction log to ensure it's flushed before data modifications
and then periodically to checkpoint the data files. The amount of data
written between checkpoints can be arbitrarily large and Postgres has
no idea how much memory is available as filesystem buffers or how much
i/o bandwidth is available or other memory pressure there is. What
you're suggesting is that the application should have to babysit the
filesystem buffer cache and reimplement all of it in user-space
because the filesystem is free to throw away any data any time it
chooses?

The current interface to throw away filesystem buffer cache is
unmount. It sounds like the kernel would like a more granular way to
discard just part of a device which makes a lot of sense in the age of
large network block devices. But I don't think just saying that the
filesystem buffer cache is now something every application needs to
re-implement in user-space really helps with that, they're going to
have the same problems to solve.

-- 
greg



Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2018-04-10 Thread Robert Haas
On Tue, Apr 10, 2018 at 5:40 AM, Masahiko Sawada  wrote:
> The probability of performance degradation can be reduced by
> increasing N_RELEXTLOCK_ENTS. But as Robert mentioned, while keeping
> fast and simple implementation like acquiring lock by a few atomic
> operation it's hard to improve or at least keep the current
> performance on all cases. I was thinking that this patch is necessary
> by parallel DML operations and vacuum but if the community cannot
> accept this approach it might be better to mark it as "Rejected" and
> then I should reconsider the design of parallel vacuum.

I'm sorry that I didn't get time to work further on this during the
CommitFest.  In terms of moving forward, I'd still like to hear what
Andres has to say about the comments I made on March 1st.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pg_dump should use current_database() instead of PQdb()

2018-04-10 Thread Peter Eisentraut
A report from a pgbouncer user revealed that running pg_dump -C/--create
does not work through a connection proxy if the virtual database name on
the proxy does not match the real database name on the database server.
That's because pg_dump looks up the database to be dumped using the
information from PQdb().  It should be using current_database() instead.
 (The code was quite likely written before current_database() was
available (PG 7.3)).

See attached patch.

There are a few other uses of PQdb() in pg_dump, but I think those are
OK because they relate to connection information.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From eaccc63b2f9943633f957f80d67acf54ad70098d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 10 Apr 2018 12:32:05 -0400
Subject: [PATCH] pg_dump: Use current_database() instead of PQdb()

For querying pg_database about information about the database being
dumped, look up by using current_database() instead of the value
obtained from PQdb().  When using a connection proxy, the value from
PQdb() might not be the real name of the database.
---
 src/bin/pg_dump/pg_dump.c | 52 +--
 1 file changed, 22 insertions(+), 30 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index d066f4f00b..52698cb3b6 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -251,7 +251,7 @@ static char *convertRegProcReference(Archive *fout,
const char *proc);
 static char *getFormattedOperatorName(Archive *fout, const char *oproid);
 static char *convertTSFunction(Archive *fout, Oid funcOid);
-static Oid findLastBuiltinOid_V71(Archive *fout, const char *);
+static Oid findLastBuiltinOid_V71(Archive *fout);
 static char *getFormattedTypeName(Archive *fout, Oid oid, OidOptions opts);
 static void getBlobs(Archive *fout);
 static void dumpBlob(Archive *fout, BlobInfo *binfo);
@@ -735,8 +735,7 @@ main(int argc, char **argv)
 * With 8.1 and above, we can just use FirstNormalObjectId - 1.
 */
if (fout->remoteVersion < 80100)
-   g_last_builtin_oid = findLastBuiltinOid_V71(fout,
-   
PQdb(GetConnection(fout)));
+   g_last_builtin_oid = findLastBuiltinOid_V71(fout);
else
g_last_builtin_oid = FirstNormalObjectId - 1;
 
@@ -2538,6 +2537,7 @@ dumpDatabase(Archive *fout)
PGresult   *res;
int i_tableoid,
i_oid,
+   i_datname,
i_dba,
i_encoding,
i_collate,
@@ -2565,16 +2565,13 @@ dumpDatabase(Archive *fout)
minmxid;
char   *qdatname;
 
-   datname = PQdb(conn);
-   qdatname = pg_strdup(fmtId(datname));
-
if (g_verbose)
write_msg(NULL, "saving database definition\n");
 
/* Fetch the database-level properties for this database */
if (fout->remoteVersion >= 90600)
{
-   appendPQExpBuffer(dbQry, "SELECT tableoid, oid, "
+   appendPQExpBuffer(dbQry, "SELECT tableoid, oid, datname, "
  "(%s datdba) AS dba, "
  
"pg_encoding_to_char(encoding) AS encoding, "
  "datcollate, datctype, 
datfrozenxid, datminmxid, "
@@ -2591,13 +2588,12 @@ dumpDatabase(Archive *fout)
  "shobj_description(oid, 
'pg_database') AS description "
 
  "FROM pg_database "
- "WHERE datname = ",
+ "WHERE datname = 
current_database()",
  username_subquery);
-   appendStringLiteralAH(dbQry, datname, fout);
}
else if (fout->remoteVersion >= 90300)
{
-   appendPQExpBuffer(dbQry, "SELECT tableoid, oid, "
+   appendPQExpBuffer(dbQry, "SELECT tableoid, oid, datname, "
  "(%s datdba) AS dba, "
  
"pg_encoding_to_char(encoding) AS encoding, "
  "datcollate, datctype, 
datfrozenxid, datminmxid, "
@@ -2606,13 +2602,12 @@ dumpDatabase(Archive *fout)
  "shobj_description(oid, 
'pg_database') AS description "
 
  "FROM pg_database "
- "WHERE d

Re: pg_dump should use current_database() instead of PQdb()

2018-04-10 Thread Tom Lane
Peter Eisentraut  writes:
> A report from a pgbouncer user revealed that running pg_dump -C/--create
> does not work through a connection proxy if the virtual database name on
> the proxy does not match the real database name on the database server.
> That's because pg_dump looks up the database to be dumped using the
> information from PQdb().  It should be using current_database() instead.
>  (The code was quite likely written before current_database() was
> available (PG 7.3)).

> See attached patch.

Looks reasonable in a quick once-over, but I've not tested it.

regards, tom lane



Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-10 Thread Greg Stark
On 10 April 2018 at 02:59, Craig Ringer  wrote:

> Nitpick: In most cases the kernel reserves disk space immediately,
> before returning from write(). NFS seems to be the main exception
> here.

I'm kind of puzzled by this. Surely NFS servers store the data in the
filesystem using write(2) or the in-kernel equivalent? So if the
server is backed by a filesystem where write(2) preallocates space
surely the NFS server must behave as if it's preallocating as well? I
would expect NFS to provide basically the same set of possible
failures as the underlying filesystem (as long as you don't enable
nosync of course).

-- 
greg



Re: pgsql: Merge catalog/pg_foo_fn.h headers back into pg_foo.h headers.

2018-04-10 Thread Tom Lane
Julien Rouhaud  writes:
> On Tue, Apr 10, 2018 at 3:46 PM, Tom Lane  wrote:
>> Hm.  I wonder if we don't also want NO_TEMP_INSTALL, like the doc/src/sgml
>> makefile.  I don't know whether "make check" could be useful in a PGXS
>> build, but certainly that recipe for making a temp install isn't gonna
>> work.

> If I understand correctly, PGXS.mk already forbids a "make check" if
> PGXS is defined.  And it seems that postgres' contribs rely on
> including PGXS.mk, setting NO_PGXS and doing a "make check", so
> NO_TEMP_INSTALL shouldn't be needed.

Well, the question that's bothering me is how come "make check" in
an external build doesn't try to execute the temp-install rule before
printing that error message.  Experimentation shows that it doesn't,
but it sure looks to me like it should: there's nothing conditional
about the "check: temp-install" dependency in Makefile.global.  And
none of the three if-guards in the temp-install rule itself should
prevent this, so what is preventing it?  I don't see it.

For this reason, I thought that adding NO_TEMP_INSTALL would be a good
safety factor in case somebody changes/breaks whatever unobvious
thing is keeping this from failing, and so I went ahead and did it.
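For the archives, the net effect in pgxs.mk is roughly this (a sketch, not the
literal committed hunk; the exact spelling in the tree may differ):

# External builds must not try to regenerate headers or build a temp install.
NO_GENERATED_HEADERS = yes
NO_TEMP_INSTALL = yes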

regards, tom lane



Re: Boolean partitions syntax

2018-04-10 Thread Tom Lane
David Rowley  writes:
> On 11 April 2018 at 03:34, Tom Lane  wrote:
>> Well, that just begs the question: why do these expressions need to
>> be immutable?  What we really want, I think, is to evaluate them
>> and reduce them to constants.  After that, it hardly matters whether
>> the original expression was volatile.  I see absolutely no moral
>> difference between "for values in (random())" and cases like
>> this, which works today:

> I'd personally be pretty surprised if this worked.

Well, my point is that we're *already* behaving that way in some cases,
simply because of the existence of macro-like inputs for some of these
datatypes.  I'm not sure that users are going to perceive a big difference
between 'now'::timestamptz and now(), for example.  If we take one but
not the other, I don't think anybody will see that as a feature.

> What other DDL will execute a volatile function?

This might be a valid concern, not sure.  It's certainly true that
most other DDL doesn't result in acquiring a transaction snapshot;
but it's not *universally* true.  Certainly there's DDL that can
execute nigh-arbitrary user code, such as CREATE INDEX.

> What if the volatile function has side
> effects?

Can't say that that bothers me.  If the user has thought about what
they're doing, the results won't surprise them; if they haven't,
they're likely to be surprised in any case.

We might be well advised to do a CCI after evaluating the expressions,
but that could still be before anything interesting happens.

> What if the user didn't want the function evaluated and
> somehow thought they wanted the evaluation to take place on INSERT?

You could object to awfully large chunks of SQL on the grounds that
it might confuse somebody.

regards, tom lane



Re: pgsql: Merge catalog/pg_foo_fn.h headers back into pg_foo.h headers.

2018-04-10 Thread Julien Rouhaud
On Tue, Apr 10, 2018 at 6:58 PM, Tom Lane  wrote:
>
> Well, the question that's bothering me is how come "make check" in
> an external build doesn't try to execute the temp-install rule before
> printing that error message.  Experimentation shows that it doesn't,
> but it sure looks to me like it should: there's nothing conditional
> about the "check: temp-install" dependency in Makefile.global.  And
> none of the three if-guards in the temp-install rule itself should
> prevent this, so what is preventing it?  I don't see it.

I just checked, and for the record the second if-guard (ifneq
($(abs_top_builddir),)) is actually preventing it.  On top of
Makefile.global:

# Set top_srcdir, srcdir, and VPATH.
ifdef PGXS
top_srcdir = $(top_builddir)
[...]
else # not PGXS
[...]
abs_top_builddir = @abs_top_builddir@
[...]
endif # not PGXS

>
> For this reason, I thought that adding NO_TEMP_INSTALL would be a good
> safety factor in case somebody changes/breaks whatever unobvious
> thing is keeping this from failing, and so I went ahead and did it.

It makes sense.  Thanks for correcting and pushing!



Re: submake-errcodes

2018-04-10 Thread Devrim Gündüz

Hi,

On Tue, 2018-04-10 at 10:01 -0400, Tom Lane wrote:
> You could replace it with submake-generated-headers, since that's more
> general, but in principle you shouldn't need anything because that
> target is invoked automatically as of yesterday.  What's the larger
> context here --- why do you need any of this?

Good question -- IIRC we used it to build PL/Python. Just confirmed that
removing from v10 spec file does not break anything.

However, as Christoph wrote, builds against git master fail:

==
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement 
-Wendif-labels -Wmissing-format-attribute -Wformat-security 
-fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2 -g -pipe -Wall 
-Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches 
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic 
-fasynchronous-unwind-tables -fPIC -I. -I. -I/usr/include/python3.6m 
-I../../../src/include  -D_GNU_SOURCE -I/usr/include/libxml2  -I/usr/include  
-c -o plpy_resultobject.o plpy_resultobject.c
In file included from ../../../src/include/postgres.h:47:0,
 from plpy_cursorobject.c:7:
../../../src/include/utils/elog.h:71:10: fatal error: utils/errcodes.h: No such 
file or directory
 #include "utils/errcodes.h"
  ^~
In file included from ../../../src/include/postgres.h:47:0,
 from plpy_procedure.c:7:
../../../src/include/utils/elog.h:71:10: fatal error: utils/errcodes.h: No such 
file or directory
 #include "utils/errcodes.h"
  ^~
compilation terminated.
compilation terminated.
make[1]: *** [: plpy_procedure.o] Error 1
make[1]: *** Waiting for unfinished jobs
make[1]: *** [: plpy_cursorobject.o] Error 1
==

Regards,

-- 
Devrim Gündüz
EnterpriseDB: https://www.enterprisedb.com
PostgreSQL Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR

signature.asc
Description: This is a digitally signed message part


Re: [sqlsmith] Failed assertion in create_gather_path

2018-04-10 Thread Robert Haas
On Tue, Apr 10, 2018 at 2:59 AM, Jeevan Chalke
 wrote:
> I actually wanted to have rel->consider_parallel in the condition (yes, for
> additional safety) as we are adding a partial path into rel. But then
> observed that it is same as that of final_rel->consider_parallel and thus
> used it along with other condition.
>
> I have observed at many places that we do check consider_parallel flag
> before adding a partial path to it. Thus for consistency added here too, but
> yes, it just adds an additional safety here.

Thanks to Andreas for reporting this issue and to Jeevan for fixing
it.  My commit 0927d2f46ddd4cf7d6bf2cc84b3be923e0aedc52 seems clearly
to blame.

The change to set_subquery_pathlist() looks correct, but can we add a
simple test case?  Right now if I keep the new Assert() in
add_partial_path() and leave out the rest of the changes, the
regression tests still pass.  That's not so good.  Also, I think I
would be inclined to wrap the if-statement around the rest of the
function instead of returning early.
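Something along these lines might do (a sketch only; function and table names
are made up, and I haven't verified that it trips the new Assert without the
fix):

set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_table_scan_size = 0;

create function psunsafe(i int) returns int
    language sql parallel unsafe as 'select i';

-- OFFSET 0 keeps the subquery from being flattened, and the parallel-unsafe
-- function should leave the subquery rel with consider_parallel = false.
explain (costs off)
select * from
  (select psunsafe(g) as x from generate_series(1, 1000) g offset 0) ss
where ss.x < 10;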

The new Assert() in add_partial_path() is an excellent idea.  I had
the same thought before, but I didn't do anything about it.  That was
a bad idea; your plan is better. In fact, I suggest an additional
Assert() that any relation to which we're adding a partial path is
marked consider_parallel, like this:

+/* Path to be added must be parallel safe. */
+Assert(new_path->parallel_safe);
+
+/* Relation should be OK for parallelism, too. */
+Assert(parent_rel->consider_parallel);

Regarding recurse_set_operations, since the rel->consider_parallel is
always the same as final_rel->consider_parallel there's really no
value in testing it.  If it were possible for rel->consider_parallel
to be false even when final_rel->consider_parallel were true then the
test would be necessary for correctness.  That might or might not
happen in the future, so I guess it wouldn't hurt to include this for
future-proofing, but it's not actually a bug fix, so it also wouldn't
hurt to leave it out.  I could go either way, I guess.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgsql: Merge catalog/pg_foo_fn.h headers back into pg_foo.h headers.

2018-04-10 Thread Tom Lane
Julien Rouhaud  writes:
> On Tue, Apr 10, 2018 at 6:58 PM, Tom Lane  wrote:
>> none of the three if-guards in the temp-install rule itself should
>> prevent this, so what is preventing it?  I don't see it.

> I just checked, and for the record the second rule (ifneq
> ($(abs_top_builddir),) is actually preventing it.

Ah-hah.  I'd misread that maze of twisty little ifdefs that
sets up abs_top_builddir et al.  Thanks for clarifying!

regards, tom lane



Re: submake-errcodes

2018-04-10 Thread Tom Lane
Devrim =?ISO-8859-1?Q?G=FCnd=FCz?=  writes:
> On Tue, 2018-04-10 at 10:01 -0400, Tom Lane wrote:
>> You could replace it with submake-generated-headers, since that's more
>> general, but in principle you shouldn't need anything because that
>> target is invoked automatically as of yesterday.  What's the larger
>> context here --- why do you need any of this?

> However, as Christoph wrote, builds against git master fail:

Hm ... you're cd'ing into src/pl/plpython and issuing "make all"?
That works for me.

... or, wait ... with -j it doesn't.  That's strange, will look.

regards, tom lane



Re: Function to track shmem reinit time

2018-04-10 Thread Robert Haas
On Tue, Apr 10, 2018 at 9:07 AM, David Steele  wrote:
> On 3/29/18 9:40 AM, Tomas Vondra wrote:
>> On 03/28/2018 08:55 PM, David Steele wrote:
>>> I'm setting this entry to Waiting on Author, but based on the discussion
>>> I think it should be Returned with Feedback.
>>
>> Fine with me.
>
> This entry has been marked Returned with Feedback.

I guess I'm in the minority here, but I don't see why it isn't useful
to have something like this. It's a lot easier for a monitoring
process to poll for this kind of thing than it is to monitor the logs.
Not that log monitoring isn't a good idea, but this is pretty
lightweight.

OTOH, it also seems like this could live as an out-of-core extension,
so I guess if nobody else likes the idea Anastasia could always do it
that way.
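For what it's worth, a rough sketch of the out-of-core approach (names
invented; it assumes the module is loaded via shared_preload_libraries) could
store a timestamp from the shmem startup hook and expose it via a C-language
SQL function:

#include "postgres.h"

#include "fmgr.h"
#include "storage/ipc.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "utils/timestamp.h"

PG_MODULE_MAGIC;

static TimestampTz *shmem_init_time = NULL;
static shmem_startup_hook_type prev_shmem_startup_hook = NULL;

static void
reinit_time_shmem_startup(void)
{
	bool		found;

	if (prev_shmem_startup_hook)
		prev_shmem_startup_hook();

	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
	shmem_init_time = ShmemInitStruct("shmem_reinit_time",
									  sizeof(TimestampTz), &found);
	if (!found)
		*shmem_init_time = GetCurrentTimestamp();	/* fresh (re)init */
	LWLockRelease(AddinShmemInitLock);
}

void
_PG_init(void)
{
	RequestAddinShmemSpace(sizeof(TimestampTz));
	prev_shmem_startup_hook = shmem_startup_hook;
	shmem_startup_hook = reinit_time_shmem_startup;
}

PG_FUNCTION_INFO_V1(pg_shmem_init_time);

Datum
pg_shmem_init_time(PG_FUNCTION_ARGS)
{
	if (shmem_init_time == NULL)
		PG_RETURN_NULL();
	PG_RETURN_TIMESTAMPTZ(*shmem_init_time);
}

Plus a one-line CREATE FUNCTION ... LANGUAGE c wrapper in the extension
script, and a monitoring system can simply poll that function.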

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


