Re: Implementing Incremental View Maintenance

2018-12-31 Thread Nguyễn Trần Quốc Vinh
Dear all,

We have some results on incremental updates for MVs. We generate triggers in
C to do the incremental maintenance. We posted the code to GitHub about a
year ago, but unfortunately I posted an incorrect ctrigger.h header. The
mistake came to light when a user could not compile the generated triggers
and reported it to me. I have now re-posted the code with the correct
ctrigger.h file.

You can find the code of the generator here:
https://github.com/ntqvinh/PgMvIncrementalUpdate/commits/master. You can
find how we did it here:
https://link.springer.com/article/10.1134/S0361768816050066. That paper is
about generating the code in PL/pgSQL, but I think it is still useful for
reading the C code. I don't know whether I may share the paper, so I have
not posted it anywhere else. The text about how to generate the triggers in
C was published as open access, but unfortunately it is in Vietnamese.

We are happy if the code is useful to someone.

Thank you and best regards,

NTQ Vinh

TS. Nguyễn Trần Quốc Vinh
---
Head of the Faculty of Informatics
University of Education - The University of Da Nang

Nguyen Tran Quoc Vinh, PhD
Dean
Faculty of Information Technology
Danang University of Education
Website: http://it.ued.udn.vn; http://www.ued.vn ;
http://www.ued.udn.vn
SCV: http://scv.ued.vn/~ntquocvinh 
Phone: (+84) 511.6-512-586
Mobile: (+84) 914.78-08-98

On Mon, Dec 31, 2018 at 11:20 PM Adam Brusselback 
wrote:

> Hi all, just wanted to say  I am very happy to see progress made on this,
> my codebase has multiple "materialized tables" which are maintained with
> statement triggers (transition tables) and custom functions. They are ugly
> and a pain to maintain, but they work because I have no other
> solution...for now at least.
>
> I am concerned that the eager approach only addresses a subset of the MV
>> use
>> case space, though. For example, if we presume that an MV is present
>> because
>> the underlying direct query would be non-performant, then we have to at
>> least question whether applying the delta-update would also be detrimental
>> to some use cases.
>>
>
> I will say that in my case, as long as my reads of the materialized view
> are always consistent with the underlying data, that's what's important.  I
> don't mind if it's eager, or lazy (as long as lazy still means it will
> refresh prior to reading).
>


Re: Implicit make rules break test examples

2018-12-31 Thread Donald Dong
On Mon, Dec 31, 2018 at 11:24 PM Donald Dong  wrote:
>
> Hi,
>
> In src/test/example, the implicit make rules produce errors:
>
> make -C ../../../src/backend generated-headers
> make[1]: Entering directory '/home/ddong/postgresql/bld/src/backend'
> make -C catalog distprep generated-header-symlinks
> make[2]: Entering directory '/home/ddong/postgresql/bld/src/backend/catalog'
> make[2]: Nothing to be done for 'distprep'.
> make[2]: Nothing to be done for 'generated-header-symlinks'.
> make[2]: Leaving directory '/home/ddong/postgresql/bld/src/backend/catalog'
> make -C utils distprep generated-header-symlinks
> make[2]: Entering directory '/home/ddong/postgresql/bld/src/backend/utils'
> make[2]: Nothing to be done for 'distprep'.
> make[2]: Nothing to be done for 'generated-header-symlinks'.
> make[2]: Leaving directory '/home/ddong/postgresql/bld/src/backend/utils'
> make[1]: Leaving directory '/home/ddong/postgresql/bld/src/backend'
> gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Werror=vla -Wendif-labels
> -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
> -fwrapv -fexcess-precision=standard -Wno-format-truncation -O2
> -I/home/ddong/postgresql/bld/../src/interfaces/libpq
> -I../../../src/include -I/home/ddong/postgresql/bld/../src/include
> -D_GNU_SOURCE   -c -o testlibpq.o
> /home/ddong/postgresql/bld/../src/test/examples/testlibpq.c
> gcc -L../../../src/port -L../../../src/common -L../../../src/common
> -lpgcommon -L../../../src/port -lpgport
> -L../../../src/interfaces/libpq -lpq   -Wl,--as-needed
> -Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags  testlibpq.o   -o
> testlibpq
> testlibpq.o: In function `exit_nicely':
> testlibpq.c:(.text.unlikely+0x5): undefined reference to `PQfinish'
> testlibpq.o: In function `main':
> testlibpq.c:(.text.startup+0x22): undefined reference to `PQconnectdb'
> testlibpq.c:(.text.startup+0x2d): undefined reference to `PQstatus'
> testlibpq.c:(.text.startup+0x44): undefined reference to `PQexec'
> …
>
> I think the -lpq flag has no effect in the middle of the argument list,
> since the linker resolves libraries left to right and has not yet seen
> any references to libpq symbols at that point. It works if I move the
> flag to the end:
>
> gcc -L../../../src/port -L../../../src/common -L../../../src/common
> -lpgcommon -L../../../src/port -lpgport
> -L../../../src/interfaces/libpq -Wl,--as-needed
> -Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags  testlibpq.o   -o
> testlibpq -lpq
>
> So I added an explicit rule to rearrange the flags:
>
> gcc testlibpq.o -o testlibpq -L../../../src/port -L../../../src/common
> -L../../../src/common -lpgcommon -L../../../src/port -lpgport
> -L../../../src/interfaces/libpq -lpq   -Wl,--as-needed
> -Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags
>
> Then the make command works as expected. This is my first time writing
> a patch. Please let me know what you think!
>
> Thank you,
> Happy new year!
> Donald Dong



-- 
Donald Dong


explicit_make_rule_test_example_v1.patch
Description: Binary data


Implicit make rules break test examples

2018-12-31 Thread Donald Dong
Hi,

In src/test/example, the implicit make rules produce errors:

make -C ../../../src/backend generated-headers
make[1]: Entering directory '/home/ddong/postgresql/bld/src/backend'
make -C catalog distprep generated-header-symlinks
make[2]: Entering directory '/home/ddong/postgresql/bld/src/backend/catalog'
make[2]: Nothing to be done for 'distprep'.
make[2]: Nothing to be done for 'generated-header-symlinks'.
make[2]: Leaving directory '/home/ddong/postgresql/bld/src/backend/catalog'
make -C utils distprep generated-header-symlinks
make[2]: Entering directory '/home/ddong/postgresql/bld/src/backend/utils'
make[2]: Nothing to be done for 'distprep'.
make[2]: Nothing to be done for 'generated-header-symlinks'.
make[2]: Leaving directory '/home/ddong/postgresql/bld/src/backend/utils'
make[1]: Leaving directory '/home/ddong/postgresql/bld/src/backend'
gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -Wno-format-truncation -O2
-I/home/ddong/postgresql/bld/../src/interfaces/libpq
-I../../../src/include -I/home/ddong/postgresql/bld/../src/include
-D_GNU_SOURCE   -c -o testlibpq.o
/home/ddong/postgresql/bld/../src/test/examples/testlibpq.c
gcc -L../../../src/port -L../../../src/common -L../../../src/common
-lpgcommon -L../../../src/port -lpgport
-L../../../src/interfaces/libpq -lpq   -Wl,--as-needed
-Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags  testlibpq.o   -o
testlibpq
testlibpq.o: In function `exit_nicely':
testlibpq.c:(.text.unlikely+0x5): undefined reference to `PQfinish'
testlibpq.o: In function `main':
testlibpq.c:(.text.startup+0x22): undefined reference to `PQconnectdb'
testlibpq.c:(.text.startup+0x2d): undefined reference to `PQstatus'
testlibpq.c:(.text.startup+0x44): undefined reference to `PQexec'
…

I think the -lpq flag has no effect in the middle of the argument list,
since the linker resolves libraries left to right and has not yet seen
any references to libpq symbols at that point. It works if I move the
flag to the end:

gcc -L../../../src/port -L../../../src/common -L../../../src/common
-lpgcommon -L../../../src/port -lpgport
-L../../../src/interfaces/libpq -Wl,--as-needed
-Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags  testlibpq.o   -o
testlibpq -lpq

So I added an explicit rule to rearrange the flags:

gcc testlibpq.o -o testlibpq -L../../../src/port -L../../../src/common
-L../../../src/common -lpgcommon -L../../../src/port -lpgport
-L../../../src/interfaces/libpq -lpq   -Wl,--as-needed
-Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags
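
In Makefile terms, the explicit rule would look roughly like this (a sketch
of the idea only; the attached patch is authoritative, and $(LDLIBS) here
stands in for whatever link-time variables the real Makefile uses):

testlibpq: testlibpq.o
	$(CC) $^ -o $@ $(LDFLAGS) $(LDLIBS)

The essential point is just that testlibpq.o must come before -lpq on the
link line, so the linker already knows which PQ* symbols it needs.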

Then the make command works as expected. This is my first time writing
a patch. Please let me know what you think!

Thank you,
Happy new year!
Donald Dong



Re: rewrite ExecPartitionCheckEmitError

2018-12-31 Thread David Rowley
On Sat, 29 Dec 2018 at 06:52, Alvaro Herrera  wrote:
> Point taken; pushed with that change.

Marking this as committed in the CF app.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Offline enabling/disabling of data checksums

2018-12-31 Thread Michael Paquier
On Sat, Dec 29, 2018 at 11:55:43AM +0100, Michael Banck wrote:
> Renaming applications shouldn't be a problem unless they have to be
> moved from one binary package to another. I assume all packagers ship
> all client/server binaries in one package, respectively (and not e.g. a
> dedicated postgresql-11-pg_test_fsync package), this should only be a
> matter of updating package metadata.
> 
> In any case, it should be identical to the xlog->wal rename.

I have poked -packagers on the matter and I am seeing no complains, so
let's move forward with this stuff.  From the consensus I am seeing on
the thread, we have been discussing about the following points:
1) Rename pg_verify_checksums to pg_checksums.
2) Have separate switches for each action, aka --verify, --enable and
--disable, or a unified --action switch which can take different
values.
3) Do we want to imply --verify by default if no switch is specified?

About 2), folks who have expressed an opinion are:
- Multiple switches: Robert, Fabien, Magnus
- Single --action switch: Michael B, Michael P

About 3), aka --verify implied if no action is specified:
- In favor: Fabien C, Magnus
- Against: Michael P

If I missed what someone said, please feel free to complete with your
votes here.
--
Michael




Re: [PATCH] check for ctags utility in make_ctags

2018-12-31 Thread Michael Paquier
On Mon, Dec 31, 2018 at 07:19:39PM +0300, Nikolay Shaplov wrote:
> In a message of Monday, December 31, 2018 19:04:08 MSK, Nikolay Shaplov
> wrote:
> 
>> I'd like to propose a small patch for the make_ctags script. It checks
>> whether the ctags utility is installed. If not, it reports an error and
>> advises installing ctags.
>
> Oops. I've misplaced the '&' character :-)

Not sure if that's something worth bothering about, but you could do
the same in src/tools/make_etags.
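
For instance, the same guard adapted for make_etags could be (a sketch
mirroring the make_ctags hunk posted elsewhere in this thread):

if [ ! $(command -v etags) ]
then
  echo "'etags' utility is not found" 1>&2
  echo "Please install 'etags' to run make_etags" 1>&2
  exit 1
fi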
--
Michael




Re: Early WIP/PoC for inlining CTEs

2018-12-31 Thread Andrew Gierth
> "Andreas" == Andreas Karlsson  writes:

 Andreas> +   if (rte->rtekind == RTE_CTE &&
 Andreas> +   strcmp(rte->ctename, context->ctename) == 0 &&
 Andreas> +   rte->ctelevelsup == context->levelsup)
 Andreas> +   {
 Andreas> +   Query *newquery = copyObject(context->ctequery);
 Andreas> +
 Andreas> +   /* Preserve outer references, for example to other CTEs */
 Andreas> +   if (context->levelsup > 0)
 Andreas> +   IncrementVarSublevelsUp((Node *) newquery, 
context->levelsup, 1);

I had a comment around here which seems to have been lost:

 * Secondly, views (and explicit subqueries) currently have
 * different behaviour w.r.t. SELECT FOR UPDATE than CTEs do. A
 * FOR UPDATE clause is treated as extending into views and
 * subqueries, but not into CTEs. We preserve this distinction
 * by not trying to push rowmarks into the new subquery.

This comment seems to me to be worth preserving (unless this behavior is
changed). What I'm referring to is the following, which is unchanged by
the patch:

create table t1 as select 123 as a;
create view v1 as select * from t1;
select * from t1 for update;  -- locks row in t1
select * from t1 for update of t1;  -- locks row in t1
select * from v1 for update;  -- locks row in t1
select * from v1 for update of v1;  -- locks row in t1
select * from (select * from t1) s1 for update;  -- locks row in t1
select * from (select * from t1) s1 for update of s1;  -- locks row in t1
with c1 as (select * from t1)
  select * from c1 for update;  -- does NOT lock anything at all
with c1 as (select * from t1)
  select * from c1 for update of c1;  -- parse-time error

(Obviously, inlining decisions should not change what gets locked;
the behavior here should not be changed unless it is changed for both
inlined and non-inlined CTEs.)

-- 
Andrew (irc:RhodiumToad)



Re: Removing --disable-strong-random from the code

2018-12-31 Thread Michael Paquier
On Mon, Dec 31, 2018 at 10:20:28AM +0900, Michael Paquier wrote:
> On Sun, Dec 30, 2018 at 11:47:03AM -0500, Tom Lane wrote:
>> Not the fault of this patch, but surely this bit in pgcrypto's
>> pad_eme_pkcs1_v15()
>> 
>> if (!pg_strong_random((char *) p, 1))
>> {
>> px_memset(buf, 0, res_len);
>> px_free(buf);
>> break;
>> }
>> 
>> is insane, because the "break" makes it fall into code that will continue
>> to scribble on "buf".  I think the "break" needs to be "return
>> PXE_NO_RANDOM", and probably we'd better back-patch that as a bug fix.
>> (I'm also failing to see the point of that px_memset before freeing the
>> buffer --- at this point, it contains no sensitive data, surely.)
> 
> Good catch.  As far as I understand this code, the message is not
> included yet and random bytes are just added to avoid having 0 in the
> padding.  So I agree that the memset is not really meaningful to
> have on the whole buffer.  I can take care of that as well, and of
> course you get the credit.  If you want to commit and back-patch the
> fix yourself, please feel free to do so.

I have fixed this one and back-patched it down to 10.  In what has been
committed I have kept the memset, logic which has been present since
e94dd6a, back in 2005.  On second look, the code is correct without it,
but it still felt safer to keep it.
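
For reference, the corrected fragment looks roughly like this (a sketch
based on the discussion above; the committed code may differ cosmetically):

    if (!pg_strong_random((char *) p, 1))
    {
        px_memset(buf, 0, res_len);
        px_free(buf);
        return PXE_NO_RANDOM;   /* was "break", which fell through into
                                 * code that kept scribbling on "buf" */
    }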
--
Michael




Re: Clean up some elog messages and comments for do_pg_stop_backup and do_pg_start_backup

2018-12-31 Thread Michael Paquier
On Mon, Dec 31, 2018 at 10:50:20AM +0100, Peter Eisentraut wrote:
> I like this version of the patch.

OK, committed.  Thanks all for the discussion.
--
Michael




Re: Joins on TID

2018-12-31 Thread Edmund Horner
On Sat, 22 Dec 2018 at 12:34, Tom Lane  wrote:
> I decided to spend an afternoon seeing exactly how much work would be
> needed to support parameterized TID scans, ie nestloop-with-inner-TID-
> scan joins, as has been speculated about before, most recently here:
>
> https://www.postgresql.org/message-id/flat/CAMqTPq%3DhNg0GYFU0X%2BxmuKy8R2ARk1%2BA_uQpS%2BMnf71MYpBKzg%40mail.gmail.com
>
> It turns out it's not that bad, less than 200 net new lines of code
> (all of it in the planner; the executor seems to require no work).
>
> Much of the code churn is because tidpath.c is so ancient and crufty.
> It was mostly ignoring the RestrictInfo infrastructure, in particular
> emitting the list of tidquals as just bare clauses not RestrictInfos.
> I had to change that in order to avoid inefficiencies in some places.

It seems good, and I can see you've committed it now.  (I should have
commented sooner, but it's the big summer holiday period here, which
means I have plenty of time to work on PostgreSQL, but none of my
usual resources.  In any case, I was going to say "this looks useful
and not too complicated, please go ahead".)

I did notice that multiple tidquals are no longer removed from scan_clauses:

EXPLAIN SELECT * FROM pg_class WHERE ctid = '(1,1)' OR ctid = '(2,2)';

 Tid Scan on pg_class  (cost=0.01..8.03 rows=2 width=265)
   TID Cond: ((ctid = '(1,1)'::tid) OR (ctid = '(2,2)'::tid))
   Filter: ((ctid = '(1,1)'::tid) OR (ctid = '(2,2)'::tid))

I guess if we thought it was a big deal we could attempt to recreate
the old logic with RestrictInfos.

> I haven't really looked at how much of a merge problem there'll be
> with Edmund Horner's work for TID range scans.  My feeling about it
> is that we might be best off treating that as a totally separate
> code path, because the requirements are significantly different (for
> instance, a range scan needs AND semantics not OR semantics for the
> list of quals to apply).

Well, I guess it's up to me to merge it.  I can't quite see which
parts we'd use a separate code path for.  Can you elaborate?

Edmund



Re: Early WIP/PoC for inlining CTEs

2018-12-31 Thread Andrew Gierth
> "Andreas" == Andreas Karlsson  writes:

 Andreas> I believe I have fixed these except for the comment on the
 Andreas> conditions for when we inline.

 Andreas> Andrew Gierth: Why did you choose to not inline on FOR UPDATE
 Andreas> but inline volatile functions? I feel that this might be
 Andreas> inconsistent since in both cases the query in the CTE can
 Andreas> change behavior if the planner pushes a WHERE clause into the
 Andreas> subquery, but maybe I am missing something.

I chose not to inline FOR UPDATE because it was an obvious compatibility
break, potentially changing the set of locked rows, and it was an easy
condition to test.

I did not test for volatile functions simply because this was a very
early stage of the project (which wasn't my project, I was just
assisting someone else). I left the comment "this likely needs some
additional checks" there for a reason.

-- 
Andrew (irc:RhodiumToad)



Re: monitoring CREATE INDEX [CONCURRENTLY]

2018-12-31 Thread Alvaro Herrera
For discussion, here's a preliminary patch.  This is just a first
skeleton; needs to grow a lot of flesh yet, per my previous proposal.
As far as the generic CREATE INDEX stuff goes, I think this is complete;
it's missing the AM-specific bits.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index c2ad944e04..f4cb28c6d6 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1590,7 +1590,7 @@ index_drop(Oid indexId, bool concurrent)
 		 * to acquire an exclusive lock on our table.  The lock code will
 		 * detect deadlock and error out properly.
 		 */
-		WaitForLockers(heaplocktag, AccessExclusiveLock);
+		WaitForLockers(heaplocktag, AccessExclusiveLock, true);
 
 		/*
 		 * No more predicate locks will be acquired on this index, and we're
@@ -1634,7 +1634,7 @@ index_drop(Oid indexId, bool concurrent)
 		 * Wait till every transaction that saw the old index state has
 		 * finished.
 		 */
-		WaitForLockers(heaplocktag, AccessExclusiveLock);
+		WaitForLockers(heaplocktag, AccessExclusiveLock, true);
 
 		/*
 		 * Re-open relations to allow us to complete our actions.
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5253837b54..921a84eb45 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -904,6 +904,24 @@ CREATE VIEW pg_stat_progress_vacuum AS
 FROM pg_stat_get_progress_info('VACUUM') AS S
 		LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_create_index AS
+	SELECT
+		s.pid AS pid, S.datid AS datid, D.datname AS datname,
+		S.relid AS relid,
+		CASE s.param1 WHEN 0 THEN 'initializing'
+	  WHEN 1 THEN 'waiting for lockers 1'
+	  WHEN 2 THEN 'building index'
+	  WHEN 3 THEN 'waiting for lockers 2'
+	  WHEN 4 THEN 'validating index'
+	  WHEN 5 THEN 'waiting for lockers 3'
+	  END as phase,
+		S.param2 AS procs_to_wait_for,
+		S.param3 AS procs_waited_for,
+		S.param4 AS partitions_to_build,
+		S.param5 AS partitions_built
+	FROM pg_stat_get_progress_info('CREATE INDEX') AS S
+		LEFT JOIN pg_database D ON S.datid = D.oid;
+
 CREATE VIEW pg_user_mappings AS
 SELECT
 U.oid   AS umid,
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 816c73a36a..03321c2dc4 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -35,6 +35,7 @@
 #include "commands/dbcommands.h"
 #include "commands/defrem.h"
 #include "commands/event_trigger.h"
+#include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/tablespace.h"
 #include "mb/pg_wchar.h"
@@ -47,6 +48,7 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_func.h"
 #include "parser/parse_oper.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "storage/lmgr.h"
 #include "storage/proc.h"
@@ -370,6 +372,15 @@ DefineIndex(Oid relationId,
 	Snapshot	snapshot;
 	int			i;
 
+
+	/*
+	 * Start progress report.  If we're building a partition, this was already
+	 * done.
+	 */
+	if (!OidIsValid(parentIndexId))
+		pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX,
+	  relationId);
+
 	/*
 	 * count key attributes in index
 	 */
@@ -866,6 +877,11 @@ DefineIndex(Oid relationId,
 	if (!OidIsValid(indexRelationId))
 	{
 		heap_close(rel, NoLock);
+
+		/* If this is the top-level index, we're done */
+		if (!OidIsValid(parentIndexId))
+			pgstat_progress_end_command();
+
 		return address;
 	}
 
@@ -891,6 +907,9 @@ DefineIndex(Oid relationId,
 			TupleDesc	parentDesc;
 			Oid		   *opfamOids;
 
+			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+		 nparts);
+
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
 
 			parentDesc = CreateTupleDescCopy(RelationGetDescr(rel));
@@ -1040,6 +1059,8 @@ DefineIndex(Oid relationId,
 skip_build, quiet);
 }
 
+pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
+			 i + 1);
 pfree(attmap);
 			}
 
@@ -1074,6 +1095,8 @@ DefineIndex(Oid relationId,
 		 * Indexes on partitioned tables are not themselves built, so we're
 		 * done here.
 		 */
+		if (!OidIsValid(parentIndexId))
+			pgstat_progress_end_command();
 		return address;
 	}
 
@@ -1081,6 +1104,11 @@ DefineIndex(Oid relationId,
 	{
 		/* Close the heap and we're done, in the non-concurrent case */
 		heap_close(rel, NoLock);
+
+		/* If this is the top-level index, we're done. */
+		if (!OidIsValid(parentIndexId))
+			pgstat_progress_end_command();
+
 		return address;
 	}
 
@@ -1132,7 +1160,9 @@ DefineIndex(Oid relationId,
 	 * exclusive lock on our table.  The lock code will detect deadlock and
 	 * error out properly.
 	 */
-	WaitForLockers(heaplocktag, ShareLock);
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+ 

Re: Early WIP/PoC for inlining CTEs

2018-12-31 Thread Andreas Karlsson

On 11/16/18 10:15 PM, Tom Lane wrote:
> I took a little bit of a look through this.  Some thoughts:

Thanks for the review! I have decided to pick up this patch and work on 
it since nothing has happened in a while. Here is a new version with 
most of the feedback fixed.



> * This is not the idiomatic way to declare an expression tree walker:
>
> +static bool inline_cte_walker(Node *node, void *ctxp)
> +{
> +   struct inline_cte_walker_ctx *ctx = ctxp;
>
> * I have no faith in the idea that we can skip doing a copyObject on the
> inlined subquery, except maybe in the case where we know there's exactly
> one reference.
>
> * In "here's where we do the actual substitution", if we're going to
> scribble on the RTE rather than make a new one, we must take pains
> to zero out the RTE_CTE-specific fields so that the RTE looks the
> same as if it had been a RTE_SUBQUERY all along; cf db1071d4e.
>
> * The lack of comments about what conditions we inline under
> (at subselect.c:1318) is distressing.  I'm not particularly
> in love with the way that inline_cte_walker is commented, either.
> And dare I mention that this falsifies the intro comment for
> SS_process_ctes?
>
> * Speaking of the comments, I'm not convinced that view rewrite is
> a good comparison point; I think this is more like subquery pullup.


I believe I have fixed these except for the comment on the conditions 
for when we inline.


Andrew Gierth: Why did you choose to not inline on FOR UPDATE but inline
volatile functions? I feel that this might be inconsistent since in both 
cases the query in the CTE can change behavior if the planner pushes a 
WHERE clause into the subquery, but maybe I am missing something.



> * I wonder whether we should make range_table_walker more friendly
> to the needs of this patch.  The fact that it doesn't work for this usage
> suggests that it might not work for others, too.  I could see replacing
> the QTW_EXAMINE_RTES flag with QTW_EXAMINE_RTES_BEFORE and
> QTW_EXAMINE_RTES_AFTER so that callers could say which order of operations
> they want (ie, visit RTE itself before or after its substructure).  Then
> we could get rid of the double traversal of the RTE list.


I did as suggested and the code is now much cleaner, but I feel the whole
RTE-walking business would become cleaner still if we changed the range
table walker from yielding both the RTEs and the contents of the RTEs to
yielding just the RTEs; the walker callback could then call
expression_tree_walker() on the RTE, so RTEs are treated like any other
node in the tree.

I might look into how big an impact such a change would have and whether
it is worth the churn.



> * I think a large fraction of the regression test cases that this
> changes are actually broken by the patch, in the sense that we meant
> to test the behavior with a CTE and now we're not getting that.
> So we'd need to add MATERIALIZED in many more places than this has
> done.  Somebody else (Stephen?) would need to opine on whether that's
> true for the CTEs in rowsecurity.sql, but it's definitely true for
> the one in rowtypes.sql, where the point is to test what happens
> with a whole-row Var.


Agreed, fixed.


> * Which will mean we need some new test cases showing that this patch does
> anything.  It'd be a good idea to add a test case showing that this gets
> things right for conflicting CTE names at different levels, eg
>
> explain verbose
> with x as (select 1 as y)
> select * from (with x as (select 2 as y) select * from x) ss;


Added this test case, but more are needed. Any suggestions for which file
these tests belong in (right now I just added it to subselect.sql)? Or
should I add a new file called cte.sql?



> * ruleutils.c needs adjustments for the new syntax, if we keep that.


Thanks, fixed!


> * And of course the documentation needs much more work than this.


Yeah, I was waiting for there to be more agreement on when CTEs should 
be inlined, but maybe I should start writing anyway.


Andreas

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index bb92d9d37a..8c26dd1f26 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1888,7 +1888,7 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 
 -- join in CTE
 EXPLAIN (VERBOSE, COSTS OFF)
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
  QUERY PLAN  
 -
  Limit
@@ -1905,7 +1905,7 @@ WITH t (c1_1, c1_3, c2_1) AS 

Re: Changing SQL Inlining Behaviour (or...?)

2018-12-31 Thread Adam Brightwell
All,

> So, context:
>
> - We want PostGIS to parallelize more. In order to achieve that we need to 
> mark our functions with more realistic COSTs. Much much higher COSTs.
> - When we do that, we hit a different problem. Our most commonly used 
> functions, ST_Intersects(), ST_DWithin() are actually SQL wrapper functions 
> are more complex combinations of index operators and exact computational 
> geometry functions.
> - In the presence of high cost parameters that are used multiple times in SQL 
> functions, PostgreSQL will stop inlining those functions, in an attempt to 
> save the costs of double-calculating the parameters.
> - For us, that's the wrong choice, because we lose the index operators at the 
> same time as we "save" the cost of double calculation.
> - We need our wrapper functions inlined, even when they are carrying a high 
> COST.
>
> At pgconf.eu, I canvassed this problem and some potential solutions:
> ...
> * Solution #2 - Quick and dirty and invisible. Tom suggested a hack that 
> achieves the aims of #1 but without adding syntax to CREATE FUNCTION: have 
> the inlining logic look at the cost of the wrapper and the cost of 
> parameters, and if the cost of the wrapper "greatly exceeded" the cost of the 
> parameters, then inline. So the PostGIS project would just set the cost of 
> our wrappers very high, and we'd get the behaviour we want, while other users 
> who want to use wrappers to force caching of calculations would have zero 
> coded wrapper functions.  Pros: Solves the problem and easy to implement, I'm 
> happy to contribute. Cons: it's so clearly a hack involving hidden (from 
> users) magic.
> ...
> So my question to hackers is: which is less worse, #1 or #2, to implement and 
> submit to commitfest, in case #3 does not materialize in time for PgSQL 12?

I've been working with Paul to create and test a patch (attached) that
addresses Solution #2. This patch essentially modifies the inlining
logic to compare the cost of the function with the total cost of the
parameters. The goal, as stated above, is that these kinds of functions
would be assigned a relatively high cost to trigger the inlining case.

The modification that this patch makes is the following:

* Collect the cost for each parameter (no longer short circuits when a
single parameter is costly).
* Compare the total cost of all parameters to the individual cost of
the function.
* If the function cost is greater than the parameters, then it will
inline the function.
* If the function cost is less than the parameters, then it will
perform the original parameter checks (short circuiting as necessary).

I've included a constant called "INLINE_FUNC_COST_FACTOR" that is
currently set to '1'. This is meant to allow for adjusting what
"greatly exceeded" means in the description above. Perhaps this isn't
necessary, but I wanted to at least initially provide the flexibility
in case it is.

I have also attached a simple test case, as was originally provided
by Paul, to demonstrate the functionality.
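
For a self-contained illustration of the intended behavior (a sketch only;
some_table and all function names here are hypothetical, in the spirit of
Paul's example):

-- Stand-in for an expensive argument expression; plpgsql so the planner
-- can neither inline nor pre-evaluate it.
CREATE FUNCTION f_costly(i float8) RETURNS float8
    AS 'BEGIN RETURN i; END' LANGUAGE plpgsql STABLE COST 10000;

-- The wrapper references "lim" twice, so an expensive argument there
-- previously blocked inlining.  With the patch, the wrapper's own high
-- COST exceeds the summed parameter costs, so it should be inlined anyway.
CREATE FUNCTION wrap_abs_lt(x float8, lim float8) RETURNS boolean
    AS 'SELECT x < lim AND -x < lim' LANGUAGE sql COST 100000;

-- With inlining, EXPLAIN shows the expanded expression rather than a
-- call to wrap_abs_lt().
EXPLAIN SELECT * FROM some_table t WHERE wrap_abs_lt(t.a, f_costly(10));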

-Adam


pg-example.sql
Description: application/sql
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index f4446169f5..58755151fa 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -4648,6 +4648,9 @@ inline_function(Oid funcid, Oid result_type, Oid result_collid,
 	Query	   *querytree;
 	Node	   *newexpr;
 	int		   *usecounts;
+	double	   *paramcosts;
+	Cost		total_param_cost;
+	double		func_cost;
 	ListCell   *arg;
 	int			i;
 
@@ -4839,9 +4842,20 @@ inline_function(Oid funcid, Oid result_type, Oid result_collid,
 	 * substituted.
 	 */
 	usecounts = (int *) palloc0(funcform->pronargs * sizeof(int));
+	paramcosts = (double *) palloc0(funcform->pronargs * sizeof(double));
 	newexpr = substitute_actual_parameters(newexpr, funcform->pronargs,
 		   args, usecounts);
 
+	/*
+	 * Get the cost of the function without parameters.
+	 *
+	 * Calculation based on how costsize.c performs lookup for function costs
+	 * via 'cost_qual_eval_walker'.
+	 */
+	func_cost = get_func_cost(funcid) * cpu_operator_cost;
+
+	total_param_cost = 0.0;
+
 	/* Now check for parameter usage */
 	i = 0;
 	foreach(arg, args)
@@ -4867,10 +4881,10 @@ inline_function(Oid funcid, Oid result_type, Oid result_collid,
 			 */
 			if (contain_subplans(param))
 goto fail;
+
 			cost_qual_eval(&eval_cost, list_make1(param), NULL);
-			if (eval_cost.startup + eval_cost.per_tuple >
-10 * cpu_operator_cost)
-goto fail;
+			paramcosts[i] = (eval_cost.startup + eval_cost.per_tuple);
+			total_param_cost += (eval_cost.startup + eval_cost.per_tuple);
 
 			/*
 			 * Check volatility last since this is more expensive than the
@@ -4883,6 +4897,19 @@ inline_function(Oid funcid, Oid result_type, Oid result_collid,
 	}
 
 	/*
+	 * If the cost of the function is greater than the sum of all the param
+	 * costs, then inline. Otherwise, check the cost of each param.
+	 

pg11.1: dsa_area could not attach to segment

2018-12-31 Thread Justin Pryzby
In our query logs I saw:

postgres=# SELECT log_time, session_id, session_line, left(message,99), 
left(query,99) FROM postgres_log WHERE error_severity='ERROR' AND message NOT 
LIKE 'cancel%';
-[ RECORD 1 
]+
log_time | 2018-12-31 15:39:11.917-05
session_id   | 5c2a7e6f.1fa4
session_line | 1
left | dsa_area could not attach to segment
left | SELECT colcld.child c, parent p, array_agg(colpar.attname::text 
ORDER BY colpar.attnum) cols, array
-[ RECORD 2 
]+
log_time | 2018-12-31 15:39:11.917-05
session_id   | 5c2a7e6f.1fa3
session_line | 4
left | dsa_area could not attach to segment
left | SELECT colcld.child c, parent p, array_agg(colpar.attname::text 
ORDER BY colpar.attnum) cols, array

The full query + plan is:

|ts=# explain SELECT colcld.child c, parent p, array_agg(colpar.attname::text 
ORDER BY colpar.attnum) cols, array_agg(format_type(colpar.atttypid, 
colpar.atttypmod) ORDER BY colpar.attnum) AS types
|FROM queued_alters qa
|JOIN pg_attribute colpar ON to_regclass(qa.parent)=colpar.attrelid AND 
colpar.attnum>0 AND NOT colpar.attisdropped
|JOIN (SELECT *, attrelid::regclass::text AS child FROM pg_attribute) colcld ON 
to_regclass(qa.child) =colcld.attrelid AND colcld.attnum>0 AND NOT 
colcld.attisdropped
|WHERE colcld.attname=colpar.attname AND colpar.atttypid!=colcld.atttypid
|GROUP BY 1,2
|ORDER BY
|parent LIKE 'unused%', -- Do them last
|regexp_replace(colcld.child, 
'.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$', '\3\5') DESC,
|regexp_replace(colcld.child, '.*_', '') DESC
|LIMIT 1;

|QUERY PLAN
|Limit  (cost=67128.06..67128.06 rows=1 width=307)
|  ->  Sort  (cost=67128.06..67137.84 rows=3912 width=307)
|Sort Key: (((qa_1.parent)::text ~~ 'unused%'::text)), 
(regexp_replace((((pg_attribute.attrelid)::regclass)::text), 
'.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$'::text, 
'\3\5'::text)) DESC, 
(regexp_replace((((pg_attribute.attrelid)::regclass)::text), '.*_'::text, 
''::text)) DESC
|->  GroupAggregate  (cost=66893.34..67108.50 rows=3912 width=307)
|  Group Key: (((pg_attribute.attrelid)::regclass)::text), 
qa_1.parent
|  ->  Sort  (cost=66893.34..66903.12 rows=3912 width=256)
|Sort Key: (((pg_attribute.attrelid)::regclass)::text), 
qa_1.parent
|->  Gather  (cost=40582.28..66659.91 rows=3912 width=256)
|  Workers Planned: 2
|  ->  Parallel Hash Join  (cost=39582.28..65268.71 
rows=1630 width=256)
|Hash Cond: 
(((to_regclass((qa_1.child)::text))::oid = pg_attribute.attrelid) AND 
(colpar.attname = pg_attribute.attname))
|Join Filter: (colpar.atttypid <> 
pg_attribute.atttypid)
|->  Nested Loop  (cost=0.43..25614.89 
rows=11873 width=366)
|  ->  Parallel Append  (cost=0.00..12.00 
rows=105 width=292)
|->  Parallel Seq Scan on 
queued_alters_child qa_1  (cost=0.00..11.47 rows=147 width=292)
|->  Parallel Seq Scan on 
queued_alters qa  (cost=0.00..0.00 rows=1 width=292)
|  ->  Index Scan using 
pg_attribute_relid_attnum_index on pg_attribute colpar  (cost=0.43..242.70 
rows=114 width=78)
|Index Cond: ((attrelid = 
(to_regclass((qa_1.parent)::text))::oid) AND (attnum > 0))
|Filter: (NOT attisdropped)
|->  Parallel Hash  (cost=33651.38..33651.38 
rows=395365 width=72)
|  ->  Parallel Seq Scan on pg_attribute  
(cost=0.00..33651.38 rows=395365 width=72)
|Filter: ((NOT attisdropped) AND 
(attnum > 0))
|

queued_alters is usually empty, and looks like it would've last been nonempty 
on 2018-12-10.

ts=# \d queued_alters
  Table "public.queued_alters"
 Column | Type  | Collation | Nullable | Default
+---+---+--+-
 child  | character varying(64) |   |  |
 parent | character varying(64) |   |  |
Indexes:
"queued_alters_child_parent_key" UNIQUE CONSTRAINT, btree (child, parent)
Number of child tables: 1 (Use \d+ to list them.)

I found this other log at that time:
 2018-12-31 15:39:11.918-05 | 30831 | 5bf38e71.786f |5 | background 
worker "parallel worker" (PID 8100) exited with exit code 1

Which is the postmaster, or its PID in any case...

[pryzbyj@telsasoft-db ~]$ ps -wwwf 30831
UIDPID  PPID  C STIME TTY  STAT   

Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2018-12-31 Thread Andrew Gierth
> "Alvaro" == Alvaro Herrera  writes:

 Alvaro> After looking at the proposed grammar again today and in danger
 Alvaro> of repeating myself, IMO allowing the concurrency keyword to
 Alvaro> appear outside the parens would be a mistake. Valid commands:

 Alvaro>   REINDEX (VERBOSE, CONCURRENTLY) TABLE foo;
 Alvaro>   REINDEX (CONCURRENTLY) INDEX bar;

We burned that bridge with CREATE INDEX CONCURRENTLY; to make REINDEX
require different syntax would be too inconsistent.

If we didn't have all these existing uses of CONCURRENTLY without
parens, your argument might have more merit; but we do.

-- 
Andrew (irc:RhodiumToad)



Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2018-12-31 Thread Alvaro Herrera
On 2018-Dec-14, Stephen Frost wrote:

> > My vote goes to put the keyword inside of and exclusively in the
> > parenthesized option list.
> 
> I disagree with the idea of exclusively having concurrently be in the
> parentheses.  'explain buffers' is a much less frequently used option
> (though that might, in part, be because it's a bit annoying to write out
> explain (analyze, buffers) select...; I wonder if we could have a way to
> say "if I'm running analyze, I always want buffers"...),

I'm skeptical.  I think EXPLAIN ANALYZE is more common because it has
more than one decade of advantage compared to the more detailed option
list.  Yes, it's also easier, but IMO it's a brain thing (muscle
memory), not a fingers thing.

> but concurrently reindexing a table (or index..) is going to almost
> certainly be extremely common, perhaps even more common than *not*
> reindexing concurrently.

Well, users can use the reindexdb utility and save some keystrokes.

Anyway we don't typically add redundant ways to express the same things.
Where we have them, it's just because the old way was there before, and
we added the extensible way later.  Adding two in the first appearance
of a new feature seems absurd to me.

After looking at the proposed grammar again today and in danger of
repeating myself, IMO allowing the concurrency keyword to appear outside
the parens would be a mistake.  Valid commands:

  REINDEX (VERBOSE, CONCURRENTLY) TABLE foo;
  REINDEX (CONCURRENTLY) INDEX bar;

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Unified logging system for command-line programs

2018-12-31 Thread Alvaro Herrera
Ah, one more thing -- there's a patch by Marina Polyakova (in CC) to
make pgbench logging more regular.  Maybe that stuff should be
considered now too.  I'm not saying to patch pgbench in this commit, but
rather to have pgbench in mind while discussing the API.  I think the
last version of that was here:

https://postgr.es/m/a1bd32671a6777b78dd67b95eb68f...@postgrespro.ru

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_regress: promptly detect failed postmaster startup

2018-12-31 Thread Tom Lane
Noah Misch  writes:
> When "make check TEMP_CONFIG=<(echo break_me=on)" spawns a postmaster that
> fails startup, we detect that with "pg_regress: postmaster did not respond
> within 60 seconds".  pg_regress has a kill(postmaster_pid, 0) intended to
> detect this case faster.  Since kill(ZOMBIE-PID, 0) succeeds[1], that test is
> ineffective.

Ooops.

> The fix, attached, is to instead test waitpid(), like pg_ctl's
> wait_for_postmaster() does.

+1.  This leaves postmaster_pid as a dangling pointer, but since
we just exit immediately, that seems fine.  (If we continued, and
arrived at the "kill(postmaster_pid, SIGKILL)" below, it would not
be fine.)
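
The idea, in rough outline (a sketch only; the attached patch is
authoritative):

#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * True while the postmaster child is still running.  Unlike kill(pid, 0),
 * waitpid() with WNOHANG notices an exited (zombie) child: it returns the
 * PID once the child has terminated.
 */
static bool
postmaster_alive(pid_t pid)
{
	int		status;
	pid_t	rc = waitpid(pid, &status, WNOHANG);

	if (rc == 0)
		return true;	/* child exists and has not exited */
	return false;		/* rc == pid: exited and reaped; rc == -1: error */
}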

regards, tom lane



Re: Is MinMaxExpr really leakproof?

2018-12-31 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> bttextcmp() and other varstr_cmp() callers fall afoul of the same
 >> restriction with their "could not convert string to UTF-16" errors
 >> (https://postgr.es/m/CADyhKSXPwrUv%2B9LtqPAQ_gyZTv4hYbr2KwqBxcs6a3Vee1jBLQ%40mail.gmail.com).
 >> Leaking the binary fact that an unspecified string contains an
 >> unspecified rare Unicode character is not a serious leak, however.
 >> Also, those errors would be a substantial usability impediment if
 >> they happened much in practice; you couldn't index affected values.

 Tom> Yeah. I think that there might be a usability argument for marking
 Tom> textcmp and related operators as leakproof despite this
 Tom> theoretical leak condition, because not marking them puts a large
 Tom> practical constraint on what conditions we can optimize. However,
 Tom> that discussion just applies narrowly to the string data types; it
 Tom> is independent of what we want to say the general policy is.

I think that's not even a theoretical leak; the documentation for
MultiByteToWideChar does not indicate any way in which it can return an
error for the specific parameters we pass to it. In particular we do not
tell it to return errors for invalid input characters.

-- 
Andrew (irc:RhodiumToad)



Re: Is MinMaxExpr really leakproof?

2018-12-31 Thread Tom Lane
Isaac Morland  writes:
> On Mon, 31 Dec 2018 at 12:26, Noah Misch  wrote:
>> bttextcmp() and other varstr_cmp() callers fall afoul of the same
>> restriction with their "could not convert string to UTF-16" errors

> I'm confused. What characters cannot be represented in UTF-16?

What's actually being reported there is failure of Windows'
MultiByteToWideChar function.  Probable causes could include
invalid data (not valid UTF8), or conditions such as out-of-memory
which might have nothing at all to do with the input.

There are similar, equally nonspecific, error messages in the
non-Windows code path.

In principle, an attacker might be able to find out the existence
of extremely long strings in a column by noting out-of-memory
failures in this code, but that doesn't seem like a particularly
interesting information leak ...

regards, tom lane



Re: Is MinMaxExpr really leakproof?

2018-12-31 Thread Tom Lane
Noah Misch  writes:
> This thread duplicates 
> https://postgr.es/m/flat/16539.1431472961%40sss.pgh.pa.us

Ah, so it does.  Not sure why that fell off the radar without getting
fixed; possibly because it was right before PGCon.

> pg_lsn_cmp() and btoidvectorcmp() surely could advertise leakproofness.

Agreed; I'll go fix those.

> I'm not sure about enum_cmp(), numeric_cmp(), tsquery_cmp() or
> tsvector_cmp().  I can't think of a reason those would leak, though.

I've not looked at the last three, but enum_cmp can potentially report the
value of an input, if it fails to find a matching pg_enum record:

enum_tup = SearchSysCache1(ENUMOID, ObjectIdGetDatum(arg1));
if (!HeapTupleIsValid(enum_tup))
ereport(ERROR,
(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
 errmsg("invalid internal value for enum: %u",
arg1)));

and there are similar error reports inside compare_values_of_enum().
Whether that amounts to an interesting security leak is debatable.
It's hard to see how an attacker could arrange for those to fail,
much less do so in a way that would reveal a value he didn't know
already.

> btrecordcmp() and other polymorphic
> cmp functions can fail:
>   create type boxrec as (a box); select '("(1,1),(0,0)")'::boxrec = 
> '("(1,1),(0,0)")'::boxrec;
>   => ERROR:  could not identify an equality operator for type box
> The documentation says, "a function which throws an error message for some
> argument values but not others ... is not leakproof."  I would be comfortable
> amending that to allow the "could not identify an equality operator" error,
> because that error follows from type specifics, not value specifics.

I think the real issue with btrecordcmp, btarraycmp, etc, is that
they invoke other type-specific comparison functions.  Therefore,
we cannot mark them leakproof unless every type-specific comparison
function is leakproof.  So we're right back at the policy question.

> bttextcmp() and other varstr_cmp() callers fall afoul of the same restriction
> with their "could not convert string to UTF-16" errors
> (https://postgr.es/m/CADyhKSXPwrUv%2B9LtqPAQ_gyZTv4hYbr2KwqBxcs6a3Vee1jBLQ%40mail.gmail.com).
> Leaking the binary fact that an unspecified string contains an unspecified 
> rare
> Unicode character is not a serious leak, however.  Also, those errors would 
> be a
> substantial usability impediment if they happened much in practice; you 
> couldn't
> index affected values.

Yeah.  I think that there might be a usability argument for marking
textcmp and related operators as leakproof despite this theoretical
leak condition, because not marking them puts a large practical
constraint on what conditions we can optimize.  However, that
discussion just applies narrowly to the string data types; it is
independent of what we want to say the general policy is.

>> I think that we should either change contain_leaked_vars_walker to
>> explicitly verify leakproofness of the comparison function, or decide
>> that it's project policy that btree comparison functions are leakproof,
>> and change the markings on those (and their associated operators).

> Either of those solutions sounds fine.  Like last time, I'll vote for 
> explicitly
> verifying leakproofness.

Yeah, I'm leaning in that direction as well.  Other than comparisons
involving strings, it's not clear that we'd gain much from insisting
on leakproofness in general, and it seems like it might be rather a
large burden to put on add-on data types.

regards, tom lane



Re: Is MinMaxExpr really leakproof?

2018-12-31 Thread Noah Misch
This thread duplicates https://postgr.es/m/flat/16539.1431472961%40sss.pgh.pa.us

On Sun, Dec 30, 2018 at 01:24:02PM -0500, Tom Lane wrote:
> So this coding amounts to an undocumented
> assumption that every non-cross-type btree comparison function is
> leakproof.

> select p.oid::regprocedure from pg_proc p, pg_amproc a, pg_opfamily f
> where p.oid=a.amproc and f.oid=a.amprocfamily and opfmethod=403 and not 
> proleakproof and amproclefttype=amprocrighttype and amprocnum=1;
> 
>  bpcharcmp(character,character)
>  btarraycmp(anyarray,anyarray)
>  btbpchar_pattern_cmp(character,character)
>  btoidvectorcmp(oidvector,oidvector)
>  btrecordcmp(record,record)
>  btrecordimagecmp(record,record)
>  bttext_pattern_cmp(text,text)
>  bttextcmp(text,text)
>  enum_cmp(anyenum,anyenum)
>  jsonb_cmp(jsonb,jsonb)
>  numeric_cmp(numeric,numeric)
>  pg_lsn_cmp(pg_lsn,pg_lsn)
>  range_cmp(anyrange,anyrange)
>  tsquery_cmp(tsquery,tsquery)
>  tsvector_cmp(tsvector,tsvector)
> 
> so this assumption is, on its face, wrong.
> 
> In practice it might be all right, because it's hard to see a reason why
> a btree comparison function would ever throw an error except for internal
> failures, which are probably outside the scope of leakproofness guarantees
> anyway.  Nonetheless, if we didn't mark these functions as leakproof,
> why not?

pg_lsn_cmp() and btoidvectorcmp() surely could advertise leakproofness.  I'm not
sure about enum_cmp(), numeric_cmp(), tsquery_cmp() or tsvector_cmp().  I can't
think of a reason those would leak, though.  btrecordcmp() and other polymorphic
cmp functions can fail:

  create type boxrec as (a box); select '("(1,1),(0,0)")'::boxrec = 
'("(1,1),(0,0)")'::boxrec;
  => ERROR:  could not identify an equality operator for type box

The documentation says, "a function which throws an error message for some
argument values but not others ... is not leakproof."  I would be comfortable
amending that to allow the "could not identify an equality operator" error,
because that error follows from type specifics, not value specifics.

bttextcmp() and other varstr_cmp() callers fall afoul of the same restriction
with their "could not convert string to UTF-16" errors
(https://postgr.es/m/CADyhKSXPwrUv%2B9LtqPAQ_gyZTv4hYbr2KwqBxcs6a3Vee1jBLQ%40mail.gmail.com).
Leaking the binary fact that an unspecified string contains an unspecified rare
Unicode character is not a serious leak, however.  Also, those errors would be a
substantial usability impediment if they happened much in practice; you couldn't
index affected values.

> I think that we should either change contain_leaked_vars_walker to
> explicitly verify leakproofness of the comparison function, or decide
> that it's project policy that btree comparison functions are leakproof,
> and change the markings on those (and their associated operators).

Either of those solutions sounds fine.  Like last time, I'll vote for explicitly
verifying leakproofness.



Re: Implementing Incremental View Maintenance

2018-12-31 Thread Adam Brusselback
Hi all, just wanted to say  I am very happy to see progress made on this,
my codebase has multiple "materialized tables" which are maintained with
statement triggers (transition tables) and custom functions. They are ugly
and a pain to maintain, but they work because I have no other
solution...for now at least.
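
For concreteness, a minimal sketch of that kind of hand-rolled maintenance
(all table and function names here are made up):

CREATE TABLE order_counts (customer_id int PRIMARY KEY, n bigint NOT NULL);

CREATE FUNCTION order_counts_ins() RETURNS trigger AS $$
BEGIN
    -- Fold the freshly inserted rows into the summary table.
    INSERT INTO order_counts AS oc (customer_id, n)
    SELECT customer_id, count(*) FROM new_rows GROUP BY customer_id
    ON CONFLICT (customer_id) DO UPDATE SET n = oc.n + excluded.n;
    RETURN NULL;
END $$ LANGUAGE plpgsql;

CREATE TRIGGER orders_ins AFTER INSERT ON orders
    REFERENCING NEW TABLE AS new_rows
    FOR EACH STATEMENT EXECUTE FUNCTION order_counts_ins();

-- ...plus similar triggers for UPDATE and DELETE.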

> I am concerned that the eager approach only addresses a subset of the MV use
> case space, though. For example, if we presume that an MV is present
> because
> the underlying direct query would be non-performant, then we have to at
> least question whether applying the delta-update would also be detrimental
> to some use cases.
>

I will say that in my case, as long as my reads of the materialized view
are always consistent with the underlying data, that's what's important.  I
don't mind if it's eager, or lazy (as long as lazy still means it will
refresh prior to reading).


Re: [PATCH] check for ctags utility in make_ctags

2018-12-31 Thread Nikolay Shaplov
In a message of Monday, December 31, 2018 19:04:08 MSK, Nikolay Shaplov
wrote:

> I'd like to propose a small patch for the make_ctags script. It checks
> whether the ctags utility is installed. If not, it reports an error and
> advises installing ctags.
Oops. I've misplaced the '&' character :-)

Here is the right version:
diff --git a/src/tools/make_ctags b/src/tools/make_ctags
index 1609c07..edbcd82 100755
--- a/src/tools/make_ctags
+++ b/src/tools/make_ctags
@@ -2,6 +2,13 @@
 
 # src/tools/make_ctags
 
+if [ ! $(command -v ctags) ]
+then
+  echo "'ctags' utility is not found" 1>&2
+  echo "Please install 'ctags' to run make_ctags" 1>&2
+  exit 1
+fi
+
 trap "rm -f /tmp/$$" 0 1 2 3 15
 rm -f ./tags
 


[PATCH] check for ctags utility in make_ctags

2018-12-31 Thread Nikolay Shaplov
Hi!

I'd like to propose a small patch for the make_ctags script. It checks
whether the ctags utility is installed; if not, it reports an error and
advises installing ctags.

This will make life a bit easier for someone using ctags for the first
time.

I use command -v to detect whether the ctags command exists. It is POSIX
standard, and I hope it exists in all shells.

diff --git a/src/tools/make_ctags b/src/tools/make_ctags
index 1609c07..edbcd82 100755
--- a/src/tools/make_ctags
+++ b/src/tools/make_ctags
@@ -2,6 +2,13 @@
 
 # src/tools/make_ctags
 
+if [ ! $(command -v ctags) ]
+then
+  echo "'ctags' utility is not found" &1>2
+  echo "Please install 'ctags' to run make_ctags" &1>2
+  exit 1
+fi
+
 trap "rm -f /tmp/$$" 0 1 2 3 15
 rm -f ./tags
 


Re: Unified logging system for command-line programs

2018-12-31 Thread Andres Freund
Hi,

On 2018-12-30 14:45:23 -0500, Tom Lane wrote:
> I wonder how hard it would be to layer some subset of
> ereport() functionality on top of what you have here, so as to get
> rid of those #ifdef stanzas.

+many.  I think we should aim to unify the use (in contrast to the
implementation) of logging as much as possible, rather than having a
separate API for it for client programs. Not just because that facilitates
code reuse in frontend programs, but also because that's one less thing to
learn when getting started with PG.

Further down the line I think we should also port the PG_CATCH logic to
client programs.

Greetings,

Andres Freund



Re: [HACKERS] proposal: schema variables

2018-12-31 Thread Erik Rijkers

On 2018-12-31 14:23, Pavel Stehule wrote:
> On Wed, 21 Nov 2018 at 8:24, Pavel Stehule wrote:

[schema-variables-20181231-01.patch.gz]


Hi Pavel,

I gave this a quick try-out with the script I had from previous versions,
and found these two errors:


drop schema if exists schema1 cascade;
create schema if not exists schema1;
drop variable if exists schema1.myvar1;  --> error 49
create variable schema1.myvar1 as text ;
select schema1.myvar1;
let schema1.myvar1 = 'variable value ""';
select schema1.myvar1;
alter variable schema1.myvar1 rename to myvar2;
select schema1.myvar2;
create variable schema1.myvar1 as text ;
let schema1.myvar1 = 'variable value ""';
select schema1.myvar1;
alter variable schema1.myvar1 rename to myvar2; --> error 4287
select schema1.myvar2;



The above, run with psql -qXa, gives the following output:

drop schema if exists schema1 cascade;
create schema if not exists schema1;
drop variable if exists schema1.myvar1;  --> error 49
ERROR:  unrecognized object type: 49
create variable schema1.myvar1 as text ;
select schema1.myvar1;
 myvar1 
--------
 
(1 row)

let schema1.myvar1 = 'variable value ""';
select schema1.myvar1;
      myvar1       
-------------------
 variable value ""
(1 row)

alter variable schema1.myvar1 rename to myvar2;
select schema1.myvar2;
      myvar2       
-------------------
 variable value ""
(1 row)

create variable schema1.myvar1 as text ;
let schema1.myvar1 = 'variable value ""';
select schema1.myvar1;
      myvar1       
-------------------
 variable value ""
(1 row)

alter variable schema1.myvar1 rename to myvar2; --> error 4287
ERROR:  unsupported object class 4287
select schema1.myvar2;
      myvar2       
-------------------
 variable value ""
(1 row)



thanks,


Erik Rijkers





Re: Unified logging system for command-line programs

2018-12-31 Thread Alvaro Herrera
On 2018-Dec-30, Tom Lane wrote:

> Peter Eisentraut  writes:
> > I have developed a patch that unifies the various ad hoc logging
> > (message printing, error printing) systems used throughout the
> > command-line programs.
> 
> I've not read the patch in any detail, but +1 for making this more
> uniform.

Agreed, and the compactness is a good bonus too.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: slight tweaks to documentation about runtime pruning

2018-12-31 Thread David Rowley
On Tue, 18 Dec 2018 at 03:49, Alvaro Herrera  wrote:
> Pushed, thanks.

I just noticed that this is still open on the CF app. Marking as committed...

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: add_partial_path() may remove dominated path but still in use

2018-12-31 Thread Amit Kapila
On Mon, Dec 31, 2018 at 5:48 PM Kohei KaiGai  wrote:
>
> On Mon, Dec 31, 2018 at 13:10, Amit Kapila wrote:
> >
> > On Sun, Dec 30, 2018 at 9:01 AM Kohei KaiGai  wrote:
> > > On Sun, Dec 30, 2018 at 4:12, Tom Lane wrote:
> > > >
> > > > Kohei KaiGai  writes:
> > > > > On Sat, Dec 29, 2018 at 1:44, Tom Lane wrote:
> > > > >> However, first I'd like to know why this situation is arising in the 
> > > > >> first
> > > > >> place.  To have the situation you're describing, we'd have to have
> > > > >> attempted to make some Gather paths before we have all the partial 
> > > > >> paths
> > > > >> for the relation they're for.  Why is that a good thing to do?  It 
> > > > >> seems
> > > > >> like such Gathers are necessarily being made with incomplete 
> > > > >> information,
> > > > >> and we'd be better off to fix things so that none are made till 
> > > > >> later.
> > > >
> > > > > Because of the hook location, the Gather node shall be constructed
> > > > > with built-in and foreign partial scan nodes first; then the
> > > > > extension gets a chance to add its custom paths (partial and full).
> > > > > At set_rel_pathlist(), set_rel_pathlist_hook() is invoked right
> > > > > after generate_gather_paths().
> > > >
> > > > Hmm.  I'm inclined to think that we should have a separate hook
> > > > in which extensions are allowed to add partial paths, and that
> > > > set_rel_pathlist_hook should only be allowed to add regular paths.
> >
> > +1.  This idea sounds sensible to me.
> >
> > > >
> > > I have almost the same opinion, but the first hook does not need to be
> > > dedicated to partial paths. As set_foreign_pathlist() does, we can add
> > > both partial and regular paths here; then generate_gather_paths() may
> > > generate a Gather path on top of the best partial path.
> > >
> >
> > Won't it be confusing for users if we allow both partial and full
> > paths in the first hook and only full paths in the second hook?
> > Basically, in many cases, the second hook won't be of much use.  What
> > advantage are you seeing in allowing both partial and full paths in
> > the first hook?
> >
> Two advantages. The first is that it follows the same manner as
> set_foreign_pathlist(), which allows adding both full and partial paths
> if the FDW supports parallel scan.
> The second is practical. During path construction, an extension needs
> to check whether it can run at all (e.g., whether the operators in the
> WHERE clause are supported on the GPU device...), calculate its
> estimated cost, and so on. No small portion of that work is common to
> both the full and the partial path. So, if the first hook accepts both
> kinds of paths at once, the extension can share the common properties.
>

You have a point, though I am not sure how much difference it makes for
cost computation, as ideally the two will have different costing models.
I understand there are some savings from avoiding the common work; is
there any way to cache the required information?

> Probably, the second hook would only be used for a few corner cases
> where an extension wants to manipulate the path list already built,
> like pg_hint_plan.
>

Okay, but it could mean some work for extension authors who are using
the current hook; I am not sure they would like to divide the work
between the first and second hooks.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] proposal: schema variables

2018-12-31 Thread Pavel Stehule
On Wed, 21 Nov 2018 at 08:24, Pavel Stehule 
wrote:

> Hi
>
> just rebase
>
>
rebase

Regards

Pavel


>
> Regards
>
> Pavel
>


schema-variables-20181231-01.patch.gz
Description: application/gzip


Re: Error while executing initdb...

2018-12-31 Thread David Rowley
On Mon, 31 Dec 2018 at 17:43, Rajib Deb  wrote:
>
> Dear All,
> I put a printf statement in the "main.c" code and built it. Later, when
> I tried to execute initdb, I got the following error:
>
> The program "postgres" was found by  but was not the same version as
> initdb. Check your installation
>
> After some analysis, I figured out that this error is generated because
> the "ret" code from "PG_CTL.c" is returning a non-zero return code when
> the line and versionstr are compared in "exec.c". It looks like, while
> reading the line in the "pipe_read_line" method, it is concatenating
> the printf output with the postgres version (postgres (PostgreSQL) x).
>
> I thought this is probably a defect, and maybe the buffer needs to be
> flushed before it is read in the "pipe_read_line" method. Before doing
> further investigation and putting together a possible fix, I thought to
> check with this group whether it is worth the effort.

From looking at the code, it appears what happens is that
find_other_exec() calls "postgres -V" and reads the first line of the
output, so probably what's going on is you're printing out your
additional line even when postgres is called with -V or --version...
Would it not be easier just not to do that?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Add timeline to partial WAL segments

2018-12-31 Thread David Steele

On 12/21/18 2:10 AM, Michael Paquier wrote:
> On Thu, Dec 20, 2018 at 02:13:01PM +0200, David Steele wrote:
>> Or perhaps just always add the timeline to the .partial?  That way it
>> doesn't need to be renamed later.  Also, there would be a consistent
>> name, rather than sometimes .partial, sometimes ..partial.
>
> Hm.  A renaming still needs to happen afterwards anyway, no?  When a
> segment is created with a given name pg_receivewal cannot know if the
> segment it is working on will be the last partial segment of a given
> timeline.  And what would be changed in the segment name is the
> addition of the new TLI, not the previous one.


I was thinking the file would only be renamed on successful completion.
Suppose we are on timeline 1; pg_receivewal would be writing to:

000000010000000100000001.00000001.partial

If the WAL segment is never completed (server crashes, etc.) it would
keep that name.

If the cluster is promoted, it would archive:

000000010000000100000001.00000002.partial

And then pg_receivewal would start writing on:

000000020000000100000001.00000002.partial

When that segment successfully completes, it would be renamed by
pg_receivewal to 000000020000000100000001.

In short, the *initial* name of the WAL file is set to what it should be
if it doesn't complete, so we don't need to run around and try to rename
files on failure.  Only on success do we need to rename.


Sound plausible?

--
-David
da...@pgmasters.net



Re: Implementing Incremental View Maintenance

2018-12-31 Thread denty
Hi Yugo.

> I would like to implement Incremental View Maintenance (IVM) on
> PostgreSQL.

Great. :-)

I think it would address an important gap in PostgreSQL’s feature set.

> 2. How to compute the delta to be applied to materialized views
> 
> Essentially, IVM is based on relational algebra. Theoretically, changes
> on base tables are represented as deltas on this, like "R <- R + dR",
> and the delta on the materialized view is computed from the base table
> deltas based on "change propagation equations".  For implementation, we
> have to derive the equation from the view definition query (Query tree,
> or Plan tree?) and describe it as a SQL query to compute the delta to
> be applied to the materialized view.

We had a similar discussion in this thread
https://www.postgresql.org/message-id/flat/FC784A9F-F599-4DCC-A45D-DBF6FA582D30%40QQdd.eu,
and I’m very much in agreement that the “change propagation equations”
approach can solve for a very substantial subset of common MV use cases.
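
To make "R <- R + dR" concrete, here is a minimal sketch of eager delta
maintenance for a single-table aggregate view, written with a
statement-level trigger and a transition table (PostgreSQL 10+). All
names are illustrative, and only the INSERT equation is shown; a real
implementation would derive the UPDATE/DELETE equations as well:

    CREATE TABLE sales (product text, amount numeric);

    -- The "materialized view", maintained by hand in this sketch.
    CREATE TABLE mv_sales (product text PRIMARY KEY,
                           total   numeric,
                           cnt     bigint);

    CREATE FUNCTION mv_sales_delta() RETURNS trigger AS $$
    BEGIN
        -- Change propagation: fold the delta (new_rows) into the view
        -- rather than recomputing the whole aggregate.
        INSERT INTO mv_sales
            SELECT product, sum(amount), count(*)
            FROM new_rows GROUP BY product
        ON CONFLICT (product) DO UPDATE
            SET total = mv_sales.total + EXCLUDED.total,
                cnt   = mv_sales.cnt   + EXCLUDED.cnt;
        RETURN NULL;
    END $$ LANGUAGE plpgsql;

    CREATE TRIGGER sales_ivm AFTER INSERT ON sales
        REFERENCING NEW TABLE AS new_rows
        FOR EACH STATEMENT EXECUTE PROCEDURE mv_sales_delta();

COUNT and SUM are self-maintainable like this; operations such as
MIN/MAX or DISTINCT need extra state on deletion, which is where the
per-operation modules you describe come in.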

> There could be several operations for view definition: selection,
> projection, join, aggregation, union, difference, intersection, etc.
> If we can prepare a module for each operation, it makes IVM extensible,
> so we can start with simple view definitions and then support more
> complex views.

Such a decomposition also allows ‘stacking’, so that complex MV
definitions can be attacked with only a small handful of modules.

I did a bit of an experiment to see if “change propagation equations”
could be computed directly from the MV’s pg_node_tree representation in the
catalog in PlPgSQL. I found that pg_node_trees are not particularly friendly
to manipulation in PlPgSQL. Even with a more friendly-to-PlPgSQL
representation (I played with JSONB), then the next problem is making sense
of the structures, and unfortunately amongst the many plan/path/tree utility
functions in the code base, I figured only a very few could be sensibly
exposed to PlPgSQL. Ultimately, although I’m still attracted to the idea,
and I think it could be made to work, native code is the way to go at least
for now.

> 4. When to maintain materialized views
> 
> [...]
> 
> In the previous discussion[4], it is planned to start from the "eager"
> approach. In our PoC implementation, we used the other approach, that
> is, using the REFRESH command to perform IVM.  I am not sure which is
> the better starting point, but I am beginning to think that the eager
> approach may be simpler, since we don't have to maintain base table
> changes from other past transactions.

Certainly the eager approach allows progress to be made with less
infrastructure.

I am concerned that the eager approach only addresses a subset of the MV use
case space, though. For example, if we presume that an MV is present because
the underlying direct query would be non-performant, then we have to at
least question whether applying the delta-update would also be detrimental
to some use cases.

> In the eager maintenance approach, we have to consider a race condition
> where two different transactions change base tables simultaneously, as
> discussed in [4].

I wonder if that nudges towards a logged approach. If the race arises
because JOIN-worthy tuples become visible after a COMMIT, but not
before, then does it not follow that the eager approach has to fire some
kind of reconciliation work at COMMIT time? That seems to imply a
persistent queue of some kind, since we can’t assume transactions are
small enough to hold the queue in memory.

Hmm. I hadn’t really thought about that particular corner case. I guess
a ‘catch’ could simply be to detect such a concurrent update and demote
the refresh approach by marking the MV stale, awaiting a full refresh.

denty.



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: Clean up some elog messages and comments for do_pg_stop_backup and do_pg_start_backup

2018-12-31 Thread Peter Eisentraut
On 31/12/2018 09:07, Michael Paquier wrote:
> On Sun, Dec 30, 2018 at 02:50:46PM +0900, Michael Paquier wrote:
>> A personal note on the matter: I tend to prefer the passive form in
>> such log messages because it is impersonal, rather than the direct
>> form, which reads as more personally addressed to the user.
>> I may have been living abroad for too long though ;)
> 
> So, are there any objections regarding the previously-sent patch or
> other suggestions?  All comments from Peter and Alvaro have been
> included.

I like this version of the patch.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_dump multi VALUES INSERT

2018-12-31 Thread David Rowley
Just looking at the v5 patch, it seems not to handle zero-column tables correctly.

For example:

# create table t();
# insert into t default values;
# insert into t default values;

$ pg_dump --table t --inserts --insert-multi=100 postgres > dump.sql

# \i dump.sql
[...]
INSERT 0 1
psql:dump.sql:35: ERROR:  syntax error at or near ")"
LINE 1: );
^

I'm not aware of a valid way to insert multiple zero-column rows in a
single INSERT statement, so I'm not sure how you're going to handle that
case.
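
For reference, a multi-row VALUES list requires at least one column per
row, so the only way to restore several rows of a zero-column table is
one statement per row.  A plausible fallback (not something the patch
does today) would be for pg_dump to emit exactly that in this corner
case:

    -- Multi-row VALUES works only when there is at least one column:
    CREATE TABLE u (a int);
    INSERT INTO u (a) VALUES (1), (2), (3);

    -- For the zero-column table t above, each row needs its own
    -- statement:
    INSERT INTO t DEFAULT VALUES;
    INSERT INTO t DEFAULT VALUES;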


--
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Speeding up creating UPDATE/DELETE generic plan for partitioned table into a lot

2018-12-31 Thread David Rowley
On Fri, 28 Dec 2018 at 20:36, Tsunakawa, Takayuki
 wrote:
> Although I may be saying the same thing as you, I think a natural idea
> would be to create a generic plan gradually.  The simple starting
> question is "why do we have to touch all partitions at first?"  That
> is, can we behave like this:
>
> * PREPARE just creates an aggregation plan node (e.g. Append for
> SELECT, Update for UPDATE).  It doesn't create any plan for particular
> partitions.  Say, call this a parent generic plan node.
> * EXECUTE creates a generic plan for specific partitions if they don't
> exist yet, and attaches them to the parent generic plan node.

I imagine the place to start looking would be why planning is so slow
for that many partitions. There are many inefficient data structures
used in the planner that perform linear searches over things like
equivalence classes. Perhaps some profiling would highlight just where
the problems lie. Tom recently re-posted a query [1] involving a large
number of joins that was taking about 14 seconds to plan on my laptop.
After writing some test code to allow faster lookups of equivalence
classes matching a set of relations, I got this down to about 2.4
seconds [2]. Perhaps that also helps the partitioned table case a
little.

Another possible interesting idea would be to, instead of creating
large Append/MergeAppend plans for partition scanning, invent some
"Partition Seq Scan" and "Partition Index Scan" nodes that are able to
build plans more similar to scanning a normal table. Likely such nodes
would need to be programmed with a list of Oids that they're to scan
during their execution. They'd also need to take care of their own
tuple mapping for when partitions had their columns in varying orders.
At first thought, such a design does not seem so unrealistic; if so,
it would likely solve the generic plan problem you describe
completely.

[1] https://www.postgresql.org/message-id/6970.1545327857%40sss.pgh.pa.us
[2] 
https://www.postgresql.org/message-id/cakjs1f_bqvjetgksjt65glavwxqyryrjpuxe2ebkre0o0ec...@mail.gmail.com
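
For readers who want to see the planning overhead being discussed, a toy
reproduction (the partition count and all names are illustrative; as of
PostgreSQL 11 the plan cache considers a generic plan after five
executions, and for UPDATE that generic plan must cover every
partition):

    CREATE TABLE measurement (logdate date, peaktemp int)
        PARTITION BY RANGE (logdate);

    DO $$
    BEGIN
        FOR i IN 0..999 LOOP
            EXECUTE format(
                'CREATE TABLE measurement_p%s PARTITION OF measurement
                     FOR VALUES FROM (%L) TO (%L)',
                i, date '2018-01-01' + i, date '2018-01-01' + i + 1);
        END LOOP;
    END $$;

    PREPARE upd (int, date) AS
        UPDATE measurement SET peaktemp = $1 WHERE logdate = $2;

    -- Run this six or more times; the switch to a generic plan is
    -- where the per-partition planning cost shows up.
    EXECUTE upd(30, '2018-03-01');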

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Clean up some elog messages and comments for do_pg_stop_backup and do_pg_start_backup

2018-12-31 Thread Michael Paquier
On Sun, Dec 30, 2018 at 02:50:46PM +0900, Michael Paquier wrote:
> A personal note on the matter: I tend to prefer the passive form in
> such log messages because it is impersonal, rather than the direct
> form, which reads as more personally addressed to the user.
> I may have been living abroad for too long though ;)

So, are there any objections regarding the previously-sent patch or
other suggestions?  All comments from Peter and Alvaro have been
included.
--
Michael


signature.asc
Description: PGP signature