Re: [HACKERS] A design for amcheck heapam verification

2017-10-20 Thread Peter Geoghegan
On Thu, Oct 5, 2017 at 7:00 PM, Peter Geoghegan  wrote:
> v3 of the patch series, attached, does it that way -- it adds a
> bloom_create(). The new bloom_create() function still allocates its
> own memory, but does so while using a FLEXIBLE_ARRAY_MEMBER. A
> separate bloom_init() function (that works with dynamic shared memory)
> could easily be added later, for the benefit of parallel hash join.

Since Peter E's work on making the documentation sgml files more
XML-like has broken the v3 patch doc build, attached is v4, which
fixes this bit rot. It also has a few small tweaks here and there to
the docs. Nothing worth noting specifically, really -- I just don't
like to leave my patches with bit rot for long. (Hat-tip to Thomas
Munro for making this easy to detect with his new CF continuous
integration tooling.)

I should point out that I shipped virtually the same code yesterday,
as v1.1 of the Github version of amcheck (also known as amcheck_next).
Early adopters will be able to use this new "heapallindexed"
functionality in the next few days, once packages become available for
the apt and yum community repos. Just as before, the Github version
will work on versions of Postgres >= 9.4.

This seems like good timing on my part, because we know that this new
"heapallindexed" verification will detect the "freeze the dead" bugs
that the next point release is set to have fixes for -- that is
actually kind of how one of the bugs was found [1]. We may even want
to advertise the availability of this check within amcheck_next, in the
release notes for the next Postgres point release.

[1] 
https://www.postgresql.org/message-id/cah2-wznm4rcrhfaiwkpwtpew2bxdtgrozk7jwwgucxeh3d1...@mail.gmail.com
-- 
Peter Geoghegan
From 7906c7391a9f52d334c2cbc7d3e245ff014629f2 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Tue, 2 May 2017 00:19:24 -0700
Subject: [PATCH 2/2] Add amcheck verification of indexes against heap.

Add a new, optional capability to bt_index_check() and
bt_index_parent_check():  callers can check that each heap tuple that
ought to have an index entry does in fact have one.  This happens at the
end of the existing verification checks.

This is implemented by using a Bloom filter data structure.  The
implementation performs set membership tests within a callback (the same
type of callback that each index AM registers for CREATE INDEX).  The
Bloom filter is populated during the initial index verification scan.
---
 contrib/amcheck/Makefile |   2 +-
 contrib/amcheck/amcheck--1.0--1.1.sql|  28 +++
 contrib/amcheck/amcheck.control  |   2 +-
 contrib/amcheck/expected/check_btree.out |  14 +-
 contrib/amcheck/sql/check_btree.sql  |   9 +-
 contrib/amcheck/verify_nbtree.c  | 298 ---
 doc/src/sgml/amcheck.sgml| 173 ++
 src/include/utils/snapmgr.h  |   2 +-
 8 files changed, 454 insertions(+), 74 deletions(-)
 create mode 100644 contrib/amcheck/amcheck--1.0--1.1.sql

diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index 43bed91..c5764b5 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -4,7 +4,7 @@ MODULE_big	= amcheck
 OBJS		= verify_nbtree.o $(WIN32RES)
 
 EXTENSION = amcheck
-DATA = amcheck--1.0.sql
+DATA = amcheck--1.0--1.1.sql amcheck--1.0.sql
 PGFILEDESC = "amcheck - function for verifying relation integrity"
 
 REGRESS = check check_btree
diff --git a/contrib/amcheck/amcheck--1.0--1.1.sql b/contrib/amcheck/amcheck--1.0--1.1.sql
new file mode 100644
index 000..e6cca0a
--- /dev/null
+++ b/contrib/amcheck/amcheck--1.0--1.1.sql
@@ -0,0 +1,28 @@
+/* contrib/amcheck/amcheck--1.0--1.1.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION amcheck UPDATE TO '1.1'" to load this file. \quit
+
+--
+-- bt_index_check()
+--
+DROP FUNCTION bt_index_check(regclass);
+CREATE FUNCTION bt_index_check(index regclass,
+heapallindexed boolean DEFAULT false)
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'bt_index_check'
+LANGUAGE C STRICT PARALLEL RESTRICTED;
+
+--
+-- bt_index_parent_check()
+--
+DROP FUNCTION bt_index_parent_check(regclass);
+CREATE FUNCTION bt_index_parent_check(index regclass,
+heapallindexed boolean DEFAULT false)
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'bt_index_parent_check'
+LANGUAGE C STRICT PARALLEL RESTRICTED;
+
+-- Don't want these to be available to public
+REVOKE ALL ON FUNCTION bt_index_check(regclass, boolean) FROM PUBLIC;
+REVOKE ALL ON FUNCTION bt_index_parent_check(regclass, boolean) FROM PUBLIC;
diff --git a/contrib/amcheck/amcheck.control b/contrib/amcheck/amcheck.control
index 05e2861..4690484 100644
--- a/contrib/amcheck/amcheck.control
+++ b/contrib/amcheck/amcheck.control
@@ -1,5 +1,5 @@
 # amcheck extension
 comment = 'functions for verifying relation integrity'
-default_version = '1.0'
+default_version = '1.1'
 module_pathname = '$libdir/amcheck'
 

Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-10-20 Thread Haribabu Kommi
On Fri, Oct 6, 2017 at 12:29 AM, Robert Haas  wrote:

> On Wed, Oct 4, 2017 at 3:40 AM, Haribabu Kommi 
> wrote:
> > There are some differences in handling database objects
> > between pg_dump and pg_dumpall. To retain both pg_dump
> > and pg_dumpall behavior even after refactoring, this option
> > is added. Currently this option is used mainly for three
> > purposes.
> >
> > 1. Don't print unnecessary CREATE DATABASE options like
> > ENCODING, LC_COLLATE and LC_CTYPE if the
> > default encoding is the same as those values.
> >
> > The above behavior matches pg_dumpall, but it could be changed
> > to print them irrespective of the default encoding.
> >
> > 2. Do not dump postgres and template0 databases.
> >
> > 3. Set default_transaction_read_only = off.
>
> To me it seems that a refactoring which requires pg_dump to behave
> differently in small ways like this based on whether it is being
> called by pg_dumpall or not is probably not a good refactoring.  And I
> don't see why the proposal from Tom that started this thread would
> require such a thing to be true.
>

Before refactoring, pg_dumpall didn't print "create database" commands
for either template1 or the postgres database; pg_dump, on the other hand,
dumps the create database commands when the --create option is given.

To keep the behavior of all the database attributes the same in the dumps
of both pg_dump and pg_dumpall, the code is unified and moved into pg_dump.
But to retain the pg_dumpall behavior of not dumping the "create database"
commands, a new option is added to pg_dump that skips dumping them.

The new option is named "--skip-create-default-db"; a normal user can also
use it when dumping the postgres database, to keep the create database
commands out of the dump.

> From your list, I would say that (1) and (3) seem like behaviors that
> we either want or do not want.  Whether pg_dump or pg_dumpall is
> involved seems irrelevant.  (2) seems like it might need some special
> handling, but that could be handled in pg_dumpall by just not calling
> pg_dump at all for those databases.



I didn't find any better way than adding a new option to suppress the
create database commands for the default databases, so I renamed the
older option to a better name and changed the behavior so that normal
users can use it as well.

Updated patch attached.

Regards,
Hari Babu
Fujitsu Australia


pg_dump-and-pg_dumpall-database-handling-refactoring_v8.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cursor With_Hold Performance Workarounds/Optimization

2017-10-20 Thread Leon Winter
> > The calculations inside the loop are written in some dynamic high-level
> > language and cannot easily be translated into SQL.
> 
> You don't really have to --- PG supports functions written in non-SQL
> languages.  Not sure if your problem is big enough to justify developing
> a new PL interface for $random-4GL-language, but that might be something
> to consider.

Surely it would be a possibility to loop in SQL and call the other language for
each row, but I am not sure Postgres would be too happy if the callee waited for
user input and/or wanted to do some SQL update operations.

> But, to get back to the original point: exactly what "sizeable performance
> impact of declaring a cursor "with hold"" do you think there is?  It
> shouldn't be noticeably more expensive than just selecting the rows would
> be.  Especially not for a number of rows that wouldn't make the
> above-depicted application structure completely untenable.  And for sure
> I'm not understanding why you think that materializing the result on the
> client side instead would be better.

For small tables materialization is a non-issue, but we have large tables where
a single select statement over all the rows causes Postgres to create 8 GB temp
files and stay busy for multiple seconds, which is very noticeable.

> > Naturally I am now wondering why the postgres cursor/portal is not also
> > employing the same trick (at least as an option): Postpone
> > materialization of "with hold" cursors until it is required (like a
> > commit operation is dispatched).
> 
> We already do that, and have done it since the feature was invented,
> AFAIR.

When we declare a cursor for a select on the mentioned big table, it takes
multiple seconds and a big temp file is created which to me seems like the
materialization took place immediately.

> FWIW, the primary reason for materializing the cursor contents at commit,
> rather than just holding onto the active query as you seem to think would
> be better, is that materializing allows us to release locks on the
> underlying table(s).  If we kept the active query we'd have to keep those
> locks, as well as the query's active snapshot, thus certainly blocking
> VACUUM cleanup, and possibly blocking subsequent DDL.

Of course, providing such a long-lived cursor would be more costly in terms of
the resources you describe, but currently the instant materialization at cursor
open is the more expensive option.

> The approach of using a separate connection to read the cursor suffers
> from exactly those same problems.  Postgres isn't that happy with very
> long-lived transactions (neither is any other RDBMS I'm familiar with).
> So really I think that materializing the cursor right away and then
> doing your application calculations in whatever chunk size seems
> convenient is probably your best bet here.

In fact we are fetching cursor results in bigger chunks to avoid communication
overhead, and many cursors reach end of scan immediately, but there are cursors
on big tables, and fetching all of their records up front is very expensive.
Meanwhile I have implemented a "lazy hold" in my database abstraction layer
which pulls the records in at commit, which is an okayish trade-off. Better yet,
though with the disadvantages you outlined, would be a "with hold" cursor that
holds onto the locks and the snapshot, avoiding materialization entirely
(which can be emulated with a second database connection).




Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA

2017-10-20 Thread Sokolov Yura

Hello,

On 2017-10-19 19:46, Andres Freund wrote:

On 2017-10-19 14:36:56 +0300, Sokolov Yura wrote:

> > + init_local_spin_delay();
>
> The way you moved this around has the disadvantage that we now do this -
> a number of writes - even in the very common case where the lwlock can
> be acquired directly.

Excuse me, I don't quite understand.
Are you objecting to init_local_spin_delay being placed here?


Yes.


I could place it before perform_spin_delay under `if (!spin_inited)` if
you think it is absolutely necessary.

Placing it anywhere else will complicate the code.

Or are you objecting to setting `mask` and `add`?


That seems right.


In both cases, I think the simpler version should be accepted first. It acts
as the algorithm definition. And it already gives a measurable improvement.


Well, in scalability. I'm less sure about uncontended performance.




> > +  * We intentionally do not call finish_spin_delay here, because
> > the loop
> > +  * above usually finished by queuing into the wait list on
> > contention, and
> > +  * doesn't reach spins_per_delay thereby doesn't sleep inside of
> > +  * perform_spin_delay. Also, different LWLocks has very different
> > +  * contention pattern, and it is wrong to update spin-lock
> > statistic based
> > +  * on LWLock contention.
> > +  */
>
> Huh? This seems entirely unconvincing. Without adjusting this here we'll
> just spin the same way every iteration. Especially for the case where
> somebody else holds LW_FLAG_LOCKED that's quite bad.

LWLocks are very different. Some of them are always short-term
(BufferLock), others are locked for a long time.


That seems not particularly relevant. The same is true for
spinlocks. The relevant question isn't how long the lwlock is held, it's
how long LW_FLAG_LOCKED is held - and that should only depend on
contention (i.e. bus speed, number of times put to sleep while holding
the lock, etc), not on how long the lock is held.


I've tried to place this delay into the lock itself (it has 2 free bytes),
but that attempt performed worse.


That seems unsurprising - there's a *lot* of locks, and we'd have to
tune all of them. Additionally there's a bunch of platforms where we do
*not* have free bytes (consider the embedding in BufferTag).



Now I understand that delays should be stored in an array indexed by
tranche. But I have no time to test this idea, and I doubt it will give
radically better results (i.e. > 5%), so I think it is better to accept
the patch as it is, and then experiment with per-tranche delay.


I don't think tranches have any decent predictive power here.


Look at "Make acquiring LWLock look more like spinlock".
For the first `skip_wait_list` iterations there is no attempt to queue
ourselves into the wait list. That gives most of the improvement. (Without
it there is just no degradation at a high number of clients, but a small
decline at low client counts, because the current algorithm is also a
little "spinny", since it attempts to acquire the lock again after
queueing into the wait list.)

`skip_wait_list` depends very much on the tranche. A per-tranche
`skip_wait_list` should be calculated based on the ability to acquire the
lock without any sleep, i.e. without either queueing into the wait list or
falling into `pg_usleep` inside `perform_spin_delay`. I suppose it is
obvious that `spins_per_delay` has to be proportional to `skip_wait_list`
(it should be at least greater than `skip_wait_list`). Therefore
`spins_per_delay` should also be runtime-dependent on the lock's tranche.

Am I wrong?

Without that per-tranche auto-tuning, it is better not to touch
`spins_per_delay` inside `LWLockAttemptLockOrQueue`, I think.

With regards,
--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company




Re: [HACKERS] Cursor With_Hold Performance Workarounds/Optimization

2017-10-20 Thread Leon Winter
> When we declare a cursor for a select on the mentioned big table, it takes
> multiple seconds and a big temp file is created which to me seems like the
> materialization took place immediately.

Since you mentioned that Postgres already postpones materialization until
commit, we checked again and you were right. When we checked manually, we
had executed the "declare" statement without opening a transaction block
first, which causes instant materialization. When a transaction is open,
materialization is in fact postponed.
Unfortunately we do not use cursors but portals, so that we can (re-)use
prepared statements in multiple instances, and there is no "hold" feature
for portals, so we cannot benefit from the lazy "with hold" of cursors.




Re: [HACKERS] asynchronous execution

2017-10-20 Thread Kyotaro HORIGUCHI
Hello.

A fully asynchronous executor requires every node to be stateful and
suspendable at the point where it requests the next tuples from the
nodes underneath it. I tried a pure push-based executor but failed.

After the miserable patch upthread, I finally managed to make
executor nodes suspendable using computational jumps and got rid
of the recursive calls in the executor. But it runs about 10x slower
for the simple SeqScan case (pgbench ran with 9% degradation). That
doesn't seem recoverable by handy improvements, so I gave up on
that.

Then I returned to single-level asynchrony; in other words, the
simple case with async-aware nodes just above async-capable
nodes. The motive for using the framework in the previous patch
was that we had degradation on the sync (or normal) code paths from
polluting ExecProcNode with async stuff, and, per Tom's suggestion,
the node->ExecProcNode trick can isolate the async code path.

The attached PoC patch theoretically has no impact on the normal
code paths and only brings gains in async cases. (The additional
members in PlanState do cause a degradation that seemingly
comes from alignment, though.)

But I haven't had sufficiently stable results from performance
testing. Different builds from the same source code give apparently
different results...

Anyway I'll show the best one in the several times run here.

                           original(ms)  patched(ms)  gain(%)
A: simple table scan     :      9714.70      9656.73      0.6
B: local partitioning    :      4119.44      4131.10     -0.3
C: single remote table   :      9484.86      9141.89      3.7
D: sharding (single con) :      7114.34      6751.21      5.1
E: sharding (multi con)  :      7166.56      1827.93     74.5

A and B are degradation checks, which are expected to show no
degradation. C is the gain from postgres_fdw's command presending
alone, on a single remote table. D is the gain from sharding over a
single connection. The number of partitions/shards is 4. E is the gain
using a dedicated connection per shard.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From fc424c16e124934581a184fcadaed1e05f7672c8 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Mon, 22 May 2017 12:42:58 +0900
Subject: [PATCH 1/3] Allow wait event set to be registered to resource owner

WaitEventSet needs to be released via a resource owner in certain
cases. This change adds a resowner field to WaitEventSet and allows the
creator of a WaitEventSet to specify a resource owner.
---
 src/backend/libpq/pqcomm.c|  2 +-
 src/backend/storage/ipc/latch.c   | 18 ++-
 src/backend/storage/lmgr/condition_variable.c |  2 +-
 src/backend/utils/resowner/resowner.c | 68 +++
 src/include/storage/latch.h   |  4 +-
 src/include/utils/resowner_private.h  |  8 
 6 files changed, 97 insertions(+), 5 deletions(-)

diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 754154b..d459f32 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -220,7 +220,7 @@ pq_init(void)
 (errmsg("could not set socket to nonblocking mode: %m")));
 #endif
 
-	FeBeWaitSet = CreateWaitEventSet(TopMemoryContext, 3);
+	FeBeWaitSet = CreateWaitEventSet(TopMemoryContext, NULL, 3);
 	AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE, MyProcPort->sock,
 	  NULL, NULL);
 	AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, -1, MyLatch, NULL);
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 4eb6e83..e6fc3dd 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -51,6 +51,7 @@
 #include "storage/latch.h"
 #include "storage/pmsignal.h"
 #include "storage/shmem.h"
+#include "utils/resowner_private.h"
 
 /*
  * Select the fd readiness primitive to use. Normally the "most modern"
@@ -77,6 +78,8 @@ struct WaitEventSet
 	int			nevents;		/* number of registered events */
 	int			nevents_space;	/* maximum number of events in this set */
 
+	ResourceOwner	resowner;	/* Resource owner */
+
 	/*
 	 * Array, of nevents_space length, storing the definition of events this
 	 * set is waiting for.
@@ -359,7 +362,7 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
 	int			ret = 0;
 	int			rc;
 	WaitEvent	event;
-	WaitEventSet *set = CreateWaitEventSet(CurrentMemoryContext, 3);
+	WaitEventSet *set = CreateWaitEventSet(CurrentMemoryContext, NULL, 3);
 
 	if (wakeEvents & WL_TIMEOUT)
 		Assert(timeout >= 0);
@@ -517,12 +520,15 @@ ResetLatch(volatile Latch *latch)
  * WaitEventSetWait().
  */
 WaitEventSet *
-CreateWaitEventSet(MemoryContext context, int nevents)
+CreateWaitEventSet(MemoryContext context, ResourceOwner res, int nevents)
 {
 	WaitEventSet *set;
 	char	   *data;
 	Size		sz = 0;
 
+	if (res)
+		ResourceOwnerEnlargeWESs(res);
+
 	/*
 	 * Use MAXALIGN size/alignment to guarantee that later uses of memory are
 	 * aligned correctly. E.g. epoll_event might need 8 byte alignment on some
@@ -591,6 +597,11 @@ 

Re: [HACKERS] More stats about skipped vacuums

2017-10-20 Thread Masahiko Sawada
On Tue, Oct 10, 2017 at 7:26 PM, Kyotaro HORIGUCHI
 wrote:
> Hello.
> Once in a while I am asked about table bloat. In most cases the
> cause is long-lasting transactions, and vacuum canceling in some
> cases. Whatever the case, users don't have enough clues as to why
> they have bloated tables.
>
> At the top of the annoyances list for users would be that they
> cannot know whether autovacuum decided that a table needs vacuum
> or not. I suppose that it could be shown in pg_stat_*_tables.
>
>   n_mod_since_analyze | 2
> + vacuum_required | true
>   last_vacuum | 2017-10-10 17:21:54.380805+09
>
> If vacuum_required remains true for a certain time, it means that
> vacuuming stopped halfway or someone is killing it repeatedly.
> That status could be shown in the same view.

Because the table statistics are updated at the end of the vacuum, I think
autovacuum will process the table in the next cycle if the vacuum stopped
halfway or was killed. So you mean that vacuum_required is for users who
want to reclaim garbage without waiting for the autovacuum retry?

>   n_mod_since_analyze | 2
> + vacuum_required | true
>   last_vacuum | 2017-10-10 17:21:54.380805+09
>   last_autovacuum | 2017-10-10 17:21:54.380805+09
> + last_autovacuum_status  | Killed by lock conflict
>
> Where the "Killed by lock conflict" would be one of the followings.
>
>   - Completed (oldest xmin = 8023)
>   - May not be fully truncated (yielded at 1324 of 6447 expected)
>   - Truncation skipped
>   - Skipped by lock failure
>   - Killed by lock conflict
>
>
> If we want more formal expression, we can show the values in the
> following shape. And adding some more values could be useful.
>
>   n_mod_since_analyze  | 2
> + vacuum_required  | true
> + last_vacuum_oldest_xid   | 8023
> + last_vacuum_left_to_truncate | 5123
> + last_vacuum_truncated| 387
>   last_vacuum  | 2017-10-10 17:21:54.380805+09
>   last_autovacuum  | 2017-10-10 17:21:54.380805+09
> + last_autovacuum_status   | Killed by lock conflict
> ...
>   autovacuum_count | 128
> + incomplete_autovacuum_count  | 53
>
> # The last one might be needless..

I'm not sure the above information will help users or DBAs, but
personally I sometimes want to have the number of index scans of
the last autovacuum in the pg_stat_user_tables view. That value
indicates how efficiently vacuums performed and would be a signal
for the user to increase the autovacuum_work_mem setting.

> Where the "Killed by lock conflict" is one of the followings.
>
>- Completed
>- Truncation skipped
>- Partially truncated
>- Skipped
>- Killed by lock conflict
>
> This seems enough to find the cause of a table bloat. The same
> discussion could be applied to analyze but it might be the
> another issue.
>
> There may be a better way to indicate the vacuum soundness. Any
> opinions and suggestions are welcome.
>
> I'm going to make a patch to do the 'formal' one for the time
> being.
>

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Cursor With_Hold Performance Workarounds/Optimization

2017-10-20 Thread Leon Winter
> I don't know quite how to put this, but it's not clear to me that the
> difficulties in this situation are things PostgreSQL could resolve
> even with much larger development resources than are currently
> available.

There does not seem to be a cursor/portal/pointer semantic that can survive
unrelated changes to the database inside a single connection (and is not super
expensive like WITH HOLD). To some extent, a similar behavior can be simulated
by using a second connection.
I assume most people avoid this situation entirely by changing their
implementation (for example, to a more dynamic UPDATE statement like you
suggested).

> If you're updating what are perforce small batches of records in the
> UI, there's excellent reason to pull only those batches, mark them as
> being "in process," process them, then update the marked ones as
> "done" or whatever other states they can get to.
> 
> As to "crazy complicated calculations," this is what active databases
> are all about.  SQL is Turing complete, so you really can do it.

Of course, all the things we do *could* be done in SQL itself, which would be
the best solution, but there is a huge legacy code base in 4GL that cannot be
automatically translated into semantically equivalent SQL statements. During
such a loop, user input can also be requested, for example.
 
> Would you want something that compiles from the user inputs to SQL?
> Might that have a more general utility?

Well, like I said, a 4GL-to-SQL conversion would be desirable but would require
a lot of effort. Thus one wants to mix the languages: currently one would loop
in 4GL, holding a SQL cursor/portal, and do some work (which might include
SQL update statements). One could also imagine looping in SQL and calling the
4GL runtime for each result row to do the computation. I am not sure that is
ideal if such an operation waits on user input. Also, one would need to analyze
every loop looking for update statements and then automatically restructure
them into update statements with dependencies.




Re: [HACKERS] Cursor With_Hold Performance Workarounds/Optimization

2017-10-20 Thread Leon Winter
> > The calculations inside the loop are written in some dynamic high-level
> > language and cannot easily be translated into SQL.
> >
> 
> Can you not simply create a second connection to perform the updates?

That would be possible, but I can see some problems:

loop
{ update table1;
  select something from table1;
  update table2 based on updated table1;
  commit;
}

If we do all the "update" statements in their own transactions, the select
statements would not be able to see the changes. We also tried giving every
loop its own connection, but that did not scale too well.




Re: [HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)

2017-10-20 Thread Robert Haas
On Tue, Oct 17, 2017 at 12:06 PM, Tomas Vondra
 wrote:
> In general, it may be acceptable to rely on the elog() checks - which is
> pretty much what the FETCH+INSERT+SHARE example I shared in the first
> message of this thread does. I agree it's not particularly convenient,
> and perhaps we should replace it with checks while planning the queries.

Those checks are very much not comprehensive, though.  For example,
consider commit e9baa5e9fa147e00a2466ab2c40eb99c8a700824, which
allowed heap_insert() in parallel mode.  Here's the comment:

 /*
+ * Parallel operations are required to be strictly read-only in a parallel
+ * worker.  Parallel inserts are not safe even in the leader in the
+ * general case, because group locking means that heavyweight locks for
+ * relation extension or GIN page locks will not conflict between members
+ * of a lock group, but we don't prohibit that case here because there are
+ * useful special cases that we can safely allow, such as CREATE TABLE AS.
+ */
+if (IsParallelWorker())
 ereport(ERROR,
 (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
+ errmsg("cannot insert tuples in a parallel worker")));

Now, as things stand, there's no way for this code to be reached
except in the case where the inserts are targeting a new table, or at
least I hope there isn't, because that would be a bug.  But if we
applied your patch then it could be easily done: just start a parallel
cursor and then run an INSERT command.  It might take a little work
to gin up a test case that shows this really crashing and burning, but
I'm very confident that it's possible.  I wouldn't have gone to the
trouble of installing checks to prevent inserts from running in
parallel mode if inserts were safe in parallel mode.

>> Also, I doubt this guarantees that we won't try to call
>> parallel-unsafe functions will parallel mode is active, so things
>> will just blow up in whatever way they do, maybe crashing the server
>> or rendering the database inconsistent or whatever.
>
> Probably. What happens right now is that declaring the cursor switches
> the whole transaction into parallel mode (EnterParallelMode), so if you
> do FETCH + INSERT + FETCH that will fail with elog(ERROR).

But, again, only in the limited cases that the elog() checks catch.  A
C function can be parallel-unsafe without doing anything that hits the
elog() checks; there is no way for the system to protect itself
against arbitrary C code.  The elog() checks are intended to catch the
common or likely ways of breaking the world, but arbitrarily C code
can work around those checks -- if nothing else, duplicate one of the
functions that has an elog() in it, remove the elog(), and then call
it.  You just did something parallel-unsafe, unchecked!  That's an
artificial example, of course, which is not likely to occur in
practice, but I'm pretty confident that there are non-artificial
examples also.

I think this is a more or less classic kind of programming problem -
you're proposing to take a set of checks that were intended to ensure
safety under a limited set of circumstances and generalize them to a
much broader context than the one for which they were designed.  They
will not be adequate to those circumstances, and a considerable amount
of analysis will be needed to figure out where the gaps are.  If
somebody wants to do that analysis, I'm willing to review it and try
to spot any holes, but I don't really want to spend the time to go do
the whole analysis myself.

The main points I want to make clearly understood is the current
design relies on (1) functions being labeled correctly and (2) other
dangerous code paths being unreachable because there's nothing that
runs between EnterParallelMode and ExitParallelMode which could invoke
them, except by calling a mislabeled function.  Your patch expands the
vulnerability surface from "executor code that can be reached without
calling a mislabeled function" to "any code that can be reached by
typing an SQL command".  Just rejecting any queries that are
parallel-unsafe probably closes a good chunk of the holes, but that
still leaves a lot of code that's never been run in parallel mode
before potentially now running in parallel mode - e.g. any DDL command
you happen to type, transaction control commands, code that only runs
when the server is idle like idle_in_transaction_session_timeout, cursor
operations.  A lot of that stuff is probably fine, but it's got to be
thought through.  Error handling might be a problem, too: what happens
if a parallel worker is killed while the query is suspended?  I
suspect that doesn't work very nicely at all.

One way to get some ideas about where the problem lies would be to
write a test patch that keeps parallel mode active at all times except
when running a query that contains something parallel-unsafe.  Then
run the regression tests and see if anything blows up.  That's not

Re: [HACKERS] 64-bit queryId?

2017-10-20 Thread Robert Haas
On Thu, Oct 19, 2017 at 2:11 AM, Julien Rouhaud  wrote:
> I agree, and I'm perfectly fine with adding a comment around pgssHashKey.
>
> PFA a patch to warn about the danger.

Committed a somewhat different version of this - hope you are OK with
the result.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 64-bit queryId?

2017-10-20 Thread Julien Rouhaud
On Fri, Oct 20, 2017 at 3:44 PM, Robert Haas  wrote:
> On Thu, Oct 19, 2017 at 2:11 AM, Julien Rouhaud  wrote:
>> I agree, and I'm perfectly fine with adding a comment around pgssHashKey.
>>
>> PFA a patch to warn about the danger.
>
> Committed a somewhat different version of this - hope you are OK with
> the result.

That's much better than what I proposed. Thanks a lot!




Re: [HACKERS] What is the point of setrefs.c's is_converted_whole_row_reference?

2017-10-20 Thread Robert Haas
On Thu, Oct 19, 2017 at 2:56 PM, Tom Lane  wrote:
> AFAICS, setrefs.c's special treatment of "converted whole row references"
> is completely pointless.  Why aren't they just treated by the regular
> "non var" code paths, thus saving code space and cycles?

Here's what one of Ashutosh's commit messages for one of the patches
in the stack said:

===
set_join_references() turns off outer side's has_non_vars to handle
expressions involving nullable side. Hence we can not use has_non_vars
to handle ConvertRowtypeExprs. Instead the patch adds
has_conv_whole_rows, which is set when there exist one or more of
ConvertRowtypeExprs described above.
===

I think that's referring to this comment:

 * Now we need to fix up the targetlist and qpqual, which are logically
 * above the join.  This means they should not re-use any input expression
 * that was computed in the nullable side of an outer join.  Vars and
 * PlaceHolderVars are fine, so we can implement this restriction just by
 * clearing has_non_vars in the indexed_tlist structs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [POC] Faster processing at Gather node

2017-10-20 Thread Robert Haas
On Wed, Oct 18, 2017 at 4:30 PM, Andres Freund  wrote:
> Hm. I'm a bit confused/surprised by that. What'd be the worst that can
> happen if we don't immediately detect that the other side is detached?
> At least if we only do so in the non-blocking paths, the worst that
> seemingly could happen is that we read/write at most one superfluous
> queue entry, rather than reporting an error? Or is the concern that
> detaching might be detected *too early*, before reading the last entry
> from a queue?

Detaching too early is definitely a problem.  A worker is allowed to
start up, write all of its results into the queue, and then detach
without waiting for the leader to read those results.  (Reading
messages that weren't really written would be a problem too, of
course.)

> I'm not convinced by this. Imo the multiplying largely comes from
> superfluous actions, like the per-entry SetLatch calls here, rather than
> per batch.
>
> After all I'd benchmarked this *after* an experimental conversion of
> shm_mq to not use spinlocks - so there's actually no external barrier
> providing these guarantees that could be combined with SetLatch()'s
> barrier.

OK.

>> All that having been said, a batch variant for reading tuples in bulk
>> might make sense.  I think when I originally wrote this code I was
>> thinking that one process might be filling the queue while another
>> process was draining it, and that it might therefore be important to
>> free up space as early as possible.  But maybe that's not a very good
>> intuition.
>
> I think that's a sensible goal, but I don't think that has to mean that
> the draining has to happen after every entry. If you'd e.g. have a
> shm_mq_receivev() with 16 iovecs, that'd commonly be only part of a
> single tqueue mq (unless your tuples are > 4k).  I'll note that afaict
> shm_mq_sendv() already batches its SetLatch() calls.

But that's a little different -- shm_mq_sendv() sends one message
collected from multiple places.  There's no more reason for it to wake
up the receiver before the whole message is written than there would
be for shm_mq_send(); it's the same problem.

> I think one important thing a batched drain can avoid is that a worker
> awakes to just put one new tuple into the queue and then sleep
> again. That's kinda expensive.

Yes.  Or - part of a tuple, which is worse.

>> >b) Use shm_mq_sendv in tqueue.c by batching up insertions into the
>> >   queue whenever it's not empty when a tuple is ready.
>>
>> Batching them with what?  I hate to postpone sending tuples we've got;
>> that sounds nice in the case where we're sending tons of tuples at
>> high speed, but there might be other cases where it makes the leader
>> wait.
>
> Yea, that'd need some smarts. How about doing something like batching up
> locally as long as the queue contains less than one average sized batch?

I don't follow.

> No, I don't think that's necessarily true. If that worker's queue is full
> every time, then yes. But I think a common scenario is that the
> "current" worker only has a small portion of the queue filled. Draining
> that immediately just leads to increased cacheline bouncing.

Hmm, OK.

> I'd not previously thought about this, but won't staying sticky to the
> current worker potentially cause pins on individual tuples be held for a
> potentially long time by workers not making any progress?

Yes.

>> >It might also be a good idea to use a more directed form of wakeups,
>> >e.g. using a per-worker latch + a wait event set, to avoid iterating
>> >over all workers.
>>
>> I don't follow.
>
> Well, right now we're busily checking each worker's queue. That's fine
> with a handful of workers, but starts to become not that cheap pretty
> soon afterwards. In the surely common case where the workers are the
> bottleneck (because that's when parallelism is worthwhile), we'll check
> each worker's queue once one of them returned a single tuple. It'd not
> be a stupid idea to have a per-worker latch and wait for the latches of
> all workers at once. Then targetedly drain the queues of the workers
> that WaitEventSetWait(nevents = nworkers) signalled as ready.

Hmm, interesting.  But we can't completely ignore the process latch
either, since among other things a worker erroring out and propagating
the error to the leader relies on that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-10-20 Thread Fabien COELHO



>> The patch needs a rebase in the documentation because of the
>> xml-ization of the sgml doc.
>
> Thank you for the notification! Attached rebased patch.

Ok. The new version works for me.

--
Fabien.





[HACKERS] Useless(?) asymmetry in parse_func.c

2017-10-20 Thread Tom Lane
While running down loose ends in my domains-over-composite patch,
I wondered why parse_func.c's FuncNameAsType() excludes composite
types as possible type names.  I could set up the patch to treat
composite domains the same way, or not, but it wasn't obvious what
to do because the point of the exclusion wasn't clear.

Some excavation in our git history shows that the exclusion was
added here:

commit 990eb8552e69a492840d46b58ceb630a8a295e54
Author: Tom Lane 
Date:   Wed Dec 12 03:28:49 2001 +

Don't accept names of complex types (ie, relation types) as being
requests for implicit trivial coercions.  Prevents sillinesses like
this one:
regression=# select x.int8_tbl.q1 from int8_tbl x;
ERROR:  fmgr_info: function 270997776: cache lookup failed

I could not find any contemporaneous discussion in the list archives,
so I'm guessing that I tripped over this corner case and did not think
it was worth spending any great amount of effort on.  But in hindsight
the fix sure looks like a band-aid that's covering up a deeper problem.
Whatever that problem was, it's gone now --- if I dike out the check
then the parser correctly concludes that coercing int8_tbl to int8_tbl
is a no-op.  You need more parens nowadays, but either of these work:

select (x.int8_tbl).q1 from int8_tbl x;
select (int8_tbl(x)).q1 from int8_tbl x;

There might still be an argument for rejecting the case on the grounds
that it's confusing or likely to be user error, but I'm not sure.
It seems weirdly asymmetric that we allow typename(x) to be a coercion
request unless typename is a composite type.  And that exception is
not documented in any of the places that talk about this syntax,
e.g. section 4.2.9, section 10.3 rule 3, or the CREATE CAST man page.

So I'm inclined to remove the exception for composite types, effectively
reverting 990eb8552.  Alternatively, if we keep it, the reasoning for
having it needs to be documented.

Thoughts?

regards, tom lane




Re: [HACKERS] pgbench more operators & functions

2017-10-20 Thread Fabien COELHO



> Here is a v13. No code changes, but TAP tests added to keep pgbench
> coverage green.

Here is a v14, which is just a rebase after the documentation xml-ization.

--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index e509e6c..1f55967 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -827,14 +827,32 @@ pgbench  options  d
  
   Sets variable varname to a value calculated
   from expression.
-  The expression may contain integer constants such as 5432,
+  The expression may contain the NULL constant,
+  boolean constants TRUE and FALSE,
+  integer constants such as 5432,
   double constants such as 3.14159,
   references to variables :variablename,
-  unary operators (+, -) and binary operators
-  (+, -, *, /,
-  %) with their usual precedence and associativity,
-  function calls, and
-  parentheses.
+  operators
+  with their usual SQL precedence and associativity,
+  function calls,
+  SQL CASE generic conditional
+  expressions and parentheses.
+ 
+
+ 
+  Functions and most operators return NULL on
+  NULL input.
+ 
+
+ 
+  For conditional purposes, non-zero numerical values are
+  TRUE, zero numerical values and NULL
+  are FALSE.
+ 
+
+ 
+  When no final ELSE clause is provided to a
+  CASE, the default value is NULL.
  
 
  
@@ -843,6 +861,7 @@ pgbench  options  d
 \set ntellers 10 * :scale
 \set aid (1021 * random(1, 10 * :scale)) % \
(10 * :scale) + 1
+\set divx CASE WHEN :x <> 0 THEN :y/:x ELSE NULL END
 
 

@@ -919,6 +938,177 @@ pgbench  options  d
   
  
 
+ 
+  Built-In Operators
+
+  
+   The arithmetic, bitwise, comparison and logical operators listed in
+are built into pgbench
+   and may be used in expressions appearing in
+   \set.
+  
+
+  
+   pgbench Operators by increasing precedence
+   
+
+ 
+  Operator
+  Description
+  Example
+  Result
+ 
+
+
+ 
+  OR
+  logical or
+  5 or 0
+  TRUE
+ 
+ 
+  AND
+  logical and
+  3 and 0
+  FALSE
+ 
+ 
+  NOT
+  logical not
+  not false
+  TRUE
+ 
+ 
+  IS [NOT] (NULL|TRUE|FALSE)
+  value tests
+  1 is null
+  FALSE
+ 
+ 
+  ISNULL|NOTNULL
+  null tests
+  1 notnull
+  TRUE
+ 
+ 
+  =
+  is equal
+  5 = 4
+  FALSE
+ 
+ 
+  <>
+  is not equal
+  5 <> 4
+  TRUE
+ 
+ 
+  !=
+  is not equal
+  5 != 5
+  FALSE
+ 
+ 
+  <
+  lower than
+  5 < 4
+  FALSE
+ 
+ 
+  <=
+  lower or equal
+  5 <= 4
+  FALSE
+ 
+ 
+  >
+  greater than
+  5 > 4
+  TRUE
+ 
+ 
+  >=
+  greater or equal
+  5 >= 4
+  TRUE
+ 
+ 
+  |
+  integer bitwise OR
+  1 | 2
+  3
+ 
+ 
+  #
+  integer bitwise XOR
+  1 # 3
+  2
+ 
+ 
+  &
+  integer bitwise AND
+  1 & 3
+  1
+ 
+ 
+  ~
+  integer bitwise NOT
+  ~ 1
+  -2
+ 
+ 
+  <<
+  integer bitwise shift left
+  1 << 2
+  4
+ 
+ 
+  >>
+  integer bitwise shift right
+  8 >> 2
+  2
+ 
+ 
+  +
+  addition
+  5 + 4
+  9
+ 
+ 
+  -
+  subtraction
+  3 - 2.0
+  1.0
+ 
+ 
+  *
+  multiplication
+  5 * 4
+  20
+ 
+ 
+  /
+  division (integer division truncates the result)
+  5 / 3
+  1
+ 
+ 
+  %
+  modulo
+  3 % 2
+  1
+ 
+ 
+  -
+  opposite
+  - 2.0
+  -2.0
+ 
+
+   
+  
+ 
+
  
   Built-In Functions
 
@@ -965,6 +1155,13 @@ pgbench  options  d
5432.0
   
   
+   exp(x)
+   double
+   exponential
+   exp(1.0)
+   2.718281828459045
+  
+  
greatest(a [, ... ] )
double if any a is double, else integer
largest value among arguments
@@ -986,6 +1183,20 @@ pgbench  options  d
2.1
   
   
+   ln(x)
+   double
+   natural logarithm
+   ln(2.718281828459045)
+   1.0
+  
+  
+   mod(i, j)
+   integer
+   modulo
+   mod(54, 32)
+   22
+  
+  
pi()
double
value of the constant PI
diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index b3a2d9b..770be98 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -19,13 +19,17 @@
 PgBenchExpr *expr_parse_result;
 
 static PgBenchExprList *make_elist(PgBenchExpr *exp, PgBenchExprList *list);
+static PgBenchExpr *make_null_constant(void);
+static PgBenchExpr *make_boolean_constant(bool bval);
 static PgBenchExpr *make_integer_constant(int64 ival);
 static PgBenchExpr *make_double_constant(double 

Re: [HACKERS] pgbench - allow to store select results into variables

2017-10-20 Thread Fabien COELHO



> Here is a v12.

Here is a v13, which is just a rebase after the documentation xml-ization.

--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index e509e6c..44e8896 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -818,6 +818,51 @@ pgbench  options  d
   
 
   
+   
+
+ \cset [prefix] or
+ \gset [prefix]
+
+
+
+ 
+  These commands may be used to end SQL queries, replacing a semicolon.
+  \cset replaces an embedded semicolon (\;) within
+  a compound SQL command, and \gset replaces the final
+  semicolon (;) that ends the SQL command.
+ 
+
+ 
+  When these commands are used, the preceding SQL query is expected to
+  return one row, the columns of which are stored into variables named after
+  column names, and prefixed with prefix if provided.
+ 
+
+ 
+  The following example puts the final account balance from the first query
+  into variable abalance, and fills variables
+  one, two and
+  p_three with integers from a compound query.
+
+UPDATE pgbench_accounts
+  SET abalance = abalance + :delta
+  WHERE aid = :aid
+  RETURNING abalance \gset
+-- compound of two queries
+SELECT 1 AS one, 2 AS two \cset
+SELECT 3 AS three \gset p_
+
+ 
+
+ 
+  
+\cset and \gset commands do not work when
+empty SQL queries appear within a compound SQL command.
+  
+ 
+
+   
+

 
  \set varname expression
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 5d8a01c..37ed07b 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -375,11 +375,14 @@ static const char *QUERYMODE[] = {"simple", "extended", "prepared"};
 
 typedef struct
 {
-	char	   *line;			/* text of command line */
+	char	   *line;			/* first line for short display */
+	char	   *lines;			/* full multi-line text of command */
 	int			command_num;	/* unique index of this Command struct */
 	int			type;			/* command type (SQL_COMMAND or META_COMMAND) */
 	int			argc;			/* number of command words */
 	char	   *argv[MAX_ARGS]; /* command word list */
+	int			compound;   /* last compound command (number of \;) */
+	char	  **gset;   /* per-compound command prefix */
 	PgBenchExpr *expr;			/* parsed expression, if needed */
 	SimpleStats stats;			/* time spent in this command */
 } Command;
@@ -1254,6 +1257,104 @@ getQueryParams(CState *st, const Command *command, const char **params)
 		params[i] = getVariable(st, command->argv[i + 1]);
 }
 
+/* read all responses from backend */
+static bool
+read_response(CState *st, char **gset)
+{
+	PGresult   *res;
+	int			compound = 0;
+
+	while ((res = PQgetResult(st->con)) != NULL)
+	{
+		switch (PQresultStatus(res))
+		{
+			case PGRES_COMMAND_OK: /* non-SELECT commands */
+			case PGRES_EMPTY_QUERY: /* may be used for testing no-op overhead */
+if (gset[compound] != NULL)
+{
+	fprintf(stderr,
+			"client %d file %d command %d compound %d: "
+			"\\gset expects a row\n",
+			st->id, st->use_file, st->command, compound);
+	st->ecnt++;
+	return false;
+}
+break; /* OK */
+
+			case PGRES_TUPLES_OK:
+if (gset[compound] != NULL)
+{
+	/* store result into variables */
+	int ntuples = PQntuples(res),
+		nfields = PQnfields(res),
+		f;
+
+	if (ntuples != 1)
+	{
+		fprintf(stderr,
+"client %d file %d command %d compound %d: "
+"expecting one row, got %d\n",
+st->id, st->use_file, st->command, compound, ntuples);
+		st->ecnt++;
+		PQclear(res);
+		discard_response(st);
+		return false;
+	}
+
+	for (f = 0; f < nfields ; f++)
+	{
+		char *varname = PQfname(res, f);
+		if (*gset[compound] != '\0')
+			varname = psprintf("%s%s", gset[compound], varname);
+
+		/* store result as a string */
+		if (!putVariable(st, "gset", varname,
+		 PQgetvalue(res, 0, f)))
+		{
+			/* internal error, should it rather abort? */
+			fprintf(stderr,
+	"client %d file %d command %d compound %d: "
+	"error storing into var %s\n",
+	st->id, st->use_file, st->command, compound,
+	varname);
+			st->ecnt++;
+			PQclear(res);
+			discard_response(st);
+			return false;
+		}
+
+		if (*gset[compound] != '\0')
+			free(varname);
+	}
+}
+break;	/* OK */
+
+			default:
+/* everything else is unexpected, so probably an error */
+fprintf(stderr,
+		"client %d file %d aborted in command %d compound %d: %s",
+		st->id, st->use_file, st->command, compound,
+		PQerrorMessage(st->con));
+st->ecnt++;
+PQclear(res);
+discard_response(st);
+return false;
+		}
+
+		PQclear(res);
+		compound += 1;
+	}
+
+	if (compound == 0)
+	{
+		fprintf(stderr, "client %d command %d: no results\n", st->id, st->command);
+		st->ecnt++;
+		return false;
+	}
+
+	return 

Re: [HACKERS] A GUC to prevent leader processes from running subplans?

2017-10-20 Thread Robert Haas
On Tue, Oct 17, 2017 at 7:27 AM, Thomas Munro wrote:
> While testing parallelism work I've wanted to be able to prevent
> gather nodes from running the plan in the leader process, and I've
> heard others say the same.  One way would be to add a GUC
> "multiplex_gather", like in the attached patch.  If you set it to off,
> Gather and Gather Merge won't run the subplan unless they have to
> because no workers could be launched.  I thought about adding a new
> value for force_parallel_mode instead, but someone mentioned they
> might want to do this on a production system too and
> force_parallel_mode is not really for end users.  Better ideas?

I don't think overloading force_parallel_mode is a good idea, but
having some other GUC for this seems OK to me.  Not sure I like
multiplex_gather, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

