### Re: [HACKERS] WIP: multivariate statistics / proof of concept

```
On 12/12/14 05:53, Heikki Linnakangas wrote:

On 10/13/2014 01:00 AM, Tomas Vondra wrote:

Hi,

attached is a WIP patch implementing multivariate statistics.

Great! Really glad to see you working on this.

+ * FIXME This sample sizing is mostly OK when computing stats for
+ *   individual columns, but when computing multi-variate stats
+ *   for multivariate stats (histograms, mcv, ...) it's rather
+ *   insufficient. For small number of dimensions it works, but
+ *   for complex stats it'd be nice use sample proportional to
+ *   the table (say, 0.5% - 1%) instead of a fixed size.

I don't think a fraction of the table is appropriate. As long as the
sample is random, the accuracy of a sample doesn't depend much on the
size of the population. For example, if you sample 1,000 rows from a
table with 100,000 rows, or 1000 rows from a table with 100,000,000
rows, the accuracy is pretty much the same. That doesn't change when
you go from a single variable to multiple variables.

You do need a bigger sample with multiple variables, however. My gut
feeling is that if you sample N rows for a single variable, with two
variables you need to sample N^2 rows to get the same accuracy. But
it's not proportional to the table size. (I have no proof for that,
but I'm sure there is literature on this.)

[...]

I did stage III statistics at University many moons ago...

The accuracy of the sample only depends on the value of N, not the total
size of the population, with the obvious constraint that N <= population
size.

The standard deviation in a random sample is proportional to the square
root of N.  So using N = 100 would have a standard deviation of about
10%, so to reduce it to 5% you would need N = 400.

For multiple variables, it will also be a function of N - I don't recall
precisely how, I suspect it might M * N were M is the number of
parameters (but I'm not as certain).  I think M^N might be needed if you
want all the possible correlations between sets of variable to be
reasonably significant - but I'm mostly just guessing here.

So using a % of table size is somewhat silly, looking at the above.
However, if you want to detect frequencies that occur at the 1% level,
then you will need to sample 1% of the table or greater.  So which
approach is 'best', depends on what you are trying to determine. The
sample size is more useful when you need to decide between 2 different
hypothesises.

The sampling methodology, is far more important than the ratio of N to
population size - consider the bias imposed by using random telephone
numbers, even before the event of mobile phones!

Cheers,
Gavin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] Removing service-related code in pg_ctl for Cygwin

```
On 18/01/2016 18:32, Andrew Dunstan wrote:

On 01/14/2016 12:38 AM, Michael Paquier wrote:

Hi all,

that's mentioned here as well:
http://www.postgresql.org/message-id/CAB7nPqQXghm_SdB5iniupz1atzMxk=95gv9a8ocdo83sxcn...@mail.gmail.com

Marco, as I think you do some packaging for Postgres in Cygwin, and
others, does that sound acceptable?

I think we can be a bit more adventurous and remove all the
Cygwin-specific code. See attached patch, which builds fine on buildfarm
cockatiel.

cheers

andrew

builds fine also here on 64bit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] checkpointer continuous flushing

```

I measured it in a different number of cases, both on SSDs and spinning
rust. I just reproduced it with:

postgres-ckpt14 \
-D /srv/temp/pgdev-dev-800/ \
-c maintenance_work_mem=2GB \
-c fsync=on \
-c synchronous_commit=off \
-c shared_buffers=2GB \
-c wal_level=hot_standby \
-c max_wal_senders=10 \
-c max_wal_size=100GB \
-c checkpoint_timeout=30s

Using a fresh cluster each time (copied from a "template" to save time)
and using
pgbench -M prepared -c 16 -j 16 -T 300 -P 1

I must say that I have not succeeded in reproducing any significant
regression up to now on an HDD. I'm running some more tests again because
I had left out some options above that I thought were non essential.

I have deep problems with the 30-second checkpoint tests: basically the
checkpoints take much more than 30 seconds to complete, the system is not
stable, the 300 seconds runs last more than 900 seconds because the
clients are stuck a long time. The overall behavior is appaling as most of
the time is spent in IO panic at 0 tps.

Also, the performance level is around 160 tps on HDDs, which make sense to
me for a 7200 rpm HDD capable of about x00 random writes per second. It
seems to me that you reported much better performance on HDD, but I cannot
really see how this would be possible if data are indeed writen to disk.
Any idea?

Also, what is the very precise postgres version & patch used in your
tests on HDDs?

both before/after patch are higher) if I disable full_page_writes,
thereby eliminating a lot of other IO.

Maybe this is an explanation

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] jsonb - jsonb operators

```On Mon, Jan 18, 2016 at 10:50 AM, Tom Lane  wrote:
> Dmitry Dolgov <9erthali...@gmail.com> writes:
>>> if there's any future intention to add a delete operator that removes
>> element/pair matches?
>
>> I think the operator (jsonb - jsonb) is logical because we have a shallow
>> concatenation function (something like a "union" operation), but we have
>> nothing like "set difference" and "intersection" functions. Actually, I
>> thought to implement these functions (at least for jsonbx). But of course
>> this function should be quite simple and consider only full key/value
>> matching as a target.
>
> I am wary of this proposal because it seems to be taking little
> account of the fact that there *already is* a jsonb minus operator,
> two of them in fact.  For example
>
> regression=# select '["a","b","c"]'::jsonb - 'b';
>   ?column?
>
>  ["a", "c"]
> (1 row)
>
> regression=# select '{"a":1, "b":2}'::jsonb - 'b';
>  ?column?
> --
>  {"a": 1}
> (1 row)
>
> The proposed full-match semantics don't seem to me to be consistent with
> the way that the existing operator works.
>
> Another rather nasty problem is that the latter case works at all,
> ie the parser will decide the unknown literal is "text" so that it can
> apply "jsonb - text", there being no other plausible choice.  If there
> were a "jsonb - jsonb" operator, the parser would prefer that one, due
> to its heuristic about assuming that an unknown literal is of the same
> type as the other operator input.  So adding such an operator will almost
> certainly break queries that work in 9.5.  Maybe it's worth adding one
> anyway, but I don't think the case for its usefulness has been proven
> to the point where we should create compatibility issues to get it.

That's a deal breaker for introducing proposed functionality against
an operator.  Maybe a function is a better choice.

I'm also squarely in the camp of "don't break userspace", meaning that
usefulness not enough of reason for changing in-place behaviors.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

```On Thu, Dec 24, 2015 at 8:26 AM, Michael Paquier
wrote:

> On Sun, Dec 13, 2015 at 11:05 PM, Amit Kapila
> wrote:
> > On Fri, Dec 11, 2015 at 6:34 PM, Andres Freund
> wrote:
>

I was looking into this patch, overall patch looks good to me, I want to
know what is the current state of this patch, is there some pending task in
this patch ?

Patch was not applying on head so i have re based it and re based version is
attached in the mail.

I have done some performance testing also..

Summary:
---
1. In my test for readonly workload i have observed some improvement for
scale factor 1000.
2. I have also observed some regression with scale factor 300 (I can't say
it's actual regression  or just run to run variance), I thought that may be
problem with lower scale factor so tested with scale factor 100 but with
s.f. 100 looks fine.

Machine Detail:
cpu   : POWER8
cores: 24 (192 with HT)

Non Default Parameter:

Shared Buffer= 30GB
max_wal_size= 10GB
max_connections=500

Test1:
pgbench -i -s 1000 postgres
pgbench -c\$ -j\$ -Mprepared -S postgres

Client Base  Pached

1   19753   19493
32   344059 336773
64   495708 540425
128 564358 685212
256 466562 639059

Test2:
pgbench -i -s 300 postgres
pgbench -c\$ -j\$ -Mprepared -S postgres

Client Base  Pached

1  2055519404
32  375919  332670
64  509067  440680
128431346  415121
256380926  379176

Test3:
pgbench -i -s 100 postgres
pgbench -c\$ -j\$ -Mprepared -S postgres

Client Base  Pached

1  2055519404
32  375919  332670
64  509067  440680
128431346  415121
256380926  379176

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index b423aa7..04862d7 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -121,12 +121,9 @@ InitBufferPool(void)
BufferDesc *buf = GetBufferDescriptor(i);

CLEAR_BUFFERTAG(buf->tag);
-			buf->flags = 0;
-			buf->usage_count = 0;
-			buf->refcount = 0;
-			buf->wait_backend_pid = 0;

-			SpinLockInit(>buf_hdr_lock);
+			pg_atomic_init_u32(>state, 0);
+			buf->wait_backend_pid = 0;

buf->buf_id = i;

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 7141eb8..f69faeb 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -51,7 +51,6 @@
#include "utils/resowner_private.h"
#include "utils/timestamp.h"

-
/* Note: these two macros only work on shared buffers, not local ones! */
#define BufHdrGetBlock(bufHdr)	((Block) (BufferBlocks + ((Size) (bufHdr)->buf_id) * BLCKSZ))
#define BufferGetLSN(bufHdr)	(PageGetLSN(BufHdrGetBlock(bufHdr)))
@@ -126,7 +125,7 @@ static BufferDesc *PinCountWaitBuf = NULL;
* entry using ReservePrivateRefCountEntry() and then later, if necessary,
* fill it with NewPrivateRefCountEntry(). That split lets us avoid doing
* memory allocations in NewPrivateRefCountEntry() which can be important
- * because in some scenarios it's called with a spinlock held...
+ * because in some scenarios it's called with a header lock held...
*/
static struct PrivateRefCountEntry PrivateRefCountArray[REFCOUNT_ARRAY_ENTRIES];
static HTAB *PrivateRefCountHash = NULL;
@@ -775,9 +774,8 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
*/
if (isLocalBuf)
{
-			/* Only need to adjust flags */
-			Assert(bufHdr->flags & BM_VALID);
-			bufHdr->flags &= ~BM_VALID;
+			pg_atomic_fetch_and_u32(>state, ~BM_VALID);
}
else
{
@@ -789,8 +787,8 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
do
{
LockBufHdr(bufHdr);
-Assert(bufHdr->flags & BM_VALID);
-bufHdr->flags &= ~BM_VALID;
+pg_atomic_fetch_and_u32(>state, ~BM_VALID);
UnlockBufHdr(bufHdr);
} while (!StartBufferIO(bufHdr, true));
}
@@ -808,7 +806,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
* it's not been recycled) but come right back here to try smgrextend
* again.
*/
-	Assert(!(bufHdr->flags & BM_VALID));		/* spinlock not needed */

### Re: [HACKERS] checkpointer continuous flushing

```On 2016-01-19 10:27:31 +0100, Fabien COELHO wrote:
> Also, the performance level is around 160 tps on HDDs, which make sense to
> me for a 7200 rpm HDD capable of about x00 random writes per second. It
> seems to me that you reported much better performance on HDD, but I cannot
> really see how this would be possible if data are indeed writen to disk. Any
> idea?

synchronous_commit = off does make a significant difference.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### [HACKERS] Window2012R2: initdb error: "The current directory is invalid."

```Hi all,

I have failed in execute initdb in "c:\Windows\Temp\dir" folder as
reproduce logs below. The OS's messages "The current directory is invalid."
was returned.

I think it is a specification of Window2012R2 (because i could not reproduce
In Windows 7) but not PostgreSQL's problem.
But is it need to be do something in PostgreSQL sir?

I found that the ERROR messages produced when bootstrap_template1 (initdb.c)
execute PG_CMD_OPEN (_popen function).
_popen not failed but OS produced "The current directory is invalid." message,
postgres did not start, and _pclose is returned with 1 make initdb failed.

May be, Window2012R2 does not permit to execute postgres command with
none privilege permission in this case, I think so.

--- Reproduce Environment ---
OS: Window2012R2
PostgreSQL: 9.5
Command: initdb
Execute folder: c:\Windows\Temp\dir
---

--- Reproduce log ---
c:\Windows\Temp>mkdir dir
c:\Windows\Temp>cd dir
c:\Windows\Temp\dir>
c:\Windows\Temp\dir>cmd /c ""C:\Program Files\PostgreSQL\9.5\bin\initdb"
--locale=C -D "c:\testdb" -d -n"
Running in debug mode.
Running in noclean mode.  Mistakes will not be cleaned up.
Running in debug mode.
Running in noclean mode.  Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user
This user must also own the server process.

VERSION=9.5.0
PGDATA=c:/testdb
share_path=C:/Program Files/PostgreSQL/9.5/share
PGPATH=C:/Program Files/PostgreSQL/9.5/bin
POSTGRES_BKI=C:/Program Files/PostgreSQL/9.5/share/postgres.bki
POSTGRES_DESCR=C:/Program Files/PostgreSQL/9.5/share/postgres.description
POSTGRES_SHDESCR=C:/Program Files/PostgreSQL/9.5/share/postgres.shdescription
POSTGRESQL_CONF_SAMPLE=C:/Program
Files/PostgreSQL/9.5/share/postgresql.conf.sample
PG_HBA_SAMPLE=C:/Program Files/PostgreSQL/9.5/share/pg_hba.conf.sample
PG_IDENT_SAMPLE=C:/Program Files/PostgreSQL/9.5/share/pg_ident.conf.sample
The database cluster will be initialized with locale "C".
The default database encoding has accordingly been set to "SQL_ASCII".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory c:/testdb ... ok
creating subdirectories ... ok
selecting default max_connections ... 10
selecting default shared_buffers ... 400kB
selecting dynamic shared memory implementation ... windows
creating configuration files ... ok
creating template1 database in c:/testdb/base/1 ... The current directory is
invalid.
child process exited with exit code 1
initdb: data directory "c:/testdb" not removed at user's request

c:\Windows\Temp\dir>
---

Best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### [HACKERS] Column merging for inherited tables aren't schema-qualified

```Hi,

I've noticed that if I alter the parent of a inheritance tree, there
can be ambiguity of which tables the column definitions were merged
with.

For example:

# CREATE SCHEMA remote;
CREATE SCHEMA

# IMPORT public FROM SERVER remote INTO remote;
IMPORT FOREIGN SCHEMA

# CREATE TABLE public.customers (LIKE remote.customers);
CREATE TABLE

# CREATE TABLE master_customers (LIKE remote.customers);
CREATE TABLE

# ALTER TABLE remote.customers INHERIT master_customers;
ALTER TABLE

# ALTER TABLE customers INHERIT master_customers;
ALTER TABLE

# ALTER TABLE customers SET WITH OIDS;
ALTER TABLE

# ALTER TABLE remote.customers SET WITH OIDS;
ALTER TABLE

# ALTER TABLE master_customers SET WITH OIDS;
NOTICE:  merging definition of column "oid" for child "customers"
NOTICE:  merging definition of column "oid" for child "customers"
ALTER TABLE

If we only got one of those merge notices, we wouldn't know which
table the notice referred to, although not particularly important in
this case.

I wonder if this should be changed so it would instead read:

# ALTER TABLE master_customers SET WITH OIDS;
NOTICE:  merging definition of column "oid" for child "remote.customers"
NOTICE:  merging definition of column "oid" for child "public.customers"

However, this is only one example of what is probably a fairly common
case of table ambiguity in log messages.

Thom

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] COPY (... tab completion

```
On 01/19/2016 07:55 AM, Michael Paquier wrote:

+/* If we have COPY BINARY, compelete with list of tables */
s/compelete/complete

Fixed.

+else if (TailMatches2("COPY|\\copy", "("))
+COMPLETE_WITH_LIST7("SELECT", "TABLE", "VALUES", "INSERT",
"UPDATE", "DELETE", "WITH");
This one should be Matches, no?

Yep, fixed.

Andreas

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5a11c61..72e0255 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1937,11 +1937,18 @@ psql_completion(const char *text, int start, int end)
/* COPY */

/*
-	 * If we have COPY [BINARY] (which you'd have to type yourself), offer
-	 * list of tables (Also cover the analogous backslash command)
+	 * If we have COPY, offer list of tables or "(" (Also cover the analogous
+	 * backslash command).
*/
-	else if (Matches1("COPY|\\copy") || Matches2("COPY", "BINARY"))
+	else if (Matches1("COPY|\\copy"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables,
+   " UNION ALL SELECT '('");
+	/* If we have COPY BINARY, complete with list of tables */
+	else if (Matches2("COPY", "BINARY"))
COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);
+	/* If we have COPY (, complete it with legal commands */
+	else if (Matches2("COPY|\\copy", "("))
+		COMPLETE_WITH_LIST7("SELECT", "TABLE", "VALUES", "INSERT", "UPDATE", "DELETE", "WITH");
/* If we have COPY [BINARY] , complete it with "TO" or "FROM" */
else if (Matches2("COPY|\\copy", MatchAny) ||
Matches3("COPY", "BINARY", MatchAny))

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] Proposal: Trigonometric functions in degrees

```On 19 January 2016 at 06:32, Michael Paquier  wrote:
> The first patch looks fine to me, a nitpick:
> +   /* Be sure to throw an error if the input is infinite --- see dcos */
> s/dcos/docs
>

No, I meant dcos the function there. I would normally write that as
dcos() to indicate that it is a function, but I thought the convention
here was to omit the parentheses after function names. Looking again,
I see several examples of function names with parentheses in comments,
so they could be added here, or the comment could be written a
different way. I'm happy to leave that to the committer's discretion.

> For patch 2, another nitpick :)
> +   return ( sin(x * (M_PI / 180.0)) / sin(30.0 * (M_PI / 180.0)) ) / 2.0;
> parenthesis format looks incorrect, too many spaces at the border.
>
> Except those things the second patch looks good to me as well. Let's
> have a committer look at it.

OK. Thanks for looking at this.

Regards,
Dean

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

```On Tue, Jan 19, 2016 at 12:41 PM, Michael Paquier
wrote:
>
> On Mon, Jan 18, 2016 at 10:19 PM, Amit Kapila
wrote:
> > On Mon, Jan 18, 2016 at 10:54 AM, Michael Paquier
> >  wrote:
> >> Yes, the thing is that, as mentioned at the beginning of the thread, a
> >> low value of archive_timeout causes a segment to be forcibly switched
> >> at the end of the timeout defined by this parameter. In combination
> >> with the standby snapshot taken in bgwriter since 9.4, this is causing
> >> segments to be always switched even if a system is completely idle.
> >> Before that, in 9.3 and older versions, we would have a segment
> >> forcibly switched on an idle system only once per checkpoint.
> >>
> >
> > Okay, so this will fix the case where the system is idle, but what I
> > am slightly worried is that it should not miss to log the standby
snapshot
> > due to this check when there is actually some activity in the system.
> > Why is not possible to have a case such that the segment is forcibly
> > switched due to reason other than checkpoint (user has requested the
> > same) and the current insert LSN is at beginning of a new segment
> > due to write activity? If that is possible, which to me theoretically
seems
> > possible, then I think bgwriter will miss to log the standby snapshot.
>
> Yes, with segments forcibly switched by users this check would make
> the bgwriter not log in a snapshot if the last action done by server
> was XLOG_SWITCH. Based on the patch I sent, the only alternative would
> be to not forcedSegSwitchLSN in those code paths. Perhaps that's fine
> enough for back-branches..
>

Yeah, that can work, but I think the code won't look too clean.  I think
lets try out something on lines of what is suggested by Andres and
see how it turns out.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

```

### Re: [HACKERS] plpgsql - DECLARE - cannot to use %TYPE or %ROWTYPE for composite types

```2016-01-18 22:48 GMT+01:00 Robert Haas :

> On Mon, Jan 18, 2016 at 4:35 PM, Pavel Stehule
> wrote:
> >> I know that Oracle uses syntax of this general type, but I've always
> >> found it ugly.  It's also pretty non-extensible.  You could want
> >> similar things for range types and any other container types we might
> >> get in the future, but clearly adding new reserved words for each one
> >> is no good.
> >
> > It doesn't use reserved worlds.
>
> OK - keywords, then.
>
> >> One idea that occurs to me is: If you can DECLARE BAR FOO%TYPE, but
> >> then you want to make BAR an array of that type rather than a scalar,
> >> why not write that as DECLARE BAR FOO%TYPE[]?  That seems quite
> >> natural to me.
> >
> > what you propose for syntax for taking a element of array?
>
> No idea.
>

the syntax for "array from" is natural, but for any other is hard. So it is
reason, why I used text form. Using Oracle's pattern source%operation
allows to use nonreserved keywords. Probably any text can be there. The
keywords isn't necessary (not tested).

> >> I think the part of this patch that makes %TYPE work for more kinds of
> >> types is probably a good idea, although I haven't carefully studied
> >> exactly what it does.
> >
> >
> > I invite any ideas, but currently used notation is only in direction
> > type->array. The working with symbols looks more difficult, than using
> words
> > (in design area).
> >
> > More - the textual form is more near to our system of polymorphics types:
> > anyelement, anyarray, ... We have not anyelement[]
>
> True, but this is hardly a straightforward extension of what we have
> today either.
>

It is, but sometime the polymorphic types can help.

The proposed feature/syntax has sense primary for polymorphic types. It
should to follow our polymorphic types. The primary pair is
"anyarray","anyelement"  -> "arraytype","elemementtype".

If you don't use polymorphic parameters in plpgsql, then proposed feature
can look like useless.
Regards

Pavel

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

```

### Re: Odd behavior in foreign table modification (Was: Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW)

```On 12 January 2016 at 11:49, Etsuro Fujita  wrote:
> On 2016/01/12 20:36, Thom Brown wrote:
>>
>> On 8 January 2016 at 05:08, Etsuro Fujita
>> wrote:
>
>
On 2016/01/06 20:37, Thom Brown wrote:
>
> I've run into an issue:
>
> *# UPDATE master_customers SET id = 22 WHERE id = 16 RETURNING
> tableoid::regclass;
> ERROR:
> CONTEXT:  Remote SQL command: UPDATE public.customers SET id = 22
> WHERE ((id = 16)) RETURNING NULL
>
>
>>> While working on this, I noticed that the existing postgres_fdw system
>>> shows
>>> similar behavior, so I changed the subject.
>>>
>>> IIUC, the reason for that is when the local query specifies "RETURNING
>>> tableoid::regclass", the FDW has fmstate->has_returning=false while the
>>> remote query executed at ModifyTable has "RETURNING NULL", as shown in
>>> the
>>> above example; that would cause an abnormal exit in executing the remote
>>> query in postgresExecForeignUpdate, since that the FDW would get
>>> PGRES_TUPLES_OK as a result of the query while the FDW would think that
>>> the
>>> right result to get should be PGRES_COMMAND_OK, from the flag
>>> fmstate->has_returning=false.
>
>
>>> Attached is a patch to fix that.
>
>
>> I can't apply this patch in tandem with FDW DML pushdown patch (either
>> v2 or v3).
>
>
> That patch is for fixing the similar issue in the existing postgres_fdw
> system.  So, please apply that patch without the DML pushdown patch.  If
> that patch is reasonable as a fix for the issue, I'll update the DML
> pushdown patch (v3) on top of that patch.

The patch seems to work for me:

Before:

*# UPDATE master_customers SET id = 22 WHERE id = 1 RETURNING
tableoid::regclass;
ERROR:
CONTEXT:  Remote SQL command: UPDATE public.customers SET id = \$2
WHERE ctid = \$1 RETURNING NULL

After:

*# UPDATE master_customers SET id = 22 WHERE id = 1 RETURNING
tableoid::regclass;
tableoid
--
remote.customers
(1 row)

UPDATE 1

Thom

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] COPY (... tab completion

```On Tue, Jan 19, 2016 at 8:00 PM, Andreas Karlsson  wrote:
> On 01/19/2016 07:55 AM, Michael Paquier wrote:
>>
>> +/* If we have COPY BINARY, compelete with list of tables */
>> s/compelete/complete
>
> Fixed.
>
>> +else if (TailMatches2("COPY|\\copy", "("))
>> +COMPLETE_WITH_LIST7("SELECT", "TABLE", "VALUES", "INSERT",
>> "UPDATE", "DELETE", "WITH");
>> This one should be Matches, no?
>
> Yep, fixed.

This patch makes me wonder: are we going to nuke the grammar "COPY [
BINARY ] table_name" at some point? This was used up to 7.3.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

```On Tue, Jan 19, 2016 at 7:31 PM, Dilip Kumar wrote:
> Test2:
> pgbench -i -s 300 postgres
> pgbench -c\$ -j\$ -Mprepared -S postgres
>
> Client Base  Pached
>
> 1  2055519404
> 32  375919  332670
> 64  509067  440680
> 128431346  415121
> 256380926  379176
>
> Test3:
> pgbench -i -s 100 postgres
> pgbench -c\$ -j\$ -Mprepared -S postgres
>
> Client Base  Pached
>
> 1  2055519404
> 32  375919  332670
> 64  509067  440680
> 128431346  415121
> 256380926  379176

It seems like you did a copy-paste of the results with s=100 and
s=300. Both are showing the exact same numbers.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### [HACKERS] Infer INOUT Parameters from Frontend/Backend Protocol

```
Hi,
I'm currently working on improving Npgsql's
NpgsqlCommandBuilder.DeriveParameters method which is intended to
automatically populate parameter information for a NpgsqlCommand.

As Shay Rojansky suggested to use a Parse/Describe ->
ParameterDescription/RowDescription  over the backend protocol instead
of querying pg_proc
(https://github.com/npgsql/npgsql/pull/912#issuecomment-167557036) in
order to derive the properties of Parameters (data type, name,
direction) I'm currently investigating this approach.
The benefit of this would be that we could avoid duplicating quite a bit
of backend functionality in order to find the correct overload of a
specified function.
Also it seems to be the best way to derive parameters from prepared
SQL-statements that are not function calls.

While having a closer look at the details of the
ParameterDescription/RowDescription that the backend returns after a
Parse/Describe message I come to the conclusion that there is no valid
way to always  find out whether a parameter is IN or INOUT from these
Messages.

Example:

Given the following function
CREATE OR REPLACE FUNCTION my_func(IN param1 integer, OUT param2
integer, INOUT param3 integer) RETURNS record AS

'BEGIN
param3 = param1 + param2 + param3;
END;' LANGUAGE 'plpgsql';

After sending a Parse message  for 'SELECT* FROM my_func(\$1,\$2)'
followed by aDescribe message I'll get back a ParameterDescription
message containing the OIDs of the two inwards bound parameters (and a
void OID for the OUT parameter) followed by a RowDescription message
containing the names and OIDs of the two OUT parameters.

Without additional knowledge of the exact function definition (parsing
it or hardcoding information about it) I can only figure out that there
are three parameters in total with two of them being inwards bound and
two of them being outwards bound. I can also tell that the second
parameter is a real OUT Parameter (from void OID in the
ParameterDescription message).
But what I can't tell by any means is whether the first parameter is the
INOUT one or the last Parameter is the INOUT one i.e. wheter it's
(IN,OUT,INOUT) or (INOUT,OUT,IN)

Digging around in the history of PostgreSQLs OUT and INOUT parameter
support
(http://www.postgresql.org/message-id/flat/421eca30.8040...@samurai.com#421eca30.8040...@samurai.com)
and poking around in the respective commits (git log
--after="2005-01-19" --before="2005-11-08" --author="Tom Lane"
--grep="OUT") helped me to understand why things are like they are (i.
e. why OUT Parameters are implemented more like rows than like
parameters and why the ParameterDescription message gives so little
information about them) but still I'd whish that the
ParameterDescription message would contain the whole Information about
all the parameters (name, type, direction).

Anyways, as I don't expect you to change the Frontend/Backend Protocol
due to my whishes I just want to confirm that things really are the way
I understand them and that I'm  not overlooking something obvious.

If I'm right ParameterDescription path is probably a blind end for
parameter derivation in Npgsql and I'll probably have to stick with the
"query pg_proc"-way.

Regards,
Brar

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] PATCH: Extending the HyperLogLog API a bit

```Robert Haas wrote:
> On Mon, Jan 11, 2016 at 2:22 PM, Alvaro Herrera
>  wrote:
> > Tomas Vondra wrote:
> >
> > Looks pretty reasonable to me.  I'm not sure we want to push this ahead
> > of the bloom filter stuff, but it looks ready to commit otherwise.
>
> I'd say go ahead and push it.

Done.

--
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] PATCH: Extending the HyperLogLog API a bit

```On Tue, Jan 19, 2016 at 2:03 PM, Tomas Vondra
wrote:
>
> FWIW I've been considering adding APPROX_COUNT_DISTINCT() aggregate,
> similarly to what other databases (e.g. Vertica) have built-in. Now, that
> would not require the merge too, but we're currently baking support for
> 'combine' functions, and that's exactly what merge does.
>
> So why not just fix the bug?

You can go from the sparse representation to the dense representation,
so in principle you can merge two of our HLL states, if you are then
satisfied with having a new representation for the merged state. I
don't have time right now to do a full analysis of whether or not it's
possible to just fix the bug without doing all that, but I think it
might not be.

I think we benefit from the simplicity of the sparse representation.
So, in the absence of a clear justification for retaining
mergeHyperLogLog(), ripping it out seems best.

I also think that an expanded set of functionality would be required
for your APPROX_COUNT_DISTINCT() patch anyway, including support for
multiple representations (perhaps this isn't documented in your
APPROX_COUNT_DISTINCT(), but complete implementations seem to switch
from sparse to full at a certain point). So, ripping out
mergeHyperLogLog() doesn't really make that effort any more difficult.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### [HACKERS] Re: pglogical_output - a general purpose logical decoding output plugin

```The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

First part of code review (for about 1/3rd of code):
pglogical_output.h:

+ /* Protocol capabilities */
+ #define PGLOGICAL_PROTO_VERSION_NUM 1
+ #define PGLOGICAL_PROTO_MIN_VERSION_NUM 1
Is this protocol version number and minimal recognized version number,
or major and minor version number? I assume that PGLOGICAL_PROTO_VERSION_NUM
is current protocol version (as in config max_proto_version and
min_proto_version). But why we have MIN_VERSION and not MAX_VERSION?

From code in pglogical_output.c (lines 215-225 it looks like
PGLOGICAL_PROTO_VERSION_NUM is maximum, and PGLOGICAL_PROTO_MIN_VERSION_NUM
is treated as minimal protocol version number.

I can see that those constants are exported in pglogical_infofuncs.c and
pglogical.sql, but I do not understand omission of MAX.

+ typedef struct HookFuncName
+ typedef struct PGLogicalTupleData
I haven't found those used anything, and they are not mentioned in
documentation. Are those structures needed?

+ pglogical_config.c:
+   switch(get_param_key(elem->defname))
+   {
+   val = get_param_value(elem, false,
OUTPUT_PARAM_TYPE_UINT32);
Why do we need this line here? All cases contain some variant of
val = get_param_value(elem, false, TYPE);
as first line after "case PARAM_*:" so this line should be removed.

+   val = get_param(options, "startup_params_format", false, false,
+   OUTPUT_PARAM_TYPE_UINT32, );
+
+   params_format = DatumGetUInt32(val);
Shouldn't we check "found" here? We work with val (which is Datum(0)) - won't
it throw SIGFAULT, or similar?
Additionally - I haven't found any case where code would use "found"
passed from get_param() - so as it's not used it might be removed.

pglogical_output.c:

+   elog(DEBUG1, "Binary mode rejected: Server and client endian
sizeof(Datum) mismatch");
+   return false;
+   }
+
+   if (data->client_binary_sizeofint != 0
+   && data->client_binary_sizeofint != sizeof(int))
+   {
+   elog(DEBUG1, "Binary mode rejected: Server and client endian
sizeof(int) mismatch");
+   return false;
+   }
+
+   if (data->client_binary_sizeoflong != 0
+   && data->client_binary_sizeoflong != sizeof(long))
+   {
+   elog(DEBUG1, "Binary mode rejected: Server and client endian
sizeof(long) mismatch");
Isn't "endian" here case of copy-paste from first error?
Error messages should rather look like:
elog(DEBUG1, "Binary mode rejected: Server and client
sizeof(Datum) mismatch");

+ static void pg_decode_shutdown(LogicalDecodingContext * ctx)
In pg_decode_startup we create main memory context, and create hooks memory
context. In pg_decode_shutdown we delete hooks memory context but not main
context. Is this OK, or should we also add:
MemoryContextDelete(data->context);

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### [HACKERS] Rethinking TRANSFORM FOR TYPE ...

```I'm using the TRANSFORM feature to implement a new data type for python
(ndarrays from numpy). I'm constantly getting tripped up by forgetting
to add TRANSFORM FOR TYPE. Worse, the requirement for explicitly stating
transform means I can't use a polymorphic type.

In the case of adding a new transform for an existing type, current
behavior makes sense; you'll break all existing functions using the type
if you just swap the representation out under them. Further, if you are
pulling in some new extension that uses the same language and type, that
function will be expecting the old representation, not the new one.

For the case of creating a new data type, I think explicitly requiring
the TRANSFORM clause makes no sense. It's a bunch of extra work for

A simple way to fix this would be to allow simply marking a transform as
being DEFAULT. If a transform is marked as DEFAULT then it would
automatically get used.

Perhaps a better way would be allowing for multiple transforms for each
language and type. That way users aren't stuck with a single
preconceived notion of how to represent a type. The immediate use I see
for that is it would allow a transform to be created in something other
than C, as long as the language you want to use can handle a raw C
string. That desire might sound silly to a lot of -hackers, but given
the amount of pain I went through figuring out how to properly marshal
an ndarray back and forth in C, I sure as hell wish I could have done it
in python! Since plpythonu understands bytea, I don't see any reason I
couldn't have.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

```On Wed, Aug 19, 2015 at 8:40 AM, Ashutosh Bapat
wrote:
> I started reviewing the other patches.
>
> In patch foreign_join_v16.patch, the user mapping structure being passed to
> GetConnection() is the one obtained from GetUserMappingById().
> GetUserMappingById() constructs the user mapping structure from the user
> mapping catalog. For public user mappings, catalog entries have InvalidOid
> as userid. Thus, with this patch there is a chance that userid in
> UserMapping being passed to GetConnection(), contains InvalidOid as userid.
> This is not the case today. The UserMapping structure constructed using
> GetUserMapping(Oid userid, Oid serverid) (which ultimately gets passed to
> GetConnection()), has the passed in userid and not the one in the catalog.
> Is this change intentional?

This point seems not to have been addressed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] Combining Aggregates

```On 20 January 2016 at 05:58, Robert Haas  wrote:

> > On Mon, Dec 21, 2015 at 4:02 AM, David Rowley
> >  wrote:
> >> Now, there has been talk of this previously, on various threads, but I
> don't
> >> believe any final decisions were made on how exactly it should be done.
> At
> >> the moment I plan to make changes as follows:
>
> Oh, one more point: is there any reason why all of this needs to be a
> single (giant) patch?  I feel like the handling of internal states
> could be a separate patch from the basic patch to allow partial
> aggregation and aggregate-combining, and that the latter could be
> committed first if we had a reasonably final version of it.  That
> seems like it would be easier from a review standpoint, and might
> allow more development to proceed in parallel, too.

I didn't ever really imagine that I'd include any actual new combinefns or
serialfn/deserialfn in this patch. One set has of course now ended up in
there, I can move these off to the test patch for now.

Did you imagine that the first patch in the series would only add the
aggcombinefn column to pg_aggregate and leave the aggserialfn stuff until
later? I thought it seemed better to get the infrastructure committed in
one hit, then add a bunch of new combinefn, serialfn, deserialfns for
existing aggregates in follow-on commits.

--
PostgreSQL Development, 24x7 Support, Training & Services

```

### Re: [HACKERS] PATCH: Extending the HyperLogLog API a bit

```

On 01/19/2016 10:54 PM, Peter Geoghegan wrote:

On Tue, Jan 19, 2016 at 9:37 AM, Alvaro Herrera
wrote:

Our transcript seems to predate that bugfix commit, so I assume we need
to apply this to our copy too.  Sadly, Hideaki-san commit message isn't
very descriptive.

Fortunately, the function mergeHyperLogLog() in our hyperloglog.c
currently has no callers.

I don't really know how HyperLogLog works, so maybe we can't or
shouldn't apply the patch because of how the hash stuff is used.

I think that Hideaki's confusion comes from whether or not this HLL
state is a sparse or dense/full representation. The distinction is
explained in the README for postgresql-hll:

https://github.com/aggregateknowledge/postgresql-hll

postgresql-hll has no support for merging HLLs that are sparse:

https://github.com/aggregateknowledge/postgresql-hll/blob/master/hll.c#L1888

Can't we just tear mergeHyperLogLog() out?

FWIW I've been considering adding APPROX_COUNT_DISTINCT() aggregate,
similarly to what other databases (e.g. Vertica) have built-in. Now,
that would not require the merge too, but we're currently baking support
for 'combine' functions, and that's exactly what merge does.

So why not just fix the bug?

--
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] PATCH: Extending the HyperLogLog API a bit

```On Tue, Jan 19, 2016 at 2:22 PM, Peter Geoghegan  wrote:
> I don't have time right now to do a full analysis of whether or not it's
> possible to just fix the bug without doing all that, but I think it
> might not be.

IOW: I think that Hideaki's bug fix might itself be wrong (although I
might be wrong about that; no time to make sure right now). Note that
Hideaki's implementation was not the only one I looked at when working
on this code.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] make error - libpqdll.def No such file or directory

```
On 1/19/2016 10:58 AM, Alvaro Herrera wrote:
Yes, probably something like that. I think it failed the first time
because there was a bug (the one I introduced in a967613911f7), then
probably changed to src/backend and ran compiles there which probably
worked fine, leading to commit fa838b555f90. I might or might not have
removed the complete build dir instead of "make distclean"; not sure
TBH. As I recall, I tried a couple of builds after I pushed the fix
commit and couldn't get them to run at all. But since I saw in
buildfarm that other mingw builds were working, I lost interest. Then
I deleted the mingw build tree and didn't get back to retrying. TBH
the only reason I mingled with mingw at all is that nobody seems
interested in fixing Win32 issues, and this patch had been lingering
for far too long.

somewhere in the distribution that I was using ( from
http://www.postgresql.org/ftp/source/v9.5.0/ ), so I went ahead and
cloned the git repo and used the sources from there.

I am happy to announce that I was finally able to build and run
PostgreSQL on MinGW-w64, but it only worked with the sources from the
not sure if there is a bug there that was resolved, or if it is packaged
several places that it should be easier to build from that distribution
as opposed to the git repo.

Igal

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] Combining Aggregates

```On 20 January 2016 at 05:56, Robert Haas  wrote:

>
> On Mon, Dec 21, 2015 at 4:02 AM, David Rowley
>  wrote:
> > Now, there has been talk of this previously, on various threads, but I
> don't
> > believe any final decisions were made on how exactly it should be done.
> At
> > the moment I plan to make changes as follows:
> >
> >  Add 3 new columns to pg_aggregate, aggserialfn, aggdeserialfn and
> > aggserialtype These will only be required when aggtranstype is INTERNAL.
>
> Check.
>
> > Perhaps we should disallow CREATE AGGREAGET from accepting them for any
> > other type...
>
> Well, we should definitely not accept them and have them be
> meaningless.  We should either reject them or accept them and then use
> them.  I can't immediately think of a reason to serialize one
> non-internal type as another, but maybe there is one.
>
>
In the latest patch I'm disallowing serial and deserial functions for non
INTERNAL state aggregates during CREATE AGGREGATE.
I think it's best to keep this disabled for now, and do so until we
discover some reason that we might want want to enable it. If we enabled
it, and later decided that was a dumb idea, then it'll be much harder to
add the restriction later, since it may cause errors for users who have
created their own aggregates.

> > Add a new bool field to nodeAgg's state named serialStates. If this is
> field
> > is set to true then when we're in finalizeAgg = false mode, we'll call
> the
> > serialfn on the agg state instead of the finalfn. This will allow the
> > serialized state to be stored in the tuple and sent off to the main
> backend.
> > The combine agg node should also be set to serialStates = true, so that
> it
> > knows to deserialize instead of just assuming that the agg state is of
> type
> > aggtranstype.
>
> I'm not quite sure, but it sounds like you might be overloading
> serialStates with two different meanings here.
>

Hmm, only in the sense that serialStates means "serialise" and
"deserialise". The only time that both could occur in the same node is if
combineStates=true and finalizeAggs=false (in other words 3 or more
aggregation stages).

Let's say we are performing aggregation in 3 stages, stage 1 is operating
on normal values and uses the transfn on these values, but does not
finalise the states. If serialStates = true here then, if one exists, we
call the serialfn, passing in the aggregated state, and pass the return
value of that function up to the next node. Now, this next (middle) node is
important, serialStates must also be true here so that the executor knows
to deserialise the previously serialised states. Now, this node uses the
combinefn to merge states which require, and then since serialStates is
true, it also (re)serialises those new states again. Now if there was some
reason that this middle node should deserialise, but not re-serialise those
states, then we may need an extra parameter to instruct it to do so. I
guess this may be required if we were to perform some partial aggregation
and combining again within a single process (in which case we'd not need to
serialise INTERNAL states, we can just pass the pointer to them in the
Tuple),  but then we might require the states to be serialised in order to
hand them over to the main process, from a worker process.

I can imagine cases where we might want to do this in the future, so
perhaps it is worth it:

Imagine:

SELECT COUNT(*) FROM (SELECT * FROM (SELECT * FROM a UNION ALL SELECT *
FROM b) AS ab UNION ALL (SELECT * FROM c UNION ALL SELECT * FROM d)) abcd;

Where the planner may decide to, on 1 worker perform count(*) on a then b,
and combine that into ab, while doing the same for c and d on some other
worker process.

> I believe this should allow us to not cause any performance regressions by
> > moving away from INTERNAL agg states. It should also be very efficient
> for
> > internal states such as Int8TransTypeData, as this struct merely has 2
> int64
> > fields which should be very simple to stuff into a bytea varlena type. We
> > don't need to mess around converting the ->count and ->sum into a
> strings or
> > anything.
>
> I think it would be more user-friendly to emit these as 2-element
> integer arrays rather than bytea.  Sure, bytea is fine when PostgreSQL
> is talking to itself, but if you are communicating with an external
> system, it's a different situation.  If the remote system is
> PostgreSQL then you are again OK, except for the data going over the
> wire being incomprehensible and maybe byte-order-dependent, but what
> if you want some other database system to do partial aggregation and
> then further aggregate the result?  You don't want the intermediate
> state to be some kooky thing that only another PostgreSQL database has
> a chance of generating correctly.
>

If we do that then we also need to invent composite database types for the
more complicated internal states such as NumericAggState.
We should ```

### Re: [HACKERS] Combining Aggregates

```On Tue, Jan 19, 2016 at 4:50 PM, David Rowley
wrote:
>> Oh, one more point: is there any reason why all of this needs to be a
>> single (giant) patch?  I feel like the handling of internal states
>> could be a separate patch from the basic patch to allow partial
>> aggregation and aggregate-combining, and that the latter could be
>> committed first if we had a reasonably final version of it.  That
>> seems like it would be easier from a review standpoint, and might
>> allow more development to proceed in parallel, too.
>
> I didn't ever really imagine that I'd include any actual new combinefns or
> serialfn/deserialfn in this patch. One set has of course now ended up in
> there, I can move these off to the test patch for now.
>
> Did you imagine that the first patch in the series would only add the
> aggcombinefn column to pg_aggregate and leave the aggserialfn stuff until
> later?

I think that would be a sensible way to proceed.

> I thought it seemed better to get the infrastructure committed in one
> hit, then add a bunch of new combinefn, serialfn, deserialfns for existing
> aggregates in follow-on commits.

To my mind, priority #1 ought to be putting this fine new
functionality to some use.  Expanding it to every aggregate we've got
seems like a distinctly second priority.  That's not to say that it's
absolutely gotta go down that way, but those would be my priorities.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] checkpointer continuous flushing

```On 2016-01-19 12:58:38 -0500, Robert Haas wrote:
> This seems like a problem with the WAL writer quite independent of
> anything else.  It seems likely to be inadvertent fallout from this
> patch:
>
> Author: Simon Riggs
> Branch: master Release: REL9_2_BR [4de82f7d7] 2011-11-13 09:00:57 +
>
> Wakeup WALWriter as needed for asynchronous commit performance.
> Previously we waited for wal_writer_delay before flushing WAL. Now
> we also wake WALWriter as soon as a WAL buffer page has filled.
> Significant effect observed on performance of asynchronous commits
> by Robert Haas, attributed to the ability to set hint bits on tuples
> earlier and so reducing contention caused by clog lookups.

In addition to that the "powersaving" effort also plays a role - without
the latch we'd not wake up at any meaningful rate at all atm.

> If I understand correctly, prior to that commit, WAL writer woke up 5
> times per second and flushed just that often (unless you changed the
> default settings).But as the commit message explained, that turned
> out to suck - you could make performance go up very significantly by
> radically decreasing wal_writer_delay.  This commit basically lets it
> flush at maximum velocity - as fast as we finish one flush, we can
> start the next.  That must have seemed like a win at the time from the
> way the commit message was written, but you seem to now be seeing the
> opposite effect, where performance is suffering because flushes are
> too frequent rather than too infrequent.  I wonder if there's an ideal
> flush rate and what it is, and how much it depends on what hardware
> you have got.

I think the problem isn't really that it's flushing too much WAL in
total, it's that it's flushing WAL in a too granular fashion. I suspect
we want something where we attempt a minimum number of flushes per
second (presumably tied to wal_writer_delay) and, once exceeded, a
minimum number of pages per flush. I think we even could continue to
write() the data at the same rate as today, we just would need to reduce
the number of fdatasync()s we issue. And possibly could make the
eventual fdatasync()s cheaper by hinting the kernel to write them out
earlier.

Now the question what the minimum number of pages we want to flush for
(setting wal_writer_delay triggered ones aside) isn't easy to answer. A
simple model would be to statically tie it to the size of wal_buffers;
say, don't flush unless at least 10% of XLogBuffers have been written
since the last flush. More complex approaches would be to measure the
continuous WAL writeout rate.

By tying it to both a minimum rate under activity (ensuring things go to
disk fast) and a minimum number of pages to sync (ensuring a reasonable
number of cache flush operations) we should be able to mostly accomodate
the different types of workloads. I think.

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] PATCH: postpone building buckets to the end of Hash (in HashJoin)

```

On 01/19/2016 08:34 PM, Robert Haas wrote:

On Mon, Jan 18, 2016 at 10:57 PM, Tomas Vondra
wrote:

If this doesn't regress performance in the case where the number of
buckets is estimated accurately to begin with, then I think this is
a great idea. Can you supply some performance tests results for that
case, and maybe some of the other cases also?

I don't see how it could regress performance, and the benchmarks I've
done confirm that. I'll do more thorough benchmarking and post the
results here, but not now as this patch is in 2016-01 CF and I want to
put all my time into reviewing patches from the open commitfest.

I've finally got to do more thorough benchmarks, and surprisingly there
seems to be a minor regression.

...

So even if we entirely skipped the bucket build, we would not recover the
difference. Not sure what's going on here.

I've also done some profiling using perf, but I haven't really found
anything suspicious there.

Any ideas what might be the cause of this?

My guess is that memory locality is not as good with this patch.  Consider this:

copyTuple->next = hashtable->buckets[bucketno];

We've only just populated copytuple via memcpy, so that cache line is
presumably in whatever place cache lines go when they are really hot.
We basically make one pass over the allocated space for the hash
table, filling in the hash tuples and linking things into bucket
chains.  OTOH, with the patch, we don't do that on the first pass,
just writing the tuples.  Then when we come back through we still have
to do that part:

hashTuple->next = hashtable->buckets[bucketno];

By the time we're doing this, especially at larger work_mem settings,
we've probably pushed those cache lines out of the faster levels of
the CPU cache, and we have to pull them back in again, and that takes
time.   If the tuples are small, we'll dirty every cache line twice;
if they're larger, we might not dirty every cache line on the second
pass, but just some fraction of them.

I could be totally off base, but this is why I was concerned about the patch.

I can totally see why this would slow-down the BuildBuckets function,
but I don't quite see why it should make the other code significantly
slower. Yet BuildBuckets takes just ~25ms while the total duration
increases by ~200ms (for the 1x10 dataset).

I do understand calling BuildBuckets may affect the code executed after
that as it keeps other data in the cache, but i would not expect ~175ms.

Yet another thing is that BuildBuckets accesses the tuples in mostly
sequential way, so I'd expect prefetch to take care of that.

regards

--
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] PATCH: Extending the HyperLogLog API a bit

```On Tue, Jan 19, 2016 at 9:37 AM, Alvaro Herrera
wrote:
> Our transcript seems to predate that bugfix commit, so I assume we need
> to apply this to our copy too.  Sadly, Hideaki-san commit message isn't
> very descriptive.

Fortunately, the function mergeHyperLogLog() in our hyperloglog.c
currently has no callers.

> I don't really know how HyperLogLog works, so maybe we can't or
> shouldn't apply the patch because of how the hash stuff is used.

I think that Hideaki's confusion comes from whether or not this HLL
state is a sparse or dense/full representation. The distinction is
explained in the README for postgresql-hll:

https://github.com/aggregateknowledge/postgresql-hll

postgresql-hll has no support for merging HLLs that are sparse:

https://github.com/aggregateknowledge/postgresql-hll/blob/master/hll.c#L1888

Can't we just tear mergeHyperLogLog() out?

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] make error - libpqdll.def No such file or directory

```
On 1/17/2016 8:17 PM, Igal @ Lucee.org wrote:

On 1/17/2016 3:24 PM, Igal @ Lucee.org wrote:

When running make I encounter the following error:

gcc.exe: error: libpqdll.def: No such file or directory
for target 'libpq.dll' failed

make[3]: *** [libpq.dll] Error 1
make[3]: Leaving directory

Makefile:17: recipe for target 'all-libpq-recurse' failed
make[2]: *** [all-libpq-recurse] Error 2
make[2]: Leaving directory

Makefile:34: recipe for target 'all-interfaces-recurse' failed
make[1]: *** [all-interfaces-recurse] Error 2
GNUmakefile:11: recipe for target 'all-src-recurse' failed
make: *** [all-src-recurse] Error 2

contain the file libpqdll.def

exists as well.

It's hard for me to decipher which file is reporting the error and

Any feedback would be greatly appreciated, thanks!
So when I try to run `make` I still get that error.  Please note that I
am doing a VPATH build (the build in a separate directory from the

x86_64-w64-mingw32-gcc.exe: error: libpqdll.def: No such file or directory
make[3]: *** [libpq.dll] Error 1
make[2]: *** [all-libpq-recurse] Error 2
make[1]: *** [all-interfaces-recurse] Error 2
make: *** [all-src-recurse] Error 2

I found a script that builds postgresql via MinGW-w64, and in it the
author specifically creates symlinks to libpqdll.def
https://github.com/Alexpux/MINGW-packages/blob/master/mingw-w64-postgresql/PKGBUILD#L72
-- excerpt below:

# Make DLL definition file visible during each arch build
ln -s
"\${srcdir}/postgresql-\$pkgver/src/interfaces/libpq/libpqdll.def"
src/interfaces/libpq/
ln -s
"\${srcdir}/postgresql-\$pkgver/src/interfaces/ecpg/ecpglib/libecpgdll.def" src/interfaces/ecpg/ecpglib/
ln -s
"\${srcdir}/postgresql-\$pkgver/src/interfaces/ecpg/pgtypeslib/libpgtypesdll.def"
src/interfaces/ecpg/pgtypeslib/
ln -s
"\${srcdir}/postgresql-\$pkgver/src/interfaces/ecpg/compatlib/libecpg_compatdll.def"
src/interfaces/ecpg/compatlib/

Why are the symlinks needed to make the definition files visible?

Is this an issue with VPATH builds?  It is not mentioned in the docs
where VPATH builds are discussed (section 15.4
http://www.postgresql.org/docs/9.5/static/install-procedure.html )

What is my best solutions?

Thanks!

```

### Re: [HACKERS] make error - libpqdll.def No such file or directory

```
On 1/19/2016 10:17 AM, Alvaro Herrera wrote:

Igal @ Lucee.org wrote:

So when I try to run `make` I still get that error.  Please note that I am
doing a VPATH build (the build in a separate directory from the downloaded
sources), which might play a role here:

x86_64-w64-mingw32-gcc.exe: error: libpqdll.def: No such file or directory
make[3]: *** [libpq.dll] Error 1
make[2]: *** [all-libpq-recurse] Error 2
make[1]: *** [all-interfaces-recurse] Error 2
make: *** [all-src-recurse] Error 2

I too tried mingw compile in VPATH a few days ago, and found that it
behaves in stupid ways -- sometimes it would just loop re-running
configure over and over, and when it (seemingly randomly) stopped doing
that, it failed because of missing errcodes.h or something related.
(When I make- distcleaned and started from scratch, it would work, so I
simply did that.)

So to clarify (in the hopes that it would allow me to keep some hair on
my head by the end of this process, as I already pulled out most of it
by now)...

build-dir\$ ~/source-dir/configure --host=x86_64-w64-mingw32
...
build-dir\$ make
...
errors
build-dir\$ make distclean
...
build-dir\$ ~/source-dir/configure --host=x86_64-w64-mingw32
...
build-dir\$ make
...

?

Or did you not do a VPATH build the second time?

Did you use any other options?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] Advices on custom data type and extension development

```Luciano Coutinho Barcellos wrote:

> * the type would be 8 bytes long, being 4 dedicated to storing the
> Date, and 4 dedicated to storing a serial within that day;

Another thing to consider -- have you carefully defined what the
"current day" is?  This might sound a stupid question, but as far as I
remember Brazil has at least two timezones, which means that you could
have one date while far east and a different one at the west border of
the country.  If you misplace an order that a customer filed after 11pm,
they will be pretty pissed.  (This of course becomes more pressing if
you have things outside the country.)

If you simply state that dates are whatever is current in UTC zone,
you're covered.  (Pray you never get an order during a leap second.)

--
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] checkpointer continuous flushing

```On Mon, Jan 18, 2016 at 11:39 AM, Andres Freund  wrote:
> On 2016-01-16 10:01:25 +0100, Fabien COELHO wrote:
>> Hello Andres,
>>
>> >I measured it in a different number of cases, both on SSDs and spinning
>> >rust. I just reproduced it with:
>> >
>> >postgres-ckpt14 \
>> >   -D /srv/temp/pgdev-dev-800/ \
>> >   -c maintenance_work_mem=2GB \
>> >   -c fsync=on \
>> >   -c synchronous_commit=off \
>> >   -c shared_buffers=2GB \
>> >   -c wal_level=hot_standby \
>> >   -c max_wal_senders=10 \
>> >   -c max_wal_size=100GB \
>> >   -c checkpoint_timeout=30s
>> >
>> >Using a fresh cluster each time (copied from a "template" to save time)
>> >and using
>> >pgbench -M prepared -c 16 -j 16 -T 300 -P 1
>
> So, I've analyzed the problem further, and I think I found something
> rater interesting. I'd profiled the kernel looking where it blocks in
> the IO request queues, and found that the wal writer was involved
> surprisingly often.
>
> So, in a workload where everything (checkpoint, bgwriter, backend
> writes) is flushed: 2995 tps
> After I kill the wal writer with -STOP: 10887 tps
>
> Stracing the wal writer shows:
>
> 17:29:02.001517 --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=17857,
> si_uid=1000} ---
> 17:29:02.001582 read(8, 0x7ffea6b6b200, 16) = -1 EAGAIN (Resource temporarily
> unavailable)
> 17:29:02.001615 write(3,
> "\210\320\5\0\1\0\0\0\0@\330_/\0\0\0w\f\0\0\0\0\0\0\0\4\0\2\t\30\0\372"...,
> 49152) = 49152
> 17:29:02.001671 fdatasync(3)= 0
> 17:29:02.005022 --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=17825,
> si_uid=1000} ---
> 17:29:02.005081 read(8, 0x7ffea6b6b200, 16) = -1 EAGAIN (Resource temporarily
> unavailable)
> 17:29:02.005111 write(3,
> "\210\320\5\0\1\0\0\0\0\0\331_/\0\0\0\7\26\0\0\0\0\0\0T\251\0\0\0\0\0\0"...,
> 8192) = 8192
> 17:29:02.005147 fdatasync(3)= 0
> 17:29:02.008688 --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=17866,
> si_uid=1000} ---
> 17:29:02.008730 read(8, 0x7ffea6b6b200, 16) = -1 EAGAIN (Resource temporarily
> unavailable)
> 17:29:02.008757 write(3, "\210\320\5\0\1\0\0\0\0
> \331_/\0\0\0\267\30\0\0\0\0\0\0"..., 98304) = 98304
> 17:29:02.008822 fdatasync(3)= 0
> 17:29:02.016125 --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=17865,
> si_uid=1000} ---
> 17:29:02.016174 read(8, 0x7ffea6b6b200, 16) = -1 EAGAIN (Resource temporarily
> unavailable)
> 17:29:02.016204 write(3,
> "\210\320\5\0\1\0\0\0\0\240\332_/\0\0\0s\5\0\0\0\0\0\0\t\30\0\2|8\2u"...,
> 57344) = 57344
> 17:29:02.016281 fdatasync(3)= 0
> 17:29:02.019181 --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=17865,
> si_uid=1000} ---
> 17:29:02.019226 read(8, 0x7ffea6b6b200, 16) = -1 EAGAIN (Resource temporarily
> unavailable)
> 17:29:02.019249 write(3,
> "\210\320\5\0\1\0\0\0\0\200\333_/\0\0\0\307\f\0\0\0\0\0\0"..., 73728)
> = 73728
> 17:29:02.019355 fdatasync(3)= 0
> 17:29:02.022680 --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=17865,
> si_uid=1000} ---
>
> I.e. we're fdatasync()ing small amount of pages. Roughly 500 times a
> second. As soon as the wal writer is stopped, it's much bigger chunks,
> on the order of 50-130 pages. And, not that surprisingly, that improves
> performance, because there's far fewer cache flushes submitted to the
> hardware.

This seems like a problem with the WAL writer quite independent of
anything else.  It seems likely to be inadvertent fallout from this
patch:

Author: Simon Riggs
Branch: master Release: REL9_2_BR [4de82f7d7] 2011-11-13 09:00:57 +

Wakeup WALWriter as needed for asynchronous commit performance.
Previously we waited for wal_writer_delay before flushing WAL. Now
we also wake WALWriter as soon as a WAL buffer page has filled.
Significant effect observed on performance of asynchronous commits
by Robert Haas, attributed to the ability to set hint bits on tuples
earlier and so reducing contention caused by clog lookups.

If I understand correctly, prior to that commit, WAL writer woke up 5
times per second and flushed just that often (unless you changed the
default settings).But as the commit message explained, that turned
out to suck - you could make performance go up very significantly by
radically decreasing wal_writer_delay.  This commit basically lets it
flush at maximum velocity - as fast as we finish one flush, we can
start the next.  That must have seemed like a win at the time from the
way the commit message was written, but you seem to now be seeing the
opposite effect, where performance is suffering ```

### Re: [HACKERS] make error - libpqdll.def No such file or directory

```

On 01/19/2016 02:01 PM, Alvaro Herrera wrote:

Andrew Dunstan wrote:

jacana does VPATH builds with pretty much this setup all the time. See for
example

Yes, it builds a tree *once*, then deletes the result, and the next BF
run uses a completely new build directory.  So any issues resulting from
re-building an existing tree are never seen.

Oh. odd. I don't think I've seen that even when developing on Windows
(e.g. parallel pg_restore). Maybe I always did that in-tree.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] jsonb array-style subscription

```Dmitry Dolgov wrote:

> I've cleaned up the code, created a separate JsonbRef node (and there are a
> lot of small changes because of that), abandoned an idea of "deep nesting"
> of assignments (because it doesn't relate to jsonb subscription, is more
> "jsonb_set" function, and anyway it's not a good idea). It looks fine for
> me, and I need a little guidance - is it ok to propose this feature for
> commitfest 2016-03 for a review?

Has this patch been proposed in some commitfest previously?  One of the
less-commonly-invoked rules of commitfests is that you can't add patches
that are too invasive to the last one -- so your last chance for 9.6 was
2016-01.  This is harsh to patch submitters, but it helps keep the size
of the last commitfest down to a reasonable level; otherwise we are
never able to finish it.

--
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] make error - libpqdll.def No such file or directory

```Igal @ Lucee.org wrote:

> So when I try to run `make` I still get that error.  Please note that I am
> doing a VPATH build (the build in a separate directory from the downloaded
> sources), which might play a role here:
>
> x86_64-w64-mingw32-gcc.exe: error: libpqdll.def: No such file or directory
> make[3]: *** [libpq.dll] Error 1
> make[2]: *** [all-libpq-recurse] Error 2
> make[1]: *** [all-interfaces-recurse] Error 2
> make: *** [all-src-recurse] Error 2

I too tried mingw compile in VPATH a few days ago, and found that it
behaves in stupid ways -- sometimes it would just loop re-running
configure over and over, and when it (seemingly randomly) stopped doing
that, it failed because of missing errcodes.h or something related.
(When I make- distcleaned and started from scratch, it would work, so I
simply did that.)

--
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] Stream consistent snapshot via a logical decoding plugin as a series of INSERTs

```On Fri, Jan 15, 2016 at 5:31 PM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

>
> POC patch attached.  Findings:
>
> 1) Needs an actual slot for all the decode machinery to work (code depends
> on MyReplicationSlot being set).
> 2) Requires a core patch.
> 3) Currently only supports textual output, adding binary is trivial.
>
>
> Acquiring a slot means this cannot be run in parallel from multiple
> backends.  Any ideas on how to overcome this (except for opening multiple
> slots with the same LSN)?
> To obtain a consistent snapshot, the client still needs to take care of
> preserving and setting transaction snapshot properly.
>

Testing revealed a number of problems with memory handling in this code, a
corrected v2 is attached.

Completely another problem is proper handling of SPI stack and releasing
the replication slot.  The latter I'd like to avoid dealing with, because
at the moment it's not possible to stream a number of relations in parallel
using this POC function, so I'd rather move in a direction of not acquiring
the replication slot at all.

The SPI problem manifests itself if I place a LIMIT on top of the query:

# SELECT pg_logical_slot_stream_relation('slot1', 'pg_catalog', 'pg_class')
LIMIT 5;
WARNING:  relcache reference leak: relation "pg_class" not closed
WARNING:  transaction left non-empty SPI stack
HINT:  Check for missing "SPI_finish" calls.

I wonder if there is a way to install some sort of cleanup handler that
will be called by executor?

--
Alex
From 83c2c754066f43111d0f21ff088cc5503e910aab Mon Sep 17 00:00:00 2001
From: Oleksandr Shulgin
Date: Fri, 15 Jan 2016 17:30:04 +0100
Subject: [PATCH] POC: pg_logical_slot_stream_relation

---
src/backend/catalog/system_views.sql|   9 +
src/backend/replication/logical/logicalfuncs.c  | 355 +---
src/backend/replication/logical/reorderbuffer.c |   6 +-
src/include/catalog/pg_proc.h   |   2 +
src/include/replication/logicalfuncs.h  |   1 +
src/include/replication/reorderbuffer.h |   3 +
6 files changed, 333 insertions(+), 43 deletions(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 923fe58..5431b61 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -941,6 +941,15 @@ LANGUAGE INTERNAL
VOLATILE ROWS 1000 COST 1000
AS 'pg_logical_slot_peek_binary_changes';

+CREATE OR REPLACE FUNCTION pg_logical_slot_stream_relation(
+IN slot_name name, IN relnamespace name, IN relname name, IN nochildren bool DEFAULT FALSE,
+OUT data text)
+RETURNS SETOF TEXT
+LANGUAGE INTERNAL
+VOLATILE ROWS 1000 COST 1000
+AS 'pg_logical_slot_stream_relation';
+
CREATE OR REPLACE FUNCTION pg_create_physical_replication_slot(
IN slot_name name, IN immediately_reserve boolean DEFAULT false,
OUT slot_name name, OUT xlog_position pg_lsn)
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 562c8f6..c1605de 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -21,12 +21,18 @@
#include "funcapi.h"

+#include "access/htup_details.h"
#include "access/xlog_internal.h"

+#include "executor/spi.h"
+
+#include "catalog/namespace.h"
#include "catalog/pg_type.h"

#include "nodes/makefuncs.h"

+#include "lib/stringinfo.h"
+
#include "mb/pg_wchar.h"

#include "utils/array.h"
@@ -40,6 +46,7 @@
#include "replication/decode.h"
#include "replication/logical.h"
#include "replication/logicalfuncs.h"
+#include "replication/reorderbuffer.h"

#include "storage/fd.h"

@@ -50,6 +57,12 @@ typedef struct DecodingOutputState
TupleDesc	tupdesc;
bool		binary_output;
int64		returned_rows;
+
+	/* for pg_logical_stream_relation */
+	MemoryContext	context;
+	Relation		rel;
+	Portal			cursor;
+	TupleTableSlot *tupslot;
} DecodingOutputState;

/*
return count;
}

+static List *
+deconstruct_options_array(ArrayType *arr)
+{
+	Size		ndim;
+	List	   *options = NIL;
+
+	ndim = ARR_NDIM(arr);
+	if (ndim > 1)
+	{
+		ereport(ERROR,
+(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("array must be one-dimensional")));
+	}
+	else if (array_contains_nulls(arr))
+	{
+		ereport(ERROR,
+(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("array must not contain nulls")));
+	}
+	else if (ndim == 1)
+	{
+		int			nelems;
+		Datum	   *datum_opts;
+		int			i;
+
+		Assert(ARR_ELEMTYPE(arr) == TEXTOID);
+
+		deconstruct_array(arr, TEXTOID, -1, false, 'i',
+		  _opts, NULL, );
+
+		if (nelems % 2 != 0)
+			ereport(ERROR,
+	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+	 errmsg("array must have even number of elements")));
+
+		for (i = 0; i < nelems; i += 2)
+		{
+			```

### Re: Odd behavior in foreign table modification (Was: Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW)

```On Tue, Jan 19, 2016 at 1:59 AM, Etsuro Fujita
wrote:
I've run into an issue:

*# UPDATE master_customers SET id = 22 WHERE id = 16 RETURNING
tableoid::regclass;
ERROR:
CONTEXT:  Remote SQL command: UPDATE public.customers SET id = 22
WHERE ((id = 16)) RETURNING NULL
>
>> While working on this, I noticed that the existing postgres_fdw system
>> shows similar behavior, so I changed the subject.
>>
>> IIUC, the reason for that is when the local query specifies "RETURNING
>> tableoid::regclass", the FDW has fmstate->has_returning=false while the
>> remote query executed at ModifyTable has "RETURNING NULL", as shown in
>> the above example; that would cause an abnormal exit in executing the
>> remote query in postgresExecForeignUpdate, since that the FDW would get
>> PGRES_TUPLES_OK as a result of the query while the FDW would think that
>> the right result to get should be PGRES_COMMAND_OK, from the flag
>> fmstate->has_returning=false.
>>
>> Attached is a patch to fix that.
>
> I added this to the next CF.
>
> https://commitfest.postgresql.org/9/483/

"is_returning" to deparseTargetList()?   If false, then
deparseTargetList() behaves as now.  If false, then
deparseTargetList() doesn't append anything at all if there are no
columns to return, instead of (as at present) adding NULL.  On the
other hand, if there are columns to return, then it appends "
RETURNING " before the first column.  Then, deparseReturningList could
skip adding RETURNING itself, and just pass true to
deparseTargetList().  The advantage of this approach is that we don't
end up with two copies of the code that have to stay synchronized -
the decision is made inside deparseTargetList(), and
deparseReturningList() accepts the results.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] make error - libpqdll.def No such file or directory

```Igal @ Lucee.org wrote:

>
> So to clarify (in the hopes that it would allow me to keep some hair on my
> head by the end of this process, as I already pulled out most of it by
> now)...
> your process was as follows?
>
> build-dir\$ ~/source-dir/configure --host=x86_64-w64-mingw32
> ...
> build-dir\$ make
> ...
> errors
> build-dir\$ make distclean
> ...
> build-dir\$ ~/source-dir/configure --host=x86_64-w64-mingw32
> ...
> build-dir\$ make
> ...
>
> ?

Yes, probably something like that.  I think it failed the first time
because there was a bug (the one I introduced in a967613911f7), then
probably changed to src/backend and ran compiles there which probably
worked fine, leading to commit fa838b555f90.  I might or might not have
removed the complete build dir instead of "make distclean"; not sure
TBH.  As I recall, I tried a couple of builds after I pushed the fix
commit and couldn't get them to run at all.  But since I saw in
buildfarm that other mingw builds were working, I lost interest.  Then I
deleted the mingw build tree and didn't get back to retrying.

TBH the only reason I mingled with mingw at all is that nobody seems
interested in fixing Win32 issues, and this patch had been lingering for
far too long.

> Or did you not do a VPATH build the second time?

I never build in the source tree, so this is pretty unlikely.

> Did you use any other options?

None.

--
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] Advices on custom data type and extension development

```>
> Seriously, you should consider having a primary key with two
> columns, of type date and int.  It would take exactly the same
> space as your current plan, and performance should be very close to
> what you propose.  As long as you aren't using some ORM that is too
> dumb to deal with this, it should be far easier than creating the
> custom type.
>

+1

Most ORMs cannot handle ENUMs, let alone user defined composite types.

That, or they *flood* the database with SELECT * FROM pg_type WHERE ...
queries. I'm looking at you, Cake.

You're far better off trying a (date,integer) key as Kevin said.

If the ORM doesn't allow that, I'd suggest a custom function that encodes
the date bit-shifted to the high 4 bytes, and then adds in the four bytes
from a cycling sequence. At least then you've got a shot at partitioning,
though the lower/upper bounds of the partitions would not make sense to the
casual observer.

```

### Re: [HACKERS] make error - libpqdll.def No such file or directory

```Andrew Dunstan wrote:

> jacana does VPATH builds with pretty much this setup all the time. See for
> example
>

Yes, it builds a tree *once*, then deletes the result, and the next BF
run uses a completely new build directory.  So any issues resulting from
re-building an existing tree are never seen.

--
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] checkpointer continuous flushing

```

synchronous_commit = off does make a significant difference.

But why are you then saying this is fundamentally limited to 160
xacts/sec?

I'm just saying that the tested load generates mostly random IOs (probably
on average over 1 page per transaction), random IOs are very slow on a
HDD, so I do not expect great tps.

I think I found one possible culprit: I automatically wrote 300 seconds for
rerun the tests with this (unreasonnable) figure to check whether I really
get a regression.

I've not seen meaningful changes in the size of the regression between 30/300s.

At 300 seconds (5 minutes) the checkpoints of the accumulated takes 15-25
minutes, during which the database is mostly offline, and there is no
clear difference with/without sort+flush.

Other tests I ran with "reasonnable" settings on a large (scale=800) db did
not show any significant performance regression, up to now.

Try running it so that the data set nearly, but not entirely fit into
the OS page cache, while definitely not fitting into shared_buffers. The
scale=800 just worked for that on my hardware, no idea how it's for yours.
That seems to be the point where the effect is the worst.

I have 16GB memory on the tested host, same as your hardware I think, so I
use scale 800 => 12GB at the beginning of the run. Not sure it fits the
bill as I think it fits in memory, so the load is mostly write and no/very
few reads. I'll also try with scale 1000.

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

```On Mon, Jan 18, 2016 at 10:41 PM, Amit Kapila  wrote:
> Initially, we started with extending the 'waiting' column in
> pg_stat_activity,
> to which some people have raised concerns about backward
> compatability, so another option that came-up during discussion was to
> retain waiting as it-is and have an additional column 'wait_event' in
> pg_stat_activity, after that there is feedback that we should try to include
> wait information about background processes as well which raises a bigger
> question whether it is any good to expose this information via
> pg_stat_activity
> (pg_stat_activity doesn't display information about background processes)
> or is it better to have a new view as discussed here [1].
>
> Second important and somewhat related point is whether we should save
> this information in PGPROC as 4 bytes or keep it in pgBackendStatus.
> I think it is better to store in PGPROC, if we want to save wait information
> for backend processes as well.
>
> I am of opinion that we should save this information in PGPROC and
> expose it via new view, but I am open to go other ways based on what

My opinion is that storing the information in PGPROC is better because
it seems like we can fairly painlessly expose 4 bytes of data that way
instead of 1, which is nice.

On the topic of the UI, I understand that redefining
pg_stat_activity.waiting might cause some short-term annoyance.  But I
think in the long term what we are proposing here is going to be a
huge improvement, so I think it's worth the compatibility break.  If
we say that pg_stat_activity.waiting has to continue meaning "waiting
for a heavyweight lock" even though we now also expose (in some other
location) information on other kinds of waits, that's going to be
confusing to users.  It's better to force people to update their
queries once than to have this confusing wart in the system forever.
I predict that if we make backward compatibility the priority here,
we'll still be explaining it to smart but confused people when 9.6
goes EOL.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### [HACKERS] Re: [BUGS] about test_parser installation failure problem(PostgreSQL in 9.5.0)？

```閬閬イふ wrote:
> hi postgreSql !
>  test_parser install is ok (postgresql 9.2.4)
> but at (postgresql 9.5.0) failure？
>  why？the postgresql say:
>  CREATE EXTENSION zhparser

What is zhparser anyway?  There is one such thing at
https://github.com/amutu/zhparser
but I don't think it depends on test_parser in any way.

Note this:

>  CREATE EXTENSION zhparser
>  say:
>  ERROR:  syntax error at or near "﻿"
> LINE 1: CREATE EXTENSION zhparser
> ^

My editor shows a "" marker between the double quotes in that
error message -- you sent a byte-order mark (BOM) in your CREATE
EXTENSION command, which is wrong.

--
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] PATCH: postpone building buckets to the end of Hash (in HashJoin)

```On Mon, Jan 18, 2016 at 10:57 PM, Tomas Vondra
wrote:
>>> If this doesn't regress performance in the case where the number of
>>> buckets is estimated accurately to begin with, then I think this is
>>> a great idea. Can you supply some performance tests results for that
>>> case, and maybe some of the other cases also?
>>
>> I don't see how it could regress performance, and the benchmarks I've
>> done confirm that. I'll do more thorough benchmarking and post the
>> results here, but not now as this patch is in 2016-01 CF and I want to
>> put all my time into reviewing patches from the open commitfest.
>
> I've finally got to do more thorough benchmarks, and surprisingly there
> seems to be a minor regression.
...
>
> So even if we entirely skipped the bucket build, we would not recover the
> difference. Not sure what's going on here.
>
> I've also done some profiling using perf, but I haven't really found
> anything suspicious there.
>
> Any ideas what might be the cause of this?

My guess is that memory locality is not as good with this patch.  Consider this:

copyTuple->next = hashtable->buckets[bucketno];

We've only just populated copytuple via memcpy, so that cache line is
presumably in whatever place cache lines go when they are really hot.
We basically make one pass over the allocated space for the hash
table, filling in the hash tuples and linking things into bucket
chains.  OTOH, with the patch, we don't do that on the first pass,
just writing the tuples.  Then when we come back through we still have
to do that part:

hashTuple->next = hashtable->buckets[bucketno];

By the time we're doing this, especially at larger work_mem settings,
we've probably pushed those cache lines out of the faster levels of
the CPU cache, and we have to pull them back in again, and that takes
time.   If the tuples are small, we'll dirty every cache line twice;
if they're larger, we might not dirty every cache line on the second
pass, but just some fraction of them.

I could be totally off base, but this is why I was concerned about the patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] Re: [JDBC] 9.4-1207 behaves differently with server side prepared statements compared to 9.2-1102

```On Mon, Jan 18, 2016 at 5:50 PM, Thomas Kellerer  wrote:
> With all the problems I have seen (in Oracle and Postgres) I think that
> maybe a better solution to this problem is to make the planner fast (and
> reliable) enough so that plan caching isn't necessary in the first place.
>
> However I have no idea how feasible that is.

The problem is that the floor is already littered with
potentially-very-beneficial query planning ideas that got discarded
because they would add too many cycles to planning time.  Despite
that, planning time is a killer on some workloads.  So right now we've
got workloads where we plan too much and workloads where we plan too
little.  Argh.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] make error - libpqdll.def No such file or directory

```

On 01/19/2016 01:08 PM, Igal @ Lucee.org wrote:

On 1/17/2016 8:17 PM, Igal @ Lucee.org wrote:

On 1/17/2016 3:24 PM, Igal @ Lucee.org wrote:

When running make I encounter the following error:

gcc.exe: error: libpqdll.def: No such file or directory
for target 'libpq.dll' failed

make[3]: *** [libpq.dll] Error 1
make[3]: Leaving directory

Makefile:17: recipe for target 'all-libpq-recurse' failed
make[2]: *** [all-libpq-recurse] Error 2
make[2]: Leaving directory

Makefile:34: recipe for target 'all-interfaces-recurse' failed
make[1]: *** [all-interfaces-recurse] Error 2
GNUmakefile:11: recipe for target 'all-src-recurse' failed
make: *** [all-src-recurse] Error 2

contain the file libpqdll.def

exists as well.

It's hard for me to decipher which file is reporting the error and

Any feedback would be greatly appreciated, thanks!
So when I try to run `make` I still get that error.  Please note that
I am doing a VPATH build (the build in a separate directory from the

x86_64-w64-mingw32-gcc.exe: error: libpqdll.def: No such file or directory
make[3]: *** [libpq.dll] Error 1
make[2]: *** [all-libpq-recurse] Error 2
make[1]: *** [all-interfaces-recurse] Error 2
make: *** [all-src-recurse] Error 2
I found a script that builds postgresql via MinGW-w64, and in it the
author specifically creates symlinks to libpqdll.def
https://github.com/Alexpux/MINGW-packages/blob/master/mingw-w64-postgresql/PKGBUILD#L72
-- excerpt below:

# Make DLL definition file visible during each arch build
ln -s
"\${srcdir}/postgresql-\$pkgver/src/interfaces/libpq/libpqdll.def"
src/interfaces/libpq/
ln -s
"\${srcdir}/postgresql-\$pkgver/src/interfaces/ecpg/ecpglib/libecpgdll.def"
src/interfaces/ecpg/ecpglib/
ln -s
"\${srcdir}/postgresql-\$pkgver/src/interfaces/ecpg/pgtypeslib/libpgtypesdll.def"
src/interfaces/ecpg/pgtypeslib/
ln -s
"\${srcdir}/postgresql-\$pkgver/src/interfaces/ecpg/compatlib/libecpg_compatdll.def"
src/interfaces/ecpg/compatlib/

Why are the symlinks needed to make the definition files visible?

Is this an issue with VPATH builds?  It is not mentioned in the docs
where VPATH builds are discussed (section 15.4
http://www.postgresql.org/docs/9.5/static/install-procedure.html )

jacana does VPATH builds with pretty much this setup all the time. See
for example

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

```On Mon, Jan 18, 2016 at 6:47 AM, Ashutosh Bapat
wrote:
> Thanks Thom for bringing it to my notice quickly. Sorry for the same.
>
> Here are the patches.
>
> 1. pg_fdw_core_v2.patch: changes in core related to user mapping handling,
> GUC
>   enable_foreignjoin

I tried to whittle this patch down to something that I'd be
comfortable committing and ended up with nothing left.

First, I removed the enable_foreignjoin GUC.  I still think an
FDW-specific GUC is better, and I haven't heard anybody make a strong
argument the other way. Your argument that this might be inconvenient
if somebody is using a bunch of join-pushdown-enabled FDWs sounds like
a strictly theoretical problem, considering how much difficulty we're
having getting even one FDW to support join pushdown.  And if it does
happen, the user can script it.  I'm willing to reconsider this point
if there is a massive chorus of support for having this be a core
option rather than an FDW option, but to me it seems that we've gone
to a lot of trouble to make the system extensible and might as well
get some benefit from it.

Second, I removed the documentation for GetForeignTable().  That
function is already documented and doesn't need re-documenting.

Third, I removed GetUserMappingById().  As mentioned in the email to
which I replied earlier, that doesn't actually produce the same result
as the way we're doing it now, and might leave the user ID invalid.
Even if that were no issue, it doesn't seem to add anything.  The only
caller of the new function is  postgresBeginForeignScan(), and that
function already has a way of getting the user mapping.  The new way
doesn't seem to be better or faster, so why bother changing it?

At this point, I was down to just the changes to store the user
mapping ID (umid) in the RelOptInfo, and to consider join pushdown
only if the user mapping IDs match.  One observation I made is that if
the code to initialize the FDW-related fields were lifted from
get_relation_info() up to build_simple_rel(), we would not need to use
planner_rt_fetch(), because the caller already has that information.
That seems like it might be worth considering.  But then I realized a
more fundamental problem: making the plan depend on the user ID is a
problem, because the user ID can be changed, and the plan might be
cached.  The same issue arises for RLS, but there is provision for
that in RevalidateCachedQuery.  This patch makes no similar provision.

I think there are two ways forward here.  One is to figure out a way
for the plancache to invalidate queries using FDW join pushdown when
the user ID changes.  The other is to recheck at execution time
whether the user mapping IDs still match, and if not, fall back to
using the "backup" plan that we need anyway for EvalPlanQual rechecks.
This would of course mean that the backup plan would need to be
something decently efficient, not just whatever we had nearest to
hand.  But that might not be too hard to manage.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] make error - libpqdll.def No such file or directory

```Igal @ Lucee.org wrote:
> On 1/19/2016 10:58 AM, Alvaro Herrera wrote:
> >Yes, probably something like that. I think it failed the first time
> >because there was a bug (the one I introduced in a967613911f7), then
> >probably changed to src/backend and ran compiles there which probably
> >worked fine, leading to commit fa838b555f90. I might or might not have
> >removed the complete build dir instead of "make distclean"; not sure TBH.
> >As I recall, I tried a couple of builds after I pushed the fix commit and
> >couldn't get them to run at all. But since I saw in buildfarm that other
> >mingw builds were working, I lost interest. Then I deleted the mingw build
> >tree and didn't get back to retrying. TBH the only reason I mingled with
> >mingw at all is that nobody seems interested in fixing Win32 issues, and
> >this patch had been lingering for far too long.
>
> in the distribution that I was using ( from
> http://www.postgresql.org/ftp/source/v9.5.0/ ), so I went ahead and cloned
> the git repo and used the sources from there.
>
> I am happy to announce that I was finally able to build and run PostgreSQL
> on MinGW-w64, but it only worked with the sources from the git repo, and not
> a bug there that was resolved, or if it is packaged differently, but I was
> be easier to build from that distribution as opposed to the git repo.

Hmm, I guess this makes sense.  A tree obtained from a shipped tarball
has "make distprep" already run, which means it contains a bunch more
files than clean Git workspace -- probably errcodes.h among them, for
example.  This smells like a makefile bug of some kind to me.

--
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### [HACKERS] Buildfarm server move complete

```

The buildfarm server move is complete. Thanks to all who helped,
especially Stephen Frost.

There might be some small performance regressions which we'll be digging
into.

Next step: move the mailing lists off pgfoundry. The new lists have been
set up I will be working on that migration next week.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: NextXID format change (was Re: [HACKERS] exposing pg_controldata and pg_config as functions)

```Joe Conway wrote:

> The attached includes Bruce's change, plus I found two additional sites
> that appear to need the same change. The xlog.c change is just a DEBUG
> message, so not a big deal. I'm less certain if the xlogdesc.c change
> might create some fallout.

Hm, pg_xlogdump links the rmgrdesc files, so perhaps you might need to
adjust expected test output for it.  Not really sure.

--
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: NextXID format change (was Re: [HACKERS] exposing pg_controldata and pg_config as functions)

```On Wed, Jan 20, 2016 at 11:41 AM, Alvaro Herrera
wrote:
> Joe Conway wrote:
>
>> The attached includes Bruce's change, plus I found two additional sites
>> that appear to need the same change. The xlog.c change is just a DEBUG
>> message, so not a big deal. I'm less certain if the xlogdesc.c change
>> might create some fallout.
>
> Hm, pg_xlogdump links the rmgrdesc files, so perhaps you might need to
> adjust expected test output for it.  Not really sure.

We don't depend on this output format in any tests AFAIK, at least
check-world is not complaining here and pg_xlogdump has no dedicated
tests. There may be some utility in the outside world doing some
manipulation of the string generated for this record, but that's not

Patch looks fine, I have not spotted any other places that need a refresh.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### [HACKERS] Why format() adds double quote?

```test=# select format('%I', t) from t1;
format
--
aaa
"AAA"
"あいう"
(3 rows)

Why is the text value of the third line needed to be double quoted?
(note that it is a multi byte character). Same thing can be said to
quote_ident().

We treat identifiers made of the multi byte characters without double
quotation (non delimited identifier) in other places.

test=# create table t2(あいう text);
CREATE TABLE
test=# insert into t2 values('aaa');
INSERT 0 1
test=# select あいう from t2;
あいう

aaa
(1 row)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] plpgsql - DECLARE - cannot to use %TYPE or %ROWTYPE for composite types

```On Tue, Jan 19, 2016 at 4:53 AM, Pavel Stehule  wrote:
> It is, but sometime the polymorphic types can help.
>
> The proposed feature/syntax has sense primary for polymorphic types. It
> should to follow our polymorphic types. The primary pair is
> "anyarray","anyelement"  -> "arraytype","elemementtype".
>
> If you don't use polymorphic parameters in plpgsql, then proposed feature
> can look like useless.

I don't think it's useless, but I do think the syntax is ugly.  Maybe
it's the best we can do and we should just live with it, but Alvaro
asked for opinions, so there's mine.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

```On Wed, Jan 20, 2016 at 1:12 PM, Thomas Munro
wrote:
> On Wed, Dec 30, 2015 at 5:15 PM, Thomas Munro
>  wrote:
>> On Wed, Nov 18, 2015 at 11:50 PM, Thomas Munro
>>  wrote:
>>> Here is a new version of the patch with a few small improvements:
>>> ...
>>
>> That didn't apply after 6e7b3359 (which fix a typo in a comment that I
>> moved).  Here is a new version that does.
>
> That one conflicts with b1a9bad9e744857291c7d5516080527da8219854, so
> here is a new version.

You should try to register it to a CF, though it may be too late for
9.6 if that's rather intrusive.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] Support for N synchronous standby servers - take 2

```On Tue, Jan 19, 2016 at 1:52 AM, Thom Brown  wrote:
> On 3 January 2016 at 13:26, Masahiko Sawada  wrote:
>> On Fri, Dec 25, 2015 at 7:21 AM, Thomas Munro
>>  wrote:
>>> On Fri, Dec 25, 2015 at 8:50 AM, Masahiko Sawada
>>> wrote:
On Wed, Dec 23, 2015 at 8:45 AM, Thomas Munro
wrote:
> On Wed, Dec 23, 2015 at 3:50 PM, Thomas Munro
>  wrote:
>> If you got rid of SyncRepGetSyncStandbysOnePriority as suggested
>> above, then this function could be renamed to SyncRepGetSyncStandbys.
>> I think it would be a tiny bit nicer if it also took a Size n argument
>> along with the output buffer pointer.

Sorry, I could not get your point. SyncRepGetSyncStandbysPriority()
function uses synchronous_standby_num which is global variable.
But you mean that the number of synchronous standbys is given as
function argument?
>>>
>>> Yeah, I was thinking of it as the output buffer size which I would be
>>> inclined to make more explicit (I am still coming to terms with the
>>> use of global variables in Postgres) but it doesn't matter, please
>>> disregard that suggestion.
>>>
>> As for the body of that function (which I won't paste here), it
>> contains an algorithm to find the top K elements in an array of N
>> elements.  It does that with a linear search through the top K seen so
>> far for each value in the input array, so its worst case is O(KN)
>> comparisons.  Some of the sorting gurus on this list might have
>> something to say about that but my take is that it seems fine for the
>> tiny values of K and N that we're dealing with here, and it's nice
>> that it doesn't need any space other than the output buffer, unlike
>> some other top-K algorithms which would win for larger inputs.

Yeah, it's improvement point.
But I'm assumed that the number of synchronous replication is not
large, so I use this algorithm as first version.
And I think that its worst case is O(K(N-K)). Am I missing something?
>>>
>>> You're right, I was dropping that detail, in the tradition of the
>>> hand-wavy school of big-O notation.  (I suppose you could skip the
>>> inner loop when the priority is lower than the current lowest
>>> priority, giving a O(N) best case when the walsenders are perfectly
>>> ordered by coincidence.  Probably a bad idea or just not worth
>>
>> Thank you for reviewing the patch.
>> Yeah, I added the logic that skip the inner loop.
>>
>>>
>>>
>>> +/*
>>> + * Obtain currently synced LSN location: write and flush, using priority
>>> - * In 9.1 we support only a single synchronous standby, chosen from a
>>> - * priority list of synchronous_standby_names. Before it can become the
>>> + * In 9.6 we support multiple synchronous standby, chosen from a priority
>>>
>>> s/standby/standbys/
>>>
>>> + * list of synchronous_standby_names. Before it can become the
>>>
>>> s/Before it can become the/Before any standby can become a/
>>>
>>>   * synchronous standby it must have caught up with the primary; that may
>>>   * take some time. Once caught up, the current highest priority standby
>>>
>>> s/standby/standbys/
>>>
>>>   * will release waiters from the queue.
>>>
>>> +bool
>>> +SyncRepGetSyncLsnsPriority(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
>>> +{
>>> + int sync_standbys[synchronous_standby_num];
>>>
>>> I think this should be sync_standbys[SYNC_REP_MAX_SYNC_STANDBY_NUM].
>>> (Variable sized arrays are a feature of C99 and PostgreSQL is written
>>> in C89.)
>>>
>>> +/*
>>> + * Populate a caller-supplied array which much have enough space for
>>> + * synchronous_standby_num. Returns position of standbys currently
>>> + * considered as synchronous, and its length.
>>> + */
>>> +int
>>> +SyncRepGetSyncStandbys(int *sync_standbys)
>>>
>>> s/much/must/ (my bad, in previous email).
>>>
>>> + ereport(ERROR,
>>> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>>> + errmsg("The number of synchronous standbys must be smaller than the
>>> number of listed : %d",
>>> + synchronous_standby_num)));
>>>
>>> How about "the number of synchronous standbys exceeds the length of
>>> the standby list: %d"?  Error messages usually start with lower case,
>>> ':' is not usually preceded by a space.
>>>
>>> + ereport(ERROR,
>>> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>>> + errmsg("The number of synchronous standbys must be between 1 and %d : %d",
>>>
>>> s/The/the/, s/ : /: /
>>
>> Fixed you mentioned.
>>
>> Attached latest v5 patch.
>
> synchronous_standby_num doesn't appear to be a valid GUC name:
>
> LOG:  unrecognized configuration parameter "synchronous_standby_num"
> in file "/home/thom/Development/test/primary/postgresql.conf" line 244
>
> All I did was uncomment it and set it to a value.
>

```

### Re: [HACKERS] Why format() adds double quote?

```2016-01-20 3:47 GMT+01:00 Tatsuo Ishii :

> test=# select format('%I', t) from t1;
>   format
> --
>  aaa
>  "AAA"
>  "あいう"
> (3 rows)
>
> Why is the text value of the third line needed to be double quoted?
> (note that it is a multi byte character). Same thing can be said to
> quote_ident().
>
> We treat identifiers made of the multi byte characters without double
> quotation (non delimited identifier) in other places.
>
> test=# create table t2(あいう text);
> CREATE TABLE
> test=# insert into t2 values('aaa');
> INSERT 0 1
> test=# select あいう from t2;
>  あいう
>
>  aaa
> (1 row)
>

format uses same routine as quote_ident. So quote_ident should be fixed
first.

Regards

Pavel

>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

```

### Re: [HACKERS] Window2012R2: initdb error: "The current directory is invalid."

```On 19 January 2016 at 18:49, Huong Dangminh
wrote:

> Hi all,
>
> I have failed in execute initdb in "c:\Windows\Temp\dir" folder as
> reproduce logs below. The OS's messages "The current directory is invalid."
> was returned.
>

Is that directory or any parent of it a junction point?

http://superuser.com/q/823959

Also, why would you possibly run the installer from there? You should be
using %TEMP% which should generally resolve

> creating template1 database in c:/testdb/base/1 ... The current directory
> is invalid.

It looks like somehow the current working directory is unmapped or cannot
be found.

Is there any chance you have mapped network drives in use?

--
PostgreSQL Development, 24x7 Support, Training & Services

```

### Re: [HACKERS] dynloader.h missing in prebuilt package for Windows?

```On Mon, Jan  4, 2016 at 09:50:40PM -0800, Michael Paquier wrote:
> On Tue, Jan 5, 2016 at 2:27 PM, Tom Lane  wrote:
> > Michael Paquier  writes:
> >> The patch would put the buildfarm in red as it is incomplete anyway,
> >> port/dynloader/win32.h. Instead of this patch I would be incline to
> >> remove the #define stuff with dynloader.h that use WIN32_ONLY_COMPILER
> >> (see for example dfmgr.c) and just copy the header in include/. This
> >> way we use the same header for all platforms.
> >
>
> Sure, here you go. Bruce's patch simply forgot to copy the header file
> via Solution.pm, so installation just failed. The change of dfmgr.c is
> actually not mandatory but I think that as long as dynloader.h is
> copied in include/ we had better change that as well, and it makes the
> code cleaner.

I have applied this patch all the way back to 9.1.  This means PL/Java
can be cleanly built via MSVC on Windows for all installs after the next
set of minor releases.

--
Bruce Momjian  http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] plpgsql - DECLARE - cannot to use %TYPE or %ROWTYPE for composite types

```2016-01-20 0:34 GMT+01:00 Robert Haas :

> On Tue, Jan 19, 2016 at 4:53 AM, Pavel Stehule
> wrote:
> > It is, but sometime the polymorphic types can help.
> >
> > The proposed feature/syntax has sense primary for polymorphic types. It
> > should to follow our polymorphic types. The primary pair is
> > "anyarray","anyelement"  -> "arraytype","elemementtype".
> >
> > If you don't use polymorphic parameters in plpgsql, then proposed feature
> > can look like useless.
>
> I don't think it's useless, but I do think the syntax is ugly.  Maybe
> it's the best we can do and we should just live with it, but Alvaro
> asked for opinions, so there's mine.
>

ok

5 years ago, maybe more - I proposed more nice syntax - and it was rejected
as too complex (reserved worlds was required). So this solution try to
attack it from different side. It is simple and effective.

Regards

Pavel

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

```

### Re: [HACKERS] Support for N synchronous standby servers - take 2

```On Tue, Jan 19, 2016 at 2:55 PM, Michael Paquier
wrote:
> On Tue, Jan 19, 2016 at 1:40 AM, Masahiko Sawada
> wrote:
>> On Mon, Jan 18, 2016 at 1:20 PM, Michael Paquier
>>  wrote:
>>> On Sun, Jan 17, 2016 at 11:09 PM, Masahiko Sawada
>>> wrote:
On Wed, Jan 13, 2016 at 1:54 AM, Alvaro Herrera wrote:
* Confirm value of pg_stat_replication.sync_state (sync, async or
potential)
* Confirm that the data is synchronously replicated to multiple
standbys in same cases.
* case 1 : The standby which is not listed in s_s_name, is down
* case 2 : The standby which is listed in s_s_names but potential
standby, is down
* case 3 : The standby which is considered as sync standby, is down.
* Standby promotion

In order to confirm that the commit isn't done in case #3 forever
unless new sync standby is up, I think we need the framework that
cancels executing query.
That is, what I'm planning is,
1. Set up master server (s_s_name = '2, standby1, standby2)
2. Set up two standby servers
3. Standby1 is down
4. Create some contents on master (But transaction is not committed)
5. Cancel the #4 query. (Also confirm that the flush location of only
standby2 makes progress)
>>>
>>> This will need some thinking and is not as easy as it sounds. There is
>>> no way to hold on a connection after executing a query in the current
>>> TAP infrastructure. You are just mentioning case 3, but actually cases
>>> 1 and 2 are falling into the same need: if there is a failure we want
>>> to be able to not be stuck in the test forever and have a way to
>>> cancel a query execution at will. TAP uses psql -c to execute any sql
>>> queries, but we would need something that is far lower-level, and that
>>> would be basically using the perl driver for Postgres or an equivalent
>>> here.
>>>
>>> Honestly for those tests I just thought that we could get to something
>>> reliable by just looking at how each sync replication setup reflects
>>> in pg_stat_replication as the flow is really getting complicated,
>>> giving to the user a clear representation at SQL level of what is
>>> actually occurring in the server depending on the configuration used
>>> being important here.
>>
>> I see.
>> We could check the transition of sync_state in pg_stat_replication.
>> I think it means that it tests for each replication method (switching
>> state) rather than synchronization of replication.
>>
>> What I'm planning to have are,
>> * Confirm value of pg_stat_replication.sync_state (sync, async or potential)
>> * Standby promotion
>> * Standby catching up master
>> And each replication method has above tests.
>>
>> Are these enough?
>
> Does promoting the standby and checking that it caught really have
> value in this context of this patch? What we just want to know is on a
> master, which nodes need to be waited for when s_s_names or any other
> method is used, no?

Yeah, these 2 tests are not in this context of this patch.
If test framework could have the facility that allows us to execute
query(psql) as another process, we could use pg_cancel_backend()
function to waiting process when master server waiting for standbys.
In order to check whether the master server would wait for the standby
or not, we need test framework to have such facility, I think.

Regards,

--

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] Re: pglogical_output - a general purpose logical decoding output plugin

```On 20 January 2016 at 06:23, Tomasz Rybak  wrote:

> The following review has been posted through the commitfest application:
>

Thanks!

>
> + /* Protocol capabilities */
> + #define PGLOGICAL_PROTO_VERSION_NUM 1
> + #define PGLOGICAL_PROTO_MIN_VERSION_NUM 1
> Is this protocol version number and minimal recognized version number,
> or major and minor version number? I assume that
> PGLOGICAL_PROTO_VERSION_NUM
> is current protocol version (as in config max_proto_version and
> min_proto_version). But why we have MIN_VERSION and not MAX_VERSION?
>
> From code in pglogical_output.c (lines 215-225 it looks like
> PGLOGICAL_PROTO_VERSION_NUM is maximum, and PGLOGICAL_PROTO_MIN_VERSION_NUM
> is treated as minimal protocol version number.
>
> I can see that those constants are exported in pglogical_infofuncs.c and
> pglogical.sql, but I do not understand omission of MAX.
>

want to get right but I'm not totally confident of.

The idea here is that we want downwards compatibility as far as possible
and maintainable but we can't really be upwards compatible for breaking
protocol revisions. So the output plugin's native protocol version is
inherently the max protocol version and we don't need a separate MAX.

The downstream connects and declares to the upstream "I speak protocol 2
through 3". The upstream sees this and replies "I speak protocol 1 through
2. We have protocol 2 in common so I will use that." Or alternately replies
with an error "I only speak protocol 1 so we have no protocol in common".
This is done via the initial parameters passed by the downstream to the
logical decoding plugin and then via the startup reply message that's the
first message on the logical decoding stream.

We can't do a better handshake than this because the underlying walsender
protocol and output plugin API only gives us one chance to send free-form
information to the output plugin and it has to do so before the output
plugin can send anything to the downstream.

As much as possible I want to avoid ever needing to do a protocol bump
anyway, since it'll involve adding conditionals and duplication. That's why
the protocol tries so hard to be extensible and rely on declared
capabilities rather than protocol version bumps. But I'd rather plan for it
than be unable to ever do it in future.

Much (all?) of this is discussed in the protocol docs. I should probably
double check that and add a comment that refers to them there.

> + typedef struct HookFuncName
>

Thanks. That's residue from the prior implementation of hooks, which used
direct pg_proc lookups and cached the FmgrInfo in a dynahash. It's no
longer required now that we're using a single hook entry point and direct C

> + typedef struct PGLogicalTupleData
> I haven't found those used anything, and they are not mentioned in
> documentation. Are those structures needed?
>

That snuck in from the pglogical downstream during the split into a
separate tree. It's dead code as far as pglogical_output is concerned.
Removed.

> + pglogical_config.c:
> +   switch(get_param_key(elem->defname))
> +   {
> +   val = get_param_value(elem, false,
> OUTPUT_PARAM_TYPE_UINT32);

Why do we need this line here? All cases contain some variant of
> val = get_param_value(elem, false, TYPE);
> as first line after "case PARAM_*:" so this line should be removed.
>

Correct. That seems to be an escapee editing error. Thanks, removed.

> +   val = get_param(options, "startup_params_format", false, false,
> +   OUTPUT_PARAM_TYPE_UINT32, );
> +
> +   params_format = DatumGetUInt32(val);
> Shouldn't we check "found" here? We work with val (which is Datum(0)) -
> won't
> it throw SIGFAULT, or similar?
>

get_param is called with missing_ok=false so it ERRORs and can never return
!found . In any case it'd return (Datum)0 so we'd just get (uint32)0 not a
crash.

To make this clearer I've changed get_param so it supports NULL as a value
for found.

> Additionally - I haven't found any case where code would use "found"
> passed from get_param() - so as it's not used it might be removed.
>

Probably, but I expect it to be useful later. It was used before a
restructure of how params get read. I don't mind removing it if you feel
strongly about it, but it'll probably just land up being added back at some
point.

>
>
+   elog(DEBUG1, "Binary mode rejected: Server and client
> endian sizeof(long) mismatch");
> Isn't "endian" here case of copy-paste from first error?
>

Yes, it is. Thanks.

> + static void pg_decode_shutdown(LogicalDecodingContext * ctx)
> In pg_decode_startup we create main memory context, and create hooks memory
> context. In pg_decode_shutdown we delete hooks memory context but not main
> context. Is this OK, or should we also add:
> ```

```On Wed, Dec 30, 2015 at 5:15 PM, Thomas Munro
wrote:
> On Wed, Nov 18, 2015 at 11:50 PM, Thomas Munro
>  wrote:
>> Here is a new version of the patch with a few small improvements:
>> ...
>
> That didn't apply after 6e7b3359 (which fix a typo in a comment that I
> moved).  Here is a new version that does.

That one conflicts with b1a9bad9e744857291c7d5516080527da8219854, so
here is a new version.

--
Thomas Munro
http://www.enterprisedb.com

Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### [HACKERS] Commented code related to flock() in be-secure-openssl.c

```Hi all,

I just bumped into the following code in be-secure-openssl.c:
/*  flock(fileno(fp), LOCK_SH); */
dh = PEM_read_DHparams(fp, NULL, NULL, NULL);
/*  flock(fileno(fp), LOCK_UN); */
Those have been added by 55d0532, dated as of 2002. Perhaps there is
some point to remove those lines?
Regards,
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] dynloader.h missing in prebuilt package for Windows?

```On Wed, Jan 20, 2016 at 1:34 PM, Bruce Momjian  wrote:
> On Mon, Jan  4, 2016 at 09:50:40PM -0800, Michael Paquier wrote:
>> On Tue, Jan 5, 2016 at 2:27 PM, Tom Lane  wrote:
>> > Michael Paquier  writes:
>> >> The patch would put the buildfarm in red as it is incomplete anyway,
>> >> port/dynloader/win32.h. Instead of this patch I would be incline to
>> >> remove the #define stuff with dynloader.h that use WIN32_ONLY_COMPILER
>> >> (see for example dfmgr.c) and just copy the header in include/. This
>> >> way we use the same header for all platforms.
>> >
>>
>> Sure, here you go. Bruce's patch simply forgot to copy the header file
>> via Solution.pm, so installation just failed. The change of dfmgr.c is
>> actually not mandatory but I think that as long as dynloader.h is
>> copied in include/ we had better change that as well, and it makes the
>> code cleaner.
>
> I have applied this patch all the way back to 9.1.  This means PL/Java
> can be cleanly built via MSVC on Windows for all installs after the next
> set of minor releases.

Thanks. I'll keep an eye on the buildfarm in case.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

```On Wed, Jan 20, 2016 at 12:41 AM, Robert Haas  wrote:
>
> On Mon, Jan 18, 2016 at 10:41 PM, Amit Kapila
wrote:
> > Second important and somewhat related point is whether we should save
> > this information in PGPROC as 4 bytes or keep it in pgBackendStatus.
> > I think it is better to store in PGPROC, if we want to save wait
information
> > for backend processes as well.
> >
> > I am of opinion that we should save this information in PGPROC and
> > expose it via new view, but I am open to go other ways based on what
>
> My opinion is that storing the information in PGPROC is better because
> it seems like we can fairly painlessly expose 4 bytes of data that way
> instead of 1, which is nice.
>

Okay, do you mean to say that we can place this new 4-byte variable
in PGPROC at 4-byte aligned boundary, so both read and writes will be
atomic?

> On the topic of the UI, I understand that redefining
> pg_stat_activity.waiting might cause some short-term annoyance.  But I
> think in the long term what we are proposing here is going to be a
> huge improvement, so I think it's worth the compatibility break.  If
> we say that pg_stat_activity.waiting has to continue meaning "waiting
> for a heavyweight lock" even though we now also expose (in some other
> location) information on other kinds of waits, that's going to be
> confusing to users.
>

If we want to go via this route, then the first thing which we need to
decide is whether we want to start displaying the information of
background processes like WALWriter and others in pg_stat_activity?
Second thing that needs some thoughts is that functions like
pg_stat_get_activity() needs to rely both on PgBackendStatus and
PGProc and we might also need to do some special handling for
background processes if want the information for those processes
in this view.

>  It's better to force people to update their
> queries once than to have this confusing wart in the system forever.
> I predict that if we make backward compatibility the priority here,
> we'll still be explaining it to smart but confused people when 9.6
> goes EOL.
>

Valid point, OTOH we can update the docs to say that
pg_stat_activity.waiting parameter is deprecated and after a
release or two we can get rid of this parameter.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

```

### Re: [HACKERS] Why format() adds double quote?

```> 2016-01-20 3:47 GMT+01:00 Tatsuo Ishii :
>
>> test=# select format('%I', t) from t1;
>>   format
>> --
>>  aaa
>>  "AAA"
>>  "あいう"
>> (3 rows)
>>
>> Why is the text value of the third line needed to be double quoted?
>> (note that it is a multi byte character). Same thing can be said to
>> quote_ident().
>>
>> We treat identifiers made of the multi byte characters without double
>> quotation (non delimited identifier) in other places.
>>
>> test=# create table t2(あいう text);
>> CREATE TABLE
>> test=# insert into t2 values('aaa');
>> INSERT 0 1
>> test=# select あいう from t2;
>>  あいう
>>
>>  aaa
>> (1 row)
>
> format uses same routine as quote_ident. So quote_ident should be fixed
> first.

Yes, I had that in my mind too.

Attached is the proposed patch to fix the bug.
Regression tests passed.

Here is an example after the patch. Note that the third row is not
quoted any more.

test=#  select format('%I', あいう) from t2;
format

aaa
"AAA"
あああ
(3 rows)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
index 3783e97..b93fc27 100644
@@ -9405,7 +9405,7 @@ quote_identifier(const char *ident)
* would like to use  macros here, but they might yield unwanted
* locale-specific results...
*/
-	safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_');
+	safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_' || IS_HIGHBIT_SET(ident[0]));

for (ptr = ident; *ptr; ptr++)
{
@@ -9413,7 +9413,8 @@ quote_identifier(const char *ident)

if ((ch >= 'a' && ch <= 'z') ||
(ch >= '0' && ch <= '9') ||
-			(ch == '_'))
+			(ch == '_') ||
+			(IS_HIGHBIT_SET(ch)))
{
/* okay */
}

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] Logical decoding on standby

```On 19 January 2016 at 23:30, Дмитрий Сарафанников
wrote:

When you plan to add logical decoding on standby?
>

> it is useful to have separate standby server for logical replication that
> will not break the master if you make a mistake in plugin.
>

Indeed.  It might PANIC it and force a restart, if that's what you mean. I
guess in an extreme case you could clobber shmem silently, which would be
walsender backends?). So it certainly seems like it'd be nice to have.
There's also the advantage that you could shift load off the master by
having physical master=>replica replication on a fast wide link, then do
logical decoding from there to send over the wire to WAN sites etc.

Unfortunately it's not particularly simple and nobody seems to have time to
implement it. As Álvaro pointed out, sometimes you have to do the work if
you want the change to happen. Or find someone with the existing skills and
convince them to want to do it, but most of those people are already very,
very busy.

As part of the failover slots work Simon noted that:

"To prevent needed rows from being removed we need we would need to enhance
hot_standby_feedback so it sends both xmin and catalog_xmin to the master."

... which means a protocol change in the walsender protocol. So you're
looking at that plus the other comment given above, that

"We need to be able to correctly and quickly identify the timeline a  LSN
belongs to"

which is new internal infrastructure and new state in the replica that
must be maintained. Probably persistently.

In other words ... this isn't especially simple to do. It requires at least
two non-trivial core changes. Nobody who has the skills to do it reasonably
quickly also has the time to do it at all or has other higher priority
things to do first. So it's not likely to happen soon unless you or someone
like you steps up and has a go at it.

If you do decide to attempt to implement it and you're willing to read a
fair bit of code and documentation to do so you'll find people here pretty
helpful with suggestions, ideas, and help if you get stuck. But only if you
put in the work yourself.

(On a side note, failover slots probably won't be usable from a standby
either. They have to write to WAL and you can't write to WAL from a
standby. So if slot support is ever added on a standby it'll probably be
ephemeral slots only.)

--
PostgreSQL Development, 24x7 Support, Training & Services

```

### Re: [HACKERS] Rethinking TRANSFORM FOR TYPE ...

```Hi

2016-01-19 22:34 GMT+01:00 Jim Nasby :

> I'm using the TRANSFORM feature to implement a new data type for python
> (ndarrays from numpy). I'm constantly getting tripped up by forgetting to
> add TRANSFORM FOR TYPE. Worse, the requirement for explicitly stating
> transform means I can't use a polymorphic type.
>
> In the case of adding a new transform for an existing type, current
> behavior makes sense; you'll break all existing functions using the type if
> you just swap the representation out under them. Further, if you are
> pulling in some new extension that uses the same language and type, that
> function will be expecting the old representation, not the new one.
>
>
the one important motivation was "don't break current code" - so TRANSFORM
discussion. Other reasons can be performance. When you add new TRANSFORM,
you don't need to recreate plans - or some similar (I don't remember it
well).

> For the case of creating a new data type, I think explicitly requiring the
> TRANSFORM clause makes no sense. It's a bunch of extra work for users that
>
> A simple way to fix this would be to allow simply marking a transform as
> being DEFAULT. If a transform is marked as DEFAULT then it would
> automatically get used.
>
> Perhaps a better way would be allowing for multiple transforms for each
> language and type. That way users aren't stuck with a single preconceived
> notion of how to represent a type. The immediate use I see for that is it
> would allow a transform to be created in something other than C, as long as
> the language you want to use can handle a raw C string. That desire might
> sound silly to a lot of -hackers, but given the amount of pain I went
> through figuring out how to properly marshal an ndarray back and forth in
> C, I sure as hell wish I could have done it in python! Since plpythonu
> understands bytea, I don't see any reason I couldn't have.

This topic is open and can be enhanced - but you have to be careful about
performance.

Regards

Pavel

>
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

```

### Re: [HACKERS] Stream consistent snapshot via a logical decoding plugin as a series of INSERTs

```On 15 January 2016 at 16:30, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

> I'd like to propose generic functions (probably in an extension, or in
> core if not possible otherwise) to facilitate streaming existing data from
> the database *in the same format* that one would get if these would be the
> changes decoded by a logical decoding plugin.
>

So effectively produce synthetic logical decoding callbacks to run a bunch
of fake INSERTs, presumably with a fake xid etc?

> The idea is to use a snapshot returned from CREATE_REPLICATION_SLOT
> command of the replication protocol to get a consistent snapshot of the
> database, then start listening to new changes on the slot.
>

My impression is that you want to avoid the current step of "synchronize
database initial contents" when using logical decoding for replication. But
I guess you're looking to then populate that empty schema in-band via
logical decoding, rather than having to do a --data-only dump or use COPY.
Right?

That won't help you for schema; presumably you'd still do a pg_dump
--schema-only | pg_restore for that.

Just like when restoring a --data-only dump or using COPY you'd have to
disable FKs during sync, but that's pretty much unavoidable.

> The way this initial export phase is implemented there is by providing a
> SQL-callable set returning function which is using SPI to run "SELECT *
> FROM mytable" behind the scenes and runs the resulting tuples through the
> INSERT callback of the logical decoding plugin, which lives in the same
> loadable module as this SQL function.
>

o_O

What about the reorder buffer, the logical decoding memory context, etc?

> Bottled Water logical decoding plugin uses binary protocol based on Avro
> data serialization library.  As an experiment I was adding support for JSON
> output format to it, and for that I had to re-implement the aforementioned
> SRF to export initial data to convert tuples to JSON instead.
>

Have you taken a look at what's been done with pglogical and
pglogical_output?

We've got extensible protocol support there, and if Avro offers compelling
benefits over the current binary serialization I'm certainly interested in

> What do you say?
>

Interesting idea. As outlined I think it sounds pretty fragile though; I
really, really don't like the idea of lying to the insert callback by
passing it a fake insert with (presumably) fake reorder buffer txn, etc.

What we've done in pglogical is take a --schema-only dump then, on the
downstream, attach to the exported snapshot and use COPY ... TO STDOUT over
a libpq connection to the upstream feed that to COPY ... FROM STDIN on
another libpq connection to "ourselves", i.e. the downstream. Unless Petr
changed it to use COPY innards directly on the downstream; I know he talked
about it but haven't checked if he did. Anyway, either way it's not pretty
and requires a sideband non-replication connection to sync initial state.
The upside is that it can be relatively easily parallelized for faster sync
using multiple connections.

To what extent are you setting up a true logical decoding context here?
Where does the xact info come from? The commit record? etc. You're
presumably not forming a reorder buffer then decoding it since it could
create a massive tempfile on disk, so are you just dummying this info up?
Or hoping the plugin won't look at it?

The functionality is good and I think that for the SQL level you'd have to
use SET TRANSACTION SNAPSHOT as you show. But I think it should really be
usable from the replication protocol too - and should try to keep the state
as close to that of a normal decoding session as possible. We'd at least
need a new walsender protocol command equivalent that took the snapshot
identifier, relation info and the other decoding params instead of a slot
name. Or, ideally, a variant on START_REPLICATION ... LOGICAL ... that
omits SLOT and instead takes TABLES as an argument, with a list of
relation(s) to sync. Unlike normal START_REPLICATION ... LOGICAL ... it'd
return to walsender protocol mode on completion, like the phys rep protocol
does when it's time for a timeline switch.

Rather than lie to the insert callback I'd really rather define a new
logical decoding callback for copying initial records. It doesn't get any
xact info (since it's not useful/relevant) or a full reorder buffer. No
ReorderBufferChange is passed; instead we pass something like a
ReorderBufferSync that contains the new tuple ReorderBufferTupleBuf, origin
id, origin lsn and commit timestamp (if known) and the RelFileNode
affected. The LogicalDecodingContext that's set up for the callback gets
ctx->reorder = NULL .  There's no ReorderBufferTxn argument and none is
defined.

Since it's a new callback the plugin knows the rules, knows it's getting
initial state data to sync over, etc. It doesn't have to try to guess if
it's seeing a real insert and act differently with respect to xact identity
etc.

Obviously that's ```

### Re: [HACKERS] Inconsistent error handling in START_REPLICATION command

```On 5 January 2016 at 18:35, Shulgin, Oleksandr  wrote:

> psycopg2test=# START_REPLICATION SLOT "test1" LOGICAL 0/0 ("invalid"
>> 'value');
>> unexpected PQresultStatus: 8
>>
>
I think the point here is that START_REPLICATION issues useful errors and
returns to the normal protocol up until the logical decoding startup
callback is invoked.

It enters COPY BOTH mode before it invokes the startup callback. The client
has no way to unilaterally terminate COPY BOTH mode and return to the
normal walsender protocol. The server doesn't do it when there's an ERROR.
So the only option the client has for recovery is to disconnect and
reconnect.

> I recall Craig Rigner mentioning this issue in context of the
>> pglogical_output plugin, but I thought that was something to do with the
>> startup packet.  The behavior above doesn't strike me as very consistent,
>> we should be able to detect and report errors during output plugin startup
>> and let the client retry with the corrected command as we do for syntax or
>> other errors.
>>
>
I agree. This would also let clients have some limited facility for
negotiating options with the output plugin, albeit via the very clumsy
channel of libpq protocol error message records and their text.

> Looks like the attached patch fixes it for me:
>>
>
For those who didn't read it, the patch moves the start of COPY BOTH mode
to after the logical decoding startup callback is invoked.

This certainly fixes the immediate issue. It also forecloses something else
I'd really like though, which is the ability for plugins to send output to
clients from within their decoding startup hook.

For that reason I'd actually like to enter COPY BOTH mode before the
startup callback, as is currently the case. So I'd like to wrap the
decoding startup callback in a PG_TRY that catches an ERROR raised by the
startup callback (if any) and exits COPY BOTH mode before re-throwing.

Sound reasonable?

I'm looking toward a future where the decoding startup callback can emit a
structured, plugin-specific CopyData-wrapped message with information the
downstream might need to negotiate a connection. It could then exit back to
walsender mode by throwing an ERROR. Or, better, by setting some new
decoding context field that's checked after each callback to see if the
plugin wants to exit COPY BOTH mode and return to walsender protocol mode
without spamming the logs with an ERROR. That way we can finally do two-way
negotiation between downstream and upstream.

(Related to this, I also want to add a new decoding callback that lets the
downstream send arbitrary CopyData back up the wire to the upstream where
it's passed to the decoding callback. This would give the downstream a
backchannel to change settings, request things from the output plugin, etc,
w/o lots of disconnects and reconnects.)

--
PostgreSQL Development, 24x7 Support, Training & Services

```

### Re: [HACKERS] Logical decoding on standby

```On 2016-01-20 15:11:06 +0800, Craig Ringer wrote:
> Unfortunately it's not particularly simple and nobody seems to have time to
> implement it.

FWIW, I don't think it's *that* hard.

> As Álvaro pointed out, sometimes you have to do the work if
> you want the change to happen. Or find someone with the existing skills and
> convince them to want to do it, but most of those people are already very,
> very busy.
>
> As part of the failover slots work Simon noted that:
>
> "To prevent needed rows from being removed we need we would need to enhance
> hot_standby_feedback so it sends both xmin and catalog_xmin to the master."
>
> ... which means a protocol change in the walsender protocol. So you're
> looking at that plus the other comment given above, that

Not a huge problem though.

> "We need to be able to correctly and quickly identify the timeline a  LSN
> belongs to"
>
>  which is new internal infrastructure and new state in the replica that
> must be maintained. Probably persistently.

I think it just needs a more efficient lookup structure over the
existing tliOfPointInHistory(), the data is all there.

> (On a side note, failover slots probably won't be usable from a standby
> either. They have to write to WAL and you can't write to WAL from a
> standby. So if slot support is ever added on a standby it'll probably be
> ephemeral slots only.)

ephemeral slots are a different thing. Anyway, at this point failover
slots aren't really proposed / have an agreed upon design yet, so it's a
bit hard to take them into account.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] Stream consistent snapshot via a logical decoding plugin as a series of INSERTs

```On Wed, Jan 20, 2016 at 7:57 AM, Craig Ringer  wrote:

> On 15 January 2016 at 16:30, Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de> wrote:
>
>
>> I'd like to propose generic functions (probably in an extension, or in
>> core if not possible otherwise) to facilitate streaming existing data from
>> the database *in the same format* that one would get if these would be the
>> changes decoded by a logical decoding plugin.
>>
>
> So effectively produce synthetic logical decoding callbacks to run a bunch
> of fake INSERTs, presumably with a fake xid etc?
>

Exactly.

> The idea is to use a snapshot returned from CREATE_REPLICATION_SLOT
>> command of the replication protocol to get a consistent snapshot of the
>> database, then start listening to new changes on the slot.
>>
>
> My impression is that you want to avoid the current step of "synchronize
> database initial contents" when using logical decoding for replication.
>

Yes, but...

> But I guess you're looking to then populate that empty schema in-band via
> logical decoding, rather than having to do a --data-only dump or use COPY.
> Right?
>
> That won't help you for schema; presumably you'd still do a pg_dump
> --schema-only | pg_restore for that.
>
> Just like when restoring a --data-only dump or using COPY you'd have to
> disable FKs during sync, but that's pretty much unavoidable.
>

All of this implies another *postgres* database on the receiving side,
which is not necessarily the case for my research.

The way this initial export phase is implemented there is by providing a
>> SQL-callable set returning function which is using SPI to run "SELECT *
>> FROM mytable" behind the scenes and runs the resulting tuples through the
>> INSERT callback of the logical decoding plugin, which lives in the same
>> loadable module as this SQL function.
>>
>
> o_O
>
> What about the reorder buffer, the logical decoding memory context, etc?
>

As shown by the POC patch, it is rather straightforward to achieve.

Bottled Water logical decoding plugin uses binary protocol based on Avro
>> data serialization library.  As an experiment I was adding support for JSON
>> output format to it, and for that I had to re-implement the aforementioned
>> SRF to export initial data to convert tuples to JSON instead.
>>
>
> Have you taken a look at what's been done with pglogical and
> pglogical_output?
>
> We've got extensible protocol support there, and if Avro offers compelling
> benefits over the current binary serialization I'm certainly interested in
>

This is what I'm going to benchmark.  With the generic function I can just
create two slots: one for pglogical and another one for BottledWater/Avro
and see which one performs better when forced to stream some TB worth of
INSERTs through the change callback.

What do you say?
>>
>
> Interesting idea. As outlined I think it sounds pretty fragile though; I
> really, really don't like the idea of lying to the insert callback by
> passing it a fake insert with (presumably) fake reorder buffer txn, etc.
>

Fair enough.  However for performance testing it could be not that bad,
even if nothing of that lands in the actual API.

What we've done in pglogical is take a --schema-only dump then, on the
> downstream, attach to the exported snapshot and use COPY ... TO STDOUT over
> a libpq connection to the upstream feed that to COPY ... FROM STDIN on
> another libpq connection to "ourselves", i.e. the downstream. Unless Petr
> changed it to use COPY innards directly on the downstream; I know he talked
> about it but haven't checked if he did. Anyway, either way it's not pretty
> and requires a sideband non-replication connection to sync initial state.
> The upside is that it can be relatively easily parallelized for faster sync
> using multiple connections.
>

I've also measured that to have a baseline for comparing it to decoding
performance.

To what extent are you setting up a true logical decoding context here?
>

It is done in the same way exact pg_logical_slot_get/peek_changes() do.

> Where does the xact info come from? The commit record? etc.
>

palloc0()

> You're presumably not forming a reorder buffer then decoding it since it
> could create a massive tempfile on disk, so are you just dummying this info
> up?
>

In my experience, it doesn't.  We know it's going to be a "committed xact",
so we don't really need to queue the changes up before we see a "commit"
record.

> Or hoping the plugin won't look at it?
>

Pretty much. :-)

The functionality is good and I think that for the SQL level you'd have to
> use SET TRANSACTION SNAPSHOT as you show. But I think it should really be
> usable from the replication protocol too - and should try to keep the state
> as close to that of a normal decoding session as possible. We'd at least
> need a new walsender protocol command equivalent that took the snapshot
> identifier, relation info and the other decoding params instead of a slot
> ```

### Re: [HACKERS] Why format() adds double quote?

```Hi

2016-01-20 7:20 GMT+01:00 Tatsuo Ishii :

> > 2016-01-20 3:47 GMT+01:00 Tatsuo Ishii :
> >
> >> test=# select format('%I', t) from t1;
> >>   format
> >> --
> >>  aaa
> >>  "AAA"
> >>  "あいう"
> >> (3 rows)
> >>
> >> Why is the text value of the third line needed to be double quoted?
> >> (note that it is a multi byte character). Same thing can be said to
> >> quote_ident().
> >>
> >> We treat identifiers made of the multi byte characters without double
> >> quotation (non delimited identifier) in other places.
> >>
> >> test=# create table t2(あいう text);
> >> CREATE TABLE
> >> test=# insert into t2 values('aaa');
> >> INSERT 0 1
> >> test=# select あいう from t2;
> >>  あいう
> >>
> >>  aaa
> >> (1 row)
> >
> > format uses same routine as quote_ident. So quote_ident should be fixed
> > first.
>
> Yes, I had that in my mind too.
>
> Attached is the proposed patch to fix the bug.
> Regression tests passed.
>
> Here is an example after the patch. Note that the third row is not
> quoted any more.
>
> test=#  select format('%I', あいう) from t2;
>  format
>
>  aaa
>  "AAA"
>  あああ
> (3 rows)
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
> index 3783e97..b93fc27 100644
> @@ -9405,7 +9405,7 @@ quote_identifier(const char *ident)
>  * would like to use  macros here, but they might yield
> unwanted
>  * locale-specific results...
>  */
> -   safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_');
> +   safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_' ||
> IS_HIGHBIT_SET(ident[0]));
>
> for (ptr = ident; *ptr; ptr++)
> {
> @@ -9413,7 +9413,8 @@ quote_identifier(const char *ident)
>
> if ((ch >= 'a' && ch <= 'z') ||
> (ch >= '0' && ch <= '9') ||
> -   (ch == '_'))
> +   (ch == '_') ||
> +   (IS_HIGHBIT_SET(ch)))
> {
> /* okay */
> }
>
>
This patch ls simply - I remember I was surprised, so we allow any
multibyte char few months ago.

+1

Pavel

```

### Re: [HACKERS] Re: [JDBC] 9.4-1207 behaves differently with server side prepared statements compared to 9.2-1102

```On Tue, Jan 19, 2016 at 4:20 AM, Thomas Kellerer  wrote:
>
> Robert Haas wrote:
> > and it won't be the last.  Way too many of our users are way more
> > aware than they should be that the threshold here is five rather than
> > any other number, which to me is a clear-cut sign that this needs to
> > be improved.  How to improve it is a harder question.  We lack the
> > ability to do any kind of sensitivity analysis on a plan, so we can't
> > know whether there are other parameter values that would have resulted
> > in a different plan, nor can we test whether a particular set of
> > parameter values would have changed the outcome.
>
> (I initially posted that question on the JDBC mailing list)
>
> To be honest: looking at the efforts Oracle has done since 9 up until 12 I
> am not sure this is a problem that can be solved by caching plans.
>
> Even with the new "in-flight" re-planning in Oracle 12 ("cardinality
> feedback") and all the effort that goes into caching plans we are still
> seeing similar problems with (prepared) statements that are suddenly slow.
> And as far as I can tell, the infrastructure around plan caching,
> invalidation, bind variable peeking and all that seems to be a *lot* more
> complex ("sophisticated") in Oracle compared to Postgres. And the results
> don't seem to justify the effort (at least in my experience).
>

I have heard the same feedback as above some time back from some
of the research fellows doing research in query optimization area.  They
come-up with different concept called "Plan Bouquet" [1] where
in they try to execute multiple plans during execution and proceed with
plan-selection problems and their claim is that this technique proves to
be better on benchmarks than existing mechanisms used for query
optimisation.

I am not advocating any such mechanism, but rather sharing an
information, I came across.

[1] - http://dsl.serc.iisc.ernet.in/publications/conference/bouquet.pdf

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

```

### Re: [HACKERS] custom function for converting human readable sizes to bytes

```On Mon, Jan 18, 2016 at 6:56 PM, Vitaly Burovoy
wrote:
> On 1/4/16, Robert Haas  wrote:
>> On Mon, Jan 4, 2016 at 10:17 AM, Pavel Stehule
>> wrote:
>>> [ new patch ]
>>
>> + case '-':
>> + ereport(ERROR,
>> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> +  errmsg("size cannot be negative")));
>>
>> Why not?  I bet if you copy any - sign to the buffer, this will Just Work.
>
> Hmm. The function's name is pg_size_bytes. How number of bytes can be
> negative? How any length can be negative? If anyone insert '-' sign to
> an argument, it is copy-paste error. I don't see any case where there
> is possible negatives as input value.
>
> I prefer error message instead of getting all relations (by using
> comparison from the initial letter) just because of copy-paste mistake
> or incomplete checking of input values at app-level.

Well, we just recently did this:

commit 8a1fab36aba7506fcf4559c4ef95fcacdd0b439a
Author: Robert Haas
Date:   Fri Nov 6 11:03:02 2015 -0500

pg_size_pretty: Format negative values similar to positive ones.

Previously, negative values were always displayed in bytes, regardless
of how large they were.

Adrian Vondendriesch, reviewed by Julien Rouhaud and myself

Since we went to the trouble of making the handling of positive and
negative values symmetric for that function, it seems it should be
done here also.  Otherwise, it is asymmetric.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] Combining Aggregates

```On Mon, Jan 18, 2016 at 11:24 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Here is a patch that helps a good deal.  I changed things so that when
>> we create a context, we always allocate at least 1kB.
>
> That's going to kill performance in some other cases; subtransactions
> in particular rely on the subtransaction's TransactionContext not causing
> any actual malloc unless something gets put into the TransactionContext.

Sorry, I guess I wasn't clear: it increases the allocation for the
context node itself to 1kB and uses the extra space to store a few
allocations.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] Advices on custom data type and extension development

```On Mon, Jan 18, 2016 at 9:36 PM, Luciano Coutinho Barcellos
wrote:

> * a lot of data being generated every day, which are mainly queried
> by an immutable column of type date or timestamp;
> * as a standard, almost every table has a bigserial id column as a
> primary key;
> * data is huge enough to demand table partitioning, which is
> implemented as suggested in Postgres documentation, by using triggers and
> table inheritance. A function called by cron deal with creation of new
> partitions.
>
> What I would like to develop first is a custom type (let's call it
> datedserial) for replacing bigserial as the primary key:
>
> * the type would be 8 bytes long, being 4 dedicated to storing the
> Date, and 4 dedicated to storing a serial within that day;

Seriously, you should consider having a primary key with two
columns, of type date and int.  It would take exactly the same
space as your current plan, and performance should be very close to
what you propose.  As long as you aren't using some ORM that is too
dumb to deal with this, it should be far easier than creating the
custom type.

If you can't overcome the limitations of the "standard" or your
development framework any other way, you plan sounds like the next
best thing.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] [Proposal] Table partition + join pushdown

```On Mon, Jan 18, 2016 at 5:55 PM, Robert Haas  wrote:
> For
> example, suppose that x and y are numeric columns and P(x) is
> length(x::text) == 3.  Then you could have 1 in one table and 1.0 in
> the table; they join, but P(x) is true for one and false for the
> other.

Fwiw, ages ago there was some talk about having a property on
functions "equality preserving" or something like that. If a function,
or more likely a  tuple had this property set then
x op y => f(x) op f(y). This would be most useful for things like
substring or hash functions which would allow partial indexes or
partition exclusion to be more generally useful.

Of course then you really want  to indicate that "a op1 b
=> f(a) op2 f(b)" so you can handle things like  so
that "a < b => substring(a,n) <= substring(b,n)" and you need some way
to represent the extra arguments to substring and the whole thing
became too complex and got dropped.

But perhaps even a simpler property that only worked for equality and
single-argument functions would be useful since it would let us mark
hash functions Or perhaps we only need to mark the few functions that
expose properties that don't affect equality since I think there are
actually very few of them.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] custom function for converting human readable sizes to bytes

```2016-01-19 13:42 GMT+01:00 Robert Haas :

> On Mon, Jan 18, 2016 at 6:56 PM, Vitaly Burovoy
>  wrote:
> > On 1/4/16, Robert Haas  wrote:
> >> On Mon, Jan 4, 2016 at 10:17 AM, Pavel Stehule  >
> >> wrote:
> >>> [ new patch ]
> >>
> >> + case '-':
> >> + ereport(ERROR,
> >> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> >> +  errmsg("size cannot be negative")));
> >>
> >> Why not?  I bet if you copy any - sign to the buffer, this will Just
> Work.
> >
> > Hmm. The function's name is pg_size_bytes. How number of bytes can be
> > negative? How any length can be negative? If anyone insert '-' sign to
> > an argument, it is copy-paste error. I don't see any case where there
> > is possible negatives as input value.
> >
> > I prefer error message instead of getting all relations (by using
> > comparison from the initial letter) just because of copy-paste mistake
> > or incomplete checking of input values at app-level.
>
> Well, we just recently did this:
>
> commit 8a1fab36aba7506fcf4559c4ef95fcacdd0b439a
> Author: Robert Haas
> Date:   Fri Nov 6 11:03:02 2015 -0500
>
> pg_size_pretty: Format negative values similar to positive ones.
>
> Previously, negative values were always displayed in bytes, regardless
> of how large they were.
>
> Adrian Vondendriesch, reviewed by Julien Rouhaud and myself
>
> Since we went to the trouble of making the handling of positive and
> negative values symmetric for that function, it seems it should be
> done here also.  Otherwise, it is asymmetric.
>

the last patch (pg-size-bytes-08.patch
)
at 2016-01-04 17:03:02

allows negative size

Regards

Pavel

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

```

### Re: [HACKERS] Additional role attributes && superuser review

```On 1/17/16 9:10 PM, Stephen Frost wrote:
> but if it's possible to do a backup without
> would expect every backup soltuion to view that as a feature which they
> want to support, as there are environments which will find it desirable,
> at a minimum, and even some which will require it.

This would certainly be a desirable feature for pgBackRest.  The fewer
processes running with the ability to alter the cluster the better.
Even if they are well-tested it's still one less thing to worry about.

--
-David
da...@pgmasters.net

signature.asc
Description: OpenPGP digital signature

```

### Re: [HACKERS] checkpointer continuous flushing

```

synchronous_commit = off does make a significant difference.

I think I found one possible culprit: I automatically wrote 300 seconds
to rerun the tests with this (unreasonnable) figure to check whether I
really get a regression.

Other tests I ran with "reasonnable" settings on a large (scale=800) db
did not show any significant performance regression, up to know.

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### [HACKERS] Logical decoding on standby

``` Hi,

If i try to create logical replication slot on standby i get error:
ERROR:  logical decoding cannot be used while in recovery

In code i found this comment:

/*
* TODO: We got to change that someday soon...
*
* There's basically three things missing to allow this:
* 1) We need to be able to correctly and quickly identify the timeline a
* LSN belongs to
* 2) We need to force hot_standby_feedback to be enabled at all times so
* the primary cannot remove rows we need.
* 3) support dropping replication slots referring to a database, in
* dbase_redo. There can't be any active ones due to HS recovery
* conflicts, so that should be relatively easy.
*
*/
if (RecoveryInProgress())
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("logical decoding cannot be used while in recovery")));
When you plan to add logical decoding on standby?

it is useful to have separate standby server for logical replication that will
not break the master if you make a mistake in plugin.

--
Regards
Dmitriy
```

### Re: [HACKERS] COPY (... tab completion

```Michael Paquier  writes:
> This patch makes me wonder: are we going to nuke the grammar "COPY [
> BINARY ] table_name" at some point? This was used up to 7.3.

I'm not particularly in a hurry to remove obsolete syntaxes, as long as
they're not blocking forward progress in some way.  However, it seems
to me that it would certainly make sense to remove tab-completion support
for long-deprecated syntax.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

```On Tue, Jan 19, 2016 at 5:44 PM, Michael Paquier
wrote:

> > Test3:
> > pgbench -i -s 100 postgres
> > pgbench -c\$ -j\$ -Mprepared -S postgres
> >
> > Client Base  Pached
> >
> > 1  2055519404
> > 32  375919  332670
> > 64  509067  440680
> > 128431346  415121
> > 256380926  379176
>
> It seems like you did a copy-paste of the results with s=100 and
> s=300. Both are showing the exact same numbers.

Oops, my mistake, re-pasting the correct results for s=100

pgbench -i -s 100 postgres
pgbench -c\$ -j\$ -Mprepared -S postgres

Client Base  Pached

12054820791
32  372633  355356
64  532052  552148
128412755  478826
256 346701 372057

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

```

### Re: [HACKERS] Re: pglogical_output - a general purpose logical decoding output plugin

```On 19 January 2016 at 05:47, Tomasz Rybak  wrote:

> I just quickly went through patch v5.
> It's rather big patch, on (or beyond) my knowledge of PostgreSQL to
> perform high-quality review. But during this week I'll try to send reviews
> of parts of the code, as going through it in one sitting seems impossible.
>
> One proposed change - update copyright to 2016.
>
> i'd also propose to change of pglogical_output_control.in:
> comment = 'general purpose logical decoding plugin'
> to something like "general-purpoer plugin decoding and generating stream
> of logical changes"
>
> We might also think about changing name of plugin to something resembling
> "logical_streaming_decoder" or even "logical_streamer"
>
>
I'm open to ideas there but I'd want some degree of consensus before
undertaking the changes required.

--
PostgreSQL Development, 24x7 Support, Training & Services

```

### Re: [HACKERS] Patch: Implement failover on libpq connect level.

```On 21 December 2015 at 14:50, Victor Wagner  wrote:
> On Mon, 21 Dec 2015 17:18:37 +0300
> Teodor Sigaev  wrote:
>
>> Sorry, but there is something wrong with your patch:
>
> Really, somehow broken version of the patch got into message.
>
> Here is correct one.

The segfault issue I originally reported now appears to be resolved.

But now I have another one:

psql
'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533/postgres?hostorder=random=1_timeout=5'
-c 'show port'

Segmentation fault

This is where no nodes are available.  If I remove hostorder=random,
or replace it with hostorder=sequential, it doesn't segfault.
However, it then tries to connect to PGHOST on PGPORT repeatedly, even
if I bring up one of the nodes it should be looking for.  Not only
this, but it seems to do it forever if failover_timeout is greater
than 0.

Thom

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] Logical decoding on standby

```Hi Dimitriy,

Дмитрий Сарафанников wrote:

> /*
> * TODO: We got to change that someday soon...
[ more code ]
> if (RecoveryInProgress())
> ereport(ERROR,
> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> errmsg("logical decoding cannot be used while in recovery")));
> When you plan to add logical decoding on standby?

Things change as people submit patches to make them change.  If you want
this changed, you could either write a patch yourself, or persuade
someone else to do it for you.

I don't think anyone is working in this particular TODO just yet -- as I
know, all the logical-decoding-enabled developers are pretty busy doing
other things.  The good news is that the things that need doing are
spelled right there in the comment :-)

--
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### [HACKERS] Re: [GENERAL] about test_parser installation failure problem(PostgreSQL in 9.5.0)？

```
On 01/14/2016 06:51 PM, 閬閬イふ wrote:

hi postgreSql !
test_parser install is ok (postgresql 9.2.4)
but at (postgresql 9.5.0) failure？
why？the postgresql say:
CREATE EXTENSION zhparser
say:
ERROR:  syntax error at or near "﻿"
LINE 1: CREATE EXTENSION zhparser
^
** 错误 **
ERROR: syntax error at or near "﻿"
SQL 状态: 42601

CREATE EXTENSION test_parser FROM unpackaged
ERROR:  function testprs_start(internal, integer) does not exist
** 错误 **
ERROR: function testprs_start(internal, integer) does not exist
SQL 状态: 42883
9.5.0 Not supported ?
can help me?

The last place I see it is:

http://www.postgresql.org/docs/9.4/interactive/test-parser.html

I do not see it in the 9.5 contrib modules:

http://www.postgresql.org/docs/9.5/interactive/contrib.html

Aah, looking at the Release Notes for 9.5:

http://www.postgresql.org/docs/9.5/interactive/release-9-5.html

Move dummy_seclabel, test_shm_mq, test_parser, and worker_spi from
contrib to src/test/modules (Álvaro Herrera)

These modules are only meant for server testing, so they do not need to
be built or installed when packaging PostgreSQL.

So if you are building from source you can go there to build the module.

cnpusr
2015-01-15

--

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### [HACKERS] about test_parser installation failure problem(PostgreSQL in 9.5.0)？

```hi postgreSql !
test_parser install is ok (postgresql 9.2.4)
but at (postgresql 9.5.0) failure？
why？the postgresql say:
CREATE EXTENSION zhparser
say:
ERROR:  syntax error at or near "﻿"
LINE 1: CREATE EXTENSION zhparser
^
** 错误 **
ERROR: syntax error at or near "﻿"
SQL 状态: 42601

CREATE EXTENSION test_parser FROM unpackaged
ERROR:  function testprs_start(internal, integer) does not exist
** 错误 **
ERROR: function testprs_start(internal, integer) does not exist
SQL 状态: 42883

9.5.0 Not supported ?
can help me?

cnpusr
2015-01-15
```

### Re: [HACKERS] PATCH: postpone building buckets to the end of Hash (in HashJoin)

```On Tue, Jan 19, 2016 at 4:49 PM, Tomas Vondra
wrote:
> I can totally see why this would slow-down the BuildBuckets function, but I
> don't quite see why it should make the other code significantly slower. Yet
> BuildBuckets takes just ~25ms while the total duration increases by ~200ms
> (for the 1x10 dataset).
>
> I do understand calling BuildBuckets may affect the code executed after that
> as it keeps other data in the cache, but i would not expect ~175ms.

I don't know.  There could be some other explanation, but I don't have
a guess as to what it is.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### NextXID format change (was Re: [HACKERS] exposing pg_controldata and pg_config as functions)

```On 01/19/2016 09:02 AM, Bruce Momjian wrote:
> Ok. Notwithstanding Simon's reply, there seems to be consensus that this
> is the way to go. Will commit it this way unless some additional
> objections surface in the next day or so.

FYI, this slash-colon change will break pg_upgrade unless it is patched.
Dp you want a patch from me?
>>>
>>> Didn't realize that -- yes please.
>>
>> Sure, attached, and it would be applied only to head, where you change
>> pg_controldata.  We could get more sophisticated by checking the catalog
>> version number where the format was changed, but that doesn't seem worth
>> it, and is overly complex because we get the catalog version number from
>> pg_controldata, so you would be adding a dependency in ordering of the
>> pg_controldata entries.
>
> Sorry, please use the attached patch instead, now tested with your
> changes.

The attached includes Bruce's change, plus I found two additional sites
that appear to need the same change. The xlog.c change is just a DEBUG
message, so not a big deal. I'm less certain if the xlogdesc.c change
might create some fallout.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index b694dea..2dcbfbd 100644
*** a/src/backend/access/rmgrdesc/xlogdesc.c
--- b/src/backend/access/rmgrdesc/xlogdesc.c
*** 43,49
CheckPoint *checkpoint = (CheckPoint *) rec;

appendStringInfo(buf, "redo %X/%X; "
! 		 "tli %u; prev tli %u; fpw %s; xid %u/%u; oid %u; multi %u; offset %u; "
"oldest xid %u in DB %u; oldest multi %u in DB %u; "
"oldest/newest commit timestamp xid: %u/%u; "
"oldest running xid %u; %s",
--- 43,49
CheckPoint *checkpoint = (CheckPoint *) rec;

appendStringInfo(buf, "redo %X/%X; "
! 		 "tli %u; prev tli %u; fpw %s; xid %u:%u; oid %u; multi %u; offset %u; "
"oldest xid %u in DB %u; oldest multi %u in DB %u; "
"oldest/newest commit timestamp xid: %u/%u; "
"oldest running xid %u; %s",
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7d5d493..ee87e1b 100644
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*** StartupXLOG(void)
*** 6284,6290
(uint32) (checkPoint.redo >> 32), (uint32) checkPoint.redo,
wasShutdown ? "TRUE" : "FALSE")));
ereport(DEBUG1,
! 			(errmsg_internal("next transaction ID: %u/%u; next OID: %u",
checkPoint.nextXidEpoch, checkPoint.nextXid,
checkPoint.nextOid)));
ereport(DEBUG1,
--- 6284,6290
(uint32) (checkPoint.redo >> 32), (uint32) checkPoint.redo,
wasShutdown ? "TRUE" : "FALSE")));
ereport(DEBUG1,
! 			(errmsg_internal("next transaction ID: %u:%u; next OID: %u",
checkPoint.nextXidEpoch, checkPoint.nextXid,
checkPoint.nextOid)));
ereport(DEBUG1,
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index e7e072f..5dd2dbc 100644
*** a/src/bin/pg_controldata/pg_controldata.c
--- b/src/bin/pg_controldata/pg_controldata.c
*** main(int argc, char *argv[])
*** 252,258
ControlFile.checkPointCopy.PrevTimeLineID);
printf(_("Latest checkpoint's full_page_writes: %s\n"),
ControlFile.checkPointCopy.fullPageWrites ? _("on") : _("off"));
! 	printf(_("Latest checkpoint's NextXID:  %u/%u\n"),
ControlFile.checkPointCopy.nextXidEpoch,
ControlFile.checkPointCopy.nextXid);
printf(_("Latest checkpoint's NextOID:  %u\n"),
--- 252,258
ControlFile.checkPointCopy.PrevTimeLineID);
printf(_("Latest checkpoint's full_page_writes: %s\n"),
ControlFile.checkPointCopy.fullPageWrites ? _("on") : _("off"));
! 	printf(_("Latest checkpoint's NextXID:  %u:%u\n"),
ControlFile.checkPointCopy.nextXidEpoch,
ControlFile.checkPointCopy.nextXid);
printf(_("Latest checkpoint's NextOID:  %u\n"),
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index ca706a5..525b82b 100644
*** a/src/bin/pg_resetxlog/pg_resetxlog.c
--- b/src/bin/pg_resetxlog/pg_resetxlog.c
*** PrintControlValues(bool guessed)
*** 646,652
ControlFile.checkPointCopy.ThisTimeLineID);
printf(_("Latest checkpoint's full_page_writes: %s\n"),
ControlFile.checkPointCopy.fullPageWrites ? _("on") : _("off"));
! 	printf(_("Latest checkpoint's NextXID:  %u/%u\n"),
ControlFile.checkPointCopy.nextXidEpoch,
ControlFile.checkPointCopy.nextXid);
printf(_("Latest checkpoint's NextOID:  %u\n"),
--- 646,652
ControlFile.checkPointCopy.ThisTimeLineID);
printf(_("Latest ```

### Re: [HACKERS] Combining Aggregates

```On Tue, Jan 19, 2016 at 12:22 AM, Tomas Vondra
wrote:
> I dare to claim that if expanded objects require a separate memory context
> per aggregate state (I don't see why would be the case), it's a dead end.
> Not so long ago we've fixed exactly this issue in array_agg(), which
> addressed a bunch of reported OOM issues and improved array_agg()
> performance quite a bit. It'd be rather crazy introducing the same problem
> to all aggregate functions.

Expanded objects require a separate memory context per Datum.  I think
I know how to rejigger things so that the overhead of that is
minimized as much as possible, but it's still 200 bytes for the
AllocSetContext + ~16 bytes for the name + 32 bytes for an AllocBlock
overhead per expanded datum, which means on this test case the
theoretical minimum memory burn for this approach is about 300 MB (and
in practice somewhat more).  Ouch.

The upshot seems to be that expanded objects aren't going to actually
be useful in any situation where you might have very many of them,
because the memory overhead is just awful.  That's rather
disappointing.  Even if you could get rid of the requirement for a
memory context per Datum - and it seems like you could if you added a
method defined to reparent the object to a designated context, which
looks like a totally reasonable innovation - somebody's probably going
to say, not without some justification, that 48 bytes of
ExpandedObjectHeader per object is an unaffordable luxury here.  The
structure has 8 bytes of unnecessary padding in it that we could elide
easily enough, but after that there's not really much way to squeeze
it down without hurting performance in some case cases where this
mechanism is fast today.

So I guess I'm going to agree that this approach is doomed.  Rats.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] Combining Aggregates

```On Tue, Jan 19, 2016 at 11:56 AM, Robert Haas  wrote:
> [ rewinding to here from the detour I led us on ]
>
> On Mon, Dec 21, 2015 at 4:02 AM, David Rowley
>  wrote:
>> Now, there has been talk of this previously, on various threads, but I don't
>> believe any final decisions were made on how exactly it should be done. At
>> the moment I plan to make changes as follows:

Oh, one more point: is there any reason why all of this needs to be a
single (giant) patch?  I feel like the handling of internal states
could be a separate patch from the basic patch to allow partial
aggregation and aggregate-combining, and that the latter could be
committed first if we had a reasonably final version of it.  That
seems like it would be easier from a review standpoint, and might
allow more development to proceed in parallel, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] exposing pg_controldata and pg_config as functions

```On Mon, Jan 18, 2016 at 07:50:12PM -0500, Bruce Momjian wrote:
> >  1) Change NextXID output format from "%u/%u" to "%u:%u"
> > (see recent hackers thread)
> > >>>
> > >>> ! printf(_("Latest checkpoint's NextXID:  %u/%u\n"),
> > >>>  ControlFile.checkPointCopy.nextXidEpoch,
> > >>>  ControlFile.checkPointCopy.nextXid);
> > >>>   printf(_("Latest checkpoint's NextOID:  %u\n"),
> > >>> --- 646,652
> > >>>  ControlFile.checkPointCopy.ThisTimeLineID);
> > >>>   printf(_("Latest checkpoint's full_page_writes: %s\n"),
> > >>>  ControlFile.checkPointCopy.fullPageWrites ? _("on") :
> > >>> _("off"));
> > >>> ! printf(_("Latest checkpoint's NextXID:  %u:%u\n"),
> > >>> This should be definitely a separate patch.
> > >>
> > >> Ok. Notwithstanding Simon's reply, there seems to be consensus that this
> > >> is the way to go. Will commit it this way unless some additional
> > >> objections surface in the next day or so.
> > >
> > > FYI, this slash-colon change will break pg_upgrade unless it is patched.
> > > Dp you want a patch from me?
> >
> > Didn't realize that -- yes please.
>
> Sure, attached, and it would be applied only to head, where you change
> pg_controldata.  We could get more sophisticated by checking the catalog
> version number where the format was changed, but that doesn't seem worth
> it, and is overly complex because we get the catalog version number from
> pg_controldata, so you would be adding a dependency in ordering of the
> pg_controldata entries.

changes.

--
Bruce Momjian  http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +
new file mode 100644
index 2def729..34e194c
*** get_control_data(ClusterInfo *cluster, b
*** 197,207
p++;/* remove ':' char */
cluster->controldata.chkpnt_nxtepoch = str2uint(p);

! 			p = strchr(p, '/');
if (p == NULL || strlen(p) <= 1)
pg_fatal("%d: controldata retrieval problem\n", __LINE__);

! 			p++;/* remove '/' char */
cluster->controldata.chkpnt_nxtxid = str2uint(p);
got_xid = true;
}
--- 197,214
p++;/* remove ':' char */
cluster->controldata.chkpnt_nxtepoch = str2uint(p);

! 			if (strchr(p, '/') != NULL)
! p = strchr(p, '/');
! 			/* delimiter changed from '/' to ':' in 9.6 */
! 			else if (GET_MAJOR_VERSION(cluster->major_version) >= 906)
! p = strchr(p, ':');
! 			else
! p = NULL;
!
if (p == NULL || strlen(p) <= 1)
pg_fatal("%d: controldata retrieval problem\n", __LINE__);

! 			p++;/* remove '/' or ':' char */
cluster->controldata.chkpnt_nxtxid = str2uint(p);
got_xid = true;
}

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] Combining Aggregates

```[ rewinding to here from the detour I led us on ]

On Mon, Dec 21, 2015 at 4:02 AM, David Rowley
wrote:
> Now, there has been talk of this previously, on various threads, but I don't
> believe any final decisions were made on how exactly it should be done. At
> the moment I plan to make changes as follows:
>
>  Add 3 new columns to pg_aggregate, aggserialfn, aggdeserialfn and
> aggserialtype These will only be required when aggtranstype is INTERNAL.

Check.

> Perhaps we should disallow CREATE AGGREAGET from accepting them for any
> other type...

Well, we should definitely not accept them and have them be
meaningless.  We should either reject them or accept them and then use
them.  I can't immediately think of a reason to serialize one
non-internal type as another, but maybe there is one.

> The return type of aggserialfn should be aggserialtype, and it
> should take a single parameter of aggtranstype. aggdeserialfn will be the
> reverse of that.

Check.

> Add a new bool field to nodeAgg's state named serialStates. If this is field
> is set to true then when we're in finalizeAgg = false mode, we'll call the
> serialfn on the agg state instead of the finalfn. This will allow the
> serialized state to be stored in the tuple and sent off to the main backend.
> The combine agg node should also be set to serialStates = true, so that it
> knows to deserialize instead of just assuming that the agg state is of type
> aggtranstype.

I'm not quite sure, but it sounds like you might be overloading
serialStates with two different meanings here.

> I believe this should allow us to not cause any performance regressions by
> moving away from INTERNAL agg states. It should also be very efficient for
> internal states such as Int8TransTypeData, as this struct merely has 2 int64
> fields which should be very simple to stuff into a bytea varlena type. We
> don't need to mess around converting the ->count and ->sum into a strings or
> anything.

I think it would be more user-friendly to emit these as 2-element
integer arrays rather than bytea.  Sure, bytea is fine when PostgreSQL
is talking to itself, but if you are communicating with an external
system, it's a different situation.  If the remote system is
PostgreSQL then you are again OK, except for the data going over the
wire being incomprehensible and maybe byte-order-dependent, but what
if you want some other database system to do partial aggregation and
then further aggregate the result?  You don't want the intermediate
state to be some kooky thing that only another PostgreSQL database has
a chance of generating correctly.

> Then in order for the planner to allow parallel aggregation all aggregates
> must:
>
> Not have a DISTINCT or ORDER BY clause
> Have a combinefn
> If aggtranstype = INTERNAL, must have a aggserialfn and aggdeserialfn.
>
> We can relax the requirement on 3 if we're using 2-stage aggregation, but
> not parallel aggregation.

When would we do that?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS] checkpointer continuous flushing

```On 2016-01-19 13:34:14 +0100, Fabien COELHO wrote:
>
> >synchronous_commit = off does make a significant difference.
>
> Sure, but I had thought about that and kept this one...

But why are you then saying this is fundamentally limited to 160
xacts/sec?

> I think I found one possible culprit: I automatically wrote 300 seconds for
> checkpoint_timeout, instead of 30 seconds in your settings. I'll have to
> rerun the tests with this (unreasonnable) figure to check whether I really
> get a regression.

I've not seen meaningful changes in the size of the regression between 30/300s.

> Other tests I ran with "reasonnable" settings on a large (scale=800) db did
> not show any significant performance regression, up to know.

Try running it so that the data set nearly, but not entirely fit into
the OS page cache, while definitely not fitting into shared_buffers. The
scale=800 just worked for that on my hardware, no idea how it's for yours.

That seems to be the point where the effect is the worst.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```

### Re: [HACKERS]WIP: Covering + unique indexes.

```

18.01.2016 01:02, David Rowley пишет:
On 14 January 2016 at 08:24, David Rowley
>
wrote:

I will try to review the omit_opclass_4.0.patch soon.

Hi, as promised, here's my review of the omit_opclass_4.0.patch patch.

Thank you again. All mentioned points are fixed and patches are merged.
I hope it's all right now. Please check comments one more time. I rather
doubt that I wrote everything correctly.
Also this makes me think that the name ii_KeyAttrNumbers is now
out-of-date, as it contains
the including columns too by the looks of it. Maybe it just needs to
drop the "Key" and become
"ii_AttrNumbers". It would be interesting to hear what others think of
that.

I'm also wondering if indexkeys is still a good name for the
IndexOptInfo struct member.
Including columns are not really keys, but I feel renaming that might
cause a fair bit of code churn, so I'd be interested to hear what
other's have to say.

I agree that KeyAttrNumbers and indexkeys are a bit confusing names, but
I'd like to keep them at least in this patch.

It's may be worth doing "index structures refactoring" as a separate patch.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 97ef618..d17a06c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -644,6 +644,13 @@
Does an index of this type manage fine-grained predicate locks?

+
+  amcaninclude
+  bool
+
+  Does the access method support included columns?
+
+

amkeytype
oid
@@ -3714,6 +3721,14 @@
pg_class.relnatts)

+
+  indnkeyatts
+  int2
+
+  The number of key columns in the index. "Key columns" are ordinary
+  index columns in contrast with "included" columns.
+
+

indisunique
bool
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 1c09bae..d01af17 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -767,7 +767,8 @@ amrestrpos (IndexScanDesc scan);
using unique indexes, which are indexes that disallow
multiple entries with identical keys.  An access method that supports this
feature sets pg_am.amcanunique true.
-   (At present, only b-tree supports it.)
+   (At present, only b-tree supports it.) Columns included with clause
+   INCLUDING  aren't used to enforce uniqueness.

diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 23bbec6..09d4e6b 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -633,7 +633,8 @@ CREATE INDEX test3_desc_index ON test3 (id DESC NULLS LAST);
Indexes can also be used to enforce uniqueness of a column's value,
or the uniqueness of the combined values of more than one column.

-CREATE UNIQUE INDEX name ON table (column , ...);
+CREATE UNIQUE INDEX name ON table (column , ...)
+INCLUDING (column , ...);

Currently, only B-tree indexes can be declared unique.

@@ -642,7 +643,9 @@ CREATE UNIQUE INDEX name ON tableINCLUDING aren't used to enforce constraints (UNIQUE,
+   PRIMARY KEY, etc).

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index ce36a1b..8360bb6 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -23,6 +23,7 @@ PostgreSQL documentation

CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON table_name [ USING method ]
( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
+[ INCLUDING ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
[ WITH ( storage_parameter = value [, ... ] ) ]
[ TABLESPACE tablespace_name ]
[ WHERE predicate ]
@@ -139,6 +140,32 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ]

+  INCLUDING
+
+
+An optional INCLUDING clause allows a list of columns to be
+specified which will be included in the index, in the non-key portion of
+the index. Columns which are part of this clause cannot also exist in the
+key columns portion of the index, and vice versa. The
+INCLUDING columns exist solely to allow more queries to benefit
+from index-only scans by including certain columns in the
+index, the value of which would otherwise have to be obtained by reading
+the table's heap. Having these columns in the INCLUDING clause
+in some cases allows PostgreSQL to skip the heap read
+completely. This also allows UNIQUE indexes to be defined on
+one set of columns, which can include another set of column in the
+INCLUDING clause, on which the uniqueness is not ```

### Re: [HACKERS]WIP: Covering + unique indexes.

```

12.01.2016 20:47, Jeff Janes:

It looks like the "covering" patch, with or without the "omit_opclass"
patch, does not support expressions as included columns:

create table foobar (x text, y xml);
create index on foobar (x) including  (md5(x));
ERROR:  unrecognized node type: 904
create index on foobar (x) including  ((y::text));
ERROR:  unrecognized node type: 911

I think we would probably want it to work with those (or at least to
throw a better error message).
Thank you for the notice. I couldn't fix it quickly and added a stub in
the latest patch.

But I'll try to fix it and add expressions support a bit later.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

```