Re: [HACKERS] extension build issue with PostgreSQL 10 on Centos6

2017-09-29 Thread Pavel Stehule
2017-09-30 7:54 GMT+02:00 Pavel Stehule :

> Hi
>
> There is a new, poorly handled dependency. I had no problem building this
> extension against PostgreSQL 9.6 there:
>
> [oracle@ora2pg plpgsql_check]$ make
> gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
> -Wformat-security -fno-strict-aliasing -fwrapv -O2 -g -pipe -Wall
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
> --param=ssp-buffer-size=4 -m64 -mtune=generic -DLINUX_OOM_SCORE_ADJ=0 -fPIC
> -I/usr/pgsql-10/lib/pgxs/src/makefiles/../../src/pl/plpgsql/src -I. -I./
> -I/usr/pgsql-10/include/server -I/usr/pgsql-10/include/internal
> -I/usr/include -D_GNU_SOURCE -I/usr/include/libxml2  -I/usr/include  -c -o
> plpgsql_check.o plpgsql_check.c
> In file included from /usr/pgsql-10/include/server/tsearch/ts_locale.h:18,
>  from plpgsql_check.c:88:
> /usr/pgsql-10/include/server/utils/pg_locale.h:19:26: error:
> unicode/ucol.h: Adresář nebo soubor neexistuje [No such file or directory]
> In file included from /usr/pgsql-10/include/server/tsearch/ts_locale.h:18,
>  from plpgsql_check.c:88:
> /usr/pgsql-10/include/server/utils/pg_locale.h:94: error: expected
> specifier-qualifier-list before ‘UCollator’
> /usr/pgsql-10/include/server/utils/pg_locale.h:108: error: expected ‘)’
> before ‘*’ token
> /usr/pgsql-10/include/server/utils/pg_locale.h:109: warning: type
> defaults to ‘int’ in declaration of ‘UChar’
> /usr/pgsql-10/include/server/utils/pg_locale.h:109: error: expected ‘;’,
> ‘,’ or ‘)’ before ‘*’ token
> make: *** [plpgsql_check.o] Error 1
>
> This means a non-optional dependency on ICU.
>

It was fixed by installing libicu-devel, so libicu-devel should be on the
dependency list of postgresql10-devel.
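
For anyone hitting the same failure, the fix amounts to the following
(package name as shipped by the CentOS 6 / PGDG repositories; a sketch of
the session, not a transcript):

    [oracle@ora2pg plpgsql_check]$ sudo yum install libicu-devel
    [oracle@ora2pg plpgsql_check]$ make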



> Installed from rpm
>
> Regards
>
> Pavel
>


[HACKERS] extension build issue with PostgreSQL 10 on Centos6

2017-09-29 Thread Pavel Stehule
Hi

There is a new, poorly handled dependency. I had no problem building this
extension against PostgreSQL 9.6 there:

[oracle@ora2pg plpgsql_check]$ make
gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
-Wformat-security -fno-strict-aliasing -fwrapv -O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
--param=ssp-buffer-size=4 -m64 -mtune=generic -DLINUX_OOM_SCORE_ADJ=0 -fPIC
-I/usr/pgsql-10/lib/pgxs/src/makefiles/../../src/pl/plpgsql/src -I. -I./
-I/usr/pgsql-10/include/server -I/usr/pgsql-10/include/internal
-I/usr/include -D_GNU_SOURCE -I/usr/include/libxml2  -I/usr/include  -c -o
plpgsql_check.o plpgsql_check.c
In file included from /usr/pgsql-10/include/server/tsearch/ts_locale.h:18,
 from plpgsql_check.c:88:
/usr/pgsql-10/include/server/utils/pg_locale.h:19:26: error:
unicode/ucol.h: Adresář nebo soubor neexistuje [No such file or directory]
In file included from /usr/pgsql-10/include/server/tsearch/ts_locale.h:18,
 from plpgsql_check.c:88:
/usr/pgsql-10/include/server/utils/pg_locale.h:94: error: expected
specifier-qualifier-list before ‘UCollator’
/usr/pgsql-10/include/server/utils/pg_locale.h:108: error: expected ‘)’
before ‘*’ token
/usr/pgsql-10/include/server/utils/pg_locale.h:109: warning: type defaults
to ‘int’ in declaration of ‘UChar’
/usr/pgsql-10/include/server/utils/pg_locale.h:109: error: expected ‘;’,
‘,’ or ‘)’ before ‘*’ token
make: *** [plpgsql_check.o] Error 1

This means a non-optional dependency on ICU.

Installed from rpm

Regards

Pavel


Re: [HACKERS] Why are we including netinet/tcp.h so widely?

2017-09-29 Thread Tom Lane
Andres Freund  writes:
> A lot of files currently include netinet/tcp.h. For most of them I
> cannot see why; it just seems to have been copied from one file to
> the next.

According to POSIX, that supplies TCP_NODELAY, and a look into the
file here finds these other symbols we care about:

#define TCP_KEEPIDLE  4  /* Start keepalives after this period */
#define TCP_KEEPINTVL 5  /* Interval between keepalives */
#define TCP_KEEPCNT   6  /* Number of keepalives before death */
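
For context, code that genuinely needs the header uses those symbols
roughly like this (a sketch, not a quote of any particular file in the
tree):

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    int on = 1;

    /* disable Nagle's algorithm on an already-opened socket */
    if (setsockopt(sock, IPPROTO_TCP, TCP_NODELAY,
                   (char *) &on, sizeof(on)) < 0)
        /* report the failure */ ;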

Probably you could drop it from any file not using any TCP_xxx
symbols.  I'd be a little wary of #ifdef checks in any file you're
about to remove it from, though, as those could mask any obvious
breakage.

regards, tom lane




Re: [HACKERS] 64-bit queryId?

2017-09-29 Thread Tom Lane
Robert Haas  writes:
> How about widening the value to uint64?

Doesn't really seem like that would guarantee no collisions.

regards, tom lane




Re: [HACKERS] SQL/JSON in PostgreSQL

2017-09-29 Thread Pavel Stehule
2017-09-30 1:06 GMT+02:00 Nikita Glukhov :

> On 29.09.2017 20:07, Pavel Stehule wrote:
>
> 2017-09-29 12:15 GMT+02:00 Pavel Stehule :
>
>>
>> 2017-09-29 12:09 GMT+02:00 Nikita Glukhov :
>>
>>>
>>>
>>> I have some free time now. Is it the last version?
>>>
>>> Regards
>>>
>>> Pavel
>>>
>>> Yes, this is still the latest version. Now I am working only on the
>>> unfinished WIP patch no. 9, but I think it should be reviewed last.
>>>
>>>
>>
>> ok
>>
>> Thank you
>>
>
> I have a few queries and notes.
>
> 1. Why does the first patch hold the GIN-related functionality? Can it
> be separated?
>
> Yes, it can be easily separated. Attached is an archive with the GIN
> patch separated as no. 2.
>
> 2. Why do the jsonpath functions start with "_"? These functions are not
> removed by the other patches.
>
> Originally, these functions were created only for testing purposes and
> should be treated as "internal". But with the introduction of the
> jsonpath operators, the jsonpath tests can be completely rewritten using
> those operators.
>
> 3. What is the basis for the jsonpath extensions? ANSI/SQL?
>
> Our jsonpath extensions are not based on any standards, so they are quite
> dangerous because they can conflict with the standard in the future.
>
> This patch is pretty big, so I propose to push the JSONPath and SQL/JSON
> related patches first, and then in the next iteration push the JSON_TABLE
> patch. Is that an acceptable strategy?
>
> I think it's acceptable. And this was the main reason for the separation
> of patches.
>

I prefer to move it to another commitfest item. It will simplify
communication between us and possible committers, and we can better
concentrate on a smaller set of code.



> I am sure JSON_TABLE is a pretty important function, but it is pretty
> complex too (significantly more complex than XMLTABLE), so it can be
> practical to move this function to a separate project. I hope all the
> patches will be merged in the release 11 timeframe.
>
>
> --
> Nikita Glukhov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>


Re: [HACKERS] Parallel Append implementation

2017-09-29 Thread Amit Kapila
On Sat, Sep 30, 2017 at 4:02 AM, Robert Haas  wrote:
> On Fri, Sep 29, 2017 at 7:48 AM, Amit Kapila  wrote:
>> I think in general the non-partial paths should be cheaper as compared
>> to partial paths, as that is the reason the planner chose not to make a
>> partial plan in the first place. I think the idea the patch is using will
>> help because the leader will choose to execute a partial path in most
>> cases (when there is a mix of partial and non-partial paths), and for
>> that case the leader is not bound to complete the execution of that path.
>> However, if all the paths are non-partial, then I am not sure how much
>> difference it would make to choose one path over another.
>
> The case where all plans are non-partial is the case where it matters
> most!  If the leader is going to take a share of the work, we want it
> to take the smallest share possible.
>

Okay, but the point is whether it will make any difference
practically.  Let us try to see with an example: consider there are
two children (just taking two for simplicity; we can extend it to
many), the first having 1000 pages to scan and the second having 900
pages to scan.  Then it might not make much difference which child
plan the leader chooses.  Now, it might matter if the first child
relation has 1000 pages to scan and the second has just 1 page, but I
am not sure how much difference it will make in practice, considering
that is almost the maximum possible theoretical difference between
two non-partial paths (a relation with more than 1024 pages
(min_parallel_table_scan_size) will have a partial path).

> It's a lot fuzzier what is best when there are only partial plans.
>

The point that bothers me a bit is whether it is a clear win if we
allow the leader to choose a different strategy to pick the paths, or
whether this is just a theoretical assumption.  Basically, I think the
patch will become simpler if we pick some simple strategy to choose
paths.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] 64-bit queryId?

2017-09-29 Thread Robert Haas
Our Query currently has space for a 32-bit queryId, but that seems
reasonably likely to result in collisions:

https://en.wikipedia.org/wiki/Birthday_problem#Probability_table

If you have as many as 50,000 queries, there's a 25% probability of
having at least one collision; that doesn't seem particularly
unrealistic.  Obviously, normalization reduces the number of distinct
queries a lot, but if queries are dynamically generated you might
still have quite a few of them.
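
For reference, the standard birthday-problem approximation bears this out
(arithmetic added here, not taken from the table above):

    p(collision) ~= 1 - exp(-n*(n-1) / (2*d))

    n = 50,000, d = 2^32:  p ~= 1 - exp(-0.29) ~= 0.25
    n = 50,000, d = 2^64:  p ~= 6.8e-11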

How about widening the value to uint64?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-29 Thread Peter Geoghegan
On Mon, Sep 25, 2017 at 11:40 AM, Peter Eisentraut
 wrote:
> Reading over this code again, it is admittedly not quite clear why this
> "canonicalization" is in there right now.  I think it had something to
> do with how we built the keyword variants at one point.  It might not
> make sense.  I'd be glad to take that out and use the result straight
> from uloc_getAvailable() for collcollate.  That is, after all, the
> "canonical" version that ICU chooses to report to us.

Is anything going to happen here ahead of shipping v10? This remains
an open item.


-- 
Peter Geoghegan




Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-09-29 Thread Peter Geoghegan
On Wed, Sep 20, 2017 at 2:32 AM, Rushabh Lathia
 wrote:
> Yes, I haven't touched the randomAccess part yet. My initial goal was
> to incorporate the BufFileSet api's here.

This is going to need a rebase, due to the commit today to remove
replacement selection sort. That much should be easy.

> Sorry, I didn't get this part. Are you talking about your patch's changes
> to OpenTemporaryFileInTablespace(), BufFileUnify() and other changes
> related to ltsUnify()?  If that's the case, I don't think they are
> required with the BufFileSet. Correct me if I am wrong here.

I thought that you'd have multiple BufFiles, which would be
multiplexed (much like a single BufFile itself multiplexes 1GB
segments), so that logtape.c could still recycle space in the
randomAccess case. I guess that that's not a goal now.

> To be frank its too early for me to comment anything in this area.  I need
> to study this more closely. As an initial goal I was just focused on
> understanding the current implementation of the patch and incorporate
> the BufFileSet APIs.

Fair enough.

-- 
Peter Geoghegan




[HACKERS] pg_basebackup --progress output for batch execution

2017-09-29 Thread Martin Marques
Hi,

Some time ago I had to work on a system where I was cloning a standby
using pg_basebackup, and that didn't have screen or tmux. For that reason
I redirected the output to a file and ran it with nohup.

I normally (always, actually ;) ) run pg_basebackup with --progress and
--verbose so I can follow how much has been done. When run on a tty you
get a nice progress bar with the percentage that has been cloned.

The problem came with the execution and redirection of the output: because
--progress rewrites the same line with carriage returns, the redirected
output ends up as one *very* long line!

Back then I thought of writing a patch (actually someone suggested I do
so) to add a --batch-mode option which would change how pg_basebackup
prints its output messages.

Attached is the patch. I'll be submitting it to the CF.

P.S.: I'm aware that there's a documentation patch missing. :)
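
With the patch applied, the intended usage is along these lines (paths
invented for illustration); the log then contains one progress line per
update instead of a single line full of carriage returns:

    $ nohup pg_basebackup -D /var/lib/pgsql/10/standby -v -P -b \
          > basebackup.log 2>&1 &
    $ tail -f basebackup.log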

Kind regards,

-- 
Martín Marqués            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
>From f6e54c0d062d62daf70c0870f96032eb0c102e66 Mon Sep 17 00:00:00 2001
From: Martín Marqués
Date: Fri, 29 Sep 2017 19:21:44 -0300
Subject: [PATCH] Add an option to pg_basebackup to output messages as if it
 were running in batch mode, as opposed to running in a tty.

This is useful when using --progress and redirecting the output to
a file for later inspection with tail.

New option --batch-mode, with the short option -b, added.
---
 src/bin/pg_basebackup/pg_basebackup.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index dac7299..cf97ea3 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -82,6 +82,7 @@ static char format = 'p';		/* p(lain)/t(ar) */
 static char *label = "pg_basebackup base backup";
 static bool noclean = false;
 static bool showprogress = false;
+static bool batchmode = false;
 static int	verbose = 0;
 static int	compresslevel = 0;
 static IncludeWal includewal = STREAM_WAL;
@@ -355,6 +356,7 @@ usage(void)
 	printf(_("  -P, --progress show progress information\n"));
 	printf(_("  -S, --slot=SLOTNAMEreplication slot to use\n"));
 	printf(_("  --no-slot  prevent creation of temporary replication slot\n"));
+	printf(_("  -b, --batch-mode   run in batch mode\n"));
 	printf(_("  -v, --verbose  output verbose messages\n"));
 	printf(_("  -V, --version  output version information, then exit\n"));
 	printf(_("  -?, --help show this help, then exit\n"));
@@ -806,7 +808,10 @@ progress_report(int tablespacenum, const char *filename, bool force)
 totaldone_str, totalsize_str, percent,
 tablespacenum, tablespacecount);
 
-	fprintf(stderr, "\r");
+	if (batchmode)
+		fprintf(stderr, "\n");
+	else
+		fprintf(stderr, "\r");
 }
 
 static int32
@@ -1786,7 +1791,13 @@ BaseBackup(void)
 progname);
 
 	if (showprogress && !verbose)
+	{
 		fprintf(stderr, "waiting for checkpoint\r");
+		if (batchmode)
+			fprintf(stderr, "\n");
+		else
+			fprintf(stderr, "\r");
+	}
 
 	basebkp =
 		psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s",
@@ -2118,6 +2129,7 @@ main(int argc, char **argv)
 		{"status-interval", required_argument, NULL, 's'},
 		{"verbose", no_argument, NULL, 'v'},
 		{"progress", no_argument, NULL, 'P'},
+		{"batch-mode", no_argument, NULL, 'b'},
 		{"waldir", required_argument, NULL, 1},
 		{"no-slot", no_argument, NULL, 2},
 		{NULL, 0, NULL, 0}
@@ -2146,7 +2158,7 @@ main(int argc, char **argv)
 
 	atexit(cleanup_directories_atexit);
 
-	while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWvP",
+	while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWvPb",
 			long_options, &option_index)) != -1)
 	{
 		switch (c)
@@ -2288,6 +2300,9 @@ main(int argc, char **argv)
 			case 'P':
 				showprogress = true;
 				break;
+			case 'b':
+				batchmode = true;
+				break;
 			default:
 
 /*
-- 
2.9.5




Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns

2017-09-29 Thread Andres Freund
On 2017-09-29 17:56:10 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Does anybody have an opinion on whether we'll want to convert examples
> > like testlibpq3.c (included in libpq.sgml) too? I'm inclined not to,
> > because currently using pg_bswap.h requires c.h presence (just for a few
> > typedefs and configure data).  There's also not really a pressing need.
>
> We certainly mustn't encourage libpq users to start depending on c.h,
> so let's leave that alone.

Here's two patches:

0001: Previously submitted changes to pg_bswap.h, addressing concerns
  like the renaming
0002: Move over most users of ntoh[sl]/hton[sl] over to pg_bswap.h.

Note that the latter patch includes replacing open-coded byte swapping
of 64bit integers (using two 32bit swaps) with a single 64bit
swap. I've also removed pg_recvint64 - it's now a single pg_ntoh64 - as
its name strikes me as misleading.
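
For reference, the portable fallback for such a swap (used when no
compiler intrinsic is available) looks roughly like this - a sketch with
shifts and masks; the patch's actual implementation may differ in detail:

    static inline uint64
    pg_bswap64(uint64 x)
    {
        return ((x << 56) & UINT64CONST(0xff00000000000000)) |
               ((x << 40) & UINT64CONST(0x00ff000000000000)) |
               ((x << 24) & UINT64CONST(0x0000ff0000000000)) |
               ((x << 8)  & UINT64CONST(0x000000ff00000000)) |
               ((x >> 8)  & UINT64CONST(0x00000000ff000000)) |
               ((x >> 24) & UINT64CONST(0x0000000000ff0000)) |
               ((x >> 40) & UINT64CONST(0x000000000000ff00)) |
               ((x >> 56) & UINT64CONST(0x00000000000000ff));
    }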

Where it looked applicable I have removed netinet/in.h and arpa/inet.h
usage, which previously provided the relevant functionality. It's
perfectly possible that I missed other reasons for including those;
the buildfarm will tell.

Greetings,

Andres Freund
>From c953b5b7ea97db54c2cba262528b520a6b452462 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 29 Sep 2017 15:52:55 -0700
Subject: [PATCH 1/3] Extend & revamp pg_bswap.h infrastructure.

Upcoming patches are going to address performance issues that involve
slow system provided ntohs/htons etc. To address that expand
pg_bswap.h to provide pg_ntoh{16,32,64}, pg_hton{16,32,64} and
optimize their respective implementations by using compiler intrinsics
for gcc compatible compilers and msvc. Fall back to manual
implementations using shifts etc otherwise.

Additionally remove multiple evaluation hazards from the existing
BSWAP32/64 macros, by replacing them with inline functions when
necessary. In the course of that the naming scheme is changed to
pg_bswap16/32/64.

Author: Andres Freund
Discussion: https://postgr.es/m/20170927172019.gheidqy6xvlxb...@alap3.anarazel.de
---
 config/c-compiler.m4            |  17 ++
 contrib/btree_gist/btree_uuid.c |   4 +-
 src/include/port/pg_bswap.h     | 132 ++++++++++++++++++++++++++-------
 src/include/port/pg_crc32c.h    |   2 +-
 4 files changed, 128 insertions(+), 27 deletions(-)

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 7275ea69fe..6dcc790649 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -224,6 +224,23 @@ AC_DEFINE(HAVE__BUILTIN_TYPES_COMPATIBLE_P, 1,
 fi])# PGAC_C_TYPES_COMPATIBLE
 
 
+# PGAC_C_BUILTIN_BSWAP16
+# -
+# Check if the C compiler understands __builtin_bswap16(),
+# and define HAVE__BUILTIN_BSWAP16 if so.
+AC_DEFUN([PGAC_C_BUILTIN_BSWAP16],
+[AC_CACHE_CHECK(for __builtin_bswap16, pgac_cv__builtin_bswap16,
+[AC_COMPILE_IFELSE([AC_LANG_SOURCE(
+[static unsigned long int x = __builtin_bswap16(0xaabb);]
+)],
+[pgac_cv__builtin_bswap16=yes],
+[pgac_cv__builtin_bswap16=no])])
+if test x"$pgac_cv__builtin_bswap16" = xyes ; then
+AC_DEFINE(HAVE__BUILTIN_BSWAP16, 1,
+  [Define to 1 if your compiler understands __builtin_bswap16.])
+fi])# PGAC_C_BUILTIN_BSWAP16
+
+
 
 # PGAC_C_BUILTIN_BSWAP32
 # -
diff --git a/contrib/btree_gist/btree_uuid.c b/contrib/btree_gist/btree_uuid.c
index ecf357d662..9ff421ea55 100644
--- a/contrib/btree_gist/btree_uuid.c
+++ b/contrib/btree_gist/btree_uuid.c
@@ -182,8 +182,8 @@ uuid_2_double(const pg_uuid_t *u)
 	 * machine, byte-swap each half so we can use native uint64 arithmetic.
 	 */
 #ifndef WORDS_BIGENDIAN
-	uu[0] = BSWAP64(uu[0]);
-	uu[1] = BSWAP64(uu[1]);
+	uu[0] = pg_bswap64(uu[0]);
+	uu[1] = pg_bswap64(uu[1]);
 #endif
 
 	/*
diff --git a/src/include/port/pg_bswap.h b/src/include/port/pg_bswap.h
index 50a6bd106b..f67ad4b133 100644
--- a/src/include/port/pg_bswap.h
+++ b/src/include/port/pg_bswap.h
@@ -3,15 +3,13 @@
  * pg_bswap.h
  *	  Byte swapping.
  *
- * Macros for reversing the byte order of 32-bit and 64-bit unsigned integers.
+ * Macros for reversing the byte order of 16, 32 and 64-bit unsigned integers.
  * For example, 0xAABBCCDD becomes 0xDDCCBBAA.  These are just wrappers for
  * built-in functions provided by the compiler where support exists.
- * Elsewhere, beware of multiple evaluations of the arguments!
  *
- * Note that the GCC built-in functions __builtin_bswap32() and
- * __builtin_bswap64() are documented as accepting single arguments of type
- * uint32_t and uint64_t respectively (these are also the respective return
- * types).  Use caution when using these wrapper macros with signed integers.
+ * Note that all of these functions accept unsigned integers as arguments and
+ * return the same.  Use caution when using these wrapper macros with signed
+ * integers.
  *
  * Copyright (c) 2015-2017, PostgreSQL Global Development Group
  *
@@ -22,28 +20,114 @@
 #ifndef PG_BSWAP_H
 #define PG_BSWAP_H
 
-#ifdef 

Re: [HACKERS] User-perspective knowledge about wait events

2017-09-29 Thread Michael Paquier
On Sat, Sep 30, 2017 at 4:20 AM, Schneider  wrote:
> On Tue, Sep 26, 2017 at 4:28 PM, Michael Paquier
>  wrote:
>> Gathering a set of examples on wiki page with some rough
>> analysis I think would be a good start.
>
> I don't seem to have privs to create wiki pages; can someone else make
> a page where we can begin to gather things like this?
>
> Does https://wiki.postgresql.org/wiki/Wait_Events make sense?

Yes, that looks fine. Don't you have a community account? I think that
it is necessary to log in to create a new page.
-- 
Michael




Re: [HACKERS] Parallel Append implementation

2017-09-29 Thread Robert Haas
On Fri, Sep 29, 2017 at 7:48 AM, Amit Kapila  wrote:
> I think in general the non-partial paths should be cheaper as compared
> to partial paths, as that is the reason the planner chose not to make a
> partial plan in the first place. I think the idea the patch is using will
> help because the leader will choose to execute a partial path in most
> cases (when there is a mix of partial and non-partial paths), and for
> that case the leader is not bound to complete the execution of that path.
> However, if all the paths are non-partial, then I am not sure how much
> difference it would make to choose one path over another.

The case where all plans are non-partial is the case where it matters
most!  If the leader is going to take a share of the work, we want it
to take the smallest share possible.

It's a lot fuzzier what is best when there are only partial plans.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Why are we including netinet/tcp.h so widely?

2017-09-29 Thread Andres Freund
Hi,

A lot of files currently include netinet/tcp.h. For most of them I
cannot see why; it just seems to have been copied from one file to
the next.

E.g.
src/interfaces/libpq/{fe-secure,fe-protocol3,fe-protocol2,fe-secure}.c
src/backend/libpq/{be-secure,be-secure-openssl}.c
and probably some more.

Not that it's particularly harmful to have a superfluous include, but
due to the #ifdef HAVE_NETINET_TCP_H it's three lines, and it seems to
be copied around enough that I got curious.
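
(The three lines in question:)

    #ifdef HAVE_NETINET_TCP_H
    #include <netinet/tcp.h>
    #endif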

Any reason that I'm missing? Old platforms? I couldn't find anything in
the archives.

Greetings,

Andres Freund




Re: [HACKERS] pgbench - minor fix for meta command only scripts

2017-09-29 Thread Fabien COELHO


Hello Robert,


>> ISTM that this bug exists since rate was introduced, so shame on me and
>> back-patching should be needed.


> I took a look at this and found that the proposed patch applies
> cleanly all the way back to 9.5, but the regression is reported to
> have begun with a commit that starts in v10.  I haven't probed into
> this in any depth, but are we sure that
> 12788ae49e1933f463bc59a6efe46c4a01701b76 is in fact where this problem
> originated?


Yes.

I just rechecked that the problem occurs at 12788ae but not at the 
preceding da6c4f6ca8.


Now, the situation before the restructuring is that it worked, but given
the spaghetti code it was very hard to guess why, let alone to fix issues
when it did not...


My late-at-night, fuzzy interpretation is as follows:

The issue is in the code above the fix I submitted, which checks what has
to be selected. In the previous version the condition was laxer, so it
filled input_mask even if the client was not waiting for anything, and the
select call made later was really just implementing the timeout. With the
updated version, input_mask and maxsock are only set if there is really
something to wait for, and if not the code falls through and stays active
instead of doing a simple sleep/timeout.
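
Schematically (variable names invented here, not pgbench's actual code):

    if (nsocks > 0)
    {
        /* something to wait on: select() sleeps until input or timeout */
        (void) select(maxsock + 1, &input_mask, NULL, NULL, &timeout);
    }
    else
    {
        /* nothing to wait on: without an explicit sleep, this spins */
        pg_usleep(wait_usec);
    }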

So I would say that the previous version worked because of a side effect,
which may or may not have been intentional at the time, and the bug was
revealed by checking the condition more carefully.


Basically, I'd say that the restructuring patch fixed one defect and in
doing so triggered this bug. Programming is fun :-)


--
Fabien.




[HACKERS] Fwd: Have a problem with citext

2017-09-29 Thread David E. Wheeler
Hackers,

Are permissions correct in the citext extension?

Best,

David

> Begin forwarded message:
> 
> From: Sadek Touati 
> Subject: Have a problem with citext
> Date: September 29, 2017 at 17:02:50 EDT
> To: "da...@kineticode.com" 
> 
> Dear sir,
> I'm using the citext datatype in my application. I have PostgreSQL 9.6,
> installed from EnterpriseDB.
> 
> 
> psql mydatabase postgres
> create extension citext with schema myschema
> 
> \c mydatabase biguser
> 
> set search_path to myschema;
> 
> create table tst(v citext);
> insert into tst values('sadek');
> > select strpos(v, 'd') from tst;
> ERROR:  permission denied for function strpos
> 
> > select strpos(v, 'd'::citext) from tst; (If I read the documentation 
> > correctly this should work! alas, it doesn't)
> ERROR:  permission denied for function strpos
> 
> > select strpos(v::citext, 'd'::citext) from tst;
> ERROR:  permission denied for function strpos
> 
> > select strpos(v::citext, 'd') from tst;
> ERROR:  permission denied for function strpos
> 
> > select strpos(v::citext, 'd'::citext) from tst;
> ERROR:  permission denied for function strpos
> 
> > select strpos(v::text, 'd'::text) from tst;
>  strpos
> 
>   3
> (1 row)
> 
> Am I missing something here?
> 
> thanks in advance





Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns

2017-09-29 Thread Tom Lane
Andres Freund  writes:
> Does anybody have an opinion on whether we'll want to convert examples
> like testlibpq3.c (included in libpq.sgml) too? I'm inclined not to,
> because currently using pg_bswap.h requires c.h presence (just for a few
> typedefs and configure data).  There's also not really a pressing need.

We certainly mustn't encourage libpq users to start depending on c.h,
so let's leave that alone.

regards, tom lane




Re: [HACKERS] Parallel Append implementation

2017-09-29 Thread Robert Haas
On Sat, Sep 16, 2017 at 2:15 AM, Amit Kapila  wrote:
> Yes, we can do that, and that is what I think is probably better. So
> the question remains: in which case will a non-parallel-aware partial
> append be required?  Basically, it is not clear to me why, after
> having a parallel-aware partial append, we need the non-parallel-aware
> version.  Are you keeping it for the sake of backward compatibility,
> or for cases where someone has disabled parallel append with the help
> of the new GUC in this patch?

We can't use parallel append if there are pathkeys, because parallel
append will disturb the output ordering.  Right now, that never
happens anyway, but that will change when
https://commitfest.postgresql.org/14/1093/ is committed.

Parallel append is also not safe for a parameterized path, and
set_append_rel_pathlist() already creates those.  I guess it could be
safe if the parameter is passed down from above the Gather, if we
allowed that, but it's sure not safe in a case like this:

Gather
-> Nested Loop
  -> Parallel Seq Scan
  -> Append
 -> whatever

If it's not clear why that's a disaster, please ask for a more
detailed explanation...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pgbench - minor fix for meta command only scripts

2017-09-29 Thread Robert Haas
On Mon, Sep 11, 2017 at 4:49 AM, Fabien COELHO  wrote:
> Ok, the problem was a little bit more trivial than I thought.
>
> The issue is that under a low rate there may be no transaction in progress,
> however the wait procedure was relying on select's timeout. If nothing is
> active there is nothing to wait for, thus it was an active loop in this
> case...
>
> I've introduced a usleep call in place of select for this particular case.
> Hopefully this is portable.
>
> ISTM that this bug exists since rate was introduced, so shame on me and
> back-patching should be needed.

I took a look at this and found that the proposed patch applies
cleanly all the way back to 9.5, but the regression is reported to
have begun with a commit that starts in v10.  I haven't probed into
this in any depth, but are we sure that
12788ae49e1933f463bc59a6efe46c4a01701b76 is in fact where this problem
originated?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns

2017-09-29 Thread Andres Freund
On 2017-09-28 14:23:45 +0900, Michael Paquier wrote:
> On Thu, Sep 28, 2017 at 1:31 PM, Andres Freund  wrote:
> > On September 27, 2017 9:06:49 PM PDT, Andres Freund  
> > wrote:
> >>On 2017-09-28 00:01:53 -0400, Tom Lane wrote:
> >>> Could we please not perpetuate the brain-dead "s" and "l" suffixes
> >>> on these names?  Given the lack of standardization as to how long
> >>> "long" is, that's entirely unhelpful.  I'd be fine with names like
> >>> pg_ntoh16/32/64 and pg_hton16/32/64.
> >>
> >>Yes. I'd polled a few people and they leaned towards those. But I'm
> >>perfectly happy to do that renaming.
> >
> > If somebody wants to argue for replacing hton/ntoh with {to,from}big or 
> > *be, now's the time.
> 
> OK. pg_hton16/32/64 and pg_ntoh16/32/64 are fine enough IMO.

Does anybody have an opinion on whether we'll want to convert examples
like testlibpq3.c (included in libpq.sgml) too? I'm inclined not to,
because currently using pg_bswap.h requires c.h presence (just for a few
typedefs and configure data).  There's also not really a pressing need.

Greetings,

Andres Freund




Re: [HACKERS] alter server for foreign table

2017-09-29 Thread Nico Williams
On Fri, Sep 29, 2017 at 10:19:03PM +0200, David Fetter wrote:
> On Fri, Sep 29, 2017 at 01:47:59PM -0400, Tom Lane wrote:
> > Konstantin Knizhnik  writes:
> > > According to Postgresql documentation it is not possible to alter server 
> > > for foreign table:
> > > https://www.postgresql.org/docs/10/static/sql-alterforeigntable.html
> > 
> > Hmm, we'd have to check if the table's options were legal for the
> > new FDW, but in principle we could support this, I suppose.
> > Not sure if it's useful enough to be worth the trouble.
> 
> It would definitely be useful if it were available.  Nodes are a good
> bit more fungible than they were even 5 years back.

It would be nice to have DDL for every ALTERation that one could
make that is trivial to support.

It would also be nice to have a rule that every DDL (and every ADD/DROP
in ALTER statements) support IF EXISTS / IF NOT EXISTS, and, where
meaningful, OR REPLACE.  I work around a lot of missing IE/INE/OR by
using techniques like: conditioning on a schema query, using DROP IF
EXISTS then CREATE in a transaction (when the DROP has IE but the CREATE
does not have INE), catching exceptions, and so on.
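
For example, the DROP-then-CREATE workaround looks like this (a sketch;
object names invented):

    BEGIN;
    DROP TRIGGER IF EXISTS my_trig ON my_table;
    CREATE TRIGGER my_trig AFTER INSERT ON my_table
        FOR EACH ROW EXECUTE PROCEDURE my_func();
    COMMIT;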

These are little things -- quality of life sorts of things :)

I've also grown accustomed to writing complex pg_catalog queries.  It'd
be nice to have some built-in views on the pg_catalog -- something
better than the information_schema.  I have written some that we use for
code generation based on PG schemas; we'd be glad to contribute them.

Nico
-- 




Re: [HACKERS] Shaky coding for vacuuming partitioned relations

2017-09-29 Thread Tom Lane
Robert Haas  writes:
> On Fri, Sep 29, 2017 at 12:43 PM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> If I understand correctly, problem #1 is that get_rel_oids() can emit
>>> a user-visible cache lookup failure message. There is a proposed patch
>>> by Michael Paquier which appears to implement the design suggested by
>>> Tom.  I think that the normal procedure would be for Tom to commit
>>> that change if he's happy with it.

>> Yes, I'm happy to take responsibility for this.

> Great, thanks.

After thinking more carefully about the idea I proposed upthread (viz,
take and keep ShareUpdateExclusiveLock), I realized that that will be
pretty horrid once the proposed patch to allow VACUUM to name multiple
target tables goes in.  We'd be trying to take a pretty strong lock on
multiple tables at once during get_rel_oids(), creating substantial risk
of deadlock.  Up to now, VACUUM has always been careful to not take more
than one such lock at once, and I think we need to preserve that property.

The simplest solution therefore seems to be to just take AccessShareLock
and then release it as soon as we're done adding the rel and its
partitions to the list of target rels.  There's little point in taking a
stronger lock here if we are going to end the transaction and drop it
later, and it's a nicely low-risk answer for v10 in any case.

>>> I don't think I understand problem #2.  I think the concern is about
>>> reporting the proper relation name when VACUUM cascades from a
>>> partitioned table to its children and then some kind of concurrent DDL
>>> happens, but I don't see a clear explanation on the thread as to what
>>> exactly the failure scenario is, and I didn't see a problem in some
>>> simple tests I just ran.

>> I think the conclusion was that this wouldn't actually happen in v10,
>> but I will take a closer look and do something about it if it is possible.

> Even better, thanks.

After looking closer, I confirmed that there's no live bug today,
though it's a closer shave than one could wish.  The RangeVar passed to
vacuum_rel and/or analyze_rel can be wrong, or even NULL, but it is
only used when autovacuum.c requests failure logging, and for that
particular use-case the RangeVar will always be correct.

I felt though that I didn't want to just leave it like that, particularly
in view of the existence of multiple comments claiming things that were
entirely false.  So I adjusted those two functions to explicitly allow
for NULL RangeVars, and tried to improve the comments.

Nathan Bossart wants to see logging of relation-open failures in more
cases, and to get that we're going to have to work harder to track
valid RangeVars for target relations.  But that is not v10 material.

regards, tom lane




Re: [HACKERS] pg_prepared_xact_status

2017-09-29 Thread Robert Haas
On Fri, Sep 29, 2017 at 4:22 AM, Craig Ringer  wrote:
> This sounds kind-of like 1/4 of a distributed transaction resolver, without
> a way to make it reliable enough to build the other 3/4.
>
> To make this practical I think you'd need a way to retain historic GIDs +
> their outcomes, and a way to prune that information only once an application
> knows all interested participants consider the transaction finalized.
>
> I'd be all for a global xid status function if there were a good way to
> manage resource retention. But it's fuzzy enough for txid_status, which
> isn't really making any firm promises, just improving on the prior state of
> "no f'ing idea what happened to that tx, sorry". 2PC consumers will want
> absolute guarantees, not "dunno, sorry".

Very well said, and I agree.

I think the approach this patch takes is a non-starter for exactly the
reasons you have stated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] alter server for foreign table

2017-09-29 Thread David Fetter
On Fri, Sep 29, 2017 at 01:47:59PM -0400, Tom Lane wrote:
> Konstantin Knizhnik  writes:
> > According to Postgresql documentation it is not possible to alter server 
> > for foreign table:
> > https://www.postgresql.org/docs/10/static/sql-alterforeigntable.html
> 
> Hmm, we'd have to check if the table's options were legal for the
> new FDW, but in principle we could support this, I suppose.
> Not sure if it's useful enough to be worth the trouble.

It would definitely be useful if it were available.  Nodes are a good
bit more fungible than they were even 5 years back.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Arrays of domains

2017-09-29 Thread Andrew Dunstan


On 09/29/2017 01:10 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 09/28/2017 05:44 PM, Tom Lane wrote:
>>> Assuming that that's going to happen for v11, I'm inclined to leave the
>>> optimization problem until the dust settles around CaseTestExpr.
>>> Does anyone feel that this can't be committed before that's addressed?
>> I'm Ok with it as long as we don't forget to revisit this.
> I decided to go ahead and build a quick optimization for this case,
> as per the attached patch that applies on top of what we previously
> discussed.  It brings us back to almost par with HEAD:
>
>         HEAD           Patch          + 04.patch
>
> Q1      5453.235 ms    5440.876 ms    5407.965 ms
> Q2      9340.670 ms    10191.194 ms   9407.093 ms
> Q3      19078.457 ms   18967.279 ms   19050.392 ms
> Q4      48196.338 ms   58547.531 ms   48696.809 ms


Nice.

>
> Unless Andres feels this is too ugly to live, I'm inclined to commit
> the patch with this addition.  If we don't get around to revisiting
> CaseTestExpr, I think this is OK, and if we do, this will make sure
> that we consider this case in the revisit.
>
> It's probably also worth pointing out that this test case is intentionally
> chosen to be about the worst possible case for the patch.  A less-trivial
> coercion function would make the overhead proportionally less meaningful.
> There's also the point that the old code sometimes applies two layers of
> array coercion rather than one.  As an example, coercing int4[] to
> varchar(10)[] will do that.  If I replace "x::int8[]" with
> "x::varchar(10)[]" in Q2 and Q4 in this test, I get
>
>         HEAD            Patch (+04)
>
> Q2      46929.728 ms    20646.003 ms
> Q4      386200.286 ms   155917.572 ms
>
>   


Yeah, testing the worst case was the idea. This improvement in the
non-worst case is pretty good.

+1 for going ahead.


cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Partitions: \d vs \d+

2017-09-29 Thread Robert Haas
On Thu, Sep 28, 2017 at 9:33 PM, Amit Langote
 wrote:
> Perhaps, there is no case when "No partition constraint" should be output,
> but I may be missing something.

The case arises when a partitioned table has a default partition but
no other partitions.

I have committed the patch.

In v10, it's impossible to have a partition with no partition
constraint, and if it does happen due to some bug, the worst that will
happen is \d will just print nothing, rather than explicitly printing
that there's no constraint.  That's not a serious problem and it
shouldn't happen anyway, so no back-patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] UPDATE of partition key

2017-09-29 Thread Robert Haas
On Fri, Sep 22, 2017 at 1:57 AM, Amit Khandekar  wrote:
> The patch for the above change is :
> 0002-Prevent-a-redundant-ConvertRowtypeExpr-node.patch

Thinking about this a little more, I'm wondering about how this case
arises.  I think that for this patch to avoid multiple conversions,
we'd have to be calling map_variable_attnos on an expression and then
calling map_variable_attnos on that expression again.

>>> If I understand correctly, the reason for changing mt_partitions from
>>> ResultRelInfo * to ResultRelInfo ** is that, currently, all of the
>>> ResultRelInfos for a partitioning hierarchy are allocated as a single
>>> chunk, but we can't do that and also reuse the ResultRelInfos created
>>> during InitPlan.  I suggest that we do this as a preparatory patch.
>>
>> Ok, will prepare a separate patch. Do you mean to include in that
>> patch the changes I did in ExecSetupPartitionTupleRouting() that
>> re-use the ResultRelInfo structures of per-subplan update result rels
>> ?
>
> Above changes are in attached
> 0001-Re-use-UPDATE-result-rels-created-in-InitPlan.patch.

No, not all of those changes.  Just the adjustments to make
ModifyTableState's mt_partitions be of type ResultRelInfo ** rather
than ResultRelInfo *, and anything closely related to that.  Not, for
example, the num_update_rri stuff.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Multicolumn hash indexes

2017-09-29 Thread Nico Williams
On Fri, Sep 29, 2017 at 10:54:55AM -0400, Tom Lane wrote:
> There are few if any indexing techniques where the first column isn't
> significantly more important than the rest --- certainly that's true
> for btree, for example.  I do not think it's a showstopper if that's
> true for hash as well.

You have N>1 columns to index, and queries will use a conjunction of
all of them together, with equality predicates.  No one of those
columns is sufficiently selective, but all the columns together are
plenty selective enough.

Obviously a [multi-column] hash index should do.  The main question is
whether the planner can be made to not consider subsets of the columns
to the exclusion of the hash index -- maybe not, or not easily enough.

This is easy enough to implement with a B-Tree index on an expression
consisting of a decent hash function application to the row_to_json() of
a row composed of the columns in question.  But it requires using the
same expression in queries, naturally, which then acts as a barrier to
the planner's propensity to drop columns from consideration for index
selection.
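
A sketch of that workaround (table, column, and function names invented;
note that row_to_json() is only marked STABLE, so an IMMUTABLE wrapper is
needed for the index expression, which is safe here only if the columns'
JSON rendering is deterministic):

    CREATE FUNCTION row_hash(a int, b int, c text) RETURNS int
    IMMUTABLE LANGUAGE sql AS
        $$ SELECT hashtext(row_to_json(ROW(a, b, c))::text) $$;

    CREATE INDEX t_abc_hash_idx ON t (row_hash(a, b, c));

    -- queries must repeat the same expression for the index to apply,
    -- plus the plain equality predicates to recheck hash collisions:
    SELECT * FROM t
     WHERE row_hash(a, b, c) = row_hash(1, 2, 'x')
       AND a = 1 AND b = 2 AND c = 'x';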

A multi-column hash index facility shouldn't have to require anything
more than where clauses that are a conjunction of all the columns with
equality predicates.

Nico
-- 




Re: [HACKERS] User-perspective knowledge about wait events

2017-09-29 Thread Schneider
On Tue, Sep 26, 2017 at 4:28 PM, Michael Paquier
 wrote:
> Gathering a set of examples on wiki page with some rough
> analysis I think would be a good start.

I don't seem to have privs to create wiki pages; can someone else make
a page where we can begin to gather things like this?

Does https://wiki.postgresql.org/wiki/Wait_Events make sense?

-Jeremy

-- 
http://about.me/jeremy_schneider




Re: [HACKERS] pgbench stuck with 100% cpu usage

2017-09-29 Thread Peter Geoghegan
On Thu, Sep 28, 2017 at 10:36 PM, Pavan Deolasee
 wrote:
> Well, I think it's a very legitimate test, not for testing performance, but
> testing crash recovery and I use it very often. This particular test was run
> to catch another bug which will be reported separately.

Yeah, I use pgbench for stuff like that all the time.

-- 
Peter Geoghegan




Re: [HACKERS] pgbench stuck with 100% cpu usage

2017-09-29 Thread Fabien COELHO



> Committed and back-patched to v10.  I have to say I'm kind of
> surprised that the comment removed by this patch got committed in the
> first place.  It's got a ??? in it and isn't very grammatical either.


ISTM that I reviewed the initial patch.

AFAICR I agreed with the comment that whether it was appropriate to go on
was unclear, but it did not strike me as an obviously bad idea at the
time, so I let it pass. Now it does strike me as a bad idea (tm) :-) My
English is kind of fuzzy, so I tend not to comment too much on English
unless I'm really sure that it is wrong.


Note that there is another 100% CPU pgbench bug, see
https://commitfest.postgresql.org/15/1292/, which seems more likely to
occur in the wild than this one, but there has been no review of the fix
I sent.


--
Fabien.




Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files

2017-09-29 Thread Alexander Korotkov
On Fri, Sep 29, 2017 at 8:10 PM, chenhj  wrote:

> On 2017-09-30 00:53:31,"chenhj"  wrote:
>
> On 2017-09-29 19:29:40,"Alexander Korotkov" 
> wrote:
>
> On Fri, Sep 29, 2017 at 10:07 AM, chenhj  wrote:
>>
>>
>>
> OK.  That makes sense.  Thank you for the explanation.
>
> I still have some minor comments.
>
>
>> /*
>> +* Save the WAL filenames of the divergence and the current WAL insert
>> +* location of the source server. Later only the WAL files between
>> those
>> +* would be copied to the target data directory.
>>
>
> The comment is outdated.  We don't save filenames anymore; now we save
> segment numbers.
>
>
>> +* Note:The later generated WAL files in the source server before the
>> end
>> +* of the copy of the data files must be made available when the
>> target
>> +* server is started. This can be done by configuring the target
>> server as
>> +* a standby of the source server.
>> +*/
>
>
> You are missing a space after "Note:".  Also, it seems reasonable to me
> to leave an empty line before "Note:".
>
> # Setup parameter for WAL reclaim
>
>
> Parameter*s*, because you're setting up multiple of them.
>
> # The accuracy of imodification from pg_ls_waldir() is seconds, so sleep
>> one seconds
>
>
> One second without "s".
>
> Also, please check empty lines in 006_wal_copy.pl to be just empty lines
> without tabs.
>
>
> Thanks for your comments; I have fixed the above problems,
> and also added several line breaks at long lines in 006_wal_copy.pl.
> Please check this patch again.
>
>
> Sorry, patch v6 did not remove the tabs in two empty lines; please use
> the new one.
>

Great.  The code of this patch now looks good to me.
However, we forgot about the documentation.

  
>The result is equivalent to replacing the target data directory with the
>source one. Only changed blocks from relation files are copied;
>all other files are copied in full, including configuration files. The
>advantage of pg_rewind over taking a new base backup, or
>tools like rsync, is that pg_rewind does
>not require reading through unchanged blocks in the cluster. This makes
>it a lot faster when the database is large and only a small
>fraction of blocks differ between the clusters.
>   


At least this paragraph needs to be adjusted, because it states which
files are copied; probably also the later paragraphs that talk about WAL
files.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Multicolumn hash indexes

2017-09-29 Thread Alexander Korotkov
On Fri, Sep 29, 2017 at 6:33 PM, Robert Haas  wrote:

> Now you could also imagine something where we keep a separate set of
> hash buckets for each column and multi-column searches are handled by
> searching each hash table separately and taking the intersection of
> the sets of TIDs found for each column.  Not sure that's a good idea,
> but it's sort of like what GIN does.
>

I'd like to note that what GIN does for multicolumn indexes seems like
quite an artificial thing to me.  Logically, a multicolumn GIN index is
the same thing as multiple single-column GIN indexes.  The reason
multicolumn GIN exists is that, in a particular case, GIN can overlap
scan results over multiple attributes more efficiently than our executor
does.  The particular case is that scans over multiple columns return
ordered sets of TIDs: GIN implements a kind of merge join over TIDs,
while our executor can only do a BitmapAnd in this case.

However, if we imagine that, at the moment we developed GIN, our executor
could do a merge join of TIDs produced by multiple index scans, then we
wouldn't have invented multicolumn GIN at all, or we would have invented
a very different kind of multicolumn GIN.

I mean that multicolumn GIN was a good solution to a particular problem
from a practical point of view.  It allowed us to implement very
efficient algorithms with minimal changes to the planner/executor
infrastructure.  However, multicolumn GIN doesn't look like a design
pattern we would like to repeat.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] Re: COMMIT TRIGGERs, take n+1

2017-09-29 Thread Nico Williams
The attached file demonstrates how to create COMMIT, BEGIN, and even
session CONNECT TRIGGERs for PostgreSQL using PlPgSQL only, via normal
and CONSTRAINT TRIGGERs.

There have been many threads on the PG mailing lists about commit
triggers, with much skepticism shown about the possible semantics of
such a thing.  Below we list use cases, and demonstrate reasonable,
useful, and desirable semantics.

The previous version of this could be defeated by using SET CONSTRAINTS
..  IMMEDIATE.  This version detects this when it would cause commit
triggers to run too soon, and causes an exception to be raised.  The
technique used to detect this could be used by anyone whose business
logic breaks when SET CONSTRAINTS ..  IMMEDIATE is used.

There is one shortcoming of this implementation: it is inefficient
because it has to use FOR EACH ROW triggers under the hood, so if you
do 1,000 inserts, then 999 of the resulting internal trigger
procedure invocations will be unnecessary.  These are FOR EACH ROW
triggers because that is the only level permitted for CONSTRAINT
triggers, which are used under the hood to trigger running at commit
time.

(It would be nice if CONSTRAINT triggers could be FOR EACH STATEMENT
too...)

Use-cases:

 - create TEMP schema before it's needed by regular triggers

   This can be useful if CREATE TEMP TABLE IF NOT EXISTS and such in
   regular triggers would slow them down.

 - cleanup internal, temporary state left by triggers from earlier
   transactions

 - perform global consistency checks (assertions, if you like)

   Note that these can be made to scale by checking only the changes
   made by the current transaction.  Transition tables, temporal
   tables, audit tables -- these can all help for the purpose of
   checking only deltas as opposed to the entire database.

   Related: there was a thread about a patch to add assertions:

   
https://www.postgresql.org/message-id/flat/20131218113911.GC5224%40alap2.anarazel.de#20131218113911.gc5...@alap2.anarazel.de

 - update materializations of views when all the relevant deltas can
   be considered together

   I use an alternative view materialization system that allows direct
   updates of the materialization table, and records updates in a
   related history table.  Being able to update such materializations
   via triggers is very convenient; being able to defer such updates as
   long as possible is a nice optimization.

 - call C functions that have external side-effects when you know the
   transaction will succeed (or previous ones that have succeeded but
   not had those functions called)

Semantics:

 - connect/begin/commit trigger procedures called exactly once per
   transaction that had any writes (even if they changed nothing
   in the end), with one exception:

- exceptions thrown by triggers may be caught, and the triggering
  statement retried, in which case triggers will run again

 - connect/begin trigger procedures called in order of trigger
   name (ascending) before any rows are inserted/updated/deleted by
   any DML statements on non-TEMP tables in the current transaction

 - commit trigger procedures called in order of commit trigger name
   (ascending) at commit time, after the last statement sent by the
   client/user for the current transaction

 - begin and commit trigger procedures may perform additional write
   operations, and if so that will NOT cause additional invocations
   of begin/commit trigger procedures

 - commit trigger procedures may RAISE EXCEPTION, triggering a
   rollback of the transaction
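
A minimal sketch of the mechanism under the hood (table and function
names invented here; the attached file is considerably more elaborate):

    CREATE TABLE txn_hook (dummy int);

    CREATE FUNCTION on_commit() RETURNS trigger AS $$
    BEGIN
        RAISE NOTICE 'running at commit time';
        RETURN NULL;
    END $$ LANGUAGE plpgsql;

    -- a deferred CONSTRAINT TRIGGER fires at COMMIT, not at INSERT time
    CREATE CONSTRAINT TRIGGER commit_trig
        AFTER INSERT ON txn_hook
        DEFERRABLE INITIALLY DEFERRED
        FOR EACH ROW EXECUTE PROCEDURE on_commit();

    BEGIN;
    INSERT INTO txn_hook VALUES (1);
    COMMIT;  -- NOTICE is raised here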

Nico
-- 


commit_trigger.sql
Description: application/sql



Re: [HACKERS] A design for amcheck heapam verification

2017-09-29 Thread Robert Haas
On Fri, Sep 29, 2017 at 1:57 PM, Peter Geoghegan  wrote:
> On Fri, Sep 29, 2017 at 10:29 AM, Robert Haas  wrote:
>> I am also wondering whether this patch should consider
>> 81c5e46c490e2426db243eada186995da5bb0ba7 as a way of obtaining
>> multiple hash values.  I suppose that's probably inferior to what is
>> already being done on performance grounds, but I'll just throw out a
>> mention of it here all the same in case it was overlooked or the
>> relevance not spotted...
>
> Well, we sometimes only want one hash value. This happens when we're
> very short on memory (especially relative to the estimated final size
> of the set), so it's a fairly common requirement. And, we have a
> convenient way to get a second independent uint32 hash function now
> (murmurhash32()).

Right, so if you wanted to use the extended hash function
infrastructure, you'd just call the extended hash function with as
many different seeds as the number of hash functions you need.  If you
need 1, you call it with one seed, say 0.  And if you need any larger
number, well, cool.
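
Concretely (a sketch only -- the extended hash support functions from that
commit happen to be exposed at the SQL level, which makes the idea easy to
demonstrate; the Bloom filter code itself would call them at the C level):

    -- one key hashed with three different seeds yields three
    -- effectively independent hash values
    SELECT hashint4extended(42, 0),
           hashint4extended(42, 1),
           hashint4extended(42, 2);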

Like I say, I'm not at all sure this is better than what you've got
right now.  But it's an option.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pgbench stuck with 100% cpu usage

2017-09-29 Thread Robert Haas
On Fri, Sep 29, 2017 at 1:39 AM, Pavan Deolasee
 wrote:
> Looks good to me.

Committed and back-patched to v10.  I have to say I'm kind of
surprised that the comment removed by this patch got committed in the
first place.  It's got a ??? in it and isn't very grammatical either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] A design for amcheck heapam verification

2017-09-29 Thread Peter Geoghegan
On Fri, Sep 29, 2017 at 10:29 AM, Robert Haas  wrote:
> I am also wondering whether this patch should consider
> 81c5e46c490e2426db243eada186995da5bb0ba7 as a way of obtaining
> multiple hash values.  I suppose that's probably inferior to what is
> already being done on performance grounds, but I'll just throw out a
> mention of it here all the same in case it was overlooked or the
> relevance not spotted...

Well, we sometimes only want one hash value. This happens when we're
very short on memory (especially relative to the estimated final size
of the set), so it's a fairly common requirement. And, we have a
convenient way to get a second independent uint32 hash function now
(murmurhash32()).

-- 
Peter Geoghegan




Re: [HACKERS] A design for amcheck heapam verification

2017-09-29 Thread Peter Geoghegan
On Thu, Sep 28, 2017 at 8:34 PM, Thomas Munro
 wrote:
> FWIW I think if I were attacking that problem the first thing I'd
> probably try would be getting rid of that internal pointer
> filter->bitset in favour of a FLEXIBLE_ARRAY_MEMBER and then making
> the interface look something like this:
>
> extern size_t bloom_estimate(int64 total_elems, int work_mem);
> extern void bloom_init(bloom_filter *filter, int64 total_elems, int work_mem);
>
> Something that allocates new memory as the patch's bloom_init()
> function does I'd tend to call 'make' or 'create' or 'new' or
> something, rather than 'init'.

I tend to agree. I'll adopt that style in the next version. I just
didn't want the caller to have to manage the memory themselves.

-- 
Peter Geoghegan




Re: [HACKERS] psql \d sequence display

2017-09-29 Thread Peter Eisentraut
On 9/25/17 13:53, Fabien COELHO wrote:
> \d+ does not show more.
> 
> Maybe Type, Min, Max, Inc & Cycles are enough for \d?

That seems kind of arbitrary.  Start and Cache are just as relevant.

> The next/future or last/previous value is not shown. If one could be 
> available it would be nice to have?

You can get those from the sequence.  Running \d on a table doesn't show
the table data either.  So I think this is a valid distinction.
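
For example (a quick sketch; "s" is a hypothetical sequence):

    -- selecting from the sequence itself shows its state
    SELECT last_value, is_called FROM s;

    -- or, new in PG10, via the catalog view
    SELECT last_value FROM pg_sequences WHERE sequencename = 's';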

> Maybe some names are a little large, eg "Increment" could be "Inc.".
> The value is nearly always 1?

Yeah, but then that would look weird if only one heading were
abbreviated.  The total display width is good, so we don't need to
squeeze it too hard.

> Not sure why it is "Cycles" (plural) instead of "Cycle".

This is meant to be a verb, as in "does it cycle?".  I have changed it
to "Cycles?" to match other similar headings.

> I do not understand why some queries have "... \n" "... \n" and others
> have "\n ..." "\n ...". I would suggest to homogeneize around the former,
> because "\nFROM ..." is less readable.

Yeah that is a bit ugly, but I have just moved existing code around.

I have now committed my patch with minor adjustments, so we have
something working for PG10.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] alter server for foreign table

2017-09-29 Thread Tom Lane
Konstantin Knizhnik  writes:
> According to Postgresql documentation it is not possible to alter server 
> for foreign table:
> https://www.postgresql.org/docs/10/static/sql-alterforeigntable.html

Hmm, we'd have to check if the table's options were legal for the
new FDW, but in principle we could support this, I suppose.
Not sure if it's useful enough to be worth the trouble.

regards, tom lane




Re: [HACKERS] pgbench - minor fix for meta command only scripts

2017-09-29 Thread Fabien COELHO



> reality.  So I don't know if this needs backpatching or not.  But it
> should be fixed for v10, as there it becomes a demonstrably live issue.

Yes.

--
Fabien.




Re: [HACKERS] Bug with pg_basebackup and 'shared' tablespace

2017-09-29 Thread Robert Haas
On Fri, Sep 29, 2017 at 2:06 AM, Michael Paquier
 wrote:
> My tendency about this patch is still that it should be rejected. This
> is presenting additional handling for no real gain.

I vehemently disagree.  If the server lets you create a tablespace,
then everything that happens after that ought to work.

On another thread, there is the issue that if you create a tablespace
inside $PGDATA, things break.  We should either unbreak those things
or not allow creating the tablespace in the first place.  On this
thread, there is the issue that if you create two tablespaces for
different PG versions in the same directory, things break.  We should
either unbreak those things or not allow creating the tablespace in
the first place.

It is completely awful behavior to let users do things and then punish
them later for having done them.  Users are not obliged to read the
minds of the developers and guess what things the developers consider
"reasonable".  They should be able to count on the principle that if
they do something that we consider wrong, they'll get an error when
they try to do it -- not have it superficially appear to work and then
explode later.

To put that another way, there should be ONE rule about what is or is
not allowable in a particular situation, and all commands, utilities,
etc. that deal with that situation should handle it in a uniform
fashion.  Each .c file shouldn't get to make up its own notion of what
is or is not supported.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-09-29 Thread Robert Haas
On Fri, Sep 29, 2017 at 12:44 AM, Vaishnavi Prabakaran
 wrote:
> Option name "--enable-pgdumpall-behaviour"  is very generic

Yeah, that's a terrible name, at least in my opinion.

> and it is better
> to rename it to something that reflects its functionality like
> --skip-default-db-create/--no-default-db-create

But I wonder why this patch needs a new option at all?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] A design for amcheck heapam verification

2017-09-29 Thread Robert Haas
On Thu, Sep 28, 2017 at 11:34 PM, Thomas Munro
 wrote:
> FWIW I think if I were attacking that problem the first thing I'd
> probably try would be getting rid of that internal pointer
> filter->bitset in favour of a FLEXIBLE_ARRAY_MEMBER and then making
> the interface look something like this:
>
> extern size_t bloom_estimate(int64 total_elems, int work_mem);
> extern void bloom_init(bloom_filter *filter, int64 total_elems, int work_mem);

Yes, that seems quite convenient and is by now an established coding pattern.

I am also wondering whether this patch should consider
81c5e46c490e2426db243eada186995da5bb0ba7 as a way of obtaining
multiple hash values.  I suppose that's probably inferior to what is
already being done on performance grounds, but I'll just throw out a
mention of it here all the same in case it was overlooked or the
relevance not spotted...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files

2017-09-29 Thread chenhj
On 2017-09-30 00:53:31,"chenhj"  wrote:

On 2017-09-29 19:29:40,"Alexander Korotkov"  wrote:

On Fri, Sep 29, 2017 at 10:07 AM, chenhj  wrote:




OK.  That makes sense.  Thank you for the explanation.


I still have some minor comments.
 
/*
+* Save the WAL filenames of the divergence and the current WAL insert
+* location of the source server. Later only the WAL files between those
+* would be copied to the target data directory.



Comment is outdated.  We don't save filenames anymore, now we save segment 
numbers.
 
+* Note:The later generated WAL files in the source server before the end
+* of the copy of the data files must be made available when the target
+* server is started. This can be done by configuring the target server as
+* a standby of the source server.
+*/


You miss space after "Note:".  Also, it seems reasonable for me to leave empty 
line before "Note:".


# Setup parameter for WAL reclaim 


Parameter*s*, because you're setting up multiple of them.


# The accuracy of imodification from pg_ls_waldir() is seconds, so sleep one 
seconds


One second without "s".


Also, please check empty lines in 006_wal_copy.pl to be just empty lines 
without tabs.


Thanks for your comments, I have fixed the problems above.
I also added several line breaks at long lines in 006_wal_copy.pl.
Please check this patch again.


Sorry, patch v6 did not remove tabs in two empty lines, please use the new one.


Best Regards,
Chen Huajun

pg_rewind_wal_copy_reduce_v7.patch
Description: Binary data



Re: [HACKERS] Arrays of domains

2017-09-29 Thread Tom Lane
Andrew Dunstan  writes:
> On 09/28/2017 05:44 PM, Tom Lane wrote:
>> Assuming that that's going to happen for v11, I'm inclined to leave the
>> optimization problem until the dust settles around CaseTestExpr.
>> Does anyone feel that this can't be committed before that's addressed?

> I'm Ok with it as long as we don't forget to revisit this.

I decided to go ahead and build a quick optimization for this case,
as per the attached patch that applies on top of what we previously
discussed.  It brings us back to almost par with HEAD:

        HEAD            Patch           + 04.patch

Q1      5453.235 ms     5440.876 ms     5407.965 ms
Q2      9340.670 ms     10191.194 ms    9407.093 ms
Q3      19078.457 ms    18967.279 ms    19050.392 ms
Q4      48196.338 ms    58547.531 ms    48696.809 ms

Unless Andres feels this is too ugly to live, I'm inclined to commit
the patch with this addition.  If we don't get around to revisiting
CaseTestExpr, I think this is OK, and if we do, this will make sure
that we consider this case in the revisit.

It's probably also worth pointing out that this test case is intentionally
chosen to be about the worst possible case for the patch.  A less-trivial
coercion function would make the overhead proportionally less meaningful.
There's also the point that the old code sometimes applies two layers of
array coercion rather than one.  As an example, coercing int4[] to
varchar(10)[] will do that.  If I replace "x::int8[]" with
"x::varchar(10)[]" in Q2 and Q4 in this test, I get

        HEAD            Patch (+04)

Q2      46929.728 ms    20646.003 ms
Q4      386200.286 ms   155917.572 ms

regards, tom lane

diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index e0a8998..c5e97ef 100644
*** a/src/backend/executor/execExprInterp.c
--- b/src/backend/executor/execExprInterp.c
***
*** 34,43 
   *
   * For very simple instructions the overhead of the full interpreter
   * "startup", as minimal as it is, is noticeable.  Therefore
!  * ExecReadyInterpretedExpr will choose to implement simple scalar Var
!  * and Const expressions using special fast-path routines (ExecJust*).
!  * Benchmarking shows anything more complex than those may as well use the
!  * "full interpreter".
   *
   * Complex or uncommon instructions are not implemented in-line in
   * ExecInterpExpr(), rather we call out to a helper function appearing later
--- 34,41 
   *
   * For very simple instructions the overhead of the full interpreter
   * "startup", as minimal as it is, is noticeable.  Therefore
!  * ExecReadyInterpretedExpr will choose to implement certain simple
!  * opcode patterns using special fast-path routines (ExecJust*).
   *
   * Complex or uncommon instructions are not implemented in-line in
   * ExecInterpExpr(), rather we call out to a helper function appearing later
*** static Datum ExecJustConst(ExprState *st
*** 149,154 
--- 147,153 
  static Datum ExecJustAssignInnerVar(ExprState *state, ExprContext *econtext, bool *isnull);
  static Datum ExecJustAssignOuterVar(ExprState *state, ExprContext *econtext, bool *isnull);
  static Datum ExecJustAssignScanVar(ExprState *state, ExprContext *econtext, bool *isnull);
+ static Datum ExecJustApplyFuncToCase(ExprState *state, ExprContext *econtext, bool *isnull);
  
  
  /*
*** ExecReadyInterpretedExpr(ExprState *stat
*** 184,193 
  
  	/*
  	 * Select fast-path evalfuncs for very simple expressions.  "Starting up"
! 	 * the full interpreter is a measurable overhead for these.  Plain Vars
! 	 * and Const seem to be the only ones where the intrinsic cost is small
! 	 * enough that the overhead of ExecInterpExpr matters.  For more complex
! 	 * expressions it's cheaper to use ExecInterpExpr always.
  	 */
  	if (state->steps_len == 3)
  	{
--- 183,190 
  
  	/*
  	 * Select fast-path evalfuncs for very simple expressions.  "Starting up"
! 	 * the full interpreter is a measurable overhead for these, and these
! 	 * patterns occur often enough to be worth optimizing.
  	 */
  	if (state->steps_len == 3)
  	{
*** ExecReadyInterpretedExpr(ExprState *stat
*** 230,235 
--- 227,239 
  			state->evalfunc = ExecJustAssignScanVar;
  			return;
  		}
+ 		else if (step0 == EEOP_CASE_TESTVAL &&
+  step1 == EEOP_FUNCEXPR_STRICT &&
+  state->steps[0].d.casetest.value)
+ 		{
+ 			state->evalfunc = ExecJustApplyFuncToCase;
+ 			return;
+ 		}
  	}
  	else if (state->steps_len == 2 &&
  			 state->steps[0].opcode == EEOP_CONST)
*** ExecJustAssignScanVar(ExprState *state, 
*** 1811,1816 
--- 1815,1857 
  	return 0;
  }
  
+ /* Evaluate CASE_TESTVAL and apply a strict function to it */
+ static Datum
+ ExecJustApplyFuncToCase(ExprState *state, ExprContext *econtext, bool *isnull)
+ {
+ 	ExprEvalStep *op = &state->steps[0];
+ 	FunctionCallInfo fcinfo;
+ 	bool	   *argnull;
+ 	int			argno;
+ 	

Re: [HACKERS] SQL/JSON in PostgreSQL

2017-09-29 Thread Pavel Stehule
2017-09-29 12:15 GMT+02:00 Pavel Stehule :

>
>
> 2017-09-29 12:09 GMT+02:00 Nikita Glukhov :
>
>>
>>
>> I have some free time now. Is it last version?
>>
>> Regards
>>
>> Pavel
>>
>> Yes, this is still the latest version. Now I am working only on
>> unfinished WIP
>> patch no. 9, but I think it should be reviewed the last.
>>
>>
>
> ok
>
> Thank you
>

I have a few queries and notes:

1. Why does the first patch hold the GIN-related functionality? Can it be
separated?

2. Why do the JSON path functions start with "_"? These functions are not
removed by other patches.

3. What is the base for the jsonpath extensions? ANSI/SQL?

This patch set is pretty big - so I propose to push the JSONPath and
SQL/JSON related patches first, and then push the JSON_TABLE patch in the
next iteration. Is that an acceptable strategy? I am sure JSON_TABLE is a
pretty important function, but it is pretty complex too (significantly
more complex than XMLTABLE), so it may be practical to move it to a
separate project. I hope all the patches will be merged in the release 11
timeframe.

Regards

Pavel



> Pavel
>
> --
>> Nikita Glukhov
>> Postgres Professional: http://www.postgrespro.com
>> The Russian Postgres Company
>>
>
>


[HACKERS] alter server for foreign table

2017-09-29 Thread Konstantin Knizhnik
According to Postgresql documentation it is not possible to alter server 
for foreign table:


https://www.postgresql.org/docs/10/static/sql-alterforeigntable.html

But stackoverflow suggests the following hack, directly updating
pg_foreign_table:

https://stackoverflow.com/questions/37388756/may-i-alter-server-for-foreign-table

I wonder how safe it is, and if it is so simple, why is it not supported
in the ALTER FOREIGN TABLE statement?
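
For reference, the workaround from that answer looks roughly like this
(a sketch with hypothetical names; a direct catalog update bypasses the
FDW's option validation and pg_depend bookkeeping, so it is entirely at
your own risk):

    UPDATE pg_foreign_table
       SET ftserver = (SELECT oid FROM pg_foreign_server
                        WHERE srvname = 'new_server')
     WHERE ftrelid = 'my_foreign_table'::regclass;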


Thanks in advance,

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns

2017-09-29 Thread Robert Haas
On Fri, Sep 29, 2017 at 5:02 AM, tushar  wrote:
> Case 3- TIME=1000
>
> PG HEAD =>tps = 1061.031463 (excluding connections establishing)
> PG HEAD+patch => tps= 8011.784839(3.30+% vs. head)

Going from 1061 tps to 8011 tps is not a 3.3% gain.  I assume you
garbled this output somehow.

Also note that you really mean +3.30% not 3.30+%.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [POC] hash partitioning

2017-09-29 Thread Robert Haas
On Thu, Sep 28, 2017 at 1:54 AM, Amit Langote
 wrote:
> I looked into how satisfies_hash_partition() works and came up with an
> idea that I think will make constraint exclusion work.  What if we emitted
> the hash partition constraint in the following form instead:
>
> hash_partition_mod(hash_partition_hash(key1-exthash, key2-exthash),
> <modulus>) = <remainder>
>
> With that form, constraint exclusion seems to work as illustrated below:
>
> \d+ p0
> <...>
> Partition constraint:
> (hash_partition_modulus(hash_partition_hash(hashint4extended(a,
> '8816678312871386367'::bigint)), 4) = 0)
>
> -- note only p0 is scanned
> explain select * from p where
> hash_partition_modulus(hash_partition_hash(hashint4extended(a,
> '8816678312871386367'::bigint)), 4) = 0;

What we actually want constraint exclusion to cover is SELECT * FROM p
WHERE a = 525600;

As Amul says, nobody's going to enter a query in the form you have it
here.  Life is too short to take time to put queries into bizarre
forms.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files

2017-09-29 Thread chenhj
On 2017-09-29 19:29:40,"Alexander Korotkov"  wrote:

On Fri, Sep 29, 2017 at 10:07 AM, chenhj  wrote:




OK.  That makes sense.  Thank you for the explanation.


I still have some minor comments.
 
/*
+* Save the WAL filenames of the divergence and the current WAL insert
+* location of the source server. Later only the WAL files between those
+* would be copied to the target data directory.



Comment is outdated.  We don't save filenames anymore, now we save segment 
numbers.
 
+* Note:The later generated WAL files in the source server before the end
+* of the copy of the data files must be made available when the target
+* server is started. This can be done by configuring the target server as
+* a standby of the source server.
+*/


You miss space after "Note:".  Also, it seems reasonable for me to leave empty 
line before "Note:".


# Setup parameter for WAL reclaim 


Parameter*s*, because you're setting up multiple of them.


# The accuracy of imodification from pg_ls_waldir() is seconds, so sleep one 
seconds


One second without "s".


Also, please check empty lines in 006_wal_copy.pl to be just empty lines 
without tabs.


Thanks for your comments, I have fixed the problems above.
I also added several line breaks at long lines in 006_wal_copy.pl.
Please check this patch again.


--
Best Regards
Chen Huajun

pg_rewind_wal_copy_reduce_v6.patch
Description: Binary data



Re: [HACKERS] Shaky coding for vacuuming partitioned relations

2017-09-29 Thread Robert Haas
On Fri, Sep 29, 2017 at 12:43 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> If I understand correctly, problem #1 is that get_rel_oids() can emit
>> a user-visible cache lookup failure message. There is a proposed patch
>> by Michael Paquier which appears to implement the design suggested by
>> Tom.  I think that the normal procedure would be for Tom to commit
>> that change if he's happy with it.
>
> Yes, I'm happy to take responsibility for this.

Great, thanks.

>> I don't think I understand problem #2.  I think the concern is about
>> reporting the proper relation name when VACUUM cascades from a
>> partitioned table to its children and then some kind of concurrent DDL
>> happens, but I don't see a clear explanation on the thread as to what
>> exactly the failure scenario is, and I didn't see a problem in some
>> simple tests I just ran.  Furthermore, it sounds like this might get
>> fixed as part of committing the patch to allow VACUUM to mention
>> multiple tables, which Tom has indicated he will handle.
>
> I think the conclusion was that this wouldn't actually happen in v10,
> but I will take a closer look and do something about it if it is possible.

Even better, thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Shaky coding for vacuuming partitioned relations

2017-09-29 Thread Tom Lane
Robert Haas  writes:
> If I understand correctly, problem #1 is that get_rel_oids() can emit
> a user-visible cache lookup failure message. There is a proposed patch
> by Michael Paquier which appears to implement the design suggested by
> Tom.  I think that the normal procedure would be for Tom to commit
> that change if he's happy with it.

Yes, I'm happy to take responsibility for this.

> I don't think I understand problem #2.  I think the concern is about
> reporting the proper relation name when VACUUM cascades from a
> partitioned table to its children and then some kind of concurrent DDL
> happens, but I don't see a clear explanation on the thread as to what
> exactly the failure scenario is, and I didn't see a problem in some
> simple tests I just ran.  Furthermore, it sounds like this might get
> fixed as part of committing the patch to allow VACUUM to mention
> multiple tables, which Tom has indicated he will handle.

I think the conclusion was that this wouldn't actually happen in v10,
but I will take a closer look and do something about it if it is possible.

regards, tom lane




Re: [HACKERS] path toward faster partition pruning

2017-09-29 Thread Robert Haas
On Thu, Sep 28, 2017 at 5:16 AM, David Rowley
 wrote:
> I'd imagine, for
> each partition key, you'd want to store a Datum with the minimum and
> maximum possible value based on the quals processed. If either the
> minimum or maximum is still set to NULL, then it's unbounded at that
> end. If you encounter partkey = Const, then minimum and maximum can be
> set to the value of that Const. From there you can likely ignore any
> other quals for that partition key, as if the query did contain
> another qual with partkey = SomeOtherConst, then that would have
> become a gating qual during the constant folding process. This way if
> the user had written WHERE partkey >= 1 AND partkey <= 1 the
> evaluation would end up the same as if they'd written WHERE partkey =
> 1 as the minimum and maximum would be the same value in both cases,
> and when those two values are the same then it would mean just one
> theoretical binary search on a partition range to find the correct
> partition instead of two.

I have not looked at the code submitted here in detail yet but I do
think we should try to avoid wasting cycles in the
presumably-quite-common case where equality is being tested.  The
whole idea of thinking of this as minimum/maximum seems like it might
be off precisely for that reason.
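
To make the common case concrete (a sketch; "p" is a hypothetical
partitioned table):

    -- under a min/max scheme both of these collapse to minimum = maximum = 1,
    -- but the first form is by far the more common one, and arguably deserves
    -- a direct equality fast path instead of two bound comparisons
    SELECT * FROM p WHERE partkey = 1;
    SELECT * FROM p WHERE partkey >= 1 AND partkey <= 1;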

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Shaky coding for vacuuming partitioned relations

2017-09-29 Thread Bossart, Nathan
On 9/29/17, 11:18 AM, "Robert Haas"  wrote:
> I don't think I understand problem #2.  I think the concern is about
> reporting the proper relation name when VACUUM cascades from a
> partitioned table to its children and then some kind of concurrent DDL
> happens, but I don't see a clear explanation on the thread as to what
> exactly the failure scenario is, and I didn't see a problem in some
> simple tests I just ran.  Furthermore, it sounds like this might get
> fixed as part of committing the patch to allow VACUUM to mention
> multiple tables, which Tom has indicated he will handle.

Yes.  It looks like v10 is safe, and the vacuum-multiple-relations
patch should help prevent any future logging issues caused by this.

Discussion here: 
http://postgr.es/m/CAB7nPqRX1465FP%2Bameysxxt63tCQDDW6KvaTPMfkSxaT1TFGfw%40mail.gmail.com

Nathan




Re: [HACKERS] pgbench - minor fix for meta command only scripts

2017-09-29 Thread Jeff Janes
On Mon, Sep 11, 2017 at 6:27 PM, Fabien COELHO  wrote:

>
> Hello Jeff,
>
> Shouldn't we use pg_usleep to ensure portability?  it is defined for
>> front-end code.  But it returns void, so the error check will have to be
>> changed.
>>
>
> Attached v3 with pg_usleep called instead.
>
> I didn't see the problem before the commit I originally indicated , so I
>> don't think it has to be back-patched to before v10.
>>
>
> Hmmm you've got a point, although I'm not sure how it could work
> without sleeping explicitely. Maybe the path was calling select with an
> empty wait list plus timeout, and select is kind enough to just sleep on an
> empty list, or some other miracle.


Not really a miracle, calling select with an empty list of file handles is
a standard way to sleep on Unix-like platforms.  (Indeed, that is how
pg_usleep is implemented on non-Windows platforms, see
"src/port/pgsleep.c").  The problem is that it is reportedly not portable
to Windows.  But I tested pgbench.exe for 9.6.5-1 from EDB installer, and I
don't see excessive CPU usage for a throttled run, and it throttles to
about the correct speed.  So maybe the non-portability is more rumor than
reality.  So I don't know if this needs backpatching or not.  But it should
be fixed for v10, as there it becomes a demonstrably live issue.



> ISTM clearer to explicitly sleep in that case.


Yes.

 Cheers,

Jeff


Re: [HACKERS] The case for removing replacement selection sort

2017-09-29 Thread Peter Geoghegan
On Fri, Sep 29, 2017 at 7:19 AM, Robert Haas  wrote:
> That supports your theory that there's some confounding factor in the
> CREATE INDEX case, such as I/O scheduling.  Since this machine has an
> SSD, I guess I don't have a mental model for how that works.  We're
> not waiting for the platter to rotate...

Random I/O is still significantly more expensive with SSDs, especially
random writes, where all the wear leveling stuff comes into play.
There is a tiny universe of very complicated firmware within every SSD
[1]. (I am generally concerned about the trend towards increasingly
complicated, unauditable firmware like this, but that's another
story.)

> ...but I guess that's all irrelevant as far as this patch goes.  The
> point of this patch is to simplify things from removing a technique
> that is no longer effective, and the evidence we have supports the
> contention that it is no longer effective.  I'll go commit this.

Thanks.

[1] https://lwn.net/Articles/353411/
-- 
Peter Geoghegan




Re: [HACKERS] Shaky coding for vacuuming partitioned relations

2017-09-29 Thread Robert Haas
On Thu, Sep 28, 2017 at 1:31 AM, Noah Misch  wrote:
> This thread now has two open items, both of them pertaining to VACUUM error
> messages involving partitioning.  The pair is probably best treated as a
> single open item.

If I understand correctly, problem #1 is that get_rel_oids() can emit
a user-visible cache lookup failure message. There is a proposed patch
by Michael Paquier which appears to implement the design suggested by
Tom.  I think that the normal procedure would be for Tom to commit
that change if he's happy with it.

I don't think I understand problem #2.  I think the concern is about
reporting the proper relation name when VACUUM cascades from a
partitioned table to its children and then some kind of concurrent DDL
happens, but I don't see a clear explanation on the thread as to what
exactly the failure scenario is, and I didn't see a problem in some
simple tests I just ran.  Furthermore, it sounds like this might get
fixed as part of committing the patch to allow VACUUM to mention
multiple tables, which Tom has indicated he will handle.

> [Action required within three days.  This is a generic notification.]

If someone could specify a particular action they'd like me to take,
I'll consider taking it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] list of credits for release notes

2017-09-29 Thread Peter Eisentraut
On 9/29/17 11:35, Robert Haas wrote:
> On Wed, Sep 27, 2017 at 8:29 PM, Michael Paquier
>  wrote:
>> Looking at this list, the first name is followed by the family name,
>> so there are inconsistencies with some Japanese names:
>> - Fujii Masao should be Masao Fujii.
>> - KaiGai Kohei should be Kohei Kaigai.
> 
> But his emails say Fujii Masao, not Masao Fujii.
> 
> KaiGai's case is a bit trickier, as his email name has changed over time.

Yes, I used the form that the person used in their emails.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-09-29 Thread Robert Haas
On Wed, Sep 27, 2017 at 11:15 PM, Masahiko Sawada  wrote:
> I think that making a resolver process have connection caches to each
> foreign server for a while can reduce the overhead of connection to
> foreign servers. These connections will be invalidated by DDLs. Also,
> most of the time we spend to commit a distributed transaction is the
> interaction between the coordinator and foreign servers using
> two-phase commit protocal. So I guess the time in signalling to a
> resolver process would not be a big overhead.

I agree.  Also, in the future, we might try to allow connections to be
shared across backends.  I did some research on this a number of years
ago and found that every operating system I investigated had some way
of passing a file descriptor from one process to another -- so a
shared connection cache might be possible.

Also, we might port the whole backend to use threads, and then this
problem goes away.  But I don't have time to write that patch this
week.  :-)

It's possible that we might find that neither of the above approaches
are practical and that the performance benefits of resolving the
transaction from the original connection are large enough that we want
to try to make it work anyhow.  However, I think we can postpone that
work to a future time.  Any general solution to this problem at least
needs to be ABLE to resolve transactions at a later time from a
different session, so let's get that working first, and then see what
else we want to do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] list of credits for release notes

2017-09-29 Thread Robert Haas
On Wed, Sep 27, 2017 at 8:29 PM, Michael Paquier
 wrote:
> Looking at this list, the first name is followed by the family name,
> so there are inconsistencies with some Japanese names:
> - Fujii Masao should be Masao Fujii.
> - KaiGai Kohei should be Kohei Kaigai.

But his emails say Fujii Masao, not Masao Fujii.

KaiGai's case is a bit trickier, as his email name has changed over time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Multicolumn hash indexes

2017-09-29 Thread Robert Haas
On Fri, Sep 29, 2017 at 10:54 AM, Tom Lane  wrote:
> There are few if any indexing techniques where the first column isn't
> significantly more important than the rest --- certainly that's true
> for btree, for example.

Well, BRIN doesn't care.

> I do not think it's a showstopper if that's
> true for hash as well.

Certainly true, but on the other hand, if the design for which you are
advocating doesn't address a problem Tomasz has, he probably won't
want to bother implementing it.

I think it's pretty interesting to think about what other index
designs we might have that use hashing but are not the same as our
existing hash indexes.  Reviewing Amit's patches for hash indexes has
helped me to understand how the hash AM works a lot better than I had
before, and it's really a pretty limiting design.  The buckets aren't
sorted by hashcode overall, just within a page, which sucks, and the
way that storage allocations are managed within the index is painful.
It might be worth thinking about what we might create for a hash index
today if we were starting over from scratch.  Andres proposed a
hash-over-btree thing where you build a hash value for each tuple and
then sort them by hash code and arrange the results into a btree; I
think that would have the same multi-column index problems you're
talking about here, but it would be more space-efficient.

Now you could also imagine something where we keep a separate set of
hash buckets for each column and multi-column searches are handled by
searching each hash table separately and taking the intersection of
the sets of TIDs found for each column.  Not sure that's a good idea,
but it's sort of like what GIN does.
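
A rough present-day analogue of that idea is two single-column hash indexes
combined with a bitmap AND, which likewise intersects the sets of TIDs
found for each column (a sketch; names are hypothetical):

    CREATE TABLE t (a int, b int);
    CREATE INDEX t_a_hash ON t USING hash (a);
    CREATE INDEX t_b_hash ON t USING hash (b);

    -- depending on costs, the planner may answer this with a BitmapAnd
    -- over both hash indexes, i.e. by intersecting the matching TID sets
    EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2;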

There are probably other ideas, too.  We should investigate what
others have done.  I see that
https://docs.microsoft.com/en-us/sql/relational-databases/in-memory-oltp/hash-indexes-for-memory-optimized-tables
section E.1 claims that SQL Server requires a hash index to include an
equality condition on every column in the index.  I still think that's
an appealing option from a theoretical point of view even if the
implementation difficulties are formidable.

Come to think of it, I think that the planner behavior you describe
whereby we put a premium on throwing away unnecessary conditions is
probably not so great in other cases, too.  In a btree index, we have
to step through all the tuples in the range covered by the index scan
one by one; if the join sitting above the index scan will reject those
tuples anyway when it tests the join qual, I can see that doing the
same test redundantly at the scan level could well be a loser.  But
suppose it's a BRIN index.  Now doing the test at the scan level is a
lot more likely to win -- I think -- because BRIN can perform that
test once per page rather than once per tuple, whereas the join will
have to reject individual tuples.

This sounds very similar to the question of whether we ought to infer
implied inequalities, which as you know has been a regular complaint.
It's certainly the case that enforcing an inequality in multiple
places can be inefficient, but it can also be a tremendous win if the
inequality is highly selective, which is not uncommon (e.g. imagine a
sort-and-merge-join between two tables and an inequality on the join
column that knocks out 90% - or 99% - of the rows in each).  I have
been wondering whether we need a concept of an optional qual.  Instead
of dropping a qual that doesn't need to be enforced, move it to a
separate bucket of quals that can be enforced if it looks like a win
from a cost perspective and otherwise ignored without breaking
anything.  Implied inequalities (and maybe implied equalities) could
be thrown into such a bucket as well, allowing the decision about
whether to enforce them to be based on some algorithm or heuristic
rather than a summary judgement.
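
A concrete shape of the implied-inequality case (a sketch; tables and
columns are hypothetical):

    -- from a.k = b.k and a.k > 1000 the planner could derive b.k > 1000;
    -- enforcing that derived qual at b's scan can knock out most of b
    -- before the merge join, but it is pure overhead when it isn't selective
    SELECT *
    FROM a
    JOIN b ON a.k = b.k
    WHERE a.k > 1000;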

Or maybe that's not the right concept at all, but I think we need some
idea about how cases like this can be handled.  People want the
database to handle the cases about which they care, not judge which
cases those are.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: bgw_type (was Re: [HACKERS] Why does logical replication launcher set application_name?)

2017-09-29 Thread Peter Eisentraut
On 9/27/17 18:59, Daniel Gustafsson wrote:
> The patch applies with minor fuzz, compiles without introduced warnings and
> work the way it says on the tin.  The utility of the proposed functionality is
> a clear win so +1 on getting that in.

I have committed this patch incorporating the feedback in this thread.

> Regarding the following hunk:
> 
> +   process listing, for example.  bgw_name on the
> +   other hand can contain additional information about the specific process.
> +   (Typically, the string for bgw_name will 
> contain
> +   the string for bgw_type somehow, but that is 
> not
> +   strictly required.)
> 
> This reads a bit too (oddly) detailed to me, I would’ve phrased it as 
> something
> along the lines of:
> 
> "Typically, the string for bgw_name will contain the type somehow, but 
> that
>  is not strictly required.”

Yes, that's better.

> I find omitting “background worker” in the following hunk is leaving out
> valuable information for bgworkers with badly configured types, but it’s
> definitely a matter of taste rather than a straight-up patch critique:
> 
> -  errmsg("terminating background worker \"%s\" due to administrator 
> command",
> - MyBgworkerEntry->bgw_name)));
> +  errmsg("terminating %s due to administrator command",
> + MyBgworkerEntry->bgw_type)));

Also changed.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: bgw_type (was Re: [HACKERS] Why does logical replication launcher set application_name?)

2017-09-29 Thread Peter Eisentraut
On 8/31/17 23:22, Michael Paquier wrote:
> On Fri, Sep 1, 2017 at 4:49 AM, Peter Eisentraut
>  wrote:
>> On 5/30/17 23:10, Peter Eisentraut wrote:
>>> Here is a proposed solution that splits bgw_name into bgw_type and
>>> bgw_name_extra.  bgw_type shows up in pg_stat_activity.backend_type.
>>> Uses of application_name are removed, because they are no longer
>>> necessary to identity the process type.
>>
>> Updated patch incorporating the feedback.  I have kept bgw_name as it
>> was and just added bgw_type completely independently.
> 
> - errmsg("terminating background worker \"%s\" due to
> administrator command",
> -MyBgworkerEntry->bgw_name)));
> + errmsg("terminating %s due to administrator command",
> +MyBgworkerEntry->bgw_type)));
> "terminating background worker %s of type %s due to administrator
> command", bgw_name, bgw_type?

OK.

>> One open question is how to treat a missing (empty) bgw_type.  I
>> currently fill in bgw_name as a fallback.  We could also treat it as an
>> error or a warning as a transition measure.
> 
> Hm. Why not reporting an empty type string as NULL at SQL level and
> just let it empty them? I tend to like more interfaces that report
> exactly what is exactly registered at memory-level, because that's
> easier to explain to users and in the documentation, as well as easier
> to interpret and easier for module developers.

The problem here is that we refer to bgw_type in a bunch of places now,
and adding a suitable fallback in all of these places would be a lot of
code and it would create a regression in behavior.  In practice, I think
that would be a lot of trouble for no gain.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Multicolumn hash indexes

2017-09-29 Thread Tom Lane
Robert Haas  writes:
> Maybe you're worrying about something like a billion-row table where
> there are 3 columns that form a composite key: (1,1,1), (1,1,2), ...,
> (1,1,1000),(1,2,1),...,(1,1000,1000),(2,1,1),...,(1000,1000,1000).  In
> that case, treating the leading column as most important will indeed
> be terrible, since we'll put all billion rows into 1000 buckets no
> matter how many bucket splits we do.

> That seems a little unusual, though.

There are few if any indexing techniques where the first column isn't
significantly more important than the rest --- certainly that's true
for btree, for example.  I do not think it's a showstopper if that's
true for hash as well.

regards, tom lane




Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index

2017-09-29 Thread Alexander Korotkov
On Fri, Sep 8, 2017 at 4:07 AM, Thomas Munro 
wrote:

> Hi Shubham,
>
> On Tue, Jun 27, 2017 at 9:21 PM, Shubham Barai 
> wrote:
> > If these two hash keys (78988658 and 546789888) mapped to the same
> bucket, this will result in false serialization failure.
> > Please correct me if this assumption about false positives is wrong.
>
> I wonder if there is an opportunity to use computed hash values
> directly in predicate lock tags, rather than doing it on the basis of
> buckets.  Perhaps I'm missing something important about the way that
> locks need to escalate that would prevent that from working.


+1,
Very nice idea!  Locking hash values directly seems to be superior over
locking hash index pages.
Shubham, do you have any comment on this?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Multicolumn hash indexes

2017-09-29 Thread Robert Haas
On Wed, Sep 27, 2017 at 5:52 PM, Tomasz Ostrowski
 wrote:
> I feel that this would eliminate a large amount of potential gains from such
> an index. This would be usable only when a sufficiently variable column
> exists, in which case a simple hash index on the column wouldn't be much
> worse.

If the index isn't very selective and any query will match many rows
anyway, I think it will often be superior to use a BRIN index (but I
might be wrong).

Maybe you're worrying about something like a billion-row table where
there are 3 columns that form a composite key: (1,1,1), (1,1,2), ...,
(1,1,1000),(1,2,1),...,(1,1000,1000),(2,1,1),...,(1000,1000,1000).  In
that case, treating the leading column as most important will indeed
be terrible, since we'll put all billion rows into 1000 buckets no
matter how many bucket splits we do.

That seems a little unusual, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] The case for removing replacement selection sort

2017-09-29 Thread Robert Haas
On Thu, Sep 28, 2017 at 6:44 PM, Peter Geoghegan  wrote:
> I'm glad to hear that. But, I should reiterate that if sorting
> actually gets faster when my patch is applied, then that's something
> that I consider to be a bonus. (This is primarily a refactoring patch,
> to remove a bunch of obsolete code.)

I understand that.  The tests above weren't about your patch; they
were about whether replacement selection still has value.

>> Any idea what causes this regression?
>
> I don't know. My best guess is that the overall I/O scheduling is now
> suboptimal due to commit e94568e (Heikki's preloading thing), because
> this is CREATE INDEX, and there is new competitive pressure. You might
> find it hard to replicate the problem with a "SELECT COUNT(DISTINCT
> aid) FROM pgbench_accounts", which would confirm this explanation. Or,
> you could also see what happens with a separate temp tablespace.

I tried out that test case, again on 9.6 and master, again with scale
factor 300.  On 9.6, with default settings, it took ~12.5 seconds.
With 4MB of work_mem, it took about ~13.2 s with replacement selection
and ~12.5 s using quicksorted runs.  On master, with default settings,
it took ~8.4 seconds.  With 4MB of work_mem, it took about ~12.3 s
using replacement selection and ~8.4 seconds using quicksorted runs.
So here, everything was faster on master, but replacement selection
was only slightly faster while the other technique was a lot faster.

That supports your theory that there's some confounding factor in the
CREATE INDEX case, such as I/O scheduling.  Since this machine has an
SSD, I guess I don't have a mental model for how that works.  We're
not waiting for the platter to rotate...

...but I guess that's all irrelevant as far as this patch goes.  The
point of this patch is to simplify things from removing a technique
that is no longer effective, and the evidence we have supports the
contention that it is no longer effective.  I'll go commit this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] plpgsql_check future

2017-09-29 Thread Fabien COELHO


Hello Pavel,


> 1. It is placed on my personal repository on GitHub. It is far from
> perfect from security, from substitutability perspective.
>
> Any ideas?


Ask to have a copy on git.postgresql.org?

--
Fabien.




Re: [HACKERS] Fix bloom WAL tap test

2017-09-29 Thread Alexander Korotkov
On Wed, Sep 6, 2017 at 5:06 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Wed, Sep 6, 2017 at 4:08 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> I just realized that these lines of contrib/bloom/t/001_wal.pl don't
>> check that queries give same results on master and standby.  They just
>> check that *return codes* of psql are equal.
>>
>> # Run test queries and compare their result
>>> my $master_result = $node_master->psql("postgres", $queries);
>>> my $standby_result = $node_standby->psql("postgres", $queries);
>>> is($master_result, $standby_result, "$test_name: query result matches");
>>
>>
>> Attached patch fixes this problem by using safe_psql() which returns psql
>> output instead of return code.  For safety, this patch replaces psql() with
>> safe_psql() in other places too.
>>
>> I think this should be backpatched to 9.6 as bugfix.
>>
>
> Also, it would be nice to call wal-check on bloom check (now wal-check
> isn't called even on check-world).
> See attached patch.  My changes to Makefile could be cumbersome.  Sorry
> for that, I don't have much experience with them...
>

This topic hasn't received any attention yet.  Apparently, that's because
of the in-progress commitfest and upcoming release.
I've registered it in the upcoming commitfest as a bug fix.
https://commitfest.postgresql.org/15/1313/

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Enhancements to passwordcheck

2017-09-29 Thread Magnus Hagander
On Fri, Sep 29, 2017 at 3:17 PM, Alvaro Herrera 
wrote:

> Michael Paquier wrote:
>
> > Client commands may be run on a trusted network as well, let's not
> > forget that.
>
> I've never seen this "trusted network" thing of which you speak.  Is it
> nice?
>

I believe it's usually referred to as "localhost"?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] Arrays of domains

2017-09-29 Thread Andrew Dunstan


On 09/28/2017 05:44 PM, Tom Lane wrote:
>
> I get these query timings in a non-cassert build:
>
>       HEAD            Patch
>
> Q1    5453.235 ms     5440.876 ms
> Q2    9340.670 ms     10191.194 ms
> Q3    19078.457 ms    18967.279 ms
> Q4    48196.338 ms    58547.531 ms
>
>
[ analysis elided]
>
> Assuming that that's going to happen for v11, I'm inclined to leave the
> optimization problem until the dust settles around CaseTestExpr.
> Does anyone feel that this can't be committed before that's addressed?
>
>   


I'm Ok with it as long as we don't forget to revisit this.

cheers

andrew

-- 
Andrew Dunstan            https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Enhancements to passwordcheck

2017-09-29 Thread Alvaro Herrera
Michael Paquier wrote:

> Client commands may be run on a trusted network as well, let's not
> forget that.

I've never seen this "trusted network" thing of which you speak.  Is it
nice?

-- 
Álvaro Herrera            https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Patch: add --if-exists to pg_recvlogical

2017-09-29 Thread Peter Eisentraut
On 9/22/17 15:31, Peter Eisentraut wrote:
> I suggest you create a patch that focuses on the --create part.
> 
> You can create a separate follow-on patch for the --start part if you
> wish, but I think that that patch would be rejected.

I have moved this entry to the next commit fest, awaiting your updated
patch.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test

2017-09-29 Thread Peter Eisentraut
On 9/22/17 16:48, Peter Eisentraut wrote:
> On 9/19/17 19:00, Michael Paquier wrote:
>> On Wed, Sep 20, 2017 at 7:57 AM, Peter Eisentraut
>>  wrote:
>>> To get things rolling, I have committed just the basic TAP tests, to
>>> give it a spin on the build farm.  I'll work through the rest in the
>>> coming days.
> 
> I have reverted this because of the build farm issue.  Putting the patch
> on hold in the CF until we have a new plan.

Set to "Returned with feedback" now.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Walsender timeouts and large transactions

2017-09-29 Thread Sokolov Yura

Good day, Kyotaro

On 2017-09-29 11:26, Kyotaro HORIGUCHI wrote:

Hello,

At Wed, 27 Sep 2017 14:28:37 +0300, Sokolov Yura
 wrote in
<90bb67da7131e6186b50897c4b0f0...@postgrespro.ru>

On 2017-09-12 11:28, Kyotaro HORIGUCHI wrote:
> Hello,
> At Wed, 06 Sep 2017 13:46:16 +, Yura Sokolov
>  wrote in
> <20170906134616.18925.88390.p...@coridan.postgresql.org>
> As the result, I think that the functions should be modified as
> the following.
> - Forcing slow-path if time elapses a half of a ping period is
>   right. (GetCurrentTimestamp is anyway requried.)
> - The slow-path should not sleep waiting Latch. It should just
>   pg_usleep() for maybe 1-10ms.
> - We should go to the fast path just after keepalive or response
>   message has been sent. In other words, the "if (now <" block
>   should be in the "for (;;)" loop. This avoids needless runs on
>   the slow-path.
> It would be refactorable as the following.
>   prepare for the send buffer;
>   for (;;)
>   {
> now = GetCurrentTimeStamp();
> if (now < )...
> {
>   fast-path
> }
> else
> {
>   slow-path
> }
> return if finished
> sleep for 1ms?
>   }
> What do you think about this?
> regards,
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center

Good day, Petr, Kyotaro

I've created failing test for issue (0001-Add-failing-test...) .
It tests insertion of 2 rows with 10ms wal_sender_timeout
(it fails in WalSndWriteData on master) and then deletion of
those rows with 1ms wal_sender_timeout (it fails in WalSndLoop).

Neither Petr's patch nor my simplified suggestion passed the
test. I didn't check Kyotaro's suggestion, though, because I
didn't understand it well.


Mmm. The test seems broken. wal_sender_timeout = 10ms with
wal_receiver_status_interval=10s immediately causes a
timeout. Avoiding the timeout is just breaking the sane code.


I think you're not right. `wal_receiver_status_interval` is just
for status updates, not for replies. Before I made my version of the
patch, I added logging of every `now` and `last_reply_timestamp` during
the test run. `last_reply_timestamp` definitely updates more often than
once in 10s. So, `wal_receiver_status_interval = 10s` has nothing
to do with the receiver's replies, as far as I can see.

(btw, logging slows down sender enough to "fix" test :-) )

And my patch doesn't avoid the timeout check, so it doesn't break
anything. It just delays the timeout by 1ms. Given that it is
impractical to set wal_sender_timeout to less than 50ms in production,
this 1ms increase in the timeout check is negligible.

And I've checked just now that my patch passes the test even with
wal_receiver_status_interval = 10s.



wal_sender_timeout = 3s and wal_receiver_status_interval=1s
effectively causes the problem with about 100 lines of (int)
insertion on UNIX socket connection, on my poor box.


I don't want the test to last so long and generate so much data.
That is why I used such small timeouts in the tests.


The original complaint here came from the fact that
WalSndWriteData skips processing of replies for a long time on a
fast network. While Petr's patch fixed that problem, I pointed
out that just letting the function take the slow path leads to
another problem: waiting for new WAL records can result in an
unwanted pause in the slow path.

Combining the solutions for the two problems is my proposal cited
above. The wording may be in quite bad style, but the attached
file is the concrete patch for it.


The test fails if there is a "short quit" after `!pq_is_send_pending()`,
so I doubt your patch will pass the test.
Also, you've changed the calculated sleep time with sane waiting on all
interesting events (using WaitLatchOrSocket) into a semi-busy loop.
That could at least affect throughput.

And why did you remove `SetLatch(MyLatch)` at the end of the function?
This change is probably correct, but it is not obvious.


Any thoughts?


It certainly could be that my test and my patch are wrong. But my point
is that the test should be written first, especially for such a difficult
case. Without a test it is impossible to say whether our patches fix
anything. And it is impossible to say whether a patch does something
wrong. And impossible to say whether a patch fixes this problem but
introduces a new one.

Please write tests for your remarks. If you think my patch breaks
something, write a test for the case my patch broke. If you think
my test is wrong, write a test that is more correct.

Without tests it will be just bird's hubbub.


regards,


regards,
--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company




Re: [HACKERS] Parallel Append implementation

2017-09-29 Thread Amit Kapila
On Wed, Sep 20, 2017 at 10:59 AM, Amit Khandekar  wrote:
> On 16 September 2017 at 10:42, Amit Kapila  wrote:
>> On Thu, Sep 14, 2017 at 9:41 PM, Robert Haas  wrote:
>>> On Mon, Sep 11, 2017 at 9:25 AM, Amit Kapila  
>>> wrote:
 I think the patch stores only non-partial paths in decreasing order,
 what if partial paths having higher costs follow those paths?
>>>
>>> The general picture here is that we don't want the leader to get stuck
>>> inside some long-running operation because then it won't be available
>>> to read tuples from the workers.  On the other hand, we don't want to
>>> just have the leader do no work because that might be slow.  And in
>>> most cases, the leader will be the first participant to arrive at
>>> the Append node, because of the worker startup time.  So the idea is
>>> that the workers should pick expensive things first, and the leader
>>> should pick cheap things first.
>>>
>>
>> At a broader level, the idea is good, but I think it won't turn out
>> exactly like that considering your below paragraph which indicates
>> that it is okay if leader picks a partial path that is costly among
>> other partial paths as a leader won't be locked with that.
>> Considering this is a good design for parallel append, the question is
>> do we really need worker and leader to follow separate strategy for
>> choosing next path.  I think the patch will be simpler if we can come
>> up with a way for the worker and leader to use the same strategy to
>> pick next path to process.  How about we arrange the list of paths
>> such that first, all partial paths will be there and then non-partial
>> paths and probably both in decreasing order of cost.  Now, both leader
>> and worker can start from the beginning of the list. In most cases,
>> the leader will start at the first partial path and will only ever
>> need to scan non-partial path if there is no other partial path left.
>> This is not bulletproof as it is possible that some worker starts
>> before leader in which case leader might scan non-partial path before
>> all partial paths are finished, but I think we can avoid that as well
>> if we are too worried about such cases.
>
> If there are no partial subpaths, then again the leader is likely to
> take up the expensive subpath.
>

I think in general the non-partial paths should be cheaper than the
partial paths, as that is the reason the planner chose not to make a
partial plan in the first place. I think the idea the patch is using will
help because the leader will choose to execute a partial path in most
cases (when there is a mix of partial and non-partial paths), and in that
case the leader is not bound to complete the execution of that path.
However, if all the paths are non-partial, then I am not sure how much
difference it would make to choose one path over another.
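
To make the ordering idea concrete, a comparator along the following lines
could arrange the subpath array (just a sketch: the SubpathInfo struct and
its is_partial flag are illustrative stand-ins, not the Path nodes the
patch actually manipulates):

#include <stdbool.h>
#include <stdlib.h>

/* Illustrative stand-in for a subpath; the real planner works with Path
 * nodes and keeps partial and non-partial subpaths in separate lists. */
typedef struct SubpathInfo
{
	bool		is_partial;		/* can several workers share this subpath? */
	double		total_cost;		/* planner cost estimate */
} SubpathInfo;

/* qsort() comparator: partial paths first, then non-partial paths, each
 * group in decreasing order of cost.  With this ordering, both the leader
 * and the workers can simply scan the array from the beginning. */
static int
subpath_cmp(const void *a, const void *b)
{
	const SubpathInfo *pa = (const SubpathInfo *) a;
	const SubpathInfo *pb = (const SubpathInfo *) b;

	if (pa->is_partial != pb->is_partial)
		return pa->is_partial ? -1 : 1;	/* partial paths sort first */
	if (pa->total_cost != pb->total_cost)
		return (pa->total_cost > pb->total_cost) ? -1 : 1;	/* expensive first */
	return 0;
}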

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files

2017-09-29 Thread Alexander Korotkov
On Fri, Sep 29, 2017 at 10:07 AM, chenhj  wrote:

> On 2017-09-29 05:31:51, "Alexander Korotkov" 
> wrote:
>
> On Thu, Sep 28, 2017 at 10:52 PM, chenhj  wrote:
>
>> On 2017-09-29 00:43:18,"Alexander Korotkov" 
>> wrote:
>>
>> On Thu, Sep 28, 2017 at 6:44 PM, chenhj  wrote:
>>
>>> On 2017-09-28 01:29:29,"Alexander Korotkov" 
>>> wrote:
>>>
>>> It appears that your patch conflicts with fc49e24f.  Please, rebase it.
>>>
>>>
>>> Yes, I have rebased it. Please check the new patch.
>>>
>>
>> Good, now it applies cleanly.
>>
>> else if (strncmp(path, XLOGDIR"/", strlen(XLOGDIR"/")) == 0 &&
>>>  IsXLogFileName(path + strlen(XLOGDIR"/")) &&
>>>  (strcmp(path + strlen(XLOGDIR"/") + 8, divergence_wal_filename + 8) < 0
>>> ||
>>>   strcmp(path + strlen(XLOGDIR"/") + 8, last_source_wal_filename + 8) >
>>> 0))
>>
>>
>> According to our coding style, you should leave a space between XLOGDIR
>> and "/".
>> Also, you do a trick by comparing xlog segment numbers using strcmp().
>> It's nice, but I would prefer seeing XLogFromFileName() here.  It would
>> improve code readability and be less error prone during further
>> modifications.
>>
>>
>> Thanks for the advice!
>> I have modified it.
>>
>
> OK. The patch has become better.
> I also have a more general question.  Why do we need an upper bound for the
> segment number (last_source_segno)?  I understand the purpose of the lower
> bound (divergence_segno), which saves us from copying extra WAL files, but
> what is the upper bound for?  As I understand it, we anyway need to replay
> the most recent WAL records to reach a consistent state after pg_rewind.
> I propose to remove last_source_segno unless I'm missing something.
>
>
> Thanks for the reply!
> When a checkpoint occurs, some old WAL files are renamed as future WAL
> files for later use.
> The upper bound for the segment number (last_source_segno) is used to
> avoid copying these extra WAL files.
>
> When the parameters max_wal_size or min_wal_size are large, there may be
> many renamed old WAL files waiting for reuse.
>
> For example, I have just looked at one of our production systems
> (max_wal_size = 64GB, min_wal_size = 2GB):
> the total size of the WAL is about 30GB, and it contains about 4GB of
> renamed old WAL files.
>
> [postgres@hostxxx pg_xlog]$ ll
> ...
> -rw--- 1 postgres postgres 16777216 Sep 29 14:05
> 00010BCF0078
> -rw--- 1 postgres postgres 16777216 Sep 29 14:05
> 00010BCF0079
> -rw--- 1 postgres postgres 16777216 Sep 29 14:05
> 00010BCF007A
> -rw--- 1 postgres postgres 16777216 Sep 29 14:05
> 00010BCF007B
> -rw--- 1 postgres postgres 16777216 Sep 29 14:05
> 00010BCF007C
> -rw--- 1 postgres postgres 16777216 Sep 29 14:05
> 00010BCF007D
> -rw--- 1 postgres postgres 16777216 Sep 29 11:22
> 00010BCF007E //after this, there are about 4GB WALs for reuse
> -rw--- 1 postgres postgres 16777216 Sep 29 11:08
> 00010BCF007F
> -rw--- 1 postgres postgres 16777216 Sep 29 11:06
> 00010BCF0080
> -rw--- 1 postgres postgres 16777216 Sep 29 12:05
> 00010BCF0081
> -rw--- 1 postgres postgres 16777216 Sep 29 11:28
> 00010BCF0082
> -rw--- 1 postgres postgres 16777216 Sep 29 11:06
> 00010BCF0083
> ...
>

OK.  That makes sense.  Thank you for the explanation.

I still have some minor comments.


> /*
> +* Save the WAL filenames of the divergence and the current WAL insert
> +* location of the source server. Later only the WAL files between
> those
> +* would be copied to the target data directory.
>

The comment is outdated.  We don't save filenames anymore; now we save
segment numbers.


> +* Note:The later generated WAL files in the source server before the
> end
> +* of the copy of the data files must be made available when the target
> +* server is started. This can be done by configuring the target
> server as
> +* a standby of the source server.
> +*/


You are missing a space after "Note:".  Also, it seems reasonable to me to
leave an empty line before "Note:".

# Setup parameter for WAL reclaim


Parameter*s*, because you're setting up multiple of them.

# The accuracy of imodification from pg_ls_waldir() is seconds, so sleep
> one seconds


One second without "s".

Also, please make sure the empty lines in 006_wal_copy.pl are really empty
lines without tabs.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] SQL/JSON in PostgreSQL

2017-09-29 Thread Pavel Stehule
2017-09-29 12:09 GMT+02:00 Nikita Glukhov :

>
>
> I have some free time now. Is it the last version?
>
> Regards
>
> Pavel
>
> Yes, this is still the latest version. Now I am working only on the
> unfinished WIP patch no. 9, but I think it should be reviewed last.
>
>

ok

Thank you

Pavel

-- 
> Nikita Glukhov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>


Re: [HACKERS] SQL/JSON in PostgreSQL

2017-09-29 Thread Nikita Glukhov

On 29.09.2017 12:59, Pavel Stehule wrote:


Hi

2017-09-16 1:31 GMT+02:00 Nikita Glukhov:


On 15.09.2017 22:36, Oleg Bartunov wrote:

On Fri, Sep 15, 2017 at 7:31 PM, Robert Haas wrote:

On Fri, Sep 15, 2017 at 10:10 AM, Daniel Gustafsson wrote:

Can we expect a rebased version of this patch for this
commitfest?  Since it’s
a rather large feature it would be good to get it in
as early as we can in the
process.

Again, given that this needs a "major" rebase and hasn't
been updated
in a month, and given that the CF is already half over,
this should
just be bumped to the next CF.  We're supposed to be
trying to review
things that were ready to go by the start of the CF, not
the end.

We are supporting v10 branch in our github repository
https://github.com/postgrespro/sqljson/tree/sqljson_v10


Since the first post we made a lot of changes, mostly because of a
better understanding of the standard and the availability of the
technical report
(http://standards.iso.org/ittf/PubliclyAvailableStandards/c067367_ISO_IEC_TR_19075-6_2017.zip).
The most important are:

1. We abandoned FORMAT support, which could confuse our users, since we
have data types json[b].

2. We use XMLTABLE infrastructure, extended for JSON_TABLE
support.

3. Reorganize commits, so we could split one big patch by several
smaller patches, which could be reviewed independently.

4. The biggest problem is documentation, we are working on it.

Nikita will submit patches soon.


Attached archive with 9 patches rebased onto latest master.

0001-jsonpath-v02.patch:
 - jsonpath type
 - jsonpath execution on jsonb type
 - jsonpath operators for jsonb type
 - GIN support for jsonpath operators

0002-jsonpath-json-v02.patch:
 - jsonb-like iterators for json type
 - jsonpath execution on json type
 - jsonpath operators for json type

0003-jsonpath-extensions-v02.patch:
0004-jsonpath-extensions-tests-for-json-v02.patch:
 - some useful standard extensions with tests
 0005-sqljson-v02.patch:
 - SQL/JSON constructors (JSON_OBJECT[AGG], JSON_ARRAY[AGG])
 - SQL/JSON query functions (JSON_VALUE, JSON_QUERY, JSON_EXISTS)
 - IS JSON predicate

0006-sqljson-json-v02.patch:
 - SQL/JSON support for json type and tests

0007-json_table-v02.patch:
 - JSON_TABLE using XMLTABLE infrastructure

0008-json_table-json-v02.patch:
 - JSON_TABLE support for json type

0009-wip-extensions-v02.patch:
 - FORMAT JSONB
 - jsonb to/from bytea casts
 - jsonpath operators
 - some unfinished jsonpath extensions


Originally, JSON path was implemented only for the jsonb type, and I
decided to add jsonb-like iterators for the json type to implement json
support with minimal changes to the JSON path code.  This solution (see
jsonpath_json.c from patch 0002) looks a little dubious to me, so I
separated json support into independent patches.

The last WIP patch 0009 is unfinished and contains a lot of FIXMEs.  But
the ability to use arbitrary Postgres operators in JSON path with
explicitly specified types is rather interesting, and I think it should
be shown now to get some kind of pre-review.
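
For readers who want a feel for what patch 0005 provides, here are a few
illustrative queries following the SQL:2016 / TR 19075-6 syntax (examples
written for this mail, not taken from the patch set; details may differ):

-- JSON_VALUE extracts a scalar, optionally coerced via RETURNING:
SELECT JSON_VALUE('{"a": 1}', '$.a' RETURNING int);         -- 1
-- JSON_EXISTS tests whether a JSON path matches:
SELECT JSON_EXISTS('{"a": [1, 2, 3]}', '$.a[*] ? (@ > 2)'); -- true
-- JSON_QUERY returns a JSON fragment:
SELECT JSON_QUERY('{"a": [1, 2, 3]}', '$.a');               -- [1, 2, 3]
-- IS JSON predicate:
SELECT '{"a": 1}' IS JSON OBJECT;                           -- true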

We are supporting v11 and v10 branches in our github repository:

https://github.com/postgrespro/sqljson/tree/sqljson

https://github.com/postgrespro/sqljson/tree/sqljson_wip

https://github.com/postgrespro/sqljson/tree/sqljson_v10

https://github.com/postgrespro/sqljson/tree/sqljson_v10_wip


Attached patches can be produced simply by combining groups of
consecutive commits from these branches.


I have some free time now. Is it the last version?

Regards

Pavel

Yes, this is still the latest version. Now I am working only on the
unfinished WIP patch no. 9, but I think it should be reviewed last.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] SQL/JSON in PostgreSQL

2017-09-29 Thread Pavel Stehule
Hi

2017-09-16 1:31 GMT+02:00 Nikita Glukhov :

> On 15.09.2017 22:36, Oleg Bartunov wrote:
>
> On Fri, Sep 15, 2017 at 7:31 PM, Robert Haas 
>> wrote:
>>
>>> On Fri, Sep 15, 2017 at 10:10 AM, Daniel Gustafsson 
>>> wrote:
>>>
 Can we expect a rebased version of this patch for this commitfest?
 Since it’s
 a rather large feature it would be good to get it in as early as we can
 in the
 process.

>>> Again, given that this needs a "major" rebase and hasn't been updated
>>> in a month, and given that the CF is already half over, this should
>>> just be bumped to the next CF.  We're supposed to be trying to review
>>> things that were ready to go by the start of the CF, not the end.
>>>
>> We are supporting v10 branch in our github repository
>> https://github.com/postgrespro/sqljson/tree/sqljson_v10
>>
>> Since the first post we made a lot of changes, mostly because of a
>> better understanding of the standard and the availability of the technical report
>> (http://standards.iso.org/ittf/PubliclyAvailableStandards/c067367_ISO_IEC_TR_19075-6_2017.zip).
>> The most important are:
>>
>> 1. We abandoned FORMAT support, which could confuse our users, since we
>> have data types json[b].
>>
>> 2. We use XMLTABLE infrastructure, extended for JSON_TABLE support.
>>
>> 3. Reorganize commits, so we could split one big patch by several
>> smaller patches, which could be reviewed independently.
>>
>> 4. The biggest problem is documentation, we are working on it.
>>
>> Nikita will submit patches soon.
>>
>
> Attached archive with 9 patches rebased onto latest master.
>
> 0001-jsonpath-v02.patch:
>  - jsonpath type
>  - jsonpath execution on jsonb type
>  - jsonpath operators for jsonb type
>  - GIN support for jsonpath operators
>
> 0002-jsonpath-json-v02.patch:
>  - jsonb-like iterators for json type
>  - jsonpath execution on json type
>  - jsonpath operators for json type
>
> 0003-jsonpath-extensions-v02.patch:
> 0004-jsonpath-extensions-tests-for-json-v02.patch:
>  - some useful standard extensions with tests
>  0005-sqljson-v02.patch:
>  - SQL/JSON constructors (JSON_OBJECT[AGG], JSON_ARRAY[AGG])
>  - SQL/JSON query functions (JSON_VALUE, JSON_QUERY, JSON_EXISTS)
>  - IS JSON predicate
>
> 0006-sqljson-json-v02.patch:
>  - SQL/JSON support for json type and tests
>
> 0007-json_table-v02.patch:
>  - JSON_TABLE using XMLTABLE infrastructure
>
> 0008-json_table-json-v02.patch:
>  - JSON_TABLE support for json type
>
> 0009-wip-extensions-v02.patch:
>  - FORMAT JSONB
>  - jsonb to/from bytea casts
>  - jsonpath operators
>  - some unfinished jsonpath extensions
>
>
> Originally, JSON path was implemented only for jsonb type, and I decided to
> add jsonb-like iterators for json type for json support implementation with
> minimal changes in JSON path code.  This solution (see jsonpath_json.c from
> patch 0002) looks a little dubious to me, so I separated json support into
> independent patches.
>
> The last WIP patch 0009 is unfinished and contains a lot of FIXMEs.  But
> the ability to use arbitrary Postgres operators in JSON path with
> explicitly specified types is rather interesting, and I think it should
> be shown now to get some kind of pre-review.
>
> We are supporting v11 and v10 branches in our github repository:
>
> https://github.com/postgrespro/sqljson/tree/sqljson
> https://github.com/postgrespro/sqljson/tree/sqljson_wip
> https://github.com/postgrespro/sqljson/tree/sqljson_v10
> https://github.com/postgrespro/sqljson/tree/sqljson_v10_wip
>
> Attached patches can be produced simply by combining groups of consecutive
> commits from these branches.
>
>
I have some free time now. Is it the last version?

Regards

Pavel

--
> Nikita Glukhov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>


Re: [HACKERS] pg_prepared_xact_status

2017-09-29 Thread Konstantin Knizhnik



On 29.09.2017 11:27, Craig Ringer wrote:
On 29 September 2017 at 15:57, Konstantin Knizhnik wrote:


So are you saying that the PostgreSQL 2PC mechanism is not complete,
and the user needs to maintain some extra information to make it work?


No, it provides what's needed for an implementation of what in XA
terms is a local resource manager (LRM). What it does not provide is
infrastructure to make postgres itself into a global transaction
manager (GTM) for co-ordinating multiple LRMs.


It sounds like you're trying to build a GTM using PostgreSQL's
existing LRM book-keeping as your authoritative data store, right?


Not exactly. I am trying to add 2PC to our pg_shardman: a combination of
pg_pathman + postgres_fdw + logical replication, which should provide HA
and write scalability.
This architecture definitely does not assume the presence of a GTM. Most
transactions are expected to be local (involving only one node), and the
number of participants in a distributed transaction is expected to be much
smaller than the total number of nodes (usually 2). So we need to perform
2PC without a GTM.



The problems with 2PC arise when the coordinator node is not
available but is expected to be recovered in the future.
In this case we may not have enough information to decide
whether to abort or commit a prepared transaction.
But that is a different story. We need to use 3PC or some other
protocol to prevent such a situation.


In that case the node sits and waits patiently for the GTM (or in more 
complex architectures, *a* valid voting quorum of GTMs) to be 
reachable again. Likely using a protocol like Raft, Paxos, 3PC etc to 
co-ordinate.


It can't do anything else, since if it unilaterally commits or rolls 
back it might later find out that the nodes on the other side of the 
network partition or whatever made the opposite decision and, boom!


Ok, I am not sure whether pg_prepared_xact_status can really be useful or not.
I agree with you that if we are implementing distributed transactions on
top of Postgres, then we need some better mechanism to determine
transaction state.
But a lot of people are using 2PC without a GTM or anything else. For
example, many Java ORMs are using 2PC for their transactions.
I think that it is better to provide the DBA or programmer some way to
determine the status of such a transaction by its GID (which is usually
unique and known), as far as this information is available in the Postgres
WAL.

In any case, I attached a slightly improved version of this function, which
traverses the log not only since the last checkpoint, but also tries to
iterate backward, inspecting previous WAL segments.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index ae83291..fbf91f5 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -84,6 +84,7 @@
 #include "access/twophase_rmgr.h"
 #include "access/xact.h"
 #include "access/xlog.h"
+#include "access/xlog_internal.h"
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
 #include "access/xlogreader.h"
@@ -2408,3 +2409,106 @@ PrepareRedoRemove(TransactionId xid, bool giveWarning)
 
 	return;
 }
+
+Datum
+pg_prepared_xact_status(PG_FUNCTION_ARGS)
+{
+	char const *gid = PG_GETARG_CSTRING(0);
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+	XLogRecPtr start_lsn;
+	XLogRecPtr lsn;
+	char const* xact_status = "unknown";
+	bool done = false;
+	TimeLineID timeline;
+	TransactionId xid = InvalidTransactionId;
+	XLogRecPtr wal_end = GetFlushRecPtr();
+
+	GetOldestRestartPoint(&start_lsn, &timeline);
+
+	xlogreader = XLogReaderAllocate(&read_local_xlog_page, NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+(errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("out of memory"),
+ errdetail("Failed while allocating a WAL reading processor.")));
+	while (true)
+	{
+		lsn = start_lsn;
+		do
+		{
+			record = XLogReadRecord(xlogreader, lsn, &errormsg);
+			if (record == NULL)
+break;
+			lsn = InvalidXLogRecPtr; /* continue after the record */
+			if (XLogRecGetRmid(xlogreader) == RM_XACT_ID)
+			{
+uint32 info = XLogRecGetInfo(xlogreader);
+switch (info & XLOG_XACT_OPMASK)
+{
+case XLOG_XACT_PREPARE:
+	{
+		TwoPhaseFileHeader *hdr = (TwoPhaseFileHeader *)XLogRecGetData(xlogreader);
+		char* xact_gid = (char*)hdr + MAXALIGN(sizeof(TwoPhaseFileHeader));
+ 		if (strcmp(xact_gid, gid) == 0)
+		{
+			xid = hdr->xid;
+			xact_status = "prepared";
+		}
+		break;
+	}
+case XLOG_XACT_COMMIT_PREPARED:
+	{
+		xl_xact_commit *xlrec;
+		xl_xact_parsed_commit parsed;
+
+		xlrec = (xl_xact_commit *) XLogRecGetData(xlogreader);
+		ParseCommitRecord(info, xlrec, &parsed);
+		if (xid == parsed.twophase_xid)
+		{
+			Assert(TransactionIdIsValid(xid));
+		

Re: [HACKERS] Partitions: \d vs \d+

2017-09-29 Thread Maksim Milyutin

On 29.09.2017 04:33, Amit Langote wrote:


So, we should be looking at partconstraintdef only when verbose is true,
because that's only when we set it to a valid value.  Now, if
partconstraintdef is NULL even after verbose is true, that means backend
returned that there exists no constraint for that partition, which I
thought would be true for a default partition (because the commit that
introduced default partitions also introduced "No partition constraint"),
but it's really not.

For example, \d and \d+ show contradictory outputs for a default partition.

create table p (a int) partition by list (a);
create table p1 partition of p for values in (1);
create table pd partition of p default;

\d pd
  Table "public.pd"
  Column |  Type   | Collation | Nullable | Default
+-+---+--+-
  a  | integer |   |  |
Partition of: p DEFAULT
No partition constraint

\d+ pd
 Table "public.pd"
  Column |  Type   | Collation | Nullable | Default | Storage | Stats
target | Description
+-+---+--+-+-+--+-
  a  | integer |   |  | | plain   |  |
Partition of: p DEFAULT
Partition constraint: (NOT ((a IS NOT NULL) AND (a = ANY (ARRAY[1]


Perhaps, there is no case when "No partition constraint" should be output,
but I may be missing something.


Anyhow, we have to protect ourselves from empty output from
*pg_get_partition_constraintdef*. And printing *No partition constraint*
would be a good starting point for examining why we didn't get any
constraint definition.


--
Regards,
Maksim Milyutin



Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns

2017-09-29 Thread tushar

On 09/27/2017 10:50 PM, Andres Freund wrote:

This'll allow the later patches to allow the compiler to perform the
relevant optimizations. It also allows to optimize e.g. pq_sendint64()
to avoid having to do multiple byteswaps.


After applying all the required patches, I am able to see some performance gain.

Virtual Machine configuration - Centos 6.5 x64 / 16 GB RAM / 8 VCPU core 
processor


./pgbench -M prepared -j 10 -c 10 -f /tmp/pgbench-many-cols.sql postgres 
-T TIME
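
(The script itself is not attached; a wide SELECT of roughly this shape is
the kind of statement such a script would contain, since the patches target
the sending of row descriptions. This is only a guess for illustration:)

-- hypothetical /tmp/pgbench-many-cols.sql: a result with many columns,
-- so that SendRowDescriptionMessage dominates the per-query overhead
SELECT 1 AS c01, 2 AS c02, 3 AS c03, 4 AS c04, 5 AS c05,
       6 AS c06, 7 AS c07, 8 AS c08, 9 AS c09, 10 AS c10,
       11 AS c11, 12 AS c12, 13 AS c13, 14 AS c14, 15 AS c15;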


After taking the median of 3 runs:

Case 1 - TIME=300

PG HEAD => tps = 41285.089261 (excluding connections establishing)
PG HEAD+patch => tps = 42446.626947 (2.81+% vs. head)

Case 2 - TIME=500

PG HEAD => tps = 41252.897670 (excluding connections establishing)
PG HEAD+patch => tps = 42257.439550 (2.43+% vs. head)

Case 3 - TIME=1000

PG HEAD => tps = 1061.031463 (excluding connections establishing)
PG HEAD+patch => tps = 8011.784839 (3.30+% vs. head)

Case 4 - TIME=1500

PG HEAD => tps = 40365.099628 (excluding connections establishing)
PG HEAD+patch => tps = 42385.372848 (5.00+% vs. head)

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_prepared_xact_status

2017-09-29 Thread Craig Ringer
On 29 September 2017 at 15:57, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:


> So are you saying that the PostgreSQL 2PC mechanism is not complete, and the
> user needs to maintain some extra information to make it work?
>

No, it provides what's needed for an implementation of what in XA terms is
a local resource manager (LRM). What it does not provide is infrastructure
to make postgres itself into a global transaction manager (GTM) for
co-ordinating multiple LRMs.

It sounds like you're trying to build a GTM using PostgreSQL's existing LRM
book-keeping as your authoritative data store, right?


> The problems with 2PC arise when the coordinator node is not available but
> is expected to be recovered in the future.
> In this case we may not have enough information to decide whether
> to abort or commit a prepared transaction.
> But that is a different story. We need to use 3PC or some other protocol to
> prevent such a situation.


In that case the node sits and waits patiently for the GTM (or in more
complex architectures, *a* valid voting quorum of GTMs) to be reachable
again. Likely using a protocol like Raft, Paxos, 3PC etc to co-ordinate.

It can't do anything else, since if it unilaterally commits or rolls back
it might later find out that the nodes on the other side of the network
partition or whatever made the opposite decision and, boom!

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Walsender timeouts and large transactions

2017-09-29 Thread Kyotaro HORIGUCHI
Hello,

At Wed, 27 Sep 2017 14:28:37 +0300, Sokolov Yura  
wrote in <90bb67da7131e6186b50897c4b0f0...@postgrespro.ru>
> On 2017-09-12 11:28, Kyotaro HORIGUCHI wrote:
> > Hello,
> > At Wed, 06 Sep 2017 13:46:16 +, Yura Sokolov
> >  wrote in
> > <20170906134616.18925.88390.p...@coridan.postgresql.org>
> > As the result, I think that the functions should be modified as
> > the following.
> > - Forcing slow-path if time elapses a half of a ping period is
> >   right. (GetCurrentTimestamp is anyway required.)
> > - The slow-path should not sleep waiting Latch. It should just
> >   pg_usleep() for maybe 1-10ms.
> > - We should go to the fast path just after keepalive or response
> >   message has been sent. In other words, the "if (now <" block
> >   should be in the "for (;;)" loop. This avoids needless runs on
> >   the slow-path.
> > It would be refactorable as the following.
> >   prepare for the send buffer;
> >   for (;;)
> >   {
> > now = GetCurrentTimeStamp();
> > if (now < )...
> > {
> >   fast-path
> > }
> > else
> > {
> >   slow-path
> > }
> > return if finished
> > sleep for 1ms?
> >   }
> > What do you think about this?
> > regards,
> > --
> > Kyotaro Horiguchi
> > NTT Open Source Software Center
> 
> Good day, Petr, Kyotaro
> 
> I've created failing test for issue (0001-Add-failing-test...) .
> It tests insertion of 2 rows with 10ms wal_sender_timeout
> (it fails in WalSndWriteData on master) and then deletion of
> those rows with 1ms wal_sender_timeout (it fails in WalSndLoop).
> 
> Neither Peter's patch nor my simplified suggestion passed the
> test. I didn't check Kyotaro's suggestion, though, because I
> didn't understand it well.

Mmm. The test seems broken. wal_sender_timeout = 10ms with
wal_receiver_status_interval = 10s immediately causes a
timeout. Avoiding the timeout would just break otherwise-sane code.

wal_sender_timeout = 3s and wal_receiver_status_interval = 1s
effectively cause the problem with about 100 rows of (int)
insertions over a UNIX-socket connection, on my poor box.

The original complaint here came from the fact that
WalSndWriteData skips processing of replies for a long time on a
fast network. While Petr's patch fixed that problem, I pointed
out that just letting the function take the slow path leads to
another problem: waiting for new WAL records can result in an
unwanted pause in the slow path.

Combining the solutions for the two problems is my proposal cited
above. The wording may be in quite bad style, but the attached
file is the concrete patch for it.

Any thoughts?

regards,
-- 
Kyotaro Horiguchi
NTT Open Source Software Center
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
*************** static void
*** 1151,1156 ****
--- 1151,1158 ----
  WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
  bool last_write)
  {
+ 	TimestampTz now = GetCurrentTimestamp();
+ 
  	/* output previously gathered data in a CopyData packet */
  	pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);
  
*************** WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
*** 1160,1235 ****
  	 * several releases by streaming physical replication.
  	 */
  	resetStringInfo(&tmpbuf);
! 	pq_sendint64(&tmpbuf, GetCurrentTimestamp());
  	memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)],
  		   tmpbuf.data, sizeof(int64));
  
- 	/* fast path */
- 	/* Try to flush pending output to the client */
- 	if (pq_flush_if_writable() != 0)
- 		WalSndShutdown();
- 
- 	if (!pq_is_send_pending())
- 		return;
- 
  	for (;;)
  	{
  		int			wakeEvents;
  		long		sleeptime;
  		TimestampTz now;
  
  		/*
! 		 * Emergency bailout if postmaster has died.  This is to avoid the
! 		 * necessity for manual cleanup of all postmaster children.
  		 */
! 		if (!PostmasterIsAlive())
! 			exit(1);
! 
! 		/* Clear any already-pending wakeups */
! 		ResetLatch(MyLatch);
! 
! 		CHECK_FOR_INTERRUPTS();
! 
! 		/* Process any requests or signals received recently */
! 		if (ConfigReloadPending)
! 		{
! 			ConfigReloadPending = false;
! 			ProcessConfigFile(PGC_SIGHUP);
! 			SyncRepInitConfig();
! 		}
! 
! 		/* Check for input from the client */
! 		ProcessRepliesIfAny();
! 
! 		/* Try to flush pending output to the client */
! 		if (pq_flush_if_writable() != 0)
! 			WalSndShutdown();
! 
! 		/* If we finished clearing the buffered data, we're done here. */
! 		if (!pq_is_send_pending())
! 			break;
! 
  		now = GetCurrentTimestamp();
- 
- 		/* die if timeout was reached */
- 		WalSndCheckTimeOut(now);
- 
- 		/* Send keepalive if the time has come */
- 		WalSndKeepaliveIfNecessary(now);
- 
- 		sleeptime = WalSndComputeSleeptime(now);
- 
- 		wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
- 			WL_SOCKET_WRITEABLE | WL_SOCKET_READABLE | WL_TIMEOUT;
- 
- 		/* Sleep until something happens or we time out */
- 		

Re: [HACKERS] pg_prepared_xact_status

2017-09-29 Thread Craig Ringer
On 29 September 2017 at 15:57, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

>
> The idea of pg_prepared_xact_status function is that it allows to get
> status of 2PC transaction without any additional requirements to GIDs and
> any other additional information about participants of 2PC transaction.
>
>
This sounds kind-of like 1/4 of a distributed transaction resolver, without
a way to make it reliable enough to build the other 3/4.

To make this practical I think you'd need a way to retain historic GIDs +
their outcomes, and a way to prune that information only once an
application knows all interested participants consider the transaction
finalized.

I'd be all for a global xid status function if there were a good way to
manage resource retention. But it's fuzzy enough for txid_status, which
isn't really making any firm promises, just improving on the prior state of
"no f'ing idea what happened to that tx, sorry". 2PC consumers will want
absolute guarantees, not "dunno, sorry".

(Ahem, HeuristicMixedException raises its hand. You, sit down! You can only
get that if you mix 2PC and !2PC xacts).

I can see it being useful for Pg to be able to report a stream of GIDs +
outcomes for applications to consume. But unless you have either:

* some kind of app-controlled retention horizon and a way to multiplex it
for >1 app (like slots do); or

* a node registry where Pg itself implements a full built-in transaction
resolver;

then I think it's probably not going to get far.

I could see a full DTC resolver in postgres one day, once we have things
like working in-core logical rep based multi-master with 2PC support. But
that's a looong way off.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Index expression syntax

2017-09-29 Thread Konstantin Knizhnik



On 29.09.2017 11:03, Marko Tiikkaja wrote:
On Fri, Sep 29, 2017 at 9:31 AM, Konstantin Knizhnik wrote:


I wonder why syntax error is produced in this case:

postgres=# create index metaindex on foo using
gin(to_tsvector('english', x)||to_tsvector('english',y));
ERROR:  syntax error at or near "||"
LINE 1: ...taindex on foo using gin(to_tsvector('english',
x)||to_tsvec...
^
[ .. ]

expression:
An expression based on one or more columns of the table. The
expression usually must be written with surrounding
parentheses, as shown in the syntax. However, the parentheses
can be omitted if the expression has the form of a function call.

So the documentation states that sometimes it is possible to avoid
parentheses, but it is unclear why I have to use double
parentheses...
I think that either the grammar should be fixed, or the documentation
should be updated.


Your expression is clearly not a function call, it's a concatenation 
of two of them.  The documentation seems perfectly accurate to me?



Oh, sorry!
You are right. I just didn't notice the extra parentheses in the CREATE
INDEX syntax for the case of expressions.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Index expression syntax

2017-09-29 Thread Marko Tiikkaja
On Fri, Sep 29, 2017 at 9:31 AM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

> I wonder why syntax error is produced in this case:
>
> postgres=# create index metaindex on foo using gin(to_tsvector('english',
> x)||to_tsvector('english',y));
> ERROR:  syntax error at or near "||"
> LINE 1: ...taindex on foo using gin(to_tsvector('english', x)||to_tsvec...
>  ^
> [ .. ]
>
> *expression:* An expression based on one or more columns of the table.
> The expression usually must be written with surrounding parentheses, as
> shown in the syntax. However, the parentheses can be omitted if the
> expression has the form of a function call.
>
> So documentations states that sometimes it is possible to avoid
> parentheses, but it is unclear why I have to use double parentheses...
> I think that either grammar should be fixed, either documentation should
> be updated.
>

Your expression is clearly not a function call, it's a concatenation of two
of them.  The documentation seems perfectly accurate to me?


.m


Re: [HACKERS] pg_prepared_xact_status

2017-09-29 Thread Konstantin Knizhnik



On 29.09.2017 06:02, Michael Paquier wrote:

On Fri, Sep 29, 2017 at 1:53 AM, Konstantin Knizhnik wrote:

In Postgres 10 we have txid_status function which returns status of
transaction by XID.
I wonder if it will be also useful to have similar function for 2PC
transactions which can operate with GID?
pg_prepared_xacts view allows to get information about prepared transaction
which are not yet committed or aborted.
But if transaction is committed, then there is no way now to find status of
this transaction.

But you need to keep track of the transaction XID of each transaction
happening on the remote nodes which are part of a global 2PC
transaction, no?


Why? We have the GID, which allows us to identify a 2PC transaction on
all participant nodes.



  If you have this data at hand, using txid_status is
enough to guess whether a prepared transaction has been marked as committed
or prepared. And it seems to me that tracking those XIDs is mandatory
anyway for other consistency checks.


It is certainly possible to maintain information about the XIDs involved
in a 2PC transaction.

And it can really simplify recovery. But I wonder why it should be mandatory?
Keeping track of XIDs requires some persistent storage.
So are you saying that the PostgreSQL 2PC mechanism is not complete, and the
user needs to maintain some extra information to make it work?


Also, I think that it is not necessary to know the XIDs of all local
transactions involved in 2PC. It is enough to know the XID of the
coordinator's transaction.
It can be included in the GID (as I proposed at the end of my mail). In
this case txid_status can be used at the coordinator to check the global
status of the 2PC transaction.
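
For example (illustrative only: the GID format below is just a convention
an application could adopt; it is not enforced or parsed by Postgres):

-- On a participant, encode the coordinator's system id and xid in the GID:
PREPARE TRANSACTION 'tx_6443287630988649143_12345';   -- <sysid>_<xid>

-- Later, extract xid 12345 from the GID and ask the coordinator:
SELECT txid_status(12345);   -- 'committed', 'aborted' or 'in progress'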


The idea of the pg_prepared_xact_status function is that it allows one to
get the status of a 2PC transaction without any additional requirements on
GIDs or any other additional information about the participants of the 2PC
transaction.






If a crash happens during a 2PC commit, then the transaction can be in the
prepared state at some nodes and committed/aborted at other nodes.

Handling inconsistencies here is a tricky problem, particularly if a
given transaction is marked as both committed and aborted on many
nodes.

How can that be?
An abort of the transaction can happen only at the prepare stage.
In this case the coordinator should roll back the transaction everywhere.
There should be no committed transactions in this case.

The following situations are possible:
1. The transaction is prepared at some nodes and information about it is
not available at other nodes. It means that the crash happened at the
prepare stage and the transaction was not able to complete prepare at all
nodes. It is safe to abort the transaction in this case.
2. The transaction is prepared at some nodes and aborted at other nodes.
The same as 1: we can safely abort the transaction everywhere.
3. The transaction is prepared at all nodes. It means that the coordinator
crashed before sending the commit message. It is safe to commit the
transaction everywhere.
4. The transaction is prepared at some nodes and committed at other nodes.
The commit message was not delivered to or processed by the other nodes
before the crash. It is safe to commit the transaction at all nodes.


The problems with 2PC arise when the coordinator node is not available but
is expected to be recovered in the future.
In this case we may not have enough information to decide whether to abort
or commit a prepared transaction.
But that is a different story. We need to use 3PC or some other protocol
to prevent such a situation.



The only way that I could think of would be to perform PITR to
recover from the inconsistent states. So that's not an easy problem,
becoming even more tricky if more than one transaction is involved and
many transactions are inter-dependent across nodes.


3. The same GID can be reused multiple times. In this case the
pg_prepared_xact_status function will return an incorrect result, because it
will return information about the first global transaction with that GID
after the checkpoint, and not the most recent one.

Yeah, this argument alone is why I think that this is a dead-end approach.


Maybe. But I think that in most real systems unique GIDs are generated,
because otherwise it is difficult to address concurrency and recovery
issues.





There is actually an alternative approach to the recovery of 2PC
transactions. We can include the coordinator identifier in the GID (we can
use GetSystemIdentifier() to identify the coordinator's node) and the XID
of the coordinator's transaction. In this case we can use txid_status() to
check the status of the transaction at the coordinator. It eliminates the
need to scan WAL to determine the status of a prepared transaction.

+GetOldestRestartPoint(&lsn, &timeline);
+
+xlogreader = XLogReaderAllocate(&read_local_xlog_page, NULL);
+if (!xlogreader)
So you scan a bunch of records for each GID? This is really costly. I
think that you would have an easier life by tracking the XID of each
transaction involved remotely. In Postgres-XL, this is not a problem
as XIDs are assigned globally and consistently. But you would gain in

[HACKERS] Index expression syntax

2017-09-29 Thread Konstantin Knizhnik

I wonder why syntax error is produced in this case:

postgres=# create index metaindex on foo using 
gin(to_tsvector('english', x)||to_tsvector('english',y));

ERROR:  syntax error at or near "||"
LINE 1: ...taindex on foo using gin(to_tsvector('english', x)||to_tsvec...
 ^
The error can be eliminated if extra surrounding parentheses are added:

postgres=# create index metaindex on foo using 
gin((to_tsvector('english', x)||to_tsvector('english',y)));

CREATE INDEX

The PostgreSQL documentation says:

CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ]
    ON table_name [ USING method ]
    ( { column_name | ( expression ) } [ COLLATE collation ]
      [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
    [ WITH ( storage_parameter = value [, ... ] ) ]
    [ TABLESPACE tablespace_name ]
    [ WHERE predicate ]

expression:
   An expression based on one or more columns of the table. The
   expression usually must be written with surrounding parentheses, as
   shown in the syntax. However, the parentheses can be omitted if the
   expression has the form of a function call.

So the documentation states that sometimes it is possible to avoid
parentheses, but it is unclear why I have to use double parentheses...
I think that either the grammar should be fixed, or the documentation
should be updated.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] UPDATE of partition key

2017-09-29 Thread amul sul
On Wed, Sep 13, 2017 at 4:24 PM, amul sul  wrote:
>
>
> On Sun, Sep 10, 2017 at 8:47 AM, Amit Kapila 
> wrote:
>>
>> On Fri, Sep 8, 2017 at 4:51 PM, amul sul  wrote:
>> > On Thu, May 18, 2017 at 9:13 AM, Amit Kapila 
>> > wrote:
>> >>
>> >>  On Wed, May 17, 2017 at 5:17 PM, Robert Haas 
>> >> wrote:
>> >> > On Wed, May 17, 2017 at 6:29 AM, Amit Kapila
>> >> > 
>> >> > wrote:
>> >> >> I think we can do this even without using an additional infomask
>> >> >> bit.
>> >> >> As suggested by Greg up thread, we can set InvalidBlockId in ctid to
>> >> >> indicate such an update.
>> >> >
>> >> > Hmm.  How would that work?
>> >> >
>> >>
>> >> We can pass a flag say row_moved (or require_row_movement) to
>> >> heap_delete which will in turn set InvalidBlockId in ctid instead of
>> >> setting it to self. Then the ExecUpdate needs to check for the same
>> >> and return an error when heap_update is not successful (result !=
>> >> HeapTupleMayBeUpdated).  Can you explain what difficulty are you
>> >> envisioning?
>> >>
>> >
>> > Attaching WIP patch incorporates the above logic, although I am yet to
>> > check
>> > all the code for places which might be using ip_blkid.  I have got a
>> > small
>> > query here,
>> > do we need an error on HeapTupleSelfUpdated case as well?
>> >
>>
>> No, because that case is anyway a no-op (or an error depending on whether
>> it is updated/deleted by the same command or a later command).  Basically, even
>> if the row wouldn't have been moved to another partition, we would not
>> have allowed the command to proceed with the update.  This handling is
>> to make commands fail rather than a no-op where otherwise (when the
>> tuple is not moved to another partition) the command would have
>> succeeded.
>>
> Thank you.
>
> I've rebased the patch against Amit Khandekar's latest patch (v17_rebased_2).
> Also, I added an ip_blkid validation check in the heap_get_latest_tid(),
> rewrite_heap_tuple() & rewrite_heap_tuple() functions, because the
> ItemPointerEquals() check alone is no longer sufficient after this patch.
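
Conceptually the new marker works like this (a sketch of the idea only,
not the exact patch code):

#include "postgres.h"
#include "access/htup_details.h"
#include "storage/block.h"
#include "storage/itemptr.h"

/*
 * Sketch: after this patch, a deleted tuple whose t_ctid carries
 * InvalidBlockNumber in ip_blkid means "the row was moved to another
 * partition", so callers must not follow its ctid chain, and a plain
 * ItemPointerEquals() test is no longer a sufficient check.
 */
static bool
tuple_moved_to_another_partition(HeapTuple tuple)
{
	ItemPointer ctid = &tuple->t_data->t_ctid;

	return BlockIdGetBlockNumber(&ctid->ip_blkid) == InvalidBlockNumber;
}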

FYI, I have posted this patch in a separate thread :
https://postgr.es/m/caaj_b95pkwojoyfz0bzxu8ookctvgzn6vygcnvuukeudrnf...@mail.gmail.com

Regards,
Amul




Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files

2017-09-29 Thread chenhj
On 2017-09-29 05:31:51, "Alexander Korotkov"  wrote:

On Thu, Sep 28, 2017 at 10:52 PM, chenhj  wrote:

On 2017-09-29 00:43:18,"Alexander Korotkov"  wrote:

On Thu, Sep 28, 2017 at 6:44 PM, chenhj  wrote:

On 2017-09-28 01:29:29,"Alexander Korotkov"  wrote:

It appears that your patch conflicts with fc49e24f.  Please, rebase it.



Yes, I have rebased it. Please check the new patch.


Good, now it applies cleanly.


else if (strncmp(path, XLOGDIR"/", strlen(XLOGDIR"/")) == 0 &&
 IsXLogFileName(path + strlen(XLOGDIR"/")) &&
 (strcmp(path + strlen(XLOGDIR"/") + 8, divergence_wal_filename + 8) < 0 ||
  strcmp(path + strlen(XLOGDIR"/") + 8, last_source_wal_filename + 8) > 0))


According to our coding style, you should leave a space between XLOGDIR and "/".
Also, you do a trick by comparing xlog segment numbers using strcmp().  It's
nice, but I would prefer seeing XLogFromFileName() here.  It would improve code
readability and be less error prone during further modifications.
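
Something like this is what I have in mind (just a sketch, assuming
pg_rewind's usual headers and the PG10 XLogFromFileName() macro, with
divergence_segno and last_source_segno computed once beforehand):

/* Sketch only: is this source-cluster path a WAL segment falling outside
 * the [divergence_segno, last_source_segno] range, so it can be skipped? */
static bool
wal_file_outside_range(const char *path,
					   XLogSegNo divergence_segno,
					   XLogSegNo last_source_segno)
{
	XLogSegNo	segno;
	TimeLineID	tli;

	if (strncmp(path, XLOGDIR "/", strlen(XLOGDIR "/")) != 0 ||
		!IsXLogFileName(path + strlen(XLOGDIR "/")))
		return false;			/* not a WAL segment file */

	XLogFromFileName(path + strlen(XLOGDIR "/"), &tli, &segno);

	return segno < divergence_segno || segno > last_source_segno;
}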


Thanks for the advice!
I have modified it.


OK. The patch has become better.
I also have a more general question.  Why do we need an upper bound for the
segment number (last_source_segno)?  I understand the purpose of the lower
bound (divergence_segno), which saves us from copying extra WAL files, but
what is the upper bound for?  As I understand it, we anyway need to replay
the most recent WAL records to reach a consistent state after pg_rewind.
I propose to remove last_source_segno unless I'm missing something.


Thanks for the reply!
When a checkpoint occurs, some old WAL files are renamed as future WAL files
for later use.
The upper bound for the segment number (last_source_segno) is used to avoid
copying these extra WAL files.

When the parameters max_wal_size or min_wal_size are large, there may be
many renamed old WAL files waiting for reuse.

For example, I have just looked at one of our production systems
(max_wal_size = 64GB, min_wal_size = 2GB):
the total size of the WAL is about 30GB, and it contains about 4GB of
renamed old WAL files.


[postgres@hostxxx pg_xlog]$ ll
...
-rw--- 1 postgres postgres 16777216 Sep 29 14:05 00010BCF0078
-rw--- 1 postgres postgres 16777216 Sep 29 14:05 00010BCF0079
-rw--- 1 postgres postgres 16777216 Sep 29 14:05 00010BCF007A
-rw--- 1 postgres postgres 16777216 Sep 29 14:05 00010BCF007B
-rw--- 1 postgres postgres 16777216 Sep 29 14:05 00010BCF007C
-rw--- 1 postgres postgres 16777216 Sep 29 14:05 00010BCF007D
-rw--- 1 postgres postgres 16777216 Sep 29 11:22 00010BCF007E 
//after this, there are about 4GB WALs for reuse
-rw--- 1 postgres postgres 16777216 Sep 29 11:08 00010BCF007F
-rw--- 1 postgres postgres 16777216 Sep 29 11:06 00010BCF0080
-rw--- 1 postgres postgres 16777216 Sep 29 12:05 00010BCF0081
-rw--- 1 postgres postgres 16777216 Sep 29 11:28 00010BCF0082
-rw--- 1 postgres postgres 16777216 Sep 29 11:06 00010BCF0083
...


-
Best Regards,
Chen Huajun

Re: [HACKERS] Enhancements to passwordcheck

2017-09-29 Thread Albe Laurenz
Michael Paquier wrote:
> On Thu, Sep 28, 2017 at 12:06 AM, Alvaro Herrera  
> wrote:
>> I think the passwordcheck module as a whole is a dead end, security-
>> wise.  Myself, I've never seen the point in it.  It runs at the wrong
>> time, and there's no way to fix that.
> 
> Client commands may be run on a trusted network as well, let's not
> forget that. But I definitely agree that this is bad practice in
> general to not hash passwords beforehand. Another thing that
> passwordcheck is good at is being an example of hook use. I would
> think that many people refer to it when implementing their own module
> for whatever they want.

Right.

I originally only wanted the hook, but was lobbied into writing the
contrib module as well, to
a) have a nice checkbox item for ill-conceived security check lists
b) have an example of how the hook could be used.

I still think that there is nothing wrong with adding some GUCs
to the module, as long as there is nothing in it that can compromise
overall security.
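
For reference, the skeleton of such a hook module is quite small (a minimal
sketch against the PostgreSQL 10 hook signature; the length check is only a
placeholder for a real policy):

#include "postgres.h"

#include "commands/user.h"
#include "fmgr.h"
#include "libpq/crypt.h"

PG_MODULE_MAGIC;

void		_PG_init(void);

static check_password_hook_type prev_check_password_hook = NULL;

static void
my_check_password(const char *username, const char *shadow_pass,
				  PasswordType password_type, Datum validuntil_time,
				  bool validuntil_null)
{
	/* chain to any previously installed hook first */
	if (prev_check_password_hook)
		prev_check_password_hook(username, shadow_pass, password_type,
								 validuntil_time, validuntil_null);

	/* plaintext passwords are the only ones we can really inspect */
	if (password_type == PASSWORD_TYPE_PLAINTEXT &&
		strlen(shadow_pass) < 8)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("password is too short")));
}

void
_PG_init(void)
{
	prev_check_password_hook = check_password_hook;
	check_password_hook = my_check_password;
}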

Yours,
Laurenz Albe



Re: [HACKERS] pgbench stuck with 100% cpu usage

2017-09-29 Thread Fabien COELHO



- Run pgbench -c 10 -T 100
- Stop postgres with -m immediate


That is a strange test to run, but it would be better if the behavior was
not that one.



Well, I think it's a very legitimate test, not for testing performance, but
for testing crash recovery, and I use it very often.


Ok, interesting. Now I understand your purpose.

You may consider something like "BEGIN; UPDATE ...; \sleep 100 ms;
COMMIT;" so that a crash is most likely to occur with plenty of
transactions in progress but without much load.
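
For instance, a custom script along these lines (hypothetical; it reuses
the standard pgbench_accounts table and the :scale variable):

\set aid random(1, 100000 * :scale)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = :aid;
\sleep 100 ms
COMMIT;

Run it with "pgbench -c 10 -T 100 -f crash.sql" and then stop postgres
with -m immediate as before.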


--
Fabien.




Re: [HACKERS] Bug with pg_basebackup and 'shared' tablespace

2017-09-29 Thread Michael Paquier
On Fri, Sep 29, 2017 at 2:19 PM, Kyotaro HORIGUCHI
 wrote:
> It would practically work, but I don't like the fact that the
> patch relies on the specific directory/file ordering in the tar
> stream. This is not about the CATVER directory but the lower
> directories. Being stricter, it actually makes excessive calls
> to verify_dir_is_em..() for lower directories, contrary to
> expectations.
>
> I think that we can take a more robust way to detect the
> CATVER directory. Just checking whether it is a top-level directory
> will work. That is, making the following change.

My tendency about this patch is still that it should be rejected. This
is presenting additional handling for no real gain.

>
> -   if (firstfile && !basetablespace)
> +   /* copybuf must contain at least one '/' here */
> +   if (!basetablespace && strchr(copybuf, '/')[1] == 0)
>
> This condition hits exactly and only the CATVER directories, without
> being disturbed by the file ordering of the tar stream.

Anyway, as a new version is at least needed, I am marking it as
returned with feedback. The CF is close to its end.
-- 
Michael




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-09-29 Thread Michael Paquier
On Fri, Sep 29, 2017 at 6:39 AM, Alvaro Herrera  wrote:
> Peter Geoghegan wrote:
>
>> We certainly do still see wrong answers to queries here:
>>
>> postgres=# select ctid, xmin, xmax, * from t;
>>  ctid  | xmin  | xmax | id | name | x
>> ---+---+--++--+---
>>  (0,1) | 21171 |0 |  1 | 111  | 0
>>  (0,7) | 21177 |0 |  3 | 333  | 5
>> (2 rows)
>>
>> postgres=# select * from t where id = 3;
>>  id | name | x
>> +--+---
>>   3 | 333  | 5
>> (1 row)
>>
>> postgres=# set enable_seqscan = off;
>> SET
>> postgres=# select * from t where id = 3;
>>  id | name | x
>> +--+---
>> (0 rows)
>
> Yeah, oops.

This really looks like a problem at the heap level, with the parent
redirection not getting defined (?). Please note that a subsequent
REINDEX fails as well:
=# reindex index t_pkey ;
ERROR:  XX000: failed to find parent tuple for heap-only tuple at
(0,7) in table "t"
LOCATION:  IndexBuildHeapRangeScan, index.c:2597

VACUUM FREEZE also is not getting things right, but a VACUUM FULL does.

Also, dropping the constraint and attempting to recreate it is failing:
=# alter table t drop constraint t_pkey;
ALTER TABLE
=# create index t_pkey on t(id);
ERROR:  XX000: failed to find parent tuple for heap-only tuple at
(0,7) in table "t"
LOCATION:  IndexBuildHeapRangeScan, index.c:2597

A corrupted page clearly indicates that there are no tuple redirections:
=# select lp, t_ctid, lp_off, t_infomask, t_infomask2 from
heap_page_items(get_raw_page('t', 0));
 lp | t_ctid | lp_off | t_infomask | t_infomask2
++++-
  1 | (0,1)  |   8152 |   2818 |   3
  2 | null   |  0 |   null |null
  3 | (0,4)  |   8112 |   9986 |   49155
  4 | (0,5)  |   8072 |   9986 |   49155
  5 | (0,6)  |   8032 |   9986 |   49155
  6 | (0,7)  |   7992 |   9986 |   49155
  7 | (0,7)  |   7952 |  11010 |   32771
(7 rows)

And a non-corrupted page clearly shows the redirection done with lp_off:
 lp | t_ctid | lp_off | t_infomask | t_infomask2
++++-
  1 | (0,1)  |   8152 |   2818 |   3
  2 | null   |  7 |   null |null
  3 | null   |  0 |   null |null
  4 | null   |  0 |   null |null
  5 | null   |  0 |   null |null
  6 | null   |  0 |   null |null
  7 | (0,7)  |   8112 |  11010 |   32771
(7 rows)
-- 
Michael

