Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-17 Thread Tom Lane
Alvaro Herrera  writes:
> Wait ... what?  I've been thinking that this GUC is just to enable or
> disable the computation of query ID, not to change the algorithm to do
> so.  Do we really need to allow different algorithms in different
> sessions?

We established that some time ago, no?

regards, tom lane




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-17 Thread Alvaro Herrera
On 2020-Oct-17, Tom Lane wrote:

> Fair point, but if we allow several different values to be set in
> different sessions, what ends up happening in pg_stat_statements?
> 
> On the other hand, maybe that's just a matter for documentation.
> "If the 'same' query is processed with two different queryID settings,
> that will generally result in two separate table entries, because
> the same ID hash is unlikely to be produced in both cases".

Wait ... what?  I've been thinking that this GUC is just to enable or
disable the computation of query ID, not to change the algorithm to do
so.  Do we really need to allow different algorithms in different
sessions?




Re: pg_restore error message during ENOSPC with largeobj

2020-10-17 Thread Tom Lane
Justin Pryzby  writes:
> I overflowed my homedir while testing with pg_restore, and got:
> |pg_restore: error: could not write to large object (result: 18446744073709551615, expected: 30)

Bleah.

> I guess casting to long was the best option c. 2002 (commit 6faf8024f) but I
> gather the modern way is with %z.

Isn't the real problem that lo_write returns int, not size_t?

AFAICT, every other call site stores the result in an int,
it's just this one that's out in left field.

regards, tom lane




RE: Parallel copy

2020-10-17 Thread Hou, Zhijie
Hi Vignesh,

After having a look at the patch, I have some suggestions for
0003-Allow-copy-from-command-to-process-data-from-file.patch.

1.

+static uint32
+EstimateCstateSize(ParallelContext *pcxt, CopyState cstate, List *attnamelist,
+  char **whereClauseStr, char **rangeTableStr,
+  char **attnameListStr, char **notnullListStr,
+  char **nullListStr, char **convertListStr)
+{
+   uint32  strsize = MAXALIGN(sizeof(SerializedParallelCopyState));
+
+   strsize += EstimateStringSize(cstate->null_print);
+   strsize += EstimateStringSize(cstate->delim);
+   strsize += EstimateStringSize(cstate->quote);
+   strsize += EstimateStringSize(cstate->escape);


It uses the function EstimateStringSize to get the strlen of null_print,
delim, quote and escape.
But the length of null_print is already stored in null_print_len,
and delim/quote/escape must each be 1 byte, so calling strlen again seems
unnecessary.

How about " strsize += sizeof(uint32) + cstate->null_print_len + 1 "?

2.
+   strsize += EstimateNodeSize(cstate->whereClause, whereClauseStr);

+   copiedsize += CopyStringToSharedMemory(cstate, whereClauseStr,
+  shmptr + copiedsize);

Some string lengths are counted twice.
For whereClauseStr, strlen is called once in EstimateNodeSize and again
in CopyStringToSharedMemory.
I don't know whether it's worth refactoring the code to avoid the
duplicate strlen calls. What do you think?

Best regards,
houzj




pg_restore error message during ENOSPC with largeobj

2020-10-17 Thread Justin Pryzby
I overflowed my homedir while testing with pg_restore, and got:
|pg_restore: error: could not write to large object (result: 18446744073709551615, expected: 30)

src/bin/pg_dump/pg_backup_archiver.c

   if (res != AH->lo_buf_used)
       fatal("could not write to large object (result: %lu, expected: %lu)",
             (unsigned long) res, (unsigned long) AH->lo_buf_used);


(18446744073709551615 is 2^64 - 1, i.e. -1 wrapped around to unsigned 64-bit.)

I guess casting to long was the best option c. 2002 (commit 6faf8024f) but I
gather the modern way is with %z.

I confirmed this fixes the message.
|pg_restore: error: could not write to large object (result: -1, expected: 16384)


-- 
Justin
>From 38d1f4ca314b9381a8fe5cbf90d4bc9b390b2fca Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 17 Oct 2020 19:28:25 -0500
Subject: [PATCH v1] print size_t with %zd rather than casting to %lu

See also:
6faf8024facacd9cc30ce37b7ec9abb75238e0fd
be11f8400d7d99e8ae6602f3175e04b4f0c99376
---
 src/bin/pg_dump/pg_backup_archiver.c | 18 +-
 src/bin/pg_dump/pg_backup_tar.c  |  6 +++---
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index d61b290d2a..86dc355c9b 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -1640,13 +1640,13 @@ dump_lo_buf(ArchiveHandle *AH)
 		size_t		res;
 
 		res = lo_write(AH->connection, AH->loFd, AH->lo_buf, AH->lo_buf_used);
-		pg_log_debug(ngettext("wrote %lu byte of large object data (result = %lu)",
-			  "wrote %lu bytes of large object data (result = %lu)",
+		pg_log_debug(ngettext("wrote %zd byte of large object data (result = %zd)",
+			  "wrote %zd bytes of large object data (result = %zd)",
 			  AH->lo_buf_used),
-	 (unsigned long) AH->lo_buf_used, (unsigned long) res);
+	 AH->lo_buf_used, res);
 		if (res != AH->lo_buf_used)
-			fatal("could not write to large object (result: %lu, expected: %lu)",
-  (unsigned long) res, (unsigned long) AH->lo_buf_used);
+			fatal("could not write to large object (result: %zd, expected: %zd)",
+  res, AH->lo_buf_used);
 	}
 	else
 	{
@@ -2130,8 +2130,8 @@ _discoverArchiveFormat(ArchiveHandle *AH)
 		if (ferror(fh))
 			fatal("could not read input file: %m");
 		else
-			fatal("input file is too short (read %lu, expected 5)",
-  (unsigned long) cnt);
+			fatal("input file is too short (read %zd, expected 5)",
+  cnt);
 	}
 
 	/* Save it, just in case we need it later */
@@ -3794,8 +3794,8 @@ ReadHead(ArchiveHandle *AH)
 
 		AH->intSize = AH->ReadBytePtr(AH);
 		if (AH->intSize > 32)
-			fatal("sanity check on integer size (%lu) failed",
-  (unsigned long) AH->intSize);
+			fatal("sanity check on integer size (%zd) failed",
+  AH->intSize);
 
 		if (AH->intSize > sizeof(int))
 			pg_log_warning("archive was made on a machine with larger integers, some operations might fail");
diff --git a/src/bin/pg_dump/pg_backup_tar.c b/src/bin/pg_dump/pg_backup_tar.c
index 54e708875c..1751e12929 100644
--- a/src/bin/pg_dump/pg_backup_tar.c
+++ b/src/bin/pg_dump/pg_backup_tar.c
@@ -1233,10 +1233,10 @@ _tarGetHeader(ArchiveHandle *AH, TAR_MEMBER *th)
 			return 0;
 
 		if (len != TAR_BLOCK_SIZE)
-			fatal(ngettext("incomplete tar header found (%lu byte)",
-		   "incomplete tar header found (%lu bytes)",
+			fatal(ngettext("incomplete tar header found (%zd byte)",
+		   "incomplete tar header found (%zd bytes)",
 		   len),
-  (unsigned long) len);
+  len);
 
 		/* Calc checksum */
 		chk = tarChecksum(h);
-- 
2.17.0



Re: jit and explain nontext

2020-10-17 Thread Justin Pryzby
On Thu, Oct 15, 2020 at 02:51:38PM +1300, David Rowley wrote:
> On Thu, 15 Oct 2020 at 14:43, Justin Pryzby  wrote:
> > On Thu, Oct 15, 2020 at 02:23:01PM +1300, David Rowley wrote:
> > > On Thu, 15 Oct 2020 at 14:15, Tom Lane  wrote:
> > > > Hmm, I dunno if my opinion counts as "wisdom", but what I was arguing 
> > > > for
> > > > there was that we should print stuff if it's potentially invoked by a
> > > > run-time decision, but not if it was excluded at plan time.  I'm not
> > > > totally clear on whether jitting decisions are fixed by the plan tree
> > > > (including its cost values) or if the executor can make different
> > > > decisions in different executions of the identical plan tree.
> > > > If the latter, then I agree with Justin that this is a bug.
> > >
> > > As far as I know, the only exception where the executor overwrites the
> > > planner's decision is in nodeValuesscan.c where it turns jit off
> > > because each VALUES will get evaluated just once, which would be a
> > > waste of effort to JIT.
> > >
> > > Apart from that the choice is baked in by the planner and set in
> > > PlannedStmt.jitFlags.
> >
> > What about the GUCs themselves ?
> >
> > They can change after planning, which means a given execution of a plan 
> > might
> > or might not use jit.
> 
> That's a pretty good point.

Added at: https://commitfest.postgresql.org/30/2766/

>From 237f4d5d0739583683fd7a4d67a6e3d9120aba04 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 17 Oct 2020 14:10:11 -0500
Subject: [PATCH v1] explain: show JIT details in non-text format, even if zero

---
 src/backend/commands/explain.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 41317f1837..7345971507 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -839,7 +839,8 @@ ExplainPrintJIT(ExplainState *es, int jit_flags, JitInstrumentation *ji)
 	instr_time	total_time;
 
 	/* don't print information if no JITing happened */
-	if (!ji || ji->created_functions == 0)
+	if (!ji || (ji->created_functions == 0 &&
+			es->format == EXPLAIN_FORMAT_TEXT))
 		return;
 
 	/* calculate total time */
-- 
2.17.0



Re: Sometimes the output to the stdout in Windows disappears

2020-10-17 Thread Tom Lane
I wrote:
> Alexander Lakhin  writes:
>> What bothers me is:
>>  There must be a call to *WSACleanup* for each successful call to
>>  WSAStartup
>>  
>> .

> Yeah, that is a very odd statement.  Surely, the Windows kernel manages
> to cope if a program crashes without having done that.  So what exactly
> is the downside of intentionally not doing it?

A bit of grepping showed me that the backend, initdb, and pg_regress
all call WSAStartup without ever doing WSACleanup, and we've seen no
ill effects from that.  So it seems clear that this documentation can
safely be ignored.

I propose the attached patch.  If this doesn't cause buildfarm problems,
perhaps we should back-patch it.

BTW, I notice that libpq is asking WSAStartup for Winsock version 1.1,
which is remarkably ancient.  Almost everyplace else is asking for
version 2.2, which has been current for a decade or two.  Shouldn't
we update that?  (It occurs to me to wonder if this in itself is
some kind of problem; I wonder how well Winsock works when there are
requests for different API versions in the same program.)

regards, tom lane

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 3315f1dd05..de60281fcb 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -91,21 +91,6 @@
 

 
-   
-
- On Windows, there is a way to improve performance if a single
- database connection is repeatedly started and shutdown.  Internally,
- libpq calls WSAStartup() and WSACleanup() for connection startup
- and shutdown, respectively.  WSAStartup() increments an internal
- Windows library reference count which is decremented by WSACleanup().
- When the reference count is just one, calling WSACleanup() frees
- all resources and all DLLs are unloaded.  This is an expensive
- operation.  To avoid this, an application can manually call
- WSAStartup() so resources will not be freed when the last database
- connection is closed.
-
-   
-

 
  PQconnectdbParamsPQconnectdbParams
diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index a967e11378..b51cc76c7d 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -229,19 +229,6 @@ static char *readMessageFromPipe(int fd);
 	(strncmp(msg, prefix, strlen(prefix)) == 0)
 
 
-/*
- * Shutdown callback to clean up socket access
- */
-#ifdef WIN32
-static void
-shutdown_parallel_dump_utils(int code, void *unused)
-{
-	/* Call the cleanup function only from the main thread */
-	if (mainThreadId == GetCurrentThreadId())
-		WSACleanup();
-}
-#endif
-
 /*
  * Initialize parallel dump support --- should be called early in process
  * startup.  (Currently, this is called whether or not we intend parallel
@@ -267,8 +254,7 @@ init_parallel_dump_utils(void)
 			pg_log_error("WSAStartup failed: %d", err);
 			exit_nicely(1);
 		}
-		/* ... and arrange to shut it down at exit */
-		on_exit_nicely(shutdown_parallel_dump_utils, NULL);
+
 		parallel_init_done = true;
 	}
 #endif
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index af27fee6b5..704c9e2f79 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -3871,23 +3871,30 @@ makeEmptyPGconn(void)
 #ifdef WIN32
 
 	/*
-	 * Make sure socket support is up and running.
+	 * Make sure socket support is up and running in this process.
+	 *
+	 * Note: the Windows documentation says that we should eventually do a
+	 * matching WSACleanup() call, but experience suggests that that is at
+	 * least as likely to cause problems as fix them.  So we don't.
 	 */
-	WSADATA		wsaData;
+	static bool wsastartup_done = false;
 
-	if (WSAStartup(MAKEWORD(1, 1), &wsaData))
-		return NULL;
+	if (!wsastartup_done)
+	{
+		WSADATA		wsaData;
+
+		if (WSAStartup(MAKEWORD(1, 1), &wsaData) != 0)
+			return NULL;
+		wsastartup_done = true;
+	}
+
+	/* Forget any earlier error */
 	WSASetLastError(0);
-#endif
+#endif			/* WIN32 */
 
 	conn = (PGconn *) malloc(sizeof(PGconn));
 	if (conn == NULL)
-	{
-#ifdef WIN32
-		WSACleanup();
-#endif
 		return conn;
-	}
 
 	/* Zero all pointers and booleans */
 	MemSet(conn, 0, sizeof(PGconn));
@@ -4080,10 +4087,6 @@ freePGconn(PGconn *conn)
 	termPQExpBuffer(&conn->workBuffer);
 
 	free(conn);
-
-#ifdef WIN32
-	WSACleanup();
-#endif
 }
 
 /*


Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-17 Thread Tom Lane
Alvaro Herrera  writes:
> On 2020-Oct-17, Julien Rouhaud wrote:
>> On Sat, Oct 17, 2020 at 12:23 AM Tom Lane  wrote:
>>> then there's a potential security issue if the GUC is USERSET level:
>>> a user could hide her queries from pg_stat_statement by turning the
>>> GUC off.  So this line of thought suggests the GUC needs to be at
>>> least SUSET, and maybe higher ... doesn't pg_stat_statement need it
>>> to have the same value cluster-wide?

> I don't think we should consider pg_stat_statement a bulletproof defense
> for security problems.  It is already lossy by design.

Fair point, but if we allow several different values to be set in
different sessions, what ends up happening in pg_stat_statements?

On the other hand, maybe that's just a matter for documentation.
"If the 'same' query is processed with two different queryID settings,
that will generally result in two separate table entries, because
the same ID hash is unlikely to be produced in both cases".  There
is certainly a use-case for wanting to be able to do this, if for
example you'd like different query aggregation behavior for different
applications.

> I do think it'd be preferable if we allowed it to be disabled at the
> config file level only, not with SET (prevent users from hiding stuff);
> but I think it is useful to allow users to enable it for specific
> queries or for specific sessions only, while globally disabled.

Indeed.  I'm kind of talking myself into the idea that USERSET, or
at most SUSET, is fine, so long as we document what happens when it
has different values in different sessions.

regards, tom lane




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-17 Thread Alvaro Herrera
On 2020-Oct-17, Julien Rouhaud wrote:

> On Sat, Oct 17, 2020 at 12:23 AM Tom Lane  wrote:

> > then there's a potential security issue if the GUC is USERSET level:
> > a user could hide her queries from pg_stat_statement by turning the
> > GUC off.  So this line of thought suggests the GUC needs to be at
> > least SUSET, and maybe higher ... doesn't pg_stat_statement need it
> > to have the same value cluster-wide?
> 
> Well, I don't think that there's any guarantee that pg_stat_statements
> will display all activity that has been run, since there's a limited
> amount of (userid, dbid, queryid) that can be stored, but I agree that
> allowing random user to hide their activity isn't nice.  Note that I
> defined the GUC as SUSET, but maybe it should be SIGHUP?

I don't think we should consider pg_stat_statement a bulletproof defense
for security problems.  It is already lossy by design.

I do think it'd be preferable if we allowed it to be disabled at the
config file level only, not with SET (prevent users from hiding stuff);
but I think it is useful to allow users to enable it for specific
queries or for specific sessions only, while globally disabled.  This
might mean we need to mark it PGC_SIGHUP and then have the check hook
disallow it from being changed under such-and-such conditions.




Re: warn_unused_results

2020-10-17 Thread Tom Lane
Peter Eisentraut  writes:
> Forgetting to assign the return value of list APIs such as lappend() is 
> a perennial favorite.  The compiler can help point out such mistakes. 
> GCC has an attribute warn_unused_results.  Also C++ has standardized 
> this under the name "nodiscard", and C has a proposal to do the same 
> [0].  In my patch I call the symbol pg_nodiscard, so that perhaps in a 
> distant future one only has to do s/pg_nodiscard/nodiscard/ or something 
> similar.  Also, the name is short enough that it doesn't mess up the 
> formatting of function declarations too much.

+1 in principle (I've not read the patch in detail); but I wonder what
pgindent does with these added keywords.

regards, tom lane




Re: partition routing layering in nodeModifyTable.c

2020-10-17 Thread Alvaro Herrera
On 2020-Oct-17, Amit Langote wrote:

> Hmm, I don't see ri_PartitionCheckExpr as being a piece of routing
> information, because it's primarily meant to be used when inserting
> *directly* into a partition, although it's true we do initialize it in
> routing target partitions too in some cases.
> 
> Also, ChildToRootMap was introduced by the trigger transition table
> project, not tuple routing.  I think we misjudged this when we added
> PartitionToRootMap to PartitionRoutingInfo, because it doesn't really
> belong there.  This patch fixes that by removing PartitionToRootMap.
> 
> RootToPartitionMap and the associated partition slot is the only piece
> of extra information that is needed by tuple routing target relations.

Well, I was thinking on making the ri_PartitionInfo be about
partitioning in general, not just specifically for partition tuple
routing.  Maybe Heikki is right that it may end up being simpler to
remove ri_PartitionInfo altogether.  It'd just be a couple of additional
pointers in ResultRelInfo after all.  (Remember that we wanted to get
rid of fields specific to only certain kinds of RTEs in RangeTblEntry
for example, to keep things cleanly separated, although that project
eventually found its demise for other reasons.)

> As I said in my previous email, I don't see how we can make
> initializing the map any lazier than it already is.  If a partition
> has a different tuple descriptor than the root table, then we know for
> sure that any tuples that are routed to it will need to be converted
> from the root tuple format to its tuple format, so we might as well
> build the map when the ResultRelInfo is built.  If no tuple lands into
> a partition, we would neither build its ResultRelInfo nor the map.
> With the current arrangement, if the map field is NULL, it simply
> means that the partition has the same tuple format as the root table.

I see -- makes sense.

> On Fri, Oct 16, 2020 at 11:45 PM Alvaro Herrera  
> wrote:

> > BTW it is curious that ExecInitRoutingInfo is called both in
> > ExecInitPartitionInfo() (from ExecFindPartition when the ResultRelInfo
> > for the partition is not found) *and* from ExecFindPartition again, when
> > the ResultRelInfo for the partition *is* found.  Doesn't this mean that
> > ri_PartitionInfo is set up twice for the same partition?
> 
> No.  ExecFindPartition() directly calls ExecInitRoutingInfo() only for
> reused update result relations, that too, only the first time a tuple
> lands into such a partition.  For the subsequent tuples that land into
> the same partition, ExecFindPartition() will be able to find that
> ResultRelInfo in the proute->partitions[] array.  All ResultRelInfos
> in that array are assumed to have been processed by
> ExecInitRoutingInfo().

Doh, right, sorry, I was misreading the if/else maze there.




Re: Sometimes the output to the stdout in Windows disappears

2020-10-17 Thread Tom Lane
Alexander Lakhin  writes:
> 16.10.2020 19:18, Tom Lane wrote:
>> What if we arranged to call WSAStartup just once, during the first
>> libpq connection-open in a process, and then never did WSACleanup?
>> Surely the OS can cope with that, and it eliminates any risk that
>> WSACleanup breaks something.

> What bothers me is:

> There must be a call to *WSACleanup* for each successful call to
> WSAStartup
> 
> .

Yeah, that is a very odd statement.  Surely, the Windows kernel manages
to cope if a program crashes without having done that.  So what exactly
is the downside of intentionally not doing it?  There's no reason to
care if the Winsock DLL stays loaded until program exit rather than
getting unloaded a bit earlier.  (If anything, the current code causes
an unload/reload cycle for each connection when the application makes
a series of PG connections; who could think that's a great idea?)

regards, tom lane




Re: pgstat_report_activity() and parallel CREATE INDEX (was: Parallel index creation & pg_stat_activity)

2020-10-17 Thread Peter Geoghegan
On Tue, Oct 13, 2020 at 7:26 PM Noah Misch  wrote:
> 1. Disable parallelism for the index build under ExecuteTruncateGuts().
>Nobody will mourn a performance loss from declining parallelism for an
>empty index, but I feel like this is fixing in the wrong place.
> 2. Make _bt_begin_parallel() and begin_parallel_vacuum() recognize the
>debug_query_string==NULL case and reproduce it on the worker.
> 3. Require bgworkers to set debug_query_string before entering code of vacuum,
>truncate, etc.  Logical replication might synthesize a DDL statement, or it
>might just use a constant string.
>
> I tend to prefer (2), but (3) isn't bad.  Opinions?

I also prefer 2.

Thanks
-- 
Peter Geoghegan




Re: [PATCH] Add extra statistics to explain for Nested Loop

2020-10-17 Thread David G. Johnston
On Fri, Oct 16, 2020 at 3:11 PM Anastasia Lubennikova <
a.lubennik...@postgrespro.ru> wrote:

> User visible change is:
>
>
> -   ->  Nested Loop (actual rows=N loops=N)
> +  ->  Nested Loop (actual min_rows=0 rows=0 max_rows=0 loops=2)
>
I'd be inclined to append both new rows to the end.

(actual rows=N loops=N min_rows=N max_rows=N)

rows * loops is still an important calculation.

Why not just add total_rows while we are at it - last in the listing?

(actual rows=N loops=N min_rows=N max_rows=N total_rows=N)

David J.


Re: Sometimes the output to the stdout in Windows disappears

2020-10-17 Thread Andrew Dunstan


On 10/16/20 12:18 PM, Tom Lane wrote:
>
> But wait a minute: I just looked at Microsoft's docs [1] and found
>
> In a multithreaded environment, WSACleanup terminates Windows Sockets
> operations for all threads.
>
> This makes me (a) wonder if that explains the side-effects on stdio,
> and (b) question why libpq is calling WSACleanup at all.
> What if we arranged to call WSAStartup just once, during the first
> libpq connection-open in a process, and then never did WSACleanup?
> Surely the OS can cope with that, and it eliminates any risk that
> WSACleanup breaks something.
>
>   regards, tom lane
>
> [1] 
> https://docs.microsoft.com/en-us/windows/win32/api/winsock/nf-winsock-wsacleanup
>
>


This could explain random transient stdout/stderr failures we have seen
over the years. I think we should at least give your suggestion a try -
this is a good time in the dev cycle for such experiments.


cheers


andrew







Re: Wrong example in the bloom documentation

2020-10-17 Thread Daniel Westermann (DWE)
On Fri, Oct  9, 2020 at 11:08:32AM -0400, Bruce Momjian wrote:
> On Fri, Oct  9, 2020 at 05:44:57AM +, Daniel Westermann (DWE) wrote:
> > Hi Bruce, Tom,
> > 
> > On Thu, Oct  8, 2020 at 03:43:32PM -0400, Tom Lane wrote:
> > >> "Daniel Westermann (DWE)"  writes:
> > >> >> I was hoping someone more experienced with this would comment, but
> > >> >> seeing none, I will apply it in a day or two to all supported 
> > >> >> versions?
> > >> >> Have you tested this output back to 9.5?
> > >> 
> > >> > I hoped that as well. No, I tested down to 9.6 because the change 
> > >> > happened in 10.
> > >> 
> > >> The patch assumes that parallel query is enabled, which is not true by
> > >> default before v10, so it should certainly not be applied before v10
> > >> (at least not without significant revisions).
> >> 
> >> Yes, the behavior change was in 10. Before 10 the example is fine, I would 
> >> not apply that to any prior version, otherwise the whole example needs to 
> >> be rewritten.
>> 
>> Agreed.

> This is not applying to PG 12 or earlier because the patch mentions JIT,
> which was only mentioned in the PG bloom docs in PG 13+.

Does that mean we need separate patches for each release starting with 10?
As I don't write patches frequently, I would need some help here.

Regards
Daniel



[doc] improve tableoid description

2020-10-17 Thread Ian Lawrence Barwick
[doc] improve tableoid description

Hi

Attached patch aims to improve the description of the tableoid system column [1]
by:

- mentioning it's useful for determining table names for partitioned tables as
  well as for those in inheritance hierarchies
- mentioning the possibility of casting tableoid to regclass (which is simpler
  than the currently suggested join on pg_class, which is only needed if
  the schema name is absolutely required)

[1] https://www.postgresql.org/docs/current/ddl-system-columns.html


Regards

Ian Barwick


-- 
EnterpriseDB: https://www.enterprisedb.com
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index c4897d68c9..f916f0bd70 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -1144,12 +1144,18 @@ CREATE TABLE circles (
 
  
   The OID of the table containing this row.  This column is
-  particularly handy for queries that select from inheritance
+  particularly handy for queries that select from partitioned
+  tables (see ) or inheritance
   hierarchies (see ), since without it,
   it's difficult to tell which individual table a row came from.  The
-  tableoid can be joined against the
+  tableoid can be cast to regclass
+  to obtain the table name.  However the table's schema will only be shown
+  if the table is not in the default path, otherwise
+  tableoid will need to be joined against the
   oid column of
-  pg_class to obtain the table name.
+  pg_class
+  to obtain the relnamespace OID (which itself
+  can be cast to regnamespace to obtain the schema name).
  
 



Timing of relcache inval at parallel worker init

2020-10-17 Thread Noah Misch
While reviewing what became commit fe4d022, I was surprised at the sequence of
relfilenode values that RelationInitPhysicalAddr() computed for pg_class,
during ParallelWorkerMain(), when running the last command of this recipe:

  begin;
  cluster pg_class using pg_class_oid_index;
  set force_parallel_mode = 'regress';
  values (1);

There's $OLD_NODE (relfilenode in the committed relation map) and $NEW_NODE
(relfilenode in this transaction's active_local_updates).  The worker performs
RelationInitPhysicalAddr(pg_class) four times:

1) $OLD_NODE in BackgroundWorkerInitializeConnectionByOid().
2) $OLD_NODE in RelationCacheInvalidate() directly.
3) $OLD_NODE in RelationReloadNailed(), indirectly via RelationCacheInvalidate().
4) $NEW_NODE indirectly as part of the executor running the query.

I did expect $OLD_NODE in (1), since ParallelWorkerMain() calls
BackgroundWorkerInitializeConnectionByOid() before
StartParallelWorkerTransaction().  I expected $NEW_NODE in (2) and (3); that
didn't happen, because ParallelWorkerMain() calls InvalidateSystemCaches()
before RestoreRelationMap().  Let's move InvalidateSystemCaches() later.
Invalidation should follow any worker initialization step that changes the
results of relcache validation; otherwise, we'd need to ensure the
InvalidateSystemCaches() will not validate any relcache entry.  Invalidation
should precede any step that reads from a cache; otherwise, we'd need to redo
that step after inval.  (Currently, no step reads from a cache.)  Many steps,
e.g. AttachSerializableXact(), have no effect on relcache validation, so it's
arbitrary whether they happen before or after inval.  I'm putting inval as
late as possible, because I think it's easier to confirm that a step doesn't
read from a cache than to confirm that a step doesn't affect relcache
validation.  An also-reasonable alternative would be to move inval and its
prerequisites as early as possible.

For reasons described in the attached commit message, this doesn't have
user-visible consequences today.  Innocent-looking relcache.c changes might
upheave that, so I'm proposing this on robustness grounds.  No need to
back-patch.
Author: Noah Misch 
Commit: Noah Misch 

In parallel workers, invalidate after restoring relmapper.

A relcache entry is no fresher than the relmapper data used to build it.
The "rd_refcnt <= 1" in RelationReloadNailed() prevented user-visible
consequences by leaving every relation !rd_isvalid at the end of this
InvalidateSystemCaches().  Any subsequent relation usage first validated
the relation, drawing on correct relmapper data.  Even so, reduce
fragility by delaying invalidation until we initialize all state
essential to rebuilding a relcache entry.

Reviewed by FIXME.

Discussion: https://postgr.es/m/FIXME

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index b042696..566ae87 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -1417,12 +1417,6 @@ ParallelWorkerMain(Datum main_arg)
PushActiveSnapshot(RestoreSnapshot(asnapspace));
 
/*
-* We've changed which tuples we can see, and must therefore invalidate
-* system caches.
-*/
-   InvalidateSystemCaches();
-
-   /*
 * Restore current role id.  Skip verifying whether session user is
 * allowed to become this role and blindly restore the leader's state for
 * current role.
@@ -1458,6 +1452,12 @@ ParallelWorkerMain(Datum main_arg)
AttachSerializableXact(fps->serializable_xact_handle);
 
/*
+* We've changed tuple visibility and relmapper state, so discard all
+* provisional state derived from system catalogs.
+*/
+   InvalidateSystemCaches();
+
+   /*
 * We've initialized all of our state now; nothing should change
 * hereafter.
 */


Re: [PATCH] Add extra statistics to explain for Nested Loop

2020-10-17 Thread hubert depesz lubaczewski
On Sat, Oct 17, 2020 at 12:26:08PM +0800, Julien Rouhaud wrote:
> >> -   ->  Nested Loop (actual rows=N loops=N)
> >> +  ->  Nested Loop (actual min_rows=0 rows=0 max_rows=0 loops=2)
> > This interface is ok - there is not too much space for creativity.
> Yes I also think it's ok. We should also consider usability for tools
> like explain.depesz.com, I don't know if the current output is best.
> I'm adding Depesz and Pierre which are both working on this kind of
> tool for additional input.

Thanks for heads up. This definitely will need some changes on my side,
but should be easy to handle.

Best regards,

depesz





Re: Sometimes the output to the stdout in Windows disappears

2020-10-17 Thread Alexander Lakhin
16.10.2020 19:18, Tom Lane wrote:
> Oh, very interesting.
> Now that you have it somewhat in captivity, maybe you could determine
> some things:
>
> 1. Is it only stdout that's affected?  What of other stdio streams?
> (Note that testing stderr might be tricky because it's probably
> line-buffered.)
stderr is affected too. Just replacing stdout with stderr in connect.c
and 000_connect.pl gives the same result.

Moreover, the following modification:
...
    outfile = fopen("out", "w");
...
    fprintf(stdout, "stdout test\n");
    fprintf(stderr, "stderr test\n");
    fprintf(outfile, "outfile test\n");
    WSACleanup();
...

---
for (my $i = 0; $i < 10; $i++) {
    unlink('out');
    IPC::Run::run(\@cmd, '>', \$stdout, '2>', \$stderr);
    open(my $fh, '<', 'out') or die $!;
    my $fileout = <$fh>;
    ok(defined $fileout && $fileout ne '');
    close($fh);
}

detects similar failures too. (On a failure, the out file exists but has
zero size.)

> 2. Does an fflush() just before, or just after, WSACleanup() fix it?
"fflush(NULL);" just before or after WSACleanup() fixes things.

I've managed to record in ProcMon the activity log for a failed run
(aside normal runs). Excerpts from the log are attached. As we can see,
the failed process doesn't even try to write into IPC-Run's temp file.

> Depending on your answers to the above, maybe some hack like this
> would be acceptable:
>
>   free(conn);
>  
>  #ifdef WIN32
> + fflush(NULL);
>   WSACleanup();
>  #endif
>  }
>
> It's not very nice for a library to be doing global things like that,
> but if the alternative is loss of output, maybe we should.
But now we see that the WSACleanup call is a global thing by itself.
> But wait a minute: I just looked at Microsoft's docs [1] and found
>
> In a multithreaded environment, WSACleanup terminates Windows Sockets
> operations for all threads.
>
> This makes me (a) wonder if that explains the side-effects on stdio,
> and (b) question why libpq is calling WSACleanup at all.
> What if we arranged to call WSAStartup just once, during the first
> libpq connection-open in a process, and then never did WSACleanup?
> Surely the OS can cope with that, and it eliminates any risk that
> WSACleanup breaks something.
What bothers me is:

There must be a call to *WSACleanup* for each successful call to
WSAStartup.

Only the final *WSACleanup* function call performs the actual
cleanup. The preceding calls simply decrement an internal reference
count in the WS2_32.DLL.

So third-party application developers should understand that when using
libpq they would have to call WSACleanup one more time to perform "the
actual cleanup". (And thus WSAStartup is kind of like a global thing too.)
But maybe it's still better than having a confirmed risk of losing data.

Best regards,
Alexander
"Time of Day","Process Name","PID","Operation","Path","Result","Detail"
"6:46:00.6867994 AM","connect.EXE","1964","RegQueryKey","HKLM","SUCCESS","Query: "
"6:46:00.6868045 AM","connect.EXE","1964","RegOpenKey","HKLM\System\CurrentControlSet\Services\Tcpip\Parameters\Winsock","REPARSE","Desired Access: Read"
"6:46:00.6868108 AM","connect.EXE","1964","RegOpenKey","HKLM\System\CurrentControlSet\Services\Tcpip\Parameters\Winsock","SUCCESS","Desired Access: Read"
"6:46:00.6868173 AM","connect.EXE","1964","RegQueryValue","HKLM\System\CurrentControlSet\Services\Tcpip\Parameters\Winsock\MinSockaddrLength","SUCCESS","Type: REG_DWORD, Length: 4, Data: 16"
"6:46:00.6868222 AM","connect.EXE","1964","RegQueryValue","HKLM\System\CurrentControlSet\Services\Tcpip\Parameters\Winsock\MaxSockaddrLength","SUCCESS","Type: REG_DWORD, Length: 4, Data: 16"
"6:46:00.6868270 AM","connect.EXE","1964","RegQueryValue","HKLM\System\CurrentControlSet\Services\Tcpip\Parameters\Winsock\UseDelayedAcceptance","SUCCESS","Type: REG_DWORD, Length: 4, Data: 0"
"6:46:00.6868324 AM","connect.EXE","1964","RegCloseKey","HKLM\System\CurrentControlSet\Services\Tcpip\Parameters\Winsock","SUCCESS",""
"6:46:00.6871675 AM","connect.EXE","1964","RegCloseKey","HKLM\System\CurrentControlSet\Services\WinSock2\Parameters\Protocol_Catalog9","SUCCESS",""
"6:46:00.6872047 AM","connect.EXE","1964","RegCloseKey","HKLM\System\CurrentControlSet\Services\WinSock2\Parameters\NameSpace_Catalog5","SUCCESS",""
"6:46:00.6873043 AM","connect.EXE","1964","FASTIO_NETWORK_QUERY_OPEN","C:\Windows\System32\kernel.appcore.dll","FAST IO DISALLOWED",""
"6:46:00.6874633 AM","connect.EXE","1964","IRP_MJ_CREATE","C:\Windows\System32\kernel.appcore.dll","SUCCESS","Desired Access: Read Attributes, Disposition: Open, Options: Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened"
"6:46:00.6874866 AM","connect.EXE","1964","FASTIO_QUERY_INFORMATION","C:\Windows\System32\kernel.appcore.dll","SUCCESS","Type: QueryBasicInformationFile, CreationTime: 

Re: partition routing layering in nodeModifyTable.c

2020-10-17 Thread Amit Langote
On Fri, Oct 16, 2020 at 11:45 PM Alvaro Herrera  wrote:
> On 2020-Oct-16, Amit Langote wrote:
> > On Thu, Oct 15, 2020 at 11:59 PM Heikki Linnakangas  wrote:
> > > On Wed, Oct 14, 2020 at 6:04 PM Heikki Linnakangas wrote:
>
> > > And if we removed
> > > ri_PartitionInfo->pi_PartitionToRootMap, and always used
> > > ri_ChildToRootMap for it.
> >
> > Done in the attached.
>
> Hmm...  Overall I like the simplification.

Thank you for looking it over.

> > > Maybe remove PartitionRoutingInfo struct altogether, and just move its
> > > fields directly to ResultRelInfo.
> >
> > If we do that, we'll end up with 3 notations for the same thing across
> > releases: In v10 and v11, PartitionRoutingInfos members are saved in
> > arrays in ModifyTableState, totally detached from the partition
> > ResultRelInfos.  In v12 (3f2393edef), we moved them into ResultRelInfo
> > but chose to add them into a sub-struct (PartitionRoutingInfo), which
> > in retrospect was not a great decision.  Now if we pull them into
> > ResultRelInfo, we'll have invented the 3rd notation.  Maybe that makes
> > things hard when back-patching bug-fixes?
>
> I don't necessarily agree that PartitionRoutingInfo was such a bad idea.
> In fact I wonder if we shouldn't move *more* stuff into it
> (ri_PartitionCheckExpr), and keep struct ResultRelInfo clean of
> partitioning-related stuff (other than ri_PartitionInfo and
> ri_PartitionRoot); there are plenty of ResultRelInfos that are not
> partitions, so I think it makes sense to keep the split.  I'm thinking
> that the ChildToRootMap should continue to be in PartitionRoutingInfo.

Hmm, I don't see ri_PartitionCheckExpr as being a piece of routing
information, because it's primarily meant to be used when inserting
*directly* into a partition, although it's true we do initialize it in
routing target partitions too in some cases.

Also, ChildToRootMap was introduced by the trigger transition table
project, not tuple routing.  I think we misjudged this when we added
PartitionToRootMap to PartitionRoutingInfo, because it doesn't really
belong there.  This patch fixes that by removing PartitionToRootMap.

RootToPartitionMap and the associated partition slot is the only piece
of extra information that is needed by tuple routing target relations.

> Maybe what we need in order to keep the initialization "lazy enough" is
> some inline functions that act as getters, initializing members of
> PartitionRoutingInfo when first needed.  (This would probably need
> boolean flags, to distinguish "hasn't been set up yet" from "it is not
> needed for this partition" for each member that requires it).

As I said in my previous email, I don't see how we can make
initializing the map any lazier than it already is.  If a partition
has a different tuple descriptor than the root table, then we know for
sure that any tuples that are routed to it will need to be converted
from the root tuple format to its tuple format, so we might as well
build the map when the ResultRelInfo is built.  If no tuple lands into
a partition, we would neither build its ResultRelInfo nor the map.
With the current arrangement, if the map field is NULL, it simply
means that the partition has the same tuple format as the root table.

> BTW it is curious that ExecInitRoutingInfo is called both in
> ExecInitPartitionInfo() (from ExecFindPartition when the ResultRelInfo
> for the partition is not found) *and* from ExecFindPartition again, when
> the ResultRelInfo for the partition *is* found.  Doesn't this mean that
> ri_PartitionInfo is set up twice for the same partition?

No.  ExecFindPartition() directly calls ExecInitRoutingInfo() only for
reused update result relations, that too, only the first time a tuple
lands into such a partition.  For the subsequent tuples that land into
the same partition, ExecFindPartition() will be able to find that
ResultRelInfo in the proute->partitions[] array.  All ResultRelInfos
in that array are assumed to have been processed by
ExecInitRoutingInfo().

--
Amit Langote
EDB: http://www.enterprisedb.com




warn_unused_results

2020-10-17 Thread Peter Eisentraut
Forgetting to assign the return value of list APIs such as lappend() is 
a perennial favorite.  The compiler can help point out such mistakes. 
GCC has an attribute warn_unused_result.  Also C++ has standardized 
this under the name "nodiscard", and C has a proposal to do the same 
[0].  In my patch I call the symbol pg_nodiscard, so that perhaps in a 
distant future one only has to do s/pg_nodiscard/nodiscard/ or something 
similar.  Also, the name is short enough that it doesn't mess up the 
formatting of function declarations too much.


I have added pg_nodiscard decorations to all the list functions where I 
found it sensible, as well as repalloc() for good measure, since 
realloc() is usually mentioned as an example where this function 
attribute is useful.


I have found two places in the existing code where this creates 
warnings.  Both places are correct as is, but make assumptions about the 
internals of the list APIs and it seemed better just to fix the warning 
than to write a treatise about why it's correct as is.



[0]: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2051.pdf

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 32a9bc5cd2c63500670b663964e878e4474ce257 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sat, 17 Oct 2020 08:38:39 +0200
Subject: [PATCH 1/3] Add pg_nodiscard function declaration specifier

pg_nodiscard means the compiler should warn if the result of a
function call is ignored.  The name "nodiscard" is chosen in alignment
with (possibly future) C and C++ standards.  It maps to the GCC
attribute warn_unused_result.
---
 src/include/c.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/include/c.h b/src/include/c.h
index 9cd67f8f76..d5dc3632f7 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -111,6 +111,18 @@
 #define pg_attribute_unused()
 #endif
 
+/*
+ * pg_nodiscard means the compiler should warn if the result of a function
+ * call is ignored.  The name "nodiscard" is chosen in alignment with
+ * (possibly future) C and C++ standards.  For maximum compatibility, use it
+ * as a function declaration specifier, so it goes before the return type.
+ */
+#ifdef __GNUC__
+#define pg_nodiscard __attribute__((warn_unused_result))
+#else
+#define pg_nodiscard
+#endif
+
 /*
  * Append PG_USED_FOR_ASSERTS_ONLY to definitions of variables that are only
  * used in assert-enabled builds, to avoid compiler warnings about unused
-- 
2.28.0

From ff90b03e24011f59c2a0fa7c3569c64e1189a028 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sat, 17 Oct 2020 08:38:39 +0200
Subject: [PATCH 2/3] Add pg_nodiscard decorations to some functions

Especially for the list API such as lappend() forgetting to assign the
return value is a common problem.
---
 src/include/nodes/pg_list.h | 62 ++---
 src/include/utils/palloc.h  |  4 +--
 2 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index ec231010ce..cda77a841e 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -521,36 +521,36 @@ extern List *list_make3_impl(NodeTag t, ListCell datum1, ListCell datum2,
 extern List *list_make4_impl(NodeTag t, ListCell datum1, ListCell datum2,
							 ListCell datum3, ListCell datum4);
 
-extern List *lappend(List *list, void *datum);
-extern List *lappend_int(List *list, int datum);
-extern List *lappend_oid(List *list, Oid datum);
+extern pg_nodiscard List *lappend(List *list, void *datum);
+extern pg_nodiscard List *lappend_int(List *list, int datum);
+extern pg_nodiscard List *lappend_oid(List *list, Oid datum);
 
-extern List *list_insert_nth(List *list, int pos, void *datum);
-extern List *list_insert_nth_int(List *list, int pos, int datum);
-extern List *list_insert_nth_oid(List *list, int pos, Oid datum);
+extern pg_nodiscard List *list_insert_nth(List *list, int pos, void *datum);
+extern pg_nodiscard List *list_insert_nth_int(List *list, int pos, int datum);
+extern pg_nodiscard List *list_insert_nth_oid(List *list, int pos, Oid datum);
 
-extern List *lcons(void *datum, List *list);
-extern List *lcons_int(int datum, List *list);
-extern List *lcons_oid(Oid datum, List *list);
+extern pg_nodiscard List *lcons(void *datum, List *list);
+extern pg_nodiscard List *lcons_int(int datum, List *list);
+extern pg_nodiscard List *lcons_oid(Oid datum, List *list);
 
-extern List *list_concat(List *list1, const List *list2);
-extern List *list_concat_copy(const List *list1, const List *list2);
+extern pg_nodiscard List *list_concat(List *list1, const List *list2);
+extern pg_nodiscard List *list_concat_copy(const List *list1, const List *list2);
 
-extern List *list_truncate(List *list, int new_size);
+extern pg_nodiscard List *list_truncate(List *list, int new_size);
 
 extern bool list_member(const List