Re: Online checksums verification in the backend

2020-03-11 Thread Julien Rouhaud
On Tue, Dec 10, 2019 at 11:12:34AM +0100, Julien Rouhaud wrote:
> On Tue, Dec 10, 2019 at 3:26 AM Michael Paquier  wrote:
> >
> > On Mon, Dec 09, 2019 at 07:02:43PM +0100, Julien Rouhaud wrote:
> > > On Mon, Dec 9, 2019 at 5:21 PM Robert Haas  wrote:
> > >> Some people might prefer notices, because you can get those while the
> > >> thing is still running, rather than a result set, which you will only
> > >> see when the query finishes. Other people might prefer an SRF, because
> > >> they want to have the data in structured form so that they can
> > >> postprocess it. Not sure what you mean by "more globally."
> > >
> > > I meant having the results available system-wide, not only to the
> > > caller.  I think that emitting a log/notice level should always be
> > > done on top on whatever other communication facility we're using.
> >
> > The problem with notices and logs is that they tend to be ignored.  Now I
> > don't see any problem either in adding something into the logs which
> > can be found later on for parsing, on top of an SRF returned by the
> > caller which includes all the corruption details, say with pgbadger
> > or your friendly neighborhood grep.  I think that any backend function
> > should also make sure to call pgstat_report_checksum_failure() to
> > make a failure visible at database level in the catalogs, so that it is
> > possible to use that as a cheap high-level warning.  The details of
> > the failures could always be dug from the logs or the result of the
> > function itself after finding out that something is wrong in
> > pg_stat_database.
>
> I agree that adding extra information in the logs and calling
> pgstat_report_checksum_failure is a must-do, and I changed that
> locally.  However, I doubt that the logs are the right place to find
> the details of corrupted blocks.  There's no guarantee that the file
> will be accessible to the DBA, nor that the content won't get
> truncated by the time it's needed.  I really think that corruption is
> important enough to justify a more specific location.


The cfbot reported a build failure, so here's a rebased v2 which also contains
the pgstat_report_checksum_failure() call and extra log info.
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index b8a3f46912..e266292b03 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1386,6 +1386,13 @@ LANGUAGE INTERNAL
 STRICT STABLE PARALLEL SAFE
 AS 'jsonb_path_query_first_tz';
 
+CREATE OR REPLACE FUNCTION pg_check_relation(
+  IN relation regclass DEFAULT NULL::regclass, IN fork text DEFAULT NULL::text,
+  OUT relid oid, OUT forknum integer, OUT failed_blocknum bigint,
+  OUT expected_checksum integer, OUT found_checksum integer)
+  RETURNS SETOF record VOLATILE LANGUAGE internal AS 'pg_check_relation'
+  PARALLEL RESTRICTED;
+
 --
 -- The default permissions for functions mean that anyone can execute them.
 -- A number of functions shouldn't be executable by just anyone, but rather
diff --git a/src/backend/storage/page/checksum.c 
b/src/backend/storage/page/checksum.c
index e010691c9f..ed1a0c9b30 100644
--- a/src/backend/storage/page/checksum.c
+++ b/src/backend/storage/page/checksum.c
@@ -15,8 +15,541 @@
 
 #include "storage/checksum.h"
 /*
- * The actual code is in storage/checksum_impl.h.  This is done so that
- * external programs can incorporate the checksum code by #include'ing
- * that file from the exported Postgres headers.  (Compare our CRC code.)
+ * The actual checksum computation code is in storage/checksum_impl.h.  This
+ * is done so that external programs can incorporate the checksum code by
+ * #include'ing that file from the exported Postgres headers.  (Compare our
+ * CRC code.)
  */
 #include "storage/checksum_impl.h"
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "catalog/pg_authid_d.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/buf_internals.h"
+#include "storage/checksum.h"
+#include "storage/lockdefs.h"
+#include "utils/acl.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+
+/*
+ * A zero checksum can never be computed; see pg_checksum_page().
+ */
+#define NoComputedChecksum 0
+
+/*
+ * The rest of this module provides a set of functions that can be used to
+ * safely check all checksums on a running cluster.
+ *
+ * Please note that those only perform standard buffered reads, and don't try
+ * to bypass or discard the operating system cache.  If you want to check the
+ * actual storage, you have to discard the operating system cache before
+ * running those functions.
+ *
+ * To avoid torn pages and possible false positives when reading data, while
+ * keeping overhead as low as possible, the following heuristics are used:
+ *
+ * - a shared LWLock is taken on the target buffer pool partition mapping, and
+ *   we de

Re: Some problems of recovery conflict wait events

2020-03-11 Thread Fujii Masao




On 2020/03/10 13:54, Masahiko Sawada wrote:

On Tue, 10 Mar 2020 at 00:57, Fujii Masao  wrote:




On 2020/03/09 14:10, Masahiko Sawada wrote:

On Mon, 9 Mar 2020 at 13:24, Fujii Masao  wrote:




On 2020/03/08 13:52, Masahiko Sawada wrote:

On Thu, 5 Mar 2020 at 20:16, Fujii Masao  wrote:




On 2020/03/05 16:58, Masahiko Sawada wrote:

On Wed, 4 Mar 2020 at 15:21, Fujii Masao  wrote:




On 2020/03/04 14:31, Masahiko Sawada wrote:

On Wed, 4 Mar 2020 at 13:48, Fujii Masao  wrote:




On 2020/03/04 13:27, Michael Paquier wrote:

On Wed, Mar 04, 2020 at 01:13:19PM +0900, Masahiko Sawada wrote:

Yeah, so 0001 patch sets existing wait events to recovery conflict
resolution. For instance, it sets (PG_WAIT_LOCK | LOCKTAG_TRANSACTION)
to the recovery conflict on a snapshot. 0003 patch improves these wait
events by adding the new type of wait event such as
WAIT_EVENT_RECOVERY_CONFLICT_SNAPSHOT. Therefore 0001 (and 0002) patch
is the fix for existing versions and 0003 patch is an improvement for
only PG13. Did you mean even 0001 patch doesn't fit for back-patching?


Yes, it looks like an improvement rather than a bug fix.



Okay, understand.


I got my eyes on this patch set.  The full patch set is in my opinion
just a set of improvements, and not bug fixes, so I would refrain from
back-patching.


I think that the issue (i.e., "waiting" is reported twice needlessly
in PS display) that 0002 patch tries to fix is a bug. So it should be
fixed even in the back branches.


So we need only two patches: one fixes the process title issue and the
other improves the wait events. I've attached the updated patches.


Thanks for updating the patches! I started reading 0001 patch.


Thank you for reviewing this patch.



-   /*
-    * Report via ps if we have been waiting for more than 500 msec
-    * (should that be configurable?)
-    */
-   if (update_process_title && new_status == NULL &&
-       TimestampDifferenceExceeds(waitStart, GetCurrentTimestamp(),
-                                  500))

The patch changes ResolveRecoveryConflictWithSnapshot() and
ResolveRecoveryConflictWithTablespace() so that they always add
"waiting" into the PS display, whether a wait is really necessary or not.
But isn't it better to display "waiting" in PS only when a wait is
necessary, as ResolveRecoveryConflictWithVirtualXIDs() originally
does in the code above?


You're right. Will fix it.



  ResolveRecoveryConflictWithDatabase(Oid dbid)
  {
+   char*new_status = NULL;
+
+   /* Report via ps we are waiting */
+   new_status = set_process_title_waiting();

In ResolveRecoveryConflictWithDatabase(), there seems to be no need to
display "waiting" in PS, because no wait occurs when a recovery conflict
with a database happens.


Isn't the startup process waiting for the other backends to terminate?


Yeah, you're right. I agree that "waiting" should be reported in this case.

Currently ResolveRecoveryConflictWithLock() and
ResolveRecoveryConflictWithBufferPin() don't call
ResolveRecoveryConflictWithVirtualXIDs() and don't report "waiting"
in the PS display. You changed them so that they report "waiting". I agree
with this change. But it is an improvement rather than
a bug fix, i.e., shouldn't we apply this change only in v13?



Did you mean ResolveRecoveryConflictWithDatabase and
ResolveRecoveryConflictWithBufferPin?


Yes! Sorry for my typo.


In the current code, as far as I have researched, there are two cases
where we don't add "waiting" and one case where we add "waiting" twice.

ResolveRecoveryConflictWithDatabase and
ResolveRecoveryConflictWithBufferPin don't update the ps title.
Although the path where GetCurrentTimestamp() >= ltime is false in
ResolveRecoveryConflictWithLock also doesn't update the ps title, it's
already updated in WaitOnLock. On the other hand, the path where
GetCurrentTimestamp() >= ltime is true in
ResolveRecoveryConflictWithLock updates the ps title, but it does so
incorrectly, as I reported.

I've split the patch into two patches: 0001 patch fixes the issue
about doubly updating ps title, 0002 patch makes the recovery conflict
resolution on database and buffer pin update the ps title.


Thanks for splitting the patches. I think that 0001 patch can be back-patched.

-   /*
-    * Report via ps if we have been waiting for more than 500 msec
-    * (should that be configurable?)
-    */
-   if (update_process_title && new_status == NULL &&
-       TimestampDifferenceExceeds(waitStart, GetCurrentTimestamp(),
-                                  500))

Originally, "waiting" is reported in PS if we've been waiting for more than
500 msec, as the code above does. But you got rid of that code

Re: Refactor compile-time assertion checks for C/C++

2020-03-11 Thread Michael Paquier
On Sat, Mar 07, 2020 at 04:44:48PM -0500, Tom Lane wrote:
> cfbot reports this doesn't work with MSVC.  Not sure why --- maybe
> it defines __cpp_static_assert differently than you're expecting?

I don't think that's the issue.  The CF bot uses MSVC 12.0, which
corresponds to Visual Studio 2013.  As __cpp_static_assert was
introduced in MSVC 2017, this error is visibly telling us that this
environment does not like the C++ fallback implementation, which is
actually what my previous version of the patch was using (I can
reproduce the error with my MSVC 2015 VM as well).  I think that this
points to an error in the patch: for the refactoring, the shared
fallback implementation for C and C++ should use the fallback
implementation for C that we currently have on HEAD.

With the updated patch attached, the error goes away for me.  Let's
see what Mr. Robot thinks.  The patch was marked as ready for
committer, I am switching it back to "Needs review".
--
Michael
From 562926d03172332457a328c5329cdac61cf7d99d Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 4 Feb 2020 17:09:59 +0900
Subject: [PATCH v2] Refactor assertion definitions in c.h

This unifies the C and C++ fallback implementations.
---
 src/include/c.h | 27 ++-
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/src/include/c.h b/src/include/c.h
index 831c89f473..9f318e16d0 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -840,39 +840,32 @@ extern void ExceptionalCondition(const char *conditionName,
  * about a negative width for a struct bit-field.  This will not include a
  * helpful error message, but it beats not getting an error at all.
  */
-#ifndef __cplusplus
-#ifdef HAVE__STATIC_ASSERT
+#if !defined(__cplusplus) && defined(HAVE__STATIC_ASSERT)
+/* Default C implementation */
 #define StaticAssertStmt(condition, errmessage) \
 	do { _Static_assert(condition, errmessage); } while(0)
 #define StaticAssertExpr(condition, errmessage) \
 	((void) ({ StaticAssertStmt(condition, errmessage); true; }))
 #define StaticAssertDecl(condition, errmessage) \
 	_Static_assert(condition, errmessage)
-#else			/* !HAVE__STATIC_ASSERT */
-#define StaticAssertStmt(condition, errmessage) \
-	((void) sizeof(struct { int static_assert_failure : (condition) ? 1 : -1; }))
-#define StaticAssertExpr(condition, errmessage) \
-	StaticAssertStmt(condition, errmessage)
-#define StaticAssertDecl(condition, errmessage) \
-	extern void static_assert_func(int static_assert_failure[(condition) ? 1 : -1])
-#endif			/* HAVE__STATIC_ASSERT */
-#else			/* C++ */
-#if defined(__cpp_static_assert) && __cpp_static_assert >= 200410
+#elif defined(__cplusplus) && \
+	defined(__cpp_static_assert) && __cpp_static_assert >= 200410
+/* Default C++ implementation */
 #define StaticAssertStmt(condition, errmessage) \
 	static_assert(condition, errmessage)
 #define StaticAssertExpr(condition, errmessage) \
 	({ static_assert(condition, errmessage); })
 #define StaticAssertDecl(condition, errmessage) \
 	static_assert(condition, errmessage)
-#else			/* !__cpp_static_assert */
+#else
+/* Fallback implementation for C and C++ */
 #define StaticAssertStmt(condition, errmessage) \
-	do { struct static_assert_struct { int static_assert_failure : (condition) ? 1 : -1; }; } while(0)
+	((void) sizeof(struct { int static_assert_failure : (condition) ? 1 : -1; }))
 #define StaticAssertExpr(condition, errmessage) \
-	((void) ({ StaticAssertStmt(condition, errmessage); }))
+	StaticAssertStmt(condition, errmessage)
 #define StaticAssertDecl(condition, errmessage) \
 	extern void static_assert_func(int static_assert_failure[(condition) ? 1 : -1])
-#endif			/* __cpp_static_assert */
-#endif			/* C++ */
+#endif
 
 
 /*
-- 
2.25.1





Re: [Proposal] Global temporary tables

2020-03-11 Thread Prabhat Sahu
On Mon, Mar 9, 2020 at 10:02 PM 曾文旌(义从)  wrote:

>
>
> Fixed in global_temporary_table_v18-pg13.patch.
>
Hi Wenjing,
Thanks for the patch. I have verified the previous issues with
"gtt_v18_pg13.patch", and those are resolved.
Please find the below case:

postgres=# create sequence seq;
CREATE SEQUENCE

postgres=# CREATE GLOBAL TEMPORARY TABLE gtt1(c1 int PRIMARY KEY) ON COMMIT
DELETE ROWS;
CREATE TABLE

postgres=# CREATE GLOBAL TEMPORARY TABLE gtt2(c1 int PRIMARY KEY) ON COMMIT
PRESERVE ROWS;
CREATE TABLE

postgres=# alter table gtt1 add c2 int default nextval('seq');
ERROR:  cannot reindex global temporary tables

postgres=# alter table gtt2 add c2 int default nextval('seq');
ERROR:  cannot reindex global temporary tables

*Note*: We are getting this error if we have a key column (PK/UNIQUE) in a
GTT and try to add a column with a sequence default to it.

-- 

With Regards,
Prabhat Kumar Sahu
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-11 Thread Amit Kapila
On Tue, Mar 10, 2020 at 8:53 AM Dilip Kumar  wrote:
>
> Please find the updated patch (summary of the changes)
> - Instead of searching the lock hash table for assert, it maintains a counter.
> - Also, handled the case where we can acquire the relation extension
> lock while holding the relation extension lock on the same relation.
> - Handled the error case.
>
> In addition to that, I prepared a WIP patch for handling the PageLock.
> First, I thought that we could use the same counter for the PageLock and
> the RelationExtensionLock, because in the assert we just need to check
> whether we are trying to acquire any other heavyweight lock while
> holding any of these locks.  But, the exceptional case where we are
> allowed to acquire a relation extension lock while holding any of
> these locks is a bit different.  Because, if we are holding a relation
> extension lock, then we are allowed to acquire the relation extension lock
> on the same relation, but it can not be any other relation, otherwise it
> can create a cycle.  But, the same is not true with the PageLock,
> i.e. while holding the PageLock you can acquire the relation extension
> lock on any relation, and that will be safe because the relation
> extension lock guarantees that it will never create a cycle.
> However, I agree that we don't have any such cases where we want to
> acquire a relation extension lock on a different relation while
> holding the PageLock.
>

Right, today we don't have such cases where, after acquiring a relation
extension or page lock for a particular relation, we need to acquire
any of those for another relation, and I am not able to think offhand of
many cases where we might have such a need in the future.  The one
theoretical possibility is to include fork_num in the lock tag while
acquiring the extension lock for fsm/vm, but that will also have the same
relation.  Similarly, one might say it is valid to acquire the extension
lock in share mode after we have acquired it in exclusive mode.  I am not
sure how futuristic we want to make these Asserts.

I feel we should cover the current possible cases (which I think will
make the asserts more strict than required) and if there is a need to
relax them in the future for any particular use case, then we will
consider those.  In general, if we consider the way Mahendra has
written a patch, which is to find the entry via the local hash table to
check for an Assert condition, then it will be a bit easier to extend
the checks if required in the future, as that way we have more information
about the particular lock. However, it will make the check more
expensive, which might be okay considering that it is only for
Assert-enabled builds.

One minor comment:
/*
+ * We should not acquire any other lock if we are already holding the
+ * relation extension lock.  Only exception is that if we are trying to
+ * acquire the relation extension lock then we can hold the relation
+ * extension on the same relation.
+ */
+ Assert(!IsRelExtLockHeld() ||
+((locktag->locktag_type == LOCKTAG_RELATION_EXTEND) && found));

I think you don't need the second part of the check, because if we had
found the lock in the local lock table, we would have returned before this
check.  I think it will catch the case where, if we have an extension
lock on one relation, it won't allow us to acquire it on another
relation. OTOH, it will also disallow cases where a backend has a
relation extension lock in Exclusive mode and tries to acquire it
in Shared mode. So, not sure if it is a good idea.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Improve search for missing parent downlinks in amcheck

2020-03-11 Thread Alexander Korotkov
On Wed, Mar 11, 2020 at 7:19 AM Peter Geoghegan  wrote:
> This looks committable. I only noticed one thing: The comments above
> bt_target_page_check() need to be updated to reflect the new check,
> which no longer has anything to do with "heapallindexed = true".

Thank you!  Pushed with this comment revised!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-11 Thread Amit Kapila
On Tue, Mar 10, 2020 at 6:39 PM Robert Haas  wrote:
>
> On Fri, Mar 6, 2020 at 11:27 PM Dilip Kumar  wrote:
> > I think instead of the flag we need to keep the counter because we can
> > acquire the same relation extension lock multiple times.  So
> > basically, every time we acquire the lock we can increment the counter
> > and while releasing we can decrement it.   During an error path, I
> > think it is fine to set it to 0 in CommitTransaction/AbortTransaction.
> > But, I am not sure that we can set to 0 or decrement it in
> > AbortSubTransaction because we are not sure whether we have acquired
> > the lock under this subtransaction or not.
>
> I think that CommitTransaction, AbortTransaction, and friends have
> *zero* business touching this. I think the counter - or flag - should
> track whether we've got a PROCLOCK entry for a relation extension
> lock. We either do, or we do not, and that does not change because of
> anything having to do with the transaction state. It changes because
> somebody calls LockRelease() or LockReleaseAll().
>

Do we want to have a special check in LockRelease() to identify
whether we are releasing a relation extension lock?  If not, then how
will we identify that the relation extension lock has been released so
that we can reset the counter during a subtransaction abort due to an
error?  During success paths, we
know when we have released the RelationExtension or Page lock (via
UnlockRelationForExtension or UnlockPage).  At the top-level
transaction end, we know when we have released all the locks, which
implies that RelationExtension and/or Page locks must have been
released by that time.

If we have no other choice, then I see a few downsides of adding a
special check in the LockRelease() call:

1. Instead of resetting/decrementing the variable from specific APIs like
UnlockRelationForExtension or UnlockPage, we need to do it in
LockRelease. It will also look odd to set the variable in
LockRelationForExtension but not reset it in the
UnlockRelationForExtension variant.  Now, maybe we can allow resetting
it in both places if it is a flag, but not if it is a counter
variable.

2. One can argue that adding extra instructions in a generic path
(like LockRelease) is not a good idea, especially if those are for an
Assert. I understand this won't add anything that we can measure with
standard benchmarks.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: WIP/PoC for parallel backup

2020-03-11 Thread Rajkumar Raghuwanshi
Hi Asif

I have started testing this feature. I have applied the v6 patch on commit
a069218163704c44a8996e7e98e765c56e2b9c8e (30 Jan).
I have a few observations; please take a look.

*--if the backup fails, the backup directory is not removed.*
[edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=9 -D /tmp/test_bkp/bkp6
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
[edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=8 -D /tmp/test_bkp/bkp6
pg_basebackup: error: directory "/tmp/test_bkp/bkp6" exists but is not empty


*--giving a large number of jobs leads to a segmentation fault.*
./pg_basebackup -p 5432 --jobs=1000 -D /tmp/t3
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
.
.
.
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: could not fork new
process for connection: Resource temporarily unavailable

could not fork new process for connection: Resource temporarily unavailable
pg_basebackup: error: failed to create thread: Resource temporarily
unavailable
Segmentation fault (core dumped)

--stack-trace
gdb -q -c core.11824 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -p 5432 --jobs=1000 -D
/tmp/test_bkp/bkp10'.
Program terminated with signal 11, Segmentation fault.
#0  pthread_join (threadid=140503120623360, thread_return=0x0) at
pthread_join.c:46
46  if (INVALID_NOT_TERMINATED_TD_P (pd))
Missing separate debuginfos, use: debuginfo-install
keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  pthread_join (threadid=140503120623360, thread_return=0x0) at
pthread_join.c:46
#1  0x00408e21 in cleanup_workers () at pg_basebackup.c:2840
#2  0x00403846 in disconnect_atexit () at pg_basebackup.c:316
#3  0x003921235a02 in __run_exit_handlers (status=1) at exit.c:78
#4  exit (status=1) at exit.c:100
#5  0x00408aa6 in create_parallel_workers (backupinfo=0x1a4b8c0) at
pg_basebackup.c:2713
#6  0x00407946 in BaseBackup () at pg_basebackup.c:2127
#7  0x0040895c in main (argc=6, argv=0x7ffd566f4718) at
pg_basebackup.c:2668


*--with a tablespace in the same directory as the data directory,
parallel backup crashes*
[edb@localhost bin]$ ./initdb -D /tmp/data
[edb@localhost bin]$ ./pg_ctl -D /tmp/data -l /tmp/logfile start
[edb@localhost bin]$ mkdir /tmp/ts
[edb@localhost bin]$ ./psql postgres
psql (13devel)
Type "help" for help.

postgres=# create tablespace ts location '/tmp/ts';
CREATE TABLESPACE
postgres=# create table tx (a int) tablespace ts;
CREATE TABLE
postgres=# \q
[edb@localhost bin]$ ./pg_basebackup -j 2 -D /tmp/tts -T /tmp/ts=/tmp/ts1
Segmentation fault (core dumped)

--stack-trace
[edb@localhost bin]$ gdb -q -c core.15778 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -j 2 -D /tmp/tts -T
/tmp/ts=/tmp/ts1'.
Program terminated with signal 11, Segmentation fault.
#0  0x00409442 in get_backup_filelist (conn=0x140cb20,
backupInfo=0x14210a0) at pg_basebackup.c:3000
3000 backupInfo->curr->next = file;
Missing separate debuginfos, use: debuginfo-install
keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  0x00409442 in get_backup_filelist (conn=0x140cb20,
backupInfo=0x14210a0) at pg_basebackup.c:3000
#1  0x00408b56 in parallel_backup_run (backupinfo=0x14210a0) at
pg_basebackup.c:2739
#2  0x00407955 in BaseBackup () at pg_basebackup.c:2128
#3  0x0040895c in main (argc=7, argv=0x7ffca2910c58) at
pg_basebackup.c:2668
(gdb)

Thanks & Regards,
Rajkumar Raghuwanshi


On Tue, Feb 25, 2020 at 7:49 PM Asif Rehman  wrote:

> Hi,
>
> I have created a commitfest entry.
> https://commitfest.postgresql.org/27/2472/
>
>
> On Mon, Feb 17, 2020 at 1:39 PM Asif Rehman 
> wrote:
>
>> Thanks Jeevan. Here is 

Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-11 Thread Dilip Kumar
On Wed, Mar 11, 2020 at 2:23 PM Amit Kapila  wrote:
>
> On Tue, Mar 10, 2020 at 8:53 AM Dilip Kumar  wrote:
> >
> > Please find the updated patch (summary of the changes)
> > - Instead of searching the lock hash table for assert, it maintains a 
> > counter.
> > - Also, handled the case where we can acquire the relation extension
> > lock while holding the relation extension lock on the same relation.
> > - Handled the error case.
> >
> > In addition to that, I prepared a WIP patch for handling the PageLock.
> > First, I thought that we could use the same counter for the PageLock and
> > the RelationExtensionLock, because in the assert we just need to check
> > whether we are trying to acquire any other heavyweight lock while
> > holding any of these locks.  But, the exceptional case where we are
> > allowed to acquire a relation extension lock while holding any of
> > these locks is a bit different.  Because, if we are holding a relation
> > extension lock, then we are allowed to acquire the relation extension lock
> > on the same relation, but it can not be any other relation, otherwise it
> > can create a cycle.  But, the same is not true with the PageLock,
> > i.e. while holding the PageLock you can acquire the relation extension
> > lock on any relation, and that will be safe because the relation
> > extension lock guarantees that it will never create a cycle.
> > However, I agree that we don't have any such cases where we want to
> > acquire a relation extension lock on a different relation while
> > holding the PageLock.
> >
>
> Right, today we don't have such cases where, after acquiring a relation
> extension or page lock for a particular relation, we need to acquire
> any of those for another relation, and I am not able to think offhand of
> many cases where we might have such a need in the future.  The one
> theoretical possibility is to include fork_num in the lock tag while
> acquiring the extension lock for fsm/vm, but that will also have the same
> relation.  Similarly, one might say it is valid to acquire the extension
> lock in share mode after we have acquired it in exclusive mode.  I am not
> sure how futuristic we want to make these Asserts.
>
> I feel we should cover the current possible cases (which I think will
> make the asserts more strict than required) and if there is a need to
> relax them in the future for any particular use case, then we will
> consider those.  In general, if we consider the way Mahendra has
> written a patch, which is to find the entry via the local hash table to
> check for an Assert condition, then it will be a bit easier to extend
> the checks if required in the future, as that way we have more information
> about the particular lock. However, it will make the check more
> expensive, which might be okay considering that it is only for
> Assert-enabled builds.
>
> One minor comment:
> /*
> + * We should not acquire any other lock if we are already holding the
> + * relation extension lock.  Only exception is that if we are trying to
> + * acquire the relation extension lock then we can hold the relation
> + * extension on the same relation.
> + */
> + Assert(!IsRelExtLockHeld() ||
> +((locktag->locktag_type == LOCKTAG_RELATION_EXTEND) && found));
>
> I think you don't need the second part of the check, because if we had
> found the lock in the local lock table, we would have returned before this
> check.

Right.

> I think it will catch the case where, if we have an extension
> lock on one relation, it won't allow us to acquire it on another
> relation.

But those will be caught even if we remove the second part, right?
Basically, if we have Assert(!IsRelExtLockHeld()), that means that by
this time you should not hold any relation extension lock.  The
exceptional case where we allow relation extension on the same relation
will not reach here anyway.  I think the second part of the Assert is
just useless.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




SERIAL datatype column skipping values.

2020-03-11 Thread Prabhat Sahu
Hi all,
Please check the below behavior for the "SERIAL" datatype.

postgres=# CREATE TABLE t1(c1 int, c2 serial);
CREATE TABLE
postgres=# insert into t1 values (generate_series(1,3));
INSERT 0 3
postgres=# insert into t1 values (generate_series(4,6));
INSERT 0 3
postgres=# select * from t1;
 c1 | c2
+
  1 |  1
  2 |  2
  3 |  3
  4 |  5
  5 |  6
  6 |  7
(6 rows)

In the above case, the serial column "c2" skips the value "4" in the
select output.
Is this expected behavior?

-- 

With Regards,
Prabhat Kumar Sahu
EnterpriseDB: http://www.enterprisedb.com


Re: [PATCH] Use PKG_CHECK_MODULES to detect the libxml2 library

2020-03-11 Thread Hugh McMaster
On Tue, 10 Mar 2020 at 22:41, Daniel Gustafsson wrote:
>
> > On 10 Mar 2020, at 11:53, Hugh McMaster  wrote:
> > This patch uses pkg-config's PKG_CHECK_MODULES macro to detect libxml2
> > or, if pkg-config is not available, falls back to xml2-config.
>
> This was previously discussed in 20200120204715.ga73...@msg.df7cb.de which
> ended without a real conclusion on what could/should be done (except that
> nothing *had* to be done).
>
> What is the situation on non-Debian/Ubuntu systems (BSDs, macOS, etc.)?  Is
> it worth adding pkg-config support if we still need a fallback to xml2-config?

To the best of my knowledge, FreeBSD, macOS, OpenSUSE, Solaris etc.
all support detection of libxml2 via pkg-config.

One way or another, xml2-config is going away, whether by decision of
a package maintainer or upstream.

freetype-config was deprecated upstream a few years ago. Upstream ICU
will also disable the installation of icu-config by default from April
this year. Some systems, such as Debian, have not shipped icu-config
for a year or so.

The PHP project last year switched to using pkg-config by default for
all libraries supplying a .pc file. PHP's build scripts do not fall
back to legacy scripts.

Another reason for switching is that xml2-config incorrectly outputs
static libraries when called with `--libs`. You have to call `--libs
--dynamic` to output -lxml2 only. Debian patches the script to avoid
this unusual behaviour.




Re: HAVE_WORKING_LINK still needed?

2020-03-11 Thread Peter Eisentraut

On 2020-03-04 17:37, Peter Eisentraut wrote:

Here is the third patch again, now renaming durable_link_or_rename() to
durable_rename_excl().  This seems to match existing Unix system call
naming best (see open() flag O_EXCL, and macOS has a renamex_np() flag
RENAME_EXCL).
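The O_EXCL-like semantics the new name evokes can be sketched with a link-then-unlink sequence. This is a minimal illustration only; the real durable_rename_excl() in fd.c additionally fsyncs the file and directory for durability:

```python
import os

def durable_rename_excl(src, dst):
    # Rename src to dst, but fail if dst already exists.  A hard link
    # fails with EEXIST when the target exists, which is what gives the
    # O_EXCL-like guarantee; the real function also fsyncs afterwards.
    os.link(src, dst)
    os.unlink(src)
```

Calling it a second time against an existing destination raises FileExistsError rather than silently clobbering the file.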


committed like that

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Use PKG_CHECK_MODULES to detect the libxml2 library

2020-03-11 Thread Hugh McMaster
On Wed, 11 Mar 2020 at 07:49, Daniel Gustafsson wrote:
> > On 10 Mar 2020, at 18:38, Peter Eisentraut wrote:
> > Btw., here is an older thread for the same issue 
> > .
> >   Might be worth reflecting on the issues discussed there.
>
> Thanks, didn't realize that the subject had been up for discussion earlier as
> well.

Interesting thread. The issue of precedence (e.g. pkg-config over
xml2-config) is still relevant, although arguably less so today, due
to the far greater availability of pkg-config. Some packages choose to
fall back to xml2-config, say, if they need to support old or
soon-to-be EOL systems lacking pkg-config. These situations are
increasingly rare.

The thread is correct on multi-arch header and library directories.
That said, pkg-config can handle this easily.

> For me, the duplication aspect is the most troubling, since we'd still need 
> the
> xml2-config fallback and thus won't be able to simplify the code.

configure.in shows that ICU only uses the PKG_CHECK_MODULES macro.
libxml2, libxslt and other dependencies could also switch.

Using AC_CHECK_LIB to add libraries (such as -lxml2) to $LIBS probably
isn't the ideal method (I'd recommend adding pkg-config's native
X_CFLAGS and X_LIBS variables as necessary to $LIBS, $CPPFLAGS
etc.), but that's a topic for another thread.




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-11 Thread Dilip Kumar
On Tue, Mar 10, 2020 at 4:11 PM Amit Kapila  wrote:
>
> On Mon, Feb 24, 2020 at 3:38 PM Amit Kapila  wrote:
> >
> > On Thu, Feb 20, 2020 at 8:06 AM Andres Freund  wrote:
> > > What I'm advocating is that extension locks should continue to go
> > > through lock.c. And yes, that requires some changes to group locking,
> > > but I still don't see why they'd be complicated.
> > >
> >
> > Fair position, as per initial analysis, I think if we do below three
> > things, it should work out without changing to a new way of locking
> > for relation extension or page type locks.
> > a. As per the discussion above, ensure in code we will never try to
> > acquire another heavy-weight lock after acquiring relation extension
> > or page type locks (probably by having Asserts in code or maybe some
> > other way).
>
> I have done an analysis of the relation extension lock (which can be
> acquired via LockRelationForExtension or
> ConditionalLockRelationForExtension) and found that we don't acquire
> any other heavyweight lock after acquiring it. However, we do
> sometimes try to acquire it again in the places where we update FSM
> after extension, see points (e) and (f) described below.  The usage of
> this lock can be broadly divided into six categories and each one is
> explained as follows:
>
> a. Where after taking the relation extension lock we call ReadBuffer
> (or its variant) and then LockBuffer.  The LockBuffer internally calls
> either LWLockAcquire or LWLockRelease, neither of which acquires another
> heavy-weight lock. It is quite obvious as well that while taking some
> lightweight lock, there is no reason to acquire another heavyweight
> lock on any object.  The specs/comments of ReadBufferExtended (which
> gets called from variants of ReadBuffer) API says that if the blknum
> requested is P_NEW, only one backend can call it at-a-time which
> indicates that we don't need to acquire any heavy-weight lock inside
> this API.  Otherwise, also, this API won't need a heavy-weight lock to
> read the existing block into shared buffer as two different backends
> are allowed to read the same block.  I have also gone through all the
> functions called/used in this path to ensure that we don't use
> heavy-weight locks inside it.
>
> The usage by APIs BloomNewBuffer, GinNewBuffer, gistNewBuffer,
> _bt_getbuf, and SpGistNewBuffer falls in this category.  Another API
> that falls under this category is revmap_physical_extend which uses
> ReadBuffer, LockBuffer and ReleaseBuffer. The ReleaseBuffer API
> unpins, i.e. decrements, the reference count for the buffer and
> disassociates it from the resource owner.  None of that requires a
> heavy-weight lock.
>
> b. After taking relation extension lock, we call
> RelationGetNumberOfBlocks which primarily calls file-level functions
> to determine the size of the file. This doesn't acquire any other
> heavy-weight lock after relation extension lock.
>
> The usage by APIs ginvacuumcleanup, gistvacuumscan, btvacuumscan, and
> spgvacuumscan falls in this category.
>
> c. There is a usage in API brin_page_cleanup() where we just acquire
> and release the relation extension lock to avoid reinitializing the
> page. As there is no call in-between acquire and release, so there is
> no chance of another heavy-weight lock acquire after having relation
> extension lock.
>
> d. In fsm_extend() and vm_extend(), after acquiring relation extension
> lock, we perform various file-level operations like RelationOpenSmgr,
> smgrexists, smgrcreate, smgrnblocks, smgrextend.  First, from theory,
> we don't have any heavy-weight lock other than relation extension lock
> which can cover such operations and then I have verified it by going
> through these APIs that these don't acquire any other heavy-weight
> lock.  These APIs also call PageSetChecksumInplace, which computes a
> checksum of the page and sets it in the page header; that is quite
> straightforward and doesn't acquire any heavy-weight lock.
>
> In vm_extend, we additionally call CacheInvalidateSmgr to send a
> shared-inval message to force other backends to close any smgr
> references they may have for the relation whose visibility map we are
> extending, which has no reason to acquire any heavy-weight lock.
> I have checked the code path as well and I didn't find any
> heavy-weight lock call in that.
>
> e. In brin_getinsertbuffer, we call ReadBuffer() and LockBuffer(), the
> usage of which is the same as what is mentioned in (a).  In addition
> to that it calls brin_initialize_empty_new_buffer() which further
> calls RecordPageWithFreeSpace which can again acquire relation
> extension lock for same relation.  This usage is safe because we have
> a mechanism in heavy-weight lock manager that if we already hold a
> lock and a request came for the same lock and in same mode, the lock
> will be granted.
>
> f. In RelationGetBufferForTuple(), there are multiple APIs that get
> called and like (e), it can try to reacquire the relation extension
> lock
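The lock-manager behavior that makes cases (e) and (f) safe — a request for a lock the requesting backend already holds in the same mode is simply granted again — can be illustrated with a toy model. The names and data structure here are hypothetical; the real logic lives in lock.c and is far richer:

```python
# Toy model of heavy-weight lock re-granting: a backend that already
# holds a lock on a target in the requested mode is granted it again
# immediately, so re-acquiring the relation extension lock on the same
# relation (as in brin_initialize_empty_new_buffer) cannot self-deadlock.
held = {}  # (backend, target) -> mode

def lock(backend, target, mode):
    key = (backend, target)
    if held.get(key) == mode:
        return "already held, granted"   # re-grant, no deadlock
    if any(t == target for (b, t) in held if b != backend):
        return "would block"             # another backend holds the target
    held[key] = mode
    return "granted"

print(lock(1, "rel-extension", "exclusive"))  # granted
print(lock(1, "rel-extension", "exclusive"))  # already held, granted
print(lock(2, "rel-extension", "exclusive"))  # would block
```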

Re: Index Skip Scan

2020-03-11 Thread Andy Fan
On Tue, Mar 10, 2020 at 4:32 AM James Coleman  wrote:

> On Mon, Mar 9, 2020 at 3:56 PM Dmitry Dolgov <9erthali...@gmail.com>
> wrote:
> >
> > Assuming we'll implement it in a way that we do not know about what kind
> > of path type is that in create_distinct_path, then it can also work for
> > ProjectionPath or anything else (if UniqueKeys are present). But then
> > still EquivalenceMember are used only to figure out correct
> > distinctPrefixKeys and do not affect whether or not skipping is applied.
> > What do I miss?
>
>
> Part of the puzzle seems to me to this part of the response:
>
> > I think the UniqueKeys may need to be changed from using
> > EquivalenceClasses to use Exprs instead.
>
> But I can't say I'm being overly helpful by pointing that out, since I
> don't have my head in the code enough to understand how you'd
> accomplish that :)
>
>
There was a dedicated thread [1] where David explained his idea in
detail, and you can also check the messages around that one for
context.  Hope it helps.

[1]
https://www.postgresql.org/message-id/CAApHDvq7i0%3DO97r4Y1pv68%2BtprVczKsXRsV28rM9H-rVPOfeNQ%40mail.gmail.com


Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

2020-03-11 Thread Justin Pryzby
On Sun, Mar 08, 2020 at 03:40:09PM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > On Sun, Mar 08, 2020 at 02:37:49PM -0400, Tom Lane wrote:
> >> I guess we ought to change that function to use returns-a-tuplestore
> >> protocol instead of thinking it can hold a directory open across calls.
> > Thanks for the analysis.
> > Do you mean it should enumerate all files during the initial SRF call, or 
> > use
> > something other than the SRF_* macros ?
> It has to enumerate all the files during the first call.  I suppose it

> I've just finished scanning the source code and concluding that all
> of these functions are similarly broken:
> pg_ls_dir_files

I patched this one to see what it looks like and to allow /hopefully/ moving
forward one way or another with the pg_ls_tmpfile() patch set (or at least
avoid trying to do anything there which is too inconsistent with this fix).
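The leak mode being fixed can be mimicked in miniature: a lazily-evaluated set-returning function that opens a resource per call leaks it when the caller stops early (e.g. under LIMIT 1), while materializing everything up front, tuplestore-style, always releases it. This is an illustrative sketch in Python, not the C funcapi:

```python
import os
import tempfile

opened = []

class Dir:
    # Stand-in for an AllocateDir'd directory handle.
    def __init__(self, path):
        self.path = path
        self.closed = False
        opened.append(self)
    def entries(self):
        return sorted(os.listdir(self.path))
    def close(self):
        self.closed = True

def ls_lazy(path):
    d = Dir(path)               # held open across per-row calls
    for name in d.entries():
        yield name
    d.close()                   # only reached if fully consumed

def ls_materialized(path):
    d = Dir(path)
    try:
        return list(d.entries())  # build the whole result set now
    finally:
        d.close()                 # always closed, even under LIMIT

tmp = tempfile.mkdtemp()
for n in ("a", "b", "c"):
    open(os.path.join(tmp, n), "w").close()

first = next(ls_lazy(tmp))        # like LIMIT 1: generator abandoned
rows = ls_materialized(tmp)[:1]   # same result, nothing left open

print(first, rows, [d.closed for d in opened])  # a ['a'] [False, True]
```

The abandoned generator never runs its close(), which is exactly the "DIR not closed at end-of-transaction" symptom; the materialized version cannot leak regardless of how many rows the caller consumes.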

> I don't see anything in the documentation (either funcapi.h or
> xfunc.sgml) warning that the function might not get run to completion,
> either ...

Also, at first glance, these seem to be passing constant "randomAccess=true"
rather than (bool) (rsinfo->allowedModes&SFRM_Materialize_Random)

$ git grep -wl SFRM_Materialize |xargs grep -l 'tuplestore_begin_heap(true'
contrib/dblink/dblink.c
contrib/pageinspect/brinfuncs.c
contrib/pg_stat_statements/pg_stat_statements.c
src/backend/access/transam/xlogfuncs.c
src/backend/commands/event_trigger.c
src/backend/commands/extension.c
src/backend/foreign/foreign.c
src/backend/replication/logical/launcher.c
src/backend/replication/logical/logicalfuncs.c
src/backend/replication/logical/origin.c
src/backend/replication/slotfuncs.c
src/backend/replication/walsender.c
src/backend/storage/ipc/shmem.c
src/backend/utils/adt/pgstatfuncs.c
src/backend/utils/misc/guc.c
src/backend/utils/misc/pg_config.c

-- 
Justin
From 1fd2918d65ed8cc64158e407e56ed61b44e951db Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 10 Mar 2020 21:01:47 -0500
Subject: [PATCH v1] pg_ls_dir_files: avoid leaking DIR if not run to
 completion

Change to return a tuplestore rather than leaving directory open across calls.

Discussion: https://www.postgresql.org/message-id/20200308173103.GC1357%40telsasoft.com
See also: 9cb7db3f0
---
 src/backend/utils/adt/genfile.c | 90 +++--
 1 file changed, 42 insertions(+), 48 deletions(-)

diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index 5bda2af87c..046d218ffa 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -527,67 +527,63 @@ pg_ls_dir_1arg(PG_FUNCTION_ARGS)
 static Datum
 pg_ls_dir_files(FunctionCallInfo fcinfo, const char *dir, bool missing_ok)
 {
-	FuncCallContext *funcctx;
 	struct dirent *de;
-	directory_fctx *fctx;
-
-	if (SRF_IS_FIRSTCALL())
-	{
-		MemoryContext oldcontext;
-		TupleDesc	tupdesc;
+	MemoryContext oldcontext;
+	TupleDesc	tupdesc;
+	ReturnSetInfo	*rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	bool		randomAccess = (rsinfo->allowedModes&SFRM_Materialize_Random) != 0;
+	DIR			*dirdesc;
+	Tuplestorestate *tupstore;
 
-		funcctx = SRF_FIRSTCALL_INIT();
-		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot accept a set")));
+	/* check to see if caller supports us returning a tuplestore */
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+(errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not "
+	 "allowed in this context")));
 
-		fctx = palloc(sizeof(directory_fctx));
+	/* The Tuplestore and TupleDesc should be created in ecxt_per_query_memory */
+	oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
 
-		tupdesc = CreateTemplateTupleDesc(3);
-		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "name",
-		   TEXTOID, -1, 0);
-		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "size",
-		   INT8OID, -1, 0);
-		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "modification",
-		   TIMESTAMPTZOID, -1, 0);
-		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
 
-		fctx->location = pstrdup(dir);
-		fctx->dirdesc = AllocateDir(fctx->location);
+	tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
 
-		if (!fctx->dirdesc)
-		{
-			if (missing_ok && errno == ENOENT)
-			{
-MemoryContextSwitchTo(oldcontext);
-SRF_RETURN_DONE(funcctx);
-			}
-			else
-ereport(ERROR,
-		(errcode_for_file_access(),
-		 errmsg("could not open directory \"%s\": %m",
-fctx->location)));
-		}
+	MemoryContextSwitchTo(oldcontext);
 
-		funcctx->user_fctx = fctx;
-		Memor

Re: [PATCH] Add schema and table names to partition error

2020-03-11 Thread Amit Kapila
On Tue, Mar 3, 2020 at 10:05 AM Chris Bandy  wrote:
>
> On 3/1/20 10:09 PM, Amit Langote wrote:
> > Hi Chris,
> >
> > On Mon, Mar 2, 2020 at 8:51 AM Chris Bandy  wrote:
> >> On 3/1/20 5:14 AM, Amit Kapila wrote:
> >>> On Sun, Mar 1, 2020 at 10:10 AM Amit Langote  
> >>> wrote:
> 
>  There are couple more instances in src/backend/command/tablecmds.c
>  where partition constraint is checked:
> 
>  Maybe, better fix these too for completeness.
> >>>
> >>> Another thing we might need to see is which of these can be
> >>> back-patched.  We should also try to write the tests for cases we are
> >>> changing even if we don't want to commit those.
> >>
> >> I don't have any opinion on back-patching. Existing tests pass. I wasn't
> >> able to find another test that checks the constraint field of errors.
> >> There's a little bit in the tests for psql, but that is about the
> >> \errverbose functionality rather than specific errors and their fields.
> >
> > Actually, it's not a bad idea to use \errverbose to test this.
> >
>
> I've added a second patch with tests that cover three of the five errors
> touched by the first patch. Rather than \errverbose, I simply \set
> VERBOSITY verbose. I could not find a way to exclude the location field
> from the output, so those lines will be likely be out of date soon--if
> not already.
>
> I couldn't find a way to exercise the errors in tablecmds.c. Does anyone
> know how to instigate a table rewrite that would violate partition
> constraints? I tried:
>

When I tried to apply your patch on HEAD with patch -p1 <
, I got the errors below

(Stripping trailing CRs from patch; use --binary to disable.)
can't find file to patch at input line 17
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
..

I have tried with git am as well, but it failed.  I am not sure what
the reason is.  Can you please check once at your end? Also, see if
it applies down to 9.5, as I think we should backpatch this.

IIUC, this patch is mainly to get the table name and schema name in
the error paths, so that your application can handle errors in case of
a partition constraint violation, right?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Refactor compile-time assertion checks for C/C++

2020-03-11 Thread Georgios Kokolatos
Thank you for updating the status of the issue.

I have to admit that I completely missed the CF bot. Lesson learned.

For whatever is worth, my previous comment that the patch improves
readability also applies to the updated version of the patch.

The CF bot seems happy now, which means that your assessment of
the error and the fix is correct.

So :+1: from me.

Re: Add A Glossary

2020-03-11 Thread Roger Harkavy
Hello, everyone, I'm Roger, the tech writer who worked with Corey on the
glossary file. I just thought I'd announce that I am also on the list, and
I'm looking forward to any questions or comments people may have. Thanks!

On Tue, Mar 10, 2020 at 11:37 AM Corey Huinker 
wrote:

> This latest version is an attempt at merging the work of Jürgen Purtz into
> what I had posted earlier. There was relatively little overlap in the terms
> we had chosen to define.
>
> Each glossary definition now has a reference id (good idea Jürgen), the
> form of which is "glossary-term". So we can link to the glossary from
> outside if we so choose.
>
> I encourage everyone to read the definitions, and suggest fixes to any
> inaccuracies or awkward phrasings. Mostly, though, I'm seeking feedback on
> the structure itself, and hoping to get that committed.
>
>
> On Tue, Feb 11, 2020 at 11:22 PM Corey Huinker 
> wrote:
>
>> It seems like this could be a good idea, still the patch has been
>>> waiting on his author for more than two weeks now, so I have marked it
>>> as returned with feedback.
>>>
>>
>> In light of feedback, I enlisted the help of an actual technical writer
>> (Roger Harkavy, CCed) and we eventually found the time to take a second
>> pass at this.
>>
>> Attached is a revised patch.
>>
>>
>


Re: SERIAL datatype column skipping values.

2020-03-11 Thread Andreas Karlsson

On 3/11/20 11:15 AM, Prabhat Sahu wrote:

Hi all,
Please check the below behavior for the "SERIAL" datatype.

[...]

In this above case, the serial column "c2" is skipping the value "4" in 
select output.

Is this an expected behavior?


Curious, it seems like DEFAULT expressions of a table are executed an 
extra time if a set returning function is used like in your example. And 
the SERIAL type is implemented using DEFAULT.


On the other hand if you use "INSERT ... SELECT" the DEFAULT expression 
is only executed once per row inserted.


# CREATE FUNCTION test_default() RETURNS int LANGUAGE plpgsql AS $$
BEGIN
RAISE NOTICE 'Ran test_default()';
RETURN 42;
END
$$;
CREATE FUNCTION

# CREATE TABLE t2 (c1 int, c2 int DEFAULT test_default());
CREATE TABLE

# INSERT INTO t2 VALUES (generate_series(1,2));
NOTICE:  Ran test_default()
NOTICE:  Ran test_default()
NOTICE:  Ran test_default()
INSERT 0 2

# INSERT INTO t2 SELECT generate_series(1,2);
NOTICE:  Ran test_default()
NOTICE:  Ran test_default()
INSERT 0 2

Andreas




Re: more ALTER .. DEPENDS ON EXTENSION fixes

2020-03-11 Thread Alvaro Herrera
On 2020-Mar-05, Alvaro Herrera wrote:

> On 2020-Mar-05, Ibrar Ahmed wrote:
> 
> > Is this intentional that there is no error when removing a non-existing
> > dependency?
> 
> Hmm, I think we can do nothing silently if nothing is called for.
> So, yes, that seems to be the way it should work.

I pushed 0002 to all branches (9.6+), after modifying it to silently do
nothing instead of throwing an error when the dependency exists -- same
we discussed here, for the other form of the command.
I just noticed that I failed to credit Ahsan Hadi as reviewer for this
patch :-(

Thanks for reviewing.  I'll see about 0001 next, also backpatched to
9.6.

I'm still not sure whether to apply 0003 (+ your tab-completion patch,
thanks for it) to backbranches or just to master.  It seems legitimate
to see it as a feature addition, but OTOH the overall feature is not
complete without it ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Additional improvements to extended statistics

2020-03-11 Thread Dean Rasheed
On Mon, 9 Mar 2020 at 18:19, Tomas Vondra  wrote:>
> On Mon, Mar 09, 2020 at 08:35:48AM +, Dean Rasheed wrote:
> >
> >  P(a,b) = P(a) * [f + (1-f)*P(b)]
> >
> >because it might return a value that is larger that P(b), which
> >obviously should not be possible.
>
> Hmmm, yeah. It took me a while to reproduce this - the trick is in
> "inverting" the dependency on a subset of data, e.g. like this:
>
>create table t (a int, b int);
>insert into t select mod(i,50), mod(i,25)
>  from generate_series(1,5000) s(i);
>update t set a = 0 where a < 3;
>create statistics s (dependencies) on a,b from t;
>
> which then does this:
>
>test=# explain select * from t where a = 0;
> QUERY PLAN
>-----------------------------------------------------
> Seq Scan on t  (cost=0.00..86.50 rows=300 width=8)
>   Filter: (a = 0)
>(2 rows)
>
>test=# explain select * from t where b = 0;
> QUERY PLAN
>-----------------------------------------------------
> Seq Scan on t  (cost=0.00..86.50 rows=200 width=8)
>   Filter: (b = 0)
>(2 rows)
>
>test=# explain select * from t where a = 0 and b = 0;
> QUERY PLAN
>-----------------------------------------------------
> Seq Scan on t  (cost=0.00..99.00 rows=283 width=8)
>   Filter: ((a = 0) AND (b = 0))
>(2 rows)
>
> Which I think is the issue you've described ...
>

I think this is also related to the problem that functional dependency
stats don't take into account the fact that the user clauses may not
be compatible with one another. For example:

CREATE TABLE t (a int, b int);
INSERT INTO t
  SELECT x/10,x/10 FROM (SELECT generate_series(1,x)
 FROM generate_series(1,100) g(x)) AS t(x);
CREATE STATISTICS s (dependencies) ON a,b FROM t;
ANALYSE t;

EXPLAIN SELECT * FROM t WHERE a = 10;

QUERY PLAN
----------------------------------------------------
 Seq Scan on t  (cost=0.00..86.12 rows=1 width=8)
   Filter: (a = 10)
(2 rows)

EXPLAIN SELECT * FROM t WHERE b = 1;

 QUERY PLAN
----------------------------------------------------
 Seq Scan on t  (cost=0.00..86.12 rows=865 width=8)
   Filter: (b = 1)
(2 rows)

EXPLAIN SELECT * FROM t WHERE a = 10 AND b = 1;

 QUERY PLAN
----------------------------------------------------
 Seq Scan on t  (cost=0.00..98.75 rows=865 width=8)
   Filter: ((a = 10) AND (b = 1))
(2 rows)

whereas without stats it would estimate 1 row. That kind of
over-estimate could get very bad, so it would be good to find a way to
fix it.


> >We should amend that formula to prevent a result larger than P(b). The
> >obvious way to do that would be to use:
> >
> >  P(a,b) = Min(P(a) * [f + (1-f)*P(b)], P(b))
> >
> >but actually I think it would be better and more principled to use:
> >
> >  P(a,b) = f*Min(P(a),P(b)) + (1-f)*P(a)*P(b)
> >
> >I.e., for those rows believed to be functionally dependent, we use the
> >minimum probability, and for the rows believed to be independent, we
> >use the product.
> >
>
> Hmmm, yeah. The trouble is that we currently don't really have both
> selectivities in dependencies_clauselist_selectivity :-(
>

I hacked on this a bit, and I think it's possible to apply dependency
stats in a more general way (not necessarily assuming equality
clauses), something like the attached very rough patch.

This approach guarantees that the result of combining a pair of
selectivities with a functional dependency between them gives a
combined selectivity that is never greater than either individual
selectivity.
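As a numerical sanity check on that guarantee, the combination formula proposed upthread can be modelled directly. This is a sketch of the formula only, not the patch itself:

```python
def combine_dependent(p_a, p_b, f):
    # P(a,b) = f * Min(P(a), P(b)) + (1 - f) * P(a) * P(b)
    # where f in [0, 1] is the degree of the functional dependency a => b.
    return f * min(p_a, p_b) + (1 - f) * p_a * p_b

# f = 1 (fully dependent) collapses to the minimum; f = 0 to independence.
print(combine_dependent(0.3, 0.5, 1.0))  # 0.3
print(combine_dependent(0.3, 0.5, 0.0))  # 0.15

# The result never exceeds either input selectivity:
for f in (0.0, 0.25, 0.5, 1.0):
    assert combine_dependent(0.3, 0.5, f) <= 0.3
```

The bound holds because P(a)*P(b) <= Min(P(a), P(b)) whenever both are in [0, 1], so the weighted sum can never rise above the minimum.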

One regression test fails, but looking at it, that's to be expected --
the test alters the type of a column, causing its univariate stats to
be dropped, so the single-column estimate is reduced, and the new code
refuses to give a higher estimate than the single clause's new
estimate.


> It's also not clear to me how would this work for more than two clauses,
> that are all functionally dependent. Like (a => b => c), for example.
> But I haven't thought about this very much yet.
>

I attempted to solve that by computing a chain of conditional
probabilities. The maths needs checking over (as I said, this is a
very rough patch). In particular, I think it's wrong for cases like (
a->b, a->c ), but I think it's along the right lines.
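One possible way the chain folding could work — emphatically a sketch, with the same "maths needs checking" caveat, and not claiming to match the rough patch — is to apply the pairwise formula left to right:

```python
def combine_dependent(p_prev, p_next, f):
    # Pairwise rule: f * Min(P1, P2) + (1 - f) * P1 * P2.
    return f * min(p_prev, p_next) + (1 - f) * p_prev * p_next

def combine_chain(sels, degrees):
    # Fold a dependency chain a => b => c pairwise, left to right;
    # degrees[i] is the dependency degree between step i and step i+1.
    combined = sels[0]
    for p, f in zip(sels[1:], degrees):
        combined = combine_dependent(combined, p, f)
    return combined

# With full dependencies the chain collapses to the minimum selectivity:
print(combine_chain([0.3, 0.5, 0.2], [1.0, 1.0]))  # 0.2
```

Since each pairwise step is bounded by the minimum of its inputs, the folded result is bounded by the minimum selectivity in the whole chain — though, as noted above, this says nothing about branching cases like (a => b, a => c).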


> >I think that would solve the problem with the example you gave at the
> >end of [2], but I'm not sure if it helps with the general case.
> >
>
> I don't follow. I think the issue with [2] is that we can't really apply
> stats about the array values to queries on individual array elements.
> Can you explain how would the proposed changes deal with this?
>

With this patch, the original estimate of ~900 rows in that example is
restored with functional dependencies, because of the way it utilises
the minimum selectivity of the 2 clauses.

I've not fully thought this through, but I think it might allow
functiona

Re: more ALTER .. DEPENDS ON EXTENSION fixes

2020-03-11 Thread Tom Lane
Alvaro Herrera  writes:
> I'm still not sure whether to apply 0003 (+ your tab-completion patch,
> thanks for it) to backbranches or just to master.  It seems legitimate
> to see it as a feature addition, but OTOH the overall feature is not
> complete without it ...

0003 is the command addition to allow removing such a dependency,
right?  Given the lack of field demand I see no reason to risk
adding it to the back branches.

BTW, I did not like the syntax too much.  "NO DEPENDS ON EXTENSION"
doesn't seem like good English.  "NOT DEPENDS ON EXTENSION" is hardly
any better.  The real problem with both is that an ALTER action should
be, well, an action.  A grammar stickler would say that it should be
"ALTER thing DROP DEPENDENCY ON EXTENSION ext", but perhaps we could
get away with "ALTER thing DROP DEPENDS ON EXTENSION ext" to avoid
adding a new keyword.  By that logic the original command should have
been "ALTER thing ADD DEPENDS ON EXTENSION ext", but I suppose it's
too late for that.

regards, tom lane




Re: SERIAL datatype column skipping values.

2020-03-11 Thread Tom Lane
Andreas Karlsson  writes:
> On 3/11/20 11:15 AM, Prabhat Sahu wrote:
>> Is this an expected behavior?

> Curious, it seems like DEFAULT expressions of a table are executed an 
> extra time if a set returning function is used like in your example. And 
> the SERIAL type is implemented using DEFAULT.

Yeah, it's the same as if you do

regression=# select generate_series(1,2), test_default();
NOTICE:  Ran test_default()
NOTICE:  Ran test_default()
NOTICE:  Ran test_default()
 generate_series | test_default 
-----------------+--------------
               1 |           42
               2 |           42
(2 rows)

The generated plan is

regression=# explain verbose select generate_series(1,2), test_default();
                  QUERY PLAN                   
-----------------------------------------------
 ProjectSet  (cost=0.00..0.28 rows=2 width=8)
   Output: generate_series(1, 2), test_default()
   ->  Result  (cost=0.00..0.01 rows=1 width=0)
(3 rows)

and if you read nodeProjectSet.c you'll see that it needs to evaluate
the target list three times.  On the third iteration, generate_series()
returns isdone == ExprEndResult indicating that it has no more results,
so we don't emit an output tuple --- but we still run test_default()
while scanning the tlist.
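That evaluation pattern can be mimicked outside the backend. A toy model of ProjectSet's loop (illustrative only, not the executor code) shows the volatile sibling expression firing on the final, tuple-less pass as well:

```python
calls = 0

def test_default():
    # Volatile function, like the plpgsql test_default() above.
    global calls
    calls += 1
    return 42

def generate_series(start, stop):
    for i in range(start, stop + 1):
        yield i

# Simulate ProjectSet: evaluate the whole target list once per pass
# until the SRF reports end-of-result; the volatile sibling runs on
# every pass, including the last one that emits no tuple.
srf = generate_series(1, 2)
rows = []
while True:
    d = test_default()           # evaluated on every pass
    try:
        v = next(srf)
    except StopIteration:        # third pass: SRF exhausted, no tuple
        break
    rows.append((v, d))

# rows == [(1, 42), (2, 42)], but calls == 3
```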

Possibly the planner should try to avoid putting volatile expressions
into ProjectSet's tlist.  On the other hand, it's worked this way for
an awfully long time, so I wonder if anyone is relying on the behavior.
Even in versions before we used ProjectSet nodes, you still see three
calls to the volatile function.

Anyway, to get back to the OP's implied question, no you should never
assume that a SERIAL column's values won't have holes in the sequence.
Rolled-back transactions will have that effect in any case.
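The hole-creating effect of rollbacks is easy to model, since sequence advances are never undone. This is an illustration of the semantics, not the real nextval() machinery:

```python
class Sequence:
    # Sequence advances persist even when the calling transaction
    # aborts, mirroring nextval() behavior.
    def __init__(self):
        self.value = 0
    def nextval(self):
        self.value += 1
        return self.value

seq, table = Sequence(), []

def insert_row(abort=False):
    v = seq.nextval()       # value consumed even if we abort
    if not abort:
        table.append(v)     # committed row keeps its value

insert_row()
insert_row(abort=True)      # rolled back: value 2 is gone for good
insert_row()
print(table)                # [1, 3] -- a permanent hole where 2 was
```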

regards, tom lane




Re: [PATCH] Add schema and table names to partition error

2020-03-11 Thread Chris Bandy
Amit,

On 3/11/20 6:29 AM, Amit Kapila wrote:
> On Tue, Mar 3, 2020 at 10:05 AM Chris Bandy  wrote:
>>
>> On 3/1/20 10:09 PM, Amit Langote wrote:
>>> Hi Chris,
>>>
>>> On Mon, Mar 2, 2020 at 8:51 AM Chris Bandy  wrote:
 On 3/1/20 5:14 AM, Amit Kapila wrote:
> On Sun, Mar 1, 2020 at 10:10 AM Amit Langote  
> wrote:
>>
>> There are couple more instances in src/backend/command/tablecmds.c
>> where partition constraint is checked:
>>
>> Maybe, better fix these too for completeness.
>
> Another thing we might need to see is which of these can be
> back-patched.  We should also try to write the tests for cases we are
> changing even if we don't want to commit those.

 I don't have any opinion on back-patching. Existing tests pass. I wasn't
 able to find another test that checks the constraint field of errors.
 There's a little bit in the tests for psql, but that is about the
 \errverbose functionality rather than specific errors and their fields.
>>>
>>> Actually, it's not a bad idea to use \errverbose to test this.
>>>
>>
>> I've added a second patch with tests that cover three of the five errors
>> touched by the first patch. Rather than \errverbose, I simply \set
>> VERBOSITY verbose. I could not find a way to exclude the location field
>> from the output, so those lines will be likely be out of date soon--if
>> not already.
>>
>> I couldn't find a way to exercise the errors in tablecmds.c. Does anyone
>> know how to instigate a table rewrite that would violate partition
>> constraints? I tried:
>>
> 
> When I tried to apply your patch on HEAD with patch -p1 <
> , I got the errors below
> 
> (Stripping trailing CRs from patch; use --binary to disable.)
> can't find file to patch at input line 17
> Perhaps you used the wrong -p or --strip option?
> The text leading up to this was:
> ..
> 
> I have tried with git am as well, but it failed.  I am not sure what
> the reason is.  Can you please check once at your end?

Yes, sorry. This set (and v3 and v4) should work with -p0. Any following
patches from me will use the normal -p1.

> Also, see, if
> it applies till 9.5 as I think we should backpatch this.
> 
> IIUC, this patch is mainly to get the table name and schema name in
> the error paths, so that your application can handle errors in case of
> a partition constraint violation, right?

Yes, that is correct. Which also means it doesn't apply to 9.5 (no
partitions!) Later in this thread I created a test that covers all
integrity violation errors.[1] *That* can be backpatched, if you'd like.

For an approach limited to partitions only, I recommend looking at v4
rather than v2 or v3.[2]

[1]: 
https://postgresql.org/message-id/0731def8-978e-0285-04ee-582762729b38%40gmail.com
[2]: 
https://postgresql.org/message-id/7985cf2f-5082-22d9-1bb4-6b280150eeae%40gmail.com

Thanks,
Chris




Re: BEFORE ROW triggers for partitioned tables

2020-03-11 Thread Ashutosh Bapat
On Thu, Feb 27, 2020 at 10:22 PM Alvaro Herrera
 wrote:
>
> * The "root" is not necessarily the root partitioned table, but instead
> it's the table that was named in the command.  Because of this, we don't
> need to acquire locks on the tables, since the executor already has the
> tables open and locked (thus they cannot be modified by concurrent
> commands).

I believe this is because the partition-level constraints on the
table that was named in the command would catch any violation from a
partition key change at the levels above that table.

Would it be easier to subject the new tuple to the partition-level
constraints themselves and report if those are violated?  See
RelationGetPartitionQual() for getting partition constraints. This
function includes partition constraints from all the levels so in your
function you don't have to walk up the partition tree. It includes
constraints from the level above the table that was named in the
command, but that might be fine. We will catch the error earlier and
may be provide a better error message.

>
> * The new function I added, ReportTriggerPartkeyChange(), contains one
> serious bug (namely: it doesn't map attribute numbers properly if
> partitions are differently defined).

IIUC the code in your patch, it seems you are just looking at
partnatts. But the partition key can also contain expressions, which
are stored in partexprs, so I think the code won't catch a change in
the partition key values when the key contains an expression. Using
RelationGetPartitionQual() will fix this problem, and also the problem
of attribute remapping across the partition hierarchy.

-- 
Best Wishes,
Ashutosh Bapat




Re: Improve search for missing parent downlinks in amcheck

2020-03-11 Thread Peter Geoghegan
On Wed, Mar 11, 2020 at 2:02 AM Alexander Korotkov
 wrote:
> Thank you!  Pushed with this comment revised!

Thanks!

-- 
Peter Geoghegan




Re: [PATCH] Erase the distinctClause if the result is unique by definition

2020-03-11 Thread Ashutosh Bapat
On Tue, Mar 10, 2020 at 9:12 PM Andy Fan  wrote:
>
>
> Hi Tom & David & Bapat:
>
> Thanks for your review so far.  I want to summarize the current issues to help
> our following discussion.
>
> 1. Shall we bypass the AggNode as well, with the same logic?
>
> I think yes, since the rules for bypassing an AggNode and a UniqueNode are
> exactly the same. The difficulty with bypassing AggNode is that the
> current aggregation function call is closely coupled with AggNode. In the
> past few days, I have made the aggregation call able to run without
> AggNode (at least I tested sum (without a finalize fn) and avg (with a
> finalize fn)). But there are a few things left to do, like the ACL check,
> the any-null check and maybe more, and there is some MemoryContext mess to
> fix up. I still need some time for this goal, so I think its complexity
> deserves another thread to discuss it. Any thoughts?

I think if the relation underlying an Agg node is known to be unique
for the given groupByClause, we could safely use the AGG_SORTED
strategy. Even though the input is not ordered, each group contains
exactly one row, so for every row the Agg node can immediately
combine/finalize the aggregate result.
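The degenerate case is easy to model outside the planner. In a sorted-aggregate loop, a group closes whenever the key changes, so if every key value appears at most once, the transition and finalize steps run exactly once per input row, in any input order. A standalone sketch (not PostgreSQL code):

```c
#include <stddef.h>

/* Toy model of AGG_SORTED computing sum(val) GROUP BY key.  When the
 * input is unique on key, the inner loop runs exactly once per group,
 * so no pre-sort is needed: each row is its own group. */
typedef struct { int key; int val; } Row;

static size_t agg_sorted(const Row *rows, size_t n, Row *out)
{
    size_t ngroups = 0;
    size_t i = 0;

    while (i < n)
    {
        int key = rows[i].key;
        int sum = 0;                  /* transition state */

        while (i < n && rows[i].key == key)
            sum += rows[i++].val;     /* transition function */

        out[ngroups].key = key;       /* "finalize" the group */
        out[ngroups].val = sum;
        ngroups++;
    }
    return ngroups;
}
```

With duplicate keys this loop requires sorted input to keep groups contiguous; with unique keys that requirement disappears, which is the point above.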

-- 
Best Wishes,
Ashutosh Bapat




Re: [PATCH] Erase the distinctClause if the result is unique by definition

2020-03-11 Thread Ashutosh Bapat
On Wed, Mar 11, 2020 at 4:19 AM David Rowley  wrote:
>
> On Wed, 11 Mar 2020 at 02:50, Ashutosh Bapat
>  wrote:
> >
> > On Tue, Mar 10, 2020 at 1:49 PM Andy Fan  wrote:
> > > In my current implementation, it calculates the uniqueness for each
> > > BaseRel only, but in your way,  looks we need to calculate the
> > > UniquePathKey for both BaseRel and JoinRel.   This makes more
> > > difference for multi table join.
> >
> > I didn't understand this concern. I think, it would be better to do it
> > for all kinds of relation types base, join etc. This way we are sure
> > that one method works across the planner to eliminate the need for
> > Distinct or grouping. If we just implement something for base
> > relations right now and don't do that for joins, there is a chance
> > that that method may not work for joins when we come to implement it.
>
> Yeah, it seems to me that we're seeing more and more features that
> require knowledge of uniqueness of a RelOptInfo. The skip scans patch
> needs to know if a join will cause row duplication so it knows if the
> skip scan path can be joined to without messing up the uniqueness of
> the skip scan.  Adding more and more places that loop over the rel's
> indexlist just does not seem the right way to do it, especially so
> when you have to dissect the join rel down to its base rel components
> to check which indexes there are. Having the knowledge on-hand at the
> RelOptInfo level means we no longer have to look at indexes for unique
> proofs.

+1. Yep. When we break a join down to its base relations, partitioned
relations pose another challenge: the partitioned relation may not
have an index on it per se, but each partition may have one, and the
index key may happen to be part of the partition key. That case would
be easier to track through RelOptInfo than by breaking a base rel down
into its child rels.

>
> > > Another concern is UniquePathKey
> > > is designed for a general purpose,  we need to maintain it no matter
> > > distinctClause/groupbyClause.
> >
> > This should be ok. The time spent in annotating a RelOptInfo about
> > uniqueness is not going to be a lot. But doing so would help generic
> > elimination of Distinct/Group/Unique operations. What is
> > UniquePathKey; I didn't find this in your patch or in the code.
>
> Possibly a misinterpretation. There is some overlap between this patch
> and the skip scan patch, both would like to skip doing explicit work
> to implement DISTINCT. Skip scans just go about it by adding new path
> types that scan the index and only gathers up unique values.  In that
> case, the RelOptInfo won't contain the unique keys, but the skip scan
> path will. How I imagine both of these patches working together in
> create_distinct_paths() is that we first check if the DISTINCT clause
> is a subset of a set of the RelOptInfo's unique keys (this patch),
> else we check if there are any paths with uniquekeys that we can use
> to perform a no-op on the DISTINCT clause (the skip scan patch), if
> none of those apply, we create the required paths to uniquify the
> results.

Looks good to me, but I have not seen the index skip scan patch.
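The check order David describes could be sketched roughly as follows (DistinctPlan, plan_distinct and the flat column lists are hypothetical illustrations; neither patch is committed to these names):

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
    DISTINCT_NOOP,              /* rows already unique: drop the node */
    DISTINCT_USE_UNIQUE_PATH,   /* reuse a uniquekeys path (skip scan) */
    DISTINCT_UNIQUIFY           /* add an explicit Unique/HashAgg step */
} DistinctPlan;

/* true if every element of a[] appears in b[] */
static bool is_subset(const int *a, size_t na, const int *b, size_t nb)
{
    for (size_t i = 0; i < na; i++)
    {
        bool found = false;

        for (size_t j = 0; j < nb; j++)
            if (a[i] == b[j]) { found = true; break; }
        if (!found)
            return false;
    }
    return true;
}

/* Rows are provably distinct if some known unique key of the rel is
 * fully covered by the DISTINCT column list.  An empty unique-key set
 * means "nothing proven", not "unique". */
static DistinctPlan plan_distinct(const int *distinct_cols, size_t ndistinct,
                                  const int *unique_key, size_t nkeycols,
                                  bool have_unique_path)
{
    if (nkeycols > 0 &&
        is_subset(unique_key, nkeycols, distinct_cols, ndistinct))
        return DISTINCT_NOOP;               /* this patch */
    if (have_unique_path)
        return DISTINCT_USE_UNIQUE_PATH;    /* skip scan patch */
    return DISTINCT_UNIQUIFY;
}
```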

-- 
Best Wishes,
Ashutosh Bapat




User and Logical Replication

2020-03-11 Thread Tobias Stadler
Hi,

Would it be possible to include the user (who changed the row) in the
logical replication data?

Best Regards
Tobias



Re: [PATCH] Use PKG_CHECK_MODULES to detect the libxml2 library

2020-03-11 Thread Tom Lane
Daniel Gustafsson  writes:
> On 10 Mar 2020, at 18:38, Peter Eisentraut  
> wrote:
>> Btw., here is an older thread for the same issue 
>> .
>>   Might be worth reflecting on the issues discussed there.

> Thanks, didn't realize that the subject had been up for discussion earlier as
> well.
> For me, the duplication aspect is the most troubling, since we'd still
> need the xml2-config fallback and thus won't be able to simplify the code.

Yeah, but at least it's concentrated in a few lines in configure.in.

I think that the main objection to this is the documentation/confusion
issues raised by Noah in that old thread.  Still, we probably don't
have much choice given that some distros are going to remove xml2-config.
In that connection, Hugh's patch lacks docs which is entirely not OK,
but the doc changes in Peter's old patch look workable.

I wonder whether we ought to try to align this with our documented
procedure for falling back if you have no icu-config but want to
use ICU; that part of the docs suggests setting ICU_CFLAGS and ICU_LIBS
manually.  The patch as it stands doesn't seem to support manually
giving XML2_CFLAGS and XML2_LIBS, but it looks like it could easily
be adjusted to allow that.

Also, I see that pkg.m4 says

dnl Note that if there is a possibility the first call to
dnl PKG_CHECK_MODULES might not happen, you should be sure to include an
dnl explicit call to PKG_PROG_PKG_CONFIG in your configure.ac

which we are not doing.  We got away with that as long as there was only
one PKG_CHECK_MODULES call ... but with two, I'd expect that the second
one will fall over if the first one isn't executed.
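Put together, the configure.in side might look roughly like this (the XML2_* names and the exact fallback are assumptions, sketched from Peter's old patch and the ICU precedent, not a finished implementation):

```autoconf
# Run this unconditionally, per the note in pkg.m4: the first
# PKG_CHECK_MODULES call may be skipped at runtime, and there is
# more than one such call now.
PKG_PROG_PKG_CONFIG

if test "$with_libxml" = yes ; then
  # Let the user override detection, as documented for ICU_CFLAGS/ICU_LIBS.
  if test -z "$XML2_CFLAGS" && test -z "$XML2_LIBS" ; then
    PKG_CHECK_MODULES([XML2], [libxml-2.0],
      [],
      [# no pkg-config entry; fall back to the old xml2-config probe
       AC_PATH_PROG(XML2_CONFIG, xml2-config)])
  fi
fi
```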

regards, tom lane




Re: Add A Glossary

2020-03-11 Thread Jürgen Purtz
I made changes on top of 0001-add-glossary-page.patch which was supplied 
by C. Huinker. This affects not only terms proposed by me but also his 
original terms. If my changes are not obvious, please let me know and I 
will describe my motivation.


Please note especially lines marked with question marks.

It will be helpful for diff-ing to restrict the length of lines in the 
SGML files to 71 characters (as usual).


J. Purtz


diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 851e9debe6..52169b86a2 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -1,7 +1,7 @@
 
  Glossary
  
-  This is a list of terms and their in the context of PostgreSQL and Databases in general.
+  This is a list of terms and their meaning in the context of PostgreSQL and Databases in general.
  
   

@@ -44,7 +44,7 @@
 Atomic
 
  
-  In reference to the value of an Attribute or Datum: cannot be broken up into smaller components.
+  In reference to the value of an Attribute or Datum: cannot be broken down into smaller components.
  
  
   In reference to an operation: An event that cannot be completed in part: it must either entirely succeed or entirely fail. A series of SQL statements can be combined into a Transaction, and that transaction is said to be Atomic.
@@ -56,7 +56,7 @@
 Attribute
 
  
-  A typed data element found within a Tuple or Relation or Table.
+  An element with a certain name and data type found within a Tuple or Table.
  
 

@@ -164,12 +164,30 @@ any Attribute in the Relation, but
 

 
+   
+Checkpoint
+
+ 
+  A 
+  Checkpoint
+  is a point in time when all older dirty
+  pages of the Shared Memory,
+  all older WAL records, and a
+  special Checkpoint record
+  have been written and flushed to disk.
+ 
+
+   
+

 Cluster
 
  
   A group of Databases plus their Global SQL Objects. The Cluster is managed by exactly one Instance. A newly created Cluster will have three Databases created automatically. They are template0, template1, and postgres. It is expected that an application will create one or more additional Databases aside from these three.
  
+ 
+  Don't confuse the PostgreSQL specific term Cluster with the SQL command Cluster.
+ 
 

 
@@ -198,7 +216,7 @@ any Attribute in the Relation, but
 Concurrency
 
  
-  The concept that multiple independent operations can be happening within the Database at the same time.
+  The concept that multiple independent operations happen within the Database at the same time.
  
 

@@ -216,7 +234,7 @@ any Attribute in the Relation, but
 Constraint
 
  
-  A method of restricting the values of data allowed within a Table.
+  A concept of restricting the values of data allowed within a Table.
  
  
   For more information, see Constraints.
@@ -238,7 +256,7 @@ any Attribute in the Relation, but
 
  
   The base directory on the filesystem of a Server that contains all data
-files and subdirectories associated with a Cluster. The name for this directory in configuration files is PGDATA.
+  files and subdirectories associated with a Cluster with the exception of tablespaces. The environment variable PGDATA often - but not always - refers to the Data Directory.
  
  
   For more information, see Database Physical Storage: Database File Layout.
@@ -250,7 +268,7 @@ files and subdirectories associated with a Cluster. The n
 Database
 
  
-  A named collection of SQL Objects.
+  A named collection of SQL Objects.
  
  
   For more information, see Managing Databases: Overview.
@@ -292,14 +310,13 @@ files and subdirectories associated with a Cluster. The n
 File Segment
 
  
-   If a Database object grows in size past a designated limit, it may be
-split into multiple physical files. These files are called File Segments.
+   If a heap or index file grows in size over 1 GB, it will be split into multiple physical files. These files are called File Segments.
  
  
-  (Don't confuse this term with the similar term WAL Segment).
+  For more information, see Database Physical Storage: Database File Layout.
  
  
-  For more information, see Database Physical Storage: Database File Layout.
+  (Don't confuse this term with the similar term WAL Segment).
  
 

@@ -353,7 +370,7 @@ split into multiple physical files. These files are called File Segme
 Function
 
  
-  Any pre-defined tranformation of data. Many Functions are already defined within PostgreSQL itself, but can also be user-defined.
+  Any pre-defined transformation of data. Many Functions are already defined within PostgreSQL itself, but can also be user-defined.
  
  
   For more information, see

v13 latest snapshot build error

2020-03-11 Thread Devrim Gündüz

Hi,

I'm getting a build error while building the latest snapshot. Any idea why?
Please note that I'm adding this patch to the tarball:

https://git.postgresql.org/gitweb/?p=pgrpms.git;a=blob;f=rpm/redhat/master/postgresql-13/master/postgresql-13-var-run-socket.patch;h=a0292a80ae219b4c8dc1c2e686a3521f02b4330d;hb=HEAD

+ ./configure --enable-rpath --prefix=/usr/pgsql-13 --includedir=/usr/pgsql-
13/include --libdir=/usr/pgsql-13/lib --mandir=/usr/pgsql-13/share/man --
datadir=/usr/pgsql-13/share --with-icu --with-llvm --with-perl --with-python --
with-tcl --with-tclconfig=/usr/lib64 --with-openssl --with-pam --with-gssapi --
with-includes=/usr/include --with-libraries=/usr/lib64 --enable-nls --enable-
dtrace --with-uuid=e2fs --with-libxml --with-libxslt --with-ldap --with-selinux 
--with-systemd --with-system-tzdata=/usr/share/zoneinfo --
sysconfdir=/etc/sysconfig/pgsql --docdir=/usr/pgsql-13/doc --
htmldir=/usr/pgsql-13/doc/html
+ MAKELEVEL=0
+ /usr/bin/make -j4 all
In file included from ../../../../src/include/c.h:55,
 from ../../../../src/include/postgres.h:46,
 from guc.c:17:
../../../../src/include/pg_config_manual.h:200:31: error: called object is not
a function or function pointer
 #define DEFAULT_PGSOCKET_DIR  "/var/run/postgresql"
   ^
guc.c:4064:3: note: in expansion of macro 'DEFAULT_PGSOCKET_DIR'
   DEFAULT_PGSOCKET_DIR ", /tmp"
   ^~~~
make[4]: *** [: guc.o] Error 1
make[3]: *** [../../../src/backend/common.mk:39: misc-recursive] Error 2
make[3]: *** Waiting for unfinished jobs
make[2]: *** [common.mk:39: utils-recursive] Error 2
make[1]: *** [Makefile:42: all-backend-recurse] Error 2
make: *** [GNUmakefile:11: all-src-recurse] Error 2


Regards,
-- 
Devrim Gündüz
Open Source Solution Architect, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR


signature.asc
Description: This is a digitally signed message part


Re: Add A Glossary

2020-03-11 Thread Corey Huinker
On Wed, Mar 11, 2020 at 12:50 PM Jürgen Purtz  wrote:

> I made changes on top of 0001-add-glossary-page.patch which was supplied
> by C. Huinker. This affects not only terms proposed by me but also his
> original terms. If my changes are not obvious, please let me know and I
> will describe my motivation.
>
> Please note especially lines marked with question marks.
>
> It will be helpful for diff-ing to restrict the length of lines in the
> SGML files to 71 characters (as usual).
>
> J. Purtz
>

A new person replied off-list with some suggested edits, all of which
seemed pretty good. I'll incorporate them myself if that person chooses to
remain off-list.


Re: Add A Glossary

2020-03-11 Thread Corey Huinker
>
> It will be helpful for diff-ing to restrict the length of lines in the
> SGML files to 71 characters (as usual).


I did it that way for the following reasons:
1. It aids grep-ability.
2. The committers seem to be moving towards that for SQL strings, mostly
for reason #1.
3. I recall that the code is put through a linter as one of the final steps
before release; I assumed that the SGML gets the same.
4. Even if #3 is false, it's easy enough for me to do manually for this
one file once we've settled on the text of the definitions.

As for the changes, most things seem fine, I specifically like:
* Checkpoint - looks good
* yes, PGDATA should have been a literal
* Partition - the a/b split works for me
* Unlogged - it reads better

I'm not so sure about your ???s; my responses:
* The statement that names of schema objects are unique isn't *strictly* true,
just *mostly* true. Take the case of a unique constraint. The constraint
has a name and the unique index has the same name, to the point where
adding a unique constraint using an existing index renames that index to
conform to the constraint name.
* Serializable "other way around" question - It's both. Outside the
transaction you can't see changes made inside another transaction (though
you can be blocked by them), and inside serializable you can't see any
changes made since you started. Does that make sense? Were you asking a
different question?
* Transaction - yes, all those things could be "visible" or they could be
"side effects". It may be best to leave the over-simplified definition in
place, and add a "For more information see <<tutorial-transactions>>"
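The unique-constraint point in the first item can be seen directly in psql (names made up for illustration):

```sql
CREATE TABLE u (a int);
CREATE UNIQUE INDEX some_idx ON u (a);

-- The index is renamed to match the constraint name, so afterwards
-- both the constraint and the index are called u_a_key.
ALTER TABLE u ADD CONSTRAINT u_a_key UNIQUE USING INDEX some_idx;
```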


Re: Add A Glossary

2020-03-11 Thread Corey Huinker
>
>
> * Transaction - yes, all those things could be "visible" or they could be
> "side effects". It may be best to leave the over-simplified definition in
> place, and add a "For more information see <<tutorial-transactions>>"
>

transaction-iso would be a better linkref in this case


Re: User and Logical Replication

2020-03-11 Thread Peter Eisentraut

On 2020-03-11 17:18, Tobias Stadler wrote:

Would it be possible to include the user (who changed the row) in the
logical replication data?


Not without major re-engineering.

If you need this information, maybe a BEFORE INSERT OR UPDATE trigger 
could be used to write this information into a table column.
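A sketch of that workaround (table, column and trigger names are made up):

```sql
ALTER TABLE t ADD COLUMN last_changed_by text;

CREATE FUNCTION record_user() RETURNS trigger AS $$
BEGIN
    NEW.last_changed_by := current_user;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_record_user
    BEFORE INSERT OR UPDATE ON t
    FOR EACH ROW EXECUTE FUNCTION record_user();
```

The replicated rows then carry the user name as an ordinary column, with no changes to the logical replication protocol.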


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: backend type in log_line_prefix?

2020-03-11 Thread Justin Pryzby
On Tue, Mar 10, 2020 at 02:01:42PM -0500, Justin Pryzby wrote:
> On Thu, Feb 13, 2020 at 06:43:32PM +0900, Fujii Masao wrote:
> > If we do this, backend type should be also included in csvlog?
> 
> +1, I've been missing that
> 
> Note, this patch seems to correspond to:
> b025f32e0b Add leader_pid to pg_stat_activity
> 
> I had mentioned privately to Julien missing this info in CSV log.
> 
> Should leader_pid be exposed instead (or in addition)?  Or backend_type be a

I looked more closely and played with the patch.

Can I suggest:

$ git diff
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 3a6f7f9456..56e0a1437e 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2945,7 +2945,7 @@ write_csvlog(ErrorData *edata)
if (MyProcPid == PostmasterPid)
appendCSVLiteral(&buf, "postmaster");
else if (MyBackendType == B_BG_WORKER)
-   appendCSVLiteral(&buf, MyBgworkerEntry->bgw_type);
+   appendCSVLiteral(&buf, MyBgworkerEntry->bgw_name);
else
appendCSVLiteral(&buf, pgstat_get_backend_desc(MyBackendType));


Then it logs the leader:
|2020-03-11 13:16:05.596 CDT,,,16289,,5e692ae3.3fa1,1,,2020-03-11 13:16:03 
CDT,4/3,0,LOG,0,"temporary file: path ""base/pgsql_tmp/pgsql_tmp16289.0"", 
size 4276224",,"explain analyze SELECT * FROM t a JOIN t b USING(i) WHERE 
i>999 GROUP BY 1;",,,"psql","parallel worker for PID 16210"

It'll be easy enough to extract the leader and join that ON leader=pid.

> I think this patch alone wouldn't provide that, and there'd either need to
> be a line logged for each worker.  Maybe it'd log full query+details (ugh),
> or just log "parallel worker of pid...".  Or maybe there'd be a new column
> with which the leader would log nworkers (workers planned vs workers
> launched - I would *not* want to get this out of autoexplain).

I'm still not sure how to do that, though.
I see I can get what's needed at DEBUG1:

|2020-03-11 13:50:58.304 CDT,,,16196,,5e692aa7.3f44,22,,2020-03-11 13:15:03 
CDT,,0,DEBUG,0,"registering background worker ""parallel worker for PID 
16210""","","postmaster"

But I don't think it's viable to run for very long with log_statement=all,
log_min_messages=DEBUG.

-- 
Justin




Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

2020-03-11 Thread Tom Lane
Justin Pryzby  writes:
>>> On Sun, Mar 08, 2020 at 02:37:49PM -0400, Tom Lane wrote:
 I guess we ought to change that function to use returns-a-tuplestore
 protocol instead of thinking it can hold a directory open across calls.

> I patched this one to see what it looks like and to allow /hopefully/ moving
> forward one way or another with the pg_ls_tmpfile() patch set (or at least
> avoid trying to do anything there which is too inconsistent with this fix).

I reviewed this, added some test cases, and pushed it, so that we can see
if the buildfarm finds anything wrong.  (I'm not expecting that, because
this should all be pretty portable, but you never know.)  Assuming not,
we need to fix the other functions similarly, and then do something about
revising the documentation to warn against this coding style.  Do you
want to have a go at that?

> Also, at first glance, these seem to be passing constant "randomAccess=true"
> rather than (bool) (rsinfo->allowedModes&SFRM_Materialize_Random)

Hm.  Not a bug, but possibly a performance issue, if the tuplestore
gets big enough for that to matter.  (I think it doesn't matter until
we start spilling to temp files.)

regards, tom lane




Re: Improve handling of parameter differences in physical replication

2020-03-11 Thread Peter Eisentraut
Here is an updated patch that incorporates some of the suggestions.  In 
particular, some of the warning messages have been rephrased to be more 
accurate (but also less specific), the warning message at recovery pause 
repeats every 1 minute, and the documentation has been updated.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 3fb5c70b4b7e132cb75271bbd70c4a1523a72e86 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 11 Mar 2020 19:25:44 +0100
Subject: [PATCH v2] Improve handling of parameter differences in physical
 replication

When certain parameters are changed on a physical replication primary,
this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL
record.  The standby then checks whether its own settings are at least
as big as the ones on the primary.  If not, the standby shuts down
with a fatal error.

The correspondence of settings between primary and standby is required
because those settings influence certain shared memory sizings that
are required for processing WAL records that the primary might send.
For example, if the primary sends a prepared transaction, the standby
must have had max_prepared_transactions set appropriately or it won't
be able to process those WAL records.

However, fatally shutting down the standby immediately upon receipt of
the parameter change record might be a bit of an overreaction.  The
resources related to those settings are not required immediately at
that point, and might never be required if the activity on the primary
does not exhaust all those resources.  If we just let the standby roll
on with recovery, it will eventually produce an appropriate error when
those resources are used.

So this patch relaxes this a bit.  Upon receipt of
XLOG_PARAMETER_CHANGE, we still check the settings but only issue a
warning and set a global flag if there is a problem.  Then when we
actually hit the resource issue and the flag was set, we issue another
warning message with relevant information.  Additionally, at that
point we pause recovery, so a hot standby remains usable.
---
 doc/src/sgml/high-availability.sgml   | 48 +-
 src/backend/access/transam/twophase.c |  3 ++
 src/backend/access/transam/xlog.c | 72 +++
 src/backend/storage/ipc/procarray.c   |  9 +++-
 src/backend/storage/lmgr/lock.c   | 10 
 src/include/access/xlog.h |  1 +
 6 files changed, 120 insertions(+), 23 deletions(-)

diff --git a/doc/src/sgml/high-availability.sgml 
b/doc/src/sgml/high-availability.sgml
index bc4d98fe03..c5a9f9d5df 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -2148,18 +2148,14 @@ Administrator's Overview

 

-The setting of some parameters on the standby will need reconfiguration
-if they have been changed on the primary. For these parameters,
-the value on the standby must
-be equal to or greater than the value on the primary.
-Therefore, if you want to increase these values, you should do so on all
-standby servers first, before applying the changes to the primary server.
-Conversely, if you want to decrease these values, you should do so on the
-primary server first, before applying the changes to all standby servers.
-If these parameters
-are not set high enough then the standby will refuse to start.
-Higher values can then be supplied and the server
-restarted to begin recovery again.  These parameters are:
+The settings of some parameters determine the size of shared memory for
+tracking transaction IDs, locks, and prepared transactions.  These shared
+memory structures should be no smaller on a standby than on the primary.
+Otherwise, it could happen that the standby runs out of shared memory
+during recovery.  For example, if the primary uses a prepared transaction
+but the standby did not allocate any shared memory for tracking prepared
+transactions, then recovery will abort and cannot continue until the
+standby's configuration is changed.  The parameters affected are:
 
   

@@ -2188,6 +2184,34 @@ Administrator's Overview
 

   
+
+The easiest way to ensure this does not become a problem is to have these
+parameters set on the standbys to values equal to or greater than on the
+primary.  Therefore, if you want to increase these values, you should do
+so on all standby servers first, before applying the changes to the
+primary server.  Conversely, if you want to decrease these values, you
+should do so on the primary server first, before applying the changes to
+all standby servers.  The WAL tracks changes to these parameters on the
+primary, and if a standby processes WAL that indicates that the current
+value on the primary is higher than its own value, it will log a warning,
+for example:
+
+WARNING:  ins

Re: [PATCH] Skip llvm bytecode generation if LLVM is missing

2020-03-11 Thread Andres Freund
Hi,

On 2020-03-11 11:25:28 +0800, Craig Ringer wrote:
> I propose that per the attached patch PGXS should simply skip adding
> the automatic dependency for .bc files if clang cannot be found.
> Extensions may still choose to explicitly declare the rule in their
> own Makefile if they want to force bitcode generation.

Hm, that seems like it could also cause silent failures (e.g. after a
package upgrade or such).

How about erroring out, but with an instruction that llvm can be
disabled with make NO_LLVM=1 or such?
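Something along these lines in the PGXS rules, perhaps (with_llvm and CLANG exist in Makefile.global; the NO_LLVM knob itself is hypothetical):

```make
ifeq ($(with_llvm), yes)
ifeq ($(CLANG),)
ifndef NO_LLVM
$(error clang not found, so LLVM bitcode cannot be generated; \
install clang or re-run make with NO_LLVM=1 to skip bitcode)
endif
endif
endif
```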

Regards,

Andres




Re: v13 latest snapshot build error

2020-03-11 Thread Tom Lane
Devrim Gündüz  writes:
> I'm getting build error while building latest snapshot. Any idea why? Please
> note that I'm adding this patch to the  tarball:
> https://git.postgresql.org/gitweb/?p=pgrpms.git;a=blob;f=rpm/redhat/master/postgresql-13/master/postgresql-13-var-run-socket.patch;h=a0292a80ae219b4c8dc1c2e686a3521f02b4330d;hb=HEAD

(Hey, I recognize that patch ...)

> In file included from ../../../../src/include/c.h:55,
>  from ../../../../src/include/postgres.h:46,
>  from guc.c:17:
> ../../../../src/include/pg_config_manual.h:200:31: error: called object is not
> a function or function pointer
>  #define DEFAULT_PGSOCKET_DIR  "/var/run/postgresql"
>^
> guc.c:4064:3: note: in expansion of macro 'DEFAULT_PGSOCKET_DIR'
>DEFAULT_PGSOCKET_DIR ", /tmp"
>^~~~
> make[4]: *** [: guc.o] Error 1

That is just weird.  Could it be a compiler bug?  I assume you're
using some bleeding-edge gcc version, and it's really hard to see
another reason why this would fail, especially with a nonsensical
error like that.

regards, tom lane




Re: more ALTER .. DEPENDS ON EXTENSION fixes

2020-03-11 Thread Alvaro Herrera
Thanks for the reviews; I pushed 0001 now, again to all branches since
9.6.  Because of the previous commit, the fact that multiple statements
are emitted is not important anymore: the server will only restore the
first one, and silently ignore subsequent ones.  And once you're using a
system in that state, naturally only one will be emitted by pg_dump in
all cases.

What remains on this CF item is the new feature to remove an existing
dependency.  As Tom says, given the little use this feature gets it
doesn't sound worth the destabilization risk in back branches, so I'm
going to push it only to master -- but not yet.

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: backup manifests

2020-03-11 Thread Robert Haas
On Fri, Mar 6, 2020 at 3:58 AM Suraj Kharage
 wrote:
> 1: Getting below error while compiling 0002 patch.
> 2:
>
> Few macros defined in 0003 patch not used anywhere in 0005 patch. Either we 
> can replace these with hard-coded values or remove them.

Thanks. I hope that I have straightened those things out in the new
version which is attached. This version also includes some other
changes. The non-JSON code is now completely gone. Also, I've
refactored the code that does parses the JSON manifest to make it
cleaner, and I've moved it out into a separate file. This might be
useful if anyone ends up wanting to reuse that code for some other
purpose, and I think it makes it easier to understand, too, since the
manifest parsing is now much better separated from the task of
actually validating the given directory against the manifest. I've
also added some tests, which are based in part on testing ideas from
Rajkumar Raghuwanshi and Mark Dilger, but this test code was written
by me. So now it's like this:

0001 - checksum helper functions. same as before.
0002 - patch the server to generate and send a manifest, and
pg_basebackup to receive it
0003 - add pg_validatebackup
0004 - TAP tests

Comments?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


v10-0004-Regression-tests-for-pg_validatebackup.patch
Description: Binary data


v10-0001-Add-checksum-helper-functions.patch
Description: Binary data


v10-0003-pg_validatebackup-Validate-a-backup-against-the-.patch
Description: Binary data


v10-0002-Generate-backup-manifests-for-base-backups.patch
Description: Binary data


Re: [Proposal] Global temporary tables

2020-03-11 Thread Robert Haas
On Wed, Mar 11, 2020 at 9:07 AM 曾文旌(义从)  wrote:
> reindex needs to change the relfilenode, but GTT does not currently
> support that.

In my view that'd have to be fixed somehow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add an optional timeout clause to isolationtester step.

2020-03-11 Thread Tom Lane
Michael Paquier  writes:
> On Tue, Mar 10, 2020 at 02:53:36PM +0100, Julien Rouhaud wrote:
>> So basically we could just change pg_isolation_test_session_is_blocked() to
>> also return the wait_event_type and wait_event, and adding something like

> Hmm.  I think that Tom has in mind the reasons behind 511540d here.

Yeah, that history suggests that we need to be very protective of the
performance of the wait-checking query, especially in CLOBBER_CACHE_ALWAYS
builds.  That being the case, I'm hesitant to consider changing the test
function to return a tuple.  That'll add quite a lot of overhead due to
the cache lookups involved, or so my gut says.

I'm also finding the proposed semantics (issue a cancel if wait state X
is reached) to be odd and special-purpose.  I was envisioning something
more like "if wait state X is reached, consider the session to be blocked,
the same as if it had reached a heavyweight-lock wait".  Then
isolationtester would move on to issue another step, which is where
I'd envision putting the cancel for that particular test usage.

So that idea leads to thinking that the wait-state specification is an
input to pg_isolation_test_session_is_blocked, not an output.  We could
re-use Julien's ideas about the isolation spec syntax by making it be,
roughly,

step "<name>" { <SQL> } [ blocked if "<wait_event_type>" "<wait_event>" ]

and then those items would need to be passed as parameters of the prepared
query.

Or maybe we should use two different prepared queries depending on whether
there's a BLOCKED IF spec.  We probably don't need lock-wait detection
if we're expecting a wait-state-based block, so maybe we should invent a
separate backend function "is this process waiting with this type of wait
state" and use that to check the state of a step that has this type of
annotation.

Just eyeing the proposed test case, I'm wondering whether this will
actually be sufficiently fine-grained.  It seems like "REINDEX has
reached a wait on a virtual XID" is not really all that specific;
it could match on other situations, such as blocking on a concurrent
tuple update.  Maybe it's okay given the restrictive context that
we don't expect anything to be happening that the isolation test
didn't ask for.

I'd like to see an attempt to rewrite some of the existing
timeout-dependent test cases to use this facility instead of
long timeouts.  If we could get rid of the timeouts in the
deadlock tests, that'd go a long way towards showing that this
idea is actually any good.

regards, tom lane




Re: Add an optional timeout clause to isolationtester step.

2020-03-11 Thread Alvaro Herrera
On 2020-Mar-11, Tom Lane wrote:

> We could re-use Julien's ideas about the isolation spec syntax by
> making it be, roughly,
> 
> step "<name>" { <SQL> } [ blocked if "<wait_event_type>" "<wait_event>" ]
> 
> and then those items would need to be passed as parameters of the prepared
> query.

I think for test readability's sake, it'd be better to put the BLOCKED
IF clause ahead of the SQL, so you can write it in the same line and let
the SQL flow to the next one:

STEP "long_select" BLOCKED IF "lwlock" "ClogControlLock"
  { select foo from pg_class where ... some more long clauses ... }

otherwise I think a step would require more lines to write.
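To make the flow concrete, here is a hypothetical spec fragment combining this layout with Tom's suggestion upthread that the cancel be issued from a separate step (the table, session, and step names here are invented purely for illustration):

```
session "s1"
step "long_select" BLOCKED IF "lwlock" "ClogControlLock"
  { SELECT count(*) FROM big_table; }

session "s2"
step "cancel"
  { SELECT pg_cancel_backend(pid) FROM pg_stat_activity
    WHERE wait_event = 'ClogControlLock'; }

permutation "long_select" "cancel"
```

Once "long_select" is considered blocked by the wait-state check, isolationtester would move on and run "cancel".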

> I'd like to see an attempt to rewrite some of the existing
> timeout-dependent test cases to use this facility instead of
> long timeouts.  If we could get rid of the timeouts in the
> deadlock tests, that'd go a long way towards showing that this
> idea is actually any good.

+1.  Those long timeouts are annoying enough that infrastructure to make
a run shorter in normal circumstances might be sufficient justification
for this patch ...

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: v13 latest snapshot build error

2020-03-11 Thread Devrim Gunduz
Hi,

(Sorry for top posting)

This happens on RHEL 8. I don't think it's that bleeding edge.

Regards, Devrim

On 11 March 2020 19:44:55 GMT, Tom Lane  wrote:
>Devrim =?ISO-8859-1?Q?G=FCnd=FCz?=  writes:
>> I'm getting build error while building latest snapshot. Any idea why?
>> Please
>> note that I'm adding this patch to the tarball:
>> https://git.postgresql.org/gitweb/?p=pgrpms.git;a=blob;f=rpm/redhat/master/postgresql-13/master/postgresql-13-var-run-socket.patch;h=a0292a80ae219b4c8dc1c2e686a3521f02b4330d;hb=HEAD
>
>(Hey, I recognize that patch ...)
>
>> In file included from ../../../../src/include/c.h:55,
>>  from ../../../../src/include/postgres.h:46,
>>  from guc.c:17:
>> ../../../../src/include/pg_config_manual.h:200:31: error: called
>object is not
>> a function or function pointer
>>  #define DEFAULT_PGSOCKET_DIR  "/var/run/postgresql"
>>^
>> guc.c:4064:3: note: in expansion of macro 'DEFAULT_PGSOCKET_DIR'
>>DEFAULT_PGSOCKET_DIR ", /tmp"
>>^~~~
>> make[4]: *** [: guc.o] Error 1
>
>That is just weird.  Could it be a compiler bug?  I assume you're
>using some bleeding-edge gcc version, and it's really hard to see
>another reason why this would fail, especially with a nonsensical
>error like that.
>
>   regards, tom lane

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: control max length of parameter values logged

2020-03-11 Thread Alvaro Herrera
On 2020-Mar-10, Tom Lane wrote:

> I agree that something ought to be done here, but I'm not sure that
> this is exactly what.  It appears to me that there are three related
> but distinct behaviors under discussion:
> 
> 1. Truncation of bind parameters returned to clients in error message
>detail fields.
> 2. Truncation of bind parameters written to the server log in logged
>error messages.
> 3. Truncation of bind parameters written to the server log in non-error
>statement logging actions (log_statement and variants).
> 
> Historically we haven't truncated any of these, I believe.  As of
> ba79cb5dc we forcibly truncate #1 and #2 at 64 bytes, but not #3.
> Your patch proposes to provide a SUSET GUC that controls the
> truncation length for all 3.

The reason I didn't change the other uses was precisely that it was
established behavior, but my opinion was that truncating them would
be better, now that we have code to handle doing so.

Maybe it would make sense to always log complete parameters for error
cases when that feature is enabled, and have the GUC only control the
lengths logged for non-error cases?

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: control max length of parameter values logged

2020-03-11 Thread Tom Lane
Alvaro Herrera  writes:
> Maybe it would make sense to always log complete parameters for error
> cases when that feature is enabled, and have the GUC only control the
> lengths logged for non-error cases?

I could get behind that.  It's a bit different from the original
idea here, but I think it's closer to being real-world-useful.

Another way to slice this up would be to have a USERSET GUC that
controls truncation of parameter values in errors, and a separate
SUSET GUC that controls it for the non-error statement logging
cases.  I'm not sure how much that's actually worth, but if we
feel that truncation in error cases can be useful, that's how
I'd vote to expose it.
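To illustrate that split, a postgresql.conf sketch — note the GUC names below are invented here for illustration only; nothing was settled in this thread:

```
# USERSET: truncation of bind-parameter values attached to error messages
log_parameter_max_length_on_error = 64    # bytes; -1 means log in full

# SUSET: truncation of parameter values in non-error statement logging
log_parameter_max_length = 1024           # bytes; -1 means log in full
```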

regards, tom lane




Re: [PATCH] Use PKG_CHECK_MODULES to detect the libxml2 library

2020-03-11 Thread Hugh McMaster
On Thu, 12 Mar 2020 at 03:39, Tom Lane wrote:

> Daniel Gustafsson writes:
> > For me, the duplication aspect is the most troubling, since we'd still
> > need the xml2-config fallback and thus won't be able to simplify the code.
>
>
> Yeah, but at least it's concentrated in a few lines in configure.in.
>
> I think that the main objection to this is the documentation/confusion
> issues raised by Noah in that old thread.  Still, we probably don't
> have much choice given that some distros are going to remove xml2-config.
> In that connection, Hugh's patch lacks docs which is entirely not OK,
> but the doc changes in Peter's old patch look workable.


Documentation can be added easily. :-)

> I wonder whether we ought to try to align this with our documented
> procedure for falling back if you have no icu-config but want to
> use ICU; that part of the docs suggests setting ICU_CFLAGS and ICU_LIBS
> manually.


Unless your system has ICU installed in a non-standard location, there is
no need to set those variables, as PKG_CHECK_MODULES will handle that for
you. `./configure --help` also provides the relevant documentation on
overriding pkg-config’s X_CFLAGS and X_LIBS.

> The patch as it stands doesn't seem to support manually
> giving XML2_CFLAGS and XML2_LIBS, but it looks like it could easily
> be adjusted to allow that.


What I said for ICU also applies here (in fact, to all use of
PKG_CHECK_MODULES). For the fallback, a minor rework is required.

The question is really whether we want to maintain a fallback to
xml2-config. To give more context, I gave a more detailed assessment of the
situation in an earlier email to this list. (Personally, I don’t think we
should.)

Note that xslt-config will also be a problem at some point.

> Also, I see that pkg.m4 says
>
> dnl Note that if there is a possibility the first call to
> dnl PKG_CHECK_MODULES might not happen, you should be sure to include an
> dnl explicit call to PKG_PROG_PKG_CONFIG in your configure.ac
>
> which we are not doing.  We got away with that as long as there was only
> one PKG_CHECK_MODULES call ... but with two, I'd expect that the second
> one will fall over if the first one isn't executed.


Yes, that macro call needs to be added.

Hugh


Re: [PATCH] Use PKG_CHECK_MODULES to detect the libxml2 library

2020-03-11 Thread Tom Lane
Hugh McMaster  writes:
> The question is really whether we want to maintain a fallback to
> xml2-config. To give more context, I gave a more detailed assessment of the
> situation in an earlier email to this list. (Personally, I don’t think we
> should.)

I think that that is not optional.  We try to maintain portability
to old systems as well as new.

regards, tom lane




Re: range_agg

2020-03-11 Thread Paul A Jungwirth
Thanks everyone for offering some thoughts on this!

Tom Lane  wrote:
> have you given any thought to just deciding that ranges and
> multiranges are the same type?

I can see how it might be nice to have just one type to think about.
Still I think keeping them separate makes sense. Other folks have
brought up several reasons already. Just to chime in:

Tom Lane  wrote:
> Isaac Morland  writes:
> > Definitely agreed that range and multirange (or whatever it's called)
> > should be different. In the work I do I have a number of uses for ranges,
> > but not (yet) for multiranges. I want to be able to declare a column as
> > range and be sure that it is just a single range, and then call lower() and
> > upper() on it and be sure to get just one value in each case; and if I
> > accidentally try to take the union of ranges where the union isn’t another
> > range, I want to get an error rather than calculate some weird (in my
> > context) multirange.
>
> I do not find that argument convincing at all.  Surely you could put
> that constraint on your column using "CHECK (numranges(VALUE) <= 1)"
> or some such notation.

A check constraint works for columns, but there are other contexts
where you'd like to restrict things to just a contiguous range, e.g.
user-defined functions and intermediate results in queries. Basic
ranges seem a lot simpler to think about, so I can appreciate how
letting any range be a multirange adds a heavy cognitive burden. I
think a lot of people will share Isaac's opinion here.

Tom Lane  wrote:
> Also, this would allow us to remove at least one ugly misfeature:
>
> regression=# select '[1,2]'::int4range + '[3,10)'::int4range;
>  ?column?
> ----------
>  [1,10)
> (1 row)
>
> regression=# select '[1,2]'::int4range + '[4,10)'::int4range;
> ERROR:  result of range union would not be contiguous

Because of backwards compatibility we can't really change +/-/* not to
raise (right?), so if we joined ranges and multiranges we'd need to
add operators with a different name. I was calling those @+/@-/@*
before, but that was considered too unintuitive and undiscoverable.
Having two types lets us use the nicer operator names.

Tom Lane  wrote:
> it seems like we could consider the traditional
> range functions like lower() and upper() to report on the first or last
> range bound in a multirange

I tried to keep functions/operators similar, so already lower(mr) =
lower(r) and upper(mr) = upper(r).
I think *conceptually* it's good to make ranges & multiranges as
interchangeable as possible, but that doesn't mean they have to be the
same type.

Adding multiranges-as-ranges also raises questions about their string
format. If a multirange is {[1,2), [4,5)} would you only print the
curly braces when there is more than one element?

I don't *think* allowing non-contiguous ranges would break how we use
them in GiST indexes or exclusion constraints, but maybe someone can
think of some problem I can't. It's one place to be wary anyway. At
the very least it would make those things slower I expect.

On a few other issues people have raised recently:

Alvaro Herrera  writes:
> I wonder what's the point of multirange arrays. Is there a reason we
> create those?

We have arrays of everything else, so why not have them for
multiranges? We don't have to identify specific use cases here,
although I can see how you'd want to call array_agg/UNNEST on some
multiranges, e.g. (Actually I really want to add an UNNEST that
*takes* a multirange, but that could be a follow-on commit.) If
nothing else I think omitting arrays of multiranges would be a strange
irregularity in the type system.

David G. Johnston  wrote:
> In the tests there is:
>
> +select '{[a,a],[b,b]}'::textmultirange;
> + textmultirange
> +
> + {[a,a],[b,b]}
> +(1 row)
> +
> +-- without canonicalization, we can't join these:
> +select '{[a,a], [b,b]}'::textmultirange;
> + textmultirange
> +
> + {[a,a],[b,b]}
> +(1 row)
> +
>
> Aside from the comment they are identical so I'm confused as to why both 
> tests exist - though I suspect it has to do with the fact that the expected 
> result would be {[a,b]} since text is discrete.

Those tests are for basic string parsing (multirange_in), so one is
testing {A,B} and the other {A, B} (with a space after the comma).
(There are some tests right above those that also have blank spaces,
but they only output a single element in the multirange result.)

David G. Johnston  wrote:
> Also, the current patch set seems a bit undecided on whether it wants to be 
> truly a multi-range or a range that can report non-contiguous components.  
> Specifically,
>
> +select '{[a,d), [b,f]}'::textmultirange;
> + textmultirange
> +
> + {[a,f]}
> +(1 row)

Without a canonicalization function, we can't know that [a,a] touches
[b,b], but we *can* know that [a,d) touches [b,f). Or even:

regression=# select '{[a,b), [b,b]}'::textmultirange;
 textmultirange
----------------
 {[a,b]}
(1 row)

Re: Use compiler intrinsics for bit ops in hash

2020-03-11 Thread David Rowley
On Sat, 29 Feb 2020 at 04:13, David Fetter  wrote:
>
> On Thu, Feb 27, 2020 at 02:41:49PM +0800, John Naylor wrote:
> > In 0002, the pg_bitutils functions have a test (input > 0), and the
> > new callers ceil_log2_* and next_power_of_2_* have asserts. That seems
> > backward to me.
>
> To me, too, now that you mention it.  My thinking was a little fuzzed
> by trying to accommodate platforms with intrinsics where clz is
> defined for 0 inputs.

Wouldn't it be better just to leave the existing definitions of the
pg_leftmost_one_pos* function alone?  It seems to me you're hacking
away at those just so you can support passing 1 to the new functions,
and that's giving you trouble now because you're doing num-1 to handle
the case where the number is already a power of 2. Which is
troublesome because 1-1 is 0, which you're trying to code around.

Isn't it better just to put in a run-time check for numbers that are
already a power of 2 and then get rid of the num - 1? Something like:

/*
 * pg_nextpow2_32
 *		Returns the next highest power of 2 of 'num', or 'num' if it's
 *		already a power of 2.  'num' mustn't be 0 or be above UINT_MAX / 2.
 */
static inline uint32
pg_nextpow2_32(uint32 num)
{
	Assert(num > 0 && num <= UINT_MAX / 2);

	/* use some bitmasking tricks to see if only 1 bit is on */
	return (num & (num - 1)) == 0 ? num :
		((uint32) 1) << (pg_leftmost_one_pos32(num) + 1);
}

I think you'll also want to mention the issue about numbers greater
than UINT_MAX / 2, as I've done above, and also align your naming
convention with what else is in that file.

I don't think Jesse's proposed solution is that great due to the
additional function call overhead for pg_count_leading_zeros_32(). The
(num & (num - 1)) == 0 I imagine will perform better, but I didn't
test it.
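For reference, the trick works because num & (num - 1) clears the lowest set bit, so the result is zero exactly when at most one bit is set. A standalone sketch (the function name is mine, not from the patch):

```c
#include <stdint.h>

/* Return nonzero when exactly one bit of num is set, i.e. num is a
 * power of 2.  num & (num - 1) clears the lowest set bit; for a power
 * of two that was the only bit, leaving zero.  num must be nonzero. */
static inline int
is_pow2(uint32_t num)
{
    return (num & (num - 1)) == 0;
}
```

With this check, the num - 1 adjustment in the earlier versions of the patch becomes unnecessary.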

Also, wondering if you've looked at any of the other places where we
do "*= 2;" or "<<= 1;" inside a loop? There's quite a number that look
like candidates for using the new function.




Re: Refactor compile-time assertion checks for C/C++

2020-03-11 Thread Michael Paquier
On Wed, Mar 11, 2020 at 12:31:19PM +, Georgios Kokolatos wrote:
> For whatever is worth, my previous comment that the patch improves
> readability also applies to the updated version of the patch.

v2 actually has fewer diffs for the C++ part.

> The CF bot seems happy now, which means that your assessment as
> to the error and fix are correct.

Indeed the bot is happy now.  Looking at the patch, one would note
that all it does is unify the fallback "hack-ish"
implementations so that C and C++ use the same thing, namely the
fallback implementation for C on HEAD.  I would prefer to hear first from
more people to see whether they like this change.  Or not.
--
Michael


signature.asc
Description: PGP signature


Re: v13 latest snapshot build error

2020-03-11 Thread Devrim Gündüz

Hi Tom,

[Adding Christoph]

On Wed, 2020-03-11 at 21:07 +, Devrim Gunduz wrote:
> This happens on RHEL 8. I don't think it's that bleeding edge.

Moreover, my RHEL 7 builds are broken as well:

+ MAKELEVEL=0
+ /usr/bin/make -j4 all
In file included from /usr/include/time.h:37:0,
 from ../../../../src/include/portability/instr_time.h:64,
 from ../../../../src/include/executor/instrument.h:16,
 from ../../../../src/include/nodes/execnodes.h:18,
 from ../../../../src/include/executor/execdesc.h:18,
 from ../../../../src/include/executor/executor.h:17,
 from ../../../../src/include/commands/explain.h:16,
 from ../../../../src/include/commands/prepare.h:16,
 from guc.c:40:
guc.c:4068:3: error: called object is not a function or function pointer
   NULL, NULL, NULL
   ^
make[4]: *** [guc.o] Error 1
make[3]: *** [misc-recursive] Error 2
make[3]: *** Waiting for unfinished jobs
make[2]: *** [utils-recursive] Error 2
make[1]: *** [all-backend-recurse] Error 2
make: *** [all-src-recurse] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.PXbYKE (%build)
Bad exit status from /var/tmp/rpm-tmp.PXbYKE (%build)

IDK what is going on, but apparently something is broken somewhere.

Cheers,
--
Devrim Gündüz
Open Source Solution Architect, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR


signature.asc
Description: This is a digitally signed message part


Re: v13 latest snapshot build error

2020-03-11 Thread Tom Lane
Devrim =?ISO-8859-1?Q?G=FCnd=FCz?=  writes:
> On Wed, 2020-03-11 at 21:07 +, Devrim Gunduz wrote:
>> This happens on RHEL 8. I don't think it's that bleeding edge.

> Moreover, my RHEL 7 builds are broken as well:

Hm.  We have various RHEL and CentOS 7.x machines in the buildfarm,
and they aren't unhappy.  So seems like it must be something specific
to your build.  Are you carrying any other patches?

regards, tom lane




Re: Index Skip Scan

2020-03-11 Thread David Rowley
On Wed, 11 Mar 2020 at 16:44, Andy Fan  wrote:
>>
>>
>> I think the UniqueKeys may need to be changed from using
>> EquivalenceClasses to use Exprs instead.
>
>
> While trying to understand why UniqueKeys needs EquivalenceClasses, I
> saw your comments here.  I feel that a FuncExpr can't be
> used for a UniquePath even if we can create a unique index on f(a)
> and f->strict == true.  The reason is that even if we know a is not null
> and f->strict == true, it is still possible that f(a) == null, and a
> unique index allows more than one null value.  So shall we move further
> and use varattno instead of Expr?  If so, we can also use a list of
> Bitmapsets to represent multiple unique paths of a single RelOptInfo.

We do need some method to determine if NULL values are possible. At
the base relation level that can probably be done by checking NOT NULL
constraints and strict base quals. At higher levels, we can use strict
join quals as proofs.

As for a Bitmapset of varattnos, that would certainly work well at
the base relation level when there are no unique expression indexes,
but it's not so simple with join relations when the varattnos only
mean something when you know which base relation it comes from.  I'm
not saying that Lists of Exprs is ideal, but I think trying to
optimise some code that does not yet exist is premature.

There was some other talk in [1] on how we might make checking whether a
List contains a given Node cheaper.  That could be advantageous in a few
places in the query planner, and it might be useful for this too.

[1] 
https://www.postgresql.org/message-id/CAKJS1f8v-fUG8YpaAGj309ZuALo3aEk7f6cqMHr_AVwz1fKXug%40mail.gmail.com




Re: GSOC 2020 - Develop Performance Farm Benchmarks and Website (2020)

2020-03-11 Thread Mark Wong
Hi,

On Mon, Mar 09, 2020 at 08:35:10PM +, do w.r. (wrd1e16) wrote:
> I am very interested in the Develop Performance Farm Benchmarks and Website 
> (2020) project as one of the GSoC projects. Is it possible to link me up with 
> Andreas Scherbaum to discuss more and further understand the project?

I suggest reaching out on the #gsoc2020-students slack channel.  Details
on that, and other Postgres specific GSoC information, if you haven't
already seen it: https://wiki.postgresql.org/wiki/GSoC

Regards,
Mark

-- 
Mark Wong
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/




Re: v13 latest snapshot build error

2020-03-11 Thread Artur Zakirov

Hello,

On 3/12/2020 4:44 AM, Tom Lane wrote:

Devrim =?ISO-8859-1?Q?G=FCnd=FCz?=  writes:

I'm getting build error while building latest snapshot. Any idea why? Please
note that I'm adding this patch to the  tarball:
https://git.postgresql.org/gitweb/?p=pgrpms.git;a=blob;f=rpm/redhat/master/postgresql-13/master/postgresql-13-var-run-socket.patch;h=a0292a80ae219b4c8dc1c2e686a3521f02b4330d;hb=HEAD


(Hey, I recognize that patch ...)


In file included from ../../../../src/include/c.h:55,
  from ../../../../src/include/postgres.h:46,
  from guc.c:17:
../../../../src/include/pg_config_manual.h:200:31: error: called object is not
a function or function pointer
  #define DEFAULT_PGSOCKET_DIR  "/var/run/postgresql"
^
guc.c:4064:3: note: in expansion of macro 'DEFAULT_PGSOCKET_DIR'
DEFAULT_PGSOCKET_DIR ", /tmp"
^~~~
make[4]: *** [: guc.o] Error 1


That is just weird.  Could it be a compiler bug?  I assume you're
using some bleeding-edge gcc version, and it's really hard to see
another reason why this would fail, especially with a nonsensical
error like that.


I'm not familiar with the patch itself. But I think there is just a
missing comma here, after ", /tmp" :-)



--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -4061,7 +4061,7 @@ static struct config_string ConfigureNamesString[] =
},
&Unix_socket_directories,
 #ifdef HAVE_UNIX_SOCKETS
-   DEFAULT_PGSOCKET_DIR,
+   DEFAULT_PGSOCKET_DIR ", /tmp"
 #else
"",
 #endif


So, it should be:

 #ifdef HAVE_UNIX_SOCKETS
-   DEFAULT_PGSOCKET_DIR,
+   DEFAULT_PGSOCKET_DIR ", /tmp",
 #else
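For anyone puzzled by the syntax: `DEFAULT_PGSOCKET_DIR ", /tmp"` relies on C's compile-time concatenation of adjacent string literals. A minimal illustration, with the macro name invented here:

```c
#define SOCKET_DIR "/var/run/postgresql"

/* Adjacent string literals are concatenated at translation time, so this
 * initializer is the single string "/var/run/postgresql, /tmp". */
static const char *socket_dirs = SOCKET_DIR ", /tmp";
```

Without the trailing comma in the struct initializer, the next field's NULL (which typically expands to something like ((void *) 0)) ends up directly after the string literal, and the compiler parses that as a call on the string, which would match the reported "called object is not a function" error.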


--
Artur




Re: v13 latest snapshot build error

2020-03-11 Thread Tom Lane
Artur Zakirov  writes:
> I'm not familiar with the patch itself. But I think there is just a lack 
> of the comma here, after ", /tmp" :-)

[ blink... ]  There definitely is a comma there in the version of the
patch that's in the Fedora repo.

regards, tom lane




A bug when use get_bit() function for a long bytea string

2020-03-11 Thread movead...@highgo.ca
Hello hackers,

I found an issue with the get_bit() and set_bit() functions; here it is:

postgres=# select 
get_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0);
2020-03-12 10:05:23.296 CST [10549] ERROR:  index 0 out of valid range, 0..-1
2020-03-12 10:05:23.296 CST [10549] STATEMENT:  select 
get_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0);
ERROR:  index 0 out of valid range, 0..-1
postgres=# select 
set_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0,1);
2020-03-12 10:05:27.959 CST [10549] ERROR:  index 0 out of valid range, 0..-1
2020-03-12 10:05:27.959 CST [10549] STATEMENT:  select 
set_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0,1);
ERROR:  index 0 out of valid range, 0..-1
postgres=#

PostgreSQL can handle bytea values of nearly 1 GB, but here it reports an
error at 512 MB.  I investigated and found that byteaSetBit() and
byteaGetBit() use an 'int32 len' variable to hold the number of bits of the
long bytea value, and 512 MB * 8 bits obviously overflows an int32.
So I fixed it, and it tests OK, as below.

postgres=# select
get_bit(set_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0, 1), 0);
 get_bit
---------
       1
(1 row)

postgres=# select
get_bit(set_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0, 0), 0);
 get_bit
---------
       0
(1 row)
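The arithmetic is easy to reproduce in isolation: for a 512 MB value, len * 8 is exactly 2^32, which wraps to 0 in 32-bit arithmetic, so the computed valid range becomes 0..-1 as in the error above. A hedged sketch (the function names are mine, not from the patch):

```c
#include <stdint.h>

/* The buggy shape: the bit count is computed and stored in 32 bits.
 * (The cast through uint32_t makes the wraparound well-defined for this
 * illustration; overflowing a signed int is undefined behavior.) */
static int32_t
bitlen_as_int32(int32_t nbytes)
{
    return (int32_t) ((uint32_t) nbytes * 8);
}

/* The fixed shape: widen to 64 bits before multiplying. */
static int64_t
bitlen_as_int64(int32_t nbytes)
{
    return (int64_t) nbytes * 8;
}
```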



I also checked whether anything else related to bytea has this issue;
several other places have the same problem:

1. byteaout(): when formatting bytea as an escape string, the 'len'
variable should be int64, or it may hold an overflowed number.

2. esc_enc_len(): same as above, the 'len' variable should be int64, and
the return type should change to int64 as well.  Because esc_enc_len()
shares a call structure with pg_base64_enc_len() and hex_enc_len(), I want
to change the return type of those two functions too.  The opposite
function, esc_dec_len(), seems fine.

3. binary_encode() and binary_decode(): these use an 'int32 resultlen' to
accept an 'unsigned int' function return, which seems uncomfortable.

I fixed everything mentioned above; patch attached.
What do you think about that?




Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


long_bytea_string_bug_fix.patch
Description: Binary data


Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-11 Thread David Rowley
On Wed, 11 Mar 2020 at 17:24, Laurenz Albe  wrote:
>
> On Wed, 2020-03-11 at 12:00 +0900, Masahiko Sawada wrote:
> > I might be missing your point but could you elaborate on that in what
> > kind of case you think this lead to unnecessary vacuums?
>
> If you have an insert-only table that has 10 entries, it will get
> vacuumed roughly every 2 new entries.  The impact is probably too
> little to care, but it will increase the contention for the three
> autovacuum workers available by default.

I guess that depends on your definition of unnecessary. If you want
Index Only Scans, then those settings don't seem unreasonable.  If you
want it just to reduce the chances or impact of an anti-wraparound
vacuum then likely it's a bit too often.

I understand this patch was born due to the anti-wraparound case, but
should we really just ignore the Index Only Scan case?




Re: BEFORE ROW triggers for partitioned tables

2020-03-11 Thread Ashutosh Bapat
On Wed, Mar 11, 2020 at 8:53 PM Ashutosh Bapat
 wrote:
>
> On Thu, Feb 27, 2020 at 10:22 PM Alvaro Herrera
>  wrote:
> >
> > * The "root" is not necessarily the root partitioned table, but instead
> > it's the table that was named in the command.  Because of this, we don't
> > need to acquire locks on the tables, since the executor already has the
> > tables open and locked (thus they cannot be modified by concurrent
> > commands).
>
> I believe this works because the partition-level constraints on the
> table that was named in the command would catch any violation from a
> partition key change at the levels above that table.
>
> Would it be easier to subject the new tuple to the partition-level
> constraints themselves and report if those are violated?  See
> RelationGetPartitionQual() for getting partition constraints. This
> function includes partition constraints from all the levels so in your
> function you don't have to walk up the partition tree. It includes
> constraints from the level above the table that was named in the
> command, but that might be fine. We will catch the error earlier and
> may be provide a better error message.

I realized that this will implement the third option in your original
proposal, not the second one. I suppose that's fine too?
-- 
Best Wishes,
Ashutosh Bapat




Re: [PATCH] Fix for slow GIN index queries when "gin_fuzzy_search_limit" setting is relatively small for large tables

2020-03-11 Thread Adé
Hi, Tom. Thanks for taking a look.


> It seems like what you're actually trying
> to accomplish is to ensure that entryLoadMoreItems's "stepright" path
> is taken, instead of re-descending the index from the root.


What I was primarily trying to do is make sure that when entryLoadMoreItems
is called, it loads new items that it didn’t load previously, which would
occur in special cases. But the solution to that goal does result in the
"stepright" path being used. I explain the main goal in a segment of my bug
report this way, though it's a bit longwinded (from
https://www.postgresql.org/message-id/16220-1a0a4f0cb67ca...@postgresql.org
):


Since the program doesn't load all items into memory at once, it calls the
> "entryLoadMoreItems" function when it needs to get another page of items to
> iterate through. The "entryLoadMoreItems" function calls are passed an
> "advancePast" variable as an argument. This variable decides which leaf
> page of the items/posting tree more items should be retrieved from. Usually when
> iterating through all possible results, execution will exit the do-while
> loop responsible for iteration in order to perform some important actions
> (including the updating of the "advancePast" variable) before returning
> into
> the loop again to iterate over more items. However, when "dropItem" returns
> true in succession a great many times, the do-while loop can not be exited
> for updating the "advancePast" variable until a non-drop finally occurs.
> When this "advancePast" variable is not updated it leads to calls to
> "entryLoadMoreItems" repeatedly returning the same items when stuck in the
> do-while loop by a high succession of dropped items (because "advancePast"
> is never updated to a value after items already iterated through).



> Along with the issue of returning the same items, there's the issue of how
> the "entryLoadMoreItems" function traverses the posting tree from the root
> each time it's called while stuck in the do-while loop. This especially is
> the cause for the bad performance seen for low "gin_fuzzy_search_limit"
> values. In some cases, this situation is made even worse when "advancePast"
> is set to a value that leads to loading a page of items that has relatively
> few items actually past "advancePast", and so it must almost immediately
> call "entryLoadMoreItems" again. But because "advancePast" never gets
> updated, this results in a higher than usual succession of
> "entryLoadMoreItems" function calls (the program loads the same page,
> iterates over the same relatively few items until it goes and loads the
> same page again), with each call requiring traversal from the root of the
> posting tree down to the same leaf page as before.



> My patch makes it so that when stuck in the do-while loop after many
> successive "dropItems" returning true, the program instead now loads actual
> new items by passing the last item dropped into the "entryLoadMoreItems"
> function instead of the "advancePast" variable that can't be appropriately
> updated while stuck in the do-while loop. This means "entryLoadMoreItems"
> will instead load items ordered right after the last dropped item. This
> last item dropped is also the current item ("curItem") and so
> "entryLoadMoreItems" can directly obtain the next page of items by making a
> step right from the current page, instead of traversing from the root of
> the posting tree, which is the most important fix for performance.

Regarding this:

While we're here, what do you think about the comment in the other
> code branch just above:
> /* XXX: shouldn't we apply the fuzzy search limit here? */
> I'm personally inclined to suspect that the fuzzy-search business has
> got lots of bugs, which haven't been noticed because (a) it's so squishily
> defined that one can hardly tell whether a given result is buggy or not,
> and (b) nearly nobody uses it anyway (possibly not unrelated to (a)).
> As somebody who evidently is using it, you're probably more motivated
> to track down bugs than the rest of us.


I think the comment is correct. It should be applied if you are to stay
consistent. Like the comment above that comment says, that code branch is
for the two cases of either (1) reaching the last page of a posting tree or
(2) when an "entry"/keyword has so few results that the item pointers fit
in just one page containing a posting list. If there is a chance of a
dropped item in the other pages of the posting tree, there should be a
chance of dropped items in the last page too, for consistency's sake at least.
And there should also be a chance of dropped items when iterating a single
posting list of entry with relatively few results. Placing "||
(entry->reduceResult == true && dropItem(entry))" at the end of the while
condition should be all that's needed to apply the fuzzy search limit there.

And I agree that probably usage of the fuzzy search feature is extremely
rare and the way I'm using it pr

Re: Refactor compile-time assertion checks for C/C++

2020-03-11 Thread Tom Lane
Michael Paquier  writes:
> Indeed the bot is happy now.  By looking at the patch, one would note
> that what it just does is unifying the fallback "hack-ish"
> implementations so as C and C++ use the same thing, which is the
> fallback implementation for C of HEAD.  I would prefer hear first from
> more people to see if they like this change.  Or not.

I looked at this and tried it on an old (non HAVE__STATIC_ASSERT)
gcc version.  Seems to work, but I have a couple cosmetic suggestions:

1. The comment block above this was never updated to mention that
we're now catering for C++ too.  Maybe something like

  * gcc 4.6 and up supports _Static_assert(), but there are bizarre syntactic
  * placement restrictions.  Macros StaticAssertStmt() and StaticAssertExpr()
  * make it safe to use as a statement or in an expression, respectively.
  * The macro StaticAssertDecl() is suitable for use at file scope (outside of
  * any function).
  *
+ * On recent C++ compilers, we can use standard static_assert().
+ *
  * Otherwise we fall back on a kluge that assumes the compiler will complain
  * about a negative width for a struct bit-field.  This will not include a
  * helpful error message, but it beats not getting an error at all.


2. I think you could simplify the #elif to just

#elif defined(__cplusplus) && __cpp_static_assert >= 200410

Per the C standard, an unrecognized identifier in an #if condition
is replaced by zero.  So the condition will come out false as desired
if __cpp_static_assert isn't defined; you don't need to test that
separately.

regards, tom lane




Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-03-11 Thread Michael Paquier
On Wed, Mar 11, 2020 at 12:27:03PM +0900, Michael Paquier wrote:
>> Also, I think the function comment could stand some more detailing.
> 
> What kind of additional information would you like to add on top of
> what the updated version attached does?

I have been working more on that, and attached is an updated version
of the main feature based on 0001, attached as 0002.  I have applied
the following changes, fixing some actual bugs while on it:
- Added back expectedSize, so as frontends can also fetch things other
than WAL segments from the archives, including history files.
pg_rewind needs only WAL segments, still other tools may want more.
- Fixed an issue with ENOENT when doing stat() on a segment after
running the restore command.  Like the backend, there is an argument
for looping here.
- Switched to issuing exit() for the several possible failure types
instead of looping.  If a file has an incorrect expected size, or if
stat() fails with a non-ENOENT error, things are likely to repeat if the
frontend tool loops, so exiting immediately is safer.

I think that we need a better name for RestoreArchivedFile() in
fe_archive.c.  The backend uses RestoreArchivedFile(), and the
previous versions of the patch switched to RestoreArchivedWALFile()
which is incorrect if working on something else than a WAL segment.
The version attached uses FrontendRestoreArchivedFile(), still we
could do better.

I'd like to commit the refactoring piece in 0001 tomorrow, then let's
move on with the rest as of 0002.  If more comments and docs are
needed for archive.c, let's continue discussing that.
--
Michael
From 43ac17434f5fba9effd77c1962d46589ba693874 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 12 Mar 2020 13:22:34 +0900
Subject: [PATCH 1/2] Move routine generating restore_command to src/common/

restore_command has only been used until now by the backend, but there
is a pending patch for pg_rewind to make use of that in the frontend.

Author: Alexey Kondratov
Reviewed-by: Andrey Borodin, Andres Freund, Alvaro Herrera, Alexander
Korotkov, Michael Paquier
Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee3616895...@postgrespro.ru
---
 src/include/common/archive.h |  27 +
 src/backend/access/transam/xlogarchive.c |  66 ++---
 src/common/Makefile  |   1 +
 src/common/archive.c | 119 +++
 src/tools/msvc/Mkvcbuild.pm  |   4 +-
 5 files changed, 159 insertions(+), 58 deletions(-)
 create mode 100644 src/include/common/archive.h
 create mode 100644 src/common/archive.c

diff --git a/src/include/common/archive.h b/src/include/common/archive.h
new file mode 100644
index 00..8af0b4566f
--- /dev/null
+++ b/src/include/common/archive.h
@@ -0,0 +1,27 @@
+/*-
+ *
+ * archive.h
+ *	  Common WAL archive routines
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/common/archive.h
+ *
+ *-
+ */
+#ifndef ARCHIVE_H
+#define ARCHIVE_H
+
+/*
+ * Routine to build a restore command to retrieve WAL segments from a WAL
+ * archive.  This uses the arguments given by the caller to replace the
+ * corresponding alias fields as defined in the GUC parameter
+ * restore_command.
+ */
+extern char *BuildRestoreCommand(const char *restoreCommand,
+ const char *xlogpath,	/* %p */
+ const char *xlogfname, /* %f */
+ const char *lastRestartPointFname);	/* %r */
+
+#endif			/* ARCHIVE_H */
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 188b73e752..914ad340ea 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -21,6 +21,7 @@
 
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
+#include "common/archive.h"
 #include "miscadmin.h"
 #include "postmaster/startup.h"
 #include "replication/walsender.h"
@@ -53,11 +54,8 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 	bool cleanupEnabled)
 {
 	char		xlogpath[MAXPGPATH];
-	char		xlogRestoreCmd[MAXPGPATH];
+	char	   *xlogRestoreCmd;
 	char		lastRestartPointFname[MAXPGPATH];
-	char	   *dp;
-	char	   *endp;
-	const char *sp;
 	int			rc;
 	struct stat stat_buf;
 	XLogSegNo	restartSegNo;
@@ -149,58 +147,13 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 	else
 		XLogFileName(lastRestartPointFname, 0, 0L, wal_segment_size);
 
-	/*
-	 * construct the command to be executed
-	 */
-	dp = xlogRestoreCmd;
-	endp = xlogRestoreCmd + MAXPGPATH - 1;
-	*endp = '\0';
-
-	for (sp = recoveryRestoreCommand; *sp; sp++)
-	{
-		if (*sp == '%')
-		{
-			switch (sp[1])
-			{
-case 'p':
-	/* %p: relative path of target file */
-	sp++;
-	StrNCpy(dp, xlogpa

Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-11 Thread David Rowley
On Wed, 11 Mar 2020 at 19:00, Masahiko Sawada
 wrote:
>
> On Wed, 11 Mar 2020 at 13:24, Laurenz Albe  wrote:
> > If you have an insert-only table that has 10 entries, it will get
> > vacuumed roughly every 2 new entries.  The impact is probably too
> > little to care, but it will increase the contention for the three
> > autovacuum workers available by default.
>
> The same is true for read-write table, right? If that becomes a
> problem, it's a mis-configuration and user should increase these
> values just like when we set these values for read-write tables.

It is true that if vacuum has more to do than it can do, then
something is not configured correctly.

I imagine Laurenz set the scale factor to 0.0 and the threshold to 10
million to reduce the chances that someone will encounter that
problem. I mentioned somewhere upthread that I commonly used to see
production servers running with the standard vacuum_cost_limit of 200
and the (pre-PG12) autovacuum_vacuum_cost_delay of 20. Generally, it
didn't go well for them.  autovacuum_vacuum_cost_delay is now 2 by
default, so auto-vacuum in PG12 and beyond runs 10x faster, but it's
still pretty conservative and it'll still need another bump in several
years when hardware is faster than it is today. So, by no means did
that 10x increase mean that nobody will suffer from auto-vacuum
starvation ever again.

It remains to be seen whether adding additional work onto
auto-vacuum will help or hinder those people.  If their auto-vacuum
can just keep up until the cluster is old enough to need
anti-wraparound vacuums and then falls massively behind, then perhaps
this is a good thing as they might notice at some point before their
server explodes in the middle of the night. By that time they might
have become complacent. Additionally, I think this is pretty well
aligned to the case mentioned in the subject line of this email.  We
now have a freeze map, so performing vacuums to freeze tuples twice as
often is not really much more expensive in total than doing that
vacuuming half as often. Even tables (e.g. log tables) that are never
queried won't become much more costly to maintain.  Meanwhile, for
tables that do receive queries, we're more likely to get an
index-only scan.

Perhaps a good way to decide what the scale_factor should be set to is
to compare the run-time of an Index Only Scan with that of an Index Scan.

create table ios (a int, b text);
insert into ios select x,x::text from generate_series(1,100)x;
create index on ios (a);
vacuum analyze ios;

explain (analyze, buffers) select a from ios order by a; -- on 2nd exec
  QUERY PLAN
--
 Index Only Scan using ios_a_idx on ios  (cost=0.42..25980.42
rows=100 width=4) (actual time=0.035..212.602 rows=100
loops=1)
   Heap Fetches: 0
   Buffers: shared hit=2736
 Planning Time: 0.095 ms
 Execution Time: 246.864 ms
(5 rows)

set enable_indexonlyscan=0;
explain (analyze, buffers) select a from ios order by a;
   QUERY PLAN
-
 Index Scan using ios_a_idx on ios  (cost=0.42..31388.42 rows=100
width=4) (actual time=0.036..451.381 rows=100 loops=1)
   Buffers: shared hit=8140
 Planning Time: 0.089 ms
 Execution Time: 486.582 ms
(4 rows)

So it's about twice as fast with the IOS. Whether it's going to be beneficial
to perform the vacuum will depend on the read-to-insert ratio.  I'm
starting to think that we should set the scale_factor to something
like 0.3 and the threshold to 50. Is anyone strongly against that?  Or
Laurenz, are you really set on the 10 million threshold?




Re: [PATCH] Replica sends an incorrect epoch in its hot standby feedback to the Master

2020-03-11 Thread Thomas Munro
On Wed, Mar 11, 2020 at 7:47 PM Thomas Munro  wrote:
> On Sat, Feb 22, 2020 at 6:10 AM Palamadai, Eka  wrote:
> > Thanks a lot for the feedback. Please let me know if you have any further 
> > comments. Meanwhile, I have also added this patch to "Commitfest 2020-03" 
> > at https://commitfest.postgresql.org/27/2464.
>
> Thanks for the excellent reproducer for this obscure bug.  You said
> the problem exists in 9.6-11, but I'm also able to reproduce it in
> 9.5.  That's the oldest supported release, but it probably goes back
> further.  I confirmed that this patch fixes the immediate problem.
> I've attached a version of your patch with a commit message, to see if
> anyone has more feedback on this.

Pushed.




RE: Planning counters in pg_stat_statements (using pgss_store)

2020-03-11 Thread imai.yoshik...@fujitsu.com
Hi Julien,

On Mon, Mar 9, 2020 at 10:32 AM, Julien Rouhaud wrote:
> On Thu, Mar 05, 2020 at 01:26:19PM -0700, legrand legrand wrote:
> > Please consider PG13 shortest path ;o)
> >
> > My one is  parse->queryId != UINT64CONST(0) in pgss_planner_hook().
> > It fixes IVM problem (verified),
> > and keep CTAS equal to pgss without planning counters (verified too).
> 
> I still disagree that hiding this problem is the right fix, but since no one
> objected here's a v5 with that behavior.  Hopefully this will be fixed in v14.

Is there any case where query_text will be NULL when executing pg_plan_query?
If query_text can be NULL, we need to add code to avoid errors in
pgss_store, as the current patch does. If query_text cannot be NULL, we should
add an Assert in pg_plan_query so that other developers can notice that they
must not mistakenly set query_text to NULL, even without using the pgss
planning counter.

--
Yoshikazu Imai





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-11 Thread Laurenz Albe
On Thu, 2020-03-12 at 17:47 +1300, David Rowley wrote:
> I'm starting to think that we should set the scale_factor to something
> like 0.3 and the threshold to 50. Is anyone strongly against that?  Or
> Laurenz, are you really set on the 10 million threshold?

These values are almost the same as "autovacuum_vacuum_scale_factor"
and "autovacuum_vacuum_threshold", so you actually agree with Masahiko
with the exception that you want it tunable separately.

I don't like the high scale factor.

If your insert-only table was last vacuumed when it had 500 million rows,
the next autovacuum will freeze 150 million tuples, which is a lot.
The impact will be less than that of an anti-wraparound vacuum because
it is not as persistent, but if our 150 million tuple autovacuum backs
down because it hits a lock or gets killed by the DBA, that is also not
good, since it will just come again.
And the bigger the vacuum run is, the more likely it is to meet an obstacle.

So I think that large insert-only tables should be vacuumed more often
than that.  If the number of tuples that have to be frozen is small,
the vacuum run will be short and is less likely to cause problems.
That is why I chose a scale factor of 0 here.


But I totally see your point about index-only scans.

I think the problem is that this insert-only autovacuum serves two masters:
1. preventing massive anti-wraparound vacuum that severely impacts the system
2. maintaining the visibility map for index-only scans

I thought of the first case when I chose the parameter values.

I am afraid that we cannot come up with one setting that fits all, so I
advocate a setting that targets the first problem, which I think is more
important (and was the motivation for this thread).

I could add a paragraph to the documentation that tells people how to
configure the parameters if they want to use it to get index-only scans.
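Such a paragraph could suggest per-table settings along these lines (reloption names as proposed in the patch; the values are only illustrative and would need tuning per workload):

```sql
-- Favor index-only scans on a busy insert-only table by vacuuming it
-- more eagerly than the proposed defaults.
ALTER TABLE insert_only_log
    SET (autovacuum_vacuum_insert_threshold = 50000,
         autovacuum_vacuum_insert_scale_factor = 0.01);
```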

Yours,
Laurenz Albe





Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-11 Thread Dilip Kumar
On Wed, Mar 11, 2020 at 2:23 PM Amit Kapila  wrote:
>
> On Tue, Mar 10, 2020 at 8:53 AM Dilip Kumar  wrote:
> >
> > Please find the updated patch (summary of the changes)
> > - Instead of searching the lock hash table for assert, it maintains a 
> > counter.
> > - Also, handled the case where we can acquire the relation extension
> > lock while holding the relation extension lock on the same relation.
> > - Handled the error case.
> >
> > In addition to that prepared a WIP patch for handling the PageLock.
> > First, I thought that we can use the same counter for the PageLock and
> > the RelationExtensionLock because in assert we just need to check
> > whether we are trying to acquire any other heavyweight lock while
> > holding any of these locks.  But, the exceptional case where we
> > allowed to acquire a relation extension lock while holding any of
> > these locks is a bit different.  Because, if we are holding a relation
> > extension lock then we allowed to acquire the relation extension lock
> > on the same relation but it can not be any other relation otherwise it
> > can create a cycle.  But, the same is not true with the PageLock,
> > i.e. while holding the PageLock you can acquire the relation extension
> > lock on any relation and that will be safe because the relation
> > extension lock guarantee that, it will never create the cycle.
> > However, I agree that we don't have any such cases where we want to
> > acquire a relation extension lock on the different relations while
> > holding the PageLock.
> >
>
> Right, today, we don't have such cases where after acquiring relation
> extension or page lock for a particular relation, we need to acquire
> any of those for other relation and I am not able to offhand think of
> many cases where we might have such a need in the future.  The one
> theoretical possibility is to include fork_num in the lock tag while
> acquiring extension lock for fsm/vm, but that will also have the same
> relation.  Similarly one might say it is valid to acquire extension
> lock in share mode after we have acquired it exclusive mode.  I am not
> sure how much futuristic we want to make these Asserts.
>
> I feel we should cover the current possible cases (which I think will
> make the asserts more strict then required) and if there is a need to
> relax them in the future for any particular use case, then we will
> consider those.  In general, if we consider the way Mahendra has
> written a patch which is to find the entry via the local hash table to
> check for an Assert condition, then it will be a bit easier to extend
> the checks if required in future as that way we have more information
> about the particular lock. However, it will make the check more
> expensive which might be okay considering that it is only for Assert
> enabled builds.
>
> One minor comment:
> /*
> + * We should not acquire any other lock if we are already holding the
> + * relation extension lock.  Only exception is that if we are trying to
> + * acquire the relation extension lock then we can hold the relation
> + * extension on the same relation.
> + */
> + Assert(!IsRelExtLockHeld() ||
> +((locktag->locktag_type == LOCKTAG_RELATION_EXTEND) && found));

I have fixed this in the attached patch set.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


v4-0001-Add-assert-to-check-that-we-should-not-acquire-an.patch
Description: Binary data


v4-0003-Conflict-Extension-Page-lock-in-group-member.patch
Description: Binary data


v4-0002-WIP-Extend-the-patch-for-handling-PageLock.patch
Description: Binary data


Re: [PATCH] Skip llvm bytecode generation if LLVM is missing

2020-03-11 Thread Craig Ringer
On Thu, 12 Mar 2020 at 03:43, Andres Freund  wrote:

> On 2020-03-11 11:25:28 +0800, Craig Ringer wrote:
> > I propose that per the attached patch PGXS should simply skip adding
> > the automatic dependency for .bc files if clang cannot be found.
> > Extensions may still choose to explicitly declare the rule in their
> > own Makefile if they want to force bitcode generation.
>
> Hm, that seems like it could also cause silent failures (e.g. after a
> package upgrade or such).
>
> How about erroring out, but with an instruction that llvm can be
> disabled with make NO_LLVM=1 or such?

I thought about that at first, but that'll only benefit people who're
hand-compiling things, and it's already possible with

make with_llvm=no ...

The proportion of people hand-compiling is an ever-shrinking
proportion of the user base. When something's nested inside an rpm
specfile inside a docker container inside a bash script inside another
Docker container on an AWS instance  not so fun. They might be
able to inject it into the environment. But often not.

Extensions that explicitly must generate bytecode may add their own
dependency rule. Or we could make skipping bytecode generation if llvm
cannot be found at build-time something the extension can turn off
with a PGXS option, as suggested upthread.
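For anyone hand-building, the existing knob works with a standard PGXS Makefile like the following (minimal sketch; "my_extension" is a placeholder name):

```makefile
# Minimal PGXS extension Makefile; "my_extension" is hypothetical.
MODULES = my_extension
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
```

Bitcode generation can then be skipped at build time with `make with_llvm=no install`, which is the workaround available today but, as noted above, is hard to reach from inside packaging and container layers.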

I'm reluctant to go with anything that requires each existing
extension to be patched because that introduces a huge lag time for
this change to actually help anyone out.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise




Re: Some problems of recovery conflict wait events

2020-03-11 Thread Masahiko Sawada
On Wed, 11 Mar 2020 at 16:42, Fujii Masao  wrote:
>
>
>
> On 2020/03/10 13:54, Masahiko Sawada wrote:
> > On Tue, 10 Mar 2020 at 00:57, Fujii Masao  
> > wrote:
> >>
> >>
> >>
> >> On 2020/03/09 14:10, Masahiko Sawada wrote:
> >>> On Mon, 9 Mar 2020 at 13:24, Fujii Masao  
> >>> wrote:
> 
> 
> 
>  On 2020/03/08 13:52, Masahiko Sawada wrote:
> > On Thu, 5 Mar 2020 at 20:16, Fujii Masao  
> > wrote:
> >>
> >>
> >>
> >> On 2020/03/05 16:58, Masahiko Sawada wrote:
> >>> On Wed, 4 Mar 2020 at 15:21, Fujii Masao 
> >>>  wrote:
> 
> 
> 
>  On 2020/03/04 14:31, Masahiko Sawada wrote:
> > On Wed, 4 Mar 2020 at 13:48, Fujii Masao 
> >  wrote:
> >>
> >>
> >>
> >> On 2020/03/04 13:27, Michael Paquier wrote:
> >>> On Wed, Mar 04, 2020 at 01:13:19PM +0900, Masahiko Sawada wrote:
>  Yeah, so 0001 patch sets existing wait events to recovery 
>  conflict
>  resolution. For instance, it sets (PG_WAIT_LOCK | 
>  LOCKTAG_TRANSACTION)
>  to the recovery conflict on a snapshot. 0003 patch improves 
>  these wait
>  events by adding the new type of wait event such as
>  WAIT_EVENT_RECOVERY_CONFLICT_SNAPSHOT. Therefore 0001 (and 0002) 
>  patch
>  is the fix for existing versions and 0003 patch is an 
>  improvement for
>  only PG13. Did you mean even 0001 patch doesn't fit for 
>  back-patching?
> >>
> >> Yes, it looks like a improvement rather than bug fix.
> >>
> >
> > Okay, understand.
> >
> >>> I got my eyes on this patch set.  The full patch set is in my 
> >>> opinion
> >>> just a set of improvements, and not bug fixes, so I would refrain 
> >>> from
> >>> back-backpatching.
> >>
> >> I think that the issue (i.e., "waiting" is reported twice 
> >> needlessly
> >> in PS display) that 0002 patch tries to fix is a bug. So it should 
> >> be
> >> fixed even in the back branches.
> >
> > So we need only two patches: one fixes process title issue and 
> > another
> > improve wait event. I've attached updated patches.
> 
>  Thanks for updating the patches! I started reading 0001 patch.
> >>>
> >>> Thank you for reviewing this patch.
> >>>
> 
>  -   /*
>  -* Report via ps if we have been waiting for 
>  more than 500 msec
>  -* (should that be configurable?)
>  -*/
>  -   if (update_process_title && new_status == 
>  NULL &&
>  -   
>  TimestampDifferenceExceeds(waitStart, GetCurrentTimestamp(),
>  -
>    500))
> 
>  The patch changes ResolveRecoveryConflictWithSnapshot() and
>  ResolveRecoveryConflictWithTablespace() so that they always add
>  "waiting" into the PS display, whether wait is really necessary or 
>  not.
>  But isn't it better to display "waiting" in PS basically when wait is
>  necessary, like originally ResolveRecoveryConflictWithVirtualXIDs()
>  does as the above?
> >>>
> >>> You're right. Will fix it.
> >>>
> 
>    ResolveRecoveryConflictWithDatabase(Oid dbid)
>    {
>  +   char*new_status = NULL;
>  +
>  +   /* Report via ps we are waiting */
>  +   new_status = set_process_title_waiting();
> 
>  In  ResolveRecoveryConflictWithDatabase(), there seems no need to
>  display "waiting" in PS because no wait occurs when recovery conflict
>  with database happens.
> >>>
> >>> Isn't the startup process waiting for other backend to terminate?
> >>
> >> Yeah, you're right. I agree that "waiting" should be reported in this 
> >> case.
> >>
> >> Currently ResolveRecoveryConflictWithLock() and
> >> ResolveRecoveryConflictWithBufferPin() don't call
> >> ResolveRecoveryConflictWithVirtualXIDs and don't report "waiting"
> >> in PS display. You changed them so that they report "waiting". I agree
> >> to have this change. But this change is an improvement rather than
> >> a bug fix, i.e., we should apply this change only in v13?
> >>
> >
> > Did you mean ResolveRecoveryConflictWithDatabase and
> > ResolveRecoveryConflictWithBufferPin?
> 
>  Yes! Sorry for my typo.
> 
> > In the current code as far as I
> > researched there are two cases where we don't add "waiting"

Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-11 Thread David Rowley
On Thu, 12 Mar 2020 at 18:38, Laurenz Albe  wrote:
>
> On Thu, 2020-03-12 at 17:47 +1300, David Rowley wrote:
> > Laurenz, are you really set on the 10 million threshold?
>
> These values are almost the same as "autovacuum_vacuum_scale_factor"
> and "autovacuum_vacuum_threshold", so you actually agree with Masahiko
> with the exception that you want it tunable separately.
>
> I don't like the high scale factor.
>
> If your insert-only table was last vacuumed when it had 500 million rows,
> the next autovacuum will freeze 150 million tuples, which is a lot.
> The impact will be less than that of an anti-wraparound vacuum because
> it is not as persistent, but if our 150 million tuple autovacuum backs
> down because it hits a lock or gets killed by the DBA, that is also not
> good, since it will just come again.
> And the bigger the vacuum run is, the more likely it is to meet an obstacle.
>
> So I think that large insert-only tables should be vacuumed more often
> than that.  If the number of tuples that have to be frozen is small,
> the vacuum run will be short and is less likely to cause problems.
> That is why I chose a scale factor of 0 here.

That's a good point.  If those 150 million inserts were done one per
transaction, then it wouldn't take many more tuples before wraparound
vacuums occur more often than insert vacuums.  The only way I see
around that is to a) configure it the way you'd like, or; b) add yet
another GUC and reloption to represent how close to
autovacuum_freeze_max_age / autovacuum_multixact_freeze_max_age the
table is.  I'm not very excited about adding yet another GUC, plus
anti-wraparound vacuums already occur 10 times more often than they
need to. If we added such a GUC and set it to, say, 0.1, then they'd
happen 100 times more often than needed before actual wraparound
occurs.

I'm starting to see now why you were opposed to the scale_factor in
the first place.

I really think that this is a problem with the design of the
threshold and scale_factor system.  I used to commonly see people with
larger tables zeroing out the scale_factor and setting a reasonable
threshold or dropping the scale_factor down to some fraction of a
percent. I don't really have any better design in mind though, at
least not one that does not require adding new vacuum options.




Re: Planning counters in pg_stat_statements (using pgss_store)

2020-03-11 Thread Julien Rouhaud
Hi Imai-san,

On Thu, Mar 12, 2020 at 05:28:38AM +, imai.yoshik...@fujitsu.com wrote:
> Hi Julien,
>
> On Mon, Mar 9, 2020 at 10:32 AM, Julien Rouhaud wrote:
> > On Thu, Mar 05, 2020 at 01:26:19PM -0700, legrand legrand wrote:
> > > Please consider PG13 shortest path ;o)
> > >
> > > My one is  parse->queryId != UINT64CONST(0) in pgss_planner_hook().
> > > It fixes IVM problem (verified),
> > > and keep CTAS equal to pgss without planning counters (verified too).
> >
> > I still disagree that hiding this problem is the right fix, but since no one
> > objected here's a v5 with that behavior.  Hopefully this will be fixed in 
> > v14.
>
> Is there any case that query_text will be NULL when executing pg_plan_query?

With current sources, there are no cases where the query text isn't provided
AFAICS.

> If query_text will be NULL, we need to add codes to avoid errors in
> pgss_store like as current patch. If query_text will not be NULL, we should
> add Assert in pg_plan_query so that other developers can notice that they
> would not mistakenly set query_text as NULL even without using pgss_planning
> counter.

I totally agree.  I already had such an assert locally, and regression tests pass
without any problem.  I'm attaching a v6 with that extra assert.  If the
first patch is committed, it'll now be a requirement to provide it.  Or if
people think it's not, it'll make sure that we'll discuss it.
>From 3be06d43f0ef306f159a1c4ce0755d3bb0a4dd03 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Thu, 28 Mar 2019 13:33:23 +0100
Subject: [PATCH v6 1/3] Pass query string to the planner

---
 src/backend/commands/copy.c  |  3 ++-
 src/backend/commands/createas.c  |  3 ++-
 src/backend/commands/explain.c   |  2 +-
 src/backend/commands/extension.c |  2 +-
 src/backend/commands/matview.c   |  2 +-
 src/backend/commands/portalcmds.c|  2 +-
 src/backend/executor/functions.c |  1 +
 src/backend/optimizer/plan/planner.c | 10 ++
 src/backend/tcop/postgres.c  | 16 +++-
 src/backend/utils/cache/plancache.c  |  3 ++-
 src/include/optimizer/optimizer.h|  3 ++-
 src/include/optimizer/planner.h  |  4 +++-
 src/include/tcop/tcopprot.h  |  6 --
 13 files changed, 37 insertions(+), 20 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fbde9f88e7..efb1e0d03e 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1580,7 +1580,8 @@ BeginCopy(ParseState *pstate,
 		}
 
 		/* plan the query */
-		plan = pg_plan_query(query, CURSOR_OPT_PARALLEL_OK, NULL);
+		plan = pg_plan_query(query, pstate->p_sourcetext,
+			 CURSOR_OPT_PARALLEL_OK, NULL);
 
 		/*
 		 * With row level security and a user using "COPY relation TO", we
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 3a5676fb39..9febdc5165 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -329,7 +329,8 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
 		Assert(query->commandType == CMD_SELECT);
 
 		/* plan the query */
-		plan = pg_plan_query(query, CURSOR_OPT_PARALLEL_OK, params);
+		plan = pg_plan_query(query, pstate->p_sourcetext,
+			 CURSOR_OPT_PARALLEL_OK, params);
 
 		/*
 		 * Use a snapshot with an updated command ID to ensure this query sees
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d901dc4a50..fb772aa5cd 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -375,7 +375,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
 		INSTR_TIME_SET_CURRENT(planstart);
 
 		/* plan the query */
-		plan = pg_plan_query(query, cursorOptions, params);
+		plan = pg_plan_query(query, queryString, cursorOptions, params);
 
 		INSTR_TIME_SET_CURRENT(planduration);
 		INSTR_TIME_SUBTRACT(planduration, planstart);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 00cf4ef268..38cbea385a 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -751,7 +751,7 @@ execute_sql_string(const char *sql)
 		   NULL,
 		   0,
 		   NULL);
-		stmt_list = pg_plan_queries(stmt_list, CURSOR_OPT_PARALLEL_OK, NULL);
+		stmt_list = pg_plan_queries(stmt_list, sql, CURSOR_OPT_PARALLEL_OK, NULL);
 
 		foreach(lc2, stmt_list)
 		{
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index c3954f3e24..e5a5eef102 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -391,7 +391,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	CHECK_FOR_INTERRUPTS();
 
 	/* Plan the query which will generate data for the refresh. */
-	plan = pg_plan_query(query, 0, NULL);
+	plan = pg_plan_query(query, queryString, 0, NULL);
 
 	/*
 	 * Use a snapshot with an updated command ID to ensure this query sees
diff --git a/src/backend/commands/portalcmds.c b/src/backend/command

Re: Online verification of checksums

2020-03-11 Thread Michael Banck
Hi,

thanks for reviewing this patch!

Am Donnerstag, den 27.02.2020, 10:57 + schrieb Asif Rehman:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   tested, passed
> Documentation:not tested
> 
> The patch applies cleanly and works as expected. Just a few minor 
> observations:
> 
> - I would suggest refactoring PageIsZero function by getting rid of 
> all_zeroes variable
> and simply returning false when a non-zero byte is found, rather than setting 
> all_zeros
> variable to false and breaking the for loop. The function should simply 
> return true at the
> end otherwise.


Good point, I have done so.

> - Remove the empty line:
> +* would throw an assertion failure.  Consider this a
> +* checksum failure.
> +*/
> +
> +   checksum_failures++;

Done

> - Code needs to run through pgindent.

Done.

> Also, I'd suggest to make "5" a define within the current file/function, 
> perhaps 
> something like "MAX_CHECKSUM_FAILURES". You could move the second 
> warning outside the conditional statement as it appears in both "if" and 
> "else" blocks.

Well, I think you have a valid point, but that would be a different
(non-bug-fix) patch, wouldn't it? This part is not changed by this
patch; the code is at most moved around.

New version attached.


Best regards,

Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz
From 084a2fe968a3ee9c0bb4f18fa9fbc8a582f38b3c Mon Sep 17 00:00:00 2001
From: Michael Banck 
Date: Thu, 12 Mar 2020 07:31:22 +0100
Subject: [PATCH] Fix checksum verification in base backups for random or zero page headers

We currently do not checksum a page if it is considered new by PageIsNew() or
if its pd_lsn page header field is larger than the LSN at the beginning of the
base backup. However, this means that if the whole page header is zero,
PageIsNew() will consider that page new due to the pd_upper field being zero
and no subsequent checksum verification is done. Also, a random value in the
pd_lsn field will very likely be larger than the LSN at backup start, again
resulting in the page not getting checksummed.

To fix, factor out the all-zeroes check from PageIsVerified() into a new function
PageIsZero() and call that for pages considered new by PageIsNew(). If a page
is all zero, consider that a checksum failure. Also check the pd_lsn field
against both the backup start pointer and the current pointer from
GetInsertRecPtr().

Add two more tests to the pg_basebackup TAP tests to check those errors.

Reported-By: Andres Freund
Reviewed-By: Asif Rehman
Discussion: https://postgr.es/m/20190319200050.ncuxejradurja...@alap3.anarazel.de
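The decision flow described in the commit message can be modelled as follows. This is a simplified illustration only: XLogRecPtr, the header struct, and the function name are stand-ins, not the actual PostgreSQL definitions, and the real code lives in sendFile() in basebackup.c:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's WAL location type. */
typedef uint64_t XLogRecPtr;

/* Minimal stand-in for the relevant PageHeaderData fields. */
typedef struct
{
	XLogRecPtr	pd_lsn;			/* LSN of last change to the page */
	uint16_t	pd_checksum;	/* stored checksum */
	uint16_t	pd_upper;		/* 0 means PageIsNew() in PostgreSQL */
} PageHeaderData;

/*
 * Should this page's checksum be verified?  Pages modified after backup
 * start may legitimately be torn, so they are skipped -- but only if
 * pd_lsn is plausible, i.e. not beyond the current insert pointer.  A
 * random pd_lsn past the insert pointer must not exempt the page.
 */
static bool
page_needs_checksum(const PageHeaderData *hdr,
					XLogRecPtr startptr,	/* LSN at backup start */
					XLogRecPtr insertptr)	/* current insert LSN */
{
	if (hdr->pd_upper == 0)
		return false;	/* "new" page: caller must verify it is all-zero */

	if (hdr->pd_lsn >= startptr && hdr->pd_lsn <= insertptr)
		return false;	/* plausibly modified during the backup: skip */

	return true;		/* old page, or garbage pd_lsn: verify checksum */
}
```

A zeroed header thus no longer hides a corrupted page body, and a random pd_lsn no longer silently disables verification.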
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 806d013108..6e4a5a1365 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1526,87 +1526,27 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 page = buf + BLCKSZ * i;
 
 /*
- * Only check pages which have not been modified since the
- * start of the base backup. Otherwise, they might have been
- * written only halfway and the checksum would not be valid.
- * However, replaying WAL would reinstate the correct page in
- * this case. We also skip completely new pages, since they
- * don't have a checksum yet.
+ * We skip completely new pages after checking they are
+ * all-zero, since they don't have a checksum yet.
  */
-if (!PageIsNew(page) && PageGetLSN(page) < startptr)
+if (PageIsNew(page))
 {
-	checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
-	phdr = (PageHeader) page;
-	if (phdr->pd_checksum != checksum)
+	if (!PageIsZero(page))
 	{
 		/*
-		 * Retry the block on the first failure.  It's
-		 * possible that we read the first 4K page of the
-		 * block just before postgres updated the entire block
-		 * so it ends up looking torn to us.  We only need to
-		 * retry once because the LSN should be updated to
-		 * something we can ignore on the next pass.  If the
-		 * error happens again then it is a true validation
-		 * failure.
+		 * pd_upper is zero, but the page is not all zero.  We
+		 * cannot run pg_checksum_page() on the page as it
+		 * would throw an assertion failure.  Consider this a
+		 * checksum failure.
 		 */
-		if

Re: User and Logical Replication

2020-03-11 Thread Tobias Stadler
Thanks for the info.

> Am 11.03.2020 um 19:01 schrieb Peter Eisentraut 
> :
> 
> On 2020-03-11 17:18, Tobias Stadler wrote:
>> Would it be possible to include the user (who changed the row) in the  
>> logical replication data?
> 
> Not without major re-engineering.
> 
> If you need this information, maybe a BEFORE INSERT OR UPDATE trigger could 
> be used to write this information into a table column.
> 
> -- 
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-11 Thread Masahiko Sawada
On Thu, 12 Mar 2020 at 14:38, Laurenz Albe  wrote:
>
> On Thu, 2020-03-12 at 17:47 +1300, David Rowley wrote:
> > I'm starting to think that we should set the scale_factor to something
> > like 0.3 and the threshold to 50. Is anyone strongly against that?  Or
> > Laurenz, are you really set on the 10 million threshold?
>
> These values are almost the same as "autovacuum_vacuum_scale_factor"
> and "autovacuum_vacuum_threshold", so you actually agree with Masahiko
> with the exception that you want it tunable separately.
>
> I don't like the high scale factor.
>
> If your insert-only table was last vacuumed when it had 500 million rows,
> the next autovacuum will freeze 150 million tuples, which is a lot.
> The impact will be less than that of an anti-wraparound vacuum because
> it is not as persistent, but if our 150 million tuple autovacuum backs
> down because it hits a lock or gets killed by the DBA, that is also not
> good, since it will just come again.
> And the bigger the vacuum run is, the more likely it is to meet an obstacle.
>
> So I think that large insert-only tables should be vacuumed more often
> than that.  If the number of tuples that have to be frozen is small,
> the vacuum run will be short and is less likely to cause problems.
> That is why I chose a scale factor of 0 here.

Is the reason you want to add new GUC parameters that insert-update
tables and insert-only tables need different default values? I think I
understand the pros and cons of adding separate parameters, but I still
cannot see a use case that we cannot handle without them.
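For reference, the arithmetic in the quoted example follows from the usual trigger rule. A short sketch, with an illustrative function name (not a PostgreSQL symbol):

```c
/*
 * Insert-driven autovacuum trigger rule under discussion: a vacuum fires
 * once the tuples inserted since the last vacuum exceed
 * threshold + scale_factor * reltuples.
 */
static double
insert_vacuum_trigger(double reltuples, double threshold, double scale_factor)
{
	return threshold + scale_factor * reltuples;
}
```

With the proposed threshold of 50 and scale factor of 0.3, a table last vacuumed at 500 million rows is next vacuumed only after roughly 150 million inserts, which is the batch size Laurenz objects to; with his scale factor of 0, the fixed threshold alone decides, keeping each run small.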

Regards,

-- 
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Memory-Bounded Hash Aggregation

2020-03-11 Thread Jeff Davis
On Wed, 2020-02-26 at 19:14 -0800, Jeff Davis wrote:
> Rebased on your change. This simplified the JIT and interpretation
> code
> quite a bit.

Attached another version.

 * tweaked EXPLAIN output some more
 * rebased and cleaned up
 * Added back the enable_hashagg_disk flag (defaulting to on). I've
gone back and forth on this, but it seems like a good idea to have it
there. So now there are a total of two GUCs: enable_hashagg_disk and
enable_groupingsets_hash_disk

Unless I (or someone else) find something significant, this is close
to commit.

Regards,
 Jeff Davis

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 371d7838fb6..5e223c42208 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4462,6 +4462,24 @@ ANY num_sync ( 
+  enable_groupingsets_hash_disk (boolean)
+  
+   enable_groupingsets_hash_disk configuration parameter
+  
+  
+  
+   
+Enables or disables the query planner's use of hashed aggregation for
+grouping sets when the size of the hash tables is expected to exceed
+work_mem.  See .  Note that this setting only
+affects the chosen plan; execution time may still require using
+disk-based hash aggregation.  The default is off.
+   
+  
+ 
+
  
   enable_hashagg (boolean)
   
@@ -4476,6 +4494,23 @@ ANY num_sync ( 
+  enable_hashagg_disk (boolean)
+  
+   enable_hashagg_disk configuration parameter
+  
+  
+  
+   
+Enables or disables the query planner's use of hashed aggregation plan
+types when the memory usage is expected to exceed
+work_mem. This only affects the planner choice;
+execution time may still require using disk-based hash
+aggregation. The default is on.
+   
+  
+ 
+
  
   enable_hashjoin (boolean)
   
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d901dc4a50e..58141d8393c 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -104,6 +104,7 @@ static void show_tablesample(TableSampleClause *tsc, PlanState *planstate,
 			 List *ancestors, ExplainState *es);
 static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
+static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 ExplainState *es);
 static void show_instrumentation_count(const char *qlabel, int which,
@@ -1882,6 +1883,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Agg:
 			show_agg_keys(castNode(AggState, planstate), ancestors, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
+			show_hashagg_info((AggState *) planstate, es);
 			if (plan->qual)
 show_instrumentation_count("Rows Removed by Filter", 1,
 		   planstate, es);
@@ -2769,6 +2771,41 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 	}
 }
 
+/*
+ * Show information on hash aggregate memory usage and batches.
+ */
+static void
+show_hashagg_info(AggState *aggstate, ExplainState *es)
+{
+	Agg		*agg	   = (Agg *)aggstate->ss.ps.plan;
+	long	 memPeakKb = (aggstate->hash_mem_peak + 1023) / 1024;
+
+	Assert(IsA(aggstate, AggState));
+
+	if (agg->aggstrategy != AGG_HASHED &&
+		agg->aggstrategy != AGG_MIXED)
+		return;
+
+	if (es->costs && aggstate->hash_planned_partitions > 0)
+	{
+		ExplainPropertyInteger("Planned Partitions", NULL,
+			   aggstate->hash_planned_partitions, es);
+	}
+
+	if (!es->analyze)
+		return;
+
+	/* EXPLAIN ANALYZE */
+	ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
+	if (aggstate->hash_batches_used > 0)
+	{
+		ExplainPropertyInteger("Disk Usage", "kB",
+			   aggstate->hash_disk_used, es);
+		ExplainPropertyInteger("HashAgg Batches", NULL,
+			   aggstate->hash_batches_used, es);
+	}
+}
+
 /*
  * If it's EXPLAIN ANALYZE, show exact/lossy pages for a BitmapHeapScan node
  */
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 7aebb247d88..d5ab1769127 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -194,6 +194,29 @@
  *	  transition values.  hashcontext is the single context created to support
  *	  all hash tables.
  *
+ *	  Spilling To Disk
+ *
+ *	  When performing hash aggregation, if the hash table memory exceeds the
+ *	  limit (see hash_agg_check_limits()), we enter "spill mode". In spill
+ *	  mode, we advance the transition states only for groups already in the
+ *	  hash table. For tuples that would need to create new hash table
+ *	  entries (and initialize new transition states), we instead spill them to
+ *	  disk to be processed later. The tuples are spilled in a partitioned
+ *	  manner, so that subsequent batches are smaller and less likely to exceed
+ *	  work_mem (if a batch does exceed work_mem, it mus