Re: pgsql: Implement jsonpath .datetime() method

2019-10-13 Thread Alexander Korotkov
On Sun, Oct 13, 2019 at 5:24 AM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > This patch also changes the way timestamp to timestamptz cast works.
> > Previously it did timestamp2tm() then tm2timestamp().  Instead, after
> > timestamp2tm() it calculates timezone offset and applies it to
> > original timestamp value.  I hope this is correct.
>
> I'd wonder whether this gives the same answers near DST transitions,
> where it's not real clear which offset applies.

I will try this and share the results.
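For reference, here is roughly what the two code paths look like (a sketch using the existing backend helpers from timestamp.c/datetime.c, not the patch itself):

#include "postgres.h"
#include "pgtime.h"
#include "utils/datetime.h"
#include "utils/timestamp.h"

/*
 * Sketch only.  Both paths ask DetermineTimeZoneOffset() which offset
 * applies to the broken-down local time; near a DST transition (ambiguous
 * or nonexistent local times) that choice is a policy decision, so the
 * question is whether recomposing via tm2timestamp() and shifting the
 * original value by the offset always give the same answer.
 */
TimestampTz
timestamp2timestamptz_sketch(Timestamp ts)
{
	struct pg_tm tt,
			   *tm = &tt;
	fsec_t		fsec;
	int			tz;
	TimestampTz result;

	if (timestamp2tm(ts, NULL, tm, &fsec, NULL, NULL) != 0)
		elog(ERROR, "timestamp out of range");

	tz = DetermineTimeZoneOffset(tm, session_timezone);

	/* old way: rebuild the value from the broken-down time plus the offset */
	if (tm2timestamp(tm, fsec, &tz, &result) != 0)
		elog(ERROR, "timestamp out of range");

	/*
	 * new way (as described above): keep the original ts and apply tz to it
	 * directly, skipping the decompose/recompose round trip; the arithmetic
	 * is omitted here so as not to misstate the sign convention.
	 */

	return result;
}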

> Please *don't* wrap this sort of thing into an unrelated feature patch.

Sure, thank you for noticing.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-10-13 Thread Amit Kapila
On Mon, Oct 14, 2019 at 6:51 AM Craig Ringer  wrote:
>
> On Sun, 13 Oct 2019 at 19:50, Amit Kapila  wrote:
>>
>
>
> Does anyone object if we add the reorder buffer total size & in-memory size 
> to struct WalSnd too, so we can report it in pg_stat_replication?
>

There is already a patch
(0011-Track-statistics-for-streaming-spilling) in this series, posted
by Tomas [1], which tracks the important statistics in WalSnd, and I
think those are good enough.  Have you checked that?  I am not sure
whether adding the additional size fields will help, but I might be
missing something.

> I can follow up with a patch to add on top of this one if you think it's 
> reasonable. I'll also take the opportunity to add a number of tracepoints 
> across the walsender and logical decoding, since right now it's very opaque 
> in production systems ... and everyone just LOVES hunting down debug syms and 
> attaching gdb to production DBs.
>

Sure, adding tracepoints can be helpful, but isn't it better to start
that as a separate thread?

[1] - 
https://www.postgresql.org/message-id/20190928190917.hrpknmq76v3ts3lj%40development

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-10-13 Thread Craig Ringer
On Sun, 13 Oct 2019 at 19:50, Amit Kapila  wrote:

> On Sun, Oct 13, 2019 at 12:25 PM Dilip Kumar 
> wrote:
> >
> > On Thu, Oct 3, 2019 at 2:43 PM Amit Kapila 
> wrote:
> > >
> > > 3.
> > > + *
> > > + * While updating the existing change with detoasted tuple data, we
> need to
> > > + * update the memory accounting info, because the change size will
> differ.
> > > + * Otherwise the accounting may get out of sync, triggering
> serialization
> > > + * at unexpected times.
> > > + *
> > > + * We simply subtract size of the change before rejiggering the
> tuple, and
> > > + * then adding the new size. This makes it look like the change was
> removed
> > > + * and then added back, except it only tweaks the accounting info.
> > > + *
> > > + * In particular it can't trigger serialization, which would be
> pointless
> > > + * anyway as it happens during commit processing right before handing
> > > + * the change to the output plugin.
> > >   */
> > >  static void
> > >  ReorderBufferToastReplace(ReorderBuffer *rb, ReorderBufferTXN *txn,
> > > @@ -3023,6 +3281,13 @@ ReorderBufferToastReplace(ReorderBuffer *rb,
> ReorderBufferTXN *txn,
> > >   if (txn->toast_hash == NULL)
> > >   return;
> > >
> > > + /*
> > > + * We're going modify the size of the change, so to make sure the
> > > + * accounting is correct we'll make it look like we're removing the
> > > + * change now (with the old size), and then re-add it at the end.
> > > + */
> > > + ReorderBufferChangeMemoryUpdate(rb, change, false);
> > >
> > > It is not very clear why this change is required.  Basically, this is
> done at commit time after which actually we shouldn't attempt to spill
> these changes.  This is mentioned in comments as well, but it is not clear
> if that is the case, then how and when accounting can create a problem.  If
> possible, can you explain it with an example?
> > >
> > IIUC, we are keeping track of the memory in ReorderBuffer, which is
> > common across the transactions.  So even if this transaction is
> > committing and will not spill to disk, we still need to keep the memory
> > accounting correct for the future changes in other transactions.
> >
>
> You are right.  I somehow missed that we need to keep the size
> computation in sync even during commit for other in-progress
> transactions in the ReorderBuffer.  You can ignore this point or maybe
> slightly adjust the comment to make it explicit.


Does anyone object if we add the reorder buffer total size & in-memory size
to struct WalSnd too, so we can report it in pg_stat_replication?

I can follow up with a patch to add on top of this one if you think it's
reasonable. I'll also take the opportunity to add a number of tracepoints
across the walsender and logical decoding, since right now it's very opaque
in production systems ... and everyone just LOVES hunting down debug syms
and attaching gdb to production DBs.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: v12.0: segfault in reindex CONCURRENTLY

2019-10-13 Thread Michael Paquier
On Sun, Oct 13, 2019 at 04:18:34PM -0300, Alvaro Herrera wrote:
> True.  And we can copy the resulting comment to the other spot.
> 
> (FWIW I expect the crash is possible not just in reindex but also in
> CREATE INDEX CONCURRENTLY.)

I need to think about that, but shouldn't we have a way to reproduce
that case rather reliably with an isolation test?  The patch looks
good to me; these are also the two places I spotted yesterday after a
quick look.  The only other caller is isTempNamespaceInUse(), which
does its thing correctly.
--
Michael




Re: d25ea01275 and partitionwise join

2019-10-13 Thread Justin Pryzby
On Sun, Oct 13, 2019 at 03:02:17PM -0500, Justin Pryzby wrote:
> On Thu, Sep 19, 2019 at 05:15:37PM +0900, Amit Langote wrote:
> > Please find attached updated patches.
> 
> Tom pointed me to this thread, since we hit it in 12.0
> https://www.postgresql.org/message-id/flat/16802.1570989962%40sss.pgh.pa.us#070f6675a11dff17760b1cfccf1c038d
> 
> I can't say much about the patch; there's a little typo:
> "The nullability of inner relation keys prevents them to"
> ..should say "prevent them from".
> 
> In order to compile it against REL12, I tried to cherry-pick this one:
> 3373c715: Speed up finding EquivalenceClasses for a given set of rels
> 
> But then it crashes in check-world (possibly due to misapplied hunks).

I did it again paying more attention and got it to pass.

The PWJ + FULL JOIN query seems ok now.

But I'll leave PWJ disabled until I can look more closely.

$ PGOPTIONS='-c max_parallel_workers_per_gather=0 -c enable_mergejoin=off -c enable_hashagg=off -c enable_partitionwise_join=on' psql postgres -f tmp/sql-2019-10-11.1
SET
 Nested Loop  (cost=80106964.13..131163200.28 rows=2226681567 width=6)
   Join Filter: ((s.site_location = ''::text) OR ((s.site_office)::integer = ((COALESCE(t1.site_id, t2.site_id))::integer)))
   ->  Group  (cost=80106964.13..80837945.46 rows=22491733 width=12)
         Group Key: (COALESCE(t1.start_time, t2.start_time)), ((COALESCE(t1.site_id, t2.site_id))::integer)
         ->  Merge Append  (cost=80106964.13..80613028.13 rows=22491733 width=12)
               Sort Key: (COALESCE(t1.start_time, t2.start_time)), ((COALESCE(t1.site_id, t2.site_id))::integer)
               ->  Group  (cost=25494496.54..25633699.28 rows=11136219 width=12)
                     Group Key: (COALESCE(t1.start_time, t2.start_time)), ((COALESCE(t1.site_id, t2.site_id))::integer)
                     ->  Sort  (cost=25494496.54..25522337.09 rows=11136219 width=12)
                           Sort Key: (COALESCE(t1.start_time, t2.start_time)), ((COALESCE(t1.site_id, t2.site_id))::integer)
                           ->  Hash Full Join  (cost=28608.75..24191071.36 rows=11136219 width=12)
                                 Hash Cond: ((t1.start_time = t2.start_time) AND (t1.site_id = t2.site_id))
                                 Filter: ((COALESCE(t1.start_time, t2.start_time) >= '2019-10-01 00:00:00'::timestamp without time zone) AND (COALESCE(t1.start_time, t2.start_time) < '2019-10-01 01:00:00'::timestamp without time zone))
                                 ->  Seq Scan on t1  (cost=0.00..14495.10 rows=940910 width=10)
                                 ->  Hash  (cost=14495.10..14495.10 rows=940910 width=10)
                                       ->  Seq Scan on t1 t2  (cost=0.00..14495.10 rows=940910 width=10)
               ->  Group  (cost=54612467.58..54754411.51 rows=11355514 width=12)
                     Group Key: (COALESCE(t1_1.start_time, t2_1.start_time)), ((COALESCE(t1_1.site_id, t2_1.site_id))::integer)
                     ->  Sort  (cost=54612467.58..54640856.37 rows=11355514 width=12)
                           Sort Key: (COALESCE(t1_1.start_time, t2_1.start_time)), ((COALESCE(t1_1.site_id, t2_1.site_id))::integer)
                           ->  Hash Full Join  (cost=28608.75..53281777.94 rows=11355514 width=12)
                                 Hash Cond: ((t1_1.start_time = t2_1.start_time) AND (t1_1.site_id = t2_1.site_id))
                                 Filter: ((COALESCE(t1_1.start_time, t2_1.start_time) >= '2019-10-01 00:00:00'::timestamp without time zone) AND (COALESCE(t1_1.start_time, t2_1.start_time) < '2019-10-01 01:00:00'::timestamp without time zone))
                                 ->  Seq Scan on t2 t1_1  (cost=0.00..14495.10 rows=940910 width=10)
                                 ->  Hash  (cost=14495.10..14495.10 rows=940910 width=10)
                                       ->  Seq Scan on t2 t2_1  (cost=0.00..14495.10 rows=940910 width=10)
   ->  Materialize  (cost=0.00..2.48 rows=99 width=6)
         ->  Seq Scan on s  (cost=0.00..1.99 rows=99 width=6)

-- 
Justin Pryzby
System Administrator
Telsasoft
+1-952-707-8581




Fix most -Wundef warnings

2019-10-13 Thread Peter Eisentraut
During the cleanup of the _MSC_VER version checks (commit
38d8dce61fff09daae0edb6bcdd42b0c7f10ebcd), I found it useful to use
-Wundef, but that resulted in a bunch of gratuitous warnings.  Here is a
patch to fix those.  Most of these are just stylistic cleanups, but the
change in pg_bswap.h is potentially useful to avoid misuse by
third-party extensions.
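
As a generic illustration (not taken from the patch) of why the two
spellings are not interchangeable once -Wundef is in use:

/* Assume configure did not define MAYBE_DEFINED. */

#if MAYBE_DEFINED				/* -Wundef warns: "MAYBE_DEFINED" is not defined, evaluates to 0 */
int		chose_if;
#endif

#ifdef MAYBE_DEFINED			/* no warning: merely tests whether the macro is defined */
int		chose_ifdef;
#endif

/* The two forms also diverge when a macro is defined as 0: */
#define FEATURE_X 0

#if FEATURE_X					/* false */
int		x_if;
#endif

#ifdef FEATURE_X				/* true, even though the value is 0 */
int		x_ifdef;
#endif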

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 86e5e00e17762c174624f203b9c54fdf5fe815fb Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 23 Sep 2019 22:50:05 +0200
Subject: [PATCH] Fix most -Wundef warnings

In some cases #if was used instead of #ifdef in an inconsistent style.
Cleaning this up also helps when analyzing cases like
38d8dce61fff09daae0edb6bcdd42b0c7f10ebcd where this makes a
difference.

There are no behavior changes here, but the change in pg_bswap.h would
prevent possible accidental misuse by third-party code.
---
 contrib/hstore/hstore_compat.c  |  2 +-
 contrib/pg_standby/pg_standby.c |  2 +-
 contrib/pgcrypto/imath.c|  4 ++--
 src/backend/libpq/be-fsstubs.c  |  4 ++--
 src/backend/replication/logical/reorderbuffer.c |  2 +-
 src/backend/storage/file/fd.c   |  2 +-
 src/backend/utils/hash/dynahash.c   | 14 +++---
 src/backend/utils/mmgr/freepage.c   |  2 +-
 src/include/port/pg_bswap.h |  8 
 src/port/snprintf.c |  4 ++--
 src/port/win32error.c   |  2 +-
 11 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/contrib/hstore/hstore_compat.c b/contrib/hstore/hstore_compat.c
index 1d4e7484e4..d75e9cb23f 100644
--- a/contrib/hstore/hstore_compat.c
+++ b/contrib/hstore/hstore_compat.c
@@ -299,7 +299,7 @@ hstoreUpgrade(Datum orig)
 
if (valid_new)
{
-#if HSTORE_IS_HSTORE_NEW
+#ifdef HSTORE_IS_HSTORE_NEW
elog(WARNING, "ambiguous hstore value resolved as hstore-new");
 
/*
diff --git a/contrib/pg_standby/pg_standby.c b/contrib/pg_standby/pg_standby.c
index 031b1b5cd5..9fae146dd6 100644
--- a/contrib/pg_standby/pg_standby.c
+++ b/contrib/pg_standby/pg_standby.c
@@ -145,7 +145,7 @@ CustomizableInitialize(void)
switch (restoreCommandType)
{
case RESTORE_COMMAND_LINK:
-#if HAVE_WORKING_LINK
+#ifdef HAVE_WORKING_LINK
SET_RESTORE_COMMAND("ln -s -f", WALFilePath, 
xlogFilePath);
break;
 #endif
diff --git a/contrib/pgcrypto/imath.c b/contrib/pgcrypto/imath.c
index 6936d2cdca..545f148a76 100644
--- a/contrib/pgcrypto/imath.c
+++ b/contrib/pgcrypto/imath.c
@@ -2361,12 +2361,12 @@ s_ucmp(mp_int a, mp_int b)
 static int
 s_vcmp(mp_int a, mp_small v)
 {
-#if _MSC_VER
+#ifdef _MSC_VER
 #pragma warning(push)
 #pragma warning(disable: 4146)
 #endif
mp_usmall   uv = (v < 0) ? -(mp_usmall) v : (mp_usmall) v;
-#if _MSC_VER
+#ifdef _MSC_VER
 #pragma warning(pop)
 #endif
 
diff --git a/src/backend/libpq/be-fsstubs.c b/src/backend/libpq/be-fsstubs.c
index a750f921fa..b0ece7ec25 100644
--- a/src/backend/libpq/be-fsstubs.c
+++ b/src/backend/libpq/be-fsstubs.c
@@ -95,7 +95,7 @@ be_lo_open(PG_FUNCTION_ARGS)
LargeObjectDesc *lobjDesc;
int fd;
 
-#if FSDB
+#ifdef FSDB
elog(DEBUG4, "lo_open(%u,%d)", lobjId, mode);
 #endif
 
@@ -118,7 +118,7 @@ be_lo_close(PG_FUNCTION_ARGS)
(errcode(ERRCODE_UNDEFINED_OBJECT),
 errmsg("invalid large-object descriptor: %d", 
fd)));
 
-#if FSDB
+#ifdef FSDB
elog(DEBUG4, "lo_close(%d)", fd);
 #endif
 
diff --git a/src/backend/replication/logical/reorderbuffer.c 
b/src/backend/replication/logical/reorderbuffer.c
index 8ce28ad629..62e54240ec 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3247,7 +3247,7 @@ typedef struct RewriteMappingFile
charfname[MAXPGPATH];
 } RewriteMappingFile;
 
-#if NOT_USED
+#ifdef NOT_USED
 static void
 DisplayMapping(HTAB *tuplecid_data)
 {
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 94be62fa6e..fe2bb8f859 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -738,7 +738,7 @@ durable_link_or_rename(const char *oldfile, const char 
*newfile, int elevel)
if (fsync_fname_ext(oldfile, false, false, elevel) != 0)
return -1;
 
-#if HAVE_WORKING_LINK
+#ifdef HAVE_WORKING_LINK
if (link(oldfile, newfile) < 0)
{
ereport(elevel,
diff --git a/src/backend/utils/hash/dynahash.c 
b/src/backend/utils/hash/dynahash.c
index 0dfbec8e3e..de16adef17 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -243,7 +243,7 @@ struct HTAB
  */
 #define 

Re: v12.0: segfault in reindex CONCURRENTLY

2019-10-13 Thread Alvaro Herrera
On 2019-Oct-13, Justin Pryzby wrote:

> On Sun, Oct 13, 2019 at 03:10:21PM -0300, Alvaro Herrera wrote:
> > On 2019-Oct-13, Justin Pryzby wrote:
> > 
> > > Looks like it's a race condition and dereferencing *holder=NULL.  The 
> > > first
> > > crash was probably the same bug, due to report query running during 
> > > "reindex
> > > CONCURRENTLY", and probably finished at nearly the same time as another 
> > > locker.
> > 
> > Ooh, right, makes sense.  There's another spot with the same mistake ...
> > this patch should fix it.
> 
> I would maybe chop off the 2nd sentence, since conditionalizing indicates that
> we do actually care.
> 
> +* If requested, publish who we're going to wait for. 
>  This is not
> +* 100% accurate if they're already gone, but we 
> don't care.

True.  And we can copy the resulting comment to the other spot.

(FWIW I expect the crash is possible not just in reindex but also in
CREATE INDEX CONCURRENTLY.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 70f9b6729a..589b8816a4 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -384,12 +384,14 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 
 		if (VirtualTransactionIdIsValid(old_snapshots[i]))
 		{
+			/* If requested, publish who we're going to wait for. */
 			if (progress)
 			{
 PGPROC	   *holder = BackendIdGetProc(old_snapshots[i].backendId);
 
-pgstat_progress_update_param(PROGRESS_WAITFOR_CURRENT_PID,
-			 holder->pid);
+if (holder)
+	pgstat_progress_update_param(PROGRESS_WAITFOR_CURRENT_PID,
+ holder->pid);
 			}
 			VirtualXactLock(old_snapshots[i], true);
 		}
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index 4682438114..1dd0e5e957 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -900,16 +900,14 @@ WaitForLockersMultiple(List *locktags, LOCKMODE lockmode, bool progress)
 
 		while (VirtualTransactionIdIsValid(*lockholders))
 		{
-			/*
-			 * If requested, publish who we're going to wait for.  This is not
-			 * 100% accurate if they're already gone, but we don't care.
-			 */
+			/* If requested, publish who we're going to wait for. */
 			if (progress)
 			{
 PGPROC	   *holder = BackendIdGetProc(lockholders->backendId);
 
-pgstat_progress_update_param(PROGRESS_WAITFOR_CURRENT_PID,
-			 holder->pid);
+if (holder)
+	pgstat_progress_update_param(PROGRESS_WAITFOR_CURRENT_PID,
+ holder->pid);
 			}
 			VirtualXactLock(*lockholders, true);
 			lockholders++;


Re: [HACKERS] Deadlock in XLogInsert at AIX

2019-10-13 Thread Tom Lane
Noah Misch  writes:
> On Wed, Oct 09, 2019 at 01:15:29PM -0400, Tom Lane wrote:
>> * I still think that the added configure test is a waste of build cycles.
>> It'd be sufficient to test "#ifdef HAVE__BUILTIN_CONSTANT_P" where you
>> are testing HAVE_I_CONSTRAINT__BUILTIN_CONSTANT_P, because our previous
>> buildfarm go-round with this showed that all supported compilers
>> interpret "i" this way.

> xlc does not interpret "i" that way:
> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hornet&dt=2019-09-14%2016%3A42%3A32&stg=config

Hm, how'd I miss that while looking at the buildfarm results?  Anyway,
you're clearly right on this one --- objection withdrawn.
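
For the archives, the construct at issue looks roughly like this (an
illustrative sketch using x86 asm, not the PPC code from the patch):

/*
 * Sketch only.  The "i" constraint demands a compile-time immediate, so
 * the guarded branch is only usable if the compiler, having evaluated
 * __builtin_constant_p() as true, still treats the operand as a constant
 * when expanding the asm.  That guarantee is roughly what the
 * HAVE_I_CONSTRAINT__BUILTIN_CONSTANT_P probe checks, and per the
 * buildfarm log above it is what xlc does not provide.
 */
static inline int
add_imm_or_reg(int x, int n)
{
	int			result = x;

	if (__builtin_constant_p(n))
		__asm__("addl %1, %0" : "+r"(result) : "i"(n));
	else
		__asm__("addl %1, %0" : "+r"(result) : "r"(n));
	return result;
}

int
main(void)
{
	/*
	 * Constant argument: the "i" branch is used after inlining.  Build
	 * with optimization so the non-constant copy of that branch is folded
	 * away.
	 */
	return add_imm_or_reg(40, 2) == 42 ? 0 : 1;
}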

>> * I really dislike building the asm calls with macros as you've done
>> here.

> For a macro local to one C file, I think readability is the relevant metric.
> In particular, it would be wrong to add arguments to make these more like
> header file macros.  I think the macros make the code somewhat more readable,
> and you think they make the code less readable.  I have removed the macros.

Personally I definitely find this way more readable, though I agree
beauty is in the eye of the beholder.

I've checked that this patch set compiles and passes regression tests
on my old Apple machines, so it's good to go as far as I can see.

regards, tom lane




Re: v12.0: segfault in reindex CONCURRENTLY

2019-10-13 Thread Justin Pryzby
Resending this message, which didn't make it to the list when I sent it
earlier.  (And, notified -www).

On Sun, Oct 13, 2019 at 06:06:43PM +0900, Michael Paquier wrote:
> On Fri, Oct 11, 2019 at 07:44:46PM -0500, Justin Pryzby wrote:
> > Unfortunately, there was no core file, and I'm still trying to reproduce it.
> 
> Forgot to set ulimit -c?  Having a backtrace would surely help.

Fortunately (?) another server hit this crash last night.
(Doesn't appear to be relevant, but this table has no
inheritance/partition-ness.)

Looks like it's a race condition and dereferencing *holder=NULL.  The first
crash was probably the same bug, due to report query running during "reindex
CONCURRENTLY", and probably finished at nearly the same time as another locker.

Relevant code introduced here:

commit ab0dfc961b6a821f23d9c40c723d11380ce195a6
Author: Alvaro Herrera 
Date:   Tue Apr 2 15:18:08 2019 -0300

Report progress of CREATE INDEX operations

Needs to be conditionalized (as anticipated by the comment)

+   if (holder)

pgstat_progress_update_param(PROGRESS_WAITFOR_CURRENT_PID,

 holder->pid);


Core was generated by `postgres: postgres ts [local] REINDEX  '.
Program terminated with signal 11, Segmentation fault.

#0  WaitForLockersMultiple (locktags=locktags@entry=0x1d30548, lockmode=lockmode@entry=5, progress=progress@entry=true) at lmgr.c:911
#1  0x005c2ac8 in ReindexRelationConcurrently (relationOid=relationOid@entry=17618, options=options@entry=0) at indexcmds.c:3090
#2  0x005c328a in ReindexIndex (indexRelation=, options=0, concurrent=) at indexcmds.c:2352
#3  0x007657fe in standard_ProcessUtility (pstmt=pstmt@entry=0x1d05468, queryString=queryString@entry=0x1d046e0 "REINDEX INDEX CONCURRENTLY loaded_cdr_files_filename", context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, dest=dest@entry=0x1d05548, completionTag=completionTag@entry=0x7ffc05e6c0a0 "") at utility.c:787
#4  0x7f21517204ef in pgss_ProcessUtility (pstmt=0x1d05468, queryString=0x1d046e0 "REINDEX INDEX CONCURRENTLY loaded_cdr_files_filename", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1d05548, completionTag=0x7ffc05e6c0a0 "") at pg_stat_statements.c:1006
#5  0x00762816 in PortalRunUtility (portal=0x1d7a4e0, pstmt=0x1d05468, isTopLevel=, setHoldSnapshot=, dest=0x1d05548, completionTag=0x7ffc05e6c0a0 "") at pquery.c:1175
#6  0x00763267 in PortalRunMulti (portal=portal@entry=0x1d7a4e0, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x1d05548, altdest=altdest@entry=0x1d05548, completionTag=completionTag@entry=0x7ffc05e6c0a0 "") at pquery.c:1328
#7  0x00763e45 in PortalRun (portal=, count=9223372036854775807, isTopLevel=, run_once=, dest=0x1d05548, altdest=0x1d05548, completionTag=0x7ffc05e6c0a0 "") at pquery.c:796
#8  0x0075ff45 in exec_simple_query (query_string=) at postgres.c:1215
#9  0x00761212 in PostgresMain (argc=, argv=, dbname=, username=) at postgres.c:4236
#10 0x00483d02 in BackendRun (port=, port=) at postmaster.c:4431
#11 BackendStartup (port=0x1d2b340) at postmaster.c:4122
#12 ServerLoop () at postmaster.c:1704
#13 0x006f0b1f in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1cff280) at postmaster.c:1377
#14 0x00484c93 in main (argc=3, argv=0x1cff280) at main.c:228

bt f

#0  WaitForLockersMultiple (locktags=locktags@entry=0x1d30548, lockmode=lockmode@entry=5, progress=progress@entry=true) at lmgr.c:911
holder = 0x0
lockholders = 0x1d9b778
holders = 
lc = 0x1d9bf80
total = 
done = 1
#1  0x005c2ac8 in ReindexRelationConcurrently (relationOid=relationOid@entry=17618, options=options@entry=0) at indexcmds.c:3090
heapRelationIds = 0x1d30360
indexIds = 0x1d303b0
newIndexIds = 
relationLocks = 
lockTags = 
lc = 0x0
lc2 = 0x0
private_context = 
oldcontext = 
relkind = 105 'i'
relationName = 0x0
relationNamespace = 0x0
ru0 = {tv = {tv_sec = 30592544, tv_usec = 7232025}, ru = {ru_utime = 
{tv_sec = 281483566645394, tv_usec = 75668733820930}, ru_stime = {tv_sec = 0, 
tv_usec = 30592272}, {
  ru_maxrss = 0, __ru_maxrss_word = 0}, {ru_ixrss = 0, 
__ru_ixrss_word = 0}, {ru_idrss = 105, __ru_idrss_word = 105}, {ru_isrss = 
-926385342574214656, 
  __ru_isrss_word = -926385342574214656}, {ru_minflt = 8924839, 
__ru_minflt_word = 8924839}, {ru_majflt = 0, __ru_majflt_word = 0}, {ru_nswap = 
17618, 
  __ru_nswap_word = 17618}, {ru_inblock = 139781327898864, 
__ru_inblock_word = 139781327898864}, {ru_oublock = 30430312, __ru_oublock_word 
= 30430312}, 

Re: stress test for parallel workers

2019-10-13 Thread Tom Lane
Filed at

https://bugzilla.kernel.org/show_bug.cgi?id=205183

We'll see what happens ...

regards, tom lane




Re: stress test for parallel workers

2019-10-13 Thread Tom Lane
Andres Freund  writes:
> On 2019-10-13 10:29:45 -0400, Tom Lane wrote:
>> How recent do you think it needs to be?

> My experience reporting kernel bugs is that the latest released version,
> or even just the tip of the git tree, is your best bet :/.

Considering that we're going to point them at chapter and verse in
Torvalds' own repo, I do not think they can claim that we're worried
about obsolete code.

regards, tom lane




Re: v12.0 ERROR: trying to store a heap tuple into wrong type of slot

2019-10-13 Thread Andres Freund
Hi,

On 2019-10-11 16:03:20 -0500, Justin Pryzby wrote:
> I'm not sure why we have that index, and my script probably should have known
> to choose a better one to cluster on, but still..
>
> ts=# CLUSTER huawei_m2000_config_enodebcell_enodeb USING 
> huawei_m2000_config_enodebcell_enodeb_coalesce_idx ;
> DEBUG:  0: building index "pg_toast_1840151315_index" on table 
> "pg_toast_1840151315" serially
> LOCATION:  index_build, index.c:2791
> DEBUG:  0: clustering "public.huawei_m2000_config_enodebcell_enodeb" 
> using sequential scan and sort
> LOCATION:  copy_table_data, cluster.c:907
> ERROR:  XX000: trying to store a heap tuple into wrong type of slot
> LOCATION:  ExecStoreHeapTuple, execTuples.c:1328

Well, that's annoying.  There apparently is not a single test covering
CLUSTER on expression indexes; that really ought not to be the case.
Equally annoying that I just broke this without noticing at all :(.

The cause of the error is that, while it sounds like it should be
sufficient, a virtual slot isn't sufficient for
tuplesort_begin_cluster().  So the fix is pretty trivial.  Will fix.

I started a separate thread about test coverage of tuplesort at
https://www.postgresql.org/message-id/20191013144153.ooxrfglvnaocsrx2%40alap3.anarazel.de

Greetings,

Andres Freund




Re: stress test for parallel workers

2019-10-13 Thread Andres Freund
Hi,

On 2019-10-13 10:29:45 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Probably requires reproducing on a pretty recent kernel first, to have a
> > decent chance of being investigated...
> 
> How recent do you think it needs to be?  The machine I was testing on
> yesterday is under a year old:
>
> uname -r = 4.18.19-100.fc27.ppc64le
> ...
> uname -r = 4.19.15-300.fc29.ppc64le

My experience reporting kernel bugs is that the latest released version,
or even just the tip of the git tree, is your best bet :/. And that
reports using distro kernels - with all their out of tree changes - are
also less likely to receive a response.  IIRC there's a fedora repo with
upstream kernels.

Greetings,

Andres Freund




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-10-13 Thread Amit Kapila
On Sun, Oct 13, 2019 at 12:25 PM Dilip Kumar  wrote:
>
> On Thu, Oct 3, 2019 at 2:43 PM Amit Kapila  wrote:
> >
> > 3.
> > + *
> > + * While updating the existing change with detoasted tuple data, we need to
> > + * update the memory accounting info, because the change size will differ.
> > + * Otherwise the accounting may get out of sync, triggering serialization
> > + * at unexpected times.
> > + *
> > + * We simply subtract size of the change before rejiggering the tuple, and
> > + * then adding the new size. This makes it look like the change was removed
> > + * and then added back, except it only tweaks the accounting info.
> > + *
> > + * In particular it can't trigger serialization, which would be pointless
> > + * anyway as it happens during commit processing right before handing
> > + * the change to the output plugin.
> >   */
> >  static void
> >  ReorderBufferToastReplace(ReorderBuffer *rb, ReorderBufferTXN *txn,
> > @@ -3023,6 +3281,13 @@ ReorderBufferToastReplace(ReorderBuffer *rb, 
> > ReorderBufferTXN *txn,
> >   if (txn->toast_hash == NULL)
> >   return;
> >
> > + /*
> > + * We're going modify the size of the change, so to make sure the
> > + * accounting is correct we'll make it look like we're removing the
> > + * change now (with the old size), and then re-add it at the end.
> > + */
> > + ReorderBufferChangeMemoryUpdate(rb, change, false);
> >
> > It is not very clear why this change is required.  Basically, this is done 
> > at commit time after which actually we shouldn't attempt to spill these 
> > changes.  This is mentioned in comments as well, but it is not clear if 
> > that is the case, then how and when accounting can create a problem.  If 
> > possible, can you explain it with an example?
> >
> IIUC, we are keeping track of the memory in ReorderBuffer, which is
> common across the transactions.  So even if this transaction is
> committing and will not spill to disk, we still need to keep the memory
> accounting correct for the future changes in other transactions.
>

You are right.  I somehow missed that we need to keep the size
computation in sync even during commit for other in-progress
transactions in the ReorderBuffer.  You can ignore this point or maybe
slightly adjust the comment to make it explicit.
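
To spell out the pattern for the archives (a sketch of the shape of the
code, assuming it sits in reorderbuffer.c next to the patch; only the
first call is visible in the quoted hunk, the second is the "re-add it
at the end" part):

static void
toast_replace_accounting_sketch(ReorderBuffer *rb, ReorderBufferChange *change)
{
	/* pretend the change is removed, subtracting its current (old) size */
	ReorderBufferChangeMemoryUpdate(rb, change, false);

	/* ... rejigger the tuple here; the change's size is different now ... */

	/*
	 * Add it back with the new size, so the buffer-wide accounting stays
	 * correct for the other in-progress transactions sharing this
	 * ReorderBuffer.
	 */
	ReorderBufferChangeMemoryUpdate(rb, change, true);
}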

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: v12.0: reindex CONCURRENTLY: lock ShareUpdateExclusiveLock on object 14185/39327/0 is already held

2019-10-13 Thread Michael Paquier
On Sat, Oct 12, 2019 at 09:51:45PM -0500, Justin Pryzby wrote:
> Variations on this seem to leave the locks table (?) or something else in a
> Real Bad state, such that I cannot truncate the table or drop it; or at least
> commands are unreasonably delayed for minutes, on this otherwise-empty test
> cluster.

I got an assertion failure on that:
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x7f417a283535 in __GI_abort () at abort.c:79
#2  0x564c351f0f4f in ExceptionalCondition (conditionName=0x564c353d0ac8 "SHMQueueEmpty(&(MyProc->myProcLocks[i]))", errorType=0x564c353d09de "FailedAssertion", fileName=0x564c353d09d7 "proc.c", lineNumber=832) at assert.c:54
#3  0x564c3504debe in ProcKill (code=0, arg=0) at proc.c:832
#4  0x564c3503430e in shmem_exit (code=0) at ipc.c:272
#5  0x564c3503413d in proc_exit_prepare (code=0) at ipc.c:194
#6  0x564c3503409c in proc_exit (code=0) at ipc.c:107
#7  0x564c3506a629 in PostgresMain (argc=1, argv=0x564c35c12ae0, dbname=0x564c35c129d0 "ioltas", username=0x564c35c129b0 "ioltas") at postgres.c:4464
#8  0x564c34fb94ed in BackendRun (port=0x564c35c0c6b0) at postmaster.c:4465
#9  0x564c34fb8c59 in BackendStartup (port=0x564c35c0c6b0) at postmaster.c:4156
#10 0x564c34fb4c7f in ServerLoop () at postmaster.c:1718
#11 0x564c34fb44ad in PostmasterMain (argc=3, argv=0x564c35bdefd0) at postmaster.c:1391
#12 0x564c34ec0d3d in main (argc=3, argv=0x564c35bdefd0) at main.c:210

This means that all the locks held have not actually been released
when the timeout kicked in.  I am not sure that this is only an issue
related to REINDEX CONCURRENTLY, but if that's the case then we are
missing a cleanup step.
--
Michael




Re: v12.0: segfault in reindex CONCURRENTLY

2019-10-13 Thread Michael Paquier
On Fri, Oct 11, 2019 at 07:44:46PM -0500, Justin Pryzby wrote:
> That's an index on a table partition, but not itself a child of a relkind=I
> index.

Interesting.  Testing with a partition tree, and indexes on leaves
which do not have dependencies with a parent I cannot reproduce
anything.  Perhaps you have some concurrent operations going on?

> Unfortunately, there was no core file, and I'm still trying to reproduce it.

Forgot to set ulimit -c?  Having a backtrace would surely help.
--
Michael




Re: adding partitioned tables to publications

2019-10-13 Thread David Fetter
On Mon, Oct 07, 2019 at 09:55:23AM +0900, Amit Langote wrote:
> One cannot currently add partitioned tables to a publication.
> 
> create table p (a int, b int) partition by hash (a);
> create table p1 partition of p for values with (modulus 3, remainder 0);
> create table p2 partition of p for values with (modulus 3, remainder 1);
> create table p3 partition of p for values with (modulus 3, remainder 2);
> 
> create publication publish_p for table p;
> ERROR:  "p" is a partitioned table
> DETAIL:  Adding partitioned tables to publications is not supported.
> HINT:  You can add the table partitions individually.
> 
> One can do this instead:
> 
> create publication publish_p1 for table p1;
> create publication publish_p2 for table p2;
> create publication publish_p3 for table p3;
> 
> but maybe that's too much code to maintain for users.
> 
> I propose that we make this command:
> 
> create publication publish_p for table p;
> 
> automatically add all the partitions to the publication.  Also, any
> future partitions should also be automatically added to the
> publication.  So, publishing a partitioned table automatically
> publishes all of its existing and future partitions.  Attached patch
> implements that.
> 
> What doesn't change with this patch is that the partitions on the
> subscription side still have to match one-to-one with the partitions
> on the publication side, because the changes are still replicated as
> being made to the individual partitions, not as the changes to the
> root partitioned table.  It might be useful to implement that
> functionality on the publication side, because it allows users to
> define the replication target any way they need to, but this patch
> doesn't implement that.

With this patch, is it possible to remove a partition manually from a
subscription, or will it just get automatically re-added at some
point?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-10-13 Thread Dilip Kumar
On Thu, Oct 3, 2019 at 2:43 PM Amit Kapila  wrote:
>
> On Thu, Oct 3, 2019 at 4:03 AM Tomas Vondra  
> wrote:
>>
>> On Wed, Oct 02, 2019 at 04:27:30AM +0530, Amit Kapila wrote:
>> >On Tue, Oct 1, 2019 at 7:21 PM Tomas Vondra 
>> >wrote:
>> >
>> >> On Tue, Oct 01, 2019 at 06:55:52PM +0530, Amit Kapila wrote:
>> >> >
>> >> >On further testing, I found that the patch seems to have problems with
>> >> >toast.  Consider below scenario:
>> >> >Session-1
>> >> >Create table large_text(t1 text);
>> >> >INSERT INTO large_text
>> >> >SELECT (SELECT string_agg('x', ',')
>> >> >FROM generate_series(1, 100)) FROM generate_series(1, 1000);
>> >> >
>> >> >Session-2
>> >> >SELECT * FROM pg_create_logical_replication_slot('regression_slot',
>> >> >'test_decoding');
>> >> >SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
>> >> >*--kaboom*
>> >> >
>> >> >The second statement in Session-2 leads to a crash.
>> >> >
>> >>
>> >> OK, thanks for the report - will investigate.
>> >>
>> >
>> >It was an assertion failure in ReorderBufferCleanupTXN at below line:
>> >+ /* Check we're not mixing changes from different transactions. */
>> >+ Assert(change->txn == txn);
>> >
>>
>> Can you still reproduce this issue with the patch I sent on 28/9? I have
>> been unable to trigger the failure, and it seems pretty similar to the
>> failure you reported (and I fixed) on 28/9.
>
>
> Yes, it seems we need a similar change in ReorderBufferAddNewTupleCids.  I 
> think in session-2 you need to create replication slot before creating table 
> in session-1 to see this problem.
>
> --- a/src/backend/replication/logical/reorderbuffer.c
> +++ b/src/backend/replication/logical/reorderbuffer.c
> @@ -2196,6 +2196,7 @@ ReorderBufferAddNewTupleCids(ReorderBuffer *rb, 
> TransactionId xid,
> change->data.tuplecid.cmax = cmax;
> change->data.tuplecid.combocid = combocid;
> change->lsn = lsn;
> +   change->txn = txn;
> change->action = REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID;
> dlist_push_tail(>tuplecids, >node);
>
> Few more comments:
> ---
> 1.
> +static bool
> +check_logical_decoding_work_mem(int *newval, void **extra, GucSource source)
> +{
> + /*
> + * -1 indicates fallback.
> + *
> + * If we haven't yet changed the boot_val default of -1, just let it be.
> + * logical decoding will look to maintenance_work_mem instead.
> + */
> + if (*newval == -1)
> + return true;
> +
> + /*
> + * We clamp manually-set values to at least 64kB. The maintenance_work_mem
> + * uses a higher minimum value (1MB), so this is OK.
> + */
> + if (*newval < 64)
> + *newval = 64;
>
> I think this needs to be changed as now we don't rely on 
> maintenance_work_mem.  Another thing related to this is that I think the 
> default value for logical_decoding_work_mem still seems to be -1.  We need to 
> make it to 64MB.  I have seen this while debugging memory accounting changes. 
>  I think this is the reason why I was not seeing toast related changes being 
> serialized because, in that test, I haven't changed the default value of 
> logical_decoding_work_mem.
>
> 2.
> + /*
> + * We're going modify the size of the change, so to make sure the
> + * accounting is correct we'll make it look like we're removing the
> + * change now (with the old size), and then re-add it at the end.
> + */
>
>
> /going modify/going to modify/
>
> 3.
> + *
> + * While updating the existing change with detoasted tuple data, we need to
> + * update the memory accounting info, because the change size will differ.
> + * Otherwise the accounting may get out of sync, triggering serialization
> + * at unexpected times.
> + *
> + * We simply subtract size of the change before rejiggering the tuple, and
> + * then adding the new size. This makes it look like the change was removed
> + * and then added back, except it only tweaks the accounting info.
> + *
> + * In particular it can't trigger serialization, which would be pointless
> + * anyway as it happens during commit processing right before handing
> + * the change to the output plugin.
>   */
>  static void
>  ReorderBufferToastReplace(ReorderBuffer *rb, ReorderBufferTXN *txn,
> @@ -3023,6 +3281,13 @@ ReorderBufferToastReplace(ReorderBuffer *rb, 
> ReorderBufferTXN *txn,
>   if (txn->toast_hash == NULL)
>   return;
>
> + /*
> + * We're going modify the size of the change, so to make sure the
> + * accounting is correct we'll make it look like we're removing the
> + * change now (with the old size), and then re-add it at the end.
> + */
> + ReorderBufferChangeMemoryUpdate(rb, change, false);
>
> It is not very clear why this change is required.  Basically, this is done at 
> commit time after which actually we shouldn't attempt to spill these changes. 
>  This is mentioned in comments as well, but it is not clear if that is the 
> case, then how and when accounting can create a problem.  If possible, can 
> you explain it with an example?
>
IIUC, we are