Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Thu, May 6, 2021 at 9:01 AM Amit Kapila wrote:
>
> On Fri, Apr 30, 2021 at 7:43 PM Dilip Kumar wrote:
> >
> > > So, I used the other approach which led to the attached.
> >
> > The patch looks fine to me.
>
> Thanks, pushed!

Thanks!

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Fri, Apr 30, 2021 at 7:43 PM Dilip Kumar wrote:
>
> > So, I used the other approach which led to the attached.
>
> The patch looks fine to me.

Thanks, pushed!

--
With Regards,
Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Fri, Apr 30, 2021 at 3:01 PM Amit Kapila wrote:
>
> On Wed, Apr 28, 2021 at 11:03 AM Dilip Kumar wrote:
> >
> > On Wed, Apr 28, 2021 at 11:00 AM Amit Kapila wrote:
> > >
> > > The idea I have is to additionally check that we are decoding a
> > > streaming or prepared transaction (the same check as we have for
> > > setting curtxn) or we can check if CheckXidAlive is a valid
> > > transaction id. What do you think?
> >
> > I think a check based on CheckXidAlive looks good to me. This will
> > protect against a similar error being raised from any other path, as
> > you mentioned above.
>
> We can't use CheckXidAlive because it is reset by that time.

Right.

> So, I used the other approach which led to the attached.

The patch looks fine to me.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Apr 28, 2021 at 11:03 AM Dilip Kumar wrote:
>
> On Wed, Apr 28, 2021 at 11:00 AM Amit Kapila wrote:
> >
> > The idea I have is to additionally check that we are decoding a
> > streaming or prepared transaction (the same check as we have for
> > setting curtxn) or we can check if CheckXidAlive is a valid
> > transaction id. What do you think?
>
> I think a check based on CheckXidAlive looks good to me. This will
> protect against a similar error being raised from any other path, as
> you mentioned above.

We can't use CheckXidAlive because it is reset by that time. So, I
used the other approach which led to the attached.

--
With Regards,
Amit Kapila.

v1-0001-Tighten-the-concurrent-abort-check-during-decodin.patch
Description: Binary data
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Apr 28, 2021 at 11:00 AM Amit Kapila wrote:
>
> Tom Lane has raised a complaint on pgsql-committers [1] about one of
> the commits related to this work [2]. The new buildfarm member wrasse
> is showing the warning:
>
> "/export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/../pgsql/src/backend/replication/logical/reorderbuffer.c",
> line 2510: Warning: Likely null pointer dereference (*(curtxn+272)):
> ReorderBufferProcessTXN
>
> The warning is for the line:
> curtxn->concurrent_abort = true;
>
> Now, we can simply fix this warning by adding an if check like:
> if (curtxn)
>     curtxn->concurrent_abort = true;
>
> However, on further discussion, it seems that is not sufficient here
> because the callbacks can throw the surrounding error code
> (ERRCODE_TRANSACTION_ROLLBACK) where we set the concurrent_abort flag
> for a completely different scenario. I think here we need a
> stronger check to ensure that we set the concurrent-abort flag and do
> other things in that check only when we are decoding non-committed
> xacts.

That makes sense.

> The idea I have is to additionally check that we are decoding a
> streaming or prepared transaction (the same check as we have for
> setting curtxn) or we can check if CheckXidAlive is a valid
> transaction id. What do you think?

I think a check based on CheckXidAlive looks good to me. This will
protect against a similar error being raised from any other path, as
you mentioned above.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Tom Lane has raised a complaint on pgsql-committers [1] about one of
the commits related to this work [2]. The new buildfarm member wrasse
is showing the warning:

"/export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/../pgsql/src/backend/replication/logical/reorderbuffer.c",
line 2510: Warning: Likely null pointer dereference (*(curtxn+272)):
ReorderBufferProcessTXN

The warning is for the line:
curtxn->concurrent_abort = true;

Now, we can simply fix this warning by adding an if check like:
if (curtxn)
    curtxn->concurrent_abort = true;

However, on further discussion, it seems that is not sufficient here
because the callbacks can throw the surrounding error code
(ERRCODE_TRANSACTION_ROLLBACK) where we set the concurrent_abort flag
for a completely different scenario. I think here we need a stronger
check to ensure that we set the concurrent-abort flag and do other
things in that check only when we are decoding non-committed xacts.

The idea I have is to additionally check that we are decoding a
streaming or prepared transaction (the same check as we have for
setting curtxn), or we can check if CheckXidAlive is a valid
transaction id. What do you think?

[1] - https://www.postgresql.org/message-id/2752962.1619568098%40sss.pgh.pa.us
[2] - https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7259736a6e5b7c7588fff9578370736a6648acbb

--
With Regards,
Amit Kapila.
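[For illustration: the shape of the guarded flag-setting being discussed can be sketched in a self-contained form. The types and the mark_concurrent_abort() helper below are simplified stand-ins, not PostgreSQL's actual ReorderBufferTXN machinery or the CheckXidAlive variable.]

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;          /* stand-in for PostgreSQL's type */

typedef struct ReorderBufferTXN
{
    bool concurrent_abort;               /* set when a concurrent abort is seen */
} ReorderBufferTXN;

/*
 * Set the concurrent-abort flag only when we are actually decoding an
 * in-progress (streamed or prepared) transaction.  Otherwise the same
 * error code may have come from an unrelated path, and curtxn may well
 * be NULL -- which is exactly the dereference the compiler warned about.
 */
static void
mark_concurrent_abort(ReorderBufferTXN *curtxn, bool streaming, bool prepared)
{
    if ((streaming || prepared) && curtxn != NULL)
        curtxn->concurrent_abort = true;
}
```

The point of the stronger check is that the NULL test alone would silence the warning but still let an unrelated ERRCODE_TRANSACTION_ROLLBACK be misreported as a concurrent abort.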
Re: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)
Noah Misch writes:
> On Mon, Dec 14, 2020 at 01:59:03PM -0500, Tom Lane wrote:
>> Here's a rolled-up patch that does some further documentation work
>> and gets rid of the unnecessary memset's as well.

> +1 on removing the memset() calls.  That said, it's not a big deal if more
> creep in over time; it doesn't qualify as a project policy violation.

Right, that part is just neatnik-ism.  Neither the calls with memset
nor the ones without are buggy.

>> + * *infoP and hash_flags should specify at least the entry sizes and key

> s/should/must/

OK; thanks for reviewing!

			regards, tom lane
Re: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)
On Mon, Dec 14, 2020 at 01:59:03PM -0500, Tom Lane wrote:
> * Should we just have a blanket insistence that all callers supply
> HASH_ELEM?  The default sizes that dynahash.c uses without that are
> undocumented and basically useless.

+1

> we should just rip out all those memsets as pointless, since there's
> basically no case where you'd use the memset to fill a field that
> you meant to pass as zero.  The fact that hash_create() doesn't
> read fields it's not told to by a flag means we should not need
> the memsets to avoid uninitialized-memory reads.

On Mon, Dec 14, 2020 at 06:55:20PM -0500, Tom Lane wrote:
> Here's a rolled-up patch that does some further documentation work
> and gets rid of the unnecessary memset's as well.

+1 on removing the memset() calls.  That said, it's not a big deal if more
creep in over time; it doesn't qualify as a project policy violation.

> @@ -329,6 +328,11 @@ InitShmemIndex(void)
>   * whose maximum size is certain, this should be equal to max_size; that
>   * ensures that no run-time out-of-shared-memory failures can occur.
>   *
> + * *infoP and hash_flags should specify at least the entry sizes and key

s/should/must/
Re: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)
Here's a rolled-up patch that does some further documentation work
and gets rid of the unnecessary memset's as well.

			regards, tom lane

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 2dc9e44ae6..651227f510 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -2607,7 +2607,8 @@ createConnHash(void)
 	ctl.keysize = NAMEDATALEN;
 	ctl.entrysize = sizeof(remoteConnHashEnt);
 
-	return hash_create("Remote Con hash", NUMCONN, &ctl, HASH_ELEM);
+	return hash_create("Remote Con hash", NUMCONN, &ctl,
+					   HASH_ELEM | HASH_STRINGS);
 }
 
 static void
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 70cfdb2c9d..2f00344b7f 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -567,7 +567,6 @@ pgss_shmem_startup(void)
 		pgss->stats.dealloc = 0;
 	}
 
-	memset(&info, 0, sizeof(info));
 	info.keysize = sizeof(pgssHashKey);
 	info.entrysize = sizeof(pgssEntry);
 	pgss_hash = ShmemInitHash("pg_stat_statements hash",
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index ab3226287d..66581e5414 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -119,14 +119,11 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	{
 		HASHCTL		ctl;
 
-		MemSet(&ctl, 0, sizeof(ctl));
 		ctl.keysize = sizeof(ConnCacheKey);
 		ctl.entrysize = sizeof(ConnCacheEntry);
-		/* allocate ConnectionHash in the cache context */
-		ctl.hcxt = CacheMemoryContext;
 		ConnectionHash = hash_create("postgres_fdw connections", 8, &ctl,
-									 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+									 HASH_ELEM | HASH_BLOBS);
 
 		/*
 		 * Register some callback functions that manage connection cleanup.
diff --git a/contrib/postgres_fdw/shippable.c b/contrib/postgres_fdw/shippable.c
index 3433c19712..b4766dc5ff 100644
--- a/contrib/postgres_fdw/shippable.c
+++ b/contrib/postgres_fdw/shippable.c
@@ -93,7 +93,6 @@ InitializeShippableCache(void)
 	HASHCTL		ctl;
 
 	/* Create the hash table. */
-	MemSet(&ctl, 0, sizeof(ctl));
 	ctl.keysize = sizeof(ShippableCacheKey);
 	ctl.entrysize = sizeof(ShippableCacheEntry);
 	ShippableCacheHash =
diff --git a/contrib/tablefunc/tablefunc.c b/contrib/tablefunc/tablefunc.c
index 85986ec24a..e9a9741154 100644
--- a/contrib/tablefunc/tablefunc.c
+++ b/contrib/tablefunc/tablefunc.c
@@ -714,7 +714,6 @@ load_categories_hash(char *cats_sql, MemoryContext per_query_ctx)
 	MemoryContext SPIcontext;
 
 	/* initialize the category hash table */
-	MemSet(&ctl, 0, sizeof(ctl));
 	ctl.keysize = MAX_CATNAME_LEN;
 	ctl.entrysize = sizeof(crosstab_HashEnt);
 	ctl.hcxt = per_query_ctx;
@@ -726,7 +725,7 @@ load_categories_hash(char *cats_sql, MemoryContext per_query_ctx)
 	crosstab_hash = hash_create("crosstab hash",
 								INIT_CATS,
 								&ctl,
-								HASH_ELEM | HASH_CONTEXT);
+								HASH_ELEM | HASH_STRINGS | HASH_CONTEXT);
 
 	/* Connect to SPI manager */
 	if ((ret = SPI_connect()) < 0)
diff --git a/src/backend/access/gist/gistbuildbuffers.c b/src/backend/access/gist/gistbuildbuffers.c
index 4ad67c88b4..217c199a14 100644
--- a/src/backend/access/gist/gistbuildbuffers.c
+++ b/src/backend/access/gist/gistbuildbuffers.c
@@ -76,7 +76,6 @@ gistInitBuildBuffers(int pagesPerBuffer, int levelStep, int maxLevel)
 	 * nodeBuffersTab hash is association between index blocks and it's
 	 * buffers.
 	 */
-	memset(&hashCtl, 0, sizeof(hashCtl));
 	hashCtl.keysize = sizeof(BlockNumber);
 	hashCtl.entrysize = sizeof(GISTNodeBuffer);
 	hashCtl.hcxt = CurrentMemoryContext;
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index a664ecf494..c77a189907 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -1363,7 +1363,6 @@ _hash_finish_split(Relation rel, Buffer metabuf, Buffer obuf, Bucket obucket,
 	bool		found;
 
 	/* Initialize hash tables used to track TIDs */
-	memset(&hash_ctl, 0, sizeof(hash_ctl));
 	hash_ctl.keysize = sizeof(ItemPointerData);
 	hash_ctl.entrysize = sizeof(ItemPointerData);
 	hash_ctl.hcxt = CurrentMemoryContext;
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index 39e33763df..65942cc428 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -266,7 +266,6 @@ begin_heap_rewrite(Relation old_heap, Relation new_heap, TransactionId oldest_xm
 	state->rs_cxt = rw_cxt;
 
 	/* Initialize hash tables used to track update chains */
-	memset(&hash_ctl, 0, sizeof(hash_ctl));
 	hash_ctl.keysize = sizeof(TidHashKey);
 	hash_ctl.entrysize = sizeof(UnresolvedTupData);
 	hash_ctl.hcxt = state->rs_cxt;
@@ -824,7 +823,6 @@ logical_begin_heap_rewrite(RewriteState state)
 	state->rs_begin_lsn = GetXLogInsertRecPtr();
 	state->rs_num_rewrite_mappings = 0;
 
-	memset(&hash_ctl, 0, sizeof(hash_ctl));
 	hash_ctl.keysize = sizeof(TransactionId);
 	hash_ctl.entrysize
Re: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)
I wrote:
> There are a couple of other API oddities that maybe we should think
> about while we're here:
> * Should we just have a blanket insistence that all callers supply
> HASH_ELEM?  The default sizes that dynahash.c uses without that are
> undocumented and basically useless.  We're already asserting that
> in the HASH_BLOBS path, which is the majority use-case, and this
> patch now asserts it for HASH_STRINGS too.

Here's a follow-up patch for that part, which also tries to respond
a bit to Heikki's complaint about skimpy documentation.  While at it,
I const-ified the HASHCTL argument, since there's no need for
hash_create to modify that.

			regards, tom lane

diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 07cae638df..49f21b77bb 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -317,11 +317,20 @@ string_compare(const char *key1, const char *key2, Size keysize)
  * *info: additional table parameters, as indicated by flags
  * flags: bitmask indicating which parameters to take from *info
  *
- * The flags value must include exactly one of HASH_STRINGS, HASH_BLOBS,
+ * The flags value *must* include HASH_ELEM.  (Formerly, this was nominally
+ * optional, but the default keysize and entrysize values were useless.)
+ * The flags value must also include exactly one of HASH_STRINGS, HASH_BLOBS,
  * or HASH_FUNCTION, to define the key hashing semantics (C strings,
  * binary blobs, or custom, respectively).  Callers specifying a custom
  * hash function will likely also want to use HASH_COMPARE, and perhaps
  * also HASH_KEYCOPY, to control key comparison and copying.
+ * Another often-used flag is HASH_CONTEXT, to allocate the hash table
+ * under info->hcxt rather than under TopMemoryContext; the default
+ * behavior is only suitable for session-lifespan hash tables.
+ * Other flags bits are special-purpose and seldom used.
+ *
+ * Fields in *info are read only when the associated flags bit is set.
+ * It is not necessary to initialize other fields of *info.
  *
  * Note: for a shared-memory hashtable, nelem needs to be a pretty good
  * estimate, since we can't expand the table on the fly.  But an unshared
@@ -330,11 +339,19 @@ string_compare(const char *key1, const char *key2, Size keysize)
  * large nelem will penalize hash_seq_search speed without buying much.
  */
 HTAB *
-hash_create(const char *tabname, long nelem, HASHCTL *info, int flags)
+hash_create(const char *tabname, long nelem, const HASHCTL *info, int flags)
 {
 	HTAB	   *hashp;
 	HASHHDR    *hctl;
 
+	/*
+	 * Hash tables now allocate space for key and data, but you have to say
+	 * how much space to allocate.
+	 */
+	Assert(flags & HASH_ELEM);
+	Assert(info->keysize > 0);
+	Assert(info->entrysize >= info->keysize);
+
 	/*
 	 * For shared hash tables, we have a local hash header (HTAB struct) that
 	 * we allocate in TopMemoryContext; all else is in shared memory.
@@ -385,7 +402,6 @@ hash_create(const char *tabname, long nelem, HASHCTL *info, int flags)
 	{
 		Assert(!(flags & HASH_STRINGS));
 		/* We can optimize hashing for common key sizes */
-		Assert(flags & HASH_ELEM);
 		if (info->keysize == sizeof(uint32))
 			hashp->hash = uint32_hash;
 		else
@@ -402,7 +418,6 @@ hash_create(const char *tabname, long nelem, HASHCTL *info, int flags)
 		 * it's almost certainly an integer or pointer not a string.
 		 */
 		Assert(flags & HASH_STRINGS);
-		Assert(flags & HASH_ELEM);
 		Assert(info->keysize > 8);
 
 		hashp->hash = string_hash;
@@ -529,16 +544,9 @@ hash_create(const char *tabname, long nelem, HASHCTL *info, int flags)
 		hctl->dsize = info->dsize;
 	}
 
-	/*
-	 * hash table now allocates space for key and data but you have to say how
-	 * much space to allocate
-	 */
-	if (flags & HASH_ELEM)
-	{
-		Assert(info->entrysize >= info->keysize);
-		hctl->keysize = info->keysize;
-		hctl->entrysize = info->entrysize;
-	}
+	/* remember the entry sizes, too */
+	hctl->keysize = info->keysize;
+	hctl->entrysize = info->entrysize;
 
 	/* make local copies of heavily-used constant fields */
 	hashp->keysize = hctl->keysize;
@@ -617,10 +625,6 @@ hdefault(HTAB *hashp)
 	hctl->dsize = DEF_DIRSIZE;
 	hctl->nsegs = 0;
 
-	/* rather pointless defaults for key & entry size */
-	hctl->keysize = sizeof(char *);
-	hctl->entrysize = 2 * sizeof(char *);
-
 	hctl->num_partitions = 0;	/* not partitioned */
 
 	/* table has no fixed maximum size */
diff --git a/src/include/utils/hsearch.h b/src/include/utils/hsearch.h
index 666ad33567..c3daaae92b 100644
--- a/src/include/utils/hsearch.h
+++ b/src/include/utils/hsearch.h
@@ -124,7 +124,7 @@ typedef struct
  * one of these.
  */
 extern HTAB *hash_create(const char *tabname, long nelem,
-						 HASHCTL *info, int flags);
+						 const HASHCTL *info, int flags);
 extern void hash_destroy(HTAB *hashp);
 extern void hash_stats(const char *where, HTAB *hashp);
 extern void *hash_search(HTAB *hashp, const void
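[For illustration: the rules the patch enforces with Asserts can be written out as a self-contained predicate. The flag bit values and the hash_flags_ok() helper below are stand-ins for this sketch only; PostgreSQL's real definitions live in utils/hsearch.h and dynahash.c.]

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in flag bits (hypothetical values, for the sketch only). */
#define HASH_ELEM      0x0001
#define HASH_STRINGS   0x0002
#define HASH_BLOBS     0x0004
#define HASH_FUNCTION  0x0008

typedef struct
{
    size_t keysize;
    size_t entrysize;
} HashCtlSketch;

/*
 * True when flags/info satisfy the post-patch contract: HASH_ELEM is
 * mandatory, exactly one key-semantics flag is given, the sizes are sane,
 * and string keys must have keysize > 8 (a tiny "string" key is almost
 * certainly a binary key that forgot HASH_BLOBS).
 */
static bool
hash_flags_ok(const HashCtlSketch *info, int flags)
{
    int kind = flags & (HASH_STRINGS | HASH_BLOBS | HASH_FUNCTION);

    if (!(flags & HASH_ELEM))
        return false;
    if (kind != HASH_STRINGS && kind != HASH_BLOBS && kind != HASH_FUNCTION)
        return false;           /* zero, or more than one, kind flag */
    if (info->keysize == 0 || info->entrysize < info->keysize)
        return false;
    if (kind == HASH_STRINGS && info->keysize <= 8)
        return false;
    return true;
}
```

The same checks appear in the patch as Asserts rather than a boolean result, so misuse fails loudly in assert-enabled builds instead of being silently tolerated.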
Re: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)
Noah Misch writes:
> On Sun, Dec 13, 2020 at 11:49:31AM -0500, Tom Lane wrote:
>> A quick count of grep hits suggests that the large majority of
>> existing hash_create() calls use HASH_BLOBS, and there might be
>> only order-of-ten calls that would need to be touched if we
>> required an explicit HASH_STRING flag.  So option #2 is seeming
>> kind of attractive.  Maybe that together with an assertion that
>> string keys have to exceed 8 or 16 bytes would be enough protection.

> Agreed.  I expect (2) gives most of the benefit.  Requiring 8-byte capacity
> should be harmless, and most architectures can zero 8 bytes in one
> instruction.  Requiring more bytes trades specificity for sensitivity.

Attached is a proposed patch that requires HASH_STRINGS to be stated
explicitly (in the event, there are 13 callers needing that) and
insists on keysize > 8 for string keys.

In examining the now-easily-visible uses of string keys, almost all of
them are using NAMEDATALEN-sized keys, or in a few places larger
values.  Only two are smaller:

1. ShmemIndex uses SHMEM_INDEX_KEYSIZE, which is only set to 48.

2. ResetUnloggedRelationsInDbspaceDir is using OIDCHARS + 1, because
it stores relfilenode OIDs as strings.  That seems pretty damfool to
me, so I'm inclined to change it to store binary OIDs instead; those'd
be a third the size (or probably a quarter the size after alignment
padding) and likely faster to hash or compare.  But I didn't do that
here, since it's still more than 8.  (I did whack it upside the head
to the extent of not storing its temporary hash table in
CacheMemoryContext.)

So it seems to me that insisting on keysize > 8 is fine.

There are a couple of other API oddities that maybe we should think
about while we're here:

* Should we just have a blanket insistence that all callers supply
HASH_ELEM?  The default sizes that dynahash.c uses without that are
undocumented and basically useless.  We're already asserting that
in the HASH_BLOBS path, which is the majority use-case, and this
patch now asserts it for HASH_STRINGS too.

* The coding convention that the HASHCTL argument struct should be
pre-zeroed seems to have been ignored at a lot of call sites.  I added
a memset call to a couple of callers that I was touching in this
patch, but I'm having second thoughts about that.  Maybe we should
just rip out all those memsets as pointless, since there's basically
no case where you'd use the memset to fill a field that you meant to
pass as zero.  The fact that hash_create() doesn't read fields it's
not told to by a flag means we should not need the memsets to avoid
uninitialized-memory reads.

			regards, tom lane

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 2dc9e44ae6..8b17fb06eb 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -2604,10 +2604,12 @@ createConnHash(void)
 {
 	HASHCTL		ctl;
 
+	memset(&ctl, 0, sizeof(ctl));
 	ctl.keysize = NAMEDATALEN;
 	ctl.entrysize = sizeof(remoteConnHashEnt);
 
-	return hash_create("Remote Con hash", NUMCONN, &ctl, HASH_ELEM);
+	return hash_create("Remote Con hash", NUMCONN, &ctl,
+					   HASH_ELEM | HASH_STRINGS);
 }
 
 static void
diff --git a/contrib/tablefunc/tablefunc.c b/contrib/tablefunc/tablefunc.c
index 85986ec24a..ec7819ca77 100644
--- a/contrib/tablefunc/tablefunc.c
+++ b/contrib/tablefunc/tablefunc.c
@@ -726,7 +726,7 @@ load_categories_hash(char *cats_sql, MemoryContext per_query_ctx)
 	crosstab_hash = hash_create("crosstab hash",
 								INIT_CATS,
 								&ctl,
-								HASH_ELEM | HASH_CONTEXT);
+								HASH_ELEM | HASH_STRINGS | HASH_CONTEXT);
 
 	/* Connect to SPI manager */
 	if ((ret = SPI_connect()) < 0)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 4b18be5b27..5ba7c2eb3c 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -414,7 +414,7 @@ InitQueryHashTable(void)
 	prepared_queries = hash_create("Prepared Queries",
 								   32,
 								   &hash_ctl,
-								   HASH_ELEM);
+								   HASH_ELEM | HASH_STRINGS);
 }
 
 /*
diff --git a/src/backend/nodes/extensible.c b/src/backend/nodes/extensible.c
index ab04459c55..2fe89fd361 100644
--- a/src/backend/nodes/extensible.c
+++ b/src/backend/nodes/extensible.c
@@ -51,7 +51,8 @@ RegisterExtensibleNodeEntry(HTAB **p_htable, const char *htable_label,
 		ctl.keysize = EXTNODENAME_MAX_LEN;
 		ctl.entrysize = sizeof(ExtensibleNodeEntry);
 
-		*p_htable = hash_create(htable_label, 100, &ctl, HASH_ELEM);
+		*p_htable = hash_create(htable_label, 100, &ctl,
+								HASH_ELEM | HASH_STRINGS);
 	}
 
 	if (strlen(extnodename) >= EXTNODENAME_MAX_LEN)
diff --git a/src/backend/storage/file/reinit.c b/src/backend/storage/file/reinit.c
index 0c2094f766..f21ab67ae4 100644
--- a/src/backend/storage/file/reinit.c
+++ b/src/backend/storage/file/reinit.c
@@ -175,7 +175,9 @@ ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op)
 		memset(&ctl, 0, sizeof(ctl));
 		ctl.keysize =
Re: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)
On Mon, Dec 14, 2020 at 1:36 AM Noah Misch wrote:
>
> On Sun, Dec 13, 2020 at 11:49:31AM -0500, Tom Lane wrote:
> > But what jumps out at me here is that this sort of error seems way
> > too easy to make, and evidently way too hard to detect.  What can we
> > do to make it more obvious if one has incorrectly used or omitted
> > HASH_BLOBS?  Both directions of error might easily escape notice on
> > little-endian hardware.
> >
> > I thought of a few ideas, all of which have drawbacks:
> >
> > 1. Invert the sense of the flag, ie HASH_BLOBS becomes the default.
> > This seems to just move the problem somewhere else, besides which
> > it'd require touching an awful lot of callers, and would silently
> > break third-party callers.
> >
> > 2. Don't allow a default: invent a new HASH_STRING flag, and
> > require that hash_create() calls specify exactly one of HASH_BLOBS,
> > HASH_STRING, or HASH_FUNCTION.  This doesn't completely fix the
> > hazard of mindless copy-and-paste, but I think it might make it
> > a little more obvious.  Still requires touching a lot of calls.
>
> I like (2), for making the bug harder and for greppability.  Probably
> pluralize it to HASH_STRINGS, for the parallel with HASH_BLOBS.
>
> > 3. Add some sort of heuristic restriction on keysize.  A keysize
> > that's only 4 or 8 bytes almost certainly is not a string.
> > This doesn't give us much traction for larger keysizes, though.
> >
> > 4. Disallow empty string keys, ie something like "Assert(s_len > 0)"
> > in string_hash().  I think we could get away with that given that
> > SQL disallows empty identifiers.  However, it would only help to
> > catch one direction of error (omitting HASH_BLOBS), and it would
> > only help on big-endian hardware, which is getting harder to find.
> > Still, we could hope that the buildfarm would detect errors.
>
> It's nontrivial to confirm that the empty-string key can't happen for a
> given hash table.  (In contrast, what (3) asserts on is usually a
> compile-time constant.)  I would stop short of adding (4), though it
> could be okay.
>
> > A quick count of grep hits suggests that the large majority of
> > existing hash_create() calls use HASH_BLOBS, and there might be
> > only order-of-ten calls that would need to be touched if we
> > required an explicit HASH_STRING flag.  So option #2 is seeming
> > kind of attractive.  Maybe that together with an assertion that
> > string keys have to exceed 8 or 16 bytes would be enough protection.
>
> Agreed.  I expect (2) gives most of the benefit.  Requiring 8-byte capacity
> should be harmless, and most architectures can zero 8 bytes in one
> instruction.  Requiring more bytes trades specificity for sensitivity.

+1.  I also think in most cases (2) would be sufficient to avoid such
bugs.  Adding a restriction on string key size might annoy some
out-of-core users who are already using small strings; however, an
8-byte minimum would still be okay.

--
With Regards,
Amit Kapila.
Re: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)
On 2020-12-13 17:49, Tom Lane wrote:
> 2. Don't allow a default: invent a new HASH_STRING flag, and
> require that hash_create() calls specify exactly one of HASH_BLOBS,
> HASH_STRING, or HASH_FUNCTION.  This doesn't completely fix the
> hazard of mindless copy-and-paste, but I think it might make it
> a little more obvious.  Still requires touching a lot of calls.

I think this sounds best, and we should also expand the documentation
of these flags a bit.

--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/
Re: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)
On Sun, Dec 13, 2020 at 11:49:31AM -0500, Tom Lane wrote:
> But what jumps out at me here is that this sort of error seems way
> too easy to make, and evidently way too hard to detect.  What can we
> do to make it more obvious if one has incorrectly used or omitted
> HASH_BLOBS?  Both directions of error might easily escape notice on
> little-endian hardware.
>
> I thought of a few ideas, all of which have drawbacks:
>
> 1. Invert the sense of the flag, ie HASH_BLOBS becomes the default.
> This seems to just move the problem somewhere else, besides which
> it'd require touching an awful lot of callers, and would silently
> break third-party callers.
>
> 2. Don't allow a default: invent a new HASH_STRING flag, and
> require that hash_create() calls specify exactly one of HASH_BLOBS,
> HASH_STRING, or HASH_FUNCTION.  This doesn't completely fix the
> hazard of mindless copy-and-paste, but I think it might make it
> a little more obvious.  Still requires touching a lot of calls.

I like (2), for making the bug harder and for greppability.  Probably
pluralize it to HASH_STRINGS, for the parallel with HASH_BLOBS.

> 3. Add some sort of heuristic restriction on keysize.  A keysize
> that's only 4 or 8 bytes almost certainly is not a string.
> This doesn't give us much traction for larger keysizes, though.
>
> 4. Disallow empty string keys, ie something like "Assert(s_len > 0)"
> in string_hash().  I think we could get away with that given that
> SQL disallows empty identifiers.  However, it would only help to
> catch one direction of error (omitting HASH_BLOBS), and it would
> only help on big-endian hardware, which is getting harder to find.
> Still, we could hope that the buildfarm would detect errors.

It's nontrivial to confirm that the empty-string key can't happen for a
given hash table.  (In contrast, what (3) asserts on is usually a
compile-time constant.)  I would stop short of adding (4), though it
could be okay.

> A quick count of grep hits suggests that the large majority of
> existing hash_create() calls use HASH_BLOBS, and there might be
> only order-of-ten calls that would need to be touched if we
> required an explicit HASH_STRING flag.  So option #2 is seeming
> kind of attractive.  Maybe that together with an assertion that
> string keys have to exceed 8 or 16 bytes would be enough protection.

Agreed.  I expect (2) gives most of the benefit.  Requiring 8-byte capacity
should be harmless, and most architectures can zero 8 bytes in one
instruction.  Requiring more bytes trades specificity for sensitivity.

> A different angle we could think about is that the name "HASH_BLOBS"
> is kind of un-obvious.  Maybe we should deprecate that spelling in
> favor of something like "HASH_BINARY".

With (2) in place, I wouldn't worry about renaming HASH_BLOBS.  It's
hard to confuse with HASH_STRINGS or HASH_FUNCTION.  If anything,
HASH_BLOBS conveys something more specific.  HASH_FUNCTION cases see
binary data, but that data has structure that promotes it out of
"blob" status.
HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)
Amit Kapila writes:
> On Wed, Dec 9, 2020 at 2:56 PM Noah Misch wrote:
>> The problem is xidhash using strcmp() to compare keys; it needs memcmp().

> Your analysis is correct.

Sorry for not having noticed this thread before.  Noah's fix is
clearly correct, and I have no objection to the added test case.

But what jumps out at me here is that this sort of error seems way
too easy to make, and evidently way too hard to detect.  What can we
do to make it more obvious if one has incorrectly used or omitted
HASH_BLOBS?  Both directions of error might easily escape notice on
little-endian hardware.

I thought of a few ideas, all of which have drawbacks:

1. Invert the sense of the flag, ie HASH_BLOBS becomes the default.
This seems to just move the problem somewhere else, besides which
it'd require touching an awful lot of callers, and would silently
break third-party callers.

2. Don't allow a default: invent a new HASH_STRING flag, and
require that hash_create() calls specify exactly one of HASH_BLOBS,
HASH_STRING, or HASH_FUNCTION.  This doesn't completely fix the
hazard of mindless copy-and-paste, but I think it might make it
a little more obvious.  Still requires touching a lot of calls.

3. Add some sort of heuristic restriction on keysize.  A keysize
that's only 4 or 8 bytes almost certainly is not a string.
This doesn't give us much traction for larger keysizes, though.

4. Disallow empty string keys, ie something like "Assert(s_len > 0)"
in string_hash().  I think we could get away with that given that
SQL disallows empty identifiers.  However, it would only help to
catch one direction of error (omitting HASH_BLOBS), and it would
only help on big-endian hardware, which is getting harder to find.
Still, we could hope that the buildfarm would detect errors.

There might be some more options.  Also, some of these ideas could
be applied in combination.

A quick count of grep hits suggests that the large majority of
existing hash_create() calls use HASH_BLOBS, and there might be
only order-of-ten calls that would need to be touched if we
required an explicit HASH_STRING flag.  So option #2 is seeming
kind of attractive.  Maybe that together with an assertion that
string keys have to exceed 8 or 16 bytes would be enough protection.

Also, this census now suggests to me that the opposite problem
(copy-and-paste HASH_BLOBS when you meant string keys) might be a
real hazard, since so many of the existing prototypes that you might
copy have HASH_BLOBS.  I'm not sure if there's much to be done for
this case though.  A small saving grace is that it seems relatively
likely that you'd notice a functional problem pretty quickly with
this type of mistake, since lookups would tend to fail due to
trailing garbage after your lookup string.

A different angle we could think about is that the name "HASH_BLOBS"
is kind of un-obvious.  Maybe we should deprecate that spelling in
favor of something like "HASH_BINARY".

Thoughts?

			regards, tom lane
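[For illustration: the "trailing garbage after your lookup string" hazard is easy to demonstrate outside PostgreSQL. The KEYSIZE constant and fill_key() helper below are hypothetical, not dynahash code; the point is only how a string-view and a blob-view comparison diverge on fixed-width key buffers.]

```c
#include <string.h>

#define KEYSIZE 16              /* hypothetical fixed key width */

/*
 * Fill a fixed-width key buffer from a C string, zeroing the tail first.
 * If the tail is left as stack garbage, a blob-style comparison (memcmp
 * over the full keysize, as HASH_BLOBS does) will see two copies of the
 * same string as different keys, so lookups fail -- even though a
 * string-style comparison (strcmp, stopping at the NUL) finds them equal.
 */
static void
fill_key(char *dst, const char *src)
{
    memset(dst, 0, KEYSIZE);
    strncpy(dst, src, KEYSIZE - 1);
}
```

This is why mistakenly using HASH_BLOBS with string keys tends to surface quickly: any uninitialized padding after the terminator changes the key's identity.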
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Dec 9, 2020 at 2:56 PM Noah Misch wrote:
>
> Further testing showed it was a file location problem, not a deletion problem.
> The worker tried to open
> base/pgsql_tmp/pgsql_tmp9896408.1.sharedfileset/16393-510.changes.0, but these
> were the files actually existing:
>
> [nm@power-aix 0:2 2020-12-08T13:56:35 64gcc 0]$ ls -la $(find
> src/test/subscription/tmp_check -name '*sharedfileset*')
> src/test/subscription/tmp_check/t_015_stream_subscriber_data/pgdata/base/pgsql_tmp/pgsql_tmp9896408.0.sharedfileset:
> total 408
> drwx------  2 nm usr    256 Dec 08 03:20 .
> drwx------  4 nm usr    256 Dec 08 03:20 ..
> -rw-------  1 nm usr 207806 Dec 08 03:20 16393-510.changes.0
>
> src/test/subscription/tmp_check/t_015_stream_subscriber_data/pgdata/base/pgsql_tmp/pgsql_tmp9896408.1.sharedfileset:
> total 0
> drwx------  2 nm usr    256 Dec 08 03:20 .
> drwx------  4 nm usr    256 Dec 08 03:20 ..
> -rw-------  1 nm usr      0 Dec 08 03:20 16393-511.changes.0
>
> > > I have executed "make check" in the loop with only this file. I have
> > > repeated it 5000 times but no failure, I am wondering shall we try to
> > > execute in the same machine in a loop where it failed once?
> >
> > Yes, that might help. Noah, would it be possible for you to try that
>
> The problem is xidhash using strcmp() to compare keys; it needs memcmp(). For
> this to matter, xidhash must contain more than one element. Existing tests
> rarely exercise the multi-element scenario. Under heavy load, on this system,
> the test publisher can have two active transactions at once, in which case it
> does exercise multi-element xidhash. (The publisher is sensitive to timing,
> but the subscriber is not; once WAL contains interleaved records of two XIDs,
> the subscriber fails every time.) This would be much harder to reproduce on a
> little-endian system, where strcmp(&xid, &xid_plus_one) != 0. On big-endian,
> every small XID has zero in the first octet; they all look like empty strings.
>

Your analysis is correct.
> The attached patch has the one-line fix and some test suite changes that make
> this reproduce frequently on any big-endian system. I'm currently planning to
> drop the test suite changes from the commit, but I could keep them if folks
> like them. (They'd need more comments and timeout handling.)
>

I think it is better to keep this test which can always test multiple streams on the subscriber. Thanks for working on this.

--
With Regards,
Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Dec 02, 2020 at 01:50:25PM +0530, Amit Kapila wrote: > On Wed, Dec 2, 2020 at 1:20 PM Dilip Kumar wrote: > > > On Mon, Nov 30, 2020 at 6:49 PM Amit Kapila > > > wrote: > > > > On Mon, Nov 30, 2020 at 3:14 AM Noah Misch wrote: > > > > > On Mon, Sep 07, 2020 at 12:00:41PM +0530, Amit Kapila wrote: > > > > > > Thanks, I have pushed the last patch. Let's wait for a day or so to > > > > > > see the buildfarm reports > > > > > > > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer=2020-09-08%2006%3A24%3A14 > > > > > failed the new 015_stream.pl test with the subscriber looping like > > > > > this: > > > > > > > > > > 2020-09-08 11:22:49.848 UTC [13959252:1] LOG: logical replication > > > > > apply worker for subscription "tap_sub" has started > > > > > 2020-09-08 11:22:54.045 UTC [13959252:2] ERROR: could not open > > > > > temporary file "16393-510.changes.0" from BufFile > > > > > "16393-510.changes": No such file or directory > > > > > 2020-09-08 11:22:54.055 UTC [7602182:1] LOG: logical replication > > > > > apply worker for subscription "tap_sub" has started > > > > > 2020-09-08 11:22:54.101 UTC [31785284:4] LOG: background worker > > > > > "logical replication worker" (PID 13959252) exited with exit code 1 > > > > > 2020-09-08 11:23:01.142 UTC [7602182:2] ERROR: could not open > > > > > temporary file "16393-510.changes.0" from BufFile > > > > > "16393-510.changes": No such file or directory > > > > > ... > > > > The above kind of error can happen due to the following reasons: (a) > > > > the first time we sent the stream and created the file and that got > > > > removed before the second stream reached the subscriber. (b) from the > > > > publisher-side, we never sent the indication that it is the first > > > > stream and the subscriber directly tries to open the file thinking it > > > > is already there. Further testing showed it was a file location problem, not a deletion problem. 
The worker tried to open
base/pgsql_tmp/pgsql_tmp9896408.1.sharedfileset/16393-510.changes.0, but these
were the files actually existing:

[nm@power-aix 0:2 2020-12-08T13:56:35 64gcc 0]$ ls -la $(find src/test/subscription/tmp_check -name '*sharedfileset*')
src/test/subscription/tmp_check/t_015_stream_subscriber_data/pgdata/base/pgsql_tmp/pgsql_tmp9896408.0.sharedfileset:
total 408
drwx------  2 nm usr    256 Dec 08 03:20 .
drwx------  4 nm usr    256 Dec 08 03:20 ..
-rw-------  1 nm usr 207806 Dec 08 03:20 16393-510.changes.0

src/test/subscription/tmp_check/t_015_stream_subscriber_data/pgdata/base/pgsql_tmp/pgsql_tmp9896408.1.sharedfileset:
total 0
drwx------  2 nm usr    256 Dec 08 03:20 .
drwx------  4 nm usr    256 Dec 08 03:20 ..
-rw-------  1 nm usr      0 Dec 08 03:20 16393-511.changes.0

> > I have executed "make check" in the loop with only this file. I have
> > repeated it 5000 times but no failure, I am wondering shall we try to
> > execute in the same machine in a loop where it failed once?
>
> Yes, that might help. Noah, would it be possible for you to try that

The problem is xidhash using strcmp() to compare keys; it needs memcmp(). For this to matter, xidhash must contain more than one element. Existing tests rarely exercise the multi-element scenario. Under heavy load, on this system, the test publisher can have two active transactions at once, in which case it does exercise multi-element xidhash. (The publisher is sensitive to timing, but the subscriber is not; once WAL contains interleaved records of two XIDs, the subscriber fails every time.) This would be much harder to reproduce on a little-endian system, where strcmp(&xid, &xid_plus_one) != 0. On big-endian, every small XID has zero in the first octet; they all look like empty strings.

The attached patch has the one-line fix and some test suite changes that make this reproduce frequently on any big-endian system. I'm currently planning to drop the test suite changes from the commit, but I could keep them if folks like them.
(They'd need more comments and timeout handling.)

Author:     Noah Misch
Commit:     Noah Misch

    Use HASH_BLOBS for xidhash.

    This caused BufFile errors on buildfarm member sungazer, and SIGSEGV
    was possible.  Conditions for reaching those symptoms were more
    frequent on big-endian systems.

    Reviewed by FIXME.

    Discussion: https://postgr.es/m/20201129214441.ga691...@rfd.leadboat.com

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index c37aafe..fce1dee 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -804,7 +804,7 @@ apply_handle_stream_start(StringInfo s)
 	hash_ctl.entrysize = sizeof(StreamXidHash);
 	hash_ctl.hcxt = ApplyContext;
 	xidhash = hash_create("StreamXidHash", 1024, &hash_ctl,
-
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Dec 2, 2020 at 1:20 PM Dilip Kumar wrote: > > On Tue, Dec 1, 2020 at 11:38 AM Dilip Kumar wrote: > > > > On Mon, Nov 30, 2020 at 6:49 PM Amit Kapila wrote: > > > > > > On Mon, Nov 30, 2020 at 3:14 AM Noah Misch wrote: > > > > > > > > On Mon, Sep 07, 2020 at 12:00:41PM +0530, Amit Kapila wrote: > > > > > Thanks, I have pushed the last patch. Let's wait for a day or so to > > > > > see the buildfarm reports > > > > > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer=2020-09-08%2006%3A24%3A14 > > > > failed the new 015_stream.pl test with the subscriber looping like this: > > > > > > > > 2020-09-08 11:22:49.848 UTC [13959252:1] LOG: logical replication > > > > apply worker for subscription "tap_sub" has started > > > > 2020-09-08 11:22:54.045 UTC [13959252:2] ERROR: could not open > > > > temporary file "16393-510.changes.0" from BufFile "16393-510.changes": > > > > No such file or directory > > > > 2020-09-08 11:22:54.055 UTC [7602182:1] LOG: logical replication apply > > > > worker for subscription "tap_sub" has started > > > > 2020-09-08 11:22:54.101 UTC [31785284:4] LOG: background worker > > > > "logical replication worker" (PID 13959252) exited with exit code 1 > > > > 2020-09-08 11:23:01.142 UTC [7602182:2] ERROR: could not open > > > > temporary file "16393-510.changes.0" from BufFile "16393-510.changes": > > > > No such file or directory > > > > ... > > > > > > > > What happened there? > > > > > > > > > > What is going on here is that the expected streaming file is missing. > > > Normally, the first time we send a stream of changes (some percentage > > > of transaction changes) we create the streaming file, and then in > > > respective streams we just keep on writing in that file the changes we > > > receive from the publisher, and on commit, we read that file and apply > > > all the changes. 
> > > > > > The above kind of error can happen due to the following reasons: (a) > > > the first time we sent the stream and created the file and that got > > > removed before the second stream reached the subscriber. (b) from the > > > publisher-side, we never sent the indication that it is the first > > > stream and the subscriber directly tries to open the file thinking it > > > is already there. > > > > > I have executed "make check" in the loop with only this file. I have > repeated it 5000 times but no failure, I am wondering shall we try to > execute in the same machine in a loop where it failed once? > Yes, that might help. Noah, would it be possible for you to try that out, and if it failed then probably get the stack trace of subscriber? If we are able to reproduce it then we can add elogs in functions SharedFileSetInit, BufFileCreateShared, BufFileOpenShared, and SharedFileSetDeleteAll to print the paths to see if we are sometimes unintentionally removing some files. I have checked the code and there doesn't appear to be any such problems but I might be missing something. -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Tue, Dec 1, 2020 at 11:38 AM Dilip Kumar wrote: > > On Mon, Nov 30, 2020 at 6:49 PM Amit Kapila wrote: > > > > On Mon, Nov 30, 2020 at 3:14 AM Noah Misch wrote: > > > > > > On Mon, Sep 07, 2020 at 12:00:41PM +0530, Amit Kapila wrote: > > > > Thanks, I have pushed the last patch. Let's wait for a day or so to > > > > see the buildfarm reports > > > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer=2020-09-08%2006%3A24%3A14 > > > failed the new 015_stream.pl test with the subscriber looping like this: > > > > > > 2020-09-08 11:22:49.848 UTC [13959252:1] LOG: logical replication apply > > > worker for subscription "tap_sub" has started > > > 2020-09-08 11:22:54.045 UTC [13959252:2] ERROR: could not open temporary > > > file "16393-510.changes.0" from BufFile "16393-510.changes": No such file > > > or directory > > > 2020-09-08 11:22:54.055 UTC [7602182:1] LOG: logical replication apply > > > worker for subscription "tap_sub" has started > > > 2020-09-08 11:22:54.101 UTC [31785284:4] LOG: background worker "logical > > > replication worker" (PID 13959252) exited with exit code 1 > > > 2020-09-08 11:23:01.142 UTC [7602182:2] ERROR: could not open temporary > > > file "16393-510.changes.0" from BufFile "16393-510.changes": No such file > > > or directory > > > ... > > > > > > What happened there? > > > > > > > What is going on here is that the expected streaming file is missing. > > Normally, the first time we send a stream of changes (some percentage > > of transaction changes) we create the streaming file, and then in > > respective streams we just keep on writing in that file the changes we > > receive from the publisher, and on commit, we read that file and apply > > all the changes. > > > > The above kind of error can happen due to the following reasons: (a) > > the first time we sent the stream and created the file and that got > > removed before the second stream reached the subscriber. 
(b) from the > > publisher-side, we never sent the indication that it is the first > > stream and the subscriber directly tries to open the file thinking it > > is already there. > > > > Now, the publisher and subscriber log doesn't directly indicate any of > > the above problems but I have some observations. > > > > The subscriber log indicates that before the apply worker exits due to > > an error the new apply worker gets started. We delete the > > streaming-related temporary files on proc_exit, so one possibility > > could have been that the new apply worker has created the streaming > > file which the old apply worker has removed but that is not possible > > because we always create these temp-files by having procid in the > > path. > > Yeah, and I have tried to test on this line, basically, after the > streaming has started I have set the binary=on. Now using gdb I have > made the worker wait before it deletes the temp file and meanwhile the > new worker started and it worked properly as expected. > > > The other thing I observed in the code is that we can mark the > > transaction as streamed (via ReorderBufferTruncateTxn) if we try to > > stream a transaction that has no changes the first time we try to > > stream the transaction. This would lead to symptom (b) because the > > second-time when there are more changes we would stream the changes as > > it is not the first time. However, this shouldn't happen because we > > never pick-up a transaction to stream which has no changes. I can try > > to fix the code here such that we don't mark the transaction as > > streamed unless we have streamed at least one change but I don't see > > how it is related to this particular test failure. > > Yeah, this can be improved but as you mentioned that we never select > an empty transaction for streaming so this case should not occur. I > will perform some testing/review around this and report. I have executed "make check" in the loop with only this file. 
I have repeated it 5000 times but no failure, I am wondering shall we try to execute in the same machine in a loop where it failed once? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Tue, Dec 1, 2020 at 11:38 AM Dilip Kumar wrote: > > On Mon, Nov 30, 2020 at 6:49 PM Amit Kapila wrote: > > > > > > What is going on here is that the expected streaming file is missing. > > Normally, the first time we send a stream of changes (some percentage > > of transaction changes) we create the streaming file, and then in > > respective streams we just keep on writing in that file the changes we > > receive from the publisher, and on commit, we read that file and apply > > all the changes. > > > > The above kind of error can happen due to the following reasons: (a) > > the first time we sent the stream and created the file and that got > > removed before the second stream reached the subscriber. (b) from the > > publisher-side, we never sent the indication that it is the first > > stream and the subscriber directly tries to open the file thinking it > > is already there. > > > > Now, the publisher and subscriber log doesn't directly indicate any of > > the above problems but I have some observations. > > > > The subscriber log indicates that before the apply worker exits due to > > an error the new apply worker gets started. We delete the > > streaming-related temporary files on proc_exit, so one possibility > > could have been that the new apply worker has created the streaming > > file which the old apply worker has removed but that is not possible > > because we always create these temp-files by having procid in the > > path. > > Yeah, and I have tried to test on this line, basically, after the > streaming has started I have set the binary=on. Now using gdb I have > made the worker wait before it deletes the temp file and meanwhile the > new worker started and it worked properly as expected. > > > The other thing I observed in the code is that we can mark the > > transaction as streamed (via ReorderBufferTruncateTxn) if we try to > > stream a transaction that has no changes the first time we try to > > stream the transaction. 
This would lead to symptom (b) because the
> > second-time when there are more changes we would stream the changes as
> > it is not the first time. However, this shouldn't happen because we
> > never pick up a transaction to stream which has no changes. I can try
> > to fix the code here such that we don't mark the transaction as
> > streamed unless we have streamed at least one change but I don't see
> > how it is related to this particular test failure.
>
> Yeah, this can be improved but as you mentioned that we never select
> an empty transaction for streaming so this case should not occur. I
> will perform some testing/review around this and report.
>

On further thinking about this point, I think the message seen on the subscriber [1] won't occur if we missed the first stream. This is because we always check the value of fileset from the stream hash table (xidhash) and it won't be there if we directly send the second stream, and that would have led to a different kind of problem (probably a crash). This symptom seems to be due to reason (a) mentioned above unless we are missing something else. Now, I am not sure how the file can be removed while the corresponding entry in the hash table (xidhash) is still present. The only reasons that come to mind are that some other process cleaned the pgsql_tmp directory thinking these temporary files are not required, or that someone removed them manually; neither seems a plausible reason.

[1] - ERROR: could not open temporary file "16393-510.changes.0" from BufFile "16393-510.changes": No such file or directory

--
With Regards,
Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Nov 30, 2020 at 6:49 PM Amit Kapila wrote: > > On Mon, Nov 30, 2020 at 3:14 AM Noah Misch wrote: > > > > On Mon, Sep 07, 2020 at 12:00:41PM +0530, Amit Kapila wrote: > > > Thanks, I have pushed the last patch. Let's wait for a day or so to > > > see the buildfarm reports > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer=2020-09-08%2006%3A24%3A14 > > failed the new 015_stream.pl test with the subscriber looping like this: > > > > 2020-09-08 11:22:49.848 UTC [13959252:1] LOG: logical replication apply > > worker for subscription "tap_sub" has started > > 2020-09-08 11:22:54.045 UTC [13959252:2] ERROR: could not open temporary > > file "16393-510.changes.0" from BufFile "16393-510.changes": No such file > > or directory > > 2020-09-08 11:22:54.055 UTC [7602182:1] LOG: logical replication apply > > worker for subscription "tap_sub" has started > > 2020-09-08 11:22:54.101 UTC [31785284:4] LOG: background worker "logical > > replication worker" (PID 13959252) exited with exit code 1 > > 2020-09-08 11:23:01.142 UTC [7602182:2] ERROR: could not open temporary > > file "16393-510.changes.0" from BufFile "16393-510.changes": No such file > > or directory > > ... > > > > What happened there? > > > > What is going on here is that the expected streaming file is missing. > Normally, the first time we send a stream of changes (some percentage > of transaction changes) we create the streaming file, and then in > respective streams we just keep on writing in that file the changes we > receive from the publisher, and on commit, we read that file and apply > all the changes. > > The above kind of error can happen due to the following reasons: (a) > the first time we sent the stream and created the file and that got > removed before the second stream reached the subscriber. (b) from the > publisher-side, we never sent the indication that it is the first > stream and the subscriber directly tries to open the file thinking it > is already there. 
> > Now, the publisher and subscriber log doesn't directly indicate any of > the above problems but I have some observations. > > The subscriber log indicates that before the apply worker exits due to > an error the new apply worker gets started. We delete the > streaming-related temporary files on proc_exit, so one possibility > could have been that the new apply worker has created the streaming > file which the old apply worker has removed but that is not possible > because we always create these temp-files by having procid in the > path. Yeah, and I have tried to test on this line, basically, after the streaming has started I have set the binary=on. Now using gdb I have made the worker wait before it deletes the temp file and meanwhile the new worker started and it worked properly as expected. > The other thing I observed in the code is that we can mark the > transaction as streamed (via ReorderBufferTruncateTxn) if we try to > stream a transaction that has no changes the first time we try to > stream the transaction. This would lead to symptom (b) because the > second-time when there are more changes we would stream the changes as > it is not the first time. However, this shouldn't happen because we > never pick-up a transaction to stream which has no changes. I can try > to fix the code here such that we don't mark the transaction as > streamed unless we have streamed at least one change but I don't see > how it is related to this particular test failure. Yeah, this can be improved but as you mentioned that we never select an empty transaction for streaming so this case should not occur. I will perform some testing/review around this and report. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Nov 30, 2020 at 3:14 AM Noah Misch wrote: > > On Mon, Sep 07, 2020 at 12:00:41PM +0530, Amit Kapila wrote: > > Thanks, I have pushed the last patch. Let's wait for a day or so to > > see the buildfarm reports > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer=2020-09-08%2006%3A24%3A14 > failed the new 015_stream.pl test with the subscriber looping like this: > > 2020-09-08 11:22:49.848 UTC [13959252:1] LOG: logical replication apply > worker for subscription "tap_sub" has started > 2020-09-08 11:22:54.045 UTC [13959252:2] ERROR: could not open temporary > file "16393-510.changes.0" from BufFile "16393-510.changes": No such file or > directory > 2020-09-08 11:22:54.055 UTC [7602182:1] LOG: logical replication apply > worker for subscription "tap_sub" has started > 2020-09-08 11:22:54.101 UTC [31785284:4] LOG: background worker "logical > replication worker" (PID 13959252) exited with exit code 1 > 2020-09-08 11:23:01.142 UTC [7602182:2] ERROR: could not open temporary file > "16393-510.changes.0" from BufFile "16393-510.changes": No such file or > directory > ... > > What happened there? > What is going on here is that the expected streaming file is missing. Normally, the first time we send a stream of changes (some percentage of transaction changes) we create the streaming file, and then in respective streams we just keep on writing in that file the changes we receive from the publisher, and on commit, we read that file and apply all the changes. The above kind of error can happen due to the following reasons: (a) the first time we sent the stream and created the file and that got removed before the second stream reached the subscriber. (b) from the publisher-side, we never sent the indication that it is the first stream and the subscriber directly tries to open the file thinking it is already there. Now, the publisher and subscriber log doesn't directly indicate any of the above problems but I have some observations. 
The subscriber log indicates that before the apply worker exits due to an error the new apply worker gets started. We delete the streaming-related temporary files on proc_exit, so one possibility could have been that the new apply worker has created the streaming file which the old apply worker has removed but that is not possible because we always create these temp-files by having procid in the path. The other thing I observed in the code is that we can mark the transaction as streamed (via ReorderBufferTruncateTxn) if we try to stream a transaction that has no changes the first time we try to stream the transaction. This would lead to symptom (b) because the second-time when there are more changes we would stream the changes as it is not the first time. However, this shouldn't happen because we never pick-up a transaction to stream which has no changes. I can try to fix the code here such that we don't mark the transaction as streamed unless we have streamed at least one change but I don't see how it is related to this particular test failure. I am not sure why this failure is not repeated since it occurred a few months back, it's probably a timing issue. I have few timing issues in the last month or so related to this feature but I am not able to come up with a theory if any of those would have fixed this problem. -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Nov 30, 2020 at 3:14 AM Noah Misch wrote:
>
> On Mon, Sep 07, 2020 at 12:00:41PM +0530, Amit Kapila wrote:
> > Thanks, I have pushed the last patch. Let's wait for a day or so to
> > see the buildfarm reports
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2020-09-08%2006%3A24%3A14
> failed the new 015_stream.pl test with the subscriber looping like this:
>

I will look into this.

--
With Regards,
Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Sep 07, 2020 at 12:00:41PM +0530, Amit Kapila wrote:
> Thanks, I have pushed the last patch. Let's wait for a day or so to
> see the buildfarm reports

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2020-09-08%2006%3A24%3A14
failed the new 015_stream.pl test with the subscriber looping like this:

2020-09-08 11:22:49.848 UTC [13959252:1] LOG:  logical replication apply worker for subscription "tap_sub" has started
2020-09-08 11:22:54.045 UTC [13959252:2] ERROR:  could not open temporary file "16393-510.changes.0" from BufFile "16393-510.changes": No such file or directory
2020-09-08 11:22:54.055 UTC [7602182:1] LOG:  logical replication apply worker for subscription "tap_sub" has started
2020-09-08 11:22:54.101 UTC [31785284:4] LOG:  background worker "logical replication worker" (PID 13959252) exited with exit code 1
2020-09-08 11:23:01.142 UTC [7602182:2] ERROR:  could not open temporary file "16393-510.changes.0" from BufFile "16393-510.changes": No such file or directory
...

What happened there?
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Tue, Sep 15, 2020 at 10:17 AM Amit Kapila wrote: > > On Tue, Sep 15, 2020 at 8:38 AM Tom Lane wrote: > > > As far as I can see they are useless in this case but I think they > > > might be required in case the user provides its own allocator function > > > (using HASH_ALLOC). So, we can probably remove those from here? > > > > You could imagine writing a HASH_ALLOC allocator whose behavior > > varies depending on CurrentMemoryContext, but it seems like a > > pretty foolish/fragile way to do it. In any case I can think of, > > the hash table lives in one specific context and you really > > really do not want parts of it spread across other contexts. > > dynahash.c is not going to look kindly on pieces of what it > > is managing disappearing from under it. > > > > I agree that doesn't make sense. I have fixed all the comments > discussed in the attached patch. > Pushed. -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Tue, Sep 15, 2020 at 8:38 AM Tom Lane wrote: > > Amit Kapila writes: > > On Mon, Sep 14, 2020 at 9:42 PM Tom Lane wrote: > >> BTW, unless someone has changed the behavior of dynahash when I > >> wasn't looking, those MemoryContextSwitchTos shown above are useless. > > > As far as I can see they are useless in this case but I think they > > might be required in case the user provides its own allocator function > > (using HASH_ALLOC). So, we can probably remove those from here? > > You could imagine writing a HASH_ALLOC allocator whose behavior > varies depending on CurrentMemoryContext, but it seems like a > pretty foolish/fragile way to do it. In any case I can think of, > the hash table lives in one specific context and you really > really do not want parts of it spread across other contexts. > dynahash.c is not going to look kindly on pieces of what it > is managing disappearing from under it. > I agree that doesn't make sense. I have fixed all the comments discussed in the attached patch. -- With Regards, Amit Kapila. v2-0001-Fix-initialization-of-RelationSyncEntry-for-strea.patch Description: Binary data
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Amit Kapila writes: > On Mon, Sep 14, 2020 at 9:42 PM Tom Lane wrote: >> BTW, unless someone has changed the behavior of dynahash when I >> wasn't looking, those MemoryContextSwitchTos shown above are useless. > As far as I can see they are useless in this case but I think they > might be required in case the user provides its own allocator function > (using HASH_ALLOC). So, we can probably remove those from here? You could imagine writing a HASH_ALLOC allocator whose behavior varies depending on CurrentMemoryContext, but it seems like a pretty foolish/fragile way to do it. In any case I can think of, the hash table lives in one specific context and you really really do not want parts of it spread across other contexts. dynahash.c is not going to look kindly on pieces of what it is managing disappearing from under it. (To be clear, objects that the hash entries contain pointers to are a different question. But the hash entries themselves have to have exactly the same lifespan as the hash table.) regards, tom lane
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Sep 14, 2020 at 9:42 PM Tom Lane wrote:
>
> Amit Kapila writes:
> > The attached patch will fix the issue. What do you think?
>
> I think it'd be cleaner to separate the initialization of a new entry from
> validation altogether, along the lines of
>
>     /* Find cached function info, creating if not found */
>     oldctx = MemoryContextSwitchTo(CacheMemoryContext);
>     entry = (RelationSyncEntry *) hash_search(RelationSyncCache,
>                                               (void *) &relid,
>                                               HASH_ENTER, &found);
>     MemoryContextSwitchTo(oldctx);
>     Assert(entry != NULL);
>
>     if (!found)
>     {
>         /* immediately make a new entry valid enough to satisfy callbacks */
>         entry->schema_sent = false;
>         entry->streamed_txns = NIL;
>         entry->replicate_valid = false;
>         /* are there any other fields we should clear here for safety??? */
>     }
>

If we want to separate validation then we need to initialize other fields like 'pubactions' and 'publish_as_relid' as well. I think it will be better to arrange it the way you are suggesting. So, I will change it along with other fields that required initialization.

>     /* Fill it in if not valid */
>     if (!entry->replicate_valid)
>     {
>         List *pubids = GetRelationPublications(relid);
>         ...
>
> BTW, unless someone has changed the behavior of dynahash when I
> wasn't looking, those MemoryContextSwitchTos shown above are useless.
>

As far as I can see they are useless in this case but I think they might be required in case the user provides its own allocator function (using HASH_ALLOC). So, we can probably remove those from here?

> Also, why does the comment refer to a "function" entry?
>

It should be "relation" instead. I'll take care of changing this as well.

--
With Regards,
Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Amit Kapila writes: > The attached patch will fix the issue. What do you think? I think it'd be cleaner to separate the initialization of a new entry from validation altogether, along the lines of /* Find cached function info, creating if not found */ oldctx = MemoryContextSwitchTo(CacheMemoryContext); entry = (RelationSyncEntry *) hash_search(RelationSyncCache, (void *) &relid, HASH_ENTER, &found); MemoryContextSwitchTo(oldctx); Assert(entry != NULL); if (!found) { /* immediately make a new entry valid enough to satisfy callbacks */ entry->schema_sent = false; entry->streamed_txns = NIL; entry->replicate_valid = false; /* are there any other fields we should clear here for safety??? */ } /* Fill it in if not valid */ if (!entry->replicate_valid) { List *pubids = GetRelationPublications(relid); ... BTW, unless someone has changed the behavior of dynahash when I wasn't looking, those MemoryContextSwitchTos shown above are useless. Also, why does the comment refer to a "function" entry? regards, tom lane
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Sep 14, 2020 at 4:50 PM Amit Kapila wrote: > > On Mon, Sep 14, 2020 at 1:23 PM Dilip Kumar wrote: > > > > On Mon, Sep 14, 2020 at 8:48 AM Amit Kapila wrote: > > > > > > > > > Yeah, this is right, and here is some initial analysis. It seems to be > > > failing in below code: > > > rel_sync_cache_relation_cb(){ ...list_free(entry->streamed_txns);..} > > > > > > This list can have elements only in 'streaming' mode (need to enable > > > 'streaming' with Create Subscription command) whereas none of the > > > tests in 010_truncate.pl is using 'streaming', so this list should be > > > empty (NULL). The two different assertion failures shown in BF reports > > > in list_free code are as below: > > > Assert(list->length > 0); > > > Assert(list->length <= list->max_length); > > > > > > It seems to me that this list is not initialized properly when it is > > > not used or maybe that is true in some special circumstances because > > > we initialize it in get_rel_sync_entry(). I am not sure if CCI build > > > is impacting this in some way. > > > > > > Even I have analyzed this but did not find any reason why the > > streamed_txns list should be anything other than NULL. The only thing > > is we are initializing the entry->streamed_txns to NULL and the list > > free is checking "if (list == NIL)" then return. However IMHO, that > > should not be an issue becase NIL is defined as (List*) NULL. > > > > Yeah, that is not the issue but it is better to initialize it with NIL > for the sake of consistency. The basic issue here was we were trying > to open/lock the relation(s) before initializing this list. Now, when > we process the invalidations during open relation, we try to access > this list in rel_sync_cache_relation_cb and that leads to assertion > failure. I have reproduced the exact scenario of 010_truncate.pl via > debugger. 
Basically, the backend on publisher has sent the > invalidation after truncating the relation 'tab1' and while processing > the truncate message if WALSender receives that message exactly after > creating the RelSyncEntry for 'tab1', the Assertion shown in BF can be > reproduced. Yeah, this is an issue and I am also able to reproduce this manually using gdb. Basically, I have inserted some data into the publication table and after that, I stopped in get_rel_sync_entry after creating the entry and before calling GetRelationPublications. Meanwhile, I truncated this table and then it hit the same issue you pointed here. > The attached patch will fix the issue. What do you think? The patch looks good to me and it fixes the reported issue. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Sep 14, 2020 at 1:23 PM Dilip Kumar wrote: > > On Mon, Sep 14, 2020 at 8:48 AM Amit Kapila wrote: > > > > > > Yeah, this is right, and here is some initial analysis. It seems to be > > failing in below code: > > rel_sync_cache_relation_cb(){ ...list_free(entry->streamed_txns);..} > > > > This list can have elements only in 'streaming' mode (need to enable > > 'streaming' with Create Subscription command) whereas none of the > > tests in 010_truncate.pl is using 'streaming', so this list should be > > empty (NULL). The two different assertion failures shown in BF reports > > in list_free code are as below: > > Assert(list->length > 0); > > Assert(list->length <= list->max_length); > > > > It seems to me that this list is not initialized properly when it is > > not used or maybe that is true in some special circumstances because > > we initialize it in get_rel_sync_entry(). I am not sure if CCI build > > is impacting this in some way. > > > Even I have analyzed this but did not find any reason why the > streamed_txns list should be anything other than NULL. The only thing > is we are initializing the entry->streamed_txns to NULL and the list > free is checking "if (list == NIL)" then return. However IMHO, that > should not be an issue becase NIL is defined as (List*) NULL. > Yeah, that is not the issue but it is better to initialize it with NIL for the sake of consistency. The basic issue here was we were trying to open/lock the relation(s) before initializing this list. Now, when we process the invalidations during open relation, we try to access this list in rel_sync_cache_relation_cb and that leads to assertion failure. I have reproduced the exact scenario of 010_truncate.pl via debugger. Basically, the backend on publisher has sent the invalidation after truncating the relation 'tab1' and while processing the truncate message if WALSender receives that message exactly after creating the RelSyncEntry for 'tab1', the Assertion shown in BF can be reproduced. 
The attached patch will fix the issue. What do you think? -- With Regards, Amit Kapila. v1-0001-Fix-initialization-of-RelationSyncEntry-for-strea.patch Description: Binary data
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Sep 14, 2020 at 8:48 AM Amit Kapila wrote: > > On Mon, Sep 14, 2020 at 3:08 AM Tom Lane wrote: > > > > Amit Kapila writes: > > > Pushed. > > > > Observe the following reports: > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=idiacanthus=2020-09-13%2016%3A54%3A03 > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes=2020-09-10%2009%3A08%3A03 > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=komodoensis=2020-09-05%2020%3A22%3A02 > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet=2020-09-04%2001%3A52%3A03 > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet=2020-09-03%2020%3A54%3A04 > > > > These are all on HEAD, and all within the last ten days, and I see > > nothing comparable in any branch before that. So it's hard to avoid > > the conclusion that somebody broke something about ten days ago. > > > > None of these animals provided gdb backtraces; but we do have a built-in > > trace from several, and they all look like pgoutput.so is trying to > > list_free() garbage, somewhere inside a relcache invalidation/rebuild > > scenario: > > > > Yeah, this is right, and here is some initial analysis. It seems to be > failing in below code: > rel_sync_cache_relation_cb(){ ...list_free(entry->streamed_txns);..} > > This list can have elements only in 'streaming' mode (need to enable > 'streaming' with Create Subscription command) whereas none of the > tests in 010_truncate.pl is using 'streaming', so this list should be > empty (NULL). The two different assertion failures shown in BF reports > in list_free code are as below: > Assert(list->length > 0); > Assert(list->length <= list->max_length); > > It seems to me that this list is not initialized properly when it is > not used or maybe that is true in some special circumstances because > we initialize it in get_rel_sync_entry(). I am not sure if CCI build > is impacting this in some way. 
I have also analyzed this but did not find any reason why the streamed_txns list should be anything other than NULL. The only thing is we are initializing entry->streamed_txns to NULL and list_free checks "if (list == NIL)" and returns. However IMHO, that should not be an issue because NIL is defined as (List*) NULL. I am doing further testing and investigation. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Sep 14, 2020 at 3:08 AM Tom Lane wrote: > > Amit Kapila writes: > > Pushed. > > Observe the following reports: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=idiacanthus=2020-09-13%2016%3A54%3A03 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes=2020-09-10%2009%3A08%3A03 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=komodoensis=2020-09-05%2020%3A22%3A02 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet=2020-09-04%2001%3A52%3A03 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet=2020-09-03%2020%3A54%3A04 > > These are all on HEAD, and all within the last ten days, and I see > nothing comparable in any branch before that. So it's hard to avoid > the conclusion that somebody broke something about ten days ago. > > None of these animals provided gdb backtraces; but we do have a built-in > trace from several, and they all look like pgoutput.so is trying to > list_free() garbage, somewhere inside a relcache invalidation/rebuild > scenario: > Yeah, this is right, and here is some initial analysis. It seems to be failing in below code: rel_sync_cache_relation_cb(){ ...list_free(entry->streamed_txns);..} This list can have elements only in 'streaming' mode (need to enable 'streaming' with Create Subscription command) whereas none of the tests in 010_truncate.pl is using 'streaming', so this list should be empty (NULL). The two different assertion failures shown in BF reports in list_free code are as below: Assert(list->length > 0); Assert(list->length <= list->max_length); It seems to me that this list is not initialized properly when it is not used or maybe that is true in some special circumstances because we initialize it in get_rel_sync_entry(). I am not sure if CCI build is impacting this in some way. -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
I wrote: > * Starting over, it appears that 001_rep_changes.pl almost immediately > gets into an infinite loop. It does not complete the third test step, > rather infinitely waiting for progress to be made. Ah, looking closer, the problem is that wal_receiver_timeout = 60s is too short when the sender is using CCA. It times out before we can get through the needed data transmission. regards, tom lane
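Tom's diagnosis suggests a practical workaround when running the replication TAP tests under CLOBBER_CACHE_ALWAYS, where decoding runs orders of magnitude slower than usual. A hedged sketch of postgresql.conf settings for such a run (the specific values are illustrative, not taken from the thread; 0 disables a timeout entirely):

```
# Illustrative settings for CLOBBER_CACHE_ALWAYS test runs:
wal_receiver_timeout = 0    # subscriber side: don't give up on a slow sender
wal_sender_timeout = 0      # publisher side: don't drop a slow receiver
```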
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Sep 14, 2020 at 3:08 AM Tom Lane wrote: > > Amit Kapila writes: > > Pushed. > > Observe the following reports: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=idiacanthus&dt=2020-09-13%2016%3A54%3A03 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2020-09-10%2009%3A08%3A03 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=komodoensis&dt=2020-09-05%2020%3A22%3A02 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2020-09-04%2001%3A52%3A03 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2020-09-03%2020%3A54%3A04 > > These are all on HEAD, and all within the last ten days, and I see > nothing comparable in any branch before that. So it's hard to avoid > the conclusion that somebody broke something about ten days ago. > I'll analyze these reports. -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
I wrote: > Probably this requires a relcache inval at the wrong time; > although we have recent passes from CLOBBER_CACHE_ALWAYS animals, > so that can't be the whole triggering condition. I wonder whether > it is relevant that all of the complaining animals are JIT-enabled. Hmmm ... I take that back. hyrax has indeed passed since this went in, but *it doesn't run any TAP tests*. So the buildfarm offers no information about whether the replication tests work under CLOBBER_CACHE_ALWAYS. Realizing that, I built an installation that way and tried to run the subscription tests. Results so far: * Running 010_truncate.pl by itself passed for me. So there's still some unexplained factor needed to trigger the buildfarm failures. (I'm wondering about concurrent autovacuum activity now...) * Starting over, it appears that 001_rep_changes.pl almost immediately gets into an infinite loop. It does not complete the third test step, rather infinitely waiting for progress to be made. The publisher log shows a repeating loop like 2020-09-13 21:16:05.734 EDT [928529] tap_sub LOG: could not send data to client: Broken pipe 2020-09-13 21:16:05.734 EDT [928529] tap_sub CONTEXT: slot "tap_sub", output plugin "pgoutput", in the commit callback, associated LSN 0/1660628 2020-09-13 21:16:05.843 EDT [928581] 001_rep_changes.pl LOG: statement: SELECT pg_current_wal_lsn() <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = 'tap_sub'; 2020-09-13 21:16:05.861 EDT [928582] tap_sub LOG: statement: SELECT pg_catalog.set_config('search_path', '', false); 2020-09-13 21:16:05.929 EDT [928582] tap_sub LOG: received replication command: IDENTIFY_SYSTEM 2020-09-13 21:16:05.930 EDT [928582] tap_sub LOG: received replication command: START_REPLICATION SLOT "tap_sub" LOGICAL 0/1652820 (proto_version '2', publication_names '"tap_pub","tap_pub_ins_only"') 2020-09-13 21:16:05.930 EDT [928582] tap_sub LOG: starting logical decoding for slot "tap_sub" 2020-09-13 
21:16:05.930 EDT [928582] tap_sub DETAIL: Streaming transactions committing after 0/1652820, reading WAL from 0/1651B20. 2020-09-13 21:16:05.930 EDT [928582] tap_sub LOG: logical decoding found consistent point at 0/1651B20 2020-09-13 21:16:05.930 EDT [928582] tap_sub DETAIL: There are no running transactions. 2020-09-13 21:16:21.560 EDT [928600] 001_rep_changes.pl LOG: statement: SELECT pg_current_wal_lsn() <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = 'tap_sub'; 2020-09-13 21:16:37.291 EDT [928610] 001_rep_changes.pl LOG: statement: SELECT pg_current_wal_lsn() <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = 'tap_sub'; 2020-09-13 21:16:52.959 EDT [928627] 001_rep_changes.pl LOG: statement: SELECT pg_current_wal_lsn() <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = 'tap_sub'; 2020-09-13 21:17:06.866 EDT [928636] tap_sub LOG: statement: SELECT pg_catalog.set_config('search_path', '', false); 2020-09-13 21:17:06.934 EDT [928636] tap_sub LOG: received replication command: IDENTIFY_SYSTEM 2020-09-13 21:17:06.934 EDT [928636] tap_sub LOG: received replication command: START_REPLICATION SLOT "tap_sub" LOGICAL 0/1652820 (proto_version '2', publication_names '"tap_pub","tap_pub_ins_only"') 2020-09-13 21:17:06.934 EDT [928636] tap_sub ERROR: replication slot "tap_sub" is active for PID 928582 2020-09-13 21:17:07.811 EDT [928638] tap_sub LOG: statement: SELECT pg_catalog.set_config('search_path', '', false); 2020-09-13 21:17:07.880 EDT [928638] tap_sub LOG: received replication command: IDENTIFY_SYSTEM 2020-09-13 21:17:07.881 EDT [928638] tap_sub LOG: received replication command: START_REPLICATION SLOT "tap_sub" LOGICAL 0/1652820 (proto_version '2', publication_names '"tap_pub","tap_pub_ins_only"') 2020-09-13 21:17:07.881 EDT [928638] tap_sub ERROR: replication slot "tap_sub" is active for PID 928582 2020-09-13 
21:17:08.618 EDT [928641] 001_rep_changes.pl LOG: statement: SELECT pg_current_wal_lsn() <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = 'tap_sub'; 2020-09-13 21:17:08.753 EDT [928642] tap_sub LOG: statement: SELECT pg_catalog.set_config('search_path', '', false); 2020-09-13 21:17:08.821 EDT [928642] tap_sub LOG: received replication command: IDENTIFY_SYSTEM 2020-09-13 21:17:08.821 EDT [928642] tap_sub LOG: received replication command: START_REPLICATION SLOT "tap_sub" LOGICAL 0/1652820 (proto_version '2', publication_names '"tap_pub","tap_pub_ins_only"') 2020-09-13 21:17:08.821 EDT [928642] tap_sub ERROR: replication slot "tap_sub" is active for PID 928582 2020-09-13 21:17:09.689 EDT [928645] tap_sub LOG: statement: SELECT pg_catalog.set_config('search_path', '', false); 2020-09-13 21:17:09.756 EDT [928645] tap_sub LOG: received replication command: IDENTIFY_SYSTEM
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Amit Kapila writes: > Pushed. Observe the following reports: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=idiacanthus=2020-09-13%2016%3A54%3A03 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes=2020-09-10%2009%3A08%3A03 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=komodoensis=2020-09-05%2020%3A22%3A02 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet=2020-09-04%2001%3A52%3A03 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet=2020-09-03%2020%3A54%3A04 These are all on HEAD, and all within the last ten days, and I see nothing comparable in any branch before that. So it's hard to avoid the conclusion that somebody broke something about ten days ago. None of these animals provided gdb backtraces; but we do have a built-in trace from several, and they all look like pgoutput.so is trying to list_free() garbage, somewhere inside a relcache invalidation/rebuild scenario: TRAP: FailedAssertion("list->length > 0", File: "/home/bf/build/buildfarm-idiacanthus/HEAD/pgsql.build/../pgsql/src/backend/nodes/list.c", Line: 68) postgres: publisher: walsender bf [local] idle(ExceptionalCondition+0x57)[0x9081f7] postgres: publisher: walsender bf [local] idle[0x6bcc70] postgres: publisher: walsender bf [local] idle(list_free+0x11)[0x6bdc01] /home/bf/build/buildfarm-idiacanthus/HEAD/pgsql.build/tmp_install/home/bf/build/buildfarm-idiacanthus/HEAD/inst/lib/postgresql/pgoutput.so(+0x35d8)[0x7fa4c5a6f5d8] postgres: publisher: walsender bf [local] idle(LocalExecuteInvalidationMessage+0x15b)[0x8f0cdb] postgres: publisher: walsender bf [local] idle(ReceiveSharedInvalidMessages+0x4b)[0x7bca0b] postgres: publisher: walsender bf [local] idle(LockRelationOid+0x56)[0x7c19e6] postgres: publisher: walsender bf [local] idle(relation_open+0x1c)[0x4a2d0c] postgres: publisher: walsender bf [local] idle(table_open+0x6)[0x524486] postgres: publisher: walsender bf [local] idle[0x9017f2] postgres: publisher: walsender bf [local] 
idle[0x8fabd4] postgres: publisher: walsender bf [local] idle[0x8fa58a] postgres: publisher: walsender bf [local] idle(RelationCacheInvalidateEntry+0xaf)[0x8fbdbf] postgres: publisher: walsender bf [local] idle(LocalExecuteInvalidationMessage+0xec)[0x8f0c6c] postgres: publisher: walsender bf [local] idle(ReceiveSharedInvalidMessages+0xcb)[0x7bca8b] postgres: publisher: walsender bf [local] idle(LockRelationOid+0x56)[0x7c19e6] postgres: publisher: walsender bf [local] idle(relation_open+0x1c)[0x4a2d0c] postgres: publisher: walsender bf [local] idle(table_open+0x6)[0x524486] postgres: publisher: walsender bf [local] idle[0x8ee8b0] 010_truncate.pl itself hasn't changed meaningfully in a good long time. However, I see that 464824323 added a whole boatload of code to pgoutput.c, and the timing is right for that commit to be the culprit, so that's what I'm betting on. Probably this requires a relcache inval at the wrong time; although we have recent passes from CLOBBER_CACHE_ALWAYS animals, so that can't be the whole triggering condition. I wonder whether it is relevant that all of the complaining animals are JIT-enabled. regards, tom lane
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Sep 9, 2020 at 2:26 PM Amit Kapila wrote: > > On Wed, Sep 9, 2020 at 2:13 PM Tomas Vondra > wrote: > > > > Hi, > > > > while looking at the streaming code I noticed two minor issues: > > > > 1) logicalrep_read_stream_stop is never defined/called, so the prototype > > in logicalproto.h is unnecessary > > > > 2) minor typo in one of the comments > > > > Patch attached. > > > > LGTM. > Pushed. -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Sep 9, 2020 at 2:13 PM Tomas Vondra wrote: > Hi, > > while looking at the streaming code I noticed two minor issues: > > 1) logicalrep_read_stream_stop is never defined/called, so the prototype > in logicalproto.h is unnecessary > > Yeah, right. > 2) minor typo in one of the comments > > Patch attached. > Looks good to me. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Sep 9, 2020 at 2:13 PM Tomas Vondra wrote: > > Hi, > > while looking at the streaming code I noticed two minor issues: > > 1) logicalrep_read_stream_stop is never defined/called, so the prototype > in logicalproto.h is unnecessary > > 2) minor typo in one of the comments > > Patch attached. > LGTM. -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Hi, while looking at the streaming code I noticed two minor issues: 1) logicalrep_read_stream_stop is never defined/called, so the prototype in logicalproto.h is unnecessary 2) minor typo in one of the comments Patch attached. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c index c29c088813..343f03129f 100644 --- a/src/backend/replication/pgoutput/pgoutput.c +++ b/src/backend/replication/pgoutput/pgoutput.c @@ -77,7 +77,7 @@ static void send_relation_and_attrs(Relation relation, TransactionId xid, * and with streamed transactions the commit order may be different from * the order the transactions are sent in. Also, the (sub) transactions * might get aborted so we need to send the schema for each (sub) transaction - * so that we don't loose the schema information on abort. For handling this, + * so that we don't lose the schema information on abort. For handling this, * we maintain the list of xids (streamed_txns) for those we have already sent * the schema. * diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h index 53905ee608..607a728508 100644 --- a/src/include/replication/logicalproto.h +++ b/src/include/replication/logicalproto.h @@ -133,7 +133,6 @@ extern void logicalrep_write_stream_start(StringInfo out, TransactionId xid, extern TransactionId logicalrep_read_stream_start(StringInfo in, bool *first_segment); extern void logicalrep_write_stream_stop(StringInfo out); -extern TransactionId logicalrep_read_stream_stop(StringInfo in); extern void logicalrep_write_stream_commit(StringInfo out, ReorderBufferTXN *txn, XLogRecPtr commit_lsn); extern TransactionId logicalrep_read_stream_commit(StringInfo out,
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Sep 7, 2020 at 12:57 PM Dilip Kumar wrote: > > On Mon, Sep 7, 2020 at 12:00 PM Amit Kapila wrote: >> >> On Sat, Sep 5, 2020 at 8:55 PM Dilip Kumar wrote: >> > >> > >> > I have reviewed the changes and looks fine to me. >> > >> >> Thanks, I have pushed the last patch. Let's wait for a day or so to >> see the buildfarm reports and then we can probably close this CF >> entry. > > > Thanks. > I have updated the status of CF entry as committed now. -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Sep 7, 2020 at 12:00 PM Amit Kapila wrote: > On Sat, Sep 5, 2020 at 8:55 PM Dilip Kumar wrote: > > > > > > I have reviewed the changes and looks fine to me. > > > > Thanks, I have pushed the last patch. Let's wait for a day or so to > see the buildfarm reports and then we can probably close this CF > entry. Thanks. > I am aware that we have one patch related to stats still > pending but I think we can tackle it along with the spill stats patch > which is being discussed in a different thread [1]. Do let me know if > I have missed anything? > > [1] - > https://www.postgresql.org/message-id/CAA4eK1JBqQh9cBKjO-nKOOE%3D7f6ONDCZp0TJZfn4VsQqRZ%2BuYA%40mail.gmail.com Sound good to me. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Sat, Sep 5, 2020 at 8:55 PM Dilip Kumar wrote: > > > I have reviewed the changes and looks fine to me. > Thanks, I have pushed the last patch. Let's wait for a day or so to see the buildfarm reports and then we can probably close this CF entry. I am aware that we have one patch related to stats still pending but I think we can tackle it along with the spill stats patch which is being discussed in a different thread [1]. Do let me know if I have missed anything? [1] - https://www.postgresql.org/message-id/CAA4eK1JBqQh9cBKjO-nKOOE%3D7f6ONDCZp0TJZfn4VsQqRZ%2BuYA%40mail.gmail.com -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Sat, 5 Sep 2020 at 4:02 PM, Amit Kapila wrote: > On Tue, Sep 1, 2020 at 8:33 PM Amit Kapila > wrote: > > > > > > On Tue, Sep 1, 2020 at 9:28 AM Amit Kapila > wrote: > > > > > > I have fixed all the comments except the below comments. > > > 1. verify the size of various tests to ensure that it is above > > > logical_decoding_work_mem. > > > 2. I have checked that in one of the previous patches, we have a test > > > v53-0004-Add-TAP-test-for-streaming-vs.-DDL which contains a test case > > > quite similar to what we have in > > > > v55-0002-Add-support-for-streaming-to-built-in-logical-re/013_stream_subxact_ddl_abort. > > > If there is any difference that can cover more scenarios then can we > > > consider merging them into one test? > > > > > > > I have compared these two tests and found that the only thing > > additional in the test case present in > > v53-0004-Add-TAP-test-for-streaming-vs.-DDL was that it was performing > > few savepoints and DMLs after doing the first rollback to savepoint > > and I included that in one of the existing tests in > > 018_stream_subxact_abort.pl. I have added one test for Rollback, > > changed few messages, removed one test case which was not making any > > sense in the patch. See attached and let me know what you think about > > it? I have reviewed the changes and looks fine to me. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Tue, Sep 1, 2020 at 8:33 PM Amit Kapila wrote: > > On Tue, Sep 1, 2020 at 9:28 AM Amit Kapila wrote: > > I have fixed all the comments except the below comments. > 1. verify the size of various tests to ensure that it is above > logical_decoding_work_mem. > 2. I have checked that in one of the previous patches, we have a test > v53-0004-Add-TAP-test-for-streaming-vs.-DDL which contains a test case > quite similar to what we have in > v55-0002-Add-support-for-streaming-to-built-in-logical-re/013_stream_subxact_ddl_abort. > If there is any difference that can cover more scenarios then can we > consider merging them into one test? > I have compared these two tests and found that the only thing additional in the test case present in v53-0004-Add-TAP-test-for-streaming-vs.-DDL was that it was performing few savepoints and DMLs after doing the first rollback to savepoint and I included that in one of the existing tests in 018_stream_subxact_abort.pl. I have added one test for Rollback, changed few messages, removed one test case which was not making any sense in the patch. See attached and let me know what you think about it? -- With Regards, Amit Kapila. v61-0001-Add-additional-tests-to-test-streaming-of-in-pro.patch Description: Binary data
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Fri, Sep 4, 2020 at 3:10 AM Bossart, Nathan wrote: > > I noticed a small compiler warning for this. > > diff --git a/src/backend/replication/logical/worker.c > b/src/backend/replication/logical/worker.c > index 812aca8011..88d3444c39 100644 > --- a/src/backend/replication/logical/worker.c > +++ b/src/backend/replication/logical/worker.c > @@ -199,7 +199,7 @@ typedef struct ApplySubXactData > static ApplySubXactData subxact_data = {0, 0, InvalidTransactionId, NULL}; > > static void subxact_filename(char *path, Oid subid, TransactionId xid); > -static void changes_filename(char *path, Oid subid, TransactionId xid); > +static inline void changes_filename(char *path, Oid subid, TransactionId > xid); > Thanks for the report, I'll take care of this. I think the nearby similar function subxact_filename() should also be inline. -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
I noticed a small compiler warning for this. diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c index 812aca8011..88d3444c39 100644 --- a/src/backend/replication/logical/worker.c +++ b/src/backend/replication/logical/worker.c @@ -199,7 +199,7 @@ typedef struct ApplySubXactData static ApplySubXactData subxact_data = {0, 0, InvalidTransactionId, NULL}; static void subxact_filename(char *path, Oid subid, TransactionId xid); -static void changes_filename(char *path, Oid subid, TransactionId xid); +static inline void changes_filename(char *path, Oid subid, TransactionId xid); /* * Information about subtransactions of a given toplevel transaction. Nathan
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Sep 2, 2020 at 7:19 PM Amit Kapila wrote: > On Wed, Sep 2, 2020 at 3:41 PM Dilip Kumar wrote: > > > > On Wed, Sep 2, 2020 at 10:55 AM Amit Kapila > wrote: > > > > > > > > > > > > > We can combine the tests in 015_stream_simple.pl and > > > 020_stream_binary.pl as I can't see a good reason to keep them > > > separate. Then, I think we can keep only this part with the main patch > > > and extract other tests into a separate patch. Basically, we can > > > commit the basic tests with the main patch and then keep the advanced > > > tests separately. I am afraid that there are some tests that don't add > > > much value so we can review them separately. > > > > Fixed > > > > I have slightly adjusted this test and ran pgindent on the patch. I am > planning to push this tomorrow unless you have more comments. > Looks good to me. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Sep 2, 2020 at 3:41 PM Dilip Kumar wrote: > > On Wed, Sep 2, 2020 at 10:55 AM Amit Kapila wrote: > > > > > > > > > We can combine the tests in 015_stream_simple.pl and > > 020_stream_binary.pl as I can't see a good reason to keep them > > separate. Then, I think we can keep only this part with the main patch > > and extract other tests into a separate patch. Basically, we can > > commit the basic tests with the main patch and then keep the advanced > > tests separately. I am afraid that there are some tests that don't add > > much value so we can review them separately. > > Fixed > I have slightly adjusted this test and ran pgindent on the patch. I am planning to push this tomorrow unless you have more comments. -- With Regards, Amit Kapila. v60-0001-Add-support-for-streaming-to-built-in-logical-re.patch Description: Binary data
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Tue, Sep 1, 2020 at 8:33 PM Amit Kapila wrote: > > On Tue, Sep 1, 2020 at 9:28 AM Amit Kapila wrote: > > > > On Mon, Aug 31, 2020 at 7:28 PM Dilip Kumar wrote: > > > > > > On Mon, Aug 31, 2020 at 1:24 PM Amit Kapila > > > wrote: > > > > In functions cleanup_rel_sync_cache and > > get_schema_sent_in_streamed_txn, lets cast the result of lfirst_int to > > uint32 as suggested by Tom [1]. Also, lets keep the way we compare > > xids consistent in both functions, i.e, if (xid == lfirst_int(lc)). > > > > Fixed this in the attached patch. > > > The behavior tested by the test case added for this is not clear > > primarily because of comments. > > > > +++ b/src/test/subscription/t/021_stream_schema.pl > > @@ -0,0 +1,80 @@ > > +# Test behavior with streaming transaction exceeding > > logical_decoding_work_mem > > ... > > +# large (streamed) transaction with DDL, DML and ROLLBACKs > > +$node_publisher->safe_psql('postgres', q{ > > +BEGIN; > > +ALTER TABLE test_tab ADD COLUMN c INT; > > +INSERT INTO test_tab SELECT i, md5(i::text), i FROM > > generate_series(3,3000) s(i); > > +ALTER TABLE test_tab ADD COLUMN d INT; > > +COMMIT; > > +}); > > + > > +# large (streamed) transaction with DDL, DML and ROLLBACKs > > +$node_publisher->safe_psql('postgres', q{ > > +BEGIN; > > +INSERT INTO test_tab SELECT i, md5(i::text), i, i FROM > > generate_series(3001,3005) s(i); > > +COMMIT; > > +}); > > +wait_for_caught_up($node_publisher, $appname); > > > > I understand that how this test will test the functionality related to > > schema_sent stuff but neither the comments atop of file nor atop the > > test case explains it clearly. > > > > Added comments for this test. > > > > > Few more comments: > > > > > > > > > 2. > > > > 009_stream_simple.pl > > > > +# Insert, update and delete enough rows to exceed the 64kB limit. 
> > > > +$node_publisher->safe_psql('postgres', q{ > > > > +BEGIN; > > > > +INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, > > > > 5000) s(i); > > > > +UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0; > > > > +DELETE FROM test_tab WHERE mod(a,3) = 0; > > > > +COMMIT; > > > > +}); > > > > > > > > How much above this data is 64kB limit? I just wanted to see that it > > > > should not be on borderline and then due to some alignment issues the > > > > streaming doesn't happen on some machines? Also, how such a test > > > > ensures that the streaming has happened because the way we are > > > > checking results, won't it be the same for the non-streaming case as > > > > well? > > > > > > Only for this case, or you mean for all the tests? > > > > > > > I have not done this yet. Most of the test cases are generating above 100kb and a few are around 72kb, Please find the test case wise data size. 015 - 200kb 016 - 150kb 017 - 72kb 018 - 72kb before first rollback to sb and total ~100kb 019 - 76kb before first rollback to sb and total ~100kb 020 - 150kb 021 - 100kb > > It is better to do it for all tests and I have clarified this in my > > next email sent yesterday [2] where I have raised a few more comments > > as well. I hope you have not missed that email. I saw that I think I replied to this before seeing that. > > > > 3. 
> > > > +# Change the local values of the extra columns on the subscriber, > > > > +# update publisher, and check that subscriber retains the expected > > > > +# values > > > > +$node_subscriber->safe_psql('postgres', "UPDATE test_tab SET c = > > > > 'epoch'::timestamptz + 987654321 * interval '1s'"); > > > > +$node_publisher->safe_psql('postgres', "UPDATE test_tab SET b = > > > > md5(a::text)"); > > > > + > > > > +wait_for_caught_up($node_publisher, $appname); > > > > + > > > > +$result = > > > > + $node_subscriber->safe_psql('postgres', "SELECT count(*), > > > > count(extract(epoch from c) = 987654321), count(d = 999) FROM > > > > test_tab"); > > > > +is($result, qq(3334|3334|3334), 'check extra columns contain locally > > > > changed data'); > > > > > > > > Again, how this test is relevant to streaming mode? > > > > > > I agree, it is not specific to the streaming. > > > > > I think we can leave this as of now. After committing the stats > patches by Sawada-San and Ajin, we might be able to improve this test. Make sense to me. > > +sub wait_for_caught_up > > +{ > > + my ($node, $appname) = @_; > > + > > + $node->poll_query_until('postgres', > > +"SELECT pg_current_wal_lsn() <= replay_lsn FROM pg_stat_replication > > WHERE application_name = '$appname';" > > + ) or die "Timed ou > > > > The patch has added this in all the test files if it is used in so > > many tests then we need to add this in some generic place > > (PostgresNode.pm) but actually, I am not sure if need this at all. Why > > can't the existing wait_for_catchup in PostgresNode.pm serve the same > > purpose. > > > > Changed as per this suggestion. Okay. > > 2. > > In system_views.sql, > > > > -- All columns of pg_subscription except subconninfo are readable. > > REVOKE ALL ON pg_subscription FROM
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Tue, Sep 1, 2020 at 8:33 PM Amit Kapila wrote: > > I have fixed all the comments except .. > 3. +# Change the local values of the extra columns on the subscriber, > +# update publisher, and check that subscriber retains the expected > +# values > +$node_subscriber->safe_psql('postgres', "UPDATE test_tab SET c = > 'epoch'::timestamptz + 987654321 * interval '1s'"); > +$node_publisher->safe_psql('postgres', "UPDATE test_tab SET b = > md5(a::text)"); > + > +wait_for_caught_up($node_publisher, $appname); > + > +$result = > + $node_subscriber->safe_psql('postgres', "SELECT count(*), > count(extract(epoch from c) = 987654321), count(d = 999) FROM > test_tab"); > +is($result, qq(3334|3334|3334), 'check extra columns contain locally > changed data'); > > Again, how this test is relevant to streaming mode? > I think we can keep this test in one of the newly added tests say in 015_stream_simple.pl to ensure that after streaming transaction, the non-streaming one behaves expectedly. So we can change the comment as "Change the local values of the extra columns on the subscriber, update publisher, and check that subscriber retains the expected values. This is to ensure that non-streaming transactions behave properly after a streaming transaction." We can remove this test from the other two places 016_stream_subxact.pl and 020_stream_binary.pl. > 4. Apart from the above, I think we should think of minimizing the > test cases which can be committed with the base patch. We can later > add more tests. > We can combine the tests in 015_stream_simple.pl and 020_stream_binary.pl as I can't see a good reason to keep them separate. Then, I think we can keep only this part with the main patch and extract other tests into a separate patch. Basically, we can commit the basic tests with the main patch and then keep the advanced tests separately. I am afraid that there are some tests that don't add much value so we can review them separately. 
One minor comment for option 'streaming = on', spacing-wise it should be consistent in all the tests. Similarly, we can combine 017_stream_ddl.pl and 021_stream_schema.pl as both contains similar tests. As per the above suggestion, this will be in a separate patch though. If you agree with the above suggestions then kindly make these adjustments and send the updated patch. -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Aug 31, 2020 at 10:27 PM Neha Sharma wrote: > > Hi Amit/Dilip, > > I have tested a few scenarios on top of the v56 patches, where the > replication worker still had few subtransactions in uncommitted state and we > restart the publisher server. > No crash or data discrepancies were observed, attached are the test scenarios > verified. > Thanks, I have pushed the fix (https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=4ab77697f67aa5b90b032b9175b46901859da6d7). -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Aug 31, 2020 at 7:28 PM Dilip Kumar wrote: > > On Mon, Aug 31, 2020 at 1:24 PM Amit Kapila wrote: > > > > On Mon, Aug 31, 2020 at 10:49 AM Amit Kapila > > wrote: > > > > > > On Sun, Aug 30, 2020 at 2:43 PM Dilip Kumar wrote: > > > > > > > > > > Another comment: > > > > > > +cleanup_rel_sync_cache(TransactionId xid, bool is_commit) > > > +{ > > > + HASH_SEQ_STATUS hash_seq; > > > + RelationSyncEntry *entry; > > > + > > > + Assert(RelationSyncCache != NULL); > > > + > > > + hash_seq_init(&hash_seq, RelationSyncCache); > > > + while ((entry = hash_seq_search(&hash_seq)) != NULL) > > > + { > > > + if (is_commit) > > > + entry->schema_sent = true; > > > > > > How is it correct to set 'entry->schema_sent' for all the entries in > > > RelationSyncCache? Consider a case where due to invalidation in an > > > unrelated transaction we have set the flag schema_sent for a > > > particular relation 'r1' as 'false' and that transaction is executed > > > before the current streamed transaction for which we are performing > > > commit and called this function. It will set the flag for unrelated > > > entry in this case 'r1' which doesn't seem correct to me. Or, if this > > > is correct, it would be a good idea to write some comments about it. > > Yeah, this is wrong, I have fixed this issue in the attached patch > and also added a new test for the same. > In functions cleanup_rel_sync_cache and get_schema_sent_in_streamed_txn, lets cast the result of lfirst_int to uint32 as suggested by Tom [1]. Also, lets keep the way we compare xids consistent in both functions, i.e, if (xid == lfirst_int(lc)). The behavior tested by the test case added for this is not clear primarily because of comments. +++ b/src/test/subscription/t/021_stream_schema.pl @@ -0,0 +1,80 @@ +# Test behavior with streaming transaction exceeding logical_decoding_work_mem ... 
+# large (streamed) transaction with DDL, DML and ROLLBACKs +$node_publisher->safe_psql('postgres', q{ +BEGIN; +ALTER TABLE test_tab ADD COLUMN c INT; +INSERT INTO test_tab SELECT i, md5(i::text), i FROM generate_series(3,3000) s(i); +ALTER TABLE test_tab ADD COLUMN d INT; +COMMIT; +}); + +# large (streamed) transaction with DDL, DML and ROLLBACKs +$node_publisher->safe_psql('postgres', q{ +BEGIN; +INSERT INTO test_tab SELECT i, md5(i::text), i, i FROM generate_series(3001,3005) s(i); +COMMIT; +}); +wait_for_caught_up($node_publisher, $appname); I understand that how this test will test the functionality related to schema_sent stuff but neither the comments atop of file nor atop the test case explains it clearly. > > Few more comments: > > > 2. > > 009_stream_simple.pl > > +# Insert, update and delete enough rows to exceed the 64kB limit. > > +$node_publisher->safe_psql('postgres', q{ > > +BEGIN; > > +INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) > > s(i); > > +UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0; > > +DELETE FROM test_tab WHERE mod(a,3) = 0; > > +COMMIT; > > +}); > > > > How much above this data is 64kB limit? I just wanted to see that it > > should not be on borderline and then due to some alignment issues the > > streaming doesn't happen on some machines? Also, how such a test > > ensures that the streaming has happened because the way we are > > checking results, won't it be the same for the non-streaming case as > > well? > > Only for this case, or you mean for all the tests? > It is better to do it for all tests and I have clarified this in my next email sent yesterday [2] where I have raised a few more comments as well. I hope you have not missed that email. > > 3. 
> > +# Change the local values of the extra columns on the subscriber, > > +# update publisher, and check that subscriber retains the expected > > +# values > > +$node_subscriber->safe_psql('postgres', "UPDATE test_tab SET c = > > 'epoch'::timestamptz + 987654321 * interval '1s'"); > > +$node_publisher->safe_psql('postgres', "UPDATE test_tab SET b = > > md5(a::text)"); > > + > > +wait_for_caught_up($node_publisher, $appname); > > + > > +$result = > > + $node_subscriber->safe_psql('postgres', "SELECT count(*), > > count(extract(epoch from c) = 987654321), count(d = 999) FROM > > test_tab"); > > +is($result, qq(3334|3334|3334), 'check extra columns contain locally > > changed data'); > > > > Again, how this test is relevant to streaming mode? > > I agree, it is not specific to the streaming. > > > Apart from the above, I have made a few changes in the attached patch > > which are mainly to simplify the code at one place, added/edited few > > comments, some other cosmetic changes, and renamed the test case files > > as the initials of their name were matching other tests in the similar > > directory. > > Changes look fine to me except this > > + > > + /* the value must be on/off */ > + if (strcmp(strVal(defel->arg), "on") && strcmp(strVal(defel->arg), "off")) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("invalid streaming value")));
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Hi Amit/Dilip, I have tested a few scenarios on top of the v56 patches, where the replication worker still had few subtransactions in uncommitted state and we restart the publisher server. No crash or data discrepancies were observed, attached are the test scenarios verified. *Data Setup:* *Publication Server postgresql.conf :* echo "wal_level = logical max_wal_senders = 10 max_replication_slots = 15 wal_log_hints = on hot_standby_feedback = on wal_receiver_status_interval = 1 listen_addresses='*' log_min_messages=debug1 wal_sender_timeout = 0 logical_decoding_work_mem=64kB *Subscription Server postgresql.conf :* wal_level = logical max_wal_senders = 10 max_replication_slots = 15 wal_log_hints = on hot_standby_feedback = on wal_receiver_status_interval = 1 listen_addresses='*' log_min_messages=debug1 wal_sender_timeout = 0 logical_decoding_work_mem=64kB port=5433 *Initial setup:* *Publication Server:* create table t(a int PRIMARY KEY ,b text); CREATE OR REPLACE FUNCTION large_val() RETURNS TEXT LANGUAGE SQL AS 'select array_agg(md5(g::text))::text from generate_series(1, 256) g'; create publication test_pub for table t with(PUBLISH='insert,delete,update,truncate'); alter table t replica identity FULL ; insert into t values (generate_series(1,20),large_val()) ON CONFLICT (a) DO UPDATE SET a=EXCLUDED.a*300; *Subscription server:* create table t(a int,b text); create subscription test_sub CONNECTION 'host=localhost port=5432 dbname=postgres user=edb' PUBLICATION test_pub WITH ( slot_name = test_slot_sub1,streaming=on); Thanks. 
-- Regards, Neha Sharma On Mon, Aug 31, 2020 at 1:25 PM Amit Kapila wrote: > On Mon, Aug 31, 2020 at 10:49 AM Amit Kapila > wrote: > > > > On Sun, Aug 30, 2020 at 2:43 PM Dilip Kumar > wrote: > > > > > > > Another comment: > > > > +cleanup_rel_sync_cache(TransactionId xid, bool is_commit) > > +{ > > + HASH_SEQ_STATUS hash_seq; > > + RelationSyncEntry *entry; > > + > > + Assert(RelationSyncCache != NULL); > > + > > + hash_seq_init(&hash_seq, RelationSyncCache); > > + while ((entry = hash_seq_search(&hash_seq)) != NULL) > > + { > > + if (is_commit) > > + entry->schema_sent = true; > > > > How is it correct to set 'entry->schema_sent' for all the entries in > > RelationSyncCache? Consider a case where due to invalidation in an > > unrelated transaction we have set the flag schema_sent for a > > particular relation 'r1' as 'false' and that transaction is executed > > before the current streamed transaction for which we are performing > > commit and called this function. It will set the flag for unrelated > > entry in this case 'r1' which doesn't seem correct to me. Or, if this > > is correct, it would be a good idea to write some comments about it. > > > > Few more comments: > 1. > +my $appname = 'tap_sub'; > +$node_subscriber->safe_psql('postgres', > +"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr > application_name=$appname' PUBLICATION tap_pub" > +); > > In most of the tests, we are using the above statement to create a > subscription. Don't we need (streaming = 'on') parameter while > creating a subscription? Is there a reason for not doing so in this > patch itself? > > 2. > 009_stream_simple.pl > +# Insert, update and delete enough rows to exceed the 64kB limit. > +$node_publisher->safe_psql('postgres', q{ > +BEGIN; > +INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) > s(i); > +UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0; > +DELETE FROM test_tab WHERE mod(a,3) = 0; > +COMMIT; > +}); > > How much above this data is 64kB limit? 
I just wanted to see that it > should not be on borderline and then due to some alignment issues the > streaming doesn't happen on some machines? Also, how such a test > ensures that the streaming has happened because the way we are > checking results, won't it be the same for the non-streaming case as > well? > > 3. > +# Change the local values of the extra columns on the subscriber, > +# update publisher, and check that subscriber retains the expected > +# values > +$node_subscriber->safe_psql('postgres', "UPDATE test_tab SET c = > 'epoch'::timestamptz + 987654321 * interval '1s'"); > +$node_publisher->safe_psql('postgres', "UPDATE test_tab SET b = > md5(a::text)"); > + > +wait_for_caught_up($node_publisher, $appname); > + > +$result = > + $node_subscriber->safe_psql('postgres', "SELECT count(*), > count(extract(epoch from c) = 987654321), count(d = 999) FROM > test_tab"); > +is($result, qq(3334|3334|3334), 'check extra columns contain locally > changed data'); > > Again, how this test is relevant to streaming mode? > > 4. I have checked that in one of the previous patches, we have a test > v53-0004-Add-TAP-test-for-streaming-vs.-DDL which contains a test case > quite similar to what we have in > > v55-0002-Add-support-for-streaming-to-built-in-logical-re/013_stream_subxact_ddl_abort. > If there is any difference that can cover more scenarios then can we > consider merging them into one test? > > Apart from the
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Aug 31, 2020 at 1:24 PM Amit Kapila wrote: > > > 2. > 009_stream_simple.pl > +# Insert, update and delete enough rows to exceed the 64kB limit. > +$node_publisher->safe_psql('postgres', q{ > +BEGIN; > +INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) > s(i); > +UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0; > +DELETE FROM test_tab WHERE mod(a,3) = 0; > +COMMIT; > +}); > > How much above this data is 64kB limit? I just wanted to see that it > should not be on borderline and then due to some alignment issues the > streaming doesn't happen on some machines? > I think we should find similar information for other tests added by the patch as well. Few other comments: === +sub wait_for_caught_up +{ + my ($node, $appname) = @_; + + $node->poll_query_until('postgres', +"SELECT pg_current_wal_lsn() <= replay_lsn FROM pg_stat_replication WHERE application_name = '$appname';" + ) or die "Timed ou The patch has added this in all the test files if it is used in so many tests then we need to add this in some generic place (PostgresNode.pm) but actually, I am not sure if need this at all. Why can't the existing wait_for_catchup in PostgresNode.pm serve the same purpose. 2. In system_views.sql, -- All columns of pg_subscription except subconninfo are readable. REVOKE ALL ON pg_subscription FROM public; GRANT SELECT (subdbid, subname, subowner, subenabled, subbinary, subslotname, subpublications) ON pg_subscription TO public; Here, we need to update for substream column as well. 3. Update describeSubscriptions() to show the 'substream' value in \dRs. 4. Also, lets add few tests in subscription.sql as we have added 'binary' option in commit 9de77b5453. 5. 
I think we can merge pg_dump related changes (the last version posted in mail thread is v53-0005-Add-streaming-option-in-pg_dump) in the main patch, one minor comment on pg_dump related changes @@ -4358,6 +4369,8 @@ dumpSubscription(Archive *fout, SubscriptionInfo *subinfo) if (strcmp(subinfo->subbinary, "t") == 0) appendPQExpBuffer(query, ", binary = true"); + if (strcmp(subinfo->substream, "f") != 0) + appendPQExpBuffer(query, ", streaming = on"); if (strcmp(subinfo->subsynccommit, "off") != 0) appendPQExpBuffer(query, ", synchronous_commit = %s", fmtId(subinfo->subsynccommit)); Keep one line space between substream and subsynccommit option code to keep it consistent with nearby code. -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Sun, Aug 30, 2020 at 2:43 PM Dilip Kumar wrote: > > On Sat, Aug 29, 2020 at 5:18 PM Amit Kapila wrote: > > > > > One more comment for which I haven't done anything yet. > > +static void > > +set_schema_sent_in_streamed_txn(RelationSyncEntry *entry, TransactionId > > xid) > > +{ > > + MemoryContext oldctx; > > + > > + oldctx = MemoryContextSwitchTo(CacheMemoryContext); > > + > > + entry->streamed_txns = lappend_int(entry->streamed_txns, xid); > > > Is it a good idea to append xid with lappend_int? Won't we need > > something equivalent for uint32? If so, I think we have a couple of > > options (a) use lcons method and accordingly append the pointer to > > xid, I think we need to allocate memory for xid if we want to use this > > idea or (b) use an array instead. What do you think? > > BTW, OID is internally mapped to uint32, but using lappend_oid might > not look good. So maybe we can provide an option for lappend_uint32? > Using an array is also not a bad idea. Providing lappend_uint32 > option looks more appealing to me. > I thought about this again and I feel it might be okay to use it for our case as after storing it in T_IntList, we primarily fetch it for comparison with TransactionId (uint32), so this shouldn't create any problem. I feel we can just discuss this in a separate thread and check the opinion of others, what do you think? Another comment: +cleanup_rel_sync_cache(TransactionId xid, bool is_commit) +{ + HASH_SEQ_STATUS hash_seq; + RelationSyncEntry *entry; + + Assert(RelationSyncCache != NULL); + + hash_seq_init(&hash_seq, RelationSyncCache); + while ((entry = hash_seq_search(&hash_seq)) != NULL) + { + if (is_commit) + entry->schema_sent = true; How is it correct to set 'entry->schema_sent' for all the entries in RelationSyncCache? 
Consider a case where due to invalidation in an unrelated transaction we have set the flag schema_sent for a particular relation 'r1' as 'false' and that transaction is executed before the current streamed transaction for which we are performing commit and called this function. It will set the flag for unrelated entry in this case 'r1' which doesn't seem correct to me. Or, if this is correct, it would be a good idea to write some comments about it. -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Fri, Aug 28, 2020 at 2:18 PM Dilip Kumar wrote: > > As discussed, I have added a another test case for covering the out of > order subtransaction rollback scenario. > +# large (streamed) transaction with out of order subtransaction ROLLBACKs +$node_publisher->safe_psql('postgres', q{ How about writing a comment as: "large (streamed) transaction with subscriber receiving out of order subtransaction ROLLBACKs"? I have reviewed and modified the number of things in the attached patch: 1. In apply_handle_origin, improved the check streamed xacts. 2. In apply_handle_stream_commit() while applying changes in the loop, added CHECK_FOR_INTERRUPTS. 3. In DEBUG messages, print the path with double-quotes as we are doing in all other places. 4. + /* + * Exit if streaming option is changed. The launcher will start new + * worker. + */ + if (newsub->stream != MySubscription->stream) + { + ereport(LOG, + (errmsg("logical replication apply worker for subscription \"%s\" will " + "restart because subscription's streaming option were changed", + MySubscription->name))); + + proc_exit(0); + } + We don't need a separate check like this. I have merged this into one of the existing checks. 5. subxact_info_write() { .. + if (subxact_data.nsubxacts == 0) + { + if (ent->subxact_fileset) + { + cleanup_subxact_info(); + BufFileDeleteShared(ent->subxact_fileset, path); + pfree(ent->subxact_fileset); + ent->subxact_fileset = NULL; + } I don't think it is right to use BufFileDeleteShared interface here because it won't perform SharedFileSetUnregister which means if after above code execution is the server exits it will crash in SharedFileSetDeleteOnProcExit which will try to access already deleted fileset entry. Fixed this by calling SharedFileSetDeleteAll() instead. The another related problem is that in function SharedFileSetDeleteOnProcExit, it tries to delete the list element while traversing the list with 'foreach' construct which makes the behavior of list traversal unpredictable. 
I have fixed this in a separate patch v54-0001-Fix-the-SharedFileSetUnregister-API, if you are fine with this, I would like to commit this as this fixes a problem in the existing commit 808e13b282. 6. Function stream_cleanup_files() contains a missing_ok argument which is not used so removed it. 7. In pgoutput.c, change the ordering of functions to make them consistent with their declaration. 8. typedef struct RelationSyncEntry { Oid relid; /* relation oid */ + TransactionId xid; /* transaction that created the record */ Removed above parameter as this doesn't seem to be required as per the new design in the patch. Apart from above, I have added/changed quite a few comments and a few other cosmetic changes. Kindly review and let me know what do you think about the changes? One more comment for which I haven't done anything yet. +static void +set_schema_sent_in_streamed_txn(RelationSyncEntry *entry, TransactionId xid) +{ + MemoryContext oldctx; + + oldctx = MemoryContextSwitchTo(CacheMemoryContext); + + entry->streamed_txns = lappend_int(entry->streamed_txns, xid); Is it a good idea to append xid with lappend_int? Won't we need something equivalent for uint32? If so, I think we have a couple of options (a) use lcons method and accordingly append the pointer to xid, I think we need to allocate memory for xid if we want to use this idea or (b) use an array instead. What do you think? -- With Regards, Amit Kapila. v54-0001-Fix-the-SharedFileSetUnregister-API.patch Description: Binary data v54-0002-Add-support-for-streaming-to-built-in-logical-re.patch Description: Binary data
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Hi, I have done code coverage analysis on the latest patches (v53) and below is the report for the same. Highlighted are the files where the coverage modifications were observed. OS: Ubuntu 18.04 Patch applied on commit: 77c7267c37f7fa8e5e48abda4798afdbecb2b95a

Columns: (A) Without logical decoding patch, (B) On v53 (2,3,4,5) patch, (C) Without v53-0003 patch; each shown as %Line/%Function.

File Name                                                       A           B           C
src/backend/access/transam/xact.c                               86.2/92.9   86.2/92.9   86.2/92.9
src/backend/access/transam/xloginsert.c                         90.2/94.1   90.2/94.1   90.2/94.1
src/backend/access/transam/xlogreader.c                         73.3/93.3   73.8/93.3   73.8/93.3
src/backend/replication/logical/decode.c                        93.4/100    93.4/100    93.4/100
src/backend/access/rmgrdesc/xactdesc.c                          54.4/63.6   54.4/63.6   54.4/63.6
src/backend/replication/logical/reorderbuffer.c                 93.4/96.7   93.4/96.7   93.4/96.7
src/backend/utils/cache/inval.c                                 98.1/100    98.1/100    98.1/100
contrib/test_decoding/test_decoding.c                           86.8/95.2   86.8/95.2   86.8/95.2
src/backend/replication/logical/logical.c                       90.9/93.5   90.9/93.5   91.8/93.5
src/backend/access/heap/heapam.c                                86.1/94.5   86.1/94.5   86.1/94.5
src/backend/access/index/genam.c                                90.7/91.7   91.2/91.7   91.2/91.7
src/backend/access/table/tableam.c                              90.6/100    90.6/100    90.6/100
src/backend/utils/time/snapmgr.c                                81.1/98.1   80.2/98.1   81.1/98.1
src/include/access/tableam.h                                    92.5/100    92.5/100    92.5/100
src/backend/access/heap/heapam_visibility.c                     77.8/100    77.8/100    77.8/100
src/backend/replication/walsender.c                             90.5/97.8   90.5/97.8   90.9/100
src/backend/catalog/pg_subscription.c                           96/100      96/100      96/100
src/backend/commands/subscriptioncmds.c                         93.2/90     92.7/90     92.7/90
src/backend/postmaster/pgstat.c                                 64.2/85.1   63.9/85.1   64.6/86.1
src/backend/replication/libpqwalreceiver/libpqwalreceiver.c     82.4/95     82.5/95     83.6/95
src/backend/replication/logical/proto.c                         93.5/91.3   93.7/93.3   93.7/93.3
src/backend/replication/logical/worker.c                        91.6/96     91.5/97.4   91.9/97.4
src/backend/replication/pgoutput/pgoutput.c                     81.9/100    85.5/100    86.2/100
src/backend/replication/slotfuncs.c                             93/93.8     93/93.8     93/93.8
src/include/pgstat.h                                            100/-       100/-       100/-
src/backend/replication/logical/logicalfuncs.c                  87.1/90     87.1/90     87.1/90
src/backend/storage/file/buffile.c                              68.3/85     69.6/85     69.6/85
src/backend/storage/file/fd.c                                   81.1/93     81.1/93     81.1/93
src/backend/storage/file/sharedfileset.c                        77.7/90.9   93.2/100    93.2/100
src/backend/utils/sort/logtape.c                                94.4/100    94.4/100    94.4/100
src/backend/utils/sort/sharedtuplestore.c                       90.1/90.9   90.1/90.9   90.1/90.9

Thanks. -- Regards, Neha Sharma On Thu, Aug 27, 2020 at 11:16 AM Amit Kapila wrote: > On Wed, Aug 26, 2020 at 11:22 PM Jeff Janes wrote: > > > > > > On Tue, Aug 25, 2020 at 8:58 AM Amit Kapila > wrote: > > > >> > >> I am planning > >> to push the first patch (v53-0001-Extend-the-BufFile-interface) in > >> this series tomorrow unless you have any comments on the same. > > > > > > > > I'm getting compiler warnings now, > src/backend/storage/file/sharedfileset.c line 288 needs to be: > > > > bool found PG_USED_FOR_ASSERTS_ONLY = false; > > > > Thanks for the report. Tom Lane has already fixed this [1]. > > [1] - > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e942af7b8261cd8070d0eeaf518dbc1a664859fd > > -- > With Regards, > Amit Kapila. > > >
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Thu, Aug 27, 2020 at 11:16 AM Amit Kapila wrote: > > On Wed, Aug 26, 2020 at 11:22 PM Jeff Janes wrote: > > > > > > On Tue, Aug 25, 2020 at 8:58 AM Amit Kapila wrote: > > > >> > >> I am planning > >> to push the first patch (v53-0001-Extend-the-BufFile-interface) in > >> this series tomorrow unless you have any comments on the same. > > > > > > > > I'm getting compiler warnings now, src/backend/storage/file/sharedfileset.c > > line 288 needs to be: > > > > boolfound PG_USED_FOR_ASSERTS_ONLY = false; > > > > Thanks for the report. Tom Lane has already fixed this [1]. > > [1] - > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e942af7b8261cd8070d0eeaf518dbc1a664859fd As discussed, I have added a another test case for covering the out of order subtransaction rollback scenario. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com tap_test_for_out_of_order_subxact_abort.patch Description: Binary data
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Aug 26, 2020 at 11:22 PM Jeff Janes wrote: > > > On Tue, Aug 25, 2020 at 8:58 AM Amit Kapila wrote: > >> >> I am planning >> to push the first patch (v53-0001-Extend-the-BufFile-interface) in >> this series tomorrow unless you have any comments on the same. > > > > I'm getting compiler warnings now, src/backend/storage/file/sharedfileset.c > line 288 needs to be: > > bool found PG_USED_FOR_ASSERTS_ONLY = false; > Thanks for the report. Tom Lane has already fixed this [1]. [1] - https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e942af7b8261cd8070d0eeaf518dbc1a664859fd -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Tue, Aug 25, 2020 at 8:58 AM Amit Kapila wrote: > I am planning > to push the first patch (v53-0001-Extend-the-BufFile-interface) in > this series tomorrow unless you have any comments on the same. > I'm getting compiler warnings now, src/backend/storage/file/sharedfileset.c line 288 needs to be: bool found PG_USED_FOR_ASSERTS_ONLY = false; Cheers, Jeff
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Tue, Aug 25, 2020 at 6:27 PM Amit Kapila wrote: > > On Tue, Aug 25, 2020 at 10:41 AM Dilip Kumar wrote: > > > > On Tue, Aug 25, 2020 at 9:31 AM Amit Kapila wrote: > > > > > > > > > I think the existing design is superior as it allows the flexibility > > > to create transaction files in different temp_tablespaces which is > > > quite important to consider as we know the files will be created only > > > for large transactions. Once we fix the sharedfileset for a worker all > > > the files will be created in the temp_tablespaces chosen for the first > > > time apply worker creates it even if it got changed at some later > > > point of time (user can change its value and then do reload config > > > which I think will impact the worker settings as well). This all can > > > happen because we set the tablespaces at the time of > > > SharedFileSetInit. > > > > Yeah, I agree with this point, that if we use the single shared > > fileset then it will always use the same tablespace for all the > > streaming transactions. And, we might get the benefit of concurrent > > I/O if we use different tablespaces as we are not immediately flushing > > the files to the disk. > > > > Okay, so let's retain the original approach then. I have made a few > cosmetic modifications in the first two patches which include updating > docs, comments, slightly modify the commit message, and change the > code to match the nearby code. One change which you might have a > different opinion is below: > > + case WAIT_EVENT_LOGICAL_CHANGES_READ: > + event_name = "ReorderLogicalChangesRead"; > + break; > + case WAIT_EVENT_LOGICAL_CHANGES_WRITE: > + event_name = "ReorderLogicalChangesWrite"; > + break; > + case WAIT_EVENT_LOGICAL_SUBXACT_READ: > + event_name = "ReorderLogicalSubxactRead"; > + break; > + case WAIT_EVENT_LOGICAL_SUBXACT_WRITE: > + event_name = "ReorderLogicalSubxactWrite"; > + break; > > Why do we want to name these events starting with name as Reorder*? 
I > think these are used in subscriber-side, so no need to use the word > Reorder, so I have removed it from the attached patch. I am planning > to push the first patch (v53-0001-Extend-the-BufFile-interface) in > this series tomorrow unless you have any comments on the same. Your changes in 0001 and 0002 look fine to me. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Tue, Aug 25, 2020 at 9:31 AM Amit Kapila wrote: > > On Mon, Aug 24, 2020 at 9:41 PM Dilip Kumar wrote: > > > > On Sat, Aug 22, 2020 at 8:38 AM Amit Kapila wrote: > > > > > > On Fri, Aug 21, 2020 at 5:10 PM Dilip Kumar wrote: > > > > > > > > I have reviewed and tested the patch and the changes look fine to me. > > > > > > > > > > Thanks, I will push the next patch early next week (by Tuesday) unless > > > you or someone else has any more comments on it. The summary of the > > > patch (v52-0001-Extend-the-BufFile-interface, attached with my > > > previous email) I am planning to push is: "It extends the BufFile > > > interface to support temporary files that can be used by the single > > > backend when the corresponding files need to be survived across the > > > transaction and need to be opened and closed multiple times. Such > > > files need to be created as a member of a SharedFileSet. We have > > > implemented the interface for BufFileTruncate to allow files to be > > > truncated up to a particular offset and extended the BufFileSeek API > > > to support SEEK_END case. We have also added an option to provide a > > > mode while opening the shared BufFiles instead of always opening in > > > read-only mode. These enhancements in BufFile interface are required > > > for the upcoming patch to allow the replication apply worker, to > > > properly handle streamed in-progress transactions." > > > > While reviewing 0002, I realized that instead of using individual > > shared fileset for each transaction, we can use just one common shared > > file set. We can create individual buffile under one shared fileset > > and whenever a transaction commits/aborts we can just delete its > > buffile and the shared fileset can stay. > > > > I think the existing design is superior as it allows the flexibility > to create transaction files in different temp_tablespaces which is > quite important to consider as we know the files will be created only > for large transactions. 
Once we fix the sharedfileset for a worker all > the files will be created in the temp_tablespaces chosen for the first > time apply worker creates it even if it got changed at some later > point of time (user can change its value and then do reload config > which I think will impact the worker settings as well). This all can > happen because we set the tablespaces at the time of > SharedFileSetInit. Yeah, I agree with this point, that if we use the single shared fileset then it will always use the same tablespace for all the streaming transactions. And, we might get the benefit of concurrent I/O if we use different tablespaces as we are not immediately flushing the files to the disk. > The other relatively smaller thing which I don't like is that we > always need to create a buffile for subxact even though we don't need > it. We might be able to find some solution for this but I guess the > previous point is what bothers me more. Yeah, if we go this way we might need to find some solution to this. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Aug 24, 2020 at 9:41 PM Dilip Kumar wrote: > > On Sat, Aug 22, 2020 at 8:38 AM Amit Kapila wrote: > > > > On Fri, Aug 21, 2020 at 5:10 PM Dilip Kumar wrote: > > > > > > I have reviewed and tested the patch and the changes look fine to me. > > > > > > > Thanks, I will push the next patch early next week (by Tuesday) unless > > you or someone else has any more comments on it. The summary of the > > patch (v52-0001-Extend-the-BufFile-interface, attached with my > > previous email) I am planning to push is: "It extends the BufFile > > interface to support temporary files that can be used by the single > > backend when the corresponding files need to be survived across the > > transaction and need to be opened and closed multiple times. Such > > files need to be created as a member of a SharedFileSet. We have > > implemented the interface for BufFileTruncate to allow files to be > > truncated up to a particular offset and extended the BufFileSeek API > > to support SEEK_END case. We have also added an option to provide a > > mode while opening the shared BufFiles instead of always opening in > > read-only mode. These enhancements in BufFile interface are required > > for the upcoming patch to allow the replication apply worker, to > > properly handle streamed in-progress transactions." > > While reviewing 0002, I realized that instead of using individual > shared fileset for each transaction, we can use just one common shared > file set. We can create individual buffile under one shared fileset > and whenever a transaction commits/aborts we can just delete its > buffile and the shared fileset can stay. > I think the existing design is superior as it allows the flexibility to create transaction files in different temp_tablespaces which is quite important to consider as we know the files will be created only for large transactions. 
Once we fix the sharedfileset for a worker all the files will be created in the temp_tablespaces chosen for the first time apply worker creates it even if it got changed at some later point of time (user can change its value and then do reload config which I think will impact the worker settings as well). This all can happen because we set the tablespaces at the time of SharedFileSetInit. The other relatively smaller thing which I don't like is that we always need to create a buffile for subxact even though we don't need it. We might be able to find some solution for this but I guess the previous point is what bothers me more. -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Sat, Aug 22, 2020 at 8:38 AM Amit Kapila wrote: > > On Fri, Aug 21, 2020 at 5:10 PM Dilip Kumar wrote: > > > > I have reviewed and tested the patch and the changes look fine to me. > > > > Thanks, I will push the next patch early next week (by Tuesday) unless > you or someone else has any more comments on it. The summary of the > patch (v52-0001-Extend-the-BufFile-interface, attached with my > previous email) I am planning to push is: "It extends the BufFile > interface to support temporary files that can be used by the single > backend when the corresponding files need to be survived across the > transaction and need to be opened and closed multiple times. Such > files need to be created as a member of a SharedFileSet. We have > implemented the interface for BufFileTruncate to allow files to be > truncated up to a particular offset and extended the BufFileSeek API > to support SEEK_END case. We have also added an option to provide a > mode while opening the shared BufFiles instead of always opening in > read-only mode. These enhancements in BufFile interface are required > for the upcoming patch to allow the replication apply worker, to > properly handle streamed in-progress transactions." While reviewing 0002, I realized that instead of using individual shared fileset for each transaction, we can use just one common shared file set. We can create individual buffile under one shared fileset and whenever a transaction commits/aborts we can just delete its buffile and the shared fileset can stay. I have attached a POC patch for this idea and if we agree with this approach then I will prepare a final patch in a couple of days. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com buffile_changes.patch Description: Binary data
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Fri, Aug 21, 2020 at 5:10 PM Dilip Kumar wrote: > > I have reviewed and tested the patch and the changes look fine to me. > Thanks, I will push the next patch early next week (by Tuesday) unless you or someone else has any more comments on it. The summary of the patch (v52-0001-Extend-the-BufFile-interface, attached with my previous email) I am planning to push is: "It extends the BufFile interface to support temporary files that can be used by the single backend when the corresponding files need to be survived across the transaction and need to be opened and closed multiple times. Such files need to be created as a member of a SharedFileSet. We have implemented the interface for BufFileTruncate to allow files to be truncated up to a particular offset and extended the BufFileSeek API to support SEEK_END case. We have also added an option to provide a mode while opening the shared BufFiles instead of always opening in read-only mode. These enhancements in BufFile interface are required for the upcoming patch to allow the replication apply worker, to properly handle streamed in-progress transactions." -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Fri, Aug 21, 2020 at 3:13 PM Amit Kapila wrote: > > On Fri, Aug 21, 2020 at 10:33 AM Dilip Kumar wrote: > > > > On Fri, Aug 21, 2020 at 9:14 AM Amit Kapila wrote: > > > > > > 2. > > > + /* > > > + * If the new location is smaller then the current location in file then > > > + * we need to set the curFile and the curOffset to the new values and > > > also > > > + * reset the pos and nbytes. Otherwise nothing to do. > > > + */ > > > + else if ((newFile < file->curFile) || > > > + newOffset < file->curOffset + file->pos) > > > + { > > > + file->curFile = newFile; > > > + file->curOffset = newOffset; > > > + file->pos = 0; > > > + file->nbytes = 0; > > > + } > > > > > > Shouldn't there be && instead of || because if newFile is greater than > > > curFile then there is no meaning to update it? > > > > I think this condition is wrong it should be, > > > > else if ((newFile < file->curFile) || ((newFile == file->curFile) && > > (newOffset < file->curOffset + file->pos) > > > > Basically, either new file is smaller otherwise if it is the same > > then-new offset should be smaller. > > > > I think we don't need to use file->pos for that as that is required > only for the current buffer, otherwise, such a condition should > suffice the need. However, I was not happy with the way code and > conditions were arranged in BufFileTruncateShared, so I have > re-arranged them and change quite a few comments in that API. Apart > from that I have updated the docs and ran pgindent for the first > patch. Do let me know if you have any more comments on the first > patch? I have reviewed and tested the patch and the changes look fine to me. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Fri, Aug 21, 2020 at 10:20 AM Amit Kapila wrote: > > On Fri, Aug 21, 2020 at 9:14 AM Amit Kapila wrote: > > > > On Thu, Aug 20, 2020 at 5:42 PM Dilip Kumar wrote: > > > > > > On Thu, Aug 20, 2020 at 2:30 PM Amit Kapila > > > wrote: > > > > > > > > > > > > Right, I think this can happen if one has changed those by BufFileSeek > > > > before doing truncate. We should fix that case as well. > > > > > > Right. > > > > > > > > I will work on those along with your other comments and > > > > > submit the updated patch. > > > > > > I have fixed this in the attached patch along with your other > > > comments. I have also attached a contrib module that is just used for > > > testing the truncate API. > > > > > > > Few comments: > > == > > +void > > +BufFileTruncateShared(BufFile *file, int fileno, off_t offset) > > { > > .. > > + if ((i != fileno || offset == 0) && i != 0) > > + { > > + SharedSegmentName(segment_name, file->name, i); > > + FileClose(file->files[i]); > > + if (!SharedFileSetDelete(file->fileset, segment_name, true)) > > + ereport(ERROR, > > + (errcode_for_file_access(), > > + errmsg("could not delete shared fileset \"%s\": %m", > > + segment_name))); > > + numFiles--; > > + newOffset = MAX_PHYSICAL_FILESIZE; > > + > > + if (i == fileno) > > + newFile--; > > + } > > > > Here, shouldn't it be i <= fileno? Because we need to move back the > > curFile up to newFile whenever curFile is greater than newFile > > > > I think now I have understood why you have added this condition but > probably a comment on the lines "This is required to indicate that we > have removed the given fileno" would be better for future readers. Okay. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Fri, Aug 21, 2020 at 9:14 AM Amit Kapila wrote: > > On Thu, Aug 20, 2020 at 5:42 PM Dilip Kumar wrote: > > > > On Thu, Aug 20, 2020 at 2:30 PM Amit Kapila wrote: > > > > > > > > > Right, I think this can happen if one has changed those by BufFileSeek > > > before doing truncate. We should fix that case as well. > > > > Right. > > > > > > I will work on those along with your other comments and > > > > submit the updated patch. > > > > I have fixed this in the attached patch along with your other > > comments. I have also attached a contrib module that is just used for > > testing the truncate API. > > > > Few comments: > == > +void > +BufFileTruncateShared(BufFile *file, int fileno, off_t offset) > { > .. > + if ((i != fileno || offset == 0) && i != 0) > + { > + SharedSegmentName(segment_name, file->name, i); > + FileClose(file->files[i]); > + if (!SharedFileSetDelete(file->fileset, segment_name, true)) > + ereport(ERROR, > + (errcode_for_file_access(), > + errmsg("could not delete shared fileset \"%s\": %m", > + segment_name))); > + numFiles--; > + newOffset = MAX_PHYSICAL_FILESIZE; > + > + if (i == fileno) > + newFile--; > + } > > Here, shouldn't it be i <= fileno? Because we need to move back the > curFile up to newFile whenever curFile is greater than newFile +/* Loop over all the files upto the fileno which we want to truncate. */ +for (i = file->numFiles - 1; i >= fileno; i--) Because the above loop is up to the fileno, so I feel there is no point of that check or any assert. > 2. > + /* > + * If the new location is smaller then the current location in file then > + * we need to set the curFile and the curOffset to the new values and also > + * reset the pos and nbytes. Otherwise nothing to do. 
> + */ > + else if ((newFile < file->curFile) || > + newOffset < file->curOffset + file->pos) > + { > + file->curFile = newFile; > + file->curOffset = newOffset; > + file->pos = 0; > + file->nbytes = 0; > + } > > Shouldn't there be && instead of || because if newFile is greater than > curFile then there is no meaning to update it? I think this condition is wrong; it should be: else if ((newFile < file->curFile) || ((newFile == file->curFile) && (newOffset < file->curOffset + file->pos))) Basically, either the new file is smaller, or, if it is the same file, then the new offset should be smaller. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Fri, Aug 21, 2020 at 9:14 AM Amit Kapila wrote: > > On Thu, Aug 20, 2020 at 5:42 PM Dilip Kumar wrote: > > > > On Thu, Aug 20, 2020 at 2:30 PM Amit Kapila wrote: > > > > > > > > > Right, I think this can happen if one has changed those by BufFileSeek > > > before doing truncate. We should fix that case as well. > > > > Right. > > > > > > I will work on those along with your other comments and > > > > submit the updated patch. > > > > I have fixed this in the attached patch along with your other > > comments. I have also attached a contrib module that is just used for > > testing the truncate API. > > > > Few comments: > == > +void > +BufFileTruncateShared(BufFile *file, int fileno, off_t offset) > { > .. > + if ((i != fileno || offset == 0) && i != 0) > + { > + SharedSegmentName(segment_name, file->name, i); > + FileClose(file->files[i]); > + if (!SharedFileSetDelete(file->fileset, segment_name, true)) > + ereport(ERROR, > + (errcode_for_file_access(), > + errmsg("could not delete shared fileset \"%s\": %m", > + segment_name))); > + numFiles--; > + newOffset = MAX_PHYSICAL_FILESIZE; > + > + if (i == fileno) > + newFile--; > + } > > Here, shouldn't it be i <= fileno? Because we need to move back the > curFile up to newFile whenever curFile is greater than newFile > I think now I have understood why you have added this condition but probably a comment on the lines "This is required to indicate that we have removed the given fileno" would be better for future readers. -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Fri, Aug 21, 2020 at 9:14 AM Amit Kapila wrote: > > On Thu, Aug 20, 2020 at 5:42 PM Dilip Kumar wrote: > > > > On Thu, Aug 20, 2020 at 2:30 PM Amit Kapila wrote: > > > > > > > > > Right, I think this can happen if one has changed those by BufFileSeek > > > before doing truncate. We should fix that case as well. > > > > Right. > > > > > > I will work on those along with your other comments and > > > > submit the updated patch. > > > > I have fixed this in the attached patch along with your other > > comments. I have also attached a contrib module that is just used for > > testing the truncate API. > > > > Few comments: > == > +void > +BufFileTruncateShared(BufFile *file, int fileno, off_t offset) > { > .. > + if ((i != fileno || offset == 0) && i != 0) > + { > + SharedSegmentName(segment_name, file->name, i); > + FileClose(file->files[i]); > + if (!SharedFileSetDelete(file->fileset, segment_name, true)) > + ereport(ERROR, > + (errcode_for_file_access(), > + errmsg("could not delete shared fileset \"%s\": %m", > + segment_name))); > + numFiles--; > + newOffset = MAX_PHYSICAL_FILESIZE; > + > + if (i == fileno) > + newFile--; > + } > > Here, shouldn't it be i <= fileno? Because we need to move back the > curFile up to newFile whenever curFile is greater than newFile > > 2. > + /* > + * If the new location is smaller then the current location in file then > + * we need to set the curFile and the curOffset to the new values and also > + * reset the pos and nbytes. Otherwise nothing to do. > + */ > + else if ((newFile < file->curFile) || > + newOffset < file->curOffset + file->pos) > + { > + file->curFile = newFile; > + file->curOffset = newOffset; > + file->pos = 0; > + file->nbytes = 0; > + } > > Shouldn't there be && instead of || because if newFile is greater than > curFile then there is no meaning to update it? 
> Wait, actually, it is not clear to me which case the second condition (newOffset < file->curOffset + file->pos) is trying to cover, so I can't recommend anything for this. Can you please explain to me why you have added the second condition in the above check? -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Thu, Aug 20, 2020 at 5:42 PM Dilip Kumar wrote: > > On Thu, Aug 20, 2020 at 2:30 PM Amit Kapila wrote: > > > > > > Right, I think this can happen if one has changed those by BufFileSeek > > before doing truncate. We should fix that case as well. > > Right. > > > > I will work on those along with your other comments and > > > submit the updated patch. > > I have fixed this in the attached patch along with your other > comments. I have also attached a contrib module that is just used for > testing the truncate API. > Few comments: == +void +BufFileTruncateShared(BufFile *file, int fileno, off_t offset) { .. + if ((i != fileno || offset == 0) && i != 0) + { + SharedSegmentName(segment_name, file->name, i); + FileClose(file->files[i]); + if (!SharedFileSetDelete(file->fileset, segment_name, true)) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not delete shared fileset \"%s\": %m", + segment_name))); + numFiles--; + newOffset = MAX_PHYSICAL_FILESIZE; + + if (i == fileno) + newFile--; + } Here, shouldn't it be i <= fileno? Because we need to move back the curFile up to newFile whenever curFile is greater than newFile 2. + /* + * If the new location is smaller then the current location in file then + * we need to set the curFile and the curOffset to the new values and also + * reset the pos and nbytes. Otherwise nothing to do. + */ + else if ((newFile < file->curFile) || + newOffset < file->curOffset + file->pos) + { + file->curFile = newFile; + file->curOffset = newOffset; + file->pos = 0; + file->nbytes = 0; + } Shouldn't there be && instead of || because if newFile is greater than curFile then there is no meaning to update it? -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Thu, Aug 20, 2020 at 1:41 PM Dilip Kumar wrote: > > On Wed, Aug 19, 2020 at 1:35 PM Dilip Kumar wrote: > > > > On Wed, Aug 19, 2020 at 12:20 PM Amit Kapila > > wrote: > > > > > > On Wed, Aug 19, 2020 at 10:10 AM Amit Kapila > > > wrote: > > > > > > > > On Mon, Aug 17, 2020 at 6:29 PM Dilip Kumar > > > > wrote: > > > > > > > > > > > > > > > In last patch v49-0001, there is one issue, Basically, I have called > > > > > BufFileFlush in all the cases. But, ideally, we can not call this if > > > > > the underlying files are deleted/truncated because those files/blocks > > > > > might not exist now. So I think if the truncate position is within > > > > > the same buffer we just need to adjust the buffer, otherwise we just > > > > > need to set the currFile and currOffset to the absolute number and set > > > > > the pos and nbytes 0. Attached patch fixes this issue. > > > > > > > > > > > > > Few comments on the latest patch v50-0001-Extend-the-BufFile-interface > > > > 1. > > > > + > > > > + /* > > > > + * If the truncate point is within existing buffer then we can just > > > > + * adjust pos-within-buffer, without flushing buffer. Otherwise, > > > > + * we don't need to do anything because we have already > > > > deleted/truncated > > > > + * the underlying files. > > > > + */ > > > > + if (curFile == file->curFile && > > > > + curOffset >= file->curOffset && > > > > + curOffset <= file->curOffset + file->nbytes) > > > > + { > > > > + file->pos = (int) (curOffset - file->curOffset); > > > > + return; > > > > + } > > > > > > > > I think in this case you have set the position correctly but what > > > > about file->nbytes? In BufFileSeek, it was okay not to update 'nbytes' > > > > because the contents of the buffer are still valid but I don't think > > > > the same is true here. > > > > > > > > > > I think you need to set 'nbytes' to curOffset as per your current > > > patch as that is the new size of the file. 
> > > --- a/src/backend/storage/file/buffile.c > > > +++ b/src/backend/storage/file/buffile.c > > > @@ -912,6 +912,7 @@ BufFileTruncateShared(BufFile *file, int fileno, > > > off_t offset) > > > curOffset <= file->curOffset + file->nbytes) > > > { > > > file->pos = (int) (curOffset - file->curOffset); > > > + file->nbytes = (int) curOffset; > > > return; > > > } > > > > > > Also, what about file 'numFiles', that can also change due to the > > > removal of certain files, shouldn't that be also set in this case > > > > Right, we need to set the numFile. I will fix this as well. > > I think there are a couple of more problems in the truncate APIs, > basically, if the curFile and curOffset are already smaller than the > truncate location the truncate should not change that. So the > truncate should only change the curFile and curOffset if it is > truncating the part of the file where the curFile or curOffset is > pointing. > Right, I think this can happen if one has changed those by BufFileSeek before doing truncate. We should fix that case as well. > I will work on those along with your other comments and > submit the updated patch. > Thanks. -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Aug 19, 2020 at 1:35 PM Dilip Kumar wrote: > > On Wed, Aug 19, 2020 at 12:20 PM Amit Kapila wrote: > > > > On Wed, Aug 19, 2020 at 10:10 AM Amit Kapila > > wrote: > > > > > > On Mon, Aug 17, 2020 at 6:29 PM Dilip Kumar wrote: > > > > > > > > > > > > In last patch v49-0001, there is one issue, Basically, I have called > > > > BufFileFlush in all the cases. But, ideally, we can not call this if > > > > the underlying files are deleted/truncated because those files/blocks > > > > might not exist now. So I think if the truncate position is within > > > > the same buffer we just need to adjust the buffer, otherwise we just > > > > need to set the currFile and currOffset to the absolute number and set > > > > the pos and nbytes 0. Attached patch fixes this issue. > > > > > > > > > > Few comments on the latest patch v50-0001-Extend-the-BufFile-interface > > > 1. > > > + > > > + /* > > > + * If the truncate point is within existing buffer then we can just > > > + * adjust pos-within-buffer, without flushing buffer. Otherwise, > > > + * we don't need to do anything because we have already deleted/truncated > > > + * the underlying files. > > > + */ > > > + if (curFile == file->curFile && > > > + curOffset >= file->curOffset && > > > + curOffset <= file->curOffset + file->nbytes) > > > + { > > > + file->pos = (int) (curOffset - file->curOffset); > > > + return; > > > + } > > > > > > I think in this case you have set the position correctly but what > > > about file->nbytes? In BufFileSeek, it was okay not to update 'nbytes' > > > because the contents of the buffer are still valid but I don't think > > > the same is true here. > > > > > > > I think you need to set 'nbytes' to curOffset as per your current > > patch as that is the new size of the file. 
> > --- a/src/backend/storage/file/buffile.c > > +++ b/src/backend/storage/file/buffile.c > > @@ -912,6 +912,7 @@ BufFileTruncateShared(BufFile *file, int fileno, > > off_t offset) > > curOffset <= file->curOffset + file->nbytes) > > { > > file->pos = (int) (curOffset - file->curOffset); > > + file->nbytes = (int) curOffset; > > return; > > } > > > > Also, what about file 'numFiles', that can also change due to the > > removal of certain files, shouldn't that be also set in this case > > Right, we need to set the numFile. I will fix this as well. I think there are a couple more problems in the truncate APIs: basically, if curFile and curOffset are already smaller than the truncate location, the truncate should not change them. So the truncate should only change curFile and curOffset if it is truncating the part of the file where curFile or curOffset is pointing. I will work on those along with your other comments and submit the updated patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Aug 19, 2020 at 12:20 PM Amit Kapila wrote: > > On Wed, Aug 19, 2020 at 10:10 AM Amit Kapila wrote: > > > > On Mon, Aug 17, 2020 at 6:29 PM Dilip Kumar wrote: > > > > > > > > > In last patch v49-0001, there is one issue, Basically, I have called > > > BufFileFlush in all the cases. But, ideally, we can not call this if > > > the underlying files are deleted/truncated because those files/blocks > > > might not exist now. So I think if the truncate position is within > > > the same buffer we just need to adjust the buffer, otherwise we just > > > need to set the currFile and currOffset to the absolute number and set > > > the pos and nbytes 0. Attached patch fixes this issue. > > > > > > > Few comments on the latest patch v50-0001-Extend-the-BufFile-interface > > 1. > > + > > + /* > > + * If the truncate point is within existing buffer then we can just > > + * adjust pos-within-buffer, without flushing buffer. Otherwise, > > + * we don't need to do anything because we have already deleted/truncated > > + * the underlying files. > > + */ > > + if (curFile == file->curFile && > > + curOffset >= file->curOffset && > > + curOffset <= file->curOffset + file->nbytes) > > + { > > + file->pos = (int) (curOffset - file->curOffset); > > + return; > > + } > > > > I think in this case you have set the position correctly but what > > about file->nbytes? In BufFileSeek, it was okay not to update 'nbytes' > > because the contents of the buffer are still valid but I don't think > > the same is true here. > > > > I think you need to set 'nbytes' to curOffset as per your current > patch as that is the new size of the file. 
> --- a/src/backend/storage/file/buffile.c > +++ b/src/backend/storage/file/buffile.c > @@ -912,6 +912,7 @@ BufFileTruncateShared(BufFile *file, int fileno, > off_t offset) > curOffset <= file->curOffset + file->nbytes) > { > file->pos = (int) (curOffset - file->curOffset); > + file->nbytes = (int) curOffset; > return; > } > > Also, what about file 'numFiles', that can also change due to the > removal of certain files, shouldn't that be also set in this case Right, we need to set the numFile. I will fix this as well. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Aug 19, 2020 at 10:11 AM Amit Kapila wrote: > > On Mon, Aug 17, 2020 at 6:29 PM Dilip Kumar wrote: > > > > > > In last patch v49-0001, there is one issue, Basically, I have called > > BufFileFlush in all the cases. But, ideally, we can not call this if > > the underlying files are deleted/truncated because those files/blocks > > might not exist now. So I think if the truncate position is within > > the same buffer we just need to adjust the buffer, otherwise we just > > need to set the currFile and currOffset to the absolute number and set > > the pos and nbytes 0. Attached patch fixes this issue. > > > > Few comments on the latest patch v50-0001-Extend-the-BufFile-interface > 1. > + > + /* > + * If the truncate point is within existing buffer then we can just > + * adjust pos-within-buffer, without flushing buffer. Otherwise, > + * we don't need to do anything because we have already deleted/truncated > + * the underlying files. > + */ > + if (curFile == file->curFile && > + curOffset >= file->curOffset && > + curOffset <= file->curOffset + file->nbytes) > + { > + file->pos = (int) (curOffset - file->curOffset); > + return; > + } > > I think in this case you have set the position correctly but what > about file->nbytes? In BufFileSeek, it was okay not to update 'nbytes' > because the contents of the buffer are still valid but I don't think > the same is true here. Right, I think we need to set nbytes to new file->pos as shown below > + file->pos = (int) (curOffset - file->curOffset); > file->nbytes = file->pos > 2. > + int curFile = file->curFile; > + off_t curOffset = file->curOffset; > > I find the previous naming (newFile, newOffset) was better as it > distinguishes them from BufFile variables. Ok > 3. > +void > +SharedFileSetUnregister(SharedFileSet *input_fileset) > +{ > .. > + /* Delete all files in the set */ > + SharedFileSetDeleteAll(input_fileset); > .. 
> }
>
> I am not sure if this is completely correct because we call this
> function (SharedFileSetUnregister) from BufFileDeleteShared which
> would have already removed all the required files. This raises the
> question in my mind whether it is correct to call
> SharedFileSetUnregister from BufFileDeleteShared from the API
> perspective as one might not want to remove the entire fileset at that
> point of time. It will work for your use case (where while removing
> buffile you also want to remove the entire fileset) but not sure if it
> is generic enough. For your case, I wonder if we can directly call
> SharedFileSetDeleteAll and we can have a call like
> SharedFileSetUnregister which will be called from it.

Yeah, this makes more sense to me: we can directly call SharedFileSetDeleteAll instead of calling BufFileDeleteShared, and call SharedFileSetUnregister from SharedFileSetDeleteAll. I will make these changes and send the patch after some testing.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Aug 19, 2020 at 10:10 AM Amit Kapila wrote: > > On Mon, Aug 17, 2020 at 6:29 PM Dilip Kumar wrote: > > > > > > In last patch v49-0001, there is one issue, Basically, I have called > > BufFileFlush in all the cases. But, ideally, we can not call this if > > the underlying files are deleted/truncated because those files/blocks > > might not exist now. So I think if the truncate position is within > > the same buffer we just need to adjust the buffer, otherwise we just > > need to set the currFile and currOffset to the absolute number and set > > the pos and nbytes 0. Attached patch fixes this issue. > > > > Few comments on the latest patch v50-0001-Extend-the-BufFile-interface > 1. > + > + /* > + * If the truncate point is within existing buffer then we can just > + * adjust pos-within-buffer, without flushing buffer. Otherwise, > + * we don't need to do anything because we have already deleted/truncated > + * the underlying files. > + */ > + if (curFile == file->curFile && > + curOffset >= file->curOffset && > + curOffset <= file->curOffset + file->nbytes) > + { > + file->pos = (int) (curOffset - file->curOffset); > + return; > + } > > I think in this case you have set the position correctly but what > about file->nbytes? In BufFileSeek, it was okay not to update 'nbytes' > because the contents of the buffer are still valid but I don't think > the same is true here. > I think you need to set 'nbytes' to curOffset as per your current patch as that is the new size of the file. --- a/src/backend/storage/file/buffile.c +++ b/src/backend/storage/file/buffile.c @@ -912,6 +912,7 @@ BufFileTruncateShared(BufFile *file, int fileno, off_t offset) curOffset <= file->curOffset + file->nbytes) { file->pos = (int) (curOffset - file->curOffset); + file->nbytes = (int) curOffset; return; } Also, what about file 'numFiles', that can also change due to the removal of certain files, shouldn't that be also set in this case? -- With Regards, Amit Kapila.
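The truncate-within-buffer case being reviewed here can be modeled in isolation. The following is a self-contained sketch, not the real buffile.c code: MiniBufFile and minibuf_truncate are invented names, the segment/numFiles bookkeeping raised at the end of the mail is deliberately left out, and nbytes is set relative to the buffer start (equal to the new pos), which is one variant of the fix discussed in the thread. The point it illustrates is why leaving nbytes alone after an in-buffer truncate would let stale buffer contents past the new end be read back.

```c
#include <assert.h>

/* Toy model of the buffer state; field names mirror buffile.c. */
typedef struct
{
    int  curFile;   /* segment file currently buffered */
    long curOffset; /* offset of the buffer start within that segment */
    int  pos;       /* next read/write position within the buffer */
    int  nbytes;    /* number of valid bytes in the buffer */
} MiniBufFile;

/*
 * Truncate to (trFile, trOffset).  When the truncate point lies inside
 * the buffered range, adjust both pos and nbytes in place; otherwise the
 * underlying files were already truncated, so just reposition with an
 * empty buffer.
 */
static void
minibuf_truncate(MiniBufFile *f, int trFile, long trOffset)
{
    if (trFile == f->curFile &&
        trOffset >= f->curOffset &&
        trOffset <= f->curOffset + f->nbytes)
    {
        f->pos = (int) (trOffset - f->curOffset);
        f->nbytes = f->pos;     /* bytes past the new end no longer exist */
    }
    else
    {
        f->curFile = trFile;
        f->curOffset = trOffset;
        f->pos = 0;
        f->nbytes = 0;
    }
}
```

With only the pos assignment (the version being reviewed), the first branch would leave nbytes at its old value and a subsequent read could return truncated-away bytes still sitting in the buffer.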
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Aug 17, 2020 at 6:29 PM Dilip Kumar wrote: > > > In last patch v49-0001, there is one issue, Basically, I have called > BufFileFlush in all the cases. But, ideally, we can not call this if > the underlying files are deleted/truncated because those files/blocks > might not exist now. So I think if the truncate position is within > the same buffer we just need to adjust the buffer, otherwise we just > need to set the currFile and currOffset to the absolute number and set > the pos and nbytes 0. Attached patch fixes this issue. > Few comments on the latest patch v50-0001-Extend-the-BufFile-interface 1. + + /* + * If the truncate point is within existing buffer then we can just + * adjust pos-within-buffer, without flushing buffer. Otherwise, + * we don't need to do anything because we have already deleted/truncated + * the underlying files. + */ + if (curFile == file->curFile && + curOffset >= file->curOffset && + curOffset <= file->curOffset + file->nbytes) + { + file->pos = (int) (curOffset - file->curOffset); + return; + } I think in this case you have set the position correctly but what about file->nbytes? In BufFileSeek, it was okay not to update 'nbytes' because the contents of the buffer are still valid but I don't think the same is true here. 2. + int curFile = file->curFile; + off_t curOffset = file->curOffset; I find the previous naming (newFile, newOffset) was better as it distinguishes them from BufFile variables. 3. +void +SharedFileSetUnregister(SharedFileSet *input_fileset) +{ .. + /* Delete all files in the set */ + SharedFileSetDeleteAll(input_fileset); .. } I am not sure if this is completely correct because we call this function (SharedFileSetUnregister) from BufFileDeleteShared which would have already removed all the required files. 
This raises the question in my mind whether it is correct to call SharedFileSetUnregister from BufFileDeleteShared from the API perspective as one might not want to remove the entire fileset at that point of time. It will work for your use case (where while removing buffile you also want to remove the entire fileset) but not sure if it is generic enough. For your case, I wonder if we can directly call SharedFileSetDeleteAll and we can have a call like SharedFileSetUnregister which will be called from it. -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Thu, Aug 13, 2020 at 6:47 PM Amit Kapila wrote: > > On Thu, Aug 13, 2020 at 12:08 PM Amit Kapila wrote: > > > > On Fri, Aug 7, 2020 at 2:04 PM Dilip Kumar wrote: > > > > > > On Thu, Aug 6, 2020 at 2:43 PM Amit Kapila > > > wrote: > > > > > > .. > > > > This patch's functionality can be independently verified by SQL APIs > > > > > > Your changes look fine to me. > > > > > > > I have pushed that patch last week and attached are the remaining > > patches. I have made a few changes in the next patch > > 0001-Extend-the-BufFile-interface.patch and have some comments on it > > which are as below: > > > > Few more comments on the latest patches: > v48-0002-Add-support-for-streaming-to-built-in-replicatio > 1. It appears to me that we don't remove the temporary folders created > by the apply worker. So, we have folders like > pgsql_tmp15324.0.sharedfileset in base/pgsql_tmp directory even when > the apply worker exits. I think we can remove these by calling > PathNameDeleteTemporaryDir in SharedFileSetUnregister while removing > the fileset from registered filesetlist. I think we need to call SharedFileSetDeleteAll(input_fileset) from SharedFileSetUnregister, so that all the directories created for this fileset are removed. > 2. > +typedef struct SubXactInfo > +{ > + TransactionId xid; /* XID of the subxact */ > + int fileno; /* file number in the buffile */ > + off_t offset; /* offset in the file */ > +} SubXactInfo; > + > +static uint32 nsubxacts = 0; > +static uint32 nsubxacts_max = 0; > +static SubXactInfo *subxacts = NULL; > +static TransactionId subxact_last = InvalidTransactionId; > > Will it be better if we move all the subxact related variables (like > nsubxacts, nsubxacts_max and subxact_last) inside SubXactInfo struct > as all the information anyway is related to sub-transactions? I have moved them all to a structure. > 3. > + /* > + * If there is no subtransaction then nothing to do, but if already have > + * subxact file then delete that.
> + */ > > extra space before 'but' in the above sentence is not required. Fixed > v48-0001-Extend-the-BufFile-interface > 4. > - * SharedFileSets can also be used by backends when the temporary files need > - * to be opened/closed multiple times and the underlying files need to > survive > + * SharedFileSets can be used by backends when the temporary files need to be > + * opened/closed multiple times and the underlying files need to survive > * across transactions. > * > > No need of 'also' in the above sentence. Fixed -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Sat, Aug 15, 2020 at 4:14 AM Thomas Munro wrote: > > On Fri, Aug 14, 2020 at 6:14 PM Amit Kapila wrote: > > Yeah, that makes sense. I will take care of that later today or > > tomorrow. We have not noticed that because currently none of the > > extensions is using those functions. BTW, I noticed that after > > failure, the next run is green, why so? Is the next run not on > > windows? > > The three cfbot results are for applying the patch, testing on Windows > and testing on Ubuntu in that order. It's not at all clear and I'll > probably find a better way to display it when I get around to adding > some more operating systems, maybe with some OS icons or something > like that... > Good to know, anyway, I have pushed a patch to mark those variables with PGDLLIMPORT. -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Fri, Aug 14, 2020 at 6:14 PM Amit Kapila wrote: > Yeah, that makes sense. I will take care of that later today or > tomorrow. We have not noticed that because currently none of the > extensions is using those functions. BTW, I noticed that after > failure, the next run is green, why so? Is the next run not on > windows? The three cfbot results are for applying the patch, testing on Windows and testing on Ubuntu in that order. It's not at all clear and I'll probably find a better way to display it when I get around to adding some more operating systems, maybe with some OS icons or something like that...
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Fri, Aug 14, 2020 at 10:11 AM Thomas Munro wrote: > > On Thu, Aug 13, 2020 at 6:38 PM Amit Kapila wrote: > > I have pushed that patch last week and attached are the remaining > > patches. I have made a few changes in the next patch > > 0001-Extend-the-BufFile-interface.patch and have some comments on it > > which are as below: > > Hi Amit, > > I noticed that Konstantin Knizhnik's CF entry 2386 calls > table_scan_XXX() functions from an extension, namely > contrib/auto_explain, and started failing to build on Windows after > commit 7259736a. This seems to be due to the new global variables > CheckXidAlive and bsysscan, which probably need PGDLLIMPORT if they > are accessed from inline functions that are part of the API that we > expect extensions to be allowed to call. > Yeah, that makes sense. I will take care of that later today or tomorrow. We have not noticed that because currently none of the extensions is using those functions. BTW, I noticed that after failure, the next run is green, why so? Is the next run not on windows? -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Thu, Aug 13, 2020 at 6:38 PM Amit Kapila wrote: > I have pushed that patch last week and attached are the remaining > patches. I have made a few changes in the next patch > 0001-Extend-the-BufFile-interface.patch and have some comments on it > which are as below: Hi Amit, I noticed that Konstantin Knizhnik's CF entry 2386 calls table_scan_XXX() functions from an extension, namely contrib/auto_explain, and started failing to build on Windows after commit 7259736a. This seems to be due to the new global variables CheckXidAlive and bsysscan, which probably need PGDLLIMPORT if they are accessed from inline functions that are part of the API that we expect extensions to be allowed to call.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Thu, Aug 13, 2020 at 12:08 PM Amit Kapila wrote: > > On Fri, Aug 7, 2020 at 2:04 PM Dilip Kumar wrote: > > > > On Thu, Aug 6, 2020 at 2:43 PM Amit Kapila wrote: > > > > .. > > > This patch's functionality can be independently verified by SQL APIs > > > > Your changes look fine to me. > > > > I have pushed that patch last week and attached are the remaining > patches. I have made a few changes in the next patch > 0001-Extend-the-BufFile-interface.patch and have some comments on it > which are as below: > Few more comments on the latest patches: v48-0002-Add-support-for-streaming-to-built-in-replicatio 1. It appears to me that we don't remove the temporary folders created by the apply worker. So, we have folders like pgsql_tmp15324.0.sharedfileset in base/pgsql_tmp directory even when the apply worker exits. I think we can remove these by calling PathNameDeleteTemporaryDir in SharedFileSetUnregister while removing the fileset from registered filesetlist. 2. +typedef struct SubXactInfo +{ + TransactionId xid; /* XID of the subxact */ + int fileno; /* file number in the buffile */ + off_t offset; /* offset in the file */ +} SubXactInfo; + +static uint32 nsubxacts = 0; +static uint32 nsubxacts_max = 0; +static SubXactInfo *subxacts = NULL; +static TransactionId subxact_last = InvalidTransactionId; Will it be better if we move all the subxact related variables (like nsubxacts, nsubxacts_max and subxact_last) inside SubXactInfo struct as all the information anyway is related to sub-transactions? 3. + /* + * If there is no subtransaction then nothing to do, but if already have + * subxact file then delete that. + */ extra space before 'but' in the above sentence is not required. v48-0001-Extend-the-BufFile-interface 4. 
- * SharedFileSets can also be used by backends when the temporary files need - * to be opened/closed multiple times and the underlying files need to survive + * SharedFileSets can be used by backends when the temporary files need to be + * opened/closed multiple times and the underlying files need to survive * across transactions. * No need of 'also' in the above sentence. -- With Regards, Amit Kapila.
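The reorganization suggested in point 2 (gathering the file-level subxact globals into one structure) could take roughly the following shape. This is an illustrative sketch only: the wrapper struct name and init helper are invented for the example, and the PostgreSQL typedefs (TransactionId, uint32) are stubbed with standard types so the fragment stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>      /* off_t */

/* stand-ins for PostgreSQL typedefs, to keep the sketch self-contained */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

typedef struct SubXactInfo
{
    TransactionId xid;      /* XID of the subxact */
    int     fileno;         /* file number in the buffile */
    off_t   offset;         /* offset in the file */
} SubXactInfo;

/* all subxact bookkeeping gathered into one struct instead of four globals */
typedef struct ApplySubXactData
{
    uint32_t        nsubxacts;      /* number of subxacts tracked so far */
    uint32_t        nsubxacts_max;  /* allocated size of the array */
    TransactionId   subxact_last;   /* last known subtransaction */
    SubXactInfo    *subxacts;       /* per-subxact file positions */
} ApplySubXactData;

/* reset the bookkeeping to its empty state */
static void
subxact_data_init(ApplySubXactData *d)
{
    d->nsubxacts = 0;
    d->nsubxacts_max = 0;
    d->subxact_last = InvalidTransactionId;
    d->subxacts = NULL;
}
```

The benefit is purely organizational: a single `ApplySubXactData` instance can be reset or handed around as a unit instead of keeping four loosely related static variables in sync.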
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Thu, Aug 6, 2020 at 2:43 PM Amit Kapila wrote: > > On Wed, Aug 5, 2020 at 7:37 PM Dilip Kumar wrote: > > > > On Wed, Aug 5, 2020 at 6:25 PM Amit Kapila wrote: > > > > > > > > Can we add a test for incomplete changes (probably with toast > > > insertion but we can do it for spec_insert case as well) in > > > ReorderBuffer such that it needs to first serialize the changes and > > > then stream it? I have manually verified such scenarios but it is > > > good to have the test for the same. > > > > I have added a new test for the same in the stream.sql file. > > > > Thanks, I have slightly changed the test so that we can consume DDL > changes separately. I have made a number of other adjustments like > changing few more comments (to make them consistent with nearby > comments), removed unnecessary inclusion of header file, ran pgindent. > The next patch (v47-0001-Implement-streaming-mode-in-ReorderBuffer) in > this series looks good to me. I am planning to push it after one more > read-through unless you or anyone else has any comments on the same. > The patch I am talking about has the following functionality: > > Implement streaming mode in ReorderBuffer. Instead of serializing the > transaction to disk after reaching the logical_decoding_work_mem limit > in memory, we consume the changes we have in memory and invoke stream > API methods added by commit 45fdc9738b. However, sometimes if we have > incomplete toast or speculative insert we spill to the disk because we > can't stream till we have the complete tuple. And, as soon as we get > the complete tuple we stream the transaction including the serialized > changes. Now that we can stream in-progress transactions, the > concurrent aborts may cause failures when the output plugin consults > catalogs (both system and user-defined). We handle such failures by > returning ERRCODE_TRANSACTION_ROLLBACK sqlerrcode from system table > scan APIs to the backend or WALSender decoding a specific uncommitted > transaction. 
The decoding logic on the receipt of such a sqlerrcode > aborts the decoding of the current transaction and continues with the > decoding of other transactions. We also provide a new option via SQL > APIs to fetch the changes being streamed. > > This patch's functionality can be independently verified by SQL APIs Your changes look fine to me. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
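The concurrent-abort handling described in the commit message above (the catalog scan reports ERRCODE_TRANSACTION_ROLLBACK, the decoder abandons that transaction and moves on to others) can be mimicked with a tiny setjmp-based model. This is a deliberately simplified stand-in for PostgreSQL's elog/PG_TRY machinery; all names and the "xid 7 was aborted" trigger are invented for the sketch.

```c
#include <assert.h>
#include <setjmp.h>

#define SQLERRCODE_TRANSACTION_ROLLBACK 1   /* stand-in for the real errcode */

static jmp_buf decode_error;

/*
 * Stand-in for a system table scan: pretend the transaction with xid 7
 * was concurrently aborted, so the scan "raises" the rollback errcode
 * instead of returning possibly inconsistent catalog contents.
 */
static void
scan_catalog(unsigned int xid)
{
    if (xid == 7)
        longjmp(decode_error, SQLERRCODE_TRANSACTION_ROLLBACK);
}

/*
 * Decode one in-progress transaction.  On a concurrent abort we stop
 * decoding this transaction (returning 0) rather than failing outright,
 * so the caller can continue with the remaining transactions.
 */
static int
decode_txn(unsigned int xid)
{
    if (setjmp(decode_error) != 0)
        return 0;               /* concurrent abort: skip this txn */
    scan_catalog(xid);
    return 1;                   /* decoded and streamed fully */
}
```

The key property the model shows is that the error is scoped to one transaction's decoding: a caller looping over xids keeps going after an aborted one, which is exactly the behavior the commit message describes.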
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Tue, Aug 4, 2020 at 12:42 PM Dilip Kumar wrote: > > On Tue, Aug 4, 2020 at 10:12 AM Amit Kapila wrote: > > > > > 4. I think we can explain the problems (like we can see the wrong > > tuple or see two versions of the same tuple or whatever else wrong can > > happen, if possible with some example) related to concurrent aborts > > somewhere in comments. > > Done > I have slightly modified the comment added for the above point and apart from that added/modified a few comments at other places. I have also slightly edited the commit message. @@ -2196,6 +2778,7 @@ ReorderBufferAddNewTupleCids(ReorderBuffer *rb, TransactionId xid, change->lsn = lsn; change->txn = txn; change->action = REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID; + change->txn = txn; This change is not required as the same information is assigned a few lines before. So, I have removed this change as well. Let me know what you think of the above changes? Can we add a test for incomplete changes (probably with toast insertion but we can do it for spec_insert case as well) in ReorderBuffer such that it needs to first serialize the changes and then stream it? I have manually verified such scenarios but it is good to have the test for the same. -- With Regards, Amit Kapila. v46-0001-Implement-streaming-mode-in-ReorderBuffer.patch Description: Binary data
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Jul 29, 2020 at 10:46 AM Dilip Kumar wrote: > > Thanks, please find the rebased patch set. > Few comments on v44-0001-Implement-streaming-mode-in-ReorderBuffer: 1. +-- streaming with subxact, nothing in main +BEGIN; +savepoint s1; +SELECT 'msg5' FROM pg_logical_emit_message(true, 'test', repeat('a', 50)); +INSERT INTO stream_test SELECT repeat('a', 2000) || g.i FROM generate_series(1, 35) g(i); +TRUNCATE table stream_test; +rollback to s1; +INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i); +COMMIT; Is the above comment true? Because it seems to me that Insert is getting streamed in the main transaction. 2. + +postgres[33712]=#* SELECT * FROM pg_logical_slot_get_changes('test_slot', NULL, NULL, 'stream-changes', '1'); +lsn| xid | data +---+-+-- + 0/16B21F8 | 503 | opening a streamed block for transaction TXN 503 + 0/16B21F8 | 503 | streaming change for TXN 503 + 0/16B2300 | 503 | streaming change for TXN 503 + 0/16B2408 | 503 | streaming change for TXN 503 + 0/16BEBA0 | 503 | closing a streamed block for transaction TXN 503 + 0/16B21F8 | 503 | opening a streamed block for transaction TXN 503 + 0/16BECA8 | 503 | streaming change for TXN 503 + 0/16BEDB0 | 503 | streaming change for TXN 503 + 0/16BEEB8 | 503 | streaming change for TXN 503 + 0/16BEBA0 | 503 | closing a streamed block for transaction TXN 503 +(10 rows) + + + Is the above example correct? Because we should include XID in the stream message only when include_xids option is specified. 3. /* - * Queue a change into a transaction so it can be replayed upon commit. + * Record the partial change for the streaming of in-progress transactions. We + * can stream only complete changes so if we have a partial change like toast + * table insert or speculative then we mark such a 'txn' so that it can't be + * streamed. /speculative then/speculative insert then 4. 
I think we can explain the problems (like we can see the wrong tuple or see two versions of the same tuple or whatever else wrong can happen, if possible with some example) related to concurrent aborts somewhere in comments. -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
> Attaching an updated patch for the stats for streaming based on v2 of
> Sawada-san's replication slot stats framework and v44 of this patch series.
> This is one patch that has both the stats framework from Sawada-san (1)
> as well as my update for streaming, so it can be applied easily on top of
> v44.

regards,
Ajin Cherian
Fujitsu Australia

streaming_stats_update.patch
Description: Binary data
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Thu, Jul 30, 2020 at 12:28 PM Ajin Cherian wrote: > > I was running some tests on this patch. I was generally trying to see how the > patch affects logical replication when doing bulk inserts. This issue has > been raised in the past, for eg: this [1]. > My test setup is: > 1. Two postgres servers running - A and B > 2. Create a pgbench setup on A. (pgbench -i -s 5 postgres) > 3. replicate the 3 tables (schema only) on B. > 4. Three publishers on A for the 3 tables of pgbench; pgbench_accounts, > pgbench_branches and pgbench_tellers; > 5. Three subscribers on B for the same tables. (streaming on and off based on > the scenarios described below) > > run pgbench with : pgbench -c 4 -T 100 postgres > While pgbench is running, Do a bulk insert on some other table not in the > publication list (say t1); INSERT INTO t1 (select i FROM > generate_series(1,1000) i); > > Four scenarios: > 1. Pgbench with logical replication enabled without bulk insert > Avg TPS (out of 10 runs): 641 TPS > 2.Pgbench without logical replication enabled with bulk insert (no pub/sub) > Avg TPS (out of 10 runs): 665 TPS > 3, Pgbench with logical replication enabled with bulk insert > Avg TPS (out of 10 runs): 278 TPS > 4. Pgbench with logical replication streaming on with bulk insert > Avg TPS (out of 10 runs): 440 TPS > > As you can see, the bulk inserts, although on a totally unaffected table, > does impact the TPS. But what is good is that, enabling streaming improves > the TPS (about 58% improvement) > Thanks for doing these tests, it is a good win and probably the reason is that after patch we won't serialize such big transactions (as shown in Konstantin's email [1]) and they will be simply skipped. Basically, it will try to stream such transactions and will skip them as they are not required to be sent. [1] - https://www.postgresql.org/message-id/5f5143cc-9f73-3909-3ef7-d3895cc6cc90%40postgrespro.ru -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Jul 29, 2020 at 3:16 PM Dilip Kumar wrote:
>
> Thanks, please find the rebased patch set.
>
> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com

I was running some tests on this patch. I was generally trying to see how the patch affects logical replication when doing bulk inserts. This issue has been raised in the past, for example in [1].

My test setup is:
1. Two postgres servers running - A and B
2. Create a pgbench setup on A. (pgbench -i -s 5 postgres)
3. Replicate the 3 tables (schema only) on B.
4. Three publishers on A for the 3 tables of pgbench: pgbench_accounts, pgbench_branches and pgbench_tellers.
5. Three subscribers on B for the same tables. (streaming on and off based on the scenarios described below)

Run pgbench with: pgbench -c 4 -T 100 postgres

While pgbench is running, do a bulk insert on some other table not in the publication list (say t1): INSERT INTO t1 (select i FROM generate_series(1,1000) i);

Four scenarios:
1. Pgbench with logical replication enabled without bulk insert
   Avg TPS (out of 10 runs): 641 TPS
2. Pgbench without logical replication enabled with bulk insert (no pub/sub)
   Avg TPS (out of 10 runs): 665 TPS
3. Pgbench with logical replication enabled with bulk insert
   Avg TPS (out of 10 runs): 278 TPS
4. Pgbench with logical replication streaming on with bulk insert
   Avg TPS (out of 10 runs): 440 TPS

As you can see, the bulk inserts, although on a totally unaffected table, do impact the TPS. But what is good is that enabling streaming improves the TPS (about 58% improvement).

[1] - https://www.postgresql.org/message-id/flat/CAMsr%2BYE6aE6Re6smrMr-xCabRmCr%3DyzXEf2Yuv5upEDY5nMX8g%40mail.gmail.com#dbe51a181dd735eec8bb36f8a07bacf5

regards,
Ajin Cherian
Fujitsu Australia
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Sun, Jul 26, 2020 at 11:04 AM Dilip Kumar wrote: > > > Today, I have again looked at the first patch > > (v42-0001-Extend-the-logical-decoding-output-plugin-API-wi) and didn't > > find any more problems with it so planning to commit the same unless > > you or someone else want to add more to it. Just for ease of others, > > "the next patch extends the logical decoding output plugin API with > > stream methods". It adds seven methods to the output plugin API, > > adding support for streaming changes for large in-progress > > transactions. The methods are stream_start, stream_stop, stream_abort, > > stream_commit, stream_change, stream_message, and stream_truncate. > > Most of this is a simple extension of the existing methods, with the > > semantic difference that the transaction (or subtransaction) is > > incomplete and may be aborted later (which is something the regular > > API does not really need to deal with). > > > > This also extends the 'test_decoding' plugin, implementing these new > > stream methods. The stream_start/start_stop are used to demarcate a > > chunk of changes streamed for a particular toplevel transaction. > > > > This commit simply adds these new APIs and the upcoming patch to > > "allow the streaming mode in ReorderBuffer" will use these APIs. > > LGTM > Pushed. Feel free to submit the remaining patches. -- With Regards, Amit Kapila.
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Sat, Jul 25, 2020 at 5:08 PM Amit Kapila wrote: > > On Fri, Jul 24, 2020 at 7:17 PM Dilip Kumar wrote: > > > > Your changes look fine to me. Additionally, I have changed a test > > case of getting the streaming changes in 0002. Instead of just > > showing the count, I am showing that the transaction is actually > > streaming. > > > > If you want to show the changes then there is no need to display 157 > rows probably a few (10-15) should be sufficient. If we can do that > by increasing the size of the row then good, otherwise, I think it is > better to retain the test to display the count. I think in existing test cases too we are displaying multiple rows, e.g. toast.out is showing 235 rows. But maybe I will try to reduce it to a smaller number of rows. > Today, I have again looked at the first patch > (v42-0001-Extend-the-logical-decoding-output-plugin-API-wi) and didn't > find any more problems with it so planning to commit the same unless > you or someone else want to add more to it. Just for ease of others, > "the next patch extends the logical decoding output plugin API with > stream methods". It adds seven methods to the output plugin API, > adding support for streaming changes for large in-progress > transactions. The methods are stream_start, stream_stop, stream_abort, > stream_commit, stream_change, stream_message, and stream_truncate. > Most of this is a simple extension of the existing methods, with the > semantic difference that the transaction (or subtransaction) is > incomplete and may be aborted later (which is something the regular > API does not really need to deal with). > > This also extends the 'test_decoding' plugin, implementing these new > stream methods. The stream_start/start_stop are used to demarcate a > chunk of changes streamed for a particular toplevel transaction. > > This commit simply adds these new APIs and the upcoming patch to > "allow the streaming mode in ReorderBuffer" will use these APIs.
LGTM -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Fri, Jul 24, 2020 at 7:17 PM Dilip Kumar wrote:
>
> Your changes look fine to me. Additionally, I have changed a test
> case of getting the streaming changes in 0002. Instead of just
> showing the count, I am showing that the transaction is actually
> streaming.
>

If you want to show the changes then there is no need to display 157 rows; probably a few (10-15) should be sufficient. If we can do that by increasing the size of the row then good; otherwise, I think it is better to retain the test to display the count.

Today, I have again looked at the first patch (v42-0001-Extend-the-logical-decoding-output-plugin-API-wi) and didn't find any more problems with it, so planning to commit the same unless you or someone else wants to add more to it. Just for ease of others, "the next patch extends the logical decoding output plugin API with stream methods". It adds seven methods to the output plugin API, adding support for streaming changes for large in-progress transactions. The methods are stream_start, stream_stop, stream_abort, stream_commit, stream_change, stream_message, and stream_truncate. Most of this is a simple extension of the existing methods, with the semantic difference that the transaction (or subtransaction) is incomplete and may be aborted later (which is something the regular API does not really need to deal with).

This also extends the 'test_decoding' plugin, implementing these new stream methods. The stream_start/stream_stop methods are used to demarcate a chunk of changes streamed for a particular toplevel transaction.

This commit simply adds these new APIs and the upcoming patch to "allow the streaming mode in ReorderBuffer" will use these APIs.

--
With Regards,
Amit Kapila.
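The seven methods listed above amount to a table of callbacks that the output plugin fills in. A stripped-down sketch follows; it is an invented simplification, not the real OutputPluginCallbacks declaration, whose callbacks take a LogicalDecodingContext, ReorderBufferTXN, LSNs and payload arguments that are omitted here to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* simplified callback table for the streaming additions (toy signatures) */
typedef struct StreamCallbacks
{
    void (*stream_start) (void);    /* demarcate start of a streamed chunk */
    void (*stream_stop) (void);     /* demarcate end of a streamed chunk */
    void (*stream_abort) (void);    /* the (sub)transaction aborted */
    void (*stream_commit) (void);   /* the transaction finally committed */
    void (*stream_change) (void);   /* one change within a chunk */
    void (*stream_message) (void);  /* a logical WAL message */
    void (*stream_truncate) (void); /* a TRUNCATE within the transaction */
} StreamCallbacks;

static void noop(void) {}

/* a plugin supports streaming only if every callback is provided */
static int
streaming_supported(const StreamCallbacks *cb)
{
    return cb->stream_start && cb->stream_stop && cb->stream_abort &&
        cb->stream_commit && cb->stream_change && cb->stream_message &&
        cb->stream_truncate;
}
```

The all-or-nothing check mirrors the semantics described above: since a streamed transaction is incomplete and may still abort, a plugin that implements only some of the chunk-demarcation and abort callbacks could not consume the stream coherently.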
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Jul 22, 2020 at 4:55 PM Dilip Kumar wrote: > > You are right. I have changed it. > Thanks, I have pushed the second patch in this series which is 0001-WAL-Log-invalidations-at-command-end-with-wal_le in your latest patch. I will continue working on remaining patches. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Jul 22, 2020 at 10:20 AM Dilip Kumar wrote: > > On Wed, Jul 22, 2020 at 9:18 AM Amit Kapila wrote: > > > > On Mon, Jul 20, 2020 at 6:46 PM Dilip Kumar wrote: > > > > > > There was one warning in release mode in the last version in 0004 so > > > attaching a new version. > > > > > > > Today, I was reviewing patch > > v38-0001-WAL-Log-invalidations-at-command-end-with-wal_le and found a > > small problem with it. > > > > + /* > > + * Execute the invalidations for xid-less transactions, > > + * otherwise, accumulate them so that they can be processed at > > + * the commit time. > > + */ > > + if (!ctx->fast_forward) > > + { > > + if (TransactionIdIsValid(xid)) > > + { > > + ReorderBufferAddInvalidations(reorder, xid, buf->origptr, > > + invals->nmsgs, invals->msgs); > > + ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, > > + buf->origptr); > > + } > > > > I think we need to set ReorderBufferXidSetCatalogChanges even when > > ctx->fast-forward is true because we are dependent on that flag for > > snapshot build (see SnapBuildCommitTxn). We are already doing the > > same way in DecodeCommit where even though we skip adding > > invalidations for fast-forward cases but we do set the flag to > > indicate that this txn has catalog changes. Is there any reason to do > > things differently here? > > I think it is wrong, we should set the > ReorderBufferXidSetCatalogChanges, even if it is in fast-forward mode. > Thanks for the change. I have one more minor comment in the patch 0001-WAL-Log-invalidations-at-command-end-with-wal_le. /* + * Invalidations logged with wal_level=logical. + */ +typedef struct xl_xact_invalidations +{ + int nmsgs; /* number of shared inval msgs */ + SharedInvalidationMessage msgs[FLEXIBLE_ARRAY_MEMBER]; +} xl_xact_invalidations; I see that we already have a structure xl_xact_invals in the code which has the same members, so I think it is better to use that instead of defining a new one. -- With Regards, Amit Kapila. 
EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Jul 20, 2020 at 6:46 PM Dilip Kumar wrote:
> There was one warning in release mode in the last version in 0004 so
> attaching a new version.

Today, I was reviewing patch
v38-0001-WAL-Log-invalidations-at-command-end-with-wal_le and found a
small problem with it.

+    /*
+     * Execute the invalidations for xid-less transactions,
+     * otherwise, accumulate them so that they can be processed at
+     * the commit time.
+     */
+    if (!ctx->fast_forward)
+    {
+        if (TransactionIdIsValid(xid))
+        {
+            ReorderBufferAddInvalidations(reorder, xid, buf->origptr,
+                                          invals->nmsgs, invals->msgs);
+            ReorderBufferXidSetCatalogChanges(ctx->reorder, xid,
+                                              buf->origptr);
+        }

I think we need to set ReorderBufferXidSetCatalogChanges even when
ctx->fast_forward is true because we depend on that flag for the
snapshot build (see SnapBuildCommitTxn). We already do it that way in
DecodeCommit: even though we skip adding invalidations in the
fast-forward case, we still set the flag to indicate that this txn has
catalog changes. Is there any reason to do things differently here?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
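The suggested fix amounts to hoisting the catalog-changes marking out of the fast_forward check. A self-contained toy sketch (stubbed state, not the actual reorderbuffer API) of the proposed control flow:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state recording what the decoding step did; illustrative only. */
typedef struct DecodeState
{
	bool		queued_invalidations;	/* ReorderBufferAddInvalidations */
	bool		marked_catalog_changes; /* ReorderBufferXidSetCatalogChanges */
} DecodeState;

/*
 * Proposed shape of the check: queue the invalidations only when not in
 * fast-forward mode, but mark the transaction as having catalog changes
 * for any valid xid, since the snapshot build depends on that flag.
 */
static void
handle_invalidations(DecodeState *st, bool fast_forward, bool xid_is_valid)
{
	if (!xid_is_valid)
		return;					/* xid-less case handled elsewhere */

	if (!fast_forward)
		st->queued_invalidations = true;

	/* Must happen even during fast-forward decoding. */
	st->marked_catalog_changes = true;
}
```

With this shape, fast-forward decoding still records that the transaction changed the catalog, matching what DecodeCommit already does.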
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Jul 20, 2020 at 11:16 PM Dilip Kumar wrote: > > > There was one warning in release mode in the last version in 0004 so > attaching a new version. > > -- > Regards, > Dilip Kumar > EnterpriseDB: http://www.enterprisedb.com Hello, I have reworked the patch that adds stats for the streaming of logical replication, basing it on the new logical replication stats framework developed by Masahiko-san and rebased by Amit in [1]. This uses v38 of the streaming logical replication patch as well as v1 of the stats framework patch as its base. I will rebase this as the stats framework is updated. Let me know if you have any comments. regards, Ajin Cherian Fujitsu Australia [1] - https://www.postgresql.org/message-id/flat/CA%2Bfd4k5_pPAYRTDrO2PbtTOe0eHQpBvuqmCr8ic39uTNmR49Eg%40mail.gmail.com v1_streaming_stats_update.patch Description: Binary data
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Jul 20, 2020 at 12:01 PM Dilip Kumar wrote: > > On Thu, Jul 16, 2020 at 4:25 PM Amit Kapila wrote: > > > > On Thu, Jul 16, 2020 at 12:23 PM Dilip Kumar wrote: > > > > > > On Wed, Jul 15, 2020 at 6:59 PM Amit Kapila > > > wrote: > > > > > > > > > > > > Let me know what you think of the changes? > > > > > > I have reviewed the changes and they look fine to me. > > > > > > > Thanks, I am planning to start committing a few of the infrastructure > > patches (especially the first two) by early next week as we have resolved > > all the open issues and done an extensive review of the entire > > patch-set. In the attached version, there is a slight change in one > > of the commit messages as compared to the previous version. I would > > like to briefly describe the first two patches for the sake of > > convenience. Let me know if you or anyone else sees any problems with > > these. > > > > The first patch in the series allows us to WAL-log the subtransaction and > > top-level XID association. The logical decoding infrastructure needs > > to know which top-level > > transaction a subxact belongs to, in order to decode all the > > changes. Until now that might be delayed until commit, due to the > > caching (GPROC_MAX_CACHED_SUBXIDS), preventing features requiring > > incremental decoding. So we also write the assignment info into WAL > > immediately, as part of the next WAL record (to minimize overhead), > > only when *wal_level=logical*. We cannot remove the existing > > XLOG_XACT_ASSIGNMENT WAL as that is required for avoiding overflow in > > the hot standby snapshot. > > Pushed this patch. > > > > The patch set required a rebase after the binary format > option support was committed for the CREATE SUBSCRIPTION command. I have rebased the > patch set on the latest head and also added a test case to test > streaming in binary format. 
> While going through commit 9de77b5453, I noticed the below change:
>
> @@ -424,6 +424,10 @@ libpqrcv_startstreaming(WalReceiverConn *conn,
>      PQfreemem(pubnames_literal);
>      pfree(pubnames_str);
>
> +    if (options->proto.logical.binary &&
> +        PQserverVersion(conn->streamConn) >= 140000)
> +        appendStringInfoString(&cmd, ", binary 'true'");
> +

Now, the similar change in this patch series is as below:

@@ -408,6 +408,9 @@ libpqrcv_startstreaming(WalReceiverConn *conn,
     appendStringInfo(&cmd, "proto_version '%u'",
                      options->proto.logical.proto_version);

+    if (options->proto.logical.streaming)
+        appendStringInfo(&cmd, ", streaming 'on'");
+

I think we also need a version check similar to commit 9de77b5453 to
ensure that we send the new option only when connected to a newer
version (>= 14) primary server.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
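The suggested guard can be sketched as a self-contained toy (PQserverVersion reports a v14 server as 140000; the function name and buffer handling here are illustrative, not the actual walreceiver code):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Append the streaming option to the START_REPLICATION option list only
 * when the subscriber requested it AND the primary is new enough to
 * understand it, mirroring the version check commit 9de77b5453 added
 * for the binary option.
 */
static void
append_streaming_option(char *opts, size_t optslen,
						bool streaming_requested, int server_version)
{
	if (streaming_requested && server_version >= 140000)
		strncat(opts, ", streaming 'on'", optslen - strlen(opts) - 1);
}
```

Without the version check, an older primary would reject the unknown option and break the connection, which is exactly what the guard avoids.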
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Jul 15, 2020 at 6:59 PM Amit Kapila wrote:
> On Wed, Jul 15, 2020 at 9:29 AM Dilip Kumar wrote:
> > I have reviewed your changes and those look good to me; please find
> > the latest version of the patch set.
>
> I have done an additional round of review and below are the changes I
> made in the attached patch-set.
> 1. Changed comments in 0002.
> 2. In 0005, apart from changing a few comments and a function name, I
>    have changed the below code:
>
> + if (ReorderBufferCanStream(rb) &&
> +     !SnapBuildXactNeedsSkip(builder, ctx->reader->ReadRecPtr))
>
> Here, I think it is better to compare with EndRecPtr. I feel in the
> boundary case the next record could be the same as start_decoding_at,
> so why avoid streaming in that case?

Makes sense to me.

> 3. In 0006, made the below changes:
>    a. Removed function ReorderBufferFreeChange and added a new
>       parameter in ReorderBufferReturnChange to achieve the same purpose.
>    b. Changed quite a few comments and function names, added additional
>       Asserts, and made a few other cosmetic changes.
> 4. In 0007, made the below changes:
>    a. Removed the unnecessary change in .gitignore.
>    b. Changed the newly added option name to "stream-change".
>
> Apart from the above, I have merged patches 0004, 0005, 0006, and 0007
> as those seem like one piece of functionality to me. For the sake of
> review, the patch-set that contains the merged patches is attached
> separately as v34-combined.
>
> Let me know what you think of the changes?

I have reviewed the changes and they look fine to me.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
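The EndRecPtr point can be illustrated with a toy version of the skip check (not the real snapbuild code): positions strictly before start_decoding_at are skipped, so for a record that ends exactly at start_decoding_at, passing its start position (ReadRecPtr) would suppress streaming while passing its end position (EndRecPtr) allows it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;	/* simplified stand-in for the real type */

/*
 * Toy stand-in for SnapBuildXactNeedsSkip: anything strictly before the
 * point where decoding is supposed to start gets skipped.
 */
static bool
xact_needs_skip(XLogRecPtr ptr, XLogRecPtr start_decoding_at)
{
	return ptr < start_decoding_at;
}
```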