consistently use "ProcSignal" instead of "procsignal" in code comments

2021-11-04 Thread Bharath Rupireddy
Hi,

I see that "procsignal" and "ProcSignal" are being used in the code
comments which look inconsistent. IMO, "ProcSignal" is the right word
to use and let's be consistent across the code comments. Attaching a
tiny patch for that.

Thoughts?

Regards,
Bharath Rupireddy.


v1-0001-consistently-use-ProcSignal-instead-of-procsignal.patch
Description: Binary data


Re: pglz compression performance, take two

2021-11-04 Thread Andrey Borodin
Thanks for the review, Mark! Sorry it took so long to reply on my side.

> On 28 June 2021, at 21:05, Mark Dilger wrote:
> 
>> #define PGLZ_HISTORY_SIZE   0x0fff - 1  /* to avoid compare in iteration 
>> */
> ...
>> static PGLZ_HistEntry hist_entries[PGLZ_HISTORY_SIZE + 1];
> ...
>>if (hist_next == PGLZ_HISTORY_SIZE + 1)
> 
> These are the only uses of PGLZ_HISTORY_SIZE.  Perhaps you could just define 
> the symbol as 0x0fff and skip the -1 and +1 business?
Fixed.

>> /* --
>> * pglz_compare -
>> *
>> *  Compares 4 bytes at pointers
>> * --
>> */
>> static inline bool
>> pglz_compare32(const void *ptr1, const void *ptr2)
>> {
>>return memcmp(ptr1, ptr2, 4) == 0;
>> }
> 
> The comment function name differs from the actual function name.
> 
> Also, pglz_compare returns an offset into the string, whereas pglz_compare32 
> returns a boolean.  This is fairly unintuitive.  The "32" part of 
> pglz_compare32 implies doing the same thing as pglz_compare but where the 
> string is known to be 4 bytes in length.  Given that pglz_compare32 is 
> dissimilar to pglz_compare, perhaps avoid using /pglz_compare/ in its name?
I've removed pglz_compare32 entirely. It was a simple substitution for memcmp().

> 
>>/*
>> * Determine length of match. A better match must be larger than the
>> * best so far. And if we already have a match of 16 or more bytes,
>> * it's worth the call overhead to use memcmp()
> 
> This comment is hard to understand, given the code that follows.  The first 
> block calls memcmp(), which seems to be the function overhead you refer to.  
> The second block calls the static inline function pglz_compare32, which 
> internally calls memcmp().  Superficially, there seems to be a memcmp() 
> function call either way.  The difference is that in the first block's call 
> to memcmp(), the length is a runtime value, and in the second block, it is a 
> compile-time known value.  If you are depending on the compiler to notice 
> this distinction and optimize the second call, perhaps you can mention that 
> explicitly?  Otherwise, reading and understanding the comment takes more 
> effort.
I've updated the comment for the second branch with the fixed-size memcmp().
Frankly, I'm not sure "if (memcmp(input_pos, hist_pos, 4) == 0)" is worth the
complexity; the internals of "pglz_compare(0, len_bound, input_pos + 0, hist_pos + 0)"
would execute almost the same instructions.

> 
> I took a quick look for other places in the code that try to beat the 
> performance of memcmp on short strings.  In varlena.c, rest_of_char_same() 
> seems to do so.  We also use comparisons on NameData, which frequently 
> contains strings shorter than 16 bytes.  Is it worth sharing a static inline 
> function that uses your optimization in other places?  How confident are you 
> that your optimization really helps?
Honestly, I do not know. The overall patch effect consists of stacking up many
small optimizations. They have a net effect, but are too noisy to measure
independently. That's mostly the reason it took me so long to reply.


> On 5 November 2021, at 01:47, Tomas Vondra wrote:
> 
> Andrey, can you update the patch per Mark's review? I'll do my best to get it 
> committed sometime in this CF.

Cool! Here's the patch.

Best regards, Andrey Borodin.



v6-0001-Reorganize-pglz-compression-code.patch
Description: Binary data


Re: parallel vacuum comments

2021-11-04 Thread Masahiko Sawada
On Fri, Nov 5, 2021 at 6:25 AM Peter Geoghegan  wrote:
>
> On Thu, Nov 4, 2021 at 12:42 PM Peter Geoghegan  wrote:
> > Since "The leader process alone processes all parallel-safe indexes in
> > the case where no workers are launched" (no change there), I wonder:
> > how does the leader *know* that it's the leader (and so can always
> > process any indexes) inside its call to
> > lazy_parallel_process_indexes()? Or, does the leader actually process
> > all indexes inside lazy_serial_process_indexes() in this edge case?
> > (The edge case where a parallel VACUUM has no workers at all, because
> > they couldn't be launched by the core parallel infrastructure.)
>
> I think that I might see a related problem. But I'm not sure, so I'll just 
> ask:
>
> > +   /* Set data required for parallel index vacuum or cleanup */
> > +   prepare_parallel_index_processing(vacrel, vacuum);
> > +
> > +   /* Reset the parallel index processing counter */
> > +   pg_atomic_write_u32(&(lps->lvshared->idx), 0);
> > +
> > /* Setup the shared cost-based vacuum delay and launch workers */
> > if (nworkers > 0)
> > {
> > +   /* Reinitialize the parallel context to relaunch parallel workers */
> > if (vacrel->num_index_scans > 0)
> > -   {
> > -   /* Reset the parallel index processing counter */
> > -   pg_atomic_write_u32(&(lps->lvshared->idx), 0);
> > -
> > -   /* Reinitialize the parallel context to relaunch parallel 
> > workers */
> > ReinitializeParallelDSM(lps->pcxt);
> > -   }
>
> Is it okay that we don't call ReinitializeParallelDSM() just because
> "nworkers == 0" this time around? I notice that there is a wait for
> "nworkers_launched" workers to finish parallel processing, at the top
> of ReinitializeParallelDSM(). I can see why the
> "vacrel->num_index_scans > 0" test is okay, but I can't see why the
> "nworkers == 0" test is okay.
>
> I just want to be sure that we're not somehow relying on seeing state
> in shared memory (in the LVSharedIndStats array) in all cases, but
> finding that it is not actually there in certain rare edge cases.
> Maybe this didn't matter before, because the leader didn't expect to
> find this information in shared memory in any case. But that is
> changed by your patch, of course, so it's something to be concerned
> about.

If we launch workers (i.e., nworkers > 0), we wait for them to finish
after processing all indexes (note that we call
WaitForParallelWorkersToFinish() after lazy_parallel_process_indexes()).
So it's guaranteed that all workers have finished by the end of
parallel_lazy_vacuum_or_cleanup_all_indexes().  Therefore, even in the
second call to this function, we don't need to wait for the
"nworkers_launched" workers that were previously running to finish.
Does that make sense?
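For reference, a rough sketch of the control flow being described (heavily
simplified, not the actual patch code; the helper names follow the patch's
naming):

	/* one "bulkdelete" or "vacuumcleanup" pass over the indexes */
	if (nworkers > 0)
	{
		/* relaunching workers requires reinitializing the parallel context */
		if (vacrel->num_index_scans > 0)
			ReinitializeParallelDSM(lps->pcxt);
		LaunchParallelWorkers(lps->pcxt);
	}

	/* the leader always participates, whether or not workers started */
	lazy_serial_process_indexes(vacrel);
	lazy_parallel_process_indexes(vacrel, lps->lvshared, lps->lvsharedindstats);

	/* only wait for workers that were actually launched in this pass */
	if (lps->pcxt->nworkers_launched > 0)
		WaitForParallelWorkersToFinish(lps->pcxt);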

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: enhance pg_log_backend_memory_contexts() to log memory contexts of auxiliary processes

2021-11-04 Thread Bharath Rupireddy
On Thu, Nov 4, 2021 at 9:35 AM Bharath Rupireddy
 wrote:
> > I think the reason we need to do this is not that aux processes have
> > the invalid backend id (=InvalidBackendId) but that "some" auxiliary
> > processes may have a broken proc->backendId in regard to
> > SendProcSignal (we know that's the startup for now.).
>
> I wanted to not have any problems signalling the startup process with
> the current code. Yes, the startup process is the only auxiliary
> process that has a valid backend id and we have other threads fixing
> it. Let's keep the way it is in the v1 patch. Based on whichever patch
> gets in we can modify the code.

I added a note there (with XXX) describing the fact that we explicitly
need to send an invalid backend id to SendProcSignal.
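For illustration, the call shape in question is roughly the following (a
sketch only, not the patch itself):

	/*
	 * Auxiliary processes are not registered with a usable backend id, so
	 * the signal is sent by PID alone, passing InvalidBackendId explicitly.
	 */
	if (SendProcSignal(pid, PROCSIG_LOG_MEMORY_CONTEXT, InvalidBackendId) < 0)
		ereport(WARNING,
				(errmsg("could not send signal to process %d: %m", (int) pid)));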

> > +SELECT pg_log_backend_memory_contexts(memcxt_get_proc_pid('autovacuum launcher'));
> > +SELECT pg_log_backend_memory_contexts(memcxt_get_proc_pid('logical replication launcher'));
> > ...
> >
> > Maybe we can reduce (quite a bit of) the run time of the test by
> > looping over the processes, but since the test only checks that the
> > function doesn't fail to send a signal, I'm not sure we need to
> > perform the test for all of the processes here.
>
> Okay, let me choose the checkpointer for this test; I will remove the other tests.

I retained the test case just for the checkpointer.

> > On the other hand,
> > the test is missing the most significant target of the startup
> > process.
>
> If we were to have tests for the startup process, then it needs to be
> in TAP tests as we have to start a hot standby where the startup
> process will be in continuous mode. Is there any other way that we can
> add the test case in a .sql file? Do we need to get into this much
> complexity for the test case?

I've not added a TAP test case for the startup process; I see it as
unnecessary. I've tested the startup process case manually here and it
just works.

PSA the v2 patch; please review it.

Regards,
Bharath Rupireddy.


v2-0001-enhance-pg_log_backend_memory_contexts-to-log-mem.patch
Description: Binary data


Re: parallel vacuum comments

2021-11-04 Thread Masahiko Sawada
On Fri, Nov 5, 2021 at 4:42 AM Peter Geoghegan  wrote:
>
> On Wed, Nov 3, 2021 at 10:25 PM Masahiko Sawada  wrote:
> > I've attached a draft patch. The patch incorporated all comments from
> > Andres except for the last comment that moves parallel related code to
> > another file. I'd like to discuss how we split vacuumlazy.c.
>
> This looks great!
>
> I wonder if this is okay, though:
>
> > /* Process the indexes that can be processed by only leader process */
> > -   do_serial_processing_for_unsafe_indexes(vacrel, lps->lvshared);
> > +   lazy_serial_process_indexes(vacrel);
> >
> > /*
> > -* Join as a parallel worker.  The leader process alone processes all 
> > the
> > -* indexes in the case where no workers are launched.
> > +* Join as a parallel worker.  The leader process alone processes all
> > +* parallel-safe indexes in the case where no workers are launched.
> >  */
> > -   do_parallel_processing(vacrel, lps->lvshared);
> > +   lazy_parallel_process_indexes(vacrel, lps->lvshared, 
> > vacrel->lps->lvsharedindstats);
> >
> > /*
> >  * Next, accumulate buffer and WAL usage.  (This must wait for the 
> > workers
> > @@ -2786,6 +2734,18 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, 
> > int nworkers)
> > InstrAccumParallelQuery(>buffer_usage[i], 
> > >wal_usage[i]);
> > }
>
> Since "The leader process alone processes all parallel-safe indexes in
> the case where no workers are launched" (no change there), I wonder:
> how does the leader *know* that it's the leader (and so can always
> process any indexes) inside its call to
> lazy_parallel_process_indexes()? Or, does the leader actually process
> all indexes inside lazy_serial_process_indexes() in this edge case?
> (The edge case where a parallel VACUUM has no workers at all, because
> they couldn't be launched by the core parallel infrastructure.)

lazy_serial_process_indexes() handles only parallel-unsafe indexes (i.e.,
indexes that don't support parallelism or are too small), whereas
lazy_parallel_process_indexes() handles only parallel-safe indexes.
Therefore, in the edge case, the leader processes all indexes by using
both functions.

>
> One small thing: the new "LVSharedIndStats.parallel_safe" field seems
> to be slightly misnamed. Isn't it more like
> "LVSharedIndStats.parallel_workers_can_process"? The index might
> actually be parallel safe *in principle*, while nevertheless being
> deliberately skipped over by workers due to a deliberate up-front
> choice made earlier, in compute_parallel_vacuum_workers(). Most
> obvious example of this is the choice to not use parallelism for a
> smaller index (say a partial index whose size is <
> min_parallel_index_scan_size).

Agreed.

>
> Another small thing, which is closely related to the last one:
>
> >  typedef struct LVSharedIndStats
> >  {
> > -   boolupdated;/* are the stats updated? */
> > +   LVIndVacStatus status;
> > +
> > +   /*
> > +* True if both leader and worker can process the index, otherwise only
> > +* leader can process it.
> > +*/
> > +   boolparallel_safe;
> > +
> > +   boolistat_updated;  /* are the stats updated? */
> > IndexBulkDeleteResult istat;
> >  } LVSharedIndStats;
>
> It would be nice to point out that the new
> "LVSharedIndStats.parallel_safe" field (or whatever we end up calling
> it) had comments that pointed out that it isn't a fixed thing for the
> entire VACUUM operation -- it's only fixed for an individual call to
> parallel_lazy_vacuum_or_cleanup_all_indexes() (i.e., it's only fixed
> for the "ambulkdelete portion" or the "amvacuumcleanup portion" of the
> entire VACUUM).

Agreed.

>
> Alternatively, you could just have two booleans, I think. You know,
> one for the "ambulkdelete portion", another for the "amvacuumcleanup
> portion". As I've said before, it would be nice if we only called
> parallel_vacuum_main() once per VACUUM operation (and not once per
> "portion"), but that's a harder and more invasive change. But I don't
> think you necessarily have to go that far for it to make sense to have
> two bools. Having two might allow you to make both of them immutable,
> which is useful.

If we want to make the booleans immutable, we need three booleans, since
parallel index cleanup behaves differently depending on whether
bulk-deletion has been called once. Anyway, if I understand your
suggestion correctly, it probably means having booleans corresponding
to the VACUUM_OPTION_PARALLEL_XXX flags. Does the worker itself need to
decide whether to skip conditionally-parallel-index-cleanup-safe
indexes?

>
> > Regarding tests, I'd like to add tests to check whether a vacuum with
> > multiple index scans (i.e., due to a small maintenance_work_mem) works
> > fine. But a problem is that we need at least about 200,000 garbage
> > tuples to perform the index scan twice even with the minimum
> > maintenance_work_mem, which takes some time to load.
>
> Maybe that's okay. Do you notice that it takes 

Re: row filtering for logical replication

2021-11-04 Thread Peter Smith
On Thu, Nov 4, 2021 at 2:21 PM houzj.f...@fujitsu.com
 wrote:
>
> Thanks for the patches.
> I started to review the patches and here are a few comments.
>
> 1)
> /*
>  * ALTER PUBLICATION ... ADD TABLE provides a 
> PublicationTable List
>  * (Relation, Where clause). ALTER PUBLICATION ... DROP TABLE 
> provides
>  * a Relation List. Check the List element to be used.
>  */
> if (IsA(lfirst(lc), PublicationTable))
> whereclause = true;
> else
> whereclause = false;
>
> I am not sure about the comments here; wouldn't it be better to always
> provide a PublicationTable List, which would be more consistent?

Fixed in v37-0001 [1].

>
> 2)
> +   if ($3)
> +   {
> +   $$->pubtable->whereClause = 
> $3;
> +   }
>
> It seems we can remove the if ($3) check here.
>

Fixed in v37-0001 [1].

>
> 3)
>
> +   oldctx = 
> MemoryContextSwitchTo(CacheMemoryContext);
> +   rfnode = 
> stringToNode(TextDatumGetCString(rfdatum));
> +   exprstate = 
> pgoutput_row_filter_init_expr(rfnode);
> +   entry->exprstates = 
> lappend(entry->exprstates, exprstate);
> +   MemoryContextSwitchTo(oldctx);
> +   }
>
> Currently in the patch, it saves and executes each expression separately. I was
> thinking it might be better if we can use "AND" to combine all the expressions
> into one expression; then we can initialize and optimize the final expression
> and execute it only once.

Yes, thanks for this suggestion - it is an interesting idea. I had
thought the same as this some time ago but never acted on it. I will
try implementing this idea as a separate new patch because it probably
needs to be performance tested against the current code just in case
the extra effort to combine the expressions outweighs any execution
benefits.
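For what it's worth, a minimal sketch of that idea, assuming a "rfnodes" list
collected from the catalog as in the current patch (the singular exprstate
field and the list name are illustrative, not from any existing version):

	Node	   *filter;

	/* fold all row-filter expressions into one AND expression */
	if (list_length(rfnodes) == 1)
		filter = (Node *) linitial(rfnodes);
	else
		filter = (Node *) makeBoolExpr(AND_EXPR, rfnodes, -1);

	/* initialize (and later execute) a single ExprState per relation */
	oldctx = MemoryContextSwitchTo(CacheMemoryContext);
	entry->exprstate = pgoutput_row_filter_init_expr(filter);
	MemoryContextSwitchTo(oldctx);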

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPtRdXzPpm3qv3cEYWWfVUkGT84EopEHxwt95eo_cG_3eQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: row filtering for logical replication

2021-11-04 Thread Peter Smith
On Wed, Oct 27, 2021 at 7:21 PM Greg Nancarrow  wrote:
>
> Regarding the v34-0006 patch, shouldn't it also include an update to
> the rowfilter_expr_checker() function added by the v34-0002 patch, for
> validating the referenced row-filter columns in the case of UPDATE?
> I was thinking something like the following (or is it more complex than 
> this?):
>
> diff --git a/src/backend/catalog/pg_publication.c
> b/src/backend/catalog/pg_publication.c
> index dc2f4597e6..579e727b10 100644
> --- a/src/backend/catalog/pg_publication.c
> +++ b/src/backend/catalog/pg_publication.c
> @@ -162,12 +162,10 @@ rowfilter_expr_checker(Publication *pub,
> ParseState *pstate, Node *rfnode, Relat
> rowfilter_validator(relname, rfnode);
>
> /*
> -* Rule 2: For "delete", check that filter cols are also valid replica
> -* identity cols.
> -*
> -* TODO - check later for publish "update" case.
> +* Rule 2: For "delete" and "update", check that filter cols are also
> +* valid replica identity cols.
>  */
> -   if (pub->pubactions.pubdelete)
> +   if (pub->pubactions.pubdelete || pub->pubactions.pubupdate)
> {
> char replica_identity = rel->rd_rel->relreplident;
>

Fixed in v37-0006 [1]

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPtRdXzPpm3qv3cEYWWfVUkGT84EopEHxwt95eo_cG_3eQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: row filtering for logical replication

2021-11-04 Thread Peter Smith
On Tue, Oct 26, 2021 at 6:26 PM Greg Nancarrow  wrote:
>
> A few comments for some things I have noticed so far:
>
> 1) scantuple cleanup seems to be missing since the v33-0001 patch.
>
> 2) I don't think that the ResetExprContext() calls (before
> FreeExecutorState()) are needed in the pgoutput_row_filter() and
> pgoutput_row_filter_virtual() functions.
>
> 3) make check-world fails, due to recent changes to PostgresNode.pm.

These 3 comments are all addressed in v37-0006 [1]

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPtRdXzPpm3qv3cEYWWfVUkGT84EopEHxwt95eo_cG_3eQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: row filtering for logical replication

2021-11-04 Thread Peter Smith
On Thu, Sep 23, 2021 at 10:33 PM Tomas Vondra
 wrote:
>
> 5) publicationcmds.c
>
> I mentioned this in my last review [1] already, but I really dislike the
> fact that OpenTableList accepts a list containing one of two entirely
> separate node types (PublicationTable or Relation). It was modified to
> use IsA() instead of a flag, but I still find it ugly, confusing and
> possibly error-prone.
>
> Also, not sure mentioning the two different callers explicitly in the
> OpenTableList comment is a great idea - it's likely to get stale if
> someone adds another caller.

Fixed in v37-0001 [1]

> 14) pgoutput_row_filter_update
>
> The function name seems a bit misleading, as it might seem like
> it updates the row_filter, or something. It should indicate that it's about
> deciding what to do with the update.

Fixed in v37-0006 [1]

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPtRdXzPpm3qv3cEYWWfVUkGT84EopEHxwt95eo_cG_3eQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Logical insert/update/delete WAL records for custom table AMs

2021-11-04 Thread Amit Kapila
On Fri, Nov 5, 2021 at 4:53 AM Jeff Davis  wrote:
>
> On Thu, 2021-11-04 at 14:33 +0530, Amit Kapila wrote:
> > Can't different tableAM's have different representations
> > for toast or may be some different concept like speculative
> > insertions?
>
> The decoding plugin should just use the common interfaces to toast, and
> if the table AM supports toast, everything should work fine.
>

I think that is true only if the table AM uses the same format as heap for
toast. It also seems to be relying on the heap tuple format.

> The only
> special thing it needs to do is check VARATT_IS_EXTERNAL_ONDISK(),
> because on-disk toast data can't be decoded (which is true for heap,
> too).
>
> I haven't looked as much into speculative insertions, but I don't think
> those are a problem either. Shouldn't they be handled before they make
> it into the change stream that the plugin sees?
>

They will be handled before the plugin sees them, but I was talking
about what happens if the table AM has some other WAL, similar to the WAL for
speculative insertions, that needs special handling.

> > Similarly, I remember that for zheap we didn't have
> > TransactionIds for subtransactions so we needed to make some changes in
> > logical decoding to compensate for that. I guess similar stuff could
> > be required for another table AMs. Then a different table AM can have
> > a different tuple format which won't work for current change records
> > unless we convert it to heap tuple format before writing WAL for it.
>
> Columnar certainly has a different format. That makes me wonder whether
> ReorderBufferChange and/or ReorderBufferTupleBuf are too low-level, and
> we should have a higher-level representation of a change that is based
> on slots.
>
> Can you tell me more about the changes you made for zheap? I still
> don't understand why the decoding plugin would have to know what table
> AM the change came from.
>

I am not talking about the decoding plugin but rather decoding itself,
basically the work we do in reorderbuffer.c, decode.c, etc. The two
things I remember were the tuple format and transaction ids, as mentioned
in my previous email. I think we should try to find a solution for the
tuple format, as the current decoding code relies on it, if we want
decoding to deal with other table AMs transparently.

-- 
With Regards,
Amit Kapila.




Re: Add missing function abs (interval)

2021-11-04 Thread Isaac Morland
On Thu, 4 Nov 2021 at 08:08, Daniel Gustafsson  wrote:

> > On 26 Sep 2021, at 19:58, Isaac Morland  wrote:
>
> > So I think I will prepare a revised patch that uses this formulation;
> and if I still have any suggestions that aren't directly related to adding
> abs(interval) I will split them off into a separate discussion.
>
> This CF entry is marked Waiting on Author, have you had the chance to
> prepare
> an updated version of this patch?
>

Not yet, but thanks for the reminder. I will try to get this done on the
weekend.


Re: On login trigger: take three

2021-11-04 Thread Greg Nancarrow
On Thu, Nov 4, 2021 at 7:43 AM Daniel Gustafsson  wrote:

> > On 3 Nov 2021, at 17:15, Ivan Panchenko  wrote:
> > Wednesday, 29 September 2021, 15:14 +03:00 from Teodor Sigaev:
> > 2. For logging purposes, a failing trigger makes it impossible to log in;
> > it could be worked around by catching the error in the trigger function,
> > but that's not so obvious for a DBA.
> > If the trigger contains an error, nobody can log in. The database is
> > bricked.
> > The previous variant of the patch proposed to fix this by going to
> > single-user mode.
> > For faster recovery I propose also having a GUC variable to turn the
> > login triggers on/off.
> > It should be 'on' by default.
>
> As voiced earlier, I disagree with this, and I dislike the idea of punching a
> hole for circumventing infrastructure intended for auditing.
>
> Use cases for a login trigger commonly involve audit trail logging, session
> initialization, etc.  If the login trigger bricks the production database to
> the extent that it needs to be restarted with the magic GUC, then it's highly
> likely that you *don't* want regular connections to the database for the
> duration of this.  Any such connection won't be subject to what the trigger
> does, which seems counter to having the trigger in the first place.  This
> means that it's likely that the superuser fixing it will have to disable
> logins for everyone else while fixing it, and it quickly becomes messy.
>
> With that in mind, I think single-user mode actually *helps* the users in
> this case, and we avoid punching a hole which in the worst case can be a
> vector for an attack.
>
> Maybe I'm overly paranoid, but adding a backdoor of sorts for a situation
> which really shouldn't happen doesn't seem like a good idea.
>
>
+1
The arguments given are pretty convincing IMHO, and I agree that the
additions made in the v20 patch are not a good idea, and are not needed.


Regards,
Greg Nancarrow
Fujitsu Australia


Re: [PATCH] Added TRANSFORM FOR for COMMENT tab completion

2021-11-04 Thread Ken Kato

Hi,

I found an unnecessary line deletion in my previous patch, so I made a 
minor update for that.


--
Best wishes,

Ken Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index ecae9df8ed..b9af1df319 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2399,22 +2399,19 @@ psql_completion(const char *text, int start, int end)
 	else if (Matches("COMMENT"))
 		COMPLETE_WITH("ON");
 	else if (Matches("COMMENT", "ON"))
-		COMPLETE_WITH("ACCESS METHOD", "CAST", "COLLATION", "CONVERSION",
-	  "DATABASE", "EVENT TRIGGER", "EXTENSION",
-	  "FOREIGN DATA WRAPPER", "FOREIGN TABLE", "SERVER",
-	  "INDEX", "LANGUAGE", "POLICY", "PUBLICATION", "RULE",
-	  "SCHEMA", "SEQUENCE", "STATISTICS", "SUBSCRIPTION",
-	  "TABLE", "TYPE", "VIEW", "MATERIALIZED VIEW",
-	  "COLUMN", "AGGREGATE", "FUNCTION",
-	  "PROCEDURE", "ROUTINE",
-	  "OPERATOR", "TRIGGER", "CONSTRAINT", "DOMAIN",
-	  "LARGE OBJECT", "TABLESPACE", "TEXT SEARCH", "ROLE");
+		COMPLETE_WITH("ACCESS METHOD", "AGGREGATE", "CAST", "COLLATION",
+	  "COLUMN","CONSTRAINT", "CONVERSION", "DATABASE",
+	  "DOMAIN","EXTENSION", "EVENT TRIGGER",
+	  "FOREIGN DATA WRAPPER","FOREIGN TABLE",
+	  "FUNCTION", "INDEX", "LANGUAGE","LARGE OBJECT",
+	  "MATERIALIZED VIEW", "OPERATOR","POLICY",
+	  "PROCEDURE", "PROCEDURAL LANGUAGE", "PUBLICATION","ROLE",
+	  "ROUTINE", "RULE", "SCHEMA", "SEQUENCE","SERVER",
+	  "STATISTICS", "SUBSCRIPTION", "TABLE",
+	  "TABLESPACE", "TEXT SEARCH", "TRANSFORM FOR",
+	  "TRIGGER", "TYPE", "VIEW");
 	else if (Matches("COMMENT", "ON", "ACCESS", "METHOD"))
 		COMPLETE_WITH_QUERY(Query_for_list_of_access_methods);
-	else if (Matches("COMMENT", "ON", "FOREIGN"))
-		COMPLETE_WITH("DATA WRAPPER", "TABLE");
-	else if (Matches("COMMENT", "ON", "TEXT", "SEARCH"))
-		COMPLETE_WITH("CONFIGURATION", "DICTIONARY", "PARSER", "TEMPLATE");
 	else if (Matches("COMMENT", "ON", "CONSTRAINT"))
 		COMPLETE_WITH_QUERY(Query_for_all_table_constraints);
 	else if (Matches("COMMENT", "ON", "CONSTRAINT", MatchAny))
@@ -2422,15 +2419,67 @@ psql_completion(const char *text, int start, int end)
 	else if (Matches("COMMENT", "ON", "CONSTRAINT", MatchAny, "ON"))
 	{
 		completion_info_charp = prev2_wd;
-		COMPLETE_WITH_QUERY(Query_for_list_of_tables_for_constraint);
+		COMPLETE_WITH_QUERY(Query_for_list_of_tables_for_constraint
+			" UNION SELECT 'DOMAIN'");
 	}
-	else if (Matches("COMMENT", "ON", "MATERIALIZED", "VIEW"))
-		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_matviews, NULL);
+	else if (Matches("COMMENT", "ON", "CONSTRAINT", MatchAny, "ON", "DOMAIN"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_domains, NULL);
 	else if (Matches("COMMENT", "ON", "EVENT", "TRIGGER"))
 		COMPLETE_WITH_QUERY(Query_for_list_of_event_triggers);
+	else if (Matches("COMMENT", "ON", "FOREIGN"))
+		COMPLETE_WITH("DATA WRAPPER", "TABLE");
+	else if (Matches("COMMENT", "ON", "FOREIGN", "TABLE"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_foreign_tables, NULL);
+	else if (Matches("COMMENT", "ON", "MATERIALIZED", "VIEW"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_matviews, NULL);
+	else if (Matches("COMMENT", "ON", "POLICY"))
+		COMPLETE_WITH_QUERY(Query_for_list_of_policies);
+	else if (Matches("COMMENT", "ON", "POLICY", MatchAny))
+		COMPLETE_WITH("ON");
+	else if (Matches("COMMENT", "ON", "POLICY", MatchAny, "ON"))
+	{
+		completion_info_charp = prev2_wd;
+		COMPLETE_WITH_QUERY(Query_for_list_of_tables_for_policy);
+	}
+	else if (Matches("COMMENT", "ON", "PROCEDURAL", "LANGUAGE"))
+		COMPLETE_WITH_QUERY(Query_for_list_of_languages);
+	else if (Matches("COMMENT", "ON", "RULE", MatchAny))
+		COMPLETE_WITH("ON");
+	else if (Matches("COMMENT", "ON", "RULE", MatchAny, "ON"))
+	{
+		completion_info_charp = prev2_wd;
+		COMPLETE_WITH_QUERY(Query_for_list_of_tables_for_rule);
+	}
+	else if (Matches("COMMENT", "ON", "TEXT", "SEARCH"))
+		COMPLETE_WITH("CONFIGURATION", "DICTIONARY", "PARSER", "TEMPLATE");
+	else if (Matches("COMMENT", "ON", "TEXT", "SEARCH", "CONFIGURATION"))
+		COMPLETE_WITH_QUERY(Query_for_list_of_ts_configurations);
+	else if (Matches("COMMENT", "ON", "TEXT", "SEARCH", "DICTIONARY"))
+		COMPLETE_WITH_QUERY(Query_for_list_of_ts_dictionaries);
+	else if (Matches("COMMENT", "ON", "TEXT", "SEARCH", "PARSER"))
+		COMPLETE_WITH_QUERY(Query_for_list_of_ts_parsers);
+	else if (Matches("COMMENT", "ON", "TEXT", "SEARCH", "TEMPLATE"))
+		COMPLETE_WITH_QUERY(Query_for_list_of_ts_templates);
+	else if (Matches("COMMENT", "ON", "TRANSFORM", "FOR"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_datatypes, NULL);
+	else if (Matches("COMMENT", "ON", "TRANSFORM", "FOR", MatchAny))
+		COMPLETE_WITH("LANGUAGE");
+	else if (Matches("COMMENT", "ON", "TRANSFORM", "FOR", MatchAny, "LANGUAGE"))
+	{
+		completion_info_charp = prev2_wd;
+		

Re: Data is copied twice when specifying both child and parent table in publication

2021-11-04 Thread Greg Nancarrow
On Thu, Nov 4, 2021 at 7:10 PM Amit Kapila  wrote:
>
> On Thu, Nov 4, 2021 at 12:23 PM Greg Nancarrow wrote:
> >
> > On Thu, Nov 4, 2021 at 3:13 PM Amit Kapila wrote:
> > >
> > > On further thinking about this, I think we should define the behavior
> > > of replication among partitioned (on the publisher) and
> > > non-partitioned (on the subscriber) tables a bit more clearly.
> > >
> > > - If the "publish_via_partition_root" is set for a publication then we
> > > can always replicate to the table with the same name as the root table
> > > in publisher.
> > > - If the "publish_via_partition_root" is *not* set for a publication
> > > then we can always replicate to the tables with the same name as the
> > > non-root tables in publisher.
> > >
> > > Thoughts?
> > >
> >
> > I'd adjust that wording slightly, because "we can always replicate to
> > ..." sounds a bit vague, and saying that an option is set or not set
> > could be misinterpreted, as the option could be "set" to false.
> >
> > How about:
> >
> > - If "publish_via_partition_root" is true for a publication, then data
> > is replicated to the table with the same name as the root (i.e.
> > partitioned) table in the publisher.
> > - If "publish_via_partition_root" is false (the default) for a
> > publication, then data is replicated to tables with the same name as
> > the non-root (i.e. partition) tables in the publisher.
> >
>
> Sounds good to me. If we follow this then I think the patch by Hou-San
> is good to solve the first problem as described in his last email [1]?
>
> [1] - https://www.postgresql.org/message-id/OS0PR01MB5716C756312959F293A822C794869%40OS0PR01MB5716.jpnprd01.prod.outlook.com
>

Almost.
The patch does seem to solve that first problem (double publish on
tablesync).
I used the following test (taken from [2]), and variations of it:

--- Setup
create schema sch1;
create schema sch2;
create table sch1.tbl1 (a int) partition by range (a);
create table sch2.tbl1_part1 partition of sch1.tbl1 for values from (1) to (10);
create table sch2.tbl1_part2 partition of sch1.tbl1 for values from (10) to (20);
create schema sch3;
create table sch3.t1(c1 int);

--- Publication
create publication pub1 for all tables in schema sch3, table sch1.tbl1, table sch2.tbl1_part1 with (publish_via_partition_root=on);
insert into sch1.tbl1 values(1);
insert into sch1.tbl1 values(11);
insert into sch3.t1 values(1);

--- Subscription
CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres host=localhost port=5432' PUBLICATION pub1;


[2] -
https://postgr.es/m/caldanm3vxjpmmsrvdnk0f8uwp+eq5ry14xfeukmxsvg_ucw...@mail.gmail.com


However, there did still seem to be a problem if
publish_via_partition_root is then set to false; it seems that can result
in duplicate partition entries in the pg_publication_tables view. See below
(this follows on from the test scenario given above):

postgres=# select * from pg_publication_tables;
 pubname | schemaname | tablename
---------+------------+-----------
 pub1    | sch1       | tbl1
 pub1    | sch3       | t1
(2 rows)

postgres=#  alter publication pub1 set (publish_via_partition_root=false);
ALTER PUBLICATION
postgres=# select * from pg_publication_tables;
 pubname | schemaname | tablename
---------+------------+------------
 pub1    | sch2       | tbl1_part1
 pub1    | sch2       | tbl1_part2
 pub1    | sch2       | tbl1_part1
 pub1    | sch3       | t1
(4 rows)

So I think the patch would need to be updated to prevent that.

Regards,
Greg Nancarrow
Fujitsu Australia


Re: Allow escape in application_name

2021-11-04 Thread Kyotaro Horiguchi
At Fri, 5 Nov 2021 03:14:00 +0900, Fujii Masao wrote in
> 
> 
> On 2021/11/04 20:42, kuroda.hay...@fujitsu.com wrote:
> > Dear Fujii-san,
> >Thank you for giving comments! I attached new patches.
> 
> Thanks for updating the patch!
> 
> + 
> + Note that if embedded strings have Non-ASCII,
> + these characters will be replaced with the question marks
> (?).
> + This limitation is caused by application_name.
> + 
> 
> Isn't this description confusing because postgres_fdw actually doesn't
> do this?
> postgres_fdw just passes the application_name as it is to the remote
> server.

I'm not sure that distinction is so clear to users, so I feel we
want a notice along these lines. But it doesn't seem correct as it
is.  When the user name of the session consists of non-ASCII
characters, %u is ultimately seen as a sequence of '?'.  That is not an
embedded string in pgfdw_application_name.  I hadn't noticed this
behavior.

"pgfdw_application_name is set to application_name of a pgfdw
connection after placeholder conversion, thus the resulting string is
subject to the same restrictions to application_names.". Maybe
followed by "If session user name or database name contains non-ascii
characters, they are replaced by question marks "?"".

Sidetracking a bit: considering this restriction, we might need to
reconsider %u and %d.  A session id might be useful, as it is an
ASCII string that can identify a session exactly.

> >> On second thought, do we really need padding support for
> >> application_name
> >> in postgres_fdw? I just thought that when I read the description
> >> "Padding can be useful to aid human readability in log files." in the
> >> docs
> >> about log_line_prefix.
> > My feeling was that we don't have any reason not to support it,
> > but I cannot find a good use case.
> > For now I have decided to keep supporting it,
> > but if we face another problem I will cut it.
> 
> I'd like to hear more opinions about this from others.
> But *if* there is actually no use case, I'd like not to add
> the feature, to make the code simpler.

I think padding is useful because it aligns the significant content
of log lines by equating the length of the leading fixed
components. application_name doesn't make any other visible components
unaligned even when shown in the pg_stat_activity view, so it is not
useful in that sense.

Padding may make the string itself look a bit nicer, but gives no
other advantage.

I doubt people will complain about the lack of the padding feature.  In
addition, since pgfdw_application_name and log_line_prefix work in
different ways with regard to multibyte characters, we don't need to
struggle so hard to make their behavior consistent.

# I noticed that the padding feature doesn't consider multibyte
# characters. (I'm not suggesting they ought to be handled
# "properly".)

In short, I'm +0.7 for removing it.

> >> +  case 'a':
> >> +  if (MyProcPort)
> >> +  {
> >>
> >> I commented that the check of MyProcPort is necessary because
> >> background
> >> worker not having MyProcPort may access to the remote server. The
> >> example
> >> of such process is the resolver process that Sawada-san was proposing
> >> for
> >> atomic commit feature. But the proposal was withdrawn and for now
> >> there seems no such process. If this is true, we can safely remove the
> >> check
> >> of MyProcPort? If so, something like Assert(MyProcPort != NULL) may
> >> need
> >> to be added, instead.
> > My understanding was that we don't have to assume that
> > Sawada-san's patch will be committed.
> > I think that FDW is only available from backend processes in the
> > current implementation,
> > and MyProcPort will be set when the process is initialized, i.e.
> > in BackendInitialize().
> > Since the backend will execute BackendInitialize() after forking
> > from the postmaster,
> > we can assume that everyone who operates FDW has a valid value for
> > MyProcPort.
> > I removed the if statement and added an assertion.

I think it is right.

> + case 'u':
> + Assert(MyProcPort != NULL);
> 
> Isn't it enough to perform this assertion check only once
> at the top of parse_pgfdw_appname()?

Yeah, in either way, we should treat them in the same way.

> > We can force parse_pgfdw_appname() not to be called if MyProcPort does
> > not exist,
> > but I don't think it is needed now.
> 
> Yes.

(I assume you said "it is needed now".)  I'm not sure how to force
that, but if it means a NULL MyProcPort causes a crash, I think
crashing the server is a needlessly aggressive penalty.
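To make the earlier suggestion of asserting once at the top concrete, here is
a minimal sketch (the escape letters shown and the overall structure are
illustrative only; the actual patch may differ):

static void
parse_pgfdw_appname(StringInfo buf, const char *name)
{
	/* checked once, instead of in every case branch */
	Assert(MyProcPort != NULL);

	for (const char *p = name; *p != '\0'; p++)
	{
		if (*p != '%' || p[1] == '\0')
		{
			appendStringInfoChar(buf, *p);
			continue;
		}
		p++;
		switch (*p)
		{
			case 'u':			/* session user name */
				appendStringInfoString(buf, MyProcPort->user_name);
				break;
			case 'd':			/* database name */
				appendStringInfoString(buf, MyProcPort->database_name);
				break;
			default:			/* unknown escape: emit it verbatim */
				appendStringInfoChar(buf, '%');
				appendStringInfoChar(buf, *p);
				break;
		}
	}
}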

> >> If user name or database name is not set, the name is replaced with
> >> "[unknown]". log_line_prefix needs this because log message may be
> >> output when they have not been set yet, e.g., at early stage of
> >> backend
> >> startup. But I wonder if application_name in postgres_fdw actually
> >> needs that. Thoughts?
> >Hmm, I 

Re: Teach pg_receivewal to use lz4 compression

2021-11-04 Thread Michael Paquier
On Thu, Nov 04, 2021 at 05:02:28PM +, gkokola...@pm.me wrote:
> Removed an extra conditional check while switching over compression_method.

Indeed.  My rebase was a bit sloppy here.

> because compression_method is the global option exposed to the user, whereas
> wal_compression_method is the local variable used to figure out what kind of
> file the function is currently working with. This error existed at least as
> far back as v9-0002 of $subject.

Right.

> I felt that converting it to a do {} while () loop instead will help with
> readability:
> +   do
> +   {
> 
> +   /*
> +* No need to continue reading the file when the uncompressed_size
> +* exceeds WalSegSz, even if there are still data left to read.
> +* However, if uncompressed_size is equal to WalSegSz, it should
> +* verify that there is no more data to read.
> +*/
> +   } while (r > 0 && uncompressed_size <= WalSegSz);

No objections from me to do that.  This makes the code a bit easier to
follow, indeed.

> I would like to have a bit more test coverage for FindStreamingStart(),
> specifically for the case where an lz4-compressed segment larger than WalSegSz
> exists.

The same could be said for gzip.  I am not sure that this is worth the
extra I/O and pg_receivewal commands, though.

I have spent an extra couple of hours staring at the code, and the
whole thing looked fine, so I applied it.  While on it, I have tested the new TAP
tests with all the possible combinations of --without-zlib and
--with-lz4.
--
Michael


signature.asc
Description: PGP signature


Re: inefficient loop in StandbyReleaseLockList()

2021-11-04 Thread Tom Lane
Michael Paquier  writes:
> On Thu, Nov 04, 2021 at 08:21:56PM -0400, Tom Lane wrote:
>> Hm.  I think it's not the only list function with O(N) behavior;
>> in fact there used to be more such functions than there are now.
>> But I could get behind a patch that annotates all of them.

> Documenting that makes sense.  Shouldn't we be careful to do that in
> both pg_list.h and list.c, then?

We have seldom, if ever, put function API-definition comments into .h files.
I do not see a reason why this case deserves an exception.  (It's tough
enough to get people to maintain definition comments that are right beside
the code they describe --- I think putting them in .h files would be a
disaster.)

regards, tom lane




Re: inefficient loop in StandbyReleaseLockList()

2021-11-04 Thread Michael Paquier
On Thu, Nov 04, 2021 at 08:21:56PM -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I wonder if it's worth adding a note to list_delete_first() mentioning its
> > O(N) behaviour. It's not immediately visible from the code, and from the 
> > list
> > name one could very well be excused to not be worried about O(N) costs.
> 
> Hm.  I think it's not the only list function with O(N) behavior;
> in fact there used to be more such functions than there are now.
> But I could get behind a patch that annotates all of them.

Documenting that makes sense.  Shouldn't we be careful to do that in
both pg_list.h and list.c, then?
--
Michael


signature.asc
Description: PGP signature


RE: parallel vacuum comments

2021-11-04 Thread houzj.f...@fujitsu.com
On Thu, Nov 4, 2021 at 1:25 PM Masahiko Sawada wrote:
> On Wed, Nov 3, 2021 at 1:08 PM Amit Kapila  wrote:
> >
> > On Tue, Nov 2, 2021 at 11:17 AM Masahiko Sawada 
> > wrote:
> > >
> > > On Tue, Nov 2, 2021 at 5:57 AM Peter Geoghegan  wrote:
> > > >
> > >
> > > > Rather than inventing PARALLEL_VACUUM_KEY_INDVAC_CHECK (just for
> > > > assert-enabled builds), we should invent PARALLEL_VACUUM_STATS --
> > > > a dedicated shmem area for the array of LVSharedIndStats (no more
> > > > storing LVSharedIndStats entries at the end of the LVShared space
> > > > in an ad-hoc, type unsafe way). There should be one array element
> > > > for each and every index -- even those indexes where parallel
> > > > index vacuuming is unsafe or not worthwhile (unsure if avoiding
> > > > parallel processing for "not worthwhile" indexes actually makes
> > > > sense, BTW). We can then get rid of the bitmap/IndStatsIsNull()
> > > > stuff entirely. We'd also add new per-index state fields to
> > > > LVSharedIndStats itself. We could directly record the status of
> > > > each index (e.g., parallel unsafe, amvacuumcleanup processing
> > > > done, ambulkdelete processing done) explicitly. All code could
> > > > safely subscript the LVSharedIndStats array directly, using idx
> > > > style integers. That seems far more robust and consistent.
> > >
> > > Sounds good.
> > >
> > > During the development, I wrote the patch while considering using
> > > fewer shared memory but it seems that it brought complexity (and
> > > therefore the bug). It would not be harmful even if we allocate
> > > index statistics on DSM for unsafe indexes and “not worthwhile"
> > > indexes in practice.
> > >
> >
> > If we want to allocate index stats for all indexes in DSM then why not
> > consider it on the lines of buf/wal_usage means tack those via
> > LVParallelState? And probably replace bitmap with an array of bools
> > that indicates which indexes can be skipped by the parallel worker.
> >
> 
> I've attached a draft patch. The patch incorporated all comments from Andres
> except for the last comment that moves parallel related code to another file.
> I'd like to discuss how we split vacuumlazy.c.

Hi,

I was recently reading the parallel vacuum code, and I think the patch can
bring a certain improvement.

Here are a few minor comments about it.

1)

+* Reset all index status back to invalid (while checking that we have
+* processed all indexes).
+*/
+   for (int i = 0; i < vacrel->nindexes; i++)
+   {
+   LVSharedIndStats *stats = &(lps->lvsharedindstats[i]);
+
+   Assert(stats->status == INDVAC_STATUS_COMPLETED);
+   stats->status = INDVAC_STATUS_INITIAL;
+   }

Do you think it might be clearer to report an error here?
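For example, something along these lines (a sketch only; the error message
wording is illustrative):

	for (int i = 0; i < vacrel->nindexes; i++)
	{
		LVSharedIndStats *stats = &(lps->lvsharedindstats[i]);

		if (stats->status != INDVAC_STATUS_COMPLETED)
			elog(ERROR, "parallel vacuum did not process index \"%s\"",
				 RelationGetRelationName(vacrel->indrels[i]));

		stats->status = INDVAC_STATUS_INITIAL;
	}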

2)

+prepare_parallel_index_processing(LVRelState *vacrel, bool vacuum)

For the second parameter 'vacuum': would it be clearer if we passed a
LVIndVacStatus type instead of the boolean value?

Best regards,
Hou zj


Re: [sqlsmith] Failed assertion in brin_minmax_multi_distance_float4 on REL_14_STABLE

2021-11-04 Thread Tomas Vondra



On 11/5/21 02:09, Tomas Vondra wrote:
> Here's a patch that should fix this.

Meh, forgot the attachment, ofc.

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Companydiff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index ee6c4795f1..a85dfdfec4 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -73,6 +73,7 @@
 #include "utils/builtins.h"
 #include "utils/date.h"
 #include "utils/datum.h"
+#include "utils/float.h"
 #include "utils/inet.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -1872,6 +1873,14 @@ brin_minmax_multi_distance_float4(PG_FUNCTION_ARGS)
 	float		a1 = PG_GETARG_FLOAT4(0);
 	float		a2 = PG_GETARG_FLOAT4(1);
 
+	/* if both values are NaN, then we consider them the same */
+	if (isnan(a1) && isnan(a2))
+		PG_RETURN_FLOAT8(0.0);
+
+	/* if one value is NaN, use infinite distance */
+	if (isnan(a1) || isnan(a2))
+		PG_RETURN_FLOAT8(get_float8_infinity());
+
 	/*
 	 * We know the values are range boundaries, but the range may be collapsed
 	 * (i.e. single points), with equal values.
@@ -1890,6 +1899,14 @@ brin_minmax_multi_distance_float8(PG_FUNCTION_ARGS)
 	double		a1 = PG_GETARG_FLOAT8(0);
 	double		a2 = PG_GETARG_FLOAT8(1);
 
+	/* if both values are NaN, then we consider them the same */
+	if (isnan(a1) && isnan(a2))
+		PG_RETURN_FLOAT8(0.0);
+
+	/* if one value is NaN, use infinite distance */
+	if (isnan(a1) || isnan(a2))
+		PG_RETURN_FLOAT8(get_float8_infinity());
+
 	/*
 	 * We know the values are range boundaries, but the range may be collapsed
 	 * (i.e. single points), with equal values.


Re: [sqlsmith] Failed assertion in brin_minmax_multi_distance_float4 on REL_14_STABLE

2021-11-04 Thread Tomas Vondra




On 11/4/21 23:56, Tomas Vondra wrote:

Hi,

On 11/4/21 17:53, Justin Pryzby wrote:

On Thu, Nov 04, 2021 at 09:46:49AM +0100, Andreas Seltenreich wrote:

sqlsmith triggers the following assertion when testing REL_14_STABLE:

 TRAP: FailedAssertion("a1 <= a2", File: "brin_minmax_multi.c", Line: 1879, PID: 631814)


I can reproduce it with the following query on a fresh regression
database:

 insert into public.brintest_multi (float4col) values (real 'nan');

The branch was at f6162c020c while testing, backtrace below.


I couldn't reproduce this, but it reminds me of this one, which we also had
trouble reproducing.



I can reproduce that just fine - all I had to do was 'make installcheck' 
and then connect to the regression db and run the insert.


It seems to be a simple case of confusion in handling of NaN values. We 
do sort them correctly (by calling float4_lt), but given two values


   arg1 = nan (0x40)
   arg2 = 0.0909090936

then a simple comparison does not give the expected result

   (arg1 < arg2)
   (arg1 == arg2)
   (arg1 > arg2)

all evaluate to false, which is why the assert fails. So I guess the 
distance functions for float4 (and probably float8 too) need a couple 
more lines checking for NaN.




Here's a patch that should fix this. It simply handles NaN and returns 
distance 0.0 for two NaN values and Infinity for NaN vs. something else. 
I did check what happens with Infinity values, but I think those are 
fine - we can actually calculate distance with them, and the assert 
works for them too.


I'll improve the regression tests to include a couple NaN/Infinity cases 
tomorrow, and then I'll get this pushed.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: inefficient loop in StandbyReleaseLockList()

2021-11-04 Thread Tom Lane
Andres Freund  writes:
> I wonder if it's worth adding a note to list_delete_first() mentioning its
> O(N) behaviour. It's not immediately visible from the code, and from the list
> name one could very well be excused to not be worried about O(N) costs.

Hm.  I think it's not the only list function with O(N) behavior;
in fact there used to be more such functions than there are now.
But I could get behind a patch that annotates all of them.
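To illustrate the O(N) point with a minimal sketch (ReleaseOneLock() is a
hypothetical helper; whether the list elements themselves must also be freed
depends on the caller):

	/* quadratic: each list_delete_first() shifts the remaining array elements */
	while (locks != NIL)
	{
		ReleaseOneLock((xl_standby_lock *) linitial(locks));
		locks = list_delete_first(locks);
	}

	/* linear: walk the list once, then free it in one go */
	ListCell   *lc;

	foreach(lc, locks)
		ReleaseOneLock((xl_standby_lock *) lfirst(lc));
	list_free(locks);
	locks = NIL;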

regards, tom lane




Re: pgsql: Fix WAL replay in presence of an incomplete record

2021-11-04 Thread Tom Lane
[ I'm working on the release notes ]

Alvaro Herrera  writes:
> Fix WAL replay in presence of an incomplete record
> ...
> Because a new type of WAL record is added, users should be careful to
> upgrade standbys first, primaries later. Otherwise they risk the standby
> being unable to start if the primary happens to write such a record.

Is there really any point in issuing such advice?  IIUC, the standbys
would be unable to proceed anyway in case of a primary crash at the
wrong time, because an un-updated primary would send them inconsistent
WAL.  If anything, it seems like it might be marginally better to
update the primary first, reducing the window for it to send WAL that
the standbys will *never* be able to handle.  Then, if it crashes, at
least the WAL contains something the standbys can process once you
update them.

Or am I missing something?

regards, tom lane




Re: WIP: expression evaluation improvements

2021-11-04 Thread Andres Freund
Hi,

I pushed a rebased (ugh, that was painful) version of the patches to
https://github.com/anarazel/postgres/tree/jit-relative-offsets

Besides rebasing I dropped a few patches and did some *minor* cleanup. Besides
that there's one substantial improvement, namely that I got rid of one more
absolute pointer reference (in the aggregate steps).

The main sources for pointers that remain is FunctionCallInfo->{flinfo,
context}. There's also WindowFuncExprState->wfuncno (which isn't yet known at
"expression compile time"), but that's not too hard to solve differently.


On 2021-11-04 12:30:00 -0400, Robert Haas wrote:
> As a general comment, I think the patches could really benefit from
> more meaningful commit messages and more comments on individual
> functions. It would definitely help me review, and it might help other
> people review, or modify the code later.

Sure. I was mostly exploring what would be necessary to change expression
evaluation so that there are no absolute pointers in it. I still haven't figured
out all the necessary bits.


> For example, I'm looking at ExprEvalStep. If the intent here is that we
> don't want the union members to point to data that might differ from one
> execution of the plan to the next, it's surely important to mention that and
> explain to people who are trying to add steps later what they should do
> instead.  But I'm also not entirely sure that's the intended rule. It kind
> of surprises me that the only things that we'd be pointing to here that
> would fall into that category would be a bool, a NullableDatum, a
> NullableDatum array, and a FunctionCallInfo ... but I've been surprised by a
> lot of things that turned out to be true.

The immediate goal is to be able to generate JITed code/LLVM-IR that doesn't
contain any absolute pointer values. If the generated code doesn't change
regardless of any of the other contents of ExprEvalStep, we can still cache
the JIT optimization / code emission steps - which are the expensive bits.

With the exception of what I listed at the top, the types that you listed
really are what's needed to avoid such pointer constants. There are more
contents in the steps, but either they are constants (and thus just can be
embedded into the generated code), the expression step is just passed to
ExecEval*, or the data can just be loaded from the ExprStep at runtime
(although that makes the generated code slower).
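As a rough illustration of the general idea of such offset-based "relative
pointers" (the names here are hypothetical, not the ones used in the patch):

/*
 * Instead of storing an absolute NullableDatum pointer in a step, store a
 * byte offset from a per-execution "mutable" base.  The generated code then
 * contains no absolute addresses and can be cached and reused.
 */
typedef struct RelDatumRef
{
	uint32		off;			/* offset from the mutable-data base */
} RelDatumRef;

static inline NullableDatum *
rel_datum_ptr(char *mutable_base, RelDatumRef ref)
{
	return (NullableDatum *) (mutable_base + ref.off);
}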


There's a "more advanced" version of this where we can avoid recreating
ExprStates for e.g. prepared statements. Then we'd need to make a bit more of
the data use relative pointers. But that's likely a bit further off.  A more
moderate version will be to just store the number of steps for expressions
inside the expressions - for simple queries the allocation / growing / copying
of ExprSteps is quite visible.

FWIW interpreted execution does seem to win a bit from the higher density of
memory allocations for variable data this provides.


> I am not a huge fan of the various Rel[Whatever] typedefs. I am not
> sure that's really adding any clarity. On the other hand I would be a
> big fan of renaming the structure members in some systematic way. This
> kind of thing doesn't sit well with me:

I initially had all the Rel* use the same type, and it was much more error
prone because the compiler couldn't tell that the types are different.


> - NullableDatum *value; /* value to return */
> + RelNullableDatum value; /* value to return */
> 
> Well, if NullableDatum was the value to return, then RelNullableDatum
> isn't. It's some kind of thing that lets you find the value to return.

I don't really know what you mean? It's essentially just a different type of
pointer?


> Is it true that allocno is only used for, err, some kind of LLVM
> thing, and not in the regular interpreted path? As far as I can see,
> outside of the LLVM code, we only ever test whether it's 0, and don't
> actually care about the specific value.

I'd expect it to be useful for a few interpreted cases as well, but right now
it's not.


> I hope that the fact that this patch reverses the order of the first
> two arguments to ExecInitExprRec is only something you did to make it
> so that the compiler would find places you needed to update. Because
> otherwise it makes no sense to introduce a new thing called an
> ExprStateBuilder in 0017, make it an argument to that function, and
> then turn around and change the signature again in 0026. Anyway, a
> final patch shouldn't include this kind of churn.

Yes, that definitely needs to go.


> + offsetof(ExprState, steps) * esb->steps_len * sizeof(ExprEvalStep) +
> + state->mutable_off = offsetof(ExprState, steps) * esb->steps_len *
> sizeof(ExprEvalStep);
> 
> Well, either I'm confused here, or the first * should be a + in each
> case. I wonder how this works at all.

Oh, yes, that doesn't look right. I assume it's just always too big, and
that's why it doesn't cause problems...


> + /* copy in step data */
> + {
> + 

Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-11-04 Thread Jeff Davis
On Thu, 2021-11-04 at 15:42 -0700, Andres Freund wrote:
> I don't like this. This turns the checkpoint command which previously
> didn't
> rely on the catalog in the happy path etc into something that
> requires most of
> the backend to be happily up to work.

It seems like this specific approach has been mostly shot down already.
 But out of curiosity, are you intending to run CHECKPOINT during
bootstrap or something?

Regards,
Jeff Davis






Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-11-04 Thread Jeff Davis
On Thu, 2021-11-04 at 15:46 -0700, Andres Freund wrote:
> What about extending GRANT to allow granting rights on commands? Yes, it'd
> be a bit of work to make that work in the catalogs, but it doesn't seem too
> hard to tackle.

You mean for the CHECKPOINT command specifically, or for many commands?

If it only applies to CHECKPOINT, it seems like more net clutter than a
new predefined role.

But I don't see it generalizing to a lot of commands, either. I looked
at the list, and it's taking some creativity to think of more than a
couple other commands where it makes sense. Maybe LISTEN/NOTIFY? But
even then, there are three related commands: LISTEN, UNLISTEN, and
NOTIFY. Are those one privilege representing them all, two
(LISTEN/UNLISTEN, and NOTIFY), or three separate privileges?

Regards,
Jeff Davis






Re: Logical insert/update/delete WAL records for custom table AMs

2021-11-04 Thread Jeff Davis
On Thu, 2021-11-04 at 14:33 +0530, Amit Kapila wrote:
> Can't different tableAM's have different representations
> for toast or may be some different concept like speculative
> insertions?

The decoding plugin should just use the common interfaces to toast, and
if the table AM supports toast, everything should work fine. The only
special thing it needs to do is check VARATT_IS_EXTERNAL_ONDISK(),
because on-disk toast data can't be decoded (which is true for heap,
too).
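As a concrete illustration of that check, here is a minimal sketch in the
style of an output plugin's tuple-printing loop (not taken from any particular
plugin):

	for (int i = 0; i < tupdesc->natts; i++)
	{
		Form_pg_attribute attr = TupleDescAttr(tupdesc, i);
		bool		isnull;
		Datum		val;

		if (attr->attisdropped)
			continue;

		val = heap_getattr(tuple, i + 1, tupdesc, &isnull);
		if (isnull)
			continue;

		/* on-disk TOAST pointers cannot be resolved during decoding */
		if (attr->attlen == -1 &&
			VARATT_IS_EXTERNAL_ONDISK(DatumGetPointer(val)))
		{
			appendStringInfoString(out, "unchanged-toast-datum");
			continue;
		}

		/* ... otherwise output the value normally ... */
	}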

I haven't looked as much into speculative insertions, but I don't think
those are a problem either. Shouldn't they be handled before they make
it into the change stream that the plugin sees?

> Similarly, I remember that for zheap we didn't have
> TransactionIds for subtransactions so we needed to make some changes in
> logical decoding to compensate for that. I guess similar stuff could
> be required for another table AMs. Then a different table AM can have
> a different tuple format which won't work for current change records
> unless we convert it to heap tuple format before writing WAL for it.

Columnar certainly has a different format. That makes me wonder whether
ReorderBufferChange and/or ReorderBufferTupleBuf are too low-level, and
we should have a higher-level representation of a change that is based
on slots.

Can you tell me more about the changes you made for zheap? I still
don't understand why the decoding plugin would have to know what table
AM the change came from.

Regards,
Jeff Davis






Re: Extending amcheck to check toast size and compression

2021-11-04 Thread Mark Dilger


> On Nov 4, 2021, at 7:53 AM, Robert Haas  wrote:
> 
> But, is it plausible to add test coverage for the new checks, or is
> that going to be too much of a pain?

It only takes about 20 additional lines in the regression test to check the 
code paths for raw sizes that are too large and too small, so I've done that 
in this next version.  Testing corrupt compressed data in a deterministic, 
cross-platform manner with a compact, easy-to-maintain regression test has 
eluded me and is not included here.



v4-0001-Adding-more-toast-pointer-checks-to-amcheck.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: [sqlsmith] Failed assertion in brin_minmax_multi_distance_float4 on REL_14_STABLE

2021-11-04 Thread Tomas Vondra

Hi,

On 11/4/21 17:53, Justin Pryzby wrote:

On Thu, Nov 04, 2021 at 09:46:49AM +0100, Andreas Seltenreich wrote:

sqlsmith triggers the following assertion when testing REL_14_STABLE:

 TRAP: FailedAssertion("a1 <= a2", File: "brin_minmax_multi.c", Line: 1879, 
PID: 631814)

I can reproduce it with the following query on a fresh regression
database:

 insert into public.brintest_multi (float4col) values (real 'nan');

The branch was at f6162c020c while testing, backtrace below.


I couldn't reproduce this, but it reminds me of this one, which we also had
trouble reproducing.



I can reproduce that just fine - all I had to do was 'make installcheck' 
and then connect to the regression db and run the insert.


It seems to be a simple case of confusion in handling of NaN values. We 
do sort them correctly (by calling float4_lt), but given two values


  arg1 = nan (0x40)
  arg2 = 0.0909090936

then the simple comparisons do not give the expected results:

  (arg1 < arg2)
  (arg1 == arg2)
  (arg1 > arg2)

all evaluate to false, which is why the assert fails. So I guess the 
distance functions for float4 (and probably float8 too) need a couple 
more lines checking for NaN.
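
For instance, a rough sketch of the kind of NaN guard that could go at the top
of the distance function - the exact policy (zero distance for two NaNs,
infinite distance when only one side is NaN) is an assumption here, not a
committed fix:

    #include <math.h>               /* isnan */
    #include "utils/float.h"        /* get_float8_infinity */

    Datum
    brin_minmax_multi_distance_float4(PG_FUNCTION_ARGS)
    {
        float4      a1 = PG_GETARG_FLOAT4(0);
        float4      a2 = PG_GETARG_FLOAT4(1);

        /* two NaNs compare as equal for our purposes, so zero distance */
        if (isnan(a1) && isnan(a2))
            PG_RETURN_FLOAT8(0.0);

        /* one NaN and one ordinary value: treat them as maximally distant */
        if (isnan(a1) || isnan(a2))
            PG_RETURN_FLOAT8(get_float8_infinity());

        /* the ordinary case, where the sort order guarantees a1 <= a2 */
        Assert(a1 <= a2);
        PG_RETURN_FLOAT8((double) a2 - (double) a1);
    }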



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-11-04 Thread Andres Freund
Hi,

On 2021-11-04 14:25:54 -0700, Jeff Davis wrote:
> On Thu, 2021-11-04 at 12:37 -0400, Robert Haas wrote:
> > I don't have anything specific to propose, which I realize is kind of
> > unhelpful ... but I don't like this, either.
> 
> We can go back to having a pg_checkpoint predefined role that is only
> used for the CHECKPOINT command.

What about extending GRANT to allow granting rights on commands? Yes, it'd be
a bit of work to make that work in the catalogs, but it doesn't seem too hard
to tackle.

Greetings,

Andres Freund




Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-11-04 Thread Andres Freund
Hi,

On 2021-11-02 10:28:39 -0700, Jeff Davis wrote:
> On Mon, 2021-11-01 at 12:50 -0400, Stephen Frost wrote:
> > All that said, I wonder if we can have our cake and eat it too.  I
> > haven't looked into this at all yet and perhaps it's foolish on its
> > face, but, could we make CHECKPOINT; basically turn around and just
> > run
> > select pg_checkpoint(); with the regular privilege checking
> > happening?
> > Then we'd keep the existing syntax working, but if the user is
> > allowed
> > to run the command would depend on if they've been GRANT'd EXECUTE
> > rights on the function or not.
> 
> Great idea! Patch attached.
> 
> This feels like a good pattern that we might want to use elsewhere, if
> the need arises.
>   case T_CheckPointStmt:
> - if (!superuser())
> - ereport(ERROR,
> - 
> (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> -  errmsg("must be superuser to 
> do CHECKPOINT")));
> + {
> + /*
> +  * Invoke pg_checkpoint(). Implementing the 
> CHECKPOINT command
> +  * with a function allows administrators to 
> grant privileges
> +  * on the CHECKPOINT command by granting 
> privileges on the
> +  * pg_checkpoint() function. It also calls the 
> function
> +  * execute hook, if present.
> +  */
> + AclResult   aclresult;
> + FmgrInfo    flinfo;
> +
> + aclresult = pg_proc_aclcheck(F_PG_CHECKPOINT, 
> GetUserId(),
> + 
>  ACL_EXECUTE);
> + if (aclresult != ACLCHECK_OK)
> + aclcheck_error(aclresult, 
> OBJECT_FUNCTION,
> +
> get_func_name(F_PG_CHECKPOINT));
>  
> - RequestCheckpoint(CHECKPOINT_IMMEDIATE | 
> CHECKPOINT_WAIT |
> -   (RecoveryInProgress() 
> ? 0 : CHECKPOINT_FORCE));
> + InvokeFunctionExecuteHook(F_PG_CHECKPOINT);
> +
> + fmgr_info(F_PG_CHECKPOINT, &flinfo);
> +
> + (void) FunctionCall0Coll(&flinfo, InvalidOid);
> + }
>   break;

I don't like this. This turns the checkpoint command which previously didn't
rely on the catalog in the happy path etc into something that requires most of
the backend to be happily up to work.

Greetings,

Andres Freund




Re: inefficient loop in StandbyReleaseLockList()

2021-11-04 Thread Andres Freund
Hi,

On 2021-11-02 11:35:55 -0400, Tom Lane wrote:
> It's not clear that the remaining list_delete_first
> callers have any real problem; and changing them would be complex.

I wonder if it's worth adding a note to list_delete_first() mentioning its
O(N) behaviour. It's not immediately visible from the code, and from the
name alone one could very well be excused for not worrying about O(N) costs.
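
Something as simple as an extra sentence in the header comment would probably
do; for example (wording is only a suggestion):

    /*
     * list_delete_first
     *      Delete the first element of the list.
     *
     * Note that this takes time proportional to the length of the list,
     * because the remaining entries have to be moved down.  Avoid calling
     * it repeatedly on long lists; list_delete_last() or
     * foreach_delete_current() may be better fits for such loops.
     */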

Greetings,

Andres Freund




Re: Possible Documentation Update for ALTER STATISTICS

2021-11-04 Thread Tomas Vondra

Hi Ahmet,

On 11/4/21 14:35, Ahmet Gedemenli wrote:

Hey,

I've noticed that the current documentation doesn't mention the IF EXISTS 
clause for ALTER STATISTICS in the synopsis section, even though PG supports it.
https://www.postgresql.org/docs/14/sql-alterstatistics.html 

(Only for the last item, that is ALTER STATISTICS .. SET STATISTICS; for 
the others, PG just throws a syntax error.)


I'm from the Citus team and noticed this while bug fixing, and I wonder 
if it is intentional or not. If it's intentionally supported while the 
other ALTER STATISTICS statement types are not supported, it would be 
good to mention that in the documentation.




Well, it's intentional in the sense that support for IF EXISTS in ALTER 
commands is rather spotty. For OWNER TO it's not supported at all, and 
for the others (RENAME & SET SCHEMA) it's supported only for some object 
types. So we added the minimum grammar and never got around to adding it.


So you're right we should update the docs for the SET STATISTICS case to 
show it's supported in 14. I'll do that shortly.


For 15+ we could improve this to allow IF EXISTS in the other cases. For 
RENAME and SET SCHEMA it's fairly easy (see attached fix), for OWNER TO 
it's going to be more work because the AlterOwnerStmt does not have the 
missing_ok flag.
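
Roughly, that would mean adding the flag to the node, something like this (the
missing_ok field is the hypothetical addition, the rest is the existing shape
of the struct), plus the corresponding grammar and command changes:

    typedef struct AlterOwnerStmt
    {
        NodeTag     type;
        ObjectType  objectType;     /* OBJECT_TABLE, OBJECT_TYPE, etc */
        RangeVar   *relation;       /* in case a table is involved */
        Node       *object;         /* in case another object is involved */
        RoleSpec   *newowner;       /* the new owner */
        bool        missing_ok;     /* hypothetical: skip if object is absent */
    } AlterOwnerStmt;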


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Companydiff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
index 40044070cf..801f6eddb6 100644
--- a/src/backend/commands/alter.c
+++ b/src/backend/commands/alter.c
@@ -374,6 +374,49 @@ ExecRenameStmt(RenameStmt *stmt)
 		case OBJECT_TYPE:
 			return RenameType(stmt);
 
+		case OBJECT_STATISTIC_EXT:
+			{
+ObjectAddress address;
+Relation	catalog;
+Relation	relation = NULL;
+
+address = get_object_address(stmt->renameType,
+			 stmt->object,
+			 &relation,
+			 AccessExclusiveLock,
+			 stmt->missing_ok);
+Assert(relation == NULL);
+
+if (OidIsValid(address.objectId))
+{
+	catalog = table_open(address.classId, RowExclusiveLock);
+	AlterObjectRename_internal(catalog,
+			   address.objectId,
+			   stmt->newname);
+	table_close(catalog, RowExclusiveLock);
+}
+else
+{
+	char	   *schemaname;
+	char	   *statname;
+
+	Assert(stmt->missing_ok);
+
+	DeconstructQualifiedName(castNode(List, stmt->object), &schemaname, &statname);
+
+	if (schemaname)
+		ereport(NOTICE,
+(errmsg("statistics object \"%s.%s\" does not exist, skipping",
+		schemaname, statname)));
+	else
+		ereport(NOTICE,
+(errmsg("statistics object \"%s\" does not exist, skipping",
+		statname)));
+}
+
+return address;
+			}
+
 		case OBJECT_AGGREGATE:
 		case OBJECT_COLLATION:
 		case OBJECT_CONVERSION:
@@ -386,7 +429,6 @@ ExecRenameStmt(RenameStmt *stmt)
 		case OBJECT_LANGUAGE:
 		case OBJECT_PROCEDURE:
 		case OBJECT_ROUTINE:
-		case OBJECT_STATISTIC_EXT:
 		case OBJECT_TSCONFIGURATION:
 		case OBJECT_TSDICTIONARY:
 		case OBJECT_TSPARSER:
@@ -521,6 +563,51 @@ ExecAlterObjectSchemaStmt(AlterObjectSchemaStmt *stmt,
 		 oldSchemaAddr ? &oldNspOid : NULL);
 			break;
 
+		case OBJECT_STATISTIC_EXT:
+			{
+Relation	catalog;
+Relation	relation;
+Oid			classId;
+Oid			nspOid;
+
+address = get_object_address(stmt->objectType,
+			 stmt->object,
+			 &relation,
+			 AccessExclusiveLock,
+			 stmt->missing_ok);
+
+if (OidIsValid(address.objectId))
+{
+	Assert(relation == NULL);
+	classId = address.classId;
+	catalog = table_open(classId, RowExclusiveLock);
+	nspOid = LookupCreationNamespace(stmt->newschema);
+
+	oldNspOid = AlterObjectNamespace_internal(catalog, address.objectId,
+			  nspOid);
+	table_close(catalog, RowExclusiveLock);
+}
+else
+{
+	char	   *schemaname;
+	char	   *statname;
+
+	Assert(stmt->missing_ok);
+
+	DeconstructQualifiedName(castNode(List, stmt->object), &schemaname, &statname);
+
+	if (schemaname)
+		ereport(NOTICE,
+(errmsg("statistics object \"%s.%s\" does not exist, skipping",
+		schemaname, statname)));
+	else
+		ereport(NOTICE,
+(errmsg("statistics object \"%s\" does not exist, skipping",
+		statname)));
+}
+			}
+			break;
+
 			/* generic code path */
 		case OBJECT_AGGREGATE:
 		case OBJECT_COLLATION:
@@ -531,7 +618,6 @@ ExecAlterObjectSchemaStmt(AlterObjectSchemaStmt *stmt,
 		case OBJECT_OPFAMILY:
 		case OBJECT_PROCEDURE:
 		case OBJECT_ROUTINE:
-		case OBJECT_STATISTIC_EXT:
 		case OBJECT_TSCONFIGURATION:
 		case OBJECT_TSDICTIONARY:
 		case OBJECT_TSPARSER:
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d0eb80e69c..878e9152f6 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -8955,6 +8955,15 @@ RenameStmt: ALTER AGGREGATE 

Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-11-04 Thread Jeff Davis
On Thu, 2021-11-04 at 12:37 -0400, Robert Haas wrote:
> I don't have anything specific to propose, which I realize is kind of
> unhelpful ... but I don't like this, either.

We can go back to having a pg_checkpoint predefined role that is only
used for the CHECKPOINT command.

The only real argument against that was a general sense of clutter, but
I wasn't entirely convinced of that. If we have a specialized command,
we have all kinds of clutter associated with that; a predefined role
doesn't add much additional clutter.

Regards,
Jeff Davis






Re: parallel vacuum comments

2021-11-04 Thread Peter Geoghegan
On Thu, Nov 4, 2021 at 12:42 PM Peter Geoghegan  wrote:
> Since "The leader process alone processes all parallel-safe indexes in
> the case where no workers are launched" (no change there), I wonder:
> how does the leader *know* that it's the leader (and so can always
> process any indexes) inside its call to
> lazy_parallel_process_indexes()? Or, does the leader actually process
> all indexes inside lazy_serial_process_indexes() in this edge case?
> (The edge case where a parallel VACUUM has no workers at all, because
> they couldn't be launched by the core parallel infrastructure.)

I think that I might see a related problem. But I'm not sure, so I'll just ask:

> +   /* Set data required for parallel index vacuum or cleanup */
> +   prepare_parallel_index_processing(vacrel, vacuum);
> +
> +   /* Reset the parallel index processing counter */
> +   pg_atomic_write_u32(&(lps->lvshared->idx), 0);
> +
> /* Setup the shared cost-based vacuum delay and launch workers */
> if (nworkers > 0)
> {
> +   /* Reinitialize the parallel context to relaunch parallel workers */
> if (vacrel->num_index_scans > 0)
> -   {
> -   /* Reset the parallel index processing counter */
> -   pg_atomic_write_u32(&(lps->lvshared->idx), 0);
> -
> -   /* Reinitialize the parallel context to relaunch parallel workers 
> */
> ReinitializeParallelDSM(lps->pcxt);
> -   }

Is it okay that we don't call ReinitializeParallelDSM() just because
"nworkers == 0" this time around? I notice that there is a wait for
"nworkers_launched" workers to finish parallel processing, at the top
of ReinitializeParallelDSM(). I can see why the
"vacrel->num_index_scans > 0" test is okay, but I can't see why the
"nworkers == 0" test is okay.

I just want to be sure that we're not somehow relying on seeing state
in shared memory (in the LVSharedIndStats array) in all cases, but
finding that it is not actually there in certain rare edge cases.
Maybe this didn't matter before, because the leader didn't expect to
find this information in shared memory in any case. But that is
changed by your patch, of course, so it's something to be concerned
about.

--
Peter Geoghegan




Re: pglz compression performance, take two

2021-11-04 Thread Tomas Vondra

Hi,

I've looked at this patch again and did some testing. I don't have any 
comments to the code (I see there are two comments from Mark after the 
last version, though).


For the testing, I did a fairly simple benchmark loading either random 
or compressible data into a bytea column. The tables are defined as 
unlogged, the values are 1kB, 4kB and 1MB, and the total amount of data 
is always 1GB. The timings are


test            master    patched    delta
-------------------------------------------
random_1k        12295      12312     100%
random_1m        12999      12984     100%
random_4k        16881      15959      95%
redundant_1k     12308      12348     100%
redundant_1m     16632      14072      85%
redundant_4k     16798      13828      82%

I ran the test on multiple x86_64 machines, but the behavior is almost 
exactly the same.


This shows there's no difference for 1kB (expected, because this does 
not exceed the ~2kB TOAST threshold). For random data in general the 
difference is pretty negligible, although it's a bit strange it takes 
longer for 4kB than 1MB ones.


For redundant (highly compressible) values, there's quite significant 
speedup between 15-18%. Real-world data are likely somewhere between, 
but the speedup is still pretty nice.


Andrey, can you update the patch per Mark's review? I'll do my best to 
get it committed sometime in this CF.


Attached are the two scripts used for generating / testing (you'll have 
to fix some hardcoded paths, but simple otherwise).


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

generator.sql
Description: application/sql


test.sh
Description: application/shellscript


Re: Time to drop plpython2?

2021-11-04 Thread Tom Lane
Andres Freund  writes:
> Another thing I wondered about is what we want to do with the extension
> names. Do we want to leave it named plpython3u? Do we want to have a plpython
> that depends on plpython3u?

I think we want to keep plpython3u.  Maybe we can point plpythonu
at that, but I'm concerned about the potential for user confusion.
In particular, I think there's a nonzero probability that someone
will choose to maintain plpython2u/plpythonu outside of core,
just because they still don't want to migrate their Python code.

> I'd be inclined to just keep it at plpython3u for now, but there's an argument
> that going for plpython would be better long term: Presumably there will be
> python 4 at some point - but I'd expect that to not be a breaking release,
> given the disaster that python 3 is. Making a non-versioned name better?

Meh.  If there is a python 4, I'd expect it to be named that way precisely
because it *is* a breaking release.  Why would we set ourselves up for
a repeat of this mess?

regards, tom lane




Re: should we enable log_checkpoints out of the box?

2021-11-04 Thread Andres Freund
Hi,

On 2021-11-02 11:55:23 -0400, Robert Haas wrote:
> On Sun, Oct 31, 2021 at 10:59 AM Tom Lane  wrote:
> > The general policy at the moment is that a normally-functioning server
> > should emit *no* log traffic by default (other than a few messages
> > at startup and shutdown).  log_checkpoints is a particularly poor
> > candidate for an exception to that policy, because it would produce so
> > much traffic.  No DBA would be likely to consider it as anything but
> > log spam.
> 
> That's absolutely false. On any system where there's anything actually
> happening, there's going to be tons of stuff in the log because there
> are going to be failed connection attempts, queries that result in
> errors, and all kinds of other things like that. By any reasonable
> metric, the log volume of log_checkpoints=on is tiny.

Yes.

I think we do have significant issues with noisy messages drowning out all
signal in the log, but that's largely stuff that's logged by default based on
client actions, at a high rate, rather than something occasional like log
checkpoints.

It's *hard* to find relevant stuff in postgres log files. Most instances with
some amount of traffic will have non-graceful disconnects (each logging two
messages, one "LOG: could not send data to client: Broken pipe" and one
"FATAL: connection to client lost"). It's normal to have some amount of
constraint violations. Etc. One cannot realistically see LOG or ERROR messages
indicating real trouble because we do not provide a realistic way to separate
such "normal" log messages from others fairly reliably indicating a problem.


> Besides appearing to be unwarranted mockery of what Bharath wrote,

Indeed.

Greetings,

Andres Freund




Re: Time to drop plpython2?

2021-11-04 Thread Andres Freund
Hi,

On 2021-11-04 19:58:54 +0100, Peter Eisentraut wrote:
> I see you have posted a patch for this in the meson thread 
> (https://www.postgresql.org/message-id/attachment/127770/v5-0003-plpython-Drop-support-python2.patch).

Yea, I was planning to post that here after a bit more polish. I mostly wanted
to get rid of the gross gross hack I did for the transmutation of the
regression test files.


> Here is my review of that.
> 
> I would change the search order in configure from
> 
> PGAC_PATH_PROGS(PYTHON, [python python3 python2])

> to
> 
> PGAC_PATH_PROGS(PYTHON, [python3 python])
> 
> This makes sure you don't accidentally pick up a "python" binary that points
> to Python 2.  This would otherwise immediately generate lots of build
> failures on older platforms that still have Python 2 around.

Yes, we should do that, at least for now. I did see build failures that
required me to specify the python version to avoid issues around it.


> You left two FIXME sections in plpython.h.  AFAICT, we can make the
> substitutions corresponding to those #define's in the source code and remove
> those blocks.

Yea, it shouldn't be hard. Just required more time than I had to send it out
before Nov 1st ;)

With meson I'd do a version: '>= 3' or such, to filter out a bare 'python'
being python2, but I don't think there's an equally trivial way to do that
with autoconf.


> Finally, morally related, there is some Python 2/3 compat code in
> contrib/unaccent/generate_unaccent_rules.py that could be removed. Also,
> arguably, change the shebang line in that script.

Hm. So far the python used for plpython and python for code generation etc is
independent. I don't know if plpython actually can be cross-compiled, but if
so, they'd have to be independent.  Otherwise I'd say we should just invoke
contrib/unaccent/generate_unaccent_rules.py with a python chosen by
configure/meson, rather than relying on a shebang that can't be adjusted
without modifying source code.


Another thing I wondered about is what we want to do with the extension
names. Do we want to leave it named plpython3u? Do we want to have a plpython
that depends on plpython3u?

I'd be inclined to just keep it at plpython3u for now, but there's an argument
that going for plpython would be better long term: Presumably there will be
python 4 at some point - but I'd expect that to not be a breaking release,
given the disaster that python 3 is. Making a non-versioned name better?

Greetings,

Andres Freund




Re: parallel vacuum comments

2021-11-04 Thread Peter Geoghegan
On Wed, Nov 3, 2021 at 10:25 PM Masahiko Sawada  wrote:
> I've attached a draft patch. The patch incorporated all comments from
> Andres except for the last comment that moves parallel related code to
> another file. I'd like to discuss how we split vacuumlazy.c.

This looks great!

I wonder if this is okay, though:

> /* Process the indexes that can be processed by only leader process */
> -   do_serial_processing_for_unsafe_indexes(vacrel, lps->lvshared);
> +   lazy_serial_process_indexes(vacrel);
>
> /*
> -* Join as a parallel worker.  The leader process alone processes all the
> -* indexes in the case where no workers are launched.
> +* Join as a parallel worker.  The leader process alone processes all
> +* parallel-safe indexes in the case where no workers are launched.
>  */
> -   do_parallel_processing(vacrel, lps->lvshared);
> +   lazy_parallel_process_indexes(vacrel, lps->lvshared, 
> vacrel->lps->lvsharedindstats);
>
> /*
>  * Next, accumulate buffer and WAL usage.  (This must wait for the workers
> @@ -2786,6 +2734,18 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int 
> nworkers)
> InstrAccumParallelQuery(&lps->buffer_usage[i], 
> &lps->wal_usage[i]);
> }

Since "The leader process alone processes all parallel-safe indexes in
the case where no workers are launched" (no change there), I wonder:
how does the leader *know* that it's the leader (and so can always
process any indexes) inside its call to
lazy_parallel_process_indexes()? Or, does the leader actually process
all indexes inside lazy_serial_process_indexes() in this edge case?
(The edge case where a parallel VACUUM has no workers at all, because
they couldn't be launched by the core parallel infrastructure.)

One small thing: the new "LVSharedIndStats.parallel_safe" field seems
to be slightly misnamed. Isn't it more like
"LVSharedIndStats.parallel_workers_can_process"? The index might
actually be parallel safe *in principle*, while nevertheless being
deliberately skipped over by workers due to a deliberate up-front
choice made earlier, in compute_parallel_vacuum_workers(). Most
obvious example of this is the choice to not use parallelism for a
smaller index (say a partial index whose size is <
min_parallel_index_scan_size).

Another small thing, which is closely related to the last one:

>  typedef struct LVSharedIndStats
>  {
> -   boolupdated;/* are the stats updated? */
> +   LVIndVacStatus status;
> +
> +   /*
> +* True if both leader and worker can process the index, otherwise only
> +* leader can process it.
> +*/
> +   boolparallel_safe;
> +
> +   boolistat_updated;  /* are the stats updated? */
> IndexBulkDeleteResult istat;
>  } LVSharedIndStats;

It would be nice if the new
"LVSharedIndStats.parallel_safe" field (or whatever we end up calling
it) had comments pointing out that it isn't a fixed thing for the
entire VACUUM operation -- it's only fixed for an individual call to
parallel_lazy_vacuum_or_cleanup_all_indexes() (i.e., it's only fixed
for the "ambulkdelete portion" or the "amvacuumcleanup portion" of the
entire VACUUM).

Alternatively, you could just have two booleans, I think. You know,
one for the "ambulkdelete portion", another for the "amvacuumcleanup
portion". As I've said before, it would be nice if we only called
parallel_vacuum_main() once per VACUUM operation (and not once per
"portion"), but that's a harder and more invasive change. But I don't
think you necessarily have to go that far for it to make sense to have
two bools. Having two might allow you to make both of them immutable,
which is useful.
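
For illustration, that could look roughly like this (field names are only
placeholders):

    typedef struct LVSharedIndStats
    {
        LVIndVacStatus status;

        /*
         * Whether parallel workers may process this index, decided once per
         * portion of the VACUUM and immutable afterwards.
         */
        bool        workers_can_bulkdelete;     /* ambulkdelete portion */
        bool        workers_can_vacuumcleanup;  /* amvacuumcleanup portion */

        bool        istat_updated;  /* are the stats updated? */
        IndexBulkDeleteResult istat;
    } LVSharedIndStats;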

> Regarding tests, I’d like to add tests to check if a vacuum with
> multiple index scans (e.g., due to small maintenance_work_mem) works
> fine. But a problem is that we need at least about 200,000 garbage
> tuples to perform the index scan twice even with the minimum
> maintenance_work_mem, which takes time to load.

Maybe that's okay. Do you notice that it takes a lot longer now? I did
try to keep the runtime down when I committed the fixup to the
parallel VACUUM related bug.

-- 
Peter Geoghegan




Re: parallel vacuum comments

2021-11-04 Thread Andres Freund
Hi,

On 2021-11-01 10:44:34 +0900, Masahiko Sawada wrote:
> On Sun, Oct 31, 2021 at 6:21 AM Andres Freund  wrote:
> >   But even though we have this space optimized bitmap thing, we actually 
> > need
> >   more memory allocated for each index, making this whole thing pointless.
> 
> Right. But is better to change to use booleans?

It seems very clearly better to me. We shouldn't use complicated stuff like

#define SizeOfLVShared (offsetof(LVShared, bitmap) + sizeof(bits8))
#define GetSharedIndStats(s) \
((LVSharedIndStats *)((char *)(s) + ((LVShared *)(s))->offset))
#define IndStatsIsNull(s, i) \
(!(((LVShared *)(s))->bitmap[(i) >> 3] & (1 << ((i) & 0x07))))

when there's no reason / benefit.


> > - Imo it's pretty confusing to have functions like
> >   lazy_parallel_vacuum_indexes() (in 13, renamed in 14) that "Perform index
> >   vacuum or index cleanup with parallel workers.", based on
> >   lps->lvshared->for_cleanup.
> 
> Okay. We need to set lps->lvshared->for_cleanup to tell worker do
> either index vacuum or index cleanup. So it might be better to pass
> for_cleanup flag down to the functions in addition to setting
> lps->lvshared->for_cleanup.

Yup.


> > - I don't like some of the new names introduced in 14. E.g.
> >   "do_parallel_processing" is way too generic.
> 
> I listed the function names that probably need to be renamed from
> that perspective:
> 
> * do_parallel_processing
> * do_serial_processing_for_unsafe_indexes
> * parallel_process_one_index
> 
> Is there any other function that should be renamed?

parallel_processing_is_safe().

I don't like that there are three different naming patterns for parallel
things. There's do_parallel_*, there's parallel_, and there's
(begin|end|compute)_parallel_*.


> > - On a higher level, a lot of this actually doesn't seem to belong into
> >   vacuumlazy.c, but should be somewhere more generic. Pretty much none of 
> > this
> >   code is heap specific.  And vacuumlazy.c is large enough without the 
> > parallel
> >   code.
> 
> I haven't come up with an idea to make them more generic. Could you
> elaborate on that?

To me the job that the parallel vacuum stuff does isn't really specific to
heap. Any table AM supporting indexes is going to need to do something pretty
much like it (it's calling indexam stuff). Most of the stuff in vacuumlazy.c
is very heap specific - you're not going to be able to reuse lazy_scan_heap()
or such. Before the parallel vacuum stuff, the index specific code in
vacuumlazy.c was fairly limited - but now it's a nontrivial amount of code.

Based on a quick look
  parallel_vacuum_main(), parallel_processing_is_safe(),
  parallel_stats_for_idx(), end_parallel_vacuum(), begin_parallel_vacuum(),
  compute_parallel_vacuum_workers(), parallel_process_one_index(),
  do_serial_processing_for_unsafe_indexes(), do_parallel_processing(),
  do_parallel_vacuum_or_cleanup(), do_parallel_lazy_cleanup_all_indexes(),
  do_parallel_lazy_vacuum_all_indexes(),

don't really belong in vacuumlazy.c. but should be in vacuum.c or a new
file. Of course that requires a bit of work, because of the heavy reliance on
LVRelState, but afaict there's not really an intrinsic need to use that.

Greetings,

Andres Freund




Re: Time to drop plpython2?

2021-11-04 Thread Peter Eisentraut
I see you have posted a patch for this in the meson thread 
(https://www.postgresql.org/message-id/attachment/127770/v5-0003-plpython-Drop-support-python2.patch). 
 Here is my review of that.


I would change the search order in configure from

PGAC_PATH_PROGS(PYTHON, [python python3 python2])

to

PGAC_PATH_PROGS(PYTHON, [python3 python])

This makes sure you don't accidentally pick up a "python" binary that 
points to Python 2.  This would otherwise immediately generate lots of 
build failures on older platforms that still have Python 2 around.


You left two FIXME sections in plpython.h.  AFAICT, we can make the 
substitutions corresponding to those #define's in the source code and 
remove those blocks.


src/pl/plpython/expected/README should be updated for the removed files.

Some documentation updates are needed.  I see possibly relevant text in 
the following files:


hstore.sgml
install-windows.sgml
installation.sgml
json.sgml
libpq.sgml
plpython.sgml
ref/comment.sgml (examples)
ref/create_transform.sgml (examples)
ref/drop_transform.sgml (examples)
standalone-profile.xsl

src/tools/msvc/ has lots of Python-related content.

More stuff to clean up:

contrib/hstore_plpython/.gitignore
contrib/jsonb_plpython/.gitignore
contrib/ltree_plpython/.gitignore
src/pl/plpython/.gitignore

Finally, morally related, there is some Python 2/3 compat code in 
contrib/unaccent/generate_unaccent_rules.py that could be removed. 
Also, arguably, change the shebang line in that script.





Re: [RFC] building postgres with meson -v

2021-11-04 Thread Andres Freund
Hi,

On 2021-11-04 19:17:05 +0100, Peter Eisentraut wrote:
> On 01.11.21 00:24, Andres Freund wrote:
> > - remaining hardcoded configure tests (e.g. ACCEPT_TYPE_ARG*)
> 
> I think we can get rid of that one.

Oh, nice!

I was somewhat confused by "unsigned int PASCAL" as a type.


> That test originally catered to some strange edge cases where the third
> argument was size_t that was not the same size as int.  That is long gone,
> if it ever really existed.  All systems currently of interest use either
> socklen_t or int, and socklen_t is always int.  (A few build farm animals
> report size_t, but they are all 32-bit.)

> diff --git a/src/include/c.h b/src/include/c.h
> index c8ede08273..7c790f557e 100644
> --- a/src/include/c.h
> +++ b/src/include/c.h
> @@ -408,6 +408,10 @@ typedef unsigned char bool;
>   * 
>   */
>  
> +#ifndef HAVE_SOCKLEN_T
> +typedef int socklen_t;
> +#endif

I'd put this in port.h instead of c.h, or is there a reason not to do so?


Probably worth putting this in fairly soon independent of whether anything
happens wrt meson?

Greetings,

Andres Freund




Re: [RFC] building postgres with meson -v

2021-11-04 Thread Peter Eisentraut


On 01.11.21 00:24, Andres Freund wrote:

- remaining hardcoded configure tests (e.g. ACCEPT_TYPE_ARG*)


I think we can get rid of that one.

That test originally catered to some strange edge cases where the third 
argument was size_t that was not the same size as int.  That is long 
gone, if it ever really existed.  All systems currently of interest use 
either socklen_t or int, and socklen_t is always int.  (A few build farm 
animals report size_t, but they are all 32-bit.)


I think we can change the code to use socklen_t and add a simple check 
to typedef socklen_t as int if not available.  See attached patch.
>From 6229ba9973134dfb184eb21bc62822d83ba554d8 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 4 Nov 2021 13:50:25 +0100
Subject: [PATCH] Remove check for accept() argument types

---
 aclocal.m4|  1 -
 config/ac_func_accept_argtypes.m4 | 78 -
 configure | 82 +--
 configure.ac  |  2 +-
 src/backend/libpq/auth.c  |  2 +-
 src/backend/libpq/pqcomm.c|  8 +--
 src/backend/postmaster/pgstat.c   |  4 +-
 src/include/c.h   |  4 ++
 src/include/libpq/pqcomm.h|  2 +-
 src/include/pg_config.h.in| 15 ++
 src/interfaces/libpq/fe-connect.c |  2 +-
 src/port/getpeereid.c |  4 +-
 src/tools/msvc/Solution.pm|  5 +-
 13 files changed, 31 insertions(+), 178 deletions(-)
 delete mode 100644 config/ac_func_accept_argtypes.m4

diff --git a/aclocal.m4 b/aclocal.m4
index 5e22482cd5..58ade65046 100644
--- a/aclocal.m4
+++ b/aclocal.m4
@@ -1,5 +1,4 @@
 dnl aclocal.m4
-m4_include([config/ac_func_accept_argtypes.m4])
 m4_include([config/ax_prog_perl_modules.m4])
 m4_include([config/ax_pthread.m4])
 m4_include([config/c-compiler.m4])
diff --git a/config/ac_func_accept_argtypes.m4 
b/config/ac_func_accept_argtypes.m4
deleted file mode 100644
index 178ef67818..00
--- a/config/ac_func_accept_argtypes.m4
+++ /dev/null
@@ -1,78 +0,0 @@
-# config/ac_func_accept_argtypes.m4
-# This comes from the official Autoconf macro archive at
-# 
-
-
-dnl @synopsis AC_FUNC_ACCEPT_ARGTYPES
-dnl
-dnl Checks the data types of the three arguments to accept(). Results are
-dnl placed into the symbols ACCEPT_TYPE_RETURN and ACCEPT_TYPE_ARG[123],
-dnl consistent with the following example:
-dnl
-dnl   #define ACCEPT_TYPE_RETURN int
-dnl   #define ACCEPT_TYPE_ARG1 int
-dnl   #define ACCEPT_TYPE_ARG2 struct sockaddr *
-dnl   #define ACCEPT_TYPE_ARG3 socklen_t
-dnl
-dnl NOTE: This is just a modified version of the AC_FUNC_SELECT_ARGTYPES
-dnl macro. Credit for that one goes to David MacKenzie et. al.
-dnl
-dnl @version $Id: ac_func_accept_argtypes.m4,v 1.1 1999/12/03 11:29:29 simons 
Exp $
-dnl @author Daniel Richard G. 
-dnl
-
-# PostgreSQL local changes: In the original version ACCEPT_TYPE_ARG3
-# is a pointer type. That's kind of useless because then you can't
-# use the macro to define a corresponding variable. We also make the
-# reasonable(?) assumption that you can use arg3 for getsocktype etc.
-# as well (i.e., anywhere POSIX.2 has socklen_t).
-#
-# arg2 can also be `const' (e.g., RH 4.2). Change the order of tests
-# for arg3 so that `int' is first, in case there is no prototype at all.
-#
-# Solaris 7 and 8 have arg3 as 'void *' (disguised as 'Psocklen_t'
-# which is *not* 'socklen_t *').  If we detect that, then we assume
-# 'int' as the result, because that ought to work best.
-#
-# On Win32, accept() returns 'unsigned int PASCAL'
-# Win64 uses SOCKET for return and arg1
-
-AC_DEFUN([AC_FUNC_ACCEPT_ARGTYPES],
-[AC_MSG_CHECKING([types of arguments for accept()])
- AC_CACHE_VAL(ac_cv_func_accept_return,dnl
- [AC_CACHE_VAL(ac_cv_func_accept_arg1,dnl
-  [AC_CACHE_VAL(ac_cv_func_accept_arg2,dnl
-   [AC_CACHE_VAL(ac_cv_func_accept_arg3,dnl
-[for ac_cv_func_accept_return in 'int' 'SOCKET WSAAPI' 'unsigned int 
PASCAL'; do
-  for ac_cv_func_accept_arg1 in 'int' 'SOCKET' 'unsigned int'; do
-   for ac_cv_func_accept_arg2 in 'struct sockaddr *' 'const struct 
sockaddr *' 'void *'; do
-for ac_cv_func_accept_arg3 in 'int' 'size_t' 'socklen_t' 'unsigned 
int' 'void'; do
- AC_COMPILE_IFELSE([AC_LANG_SOURCE(
-[#include 
-#include 
-extern $ac_cv_func_accept_return accept ($ac_cv_func_accept_arg1, 
$ac_cv_func_accept_arg2, $ac_cv_func_accept_arg3 *);])],
- [ac_not_found=no; break 4], [ac_not_found=yes])
-   done
-  done
- done
-done
-if test "$ac_not_found" = yes; then
-  AC_MSG_ERROR([could not determine argument types])
-fi
-if test "$ac_cv_func_accept_arg3" = "void"; then
-  ac_cv_func_accept_arg3=int
-fi
-])dnl AC_CACHE_VAL
-   ])dnl AC_CACHE_VAL
-  ])dnl AC_CACHE_VAL
- ])dnl AC_CACHE_VAL
- AC_MSG_RESULT([$ac_cv_func_accept_return, $ac_cv_func_accept_arg1, 
$ac_cv_func_accept_arg2, 

Re: Allow escape in application_name

2021-11-04 Thread Fujii Masao




On 2021/11/04 20:42, kuroda.hay...@fujitsu.com wrote:

Dear Fujii-san,

Thank you for giving comments! I attached new patches.


Thanks for updating the patch!

+ 
+ Note that if embedded strings have Non-ASCII,
+ these characters will be replaced with the question marks 
(?).
+ This limitation is caused by application_name.
+ 

Isn't this description confusing because postgres_fdw actually doesn't do this?
postgres_fdw just passes the application_name as it is to the remote server.



On second thought, do we really need padding support for application_name
in postgres_fdw? I just thought that when I read the description
"Padding can be useful to aid human readability in log files." in the docs
about log_line_prefix.


My feeling was that we don't have any reason not to support it,
but I cannot find a good use case.
For now I have decided to keep supporting it,
but if we face another problem I will cut it.


I'd like to hear more opinions about this from others.
But *if* there is actually no use case, I'd like not to add
the feature, to make the code simpler.



+   case 'a':
+   if (MyProcPort)
+   {

I commented that the check of MyProcPort is necessary because a background
worker not having MyProcPort may access the remote server. An example
of such a process is the resolver process that Sawada-san was proposing for
the atomic commit feature. But the proposal was withdrawn, and for now
there seems to be no such process. If this is true, can we safely remove the check
of MyProcPort? If so, something like Assert(MyProcPort != NULL) may need
to be added instead.


My understanding was that we don't have to assume that Sawada-san's patch 
will be committed.
I think that FDW is only available from backend processes in the current 
implementation,
and MyProcPort will be set when the process is initialized - in 
BackendInitialize().
Since the backend will execute BackendInitialize() after forking from the 
postmaster,
we can assume that everyone who operates FDW has a valid value for MyProcPort.
I removed the if statement and added an assertion.


+   case 'u':
+   Assert(MyProcPort != NULL);

Isn't it enough to perform this assertion check only once
at the top of parse_pgfdw_appname()?



We can force parse_pgfdw_appname() not to be called if MyProcPort does not 
exist,
but I don't think it is needed now.


Yes.



If user name or database name is not set, the name is replaced with
"[unknown]". log_line_prefix needs this because log message may be
output when they have not been set yet, e.g., at early stage of backend
startup. But I wonder if application_name in postgres_fdw actually
need that.. Thought?


Hmm, I think all backend processes have a username and database, but
this follows Horiguchi-san's suggestion:

```
I'm not sure but even if user_name doesn't seem to be NULL, don't we
want to do the same thing with %u of log_line_prefix for safety?
Namely, setting [unknown] if user_name is NULL or "". The same can be
said for %d.
```

But actually I don't have strong opinions.


Ok, we can wait for more opinions about this to come.
But if user name and database name should NOT be NULL
in postgres_fdw, I think that it's better to add the assertion
check for those conditions and get rid of the "[unknown]" part.
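
In other words, something along these lines (a sketch only; the helper name is
made up, and it relies on the MyProcPort fields the patch already uses):

    static void
    append_user_name(StringInfo buf)
    {
        /* postgres_fdw is only used from regular backends, so this must hold */
        Assert(MyProcPort != NULL);
        Assert(MyProcPort->user_name != NULL && MyProcPort->user_name[0] != '\0');

        appendStringInfoString(buf, MyProcPort->user_name);
    }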

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: should we enable log_checkpoints out of the box?

2021-11-04 Thread Bharath Rupireddy
On Thu, Nov 4, 2021 at 10:50 PM Fujii Masao  wrote:
>
> On 2021/11/05 0:04, Robert Haas wrote:
> > On Thu, Nov 4, 2021 at 10:59 AM Bharath Rupireddy
> >  wrote:
> >> With log_checkpoints=on, the "./initdb -D data" generates a few
> >> checkpoint logs [1]. I hope this is okay as it's a one-time thing per
> >> database cluster. Thoughts?
> >
> > I think you should arrange to suppress that output.
>
> +1

Thanks. I did that.

> > I didn't spot any other problems on a quick read-through.
>
> -bool   log_checkpoints = false;
> +bool   log_checkpoints = true;
>
> It's better to initialize the global variable Log_autovacuum_min_duration
> with 60 like the above change?

I missed it. Added now.

Please review the attached v2 patch.

Regards,
Bharath Rupireddy.


v2-0001-set-log_checkpoints-on-log_autovacuum_min_duratio.patch
Description: Binary data


Re: should we enable log_checkpoints out of the box?

2021-11-04 Thread Fujii Masao




On 2021/11/05 0:04, Robert Haas wrote:

On Thu, Nov 4, 2021 at 10:59 AM Bharath Rupireddy
 wrote:

With log_checkpoints=on, the "./initdb -D data" generates a few
checkpoint logs [1]. I hope this is okay as it's a one-time thing per
database cluster. Thoughts?


I think you should arrange to suppress that output.


+1


I didn't spot any other problems on a quick read-through.


-bool   log_checkpoints = false;
+bool   log_checkpoints = true;

It's better to initialize the global variable Log_autovacuum_min_duration
with 60 like the above change?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Teach pg_receivewal to use lz4 compression

2021-11-04 Thread gkokolatos


‐‐‐ Original Message ‐‐‐

On Thursday, November 4th, 2021 at 9:21 AM, Michael Paquier 
 wrote:
> > +#ifdef HAVE_LIBLZ4
> > +while (readp < readend)
> > +{
> > +size_t		read_size = 1;
> > +size_t		out_size = 1;
> > +
> > +status = LZ4F_decompress(ctx, outbuf, &out_size,
> > + readbuf, &read_size, NULL);
>
> And...  It happens that the error from v9 is here, where we need to
> read the amount of remaining data from "readp", and not "readbuf" :)
>
> It is already late here, I'll continue on this stuff tomorrow, but
> this looks rather committable overall.

Thank you for v11 of the patch. Please find attached v12 which addresses a few
minor points.

Added an Oxford comma since the list now contains three or more terms:
---with-lz4) and none.
+--with-lz4), and none.

Removed an extra conditional check while switching over compression_method.
Instead of:
+   case COMPRESSION_LZ4:
+#ifdef HAVE_LIBLZ4
+   if (compresslevel != 0)
+   {
+   pg_log_error("cannot use --compress with
--compression-method=%s",
+"lz4");
+   fprintf(stderr, _("Try \"%s --help\" for more 
information.\n"),
+   progname);
+   exit(1);
+   }
+#else
+   if (compression_method == COMPRESSION_LZ4)
+   {
+   pg_log_error("this build does not support compression 
with %s",
+"LZ4");
+   exit(1);
+   }
+   break;
+#endif

I opted for:
+   case COMPRESSION_LZ4:
+#ifdef HAVE_LIBLZ4
+   if (compresslevel != 0)
+   {
+   pg_log_error("cannot use --compress with
--compression-method=%s",
+"lz4");
+   fprintf(stderr, _("Try \"%s --help\" for more 
information.\n"),
+   progname);
+   exit(1);
+   }
+#else
+   pg_log_error("this build does not support compression with 
%s",
+"LZ4");
+   exit(1);
+ #endif

There was an error while trying to find the streaming start. The code read:
+ else if (!ispartial && compression_method == COMPRESSION_LZ4)

where it should be instead:
+ else if (!ispartial && wal_compression_method == COMPRESSION_LZ4)

because compression_method is the global option exposed to the user, whereas
wal_compression_method is the local variable used to figure out what kind of
file the function is currently working with. This error has existed since at least
v9-0002 of $subject.

The variables readbuf and outbuf, used in the decompression of LZ4 files, are
now heap allocated.

Last, while the following is correct:
+   /*
+* Once we have read enough data to cover one segment, we are
+* done, there is no need to do more.
+*/
+   while (uncompressed_size <= WalSegSz)

I felt that converting it to a do {} while () loop instead will help with
readability:
+   do
+   {

+   /*
+* No need to continue reading the file when the uncompressed_size
+* exceeds WalSegSz, even if there are still data left to read.
+* However, if uncompressed_size is equal to WalSegSz, it should
+* verify that there is no more data to read.
+*/
+   } while (r > 0 && uncompressed_size <= WalSegSz);

Of course, the check:
+   /* Done reading the file */
+   if (r == 0)
+   break;
midway through the loop is no longer needed and has thus been removed.

I would like to have a bit more test coverage for 
FindStreamingStart(),
specifically for the case where an lz4-compressed segment larger than WalSegSz 
exists.
The attached patch does not contain such a test case. I am not very certain 
how to
create such a test case reliably, as it would be mostly based on a warning 
emitted
during the parsing of such a file.

Cheers,
//Georgios

> --
> MichaelFrom 48720e7c6ba771c45d43dc9f5e6833f8bb6715e6 Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Thu, 4 Nov 2021 16:05:21 +
Subject: [PATCH v12] Teach pg_receivewal to use LZ4 compression

The program pg_receivewal can use gzip compression to store the received
WAL.  This commit teaches it to also be able to use LZ4 compression. It
is required that the binary is built using the -llz4 flag. It is enabled
via the --with-lz4 flag at configuration time.

The option `--compression-method` has been expanded to include the value
[LZ4].  The option `--compress` cannot be used with LZ4 compression.

Under the hood there is nothing exceptional to be noted. Tar 

Re: [sqlsmith] Failed assertion in brin_minmax_multi_distance_float4 on REL_14_STABLE

2021-11-04 Thread Justin Pryzby
On Thu, Nov 04, 2021 at 09:46:49AM +0100, Andreas Seltenreich wrote:
> sqlsmith triggers the following assertion when testing REL_14_STABLE:
> 
> TRAP: FailedAssertion("a1 <= a2", File: "brin_minmax_multi.c", Line: 
> 1879, PID: 631814)
> 
> I can reproduce it with the following query on a fresh regression
> database:
> 
> insert into public.brintest_multi (float4col) values (real 'nan');
> 
> The branch was at f6162c020c while testing, backtrace below.

I couldn't reproduce this, but it reminds me of this one, which we also had
trouble reproducing.

https://www.postgresql.org/message-id/flat/20210913004447.GA17931%40ahch-to

Could you send a "bt full" ?

> (gdb) bt

-- 
Justin




Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-11-04 Thread Robert Haas
On Thu, Nov 4, 2021 at 12:03 PM Jeff Davis  wrote:
> The approach of using a function's ACL to represent the ACL of a
> higher-level command (as in this patch) does feel right to me. It feels
> like something we might extend to similar situations in the future; and
> even if we don't, it seems like a clean solution in isolation.

It feels wrong to me. I realize that it's convenient to be able to
re-use the existing GRANT and REVOKE commands that we have for
functions, but actually DDL interfaces are better than SQL functions,
because the syntax can be richer and you can avoid things like needing
to take a snapshot. This particular patch dodges that problem, which
is both a good thing and also clever, but it doesn't really make me
feel any better about the concept in general.

I think that the ongoing pressure to reduce as many things as possible
to function permissions checks is ultimately going to turn out to be
an unpleasant dead end. But by the time we reach that dead end we'll
have put so much effort into making it work that it will be hard to
change course, for backward-compatibility reasons among others.

I don't have anything specific to propose, which I realize is kind of
unhelpful ... but I don't like this, either.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: WIP: expression evaluation improvements

2021-11-04 Thread Robert Haas
Andres asked me off-list for comments on 0026, so here goes.

As a general comment, I think the patches could really benefit from
more meaningful commit messages and more comments on individual
functions. It would definitely help me review, and it might help other
people review, or modify the code later. For example, I'm looking at
ExprEvalStep. If the intent here is that we don't want the union
members to point to data that might differ from one execution of the
plan to the next, it's surely important to mention that and explain to
people who are trying to add steps later what they should do instead.
But I'm also not entirely sure that's the intended rule. It kind of
surprises me that the only things that we'd be pointing to here that
would fall into that category would be a bool, a NullableDatum, a
NullableDatum array, and a FunctionCallInfo ... but I've been
surprised by a lot of things that turned out to be true.

I am not a huge fan of the various Rel[Whatever] typedefs. I am not
sure that's really adding any clarity. On the other hand I would be a
big fan of renaming the structure members in some systematic way. This
kind of thing doesn't sit well with me:

- NullableDatum *value; /* value to return */
+ RelNullableDatum value; /* value to return */

Well, if NullableDatum was the value to return, then RelNullableDatum
isn't. It's some kind of thing that lets you find the value to return.
Actually that's not really right either, because before 'value' was a
pointer to the value to return and the corresponding isnull flag, and
now it is a way of finding that stuff. I don't know exactly what to do
here to keep the comment comprehensible and not unreasonably long, but
I don't think not changing at it all is the thing. Nor do I think just
having it be called 'value' when it's clearly not the value, nor even
a pointer to the value, is as clear as I would like to be.

I wonder if ExprBuilderAllocBool ought to be using sizeof(bool) rather
than sizeof(NullableDatum).

Is it true that allocno is only used for, err, some kind of LLVM
thing, and not in the regular interpreted path? As far as I can see,
outside of the LLVM code, we only ever test whether it's 0, and don't
actually care about the specific value.

I hope that the fact that this patch reverses the order of the first
two arguments to ExecInitExprRec is only something you did to make it
so that the compiler would find places you needed to update. Because
otherwise it makes no sense to introduce a new thing called an
ExprStateBuilder in 0017, make it an argument to that function, and
then turn around and change the signature again in 0026. Anyway, a
final patch shouldn't include this kind of churn.

+ offsetof(ExprState, steps) * esb->steps_len * sizeof(ExprEvalStep) +
+ state->mutable_off = offsetof(ExprState, steps) * esb->steps_len *
sizeof(ExprEvalStep);

Well, either I'm confused here, or the first * should be a + in each
case. I wonder how this works at all.

+ /* copy in step data */
+ {
+ ListCell *lc;
+ int off = 0;
+
+ foreach(lc, esb->steps)
+ {
+ memcpy(&state->steps[off], lfirst(lc), sizeof(ExprEvalStep));
+ off++;
+ }
+ }

This seems incredibly pointless to me. Why use a List in the first
place if we're going to have to flatten it using this kind of code?

I think stuff like RelFCIOff() and RelFCIIdx() and RelArrayIdx() is
just pretty much incomprehensible. Now, the executor is full of
badly-named stuff already -- ExecInitExprRec being a fine example of a
name nobody is going to understand on first reading, or maybe ever --
but we ought to try not to make things worse. I also do understand
that anything with relative pointers is bound to involve a bunch of
crappy notation that we're just going to have to accept as the price
of doing business. But it would help to pick names that are not so
heavily abbreviated. Like, if RelFCIIdx() were called
find_function_argument_in_relative_fcinfo() or even
get_fnarg_from_relfcinfo() the casual reader might have a chance of
guessing what it does. Sure, the code might be longer, but if you can
tell what it does without cross-referencing, it's still better.

I would welcome changes that make it clearer which things happen just
once and which things happen at execution time; that said, it seems
like RELPTR_RESOLVE() happens at execution time, and it surprises me a
bit that this is OK from a performance perspective. The pointers can
change from execution to execution, but not within an individual
execution, or so I think. So it doesn't need to be resolved every
time, if somehow that can be avoided. But maybe CPUs are sufficiently
well-optimized for computing a pointer address as a+b*c that it does
not matter.

I'm not sure how helpful any of these comments are, but those are my
initial thoughts.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg14 psql broke \d datname.nspname.relname

2021-11-04 Thread Mark Dilger



> On Nov 4, 2021, at 6:37 AM, Hamlin, Garick L  wrote:
> 
>> If we pass the dots through to the POSIX regular expression, we can
>> only do that either for the table name or the schema name, not both -
>> either the first or last dot must mark the boundary between the two.
>> That means that you can't use all the same regexy things for one as
>> you can for the other, which is a strange system. I knew that your
>> patch made it do that, and I committed it that way because I didn't
>> think it really mattered, and also because the whole system is already
>> pretty strange, so what's one more bit of strangeness?
> 
> Rather than trying to guess at the meaning of each '.' based on the total
> string, I wonder if we could, for v15, require '.' to be spelled in a longer way
> if it needs to be treated as part of the regex.

We're trying to fix an edge case, not change how the basic case works.  Most 
users are accustomed to using patterns from within psql like:

  \dt myschema.mytable

Whatever patch we accept must not break these totally normal and currently 
working cases.

> Perhaps requiring something like '(.)' be used rather than a bare '.' 
> might be good enough, documenting that otherwise it's really a separator?  
> I suppose we could also invent a non-standard class as a stand-in like
> '[::any::]', but that seems kinda weird.

If I understand you, that would require the above example to be written as:

  \dt myschema(.)mytable

which nobody expects to have to do, and which would be a very significant 
breaking change in v15.  I can't see anything like that being accepted.

> I think it might be possible to give better error messages long term
> if we knew what '.' should mean without looking at the whole thing.

You quote a portion of an email from Robert.  After that email, there were 
several more, and a new patch.  The commit message of the new patch explains 
what it does.  I wonder if you'd review that message, quoted here, or even 
better, review the entire patch.  Does this seem like an ok fix to you?

Subject: [PATCH v2] Reject patterns with too many parts or wrong db

Object name patterns used by pg_dump and psql potentially contain
multiple parts (dotted names), and nothing prevents users from
specifying a name with too many parts, nor specifying a
database-qualified name for a database other than the currently
connected database.  Prior to PostgreSQL version 14, pg_dump,
pg_dumpall and psql quietly discarded extra parts of the name on the
left.  For example, `pg_dump -t` only expected a possibly schema
qualified table name, not a database name, and the following command

  pg_dump -t production.marketing.customers

quietly ignored the "production" database name with neither warning
nor error.  Commit 2c8726c4b0a496608919d1f78a5abc8c9b6e0868 changed
the behavior of name parsing.  Where names contain more than the
maximum expected number of dots, the extra dots on the right were
interpreted as part of the name, such that the above example was
interpreted as schema=production, relation=marketing.customers.
This turns out to be highly unintuitive to users.

We've had reports that users sometimes copy-and-paste database- and
schema-qualified relation names from the logs.
https://www.postgresql.org/message-id/20211013165426.GD27491%40telsasoft.com

There is no support for cross database references, but allowing a
database qualified pattern when the database portion matches the
current database, as in the above report, seems more friendly than
rejecting it, so do that.  We don't allow the database portion
itself to be a pattern, because if it matched more than one database
(including the current one), there would be confusion about which
database(s) were processed.

Consistent with how we allow db.schemapat.relpat in pg_dump and psql,
also allow db.schemapat for specifying schemas, as:

\dn mydb.myschema

in psql and

pg_dump --schema=mydb.myschema

Fix the pre-v14 behavior of ignoring leading portions of patterns
containing too many dotted names, and the v14.0 misfeature of
combining trailing portions of such patterns, and instead reject
such patterns in all cases by raising an error.



—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-11-04 Thread Jeff Davis
On Tue, 2021-11-02 at 14:26 -0400, Stephen Frost wrote:
> Think you meant 'Stephen' there. ;)

Yes ;-)

> > The approach in the patch looks alright to me, but another one
> > could
> > be to build a SelectStmt when parsing CHECKPOINT.  I think that'd
> > simplify the standard_ProcessUtility() changes.

Nathan, if I understand correctly, that would mean no CheckPointStmt at
all. So it would either lack the right command tag, or we would have to
hack it in somewhere. The utility changes in the existing patch are
fairly minor, so I'll stick with that approach unless I'm missing
something.

> For my 2c, at least, I'm not really partial to either approach,
> though
> I'd want to see what error messages end up looking like.  Seems like
> we
> might want to exercise a bit more control than we'd be able to if we
> transformed it directly into a SelectStmt (that is, we might add a
> HINT:
> roles with execute rights on pg_checkpoint() can run this command, or
> something; maybe not too tho).

I changed the error message to:

  ERROR:  permission denied for command CHECKPOINT
  HINT:  The CHECKPOINT command requires the EXECUTE privilege
  on the function pg_checkpoint().

New version attached.

Andres suggested that I also consider a new form of the GRANT clause
that works on the CHECKPOINT command directly. I looked into that
briefly, but in every other case it seems that GRANT works on an object
(like a function). It doesn't feel like granting on a command is the
right solution.

The approach of using a function's ACL to represent the ACL of a
higher-level command (as in this patch) does feel right to me. It feels
like something we might extend to similar situations in the future; and
even if we don't, it seems like a clean solution in isolation.
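For example, with this patch the intended usage is along these lines
("monitor" is just a made-up role name here):

    CREATE ROLE monitor LOGIN;
    GRANT EXECUTE ON FUNCTION pg_checkpoint() TO monitor;

    -- connected as monitor, this now succeeds instead of failing with
    -- "permission denied for command CHECKPOINT":
    CHECKPOINT;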

Regards,
Jeff Davis

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4b49dff2ffc..3558044221a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25487,6 +25487,24 @@ LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
  
 
  
+  
+   
+
+ pg_checkpoint
+
+pg_checkpoint ()
+void
+   
+   
+Request an immediate checkpoint. This function implements the  command, so the behavior is identical. This
+function is restricted to superusers by default, but other users can
+be granted EXECUTE to run the function. If a user has permission to
+execute this function, they also have permission to issue a
+CHECKPOINT command.
+   
+  
+
   

 
diff --git a/doc/src/sgml/ref/checkpoint.sgml b/doc/src/sgml/ref/checkpoint.sgml
index 2afee6d7b59..7c6bea05fa0 100644
--- a/doc/src/sgml/ref/checkpoint.sgml
+++ b/doc/src/sgml/ref/checkpoint.sgml
@@ -52,7 +52,11 @@ CHECKPOINT
   
 
   
-   Only superusers can call CHECKPOINT.
+   The CHECKPOINT is equivalent to calling the function
+   pg_checkpoint(). By
+   default, only superusers can this function, but if other users are granted
+   privileges on it, they may also call the CHECKPOINT
+   command.
   
  
 
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index b98deb72ec6..c9e1df39c11 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -26,6 +26,7 @@
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "postmaster/bgwriter.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
@@ -44,6 +45,18 @@
 static StringInfo label_file;
 static StringInfo tblspc_map_file;
 
+/*
+ * Implements the CHECKPOINT command. To allow non-superusers to perform the
+ * CHECKPOINT command, grant privileges on this function.
+ */
+Datum
+pg_checkpoint(PG_FUNCTION_ARGS)
+{
+	RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_WAIT |
+	  (RecoveryInProgress() ? 0 : CHECKPOINT_FORCE));
+	PG_RETURN_VOID();
+}
+
 /*
  * pg_start_backup: set up for taking an on-line backup dump
  *
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 54c93b16c4c..4437eb3010b 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -603,6 +603,8 @@ AS 'unicode_is_normalized';
 -- available to superuser / cluster owner, if they choose.
 --
 
+REVOKE EXECUTE ON FUNCTION pg_checkpoint() FROM public;
+
 REVOKE EXECUTE ON FUNCTION pg_start_backup(text, boolean, boolean) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stop_backup() FROM public;
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index bf085aa93b2..dcd822075f5 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -24,6 +24,7 @@
 #include "catalog/catalog.h"
 #include "catalog/index.h"
 #include "catalog/namespace.h"
+#include "catalog/objectaccess.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/alter.h"

Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?

2021-11-04 Thread Peter Geoghegan
On Thu, Nov 4, 2021 at 8:52 AM Andrey Borodin  wrote:
> Let's enumerate steps how things can go wrong.
>
> Backend1: Index-Only scan returns tid and xs_hitup with index_tuple1 on 
> index_page1 pointing to heap_tuple1 on page1
>
> Backend2: Remove index_tuple1 and heap_tuple1
>
> Backend3: Mark page1 all-visible
> Backend1: Thinks that page1 is all-visible and shows index_tuple1 as visible
>
> To avoid this Backend1 must hold pin on index_page1 until it's done with 
> checking visibility, and Backend2 must do LockBufferForCleanup(index_page1). 
> Do I get things right?

Almost. Backend3 is actually Backend2 here (there is no 3) -- it runs
VACUUM throughout.

Note that it's not particularly likely that Backend2/VACUUM will "win"
this race, because it typically has to do much more work than
Backend1. It has to actually remove the index tuples from the leaf
page, then any other index work (for this and other indexes). Then it
has to arrive back in vacuumlazy.c to set the VM bit in
lazy_vacuum_heap_page(). That's a pretty unlikely scenario. And even
if it happened it would only happen once (until the next time we get
unlucky). What are the chances of somebody noticing a more or less
once-off, slightly wrong answer?

-- 
Peter Geoghegan




Re: Skipping logical replication transactions on subscriber side

2021-11-04 Thread vignesh C
On Fri, Oct 29, 2021 at 10:55 AM Masahiko Sawada  wrote:
>
> On Thu, Oct 28, 2021 at 7:40 PM Amit Kapila  wrote:
> >
> > On Thu, Oct 28, 2021 at 10:36 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Wed, Oct 27, 2021 at 7:02 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Thu, Oct 21, 2021 at 10:29 AM Masahiko Sawada 
> > > >  wrote:
> > > > >
> > > > >
> > > > > I've attached updated patches.
> > >
> > > Thank you for the comments!
> > >
> > > >
> > > > Few comments:
> > > > ==
> > > > 1. Is the patch cleaning tablesync error entries except via vacuum? If
> > > > not, can't we send a message to remove tablesync errors once tablesync
> > > > is successful (say when we reset skip_xid or when tablesync is
> > > > finished) or when we drop subscription? I think the same applies to
> > > > apply worker. I think we may want to track it in some way whether an
> > > > error has occurred before sending the message but relying completely
> > > > on a vacuum might be the recipe of bloat. I think in the case of a
> > > > drop subscription we can simply send the message as that is not a
> > > > frequent operation. I might be missing something here because in the
> > > > tests after drop subscription you are expecting the entries from the
> > > > view to get cleared
> > >
> > > Yes, I think we can have tablesync worker send a message to drop stats
> > > once tablesync is successful. But if we do that also when dropping a
> > > subscription, I think we need to do that only the transaction is
> > > committed since we can drop a subscription that doesn't have a
> > > replication slot and rollback the transaction. Probably we can send
> > > the message only when the subscritpion does have a replication slot.
> > >
> >
> > Right. And probably for apply worker after updating skip xid.
>
> I'm not sure it's better to drop apply worker stats after resetting
> skip xid (i.g., after skipping the transaction). Since the view is a
> cumulative view and has last_error_time, I thought we can have the
> apply worker stats until the subscription gets dropped. Since the
> error reporting message could get lost, no entry in the view doesn’t
> mean the worker doesn’t face an issue.
>
> >
> > > In other cases, we can remember the subscriptions being dropped and
> > > send the message to drop the statistics of them after committing the
> > > transaction but I’m not sure it’s worth having it.
> > >
> >
> > Yeah, let's not go to that extent. I think in most cases subscriptions
> > will have corresponding slots.
>
> Agreed.
>
> >
> >  FWIW, we completely
> > > rely on pg_stat_vacuum_stats() for cleaning up the dead tables and
> > > functions. And we don't expect there are many subscriptions on the
> > > database.
> > >
> >
> > True, but we do send it for the database, so let's do it for the cases
> > you explained in the first paragraph.
>
> Agreed.
>
> I've attached a new version patch. Since the syntax of skipping
> transaction id is under the discussion I've attached only the error
> reporting patch for now.

Thanks for the updated patch, few comments:
1) This check and return can be moved above CreateTemplateTupleDesc so
that the tuple descriptor need not be created if there are no worker
statistics (a rough sketch of the ordering I mean follows the quoted
lines below):
+   BlessTupleDesc(tupdesc);
+
+   /* Get subscription worker stats */
+   wentry = pgstat_fetch_subworker(subid, subrelid);
+
+   /* Return NULL if there is no worker statistics */
+   if (wentry == NULL)
+   PG_RETURN_NULL();
+
+   /* Initialise values and NULL flags arrays */
+   MemSet(values, 0, sizeof(values));
+   MemSet(nulls, 0, sizeof(nulls));
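A rough sketch of the ordering I mean (the argument names and the elided
tuple-descriptor setup are just my guesses, not the actual patch code):

    /* Get subscription worker stats first */
    wentry = pgstat_fetch_subworker(subid, subrelid);

    /* Return NULL if there is no worker statistics */
    if (wentry == NULL)
        PG_RETURN_NULL();

    /* Only now pay for building the tuple descriptor */
    tupdesc = CreateTemplateTupleDesc(...);
    ...
    BlessTupleDesc(tupdesc);

    /* Initialise values and NULL flags arrays */
    MemSet(values, 0, sizeof(values));
    MemSet(nulls, 0, sizeof(nulls));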

2) "NULL for the main apply worker" is mentioned as "null for the main
apply worker" in case of pg_stat_subscription view, we can mention it
similarly.
+  
+   OID of the relation that the worker is synchronizing; NULL for the
+   main apply worker
+  

3) Variable assignment can be done during declaration, and then this
assignment can be removed
+   i = 0;
+   /* subid */
+   values[i++] = ObjectIdGetDatum(subid);

4) I noticed that the worker error is still present when queried from
pg_stat_subscription_workers even after conflict is resolved in the
subscriber and the worker proceeds with applying the other
transactions, should this be documented somewhere?

5) This needs to be aligned, the columns in select have used TAB, we
should align it using spaces.
+CREATE VIEW pg_stat_subscription_workers AS
+SELECT
+   w.subid,
+   s.subname,
+   w.subrelid,
+   w.relid,
+   w.command,
+   w.xid,
+   w.error_count,
+   w.error_message,
+   w.last_error_time,
+   w.stats_reset

Regards,
Vignesh




Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?

2021-11-04 Thread Andrey Borodin
04.11.2021, 04:33, "Peter Geoghegan" :
> But what about index-only scans, which GiST also supports? I think
> that the rules are different there, even though index-only scans use
> an MVCC snapshot.

Let's enumerate steps how things can go wrong.

Backend1: Index-Only scan returns tid and xs_hitup with index_tuple1 on index_page1 pointing to heap_tuple1 on page1
Backend2: Remove index_tuple1 and heap_tuple1
Backend3: Mark page1 all-visible
Backend1: Thinks that page1 is all-visible and shows index_tuple1 as visible

To avoid this Backend1 must hold pin on index_page1 until it's done with checking visibility, and Backend2 must do LockBufferForCleanup(index_page1). Do I get things right?

Best regards, Andrey Borodin.




Possible Documentation Update for ALTER STATISTICS

2021-11-04 Thread Ahmet Gedemenli
Hey,

I've noticed that the current documentation doesn't mention IF EXISTS clause 
for ALTER STATISTICS in the synopsis section, where PG supports it.
https://www.postgresql.org/docs/14/sql-alterstatistics.html
(Only for the last item, that is ALTER STATISTICS .. SET STATISTICS; for the 
others, PG just throws a syntax error.)

I'm from the Citus team and noticed this while bug fixing, and I wonder if it 
is intentional or not. If it's intentionally supported while the other ALTER 
STATISTICS statement types are not supported, it would be good to mention that 
in the documentation.

Best,
Ahmet


Re: should we enable log_checkpoints out of the box?

2021-11-04 Thread Robert Haas
On Thu, Nov 4, 2021 at 10:59 AM Bharath Rupireddy
 wrote:
> With log_checkpoints=on, the "./initdb -D data" generates few
> checkpoints logs [1]. I hope this is okay as it's a one-time thing per
> database cluster. Thoughts?

I think you should arrange to suppress that output. There's no reason
why initdb can't pass -c log_checkpoints=off. See backend_options in
initdb.c.
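Something along these lines in initdb.c, I believe (I'm quoting the
existing contents of backend_options from memory, so the other flags may
not match exactly):

    static const char *backend_options = "--single -F -O -j -c search_path=pg_catalog -c exit_on_error=true -c log_checkpoints=off";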

I didn't spot any other problems on a quick read-through.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Automatic notification of top transaction IDs

2021-11-04 Thread Robert Haas
On Wed, Jun 30, 2021 at 8:56 PM Gurjeet Singh  wrote:
> As mentioned in that thread, when sending a cancellation signal, the
> client cannot be sure if the cancel signal was honored, and if the
> transaction was cancelled successfully. In the attached patch, the
> backend emits a NotificationResponse containing the current full
> transaction id. It does so only if the relevant GUC is enabled, and
> when the top-transaction is being assigned the ID.

There's nothing to keep a client that wants this information from just
using SELECT txid_current() to get it, so this doesn't really seem
worth it to me. It's true that it could be convenient for someone not
to need to issue an SQL query to get the information and instead just
get it automatically, but I don't think that minor convenience is
enough to justify a new feature of this type.
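For reference, the existing one-liner for that is simply:

    BEGIN;
    -- assigns a top-level xid if one is not assigned yet, and returns it
    SELECT txid_current();          -- or pg_current_xact_id() on v13 and later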

Also, your 8-line documentation changes contain two spelling
mistakes, and you've used // comments which are not project style in
two places. It's a good idea to check over your patches for such
simple mistakes before submitting them.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: should we enable log_checkpoints out of the box?

2021-11-04 Thread Bharath Rupireddy
On Thu, Nov 4, 2021 at 1:11 PM Michael Paquier  wrote:
>
> On Tue, Nov 02, 2021 at 11:55:23AM -0400, Robert Haas wrote:
> > I think shipping with log_checkpoints=on and
> > log_autovacuum_min_duration=10m or so would be one of the best things
> > we could possibly do to allow ex-post-facto troubleshooting of
> > system-wide performance issues. The idea that users care more about
> > the inconvenience of a handful of extra log messages than they do
> > about being able to find problems when they have them matches no part
> > of my experience. I suspect that many users would be willing to incur
> > *way more* useless log messages than those settings would ever
> > generate if it meant that they could actually find problems when and
> > if they have them. And these messages would in fact be the most
> > valuable thing in the log for a lot of users. What reasonable DBA
> > cares more about the fact that the application attempted an insert
> > that violated a foreign key constraint than they do about a checkpoint
> > that took 20 minutes to fsync everything? The former is expected; if
> > you thought that foreign key violations would never occur, you
> > wouldn't need to incur the expense of having the system enforce them.
> > The latter is unexpected and basically undiscoverable with default
> > settings.
>
> +1.

Thanks all for your inputs. Here's the v1 patch doing the above. Please review it.

With log_checkpoints=on, "./initdb -D data" generates a few checkpoint
logs [1]. I hope this is okay as it's a one-time thing per database
cluster. Thoughts?

[1]
creating configuration files ... ok
running bootstrap script ... 2021-11-04 14:50:34.163 UTC [865574] LOG:
 checkpoint starting: shutdown immediate
2021-11-04 14:50:34.166 UTC [865574] LOG:  checkpoint complete: wrote
222 buffers (1.4%); 0 WAL file(s) added, 0 removed, 0 recycled;
write=0.002 s, sync=0.001 s, total=0.003 s; sync files=0,
longest=0.000 s, average=0.000 s; distance=698 kB, estimate=698 kB
ok
performing post-bootstrap initialization ... 2021-11-04 14:50:35.069
UTC [865576] LOG:  checkpoint starting: immediate force wait flush-all
2021-11-04 14:50:35.069 UTC [865576] STATEMENT:  CREATE DATABASE
template0 IS_TEMPLATE = true ALLOW_CONNECTIONS = false;

2021-11-04 14:50:35.076 UTC [865576] LOG:  checkpoint complete: wrote
731 buffers (4.5%); 0 WAL file(s) added, 0 removed, 0 recycled;
write=0.008 s, sync=0.001 s, total=0.008 s; sync files=0,
longest=0.000 s, average=0.000 s; distance=3889 kB, estimate=3889 kB
2021-11-04 14:50:35.076 UTC [865576] STATEMENT:  CREATE DATABASE
template0 IS_TEMPLATE = true ALLOW_CONNECTIONS = false;

2021-11-04 14:50:35.094 UTC [865576] LOG:  checkpoint starting:
immediate force wait
2021-11-04 14:50:35.094 UTC [865576] STATEMENT:  CREATE DATABASE
template0 IS_TEMPLATE = true ALLOW_CONNECTIONS = false;

2021-11-04 14:50:35.095 UTC [865576] LOG:  checkpoint complete: wrote
0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled;
write=0.001 s, sync=0.001 s, total=0.002 s; sync files=0,
longest=0.000 s, average=0.000 s; distance=0 kB, estimate=3500 kB
2021-11-04 14:50:35.095 UTC [865576] STATEMENT:  CREATE DATABASE
template0 IS_TEMPLATE = true ALLOW_CONNECTIONS = false;

2021-11-04 14:50:35.101 UTC [865576] LOG:  checkpoint starting:
immediate force wait flush-all
2021-11-04 14:50:35.101 UTC [865576] STATEMENT:  CREATE DATABASE postgres;

2021-11-04 14:50:35.102 UTC [865576] LOG:  checkpoint complete: wrote
12 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled;
write=0.001 s, sync=0.001 s, total=0.001 s; sync files=0,
longest=0.000 s, average=0.000 s; distance=18 kB, estimate=3152 kB
2021-11-04 14:50:35.102 UTC [865576] STATEMENT:  CREATE DATABASE postgres;

2021-11-04 14:50:35.120 UTC [865576] LOG:  checkpoint starting:
immediate force wait
2021-11-04 14:50:35.120 UTC [865576] STATEMENT:  CREATE DATABASE postgres;

2021-11-04 14:50:35.122 UTC [865576] LOG:  checkpoint complete: wrote
0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled;
write=0.001 s, sync=0.001 s, total=0.002 s; sync files=0,
longest=0.000 s, average=0.000 s; distance=0 kB, estimate=2836 kB
2021-11-04 14:50:35.122 UTC [865576] STATEMENT:  CREATE DATABASE postgres;

2021-11-04 14:50:35.123 UTC [865576] LOG:  checkpoint starting:
shutdown immediate
2021-11-04 14:50:35.124 UTC [865576] LOG:  checkpoint complete: wrote
4 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled;
write=0.001 s, sync=0.001 s, total=0.001 s; sync files=0,
longest=0.000 s, average=0.000 s; distance=0 kB, estimate=2553 kB
ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

./pg_ctl -D data -l logfile start

Regards,
Bharath Rupireddy.


v1-0001-set-log_checkpoints-on-log_autovacuum_min_duratio.patch
Description: Binary data


Re: Extending amcheck to check toast size and compression

2021-11-04 Thread Robert Haas
On Wed, Nov 3, 2021 at 6:56 PM Mark Dilger  wrote:
> Done that way.

I agree with what others have said: this looks fine.

But, is it plausible to add test coverage for the new checks, or is
that going to be too much of a pain?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Missing include in be-secure-openssl.c?

2021-11-04 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Nov 03, 2021 at 11:45:26PM -0400, Tom Lane wrote:
>> Yeah, I noted the comment about WIN32_LEAN_AND_MEAN in the
>> stackoverflow thread too ... but as you say, it seems like
>> that should make the problem less probable not more so.
>> Still, it's hard to think of any other relevant change.

> Yeah, I don't see how this could be linked to WIN32_LEAN_AND_MEAN.

According to that stackoverflow thread, the unwanted #define of
X509_NAME comes from <wincrypt.h>, and WIN32_LEAN_AND_MEAN
prevents that from being immediately included by <windows.h>.

The rough idea I have is that prior to 8162464a2, we sucked in
that #define during postgres.h and then OpenSSL's headers were
able to undo it.  After 8162464a2, we don't read <wincrypt.h>
during postgres.h, but some *other* header that be-secure-openssl.c
is including after the OpenSSL headers is pulling it in, so that
by the time we get to the body of the file the unwanted #define
is active.
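For illustration, the collision looks roughly like this (reconstructed
from memory of the Windows headers, not verified):

    /*
     * <wincrypt.h>, dragged in by <windows.h> unless WIN32_LEAN_AND_MEAN
     * is defined, contains roughly:
     */
    #define X509_NAME ((LPCSTR) 7)

    /* ...which then breaks any later use of OpenSSL's type of that name: */
    X509_NAME  *name = X509_get_subject_name(cert);    /* fails to compile */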

I don't have either the resources or the interest to track down
exactly where that happens; my thought was just to make
be-secure-openssl.c's inclusions look more like fe-secure-openssl.c.
But, if you'd like to pursue the details, feel free.

regards, tom lane




Re: pg14 psql broke \d datname.nspname.relname

2021-11-04 Thread Hamlin, Garick L
On Wed, Oct 13, 2021 at 09:24:53AM -0400, Robert Haas wrote:
> > Splitting the pattern on all the dots and throwing away any additional
> > leftmost fields is a bug, ...
> 
> I also agree with you right up to here.
> 
> > and when you stop doing that, passing additional dots through to the POSIX
> > regular expression for processing is the most natural thing to do.  This
> > is, in fact, how v14 works.  It is a bit debatable whether treating the
> > first dot as a separator and the additional dots as stuff to be passed
> > through is the right thing, so we could call the v14 behavior a
> > mis-feature, but it's not as clearcut as the discussion upthread suggested.
> > Reverting to v13 behavior seems wrong, but I'm now uncertain how to
> > proceed.
> 
> But not this part, or at least not entirely.
> 
> If we pass the dots through to the POSIX regular expression, we can
> only do that either for the table name or the schema name, not both -
> either the first or last dot must mark the boundary between the two.
> That means that you can't use all the same regexy things for one as
> you can for the other, which is a strange system. I knew that your
> patch made it do that, and I committed it that way because I didn't
> think it really mattered, and also because the whole system is already
> pretty strange, so what's one more bit of strangeness?

Rather than trying to guess at the meaning of each '.' based on the total
string, I wonder if we could, for v15, require '.' to be spelled in a longer
way when it needs to be treated as part of the regex.

Perhaps requiring something like '(.)' to be used rather than a bare '.'
might be good enough, and documenting that otherwise it's really a separator?
I suppose we could also invent a non-standard class as a stand in like
'[::any::]', but that seems kinda weird.

I think it might be possible to give better error messages long term
if we knew what '.' should mean without looking at the whole thing.

Garick



Re: [PATCH] rename column if exists

2021-11-04 Thread David G. Johnston
On Thursday, November 4, 2021, Daniel Gustafsson  wrote:

> > On 22 Mar 2021, at 20:40, David Oksman  wrote:
> >
> > Added the ability to specify IF EXISTS when renaming a column of an
> object (table, view, etc.).
> > For example: ALTER TABLE distributors RENAME COLUMN IF EXISTS address TO
> city;
> > If the column does not exist, a notice is issued instead of throwing an
> error.
>
> What is the intended use-case for RENAME COLUMN IF EXISTS?  I'm struggling
> to
> see when that would be helpful to users but I might not be imaginative
> enough.
>
>
Same reasoning as for all the other IF EXISTS variants we have:
idempotence. Being able to run the command on an object that is already in
the desired state without provoking an error.
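For example (reusing the command from the original mail), a deployment
script could run the same statement on every upgrade:

    ALTER TABLE distributors RENAME COLUMN IF EXISTS address TO city;
    -- first run: the column is renamed
    -- later runs: a NOTICE is issued instead of an error, nothing changes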

David J.


Re: Automatic notification of top transaction IDs

2021-11-04 Thread Daniel Gustafsson
> On 22 Jul 2021, at 07:28, vignesh C  wrote:
> 
> On Thu, Jul 1, 2021 at 6:41 AM Gurjeet Singh  wrote:
>> 
>> The proposed patch is attached.
> 
> There is one compilation warning:
> xid.c:165:1: warning: no previous prototype for
> ‘FullTransactionIdToStr’ [-Wmissing-prototypes]
>  165 | FullTransactionIdToStr(FullTransactionId fxid)
>  | ^~
> 
> There are few compilation issues in documentation:
> /usr/bin/xmllint --path . --noout --valid postgres.sgml
> protocol.sgml:1327: parser error : Opening and ending tag mismatch:
> literal line 1322 and para
>   
>  ^
> protocol.sgml:1339: parser error : Opening and ending tag mismatch:
> literal line 1322 and sect2
>  
>  ^
> protocol.sgml:1581: parser error : Opening and ending tag mismatch:
> para line 1322 and sect1
> 
> ^
> protocol.sgml:7893: parser error : Opening and ending tag mismatch:
> sect2 line 1322 and chapter
> 
>  ^
> protocol.sgml:7894: parser error : chunk is not well balanced
> 
> ^
> postgres.sgml:253: parser error : Failure to process entity protocol
>  
>^
> postgres.sgml:253: parser error : Entity 'protocol' not defined
>  
>^

The above compiler warning and documentation compilation errors haven't been
addressed still, so I'm marking this patch Returned with Feedback.  Please feel
free to open a new entry for an updated patch.

--
Daniel Gustafsson   https://vmware.com/





Re: Supply restore_command to pg_rewind via CLI argument

2021-11-04 Thread Daniel Gustafsson
> On 14 Sep 2021, at 16:05, Andrey Borodin  wrote:

>> Do we actually need --target-restore-command at all? It seems to me
>> that we have all we need with --restore-target-wal, and that's not
>> really instinctive to pass down a command via another command..
> 
> Currently we know that --restore-target-wal is not enough if postgresql.conf 
> does not reside within PGDATA.

That's a useful reason which wasn't brought up in the earlier thread, and may
tip the scales in favor.

The patch no longer applies, can you submit a rebased version please?

--
Daniel Gustafsson   https://vmware.com/





Re: RFC: Logging plan of the running query

2021-11-04 Thread torikoshia

On 2021-11-02 20:32, Ekaterina Sokolova wrote:
Thanks for your response!


Hi!

I'm here to answer your questions about contrib/pg_query_state.

I only took a quick look at pg_query_state, I have some questions.



pg_query_state seems using shm_mq to expose the plan information, but
there was a discussion that this kind of architecture would be tricky
to do properly [1].
Does pg_query_state handle difficulties listed on the discussion?
[1] 
https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com


I doubt that it was the right link.


Sorry for make you confused, here is the link.

  
https://www.postgresql.org/message-id/CA%2BTgmobkpFV0UB67kzXuD36--OFHwz1bs%3DL_6PZbD4nxKqUQMw%40mail.gmail.com



But on the topic I will say that extension really use shared memory,
interaction is implemented by sending / receiving messages. This
architecture provides the required reliability and convenience.


As described in the link, using shared memory for this kind of work
would need DSM, and it'd also be necessary to exchange information
between the requestor and the responder.


For example, when I looked a little bit at the pg_query_state code, it 
looks like the size of the queue is fixed at QUEUE_SIZE, and I wonder 
how plans that exceed QUEUE_SIZE are handled.



It seems the caller of the pg_query_state() has to wait until the
target process pushes the plan information into shared memory, can it
lead to deadlock situations?
I came up with this question because when trying to make a view for
memory contexts of other backends, we encountered deadlock 
situations.

After all, we gave up view design and adopted sending signal and
logging.


Discussion at the following URL.
https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com


Before extracting information about side process we check its state.
Information will only be retrieved for a process willing to provide
it. Otherwise, we will receive an error message about impossibility of
getting query execution statistics + process status. Also checking
fact of extracting your own status exists. This is even verified in
tests.

Thanks for your attention.
Just in case, I am ready to discuss this topic in more detail.


I imagined the following procedure.
Does it cause a deadlock in pg_query_state?

- session1
BEGIN; TRUNCATE t;

- session2
BEGIN; TRUNCATE t; -- wait

- session1
SELECT * FROM pg_query_state(); -- wait and dead locked?


About overhead:
I haven't measured it yet, but I believe that the overhead for 
backends

which are not called pg_log_current_plan() would be slight since the
patch just adds the logic for saving QueryDesc on ExecutorRun().
The overhead for backends which is called pg_log_current_plan() might
not slight, but since the target process are assumed dealing with
long-running query and the user want to know its plan, its overhead
would be worth the cost.

I think it would be useful for us to have couple of examples with a
different number of rows compared to using without this functionality.


Do you have any expectation that the number of rows would affect the 
performance of this functionality?
This patch adds some code to ExecutorRun(), but I thought the number of 
rows would not have an impact on the performance.


--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: pgbench logging broken by time logic changes

2021-11-04 Thread Daniel Gustafsson
> On 11 Jul 2021, at 15:07, Fabien COELHO  wrote:

> Attached the fully "ignored" version of the time features test as a patch.

This version of the patch is failing to apply on top of HEAD, can you please
submit a rebased version?

--
Daniel Gustafsson   https://vmware.com/





Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2021-11-04 Thread Daniel Gustafsson
> On 24 Sep 2021, at 20:14, Tom Lane  wrote:
> 
> Robert Haas  writes:
>> The commit message for 0001 is not clear enough for me to understand
>> what problem it's supposed to be fixing. The code comments aren't
>> really either. They make it sound like there's some problem with
>> copying symlinks but mostly they just talk about callbacks, which
>> doesn't really help me understand what problem we'd have if we just
>> didn't commit this (or reverted it later).
> 
>> I am not really convinced by Álvaro's claim that 0004 is a "fix"; I
>> think I'd call it an improvement. But either way I agree that could
>> just be committed.
> 
>> I haven't analyzed 0002 and 0003 yet.
> 
> I took a quick look through this:
> 
> * I don't like 0001 either, though it seems like the issue is mostly
> documentation.  sub _srcsymlink should have a comment explaining
> what it's doing and why.  The documentation of copypath's new parameter
> seems like gobbledegook too --- I suppose it should read more like
> "By default, copypath fails if a source item is a symlink.  But if
> B is provided, that subroutine is called to process any
> symlink."
> 
> * I'm allergic to 0002's completely undocumented changes to
> poll_query_until, especially since I don't see anything in the
> patch that actually uses them.  Can't we just drop these diffs
> in PostgresNode.pm?  BTW, the last error message in the patch,
> talking about a 5-second timeout, seems wrong.  With or without
> these changes, poll_query_until's default timeout is 180 sec.
> The actual test case might be okay other than that nit and a
> comment typo or two.
> 
> * 0003 might actually be okay.  I've not read it line-by-line,
> but it seems like it's implementing a sane solution and it's
> adequately commented.
> 
> * I'm inclined to reject 0004 out of hand, because I don't
> agree with what it's doing.  The purpose of the rmgrdesc
> functions is to show you what is in the WAL records, and
> everywhere else we interpret that as "show the verbatim,
> numeric field contents".  heapdesc.c, for example, doesn't
> attempt to look up the name of the table being operated on.
> 0004 isn't adhering to that style, and aside from being
> inconsistent I'm afraid that it's adding failure modes
> we don't want.

This patch again fails to apply (seemingly from the Perl namespace work on the
testcode), and needs a few updates as per the above review.

--
Daniel Gustafsson   https://vmware.com/





Re: should INSERT SELECT use a BulkInsertState?

2021-11-04 Thread Daniel Gustafsson
> On 27 Sep 2021, at 02:08, Justin Pryzby  wrote:

> I split off the simple part again.  If there's no interest in the 0001 patch
> alone, then I guess it should be closed in the CF.

Since the thread has stalled, maybe that's the best course of action here.  Any
objections from anyone on the thread?

--
Daniel Gustafsson   https://vmware.com/





Re: LogwrtResult contended spinlock

2021-11-04 Thread Daniel Gustafsson
> On 8 Sep 2021, at 17:14, Jaime Casanova  wrote:
> 
> On Tue, Feb 02, 2021 at 08:19:19PM -0300, Alvaro Herrera wrote:
>> Hello,
>> 
>> So I addressed about half of your comments in this version merely by
>> fixing silly bugs.  The problem I had which I described as
>> "synchronization fails" was one of those silly bugs.
>> 
> 
> Hi Álvaro,
> 
> Are we waiting for another version of the patch based on Andres'
> comments? Or this version is good enough for testing?

Any update on this?  Should the patch be reset back to "Needs review" or will
there be a new version?

--
Daniel Gustafsson   https://vmware.com/





Re: [PROPOSAL] new diagnostic items for the dynamic sql

2021-11-04 Thread Daniel Gustafsson
> On 9 Sep 2021, at 08:23, Dinesh Chemuduru  wrote:

> Let me try to fix the suggested case(I tried to fix this case in the past, 
> but this time I will try to spend more time on this), and will submit a new 
> patch.

This CF entry is marked Waiting on Author, have you had the chance to prepare a
new version of the patch addressing the comments from Pavel?

--
Daniel Gustafsson   https://vmware.com/





Re: Add missing function abs (interval)

2021-11-04 Thread Daniel Gustafsson
> On 26 Sep 2021, at 19:58, Isaac Morland  wrote:

> So I think I will prepare a revised patch that uses this formulation; and if 
> I still have any suggestions that aren't directly related to adding 
> abs(interval) I will split them off into a separate discussion.

This CF entry is marked Waiting on Author, have you had the chance to prepare
an updated version of this patch?

--
Daniel Gustafsson   https://vmware.com/





Re: lastOverflowedXid does not handle transaction ID wraparound

2021-11-04 Thread Simon Riggs
On Wed, 3 Nov 2021 at 22:07, Alexander Korotkov  wrote:
>
> Hi!
>
> On Wed, Nov 3, 2021 at 8:51 PM Simon Riggs  
> wrote:
> > It is however, an undocumented modularity violation. I think that is
> > acceptable because of the ProcArrayLock traffic, but needs to have a
> > comment to explain this at the call to
> > ExpireOldKnownAssignedTransactionIds() i.e. " and potentially reset
> > lastOverflowedXid", as well as a comment on the
> > ExpireOldKnownAssignedTransactionIds() function.
>
> Thank you for your feedback.  Please find the revised patch attached.
> It incorporates this function comment changes altogether with minor
> editings and commit message. Let me know if you have further
> suggestions.
>
> I'm going to push this if no objections.

Looks good, go for it.

-- 
Simon Riggshttp://www.EnterpriseDB.com/




RE: Allow escape in application_name

2021-11-04 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san,

Thank you for giving comments! I attached new patches.

> On second thought, do we really need padding support for application_name
> in postgres_fdw? I just thought that when I read the description
> "Padding can be useful to aid human readability in log files." in the docs
> about log_line_prefix.

My feeling was that we don't have any reason not to support it,
but I could not find a good use-case.
For now I have decided to keep supporting it,
but if we face another problem I will cut it.
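For the record, the sort of thing padding would allow is something like
the following (the server name and exact option syntax are only
illustrative; the escapes and padding are meant to behave as in
log_line_prefix):

    ALTER SERVER loopback
        OPTIONS (SET application_name 'fdw_%-16u%d');
    -- pads the user name to 16 characters so entries line up in the logs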

> + case 'a':
> + if (MyProcPort)
> + {
> 
> I commented that the check of MyProcPort is necessary because background
> worker not having MyProcPort may access to the remote server. The example
> of such process is the resolver process that Sawada-san was proposing for
> atomic commit feature. But the proposal was withdrawn and for now
> there seems no such process. If this is true, we can safely remove the check
> of MyProcPort? If so, something like Assert(MyProcPort != NULL) may need
> to be added, instead.

My understanding was that we don't have to assume that Sawada-san's patch
will be committed.
I think that FDW is only available from backend processes in the current
implementation, and MyProcPort is set up when a process is initialized - in
BackendInitialize().
Since a backend executes BackendInitialize() after forking from the
postmaster, we can assume that everyone who operates the FDW has a valid
value for MyProcPort.
I removed the if statement and added an assertion.
We could force parse_pgfdw_appname() not to be called if MyProcPort does not
exist, but I don't think that is needed now.
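So the relevant branch now looks roughly like this (a sketch of the
change described above, not the exact hunk from the attached patch):

    case 'a':
        {
            const char *appname = application_name;

            /*
             * postgres_fdw is only used from regular backends, which always
             * have MyProcPort set up by BackendInitialize(), so an assertion
             * is enough here.
             */
            Assert(MyProcPort != NULL);
            ...
        }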

> If user name or database name is not set, the name is replaced with
> "[unknown]". log_line_prefix needs this because log message may be
> output when they have not been set yet, e.g., at early stage of backend
> startup. But I wonder if application_name in postgres_fdw actually
> need that.. Thought?

Hmm, I think all backend processes have a username and a database, but
here I followed Horiguchi-san's suggestion:

```
I'm not sure but even if user_name doesn't seem to be NULL, don't we
want to do the same thing with %u of log_line_prefix for safety?
Namely, setting [unknown] if user_name is NULL or "". The same can be
said for %d.
```

But actually I don't have strong opinions.

> + if (appname == NULL || *appname
> == '\0')
> + appname = "[unknown]";
> 
> Do we really want to replace the application name with "[unknown]"
> when application_name in the local server is not set? At least for me,
> it's intuitive to replace it with empty string in that case,
> in postgres_fdw application_name.

Yeah, I agree that the empty string should be kept here.
Currently I kept the NULL case because of the above reason.

> The patch now fails to be applied to the master. Could you rebase it?

Thanks, rebased. I just moved test to the end of the sql file.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v21_0002_allow_escapes.patch
Description: v21_0002_allow_escapes.patch


v21_0001_export_padding_function.patch
Description: v21_0001_export_padding_function.patch


Re: PROXY protocol support

2021-11-04 Thread Magnus Hagander
On Wed, Nov 3, 2021 at 2:36 PM Daniel Gustafsson  wrote:

> > On 28 Sep 2021, at 15:23, Magnus Hagander  wrote:
> > On Fri, Sep 10, 2021 at 1:44 AM Jacob Champion 
> wrote:
>
> >> The TAP test will need to be rebased over the changes in 201a76183e.
> >
> > Done
>
> And now the TAP test will need to be rebased over the changes in
> b3b4d8e68ae83f432f43f035c7eb481ef93e1583.
>

Thanks for the pointer, PFA a rebase.


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 
diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index 02f0489112..a3ff09b3ac 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -353,6 +353,15 @@ hostnogssenc  database  user
 
+  
+   If  is enabled and the
+   connection is made through a proxy server using the PROXY
+   protocol, the actual IP address of the client will be used
+   for matching. If a connection is made through a proxy server
+   not using the PROXY protocol, the IP address of the
+   proxy server will be used.
+  
+
   
These fields do not apply to local records.
   
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index de77f14573..5211c1f7b1 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -682,6 +682,56 @@ include_dir 'conf.d'
   
  
 
+ 
+  proxy_port (integer)
+  
+   proxy_port configuration parameter
+  
+  
+  
+   
+The TCP port the server listens on for PROXY connections, disabled by
+default. If set to a number, PostgreSQL
+will listen on this port on the same addresses as for regular
+connections, but expect all connections to use the PROXY protocol to
+identify the client.  This parameter can only be set at server start.
+   
+   
+If a proxy connection is made over this port, and the proxy is listed
+in , the actual client address
+will be considered as the address of the client, instead of listing
+all connections as coming from the proxy server.
+   
+   
+ The http://www.haproxy.org/download/1.9/doc/proxy-protocol.txt;>PROXY
+ protocol is maintained by HAProxy,
+ and supported in many proxies and load
+ balancers. PostgreSQL supports version 2
+ of the protocol.
+   
+  
+ 
+
+ 
+  proxy_servers (string)
+  
+   proxy_servers configuration parameter
+  
+  
+  
+   
+A comma separated list of one or more ip addresses, cidr specifications or the
+literal unix, indicating which proxy servers to trust when
+connecting on the port specified in .
+   
+   
+If a proxy connection is made from an IP address not covered by this
+list, the connection will be rejected. By default no proxy is trusted
+and all proxy connections will be rejected.
+   
+  
+ 
+
  
   max_connections (integer)
   
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4b49dff2ff..74f38d4891 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -22238,7 +22238,12 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);
 connection,
 or NULL if the current connection is via a
 Unix-domain socket.
-   
+   
+   
+If the connection is a PROXY connection, this function returns the
+IP address used to connect to the proxy server.
+   
+   
   
 
   
@@ -22254,7 +22259,13 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);
 connection,
 or NULL if the current connection is via a
 Unix-domain socket.
-   
+   
+   
+If the connection is a PROXY connection, this function returns the
+port used to connect to the proxy server.
+   
+
+   
   
 
   
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index a317aef1c9..f8c32ad492 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -1696,6 +1696,14 @@ ident_inet(hbaPort *port)
 			   *la = NULL,
 hints;
 
+	if (port->isProxy)
+	{
+		ereport(LOG,
+(errcode_for_socket_access(),
+ errmsg("Ident authentication cannot be used over PROXY connections")));
+		return STATUS_ERROR;
+	}
+
 	/*
 	 * Might look a little weird to first convert it to text and then back to
 	 * sockaddr, but it's protocol independent.
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 89a5f901aa..a8d6c5fa4c 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -311,13 +311,13 @@ socket_close(int code, Datum arg)
  * Successfully opened sockets are added to the ListenSocket[] array (of
  * length MaxListen), at the first position that isn't PGINVALID_SOCKET.
  *
- * RETURNS: STATUS_OK or STATUS_ERROR
+ 

Re: [PATCH] More docs on what to do and not do in extension code

2021-11-04 Thread Daniel Gustafsson
> On 30 Aug 2021, at 04:20, Craig Ringer  wrote:
> 
> On Tue, 29 Jun 2021 at 13:30, Craig Ringer  > wrote:
> Laurenz,
> 
> Thanks for your comments. Sorry it's taken me so long to get back to you. 
> Commenting inline below on anything I think needs comment; other proposed 
> changes look good.
> 
> I'm not going to get back to this anytime soon.
> 
> If anybody wants to pick it up that's great. Otherwise at least it's there in 
> the mailing lists to search.

I'm marking this returned with feedback for now, please open a new entry when
there is an updated patch.

--
Daniel Gustafsson   https://vmware.com/





Re: [PATCH] rename column if exists

2021-11-04 Thread Daniel Gustafsson
> On 22 Mar 2021, at 20:40, David Oksman  wrote:
> 
> Added the ability to specify IF EXISTS when renaming a column of an object 
> (table, view, etc.).
> For example: ALTER TABLE distributors RENAME COLUMN IF EXISTS address TO city;
> If the column does not exist, a notice is issued instead of throwing an error.

What is the intended use-case for RENAME COLUMN IF EXISTS?  I'm struggling to
see when that would be helpful to users but I might not be imaginative enough.

--
Daniel Gustafsson   https://vmware.com/





Re: Printing backtrace of postgres processes

2021-11-04 Thread Daniel Gustafsson
> On 26 Aug 2021, at 16:56, vignesh C  wrote:

> The previous patch was failing because of the recent test changes made
> by commit 201a76183e2 which unified new and get_new_node, attached
> patch has the changes to handle the changes accordingly.

This patch now fails because of the test changes made by commit b3b4d8e68a,
please submit a rebase.

--
Daniel Gustafsson   https://vmware.com/





Re: partial heap only tuples

2021-11-04 Thread Daniel Gustafsson
> On 14 Jul 2021, at 13:34, vignesh C  wrote:

> The patch does not apply on Head anymore, could you rebase and post a
> patch. I'm changing the status to "Waiting for Author".

As no update has been posted, the patch still doesn't apply.  I'm marking this
patch Returned with Feedback, feel free to open a new entry for an updated
patch.

--
Daniel Gustafsson   https://vmware.com/





Re: [PATCH] Added TRANSFORM FOR for COMMENT tab completion

2021-11-04 Thread Ken Kato

+   else if (Matches("COMMENT", "ON", "PROCEDURAL"))
+   COMPLETE_WITH("LANGUAGE");
+   else if (Matches("COMMENT", "ON", "PROCEDURAL", "LANGUAGE"))
+   COMPLETE_WITH_QUERY(Query_for_list_of_languages);
I don't think that there is much point in being this picky either with
the usage of PROCEDURAL, as we already complete a similar and simpler
grammar with LANGUAGE.  I would just remove this part of the patch.
In my opinion, it is written in the documentation, so tab-completion 
of "PROCEDURAL"is good.
How about a completion with "LANGUAGE" and "PROCEDURAL LANGUAGE", like 
"PASSWORD" and "ENCRYPTED PASSWORD" in CREATE ROLE?


I kept LANGUAGE and PROCEDURAL LANGUAGE just like PASSWORD and ENCRYPTED 
PASSWORD.




+   else if (Matches("COMMENT", "ON", "OPERATOR"))
+   COMPLETE_WITH("CLASS", "FAMILY");
Isn't this one wrong?  Operators can have comments, and we'd miss
them.  This is mentioned upthread, but it seems to me that we'd better
drop this part of the patch if the operator naming part cannot be
solved easily.

As you said, it may be misleading.
I agree to drop it.


Hearing all the opinions given, I decided not to support OPERATOR CLASS 
or FAMILY in COMMENT.
Therefore, I dropped Query_for_list_of_operator_class_index_methods as 
well.



+static const SchemaQuery Query_for_list_of_text_search_configurations 
= {


We already have Query_for_list_of_ts_configurations in tab-complete.c.
Do we really need both queries? Or we can drop either of them?


Thank you for pointing out!
I didn't notice that there already exists 
Query_for_list_of_ts_configurations,

so I changed the TEXT SEARCH completion to use Query_for_list_of_ts_XXX.

I made the changes to the points above and updated the patch.

--
Best wishes,

Ken Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATIONdiff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index ecae9df8ed..ee63a54470 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -494,7 +494,6 @@ static const SchemaQuery Query_for_list_of_partitioned_indexes = {
 	.result = "pg_catalog.quote_ident(c.relname)",
 };
 
-
 /* All relations */
 static const SchemaQuery Query_for_list_of_relations = {
 	.catname = "pg_catalog.pg_class c",
@@ -2399,22 +2398,19 @@ psql_completion(const char *text, int start, int end)
 	else if (Matches("COMMENT"))
 		COMPLETE_WITH("ON");
 	else if (Matches("COMMENT", "ON"))
-		COMPLETE_WITH("ACCESS METHOD", "CAST", "COLLATION", "CONVERSION",
-	  "DATABASE", "EVENT TRIGGER", "EXTENSION",
-	  "FOREIGN DATA WRAPPER", "FOREIGN TABLE", "SERVER",
-	  "INDEX", "LANGUAGE", "POLICY", "PUBLICATION", "RULE",
-	  "SCHEMA", "SEQUENCE", "STATISTICS", "SUBSCRIPTION",
-	  "TABLE", "TYPE", "VIEW", "MATERIALIZED VIEW",
-	  "COLUMN", "AGGREGATE", "FUNCTION",
-	  "PROCEDURE", "ROUTINE",
-	  "OPERATOR", "TRIGGER", "CONSTRAINT", "DOMAIN",
-	  "LARGE OBJECT", "TABLESPACE", "TEXT SEARCH", "ROLE");
+		COMPLETE_WITH("ACCESS METHOD", "AGGREGATE", "CAST", "COLLATION",
+	  "COLUMN","CONSTRAINT", "CONVERSION", "DATABASE",
+	  "DOMAIN","EXTENSION", "EVENT TRIGGER",
+	  "FOREIGN DATA WRAPPER","FOREIGN TABLE",
+	  "FUNCTION", "INDEX", "LANGUAGE","LARGE OBJECT",
+	  "MATERIALIZED VIEW", "OPERATOR","POLICY",
+	  "PROCEDURE", "PROCEDURAL LANGUAGE", "PUBLICATION","ROLE",
+	  "ROUTINE", "RULE", "SCHEMA", "SEQUENCE","SERVER",
+	  "STATISTICS", "SUBSCRIPTION", "TABLE",
+	  "TABLESPACE", "TEXT SEARCH", "TRANSFORM FOR",
+	  "TRIGGER", "TYPE", "VIEW");
 	else if (Matches("COMMENT", "ON", "ACCESS", "METHOD"))
 		COMPLETE_WITH_QUERY(Query_for_list_of_access_methods);
-	else if (Matches("COMMENT", "ON", "FOREIGN"))
-		COMPLETE_WITH("DATA WRAPPER", "TABLE");
-	else if (Matches("COMMENT", "ON", "TEXT", "SEARCH"))
-		COMPLETE_WITH("CONFIGURATION", "DICTIONARY", "PARSER", "TEMPLATE");
 	else if (Matches("COMMENT", "ON", "CONSTRAINT"))
 		COMPLETE_WITH_QUERY(Query_for_all_table_constraints);
 	else if (Matches("COMMENT", "ON", "CONSTRAINT", MatchAny))
@@ -2422,15 +2418,67 @@ psql_completion(const char *text, int start, int end)
 	else if (Matches("COMMENT", "ON", "CONSTRAINT", MatchAny, "ON"))
 	{
 		completion_info_charp = prev2_wd;
-		COMPLETE_WITH_QUERY(Query_for_list_of_tables_for_constraint);
+		COMPLETE_WITH_QUERY(Query_for_list_of_tables_for_constraint
+			" UNION SELECT 'DOMAIN'");
 	}
-	else if (Matches("COMMENT", "ON", "MATERIALIZED", "VIEW"))
-		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_matviews, NULL);
+	else if (Matches("COMMENT", "ON", "CONSTRAINT", MatchAny, "ON", "DOMAIN"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_domains, NULL);
 	else if (Matches("COMMENT", "ON", "EVENT", "TRIGGER"))
 		COMPLETE_WITH_QUERY(Query_for_list_of_event_triggers);
+	else if (Matches("COMMENT", "ON", "FOREIGN"))
+		COMPLETE_WITH("DATA WRAPPER", "TABLE");
+	else if (Matches("COMMENT", "ON", "FOREIGN", "TABLE"))
+		

Re: Make Append Cost aware of some run time partition prune case

2021-11-04 Thread Daniel Gustafsson
> On 14 Jul 2021, at 17:55, vignesh C  wrote:

> The patch does not apply on Head anymore, could you rebase and post a
> patch. I'm changing the status to "Waiting for Author".

As the thread has stalled, this patch still doesn't apply (judging by the log
it's likely not too hard to resolve).  I'm marking this patch Returned with
Feedback, feel free to open a new entry for an updated patch.

--
Daniel Gustafsson   https://vmware.com/





Re: Columns correlation and adaptive query optimization

2021-11-04 Thread Daniel Gustafsson
> On 14 Jul 2021, at 13:13, vignesh C  wrote:

> "C:\projects\postgresql\pgsql.sln" (default target) (1) ->
> "C:\projects\postgresql\auto_explain.vcxproj" (default target) (45) ->
> (ClCompile target) ->
> contrib/auto_explain/auto_explain.c(658): error C2039: 'mt_plans' : is
> not a member of 'ModifyTableState'
> [C:\projects\postgresql\auto_explain.vcxproj]
> contrib/auto_explain/auto_explain.c(659): error C2039: 'mt_nplans' :
> is not a member of 'ModifyTableState'
> [C:\projects\postgresql\auto_explain.vcxproj]
> contrib/auto_explain/auto_explain.c(660): error C2198:
> 'AddMultiColumnStatisticsForMemberNodes' : too few arguments for call
> [C:\projects\postgresql\auto_explain.vcxproj]
> 2 Warning(s)
> 3 Error(s)
> 
> Also Yugo Nagata's comments need to be addressed, I'm changing the
> status to "Waiting for Author".

As this thread has stalled and the patch hasn't worked in the CI for quite some
time, I'm marking this Returned with Feedback.  Feel free to open a new entry
for an updated patch.

--
Daniel Gustafsson   https://vmware.com/





Re: Teach pg_receivewal to use lz4 compression

2021-11-04 Thread gkokolatos



‐‐‐ Original Message ‐‐‐

On Thursday, November 4th, 2021 at 9:21 AM, Michael Paquier 
 wrote:

> On Thu, Nov 04, 2021 at 04:31:48PM +0900, Michael Paquier wrote:
> Thanks.  I have looked at 0001 today, and applied it after fixing a
> couple of issues.

Great! Thank you very much.

> From memory:
> - zlib.h was missing from pg_receivewal.c, issue that I noticed after
> removing the redefinition of Z_DEFAULT_COMPRESSION because there was
> no need for it (did a run with a --without-zlib as well).

Yeah, I simply wanted to avoid adding a header. Either way works really.

> - Simplified a bit the error handling for incorrect option
> combinations, using a switch/case while on it.

Much cleaner done this way.

> - Renamed the existing variable "compression" in walmethods.c to
> compression_level, to reduce any confusion with the introduction of
> compression_method.  One thing I have noticed is about the tar method,
> where we rely on the compression level to decide if compression should
> be used or not.  There should be some simplifications possible there
> but there is a huge take in receivelog.c where we use COMPRESSION_NONE
> to track down that we still want to zero a new segment when using tar
> method.

Agreed.

> - Use of 'I' as short option name, err...  After applying the first
> batch..

I left that in just to have the two compression related options next to each
other when switching. I assumed it might help with readability for the next
developer looking at it.

Removing it is cleaner for the option definition though, thanks.

>
> Based on the work of 0001, there were some conflicts with 0002.  I
> have solved them while reviewing it, and adapted the code to what got
> already applied.

Thank you very much.

>
> +   header_size = LZ4F_compressBegin(ctx, lz4buf, lz4bufsize, NULL);
> +   if (LZ4F_isError(header_size))
> +   {
> +   pg_free(lz4buf);
> +   close(fd);
> +   return NULL;
> +   }
> In dir_open_for_write(), I guess that this one is missing one
> LZ4F_freeCompressionContext().

Agreed.

>
> +   status = LZ4F_freeDecompressionContext(ctx);
> +   if (LZ4F_isError(status))
> +   {
> +   pg_log_error("could not free LZ4 decompression context: %s",
> +LZ4F_getErrorName(status));
> +   exit(1);
> +   }
> +
> +   if (uncompressed_size != WalSegSz)
> +   {
> +   pg_log_warning("compressed segment file \"%s\" has
> incorrect uncompressed size %ld, skipping",
> +  dirent->d_name, uncompressed_size);
> +   (void) LZ4F_freeDecompressionContext(ctx);
> +   continue;
> +   }
> When the uncompressed size does not match out expected size, the
> second LZ4F_freeDecompressionContext() looks unnecessary to me because
> we have already one a couple of lines above.

Agreed.

>
> +   ctx_out = LZ4F_createCompressionContext(, LZ4F_VERSION);
> +   lz4bufsize = LZ4F_compressBound(LZ4_IN_SIZE, NULL);
> +   if (LZ4F_isError(ctx_out))
> +   {
> +   close(fd);
> +   return NULL;
> +   }
> LZ4F_compressBound() can be after the check on ctx_out, here.
>
> +   while (1)
> +   {
> +   char   *readp;
> +   char   *readend;
> Simply looping when decompressing a segment to check its size looks
> rather unsafe to me.  We should leave the loop once uncompressed_size
> is strictly more than WalSegSz.

The loop exits when done reading or when it failed to read:

+   r = read(fd, readbuf, sizeof(readbuf));
+   if (r < 0)
+   {
+   pg_log_error("could not read file \"%s\": %m", fullpath);
+   exit(1);
+   }
+
+   /* Done reading */
+   if (r == 0)
+   break;

Although I do agree that it can exit before that, if the uncompressed size is
greater than WalSegSz.
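i.e. something like this inside the read loop (sketch only):

    /*
     * A valid segment can never be larger than WalSegSz once decompressed,
     * so stop reading as soon as we have gone past that.
     */
    if (uncompressed_size > WalSegSz)
        break;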

>
> The amount of TAP tests looks fine, and that's consistent with what we
> do for zlib^D^D^D^Dgzip.  Now, when testing manually pg_receivewal
> with various combinations of gzip, lz4 and none, I can see the
> following failure in the code that calculates the streaming start
> point:
> pg_receivewal: error: could not decompress file
> "wals//00010006.lz4": ERROR_frameType_unknown
>

Hmmm I will look into that.

> In the LZ4 code, this points to lib/lz4frame.c, where we read an
> incorrect header (see the part that does not match LZ4F_MAGICNUMBER).
> The segments written by pg_receivewal are clean.  Please note that
> this shows up as well when manually compressing some segments with a
> simple lz4 command, to simulate for example the case where a user
> compressed some segments by himself/herself before running
> pg_receivewal.
>

Right, thank you for investigating. I will look further.

> So, tour code does 

Re: [patch] [doc] Further note required activity aspect of automatic checkpoint and archving

2021-11-04 Thread Daniel Gustafsson
> On 18 Mar 2021, at 16:36, David Steele  wrote:

> Could you produce a new patch so Peter has something complete to look at?

As this thread has been stalled for for a few commitfests by now I'm marking
this patch as returned with feedback.  Feel free to open a new entry for an
updated patch.

--
Daniel Gustafsson   https://vmware.com/





Re: Logical insert/update/delete WAL records for custom table AMs

2021-11-04 Thread Amit Kapila
On Thu, Nov 4, 2021 at 7:09 AM Jeff Davis  wrote:
>
> On Wed, 2021-11-03 at 11:25 +0530, Amit Kapila wrote:
> > You have modeled DecodeLogicalInsert based on current DecodeInsert
> > and
> > it generates the same change message, so not sure how exactly these
> > new messages will be different from current heap_insert/update/delete
> > messages?
>
> Is there a reason you think the messages should be different for heap
> versus another table AM? Isn't the table AM a physical implementation
> detail?
>

We have special handling for speculative insertions and toast
insertions. Can't different table AMs have different representations
for toast, or maybe some different concept than speculative
insertions? Similarly, I remember that for zheap we didn't have
TransactionIds for subtransactions, so we needed to make some changes
in logical decoding to compensate for that. I guess similar changes
could be required for other table AMs. A different table AM could also
have a different tuple format, which won't work with the current change
records unless we convert it to the heap tuple format before writing
WAL for it.

-- 
With Regards,
Amit Kapila.




Re: CREATE ROLE IF NOT EXISTS

2021-11-04 Thread Daniel Gustafsson
> On 3 Nov 2021, at 23:18, Tom Lane  wrote:

> I'm generally pretty down on IF NOT EXISTS semantics in all cases,
> but it seems particularly dangerous for something as fundamental
> to privilege checks as a role.  It's not hard at all to conjure up
> scenarios in which this permits privilege escalation.  That is,
> Alice wants to create role Bob and give it some privileges, but
> she's lazy and writes a quick-and-dirty script using CREATE ROLE
> IF NOT EXISTS.  Meanwhile Charlie sneaks in and creates Bob first,
> and then grants it to himself.  Now Alice's script is giving away
> all sorts of privilege to Charlie.  (Admittedly, Charlie must have
> CREATEROLE privilege already, but that doesn't mean he has every
> privilege that Alice has --- especially not as we continue working
> to slice the superuser salami ever more finely.)

I agree with this take, I don't think the convenience outweighs the risk in
this case.

--
Daniel Gustafsson   https://vmware.com/





[sqlsmith] Failed assertion in brin_minmax_multi_distance_float4 on REL_14_STABLE

2021-11-04 Thread Andreas Seltenreich
Hi,

sqlsmith triggers the following assertion when testing REL_14_STABLE:

TRAP: FailedAssertion("a1 <= a2", File: "brin_minmax_multi.c", Line: 1879, 
PID: 631814)

I can reproduce it with the following query on a fresh regression
database:

insert into public.brintest_multi (float4col) values (real 'nan');

The branch was at f6162c020c while testing, backtrace below.

regards,
Andreas

(gdb) bt
#0  0x7f703cc46ce1 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x7f703cc30537 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x55b17517c521 in ExceptionalCondition at assert.c:69
#3  0x55b174d25871 in brin_minmax_multi_distance_float4 (fcinfo=) at brin_minmax_multi.c:1879
#4  0x55b1751853ba in FunctionCall2Coll 
(flinfo=flinfo@entry=0x55b176913d10, collation=collation@entry=0, 
arg1=, 
arg2=) at fmgr.c:1160
#5  0x55b174d23a41 in build_distances 
(distanceFn=distanceFn@entry=0x55b176913d10, colloid=0, 
eranges=eranges@entry=0x55b17693fad0, neranges=neranges@entry=5) at 
brin_minmax_multi.c:1352
#6  0x55b174d256be in compactify_ranges (max_values=32, 
ranges=0x55b176945708, bdesc=) at brin_minmax_multi.c:1822
#7  brin_minmax_multi_serialize (bdesc=, src=94220687005448, 
dst=0x55b17693ae10) at brin_minmax_multi.c:2386
#8  0x55b174d2ae0d in brin_form_tuple (brdesc=brdesc@entry=0x55b176914988, 
blkno=blkno@entry=15, 
tuple=tuple@entry=0x55b17693aaa0, size=size@entry=0x7cc6d4f8) at 
brin_tuple.c:165
#9  0x55b174d20a5f in brininsert (idxRel=0x7f703b2ae170, 
values=0x7cc6d610, nulls=0x7cc6d5f0, heaptid=0x55b17690da78, 
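
For what it's worth, the failing precondition is easy to reproduce outside
the server: any ordered comparison against NaN evaluates to false, so the
Assert(a1 <= a2) in the distance function cannot hold once a NaN reaches
it, even though the float4 comparator used while building the ranges sorts
NaN after all regular values. A standalone C illustration (not PostgreSQL
code):

#include <math.h>
#include <stdio.h>

int
main(void)
{
	float	a1 = NAN;	/* e.g. the inserted real 'nan' value */
	float	a2 = 1.0f;	/* any ordinary float4 value */

	/* Plain C comparisons involving NaN are always false... */
	printf("a1 <= a2 : %d\n", a1 <= a2);	/* prints 0 */
	printf("a2 <= a1 : %d\n", a2 <= a1);	/* prints 0 */

	/* ...so an Assert(a1 <= a2) style precondition cannot hold for NaN. */
	return 0;
}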




Re: Map WAL segment files on PMEM as WAL buffers

2021-11-04 Thread Takashi Menjo
Hello Daniel,

Thank you for your comment. I had the following error message with
MSVC on Windows. It looks the same as what you told me. I'll fix it.

| > cd src\tools\msvc
| > build
| (..snipped..)
| Copying pg_config_os.h...
| Generating configuration headers...
| undefined symbol: HAVE_LIBPMEM at src/include/pg_config.h line 347
at C:/Users/menjo/Documents/git/postgres/src/tools/msvc/Mkvcbuild.pm
line 860.

Best regards,
Takashi


On Wed, Nov 3, 2021 at 10:04 PM Daniel Gustafsson  wrote:
>
> > On 28 Oct 2021, at 08:09, Takashi Menjo  wrote:
>
> > Rebased, and added the patches below into the patchset.
>
> Looks like the 0001 patch needs to be updated to support Windows and MSVC.  
> See
> src/tools/msvc/Mkvcbuild.pm and Solution.pm et.al for inspiration on how to 
> add
> the MSVC equivalent of --with-libpmem.  Currently the patch fails in the
> "Generating configuration headers" step in Solution.pm.
>
> --
> Daniel Gustafsson   https://vmware.com/
>


-- 
Takashi Menjo 




Re: Teach pg_receivewal to use lz4 compression

2021-11-04 Thread Michael Paquier
On Thu, Nov 04, 2021 at 04:31:48PM +0900, Michael Paquier wrote:
> I would have expected read_size to be (readend - readp) to match with
> the remaining data in the read buffer that we still need to read.
> Shouldn't out_size also be LZ4_CHUNK_SZ to match with the size of the
> output buffer where all the contents are read?  By setting it to 1, I
> think that this is doing more loops into LZ4F_decompress() than really
> necessary.  It would not hurt either to memset(0) those buffers before
> they are used, IMO.  I am not completely sure either, but should we
> use the number of bytes returned by LZ4F_decompress() as a hint for
> the follow-up loops?
>
> +#ifdef HAVE_LIBLZ4
> +while (readp < readend)
> +{
> +size_t  read_size = 1;
> +size_t  out_size = 1;
> +
> +status = LZ4F_decompress(ctx, outbuf, &out_size,
> + readbuf, &read_size, NULL);

And...  It happens that the error from v9 is here, where we need to
read the amount of remaining data from "readp", and not "readbuf" :)

It is already late here, I'll continue on this stuff tomorrow, but
this looks rather committable overall. 
--
Michael
From d2d76d8fefb1bad5db611746cf3d0ac89a67de4b Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 4 Nov 2021 17:19:41 +0900
Subject: [PATCH v11] Teach pg_receivewal to use LZ4 compression

The program pg_receivewal can use gzip compression to store the received
WAL.  This commit teaches it to also be able to use LZ4 compression. It
is required that the binary is built using the -llz4 flag. It is enabled
via the --with-lz4 flag at configuration time.

The option `--compression-method` has been expanded to include the value
[LZ4].  The option `--compress` cannot be used with LZ4 compression.

Under the hood there is nothing exceptional to be noted. Tar based
archives have not yet been taught to use LZ4 compression. If that is
felt useful, then it can easily be added in the future.

Tests have been added to verify the creation and correctness of the
generated LZ4 files. The latter is achieved by the use of the lz4 program, if
present in the installation.
---
 src/bin/pg_basebackup/Makefile   |   1 +
 src/bin/pg_basebackup/pg_receivewal.c| 153 ++
 src/bin/pg_basebackup/t/020_pg_receivewal.pl |  72 -
 src/bin/pg_basebackup/walmethods.c   | 160 ++-
 src/bin/pg_basebackup/walmethods.h   |   1 +
 doc/src/sgml/ref/pg_receivewal.sgml  |   8 +-
 src/Makefile.global.in   |   1 +
 7 files changed, 384 insertions(+), 12 deletions(-)

diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 459d514183..fd920fc197 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -19,6 +19,7 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 # make these available to TAP test scripts
+export LZ4
 export TAR
 # Note that GZIP cannot be used directly as this environment variable is
 # used by the command "gzip" to pass down options, so stick with a different
diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index 8acc0fc009..b5c0a98b82 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -32,6 +32,10 @@
 #include "receivelog.h"
 #include "streamutil.h"
 
+#ifdef HAVE_LIBLZ4
+#include "lz4frame.h"
+#endif
+
 /* Time to sleep between reconnection attempts */
 #define RECONNECT_SLEEP_TIME 5
 
@@ -136,6 +140,15 @@ is_xlogfilename(const char *filename, bool *ispartial,
 		return true;
 	}
 
+	/* File looks like a completed LZ4-compressed WAL file */
+	if (fname_len == XLOG_FNAME_LEN + strlen(".lz4") &&
+		strcmp(filename + XLOG_FNAME_LEN, ".lz4") == 0)
+	{
+		*ispartial = false;
+		*wal_compression_method = COMPRESSION_LZ4;
+		return true;
+	}
+
 	/* File looks like a partial uncompressed WAL file */
 	if (fname_len == XLOG_FNAME_LEN + strlen(".partial") &&
 		strcmp(filename + XLOG_FNAME_LEN, ".partial") == 0)
@@ -154,6 +167,15 @@ is_xlogfilename(const char *filename, bool *ispartial,
 		return true;
 	}
 
+	/* File looks like a partial LZ4-compressed WAL file */
+	if (fname_len == XLOG_FNAME_LEN + strlen(".lz4.partial") &&
+		strcmp(filename + XLOG_FNAME_LEN, ".lz4.partial") == 0)
+	{
+		*ispartial = true;
+		*wal_compression_method = COMPRESSION_LZ4;
+		return true;
+	}
+
 	/* File does not look like something we know */
 	return false;
 }
@@ -284,6 +306,14 @@ FindStreamingStart(uint32 *tli)
 		 * than 4GB, and then compare it to the size of a completed segment.
 		 * The 4 last bytes correspond to the ISIZE member according to
 		 * http://www.zlib.org/rfc-gzip.html.
+		 *
+		 * For LZ4 compressed segments, uncompress the file in a throw-away
+		 * buffer keeping track of the uncompressed size, then compare it to
+		 * the size of a completed segment.  Per its protocol, LZ4 does not
+		 * store the uncompressed size 

Re: Data is copied twice when specifying both child and parent table in publication

2021-11-04 Thread Amit Kapila
On Thu, Nov 4, 2021 at 12:23 PM Greg Nancarrow  wrote:
>
> On Thu, Nov 4, 2021 at 3:13 PM Amit Kapila  wrote:
> >
> > On further thinking about this, I think we should define the behavior
> > of replication among partitioned (on the publisher) and
> > non-partitioned (on the subscriber) tables a bit more clearly.
> >
> > - If the "publish_via_partition_root" is set for a publication then we
> > can always replicate to the table with the same name as the root table
> > in publisher.
> > - If the "publish_via_partition_root" is *not* set for a publication
> > then we can always replicate to the tables with the same name as the
> > non-root tables in publisher.
> >
> > Thoughts?
> >
>
> I'd adjust that wording slightly, because "we can always replicate to
> ..." sounds a bit vague, and saying that an option is set or not set
> could be misinterpreted, as the option could be "set" to false.
>
> How about:
>
> - If "publish_via_partition_root" is true for a publication, then data
> is replicated to the table with the same name as the root (i.e.
> partitioned) table in the publisher.
> - If "publish_via_partition_root" is false (the default) for a
> publication, then data is replicated to tables with the same name as
> the non-root (i.e. partition) tables in the publisher.
>

Sounds good to me. If we follow this, then I think the patch by Hou-San
is good enough to solve the first problem as described in his last email [1]?

[1] - 
https://www.postgresql.org/message-id/OS0PR01MB5716C756312959F293A822C794869%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




Re: Add proper planner support for ORDER BY / DISTINCT aggregates

2021-11-04 Thread Ronan Dunklau
On Thursday, July 22, 2021, at 10:42:49 CET, Ronan Dunklau wrote:
> On Thursday, July 22, 2021, at 09:38:50 CEST, David Rowley wrote:
> > On Thu, 22 Jul 2021 at 02:01, Ronan Dunklau  
wrote:
> > > I tested the 0001 patch against both HEAD and my proposed bugfix for
> > > postgres_fdw.
> > > 
> > > There is a problem that the ordered aggregate is not pushed down
> > > anymore.
> > > The underlying Sort node is correctly pushed down though.
> > > 
> > > This comes from the fact that postgres_fdw grouping path never contains
> > > any
> > > pathkey. Since the cost is fuzzily the same between the pushed-down
> > > aggregate and the locally performed one, the tie is broken against the
> > > pathkeys.
> > 
> > I think this might be more down to a lack of any penalty cost for
> > fetching foreign tuples. Looking at create_foreignscan_path(), I don't
> > see anything that adds anything to account for fetching the tuples
> > from the foreign server. If there was something like that then there'd
> > be more of a preference to perform the remote aggregation so that
> > fewer rows must arrive from the remote server.
> > 
> > I tested by adding: total_cost += cpu_tuple_cost * rows * 100; in
> > create_foreignscan_path() and I got the plan with the remote
> > aggregation. That's a fairly large penalty of 1.0 per row. Much bigger
> > than parallel_tuple_cost's default value.
> > 
> > I'm a bit undecided on how much this patch needs to get involved in
> > adjusting foreign scan costs.  The problem is that we've given the
> > executor a new path to consider and nobody has done any proper
> > costings for the foreign scan so that it properly prefers paths that
> > have to pull fewer foreign tuples.  This is a pretty similar problem
> > to what parallel_tuple_cost aims to fix. Also similar to how we had to
> > add APPEND_CPU_COST_MULTIPLIER to have partition-wise aggregates
> > prefer grouping at the partition level rather than at the partitioned
> > table level.
> > 
> > > Ideally we would add the group pathkeys to the grouping path, but this
> > > would add an additional ORDER BY expression matching the GROUP BY.
> > > Moreover, some triaging of the pathkeys would be necessary since we now
> > > mix the sort-in- aggref pathkeys with the group_pathkeys.
> > 
> > I think you're talking about passing pathkeys into
> > create_foreign_upper_path in add_foreign_grouping_paths. If so, I
> > don't really see how it would be safe to add pathkeys to the foreign
> > grouping path. What if the foreign server did a Hash Aggregate?  The
> > rows might come back in any random order.
> 
> Yes, I was suggesting to add a new path with the pathkeys factored in, which
> if chosen over the non-ordered path would result in an additional ORDER BY
> clause to prevent a HashAggregate. But that doesn't seem a good idea after
> all.
> 
> > I kinda think that to fix this properly would need a new foreign
> > server option such as foreign_tuple_cost. I'd feel better about
> > something like that with some of the people with a vested interest in
> > the FDW code were watching more closely. So far we've not managed to
> > entice any of them with the bug report yet, but it's maybe early days
> > yet.
> 
> We already have that in the form of fdw_tuple_cost as a server option if I'm
> not mistaken ? It works as expected when the number of tuples is notably
> reduced by the foreign group by.
> 
> The problem arises when the cardinality of the groups is equal to the input's
> cardinality. I think even in that case we should try to use a remote
> aggregate since it's a computation that will not happen on the local
> server. I also think we're more likely to have up to date statistics
> remotely than the ones collected locally on the foreign tables, and the
> estimated number of groups would be more accurate on the remote side than
> the local one.

I took some time to toy with this again.

At first I thought that charging a discount in foreign grouping paths for 
Aggref targets (since they are computed remotely) would be a good idea, 
similar to what is done for the grouping keys.

But in the end, it's probably not something we would like to do. Yes, the
grouping estimates will generally be more accurate on the remote side (better
statistics than locally for estimating the number of groups), but executing the
grouping locally when the number of groups is close to the input's cardinality
(e.g. GROUP BY on a unique key) gives us a form of parallelism which can actually
perform better.

For the other cases where there are fewer output tuples than input tuples, that is,
when an actual grouping takes place, adjusting fdw_tuple_cost might be enough
to tune the behavior to what is desirable.
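
For reference, the relevant charge in postgres_fdw's cost estimation looks
roughly like this (illustrative fragment assuming the usual fpinfo and
retrieved_rows naming, not a verbatim quote of estimate_path_cost_size()):

/*
 * Every row pulled from the remote server is charged fdw_tuple_cost on
 * top of the local per-tuple cost, so raising the option makes plans
 * that return fewer rows remotely (e.g. a pushed-down aggregate) win
 * more often.
 */
startup_cost += fpinfo->fdw_startup_cost;
total_cost += fpinfo->fdw_startup_cost;
total_cost += fpinfo->fdw_tuple_cost * retrieved_rows;
total_cost += cpu_tuple_cost * retrieved_rows;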


-- 
Ronan Dunklau






Re: should we enable log_checkpoints out of the box?

2021-11-04 Thread Michael Paquier
On Tue, Nov 02, 2021 at 11:55:23AM -0400, Robert Haas wrote:
> I think shipping with log_checkpoints=on and
> log_autovacuum_min_duration=10m or so would be one of the best things
> we could possibly do to allow ex-post-facto troubleshooting of
> system-wide performance issues. The idea that users care more about
> the inconvenience of a handful of extra log messages than they do
> about being able to find problems when they have them matches no part
> of my experience. I suspect that many users would be willing to incur
> *way more* useless log messages than those settings would ever
> generate if it meant that they could actually find problems when and
> if they have them. And these messages would in fact be the most
> valuable thing in the log for a lot of users. What reasonable DBA
> cares more about the fact that the application attempted an insert
> that violated a foreign key constraint than they do about a checkpoint
> that took 20 minutes to fsync everything? The former is expected; if
> you thought that foreign key violations would never occur, you
> wouldn't need to incur the expense of having the system enforce them.
> The latter is unexpected and basically undiscoverable with default
> settings.

+1.
--
Michael


signature.asc
Description: PGP signature


Re: Missing include in be-secure-openssl.c?

2021-11-04 Thread Michael Paquier
On Wed, Nov 03, 2021 at 11:45:26PM -0400, Tom Lane wrote:
> Yeah, I noted the comment about WIN32_LEAN_AND_MEAN in the
> stackoverflow thread too ... but as you say, it seems like
> that should make the problem less probable not more so.
> Still, it's hard to think of any other relevant change.

Yeah, I don't see how this could be linked to WIN32_LEAN_AND_MEAN.

> Anyway, my thought now is (1) move the openssl includes to
> after system includes in both *-secure-openssl.c files,
> and (2) add comments explaining why the order is critical.
> But it's late here and I'm not going to mess with it right now.
> If you want to take a shot at a blind fix before hamerkop's
> next run, have at it.

Reading through the error logs that Thomas has posted (thanks!), I
have seen error patterns like that with a dirty build repository.  So
it could be possible that hamerkop is reusing a directory in which some
code has already been compiled.
--
Michael


signature.asc
Description: PGP signature


Re: Teach pg_receivewal to use lz4 compression

2021-11-04 Thread Michael Paquier
On Wed, Nov 03, 2021 at 09:11:24AM +, gkokola...@pm.me wrote:
> Please find v9 attached.

Thanks.  I have looked at 0001 today, and applied it after fixing a
couple of issues.  From memory:
- zlib.h was missing from pg_receivewal.c, issue that I noticed after
removing the redefinition of Z_DEFAULT_COMPRESSION because there was
no need for it (did a run with a --without-zlib as well).
- Simplified a bit the error handling for incorrect option
combinations, using a switch/case while on it.
- Renamed the existing variable "compression" in walmethods.c to
compression_level, to reduce any confusion with the introduction of
compression_method.  One thing I have noticed is about the tar method,
where we rely on the compression level to decide if compression should
be used or not.  There should be some simplifications possible there,
but there is a big catch in receivelog.c, where we use COMPRESSION_NONE
to track that we still want to zero a new segment when using the tar
method.
- Use of 'I' as short option name, err...  After applying the first
batch..

Based on the work of 0001, there were some conflicts with 0002.  I
have solved them while reviewing it, and adapted the code to what got
already applied.

+   header_size = LZ4F_compressBegin(ctx, lz4buf, lz4bufsize, NULL);
+   if (LZ4F_isError(header_size))
+   {
+   pg_free(lz4buf);
+   close(fd);
+   return NULL;
+   }
In dir_open_for_write(), I guess that this one is missing one
LZ4F_freeCompressionContext().
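For instance, something along these lines for that error path, as a sketch
reusing the same variable names as the hunk above (not the committed code):

header_size = LZ4F_compressBegin(ctx, lz4buf, lz4bufsize, NULL);
if (LZ4F_isError(header_size))
{
	(void) LZ4F_freeCompressionContext(ctx);	/* the missing cleanup */
	pg_free(lz4buf);
	close(fd);
	return NULL;
}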

+   status = LZ4F_freeDecompressionContext(ctx);
+   if (LZ4F_isError(status))
+   {
+   pg_log_error("could not free LZ4 decompression context: %s",
+LZ4F_getErrorName(status));
+   exit(1);
+   }
+
+   if (uncompressed_size != WalSegSz)
+   {
+   pg_log_warning("compressed segment file \"%s\" has
incorrect uncompressed size %ld, skipping",
+  dirent->d_name, uncompressed_size);
+   (void) LZ4F_freeDecompressionContext(ctx);
+   continue;
+   }
When the uncompressed size does not match our expected size, the
second LZ4F_freeDecompressionContext() looks unnecessary to me because
we already have one a couple of lines above.

+   ctx_out = LZ4F_createCompressionContext(, LZ4F_VERSION);
+   lz4bufsize = LZ4F_compressBound(LZ4_IN_SIZE, NULL);
+   if (LZ4F_isError(ctx_out))
+   {
+   close(fd);
+   return NULL;
+   }
LZ4F_compressBound() can be after the check on ctx_out, here.

+   while (1)
+   {
+   char   *readp;
+   char   *readend;
Simply looping when decompressing a segment to check its size looks
rather unsafe to me.  We should leave the loop once uncompressed_size
is strictly more than WalSegSz.

The amount of TAP tests looks fine, and that's consistent with what we
do for zlib^D^D^D^Dgzip.  Now, when testing manually pg_receivewal
with various combinations of gzip, lz4 and none, I can see the
following failure in the code that calculates the streaming start
point:
pg_receivewal: error: could not decompress file
"wals//00010006.lz4": ERROR_frameType_unknown

In the LZ4 code, this points to lib/lz4frame.c, where we read an
incorrect header (see the part that does not match LZ4F_MAGICNUMBER).
The segments written by pg_receivewal are clean.  Please note that
this shows up as well when manually compressing some segments with a
simple lz4 command, to simulate for example the case where a user
compressed some segments by himself/herself before running
pg_receivewal.

So, your code does LZ4F_createDecompressionContext() followed by a
loop on read() and LZ4F_decompress() that relies on an input and an
output buffer of a fixed 4kB size (we could use 64kB at least here 
actually?).  So this set of loops looks rather correct to me.

Now, this part is weird:
+   while (readp < readend)
+   {
+   size_t  read_size = 1;
+   size_t  out_size = 1;

I would have expected read_size to be (readend - readp) to match with
the remaining data in the read buffer that we still need to read.
Shouldn't out_size also be LZ4_CHUNK_SZ to match with the size of the
output buffer where all the contents are read?  By setting it to 1, I
think that this is doing more loops into LZ4F_decompress() than really
necessary.  It would not hurt either to memset(0) those buffers before
they are used, IMO.  I am not completely sure either, but should we
use the number of bytes returned by LZ4F_decompress() as a hint for
the follow-up loops?
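
To put that into code, here is a rough sketch (not the attached v11 patch;
the helper name and LZ4_CHUNK_SZ are made up for the example, and it
assumes pg_receivewal.c's usual includes and the WalSegSz global) of a
FindStreamingStart() helper that sizes the LZ4F_decompress() input and
output per iteration, leaves the loop once the uncompressed size exceeds a
full segment, and frees the context exactly once:

#define LZ4_CHUNK_SZ	(64 * 1024)		/* hypothetical buffer size */

static off_t
lz4_uncompressed_size(int fd, const char *fullpath)
{
	LZ4F_decompressionContext_t ctx = NULL;
	char		readbuf[LZ4_CHUNK_SZ];
	char		outbuf[LZ4_CHUNK_SZ];
	off_t		uncompressed_size = 0;
	size_t		status;

	status = LZ4F_createDecompressionContext(&ctx, LZ4F_VERSION);
	if (LZ4F_isError(status))
	{
		pg_log_error("could not create LZ4 decompression context: %s",
					 LZ4F_getErrorName(status));
		exit(1);
	}

	/* Stop early once we know the segment holds at least WalSegSz bytes. */
	while (uncompressed_size <= WalSegSz)
	{
		char	   *readp;
		char	   *readend;
		ssize_t		r;

		r = read(fd, readbuf, sizeof(readbuf));
		if (r < 0)
		{
			pg_log_error("could not read file \"%s\": %m", fullpath);
			exit(1);
		}
		if (r == 0)
			break;				/* done reading */

		readp = readbuf;
		readend = readbuf + r;
		while (readp < readend)
		{
			/* offer all remaining input, and the whole output buffer */
			size_t		read_size = readend - readp;
			size_t		out_size = sizeof(outbuf);

			status = LZ4F_decompress(ctx, outbuf, &out_size,
									 readp, &read_size, NULL);
			if (LZ4F_isError(status))
			{
				pg_log_error("could not decompress file \"%s\": %s",
							 fullpath, LZ4F_getErrorName(status));
				exit(1);
			}

			readp += read_size;				/* bytes consumed from the input */
			uncompressed_size += out_size;	/* bytes produced so far */
		}
	}

	(void) LZ4F_freeDecompressionContext(ctx);	/* freed exactly once */
	return uncompressed_size;
}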

Attached is an updated patch, which includes fixes for most of the
issues I am mentioning above.  Please note that I have not changed
FindStreamingStart(), so this part is the same as v9.

Thanks,
--
Michael
From 51a2352d2e543795ec201ed444d4e74014c3a2d3 Mon Sep 17 00:00:00 2001

Re: CREATEROLE and role ownership hierarchies

2021-11-04 Thread Shinya Kato

On 2021-10-28 07:21, Mark Dilger wrote:
On Oct 25, 2021, at 10:09 PM, Shinya Kato 
 wrote:



Hi! Thank you for the patch.
I too think that CREATEROLE escalation attack is problem.

I have three comments.
1. Is there a function to check the owner of a role, it would be nice 
to be able to check with \du or pg_roles view.


No, but that is a good idea.


These two ideas are implemented in v2.  Both \du and pg_roles show the
owner information.

The current solution is to run REASSIGN OWNED in each database where 
the role owns objects before running DROP ROLE. At that point, the 
CASCADE option (not implemented) won't be needed.  Of course, I need 
to post the next revision of this patch set addressing the 
deficiencies that Nathan pointed out upthread to make that work.


REASSIGN OWNED and ALTER ROLE..OWNER TO now work in v2.


When ALTER ROLE changes the REPLICATION attribute, only superuser is
checked.
Therefore, we have a strange situation where we can create a replication
role but not change it.

---
postgres=> SELECT current_user;
 current_user
--
 test
(1 row)

postgres=> \du test
   List of roles
 Role name | Owner  |Attributes| Member of
---++--+---
 test  | shinya | Create role, Replication | {}

postgres=> CREATE ROLE test2 REPLICATION;
CREATE ROLE
postgres=> ALTER ROLE test2 NOREPLICATION;
2021-11-04 14:24:02.687 JST [2615016] ERROR:  must be superuser to alter 
replication roles or change replication attribute
2021-11-04 14:24:02.687 JST [2615016] STATEMENT:  ALTER ROLE test2 
NOREPLICATION;
ERROR:  must be superuser to alter replication roles or change 
replication attribute

---
Wouldn't it be better to check if the role has CREATEROLE and 
REPLICATION?

The same is true for BYPASSRLS.
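
As a sketch of what I mean (the helper names here are assumptions for
illustration, not the actual user.c code):

/*
 * Sketch only: allow a non-superuser to toggle REPLICATION provided it
 * has both CREATEROLE and REPLICATION itself.
 * have_createrole_privilege() and has_rolreplication() are assumed
 * helpers for this example.
 */
if (!superuser() &&
	!(have_createrole_privilege() && has_rolreplication(GetUserId())))
	ereport(ERROR,
			(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
			 errmsg("must be superuser, or have CREATEROLE and REPLICATION, to change the replication attribute")));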

By the way, is this thread registered to CommitFest?

--
Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Data is copied twice when specifying both child and parent table in publication

2021-11-04 Thread Greg Nancarrow
On Thu, Nov 4, 2021 at 3:13 PM Amit Kapila  wrote:
>
> On further thinking about this, I think we should define the behavior
> of replication among partitioned (on the publisher) and
> non-partitioned (on the subscriber) tables a bit more clearly.
>
> - If the "publish_via_partition_root" is set for a publication then we
> can always replicate to the table with the same name as the root table
> in publisher.
> - If the "publish_via_partition_root" is *not* set for a publication
> then we can always replicate to the tables with the same name as the
> non-root tables in publisher.
>
> Thoughts?
>

I'd adjust that wording slightly, because "we can always replicate to
..." sounds a bit vague, and saying that an option is set or not set
could be misinterpreted, as the option could be "set" to false.

How about:

- If "publish_via_partition_root" is true for a publication, then data
is replicated to the table with the same name as the root (i.e.
partitioned) table in the publisher.
- If "publish_via_partition_root" is false (the default) for a
publication, then data is replicated to tables with the same name as
the non-root (i.e. partition) tables in the publisher.

?

Regards,
Greg Nancarrow
Fujitsu Australia