Re: Regex performance regression induced by match-all code

2021-05-01 Thread Joel Jacobson
On Sat, May 1, 2021, at 21:46, Tom Lane wrote:
> I found a nasty performance problem in commit 824bf7190: given the
> right sort of regex, checkmatchall() takes an unreasonable amount
> of time.

Nice catch.

> fix-exponential-cost-of-checkmatchall-1.patch

I've successfully tested the patch on the regex corpus:

SELECT
  is_match <> (subject ~ pattern),
  captured IS DISTINCT FROM regexp_match(subject, pattern, flags),
  COUNT(*)
FROM performance_test
GROUP BY 1,2
ORDER BY 1,2;
 ?column? | ?column? |  count
----------+----------+---------
 f        | f        | 3253889
(1 row)

HEAD (651d005e76bc0b9542615f609b4d0d946035dc58)
Time: 94096.020 ms (01:34.096)
Time: 93102.287 ms (01:33.102)
Time: 93333.746 ms (01:33.334)

fix-exponential-cost-of-checkmatchall-1.patch
Time: 95247.529 ms (01:35.248)
Time: 92617.502 ms (01:32.618)
Time: 93259.700 ms (01:33.260)

I've also tested the problematic type of regexes:

HEAD (651d005e76bc0b9542615f609b4d0d946035dc58)
SELECT regexp_matches('', '(.|){20}','');
Time: 61.613 ms
SELECT regexp_matches('', '(.|){25}','');
Time: 1928.674 ms (00:01.929)
SELECT regexp_matches('', '(.|){27}','');
Time: 7789.601 ms (00:07.790)

fix-exponential-cost-of-checkmatchall-1.patch
SELECT regexp_matches('', '(.|){20}','');
Time: 0.965 ms
SELECT regexp_matches('', '(.|){25}','');
Time: 0.586 ms
SELECT regexp_matches('', '(.|){27}','');
Time: 0.788 ms

Nice improvement, thanks.

/Joel

Re: Some oversights in query_id calculation

2021-05-01 Thread Julien Rouhaud
Hi Aleksander,

On Wed, Apr 28, 2021 at 03:22:39PM +0300, Aleksander Alekseev wrote:
> Hi Julien,
> 
> > You should see failures doing a check-world or simply a make -C
> > contrib/pg_stat_statements check
> 
> Sorry, my bad. I was running make check-world, but did it with the -j4 flag,
> which was a mistake.
> 
> The patch is OK.

Thanks for reviewing!




Re: WIP: WAL prefetch (another approach)

2021-05-01 Thread Tom Lane
Thomas Munro  writes:
> On Thu, Apr 29, 2021 at 4:45 AM Tom Lane  wrote:
>> Andres Freund  writes:
>>> Tom, any chance you could check if your machine repros the issue before
>>> these commits?

>> Wilco, but it'll likely take a little while to get results ...

> FWIW I also chewed through many megawatts trying to reproduce this on
> a PowerPC system in 64 bit big endian mode, with an emulator.  No
> cigar.  However, it's so slow that I didn't make it to 10 runs...

So I've expended a lot of kilowatt-hours over the past several days,
and I've got results that are interesting but don't really get us
any closer to a resolution.

To recap, the test lashup is:
* 2003 PowerMac G4 (1.25GHz PPC 7455, 7200 rpm spinning-rust drive)
* Standard debug build (--enable-debug --enable-cassert)
* Out-of-the-box configuration, except add wal_consistency_checking = all
and configure a wal-streaming standby on the same machine
* Repeatedly run "make installcheck-parallel", but skip the tablespace
test to avoid issues with the standby trying to use the same directory
* Delay long enough after each installcheck-parallel to let the 
standby catch up (the run proper is ~24 min, plus 2 min for catchup)

The failures I'm seeing generally look like

2021-05-01 15:33:10.968 EDT [8281] FATAL:  inconsistent page found, rel 
1663/58186/66338, forknum 0, blkno 19
2021-05-01 15:33:10.968 EDT [8281] CONTEXT:  WAL redo at 3/4CE905B8 for 
Gist/PAGE_UPDATE: ; blkref #0: rel 1663/58186/66338, blk 19 FPW

with a variety of WAL record types being named, so it doesn't seem
to be specific to any particular record type.  I've twice gotten the
bogus-checksum-and-then-assertion-failure I reported before:

2021-05-01 17:07:52.992 EDT [17464] LOG:  incorrect resource manager data 
checksum in record at 3/E0073EA4
TRAP: FailedAssertion("state->recordRemainLen > 0", File: "xlogreader.c", Line: 
567, PID: 17464)

In both of those cases, the WAL on disk was perfectly fine, and the same
is true of most of the "inconsistent page" complaints.  So the issue
definitely seems to be about the startup process mis-reading data that
was correctly shipped over.

Anyway, the new and interesting data concerns the relative failure rates
of different builds:

* Recent HEAD (from 4-28 and 5-1): 4 failures in 8 test cycles

* Reverting 1d257577e: 1 failure in 8 test cycles

* Reverting 1d257577e and f003d9f87: 3 failures in 28 cycles

* Reverting 1d257577e, f003d9f87, and 323cbe7c7: 2 failures in 93 cycles

That last point means that there was some hard-to-hit problem even
before any of the recent WAL-related changes.  However, 323cbe7c7
(Remove read_page callback from XLogReader) increased the failure
rate by at least a factor of 5, and 1d257577e (Optionally prefetch
referenced data) seems to have increased it by another factor of 4.
But it looks like f003d9f87 (Add circular WAL decoding buffer)
didn't materially change the failure rate.

Considering that 323cbe7c7 was supposed to be just refactoring,
and 1d257577e is allegedly disabled-by-default, these are surely
not the results I was expecting to get.

It seems like it's still an open question whether all this is
a real bug, or flaky hardware.  I have seen occasional kernel
freezeups (or so I think -- machine stops responding to keyboard
or network input) over the past year or two, so I cannot in good
conscience rule out the flaky-hardware theory.  But it doesn't
smell like that kind of problem to me.  I think what we're looking
at is a timing-sensitive bug that was there before (maybe long
before?) and these commits happened to make it occur more often
on this particular hardware.  This hardware is enough unlike
anything made in the past decade that it's not hard to credit
that it'd show a timing problem that nobody else can reproduce.

(I did try the time-honored ritual of reseating all the machine's
RAM, partway through this.  Doesn't seem to have changed anything.)

Anyway, I'm not sure where to go from here.  I'm for sure nowhere
near being able to identify the bug --- and if there really is
a bug that formerly had a one-in-fifty reproduction rate, I have
zero interest in trying to identify where it started by bisecting.
It'd take at least a day per bisection step, and even that might
not be accurate enough.  (But, if anyone has ideas of specific
commits to test, I'd be willing to try a few.)

regards, tom lane




Re: Enhanced error message to include hint messages for redundant options error

2021-05-01 Thread Alvaro Herrera
On 2021-May-01, Bharath Rupireddy wrote:

> IMO, it's not good to change the function API just for showing up
> parse_position (which is there for cosmetic reasons I feel) in an error
> which actually has the option name clearly mentioned in the error message.

Why not?  We've done it before, I'm sure you can find examples in the
git log.  The function API is not that critical -- these functions are
mostly only called from ProcessUtility and friends.

-- 
Álvaro Herrera               39°49'30"S 73°17'W




Re: Granting control of SUSET gucs to non-superusers

2021-05-01 Thread Michael Banck
Hi,

On Fri, Apr 30, 2021 at 04:19:22PM -0700, Mark Dilger wrote:
> PostgreSQL defines a number of GUCs that can only be set by
> superusers.  I would like to support granting privileges on subsets of
> these to non-superuser roles, inspired by Stephen Frost's recent work
> on pg_read_all_data and pg_write_all_data roles.
> 
> The specific use case motivating this work is that of a PostgreSQL
> service provider.  The provider guarantees certain aspects of the
> service, such as periodic backups, replication, uptime, availability,
> etc., while making no guarantees of other aspects, such as performance
> associated with the design of the schema or the queries executed.  The
> provider should be able to grant to the tenant privileges to set any
> GUC which cannot be used to "escape the sandbox" and interfere with
> the handful of metrics being guaranteed.  Given that the guarantees
> made by one provider may differ from those made by another, the exact
> set of GUCs which the provider allows the tenant to control may
> differ.
> 
> By my count, there are currently 50 such GUCs, already broken down
> into 15 config groups.  Creating a single new role pg_set_all_gucs
> seems much too coarse a control, but creating 50 new groups may be
> excessive.  We could certainly debate which GUCs could be used to
> escape the sandbox vs. which ones could not, but I would prefer a
> design that allows the provider to make that determination.  The patch
> I would like to submit would only give the provider the mechanism for
> controlling these things, but would not make the security choices for
> them.
> 
> Do folks think it would make sense to create a role per config group?
> Adding an extra 15 default roles seems high to me, but organizing the
> feature this way would make the roles easier to document, because
> there would be a one-to-one correlation between the roles and the
> config groups.
> 
> I have a WIP patch that I'm not attaching, but if I get any feedback,
> I might be able to adjust the patch before the first version posted.
> The basic idea is that it allows things like:
> 
> GRANT pg_set_stats_monitoring TO tenant_role;
> 
> And then tenant_role could, for example
> 
> SET log_parser_stats TO off;

Just saying, I've proposed something very similar, albeit for a narrower
scope (mostly the Reporting and Logging category) here:
https://www.postgresql.org/message-id/flat/c2ee39152957af339ae6f3e851aef09930dd2faf.ca...@credativ.de


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz




Re: performance benchmark

2021-05-01 Thread Yu Zhao
On Sat, May 1, 2021 at 12:55 AM Yu Zhao  wrote:
>
> Greetings,
>
> We are proposing a patchset to the Linux kernel, which brings very
> promising performance improvements under memory pressure. I'm
> wondering if anybody would be interested in benchmarking it on top of
> the latest 5.12 kernel.
> https://lore.kernel.org/linux-mm/20210413065633.2782273-1-yuz...@google.com/
>
> My knowledge in PostgreSQL is minimal. But I'm happy to provide any
> assistance you might need from the kernel side.
>
> Thanks in advance for any expertise you are able to provide. I look
> forward to hearing from you.

Sorry about the cross-posting. I didn't know it was prohibited until my
previous message got rejected.

I have removed pgsql-performance@, because pgsql-hackers@ seems more
relevant to me. Please let me know if I got it wrong again.

Thanks!




Re: Re: Perform COPY FROM encoding conversions in larger chunks

2021-05-01 Thread Chapman Flack
On 04/01/21 05:27, Heikki Linnakangas wrote:
> I read through the patches one more time, made a few small comment fixes,
> and pushed.

Wow, this whole thread escaped my attention at the time, though my ears
would have perked right up if the subject had been something like
'improve encoding conversion API to stream a buffer at a time'. I think
this is of great interest beyond one particular use case in COPY FROM.
For example, it could limit the allocations needed when streaming a large
text value out to a client; it might be used to advantage with the recent
work in incrementally detoasting large values, and so on.

This part seems a little underdeveloped:

> * TODO: The conversion function interface is not great.  Firstly, it
> * would be nice to pass through the destination buffer size to the
> * conversion function, so that if you pass a shorter destination buffer, it
> * could still continue to fill up the whole buffer.  Currently, we have to
> * assume worst case expansion and stop the conversion short, even if there
> * is in fact space left in the destination buffer.  Secondly, it would be
> * nice to return the number of bytes written to the caller, to avoid a call
> * to strlen().

If I understand correctly, this patch already makes a breaking change to
the conversion function API. If that's going to be the case anyway, I wonder
if it's worth going further and changing the API further to eliminate this
odd limitation.

There seems to be a sort of common shape that conversion APIs have evolved
toward, that can be seen in both the ICU4C converters [0] and in Java's [1].
This current tweak to our conversion API seems to get almost there,
but just not quite. For example, noError allows us to keep control when
the function has stopped converting, but we don't find out why it
stopped.

If we just went the rest of the way and structured the API like those
existing ones, then:

- it would be super easy to write wrappers around ICU4C converters, if
  there were any we wanted to use;

- I could very easily write wrappers presenting any PG-supported charset
  as a Java charset.

The essence of the API common to ICU4C and Java is this:

1. You pass the function the address and length of a source buffer,
   the address and length of a destination buffer, and a flag that is true
   if you know there is no further input where this source buffer came from.
   (It's allowable to pass false and only then discover you really have no
   more input after all; then you just make one final call passing true.)

2. The function eats as much as it can of the source buffer, fills as much
   as it can of the destination buffer, and returns indicating one of four
   reasons it stopped:

   underflow - ran out of source buffer
   overflow  - ran out of destination buffer
   malformed - something in source buffer isn't valid in that representation
   unmappable - a valid codepoint not available in destination encoding

   Based on that, the caller refills the source buffer, or drains the
   destination buffer, or handles or reports the malformed or unmappable,
   then repeats.

3. The function should update pointers on return to indicate how much
   of the source buffer it consumed and how much of the destination buffer
   it filled.

4. If it left any bytes unconsumed in the source buffer, the caller must
   preserve them (perhaps moved to the front) for the next call.

5. The converter can have internal state (so it is an object in Java, or
   has a UConverter struct allocated in ICU4C, to have a place for its
   state). The state gets flushed on the final call where the flag is
   passed true. In many cases, the converter can be implemented without
   keeping internal state, if it simply leaves, for example, an
   incomplete sequence at the end of the source buffer unconsumed, so the
   caller will move it to the front and supply the rest. On the other hand,
   any unconsumed input after the final call with flush flag true must be
   treated as malformed.

6. On a malformed or unmappable return, the source buffer is left pointed
   at the start of the offending sequence and the length in bytes of
   that sequence is available for error reporting/recovery.

The efficient handling of states, returning updated pointers, and so on,
probably requires a function signature with 'internal' in it ... but
the current function signature already has 'internal', so that doesn't
seem like a deal-breaker.
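
To make that concrete, here is a minimal sketch of what such a signature
could look like for us.  This is purely illustrative -- the type and
function names and the exact parameter layout below are invented for
discussion, not a proposal for the actual declaration:

typedef enum pg_conv_result
{
    PG_CONV_UNDERFLOW,          /* consumed all of the source buffer */
    PG_CONV_OVERFLOW,           /* ran out of room in the destination */
    PG_CONV_MALFORMED,          /* invalid sequence in the source encoding */
    PG_CONV_UNMAPPABLE          /* valid codepoint, unrepresentable in dest */
} pg_conv_result;

/*
 * Convert as much of the source as possible into the destination.
 * On return, *src and *dest have been advanced past the consumed and
 * produced bytes (point 3).  With is_final true, buffered state is
 * flushed and leftover partial input is treated as malformed (point 5).
 * On MALFORMED/UNMAPPABLE, *src is left pointing at the offending
 * sequence and *badlen gives its length for error reporting (point 6).
 */
pg_conv_result pg_do_conversion(void *state,
                                const unsigned char **src, size_t srclen,
                                unsigned char **dest, size_t destlen,
                                bool is_final, size_t *badlen);

The caller's loop (point 2) then drains or refills as indicated, calling
the function again until it reports underflow on the final call.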


Thoughts? It seems a shame to make a breaking change in the conversion
API, only to still end up with an API that "is not great" and is still
impedance-mismatched to other existing prominent conversion APIs.

Regards,
-Chap


[0]
https://unicode-org.github.io/icu/userguide/conversion/converters.html#3-buffered-or-streamed

[1]
https://docs.oracle.com/javase/9/docs/api/java/nio/charset/CharsetDecoder.html#decode-java.nio.ByteBuffer-java.nio.CharBuffer-boolean-




Regex performance regression induced by match-all code

2021-05-01 Thread Tom Lane
I found a nasty performance problem in commit 824bf7190: given the
right sort of regex, checkmatchall() takes an unreasonable amount
of time.  For example,

regression=# SELECT regexp_matches('', '(.|){20}','');
 regexp_matches 
----------------
 {""}
(1 row)

Time: 129.213 ms
regression=# SELECT regexp_matches('', '(.|){25}','');
 regexp_matches 
----------------
 {""}
(1 row)

Time: 4101.416 ms (00:04.101)
regression=# SELECT regexp_matches('', '(.|){30}','');
 regexp_matches 
----------------
 {""}
(1 row)

Time: 130803.927 ms (02:10.804)

That's quite awful compared to v13, where these cases take
only a couple ms.

Worse still, you can't get out of it with control-C, because
checkmatchall_recurse lacks any check-for-interrupt.

The problem here is basically that we're willing to recursively
examine all paths out of the same NFA state over and over, once for
each possible way of arriving at that state.  That's dumb and we can
do better, though it takes a little more code and some more memory.
The attached patch applies a few different methods to make this
better:

* Before starting the recursive search, do a linear-time pass
through all the states to check whether there are any non-RAINBOW
arcs.  This allows failing fast for most non-matchall NFAs.

* Memo-ize the results of the recursive search, by storing an
array of possible path lengths for each state after we've examined
it once.

* Rewrite the checks for pseudocolor arcs to make them linear
time rather than O(N^2) --- I decided I'd better do that too,
after noting that the problem cases had fairly large numbers
of pre-state outarcs.  This makes them cheap enough to do
before the recursive search not after.

* Put a heuristic upper bound on the number of NFA states for
which we'll attempt this optimization at all.  The main reason
for this is to bound the amount of memory we can expend on
memoization results.  I think that it will result in little
if any degradation in our ability to recognize matchall NFAs,
because of the existing restriction that we can't represent
cases involving path lengths that are finite but more than DUPINF.
If there are a lot more than DUPINF states then (I think) it becomes
pretty likely that we'd fail due to that restriction anyhow.
However, I've not made much attempt to quantify that argument;
I just chose DUPINF * 4 out of the air.

* Just in case that's not enough to fix things, add a cancel check
within checkmatchall_recurse.
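
To illustrate the memoization part in code form, the search ends up shaped
roughly like this (an illustrative sketch only, not the patch itself: it
assumes an acyclic graph, and it glosses over cycle handling, the pre/post
transitions, and the exact DUPINF saturation semantics):

static bool
checkmatchall_sketch(struct nfa *nfa, struct state *s, bool **pathmemo)
{
    struct arc *a;
    bool       *lens;
    int         i;

    if (pathmemo[s->no] != NULL)
        return true;            /* this state was already analyzed */

    /* illustrative allocation; real code uses the regex MALLOC wrappers */
    lens = (bool *) palloc0((DUPINF + 1) * sizeof(bool));
    pathmemo[s->no] = lens;

    if (s == nfa->post)
    {
        lens[0] = true;         /* zero further arcs needed */
        return true;
    }

    for (a = s->outs; a != NULL; a = a->outchain)
    {
        bool       *sub;

        if (a->co != RAINBOW)
            return false;       /* not a matchall graph after all */
        if (!checkmatchall_sketch(nfa, a->to, pathmemo))
            return false;
        sub = pathmemo[a->to->no];
        /* a feasible length i from the successor is length i+1 from here */
        for (i = 0; i < DUPINF; i++)
            if (sub[i])
                lens[i + 1] = true;
        if (sub[DUPINF])
            lens[DUPINF] = true;
    }
    return true;
}

Each state's out-arcs are thus examined once, not once per arrival path.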

The main thing I find a bit ugly about this solution is that
I'm abusing the state->tmp fields (which are declared struct state *)
to hold the memoization results (which are "bool *").  It'd be
possible to avoid that by allocating an additional "bool **" array
indexed by state number, but whether it's worth that depends on how
allergic you are to weird pointer casts.

Comments?

regards, tom lane

diff --git a/src/backend/regex/regc_nfa.c b/src/backend/regex/regc_nfa.c
index 77b860cb0f..04920fa4e3 100644
--- a/src/backend/regex/regc_nfa.c
+++ b/src/backend/regex/regc_nfa.c
@@ -3032,46 +3032,109 @@ analyze(struct nfa *nfa)
 static void
 checkmatchall(struct nfa *nfa)
 {
-	bool		hasmatch[DUPINF + 1];
+	struct state *s;
+	bool		recur_result;
+	bool	   *hasmatch;
 	int			minmatch,
 maxmatch,
 morematch;
 
 	/*
-	 * hasmatch[i] will be set true if a match of length i is feasible, for i
-	 * from 0 to DUPINF-1.  hasmatch[DUPINF] will be set true if every match
-	 * length of DUPINF or more is feasible.
+	 * If there are too many states, don't bother trying to detect matchall.
+	 * This limit serves to bound the time and memory we could consume below.
+	 * Note that even if the graph is all-RAINBOW, if there are significantly
+	 * more than DUPINF states then it's likely that there are paths of length
+	 * more than DUPINF, which would force us to fail anyhow.  The exact
+	 * cutoff used here is a bit arbitrary.
 	 */
-	memset(hasmatch, 0, sizeof(hasmatch));
+	if (nfa->nstates > DUPINF * 4)
+		return;
 
 	/*
-	 * Recursively search the graph for all-RAINBOW paths to the "post" state,
-	 * starting at the "pre" state.  The -1 initial depth accounts for the
-	 * fact that transitions out of the "pre" state are not part of the
-	 * matched string.  We likewise don't count the final transition to the
-	 * "post" state as part of the match length.  (But we still insist that
-	 * those transitions have RAINBOW arcs, otherwise there are lookbehind or
-	 * lookahead constraints at the start/end of the pattern.)
+	 * First, scan all the states to verify that only RAINBOW arcs appear,
+	 * plus pseudocolor arcs adjacent to the pre and post states.  This lets
+	 * us quickly eliminate most cases that aren't matchall NFAs.
 	 */
-	if (!checkmatchall_recurse(nfa, nfa->pre, false, -1, hasmatch))
-		return;
+	for (s = nfa->states; s != NULL; s = s->next)
+	{
+		struct arc *a;
+
+		for (a = s->outs; a != NULL; a = a->outchain)
+		{
+			if (a->type != PLAIN)
+return;			/* any 

Re: Enhanced error message to include hint messages for redundant options error

2021-05-01 Thread Bharath Rupireddy
On Sat, May 1, 2021, 10:54 PM Alvaro Herrera 
wrote:

> On 2021-May-01, vignesh C wrote:

> On Thu, Apr 29, 2021 at 10:44 PM Alvaro Herrera 
> wrote:
> > >
> > > On 2021-Apr-29, vignesh C wrote:
> > >
> > > > Thanks for the comments, please find the attached v3 patch which has
> > > > the change for the first part.
> > >
> > > Looks good to me.  I would only add parser_errposition() to the few
> > > error sites missing that.
> >
> > I have not included parser_errposition as ParseState was not available
> > for these errors.
>
> Yeah, it's tough to do that in a few of those such as validator
> functions, and I don't think we'd want to do that.  However there are
> some cases where we can easily add the parsestate as an argument -- for
> example CreatePublication can get it in ProcessUtilitySlow and pass it
> down to parse_publication_options; likewise for ExecuteDoStmt.  I didn't
> check other places.
>

IMO, it's not good to change the function API just for showing up
parse_position (which is there for cosmetic reasons I feel) in an error
which actually has the option name clearly mentioned in the error message.

Best Regards,
Bharath Rupireddy.
EnterpriseDB.



Re: Enhanced error message to include hint messages for redundant options error

2021-05-01 Thread Alvaro Herrera
On 2021-May-01, vignesh C wrote:

> On Thu, Apr 29, 2021 at 10:44 PM Alvaro Herrera  
> wrote:
> >
> > On 2021-Apr-29, vignesh C wrote:
> >
> > > Thanks for the comments, please find the attached v3 patch which has
> > > the change for the first part.
> >
> > Looks good to me.  I would only add parser_errposition() to the few
> > error sites missing that.
> 
> I have not included parser_errposition as ParseState was not available
> for these errors.

Yeah, it's tough to do that in a few of those such as validator
functions, and I don't think we'd want to do that.  However there are
some cases where we can easily add the parsestate as an argument -- for
example CreatePublication can get it in ProcessUtilitySlow and pass it
down to parse_publication_options; likewise for ExecuteDoStmt.  I didn't
check other places.

-- 
Álvaro Herrera               39°49'30"S 73°17'W




Re: [PATCH] Identify LWLocks in tracepoints

2021-05-01 Thread Dmitry Dolgov
> On Fri, Apr 30, 2021 at 11:23:56AM +0800, Craig Ringer wrote:
> On Wed, 14 Apr 2021, 22:29 Robert Haas,  wrote:
> 
> > > I'm actually inclined to revise the patch I sent in order to *remove*
> > > the LWLock name from the tracepoint argument.
> 
> > Reducing the overheads is good, but I have no opinion on what's
> > important for people doing tracing, because I am not one of those
> > people.
> >
> 
> Truthfully I'm not convinced anyone is "those people" right now. I don't
> think anyone is likely to be making serious use of them due to their
> limitations.

I would like to mention that tracepoints can be useful not only directly;
they also:

* deliver information about what the developers who wrote the code
  considered important enough to trace.

* declare more stable tracing points within the code, which are somewhat
  more reliable across versions.

E.g. when writing bcc scripts one is also somewhat limited in the use of those
tracepoints by the requirement to provide a specific pid, but one can still
get a better understanding of what to look at (maybe using other methods).




Re: Dump public schema ownership & seclabels

2021-05-01 Thread Zhihong Yu
On Sat, May 1, 2021 at 8:16 AM Asif Rehman  wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   tested, passed
> Documentation:not tested
>
> Hi,
>
> I have tested this patch. This patch still applies and it works well.
>
> Regards,
> Asif
>
> The new status of this patch is: Ready for Committer
>

For public-schema-comment-dump-v2.patch :

+   if (ncomments == 0)
+   {
+   comments = _comment;
+   ncomments = 1;
+   }
+   else if (strcmp(comments->descr, (fout->remoteVersion >= 80300 ?
+ "standard public schema" :
+ "Standard public schema")) == 0)
+   {
+   ncomments = 0;

Is it possible that, in the case ncomments > 0, there is more than one
comment?
If not, an assertion can be added in the second if block above, as sketched
below.
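
Concretely, based on the snippet quoted above, that could look like:

else if (strcmp(comments->descr, (fout->remoteVersion >= 80300 ?
  "standard public schema" :
  "Standard public schema")) == 0)
{
    Assert(ncomments == 1);    /* suggested: only the standard comment */
    ncomments = 0;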

Cheers


Re: Granting control of SUSET gucs to non-superusers

2021-05-01 Thread Chapman Flack
On 05/01/21 12:13, Mark Dilger wrote:
> Extra syntax for use by the service provider seems much easier to justify.
> 
> If the service provider can install extra role-aware check_hooks for gucs,
> and if the include directive for postgresql.conf can specify a role under
> which a postgresql.conf.tenant file is processed, then the tenant can port
> their application and their config file and the only things that should break
> are those things the provider has intentionally prohibited.

Maybe version 0 is where the provider just builds a shared object
to go in shared_preload_libraries. The provider has probably already
done a bunch of other stuff more challenging than that.

The GUC system would have to expose a way for the shared object to
chain extra_check_hooks off existing GUCs. An extra_check_hook can check
both the value and the role of the caller.
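
To sketch what I mean (illustrative only -- the check_hook signature below
is the one the GUC system has today, but the chaining mechanism and the
provider_admin_roleid lookup are hypothetical):

static bool
tenant_work_mem_check(int *newval, void **extra, GucSource source)
{
    /* provider_admin_roleid: hypothetical, looked up at load time */
    if (has_privs_of_role(GetUserId(), provider_admin_roleid))
        return true;            /* the provider's own role may set anything */

    /* tenants may raise work_mem only up to a provider-chosen cap */
    return *newval <= 1024 * 1024;    /* 1 GB, in kB units */
}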

The configfile syntax for include-with-a-role would be the only other
missing piece.

Version 0.5 is maybe where someone contributes code for such a shared
object that is somewhat general and configured by a yaml file, or
something. (That would probably be easier if an extra_check_hook accepts
the usual void *extra context argument that existing GUC hooks don't.)

Regards,
-Chap




Re: Transactions involving multiple postgres foreign servers, take 2

2021-05-01 Thread Zhihong Yu
On Fri, Apr 30, 2021 at 9:09 PM Masahiko Sawada 
wrote:

> On Wed, Mar 17, 2021 at 6:03 PM Zhihong Yu  wrote:
> >
> > Hi,
> > For v35-0007-Prepare-foreign-transactions-at-commit-time.patch :
>
> Thank you for reviewing the patch!
>
> >
> > With this commit, the foreign server modified within the transaction
> marked as 'modified'.
> >
> > transaction marked -> transaction is marked
>
> Will fix.
>
> >
> > +#define IsForeignTwophaseCommitRequested() \
> > +(foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
> >
> > Since the other enum is FOREIGN_TWOPHASE_COMMIT_REQUIRED, I think the
> macro should be named: IsForeignTwophaseCommitRequired.
>
> But even if foreign_twophase_commit is
> FOREIGN_TWOPHASE_COMMIT_REQUIRED, the two-phase commit is not used if
> there is only one modified server, right? It seems the name
> IsForeignTwophaseCommitRequested is fine.
>
> >
> > +static bool
> > +checkForeignTwophaseCommitRequired(bool local_modified)
> >
> > +   if (!ServerSupportTwophaseCommit(fdw_part))
> > +   have_no_twophase = true;
> > ...
> > +   if (have_no_twophase)
> > +   ereport(ERROR,
> >
> > It seems the error case should be reported within the loop. This way, we
> don't need to iterate the other participant(s).
> > Accordingly, nserverswritten should be incremented for local server
> prior to the loop. The condition in the loop would become if
> (!ServerSupportTwophaseCommit(fdw_part) && nserverswritten > 1).
> > have_no_twophase is no longer needed.
>
> Hmm, I think If we process one 2pc-non-capable server first and then
> process another one 2pc-capable server, we should raise an error but
> cannot detect that.
>

Then the check would stay as what you have in the patch:

  if (!ServerSupportTwophaseCommit(fdw_part))

When the non-2pc-capable server is encountered, we would report the error
in place (following the ServerSupportTwophaseCommit check) and come out of
the loop.
have_no_twophase can be dropped.

Thanks


>
> Regards,
>
> --
> Masahiko Sawada
> EDB:  https://www.enterprisedb.com/
>


Re: Granting control of SUSET gucs to non-superusers

2021-05-01 Thread Mark Dilger



> On May 1, 2021, at 7:07 AM, Chapman Flack  wrote:
> 
> On 04/30/21 22:00, Mark Dilger wrote:
>> Viewing all of this in terms of which controls allow the tenant to escape
>> a hypothetical sandbox seems like the wrong approach.  Shouldn't we let
>> service providers decide which controls would allow the tenant to escape
>> the specific sandbox the provider has designed?
> 
> I agree that sounds more like the right approach. It seems to me that
> in the general case, a provider might conclude that setting foo is
> safe in the provider-designed sandbox /if the value being assigned
> to it satisfies some provider-determined conditions/.

> So it seems the machinery is already in place with which a provider
> could allow a chosen set of SUSET-only GUCs to be set, to values that
> satisfy provider-determined conditions, by users in a provider-chosen
> role.

> Some pretty syntax like GRANT SETTING foo TO ROLE bar WHERE cond;
> would simply be sugar on top.

I agree with everything you say here.  I have some thoughts about usability.

I'd like the experience for the tenant to be as similar as possible to having 
superuser privileges on their own cluster.  The tenant may be migrating an 
application from a database that they currently manage themselves, and any need 
to use different syntax from what they have been using is an extra hurdle that 
could derail the migration.

Extra syntax for use by the service provider seems much easier to justify.

If the service provider can install extra role-aware check_hooks for gucs, and 
if the include directive for postgresql.conf can specify a role under which a 
postgresql.conf.tenant file is processed, then the tenant can port their 
application and their config file and the only things that should break are 
those things the provider has intentionally prohibited.

Does this sound like a reasonable approach?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: function for testing that causes the backend to terminate

2021-05-01 Thread Andrew Dunstan


On 4/29/21 4:16 PM, Joe Conway wrote:
> On 4/29/21 6:56 AM, Dave Cramer wrote:
>> For testing unusual situations I'd like to be able to cause a backend
>> to terminate due to something like a segfault. Do we currently have
>> this in testing ?
>
> If you can run SQL as a superuser from that backend, try:
>
> COPY (SELECT pg_backend_pid())
>  TO PROGRAM 'xargs kill -SIGSEGV';
>
> HTH,
>
> Joe
>


or a plperlu function that does 'kill 11, $$;' should do it.
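
Or, for a compiled-code route, a trivial C-language function does it too
(a hypothetical sketch, nothing that exists in the tree -- and obviously
only for throwaway test clusters):

#include "postgres.h"
#include "fmgr.h"
#include <signal.h>

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(crash_backend);

/* CREATE FUNCTION crash_backend() RETURNS void
       AS 'MODULE_PATHNAME' LANGUAGE C; */
Datum
crash_backend(PG_FUNCTION_ARGS)
{
    raise(SIGSEGV);             /* terminate this backend with a segfault */
    PG_RETURN_VOID();           /* not reached */
}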


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Enhanced error message to include hint messages for redundant options error

2021-05-01 Thread Bharath Rupireddy
On Sat, May 1, 2021 at 7:25 PM vignesh C  wrote:
> > > I'm not attaching above one line change as a patch, maybe Vignesh can
> > > merge this into the main patch.
>
> Thanks for the comments. I have merged the change into the attached patch.
> Thoughts?

Thanks! v4 basically LGTM. Can we park this in the current commitfest
if not done already?

Upon looking at the number of places where we have the "option \"%s\"
specified more than once" error, I, now strongly feel that we should
use goto duplicate_error approach like in compute_common_attribute, so
that we will have only one ereport(ERROR. We can change it in
following files: copy.c, dbcommands.c, extension.c,
compute_function_attributes, sequence.c, subscriptioncmds.c,
typecmds.c, user.c, walsender.c, pgoutput.c. This will reduce the LOC
greatly.
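
That is (an illustrative sketch with made-up option names, not a patch):

static void
parse_my_options(ParseState *pstate, List *options)
{
    DefElem    *opt_a = NULL;
    DefElem    *opt_b = NULL;
    DefElem    *defel = NULL;
    ListCell   *lc;

    foreach(lc, options)
    {
        defel = (DefElem *) lfirst(lc);

        if (strcmp(defel->defname, "opt_a") == 0)
        {
            if (opt_a)
                goto duplicate_error;
            opt_a = defel;
        }
        else if (strcmp(defel->defname, "opt_b") == 0)
        {
            if (opt_b)
                goto duplicate_error;
            opt_b = defel;
        }
    }
    return;

duplicate_error:
    ereport(ERROR,
            (errcode(ERRCODE_SYNTAX_ERROR),
             errmsg("option \"%s\" specified more than once", defel->defname),
             parser_errposition(pstate, defel->location)));
}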

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Dump public schema ownership & seclabels

2021-05-01 Thread Asif Rehman
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

Hi,

I have tested this patch. This patch still applies and it works well.

Regards,
Asif

The new status of this patch is: Ready for Committer


Re: Identify missing publications from publisher while create/alter subscription.

2021-05-01 Thread Bharath Rupireddy
On Sat, May 1, 2021 at 12:49 PM vignesh C  wrote:
> Thanks for the comments, Attached patch has the fixes for the same.
> Thoughts?

Few more comments on v5:

1) The deletion of the empty line below is spurious:
-
 /*
  * Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
  *

2) I think we could just do as below to save indentation of the code
for validate_publication == true.
static void
+connect_and_check_pubs(Subscription *sub, List *publications,
+   bool validate_publication)
+{
+char   *err;
+
+if (validate_publication == false)
+return;
+
+/* Load the library providing us libpq calls. */
+load_file("libpqwalreceiver", false);

3) To be consistent, either we pass in validate_publication to both
connect_and_check_pubs and check_publications, return immediately from
them if it is false or do the checks outside. I suggest to pass in the
bool parameter to check_publications like you did for
connect_and_check_pubs. Or remove validate_publication from
connect_and_check_pubs and do the check outside.
+if (validate_publication)
+check_publications(wrconn, publications);
+if (check_pub)
+check_publications(wrconn, sub->publications);

4) Below line of code is above 80-char limit:
+else if (strcmp(defel->defname, "validate_publication") == 0
&& validate_publication)

5) Instead of adding a new file 021_validate_publications.pl for
tests, spawning a new test database which would make the overall
regression slower, can't we add with the existing database nodes in
0001_rep_changes.pl? I would suggest adding the tests in there even if
the number of tests are many, I don't mind.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Granting control of SUSET gucs to non-superusers

2021-05-01 Thread Chapman Flack
On 04/30/21 22:00, Mark Dilger wrote:
> Viewing all of this in terms of which controls allow the tenant to escape
> a hypothetical sandbox seems like the wrong approach.  Shouldn't we let
> service providers decide which controls would allow the tenant to escape
> the specific sandbox the provider has designed?

I agree that sounds more like the right approach. It seems to me that
in the general case, a provider might conclude that setting foo is
safe in the provider-designed sandbox /if the value being assigned
to it satisfies some provider-determined conditions/.

On 04/30/21 20:02, Chapman Flack wrote:
> So that suggests to me some mechanism where a provider could grant
> setting foo to role bar using validator baz().
>
> Can SUSET GUCs be set from SECURITY DEFINER functions? Maybe there are
> already the pieces to do that, minus some syntax sugar.

The answer seems to be yes: I just created a SECURITY DEFINER function
and used it to change a SUSET-only GUC setting.

So it seems the machinery is already in place with which a provider
could allow a chosen set of SUSET-only GUCs to be set, to values that
satisfy provider-determined conditions, by users in a provider-chosen
role.

Some pretty syntax like GRANT SETTING foo TO ROLE bar WHERE cond;
would simply be sugar on top.

Regards,
-Chap




Re: Enhanced error message to include hint messages for redundant options error

2021-05-01 Thread vignesh C
On Thu, Apr 29, 2021 at 10:44 PM Alvaro Herrera  wrote:
>
> On 2021-Apr-29, vignesh C wrote:
>
> > Thanks for the comments, please find the attached v3 patch which has
> > the change for the first part.
>
> Looks good to me.  I would only add parser_errposition() to the few
> error sites missing that.

I have not included parser_errposition as ParseState was not available
for these errors.
Thoughts?

Regards,
Vignesh




Re: Enhanced error message to include hint messages for redundant options error

2021-05-01 Thread vignesh C
On Sat, May 1, 2021 at 10:47 AM Dilip Kumar  wrote:
>
> On Sat, May 1, 2021 at 10:43 AM Bharath Rupireddy
>  wrote:
> >
> > On Fri, Apr 30, 2021 at 2:49 PM Dilip Kumar  wrote:
> > > Looking into this again, why not as shown below?  IMHO, this way the
> > > code will be logically the same as it was before the patch, basically
> > > why to process an extra statement ( *volatility_item = defel;) if we
> > > have already decided to error.
> >
> > I changed my mind given the concerns raised on removing the goto
> > statements. We could just do as below:
>
> Okay, that make sense.
>
> > diff --git a/src/backend/commands/functioncmds.c
> > b/src/backend/commands/functioncmds.c
> > index 9548287217..1f1c74c379 100644
> > --- a/src/backend/commands/functioncmds.c
> > +++ b/src/backend/commands/functioncmds.c
> > @@ -575,7 +575,7 @@ compute_common_attribute(ParseState *pstate,
> >  duplicate_error:
> >  ereport(ERROR,
> >  (errcode(ERRCODE_SYNTAX_ERROR),
> > - errmsg("conflicting or redundant options"),
> > + errmsg("option \"%s\" specified more than once", 
> > defel->defname),
> >   parser_errposition(pstate, defel->location)));
> >  return false;/* keep compiler quiet */
> >
> > I'm not attaching above one line change as a patch, maybe Vignesh can
> > merge this into the main patch.

Thanks for the comments. I have merged the change into the attached patch.
Thoughts?

Regards,
Vignesh
From b76a67be6c162d064f0a9cecd3ebc66d6c8fcbd9 Mon Sep 17 00:00:00 2001
From: vignesh 
Date: Mon, 26 Apr 2021 18:40:36 +0530
Subject: [PATCH v4] Enhance error message.

Enhanced error message, so that the user can easily identify the error.
---
 contrib/file_fdw/file_fdw.c |  6 +--
 src/backend/catalog/aclchk.c|  4 +-
 src/backend/commands/copy.c | 23 +-
 src/backend/commands/dbcommands.c   | 28 ++--
 src/backend/commands/extension.c|  8 ++--
 src/backend/commands/foreigncmds.c  |  4 +-
 src/backend/commands/functioncmds.c | 14 +++---
 src/backend/commands/publicationcmds.c  |  4 +-
 src/backend/commands/sequence.c | 18 
 src/backend/commands/subscriptioncmds.c | 18 
 src/backend/commands/tablecmds.c|  2 +-
 src/backend/commands/typecmds.c | 14 +++---
 src/backend/commands/user.c | 48 ++---
 src/backend/parser/parse_utilcmd.c  |  2 +-
 src/backend/replication/pgoutput/pgoutput.c | 10 ++---
 src/backend/replication/walsender.c |  6 +--
 src/test/regress/expected/copy2.out | 24 ++-
 src/test/regress/expected/foreign_data.out  |  4 +-
 src/test/regress/expected/publication.out   |  2 +-
 19 files changed, 120 insertions(+), 119 deletions(-)

diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 2c2f149fb0..10aa2fca28 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -292,8 +292,7 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 			if (force_not_null)
 ereport(ERROR,
 		(errcode(ERRCODE_SYNTAX_ERROR),
-		 errmsg("conflicting or redundant options"),
-		 errhint("Option \"force_not_null\" supplied more than once for a column.")));
+		 errmsg("option \"%s\" specified more than once", def->defname)));
 			force_not_null = def;
 			/* Don't care what the value is, as long as it's a legal boolean */
 			(void) defGetBoolean(def);
@@ -304,8 +303,7 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 			if (force_null)
 ereport(ERROR,
 		(errcode(ERRCODE_SYNTAX_ERROR),
-		 errmsg("conflicting or redundant options"),
-		 errhint("Option \"force_null\" supplied more than once for a column.")));
+		 errmsg("option \"%s\" specified more than once", def->defname)));
 			force_null = def;
 			(void) defGetBoolean(def);
 		}
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index e1573eb398..7885587bfc 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -923,7 +923,7 @@ ExecAlterDefaultPrivilegesStmt(ParseState *pstate, AlterDefaultPrivilegesStmt *s
 			if (dnspnames)
 ereport(ERROR,
 		(errcode(ERRCODE_SYNTAX_ERROR),
-		 errmsg("conflicting or redundant options"),
+		 errmsg("option \"%s\" specified more than once", defel->defname),
 		 parser_errposition(pstate, defel->location)));
 			dnspnames = defel;
 		}
@@ -932,7 +932,7 @@ ExecAlterDefaultPrivilegesStmt(ParseState *pstate, AlterDefaultPrivilegesStmt *s
 			if (drolespecs)
 ereport(ERROR,
 		(errcode(ERRCODE_SYNTAX_ERROR),
-		 errmsg("conflicting or redundant options"),
+		 errmsg("option \"%s\" specified more than once", defel->defname),
 		 parser_errposition(pstate, defel->location)));
 			drolespecs = defel;
 		}
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 8265b981eb..2860f36f91 100644
--- 

Re: Incorrect snapshots while promoting hot standby node when 2PC is used

2021-05-01 Thread Andrey Borodin
Hi Andres!

> 23 апр. 2021 г., в 01:36, Andres Freund  написал(а):
> 
> So snapshots within that window will always be "empty", i.e. xmin is
> latestCompletedXid and xmax is latestCompletedXid + 1. Once we reach 3), we'll
> look at the procarray, which then leads xmin going back to 588.
> 
> 
> I think that this can lead to data corruption, because a too new xmin horizon
> could lead to rows from a prepared transaction getting hint bitted as dead (or
> perhaps even pruned, although that's probably harder to hit). Due to the too
> new xmin horizon we won't treat rows by the prepared xact as in-progress, and
> TransactionIdDidCommit() will return false, as the transaction didn't commit
> yet. Which afaict can result in row versions created by the prepared
> transaction being invisible even after the prepared transaction commits.
> 
> Without prepared transaction there probably isn't an issue, because there
> shouldn't be any other in-progress xids at that point?

I'm investigating a somewhat similar case.
We have an OLTP sharded installation where shards are almost always under 
rebalancing. Data movement is implemented with 2pc.
Switchover happens quite often due to datacenter drills. The installation is 
running on PostgreSQL 12.6.

In January heapcheck of backup reported some suspicious problems
ERROR:  Page marked as all-frozen, but found non frozen tuple. 
Oid(relation)=18487, blkno(page)=1470240, offnum(tuple)=1
ERROR:  Page marked as all-frozen, but found non frozen tuple. 
Oid(relation)=18487, blkno(page)=1470241, offnum(tuple)=1
ERROR:  Page marked as all-frozen, but found non frozen tuple. 
Oid(relation)=18487, blkno(page)=1470242, offnum(tuple)=1
...
and so on for ~100 pages - tuples with lp==1 were not frozen.

We froze the tuples with pg_dirty_hands and ran VACUUM (DISABLE_PAGE_SKIPPING)
on the table.

At the end of March the same shard struck again with:
ERROR:  Page marked as all-frozen, but found non frozen tuple. 
Oid(relation)=18487, blkno(page)=1470240, offnum(tuple)=42

Around ~1040 blocks (starting from the same 1470240!) had a non-frozen tuple
at lp==42.
I've run
update s3.objects_65 set created = created where ctid = '(1470241,42)' 
returning *;

After that heapcheck showed VM problem
ERROR:  XX001: Found non all-visible tuple. Oid(relation)=18487, 
blkno(page)=1470240, offnum(tuple)=42
LOCATION:  collect_corrupt_items, heap_check.c:186

VACUUM fixed it with warnings.
WARNING: 01000: page is not marked all-visible but visibility map bit is set in 
relation "objects_65" page 1470240
and failed on next page
ERROR:  XX001: found xmin 1650436694 from before relfrozenxid 1752174172
LOCATION:  heap_prepare_freeze_tuple, heapam.c:6172

I ran an update on all tuples in heapcheck's ctid list and a subsequent vacuum
(without page skipping). This satisfied the corruption monitoring.


Can this case be related to the problem that you described?

Or, perhaps, it looks more like a hardware problem? data_checksums are on, but
a few years ago we observed SSD firmware that was losing updates while still
passing checksums. I'm sure that we would benefit from having a separate
relation fork for checksums or LSNs.


We observe similar cases 3-5 times a year. To date, no data has been lost due
to this, but it's somewhat annoying.
BTW I'd say that such things are an argument for back-porting pg_surgery and 
heapcheck to old versions.

Thanks!

Best regards, Andrey Borodin.





Hook for extensible parsing.

2021-05-01 Thread Julien Rouhaud
Hi,

Being able to extend core parser has been requested multiple times, and AFAICT
all previous attempts were rejected not because this isn't wanted but because
the proposed implementations required plugins to reimplement all of the core
grammar with their own changes, as bison generated parsers aren't extensible.

I'd like to propose an alternative approach, which is to allow multiple parsers
to coexist, and let third-party parsers optionally fallback on the core
parsers.  I'm sending this now as a follow-up of [1] and to avoid duplicated
efforts, as multiple people are interested in that topic.

Obviously, since this is only about parsing, all modules can only implement
some kind of syntactic sugar, as they have to produce valid parsetrees, but
this could be a first step to later allow custom nodes and let plugins
implement e.g. new UTILITY commands.

So, this approach should allow different custom parser implementations:

1 implement only a few new commands on top of core grammar.  For instance, an
  extension could add support for CREATE [PHYSICAL | LOGICAL] REPLICATION SLOT
  and rewrite that to a SelectStmt on top of the extisting function, or add a
  CREATE HYPOTHETICAL INDEX, which would internally add a new option in
  IndexStmt->options, to be intercepted in processUtility and bypass its
  execution with the extension approach instead.

2 implement a totally different grammar for a different language.  In case of
  error, just silently fallback to core parser (or another hook) so both
  parsers can still be used.  Any language could be parsed as long as you can
  produce a valid postgres parsetree.

3 implement a superset of the core grammar and replace core parser entirely.  This
  could arguably be done like the 1st case, but the idea is to avoid to
  possibly parse the same input string twice, or to forbid the core parser if
  that's somehow wanted.


I'm attaching some POC patches that implement this approach to start a
discussion.  I split the infrastructure part in 2 patches to make it easier to
review, and I'm also adding 2 other patches with a small parser implementation
to be able to test the infrastructure.  Here are some more details on the
patches and implementation details:

0001 simply adds a parser hook, which is called instead of raw_parser.  This is
enough to make multiple parser coexist with one exception: multi-statement
query string.  If multiple statements are provided, then all of them will be
parsed using the same grammar, which obviously won't work if they are written
for different grammars.
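
To make the intended usage concrete, a module would install the hook
roughly as below (an illustrative sketch: the hook type and variable
names depend on the patch, and my_grammar_parse stands in for whatever
grammar the module implements):

static parser_hook_type prev_parser_hook = NULL;

static List *
my_parser(const char *str, RawParseMode mode)
{
    /* try the module's own grammar first */
    List       *result = my_grammar_parse(str);

    if (result != NIL)
        return result;

    /* otherwise fall back to the next hook, or the core parser */
    if (prev_parser_hook)
        return prev_parser_hook(str, mode);
    return raw_parser(str, mode);
}

void
_PG_init(void)
{
    prev_parser_hook = parser_hook;
    parser_hook = my_parser;
}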

0002 implements a lame "sqlol" parser, based on LOLCODE syntax, with only the
ability to produce "select [col, ] col FROM table" parsetree, for testing
purpose.  I chose it to ensure that everything works properly even with a
totally different grammar that has different keywords, and which doesn't even
end statements with a semicolon but with a plain keyword.

0003 is where the real modifications are done to allow multi-statement string
to be parsed using different grammar.  It implements a new MODE_SINGLE_QUERY
mode, which is used when a parser_hook is present.  In that case,
pg_parse_query() will only parse part of the query string and loop until
everything is parsed (or some error happens).

pg_parse_query() will instruct plugins to parse a query at a time.  They're
free to ignore that mode if they want to implement the 3rd mode.  If so, they
should either return multiple RawStmt, a single RawStmt with a 0 or
strlen(query_string) stmt_len, or error out.  Otherwise, they will implement
either mode 1 or 2, and they should always return a List containing a single
RawStmt with properly set stmt_len, even if the underlying statement is NULL.
This is required to properly skip valid strings that don't contain a
statements, and pg_parse_query() will skip RawStmt that don't contain an
underlying statement.

It also teaches the core parser to do the same, by optionally start parsing
somewhere in the input string and stop parsing once a valid statement is found.

Note that the whole input string is provided to the parsers in order to report
correct cursor position, so all token can get a correct location.  This means
that raw_parser() signature needs an additional offset to know where the
parsing should start.
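
Put together, the core loop ends up looking roughly like this (again an
illustrative sketch, simplified to the single-statement-per-call case and
not the actual patch; parse_one_stmt_hook() stands in for invoking the
hooked parser, or raw_parser, with the offset argument described above):

List *
pg_parse_query(const char *query_string)
{
    List       *result = NIL;
    size_t      offset = 0;
    size_t      len = strlen(query_string);

    while (offset < len)
    {
        /* ask the (hooked) parser for one statement starting at offset */
        List       *stmts = parse_one_stmt_hook(query_string, offset);
        RawStmt    *stmt = linitial_node(RawStmt, stmts);

        /*
         * A RawStmt with a NULL statement marks a span of input (comments,
         * empty statements) that parsed to nothing; skip it, but still
         * advance past it using its stmt_len.
         */
        if (stmt->stmt != NULL)
            result = lappend(result, stmt);

        offset = stmt->stmt_location + stmt->stmt_len;
    }
    return result;
}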

Finally, 0004 modifies the sqlol parser to implement the MODE_SINGLE_QUERY
mode, adds grammar for creating views and adds some regression test to validate
proper parsing and error location reporting with multi-statements input string.

As far as I can tell it's all working as expected but I may have missed some
usecases.  The regression tests still work with the additional parser
configured.  The only difference is for pg_stat_statements, as in
MODE_SINGLE_QUERY the trailing semicolon has to be included in the statement,
since other grammars may understand semicolons differently.

The obvious drawback is that it can cause overhead, as the same input can be
parsed multiple times.  This could be 

Re: Identify missing publications from publisher while create/alter subscription.

2021-05-01 Thread vignesh C
On Tue, Apr 13, 2021 at 8:01 PM Bharath Rupireddy
 wrote:
>
> On Tue, Apr 13, 2021 at 6:22 PM vignesh C  wrote:
> > > 2) How about
> > > +  Specifies whether the subscriber must verify the
> > > publications that are
> > > +   being subscribed to are present in the publisher. By default,
> > > the subscriber
> > > instead of
> > > +  Specifies whether the subscriber must verify if the specified
> > > +  publications are present in the publisher. By default, the 
> > > subscriber
> > >
> >
> > Slightly reworded and modified.
>
> + 
> +  When true, the command will try to verify if the specified
> +  publications that are subscribed is present in the publisher.
> +  The default is false.
> + 
>
> "publications that are subscribed" is not right as the subscriber is
> not yet subscribed, it is "trying to subscribing", and it's not that
> the command "will try to verify", it actually verifies. So you can
> modify as follows:
>
> + 
> +  When true, the command verifies if all the specified
> publications that are being subscribed to are present in the publisher
> and throws an error if any of the publication doesn't exist. The
> default is false.
> + 
>
> > > 3) I think we can make below common code into a single function with
> > > flags to differentiate processing for both, something like:
> > > StringInfoData *get_publist_str(List *publicaitons, bool use_quotes,
> > > bool is_fetch_table_list);
> > > check_publications:
> > > +/* Convert the publications which does not exist into a string. 
> > > */
> > > +initStringInfo();
> > > +foreach(lc, publicationsCopy)
> > > +{
> > > and get_appended_publications_query:
> > >  foreach(lc, publications)
> > >
> > > With the new function that only prepares comma separated list of
> > > publications, you can get rid of get_appended_publications_query and
> > > just append the returned list to the query.
> > > fetch_table_list: get_publist_str(publications, true, true);
> > > check_publications: for select query preparation
> > > get_publist_str(publications, true, false); and for error string
> > > preparation get_publist_str(publications, false, false);
> > >
> > > And also let the new function get_publist_str allocate the string and
> > > just mention as a note in the function comment that the callers should
> > > pfree the returned string.
> > >
> >
> > I felt the existing code looks better; if we have a common function,
> > we will need a lot of if conditions, as the two functions are not
> > quite the same: they operate on different data types and do the
> > preparation appropriately. fetch_table_list gets nspname & relname and
> > converts them to a RangeVar and adds it to the list; the other function
> > prepares a text string and deletes the entries present from the list.
> > So I did not fix this. Thoughts?
>
> I was actually thinking we could move the following duplicate code
> into a function:
> foreach(lc, publicationsCopy)
> {
> char   *pubname = strVal(lfirst(lc));
>
> if (first)
> first = false;
> else
> appendStringInfoString(, ", ");
> appendStringInfoString(, "\"");
> appendStringInfoString(, pubname);
> appendStringInfoString(, "\"");
> }
> and
> foreach(lc, publications)
> {
> char   *pubname = strVal(lfirst(lc));
>
> if (first)
> first = false;
> else
> appendStringInfoString(cmd, ", ");
>
> appendStringInfoString(cmd, quote_literal_cstr(pubname));
> }
> that function can be:
> static void
> get_publications_str(List *publications, StringInfo dest, bool quote_literal)
> {
> ListCell   *lc;
> boolfirst = true;
>
> Assert(list_length(publications) > 0);
>
> foreach(lc, publications)
> {
> char   *pubname = strVal(lfirst(lc));
>
> if (first)
> first = false;
> else
> appendStringInfoString(dest, ", ");
>
> if (quote_literal)
> appendStringInfoString(dest, quote_literal_cstr(pubname));
> else
> {
> appendStringInfoString(dest, "\"");
> appendStringInfoString(dest, pubname);
> appendStringInfoString(dest, "\"");
> }
> }
> }
>
> This way, we can get rid of get_appended_publications_query and use
> the above function to return the appended list of publications. We
> need to just pass quote_literal as true while preparing the publist
> string for publication query and append it to the query outside the
> function. While preparing publist str for error, pass quote_literal as
> false. Thoughts?
>

Modified.

> > > 7) You can just do
> > > publications = list_copy(publications);
> > > instead of using another variable publicationsCopy
> > > publicationsCopy = list_copy(publications);
> >
> >