Re: RFC: Logging plan of the running query

2024-02-26 Thread Julien Rouhaud
On Mon, Feb 26, 2024 at 01:56:44PM +0530, Robert Haas wrote:
> On Sun, Feb 25, 2024 at 5:00 PM Julien Rouhaud  wrote:
> > Yeah, trying to find a generalized solution seems worth investing some
> > time to avoid having a bunch of CHECK_FOR_XXX() calls scattered in the
> > code a few years down the road.
>
> I just don't really see how to do it. I suspect that every task that
> wants to run at some CFIs but not others is going to have slightly
> different requirements, and we probably can't foresee what all of
> those requirements are.
>
> Said another way, if in the future we want to call
> DoSomethingOrOther() from the CFI handler, well then we need to know
> that we're not already in the middle of using any subsystem that
> DoSomethingOrOther() might also try to use ... and we also need to
> know that we're not in the middle of doing anything that's more
> critical than DoSomethingOrOther(). But both of these are likely to
> vary in each case.
>
> EXPLAIN might be one member of a general class of things that require
> catalog access (and thus might take locks or lwlocks, access the
> catalogs, trigger invalidations, etc.) but it's not clear how general
> that class really is. Also, I think if we try to do too many different
> kinds of things at CFIs, the whole thing is going to fall apart.
> You'll end up failing to foresee some interactions, or the stack will
> get too deep, or ... something.

I still fail to understand your point.  So you say we might want to check for
safe conditions to run explain, DoSomethingOrOther or even DoMeh, each with
different requirements.  IIUC, you're saying that we should have
CHECK_FOR_EXPLAIN(), CHECK_FOR_DOSOMETHINGOROTHER() and CHECK_FOR_MEH()?

And so in some code path A we could have

CHECK_FOR_INTERRUPTS();
CHECK_FOR_EXPLAIN();

In another

CHECK_FOR_INTERRUPTS();
CHECK_FOR_DOSOMETHINGOROTHER();

and in one happy path

CHECK_FOR_INTERRUPTS();
CHECK_FOR_EXPLAIN();
CHECK_FOR_DOSOMETHINGOROTHER();
CHECK_FOR_MEH();

Or that we should still have all of those but they shouldn't be anywhere close
to a CHECK_FOR_INTERRUPTS(), or something totally different?

In the first case, I'm not sure how having CHECK_FOR_INTERRUPTS(EXPLAIN|...),
with a combination of flags and with the flag handling being done after
ProcessInterrupts(), would be any different apart from having fewer boilerplate
lines.  Similarly for the 2nd case, but relying on a single, more general
CHECK_FOR_CONDITION(EXPLAIN | ...) rather than N CHECK_FOR_XXX()?

If you just want something totally different then sure.
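To make the flag idea concrete, here is a minimal, self-contained sketch in
plain C of a single handler acting on ORed flags.  All names here
(CFI_SAFE_FOR_EXPLAIN, CFIFlagHandler, etc.) are hypothetical and do not exist
in PostgreSQL; the point is only the shape: each pending task fires only at a
call site whose flags declare it safe, and otherwise stays pending.

```c
/* Hypothetical flag bits -- none of these names exist in PostgreSQL today */
#define CFI_SAFE_FOR_EXPLAIN   0x01
#define CFI_SAFE_FOR_OTHER     0x02

/* Pending-work bits; a real implementation would set these from handlers */
static unsigned pending_conditions;

static int explain_runs;
static int other_runs;

/*
 * Run whatever pending work the call site declared safe.  A condition only
 * fires if it is both pending and allowed by the caller's flags; otherwise
 * it stays pending for a later, safer call site.
 */
static void
CFIFlagHandler(unsigned flags)
{
    if ((pending_conditions & flags) & CFI_SAFE_FOR_EXPLAIN)
    {
        pending_conditions &= ~CFI_SAFE_FOR_EXPLAIN;
        explain_runs++;         /* stand-in for logging the running plan */
    }
    if ((pending_conditions & flags) & CFI_SAFE_FOR_OTHER)
    {
        pending_conditions &= ~CFI_SAFE_FOR_OTHER;
        other_runs++;           /* stand-in for DoSomethingOrOther() */
    }
}

int
run_demo(void)
{
    pending_conditions = CFI_SAFE_FOR_EXPLAIN | CFI_SAFE_FOR_OTHER;

    /* A call site that only guarantees it is safe to run EXPLAIN... */
    CFIFlagHandler(CFI_SAFE_FOR_EXPLAIN);
    /* ...leaves the other task pending until a suitable call site is hit. */
    CFIFlagHandler(CFI_SAFE_FOR_OTHER);

    return (explain_runs == 1 && other_runs == 1 &&
            pending_conditions == 0) ? 0 : 1;
}
```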




Re: RFC: Logging plan of the running query

2024-02-25 Thread Julien Rouhaud
On Mon, Feb 26, 2024 at 12:19:42PM +0530, Ashutosh Bapat wrote:
> On Sun, Feb 25, 2024 at 5:00 PM Julien Rouhaud  wrote:
> >
> > > > On Fri, Feb 23, 2024 at 7:50 PM Julien Rouhaud  wrote:
> > > > >
> > > > > But it doesn't have to be all or nothing right?  I mean each call
> > > > > could say what the situation is like in their context, like
> > > > > CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK | GUARANTEE_WHATEVER),
> > > > > and slowly tag calls as needed, similarly to how we already add CFI
> > > > > based on user reports.
>
> That has some potential ...
>
> >
> > I might be missing something, but since we already have a ton of macro
> > hacks, why not get another to allow CFI() overloading rather than
> > modifying every single call?  Something like that should do the trick
> > (and CFIFlagHandler() is just a naive example with a function call to
> > avoid multiple evaluation; it should be replaced with anything that
> > requires more than 10 seconds of thinking):
> >
> > #define CHECK_FOR_INTERRUPTS_0() \
> > do { \
> >     if (INTERRUPTS_PENDING_CONDITION()) \
> >         ProcessInterrupts(); \
> > } while(0)
> >
> > #define CHECK_FOR_INTERRUPTS_1(f) \
> > do { \
> >     if (INTERRUPTS_PENDING_CONDITION()) \
> >         ProcessInterrupts(); \
> > \
> >     CFIFlagHandler(f); \
> > } while(0)
>
> From your earlier description I thought you are talking about flags
> that can be ORed. We need only the two macros above. Why are we calling
> CFIFlagHandler() after ProcessInterrupts()? Shouldn't we pass the flags
> to ProcessInterrupts() itself?

Yes, I'm still talking about ORed flags passed to CFI().  That CFIFlagHandler
call is just an example of a generalized function that would act on those
flags, rather than having the handling coded inside the macro.
>
> >
> > #define CHECK_FOR_INTERRUPTS_X(x, f, CFI_IMPL, ...) CFI_IMPL
> >
> > #define CHECK_FOR_INTERRUPTS(...) \
> > CHECK_FOR_INTERRUPTS_X(, ##__VA_ARGS__, \
> >                        CHECK_FOR_INTERRUPTS_1(__VA_ARGS__), \
> >                        CHECK_FOR_INTERRUPTS_0(__VA_ARGS__) \
> > )
> >
> > We would have to duplicate the current CFI body, but it should never really
> > change, and we shouldn't end up with more than 2 overloads anyway so I don't
> > see it being much of a problem.
>
> Why do we need this complex macro?

So that client code can use either CHECK_FOR_INTERRUPTS() or
CHECK_FOR_INTERRUPTS(flag) rather than transforming every single
CHECK_FOR_INTERRUPTS() into CHECK_FOR_INTERRUPTS(0), which was Robert's
complaint about this approach.




Re: RangeTblEntry jumble omissions

2024-02-25 Thread Julien Rouhaud
On Fri, Feb 23, 2024 at 11:00:41PM +0100, Alvaro Herrera wrote:
>
> Another, similar but not quite: if you do
>
> SET search_path TO foo;
> SELECT * FROM t1;
> SET search_path TO bar;
> SELECT * FROM t1;
>
> and you have both foo.t1 and bar.t1, you'll get two identical-looking
> queries in pg_stat_statements with different jumble IDs, with no way to
> know which is which.  Not sure if the jumbling of the RTE (which
> includes the OID of the table in question) itself is to blame, or
> whether we want to store the relevant schemas with the entry somehow, or
> what.  Obviously, failing to differentiate them would not be an
> improvement.

Yeah, that's also a very old known problem.  This has been raised multiple
times (off the top of my head: [1], [2], [3]).  At this point I'm not exactly
holding my breath.

[1]: https://www.postgresql.org/message-id/flat/8f54c609-17c6-945b-fe13-8b07c0866420%40dalibo.com
[2]: https://www.postgresql.org/message-id/flat/9baf5c06-d6ab-c688-010c-843348e3d98c%40gmail.com
[3]: https://www.postgresql.org/message-id/flat/3aa097d7-7c47-187b-5913-db8366cd4491%40gmail.com




Re: RFC: Logging plan of the running query

2024-02-25 Thread Julien Rouhaud
On Sat, Feb 24, 2024 at 08:56:41AM -0500, James Coleman wrote:
> On Fri, Feb 23, 2024 at 10:23 AM Robert Haas  wrote:
> >
> > On Fri, Feb 23, 2024 at 7:50 PM Julien Rouhaud  wrote:
> > >
> > > But it doesn't have to be all or nothing right?  I mean each call could
> > > say what the situation is like in their context, like
> > > CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK | GUARANTEE_WHATEVER),
> > > and slowly tag calls as needed, similarly to how we already add CFI
> > > based on user reports.
> >
> > Absolutely. My gut feeling is that it's going to be simpler to pick a
> > small number of places that are safe and sufficient for this
> > particular feature and add an extra call there, similar to how we do
> > vacuum_delay_point(). The reason I think that's likely to be better is
> > that it will likely require changing only a relatively small number of
> > places. If we instead start annotating CFIs, well, we've got hundreds
> > of those. That's a lot more to change, and it also inconveniences
> > third-party extension authors and people doing back-patching. I'm not
> > here to say it can't work; I just think it's likely not the easiest
> > path.
>
> Yes, I suspect it's not the easiest path. I have a small hunch it
> might end up paying more dividends in the future: there isn't just one
> of these things that is regularly a thorny discussion for the same
> reasons each time (basically "no way to trigger this safely from
> another backend interrupting another one at an arbitrary point"), and
> if we were able to generalize a solution we may have multiple wins (a
> very obvious one in my mind is the inability of auto explain to run an
> explain at the precise time it's most useful: when statement timeout
> fires).

Yeah, trying to find a generalized solution seems worth investing some time to
avoid having a bunch of CHECK_FOR_XXX() calls scattered in the code a few years
down the road.

I might be missing something, but since we already have a ton of macro hacks,
why not get another to allow CFI() overloading rather than modifying every
single call?  Something like that should do the trick (and CFIFlagHandler() is
just a naive example with a function call to avoid multiple evaluation; it
should be replaced with anything that requires more than 10 seconds of
thinking):

#define CHECK_FOR_INTERRUPTS_0() \
do { \
    if (INTERRUPTS_PENDING_CONDITION()) \
        ProcessInterrupts(); \
} while(0)

#define CHECK_FOR_INTERRUPTS_1(f) \
do { \
    if (INTERRUPTS_PENDING_CONDITION()) \
        ProcessInterrupts(); \
\
    CFIFlagHandler(f); \
} while(0)

#define CHECK_FOR_INTERRUPTS_X(x, f, CFI_IMPL, ...) CFI_IMPL

#define CHECK_FOR_INTERRUPTS(...) \
CHECK_FOR_INTERRUPTS_X(, ##__VA_ARGS__, \
                       CHECK_FOR_INTERRUPTS_1(__VA_ARGS__), \
                       CHECK_FOR_INTERRUPTS_0(__VA_ARGS__) \
)

We would have to duplicate the current CFI body, but it should never really
change, and we shouldn't end up with more than 2 overloads anyway so I don't
see it being much of a problem.




Re: RangeTblEntry jumble omissions

2024-02-23 Thread Julien Rouhaud
Hi,

On Fri, Feb 23, 2024 at 04:26:53PM +0100, Peter Eisentraut wrote:
>
> - alias
>
> Currently, two queries like
>
> SELECT * FROM t1 AS foo
> SELECT * FROM t1 AS bar
>
> are counted together by pg_stat_statements -- that might be ok, but they
> both get listed under whichever one is run first, so here if you are looking
> for the "AS bar" query, you won't find it.

I think this one is intentional.  The alias won't change the query behavior or
the field names, so it's good to avoid extraneous entries.  It's true that you
then won't find anything matching "AS bar", but it's not something you can rely
on anyway.

If you first execute "select * from t1 as foo" and then "SELECT * FROM t1 AS
foo" then you won't find anything matching "AS foo" either.  There isn't even
any guarantee about which of the query texts will be stored.

> - join_using_alias
>
> Similar situation, currently
>
> SELECT * FROM t1 JOIN t2 USING (a, b)
> SELECT * FROM t1 JOIN t2 USING (a, b) AS x
>
> are counted together.

IMHO same as above.

> - funcordinality
>
> This was probably just forgotten.  It should be included because the WITH
> ORDINALITY clause changes the query result.

Agreed.

> - lateral
>
> Also probably forgotten.  A query specifying LATERAL is clearly different
> from one without it.

Agreed.




Re: RFC: Logging plan of the running query

2024-02-23 Thread Julien Rouhaud
Hi,

On Fri, Feb 23, 2024 at 10:22:32AM +0530, Robert Haas wrote:
> On Thu, Feb 22, 2024 at 6:25 AM James Coleman  wrote:
> > This is potentially a bit of a wild idea, but I wonder if having some
> > kind of argument to CHECK_FOR_INTERRUPTS() signifying we're in
> > "normal" as opposed to "critical" (using that word differently than
> > the existing critical sections) would be worth it.
>
> It's worth considering, but the definition of "normal" vs. "critical"
> might be hard to pin down. Or, we might end up with a definition that
> is specific to this particular case and not generalizable to others.

But it doesn't have to be all or nothing, right?  I mean, each call could say
what the situation is like in its context, like
CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK | GUARANTEE_WHATEVER), and
slowly tag calls as needed, similarly to how we already add CFI based on user
reports.




Re: Small fix on query_id_enabled

2024-02-13 Thread Julien Rouhaud
On Tue, Feb 13, 2024 at 05:28:32PM +0900, Michael Paquier wrote:
> On Tue, Feb 13, 2024 at 01:13:43AM +0900, Yugo NAGATA wrote:
> > I attached an updated patch that adds comments noting to use 
> > IsQueryIdEnabled()
> > instead of accessing the variables directly.
>
> Sounds good to me, thanks.

+1!




Re: Small fix on query_id_enabled

2024-02-09 Thread Julien Rouhaud
Hi,

On Fri, Feb 09, 2024 at 03:38:23PM +0900, Yugo NAGATA wrote:
>
> I found the comment on query_id_enabled looks inaccurate because this is
> never set to true when compute_query_id is ON.
>
>  /* True when compute_query_id is ON, or AUTO and a module requests them */
>  bool   query_id_enabled = false;
>
> Should we fix this as following (just fixing the place of a comma) ?
>
> /* True when compute_query_id is ON or AUTO, and a module requests them */

Agreed.

> Also, I think the name is a bit confusing for the same reason, that is,
> query_id_enabled may be false even when query id is computed in fact.
>
> Actually, this does not matter because we use IsQueryIdEnabled to check
> if query id is enabled, instead of referring to a global variable
> (query_id_enabled or compute_query_id). But, just to make the code a bit
> more readable, how about renaming this to query_id_required, which seems
> to stand for the meaning more correctly?

-1 for renaming, to avoid breaking extensions that might access it.  We should
simply document, at the compute_query_id and query_id_enabled declarations,
that one should instead use IsQueryIdEnabled() when interested in whether the
core queryid is computed or not.
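For reference, the three-way logic being discussed can be sketched in plain C.
This is only a stand-in mirroring, from memory, the shape of the real
queryjumble code; names and exact semantics should be checked against the
PostgreSQL source:

```c
#include <stdbool.h>

/* Mirrors the shape of the compute_query_id GUC; enum names are from memory */
typedef enum
{
    COMPUTE_QUERY_ID_OFF,
    COMPUTE_QUERY_ID_ON,
    COMPUTE_QUERY_ID_AUTO
} ComputeQueryIdType;

static int  compute_query_id = COMPUTE_QUERY_ID_AUTO;

/* True when compute_query_id is ON or AUTO, and a module requests query IDs */
static bool query_id_enabled = false;

/* What a module such as pg_stat_statements would call at load time */
static void
EnableQueryId(void)
{
    if (compute_query_id != COMPUTE_QUERY_ID_OFF)
        query_id_enabled = true;
}

/* The one check callers should use instead of reading either variable */
static bool
IsQueryIdEnabled(void)
{
    if (compute_query_id == COMPUTE_QUERY_ID_OFF)
        return false;
    if (compute_query_id == COMPUTE_QUERY_ID_ON)
        return true;
    return query_id_enabled;    /* AUTO: enabled only if a module asked */
}

int
run_demo(void)
{
    /* AUTO with no module loaded: query IDs are not computed */
    compute_query_id = COMPUTE_QUERY_ID_AUTO;
    query_id_enabled = false;
    if (IsQueryIdEnabled())
        return 1;

    /* AUTO after a module requested it: enabled */
    EnableQueryId();
    if (!IsQueryIdEnabled())
        return 2;

    /* ON: enabled even while query_id_enabled itself is still false */
    compute_query_id = COMPUTE_QUERY_ID_ON;
    query_id_enabled = false;
    if (!IsQueryIdEnabled() || query_id_enabled)
        return 3;

    return 0;
}
```

The last case in run_demo() is exactly why reading query_id_enabled directly
is misleading: it can be false while the queryid is nevertheless computed.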




Re: System username in pg_stat_activity

2024-01-19 Thread Julien Rouhaud
Hi,

On Thu, Jan 18, 2024 at 11:01 PM Magnus Hagander  wrote:
>
> I did. Here it is, and also including that suggested docs fix as well
> as a rebase on current master.

+if (MyClientConnectionInfo.authn_id)
+strlcpy(lbeentry.st_auth_identity, MyClientConnectionInfo.authn_id, NAMEDATALEN);
+else
+MemSet(&lbeentry.st_auth_identity, 0, sizeof(lbeentry.st_auth_identity));

Should we use pg_mbcliplen() here?  I don't think there's any strong
guarantee that no multibyte character can be used.  I also agree with
the nearby comment about MemSet being overkill.
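To illustrate the concern: a plain byte-limited copy can cut a multibyte
character in half, which an encoding-aware clip (what pg_mbcliplen() provides
for the actual server encoding) avoids.  A self-contained sketch, with a
hand-rolled strlcpy and a UTF-8-only stand-in for the clipping logic:

```c
#include <string.h>

/* Minimal strlcpy (not in glibc): copy with truncation, always NUL-terminate */
static size_t
my_strlcpy(char *dst, const char *src, size_t siz)
{
    size_t len = strlen(src);

    if (siz > 0)
    {
        size_t n = (len >= siz) ? siz - 1 : len;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}

/*
 * Clip to at most "limit" bytes without splitting a UTF-8 sequence: back up
 * over continuation bytes (10xxxxxx).  Only a UTF-8 stand-in for what
 * pg_mbcliplen() does with the real server encoding.
 */
static size_t
utf8_cliplen(const char *s, size_t len, size_t limit)
{
    size_t n = (len < limit) ? len : limit;

    while (n > 0 && ((unsigned char) s[n] & 0xC0) == 0x80)
        n--;
    return n;
}

int
run_demo(void)
{
    const char *src = "caf\xC3\xA9";    /* "café" in UTF-8: 5 bytes */
    char        buf[5];
    size_t      n;

    /* Naive byte truncation leaves a dangling UTF-8 lead byte (0xC3) */
    my_strlcpy(buf, src, sizeof(buf));
    if ((unsigned char) buf[3] != 0xC3)
        return 1;

    /* Encoding-aware clipping drops the whole split character instead */
    n = utf8_cliplen(src, strlen(src), sizeof(buf) - 1);
    if (n != 3)
        return 2;
    memcpy(buf, src, n);
    buf[n] = '\0';
    return strcmp(buf, "caf") == 0 ? 0 : 3;
}
```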

+   value as the identity part in , or NULL
I was looking at
https://www.postgresql.org/docs/current/auth-username-maps.html and
noticed that this page is switching between system-user and
system-username.  Should we clean that up while at it?




Re: pg_stat_statements: more test coverage

2023-12-31 Thread Julien Rouhaud
On Sun, Dec 31, 2023 at 2:28 PM Michael Paquier  wrote:
>
> On Sat, Dec 30, 2023 at 08:39:47PM +0100, Peter Eisentraut wrote:
> > Ok, I have committed these two patches.
>
> Please note that the buildfarm has turned red, as in:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=pipit=2023-12-31%2001%3A12%3A22=misc-check
>
> pg_stat_statements's regression.diffs holds more details:
> SELECT query FROM pg_stat_statements WHERE query LIKE '%t001%' OR query
> LIKE '%t098%' ORDER BY query;
>         query
> --------------------
> - select * from t001
>   select * from t098
> -(2 rows)
> +(1 row)

That's surprising.  I wanted to see if there was any specific
configuration but I get a 403.  I'm wondering if this is only due to
other tests being run concurrently evicting an entry earlier than
planned.




Re: pg_stat_statements: more test coverage

2023-12-28 Thread Julien Rouhaud
On Wed, Dec 27, 2023 at 8:53 PM Peter Eisentraut  wrote:
>
> On 27.12.23 09:08, Julien Rouhaud wrote:
> >
> > I was a bit surprised by that so I checked locally.  It does work as
> > expected provided that you set pg_stat_statements.track to all:
>
> Ok, here is an updated patch set that does it that way.

It looks good to me.  One minor complaint: I'm a bit dubious about queries
like this one:

SELECT count(*) <= 100 AND count(*) > 0 FROM pg_stat_statements;

Is it to actually test that pg_stat_statements won't store more than
pg_stat_statements.max records?  Note also that count(*) can never be 0 here,
as the currently executed query will have had an entry added during
post_parse_analyze.  Maybe a comment saying what this is actually testing
would help.

It would also be good to test that pg_stat_statements_info.dealloc is
more than 0 once enough statements have been issued.

> I have committed the patches 0002 and 0003 from the previous patch set
> already.
>
> I have also enhanced the TAP test a bit to check the actual content of
> the output across restarts.

Nothing much to say about this one, it all looks good.

> I'm not too hung up on the 0001 patch if others don't like that approach.

I agree with Michael on this one; the only times I saw this pattern were to
comply with some company-internal policy for minimal coverage numbers.




Re: pg_stat_statements: more test coverage

2023-12-27 Thread Julien Rouhaud
Hi,

On Tue, Dec 26, 2023 at 10:03 PM Peter Eisentraut  wrote:
>
> On 24.12.23 03:03, Michael Paquier wrote:
> > - Use a DO block of a PL function, say with something like that to
> > ensure an amount of N queries?  Say with something like that after
> > tweaking pg_stat_statements.track:
> > CREATE OR REPLACE FUNCTION create_tables(num_tables int)
> >RETURNS VOID AS
> >$func$
> >BEGIN
> >FOR i IN 1..num_tables LOOP
> >  EXECUTE format('
> >CREATE TABLE IF NOT EXISTS %I (id int)', 't_' || i);
> >END LOOP;
> > END
> > $func$ LANGUAGE plpgsql;
>
> I tried it like this first, but this doesn't register as separately
> executed commands for pg_stat_statements.

I was a bit surprised by that so I checked locally.  It does work as
expected provided that you set pg_stat_statements.track to all:
=# select create_tables(5);
=# select queryid, query from pg_stat_statements where query like 'CREATE%';
       queryid        |                  query
----------------------+-----------------------------------------
 -4985234599080337259 | CREATE TABLE IF NOT EXISTS t_5 (id int)
  -790506371630237058 | CREATE TABLE IF NOT EXISTS t_2 (id int)
 -1104545884488896333 | CREATE TABLE IF NOT EXISTS t_3 (id int)
 -2961032912789520428 | CREATE TABLE IF NOT EXISTS t_4 (id int)
  7273321309563119428 | CREATE TABLE IF NOT EXISTS t_1 (id int)




Re: How to get started with contribution

2023-12-17 Thread Julien Rouhaud
Hi,

On Sun, Dec 17, 2023 at 03:09:10PM +, Sukhbir Singh wrote:
>
> I am Sukhbir Singh, a B.Tech undergraduate, and I am in my pre-final year at
> Punjab Engineering College (PEC). I am new to open source but know Python,
> PostgreSQL, C, Javascript and node.js. I would love to contribute to your
> organization, but could you please tell me how to get started?

You can look at
https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F, it's a good
starting point with many links to additional resources.




Re: btree_gist into core?

2023-12-14 Thread Julien Rouhaud
Hi,

On Wed, Jan 19, 2022 at 09:30:11AM +0100, Peter Eisentraut wrote:
>
> To use exclusion constraints in practice, you often need to install the
> btree_gist extension, so that you can combine for example a range type check
> and normal scalar key columns into one constraint.
>
> [...]
>
> There are also of course questions about how to smoothly arrange upgrades
> from extensions to the built-in situations.

I'm not sure if that's what you were thinking of, but I know at least one
extension (that I'm maintaining) that explicitly relies on the btree_gist
extension, as in "requires = [...], btree_gist" in the .control file.

Since you can't really tweak the control file on a per-major-version basis,
this will require some extra thought to make sure that people can release
extensions without having to tweak this file in some make rule or something
like that.
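For readers unfamiliar with the mechanism, here is a hypothetical control file
showing the kind of dependency declaration in question (the extension name and
all metadata values are invented for illustration):

```
# my_extension.control -- hypothetical extension metadata
comment = 'example extension relying on btree_gist operator classes'
default_version = '1.0'
relocatable = false
# This one file is shared by all server major versions the extension
# supports; there is no per-major-version "requires" line.
requires = 'btree_gist'
```

This is why moving btree_gist into core would need a migration story: the same
"requires" line has to keep working on older servers where btree_gist is still
a separate extension.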




Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

2023-11-24 Thread Julien Rouhaud
Hi,

On Sat, Nov 25, 2023 at 02:45:07AM +0200, Alexander Korotkov wrote:
>
> I've reviewed this patch. I think this is a feature in high demand.
> It adds new columns (stats_since and minmax_stats_since) to the
> pg_stat_statements view, enhancing the granularity and precision of
> performance monitoring. This addition allows database administrators
> to have a clearer understanding of the time intervals for statistics
> collection on each statement. Such detailed tracking is invaluable for
> performance tuning and identifying bottlenecks in database operations.

Yes, it will greatly improve performance analysis tools, and as the maintainer
of one of them I've been waiting for this feature for a very long time!
>
> I think the design was well-discussed in this thread.  The implementation
> also looks good to me.  I've just slightly revised the commit messages.
>
> I'm going to push this patchset if no objections.

Thanks!  No objection from me, it all looks good.




Re: Schema variables - new implementation for Postgres 15

2023-11-21 Thread Julien Rouhaud
Hi,

On Tue, Oct 17, 2023 at 08:52:13AM +0200, Pavel Stehule wrote:
>
> When I thought about global temporary tables, I got one maybe interesting
> idea. One significant problem of global temporary tables is where to store
> info about size or column statistics.
>
> I think these data could simply be stored in session variables. Any global
> temporary table could get assigned one session variable that holds these
> data.

I don't know how realistic this would be.  For instance, it will require
properly linking the global temporary table life cycle with the session
variable, and I'm afraid it would require adding some hacks to make it work as
needed.

But this still raises the question of whether this feature could be used
internally for the needs of another feature.  If we think it's likely, should
we try to act right now and reserve the "pg_" prefix for internal use, rather
than do that a few years down the line and probably break some user code, as
was done recently for role names?




Re: Fix documentation for pg_stat_statements JIT deform_counter

2023-11-15 Thread Julien Rouhaud
On Wed, Nov 15, 2023 at 01:53:13PM +0100, Daniel Gustafsson wrote:
> > On 11 Nov 2023, at 10:26, Julien Rouhaud  wrote:
>
> > I was adding support for the new pg_stat_statements JIT deform_counter
> > in PoWA when I realized that those were added after jit_generation_time
> > in the documentation while they're actually at the end of the view.
>
> Nice catch, that was indeed an omission in the original commit.  Thanks
> for the patch, I'll apply that shortly.

Thanks!




Fix documentation for pg_stat_statements JIT deform_counter

2023-11-11 Thread Julien Rouhaud
Hi,

I was adding support for the new pg_stat_statements JIT deform_counter in PoWA
when I realized that those were added after jit_generation_time in the
documentation while they're actually at the end of the view.  I'm adding Daniel
in Cc as the committer of the original patch.

It looks like there was some will to put them earlier in the view too, but
since it would require some additional tests in the SRF they probably just
ended up at the end of the view while still being earlier in the struct.
Anyway, it's not a big problem but all other fields are documented in the
correct position so let's be consistent.

Trivial patch attached.
From 0144620d2de75d321e5273416fdab5df671b92f0 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Sat, 11 Nov 2023 17:11:35 +0800
Subject: [PATCH v1] Fix documentation for pg_stat_statements JIT
 deform_counter

Oversight in 5a3423ad8e.

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: FIXME
---
 doc/src/sgml/pgstatstatements.sgml | 34 +++---
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 6c7ca962f8..c604b1b0f0 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -442,74 +442,74 @@
 
  
   
-   jit_deform_count bigint
+   jit_inlining_count bigint
   
   
-   Total number of tuple deform functions JIT-compiled by the statement
+   Number of times functions have been inlined
   
  
 
  
   
-   jit_deform_time double precision
+   jit_inlining_time double precision
   
   
-Total time spent by the statement on JIT-compiling tuple deform
-functions, in milliseconds
+   Total time spent by the statement on inlining functions, in milliseconds
   
  
 
  
   
-   jit_inlining_count bigint
+   jit_optimization_count bigint
   
   
-   Number of times functions have been inlined
+   Number of times the statement has been optimized
   
  
 
  
   
-   jit_inlining_time double precision
+   jit_optimization_time double precision
   
   
-   Total time spent by the statement on inlining functions, in milliseconds
+   Total time spent by the statement on optimizing, in milliseconds
   
  
 
  
   
-   jit_optimization_count bigint
+   jit_emission_count bigint
   
   
-   Number of times the statement has been optimized
+   Number of times code has been emitted
   
  
 
  
   
-   jit_optimization_time double precision
+   jit_emission_time double precision
   
   
-   Total time spent by the statement on optimizing, in milliseconds
+   Total time spent by the statement on emitting code, in milliseconds
   
  
 
  
   
-   jit_emission_count bigint
+   jit_deform_count bigint
   
   
-   Number of times code has been emitted
+   Total number of tuple deform functions JIT-compiled by the statement
   
  
 
  
   
-   jit_emission_time double precision
+   jit_deform_time double precision
   
   
-   Total time spent by the statement on emitting code, in milliseconds
+Total time spent by the statement on JIT-compiling tuple deform
+functions, in milliseconds
   
  
 
-- 
2.37.0



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

2023-10-24 Thread Julien Rouhaud
On Tue, Oct 24, 2023 at 6:57 PM Peter Eisentraut  wrote:
>
> On 24.10.23 09:58, Andrei Zubkov wrote:
> > During last moving to the current commitfest this patch have lost its
> > reviewers list. With respect to reviewers contribution in this patch, I
> > think reviewers list should be fixed.
>
> I don't think it's the purpose of the commitfest app to track who *has*
> reviewed a patch.  The purpose is to plan and allocate *current* work.
> If we keep a bunch of reviewers listed on a patch who are not actually
> reviewing the patch, then that effectively blocks new reviewers from
> signing up and the patch will not make progress.
>
> Past reviewers should of course be listed in the commit message, the
> release notes, etc. as appropriate.

Really?  Last time this topic showed up, at least one committer said that they
tend to believe the CF app more than digging through the thread [1], and some
other hackers mentioned other uses for being kept in the reviewer list.  Since
no progress has been made on the CF app since then, I'm not sure it's the best
idea to generally drop reviewer names from patch entries.

[1] https://www.postgresql.org/message-id/552155.1648737...@sss.pgh.pa.us




Re: Good News Everyone! + feature proposal

2023-10-05 Thread Julien Rouhaud
On Thu, Oct 5, 2023 at 11:11 PM Jon Erdman  wrote:
>
> As a second more general question: could my original idea (i.e. sans
> event trigger) be implemented in an extension somehow, or is that not
> technically possible (I suspect not)?

It should be easy to do using the ProcessUtility_hook hook, defined in
a custom module written in C.  As long as your module is preloaded
(one of the *_preload_libraries GUC), your code will be called without
the need for any SQL-level object and you would be free to add any
custom GUC you want to enable it on a per-user basis or anything else.
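For anyone curious what that looks like, below is a hedged, self-contained
sketch of the hook-chaining pattern such a module uses.  All the types here
are stand-ins (the real ProcessUtility_hook signature in PostgreSQL takes
several more arguments), but the save-the-previous-hook-then-chain structure
matches how extensions install utility hooks:

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for PostgreSQL types; the real hook signature has more arguments */
typedef struct UtilityStmt
{
    const char *query_string;
} UtilityStmt;

typedef void (*ProcessUtility_hook_type) (UtilityStmt *stmt);

static ProcessUtility_hook_type ProcessUtility_hook = NULL;
static ProcessUtility_hook_type prev_ProcessUtility = NULL;

static int  core_calls;
static int  intercepted;

/* The server's standard utility-statement execution path (dummied out) */
static void
standard_ProcessUtility(UtilityStmt *stmt)
{
    (void) stmt;
    core_calls++;
}

/* Core entry point: run the hook if one is installed, else the default */
static void
ProcessUtility(UtilityStmt *stmt)
{
    if (ProcessUtility_hook)
        ProcessUtility_hook(stmt);
    else
        standard_ProcessUtility(stmt);
}

/* Module side: apply a custom policy, then always chain onwards */
static void
my_ProcessUtility(UtilityStmt *stmt)
{
    if (strstr(stmt->query_string, "CREATE TABLE") != NULL)
        intercepted++;          /* e.g. check a per-user GUC here */

    if (prev_ProcessUtility)
        prev_ProcessUtility(stmt);
    else
        standard_ProcessUtility(stmt);
}

/* Mimics _PG_init() in a preloaded module: save and replace the hook */
static void
my_PG_init(void)
{
    prev_ProcessUtility = ProcessUtility_hook;
    ProcessUtility_hook = my_ProcessUtility;
}

int
run_demo(void)
{
    UtilityStmt create = {"CREATE TABLE t (a int)"};
    UtilityStmt drop = {"DROP TABLE t"};

    my_PG_init();
    ProcessUtility(&create);    /* intercepted, then executed normally */
    ProcessUtility(&drop);      /* passes straight through */

    return (intercepted == 1 && core_calls == 2) ? 0 : 1;
}
```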




Re: persist logical slots to disk during shutdown checkpoint

2023-08-29 Thread Julien Rouhaud
Hi,

On Tue, Aug 29, 2023 at 02:21:15PM +0530, Amit Kapila wrote:
> On Tue, Aug 29, 2023 at 10:16 AM vignesh C  wrote:
> >
> > That makes sense. The attached v6 version has the changes for the
> > same, apart from this I have also fixed a) pgindent issues b) perltidy
> > issues c) one variable change (flush_lsn_changed to
> > confirmed_flush_has_changed) d) corrected few comments in the test
> > file. Thanks to Peter for providing few offline comments.
> >
>
> The latest version looks good to me. Julien, Ashutosh, and others,
> unless you have more comments or suggestions, I would like to push
> this in a day or two.

Unfortunately I'm currently swamped with some internal escalations, so I
couldn't keep up closely with the latest activity here.

I think I recall that you wanted to change the timing at which logical slots
are shut down; I'm assuming that this change won't mean the current
confirmed_flush LSN and the latest persisted one always differ?  Otherwise,
saving the latest persisted LSN to try to avoid persisting all logical slots
again on shutdown seems reasonable to me.




Re: psql --no-connect option

2023-08-24 Thread Julien Rouhaud
Hi,

On Thu, Aug 24, 2023 at 12:55:30PM -0700, Gurjeet Singh wrote:
> Currently, psql exits if a database connection is not established when
> psql is launched.
>
> Sometimes it may be useful to launch psql without connecting to the
> database. For example, a user may choose to start psql and then pipe
> commands via stdin, one of which may eventually perform the \connect
> command. Or the user may be interested in performing operations that
> psql can perform, like setting variables etc., before optionally
> initiating a connection.
>
> The attached patch introduces the --no-connect option, which allows
> psql to continue operation in absence of connection options.

FTR this has been discussed in the past, see at least [1].

I was interested in this feature, suggesting the exact same "--no-connect"
name, so still +1 for this patch (note that I haven't read it).

[1]: 
https://www.postgresql.org/message-id/flat/CAFe70G7iATwCMrymVwSVz7NajxCw3552TzFFHvkJqL_3L6gDTA%40mail.gmail.com




Re: persist logical slots to disk during shutdown checkpoint

2023-08-19 Thread Julien Rouhaud
On Sat, 19 Aug 2023, 14:16 Amit Kapila,  wrote:

> It's entirely possible for a logical slot to have a confirmed_flush
> LSN higher than the last value saved on disk while not being marked as
> dirty.  It's currently not a problem to lose that value during a clean
> shutdown / restart cycle but to support the upgrade of logical slots
> [1] (see latest patch at [2]), we seem to rely on that value being
> properly persisted to disk. During the upgrade, we need to verify that
> all the data prior to shutdown_checkpoint for the logical slots has
> been consumed, otherwise, the downstream may miss some data. Now, to
> ensure the same, we are planning to compare the confirm_flush LSN
> location with the latest shutdown_checkpoint location which means that
> the confirm_flush LSN should be updated after restart.
>
> I think this is inefficient even without an upgrade because, after the
> restart, this may lead to decoding some data again. Say, we process
> some transactions for which we didn't send anything downstream (the
> changes got filtered) but the confirm_flush LSN is updated due to
> keepalives. As we don't flush the latest value of confirm_flush LSN,
> it may lead to processing the same changes again.
>

In most cases there shouldn't be a lot of records to decode after restart,
but I agree it's better to avoid decoding those again.

> The idea discussed in the thread [1] is to always persist logical
> slots to disk during the shutdown checkpoint. I have extracted the
> patch to achieve the same from that thread and attached it here. This
> could lead to some overhead during shutdown (checkpoint) if there are
> many slots but it is probably a one-time work.
>
> I couldn't think of better ideas but another possibility is to mark
> the slot as dirty when we update the confirm_flush LSN (see
> LogicalConfirmReceivedLocation()). However, that would be a bigger
> overhead in the running server as it could be a frequent operation and
> could lead to more writes.
>

Yeah I didn't find any better option either at that time. I still think
that forcing persistence on shutdown is the best compromise. If we tried to
always mark the slot as dirty, we would be sure to add regular overhead but
we would probably end up persisting the slot on disk on shutdown anyway
most of the time, so I don't think it would be a good compromise.

My biggest concern was that some switchover scenarios might be a bit slower in
some cases, but if that really is a problem it's hard to imagine a workload
where the slots wouldn't have to be persisted anyway, due to continuous
activity needing to be sent just before the shutdown.



Re: Ignore 2PC transaction GIDs in query jumbling

2023-08-13 Thread Julien Rouhaud
On Sun, Aug 13, 2023 at 03:25:33PM +0900, Michael Paquier wrote:
> On Tue, Aug 01, 2023 at 10:46:58AM +0800, Julien Rouhaud wrote:
> >
> > Agreed, it should be as trivial to implement as for the 2pc commands :)
>
> Perhaps not as much, actually, because I was just reminded that
> DEALLOCATE is something that pg_stat_statements ignores.  So this
> makes harder the introduction of tests.

Maybe it's time to revisit that?  According to [1] the reason why
pg_stat_statements currently ignores DEALLOCATE is because it indeed bloated
the entries but also because at that time the suggestion for jumbling only this
one was really hackish.

Now that we do have a sensible approach to jumble utility statements, I think
it would be beneficial to be able to track those, for instance to easily
diagnose a driver that doesn't rely on the extended protocol.

> Anyway, I guess that your own
> extension modules have a need for a query ID compiled with these
> fields ignored?

My module has a need to track statements and still work efficiently after that.
So anything that can lead to virtually an infinite number of pg_stat_statements
entries is a problem for me, and I guess to all the other modules / tools that
track pg_stat_statements.  I guess the reason why my module is still ignoring
DEALLOCATE is because it existed before pg 9.4 (with a homemade queryid as it
wasn't exposed before that), and it just stayed that way since, along with the
rest of the still-problematic statements.

> For now, I have applied the 2PC bits independently, as of 638d42a.

Thanks!

[1] 
https://www.postgresql.org/message-id/flat/alpine.DEB.2.10.1404011631560.2557%40sto




Re: Schema variables - new implementation for Postgres 15

2023-08-12 Thread Julien Rouhaud
On Sat, Aug 12, 2023 at 01:20:03PM +0200, Dmitry Dolgov wrote:
> > On Sat, Aug 12, 2023 at 09:28:19AM +0800, Julien Rouhaud wrote:
> > On Fri, Aug 11, 2023 at 05:55:26PM +0200, Dmitry Dolgov wrote:
> > >
> > > Another confusing example was this one at the end of set_session_variable:
> > >
> > > + /*
> > > +  * XXX While unlikely, an error here is possible. It wouldn't leak memory
> > > +  * as the allocated chunk has already been correctly assigned to the
> > > +  * session variable, but would contradict this function contract, which is
> > > +  * that this function should either succeed or leave the current value
> > > +  * untouched.
> > > +  */
> > > + elog(DEBUG1, "session variable \"%s.%s\" (oid:%u) has new value",
> > > +  get_namespace_name(get_session_variable_namespace(svar->varid)),
> > > +  get_session_variable_name(svar->varid),
> > > +  svar->varid);
> > >
> > > It's not clear, which exactly error you're talking about, it's the last
> > > instruction in the function.
> >
> > FTR I think I'm the one that changed that.  The error I was talking about is
> > elog() itself (in case of OOM for instance), or even one of the get_* calls,
> > if running with log_level <= DEBUG1.  It's clearly really unlikely but still
> > possible, thus this comment which also tries to explain why this elog() is
> > not done earlier.
>
> I see, thanks for clarification. Absolutely nitpicking, but the crucial
> "that's why this elog is not done earlier" is only assumed in the
> comment between the lines, not stated out loud :)

Well, yes although to be fair the original version of this had a prior comment
that was making it much more obvious:

+   /*
+* No error should happen after this point, otherwise we could leak the
+* newly allocated value if any.
+*/

(which would maybe have been better said "Nothing that can error out should be
called after that point").  After quite a lot of patch revisions it now simply
says:

+   /* We can overwrite old variable now. No error expected */

I agree that a bit more explanation is needed, and maybe also reminding that
this is because all of that is done in a persistent memory context.




Re: Schema variables - new implementation for Postgres 15

2023-08-11 Thread Julien Rouhaud
On Fri, Aug 11, 2023 at 05:55:26PM +0200, Dmitry Dolgov wrote:
>
> Another confusing example was this one at the end of set_session_variable:
>
> + /*
> +  * XXX While unlikely, an error here is possible. It wouldn't leak memory
> +  * as the allocated chunk has already been correctly assigned to the
> +  * session variable, but would contradict this function contract, which is
> +  * that this function should either succeed or leave the current value
> +  * untouched.
> +  */
> + elog(DEBUG1, "session variable \"%s.%s\" (oid:%u) has new value",
> +  get_namespace_name(get_session_variable_namespace(svar->varid)),
> +  get_session_variable_name(svar->varid),
> +  svar->varid);
>
> It's not clear, which exactly error you're talking about, it's the last
> instruction in the function.

FTR I think I'm the one that changed that.  The error I was talking about is
elog() itself (in case of OOM for instance), or even one of the get_* calls, if
running with log_level <= DEBUG1.  It's clearly really unlikely but still
possible, thus this comment which also tries to explain why this elog() is not
done earlier.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-10 Thread Julien Rouhaud
Hi,

On Thu, Aug 10, 2023 at 04:30:40PM +0900, Masahiko Sawada wrote:
> On Thu, Aug 10, 2023 at 2:27 PM Amit Kapila  wrote:
> >
> > Sawada-San, Julien, and others, do you have any thoughts on the above point?
>
> IIUC during the old cluster running in the middle of pg_upgrade it
> doesn't accept TCP connections. I'm not sure we need to worry about
> the case where someone in the same server attempts to create
> replication slots during the upgrade.

AFAICS this is only true for non-Windows platforms, so we would still need some
extra safeguards on Windows.  Having those on all platforms will probably be
simpler and won't hurt otherwise.

> The same is true for other objects, as Amit mentioned.

I disagree.  As I mentioned before, any module registered in
shared_preload_libraries can spawn background workers which can perform any
activity.  There were previous reports of corruption because of multixacts
being generated by such bgworkers during pg_upgrade, and I'm pretty sure that
there are some modules that create objects (automatic partitioning tools for
instance).  It's also unclear to me what would happen if some writes are
performed by such a module at various points of the pg_upgrade process.
Couldn't that lead to either data loss or a broken slot (as it couldn't stream
changes from an older major version)?




Re: Fix last unitialized memory warning

2023-08-09 Thread Julien Rouhaud
On Wed, Aug 09, 2023 at 10:29:56AM -0500, Tristan Partin wrote:
> On Wed Aug 9, 2023 at 10:02 AM CDT, Peter Eisentraut wrote:
> >
> > This patch has apparently upset one buildfarm member with a very old
> > compiler:
> > https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=lapwing=HEAD
> >
> > Any thoughts?
>
> Best I could find is SO question[0] which links out to[1]. Try this patch.
> Otherwise, a memset() would probably do too.

Yes, it's a buggy warning that came up in the past a few times as I recall, for
which we previously used the {{...}} approach to silence it.

As there have been previous complaints about it, I removed the -Werror from
lapwing and forced a new run to make it green again.
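For anyone not familiar with the idiom, the {{...}} approach mentioned above is
just a fully-braced zero initializer.  A minimal standalone sketch (the types
here are illustrative, not from the actual warning site):

```c
#include <assert.h>

typedef struct { int a; int b; } Inner;
typedef struct { Inner in; int c; } Outer;

/* Some old gcc versions emit a spurious -Wmissing-field-initializers
 * warning for a plain "= {0}" on a struct whose first member is itself
 * an aggregate; spelling out the nested braces ("= {{0}}") silences it
 * while still zero-initializing the whole object. */
static int is_zeroed(const Outer *o)
{
    return o->in.a == 0 && o->in.b == 0 && o->c == 0;
}
```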




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-07 Thread Julien Rouhaud
On Mon, Aug 07, 2023 at 12:42:33PM +0530, Amit Kapila wrote:
> On Mon, Aug 7, 2023 at 11:29 AM Julien Rouhaud  wrote:
> >
> > Unless I'm missing something I don't see what prevents something from
> > connecting using the replication protocol and issuing any query or even
> > creating new replication slots?
> >
>
> I think the point is that if we have any slots where we have not
> consumed the pending WAL (other than the expected like
> SHUTDOWN_CHECKPOINT) or if there are invalid slots then the upgrade
> won't proceed and we will request user to remove such slots or ensure
> that WAL is consumed by slots. So, I think in the case you mentioned,
> the upgrade won't succeed.

What if new slots are added while the old instance is started in the middle of
pg_upgrade, *after* the various checks are done?

> > Note also that, as complained about a few years ago, nothing prevents a
> > bgworker from spawning up during pg_upgrade and possibly corrupting the
> > upgraded cluster if multixids are assigned.  If publications are preserved,
> > wouldn't it mean that such bgworkers could also lead to data loss?
> >
>
> Is it because such workers would write some WAL which slots may not
> process? If so, I think it is equally dangerous as other problems that
> can arise due to such a worker. Do you think of any special handling
> here?

Yes, and there were already multiple reports of multixact corruption due to
bgworker activity during pg_upgrade (see
https://www.postgresql.org/message-id/20210121152357.s6eflhqyh4g5e...@dalibo.com
for instance).  I think we should once and for all fix this whole class of
problem one way or another.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-06 Thread Julien Rouhaud
On Mon, Aug 07, 2023 at 09:24:02AM +0530, Amit Kapila wrote:
>
> I think autovacuum is not enabled during the upgrade. See comment "Use
> -b to disable autovacuum." in start_postmaster(). However, I am not
> sure if there can't be any additional WAL from checkpointer or
> bgwriter. Checkpointer has a code that ensures that if there is no
> important WAL activity then it would be skipped. Similarly, bgwriter
> also doesn't LOG xl_running_xacts unless there is an important
> activity. I feel if there is a chance of any WAL activity during the
> upgrade, we need to either change the check to ensure such WAL records
> are expected or document the same in some way.

Unless I'm missing something I don't see what prevents something from
connecting using the replication protocol and issuing any query or even
creating new replication slots?

Note also that, as complained about a few years ago, nothing prevents a
bgworker from spawning up during pg_upgrade and possibly corrupting the
upgraded cluster if multixids are assigned.  If publications are preserved,
wouldn't it mean that such bgworkers could also lead to data loss?




Re: How to add a new operator for parser?

2023-08-06 Thread Julien Rouhaud
On Sun, Aug 06, 2023 at 01:37:42PM +0800, jacktby jacktby wrote:
>
> I need to build a new datatype. It can contain different datatypes, like
> '(1,'a',2.0)'; it's an (integer, string, float) tuple type

Is there any reason why you can't simply rely on the record datatype?

> and then I need to add an operator for it. How should I do that?

If using record datatype you would only need to add a new operator.
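As a hedged sketch of what that could look like (the composite type and all
names are purely illustrative, not an existing API): composite values of the
same type already compare field by field, so the operator's support function
can simply reuse the built-in comparison.

```sql
-- A concrete composite type standing in for the (integer, string, float)
-- tuple; the anonymous "record" pseudo-type could also work but has more
-- restrictions on where it can be used.
CREATE TYPE triple AS (a integer, b text, c float8);

-- Reuse the built-in field-by-field composite comparison.
CREATE FUNCTION triple_gte(triple, triple) RETURNS boolean
    LANGUAGE sql IMMUTABLE
    AS $$ SELECT $1 >= $2 $$;

CREATE OPERATOR <~> (
    LEFTARG = triple,
    RIGHTARG = triple,
    FUNCTION = triple_gte
);

-- SELECT (1, 'a', 2.0)::triple <~> (1, 'a', 1.0)::triple;
```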




Re: How to add a new operator for parser?

2023-08-05 Thread Julien Rouhaud
On Sun, 6 Aug 2023, 12:34 jacktby jacktby,  wrote:

> I’m trying to add a new operator for my pg application like greater_equals
> called “<~>", how many files I need to
> modify and how to do it? Can you give me an example?
>

You can look at some contrib modules for examples of custom operator (and
custom datatype) implementations, like citext or btree_gin/gist.



Re: How to build a new grammer for pg?

2023-08-01 Thread Julien Rouhaud
Hi,

On Tue, Aug 01, 2023 at 07:36:36PM +0800, jacktby wrote:
> Hi, I'm trying to develop a new grammar for pg, can you give me a code
> example to reference?

It's unclear to me whether you want to entirely replace the flex/bison parser
with something else or just add some new bison rule.

If the latter, it really depends on what exactly you're trying to achieve.  I
guess you could just look at the gram.y git history until you find something
close enough to your use case and get inspiration from that.




Re: Ignore 2PC transaction GIDs in query jumbling

2023-07-31 Thread Julien Rouhaud
On Tue, Aug 01, 2023 at 11:37:49AM +0900, Michael Paquier wrote:
> On Tue, Aug 01, 2023 at 10:22:09AM +0800, Julien Rouhaud wrote:
> > Looking at the rest of the ignored patterns, the only remaining one would be
> > DEALLOCATE, which AFAICS doesn't have a query_jumble_ignore tag for now.
>
> This one seems to be simple as well with a location field, looking
> quickly at its Node.

Agreed, it should be as trivial to implement as for the 2pc commands :)




Re: Ignore 2PC transaction GIDs in query jumbling

2023-07-31 Thread Julien Rouhaud
On Tue, Aug 01, 2023 at 11:00:32AM +0900, Michael Paquier wrote:
> On Tue, Aug 01, 2023 at 09:28:08AM +0800, Julien Rouhaud wrote:
>
> > FTR we had to entirely ignore all those statements in powa years ago to try
> > to make the tool usable in such cases for some users who were using 2PC; it
> > would be nice to be able to track them back for pg16+.
>
> That would be nice for your users.

Indeed, although I will need to make that part a runtime variable based on the
server version rather than a hardcoded string since the extension framework
doesn't provide a way to do that cleanly across major pg versions.

> Are there more query patterns you'd
> be interested in grouping in the backend?  I had a few folks aiming
> for CallStmt and SetStmt, but both are a bit tricky without a custom
> routine.

Looking at the rest of the ignored patterns, the only remaining one would be
DEALLOCATE, which AFAICS doesn't have a query_jumble_ignore tag for now.




Re: Ignore 2PC transaction GIDs in query jumbling

2023-07-31 Thread Julien Rouhaud
Hi,

On Tue, Aug 01, 2023 at 09:38:14AM +0900, Michael Paquier wrote:
>
> 31de7e6 has silenced savepoint names in the query jumbling, and
> something similar can be done for 2PC transactions once the GID is
> ignored in TransactionStmt.  This leads to the following grouping in
> pg_stat_statements, for instance, which is something that matters with
> workloads that heavily rely on 2PC:
> COMMIT PREPARED $1
> PREPARE TRANSACTION $1
> ROLLBACK PREPARED $1

Having an application relying on 2pc leads to pg_stat_statements being
virtually unusable on the whole instance, so +1 for the patch.

FTR we had to entirely ignore all those statements in powa years ago to try to
make the tool usable in such cases for some users who were using 2PC; it would
be nice to be able to track them back for pg16+.

The patch LGTM.




Re: Improve join_search_one_level readibilty (one line change)

2023-07-31 Thread Julien Rouhaud
Hi,

On Wed, Jun 07, 2023 at 11:02:09AM +0800, 謝東霖 wrote:
> Thank you, Julien, for letting me know that cfbot doesn't test txt files.
> Much appreciated!

Thanks for posting this v2!

So unsurprisingly the cfbot is happy with this patch, since it doesn't change
the behavior at all.  I just have some nitpicking:

@@ -109,14 +109,14 @@ join_search_one_level(PlannerInfo *root, int level)
List   *other_rels_list;
ListCell   *other_rels;

+   other_rels_list = joinrels[1];
+
	if (level == 2)			/* consider remaining initial rels */
	{
-		other_rels_list = joinrels[level - 1];
		other_rels = lnext(other_rels_list, r);
	}
	else					/* consider all initial rels */
	{
-		other_rels_list = joinrels[1];
		other_rels = list_head(other_rels_list);
	}


Since each branch only has a single instruction after the change, the curly
braces aren't needed anymore.  The only reason to keep them is if it helps
readability (like if there's a big comment associated), but that's not the case
here so it would be better to get rid of them.
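i.e. (sketching the surrounding planner code from the diff above, not a
compilable unit) the block would end up looking like:

```
other_rels_list = joinrels[1];

if (level == 2)             /* consider remaining initial rels */
    other_rels = lnext(other_rels_list, r);
else                        /* consider all initial rels */
    other_rels = list_head(other_rels_list);
```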

Apart from that +1 from me for the patch, I think it helps focus attention on
what actually matters here.




Re: \di+ cannot show the same name indexes

2023-07-13 Thread Julien Rouhaud
Hi,

On Thu, Jul 13, 2023 at 03:17:17PM +0800, フブキダイスキ wrote:
> After I create the same name index on the heap table and the temporary
> table, I can only get the temporary table's index by \di+.
>
> create table t1(c1 int);
> create temp table t2(c1 int);
>
> create index idx1 on t1(c1);
> \di+
>   List of relations
>  Schema | Name | Type  | Owner | Table |  Size  | Description
> +--+---+---+---++-
>  public | idx1 | index | zhrt  | t1| 128 kB |
> (1 row)
>
> create index idx1 on t2(c1);
> \di+
>  List of relations
>Schema| Name | Type  | Owner | Table |  Size  | Description
> -+--+---+---+---++-
>  pg_temp_298 | idx1 | index | zhrt  | t2| 128 kB |
> (1 row)
>
> Is it the expected behavior?

Yes, since the pg_temp schema has higher priority, and those commands will not
show multiple objects for the same non-qualified name.  You can either change
the priority with something like

SET search_path TO public, pg_temp;

to look at public (or any other schema) first, or explicitly ask for the schema
you want, e.g. \di+ public.*




Re: numeric datatype for current release not available

2023-07-12 Thread Julien Rouhaud
On Wed, 12 Jul 2023, 23:15 Ashutosh Bapat, 
wrote:

> Hi All,
> https://www.postgresql.org/docs/current/datatype-numeric.html gives me
> "bad gateway" error. Attached screen shot. Date/Time datatype
> documentation is accessible at
> https://www.postgresql.org/docs/current/datatype-datetime.html.
>
> Just got this while wrapping up for the day. Didn't look at what's going
> wrong.
>

It's working here, probably a transient error; it happens from time to
time.



Re: contrib/pg_freespacemap first check input argument, then relation_open.

2023-07-05 Thread Julien Rouhaud
Hi,

On Thu, Jul 06, 2023 at 10:14:46AM +0800, jian he wrote:
>
> In:
> https://git.postgresql.org/cgit/postgresql.git/tree/contrib/pg_freespacemap/pg_freespacemap.c
>
> rel = relation_open(relid, AccessShareLock);
>
> if (blkno < 0 || blkno > MaxBlockNumber)
> ereport(ERROR,
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> errmsg("invalid block number")));
>
> 
> should it first check input arguments, then relation_open?

It would probably be a slightly better approach but wouldn't really change much
in practice so I'm not sure it's worth changing now.

> Does ereport automatically unlock the relation?

Yes, locks, lwlocks, memory contexts and everything else is properly cleaned /
released in case of error.




Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2023-07-04 Thread Julien Rouhaud
Hi,

On Tue, Jul 04, 2023 at 03:03:01PM +0900, Michael Paquier wrote:
> On Tue, Jul 04, 2023 at 07:16:47AM +0900, Michael Paquier wrote:
> > The second and third animals to fail are skate and snapper, both using
> > Debian 7 Wheezy.  As far as I know, it was an LTS supported until
> > 2018.  The owner of both machines is added in CC.  I guess that we
> > could just remove --with-openssl from the configure
> > switches.
>
> lapwing has reported a failure and runs a Debian 7, so adding Julien
> in CC about the removal of --with-openssl or similar in this animal.

Thanks, I actually saw that and already took care of removing openssl support a
couple of hours ago, and also added a new note on the animal to remember when
it was removed.  It should come back to green at the next scheduled run.




Re: Outdated description of PG_CACHE_LINE_SIZE

2023-07-03 Thread Julien Rouhaud
On Mon, Jul 03, 2023 at 12:01:55PM +0300, Heikki Linnakangas wrote:
> On 01/07/2023 10:49, Julien Rouhaud wrote:
> >
> > I just noticed that the comment for PG_CACHE_LINE_SIZE still says that "it's
> > currently used in xlog.c", which hasn't been true for quite some time.
> >
> > PFA a naive patch to make the description more generic.
>
> Applied, thanks!

Thanks!




Outdated description of PG_CACHE_LINE_SIZE

2023-07-01 Thread Julien Rouhaud
Hi,

I just noticed that the comment for PG_CACHE_LINE_SIZE still says that "it's
currently used in xlog.c", which hasn't been true for quite some time.

PFA a naive patch to make the description more generic.
>From a554ee9ca3558c1cc67b2f4024c13b26aacff3c9 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Sat, 1 Jul 2023 15:41:54 +0800
Subject: [PATCH v1] Fix PG_CACHE_LINE_SIZE description.

PG_CACHE_LINE_SIZE was originally only used in xlog.c, but this hasn't been
true for a very long time and it is now widely used, so modify its description to
not mention any explicit source code file.

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: FIXME
---
 src/include/pg_config_manual.h | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index a1a93ad706..01fe2af499 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -218,12 +218,11 @@
 
 /*
  * Assumed cache line size. This doesn't affect correctness, but can be used
- * for low-level optimizations. Currently, this is used to pad some data
- * structures in xlog.c, to ensure that highly-contended fields are on
- * different cache lines. Too small a value can hurt performance due to false
- * sharing, while the only downside of too large a value is a few bytes of
- * wasted memory. The default is 128, which should be large enough for all
- * supported platforms.
+ * for low-level optimizations. This is mostly used to pad various data
+ * structures, to ensure that highly-contended fields are on different cache
+ * lines. Too small a value can hurt performance due to false sharing, while
+ * the only downside of too large a value is a few bytes of wasted memory. The
+ * default is 128, which should be large enough for all supported platforms.
  */
 #define PG_CACHE_LINE_SIZE		128
 
-- 
2.37.0
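To illustrate the kind of padding the rewritten comment describes, a minimal
standalone sketch (not actual PostgreSQL code; the struct name is made up):

```c
#include <assert.h>
#include <stddef.h>

#define PG_CACHE_LINE_SIZE 128

/* Pad a hot, contended counter out to a full cache line so that two
 * adjacent counters in an array never share a line, avoiding false
 * sharing between CPUs updating neighboring entries. */
typedef struct
{
    long count;
    char pad[PG_CACHE_LINE_SIZE - sizeof(long)];
} PaddedCounter;
```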



Re: Targetlist lost when CTE join

2023-06-28 Thread Julien Rouhaud
On Wed, Jun 28, 2023 at 05:17:14PM +0800, Julien Rouhaud wrote:
> >
> > Table t1 and t2 both have 2 columns: c1, c2; when the CTE join selects *,
> > the result target list seems to lose one c1 column.
> > But it looks good when selecting cte1.* and t1.* explicitly.
> >
> > Is it a bug?
>
> This is working as intended.  When using a USING clause you "merge" both
> columns so the final target list only contains one version of the merged
> columns, which doesn't happen if you use e.g. ON instead.  I'm assuming that's
> what the SQL standard says, but I don't have a copy to confirm.

I forgot to mention that this is actually documented:

https://www.postgresql.org/docs/current/queries-table-expressions.html

Furthermore, the output of JOIN USING suppresses redundant columns: there is no
need to print both of the matched columns, since they must have equal values.
While JOIN ON produces all columns from T1 followed by all columns from T2,
JOIN USING produces one output column for each of the listed column pairs (in
the listed order), followed by any remaining columns from T1, followed by any
remaining columns from T2.
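Using the t1/t2 tables from upthread, the difference in output columns is easy
to see (a sketch; the expected column lists are shown as comments):

```sql
-- t1(c1, c2), t2(c1, c2)

-- JOIN USING merges the join column: output is c1, c2, c2
SELECT * FROM t1 JOIN t2 USING (c1);

-- JOIN ON keeps both copies: output is c1, c2, c1, c2
SELECT * FROM t1 JOIN t2 ON t1.c1 = t2.c1;
```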




Re: Targetlist lost when CTE join

2023-06-28 Thread Julien Rouhaud
Hi,

On Wed, Jun 28, 2023 at 04:52:34PM +0800, Zhang Mingli wrote:
>
> Mini repro
>
> create table t1(c1 int, c2 int);
> CREATE TABLE
> create table t2(c1 int, c2 int);
> CREATE TABLE
> explain with cte1 as (insert into t2 values (1, 2) returning *) select * from 
> cte1 join t1 using(c1);
>  QUERY PLAN
> 
>  Hash Join (cost=0.04..41.23 rows=11 width=12)
>  Hash Cond: (t1.c1 = cte1.c1)
>  CTE cte1
>  -> Insert on t2 (cost=0.00..0.01 rows=1 width=8)
>  -> Result (cost=0.00..0.01 rows=1 width=8)
>  -> Seq Scan on t1 (cost=0.00..32.60 rows=2260 width=8)
>  -> Hash (cost=0.02..0.02 rows=1 width=8)
>  -> CTE Scan on cte1 (cost=0.00..0.02 rows=1 width=8)
> (8 rows)
>
> with cte1 as (insert into t2 values (1, 2) returning *) select * from cte1 
> join t1 using(c1);
>  c1 | c2 | c2
> ++
> (0 rows)
>
> truncate t2;
> TRUNCATE TABLE
> with cte1 as (insert into t2 values (1, 2) returning *) select cte1.*, t1.* 
> from cte1 join t1 using(c1);
>  c1 | c2 | c1 | c2
> +++
> (0 rows)
>
> Table t1 and t2 both have 2 columns: c1, c2; when the CTE join selects *, the
> result target list seems to lose one c1 column.
> But it looks good when selecting cte1.* and t1.* explicitly.
>
> Is it a bug?

This is working as intended.  When using a USING clause you "merge" both
columns so the final target list only contains one version of the merged
columns, which doesn't happen if you use e.g. ON instead.  I'm assuming that's
what the SQL standard says, but I don't have a copy to confirm.




Re: [PATCH] Slight improvement of worker_spi.c example

2023-06-15 Thread Julien Rouhaud
On Wed, Jun 14, 2023 at 02:08:03PM +0300, Aleksander Alekseev wrote:
>
> Unfortunately I'm not familiar with the problem in respect of naptime
> Julien is referring to. If you know what this problem is and how to
> fix it, go for it. I'll review and test the code then. I can write the
> part of the patch that fixes the part regarding dynamic workers if
> necessary.

Oh, sorry I thought it was somewhat evident.

The naptime GUC description says:

> Duration between each check (in seconds).

and the associated code does a single

WaitLatch(..., WL_LATCH_SET | WL_TIMEOUT, ...)

So unless I'm missing something, nothing prevents the check from being run way
more often than expected if the latch keeps being set.

Similarly, my understanding of "duration between checks" is that a naptime of 1
min means that the check should be run a minute apart, assuming it's possible.
As is, the checks are run naptime + query execution time apart, which doesn't
seem right.  Obviously there isn't much you can do if the query execution
lasts for more than naptime, apart from detecting it and raising a warning to
let users know that their configuration isn't adequate (or that there's some
other problem like some lock contention or something), similarly to e.g.
checkpoint_warning.

Note I haven't looked closely at this module otherwise, so I can't say if there
are some other problems around.
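The fixed-rate behavior described above (checks a fixed interval apart,
independent of how long each check takes) can be sketched with a small helper;
the name and signature are illustrative, not from worker_spi.c:

```c
/* Compute when the next check should run.  Scheduling relative to the
 * previous *deadline* (rather than to when the work finished) keeps the
 * checks naptime seconds apart; if the work overran the interval we run
 * again immediately (a natural place to emit a checkpoint_warning-style
 * hint about an inadequate naptime). */
long next_deadline(long prev_deadline, long naptime, long now)
{
    long next = prev_deadline + naptime;

    if (next <= now)            /* work overran the interval */
        return now;
    return next;
}
```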




Re: [PATCH] Slight improvement of worker_spi.c example

2023-06-13 Thread Julien Rouhaud
On Tue, Jun 13, 2023 at 12:34:09PM +0300, Aleksander Alekseev wrote:
>
> > I agree that the current code
> > could lead folks to think that PushActiveSnapshot must go after
> > SPI_connect, but wouldn't the reverse ordering just give folks the opposite
> > impression?
>
> This is the exact reason why the original patch had an explicit
> comment that the ordering is not important in this case. It was argued
> however that the comment is redundant and thus it was removed.

I also don't think that a comment is really worthwhile.  If there were any hard
dependency, it should be mentioned in the various functions' comments as that's
the first place one should look at when using a function they're not familiar
with.

That being said, I still don't understand why you focus on this tiny and not
really important detail while the module itself is actually broken (for dynamic
bgworker without s_p_l) and also has some broken behaviors with regards to the
naptime that are way more likely to hurt third party code that was written
using this module as an example.




Re: Views no longer in rangeTabls?

2023-06-10 Thread Julien Rouhaud
On Sat, Jun 10, 2023 at 08:56:47AM -0400, Tom Lane wrote:
>
> Well, if we're gonna do it we should do it for v16, rather than
> change the data structure twice.  It wouldn't be hard exactly:
>
> /*
>  * Clear fields that should not be set in a subquery RTE.  Note that we
>  * leave the relid, rellockmode, and perminfoindex fields set, so that the
>  * view relation can be appropriately locked before execution and its
>  * permissions checked.
>  */
> -   rte->relkind = 0;
> rte->tablesample = NULL;
> rte->inh = false;/* must not be set for a subquery */
>
> plus adjustment of that comment and probably also the comment
> for struct RangeTblEntry.

and also handle that field in (read|out)funcs.c




Re: confusion about this commit "Revert "Skip redundant anti-wraparound vacuums""

2023-06-07 Thread Julien Rouhaud
On Wed, Jun 07, 2023 at 03:42:25PM +0800, jiye wrote:
> we will certainly update to the latest version, but we must first confirm
> that this issue is the same one.  Currently we cannot confirm that this
> issue can be fixed by reverting commit
> 2aa6e331ead7f3ad080561495ad4bd3bc7cd8913, so I am just asking how this
> commit could cause autovacuum to lock up or stop working.

The revert commit contains a description of the problem and a link to the
discussion and analysis that led to that revert.




Re: confusion about this commit "Revert "Skip redundant anti-wraparound vacuums""

2023-06-07 Thread Julien Rouhaud
On Wed, Jun 07, 2023 at 03:12:44PM +0800, jiye wrote:
> actually our test instance includes commit
> 2aa6e331ead7f3ad080561495ad4bd3bc7cd8913; we have not yet reverted it.

Are you saying that you're doing tests relying on a version that's missing
about 3 years of security and bug fixes?  You should definitely update to the
latest minor version (currently 12.15) and keep applying all minor versions as
they get released.




Re: confusion about this commit "Revert "Skip redundant anti-wraparound vacuums""

2023-06-07 Thread Julien Rouhaud
On Tue, Jun 06, 2023 at 03:30:02PM -0400, Robert Haas wrote:
> On Mon, Jun 5, 2023 at 1:50 AM jiye  wrote:
>
> > we cannot get a deterministic test case as this issue reproduced only once,
> > and currently autovacuum works since we ran vacuum freeze on each
> > table of each database.
> >
> > our client's application is a real online banking business with several
> > customer databases; a majority of update operations drove some tables'
> > dead_tup_ratio to nearly 100%, but we could not find any autovacuum
> > process working for a very long time before we ran vacuum freeze manually.
> >
>
> I tend to doubt that this is caused by the commit you're blaming, because
> that commit purports to skip autovacuum operations only if some other
> vacuum has already done the work. Here you are saying that you see no
> autovacuum tasks at all.

I'm a bit confused about what commit is actually being discussed here.

Is it commit 2aa6e331ead7f3ad080561495ad4bd3bc7cd8913?  FTR this commit was
indeed problematic and eventually reverted in 12.3
(3ec8576a02b2b06aa214c8f3c2c3303c8a67639f), as it was leading to exactly the
problem described here (autovacuum kept triggering the same jobs that were
silently ignored, leading to absolutely no visible activity from a user point
of view).




Re: Improve join_search_one_level readibilty (one line change)

2023-06-06 Thread Julien Rouhaud
On Tue, 6 Jun 2023, 16:18 謝東霖,  wrote:

> Thank you to Julien Rouhaud and Tender Wang for the reviews.
>
> Julien's detailed guide has proven to be incredibly helpful, and I am
> truly grateful for it.
> Thank you so much for providing such valuable guidance!
>
> I have initiated a new commitfest:
> https://commitfest.postgresql.org/43/4346/
>
> Furthermore, I have attached a patch that improves the code by moving
> the initialization of "other_rels_list" outside the if branching.
>

I'm glad I could help! Thanks for creating the cf entry. Note however that
the cfbot ignores files with a .txt extension (I don't think it's
documented but it will mostly handle files with diff, patch, gz(ip), tar
extensions IIRC, processing them as needed depending on the extension), so
you should send v2 again with a supported extension, otherwise the cfbot
will keep testing your original patch.



Re: [PATCH] pg_regress.c: Fix "make check" on Mac OS X: Pass DYLD_LIBRARY_PATH

2023-06-06 Thread Julien Rouhaud
On Wed, Jun 7, 2023 at 5:44 AM Evan Jones  wrote:
>
> On Tue, Jun 6, 2023 at 5:23 PM Peter Eisentraut  wrote:
>>
>> This addresses only pg_regress.  What about all the other test suites?
>> Per the previous discussions, you'd need to patch up other places in a
>> similar way, potentially everywhere system() is called.
>
>
> Are there instructions for how I can run these other test suites? The 
> installation notes that Julien linked, and the Postgres wiki Developer FAQ 
> [1] only seem to mention "make check". I would be happy to try to fix other 
> tests on Mac OS X.

AFAIK there's no make rule that can really run everything.  You can
get most of it using make check-world (see
https://www.postgresql.org/docs/current/regress-run.html#id-1.6.20.5.5)
and making sure you added support for TAP tests (and probably also a
lot of optional dependencies) when running configure.  This won't run
everything but hopefully will hit most of the relevant infrastructure.
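
Concretely, that usually means something along these lines (a how-to sketch, assuming a PostgreSQL source tree; the exact set of configure options you need depends on which optional dependencies you want covered):

```shell
# Enable TAP tests (and whatever optional dependencies you care about)
# at configure time, then run the broader test target.
./configure --enable-tap-tests
make
make check-world
```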




Re: [PATCH] pg_regress.c: Fix "make check" on Mac OS X: Pass DYLD_LIBRARY_PATH

2023-06-05 Thread Julien Rouhaud
Hi,

On Mon, Jun 05, 2023 at 09:47:30AM -0400, Evan Jones wrote:
> This makes "make check" work on Mac OS X. Without this patch, on Mac OS X a
> default "./configure; make; make check" fails with errors like:
> 
> dyld[65265]: Library not loaded: /usr/local/pgsql/lib/libpq.5.dylib
>   Referenced from: <59A2EAF9-6298-3112-BEDB-EA9A62A9DB53>
> /Users/evan.jones/postgresql-clean/tmp_install/usr/local/pgsql/bin/initdb
>   Reason: tried: '/usr/local/pgsql/lib/libpq.5.dylib' (no such file),
> '/System/Volumes/Preboot/Cryptexes/OS/usr/local/pgsql/lib/libpq.5.dylib'
> (no such file), '/usr/local/pgsql/lib/libpq.5.dylib' (no such file),
> '/usr/local/lib/libpq.5.dylib' (no such file), '/usr/lib/libpq.5.dylib' (no
> such file, not in dyld cache)
> 
> The reason is that at some point, Mac OS X started removing the
> DYLD_LIBRARY_PATH environment variable for "untrusted" executables [1]:
> "Any dynamic linker (dyld) environment variables, such as
> DYLD_LIBRARY_PATH, are purged when launching protected processes."
> 
> [1]
> https://developer.apple.com/library/archive/documentation/Security/Conceptual/System_Integrity_Protection_Guide/RuntimeProtections/RuntimeProtections.html
> 
> One solution is to explicitly pass the DYLD_LIBRARY_PATH environment
> variable to the sub-process shell scripts that are run by pg_regress. To
> do this, I created an extra_envvars global variable which is set to the
> empty string "", but on Mac OS X, is filled in with "DYLD_LIBRARY_PATH=%s",
> where the %s is the current environment variable. The "make check" Makefile
> sets this environment variable to the temporary install directory, so this
> fixes the above errors.

Note that this is a known issue and a workaround is documented in the macos
specific notes at
https://www.postgresql.org/docs/current/installation-platform-notes.html#INSTALLATION-NOTES-MACOS:

> macOS's “System Integrity Protection” (SIP) feature breaks make check,
> because it prevents passing the needed setting of DYLD_LIBRARY_PATH down to
> the executables being tested. You can work around that by doing make install
> before make check. Most PostgreSQL developers just turn off SIP, though.
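
In practice the documented workaround boils down to (sketch, run from the source tree):

```shell
# Install first so the tested executables find libpq at its installed
# location, sidestepping the DYLD_LIBRARY_PATH purged by SIP; then run
# the tests.
make install
make check
```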




Re: Improve join_search_one_level readibilty (one line change)

2023-06-04 Thread Julien Rouhaud
Hi,

On Sat, Jun 03, 2023 at 05:24:43PM +0800, 謝東霖 wrote:
>
> Attached is my first patch for PostgreSQL, which is a simple one-liner
> that I believe can improve the code.

Welcome!

> In the "join_search_one_level" function, I noticed that the variable
> "other_rels_list" always refers to "joinrels[1]" even when the (level
> == 2) condition is met. I propose changing:
>
> "other_rels_list = joinrels[level - 1]" to "other_rels_list = joinrels[1]"
>
> This modification aims to enhance clarity and avoid unnecessary instructions.

Agreed.  It looks like it was originally introduced as mechanical changes in a
bigger patch.  It would probably be better to move the other_rels_list
initialization out of the if instruction and put it with the variable
declaration, to make it even clearer.  I'm not that familiar with this area of
the code so hopefully someone else will comment.
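
To illustrate, the suggested shape would be something like the following (a simplified, non-compilable sketch of that fragment of join_search_one_level, not a tested patch):

```c
/* Hoist the invariant initialization out of the branches: at level == 2,
 * joinrels[level - 1] is the same list as joinrels[1], so both branches
 * can share the initialization with the declaration. */
List       *other_rels_list = joinrels[1];  /* always the initial rels */
ListCell   *other_rels;

if (level == 2)     /* consider only the remaining initial rels */
    other_rels = lnext(other_rels_list, r);
else                /* consider all initial rels */
    other_rels = list_head(other_rels_list);
```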

> I would greatly appreciate any review and feedback on this patch as I
> am a newcomer to PostgreSQL contributions. Your input will help me
> improve and contribute effectively to the project.

I think you did everything as needed!  You should consider adding your patch to
the next open commitfest (https://commitfest.postgresql.org/43/) if you
haven't already, to make sure it won't be forgotten, even if it's a one-liner.
It will then also be regularly tested by the cfbot (http://cfbot.cputube.org/).

If needed, you can also run the same CI jobs (covering multiple OSes) using your
personal github account; see
https://github.com/postgres/postgres/blob/master/src/tools/ci/README for details
on how to set it up.

Best practice is also to review a patch of similar difficulty when sending one.
You can look at the same commit fest entry if anything interests you, and
register as a reviewer.

> Additionally, if anyone has any tips on ensuring that Gmail recognizes
> my attached patches as the "text/x-patch" MIME type when sending them
> from the Chrome client, I would be grateful for the advice.

I don't see any problem with the attachment.  You can always check looking at
the online archive for that, for instance for your email:
https://www.postgresql.org/message-id/CANWNU8x9P9aCXGF=at-a_8mltat0lkcz_ysyrgbcuhzmqw2...@mail.gmail.com

As far as I know only Apple Mail is problematic in that regard, as it
doesn't send attachments as attachments.

> Or maybe the best practice is to use Git send-email ?

I don't think anyone uses git send-email on this mailing list.  We usually
prefer to manually attach patch(es), possibly compressing them if they're big,
and then keep all the discussion and new revisions on the same thread.




Re: Should "REGRESS_OPTS = --temp-config" be working for 3rd party extensions?

2023-06-03 Thread Julien Rouhaud
On Sat, Jun 03, 2023 at 02:56:27PM +0300, Aleksander Alekseev wrote:
>
> I tried to use `REGRESS_OPTS = --temp-config` in order to test a 3rd
> party extension with a custom .conf file similarly to how PostgreSQL
> does it for src/test/modules/test_slru. It didn't work and "38.18.
> Extension Building Infrastructure" [1] doesn't seem to be much help.
>
> Is it accurate to say that the author of a 3rd party extension that
> uses shared_preload_libraries can't be using SQL tests and has to use
> TAP tests instead? If not then what did I miss?

temp-config can only be used when bootstrapping a temporary environment, so
when using e.g. make check.  PGXS / third-party extensions can only use
installcheck, so if you need a specific config like shared_preload_libraries you
need to manually configure your instance beforehand, or indeed rely on TAP
tests.  Most projects properly set up their instance in their CI jobs, and at
least the Debian packaging infrastructure has a way to configure it too.
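
For example, setting up an instance for installcheck by hand could look like this (a sketch; the extension name and paths are placeholders):

```shell
# Append the required setting to the target instance's configuration
# (shared_preload_libraries requires a restart, not just a reload).
echo "shared_preload_libraries = 'my_extension'" >> "$PGDATA/postgresql.conf"
pg_ctl -D "$PGDATA" restart

# Then run the extension's SQL tests against that running instance.
make installcheck
```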




Re: [PATCH] Slight improvement of worker_spi.c example

2023-06-03 Thread Julien Rouhaud
On Sat, Jun 03, 2023 at 02:38:26PM +0300, Aleksander Alekseev wrote:
> Hi Julien,
>
> > I'm pretty sure that this is intentional.  The worker can be launched
> > dynamically and in that case it still needs a GUC for the naptime.
>
> The dynamic worker also is going to need worker_spi_database, however
> the corresponding GUC declaration is placed below the check.

Yes, and that's because that GUC is declared as PGC_POSTMASTER so you can't
declare such a GUC dynamically.
>
> Perhaps we should just say that the extension shouldn't be used
> without shared_preload_libraies. We are not testing whether it works
> in such a case anyway.

The patch that added the database name was clearly committed without bothering
to test that scenario, but it would be better to fix it and add some coverage
rather than remove an example of how to use dynamic bgworkers.  Maybe with an
assign hook to make sure it's never changed once assigned, or something along
those lines.

That being said, this module is really naive and has so many problems that I
don't think it's actually helpful as a coding guideline for anyone who wants to
create a non-toy extension using bgworkers.




Re: [PATCH] Slight improvement of worker_spi.c example

2023-06-03 Thread Julien Rouhaud
On Sat, Jun 03, 2023 at 02:09:26PM +0300, Aleksander Alekseev wrote:
>
> Additionally I noticed that the check:
>
> ```
> if (!process_shared_preload_libraries_in_progress)
> return;
> ```
>
> ... was misplaced in _PG_init(). Here is the patch v2 which fixes this too.

I'm pretty sure that this is intentional.  The worker can be launched
dynamically and in that case it still needs a GUC for the naptime.




Re: cutting down the TODO list thread

2023-05-16 Thread Julien Rouhaud
On Tue, 16 May 2023, 02:05 Matthias van de Meent,

>
> The result I got when searching for "automatic postgresql index
> suggestions" was a combination of hypopg, pg_qualstats and some manual
> glue to get suggested indexes in the current database - but none of
> these are available in the main distribution.
>

FTR pg_qualstats has had an integrated "automatic index suggestion" feature
for many years, so no glue needed.

>


Re: Missing update of all_hasnulls in BRIN opclasses

2023-05-06 Thread Julien Rouhaud
Hi,

On Sun, May 07, 2023 at 12:13:07AM +0200, Tomas Vondra wrote:
>
> c) ignore the issue - AFAICS this would be an issue only for (external)
> code accessing BrinMemTuple structs, but I don't think we're aware of
> any out-of-core BRIN opclasses or anything like that ...

FTR there's at least postgis that implements BRIN opclasses (for geometries and
geographies), but there's nothing fancy in the implementation and it doesn't
access the BrinMemTuple struct.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-05-02 Thread Julien Rouhaud
On Tue, 2 May 2023, 19:43 Julien Rouhaud,  wrote:

> Hi,
>
> On Tue, May 02, 2023 at 12:55:18PM +0200, Alvaro Herrera wrote:
> > On 2023-Apr-07, Julien Rouhaud wrote:
> >
> > > That being said, I have a hard time believing that we could actually
> preserve
> > > physical replication slots.  I don't think that pg_upgrade final state
> is fully
> > > reproducible:  not all object oids are preserved, and the various
> pg_restore
> > > are run in parallel so you're very likely to end up with small physical
> > > differences that would be incompatible with physical replication.
> Even if we
> > > could make it totally reproducible, it would probably be at the cost
> of making
> > > pg_upgrade orders of magnitude slower.  And since many people are
> already
> > > complaining that it's too slow, that doesn't seem like something we
> would want.
> >
> > A point on preserving physical replication slots: because we change WAL
> > format from one major version to the next (adding new messages or
> > changing format for other messages), we can't currently rely on physical
> > slots working across different major versions.
>
> I don't think anyone suggested to do physical replication over different
> major
> versions.  My understanding was that it would be used to pg_upgrade a
> "physical cluster" (e.g. a primary and physical standby server) at the same
> time, and then simply starting them up again would lead to a working
> physical
> replication on the new version.
>
> I guess one could try to keep using the slots for other needs (PITR backup
> with
> pg_receivewal or something similar), and then you would indeed have to be
> aware
> that you won't be able to do anything with the new WAL records until you
> do a
> fresh base backup, but that's a problem that you can already face after a
> normal pg_upgrade (although in most cases it's probably quite obvious for
> now
> as the timeline isn't preserved).
>

If what you meant is that the slot may have to send a record generated by
an older major version, then unless I'm missing something the same
restriction could be added to such a feature as what's being discussed in
this thread for the logical replication slots, so only a final shutdown
checkpoint record would be present after the flushed WAL position.  It may
be possible to work around that, if there weren't all the other problems I
mentioned.

>


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-05-02 Thread Julien Rouhaud
Hi,

On Tue, May 02, 2023 at 12:55:18PM +0200, Alvaro Herrera wrote:
> On 2023-Apr-07, Julien Rouhaud wrote:
>
> > That being said, I have a hard time believing that we could actually 
> > preserve
> > physical replication slots.  I don't think that pg_upgrade final state is 
> > fully
> > reproducible:  not all object oids are preserved, and the various pg_restore
> > are run in parallel so you're very likely to end up with small physical
> > differences that would be incompatible with physical replication.  Even if 
> > we
> > could make it totally reproducible, it would probably be at the cost of 
> > making
> > pg_upgrade orders of magnitude slower.  And since many people are already
> > complaining that it's too slow, that doesn't seem like something we would 
> > want.
>
> A point on preserving physical replication slots: because we change WAL
> format from one major version to the next (adding new messages or
> changing format for other messages), we can't currently rely on physical
> slots working across different major versions.

I don't think anyone suggested to do physical replication over different major
versions.  My understanding was that it would be used to pg_upgrade a
"physical cluster" (e.g. a primary and physical standby server) at the same
time, and then simply starting them up again would lead to a working physical
replication on the new version.

I guess one could try to keep using the slots for other needs (PITR backup with
pg_receivewal or something similar), and then you would indeed have to be aware
that you won't be able to do anything with the new WAL records until you do a
fresh base backup, but that's a problem that you can already face after a
normal pg_upgrade (although in most cases it's probably quite obvious for now
as the timeline isn't preserved).




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-04-24 Thread Julien Rouhaud
Hi,

On Mon, Apr 24, 2023 at 12:03:05PM +, Hayato Kuroda (Fujitsu) wrote:
>
> > I think that this test should be different when just checking for the
> > prerequisites (live_check / --check) compared to actually doing the 
> > upgrade,
> > as it's almost guaranteed that the slots won't have sent everything when the
> > source server is up and running.
>
> Hmm, you assumed that the user application is still running and data is coming
> continuously when doing --check, right? Personally I have thought that the
> --check operation is executed just before the actual upgrading, therefore I'm 
> not
> sure your assumption is real problem.

The checks are always executed before doing the upgrade, to prevent it if
something isn't right.  But you can also just run those checks on a live
instance, so you can get a somewhat strong guarantee that the upgrade operation
will succeed before needing to stop all services and shut down postgres.  It's
basically free to run those checks and it can avoid an unnecessary service
interruption, so I'm pretty sure people use it quite often.

> And I could not find any checks which their
> contents are changed based on the --check option.

Yes, because the other checks are things that you can actually fix while the
instance is running, like getting rid of tables with oids.  The only
semi-exception is for 2PC, as transactions can be continuously prepared and
committed, but if you hit that problem at least you know you have to cleanly
stop your XA-like application and make sure there are no prepared transactions
left.

> Yeah, if we support the case checking pg_replication_slots.active may be 
> sufficient.
> Actually this cannot handle the case that pg_create_logical_replication_slot()
> is executed just before upgrading, but I'm not sure it should be.

It shouldn't, same for any of the other checks.  The live check can't predict
the future, it just tells you if there's anything that would prevent the
upgrade *at the moment it's executed*.
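
In other words, a typical flow is to run check mode against the still-running old cluster first (a sketch; all paths and version numbers are placeholders):

```shell
# Safe to run while the old cluster is still live: only runs the checks.
pg_upgrade --check \
    --old-datadir /path/to/old/data --new-datadir /path/to/new/data \
    --old-bindir /usr/lib/postgresql/15/bin \
    --new-bindir /usr/lib/postgresql/16/bin

# If that passes, stop services, shut down the old cluster, and run the
# same command without --check to perform the actual upgrade.
```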




Re: pg_upgrade and logical replication

2023-04-24 Thread Julien Rouhaud
Hi,

On Tue, Apr 18, 2023 at 01:40:51AM +, Hayato Kuroda (Fujitsu) wrote:
>
> I found a cfbot failure on macOS [1]. According to the log,
> "SELECT count(*) FROM t2" was executed before synchronization was done.
>
> ```
> [09:24:21.018](0.132s) not ok 18 - Table t2 should now have 3 rows on the new 
> subscriber
> ```
>
> With the patch present, wait_for_catchup() is executed after REFRESH, but
> it may not be sufficient because it does not check pg_subscription_rel.
> wait_for_subscription_sync() seems better for the purpose.

Fixed, thanks!

v5 attached with all previously mentioned fixes.
From c6755ea3318220dc41bc315cc7acce4954e9b252 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Wed, 22 Feb 2023 09:19:32 +0800
Subject: [PATCH v5] Optionally preserve the full subscription's state during
 pg_upgrade

Previously, only the subscription metadata information was preserved.  Without
the list of relations and their state it's impossible to re-enable the
subscriptions without missing some records as the list of relations can only be
refreshed after enabling the subscription (and therefore starting the apply
worker).  Even if we added a way to refresh the subscription while enabling a
publication, we still wouldn't know which relations are new on the publication
side, and therefore should be fully synced, and which shouldn't.

Similarly, the subscriptions' replication origins are needed to ensure
that we don't replicate anything twice.

To fix this problem, this patch teaches pg_dump in binary upgrade mode to emit
additional ALTER SUBSCRIPTION subcommands that will restore the content of
pg_subscription_rel, and also provides an additional LSN parameter for CREATE
SUBSCRIPTION to restore the underlying replication origin remote LSN.  The new
ALTER SUBSCRIPTION subcommand and the new LSN parameter are not exposed to
users and only accepted in binary upgrade mode.

The new ALTER SUBSCRIPTION subcommand has the following syntax:

ALTER SUBSCRIPTION name ADD TABLE (relid = XYZ, state = 'x' [, lsn = 'X/Y'])

The relation is identified by its oid, as it's preserved during pg_upgrade.
The lsn is optional, and defaults to NULL / InvalidXLogRecPtr if not provided.
Explicitly passing InvalidXLogRecPtr (0/0) is however not allowed.

This mode is optional and not enabled by default.  A new
--preserve-subscription-state option is added to pg_upgrade to use it.  For
now, pg_upgrade will check that all the subscriptions have a valid replication
origin remote_lsn, and that all underlying relations are in 'r' (ready) state,
and will error out if that's not the case, logging the reason for the failure.

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: https://postgr.es/m/20230217075433.u5mjly4d5cr4hcfe@jrouhaud
---
 doc/src/sgml/ref/pgupgrade.sgml  |  23 +++
 src/backend/commands/subscriptioncmds.c  |  75 +++-
 src/backend/parser/gram.y|  11 ++
 src/bin/pg_dump/common.c |  22 +++
 src/bin/pg_dump/pg_backup.h  |   2 +
 src/bin/pg_dump/pg_dump.c| 136 +-
 src/bin/pg_dump/pg_dump.h|  15 ++
 src/bin/pg_upgrade/check.c   |  81 +
 src/bin/pg_upgrade/dump.c|   3 +-
 src/bin/pg_upgrade/meson.build   |   1 +
 src/bin/pg_upgrade/option.c  |   6 +
 src/bin/pg_upgrade/pg_upgrade.h  |   1 +
 src/bin/pg_upgrade/t/003_subscription.pl | 220 +++
 src/include/nodes/parsenodes.h   |   3 +-
 src/tools/pgindent/typedefs.list |   1 +
 15 files changed, 595 insertions(+), 5 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_subscription.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..6af790c986 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,29 @@ PostgreSQL documentation
   
  
 
+ 
+  --preserve-subscription-state
+  
+   
+Fully preserve the logical subscription state if any.  That includes
+the underlying replication origin with their remote LSN and the list of
+relations in each subscription so that replication can be simply
+resumed if the subscriptions are reactivated.
+   
+   
+If this option isn't used, it is up to the user to reactivate the
+subscriptions in a suitable way; see the subscription part in  for more information.
+   
+   
+If this option is used and any of the subscription on the old cluster
+has an unknown remote_lsn (0/0), or has any relation
+in a state different from r (ready), the
+pg_upgrade run will error.
+   
+  
+ 
+
  
   -?
   --help
diff --git a/src/backend/commands/subscriptioncmds.c 
b/src/backend/commands/subscriptioncmds.c
index 56eafbff10..657db3791e 100644
--- a/src/backend/commands/subscriptioncmd

Re: pg_upgrade and logical replication

2023-04-24 Thread Julien Rouhaud
Hi,

On Fri, Apr 14, 2023 at 04:19:35AM +, Hayato Kuroda (Fujitsu) wrote:
>
> I have tested, but srsublsn became NULL if copy_data was specified as off.
> This is because when copy_data is false, all tuples in pg_subscription_rels 
> are filled
> as state = 'r' and srsublsn = NULL, and tablesync workers will never boot.
> See CreateSubscription().
> Doesn't it mean that there is a possibility that LSN option is not specified 
> while
> ALTER SUBSCRIPTION ADD TABLE?

It shouldn't be the case for now, as pg_upgrade will first check whether there's
an invalid remote_lsn and refuse to proceed if that's the case.  Also, the
remote_lsn should be set as soon as some data is replicated, so unless you add
a table that's never modified to a publication you should be able to run
pg_upgrade at some point, once there's replicated DML on such a table.
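
The conditions pg_upgrade would verify can be eyeballed beforehand with catalog queries along these lines (a sketch, run on the old cluster):

```sql
-- Replication origins should have a known remote_lsn (not NULL / 0/0).
SELECT external_id, remote_lsn
FROM pg_replication_origin_status;

-- All subscription relations should be in 'r' (ready) state; this lists
-- any that aren't.
SELECT s.subname, sr.srrelid::regclass, sr.srsubstate
FROM pg_subscription_rel sr
JOIN pg_subscription s ON s.oid = sr.srsubid
WHERE sr.srsubstate <> 'r';
```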

I'm personally fine with the current restrictions, but I don't really use
logical replication in any project so maybe I'm not objective enough.  For now
I'd rather keep things as-is, and later improve on it if some people want to
lift such restrictions (and such restrictions can actually be lifted).




Re: pg_upgrade and logical replication

2023-04-24 Thread Julien Rouhaud
Hi,

On Thu, Apr 13, 2023 at 03:26:56PM +1000, Peter Smith wrote:
>
> 1.
> All the comments look alike, so it is hard to know what is going on.
> If each of the main test parts could be highlighted then the test code
> would be easier to read IMO.
>
> Something like below:
> [...]

I added a bit more comments about what is being tested.  I'm not sure that a
big TEST CASE prefix is necessary, as these aren't really multiple separated test
cases and other stuff can be tested in between.  Also AFAICT no other TAP test
currently needs this kind of banner, even when testing more complex
scenarios.

> 2.
> +# replication origin's remote_lsn isn't set if not data is replicated after 
> the
> +# initial sync
>
> wording:
> /if not data is replicated/if data is not replicated/

I actually mean "if no data", which is a bit different from what you suggest.
Fixed.

> 3.
> # Make sure the replication origin is set
>
> I was not sure if all of the SELECT COUNT(*) checking is needed
> because it just seems normal pub/sub functionality. There is no
> pg_upgrade happening, so really it seemed the purpose of this part was
> mainly to set the origin so that it will not be a blocker for
> ready-state tests that follow this code. Maybe this can just be
> incorporated into the following test part.

Since this patch is transferring internal details about subscriptions I prefer
to be thorough about what is tested, when data is actually being replicated and
so on, so that if something is broken (relation added to the wrong subscription,
wrong oid or something) it should immediately show what's happening.

> 4a.
> TBH, I felt it might be easier to follow if the SQL was checking for
> WHERE (text = "while old_sub is down") etc, rather than just using
> SELECT COUNT(*), and then trusting the comments to describe what the
> different counts mean.

I prefer the plain count as it's a simple way to make sure that the state is
exactly what's wanted.  If for some reason the patch led to a previous row
being replicated again, such a test wouldn't reveal it.  Sure, it could be
broken enough that one old row is replicated twice and the new row isn't
replicated, but that seems so unlikely that I don't think testing the whole
table content is necessary.

> 4b.
> All these messages like "Table t1 should still have 2 rows on the new
> subscriber" don't seem very helpful. e.g. They are not saying anything
> about WHAT this is testing or WHY it should still have 2 rows.

I don't think that those messages are supposed to say what or why something is
tested, just give a quick context / reference on the test in case it's broken.
The comments are there to explain in more details what is tested and/or why.

> 5.
> # Refresh the subscription, only the missing row on t2 show be replicated
>
> /show/should/

Fixed.




Re: Mark a transaction uncommittable

2023-04-22 Thread Julien Rouhaud
Hi,

On Sat, Apr 22, 2023 at 12:53:23PM -0400, Isaac Morland wrote:
>
> I have an application for this: creating various dev/test versions of data
> from production.
>
> Start by restoring a copy of production from backup. Then successively
> create several altered versions of the data and save them to a place where
> developers can pick them up. For example, you might have one version which
> has all data old than 1 year deleted, and another where 99% of the
> students/customers/whatever are deleted. Anonymization could also be
> applied. This would give you realistic (because it ultimately originates
> from production) test data.
>
> This could be done by starting a non-committable transaction, making the
> adjustments, then doing a pg_dump in the same transaction (using --snapshot
> to allow it to see that transaction). Then rollback, and repeat for the
> other versions. This saves repeatedly restoring the (probably very large)
> production data each time.

There already are tools to handle those use cases.  Look for instance at
https://github.com/mla/pg_sample to back up a consistent subset of the data, or
https://github.com/rjuju/pg_anonymize to transparently pg_dump (or
interactively query) anonymized data.

Both tools also work when connected to a physical standby, while trying to
update data before dumping it wouldn't.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-04-20 Thread Julien Rouhaud
Hi,

On Thu, Apr 20, 2023 at 05:31:16AM +, Hayato Kuroda (Fujitsu) wrote:
> Dear Vignesh,
>
> Thank you for reviewing! PSA new patchset.
>
> > > Additionally, I added a checking functions in 0003
> > > According to pg_resetwal and other functions, the length of
> > CHECKPOINT_SHUTDOWN
> > > record seems (SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort +
> > sizeof(CheckPoint)).
> > > Therefore, the function ensures that the difference between current insert
> > position
> > > and confirmed_flush_lsn is less than (above + page header).

I think that this test should be different when just checking for the
prerequisites (live_check / --check) compared to actually doing the upgrade,
as it's almost guaranteed that the slots won't have sent everything when the
source server is up and running.
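
For reference, comparing a slot's confirmed_flush_lsn against the current insert position means interpreting the textual X/Y LSN form; a minimal self-contained sketch (plain Python, all values illustrative):

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN of the form 'X/Y' (both fields in hex)
    into a 64-bit integer: X is the high 32 bits, Y the low 32 bits."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def lsn_gap(confirmed_flush: str, current_insert: str) -> int:
    """Byte distance between two LSNs."""
    return parse_lsn(current_insert) - parse_lsn(confirmed_flush)

# The check discussed above would compare this gap against the expected
# size of the final shutdown checkpoint record (version-dependent, so any
# concrete threshold would be illustrative only).
print(lsn_gap("0/15D6A88", "0/15D6B00"))  # → 120
```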

Maybe simply check that all logical slots are currently active when running the
live check, so at least there's a good chance that they will still be at
shutdown, and will therefore send all the data to the subscribers?  Having a
regression tests for that scenario would also be a good idea.  Having an
uncommitted write transaction should be enough to cover it.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-04-14 Thread Julien Rouhaud
Hi,

Sorry for the delay, I didn't have time to come back to it until this afternoon.

On Mon, Apr 10, 2023 at 09:18:46AM +, Hayato Kuroda (Fujitsu) wrote:
>
> I have analyzed about the point but it seemed to be difficult. This is because
> some additional records like followings may be inserted. PSA the script which 
> is
> used for testing. Note that "double CHECKPOINT_SHUTDOWN" issue might be wrong,
> so I wanted to withdraw it once. Sorry for noise.
>
> * HEAP/HEAP2 records. These records may be inserted by checkpointer.
>
> IIUC, if there are tuples which have not been flushed yet when shutdown is 
> requested,
> the checkpointer writes back all of them into heap file. At that time many WAL
> records are generated. I think we cannot predict the number of records 
> beforehand.
>
> * INVALIDATION(S) records. These records may be inserted by VACUUM.
>
> There is a possibility that autovacuum runs and generate WAL records. I think 
> we
> cannot predict the number of records beforehand because it depends on the 
> number
> of objects.
>
> * RUNNING_XACTS record
>
> It might be a timing issue, but I found that sometimes background writer 
> generated
> a XLOG_RUNNING record. According to the function BackgroundWriterMain(), it 
> will be
> generated when the process spends 15 seconds since last logging and there are
> important records. I think it is difficult to predict whether this will be 
> appeared or not.

I don't think that your analysis is correct.  Slots are guaranteed to be
stopped after all the normal backends have been stopped, exactly to avoid such
extraneous records.

What is happening here is that the slot's confirmed_flush_lsn is properly
updated in memory and ends up being the same as the current LSN before the
shutdown.  But as it's a logical slot and those records aren't decoded, the
slot isn't marked as dirty and therefore isn't saved to disk.  You don't see
that behavior when doing a manual checkpoint before (per your script comment),
as in that case the checkpoint also tries to save the slot to disk but then
finds a slot that was marked as dirty and therefore saves it.

In your script's scenario, when you restart the server the previous slot data
is restored and the confirmed_flush_lsn goes backward, which explains those
extraneous records.

It's probably totally harmless to throw away that value for now (and probably
also doesn't lead to a crazy amount of work after restart, I really don't know
much about the logical slot code), but it clearly becomes problematic with your
use case.  One easy way to fix this is to teach the checkpoint code to force
saving the logical slots to disk even if they're not marked as dirty during a
shutdown checkpoint, as done in the attached v1 patch (renamed as .txt to not
interfere with the cfbot).  With this patch applied I reliably only see a final
shutdown checkpoint record with your scenario.

Now such a change will make shutdown a bit more expensive when using logical
replication, even if in 99% of cases you will not need to save the
confirmed_flush_lsn value, so I don't know if that's acceptable or not.
From 77c3d2d361893de857627e036d0eaaf01cfe91c1 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v1] Always persist to disk logical slots during a shutdown
 checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: FIXME
---
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/replication/slot.c| 26 --
 src/include/replication/slot.h|  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index b540ee293b..8100ca656e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7011,7 +7011,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
CheckPointRelationMap();
-   CheckPointReplicationSlots();
+   CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
CheckPointSnapBuild();
CheckPointLogicalRewriteHeap();
CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 8021aaa0a8..2bbf2af770 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);

Re: pg_upgrade and logical replication

2023-04-13 Thread Julien Rouhaud
Hi,

On Thu, Apr 13, 2023 at 12:42:05PM +1000, Peter Smith wrote:
> Here are some review comments for patch v4-0001 (not the test code)

Thanks!

>
> (There are some overlaps here with what Kuroda-san already posted
> yesterday because we were looking at the same patch code. Also, a few
> of my comments might become moot points if refactoring will be done
> according to Kuroda-san's "general" questions).

Ok, for the record, the parts I don't reply to are things I fully agree with
and already changed locally.

> ==
> Commit message
>
> 1.
> To fix this problem, this patch teaches pg_dump in binary upgrade mode to emit
> additional commands to be able to restore the content of pg_subscription_rel,
> and addition LSN parameter in the subscription creation to restore the
> underlying replication origin remote LSN.  The LSN parameter is only accepted
> in CREATE SUBSCRIPTION in binary upgrade mode.
>
> ~
>
> SUGGESTION
> To fix this problem, this patch teaches pg_dump in binary upgrade mode
> to emit additional ALTER SUBSCRIPTION commands to facilitate restoring
> the content of pg_subscription_rel, and provides an additional LSN
> parameter for CREATE SUBSCRIPTION to restore the underlying
> replication origin remote LSN. The new ALTER SUBSCRIPTION syntax and
> new LSN parameter are not exposed to the user -- they are only
> accepted in binary upgrade mode.

Thanks, I eventually adapted the suggested wording a bit more:

To fix this problem, this patch teaches pg_dump in binary upgrade mode to emit
additional ALTER SUBSCRIPTION subcommands that will restore the content of
pg_subscription_rel, and also provides an additional LSN parameter for CREATE
SUBSCRIPTION to restore the underlying replication origin remote LSN.  The new
ALTER SUBSCRIPTION subcommand and the new LSN parameter are not exposed to
users and only accepted in binary upgrade mode.

The new ALTER SUBSCRIPTION subcommand has the following syntax:

> 2b.
> The link renders strangely. It just says:
>
> See the subscription part in the [section called "Notes"] for more 
> information.
>
> Maybe the link part can be rewritten so that it renders more nicely,
> and also makes mention of pg_dump.

Yes I saw that.  I didn't try to look at it yet but that's indeed what I wanted
to do eventually.

> ==
> src/backend/commands/subscriptioncmds.c
>
> 3.
> +#define SUBOPT_RELID 0x8000
> +#define SUBOPT_STATE 0x0001
>
> Maybe 'SUBOPT_RELSTATE' is a better name for this per-relation state option?

I looked at it but part of the existing code is already using state as a
variable name, to be consistent with pg_subscription_rel.srsubstate.  I think
it's better to use the same pattern in this patch.
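For context, the SUBOPT_* names above are bits in a mask that records which
options the caller actually specified.  A minimal self-contained sketch of
that scheme (the bit values and the parse_opts() helper are illustrative; the
real definitions live in subscriptioncmds.c):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative option bits, one per supported option. */
#define SUBOPT_RELID 0x00008000
#define SUBOPT_STATE 0x00010000
#define SUBOPT_LSN   0x00020000

/* Same shape as PostgreSQL's IsSet(): all requested bits must be set. */
#define IsSet(val, bits) (((val) & (bits)) == (bits))

/* Toy stand-in for parse_subscription_options(): build the
 * specified_opts mask from which options were passed. */
static uint32_t
parse_opts(bool has_relid, bool has_state, bool has_lsn)
{
    uint32_t specified = 0;

    if (has_relid)
        specified |= SUBOPT_RELID;
    if (has_state)
        specified |= SUBOPT_STATE;
    if (has_lsn)
        specified |= SUBOPT_LSN;
    return specified;
}
```

This is what the Assert(IsSet(opts.specified_opts, SUBOPT_RELID)) checks
discussed elsewhere in the thread are testing: not the option's value, but
whether it was supplied at all.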

> 6.
> + if (strlen(state_str) != 1)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("invalid relation state used")));
>
> IIUC this is syntax not supposed to be reachable by user input. Maybe
> there is some merit in making the errors similar looking to the normal
> options, but OTOH it could also be misleading.

It doesn't cost much and may be helpful for debugging, so I will use error
messages similar to the user-facing ones.

> This might as well just be: Assert(strlen(state_str) == 1 &&
> *state_str == SUBREL_STATE_READY);
> or even simply: Assert(IsBinaryUpgrade);

As I mentioned in a previous email, it's still unclear to me whether the
restriction on the srsubstate will be weakened or not, so I prefer to keep such
part of the code generic and have the restriction centralized in the pg_upgrade
check.

I added some Assert(IsBinaryUpgrade) in those code paths as it may not be
evident in this place that it's a requirement.


> 7. CreateSubscription
>
> + if (IsBinaryUpgrade)
> + supported_opts |= SUBOPT_LSN;
>   parse_subscription_options(pstate, stmt->options, supported_opts, &opts);
> 7b.
> I wonder if this was deserving of a comment something like "The LSN
> option is for internal use only"...

I was thinking that being valid only for IsBinaryUpgrade would be enough?

> 8. CreateSubscription
>
> + originid = replorigin_create(originname);
> +
> + if (IsBinaryUpgrade && IsSet(opts.lsn, SUBOPT_LSN))
> + replorigin_advance(originid, opts.lsn, InvalidXLogRecPtr,
> + false /* backward */ ,
> + false /* WAL log */ );
>
> I think the  'IsBinaryUpgrade' check is redundant here because
> SUBOPT_LSN is not possible to be set unless that is true anyhow.

It's indeed redundant for now, but it's also used as a safeguard if some code
is changed.  Maybe just having an assert(IsBinaryUpgrade) would be better
though.

While looking at it I noticed that this code was never reached, as I should
have checked IsSet(opts.specified_opts, ...).  I fixed that and added a TAP
test to make sure that the restored remote_lsn is the same as on the old
subscription node.

> 9. AlterSubscription
>
> + AddSubscriptionRelState(subid, opts.relid, opts.state,
> + opts.lsn);
>
> This line wrapping of AddSubscriptionRelState seems unnecessary.

Without it the 

Re: pg_upgrade and logical replication

2023-04-13 Thread Julien Rouhaud
On Thu, Apr 13, 2023 at 10:51:10AM +0800, Julien Rouhaud wrote:
>
> On Wed, Apr 12, 2023 at 09:48:15AM +, Hayato Kuroda (Fujitsu) wrote:
> >
> > 5. AlterSubscription
> >
> > ```
> > +   supported_opts = SUBOPT_RELID | SUBOPT_STATE | SUBOPT_LSN;
> > +   parse_subscription_options(pstate, stmt->options,
> > +                              supported_opts, &opts);
> > +
> > +   /* relid and state should always be provided. */
> > +   Assert(IsSet(opts.specified_opts, SUBOPT_RELID));
> > +   Assert(IsSet(opts.specified_opts, SUBOPT_STATE));
> > +
> > ```
> >
> > SUBOPT_LSN accepts "none" string, which means InvalidLSN. Isn't it better to
> > reject it?
>
> If you mean have an Assert for that I agree.  It's not supposed to be used by
> users so I don't think having a non-debug check is sensible, as any user
> provided value has no reason to be correct anyway.

After looking at the code I remember that I kept the lsn optional in ALTER
SUBSCRIPTION name ADD TABLE command processing.  For now pg_upgrade checks that
all subscriptions have a valid remote_lsn so there should indeed always be a
value different from InvalidLSN/none specified, but it's still unclear to me
whether this check will eventually be weakened or not, so for now I think it's
better to keep AlterSubscription accept this case, here and in all other code
paths.

If there's a hard objection I will just make the lsn mandatory.

> > 9. parseCommandLine
> >
> > ```
> > +   user_opts.preserve_subscriptions = false;
> > ```
> >
> > I think this initialization is not needed because it is default.
>
> It's not strictly needed because of C rules but I think it doesn't really hurt
> to make it explicit and not have to remember what the standard says.

So I looked at nearby code and other options do rely on zero-initialized global
variables, so I agree that this initialization should be removed.




Re: pg_upgrade and logical replication

2023-04-12 Thread Julien Rouhaud
Hi,

On Wed, Apr 12, 2023 at 09:48:15AM +, Hayato Kuroda (Fujitsu) wrote:
>
> Thank you for updating the patch. I checked yours.
> The following are general or non-minor questions:

Thanks!

> 1.
> Feature freeze for PG16 has already come, so I think there is no reason to
> rush making the patch.  Based on the above, could you allow upgrading while
> synchronizing data?  Personally I think it can be added as a 0002 patch which
> extends the feature.  Or have you already found any problem?

I didn't really look into it, mostly because I don't think it's a sensible
use case.  Logical sync of a relation is a heavy and time-consuming operation
that requires retaining the xmin for quite some time.  This can already have
some bad effects on the publisher, so adding a pg_upgrade in the middle of that
would just make things worse.  Upgrading a subscriber is a rare event that has
to be well planned (you need to test your application with the new version and
so on), and initial sync of relations shouldn't happen continually, so having
to wait for the sync to finish doesn't seem like a source of problems, and
might instead avoid some for users who may not fully realize the implications.

If someone has a scenario where running pg_upgrade in the middle of a logical
sync is mandatory I can try to look at it, but for now I just don't see a good
reason to add even more complexity to this part of the code, especially since
adding regression tests seems a bit troublesome.

> 2.
> I have a question about the SQL interface:
>
> ALTER SUBSCRIPTION name ADD TABLE (relid = XYZ, state = 'x' [, lsn = 'X/Y'])
>
> Here the oid of the table is directly specified, but is it really kept
> between the old and new nodes?

Yes, pg_upgrade does need to preserve relations' oids.

> The similar command ALTER PUBLICATION requires the name of the table,
> not the oid.

Yes, but those are user facing commands, while ALTER SUBSCRIPTION name ADD
TABLE is only used internally for pg_upgrade.  My goal is to make this command
a bit faster by avoiding an extra cache lookup each time, relying on
pg_upgrade's existing requirements.  If that's really a problem I can use the
name instead, but I didn't hear any argument against it for now.

> 3.
> Currently getSubscriptionRels() is called from getSubscriptions(), but I
> could not find the reason why we must do it like that.  Other functions like
> getPublicationTables() are directly called from getSchemaData(), so that
> pattern should be followed.

I think you're right, doing a single getSubscriptionRels() rather than once
per subscription should be more efficient.

> Additionaly, I found two problems.
>
> * Only tables that are to be dumped should be included.  See
> getPublicationTables().

This is only done during pg_upgrade where all tables are dumped, so there
shouldn't be any need to filter the list.

> * dropStmt for subscription relations seems not to be needed.

I'm not sure I understand this one.  I agree that a dropStmt isn't needed, and
there's no such thing in the patch.  Are you saying that you agree with it?

> * Maybe security label and comments should be also dumped.

Subscription's security labels and comments are already dumped (well, should
be dumped: AFAICS pg_dump was never taught to look at shared security labels
on objects other than databases but still tries to emit them; pg_dumpall
instead handles pg_authid and pg_tablespace), and we can't add a security
label or comment on a subscription's relations, so I don't think this patch is
missing something?

So unless I'm missing something it looks like shared security label handling is
partly broken, but that's orthogonal to this patch.

> The following are minor comments.
>
>
> 4. parse_subscription_options
>
> ```
> +   opts->state = defGetString(defel)[0];
> ```
>
> [0] is not needed.

It still needs to be dereferenced, I personally find [0] a bit clearer in that
situation but I'm not opposed to a plain *.

> 5. AlterSubscription
>
> ```
> +   supported_opts = SUBOPT_RELID | SUBOPT_STATE | SUBOPT_LSN;
> +   parse_subscription_options(pstate, stmt->options,
> +                              supported_opts, &opts);
> +
> +   /* relid and state should always be provided. */
> +   Assert(IsSet(opts.specified_opts, SUBOPT_RELID));
> +   Assert(IsSet(opts.specified_opts, SUBOPT_STATE));
> +
> ```
>
> SUBOPT_LSN accepts "none" string, which means InvalidLSN. Isn't it better to
> reject it?

If you mean have an Assert for that I agree.  It's not supposed to be used by
users so I don't think having a non-debug check is sensible, as any user
provided value has no reason to be correct anyway.

> 6. dumpSubscription()
>
> ```
> +   if (dopt->binary_upgrade && dopt->preserve_subscriptions &&
> +   subinfo->suboriginremotelsn)
> +   {
> +  

Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-04-07 Thread Julien Rouhaud
On Fri, Apr 07, 2023 at 12:51:51PM +, Hayato Kuroda (Fujitsu) wrote:
> Dear Julien,
> 
> > > Agreed, but then shouldn't the option be named "--logical-slots-only" or
> > > something like that, same for all internal function names?
> > 
> > Seems right. Will be fixed in next version. Maybe
> > "--logical-replication-slots-only"
> > will be used, per Peter's suggestion [1].
> 
> After considering it more, I decided not to include the word "logical" in the
> option at this point.  This is because we have not yet decided whether we
> dump physical replication slots or not.  The current restriction exists only
> because of a lack of analysis and consideration; if we decide not to do that,
> the options will be renamed accordingly.

Well, even if physical replication slots were eventually preserved during
pg_upgrade, maybe users would like to only keep one kind of the others so
having both options could make sense.

That being said, I have a hard time believing that we could actually preserve
physical replication slots.  I don't think that pg_upgrade's final state is
fully reproducible:  not all object oids are preserved, and the various
pg_restore are run in parallel so you're very likely to end up with small
physical differences that would be incompatible with physical replication.
Even if we could make it totally reproducible, it would probably be at the
cost of making pg_upgrade orders of magnitude slower.  And since many people
are already complaining that it's too slow, that doesn't seem like something
we would want.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-04-07 Thread Julien Rouhaud
On Fri, Apr 07, 2023 at 09:40:14AM +, Hayato Kuroda (Fujitsu) wrote:
>
> > As I mentioned in my original thread, I'm not very familiar with that
> > code, but I'm a bit worried about "all the changes generated on publisher
> > must be sent and applied".  Is that a hard requirement for the feature to
> > work reliably?
>
> I think the requirement is needed because the existing WAL on the old node
> cannot be transported to the new instance.  The WAL hole from confirmed_flush
> to the current position could not be filled by the newer instance.

I see, that was also the first blocker I could think of when Amit mentioned
that feature weeks ago, and I also don't see how that hole could be filled
either.

> > If
> > yes, how does this work if some subscriber node isn't connected when the
> > publisher node is stopped?  I guess you could add a check in pg_upgrade to
> > make sure that all logical slots are indeed caught up and fail if that's
> > not the case rather than assuming that a clean shutdown implies it.  It
> > would be good to cover that in the TAP test, and also cover some corner
> > cases, like any new row added on the publisher node after the pg_upgrade
> > but before the subscriber is reconnected is also replicated as expected.
>
> Hmm, good point.  The current patch cannot handle that case because
> walsenders for such slots do not exist.  I have tested your approach;
> however, I found that a CHECKPOINT_SHUTDOWN record was generated twice when
> the publisher was shut down and restarted.  As a result the confirmed_lsn of
> the slots was always behind the WAL insert location, and the upgrade failed
> every time.  I do not have a good idea to solve it for now... Does anyone
> have one?

I'm wondering if we could just check that each slot's LSN is exactly
sizeof(CHECKPOINT_SHUTDOWN) ago or something like that?  That's hackish, but
if pg_upgrade can run it means it was a clean shutdown, so it should be safe
to assume what the last record in the WAL was.  As for the double shutdown
checkpoint, I'm not sure that I get the problem.  The check should only be
done at the very beginning of pg_upgrade, so there should have been only one
shutdown checkpoint done, right?
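A rough sketch of that heuristic, under the stated assumptions (the record
size constant below is purely illustrative, not the real size of a shutdown
checkpoint record, and real LSN arithmetic would also have to account for WAL
page headers):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* LSNs modeled as plain 64-bit byte positions, as in PostgreSQL. */
typedef uint64_t XLogRecPtr;

/* Illustrative size of a shutdown-checkpoint WAL record; the actual value
 * depends on the server version and record layout. */
#define SHUTDOWN_CHECKPOINT_SIZE 114

/*
 * Sketch of the proposed check: after a clean shutdown, a caught-up logical
 * slot's confirmed_flush should point exactly at the shutdown checkpoint
 * record, i.e. one record before the current insert position.
 */
static bool
slot_is_caught_up(XLogRecPtr confirmed_flush, XLogRecPtr insert_lsn)
{
    return confirmed_flush == insert_lsn - SHUTDOWN_CHECKPOINT_SIZE;
}
```

The "double checkpoint" problem discussed above is exactly the case where
confirmed_flush ends up two records behind insert_lsn, making this predicate
false even after a clean shutdown.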




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-04-06 Thread Julien Rouhaud
Hi,

On Tue, Apr 04, 2023 at 07:00:01AM +, Hayato Kuroda (Fujitsu) wrote:
> Dear hackers,
> (CC: Amit and Julien)

(thanks for the Cc)

> This is a fork thread of Julien's thread, which allows to upgrade subscribers
> without losing changes [1].
>
> I briefly implemented a prototype for allowing to upgrade a publisher node.
> IIUC the key missing piece was that replication slots used for logical
> replication could not be copied to the new node by the pg_upgrade command, so
> this patch allows that.  This feature can be used when
> '--include-replication-slot' is specified.  Also, I added a small test for
> the typical case.  It may be helpful to understand.
>
> pg_upgrade internally executes pg_dump for dumping database objects from the
> old node.  This feature follows that approach, adding a new option
> '--slot-only' to the pg_dump command.  When specified, it extracts the needed
> info from the old node and generates an SQL file that executes
> pg_create_logical_replication_slot().
>
> The notable difference from the pre-existing behavior is that slots are
> restored at a different time.  Currently pg_upgrade works with the following
> steps:
>
> ...
> 1. dump schema from old nodes
> 2. do pg_resetwal several times to new node
> 3. restore schema to new node
> 4. do pg_resetwal again to new node
> ...
>
> The problem is that if we create replication slots at step 3, the restart_lsn
> and confirmed_flush_lsn are set to current_wal_insert_lsn at that time,
> whereas pg_resetwal discards the WAL file.  Such slots cannot extract
> changes.  To handle the issue the restore is separated into two phases.  In
> the first phase restoring is done at step 3, except for replication slots.
> In the second phase replication slots are restored at step 5, after doing
> pg_resetwal.
>
> Before upgrading a publisher node, all the changes generated on the publisher
> must be sent and applied on the subscriber.  This is because the restart_lsn
> and confirmed_flush_lsn of the copied replication slots are the same as
> current_wal_insert_lsn.  The new node resets the information about which WAL
> records were really applied on the subscriber.  Basically it is not
> problematic because, before shutting down the publisher, its walsender
> processes confirm that all data is replicated.  See WalSndDone() and related
> code.

As I mentioned in my original thread, I'm not very familiar with that code, but
I'm a bit worried about "all the changes generated on publisher must be sent
and applied".  Is that a hard requirement for the feature to work reliably?  If
yes, how does this work if some subscriber node isn't connected when the
publisher node is stopped?  I guess you could add a check in pg_upgrade to make
sure that all logical slots are indeed caught up and fail if that's not the
case rather than assuming that a clean shutdown implies it.  It would be good
to cover that in the TAP test, and also cover some corner cases, like any new
row added on the publisher node after the pg_upgrade but before the subscriber
is reconnected is also replicated as expected.
>
> Currently physical slots are ignored because this is out of scope for me.
> I did not do any analysis about it.

Agreed, but then shouldn't the option be named "--logical-slots-only" or
something like that, same for all internal function names?




Re: pg_upgrade and logical replication

2023-04-06 Thread Julien Rouhaud
Hi,

On Thu, Apr 06, 2023 at 04:49:59AM +, Hayato Kuroda (Fujitsu) wrote:
> Dear Julien,
>
> > I'm attaching a v3 to fix a recent conflict with pg_dump due to 
> > a563c24c9574b7
> > (Allow pg_dump to include/exclude child tables automatically).
>
> Thank you for making the patch.
> FYI - it could not be applied due to recent commits.  SUBOPT_* and attributes
> in SubscriptionInfo were added these days.

Thanks a lot for warning me!

While rebasing and testing the patch, I realized that I forgot to git-add a
chunk, so I went ahead and added some minimal TAP tests to make sure that the
feature and various checks work as expected, also demonstrating that you can
safely resume after running pg_upgrade a logical replication setup where only
some of the tables are added to a publication, where new rows and new tables
are added to the publication while pg_upgrade is running (for the new table you
obviously need to make sure that the same relation exist on the subscriber side
but that's orthogonal to this patch).

While doing so, I also realized that the subscription's underlying replication
origin remote LSN is only set after some activity is seen *after* the initial
sync, so I also added a new check in pg_upgrade to make sure that all remote
origins tied to a subscription have a valid remote_lsn when the new option is
used.  Documentation is updated to cover that, same for the TAP tests.

v4 attached.
From a5823a0ea289860367e0ebfb76c7dad7be5337e7 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Wed, 22 Feb 2023 09:19:32 +0800
Subject: [PATCH v4] Optionally preserve the full subscription's state during
 pg_upgrade

Previously, only the subscription metadata information was preserved.  Without
the list of relations and their state it's impossible to re-enable the
subscriptions without missing some records as the list of relations can only be
refreshed after enabling the subscription (and therefore starting the apply
worker).  Even if we added a way to refresh the subscription while enabling a
publication, we still wouldn't know which relations are new on the publication
side, and therefore should be fully synced, and which shouldn't.

Similarly, the subscription's replication origins are needed to ensure
that we don't replicate anything twice.

To fix this problem, this patch teaches pg_dump in binary upgrade mode to emit
additional commands to be able to restore the content of pg_subscription_rel,
and an additional LSN parameter in the subscription creation to restore the
underlying replication origin remote LSN.  The LSN parameter is only accepted
in CREATE SUBSCRIPTION in binary upgrade mode.

The new ALTER SUBSCRIPTION subcommand, usable only during binary upgrade, has
the following syntax:

ALTER SUBSCRIPTION name ADD TABLE (relid = XYZ, state = 'x' [, lsn = 'X/Y'])

The relation is identified by its oid, as it's preserved during pg_upgrade.
The lsn is optional, and defaults to NULL / InvalidXLogRecPtr if not provided.
Explicitly passing InvalidXLogRecPtr (0/0) is however not allowed.

This mode is optional and not enabled by default.  A new
--preserve-subscription-state option is added to pg_upgrade to use it.  For
now, pg_upgrade will check that all the subscriptions have a valid replication
origin remote_lsn, and that all underlying relations are in 'r' (ready) state,
and will error out if that's not the case, logging the reason for the failure.

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: https://postgr.es/m/20230217075433.u5mjly4d5cr4hcfe@jrouhaud
---
 doc/src/sgml/ref/pgupgrade.sgml  |  19 +++
 src/backend/commands/subscriptioncmds.c  |  67 +++-
 src/backend/parser/gram.y|  11 ++
 src/bin/pg_dump/pg_backup.h  |   2 +
 src/bin/pg_dump/pg_dump.c| 114 -
 src/bin/pg_dump/pg_dump.h|  13 ++
 src/bin/pg_upgrade/check.c   |  82 +
 src/bin/pg_upgrade/dump.c|   3 +-
 src/bin/pg_upgrade/meson.build   |   1 +
 src/bin/pg_upgrade/option.c  |   7 +
 src/bin/pg_upgrade/pg_upgrade.h  |   1 +
 src/bin/pg_upgrade/t/003_subscription.pl | 204 +++
 src/include/nodes/parsenodes.h   |   3 +-
 13 files changed, 522 insertions(+), 5 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_subscription.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..b23c536954 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,25 @@ PostgreSQL documentation
   
  
 
+ 
+  --preserve-subscription-state
+  
+   
+Fully preserve the logical subscription state if any.  That includes
+the underlying replication origin with their remote LSN and the list of
+relations in each subscription so that replication can be simply
+resumed if the subscriptions are reactivated.
+

Re: Schema variables - new implementation for Postgres 15

2023-04-06 Thread Julien Rouhaud
On Thu, Apr 6, 2023 at 1:58 AM Pavel Stehule  wrote:
>
> st 5. 4. 2023 v 19:20 odesílatel Greg Stark  napsal:
>>
>> On Sun, 26 Mar 2023 at 07:34, Julien Rouhaud  wrote:
>> >
>> > This feature can significantly increase log size, so it's disabled by 
>> > default.
>> > For testing or development environments it's recommended to enable it if 
>> > you
>> > use session variables.
>>
>> I think it's generally not practical to have warnings for valid DML.
>> Effectively warnings in DML are errors since they make the syntax just
>> unusable. I suppose it's feasible to have it as a debugging option
>> that defaults to off but I'm not sure it's really useful.
>
>
> It is a tool that should help with collision detection.  Without it, it can 
> be pretty hard to detect it. It is similar to plpgsql's extra warnings.

Another example is escape_string_warning, which can also emit warning
for valid DML.  I once had to fix some random framework that a
previous employer was using, in order to move to a more recent pg
version and have standard_conforming_strings on, and having
escape_string_warning was quite helpful.




Re: [EXTERNAL] Support load balancing in libpq

2023-03-30 Thread Julien Rouhaud
On Thu, Mar 30, 2023 at 3:03 PM Daniel Gustafsson  wrote:
>
> > On 30 Mar 2023, at 03:48, Hayato Kuroda (Fujitsu) 
> >  wrote:
>
> > While checking the buildfarm, I found a failure on NetBSD caused by the 
> > added code[1]:
>
> Thanks for reporting, I see that lapwing which runs Linux (Debian 7, gcc 
> 4.7.2)
> has the same error.  I'll look into it today to get a fix committed.

This is an i686 machine, so it probably has the same void * pointer-size
difference.  Building with -m32 might be enough to reproduce the
problem.




Re: [POC] Allow an extension to add data into Query and PlannedStmt nodes

2023-03-30 Thread Julien Rouhaud
Hi,

On Wed, Mar 29, 2023 at 12:02:30PM +0500, Andrey Lepikhov wrote:
>
> Previously, we read in this mailing list some controversial opinions on
> queryid generation and the jumbling technique.  Here we don't intend to solve
> these problems, but to help extensions at least not conflict with each other
> on the queryId value.
>
> Extensions could need to pass some query-related data through all stages of
> the query planning and execution. As a trivial example, pg_stat_statements
> uses queryid at the end of execution to save some statistics. One more
> reason - extensions now conflict on queryid value and the logic of its
> changing. With this patch, it can be managed.

I just had a quick look at the patch, and IIUC it doesn't really help on that
side, as there's still a single official "queryid" that's computed, stored
everywhere and later used by pg_stat_statements (which then stores its own
data in addition to that official queryid).

You can currently change the main jumbling algorithm with a custom extension,
and all extensions will then use it as the source of truth, but I guess that
what you want is to e.g. have an additional and semantically different
queryid, and create multiple ecosystems of extensions, each using one or the
other source of queryid without changing the other ecosystem's behavior.
>
> This patch introduces the structure 'ExtensionData', which allows managing a
> list of entries with a couple of interface functions,
> addExtensionDataToNode() and GetExtensionData().  Keep in mind the possible
> future hiding of this structure from the public interface.
> An extension should invent a symbolic key to identify its data.  It may
> invent as many additional keys as it wants, but the best option here is no
> more than one entry for each extension.
>
> Ruthless pgbench benchmark shows that we got some overhead:
> 1.6% - in default mode
> 4% - in prepared mode
> ~0.1% in extended mode.

That's quite a significant overhead.  But the only reason to accept such a
change is to actually use it to store additional data, so it would be
interesting to see what the overhead is like once you store at least 2
different values there.

> - Do we need to invent a registration procedure to do away with the names
> of entries and use some compact integer IDs?

Note that the patch as proposed doesn't have any defense against two extensions
trying to register something with the same name, or updating a stored value, as
AddExtensionDataToNode() simply prepends the new value to the list.  You can
actually update the value by just storing the new value, but it will add a
significant overhead to every other extension that wants to read another value.

The API is also quite limited as each stored value has a single identifier.
What if your extension needs to store multiple things?  Since it's all based on
Node you can't really store some custom struct, so you probably have to end up
with things like "my_extension.my_val1", "my_extension.my_val2" which isn't
great.
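To illustrate the prepend-only concern, here is a toy model (not the patch's
actual API or types) of such a key/value list: storing a key twice "updates"
it because lookup returns the newest entry first, but the stale entry stays in
the list and lengthens every scan for every other extension's key.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of a prepend-only extension-data list. */
typedef struct Entry
{
    const char   *key;
    int           value;
    struct Entry *next;
} Entry;

/* Prepend a new entry; an existing entry with the same key is shadowed,
 * never replaced or removed. */
static Entry *
add_entry(Entry *head, const char *key, int value)
{
    Entry *e = malloc(sizeof(Entry));

    e->key = key;
    e->value = value;
    e->next = head;
    return e;
}

/* Linear scan: the most recently added entry for a key wins. */
static int
get_entry(const Entry *head, const char *key, int missing)
{
    for (const Entry *e = head; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0)
            return e->value;
    return missing;
}
```

Every "update" grows the list by one node, which is exactly the overhead that
readers of unrelated keys end up paying.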




Re: MacOS: xsltproc fails with "warning: failed to load external entity"

2023-03-27 Thread Julien Rouhaud
On Mon, Mar 27, 2023 at 02:06:34PM +0200, Daniel Gustafsson wrote:
> > On 27 Mar 2023, at 14:04, Dagfinn Ilmari Mannsåker  
> > wrote:
> > Daniel Gustafsson  writes:
>
> >> Applied with a tiny but of changes to make it look like the rest of the
> >> paragraph more. Thanks!
> >
> > Doesn't this apply to Apple Silicon generally, not just M1? M2 already
> > exists, and M3 etc. will presumably also appear at some point. The
> > linked Homebrew issue refers to Apple Silicon, not any specific models.
>
> That's a good point, it should say Apple Silicon and not M1 specifically.
> Thanks, I'll go fix.

Ah indeed that's a good point.  Thanks for pushing and fixing!




Re: pg_upgrade and logical replication

2023-03-27 Thread Julien Rouhaud
Hi,

On Thu, Mar 09, 2023 at 04:34:56PM +0800, Julien Rouhaud wrote:
> 
> Yeah I agree.  I added support to also preserve the subscription's replication
> origin information, a new --preserve-subscription-state (better naming 
> welcome)
> documented option for pg_upgrade to optionally ask for this new mode, and a
> similar (but undocumented) option for pg_dump that only works with
> --binary-upgrade and added a check in pg_upgrade that all relations are in 'r'
> (ready) mode.  Patch v2 attached.

I'm attaching a v3 to fix a recent conflict with pg_dump due to a563c24c9574b7
(Allow pg_dump to include/exclude child tables automatically).  While at it I
also tried to improve the documentation, explaining how that option could be
useful and what is the drawback of not using it (linking to the pg_dump note
about the same) if you plan to reactivate subscription(s) after an upgrade.
From 3a17a292805451c7b1733bd1e331bee91b2ce1c5 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Wed, 22 Feb 2023 09:19:32 +0800
Subject: [PATCH v3] Optionally preserve the full subscription's state during
 pg_upgrade

Previously, only the subscription metadata information was preserved.  Without
the list of relations and their state it's impossible to re-enable the
subscriptions without missing some records as the list of relations can only be
refreshed after enabling the subscription (and therefore starting the apply
worker).  Even if we added a way to refresh the subscription while enabling a
publication, we still wouldn't know which relations are new on the publication
side, and therefore should be fully synced, and which shouldn't.

Similarly, the subscription's replication origins are needed to ensure
that we don't replicate anything twice.

To fix this problem, this patch teaches pg_dump in binary upgrade mode to emit
additional commands to be able to restore the content of pg_subscription_rel,
and an additional LSN parameter in the subscription creation to restore the
underlying replication origin remote LSN.  The LSN parameter is only accepted
in CREATE SUBSCRIPTION in binary upgrade mode.

The new ALTER SUBSCRIPTION subcommand, usable only during binary upgrade, has
the following syntax:

ALTER SUBSCRIPTION name ADD TABLE (relid = XYZ, state = 'x' [, lsn = 'X/Y'])

The relation is identified by its oid, as it's preserved during pg_upgrade.
The lsn is optional, and defaults to NULL / InvalidXLogRecPtr.

This mode is optional and not enabled by default.  A new
--preserve-subscription-state option is added to pg_upgrade to use it.  For
now, pg_upgrade will check that all the subscription relations are in 'r'
(ready) state, and will error out if any subscription relation in any database
has a different state, logging the list of problematic databases with the
number of problematic relations in each.

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: https://postgr.es/m/20230217075433.u5mjly4d5cr4hcfe@jrouhaud
---
 doc/src/sgml/ref/pgupgrade.sgml |  18 
 src/backend/commands/subscriptioncmds.c |  67 +-
 src/backend/parser/gram.y   |  11 +++
 src/bin/pg_dump/pg_backup.h |   2 +
 src/bin/pg_dump/pg_dump.c   | 114 +++-
 src/bin/pg_dump/pg_dump.h   |  13 +++
 src/bin/pg_upgrade/check.c  |  54 +++
 src/bin/pg_upgrade/dump.c   |   3 +-
 src/bin/pg_upgrade/option.c |   7 ++
 src/bin/pg_upgrade/pg_upgrade.h |   1 +
 src/include/nodes/parsenodes.h  |   3 +-
 11 files changed, 288 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..0b3a8fd57b 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,24 @@ PostgreSQL documentation
   
  
 
+ 
+  --preserve-subscription-state
+  
+   
+Fully preserve the logical subscription state if any.  That includes
+the underlying replication origins with their remote LSN and the list of
+relations in each subscription so that replication can be simply
+resumed if the subscriptions are reactivated.
+If that option isn't used, it is up to the user to reactivate the
+subscriptions in a suitable way; see the subscription part in  for more information.
+If this option is used and any of the subscriptions on the old cluster
+has any relation in a state different from r
+(ready), the pg_upgrade run will error.
+   
+  
+ 
+
  
   -?
   --help
diff --git a/src/backend/commands/subscriptioncmds.c 
b/src/backend/commands/subscriptioncmds.c
index 8a26ddab1c..9e9d011c06 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -66,6 +66,8 @@
 #define SUBOPT_DISABLE_ON_ERR  0x0400
 #define SUBOPT_LSN 0x

Re: MacOS: xsltproc fails with "warning: failed to load external entity"

2023-03-27 Thread Julien Rouhaud
On Mon, Mar 27, 2023 at 10:32:52AM +0200, Daniel Gustafsson wrote:
> > On 27 Mar 2023, at 10:24, Julien Rouhaud  wrote:
> >
> > Hi,
> >
> > On Wed, Feb 08, 2023 at 05:18:13PM -0500, Tom Lane wrote:
> >> I pushed the discussed documentation improvements, and changed the
> >> behavior of "ninja docs" to only build the HTML docs.  However,
> >> I've not done anything about documenting what is the minimum
> >> ninja version.
> >
> > FTR the documented XML_CATALOG_FILES environment variable is only valid for
> > Intel based machines, as homebrew installs everything in a different 
> > location
> > for M1...
> >
> > I'm attaching a patch to make that distinction, hoping that no one else will
> > have to waste time trying to figure out how to get it working on such 
> > hardware.
>
> LGTM apart from the double // in the export which is easy enough to fix before
> pushing.
>
> +export XML_CATALOG_FILES=/opt/homebrew//etc/xml/catalog

Oh, I didn't notice it.  Apparently Apple's find isn't smart enough to trim a /
when fed a directory with a trailing /.

> For reference on why Homebrew use a different structure on Apple M1 the below
> issue has more details:
>
>   https://github.com/Homebrew/brew/issues/9177

Ah I was wondering why, thanks!
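For reference, the Intel/Apple Silicon prefix split can also be derived programmatically rather than documented twice. A minimal sketch (the only assumption is the standard Homebrew defaults: `/opt/homebrew` on Apple Silicon, `/usr/local` on Intel):

```python
# Sketch: derive the XML catalog path from the machine type instead of
# hardcoding one architecture.  Assumes the standard Homebrew install
# prefixes: /opt/homebrew on Apple Silicon, /usr/local on Intel.
import os


def homebrew_prefix(machine):
    """Default Homebrew prefix for a given `uname -m` value on macOS."""
    if machine == "arm64":      # Apple Silicon (M1 and later)
        return "/opt/homebrew"
    return "/usr/local"         # Intel


def xml_catalog_path(machine):
    # os.path.join also avoids the doubled slash mentioned earlier in
    # the thread.
    return os.path.join(homebrew_prefix(machine), "etc/xml/catalog")
```

In practice one would call `xml_catalog_path(platform.machine())` and export the result as `XML_CATALOG_FILES`.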




Re: MacOS: xsltproc fails with "warning: failed to load external entity"

2023-03-27 Thread Julien Rouhaud
Hi,

On Wed, Feb 08, 2023 at 05:18:13PM -0500, Tom Lane wrote:
> I pushed the discussed documentation improvements, and changed the
> behavior of "ninja docs" to only build the HTML docs.  However,
> I've not done anything about documenting what is the minimum
> ninja version.

FTR the documented XML_CATALOG_FILES environment variable is only valid for
Intel based machines, as homebrew installs everything in a different location
for M1...

I'm attaching a patch to make that distinction, hoping that no one else will
have to waste time trying to figure out how to get it working on such hardware.
>From e51f1b13dc70798b37e9d8f4bb34664fe138dd86 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Mon, 27 Mar 2023 16:18:12 +0800
Subject: [PATCH] Fix XML_CATALOG_FILES env var for Apple M1 based machines.

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: FIXME
---
 doc/src/sgml/docguide.sgml | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/docguide.sgml b/doc/src/sgml/docguide.sgml
index cf8c072a49..81626d51cd 100644
--- a/doc/src/sgml/docguide.sgml
+++ b/doc/src/sgml/docguide.sgml
@@ -209,9 +209,13 @@ brew install docbook docbook-xsl libxslt fop
 

 The Homebrew-supplied programs require the following environment variable
-to be set:
+to be set.  For Intel based machines:
 
 export XML_CATALOG_FILES=/usr/local/etc/xml/catalog
+
+or for Apple M1 based machines:
+
+export XML_CATALOG_FILES=/opt/homebrew//etc/xml/catalog
 
 Without it, xsltproc will throw errors like this:
 
-- 
2.37.0



Re: Schema variables - new implementation for Postgres 15

2023-03-26 Thread Julien Rouhaud
Hi,

I just have a few minor wording improvements for the various comments /
documentation you quoted.

On Sun, Mar 26, 2023 at 08:53:49AM +0200, Pavel Stehule wrote:
> út 21. 3. 2023 v 17:18 odesílatel Peter Eisentraut <
> peter.eisentr...@enterprisedb.com> napsal:
>
> > - What is the purpose of struct Variable?  It seems very similar to
> >FormData_pg_variable.  At least a comment would be useful.
> >
>
> I wrote comment there:
>
>
> /*
>  * The Variable struct is based on FormData_pg_variable struct. Against
>  * FormData_pg_variable it can hold node of deserialized expression used
>  * for calculation of default value.
>  */

Did you mean "Unlike" rather than "Against"?

> > 0002
> >
> > expr_kind_allows_session_variables() should have some explanation
> > about criteria for determining which expression kinds should allow
> > variables.
> >
>
> I wrote comment there:
>
>  /*
>   * Returns true, when expression of kind allows using of
>   * session variables.
> + * The session's variables can be used everywhere where
> + * can be used external parameters. Session variables
> + * are not allowed in DDL. Session's variables cannot be
> + * used in constraints.
> + *
> + * The identifier can be parsed as an session variable
> + * only in expression's kinds where session's variables
> + * are allowed. This is the primary usage of this function.
> + *
> + * Second usage of this function is for decision if
> + * an error message "column does not exist" or "column
> + * or variable does not exist" should be printed. When
> + * we are in expression, where session variables cannot
> + * be used, we raise the first form or error message.
>   */

Maybe

/*
 * Returns true if the given expression kind is valid for session variables.
 * Session variables can be used everywhere where external parameters can be
 * used.  Session variables are not allowed in DDL commands or in constraints.
 *
 * An identifier can be parsed as a session variable only for expression kinds
 * where session variables are allowed. This is the primary usage of this
 * function.
 *
 * Second usage of this function is to decide whether "column does not exist" or
 * "column or variable does not exist" error message should be printed.
 * When we are in an expression where session variables cannot be used, we raise
 * the first form of error message.
 */

> > session_variables_ambiguity_warning: There needs to be more
> > information about this.  The current explanation is basically just,
> > "warn if your query is confusing".  Why do I want that?  Why would I
> > not want that?  What is the alternative?  What are some examples?
> > Shouldn't there be a standard behavior without a need to configure
> > anything?
> >
>
> I enhanced this entry:
>
> +   
> +The session variables can be shadowed by column references in a
> query. This
> +is an expected feature. The existing queries should not be broken
> by creating
> +any session variable, because session variables are shadowed
> always if the
> +identifier is ambiguous. The variables should be named without
> possibility
> +to collision with identifiers of other database objects (column
> names or
> +record field names). The warnings enabled by setting
> session_variables_ambiguity_warning
> +should help with finding identifier's collisions.

Maybe

Session variables can be shadowed by column references in a query, this is an
expected behavior.  Previously working queries shouldn't error out by creating
any session variable, so session variables are always shadowed if an identifier
is ambiguous.  Variables should be referenced using an unambiguous identifier
without any possibility of collision with identifiers of other database
objects (column names or record field names).  The warning messages emitted
when enabling session_variables_ambiguity_warning can help
find such identifier collisions.

> +   
> +   
> +This feature can significantly increase size of logs, and then it
> is
> +disabled by default, but for testing or development environments it
> +should be enabled.

Maybe

This feature can significantly increase log size, so it's disabled by default.
For testing or development environments it's recommended to enable it if you
use session variables.




Re: pg_upgrade and logical replication

2023-03-23 Thread Julien Rouhaud
Hi,

On Thu, Mar 23, 2023 at 04:27:28PM +0900, Masahiko Sawada wrote:
>
> I might be missing something but is there any reason why you created a
> subscription before pg_upgrade?
>
> Steps like doing pg_upgrade, then creating missing tables, and then
> creating a subscription (with copy_data = false) could be an
> alternative way to support upgrading the server from the physical
> standby?

As I already answered Nikolay, and explained in my very first email, yes
it's possible to create the subscriptions after running pg_upgrade.  I
personally prefer to do it first to make sure that the logical replication is
actually functional, so I can still easily do a pg_rewind or something to fix
things without having to trash the newly built (and promoted) replica.

But that exact scenario is a corner case, as in any other scenario pg_upgrade
leaves the subscription in an unrecoverable state, where you have to truncate
all the underlying tables first and start from scratch doing an initial sync.
This kind of defeats the purpose of pg_upgrade.




Re: pg_dump versus hash partitioning

2023-03-17 Thread Julien Rouhaud
On Fri, Mar 17, 2023 at 01:44:12PM -0400, Tom Lane wrote:
> Julien Rouhaud  writes:
> > On Thu, Mar 16, 2023 at 08:43:56AM -0400, Tom Lane wrote:
> >> I think the odds of that yielding a usable dump are nil, so I don't
> >> see why we should bother.
>
> > No objection from me.
>
> OK, pushed with the discussed changes.

Great news, thanks a lot!




Re: pg_dump versus hash partitioning

2023-03-16 Thread Julien Rouhaud
On Thu, Mar 16, 2023 at 08:43:56AM -0400, Tom Lane wrote:
> Julien Rouhaud  writes:
> > On Mon, Mar 13, 2023 at 07:39:12PM -0400, Tom Lane wrote:
> >> Yeah, we need to do both.  Attached find an updated patch series:
>
> > I didn't find a CF entry, is it intended?
>
> Yeah, it's there:
>
> https://commitfest.postgresql.org/42/4226/

Ah thanks!  I was looking for "pg_dump" on the page.

> > I'm not sure if you intend to keep the current 0002 - 0003 separation, but 
> > if
> > yes the part about te->defn and possible fallback should be moved to 0003.  
> > In
> > 0002 we're only looking at the COPY statement.
>
> I was intending to smash it all into one commit --- the separation is
> just to ease review.

Ok, no problem then and I agree it's better to squash them.

> > I know you're only inheriting this comment, but isn't it well outdated and 
> > not
> > accurate anymore?  I'm assuming that "archiving is not on" was an acceptable
> > way to mean "wal_level < archive" at some point, but it's now completely
> > misleading.
>
> Sure, want to propose wording?

Just mentioning the exact wal_level, something like

 * [...].  If wal_level is set to minimal this prevents
 * WAL-logging the COPY.  This obtains a speedup similar to
 * [...]


> > More generally, I also think that forcing --load-via-partition-root for
> > known unsafe partitioning seems like the best compromise.  I'm not sure if 
> > we
> > should have an option to turn it off though.
>
> I think the odds of that yielding a usable dump are nil, so I don't
> see why we should bother.

No objection from me.




Re: pg_dump versus hash partitioning

2023-03-16 Thread Julien Rouhaud
On Mon, Mar 13, 2023 at 07:39:12PM -0400, Tom Lane wrote:
> Julien Rouhaud  writes:
> > On Sun, Mar 12, 2023 at 03:46:52PM -0400, Tom Lane wrote:
> >> The trick is to detect in pg_restore whether pg_dump chose to do
> >> load-via-partition-root.
>
> > Given that this approach wouldn't help with existing dump files (at least if
> > using COPY, in any case the one using INSERT are doomed), so I'm slightly in
> > favor of the first approach, and later add an easy and non magic incantation
> > way to produce dumps that don't depend on partitioning.
>
> Yeah, we need to do both.  Attached find an updated patch series:

I didn't find a CF entry, is it intended?

> 0001: TAP test that exhibits both this deadlock problem and the
> different-hash-codes problem.  I'm not sure if we want to commit
> this, or if it should be in exactly this form --- the second set
> of tests with a manual --load-via-partition-root switch will be
> pretty redundant after this patch series.

I think there should be at least the first set of tests committed.  I would
also be happy to see some test with the --insert case, even if those are
technically redundant too, as it's quite cheap to do once you setup the
cluster.

> 0002: Make pg_restore detect load-via-partition-root by examining the
> COPY commands embedded in the dump, and skip the TRUNCATE if so,
> thereby fixing the deadlock issue.  This is the best we can do for
> legacy dump files, I think, but it should be good enough.

is_load_via_partition_root():

+ * In newer archive files this can be detected by checking for a special
+ * comment placed in te->defn.  In older files we have to fall back to seeing
+ * if the COPY statement targets the named table or some other one.  This
+ * will not work for data dumped as INSERT commands, so we could give a false
+ * negative in that case; fortunately, that's a rarely-used option.

I'm not sure if you intend to keep the current 0002 - 0003 separation, but if
yes the part about te->defn and possible fallback should be moved to 0003.  In
0002 we're only looking at the COPY statement.
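The fallback heuristic for legacy archives — checking whether the embedded COPY statement targets the TOC entry's own table or some other one — could be sketched roughly as follows. This is a hypothetical Python model of the idea, not the actual pg_restore code (which is C), and quoted identifiers are deliberately ignored:

```python
import re


def copy_targets_named_table(copy_stmt, table_name):
    """Heuristic: does this COPY statement load into table_name?

    A COPY emitted by pg_dump looks like
        COPY schema.tablename (col1, col2, ...) FROM stdin;
    With --load-via-partition-root the target is the root table, so a
    mismatch with the TOC entry's own table name signals that mode.
    """
    m = re.match(r'COPY\s+(\S+)\s*\(', copy_stmt)
    if not m:
        return False    # e.g. data dumped as INSERTs: undecidable here
    target = m.group(1)
    # Compare the unqualified name; pg_dump may schema-qualify the target.
    return target.split('.')[-1] == table_name
```

A partition dumped normally matches its own name (`COPY public.t_p1 ... / 't_p1'`), while one dumped via the partition root does not — which is exactly the false-negative-for-INSERTs limitation the patch comment mentions.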

-* the run then we wrap the COPY in a transaction and
-* precede it with a TRUNCATE.  If archiving is not on
-* this prevents WAL-logging the COPY.  This obtains a
-* speedup similar to that from using single_txn mode in
-* non-parallel restores.
+* the run and we are not restoring a
+* load-via-partition-root data item then we wrap the COPY
+* in a transaction and precede it with a TRUNCATE.  If
+* archiving is not on this prevents WAL-logging the COPY.
+* This obtains a speedup similar to that from using
+* single_txn mode in non-parallel restores.

I know you're only inheriting this comment, but isn't it well outdated and not
accurate anymore?  I'm assuming that "archiving is not on" was an acceptable
way to mean "wal_level < archive" at some point, but it's now completely
misleading.

Minor nitpicking:
- should the function name be prefixed with a "_" like almost all nearby code?
- should there be an assert that the given toc entry is indeed a TABLE DATA?

FWIW it unsurprisingly fixes the problem on my original use case.

> 0003: Also detect load-via-partition-root by adding a label in the
> dump.  This is a more bulletproof solution going forward.
>
> 0004-0006: same as previous patches, but rebased over these.
> This gets us to a place where the new TAP test passes.

+getPartitioningInfo(Archive *fout)
+{
+   PQExpBuffer query;
+   PGresult   *res;
+   int ntups;
+
+   /* no partitions before v10 */
+   if (fout->remoteVersion < 10)
+   return;
+ [...]
+   /*
+* Unsafe partitioning schemes are exactly those for which hash enum_ops
+* appears among the partition opclasses.  We needn't check partstrat.
+*
+* Note that this query may well retrieve info about tables we aren't
+* going to dump and hence have no lock on.  That's okay since we need not
+* invoke any unsafe server-side functions.
+*/
+   appendPQExpBufferStr(query,
+"SELECT partrelid FROM pg_partitioned_table WHERE\n"
+"(SELECT c.oid FROM pg_opclass c JOIN pg_am a "
+"ON c.opcmethod = a.oid\n"
+"WHERE opcname = 'enum_ops' "
+"AND opcnamespace = 'pg_catalog'::regnamespace "
+"AND amname = 'hash') = ANY(partclass)");

Hash partitioning was added in pg11; should we bypass pg10 too, with a comment
saying that we only care about hash, at least for the foreseeable future?

Other than that, the pat

Re: pg_dump versus hash partitioning

2023-03-13 Thread Julien Rouhaud
On Sun, Mar 12, 2023 at 03:46:52PM -0400, Tom Lane wrote:
> Julien Rouhaud  writes:
> > The BEGIN + TRUNCATE is only there to avoid generating WAL records just in 
> > case
> > the wal_level is minimal.  I don't remember if that optimization still 
> > exists,
> > but if yes we could avoid doing that if the server's wal_level is replica or
> > higher?  That's not perfect but it would help in many cases.
>
> After thinking about this, it seems like a better idea is to skip the
> TRUNCATE if we're doing load-via-partition-root.  In that case it's
> clearly a dangerous thing to do regardless of deadlock worries, since
> it risks discarding previously-loaded data that came over from another
> partition.  (IOW this seems like an independent, pre-existing bug in
> load-via-partition-root mode.)

It seems quite unlikely that one could actually truncate already-restored data
without eventually running into a deadlock, but it's still possible, so agreed.

> The trick is to detect in pg_restore whether pg_dump chose to do
> load-via-partition-root.  If we have a COPY statement we can fairly
> easily examine it to see if the target table is what we expect or
> something else.  However, if the table was dumped as INSERT statements
> it'd be far messier; the INSERTs are not readily accessible from the
> code that needs to make the decision.
>
> What I propose we do about that is further tweak things so that
> load-via-partition-root forces dumping via COPY.  AFAIK the only
> compelling use-case for dump-as-INSERTs is in transferring data
> to a non-Postgres database, which is a context in which dumping
> partitioned tables as such is pretty hopeless anyway.

It seems acceptable to me.

> (I wonder if
> we should have some way to dump all the contents of a partitioned
> table as if it were unpartitioned, to support such migration.)

(this would be nice to have)

> An alternative could be to extend the archive TOC format to record
> directly whether a given TABLE DATA object loads data via partition
> root or normally.  Potentially we could do that without an archive
> format break by defining te->defn for TABLE DATA to be empty for
> normal dumps (as it is now) or something like "-- load via partition root"
> for the load-via-partition-root case.  However, depending on examination
> of the COPY command would already work for the vast majority of existing
> archive files, so I feel like it might be the preferable choice.

Given that this approach wouldn't help with existing dump files (at least those
using COPY; the ones using INSERT are doomed in any case), I'm slightly in
favor of the first approach, later adding an easy and non-magic incantation
to produce dumps that don't depend on partitioning.  It would mean that you
would only be able to produce such dumps using pg16 client binaries, but such a
version would also work with older server versions, so it doesn't seem like a
huge problem in the long run.




Re: pg_dump versus hash partitioning

2023-03-10 Thread Julien Rouhaud
On Fri, Mar 10, 2023 at 10:10:14PM -0500, Tom Lane wrote:
> Julien Rouhaud  writes:
> > Working on some side project that can cause dump of hash partitions to be
> > routed to a different partition, I realized that --load-via-partition-root 
> > can
> > indeed cause deadlock in such case without FK dependency or anything else.
>
> > The problem is that each worker will perform a TRUNCATE TABLE ONLY followed 
> > by
> > a copy of the original partition's data in a transaction, and that obviously
> > will lead to deadlock if the original and locked partition and the restored
> > partition are different.
>
> Oh, interesting.  I wonder if we can rearrange things to avoid that.

The BEGIN + TRUNCATE is only there to avoid generating WAL records just in case
the wal_level is minimal.  I don't remember if that optimization still exists,
but if yes we could avoid doing that if the server's wal_level is replica or
higher?  That's not perfect but it would help in many cases.




Re: pg_dump versus hash partitioning

2023-03-10 Thread Julien Rouhaud
On Tue, Feb 14, 2023 at 02:21:33PM -0500, Tom Lane wrote:
> Here's a set of draft patches around this issue.
>
> 0001 does what I last suggested, ie force load-via-partition-root for
> leaf tables underneath a partitioned table with a partitioned-by-hash
> enum column.  It wasn't quite as messy as I first feared, although we do
> need a new query (and pg_dump now knows more about pg_partitioned_table
> than it used to).
>
> I was a bit unhappy to read this in the documentation:
>
> It is best not to use parallelism when restoring from an archive made
> with this option, because pg_restore will
> not know exactly which partition(s) a given archive data item will
> load data into.  This could result in inefficiency due to lock
> conflicts between parallel jobs, or perhaps even restore failures due
> to foreign key constraints being set up before all the relevant data
> is loaded.
>
> This made me wonder if this could be a usable solution at all, but
> after thinking for awhile, I don't see how the claim about foreign key
> constraints is anything but FUD.  pg_dump/pg_restore have sufficient
> dependency logic to prevent that from happening.  I think we can just
> drop the "or perhaps ..." clause here, and tolerate the possible
> inefficiency as better than failing.

Working on some side project that can cause dump of hash partitions to be
routed to a different partition, I realized that --load-via-partition-root can
indeed cause deadlock in such case without FK dependency or anything else.

The problem is that each worker will perform a TRUNCATE TABLE ONLY followed by
a copy of the original partition's data in a transaction, and that obviously
will lead to deadlock if the original and locked partition and the restored
partition are different.
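The failure mode above is the classic inconsistent-lock-ordering deadlock: each restore worker holds the lock taken by its TRUNCATE and then waits for the partition the rows actually hash to. A toy wait-for-graph model (names and data shapes hypothetical) makes the cycle explicit:

```python
def has_deadlock(workers):
    """workers: list of (held_lock, wanted_lock) pairs, one per worker.

    Build a wait-for graph between lock holders and detect a cycle,
    which is conceptually what the server's deadlock detector does.
    """
    holder = {held: i for i, (held, _) in enumerate(workers)}
    waits_for = {}
    for i, (_, wanted) in enumerate(workers):
        j = holder.get(wanted)
        # no edge if the lock is free or the worker already holds it
        waits_for[i] = j if j is not None and j != i else None

    for start in waits_for:
        seen = set()
        cur = start
        while cur is not None and cur not in seen:
            seen.add(cur)
            cur = waits_for[cur]
        if cur is not None:      # walked back onto a visited worker
            return True
    return False
```

Two workers that each truncated their own partition but need to COPY into the other's (`[("p1", "p2"), ("p2", "p1")]`) form a cycle; when every row lands back in the partition it came from, no worker waits and there is no deadlock.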




Re: Combine pg_walinspect till_end_of_wal functions with others

2023-03-10 Thread Julien Rouhaud
On Fri, 10 Mar 2023, 16:14 Michael Paquier,  wrote:

> On Fri, Mar 10, 2023 at 04:04:15PM +0800, Julien Rouhaud wrote:
> > As long as we provide a sensible default value (so I guess '0/0' to
> > mean "no upper bound") and that we therefore don't have to manually
> > specify an upper bound if we don't want one I'm fine with keeping the
> > functions marked as STRICT.
>
> FWIW, using also InvalidXLogRecPtr as a shortcut to say "Don't fail,
> just do the job" is fine by me.


Isn't '0/0' the same as InvalidXLogRecPtr?  But my point is that we shouldn't
require spelling it explicitly, just rely on the default value.

> Something like a FFF/ should
> just mean the same on a fresh cluster, still it gets risky the longer
> the WAL is generated.

yeah, it would be handy to accept 'infinity' in that context.
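For reference, a pg_lsn value like '0/0' is just the textual form of a 64-bit WAL position (high and low 32 bits in hex), so '0/0' does denote the same value as InvalidXLogRecPtr (zero). A small sketch of the mapping:

```python
def parse_lsn(text):
    """Parse PostgreSQL's textual LSN form 'XXXXXXXX/XXXXXXXX' into the
    underlying 64-bit integer (XLogRecPtr)."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


INVALID_XLOG_REC_PTR = 0    # InvalidXLogRecPtr in the backend
```

So `parse_lsn("0/0")` is `0` (invalid/no bound), and the largest representable position — the "infinity" an upper bound could accept — is `parse_lsn("FFFFFFFF/FFFFFFFF")`, i.e. `2**64 - 1`.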



Re: Combine pg_walinspect till_end_of_wal functions with others

2023-03-10 Thread Julien Rouhaud
On Fri, Mar 10, 2023 at 12:24 PM Michael Paquier  wrote:
>
> On Wed, Mar 08, 2023 at 01:40:46PM +0530, Bharath Rupireddy wrote:
> > 1. When start_lsn is NULL or invalid ('0/0'), emit an error. There was
> > a comment on the functions automatically determining start_lsn to be
> > the oldest WAL LSN. I'm not implementing this change now, as it
> > requires extra work to traverse the pg_wal directory. I'm planning to
> > do it in the next set of improvements where I'm planning to make the
> > functions timeline-aware, introduce functions for inspecting
> > wal_buffers and so on.
> >
> > [.. long description ..]
> >
> > 9. Refactored the tests according to the new behaviours.
>
> Hmm.  I think this patch ought to have a result simpler than what's
> proposed here.
>
> First, do we really have to begin marking the functions as non-STRICT
> to abide with the treatment of NULL as a special value?  The part that
> I've found personally the most annoying with these functions is that
> an incorrect bound leads to a random failure, particularly when such
> queries are used for monitoring.

As long as we provide a sensible default value (so I guess '0/0' to
mean "no upper bound"), so that we don't have to manually specify an
upper bound if we don't want one, I'm fine with keeping the
functions marked as STRICT.

>  I would simplify the whole with two
> simple rules, as of:
> - Keeping all the functions strict.
> - When end_lsn is a LSN in the future of the current LSN inserted or
> replayed, adjust its value to be the exactly GetXLogReplayRecPtr() or
> GetFlushRecPtr().  This way, monitoring tools can use a value ahead,
> at will.
> - Failing if start_lsn > end_lsn.
> - Failing if start_lsn refers to a position older than what exists is
> still fine by me.

+1
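The rules quoted above amount to clamping the upper bound and validating the window before reading. A hedged sketch of that decision logic (a pure model, not the extension's C code):

```python
def resolve_bounds(start_lsn, end_lsn, current_lsn):
    """Validate/adjust the (start, end) window per the proposed rules:
    - an end_lsn ahead of the current inserted/replayed LSN is clamped
      to that current position, so monitoring can pass a value "in the
      future" at will;
    - a start_lsn beyond the (clamped) end is an error rather than a
      silent empty result.
    LSNs are modeled as plain 64-bit integers (XLogRecPtr).
    """
    end = min(end_lsn, current_lsn)
    if start_lsn > end:
        raise ValueError("start LSN is ahead of end LSN")
    return start_lsn, end
```

With this shape a monitoring query can always pass a far-future end bound and never fail randomly as WAL advances.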

> I would also choose to remove
> pg_get_wal_records_info_till_end_of_wal() from the SQL interface in
> 1.1 to limit the confusion arount it, but keep a few lines of code so
> as we are still able to use it when pg_walinspect 1.0 is the version
> enabled with CREATE EXTENSION.

Yeah the SQL function should be removed no matter what.




Re: pg_upgrade and logical replication

2023-03-09 Thread Julien Rouhaud
Hi,

On Thu, Mar 09, 2023 at 12:05:36PM +0530, Amit Kapila wrote:
> On Wed, Mar 8, 2023 at 12:26 PM Julien Rouhaud  wrote:
> >
> > Is there something that can be done for pg16? I was thinking that having a
> > fix for the normal and easy case could be acceptable: only allowing
> > pg_upgrade to optionally, and not by default, preserve the subscription
> > relations IFF all subscriptions only have tables in ready state. Different
> > states should be transient, and it's easy to check as a user beforehand and
> > also easy to check during pg_upgrade, so it seems like an acceptable
> > limitations (which I personally see as a good sanity check, but YMMV). It
> > could be lifted in later releases if wanted anyway.
> >
> > It's unclear to me whether this limited scope would also require to
> > preserve the replication origins, but having looked at the code I don't
> > think it would be much of a problem as the local LSN doesn't have to be
> > preserved.
> >
>
> I think we need to preserve replication origins as they help us to
> determine the WAL location from where to start the streaming after the
> upgrade. If we don't preserve those then from which location will the
> subscriber start streaming?

It would start from the slot's information on the publisher side, but I guess
there's no guarantee that this will be accurate in all cases.

> We don't want to replicate the WAL which
> has already been sent.

Yeah I agree.  I added support to also preserve the subscription's replication
origin information, a new --preserve-subscription-state (better naming welcome)
documented option for pg_upgrade to optionally ask for this new mode, and a
similar (but undocumented) option for pg_dump that only works with
--binary-upgrade and added a check in pg_upgrade that all relations are in 'r'
(ready) mode.  Patch v2 attached.
From 0a77ac305243e0f58dbfce6bb7c8cf062b45d4f4 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Wed, 22 Feb 2023 09:19:32 +0800
Subject: [PATCH v2] Optionally preserve the full subscription's state during
 pg_upgrade

Previously, only the subscription metadata information was preserved.  Without
the list of relations and their state it's impossible to re-enable the
subscriptions without missing some records as the list of relations can only be
refreshed after enabling the subscription (and therefore starting the apply
worker).  Even if we added a way to refresh the subscription while enabling a
publication, we still wouldn't know which relations are new on the publication
side, and therefore should be fully synced, and which shouldn't.

Similarly, the subscription's replication origins are needed to ensure
that we don't replicate anything twice.

To fix this problem, this patch teaches pg_dump in binary upgrade mode to emit
additional commands to be able to restore the content of pg_subscription_rel,
and an additional LSN parameter in the subscription creation to restore the
underlying replication origin remote LSN.  The LSN parameter is only accepted
in CREATE SUBSCRIPTION in binary upgrade mode.

The new ALTER SUBSCRIPTION subcommand, usable only during binary upgrade, has
the following syntax:

ALTER SUBSCRIPTION name ADD TABLE (relid = XYZ, state = 'x' [, lsn = 'X/Y'])

The relation is identified by its oid, as it's preserved during pg_upgrade.
The lsn is optional, and defaults to NULL / InvalidXLogRecPtr.

This mode is optional and not enabled by default.  A new
--preserve-subscription-state option is added to pg_upgrade to use it.  For
now, pg_upgrade will check that all the subscription relations are in 'r'
(ready) state, and will error out if any subscription relation in any database
has a different state, logging the list of problematic databases with the
number of problematic relations in each.

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: https://postgr.es/m/20230217075433.u5mjly4d5cr4hcfe@jrouhaud
---
 doc/src/sgml/ref/pgupgrade.sgml |  13 +++
 src/backend/commands/subscriptioncmds.c |  67 +-
 src/backend/parser/gram.y   |  11 +++
 src/bin/pg_dump/pg_backup.h |   2 +
 src/bin/pg_dump/pg_dump.c   | 114 +++-
 src/bin/pg_dump/pg_dump.h   |  13 +++
 src/bin/pg_upgrade/check.c  |  54 +++
 src/bin/pg_upgrade/dump.c   |   3 +-
 src/bin/pg_upgrade/option.c |   7 ++
 src/bin/pg_upgrade/pg_upgrade.h |   1 +
 src/include/nodes/parsenodes.h  |   3 +-
 11 files changed, 283 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..aef3b8a8b8 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,19 @@ PostgreSQL documentation
   
  
 
+ 
+  --preserve-subscription-state
+  
+   
+Fully preserve 

Re: pg_upgrade and logical replication

2023-03-07 Thread Julien Rouhaud
On Sat, 4 Mar 2023, 14:13 Amit Kapila,  wrote:

>
> > For the publisher nodes, that may be something nice to support (I'm assuming it
> > could be useful for more complex replication setups) but I'm not interested in
> > that at the moment as my goal is to reduce downtime for major upgrade of
> > physical replica, thus *not* doing pg_upgrade of the primary node, whether
> > physical or logical.  I don't see why it couldn't be done later on, if/when
> > someone has a use case for it.
> >
>
> I thought there is value if we provide a way to upgrade both publisher
> and subscriber.


It's still unclear to me whether it's actually achievable on the publisher
side, as running pg_upgrade leaves a "hole" in the WAL stream and resets
the timeline, among other possible difficulties. I don't know much about
logical replication internals, so I'm clearly not the best person to answer
those questions.

> Now, you came up with a use case linking it to a
> physical replica where allowing an upgrade of only subscriber nodes is
> useful. It is possible that users find your steps easy to perform and
> didn't find them error-prone but it may be better to get some
> confirmation of the same. I haven't yet analyzed all the steps in
> detail but let's see what others think.
>

It's been quite some time since then and no one has chimed in or objected.
IMO doing a major version upgrade with limited downtime (i.e. something
faster than stopping postgres and running pg_upgrade) has always been
difficult, and that difficulty never prevented anyone from attempting it, so
I don't think it should be a blocker for what I'm suggesting here, especially
since the current behavior of pg_upgrade on a subscriber node is IMHO broken.

Is there something that can be done for pg16? I was thinking that having a
fix for the normal and easy case could be acceptable: only allowing
pg_upgrade to optionally, and not by default, preserve the subscription
relations IFF all subscriptions only have tables in the ready state.
Different states should be transient, and it's easy for a user to check
beforehand and also easy to check during pg_upgrade, so it seems like an
acceptable limitation (which I personally see as a good sanity check, but
YMMV). It could be lifted in later releases if wanted anyway.
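As a concrete sketch of the "easy to check as a user beforehand" part
(illustrative only, not taken from the patch), a query along these lines
would list every subscription relation in the current database that is not
yet in the 'r' (ready) state:

```sql
-- Subscription relations not yet 'r' (ready); other states include
-- 'i' (initialize), 'd' (data copy) and 's' (synchronized).
SELECT s.subname, c.relname, sr.srsubstate
FROM pg_subscription_rel sr
JOIN pg_subscription s ON s.oid = sr.srsubid
JOIN pg_class c ON c.oid = sr.srrelid
WHERE sr.srsubstate <> 'r';
```

An empty result in every database of the cluster would mean the proposed
pg_upgrade check passes.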

It's unclear to me whether this limited scope would also require preserving
the replication origins, but having looked at the code I don't think it
would be much of a problem, as the local LSN doesn't have to be preserved.
In both cases I would prefer a single option (e.g.
--preserve-logical-subscription-state or something like that) to avoid too
many complications. Similarly, I still don't see any sensible use case for
allowing such an option in a normal pg_dump, so I'd rather not expose it there.



Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?

2023-03-06 Thread Julien Rouhaud
On Tue, Mar 07, 2023 at 07:14:51PM +1300, Thomas Munro wrote:
> On Tue, Mar 7, 2023 at 5:32 PM Michael Paquier  wrote:
> > On Tue, Mar 07, 2023 at 03:44:46PM +1300, Thomas Munro wrote:
> > > Oh, you already pushed a fix.  But now I'm wondering if it's useful to
> > > have old buggy compilers set to run with -Werror.
> >
> > Yes, as far as I can see when investigating the issue, this is an old
> > bug of gcc when detecting where the initialization needs to be
> > applied.  And at the same time the fix is deadly simple, so the
> current status quo does not sound that bad to me.  Note that lapwing is
> one of the only animals testing 32b builds, and it has saved us from
> quite a few bugs over the years.
>
> Yeah, but I'm just wondering, why not run a current release on it[1]?
> Debian is one of the few distributions still supporting 32 bit
> kernels, and it's good to test rare things, but AFAIK the primary
> reason we finish up with EOL'd OSes in the 'farm is because they have
> been forgotten (the secondary reason is because they couldn't be
> upgraded because the OS dropped the [micro]architecture).

I registered lapwing as a 32b Debian 7 animal, so I thought it would be
expected to keep it as-is rather than upgrading it through every newer major
Debian version, especially since newer Debian animals have been registered
(no 32b ones though, AFAICS).  I'm not opposed to upgrading it, but I think
there's still value in having somewhat old package versions being tested,
especially since there isn't much 32b coverage of those.  I would be happy to
register a newer 32b version, or even sid, if needed, but the -m32 part on
the CI makes me think there isn't much value in doing that now.

Now about the -Werror:

> BTW CI also tests 32 bit with -m32 on Debian, but with a 64 bit
> kernel, which probably doesn't change much at the level we care about,
> so maybe this doesn't matter much... just sharing an observation that
> we're wasting time thinking about an OS release that gave up the ghost
> in 2016, because it is running with -Werror.  *shrug*

I think this is the first time that a problem raised by -Werror on that old
animal is actually a false positive, while there were many times it reported
useful stuff.  Now, this animal predates the better CI tooling we have today,
especially the -m32 support, so there might not be much value in keeping it
anymore.  As I mentioned at [1], I don't mind removing it, or just working on
upgrading any dependency (or removing known buggy compiler flags) to keep it
around without it being annoying.  In any case I'm usually quite fast at
reacting to any problem/complaint on that animal, so you don't have to worry
about the buildfarm being red for too long if it came to that.

[1] 
https://www.postgresql.org/message-id/20220921155025.wdixzbrt2uzbi6vz%40jrouhaud




Re: Combine pg_walinspect till_end_of_wal functions with others

2023-03-06 Thread Julien Rouhaud
On Tue, Mar 07, 2023 at 01:56:24PM +0900, Michael Paquier wrote:
> On Tue, Mar 07, 2023 at 12:42:20PM +0800, Julien Rouhaud wrote:
> > Ah right, I should have checked. But the same ABI compatibility concern
> > still exists for version 1.0 of the extension.
>
> Yes, we'd better make sure that the past code is able to run, at
> least.  Now I am not really convinced that we have the need to enforce
> an error with the new code even if 1.0 is still installed,

So keep this "deprecated" C function working, as it would only be a few lines
of code?

> so as it is
> possible to remove all the traces of the code that triggers errors if
> an end LSN is higher than the current insert LSN for primaries or
> replayed LSN for standbys.

+1 for that




Re: Combine pg_walinspect till_end_of_wal functions with others

2023-03-06 Thread Julien Rouhaud
On Tue, 7 Mar 2023, 12:36 Michael Paquier,  wrote:

> On Tue, Mar 07, 2023 at 09:13:46AM +0800, Julien Rouhaud wrote:
> > It's problematic to install the extension if we rely on upgrade scripts only.
> > We could also provide a pg_walinspect--1.2.sql file and it would just work, and
> > that may have been a good idea if there wasn't also the problem of people still
> > having the version 1.1 locally installed, as we don't want them to see random
> > failures like "could not find function ... in file ...", or keeping the ability
> > to install the former 1.1 version (with those functions bypassed).
>
> Why would we need a 1.2?  HEAD is the only branch with pg_walinspect
> 1.1, and it has not been released yet.
>

Ah right, I should have checked. But the same ABI compatibility concern
still exists for version 1.0 of the extension.



Re: proposal: possibility to read dumped table's name from file

2023-03-06 Thread Julien Rouhaud
Hi,

On Mon, Mar 06, 2023 at 10:20:32PM +0100, Daniel Gustafsson wrote:
> > On 6 Mar 2023, at 21:45, Gregory Stark (as CFM)  wrote:
> >
> > So this patch has been through a lot of commitfests. And it really
> > doesn't seem that hard to resolve -- Pavel has seemingly been willing
> > to go along whichever way the wind has been blowing but honestly it
> > kind of seems like he's just gotten drive-by suggestions and he's put
> > a lot of work into trying to satisfy them.
>
> Agreed.

Indeed, I'm not sure I would have had that much patience.

> > He implemented --include-tables-from-file=... etc. Then he implemented
> > a hand-written parser for a DSL to select objects, then he implemented
> > a bison parser, then he went back to the hand-written parser.
>
> Well, kind of.  I was trying to take the patch to the finishing line but was
> uncomfortable with the hand-written parser, so I implemented a parser in Bison
> to replace it.  Not that hand-written parsers are bad per se (or that my
> Bison parser was perfect), but reading quoted identifiers across line
> boundaries tends to require a fair amount of hand-written code.  Pavel did not
> object to this version, but it was objected to by two other committers.
>
> At this point [0] I stepped down from trying to finish it, as the approach I
> was comfortable with didn't gain traction (which is totally fine).
>
> Downthread from this the patch got a lot of reviews from Julien with the old
> parser back in place.

Yeah, and the current state seems quite good to me.

> > Can we get some consensus on whether the DSL looks right
>
> I would consider this pretty settled.

Agreed.




Re: Combine pg_walinspect till_end_of_wal functions with others

2023-03-06 Thread Julien Rouhaud
On Mon, Mar 06, 2023 at 08:36:17PM +0530, Bharath Rupireddy wrote:
> On Mon, Mar 6, 2023 at 2:22 PM Julien Rouhaud  wrote:
>
> > Also:
> >
> > /*
> > - * Get info and data of all WAL records from start LSN till end of WAL.
> > + * NB: This function does nothing and stays here for backward compatibility.
> > + * Without it, the extension fails to install.
> >   *
> > - * This function emits an error if a future start i.e. WAL LSN the database
> > - * system doesn't know about is specified.
> > + * Try using pg_get_wal_records_info() for the same till_end_of_wal
> > + * functionaility.
> >
> > I don't like much this chunk (same for the other kept function).  Apart from
> > the obvious typo in "functionaility", I don't think that the comment is
> > really accurate.
>
Can you be more specific about what's inaccurate in the comment?

It's problematic to install the extension if we rely on upgrade scripts only.
We could also provide a pg_walinspect--1.2.sql file and it would just work, and
that may have been a good idea if there wasn't also the problem of people still
having the version 1.1 locally installed, as we don't want them to see random
failures like "could not find function ... in file ...", or keeping the ability
to install the former 1.1 version (with those functions bypassed).
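For background on why a plain install (not just an upgrade) is affected:
CREATE EXTENSION starts from the shipped base script and applies the chained
upgrade scripts up to default_version, so any C function referenced by the
old base script must still resolve in the new shared library.  A minimal
illustration of the file layout (fragment, simplified):

```
# pg_walinspect.control (fragment)
default_version = '1.1'

# shipped scripts:
#   pg_walinspect--1.0.sql        -- base install script
#   pg_walinspect--1.0--1.1.sql   -- upgrade script
#
# CREATE EXTENSION pg_walinspect runs the 1.0 script and then the
# 1.0--1.1 script, so functions created by the 1.0 script must still
# have matching C symbols in the new .so, even if they only raise an error.
```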



