Re: How to check for in-progress transactions

2023-03-15 Thread Bharath Rupireddy
On Thu, Mar 16, 2023 at 1:18 AM Tejasvi Kashi  wrote:
>
> For my use case, I'm trying to ascertain if there are any in-flight 
> transactions that are yet to be replicated to synchronous standbys (in a 
> synchronous streaming replication setting)
>
> I've been looking at sent_lsn, write_lsn, flush_lsn etc., of the walsender, 
> but with no success. Considering the visibility change added above, is there 
> a way for me to check for transactions that have been committed locally but 
> are waiting for replication?

I think you can look for the SyncRep wait_event in pg_stat_activity,
something like [1]. The backends wait indefinitely in
SyncRepWaitForLSN() until their latch is set (on postmaster death, or
when an ack is received from the sync standbys). backend_xid is your
locally-committed-but-not-yet-replicated txn id. Will this help?

Well, if you want to find all locally-committed-but-not-yet-replicated
txns from an extension or any other source code, you may run the full
query [1]; or, if running a query seems costly, you can look at what
pg_stat_get_activity() does to get each backend's wait_event_info and
have your code do the same.
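For example, a trimmed-down version of [1] that returns just the
interesting columns:

SELECT pid, backend_xid, query
FROM pg_stat_activity
WHERE backend_type = 'client backend'
  AND wait_event_type = 'IPC'
  AND wait_event = 'SyncRep';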

BTW, what exactly is the use-case that'd want
locally-committed-but-not-yet-replicated txns info?

[1]
postgres=# select * from pg_stat_activity where backend_type = 'client
backend' and wait_event = 'SyncRep';
-[ RECORD 1 ]----+------------------------------
datid            | 5
datname          | postgres
pid              | 4187907
leader_pid       |
usesysid         | 10
usename          | ubuntu
application_name | psql
client_addr      |
client_hostname  |
client_port      | -1
backend_start    | 2023-03-16 05:16:56.917124+00
xact_start       | 2023-03-16 05:17:09.472092+00
query_start      | 2023-03-16 05:17:09.472092+00
state_change     | 2023-03-16 05:17:09.472095+00
wait_event_type  | IPC
wait_event       | SyncRep
state            | active
backend_xid      | 731
backend_xmin     | 731
query_id         |
query            | create table foo(col1 int);
backend_type     | client backend

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Get rid of PgStat_BackendFunctionEntry

2023-03-15 Thread Michael Paquier
On Wed, Mar 08, 2023 at 01:38:38PM -0800, Nathan Bossart wrote:
> Looks reasonable to me.

I have been catching up with this thread and the other thread, and
indeed it looks like this is going to help in refactoring
pgstatfuncs.c to have more macros for all these mostly-duplicated
functions.  So, I have applied this bit.
--
Michael




Re: Add macros for ReorderBufferTXN toptxn

2023-03-15 Thread Amit Kapila
On Thu, Mar 16, 2023 at 10:40 AM Bharath Rupireddy
 wrote:
>
> On Thu, Mar 16, 2023 at 7:20 AM Peter Smith  wrote:
> >
> > PSA v4 which addresses both of your review comments.
>
> Looks like a reasonable change to me.
>
> A nitpick: how about using rbtxn_get_toptxn instead of an explicit
> variable toptxn for single use?
>

All three suggestions look similar to me. Personally, I think the
current code looks better. The v4 patch LGTM, and I am planning to
commit it unless there are more comments or people find your
suggestions to be an improvement.

-- 
With Regards,
Amit Kapila.




Re: Bug in jsonb_in function (14 & 15 version are affected)

2023-03-15 Thread Yura Sokolov
On Mon, 2023-03-13 at 13:58 -0400, Tom Lane wrote:
> Nikolay Shaplov  writes:
> > I found a bug in the jsonb_in function (it converts json from string
> > representation into jsonb internal representation).
> 
> Yeah.  Looks like json_lex_string is failing to honor the invariant
> that it needs to set token_terminator ... although the documentation
> of the function certainly isn't helping.  I think we need the attached.
> 
> A nice side benefit is that the error context reports get a lot more
> useful --- somebody should have inquired before as to why they were
> so bogus.
> 
> regards, tom lane
> 

Good day, Tom and all.

The merged patch looks like the start of a refactoring.

A colleague (Nikita Glukhov) proposes further refactoring of jsonapi.c:
- use inline functions instead of macros,
- make their usage in token success and error reporting more uniform,
- simplify json_lex_number and its usage a bit.
He also added tests for the fixed bug.


-

Regards,
Yura Sokolov.
From 757a314d5fa9c6ba8334762b4a080763f02244c5 Mon Sep 17 00:00:00 2001
From: Nikita Glukhov 
Date: Tue, 14 Mar 2023 12:15:48 +0300
Subject: [PATCH] Refactor JSON lexer error reporting

Also add special tests for the already-fixed bug and for multibyte characters after surrogates.
---
 src/common/jsonapi.c  | 246 +++---
 src/test/regress/expected/json_encoding.out   |  25 ++
 src/test/regress/expected/json_encoding_1.out |  23 ++
 src/test/regress/sql/json_encoding.sql|   7 +
 4 files changed, 149 insertions(+), 152 deletions(-)

diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
index 2e86589cfd8..d4a9d8f0378 100644
--- a/src/common/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -44,8 +44,7 @@ typedef enum	/* contexts of JSON parser */
 } JsonParseContext;
 
 static inline JsonParseErrorType json_lex_string(JsonLexContext *lex);
-static inline JsonParseErrorType json_lex_number(JsonLexContext *lex, char *s,
- bool *num_err, int *total_len);
+static inline JsonParseErrorType json_lex_number(JsonLexContext *lex, char *s);
 static inline JsonParseErrorType parse_scalar(JsonLexContext *lex, JsonSemAction *sem);
 static JsonParseErrorType parse_object_field(JsonLexContext *lex, JsonSemAction *sem);
 static JsonParseErrorType parse_object(JsonLexContext *lex, JsonSemAction *sem);
@@ -104,8 +103,6 @@ lex_expect(JsonParseContext ctx, JsonLexContext *lex, JsonTokenType token)
 bool
 IsValidJsonNumber(const char *str, int len)
 {
-	bool		numeric_error;
-	int			total_len;
 	JsonLexContext dummy_lex;
 
 	if (len <= 0)
@@ -128,9 +125,8 @@ IsValidJsonNumber(const char *str, int len)
 		dummy_lex.input_length = len;
 	}
 
-	json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error, &total_len);
-
-	return (!numeric_error) && (total_len == dummy_lex.input_length);
+	return json_lex_number(&dummy_lex, dummy_lex.input) == JSON_SUCCESS &&
+		dummy_lex.token_terminator == dummy_lex.input + dummy_lex.input_length;
 }
 
 /*
@@ -545,6 +541,37 @@ parse_array(JsonLexContext *lex, JsonSemAction *sem)
 	return JSON_SUCCESS;
 }
 
+/* Convenience inline functions for success/error exits from lexer */
+static inline JsonParseErrorType
+json_lex_success(JsonLexContext *lex, char *token_terminator,
+ JsonTokenType token_type)
+{
+	lex->prev_token_terminator = lex->token_terminator;
+	lex->token_terminator = token_terminator;
+	lex->token_type = token_type;
+
+	return JSON_SUCCESS;
+}
+
+static inline JsonParseErrorType
+json_lex_fail_at_char_start(JsonLexContext *lex, char *s,
+			JsonParseErrorType code)
+{
+	lex->token_terminator = s;
+
+	return code;
+}
+
+static inline JsonParseErrorType
+json_lex_fail_at_char_end(JsonLexContext *lex, char *s,
+		  JsonParseErrorType code)
+{
+	lex->token_terminator =
+		s + pg_encoding_mblen_bounded(lex->input_encoding, s);
+
+	return code;
+}
+
 /*
  * Lex one token from the input stream.
  */
@@ -553,7 +580,6 @@ json_lex(JsonLexContext *lex)
 {
 	char	   *s;
 	char	   *const end = lex->input + lex->input_length;
-	JsonParseErrorType result;
 
 	/* Skip leading whitespace. */
 	s = lex->token_terminator;
@@ -571,9 +597,7 @@ json_lex(JsonLexContext *lex)
 	if (s >= end)
 	{
 		lex->token_start = NULL;
-		lex->prev_token_terminator = lex->token_terminator;
-		lex->token_terminator = s;
-		lex->token_type = JSON_TOKEN_END;
+		return json_lex_success(lex, s, JSON_TOKEN_END);
 	}
 	else
 	{
@@ -581,49 +605,31 @@ json_lex(JsonLexContext *lex)
 		{
 				/* Single-character token, some kind of punctuation mark. */
 			case '{':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_OBJECT_START;
-				break;
+				return json_lex_success(lex, s + 1, JSON_TOKEN_OBJECT_START);
+
 			case '}':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_OBJECT_END;
-				break;
+				return json_lex_success(lex, s + 1, JSON_TOKEN_OBJECT_END);
+
 			case '[':
-		

Re: Add macros for ReorderBufferTXN toptxn

2023-03-15 Thread Bharath Rupireddy
On Thu, Mar 16, 2023 at 7:20 AM Peter Smith  wrote:
>
> PSA v4 which addresses both of your review comments.

Looks like a reasonable change to me.

A nitpick: how about using rbtxn_get_toptxn instead of an explicit
variable toptxn for single use?

1.
Change
ReorderBufferTXN *toptxn = rbtxn_get_toptxn(txn);
TestDecodingTxnData *txndata = toptxn->output_plugin_private;

To
TestDecodingTxnData *txndata = rbtxn_get_toptxn(txn)->output_plugin_private;

2.
Change
ReorderBufferTXN *toptxn = rbtxn_get_toptxn(txn);
toptxn->txn_flags |= RBTXN_HAS_STREAMABLE_CHANGE;

To
rbtxn_get_toptxn(txn)->txn_flags |= RBTXN_HAS_STREAMABLE_CHANGE;

3.
Change
/*
 * Update the total size in top level as well. This is later used to
 * compute the decoding stats.
 */
toptxn = rbtxn_get_toptxn(txn);

if (addition)
{
txn->size += sz;
rb->size += sz;

/* Update the total size in the top transaction. */
toptxn->total_size += sz;
}
else
{
Assert((rb->size >= sz) && (txn->size >= sz));
txn->size -= sz;
rb->size -= sz;

/* Update the total size in the top transaction. */
toptxn->total_size -= sz;
}

To

/*
 * Update the total size in top level as well. This is later used to
 * compute the decoding stats.
 */
if (addition)
{
txn->size += sz;
rb->size += sz;
rbtxn_get_toptxn(txn)->total_size += sz;
}
else
{
Assert((rb->size >= sz) && (txn->size >= sz));
txn->size -= sz;
rb->size -= sz;
rbtxn_get_toptxn(txn)->total_size -= sz;
}

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: zstd compression for pg_dump

2023-03-15 Thread Justin Pryzby
On Fri, Mar 10, 2023 at 12:48:13PM -0800, Jacob Champion wrote:
> On Wed, Mar 8, 2023 at 10:59 AM Jacob Champion  
> wrote:
> > I did some smoke testing against zstd's GitHub release on Windows. To
> > build against it, I had to construct an import library, and put that
> > and the DLL into the `lib` folder expected by the MSVC scripts...
> > which makes me wonder if I've chosen a harder way than necessary?
> 
> It looks like pg_dump's meson.build is missing dependencies on zstd
> (meson couldn't find the headers in the subproject without them).

I saw that this was added for LZ4, but I hadn't added it for zstd since
I didn't run into an issue without it.  Could you check that what I've
added works for your case?

> > Parallel zstd dumps seem to work as expected, in that the resulting
> > pg_restore output is identical to uncompressed dumps and nothing
> > explodes. I haven't inspected the threading implementation for safety
> > yet, as you mentioned.
> 
> Hm. Best I can tell, the CloneArchive() machinery is supposed to be
> handling safety for this, by isolating each thread's state. I don't feel
> comfortable pronouncing this new addition safe or not, because I'm not
> sure I understand what the comments in the format-specific _Clone()
> callbacks are saying yet.

My line of reasoning for unix is that pg_dump forks before any calls to
zstd.  Nothing zstd does ought to affect the pg_dump layer.  But that
doesn't apply to pg_dump under windows.  This is an open question.  If
there's no solid answer, I could disable/ignore the option (maybe only
under windows).

> On to code (not a complete review):
> 
> > if (hasSuffix(fname, ".gz"))
> > compression_spec.algorithm = PG_COMPRESSION_GZIP;
> > else
> > {
> > bool		exists;
> > 
> > exists = (stat(path, &st) == 0);
> > /* avoid unused warning if it is not built with compression */
> > if (exists)
> > compression_spec.algorithm = PG_COMPRESSION_NONE;
> > -#ifdef HAVE_LIBZ
> > -   if (!exists)
> > -   {
> > -   free_keep_errno(fname);
> > -   fname = psprintf("%s.gz", path);
> > -   exists = (stat(fname, &st) == 0);
> > -
> > -   if (exists)
> > -   compression_spec.algorithm = PG_COMPRESSION_GZIP;
> > -   }
> > -#endif
> > -#ifdef USE_LZ4
> > -   if (!exists)
> > -   {
> > -   free_keep_errno(fname);
> > -   fname = psprintf("%s.lz4", path);
> > -   exists = (stat(fname, &st) == 0);
> > -
> > -   if (exists)
> > -   compression_spec.algorithm = PG_COMPRESSION_LZ4;
> > -   }
> > -#endif
> > +   else if (check_compressed_file(path, &fname, "gz"))
> > +   compression_spec.algorithm = PG_COMPRESSION_GZIP;
> > +   else if (check_compressed_file(path, &fname, "lz4"))
> > +   compression_spec.algorithm = PG_COMPRESSION_LZ4;
> > +   else if (check_compressed_file(path, &fname, "zst"))
> > +   compression_spec.algorithm = PG_COMPRESSION_ZSTD;
> > }
> 
> This function lost some coherence, I think. Should there be a hasSuffix
> check at the top for ".zstd" (and, for that matter, ".lz4")?

The function first checks whether it was passed a filename which already
has a suffix.  If not, it searches through a list of suffixes, testing
for an existing file with each suffix.  The search with stat() doesn't
happen if the name already has a suffix.  I'm having trouble seeing how
the hasSuffix() branch isn't dead code.  Another open question.

> I'm a little suspicious of the replacement of supports_compression()
> with parse_compress_specification(). For example:
> 
> > -   errmsg = supports_compression(AH->compression_spec);
> > -   if (errmsg)
> > +   parse_compress_specification(AH->compression_spec.algorithm,
> > +   NULL, &compress_spec);
> > +   if (compress_spec.parse_error != NULL)
> > {
> > pg_log_warning("archive is compressed, but this installation does 
> > not support compression (%s
> > -   errmsg);
> > -   pg_free(errmsg);
> > +   compress_spec.parse_error);
> > +   pg_free(compress_spec.parse_error);
> > }
> 
> The top-level error here is "does not support compression", but wouldn't
> a bad specification option with a supported compression method trip this
> path too?

No - since the 2nd argument is passed as NULL, it just checks whether
the compression is supported.  Maybe there ought to be a more
direct/clean way to do it, but up to now evidently nobody has needed
that.

> > +static void
> > +ZSTD_CCtx_setParam_or_die(ZSTD_CStream *cstream,
> > + ZSTD_cParameter param, int value, char *paramname)
> 
> IMO we should avoid stepping on the ZSTD_ namespace with our own
> internal function names.

done

> > +   if (cs->readF != NULL)
> > +
> > +   if (cs->writeF != NULL)
> 
> This seems to suggest that both cs->readF and cs->writeF could be set,
> but in that case, the output buffer gets 

Re: Simplify some codes in pgoutput

2023-03-15 Thread Amit Kapila
On Wed, Mar 15, 2023 at 2:00 PM houzj.f...@fujitsu.com
 wrote:
>
> I noticed that there are some duplicated codes in pgoutput_change() function
> which can be simplified, and here is an attempt to do that.
>

For REORDER_BUFFER_CHANGE_DELETE, when the old tuple is missing, after
this patch we will still send BEGIN and do OutputPluginWrite, etc.
Also, it will try to perform the row_filter when neither old_slot nor
new_slot is set. I don't know for which particular case we have such
handling of missing old tuples for deletes, but that might require
changes in your proposed patch.

-- 
With Regards,
Amit Kapila.




Re: Initial Schema Sync for Logical Replication

2023-03-15 Thread Peter Smith
Hi,

I have a couple of questions.

Q1.

What happens if the subscriber already has some tables present? For
example, I did not see the post say anything like "the table will be
created only if it does not already exist".

On the contrary, the post seemed to say SUBREL_STATE_CREATE 'c' would
*always* be set when this subscriber mode is enabled. And then it
seemed to say the table would *always* get re-created by the tablesync
in this new mode.

Won't this cause problems
- if the user wanted a slightly different subscriber-side table? (eg
some extra columns on the subscriber-side table)
- if there was some pre-existing table data on the subscriber-side
table that you now are about to re-create and clobber?

Or does the idea intend that the CREATE TABLE DDL that will be
executed is like "CREATE TABLE ... IF NOT EXISTS"?

~~~

Q2.

The post says. "DDL replication is required for this feature". And "It
should also have turned on publication of DDL commands."

It wasn't entirely clear to me why those must be a requirement. Is
that just to make implementation easier?

Sure, I see that the idea might share some (or maybe a lot of?) internal
code with the table DDL replication work, but OTOH an auto-create
feature for subscriber tables seems like it might be useful to have
regardless of the value of the publication 'ddl' parameter.

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: "current directory" in a server error message

2023-03-15 Thread Bharath Rupireddy
On Thu, Mar 16, 2023 at 7:47 AM Kyotaro Horiguchi
 wrote:
>
> Hello.
>
> When I ran pg_ls_dir('..'), the error message I received was somewhat
> difficult to understand.
>
> postgres=> select * from pg_ls_dir('..');
> ERROR:  path must be in or below the current directory
>
> As far as I know the concept of a "current directory" doesn't apply to
> the server side. In fact, the function comment for
> convert_and_check_filename explicitly states that:
>
> > * Filename may be absolute or relative to the DataDir
>
> Thus I think that the message should read "path must be in or below
> the data directory" instead.
>
> What do you think about making this change?

Well, yes. As far as postgres processes are concerned, their working
directory is set to the data directory by the postmaster in
ChangeToDataDir(), and all the children inherit that setting. So, I
see nothing wrong in being explicit about it in the error messages.

BTW, adminpack too has the same error message.

FWIW, here are the steps to generate the error:
create role foo with nosuperuser;
grant execute on function pg_ls_dir(text) to foo;
set role foo;
select * from pg_ls_dir('..');
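And for contrast, a relative path below the data directory is accepted,
for example:

select * from pg_ls_dir('pg_wal');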

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: meson documentation build open issues

2023-03-15 Thread Andres Freund
Hi,

On 2023-03-15 08:14:09 +0100, Peter Eisentraut wrote:
> I have identified several open issues with the documentation build under
> Meson (approximately in priority order):
>
> 1. Image files are not handled at all, so they don't show up in the final
> product.

Hm. Somehow I thought I'd tackled that at some point. Ah. I got there for the
PDF output, but didn't realize it's also an issue for the html output.

For FO it sufficed to set the img.src.path param. For HTML that's not enough,
because that just adjusts the link to the file - but we don't want to link to
the source file. We actually solved this for the single-page html version - we
just embed the svg.  I wonder if we should just do that as well.

Another way would be to emit the files into the desired place as part of the
stylesheet. While it requires touching xslt, it does seem somewhat more
elegant than just copying files around. I did implement that; curious what
you think.


> 2. Defaults to website stylesheet, no way to configure.  This should be
> adjusted to match the make build.

Should we add a meson option?


> 3. The various build targets and their combinations are mismatching and
> incomplete.  For example:
>
> Top-level GNUmakefile has these targets:
>
> - docs (builds html and man)
> - html
> - man
>
> (Those are the formats that are part of a distribution build.)
>
> doc/src/sgml/Makefile has these documented targets:
>
> - default target is html
> - all (builds html and man, maps to top-level "docs")
> - html
> - man
> - postgres-A4.pdf
> - postgres-US.pdf
> - check
>
> as well as (undocumented):
>
> - htmlhelp
> - postgres.html
> - postgres.txt
> - epub
> - postgres.epub
> - postgres.info
>
> meson has the following documented targets:
>
> - docs (builds only html)
> - alldocs (builds all formats, including obscure ones)
>
> as well as the following undocumented targets:
>
> - html
> - man
> - html_help [sic]

renamed in the attached patch.


> - postgres-A4.pdf
> - postgres-US.pdf
> - postgres.epub

Note that these are actually named doc/src/sgml/{html,man,...}, not top-level
targets.


> - [info is not implemented at all]

Would be easy to implement, but not sure it's worth doing.


> - [didn't find an equivalent of check]

That's probably worth doing - should it be run as an actual test, or be a
target?


> 4. There doesn't appear to be a way to install the documentation.
> (There are also some open questions in the top-level meson.build about
> the installation directories, but I suppose if we can't install them
> then exactly where to install them hasn't been thought about too
> much.)

WIP patch for that attached.  There are now
  install-doc-man
  install-doc-html
run targets and an
  install-docs
alias target.


I did end up getting stuck when hacking on this, and ended up adding css
support for nochunk and support for the website style for htmlhelp and
nochunk, as well as obsoleting the need for copying the css files... But
perhaps that's a bit too much.

Greetings,

Andres Freund
>From e604a5c5acdaa840ede0c83205c9799a9edb2630 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 15 Mar 2023 15:53:14 -0700
Subject: [PATCH v1 1/5] meson: rename html_help target to htmlhelp

Reported-by: Peter Eisentraut 
Discussion: https://postgr.es/m/3fc3bb9b-f7f8-d442-35c1-ec82280c5...@enterprisedb.com
---
 doc/src/sgml/meson.build | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/meson.build b/doc/src/sgml/meson.build
index 38f1b8e7b17..e6fe124c7bc 100644
--- a/doc/src/sgml/meson.build
+++ b/doc/src/sgml/meson.build
@@ -123,7 +123,7 @@ if xsltproc_bin.found()
   # build multi-page html docs as part of docs target
   docs += html
 
-  html_help = custom_target('html_help',
+  htmlhelp = custom_target('htmlhelp',
 input: ['stylesheet-hh.xsl', postgres_full_xml],
 output: 'htmlhelp',
 depfile: 'htmlhelp.d',
@@ -131,7 +131,7 @@ if xsltproc_bin.found()
 command: [xsltproc, '--path', '@OUTDIR@', '-o', '@OUTDIR@/', xsltproc_flags, '@INPUT@'],
 build_by_default: false,
   )
-  alldocs += html_help
+  alldocs += htmlhelp
 
 
   # single-page HTML
-- 
2.38.0

>From a739950d06678f28f08deb48d080939041dc5584 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 15 Mar 2023 16:29:27 -0700
Subject: [PATCH v1 2/5] meson: make install_test_files more generic, rename to
 install_files

Now it supports installing directories and directory contents as well. This
will be used in a subsequent patch to install doc contents.
---
 meson.build  |  5 ++-
 src/tools/install_files  | 71 
 src/tools/install_test_files | 33 -
 3 files changed, 74 insertions(+), 35 deletions(-)
 create mode 100644 src/tools/install_files
 delete mode 100644 src/tools/install_test_files

diff --git a/meson.build b/meson.build
index 2ebdf914c1b..b4b87f1a9fc 100644
--- a/meson.build
+++ b/meson.build
@@ -369,6 +369,8 @@ flex_cmd = [python, flex_wrapper,
 

Re: Allow logical replication to copy tables in binary format

2023-03-15 Thread Amit Kapila
On Thu, Mar 16, 2023 at 8:27 AM Euler Taveira  wrote:
>
> On Wed, Mar 8, 2023, at 11:50 PM, Amit Kapila wrote:
>
> It is not clear to me which version check you wanted to add because we
> seem to have a binary option in COPY from the time prior to logical
> replication. I feel we need a publisher version 14 check as that is
> where we start to support binary mode transfer in logical replication.
> See the check in function libpqrcv_startstreaming().
>
> ... then you are breaking existing cases. Even if you have a convincing
> argument, you are introducing a behavior change in prior versions (commit
> messages should always indicate that you are breaking backward compatibility).
>
> +
> +   /*
> +    * The binary option for replication is supported since v14
> +    */
> +   if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 14 &&
> +   MySubscription->binary)
> +   {
> +   appendStringInfo(&cmd, " WITH (FORMAT binary)");
> +   options = lappend(options, makeDefElem("format", (Node *) makeString("binary"), -1));
> +   }
> +
>
> What are the arguments for supporting this since v14 instead of the
> to-be-released version? I read the thread but it is not clear. Something
> was said about the restrictive nature of this feature, and it will be
> frustrating to see that the same setup (with the same commands) works
> with v14 and v15 but not with v16.
>

If the failure has to happen, it will anyway happen later when the
publisher is upgraded to v16. The reason for checking against v14 was
to allow the initial sync from the same version where binary mode for
replication was introduced. However, if we expect failures in existing
setups, I am fine with supporting this for >= v16.

> IMO it should be >= 16, and the documentation should explain that v14/v15
> use text format during initial table synchronization even if binary = true.
>

Yeah, if we change the version then the corresponding text in the
patch should also be changed.

> Should there be a fallback mode (text) if initial table synchronization failed
> because of the binary option? Maybe a different setting (auto) to support such
> behavior.
>

I think the workaround is that the user disables binary mode for the
time of initial sync. I think if we want to extend and add a fallback
(text) mode then it is better to keep it as default behavior rather
than introducing a new setting like 'auto'. Personally, I feel it can
be added later after doing some more study.
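Spelled out, that workaround would look something like this (the
subscription name here is made up):

ALTER SUBSCRIPTION mysub SET (binary = false);  -- before the initial sync
-- ... wait for table synchronization to finish ...
ALTER SUBSCRIPTION mysub SET (binary = true);   -- re-enable binary transfer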

-- 
With Regards,
Amit Kapila.




Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED

2023-03-15 Thread Alexander Lakhin

16.03.2023 00:13, Thomas Munro wrote:

Thanks.  We were wondering if the retry mechanism might somehow be
hiding this in non-assert builds, but, looking more closely, that is
tied specifically to the memory reservation operation.


As to hiding: when analyzing the Assert issue, I wondered whether
PMSignalShmemInit() can hide an error when calling ShmemInitStruct() for
backends. If I understand correctly, backends get PMSignalState through
backend_variables, so maybe ShmemInitStruct() should not be executed for
backends at all?


Best regards,
Alexander




Re: Allow logical replication to copy tables in binary format

2023-03-15 Thread Euler Taveira
On Wed, Mar 8, 2023, at 11:50 PM, Amit Kapila wrote:
> On Wed, Mar 8, 2023 at 6:17 PM Melih Mutlu  wrote:
> >
> > On 7 Mar 2023 Tue at 04:10 Amit Kapila  wrote:
> >>
> >> As per what I could read in this thread, most people prefer to use the
> >> existing binary option rather than inventing a new way (option) to
> >> binary copy in the initial sync phase. Do you agree?
> >
> >
> > I agree.
> > What do you think about the version checks? I removed any kind of check 
> > since it’s currently a different option. Should we check publisher version 
> > before doing binary copy to ensure that the publisher node supports binary 
> > option of COPY command?
> >
> 
> It is not clear to me which version check you wanted to add because we
> seem to have a binary option in COPY from the time prior to logical
> replication. I feel we need a publisher version 14 check as that is
> where we start to support binary mode transfer in logical replication.
> See the check in function libpqrcv_startstreaming().
... then you are breaking existing cases. Even if you have a convincing
argument, you are introducing a behavior change in prior versions (commit
messages should always indicate that you are breaking backward compatibility).

+
+   /*
+    * The binary option for replication is supported since v14
+    */
+   if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 14 &&
+   MySubscription->binary)
+   {
+   appendStringInfo(&cmd, " WITH (FORMAT binary)");
+   options = lappend(options, makeDefElem("format", (Node *) makeString("binary"), -1));
+   }
+

What are the arguments for supporting this since v14 instead of the
to-be-released version? I read the thread but it is not clear. Something
was said about the restrictive nature of this feature, and it will be
frustrating to see that the same setup (with the same commands) works
with v14 and v15 but not with v16. IMO it should be >= 16, and the
documentation should explain that v14/v15 use text format during initial
table synchronization even if binary = true.

Should there be a fallback mode (text) if initial table synchronization failed
because of the binary option? Maybe a different setting (auto) to support such
behavior.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Allow logical replication to copy tables in binary format

2023-03-15 Thread Amit Kapila
On Wed, Mar 15, 2023 at 3:33 PM Melih Mutlu  wrote:
>
> Amit Kapila wrote on Wed, 15 Mar 2023 at 12:31:
>>
>> On Tue, Mar 14, 2023 at 4:32 PM Melih Mutlu  wrote:
>
>
>>
>> What purpose does this test serve w.r.t this patch? Before checking
>> the sync for different column orders, the patch has already changed
>> binary to false, so it doesn't seem to test the functionality of this
>> patch. Am, I missing something?
>
>
> I missed that binary has changed to false before testing column orders. I 
> moved that test case up before changing binary to false.
> Please see v14 [1].
>

After thinking some more about this test, I don't think we need it, as
it doesn't add any value to this patch. It tests the column orders,
which is well-established functionality of the apply worker.

-- 
With Regards,
Amit Kapila.




RE: Allow logical replication to copy tables in binary format

2023-03-15 Thread houzj.f...@fujitsu.com
Hi,

Thanks for updating the patch, I think it is a useful feature.

I looked at the v15 patch and it looks mostly good to me.
Here are a few comments:

1.
+   {
+   appendStringInfo(&cmd, " WITH (FORMAT binary)");

We could use appendStringInfoString here.


2.
+# It should fail
+$node_subscriber->wait_for_log(qr/ERROR: ( [A-Z0-9]+:)? no binary input function available for type/);
...
+# Cannot sync due to type mismatch
+$node_subscriber->wait_for_log(qr/ERROR: ( [A-Z0-9]+:)? incorrect binary data format/);
...
+# Ensure the COPY command is executed in text format on the publisher
+$node_publisher->wait_for_log(qr/LOG: ( [a-z0-9]+:)? COPY (.+)? TO STDOUT\n/);

I think it would be better to pass the log offset when using wait_for_log,
because otherwise it will search the whole log file for the target message.
This might not be a big problem, but it risks matching an unexpected log
message that was generated by previous commands.

Best Regards,
Hou zj


Re: Allow logical replication to copy tables in binary format

2023-03-15 Thread Amit Kapila
On Thu, Mar 16, 2023 at 5:59 AM Peter Smith  wrote:
>
> On Wed, Mar 15, 2023 at 5:31 PM Amit Kapila  wrote:
> >
> > On Wed, Mar 15, 2023 at 11:52 AM Peter Smith  wrote:
> > >
> > > ==
> > > src/backend/replication/logical/tablesync.c
> > >
> > > 5.
> > > +
> > > + /*
> > > + * If the publisher version is earlier than v14, it COPY command doesn't
> > > + * support the binary option.
> > > + */
> > > + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 14 &&
> > > + MySubscription->binary)
> > > + {
> > > + appendStringInfo(&cmd, " WITH (FORMAT binary)");
> > > + options = lappend(options, makeDefElem("format", (Node *)
> > > makeString("binary"), -1));
> > > + }
> > >
> > > Sorry, I gave a poor review comment for this previously. Now I have
> > > revisited all the thread discussions about version checking. I feel
> > > that some explanation should be given in the code comment so that
> > > future readers of this code can understand why you decided to use v14
> > > checking.
> > >
> > > Something like this:
> > >
> > > SUGGESTION
> > > If the publisher version is earlier than v14, we use text format COPY.
> > >
> >
> > I think this isn't explicit that we supported the binary format since
> > v14. So, I would prefer my version of the comment as suggested in the
> > previous email.
> >
>
> Hmm, but my *full* suggestion was bigger than what is misquoted above,
and it certainly did say "logical replication binary mode transfer
> was not introduced until v14".
>
> SUGGESTION
> If the publisher version is earlier than v14, we use text format COPY.
> Note - In fact COPY syntax "WITH (FORMAT binary)" has existed since
> v9, but since the logical replication binary mode transfer was not
> introduced until v14 it was decided to check using the later version.
>

I find this needlessly verbose.

> ~~
>
> Anyway, the shortened comment as in the latest v15 patch is fine by me too.
>

Okay, then let's go with that.

-- 
With Regards,
Amit Kapila.




RE: Support logical replication of DDLs

2023-03-15 Thread wangw.f...@fujitsu.com
On Tues, Mar 14, 2023 12:17 PM Ajin Cherian  wrote:
> On Mon, Mar 13, 2023 at 2:24 AM Zheng Li  wrote:
> >
> > Thanks for working on the test coverage for CREATE and ALTER TABLE.
> > I've made fixes for some of the failures in the v79 patch set (0002,
> > 0003 and 0004 are updated). The changes includes:
> > 1. Fixed a syntax error caused by ON COMMIT clause placement in
> > deparse_CreateStmt.
> > 2. Fixed deparse_Seq_As and start using it in deparse_CreateSeqStmt,
> > this issue is also reported in [1].
> > 3. Fixed a bug in append_not_present: the 'present: false' element
> > can't be omitted even in non-verbose mode. It will cause syntax error
> > on reformed command if 'present: false' element is missing but the fmt
> > string indicates the corresponding object must be present.
> > 4. Replaced if_not_exists with if_exists in deparse of
> > AT_DropConstraint and AT_DropColumn.
> > 5. Added missing CASCADE clause for AT_DropConstraint deparse.
> > 6. Enabled the fixed test cases.
> >
> 
> I found out that the option ONLY was not parsed in the "CREATE INDEX"
> command,
> for eg: CREATE UNIQUE INDEX ... ON ONLY table_name ...
> 
> I've fixed this in patch 0002.

Thanks for the new patch set.

Here are some comments:

For v-80-0002* patch.
1. The comments atop the function deparse_IndexStmt.
+ * Verbose syntax
+ * CREATE %{unique}s INDEX %{concurrently}s %{if_not_exists}s %{name}I ON
+ * %{table}D USING %{index_am}s %{definition}s %{with}s %{tablespace}s
+ * %{where_clause}s %{nulls_not_distinct}s
+ */
+static ObjTree *
+deparse_IndexStmt(Oid objectId, Node *parsetree)

Since we added decoding for the [ONLY] option in this version, it seems that we
also need to add related comments, like this:
```
%{table}D USING %{index_am}s %{definition}s %{with}s %{tablespace}s
->
%{only}s %{table}D USING %{index_am}s %{definition}s %{with}s %{tablespace}s
```

===

For v-80-0003* patch.
2. In the function deparse_CreateTrigStmt.
I think we need to parse the [OR REPLACE] option for CREATE TRIGGER command.

And I think there are two similar omissions in the functions
deparse_DefineStmt_Aggregate (option [OR REPLACE]) and
deparse_DefineStmt_Collation (option [IF NOT EXISTS]).

===

For v-80-0004* patch.
3. There are some whitespace errors:
Applying: Introduce the test_ddl_deparse_regress test module.
.git/rebase-apply/patch:163: new blank line at EOF.
+
.git/rebase-apply/patch:3020: new blank line at EOF.
+
.git/rebase-apply/patch:4114: new blank line at EOF.
+
warning: 3 lines add whitespace errors.

Hou zj and I will try to address these comments soon.

Regards,
Wang wei


RE: Allow logical replication to copy tables in binary format

2023-03-15 Thread shiy.f...@fujitsu.com
On Thu, Mar 16, 2023 2:26 AM Melih Mutlu  wrote:
> 
> Right, it needs to be ordered. Fixed.
> 

Hi,

Thanks for updating the patch. I tested some cases, like TOAST data and
combinations of row filters and column lists, and it works well.

Here is a comment:

+# Ensure the COPY command is executed in binary format on the publisher
+$node_publisher->wait_for_log(qr/LOG: ( [a-z0-9]+:)? COPY (.+)? TO STDOUT WITH \(FORMAT binary\)/);

The test failed with `log_error_verbosity = verbose` because it couldn't match
the following log:
2023-03-16 09:45:50.096 CST [2499415] pg_16398_sync_16391_7210954376230900539 LOG:  00000: statement: COPY public.test_arrays (a, b, c) TO STDOUT WITH (FORMAT binary)

I think we should make it pass, see commit 19408aae7f.
Should it be changed to:

$node_publisher->wait_for_log(qr/LOG: ( [A-Z0-9]+:)? statement: COPY (.+)? TO STDOUT WITH \(FORMAT binary\)/);

Besides, for the same reason, this line also needs to be modified.
+$node_publisher->wait_for_log(qr/LOG: ( [a-z0-9]+:)? COPY (.+)? TO STDOUT\n/);

Regards,
Shi Yu


"current directory" in a server error message

2023-03-15 Thread Kyotaro Horiguchi
Hello.

When I ran pg_ls_dir('..'), the error message I received was somewhat
difficult to understand.

postgres=> select * from pg_ls_dir('..');
ERROR:  path must be in or below the current directory

As far as I know the concept of a "current directory" doesn't apply to
the server side. In fact, the function comment for
convert_and_check_filename explicitly states that:

> * Filename may be absolute or relative to the DataDir

Thus I think that the message should read "path must be in or below
the data directory" instead.

What do you think about making this change?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index 75588eebb3..f281ce9806 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -86,7 +86,7 @@ convert_and_check_filename(text *arg)
 	else if (!path_is_relative_and_below_cwd(filename))
 		ereport(ERROR,
 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-				 errmsg("path must be in or below the current directory")));
+				 errmsg("path must be in or below the data directory")));
 
 	return filename;
 }


Re: Add macros for ReorderBufferTXN toptxn

2023-03-15 Thread Peter Smith
On Wed, Mar 15, 2023 at 4:55 PM Masahiko Sawada  wrote:
>
> Hi,
>
...
> +1 to the idea. Here are some minor comments:
>
> @@ -1667,7 +1658,7 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb,
> ReorderBufferTXN *txn, bool txn_prep
>   * about the toplevel xact (we send the XID in all messages), but we never
>   * stream XIDs of empty subxacts.
>   */
> - if ((!txn_prepared) && ((!txn->toptxn) || (txn->nentries_mem != 0)))
> + if ((!txn_prepared) && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
>   txn->txn_flags |= RBTXN_IS_STREAMED;
>
> Probably the following comment of the above lines also needs to be updated?
>
>  * The toplevel transaction, identified by (toptxn==NULL), is marked as
>  * streamed always,
>
> ---
> +/* Is this a top-level transaction? */
> +#define rbtxn_is_toptxn(txn)\
> +(\
> + (txn)->toptxn == NULL\
> +)
> +
> +/* Is this a subtransaction? */
> +#define rbtxn_is_subtxn(txn)\
> +(\
> + (txn)->toptxn != NULL\
> +)
> +
> +/* Get the top-level transaction of this (sub)transaction. */
> +#define rbtxn_get_toptxn(txn)\
> +(\
> + rbtxn_is_subtxn(txn) ? (txn)->toptxn : (txn)\
> +)
>
> We need a whitespace before backslashes.
>

Thanks for your interest in my patch.

PSA v4 which addresses both of your review comments.

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v4-0001-Add-macros-for-ReorderBufferTXN-toptxn.patch


Re: Making empty Bitmapsets always be NULL

2023-03-15 Thread Yuya Watari
Hello,

On Tue, Mar 7, 2023 at 1:07 PM David Rowley  wrote:
> I adjusted the patch to remove the invariant checks and fixed up a
> couple of things I'd missed.  The 0002 patch changes the for loops
> into do while loops. I wanted to see if we could see any performance
> gains from doing this.

I noticed that these patches caused significant degradation while
working on improving planning performance in another thread [1].

In the experiment, I used the query attached to this email. This
workload consists of eight tables, each of which is split into n
partitions. The "query.sql" measures the planning time of a query that
joins these tables. You can quickly reproduce my experiment using the
following commands.

=====
psql -f create-tables.sql
psql -f query.sql
=====

I show the result in the following tables. I refer to David's patches
in [2] as the "trailing-zero" patch. When n was large, the
trailing-zero patch showed significant degradation. This is due to too
many calls of repalloc(). With this patch, we cannot reuse spaces
after the last non-zero bitmapword, so we need to call repalloc() more
frequently than before. When n is 256, repalloc() was called only 670
times without the patch, while it was called 5694 times with the
patch.

Table 1: Planning time (ms)
-----------------------------------------------------------------
   n |   Master | Patched (trailing-zero) | Patched (bitwise-OR)
-----------------------------------------------------------------
   1 |   37.639 |                  37.330 |               36.979
   2 |   36.066 |                  35.646 |               36.044
   4 |   37.958 |                  37.349 |               37.842
   8 |   42.397 |                  42.994 |               39.779
  16 |   54.565 |                  67.713 |               44.186
  32 |   89.220 |                 100.828 |               65.542
  64 |  227.854 |                 269.059 |              150.398
 128 |  896.513 |                1279.965 |              577.671
 256 | 4241.994 |                8220.508 |             2538.681
-----------------------------------------------------------------

Table 2: Planning time speedup (higher is better)
------------------------------------------------------
   n | Patched (trailing-zero) | Patched (bitwise-OR)
------------------------------------------------------
   1 |                  100.8% |               101.8%
   2 |                  101.2% |               100.1%
   4 |                  101.6% |               100.3%
   8 |                   98.6% |               106.6%
  16 |                   80.6% |               123.5%
  32 |                   88.5% |               136.1%
  64 |                   84.7% |               151.5%
 128 |                   70.0% |               155.2%
 256 |                   51.6% |               167.1%
------------------------------------------------------

On Fri, Mar 3, 2023 at 10:52 AM David Rowley  wrote:
> The patch also optimizes sub-optimal newly added code which calls
> bms_is_empty_internal() when we have other more optimal means to
> determine if the set is empty or not.

However, I agree with David's opinion regarding the
bms_is_empty_internal() calls, which is quoted above. I have
implemented this optimization in a slightly different way than David.
My patch is attached to this email. The difference between my patch
and David's is in the determination method of whether the result is
empty: David's patch records the last index of non-zero bitmapword to
minimize the Bitmapset. If the index is -1, we can conclude that the
result is empty. In contrast, my patch uses a more lightweight
operation. I show my changes as follows.

=====
@@ -263,6 +261,7 @@ bms_intersect(const Bitmapset *a, const Bitmapset *b)
 	const Bitmapset *other;
 	int			resultlen;
 	int			i;
+	bitmapword	bitwise_or = 0;
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL || b == NULL)
@@ -281,9 +280,17 @@ bms_intersect(const Bitmapset *a, const Bitmapset *b)
 	/* And intersect the longer input with the result */
 	resultlen = result->nwords;
 	for (i = 0; i < resultlen; i++)
-		result->words[i] &= other->words[i];
+	{
+		bitmapword	w = (result->words[i] &= other->words[i]);
+
+		/*
+		 * Compute bitwise OR of all bitmapwords to determine if the result
+		 * is empty
+		 */
+		bitwise_or |= w;
+	}
 	/* If we computed an empty result, we must return NULL */
-	if (bms_is_empty_internal(result))
+	if (bitwise_or == 0)
 	{
 		pfree(result);
 		return NULL;
@@ -711,30 +718,6 @@ bms_membership(const Bitmapset *a)
 	return result;
 }
=====

My idea is to compute the bitwise OR of all bitmapwords of the result
Bitmapset. The bitwise OR can be represented as a single operation in
the machine code and does not require any conditional branches. If the
bitwise ORed value is zero, we can conclude the result Bitmapset is
empty. 

Re: recovery modules

2023-03-15 Thread Michael Paquier
On Tue, Mar 14, 2023 at 09:13:09PM -0700, Nathan Bossart wrote:
> However, that's not what happens when hot_standby is off.  In that case,
> the postmaster.pid file is updated with PM_STATUS_STANDBY once recovery
> starts, which wait_for_postmaster_start() interprets as "ready."  I see
> this was reported before [0], but that discussion fizzled out.  IIUC it was
> done this way to avoid infinite waits when hot_standby is off and standby
> mode is enabled.  I could be missing something obvious, but that doesn't
> seem necessary when hot_standby is off and recovery mode is enabled because
> recovery should end at some point (never mind the halting problem).  I'm
> still digging into this and may spin off a new thread if I can conjure up a
> proposal.
> 
> [0] 
> https://postgr.es/m/CAMkU%3D1wrMqPggnEfszE-c3PPLmKgRK17_qr7tmxBECYEbyV-4Q%40mail.gmail.com

These days, knowing that hot_standby is on by default, and that users
would recover up to the end-of-backup record or just use read replicas,
do we have a strong case for keeping this GUC parameter at all?  It does
not strike me that we really need to change a five-year-old behavior
if there have been few complaints about it.  I agree that it is
confusing as it stands, but the long-term simplifications may be worth
it in the recovery code (aka fewer booleans needed to track the flow of
the startup process, and less confusion around that).
--
Michael




Re: Fix fseek() detection of unseekable files on WIN32

2023-03-15 Thread Michael Paquier
On Wed, Mar 15, 2023 at 12:18:25PM +0100, Juan José Santamaría Flecha wrote:
> PFA a new version of the patch.

+_pgftello64(FILE *stream)
+{
+   DWORD   fileType;
+
+   fileType = GetFileType((HANDLE) _get_osfhandle(_fileno(stream)));

Hmm.  I am a bit surprised here.  It seems to me that we should make
sure that:
- We exit quickly if _get_osfhandle() returns -2 or
INVALID_HANDLE_VALUE, returning EINVAL?
- After GetFileType(), check for GetLastError() and the
FILE_TYPE_UNKNOWN case?

Do you think that these would be improvements?
--
Michael




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-03-15 Thread Melanie Plageman
Thanks for the reviews and for trying the patch!

On Wed, Mar 15, 2023 at 4:31 AM Masahiko Sawada  wrote:
>
> On Sat, Mar 11, 2023 at 11:55 PM Melanie Plageman
>  wrote:
> >
> > > On Tue, Feb 28, 2023 at 3:16 AM Bharath Rupireddy
> > >  wrote:
> > >
> > > > On Thu, Jan 12, 2023 at 6:06 AM Andres Freund  
> > > > wrote:
> > > > >
> > > > > On 2023-01-11 17:26:19 -0700, David G. Johnston wrote:
> > > > > > Should we just add "ring_buffers" to the existing "shared_buffers" 
> > > > > > and
> > > > > > "temp_buffers" settings?
> > > > >
> > > > > The different types of ring buffers have different sizes, for good 
> > > > > reasons. So
> > > > > I don't see that working well. I also think it'd be more often useful 
> > > > > to
> > > > > control this on a statement basis - if you have a parallel import 
> > > > > tool that
> > > > > starts NCPU COPYs you'd want a smaller buffer than a single threaded 
> > > > > COPY. Of
> > > > > course each session can change the ring buffer settings, but still.
> > > >
> > > > How about having GUCs for each ring buffer (bulk_read_ring_buffers,
> > > > bulk_write_ring_buffers, vacuum_ring_buffers - ah, 3 more new GUCs)?
> > > > These options can help especially when statement level controls aren't
> > > > easy to add (COPY, CREATE TABLE AS/CTAS, REFRESH MAT VIEW/RMV)? If
> > > > needed users can also set them at the system level. For instance, one
> > > > can set bulk_write_ring_buffers to other than 16MB or -1 to disable
> > > > the ring buffer to use shared_buffers and run a bunch of bulk write
> > > > queries.
> >
> > In attached v3, I've changed the name of the guc from buffer_usage_limit
> > to vacuum_buffer_usage_limit, since it is only used for vacuum and
> > autovacuum.
> >
> > I haven't added the other suggested strategy gucs, as those could easily
> > be done in a future patchset.
> >
> > I've also changed GetAccessStrategyExt() to GetAccessStrategyWithSize()
> > -- similar to initArrayResultWithSize().
> >
> > And I've added tab completion for BUFFER_USAGE_LIMIT so that it is
> > easier to try out my patch.
> >
> > Most of the TODOs in the code are related to the question of how
> > autovacuum uses the guc vacuum_buffer_usage_limit. autovacuum creates
> > the buffer access strategy ring in do_autovacuum() before looping
> > through and vacuuming tables. It passes this strategy object on to
> > vacuum(). Since we reuse the same strategy object for all tables in a
> > given invocation of do_autovacuum(), only failsafe autovacuum would
> > change buffer access strategies. This is probably okay, but it does mean
> > that the table-level VacuumParams variable, ring_size, means something
> > different for autovacuum than vacuum. Autovacuum workers will always
> > have set it to -1. We won't ever reach code in vacuum() which relies on
> > VacuumParams->ring_size as long as autovacuum workers pass a non-NULL
> > BufferAccessStrategy object to vacuum(), though.
>
> I've not reviewed the patchset in depth yet but I got assertion
> failure and SEGV when using the buffer_usage_limit parameter.
>
> postgres(1:471180)=# vacuum (buffer_usage_limit 100) ;
> 2023-03-15 17:10:02.947 JST [471180] ERROR:  buffer_usage_limit for a
> vacuum must be between -1 and 16777216. 100 is invalid. at
> character 9
>
> The message show the max value is 16777216, but when I set it, I got
> an assertion failure:
>
> postgres(1:470992)=# vacuum (buffer_usage_limit 16777216) ;
> TRAP: failed Assert("ring_size < MAX_BAS_RING_SIZE_KB"), File:
> "freelist.c", Line: 606, PID: 470992
>
> Then when I used 1 byte lower value, 16777215, I got a SEGV:
>
> postgres(1:471180)=# vacuum (buffer_usage_limit 16777215) ;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: 2023-03-15
> 17:10:59.404 JST [471159] LOG:  server process (PID 471180) was
> terminated by signal 11: Segmentation fault
>
> Finally, when I used a more lower value, 16777100, I got a memory
> allocation error:
>
> postgres(1:471361)=# vacuum (buffer_usage_limit 16777100) ;
> 2023-03-15 17:12:17.853 JST [471361] ERROR:  invalid memory alloc
> request size 18446744073709551572
>
> Probably vacuum_buffer_usage_limit also has the same issue.

Oh dear--it seems I had an integer overflow when calculating the number
of buffers using the specified buffer size in the macro:

 #define bufsize_limit_to_nbuffers(bufsize) (bufsize * 1024 / BLCKSZ)

In the attached v4, I've changed that to:

  static inline int
  bufsize_limit_to_nbuffers(int bufsize_limit_kb)
  {
int blcksz_kb = BLCKSZ / 1024;

Assert(blcksz_kb > 0);

return bufsize_limit_kb / blcksz_kb;
  }

This should address Justin's suggestions and Ants' concern about the
macro as well.
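As a quick sanity check of the new conversion (table name made up; the
option value is in kB, so at the default 8 kB block size 2048 maps to
256 buffers):

vacuum (buffer_usage_limit 2048) foo;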

Also, I was missing the = in the Assert(ring_size <= MAX_BAS_RING_SIZE).
I've fixed that as well, so it should work for you to 

Re: psql \watch 2nd argument: iteration count

2023-03-15 Thread Michael Paquier
On Wed, Mar 15, 2023 at 04:58:49PM +0900, Michael Paquier wrote:
> Yep.  You are right.

Fixed that and applied 0001.

+valptr++;
+if (strncmp("i", opt, strlen("i")) == 0 ||
+strncmp("interval", opt, strlen("interval")) == 0)
+{

Did you look at process_command_g_options() and whether some consolidation
was possible?  It would be nice to have APIs shaped so that more
sub-commands could rely on the same facility in the future.

-\watch [ seconds ]
+\watch [ seconds [ iterations ] ]

This set of changes is not reflected in the documentation.

With an interval in place, we could now automate some tests with
\watch where it does not fail.  What do you think about adding a test
with a simple query, an interval of 0s and one iteration?
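Something like this, for instance (using the positional syntax from the
docs hunk above):

SELECT 1 \watch 0 1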
--
Michael




Re: Add a hook to allow modification of the ldapbindpasswd

2023-03-15 Thread Michael Paquier
On Wed, Mar 15, 2023 at 06:18:28PM -0400, Andrew Dunstan wrote:
> Ugh. Not batting 1000 today. Will investigate.

I have noticed that you forgot a .gitignore in this new path, as well,
so I have taken the liberty to add one ;)

FWIW, I use git-sh-prompt to detect such things quickly.
--
Michael




Re: Allow logical replication to copy tables in binary format

2023-03-15 Thread Peter Smith
On Wed, Mar 15, 2023 at 5:31 PM Amit Kapila  wrote:
>
> On Wed, Mar 15, 2023 at 11:52 AM Peter Smith  wrote:
> >
> > ==
> > src/backend/replication/logical/tablesync.c
> >
> > 5.
> > +
> > + /*
> > + * If the publisher version is earlier than v14, it COPY command doesn't
> > + * support the binary option.
> > + */
> > + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 14 &&
> > + MySubscription->binary)
> > + {
> > + appendStringInfo(&cmd, " WITH (FORMAT binary)");
> > + options = lappend(options, makeDefElem("format", (Node *)
> > makeString("binary"), -1));
> > + }
> >
> > Sorry, I gave a poor review comment for this previously. Now I have
> > revisited all the thread discussions about version checking. I feel
> > that some explanation should be given in the code comment so that
> > future readers of this code can understand why you decided to use v14
> > checking.
> >
> > Something like this:
> >
> > SUGGESTION
> > If the publisher version is earlier than v14, we use text format COPY.
> >
>
> I think this isn't explicit that we supported the binary format since
> v14. So, I would prefer my version of the comment as suggested in the
> previous email.
>

Hmm, but my *full* suggestion was bigger than what is misquoted above,
and it certainly did say "logical replication binary mode transfer
was not introduced until v14".

SUGGESTION
If the publisher version is earlier than v14, we use text format COPY.
Note - In fact COPY syntax "WITH (FORMAT binary)" has existed since
v9, but since the logical replication binary mode transfer was not
introduced until v14 it was decided to check using the later version.

~~

Anyway, the shortened comment as in the latest v15 patch is fine by me too.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Add LZ4 compression in pg_dump

2023-03-15 Thread Justin Pryzby
On Mon, Mar 13, 2023 at 10:47:12PM +0100, Tomas Vondra wrote:
> > Rearrange functions to their original order allowing a cleaner diff to the 
> > prior code;
> 
> OK. I wasn't very enthusiastic about this initially, but after thinking
> about it a bit I think it's meaningful to make diffs clearer. But I
> don't see much difference with/without the patch. The
> 
> git diff --diff-algorithm=minimal -w e9960732a~:src/bin/pg_dump/compress_io.c 
> src/bin/pg_dump/compress_gzip.c
> 
> Produces ~25k diff with/without the patch. What am I doing wrong?

Do you mean 25 kB of diff?  I agree that the statistics of the diff
output don't change a lot:

  1 file changed, 201 insertions(+), 570 deletions(-)
  1 file changed, 198 insertions(+), 548 deletions(-)

But try reading the diff while looking for the cause of a bug.  It's the
difference between reading 50 two-line changes and reading a hunk that
replaces 100 lines with a different 100 lines, with empty/unrelated
lines randomly thrown in as context.

When the diff is readable, the pg_fatal() also stands out.

> > Change pg_fatal() to an assertion+comment;
> 
> Yeah, that's reasonable. I'd even ditch the assert/comment, TBH. We
> could add such protections against "impossible" stuff to a zillion other
> places and the confusion likely outweighs the benefits.
> 
> > Update the commit message and fix a few typos;
> 
> Thanks. I don't want to annoy you too much, but could you split the
> patch into the "empty-data" fix and all the other changes (rearranging
> functions etc.)? I'd rather not mix those in the same commit.

I don't know if that makes sense?  The "empty-data" fix creates a new
function called DeflateCompressorInit().  My proposal was to add the new
function in the same place in the file as it used to be.

The patch also moves the pg_fatal() that's being removed.  I don't think
it's going to look any cleaner to read a history involving the
pg_fatal() first being added, then moved, then removed.  Anyway, I'll
wait while the community continues discussion about the pg_fatal().

-- 
Justin




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-03-15 Thread Justin Pryzby
On Tue, Mar 14, 2023 at 08:56:58PM -0400, Melanie Plageman wrote:
> On Sat, Mar 11, 2023 at 2:16 PM Justin Pryzby  wrote:

> > > @@ -586,6 +587,45 @@ GetAccessStrategy(BufferAccessStrategyType btype)
> > > +BufferAccessStrategy
> > > +GetAccessStrategyWithSize(BufferAccessStrategyType btype, int ring_size)
> >
> > Maybe it would make sense for GetAccessStrategy() to call
> > GetAccessStrategyWithSize().  Or maybe not.
> 
> You mean instead of having anyone call GetAccessStrategyWithSize()?
> We would need to change the signature of GetAccessStrategy() then -- and
> there are quite a few callers.

I mean that, to avoid code duplication, GetAccessStrategy() could "Select
ring size to use" and then call into GetAccessStrategyWithSize().  Maybe
it's counter to your intent or otherwise not worth it to save 8 LOC.

> > > +#vacuum_buffer_usage_limit = -1 # size of vacuum buffer access strategy 
> > > ring.
> > > + # -1 to use default,
> > > + # 0 to disable vacuum buffer access 
> > > strategy and use shared buffers
> > > + # > 0 to specify size
> >
> > If I'm not wrong, there's still no documentation about "ring buffers" or
> > postgres' "strategy".  Which seems important to do for this patch, along
> > with other documentation.
> 
> Yes, it is. I have been thinking about where in the docs to add it (the
> docs about buffer access strategies). Any ideas?

This patch could add something to the vacuum manpage and to the appendix.
And maybe references from the shared_buffers and pg_buffercache
manpages.

> > This patch should add support in vacuumdb.c.
> 
> Oh, I had totally forgotten about vacuumdb.

:)

> > And maybe a comment about adding support there, since it's annoying
> > when the release notes one year say "support VACUUM (FOO)" and then
> > one year later say "support vacuumdb --foo".
> 
> I'm not sure I totally follow. Do you mean to add a comment to
> ExecVacuum() saying to add support to vacuumdb when adding a new option
> to vacuum?

Yeah, like:
/* Options added here should also be added to vacuumdb.c */

> In the past, did people forget to add support to vacuumdb for new vacuum
> options or did they forget to document that they did that or did they
> forgot to include that they did that in the release notes?

The first.  Maybe not often, it's not important whether it's in the
original patch, but it's odd if the vacuumdb option isn't added until
the following release, which then shows up as a separate "feature".

-- 
Justin




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-03-15 Thread Justin Pryzby
On Tue, Mar 14, 2023 at 06:58:14PM +0400, Ilya Gladyshev wrote:
> > The only change from the current patch is (3).  (1) still calls
> > count_leaf_partitions(), but only once.  I'd prefer that to rearranging
> > the progress reporting to set the TOTAL in ProcessUtilitySlow().
>
> As for reusing TOTAL calculated outside of DefineIndex, as I can see, 
> ProcessUtilitySlow is not the only call site for DefineIndex (although, I 
> don’t know whether all of them need progress tracking), for instance, there 
> is ALTER TABLE that calls DefineIndex to create index for constraints. So I 
> feel like rearranging progress reporting will result in unnecessary code 
> duplication in those call sites, so passing in an optional parameter seems to 
> be easier here, if we are going to optimize it, after all. Especially if 
> back-patching is a non-issue.

Yeah.  See attached.  I don't like duplicating the loop.  Is this really
the right direction to go?

I haven't verified if the child tables are locked in all the paths which
would call count_leaf_partitions().  But why is it important to lock
them for this?  If they weren't locked before, that'd be a pre-existing
problem...
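
To make the intended counting concrete, a hedged sketch (the
PROGRESS_CREATEIDX_* constants are the existing ones; count_leaf_partitions()
is from the attached patch; "relationId" and the exact call sites are my
assumptions):

/* once, at the start of the top-level CREATE INDEX: */
pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
							 count_leaf_partitions(relationId));

/* in each recursive DefineIndex() call, after a leaf partition's index is
 * built or attached (intermediate partitioned tables are skipped): */
pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1);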
From 1e05b055d1bd21265b68265ddef5e9654629087c Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev 
Date: Tue, 31 Jan 2023 19:13:07 +0400
Subject: [PATCH] fix CREATE INDEX progress report with nested partitions

The progress reporting was added in v12 (ab0dfc961) but the original
patch didn't seem to consider the possibility of nested partitioning.

When called recursively, DefineIndex() would clobber the number of
completed partitions, and it was possible to end up with the TOTAL
counter greater than the DONE counter.

This clarifies/re-defines that the progress reporting counts both direct
and indirect children, but doesn't count intermediate partitioned tables:

- The TOTAL counter is set once at the start of the command.
- For indexes which are newly-built, the recursively-called
DefineIndex() increments the DONE counter.
- For pre-existing indexes which are ATTACHed rather than built,
DefineIndex() increments the DONE counter, unless the attached index is
partitioned, in which case progress report is not updated.

Author: Ilya Gladyshev
Reviewed-By: Justin Pryzby, Tomas Vondra, Dean Rasheed, Alvaro Herrera, Matthias van de Meent
Discussion: https://www.postgresql.org/message-id/flat/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel%40gmail.com
---
 doc/src/sgml/monitoring.sgml  | 10 ++-
 src/backend/bootstrap/bootparse.y |  2 +
 src/backend/commands/indexcmds.c  | 70 +--
 src/backend/commands/tablecmds.c  |  4 +-
 src/backend/tcop/utility.c|  8 ++-
 src/backend/utils/activity/backend_progress.c | 28 
 src/include/commands/defrem.h |  1 +
 src/include/utils/backend_progress.h  |  1 +
 8 files changed, 115 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 6249bb50d02..3f891b75541 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6901,7 +6901,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
   
   
When creating an index on a partitioned table, this column is set to
-   the total number of partitions on which the index is to be created.
+   the total number of partitions on which the index is to be created or attached.
+   In the case of intermediate partitioned tables, this includes both
+   direct and indirect partitions, but excludes the intermediate
+   partitioned tables themselves.
This field is 0 during a REINDEX.
   
  
@@ -6912,7 +6915,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
   
   
When creating an index on a partitioned table, this column is set to
-   the number of partitions on which the index has been created.
+   the number of partitions on which the index has been created or attached.
+   In the case of intermediate partitioned tables, this includes both
+   direct and indirect partitions, but excludes the intermediate
+   partitioned tables themselves.
This field is 0 during a REINDEX.
   
  
diff --git a/src/backend/bootstrap/bootparse.y b/src/backend/bootstrap/bootparse.y
index 86804bb598e..81a1b7bfec3 100644
--- a/src/backend/bootstrap/bootparse.y
+++ b/src/backend/bootstrap/bootparse.y
@@ -306,6 +306,7 @@ Boot_DeclareIndexStmt:
 $4,
 InvalidOid,
 InvalidOid,
+-1,
 false,
 false,
 false,
@@ -358,6 +359,7 @@ Boot_DeclareUniqueIndexStmt:
 $5,
 InvalidOid,
 InvalidOid,
+-1,
 false,
 false,
 false,
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 16ec0b114e6..62a8f0d6aa2 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -130,6 

Re: Allow logical replication to copy tables in binary format

2023-03-15 Thread Peter Smith
Here are some review comments for v15-0001

==
doc/src/sgml/logical-replication.sgml

1.
+   target table. However, logical replication in binary format is
more restrictive,
+   see binary option of
+   CREATE
SUBSCRIPTION
+   for more details.

IMO (and Chat-GPT agrees) the new text should be 2 sentences.

Also, I changed "more details" --> "details" because none are provided here.

SUGGESTION
However, logical replication in binary format is
more restrictive. See the binary option of CREATE
SUBSCRIPTION for details.

==
doc/src/sgml/ref/alter_subscription.sgml

2.
+ 
+  See binary option of 
+  for details of copying pre-existing data in binary format.
+ 

Should the link be defined more like you did above, using the
markup to get the better font?

SUGGESTION (also minor rewording)
See the binary option of CREATE
SUBSCRIPTION for details about copying pre-existing
data in binary format.

==
doc/src/sgml/ref/create_subscription.sgml

3.
  
-  Specifies whether the subscription will request the publisher to
-  send the data in binary format (as opposed to text).
-  The default is false.
-  Even when this option is enabled, only data types having
-  binary send and receive functions will be transferred in binary.
+  Specifies whether the subscription will request the publisher to send
+  the data in binary format (as opposed to text). The default is
+  false. Any initial table synchronization copy
+  (see copy_data) also uses the same format. Binary
+  format can be faster than the text format, but it is less portable
+  across machine architectures and PostgreSQL versions. Binary format
+  is very data type specific; for example, it will not allow copying
+  from a smallint column to an integer column, even though that would
+  work fine in text format. Even when this option is enabled, only data
+  types having binary send and receive functions will be transferred in
+  binary. Note that the initial synchronization requires all data types
+  to have binary send and receive functions, otherwise the
synchronization
+  will fail (see  for more about
+  send/receive functions).
  

IMO that part saying "from a smallint column to an integer column"
should have  markups for "smallint" and "integer".

==
src/backend/replication/logical/tablesync.c

4.
+ /*
+ * The binary option for replication is supported since v14
+ */
+ if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 14 &&
+ MySubscription->binary)
+ {
+ appendStringInfo(, " WITH (FORMAT binary)");
+ options = lappend(options, makeDefElem("format", (Node *)
makeString("binary"), -1));
+ }

Should this now be a single-line comment instead of spanning 3 lines?

==
src/test/subscription/t/014_binary.pl

5.
Everything looked OK to me, but the TAP file has only small comments
for each test step, which forces you to read everything from
top-to-bottom to understand what is going on. I felt it might be
easier to understand the tests if you add a few "bigger" comments just
to break the tests into the categories being tested. For example,
something like:


# --
# Ensure binary mode also executes COPY in binary format
# --

~

# --
# Ensure normal binary replication works
# --

~

# --
# Use ALTER SUBSCRIPTION to change to text format and then back to binary format
# --

~

# ---
# Test binary replication without and with send/receive functions
# ---

~

# --
# Test different column orders on pub/sub tables
# --

~

# -
# Test mismatched column types with/without binary mode
# -

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Add a hook to allow modification of the ldapbindpasswd

2023-03-15 Thread Andrew Dunstan


On 2023-03-15 We 17:50, Tom Lane wrote:

Andrew Dunstan  writes:

pushed.

drongo is not happy with this, but I'm kind of baffled as to why:

"c:\\prog\\bf\\root\\HEAD\\pgsql.build\\pgsql.sln" (default target) (1) ->
"c:\\prog\\bf\\root\\HEAD\\pgsql.build\\ldap_password_func.vcxproj" (default 
target) (60) ->
(Link target) ->
   ldap_password_func.obj : error LNK2001: unresolved external symbol 
__imp_ldap_password_hook 
[c:\\prog\\bf\\root\\HEAD\\pgsql.build\\ldap_password_func.vcxproj]
   .\\Release\\ldap_password_func\\ldap_password_func.dll : fatal error 
LNK1120: 1 unresolved externals 
[c:\\prog\\bf\\root\\HEAD\\pgsql.build\\ldap_password_func.vcxproj]

The only obvious explanation for a link problem would be if the
variable's declaration were missing PGDLLIMPORT; but it's not.





Ugh. Not batting 1000 today. Will investigate.


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: ICU locale validation / canonicalization

2023-03-15 Thread Jeff Davis
On Tue, 2023-03-14 at 23:47 -0700, Jeff Davis wrote:
> On Tue, 2023-03-14 at 10:10 -0700, Jeff Davis wrote:
> > One loose end is that we really should support language tags like
> > "und"
> > in those older versions (54 and earlier). Your commit d72900bded
> > avoided the problem, but perhaps we should fix it by looking for
> > "und"
> > and replacing it with "root" while opening, or something.
> 
> Attached are a few patches to implement this idea.

Here is an updated patch series that includes these earlier fixes for
older ICU versions, with the canonicalization patch last (0005).

I left out the validation patch for now, and I'm evaluating a different
approach that will attempt to match to the locales retrieved with
uloc_countAvailable()/uloc_getAvailable().
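
For concreteness, a hedged sketch of that matching idea against the plain
ICU C API (the matching criterion itself -- exact string equality on the
canonicalized name -- is an assumption on my part):

#include <unicode/uloc.h>
#include <stdbool.h>
#include <string.h>

static bool
locale_is_available(const char *canonical)
{
	int32_t		count = uloc_countAvailable();

	for (int32_t i = 0; i < count; i++)
	{
		if (strcmp(uloc_getAvailable(i), canonical) == 0)
			return true;
	}
	return false;
}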


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS


From 70d98770ce6c1795ab172adf10bda87dafa310e3 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Tue, 14 Mar 2023 09:58:29 -0700
Subject: [PATCH v5 1/5] Support language tags in older ICU versions (53 and
 earlier).

By calling uloc_canonicalize() before parsing the attributes, the
existing locale attribute parsing logic works on language tags as
well.

Fix a small memory leak, too.

Discussion: http://postgr.es/m/60da0cecfb512a78b8666b31631a636215d8ce73.ca...@j-davis.com
---
 src/backend/commands/collationcmds.c  |  8 +++---
 src/backend/utils/adt/pg_locale.c | 26 ---
 .../regress/expected/collate.icu.utf8.out |  8 ++
 src/test/regress/sql/collate.icu.utf8.sql |  4 +++
 4 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 8949684afe..b8f2e7059f 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -950,7 +950,6 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 			const char *name;
 			char	   *langtag;
 			char	   *icucomment;
-			const char *iculocstr;
 			Oid			collid;
 
 			if (i == -1)
@@ -959,20 +958,19 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 name = uloc_getAvailable(i);
 
 			langtag = get_icu_language_tag(name);
-			iculocstr = U_ICU_VERSION_MAJOR_NUM >= 54 ? langtag : name;
 
 			/*
 			 * Be paranoid about not allowing any non-ASCII strings into
 			 * pg_collation
 			 */
-			if (!pg_is_ascii(langtag) || !pg_is_ascii(iculocstr))
+			if (!pg_is_ascii(langtag) || !pg_is_ascii(langtag))
 continue;
 
 			collid = CollationCreate(psprintf("%s-x-icu", langtag),
 	 nspid, GetUserId(),
 	 COLLPROVIDER_ICU, true, -1,
-	 NULL, NULL, iculocstr, NULL,
-	 get_collation_actual_version(COLLPROVIDER_ICU, iculocstr),
+	 NULL, NULL, langtag, NULL,
+	 get_collation_actual_version(COLLPROVIDER_ICU, langtag),
 	 true, true);
 			if (OidIsValid(collid))
 			{
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 1d3d4d86d3..b9c7fbd511 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -2643,9 +2643,28 @@ pg_attribute_unused()
 static void
 icu_set_collation_attributes(UCollator *collator, const char *loc)
 {
-	char	   *str = asc_tolower(loc, strlen(loc));
+	UErrorCode	status;
+	int32_t		len;
+	char	   *icu_locale_id;
+	char	   *lower_str;
+	char	   *str;
 
-	str = strchr(str, '@');
+	/* first, make sure the string is an ICU format locale ID */
+	status = U_ZERO_ERROR;
+	len = uloc_canonicalize(loc, NULL, 0, );
+	icu_locale_id = palloc(len + 1);
+	status = U_ZERO_ERROR;
+	len = uloc_canonicalize(loc, icu_locale_id, len + 1, );
+	if (U_FAILURE(status))
+		ereport(ERROR,
+(errmsg("canonicalization failed for locale string \"%s\": %s",
+		loc, u_errorName(status;
+
+	lower_str = asc_tolower(icu_locale_id, strlen(icu_locale_id));
+
+	pfree(icu_locale_id);
+
+	str = strchr(lower_str, '@');
 	if (!str)
 		return;
 	str++;
@@ -2660,7 +2679,6 @@ icu_set_collation_attributes(UCollator *collator, const char *loc)
 			char	   *value;
 			UColAttribute uattr;
 			UColAttributeValue uvalue;
-			UErrorCode	status;
 
 			status = U_ZERO_ERROR;
 
@@ -2727,6 +2745,8 @@ icu_set_collation_attributes(UCollator *collator, const char *loc)
 loc, u_errorName(status;
 		}
 	}
+
+	pfree(lower_str);
 }
 
 #endif			/* USE_ICU */
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 9a3e12e42d..6225b575ce 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1304,6 +1304,14 @@ SELECT 'abc' <= 'ABC' COLLATE case_insensitive, 'abc' >= 'ABC' COLLATE case_inse
  t| t
 (1 row)
 
+-- test language tags
+CREATE COLLATION lt_insensitive (provider = icu, locale = 'en-u-ks-level1', deterministic = false);
+SELECT 'aBcD' COLLATE lt_insensitive = 'AbCd' COLLATE lt_insensitive;
+ ?column? 
+--
+ t
+(1 row)
+
 CREATE TABLE test1cs (x text COLLATE 

Re: Memory leak in libxml2 (was Re: [PATCH] Add pretty-printed XML output option)

2023-03-15 Thread Daniel Gustafsson
> On 15 Mar 2023, at 22:38, Tom Lane  wrote:

> After a bit of further testing: the leak is present in libxml2 2.9.7
> which is what I have on this RHEL8 box, but it seems not to occur
> in libxml2 2.10.3 (tested on Fedora 37, and I verified that Fedora
> isn't carrying any relevant local patch).
> 
> So maybe it's worth working around that, or maybe it isn't.

2.9.7 is from November 2017 and 2.10.3 is from October 2022, so depending on
when in that timespan the issue was fixed it might be in a release which will
be with us for quite some time.  The lack of reports (that I was able to find)
indicates that it might be rare in production though?

--
Daniel Gustafsson





Re: Speed-up shared buffers prewarming

2023-03-15 Thread Melanie Plageman
On Wed, Mar 15, 2023 at 4:38 PM Konstantin Knizhnik  wrote:
> It is a well-known fact that queries using a sequential scan cannot be used to
> prewarm the cache, because they use a ring buffer
> even if shared buffers are almost empty.
> I have searched the hackers archive but failed to find any discussion about it.
> What are the drawbacks of using free buffers even with the BAS_BULKREAD strategy?
> I mean the following trivial patch:

It has been brought up at least in 2014 [1] and 2020 [2]
The part relevant to your patch is in the thread from 2020 here [3].
This quote in particular:

> a) Don't evict buffers when falling off the ringbuffer as long as
> there unused buffers on the freelist. Possibly just set their
> usagecount to zero as long that is the case.

> diff --git a/src/backend/storage/buffer/freelist.c 
> b/src/backend/storage/buffer/freelist.c
> index 6be80476db..243335d0e4 100644
> --- a/src/backend/storage/buffer/freelist.c
> +++ b/src/backend/storage/buffer/freelist.c
> @@ -208,8 +208,15 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 
> *buf_state)
> /*
>  * If given a strategy object, see whether it can select a buffer. We
>  * assume strategy objects don't need buffer_strategy_lock.
>  */
> -   if (strategy != NULL)
> +   if (strategy != NULL && StrategyControl->firstFreeBuffer < 0)
> {
> buf = GetBufferFromRing(strategy, buf_state);
> if (buf != NULL)
>
> So if there are free buffers, then use normal buffer allocation instead of 
> GetBufferFromRing.

Similar to what you did.

- Melanie

[1] 
https://www.postgresql.org/message-id/flat/CAJRYxuL98fE_QN7McnCM5HUo8p9ceNJw%3D20GoN5NVdZdueJFqg%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/flat/20200206040026.trjzsmdsbl4gu2b6%40alap3.anarazel.de
[3] 
https://www.postgresql.org/message-id/20200206040026.trjzsmdsbl4gu2b6%40alap3.anarazel.de




Re: Add a hook to allow modification of the ldapbindpasswd

2023-03-15 Thread Tom Lane
Andrew Dunstan  writes:
> pushed.

drongo is not happy with this, but I'm kind of baffled as to why:

"c:\\prog\\bf\\root\\HEAD\\pgsql.build\\pgsql.sln" (default target) (1) ->
"c:\\prog\\bf\\root\\HEAD\\pgsql.build\\ldap_password_func.vcxproj" (default 
target) (60) ->
(Link target) -> 
  ldap_password_func.obj : error LNK2001: unresolved external symbol 
__imp_ldap_password_hook 
[c:\\prog\\bf\\root\\HEAD\\pgsql.build\\ldap_password_func.vcxproj]
  .\\Release\\ldap_password_func\\ldap_password_func.dll : fatal error LNK1120: 
1 unresolved externals 
[c:\\prog\\bf\\root\\HEAD\\pgsql.build\\ldap_password_func.vcxproj]

The only obvious explanation for a link problem would be if the
variable's declaration were missing PGDLLIMPORT; but it's not.

regards, tom lane




Re: Speed-up shared buffers prewarming

2023-03-15 Thread Matthias van de Meent
On Wed, 15 Mar 2023 at 21:38, Konstantin Knizhnik  wrote:
>
> Hi hackers,
>
> It is a well-known fact that queries using a sequential scan cannot be used to
> prewarm the cache, because they use a ring buffer
> even if shared buffers are almost empty.
> I have searched the hackers archive but failed to find any discussion about it.
> What are the drawbacks of using free buffers even with the BAS_BULKREAD strategy?
> I mean the following trivial patch:
>
> diff --git a/src/backend/storage/buffer/freelist.c 
> b/src/backend/storage/buffer/freelist.c
> index 6be80476db..243335d0e4 100644
> --- a/src/backend/storage/buffer/freelist.c
> +++ b/src/backend/storage/buffer/freelist.c
> @@ -208,8 +208,15 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 
> *buf_state)
> /*
>  * If given a strategy object, see whether it can select a buffer. We
>  * assume strategy objects don't need buffer_strategy_lock.
>  */
> -   if (strategy != NULL)
> +   if (strategy != NULL && StrategyControl->firstFreeBuffer < 0)
> {
> buf = GetBufferFromRing(strategy, buf_state);
> if (buf != NULL)
>
> So if there are free buffers, then use normal buffer allocation instead of 
> GetBufferFromRing.

Yes. As seen in [1], ring buffers aren't all that great in some cases,
and I think this is one. Buffer allocation should always make use of
the available resources, so that it doesn't take O(N/ring_size) scans
on a table to fill the buffers if that seqscan is the only workload of
the system.

> Definitely it is possible that seqscan limited by ring buffer will be 
> completed faster than seqscan filling all shared buffers especially if
> size of shared buffers is large enough. OS will need some extra time to 
> commit memory and may be swap out other regions to find enough physical
> memory for shared buffers.

Not just that, but it is also possible that by ignoring the ring we're
going to hit pages that aren't yet in the CPU caches and we would thus
need to fetch the data from RAM (or from another NUMA node), which
could be more expensive than reading it from a local kernel's file
cache and writing it to the local cache lines.

Anyway, I'm all for this change - I don't think we need to be careful
about trashing other workloads' buffers if the buffer is useless for
literally every workload.


Kind regards,

Matthias van de Meent

[1] 
https://www.postgresql.org/message-id/flat/2023082720.ejifsclfwymw2reb%40awork3.anarazel.de




Memory leak in libxml2 (was Re: [PATCH] Add pretty-printed XML output option)

2023-03-15 Thread Tom Lane
I wrote:
> BTW, the libxml leak problem seems to extend to other cases too.
> I tested with code like

> do $$
> declare x xml; t text;
> begin
> x := '<?xml version="1.0" encoding="utf8"?><foo>73</foo>';
> for i in 1..1000 loop
>   t := xmlserialize(document x as text);
> end loop;
> raise notice 't = %', t;
> end;
> $$;

> That case is fine, but if you change the encoding spec to "latin1",
> it leaks like mad.  That problem is not the fault of this patch,
> I don't think.  I wonder if we need to do something to prevent
> libxml from seeing encoding declarations other than utf8?

After a bit of further testing: the leak is present in libxml2 2.9.7
which is what I have on this RHEL8 box, but it seems not to occur
in libxml2 2.10.3 (tested on Fedora 37, and I verified that Fedora
isn't carrying any relevant local patch).

So maybe it's worth working around that, or maybe it isn't.

regards, tom lane




Re: Add support for DEFAULT specification in COPY FROM

2023-03-15 Thread Andrew Dunstan


On 2023-03-15 We 13:00, Alexander Lakhin wrote:

Hello,
13.03.2023 17:15, Andrew Dunstan wrote:


On 2022-12-02 Fr 09:11, Israel Barth Rubio wrote:

Hello all,

I'm submitting a new version of the patch. Instead of changing 
signature
of several functions in order to use the defaults parameter, it is 
now storing
that in the cstate structure, which is already passed to all 
functions that

were previously modified.



Thanks, committed.


Please look at the query:
create table t (f1 int);
copy t from stdin with (format csv, default '\D');
1,\D

that invokes an assertion failure after 9f8377f7a:
Core was generated by `postgres: law regression [local] 
COPY '.

Program terminated with signal SIGABRT, Aborted.




Fix pushed, thanks for the report.


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED

2023-03-15 Thread Thomas Munro
On Thu, Mar 16, 2023 at 2:00 AM Alexander Lakhin  wrote:
> 15.03.2023 11:43, Thomas Munro wrote:
> > Do you know how it fails in non-assert builds, without the fix?

> So at least with my test script that doesn't lead to a crash or something.

Thanks.  We were wondering if the retry mechanism might somehow be
hiding this in non-assert builds, but, looking more closely, that is
tied specifically to the memory reservation operation.

I noticed that d41a178b missed a comment explaining why we used
malloc() instead of palloc(), but that isn't true anymore, so here's a
small patch to clean that up.
From 774e1acd27b49fc8d4b017686eeb9260032d6bce Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Thu, 16 Mar 2023 09:42:57 +1300
Subject: [PATCH] Small tidyup for commit d41a178b.

A comment was left behind claiming that we needed to use malloc() rather
than palloc() because the corresponding free would run in another thread,
but that's not true anymore.  Adjust the comment.  And, with that reason
being gone, we might as well use palloc().

Back-patch to supported releases, like d41a178b.

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 71198b72c8..27184fb3c4 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -4786,13 +4786,10 @@ retry:
 
 	/*
 	 * Queue a waiter to signal when this child dies. The wait will be handled
-	 * automatically by an operating system thread pool.
-	 *
-	 * Note: use malloc instead of palloc, since it needs to be thread-safe.
-	 * Struct will be free():d from the callback function that runs on a
-	 * different thread.
+	 * automatically by an operating system thread pool.  The memory will be
+	 * freed by a later call to waitpid().
 	 */
-	childinfo = malloc(sizeof(win32_deadchild_waitinfo));
+	childinfo = palloc(sizeof(win32_deadchild_waitinfo));
 	if (!childinfo)
 		ereport(FATAL,
 (errcode(ERRCODE_OUT_OF_MEMORY),
@@ -6463,7 +6460,7 @@ waitpid(pid_t pid, int *exitstatus, int options)
 	 * Free struct that was allocated before the call to
 	 * RegisterWaitForSingleObject()
 	 */
-	free(childinfo);
+	pfree(childinfo);
 
 	return pid;
 }
-- 
2.39.2



Re: [PATCH] Add pretty-printed XML output option

2023-03-15 Thread Tom Lane
I wrote:
> Huh, interesting.  That is a legitimate pretty-fication of the input,
> I suppose, but some people might think it goes beyond the charter of
> "indentation".  I'm okay with it personally; anyone want to object?

Hearing no objections to that, I moved ahead with this.

It occurred to me to test v23 for memory leaks, and it had bad ones:

* the "newline" node used in the CONTENT case never got freed.
Making another one for each line wasn't helping, either.

* libxml, at least in the 2.9.7 version I have here, turns out to
leak memory if you pass a non-null encoding to xmlSaveToBuffer.
But AFAICS we don't really need to do that, because the last thing
we want is for libxml to try to do any encoding conversion (see the
sketch just below).
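
For reference, a standalone sketch of the encoding-free call -- illustrative
C against the public libxml2 API, not the actual PostgreSQL code, and with
error handling elided:

#include <libxml/parser.h>
#include <libxml/xmlsave.h>

static xmlChar *
serialize_without_conversion(xmlDocPtr doc)
{
	xmlBufferPtr buf = xmlBufferCreate();
	/* NULL encoding: libxml2 does no conversion, sidestepping the leak */
	xmlSaveCtxtPtr ctxt = xmlSaveToBuffer(buf, NULL, 0);
	xmlChar    *result;

	xmlSaveDoc(ctxt, doc);
	xmlSaveClose(ctxt);			/* flushes and frees the save context */

	result = xmlStrdup(xmlBufferContent(buf));
	xmlBufferFree(buf);
	return result;				/* caller must xmlFree() it */
}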

After cleaning that up, I saw that we were indeed doing essentially
duplicative xml_parse calls for the DOCUMENT check and the indentation
work, so I refactored to allow just one call to serve.

Pushed with those changes and some other cosmetic cleanup.
Thanks for working so hard on this!

(Now to keep an eye on the buildfarm, to see if other versions of
libxml work like mine ...)

BTW, the libxml leak problem seems to extend to other cases too.
I tested with code like

do $$
declare x xml; t text;
begin
x := '<?xml version="1.0" encoding="utf8"?><foo>73</foo>';
for i in 1..1000 loop
  t := xmlserialize(document x as text);
end loop;
raise notice 't = %', t;
end;
$$;

That case is fine, but if you change the encoding spec to "latin1",
it leaks like mad.  That problem is not the fault of this patch,
I don't think.  I wonder if we need to do something to prevent
libxml from seeing encoding declarations other than utf8?

regards, tom lane




Re: Add support for DEFAULT specification in COPY FROM

2023-03-15 Thread Andrew Dunstan


On 2023-03-15 We 13:00, Alexander Lakhin wrote:

Hello,
13.03.2023 17:15, Andrew Dunstan wrote:


On 2022-12-02 Fr 09:11, Israel Barth Rubio wrote:

Hello all,

I'm submitting a new version of the patch. Instead of changing 
signature
of several functions in order to use the defaults parameter, it is 
now storing
that in the cstate structure, which is already passed to all 
functions that

were previously modified.



Thanks, committed.


Please look at the query:
create table t (f1 int);
copy t from stdin with (format csv, default '\D');
1,\D

that invokes an assertion failure after 9f8377f7a:
Core was generated by `postgres: law regression [local] 
COPY '.

Program terminated with signal SIGABRT, Aborted.

warning: Section `.reg-xstate/3253881' in core file too small.
#0  __pthread_kill_implementation (no_tid=0, signo=6, 
threadid=140665061189440) at ./nptl/pthread_kill.c:44

44  ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, 
threadid=140665061189440) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140665061189440) at 
./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140665061189440, 
signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x7fef2250e476 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/posix/raise.c:26

#4  0x7fef224f47f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x5600fd395750 in ExceptionalCondition (
    conditionName=conditionName@entry=0x5600fd3fa751 "n >= 0 && n < 
list->length",
    fileName=fileName@entry=0x5600fd416db8 
"../../../src/include/nodes/pg_list.h", lineNumber=lineNumber@entry=280)

    at assert.c:66
#6  0x5600fd02626d in list_nth_cell (n=, 
list=)

    at ../../../src/include/nodes/pg_list.h:280
#7  list_nth_int (n=, list=) at 
../../../src/include/nodes/pg_list.h:313

#8  CopyReadAttributesCSV (cstate=) at copyfromparse.c:1905
#9  0x5600fd0265a5 in NextCopyFromRawFields 
(cstate=0x5600febdd238, fields=0x7fff12ef7130, nfields=0x7fff12ef712c)

    at copyfromparse.c:833
#10 0x5600fd0267f9 in NextCopyFrom 
(cstate=cstate@entry=0x5600febdd238, 
econtext=econtext@entry=0x5600fec9c5c8,

    values=0x5600febdd5c8, nulls=0x5600febdd5d0) at copyfromparse.c:885
#11 0x5600fd0234db in CopyFrom 
(cstate=cstate@entry=0x5600febdd238) at copyfrom.c:989
#12 0x5600fd0222e5 in DoCopy (pstate=0x5600febdc568, 
stmt=0x5600febb2d58, stmt_location=0, stmt_len=49,

    processed=0x7fff12ef7340) at copy.c:308
#13 0x5600fd25c5e9 in standard_ProcessUtility (pstmt=0x5600febb2e78,
    queryString=0x5600febb2178 "copy t from stdin with (format csv, 
default '\\D');", readOnlyTree=,
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, 
dest=0x5600febb3138, qc=0x7fff12ef7600)

    at utility.c:742
#14 0x5600fd25a9f1 in PortalRunUtility 
(portal=portal@entry=0x5600fec4ea48, pstmt=pstmt@entry=0x5600febb2e78,
    isTopLevel=isTopLevel@entry=true, 
setHoldSnapshot=setHoldSnapshot@entry=false, 
dest=dest@entry=0x5600febb3138,

    qc=qc@entry=0x7fff12ef7600) at pquery.c:1158
#15 0x5600fd25ab2d in PortalRunMulti 
(portal=portal@entry=0x5600fec4ea48, isTopLevel=isTopLevel@entry=true,
    setHoldSnapshot=setHoldSnapshot@entry=false, 
dest=dest@entry=0x5600febb3138,
    altdest=altdest@entry=0x5600febb3138, qc=qc@entry=0x7fff12ef7600) 
at pquery.c:1315
#16 0x5600fd25b1c1 in PortalRun 
(portal=portal@entry=0x5600fec4ea48, 
count=count@entry=9223372036854775807,
    isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, 
dest=dest@entry=0x5600febb3138,
    altdest=altdest@entry=0x5600febb3138, qc=0x7fff12ef7600) at 
pquery.c:791

#17 0x5600fd256f34 in exec_simple_query (
    query_string=0x5600febb2178 "copy t from stdin with (format csv, 
default '\\D');") at postgres.c:1240
#18 0x5600fd258ae7 in PostgresMain (dbname=, 
username=) at postgres.c:4572
#19 0x5600fd1c2d3f in BackendRun (port=0x5600febe05c0, 
port=0x5600febe05c0) at postmaster.c:4461

#20 BackendStartup (port=0x5600febe05c0) at postmaster.c:4189
#21 ServerLoop () at postmaster.c:1779
#22 0x5600fd1c3d63 in PostmasterMain (argc=argc@entry=3, 
argv=argv@entry=0x5600febad640) at postmaster.c:1463

#23 0x5600fced4fc6 in main (argc=3, argv=0x5600febad640) at main.c:200




Thanks for the test case. Will fix.


cheers


andrew


--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Add a hook to allow modification of the ldapbindpasswd

2023-03-15 Thread Andrew Dunstan


On 2023-01-23 Mo 14:11, Andrew Dunstan wrote:

On 2022-12-19 Mo 11:29, Andrew Dunstan wrote:

This patch, mostly the work of John Naylor, provides a hook whereby a
module can modify the ldapbindpasswd before it is handed to the ldap
server. This is similar in concept to the ssl_passphrase_callback
feature, and allows the user not to have to put the cleartext password
in the pg_hba.conf file. A trivial test is added which provides an
example of such a module.


Updated to take advantage of refactoring of ldap tests.




pushed.


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Speed-up shared buffers prewarming

2023-03-15 Thread Konstantin Knizhnik

Hi hackers,

It is a well-known fact that queries using a sequential scan cannot be used
to prewarm the cache, because they use a ring buffer even if shared buffers
are almost empty.
I have searched the hackers archive but failed to find any discussion about it.
What are the drawbacks of using free buffers even with the BAS_BULKREAD
strategy?
I mean the following trivial patch:

diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 6be80476db..243335d0e4 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -208,8 +208,15 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state)
 	/*
 	 * If given a strategy object, see whether it can select a buffer. We
 	 * assume strategy objects don't need buffer_strategy_lock.
 	 */
-	if (strategy != NULL)
+	if (strategy != NULL && StrategyControl->firstFreeBuffer < 0)
 	{
 		buf = GetBufferFromRing(strategy, buf_state);
 		if (buf != NULL)

So if there are free buffers, then use normal buffer allocation instead 
of GetBufferFromRing.


Right now it is necessary to use the pg_prewarm extension in order to
prewarm buffers.
But it is not convenient (you need to manually locate and prewarm all
indexes and TOAST relations) and not always possible
(a client may just not notice that the server was restarted).

One potential problem which I can imagine is sync scan: several seqscans of
the same table using the same pages from the ring buffer.
But synchronization of concurrent sync scans is achieved naturally: the
backend which moves first moves more slowly than the backends catching up,
which do not need to read anything from disk. It seems to me that if we
allow all shared buffers to be used instead of a small ring buffer, then
concurrent seqscans will have more chances to reuse cached pages. I have
performed multiple tests spawning multiple parallel seqscans after a
postgres restart and didn't observe any problems or degradation of
performance compared with master.


Also, the ring buffer is used not only for seqscans. There are several
places in Postgres core and in extensions (for example pgvector) where the
BAS_BULKREAD strategy is used for index scans as well.

Certainly the OS file cache should prevent redundant disk reads.
But it seems better in any case to use free memory inside the Postgres
process rather than rely on the OS cache and perform syscalls to copy data
from it.


Definitely it is possible that a seqscan limited by the ring buffer will
complete faster than a seqscan filling all shared buffers, especially if
shared buffers is large enough. The OS will need some extra time to commit
memory and maybe swap out other regions to find enough physical memory for
shared buffers. But if the data set fits in memory, then subsequent queries
will be much faster. And it is quite common for modern servers that the size
of shared buffers is comparable with the database size.

I would be pleased if you could point me at some drawbacks of such an
approach. Otherwise I can propose the patch for a commitfest.

Re: On login trigger: take three

2023-03-15 Thread Mikhail Gribkov
Hi Gregory,

Thanks for the note. The problem was that the patch was not aware of
Tom Lane's changes to the test from yesterday.
It's fixed now: the attached v39 patch contains the updated version along
with the freshest rebase on master branch.

--
 best regards,
Mikhail A. Gribkov

e-mail: youzh...@gmail.com
http://www.flickr.com/photos/youzhick/albums
http://www.strava.com/athletes/5085772
phone: +7(916)604-71-12
Telegram: @youzhick



On Wed, Mar 15, 2023 at 8:45 PM Gregory Stark (as CFM) 
wrote:

> It looks like the patch is failing a test by not dumping
> login_event_trigger_func? I'm guessing there's a race condition in the
> test but I don't know. I also don't see the tmp_test_jI6t output file
> being preserved in the artifacts anywhere :(
>
> https://cirrus-ci.com/task/6391161871400960?logs=test_world#L2671
>
>
> [16:16:48.594] # Looks like you failed 1 test of 10350.
>
>
> # Running: pg_dump --no-sync
>
> --file=/tmp/cirrus-ci-build/src/bin/pg_dump/tmp_check/tmp_test_jI6t/only_dump_measurement.sql
> --table-and-children=dump_test.measurement --lock-wait-timeout=18
> postgres
> [16:16:27.027](0.154s) ok 6765 - only_dump_measurement: pg_dump runs
> .
> [16:16:27.035](0.000s) not ok 6870 - only_dump_measurement: should
> dump CREATE FUNCTION dump_test.login_event_trigger_func
> [16:16:27.035](0.000s)
> [16:16:27.035](0.000s) #   Failed test 'only_dump_measurement: should
> dump CREATE FUNCTION dump_test.login_event_trigger_func'
> #   at t/002_pg_dump.pl line 4761.
> .
> [16:16:48.594] +++ tap check in src/bin/pg_dump +++
> [16:16:48.594] [16:16:05] t/001_basic.pl  ok 612 ms (
> 0.01 usr 0.00 sys + 0.24 cusr 0.26 csys = 0.51 CPU)
> [16:16:48.594]
> [16:16:48.594] # Failed test 'only_dump_measurement: should dump
> CREATE FUNCTION dump_test.login_event_trigger_func'
> [16:16:48.594] # at t/002_pg_dump.pl line 4761.
> [16:16:48.594] # Review only_dump_measurement results in
> /tmp/cirrus-ci-build/src/bin/pg_dump/tmp_check/tmp_test_jI6t
>


v39-On_client_login_event_trigger.patch
Description: Binary data


How to check for in-progress transactions

2023-03-15 Thread Tejasvi Kashi
Hi everyone,

I'm Tej, a grad student poking around postgres for a project.

For my use case, I'm trying to ascertain if there are any in-flight
transactions that are yet to be replicated to synchronous standbys (in a
synchronous streaming replication setting)

The first way to do this would be to check the WalSndCtl->lsn[] array to
see if the current max lsn of the system has replicated or not. This works
well when postgres is running and being actively used. However, when
postgres has just started up, WalSndCtl->lsn[] values could be 0, but there
could still be transactions waiting to replicate.
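
For concreteness, here is a hedged sketch of that first check (names follow
walsender_private.h/syncrep.c; the locking detail and return convention are
my assumptions):

/*
 * Returns true when everything flushed locally has been acknowledged by the
 * synchronous standbys.  Note the startup hole described above: lsn[] may
 * still be invalid right after startup even though unreplicated local
 * commits exist.
 */
static bool
local_commits_all_replicated(void)
{
	XLogRecPtr	local_flush = GetFlushRecPtr(NULL);
	XLogRecPtr	acked;

	LWLockAcquire(SyncRepLock, LW_SHARED);
	acked = WalSndCtl->lsn[SYNC_REP_WAIT_FLUSH];
	LWLockRelease(SyncRepLock);

	return !XLogRecPtrIsInvalid(acked) && acked >= local_flush;
}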

The second way to do it would be to scan ProcGlobal to check for active
xids. However, the issue is that I'm calling ProcArrayEndTransaction()
before calling SyncRepWaitForLSN() to ensure that the transaction becomes
visible to other transactions before it begins to wait in the SyncRep
queue.

So, with this change, if I scan ProcGlobal, I would not see transactions
that have been committed locally but are yet to be replicated to
synchronous standbys because ProcArrayEndTransaction() would have marked
the transaction as completed.

I've been looking at sent_lsn, write_lsn, flush_lsn etc., of the
walsender, but with no success. Considering the visibility change added
above, is there a way for me to check for transactions that have been
committed locally but are waiting for replication?

I would appreciate it if someone could point me in the right direction!

Sincerely,

Tej Kashi
MMath CS, University of Waterloo
Waterloo, ON, CA


Re: Commitfest 2023-03 starting tomorrow!

2023-03-15 Thread Gregory Stark (as CFM)
So, sorry I've been a bit under the weather but can concentrate on the
commitfest now. I tried to recapitulate the history but the activity
log only goes back a certain distance on the web. If I can log in to
the database I guess I could construct the history from SQL queries if
it was important.

But where we stand now is:

Status summary:
  Needs review: 152.
  Waiting on Author: 42.
  Ready for Committer: 39.
  Committed: 61.
  Moved to next CF: 4.
  Withdrawn: 17.
  Returned with Feedback: 4.
Total: 319.

Of the Needs Review patches there are 81 that have received no email
messages since the CF began. A lot of those have reviews attached but
presumably those reviewers are mostly from earlier CFs and may have
already contributed all they can.

I don't know, should we move some or all of these to the next CF
already? I'm reluctant to bounce them en masse because there are
definitely some patches that will get reviewed and some that should
really be marked "Ready for Committer".

These patches that are "Needs Review" and have received no comments at
all since before March 1st are these. If your patch is amongst this
list I would suggest any of:

1) Move it yourself to the next CF (or withdraw it)
2) Post to the list with any pending questions asking for specific
feedback -- it's much more likely to get feedback than just a generic
"here's a patch plz review"...
3) Mark it Ready for Committer and possibly post explaining the
resolution to any earlier questions to make it easier for a committer
to understand the state

If there's still no emails on these at some point I suppose it will
make sense to move them out of the CF.

 * ALTER TABLE SET ACCESS METHOD on partitioned tables
 * New hooks in the connection path
 * Add log messages when replication slots become active and inactive
 * Avoid use deprecated Windows Memory API
 * Remove dead macro exec_subplan_get_plan
 * Consider parallel for LATERAL subqueries having LIMIT/OFFSET
 * pg_rewind WAL deletion pitfall
 * Simplify find_my_exec by using realpath(3)
 * Move backup-related code to xlogbackup.c/.h
 * Avoid hiding shared filesets in pg_ls_tmpdir (pg_ls_* functions for
showing metadata ...)
 * Fix bogus error emitted by pg_recvlogical when interrupted
 * warn if GUC set to an invalid shared library
 * Check consistency of GUC defaults between .sample.conf and
pg_settings.boot_val
 * Code checks for App Devs, using new options for transaction behavior
 * Lockless queue of waiters based on atomic operations for LWLock
 * Fix assertion failure with next_phase_at in snapbuild.c
 * Add SPLIT PARTITION/MERGE PARTITIONS commands
 * Add sortsupport for range types and btree_gist
 * asynchronous execution support for Custom Scan
 * Periodic burst growth of the checkpoint_req counter on replica.
 * CREATE INDEX CONCURRENTLY on partitioned table
 * Fix ParamPathInfo for union-all AppendPath
 * Add OR REPLACE option for CREATE OPERATOR
 * ALTER TABLE and CLUSTER fail to use a BulkInsertState for toast tables
 * Partial aggregates push down
 * Non-replayable WAL records through overflows and >MaxAllocSize lengths
 * Enable jitlink as an alternative jit linker of legacy Rtdyld and
add riscv jitting support
 * Test for function error in logrep worker
 * basebackup: support zstd long distance matching
 * pgbench - adding pl/pgsql versions of tests
 * Function to log backtrace of postgres processes
 * More scalable multixacts buffers and locking
 * Remove nonmeaningful prefixes in PgStat_* fields
 * COPY FROM enable FORCE_NULL/FORCE_NOT_NULL on all columns
 * postgres_fdw: commit remote (sub)transactions in parallel during pre-commit
 * Add semi-join pushdown to postgres_fdw
 * Skip replicating the tables specified in except table option
 * Split index and table statistics into different types of stats
 * Exclusion constraints on partitioned tables
 * Post-special Page Storage TDE support
 * Direct I/O (developer-only feature)
 * Improve doc for autovacuum on partitioned tables
 * Patch to implement missing join selectivity estimation for range types
 * Clarify the behavior of the system when approaching XID wraparound
 * Set arbitrary GUC options during initdb
 * An attempt to avoid
locally-committed-but-not-replicated-to-standby-transactions in
synchronous replication
 * Check lateral references within PHVs for memoize cache keys
 * Add n_tup_newpage_upd to pg_stat table views
 * monitoring usage count distribution
 * Reduce wakeup on idle for bgwriter & walwriter for >5s
 * Report the query string that caused a memory error under Valgrind
 * New [relation] options engine
 * Data is copied twice when specifying both child and parent table in
publication
 * possibility to take name, signature and oid of currently executed
function in GET DIAGNOSTICS statement
 * Named Operators
 * nbtree performance improvements through specialization on key shape
 * Fix assertion failure in SnapBuildInitialSnapshot()
 * Speed up releasing of locks
 * Compression dictionaries
 * 

Re: Allow logical replication to copy tables in binary format

2023-03-15 Thread Melih Mutlu
Hi,

On Wed, 15 Mar 2023 at 18:12, vignesh C wrote:

> One comment:
> 1) There is a chance the result order of the select may vary, since
> "ORDER BY" is not specified. Should we add "ORDER BY", as the table
> has multiple rows:
> +# Check the synced data on the subscriber
> +$result = $node_subscriber->safe_psql('postgres', 'SELECT a,b FROM
> +public.test_col_order;');
> +
> +is( $result, '1|2
> +3|4', 'check synced data on subscriber for different column order');
>

Right, it needs to be ordered. Fixed.

Thanks,
-- 
Melih Mutlu
Microsoft


v15-0001-Allow-logical-replication-to-copy-table-in-binary.patch
Description: Binary data


Re: On login trigger: take three

2023-03-15 Thread Gregory Stark (as CFM)
It looks like the patch is failing a test by not dumping
login_event_trigger_func? I'm guessing there's a race condition in the
test but I don't know. I also don't see the tmp_test_jI6t output file
being preserved in the artifacts anywhere :(

https://cirrus-ci.com/task/6391161871400960?logs=test_world#L2671


[16:16:48.594] # Looks like you failed 1 test of 10350.


# Running: pg_dump --no-sync
--file=/tmp/cirrus-ci-build/src/bin/pg_dump/tmp_check/tmp_test_jI6t/only_dump_measurement.sql
--table-and-children=dump_test.measurement --lock-wait-timeout=18
postgres
[16:16:27.027](0.154s) ok 6765 - only_dump_measurement: pg_dump runs
.
[16:16:27.035](0.000s) not ok 6870 - only_dump_measurement: should
dump CREATE FUNCTION dump_test.login_event_trigger_func
[16:16:27.035](0.000s)
[16:16:27.035](0.000s) #   Failed test 'only_dump_measurement: should
dump CREATE FUNCTION dump_test.login_event_trigger_func'
#   at t/002_pg_dump.pl line 4761.
.
[16:16:48.594] +++ tap check in src/bin/pg_dump +++
[16:16:48.594] [16:16:05] t/001_basic.pl  ok 612 ms (
0.01 usr 0.00 sys + 0.24 cusr 0.26 csys = 0.51 CPU)
[16:16:48.594]
[16:16:48.594] # Failed test 'only_dump_measurement: should dump
CREATE FUNCTION dump_test.login_event_trigger_func'
[16:16:48.594] # at t/002_pg_dump.pl line 4761.
[16:16:48.594] # Review only_dump_measurement results in
/tmp/cirrus-ci-build/src/bin/pg_dump/tmp_check/tmp_test_jI6t




Initial Schema Sync for Logical Replication

2023-03-15 Thread Kumar, Sachin
Hi Everyone,

I am working on the initial schema sync for logical replication. Currently,
users have to manually create the schema on the subscriber side. The aim of
this feature is to add an option to CREATE SUBSCRIPTION so that the schema
sync can be automatic. I am sharing the design doc below, but there are some
corner cases where the design does not work. Please share your opinion on
whether the design can be improved so we can get rid of the corner cases.
This design is loosely based on pglogical.
DDL replication is required for this feature.
(https://www.postgresql.org/message-id/flat/CAAD30U%2BpVmfKwUKy8cbZOnUXyguJ-uBNejwD75Kyo%3DOjdQGJ9g%40mail.gmail.com)

SQL Changes:
CREATE SUBSCRIPTION subscription_name
CONNECTION 'conninfo'
PUBLICATION publication_name [, ...]
[ WITH ( subscription_parameter [= value] [, ... ] ) ]

sync_initial_schema (enum) will be added as a subscription_parameter.
It can have 3 values: TABLES, ALL, NONE (default).
With ALL, everything will be synced, including global objects.

Restriction: sync_initial_schema=ALL can only be used for publications with
FOR ALL TABLES.

Design:

Publisher:
The publisher has to implement `SHOW CREATE TABLE_NAME`; this table
definition will be used by the subscriber to create the exact schema of the
table on the subscriber. One alternative can be doing it on the subscriber
side itself: we can create a function similar to describeOneTableDetails and
call it on the subscriber. We also need to maintain the same ownership as on
the publisher.

The publisher should also have publication of DDL commands turned on.

Subscriber:

1. In CreateSubscription(), when we create the replication slot
(walrcv_create_slot()), we should use CRS_EXPORT_SNAPSHOT so that we can use
this snapshot later in pg_dump (a rough sketch follows).
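
A hedged sketch of that call, based on the walrcv_create_slot() API as it
exists today (the exact arguments, and "opts"/"wrconn", are assumptions):

	XLogRecPtr	lsn;
	char	   *snapshot_name;

	/* permanent slot; export its snapshot for the later pg_dump run */
	snapshot_name = walrcv_create_slot(wrconn, opts.slot_name,
									   false,	/* not temporary */
									   opts.twophase,
									   CRS_EXPORT_SNAPSHOT,
									   &lsn);
	/* snapshot_name is then passed to pg_dump --snapshot=... */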

2. Now we can call pg_dump with the above snapshot from CreateSubscription.
This happens inside the opts.connect && opts.create_slot if statement. If we
fail in this step, we have to drop the replication slot and create a new one,
because we need a snapshot and creating a replication slot is a way to get
one. The reason for running pg_dump with the above snapshot is that we don't
want to execute the DDLs in the WAL twice. With the above snapshot we get a
state of the database from before the replication slot origin, and any
changes after the snapshot will be in the WAL.

We will save the pg_dump output into a file (custom archive format). So the
pg_dump invocation will be similar to

pg_dump --connection_string --schema-only --snapshot=xyz -Fc --file initSchema

If sync_initial_schema=TABLES we don't have to call pg_dump/restore at all;
the TableSync process will take care of it.

3. If we have to sync global objects, we also need to call pg_dumpall
--globals-only. But pg_dumpall does not support the --snapshot option, so if
a user creates a new global object between the creation of the replication
slot and the pg_dumpall run, that global object will be created twice on the
subscriber, which will error out the applier process.

4. walrcv_disconnect should be called after pg_dump is finished; otherwise
the snapshot will not be valid.

5. Users with the replication role cannot call pg_dump, so the replication
user has to be a superuser. This is a major problem.
postgres=# create role s4 WITH LOGIN Replication;
CREATE ROLE
╭─sachin@DUB-1800550165 ~
╰─$ pg_dump postgres -s -U s4   
1 ↵
pg_dump: error: query failed: ERROR:  permission denied for table t1
pg_dump: detail: Query was: LOCK TABLE public.t1, public.t2 IN ACCESS SHARE MODE

6. The pg_subscription_rel table column srsubstate will have one more state,
SUBREL_STATE_CREATE 'c'. If sync_initial_schema is enabled, we will set the
table state to 'c'.
The above 6 steps will be done even if the subscription is not enabled, as
long as connect is true.

7. The leader applier process should check if the initSchema file exists; if
so, it should call pg_restore. We are not using the --pre-data and
--post-data sections the way pglogical does, because post-data works on
tables that already have data, but we will fill the data into the tables at
later stages. pg_restore can be called like this:

pg_restore --connection_string -1 file_name

The -1 option will execute every command inside one transaction. If there is
any error, everything will be rolled back.
pg_restore should be called quite early in the applier process code, before
any tablesync process can be created.
Instead of checking if the file exists, maybe the pg_subscription table can
be extended with a column SyncInitialSchema, and the applier process will
check SyncInitialSchema == SYNC_PENDING.

8. The TableSync process should check the state of the table; if it is
SUBREL_STATE_CREATE, it should get the latest definition from the publisher
and recreate the table. (We have to recreate the table even if there are no
changes.) Then it should go into copy-table mode as usual.

It might seem that TableSync is doing duplicate work already done by 
pg_restore. We are doing
it in this way because of concurrent DDLs and refresh publication 

Re: Add support for DEFAULT specification in COPY FROM

2023-03-15 Thread Alexander Lakhin

Hello,
13.03.2023 17:15, Andrew Dunstan wrote:


On 2022-12-02 Fr 09:11, Israel Barth Rubio wrote:

Hello all,

I'm submitting a new version of the patch. Instead of changing signature
of several functions in order to use the defaults parameter, it is now storing
that in the cstate structure, which is already passed to all functions that
were previously modified.



Thanks, committed.


Please look at the query:
create table t (f1 int);
copy t from stdin with (format csv, default '\D');
1,\D

that invokes an assertion failure after 9f8377f7a:
Core was generated by `postgres: law regression [local] 
COPY '.

Program terminated with signal SIGABRT, Aborted.

warning: Section `.reg-xstate/3253881' in core file too small.
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140665061189440) 
at ./nptl/pthread_kill.c:44

44  ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140665061189440) 
at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140665061189440) at 
./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140665061189440, signo=signo@entry=6) at 
./nptl/pthread_kill.c:89
#3  0x7fef2250e476 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/posix/raise.c:26

#4  0x7fef224f47f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x5600fd395750 in ExceptionalCondition (
    conditionName=conditionName@entry=0x5600fd3fa751 "n >= 0 && n < 
list->length",
    fileName=fileName@entry=0x5600fd416db8 
"../../../src/include/nodes/pg_list.h", lineNumber=lineNumber@entry=280)

    at assert.c:66
#6  0x5600fd02626d in list_nth_cell (n=, list=)
    at ../../../src/include/nodes/pg_list.h:280
#7  list_nth_int (n=, list=) at 
../../../src/include/nodes/pg_list.h:313

#8  CopyReadAttributesCSV (cstate=) at copyfromparse.c:1905
#9  0x5600fd0265a5 in NextCopyFromRawFields (cstate=0x5600febdd238, 
fields=0x7fff12ef7130, nfields=0x7fff12ef712c)

    at copyfromparse.c:833
#10 0x5600fd0267f9 in NextCopyFrom (cstate=cstate@entry=0x5600febdd238, 
econtext=econtext@entry=0x5600fec9c5c8,

    values=0x5600febdd5c8, nulls=0x5600febdd5d0) at copyfromparse.c:885
#11 0x5600fd0234db in CopyFrom (cstate=cstate@entry=0x5600febdd238) at 
copyfrom.c:989
#12 0x5600fd0222e5 in DoCopy (pstate=0x5600febdc568, stmt=0x5600febb2d58, 
stmt_location=0, stmt_len=49,

    processed=0x7fff12ef7340) at copy.c:308
#13 0x5600fd25c5e9 in standard_ProcessUtility (pstmt=0x5600febb2e78,
    queryString=0x5600febb2178 "copy t from stdin with (format csv, default 
'\\D');", readOnlyTree=,
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, 
dest=0x5600febb3138, qc=0x7fff12ef7600)

    at utility.c:742
#14 0x5600fd25a9f1 in PortalRunUtility (portal=portal@entry=0x5600fec4ea48, 
pstmt=pstmt@entry=0x5600febb2e78,
    isTopLevel=isTopLevel@entry=true, 
setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x5600febb3138,

    qc=qc@entry=0x7fff12ef7600) at pquery.c:1158
#15 0x5600fd25ab2d in PortalRunMulti (portal=portal@entry=0x5600fec4ea48, 
isTopLevel=isTopLevel@entry=true,

    setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x5600febb3138,
    altdest=altdest@entry=0x5600febb3138, qc=qc@entry=0x7fff12ef7600) at 
pquery.c:1315
#16 0x5600fd25b1c1 in PortalRun (portal=portal@entry=0x5600fec4ea48, 
count=count@entry=9223372036854775807,
    isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, 
dest=dest@entry=0x5600febb3138,

    altdest=altdest@entry=0x5600febb3138, qc=0x7fff12ef7600) at pquery.c:791
#17 0x5600fd256f34 in exec_simple_query (
    query_string=0x5600febb2178 "copy t from stdin with (format csv, default 
'\\D');") at postgres.c:1240
#18 0x5600fd258ae7 in PostgresMain (dbname=, 
username=) at postgres.c:4572
#19 0x5600fd1c2d3f in BackendRun (port=0x5600febe05c0, port=0x5600febe05c0) 
at postmaster.c:4461

#20 BackendStartup (port=0x5600febe05c0) at postmaster.c:4189
#21 ServerLoop () at postmaster.c:1779
#22 0x5600fd1c3d63 in PostmasterMain (argc=argc@entry=3, 
argv=argv@entry=0x5600febad640) at postmaster.c:1463

#23 0x5600fced4fc6 in main (argc=3, argv=0x5600febad640) at main.c:200

Best regards,
Alexander

Re: optimize several list functions with SIMD intrinsics

2023-03-15 Thread Ankit Kumar Pandey



On 15/03/23 21:53, Nathan Bossart wrote:


Did you try building without SIMD support?  This is what I see:



list.c: In function ‘list_member_ptr’:
list.c:697:2: warning: ISO C90 forbids mixed declarations and code 
[-Wdeclaration-after-statement]
  697 |  const ListCell *cell;
  |  ^



If your build doesn't have USE_NO_SIMD defined, this warning won't appear
because the code in question will be compiled out.


My mistake, I tried with USE_NO_SIMD defined and it showed the warning. Sorry 
for the noise.

Regards,
Ankit





Re: optimize several list functions with SIMD intrinsics

2023-03-15 Thread Nathan Bossart
On Wed, Mar 15, 2023 at 07:31:46PM +0530, Ankit Kumar Pandey wrote:
>> On 14/03/23 03:10, Nathan Bossart wrote:
>> On Sat, Mar 11, 2023 at 09:41:18AM +, Ankit Kumar Pandey wrote:
>> > 1. In list_member_ptr, will it be okay to bring `const ListCell
>> > *cell` from #ifdef USE_NO_SIMD
>> >const ListCell *cell;
>> > #endif
>> > to #else as mentioned below? This will make the visual separation between 
>> > #if cases cleaner
>> I would expect to see -Wdeclaration-after-statement warnings if we did
>> this.
> 
> This worked fine for me, no warnings on gcc 12.2.0. Not a concern though.

Did you try building without SIMD support?  This is what I see:

list.c: In function ‘list_member_ptr’:
list.c:697:2: warning: ISO C90 forbids mixed declarations and code 
[-Wdeclaration-after-statement]
  697 |  const ListCell *cell;
  |  ^

If your build doesn't have USE_NO_SIMD defined, this warning won't appear
because the code in question will be compiled out.
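
To make that concrete, the function is shaped like this (a simplified
sketch, not the actual list.c source; simd_search_ptr is a placeholder,
not a real function):

bool
list_member_ptr(const List *list, const void *datum)
{
#ifdef USE_NO_SIMD
    const ListCell *cell;       /* declared before any statement: C90-safe */
#endif

    Assert(IsPointerList(list));
    check_list_invariants(list);

#ifdef USE_NO_SIMD
    /*
     * Moving the declaration of "cell" down here, as proposed, would put a
     * declaration after the statements above -- fine in C99, but it trips
     * -Wdeclaration-after-statement in builds that compile this branch.
     */
    foreach(cell, list)
    {
        if (lfirst(cell) == datum)
            return true;
    }
    return false;
#else
    return simd_search_ptr(list, datum);
#endif
}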

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: pkg-config Requires.private entries should be comma-separated

2023-03-15 Thread Andres Freund
Hi,

On 2023-03-15 08:51:04 +0100, Peter Eisentraut wrote:
> While comparing the .pc (pkg-config) files generated by the make and meson
> builds, I noticed that the Requires.private entries use different
> delimiters.  The make build uses spaces, the meson build uses commas. The
> pkg-config documentation says that it should be comma-separated, but
> apparently about half the .pc in the wild use just spaces.
> 
> The pkg-config source code acknowledges that both commas and spaces work:
> 
> https://github.com/freedesktop/pkg-config/blob/master/parse.c#L273
> https://github.com/pkgconf/pkgconf/blob/master/libpkgconf/dependency.c#L286
> 
> I think for consistency we should change the make build to use commas
> anyway.  See attached patch.

Makes sense.

Greetings,

Andres Freund




Re: CI and test improvements

2023-03-15 Thread Peter Eisentraut

On 15.03.23 15:56, Justin Pryzby wrote:

I'm surprised if there's any question about the merits of making
documentation easily available for review.  Several people have agreed;
one person mailed me privately specifically to ask how to show HTML docs
on cirrusci.

Anyway, all this stuff is best addressed either before or after the CF.
I'll kick the patch forward.  Thanks for looking.


I suppose this depends on what you want to use this for.  If your use is 
to prepare and lay out as much information as possible about a patch for 
a reviewer, some of your ideas make sense.


I'm using this primarily to quickly test local work in progress.  So I 
want a quick feedback cycle.  I don't need it to show me which HTML docs 
changed, for example.


So maybe there need to be different modes.





Re: Allow logical replication to copy tables in binary format

2023-03-15 Thread vignesh C
On Wed, 15 Mar 2023 at 15:30, Melih Mutlu  wrote:
>
> Hi,
>
> Please see the attached patch.

One comment:
1) The result order of the SELECT may vary, as "ORDER BY" is not
specified. Should we add "ORDER BY", since the table has multiple
rows:
+# Check the synced data on the subscriber
+$result = $node_subscriber->safe_psql('postgres', 'SELECT a,b FROM
+public.test_col_order;');
+
+is( $result, '1|2
+3|4', 'check synced data on subscriber for different column order');

Regards,
Vignesh




Re: CI and test improvements

2023-03-15 Thread Justin Pryzby
On Wed, Mar 15, 2023 at 10:58:41AM +0100, Peter Eisentraut wrote:
> On 14.03.23 05:56, Justin Pryzby wrote:
> > I'm soliticing feedback on those patches that I've sent recently - I've
> > elided patches if they have some unresolved issue.
> 
> > [PATCH 1/8] cirrus/windows: add compiler_warnings_script
> 
> Needs a better description of what it actually does.  (And fewer "I'm not
> sure how to write this ..." comments ;-) )  It looks like it would fail the
> build if there is a compiler warning in the Windows VS task? Shouldn't that
> be done in the CompilerWarnings task?

The goal is to fail due to warnings only after running tests.

https://www.postgresql.org/message-id/20220212212310.f645c6vw3njkgxka%40alap3.anarazel.de
"Probably worth scripting something to make the windows task error out
if there had been warnings, but only after running the tests."

CompilerWarnings runs in a Linux environment with -Werror.
This patch scrapes warnings out of MSVC, since (at least historically)
it's too slow to run a separate Windows VM just to compile with -Werror.

> Also, I see a bunch of warnings in the current output from that task. These
> should be cleaned up in any case before we can let a thing like this loose.

Yeah (and I mentioned those myself).  As it stands, my patch also
"breaks" everytime someone's else's patch introduces warnings.  I
included links demonstrating its failures.

I agree that it's not okay to merge the patch when it's currently
failing, but I cannot dig into that other issue right now.

> > [PATCH 6/8] cirrus: code coverage
> 
> This adds -Db_coverage=true to the FreeBSD task.  This has a significant
> impact on the build time.  (+50% at least, it appears.)

Yes - but with the CPUs added by the prior patch, the freebsd task is
faster than it is currently.  And its 8min runtime would match the other
tasks well.

> I'm not sure the approach here makes sense.  For example, if you add a new
> test, the set of changed files is just that test.  So you won't get any
> report on what coverage change the test has caused.

The coverage report that I proposed clearly doesn't handle that case -
it's not intended to.

A full coverage report is somewhat slow to generate, probably
unreasonable to upload for every patch, every day, and not very
interesting since it's at least 99% duplicative.  The goal is to show a
coverage report for new code for every patch.  What fraction of the time
do you think the patch author, reviewer, or committer has looked at a
coverage report?  It's not a question of whether it's possible to do so
locally, but of whether it's actually done.

> Also, I don't think I trust the numbers from the meson coverage stuff yet.
> See for example
> .

I'm not using the meson coverage target.  I could instead add
CFLAGS=--coverage.  Anyway, getting a scalar value like "83%" might be
interesting to show in cfbot, but it's not the main goal here.

> > [PATCH 7/8] cirrus: upload changed html docs as artifacts
> > [PATCH 8/8] +html index file
> 
> This builds the docs twice and then analyzes the differences between the two
> builds.  This also affects the build times quite significantly.

The main goal is to upload the changed docs.

> People who want to look at the docs can build them locally.

This makes the docs for every patch available for reviewers, without
needing a build environment.  An attainable goal would be for the documentation
in every patch to be reviewed by a native English speaker.  Right now that's
not consistently true.

> How useful is this actually?

I'm surprised if there's any question about the merits of making
documentation easily available for review.  Several people have agreed;
one person mailed me privately specifically to ask how to show HTML docs
on cirrusci.

Anyway, all this stuff is best addressed either before or after the CF.
I'll kick the patch forward.  Thanks for looking.

-- 
Justin




Re: Making background psql nicer to use in tap tests

2023-03-15 Thread Andrew Dunstan


On 2023-01-30 Mo 19:00, Andres Freund wrote:

Hi,

On 2023-01-30 15:06:46 -0500, Tom Lane wrote:

Andres Freund  writes:

It's annoyingly hard to wait for the result of a query in a generic way with
background_psql(), and more generally for psql. background_psql() uses -XAtq,
which means that we'll not get "status" output (like "BEGIN" or "(1 row)"),
and that queries not returning anything are completely invisible.

Yeah, the empty-query-result problem was giving me fits recently.
+1 for wrapping this into something more convenient to use.

I've hacked some on this. I first tried to just introduce a few helper
functions in Cluster.pm, but that ended up being awkward. So I bit the bullet
and introduced a new class (in BackgroundPsql.pm), and made background_psql()
and interactive_psql() return an instance of it.

This is just a rough prototype. Several function names don't seem great, it
needs POD documentation, etc.



Since this class is only intended to have instances created from 
Cluster, I would be inclined just to put it at the end of Cluster.pm 
instead of creating a new file. That makes it clearer that the new 
package is not standalone. We already have instances of that.


The first param of the constructor is a bit opaque. If it were going to 
be called from elsewhere I'd want something a bit more obvious, but I 
guess we can live with it here. An alternative might be multiple 
constructors (e.g. new_background, new_interactive) which share a 
common private routine.


Don't have comments yet on the other things, will continue looking.


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: optimize several list functions with SIMD intrinsics

2023-03-15 Thread Ankit Kumar Pandey

Agree with your points, Nathan. Just a heads-up.


On 14/03/23 03:10, Nathan Bossart wrote:
On Sat, Mar 11, 2023 at 09:41:18AM +, Ankit Kumar Pandey wrote:
1. In list_member_ptr, will it be okay to bring `const ListCell *cell` from 
#ifdef USE_NO_SIMD

const ListCell *cell;
#endif
to #else as mentioned below? This will make the visual separation between #if 
cases cleaner

I would expect to see -Wdeclaration-after-statement warnings if we did
this.


This worked fine for me, no warnings on gcc 12.2.0. Not a concern though.

Thanks,

Ankit





Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED

2023-03-15 Thread Alexander Lakhin

Hi,
15.03.2023 11:43, Thomas Munro wrote:

On Wed, Mar 15, 2023 at 9:00 PM Alexander Lakhin  wrote:

The result depends on some OS conditions (it reproduced pretty well
immediately after VM reboot), but it's enough to test the patch proposed.
And I can confirm that the Assert is not observed anymore (with the sleep
added after CloseHandle(childinfo->procHandle)).

Thanks for confirming.  Pushed earlier today.

Do you know how it fails in non-assert builds, without the fix?


I've replaced the Assert with 'if (!...) elog(...)' and got (with a non-assert 
build):

t/099_check-pids.pl .. ok
All tests successful.
Files=1, Tests=1, 67 wallclock secs ( 0.03 usr +  0.00 sys =  0.03 CPU)
Result: PASS
2023-03-15 12:22:46.923 GMT|postgres|postgres|4484|6411b896.1184|LOG: 
!(PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED)
2023-03-15 12:22:47.806 GMT|postgres|postgres|4180|6411b897.1054|LOG: 
!(PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED)
2023-03-15 12:23:06.313 GMT|postgres|postgres|4116|6411b8aa.1014|LOG: 
!(PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED)
2023-03-15 12:23:06.374 GMT|postgres|postgres|4740|6411b8aa.1284|LOG: 
!(PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED)
2023-03-15 12:23:25.967 GMT|postgres|postgres|6812|6411b8bd.1a9c|LOG: 
!(PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED)


So at least with my test script, that doesn't lead to a crash or anything worse.
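
(For reference, the substitution described above amounts to something like
this sketch; the real check lives in src/backend/storage/ipc/pmsignal.c:)

    /* was: Assert(PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED); */
    if (PMSignalState->PMChildFlags[slot] != PM_CHILD_ASSIGNED)
        elog(LOG, "!(PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED)");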

Best regards,
Alexander




Re: logical decoding and replication of sequences, take 2

2023-03-15 Thread Tomas Vondra



On 3/10/23 11:03, John Naylor wrote:
> 
> On Wed, Mar 1, 2023 at 1:02 AM Tomas Vondra wrote:
>> here's a rebased patch to make cfbot happy, dropping the first part that
>> is now unnecessary thanks to 7fe1aa991b.
> 
> Hi Tomas,
> 
> I'm looking into doing some "in situ" testing, but for now I'll mention
> some minor nits I found:
> 
> 0001
> 
> + * so we simply do a lookup (the sequence is identified by relfilende). If
> 
> relfilenode? Or should it be called a relfilelocator, which is the
> parameter type? I see some other references to relfilenode in comments
> and commit message, and I'm not sure which need to be updated.
> 

Yeah, that's a leftover from the original patch, before the relfilenode
was renamed to relfilelocator.

> + /* XXX Maybe check that we're still in the same top-level xact? */
> 
> Any ideas on what should happen here?
> 

I don't recall why I added this comment, but I don't think there's
anything we need to do (so drop the comment).

> + /* XXX how could we have sequence change without data? */
> + if(!datalen || !tupledata)
> + elog(ERROR, "sequence decode missing tuple data");
> 
> Since the ERROR is new based on feedback, we can get rid of XXX I think.
> 
> More generally, I associate XXX comments with problems or
> unpleasantness in the code that don't quite rise to the level of FIXME,
> but are perhaps more serious than "NB:", "Note:", or "Important:"
> 

Understood. I keep adding XXX in places where I have some open
questions, or something that may need to be improved (so kinda less
serious than a FIXME).

> + * When we're called via the SQL SRF there's already a transaction
> 
> I see this was copied from existing code, but I found it confusing --
> does this function have a stable name?
> 

What do you mean by "stable name"? It certainly is not exposed as a
user-callable SQL function, so I think this comment is misleading and
should be removed.

> + /* Only ever called from ReorderBufferApplySequence, so transational. */
> 
> Typo: transactional
> 
> 0002
> 
> I see a few SERIAL types in the tests but no GENERATED ... AS IDENTITY
> -- not sure if it matters, but seems good for completeness.
> 

That's a good point. Adding tests for GENERATED ... AS IDENTITY is a
good idea.

> Reminder for later: Patches 0002 and 0003 still refer to 0da92dc530,
> which is a reverted commit -- I assume it intends to refer to the
> content of 0001?
> 

Correct. That needs to be adjusted at commit time.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Documentation for building with meson

2023-03-15 Thread Peter Eisentraut

> [PATCH v8 1/5] Make minor additions and corrections to meson docs

The last hunk revealed that there is some mixing-up of meson setup 
and meson configure, and it goes beyond that one hunk.  For example, earlier 
the text says that to get a list of meson setup options, one should call 
meson configure --help and look at 
https://mesonbuild.com/Commands.html#configure, which are both wrong.  Later, 
too, the text uses one or the other inconsistently.  I think this has the 
potential to be very confusing, and we should clean this up carefully.


The text about additional meson test options should maybe go into the 
regress.sgml chapter?



> [PATCH v8 2/5] Add data layout options sub-section in installation
 docs

This makes sense.  Please double-check your patch for correct title 
casing and vertical spacing of the XML, to keep everything looking consistent.


This text isn't yours, but since your patch emphasizes it, I wonder if 
it could use some clarification:


+ These options affect how PostgreSQL lays out data on disk.
+ Note that changing these breaks on-disk database compatibility,
+ meaning you cannot use pg_upgrade to upgrade to
+ a build with different values of these options.

This isn't really correct.  What breaking on-disk compatibility means is 
that you can't use a server compiled one way with a data directory 
initialized by binaries compiled another way.  pg_upgrade may well have 
the ability to upgrade between one or the other; that's up to pg_upgrade 
to figure out but not an intrinsic property.  (I wonder why pg_upgrade 
cares about the WAL block size.)



> [PATCH v8 3/5] Remove Anti-Features section from Installation from
 source docs

Makes sense.  But is "--disable-thread-safety" really a developer 
feature?  I think not.



> [PATCH v8 4/5] Re-organize Miscellaneous section

This moves the Miscellaneous section after Developer Features.  I think 
Developer Features should be last.


Maybe should remove this section and add the options to the regular 
PostgreSQL Features section.


Also consider the grouping in meson_options.txt, which is slightly 
different yet.



> [PATCH v8 5/5] Change Short Version for meson installation guide

+# create working directory
+mkdir postgres
+cd postgres
+
+# fetch source code
+git clone https://git.postgresql.org/git/postgresql.git src

This comes after the "Getting the Source" section, so at this point they 
already have the source and don't need to do "git clone" etc. again.


+# setup and enter build directory (done only first time)
+## Unix based platforms
+meson setup build src --prefix=$PWD/install
+
+## Windows
+meson setup build src --prefix=%cd%/install

Maybe some people work this way, but to me the directory structures you 
create here are completely weird.


+# Initialize a new database
+../install/bin/initdb -D ../data
+
+# Start database
+../install/bin/pg_ctl -D ../data/ -l logfile start
+
+# Connect to the database
+../install/bin/psql -d postgres

The terminology here needs to be tightened up.  You are using "database" 
here to mean three different things.






Re: Fix fseek() detection of unseekable files on WIN32

2023-03-15 Thread Juan José Santamaría Flecha
On Wed, Mar 15, 2023 at 5:57 AM Michael Paquier  wrote:

> On Tue, Mar 14, 2023 at 01:26:27PM +0100, Juan José Santamaría Flecha
> wrote:
> > As highlighted in [1] fseek() might fail to error even when accessing
> > unseekable streams.
> >
> > PFA a patch that checks the file type before the actual fseek(), so only
> > supported calls are made.
>
> + * streams, so harden that funcion with our version.
> s/funcion/function/.
>

Done.

+extern int pgfseek64(FILE *stream, pgoff_t offset, int origin);
> +extern pgoff_t pgftell64(FILE *stream);
> +#define fseeko(stream, offset, origin) pgfseek64(stream, offset, origin)
> +#define ftello(stream) pgftell64(stream)
>
> What about naming the internal wrappers _pgfseeko64() and
> _pgftello64(), located in a new file named win32fseek.c?  It may be
> possible that we would need a similar treatment for fseek(), in the
> future, though I don't see an issue why this would be needed now.
>

Done.


> +   if (GetFileType((HANDLE) _get_osfhandle(_fileno(stream))) !=
> FILE_TYPE_DISK)
> +   {
> +   errno = ESPIPE;
> +   return -1;
> +   }
> Shouldn't there be cases where we should return EINVAL for some of the
> other types, like FILE_TYPE_REMOTE or FILE_TYPE_UNKNOWN?  We should
> return ESPIPE only for FILE_TYPE_PIPE and FILE_TYPE_CHAR, then?
>

Done.
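
(For readers following along, the resulting check is shaped roughly like
this sketch -- the names follow the suggestions upthread, but the attached
patch is authoritative:)

static int
_pgfseeko64(FILE *stream, pgoff_t offset, int origin)
{
    DWORD       ftype = GetFileType((HANDLE) _get_osfhandle(_fileno(stream)));

    switch (ftype)
    {
        case FILE_TYPE_DISK:
            return _fseeki64(stream, offset, origin);
        case FILE_TYPE_CHAR:
        case FILE_TYPE_PIPE:
            errno = ESPIPE;     /* unseekable stream */
            return -1;
        default:                /* FILE_TYPE_REMOTE, FILE_TYPE_UNKNOWN */
            errno = EINVAL;
            return -1;
    }
}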

PFA a new version of the patch.

Regards,

Juan José Santamaría Flecha


v2-0001-fix-fseek-detection-of-unseekable-files-for-WIN32.patch
Description: Binary data


Re: postgres_fdw: Useless if-test in GetConnection()

2023-03-15 Thread Daniel Gustafsson
> On 15 Mar 2023, at 11:18, Etsuro Fujita  wrote:

> While working on something else, I noticed that the “if (entry->conn
> == NULL)” test after doing disconnect_pg_server() when re-establishing
> a given connection in GetConnection() is pointless, because the former
> function ensures that entry->conn is NULL.  So I removed the if-test.
> Attached is a patch for that.

LGTM, nice catch.

> I think we could instead add an assertion, but I did not, because we already
> have it in make_new_connection().

Agreed, the assertion in make_new_connection is enough (and is needed there).

--
Daniel Gustafsson





Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-03-15 Thread Ants Aasma
On Wed, 15 Mar 2023 at 02:57, Melanie Plageman
 wrote:
> > > Subject: [PATCH v3 3/3] add vacuum option to specify ring_size and guc
> >
> > >  #define INT_ACCESS_ONCE(var) ((int)(*((volatile int *)&(var
> > > +#define bufsize_limit_to_nbuffers(bufsize) (bufsize * 1024 / BLCKSZ)
> >
> > Macros are normally capitalized
>
> Yes, there doesn't seem to be a great amount of consistency around
> this... See pgstat.c read_chunk_s and bufmgr.c BufHdrGetBlock and
> friends. Though there are probably more capitalized than not. Since it
> does a bit of math and returns a value, I wanted to convey that it was
> more like a function. Also, since the name was long, I thought all-caps
> would be hard to read. However, if you or others feel strongly, I am
> attached neither to the capitalization nor to the name at all (what do
> you think of the name?).

A static inline function seems like a less surprising and more type-safe
solution for this.
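
For instance, a direct translation of the quoted macro (a sketch; the
argument name is illustrative, and BLCKSZ comes from the backend headers):

static inline int
bufsize_limit_to_nbuffers(int bufsize_kb)
{
    return bufsize_kb * 1024 / BLCKSZ;
}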

-- 
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-03-15 Thread Ants Aasma
On Wed, 15 Mar 2023 at 02:29, Melanie Plageman
 wrote:
> As for routine vacuuming and the other buffer access strategies, I think
> there is an argument for configurability based on operator knowledge --
> perhaps your workload will use the data you are COPYing as soon as the
> COPY finishes, so you might as well disable a buffer access strategy or
> use a larger fraction of shared buffers. Also, the ring sizes were
> selected sixteen years ago and average server memory and data set sizes
> have changed.

To be clear, I'm not at all arguing against configurability. I was
thinking that dynamic use could make the configuration simpler by
self-tuning to use no more buffers than is useful.

> StrategyRejectBuffer() will allow bulkreads to, as you say, use more
> buffers than the original ring size, since it allows them to kick
> dirty buffers out of the ring and claim new shared buffers.
>
> Bulkwrites and vacuums, however, will inevitably dirty buffers and
> require flushing the buffer (and thus flushing the associated WAL) when
> reusing them. Bulkwrites and vacuum do not kick dirtied buffers out of
> the ring, since dirtying buffers is their common case. A dynamic
> resizing like the one you suggest would likely devolve to vacuum and
> bulkwrite strategies always using the max size.

I think it should self-stabilize around the point where the WAL is
either flushed by other commit activity, WAL writer or WAL buffers
filling up. Writing out their own dirtied buffers will still happen,
just the associated WAL flushes will be in larger chunks and possibly
done by other processes.

> As for decreasing the ring size, buffers are only "added" to the ring
> lazily and, technically, as it is now, buffers which have been
> added to the ring can always be reclaimed by the clocksweep (as long as
> they are not pinned). The buffer access strategy is more of a
> self-imposed restriction than it is a reservation. Since the ring is
> small and the buffers are being frequently reused, odds are the usage
> count will be 1 and we will be the one who set it to 1, but there is no
> guarantee. If, when attempting to reuse the buffer, its usage count is
> > 1 (or it is pinned), we also will kick it out of the ring and go look
> for a replacement buffer.

Right, but while the buffer is actively used by the ring it is
unlikely that clocksweep will find it at usage 0 as the ring buffer
should cycle more often than the clocksweep. Whereas if the ring stops
using a buffer, clocksweep will eventually come and reclaim it. And if
the ring shrinking decision turns out to be wrong before the
clocksweep gets around to reusing it, we can bring the same buffer
back into the ring.

> I do think that it is a bit unreasonable to expect users to know how
> large they would like to make their buffer access strategy ring. What we
> want is some way of balancing different kinds of workloads and
> maintenance tasks reasonably. If your database has no activity because
> it is the middle of the night or it was shutdown because of transaction
> id wraparound, there is no reason why vacuum should limit the number of
> buffers it uses. I'm sure there are many other such examples.

Ideally yes, though I am not hopeful of finding a solution that does
this any time soon. Just to take your example: if a nightly
maintenance job wipes out the shared buffer contents while slightly
optimizing its own non-time-critical work, and then causes morning
user-visible load to have big latency spikes due to cache misses, that's
not a good tradeoff either.

--
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com




Re: postgres_fdw: Useless if-test in GetConnection()

2023-03-15 Thread Richard Guo
On Wed, Mar 15, 2023 at 6:18 PM Etsuro Fujita 
wrote:

> While working on something else, I noticed that the “if (entry->conn
> == NULL)” test after doing disconnect_pg_server() when re-establishing
> a given connection in GetConnection() is pointless, because the former
> function ensures that entry->conn is NULL.  So I removed the if-test.
> Attached is a patch for that.  I think we could instead add an
> assertion, but I did not, because we already have it in
> make_new_connection().


+1. Good catch.

Thanks
Richard


Re: TAP output format in pg_regress

2023-03-15 Thread Peter Eisentraut

On 24.02.23 10:49, Daniel Gustafsson wrote:

Another rebase on top of 337903a16f.  Unless there are conflicting reviews, I
consider this patch to be done and ready for going in during the next CF.


I think this is just about as good as it's going to get, so I think we 
can consider committing this soon.


A few comments along the way:

1) We can remove the gettext markers _() inside calls like note(), 
bail(), etc.  If we really wanted to do translation, we would do that 
inside those functions (similar to errmsg() etc.).


2) There are a few fprintf(stderr, ...) calls left.  Should those be 
changed to something TAP-enabled?


3) Maybe these lines

+++ isolation check in src/test/isolation +++

should be changed to TAP format?  Arguably, if I run "make -s check", 
then everything printed should be valid TAP, right?


4) From the introduction lines

== creating temporary instance==
== initializing database system   ==
== starting postmaster==
running on port 61696 with PID 85346
== creating database "regression" ==
== running regression test queries==

you have kept

# running on port 61696 with PID 85346

which, well, is that the most interesting of those?

The first three lines (creating, initializing, starting) take some 
noticeable amount of time, so they could be kept as a progress 
indicator.  Or just delete all of them?  I suppose some explicit 
indication about temp-instance versus existing instance would be useful.


5) About the output format.  Obviously, this will require some 
retraining of the eye.  But my first impression is that there is a lot 
of whitespace without any guiding lines, so to speak, horizontally or 
vertically.  It makes me a bit dizzy. ;-)


I think instead

# parallel group (2 tests):  event_trigger oidjoins
ok 210  event_trigger  131 ms
ok 211  oidjoins   190 ms
ok 212   fast_default  158 ms
ok 213   tablespace319 ms

Something like this would be less dizzy-making:

# parallel group (2 tests):  event_trigger oidjoins
ok 210 - + event_trigger   131 ms
ok 211 - + oidjoins190 ms
ok 212 - fast_default  158 ms
ok 213 - tablespace319 ms

Just an idea, we don't have to get this exactly perfect right now, but 
it's something to think about.



I think if 1-4 are addressed, this can be committed.





postgres_fdw: Useless if-test in GetConnection()

2023-03-15 Thread Etsuro Fujita
Hi,

While working on something else, I noticed that the “if (entry->conn
== NULL)” test after doing disconnect_pg_server() when re-establishing
a given connection in GetConnection() is pointless, because the former
function ensures that entry->conn is NULL.  So I removed the if-test.
Attached is a patch for that.  I think we could instead add an
assertion, but I did not, because we already have it in
make_new_connection().
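
To illustrate, the code path in question is shaped like this (a simplified
sketch, not the exact source):

    /* re-establishing a broken connection */
    disconnect_pg_server(entry);    /* always leaves entry->conn == NULL */

    if (entry->conn == NULL)        /* ...so this test can never be false */
        make_new_connection(user, entry);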

This would be harmless, so I am planning to apply the patch to HEAD only.

Best regards,
Etsuro Fujita


remove-useless-if-test.patch
Description: Binary data


Re: Allow logical replication to copy tables in binary format

2023-03-15 Thread Melih Mutlu
Amit Kapila wrote on Wed, 15 Mar 2023 at 12:31:

> On Tue, Mar 14, 2023 at 4:32 PM Melih Mutlu 
> wrote:
>


> What purpose does this test serve w.r.t this patch? Before checking
> the sync for different column orders, the patch has already changed
> binary to false, so it doesn't seem to test the functionality of this
> patch. Am I missing something?
>

I missed that binary was changed to false before the column-order test. I
moved that test case up, before binary is set to false.
Please see v14 [1].

[1]
https://www.postgresql.org/message-id/CAGPVpCTaXYctCUp3z%3D_BstonHiZcC5Jj7584i7B8jeZQq4RJkw%40mail.gmail.com

Thanks,
-- 
Melih Mutlu
Microsoft


Re: Allow logical replication to copy tables in binary format

2023-03-15 Thread Melih Mutlu
Hi,

Please see v14 [1].

Peter Smith wrote on Wed, 15 Mar 2023 at 09:22:

> Here are some review comments for v13-0001
>
> ==
> doc/src/sgml/logical-replication.sgml
>
> 1.
> That's why in the previous v12 review [1] (comment #3) I suggested
> that this page should just say something quite generic like "However,
> replication in binary format is more restrictive", and link back to
> the other page which has all the gory details.
>

You're right. Changed it with what you suggested.


> 2.
> My previous v12 review [1] (comment #6) suggested maybe updating this
> page. But it was not done in v13. Did you accidentally miss the review
> comment, or chose not to do it?
>

Sorry, I missed this. Added a line leading to CREATE SUBSCRIPTION doc.


> ==
> doc/src/sgml/ref/create_subscription.sgml
>
> 3.
> BEFORE
> Binary format is also very data type specific, it will not allow
> copying between different column types as opposed to text format.
>
> SUGGESTION (worded more similar to what is already on the COPY page [2])
> Binary format is very data type specific; for example, it will not
> allow copying from a smallint column to an integer column, even though
> that would work fine in text format.
>

Done.


> 4.
> SUGGESTION
> If the publisher is a PostgreSQL version
> before 14, then any initial table synchronization will use text format
> even if binary = true.
>

Done.


> SUGGESTION
> If the publisher version is earlier than v14, we use text format COPY.
> Note - In fact COPY syntax "WITH (FORMAT binary)" has existed since
> v9, but since the logical replication binary mode transfer was not
> introduced until v14 it was decided to check using the later version.
>

Changed it as suggested here [2].

[1]
https://www.postgresql.org/message-id/CAGPVpCTaXYctCUp3z%3D_BstonHiZcC5Jj7584i7B8jeZQq4RJkw%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/CAA4eK1%2BC7ykvdBxh_t1BdbX5Da1bM1BgsE%3D-i2koPkd3pSid0A%40mail.gmail.com
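
(The resulting check is shaped roughly like the following sketch -- the
names are from the logical replication worker code, but the attached patch
is authoritative:)

    /*
     * The binary option is only used when the publisher is v14 or newer;
     * COPY's "WITH (FORMAT binary)" syntax is much older, but binary mode
     * for logical replication only appeared in v14.
     */
    if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 140000 &&
        MySubscription->binary)
        appendStringInfoString(&cmd, " WITH (FORMAT binary)");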


Thanks,
-- 
Melih Mutlu
Microsoft


Re: Allow logical replication to copy tables in binary format

2023-03-15 Thread Melih Mutlu
Hi,

Please see the attached patch.

Takamichi Osumi (Fujitsu) wrote on Tue, 14 Mar 2023 at 18:20:

> (1) create_subscription.sgml
>
> +  column types as opposed to text format. Even when this option
> is enabled,
> +  only data types having binary send and receive functions will be
> +  transferred in binary. Note that the initial synchronization
> requires
>
> (1-1)
>
> I think it's helpful to add a reference for the description about send and
> receive functions (e.g. to the page of CREATE TYPE).
>

Done.


>
> (1-2)
>
> Also, it would be better to have a cross reference from there to this doc
> as one paragraph probably in "Notes". I suggested this, because send and
> receive functions are described as "optional" there and missing them leads
> to error in the context of binary table synchronization.
>

I'm not sure whether this is necessary. In case of missing send/receive
functions, error logs are already clear about what's wrong and logical
replication docs also explain what could go wrong with binary.


> (3) copy_table()
>
> +   /*
> +* If the publisher version is earlier than v14, it COPY command
> doesn't
> +* support the binary option.
> +*/
>
> This sentence doesn't look correct grammatically. We can replace "it COPY
> command" with "subscription" for example. Kindly please fix it.
>

Changed this with Amit's suggestion [1].


[1]
https://www.postgresql.org/message-id/CAA4eK1%2BC7ykvdBxh_t1BdbX5Da1bM1BgsE%3D-i2koPkd3pSid0A%40mail.gmail.com

Thanks,
-- 
Melih Mutlu
Microsoft


v14-0001-Allow-logical-replication-to-copy-table-in-binary.patch
Description: Binary data


Re: CI and test improvements

2023-03-15 Thread Peter Eisentraut

On 14.03.23 05:56, Justin Pryzby wrote:

I'm soliticing feedback on those patches that I've sent recently - I've
elided patches if they have some unresolved issue.


> [PATCH 1/8] cirrus/windows: add compiler_warnings_script

Needs a better description of what it actually does.  (And fewer "I'm 
not sure how to write this ..." comments ;-) )  It looks like it would 
fail the build if there is a compiler warning in the Windows VS task? 
Shouldn't that be done in the CompilerWarnings task?


Also, I see a bunch of warnings in the current output from that task. 
These should be cleaned up in any case before we can let a thing like 
this loose.


(The warnings are all like

C:\python\Include\pyconfig.h(117): warning C4005: 'MS_WIN64': macro 
redefinition


so possibly a single fix can address them all.)
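
(If the redefinition happens because pyconfig.h is included after the build
has already defined MS_WIN64, one guess at a single fix -- untested -- would
be to drop the earlier definition before pulling in the Python headers:

#ifdef MS_WIN64
#undef MS_WIN64             /* pyconfig.h redefines it; avoid C4005 */
#endif
#include <Python.h>

but someone would need to check where the first definition comes from.)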


> [PATCH 2/8] cirrus/freebsd: run with more CPUs+RAM and do not repartition

I don't know enough about this.  Maybe Andres or Thomas want to take 
this.  No concerns if it's safe.



> [PATCH 3/8] cirrus/freebsd: define 
ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS


Looks sensible.


> [PATCH 4/8] pg_upgrade: tap test: exercise --link and --clone

I haven't been able to get any changes to the test run times outside of 
noise from this.  But some more coverage is sensible in any case.


I'm concerned that with this change, the only platform that tests --copy 
is Windows, but Windows has a separate code path for copy.  So we should 
leave one Unix platform to test --copy.  Maybe have FreeBSD test --link 
and macOS test --clone and leave the others with --copy?



> [PATCH 5/8] WIP: ci/meson: allow showing only failed tests ..


7e09035f588 WIP: ci/meson: allow showing only failed tests ..


I'm not sure I like this one.  I sometimes look up the logs of non-failed
tests to compare them with failed tests, to get context to could lead to
failures.  Maybe we can make this behavior adjustable. But I've not been
bothered by the current behavior.


It's adjustable by un/setting the environment variable.

I'm surprised to hear that anyone using cirrusci (with or without cfbot)
wouldn't prefer the behavior this patch implements.  It's annoying to
search for the logs of the (typically exactly one) failing test in
cirrus' directory of 200-some test artifacts.  We're also uploading a lot
of logs for every failure.  (But I suppose this might break cfbot's new
client side parsing of things like build logs...)


One thing that actually annoys me is that a successful run does not 
upload any test artifacts at all.  So, I guess I'm just of a different 
opinion here.



> [PATCH 6/8] cirrus: code coverage

This adds -Db_coverage=true to the FreeBSD task.  This has a significant 
impact on the build time.  (+50% at least, it appears.)


I'm not sure the approach here makes sense.  For example, if you add a 
new test, the set of changed files is just that test.  So you won't get 
any report on what coverage change the test has caused.


Also, I don't think I trust the numbers from the meson coverage stuff 
yet.  See for example 
.



> [PATCH 7/8] cirrus: upload changed html docs as artifacts
> [PATCH 8/8] +html index file

This builds the docs twice and then analyzes the differences between the 
two builds.  This also affects the build times quite significantly.


How useful is this actually?  People who want to look at the docs can 
build them locally.  There are no platform dependencies or anything like 
that where having them built elsewhere is of advantage.







Re: Combine pg_walinspect till_end_of_wal functions with others

2023-03-15 Thread Michael Paquier
On Wed, Mar 15, 2023 at 12:40:17PM +0530, Bharath Rupireddy wrote:
> -1 for removing \dx+ for pg_walinspect version 1.0, because we wanted
> to show the diff of functions along with testing the upgrade path in
> the oldextversions.sql. Therefore, I prefer something like [1]:

This is a duplicate of what describe.c uses, with a COLLATE clause.
The main goal was to have a simple check, so I'd still stand by the
simplest choice and move on.
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Extend the length of BackgroundWorker.bgw_library_name

2023-03-15 Thread Daniel Gustafsson
> On 13 Mar 2023, at 18:35, Nathan Bossart  wrote:
> 
> On Mon, Mar 13, 2023 at 07:57:47AM -0700, Yurii Rashkovskii wrote:
>> However, there are use cases where [potentially] longer names are
>> expected/desired; for example, test benches (where library files may not
>> [or can not] be copied to Postgres installation) or alternative library
>> installation methods that do not put them into $libdir.
>> 
>> The patch is backwards-compatible and ensures that bgw_library_name stays
>> *at least* as long as BGW_MAXLEN. Existing external code that uses
>> BGW_MAXLEN as a length boundary (for example, in `strncpy`) will continue
>> to work as expected.
> 
> I see that BGW_MAXLEN was originally set to 64 in 2013 (7f7485a) [0], but
> was increased to 96 in 2018 (3a4b891) [1].  It seems generally reasonable
> to me to increase the length of bgw_library_name further for the use-case
> you describe, but I wonder if it'd be better to simply increase BGW_MAXLEN
> again.  However, IIUC bgw_library_name is the only field that is likely to
> be used for absolute paths, so only increasing that one to MAXPGPATH makes
> sense.

Yeah, raising just bgw_library_name to MAXPGPATH seems reasonable here.  While
the memory usage does grow, it's still quite modest, and has an upper limit in
max_worker_processes.
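
Concretely, the change would leave the struct in bgworker.h looking roughly
like this (a sketch; only bgw_library_name changes):

typedef struct BackgroundWorker
{
    char        bgw_name[BGW_MAXLEN];
    char        bgw_type[BGW_MAXLEN];
    int         bgw_flags;
    BgWorkerStartTime bgw_start_time;
    int         bgw_restart_time;   /* in seconds, or BGW_NEVER_RESTART */
    char        bgw_library_name[MAXPGPATH];    /* widened from BGW_MAXLEN */
    char        bgw_function_name[BGW_MAXLEN];
    Datum       bgw_main_arg;
    char        bgw_extra[BGW_EXTRALEN];
    pid_t       bgw_notify_pid;
} BackgroundWorker;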

While here, I wonder if we should document what BGW_MAXLEN is defined as in
bgworker.sgml?

--
Daniel Gustafsson





Re: Allow logical replication to copy tables in binary format

2023-03-15 Thread Amit Kapila
On Tue, Mar 14, 2023 at 4:32 PM Melih Mutlu  wrote:
>
> Attached v13.
>

I have a question related to the below test in the patch:

+# Setting binary to false should allow syncing
+$node_subscriber->safe_psql(
+'postgres', qq(
+ALTER SUBSCRIPTION tsub SET (binary = false);));
+
+# Ensure the COPY command is executed in text format on the publisher
+$node_publisher->wait_for_log(qr/LOG: ( [a-z0-9]+:)? COPY (.+)? TO STDOUT\n/);
+
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'tsub');
+
+# Check the synced data on the subscriber
+$result = $node_subscriber->safe_psql('postgres', 'SELECT a FROM
test_mismatching_types ORDER BY a;');
+
+is( $result, '1
+2', 'check synced data on subscriber with binary = false');
+
+# Test syncing tables with different column order
+$node_publisher->safe_psql(
+'postgres', qq(
+CREATE TABLE public.test_col_order (
+a bigint, b int
+);
+INSERT INTO public.test_col_order (a,b)
+VALUES (1,2),(3,4);
+));

What purpose does this test serve w.r.t this patch? Before checking
the sync for different column orders, the patch has already changed
binary to false, so it doesn't seem to test the functionality of this
patch. Am I missing something?

-- 
With Regards,
Amit Kapila.




Re: [EXTERNAL] Re: Add non-blocking version of PQcancel

2023-03-15 Thread Jelte Fennema
The rebase was indeed trivial (git handled everything automatically),
because my first patch was doing a superset of the changes that were
committed in b6dfee28f. Attached are the new patches.

On Tue, 14 Mar 2023 at 19:04, Greg Stark  wrote:
>
> On Tue, 14 Mar 2023 at 13:59, Tom Lane  wrote:
> >
> > "Gregory Stark (as CFM)"  writes:
> > > It looks like this needs a big rebase in fea-uth.c fe-auth-scram.c and
> > > fe-connect.c. Every hunk is failing which perhaps means the code
> > > you're patching has been moved or refactored?
> >
> > The cfbot is giving up after
> > v14-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch fails,
> > but that's been superseded (at least in part) by b6dfee28f.
>
> Ah, same with Jelte Fennema's patch for load balancing in libpq.
>
> --
> greg


v15-0003-Return-2-from-pqReadData-on-EOF.patch
Description: Binary data


v15-0002-Refactor-libpq-to-store-addrinfo-in-a-libpq-owne.patch
Description: Binary data


v15-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch
Description: Binary data


v15-0004-Add-non-blocking-version-of-PQcancel.patch
Description: Binary data


v15-0005-Start-using-new-libpq-cancel-APIs.patch
Description: Binary data


Re: [EXTERNAL] Re: Support load balancing in libpq

2023-03-15 Thread Jelte Fennema
Rebased

On Tue, 14 Mar 2023 at 19:05, Gregory Stark (as CFM)
 wrote:
>
> The pgindent run in b6dfee28f is causing this patch to need a rebase
> for the cfbot to apply it.


v12-0002-Refactor-libpq-to-store-addrinfo-in-a-libpq-owne.patch
Description: Binary data


v12-0003-Support-load-balancing-in-libpq.patch
Description: Binary data


v12-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch
Description: Binary data


v12-0004-Remove-unnecessary-check-from-Fisher-Yates-imple.patch
Description: Binary data


Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED

2023-03-15 Thread Thomas Munro
On Wed, Mar 15, 2023 at 9:00 PM Alexander Lakhin  wrote:
> The result depends on some OS conditions (it reproduced pretty well
> immediately after VM reboot), but it's enough to test the patch proposed.
> And I can confirm that the Assert is not observed anymore (with the sleep
> added after CloseHandle(childinfo->procHandle)).

Thanks for confirming.  Pushed earlier today.

Do you know how it fails in non-assert builds, without the fix?




Re: Using each rel as both outer and inner for JOIN_ANTI

2023-03-15 Thread Richard Guo
On Wed, Mar 15, 2023 at 2:25 AM Gregory Stark (as CFM) 
wrote:

> So what is the status of this patch?
>
> It looks like you received some feedback from Emre, Tom, Ronan, and
> Alvaro but it also looks like you responded to most or all of that.
> Are you still blocked waiting for feedback? Anything specific you need
> help with?
>
> Or is the patch ready for commit now? In which case it would be good
> to rebase it since it's currently failing to apply. Well it would be
> good to rebase regardless but it would be especially important if we
> want to get it committed :)


Thanks for the reminder.  Attached is the rebased patch, with no other
changes.  I think the patch is ready for commit.

Thanks
Richard


v6-0001-Using-each-rel-as-both-outer-and-inner-for-anti-joins.patch
Description: Binary data


Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-03-15 Thread Masahiko Sawada
On Sat, Mar 11, 2023 at 11:55 PM Melanie Plageman
 wrote:
>
> > On Tue, Feb 28, 2023 at 3:16 AM Bharath Rupireddy
> >  wrote:
> >
> > > On Thu, Jan 12, 2023 at 6:06 AM Andres Freund  wrote:
> > > >
> > > > On 2023-01-11 17:26:19 -0700, David G. Johnston wrote:
> > > > > Should we just add "ring_buffers" to the existing "shared_buffers" and
> > > > > "temp_buffers" settings?
> > > >
> > > > The different types of ring buffers have different sizes, for good 
> > > > reasons. So
> > > > I don't see that working well. I also think it'd be more often useful to
> > > > control this on a statement basis - if you have a parallel import tool 
> > > > that
> > > > starts NCPU COPYs you'd want a smaller buffer than a single threaded 
> > > > COPY. Of
> > > > course each session can change the ring buffer settings, but still.
> > >
> > > How about having GUCs for each ring buffer (bulk_read_ring_buffers,
> > > bulk_write_ring_buffers, vacuum_ring_buffers - ah, 3 more new GUCs)?
> > > These options can help especially when statement level controls aren't
> > > easy to add (COPY, CREATE TABLE AS/CTAS, REFRESH MAT VIEW/RMV)? If
> > > needed users can also set them at the system level. For instance, one
> > > can set bulk_write_ring_buffers to other than 16MB or -1 to disable
> > > the ring buffer to use shared_buffers and run a bunch of bulk write
> > > queries.
>
> In attached v3, I've changed the name of the guc from buffer_usage_limit
> to vacuum_buffer_usage_limit, since it is only used for vacuum and
> autovacuum.
>
> I haven't added the other suggested strategy gucs, as those could easily
> be done in a future patchset.
>
> I've also changed GetAccessStrategyExt() to GetAccessStrategyWithSize()
> -- similar to initArrayResultWithSize().
>
> And I've added tab completion for BUFFER_USAGE_LIMIT so that it is
> easier to try out my patch.
>
> Most of the TODOs in the code are related to the question of how
> autovacuum uses the guc vacuum_buffer_usage_limit. autovacuum creates
> the buffer access strategy ring in do_autovacuum() before looping
> through and vacuuming tables. It passes this strategy object on to
> vacuum(). Since we reuse the same strategy object for all tables in a
> given invocation of do_autovacuum(), only failsafe autovacuum would
> change buffer access strategies. This is probably okay, but it does mean
> that the table-level VacuumParams variable, ring_size, means something
> different for autovacuum than vacuum. Autovacuum workers will always
> have set it to -1. We won't ever reach code in vacuum() which relies on
> VacuumParams->ring_size as long as autovacuum workers pass a non-NULL
> BufferAccessStrategy object to vacuum(), though.

I've not reviewed the patchset in depth yet, but I got an assertion
failure and a SEGV when using the buffer_usage_limit parameter.

postgres(1:471180)=# vacuum (buffer_usage_limit 100) ;
2023-03-15 17:10:02.947 JST [471180] ERROR:  buffer_usage_limit for a
vacuum must be between -1 and 16777216. 100 is invalid. at
character 9

The message shows the max value is 16777216, but when I set it, I got
an assertion failure:

postgres(1:470992)=# vacuum (buffer_usage_limit 16777216) ;
TRAP: failed Assert("ring_size < MAX_BAS_RING_SIZE_KB"), File:
"freelist.c", Line: 606, PID: 470992

Then, when I used a value one lower, 16777215, I got a SEGV:

postgres(1:471180)=# vacuum (buffer_usage_limit 16777215) ;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: 2023-03-15
17:10:59.404 JST [471159] LOG:  server process (PID 471180) was
terminated by signal 11: Segmentation fault

Finally, when I used an even lower value, 16777100, I got a memory
allocation error:

postgres(1:471361)=# vacuum (buffer_usage_limit 16777100) ;
2023-03-15 17:12:17.853 JST [471361] ERROR:  invalid memory alloc
request size 18446744073709551572

Probably vacuum_buffer_usage_limit also has the same issue.
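
For what it's worth, these failures look consistent with two distinct
problems (a guess from the quoted code, not a verified diagnosis): the
Assert uses strict less-than where the range check allows equality, which
would explain the trap at exactly 16777216; and a conversion like the
bufsize_limit_to_nbuffers macro quoted upthread overflows 32-bit int for
values just under the maximum (16777215 * 1024 is about 17.2 billion, well
past INT_MAX), which can yield a negative buffer count and hence the absurd
allocation size. A sketch of the corresponding fixes:

    /* allow ring_size == MAX_BAS_RING_SIZE_KB, matching the error message */
    Assert(ring_size <= MAX_BAS_RING_SIZE_KB);

    /* widen before multiplying so large kilobyte values cannot overflow */
    static inline int
    bufsize_limit_to_nbuffers(int bufsize_kb)
    {
        return (int) ((int64) bufsize_kb * 1024 / BLCKSZ);
    }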

Also, should we support a table option for vacuum_buffer_usage_limit as well?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Simplify some codes in pgoutput

2023-03-15 Thread houzj.f...@fujitsu.com
Hi,

I noticed that there is some duplicated code in the pgoutput_change() function
which can be simplified; here is an attempt to do that.

Best Regards,
Hou Zhijie



0001-simplify-the-code-in-pgoutput_change.patch
Description: 0001-simplify-the-code-in-pgoutput_change.patch


Re: meson: Non-feature feature options

2023-03-15 Thread Peter Eisentraut

On 14.03.23 15:07, Nazir Bilal Yavuz wrote:

I think the uuid side of this is making this way too complicated.  I'm
content leaving this as a manual option for now.

There is much more value in making the ssl option work automatically.
So I would welcome a patch that just makes -Dssl=auto work smoothly,
perhaps using the "trick" that Andres described.

I tried to implement for uuid what we did for ssl; do you have
any comments?


For the uuid option, we have three different choices.  What should be 
the search order and why?





Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED

2023-03-15 Thread Alexander Lakhin

Hi,
14.03.2023 01:20, Andres Freund wrote:

I am yet to construct a reproduction of the case, but it seems to me that
the race condition is not impossible here.

I suspect the issue could be made much more likely by adding a sleep before
the pg_queue_signal(SIGCHLD) in pgwin32_deadchild_callback().


Thanks for the tip! With pg_usleep(5) added there, I can reproduce the issue
reliably, within a minute on average, with the 099_check_pids.pl I posted before:
...
2023-03-15 07:26:14.301 GMT|[unknown]|[unknown]|3748|64117316.ea4|LOG:  
connection received: host=127.0.0.1 port=49902
2023-03-15 07:26:14.302 GMT|postgres|postgres|3748|64117316.ea4|LOG:  connection 
authorized: user=postgres database=postgres application_name=099_check-pids.pl
2023-03-15 07:26:14.304 GMT|postgres|postgres|3748|64117316.ea4|LOG:  statement: 
SELECT pg_backend_pid()
2023-03-15 07:26:14.305 GMT|postgres|postgres|3748|64117316.ea4|LOG:  
disconnection: session time: 0:00:00.005 user=postgres database=postgres 
host=127.0.0.1 port=49902

...
2023-03-15 07:26:25.592 GMT|[unknown]|[unknown]|3748|64117321.ea4|LOG:  
connection received: host=127.0.0.1 port=50407
TRAP: failed Assert("PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED"), 
File: "C:\src\postgresql\src\backend\storage\ipc\pmsignal.c", Line: 329, PID: 3748
abort() has been called
2023-03-15 07:26:25.608 GMT|[unknown]|[unknown]|3524|64117321.dc4|LOG:  connection received: host=127.0.0.1 port=50408


The result depends on some OS conditions (it reproduced pretty well
immediately after VM reboot), but it's enough to test the patch proposed.
And I can confirm that the Assert is not observed anymore (with the sleep
added after CloseHandle(childinfo->procHandle)).
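
(For anyone else trying to reproduce this, the instrumentation amounts to
something like the following sketch of pgwin32_deadchild_callback -- body
abbreviated, delay value illustrative:)

static void CALLBACK
pgwin32_deadchild_callback(PVOID lpParameter, BOOLEAN TimerOrWaitFired)
{
    /* ... fetch the child's exit code, release the wait handle ... */
    CloseHandle(childinfo->procHandle);

    pg_usleep(50000);           /* test-only: widen the race window */

    /* Queue SIGCHLD signal */
    pg_queue_signal(SIGCHLD);
}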

Best regards,
Alexander




Re: psql \watch 2nd argument: iteration count

2023-03-15 Thread Michael Paquier
On Tue, Mar 14, 2023 at 09:23:48PM -0700, Nathan Bossart wrote:
> + sleep = strtod(opt, &opt_end);
> + if (sleep < 0 || *opt_end || errno == ERANGE)
> 
> Should we set errno to 0 before calling strtod()?

Yep.  You are right.
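
The fixed sequence would then look something like this sketch (error
message wording illustrative):

    char       *opt_end;
    double      sleep;

    errno = 0;                  /* so a stale ERANGE can't fool the check */
    sleep = strtod(opt, &opt_end);
    if (sleep < 0 || *opt_end || errno == ERANGE)
    {
        pg_log_error("\\watch: incorrect interval value \"%s\"", opt);
        return false;
    }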
--
Michael


signature.asc
Description: PGP signature


pkg-config Requires.private entries should be comma-separated

2023-03-15 Thread Peter Eisentraut
While comparing the .pc (pkg-config) files generated by the make and 
meson builds, I noticed that the Requires.private entries use different 
delimiters.  The make build uses spaces, the meson build uses commas. 
The pkg-config documentation says that it should be comma-separated, but 
apparently about half the .pc files in the wild use just spaces.


The pkg-config source code acknowledges that both commas and spaces work:

https://github.com/freedesktop/pkg-config/blob/master/parse.c#L273
https://github.com/pkgconf/pkgconf/blob/master/libpkgconf/dependency.c#L286

I think for consistency we should change the make build to use commas 
anyway.  See attached patch.

From 5501fd952566654e49bcf5b8b4d5dc6c1026a90b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 15 Mar 2023 08:42:27 +0100
Subject: [PATCH] pkg-config Requires.private entries should be comma-separated

---
 src/interfaces/ecpg/compatlib/Makefile | 2 +-
 src/interfaces/ecpg/ecpglib/Makefile   | 2 +-
 src/interfaces/libpq/Makefile  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/interfaces/ecpg/compatlib/Makefile 
b/src/interfaces/ecpg/compatlib/Makefile
index da470e5418..b9483fba08 100644
--- a/src/interfaces/ecpg/compatlib/Makefile
+++ b/src/interfaces/ecpg/compatlib/Makefile
@@ -32,7 +32,7 @@ OBJS = \
$(WIN32RES) \
informix.o
 
-PKG_CONFIG_REQUIRES_PRIVATE = libecpg libpgtypes
+PKG_CONFIG_REQUIRES_PRIVATE = libecpg, libpgtypes
 
 all: all-lib
 
diff --git a/src/interfaces/ecpg/ecpglib/Makefile 
b/src/interfaces/ecpg/ecpglib/Makefile
index 6fadebf53d..652e023405 100644
--- a/src/interfaces/ecpg/ecpglib/Makefile
+++ b/src/interfaces/ecpg/ecpglib/Makefile
@@ -41,7 +41,7 @@ SHLIB_PREREQS = submake-libpq submake-pgtypeslib
 
 SHLIB_EXPORTS = exports.txt
 
-PKG_CONFIG_REQUIRES_PRIVATE = libpq libpgtypes
+PKG_CONFIG_REQUIRES_PRIVATE = libpq, libpgtypes
 
 all: all-lib
 
diff --git a/src/interfaces/libpq/Makefile b/src/interfaces/libpq/Makefile
index c18e914228..0919d8f32f 100644
--- a/src/interfaces/libpq/Makefile
+++ b/src/interfaces/libpq/Makefile
@@ -96,7 +96,7 @@ SHLIB_PREREQS = submake-libpgport
 SHLIB_EXPORTS = exports.txt
 
 ifeq ($(with_ssl),openssl)
-PKG_CONFIG_REQUIRES_PRIVATE = libssl libcrypto
+PKG_CONFIG_REQUIRES_PRIVATE = libssl, libcrypto
 endif
 
 all: all-lib libpq-refs-stamp
-- 
2.39.2



Re: Privileges on PUBLICATION

2023-03-15 Thread Antonin Houska
Gregory Stark (as CFM)  wrote:

> FYI this looks like it needs a rebase due to a conflict in copy.c and
> an offset in pgoutput.c.
> 
> Is there anything specific that still needs review or do you think
> you've handled all Peter's concerns? In particular, is there "a
> comprehensive description of what it is trying to do"? :)

I tried to improve the documentation and commit messages in v05. v06 (just
rebased) is attached.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

From d4490664ec80f52d23c4345eec5771764bcdbb17 Mon Sep 17 00:00:00 2001
From: Antonin Houska 
Date: Wed, 15 Mar 2023 04:21:01 +0100
Subject: [PATCH 1/2] Move some code into functions.

This is only a preparation for a patch that introduces USAGE privilege on
publications. It should make the following diff a little bit easier to read.
---
 src/backend/catalog/pg_publication.c| 236 +---
 src/backend/commands/copy.c |  81 +--
 src/backend/commands/copyto.c   |  89 
 src/backend/replication/pgoutput/pgoutput.c | 139 +---
 src/include/catalog/pg_publication.h|   6 +
 src/include/commands/copy.h |   2 +
 6 files changed, 308 insertions(+), 245 deletions(-)

diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index a98fcad421..7f6024b7a5 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -1025,6 +1025,208 @@ GetPublicationByName(const char *pubname, bool missing_ok)
 	return OidIsValid(oid) ? GetPublication(oid) : NULL;
 }
 
+/*
+ * Get the mapping for given publication and relation.
+ */
+void
+GetPublicationRelationMapping(Oid pubid, Oid relid,
+			  Datum *attrs, bool *attrs_isnull,
+			  Datum *qual, bool *qual_isnull)
+{
+	Publication *publication;
+	HeapTuple	pubtuple = NULL;
+	Oid			schemaid = get_rel_namespace(relid);
+
+	publication = GetPublication(pubid);
+
+	/*
+	 * We don't consider row filters or column lists for FOR ALL TABLES or
+	 * FOR TABLES IN SCHEMA publications.
+	 */
+	if (!publication->alltables &&
+		!SearchSysCacheExists2(PUBLICATIONNAMESPACEMAP,
+			   ObjectIdGetDatum(schemaid),
+			   ObjectIdGetDatum(publication->oid)))
+		pubtuple = SearchSysCacheCopy2(PUBLICATIONRELMAP,
+	   ObjectIdGetDatum(relid),
+	   ObjectIdGetDatum(publication->oid));
+
+	if (HeapTupleIsValid(pubtuple))
+	{
+		/* Lookup the column list attribute. */
+		*attrs = SysCacheGetAttr(PUBLICATIONRELMAP, pubtuple,
+ Anum_pg_publication_rel_prattrs,
+ attrs_isnull);
+
+		/* Null indicates no filter. */
+		*qual = SysCacheGetAttr(PUBLICATIONRELMAP, pubtuple,
+Anum_pg_publication_rel_prqual,
+qual_isnull);
+	}
+	else
+	{
+		*attrs_isnull = true;
+		*qual_isnull = true;
+	}
+}
+/*
+ * Pick those publications from a list which should actually be used to
+ * publish given relation and return them.
+ *
+ * If publish_as_relid_p is passed, the relation whose tuple descriptor should
+ * be used to publish the data is stored in *publish_as_relid_p.
+ *
+ * If pubactions is passed, update the structure according to the matching
+ * publications.
+ */
+List *
+GetEffectiveRelationPublications(Oid relid, List *publications,
+ Oid *publish_as_relid_p,
+ PublicationActions *pubactions)
+{
+	Oid			schemaId = get_rel_namespace(relid);
+	List	   *pubids = GetRelationPublications(relid);
+	/*
+	 * We don't acquire a lock on the namespace system table as we build the
+	 * cache entry using a historic snapshot and all the later changes are
+	 * absorbed while decoding WAL.
+	 */
+	List	   *schemaPubids = GetSchemaPublications(schemaId);
+	ListCell   *lc;
+	Oid			publish_as_relid = relid;
+	int			publish_ancestor_level = 0;
+	bool		am_partition = get_rel_relispartition(relid);
+	char		relkind = get_rel_relkind(relid);
+	List	   *rel_publications = NIL;
+
+	foreach(lc, publications)
+	{
+		Publication *pub = lfirst(lc);
+		bool		publish = false;
+
+		/*
+		 * Under what relid should we publish changes in this publication?
+		 * We'll use the top-most relid across all publications. Also track
+		 * the ancestor level for this publication.
+		 */
+		Oid	pub_relid = relid;
+		int	ancestor_level = 0;
+
+		/*
+		 * If this is a FOR ALL TABLES publication, pick the partition root
+		 * and set the ancestor level accordingly.
+		 */
+		if (pub->alltables)
+		{
+			publish = true;
+			if (pub->pubviaroot && am_partition)
+			{
+List	   *ancestors = get_partition_ancestors(relid);
+
+pub_relid = llast_oid(ancestors);
+ancestor_level = list_length(ancestors);
+			}
+		}
+
+		if (!publish)
+		{
+			bool		ancestor_published = false;
+
+			/*
+			 * For a partition, check if any of the ancestors are published.
+			 * If so, note down the topmost ancestor that is published via
+			 * this publication, which will be used as the relation via which
+			 * to publish the partition's changes.
+			 */
+			if 

meson documentation build open issues

2023-03-15 Thread Peter Eisentraut
I have identified several open issues with the documentation build under 
Meson (approximately in priority order):


1. Image files are not handled at all, so they don't show up in the 
final product.


2. Defaults to website stylesheet, no way to configure.  This should be 
adjusted to match the make build.


3. The various build targets and their combinations are mismatched and 
incomplete.  For example:


Top-level GNUmakefile has these targets:

- docs (builds html and man)
- html
- man

(Those are the formats that are part of a distribution build.)

doc/src/sgml/Makefile has these documented targets:

- default target is html
- all (builds html and man, maps to top-level "docs")
- html
- man
- postgres-A4.pdf
- postgres-US.pdf
- check

as well as (undocumented):

- htmlhelp
- postgres.html
- postgres.txt
- epub
- postgres.epub
- postgres.info

meson has the following documented targets:

- docs (builds only html)
- alldocs (builds all formats, including obscure ones)

as well as the following undocumented targets:

- html
- man
- html_help [sic]
- postgres-A4.pdf
- postgres-US.pdf
- postgres.epub

- [info is not implemented at all]
- [didn't find an equivalent of check]

As you can see, this is all over the place.  I'd like to arrive at some 
consistency across all build systems for handling each tier of 
documentation formats, in terms of what is documented, what the targets 
are named, and how they are grouped.


4. There doesn't appear to be a way to install the documentation.
(There are also some open questions in the top-level meson.build about
the installation directories, but I suppose if we can't install them
then exactly where to install them hasn't been thought about too
much.)

5. There doesn't appear to be an equivalent of "make world" and "make
install-world" that includes documentation builds.




Re: Combine pg_walinspect till_end_of_wal functions with others

2023-03-15 Thread Bharath Rupireddy
On Wed, Mar 15, 2023 at 12:27 PM Michael Paquier  wrote:
>
> On Tue, Mar 14, 2023 at 07:05:20PM -0700, Andres Freund wrote:
> > It's using ICU, but not a specific collation. The build I linked to is WIP
> > hackery to add ICU support to windows CI. Here's the initdb output:
> > https://api.cirrus-ci.com/v1/artifact/task/6288336663347200/testrun/build/testrun/pg_walinspect/regress/log/initdb.log
>
> Hmm.  Thanks.  In the end, I think I would be tempted to just
> remove this \dx query and move on.  I did not anticipate that this
> ordering would be so sensitive, and the solution of using a
> COLLATE C at the end of a describe.c query does not sound very
> appealing, either.

-1 for removing \dx+ for pg_walinspect version 1.0, because we wanted
to show the difference in functions along with testing the upgrade
path in oldextversions.sql. Therefore, I prefer something like [1]:

[1]
diff --git a/contrib/pg_walinspect/sql/oldextversions.sql
b/contrib/pg_walinspect/sql/oldextversions.sql
index 258a009888..32a059c72d 100644
--- a/contrib/pg_walinspect/sql/oldextversions.sql
+++ b/contrib/pg_walinspect/sql/oldextversions.sql
@@ -5,8 +5,17 @@ CREATE EXTENSION pg_walinspect WITH VERSION '1.0';
 -- Mask DETAIL messages as these could refer to current LSN positions.
 \set VERBOSITY terse

+-- \dx+ will give locale-sensitive results, so we can't use it here.
+CREATE VIEW list_pg_walinspect_objects AS
+SELECT pg_describe_object(classid, objid, 0) AS "Object description"
+  FROM pg_depend
+  WHERE refclassid = 'pg_extension'::regclass AND
+refobjid = (SELECT oid FROM pg_extension WHERE
extname = 'pg_walinspect') AND
+deptype = 'e'
+  ORDER BY pg_describe_object(classid, objid, 0) COLLATE "C";
+
 -- List what version 1.0 contains
-\dx+ pg_walinspect
+SELECT * FROM list_pg_walinspect_objects;

 -- Make sure checkpoints don't interfere with the test.
 SELECT 'init' FROM
pg_create_physical_replication_slot('regress_pg_walinspect_slot',
true, false);
@@ -25,7 +34,7 @@ SELECT COUNT(*) >= 1 AS ok FROM
pg_get_wal_stats_till_end_of_wal('/F
 ALTER EXTENSION pg_walinspect UPDATE TO '1.1';

 -- List what version 1.1 contains
-\dx+ pg_walinspect
+SELECT * FROM list_pg_walinspect_objects;

 SELECT pg_drop_replication_slot('regress_pg_walinspect_slot');

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Add pg_walinspect function with block info columns

2023-03-15 Thread Michael Paquier
On Wed, Mar 15, 2023 at 12:13:56PM +0530, Bharath Rupireddy wrote:
> How about something like the attached? It adds the per-record columns
> to pg_get_wal_block_info() avoiding "possibly expensive" joins with
> pg_get_wal_records_info().
> 
> With this, pg_get_wal_records_info() too will be useful for users
> scanning WAL at record level. That is to say that we can retain both
> pg_get_wal_records_info() and pg_get_wal_block_info().

FWIW, I am not convinced that there is any need to further bloat the
attributes of these functions, as filters for records could basically
touch all the fields returned by pg_get_wal_records_info().  What
about adding an example in the docs with the LATERAL query I mentioned
previously?
--
Michael




Re: Combine pg_walinspect till_end_of_wal functions with others

2023-03-15 Thread Michael Paquier
On Tue, Mar 14, 2023 at 07:05:20PM -0700, Andres Freund wrote:
> It's using ICU, but not a specific collation. The build I linked to is WIP
> hackery to add ICU support to windows CI. Here's the initdb output:
> https://api.cirrus-ci.com/v1/artifact/task/6288336663347200/testrun/build/testrun/pg_walinspect/regress/log/initdb.log

Hmm.  Thanks.  In the end, I think I would be tempted to just
remove this \dx query and move on.  I did not anticipate that this
ordering would be so sensitive, and the solution of using a
COLLATE C at the end of a describe.c query does not sound very
appealing, either.
--
Michael




Re: Add pg_walinspect function with block info columns

2023-03-15 Thread Michael Paquier
On Tue, Mar 14, 2023 at 06:50:15PM -0700, Peter Geoghegan wrote:
> On Tue, Mar 14, 2023 at 5:34 PM Melanie Plageman
>  wrote:
>> Well, I think if you only care about the WAL record-level information
>> and not the block-level information, having the WAL record information
>> denormalized like that with all the block information would be a
>> nuisance.
> 
> I generally care about both. When I want to look at things at the
> pg_get_wal_records_info() level (as opposed to a summary), the
> block_ref information is *always* of primary importance. I don't want
> to have to write my own bug-prone parser for block_ref, but why should
> the only alternative be joining against pg_get_wal_block_info()? The
> information that I'm interested in is "close at hand" to
> pg_get_wal_records_info() already.
>

I am not sure I get the concern here.  As long as one is smart enough
with SQL, there is no need to perform two scans of the contents of
pg_wal from the same start LSN.  If one wishes to extract only the
blocks for a given record type, or for any other filter of your
choice, it is possible to use a LATERAL on pg_get_wal_block_info(),
say:
SELECT r.start_lsn, b.blockid
  FROM pg_get_wal_records_info('0/0128', '0/1911AA8') AS r,
  LATERAL pg_get_wal_block_info(start_lsn, end_lsn) as b
  WHERE r.resource_manager = 'Heap2';

This will extract the block information that you'd want for a given
record type.
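
For example, to narrow that down further to a single record type
rather than a whole resource manager, the same pattern works (a
sketch; 'PRUNE' is just an illustrative record_type value):

SELECT r.start_lsn, r.record_type, b.blockid, b.reldatabase
  FROM pg_get_wal_records_info('0/0128', '0/1911AA8') AS r,
  LATERAL pg_get_wal_block_info(r.start_lsn, r.end_lsn) AS b
  WHERE r.record_type = 'PRUNE';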

> I understand that in the general case there might be quite a few
> blocks associated with a WAL record. For complicated cases,
> pg_get_wal_block_info() does make sense. However, the vast majority of
> individual WAL records (and possibly most WAL record types) are
> related to one block only. One block that is generally from the
> relation's main fork.

Sure, though there may be more complicated scenarios, like custom
RMGRs.  In the end it comes down to how much normalization should be
applied to the extracted data.  FWIW, I think that the current
interface strikes a pretty good balance in usability.
--
Michael




RE: Allow logical replication to copy tables in binary format

2023-03-15 Thread Takamichi Osumi (Fujitsu)
Hi,


On Wednesday, March 15, 2023 2:34 PM Amit Kapila  
wrote:
> On Tue, Mar 14, 2023 at 8:50 PM Takamichi Osumi (Fujitsu)
>  wrote:
> >
> > On Tuesday, March 14, 2023 8:02 PM Melih Mutlu
>  wrote:
> > (3) copy_table()
> >
> > +   /*
> > +* If the publisher version is earlier than v14, it COPY command
> doesn't
> > +* support the binary option.
> > +*/
> >
> > This sentence doesn't look correct grammatically. We can replace "it COPY
> command" with "subscription" for example. Kindly please fix it.
> >
> 
> How about something like: "The binary option for replication is supported 
> since
> v14."?
Yes, this looks best to me. I agree with this suggestion.


Best Regards,
Takamichi Osumi



Re: ICU locale validation / canonicalization

2023-03-15 Thread Jeff Davis
On Tue, 2023-03-14 at 10:10 -0700, Jeff Davis wrote:
> One loose end is that we really should support language tags like
> "und"
> in those older versions (54 and earlier). Your commit d72900bded
> avoided the problem, but perhaps we should fix it by looking for
> "und"
> and replacing it with "root" while opening, or something.

Attached are a few patches to implement this idea.
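
As a quick illustration of the intended behavior (a hypothetical
session; with these patches, a language tag like 'und' should open the
ICU root locale even on older ICU versions):

CREATE COLLATION undcoll (provider = icu, locale = 'und');
SELECT 'abc' < 'ABD' COLLATE undcoll;  -- ordered per the root locale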

-- 
Jeff Davis
PostgreSQL Contributor Team - AWS


From fa789a446b425cd491a0452acb2b2f135ce06f5a Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Tue, 14 Mar 2023 09:58:29 -0700
Subject: [PATCH v4 1/3] Support language tags in older ICU versions (53 and
 earlier).

By calling uloc_canonicalize() before parsing the attributes, the
existing locale attribute parsing logic works on language tags as
well.

Fix a small memory leak, too.
---
 src/backend/commands/collationcmds.c  |  8 +++---
 src/backend/utils/adt/pg_locale.c | 26 ---
 .../regress/expected/collate.icu.utf8.out |  8 ++
 src/test/regress/sql/collate.icu.utf8.sql |  4 +++
 4 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 8949684afe..b8f2e7059f 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -950,7 +950,6 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 			const char *name;
 			char	   *langtag;
 			char	   *icucomment;
-			const char *iculocstr;
 			Oid			collid;
 
 			if (i == -1)
@@ -959,20 +958,19 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 name = uloc_getAvailable(i);
 
 			langtag = get_icu_language_tag(name);
-			iculocstr = U_ICU_VERSION_MAJOR_NUM >= 54 ? langtag : name;
 
 			/*
 			 * Be paranoid about not allowing any non-ASCII strings into
 			 * pg_collation
 			 */
-			if (!pg_is_ascii(langtag) || !pg_is_ascii(iculocstr))
+			if (!pg_is_ascii(langtag))
 				continue;
 continue;
 
 			collid = CollationCreate(psprintf("%s-x-icu", langtag),
 	 nspid, GetUserId(),
 	 COLLPROVIDER_ICU, true, -1,
-	 NULL, NULL, iculocstr, NULL,
-	 get_collation_actual_version(COLLPROVIDER_ICU, iculocstr),
+	 NULL, NULL, langtag, NULL,
+	 get_collation_actual_version(COLLPROVIDER_ICU, langtag),
 	 true, true);
 			if (OidIsValid(collid))
 			{
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 1d3d4d86d3..b9c7fbd511 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -2643,9 +2643,28 @@ pg_attribute_unused()
 static void
 icu_set_collation_attributes(UCollator *collator, const char *loc)
 {
-	char	   *str = asc_tolower(loc, strlen(loc));
+	UErrorCode	status;
+	int32_t		len;
+	char	   *icu_locale_id;
+	char	   *lower_str;
+	char	   *str;
 
-	str = strchr(str, '@');
+	/* first, make sure the string is an ICU format locale ID */
+	status = U_ZERO_ERROR;
+	len = uloc_canonicalize(loc, NULL, 0, &status);
+	icu_locale_id = palloc(len + 1);
+	status = U_ZERO_ERROR;
+	len = uloc_canonicalize(loc, icu_locale_id, len + 1, &status);
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("canonicalization failed for locale string \"%s\": %s",
+						loc, u_errorName(status))));
+
+	lower_str = asc_tolower(icu_locale_id, strlen(icu_locale_id));
+
+	pfree(icu_locale_id);
+
+	str = strchr(lower_str, '@');
 	if (!str)
 		return;
 	str++;
@@ -2660,7 +2679,6 @@ icu_set_collation_attributes(UCollator *collator, const char *loc)
 			char	   *value;
 			UColAttribute uattr;
 			UColAttributeValue uvalue;
-			UErrorCode	status;
 
 			status = U_ZERO_ERROR;
 
@@ -2727,6 +2745,8 @@ icu_set_collation_attributes(UCollator *collator, const char *loc)
 								loc, u_errorName(status))));
 		}
 	}
+
+	pfree(lower_str);
 }
 
 #endif			/* USE_ICU */
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 9a3e12e42d..6225b575ce 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1304,6 +1304,14 @@ SELECT 'abc' <= 'ABC' COLLATE case_insensitive, 'abc' >= 'ABC' COLLATE case_inse
  t| t
 (1 row)
 
+-- test language tags
+CREATE COLLATION lt_insensitive (provider = icu, locale = 'en-u-ks-level1', deterministic = false);
+SELECT 'aBcD' COLLATE lt_insensitive = 'AbCd' COLLATE lt_insensitive;
+ ?column? 
+--
+ t
+(1 row)
+
 CREATE TABLE test1cs (x text COLLATE case_sensitive);
 CREATE TABLE test2cs (x text COLLATE case_sensitive);
 CREATE TABLE test3cs (x text COLLATE case_sensitive);
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 0790068f31..64cbfd0a5b 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -518,6 +518,10 @@ CREATE COLLATION case_insensitive (provider = icu, locale = '@colStrength=second
 SELECT 'abc' <= 'ABC' COLLATE 

Re: Add pg_walinspect function with block info columns

2023-03-15 Thread Bharath Rupireddy
On Wed, Mar 15, 2023 at 7:20 AM Peter Geoghegan  wrote:
>
> > But, perhaps you are suggesting a parameter to pg_get_wal_records_info()
> > like "with_block_info" or something, which produces the full
> > denormalized block + record output?
>
> I was thinking of something like that, yes -- though it wouldn't
> necessarily have to be the *full* denormalized block_ref info, the FPI
> itself, etc. Just the more useful stuff.
>
> It occurs to me that my concern about the information that
> pg_get_wal_records_info() lacks could be restated as a concern about
> what pg_get_wal_block_info() lacks: pg_get_wal_block_info() fails to
> show basic information about the WAL record whose blocks it reports
> on, even though it could easily show all of the
> pg_get_wal_records_info() info once per block (barring block_ref). So
> addressing my concern by adjusting pg_get_wal_block_info() might be
> the best approach. I'd probably be happy with that -- I'd likely just
> stop using pg_get_wal_records_info() completely under this scheme.

How about something like the attached? It adds the per-record columns
to pg_get_wal_block_info() avoiding "possibly expensive" joins with
pg_get_wal_records_info().

With this, pg_get_wal_records_info() too will be useful for users
scanning WAL at record level. That is to say that we can retain both
pg_get_wal_records_info() and pg_get_wal_block_info().
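
With the patch, block-level analysis no longer needs a join to get the
record context; a sketch of the resulting usage (column names as in
the SQL changes below):

SELECT start_lsn, resource_manager, record_type, blockid, reldatabase
  FROM pg_get_wal_block_info('0/0128', '0/1911AA8');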

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From c98e250011d9944b44fe166bb8bdf8ac81ac9955 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Wed, 15 Mar 2023 06:33:31 +
Subject: [PATCH v1] Emit WAL record info via pg_get_wal_block_info()

---
 .../pg_walinspect/pg_walinspect--1.0--1.1.sql | 10 +++-
 contrib/pg_walinspect/pg_walinspect.c | 48 ++-
 doc/src/sgml/pgwalinspect.sgml| 38 +--
 3 files changed, 68 insertions(+), 28 deletions(-)

diff --git a/contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql b/contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql
index 586c3b4467..46fcaff3a1 100644
--- a/contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql
+++ b/contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql
@@ -12,7 +12,15 @@ DROP FUNCTION pg_get_wal_stats_till_end_of_wal(pg_lsn, boolean);
 --
 CREATE FUNCTION pg_get_wal_block_info(IN start_lsn pg_lsn,
 	IN end_lsn pg_lsn,
-	OUT lsn pg_lsn,
+OUT start_lsn pg_lsn,
+OUT end_lsn pg_lsn,
+OUT prev_lsn pg_lsn,
+OUT xid xid,
+OUT resource_manager text,
+OUT record_type text,
+OUT record_length int4,
+OUT main_data_length int4,
+OUT description text,
 	OUT blockid int2,
 	OUT reltablespace oid,
 	OUT reldatabase oid,
diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c
index 3b3215daf5..d766a050a4 100644
--- a/contrib/pg_walinspect/pg_walinspect.c
+++ b/contrib/pg_walinspect/pg_walinspect.c
@@ -43,7 +43,8 @@ static XLogRecPtr GetCurrentLSN(void);
 static XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
 static XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
 static void GetWALRecordInfo(XLogReaderState *record, Datum *values,
-			 bool *nulls, uint32 ncols);
+			 bool *nulls, uint32 ncols,
+			 bool fetch_blk_ref);
 static void GetWALRecordsInfo(FunctionCallInfo fcinfo,
 			  XLogRecPtr start_lsn,
 			  XLogRecPtr end_lsn);
@@ -180,7 +181,8 @@ ReadNextXLogRecord(XLogReaderState *xlogreader)
  */
 static void
 GetWALRecordInfo(XLogReaderState *record, Datum *values,
- bool *nulls, uint32 ncols)
+ bool *nulls, uint32 ncols,
+ bool fetch_blk_ref)
 {
 	const char *id;
 	RmgrData	desc;
@@ -200,8 +202,11 @@ GetWALRecordInfo(XLogReaderState *record, Datum *values,
 	desc.rm_desc(&rec_desc, record);
 
 	/* Block references. */
-	initStringInfo(&rec_blk_ref);
-	XLogRecGetBlockRefInfo(record, false, true, &rec_blk_ref, &fpi_len);
+	if (fetch_blk_ref)
+	{
+		initStringInfo(&rec_blk_ref);
+		XLogRecGetBlockRefInfo(record, false, true, &rec_blk_ref, &fpi_len);
+	}
 
 	main_data_len = XLogRecGetDataLen(record);
 
@@ -213,9 +218,14 @@ GetWALRecordInfo(XLogReaderState *record, Datum *values,
 	values[i++] = CStringGetTextDatum(id);
 	values[i++] = UInt32GetDatum(XLogRecGetTotalLen(record));
 	values[i++] = UInt32GetDatum(main_data_len);
-	values[i++] = UInt32GetDatum(fpi_len);
+
+	if (fetch_blk_ref)
+		values[i++] = UInt32GetDatum(fpi_len);
+
 	values[i++] = CStringGetTextDatum(rec_desc.data);
-	values[i++] = CStringGetTextDatum(rec_blk_ref.data);
+
+	if (fetch_blk_ref)
+		values[i++] = CStringGetTextDatum(rec_blk_ref.data);
 
 	Assert(i == ncols);
 }
@@ -228,19 +238,30 @@ GetWALRecordInfo(XLogReaderState *record, Datum *values,
 static void
 GetWALBlockInfo(FunctionCallInfo fcinfo, XLogReaderState *record)
 {
-#define PG_GET_WAL_BLOCK_INFO_COLS 11
+#define PG_GET_WAL_BLOCK_INFO_COLS 19
+#define PG_GET_WAL_BLOCK_INFO_PER_RECORD_COLS 9
+	Datum		values[PG_GET_WAL_BLOCK_INFO_COLS] = {0};
+	bool		

Re: Allow logical replication to copy tables in binary format

2023-03-15 Thread Amit Kapila
On Wed, Mar 15, 2023 at 11:52 AM Peter Smith  wrote:
>
> ==
> src/backend/replication/logical/tablesync.c
>
> 5.
> +
> + /*
> + * If the publisher version is earlier than v14, it COPY command doesn't
> + * support the binary option.
> + */
> + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 14 &&
> + MySubscription->binary)
> + {
+ appendStringInfo(&cmd, " WITH (FORMAT binary)");
> + options = lappend(options, makeDefElem("format", (Node *)
> makeString("binary"), -1));
> + }
>
> Sorry, I gave a poor review comment for this previously. Now I have
> revisited all the thread discussions about version checking. I feel
> that some explanation should be given in the code comment so that
> future readers of this code can understand why you decided to use v14
> checking.
>
> Something like this:
>
> SUGGESTION
> If the publisher version is earlier than v14, we use text format COPY.
>

I think this doesn't make it explicit that the binary format has been
supported since v14. So, I would prefer my version of the comment, as
suggested in the previous email.

-- 
With Regards,
Amit Kapila.




Re: Allow logical replication to copy tables in binary format

2023-03-15 Thread Peter Smith
Here are some review comments for v13-0001

==
doc/src/sgml/logical-replication.sgml

1.
@@ -241,10 +241,13 @@
types of the columns do not need to match, as long as the text
representation of the data can be converted to the target type.  For
example, you can replicate from a column of type integer to a
-   column of type bigint.  The target table can also have
-   additional columns not provided by the published table.  Any such columns
-   will be filled with the default value as specified in the definition of the
-   target table.
+   column of type bigint.  However, replication in
binary format is
+   type specific and does not allow to replicate data between different types
+   according to its restrictions (See binary option of
+   CREATE
SUBSCRIPTION
+   for details).  The target table can also have additional columns
not provided
+   by the published table.  Any such columns will be filled with the default
+   value as specified in the definition of the target table.
   

I don’t really think we should mention details of the binary-format
problems here, because then:
i) it just duplicates information already on the CREATE SUBSCRIPTION page
ii) you will miss some restrictions (e.g., here you said something
about "type specific" but didn't mention that send/receive functions
would be mandatory for the copy_data option)

That's why in the previous v12 review [1] (comment #3) I suggested
that this page should just say something quite generic like "However,
replication in binary format is more restrictive", and link back to
the other page which has all the gory details.

==
doc/src/sgml/ref/alter_subscription.sgml

2.
My previous v12 review [1] (comment #6) suggested maybe updating this
page. But it was not done in v13. Did you accidentally miss the review
comment, or choose not to do it?

==
doc/src/sgml/ref/create_subscription.sgml

3.
  
-  Specifies whether the subscription will request the publisher to
-  send the data in binary format (as opposed to text).
-  The default is false.
-  Even when this option is enabled, only data types having
-  binary send and receive functions will be transferred in binary.
+  Specifies whether the subscription will request the publisher to send
+  the data in binary format (as opposed to text). The default is
+  false. Any initial table synchronization copy
+  (see copy_data) also uses the same format. Binary
+  format can be faster than the text format, but it is less portable
+  across machine architectures and PostgreSQL versions.
Binary format is
+  also very data type specific, it will not allow copying
between different
+  column types as opposed to text format. Even when this
option is enabled,
+  only data types having binary send and receive functions will be
+  transferred in binary. Note that the initial synchronization requires
+  all data types to have binary send and receive functions, otherwise
+  the synchronization will fail.
  


BEFORE
Binary format is also very data type specific, it will not allow
copying between different column types as opposed to text format.

SUGGESTION (worded more similar to what is already on the COPY page [2])
Binary format is very data type specific; for example, it will not
allow copying from a smallint column to an integer column, even though
that would work fine in text format.
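
To make the restriction concrete, here is a hypothetical setup that
would trip over it (table, publication, and subscription names are
made up for illustration):

-- publisher:  CREATE TABLE t (a smallint);
-- subscriber: CREATE TABLE t (a integer);
CREATE SUBSCRIPTION binsub
  CONNECTION 'host=publisher dbname=postgres'
  PUBLICATION pub_t
  WITH (binary = true);
-- The initial table copy fails, because binary COPY requires matching
-- column types; the same subscription with binary = false would work.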


~~~

4.

+ 
+  If the publisher is a PostgreSQL version
+  before 14, then any initial table synchronization will use
text format
+  even if this option is enabled.
+ 

IMO it will be clearer to explicitly say the option instead of 'this option'.

SUGGESTION
If the publisher is a PostgreSQL version
before 14, then any initial table synchronization will use text format
even if binary = true.


==
src/backend/replication/logical/tablesync.c

5.
+
+ /*
+ * If the publisher version is earlier than v14, it COPY command doesn't
+ * support the binary option.
+ */
+ if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 14 &&
+ MySubscription->binary)
+ {
+ appendStringInfo(&cmd, " WITH (FORMAT binary)");
+ options = lappend(options, makeDefElem("format", (Node *)
makeString("binary"), -1));
+ }

Sorry, I gave a poor review comment for this previously. Now I have
revisited all the thread discussions about version checking. I feel
that some explanation should be given in the code comment so that
future readers of this code can understand why you decided to use v14
checking.

Something like this:

SUGGESTION
If the publisher version is earlier than v14, we use text format COPY.
Note: in fact, the COPY syntax "WITH (FORMAT binary)" has existed since
v9, but since logical replication binary mode transfer was not
introduced until v14, it was decided to check against the later version.

--
[1] PS v12 review -