pg_stat_wal: tracking the compression effect

2022-08-25 Thread Ken Kato



Hi hackers,

We can specify a WAL compression method (for example, lz4 or zstd), but it is 
hard to know how effective the compression is for a given method. There is 
already a way to inspect the compression effect using pg_waldump; however, 
having these statistics in a view makes them more accessible. I am 
proposing to add statistics that keep track of the compression effect to the 
pg_stat_wal view.


The design I am thinking of is below:

 compression_saved | compression_times
-------------------+-------------------
             38741 |                 6


The idea is to accumulate how much space is saved by each compression 
(size before compression minus size after compression), and to keep 
track of how many times compression has happened, so that one can know 
how much space is saved on average.


What do you think?

Regards,

--
Ken Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: pg_basebackup: add test about zstd compress option

2022-08-25 Thread Dong Wook Lee
On Tue, Aug 23, 2022 at 11:37 AM Michael Paquier  wrote:

> It seems to me that checking that the contents generated are valid is
> equally necessary.  We do that with zlib with gzip --test, and you
> could use ${ZSTD} in the context of this test.

Thank you for the good points.
I supplemented the test according to your suggestion.
However, there was a problem.
Even though I export ZSTD in the Makefile, the test runner can't
find ZSTD when the test actually runs.
```
my $zstd = $ENV{ZSTD};
skip "program zstd is not found in your system", 1
  if (!defined $zstd || $zstd eq '');
```
log: regress_log_010_pg_basebackup
```
ok 183 # skip program zstd is not found in your system.
```
Could you check if I missed anything?


v2_add_test_pg_basebackup.patch
Description: Binary data


Re: add checkpoint stats of snapshot and mapping files of pg_logical dir

2022-08-25 Thread Kyotaro Horiguchi
At Wed, 24 Aug 2022 13:34:54 +0900, Michael Paquier  wrote 
in 
> On Wed, Aug 24, 2022 at 10:48:01AM +0900, Kyotaro Horiguchi wrote:
> > By the way, I think we use INSTR_TIME_* macros to measure internal
> > durations (mainly for the monotonic clock characteristics, and to
> > reduce performance degradation on Windows?). I'm not sure that's
> > crucial here but I don't think there's any reason to use
> > GetCurrentTimestamp() instead.
> 
> This implies two calls of gettimeofday(), but that does not worry me
> much in this code path.  There is some consistency with
> CheckpointGuts() where we take timestamps for the sync requests.

Mmm. heap_vacuum_rel does the same. From the other direction, these two
are the only uses of GetCurrentTimestamp() for this purpose. However,
I'm fine with that. Thanks for the info.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-08-25 Thread Alvaro Herrera
On 2022-Aug-24, Peter Eisentraut wrote:

> I don't follow how this is a backpatching hazard.

It changes code.  Any bugfix in the surrounding code would have to fix a
conflict.  That is nonzero effort.  Is it a huge risk?  No, it is very
small risk and a very small cost to fix such a conflict; but my claim is
that this change has zero benefit, therefore we should not incur a
nonzero future effort.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"How amazing is that? I call it a night and come back to find that a bug has
been identified and patched while I sleep."(Robert Davidson)
   http://archives.postgresql.org/pgsql-sql/2006-03/msg00378.php




Re: Letter case of "admin option"

2022-08-25 Thread Alvaro Herrera
On 2022-Aug-25, Kyotaro Horiguchi wrote:

> At Tue, 23 Aug 2022 09:58:47 -0400, Robert Haas  wrote 
> in 

> I would translate "ADMIN OPTION" to "ADMIN OPTION" in Japanese but
> "admin option" is translated to "管理者オプション" which is a bit hard
> for the readers to come up with the connection to "ADMIN OPTION" (or
> ADMIN ). I guess this is somewhat similar to using "You need to
> give the capability to administer the role" to suggest users to add WITH
> ADMIN OPTION to the role.
> 
> Maybe Álvaro has a similar difficulty on it.

Exactly.

I ran a quick poll in a Spanish community.  Everyone who responded (not
many admittedly) agreed with this idea -- they find the message clearer
if the keyword is mentioned explicitly in the translation.

> > In short, I'm wondering whether we should regard ADMIN as the name of
> > the option, but OPTION as part of the GRANT syntax, and hence
> > capitalize it "ADMIN option". However, if the non-English speakers on
> > this list have a strong preference for something else I'm certainly
> > not going to fight about it.
> 
> "ADMIN option" which is translated into "ADMINオプション" is fine by
> me.  I hope Álvaro thinks the same way.

Hmm, but our docs say that the option is called ADMIN OPTION, don't
they?  And I think the standard sees it the same way.  You cannot invoke
it without the word OPTION.  I understand the point of view, but I don't
think it is clearer done that way.  It is different for example with
INHERIT; we could say "the INHERIT option" making the word "option"
translatable in that phrase.  But then you don't have to add that word
in the command.

> What do you think about the attached?

I prefer the ADMIN OPTION interpretation (both for
docs and error messages).  I think it's clearer that way, given that the
syntax is what it is.

> > > >   !is_admin_of_role(currentUserId, roleid))
> > > >   ereport(ERROR,
> > > >   
> > > > (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> > > >errmsg("must have admin option 
> > > > on role \"%s\"",
> > > >   rolename)));
> > >
> > > The message seems a bit short in that it only mentions admin option while
> > > omitting CREATEROLE privilege. "must have CREATEROLE privilege or
> > > admin option on role %s" might be better.  Or we could say just
> > > "insufficient privilege" or "permission denied" in the main error
> > > message then provide "CREATEROLE privilege or admin option on role %s
> > > is required" in DETAILS or HINTS message.

I'm not opposed to moving that part of detail/hint, but I would prefer
that it says "the CREATEROLE privilege or ADMIN OPTION".

> --- a/doc/src/sgml/ref/alter_group.sgml
> +++ b/doc/src/sgml/ref/alter_group.sgml
> @@ -55,7 +55,7 @@ ALTER GROUP  class="parameter">group_name RENAME TO  REVOKE. Note that
> GRANT and REVOKE have additional
> options which are not available with this command, such as the ability
> -   to grant and revoke ADMIN OPTION, and the ability to
> +   to grant and revoke ADMIN option, and the ability to
> specify the grantor.
>

I think the original reads better.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"La libertad es como el dinero; el que no la sabe emplear la pierde" (Alvarez)




Re: pg_receivewal and SIGTERM

2022-08-25 Thread Daniel Gustafsson
> On 23 Aug 2022, at 02:15, Michael Paquier  wrote:
> 
> On Mon, Aug 22, 2022 at 04:05:16PM +0200, Christoph Berg wrote:
>> Do you mean it can, or can not be backpatched? (I'd argue for
>> backpatching since the behaviour is slightly broken at the moment.)
> 
> I mean that it is fine to backpatch that, in my opinion.

I think this can be argued both for and against backpatching.  Catching SIGTERM
makes a lot of sense, especially given systemd's behavior.  On the other hand,
this adds functionality to something arguably working as intended, regardless
of what one thinks about the intent.

The attached adds the Exit Status section, which is already present in the
pg_receivewal docs, to the pg_recvlogical docs to make them more aligned,
and tweaks comments to pgindent standards. This is the version I think is
ready to commit.

--
Daniel Gustafsson   https://vmware.com/



v2-0001-Handle-SIGTERM-in-pg_receivewal-and-pg_recvlogica.patch
Description: Binary data


Re: Column Filtering in Logical Replication

2022-08-25 Thread vignesh C
On Tue, Aug 23, 2022 at 7:52 AM Peter Smith  wrote:
>
> On Mon, Aug 22, 2022 at 9:25 PM vignesh C  wrote:
> >
> ...
>
> > Few comments:
> > 1) I felt no expressions are allowed in case of column filters. Only
> > column names can be specified. The second part of the sentence
> > confuses what is allowed and what is not allowed. Won't it be better
> > to remove the second sentence and mention that only column names can
> > be specified?
> > +   
> > +Column list can contain only simple column references. Complex
> > +expressions, function calls etc. are not allowed.
> > +   
> >
>
> This wording was lifted verbatim from the commit message [1]. But I
> see your point that it just seems to be overcomplicating a simple
> rule. Modified as suggested.
>
> > 2) tablename should be table name.
> > +   
> > +A column list is specified per table following the tablename, and
> > enclosed by
> > +parenthesis. See  for details.
> > +   
> >
> > We have used table name in the same page in other instances like:
> > a) The row filter is defined per table. Use a WHERE clause after the
> > table name for each published table that requires data to be filtered
> > out. The WHERE clause must be enclosed by parentheses.
> > b) The tables are matched between the publisher and the subscriber
> > using the fully qualified table name.
> >
>
> Fixed as suggested.
>
> > 3) One small whitespace issue:
> > git am v2-0001-Column-List-replica-identity-rules.patch
> > Applying: Column List replica identity rules.
> > .git/rebase-apply/patch:30: trailing whitespace.
> >if the publication publishes only INSERT operations.
> > warning: 1 line adds whitespace errors.
> >
>
> Fixed.
>
> ~~~
>
> PSA the v3* patch set.

Thanks for the updated patch.
Few comments:
1) We can shuffle the columns in the publisher and subscriber tables to
show that the order of the columns does not matter
+   
+Create a publication p1. A column list is defined for
+table t1 to reduce the number of columns that will be
+replicated.
+
+test_pub=# CREATE PUBLICATION p1 FOR TABLE t1 (id, a, b, c);
+CREATE PUBLICATION
+test_pub=#
+

2) We can try to keep the line content to less than 80 chars
+   
+A column list is specified per table following the tablename, and
enclosed by
+parenthesis. See  for details.
+   

3) tablename should be table name like it is used in other places
+   
+A column list is specified per table following the tablename, and
enclosed by
+parenthesis. See  for details.
+   

4a) In the below, you could include mentioning "Only the column list
data of publication p1 are replicated."
+
+ Insert some rows to table t1.
+
+test_pub=# INSERT INTO t1 VALUES(1, 'a-1', 'b-1', 'c-1', 'd-1', 'e-1');
+INSERT 0 1

4b) In the above, we could mention that the insert should be done on
the "publisher side" as the previous statements are executed on the
subscriber side.

Regards,
Vignesh




Re: use SSE2 for is_valid_ascii

2022-08-25 Thread John Naylor
v3 applies on top of the v9 json_lex_string patch in [1] and adds a
bit more to that, resulting in a simpler patch that is more amenable
to additional SIMD-capable platforms.

[1] 
https://www.postgresql.org/message-id/CAFBsxsFV4v802idV0-Bo%3DV7wLMHRbOZ4er0hgposhyGCikmVGA%40mail.gmail.com

-- 
John Naylor
EDB: http://www.enterprisedb.com
diff --git a/src/common/wchar.c b/src/common/wchar.c
index 1e6e198bf2..1ca7533f00 100644
--- a/src/common/wchar.c
+++ b/src/common/wchar.c
@@ -1918,11 +1918,12 @@ pg_utf8_verifystr(const unsigned char *s, int len)
 	const int	orig_len = len;
 	uint32		state = BGN;
 
-/*
- * Sixteen seems to give the best balance of performance across different
- * byte distributions.
- */
-#define STRIDE_LENGTH 16
+	/*
+	 * With a stride of two vector widths, gcc will unroll the loop. Even if
+	 * the compiler can unroll a longer loop, it's not worth it because we
+	 * must fall back to the byte-wise algorithm if we find any non-ASCII.
+	 */
+#define STRIDE_LENGTH (2 * sizeof(Vector8))
 
 	if (len >= STRIDE_LENGTH)
 	{
diff --git a/src/include/mb/pg_wchar.h b/src/include/mb/pg_wchar.h
index 011b0b3abd..aea045aa66 100644
--- a/src/include/mb/pg_wchar.h
+++ b/src/include/mb/pg_wchar.h
@@ -19,6 +19,8 @@
 #ifndef PG_WCHAR_H
 #define PG_WCHAR_H
 
+#include "port/simd.h"
+
 /*
  * The pg_wchar type
  */
@@ -704,25 +706,28 @@ extern WCHAR *pgwin32_message_to_UTF16(const char *str, int len, int *utf16len);
  * Verify a chunk of bytes for valid ASCII.
  *
  * Returns false if the input contains any zero bytes or bytes with the
- * high-bit set. Input len must be a multiple of 8.
+ * high-bit set. Input len must be a multiple of the chunk size (8 or 16).
  */
 static inline bool
 is_valid_ascii(const unsigned char *s, int len)
 {
 	const unsigned char *const s_end = s + len;
-	uint64		chunk,
-highbit_cum = UINT64CONST(0),
-zero_cum = UINT64CONST(0x8080808080808080);
+	Vector8		chunk;
+	Vector8		highbit_cum = vector8_broadcast(0);
+#ifdef USE_NO_SIMD
+	Vector8		zero_cum = vector8_broadcast(0x80);
+#endif
 
 	Assert(len % sizeof(chunk) == 0);
 
 	while (s < s_end)
 	{
-		memcpy(&chunk, s, sizeof(chunk));
+		vector8_load(&chunk, s);
+
+		/* Capture any zero bytes in this chunk. */
+#if defined(USE_NO_SIMD)
 
 		/*
-		 * Capture any zero bytes in this chunk.
-		 *
 		 * First, add 0x7f to each byte. This sets the high bit in each byte,
 		 * unless it was a zero. If any resulting high bits are zero, the
 		 * corresponding high bits in the zero accumulator will be cleared.
@@ -734,20 +739,31 @@ is_valid_ascii(const unsigned char *s, int len)
 		 * because we check for those separately.
 		 */
 		zero_cum &= (chunk + UINT64CONST(0x7f7f7f7f7f7f7f7f));
+#else
+
+		/*
+		 * Set all bits in each lane of the highbit accumulator where input
+		 * bytes are zero.
+		 */
+		highbit_cum = vector8_or(highbit_cum,
+ vector8_eq(chunk, vector8_broadcast(0)));
+#endif
 
 		/* Capture all set bits in this chunk. */
-		highbit_cum |= chunk;
+		highbit_cum = vector8_or(highbit_cum, chunk);
 
 		s += sizeof(chunk);
 	}
 
 	/* Check if any high bits in the high bit accumulator got set. */
-	if (highbit_cum & UINT64CONST(0x8080808080808080))
+	if (vector8_is_highbit_set(highbit_cum))
 		return false;
 
+#ifdef USE_NO_SIMD
 	/* Check if any high bits in the zero accumulator got cleared. */
-	if (zero_cum != UINT64CONST(0x8080808080808080))
+	if (zero_cum != vector8_broadcast(0x80))
 		return false;
+#endif
 
 	return true;
 }
diff --git a/src/include/port/simd.h b/src/include/port/simd.h
index 56df989094..8f85153110 100644
--- a/src/include/port/simd.h
+++ b/src/include/port/simd.h
@@ -38,6 +38,7 @@ typedef __m128i Vector8;
  * If no SIMD instructions are available, we can in some cases emulate vector
  * operations using bitwise operations on unsigned integers.
  */
+#define USE_NO_SIMD
 typedef uint64 Vector8;
 #endif
 
@@ -47,7 +48,11 @@ static inline Vector8 vector8_broadcast(const uint8 c);
 static inline bool vector8_has_zero(const Vector8 v);
 static inline bool vector8_has(const Vector8 v, const uint8 c);
 static inline bool vector8_has_le(const Vector8 v, const uint8 c);
-
+static inline bool vector8_is_highbit_set(const Vector8 v);
+static inline Vector8 vector8_or(const Vector8 v1, const Vector8 v2);
+#ifndef USE_NO_SIMD
+static inline Vector8 vector8_eq(const Vector8 v1, const Vector8 v2);
+#endif
 
 /*
  * Functions for loading a chunk of memory into a vector.
@@ -181,4 +186,38 @@ vector8_has_le(const Vector8 v, const uint8 c)
 	return result;
 }
 
+static inline bool
+vector8_is_highbit_set(const Vector8 v)
+{
+#ifdef USE_SSE2
+	return _mm_movemask_epi8(v) != 0;
+#else
+	return v & vector8_broadcast(0x80);
+#endif
+}
+
+/* comparisons between vectors */
+
+#ifndef USE_NO_SIMD
+static inline Vector8
+vector8_eq(const Vector8 v1, const Vector8 v2)
+{
+#ifdef USE_SSE2
+	return _mm_cmpeq_epi8(v1, v2);
+#endif
+}
+#endif
+
+/* bitwise operations */
+
+static inline Vector8
+vector8_or(const Vector8 v

Re: Making Vars outer-join aware

2022-08-25 Thread Richard Guo
On Thu, Aug 25, 2022 at 5:18 AM Tom Lane  wrote:

> Richard Guo  writes:
> > Do we need to also
> > generate two SpecialJoinInfos for the B/C join in the first order, with
> > and without the A/B join in its min_lefthand?
>
> No, the SpecialJoinInfos would stay as they are now.  It's already the
> case that the first join's min_righthand would contain only B, and
> the second one's min_righthand would contain only C.


I'm not sure if I understand it correctly. If we are given the first
order from the parser, the SpecialJoinInfo for the B/C join would have
min_lefthand as containing both B and the A/B join. And this
SpecialJoinInfo would make the B/C join invalid, which is not what we
want. Currently the patch resolves this by explicitly running
remove_unneeded_nulling_relids, and the A/B join would be removed from
B/C join's min_lefthand, if Pbc is strict for B.

Do we still need this kind of fixup if we are to keep just one form of
SpecialJoinInfo and two forms of RestrictInfos?

Thanks
Richard


Insertion Sort Improvements

2022-08-25 Thread Benjamin Coutu
Hello,

Inspired by the recent discussions[1][2] around sort improvements, I took a 
look around the code and noticed the use of a somewhat naive version of 
insertion sort within the broader quicksort code.

The current implementation (see sort_template.h) is practically the textbook 
version of insertion sort:

for (pm = a + ST_POINTER_STEP; pm < a + n * ST_POINTER_STEP; pm += 
ST_POINTER_STEP)
  for (pl = pm; pl > a && DO_COMPARE(pl - ST_POINTER_STEP, pl) > 0; pl -= 
ST_POINTER_STEP)
DO_SWAP(pl, pl - ST_POINTER_STEP);

I propose to rather use the slightly more efficient variant of insertion sort 
where only a single assignment instead of a fully-fledged swap is performed in 
the inner loop:

for (pm = a + ST_POINTER_STEP; pm < a + n * ST_POINTER_STEP; pm += 
ST_POINTER_STEP) {
  DO_COPY(pm_temp, pm); /* pm_temp <- copy of pm */
  
  pl = pm - ST_POINTER_STEP;
  
  for (; pl >= a && DO_COMPARE(pl, pm_temp) > 0; pl -= ST_POINTER_STEP)
DO_ASSIGN(pl + ST_POINTER_STEP, pl); /* pl + 1 <- pl */

  DO_COPY(pl + ST_POINTER_STEP, pm_temp); /* pl + 1 <- copy of pm_temp */
}

DO_ASSIGN and DO_COPY macros would have to be declared analogously to DO_SWAP
via the template.

There is obviously a trade-off involved here as O(1) extra memory is required 
to hold the temporary variable and DO_COPY might be expensive if the sort 
element is large. In case of single datum sort with trivial data types this 
would not be a big issue. For large tuples on the other hand it could mean a 
significant overhead that might not be made up for by the improved inner loop. 
One might want to limit this algorithm to a certain maximum tuple size and use 
the original insertion sort version for larger elements (this could be decided 
at compile-time via sort_template.h).

Anyways, there might be some low hanging fruit here. If it turns out to be 
significantly faster one might even consider increasing the insertion sort 
threshold from < 7 to < 10 sort elements. But that is a whole other discussion 
for another day.

Has anyone tested such an approach before? Please let me know your thoughts.

Cheers,

Benjamin

[1] 
https://www.postgresql.org/message-id/flat/CAFBsxsHanJTsX9DNJppXJxwg3bU%2BYQ6pnmSfPM0uvYUaFdwZdQ%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/flat/CAApHDvoTTtoQYfp3d0kTPF6y1pjexgLwquzKmjzvjC9NCw4RGw%40mail.gmail.com

-- 

Benjamin Coutu
http://www.zeyos.com




Re: pg_receivewal and SIGTERM

2022-08-25 Thread Michael Paquier
On Thu, Aug 25, 2022 at 11:19:05AM +0200, Daniel Gustafsson wrote:
> I think this can be argued both for and against backpatching.  Catching 
> SIGTERM
> makes a lot of sense, especially given systemd's behavior.  On the other hand,
> this adds functionality to something arguably working as intended, regardless
> of what one thinks about the intent.

Sure.  My view on this matter is that the behavior of the patch is
more useful to users as, on HEAD, a SIGTERM is equivalent to a drop of
the connection followed by a retry when not using -n.  Or do you think
that there could be cases where the behavior of HEAD (force a
connection drop with the backend and handle the retry infinitely in
pg_receivewal/recvlogical) is more useful?  systemd can also retry a
certain number of times, so that's just moving the ball from one layer
to the other, in the end.  We could also say to just set KillSignal to
SIGINT in the docs, but my guess is that few users would actually
notice that until they see how pg_receivewal/recvlogical work with
systemd's default.

FWIW, I've worked on an archiver integration a few years ago and got
annoyed that we use SIGINT while SIGTERM was the default (systemd was
not directly used there but the signal problem was the same, so we had
to go through some loops to make the stop signal configurable, like
systemd).
--
Michael


signature.asc
Description: PGP signature


Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-08-25 Thread Michael Paquier
On Thu, Aug 25, 2022 at 10:38:41AM +0200, Alvaro Herrera wrote:
> It changes code.  Any bugfix in the surrounding code would have to fix a
> conflict.  That is nonzero effort.  Is it a huge risk?  No, it is very
> small risk and a very small cost to fix such a conflict; but my claim is
> that this change has zero benefit, therefore we should not incur a
> nonzero future effort.

Agreed to leave things as they are.  This really comes down to if we
want to make this code more C99-ish or not, and the post-patch result
is logically the same as the pre-patch result.
--
Michael


signature.asc
Description: PGP signature


Re: Perform streaming logical transactions by background workers and parallel apply

2022-08-25 Thread Amit Kapila
On Wed, Aug 24, 2022 at 7:17 PM houzj.f...@fujitsu.com
 wrote:
>
> On Friday, August 19, 2022 4:49 PM Amit Kapila 
> >
>
> > 8. It is not clear to me how APPLY_BGWORKER_EXIT status is used. Is it 
> > required
> > for the cases where bgworker exits due to some error and then apply worker
> > uses it to detect that and exits? How other bgworkers would notice this, is 
> > it
> > done via apply_bgworker_check_status()?
>
> It was used to detect the unexpected exit of bgworker and I have changed the 
> design
> of this which is now similar to what we have in parallel query.
>

Thanks, this looks better.

> Attach the new version patch set(v24) which address above comments.
> Besides, I added some logic which tries to stop the bgworker at transaction end
> if there are enough workers in the pool.
>

I think this deserves an explanation in worker.c under the title:
"Separate background workers" in the patch.

Review comments for v24-0001
=
1.
+ * cost of searhing the hash table

/searhing/searching

2.
+/*
+ * Apply background worker states.
+ */
+typedef enum ApplyBgworkerState
+{
+ APPLY_BGWORKER_BUSY, /* assigned to a transaction */
+ APPLY_BGWORKER_FINISHED /* transaction is completed */
+} ApplyBgworkerState;

Now, that there are just two states, can we think to represent them
via a flag ('available'/'in_use') or do you see a downside with that
as compared to the current approach?

3.
-replorigin_session_setup(RepOriginId node)
+replorigin_session_setup(RepOriginId node, int apply_leader_pid)

I have mentioned previously that we don't need anything specific to
apply worker/leader in this API, so why this change? The other idea
that occurred to me is that can we use replorigin_session_reset()
before sending the commit message to bgworker and then do the session
setup in bgworker only to handle the commit/abort/prepare message. We
also need to set it again for the leader apply worker after the leader
worker completes the wait for bgworker to finish the commit handling.

4. Unlike parallel query, here we seem to be creating separate DSM for
each worker, and probably the difference is due to the fact that here
we don't know upfront how many workers will actually be required. If
so, can we write some comments for the same in worker.c where you have
explained about parallel bgworker stuff?

5.
/*
- * Handle streamed transactions.
+ * Handle streamed transactions for both the main apply worker and the apply
+ * background workers.

Shall we use leader apply worker in the above comment? Also, check
other places in the patch for similar changes.

6.
+ else
+ {

- /* open the spool file for this transaction */
- stream_open_file(MyLogicalRepWorker->subid, stream_xid, first_segment);
+ /* notify handle methods we're processing a remote transaction */
+ in_streamed_transaction = true;

There is a spurious line after else {. Also, the comment could be
slightly improved: "/* notify handle methods we're processing a remote
in-progress transaction */"

7. The checks in various apply_handle_stream_* functions have improved
as compared to the previous version but I think we can still improve
those. One idea could be to use a separate function to decide the
action we want to take and then based on it, the caller can take
appropriate action. Using a similar idea, we can improve the checks in
handle_streamed_transaction() as well.

8.
+ else if ((winfo = apply_bgworker_find(xid)))
+ {
+ /* Send STREAM ABORT message to the apply background worker. */
+ apply_bgworker_send_data(winfo, s->len, s->data);
+
+ /*
+ * After sending the data to the apply background worker, wait for
+ * that worker to finish. This is necessary to maintain commit
+ * order which avoids failures due to transaction dependencies and
+ * deadlocks.
+ */
+ if (subxid == xid)
+ {
+ apply_bgworker_wait_for(winfo, APPLY_BGWORKER_FINISHED);
+ apply_bgworker_free(winfo);
+ }
+ }
+ else
+ /*
+ * We are in main apply worker and the transaction has been
+ * serialized to file.
+ */
+ serialize_stream_abort(xid, subxid);

In the last else block, you can use {} to make it consistent with
other if, else checks.

9.
+void
+ApplyBgworkerMain(Datum main_arg)
+{
+ volatile ApplyBgworkerShared *shared;
+
+ dsm_handle handle;

Is there a need to keep this empty line between the above two declarations?

10.
+ /*
+ * Attach to the message queue.
+ */
+ mq = shm_toc_lookup(toc, APPLY_BGWORKER_KEY_ERROR_QUEUE, false);

Here, we should say error queue in the comments.

11.
+ /*
+ * Attach to the message queue.
+ */
+ mq = shm_toc_lookup(toc, APPLY_BGWORKER_KEY_ERROR_QUEUE, false);
+ shm_mq_set_sender(mq, MyProc);
+ error_mqh = shm_mq_attach(mq, seg, NULL);
+ pq_redirect_to_shm_mq(seg, error_mqh);
+
+ /*
+ * Now, we have initialized DSM. Attach to slot.
+ */
+ logicalrep_worker_attach(worker_slot);
+ MyParallelShared->logicalrep_worker_generation =
MyLogicalRepWorker->generation;
+ MyParallelShared->logicalrep_worker_slot_no = worker_slot;
+
+ pq_set_paralle

Re: making relfilenodes 56 bits

2022-08-25 Thread Dilip Kumar
On Tue, Aug 23, 2022 at 8:00 PM Robert Haas  wrote:
>
> On Tue, Aug 23, 2022 at 2:06 AM Dilip Kumar  wrote:
> > OTOH, if we keep the two separate ranges for the user and system table
> > then we don't need all this complex logic of conflict checking.
>
> True. That's the downside. The question is whether it's worth adding
> some complexity to avoid needing separate ranges.

Other than complexity, we will have to check all the user tables'
relfilenumbers from the old cluster for conflicts against a hash built
on the new cluster's relfilenumbers; isn't that extra overhead if there
are a lot of user tables?  But I think we are already restoring all
those tables in the new cluster, so compared to that it will be very
small.

> Honestly, if we don't care about having separate ranges, we can do
> something even simpler and just make the starting relfilenumber for
> system tables same as the OID. Then we don't have to do anything at
> all, outside of not changing the OID assigned to pg_largeobject in a
> future release. Then as long as pg_upgrade is targeting a new cluster
> with completely fresh databases that have not had any system table
> rewrites so far, there can't be any conflict.

I think having the OID-based system and having two ranges are not
exactly the same.  If we have OID-based relfilenumber allocation for
system tables (initially) and then later allocate from the
nextRelFileNumber counter, it seems like a mix of the old system (where
OID and relfilenumber are tightly connected) and the new system where
nextRelFileNumber is a completely independent counter.  OTOH, having
two ranges means logically we are not making it dependent on the OID;
we are just allocating from a central counter, but after catalog
initialization we leave some gap and start from a new range.  So I
don't think this system is hard to explain.

> And perhaps that is the best solution after all, but while it is
> simple in terms of code, I feel it's a bit complicated for human
> beings. It's very simple to understand the scheme that Amit proposed:
> if there's anything in the new cluster that would conflict, we move it
> out of the way. We don't have to assume the new cluster hasn't had any
> table rewrites. We don't have to nail down starting relfilenumber
> assignments for system tables. We don't have to worry about
> relfilenumber or OID assignments changing between releases.
> pg_largeobject is not a special case. There are no special ranges of
> OIDs or relfilenumbers required. It just straight up works -- all the
> time, no matter what, end of story.

I agree that this system is easy to explain: we just rewrite anything
that conflicts, so it looks more future-proof.  Okay, I will try this
solution and post the patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-25 Thread Michael Paquier
On Thu, Aug 25, 2022 at 10:48:08AM +0530, Bharath Rupireddy wrote:
> On Tue, Aug 23, 2022 at 11:37 PM Nathan Bossart  
> wrote:
>> IIUC an error in get_dirent_type() could cause slots to be skipped here,
>> which is a behavior change.
> 
> That behaviour hasn't changed, no? Currently, if lstat() fails in
> ReorderBufferCleanupSerializedTXNs() it returns to
> StartupReorderBuffer()'s while loop, which in a way continues with
> the other slots; this patch does nothing new.

Are you sure?  FWIW, the changes in reorderbuffer.c for
ReorderBufferCleanupSerializedTXNs() reduce the code readability, in
my opinion, so that's one less argument in favor of this change.

The gain in ParseConfigDirectory() is kind of cool.
pg_tzenumerate_next(), copydir(), RemoveXlogFile()
StartupReplicationSlots(), CheckPointLogicalRewriteHeap() and
RemovePgTempFilesInDir() seem fine, as well.  At least these avoid
extra lstat() calls when the file type is unknown, which would be only
a limited number of systems where some of the three DT_* are missing
(right?).
--
Michael


signature.asc
Description: PGP signature


Re: Strip -mmacosx-version-min options from plperl build

2022-08-25 Thread Peter Eisentraut

On 25.08.22 02:14, Andrew Dunstan wrote:

In short: if the Cygwin Perl headers redefine PGDLLEXPORT (unlikely)
or somehow #define "__attribute__()" or "visibility()" into no-ops
(perhaps more likely) then we could explain this failure, and that
would also explain why it doesn't fail elsewhere.

I can't readily check this, since I have no idea exactly which version
of the Perl headers lorikeet uses.


It's built against cygwin perl 5.32.

I don't see anything like that in perl.h. It's certainly using
__attribute__() a lot.


This could be checked by running plperl.c through the preprocessor 
(replace gcc -c plperl.c -o plperl.o by gcc -E plperl.c -o plperl.i) and 
seeing what becomes of those symbols.


If we want to get the buildfarm green again sooner, we could force a 
--export-all-symbols directly.





Re: [RFC] building postgres with meson - v11

2022-08-25 Thread Peter Eisentraut

On 24.08.22 17:30, Andres Freund wrote:

258f6dc0a7 Don't hardcode tmp_check/ as test directory for tap tests
8ecc33cf04 Split TESTDIR into TESTLOGDIR and TESTDATADIR

I think these patches are split up a bit incorrectly.  If you apply
the first patch by itself, then the output appears in tab_comp_dir/
directly under the source directory.  And then the second patch moves
it to tmp_check/tap_comp_dir/.  If there is an intent to apply these
patches separately somehow, this should be cleaned up.


How is that happening with that version of the patch? The test puts
tap_comp_dir under TESTDIR, and TESTDIR is $(CURDIR)/tmp_check. There was an
earlier version of the patch that was split one more time that did have that
problem, but I don't quite see how that version has it?


Ok, I see now how this works.  It's a bit weird since the meaning of 
TESTDIR is changed.  I'm not sure if this could create cross-branch 
confusion.



97a0b096e8 meson: prereq: Add src/tools/gen_export.pl

This produces leading whitespace in the output files that at least on
darwin wasn't there before.  See attached patch (0002).  This should
be checked again on other platforms as well.


Hm, to me the indentation as is makes more sense, but ...


Maybe for the 'gnu' format, but on darwin (and others) it's just a flat 
list, so indenting it is pointless.



I wonder if we should rewrite this in python - I chose perl because I thought
we could share it, but as you pointed out, that's not possible, because we
don't want to depend on perl during the autoconf build from a tarball.


Given that the code is already written, I wouldn't do it.  Introducing 
Python into the toolchain would require considerations of minimum 
versions, finding the right binaries, formatting, style checking, etc., 
which would probably be a distraction right now.  I also think that this 
script, whose purpose is to read an input file line by line and print it 
back out slightly differently, is not going to be done better in Python.





Re: archive modules

2022-08-25 Thread talk to ben
Here is a patch with the proposed wording.
From 7fce0073f8a53b3e9ba84fa10fbc7b8efef36e97 Mon Sep 17 00:00:00 2001
From: benoit 
Date: Mon, 22 Aug 2022 12:00:46 +0200
Subject: [PATCH] basic_archive parameter visibility doc patch

Module parameters are only visible from the pg_settings view once
the module is loaded. Since an archive module is loaded by the archiver
process, the parameters are never visible from the view. This patch
adds a note about this in the pg_settings system view documentation.
---
 doc/src/sgml/system-views.sgml | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 9728039e71..92aad11c71 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3275,7 +3275,13 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
   
This view does not display customized options
-   until the extension module that defines them has been loaded.
+   until the extension module that defines them has been loaded by the backend
+   process executing the query (e.g., via shared_preload_libraries, the
+   LOAD command, or a call to a user-defined C function).
+   For example, since the archive_library is only loaded by the archiver
+   process, this view will not display any customized options defined by
+   archive modules unless special action is taken to load them into the backend
+   process executing the query.
   
 
   
-- 
2.37.1



postgres_fdw hint messages

2022-08-25 Thread Peter Eisentraut

The postgres_fdw tests contain this (as amended by patch 0001):

ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw');
ERROR:  invalid option "password"
HINT:  Valid options in this context are: service, passfile, 
channel_binding, connect_timeout, dbname, host, hostaddr, port, options, 
application_name, keepalives, keepalives_idle, keepalives_interval, 
keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, 
sslkey, sslrootcert, sslcrl, sslcrldir, sslsni, requirepeer, 
ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, 
krbsrvname, gsslib, target_session_attrs, use_remote_estimate, 
fdw_startup_cost, fdw_tuple_cost, extensions, updatable, truncatable, 
fetch_size, batch_size, async_capable, parallel_commit, keep_connections


This annoys developers who are working on libpq connection options, 
because any option added, removed, or changed causes this test to need 
to be updated.


It's also questionable how useful this hint is in its current form, 
considering how long it is and that the options are in an 
implementation-dependent order.


Possible changes:

- Hide the hint from this particular test (done in the attached patches).

- Keep the hint, but maybe make it sorted?

- Remove all the hints like this from foreign data commands.

- Don't show the hint when there are more than N valid options.

- Do some kind of "did you mean" like we have for column names.

Thoughts?

From 4adb3b6846e98a55ee19d8425dfb3d6d12145f16 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 25 Aug 2022 15:31:10 +0200
Subject: [PATCH 1/2] postgres_fdw: Remove useless DO block in test

---
 contrib/postgres_fdw/expected/postgres_fdw.out | 8 +---
 contrib/postgres_fdw/sql/postgres_fdw.sql  | 6 +-
 2 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 7bf35602b0..a42c9720c3 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9621,15 +9621,9 @@ ERROR:  password is required
 DETAIL:  Non-superusers must provide a password in the user mapping.
 -- If we add a password to the connstr it'll fail, because we don't allow 
passwords
 -- in connstrs only in user mappings.
-DO $d$
-BEGIN
-EXECUTE $$ALTER SERVER loopback_nopw OPTIONS (ADD password 
'dummypw')$$;
-END;
-$d$;
+ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw');
 ERROR:  invalid option "password"
 HINT:  Valid options in this context are: service, passfile, channel_binding, 
connect_timeout, dbname, host, hostaddr, port, options, application_name, 
keepalives, keepalives_idle, keepalives_interval, keepalives_count, 
tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, 
sslcrl, sslcrldir, sslsni, requirepeer, ssl_min_protocol_version, 
ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, 
use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, 
truncatable, fetch_size, batch_size, async_capable, parallel_commit, 
keep_connections
-CONTEXT:  SQL statement "ALTER SERVER loopback_nopw OPTIONS (ADD password 
'dummypw')"
-PL/pgSQL function inline_code_block line 3 at EXECUTE
 -- If we add a password for our user mapping instead, we should get a different
 -- error because the password wasn't actually *used* when we run with trust 
auth.
 --
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql 
b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 42735ae78a..63581a457d 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2886,11 +2886,7 @@ CREATE FOREIGN TABLE pg_temp.ft1_nopw (
 -- If we add a password to the connstr it'll fail, because we don't allow 
passwords
 -- in connstrs only in user mappings.
 
-DO $d$
-BEGIN
-EXECUTE $$ALTER SERVER loopback_nopw OPTIONS (ADD password 
'dummypw')$$;
-END;
-$d$;
+ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw');
 
 -- If we add a password for our user mapping instead, we should get a different
 -- error because the password wasn't actually *used* when we run with trust 
auth.
-- 
2.37.1

From 986ea253cd887b472e8040580f8c1a2642f3cada Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 25 Aug 2022 15:31:11 +0200
Subject: [PATCH 2/2] postgres_fdw: Avoid listing all libpq connection options
 in error hint

---
 contrib/postgres_fdw/expected/postgres_fdw.out | 7 ++-
 contrib/postgres_fdw/sql/postgres_fdw.sql  | 6 ++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index a42c9720c3..3678428b95 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9621,9 +9621,14 @@ ERROR:  password is required
 DETAIL:  Non-superusers must provide a password in the user mapp

Re: Strip -mmacosx-version-min options from plperl build

2022-08-25 Thread Tom Lane
Peter Eisentraut  writes:
>>> In short: if the Cygwin Perl headers redefine PGDLLEXPORT (unlikely)
>>> or somehow #define "__attribute__()" or "visibility()" into no-ops
>>> (perhaps more likely) then we could explain this failure, and that
>>> would also explain why it doesn't fail elsewhere.

> This could be checked by running plperl.c through the preprocessor 
> (replace gcc -c plperl.c -o plperl.o by gcc -E plperl.c -o plperl.i) and 
> seeing what becomes of those symbols.

Yeah, that was what I was going to suggest: grep the "-E" output for
_PG_init and Pg_magic_func and confirm what their extern declarations
look like.

> If we want to get the buildfarm green again sooner, we could force a 
> --export-all-symbols directly.

I'm not hugely upset as long as it's just the one machine failing.

regards, tom lane




Re: pg_regress: lookup shellprog in $PATH

2022-08-25 Thread Robert Haas
On Wed, Aug 24, 2022 at 10:31 PM Tom Lane  wrote:
> git blame blames that whole mechanism on me: 60cfe25e68d.  It looks
> like the reason I did it like that is that I was replacing use of
> system(3) with execl(), and system(3) is defined thus by POSIX:
>
> execl(<shell path>, "sh", "-c", command, (char *)0);
>
> where <shell path> is an unspecified pathname for the sh utility.
>
> Using SHELL for the "unspecified path" is already a bit of a leap
> of faith, since users are allowed to make that point at a non-Bourne
> shell.  I don't see any strong reason to preserve that behavior.

It seems weird that you use any arbitrary shell to run 'sh', but I
guess the point is that your shell command, whatever it is, is
supposed to be a full pathname, and then it can do pathname resolution
to figure out where you 'sh' executable is. So that makes me think
that the problem Gurjeet is reporting is an issue with Nix rather than
an issue with PostgreSQL.

But what we've got is:

[rhaas pgsql]$ git grep execl\(
src/bin/pg_ctl/pg_ctl.c: (void) execl("/bin/sh", "/bin/sh", "-c", cmd,
(char *) NULL);
src/test/regress/pg_regress.c:   execl(shellprog, shellprog, "-c",
cmdline2, (char *) NULL);

And surely that's stupid. The whole point here has to be that if you
want to run something called 'sh' but don't know where it is, you need
to execute a shell at a known pathname to figure it out for you.

We could do as you propose and I don't think we would be worse off
than we are today. But I'm confused why the correct formulation
wouldn't be exactly what POSIX specifies, namely execl(shellprog,
"sh", "-c", ...). That way, if somebody has a system where they do set
$SHELL properly but do not have /bin/sh, things would still work.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_regress: lookup shellprog in $PATH

2022-08-25 Thread Tom Lane
Robert Haas  writes:
> But what we've got is:

> [rhaas pgsql]$ git grep execl\(
> src/bin/pg_ctl/pg_ctl.c: (void) execl("/bin/sh", "/bin/sh", "-c", cmd,
> (char *) NULL);
> src/test/regress/pg_regress.c:   execl(shellprog, shellprog, "-c",
> cmdline2, (char *) NULL);

Right.  I wouldn't really feel a need to change anything, except
that we have this weird inconsistency between the way pg_ctl does
it and the way pg_regress does it.  I think we should settle on
just one way.

> We could do as you propose and I don't think we would be worse off
> than we are today. But I'm confused why the correct formulation
> wouldn't be exactly what POSIX specifies, namely execl(shellprog,
> "sh", "-c", ...). That way, if somebody has a system where they do set
> $SHELL properly but do not have /bin/sh, things would still work.

My point is that that *isn't* what POSIX specifies.  They say in so
many words that the path actually used by system(3) is unspecified.
They do NOT say that it's the value of $SHELL, and given that you're
allowed to set $SHELL to a non-POSIX-compatible shell, using that
is really wrong.  We've gotten away with it so far because we
resolve $SHELL at build time not run time, but it's still shaky.

Interestingly, if you look at concrete man pages, you tend to find
something else.  Linux says

   The  system()  library  function uses fork(2) to create a child process
   that executes the shell command specified in command using execl(3)  as
   follows:
   execl("/bin/sh", "sh", "-c", command, (char *) 0);

My BSD machines say "the command is handed to sh(1)", without committing
to just how that's found ... but guess what, "which sh" finds /bin/sh.

In any case, I can't find any system(3) that relies on $SHELL,
so my translation wasn't correct according to either the letter
of POSIX or common practice.  It's supposed to be more or less
a hard-wired path, they just don't want to commit to which path.

Moreover, leaving aside the question of whether pg_regress'
current behavior is actually bug-compatible with system(3),
what is the argument that it needs to be?  We have at this
point sufficient experience with pg_ctl's use of /bin/sh
to be pretty confident that that works everywhere.  So let's
standardize on the simpler way, not the more complex way.

(It looks like pg_ctl has used /bin/sh since 6bcce25801c3f
of Oct 2015.)

regards, tom lane




Re: replacing role-level NOINHERIT with a grant-level option

2022-08-25 Thread Robert Haas
On Wed, Aug 24, 2022 at 10:23 AM tushar  wrote:
> On 8/24/22 12:28 AM, Robert Haas wrote:
> > This patch needed to be rebased pretty extensively after commit
> > ce6b672e4455820a0348214be0da1a024c3f619f. Here is a new version.
> Thanks, Robert, I have retested this patch with my previous scenarios
> and things look good to me.

I read through this again and found a comment that needed to be
updated, so I did that, bumped catversion, and committed this.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_regress: lookup shellprog in $PATH

2022-08-25 Thread Robert Haas
On Thu, Aug 25, 2022 at 10:13 AM Tom Lane  wrote:
> My point is that that *isn't* what POSIX specifies.  They say in so
> many words that the path actually used by system(3) is unspecified.
> They do NOT say that it's the value of $SHELL, and given that you're
> allowed to set $SHELL to a non-POSIX-compatible shell, using that
> is really wrong.  We've gotten away with it so far because we
> resolve $SHELL at build time not run time, but it's still shaky.
>
> Interestingly, if you look at concrete man pages, you tend to find
> something else.  Linux says
>
>The  system()  library  function uses fork(2) to create a child process
>that executes the shell command specified in command using execl(3)  as
>follows:
>execl("/bin/sh", "sh", "-c", command, (char *) 0);
>
> My BSD machines say "the command is handed to sh(1)", without committing
> to just how that's found ... but guess what, "which sh" finds /bin/sh.
>
> In any case, I can't find any system(3) that relies on $SHELL,
> so my translation wasn't correct according to either the letter
> of POSIX or common practice.  It's supposed to be more or less
> a hard-wired path, they just don't want to commit to which path.
>
> Moreover, leaving aside the question of whether pg_regress'
> current behavior is actually bug-compatible with system(3),
> what is the argument that it needs to be?  We have at this
> point sufficient experience with pg_ctl's use of /bin/sh
> to be pretty confident that that works everywhere.  So let's
> standardize on the simpler way, not the more complex way.

I mean, I can see you're on the warpath here and I don't care enough
to fight about it very much, but as a matter of theory, I believe that
hard-coded pathnames suck. Giving the user a way to survive if /bin/sh
doesn't exist on their system or isn't the path they want to use seems
fundamentally sensible to me. Now if system() doesn't do that anyhow,
well then there is no such mechanism in such cases, and so the benefit
of providing one in the tiny number of other cases that we have may
not be there. But if you're trying to convince me that hard-coded
paths are as a theoretical matter brilliant, I'm not buying it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Letter case of "admin option"

2022-08-25 Thread Robert Haas
On Thu, Aug 25, 2022 at 4:58 AM Alvaro Herrera  wrote:
> I ran a quick poll in a Spanish community.  Everyone who responded (not
> many admittedly) agreed with this idea -- they find the message clearer
> if the keyword is mentioned explicitly in the translation.

Makes sense. I didn't really doubt that ADMIN should be capitalized, I
just wasn't sure about OPTION.

> > > In short, I'm wondering whether we should regard ADMIN as the name of
> > > the option, but OPTION as part of the GRANT syntax, and hence
> > > capitalize it "ADMIN option". However, if the non-English speakers on
> > > this list have a strong preference for something else I'm certainly
> > > not going to fight about it.
> >
> > "ADMIN option" which is translated into "ADMINオプション" is fine by
> > me.  I hope Álvaro thinks the same way.
>
> Hmm, but our docs say that the option is called ADMIN OPTION, don't
> they?  And I think the standard sees it the same way.  You cannot invoke
> it without the word OPTION.  I understand the point of view, but I don't
> think it is clearer done that way.  It is different for example with
> INHERIT; we could say "the INHERIT option" making the word "option"
> translatable in that phrase.  But then you don't have to add that word
> in the command.

It's going to be a little strange of we have ADMIN OPTION and INHERIT
option, isn't it? But we can try it.

One thing I have noticed, though, is that there are a lot of existing
references to ADMIN OPTION in code comments. If we decide on anything
else here we're going to have quite a few things to tidy up. Not that
that's a big deal I guess, but it's something to think about.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




libpq error message refactoring

2022-08-25 Thread Peter Eisentraut
libpq now contains a mix of error message strings that end with newlines 
and don't end with newlines, due to some newer code paths with new ways 
of passing errors around.  This has now gotten me confused a few too 
many times both during development and translation.  So I looked into 
whether we can unify this, similar to how we have done elsewhere (e.g., 
pg_upgrade).  I came up with the attached patch.  It's not complete, but 
it shows the idea and it looks like a nice simplification to me. 
Thoughts on this approach?

From e547d9993232e20c9e696bd7facbcb3de4afdd6c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 4 Aug 2022 23:31:58 +0300
Subject: [PATCH v1] WIP: libpq_append_error

---
 src/interfaces/libpq/fe-connect.c  | 271 +++--
 src/interfaces/libpq/fe-exec.c | 109 
 src/interfaces/libpq/fe-misc.c |  15 ++
 src/interfaces/libpq/libpq-int.h   |   2 +
 src/interfaces/libpq/nls.mk|   4 +-
 src/interfaces/libpq/pqexpbuffer.c |  27 ++-
 src/interfaces/libpq/pqexpbuffer.h |   2 +
 7 files changed, 186 insertions(+), 244 deletions(-)

diff --git a/src/interfaces/libpq/fe-connect.c 
b/src/interfaces/libpq/fe-connect.c
index 8cefef20d1..79b9392858 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -896,8 +896,7 @@ fillPGconn(PGconn *conn, PQconninfoOption *connOptions)
*connmember = strdup(tmp);
if (*connmember == NULL)
{
-   
appendPQExpBufferStr(&conn->errorMessage,
-   
 libpq_gettext("out of memory\n"));
+   libpq_append_error(conn, "out of 
memory");
return false;
}
}
@@ -1079,9 +1078,8 @@ connectOptions2(PGconn *conn)
if (more || i != conn->nconnhost)
{
conn->status = CONNECTION_BAD;
-   appendPQExpBuffer(&conn->errorMessage,
- libpq_gettext("could 
not match %d host names to %d hostaddr values\n"),
- 
count_comma_separated_elems(conn->pghost), conn->nconnhost);
+   libpq_append_error(conn, "could not match %d host names 
to %d hostaddr values",
+  
count_comma_separated_elems(conn->pghost), conn->nconnhost);
return false;
}
}
@@ -1160,9 +1158,8 @@ connectOptions2(PGconn *conn)
else if (more || i != conn->nconnhost)
{
conn->status = CONNECTION_BAD;
-   appendPQExpBuffer(&conn->errorMessage,
- libpq_gettext("could 
not match %d port numbers to %d hosts\n"),
- 
count_comma_separated_elems(conn->pgport), conn->nconnhost);
+   libpq_append_error(conn, "could not match %d port 
numbers to %d hosts",
+  
count_comma_separated_elems(conn->pgport), conn->nconnhost);
return false;
}
}
@@ -1250,9 +1247,8 @@ connectOptions2(PGconn *conn)
&& strcmp(conn->channel_binding, "require") != 0)
{
conn->status = CONNECTION_BAD;
-   appendPQExpBuffer(&conn->errorMessage,
- 
libpq_gettext("invalid %s value: \"%s\"\n"),
- "channel_binding", 
conn->channel_binding);
+   libpq_append_error(conn, "invalid %s value: \"%s\"",
+  "channel_binding", 
conn->channel_binding);
return false;
}
}
@@ -1276,9 +1272,8 @@ connectOptions2(PGconn *conn)
&& strcmp(conn->sslmode, "verify-full") != 0)
{
conn->status = CONNECTION_BAD;
-   appendPQExpBuffer(&conn->errorMessage,
- 
libpq_gettext("invalid %s value: \"%s\"\n"),
- "sslmode", 
conn->sslmode);
+   libpq_append_error(conn, "invalid %s value: \"%s\"",
+  "sslmode", 
conn->sslmode);
return false;
}
 
@@ -1297,9 +1292,8 @@ connectOptions2(PGconn *conn)
case 'r':   /* "require" */
  

Re: pg_basebackup: add test about zstd compress option

2022-08-25 Thread Robert Haas
On Thu, Aug 25, 2022 at 3:52 AM Dong Wook Lee  wrote:
> Could you check if I missed anything?

There is already a far better test for this in
src/bin/pg_verifybackup/t/009_extract.pl

-- 
Robert Haas
EDB: http://www.enterprisedb.com




[PATCH] Fix alter subscription concurrency errors

2022-08-25 Thread Jelte Fennema
Without this patch concurrent ALTER/DROP SUBSCRIPTION statements for 
the same subscription could result in one of these statements returning the
following error:

ERROR:  XX000: tuple concurrently updated

This patch fixes that by re-fetching the tuple after acquiring the lock on the
subscription. The included isolation test fails most of its permutations
without this patch, with the error shown above.

The loop to re-fetch the tuple is heavily based on the code from dbcommands.c

0001-Fix-errors-when-concurrently-altering-subscriptions.patch
Description:  0001-Fix-errors-when-concurrently-altering-subscriptions.patch


Re: pg_regress: lookup shellprog in $PATH

2022-08-25 Thread Tom Lane
Robert Haas  writes:
> I mean, I can see you're on the warpath here and I don't care enough
> to fight about it very much, but as a matter of theory, I believe that
> hard-coded pathnames suck. Giving the user a way to survive if /bin/sh
> doesn't exist on their system or isn't the path they want to use seems
> fundamentally sensible to me. Now if system() doesn't do that anyhow,
> well then there is no such mechanism in such cases, and so the benefit
> of providing one in the tiny number of other cases that we have may
> not be there. But if you're trying to convince me that hard-coded
> paths are as a theoretical matter brilliant, I'm not buying it.

If we were executing a program that the user needs to have some control
over, sure, but what we have here is an implementation detail that I
doubt anyone cares about.  The fact that we're using a shell at all is
only because nobody has cared to manually implement I/O redirection logic
in these places; otherwise we'd be execl()'ing the server or psql directly.
Maybe the best answer would be to do that, and get out of the business
of knowing where the shell is?

regards, tom lane




Re: shadow variables - pg15 edition

2022-08-25 Thread David Rowley
On Thu, 25 Aug 2022 at 14:08, Justin Pryzby  wrote:
> Here, I've included the rest of your list.

OK, I've gone through v3-remove-var-declarations.txt, v4-reuse.txt
v4-reuse-more.txt and committed most of what you had and removed a few
that I thought should be renames instead.

I also added some additional ones after reprocessing the RenameOrScope
category from the spreadsheet.

With some minor adjustments to a small number of your ones, I pushed
what I came up with.

David


shadow_analysis.ods
Description: application/vnd.oasis.opendocument.spreadsheet


Re: shadow variables - pg15 edition

2022-08-25 Thread David Rowley
On Thu, 25 Aug 2022 at 13:46, David Rowley  wrote:
> I've attached a patch which I think improves the code in
> gistRelocateBuildBuffersOnSplit() so that there's no longer a shadowed
> variable. I also benchmarked this method in a tight loop and can
> measure no performance change from getting the loop index this way vs
> the old way.

I've now pushed this patch too.

David




Re: pg_regress: lookup shellprog in $PATH

2022-08-25 Thread Robert Haas
On Thu, Aug 25, 2022 at 10:48 AM Tom Lane  wrote:
> If we were executing a program that the user needs to have some control
> over, sure, but what we have here is an implementation detail that I
> doubt anyone cares about.  The fact that we're using a shell at all is
> only because nobody has cared to manually implement I/O redirection logic
> in these places; otherwise we'd be execl()'ing the server or psql directly.
> Maybe the best answer would be to do that, and get out of the business
> of knowing where the shell is?

Well that also would not be crazy.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_receivewal and SIGTERM

2022-08-25 Thread Christoph Berg
Re: Michael Paquier
> FWIW, I've worked on an archiver integration a few years ago and got
> annoyed that we use SIGINT while SIGTERM was the default (systemd was
> not directly used there but the signal problem was the same, so we had
> to go through some loops to make the stop signal configurable, like
> systemd).

SIGTERM is really the default for any init system or run-a-daemon system.

Christoph




has_privs_of_role vs. is_member_of_role, redux

2022-08-25 Thread Robert Haas
Hi,

We've had some previous discussions about when to use
has_privs_of_role and when to use is_member_of_role, and
has_privs_of_role has mostly won the fight. That means that, if role
"robert" is set to NOINHERIT and you "GRANT stuff TO robert", for the
most part "robert" will not actually be able to do things that "stuff"
could do. Now, robert will be able to "SET ROLE stuff" and then do all
and only those things that "stuff" can do, but he won't be able to do
those things as "robert". For example:

rhaas=# set role robert;
SET
rhaas=> select * from stuff_table;
ERROR:  permission denied for table stuff_table

So far, so good. But it's clearly not the case that "GRANT stuff TO
robert" has conferred no privileges at all on robert. At the very
least, it's enabled him to "SET ROLE stuff", but what else? I decided
to go through the code and make a list of the things that robert can
now do that he couldn't do before. Here it is:

1. robert can create new objects of various types owned by stuff:

rhaas=> create schema stuff_by_robert authorization stuff;
CREATE SCHEMA
rhaas=> create schema unrelated_by_robert authorization unrelated;
ERROR:  must be member of role "unrelated"

2. robert can change the owner of objects he owns to instead be owned by stuff:

rhaas=> alter table robert_table owner to unrelated;
ERROR:  must be member of role "unrelated"
rhaas=> alter table robert_table owner to stuff;
ALTER TABLE

3. robert can change the default privileges for stuff:

rhaas=> alter default privileges for role unrelated grant select on
tables to public;
ERROR:  must be member of role "unrelated"
rhaas=> alter default privileges for role stuff grant select on tables
to public;
ALTER DEFAULT PRIVILEGES

4. robert can execute "SET ROLE stuff".

That's it. There are two other behaviors that change -- the return
value of pg_has_role('robert', 'stuff', 'MEMBER') and pg_hba.conf
matching to groups -- but those aren't things that robert gains the
ability to do. The above is an exhaustive list of the things robert
gains the ability to do.

I argue that #3 is a clear bug. robert can't select from stuff's
tables or change privileges on stuff's objects, so why can he change
stuff's default privileges? is_member_of_role() has a note that it is
not to be used for privilege checking, and this seems like it's pretty
clearly a privilege check.

On the flip side, #4 is pretty clearly correct. Presumably, allowing
that to happen was the whole point of executing "GRANT stuff TO
robert" in the first place.

The other two are less clear, in my opinion. We don't want users to
end up owning objects that they didn't intend to own; in particular,
if any user could make a security-definer function and then gift it to
the superuser, it would be a disaster. So, arguably, the ability to
make some other role the owner of an object represents a privilege
that your role holds with respect to their role. Under that theory,
the is_member_of_role() checks that are performed in cases #1 and #2
are privilege checks, and we ought to be using has_privs_of_role()
instead, so that a non-inherited role grant doesn't confer those
privileges. But I don't find this very clear cut, because except when
the object you're gifting is a Trojan horse, giving stuff away helps
the recipient, not the donor.

Also, from a practical point of view, changing the owner of an object
is different from other things that robert might want to do. If robert
wants to create a table as user stuff or read some data from tables
user stuff can access or change privileges on objects that role stuff
owns, he can just execute "SET ROLE stuff" and then do any of that
stuff. But he can't give away his own objects by assuming stuff's
privileges. Either he can do it as himself, or he can't do it at all.
It wouldn't be crazy IMHO to decide that a non-inherited grant isn't
sufficient to donate objects to the granted role, and thus an
inherited grant is required in such cases. However, the current system
doesn't seem insane either, and in fact might be convenient in some
situations.

In short, my proposal is to change the ALTER DEFAULT PRIVILEGES code
so that you have to have the privileges of the target role, not just
membership in the target role, and leave everything else unchanged.

Thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




re: postgres_fdw hint messages

2022-08-25 Thread Ranier Vilela
>The postgres_fdw tests contain this (as amended by patch 0001):

>ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw');
>ERROR: invalid option "password"
>HINT: Valid options in this context are: service, passfile,
>channel_binding, connect_timeout, dbname, host, hostaddr, port, options,
>application_name, keepalives, keepalives_idle, keepalives_interval,
>keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert,
>sslkey, sslrootcert, sslcrl, sslcrldir, sslsni, requirepeer,
>ssl_min_protocol_version, ssl_max_protocol_version, gssencmode,
>krbsrvname, gsslib, target_session_attrs, use_remote_estimate,
>fdw_startup_cost, fdw_tuple_cost, extensions, updatable, truncatable,
>fetch_size, batch_size, async_capable, parallel_commit, keep_connections

>This annoys developers who are working on libpq connection options,
>because any option added, removed, or changed causes this test to need
>to be updated.

>It's also questionable how useful this hint is in its current form,
>considering how long it is and that the options are in an
>implementation-dependent order.

>Possible changes:

>- Hide the hint from this particular test (done in the attached patches).
+1
I vote for this option.
Less work for future developers changes, I think worth the effort.

Anyway, in alphabetical order, it's a lot easier for users to read.

Patch attached.

regards,
Ranier Vilela


0003-reorder-alphabetical-libpq-options.patch
Description: Binary data


Re: Schema variables - new implementation for Postgres 15

2022-08-25 Thread Pavel Stehule
st 24. 8. 2022 v 10:04 odesílatel Erik Rijkers  napsal:

> Op 24-08-2022 om 08:37 schreef Pavel Stehule:
> >>
> >
> > I fixed these.
> >
>
>  > [v20220824-1-*.patch]
>
> Hi Pavel,
>
> I noticed just now that variable assignment (i.e., LET) unexpectedly
> (for me anyway) cast the type of the input value. Surely that's wrong?
> The documentation says clearly enough:
>
> 'The result must be of the same data type as the session variable.'
>
>
> Example:
>
> create variable x integer;
> let x=1.5;
> select x, pg_typeof(x);
>   x | pg_typeof
> ---+---
>   2 | integer
> (1 row)
>
>
> Is this correct?
>
> If such casts (there are several) are intended then the text of the
> documentation should be changed.
>

yes - the behavior is designed like plpgsql assignment or SQL assignment

 (2022-08-25 19:35:35) postgres=# do $$
postgres$# declare i int;
postgres$# begin
postgres$#   i := 1.5;
postgres$#   raise notice '%', i;
postgres$# end;
postgres$# $$;
NOTICE:  2
DO

(2022-08-25 19:38:10) postgres=# create table foo1(a int);
CREATE TABLE
(2022-08-25 19:38:13) postgres=# insert into foo1 values(1.5);
INSERT 0 1
(2022-08-25 19:38:21) postgres=# select * from foo1;
┌───┐
│ a │
╞═══╡
│ 2 │
└───┘
(1 row)

There are the same rules as in SQL.

This sentence is not good - the value should be castable to the target
type.

Regards

Pavel





> Thanks,
>
> Erik
>
>


V14 and later build the backend with -lpthread

2022-08-25 Thread Tom Lane
I realized $SUBJECT while wondering why my new buildfarm animal chickadee
(NetBSD on gaur's old hardware) fails the plpython tests on v13 and
earlier.  After a bit of investigation I realized it *should* be failing,
because neither NetBSD nor Python have done anything about the problem
documented in [1].  The reason it fails to fail in current branches is
that we're now pulling -lpthread into the backend, which AFAICT is an
unintentional side-effect of sloppy autoconfmanship in commits
de91c3b97 / 44bf3d508.  We wanted pthread_barrier_wait() for pgbench,
not the backend, but as-committed we'll add -lpthread to LIBS if it
provides pthread_barrier_wait.

Now maybe someday we'll be brave enough to make the backend multithreaded,
but today is not that day, and in the meantime this seems like a rather
dangerous situation.  There has certainly been exactly zero analysis
of whether it's safe.

... On the third hand, poking at backends with ldd shows that at
least on Linux, we've been linking the backend with -lpthread for
quite some time, back to 9.4 or so.  The new-in-v14 behavior is that
it's getting in there on BSD-ish platforms as well.

Should we try to pull that back out, or just cross our fingers and
hope there's no real problem?

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/25662.1560896200%40sss.pgh.pa.us




Re: postgres_fdw hint messages

2022-08-25 Thread Ranier Vilela
Em qui., 25 de ago. de 2022 às 14:31, Ranier Vilela 
escreveu:

> >The postgres_fdw tests contain this (as amended by patch 0001):
>
> >ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw');
> >ERROR: invalid option "password"
> >HINT: Valid options in this context are: service, passfile,
> >channel_binding, connect_timeout, dbname, host, hostaddr, port, options,
> >application_name, keepalives, keepalives_idle, keepalives_interval,
> >keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert,
> >sslkey, sslrootcert, sslcrl, sslcrldir, sslsni, requirepeer,
> >ssl_min_protocol_version, ssl_max_protocol_version, gssencmode,
> >krbsrvname, gsslib, target_session_attrs, use_remote_estimate,
> >fdw_startup_cost, fdw_tuple_cost, extensions, updatable, truncatable,
> >fetch_size, batch_size, async_capable, parallel_commit, keep_connections
>
> >This annoys developers who are working on libpq connection options,
> >because any option added, removed, or changed causes this test to need
> >to be updated.
>
> >It's also questionable how useful this hint is in its current form,
> >considering how long it is and that the options are in an
> >implementation-dependent order.
>
> >Possible changes:
>
> >- Hide the hint from this particular test (done in the attached patches).
> +1
> I vote for this option.
> Less work for future developer changes; I think it's worth the effort.
>
> Anyway, in alphabetical order, it's a lot easier for users to read.
>
> Patch attached.
>
Little tweak in the comments.

regards,
Ranier Vilela



v1-0003-reorder-alphabetical-libpq-options.patch
Description: Binary data


Re: pg_receivewal and SIGTERM

2022-08-25 Thread Magnus Hagander
On Thu, Aug 25, 2022 at 5:13 PM Christoph Berg  wrote:
>
> Re: Michael Paquier
> > FWIW, I've worked on an archiver integration a few years ago and got
> > annoyed that we use SIGINT while SIGTERM was the default (systemd was
> > not directly used there but the signal problem was the same, so we had
> > to go through some loops to make the stop signal configurable, like
> > systemd).
>
> SIGTERM is really the default for any init system or run-a-daemon system.

It is, but there is also precedent for not using it for graceful
shutdown. Apache, for example, will do what we do today on SIGTERM and
you use SIGWINCH to make it shut down gracefully (which would be the
equivalent of us flushing the compression buffers, I'd say).

I'm not saying we shouldn't change -- I fully approve of making the
change. But the world is full of fairly prominent examples of the
other way as well.

I'm leaning towards considering it a feature-change and thus not
something to backpatch (I'd be OK sneaking it into 15 though, as that
one is not released yet and it feels like a perfectly *safe* change).
Not enough to insist on it, but it seems "slightly more correct".

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: has_privs_of_role vs. is_member_of_role, redux

2022-08-25 Thread Joe Conway

On 8/25/22 12:12, Robert Haas wrote:

So far, so good. But it's clearly not the case that "GRANT stuff TO
robert" has conferred no privileges at all on robert. At the very
least, it's enabled him to "SET ROLE stuff", but what else? I decided
to go through the code and make a list of the things that robert can
now do that he couldn't do before. Here it is:

1. robert can create new objects of various types owned by stuff:



2. robert can change the owner of objects he owns to instead be owned by stuff:



3. robert can change the default privileges for stuff:



4. robert can execute "SET ROLE stuff".


Nice analysis, and surprising (to me)


I argue that #3 is a clear bug. robert can't select from stuff's
tables or change privileges on stuff's objects, so why can he change
stuff's default privileges? is_member_of_role() has a note that it is
not to be used for privilege checking, and this seems like it's pretty
clearly a privilege check.



+1 this feels very wrong to me



On the flip side, #4 is pretty clearly correct. Presumably, allowing
that to happen was the whole point of executing "GRANT stuff TO
robert" in the first place.


Exactly


The other two are less clear, in my opinion. We don't want users to
end up owning objects that they didn't intend to own; in particular,
if any user could make a security-definer function and then gift it to
the superuser, it would be a disaster. So, arguably, the ability to
make some other role the owner of an object represents a privilege
that your role holds with respect to their role. Under that theory,
the is_member_of_role() checks that are performed in cases #1 and #2
are privilege checks, and we ought to be using has_privs_of_role()
instead, so that a non-inherited role grant doesn't confer those
privileges. But I don't find this very clear cut, because except when
the object you're gifting is a Trojan horse, giving stuff away helps
the recipient, not the donor.

Also, from a practical point of view, changing the owner of an object
is different from other things that robert might want to do. If robert
wants to create a table as user stuff or read some data from tables
user stuff can access or change privileges on objects that role stuff
owns, he can just execute "SET ROLE stuff" and then do any of that
stuff. But he can't give away his own objects by assuming stuff's
privileges. Either he can do it as himself, or he can't do it at all.
It wouldn't be crazy IMHO to decide that a non-inherited grant isn't
sufficient to donate objects to the granted role, and thus an
inherited grant is required in such cases. However, the current system
doesn't seem insane either, and in fact might be convenient in some
situations.

In short, my proposal is to change the ALTER DEFAULT PRIVILEGES code
so that you have to have the privileges of the target role, not just
membership in the target role, and leave everything else unchanged.

Thoughts?


I'm not sure about these last two. Does it matter that object creation 
is being logged, maybe for auditing purposes, under a different user 
than the owner of the object?


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2022-08-25 Thread Andres Freund
Hi,

On 2022-08-22 13:15:18 -0400, Melanie Plageman wrote:
> v28 attached.
> 
> I've added the new structs I added to typedefs.list.
> 
> I've split the commit which adds all of the logic to track
> IO operation statistics into two commits -- one which includes all of
> the code to count IOOps for IOContexts locally in a backend and a second
> which includes all of the code to accumulate and manage these with the
> cumulative stats system.

Thanks!


> A few notes about the commit which adds local IO Operation stats:
> 
> - There is a comment above pgstat_io_op_stats_collected() which mentions
> the cumulative stats system even though this commit doesn't engage the
> cumulative stats system. I wasn't sure if it was more or less
> confusing to have two different versions of this comment.

Not worth being worried about...


> - should pgstat_count_io_op() take BackendType as a parameter instead of
> using MyBackendType internally?

I don't foresee a case where a different value would be passed in.


> - pgstat_count_io_op() Assert()s that the passed-in IOOp and IOContext
> are valid for this BackendType, but it doesn't check that all of the
> pending stats which should be zero are zero. I thought this was okay
> because if I did add that zero-check, it would be added to
> pgstat_count_ioop() as well, and we already Assert() there that we can
> count the op. Thus, it doesn't seem like checking that the stats are
> zero would add any additional regression protection.

It's probably ok.


> - I've kept pgstat_io_context_desc() and pgstat_io_op_desc() in the
> commit which adds those types (the local stats commit), however they
> are not used in that commit. I wasn't sure if I should keep them in
> that commit or move them to the first commit using them (the commit
> adding the new view).

> - I've left pgstat_fetch_backend_io_context_ops() in the shared stats
> commit, however it is not used until the commit which adds the view in
> pg_stat_get_io(). I wasn't sure which way seemed better.


Think that's fine.


> Notes on the commit which accumulates IO Operation stats in shared
> memory:
> 
> - I've extended the usage of the Assert()s that IO Operation stats that
> should be zero are. Previously we only checked the stats validity when
> querying the view. Now we check it when flushing pending stats and
> when reading the stats file into shared memory.

> Note that the three locations with these validity checks (when
> flushing pending stats, when reading stats file into shared memory,
> and when querying the view) have similar looking code to loop through
> and validate the stats. However, the actual action they perform if the
> stats are valid is different for each site (adding counters together,
> doing a read, setting nulls in a tuple column to true). Also, some of
> these instances have other code interspersed in the loops which would
> require additional looping if separated from this logic. So it was
> difficult to see a way of combining these into a single helper
> function.

All of them seem to repeat something like

> + if (!pgstat_bktype_io_op_valid(bktype, io_op) ||
> + 
> !pgstat_io_context_io_op_valid(io_context, io_op))

perhaps those could be combined? Afaics nothing uses pgstat_bktype_io_op_valid
separately.


> Subject: [PATCH v28 3/5] Track IO operation statistics locally
> 
> Introduce "IOOp", an IO operation done by a backend, and "IOContext",
> the IO location source or target or IO type done by a backend. For
> example, the checkpointer may write a shared buffer out. This would be
> counted as an IOOp "write" on an IOContext IOCONTEXT_SHARED by
> BackendType "checkpointer".
> 
> Each IOOp (alloc, extend, fsync, read, write) is counted per IOContext
> (local, shared, or strategy) through a call to pgstat_count_io_op().
> 
> The primary concern of these statistics is IO operations on data blocks
> during the course of normal database operations. IO done by, for
> example, the archiver or syslogger is not counted in these statistics.

s/is/are/?


> Stats on IOOps for all IOContexts for a backend are counted in a
> backend's local memory. This commit does not expose any functions for
> aggregating or viewing these stats.

s/This commit does not/A subsequent commit will expose/...


> @@ -823,6 +823,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
>   BufferDesc *bufHdr;
>   Block       bufBlock;
>   bool        found;
> + IOContext   io_context;
>   bool        isExtend;
>   bool        isLocalBuf = SmgrIsTemp(smgr);
>  
> @@ -986,10 +987,25 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
>    */
>   Assert(!(pg_atomic_read_u32(&bufHdr->state) & BM_VALID));   /* spinlock not needed */
>  
> - bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
> + if (isLocalBuf)

Re: postgres_fdw hint messages

2022-08-25 Thread Ibrar Ahmed
On Thu, Aug 25, 2022 at 6:42 PM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> The postgres_fdw tests contain this (as amended by patch 0001):
>
> ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw');
> ERROR:  invalid option "password"
> HINT:  Valid options in this context are: service, passfile,
> channel_binding, connect_timeout, dbname, host, hostaddr, port, options,
> application_name, keepalives, keepalives_idle, keepalives_interval,
> keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert,
> sslkey, sslrootcert, sslcrl, sslcrldir, sslsni, requirepeer,
> ssl_min_protocol_version, ssl_max_protocol_version, gssencmode,
> krbsrvname, gsslib, target_session_attrs, use_remote_estimate,
> fdw_startup_cost, fdw_tuple_cost, extensions, updatable, truncatable,
> fetch_size, batch_size, async_capable, parallel_commit, keep_connections
>
> This annoys developers who are working on libpq connection options,
> because any option added, removed, or changed causes this test to need
> to be updated.
>
> It's also questionable how useful this hint is in its current form,
> considering how long it is and that the options are in an
> implementation-dependent order.
>
>
Thanks, Peter, for looking at this; the HINT message keeps growing over time.

In my opinion, we should hide the complete message in the case of an invalid
option, but try to show dependent options; for example, if someone specifies
"sslcrl" and that option requires some more options, then show a HINT for
those options.

Possible changes:
>
> - Hide the hint from this particular test (done in the attached patches).
>
>

> - Keep the hint, but maybe make it sorted?
>
> - Remove all the hints like this from foreign data commands.
>
> - Don't show the hint when there are more than N valid options.
>
> - Do some kind of "did you mean" like we have for column names.
>
> Thoughts?



-- 
Ibrar Ahmed


Re: Cleaning up historical portability baggage

2022-08-25 Thread Andres Freund
Hi,

On 2022-08-18 18:13:38 +1200, Thomas Munro wrote:
> Here's a slightly better AF_INET6 one.  I'm planning to push it
> tomorrow if there are no objections.

You didn't yet, I think. Any chance you could? The HAVE_IPV6 stuff is
wrong/ugly in the meson build, right now, and I'd rather not spend time fixing
it up ;)


> It does something a little more aggressive than the preceding stuff, because
> SUSv3 says that IPv6 is an "option".  I don't see that as an issue: it also
> says that various other ubiquitous stuff we're using is optional.  Of
> course, it would be absurd for a new socket implementation to appear today
> that can't talk to a decent chunk of the internet, and all we require here
> is the headers.  That optionality was relevant for the transition period a
> couple of decades ago.

> From f162a15a6d723f8c94d9daa6236149e1f39b0d9a Mon Sep 17 00:00:00 2001
> From: Thomas Munro 
> Date: Thu, 18 Aug 2022 11:55:10 +1200
> Subject: [PATCH] Remove configure probe for sockaddr_in6 and require AF_INET6.
> 
> SUSv3  defines struct sockaddr_in6, and all targeted Unix
> systems have it.  Windows has it in .  Remove the configure
> probe, the macro and a small amount of dead code.
> 
> Also remove a mention of IPv6-less builds from the documentation, since
> there aren't any.
> 
> This is similar to commits f5580882 and 077bf2f2 for Unix sockets.
> Even though AF_INET6 is an "optional" component of SUSv3, there are no
> known modern operating system without it, and it seems even less likely
> to be omitted from future systems than AF_UNIX.
> 
> Discussion: 
> https://postgr.es/m/ca+hukgkernfhmvb_h0upremp4lpzgn06yr2_0tyikjzb-2e...@mail.gmail.com

Looks good to me.


I'm idly wondering whether it's worth at some point introducing a configure
test that just compiles a file referencing all the headers and symbols we
expect to be there...

Greetings,

Andres Freund




Re: pg_regress: lookup shellprog in $PATH

2022-08-25 Thread Tom Lane
Robert Haas  writes:
> On Thu, Aug 25, 2022 at 10:48 AM Tom Lane  wrote:
>> If we were executing a program that the user needs to have some control
>> over, sure, but what we have here is an implementation detail that I
>> doubt anyone cares about.  The fact that we're using a shell at all is
>> only because nobody has cared to manually implement I/O redirection logic
>> in these places; otherwise we'd be execl()'ing the server or psql directly.
>> Maybe the best answer would be to do that, and get out of the business
>> of knowing where the shell is?

> Well that also would not be crazy.

I experimented with this, and it seems like it might not be as awful as
we've always assumed it would be.  Attached is an incomplete POC that
converts pg_regress proper to doing things this way.  (isolationtester
and pg_regress_ecpg are outright broken by this patch, because they rely
on pg_regress' spawn_process and I didn't fix them yet.  But you can run
the core regression tests to see it works.)

The Windows side of this is completely untested and may be broken; also,
perhaps Windows has something more nearly equivalent to execvp() that we
could use instead of reconstructing a command line?  It's annoying that
the patch removes all shell-quoting hazards on the Unix side but they
are still there on the Windows side.

regards, tom lane

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index a803355f8e..e1e0d5f7a2 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -19,6 +19,7 @@
 #include "postgres_fe.h"
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -51,10 +52,6 @@ typedef struct _resultmap
  */
 char	   *host_platform = HOST_TUPLE;
 
-#ifndef WIN32	/* not used in WIN32 case */
-static char *shellprog = SHELLPROG;
-#endif
-
 /*
  * On Windows we use -w in diff switches to avoid problems with inconsistent
  * newline representation.  The actual result files will generally have
@@ -73,7 +70,7 @@ _stringlist *dblist = NULL;
 bool		debug = false;
 char	   *inputdir = ".";
 char	   *outputdir = ".";
-char   *expecteddir = ".";
+char	   *expecteddir = ".";
 char	   *bindir = PGBINDIR;
 char	   *launcher = NULL;
 static _stringlist *loadextension = NULL;
@@ -1052,12 +1049,71 @@ psql_end_command(StringInfo buf, const char *database)
 	} while (0)
 
 /*
- * Spawn a process to execute the given shell command; don't wait for it
+ * Redirect f to the file specified by fpath, opened with given flags and mode
+ * Does not return upon failure
+ */
+static void
+redirect_to(FILE *f, const char *fpath, int flags, int mode)
+{
+	int			fh;
+
+	/* Let's just be sure the FILE is idle */
+	fflush(f);
+
+	fh = open(fpath, flags, mode);
+	if (fh < 0)
+	{
+		fprintf(stderr, "could not open file \"%s\": %m\n", fpath);
+		_exit(1);
+	}
+	if (dup2(fh, fileno(f)) < 0)
+	{
+		fprintf(stderr, "could not duplicate descriptor for file \"%s\": %m\n",
+fpath);
+		_exit(1);
+	}
+	(void) close(fh);
+}
+
+/*
+ * Redirect f to the same file used by fref
+ * Does not return upon failure
+ */
+static void
+redirect_link(FILE *f, FILE *fref)
+{
+	/* Let's just be sure the FILE is idle */
+	fflush(f);
+
+	if (dup2(fileno(fref), fileno(f)) < 0)
+	{
+		fprintf(stderr, "could not duplicate file descriptor: %m\n");
+		_exit(1);
+	}
+}
+
+/*
+ * Spawn a process to execute a sub-command; don't wait for it
  *
  * Returns the process ID (or HANDLE) so we can wait for it later
+ * Does not return upon failure
+ *
+ * Arguments:
+ * file: program to be executed (may be a full path, or just program name)
+ * argv: NULL-terminated array of argument strings, as for execvp(2);
+ *   argv[0] should normally be the same as file
+ * proc_stdin: filename to make child's stdin read from, or NULL
+ *   for no redirection
+ * proc_stdout: filename to make child's stdout write to, or NULL
+ *   for no redirection
+ * proc_stderr: filename to make child's stderr write to, or NULL
+ *   for no redirection, or "&1" to link it to child's stdout
  */
 PID_TYPE
-spawn_process(const char *cmdline)
+spawn_process(const char *file, char *argv[],
+			  const char *proc_stdin,
+			  const char *proc_stdout,
+			  const char *proc_stderr)
 {
 #ifndef WIN32
 	pid_t		pid;
@@ -1085,35 +1141,65 @@ spawn_process(const char *cmdline)
 	if (pid == 0)
 	{
 		/*
-		 * In child
-		 *
-		 * Instead of using system(), exec the shell directly, and tell it to
-		 * "exec" the command too.  This saves two useless processes per
-		 * parallel test case.
+		 * In child: redirect stdio as requested, then execvp()
 		 */
-		char	   *cmdline2;
-
-		cmdline2 = psprintf("exec %s", cmdline);
-		execl(shellprog, shellprog, "-c", cmdline2, (char *) NULL);
+		if (proc_stdin)
+			redirect_to(stdin, proc_stdin, O_RDONLY, 0);
+		if (proc_stdout)
+			redirect_to(stdout, proc_stdout, O_WRONLY | O_CREAT | O_TRUNC,
+		S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH);
+		if (proc_stderr)
+		{

Re: archive modules

2022-08-25 Thread Nathan Bossart
On Thu, Aug 25, 2022 at 03:29:41PM +0200, talk to ben wrote:
> Here is a patch with the proposed wording.

Here is the same patch with a couple more links.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 92a6d8669d9e5b527a7ac9af7eb359a86526775b Mon Sep 17 00:00:00 2001
From: benoit 
Date: Mon, 22 Aug 2022 12:00:46 +0200
Subject: [PATCH v4 1/1] basic_archive parameter visibility doc patch

Module parameters are only visible from the pg_settings view once
the module is loaded. Since an archive module is loaded by the archiver
process, the parameters are never visible from the view. This patch
adds a note about this in the pg_settings system view documentation.
---
 doc/src/sgml/system-views.sgml | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 44aa70a031..18ac4620f0 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3275,7 +3275,15 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
   
This view does not display customized options
-   until the extension module that defines them has been loaded.
+   until the extension module that defines them has been loaded by the backend
+   process executing the query (e.g., via
+   , the
+   LOAD command, or a call
+   to a user-defined C function).  For example,
+   since the  is only loaded by the
+   archiver process, this view will not display any customized options defined
+   by archive modules unless special
+   action is taken to load them into the backend process executing the query.
   
 
   
-- 
2.25.1



Re: has_privs_of_role vs. is_member_of_role, redux

2022-08-25 Thread Robert Haas
On Thu, Aug 25, 2022 at 3:03 PM Joe Conway  wrote:
> Nice analysis, and surprising (to me)

Thanks.

> > I argue that #3 is a clear bug. robert can't select from stuff's
> > tables or change privileges on stuff's objects, so why can he change
> > stuff's default privileges? is_member_of_role() has a note that it is
> > not to be used for privilege checking, and this seems like it's pretty
> > clearly a privilege check.
>
> +1 this feels very wrong to me

Cool. I'll prepare a patch for that, unless someone else beats me to it.

I really hate back-patching this kind of change but it's possible that
it's the right thing to do. There's no real security exposure because
the member could always SET ROLE and then do the exact same thing, so
back-patching feels to me like it has a significantly higher chance of
turning happy users into unhappy ones than the reverse. On the other
hand, it's pretty hard to defend the current behavior once you stop to
think about it, so perhaps it should be back-patched on those grounds.
On the third hand, the fact that this has gone undiscovered for a
decade makes you wonder whether we've really had clear enough ideas
about this to justify calling it a bug rather than, say, an elevation
of our thinking on this topic.

> I'm not sure about these last two. Does it matter that object creation
> is being logged, maybe for auditing purposes, under a different user
> than the owner of the object?

I'd be inclined to say that it doesn't matter, because the grant could
have just as well been inheritable, or the action could have been
performed by a superuser. Also, as a rule of thumb, I don't think we
should choose to prohibit things on the grounds that some auditing
regime might not be able to understand what happened. If that's an
issue, we should address it by making the logging better, or including
better logging hooks, or what have you. I think that the focus should
be on the permissions model: what is the "right thing" from a security
standpoint?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: V14 and later build the backend with -lpthread

2022-08-25 Thread Robert Haas
On Thu, Aug 25, 2022 at 1:41 PM Tom Lane  wrote:
> I realized $SUBJECT while wondering why my new buildfarm animal chickadee
> (NetBSD on gaur's old hardware) fails the plpython tests on v13 and
> earlier.  After a bit of investigation I realized it *should* be failing,
> because neither NetBSD nor Python have done anything about the problem
> documented in [1].  The reason it fails to fail in current branches is
> that we're now pulling -lpthread into the backend, which AFAICT is an
> unintentional side-effect of sloppy autoconfmanship in commits
> de91c3b97 / 44bf3d508.  We wanted pthread_barrier_wait() for pgbench,
> not the backend, but as-committed we'll add -lpthread to LIBS if it
> provides pthread_barrier_wait.
>
> Now maybe someday we'll be brave enough to make the backend multithreaded,
> but today is not that day, and in the meantime this seems like a rather
> dangerous situation.  There has certainly been exactly zero analysis
> of whether it's safe.
>
> ... On the third hand, poking at backends with ldd shows that at
> least on Linux, we've been linking the backend with -lpthread for
> quite some time, back to 9.4 or so.  The new-in-v14 behavior is that
> it's getting in there on BSD-ish platforms as well.
>
> Should we try to pull that back out, or just cross our fingers and
> hope there's no real problem?

Absent some evidence of a real problem, I vote for crossing our
fingers. It would certainly be a very bad idea to start using pthreads
willy-nilly in the back end, but the mere presence of the library
doesn't seem like a particularly severe issue. I might feel
differently if no such version had been released yet, but it's hard to
feel like the sky is falling if it's been like this on Linux since
9.4.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: has_privs_of_role vs. is_member_of_role, redux

2022-08-25 Thread Tom Lane
Robert Haas  writes:
> I really hate back-patching this kind of change but it's possible that
> it's the right thing to do. There's no real security exposure because
> the member could always SET ROLE and then do the exact same thing, so
> back-patching feels to me like it has a significantly higher chance of
> turning happy users into unhappy ones than the reverse. On the other
> hand, it's pretty hard to defend the current behavior once you stop to
> think about it, so perhaps it should be back-patched on those grounds.
> On the third hand, the fact that this has gone undiscovered for a
> decade makes you wonder whether we've really had clear enough ideas
> about this to justify calling it a bug rather than, say, an elevation
> of our thinking on this topic.

Yeah, I'd lean against back-patching.  This is the sort of behavioral
change that users tend not to like finding in minor releases.

regards, tom lane




Re: V14 and later build the backend with -lpthread

2022-08-25 Thread Thomas Munro
On Fri, Aug 26, 2022 at 8:40 AM Robert Haas  wrote:
> On Thu, Aug 25, 2022 at 1:41 PM Tom Lane  wrote:
> > I realized $SUBJECT while wondering why my new buildfarm animal chickadee
> > (NetBSD on gaur's old hardware) fails the plpython tests on v13 and
> > earlier.  After a bit of investigation I realized it *should* be failing,
> > because neither NetBSD nor Python have done anything about the problem
> > documented in [1].  The reason it fails to fail in current branches is
> > that we're now pulling -lpthread into the backend, which AFAICT is an
> > unintentional side-effect of sloppy autoconfmanship in commits
> > de91c3b97 / 44bf3d508.  We wanted pthread_barrier_wait() for pgbench,
> > not the backend, but as-committed we'll add -lpthread to LIBS if it
> > provides pthread_barrier_wait.
> >
> > Now maybe someday we'll be brave enough to make the backend multithreaded,
> > but today is not that day, and in the meantime this seems like a rather
> > dangerous situation.  There has certainly been exactly zero analysis
> > of whether it's safe.
> >
> > ... On the third hand, poking at backends with ldd shows that at
> > least on Linux, we've been linking the backend with -lpthread for
> > quite some time, back to 9.4 or so.  The new-in-v14 behavior is that
> > it's getting in there on BSD-ish platforms as well.
> >
> > Should we try to pull that back out, or just cross our fingers and
> > hope there's no real problem?
>
> Absent some evidence of a real problem, I vote for crossing our
> fingers. It would certainly be a very bad idea to start using pthreads
> willy-nilly in the back end, but the mere presence of the library
> doesn't seem like a particularly severe issue. I might feel
> differently if no such version had been released yet, but it's hard to
> feel like the sky is falling if it's been like this on Linux since
> 9.4.

I suspect we will end up linked against the threading library anyway
in real-world builds via --with-XXX (I see that --with-icu has that
effect on my FreeBSD system, but I know that details about threading
are quite different in NetBSD).  I may lack imagination but I'm
struggling to see how it could break anything.

How should I have done that, by the way?  Is the attached the right trick?
From 761158cb38508dbb91ca7fff474b4f6e041583a7 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 26 Aug 2022 08:16:12 +1200
Subject: [PATCH] Avoid adding -pthread to backend unintentionally.

---
 configure| 35 +--
 configure.ac |  7 +++
 2 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/configure b/configure
index 814014a96b..fefe97e520 100755
--- a/configure
+++ b/configure
@@ -647,13 +647,13 @@ MSGFMT
 PG_CRC32C_OBJS
 CFLAGS_ARMV8_CRC32C
 CFLAGS_SSE42
-LIBOBJS
 ZSTD
 LZ4
 UUID_LIBS
 LDAP_LIBS_BE
 LDAP_LIBS_FE
 with_ssl
+LIBOBJS
 PTHREAD_CFLAGS
 PTHREAD_LIBS
 PTHREAD_CC
@@ -12429,6 +12429,7 @@ fi
 
 
 if test "$enable_thread_safety" = yes; then
+  _LIBS="$LIBS"
   { $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing pthread_barrier_wait" >&5
 $as_echo_n "checking for library containing pthread_barrier_wait... " >&6; }
 if ${ac_cv_search_pthread_barrier_wait+:} false; then :
@@ -12485,6 +12486,21 @@ if test "$ac_res" != no; then :
 
 fi
 
+  ac_fn_c_check_func "$LINENO" "pthread_barrier_wait" "ac_cv_func_pthread_barrier_wait"
+if test "x$ac_cv_func_pthread_barrier_wait" = xyes; then :
+  $as_echo "#define HAVE_PTHREAD_BARRIER_WAIT 1" >>confdefs.h
+
+else
+  case " $LIBOBJS " in
+  *" pthread_barrier_wait.$ac_objext "* ) ;;
+  *) LIBOBJS="$LIBOBJS pthread_barrier_wait.$ac_objext"
+ ;;
+esac
+
+fi
+
+
+  LIBS="$_LIBS"
 fi
 
 if test "$with_readline" = yes; then
@@ -16369,23 +16385,6 @@ fi
 
 
 
-if test "$enable_thread_safety" = yes; then
-  ac_fn_c_check_func "$LINENO" "pthread_barrier_wait" "ac_cv_func_pthread_barrier_wait"
-if test "x$ac_cv_func_pthread_barrier_wait" = xyes; then :
-  $as_echo "#define HAVE_PTHREAD_BARRIER_WAIT 1" >>confdefs.h
-
-else
-  case " $LIBOBJS " in
-  *" pthread_barrier_wait.$ac_objext "* ) ;;
-  *) LIBOBJS="$LIBOBJS pthread_barrier_wait.$ac_objext"
- ;;
-esac
-
-fi
-
-
-fi
-
 if test "$PORTNAME" = "win32" -o "$PORTNAME" = "cygwin"; then
 	# Cygwin and (apparently, based on test results) Mingw both
 	# have a broken strtof(), so substitute its implementation.
diff --git a/configure.ac b/configure.ac
index 6ff294d405..c51f174fde 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1260,7 +1260,10 @@ AC_SEARCH_LIBS(shmget, cygipc)
 AC_SEARCH_LIBS(backtrace_symbols, execinfo)
 
 if test "$enable_thread_safety" = yes; then
+  _LIBS="$LIBS"
   AC_SEARCH_LIBS(pthread_barrier_wait, pthread)
+  AC_REPLACE_FUNCS([pthread_barrier_wait])
+  LIBS="$_LIBS"
 fi
 
 if test "$with_readline" = yes; then
@@ -1831,10 +1834,6 @@ AC_REPLACE_FUNCS(m4_normalize([
 	strnlen
 ]))
 
-if test "$enable_thread_safety" = yes; then
-  AC_REPLACE_FUNCS(pthread_barrier_wait)
-fi
-
 if test "$PORTNAME" = "win32" -o "$PORTNAM
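The save/restore idiom in the configure.ac hunk above -- probe for
pthread_barrier_wait, then put LIBS back so the probe library does not leak
into every link -- can be sketched in plain shell. The AC_SEARCH_LIBS probe
itself is simulated here; this is an illustration of the pattern, not part
of the patch.

```shell
# Sketch of the LIBS save/restore idiom from the configure.ac hunk above.
# In a real configure run, a successful AC_SEARCH_LIBS would append
# "-lpthread" to LIBS; the restore keeps that out of the backend link line.
LIBS="-lm -lz"
_LIBS="$LIBS"              # save LIBS before probing
LIBS="$LIBS -lpthread"     # what a successful AC_SEARCH_LIBS would do
# ... AC_REPLACE_FUNCS([pthread_barrier_wait]) runs while pthread is visible ...
LIBS="$_LIBS"              # restore: the probe result is kept, LIBS is not
printf '%s\n' "$LIBS"
```

Running this prints the original "-lm -lz", which is the point: the check's
result survives, but the extra library does not end up linked everywhere.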

Re: V14 and later build the backend with -lpthread

2022-08-25 Thread Tom Lane
Robert Haas  writes:
> On Thu, Aug 25, 2022 at 1:41 PM Tom Lane  wrote:
>> ... On the third hand, poking at backends with ldd shows that at
>> least on Linux, we've been linking the backend with -lpthread for
>> quite some time, back to 9.4 or so.  The new-in-v14 behavior is that
>> it's getting in there on BSD-ish platforms as well.

[ further study shows that it's been pulled in on Linux to get sem_init() ]

>> Should we try to pull that back out, or just cross our fingers and
>> hope there's no real problem?

> Absent some evidence of a real problem, I vote for crossing our
> fingers. It would certainly be a very bad idea to start using pthreads
> willy-nilly in the back end, but the mere presence of the library
> doesn't seem like a particularly severe issue. I might feel
> differently if no such version had been released yet, but it's hard to
> feel like the sky is falling if it's been like this on Linux since
> 9.4.

Well, -lpthread on other platforms might have more or different
side-effects than it does on Linux, so I'm not particularly comforted
by that argument.  I concede though that the lack of complaints about
v14 is comforting.  I'm prepared to do nothing for now; I just wanted
to raise visibility of this point so that if we do come across any
weird pre-vs-post-v14 issues, we think of this as a possible cause.

regards, tom lane




Re: V14 and later build the backend with -lpthread

2022-08-25 Thread Tom Lane
Thomas Munro  writes:
> I suspect we will end up linked against the threading library anyway
> in real-world builds via --with-XXX (I see that --with-icu has that
> effect on my FreeBSD system, but I know that details about threading
> are quite different in NetBSD).  I may lack imagination but I'm
> struggling to see how it could break anything.

Yeah, there are plenty of situations where you end up with thread
support present somehow.  So it may be a lost cause.  I was mostly
concerned about this because it seemed like an unintentional change.

(I'm also still struggling to explain why mamba, with the *exact*
same NetBSD code on a different hardware platform, isn't showing
the same failures as chickadee.  More news if I figure that out.)

> How should I have done that, by the way?  Is the attached the right trick?

I think that'd do for preventing side-effects on LIBS, but I'm not
sure if we'd have to back-fill something in pgbench's link options.
Anyway, as I said to Robert, I'm content to watch and wait for now.

regards, tom lane




PostgreSQL 15 Beta 4

2022-08-25 Thread Jonathan S. Katz

Hi,

We will be releasing a PostgreSQL 15 Beta 4 on September 8, 2022.

Please have open items[1] completed and committed no later than 
September 5, 2022 0:00 AoE[2].


Thanks,

Jonathan

[1] https://wiki.postgresql.org/wiki/PostgreSQL_15_Open_Items
[2] https://en.wikipedia.org/wiki/Anywhere_on_Earth





New strategies for freezing, advancing relfrozenxid early

2022-08-25 Thread Peter Geoghegan
Attached patch series is a completely overhauled version of earlier
work on freezing. Related work from the Postgres 15 cycle became
commits 0b018fab, f3c15cbe, and 44fa8488.

Recap
=

The main high level goal of this work is to avoid painful, disruptive
antiwraparound autovacuums (and other aggressive VACUUMs) that do way
too much "catch up" freezing, all at once, causing significant
disruption to production workloads. The patches teach VACUUM to care
about how far behind it is on freezing for each table -- the number of
unfrozen all-visible pages that have accumulated so far is directly
and explicitly kept under control over time. Unfrozen pages can be
seen as debt. There isn't necessarily anything wrong with getting into
debt (getting into debt to a small degree is all but inevitable), but
debt can be dangerous when it isn't managed carefully. Accumulating
large amounts of debt doesn't always end badly, but it does seem to
reliably create the *risk* that things will end badly.

Right now, a standard append-only table could easily do *all* freezing
in aggressive/antiwraparound VACUUM, without any earlier
non-aggressive VACUUM operations triggered by
autovacuum_vacuum_insert_threshold doing any freezing at all (unless
the user goes out of their way to tune vacuum_freeze_min_age). There
is currently no natural limit on the number of unfrozen all-visible
pages that can accumulate -- unless you count age(relfrozenxid), the
triggering condition for antiwraparound autovacuum. But relfrozenxid
age predicts almost nothing about how much freezing is required (or
will be required later on). The overall result is that it often takes
far too long for freezing to finally happen, even when the table
receives plenty of autovacuums (they all could freeze something, but
in practice just don't freeze anything). It's very hard to avoid that
through tuning, because what we really care about is something pretty
closely related to (if not exactly) the number of unfrozen heap pages
in the system. XID age is fundamentally "the wrong unit" here -- the
physical cost of freezing is the most important thing, by far.

In short, the goal of the patch series/project is to make autovacuum
scheduling much more predictable over time. Especially with very large
append-only tables. The patches improve the performance stability of
VACUUM by managing costs holistically, over time. What happens in one
single VACUUM operation is much less important than the behavior of
successive VACUUM operations over time.

What's new: freezing/skipping strategies


This newly overhauled version introduces the concept of
per-VACUUM-operation strategies, which we decide on once per VACUUM,
at the very start. There are 2 choices to be made at this point (right
after we acquire OldestXmin and similar cutoffs):

1) Do we scan all-visible pages, or do we skip instead? (Added by
second patch, involves a trade-off between eagerness and laziness.)
2) How should we freeze -- eagerly or lazily? (Added by third patch)

The strategy-based approach can be thought of as something that blurs
the distinction between aggressive and non-aggressive VACUUM, giving
VACUUM more freedom to do either more or less work, based on known
costs and benefits. This doesn't completely supersede
aggressive/antiwraparound VACUUMs, but should make them much rarer
with larger tables, where controlling freeze debt actually matters.
There is a need to keep laziness and eagerness in balance here. We try
to get the benefit of lazy behaviors/strategies, but will still course
correct when it doesn't work out.

A new GUC/reloption called vacuum_freeze_strategy_threshold is added
to control freezing strategy (also influences our choice of skipping
strategy). It defaults to 4GB, so tables smaller than that cutoff
(which are usually the majority of all tables) will continue to freeze
in much the same way as today by default. Our current lazy approach to
freezing makes sense there, and should be preserved for its own sake.
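As a toy illustration (not PostgreSQL code), the per-VACUUM strategy
decision described above boils down to a table-size test against the
assumed 4GB default of vacuum_freeze_strategy_threshold:

```shell
# Toy model of the per-VACUUM freezing strategy decision (not real
# PostgreSQL code). Tables at or above vacuum_freeze_strategy_threshold
# freeze eagerly; smaller tables keep today's lazy behavior.
vacuum_freeze_strategy_threshold=$((4 * 1024))   # assumed default, in MB

choose_freeze_strategy() {
    table_size_mb=$1
    if [ "$table_size_mb" -ge "$vacuum_freeze_strategy_threshold" ]; then
        echo eager
    else
        echo lazy
    fi
}

small=$(choose_freeze_strategy 512)     # 512MB table
large=$(choose_freeze_strategy 16384)   # 16GB table
echo "$small $large"
```

The real decision is made once per VACUUM, right after the XID-wise cutoffs
are acquired; the sketch only shows why the GUC reads as a simple size
cutoff from the user's point of view.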

Compatibility
=

Structuring the new freezing behavior as an explicit user-configurable
strategy is also useful as a bridge between the old and new freezing
behaviors. It makes it fairly easy to get the old/current behavior
where that's preferred -- which, I must admit, is something that
wasn't well thought through last time around. The
vacuum_freeze_strategy_threshold GUC is effectively (though not
explicitly) a compatibility option. Users that want something close to
the old/current behavior can use the GUC or reloption to more or less
opt-out of the new freezing behavior, and can do so selectively. The
GUC should be easy for users to understand, too -- it's just a table
size cutoff.

Skipping pages using a snapshot of the visibility map
=

We now take a copy of the visibility map at the point that VACUUM
begins, and work off of that when skipping, instead of working off of
t

Re: V14 and later build the backend with -lpthread

2022-08-25 Thread Tom Lane
I wrote:
> (I'm also still struggling to explain why mamba, with the *exact*
> same NetBSD code on a different hardware platform, isn't showing
> the same failures as chickadee.  More news if I figure that out.)

Hah: I left --with-libxml out of chickadee's configuration, because
libxml2 seemed to have some problems on that platform, and that is
what is pulling in libpthread on mamba:

$ ldd /usr/pkg/lib/libxml2.so
/usr/pkg/lib/libxml2.so:
-lz.1 => /usr/lib/libz.so.1
-lc.12 => /usr/lib/libc.so.12
-llzma.2 => /usr/lib/liblzma.so.2
-lpthread.1 => /lib/libpthread.so.1
-lm.0 => /usr/lib/libm.so.0
-lgcc_s.1 => /lib/libgcc_s.so.1

Reinforces your point about real-world builds, I suppose.
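For anyone reproducing this kind of diagnosis, filtering the loader output
is enough to spot the indirect dependency; a minimal sketch over ldd-style
output (sample abridged from the libxml2 listing above):

```shell
# Sketch: spot an indirect libpthread dependency in ldd-style output.
# The sample lines are abridged from the libxml2 listing above.
ldd_output='-lz.1 => /usr/lib/libz.so.1
-llzma.2 => /usr/lib/liblzma.so.2
-lpthread.1 => /lib/libpthread.so.1
-lm.0 => /usr/lib/libm.so.0'
match=$(printf '%s\n' "$ldd_output" | awk '/pthread/ {print $1}')
echo "$match"
```

Pointing the same filter at `ldd` run on a real backend binary shows
whether libpthread arrives directly or only via a dependency like libxml2.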

For the moment I'll just disable testing plpython pre-v14 on
chickadee.

regards, tom lane




Re: V14 and later build the backend with -lpthread

2022-08-25 Thread Andres Freund
Hi,

On 2022-08-25 17:04:37 -0400, Tom Lane wrote:
> (I'm also still struggling to explain why mamba, with the *exact*
> same NetBSD code on a different hardware platform, isn't showing
> the same failures as chickadee.  More news if I figure that out.)

I'd guess it's because of the different dependencies that are enabled. On my
netbsd VM libxml2 pulls in -lpthread, for example. We add xml2's dependencies
to LIBS, so if that's enabled, we end up indirectly pulling in libxml2 in as
well.


> > How should I have done that, by the way?  Is the attached the right trick?
>
> I think that'd do for preventing side-effects on LIBS, but I'm not
> sure if we'd have to back-fill something in pgbench's link options.
> Anyway, as I said to Robert, I'm content to watch and wait for now.

Given that linking in pthreads support fixes things, that seems the right
course... I wonder if we shouldn't even be more explicit about it and just add
it - who knows what extension libraries pull in. It'd not be good if we end up
with non-reentrant versions of functions just because initially the backend
isn't threaded etc.

Greetings,

Andres Freund




Re: Strip -mmacosx-version-min options from plperl build

2022-08-25 Thread Andrew Dunstan


On 2022-08-25 Th 09:43, Tom Lane wrote:
> Peter Eisentraut  writes:
 In short: if the Cygwin Perl headers redefine PGDLLEXPORT (unlikely)
 or somehow #define "__attribute__()" or "visibility()" into no-ops
 (perhaps more likely) then we could explain this failure, and that
 would also explain why it doesn't fail elsewhere.
>> This could be checked by running plperl.c through the preprocessor 
>> (replace gcc -c plperl.c -o plperl.o by gcc -E plperl.c -o plperl.i) and 
>> seeing what becomes of those symbols.
> Yeah, that was what I was going to suggest: grep the "-E" output for
> _PG_init and Pg_magic_func and confirm what their extern declarations
> look like.


$ egrep '_PG_init|Pg_magic_func'  plperl.i
extern __attribute__((visibility("default"))) void _PG_init(void);
extern __attribute__((visibility("default"))) const Pg_magic_struct
*Pg_magic_func(void); const Pg_magic_struct * Pg_magic_func(void) {
static const Pg_magic_struct Pg_magic_data = { sizeof(Pg_magic_struct),
16 / 100, 100, 32, 64,
_PG_init(void)


cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Strip -mmacosx-version-min options from plperl build

2022-08-25 Thread Andres Freund
Hi,

On 2022-08-25 17:39:35 -0400, Andrew Dunstan wrote:
> On 2022-08-25 Th 09:43, Tom Lane wrote:
> > Peter Eisentraut  writes:
>  In short: if the Cygwin Perl headers redefine PGDLLEXPORT (unlikely)
>  or somehow #define "__attribute__()" or "visibility()" into no-ops
>  (perhaps more likely) then we could explain this failure, and that
>  would also explain why it doesn't fail elsewhere.
> >> This could be checked by running plperl.c through the preprocessor 
> >> (replace gcc -c plperl.c -o plperl.o by gcc -E plperl.c -o plperl.i) and 
> >> seeing what becomes of those symbols.
> > Yeah, that was what I was going to suggest: grep the "-E" output for
> > _PG_init and Pg_magic_func and confirm what their extern declarations
> > look like.
> 
> 
> $ egrep '_PG_init|Pg_magic_func'  plperl.i
> extern __attribute__((visibility("default"))) void _PG_init(void);
> extern __attribute__((visibility("default"))) const Pg_magic_struct
> *Pg_magic_func(void); const Pg_magic_struct * Pg_magic_func(void) {
> static const Pg_magic_struct Pg_magic_data = { sizeof(Pg_magic_struct),
> 16 / 100, 100, 32, 64,
> _PG_init(void)

Could you show objdump -t of the library? Perhaps once with the flags as now,
and once relinking with the "old" flags that we're now omitting?

Greetings,

Andres Freund




Re: Strip -mmacosx-version-min options from plperl build

2022-08-25 Thread Andrew Dunstan


On 2022-08-25 Th 17:47, Andres Freund wrote:
> Hi,
>
> On 2022-08-25 17:39:35 -0400, Andrew Dunstan wrote:
>> On 2022-08-25 Th 09:43, Tom Lane wrote:
>>> Peter Eisentraut  writes:
>> In short: if the Cygwin Perl headers redefine PGDLLEXPORT (unlikely)
>> or somehow #define "__attribute__()" or "visibility()" into no-ops
>> (perhaps more likely) then we could explain this failure, and that
>> would also explain why it doesn't fail elsewhere.
 This could be checked by running plperl.c through the preprocessor 
 (replace gcc -c plperl.c -o plperl.o by gcc -E plperl.c -o plperl.i) and 
 seeing what becomes of those symbols.
>>> Yeah, that was what I was going to suggest: grep the "-E" output for
>>> _PG_init and Pg_magic_func and confirm what their extern declarations
>>> look like.
>>
>> $ egrep '_PG_init|Pg_magic_func'  plperl.i
>> extern __attribute__((visibility("default"))) void _PG_init(void);
>> extern __attribute__((visibility("default"))) const Pg_magic_struct
>> *Pg_magic_func(void); const Pg_magic_struct * Pg_magic_func(void) {
>> static const Pg_magic_struct Pg_magic_data = { sizeof(Pg_magic_struct),
>> 16 / 100, 100, 32, 64,
>> _PG_init(void)
> Could you show objdump -t of the library? Perhaps once with the flags as now,
> and once relinking with the "old" flags that we're now omitting?


current:


$ objdump -t plperl.dll | egrep '_PG_init|Pg_magic_func'
[103](sec  1)(fl 0x00)(ty  20)(scl   2) (nx 0) 0x40a0
Pg_magic_func
[105](sec  1)(fl 0x00)(ty  20)(scl   2) (nx 0) 0x40b0 _PG_init


from July 11th build:


$ objdump -t plperl.dll | egrep '_PG_init|Pg_magic_func'
[101](sec  1)(fl 0x00)(ty  20)(scl   2) (nx 0) 0x40d0
Pg_magic_func
[103](sec  1)(fl 0x00)(ty  20)(scl   2) (nx 0) 0x40e0 _PG_init


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Strip -mmacosx-version-min options from plperl build

2022-08-25 Thread Andres Freund
Hi,

On 2022-08-25 18:04:34 -0400, Andrew Dunstan wrote:
> On 2022-08-25 Th 17:47, Andres Freund wrote:
> >> $ egrep '_PG_init|Pg_magic_func'  plperl.i
> >> extern __attribute__((visibility("default"))) void _PG_init(void);
> >> extern __attribute__((visibility("default"))) const Pg_magic_struct
> >> *Pg_magic_func(void); const Pg_magic_struct * Pg_magic_func(void) {
> >> static const Pg_magic_struct Pg_magic_data = { sizeof(Pg_magic_struct),
> >> 16 / 100, 100, 32, 64,
> >> _PG_init(void)
> > Could you show objdump -t of the library? Perhaps once with the flags as 
> > now,
> > and once relinking with the "old" flags that we're now omitting?
> 
> 
> current:
> 
> 
> $ objdump -t plperl.dll | egrep '_PG_init|Pg_magic_func'
> [103](sec  1)(fl 0x00)(ty  20)(scl   2) (nx 0) 0x40a0
> Pg_magic_func
> [105](sec  1)(fl 0x00)(ty  20)(scl   2) (nx 0) 0x40b0 _PG_init
> 
> 
> from July 11th build:
> 
> 
> $ objdump -t plperl.dll | egrep '_PG_init|Pg_magic_func'
> [101](sec  1)(fl 0x00)(ty  20)(scl   2) (nx 0) 0x40d0
> Pg_magic_func
> [103](sec  1)(fl 0x00)(ty  20)(scl   2) (nx 0) 0x40e0 _PG_init

Thanks.

So it looks like it's not the symbol not being exported. I wonder if the image
base thing is somehow the problem? Sounds like it should just be an efficiency
difference, by avoiding some relocations, not a functional difference.

Can you try adding just that to the flags for building, and see whether
that then allows a LOAD 'plperl' to succeed?

Greetings,

Andres Freund




Re: Cleaning up historical portability baggage

2022-08-25 Thread Thomas Munro
On Fri, Aug 26, 2022 at 7:47 AM Andres Freund  wrote:
> On 2022-08-18 18:13:38 +1200, Thomas Munro wrote:
> > Here's a slightly better AF_INET6 one.  I'm planning to push it
> > tomorrow if there are no objections.
>
> You didn't yet, I think. Any chance you could? The HAVE_IPV6 stuff is
> wrong/ugly in the meson build, right now, and I'd rather not spend time fixing
> it up ;)

Done, and thanks for looking.

Remaining things from this thread:
 * removing --disable-thread-safety
 * removing those vestigial HAVE_XXX macros (one by one analysis and patches)
 * making Unix sockets secure for Windows in tests




Re: New strategies for freezing, advancing relfrozenxid early

2022-08-25 Thread Peter Geoghegan
On Thu, Aug 25, 2022 at 3:35 PM Jeremy Schneider  wrote:
> We should be careful here. IIUC, the current autovac behavior helps
> bound the "spread" or range of active multixact IDs in the system, which
> directly determines the number of distinct pages that contain those
> multixacts. If the proposed change herein causes the spread/range of
> MXIDs to significantly increase, then it will increase the number of
> blocks and increase the probability of thrashing on the SLRUs for these
> data structures.

As a general rule VACUUM will tend to do more eager freezing with the
patch set compared to HEAD, though it should never do less eager
freezing. Not even in corner cases -- never.

With the patch, VACUUM pretty much uses the most aggressive possible
XID-wise/MXID-wise cutoffs in almost all cases (though only when we
actually decide to freeze a page at all, which is now a separate
question). The fourth patch in the patch series introduces a very
limited exception, where we use the same cutoffs that we'll always use
on HEAD (FreezeLimit + MultiXactCutoff) instead of the aggressive
variants (OldestXmin and OldestMxact). This isn't just *any* xmax
containing a MultiXact: it's a Multi that contains *some* XIDs that
*need* to go away during the ongoing VACUUM, and others that *cannot*
go away. Oh, and there usually has to be a need to keep two or more
XIDs for this to happen -- if there is only one XID then we can
usually swap xmax with that XID without any fuss.

> PS. see also
> https://www.postgresql.org/message-id/247e3ce4-ae81-d6ad-f54d-7d3e0409a...@ardentperf.com

I think that the problem you describe here is very real, though I
suspect that it needs to be addressed by making opportunistic cleanup
of Multis happen more reliably. Running VACUUM more often just isn't
practical once a table reaches a certain size. In general, any kind of
processing that is time sensitive probably shouldn't be happening
solely during VACUUM -- it's just too risky. VACUUM might take a
relatively long time to get to the affected page. It might not even be
that long in wall clock time or whatever -- just too long to reliably
avoid the problem.

-- 
Peter Geoghegan




Re: New strategies for freezing, advancing relfrozenxid early

2022-08-25 Thread Peter Geoghegan
On Thu, Aug 25, 2022 at 4:23 PM Peter Geoghegan  wrote:
> As a general rule VACUUM will tend to do more eager freezing with the
> patch set compared to HEAD, though it should never do less eager
> freezing. Not even in corner cases -- never.

Come to think of it, I don't think that that's quite true. Though the
fourth patch isn't particularly related to the problem.

It *is* true that VACUUM will do at least as much freezing of XID
based tuple header fields as before. That just leaves MXIDs. It's even
true that we will do just as much freezing as before if you go pure on
MultiXact-age. But I'm the one that likes to point out that age is
altogether the wrong approach for stuff like this -- so that won't cut
it.

More concretely, I think that the patch series will fail to do certain
inexpensive eager processing of tuple xmax that happens today,
regardless of what the user has set vacuum_freeze_min_age or
vacuum_multixact_freeze_min_age to. Although we currently only care
about XID age when processing simple XIDs, we already sometimes make
trade-offs similar to the trade-off I propose to make in the fourth
patch for Multis.

In other words, on HEAD, we promise to process any MXID >=
MultiXactCutoff inside FreezeMultiXactId(). But we also manage to do
"eager processing of xmax" when it's cheap and easy to do so, without
caring about MultiXactCutoff at all -- this is likely the common case,
even. This preexisting eager processing of Multis is likely important
in many applications.

The problem that I think I've created is that page-level freezing as
implemented in lazy_scan_prune by the third patch doesn't know that it
might be a good idea to execute a subset of freeze plans, in order to
remove a multi from a page right away. It mostly has the right idea by
holding off on freezing until it looks like a good idea at the level
of the whole page, but I think that this is a plausible exception.
Just because we're much more sensitive to leaving behind a Multi, and
right now the only code path that can remove a Multi that isn't needed
anymore is FreezeMultiXactId().

If xmax was an updater that aborted instead of a multi then we could
rely on hint bits being set by pruning to avoid clog lookups.
Technically nobody has violated their contract here, I think, but it
still seems like it could easily be unacceptable.

I need to come up with my own microbenchmark suite for Multis -- that
was on my TODO list already. I still believe that the fourth patch
addresses Andres' complaint about allocating new Multis during VACUUM.
But it seems like I need to think about the nuances of Multis some
more. In particular, what the performance impact might be of making a
decision on freezing at the page level, in light of the special
performance considerations for Multis.

Maybe it needs to be more granular than that, more often. Or maybe we
can comprehensively solve the problem in some other way entirely.
Maybe pruning should do this instead, in general. Something like that
might put this right, and be independently useful.

-- 
Peter Geoghegan




Re: pg_receivewal and SIGTERM

2022-08-25 Thread Michael Paquier
On Thu, Aug 25, 2022 at 08:45:05PM +0200, Magnus Hagander wrote:
> I'm leaning towards considering it a feature-change and thus not
> something to backpatch (I'd be OK sneaking it into 15 though, as that
> one is not released yet and it feels like a perfectly *safe* change).
> Not enough to insist on it, but it seems "slightly more correct".

Fine by me if both you and Daniel want to be more careful with this
change.  We could always argue about a backpatch later if there is
more ask for it, as well.
--
Michael




Re: SYSTEM_USER reserved word implementation

2022-08-25 Thread Michael Paquier
On Thu, Aug 25, 2022 at 08:21:05PM +0200, Drouvot, Bertrand wrote:
> system_user() now returns a text and I moved it to miscinit.c in the new
> version attached (I think it makes more sense now).

+/* kluge to avoid including libpq/libpq-be.h here */
+struct ClientConnectionInfo;
+extern void InitializeSystemUser(struct ClientConnectionInfo conninfo);
+extern const char* GetSystemUser(void);

FWIW, I was also wondering about the need for all this initialization
stanza and the extra SystemUser in TopMemoryContext.  Now that we have
MyClientConnectionInfo, I was thinking to just build the string in the
SQL function as that's the only code path that needs to know about
it.  True that this approach saves some extra palloc() calls each time
the function is called.
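For context, the string being built is assumed here to follow the
auth_method:identity shape that SYSTEM_USER is meant to expose; the values
below are made up, and assembling it outside the backend is trivial:

```shell
# Sketch of the value SYSTEM_USER is expected to expose, assumed to be
# "auth_method:identity" built from the client connection info.
# Both values below are illustrative.
auth_method=peer
authn_id=alice
system_user=$(printf '%s:%s' "$auth_method" "$authn_id")
echo "$system_user"
```

Whether that string is cached at connection time or rebuilt in the SQL
function each call is exactly the trade-off being discussed above.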

> New version attached is also addressing Michael's remark regarding the peer
> authentication TAP test.

Thanks.  I've wanted some basic tests for the peer authentication for
some time now, independently on this thread, so it would make sense to
split that into a first patch and stress the buildfarm to see what
happens, then add these tests for SYSTEM_USER on top of the new test.
--
Michael




Fix japanese translation of log messages

2022-08-25 Thread Shinya Kato

Hi hackers,

I've found typos in ja.po, and fixed them.
The patch is attached.

--
Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATIONdiff --git a/src/backend/po/ja.po b/src/backend/po/ja.po
index 0925465d22..189000792c 100644
--- a/src/backend/po/ja.po
+++ b/src/backend/po/ja.po
@@ -1492,7 +1492,7 @@ msgstr "パラレルワーカの初期化に失敗しました"
 #: access/transam/parallel.c:719 access/transam/parallel.c:838
 #, c-format
 msgid "More details may be available in the server log."
-msgstr "詳細な情報がはサーバログにあるかもしれません。"
+msgstr "詳細な情報はサーバログにあるかもしれません。"
 
 #: access/transam/parallel.c:899
 #, c-format
@@ -1841,7 +1841,7 @@ msgid ""
 "Stop the postmaster and vacuum that database in single-user mode.\n"
 "You might also need to commit or roll back old prepared transactions, or drop stale replication slots."
 msgstr ""
-"postmaster を停止後、シングルユーザモードでデータベースをVACUUMを実行してください。\n"
+"postmaster を停止後、シングルユーザモードでデータベースにVACUUMを実行してください。\n"
 "古い準備済みトランザクションのコミットまたはロールバック、もしくは古いレプリケーションスロットの削除も必要かもしれません。"
 
 #: access/transam/varsup.c:136
@@ -12183,7 +12183,7 @@ msgstr "失敗した行は%sを含みます"
 #: executor/execMain.c:1970
 #, c-format
 msgid "null value in column \"%s\" of relation \"%s\" violates not-null constraint"
-msgstr "リレーション\"%2$s\"の列\"%1$s\"のNULL値がが非NULL制約に違反しています"
+msgstr "リレーション\"%2$s\"の列\"%1$s\"のNULL値が非NULL制約に違反しています"
 
 #: executor/execMain.c:2021
 #, c-format
@@ -17583,7 +17583,7 @@ msgstr "CREATE TABLE では既存のインデックスを使えません"
 #: parser/parse_utilcmd.c:2261
 #, c-format
 msgid "index \"%s\" is already associated with a constraint"
-msgstr "インデックス\"%s\"はすでに1つの制約に割り当てられれいます"
+msgstr "インデックス\"%s\"はすでに1つの制約に割り当てられています"
 
 #: parser/parse_utilcmd.c:2276
 #, c-format
@@ -25012,7 +25012,7 @@ msgstr "設定列\"%s\"をNULLにすることはできません"
 #: utils/adt/tsvector_op.c:2665
 #, c-format
 msgid "text search configuration name \"%s\" must be schema-qualified"
-msgstr "テキスト検索設定名称\"%s\"はスキーマ修飾しなけれナバりません"
+msgstr "テキスト検索設定名称\"%s\"はスキーマ修飾しなければなりません"
 
 #: utils/adt/tsvector_op.c:2690
 #, c-format
@@ -26503,7 +26503,7 @@ msgstr "問い合わせのパースツリーをログに記録します。"
 
 #: utils/misc/guc.c:1498
 msgid "Logs each query's rewritten parse tree."
-msgstr "リライト後の問い合わせのパースツリーをログを記録します。"
+msgstr "リライト後の問い合わせのパースツリーをログに記録します。"
 
 #: utils/misc/guc.c:1507
 msgid "Logs each query's execution plan."
@@ -28443,7 +28443,7 @@ msgstr "固定されたポータル\"%s\"は削除できません"
 #: utils/mmgr/portalmem.c:488
 #, c-format
 msgid "cannot drop active portal \"%s\""
-msgstr "アクテイブなポータル\"%s\"を削除できません"
+msgstr "アクティブなポータル\"%s\"を削除できません"
 
 #: utils/mmgr/portalmem.c:739
 #, c-format


Re: [PATCH] Add native windows on arm64 support

2022-08-25 Thread Andres Freund
Hi,

On 2022-05-10 10:01:55 +0900, Michael Paquier wrote:
> On Mon, May 09, 2022 at 12:44:22PM +0100, Niyas Sait wrote:
> > Microsoft updated documentation [1] and clarified that ASLR cannot be
> > disabled for Arm64 targets.
> > 
> > Because ASLR can't be disabled on ARM, ARM64, or ARM64EC architectures,
> > > /DYNAMICBASE:NO isn't supported for these targets.
> 
> Better than nothing, I guess, though this does not stand as a reason
> explaining why this choice has been made.  And it looks like we will
> never know.  We are going to need more advanced testing to check that
> DYNAMICBASE is stable enough if we begin to enable it.  Perhaps we
> should really look at that for all the build targets we currently
> support rather than just ARM, for the most modern Win32 environments
> if we are going to cut support for most of the oldest releases..

I accidentally did a lot of testing with DYNAMICBASE - I accidentally
mistranslated MSBuildProject.pm's <RandomizedBaseAddress>false</RandomizedBaseAddress> to adding
/DYNAMICBASE in the meson port. Survived several hundred CI cycles and dozens
of local runs without observing any problems related to that.

Which doesn't surprise me, given the way we reserve space in the new process.

Greetings,

Andres Freund




Re: Column Filtering in Logical Replication

2022-08-25 Thread Peter Smith
On Thu, Aug 25, 2022 at 7:38 PM vignesh C  wrote:
>
...
> > PSA the v3* patch set.
>
> Thanks for the updated patch.
> Few comments:
> 1) We can shuffle the columns in publisher table and subscriber to
> show that the order of the column does not matter
> +   
> +Create a publication p1. A column list is defined for
> +table t1 to reduce the number of columns that will be
> +replicated.
> +
> +test_pub=# CREATE PUBLICATION p1 FOR TABLE t1 (id, a, b, c);
> +CREATE PUBLICATION
> +test_pub=#
> +
>

OK. I made the following changes to the example.
- now the subscriber table defines cols in a different order than that
of the publisher table
- now the publisher column list defines col names in a different order
than that of the table
- now the column list avoids using only adjacent column names
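The order-independence called out above can be sketched with a toy filter
(illustrative only; the column names and values are made up, and the real
filtering happens in the publisher's output plugin, not in shell):

```shell
# Toy sketch of column-list filtering: only columns named in the
# publication's column list reach the subscriber, and the order in which
# the list names them does not matter.
col_list="c id b a"                        # deliberately shuffled order
row="id=1 a=a-1 b=b-1 c=c-1 d=d-1 e=e-1"   # full publisher row
filtered=""
for field in $row; do
    name=${field%%=*}
    case " $col_list " in
        *" $name "*) filtered="$filtered $field" ;;
    esac
done
echo "${filtered# }"
```

The result keeps id, a, b and c (in the row's own column order) and drops
d and e, mirroring what the documentation example above describes.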

> 2) We can try to keep the line content to less than 80 chars
> +   
> +A column list is specified per table following the tablename, and
> enclosed by
> +parenthesis. See  for details.
> +   
>

OK. Modified to use < 80 chars

> 3) tablename should be table name like it is used in other places
> +   
> +A column list is specified per table following the tablename, and
> enclosed by
> +parenthesis. See  for details.
> +   
>

Sorry, I don't see this problem. AFAIK this same issue was already
fixed in the v3* patches.  Notice in the cited fragment that
'parenthesis' is misspelt but that was also fixed in v3. Maybe you are
looking at an old patch file (??)

> 4a) In the below, you could include mentioning "Only the column list
> data of publication p1 are replicated."
> +
> + Insert some rows to table t1.
> +
> +test_pub=# INSERT INTO t1 VALUES(1, 'a-1', 'b-1', 'c-1', 'd-1', 'e-1');
> +INSERT 0 1
>

OK. Modified to say this.

> 4b) In the above, we could mention that the insert should be done on
> the "publisher side" as the previous statements are executed on the
> subscriber side.

OK. Modified to say this.

~~~

Thanks for the feedback.

PSA patch set v4* where all of the above comments are now addressed.

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v4-0001-Column-List-replica-identity-rules.patch
Description: Binary data


v4-0002-Column-List-new-pgdocs-section.patch
Description: Binary data


Re: [PATCH] Add native windows on arm64 support

2022-08-25 Thread Michael Paquier
On Thu, Aug 25, 2022 at 06:29:07PM -0700, Andres Freund wrote:
> I accidentally did a lot of testing with DYNAMICBASE - I accidentally
> mistranslated MSBuildProject.pm's <RandomizedBaseAddress>false</RandomizedBaseAddress> setting to adding
> /DYNAMICBASE in the meson port. Survived several hundred CI cycles and dozens
> of local runs without observing any problems related to that.
> 
> Which doesn't surprise me, given the way we reserve space in the new process.

Still, we had problems with that in the past and I don't recall huge
changes in the way we allocate shmem on WIN32.  Could there be some
stuff in Windows itself that would explain more stability?  I would
not mind switching back to DYNAMICBASE on HEAD seeing your results.
That would simplify this thread's patch a bit as well.
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Fix alter subscription concurrency errors

2022-08-25 Thread Amit Kapila
On Thu, Aug 25, 2022 at 8:17 PM Jelte Fennema
 wrote:
>
> Without this patch concurrent ALTER/DROP SUBSCRIPTION statements for
> the same subscription could result in one of these statements returning the
> following error:
>
> ERROR:  XX000: tuple concurrently updated
>
> This patch fixes that by re-fetching the tuple after acquiring the lock on the
> subscription. The included isolation test fails most of its permutations
> without this patch, with the error shown above.
>

Won't the same thing happen for similar publication commands? Why
is this unique to subscriptions and not other ALTER/DROP commands?

-- 
With Regards,
Amit Kapila.




Re: pg_stat_wal: tracking the compression effect

2022-08-25 Thread Kyotaro Horiguchi
At Thu, 25 Aug 2022 16:04:50 +0900, Ken Kato  wrote 
in 
> Accumulating the values, which indicates how much space is saved by
> each compression (size before compression - size after compression),
> and keep track of how many times compression has happened. So that one
> can know how much space is saved on average.

Honestly, I don't think it's all that useful.
How about adding them to pg_waldump and pg_walinspect instead?

# It further widens the output of pg_waldump, though..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pg_stat_wal: tracking the compression effect

2022-08-25 Thread Kyotaro Horiguchi
At Fri, 26 Aug 2022 11:55:27 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Thu, 25 Aug 2022 16:04:50 +0900, Ken Kato  wrote 
> in 
> > Accumulating the values, which indicates how much space is saved by
> > each compression (size before compression - size after compression),
> > and keep track of how many times compression has happened. So that one
> > can know how much space is saved on average.
> 
> Honestly, I don't think it's all that useful.
> How about adding them to pg_waldump and pg_walinspect instead?
> 
> # It further widens the output of pg_waldump, though..

Sorry, that was apparently too short.

I know you can already see that in the per-record output of pg_waldump, but
maybe we need a summary of the saved bytes in the "pg_waldump -b -z" output
and in the corresponding output of pg_walinspect.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-25 Thread Nathan Bossart
On Thu, Aug 25, 2022 at 01:35:45PM +0700, John Naylor wrote:
> - For the following comment, pgindent will put spaced operands on a
> separate line which is not great for readability. and our other
> reference to the Stanford bithacks page keeps the in-page link, and I
> see no reason to exclude it -- if it goes missing, the whole page will
> still load. So I put back those two details.
> 
> +* To find bytes <= c, we can use bitwise operations to find
> bytes < c+1,
> +* but it only works if c+1 <= 128 and if the highest bit in v
> is not set.
> +* Adapted from
> +* https://graphics.stanford.edu/~seander/bithacks.html#HasLessInWord

This was just unnecessary fiddling on my part, sorry about that.

> +test_lfind8_internal(uint8 key)
> +{
> + uint8   charbuf[LEN_WITH_TAIL(Vector8)];
> + const int   len_no_tail = LEN_NO_TAIL(Vector8);
> + const int   len_with_tail = LEN_WITH_TAIL(Vector8);
> +
> + memset(charbuf, 0xFF, len_with_tail);
> + /* search tail to test one-byte-at-a-time path */
> + charbuf[len_with_tail - 1] = key;
> + if (key > 0x00 && pg_lfind8(key - 1, charbuf, len_with_tail))
> + elog(ERROR, "pg_lfind8() found nonexistent element <= '0x%x'", 
> key - 1);
> + if (key < 0xFF && !pg_lfind8(key, charbuf, len_with_tail))
> + elog(ERROR, "pg_lfind8() did not find existing element <= 
> '0x%x'", key);
> + if (key < 0xFE && pg_lfind8(key + 1, charbuf, len_with_tail))
> + elog(ERROR, "pg_lfind8() found nonexistent element <= '0x%x'", 
> key + 1);
> +
> + memset(charbuf, 0xFF, len_with_tail);
> + /* search with vector operations */
> + charbuf[len_no_tail - 1] = key;
> + if (key > 0x00 && pg_lfind8(key - 1, charbuf, len_no_tail))
> + elog(ERROR, "pg_lfind8() found nonexistent element <= '0x%x'", 
> key - 1);
> + if (key < 0xFF && !pg_lfind8(key, charbuf, len_no_tail))
> + elog(ERROR, "pg_lfind8() did not find existing element <= 
> '0x%x'", key);
> + if (key < 0xFE && pg_lfind8(key + 1, charbuf, len_no_tail))
> + elog(ERROR, "pg_lfind8() found nonexistent element <= '0x%x'", 
> key + 1);
> +}

nitpick: Shouldn't the elog() calls use "==" instead of "<=" for this one?

Otherwise, 0001 looks good to me.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: [PoC] Reducing planning time when tables have many partitions

2022-08-25 Thread David Rowley
On Fri, 26 Aug 2022 at 12:40, Yuya Watari  wrote:
> This performance degradation is due to the heavy processing of the
> get_ec***_indexes***() functions. These functions are the core part of
> the optimization we are working on in this thread, but they are
> relatively heavy when the number of partitions is small.
>
> I noticed that these functions were called repeatedly with the same
> arguments. During planning Query B with one partition, the
> get_ec_source_indexes_strict() function was called 2087 times with
> exactly the same parameters. Such repeated calls occurred many times
> in a single query.

Instead of doing the caching like this, how about coding up some
iterators that we can loop over to fetch the required EMs?

I'll attempt to type out my thoughts here without actually trying to
see if this works:

typedef struct EquivalenceMemberIterator
{
	EquivalenceClass *ec;
	Relids		relids;
	Bitmapset  *em_matches;
	int			position;		/* last found index of em_matches or -1 */
	bool		use_index;
	bool		with_children;
	bool		with_norel_members;
} EquivalenceMemberIterator;

We'd then have functions like:

static void
get_ecmember_indexes_iterator(EquivalenceMemberIterator *it, PlannerInfo *root,
							  EquivalenceClass *ec, Relids relids,
							  bool with_children, bool with_norel_members)
{
	it->ec = ec;
	it->relids = relids;
	it->position = -1;

	it->use_index = (root->simple_rel_array_size > 32); /* or whatever
														 * threshold is best */
	it->with_children = with_children;
	it->with_norel_members = with_norel_members;

	if (it->use_index)
		it->em_matches = get_ecmember_indexes(root, ec, relids,
											  with_children, with_norel_members);
	else
		it->em_matches = NULL;
}

static EquivalenceMember *
get_next_matching_member(PlannerInfo *root, EquivalenceMemberIterator *it)
{
	if (it->use_index)
	{
		it->position = bms_next_member(it->em_matches, it->position);
		if (it->position >= 0)
			return list_nth(root->eq_members, it->position);
		return NULL;
	}
	else
	{
		int			i = it->position;

		while ((i = bms_next_member(it->ec->ec_member_indexes, i)) >= 0)
		{
			/* filter out the EMs we don't want here; "break" when
			 * we find a match */
		}
		it->position = i;
		if (i >= 0)
			return list_nth(root->eq_members, i);
		return NULL;
	}
}

Then the consuming code will do something like:

EquivalenceMemberIterator iterator;
get_ecmember_indexes_iterator(&iterator, root, ec, relids, true, false);

while ((cur_em = get_next_matching_member(root, &iterator)) != NULL)
{
 // do stuff
}

David




Re: use SSE2 for is_valid_ascii

2022-08-25 Thread Nathan Bossart
On Thu, Aug 25, 2022 at 04:41:53PM +0700, John Naylor wrote:
> v3 applies on top of the v9 json_lex_string patch in [1] and adds a
> bit more to that, resulting in a simpler patch that is more amenable
> to additional SIMD-capable platforms.

LGTM

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Add native windows on arm64 support

2022-08-25 Thread Andres Freund
Hi,

On 2022-08-26 11:09:41 +0900, Michael Paquier wrote:
> On Thu, Aug 25, 2022 at 06:29:07PM -0700, Andres Freund wrote:
> > I accidentally did a lot of testing with DYNAMICBASE - I accidentally
> > mistranslated MSBuildProject.pm's <RandomizedBaseAddress>false</RandomizedBaseAddress> setting to adding
> > /DYNAMICBASE in the meson port. Survived several hundred CI cycles and 
> > dozens
> > of local runs without observing any problems related to that.
> > 
> > Which doesn't surprise me, given the way we reserve space in the new 
> > process.
> 
> Still, we had problems with that in the past and I don't recall huge
> changes in the way we allocate shmem on WIN32.

It's really not great that 7f3e17b4827 disabled randomization without even a
comment as to why...

There actually seems to have been a lot of work in that area. See 617dc6d299c
/ bcbf2346d69 and as well as the retrying in 45e004fb4e39. Those weren't
prevented by us disabling ASLR.

Greetings,

Andres Freund




Re: use ARM intrinsics in pg_lfind32() where available

2022-08-25 Thread John Naylor
On Thu, Aug 25, 2022 at 11:57 AM Nathan Bossart
 wrote:
>
> On Thu, Aug 25, 2022 at 10:38:34AM +0700, John Naylor wrote:
> > On Thu, Aug 25, 2022 at 1:01 AM Nathan Bossart  
> > wrote:
> >> On Wed, Aug 24, 2022 at 11:07:03AM +0700, John Naylor wrote:
> >> > - Can a user on ARM64 ever get a runtime fault if the machine attempts
> >> > to execute NEON instructions?
> >>
> >> IIUC yes, although I'm not sure how likely it is in practice.
> >
> > Given the quoted part above, it doesn't seem likely, but we should try
> > to find out for sure, because a runtime fault is surely not acceptable
> > even on a toy system.
>
> The ARM literature appears to indicate that Neon support is pretty standard
> on aarch64, and AFAICT it's pretty common to just assume it's available.

This doesn't exactly rise to the level of "find out for sure", so I
went looking myself. This is the language I found [1]:

"Both floating-point and NEON are required in all standard ARMv8
implementations. However, implementations targeting specialized
markets may support the following combinations:

No NEON or floating-point.
Full floating-point and SIMD support with exception trapping.
Full floating-point and SIMD support without exception trapping."

Since we assume floating-point, I see no reason not to assume NEON,
but a case could be made for documenting that we require NEON on
aarch64, in addition to exception trapping (for CRC runtime check) and
floating point on any Arm. Or even just say "standard". I don't
believe anyone will want to run Postgres on specialized hardware
lacking these features, so maybe it's a moot point.

[1] 
https://developer.arm.com/documentation/den0024/a/AArch64-Floating-point-and-NEON

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-25 Thread John Naylor
On Fri, Aug 26, 2022 at 10:14 AM Nathan Bossart
 wrote:
>
> > +test_lfind8_internal(uint8 key)
[...]
> > + elog(ERROR, "pg_lfind8() found nonexistent element <= 
> > '0x%x'", key + 1);
> > +}
>
> nitpick: Shouldn't the elog() calls use "==" instead of "<=" for this one?

Good catch, will fix.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: Perform streaming logical transactions by background workers and parallel apply

2022-08-25 Thread Dilip Kumar
On Thu, Aug 11, 2022 at 12:13 PM Amit Kapila  wrote:
> > Since we will later consider applying non-streamed transactions in 
> > parallel, I
> > think "apply streaming worker" might not be very suitable. I think 
> > PostgreSQL
> > also has the worker "parallel worker", so for "apply parallel worker" and
> > "apply background worker", I feel that "apply background worker" will make 
> > the
> > relationship between workers more clear. ("[main] apply worker" and "apply
> > background worker")
> >
>
> But, on similar lines, we do have vacuumparallel.c for parallelizing
> index vacuum. I agree with Kuroda-San on this point that the currently
> proposed terminology doesn't sound to be very clear. The other options
> that come to my mind are "apply streaming transaction worker", "apply
> parallel worker" and file name could be applystreamworker.c,
> applyparallel.c, applyparallelworker.c, etc. I see the point why you
> are hesitant in calling it "apply parallel worker" but it is quite
> possible that even for non-streamed xacts, we will share quite some
> part of this code.

I think the "apply streaming transaction worker" is a good option
w.r.t. what we are currently doing but then in the future, if we want
to apply normal transactions in parallel then we will have to again
change the name.  So I think  "apply parallel worker" might look
better and the file name could be "applyparallelworker.c" or just
"parallelworker.c". Although "parallelworker.c" file name is a bit
generic but we already have worker.c so w.r.t that "parallelworker.c"
should just look fine.  At least that is what I think.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Handle infinite recursion in logical replication setup

2022-08-25 Thread Dilip Kumar
On Mon, Aug 22, 2022 at 9:19 AM houzj.f...@fujitsu.com
 wrote:
>
> > Jonathan, Sawada-San, Hou-San, and others, what do you think is the best way
> > to move forward here?
>
> I think it's fine to throw a WARNING in this case given that there is a
> chance of inconsistent data.

IMHO, since the user has specifically asked for origin=NONE but we do
not have any way to detect the origin during the initial sync, this
could be documented and we can also issue the WARNING, so that users
notice that part and carefully set up the replication.  OTOH, I do not
think that giving an error is very inconvenient: we are already
providing a new option "origin=NONE", so along with that let's force
the user to choose between copy_data=off and copy_data=force, and with
that there is no scope for mistakes.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Small cleanups to tuplesort.c and a bonus small performance improvement

2022-08-25 Thread David Rowley
Hi,

Since doing some work for PG15 to speed up sorts, I've been a
little irritated by the fact that dumptuples() calls WRITETUP (which
is now always calling writetuple()) and pfrees the tuple, only for
dumptuples() to call
MemoryContextReset(state->base.tuplecontext) directly afterwards.
These pfrees are just a waste of effort and we might as well leave it
to the context reset to do the cleanup. (Probably especially so when
using AllocSet for storing tuples)

There are only 2 calls to WRITETUP and the other one is always called
when state->slabAllocatorUsed is true.  writetuple() checks for that
before freeing the tuple, which is a bit of a wasted branch since
it'll always prove to be false for the use case in mergeonerun().
(It's possible the compiler might inline that now anyway since the
WRITETUP macro always calls writetuple() directly now)

I've attached 3 patches aimed to do a small amount of cleanup work in
tuplesort.c

0001: Just fixes a broken looking comment in writetuple()
0002: Gets rid of the WRITETUP macro. That does not do anything useful
since 097366c45
0003: Changes writetuple to tell it what it should do in regards to
freeing and adjusting the memory accounting.

Probably 0003 could be done differently. I'm certainly not set on the
bool args. I understand that I'm never calling it with "freetup" ==
true. So other options include 1) rip out the pfree code and that
parameter; or 2) just do the inlining manually at both call sites.

I'll throw this in the September CF to see if anyone wants to look.
There's probably lots more cleaning jobs that could be done in
tuplesort.c.

The performance improvement from 0003 is not that impressive, but it
looks like it makes things very slightly faster, so probably worth it
if the patch makes the code cleaner. See attached gif and script for
the benchmark I ran to test it.  I think the gains might go up
slightly with [1] applied as that patch seems to do more to improve
the speed of palloc() than it does to improve the speed of pfree().

David

[1] https://commitfest.postgresql.org/39/3810/
From 3a0dc2893c757be7cbde5d1cd1f7801de073f943 Mon Sep 17 00:00:00 2001
From: David Rowley 
Date: Fri, 26 Aug 2022 14:42:23 +1200
Subject: [PATCH v1 3/3] Be smarter about freeing tuples during tuplesorts

During dumptuples() the call to writetuple() would pfree any non-null
tuple.  This was quite wasteful as this happens just before we perform a
reset of the context which stores all of those tuples.

Here we adjust the writetuple signature so we can tell it exactly what we
need done in regards to freeing the tuple and/or adjusting the memory
accounting.  In dumptuples() we must still account for the memory that's
been released.  Also, since writetuple() is only called twice, let's make
it static inline so that the compiler inlines the call and removes the
branching to check the bool arguments.
---
 src/backend/utils/sort/tuplesort.c | 44 --
 1 file changed, 29 insertions(+), 15 deletions(-)

diff --git a/src/backend/utils/sort/tuplesort.c 
b/src/backend/utils/sort/tuplesort.c
index d1c2f5f58f..519c7424f7 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -452,8 +452,9 @@ struct Sharedsort
 
 
 static void tuplesort_begin_batch(Tuplesortstate *state);
-static void writetuple(Tuplesortstate *state, LogicalTape *tape,
-  SortTuple *stup);
+static inline void writetuple(Tuplesortstate *state, LogicalTape *tape,
+ SortTuple *stup, bool freetup,
+ bool adjust_accounting);
 static bool consider_abort_common(Tuplesortstate *state);
 static void inittapes(Tuplesortstate *state, bool mergeruns);
 static void inittapestate(Tuplesortstate *state, int maxTapes);
@@ -1339,18 +1340,22 @@ tuplesort_puttuple_common(Tuplesortstate *state, SortTuple *tuple, bool useAbbrev
 }
 
 /*
- * Write 'stup' out onto 'tape' and, unless using the slab allocator, pfree
- * stup's tuple and adjust the memory accounting accordingly.
+ * Write 'stup' out onto 'tape' and optionally free stup->tuple and optionally
+ * adjust the memory accounting.
  */
-static void
-writetuple(Tuplesortstate *state, LogicalTape *tape, SortTuple *stup)
+static inline void
+writetuple(Tuplesortstate *state, LogicalTape *tape, SortTuple *stup,
+  bool freetup, bool adjust_accounting)
 {
state->base.writetup(state, tape, stup);
 
-   if (!state->slabAllocatorUsed && stup->tuple)
+   if (stup->tuple != NULL)
{
-   FREEMEM(state, GetMemoryChunkSpace(stup->tuple));
-   pfree(stup->tuple);
+   if (adjust_accounting)
+   FREEMEM(state, GetMemoryChunkSpace(stup->tuple));
+
+   if (freetup)
+   pfree(stup->tuple);
}
 }
 
@@ -2251,6 +2256,8 @@ mergeonerun(Tuplesortstat

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-25 Thread Nathan Bossart
On Fri, Aug 26, 2022 at 10:45:10AM +0700, John Naylor wrote:
> On Thu, Aug 25, 2022 at 11:57 AM Nathan Bossart
>  wrote:
>> The ARM literature appears to indicate that Neon support is pretty standard
>> on aarch64, and AFAICT it's pretty common to just assume it's available.
> 
> This doesn't exactly rise to the level of "find out for sure", so I
> went looking myself. This is the language I found [1]:
> 
> "Both floating-point and NEON are required in all standard ARMv8
> implementations. However, implementations targeting specialized
> markets may support the following combinations:
> 
> No NEON or floating-point.
> Full floating-point and SIMD support with exception trapping.
> Full floating-point and SIMD support without exception trapping."

Sorry, I should've linked to the documentation I found.  I saw similar
language in a couple of manuals, which is what led me to the conclusion
that Neon support is relatively standard.

> Since we assume floating-point, I see no reason not to assume NEON,
> but a case could be made for documenting that we require NEON on
> aarch64, in addition to exception trapping (for CRC runtime check) and
> floating point on any Arm. Or even just say "standard". I don't
> believe anyone will want to run Postgres on specialized hardware
> lacking these features, so maybe it's a moot point.

I'm okay with assuming Neon support for now.  It's probably easier to add
the __ARM_NEON check if/when someone complains than it is to justify
removing it once it's there.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




wal_sync_method=fsync_writethrough

2022-08-25 Thread Thomas Munro
Hi,

We allow $SUBJECT on Windows.  I'm not sure exactly how we finished up
with that, maybe a historical mistake, but I find it misleading today.
Modern Windows flushes drive write caches for fsync (= _commit()) and
fdatasync (= FLUSH_FLAGS_FILE_DATA_SYNC_ONLY).  In fact it is possible
to tell Windows to write out file data without flushing the drive
cache (= FLUSH_FLAGS_NO_SYNC), but I don't believe anyone is
interested in new weaker levels.  Any reason not to just get rid of
it?

On macOS, our fsync and fdatasync levels *don't* flush drive caches,
because those system calls don't on that OS, and they offer a weird
special fcntl, so there we offer $SUBJECT for a good reason.  Now that
macOS 10.2 systems are thoroughly extinct, I think we might as well
drop the configure probe, though, while we're doing a lot of that sort
of thing.

The documentation also says a couple of things that aren't quite
correct about wal_sync_method.  (I would also like to revise other
nearby outdated paragraphs about volatile write caches, sector sizes,
etc., but that'll take some more research.)
From 42bd2d6c0ff4cd931ec3d4008438842bab5acd4b Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 26 Aug 2022 14:27:43 +1200
Subject: [PATCH 1/3] Remove wal_sync_method=fsync_writethrough on Windows.

The fsync and fdatasync levels already flush drive write caches on
Windows, so it only confuses matters to have an apparently higher level
that isn't actually higher.  We don't do that on any other OS.
---
 doc/src/sgml/wal.sgml | 5 ++---
 src/backend/storage/file/fd.c | 6 ++
 src/bin/pg_test_fsync/pg_test_fsync.c | 4 +---
 src/include/port/win32_port.h | 8 
 4 files changed, 5 insertions(+), 18 deletions(-)

diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 01f7379ebb..90cbacfe41 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -108,9 +108,8 @@
 open_datasync (the default), write caching can be disabled
 by unchecking My Computer\Open\disk drive\Properties\Hardware\Properties\Policies\Enable write caching on the disk.
 Alternatively, set wal_sync_method to
-fdatasync (NTFS only), fsync or
-fsync_writethrough, which prevent
-write caching.
+fdatasync (NTFS only) or fsync,
+which prevent write caching.
   
 
 
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index e3b19ca1ed..c27a6600fe 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -395,7 +395,7 @@ pg_fsync(int fd)
 #endif
 
 	/* #if is to skip the sync_method test if there's no need for it */
-#if defined(HAVE_FSYNC_WRITETHROUGH) && !defined(FSYNC_WRITETHROUGH_IS_FSYNC)
+#if defined(HAVE_FSYNC_WRITETHROUGH)
 	if (sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH)
 		return pg_fsync_writethrough(fd);
 	else
@@ -425,9 +425,7 @@ pg_fsync_writethrough(int fd)
 {
 	if (enableFsync)
 	{
-#ifdef WIN32
-		return _commit(fd);
-#elif defined(F_FULLFSYNC)
+#if defined(F_FULLFSYNC)
 		return (fcntl(fd, F_FULLFSYNC, 0) == -1) ? -1 : 0;
 #else
 		errno = ENOSYS;
diff --git a/src/bin/pg_test_fsync/pg_test_fsync.c b/src/bin/pg_test_fsync/pg_test_fsync.c
index 77f0db0376..e23e2601ad 100644
--- a/src/bin/pg_test_fsync/pg_test_fsync.c
+++ b/src/bin/pg_test_fsync/pg_test_fsync.c
@@ -605,9 +605,7 @@ signal_cleanup(int signum)
 static int
 pg_fsync_writethrough(int fd)
 {
-#ifdef WIN32
-	return _commit(fd);
-#elif defined(F_FULLFSYNC)
+#if defined(F_FULLFSYNC)
 	return (fcntl(fd, F_FULLFSYNC, 0) == -1) ? -1 : 0;
 #else
 	errno = ENOSYS;
diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h
index 707f8760ca..b6769a1693 100644
--- a/src/include/port/win32_port.h
+++ b/src/include/port/win32_port.h
@@ -75,14 +75,6 @@
 /* Windows doesn't have fsync() as such, use _commit() */
 #define fsync(fd) _commit(fd)
 
-/*
- * For historical reasons, we allow setting wal_sync_method to
- * fsync_writethrough on Windows, even though it's really identical to fsync
- * (both code paths wind up at _commit()).
- */
-#define HAVE_FSYNC_WRITETHROUGH
-#define FSYNC_WRITETHROUGH_IS_FSYNC
-
 #define USES_WINSOCK
 
 /*
-- 
2.35.1

From 6e59fb92aafa4803ceed411e1610a4f458ed092b Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 26 Aug 2022 15:00:26 +1200
Subject: [PATCH 2/3] Remove configure probe for F_FULLSYNC.

This was done to avoid using it on very old macOS systems.
---
 configure  | 14 --
 configure.ac   |  3 ---
 src/include/pg_config.h.in |  4 
 src/include/port/darwin.h  |  3 ---
 src/tools/msvc/Solution.pm |  1 -
 5 files changed, 25 deletions(-)

diff --git a/configure b/configure
index a268780c5d..6975c3f0ca 100755
--- a/configure
+++ b/configure
@@ -16204,20 +16204,6 @@ esac
 fi
 
 
-# This is probably only present on macOS, but may as well check always
-ac_fn_c_check_decl "$LINENO" "F_FULLFSYNC" "ac_cv_have_decl_F_FULLFSYNC" "#include 
-"
-if test "x$

Re: Fix japanese translation of log messages

2022-08-25 Thread Kyotaro Horiguchi
At Fri, 26 Aug 2022 10:23:01 +0900, Shinya Kato  
wrote in 
> I've found typos in ja.po, and fixed them.
> The patch is attached.

(This is not for -hackers but I'm fine with it being posted here;p)

Thanks for the report! Pushed to 10 to 15 of translation repository
with some minor changes.  They will be reflected in the code tree some
time later.

msgid "More details may be available in the server log."
-msgstr "詳細な情報がはサーバログにあるかもしれません。"
+msgstr "詳細な情報はサーバログにあるかもしれません。"

I prefer "詳細な情報が" over "詳細な情報は" here.  (The existence of
the details is unknown here.)

 msgid "cannot drop active portal \"%s\""
-msgstr "アクテイブなポータル\"%s\"を削除できません"
+msgstr "アクティブなポータル\"%s\"を削除できません"

I changed it to "アクティブなポータル\"%s\"は削除できません". (It
states a standing fact, rather than reporting the result of an action.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pg_stat_wal: tracking the compression effect

2022-08-25 Thread Andrey Borodin



> On 25 Aug 2022, at 12:04, Ken Kato  wrote:
> 
> What do you think?

I think users will need to choose between LZ4 and Zstd, so they need to know the
tradeoff - compression ratio vs. CPU time spent per page (or any other segment).

I know that Zstd is supposed to be "better", but I doubt it has enough runway on
one block to show off. If only we could persist the compression context across
many pages...
The compression ratio may differ between workloads, so a system view or
something similar could be of use.

Thanks!

Best regards, Andrey Borodin.



Re: Reducing the chunk header sizes on all memory context types

2022-08-25 Thread David Rowley
On Tue, 23 Aug 2022 at 21:03, David Rowley  wrote:
> Finally, the v5 patch with the fixes mentioned above.

The CFbot just alerted me to the cplusplus check was failing with the
v5 patch, so here's v6.

> I'm pretty keen to get this patch committed early next week.  This is
> quite core infrastructure, so it would be good if the patch gets a few
> extra eyeballs on it before then.

That'll probably be New Zealand Monday, unless someone objects before then.

David
From 41419082028216112c92f9ff5bdec21f952b7f37 Mon Sep 17 00:00:00 2001
From: David Rowley 
Date: Thu, 9 Jun 2022 20:09:22 +1200
Subject: [PATCH v6] Improve performance of and reduce overheads of memory
 management

Whenever we palloc a chunk of memory, traditionally, we prefix the
returned pointer with a pointer to the memory context to which the chunk
belongs.  This is required so that we're able to easily determine the
owning context when performing operations such as pfree() and repalloc().

For the AllocSet context, prior to this commit we additionally prefixed
the pointer to the owning context with the size of the chunk.  This made
the header 16 bytes in size.  This 16-byte overhead was required for all
AllocSet allocations regardless of the allocation size.

For the generation context, the problem was worse; in addition to the
pointer to the owning context and chunk size, we also stored a pointer to
the owning block so that we could track the number of freed chunks on a
block.

The slab allocator had a 16-byte chunk header.

The changes being made here reduce the chunk header size down to just 8
bytes for all 3 of our memory context types.  For small to medium sized
allocations, this significantly increases the number of chunks that we can
fit on a given block which results in much more efficient use of memory.

Additionally, this commit completely changes the rule that pointers to
palloc'd memory must be directly prefixed by a pointer to the owning
memory context and instead, we now insist that they're directly prefixed
by an 8-byte value where the least significant 3-bits are set to a value
to indicate which type of memory context the pointer belongs to.  Using
those 3 bits as an index (known as MemoryContextMethodID) to a new array
which stores the methods for each memory context type, we're now able to
pass the pointer given to functions such as pfree() and repalloc() to the
function specific to that context implementation to allow them to devise
their own methods of finding the memory context which owns the given
allocated chunk of memory.

The reason we're able to reduce the chunk header down to just 8 bytes is
because of the way we make use of the remaining 61 bits of the required
8-byte chunk header.  Here we also implement a general-purpose MemoryChunk
struct which makes use of those 61 remaining bits to allow the storage of
a 30-bit value which the MemoryContext is free to use as it pleases, and
also the number of bytes which must be subtracted from the chunk to get a
reference to the block that the chunk is stored on (also 30 bits).  The 1
additional remaining bit is to denote if the chunk is an "external" chunk
or not.  External here means that the chunk header does not store the
30-bit value or the block offset.  The MemoryContext can use these
external chunks at any time, but must use them if any of the two 30-bit
fields are not large enough for the value(s) that need to be stored in
them.  When the chunk is marked as external, it is up to the MemoryContext
to devise its own means to determine the block offset.

Using 3-bits for the MemoryContextMethodID does mean we're limiting
ourselves to only having a maximum of 8 different memory context types.
We could reduce the bit space for the 30-bit value a little to make way
for more than 3 bits, but it seems like it might be better to do that only
if we ever need more than 8 context types.  This would only be a problem
if some future memory context type which does not use MemoryChunk really
couldn't give up any of the 61 remaining bits in the chunk header.

With this MemoryChunk, each of our 3 memory context types can quickly
obtain a reference to the block any given chunk is located on.  AllocSet
is able to find the context to which the chunk is owned by first obtaining
a reference to the block by subtracting the block offset as is stored in
the 'hdrmask' field and then referencing the block's 'aset' field.
The Generation context uses the same method, but GenerationBlock did not
have a field pointing back to the owning context, so one is added by this
commit.

In aset.c and generation.c, all allocations larger than allocChunkLimit
are stored on dedicated blocks.  When there's just a single chunk on a
block like this, it's easy to find the block from the chunk: we just
subtract the size of the block header from the chunk pointer.  The size of
these chunks is also known as we store the endptr on the block, so we can
just subtract the pointer to the allocated memory from that.  Because we
ca

Re: Extensible storage manager API - smgr hooks

2022-08-25 Thread Andrey Borodin



> On 16 Jun 2022, at 13:41, Kirill Reshke  wrote:
> 
> Hello Yura and Anastasia.

FWIW this technology is now a part of Greenplum [0]. We are building a GP 
extension that automatically offloads cold data to S3 - a very simplified 
version of Neon for analytical workloads.
When a segment of a table has not been used for a long period of time, the 
extension syncs its files with backup storage in the cloud.
When the user touches the data, the extension's smgr brings the table 
segments back from the backup or the latest synced version.

Our #1 goal is to provide a tool that is useful for the community. We can 
easily provide the same extension for Postgres if this technology (extensible 
smgr) is in core. Does such an extension seem useful for Postgres? Or does 
this data access pattern seem unusual for Postgres? By pattern I mean vast 
amounts of cold data that is only ever appended and never touched.


Best regards, Andrey Borodin.

[0] https://github.com/greenplum-db/gpdb/pull/13601



Re: Fix japanese translation of log messages

2022-08-25 Thread torikoshia

On 2022-08-26 10:23, Shinya Kato wrote:

Hi hackers,

I've found typos in ja.po, and fixed them.
The patch is attached.


Thanks for the patch!
LGTM.

I had found a similar typo before in ja.po, so I added that as well.

 @@ -12739,7 +12739,7 @@ msgstr "ロールオプション\"%s\"が認識できません"

 #: gram.y:1588 gram.y:1604
 #, c-format
 msgid "CREATE SCHEMA IF NOT EXISTS cannot include schema elements"
-msgstr "CREATE SCHEMA IF NOT EXISTSんはスキーマ要素を含めることはできません"
+msgstr "CREATE SCHEMA IF NOT EXISTSはスキーマ要素を含めることはできません"


What do you think?

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

diff --git a/src/backend/po/ja.po b/src/backend/po/ja.po
index 0925465d22..8f0b7adbea 100644
--- a/src/backend/po/ja.po
+++ b/src/backend/po/ja.po
@@ -1492,7 +1492,7 @@ msgstr "パラレルワーカの初期化に失敗しました"
 #: access/transam/parallel.c:719 access/transam/parallel.c:838
 #, c-format
 msgid "More details may be available in the server log."
-msgstr "詳細な情報がはサーバログにあるかもしれません。"
+msgstr "詳細な情報はサーバログにあるかもしれません。"
 
 #: access/transam/parallel.c:899
 #, c-format
@@ -1841,7 +1841,7 @@ msgid ""
 "Stop the postmaster and vacuum that database in single-user mode.\n"
 "You might also need to commit or roll back old prepared transactions, or drop stale replication slots."
 msgstr ""
-"postmaster を停止後、シングルユーザモードでデータベースをVACUUMを実行してください。\n"
+"postmaster を停止後、シングルユーザモードでデータベースにVACUUMを実行してください。\n"
 "古い準備済みトランザクションのコミットまたはロールバック、もしくは古いレプリケーションスロットの削除も必要かもしれません。"
 
 #: access/transam/varsup.c:136
@@ -12183,7 +12183,7 @@ msgstr "失敗した行は%sを含みます"
 #: executor/execMain.c:1970
 #, c-format
 msgid "null value in column \"%s\" of relation \"%s\" violates not-null constraint"
-msgstr "リレーション\"%2$s\"の列\"%1$s\"のNULL値がが非NULL制約に違反しています"
+msgstr "リレーション\"%2$s\"の列\"%1$s\"のNULL値が非NULL制約に違反しています"
 
 #: executor/execMain.c:2021
 #, c-format
@@ -12739,7 +12739,7 @@ msgstr "ロールオプション\"%s\"が認識できません"
 #: gram.y:1588 gram.y:1604
 #, c-format
 msgid "CREATE SCHEMA IF NOT EXISTS cannot include schema elements"
-msgstr "CREATE SCHEMA IF NOT EXISTSんはスキーマ要素を含めることはできません"
+msgstr "CREATE SCHEMA IF NOT EXISTSはスキーマ要素を含めることはできません"
 
 #: gram.y:1761
 #, c-format
@@ -17583,7 +17583,7 @@ msgstr "CREATE TABLE では既存のインデックスを使えません"
 #: parser/parse_utilcmd.c:2261
 #, c-format
 msgid "index \"%s\" is already associated with a constraint"
-msgstr "インデックス\"%s\"はすでに1つの制約に割り当てられれいます"
+msgstr "インデックス\"%s\"はすでに1つの制約に割り当てられています"
 
 #: parser/parse_utilcmd.c:2276
 #, c-format
@@ -25012,7 +25012,7 @@ msgstr "設定列\"%s\"をNULLにすることはできません"
 #: utils/adt/tsvector_op.c:2665
 #, c-format
 msgid "text search configuration name \"%s\" must be schema-qualified"
-msgstr "テキスト検索設定名称\"%s\"はスキーマ修飾しなけれナバりません"
+msgstr "テキスト検索設定名称\"%s\"はスキーマ修飾しなければなりません"
 
 #: utils/adt/tsvector_op.c:2690
 #, c-format
@@ -26503,7 +26503,7 @@ msgstr "問い合わせのパースツリーをログに記録します。"
 
 #: utils/misc/guc.c:1498
 msgid "Logs each query's rewritten parse tree."
-msgstr "リライト後の問い合わせのパースツリーをログを記録します。"
+msgstr "リライト後の問い合わせのパースツリーをログに記録します。"
 
 #: utils/misc/guc.c:1507
 msgid "Logs each query's execution plan."
@@ -28443,7 +28443,7 @@ msgstr "固定されたポータル\"%s\"は削除できません"
 #: utils/mmgr/portalmem.c:488
 #, c-format
 msgid "cannot drop active portal \"%s\""
-msgstr "アクテイブなポータル\"%s\"を削除できません"
+msgstr "アクティブなポータル\"%s\"を削除できません"
 
 #: utils/mmgr/portalmem.c:739
 #, c-format


Re: Perform streaming logical transactions by background workers and parallel apply

2022-08-25 Thread Amit Kapila
On Fri, Aug 26, 2022 at 9:30 AM Dilip Kumar  wrote:
>
> On Thu, Aug 11, 2022 at 12:13 PM Amit Kapila  wrote:
> > > Since we will later consider applying non-streamed transactions in 
> > > parallel, I
> > > think "apply streaming worker" might not be very suitable. I think 
> > > PostgreSQL
> > > also has the worker "parallel worker", so for "apply parallel worker" and
> > > "apply background worker", I feel that "apply background worker" will 
> > > make the
> > > relationship between workers more clear. ("[main] apply worker" and "apply
> > > background worker")
> > >
> >
> > But, on similar lines, we do have vacuumparallel.c for parallelizing
> > index vacuum. I agree with Kuroda-San on this point that the currently
> > proposed terminology doesn't sound to be very clear. The other options
> > that come to my mind are "apply streaming transaction worker", "apply
> > parallel worker" and file name could be applystreamworker.c,
> > applyparallel.c, applyparallelworker.c, etc. I see the point why you
> > are hesitant in calling it "apply parallel worker" but it is quite
> > possible that even for non-streamed xacts, we will share quite some
> > part of this code.
>
> I think the "apply streaming transaction worker" is a good option
> w.r.t. what we are currently doing but then in the future, if we want
> to apply normal transactions in parallel then we will have to again
> change the name.  So I think  "apply parallel worker" might look
> better and the file name could be "applyparallelworker.c" or just
> "parallelworker.c". Although "parallelworker.c" file name is a bit
> generic but we already have worker.c so w.r.t that "parallelworker.c"
> should just look fine.
>

Yeah, based on that theory we can go with parallelworker.c, but among the
above my vote is for applyparallelworker.c, as that is clearer. I feel
worker.c is already not a very good name for the file where we do the
apply-related work, so it won't be advisable to go further down that path.

-- 
With Regards,
Amit Kapila.




Re: use ARM intrinsics in pg_lfind32() where available

2022-08-25 Thread Nathan Bossart
Here is a new patch set that applies on top of v9-0001 in the
json_lex_string patch set [0] and v3 of the is_valid_ascii patch [1].

[0] 
https://postgr.es/m/CAFBsxsFV4v802idV0-Bo%3DV7wLMHRbOZ4er0hgposhyGCikmVGA%40mail.gmail.com
[1] 
https://postgr.es/m/CAFBsxsFFAZ6acUfyUALiem4DpCW%3DApXbF02zrc0G0oT9CPof0Q%40mail.gmail.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 5f973a39d67a744d514ee80e05a1c7f40bc0ebc6 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 25 Aug 2022 22:18:30 -0700
Subject: [PATCH v3 1/2] abstract architecture-specific implementation details
 from pg_lfind32()

---
 src/include/port/pg_lfind.h | 55 ++
 src/include/port/simd.h | 60 +
 2 files changed, 90 insertions(+), 25 deletions(-)

diff --git a/src/include/port/pg_lfind.h b/src/include/port/pg_lfind.h
index a4e13dffec..7a851ea42c 100644
--- a/src/include/port/pg_lfind.h
+++ b/src/include/port/pg_lfind.h
@@ -91,16 +91,19 @@ pg_lfind32(uint32 key, uint32 *base, uint32 nelem)
 {
 	uint32		i = 0;
 
-#ifdef USE_SSE2
+#ifndef USE_NO_SIMD
 
 	/*
-	 * A 16-byte register only has four 4-byte lanes. For better
-	 * instruction-level parallelism, each loop iteration operates on a block
-	 * of four registers. Testing has showed this is ~40% faster than using a
-	 * block of two registers.
+	 * For better instruction-level parallelism, each loop iteration operates
+	 * on a block of four registers.  Testing for SSE2 has showed this is ~40%
+	 * faster than using a block of two registers.
 	 */
-	const		__m128i keys = _mm_set1_epi32(key); /* load 4 copies of key */
-	uint32		iterations = nelem & ~0xF;	/* round down to multiple of 16 */
+	const Vector32 keys = vector32_broadcast(key);	/* load copies of key */
+	uint32		nelem_per_vector = sizeof(Vector32) / sizeof(uint32);
+	uint32		nelem_per_iteration = 4 * nelem_per_vector;
+
+	/* round down to multiple of elements per iteration */
+	uint32		tail_idx = nelem & ~(nelem_per_iteration - 1);
 
 #if defined(USE_ASSERT_CHECKING)
 	bool		assert_result = false;
@@ -116,31 +119,33 @@ pg_lfind32(uint32 key, uint32 *base, uint32 nelem)
 	}
 #endif
 
-	for (i = 0; i < iterations; i += 16)
+	for (i = 0; i < tail_idx; i += nelem_per_iteration)
 	{
-		/* load the next block into 4 registers holding 4 values each */
-		const		__m128i vals1 = _mm_loadu_si128((__m128i *) & base[i]);
-		const		__m128i vals2 = _mm_loadu_si128((__m128i *) & base[i + 4]);
-		const		__m128i vals3 = _mm_loadu_si128((__m128i *) & base[i + 8]);
-		const		__m128i vals4 = _mm_loadu_si128((__m128i *) & base[i + 12]);
+		Vector32	vals1, vals2, vals3, vals4,
+	result1, result2, result3, result4,
+	tmp1, tmp2, result;
+
+		/* load the next block into 4 registers */
+		vector32_load(&vals1, &base[i]);
+		vector32_load(&vals2, &base[i + nelem_per_vector]);
+		vector32_load(&vals3, &base[i + nelem_per_vector * 2]);
+		vector32_load(&vals4, &base[i + nelem_per_vector * 3]);
 
 		/* compare each value to the key */
-		const		__m128i result1 = _mm_cmpeq_epi32(keys, vals1);
-		const		__m128i result2 = _mm_cmpeq_epi32(keys, vals2);
-		const		__m128i result3 = _mm_cmpeq_epi32(keys, vals3);
-		const		__m128i result4 = _mm_cmpeq_epi32(keys, vals4);
+		result1 = vector32_eq(keys, vals1);
+		result2 = vector32_eq(keys, vals2);
+		result3 = vector32_eq(keys, vals3);
+		result4 = vector32_eq(keys, vals4);
 
 		/* combine the results into a single variable */
-		const		__m128i tmp1 = _mm_or_si128(result1, result2);
-		const		__m128i tmp2 = _mm_or_si128(result3, result4);
-		const		__m128i result = _mm_or_si128(tmp1, tmp2);
+		tmp1 = vector32_or(result1, result2);
+		tmp2 = vector32_or(result3, result4);
+		result = vector32_or(tmp1, tmp2);
 
 		/* see if there was a match */
-		if (_mm_movemask_epi8(result) != 0)
+		if (vector32_any_lane_set(result))
 		{
-#if defined(USE_ASSERT_CHECKING)
 			Assert(assert_result == true);
-#endif
 			return true;
 		}
 	}
@@ -151,14 +156,14 @@ pg_lfind32(uint32 key, uint32 *base, uint32 nelem)
 	{
 		if (key == base[i])
 		{
-#if defined(USE_SSE2) && defined(USE_ASSERT_CHECKING)
+#ifndef USE_NO_SIMD
 			Assert(assert_result == true);
 #endif
 			return true;
 		}
 	}
 
-#if defined(USE_SSE2) && defined(USE_ASSERT_CHECKING)
+#ifndef USE_NO_SIMD
 	Assert(assert_result == false);
 #endif
 	return false;
diff --git a/src/include/port/simd.h b/src/include/port/simd.h
index 8f85153110..bd4f1a3f39 100644
--- a/src/include/port/simd.h
+++ b/src/include/port/simd.h
@@ -32,6 +32,7 @@
 #include 
 #define USE_SSE2
 typedef __m128i Vector8;
+typedef __m128i Vector32;
 
 #else
 /*
@@ -40,18 +41,24 @@ typedef __m128i Vector8;
  */
 #define USE_NO_SIMD
 typedef uint64 Vector8;
+typedef uint64 Vector32;
 #endif
 
 
 static inline void vector8_load(Vector8 *v, const uint8 *s);
+static inline void vector32_load(Vector32 *v, const uint32 *s);
 static inline Vector8 vector8_broadcast(const uint8 c);
+static inline Vector32 vector32_broad

HOT chain validation in verify_heapam()

2022-08-25 Thread Himanshu Upadhyaya
Hi,

On Mon, Apr 4, 2022 at 11:46 PM Andres Freund  wrote:

>
> I think there's a few more things that'd be good to check. For example
> amcheck
> doesn't verify that HOT chains are reasonable, which can often be spotted
> looking at an individual page. Which is a bit unfortunate, given how many
> bugs
> we had in that area.
>
> Stuff to check around that:
> - target of redirect has HEAP_ONLY_TUPLE, HEAP_UPDATED set
> - In a valid ctid chain within a page (i.e. xmax = xmin):
>   - tuples have HEAP_UPDATED set
>   - HEAP_ONLY_TUPLE / HEAP_HOT_UPDATED matches across chains elements


(I changed the subject because the attached patch is related to HOT chain
validation).

Please find attached the patch with the above idea of HOT chain's
validation(within a Page) and a few more validation as below.

* If the predecessor's xmin is aborted or in progress, the current tuple's
xmin should be aborted or in progress respectively. Also, both xmins must
be equal.
* If the predecessor's xmin is not frozen, then the current tuple's
shouldn't be either.
* If the predecessor's xmin is equal to the current tuple's xmin, the
current tuple's cmin should be greater than the predecessor's cmin.
* If the current tuple is not HOT, then its predecessor must not be
HEAP_HOT_UPDATED.
* If the current tuple is HOT, then its predecessor must be
HEAP_HOT_UPDATED, and vice versa.
* If xmax is 0, which means it is the last tuple in the chain, then it
must not be HEAP_HOT_UPDATED.
* If the current tuple is the last tuple in the HOT chain, then the next
tuple should not be HOT.

I am looking into the process of adding a TAP test for these changes and
finding a way to corrupt a page in the TAP test. I will try to include these
test cases in an upcoming version of the patch.

-- 
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com
From f3ae2f783a255109655cade770ebc74e01aef34c Mon Sep 17 00:00:00 2001
From: Himanshu Upadhyaya 
Date: Thu, 18 Aug 2022 13:20:51 +0530
Subject: [PATCH v1] HOT chain validation in verify_heapam.

---
 contrib/amcheck/verify_heapam.c | 144 
 1 file changed, 144 insertions(+)

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index c875f3e5a2..ae2c100de2 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -399,6 +399,7 @@ verify_heapam(PG_FUNCTION_ARGS)
 	for (ctx.blkno = first_block; ctx.blkno <= last_block; ctx.blkno++)
 	{
 		OffsetNumber maxoff;
+		OffsetNumber predecessor[MaxOffsetNumber] = {0};
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -433,6 +434,9 @@ verify_heapam(PG_FUNCTION_ARGS)
 		for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
 			 ctx.offnum = OffsetNumberNext(ctx.offnum))
 		{
+			OffsetNumber nextTupOffnum;
+			HeapTupleHeader htup;
+
 			ctx.itemid = PageGetItemId(ctx.page, ctx.offnum);
 
 			/* Skip over unused/dead line pointers */
@@ -469,6 +473,13 @@ verify_heapam(PG_FUNCTION_ARGS)
 	report_corruption(&ctx,
 	  psprintf("line pointer redirection to unused item at offset %u",
 			   (unsigned) rdoffnum));
+
+htup = (HeapTupleHeader) PageGetItem(ctx.page, rditem);
+if (!(HeapTupleHeaderIsHeapOnly(htup) && htup->t_infomask & HEAP_UPDATED))
+	report_corruption(&ctx,
+	  psprintf("Redirected Tuple at line pointer offset %u is not HEAP_ONLY_TUPLE or HEAP_UPDATED tuple",
+			   (unsigned) rdoffnum));
+
 continue;
 			}
 
@@ -507,6 +518,139 @@ verify_heapam(PG_FUNCTION_ARGS)
 
 			/* Ok, ready to check this next tuple */
 			check_tuple(&ctx);
+			/*
+			 * Add line pointer offset to predecessor array.
+			 * 1) If xmax is matching with xmin of next Tuple(reaching via its t_ctid).
+			 * 2) If next tuple is in the same page.
+			 * Raise corruption if:
+			 * We have two tuples having same predecessor.
+			 *
+			 * We add offset to predecessor irrespective of transaction(t_xmin) status. We will
+			 * do validation related to transaction status(and also all other validations)
+			 * when we loop over predecessor array.
+			 */
+			nextTupOffnum = ItemPointerGetOffsetNumber(&(ctx.tuphdr)->t_ctid);
+			if (ItemPointerGetBlockNumber(&(ctx.tuphdr)->t_ctid) == ctx.blkno &&
+nextTupOffnum != ctx.offnum)
+			{
+TransactionId currTupXmax;
+ItemId lp;
+
+if (predecessor[nextTupOffnum] != 0)
+{
+	report_corruption(&ctx,
+psprintf("Tuple at offset %u is reachable from two or more updated tuple",
+	(unsigned) nextTupOffnum));
+	continue;
+}
+currTupXmax = HeapTupleHeaderGetUpdateXid(ctx.tuphdr);
+lp   = PageGetItemId(ctx.page, nextTupOffnum);
+
+htup = (HeapTupleHeader) PageGetItem(ctx.page, lp);
+
+if (TransactionIdIsValid(currTupXmax) &&
+	TransactionIdEquals(HeapTupleHeaderGetXmin(htup), currTupXmax))
+{
+	predecessor[nextTupOffnum] = ctx.offnum;
+}
+/* Non matching XMAX with XMIN is not a corruption*/
+			}
+
+		}
+		/* Now loop over offset and validate data i

Re: Fix japanese translation of log messages

2022-08-25 Thread Kyotaro Horiguchi
At Fri, 26 Aug 2022 14:28:26 +0900, torikoshia  
wrote in 
>  "
> >  #: gram.y:1588 gram.y:1604
> >  #, c-format
> >  msgid "CREATE SCHEMA IF NOT EXISTS cannot include schema elements"
> > -msgstr "CREATE SCHEMA IF NOT EXISTSんはスキーマ要素を含めることはでき
> > -ません"
> > +msgstr "CREATE SCHEMA IF NOT EXISTSはスキーマ要素を含めることはできま
> > せん"
> 
> How do you think?

"NHa" Darn...  That kind of mistypes are inevitable when I worked on
nearly or over a thousand of messages at once..  Currently I'm working
in an incremental fashion and I only process at most up to 10 or so
messages at a time thus that kind of silly mistake cannot happen..

It's a mistake of "には". I'll load it into the next ship.  The next
release is 9/8 and I'm not sure the limit of translation commits for
the release, though..

Anyway, thank you for reporting!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Fix japanese translation of log messages

2022-08-25 Thread torikoshia

On 2022-08-26 15:20, Kyotaro Horiguchi wrote:

At Fri, 26 Aug 2022 14:28:26 +0900, torikoshia
 wrote in

>  #: gram.y:1588 gram.y:1604
>  #, c-format
>  msgid "CREATE SCHEMA IF NOT EXISTS cannot include schema elements"
> -msgstr "CREATE SCHEMA IF NOT EXISTSんはスキーマ要素を含めることはでき
> -ません"
> +msgstr "CREATE SCHEMA IF NOT EXISTSはスキーマ要素を含めることはできま
> せん"

How do you think?


"NHa" Darn...  That kind of mistypes are inevitable when I worked on
nearly or over a thousand of messages at once..  Currently I'm working
in an incremental fashion and I only process at most up to 10 or so
messages at a time thus that kind of silly mistake cannot happen..


No problem, rather thanks for working on this!


It's a mistake of "には".


Ah, I got it.


I'll load it into the next ship.  The next
release is 9/8 and I'm not sure the limit of translation commits for
the release, though..


Thanks a lot!

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION