pg_stat_wal: tracking the compression effect
Hi hackers,

We can specify a compression method (for example, lz4, zstd), but it is hard to know the effect of compression depending on the method. There is already a way to know the compression effect using pg_waldump. However, having these statistics in a view makes it more accessible. I am proposing to add statistics which keep track of the compression effect to the pg_stat_wal view. The design I have in mind is below:

 compression_saved | compression_times
-------------------+-------------------
             38741 |                 6

The idea is to accumulate a value indicating how much space is saved by each compression (size before compression - size after compression), and to keep track of how many times compression has happened, so that one can know how much space is saved on average.

What do you think?

Regards,

-- 
Ken Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Re: pg_basebackup: add test about zstd compress option
On Tue, Aug 23, 2022 at 11:37 AM Michael Paquier wrote:
> It seems to me that checking that the contents generated are valid is
> equally necessary. We do that with zlib with gzip --test, and you
> could use ${ZSTD} in the context of this test.

Thank you for the good points. I supplemented the test according to your suggestion. However, there was a problem: even though I exported ZSTD in the Makefile, the test runner can't find ZSTD when it actually runs the tests.

```
my $zstd = $ENV{ZSTD};
skip "program zstd is not found in your system", 1
    if (!defined $zstd || $zstd eq '');
```

log: regress_log_010_pg_basebackup

```
ok 183 # skip program zstd is not found in your system.
```

Could you check if I missed anything?

v2_add_test_pg_basebackup.patch
Description: Binary data
Re: add checkpoint stats of snapshot and mapping files of pg_logical dir
At Wed, 24 Aug 2022 13:34:54 +0900, Michael Paquier wrote in
> On Wed, Aug 24, 2022 at 10:48:01AM +0900, Kyotaro Horiguchi wrote:
> > By the way, I think we use INSTR_TIME_* macros to measure internal
> > durations (mainly for the monotonic clock characteristics, and to
> > reduce performance degradation on Windows?). I'm not sure that's
> > crucial here, but I don't think there's any reason to use
> > GetCurrentTimestamp() instead.
>
> This implies two calls of gettimeofday(), but that does not worry me
> much in this code path. There is some consistency with
> CheckpointGuts() where we take timestamps for the sync requests.

Mmm. heap_vacuum_rel does the same. From the other direction, these two are the only uses of GetCurrentTimestamp() for this purpose. However, I'm fine with that. Thanks for the info.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)
On 2022-Aug-24, Peter Eisentraut wrote:
> I don't follow how this is a backpatching hazard.

It changes code. Any bugfix in the surrounding code would have to fix a conflict. That is nonzero effort. Is it a huge risk? No, it is a very small risk and a very small cost to fix such a conflict; but my claim is that this change has zero benefit, therefore we should not incur a nonzero future effort.

-- 
Álvaro Herrera         PostgreSQL Developer — https://www.EnterpriseDB.com/
"How amazing is that? I call it a night and come back to find that a bug has
been identified and patched while I sleep." (Robert Davidson)
http://archives.postgresql.org/pgsql-sql/2006-03/msg00378.php
Re: Letter case of "admin option"
On 2022-Aug-25, Kyotaro Horiguchi wrote:
> At Tue, 23 Aug 2022 09:58:47 -0400, Robert Haas wrote in
> I would translate "ADMIN OPTION" to "ADMIN OPTION" in Japanese, but
> "admin option" is translated to "管理者オプション", which is a bit hard
> for the readers to come up with the connection to "ADMIN OPTION" (or
> ADMIN). I guess this is somewhat similar to using "You need to
> give capability to administrate the role" to suggest that users add WITH
> ADMIN OPTION to the role.
>
> Maybe Álvaro has a similar difficulty with it.

Exactly. I ran a quick poll in a Spanish community. Everyone who responded (not many, admittedly) agreed with this idea -- they find the message clearer if the keyword is mentioned explicitly in the translation.

> > In short, I'm wondering whether we should regard ADMIN as the name of
> > the option, but OPTION as part of the GRANT syntax, and hence
> > capitalize it "ADMIN option". However, if the non-English speakers on
> > this list have a strong preference for something else I'm certainly
> > not going to fight about it.
>
> "ADMIN option", which is translated into "ADMINオプション", is fine by
> me. I hope Álvaro thinks the same way.

Hmm, but our docs say that the option is called ADMIN OPTION, don't they? And I think the standard sees it the same way. You cannot invoke it without the word OPTION. I understand the point of view, but I don't think it is clearer done that way. It is different, for example, with INHERIT; we could say "the INHERIT option", making the word "option" translatable in that phrase. But then you don't have to add that word in the command.

> What do you think about the attached?

I prefer the ADMIN OPTION interpretation (both for docs and error messages). I think it's clearer that way, given that the syntax is what it is.
> > > >     !is_admin_of_role(currentUserId, roleid))
> > > >     ereport(ERROR,
> > > >             (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> > > >              errmsg("must have admin option on role \"%s\"",
> > > >                     rolename)));
> > >
> > > The message seems a bit short in that it only mentions the admin option
> > > while omitting the CREATEROLE privilege. "must have CREATEROLE privilege
> > > or admin option on role %s" might be better. Or we could say just
> > > "insufficient privilege" or "permission denied" in the main error
> > > message, then provide "CREATEROLE privilege or admin option on role %s
> > > is required" in the DETAIL or HINT message.

I'm not opposed to moving that part to detail/hint, but I would prefer that it says "the CREATEROLE privilege or ADMIN OPTION".

> --- a/doc/src/sgml/ref/alter_group.sgml
> +++ b/doc/src/sgml/ref/alter_group.sgml
> @@ -55,7 +55,7 @@ ALTER GROUP <replaceable class="parameter">group_name</replaceable> RENAME TO
>     REVOKE. Note that
>     GRANT and REVOKE have additional
>     options which are not available with this command, such as the ability
> -   to grant and revoke ADMIN OPTION, and the ability to
> +   to grant and revoke ADMIN option, and the ability to
>     specify the grantor.

I think the original reads better.

-- 
Álvaro Herrera         PostgreSQL Developer — https://www.EnterpriseDB.com/
"La libertad es como el dinero; el que no la sabe emplear la pierde" (Alvarez)
Re: pg_receivewal and SIGTERM
> On 23 Aug 2022, at 02:15, Michael Paquier wrote:
>
> On Mon, Aug 22, 2022 at 04:05:16PM +0200, Christoph Berg wrote:
>> Do you mean it can, or can not be backpatched? (I'd argue for
>> backpatching since the behaviour is slightly broken at the moment.)
>
> I mean that it is fine to backpatch that, in my opinion.

I think this can be argued both for and against backpatching. Catching SIGTERM makes a lot of sense, especially given systemd's behavior. On the other hand, this adds functionality to something arguably working as intended, regardless of what one thinks about the intent.

The attached adds the Exit Status section to the pg_recvlogical docs, which is present for pg_receivewal, to make them more aligned, and tweaks comments to pgindent standards. This is the version I think is ready to commit.

-- 
Daniel Gustafsson        https://vmware.com/

v2-0001-Handle-SIGTERM-in-pg_receivewal-and-pg_recvlogica.patch
Description: Binary data
Re: Column Filtering in Logical Replication
On Tue, Aug 23, 2022 at 7:52 AM Peter Smith wrote:
>
> On Mon, Aug 22, 2022 at 9:25 PM vignesh C wrote:
> > ...
> > Few comments:
> > 1) I felt no expressions are allowed in case of column filters. Only
> > column names can be specified. The second part of the sentence
> > confuses what is allowed and what is not allowed. Won't it be better
> > to remove the second sentence and mention that only column names can
> > be specified?
> > +
> > +Column list can contain only simple column references. Complex
> > +expressions, function calls etc. are not allowed.
> > +
>
> This wording was lifted verbatim from the commit message [1]. But I
> see your point that it just seems to be overcomplicating a simple
> rule. Modified as suggested.
>
> > 2) tablename should be table name.
> > +
> > +A column list is specified per table following the tablename, and
> > enclosed by
> > +parenthesis. See for details.
> > +
> >
> > We have used table name in the same page in other instances like:
> > a) The row filter is defined per table. Use a WHERE clause after the
> > table name for each published table that requires data to be filtered
> > out. The WHERE clause must be enclosed by parentheses.
> > b) The tables are matched between the publisher and the subscriber
> > using the fully qualified table name.
>
> Fixed as suggested.
>
> > 3) One small whitespace issue:
> > git am v2-0001-Column-List-replica-identity-rules.patch
> > Applying: Column List replica identity rules.
> > .git/rebase-apply/patch:30: trailing whitespace.
> >    if the publication publishes only INSERT operations.
> > warning: 1 line adds whitespace errors.
>
> Fixed.
>
> ~~~
>
> PSA the v3* patch set.

Thanks for the updated patch. Few comments:

1) We can shuffle the columns in the publisher and subscriber tables to show that the order of the columns does not matter:
+
+Create a publication p1. A column list is defined for
+table t1 to reduce the number of columns that will be
+replicated.
+
+test_pub=# CREATE PUBLICATION p1 FOR TABLE t1 (id, a, b, c);
+CREATE PUBLICATION
+test_pub=#
+

2) We can try to keep the line content to less than 80 chars:
+
+A column list is specified per table following the tablename, and enclosed by
+parenthesis. See for details.
+

3) tablename should be table name, like it is used in other places:
+
+A column list is specified per table following the tablename, and enclosed by
+parenthesis. See for details.
+

4a) In the below, you could include mentioning "Only the column list data of publication p1 are replicated."
+
+ Insert some rows to table t1.
+
+test_pub=# INSERT INTO t1 VALUES(1, 'a-1', 'b-1', 'c-1', 'd-1', 'e-1');
+INSERT 0 1

4b) In the above, we could mention that the insert should be done on the "publisher side", as the previous statements are executed on the subscriber side.

Regards,
Vignesh
Re: use SSE2 for is_valid_ascii
v3 applies on top of the v9 json_lex_string patch in [1] and adds a bit more to that, resulting in a simpler patch that is more amenable to additional SIMD-capable platforms.

[1] https://www.postgresql.org/message-id/CAFBsxsFV4v802idV0-Bo%3DV7wLMHRbOZ4er0hgposhyGCikmVGA%40mail.gmail.com

-- 
John Naylor
EDB: http://www.enterprisedb.com

diff --git a/src/common/wchar.c b/src/common/wchar.c
index 1e6e198bf2..1ca7533f00 100644
--- a/src/common/wchar.c
+++ b/src/common/wchar.c
@@ -1918,11 +1918,12 @@ pg_utf8_verifystr(const unsigned char *s, int len)
 	const int	orig_len = len;
 	uint32		state = BGN;
 
-/*
- * Sixteen seems to give the best balance of performance across different
- * byte distributions.
- */
-#define STRIDE_LENGTH 16
+	/*
+	 * With a stride of two vector widths, gcc will unroll the loop. Even if
+	 * the compiler can unroll a longer loop, it's not worth it because we
+	 * must fall back to the byte-wise algorithm if we find any non-ASCII.
+	 */
+#define STRIDE_LENGTH (2 * sizeof(Vector8))
 
 	if (len >= STRIDE_LENGTH)
 	{
diff --git a/src/include/mb/pg_wchar.h b/src/include/mb/pg_wchar.h
index 011b0b3abd..aea045aa66 100644
--- a/src/include/mb/pg_wchar.h
+++ b/src/include/mb/pg_wchar.h
@@ -19,6 +19,8 @@
 #ifndef PG_WCHAR_H
 #define PG_WCHAR_H
 
+#include "port/simd.h"
+
 /*
  * The pg_wchar type
  */
@@ -704,25 +706,28 @@ extern WCHAR *pgwin32_message_to_UTF16(const char *str, int len, int *utf16len);
  * Verify a chunk of bytes for valid ASCII.
  *
  * Returns false if the input contains any zero bytes or bytes with the
- * high-bit set. Input len must be a multiple of 8.
+ * high-bit set. Input len must be a multiple of the chunk size (8 or 16).
  */
 static inline bool
 is_valid_ascii(const unsigned char *s, int len)
 {
 	const unsigned char *const s_end = s + len;
-	uint64		chunk,
-				highbit_cum = UINT64CONST(0),
-				zero_cum = UINT64CONST(0x8080808080808080);
+	Vector8		chunk;
+	Vector8		highbit_cum = vector8_broadcast(0);
+#ifdef USE_NO_SIMD
+	Vector8		zero_cum = vector8_broadcast(0x80);
+#endif
 
 	Assert(len % sizeof(chunk) == 0);
 
 	while (s < s_end)
 	{
-		memcpy(&chunk, s, sizeof(chunk));
+		vector8_load(&chunk, s);
+
+		/* Capture any zero bytes in this chunk. */
+#if defined(USE_NO_SIMD)
 
 		/*
-		 * Capture any zero bytes in this chunk.
-		 *
 		 * First, add 0x7f to each byte. This sets the high bit in each byte,
 		 * unless it was a zero. If any resulting high bits are zero, the
 		 * corresponding high bits in the zero accumulator will be cleared.
@@ -734,20 +739,31 @@ is_valid_ascii(const unsigned char *s, int len)
 		 * because we check for those separately.
 		 */
 		zero_cum &= (chunk + UINT64CONST(0x7f7f7f7f7f7f7f7f));
+#else
+
+		/*
+		 * Set all bits in each lane of the highbit accumulator where input
+		 * bytes are zero.
+		 */
+		highbit_cum = vector8_or(highbit_cum,
+								 vector8_eq(chunk, vector8_broadcast(0)));
+#endif
 
 		/* Capture all set bits in this chunk. */
-		highbit_cum |= chunk;
+		highbit_cum = vector8_or(highbit_cum, chunk);
 
 		s += sizeof(chunk);
 	}
 
 	/* Check if any high bits in the high bit accumulator got set. */
-	if (highbit_cum & UINT64CONST(0x8080808080808080))
+	if (vector8_is_highbit_set(highbit_cum))
 		return false;
 
+#ifdef USE_NO_SIMD
 	/* Check if any high bits in the zero accumulator got cleared. */
-	if (zero_cum != UINT64CONST(0x8080808080808080))
+	if (zero_cum != vector8_broadcast(0x80))
 		return false;
+#endif
 
 	return true;
 }
diff --git a/src/include/port/simd.h b/src/include/port/simd.h
index 56df989094..8f85153110 100644
--- a/src/include/port/simd.h
+++ b/src/include/port/simd.h
@@ -38,6 +38,7 @@ typedef __m128i Vector8;
  * If no SIMD instructions are available, we can in some cases emulate vector
  * operations using bitwise operations on unsigned integers.
  */
+#define USE_NO_SIMD
 typedef uint64 Vector8;
 #endif
 
@@ -47,7 +48,11 @@ static inline Vector8 vector8_broadcast(const uint8 c);
 static inline bool vector8_has_zero(const Vector8 v);
 static inline bool vector8_has(const Vector8 v, const uint8 c);
 static inline bool vector8_has_le(const Vector8 v, const uint8 c);
-
+static inline bool vector8_is_highbit_set(const Vector8 v);
+static inline Vector8 vector8_or(const Vector8 v1, const Vector8 v2);
+#ifndef USE_NO_SIMD
+static inline Vector8 vector8_eq(const Vector8 v1, const Vector8 v2);
+#endif
 
 /*
  * Functions for loading a chunk of memory into a vector.
@@ -181,4 +186,38 @@ vector8_has_le(const Vector8 v, const uint8 c)
 	return result;
 }
 
+static inline bool
+vector8_is_highbit_set(const Vector8 v)
+{
+#ifdef USE_SSE2
+	return _mm_movemask_epi8(v) != 0;
+#else
+	return v & vector8_broadcast(0x80);
+#endif
+}
+
+/* comparisons between vectors */
+
+#ifndef USE_NO_SIMD
+static inline Vector8
+vector8_eq(const Vector8 v1, const Vector8 v2)
+{
+#ifdef USE_SSE2
+	return _mm_cmpeq_epi8(v1, v2);
+#endif
+}
+#endif
+
+/* bitwise operations */
+
+static inline Vector8
+vector8_or(const Vector8 v
Re: Making Vars outer-join aware
On Thu, Aug 25, 2022 at 5:18 AM Tom Lane wrote:
> Richard Guo writes:
> > Do we need to also
> > generate two SpecialJoinInfos for the B/C join in the first order, with
> > and without the A/B join in its min_lefthand?
>
> No, the SpecialJoinInfos would stay as they are now. It's already the
> case that the first join's min_righthand would contain only B, and
> the second one's min_righthand would contain only C.

I'm not sure I understand it correctly. If we are given the first order from the parser, the SpecialJoinInfo for the B/C join would have a min_lefthand containing both B and the A/B join. And this SpecialJoinInfo would make the B/C join invalid, which is not what we want. Currently the patch resolves this by explicitly running remove_unneeded_nulling_relids, and the A/B join would be removed from the B/C join's min_lefthand if Pbc is strict for B. Do we still need this kind of fixup if we are to keep just one form of SpecialJoinInfo and two forms of RestrictInfos?

Thanks
Richard
Insertion Sort Improvements
Hello,

Inspired by the recent discussions[1][2] around sort improvements, I took a look around the code and noticed the use of a somewhat naive version of insertion sort within the broader quicksort code.

The current implementation (see sort_template.h) is practically the textbook version of insertion sort:

for (pm = a + ST_POINTER_STEP; pm < a + n * ST_POINTER_STEP;
     pm += ST_POINTER_STEP)
    for (pl = pm; pl > a && DO_COMPARE(pl - ST_POINTER_STEP, pl) > 0;
         pl -= ST_POINTER_STEP)
        DO_SWAP(pl, pl - ST_POINTER_STEP);

I propose to rather use the slightly more efficient variant of insertion sort where only a single assignment, instead of a fully-fledged swap, is performed in the inner loop:

for (pm = a + ST_POINTER_STEP; pm < a + n * ST_POINTER_STEP;
     pm += ST_POINTER_STEP)
{
    DO_COPY(pm_temp, pm);              /* pm_temp <- copy of pm */
    pl = pm - ST_POINTER_STEP;
    for (; pl >= a && DO_COMPARE(pl, pm_temp) > 0; pl -= ST_POINTER_STEP)
        DO_ASSIGN(pl + ST_POINTER_STEP, pl);   /* pl + 1 <- pl */
    DO_COPY(pl + ST_POINTER_STEP, pm_temp);    /* pl + 1 <- copy of pm_temp */
}

DO_ASSIGN and DO_COPY macros would have to be declared analogously to DO_SWAP via the template.

There is obviously a trade-off involved here, as O(1) extra memory is required to hold the temporary variable, and DO_COPY might be expensive if the sort element is large. In the case of single datum sort with trivial data types this would not be a big issue. For large tuples, on the other hand, it could mean a significant overhead that might not be made up for by the improved inner loop. One might want to limit this algorithm to a certain maximum tuple size and use the original insertion sort version for larger elements (this could be decided at compile time via sort_template.h).

Anyways, there might be some low-hanging fruit here. If it turns out to be significantly faster, one might even consider increasing the insertion sort threshold from < 7 to < 10 sort elements. But that is a whole other discussion for another day.
Has anyone tested such an approach before? Please let me know your thoughts. Cheers, Benjamin [1] https://www.postgresql.org/message-id/flat/CAFBsxsHanJTsX9DNJppXJxwg3bU%2BYQ6pnmSfPM0uvYUaFdwZdQ%40mail.gmail.com [2] https://www.postgresql.org/message-id/flat/CAApHDvoTTtoQYfp3d0kTPF6y1pjexgLwquzKmjzvjC9NCw4RGw%40mail.gmail.com -- Benjamin Coutu http://www.zeyos.com
Re: pg_receivewal and SIGTERM
On Thu, Aug 25, 2022 at 11:19:05AM +0200, Daniel Gustafsson wrote:
> I think this can be argued both for and against backpatching. Catching
> SIGTERM makes a lot of sense, especially given systemd's behavior. On the
> other hand, this adds functionality to something arguably working as
> intended, regardless of what one thinks about the intent.

Sure. My view on this matter is that the behavior of the patch is more useful to users as, on HEAD, a SIGTERM is equivalent to a drop of the connection followed by a retry when not using -n. Or do you think that there could be cases where the behavior of HEAD (force a connection drop with the backend and handle the retry infinitely in pg_receivewal/recvlogical) is more useful? systemd can also do retries a given number of times, so that's just moving the ball from one layer to the other, in the end. We could also say to just set KillSignal to SIGINT in the docs, but my guess is that few users would actually notice that until they see how pg_receivewal/recvlogical work with systemd's default.

FWIW, I've worked on an archiver integration a few years ago and got annoyed that we use SIGINT while SIGTERM was the default (systemd was not directly used there, but the signal problem was the same, so we had to go through some hoops to make the stop signal configurable, like systemd).

-- 
Michael

signature.asc
Description: PGP signature
Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)
On Thu, Aug 25, 2022 at 10:38:41AM +0200, Alvaro Herrera wrote:
> It changes code. Any bugfix in the surrounding code would have to fix a
> conflict. That is nonzero effort. Is it a huge risk? No, it is very
> small risk and a very small cost to fix such a conflict; but my claim is
> that this change has zero benefit, therefore we should not incur a
> nonzero future effort.

Agreed to leave things as they are. This really comes down to whether we want to make this code more C99-ish or not, and the post-patch result is logically the same as the pre-patch result.

-- 
Michael

signature.asc
Description: PGP signature
Re: Perform streaming logical transactions by background workers and parallel apply
On Wed, Aug 24, 2022 at 7:17 PM houzj.f...@fujitsu.com wrote:
>
> On Friday, August 19, 2022 4:49 PM Amit Kapila
> >
> > 8. It is not clear to me how APPLY_BGWORKER_EXIT status is used. Is it
> > required for the cases where a bgworker exits due to some error and then
> > the apply worker uses it to detect that and exits? How would other
> > bgworkers notice this; is it done via apply_bgworker_check_status()?
>
> It was used to detect the unexpected exit of a bgworker, and I have changed
> the design of this, which is now similar to what we have in parallel query.

Thanks, this looks better.

> Attach the new version patch set (v24) which addresses the above comments.
> Besides, I added some logic which tries to stop the bgworker at transaction
> end if there are enough workers in the pool.

I think this deserves an explanation in worker.c under the title "Separate background workers" in the patch.

Review comments for v24-0001
=

1.
+ * cost of searhing the hash table

/searhing/searching

2.
+/*
+ * Apply background worker states.
+ */
+typedef enum ApplyBgworkerState
+{
+	APPLY_BGWORKER_BUSY,			/* assigned to a transaction */
+	APPLY_BGWORKER_FINISHED			/* transaction is completed */
+} ApplyBgworkerState;

Now that there are just two states, can we think of representing them via a flag ('available'/'in_use'), or do you see a downside with that as compared to the current approach?

3.
-replorigin_session_setup(RepOriginId node)
+replorigin_session_setup(RepOriginId node, int apply_leader_pid)

I have mentioned previously that we don't need anything specific to the apply worker/leader in this API, so why this change? The other idea that occurred to me is: can we use replorigin_session_reset() before sending the commit message to the bgworker and then do the session setup in the bgworker only to handle the commit/abort/prepare message? We also need to set it again for the leader apply worker after the leader worker completes the wait for the bgworker to finish the commit handling.

4.
Unlike parallel query, here we seem to be creating a separate DSM for each worker, and probably the difference is due to the fact that here we don't know upfront how many workers will actually be required. If so, can we write some comments for the same in worker.c where you have explained the parallel bgworker stuff?

5.
 /*
- * Handle streamed transactions.
+ * Handle streamed transactions for both the main apply worker and the apply
+ * background workers.

Shall we use "leader apply worker" in the above comment? Also, check other places in the patch for similar changes.

6.
+	else
+	{
-		/* open the spool file for this transaction */
-		stream_open_file(MyLogicalRepWorker->subid, stream_xid, first_segment);
+		/* notify handle methods we're processing a remote transaction */
+		in_streamed_transaction = true;

There is a spurious line after "else {". Also, the comment could be slightly improved: "/* notify handle methods we're processing a remote in-progress transaction */"

7. The checks in various apply_handle_stream_* functions have improved as compared to the previous version, but I think we can still improve those. One idea could be to use a separate function to decide the action we want to take, and then, based on it, the caller can take appropriate action. Using a similar idea, we can improve the checks in handle_streamed_transaction() as well.

8.
+	else if ((winfo = apply_bgworker_find(xid)))
+	{
+		/* Send STREAM ABORT message to the apply background worker. */
+		apply_bgworker_send_data(winfo, s->len, s->data);
+
+		/*
+		 * After sending the data to the apply background worker, wait for
+		 * that worker to finish. This is necessary to maintain commit
+		 * order which avoids failures due to transaction dependencies and
+		 * deadlocks.
+		 */
+		if (subxid == xid)
+		{
+			apply_bgworker_wait_for(winfo, APPLY_BGWORKER_FINISHED);
+			apply_bgworker_free(winfo);
+		}
+	}
+	else
+		/*
+		 * We are in main apply worker and the transaction has been
+		 * serialized to file.
+		 */
+		serialize_stream_abort(xid, subxid);

In the last else block, you can use {} to make it consistent with the other if/else checks.

9.
+void
+ApplyBgworkerMain(Datum main_arg)
+{
+	volatile ApplyBgworkerShared *shared;
+
+	dsm_handle	handle;

Is there a need to keep this empty line between the above two declarations?

10.
+	/*
+	 * Attach to the message queue.
+	 */
+	mq = shm_toc_lookup(toc, APPLY_BGWORKER_KEY_ERROR_QUEUE, false);

Here, we should say "error queue" in the comments.

11.
+	/*
+	 * Attach to the message queue.
+	 */
+	mq = shm_toc_lookup(toc, APPLY_BGWORKER_KEY_ERROR_QUEUE, false);
+	shm_mq_set_sender(mq, MyProc);
+	error_mqh = shm_mq_attach(mq, seg, NULL);
+	pq_redirect_to_shm_mq(seg, error_mqh);
+
+	/*
+	 * Now, we have initialized DSM. Attach to slot.
+	 */
+	logicalrep_worker_attach(worker_slot);
+	MyParallelShared->logicalrep_worker_generation = MyLogicalRepWorker->generation;
+	MyParallelShared->logicalrep_worker_slot_no = worker_slot;
+
+	pq_set_paralle
Re: making relfilenodes 56 bits
On Tue, Aug 23, 2022 at 8:00 PM Robert Haas wrote:
>
> On Tue, Aug 23, 2022 at 2:06 AM Dilip Kumar wrote:
> > OTOH, if we keep the two separate ranges for the user and system table
> > then we don't need all this complex logic of conflict checking.
>
> True. That's the downside. The question is whether it's worth adding
> some complexity to avoid needing separate ranges.

Other than complexity, we will have to check all the user tables' relfilenumbers from the old cluster against the hash built on the new cluster's relfilenumbers; isn't that extra overhead if there are a lot of user tables? But I think we are already restoring all those tables in the new cluster, so compared to that it will be very small.

> Honestly, if we don't care about having separate ranges, we can do
> something even simpler and just make the starting relfilenumber for
> system tables same as the OID. Then we don't have to do anything at
> all, outside of not changing the OID assigned to pg_largeobject in a
> future release. Then as long as pg_upgrade is targeting a new cluster
> with completely fresh databases that have not had any system table
> rewrites so far, there can't be any conflict.

I think having the OID-based system and having two ranges are not exactly the same. Because if we have the OID-based relfilenumber allocation for system tables (initially) and then later allocate from the nextRelFileNumber counter, it seems like a mix of the old system (where OID and relfilenumber are tightly connected) and the new system (where nextRelFileNumber is a completely independent counter). OTOH, having two ranges means logically we are not making it dependent on OID; we are just allocating from a central counter, but after catalog initialization we leave some gap and start from a new range. So I don't think this system is hard to explain.

> And perhaps that is the best solution after all, but while it is
> simple in terms of code, I feel it's a bit complicated for human
> beings.
> It's very simple to understand the scheme that Amit proposed:
> if there's anything in the new cluster that would conflict, we move it
> out of the way. We don't have to assume the new cluster hasn't had any
> table rewrites. We don't have to nail down starting relfilenumber
> assignments for system tables. We don't have to worry about
> relfilenumber or OID assignments changing between releases.
> pg_largeobject is not a special case. There are no special ranges of
> OIDs or relfilenumbers required. It just straight up works -- all the
> time, no matter what, end of story.

I agree that this system is easy to explain: we just rewrite anything that conflicts, so it looks more future-proof. Okay, I will try this solution and post the patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
On Thu, Aug 25, 2022 at 10:48:08AM +0530, Bharath Rupireddy wrote:
> On Tue, Aug 23, 2022 at 11:37 PM Nathan Bossart wrote:
>> IIUC an error in get_dirent_type() could cause slots to be skipped here,
>> which is a behavior change.
>
> That behaviour hasn't changed, no? Currently, if lstat() fails in
> ReorderBufferCleanupSerializedTXNs() it returns to
> StartupReorderBuffer()'s while loop, which is in a way continuing with
> the other slots; this patch does nothing new.

Are you sure?

FWIW, the changes in reorderbuffer.c for ReorderBufferCleanupSerializedTXNs() reduce the code readability, in my opinion, so that's one less argument in favor of this change. The gain in ParseConfigDirectory() is kind of cool. pg_tzenumerate_next(), copydir(), RemoveXlogFile(), StartupReplicationSlots(), CheckPointLogicalRewriteHeap() and RemovePgTempFilesInDir() seem fine as well. At least these avoid extra lstat() calls when the file type is unknown, which would affect only a limited number of users where some of the three DT_* are missing (right?).

-- 
Michael

signature.asc
Description: PGP signature
Re: Strip -mmacosx-version-min options from plperl build
On 25.08.22 02:14, Andrew Dunstan wrote:
>> In short: if the Cygwin Perl headers redefine PGDLLEXPORT (unlikely) or
>> somehow #define "__attribute__()" or "visibility()" into no-ops (perhaps
>> more likely) then we could explain this failure, and that would also
>> explain why it doesn't fail elsewhere. I can't readily check this, since
>> I have no idea exactly which version of the Perl headers lorikeet uses.
>
> It's built against cygwin perl 5.32. I don't see anything like that in
> perl.h. It's certainly using __attribute__() a lot.

This could be checked by running plperl.c through the preprocessor (replace gcc -c plperl.c -o plperl.o with gcc -E plperl.c -o plperl.i) and seeing what becomes of those symbols.

If we want to get the buildfarm green again sooner, we could force --export-all-symbols directly.
Re: [RFC] building postgres with meson - v11
On 24.08.22 17:30, Andres Freund wrote:
> > 258f6dc0a7 Don't hardcode tmp_check/ as test directory for tap tests
> > 8ecc33cf04 Split TESTDIR into TESTLOGDIR and TESTDATADIR
> >
> > I think these patches are split up a bit incorrectly. If you apply the
> > first patch by itself, then the output appears in tap_comp_dir/ directly
> > under the source directory. And then the second patch moves it to
> > tmp_check/tap_comp_dir/. If there is an intent to apply these patches
> > separately somehow, this should be cleaned up.
>
> How is that happening with that version of the patch? The test puts
> tap_comp_dir under TESTDIR, and TESTDIR is $(CURDIR)/tmp_check. There
> was an earlier version of the patch that was split one more time that
> did have that problem, but I don't quite see how that version has it?

Ok, I see now how this works. It's a bit weird since the meaning of TESTDIR is changed. I'm not sure if this could create cross-branch confusion.

> > 97a0b096e8 meson: prereq: Add src/tools/gen_export.pl
> >
> > This produces leading whitespace in the output files that at least on
> > darwin wasn't there before. See attached patch (0002). This should be
> > checked again on other platforms as well.
>
> Hm, to me the indentation as is makes more sense, but ...

Maybe for the 'gnu' format, but on darwin (and others) it's just a flat list, so indenting it is pointless.

> I wonder if we should rewrite this in python - I chose perl because I
> thought we could share it, but as you pointed out, that's not possible,
> because we don't want to depend on perl during the autoconf build from a
> tarball.

Given that the code is already written, I wouldn't do it. Introducing Python into the toolchain would require considerations of minimum versions, finding the right binaries, formatting, style checking, etc., which would probably be a distraction right now. I also think that this script, whose purpose is to read an input file line by line and print it back out slightly differently, is not going to be done better in Python.
Re: archive modules
Here is a patch with the proposed wording. From 7fce0073f8a53b3e9ba84fa10fbc7b8efef36e97 Mon Sep 17 00:00:00 2001 From: benoit Date: Mon, 22 Aug 2022 12:00:46 +0200 Subject: [PATCH] basic_archive parameter visibility doc patch Module parameters are only visible from the pg_settings view once the module is loaded. Since an archive module is loaded by the archiver process, the parameters are never visible from the view. This patch adds a note about this in the pg_settings system view documentation. --- doc/src/sgml/system-views.sgml | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml index 9728039e71..92aad11c71 100644 --- a/doc/src/sgml/system-views.sgml +++ b/doc/src/sgml/system-views.sgml @@ -3275,7 +3275,13 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx This view does not display customized options - until the extension module that defines them has been loaded. + until the extension module that defines them has been loaded by the backend + process executing the query (e.g., via shared_preload_libraries, the + LOAD command, or a call to a user-defined C function). + For example, since the archive_library is only loaded by the archiver + process, this view will not display any customized options defined by + archive modules unless special action is taken to load them into the backend + process executing the query. -- 2.37.1
postgres_fdw hint messages
The postgres_fdw tests contain this (as amended by patch 0001): ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw'); ERROR: invalid option "password" HINT: Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, sslcrldir, sslsni, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, truncatable, fetch_size, batch_size, async_capable, parallel_commit, keep_connections This annoys developers who are working on libpq connection options, because any option added, removed, or changed causes this test to need to be updated. It's also questionable how useful this hint is in its current form, considering how long it is and that the options are in an implementation-dependent order. Possible changes: - Hide the hint from this particular test (done in the attached patches). - Keep the hint, but maybe make it sorted? - Remove all the hints like this from foreign data commands. - Don't show the hint when there are more than N valid options. - Do some kind of "did you mean" like we have for column names. 
Thoughts?From 4adb3b6846e98a55ee19d8425dfb3d6d12145f16 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Thu, 25 Aug 2022 15:31:10 +0200 Subject: [PATCH 1/2] postgres_fdw: Remove useless DO block in test --- contrib/postgres_fdw/expected/postgres_fdw.out | 8 +--- contrib/postgres_fdw/sql/postgres_fdw.sql | 6 +- 2 files changed, 2 insertions(+), 12 deletions(-) diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 7bf35602b0..a42c9720c3 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -9621,15 +9621,9 @@ ERROR: password is required DETAIL: Non-superusers must provide a password in the user mapping. -- If we add a password to the connstr it'll fail, because we don't allow passwords -- in connstrs only in user mappings. -DO $d$ -BEGIN -EXECUTE $$ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw')$$; -END; -$d$; +ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw'); ERROR: invalid option "password" HINT: Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, sslcrldir, sslsni, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, truncatable, fetch_size, batch_size, async_capable, parallel_commit, keep_connections -CONTEXT: SQL statement "ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw')" -PL/pgSQL function inline_code_block line 3 at EXECUTE -- If we add a password for our user mapping instead, we should get a different -- error because the password wasn't actually *used* when we run with trust auth. 
-- diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql index 42735ae78a..63581a457d 100644 --- a/contrib/postgres_fdw/sql/postgres_fdw.sql +++ b/contrib/postgres_fdw/sql/postgres_fdw.sql @@ -2886,11 +2886,7 @@ CREATE FOREIGN TABLE pg_temp.ft1_nopw ( -- If we add a password to the connstr it'll fail, because we don't allow passwords -- in connstrs only in user mappings. -DO $d$ -BEGIN -EXECUTE $$ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw')$$; -END; -$d$; +ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw'); -- If we add a password for our user mapping instead, we should get a different -- error because the password wasn't actually *used* when we run with trust auth. -- 2.37.1 From 986ea253cd887b472e8040580f8c1a2642f3cada Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Thu, 25 Aug 2022 15:31:11 +0200 Subject: [PATCH 2/2] postgres_fdw: Avoid listing all libpq connection options in error hint --- contrib/postgres_fdw/expected/postgres_fdw.out | 7 ++- contrib/postgres_fdw/sql/postgres_fdw.sql | 6 ++ 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index a42c9720c3..3678428b95 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -9621,9 +9621,14 @@ ERROR: password is required DETAIL: Non-superusers must provide a password in the user mapp
Re: Strip -mmacosx-version-min options from plperl build
Peter Eisentraut writes: >>> In short: if the Cygwin Perl headers redefine PGDLLEXPORT (unlikely) >>> or somehow #define "__attribute__()" or "visibility()" into no-ops >>> (perhaps more likely) then we could explain this failure, and that >>> would also explain why it doesn't fail elsewhere. > This could be checked by running plperl.c through the preprocessor > (replace gcc -c plperl.c -o plperl.o by gcc -E plperl.c -o plperl.i) and > seeing what becomes of those symbols. Yeah, that was what I was going to suggest: grep the "-E" output for _PG_init and Pg_magic_func and confirm what their extern declarations look like. > If we want to get the buildfarm green again sooner, we could force a > --export-all-symbols directly. I'm not hugely upset as long as it's just the one machine failing. regards, tom lane
Re: pg_regress: lookup shellprog in $PATH
On Wed, Aug 24, 2022 at 10:31 PM Tom Lane wrote:
> git blame blames that whole mechanism on me: 60cfe25e68d. It looks
> like the reason I did it like that is that I was replacing use of
> system(3) with execl(), and system(3) is defined thus by POSIX:
>
> execl(<shell path>, "sh", "-c", command, (char *)0);
>
> where <shell path> is an unspecified pathname for the sh utility.
>
> Using SHELL for the "unspecified path" is already a bit of a leap
> of faith, since users are allowed to make that point at a non-Bourne
> shell. I don't see any strong reason to preserve that behavior.

It seems weird that you use any arbitrary shell to run 'sh', but I guess the point is that your shell command, whatever it is, is supposed to be a full pathname, and then it can do pathname resolution to figure out where your 'sh' executable is. So that makes me think that the problem Gurjeet is reporting is an issue with Nix rather than an issue with PostgreSQL. But what we've got is:

[rhaas pgsql]$ git grep execl\(
src/bin/pg_ctl/pg_ctl.c: (void) execl("/bin/sh", "/bin/sh", "-c", cmd, (char *) NULL);
src/test/regress/pg_regress.c: execl(shellprog, shellprog, "-c", cmdline2, (char *) NULL);

And surely that's stupid. The whole point here has to be that if you want to run something called 'sh' but don't know where it is, you need to execute a shell at a known pathname to figure it out for you. We could do as you propose and I don't think we would be worse off than we are today. But I'm confused why the correct formulation wouldn't be exactly what POSIX specifies, namely execl(shellprog, "sh", "-c", ...). That way, if somebody has a system where they do set $SHELL properly but do not have /bin/sh, things would still work.

-- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_regress: lookup shellprog in $PATH
Robert Haas writes: > But what we've got is: > [rhaas pgsql]$ git grep execl\( > src/bin/pg_ctl/pg_ctl.c: (void) execl("/bin/sh", "/bin/sh", "-c", cmd, > (char *) NULL); > src/test/regress/pg_regress.c: execl(shellprog, shellprog, "-c", > cmdline2, (char *) NULL); Right. I wouldn't really feel a need to change anything, except that we have this weird inconsistency between the way pg_ctl does it and the way pg_regress does it. I think we should settle on just one way. > We could do as you propose and I don't think we would be worse off > than we are today. But I'm confused why the correct formulation > wouldn't be exactly what POSIX specifies, namely execl(shellprog, > "sh", "-c", ...). That way, if somebody has a system where they do set > $SHELL properly but do not have /bin/sh, things would still work. My point is that that *isn't* what POSIX specifies. They say in so many words that the path actually used by system(3) is unspecified. They do NOT say that it's the value of $SHELL, and given that you're allowed to set $SHELL to a non-POSIX-compatible shell, using that is really wrong. We've gotten away with it so far because we resolve $SHELL at build time not run time, but it's still shaky. Interestingly, if you look at concrete man pages, you tend to find something else. Linux says The system() library function uses fork(2) to create a child process that executes the shell command specified in command using execl(3) as follows: execl("/bin/sh", "sh", "-c", command, (char *) 0); My BSD machines say "the command is handed to sh(1)", without committing to just how that's found ... but guess what, "which sh" finds /bin/sh. In any case, I can't find any system(3) that relies on $SHELL, so my translation wasn't correct according to either the letter of POSIX or common practice. It's supposed to be more or less a hard-wired path, they just don't want to commit to which path. 
Moreover, leaving aside the question of whether pg_regress' current behavior is actually bug-compatible with system(3), what is the argument that it needs to be? We have at this point sufficient experience with pg_ctl's use of /bin/sh to be pretty confident that that works everywhere. So let's standardize on the simpler way, not the more complex way. (It looks like pg_ctl has used /bin/sh since 6bcce25801c3f of Oct 2015.) regards, tom lane
Re: replacing role-level NOINHERIT with a grant-level option
On Wed, Aug 24, 2022 at 10:23 AM tushar wrote: > On 8/24/22 12:28 AM, Robert Haas wrote: > > This patch needed to be rebased pretty extensively after commit > > ce6b672e4455820a0348214be0da1a024c3f619f. Here is a new version. > Thanks, Robert, I have retested this patch with my previous scenarios > and things look good to me. I read through this again and found a comment that needed to be updated, so I did that, bumped catversion, and committed this. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_regress: lookup shellprog in $PATH
On Thu, Aug 25, 2022 at 10:13 AM Tom Lane wrote: > My point is that that *isn't* what POSIX specifies. They say in so > many words that the path actually used by system(3) is unspecified. > They do NOT say that it's the value of $SHELL, and given that you're > allowed to set $SHELL to a non-POSIX-compatible shell, using that > is really wrong. We've gotten away with it so far because we > resolve $SHELL at build time not run time, but it's still shaky. > > Interestingly, if you look at concrete man pages, you tend to find > something else. Linux says > >The system() library function uses fork(2) to create a child process >that executes the shell command specified in command using execl(3) as >follows: >execl("/bin/sh", "sh", "-c", command, (char *) 0); > > My BSD machines say "the command is handed to sh(1)", without committing > to just how that's found ... but guess what, "which sh" finds /bin/sh. > > In any case, I can't find any system(3) that relies on $SHELL, > so my translation wasn't correct according to either the letter > of POSIX or common practice. It's supposed to be more or less > a hard-wired path, they just don't want to commit to which path. > > Moreover, leaving aside the question of whether pg_regress' > current behavior is actually bug-compatible with system(3), > what is the argument that it needs to be? We have at this > point sufficient experience with pg_ctl's use of /bin/sh > to be pretty confident that that works everywhere. So let's > standardize on the simpler way, not the more complex way. I mean, I can see you're on the warpath here and I don't care enough to fight about it very much, but as a matter of theory, I believe that hard-coded pathnames suck. Giving the user a way to survive if /bin/sh doesn't exist on their system or isn't the path they want to use seems fundamentally sensible to me. 
Now if system() doesn't do that anyhow, well then there is no such mechanism in such cases, and so the benefit of providing one in the tiny number of other cases that we have may not be there. But if you're trying to convince me that hard-coded paths are as a theoretical matter brilliant, I'm not buying it. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Letter case of "admin option"
On Thu, Aug 25, 2022 at 4:58 AM Alvaro Herrera wrote: > I ran a quick poll in a Spanish community. Everyone who responded (not > many admittedly) agreed with this idea -- they find the message clearer > if the keyword is mentioned explicitly in the translation. Makes sense. I didn't really doubt that ADMIN should be capitalized, I just wasn't sure about OPTION. > > > In short, I'm wondering whether we should regard ADMIN as the name of > > > the option, but OPTION as part of the GRANT syntax, and hence > > > capitalize it "ADMIN option". However, if the non-English speakers on > > > this list have a strong preference for something else I'm certainly > > > not going to fight about it. > > > > "ADMIN option" which is translated into "ADMINオプション" is fine by > > me. I hope Álvaro thinks the same way. > > Hmm, but our docs say that the option is called ADMIN OPTION, don't > they? And I think the standard sees it the same way. You cannot invoke > it without the word OPTION. I understand the point of view, but I don't > think it is clearer done that way. It is different for example with > INHERIT; we could say "the INHERIT option" making the word "option" > translatable in that phrase. But then you don't have to add that word > in the command. It's going to be a little strange if we have ADMIN OPTION and INHERIT option, isn't it? But we can try it. One thing I have noticed, though, is that there are a lot of existing references to ADMIN OPTION in code comments. If we decide on anything else here we're going to have quite a few things to tidy up. Not that that's a big deal I guess, but it's something to think about. -- Robert Haas EDB: http://www.enterprisedb.com
libpq error message refactoring
libpq now contains a mix of error message strings that end with newlines and don't end with newlines, due to some newer code paths with new ways of passing errors around. This has now gotten me confused a few too many times both during development and translation. So I looked into whether we can unify this, similar to how we have done elsewhere (e.g., pg_upgrade). I came up with the attached patch. It's not complete, but it shows the idea and it looks like a nice simplification to me. Thoughts on this approach?From e547d9993232e20c9e696bd7facbcb3de4afdd6c Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Thu, 4 Aug 2022 23:31:58 +0300 Subject: [PATCH v1] WIP: libpq_append_error --- src/interfaces/libpq/fe-connect.c | 271 +++-- src/interfaces/libpq/fe-exec.c | 109 src/interfaces/libpq/fe-misc.c | 15 ++ src/interfaces/libpq/libpq-int.h | 2 + src/interfaces/libpq/nls.mk| 4 +- src/interfaces/libpq/pqexpbuffer.c | 27 ++- src/interfaces/libpq/pqexpbuffer.h | 2 + 7 files changed, 186 insertions(+), 244 deletions(-) diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index 8cefef20d1..79b9392858 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -896,8 +896,7 @@ fillPGconn(PGconn *conn, PQconninfoOption *connOptions) *connmember = strdup(tmp); if (*connmember == NULL) { - appendPQExpBufferStr(&conn->errorMessage, - libpq_gettext("out of memory\n")); + libpq_append_error(conn, "out of memory"); return false; } } @@ -1079,9 +1078,8 @@ connectOptions2(PGconn *conn) if (more || i != conn->nconnhost) { conn->status = CONNECTION_BAD; - appendPQExpBuffer(&conn->errorMessage, - libpq_gettext("could not match %d host names to %d hostaddr values\n"), - count_comma_separated_elems(conn->pghost), conn->nconnhost); + libpq_append_error(conn, "could not match %d host names to %d hostaddr values", + count_comma_separated_elems(conn->pghost), conn->nconnhost); return false; } } @@ -1160,9 +1158,8 @@ 
connectOptions2(PGconn *conn) else if (more || i != conn->nconnhost) { conn->status = CONNECTION_BAD; - appendPQExpBuffer(&conn->errorMessage, - libpq_gettext("could not match %d port numbers to %d hosts\n"), - count_comma_separated_elems(conn->pgport), conn->nconnhost); + libpq_append_error(conn, "could not match %d port numbers to %d hosts", + count_comma_separated_elems(conn->pgport), conn->nconnhost); return false; } } @@ -1250,9 +1247,8 @@ connectOptions2(PGconn *conn) && strcmp(conn->channel_binding, "require") != 0) { conn->status = CONNECTION_BAD; - appendPQExpBuffer(&conn->errorMessage, - libpq_gettext("invalid %s value: \"%s\"\n"), - "channel_binding", conn->channel_binding); + libpq_append_error(conn, "invalid %s value: \"%s\"", + "channel_binding", conn->channel_binding); return false; } } @@ -1276,9 +1272,8 @@ connectOptions2(PGconn *conn) && strcmp(conn->sslmode, "verify-full") != 0) { conn->status = CONNECTION_BAD; - appendPQExpBuffer(&conn->errorMessage, - libpq_gettext("invalid %s value: \"%s\"\n"), - "sslmode", conn->sslmode); + libpq_append_error(conn, "invalid %s value: \"%s\"", + "sslmode", conn->sslmode); return false; } @@ -1297,9 +1292,8 @@ connectOptions2(PGconn *conn) case 'r': /* "require" */
Re: pg_basebackup: add test about zstd compress option
On Thu, Aug 25, 2022 at 3:52 AM Dong Wook Lee wrote: > Could you check if I missed anything? There is already a far better test for this in src/bin/pg_verifybackup/t/009_extract.pl -- Robert Haas EDB: http://www.enterprisedb.com
[PATCH] Fix alter subscription concurrency errors
Without this patch, concurrent ALTER/DROP SUBSCRIPTION statements for the same subscription could result in one of these statements returning the following error:

ERROR: XX000: tuple concurrently updated

This patch fixes that by re-fetching the tuple after acquiring the lock on the subscription. The included isolation test fails most of its permutations without this patch, with the error shown above. The loop to re-fetch the tuple is heavily based on the code from dbcommands.c.

0001-Fix-errors-when-concurrently-altering-subscriptions.patch Description: 0001-Fix-errors-when-concurrently-altering-subscriptions.patch
Re: pg_regress: lookup shellprog in $PATH
Robert Haas writes: > I mean, I can see you're on the warpath here and I don't care enough > to fight about it very much, but as a matter of theory, I believe that > hard-coded pathnames suck. Giving the user a way to survive if /bin/sh > doesn't exist on their system or isn't the path they want to use seems > fundamentally sensible to me. Now if system() doesn't do that anyhow, > well then there is no such mechanism in such cases, and so the benefit > of providing one in the tiny number of other cases that we have may > not be there. But if you're trying to convince me that hard-coded > paths are as a theoretical matter brilliant, I'm not buying it. If we were executing a program that the user needs to have some control over, sure, but what we have here is an implementation detail that I doubt anyone cares about. The fact that we're using a shell at all is only because nobody has cared to manually implement I/O redirection logic in these places; otherwise we'd be execl()'ing the server or psql directly. Maybe the best answer would be to do that, and get out of the business of knowing where the shell is? regards, tom lane
Re: shadow variables - pg15 edition
On Thu, 25 Aug 2022 at 14:08, Justin Pryzby wrote: > Here, I've included the rest of your list. OK, I've gone through v3-remove-var-declarations.txt, v4-reuse.txt v4-reuse-more.txt and committed most of what you had and removed a few that I thought should be renames instead. I also added some additional ones after reprocessing the RenameOrScope category from the spreadsheet. With some minor adjustments to a small number of your ones, I pushed what I came up with. David shadow_analysis.ods Description: application/vnd.oasis.opendocument.spreadsheet
Re: shadow variables - pg15 edition
On Thu, 25 Aug 2022 at 13:46, David Rowley wrote: > I've attached a patch which I think improves the code in > gistRelocateBuildBuffersOnSplit() so that there's no longer a shadowed > variable. I also benchmarked this method in a tight loop and can > measure no performance change from getting the loop index this way vs > the old way. I've now pushed this patch too. David
Re: pg_regress: lookup shellprog in $PATH
On Thu, Aug 25, 2022 at 10:48 AM Tom Lane wrote: > If we were executing a program that the user needs to have some control > over, sure, but what we have here is an implementation detail that I > doubt anyone cares about. The fact that we're using a shell at all is > only because nobody has cared to manually implement I/O redirection logic > in these places; otherwise we'd be execl()'ing the server or psql directly. > Maybe the best answer would be to do that, and get out of the business > of knowing where the shell is? Well that also would not be crazy. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_receivewal and SIGTERM
Re: Michael Paquier > FWIW, I've worked on an archiver integration a few years ago and got > annoyed that we use SIGINT while SIGTERM was the default (systemd was > not directly used there but the signal problem was the same, so we had > to go through some loops to make the stop signal configurable, like > systemd). SIGTERM is really the default for any init system or run-a-daemon system. Christoph
has_privs_of_role vs. is_member_of_role, redux
Hi,

We've had some previous discussions about when to use has_privs_of_role and when to use is_member_of_role, and has_privs_of_role has mostly won the fight. That means that, if role "robert" is set to NOINHERIT and you "GRANT stuff TO robert", for the most part "robert" will not actually be able to do things that "stuff" could do. Now, robert will be able to "SET ROLE stuff" and then do all and only those things that "stuff" can do, but he won't be able to do those things as "robert". For example:

rhaas=# set role robert;
SET
rhaas=> select * from stuff_table;
ERROR: permission denied for table stuff_table

So far, so good. But it's clearly not the case that "GRANT stuff TO robert" has conferred no privileges at all on robert. At the very least, it's enabled him to "SET ROLE stuff", but what else? I decided to go through the code and make a list of the things that robert can now do that he couldn't do before. Here it is:

1. robert can create new objects of various types owned by stuff:

rhaas=> create schema stuff_by_robert authorization stuff;
CREATE SCHEMA
rhaas=> create schema unrelated_by_robert authorization unrelated;
ERROR: must be member of role "unrelated"

2. robert can change the owner of objects he owns to instead be owned by stuff:

rhaas=> alter table robert_table owner to unrelated;
ERROR: must be member of role "unrelated"
rhaas=> alter table robert_table owner to stuff;
ALTER TABLE

3. robert can change the default privileges for stuff:

rhaas=> alter default privileges for role unrelated grant select on tables to public;
ERROR: must be member of role "unrelated"
rhaas=> alter default privileges for role stuff grant select on tables to public;
ALTER DEFAULT PRIVILEGES

4. robert can execute "SET ROLE stuff".

That's it. There are two other behaviors that change -- the return value of pg_has_role('robert', 'stuff', 'MEMBER') and pg_hba.conf matching to groups -- but those aren't things that robert gains the ability to do.
The above is an exhaustive list of the things robert gains the ability to do. I argue that #3 is a clear bug. robert can't select from stuff's tables or change privileges on stuff's objects, so why can he change stuff's default privileges? is_member_of_role() has a note that it is not to be used for privilege checking, and this seems like it's pretty clearly a privilege check. On the flip side, #4 is pretty clearly correct. Presumably, allowing that to happen was the whole point of executing "GRANT stuff TO robert" in the first place. The other two are less clear, in my opinion. We don't want users to end up owning objects that they didn't intend to own; in particular, if any user could make a security-definer function and then gift it to the superuser, it would be a disaster. So, arguably, the ability to make some other role the owner of an object represents a privilege that your role holds with respect to their role. Under that theory, the is_member_of_role() checks that are performed in cases #1 and #2 are privilege checks, and we ought to be using has_privs_of_role() instead, so that a non-inherited role grant doesn't confer those privileges. But I don't find this very clear cut, because except when the object you're gifting is a Trojan horse, giving stuff away helps the recipient, not the donor. Also, from a practical point of view, changing the owner of an object is different from other things that robert might want to do. If robert wants to create a table as user stuff or read some data from tables user stuff can access or change privileges on objects that role stuff owns, he can just execute "SET ROLE stuff" and then do any of that stuff. But he can't give away his own objects by assuming stuff's privileges. Either he can do it as himself, or he can't do it at all. It wouldn't be crazy IMHO to decide that a non-inherited grant isn't sufficient to donate objects to the granted role, and thus an inherited grant is required in such cases. 
However, the current system doesn't seem insane either, and in fact might be convenient in some situations. In short, my proposal is to change the ALTER DEFAULT PRIVILEGES code so that you have to have the privileges of the target role, not just membership in the target role, and leave everything else unchanged. Thoughts? -- Robert Haas EDB: http://www.enterprisedb.com
re: postgres_fdw hint messages
>The postgres_fdw tests contain this (as amended by patch 0001): >ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw'); >ERROR: invalid option "password" >HINT: Valid options in this context are: service, passfile, >channel_binding, connect_timeout, dbname, host, hostaddr, port, options, >application_name, keepalives, keepalives_idle, keepalives_interval, >keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, >sslkey, sslrootcert, sslcrl, sslcrldir, sslsni, requirepeer, >ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, >krbsrvname, gsslib, target_session_attrs, use_remote_estimate, >fdw_startup_cost, fdw_tuple_cost, extensions, updatable, truncatable, >fetch_size, batch_size, async_capable, parallel_commit, keep_connections >This annoys developers who are working on libpq connection options, >because any option added, removed, or changed causes this test to need >to be updated. >It's also questionable how useful this hint is in its current form, >considering how long it is and that the options are in an >implementation-dependent order. >Possible changes: >- Hide the hint from this particular test (done in the attached patches). +1 I vote for this option. Less work for future developer changes; I think it's worth the effort. Anyway, in alphabetical order, it's a lot easier for users to read. Patch attached. regards, Ranier Vilela 0003-reorder-alphabetical-libpq-options.patch Description: Binary data
Re: Schema variables - new implementation for Postgres 15
On Wed, 24 Aug 2022 at 10:04, Erik Rijkers wrote: > On 24-08-2022 at 08:37, Pavel Stehule wrote: >> > > > > I fixed these. > > > > > [v20220824-1-*.patch] > > Hi Pavel, > > I noticed just now that variable assignment (i.e., LET) unexpectedly > (for me anyway) cast the type of the input value. Surely that's wrong? > The documentation says clearly enough: > > 'The result must be of the same data type as the session variable.' > > > Example: > > create variable x integer; > let x=1.5; > select x, pg_typeof(x); > x | pg_typeof > ---+--- > 2 | integer > (1 row) > > > Is this correct? > > If such casts (there are several) are intended then the text of the > documentation should be changed. > yes - the behavior is designed like plpgsql assignment or SQL assignment (2022-08-25 19:35:35) postgres=# do $$ postgres$# declare i int; postgres$# begin postgres$# i := 1.5; postgres$# raise notice '%', i; postgres$# end; postgres$# $$; NOTICE: 2 DO (2022-08-25 19:38:10) postgres=# create table foo1(a int); CREATE TABLE (2022-08-25 19:38:13) postgres=# insert into foo1 values(1.5); INSERT 0 1 (2022-08-25 19:38:21) postgres=# select * from foo1; ┌───┐ │ a │ ╞═══╡ │ 2 │ └───┘ (1 row) There are the same rules as in SQL. This sentence is not good - the value should be castable to the target type. Regards Pavel > Thanks, > > Erik > >
V14 and later build the backend with -lpthread
I realized $SUBJECT while wondering why my new buildfarm animal chickadee (NetBSD on gaur's old hardware) fails the plpython tests on v13 and earlier. After a bit of investigation I realized it *should* be failing, because neither NetBSD nor Python have done anything about the problem documented in [1]. The reason it fails to fail in current branches is that we're now pulling -lpthread into the backend, which AFAICT is an unintentional side-effect of sloppy autoconfmanship in commits de91c3b97 / 44bf3d508. We wanted pthread_barrier_wait() for pgbench, not the backend, but as-committed we'll add -lpthread to LIBS if it provides pthread_barrier_wait. Now maybe someday we'll be brave enough to make the backend multithreaded, but today is not that day, and in the meantime this seems like a rather dangerous situation. There has certainly been exactly zero analysis of whether it's safe. ... On the third hand, poking at backends with ldd shows that at least on Linux, we've been linking the backend with -lpthread for quite some time, back to 9.4 or so. The new-in-v14 behavior is that it's getting in there on BSD-ish platforms as well. Should we try to pull that back out, or just cross our fingers and hope there's no real problem? regards, tom lane [1] https://www.postgresql.org/message-id/flat/25662.1560896200%40sss.pgh.pa.us
Re: postgres_fdw hint messages
Em qui., 25 de ago. de 2022 às 14:31, Ranier Vilela escreveu: > >The postgres_fdw tests contain this (as amended by patch 0001): > > >ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw'); > >ERROR: invalid option "password" > >HINT: Valid options in this context are: service, passfile, > >channel_binding, connect_timeout, dbname, host, hostaddr, port, options, > >application_name, keepalives, keepalives_idle, keepalives_interval, > >keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, > >sslkey, sslrootcert, sslcrl, sslcrldir, sslsni, requirepeer, > >ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, > >krbsrvname, gsslib, target_session_attrs, use_remote_estimate, > >fdw_startup_cost, fdw_tuple_cost, extensions, updatable, truncatable, > >fetch_size, batch_size, async_capable, parallel_commit, keep_connections > > >This annoys developers who are working on libpq connection options, > >because any option added, removed, or changed causes this test to need > >to be updated. > > >It's also questionable how useful this hint is in its current form, > >considering how long it is and that the options are in an > >implementation-dependent order. > > >Possible changes: > > >- Hide the hint from this particular test (done in the attached patches). > +1 > I vote for this option. > Less work for future developers changes, I think worth the effort. > > Anyway, in alphabetical order, it's a lot easier for users to read. > > Patch attached. > Little tweak in the comments. regards, Ranier Vilela > > v1-0003-reorder-alphabetical-libpq-options.patch Description: Binary data
Re: pg_receivewal and SIGTERM
On Thu, Aug 25, 2022 at 5:13 PM Christoph Berg wrote: > > Re: Michael Paquier > > FWIW, I've worked on an archiver integration a few years ago and got > > annoyed that we use SIGINT while SIGTERM was the default (systemd was > > not directly used there but the signal problem was the same, so we had > > to go through some loops to make the stop signal configurable, like > > systemd). > > SIGTERM is really the default for any init system or run-a-daemon system. It is, but there is also precedent for not using it for graceful shutdown. Apache, for example, will do what we do today on SIGTERM and you use SIGWINCH to make it shut down gracefully (which would be the equivalent of us flushing the compression buffers, I'd say). I'm not saying we shouldn't change -- I fully approve of making the change. But the world is full of fairly prominent examples of the other way as well. I'm leaning towards considering it a feature-change and thus not something to backpatch (I'd be OK sneaking it into 15 though, as that one is not released yet and it feels like a perfectly *safe* change). Not enough to insist on it, but it seems "slightly more correct". -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: has_privs_of_role vs. is_member_of_role, redux
On 8/25/22 12:12, Robert Haas wrote: So far, so good. But it's clearly not the case that "GRANT stuff TO robert" has conferred no privileges at all on robert. At the very least, it's enabled him to "SET ROLE stuff", but what else? I decided to go through the code and make a list of the things that robert can now do that he couldn't do before. Here it is: 1. robert can create new objects of various types owned by stuff: 2. robert can change the owner of objects he owns to instead be owned by stuff: 3. robert can change the default privileges for stuff: 4. robert can execute "SET ROLE stuff". Nice analysis, and surprising (to me) I argue that #3 is a clear bug. robert can't select from stuff's tables or change privileges on stuff's objects, so why can he change stuff's default privileges? is_member_of_role() has a note that it is not to be used for privilege checking, and this seems like it's pretty clearly a privilege check. +1 this feels very wrong to me On the flip side, #4 is pretty clearly correct. Presumably, allowing that to happen was the whole point of executing "GRANT stuff TO robert" in the first place. Exactly The other two are less clear, in my opinion. We don't want users to end up owning objects that they didn't intend to own; in particular, if any user could make a security-definer function and then gift it to the superuser, it would be a disaster. So, arguably, the ability to make some other role the owner of an object represents a privilege that your role holds with respect to their role. Under that theory, the is_member_of_role() checks that are performed in cases #1 and #2 are privilege checks, and we ought to be using has_privs_of_role() instead, so that a non-inherited role grant doesn't confer those privileges. But I don't find this very clear cut, because except when the object you're gifting is a Trojan horse, giving stuff away helps the recipient, not the donor.
Also, from a practical point of view, changing the owner of an object is different from other things that robert might want to do. If robert wants to create a table as user stuff or read some data from tables user stuff can access or change privileges on objects that role stuff owns, he can just execute "SET ROLE stuff" and then do any of that stuff. But he can't give away his own objects by assuming stuff's privileges. Either he can do it as himself, or he can't do it at all. It wouldn't be crazy IMHO to decide that a non-inherited grant isn't sufficient to donate objects to the granted role, and thus an inherited grant is required in such cases. However, the current system doesn't seem insane either, and in fact might be convenient in some situations. In short, my proposal is to change the ALTER DEFAULT PRIVILEGES code so that you have to have the privileges of the target role, not just membership in the target role, and leave everything else unchanged. Thoughts? I'm not sure about these last two. Does it matter that object creation is being logged, maybe for auditing purposes, under a different user than the owner of the object? -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Hi, On 2022-08-22 13:15:18 -0400, Melanie Plageman wrote: > v28 attached. > > I've added the new structs I added to typedefs.list. > > I've split the commit which adds all of the logic to track > IO operation statistics into two commits -- one which includes all of > the code to count IOOps for IOContexts locally in a backend and a second > which includes all of the code to accumulate and manage these with the > cumulative stats system. Thanks! > A few notes about the commit which adds local IO Operation stats: > > - There is a comment above pgstat_io_op_stats_collected() which mentions > the cumulative stats system even though this commit doesn't engage the > cumulative stats system. I wasn't sure if it was more or less > confusing to have two different versions of this comment. Not worth being worried about... > - should pgstat_count_io_op() take BackendType as a parameter instead of > using MyBackendType internally? I don't foresee a case where a different value would be passed in. > - pgstat_count_io_op() Assert()s that the passed-in IOOp and IOContext > are valid for this BackendType, but it doesn't check that all of the > pending stats which should be zero are zero. I thought this was okay > because if I did add that zero-check, it would be added to > pgstat_count_ioop() as well, and we already Assert() there that we can > count the op. Thus, it doesn't seem like checking that the stats are > zero would add any additional regression protection. It's probably ok. > - I've kept pgstat_io_context_desc() and pgstat_io_op_desc() in the > commit which adds those types (the local stats commit), however they > are not used in that commit. I wasn't sure if I should keep them in > that commit or move them to the first commit using them (the commit > adding the new view). > - I've left pgstat_fetch_backend_io_context_ops() in the shared stats > commit, however it is not used until the commit which adds the view in > pg_stat_get_io().
I wasn't sure which way seemed better. Think that's fine. > Notes on the commit which accumulates IO Operation stats in shared > memory: > > - I've extended the usage of the Assert()s that IO Operation stats that > should be zero are. Previously we only checked the stats validity when > querying the view. Now we check it when flushing pending stats and > when reading the stats file into shared memory. > Note that the three locations with these validity checks (when > flushing pending stats, when reading stats file into shared memory, > and when querying the view) have similar looking code to loop through > and validate the stats. However, the actual action they perform if the > stats are valid is different for each site (adding counters together, > doing a read, setting nulls in a tuple column to true). Also, some of > these instances have other code interspersed in the loops which would > require additional looping if separated from this logic. So it was > difficult to see a way of combining these into a single helper > function. All of them seem to repeat something like > + if (!pgstat_bktype_io_op_valid(bktype, io_op) || > + > !pgstat_io_context_io_op_valid(io_context, io_op)) perhaps those could be combined? Afaics nothing uses pgstat_bktype_io_op_valid separately. > Subject: [PATCH v28 3/5] Track IO operation statistics locally > > Introduce "IOOp", an IO operation done by a backend, and "IOContext", > the IO location source or target or IO type done by a backend. For > example, the checkpointer may write a shared buffer out. This would be > counted as an IOOp "write" on an IOContext IOCONTEXT_SHARED by > BackendType "checkpointer". > > Each IOOp (alloc, extend, fsync, read, write) is counted per IOContext > (local, shared, or strategy) through a call to pgstat_count_io_op(). > > The primary concern of these statistics is IO operations on data blocks > during the course of normal database operations. 
IO done by, for > example, the archiver or syslogger is not counted in these statistics. s/is/are/? > Stats on IOOps for all IOContexts for a backend are counted in a > backend's local memory. This commit does not expose any functions for > aggregating or viewing these stats. s/This commit does not/A subsequent commit will expose/... > @@ -823,6 +823,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, > ForkNumber forkNum, > BufferDesc *bufHdr; > Block bufBlock; > boolfound; > + IOContext io_context; > boolisExtend; > boolisLocalBuf = SmgrIsTemp(smgr); > > @@ -986,10 +987,25 @@ ReadBuffer_common(SMgrRelation smgr, char > relpersistence, ForkNumber forkNum, >*/ > Assert(!(pg_atomic_read_u32(&bufHdr->state) & BM_VALID)); /* > spinlock not needed */ > > - bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : > BufHdrGetBlock(bufHdr); > + if (isLocalBuf)
Re: postgres_fdw hint messages
On Thu, Aug 25, 2022 at 6:42 PM Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote: > The postgres_fdw tests contain this (as amended by patch 0001): > > ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw'); > ERROR: invalid option "password" > HINT: Valid options in this context are: service, passfile, > channel_binding, connect_timeout, dbname, host, hostaddr, port, options, > application_name, keepalives, keepalives_idle, keepalives_interval, > keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, > sslkey, sslrootcert, sslcrl, sslcrldir, sslsni, requirepeer, > ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, > krbsrvname, gsslib, target_session_attrs, use_remote_estimate, > fdw_startup_cost, fdw_tuple_cost, extensions, updatable, truncatable, > fetch_size, batch_size, async_capable, parallel_commit, keep_connections > > This annoys developers who are working on libpq connection options, > because any option added, removed, or changed causes this test to need > to be updated. > > It's also questionable how useful this hint is in its current form, > considering how long it is and that the options are in an > implementation-dependent order. > > Thanks Peter, for looking at that; this HINT message is growing over time. In my opinion, we should hide the complete message in case of an invalid option. But try to show dependent options; for example, if someone specifies "sslcrl" and that option requires some more options, then show a HINT for those options. Possible changes: > > - Hide the hint from this particular test (done in the attached patches). > > > - Keep the hint, but maybe make it sorted? > > - Remove all the hints like this from foreign data commands. > > - Don't show the hint when there are more than N valid options. > > - Do some kind of "did you mean" like we have for column names. > > Thoughts? -- Ibrar Ahmed
Re: Cleaning up historical portability baggage
Hi, On 2022-08-18 18:13:38 +1200, Thomas Munro wrote: > Here's a slightly better AF_INET6 one. I'm planning to push it > tomorrow if there are no objections. You didn't yet, I think. Any chance you could? The HAVE_IPV6 stuff is wrong/ugly in the meson build, right now, and I'd rather not spend time fixing it up ;) > It does something a little more aggressive than the preceding stuff, because > SUSv3 says that IPv6 is an "option". I don't see that as an issue: it also > says that various other ubiquitous stuff we're using is optional. Of > course, it would be absurd for a new socket implementation to appear today > that can't talk to a decent chunk of the internet, and all we require here > is the headers. That optionality was relevant for the transition period a > couple of decades ago. > From f162a15a6d723f8c94d9daa6236149e1f39b0d9a Mon Sep 17 00:00:00 2001 > From: Thomas Munro > Date: Thu, 18 Aug 2022 11:55:10 +1200 > Subject: [PATCH] Remove configure probe for sockaddr_in6 and require AF_INET6. > > SUSv3 defines struct sockaddr_in6, and all targeted Unix > systems have it. Windows has it in . Remove the configure > probe, the macro and a small amount of dead code. > > Also remove a mention of IPv6-less builds from the documentation, since > there aren't any. > > This is similar to commits f5580882 and 077bf2f2 for Unix sockets. > Even though AF_INET6 is an "optional" component of SUSv3, there are no > known modern operating systems without it, and it seems even less likely > to be omitted from future systems than AF_UNIX. > > Discussion: > https://postgr.es/m/ca+hukgkernfhmvb_h0upremp4lpzgn06yr2_0tyikjzb-2e...@mail.gmail.com Looks good to me. I'm idly wondering whether it's worth at some point to introduce a configure test of just compiling a file referencing all the headers and symbols we expect to be there... Greetings, Andres Freund
Re: pg_regress: lookup shellprog in $PATH
Robert Haas writes: > On Thu, Aug 25, 2022 at 10:48 AM Tom Lane wrote: >> If we were executing a program that the user needs to have some control >> over, sure, but what we have here is an implementation detail that I >> doubt anyone cares about. The fact that we're using a shell at all is >> only because nobody has cared to manually implement I/O redirection logic >> in these places; otherwise we'd be execl()'ing the server or psql directly. >> Maybe the best answer would be to do that, and get out of the business >> of knowing where the shell is? > Well that also would not be crazy. I experimented with this, and it seems like it might not be as awful as we've always assumed it would be. Attached is an incomplete POC that converts pg_regress proper to doing things this way. (isolationtester and pg_regress_ecpg are outright broken by this patch, because they rely on pg_regress' spawn_process and I didn't fix them yet. But you can run the core regression tests to see it works.) The Windows side of this is completely untested and may be broken; also, perhaps Windows has something more nearly equivalent to execvp() that we could use instead of reconstructing a command line? It's annoying that the patch removes all shell-quoting hazards on the Unix side but they are still there on the Windows side. regards, tom lane diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c index a803355f8e..e1e0d5f7a2 100644 --- a/src/test/regress/pg_regress.c +++ b/src/test/regress/pg_regress.c @@ -19,6 +19,7 @@ #include "postgres_fe.h" #include +#include #include #include #include @@ -51,10 +52,6 @@ typedef struct _resultmap */ char *host_platform = HOST_TUPLE; -#ifndef WIN32 /* not used in WIN32 case */ -static char *shellprog = SHELLPROG; -#endif - /* * On Windows we use -w in diff switches to avoid problems with inconsistent * newline representation. 
The actual result files will generally have @@ -73,7 +70,7 @@ _stringlist *dblist = NULL; bool debug = false; char *inputdir = "."; char *outputdir = "."; -char *expecteddir = "."; +char *expecteddir = "."; char *bindir = PGBINDIR; char *launcher = NULL; static _stringlist *loadextension = NULL; @@ -1052,12 +1049,71 @@ psql_end_command(StringInfo buf, const char *database) } while (0) /* - * Spawn a process to execute the given shell command; don't wait for it + * Redirect f to the file specified by fpath, opened with given flags and mode + * Does not return upon failure + */ +static void +redirect_to(FILE *f, const char *fpath, int flags, int mode) +{ + int fh; + + /* Let's just be sure the FILE is idle */ + fflush(f); + + fh = open(fpath, flags, mode); + if (fh < 0) + { + fprintf(stderr, "could not open file \"%s\": %m\n", fpath); + _exit(1); + } + if (dup2(fh, fileno(f)) < 0) + { + fprintf(stderr, "could not duplicate descriptor for file \"%s\": %m\n", +fpath); + _exit(1); + } + (void) close(fh); +} + +/* + * Redirect f to the same file used by fref + * Does not return upon failure + */ +static void +redirect_link(FILE *f, FILE *fref) +{ + /* Let's just be sure the FILE is idle */ + fflush(f); + + if (dup2(fileno(fref), fileno(f)) < 0) + { + fprintf(stderr, "could not duplicate file descriptor: %m\n"); + _exit(1); + } +} + +/* + * Spawn a process to execute a sub-command; don't wait for it * * Returns the process ID (or HANDLE) so we can wait for it later + * Does not return upon failure + * + * Arguments: + * file: program to be executed (may be a full path, or just program name) + * argv: NULL-terminated array of argument strings, as for execvp(2); + * argv[0] should normally be the same as file + * proc_stdin: filename to make child's stdin read from, or NULL + * for no redirection + * proc_stdout: filename to make child's stdout write to, or NULL + * for no redirection + * proc_stderr: filename to make child's stderr write to, or NULL + * for no redirection, 
or "&1" to link it to child's stdout */ PID_TYPE -spawn_process(const char *cmdline) +spawn_process(const char *file, char *argv[], + const char *proc_stdin, + const char *proc_stdout, + const char *proc_stderr) { #ifndef WIN32 pid_t pid; @@ -1085,35 +1141,65 @@ spawn_process(const char *cmdline) if (pid == 0) { /* - * In child - * - * Instead of using system(), exec the shell directly, and tell it to - * "exec" the command too. This saves two useless processes per - * parallel test case. + * In child: redirect stdio as requested, then execvp() */ - char *cmdline2; - - cmdline2 = psprintf("exec %s", cmdline); - execl(shellprog, shellprog, "-c", cmdline2, (char *) NULL); + if (proc_stdin) + redirect_to(stdin, proc_stdin, O_RDONLY, 0); + if (proc_stdout) + redirect_to(stdout, proc_stdout, O_WRONLY | O_CREAT | O_TRUNC, + S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH); + if (proc_stderr) + {
Re: archive modules
On Thu, Aug 25, 2022 at 03:29:41PM +0200, talk to ben wrote: > Here is a patch with the proposed wording. Here is the same patch with a couple more links. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 92a6d8669d9e5b527a7ac9af7eb359a86526775b Mon Sep 17 00:00:00 2001 From: benoit Date: Mon, 22 Aug 2022 12:00:46 +0200 Subject: [PATCH v4 1/1] basic_archive parameter visibility doc patch Module parameters are only visible from the pg_settings view once the module is loaded. Since an archive module is loaded by the archiver process, the parameters are never visible from the view. This patch adds a note about this in the pg_settings system view documentation. --- doc/src/sgml/system-views.sgml | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml index 44aa70a031..18ac4620f0 100644 --- a/doc/src/sgml/system-views.sgml +++ b/doc/src/sgml/system-views.sgml @@ -3275,7 +3275,15 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx This view does not display customized options - until the extension module that defines them has been loaded. + until the extension module that defines them has been loaded by the backend + process executing the query (e.g., via + , the + LOAD command, or a call + to a user-defined C function). For example, + since the is only loaded by the + archiver process, this view will not display any customized options defined + by archive modules unless special + action is taken to load them into the backend process executing the query. -- 2.25.1
Re: has_privs_of_role vs. is_member_of_role, redux
On Thu, Aug 25, 2022 at 3:03 PM Joe Conway wrote: > Nice analysis, and surprising (to me) Thanks. > > I argue that #3 is a clear bug. robert can't select from stuff's > > tables or change privileges on stuff's objects, so why can he change > > stuff's default privileges? is_member_of_role() has a note that it is > > not to be used for privilege checking, and this seems like it's pretty > > clearly a privilege check. > > +1 this feels very wrong to me Cool. I'll prepare a patch for that, unless someone else beats me to it. I really hate back-patching this kind of change but it's possible that it's the right thing to do. There's no real security exposure because the member could always SET ROLE and then do the exact same thing, so back-patching feels to me like it has a significantly higher chance of turning happy users into unhappy ones than the reverse. On the other hand, it's pretty hard to defend the current behavior once you stop to think about it, so perhaps it should be back-patched on those grounds. On the third hand, the fact that this has gone undiscovered for a decade makes you wonder whether we've really had clear enough ideas about this to justify calling it a bug rather than, say, an elevation of our thinking on this topic. > I'm not sure about these last two. Does it matter that object creation > is being logged, maybe for auditing purposes, under a different user > than the owner of the object? I'd be inclined to say that it doesn't matter, because the grant could have just as well been inheritable, or the action could have been performed by a superuser. Also, as a rule of thumb, I don't think we should choose to prohibit things on the grounds that some auditing regime might not be able to understand what happened. If that's an issue, we should address it by making the logging better, or including better logging hooks, or what have you. I think that the focus should be on the permissions model: what is the "right thing" from a security standpoint? 
-- Robert Haas EDB: http://www.enterprisedb.com
Re: V14 and later build the backend with -lpthread
On Thu, Aug 25, 2022 at 1:41 PM Tom Lane wrote: > I realized $SUBJECT while wondering why my new buildfarm animal chickadee > (NetBSD on gaur's old hardware) fails the plpython tests on v13 and > earlier. After a bit of investigation I realized it *should* be failing, > because neither NetBSD nor Python have done anything about the problem > documented in [1]. The reason it fails to fail in current branches is > that we're now pulling -lpthread into the backend, which AFAICT is an > unintentional side-effect of sloppy autoconfmanship in commits > de91c3b97 / 44bf3d508. We wanted pthread_barrier_wait() for pgbench, > not the backend, but as-committed we'll add -lpthread to LIBS if it > provides pthread_barrier_wait. > > Now maybe someday we'll be brave enough to make the backend multithreaded, > but today is not that day, and in the meantime this seems like a rather > dangerous situation. There has certainly been exactly zero analysis > of whether it's safe. > > ... On the third hand, poking at backends with ldd shows that at > least on Linux, we've been linking the backend with -lpthread for > quite some time, back to 9.4 or so. The new-in-v14 behavior is that > it's getting in there on BSD-ish platforms as well. > > Should we try to pull that back out, or just cross our fingers and > hope there's no real problem? Absent some evidence of a real problem, I vote for crossing our fingers. It would certainly be a very bad idea to start using pthreads willy-nilly in the back end, but the mere presence of the library doesn't seem like a particularly severe issue. I might feel differently if no such version had been released yet, but it's hard to feel like the sky is falling if it's been like this on Linux since 9.4. -- Robert Haas EDB: http://www.enterprisedb.com
Re: has_privs_of_role vs. is_member_of_role, redux
Robert Haas writes: > I really hate back-patching this kind of change but it's possible that > it's the right thing to do. There's no real security exposure because > the member could always SET ROLE and then do the exact same thing, so > back-patching feels to me like it has a significantly higher chance of > turning happy users into unhappy ones than the reverse. On the other > hand, it's pretty hard to defend the current behavior once you stop to > think about it, so perhaps it should be back-patched on those grounds. > On the third hand, the fact that this has gone undiscovered for a > decade makes you wonder whether we've really had clear enough ideas > about this to justify calling it a bug rather than, say, an elevation > of our thinking on this topic. Yeah, I'd lean against back-patching. This is the sort of behavioral change that users tend not to like finding in minor releases. regards, tom lane
Re: V14 and later build the backend with -lpthread
On Fri, Aug 26, 2022 at 8:40 AM Robert Haas wrote: > On Thu, Aug 25, 2022 at 1:41 PM Tom Lane wrote: > > I realized $SUBJECT while wondering why my new buildfarm animal chickadee > > (NetBSD on gaur's old hardware) fails the plpython tests on v13 and > > earlier. After a bit of investigation I realized it *should* be failing, > > because neither NetBSD nor Python have done anything about the problem > > documented in [1]. The reason it fails to fail in current branches is > > that we're now pulling -lpthread into the backend, which AFAICT is an > > unintentional side-effect of sloppy autoconfmanship in commits > > de91c3b97 / 44bf3d508. We wanted pthread_barrier_wait() for pgbench, > > not the backend, but as-committed we'll add -lpthread to LIBS if it > > provides pthread_barrier_wait. > > > > Now maybe someday we'll be brave enough to make the backend multithreaded, > > but today is not that day, and in the meantime this seems like a rather > > dangerous situation. There has certainly been exactly zero analysis > > of whether it's safe. > > > > ... On the third hand, poking at backends with ldd shows that at > > least on Linux, we've been linking the backend with -lpthread for > > quite some time, back to 9.4 or so. The new-in-v14 behavior is that > > it's getting in there on BSD-ish platforms as well. > > > > Should we try to pull that back out, or just cross our fingers and > > hope there's no real problem? > > Absent some evidence of a real problem, I vote for crossing our > fingers. It would certainly be a very bad idea to start using pthreads > willy-nilly in the back end, but the mere presence of the library > doesn't seem like a particularly severe issue. I might feel > differently if no such version had been released yet, but it's hard to > feel like the sky is falling if it's been like this on Linux since > 9.4. 
I suspect we will end up linked against the threading library anyway in real-world builds via --with-XXX (I see that --with-icu has that effect on my FreeBSD system, but I know that details about threading are quite different in NetBSD). I may lack imagination but I'm struggling to see how it could break anything. How should I have done that, by the way? Is the attached the right trick? From 761158cb38508dbb91ca7fff474b4f6e041583a7 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Fri, 26 Aug 2022 08:16:12 +1200 Subject: [PATCH] Avoid adding -pthread to backend unintentionally. --- configure| 35 +-- configure.ac | 7 +++ 2 files changed, 20 insertions(+), 22 deletions(-) diff --git a/configure b/configure index 814014a96b..fefe97e520 100755 --- a/configure +++ b/configure @@ -647,13 +647,13 @@ MSGFMT PG_CRC32C_OBJS CFLAGS_ARMV8_CRC32C CFLAGS_SSE42 -LIBOBJS ZSTD LZ4 UUID_LIBS LDAP_LIBS_BE LDAP_LIBS_FE with_ssl +LIBOBJS PTHREAD_CFLAGS PTHREAD_LIBS PTHREAD_CC @@ -12429,6 +12429,7 @@ fi if test "$enable_thread_safety" = yes; then + _LIBS="$LIBS" { $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing pthread_barrier_wait" >&5 $as_echo_n "checking for library containing pthread_barrier_wait... 
" >&6; } if ${ac_cv_search_pthread_barrier_wait+:} false; then : @@ -12485,6 +12486,21 @@ if test "$ac_res" != no; then : fi + ac_fn_c_check_func "$LINENO" "pthread_barrier_wait" "ac_cv_func_pthread_barrier_wait" +if test "x$ac_cv_func_pthread_barrier_wait" = xyes; then : + $as_echo "#define HAVE_PTHREAD_BARRIER_WAIT 1" >>confdefs.h + +else + case " $LIBOBJS " in + *" pthread_barrier_wait.$ac_objext "* ) ;; + *) LIBOBJS="$LIBOBJS pthread_barrier_wait.$ac_objext" + ;; +esac + +fi + + + LIBS="$_LIBS" fi if test "$with_readline" = yes; then @@ -16369,23 +16385,6 @@ fi -if test "$enable_thread_safety" = yes; then - ac_fn_c_check_func "$LINENO" "pthread_barrier_wait" "ac_cv_func_pthread_barrier_wait" -if test "x$ac_cv_func_pthread_barrier_wait" = xyes; then : - $as_echo "#define HAVE_PTHREAD_BARRIER_WAIT 1" >>confdefs.h - -else - case " $LIBOBJS " in - *" pthread_barrier_wait.$ac_objext "* ) ;; - *) LIBOBJS="$LIBOBJS pthread_barrier_wait.$ac_objext" - ;; -esac - -fi - - -fi - if test "$PORTNAME" = "win32" -o "$PORTNAME" = "cygwin"; then # Cygwin and (apparently, based on test results) Mingw both # have a broken strtof(), so substitute its implementation. diff --git a/configure.ac b/configure.ac index 6ff294d405..c51f174fde 100644 --- a/configure.ac +++ b/configure.ac @@ -1260,7 +1260,10 @@ AC_SEARCH_LIBS(shmget, cygipc) AC_SEARCH_LIBS(backtrace_symbols, execinfo) if test "$enable_thread_safety" = yes; then + _LIBS="$LIBS" AC_SEARCH_LIBS(pthread_barrier_wait, pthread) + AC_REPLACE_FUNCS([pthread_barrier_wait]) + LIBS="$_LIBS" fi if test "$with_readline" = yes; then @@ -1831,10 +1834,6 @@ AC_REPLACE_FUNCS(m4_normalize([ strnlen ])) -if test "$enable_thread_safety" = yes; then - AC_REPLACE_FUNCS(pthread_barrier_wait) -fi - if test "$PORTNAME" = "win32" -o "$PORTNAM
Re: V14 and later build the backend with -lpthread
Robert Haas writes: > On Thu, Aug 25, 2022 at 1:41 PM Tom Lane wrote: >> ... On the third hand, poking at backends with ldd shows that at >> least on Linux, we've been linking the backend with -lpthread for >> quite some time, back to 9.4 or so. The new-in-v14 behavior is that >> it's getting in there on BSD-ish platforms as well. [ further study shows that it's been pulled in on Linux to get sem_init() ] >> Should we try to pull that back out, or just cross our fingers and >> hope there's no real problem? > Absent some evidence of a real problem, I vote for crossing our > fingers. It would certainly be a very bad idea to start using pthreads > willy-nilly in the back end, but the mere presence of the library > doesn't seem like a particularly severe issue. I might feel > differently if no such version had been released yet, but it's hard to > feel like the sky is falling if it's been like this on Linux since > 9.4. Well, -lpthread on other platforms might have more or different side-effects than it does on Linux, so I'm not particularly comforted by that argument. I concede though that the lack of complaints about v14 is comforting. I'm prepared to do nothing for now; I just wanted to raise visibility of this point so that if we do come across any weird pre-vs-post-v14 issues, we think of this as a possible cause. regards, tom lane
Re: V14 and later build the backend with -lpthread
Thomas Munro writes: > I suspect we will end up linked against the threading library anyway > in real-world builds via --with-XXX (I see that --with-icu has that > effect on my FreeBSD system, but I know that details about threading > are quite different in NetBSD). I may lack imagination but I'm > struggling to see how it could break anything. Yeah, there are plenty of situations where you end up with thread support present somehow. So it may be a lost cause. I was mostly concerned about this because it seemed like an unintentional change. (I'm also still struggling to explain why mamba, with the *exact* same NetBSD code on a different hardware platform, isn't showing the same failures as chickadee. More news if I figure that out.) > How should I have done that, by the way? Is the attached the right trick? I think that'd do for preventing side-effects on LIBS, but I'm not sure if we'd have to back-fill something in pgbench's link options. Anyway, as I said to Robert, I'm content to watch and wait for now. regards, tom lane
PostgreSQL 15 Beta 4
Hi, We will be releasing a PostgreSQL 15 Beta 4 on September 8, 2022. Please have open items[1] completed and committed no later than September 5, 2022 0:00 AoE[2]. Thanks, Jonathan [1] https://wiki.postgresql.org/wiki/PostgreSQL_15_Open_Items [2] https://en.wikipedia.org/wiki/Anywhere_on_Earth OpenPGP_signature Description: OpenPGP digital signature
New strategies for freezing, advancing relfrozenxid early
Attached patch series is a completely overhauled version of earlier work on freezing. Related work from the Postgres 15 cycle became commits 0b018fab, f3c15cbe, and 44fa8488.

Recap
=====

The main high level goal of this work is to avoid painful, disruptive antiwraparound autovacuums (and other aggressive VACUUMs) that do way too much "catch up" freezing, all at once, causing significant disruption to production workloads. The patches teach VACUUM to care about how far behind it is on freezing for each table -- the number of unfrozen all-visible pages that have accumulated so far is directly and explicitly kept under control over time.

Unfrozen pages can be seen as debt. There isn't necessarily anything wrong with getting into debt (getting into debt to a small degree is all but inevitable), but debt can be dangerous when it isn't managed carefully. Accumulating large amounts of debt doesn't always end badly, but it does seem to reliably create the *risk* that things will end badly.

Right now, a standard append-only table could easily do *all* freezing in aggressive/antiwraparound VACUUM, without any earlier non-aggressive VACUUM operations triggered by autovacuum_vacuum_insert_threshold doing any freezing at all (unless the user goes out of their way to tune vacuum_freeze_min_age). There is currently no natural limit on the number of unfrozen all-visible pages that can accumulate -- unless you count age(relfrozenxid), the triggering condition for antiwraparound autovacuum. But relfrozenxid age predicts almost nothing about how much freezing is required (or will be required later on). The overall result is that it often takes far too long for freezing to finally happen, even when the table receives plenty of autovacuums (they all could freeze something, but in practice just don't freeze anything).
It's very hard to avoid that through tuning, because what we really care about is something pretty closely related to (if not exactly) the number of unfrozen heap pages in the system. XID age is fundamentally "the wrong unit" here -- the physical cost of freezing is the most important thing, by far.

In short, the goal of the patch series/project is to make autovacuum scheduling much more predictable over time. Especially with very large append-only tables. The patches improve the performance stability of VACUUM by managing costs holistically, over time. What happens in one single VACUUM operation is much less important than the behavior of successive VACUUM operations over time.

What's new: freezing/skipping strategies
========================================

This newly overhauled version introduces the concept of per-VACUUM-operation strategies, which we decide on once per VACUUM, at the very start. There are 2 choices to be made at this point (right after we acquire OldestXmin and similar cutoffs):

1) Do we scan all-visible pages, or do we skip instead? (Added by second patch, involves a trade-off between eagerness and laziness.)

2) How should we freeze -- eagerly or lazily? (Added by third patch)

The strategy-based approach can be thought of as something that blurs the distinction between aggressive and non-aggressive VACUUM, giving VACUUM more freedom to do either more or less work, based on known costs and benefits. This doesn't completely supersede aggressive/antiwraparound VACUUMs, but should make them much rarer with larger tables, where controlling freeze debt actually matters. There is a need to keep laziness and eagerness in balance here. We try to get the benefit of lazy behaviors/strategies, but will still course correct when it doesn't work out.

A new GUC/reloption called vacuum_freeze_strategy_threshold is added to control freezing strategy (also influences our choice of skipping strategy).
It defaults to 4GB, so tables smaller than that cutoff (which are usually the majority of all tables) will continue to freeze in much the same way as today by default. Our current lazy approach to freezing makes sense there, and should be preserved for its own sake.

Compatibility
=============

Structuring the new freezing behavior as an explicit user-configurable strategy is also useful as a bridge between the old and new freezing behaviors. It makes it fairly easy to get the old/current behavior where that's preferred -- which, I must admit, is something that wasn't well thought through last time around. The vacuum_freeze_strategy_threshold GUC is effectively (though not explicitly) a compatibility option. Users that want something close to the old/current behavior can use the GUC or reloption to more or less opt-out of the new freezing behavior, and can do so selectively. The GUC should be easy for users to understand, too -- it's just a table size cutoff.

Skipping pages using a snapshot of the visibility map
=====================================================

We now take a copy of the visibility map at the point that VACUUM begins, and work off of that when skipping, instead of working off of t
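[Editorial sketch.] The once-per-VACUUM freezing-strategy decision described above reduces to a simple size comparison against vacuum_freeze_strategy_threshold. The following is a minimal standalone sketch with hypothetical names, not the actual patch code:

```c
#include <stdint.h>

typedef enum FreezeStrategy
{
	FREEZE_LAZY,				/* today's behavior: freeze little, late */
	FREEZE_EAGER				/* keep "freeze debt" bounded over time */
} FreezeStrategy;

/*
 * Hypothetical sketch: decide the freezing strategy once, at the start
 * of a VACUUM.  Tables under vacuum_freeze_strategy_threshold (4GB by
 * default in the proposal) keep the current lazy behavior; larger
 * tables freeze eagerly so unfrozen all-visible pages cannot pile up.
 */
static FreezeStrategy
choose_freeze_strategy(uint64_t rel_size_bytes, uint64_t threshold_bytes)
{
	return (rel_size_bytes >= threshold_bytes) ? FREEZE_EAGER : FREEZE_LAZY;
}
```

The point of routing the decision through a single table-size cutoff is that it doubles as the compatibility knob: setting the threshold very high effectively opts a table out of the new behavior.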
Re: V14 and later build the backend with -lpthread
I wrote: > (I'm also still struggling to explain why mamba, with the *exact* > same NetBSD code on a different hardware platform, isn't showing > the same failures as chickadee. More news if I figure that out.) Hah: I left --with-libxml out of chickadee's configuration, because libxml2 seemed to have some problems on that platform, and that is what is pulling in libpthread on mamba: $ ldd /usr/pkg/lib/libxml2.so /usr/pkg/lib/libxml2.so: -lz.1 => /usr/lib/libz.so.1 -lc.12 => /usr/lib/libc.so.12 -llzma.2 => /usr/lib/liblzma.so.2 -lpthread.1 => /lib/libpthread.so.1 -lm.0 => /usr/lib/libm.so.0 -lgcc_s.1 => /lib/libgcc_s.so.1 Reinforces your point about real-world builds, I suppose. For the moment I'll just disable testing plpython pre-v14 on chickadee. regards, tom lane
Re: V14 and later build the backend with -lpthread
Hi, On 2022-08-25 17:04:37 -0400, Tom Lane wrote: > (I'm also still struggling to explain why mamba, with the *exact* > same NetBSD code on a different hardware platform, isn't showing > the same failures as chickadee. More news if I figure that out.) I'd guess it's because of the different dependencies that are enabled. On my netbsd VM libxml2 pulls in -lpthread, for example. We add xml2's dependencies to LIBS, so if that's enabled, we end up indirectly pulling in libxml2 in as well. > > How should I have done that, by the way? Is the attached the right trick? > > I think that'd do for preventing side-effects on LIBS, but I'm not > sure if we'd have to back-fill something in pgbench's link options. > Anyway, as I said to Robert, I'm content to watch and wait for now. Given that linking in pthreads support fixes things, that seems the right course... I wonder if we shouldn't even be more explicit about it and just add it - who knows what extension libraries pull in. It'd not be good if we end up with non-reentrant versions of functions just because initially the backend isn't threaded etc. Greetings, Andres Freund
Re: Strip -mmacosx-version-min options from plperl build
On 2022-08-25 Th 09:43, Tom Lane wrote: > Peter Eisentraut writes: In short: if the Cygwin Perl headers redefine PGDLLEXPORT (unlikely) or somehow #define "__attribute__()" or "visibility()" into no-ops (perhaps more likely) then we could explain this failure, and that would also explain why it doesn't fail elsewhere. >> This could be checked by running plperl.c through the preprocessor >> (replace gcc -c plperl.c -o plperl.o by gcc -E plperl.c -o plperl.i) and >> seeing what becomes of those symbols. > Yeah, that was what I was going to suggest: grep the "-E" output for > _PG_init and Pg_magic_func and confirm what their extern declarations > look like. $ egrep '_PG_init|Pg_magic_func' plperl.i extern __attribute__((visibility("default"))) void _PG_init(void); extern __attribute__((visibility("default"))) const Pg_magic_struct *Pg_magic_func(void); const Pg_magic_struct * Pg_magic_func(void) { static const Pg_magic_struct Pg_magic_data = { sizeof(Pg_magic_struct), 16 / 100, 100, 32, 64, _PG_init(void) cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Strip -mmacosx-version-min options from plperl build
Hi, On 2022-08-25 17:39:35 -0400, Andrew Dunstan wrote: > On 2022-08-25 Th 09:43, Tom Lane wrote: > > Peter Eisentraut writes: > In short: if the Cygwin Perl headers redefine PGDLLEXPORT (unlikely) > or somehow #define "__attribute__()" or "visibility()" into no-ops > (perhaps more likely) then we could explain this failure, and that > would also explain why it doesn't fail elsewhere. > >> This could be checked by running plperl.c through the preprocessor > >> (replace gcc -c plperl.c -o plperl.o by gcc -E plperl.c -o plperl.i) and > >> seeing what becomes of those symbols. > > Yeah, that was what I was going to suggest: grep the "-E" output for > > _PG_init and Pg_magic_func and confirm what their extern declarations > > look like. > > > $ egrep '_PG_init|Pg_magic_func' plperl.i > extern __attribute__((visibility("default"))) void _PG_init(void); > extern __attribute__((visibility("default"))) const Pg_magic_struct > *Pg_magic_func(void); const Pg_magic_struct * Pg_magic_func(void) { > static const Pg_magic_struct Pg_magic_data = { sizeof(Pg_magic_struct), > 16 / 100, 100, 32, 64, > _PG_init(void) Could you show objdump -t of the library? Perhaps once with the flags as now, and once relinking with the "old" flags that we're now omitting? Greetings, Andres Freund
Re: Strip -mmacosx-version-min options from plperl build
On 2022-08-25 Th 17:47, Andres Freund wrote: > Hi, > > On 2022-08-25 17:39:35 -0400, Andrew Dunstan wrote: >> On 2022-08-25 Th 09:43, Tom Lane wrote: >>> Peter Eisentraut writes: >> In short: if the Cygwin Perl headers redefine PGDLLEXPORT (unlikely) >> or somehow #define "__attribute__()" or "visibility()" into no-ops >> (perhaps more likely) then we could explain this failure, and that >> would also explain why it doesn't fail elsewhere. This could be checked by running plperl.c through the preprocessor (replace gcc -c plperl.c -o plperl.o by gcc -E plperl.c -o plperl.i) and seeing what becomes of those symbols. >>> Yeah, that was what I was going to suggest: grep the "-E" output for >>> _PG_init and Pg_magic_func and confirm what their extern declarations >>> look like. >> >> $ egrep '_PG_init|Pg_magic_func' plperl.i >> extern __attribute__((visibility("default"))) void _PG_init(void); >> extern __attribute__((visibility("default"))) const Pg_magic_struct >> *Pg_magic_func(void); const Pg_magic_struct * Pg_magic_func(void) { >> static const Pg_magic_struct Pg_magic_data = { sizeof(Pg_magic_struct), >> 16 / 100, 100, 32, 64, >> _PG_init(void) > Could you show objdump -t of the library? Perhaps once with the flags as now, > and once relinking with the "old" flags that we're now omitting? current: $ objdump -t plperl.dll | egrep '_PG_init|Pg_magic_func' [103](sec 1)(fl 0x00)(ty 20)(scl 2) (nx 0) 0x40a0 Pg_magic_func [105](sec 1)(fl 0x00)(ty 20)(scl 2) (nx 0) 0x40b0 _PG_init from July 11th build: $ objdump -t plperl.dll | egrep '_PG_init|Pg_magic_func' [101](sec 1)(fl 0x00)(ty 20)(scl 2) (nx 0) 0x40d0 Pg_magic_func [103](sec 1)(fl 0x00)(ty 20)(scl 2) (nx 0) 0x40e0 _PG_init cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Strip -mmacosx-version-min options from plperl build
Hi, On 2022-08-25 18:04:34 -0400, Andrew Dunstan wrote: > On 2022-08-25 Th 17:47, Andres Freund wrote: > >> $ egrep '_PG_init|Pg_magic_func' plperl.i > >> extern __attribute__((visibility("default"))) void _PG_init(void); > >> extern __attribute__((visibility("default"))) const Pg_magic_struct > >> *Pg_magic_func(void); const Pg_magic_struct * Pg_magic_func(void) { > >> static const Pg_magic_struct Pg_magic_data = { sizeof(Pg_magic_struct), > >> 16 / 100, 100, 32, 64, > >> _PG_init(void) > > Could you show objdump -t of the library? Perhaps once with the flags as > > now, > > and once relinking with the "old" flags that we're now omitting? > > > current: > > > $ objdump -t plperl.dll | egrep '_PG_init|Pg_magic_func' > [103](sec 1)(fl 0x00)(ty 20)(scl 2) (nx 0) 0x40a0 > Pg_magic_func > [105](sec 1)(fl 0x00)(ty 20)(scl 2) (nx 0) 0x40b0 _PG_init > > > from July 11th build: > > > $ objdump -t plperl.dll | egrep '_PG_init|Pg_magic_func' > [101](sec 1)(fl 0x00)(ty 20)(scl 2) (nx 0) 0x40d0 > Pg_magic_func > [103](sec 1)(fl 0x00)(ty 20)(scl 2) (nx 0) 0x40e0 _PG_init Thanks. So it looks like it's not the symbol not being exported. I wonder if the image base thing is somehow the problem? Sounds like it should just be an efficiency difference, by avoiding some relocations, not a functional difference. Can you try adding just that to the flags for building and whether that then allows a LOAD 'plperl' to succeed? Greetings, Andres Freund
Re: Cleaning up historical portability baggage
On Fri, Aug 26, 2022 at 7:47 AM Andres Freund wrote: > On 2022-08-18 18:13:38 +1200, Thomas Munro wrote: > > Here's a slightly better AF_INET6 one. I'm planning to push it > > tomorrow if there are no objections. > > You didn't yet, I think. Any chance you could? The HAVE_IPV6 stuff is > wrong/ugly in the meson build, right now, and I'd rather not spend time fixing > it up ;) Done, and thanks for looking. Remaining things from this thread: * removing --disable-thread-safety * removing those vestigial HAVE_XXX macros (one by one analysis and patches) * making Unix sockets secure for Windows in tests
Re: New strategies for freezing, advancing relfrozenxid early
On Thu, Aug 25, 2022 at 3:35 PM Jeremy Schneider wrote: > We should be careful here. IIUC, the current autovac behavior helps > bound the "spread" or range of active multixact IDs in the system, which > directly determines the number of distinct pages that contain those > multixacts. If the proposed change herein causes the spread/range of > MXIDs to significantly increase, then it will increase the number of > blocks and increase the probability of thrashing on the SLRUs for these > data structures. As a general rule VACUUM will tend to do more eager freezing with the patch set compared to HEAD, though it should never do less eager freezing. Not even in corner cases -- never. With the patch, VACUUM pretty much uses the most aggressive possible XID-wise/MXID-wise cutoffs in almost all cases (though only when we actually decide to freeze a page at all, which is now a separate question). The fourth patch in the patch series introduces a very limited exception, where we use the same cutoffs that we'll always use on HEAD (FreezeLimit + MultiXactCutoff) instead of the aggressive variants (OldestXmin and OldestMxact). This isn't just *any* xmax containing a MultiXact: it's a Multi that contains *some* XIDs that *need* to go away during the ongoing VACUUM, and others that *cannot* go away. Oh, and there usually has to be a need to keep two or more XIDs for this to happen -- if there is only one XID then we can usually swap xmax with that XID without any fuss. > PS. see also > https://www.postgresql.org/message-id/247e3ce4-ae81-d6ad-f54d-7d3e0409a...@ardentperf.com I think that the problem you describe here is very real, though I suspect that it needs to be addressed by making opportunistic cleanup of Multis happen more reliably. Running VACUUM more often just isn't practical once a table reaches a certain size. In general, any kind of processing that is time sensitive probably shouldn't be happening solely during VACUUM -- it's just too risky. 
VACUUM might take a relatively long time to get to the affected page. It might not even be that long in wall clock time or whatever -- just too long to reliably avoid the problem. -- Peter Geoghegan
Re: New strategies for freezing, advancing relfrozenxid early
On Thu, Aug 25, 2022 at 4:23 PM Peter Geoghegan wrote: > As a general rule VACUUM will tend to do more eager freezing with the > patch set compared to HEAD, though it should never do less eager > freezing. Not even in corner cases -- never.

Come to think of it, I don't think that that's quite true. Though the fourth patch isn't particularly related to the problem. It *is* true that VACUUM will do at least as much freezing of XID based tuple header fields as before. That just leaves MXIDs. It's even true that we will do just as much freezing as before if you go pure on MultiXact-age. But I'm the one that likes to point out that age is altogether the wrong approach for stuff like this -- so that won't cut it.

More concretely, I think that the patch series will fail to do certain inexpensive eager processing of tuple xmax, that will happen today, regardless of what the user has set vacuum_freeze_min_age or vacuum_multixact_freeze_min_age to. Although we currently only care about XID age when processing simple XIDs, we already sometimes make trade-offs similar to the trade-off I propose to make in the fourth patch for Multis. In other words, on HEAD, we promise to process any MXID >= MultiXactCutoff inside FreezeMultiXactId(). But we also manage to do "eager processing of xmax" when it's cheap and easy to do so, without caring about MultiXactCutoff at all -- this is likely the common case, even. This preexisting eager processing of Multis is likely important in many applications.

The problem that I think I've created is that page-level freezing as implemented in lazy_scan_prune by the third patch doesn't know that it might be a good idea to execute a subset of freeze plans, in order to remove a multi from a page right away. It mostly has the right idea by holding off on freezing until it looks like a good idea at the level of the whole page, but I think that this is a plausible exception.
Just because we're much more sensitive to leaving behind a Multi, and right now the only code path that can remove a Multi that isn't needed anymore is FreezeMultiXactId(). If xmax was an updater that aborted instead of a multi then we could rely on hint bits being set by pruning to avoid clog lookups. Technically nobody has violated their contract here, I think, but it still seems like it could easily be unacceptable.

I need to come up with my own microbenchmark suite for Multis -- that was on my TODO list already. I still believe that the fourth patch addresses Andres' complaint about allocating new Multis during VACUUM. But it seems like I need to think about the nuances of Multis some more. In particular, what the performance impact might be of making a decision on freezing at the page level, in light of the special performance considerations for Multis. Maybe it needs to be more granular than that, more often. Or maybe we can comprehensively solve the problem in some other way entirely. Maybe pruning should do this instead, in general. Something like that might put this right, and be independently useful.

-- Peter Geoghegan
Re: pg_receivewal and SIGTERM
On Thu, Aug 25, 2022 at 08:45:05PM +0200, Magnus Hagander wrote: > I'm leaning towards considering it a feature-change and thus not > something to backpatch (I'd be OK sneaking it into 15 though, as that > one is not released yet and it feels like a perfectly *safe* change). > Not enough to insist on it, but it seems "slightly more correct". Fine by me if both you and Daniel want to be more careful with this change. We could always argue about a backpatch later if there is more ask for it, as well. -- Michael signature.asc Description: PGP signature
Re: SYSTEM_USER reserved word implementation
On Thu, Aug 25, 2022 at 08:21:05PM +0200, Drouvot, Bertrand wrote: > system_user() now returns a text and I moved it to miscinit.c in the new > version attached (I think it makes more sense now). +/* kluge to avoid including libpq/libpq-be.h here */ +struct ClientConnectionInfo; +extern void InitializeSystemUser(struct ClientConnectionInfo conninfo); +extern const char* GetSystemUser(void); FWIW, I was also wondering about the need for all this initialization stanza and the extra SystemUser in TopMemoryContext. Now that we have MyClientConnectionInfo, I was thinking to just build the string in the SQL function as that's the only code path that needs to know about it. True that this approach saves some extra palloc() calls each time the function is called. > New version attached is also addressing Michael's remark regarding the peer > authentication TAP test. Thanks. I've wanted some basic tests for the peer authentication for some time now, independently on this thread, so it would make sense to split that into a first patch and stress the buildfarm to see what happens, then add these tests for SYSTEM_USER on top of the new test. -- Michael signature.asc Description: PGP signature
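[Editorial sketch.] Michael's suggestion above is to build the SYSTEM_USER string on demand inside the SQL function from MyClientConnectionInfo, rather than materializing it once in TopMemoryContext. A minimal standalone sketch of the string construction (hypothetical function name; the value follows the "auth_method:identity" shape discussed in the thread, and plain malloc stands in for palloc):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: build the SYSTEM_USER value lazily, from the
 * authentication method and the authenticated identity, in the form
 * "auth_method:identity".  In the real backend this would use palloc
 * and read the fields from MyClientConnectionInfo.
 */
static char *
build_system_user(const char *auth_method, const char *identity)
{
	size_t		len = strlen(auth_method) + 1 + strlen(identity) + 1;
	char	   *result = malloc(len);

	snprintf(result, len, "%s:%s", auth_method, identity);
	return result;
}
```

Building the string per call trades a few extra allocations for not having to manage an initialization step and a long-lived copy, which is the trade-off being debated above.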
Fix japanese translation of log messages
Hi hackers,

I've found typos in ja.po, and fixed them. The patch is attached.

--
Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Re: [PATCH] Add native windows on arm64 support
Hi, On 2022-05-10 10:01:55 +0900, Michael Paquier wrote: > On Mon, May 09, 2022 at 12:44:22PM +0100, Niyas Sait wrote: > > Microsoft updated documentation [1] and clarified that ASLR cannot be > > disabled for Arm64 targets. > > > > Because ASLR can't be disabled on ARM, ARM64, or ARM64EC architectures, > > > /DYNAMICBASE:NO isn't supported for these targets. > > Better than nothing, I guess, though this does not stand as a reason > explaining why this choice has been made. And it looks like we will > never know. We are going to need more advanced testing to check that > DYNAMICBASE is stable enough if we begin to enable it. Perhaps we > should really look at that for all the build targets we currently > support rather than just ARM, for the most modern Win32 environments > if we are going to cut support for most of the oldest releases.. I accidentally did a lot of testing with DYNAMICBASE - I accidentally mistranslated MSBuildProject.pm's false to adding /DYNAMICBASE in the meson port. Survived several hundred CI cycles and dozens of local runs without observing any problems related to that. Which doesn't surprise me, given the way we reserve space in the new process. Greetings, Andres Freund
Re: Column Filtering in Logical Replication
On Thu, Aug 25, 2022 at 7:38 PM vignesh C wrote: > ... > > PSA the v3* patch set. > > Thanks for the updated patch. > Few comments: > 1) We can shuffle the columns in publisher table and subscriber to > show that the order of the column does not matter > + > +Create a publication p1. A column list is defined for > +table t1 to reduce the number of columns that will be > +replicated. > + > +test_pub=# CREATE PUBLICATION p1 FOR TABLE t1 (id, a, b, c); > +CREATE PUBLICATION > +test_pub=# > + > OK. I made the following changes to the example. - now the subscriber table defines cols in a different order than that of the publisher table - now the publisher column list defines col names in a different order than that of the table - now the column list avoids using only adjacent column names > 2) We can try to keep the line content to less than 80 chars > + > +A column list is specified per table following the tablename, and > enclosed by > +parenthesis. See for details. > + > OK. Modified to use < 80 chars > 3) tablename should be table name like it is used in other places > + > +A column list is specified per table following the tablename, and > enclosed by > +parenthesis. See for details. > + > Sorry, I don't see this problem. AFAIK this same issue was already fixed in the v3* patches. Notice in the cited fragment that 'parenthesis' is misspelt but that was also fixed in v3. Maybe you are looking at an old patch file (??) > 4a) In the below, you could include mentioning "Only the column list > data of publication p1 are replicated." > + > + Insert some rows to table t1. > + > +test_pub=# INSERT INTO t1 VALUES(1, 'a-1', 'b-1', 'c-1', 'd-1', 'e-1'); > +INSERT 0 1 > OK. Modified to say this. > 4b) In the above, we could mention that the insert should be done on > the "publisher side" as the previous statements are executed on the > subscriber side. OK. Modified to say this. ~~~ Thanks for the feedback. PSA patch set v4* where all of the above comments are now addressed. 
-- Kind Regards, Peter Smith. Fujitsu Australia v4-0001-Column-List-replica-identity-rules.patch Description: Binary data v4-0002-Column-List-new-pgdocs-section.patch Description: Binary data
Re: [PATCH] Add native windows on arm64 support
On Thu, Aug 25, 2022 at 06:29:07PM -0700, Andres Freund wrote: > I accidentally did a lot of testing with DYNAMICBASE - I accidentally > mistranslated MSBuildProject.pm's false to adding > /DYNAMICBASE in the meson port. Survived several hundred CI cycles and dozens > of local runs without observing any problems related to that. > > Which doesn't surprise me, given the way we reserve space in the new process. Still, we had problems with that in the past and I don't recall huge changes in the way we allocate shmem on WIN32. Could there be some stuff in Windows itself that would explain more stability? I would not mind switching back to DYNAMICBASE on HEAD seeing your results. That would simplify this thread's patch a bit as well. -- Michael signature.asc Description: PGP signature
Re: [PATCH] Fix alter subscription concurrency errors
On Thu, Aug 25, 2022 at 8:17 PM Jelte Fennema wrote: > > Without this patch concurrent ALTER/DROP SUBSCRIPTION statements for > the same subscription could result in one of these statements returning the > following error: > > ERROR: XX000: tuple concurrently updated > > This patch fixes that by re-fetching the tuple after acquiring the lock on the > subscription. The included isolation test fails most of its permutations > without this patch, with the error shown above. > Won't the same thing happen for similar publication commands? Why is this unique to subscriptions and not other Alter/Drop commands? -- With Regards, Amit Kapila.
Re: pg_stat_wal: tracking the compression effect
At Thu, 25 Aug 2022 16:04:50 +0900, Ken Kato wrote in > Accumulating the values, which indicates how much space is saved by > each compression (size before compression - size after compression), > and keep track of how many times compression has happened. So that one > can know how much space is saved on average. Honestly, I don't think it's very useful. How about adding them to pg_waldump and pg_walinspect instead? # It further widens the output of pg_waldump, though.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: pg_stat_wal: tracking the compression effect
At Fri, 26 Aug 2022 11:55:27 +0900 (JST), Kyotaro Horiguchi wrote in > At Thu, 25 Aug 2022 16:04:50 +0900, Ken Kato wrote > in > > Accumulating the values, which indicates how much space is saved by > > each compression (size before compression - size after compression), > > and keep track of how many times compression has happened. So that one > > can know how much space is saved on average. > > Honestly, I don't think its useful much. > How about adding them to pg_waldump and pg_walinspect instead? > > # It further widens the output of pg_waldump, though.. Sorry, that was apparently too short. I know you already see that in per-record output of pg_waldump, but maybe we need the summary of saved bytes in "pg_waldump -b -z" output and the corresponding output of pg_walinspect. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
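[Editorial sketch.] Whichever surface the counters end up in (pg_stat_wal, pg_waldump -z, or pg_walinspect), the bookkeeping the original proposal describes is just two accumulators. A minimal sketch with hypothetical names:

```c
#include <stdint.h>

/*
 * Hypothetical sketch of the proposed counters: total bytes saved by
 * full-page-image compression, and how many compressions happened, so
 * the average saving per compression can be derived.
 */
typedef struct WalCompressionStats
{
	uint64_t	compression_saved;	/* sum of (size before - size after) */
	uint64_t	compression_times;	/* number of compressed images */
} WalCompressionStats;

static void
record_compression(WalCompressionStats *stats, uint32_t before, uint32_t after)
{
	/* only count compressions that actually shrank the image */
	if (after < before)
	{
		stats->compression_saved += before - after;
		stats->compression_times++;
	}
}

static double
average_saved(const WalCompressionStats *stats)
{
	return stats->compression_times ?
		(double) stats->compression_saved / stats->compression_times : 0.0;
}
```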
Re: [PATCH] Optimize json_lex_string by batching character copying
On Thu, Aug 25, 2022 at 01:35:45PM +0700, John Naylor wrote: > - For the following comment, pgindent will put spaced operands on a > separate line which is not great for readability. and our other > reference to the Stanford bithacks page keeps the in-page link, and I > see no reason to exclude it -- if it goes missing, the whole page will > still load. So I put back those two details. > > +* To find bytes <= c, we can use bitwise operations to find > bytes < c+1, > +* but it only works if c+1 <= 128 and if the highest bit in v > is not set. > +* Adapted from > +* https://graphics.stanford.edu/~seander/bithacks.html#HasLessInWord This was just unnecessary fiddling on my part, sorry about that. > +test_lfind8_internal(uint8 key) > +{ > + uint8 charbuf[LEN_WITH_TAIL(Vector8)]; > + const int len_no_tail = LEN_NO_TAIL(Vector8); > + const int len_with_tail = LEN_WITH_TAIL(Vector8); > + > + memset(charbuf, 0xFF, len_with_tail); > + /* search tail to test one-byte-at-a-time path */ > + charbuf[len_with_tail - 1] = key; > + if (key > 0x00 && pg_lfind8(key - 1, charbuf, len_with_tail)) > + elog(ERROR, "pg_lfind8() found nonexistent element <= '0x%x'", > key - 1); > + if (key < 0xFF && !pg_lfind8(key, charbuf, len_with_tail)) > + elog(ERROR, "pg_lfind8() did not find existing element <= > '0x%x'", key); > + if (key < 0xFE && pg_lfind8(key + 1, charbuf, len_with_tail)) > + elog(ERROR, "pg_lfind8() found nonexistent element <= '0x%x'", > key + 1); > + > + memset(charbuf, 0xFF, len_with_tail); > + /* search with vector operations */ > + charbuf[len_no_tail - 1] = key; > + if (key > 0x00 && pg_lfind8(key - 1, charbuf, len_no_tail)) > + elog(ERROR, "pg_lfind8() found nonexistent element <= '0x%x'", > key - 1); > + if (key < 0xFF && !pg_lfind8(key, charbuf, len_no_tail)) > + elog(ERROR, "pg_lfind8() did not find existing element <= > '0x%x'", key); > + if (key < 0xFE && pg_lfind8(key + 1, charbuf, len_no_tail)) > + elog(ERROR, "pg_lfind8() found nonexistent element <= '0x%x'", > 
key + 1); > +} nitpick: Shouldn't the elog() calls use "==" instead of "<=" for this one? Otherwise, 0001 looks good to me. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: [PoC] Reducing planning time when tables have many partitions
On Fri, 26 Aug 2022 at 12:40, Yuya Watari wrote: > This performance degradation is due to the heavy processing of the > get_ec***_indexes***() functions. These functions are the core part of > the optimization we are working on in this thread, but they are > relatively heavy when the number of partitions is small. > > I noticed that these functions were called repeatedly with the same > arguments. During planning Query B with one partition, the > get_ec_source_indexes_strict() function was called 2087 times with > exactly the same parameters. Such repeated calls occurred many times > in a single query. Instead of doing the caching like this, how about we code up some iterators that we can loop over to fetch the required EMs? I'll attempt to type out my thoughts here without actually trying to see if this works: typedef struct EquivalenceMemberIterator { EquivalenceClass *ec; Relids relids; Bitmapset *em_matches; int position; /* last found index of em_matches or -1 */ bool use_index; bool with_children; bool with_norel_members; } EquivalenceMemberIterator; We'd then have functions like: static void get_ecmember_indexes_iterator(EquivalenceMemberIterator *it, PlannerInfo *root, EquivalenceClass *ec, Relids relids, bool with_children, bool with_norel_members) { it->ec = ec; it->relids = relids; it->position = -1; it->use_index = (root->simple_rel_array_size > 32); /* or whatever threshold is best */ it->with_children = with_children; it->with_norel_members = with_norel_members; if (it->use_index) it->em_matches = get_ecmember_indexes(root, ec, relids, with_children, with_norel_members); else it->em_matches = NULL; } static EquivalenceMember * get_next_matching_member(PlannerInfo *root, EquivalenceMemberIterator *it) { if (it->use_index) { it->position = bms_next_member(it->em_matches, it->position); if (it->position >= 0) return list_nth(root->eq_members, it->position); return NULL; } else { int i = it->position; while ((i = bms_next_member(it->ec->ec_member_indexes, i)) >= 0) { /* filter out the EMs we don't want here; "break" when we find a match */ } it->position = i; if (i >= 0) return list_nth(root->eq_members, i); return NULL; } } Then the consuming code will do something like: EquivalenceMemberIterator iterator; get_ecmember_indexes_iterator(&iterator, root, ec, relids, true, false); while ((cur_em = get_next_matching_member(root, &iterator)) != NULL) { // do stuff } David
Re: use SSE2 for is_valid_ascii
On Thu, Aug 25, 2022 at 04:41:53PM +0700, John Naylor wrote: > v3 applies on top of the v9 json_lex_string patch in [1] and adds a > bit more to that, resulting in a simpler patch that is more amenable > to additional SIMD-capable platforms. LGTM -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: [PATCH] Add native windows on arm64 support
Hi, On 2022-08-26 11:09:41 +0900, Michael Paquier wrote: > On Thu, Aug 25, 2022 at 06:29:07PM -0700, Andres Freund wrote: > > I accidentally did a lot of testing with DYNAMICBASE - I accidentally > > mistranslated MSBuildProject.pm's false to adding > > /DYNAMICBASE in the meson port. Survived several hundred CI cycles and > > dozens > > of local runs without observing any problems related to that. > > > > Which doesn't surprise me, given the way we reserve space in the new > > process. > > Still, we had problems with that in the past and I don't recall huge > changes in the way we allocate shmem on WIN32. It's really not great that 7f3e17b4827 disabled randomization without even a comment as to why... There actually seems to have been a lot of work in that area. See 617dc6d299c / bcbf2346d69 as well as the retrying in 45e004fb4e39. Those weren't prevented by us disabling ASLR. Greetings, Andres Freund
Re: use ARM intrinsics in pg_lfind32() where available
On Thu, Aug 25, 2022 at 11:57 AM Nathan Bossart wrote: > > On Thu, Aug 25, 2022 at 10:38:34AM +0700, John Naylor wrote: > > On Thu, Aug 25, 2022 at 1:01 AM Nathan Bossart > > wrote: > >> On Wed, Aug 24, 2022 at 11:07:03AM +0700, John Naylor wrote: > >> > - Can a user on ARM64 ever get a runtime fault if the machine attempts > >> > to execute NEON instructions? > >> > >> IIUC yes, although I'm not sure how likely it is in practice. > > > > Given the quoted part above, it doesn't seem likely, but we should try > > to find out for sure, because a runtime fault is surely not acceptable > > even on a toy system. > > The ARM literature appears to indicate that Neon support is pretty standard > on aarch64, and AFAICT it's pretty common to just assume it's available. This doesn't exactly rise to the level of "find out for sure", so I went looking myself. This is the language I found [1]: "Both floating-point and NEON are required in all standard ARMv8 implementations. However, implementations targeting specialized markets may support the following combinations: No NEON or floating-point. Full floating-point and SIMD support with exception trapping. Full floating-point and SIMD support without exception trapping." Since we assume floating-point, I see no reason not to assume NEON, but a case could be made for documenting that we require NEON on aarch64, in addition to exception trapping (for CRC runtime check) and floating point on any Arm. Or even just say "standard". I don't believe anyone will want to run Postgres on specialized hardware lacking these features, so maybe it's a moot point. [1] https://developer.arm.com/documentation/den0024/a/AArch64-Floating-point-and-NEON -- John Naylor EDB: http://www.enterprisedb.com
Re: [PATCH] Optimize json_lex_string by batching character copying
On Fri, Aug 26, 2022 at 10:14 AM Nathan Bossart wrote: > > > +test_lfind8_internal(uint8 key) [...] > > + elog(ERROR, "pg_lfind8() found nonexistent element <= > > '0x%x'", key + 1); > > +} > > nitpick: Shouldn't the elog() calls use "==" instead of "<=" for this one? Good catch, will fix. -- John Naylor EDB: http://www.enterprisedb.com
Re: Perform streaming logical transactions by background workers and parallel apply
On Thu, Aug 11, 2022 at 12:13 PM Amit Kapila wrote: > > Since we will later consider applying non-streamed transactions in > > parallel, I > > think "apply streaming worker" might not be very suitable. I think > > PostgreSQL > > also has the worker "parallel worker", so for "apply parallel worker" and > > "apply background worker", I feel that "apply background worker" will make > > the > > relationship between workers more clear. ("[main] apply worker" and "apply > > background worker") > > > > But, on similar lines, we do have vacuumparallel.c for parallelizing > index vacuum. I agree with Kuroda-San on this point that the currently > proposed terminology doesn't sound to be very clear. The other options > that come to my mind are "apply streaming transaction worker", "apply > parallel worker" and file name could be applystreamworker.c, > applyparallel.c, applyparallelworker.c, etc. I see the point why you > are hesitant in calling it "apply parallel worker" but it is quite > possible that even for non-streamed xacts, we will share quite some > part of this code. I think "apply streaming transaction worker" is a good option w.r.t. what we are currently doing, but then in the future, if we want to apply normal transactions in parallel, we will have to change the name again. So I think "apply parallel worker" might look better, and the file name could be "applyparallelworker.c" or just "parallelworker.c". Although the "parallelworker.c" file name is a bit generic, we already have worker.c, so relative to that "parallelworker.c" should look just fine. At least that is what I think. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Handle infinite recursion in logical replication setup
On Mon, Aug 22, 2022 at 9:19 AM houzj.f...@fujitsu.com wrote: > > > Jonathan, Sawada-San, Hou-San, and others, what do you think is the best way > > to move forward here? > > I think it's fine to throw a WARNING in this case given that there is a > chance of inconsistent data. IMHO, since the user has specifically asked for origin=NONE but we do not have any way to detect the origin during the initial sync, this could be documented and we can also issue the WARNING, so that users notice that part and carefully set up the replication. OTOH, I do not think that giving an error is very inconvenient either: we are already providing a new option "origin=NONE", so along with that let's force the user to choose between copy_data=off or copy_data=force, and with that there is no scope for mistakes. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Small cleanups to tuplesort.c and a bonus small performance improvement
Hi, Since doing some work for PG15 for speeding up sorts, I've been a little irritated by the fact that dumptuples() calls WRITETUP (which now always calls writetuple()) and calls pfree() on the tuple, only for dumptuples() to do MemoryContextReset(state->base.tuplecontext) directly afterwards. These pfrees are just a waste of effort and we might as well leave it to the context reset to do the cleanup. (Probably especially so when using AllocSet for storing tuples.) There are only 2 calls to WRITETUP, and the other one is always called when state->slabAllocatorUsed is true. writetuple() checks for that before freeing the tuple, which is a bit of a wasted branch since it'll always prove to be false for the use case in mergeonerun(). (It's possible the compiler might inline that now anyway since the WRITETUP macro always calls writetuple() directly now.) I've attached 3 patches aimed at doing a small amount of cleanup work in tuplesort.c: 0001: Just fixes a broken-looking comment in writetuple(). 0002: Gets rid of the WRITETUP macro. It has not done anything useful since 097366c45. 0003: Changes writetuple() to tell it what it should do in regards to freeing and adjusting the memory accounting. Probably 0003 could be done differently. I'm certainly not set on the bool args. I understand that I'm never calling it with "freetup" == true. So other options include 1) rip out the pfree code and that parameter; or 2) just do the inlining manually at both call sites. I'll throw this in the September CF to see if anyone wants to look. There's probably lots more cleaning jobs that could be done in tuplesort.c. The performance improvement from 0003 is not that impressive, but it looks like it makes things very slightly faster, so probably worth it if the patch makes the code cleaner. See the attached gif and script for the benchmark I ran to test it.
I think the gains might go up slightly with [1] applied as that patch seems to do more to improve the speed of palloc() than it does to improve the speed of pfree(). David [1] https://commitfest.postgresql.org/39/3810/ From 3a0dc2893c757be7cbde5d1cd1f7801de073f943 Mon Sep 17 00:00:00 2001 From: David Rowley Date: Fri, 26 Aug 2022 14:42:23 +1200 Subject: [PATCH v1 3/3] Be smarter about freeing tuples during tuplesorts During dumptuples() the call to writetuple() would pfree any non-null tuple. This was quite wasteful as this happens just before we perform a reset of the context which stores all of those tuples. Here we adjust the writetuple signature so we can tell it exactly what we need done in regards to freeing the tuple and/or adjusting the memory accounting. In dumptuples() we must still account for the memory that's been released. Also, since writetuple() is only called twice, let's make it static inline so that the compiler inlines the call and removes the branching to check the bool arguments. 
--- src/backend/utils/sort/tuplesort.c | 44 -- 1 file changed, 29 insertions(+), 15 deletions(-) diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c index d1c2f5f58f..519c7424f7 100644 --- a/src/backend/utils/sort/tuplesort.c +++ b/src/backend/utils/sort/tuplesort.c @@ -452,8 +452,9 @@ struct Sharedsort static void tuplesort_begin_batch(Tuplesortstate *state); -static void writetuple(Tuplesortstate *state, LogicalTape *tape, - SortTuple *stup); +static inline void writetuple(Tuplesortstate *state, LogicalTape *tape, + SortTuple *stup, bool freetup, + bool adjust_accounting); static bool consider_abort_common(Tuplesortstate *state); static void inittapes(Tuplesortstate *state, bool mergeruns); static void inittapestate(Tuplesortstate *state, int maxTapes); @@ -1339,18 +1340,22 @@ tuplesort_puttuple_common(Tuplesortstate *state, SortTuple *tuple, bool useAbbre } /* - * Write 'stup' out onto 'tape' and, unless using the slab allocator, pfree - * stup's tuple and adjust the memory accounting accordingly. + * Write 'stup' out onto 'tape' and optionally free stup->tuple and optionally + * adjust the memory accounting. */ -static void -writetuple(Tuplesortstate *state, LogicalTape *tape, SortTuple *stup) +static inline void +writetuple(Tuplesortstate *state, LogicalTape *tape, SortTuple *stup, + bool freetup, bool adjust_accounting) { state->base.writetup(state, tape, stup); - if (!state->slabAllocatorUsed && stup->tuple) + if (stup->tuple != NULL) { - FREEMEM(state, GetMemoryChunkSpace(stup->tuple)); - pfree(stup->tuple); + if (adjust_accounting) + FREEMEM(state, GetMemoryChunkSpace(stup->tuple)); + + if (freetup) + pfree(stup->tuple); } } @@ -2251,6 +2256,8 @@ mergeonerun(Tuplesortstat
Re: use ARM intrinsics in pg_lfind32() where available
On Fri, Aug 26, 2022 at 10:45:10AM +0700, John Naylor wrote: > On Thu, Aug 25, 2022 at 11:57 AM Nathan Bossart > wrote: >> The ARM literature appears to indicate that Neon support is pretty standard >> on aarch64, and AFAICT it's pretty common to just assume it's available. > > This doesn't exactly rise to the level of "find out for sure", so I > went looking myself. This is the language I found [1]: > > "Both floating-point and NEON are required in all standard ARMv8 > implementations. However, implementations targeting specialized > markets may support the following combinations: > > No NEON or floating-point. > Full floating-point and SIMD support with exception trapping. > Full floating-point and SIMD support without exception trapping." Sorry, I should've linked to the documentation I found. I saw similar language in a couple of manuals, which is what led me to the conclusion that Neon support is relatively standard. > Since we assume floating-point, I see no reason not to assume NEON, > but a case could be made for documenting that we require NEON on > aarch64, in addition to exception trapping (for CRC runtime check) and > floating point on any Arm. Or even just say "standard". I don't > believe anyone will want to run Postgres on specialized hardware > lacking these features, so maybe it's a moot point. I'm okay with assuming Neon support for now. It's probably easier to add the __ARM_NEON check if/when someone complains than it is to justify removing it once it's there. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
wal_sync_method=fsync_writethrough
Hi, We allow $SUBJECT on Windows. I'm not sure exactly how we finished up with that, maybe a historical mistake, but I find it misleading today. Modern Windows flushes drive write caches for fsync (= _commit()) and fdatasync (= FLUSH_FLAGS_FILE_DATA_SYNC_ONLY). In fact it is possible to tell Windows to write out file data without flushing the drive cache (= FLUSH_FLAGS_NO_SYNC), but I don't believe anyone is interested in new weaker levels. Any reason not to just get rid of it? On macOS, our fsync and fdatasync levels *don't* flush drive caches, because those system calls don't on that OS, and they offer a weird special fcntl, so there we offer $SUBJECT for a good reason. Now that macOS 10.2 systems are thoroughly extinct, I think we might as well drop the configure probe, though, while we're doing a lot of that sort of thing. The documentation also says a couple of things that aren't quite correct about wal_sync_method. (I would also like to revise other nearby outdated paragraphs about volatile write caches, sector sizes etc, but that'll take some more research.) From 42bd2d6c0ff4cd931ec3d4008438842bab5acd4b Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Fri, 26 Aug 2022 14:27:43 +1200 Subject: [PATCH 1/3] Remove wal_sync_method=fsync_writethrough on Windows. The fsync and fdatasync levels already flush drive write caches on Windows, so it only confuses matters to have an apparently higher level that isn't actually higher. We don't do that on any other OS.
--- doc/src/sgml/wal.sgml | 5 ++--- src/backend/storage/file/fd.c | 6 ++ src/bin/pg_test_fsync/pg_test_fsync.c | 4 +--- src/include/port/win32_port.h | 8 4 files changed, 5 insertions(+), 18 deletions(-) diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml index 01f7379ebb..90cbacfe41 100644 --- a/doc/src/sgml/wal.sgml +++ b/doc/src/sgml/wal.sgml @@ -108,9 +108,8 @@ open_datasync (the default), write caching can be disabled by unchecking My Computer\Open\disk drive\Properties\Hardware\Properties\Policies\Enable write caching on the disk. Alternatively, set wal_sync_method to -fdatasync (NTFS only), fsync or -fsync_writethrough, which prevent -write caching. +fdatasync (NTFS only) or fsync, +which prevent write caching. diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c index e3b19ca1ed..c27a6600fe 100644 --- a/src/backend/storage/file/fd.c +++ b/src/backend/storage/file/fd.c @@ -395,7 +395,7 @@ pg_fsync(int fd) #endif /* #if is to skip the sync_method test if there's no need for it */ -#if defined(HAVE_FSYNC_WRITETHROUGH) && !defined(FSYNC_WRITETHROUGH_IS_FSYNC) +#if defined(HAVE_FSYNC_WRITETHROUGH) if (sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH) return pg_fsync_writethrough(fd); else @@ -425,9 +425,7 @@ pg_fsync_writethrough(int fd) { if (enableFsync) { -#ifdef WIN32 - return _commit(fd); -#elif defined(F_FULLFSYNC) +#if defined(F_FULLFSYNC) return (fcntl(fd, F_FULLFSYNC, 0) == -1) ? -1 : 0; #else errno = ENOSYS; diff --git a/src/bin/pg_test_fsync/pg_test_fsync.c b/src/bin/pg_test_fsync/pg_test_fsync.c index 77f0db0376..e23e2601ad 100644 --- a/src/bin/pg_test_fsync/pg_test_fsync.c +++ b/src/bin/pg_test_fsync/pg_test_fsync.c @@ -605,9 +605,7 @@ signal_cleanup(int signum) static int pg_fsync_writethrough(int fd) { -#ifdef WIN32 - return _commit(fd); -#elif defined(F_FULLFSYNC) +#if defined(F_FULLFSYNC) return (fcntl(fd, F_FULLFSYNC, 0) == -1) ? 
-1 : 0; #else errno = ENOSYS; diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h index 707f8760ca..b6769a1693 100644 --- a/src/include/port/win32_port.h +++ b/src/include/port/win32_port.h @@ -75,14 +75,6 @@ /* Windows doesn't have fsync() as such, use _commit() */ #define fsync(fd) _commit(fd) -/* - * For historical reasons, we allow setting wal_sync_method to - * fsync_writethrough on Windows, even though it's really identical to fsync - * (both code paths wind up at _commit()). - */ -#define HAVE_FSYNC_WRITETHROUGH -#define FSYNC_WRITETHROUGH_IS_FSYNC - #define USES_WINSOCK /* -- 2.35.1 From 6e59fb92aafa4803ceed411e1610a4f458ed092b Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Fri, 26 Aug 2022 15:00:26 +1200 Subject: [PATCH 2/3] Remove configure probe for F_FULLFSYNC. This was done to avoid using it on very old macOS systems. --- configure | 14 -- configure.ac | 3 --- src/include/pg_config.h.in | 4 src/include/port/darwin.h | 3 --- src/tools/msvc/Solution.pm | 1 - 5 files changed, 25 deletions(-) diff --git a/configure b/configure index a268780c5d..6975c3f0ca 100755 --- a/configure +++ b/configure @@ -16204,20 +16204,6 @@ esac fi -# This is probably only present on macOS, but may as well check always -ac_fn_c_check_decl "$LINENO" "F_FULLFSYNC" "ac_cv_have_decl_F_FULLFSYNC" "#include -" -if test "x$
Re: Fix japanese translation of log messages
At Fri, 26 Aug 2022 10:23:01 +0900, Shinya Kato wrote in > I've found typos in ja.po, and fixed them. > The patch is attached. (This is not for -hackers but I'm fine with it being posted here;p) Thanks for the report! Pushed to branches 10 through 15 of the translation repository with some minor changes. They will be reflected in the code tree some time later. msgid "More details may be available in the server log." -msgstr "詳細な情報がはサーバログにあるかもしれません。" +msgstr "詳細な情報はサーバログにあるかもしれません。" I prefer "詳細な情報が" over "詳細な情報は" here. (The existence of the details is unknown here.) msgid "cannot drop active portal \"%s\"" -msgstr "アクテイブなポータル\"%s\"を削除できません" +msgstr "アクティブなポータル\"%s\"を削除できません" I changed it to "アクティブなポータル\"%s\"は削除できません". (It describes a state of affairs, rather than reporting the result of an action.) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: pg_stat_wal: tracking the compression effect
> On 25 Aug 2022, at 12:04, Ken Kato wrote: > > What do you think? I think users will need to choose between Lz4 and Zstd, so they need to know the tradeoff - compression ratio vs. CPU time spent per page (or any other segment). I know that Zstd is supposed to be kind of "better", but I doubt it has enough runway on one block to show off. If only we could persist the compression context across many pages... The compression ratio may differ across workloads, so a system view or something similar could be of use. Thanks! Best regards, Andrey Borodin.
Re: Reducing the chunk header sizes on all memory context types
On Tue, 23 Aug 2022 at 21:03, David Rowley wrote: > Finally, the v5 patch with the fixes mentioned above. The CFbot just alerted me that the cplusplus check was failing with the v5 patch, so here's v6. > I'm pretty keen to get this patch committed early next week. This is > quite core infrastructure, so it would be good if the patch gets a few > extra eyeballs on it before then. That'll probably be New Zealand Monday, unless someone objects before then. David From 41419082028216112c92f9ff5bdec21f952b7f37 Mon Sep 17 00:00:00 2001 From: David Rowley Date: Thu, 9 Jun 2022 20:09:22 +1200 Subject: [PATCH v6] Improve performance of and reduce overheads of memory management Whenever we palloc a chunk of memory, traditionally, we prefix the returned pointer with a pointer to the memory context to which the chunk belongs. This is required so that we're able to easily determine the owning context when performing operations such as pfree() and repalloc(). For the AllocSet context, prior to this commit we additionally prefixed the pointer to the owning context with the size of the chunk. This made the header 16 bytes in size. This 16-byte overhead was required for all AllocSet allocations regardless of the allocation size. For the generation context, the problem was worse; in addition to the pointer to the owning context and chunk size, we also stored a pointer to the owning block so that we could track the number of freed chunks on a block. The slab allocator had a 16-byte chunk header. The changes being made here reduce the chunk header size down to just 8 bytes for all 3 of our memory context types. For small to medium sized allocations, this significantly increases the number of chunks that we can fit on a given block which results in much more efficient use of memory.
Additionally, this commit completely changes the rule that pointers to palloc'd memory must be directly prefixed by a pointer to the owning memory context and instead, we now insist that they're directly prefixed by an 8-byte value where the least significant 3-bits are set to a value to indicate which type of memory context the pointer belongs to. Using those 3 bits as an index (known as MemoryContextMethodID) to a new array which stores the methods for each memory context type, we're now able to pass the pointer given to functions such as pfree() and repalloc() to the function specific to that context implementation to allow them to devise their own methods of finding the memory context which owns the given allocated chunk of memory. The reason we're able to reduce the chunk header down to just 8 bytes is because of the way we make use of the remaining 61 bits of the required 8-byte chunk header. Here we also implement a general-purpose MemoryChunk struct which makes use of those 61 remaining bits to allow the storage of a 30-bit value which the MemoryContext is free to use as it pleases, and also the number of bytes which must be subtracted from the chunk to get a reference to the block that the chunk is stored on (also 30 bits). The 1 additional remaining bit is to denote if the chunk is an "external" chunk or not. External here means that the chunk header does not store the 30-bit value or the block offset. The MemoryContext can use these external chunks at any time, but must use them if any of the two 30-bit fields are not large enough for the value(s) that need to be stored in them. When the chunk is marked as external, it is up to the MemoryContext to devise its own means to determine the block offset. Using 3-bits for the MemoryContextMethodID does mean we're limiting ourselves to only having a maximum of 8 different memory context types. 
We could reduce the bit space for the 30-bit value a little to make way for more than 3 bits, but it seems like it might be better to do that only if we ever need more than 8 context types. This would only be a problem if some future memory context type which does not use MemoryChunk really couldn't give up any of the 61 remaining bits in the chunk header. With this MemoryChunk, each of our 3 memory context types can quickly obtain a reference to the block any given chunk is located on. AllocSet is able to find the context to which the chunk is owned by first obtaining a reference to the block by subtracting the block offset as is stored in the 'hdrmask' field and then referencing the block's 'aset' field. The Generation context uses the same method, but GenerationBlock did not have a field pointing back to the owning context, so one is added by this commit. In aset.c and generation.c, all allocations larger than allocChunkLimit are stored on dedicated blocks. When there's just a single chunk on a block like this, it's easy to find the block from the chunk, we just subtract the size of the block header from the chunk pointer. The size of these chunks is also known as we store the endptr on the block, so we can just subtract the pointer to the allocated memory from that. Because we ca
Re: Extensible storage manager API - smgr hooks
> On 16 Jun 2022, at 13:41, Kirill Reshke wrote: > > Hello Yura and Anastasia. FWIW this technology is now a part of Greenplum [0]. We are building a GP extension that automatically offloads cold data to S3 - a very simplified version of Neon for analytical workloads. When a segment of a table is not used for a long period of time, the extension will sync its files with backup storage in the cloud. When the user touches the data, the extension's smgr will bring table segments back from the backup or the latest synced version. Our #1 goal is to provide a tool useful for the community. We can easily provide the same extension for Postgres if this technology (extensible smgr) is in core. Does such an extension seem useful for Postgres? Or does this data access pattern seem unusual for Postgres? By pattern I mean vast amounts of cold data only ever appended and never touched. Best regards, Andrey Borodin. [0] https://github.com/greenplum-db/gpdb/pull/13601
Re: Fix japanese translation of log messages
On 2022-08-26 10:23, Shinya Kato wrote: Hi hackers, I've found typos in ja.po, and fixed them. The patch is attached. Thanks for the patch! LGTM. I had found a similar typo before in ja.po, so I added that as well. @@ -12739,7 +12739,7 @@ msgstr "ロールオプション\"%s\"が認識できません" #: gram.y:1588 gram.y:1604 #, c-format msgid "CREATE SCHEMA IF NOT EXISTS cannot include schema elements" -msgstr "CREATE SCHEMA IF NOT EXISTSんはスキーマ要素を含めることはできません" +msgstr "CREATE SCHEMA IF NOT EXISTSはスキーマ要素を含めることはできません" How do you think? -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATIONdiff --git a/src/backend/po/ja.po b/src/backend/po/ja.po index 0925465d22..8f0b7adbea 100644 --- a/src/backend/po/ja.po +++ b/src/backend/po/ja.po @@ -1492,7 +1492,7 @@ msgstr "ãã©ã¬ã«ã¯ã¼ã«ã®åæåã«å¤±æãã¾ãã" #: access/transam/parallel.c:719 access/transam/parallel.c:838 #, c-format msgid "More details may be available in the server log." -msgstr "è©³ç´°ãªæ å ±ãã¯ãµã¼ããã°ã«ããããããã¾ããã" +msgstr "è©³ç´°ãªæ å ±ã¯ãµã¼ããã°ã«ããããããã¾ããã" #: access/transam/parallel.c:899 #, c-format @@ -1841,7 +1841,7 @@ msgid "" "Stop the postmaster and vacuum that database in single-user mode.\n" "You might also need to commit or roll back old prepared transactions, or drop stale replication slots." 
msgstr "" -"postmaster ã忢å¾ãã·ã³ã°ã«ã¦ã¼ã¶ã¢ã¼ãã§ãã¼ã¿ãã¼ã¹ãVACUUMãå®è¡ãã¦ãã ããã\n" +"postmaster ã忢å¾ãã·ã³ã°ã«ã¦ã¼ã¶ã¢ã¼ãã§ãã¼ã¿ãã¼ã¹ã«VACUUMãå®è¡ãã¦ãã ããã\n" "å¤ãæºåæ¸ã¿ãã©ã³ã¶ã¯ã·ã§ã³ã®ã³ãããã¾ãã¯ãã¼ã«ããã¯ããããã¯å¤ãã¬ããªã±ã¼ã·ã§ã³ã¹ãããã®åé¤ãå¿ è¦ããããã¾ããã" #: access/transam/varsup.c:136 @@ -12183,7 +12183,7 @@ msgstr "失æããè¡ã¯%sãå«ã¿ã¾ã" #: executor/execMain.c:1970 #, c-format msgid "null value in column \"%s\" of relation \"%s\" violates not-null constraint" -msgstr "ãªã¬ã¼ã·ã§ã³\"%2$s\"ã®å\"%1$s\"ã®NULLå¤ããéNULLå¶ç´ã«éåãã¦ãã¾ã" +msgstr "ãªã¬ã¼ã·ã§ã³\"%2$s\"ã®å\"%1$s\"ã®NULLå¤ãéNULLå¶ç´ã«éåãã¦ãã¾ã" #: executor/execMain.c:2021 #, c-format @@ -12739,7 +12739,7 @@ msgstr "ãã¼ã«ãªãã·ã§ã³\"%s\"ãèªèã§ãã¾ãã" #: gram.y:1588 gram.y:1604 #, c-format msgid "CREATE SCHEMA IF NOT EXISTS cannot include schema elements" -msgstr "CREATE SCHEMA IF NOT EXISTSãã¯ã¹ãã¼ãè¦ç´ ãå«ãããã¨ã¯ã§ãã¾ãã" +msgstr "CREATE SCHEMA IF NOT EXISTSã¯ã¹ãã¼ãè¦ç´ ãå«ãããã¨ã¯ã§ãã¾ãã" #: gram.y:1761 #, c-format @@ -17583,7 +17583,7 @@ msgstr "CREATE TABLE ã§ã¯æ¢åã®ã¤ã³ããã¯ã¹ã使ãã¾ãã" #: parser/parse_utilcmd.c:2261 #, c-format msgid "index \"%s\" is already associated with a constraint" -msgstr "ã¤ã³ããã¯ã¹\"%s\"ã¯ãã§ã«1ã¤ã®å¶ç´ã«å²ãå½ã¦ããããã¾ã" +msgstr "ã¤ã³ããã¯ã¹\"%s\"ã¯ãã§ã«1ã¤ã®å¶ç´ã«å²ãå½ã¦ããã¦ãã¾ã" #: parser/parse_utilcmd.c:2276 #, c-format @@ -25012,7 +25012,7 @@ msgstr "è¨å®å\"%s\"ãNULLã«ãããã¨ã¯ã§ãã¾ãã" #: utils/adt/tsvector_op.c:2665 #, c-format msgid "text search configuration name \"%s\" must be schema-qualified" -msgstr "ããã¹ãæ¤ç´¢è¨å®åç§°\"%s\"ã¯ã¹ãã¼ã修飾ããªãããããã¾ãã" +msgstr "ããã¹ãæ¤ç´¢è¨å®åç§°\"%s\"ã¯ã¹ãã¼ã修飾ããªããã°ãªãã¾ãã" #: utils/adt/tsvector_op.c:2690 #, c-format @@ -26503,7 +26503,7 @@ msgstr "åãåããã®ãã¼ã¹ããªã¼ããã°ã«è¨é²ãã¾ãã" #: utils/misc/guc.c:1498 msgid "Logs each query's rewritten parse tree." -msgstr "ãªã©ã¤ãå¾ã®åãåããã®ãã¼ã¹ããªã¼ããã°ãè¨é²ãã¾ãã" +msgstr "ãªã©ã¤ãå¾ã®åãåããã®ãã¼ã¹ããªã¼ããã°ã«è¨é²ãã¾ãã" #: utils/misc/guc.c:1507 msgid "Logs each query's execution plan." 
@@ -28443,7 +28443,7 @@ msgstr "åºå®ããããã¼ã¿ã«\"%s\"ã¯åé¤ã§ãã¾ãã" #: utils/mmgr/portalmem.c:488 #, c-format msgid "cannot drop active portal \"%s\"" -msgstr "ã¢ã¯ãã¤ããªãã¼ã¿ã«\"%s\"ãåé¤ã§ãã¾ãã" +msgstr "ã¢ã¯ãã£ããªãã¼ã¿ã«\"%s\"ãåé¤ã§ãã¾ãã" #: utils/mmgr/portalmem.c:739 #, c-format
Re: Perform streaming logical transactions by background workers and parallel apply
On Fri, Aug 26, 2022 at 9:30 AM Dilip Kumar wrote: > > On Thu, Aug 11, 2022 at 12:13 PM Amit Kapila wrote: > > > Since we will later consider applying non-streamed transactions in > > > parallel, I > > > think "apply streaming worker" might not be very suitable. I think > > > PostgreSQL > > > also has the worker "parallel worker", so for "apply parallel worker" and > > > "apply background worker", I feel that "apply background worker" will > > > make the > > > relationship between workers more clear. ("[main] apply worker" and "apply > > > background worker") > > > > > > > But, on similar lines, we do have vacuumparallel.c for parallelizing > > index vacuum. I agree with Kuroda-San on this point that the currently > > proposed terminology doesn't sound to be very clear. The other options > > that come to my mind are "apply streaming transaction worker", "apply > > parallel worker" and file name could be applystreamworker.c, > > applyparallel.c, applyparallelworker.c, etc. I see the point why you > > are hesitant in calling it "apply parallel worker" but it is quite > > possible that even for non-streamed xacts, we will share quite some > > part of this code. > > I think the "apply streaming transaction worker" is a good option > w.r.t. what we are currently doing but then in the future, if we want > to apply normal transactions in parallel then we will have to again > change the name. So I think "apply parallel worker" might look > better and the file name could be "applyparallelworker.c" or just > "parallelworker.c". Although "parallelworker.c" file name is a bit > generic but we already have worker.c so w.r.t that "parallelworker.c" > should just look fine. > Yeah, based on that theory we could go with parallelworker.c, but my vote among the above is for applyparallelworker.c, as that is clearer. I feel worker.c is already not a very good name for a file where we do the apply-related work, so it won't be advisable to go down that path further.
-- With Regards, Amit Kapila.
Re: use ARM intrinsics in pg_lfind32() where available
Here is a new patch set that applies on top of v9-0001 in the json_lex_string patch set [0] and v3 of the is_valid_ascii patch [1]. [0] https://postgr.es/m/CAFBsxsFV4v802idV0-Bo%3DV7wLMHRbOZ4er0hgposhyGCikmVGA%40mail.gmail.com [1] https://postgr.es/m/CAFBsxsFFAZ6acUfyUALiem4DpCW%3DApXbF02zrc0G0oT9CPof0Q%40mail.gmail.com -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 5f973a39d67a744d514ee80e05a1c7f40bc0ebc6 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Thu, 25 Aug 2022 22:18:30 -0700 Subject: [PATCH v3 1/2] abstract architecture-specific implementation details from pg_lfind32() --- src/include/port/pg_lfind.h | 55 ++ src/include/port/simd.h | 60 + 2 files changed, 90 insertions(+), 25 deletions(-) diff --git a/src/include/port/pg_lfind.h b/src/include/port/pg_lfind.h index a4e13dffec..7a851ea42c 100644 --- a/src/include/port/pg_lfind.h +++ b/src/include/port/pg_lfind.h @@ -91,16 +91,19 @@ pg_lfind32(uint32 key, uint32 *base, uint32 nelem) { uint32 i = 0; -#ifdef USE_SSE2 +#ifndef USE_NO_SIMD /* - * A 16-byte register only has four 4-byte lanes. For better - * instruction-level parallelism, each loop iteration operates on a block - * of four registers. Testing has showed this is ~40% faster than using a - * block of two registers. + * For better instruction-level parallelism, each loop iteration operates + * on a block of four registers. Testing for SSE2 has showed this is ~40% + * faster than using a block of two registers. 
*/ - const __m128i keys = _mm_set1_epi32(key); /* load 4 copies of key */ - uint32 iterations = nelem & ~0xF; /* round down to multiple of 16 */ + const Vector32 keys = vector32_broadcast(key); /* load copies of key */ + uint32 nelem_per_vector = sizeof(Vector32) / sizeof(uint32); + uint32 nelem_per_iteration = 4 * nelem_per_vector; + + /* round down to multiple of elements per iteration */ + uint32 tail_idx = nelem & ~(nelem_per_iteration - 1); #if defined(USE_ASSERT_CHECKING) bool assert_result = false; @@ -116,31 +119,33 @@ pg_lfind32(uint32 key, uint32 *base, uint32 nelem) } #endif - for (i = 0; i < iterations; i += 16) + for (i = 0; i < tail_idx; i += nelem_per_iteration) { - /* load the next block into 4 registers holding 4 values each */ - const __m128i vals1 = _mm_loadu_si128((__m128i *) & base[i]); - const __m128i vals2 = _mm_loadu_si128((__m128i *) & base[i + 4]); - const __m128i vals3 = _mm_loadu_si128((__m128i *) & base[i + 8]); - const __m128i vals4 = _mm_loadu_si128((__m128i *) & base[i + 12]); + Vector32 vals1, vals2, vals3, vals4, + result1, result2, result3, result4, + tmp1, tmp2, result; + + /* load the next block into 4 registers */ + vector32_load(&vals1, &base[i]); + vector32_load(&vals2, &base[i + nelem_per_vector]); + vector32_load(&vals3, &base[i + nelem_per_vector * 2]); + vector32_load(&vals4, &base[i + nelem_per_vector * 3]); /* compare each value to the key */ - const __m128i result1 = _mm_cmpeq_epi32(keys, vals1); - const __m128i result2 = _mm_cmpeq_epi32(keys, vals2); - const __m128i result3 = _mm_cmpeq_epi32(keys, vals3); - const __m128i result4 = _mm_cmpeq_epi32(keys, vals4); + result1 = vector32_eq(keys, vals1); + result2 = vector32_eq(keys, vals2); + result3 = vector32_eq(keys, vals3); + result4 = vector32_eq(keys, vals4); /* combine the results into a single variable */ - const __m128i tmp1 = _mm_or_si128(result1, result2); - const __m128i tmp2 = _mm_or_si128(result3, result4); - const __m128i result = _mm_or_si128(tmp1, tmp2); + 
tmp1 = vector32_or(result1, result2); + tmp2 = vector32_or(result3, result4); + result = vector32_or(tmp1, tmp2); /* see if there was a match */ - if (_mm_movemask_epi8(result) != 0) + if (vector32_any_lane_set(result)) { -#if defined(USE_ASSERT_CHECKING) Assert(assert_result == true); -#endif return true; } } @@ -151,14 +156,14 @@ pg_lfind32(uint32 key, uint32 *base, uint32 nelem) { if (key == base[i]) { -#if defined(USE_SSE2) && defined(USE_ASSERT_CHECKING) +#ifndef USE_NO_SIMD Assert(assert_result == true); #endif return true; } } -#if defined(USE_SSE2) && defined(USE_ASSERT_CHECKING) +#ifndef USE_NO_SIMD Assert(assert_result == false); #endif return false; diff --git a/src/include/port/simd.h b/src/include/port/simd.h index 8f85153110..bd4f1a3f39 100644 --- a/src/include/port/simd.h +++ b/src/include/port/simd.h @@ -32,6 +32,7 @@ #include #define USE_SSE2 typedef __m128i Vector8; +typedef __m128i Vector32; #else /* @@ -40,18 +41,24 @@ typedef __m128i Vector8; */ #define USE_NO_SIMD typedef uint64 Vector8; +typedef uint64 Vector32; #endif static inline void vector8_load(Vector8 *v, const uint8 *s); +static inline void vector32_load(Vector32 *v, const uint32 *s); static inline Vector8 vector8_broadcast(const uint8 c); +static inline Vector32 vector32_broad
HOT chain validation in verify_heapam()
Hi, On Mon, Apr 4, 2022 at 11:46 PM Andres Freund wrote: > > I think there's a few more things that'd be good to check. For example > amcheck > doesn't verify that HOT chains are reasonable, which can often be spotted > looking at an individual page. Which is a bit unfortunate, given how many > bugs > we had in that area. > > Stuff to check around that: > - target of redirect has HEAP_ONLY_TUPLE, HEAP_UPDATED set > - In a valid ctid chain within a page (i.e. xmax = xmin): > - tuples have HEAP_UPDATED set > - HEAP_ONLY_TUPLE / HEAP_HOT_UPDATED matches across chains elements (I changed the subject because the attached patch is related to HOT chain validation). Please find attached the patch implementing the above idea of HOT chain validation (within a page), plus a few more validations, as below:
* If the predecessor’s xmin is aborted or in progress, the current tuple’s xmin should be aborted or in progress respectively; also, both xmin values must be equal.
* If the predecessor’s xmin is not frozen, then the current tuple’s xmin shouldn’t be either.
* If the predecessor’s xmin is equal to the current tuple’s xmin, the current tuple’s cmin should be greater than the predecessor’s cmin.
* If the current tuple is not HOT, then its predecessor tuple must not be HEAP_HOT_UPDATED.
* If the current tuple is HOT, then its predecessor tuple must be HEAP_HOT_UPDATED, and vice versa.
* If xmax is 0, which means it is the last tuple in the chain, it must not be HEAP_HOT_UPDATED.
* If the current tuple is the last tuple in the HOT chain, then the next tuple should not be HOT.
I am looking into adding a TAP test for these changes and finding a way to corrupt a page in the TAP test. I will try to include these test cases in my upcoming version of the patch.
-- Regards, Himanshu Upadhyaya EnterpriseDB: http://www.enterprisedb.com From f3ae2f783a255109655cade770ebc74e01aef34c Mon Sep 17 00:00:00 2001 From: Himanshu Upadhyaya Date: Thu, 18 Aug 2022 13:20:51 +0530 Subject: [PATCH v1] HOT chain validation in verify_heapam. --- contrib/amcheck/verify_heapam.c | 144 1 file changed, 144 insertions(+) diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c index c875f3e5a2..ae2c100de2 100644 --- a/contrib/amcheck/verify_heapam.c +++ b/contrib/amcheck/verify_heapam.c @@ -399,6 +399,7 @@ verify_heapam(PG_FUNCTION_ARGS) for (ctx.blkno = first_block; ctx.blkno <= last_block; ctx.blkno++) { OffsetNumber maxoff; + OffsetNumber predecessor[MaxOffsetNumber] = {0}; CHECK_FOR_INTERRUPTS(); @@ -433,6 +434,9 @@ verify_heapam(PG_FUNCTION_ARGS) for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff; ctx.offnum = OffsetNumberNext(ctx.offnum)) { + OffsetNumber nextTupOffnum; + HeapTupleHeader htup; + ctx.itemid = PageGetItemId(ctx.page, ctx.offnum); /* Skip over unused/dead line pointers */ @@ -469,6 +473,13 @@ verify_heapam(PG_FUNCTION_ARGS) report_corruption(&ctx, psprintf("line pointer redirection to unused item at offset %u", (unsigned) rdoffnum)); + +htup = (HeapTupleHeader) PageGetItem(ctx.page, rditem); +if (!(HeapTupleHeaderIsHeapOnly(htup) && htup->t_infomask & HEAP_UPDATED)) + report_corruption(&ctx, + psprintf("Redirected Tuple at line pointer offset %u is not HEAP_ONLY_TUPLE or HEAP_UPDATED tuple", + (unsigned) rdoffnum)); + continue; } @@ -507,6 +518,139 @@ verify_heapam(PG_FUNCTION_ARGS) /* Ok, ready to check this next tuple */ check_tuple(&ctx); + /* + * Add line pointer offset to predecessor array. + * 1) If xmax is matching with xmin of next Tuple(reaching via its t_ctid). + * 2) If next tuple is in the same page. + * Raise corruption if: + * We have two tuples having same predecessor. + * + * We add offset to predecessor irrespective of transaction(t_xmin) status. 
We will + * do validation related to transaction status(and also all other validations) + * when we loop over predecessor array. + */ + nextTupOffnum = ItemPointerGetOffsetNumber(&(ctx.tuphdr)->t_ctid); + if (ItemPointerGetBlockNumber(&(ctx.tuphdr)->t_ctid) == ctx.blkno && +nextTupOffnum != ctx.offnum) + { +TransactionId currTupXmax; +ItemId lp; + +if (predecessor[nextTupOffnum] != 0) +{ + report_corruption(&ctx, +psprintf("Tuple at offset %u is reachable from two or more updated tuple", + (unsigned) nextTupOffnum)); + continue; +} +currTupXmax = HeapTupleHeaderGetUpdateXid(ctx.tuphdr); +lp = PageGetItemId(ctx.page, nextTupOffnum); + +htup = (HeapTupleHeader) PageGetItem(ctx.page, lp); + +if (TransactionIdIsValid(currTupXmax) && + TransactionIdEquals(HeapTupleHeaderGetXmin(htup), currTupXmax)) +{ + predecessor[nextTupOffnum] = ctx.offnum; +} +/* Non matching XMAX with XMIN is not a corruption*/ + } + + } + /* Now loop over offset and validate data i
Re: Fix japanese translation of log messages
At Fri, 26 Aug 2022 14:28:26 +0900, torikoshia wrote in > > #: gram.y:1588 gram.y:1604 > > #, c-format > > msgid "CREATE SCHEMA IF NOT EXISTS cannot include schema elements" > > -msgstr "CREATE SCHEMA IF NOT EXISTSんはスキーマ要素を含めることはでき > > -ません" > > +msgstr "CREATE SCHEMA IF NOT EXISTSはスキーマ要素を含めることはできま > > せん" > > How do you think? Darn... That kind of mistype is inevitable when working on nearly a thousand messages or more at once. Currently I'm working in an incremental fashion and only process at most 10 or so messages at a time, so that kind of silly mistake shouldn't happen. It's a mistake for "には". I'll include the fix in the next batch. The next release is 9/8, and I'm not sure about the cutoff for translation commits for that release, though. Anyway, thank you for reporting! regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Fix japanese translation of log messages
On 2022-08-26 15:20, Kyotaro Horiguchi wrote: At Fri, 26 Aug 2022 14:28:26 +0900, torikoshia wrote in > #: gram.y:1588 gram.y:1604 > #, c-format > msgid "CREATE SCHEMA IF NOT EXISTS cannot include schema elements" > -msgstr "CREATE SCHEMA IF NOT EXISTSんはスキーマ要素を含めることはでき > -ません" > +msgstr "CREATE SCHEMA IF NOT EXISTSはスキーマ要素を含めることはできま > せん" How do you think? "NHa" Darn... That kind of mistypes are inevitable when I worked on nearly or over a thousand of messages at once.. Currently I'm working in an incremental fashion and I only process at most up to 10 or so messages at a time thus that kind of silly mistake cannot happen.. No problem, rather thanks for working on this! It's a mistake of "には". Ah, I got it. I'll load it into the next ship. The next release is 9/8 and I'm not sure the limit of translation commits for the release, though.. Thanks a lot! -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION