Re: Allow single table VACUUM in transaction block
On Sun, 6 Nov 2022 at 18:50, Peter Geoghegan wrote:
>
> On Thu, Oct 27, 2022 at 2:31 AM Simon Riggs wrote:
> > Fix, so that this works without issue:
> >
> > BEGIN;
> >
> > VACUUM (ANALYZE) vactst;
> >
> > COMMIT;
> >
> > Allows both ANALYZE and vacuum of toast tables, but not VACUUM FULL.
> >
> > When in a xact block, we do not set PROC_IN_VACUUM,
> > nor update datfrozenxid.
>
> It doesn't seem like a good idea to add various new special cases to
> VACUUM just to make scripts like this work.

Usability is a major concern that doesn't get a high enough priority.

> I'm pretty sure that there
> are several deep, subtle reasons why VACUUM cannot be assumed safe to
> run in a user transaction.

I expected there were, so it's good to discuss them. Thanks for the input.

> If we absolutely have to do this, then the least worst approach might
> be to make VACUUM into a no-op rather than throwing an ERROR -- demote
> the ERROR into a WARNING. You could argue that we're just arbitrarily
> deciding to not do a VACUUM just to be able to avoid throwing an error
> if we do that. But isn't that already true with the patch that we
> have? Is it really a "true VACUUM" if the operation can never advance
> datfrozenxid? At least selectively demoting the ERROR to a WARNING is
> "transparent" about it.

I'll answer that part in my reply to Tom, since there are good ideas in both.

--
Simon Riggs
http://www.EnterpriseDB.com/
Re: Tracking last scan time
On Thu, Nov 03, 2022 at 04:44:16PM -0400, Dave Page wrote:
> Here's a patch to fix this issue. Many thanks to Peter Eisentraut who
> figured it out in a few minutes after I spent far too long looking down
> rabbit holes in entirely the wrong place.

FWIW, all the other areas of pgstatfuncs.c manipulate timestamptz fields
with a style like the attached.  That's a nit, still per the rule of
consistency with the surroundings..

Anyway, it seems to me that a regression test is in order, checking that
no scan is reported just after the relation creation; the same problem
shows up with last_idx_scan.
--
Michael

diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 96bffc0f2a..ae3365d917 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -56,12 +56,18 @@ Datum
 pg_stat_get_lastscan(PG_FUNCTION_ARGS)
 {
 	Oid			relid = PG_GETARG_OID(0);
+	TimestampTz result;
 	PgStat_StatTabEntry *tabentry;
 
 	if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL)
+		result = 0;
+	else
+		result = tabentry->lastscan;
+
+	if (result == 0)
 		PG_RETURN_NULL();
 	else
-		PG_RETURN_TIMESTAMPTZ(tabentry->lastscan);
+		PG_RETURN_TIMESTAMPTZ(result);
 }
 
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 257a6a9da9..1d84407a03 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -573,6 +573,12 @@ SELECT pg_stat_force_next_flush();
 (1 row)
 
+SELECT last_seq_scan, last_idx_scan FROM pg_stat_all_tables WHERE relid = 'test_last_scan'::regclass;
+ last_seq_scan | last_idx_scan
+---------------+---------------
+               |
+(1 row)
+
 COMMIT;
 SELECT pg_stat_reset_single_table_counters('test_last_scan'::regclass);
 pg_stat_reset_single_table_counters
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index f6270f7bad..b4d6753c71 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -303,6 +303,7 @@ BEGIN;
 CREATE TEMPORARY TABLE test_last_scan(idx_col int primary key, noidx_col int);
 INSERT INTO test_last_scan(idx_col, noidx_col) VALUES(1, 1);
 SELECT pg_stat_force_next_flush();
+SELECT last_seq_scan, last_idx_scan FROM pg_stat_all_tables WHERE relid = 'test_last_scan'::regclass;
 COMMIT;
 SELECT pg_stat_reset_single_table_counters('test_last_scan'::regclass);

signature.asc
Description: PGP signature
Re: Add semi-join pushdown to postgres_fdw
Ian Lawrence Barwick wrote 2022-11-04 02:21:
> This entry was marked as "Needs review" in the CommitFest app but cfbot
> reports the patch no longer applies.
> We've marked it as "Waiting on Author". As CommitFest 2022-11 is
> currently underway, this would be an excellent time to update the patch.
> Once you think the patchset is ready for review again, you (or any
> interested party) can move the patch entry forward by visiting
> https://commitfest.postgresql.org/40/3838/
> and changing the status to "Needs review".

Hi.

I've rebased the patch.
--
Best regards,
Alexander Pyhalov,
Postgres Professional

From 1ac2a9e3611f716da688c04a4ec36888f62078ce Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov
Date: Mon, 7 Nov 2022 10:23:32 +0300
Subject: [PATCH] postgres_fdw: add support for deparsing semi joins

We deparse semi-joins as EXISTS subqueries. So, deparsing a semi-join
leads to generating an addl_conds condition, which is then added to the
uppermost JOIN's WHERE clause.
---
 contrib/postgres_fdw/deparse.c                | 198 +---
 .../postgres_fdw/expected/postgres_fdw.out    | 297 --
 contrib/postgres_fdw/postgres_fdw.c           |  78 -
 contrib/postgres_fdw/postgres_fdw.h           |   3 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     | 119 ++-
 5 files changed, 613 insertions(+), 82 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 95247656504..45885442418 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -179,12 +179,13 @@ static void appendLimitClause(deparse_expr_cxt *context);
 static void appendConditions(List *exprs, deparse_expr_cxt *context);
 static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
 								  RelOptInfo *foreignrel, bool use_alias,
-								  Index ignore_rel, List **ignore_conds,
+								  Index ignore_rel, List **ignore_conds, StringInfo addl_conds,
 								  List **params_list);
+static void appendWhereClause(List *exprs, StringInfo addl_conds, deparse_expr_cxt *context);
 static void deparseFromExpr(List *quals, deparse_expr_cxt *context);
 static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root,
 							   RelOptInfo *foreignrel, bool make_subquery,
-							   Index ignore_rel, List **ignore_conds, List **params_list);
+							   Index ignore_rel, List **ignore_conds, StringInfo addl_conds, List **params_list);
 static void deparseAggref(Aggref *node, deparse_expr_cxt *context);
 static void appendGroupByClause(List *tlist, deparse_expr_cxt *context);
 static void appendOrderBySuffix(Oid sortop, Oid sortcoltype, bool nulls_first,
@@ -1370,23 +1371,20 @@ deparseFromExpr(List *quals, deparse_expr_cxt *context)
 {
 	StringInfo	buf = context->buf;
 	RelOptInfo *scanrel = context->scanrel;
+	StringInfoData addl_conds;
 
 	/* For upper relations, scanrel must be either a joinrel or a baserel */
 	Assert(!IS_UPPER_REL(context->foreignrel) ||
 		   IS_JOIN_REL(scanrel) || IS_SIMPLE_REL(scanrel));
 
+	initStringInfo(&addl_conds);
 	/* Construct FROM clause */
 	appendStringInfoString(buf, " FROM ");
 	deparseFromExprForRel(buf, context->root, scanrel,
 						  (bms_membership(scanrel->relids) == BMS_MULTIPLE),
-						  (Index) 0, NULL, context->params_list);
-
-	/* Construct WHERE clause */
-	if (quals != NIL)
-	{
-		appendStringInfoString(buf, " WHERE ");
-		appendConditions(quals, context);
-	}
+						  (Index) 0, NULL, &addl_conds, context->params_list);
+	appendWhereClause(quals, &addl_conds, context);
+	pfree(addl_conds.data);
 }
 
 /*
@@ -1598,6 +1596,33 @@ appendConditions(List *exprs, deparse_expr_cxt *context)
 	reset_transmission_modes(nestlevel);
 }
 
+/*
+ * Append WHERE clause, containing conditions
+ * from exprs and addl_conds, to context->buf.
+ */
+static void
+appendWhereClause(List *exprs, StringInfo addl_conds, deparse_expr_cxt *context)
+{
+	StringInfo	buf = context->buf;
+	bool		need_and = false;
+
+	if (exprs != NIL || addl_conds->len > 0)
+		appendStringInfoString(buf, " WHERE ");
+
+	if (exprs != NIL)
+	{
+		appendConditions(exprs, context);
+		need_and = true;
+	}
+
+	if (addl_conds->len > 0)
+	{
+		if (need_and)
+			appendStringInfoString(buf, " AND ");
+		appendStringInfo(buf, "(%s)", addl_conds->data);
+	}
+}
+
 /* Output join name for given join type */
 const char *
 get_jointype_name(JoinType jointype)
@@ -1616,6 +1641,9 @@ get_jointype_name(JoinType jointype)
 		case JOIN_FULL:
 			return "FULL";
 
+		case JOIN_SEMI:
+			return "SEMI";
+
 		default:
 			/* Shouldn't come here, but protect from buggy code. */
 			elog(ERROR, "unsupported join type %d", jointype);
@@ -1715,7 +1743,7 @@ deparseSubqueryTargetList(deparse_expr_cxt *context)
  */
 static void
 deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
-					  bool use_alias, Index ignore_rel, List **ignore_conds,
+					  bool use_alias, Index ignore_rel, List **ignore_conds, StringInfo addl_conds,
 					  List **params_list)
 {
 	PgFdwRela
Re: New docs chapter on Transaction Management and related changes
On Fri, Nov 4, 2022 at 04:17:28PM +0100, Laurenz Albe wrote:
> On Sat, 2022-10-15 at 21:08 -0400, Bruce Momjian wrote:
> > I therefore merged all three paragraphs into
> > one and tried to make the text saner; release_savepoint.sgml diff
> > attached, URL content updated.
>
> I wanted to have a look at this, but I am confused. The original patch
> was much bigger. Is this just an incremental patch? If yes, it would
> be nice to have a "grand total" patch, so that I can read it all
> in one go.

Yeah, I said:

	Yes, I didn't like global either --- I like your wording. I added
	your other changes too, with slight rewording. Merged patch to be
	posted in a later email.

but that was unclear. Let me post one now, and Simon posted one too.

--
  Bruce Momjian        https://momjian.us
  EDB                  https://enterprisedb.com

  Indecision is a decision. Inaction is an action. Mark Batterson
Re: [PATCH] Add native windows on arm64 support
On Sat, Nov 05, 2022 at 11:31:36AM -0700, Andres Freund wrote:
> On 2022-11-03 11:06:46 +, Niyas Sait wrote:
> Note that we're planning to remove the custom windows build scripts before the
> next release, relying on the meson build instead.

Youpi.

>>  #if defined(_WIN64)
>>  static __forceinline void
>>  spin_delay(void)
>>  {
>> +#ifdef _M_ARM64
>> +	/*
>> +	 * arm64 way of hinting processor for spin loops optimisations
>> +	 * ref: https://community.arm.com/support-forums/f/infrastructure-solutions-forum/48654/ssetoneon-faq
>> +	 */
>> +	__isb(_ARM64_BARRIER_SY);
>> +#else
>>  	_mm_pause();
>> +#endif
>>  }
>>  #else
>>  static __forceinline void
>
> I think we should just apply this, there seems very little risk of making
> anything worse, given the gating to _WIN64 && _M_ARM64.

Seems so.  Hmm, where does _ARM64_BARRIER_SY come from?  Perhaps it
would be better to have a comment referring to it from a different
place than the forums of arm, like some actual docs?

> The meson checking logic is used both for msvc and other compilers, so this
> will need to work with both.

The patch has been switched as waiting on author for now.
--
Michael
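[For readers outside the MSVC world, the idea in the hunk above can be sketched in a compiler-portable way. Only the _M_ARM64/__isb() branch corresponds to the proposed patch; the GCC/Clang branches and the toy driver below are assumptions added so the sketch compiles on non-Windows toolchains, not part of the submission.]

```c
/*
 * Hypothetical, compiler-portable sketch of a spin-loop hint.
 * Only the MSVC/_M_ARM64 branch mirrors the quoted patch.
 */
static inline void
spin_delay(void)
{
#if defined(_MSC_VER) && defined(_M_ARM64)
	/* MSVC on ARM64: ISB as the spin-loop hint, per the patch */
	__isb(_ARM64_BARRIER_SY);
#elif defined(__x86_64__) || defined(__i386__)
	/* GCC/Clang on x86: the PAUSE instruction */
	__asm__ __volatile__("pause");
#elif defined(__aarch64__)
	/* GCC/Clang on ARM64: ISB via inline assembly */
	__asm__ __volatile__("isb" ::: "memory");
#else
	/* Unknown target: no hint; still correct, just less power-friendly */
#endif
}

/* Toy driver: spin until *flag is set or max_iters is reached. */
static int
spin_until(volatile const int *flag, int max_iters)
{
	int			iters = 0;

	while (!*flag && iters < max_iters)
	{
		spin_delay();
		iters++;
	}
	return iters;
}
```

The hint never changes the loop's result; it only tells the core the load is part of a busy-wait, which is why the empty fallback branch is harmless.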
Re: [patch] Have psql's \d+ indicate foreign partitions
Michael Paquier writes:
> On Sun, Nov 06, 2022 at 09:23:01PM +0900, Ian Lawrence Barwick wrote:
>> Fair enough, make sense.

> Fine by me and the patch looks OK. I'd like to apply this if there
> are no objections.

WFM.

			regards, tom lane
Re: [patch] Have psql's \d+ indicate foreign partitions
On Sun, Nov 06, 2022 at 09:23:01PM +0900, Ian Lawrence Barwick wrote:
> Fair enough, make sense.

Fine by me and the patch looks OK.  I'd like to apply this if there
are no objections.
--
Michael
Re: [PoC] Reducing planning time when tables have many partitions
Hi,

Regards,
Zhang Mingli

On Nov 7, 2022, 14:26 +0800, Tom Lane wrote:
> Andrey Lepikhov writes:
> > I'm still in review of your patch now. At most it seems ok, but are you
> > really need both eq_sources and eq_derives lists now?
>
> Didn't we just have this conversation? eq_sources needs to be kept
> separate to support the "broken EC" logic. We don't want to be
> regurgitating derived clauses as well as originals in that path.

Aha, we had that conversation in another thread (Reducing duplicativeness
of EquivalenceClass-derived clauses):
https://www.postgresql.org/message-id/644164.1666877342%40sss.pgh.pa.us
Re: [PoC] Reducing planning time when tables have many partitions
Andrey Lepikhov writes:
> I'm still in review of your patch now. At most it seems ok, but are you
> really need both eq_sources and eq_derives lists now?

Didn't we just have this conversation?  eq_sources needs to be kept
separate to support the "broken EC" logic.  We don't want to be
regurgitating derived clauses as well as originals in that path.

			regards, tom lane
Re: [PoC] Reducing planning time when tables have many partitions
On 2/11/2022 15:27, Yuya Watari wrote:
> Hello,
>
> I noticed that the previous patch does not apply to the current HEAD.
> I attached the rebased version to this email.

I'm still reviewing your patch now. For the most part it seems OK, but do
you really need both the eq_sources and eq_derives lists now? As I see it,
all access to these lists is guided by eclass_source_indexes and
eclass_derive_indexes correspondingly. Maybe merge them?
--
regards, Andrey Lepikhov
Postgres Professional
Re: archive modules
On Sat, Nov 05, 2022 at 02:08:58PM -0700, Nathan Bossart wrote:
> Such a module could define a custom GUC that accepts a shell command. I
> don't think we should overload the meaning of archive_command based on the
> whims of whatever archive module is loaded. Besides the potential end-user
> confusion, your archive_command might be unexpectedly used incorrectly if
> you forget to set archive_library.

That would mean mostly copying the logic from shell_archive.c to build the
command to execute (aka shell_archive_file), which is not great either.
But well, perhaps my whole line of argument is just moot..

> Perhaps we could eventually move the archive_command functionality to a
> contrib module (i.e., "shell_archive") so that users must always set
> archive_library. But until then, I suspect it's better to treat modules
> and commands as two separate interfaces to ease migration from older major
> versions (even though archive_command is now essentially a built-in archive
> module).

I agree that this is a fine long-term goal, removing all traces of
archive_command from the backend core code. This is actually an argument
in favor of having no traces of XLogArchiveCommand in pgarch.c, no? ;p

I am not sure how long we should wait before being able to do that,
perhaps a couple of years at least? I'd like to think the sooner the
better (like v17?) but we are usually conservative, and the removal of
the exclusive backup mode took 5~6 years if I recall correctly..
--
Michael
Re: Allow file inclusion in pg_hba and pg_ident files
On Thu, Nov 03, 2022 at 08:55:05PM +0900, Michael Paquier wrote:
> On Wed, Nov 02, 2022 at 09:06:02PM +0800, Julien Rouhaud wrote:
>>> The addition of a check for the depth in two places seems unnecessary,
>>> and it looks like we should do this kind of check in only one place.
>>
>> I usually prefer to add a maybe-unnecessary depth check since it's
>> basically free, rather than risking an unfriendly stack size error
>> (including in possible later refactoring), but no objection to get
>> rid of it.
>
> The depth check is a good idea, though I guess that there is an
> argument for having it in only one code path, not two.

So, I have looked at this specific part, and I think that we should try
to solve this issue on its own, as the error message leading to a failure
at startup is confusing now, giving:
FATAL: exceeded maxAllocatedDescs (166) while trying to open file "blah".

It took me some time to ponder about what kind of interface we should
use for this job, but I think that it comes down to three things:
- The depth ought to be checked just before doing AllocateFile() for a file.
- We should have one code path calling AllocateFile().
- The same interface should be used for hba.c and hbafuncs.c.

I have settled down to the following in the attached patch 0002:
open_auth_file(const char *filename, int elevel, int depth, char **err_msg)

More checks could be added in this code path (guc-file.l does more with
immediate recursions, for example), but the depth level should be enough
anyway. I am aware of the fact that this reduces a bit the information
provided for what's called a "secondary" file in your patch set, but I
tend toward moving to something that's minimalistic. Currently, we
provide the file and its '@' shortcut, and we don't mention at all the
file where it comes from. This is a minimal problem now because we only
have pg_hba.conf to worry about for most users; still, I am wondering
whether it would be better in the long run to complete that with an error
context that mentions the file(s) where this comes from and reports a
full chain of the events that led to this opening failure.

At the end, I think that 0002 moves us closer toward the goal of this
thread. I have not been a fan of HbaIncludeKind since the patch was
proposed, TBH, as well. Another thing is that this removes the need for
open_inc_file(); my opinion is that it is overcomplicated. And actually,
it seems like under the non-strict mode (aka include_if_exists), we
would generate a LOG to mention that a configuration file is skipped,
but there would be nothing in err_msg for the TokenizedAuthLine, which
is a mistake I guess?

>>> I have not tried yet, but if we actually move the AllocateFile() and
>>> FreeFile() calls within tokenize_auth_file(), it looks like we may be
>>> able to get to a simpler logic without the need of the with_context()
>>> flavor (even no process_included_auth_file required)? That could make
>>> the interface easier to follow as a whole, while limiting the presence
>>> of AllocateFile() and FreeFile() to a single code path, impacting
>>> open_inc_file() that relies on what the patch uses for
>>> SecondaryAuthFile and IncludedAuthFile (we could attempt to use the
>>> same error message everywhere as well, as one could expect that
>>> expanded and included files have different names which is enough to
>>> guess which one is an inclusion and which one is a secondary).
>>
>> You meant tokenize_auth_file_internal? Yes possibly, in general I tried
>> to avoid changing too much the existing code to ease the patch
>> acceptance, but I agree it would make things simpler.
>
> I *guess* that my mind implied tokenize_auth_file() as of yesterday's.

While thinking more about the logic, I saw that your approach to use the
same MemoryContext "linecxt" and pass it down to the various layers with
tokenize_file_with_context() has the advantage of removing the need to
copy a list of TokenizedAuthLine coming from other included files and/or
dirs, which could easily lead to bugs, especially for the sourcefile if
someone is not careful enough. There is no need to touch the existing
tokenize_inc_file() that copies a set of AuthTokens to an existing list
in TokenizedAuthLine, of course.

Should we do without tokenize_file_with_context() though? There is an
argument for making everything go through tokenize_auth_file(), having a
MemoryContext as argument (aka if NULL create it, if not use it), while
still returning the memory context used. Similarly, it looks like we
should have no need for process_included_authfile() at the end.

I have applied as a1a7bb8 the refactoring of the directory logic for
configuration files, and noticed that this could also be used for
tokenize_inc_file(). This reduces the whole patch by ~15%. Attached is
a set of three patches:
- 0001 changes tokenize_inc_file() to use AbsoluteConfigLocation().
AbsoluteConfigLocation() uses a static buffer and a MAXPGPATH, but we'd
rather change it to use a
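[As a toy illustration of the single-choke-point idea discussed above: validate the inclusion depth once, before any descriptor is allocated, so callers get a friendly error instead of exhausting maxAllocatedDescs. All names and the limit below are hypothetical stand-ins; the real open_auth_file() works with elevel/err_msg and AllocateFile(), not fopen().]

```c
#include <stdio.h>

#define MAX_AUTH_FILE_DEPTH 10	/* assumed limit, for illustration only */

/*
 * Toy sketch of a single entry point that checks the inclusion depth
 * before opening anything.  A too-deep inclusion chain reports a clear
 * error message instead of running out of allocated descriptors.
 */
static FILE *
open_auth_file_sketch(const char *filename, int depth, const char **err_msg)
{
	*err_msg = NULL;
	if (depth > MAX_AUTH_FILE_DEPTH)
	{
		*err_msg = "could not open file: maximum nesting depth exceeded";
		return NULL;
	}
	return fopen(filename, "r");
}
```

Because every open goes through the one function, the depth guard cannot be forgotten in a later refactoring, which is the point of funneling hba.c and hbafuncs.c through the same interface.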
Re: Perform streaming logical transactions by background workers and parallel apply
On Mon, Nov 7, 2022 at 10:02 AM Masahiko Sawada wrote:
>
> On Mon, Nov 7, 2022 at 12:58 PM Amit Kapila wrote:
> >
> > > > I agree that it could be better to have a new lock tag. Another
> > > > point is that the remote xid and local xid could be the same in
> > > > some rare cases, so I think we might need to add another
> > > > identifier to make it unique.
> > > >
> > > > Maybe :
> > > > locktag_field1 : subscription oid
> > > > locktag_field2 : xid (remote or local)
> > > > locktag_field3 : 0 (lock for stream block) / 1 (lock for transaction)
> > >
> > > Or I think we can use locktag_field2 for remote xid and locktag_field3
> > > for local xid.
> >
> > We can do that way as well but OTOH, I think for the local
> > transactions we don't need subscription oid, so field1 could be
> > InvalidOid and field2 will be the xid of the local xact. Won't that be
> > better?
>
> This would work. But I'm a bit concerned that we cannot identify which
> subscription the lock belongs to when checking the pg_locks view.

Fair point. I think if the user wants, she can join with
pg_stat_subscription based on PID and find the corresponding
subscription. However, if we want to identify everything via pg_locks
then I think we should also mention classid or database id as field1.
So, it would look like: field1: (pg_subscription's oid or current db
id); field2: OID of subscription in pg_subscription; field3: local or
remote xid; field4: 0/1 to differentiate between remote and local xid.

--
With Regards,
Amit Kapila.
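[For illustration, the four-field layout proposed above could be sketched as follows. This is a hypothetical stand-in, not the actual LOCKTAG or a SET_LOCKTAG_APPLY_TRANSACTION macro; field names and widths are assumptions for the sketch.]

```c
#include <stdint.h>

typedef uint32_t Oid;

/* Hypothetical stand-in for the proposed apply-transaction lock tag. */
typedef struct ApplyLockTagSketch
{
	uint32_t	field1;			/* pg_subscription's OID or current db id */
	uint32_t	field2;			/* OID of the subscription in pg_subscription */
	uint32_t	field3;			/* remote or local xid */
	uint16_t	field4;			/* 0 = remote xid, 1 = local xid */
} ApplyLockTagSketch;

/* Fill in the tag; a real version would then acquire the lock on it. */
static ApplyLockTagSketch
set_locktag_apply_transaction(Oid dboid, Oid suboid, uint32_t xid,
							  uint16_t is_local)
{
	ApplyLockTagSketch tag;

	tag.field1 = dboid;
	tag.field2 = suboid;
	tag.field3 = xid;
	tag.field4 = is_local;
	return tag;
}
```

The point of field4 is visible immediately: even if a remote xid and a local xid collide numerically, the two tags remain distinct.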
Re: Free list same_input_transnos in preprocess_aggref
Hi,

On Nov 7, 2022, 04:12 +0800, Tom Lane wrote:
>
> The NIL lists are of course occupying no storage. The two one-element
> lists are absolutely, completely negligible in the context of planning
> any nontrivial statement. Even the aggtransinfos list that is the
> primary output of preprocess_aggref will dwarf that; and we leak
> similarly small data structures in probably many hundred places in
> the planner.
>
> I went a bit further and ran the core regression tests, then aggregated
> the results:
>
> $ grep 'leaking list' postmaster.log | sed 's/.*] //' | sort | uniq -c
>    4516 LOG: leaking list of length 0
>      95 LOG: leaking list of length 1
>      15 LOG: leaking list of length 2
>
> You can quibble of course about how representative the regression tests
> are, but there's sure no evidence at all here that we'd be saving
> anything measurable.
>
> If anything, I'd be inclined to get rid of the
>
> 	list_free(*same_input_transnos);
>
> in find_compatible_agg, because it seems like a waste of code on
> the same grounds. Instrumenting that in the same way, I find
> that it's not reached at all in your example, while the
> regression tests give
>
>      49 LOG: freeing list of length 0
>       2 LOG: freeing list of length 1

Thanks for the investigation. Yeah, this patch is negligible. I'll
withdraw it in the CF.

Regards,
Zhang Mingli
Re: Perform streaming logical transactions by background workers and parallel apply
On Mon, Nov 7, 2022 at 12:58 PM Amit Kapila wrote:
>
> On Mon, Nov 7, 2022 at 8:26 AM Masahiko Sawada wrote:
> >
> > On Sun, Nov 6, 2022 at 3:40 PM houzj.f...@fujitsu.com wrote:
> > >
> > > On Saturday, November 5, 2022 1:43 PM Amit Kapila wrote:
> > > >
> > > > On Fri, Nov 4, 2022 at 7:35 PM houzj.f...@fujitsu.com wrote:
> > > > >
> > > > > On Friday, November 4, 2022 4:07 PM Amit Kapila wrote:
> > > > > >
> > > > > > On Thu, Nov 3, 2022 at 6:36 PM houzj.f...@fujitsu.com wrote:
> > > > > > >
> > > > > > > Thanks for the analysis and summary !
> > > > > > >
> > > > > > > I tried to implement the above idea and here is the patch set.
> > > > > >
> > > > > > Few comments on v42-0001
> > > > > > ===
> > > > >
> > > > > Thanks for the comments.
> > > > >
> > > > > > 10.
> > > > > > + winfo->shared->stream_lock_id = parallel_apply_get_unique_id();
> > > > > > + winfo->shared->transaction_lock_id =
> > > > > > + winfo->shared->parallel_apply_get_unique_id();
> > > > > >
> > > > > > Why can't we use xid (remote_xid) for one of these and local_xid
> > > > > > (one generated by parallel apply) for the other?
> > > > ...
> > > > ...
> > > > >
> > > > > I also considered using xid for these locks, but it seems the objsubid
> > > > > for the shared object lock is 16bit while xid is 32 bit. So, I tried
> > > > > to generate a unique 16bit id here.
> > > >
> > > > Okay, I see your point. Can we think of having a new lock tag for
> > > > this with classid, objid, objsubid for the first three fields of the
> > > > locktag field? We can use a new macro SET_LOCKTAG_APPLY_TRANSACTION
> > > > and a common function to set the tag and acquire the lock. One more
> > > > point related to this is that I am suggesting classid by referring
> > > > to SET_LOCKTAG_OBJECT as that is used in the current patch, but do
> > > > you think we need it for our purpose, won't subscription id and xid
> > > > uniquely identify the tag?
> > >
> > > I agree that it could be better to have a new lock tag. Another point
> > > is that the remote xid and local xid could be the same in some rare
> > > cases, so I think we might need to add another identifier to make it
> > > unique.
> > >
> > > Maybe :
> > > locktag_field1 : subscription oid
> > > locktag_field2 : xid (remote or local)
> > > locktag_field3 : 0 (lock for stream block) / 1 (lock for transaction)
> >
> > Or I think we can use locktag_field2 for remote xid and locktag_field3
> > for local xid.
>
> We can do that way as well but OTOH, I think for the local
> transactions we don't need subscription oid, so field1 could be
> InvalidOid and field2 will be the xid of the local xact. Won't that be
> better?

This would work. But I'm a bit concerned that we cannot identify which
subscription the lock belongs to when checking the pg_locks view.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Re: explain analyze rows=%.0f
On 22/7/2022 16:47, Amit Kapila wrote:
> I feel the discussion has slightly deviated which makes it unclear
> whether this patch is required or not?

After a quick review I want to express my thoughts.

At first: we have been waiting for this feature for years. Often clients
give us an explain where we see something like "rows=0, loops=100".
Without verbose mode, I can't even tell whether this node produces any
rows or not. So, I think this feature is useful for parameterized plans
mostly.

Also, printing two decimal digits or even three isn't always meaningful:
sometimes we have a plan where the number of loops is about 1E6 or even
1E7, but the number of real rows is equal to 100 or 1000. To overcome
this issue, I see two options:
1. Show the exact number of tuples without division by loops (fair, but
invasive and bad for automation tools).
2. Show rows in scientific format like X.XXEXX.

I vote for the second option.
--
regards, Andrey Lepikhov
Postgres Professional
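[To make the second option concrete: a hypothetical formatter, not from any posted patch, could switch to scientific notation only when fixed-point output would round the per-loop average into a misleading zero. The function name and the 0.5 threshold are assumptions for illustration.]

```c
#include <stdio.h>

/*
 * Hypothetical sketch: print the per-loop row average in scientific
 * notation when "%.0f" would round a nonzero average down to 0.
 * The 0.5 cutoff is an assumption chosen for this illustration.
 */
static void
format_rows(double total_rows, double nloops, char *buf, size_t buflen)
{
	double		avg = (nloops > 0) ? total_rows / nloops : 0.0;

	if (avg > 0 && avg < 0.5)
		snprintf(buf, buflen, "rows=%.2e", avg);
	else
		snprintf(buf, buflen, "rows=%.0f", avg);
}
```

With total_rows=1000 and loops=1000000 this yields "rows=1.00e-03" rather than the uninformative "rows=0", while ordinary plans keep the familiar integer display.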
Re: Perform streaming logical transactions by background workers and parallel apply
On Mon, Nov 7, 2022 at 8:26 AM Masahiko Sawada wrote:
>
> On Sun, Nov 6, 2022 at 3:40 PM houzj.f...@fujitsu.com wrote:
> >
> > On Saturday, November 5, 2022 1:43 PM Amit Kapila wrote:
> > >
> > > On Fri, Nov 4, 2022 at 7:35 PM houzj.f...@fujitsu.com wrote:
> > > >
> > > > On Friday, November 4, 2022 4:07 PM Amit Kapila wrote:
> > > > >
> > > > > On Thu, Nov 3, 2022 at 6:36 PM houzj.f...@fujitsu.com wrote:
> > > > > >
> > > > > > Thanks for the analysis and summary !
> > > > > >
> > > > > > I tried to implement the above idea and here is the patch set.
> > > > >
> > > > > Few comments on v42-0001
> > > > > ===
> > > >
> > > > Thanks for the comments.
> > > >
> > > > > 10.
> > > > > + winfo->shared->stream_lock_id = parallel_apply_get_unique_id();
> > > > > + winfo->shared->transaction_lock_id =
> > > > > + winfo->shared->parallel_apply_get_unique_id();
> > > > >
> > > > > Why can't we use xid (remote_xid) for one of these and local_xid
> > > > > (one generated by parallel apply) for the other?
> > > ...
> > > ...
> > > >
> > > > I also considered using xid for these locks, but it seems the objsubid
> > > > for the shared object lock is 16bit while xid is 32 bit. So, I tried
> > > > to generate a unique 16bit id here.
> > >
> > > Okay, I see your point. Can we think of having a new lock tag for this
> > > with classid, objid, objsubid for the first three fields of the
> > > locktag field? We can use a new macro SET_LOCKTAG_APPLY_TRANSACTION
> > > and a common function to set the tag and acquire the lock. One more
> > > point related to this is that I am suggesting classid by referring to
> > > SET_LOCKTAG_OBJECT as that is used in the current patch, but do you
> > > think we need it for our purpose, won't subscription id and xid
> > > uniquely identify the tag?
> >
> > I agree that it could be better to have a new lock tag. Another point
> > is that the remote xid and local xid could be the same in some rare
> > cases, so I think we might need to add another identifier to make it
> > unique.
> >
> > Maybe :
> > locktag_field1 : subscription oid
> > locktag_field2 : xid (remote or local)
> > locktag_field3 : 0 (lock for stream block) / 1 (lock for transaction)
>
> Or I think we can use locktag_field2 for remote xid and locktag_field3
> for local xid.

We can do that way as well but OTOH, I think for the local
transactions we don't need subscription oid, so field1 could be
InvalidOid and field2 will be the xid of the local xact. Won't that be
better?

--
With Regards,
Amit Kapila.
Re: Perform streaming logical transactions by background workers and parallel apply
On Sun, Nov 6, 2022 at 3:40 PM houzj.f...@fujitsu.com wrote:
>
> On Saturday, November 5, 2022 1:43 PM Amit Kapila wrote:
> >
> > On Fri, Nov 4, 2022 at 7:35 PM houzj.f...@fujitsu.com wrote:
> > >
> > > On Friday, November 4, 2022 4:07 PM Amit Kapila wrote:
> > > >
> > > > On Thu, Nov 3, 2022 at 6:36 PM houzj.f...@fujitsu.com wrote:
> > > > >
> > > > > Thanks for the analysis and summary !
> > > > >
> > > > > I tried to implement the above idea and here is the patch set.
> > > >
> > > > Few comments on v42-0001
> > > > ===
> > >
> > > Thanks for the comments.
> > >
> > > > 10.
> > > > + winfo->shared->stream_lock_id = parallel_apply_get_unique_id();
> > > > + winfo->shared->transaction_lock_id =
> > > > + winfo->shared->parallel_apply_get_unique_id();
> > > >
> > > > Why can't we use xid (remote_xid) for one of these and local_xid
> > > > (one generated by parallel apply) for the other?
> > ...
> > ...
> > >
> > > I also considered using xid for these locks, but it seems the objsubid
> > > for the shared object lock is 16bit while xid is 32 bit. So, I tried
> > > to generate a unique 16bit id here.
> >
> > Okay, I see your point. Can we think of having a new lock tag for this
> > with classid, objid, objsubid for the first three fields of the
> > locktag field? We can use a new macro SET_LOCKTAG_APPLY_TRANSACTION
> > and a common function to set the tag and acquire the lock. One more
> > point related to this is that I am suggesting classid by referring to
> > SET_LOCKTAG_OBJECT as that is used in the current patch, but do you
> > think we need it for our purpose, won't subscription id and xid
> > uniquely identify the tag?
>
> I agree that it could be better to have a new lock tag. Another point
> is that the remote xid and local xid could be the same in some rare
> cases, so I think we might need to add another identifier to make it
> unique.
>
> Maybe :
> locktag_field1 : subscription oid
> locktag_field2 : xid (remote or local)
> locktag_field3 : 0 (lock for stream block) / 1 (lock for transaction)

Or I think we can use locktag_field2 for remote xid and locktag_field3
for local xid.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Re: Do we need to pass down nonnullable_vars when reducing outer joins?
On Sun, Nov 6, 2022 at 4:00 AM Tom Lane wrote:
> Richard Guo writes:
> > AFAICS, the Vars forced nonnullable by given clause are only used to
> > check if we can reduce JOIN_LEFT to JOIN_ANTI, and it is checking the
> > join's own quals there. It seems to me we do not need to pass down
> > nonnullable_vars by upper quals to the children of a join.
>
> Hmm, you are right, we are not doing anything useful with that data.
> I can't remember if I had a concrete plan for doing something with it
> or not, but we sure aren't using it now. So pushed.

Thanks for pushing it!

Thanks
Richard
Re: Check SubPlan clause for nonnullable rels/Vars
On Sun, Nov 6, 2022 at 3:33 AM Tom Lane wrote:

> Richard Guo writes:
> > [ v2-0001-Check-SubPlan-clause-for-nonnullable-rels-Vars.patch ]
>
> Pushed with cosmetic changes:
>
> * I don't believe in "add at the end" as a principle for placement
> of new code.  There's usually some other logic that will give more
> consistent results.  In cases like this, ordering the treatment of
> Node types in the same way as they appear in the include/nodes/
> headers is the standard answer.  (Not that everybody's been totally
> consistent about that :-( ... but that's not an argument for
> introducing even more entropy.)
>
> * I rewrote the comments a bit.
>
> * I didn't like the test case too much: spinning up a whole new set
> of tables seems like a lot of useless cycles.  Plus it makes it
> harder to experiment with the test query manually.  I usually like
> to write such queries using the regression database's standard tables,
> so I rewrote this example that way.

Thanks for the changes. They make the patch look better. And thanks for
pushing it.

Thanks
Richard
Re: O(n) tasks cause lengthy startups and checkpoints
On Fri, Sep 23, 2022 at 10:41:54AM -0700, Nathan Bossart wrote:
> v11 adds support for building with meson.

rebased

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

>From 367c5f3863457cfbd0fe8add0e8df3e630aaaea9 Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Wed, 5 Jan 2022 19:24:22 +
Subject: [PATCH v12 1/6] Introduce custodian.

The custodian process is a new auxiliary process that is intended to
help offload tasks that could otherwise delay startup and checkpointing.
This commit simply adds the new process; it does not yet do anything
useful.
---
 src/backend/postmaster/Makefile         |   1 +
 src/backend/postmaster/auxprocess.c     |   8 +
 src/backend/postmaster/custodian.c      | 383 ++++++++++++++++++++++++
 src/backend/postmaster/meson.build      |   1 +
 src/backend/postmaster/postmaster.c     |  44 ++-
 src/backend/storage/ipc/ipci.c          |   3 +
 src/backend/storage/lmgr/proc.c         |   1 +
 src/backend/utils/activity/wait_event.c |   3 +
 src/backend/utils/init/miscinit.c       |   3 +
 src/include/miscadmin.h                 |   3 +
 src/include/postmaster/custodian.h      |  32 ++
 src/include/storage/proc.h              |  11 +-
 src/include/utils/wait_event.h          |   1 +
 13 files changed, 489 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/postmaster/custodian.c
 create mode 100644 src/include/postmaster/custodian.h

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 3a794e54d6..e1e1d1123f 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -18,6 +18,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	custodian.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index 7765d1c83d..c275271c95 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "postmaster/auxprocess.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/custodian.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include
"replication/walreceiver.h"
@@ -74,6 +75,9 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 		case CheckpointerProcess:
 			MyBackendType = B_CHECKPOINTER;
 			break;
+		case CustodianProcess:
+			MyBackendType = B_CUSTODIAN;
+			break;
 		case WalWriterProcess:
 			MyBackendType = B_WAL_WRITER;
 			break;
@@ -153,6 +157,10 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 			CheckpointerMain();
 			proc_exit(1);
 
+		case CustodianProcess:
+			CustodianMain();
+			proc_exit(1);
+
 		case WalWriterProcess:
 			WalWriterMain();
 			proc_exit(1);
diff --git a/src/backend/postmaster/custodian.c b/src/backend/postmaster/custodian.c
new file mode 100644
index 00..e90f5d0d1f
--- /dev/null
+++ b/src/backend/postmaster/custodian.c
@@ -0,0 +1,383 @@
+/*-
+ *
+ * custodian.c
+ *
+ * The custodian process handles a variety of non-critical tasks that might
+ * otherwise delay startup, checkpointing, etc.  Offloaded tasks should not
+ * be synchronous (e.g., checkpointing shouldn't wait for the custodian to
+ * complete a task before proceeding).  However, tasks can be synchronously
+ * executed when necessary (e.g., single-user mode).  The custodian is not
+ * an essential process and can shutdown quickly when requested.  The
+ * custodian only wakes up to perform its tasks when its latch is set.
+ *
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	src/backend/postmaster/custodian.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include "libpq/pqsignal.h"
+#include "pgstat.h"
+#include "postmaster/custodian.h"
+#include "postmaster/interrupt.h"
+#include "storage/bufmgr.h"
+#include "storage/condition_variable.h"
+#include "storage/fd.h"
+#include "storage/proc.h"
+#include "storage/procsignal.h"
+#include "storage/smgr.h"
+#include "utils/memutils.h"
+
+static void DoCustodianTasks(bool retry);
+static CustodianTask CustodianGetNextTask(void);
+static void CustodianEnqueueTask(CustodianTask task);
+static const struct cust_task_funcs_entry *LookupCustodianFunctions(CustodianTask task);
+
+typedef struct
+{
+	slock_t		cust_lck;
+
+	CustodianTask task_queue_elems[NUM_CUSTODIAN_TASKS];
+	int			task_queue_head;
+} CustodianShmemStruct;
+
+static CustodianShmemStruct *CustodianShmem;
+
+typedef void (*CustodianTaskFunction) (void);
+typedef void (*CustodianTaskHandleArg) (Datum arg);
+
+struct cust_task_funcs_entry
+{
+	CustodianTask task;
+	CustodianTaskFunction task_func;	/* performs task */
+	CustodianTaskHandleArg handle_arg_func; /* handles additional info in request */
+};
+
+/*
+ * Add new tasks here.
+ *
+ * task_func is the logic that will be executed via DoCustodianTasks() when the
+ * matching task is r
Re: Allow single table VACUUM in transaction block
On Sun, Nov 6, 2022 at 11:14 AM Tom Lane wrote:
> In general, I do not believe in encouraging users to run VACUUM
> manually in the first place.  We would be far better served by
> spending our effort to improve autovacuum's shortcomings.

I couldn't agree more. A lot of problems seem related to the idea that
VACUUM is just a command that the DBA periodically runs to get a
predictable fixed result, a little like CREATE INDEX. That conceptual
model isn't exactly wrong; it just makes it much harder to apply any
kind of context about the needs of the table over time. There is a
natural cycle to how VACUUM (really autovacuum) is run, and the details
matter.

There is a significant amount of relevant context that we can't really
use right now. That wouldn't be true if VACUUM only ran within an
autovacuum worker (by definition). The VACUUM command itself would
still be available, and support the same user interface, more or less.
Under the hood the VACUUM command would work by enqueueing a VACUUM
job, to be performed asynchronously by an autovacuum worker. Perhaps
the initial enqueue operation could be transactional, fixing Simon's
complaint.

"No more VACUUMs outside of autovacuum" would enable more advanced
autovacuum.c scheduling, allowing us to apply a lot more context about
the costs and benefits, without having to treat manual VACUUM as an
independent thing. We could coalesce together redundant VACUUM jobs,
suspend and resume VACUUM operations, and have more strategies to deal
with problems as they emerge.

> I'd like to see some sort of direct attack on its inability to deal
> with temp tables, for instance.  (Force the owning backend to
> do it?  Temporarily change the access rules so that the data
> moves to shared buffers?  Dunno, but we sure haven't tried hard.)

This is a good example of the kind of thing I have in mind. Perhaps it
could work by killing the backend that owns the temp relation when
things truly get out of hand?
I think that that would be a perfectly reasonable trade-off.

Another related idea: better behavior in the event of a manually issued
VACUUM (now just an enqueued autovacuum) that cannot do useful work due
to the presence of a long running snapshot. The VACUUM doesn't have to
dutifully report "success" when there is no practical sense in which it
was successful. There could be a back and forth conversation between
autovacuum.c and vacuumlazy.c that makes sure that something useful
happens sooner or later. The passage of time really matters here.

As a bonus, we might be able to get rid of the autovacuum GUC variants.
Plus the current autovacuum logging would just be how we'd log every
VACUUM.

-- 
Peter Geoghegan
Re: Free list same_input_transnos in preprocess_aggref
Zhang Mingli writes:
> Correction: SaveBytes = Sum results of accumulate_list_size: 24(4+4+8+8),

What I did was to stick in

	elog(LOG, "leaking list of length %d",
		 list_length(same_input_transnos));

at the end of preprocess_aggref.  What I see on your five-aggregate
example is

2022-11-06 14:59:25.666 EST [3046253] LOG:  leaking list of length 0
2022-11-06 14:59:25.666 EST [3046253] STATEMENT:  explain select max(id), min(id), sum(id), count(id), avg(id) from t1;
2022-11-06 14:59:25.666 EST [3046253] LOG:  leaking list of length 1
2022-11-06 14:59:25.666 EST [3046253] STATEMENT:  explain select max(id), min(id), sum(id), count(id), avg(id) from t1;
2022-11-06 14:59:25.666 EST [3046253] LOG:  leaking list of length 0
2022-11-06 14:59:25.666 EST [3046253] STATEMENT:  explain select max(id), min(id), sum(id), count(id), avg(id) from t1;
2022-11-06 14:59:25.666 EST [3046253] LOG:  leaking list of length 1
2022-11-06 14:59:25.666 EST [3046253] STATEMENT:  explain select max(id), min(id), sum(id), count(id), avg(id) from t1;
2022-11-06 14:59:25.666 EST [3046253] LOG:  leaking list of length 0
2022-11-06 14:59:25.666 EST [3046253] STATEMENT:  explain select max(id), min(id), sum(id), count(id), avg(id) from t1;

The NIL lists are of course occupying no storage.  The two one-element
lists are absolutely, completely negligible in the context of planning
any nontrivial statement.  Even the aggtransinfos list that is the
primary output of preprocess_aggref will dwarf that; and we leak
similarly small data structures in probably many hundred places in the
planner.

I went a bit further and ran the core regression tests, then aggregated
the results:

$ grep 'leaking list' postmaster.log | sed 's/.*] //' | sort | uniq -c
   4516 LOG:  leaking list of length 0
     95 LOG:  leaking list of length 1
     15 LOG:  leaking list of length 2

You can quibble of course about how representative the regression tests
are, but there's sure no evidence at all here that we'd be saving
anything measurable.
If anything, I'd be inclined to get rid of the

	list_free(*same_input_transnos);

in find_compatible_agg, because it seems like a waste of code on the
same grounds.  Instrumenting that in the same way, I find that it's not
reached at all in your example, while the regression tests give

     49 LOG:  freeing list of length 0
      2 LOG:  freeing list of length 1

			regards, tom lane
Re: Allow single table VACUUM in transaction block
Peter Geoghegan writes:
> My guess is that there are more things like that. Possibly even things
> that were never directly considered. VACUUM evolved in a world where
> we absolutely took not running in a transaction for granted. Changing
> that now is a pretty big deal. Maybe it would all be worth it if the end
> result was a super compelling feature. But I for one don't think that
> it will be.

Yeah.  To be blunt, this proposal scares the sh*t out of me.  I agree
that there are tons of subtle dependencies on our current assumptions
about how VACUUM works, and I strongly suspect that we won't find all of
them until after users have lost data.  I cannot believe that running
VACUUM inside transactions is valuable enough to take that risk ...
especially if it's a modified limited kind of VACUUM that doesn't
eliminate the need for periodic real VACUUMs.

In general, I do not believe in encouraging users to run VACUUM
manually in the first place.  We would be far better served by spending
our effort to improve autovacuum's shortcomings.  I'd like to see some
sort of direct attack on its inability to deal with temp tables, for
instance.  (Force the owning backend to do it?  Temporarily change the
access rules so that the data moves to shared buffers?  Dunno, but we
sure haven't tried hard.)  However many bugs such a thing might have
initially, at least they'd only put temporary data at hazard.

			regards, tom lane
Re: Allow single table VACUUM in transaction block
On Thu, Oct 27, 2022 at 2:31 AM Simon Riggs wrote:

> Fix, so that this works without issue:
>
> BEGIN;
> VACUUM (ANALYZE) vactst;
> COMMIT;
>
> Allows both ANALYZE and vacuum of toast tables, but not VACUUM FULL.
>
> When in a xact block, we do not set PROC_IN_VACUUM,
> nor update datfrozenxid.

It doesn't seem like a good idea to add various new special cases to
VACUUM just to make scripts like this work.

I'm pretty sure that there are several deep, subtle reasons why VACUUM
cannot be assumed safe to run in a user transaction. For example, the
whole way that index page deletion is decoupled from recycling in
access methods like nbtree (see "Placing deleted pages in the FSM" from
the nbtree README) rests upon delicate assumptions about whether or not
there could be an "in-flight" B-tree descent that is at risk of landing
on a deleted page as it is concurrently recycled. In general the
deleted page has to remain in place as a tombstone, until that is
definitely not possible anymore. This relies on the backend that runs
VACUUM having no references to the page pending deletion. (Commit
9dd963ae25 added an optimization that heavily leaned on the idea that
the state within the backend running VACUUM couldn't possibly have a
live page reference that is at risk of being broken by the
optimization, though I'm pretty sure that you'd have problems even
without that commit/optimization in place.)

My guess is that there are more things like that. Possibly even things
that were never directly considered. VACUUM evolved in a world where we
absolutely took not running in a transaction for granted. Changing that
now is a pretty big deal. Maybe it would all be worth it if the end
result was a super compelling feature. But I for one don't think that
it will be.

If we absolutely have to do this, then the least worst approach might
be to make VACUUM into a no-op rather than throwing an ERROR -- demote
the ERROR into a WARNING.
You could argue that we're just arbitrarily deciding to not do a VACUUM
just to be able to avoid throwing an error if we do that. But isn't
that already true with the patch that we have? Is it really a "true
VACUUM" if the operation can never advance datfrozenxid? At least
selectively demoting the ERROR to a WARNING is "transparent" about it.

-- 
Peter Geoghegan
Re: [DOCS] Stats views and functions not in order?
Peter Smith writes:
> Sorry, I forgot the attachments in the previous post. PSA.

I spent a bit of time looking at this.  I agree that a lot of the
current ordering choices here look like they were made with the advice
of a dartboard, and there's a number of things that are pretty blatantly
just sloppy merging (like the out-of-order wait-event items).

However, I'm not a big fan of "alphabetical order at all costs", because
that frequently leads to ordering decisions that are not a lot better
than random from a semantic standpoint.  For example, I resist the idea
that it's sensible to put pg_stat_all_indexes before pg_stat_all_tables.
I'm unconvinced that putting pg_stat_sys_tables and pg_stat_user_tables
far away from pg_stat_all_tables is great, either.

So ... how do we proceed?

One thing I'm unhappy about that you didn't address is that the
subsection ordering in "28.4. Progress Reporting" could hardly have
been invented even with a dartboard.  Perhaps it reflects development
order, but that's a poor excuse.  I'd be inclined to alphabetize by SQL
command name, but maybe leave Base Backup to the end since it's not a
SQL command.

			regards, tom lane
Re: remap the .text segment into huge pages at run time
Hi,

On 2022-11-06 13:56:10 +0700, John Naylor wrote:
> On Sat, Nov 5, 2022 at 3:27 PM Andres Freund wrote:
> > I don't think the dummy functions are a good approach, there were plenty
> > things after it when I played with them.
>
> To be technical, the point wasn't to have no code after it, but to have no
> *hot* code *before* it, since with the iodlr approach the first 1.99MB of
> .text is below the first aligned boundary within that section. But yeah,
> I'm happy to ditch that hack entirely.

Just because code is colder than the alternative branch, doesn't
necessarily mean it's entirely cold overall. I saw hits to things after
the dummy function have a perf effect.

> > > > With these flags the "R E" segments all start on a 0x200000/2MiB
> > > > boundary and are padded to the next 2MiB boundary. However the
> > > > OS / dynamic loader only maps the necessary part, not all the
> > > > zero padding.
> > > >
> > > > This means that if we were to issue a MADV_COLLAPSE, we can
> > > > before it do an mremap() to increase the length of the mapping.
> > >
> > > I see, interesting. What location are you passing for madvise() and
> > > mremap()? The beginning of the segment (for me has .init/.plt) or an
> > > aligned boundary within .text?
> >
> > 	/*
> > 	 * Make huge pages out of it. Requires at least linux 6.1. We could
> > 	 * fall back to MADV_HUGEPAGE if it fails, but it doesn't do all that
> > 	 * much in older kernels.
> > 	 */
>
> About madvise(), I take it MADV_HUGEPAGE and MADV_COLLAPSE only work for
> THP? The man page seems to indicate that.

MADV_HUGEPAGE works as long as
/sys/kernel/mm/transparent_hugepage/enabled is set to always or
madvise. My understanding is that MADV_COLLAPSE will work even if
/sys/kernel/mm/transparent_hugepage/enabled is set to never.

> In the support work I've done, the standard recommendation is to turn THP
> off, especially if they report sudden performance problems.

I think that's pretty much an outdated suggestion FWIW.
Largely caused by Red Hat extremely aggressively backpatching
transparent hugepages into RHEL 6 (IIRC). Lots of improvements have
been made to THP since then. I've tried to see negative effects maybe
2-3 years back, without success.

I really don't see a reason to ever set
/sys/kernel/mm/transparent_hugepage/enabled to 'never', rather than
just 'madvise'.

> If explicit HP's are used for shared mem, maybe THP is less of a risk? I
> need to look back at the tests that led to that advice...

I wouldn't give that advice to customers anymore, unless they use
extremely old platforms or unless there's very concrete evidence.

> > A real version would have to open /proc/self/maps and do this for at
> > least postgres' r-xp mapping. We could do it for libraries too, if
> > they're suitably aligned (both in memory and on-disk).
>
> I can try and generalize your above sketch into a v2 patch.

Cool.

> It looks like plpgsql is only 27 standard pages in size...
>
> Regarding glibc, we could try moving a couple of the hotter functions into
> PG, using smaller and simpler coding, if that has better frontend cache
> behavior. The paper "Understanding and Mitigating Front-End Stalls in
> Warehouse-Scale Computers" talks about this, particularly section 4.4
> regarding memcmp().

I think the amount of work necessary for that is nontrivial and
continual. So I'm loathe to go there.

> > > I quickly tried to align the segments with the linker and then in my
> > > patch have the address for mmap() rounded *down* from the .text start
> > > to the beginning of that segment. It refused to start without logging
> > > an error.
> >
> > Hm, what linker was that? I did note that you need some additional
> > flags for some of the linkers.
>
> BFD, but I wouldn't worry about that failure too much, since the
> mremap()/madvise() strategy has a lot fewer moving parts.
>
> On the subject of linkers, though, one thing that tripped me up was
> trying to change the linker with Meson.
> First I tried
>
> -Dc_args='-fuse-ld=lld'

It's -Dc_link_args=...

> but that led to warnings like this when :
> /usr/bin/ld: warning: -z separate-loadable-segments ignored
>
> When using this in the top level meson.build
>
> elif host_system == 'linux'
>   sema_kind = 'unnamed_posix'
>   cppflags += '-D_GNU_SOURCE'
>   # Align the loadable segments to 2MB boundaries to support remapping to
>   # huge pages.
>   ldflags += cc.get_supported_link_arguments([
>     '-Wl,-zmax-page-size=0x200000',
>     '-Wl,-zcommon-page-size=0x200000',
>     '-Wl,-zseparate-loadable-segments'
>   ])
>
> According to
>
> https://mesonbuild.com/howtox.html#set-linker
>
> I need to add CC_LD=lld to the env vars before invoking, which got rid of
> the warning. Then I wanted to verify that lld was actually used, and in
>
> https://releases.llvm.org/14.0.0/tools/lld/docs/index.html

You can just look at build.ninja, fwiw. Or use ninja -v (in postgres's
cases with -d keeprsp, be
Re: BUG #17434: CREATE/DROP DATABASE can be executed in the same transaction with other commands
Yugo NAGATA writes:
>> The attached patch tries to add comments explaining it on the functions.

> I forward it to the hackers list because the patch is to fix comments.

What do you think of the attached wording?  I don't think the pipeline
angle is of concern to anyone who might be reading these comments with
the aim of understanding what guarantees they have.  Perhaps there
should be more about that in the user-facing docs, though.

			regards, tom lane

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 883d6c0f70..8086b857b9 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -3448,6 +3448,10 @@ AbortCurrentTransaction(void)
  * a transaction block, typically because they have non-rollback-able
  * side effects or do internal commits.
  *
+ * If this routine completes successfully, then the calling statement is
+ * guaranteed that if it completes without error, its results will be
+ * committed immediately.
+ *
  * If we have already started a transaction block, issue an error; also issue
  * an error if we appear to be running inside a user-defined function (which
  * could issue more commands and possibly cause a failure after the statement
@@ -3573,6 +3577,10 @@ CheckTransactionBlock(bool isTopLevel, bool throwError, const char *stmtType)
  * a transaction block than when running as single commands.  ANALYZE is
  * currently the only example.
  *
+ * If this routine returns "false", then the calling statement is
+ * guaranteed that if it completes without error, its results will be
+ * committed immediately.
+ *
  * isTopLevel: passed down from ProcessUtility to determine whether we are
  * inside a function.
  */
Re: ssl tests aren't concurrency safe due to get_free_port()
On 2022-11-05 Sa 14:36, Andres Freund wrote:
>>
>>  use Carp;
>>  use Config;
>> -use Fcntl qw(:mode);
>> +use Fcntl qw(:mode :flock :seek O_CREAT O_RDWR);
>
> Does this do anything useful on windows?

All we're doing here on Windows and elsewhere is getting access to some
constants used in calls to flock(), seek() and sysopen(). It's not
actually doing anything else anywhere.

>> +if ($pid +0 > 0)
>
> Gotta love perl.

Think of it as a typecast.

>> +{
>> +    if (kill 0, $pid)
>
> Does this work on windows?

Yes, it's supposed to. It doesn't actually send a signal, it checks if
the process exists. There's some suggestion it might give false
positives on Windows, but that won't really hurt us here, we'll just
look for a different port.

One possible addition would be to add removing the reservation files in
an END handler. That would be pretty simple.

cheers

andrew

-- 
Andrew Dunstan
EDB: https://www.enterprisedb.com
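[Editor's note] For readers unfamiliar with the scheme being reviewed: the idea is to hold an exclusive lock on a per-port file for as long as the port is in use, so concurrent test runs skip ports that are already reserved. A rough shell sketch of the same idea, using util-linux flock(1) (the file name and port below are invented; the real code is Perl in the TAP test utilities):

```shell
# Holding fd 9 open keeps the lock; the kernel drops it automatically
# when the process exits, just as Perl's open filehandle does.
port=4321
lockfile="/tmp/pg_tap_port.$port.lock"
(
    exec 9>"$lockfile"
    if flock -n 9; then
        echo "reserved port $port"
        # ... probe the port and start a test server here ...
    else
        echo "port $port is reserved by another test, trying the next one"
    fi
)
```

Because the lock dies with the holder, a crashed test run cannot leave a port permanently reserved, which is the main attraction over lock files that must be cleaned up explicitly.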
Re: bug: pg_dump use strange tag for trigger
On Sun, Nov 6, 2022 at 15:52, Tom Lane wrote:

> Pavel Stehule writes:
> > When I played with regression tests for pg_restore, I tested -T filtering
> > triggers too. I had problems with restoring triggers. I found that the
> > name for trigger uses the pattern "tablename triggername" (not just (and
> > correct) triggername).
>
> > I propose to generate tag just like trigger name
>
> Trigger names by themselves aren't even a little bit unique, so that
> doesn't seem like a great idea to me.  There's backwards compatibility
> to worry about, too.  Maybe we need a documentation adjustment, instead?

I understand, but the space is a bit unintuitive. Maybe use a dot there
instead, along with better documentation.

Regards

Pavel

> 			regards, tom lane
Re: Draft back-branch release notes are up
Justin Pryzby writes:
> + Fix planner failure with extended statistics on partitioned tables
> + (Richard Guo, Justin Pryzby)

> This can also happen with inheritance tables.

> + Add missing guards for NULL connection pointer

> Maybe should be NULL or NULL ?

Done, thanks.

			regards, tom lane
Re: Pluggable toaster
Hi,

Pluggable TOAST is provided as an interface to allow developers to plug
in custom TOAST mechanics. It does not determine whether that would be
a universal Toaster or one for a single data type, but syntax for a
universal Toaster is out of scope for this patchset.

> True, but besides Jsonb Toaster one can implement a universal one as
> well (your own example: encryption, or a Toaster that bypasses a 1 GB
> value limitation). However we can probably keep this out of scope of
> this particular patchset for now. As mentioned before this is going to
> be just an additional syntax for the user convenience.

To transparently bypass the 1 GB limit you have to increase the amount
of data a VARLENA type is able to hold. This is out of scope for this
patchset too, but as I mentioned before, there are means to do this
with Pluggable TOAST using toast and detoast iterators.

> Compression is actually a part of the four-stage TOAST algorithm. So
> what you're doing is using the default TOAST most of the time, but if
> the storage strategy is EXTERNAL and a custom TOASTer is specified
> then you use a type-aware "TOASTer".

We provide several custom Toasters for particular types of data that
cause a lot of problems in storage. The main idea behind Pluggable
TOAST is using special data-aware Toasters where they are needed.
Extended storage mode supports only 2 compression algorithms, though
there are more efficient ones for different types of data.

> If the goal is to implement true "Pluggable TOASTers" then the TOAST
> should be substituted entirely. If the goal is to implement type-aware
> sub-TOASTers we should be honest about this and call it otherwise:
> e.g. "Type-aware TOASTer" or perhaps "subTOASTer". Additionally in
> this case there should be no validate() method since this is going to
> work only with the default heapam that implements the default TOASTer.
> So to clarify, the goal is to deliver true "Pluggable TOASTers" or
> only "type-aware sub-TOASTers"?
Pluggable TOAST does not presume complete substitution of the existing
TOAST mechanics; that is out of scope for this patchset. It proposes
means to substitute it, or to plug in additional custom TOAST mechanics
for particular data types.

> I don't see any reason why the semantics for Toasters should be any
> different from user-defined types implemented in an extension. If
> there are columns that use a given Toaster we should forbid DROPping
> the extension. Otherwise "DROP extension" should succeed. No one says
> that this should be a fast operation but it should be possible.

I'm currently working on a revision of Pluggable TOAST that would make
dropping a Toaster possible if there is no data TOASTed with it, along
with several other major changes. It will be available in this (I hope
so) or, if I don't make it in time, the following commitfest.

On Fri, Nov 4, 2022 at 12:35 PM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Hi again,
>
> > [1]:
> https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22=5xvbg7s4vr...@mail.gmail.com
>
> I added a link but forgot to use it, d'oh!
>
> Please note that if the goal is to merely implement "type-aware
> sub-TOASTers" then compression dictionaries [1] arguably provide the
> same value with MUCH less complexity.
>
> --
> Best regards,
> Aleksander Alekseev

-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/
Re: Add LZ4 compression in pg_dump
On Wed, Nov 2, 2022 at 14:28, Justin Pryzby wrote: > Checking if you'll be able to submit new patches soon ? Thank you for checking up. Expect new versions within this commitfest cycle.
Re: bug: pg_dump use strange tag for trigger
Pavel Stehule writes:
> When I played with regression tests for pg_restore, I tested -T filtering
> triggers too. I had problems with restoring triggers. I found that the name
> for trigger uses the pattern "tablename triggername" (not just (and
> correct) triggername).

> I propose to generate tag just like trigger name

Trigger names by themselves aren't even a little bit unique, so that
doesn't seem like a great idea to me.  There's backwards compatibility
to worry about, too.  Maybe we need a documentation adjustment, instead?

			regards, tom lane
Re: Small TAP improvements
On 2022-Jun-14, Andrew Dunstan wrote:

> OK, here's a more principled couple of patches. For config_data, if you
> give multiple options it gives you back the list of values. If you don't
> specify any, in scalar context it just gives you back all of pg_config's
> output, but in array context it gives you a map, so you should be able
> to say things like:
>
>     my %node_config = $node->config_data;

Hi, it looks to me like these were forgotten?

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
Re: Improve logging when using Huge Pages
On Sun, Nov 06, 2022 at 02:04:29PM +0700, John Naylor wrote:
> On Thu, Nov 3, 2022 at 8:11 AM Thomas Munro wrote:
> >
> > I wonder if the problem here is that people are reluctant to add noise
> > to every starting system. There are people who have not configured
> > their system and don't want to see that noise, and then some people
> > have configured their system and would like to know about it if it
> > doesn't work so they can be aware of that, but don't want to use "off"
> > because they don't want a hard failure. Would it be better if there
> > were a new level "try_log" (or something), which only logs a message
> > if it tries and fails?
>
> I think the best thing to do is change huge_pages='on' to log a WARNING
> and fall back to regular pages if the mapping fails. That way, both dev
> and prod can keep the same settings, since 'on' will have both
> visibility and robustness. I don't see a good reason to refuse to start
> -- seems like an anti-pattern.

I'm glad to see there's still discussion on this topic :)

Another idea is to add a RUNTIME_COMPUTED GUC to *display* the state of
huge pages, so I can stop parsing /proc/maps to find it.

-- 
Justin
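[Editor's note] Until something like that GUC exists, the kernel already exposes the relevant state. A sketch of the manual checks (Linux-only procfs paths; `$$` stands in for the postmaster PID in practice):

```shell
# System-wide view: explicit huge page pool and anonymous THP usage.
if [ -r /proc/meminfo ]; then
    grep -E '^(HugePages_Total|HugePages_Free|Hugepagesize|AnonHugePages)' \
        /proc/meminfo
fi

# Per-process view: how much of a process's address space is actually
# backed by huge pages (ShmemPmdMapped covers THP-backed shared memory).
if [ -r "/proc/$$/smaps_rollup" ]; then
    grep -E '^(AnonHugePages|ShmemPmdMapped)' "/proc/$$/smaps_rollup" || true
fi
```

This is essentially the /proc parsing being complained about above, which is why surfacing the same information as a read-only GUC would be friendlier.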
Re: [patch] Have psql's \d+ indicate foreign partitions
On Sun, Nov 6, 2022 at 1:39, Tom Lane wrote:
>
> Michael Paquier writes:
> > On Mon, Oct 24, 2022 at 09:44:18PM +0900, Ian Lawrence Barwick wrote:
> >> Recently I have been working a lot with partitioned tables which
> >> contain a mix of local and foreign partitions, and find it would be
> >> very useful to be able to easily obtain an overview of which
> >> partitions are foreign and where they are located.
>
> > Hmm.  I am not sure that we should add this much amount of
> > information, particularly for the server bits.
>
> FWIW, I am also in favor of adding ", FOREIGN" but no more.
> My concern is that as submitted, the patch greatly increases
> the cost of the underlying query by adding two more catalogs
> to the join.  I don't think imposing such a cost on everybody
> (whether they use foreign partitions or not) is worth that.  But
> we can add ", FOREIGN" for free since we have the relkind anyway.

Fair enough, makes sense. Revised version added per suggestions, which
produces output like this:

postgres=# \d+ parttest
                              Partitioned table "public.parttest"
 Column |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 id     | integer |           | not null |         | plain    |             |              |
 val1   | text    |           |          |         | extended |             |              |
 val2   | text    |           |          |         | extended |             |              |
Partition key: HASH (id)
Partitions: parttest_10_0 FOR VALUES WITH (modulus 10, remainder 0),
            parttest_10_1 FOR VALUES WITH (modulus 10, remainder 1), FOREIGN,
            parttest_10_2 FOR VALUES WITH (modulus 10, remainder 2),
            parttest_10_3 FOR VALUES WITH (modulus 10, remainder 3), FOREIGN,
            parttest_10_4 FOR VALUES WITH (modulus 10, remainder 4),
            parttest_10_5 FOR VALUES WITH (modulus 10, remainder 5), FOREIGN,
            parttest_10_6 FOR VALUES WITH (modulus 10, remainder 6),
            parttest_10_7 FOR VALUES WITH (modulus 10, remainder 7), FOREIGN,
            parttest_10_8 FOR VALUES WITH (modulus 10, remainder 8),
            parttest_10_9 FOR VALUES WITH (modulus 10, remainder 9), FOREIGN

Regards

Ian Barwick

commit
0b330a67e5941bacb815fa6dfae914c56563f7a9 Author: Ian Barwick Date: Sun Nov 6 21:08:26 2022 +0900 psql: in \d+, indicate foreign partitions Currently with a partitioned table, \d+ lists the partitions and their partition key, but it would be useful to see which ones, if any, are foreign partitions. diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c index c645d66418..2eae519b1d 100644 --- a/src/bin/psql/describe.c +++ b/src/bin/psql/describe.c @@ -3445,6 +3445,8 @@ describeOneTableDetails(const char *schemaname, if (child_relkind == RELKIND_PARTITIONED_TABLE || child_relkind == RELKIND_PARTITIONED_INDEX) appendPQExpBufferStr(&buf, ", PARTITIONED"); +else if (child_relkind == RELKIND_FOREIGN_TABLE) + appendPQExpBufferStr(&buf, ", FOREIGN"); if (strcmp(PQgetvalue(result, i, 2), "t") == 0) appendPQExpBufferStr(&buf, " (DETACH PENDING)"); if (i < tuples - 1) diff --git a/src/test/regress/expected/foreign_data.out b/src/test/regress/expected/foreign_data.out index 9d7610b948..47bf56adbf 100644 --- a/src/test/regress/expected/foreign_data.out +++ b/src/test/regress/expected/foreign_data.out @@ -1404,7 +1404,7 @@ CREATE FOREIGN TABLE ft2 () INHERITS (fd_pt1) c1 | integer | | not null | | plain| | c2 | text| | | | extended | | c3 | date| | | | plain| | -Child tables: ft2 +Child tables: ft2, FOREIGN \d+ ft2 Foreign table "public.ft2" @@ -1449,7 +1449,7 @@ ALTER FOREIGN TABLE ft2 INHERIT fd_pt1; c1 | integer | | not null | | plain| | c2 | text| | | | extended | | c3 | date| | | | plain| | -Child tables: ft2 +Child tables: ft2, FOREIGN \d+ ft2 Foreign table "public.ft2" @@ -1483,7 +1483,7 @@ Server: s0 FDW options: (delimiter ',', quote '"', "be quoted" 'value') Inherits: fd_pt1 Child tables: ct3, - ft3 + ft3, FOREIGN \d+ ct3 Table "public.ct3" @@ -1522,7 +1522,7 @@ ALTER TABLE fd_pt1 ADD COLUMN c8 integer; c6 | integer | | | | plain| | c7 | integer | | not null | | plain| | c8 | integer |
pg_upgrade, tables_with_oids.txt -> tables_with_oids.sql?
Hi,

having just upgraded an instance which contained tables WITH OIDS, I wonder if it would make sense for pg_upgrade to directly generate a script to fix those. I know it is easy to do that with e.g. sed over tables_with_oids.txt, but it would be more convenient to have the script generated directly.

Thoughts?

Regards

Daniel
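For illustration, the manual step today is a small text transformation. This sketch assumes a layout for tables_with_oids.txt (a per-database header followed by indented, schema-qualified table names) -- adjust to match what pg_upgrade actually writes:

```shell
# Sample input standing in for the file pg_upgrade --check writes;
# the exact layout is an assumption for this sketch.
cat > tables_with_oids.txt <<'EOF'
In database: mydb
  public.legacy_table
  public.other_table
EOF

# Emit a \connect per database and an ALTER TABLE per listed table,
# which is the fix needed before the upgrade can proceed.
awk '
  /^In database:/ { printf "\\connect %s\n", $3; next }
  NF              { printf "ALTER TABLE %s SET WITHOUT OIDS;\n", $1 }
' tables_with_oids.txt > tables_with_oids.sql

cat tables_with_oids.sql
```

Since pg_upgrade already generates delete_old_cluster.sh, emitting a ready-to-run tables_with_oids.sql would follow an existing pattern.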
Re: Improve logging when using Huge Pages
On Thu, Nov 3, 2022 at 8:11 AM Thomas Munro wrote:
>
> I wonder if the problem here is that people are reluctant to add noise
> to every starting system. There are people who have not configured
> their system and don't want to see that noise, and then some people
> have configured their system and would like to know about it if it
> doesn't work so they can be aware of that, but don't want to use "off"
> because they don't want a hard failure. Would it be better if there
> were a new level "try_log" (or something), which only logs a message
> if it tries and fails?

I think the best thing to do is change huge_pages='on' to log a WARNING and fall back to regular pages if the mapping fails. That way, both dev and prod can keep the same settings, since 'on' will have both visibility and robustness. I don't see a good reason to refuse to start -- seems like an anti-pattern.

--
John Naylor
EDB: http://www.enterprisedb.com