Re: postgres_fdw test timeouts
On Sun, Dec 3, 2023 at 6:00 PM Alexander Lakhin wrote: > I've managed to reproduce the failure locally when running postgres_fdw_x/ > regress in parallel (--num-processes 10). It reproduced for me > on 04a09ee94 (iterations 1, 2, 4), but not on 04a09ee94~1 (30 iterations > passed). > > I'm going to investigate this case within days. Maybe we could find a > better fix for the issue. Thanks. One thing I can recommend to anyone trying to understand the change is to view it with:

  git show --ignore-all-space 04a09ee

... because it changed a lot of indentation when wrapping a bunch of stuff in a new for loop.
Re: Schema variables - new implementation for Postgres 15
Hi

On Sun, Nov 26, 2023 at 18:56, Dmitry Dolgov <9erthali...@gmail.com> wrote: > > On Sat, Nov 18, 2023 at 06:28:53PM +0100, Pavel Stehule wrote: > > On Sat, Nov 18, 2023 at 15:54, Dmitry Dolgov <9erthali...@gmail.com> > > wrote: > > > As a side note, I intend to go one more time through the first few > > > patches introducing the basic functionality, and then mark it as ready > > > in CF. I haven't been able to break the patch in testing for quite a long time, and for > > > most parts the changes make sense to me. > > > > I marked pg_session_variables function as PARALLEL RESTRICTED, and did > > rebase > > So, after one week of uninterrupted evening reviews I've made it through > the first four patches :) > > It's a decent job -- more than once, looking at the code, I thought I > could construct a case where it would blow up, but everything was > working just fine. Yet, I think the patch still has to be reshaped a bit > before moving forward. I've got a couple of proposals of a different nature: > high-level changes (you probably won't like some of them, but I'm sure > they're going to be useful), technical code-level improvements/comments, > and a few language changes. With those changes in mind I would be > satisfied with the patch, and hopefully they would also make it easier > for a potential committer to pick it up. > > # High level proposals > > * I would suggest reducing the scope of the patch as much as possible, > and not just by trimming at the edges, but rather following Phileas > Fogg's example with the steamboat Henrietta -- get rid of all > non-essential parts. This will make this rather large patch more > approachable for others. > > For that one can concentrate only on the first two patches plus the > fourth one (memory cleanup after dropping variables), leaving DISCARD, > ON TRANSACTION END, DEFAULT, IMMUTABLE for a follow-up in the > future. > > Another thing in this context would be to evaluate plpgsql support for > this feature. 
You know the use case better than I do -- how important > is it? Is it an intrinsic part of the feature, or would session variables > still be valuable enough even without plpgsql? From what I see, > postponing plpgsql will make everything about ~800 lines lighter (most > likely more), and also allow us to ignore a couple of concerns about the > implementation (more on this later). > > * The new GUC session_variables_ambiguity_warning is definitely going to > cause many objections; it's another knob to manage a very subtle > behaviour detail very few people will ever notice. I see the point > behind warning about ambiguity, so probably it makes sense to bite the > bullet and decide one way or another. The proposal is to warn always > in potentially ambiguous situations, and if concerns are high about > logging too much, maybe emit the warning at lower logging levels. > > # Code-level observations > > * It feels a bit awkward to have the varid assignment logic in a separate > function; what about adding an argument with varid to > CreateVariableDestReceiver? SetVariableDestReceiverVarid could still > be used for CreateDestReceiver. > > /* > * Initially create a DestReceiver object. > */ > DestReceiver * > CreateVariableDestReceiver(void) > > /* > * Set parameters for a VariableDestReceiver. > * Should be called right after creating the DestReceiver. > */ > void > SetVariableDestReceiverVarid(DestReceiver *self, Oid varid) > > * It's worth adding a comment here explaining why it's fine to use > InvalidOid here: > > if (pstmt->commandType != CMD_UTILITY) > - ExplainOnePlan(pstmt, into, es, query_string, paramLI, > queryEnv, > + ExplainOnePlan(pstmt, into, InvalidOid, es, query_string, > paramLI, queryEnv, >, (es->buffers ? : > NULL)); > > My understanding is that since LetStmt is CMD_UTILITY, this branch > will never be visited for a session variable. 
> > * IIUC this one is introduced to exclude session variables from the normal > path with EXPR_KIND_UPDATE_TARGET: > > + EXPR_KIND_ASSIGN_VARIABLE, /* PL/pgSQL assignment target - > disallow > +* session > variables */ > > But the name doesn't sound right; maybe the longer > EXPR_KIND_UPDATE_TARGET_NO_VARS is better? > > * I'm curious about this one: exactly which part does this change cover? > > @@ -4888,21 +4914,43 @@ substitute_actual_parameters_mutator(Node *node, > - if (param->paramkind != PARAM_EXTERN) > + if (param->paramkind != PARAM_EXTERN && > + param->paramkind != PARAM_VARIABLE) > elog(ERROR, "unexpected paramkind: %d", (int) > param->paramkind); > > I've commented it out, but no tests were affected. > > * Does it mean there could be theoretically two LET statements at the > same time with different command type,
Re: postgres_fdw test timeouts
Hello Thomas,

03.12.2023 02:48, Thomas Munro wrote:
> Thanks for finding this correlation. Yeah, poking around in the cfbot history database I see about 1 failure like that per day since that date, and there doesn't seem to be anything else as obviously likely to be related to wakeups and timeouts. I don't understand what's wrong with the logic, and I think it would take someone willing to debug it locally to figure that out. Unless someone has an idea, I'm leaning towards reverting that commit and leaving the relatively minor problem that it was intended to fix as a TODO.

I've managed to reproduce the failure locally when running postgres_fdw_x/regress in parallel (--num-processes 10). It reproduced for me on 04a09ee94 (iterations 1, 2, 4), but not on 04a09ee94~1 (30 iterations passed).

I'm going to investigate this case within days. Maybe we could find a better fix for the issue.

Best regards, Alexander
Re: Patch: Improve Boolean Predicate JSON Path Docs
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            tested, passed

I took a look at this commit, and it looks correct to me.
Re: postgres_fdw test timeouts
On Fri, Dec 1, 2023 at 9:38 AM Nathan Bossart wrote: > AFAICT the failures began around September 10th, which leads me to wonder > if this is related to commit 04a09ee. That is little more than a wild > guess, though. I haven't been able to deduce much else from the logs I can > find, and I didn't find any previous reports about this in the archives > after lots of searching, so I thought I'd at least park these notes here in > case anyone else has ideas. Thanks for finding this correlation. Yeah, poking around in the cfbot history database I see about 1 failure like that per day since that date, and there doesn't seem to be anything else as obviously likely to be related to wakeups and timeouts. I don't understand what's wrong with the logic, and I think it would take someone willing to debug it locally to figure that out. Unless someone has an idea, I'm leaning towards reverting that commit and leaving the relatively minor problem that it was intended to fix as a TODO.
Re: Is WAL_DEBUG related code still relevant today?
Nathan Bossart writes: > On Sat, Dec 02, 2023 at 07:36:29PM +0530, Bharath Rupireddy wrote: >> I started to think if this code is needed at all in production. How >> about we do either of the following? > Well, the fact that this code is hidden behind an off-by-default macro > seems like a pretty strong indicator that it is not intended for > production. But that doesn't mean we should remove it. Agreed, production is not the question here. The question is whether it's of any use to developers either. It looks to me that the code's been broken since v13, if not before, which very strongly suggests that nobody is using it. Think I'd vote for nuking it rather than putting effort into fixing it. regards, tom lane
Re: Emitting JSON to file using COPY TO
On 12/2/23 13:50, Maciek Sakrejda wrote: On Fri, Dec 1, 2023 at 11:32 AM Joe Conway wrote: 1. Is supporting JSON array format sufficient, or does it need to support some other options? How flexible does the support scheme need to be? "JSON Lines" is a semi-standard format [1] that's basically just newline-separated JSON values. (In fact, this is what log_destination=jsonlog gives you for Postgres logs, no?) It might be worthwhile to support that, too. [1]: https://jsonlines.org/ Yes, I have seen examples of that associated with other databases (MSSQL and Duckdb at least) as well. It probably makes sense to support that format too. -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Emitting JSON to file using COPY TO
On 12/2/23 16:53, Nathan Bossart wrote: On Sat, Dec 02, 2023 at 10:11:20AM -0500, Tom Lane wrote: So if you are writing a production that might need to match FORMAT followed by JSON, you need to match FORMAT_LA too. Thanks for the pointer. That does seem to be the culprit.

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d631ac89a9..048494dd07 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3490,6 +3490,10 @@ copy_generic_opt_elem:
 				{
 					$$ = makeDefElem($1, $2, @1);
 				}
+			| FORMAT_LA copy_generic_opt_arg
+				{
+					$$ = makeDefElem("format", $2, @1);
+				}
 		;
 
 copy_generic_opt_arg:

Yep -- I concluded the same. Thanks Tom! -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Is WAL_DEBUG related code still relevant today?
On Sat, Dec 02, 2023 at 07:36:29PM +0530, Bharath Rupireddy wrote: > I started to think if this code is needed at all in production. How > about we do either of the following? Well, the fact that this code is hidden behind an off-by-default macro seems like a pretty strong indicator that it is not intended for production. But that doesn't mean we should remove it. > a) Remove the WAL_DEBUG macro and move all the code under the > wal_debug GUC? Since the GUC is already marked as DEVELOPER_OPTION, > the users will know the consequences of enabling it in production. I think the key to this option is verifying there's no measurable performance impact. > b) Remove both the WAL_DEBUG macro and the wal_debug GUC. I don't > think (2) is needed to be in core especially when tools like > pg_walinspect and pg_waldump can do the same job. And, the messages in > (3) and (4) can be turned to some DEBUGX level without being under the > WAL_DEBUG macro. Is there anything provided by wal_debug that can't be found via pg_walinspect/pg_waldump? > I have no idea if anyone uses WAL_DEBUG macro and wal_debug GUCs in > production, if we have somebody using it, I think we need to fix the > compilation and test failure issues, and start testing this code > (perhaps I can think of setting up a buildfarm member to help here). +1 for at least fixing the code and tests, provided we decide to keep it. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: about help message for new pg_dump's --filter option
> On 2 Dec 2023, at 17:02, Alvaro Herrera wrote: > > On 2023-Nov-30, Kyotaro Horiguchi wrote: > >> Hello. >> >> Recently, a new --filter option was added to pg_dump. I might be >> wrong, but the syntax of the help message for this feels off. Is the >> word 'on' not necessary after 'based'? >> >>> --filter=FILENAME    include or exclude objects and data from dump >>> based expressions in FILENAME > > Isn't this a bit too long? I was trying to come up with a shorter description but didn't come up with one that clearly enough described what it does. > Maybe we can do something shorter, like > > --filter=FILENAME    determine objects in dump based on FILENAME I don't think that's an improvement really, it's not obvious what "determine objects" means. How about a variation along these lines?: --filter=FILENAME    include/exclude objects based on rules in FILENAME If we want to use less horizontal space we can replace FILENAME with FILE, though I'd prefer not to since FILENAME is already used in the help output for --file, setting a precedent. -- Daniel Gustafsson
Re: Emitting JSON to file using COPY TO
On Sat, Dec 02, 2023 at 10:11:20AM -0500, Tom Lane wrote: > So if you are writing a production that might need to match > FORMAT followed by JSON, you need to match FORMAT_LA too. Thanks for the pointer. That does seem to be the culprit.

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d631ac89a9..048494dd07 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3490,6 +3490,10 @@ copy_generic_opt_elem:
 				{
 					$$ = makeDefElem($1, $2, @1);
 				}
+			| FORMAT_LA copy_generic_opt_arg
+				{
+					$$ = makeDefElem("format", $2, @1);
+				}
 		;
 
 copy_generic_opt_arg:

-- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Proposal obfuscate password in pg logs
Guanqun Yang writes: > We notice Postgres logs, pg_stat_statements and pg_stat_activity will > record passwords when using "CREATE" statement to create user with > password. Can we provide users with an option to obfuscate those passwords? See the many, many prior discussions of this idea. The short answer is that you're better off securing your logs. regards, tom lane
Re: Proposal: In-flight explain logging
Hello Maciek, Thanks for pointing that out. They are indeed super similar. Before I wrote the patch I searched for "explain" related ones. I guess I should have performed a better search. Comparing the patches, there is one main difference: the existing patch prints only the plan without any instrumentation details of the current execution state at that particular time. That is precisely what I am looking for with this new feature. I guess the way to proceed here would be to use the already existing patch as a lot of work was done there already.
Proposal obfuscate password in pg logs
hey guys, We notice Postgres logs, pg_stat_statements and pg_stat_activity will record passwords when using "CREATE" statement to create user with password. Can we provide users with an option to obfuscate those passwords? Yours, Guanqun
Re: Proposal: In-flight explain logging
Have you seen the other recent patch regarding this? [1] The mailing list thread was active pretty recently. The submission is marked as Needs Review. I haven't looked at either patch, but the proposals are very similar as I understand it. [1]: https://commitfest.postgresql.org/45/4345/
Proposal: In-flight explain logging
Hello hackers,

# MOTIVATION

My recent experiences with problematic queries at customer sites motivated me to write this patch, proposing a new feature to enhance visibility into what active queries are doing. PostgreSQL already offers 2 very powerful tools for query troubleshooting:

- EXPLAIN: gives us hints on potential bottlenecks in an execution plan.
- EXPLAIN ANALYZE: shows precisely where bottlenecks are, but the query must finish.

In my humble opinion we are missing something in the middle. Having visibility over in-flight queries would provide more insight than a plain EXPLAIN and would allow us to analyze super problematic queries that never finish an EXPLAIN ANALYZE execution. Considering that every active query has an execution plan, the new feature can target not only controlled EXPLAIN statements but also any query in progress. This allows us to identify whether a slow active query is using a different plan and why (for example, custom settings set at session level that are currently only visible to the backend).

# PROPOSAL

The feature works similarly to the recently introduced pg_log_backend_memory_contexts(). The patch adds the function pg_log_backend_explain_plan(PID), to be executed as superuser in a second backend, signaling the target backend to print execution plan details in the log. For regular queries (called without instrumentation) PG will log the plain explain along with useful details like custom settings. When targeting a query with instrumentation enabled, PG will log the complete EXPLAIN ANALYZE plan with the current row count and, if enabled, timing for each node. Considering that the query is in progress, the output will include the following per node:

- (never executed) for nodes that weren't touched yet (or may never be).
- (in progress) for nodes currently being executed, i.e., InstrStartNode was called and the clock is ticking there.

Parallel workers can be targeted too, where PG will log only the relevant part of the complete execution plan. 
# DEMONSTRATION

a) Targeting an invalid PG process:

postgres=# select pg_log_backend_explain_plan(1);
WARNING:  PID 1 is not a PostgreSQL server process
 pg_log_backend_explain_plan
-----------------------------
 f
(1 row)

b) Targeting a PG process not running a query:

postgres=# select pg_log_backend_explain_plan(24103);
 pg_log_backend_explain_plan
-----------------------------
 t
(1 row)

2023-12-02 16:30:19.979 UTC [24103] LOG:  PID 24103 not executing a statement with in-flight explain logging enabled

c) Targeting an active query without any instrumentation:

postgres=# select pg_log_backend_explain_plan(24103);
 pg_log_backend_explain_plan
-----------------------------
 t
(1 row)

2023-12-02 16:33:10.968 UTC [24103] LOG:  logging explain plan of PID 24103
        Query Text: select * from t2 a inner join t1 b on a.c1=b.c1 inner join t1 c on a.c1=c.c1 inner join t1 d on a.c1=d.c1 inner join t1 e on a.c1=e.c1;
        Gather  (cost=70894.63..202643.27 rows=100 width=20)
          Workers Planned: 2
          ->  Parallel Hash Join  (cost=69894.63..101643.27 rows=416667 width=20)
                Hash Cond: (a.c1 = e.c1)
                ->  Parallel Hash Join  (cost=54466.62..77218.65 rows=416667 width=16)
                      Hash Cond: (a.c1 = c.c1)
                      ->  Parallel Hash Join  (cost=15428.00..29997.42 rows=416667 width=8)
                            Hash Cond: (b.c1 = a.c1)
                            ->  Parallel Seq Scan on t1 b  (cost=0.00..8591.67 rows=416667 width=4)
                            ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4)
                                  ->  Parallel Seq Scan on t2 a  (cost=0.00..8591.67 rows=416667 width=4)
                      ->  Parallel Hash  (cost=32202.28..32202.28 rows=416667 width=8)
                            ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=8)
                                  Hash Cond: (c.c1 = d.c1)
                                  ->  Parallel Seq Scan on t1 c  (cost=0.00..8591.67 rows=416667 width=4)
                                  ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4)
                                        ->  Parallel Seq Scan on t1 d  (cost=0.00..8591.67 rows=416667 width=4)
                ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4)
                      ->  Parallel Seq Scan on t1 e  (cost=0.00..8591.67 rows=416667 width=4)
        Settings: max_parallel_workers_per_gather = '4'

d) Targeting a parallel query (and its parallel workers) with instrumentation: 
postgres=# select pid, backend_type, pg_log_backend_explain_plan(pid)
postgres=# from pg_stat_activity
postgres=# where (backend_type = 'client backend' and pid != pg_backend_pid())
postgres=#    or backend_type = 'parallel worker';
  pid  |  backend_type   | pg_log_backend_explain_plan
-------+-----------------+-----------------------------
 24103 | client backend  | t
 24389 | parallel worker | t
 24390 | parallel worker | t
(3 rows)

2023-12-02 16:36:34.840 UTC [24103] LOG:  logging explain plan of PID 24103
        Query Text: explain (analyze, buffers) select * from t2 a inner join t1 b
Re: Emitting JSON to file using COPY TO
On Fri, Dec 1, 2023 at 11:32 AM Joe Conway wrote: > 1. Is supporting JSON array format sufficient, or does it need to > support some other options? How flexible does the support scheme need to be? "JSON Lines" is a semi-standard format [1] that's basically just newline-separated JSON values. (In fact, this is what log_destination=jsonlog gives you for Postgres logs, no?) It might be worthwhile to support that, too. [1]: https://jsonlines.org/
Re: SQL:2011 application time
On Thu, Nov 23, 2023 at 1:08 AM Peter Eisentraut wrote: > After further thought, I think the right solution is to change > btree_gist (and probably also btree_gin) to use the common RT* strategy > numbers. Okay. That will mean bumping the version of btree_gist, and you must be running that version to use the new temporal features, or you will get silly results. Right? Is there a way to protect users against that and communicate that they need to upgrade the extension? This also means temporal features may not work with custom GiST opclasses. What we're saying is they must have an appropriate operator for RTEqualStrategyNumber (18) and RTOverlapStrategyNumber (3). Equal matters for the scalar key part(s), overlap for the range part. So equal is more likely to be an issue, but overlap matters if we want to support non-ranges (which I'd say is worth doing). Also, if they get it wrong, we won't really have any way to report an error. I did some research on other extensions in contrib, as well as PostGIS. Here is what I found:

## btree_gin
3 is =
18 is undefined
same for all types: macaddr8, int2, int4, int8, float4, float8, oid, timestamp, timestamptz, time, timetz, date, interval, inet, cidr, text, varchar, char, bytea, bit, varbit, numeric, anyenum, uuid, name, bool, bpchar

## cube
3 is &&
18 is <=>

## intarray
3 is &&
18 is undefined

## ltree
3 is =
18 is undefined

## hstore
3 and 18 are undefined

## seg
3 is &&
18 is undefined

## postgis: geometry
3 is &&
18 is undefined

## postgis: geometry_nd
3 is &&&
18 is undefined

I thought about looking through pgxn for more, but I haven't yet. I may still do that. But already it seems like there is not much consistency. So what do you think of this idea instead?: We could add a new (optional) support function to GiST that translates "well-known" strategy numbers into the opclass's own strategy numbers. This would be support function 12. 
Then we can say translateStrategyNumber(RTEqualStrategyNumber) and look up the operator with the result. There is not a performance hit, because we do this for the DDL command (create pk/uq/fk), then store the operator in the index/constraint. If you don't provide this new support function, then creating the pk/uq/fk fails with a hint about what you can do to make it work. This approach means we don't change the rules about GiST opclasses: you can still use the stranums how you like. This function would also let me support non-range "temporal" foreign keys, where I'll need to build queries with && and maybe other operators. What do you think? -- Paul ~{:-) p...@illuminatedcomputing.com
Re: undetected deadlock in ALTER SUBSCRIPTION ... REFRESH PUBLICATION
Hi,

> thread. I think you can compare the timing of regression tests in > subscription, with and without the patch to show there is no > regression. And probably some tests with a large number of tables for > sync with very little data.

I have tested the regression test timings for subscription with and without the patch. I also did the timing test for sync of subscription with the publisher for 100 and 1000 tables respectively. I have attached the test script; the results of the timing test are as follows:

Time taken for test to run in Linux VM

Summary               | Subscription Test (sec) | 100 tables in pub and sub (sec) | 1000 tables in pub and sub (sec)
Without patch Release | 95.564                  | 7.877                           | 58.919
With patch Release    | 96.513                  | 6.533                           | 45.807

Time taken for test to run in another Linux VM

Summary               | Subscription Test (sec) | 100 tables in pub and sub (sec) | 1000 tables in pub and sub (sec)
Without patch Release | 109.8145                | 6.4675                          | 83.001
With patch Release    | 113.162                 | 7.947                           | 87.113

Time taken for test to run in Performance Machine Linux

Summary               | Subscription Test (sec) | 100 tables in pub and sub (sec) | 1000 tables in pub and sub (sec)
Without patch Release | 115.871                 | 6.656                           | 81.157
With patch Release    | 115.922                 | 6.7305                          | 81.1525

Thoughts?

Thanks and Regards, Shlok Kyal

034_tmp.pl Description: Binary data
Re: about help message for new pg_dump's --filter option
On 2023-Nov-30, Kyotaro Horiguchi wrote: > Hello. > > Recently, a new --filter option was added to pg_dump. I might be > wrong, but the syntax of the help message for this feels off. Is the > word 'on' not necessary after 'based'? > > > --filter=FILENAME    include or exclude objects and data from dump > > based expressions in FILENAME Isn't this a bit too long? Maybe we can do something shorter, like --filter=FILENAME    determine objects in dump based on FILENAME -- Álvaro Herrera    Breisgau, Deutschland — https://www.EnterpriseDB.com/ "Those who use electric razors are infidels destined to burn in hell while we drink from rivers of beer, download free vids and mingle with naked well shaved babes." (http://slashdot.org/comments.pl?sid=44793=4647152)
Re: Emitting JSON to file using COPY TO
Joe Conway writes: >> I noticed that, with the PoC patch, "json" is the only format that must be >> quoted. Without quotes, I see a syntax error. I'm assuming there's a >> conflict with another json-related rule somewhere in gram.y, but I haven't >> tracked down exactly which one is causing it. While I've not looked too closely, I suspect this might be due to the FORMAT_LA hack in base_yylex:

    /* Replace FORMAT by FORMAT_LA if it's followed by JSON */
    switch (next_token)
    {
        case JSON:
            cur_token = FORMAT_LA;
            break;
    }

So if you are writing a production that might need to match FORMAT followed by JSON, you need to match FORMAT_LA too. (I spent a little bit of time last week trying to get rid of FORMAT_LA, thinking that it didn't look necessary. Did not succeed yet.) regards, tom lane
Re: Emitting JSON to file using COPY TO
On 12/1/23 18:09, Nathan Bossart wrote: On Fri, Dec 01, 2023 at 02:28:55PM -0500, Joe Conway wrote: I did a quick PoC patch (attached) -- if there is interest and no hard objections I would like to get it up to speed for the January commitfest. Cool. I would expect there to be interest, given all the other JSON support that has been added thus far.

Thanks for the review.

I noticed that, with the PoC patch, "json" is the only format that must be quoted. Without quotes, I see a syntax error. I'm assuming there's a conflict with another json-related rule somewhere in gram.y, but I haven't tracked down exactly which one is causing it.

It seems to be because 'json' is also a type name ($$ = SystemTypeName("json")). What do you think about using 'json_array' instead? It is more specific and accurate, and avoids the need to quote.

test=# copy foo to stdout (format json_array);
[
 {"id":1,"f1":"line: 1","f2":"2023-12-01T12:58:16.776863-05:00"}
,{"id":2,"f1":"line: 2","f2":"2023-12-01T12:58:16.777084-05:00"}
,{"id":3,"f1":"line: 3","f2":"2023-12-01T12:58:16.777096-05:00"}
,{"id":4,"f1":"line: 4","f2":"2023-12-01T12:58:16.777103-05:00"}
]

1. Is supporting JSON array format sufficient, or does it need to support some other options? How flexible does the support scheme need to be? I don't presently have a strong opinion on this one. My instinct would be to start with something simple, though. I don't think we offer any special options for log_destination... WFM

2. This only supports COPY TO and we would undoubtedly want to support COPY FROM for JSON as well, but is that required from the start? I would vote for including COPY FROM support from the start. Check. My thought is to only accept the same format we emit -- i.e. only take a json array.

! if (!cstate->opts.json_mode)

I think it's unfortunate that this further complicates the branching in CopyOneRowTo(), but after some quick glances at the code, I'm not sure it's worth refactoring a bunch of stuff to make this nicer. 
Yeah that was my conclusion. -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Is WAL_DEBUG related code still relevant today?
Hi, I was recently looking at the code around the WAL_DEBUG macro and GUC. When enabled, the code does the following: 1. Creates a memory context that allows pallocs within critical sections. 2. Decodes (not logical decoding but DecodeXLogRecord()) every WAL record using the above memory context that's generated in the server and emits a LOG message. 3. Emits messages at DEBUG level in AdvanceXLInsertBuffer(), at LOG level in XLogFlush(), at LOG level in XLogBackgroundFlush(). 4. Emits messages at LOG level for every record that the server replays/applies in the main redo loop. I enabled this code by compiling with the WAL_DEBUG macro and setting wal_debug GUC to on. Firstly, the compilation on Windows failed because XL_ROUTINE was passed inappropriately for XLogReaderAllocate() used. After fixing the compilation issue [1], the TAP tests started to fail [2] which I'm sure we can fix. I started to think if this code is needed at all in production. How about we do either of the following? a) Remove the WAL_DEBUG macro and move all the code under the wal_debug GUC? Since the GUC is already marked as DEVELOPER_OPTION, the users will know the consequences of enabling it in production. b) Remove both the WAL_DEBUG macro and the wal_debug GUC. I don't think (2) is needed to be in core especially when tools like pg_walinspect and pg_waldump can do the same job. And, the messages in (3) and (4) can be turned to some DEBUGX level without being under the WAL_DEBUG macro. I have no idea if anyone uses WAL_DEBUG macro and wal_debug GUCs in production, if we have somebody using it, I think we need to fix the compilation and test failure issues, and start testing this code (perhaps I can think of setting up a buildfarm member to help here). I'm in favour of option (b), but I'd like to hear more thoughts on this. 
[1]
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ca7100d4db..52633793d4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1023,8 +1023,12 @@ XLogInsertRecord(XLogRecData *rdata,
 			palloc(DecodeXLogRecordRequiredSpace(record->xl_tot_len));
 
 		if (!debug_reader)
-			debug_reader = XLogReaderAllocate(wal_segment_size, NULL,
-											  XL_ROUTINE(), NULL);
+			debug_reader = XLogReaderAllocate(wal_segment_size,
+											  NULL,
+											  XL_ROUTINE(.page_read = NULL,
+														 .segment_open = NULL,
+														 .segment_close = NULL),
+											  NULL);

[2]
src/test/subscription/t/029_on_error.pl, because the test gets an LSN from an error context message emitted to the server logs, which the new WAL_DEBUG LOG messages flood the server logs with.
src/bin/initdb/t/001_initdb.pl, because the WAL_DEBUG LOG messages are emitted to the console during initdb.

-- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Remove unnecessary includes of system headers in header files
On 01.12.23 17:41, Tom Lane wrote: Peter Eisentraut writes: I noticed that some header files included system header files for no apparent reason, so I did some digging and found out that in a few cases the original reason has disappeared. So I propose the attached patches to remove the unnecessary includes. Seems generally reasonable. Have you checked that headerscheck and cpluspluscheck are happy? Yes, I ran it through Cirrus, which includes those checks.
Re: Change GUC hashtable to use simplehash?
On Wed, Nov 29, 2023 at 8:31 PM John Naylor wrote: > > Attached is a rough start with Andres's earlier ideas, to get > something concrete out there. While looking at the assembly out of curiosity, I found a couple of bugs in the split API that I've fixed locally. I think the path forward is:

- performance measurements with both byte-at-a-time and word-at-a-time, once I make sure they're fixed
- based on the above, decide which one is best for guc_name_hash
- clean up the hash function implementation
- test with a new guc_name_compare (using what we learned from my guc_name_eq) and see how well we do with keeping dynahash vs. simplehash

Separately, for string_hash:

- run SMHasher and see about reincorporating length in the calculation.

v5 should be a clear improvement in collision behavior over the current guc_name_hash, but we need to make sure it's at least as good as hash_bytes, and ideally not lose anything compared to standard fast_hash.