Re: postgres_fdw test timeouts

2023-12-02 Thread Thomas Munro
On Sun, Dec 3, 2023 at 6:00 PM Alexander Lakhin  wrote:
> I've managed to reproduce the failure locally when running postgres_fdw_x/
> regress in parallel (--num-processes 10). It reproduced for me
> on 04a09ee94 (iterations 1, 2, 4), but not on 04a09ee94~1 (30 iterations
> passed).
>
> I'm going to investigate this case within days. Maybe we could find a
> better fix for the issue.

Thanks.  One thing I can recommend to anyone trying to understand the
change is that you view it with:

git show --ignore-all-space 04a09ee

... because it changed a lot of indentation when wrapping a bunch of
stuff in a new for loop.




Re: Schema variables - new implementation for Postgres 15

2023-12-02 Thread Pavel Stehule
Hi

ne 26. 11. 2023 v 18:56 odesílatel Dmitry Dolgov <9erthali...@gmail.com>
napsal:

> > On Sat, Nov 18, 2023 at 06:28:53PM +0100, Pavel Stehule wrote:
> > so 18. 11. 2023 v 15:54 odesílatel Dmitry Dolgov <9erthali...@gmail.com>
> > napsal:
> > > As a side note, I intend to go one more time through the first few
> > > patches introducing the basic functionality, and then mark it as ready
> > > in CF. I haven't been able to break the patch in testing for quite a
> > > long time, and for most parts the changes make sense to me.
> >
> > I marked pg_session_variables function as PARALLEL RESTRICTED, and did
> > rebase
>
> So, after one week of uninterrupted evening reviews I've made it through
> the first four patches :)
>
> It's a decent job -- more than once, looking at the code, I thought I
> could construct a case when it's going to blow up, but everything was
> working just fine. Yet, I think the patch still has to be reshaped a bit
> before moving forward. I've got a couple of proposals of a different nature:
> high level changes (you probably won't like some of them, but I'm sure
> they're going to be useful), technical code-level improvements/comments,
> and a few language changes. With those changes in mind I would be
> satisfied with the patch, and hopefully they would also make it easier
> for a potential committer to pick it up.
>
> # High level proposals
>
> * I would suggest reducing the scope of the patch as much as possible,
>   and not just by trimming on the edges, but rather following Phileas
>   Fogg's example with the steamboat Henrietta -- get rid of all
>   non-essential parts. This will make this rather large patch more
>   approachable for others.
>
>   For that one can concentrate only on the first two patches plus the
>   fourth one (memory cleanup after dropping variables), leaving DISCARD,
>   ON TRANSACTION END, DEFAULT, IMMUTABLE for the follow-up in the
>   future.
>
>   Another thing in this context would be to evaluate plpgsql support for
>   this feature. You know the use case better than me -- how important is
>   it? Is it an intrinsic part of the feature, or could session variables
>   still be valuable enough even without plpgsql? From what I see,
>   postponing plpgsql will make everything about ~800 lines lighter (most
>   likely more), and also allow us to ignore a couple of concerns about
>   the implementation (more on this later).
>
> * The new GUC session_variables_ambiguity_warning is definitely going to
>   cause many objections; it's another knob to manage a very subtle
>   behaviour detail very few people will ever notice. I see the point
>   behind warning about ambiguity, so probably it makes sense to bite the
>   bullet and decide one way or another. The proposal is to always warn
>   in potentially ambiguous situations, and if concerns are high about
>   logging too much, maybe emit the warning at a lower logging level.
>
> # Code-level observations
>
> * It feels a bit awkward to have the varid assignment logic in a separate
>   function; what about adding a varid argument to
>   CreateVariableDestReceiver? SetVariableDestReceiverVarid could still
>   be used for CreateDestReceiver.
>
> /*
>  * Initially create a DestReceiver object.
>  */
> DestReceiver *
> CreateVariableDestReceiver(void)
>
> /*
>  * Set parameters for a VariableDestReceiver.
>  * Should be called right after creating the DestReceiver.
>  */
> void
> SetVariableDestReceiverVarid(DestReceiver *self, Oid varid)
>
> * It's worth adding a comment here explaining why it's fine to use
>   InvalidOid:
>
>  if (pstmt->commandType != CMD_UTILITY)
> -   ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
> +   ExplainOnePlan(pstmt, into, InvalidOid, es, query_string, paramLI, queryEnv,
>    &planduration, (es->buffers ? &bufusage : NULL));
>
>   My understanding is that since LetStmt is CMD_UTILITY, this branch
>   will never be visited for a session variable.
>
> * IIUC this one is introduced to exclude session variables from the normal
>   path with EXPR_KIND_UPDATE_TARGET:
>
> +   EXPR_KIND_ASSIGN_VARIABLE,  /* PL/pgSQL assignment target -
> +* disallow session variables */
>
>   But the name doesn't sound right; maybe the longer
>   EXPR_KIND_UPDATE_TARGET_NO_VARS would be better?
>
> * I'm curious about this one -- exactly which part does this change cover?
>
> @@ -4888,21 +4914,43 @@ substitute_actual_parameters_mutator(Node *node,
> -   if (param->paramkind != PARAM_EXTERN)
> +   if (param->paramkind != PARAM_EXTERN &&
> +   param->paramkind != PARAM_VARIABLE)
> elog(ERROR, "unexpected paramkind: %d", (int) param->paramkind);
>
>   I've commented it out, but no tests were affected.
>
> * Does it mean there could be theoretically two LET statements at the
>   same time with different command type, 

Re: postgres_fdw test timeouts

2023-12-02 Thread Alexander Lakhin

Hello Thomas,

03.12.2023 02:48, Thomas Munro wrote:

Thanks for finding this correlation.  Yeah, poking around in the cfbot
history database I see about 1 failure like that per day since that
date, and there doesn't seem to be anything else as obviously likely
to be related to wakeups and timeouts.  I don't understand what's
wrong with the logic, and I think it would take someone willing to
debug it locally to figure that out.  Unless someone has an idea, I'm
leaning towards reverting that commit and leaving the relatively minor
problem that it was intended to fix as a TODO.


I've managed to reproduce the failure locally when running postgres_fdw_x/
regress in parallel (--num-processes 10). It reproduced for me
on 04a09ee94 (iterations 1, 2, 4), but not on 04a09ee94~1 (30 iterations
passed).

I'm going to investigate this case within days. Maybe we could find a
better fix for the issue.

Best regards,
Alexander




Re: Patch: Improve Boolean Predicate JSON Path Docs

2023-12-02 Thread shihao zhong
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:tested, passed

I took a look at this patch, and it looks correct to me.

Re: postgres_fdw test timeouts

2023-12-02 Thread Thomas Munro
On Fri, Dec 1, 2023 at 9:38 AM Nathan Bossart  wrote:
> AFAICT the failures began around September 10th, which leads me to wonder
> if this is related to commit 04a09ee.  That is little more than a wild
> guess, though.  I haven't been able to deduce much else from the logs I can
> find, and I didn't find any previous reports about this in the archives
> after lots of searching, so I thought I'd at least park these notes here in
> case anyone else has ideas.

Thanks for finding this correlation.  Yeah, poking around in the cfbot
history database I see about 1 failure like that per day since that
date, and there doesn't seem to be anything else as obviously likely
to be related to wakeups and timeouts.  I don't understand what's
wrong with the logic, and I think it would take someone willing to
debug it locally to figure that out.  Unless someone has an idea, I'm
leaning towards reverting that commit and leaving the relatively minor
problem that it was intended to fix as a TODO.




Re: Is WAL_DEBUG related code still relevant today?

2023-12-02 Thread Tom Lane
Nathan Bossart  writes:
> On Sat, Dec 02, 2023 at 07:36:29PM +0530, Bharath Rupireddy wrote:
>> I started to think if this code is needed at all in production. How
>> about we do either of the following?

> Well, the fact that this code is hidden behind an off-by-default macro
> seems like a pretty strong indicator that it is not intended for
> production.  But that doesn't mean we should remove it. 

Agreed, production is not the question here.  The question is whether
it's of any use to developers either.  It looks to me that the code's
been broken since v13, if not before, which very strongly suggests
that nobody is using it.  Think I'd vote for nuking it rather than
putting effort into fixing it.

regards, tom lane




Re: Emitting JSON to file using COPY TO

2023-12-02 Thread Joe Conway

On 12/2/23 13:50, Maciek Sakrejda wrote:

On Fri, Dec 1, 2023 at 11:32 AM Joe Conway  wrote:

1. Is supporting JSON array format sufficient, or does it need to
support some other options? How flexible does the support scheme need to be?


"JSON Lines" is a semi-standard format [1] that's basically just
newline-separated JSON values. (In fact, this is what
log_destination=jsonlog gives you for Postgres logs, no?) It might be
worthwhile to support that, too.

[1]: https://jsonlines.org/



Yes, I have seen examples of that associated with other databases (MSSQL 
and Duckdb at least) as well. It probably makes sense to support that 
format too.


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Emitting JSON to file using COPY TO

2023-12-02 Thread Joe Conway

On 12/2/23 16:53, Nathan Bossart wrote:

On Sat, Dec 02, 2023 at 10:11:20AM -0500, Tom Lane wrote:

So if you are writing a production that might need to match
FORMAT followed by JSON, you need to match FORMAT_LA too.


Thanks for the pointer.  That does seem to be the culprit.

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d631ac89a9..048494dd07 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3490,6 +3490,10 @@ copy_generic_opt_elem:
  {
  $$ = makeDefElem($1, $2, @1);
  }
+| FORMAT_LA copy_generic_opt_arg
+{
+$$ = makeDefElem("format", $2, @1);
+}
  ;
  
  copy_generic_opt_arg:



Yep -- I concluded the same. Thanks Tom!

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Is WAL_DEBUG related code still relevant today?

2023-12-02 Thread Nathan Bossart
On Sat, Dec 02, 2023 at 07:36:29PM +0530, Bharath Rupireddy wrote:
> I started to think if this code is needed at all in production. How
> about we do either of the following?

Well, the fact that this code is hidden behind an off-by-default macro
seems like a pretty strong indicator that it is not intended for
production.  But that doesn't mean we should remove it. 

> a) Remove the WAL_DEBUG macro and move all the code under the
> wal_debug GUC? Since the GUC is already marked as DEVELOPER_OPTION,
> the users will know the consequences of enabling it in production.

I think the key to this option is verifying there's no measurable
performance impact.

> b) Remove both the WAL_DEBUG macro and the wal_debug GUC. I don't
> think (2) needs to be in core, especially when tools like
> pg_walinspect and pg_waldump can do the same job. And, the messages in
> (3) and (4) can be turned into some DEBUGX level without being under the
> WAL_DEBUG macro.

Is there anything provided by wal_debug that can't be found via
pg_walinspect/pg_waldump?

> I have no idea if anyone uses the WAL_DEBUG macro and wal_debug GUC in
> production. If we have somebody using it, I think we need to fix the
> compilation and test failure issues, and start testing this code
> (perhaps I can think of setting up a buildfarm member to help here).

+1 for at least fixing the code and tests, provided we decide to keep it.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: about help message for new pg_dump's --filter option

2023-12-02 Thread Daniel Gustafsson
> On 2 Dec 2023, at 17:02, Alvaro Herrera  wrote:
> 
> On 2023-Nov-30, Kyotaro Horiguchi wrote:
> 
>> Hello.
>> 
>> Recently, a new --filter option was added to pg_dump. I might be
>> wrong, but the syntax of the help message for this feels off. Is the
>> word 'on' not necessary after 'based'?
>> 
>>> --filter=FILENAMEinclude or exclude objects and data from dump
>>>  based expressions in FILENAME
> 
> Isn't this a bit too long?

I tried to come up with a shorter description, but couldn't find one that
described what it does clearly enough.

>  Maybe we can do something shorter, like
> 
>  --filter=FILENAMEdetermine objects in dump based on FILENAME

I don't think that's an improvement really; it's not obvious what "determine
objects" means.  How about a variation along these lines?

--filter=FILENAMEinclude/exclude objects based on rules in FILENAME

If we want to use less horizontal space we can replace FILENAME with FILE,
though I'd prefer not to since FILENAME is already used in the help output for
--file setting a precedent.

--
Daniel Gustafsson





Re: Emitting JSON to file using COPY TO

2023-12-02 Thread Nathan Bossart
On Sat, Dec 02, 2023 at 10:11:20AM -0500, Tom Lane wrote:
> So if you are writing a production that might need to match
> FORMAT followed by JSON, you need to match FORMAT_LA too.

Thanks for the pointer.  That does seem to be the culprit.

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d631ac89a9..048494dd07 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3490,6 +3490,10 @@ copy_generic_opt_elem:
 {
 $$ = makeDefElem($1, $2, @1);
 }
+| FORMAT_LA copy_generic_opt_arg
+{
+$$ = makeDefElem("format", $2, @1);
+}
 ;
 
 copy_generic_opt_arg:

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Proposal obfuscate password in pg logs

2023-12-02 Thread Tom Lane
Guanqun Yang  writes:
> We notice Postgres logs, pg_stat_statements and pg_stat_activity will
> record passwords when using "CREATE" statement to create user with
> password. Can we provide users with an option to obfuscate those passwords?

See the many, many prior discussions of this idea.
The short answer is that you're better off securing your logs.

regards, tom lane




Re: Proposal: In-flight explain logging

2023-12-02 Thread Rafael Thofehrn Castro
Hello Maciek,

Thanks for pointing that out. They are indeed super similar. Before I wrote
the patch I searched for "explain"-related ones; I guess I should have
performed a better search.

Comparing the patches, there is one main difference: the existing patch
prints only the plan, without any instrumentation details of the current
execution state at that particular time -- and those details are precisely
what I am looking for with this new feature.

I guess the way to proceed here would be to use the already existing patch,
as a lot of work was done there already.


Proposal obfuscate password in pg logs

2023-12-02 Thread Guanqun Yang
hey guys,

We notice Postgres logs, pg_stat_statements and pg_stat_activity will
record passwords when using "CREATE" statement to create user with
password. Can we provide users with an option to obfuscate those passwords?

Yours,
Guanqun


Re: Proposal: In-flight explain logging

2023-12-02 Thread Maciek Sakrejda
Have you seen the other recent patch regarding this? [1] The mailing
list thread was active pretty recently. The submission is marked as
Needs Review. I haven't looked at either patch, but the proposals are
very similar as I understand it.

[1]: https://commitfest.postgresql.org/45/4345/




Proposal: In-flight explain logging

2023-12-02 Thread Rafael Thofehrn Castro
Hello hackers,

# MOTIVATION

My recent experiences with problematic queries in customers motivated
me to write this patch proposing a new feature to enhance visibility
on what active queries are doing.

PostgreSQL already offers 2 very powerful tools for query troubleshooting:

- EXPLAIN: gives us hints on potential bottlenecks in an execution plan.

- EXPLAIN ANALYZE: shows precisely where bottlenecks are, but the query
must finish.

In my humble opinion we are missing something in the middle. Having
visibility over in-flight queries would provide more insights than a plain
EXPLAIN and would allow us to analyze super problematic queries that never
finish an EXPLAIN ANALYZE execution.

Considering that every active query has an execution plan, the new feature
can target not only controlled EXPLAIN statements but also any query in
progress. This allows us to identify if a slow active query is using a
different plan, and why (for example, custom settings set at session level
that are currently only visible to the backend).

# PROPOSAL

The feature works similarly to the recently introduced
pg_log_backend_memory_contexts().

The patch adds the function pg_log_backend_explain_plan(PID), to be executed
as superuser from a second backend, which signals the target backend to
print execution plan details in the log.

For regular queries (called without instrumentation) PG will log the plain
explain along with useful details like custom settings.

When targeting a query with instrumentation enabled PG will log the complete
EXPLAIN ANALYZE plan with current row count and, if enabled, timing for each
node. Considering that the query is in progress the output will include the
following per node:

- (never executed) for nodes that weren't touched yet (or may never be).
- (in progress) for nodes currently being executed, i.e., InstrStartNode
  was called and the clock is ticking there.

Parallel workers can be targeted too, in which case PG will log only the
relevant part of the complete execution plan.

# DEMONSTRATION

a) Targeting an invalid PG process:

postgres=# select pg_log_backend_explain_plan(1);
WARNING:  PID 1 is not a PostgreSQL server process
 pg_log_backend_explain_plan
-----------------------------
 f
(1 row)

b) Targeting a PG process not running a query:

postgres=# select pg_log_backend_explain_plan(24103);
 pg_log_backend_explain_plan
-----------------------------
 t
(1 row)

2023-12-02 16:30:19.979 UTC [24103] LOG:  PID 24103 not executing a
statement with in-flight explain logging enabled

c) Targeting an active query without any instrumentation:

postgres=# select pg_log_backend_explain_plan(24103);
 pg_log_backend_explain_plan
-----------------------------
 t
(1 row)

2023-12-02 16:33:10.968 UTC [24103] LOG:  logging explain plan of PID 24103
Query Text: select *
from t2 a
inner join t1 b on a.c1=b.c1
inner join t1 c on a.c1=c.c1
inner join t1 d on a.c1=d.c1
inner join t1 e on a.c1=e.c1;
Gather  (cost=70894.63..202643.27 rows=100 width=20)
  Workers Planned: 2
  ->  Parallel Hash Join  (cost=69894.63..101643.27 rows=416667 width=20)
        Hash Cond: (a.c1 = e.c1)
        ->  Parallel Hash Join  (cost=54466.62..77218.65 rows=416667 width=16)
              Hash Cond: (a.c1 = c.c1)
              ->  Parallel Hash Join  (cost=15428.00..29997.42 rows=416667 width=8)
                    Hash Cond: (b.c1 = a.c1)
                    ->  Parallel Seq Scan on t1 b  (cost=0.00..8591.67 rows=416667 width=4)
                    ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4)
                          ->  Parallel Seq Scan on t2 a  (cost=0.00..8591.67 rows=416667 width=4)
              ->  Parallel Hash  (cost=32202.28..32202.28 rows=416667 width=8)
                    ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=8)
                          Hash Cond: (c.c1 = d.c1)
                          ->  Parallel Seq Scan on t1 c  (cost=0.00..8591.67 rows=416667 width=4)
                          ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4)
                                ->  Parallel Seq Scan on t1 d  (cost=0.00..8591.67 rows=416667 width=4)
        ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4)
              ->  Parallel Seq Scan on t1 e  (cost=0.00..8591.67 rows=416667 width=4)
Settings: max_parallel_workers_per_gather = '4'

d) Targeting a parallel query (and its parallel workers) with
instrumentation:

postgres=# select pid, backend_type, pg_log_backend_explain_plan(pid)
postgres-# from pg_stat_activity
postgres-# where (backend_type = 'client backend' and pid != pg_backend_pid())
postgres-#    or backend_type = 'parallel worker';
  pid  |  backend_type   | pg_log_backend_explain_plan
-------+-----------------+-----------------------------
 24103 | client backend  | t
 24389 | parallel worker | t
 24390 | parallel worker | t
(3 rows)

2023-12-02 16:36:34.840 UTC [24103] LOG:  logging explain plan of PID 24103
Query Text: explain (analyze, buffers)
select *
from t2 a
inner join t1 b 

Re: Emitting JSON to file using COPY TO

2023-12-02 Thread Maciek Sakrejda
On Fri, Dec 1, 2023 at 11:32 AM Joe Conway  wrote:
> 1. Is supporting JSON array format sufficient, or does it need to
> support some other options? How flexible does the support scheme need to be?

"JSON Lines" is a semi-standard format [1] that's basically just
newline-separated JSON values. (In fact, this is what
log_destination=jsonlog gives you for Postgres logs, no?) It might be
worthwhile to support that, too.

[1]: https://jsonlines.org/




Re: SQL:2011 application time

2023-12-02 Thread Paul Jungwirth

On Thu, Nov 23, 2023 at 1:08 AM Peter Eisentraut  wrote:
> After further thought, I think the right solution is to change
> btree_gist (and probably also btree_gin) to use the common RT* strategy
> numbers.

Okay. That will mean bumping the version of btree_gist, and you must be running that version to use 
the new temporal features, or you will get silly results. Right? Is there a way to protect users 
against that and communicate they need to upgrade the extension?


This also means temporal features may not work in custom GIST opclasses. What we're saying is they 
must have an appropriate operator for RTEqualStrategyNumber (18) and RTOverlapStrategyNumber (3). 
Equal matters for the scalar key part(s), overlap for the range part. So equal is more likely to be 
an issue, but overlap matters if we want to support non-ranges (which I'd say is worth doing).


Also if they get it wrong, we won't really have any way to report an error.

I did some research on other extensions in contrib, as well as PostGIS. Here is 
what I found:

## btree_gin:

3 is =
18 is undefined

same for all types: macaddr8, int2, int4, int8, float4, float8, oid, timestamp, timestamptz, time, 
timetz, date, interval, inet, cidr, text, varchar, char, bytea, bit, varbit, numeric, anyenum, uuid, 
name, bool, bpchar


## cube

3 is &&
18 is <=>

## intarray

3 is &&
18 is undefined

## ltree

3 is =
18 is undefined

## hstore

3 and 18 are undefined

## seg

3 is &&
18 is undefined

## postgis: geometry

3 is &&
18 is undefined

## postgis: geometry_nd

3 is &&&
18 is undefined

I thought about looking through pgxn for more, but I haven't yet. I may still 
do that.
But already it seems like there is not much consistency.

So what do you think of this idea instead?:

We could add a new (optional) support function to GiST that translates "well-known" strategy numbers 
into the opclass's own strategy numbers. This would be support function 12. Then we can say 
translateStrategyNumber(RTEqualStrategyNumber) and look up the operator with the result.


There is no performance hit, because we do this once for the DDL command (create
pk/uq/fk), then store the operator in the index/constraint.


If you don't provide this new support function, then creating the pk/uq/fk fails with a hint about 
what you can do to make it work.


This approach means we don't change the rules about GiST opclasses: you can still use the stranums 
how you like.


This function would also let me support non-range "temporal" foreign keys, where I'll need to build 
queries with && and maybe other operators.


What do you think?

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: undetected deadlock in ALTER SUBSCRIPTION ... REFRESH PUBLICATION

2023-12-02 Thread Shlok Kyal
Hi,

> thread. I think you can compare the timing of regression tests in
> subscription, with and without the patch to show there is no
> regression. And probably some tests with a large number of tables for
> sync with very little data.

I have tested the regression test timings for subscription with and
without the patch. I also timed the sync of a subscription with the
publisher for 100 and 1000 tables respectively.
I have attached the test script; the results of the timing test are as follows:

Time taken for tests to run in a Linux VM:

Summary               | Subscription test (sec) | 100 tables in pub and sub (sec) | 1000 tables in pub and sub (sec)
Without patch Release | 95.564                  | 7.877                           | 58.919
With patch Release    | 96.513                  | 6.533                           | 45.807

Time taken for tests to run in another Linux VM:

Summary               | Subscription test (sec) | 100 tables in pub and sub (sec) | 1000 tables in pub and sub (sec)
Without patch Release | 109.8145                | 6.4675                          | 83.001
With patch Release    | 113.162                 | 7.947                           | 87.113

Time taken for tests to run on a performance machine (Linux):

Summary               | Subscription test (sec) | 100 tables in pub and sub (sec) | 1000 tables in pub and sub (sec)
Without patch Release | 115.871                 | 6.656                           | 81.157
With patch Release    | 115.922                 | 6.7305                          | 81.1525

thoughts?

Thanks and Regards,
Shlok Kyal


034_tmp.pl
Description: Binary data


Re: about help message for new pg_dump's --filter option

2023-12-02 Thread Alvaro Herrera
On 2023-Nov-30, Kyotaro Horiguchi wrote:

> Hello.
> 
> Recently, a new --filter option was added to pg_dump. I might be
> wrong, but the syntax of the help message for this feels off. Is the
> word 'on' not necessary after 'based'?
> 
> >  --filter=FILENAMEinclude or exclude objects and data from dump
> >   based expressions in FILENAME

Isn't this a bit too long?  Maybe we can do something shorter, like
  
  --filter=FILENAMEdetermine objects in dump based on FILENAME

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Those who use electric razors are infidels destined to burn in hell while
we drink from rivers of beer, download free vids and mingle with naked
well shaved babes." (http://slashdot.org/comments.pl?sid=44793=4647152)




Re: Emitting JSON to file using COPY TO

2023-12-02 Thread Tom Lane
Joe Conway  writes:
>> I noticed that, with the PoC patch, "json" is the only format that must be
>> quoted.  Without quotes, I see a syntax error.  I'm assuming there's a
>> conflict with another json-related rule somewhere in gram.y, but I haven't
>> tracked down exactly which one is causing it.

While I've not looked too closely, I suspect this might be due to the
FORMAT_LA hack in base_yylex:

/* Replace FORMAT by FORMAT_LA if it's followed by JSON */
switch (next_token)
{
case JSON:
cur_token = FORMAT_LA;
break;
}

So if you are writing a production that might need to match
FORMAT followed by JSON, you need to match FORMAT_LA too.

(I spent a little bit of time last week trying to get rid of
FORMAT_LA, thinking that it didn't look necessary.  Did not
succeed yet.)

regards, tom lane




Re: Emitting JSON to file using COPY TO

2023-12-02 Thread Joe Conway

On 12/1/23 18:09, Nathan Bossart wrote:

On Fri, Dec 01, 2023 at 02:28:55PM -0500, Joe Conway wrote:

I did a quick PoC patch (attached) -- if there interest and no hard
objections I would like to get it up to speed for the January commitfest.


Cool.  I would expect there to be interest, given all the other JSON
support that has been added thus far.


Thanks for the review


I noticed that, with the PoC patch, "json" is the only format that must be
quoted.  Without quotes, I see a syntax error.  I'm assuming there's a
conflict with another json-related rule somewhere in gram.y, but I haven't
tracked down exactly which one is causing it.


It seems to be because 'json' is also a type name ($$ = SystemTypeName("json")).


What do you think about using 'json_array' instead? It is more specific 
and accurate, and avoids the need to quote.


test=# copy foo to stdout (format json_array);
[
 {"id":1,"f1":"line: 1","f2":"2023-12-01T12:58:16.776863-05:00"}
,{"id":2,"f1":"line: 2","f2":"2023-12-01T12:58:16.777084-05:00"}
,{"id":3,"f1":"line: 3","f2":"2023-12-01T12:58:16.777096-05:00"}
,{"id":4,"f1":"line: 4","f2":"2023-12-01T12:58:16.777103-05:00"}
]


1. Is supporting JSON array format sufficient, or does it need to support
some other options? How flexible does the support scheme need to be?


I don't presently have a strong opinion on this one.  My instinct would be
start with something simple, though.  I don't think we offer any special
options for log_destination...


WFM


2. This only supports COPY TO and we would undoubtedly want to support COPY
FROM for JSON as well, but is that required from the start?


I would vote for including COPY FROM support from the start.


Check. My thought is to only accept the same format we emit -- i.e. only 
take a json array.



!   if (!cstate->opts.json_mode)


I think it's unfortunate that this further complicates the branching in
CopyOneRowTo(), but after some quick glances at the code, I'm not sure it's
worth refactoring a bunch of stuff to make this nicer.


Yeah that was my conclusion.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Is WAL_DEBUG related code still relevant today?

2023-12-02 Thread Bharath Rupireddy
Hi,

I was recently looking at the code around the WAL_DEBUG macro and GUC.
When enabled, the code does the following:

1. Creates a memory context that allows pallocs within critical sections.
2. Decodes (not logical decoding but DecodeXLogRecord()) every WAL
record using the above memory context that's generated in the server
and emits a LOG message.
3. Emits messages at DEBUG level in AdvanceXLInsertBuffer(),  at LOG
level in XLogFlush(), at LOG level in XLogBackgroundFlush().
4. Emits messages at LOG level for every record that the server
replays/applies in the main redo loop.

I enabled this code by compiling with the WAL_DEBUG macro and setting the
wal_debug GUC to on. Firstly, compilation on Windows failed because
XL_ROUTINE was passed inappropriately to XLogReaderAllocate(). After fixing
the compilation issue [1], the TAP tests started to fail [2], which I'm
sure we can fix.

I started to think if this code is needed at all in production. How
about we do either of the following?

a) Remove the WAL_DEBUG macro and move all the code under the
wal_debug GUC? Since the GUC is already marked as DEVELOPER_OPTION,
the users will know the consequences of enabling it in production.
b) Remove both the WAL_DEBUG macro and the wal_debug GUC. I don't
think (2) needs to be in core, especially when tools like
pg_walinspect and pg_waldump can do the same job. And, the messages in
(3) and (4) can be turned into some DEBUGX level without being under the
WAL_DEBUG macro.

I have no idea if anyone uses the WAL_DEBUG macro and wal_debug GUC in
production. If we have somebody using it, I think we need to fix the
compilation and test failure issues, and start testing this code
(perhaps I can think of setting up a buildfarm member to help here).

I'm in favour of option (b), but I'd like to hear more thoughts on this.

[1]
diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index ca7100d4db..52633793d4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1023,8 +1023,12 @@ XLogInsertRecord(XLogRecData *rdata,
             palloc(DecodeXLogRecordRequiredSpace(record->xl_tot_len));

         if (!debug_reader)
-            debug_reader = XLogReaderAllocate(wal_segment_size, NULL,
-                                              XL_ROUTINE(), NULL);
+            debug_reader = XLogReaderAllocate(wal_segment_size, NULL,
+                                              XL_ROUTINE(.page_read = NULL,
+                                                         .segment_open = NULL,
+                                                         .segment_close = NULL),
+                                              NULL);

[2]
src/test/subscription/t/029_on_error.pl fails because the test gets an LSN
from an error context message emitted to the server logs, which the new
WAL_DEBUG LOG messages flood.
src/bin/initdb/t/001_initdb.pl fails because the WAL_DEBUG LOG messages are
emitted to the console during initdb.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Remove unnecessary includes of system headers in header files

2023-12-02 Thread Peter Eisentraut

On 01.12.23 17:41, Tom Lane wrote:

Peter Eisentraut  writes:

I noticed that some header files included system header files for no
apparent reason, so I did some digging and found out that in a few cases
the original reason has disappeared.  So I propose the attached patches
to remove the unnecessary includes.


Seems generally reasonable.  Have you checked that headerscheck and
cpluspluscheck are happy?


Yes, I ran it through Cirrus, which includes those checks.





Re: Change GUC hashtable to use simplehash?

2023-12-02 Thread John Naylor
On Wed, Nov 29, 2023 at 8:31 PM John Naylor  wrote:
>
> Attached is a rough start with Andres's earlier ideas, to get
> something concrete out there.

While looking at the assembly out of curiosity, I found a couple of bugs
in the split API that I've fixed locally.

I think the path forward is:

- performance measurements with both byte-at-a-time and
word-at-a-time, once I make sure they're fixed
- based on the above decide which one is best for guc_name_hash
- clean up hash function implementation
- test with a new guc_name_compare (using what we learned from my
guc_name_eq) and see how well we do with keeping dynahash vs.
simplehash

Separately, for string_hash:

- run SMHasher and see about reincorporating length in the
calculation. v5 should be a clear improvement in collision behavior
over the current guc_name_hash, but we need to make sure it's at least
as good as hash_bytes, and ideally not lose anything compared to
standard fast_hash.