Re: problems with "Shared Memory and Semaphores" section of docs

2024-05-21 Thread Imseih (AWS), Sami
> Any concerns with doing something like this [0] for the back-branches? The
> constant would be 6 instead of 7 on v14 through v16.

As far as backpatching the present inconsistencies in the docs,
[0] looks good to me.

[0] 
https://postgr.es/m/attachment/160360/v1-0001-fix-kernel-resources-docs-on-back-branches.patch
 



Regards,

Sami




Re: problems with "Shared Memory and Semaphores" section of docs

2024-05-17 Thread Imseih (AWS), Sami
> postgres -D pgdev-dev -c shared_buffers=16MB -C 
> shared_memory_size_in_huge_pages
> 13
> postgres -D pgdev-dev -c shared_buffers=16MB -c huge_page_size=1GB -C 
> shared_memory_size_in_huge_pages
> 1


> Which is very useful to be able to actually configure that number of huge
> pages. I don't think a system view or such would not help here.

Oops. Totally missed the -C flag. Thanks for clarifying!

Regards,

Sami 



Re: problems with "Shared Memory and Semaphores" section of docs

2024-05-17 Thread Imseih (AWS), Sami
>>> If the exact values
>>> are important, maybe we could introduce more GUCs like
>>> shared_memory_size_in_huge_pages that can be consulted (instead of
>>> requiring users to break out their calculators).
>>
>> I don't especially like shared_memory_size_in_huge_pages, and I don't
>> want to introduce more of those. GUCs are not the right way to expose
>> values that you can't actually set. (Yeah, I'm guilty of some of the
>> existing ones like that, but it's still not a good thing.) Maybe it's
>> time to introduce a system view for such things? It could be really
>> simple, with name and value, or we could try to steal some additional
>> ideas such as units from pg_settings.

I always found some of the preset GUCs [1] to be useful for writing SQLs used by
DBAs, particularly block_size, wal_block_size, server_version and 
server_version_num.
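
For example, something along these lines is the kind of DBA query I mean (just an
illustration, not from any particular tool):

select current_setting('block_size')::int        as block_size,
       current_setting('server_version_num')::int as server_version_num;

select relname,
       pg_relation_size(oid) / current_setting('block_size')::int as blocks
from pg_class
where relkind = 'r'
order by 2 desc limit 5;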

> The advantage of the GUC is that its value could be seen before trying to
> actually start the server. 

Only if there is a sample entry in the postgresql.conf file, right?
A GUC like shared_memory_size_in_huge_pages will not have one.


[1] https://www.postgresql.org/docs/current/runtime-config-preset.html


Regards,

Sami 



Re: query_id, pg_stat_activity, extended query protocol

2024-05-16 Thread Imseih (AWS), Sami
> I'm unsure if it needs to call ExecutorStart in the bind code. But if we
> don't change the current logic, would it make more sense to move
> pgstat_report_query_id to the ExecutorRun routine?

I initially thought about that, but for utility statements (CTAS, etc.) being 
executed with extended query protocol, we will still not advertise the queryId 
as  we should. This is why I chose to set the queryId in PortalRunSelect and
PortalRunMulti in v2 of the patch [1].

We can advertise the queryId inside ExecutorRun instead of
PortalRunSelect as the patch does, but we will still need to advertise 
the queryId inside PortalRunMulti.

[1] 
https://www.postgresql.org/message-id/FAB6AEA1-AB5E-4DFF-9A2E-BB320E6C3DF1%40amazon.com


Regards,

Sami







Re: allow changing autovacuum_max_workers without restarting

2024-05-16 Thread Imseih (AWS), Sami
>>> That's true, but using a hard-coded limit means we no longer need to add a
>>> new GUC. Always allocating, say, 256 slots might require a few additional
>>> kilobytes of shared memory, most of which will go unused, but that seems
>>> unlikely to be a problem for the systems that will run Postgres v18.
>>
>> I agree with this.


> Here's what this might look like. I chose an upper limit of 1024, which
> seems like it "ought to be enough for anybody," at least for now.

I thought 256 was a good enough limit. In practice, I doubt anyone will 
benefit from more than a few dozen autovacuum workers. 
I think 1024 is way too high to even allow.

Besides that, the overall patch looks good to me, but I have
some comments on the documentation.

I don't think combining 1024 + 5 = 1029 is a good idea in docs.
Breaking down the allotment and using the name of the constant 
is much more clear.

I suggest 
" max_connections + max_wal_senders + max_worker_processes + 
AUTOVAC_MAX_WORKER_SLOTS + 5"

and in other places in the docs, we should mention the actual 
value of AUTOVAC_MAX_WORKER_SLOTS. Maybe in the 
below section?

Instead of:
-() and allowed background
+(1024) and allowed background

do something like:
-() and allowed background
+   AUTOVAC_MAX_WORKER_SLOTS  (1024) and allowed background

Also,  replace the 1024 here with AUTOVAC_MAX_WORKER_SLOTS.

+max_wal_senders,
+plus max_worker_processes, plus 1024 for autovacuum
+worker processes, plus one extra for each 16


Also, not sure if I am mistaken here, but the "+ 5" in the existing docs
seems wrong.

If it refers to NUM_AUXILIARY_PROCS defined in
include/storage/proc.h, it should be a "6":

#define NUM_AUXILIARY_PROCS 6

This is not a consequence of this patch, and can be dealt with
in a separate thread if my understanding is correct.


Regards,

Sami 




Re: query_id, pg_stat_activity, extended query protocol

2024-05-15 Thread Imseih (AWS), Sami
> Okay, that's what I precisely wanted to understand: queryId doesn't have
> semantics to show the job that consumes resources right now—it is mostly
> about convenience to know that the backend processes nothing except
> (probably) this query.

It may be a good idea to expose, in pg_stat_activity or a
supplemental activity view, information about the current state of
query processing, i.e. is it parsing, planning or executing a query, or
is it processing a nested query.

I can see this being useful and perhaps could be taken up in a 
separate thread.

Regards,

Sami






Re: query_id, pg_stat_activity, extended query protocol

2024-05-14 Thread Imseih (AWS), Sami
> I discovered the current state of queryId reporting and found that it
> may be unlogical: Postgres resets queryId right before query execution
> in simple protocol and doesn't reset it at all in extended protocol and
> other ways to execute queries.

In exec_parse_message, exec_bind_message and exec_execute_message,
the queryId is reset via pgstat_report_activity.

> I think we should generally report it when the backend executes a job
> related to the query with that queryId. This means it would reset the
> queryId at the end of the query execution.

When the query completes execution and the session goes into a state 
other than "active", both the query text and the queryId should be of the 
last executed statement. This is the documented behavior, and I believe
it's the correct behavior.

If we reset queryId at the end of execution, this behavior breaks. Right?

> This seems logical, but
> what about the planning process? If an extension plans a query without
> the intention to execute it for speculative reasons, should we still
> show the queryId? Perhaps we should reset the state right after planning
> to accurately reflect the current queryId.

I think you are suggesting that during planning, the queryId
of the current statement being planned should not be reported.
 
If my understanding is correct, I don't think that is a good idea. Tools that
snapshot pg_stat_activity will not be able to account for the queryId during
planning. This could mean that certain load on the database cannot be tied
back to a specific queryId.
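
To make that concrete, a typical sampler runs something like the below (a sketch
of the kind of query such tools use, not any specific product); if the queryId
were reset during planning, long planning phases would all be lumped under a
NULL query_id and could not be tied back to a statement:

select query_id, count(*) as active_sessions
from pg_stat_activity
where state = 'active'
  and pid <> pg_backend_pid()
group by query_id
order by active_sessions desc;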

Regards,

Sami





Re: New GUC autovacuum_max_threshold ?

2024-05-08 Thread Imseih (AWS), Sami
> This is about how I feel, too. In any case, I +1'd a higher default
> because I think we need to be pretty conservative with these changes, at
> least until we have a better prioritization strategy. While folks may opt
> to set this value super low, I think that's more likely to lead to some
> interesting secondary effects. If the default is high, hopefully these
> secondary effects will be minimized or avoided.


There is also an alternative of making this GUC -1 by default, which
means it has no effect, and any larger value will be used in the threshold
calculation of autovacuum. A user will have to be careful not to set it too
low, but that is going to be a concern either way.


This idea may be worth considering, as it does not change the default
behavior of the autovac threshold calculation; and if a user has cases in
which they have many tables with a few billion tuples that they wish to
see autovacuumed more often, they now have a GUC to make
that possible and potentially avoid per-table threshold configuration.
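
(For reference, the per-table configuration users currently fall back on for such
tables looks roughly like the below; the table name is made up.)

ALTER TABLE big_history_table SET (
    autovacuum_vacuum_scale_factor = 0,
    autovacuum_vacuum_threshold = 10000000
);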


Also, I think coming up with a good default will be challenging,
and perhaps this idea is a good middle ground.


Regards,

Sami 





Re: allow changing autovacuum_max_workers without restarting

2024-05-03 Thread Imseih (AWS), Sami
> That's true, but using a hard-coded limit means we no longer need to add a
> new GUC. Always allocating, say, 256 slots might require a few additional
> kilobytes of shared memory, most of which will go unused, but that seems
> unlikely to be a problem for the systems that will run Postgres v18.

I agree with this.


Regards,

Sami



Re: New GUC autovacuum_max_threshold ?

2024-05-02 Thread Imseih (AWS), Sami
> And as far as that goes, I'd like you - and others - to spell out more
> precisely why you think 100 or 200 million tuples is too much. It
> might be, or maybe it is in some cases but not in others. To me,
> that's not a terribly large amount of data. Unless your tuples are
> very wide, it's a few tens of gigabytes. That is big enough that I can
> believe that you *might* want autovacuum to run when you hit that
> threshold, but it's definitely not *obvious* to me that you want
> autovacuum to run when you hit that threshold.


Vacuuming the heap alone gets faster the more often I do it, thanks to the
visibility map. However, the more indexes I have, and the larger those
indexes become (into the TBs), the more autovacuum workers will be
monopolized vacuuming them.


At 500k tuples, I am constantly vacuuming large indexes
and monopolizing autovacuum workers. At 100 or 200
million tuples, I will also monopolize autovacuum workers
as they vacuum indexes for many minutes or hours.


At 100 or 200 million, the monopolization will occur less often,
but it will still occur, possibly forcing an operator to terminate
the autovacuum and kick off a manual vacuum.


I am not convinced a new tuple-based threshold will address this,
but I may also be misunderstanding the intention of this GUC.


> To make that concrete: If the table is 10TB, do you want to vacuum to
> reclaim 20GB of bloat? You might be vacuuming 5TB of indexes to
> reclaim 20GB of heap space - is that the right thing to do? If yes,
> why?


No, I would not want to run autovacuum on 5TB indexes to reclaim 
a small amount of bloat.




Regards,


Sami







Re: New GUC autovacuum_max_threshold ?

2024-05-01 Thread Imseih (AWS), Sami
I've been following this discussion and would like to add my
2 cents.

> Unless I'm missing something major, that's completely bonkers. It
> might be true that it would be a good idea to vacuum such a table more
> often than we do at present, but there's no shot that we want to do it
> that much more often. 

This is really an important point.

Too small of a threshold and a/v will constantly be vacuuming a fairly large
and busy table with many indexes.

If the threshold is large, say 100 or 200 million, I question whether you want
autovacuum to be doing the cleanup work here. That long a period without an
autovacuum on a table means there may be something misconfigured in your
autovacuum settings.

At that point aren't you just better off performing a manual vacuum and
taking advantage of parallel index scans?
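
i.e. something along these lines (table name made up), which vacuums the indexes
with parallel workers, something autovacuum cannot do today:

VACUUM (PARALLEL 4, VERBOSE) big_table;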

Regards,

Sami Imseih
Amazon Web Services (AWS)










Re: query_id, pg_stat_activity, extended query protocol

2024-04-30 Thread Imseih (AWS), Sami
Here is a new rev of the patch which deals with the scenario
mentioned by Andrei [1] in which the queryId may change
due to a cached query invalidation.


[1] 
https://www.postgresql.org/message-id/724348C9-8023-41BC-895E-80634E79A538%40amazon.com

Regards,

Sami



0001-v2-Fix-Extended-Query-Protocol-handling-of-queryId.patch
Description: 0001-v2-Fix-Extended-Query-Protocol-handling-of-queryId.patch


Re: query_id, pg_stat_activity, extended query protocol

2024-04-27 Thread Imseih (AWS), Sami
>> But simplistic case with a prepared statement shows how the value of
>> queryId can be changed if you don't acquire all the objects needed for
>> the execution:




>> CREATE TABLE test();
>> PREPARE name AS SELECT * FROM test;
>> EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;
>> DROP TABLE test;
>> CREATE TABLE test();
>> EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;


> Hmm, you raise a good point. Isn't this a fundamental problem
> with prepared statements? If there is DDL on the
> relations of the prepared statement query, shouldn't the prepared
> statement be considered invalid at that point and raise an error
> to the user?

I tested v1 thoroughly.

Using the attached JDBC script for testing, I added some logging of the queryId
being reported by the patch and set a breakpoint after sync [1], at which point
the locks on the table are released. I then proceeded to drop and recreate the
table, and observed that the first bind after recreating the table still reports
the old queryId, but the execute reports the correct queryId. This is because
the bind still has not had a chance to re-parse and re-plan after the
cache invalidation.


2024-04-27 13:51:15.757 CDT [43483] LOG:  duration: 21322.475 ms  execute S_1: select pg_sleep(10)
2024-04-27 13:51:21.591 CDT [43483] LOG:  duration: 0.834 ms  parse S_2: select from tab1 where id = $1
2024-04-27 13:51:21.591 CDT [43483] LOG:  query_id = -192969736922694368
2024-04-27 13:51:21.592 CDT [43483] LOG:  duration: 0.729 ms  bind S_2: select from tab1 where id = $1
2024-04-27 13:51:21.592 CDT [43483] LOG:  query_id = -192969736922694368
2024-04-27 13:51:21.592 CDT [43483] LOG:  duration: 0.032 ms  execute S_2: select from tab1 where id = $1
2024-04-27 13:51:32.501 CDT [43483] LOG:  query_id = -192969736922694368
2024-04-27 13:51:32.502 CDT [43483] LOG:  duration: 0.342 ms  bind S_2: select from tab1 where id = $1
2024-04-27 13:51:32.502 CDT [43483] LOG:  query_id = -192969736922694368
2024-04-27 13:51:32.502 CDT [43483] LOG:  duration: 0.067 ms  execute S_2: select from tab1 where id = $1
2024-04-27 13:51:42.613 CDT [43526] LOG:  query_id = -4766379021163149612
-- recreate the tables
2024-04-27 13:51:42.621 CDT [43526] LOG:  duration: 8.488 ms  statement: drop table if exists tab1;
2024-04-27 13:51:42.621 CDT [43526] LOG:  query_id = 7875284141628316369
2024-04-27 13:51:42.625 CDT [43526] LOG:  duration: 3.364 ms  statement: create table tab1 ( id int );
2024-04-27 13:51:42.625 CDT [43526] LOG:  query_id = 2967282624086800441
2024-04-27 13:51:42.626 CDT [43526] LOG:  duration: 0.936 ms  statement: insert into tab1 values (1);

-- this reports the old query_id
2024-04-27 13:51:45.058 CDT [43483] LOG:  query_id = -192969736922694368

2024-04-27 13:51:45.059 CDT [43483] LOG:  duration: 0.913 ms  bind S_2: select from tab1 where id = $1
2024-04-27 13:51:45.059 CDT [43483] LOG:  query_id = 3010297048333693297
2024-04-27 13:51:45.059 CDT [43483] LOG:  duration: 0.096 ms  execute S_2: select from tab1 where id = $1
2024-04-27 13:51:46.777 CDT [43483] LOG:  query_id = 3010297048333693297
2024-04-27 13:51:46.777 CDT [43483] LOG:  duration: 0.108 ms  bind S_2: select from tab1 where id = $1
2024-04-27 13:51:46.777 CDT [43483] LOG:  query_id = 3010297048333693297
2024-04-27 13:51:46.777 CDT [43483] LOG:  duration: 0.024 ms  execute S_2: select from tab1 where id = $1

The easy answer is to not report the queryId during the bind message, but I will
look at what else can be done here, as it's good to have a queryId reported in
this scenario for cases where there are long planning times, and we would rather
not have those missed in pg_stat_activity sampling.


[1] 
https://github.com/postgres/postgres/blob/master/src/backend/tcop/postgres.c#L4877


Regards,

Sami



Test.java
Description: Test.java


Re: query_id, pg_stat_activity, extended query protocol

2024-04-27 Thread Imseih (AWS), Sami
> We choose a arguably more user-friendly option:

> https://www.postgresql.org/docs/current/sql-prepare.html

Thanks for pointing this out!

Regards,

Sami




Re: query_id, pg_stat_activity, extended query protocol

2024-04-27 Thread Imseih (AWS), Sami
> But simplistic case with a prepared statement shows how the value of
> queryId can be changed if you don't acquire all the objects needed for
> the execution:


> CREATE TABLE test();
> PREPARE name AS SELECT * FROM test;
> EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;
> DROP TABLE test;
> CREATE TABLE test();
> EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;

Hmm, you raise a good point. Isn't this a fundamental problem
with prepared statements? If there is DDL on the
relations of the prepared statement query, shouldn't the prepared
statement be considered invalid at that point and raise an error
to the user?

Regards,

Sami 



Re: query_id, pg_stat_activity, extended query protocol

2024-04-23 Thread Imseih (AWS), Sami
> I am also a bit surprised with the choice of using the first Query
> available in the list for the ID, FWIW.


IIUC, the query trees returned from QueryRewrite
will all have the same queryId, so it appears valid to 
use the queryId from the first tree in the list. Right?

Here is an example I was working with that includes user-defined rules
and produces a list with more than one tree.
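
(The setup was roughly the following; a reconstruction of the kind of DO ALSO
rules I used, not the exact script.)

CREATE TABLE mytab  (id int);
CREATE TABLE mytab2 (id int);
CREATE TABLE mytab3 (id int);
CREATE TABLE mytab4 (id int);
CREATE RULE mytab_r2 AS ON INSERT TO mytab DO ALSO INSERT INTO mytab2 VALUES (NEW.id);
CREATE RULE mytab_r3 AS ON INSERT TO mytab DO ALSO INSERT INTO mytab3 VALUES (NEW.id);
CREATE RULE mytab_r4 AS ON INSERT TO mytab DO ALSO INSERT INTO mytab4 VALUES (NEW.id);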


postgres=# explain (verbose, generic_plan) insert into mytab values ($1) 
RETURNING pg_sleep($1), id ;
                         QUERY PLAN
-------------------------------------------------------------
 Insert on public.mytab  (cost=0.00..0.01 rows=1 width=4)
   Output: pg_sleep(($1)::double precision), mytab.id
   ->  Result  (cost=0.00..0.01 rows=1 width=4)
         Output: $1
 Query Identifier: 3703848357297795425

 Insert on public.mytab2  (cost=0.00..0.01 rows=0 width=0)
   ->  Result  (cost=0.00..0.01 rows=1 width=4)
         Output: $1
 Query Identifier: 3703848357297795425

 Insert on public.mytab3  (cost=0.00..0.01 rows=0 width=0)
   ->  Result  (cost=0.00..0.01 rows=1 width=4)
         Output: $1
 Query Identifier: 3703848357297795425

 Insert on public.mytab4  (cost=0.00..0.01 rows=0 width=0)
   ->  Result  (cost=0.00..0.01 rows=1 width=4)
         Output: $1
 Query Identifier: 3703848357297795425
(20 rows)



> Did you consider using \bind to show how this behaves in a regression
> test?


Yes, this is precisely how I tested. Without the patch, I could not
see a queryId after 9 seconds of a pg_sleep, but with the patch it 
appears. See the test below.


## test query
select pg_sleep($1) \bind 30


## unpatched
postgres=# select 
query_id, 
query, 
now()-query_start query_duration, 
state 
from pg_stat_activity where pid <> pg_backend_pid()
and state = 'active';
 query_id |        query         | query_duration  | state
----------+----------------------+-----------------+--------
          | select pg_sleep($1) +| 00:00:08.604845 | active
          | ;                    |                 |
(1 row)

## patched

postgres=# truncate table large;^C
postgres=# select 
query_id, 
query, 
now()-query_start query_duration, 
state 
from pg_stat_activity where pid <> pg_backend_pid()
and state = 'active';
      query_id       |        query         | query_duration | state
---------------------+----------------------+----------------+--------
 2433215470630378210 | select pg_sleep($1) +| 00:00:09.6881  | active
                     | ;                    |                |
(1 row)


For exec_execute_message, I realized that to report the queryId for both
utility and non-utility statements, we need to report the queryId
inside the portal routines, where the PlannedStmt contains the queryId.

Attached is the first real attempt at the fix. 

Regards,


Sami







0001-v1-Fix-Extended-QUery-Protocol-handling-of-queryId.patch
Description: 0001-v1-Fix-Extended-QUery-Protocol-handling-of-queryId.patch


Re: query_id, pg_stat_activity, extended query protocol

2024-04-22 Thread Imseih (AWS), Sami
> FWIW, I'd like to think that we could improve the situation, requiring
> a mix of calling pgstat_report_query_id() while feeding on some query
> IDs retrieved from CachedPlanSource->query_list. I have not in
> details looked at how much could be achieved, TBH.

I was dealing with this today and found this thread. I spent some time
looking at possible solutions.

In the flow of extended query protocol, the exec_parse_message 
reports the queryId, but subsequent calls to exec_bind_message
and exec_execute_message reset the queryId when calling
pgstat_report_activity(STATE_RUNNING,..) as you can see below.
  
 /*
  * If a new query is started, we reset the query identifier as it'll only
  * be known after parse analysis, to avoid reporting last query's
  * identifier.
  */
 if (state == STATE_RUNNING)
 beentry->st_query_id = UINT64CONST(0);


So, I think the simple answer is something like the below.
Inside exec_bind_message and exec_execute_message,
the query_id should be reported after pgstat_report_activity.

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 76f48b13d2..7ec2df91d5 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1678,6 +1678,7 @@ exec_bind_message(StringInfo input_message)
debug_query_string = psrc->query_string;
 
pgstat_report_activity(STATE_RUNNING, psrc->query_string);
+   pgstat_report_query_id(linitial_node(Query, psrc->query_list)->queryId, true);
 
set_ps_display("BIND");
 
@@ -2146,6 +2147,7 @@ exec_execute_message(const char *portal_name, long max_rows)
debug_query_string = sourceText;
 
pgstat_report_activity(STATE_RUNNING, sourceText);
+   pgstat_report_query_id(portal->queryDesc->plannedstmt->queryId, true);
 
cmdtagname = GetCommandTagNameAndLen(portal->commandTag, &cmdtaglen);


thoughts?


Regards,

Sami Imseih
Amazon Web Services (AWS)



Re: allow changing autovacuum_max_workers without restarting

2024-04-21 Thread Imseih (AWS), Sami
> Part of the underlying problem here is that, AFAIK, neither PostgreSQL
> as a piece of software nor we as human beings who operate PostgreSQL
> databases have much understanding of how autovacuum_max_workers should
> be set. It's relatively easy to hose yourself by raising
> autovacuum_max_workers to try to make things go faster, but produce
> the exact opposite effect due to how the cost balancing stuff works.

Yeah, this patch will not fix this problem. Anyone who raises av_max_workers
should think about adjusting the av_cost_delay. This was discussed up the
thread [4] and even without this patch, I think it's necessary to add more
documentation on the relationship between workers and cost.
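
As a sketch of what I mean (the numbers are only illustrative, and
autovacuum_max_workers still requires a restart today, which is the point of this
thread): the cost budget is shared across the active workers, so raising the
worker count without also raising the cost limit just makes each worker slower.

ALTER SYSTEM SET autovacuum_max_workers = 6;
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 400;  -- default -1 falls back to vacuum_cost_limit (200)
SELECT pg_reload_conf();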


> So I feel like what this proposal reveals is that we know that our
> algorithm for ramping up the number of running workers doesn't really
> work. And maybe that's just a consequence of the general problem that
> we have no global information about how much vacuuming work there is
> to be done at any given time, and therefore we cannot take any kind of
> sensible guess about whether 1 more worker will help or hurt. Or,
> maybe there's some way to do better than what we do today without a
> big rewrite. I'm not sure. I don't think this patch should be burdened
> with solving the general problem here. But I do think the general
> problem is worth some discussion.

This patch is only solving the operational problem of adjusting 
autovacuum_max_workers,  and it does so without introducing complexity.

A proposal that alleviates users of the burden of having to think about
autovacuum_max_workers, cost_delay and cost_limit settings would be great.
This patch may be the basis for such dynamic "auto-tuning" of autovacuum
workers.


Regards,

Sami

[4] https://www.postgresql.org/message-id/20240419154322.GA3988554%40nathanxps13



Re: allow changing autovacuum_max_workers without restarting

2024-04-17 Thread Imseih (AWS), Sami
> Here is a first attempt at a proper patch set based on the discussion thus
> far. I've split it up into several small patches for ease of review, which
> is probably a bit excessive. If this ever makes it to commit, they could
> likely be combined.

I looked at the patch set. With the help of DEBUG2 output, I tested to ensure
that the autovacuum_cost_limit balance adjusts correctly when the
autovacuum_max_workers value increases/decreases. I did not think the
patch would break this behavior, but it's important to verify this.

Some comments on the patch:

1. A nit. There should be a tab here.

-   dlist_head  av_freeWorkers;
+   dclist_head av_freeWorkers;

2. autovacuum_max_worker_slots documentation:

+   
+Note that the value of  is
+silently capped to this value.
+   

This comment looks redundant in the docs, since the entry 
for autovacuum_max_workers that follows mentions the
same.


3. The docs for autovacuum_max_workers should mention that when
the value changes, one should consider adjusting autovacuum_cost_limit/cost_delay.

This is not something new. Even in the current state, users should think about
these settings. However, it seems even more important if this value is to be
dynamically adjusted.


Regards,

Sami








Re: allow changing autovacuum_max_workers without restarting

2024-04-15 Thread Imseih (AWS), Sami
> Another option could be to just remove the restart-only GUC and hard-code
> the upper limit of autovacuum_max_workers to 64 or 128 or something. While
> that would simplify matters, I suspect it would be hard to choose an
> appropriate limit that won't quickly become outdated.

Hardcoded values are usually hard to deal with because they are hidden either
in code or in docs.

> When I thought about this, I considered proposing to add a new GUC for
> "autovacuum_policy_workers".

> autovacuum_max_workers would be the same as before, requiring a restart
> to change.  The policy GUC would be the soft limit, changable at runtime

I think autovacuum_max_workers should still be the GUC that controls
the number of concurrent autovacuums. This parameter is already well 
established and changing the meaning now will be confusing. 

I suspect most users will be glad it's now dynamic, but will probably 
be annoyed if it's no longer doing what it's supposed to.

Regards,

Sami 



Re: allow changing autovacuum_max_workers without restarting

2024-04-13 Thread Imseih (AWS), Sami
> IIRC using GUC hooks to handle dependencies like this is generally frowned
> upon because it tends to not work very well [0]. We could probably get it
> to work for this particular case, but IMHO we should still try to avoid
> this approach. 

Thanks for pointing this out. I agree, this could lead to false logs being
emitted.

> so it might not be crucial to emit a
> WARNING here.

As mentioned earlier in the thread, we can let the autovacuum launcher emit the
log, but it will need to be careful not to flood the logs when this condition
exists (i.e. log only the first time the condition is detected, or log every
once in a while).

The additional complexity is not worth it.

Regards,

Sami






Re: allow changing autovacuum_max_workers without restarting

2024-04-12 Thread Imseih (AWS), Sami
>> 1/ We should emit a log when autovacuum_workers is set higher than the max.


>> Hm. Maybe the autovacuum launcher could do that.

Would it be better to use a GUC check_hook that compares the 
new value with the max allowed values and emits a WARNING ?

autovacuum_max_workers already has a check_autovacuum_max_workers
check_hook, which can be repurposed for this.

In the POC patch, this check_hook is kept as-is, which will no longer make 
sense.

>> 2/ should the name of the restart limit be "reserved_autovacuum_workers"?


>> That's kind-of what I had in mind, although I think we might want to avoid
>> the word "reserved" because it sounds a bit like reserved_connections 
>> and superuser_reserved_connections

Yes, I agree. This can be confusing.

>>  "autovacuum_max_slots" or
>> "autovacuum_max_worker_slots" might be worth considering, too.

"autovacuum_max_worker_slots" is probably the best option because
we should have "worker" in the name of the GUC.


Regards,

Sami



Re: allow changing autovacuum_max_workers without restarting

2024-04-12 Thread Imseih (AWS), Sami
I spent some time reviewing/testing the POC. It is relatively simple with a lot
of obvious value.

I tested with 16 tables that constantly reach the autovac threshold, and the
patch did the right thing. I observed concurrent autovacuum workers matching
the setting as I was adjusting it dynamically.
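
(Roughly, I was just watching something like the below while changing the setting;
a sketch of the check, not the exact script I used.)

select count(*) as running_autovac_workers,
       current_setting('autovacuum_max_workers') as configured_max
from pg_stat_activity
where backend_type = 'autovacuum worker';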

As you mention above, if there are more autovacuums in progress and a new lower
setting is applied, we should not take any special action on those autovacuums,
and eventually the max number of autovacuum workers will match the setting.

I also tested by allowing user connections to reach max_connections, and
observed the expected number of autovacuums spinning up and correctly adjusted.

Having autovacuum tests ( unless I missed something ) like the above is a good 
general improvement, but it should not be tied to this. 

A few comments on  the POC patch:

1/ We should emit a log when autovacuum_workers is set higher than the max.

2/ should the name of the restart limit be "reserved_autovacuum_workers"?

Regards,

Sami Imseih
AWS (Amazon Web Services)





Re: allow changing autovacuum_max_workers without restarting

2024-04-11 Thread Imseih (AWS), Sami
> My concern with this approach is that other background workers could use up
> all the slots and prevent autovacuum workers from starting

That's a good point; the current settings do not guarantee that you
get a worker for the purpose if none are available.
For example, with max_parallel_workers_per_gather you may have 2 workers planned
and 0 launched.

> unless of
> course we reserve autovacuum_max_workers slots for _only_ autovacuum
> workers. I'm not sure if we want to get these parameters tangled up like
> this, though...

This will be confusing to describe and we will be reserving autovac workers
implicitly, rather than explicitly with a new GUC.

Regards,

Sami  





Re: allow changing autovacuum_max_workers without restarting

2024-04-11 Thread Imseih (AWS), Sami
> I frequently hear about scenarios where users with thousands upon thousands
> of tables realize that autovacuum is struggling to keep up.  When they
> inevitably go to bump up autovacuum_max_workers, they discover that it
> requires a server restart (i.e., downtime) to take effect, causing further
> frustration.  For this reason, I think $SUBJECT is a desirable improvement.
> I spent some time looking for past discussions about this, and I was
> surprised to not find any, so I thought I'd give it a try.

I did not review the patch in detail yet, but +1 to the idea. 
It's not just thousands of tables that suffer from this.
If a user has a few large tables hogging the autovac workers, then other
tables don't get the autovac cycles they require. Users are then forced
to run manual vacuums, which adds complexity to their operations.

> The attached proof-of-concept patch demonstrates what I have in mind.
> Instead of trying to dynamically change the global process table, etc., I'm
> proposing that we introduce a new GUC that sets the effective maximum
> number of autovacuum workers that can be started at any time.

max_worker_processes defines a pool with the max # of background workers allowed.
Parallel workers and extensions that spin up background workers all draw from
this pool.

Should autovacuum_max_workers be able to utilize from max_worker_processes also?

This will allow autovacuum_max_workers to be dynamic while the user only has
to deal with an already existing GUC. We may want to increase the default value
for max_worker_processes as part of this.


Regards,

Sami
Amazon Web Services (AWS)



Re: Psql meta-command conninfo+

2024-04-05 Thread Imseih (AWS), Sami
> The original \conninfo was designed to report values from the libpq API
> about what libpq connected to. And the convention in psql is that "+"
> provide more or less the same information but a bit more. So I think it
> is wrong to make "\conninfo+" something fundamentally different than
> "\conninfo".

That is a fair argument. Looking at this a bit more, it looks like we can
enhance the GSSAPI output of conninfo to include the GSSAPI fields that
conninfo+ was aiming for.

Right now it just shows:

printf(_("GSSAPI-encrypted connection\n"));

but we can use libpq to get the other GSSAPI attributes in the output.

> And even more so if it contains information that isn't really
> "connection information". For example, current_user() doesn't make
> sense here, I think.

> I mean, if you want to add a command \some_useful_status_information, we
> can talk about that, but let's leave \conninfo to what it does.

> But I think there are better ways to implement this kind of server-side
> status, for example, with a system view.

What about a \sessioninfo command that shows all the
information for the current session, PID, app_name, current_user,
session_user, system_user, client_address, client_port, etc.

These will be the fields that we cannot gather if the connection is
broken for whatever reason.

The distinction here is \sessioninfo are the details after the connection
is established and could possibly be changed during the connection.

Regards,

Sami




Re: Psql meta-command conninfo+

2024-04-04 Thread Imseih (AWS), Sami
> The point about application_name is a valid one. I guess it's there
> because it's commonly given from the client side rather than being set
>server-side, even though it's still a GUC. Arguably we could remove it
> from \conninfo+, and claim that nothing that shows up in \dconfig should
> also appear in \conninfo+. Then users should get in the habit of using
> both to obtain a full picture. This sounds to me a very good compromise
> actually.

Perhaps another reason to remove "application_name" is because the
value can be modified after the connection is established. If that is also
another good reason, the same can be said about "Current User"
and "Session User" as those could be modified with SET commands.

This way conninfo could be only for attributes that will not change
during the lifetime of the connection.


Also, I just realized that pg_stat_ssl and pg_stat_gssapi will both return 0
rows if there is a SET ROLE / SET SESSION AUTHORIZATION; this means \conninfo+
will return empty.

postgres=> select current_user;
 current_user 
--
 user1
(1 row)

postgres=> select * from pg_stat_ssl ;
  pid  | ssl | version |           cipher            | bits | client_dn | client_serial | issuer_dn
-------+-----+---------+-----------------------------+------+-----------+---------------+-----------
 27223 | t   | TLSv1.2 | ECDHE-RSA-AES256-GCM-SHA384 |  256 |           |               |
(1 row)

postgres=> set role = user2;
SET
postgres=> select * from pg_stat_ssl ;
 pid | ssl | version | cipher | bits | client_dn | client_serial | issuer_dn 
-----+-----+---------+--------+------+-----------+---------------+-----------
(0 rows)

postgres=> select current_user;
 current_user 
--
 user2
(1 row)

postgres=> reset role;
RESET
postgres=> select current_user;
 current_user 
--
 user1
(1 row)

postgres=> select * from pg_stat_ssl ;
  pid  | ssl | version |           cipher            | bits | client_dn | client_serial | issuer_dn
-------+-----+---------+-----------------------------+------+-----------+---------------+-----------
 27223 | t   | TLSv1.2 | ECDHE-RSA-AES256-GCM-SHA384 |  256 |           |               |
(1 row)


Regards, 

Sami 








Re: Psql meta-command conninfo+

2024-04-03 Thread Imseih (AWS), Sami
Building the docs fails for v26. The error is:



ref/psql-ref.sgml:1042: element member: validity error : Element term is not 
declared in member list of possible children

 

  ^



I am able to build up to v24 before the  was replaced with 




I tested building with a modified structure as below; the  is a 

that has a  within it.  Each field is a simple list member and

the name of the fields should be .



  

   \conninfo[+]

   

   

Outputs a string displaying information about the current

database connection. When + is appended,

more details about the connection are displayed in table

format:

   

   Database:The database name of the 
connection.

   Authenticated User:The authenticated database 
user of the connection.

   Current User:The user name of the current 
execution context;

  see the current_user() function in

   for more 
details.

   

   

   

  



Regards,



Sami


Re: Psql meta-command conninfo+

2024-04-01 Thread Imseih (AWS), Sami
> Here I considered your suggestion (Sami and Álvaro's). However, I haven't yet
> added the links for the functions system_user(), current_user(), and 
> session_user().
> 'm not sure how to do it. Any suggestion on how to create/add the link?

Here is an example [1] where the session information functions are referenced.
The public doc for this example is in [2].

Something similar can be done here. For example:

The session user's name; see the session_user() function in
 for more details.


[1] 
https://github.com/postgres/postgres/blob/master/doc/src/sgml/system-views.sgml#L1663-L1664
[2] https://www.postgresql.org/docs/16/view-pg-locks.html

Regards,

Sami


Re: Psql meta-command conninfo+

2024-04-01 Thread Imseih (AWS), Sami
> That minor point aside, I disagree with Sami about repeating the docs
> for system_user() here. I would just say "The authentication data
> provided for this connection; see the function system_user() for more
> details." 

+1. FWIW, up the thread [1] I did mention we should link to the functions page,
but did not have a very strong opinion on that.

I do like your suggestion and the same should be done for "Current User" 
and "Session User" as well. 

[1] 
https://www.postgresql.org/message-id/640B2586-EECF-44C0-B474-CA8510F8EAFC%40amazon.com

Regards,

Sami







Re: Psql meta-command conninfo+

2024-03-31 Thread Imseih (AWS), Sami
>. However,
> I didn't complete item 4. I'm not sure, but I believe that linking it to the 
> documentation
> could confuse the user a bit. I chose to keep the descriptions as they were. 
> However, if
> you have any ideas on how we could outline it, let me know and perhaps we can
> implement it.



That is fair. After spending some time thinking about this, it is better

to make the documentation as crystal clear as possible.



I do have some rewording suggestions for the following fields.





Database:

The database name of the connection.



Authenticated User:

The authenticated database user of the connection.



Socket Directory:

The Unix socket directory of the connection. NULL if host or hostaddr are 
specified.



Host:

The host name or address of the connection. NULL if a Unix socket is used.



Server Port:

The port of the server host. NULL if a Unix socket is used.



Server Address:

The IP address of the host name. NULL if a Unix socket is used.



Client Address:

The IP address of the client connected to this backend. NULL if a Unix socket 
is used.



Client Port:

The port of the client connected to this backend. NULL if a Unix socket is used.



Backend PID:

The process id of the backend for the connection.



Application name:

psql is the default for a psql connection

unless otherwise specified in the application_name

configuration parameter.



For System User, you should use the full definition here

https://github.com/postgres/postgres/blob/master/doc/src/sgml/func.sgml#L24251-L24259.

The current patch is missing the full description.





Regards,



Sami




Re: Psql meta-command conninfo+

2024-03-29 Thread Imseih (AWS), Sami
Thank you for the updated patch.

First and foremost, thank you very much for the review.

> The initial and central idea was always to keep the metacommand
> "\conninfo" in its original state, that is, to preserve the string as it is.
> The idea of "\conninfo+" is to expand this to include more information.
> If I change the "\conninfo" command and transform it into a table,
> I would have to remove all the translation part (files) that has already
> been implemented in the past. I believe that keeping the string and
> its translations is a good idea and there is no great reason to change that.

You are right. Not much to be gained to change this.



For the current patch, I have a few more comments.



1/ The output should be reorganized to show the fields that are part of 
“conninfo” first.



I suggest it should look like this:



Current Connection Information
-[ RECORD 1 ]----------------+---------
Database                     | postgres
Authenticated User           | postgres
Socket Directory             | /tmp
Host                         |
Server Port                  | 5432
Server Address               |
Client Address               |
Client Port                  |
Backend PID                  | 30846
System User                  |
Current User                 | postgres
Session User                 | postgres
Application Name             | psql
SSL Connection               | f
SSL Protocol                 |
SSL Cipher                   |
SSL Compression              |
GSSAPI Authenticated         | f
GSSAPI Principal             |
GSSAPI Encrypted             | f
GSSAPI Credentials Delegated | f





With regard to the documentation, I think it's a good idea that every field

has a description. However, I have some comments:



1/ For the description of the conninfo command, what about simplifying like 
below?



"Outputs a string displaying information about the current database connection. 
When + is appended, more details about the connection are displayed in table 
format:"



2/ I don't think the descriptions need to start with "Displays". It is very 
repetitive.



3/ For the "Socket Directory" description, this could be NULL if the host was 
not specified.



what about the below?



"The socket directory of the connection. NULL if the host or hostaddr are 
specified for the connection"



4/ Most of the fields are just the output of a function, such as
"pg_catalog.system_user()". What about having the docs simply

link to the documentation of the function? This way we are not copying
descriptions and do not have to be concerned if the description

of the function changes in the future.



5/ "true" and "false", do not need double quotes. This is not the convention 
used in other places docs.





Regards,



Sami












Re: Psql meta-command conninfo+

2024-03-27 Thread Imseih (AWS), Sami
Hi,

Thanks for working on this.

I have a few comments about the current patch.

1/ I looked through other psql meta-commands and the "+" adds details but
does not change the output format. In this patch, conninfo and conninfo+
have completely different output. The former is a string with all the details
and the latter is a table. Should conninfo just be a table with the minimal info,
with additional details displayed when "+" is appended?

instead of \conninfo displaying a string

You are connected to database "postgres" as user "postgres" on host "127.0.01" 
(address "127.0.0.1") at port "5432".

It can instead display the same information in table format

Current Connection Information
-[ RECORD 1 ]---+---
Database  | postgres
Authenticated User| postgres
Socket Directory |
Host  | 127.0.0.1
Port   | 5432

and \conninfo+ can show the rest of the details

Current Connection Information
-[ RECORD 1 ]--+
Database | postgres
Authenticated User   | postgres
Socket Directory|
Host | 127.0.0.1
Port | 5432
Backend PID   | 1234
...
.

2/ GSSAPI details should show the full details, such as "principal",
"encrypted" and "credentials_delegated".

This also means that pg_stat_ssl will need to be joined with pg_stat_gssapi


FROM
    pg_catalog.pg_stat_ssl ssl LEFT JOIN pg_catalog.pg_stat_gssapi gssapi
    ON ssl.pid = gssapi.pid
WHERE
    ssl.pid = pg_catalog.pg_backend_pid()


3/ A very nice to have is "Application Name", in the case one
sets the application_name within a connection or in the connection string.

Regards,

Sami Imseih
Amazon Web Services (AWS)





Re: [BUG] autovacuum may skip tables when session_authorization/role is set on database

2023-12-13 Thread Imseih (AWS), Sami
> What is the actual
> use case for such a setting? 

I don't have exact details on the use case, but this is not a common
use case.

> Doesn't it risk security problems?

I cannot see how setting it on the database being more problematic than
setting it on a session level.


> I'm rather unimpressed by this proposal, first because there are
> probably ten other ways to break autovac with ill-considered settings,

There exists code in autovac that safeguards against such settings. For example,
statement_timeout and lock_timeout are disabled. There are a dozen or
more other settings that are overridden for autovac.

I see this being just another one to ensure that autovacuum always runs
as superuser.

> and second because if we do want to consider this a supported case,
> what about other background processes? They'd likely have issues
> as well.

I have not considered other background processes, but autovac is the only
one that I can think of which checks for relation permissions.

Regards,

Sami








[BUG] autovacuum may skip tables when session_authorization/role is set on database

2023-12-13 Thread Imseih (AWS), Sami
Hi,

A recent case in the field: a database's session_authorization was
altered to a non-superuser, non-owner of tables via alter database .. set
session_authorization .., which caused autovacuum to skip tables.

The issue was discovered on 13.10, and the logs show such messages:

warning:  skipping "table1" --- only table or database owner can vacuum it

In HEAD, I can repro, but the message is now a bit different due to [1].

WARNING:  permission denied to vacuum "table1”, skipping it

It seems to me we should force an autovacuum worker to set the session userid to
a superuser.
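
A condensed sketch of the triggering setup (the names are made up; the attached
repro has the full script):

CREATE ROLE app_user LOGIN;
ALTER DATABASE appdb SET session_authorization = 'app_user';
-- autovacuum workers connecting to appdb now pick up the database-level
-- setting and run as app_user, so tables it does not own are skipped
-- with the warnings shown above.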

Attached is a repro and a patch which sets the session user to the BOOTSTRAP 
superuser
at the start of the autovac worker.

Thoughts?

Regards,

Sami
Amazon Web Services (AWS)


[1] 
https://postgr.es/m/20220726.104712.912995710251150228.horikyota@gmail.com



0001-v1-Force-autovacuum-to-use-bootstrap-superuser.patch
Description: 0001-v1-Force-autovacuum-to-use-bootstrap-superuser.patch
## make autovac trigger often for the test
psql<

Re: False "pg_serial": apparent wraparound” in logs

2023-10-17 Thread Imseih (AWS), Sami
> So, I've spent more time on that and applied the simplification today,
> doing as you have suggested to use the head page rather than the tail
> page when the tail XID is ahead of the head XID, but without disabling
> the whole. I've simplified a bit the code and the comments, though,
> while on it (some renames and a slight refactoring of tailPage, for
> example).
> --
> Michael

Thank you!

Regards,

Sami





Re: Logging parallel worker draught

2023-10-15 Thread Imseih (AWS), Sami
> I believe both cumulative statistics and logs are needed. Logs excel in 
> pinpointing specific queries at precise times, while statistics provide 
> a broader overview of the situation. Additionally, I often encounter 
> situations where clients lack pg_stat_statements and can't restart their 
> production promptly.

I agree that logging will be very useful here. 
Cumulative stats/pg_stat_statements can be handled in a separate discussion.

> log_temp_files exhibits similar behavior when a query involves multiple
> on-disk sorts. I'm uncertain whether this is something we should or need
> to address. I'll explore whether the error message can be made more
> informative.


> [local]:5437 postgres@postgres=# SET work_mem to '125kB';
> [local]:5437 postgres@postgres=# SET log_temp_files TO 0;
> [local]:5437 postgres@postgres=# SET client_min_messages TO log;
> [local]:5437 postgres@postgres=# WITH a AS ( SELECT x FROM
> generate_series(1,1) AS F(x) ORDER BY 1 ) , b AS (SELECT x FROM
> generate_series(1,1) AS F(x) ORDER BY 1 ) SELECT * FROM a,b;
> LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp138850.20", size
> 122880 => First sort
> LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp138850.19", size 14
> LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp138850.23", size 14
> LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp138850.22", size
> 122880 => Second sort
> LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp138850.21", size 14

That is true.

Users should also control whether they want this logging overhead or not;
the best answer is a new GUC that is OFF by default.

I am also not sure if we want to log draught only. I think it's important
to not only see which operations are in parallel draught, but to also log
operations that are using 100% of their planned workers.
This will help the DBA tune queries that are eating up the parallel workers.

Regards,

Sami



Re: False "pg_serial": apparent wraparound” in logs

2023-10-14 Thread Imseih (AWS), Sami
Sorry for the delay in response.

> Back then, we were pretty much OK with the amount of space that could
> be wasted even in this case. Actually, how much space are we talking
> about here when a failed truncation happens? 

It is a transient waste of space, as it will eventually be cleaned up.

> As this is basically
> harmless, still leads to a confusing message, 

Correct, and especially because the message has
"wraparound" in the text.

> do we really need a backpatch here?

No, I don't think a backpatch is necessary.


> Anyway, it looks like you're right, we don't really need the SLRU once
> the tail is ahead of the tail because the SLRU has wrapped around due
> to the effect of transactions aging out, so making the truncation a
> bit smarter should be OK.

I assume you meant " the tail is ahead of the head".

SummarizeOldestCommittedSxact advances the headXid, but if we
checkpoint before this is called, then the tail could be ahead. The tail is
advanced by SerialSetActiveSerXmin whenever there is a new serializable
transaction.


> Hmm. This doesn't seem enough. Shouldn't we explain at least in
> which scenarios the tail can get ahead of the head (aka at least
> with long running transactions that make the SLRU wrap-around)?
> Except if I am missing something, there is no explanation of that in
> predicate.c.

After looking at this a bit more, I don't think the previous rev is correct.
We should not fall through to the "The SLRU is no longer needed" code path, which
also sets the headPage to invalid. We should only truncate up to the
head page.

Please see attached v3.


Regards,

Sami





0001-v3-Fix-false-apparent-wraparound-detected-in-pg_serial.patch
Description: 0001-v3-Fix-false-apparent-wraparound-detected-in-pg_serial.patch


Re: Logging parallel worker draught

2023-10-11 Thread Imseih (AWS), Sami
>> Currently explain ( analyze ) will give you the "Workers Planned"
>> and "Workers launched". Logging this via auto_explain is possible, so I am
>> not sure we need additional GUCs or debug levels for this info.
>>
>> -> Gather (cost=10430.00..10430.01 rows=2 width=8) (actual tim
>> e=131.826..134.325 rows=3 loops=1)
>> Workers Planned: 2
>> Workers Launched: 2

> I don't think autoexplain is a good substitute for the originally
> proposed log line. The possibility for log bloat is enormous. Some
> explain plans are gigantic, and I doubt people can afford that kind of
> log traffic just in case these numbers don't match.

Correct, that is a downside of auto_explain in general. 

The logging traffic can be controlled by 
auto_explain.log_min_duration/auto_explain.sample_rate/etc.
of course. 

> Well, if you read Benoit's earlier proposal at [1] you'll see that he
> does propose to have some cumulative stats; this LOG line he proposes
> here is not a substitute for stats, but rather a complement.  I don't
> see any reason to reject this patch even if we do get stats.

> Also, we do have a patch on stats, by Sotolongo and Bonne here [2].  I

Thanks. I will review the threads in depth and see if the ideas can be combined
in a comprehensive proposal.

Regarding the current patch, the latest version removes the separate GUC,
but the user should be able to control this behavior. 

Query text is logged when  log_min_error_statement > default level of "error".

This could be especially problematic when a query runs more than one parallel
Gather node that is in draught. In those cases each node will end up
generating a log entry with the statement text, so a single query execution
could end up having multiple log lines with the statement text.

i.e.
LOG:  Parallel Worker draught during statement execution: workers spawned 0, 
requested 2
STATEMENT:  select (select count(*) from large) as a, (select count(*) from 
large) as b, (select count(*) from large) as c ;
LOG:  Parallel Worker draught during statement execution: workers spawned 0, 
requested 2
STATEMENT:  select (select count(*) from large) as a, (select count(*) from 
large) as b, (select count(*) from large) as c ;
LOG:  Parallel Worker draught during statement execution: workers spawned 0, 
requested 2
STATEMENT:  select (select count(*) from large) as a, (select count(*) from 
large) as b, (select count(*) from large) as c ;

I wonder if it would be better to accumulate the total # of workers planned and
# of workers launched, and log this information at the end of execution?

Regards,

Sami




Re: Logging parallel worker draught

2023-10-09 Thread Imseih (AWS), Sami
Hi,

This thread has been quiet for a while, but I'd like to share some
thoughts.

+1 to the idea of improving visibility into parallel worker saturation.
But overall, we should improve parallel processing visibility, so DBAs can
detect trends in parallel usage ( is the workload doing more parallel, or doing 
less )
and have enough data to either tune the workload or change parallel GUCs.

>> We can output this at the LOG level to avoid running the server at
>> DEBUG1 level. There are a few other cases where we are not able to
>> spawn the worker or process and those are logged at the LOG level. For
>> example, "could not fork autovacuum launcher process .." or "too many
>> background workers". So, not sure, if this should get a separate
>> treatment. If we fear this can happen frequently enough that it can
>> spam the LOG then a GUC may be worthwhile.


> I think we should definitely be afraid of that. I am in favor of a separate 
> GUC.

Currently explain ( analyze ) will give you the "Workers Planned"
and "Workers launched". Logging this via auto_explain is possible, so I am
not sure we need additional GUCs or debug levels for this info.

   ->  Gather  (cost=10430.00..10430.01 rows=2 width=8) (actual tim
e=131.826..134.325 rows=3 loops=1)
 Workers Planned: 2
 Workers Launched: 2


>> What I was wondering was whether we would be better off putting this
>> into the statistics collector, vs. doing it via logging. Both
>> approaches seem to have pros and cons.
>>
>> I think it could be easier for users to process the information if it
>> is available via some view, so there is a benefit in putting this into
>> the stats subsystem.


> Unless we do this instead.

Adding cumulative stats is a much better idea.

3 new columns can be added to pg_stat_database:
workers_planned, 
workers_launched,
parallel_operations - There could be more than 1 operation
per query, if for example there are multiple Parallel Gather
operations in a plan.

With these columns, monitoring tools can trend if there is more
or less parallel work happening over time ( by looking at parallel
operations ) or if the workload is suffering from parallel saturation.
workers_launched/workers_planned < 1 means there is a lack
of available worker processes.
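
Something like the below is what I imagine a monitoring tool would run against
the proposed columns (a sketch only; these columns do not exist today):

select datname,
       workers_planned,
       workers_launched,
       round(workers_launched::numeric / nullif(workers_planned, 0), 2)
           as launch_ratio   -- < 1 means workers were unavailable
from pg_stat_database
where workers_planned > 0;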

Also, we could add this information at a per-query level as well
in pg_stat_statements, but this can be taken up in a separate
discussion.

Regards,

--
Sami Imseih
Amazon Web Services (AWS)










Re: False "pg_serial": apparent wraparound” in logs

2023-10-05 Thread Imseih (AWS), Sami
Correct a typo in my last message: 

Instead of:
“ I added an additional condition to make sure that the tailPage proceeds the 
headPage
as well. “

It should be:
“ I added an additional condition to make sure that the tailPage precedes the 
headPage
as well. ”


Regards,

Sami

> On Oct 5, 2023, at 6:29 PM, Imseih (AWS), Sami  wrote:
> 
> I added an additional condition to make sure that the tailPage proceeds the 
> headPage
> as well.


Re: False "pg_serial": apparent wraparound” in logs

2023-10-05 Thread Imseih (AWS), Sami
> I think the smallest fix here would be to change CheckPointPredicate()
> so that if tailPage > headPage, pass headPage to SimpleLruTruncate()
> instead of tailPage. Or perhaps it should go into the "The SLRU is no
> longer needed" codepath in that case. If tailPage > headPage, the SLRU
> isn't needed at the moment.

I spent some time studying this and it appears to be a good approach.

Passing the cutoff page as headPage (the "SLRU no longer needed" code path) instead
of the tailPage to SimpleLruTruncate is already being done when the tailXid is not
a valid XID. I added an additional condition to make sure that the tailPage proceeds
the headPage as well.

Attached is v2 of the patch.

> In addition to that, we could change SerialAdd() to not zero out the
> pages between old headXid and tailXid unnecessarily, but that's more of
> an optimization than bug fix.

Yes, I did notice that in my debugging, but will not address this in the 
current patch.


Regards,

Sami



0001-v2-Fix-false-apparent-wraparound-detected-in-pg_serial.patch
Description: 0001-v2-Fix-false-apparent-wraparound-detected-in-pg_serial.patch


Re: False "pg_serial": apparent wraparound” in logs

2023-09-29 Thread Imseih (AWS), Sami
> I don't really understand what exactly the problem is, or how this fixes
> it. But this doesn't feel right:

As the repro shows, false reports of "pg_serial": apparent wraparound”
messages are possible. For a very busy system which checkpoints frequently
and heavy usage of serializable isolation, this will flood the error logs, and 
falsely cause alarm to the user. It also prevents the SLRU from being
truncated.

In my repro, I end up seeing, even though the SLRU does not wraparound.
" LOG: could not truncate directory "pg_serial": apparent wraparound"

> Firstly, isn't headPage == 0 also a valid value? We initialize headPage
> to -1 when it's not in use.

Yes. You are correct. This is wrong.

> Secondly, shouldn't we set it to the page corresponding to headXid
> rather than tailXid.

> Thirdly, I don't think this code should have any business setting
> latest_page_number directly. latest_page_number is set in
> SimpleLruZeroPage(). 

Correct, after checking again, I do realize the patch is wrong.

> Are we missing a call to SimpleLruZeroPage() somewhere?

That is a good point.

The initial idea was to advance the latest_page_number 
during SerialSetActiveSerXmin, but the initial approach is 
obviously wrong.

When SerialSetActiveSerXmin is called for a new active
serializable xmin, and at that point we don't need to keep any
earlier transactions, should SimpleLruZeroPage be called
to ensure there is a target page for the xid?

I tried something like below, which fixes my repro, by calling
SimpleLruZeroPage at the end of SerialSetActiveSerXmin.

@@ -953,6 +953,8 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 static void
 SerialSetActiveSerXmin(TransactionId xid)
 {
+   int targetPage = SerialPage(xid);
+
LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
 
/*
@@ -992,6 +994,9 @@ SerialSetActiveSerXmin(TransactionId xid)
 
serialControl->tailXid = xid;
 
+   if (serialControl->headPage != targetPage)
+   SimpleLruZeroPage(SerialSlruCtl, targetPage);
+
LWLockRelease(SerialSLRULock);
 }

Regards,

Sami









Re: Jumble the CALL command in pg_stat_statements

2023-09-13 Thread Imseih (AWS), Sami
> Still this grouping is much better than having thousands of entries
> with different values. I am not sure if we should bother improving
> that more than what you suggest that, especially as FuncExpr->args can
> itself include Const nodes as far as I recall.

I agree.

> As far as the SET command mentioned in [1] is concerned, it is a bit more 
> complex
> as it requires us to deal with A_Constants which is not very straightforward. 
> We can surely
> deal with SET currently by applying custom query jumbling logic to 
> VariableSetStmt,
> but this can be dealt with in a separate discussion.

> As VariableSetStmt is the top-most node structure for SET/RESET
> commands, using a custom implementation may be wise in this case,

I do have a patch for this with test cases, 0001-v1-Jumble-the-SET-command.patch
If you feel this needs a separate discussion I can start one.

In the patch, the custom _jumbleVariableSetStmt jumbles
the kind, name, is_local and number of arguments ( in the case of a list ),
and tracks the constant locations for normalization.
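
As a sketch of the intended effect (this is the expected behavior under the
proposed patch, not something committed yet), SET commands that differ only
in the value would collapse into a single normalized entry:

SET work_mem = '4MB';
SET work_mem = '64MB';
-- expected: one pg_stat_statements entry, normalized along the lines of
--   SET work_mem = $1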

> This choice is a bit surprising. How does it influence the jumbling?
> For example, if I add a query_jumble_ignore to it, the regression
> tests of pg_stat_statements still pass. This is going to require more
> test coverage to prove that this addition is useful.

CALL with OUT or INOUT args is a bit strange, because
as the doc [1] mentions "Arguments must be supplied for all procedure 
parameters 
that lack defaults, including OUT parameters. However, arguments 
matching OUT parameters are not evaluated, so it's customary
to just write NULL for them."

so for pgss, passing a NULL or some other value into OUT/INOUT args should 
be normalized like IN args.

0001-v2-Jumble-the-CALL-command-in-pg_stat_statements.patch adds
these test cases.


Regards,

Sami Imseih
Amazon Web Services (AWS)

[1] https://www.postgresql.org/docs/current/sql-call.html



0001-v1-Jumble-the-SET-command.patch
Description: 0001-v1-Jumble-the-SET-command.patch


0001-v2-Jumble-the-CALL-command-in-pg_stat_statements.patch
Description: 0001-v2-Jumble-the-CALL-command-in-pg_stat_statements.patch


Jumble the CALL command in pg_stat_statements

2023-09-12 Thread Imseih (AWS), Sami
Hi,

The proposal by Bertrand in CC to jumble CALL and SET in [1] was
rejected at the time for a more robust solution to jumble DDL.

Michael also in CC made this possible with commit 3db72ebcbe.

The attached patch takes advantage of the jumbling infrastructure
added in the above mentioned commit and jumbles the CALL statement
in pg_stat_statements.

The patch also modifies existing test cases for CALL handling in 
pg_stat_statements
and adds additional tests which prove that a CALL to an overloaded procedure
will generate a different query_id.
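
For illustration, the kind of case the new tests exercise looks roughly
like this (the procedure names here are made up):

CREATE PROCEDURE p_test(i int) LANGUAGE plpgsql AS $$ BEGIN NULL; END; $$;
CREATE PROCEDURE p_test(t text) LANGUAGE plpgsql AS $$ BEGIN NULL; END; $$;

CALL p_test(10);
CALL p_test('ten');

-- expected: both calls normalize to CALL p_test($1), but each overload
-- gets its own queryid in pg_stat_statements
SELECT queryid, query, calls FROM pg_stat_statements WHERE query LIKE 'CALL%';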

As far as the SET command mentioned in [1] is concerned, it is a bit more 
complex
as it requires us to deal with A_Constants which is not very straightforward. 
We can surely
deal with SET currently by applying custom query jumbling logic to 
VariableSetStmt,
but this can be dealt with in a separate discussion.


Regards,

Sami Imseih
Amazon Web Services (AWS)

[1] 
https://www.postgresql.org/message-id/flat/36e5bffe-e989-194f-85c8-06e7bc88e6f7%40amazon.com


0001-v1-Jumble-the-CALL-command-in-pg_stat_statements.patch
Description: 0001-v1-Jumble-the-CALL-command-in-pg_stat_statements.patch


Re: Correct the documentation for work_mem

2023-09-08 Thread Imseih (AWS), Sami
> This looks mostly fine to me modulo "sort or hash". I do see many
> instances of "and/or" in the docs. Maybe that would work better.

"sort or hash operations at the same time" is clear explanation IMO.

This latest version of the patch looks good to me.

Regards,

Sami








Re: pg_stat_get_backend_subxact() and backend IDs?

2023-08-25 Thread Imseih (AWS), Sami
> Here is a new version of the patch that avoids changing the names of the
> existing functions. I'm not thrilled about the name
> (pgstat_fetch_stat_local_beentry_by_backend_id), so I am open to
> suggestions. In any case, I'd like to rename all three of the
> pgstat_fetch_stat_* functions in v17.

Thanks for the updated patch.

I reviewed/tested the latest version and I don't have any more comments.

Regards,

Sami



Re: pg_stat_get_backend_subxact() and backend IDs?

2023-08-25 Thread Imseih (AWS), Sami
I tested the patch and it does the correct thing.

I have a few comments:

1/ cast the return of bsearch. This was done previously and is the common
convention in the code.

So

+   return bsearch(, localBackendStatusTable, localNumBackends,
+  sizeof(LocalPgBackendStatus), cmp_lbestatus);

Should be

+   return (LocalPgBackendStatus *) bsearch(, localBackendStatusTable, 
localNumBackends,
+  sizeof(LocalPgBackendStatus), cmp_lbestatus);

2/ This will probably be a good time to update the docs for 
pg_stat_get_backend_subxact [1]
to call out that "subxact_count" will "only increase if a transaction is 
performing writes". Also to link
the reader to the subtransactions doc [2].


1. 
https://www.postgresql.org/docs/16/monitoring-stats.html#WAIT-EVENT-TIMEOUT-TABLE
2. https://www.postgresql.org/docs/16/subxacts.html


Regards,

Sami Imseih
Amazon Web Services (AWS)



Re: False "pg_serial": apparent wraparound” in logs

2023-08-24 Thread Imseih (AWS), Sami
Attached a patch with a new CF entry: https://commitfest.postgresql.org/44/4516/

Regards,

Sami Imseih
Amazon Web Services (AWS)







0001-v1-Fix-false-report-of-wraparound-in-pg_serial.patch
Description: 0001-v1-Fix-false-report-of-wraparound-in-pg_serial.patch


Re: False "pg_serial": apparent wraparound” in logs

2023-08-23 Thread Imseih (AWS), Sami
Hi,

I dug a bit into this, and what looks to be happening is that the page
containing the latest cutoff xid can falsely be reported as being in the
future of the latest page number, because the latest page number of the
Serial SLRU is only set when a page is initialized [1].

So under the right conditions, such as in the repro, the serializable
XID has moved past the latest page number; at the next checkpoint, which
triggers CheckPointPredicate, it will then appear that the SLRU
has wrapped around.

It seems what may be needed here is to advance the
latest_page_number during SerialSetActiveSerXmin, if
we are using the SLRU. See below:


diff --git a/src/backend/storage/lmgr/predicate.c 
b/src/backend/storage/lmgr/predicate.c
index 1af41213b4..6946ed21b4 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -992,6 +992,9 @@ SerialSetActiveSerXmin(TransactionId xid)

serialControl->tailXid = xid;

+   if (serialControl->headPage > 0)
+   SerialSlruCtl->shared->latest_page_number = SerialPage(xid);
+
LWLockRelease(SerialSLRULock);
}

[1] 
https://github.com/postgres/postgres/blob/master/src/backend/access/transam/slru.c#L306

Regards,

Sami

From: "Imseih (AWS), Sami" 
Date: Tuesday, August 22, 2023 at 7:56 PM
To: "pgsql-hack...@postgresql.org" 
Subject: False "pg_serial": apparent wraparound” in logs

Hi,

I recently encountered a situation in the field in which the message
“could not truncate directory "pg_serial": apparent wraparound”
was logged even though there was no danger of wraparound. This
was on a brand new cluster and only took a few minutes to see
the message in the logs.

Reading on some history of this error message, it appears that there
was work done to improve SLRU truncation and associated wraparound
log messages [1]. The attached repro on master still shows that this message
can be logged incorrectly.

The repro runs updates with 90 threads in serializable mode and kicks
off a “long running” select on the same table in serializable mode.

As soon as the long running select commits, the next checkpoint fails
to truncate the SLRU and logs the error message.

Besides the confusing log message, there may also be a risk of
pg_serial getting unnecessarily bloated and depleting disk space.

Is this a bug?

[1] 
https://www.postgresql.org/message-id/flat/20190202083822.GC32531%40gust.leadboat.com

Regards,

Sami Imseih
Amazon Web Services (AWS)







False "pg_serial": apparent wraparound” in logs

2023-08-22 Thread Imseih (AWS), Sami
Hi,

I recently encountered a situation in the field in which the message
“could not truncate directory "pg_serial": apparent wraparound”
was logged even though there was no danger of wraparound. This
was on a brand new cluster and only took a few minutes to see
the message in the logs.

Reading on some history of this error message, it appears that there
was work done to improve SLRU truncation and associated wraparound
log messages [1]. The attached repro on master still shows that this message
can be logged incorrectly.

The repro runs updates with 90 threads in serializable mode and kicks
off a “long running” select on the same table in serializable mode.

As soon as the long running select commits, the next checkpoint fails
to truncate the SLRU and logs the error message.

Besides the confusing log message, there may also be a risk of
pg_serial getting unnecessarily bloated and depleting disk space.

Is this a bug?

[1] 
https://www.postgresql.org/message-id/flat/20190202083822.GC32531%40gust.leadboat.com

Regards,

Sami Imseih
Amazon Web Services (AWS)





### create the pgbench script
echo "\set idv random(1, 1)" > /tmp/pgbench.sql
echo "BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;" >> /tmp/pgbench.sql
echo "UPDATE tab1 SET id2 = id2 WHERE id = :idv;" >> /tmp/pgbench.sql
echo "COMMIT;" >> /tmp/pgbench.sql

### create the driver script
cat < /tmp/repro.sh
psql<

Re: Correct the documentation for work_mem

2023-08-01 Thread Imseih (AWS), Sami
Hi,

Sorry for the delay in response and thanks for the feedback!

> I've reviewed and built the documentation for the updated patch. As it stands 
> right now I think the documentation for this section is quite clear.

Sorry, I am not understanding. What is clear? The current documentation -or- 
the proposed documentation in the patch?

>> I'm wondering about adding "and more than one of these operations may
>> be in progress simultaneously".  Are you talking about concurrent
>> sessions running other queries which are using work_mem too?

> This appears to be referring to the "sort and hash" operations mentioned 
> prior.

Correct, this is not referring to multiple sessions, but a given execution 
could 
have multiple operations that are each using up to work_mem simultaneously.

> I also agree that changing "sort or hash" to "sort and hash" is a better 
> description.

That is addressed in the last revision of the patch.

-Note that for a complex query, several sort or hash operations might be
-running in parallel; each operation will generally be allowed
+Note that a complex query may include several sort and hash operations,

Regards,

Sami 




WaitForOlderSnapshots in DETACH PARTITION causes deadlocks

2023-07-24 Thread Imseih (AWS), Sami
Hi,

While recently looking into partition maintenance, I found a case in which
DETACH PARTITION FINALIZE can cause deadlocks. This occurs when an
ALTER TABLE DETACH CONCURRENTLY is followed by a cancel, then followed by
an ALTER TABLE DETACH FINALIZE, and this sequence of steps occurs in the
middle of other REPEATABLE READ/SERIALIZABLE transactions. These RR/SERIALIZABLE
transactions should not have accessed the partition until the FINALIZE step
starts. See the attached repro.

This seems to occur because the FINALIZE step calls WaitForOlderSnapshots [1]
to make sure that all older snapshots are completed before the finalize
is completed and the detached partition is removed from the parent table
association.

WaitForOlderSnapshots is used here to ensure that snapshots older than
the start of the ALTER TABLE DETACH CONCURRENTLY are completely gone,
to guarantee consistency; however, it does seem to cause deadlocks for at
least RR/SERIALIZABLE transactions.
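
Roughly, the sequence looks like this (a simplified outline of the attached
repro; object names are illustrative and exact lock levels are glossed over):

-- session 1
CREATE TABLE p (id int) PARTITION BY RANGE (id);
CREATE TABLE p1 PARTITION OF p FOR VALUES FROM (1) TO (100);
ALTER TABLE p DETACH PARTITION p1 CONCURRENTLY;  -- cancelled mid-way

-- session 2: begins before the FINALIZE below and takes a snapshot
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT 1;

-- session 1
ALTER TABLE p DETACH PARTITION p1 FINALIZE;      -- blocks in WaitForOlderSnapshots

-- session 2: now touches the partition and blocks on session 1's lock,
-- while session 1 waits for session 2's snapshot to go away -> deadlock
SELECT count(*) FROM p1;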

There are cases [2] in which certain operations are accepted as not being
MVCC-safe, and now I am wondering if this is another such case. Deadlocks are
not a good scenario, and WaitForOlderSnapshots does not appear to do
anything for READ COMMITTED transactions.

So do we actually need WaitForOlderSnapshots in the FINALIZE code? And
could it be acceptable that this operation is marked as not MVCC-safe, like
the other aforementioned operations?

Perhaps I am missing some important point here, so any feedback will be
appreciated.

Regards,

Sami Imseih
Amazon Web Services (AWS)


[1] 
https://github.com/postgres/postgres/blob/master/src/backend/commands/tablecmds.c#L18757-L18764
[2] https://www.postgresql.org/docs/current/mvcc-caveats.html


partition_detach_finalize_deadlock_repro.sql
Description: partition_detach_finalize_deadlock_repro.sql


Re: New function to show index being vacuumed

2023-06-22 Thread Imseih (AWS), Sami
> I'm sorry for not having read (and not reading) the other thread yet,
> but what was the reason we couldn't store that oid in a column in the
> pg_s_p_vacuum-view?


> Could you summarize the other solutions that were considered for this issue?

Thanks for your feedback!

The reason we cannot stick the oid in pg_s_p_vacuum is because it will
not work for parallel vacuum as only the leader process has an entry
in pg_s_p_vacuum.

With a function, the leader or worker pid can be passed in, and the function
will return the indexrelid being processed.
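
Usage could then look something like this (a sketch only;
pg_stat_get_vacuum_index is the proposed function and does not exist yet):

SELECT a.pid,
       pg_stat_get_vacuum_index(a.pid)::regclass AS index_being_vacuumed
FROM pg_stat_activity a
WHERE a.backend_type IN ('autovacuum worker', 'parallel worker')
   OR a.query ILIKE 'vacuum%';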

Regards,

Sami








New function to show index being vacuumed

2023-06-22 Thread Imseih (AWS), Sami
Hi,

[1] is a ready-for-committer enhancement to pg_stat_progress_vacuum which 
exposes
the total number of indexes to vacuum and how many indexes have been vacuumed in
the current vacuum cycle.

To even further improve visibility into index vacuuming, it would be beneficial 
to have a
function called pg_stat_get_vacuum_index(pid) that takes in a pid and returns 
the
indexrelid of the index being processed.

Currently the only way to get the index being vacuumed by a process
is through OS tools such as pstack.

I had a patch for this as part of [1], but it was decided to handle this in a 
separate
discussion.

Comments/feedback will be appreciated before sending out a v1 of the patch.


Regards,

Sami Imseih
Amazon Web Services (AWS)

1. 
https://www.postgresql.org/message-id/flat/5478dfcd-2333-401a-b2f0-0d186ab09...@amazon.com




Re: [BUG] pg_dump does not properly deal with BEGIN ATOMIC function

2023-06-02 Thread Imseih (AWS), Sami
> So it's lacking a rule to tell it what to do in this case, and the
> default is the wrong way around. I think we need to fix it in
> about the same way as the equivalent case for matviews, which
> leads to the attached barely-tested patch.

Thanks for the patch! A test on the initially reported use case
and some other cases show it does the expected.

Some minor comments I have:

1/

+ agginfo[i].aggfn.postponed_def = false;   /* might get set during sort */

This is probably not needed as it seems that we can only
get into this situation when function dependencies are tracked.
These are either the argument or result types of a function, which
are already handled correctly, or dependencies found when the function body
is examined, as is the case with BEGIN ATOMIC.


2/

Instead of

+ * section.  This is sufficient to handle cases where a function depends on
+ * some unique index, as can happen if it has a GROUP BY for example.
+ */

The following description makes more sense. 

+ * section.  This is sufficient to handle cases where a function depends on
+ * some constraints, as can happen if a BEGIN ATOMIC function 
+ * references a constraint directly.


3/

The docs in https://www.postgresql.org/docs/current/ddl-depend.html
should be updated. In the entire section after "For user-defined functions",
there is no mention of BEGIN ATOMIC as one way that the function body
can be examined for dependencies.

This could be tracked in a separate doc update patch. What do you think?


> BTW, now that I see a case the default printout here seems
> completely ridiculous. I think we need to do


> describeDumpableObject(loop[i], buf, sizeof(buf));
> - pg_log_info(" %s", buf);
> + pg_log_warning(" %s", buf);
> }

Not sure I like this more than what is there now.

The current indentation in " pg_dump:   " makes it obvious 
that these lines are details for the warning message. Additional
"warning" messages will be confusing.


Regards,

Sami Imseih
Amazon Web Services (AWS)





[BUG] pg_dump does not properly deal with BEGIN ATOMIC function

2023-05-31 Thread Imseih (AWS), Sami
Hi,

What appears to be a pg_dump/pg_restore bug was observed with the new
BEGIN ATOMIC function body syntax introduced in Postgres 14.

Dependencies inside a BEGIN ATOMIC function cannot be resolved
if those dependencies are dumped after the function body. The repro
case is when a primary key constraint is used in an ON CONFLICT ON CONSTRAINT
clause within the function.
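
A minimal sketch of the shape of the failing case (modeled on the attached
repro; the names match the error below):

CREATE TABLE a (c1 text PRIMARY KEY, c2 int);

CREATE FUNCTION a_f(c1_in text, c2 int DEFAULT 60)
RETURNS void
LANGUAGE sql
BEGIN ATOMIC
  INSERT INTO a (c1, c2) VALUES (c1_in, c2)
  ON CONFLICT ON CONSTRAINT a_pkey DO UPDATE SET c2 = excluded.c2;
END;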

With the attached repro, pg_restore fails with

pg_restore: error: could not execute query: ERROR:  constraint "a_pkey" for 
table "a" does not exist
Command was: CREATE FUNCTION public.a_f(c1_in text, c2 integer DEFAULT 60) 
RETURNS void


I am not sure if the answer is to dump functions later on in the process.

Would appreciate some feedback on this issue.

Regards,

Sami Imseih
Amazon Web Services (AWS)



repro.sh
Description: repro.sh


Re: Correct the documentation for work_mem

2023-04-24 Thread Imseih (AWS), Sami
Based on the feedback, here is a v1 of the suggested doc changes.

I modified Gurjeet's suggestion slightly to make it clear that a specific
query execution could have operations simultaneously using up to 
work_mem.

I also added the small hash table memory limit clarification.


Regards,

Sami Imseih
Amazon Web Services (AWS)







v1-0001-Fix-documentation-for-work_mem.patch
Description: v1-0001-Fix-documentation-for-work_mem.patch


Re: Correct the documentation for work_mem

2023-04-21 Thread Imseih (AWS), Sami
> > especially since the next sentence uses "concurrently" to describe the
> > other case.  I think we need a more thorough rewording, perhaps like
> >
> > -   Note that for a complex query, several sort or hash operations 
> > might be
> > -   running in parallel; each operation will generally be allowed
> > +   Note that a complex query may include several sort or hash
> > +   operations; each such operation will generally be allowed

> This wording doesn't seem to bring out the fact that there could be
> more than one work_mem consumer running (in-progress) at the same
> time. 

Do you mean, more than one work_mem consumer running at the same
time for a given query? If so, that is precisely the point we need to convey
in the docs.

i.e. if I have 2 sorts in a query that can use up to 4MB each, at some point
in the query execution, I can have 8MB of memory allocated.
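
For example (tables and sizes are made up purely for illustration):

CREATE TABLE t1 AS SELECT g AS id, md5(g::text) AS c FROM generate_series(1, 100000) g;
CREATE TABLE t2 AS SELECT g AS id, md5(g::text) AS c FROM generate_series(1, 100000) g;
SET work_mem = '4MB';
SET enable_hashjoin = off;  -- encourage a merge join for the illustration
EXPLAIN SELECT * FROM t1 JOIN t2 USING (id);
-- a merge join here typically sorts both inputs: two Sort nodes that can
-- each hold up to work_mem at the same time while the query runs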


Regards,

Sami Imseih
Amazon Web Services (AWS)



Correct the documentation for work_mem

2023-04-21 Thread Imseih (AWS), Sami
Hi,

I recently noticed the following in the work_mem [1] documentation:

“Note that for a complex query, several sort or hash operations might be 
running in parallel;”

The use of “parallel” here is misleading as this has nothing to do with 
parallel query, but
rather several operations in a plan running simultaneously.

The use of parallel in this doc predates parallel query support, which explains 
the usage.

I suggest a small doc fix:

“Note that for a complex query, several sort or hash operations might be 
running simultaneously;”

This should also be backpatched to all supported versions docs.

Thoughts?

Regards,

Sami Imseih
Amazon Web Services (AWS)

1. https://www.postgresql.org/docs/current/runtime-config-resource.html






Re: Add index scan progress to pg_stat_progress_vacuum

2023-04-12 Thread Imseih (AWS), Sami
> This should be OK (also checked the code paths where the reports are
> added). Note that the patch needed a few adjustments for its
> indentation.

Thanks for the formatting corrections! This looks good to me.

--
Sami





Re: Add index scan progress to pg_stat_progress_vacuum

2023-04-10 Thread Imseih (AWS), Sami
> + case 'P': /* Parallel progress reporting */

I kept this comment as-is, but inside the case code block I added
more comments. This is to avoid cluttering up the one-liner comment.

> + * Increase and report the number of index scans. Also, we reset the progress
> + * counters.


> The counters reset are the two index counts, perhaps this comment
> should mention this fact.

Yes, since we are using the multi_param API here, it makes sense to 
mention the progress fields being reset in the comments.


+ /* update progress */
+ int index = pq_getmsgint(msg, 4);
+ int incr = pq_getmsgint(msg, 1);
[...]
+ pq_beginmessage(_message, 'P');
+ pq_sendint32(_message, index);
+ pq_sendint64(_message, incr);
+ pq_endmessage(_message);


> It seems to me that the receiver side is missing one pq_getmsgend()?
Yes. I added this.

> incr is defined and sent as an int64 on the sender side, hence the
> receiver should use pq_getmsgint64(), no? pq_getmsgint(msg, 1) means
> to receive only one byte, see pqformat.c. 

Ah correct, incr is an int64, so what we need is:

int64 incr = pq_getmsgint64(msg);

I also added the pq_getmsgend call.


> And the order is reversed?

I don't think so. The index and then incr are sent, and they are
read back in the same order. Testing the patch shows the value
increments correctly.


See v28 addressing the comments.

Regards,

Sami Imseih 
AWS (Amazon Web Services)




v28-0001-Report-index-vacuum-progress.patch
Description: v28-0001-Report-index-vacuum-progress.patch


Re: Add index scan progress to pg_stat_progress_vacuum

2023-04-07 Thread Imseih (AWS), Sami
> The arguments of pgstat_progress_update_param() would be given by the
> worker directly as components of the 'P' message. It seems to me that
> this approach would have the simplicity to not require the setup of a
> shmem area for the extra counters, and there would be no need for a
> callback. Hence, the only thing the code paths of workers would need
> to do is to call this routine, then the leaders would increment their
> progress when they see a CFI to process the 'P' message. Also, I
> guess that we would only need an interface in backend_progress.c to
> increment counters, like pgstat_progress_incr_param(), but usable by
> workers. Like a pgstat_progress_worker_incr_param()?

So, here is what I think should be workable to give a generic
progress interface.

pgstat_progress_parallel_incr_param will be a new API that
can be called by either a worker or the leader from any parallel
code path that chooses to increment a progress index. 

If called by a worker, it will send a 'P' message to the leader,
passing both the progress index, i.e. PROGRESS_VACUUM_INDEXES_PROCESSED,
and the value to increment by, i.e. 1 for index vacuum progress.

With that, the additional shared memory counters in ParallelVacuumState
are not needed, and the poke of the worker to the leader goes directly
through a generic backend_progress API.

Let me know your thoughts.

Thanks!

--
Sami Imseih
Amazon Web Services (AWS)





v27-0001-Report-index-vacuum-progress.patch
Description: v27-0001-Report-index-vacuum-progress.patch


Re: Add index scan progress to pg_stat_progress_vacuum

2023-04-06 Thread Imseih (AWS), Sami
> As one thing,
> for example, it introduces a dependency to parallel.h to do progress
> reporting without touching at backend_progress.h.  

Containing the logic in backend_progress.h is a reasonable point
from a maintenance standpoint.

We can create a new function in backend_progress.h called 
pgstat_progress_update_leader which is called from
vacuumparallel.c. 

pgstat_progress_update_leader can then call pq_putmessage('P', NULL, 0)

> Is a callback
> approach combined with a counter in shared memory the best thing there
> could be?  

It seems to be the best way.

The shared memory, ParallelVacuumState, is already tracking the
counters for the Parallel Vacuum.

Also, the callback in ParallelContext is the only way I can see
to let the 'P' message know what to do for updating progress
to the leader.


> Could it be worth thinking about a different design where
> the value incremented and the parameters of
> pgstat_progress_update_param() are passed through the 'P' message
> instead?

I am not sure how this is different than the approach suggested.
In the current design, the 'P' message is used to pass the
ParallelvacuumState to parallel_vacuum_update_progress which then
calls pgstat_progress_update_param.


> It strikes me that gathering data in the leader from a poke
> of the workers is something that could be useful in so much more areas
> than just the parallel index operations done in a vacuum because we do
> more and more things in parallel these days, so the API interface
> ought to have some attention.

We may need an interface that does more than progress
reporting, but I am not sure what those other use cases are at
this point.


> There is also an argument where we could
> have each worker report their progress independently of the leader?

In this case, we don't need ParallelContext at all or to go through the
'P' message.


--
Regards,

Sami Imseih
Amazon Web Services (AWS)



Re: [BUG] pg_stat_statements and extended query protocol

2023-04-05 Thread Imseih (AWS), Sami
> Makes sense to me. I'll look at that again today, potentially apply
> the fix on HEAD.

Here is v6. That was my mistake not to zero out the es_total_processed.
I had it in the first version.

--
Regards,

Sami Imseih
Amazon Web Services (AWS)




v6-0001-Fix-row-tracking-in-pg_stat_statements.patch
Description: v6-0001-Fix-row-tracking-in-pg_stat_statements.patch


Re: Add index scan progress to pg_stat_progress_vacuum

2023-04-05 Thread Imseih (AWS), Sami
> > The key point of the patch is here. From what I understand based on
> > the information of the thread, this is used as a way to make the
> progress reporting done by the leader more responsive so as we'd
> > update the index counters each time the leader is poked at with a 'P'
> > message by one of its workers, once a worker is done with the parallel
> > cleanup of one of the indexes. That's appealing, because this design
> > is responsive and cheap, while we'd rely on CFIs to make sure that the
> > leader triggers its callback on a timely basis. Unfortunately,
> > sticking a concept of "Parallel progress reporting" is rather
> > confusing here? This stuff can be used for much more purposes than
> > just progress reporting: the leader would execute the callback it has
> > registered based on the timing given by one or more of its workers,
> > these willing to push an event on the leader. Progress reporting is
> > one application of that to force a refresh and make the information of
> > the leader accurate. What about things like a chain of callbacks, for
> > example? Could the leader want to register more than one callback and
> > act on all of them with one single P message?


> That seems a valid argument. I was thinking that such an asynchronous
> state update mechanism would be a good infrastructure for progress
> reporting of parallel operations. It might be worth considering to use
> it in more general usage but since the current implementation is
> minimal can we extend it in the future when we need it for other use
> cases?

I don't think we should delay this patch to design a more general
infrastructure. I agree this can be handled by a future requirement.


> >
> > Another question I have: could the reporting of each individual worker
> > make sense on its own? The cleanup of the indexes depends on the
> > order they are processed, their number, size and AM with their cleanup
> > strategy, still there may be a point in getting information about how
> > much work a single worker is doing rather than just have the
> > aggregated information given to the leader?


> It would also be useful information for users but I don't think it can
> alternate the aggregated information. The aggregated information can
> answer the question from the user like "how many indexes to vacuum are
> remaining?", which helps estimate the remaining time to complete.

The original intention of the thread was to expose stats for both 
aggregate (leader level) and individual index progress. Both the aggregate
and individual index information have benefit, as mentioned by Sawada-san.

For the individual index progress, a patch was suggested earlier in
the thread, v1-0001-Function-to-return-currently-vacuumed-or-cleaned-ind.patch.

However, since this particular thread has focused mainly on the aggregated 
stats work,
my thoughts have been to start a new thread for the individual index progress
once this gets committed.


> > Btw, Is an assertion really helpful here? If
> > parallel_progress_callback is not set, we'd just crash one line
> > after. It seems to me that it could be cleaner to do nothing if a
> > leader gets a poke message from a worker if there are no callbacks
> > registered.

> Agreed.

I removed the assert and added an if condition instead.

See the attached v26 please.

Regards,

--
Sami Imseih
Amazon Web Services (AWS)





v26-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch
Description: v26-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch


Re: [BUG] pg_stat_statements and extended query protocol

2023-04-04 Thread Imseih (AWS), Sami
> Doing nothing for calls now is fine by me, though I
> agree that this could be improved at some point, as seeing only 1
> rather than N for each fetch depending on the size is a bit confusing.

I think we will need to clearly define what "calls" is. Perhaps as mentioned
above, we may need separate counters for "calls" vs "fetches". This is
definitely a separate thread.


> Doesn't this comment at the top of ExecutorRun() need an update? It
> seems to me that this comment should mention both es_total_processed

Yes, updated in v5.


> There is no need for this part in ExecutorFinish(), actually, as long
> as we always increment es_total_processed at the end ExecutorRun() for
> all the operation types? 

Ah, correct. I changed that and tested again.

> - es_processed: number of tuples processed during one ExecutorRun()
> call.
> - es_total_processed: total number of tuples aggregated across all
> ExecutorRun() calls.

I thought hard about this point and for some reason I did not want to
mention ExecutorRun in the comment. But, I agree with what you suggest.
It's more clear as to the intention of the fields.

Attached is v5 addressing the comments.


Regards,

Sami Imseih
Amazon Web Services (AWS)



v5-0001-Fix-row-tracking-in-pg_stat_statements.patch
Description: v5-0001-Fix-row-tracking-in-pg_stat_statements.patch


Re: [BUG] pg_stat_statements and extended query protocol

2023-04-04 Thread Imseih (AWS), Sami
> I was looking back at this thread, and the suggestion to use one field
> in EState sounds fine to me. Sami, would you like to send a new
> version of the patch (simplified version based on v1)?

Here is v4.

The "calls" tracking is removed from Estate. Unlike v1 however,
I added a check for the operation type. Inside ExecutorRun,
es_total_processed is incremented when the operation is
a SELECT. This check is done for es_processed as well inside
executorRun -> ExecutePlan.

For non select operations, es_total_processed is set to
es_processed in executorfinish. This is because the modify
plan nodes set es_processed outside of execMain.c


Regards,

Sami Imseih
Amazon Web Services (AWS)





v4-0001-Fix-row-tracking-in-pg_stat_statements.patch
Description: v4-0001-Fix-row-tracking-in-pg_stat_statements.patch


Re: [BUG] pg_stat_statements and extended query protocol

2023-04-03 Thread Imseih (AWS), Sami
> Maybe, but is there any field demand for that?

I don't think there is.

> We clearly do need to fix the
> reported rowcount for cases where ExecutorRun is invoked more than
> once per ExecutorEnd call; but I think that's sufficient.

Sure, the originally proposed fix, but with tracking of es_total_processed
only in EState, should be enough for now.

Regards,

Sami Imseih
Amazon Web Services (AWS)





Re: [BUG] pg_stat_statements and extended query protocol

2023-04-03 Thread Imseih (AWS), Sami
> Why should that be the definition? Partial execution of a portal
> might be something that is happening at the driver level, behind the
> user's back. You can't make rational calculations of, say, plan
> time versus execution time if that's how "calls" is measured.

Correct, and there are also drivers that implement fetch size using
cursor statements, i.e. DECLARE CURSOR, FETCH CURSOR,
and each FETCH gets counted as 1 call.

I wonder if the right answer here is to track fetches as 
a separate counter in pg_stat_statements, in which fetch
refers to the number of times a portal is executed?

Regards,

Sami Imseih
Amazon Web Services (AWS)








Re: [BUG] pg_stat_statements and extended query protocol

2023-04-03 Thread Imseih (AWS), Sami
> * Yeah, it'd be nice to have an in-core test, but it's folly to insist
> on one that works via libpq and psql. That requires a whole new set
> of features that you're apparently designing on-the-fly with no other
> use cases in mind. I don't think that will accomplish much except to
> ensure that this bug fix doesn't make it into v16.

I agree, Solving the lack of in-core testing for this area belong in a
different discussion. 


> * I don't understand why it was thought good to add two new counters
> to struct Instrumentation. In EXPLAIN ANALYZE cases those will be
> wasted space *per plan node*, not per Query.

Indeed, looking at ExplainNode, the Instrumentation struct is allocated
per node, so the new fields would be wasted space. Thanks for highlighting
this.

> * It also seems quite bizarre to add counters to a core data structure
> and then leave it to pg_stat_statements to maintain them. 

That is a fair point.

> I'm inclined to think that adding the counters to struct EState is
> fine. That's 304 bytes already on 64-bit platforms, another 8 or 16
> won't matter.

With the point you raise about Instrumentation per node, EState
is the better place for the new counters.


> Also, I'm doubtful that counting calls this way is a great idea,
> which would mean you only need one new counter field not two. The
> fact that you're having trouble defining what it means certainly
> suggests that the implementation is out front of the design.

ISTM you are not in agreement that a call count should be incremented 
after every ExecutorRun, but should only be incremented after 
the portal is closed, at ExecutorEnd. Is that correct?

FWIW, the rationale for incrementing calls in ExecutorRun is that calls refers 
to the number of times a client executes a portal, whether partially or to 
completion.

Clients can also fetch rows from portals at various rates, so to determine the
"rows per call" accurately from pg_stat_statements, we should track calls as 
the number of times ExecutorRun was called on a portal.

> In short, what I think I'd suggest is adding an es_total_processed
> field to EState and having standard_ExecutorRun do "es_total_processed
> += es_processed" near the end, then just change pg_stat_statements to
> use es_total_processed not es_processed.

The original proposal in 
0001-correct-pg_stat_statements-tracking-of-portals.patch,
was to add a "calls" and "es_total_processed" field to EState.


Regards,

Sami Imseih
Amazon Web Services (AWS)









Re: [BUG] pg_stat_statements and extended query protocol

2023-03-31 Thread Imseih (AWS), Sami
> So...  The idea here is to set a custom fetch size so as the number of
> calls can be deterministic in the tests, still more than 1 for the
> tests we'd have.  And your point is that libpq enforces always 0 when
> sending the EXECUTE message causing it to always return all the rows
> for any caller of PQsendQueryGuts().

That is correct.

> The extended protocol allows that, so you would like a libpq API to
> have more control of what we send with EXECUTE:
> https://www.postgresql.org/docs/current/protocol-overview.html#PROTOCOL-QUERY-CONCEPTS


> The extended query protocol would require multiple 'E' messages, but
> we would not need multiple describe or bind messages, meaning that
> this cannot just be an extra flavor PQsendQueryParams().  Am I gettig
> that right?  

Correct, there will need to be separate APIs for Parse/Bind, Execute
and Close of a Portal.


> The correct API design seems tricky, to say the least.
> Perhaps requiring this much extra work in libpq for the purpose of
> having some tests in this thread is not a brilliant idea..  Or perhaps
> we could just do it and have something a-la-JDBC with two routines?
> That would be one libpq routine for describe/bind and one for execute
> where the limit can be given by the caller in the latter case, similar
> to sendDescribeStatement() and sendExecute() in
> QueryExecutorImpl.java.

I am not too clear on your point here. ISTM you are suggesting adding
new libpq APIs similar to JDBC, which is what I am also suggesting.

Did I understand correctly?


--
Sami Imseih
Amazon Web Services (AWS)





Re: [BUG] pg_stat_statements and extended query protocol

2023-03-24 Thread Imseih (AWS), Sami
> I wonder that this patch changes the meaning of "calls" in the 
> pg_stat_statement
> view a bit; previously it was "Number of times the statement was executed" as
> described in the documentation, but currently this means "Number of times the
> portal was executed". I'm worried that this makes users confused. For example,
> a user may think the average numbers of rows returned by a statement is given 
> by
> rows/calls, but it is not always correct because some statements could be 
> executed
> with multiple portal runs.

I don't think it changes the meaning of "calls" in pg_stat_statements, since 
every
time the app fetches X amount of rows from a portal, it's still done in a 
separate
execution, and thus a separate call.

I agree, the meaning of "calls" should be clarified in docs.


Regards,

Sami Imseih
Amazon Web Services (AWS)



Re: [BUG] pg_stat_statements and extended query protocol

2023-03-23 Thread Imseih (AWS), Sami
> How does JDBC test that? Does it have a dependency on
> pg_stat_statements?

No, at the start of the thread, a sample jdbc script was attached.
But I agree, we need to add test coverage. See below.


>> But, I'm tempted to say that adding new tests could be addressed
>> separately though (as this patch looks pretty straightforward).


> Even small patches can have gotchas. I think that this should have
> tests in-core rather than just depend on JDBC and hope for the best.
> Even if \bind does not allow that, we could use an approach similar to
> libpq_pipeline, for example, depending on pg_stat_statements for the
> validation with a test module in src/test/modules/?

Yes, that is possible, but we will need to add a libpq API
that allows the caller to pass in a "fetch size".
PQsendQueryParams does not take in a "fetch size",
so it returns all rows.

https://github.com/postgres/postgres/blob/master/src/interfaces/libpq/fe-exec.c#L1882

Adding such an API that takes in a "fetch size" will be beneficial 
not just for this test, but I can see it enabling another psql meta command,
similar to \bind but that takes in a "fetch size".

Regards

--
Sami Imseih
Amazon Web Services (AWS)





Re: [BUG] pg_stat_statements and extended query protocol

2023-03-22 Thread Imseih (AWS), Sami
> What about using an uint64 for calls? That seems more appropriate to me (even 
> if
> queryDesc->totaltime->calls will be passed (which is int64), but that's 
> already
> also the case for the "rows" argument and 
> queryDesc->totaltime->rows_processed)

That's fair


> I'm not sure it's worth mentioning that the new counters are "currently" used 
> with the ExecutorRun.

Sure, I suppose these fields could be used outside of ExecutorRun. Good point.


> Also, I wonder if "rows" (and not rows_processed) would not be a better 
> naming.

Agree.

I went with rows_processed initially, since it was accumulating es_processed,
but as per the previous point, this instrumentation could be used outside of
ExecutorRun.

v3 addresses the comments.


Regards,


--
Sami Imseih
Amazon Web Services (AWS)





v3-0001-Correct-accumulation-of-counters-for-extended-query-.patch
Description: v3-0001-Correct-accumulation-of-counters-for-extended-query-.patch


Re: [BUG] pg_stat_statements and extended query protocol

2023-03-21 Thread Imseih (AWS), Sami
> This indeed feels a bit more natural seen from here, after looking at
> the code paths using an Instrumentation in the executor and explain,
> for example. At least, this stresses me much less than adding 16
> bytes to EState for something restricted to the extended protocol when
> it comes to monitoring capabilities.

Attached is the patch that uses Instrumentation. 

I did not add any new tests, and we do not have any way now
of setting a row count when going through the Execute message.
I think this may need to be addressed separately, since there
seems to be a gap in extended query protocol testing.

For this fix however, the JDBC test does show correct results.


Regards,

Sami Imseih
Amazon Web Services (AWS)





0001-Correct-accumulation-of-counters-for-extended-query-.patch
Description: 0001-Correct-accumulation-of-counters-for-extended-query-.patch


Re: [BUG] pg_stat_statements and extended query protocol

2023-03-20 Thread Imseih (AWS), Sami
Sorry about the delay in response about this.

I was thinking about this and it seems to me we can avoid
adding new fields to EState. I think a better place to track
rows and calls is in the Instrumentation struct.

--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -88,6 +88,8 @@ typedef struct Instrumentation
double  nfiltered2; /* # of tuples removed by 
"other" quals */
BufferUsage bufusage;   /* total buffer usage */
WalUsagewalusage;   /* total WAL usage */
+   int64   calls;
+   int64   rows_processed;
 } Instrumentation; 


If this is more palatable, I can prepare the patch.

Thanks for your feedback.

Regards.

Sami Imseih
Amazon Web Services (AWS)









Re: [BUG] pg_stat_statements and extended query protocol

2023-03-11 Thread Imseih (AWS), Sami
> If I remove this patch and recompile again, then "initdb -D $PGDATA" works.

It appears you must "make clean; make install" to correctly compile after
applying the patch.

Regards,

Sami Imseih
Amazon Web Services (AWS)











Re: Track IO times in pg_stat_io

2023-03-09 Thread Imseih (AWS), Sami
> >>> Now I've a second thought: what do you think about resetting the related 
> >>> number
> >>> of operations and *_time fields when enabling/disabling track_io_timing? 
> >>> (And mention it in the doc).
> >>>
> >>> That way it'd prevent bad interpretation (at least as far the time per 
> >>> operation metrics are concerned).
> >>>
> >>> Thinking that way as we'd loose some (most?) benefits of the new *_time 
> >>> columns
> >>> if one can't "trust" their related operations and/or one is not sampling 
> >>> pg_stat_io frequently enough (to discard the samples
> >>> where the track_io_timing changes occur).
> >>>
> >>> But well, resetting the operations could also lead to bad interpretation 
> >>> about the operations...
> >>>
> >>> Not sure about which approach I like the most yet, what do you think?
> >>
> >> Oh, this is an interesting idea. I think you are right about the
> >> synchronization issues making the statistics untrustworthy and, thus,
> >> unuseable.
> > 
> > No, I don't think we can do that. It can be enabled on a per-session basis.

> Oh right. So it's even less clear to me to get how one would make use of 
> those new *_time fields, given that:

> - pg_stat_io is "global" across all sessions. So, even if one session is 
> doing some "testing" and needs to turn track_io_timing on, then it
> is even not sure it's only reflecting its own testing (as other sessions may 
> have turned it on too).

> - There is the risk mentioned above of bad interpretations for the "time per 
> operation" metrics.

> - Even if there is frequent enough sampling of it pg_stat_io, one does not 
> know which samples contain track_io_timing changes (at the cluster or session 
> level).

As long as track_io_timing can be toggled, blk_write_time could lead to wrong 
conclusions.
I think it may be helpful to separately track the blks_read that occur
while track_io_timing is enabled.

blks_read will be as is and give the overall blks_read, while a new column
blks_read_with_timing will only report on blks_read with track_io_timing 
enabled.

blks_read_with_timing should never be larger than blks_read.

This will then make the blks_read_time valuable if it's looked at with
the blks_read_with_timing column.
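
To illustrate how such a pair could be consumed (blks_read_with_timing is the
proposed name and does not exist today; shown against pg_stat_database-style
naming as used above):

SELECT blks_read,
       blks_read_with_timing,
       blk_read_time / nullif(blks_read_with_timing, 0) AS avg_read_time_ms
FROM pg_stat_database
WHERE datname = current_database();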


Regards,

-- 

Sami Imseih
Amazon Web Services (AWS)



Re: Record queryid when auto_explain.log_verbose is on

2023-03-06 Thread Imseih (AWS), Sami
> > It's a bit annoying that the info is missing since pg 14, but we
> > probably can't
> > backpatch this as it might break log parser tools.


> What do you think?

That's a good point about log parsing tools, i.e. pgbadger.

Backpatching does not sound too appealing to me after
giving this a second thought.

However, I do feel it needs to be called out in docs,
that "Query Identifier" is not available in auto_explain
until version 16.

Regards,

--

Sami Imseih
Amazon Web Services (AWS)





Re: Record queryid when auto_explain.log_verbose is on

2023-03-06 Thread Imseih (AWS), Sami
I am wondering if this patch should be backpatched?

The reason being that in the auto_explain documentation [1]
there is a claim of equivalence between the auto_explain.log_verbose
option and EXPLAIN (VERBOSE):

". it's equivalent to the VERBOSE option of EXPLAIN."

This can be quite confusing for users of the extension.
The documentation should either be updated, or a backpatch done
all the way down to 14, which is the version the query identifier
was moved to core. I am in favor of the latter.

Any thoughts?


[1] https://www.postgresql.org/docs/14/auto-explain.html

Regards,

--
Sami Imseih
Amazon Web Services (AWS)




Re: [BUG] pg_stat_statements and extended query protocol

2023-03-06 Thread Imseih (AWS), Sami
> Well, it is one of these areas where it seems to me we have never been
> able to put a definition on what should be the correct behavior when
> it comes to pg_stat_statements. 

What needs to be defined here is how pgss should account for # of rows
processed when A) a SELECT goes through the extended query protocol, and 
B) it requires multiple executes to complete a portal.

The patch being suggested will treat every 'E' ( execute message ) to the same
portal as a new call ( pgss will increment the calls ) and the number of rows
processed will be accumulated for every 'E' message.

Currently, only the rows fetched in the last 'E' call to the portal are tracked
by pgss. This is incorrect.

> Could it be possible to add some
> regression tests using the recently-added \bind command and see how
> this affects things? 

\bind alone will not be enough as we also need a way to fetch from
a portal in batches. The code that needs to be exercised
as part of the test is exec_execute_message with max_rows != 0.

\bind will call exec_execute_message with max_rows = 0 to fetch
all the rows.

> I would suggest splitting these into their own
> SQL file, following an effort I have been doing recently for the
> regression tests of pg_stat_statements. It would be good to know the
> effects of this change for pg_stat_statements.track = (top|all), as
> well.

Yes, I agree that proper test coverage is needed here. Will think
about how to accomplish this.

> - uint64 es_processed; /* # of tuples processed */
> + uint64 es_processed; /* # of tuples processed at the top level only */
> + uint64 es_calls; /* # of calls */
> + uint64 es_total_processed; /* total # of tuples processed */


> So the root of the logic is here. Anything that makes the executor
> structures larger freaks me out, FWIW, and that's quite an addition.

I am not sure how to get around the changes to EState while fixing this
issue.

We could potentially only need the es_total_processed field and
continue to track calls in pgss. 

es_total_processed in EState however is still needed.

Regards,

--

Sami Imseih
Amazon Web Services (AWS)





Re: Doc update for pg_stat_statements normalization

2023-02-28 Thread Imseih (AWS), Sami
> + 
> + Queries on which normalization can be applied may be observed with constant
> + values in pg_stat_statements, especially when there
> + is a high rate of entry deallocations. To reduce the likelihood of this
> + happening, consider increasing pg_stat_statements.max.
> + The pg_stat_statements_info view, discussed below
> + in ,
> + provides statistics about entry deallocations.
> + 

> Are you OK with this text?

Yes, that makes sense.


Regards,

--
Sami Imseih
Amazon Web Services





Re: Doc update for pg_stat_statements normalization

2023-02-28 Thread Imseih (AWS), Sami
> I am OK with an addition to the documentation to warn that one may
> have to increase the maximum number of entries that can be stored if
> seeing a non-normalized entry that should have been normalized.

I agree. We introduce the concept of a plannable statement in a 
previous section and we can then make this distinction in the new
paragraph. 

I also added a link to pg_stat_statements_info since that is introduced
later on in the doc.


Regards,

-- 
Sami Imseih
Amazon Web Services




0001-doc-update-regarding-pg_stat_statements-normalizatio.patch
Description: 0001-doc-update-regarding-pg_stat_statements-normalizatio.patch


Re: Doc update for pg_stat_statements normalization

2023-02-27 Thread Imseih (AWS), Sami
>On Sat, Feb 25, 2023 at 01:59:04PM +0000, Imseih (AWS), Sami wrote:>
>> The overhead of storing this additional private data for the life of the 
> query
>> execution may not be  desirable.

>Okay, but why?

Additional memory to maintain the JumbleState data in other structs, and
the additional copy operations.

>This seems like the key point to me here.  If we copy more information
>into the Query structures, then we basically have no need for sticky
>entries, which could be an advantage on its own as it simplifies the
>deallocation and lookup logic.

Removing the sticky entry logic will be a big plus, and of course we can
eliminate a query not showing up properly normalized.

>For a DML or a SELECT, the manipulation of the hash table would still
>be a three-step process (post-analyze, planner and execution end), but
>the first step would have no need to use an exclusive lock on the hash
>table because we could just read and copy over the Query the
>normalized query if an entry exists, meaning that we could actually
>relax things a bit?  

No lock is held while normalizing, and a shared lock is held while storing,
so there is no apparent benefit from that aspect.

>This relaxation has as cost the extra memory used
>to store more data to allow the insertion to use a proper state of the
>Query[Desc] coming from the JumbleState (this extra data has no need
>to be JumbleState, just the results we generate from it aka the
>normalized query).

Wouldn't it be less, in terms of memory usage, to just store the required
JumbleState fields in Query[Desc]?

clocations_count,
highest_extern_param_id,
clocations_locations,
clocations_length

>> In v14, we added a dealloc metric to pg_stat_statements_info, which is 
> helpful.
>> However, this only deals with the pgss_hash entry deallocation.
>> I think we should also add a metric for the text file garbage collection.

>This sounds like a good idea on its own.

I can create a separate patch for this.

Regards,

--
Sami Imseih
Amazon Web services



Re: Doc update for pg_stat_statements normalization

2023-02-25 Thread Imseih (AWS), Sami
>> Could things be done in a more stable way?  For example, imagine that
>> we have an extra Query field called void *private_data that extensions
>> can use to store custom data associated to a query ID, then we could
>> do something like that:
>> - In the post-analyze hook, check if an entry with the query ID
>> calculated exists.
>> -- If the entry exists, grab a copy of the existing query string,
>> which may be normalized or not, and save it into Query->private_data.
>> -- If the entry does not exist, normalize the query, store it in
>> Query->private_data but do not yet create an entry in the hash table.
>> - In the planner/utility hook, fetch the normalized query from
>> private_data, then use it if an entry needs to be created in the hash
>> table.  The entry may have been deallocated since the post-analyze
>> hook, in which case it is re-created with the normalized copy saved in
>> the first phase.

>I think the idea of a "private_data" like thing has been discussed before 
> and
>rejected IIRC, as it could be quite expensive and would also need to
>accommodate for multiple extensions and so on.

The overhead of storing this additional private data for the life of the query
execution may not be desirable. I think we will also need to copy the
private data to QueryDesc to make it available to the planner/utility/exec
hooks.

>Overall, I think that if the pgss eviction rate is high enough that it's
>problematic for doing performance analysis, the performance overhead will 
> be so
>bad that simply removing pg_stat_statements will give you a ~ x2 
> performance
>increase.  I don't see much point trying to make such a performance killer
>scenario more usable.

In v14, we added a dealloc metric to pg_stat_statements_info, which is helpful.
However, this only deals with the pgss_hash entry deallocation.
I think we should also add a metric for the text file garbage collection.

Regards

-- 
Sami Imseih
Amazon Web Services



Doc update for pg_stat_statements normalization

2023-02-24 Thread Imseih (AWS), Sami
Replacing constants in pg_stat_statements is done on a best-effort basis.
It is not unlikely that on a busy workload with heavy entry deallocation,
the user may observe the query with the constants in pg_stat_statements.

From what I can see, this is because the only time an entry is normalized is
during post_parse_analyze, and the entry may be deallocated by the time query
execution ends. At that point, the original form ( with constants ) of the query
is used.

It is not clear how prevalent this is in real-world workloads, but it's easily
reproducible on a workload with high entry deallocation. Attached are the repro
steps on the latest branch.

I think the only thing to do here is to call this out in the docs with a
suggestion to increase pg_stat_statements.max to reduce the likelihood. The
suggested doc enhancement is attached as well.

Any thoughts?

Regards,

--
Sami Imseih
Amazon Web Services


### pg_stat_statements.max is min allowed
postgres=# show pg_stat_statements.max ;
 pg_stat_statements.max
------------------------
 100
(1 row)

### create 200 tables
do $$
begin
for i in 1 .. 200
loop
execute 'drop table if exists foo'||i;
execute 'create table foo'||i||'(id int, c2 text)';
end loop;
end; $$ ;

### create a pgbench script to insert into the tables.
for (( i=1; i<=200; i++ ))
do
   echo "INSERT INTO foo"$i" (id, c2) values (1, 'somedata');" >> /tmp/pgbench.sql
done

### run pgbench
pgbench -c 50 -f /tmp/pgbench.sql -T 1200


### observe pg_stat_statements
postgres=# select query from pg_stat_statements where query like '%foo%' and query not like '%$1%';
                       query
----------------------------------------------------
 INSERT INTO foo31 (id, c2) values (1, 'somedata')
 INSERT INTO foo20 (id, c2) values (1, 'somedata')
 INSERT INTO foo191 (id, c2) values (1, 'somedata')
 INSERT INTO foo170 (id, c2) values (1, 'somedata')
 INSERT INTO foo167 (id, c2) values (1, 'somedata')
 INSERT INTO foo32 (id, c2) values (1, 'somedata')
 INSERT INTO foo36 (id, c2) values (1, 'somedata')
 INSERT INTO foo43 (id, c2) values (1, 'somedata')
 INSERT INTO foo181 (id, c2) values (1, 'somedata')
 INSERT INTO foo88 (id, c2) values (1, 'somedata')
(10 rows)

postgres=# select query from pg_stat_statements where query like '%foo%' and query like '%$1%' limit 5;
                   query
---------------------------------------------
 INSERT INTO foo33 (id, c2) values ($1, $2)
 INSERT INTO foo59 (id, c2) values ($1, $2)
 INSERT INTO foo50 (id, c2) values ($1, $2)
 INSERT INTO foo42 (id, c2) values ($1, $2)
 INSERT INTO foo91 (id, c2) values ($1, $2)

### observe the # of deallocations
postgres=# select * from pg_stat_statements_info ;
 dealloc |          stats_reset
---------+-------------------------------
  113371 | 2023-02-24 06:24:21.912275-06
(1 row)


0001-doc-update-regarding-pg_stat_statements-normalizatio.patch
Description: 0001-doc-update-regarding-pg_stat_statements-normalizatio.patch


Re: Add index scan progress to pg_stat_progress_vacuum

2023-02-20 Thread Imseih (AWS), Sami
Thanks!

>  I think PROGRESS_VACUUM_INDEXES_TOTAL and
>   PROGRESS_VACUUM_INDEXES_PROCESSED are better for consistency. The rest
>   looks good to me.

Took care of that in v25. 

Regards

--
Sami Imseih
Amazon Web Services



v25-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch
Description: v25-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch


Re: Add index scan progress to pg_stat_progress_vacuum

2023-02-17 Thread Imseih (AWS), Sami
Thanks for the review!

>+ 
>+  ParallelVacuumFinish
>+  Waiting for parallel vacuum workers to finish index
>+  vacuum.
>+ 

>This change is out-of-date.

That was an oversight. Thanks for catching.

>Total number of indexes that will be vacuumed or cleaned up. This
>number is reported as of the beginning of the vacuuming indexes phase
>or the cleaning up indexes phase.

This is cleaner. I was being unnecessarily verbose in the original description.

>Number of indexes processed. This counter only advances when the phase
>is vacuuming indexes or cleaning up indexes.

I agree.

>Also, index_processed sounds accurate to me. What do you think?

At one point, I used index_processed, but decided to change it.
"processed" makes sense also. I will use this.

>I think these settings are not necessary since the pcxt is palloc0'ed.

Good point.

>Assert(pvs->pcxt->parallel_progress_callback_arg) looks wrong to me.
>If 'arg' is NULL, a SEGV happens.

Correct, Assert(pvs) is all that is needed.

>I think it's better to update pvs->shared->nindexes_completed by both
>leader and worker processes who processed the index.

No reason for that, since only the leader process can report progress to
backend_progress.


>I think it's better to make the function type consistent with the
>existing parallel_worker_main_type. How about
>parallel_progress_callback_type?

Yes, that makes sense.

> I've attached a patch that incorporates the above comments and has
>some suggestions of updating comments etc.

I reviewed and incorporated these changes, with a slight change. See v24.

-* Increase and report the number of index. Also, we reset the progress
-* counters.

+* Increase and report the number of index scans. Also, we reset the progress
+* counters.


Thanks

--
Sami Imseih
Amazon Web Services (AWS)



v24-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch
Description: v24-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch


Re: Add index scan progress to pg_stat_progress_vacuum

2023-02-07 Thread Imseih (AWS), Sami
Hi,

Thanks for your reply!

I addressed the latest comments in v23.

1/ cleaned up the asserts as discussed.
2/ used pq_putmessage to send the message on index scan completion.
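
For (2), the worker-side call in this approach is roughly the following (a sketch;
the 'P' message carries no payload, so an empty body is enough to poke the leader):

    pq_putmessage('P', NULL, 0);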

Thanks

--
Sami Imseih
Amazon Web Services (AWS)



v23-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch
Description: v23-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch


[BUG] pg_stat_statements and extended query protocol

2023-01-25 Thread Imseih (AWS), Sami
Doing some work with extended query protocol, I encountered the same
issue that was discussed in [1]. It appears when a client is using
extended query protocol and sends an Execute message to a portal with
max_rows, and a portal is executed multiple times,
pg_stat_statements does not correctly track rows and calls.

Using the attached JDBC script, Test.java, the issue can be reproduced with
setFetchSize of 100 and autocommit mode set to OFF. We can see that although
pg_class has 414 rows, only 1 call and 14 rows are reported; the first 4 * 100
fetches did not get accounted for.

postgres=# select calls, rows, query from pg_stat_statements
postgres-# where queryid = '-1905758228217333571';
 calls | rows |          query
-------+------+-------------------------
     1 |   14 | select * from pg_class
(1 row)

The execution workflow goes something like this:
ExecutorStart
ExecutorRun – called multiple times to fetch from the portal until the caller
              closes the portal or the portal runs out of rows.
ExecutorFinish
ExecutorEnd – the portal is closed & pg_stat_statements stores the final rows processed

Where this breaks for pg_stat_statements is during ExecutorRun,
es_processed is reset to 0 every iteration. So by the time the portal
is closed, es_processed will only show the total from the last execute
message.

This appears to be an issue only for portals fetched through the extended query
protocol, and not for explicit cursors that go through the simple query protocol
(i.e. FETCH).

I attached a JDBC script to repro the issue.

One potential fix I see is to introduce 2 new counters in the
executor state which will track the total rows processed
and the number of calls. These counters can then be used
by pg_stat_statements. Attached is an experimental patch
which shows the correct number of rows and number of
calls.
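
A rough sketch of the idea (the field names and increment sites are illustrative,
following the experimental patch approach rather than a final design):

/* new executor-state counters */
uint64      es_total_processed;   /* rows processed across all ExecutorRun calls */
uint64      es_calls;             /* number of ExecutorRun calls for this portal */

/* in standard_ExecutorRun(), keep running totals since es_processed is
 * reset at the start of the next call */
estate->es_total_processed += estate->es_processed;
estate->es_calls++;

pg_stat_statements would then read es_total_processed and es_calls in its
ExecutorEnd hook instead of es_processed.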

postgres=# select calls, rows, query from pg_stat_statements
postgres-# where queryid = '-1905758228217333571';
 calls | rows |          query
-------+------+-------------------------
     5 |  414 | select * from pg_class
(1 row)

[1] 
https://www.postgresql.org/message-id/flat/c90890e7-9c89-c34f-d3c5-d5c763a34bd8%40dunslane.net

Thanks

–
Sami Imseih
Amazon Web Services (AWS)



0001-correct-pg_stat_statements-tracking-of-portals.patch
Description: 0001-correct-pg_stat_statements-tracking-of-portals.patch


Test.java
Description: Test.java


Re: Add index scan progress to pg_stat_progress_vacuum

2023-01-20 Thread Imseih (AWS), Sami
>Number of indexes that will be vacuumed or cleaned up. This counter only
>advances when the phase is vacuuming indexes or cleaning up indexes.

I agree, this reads better.

---
-/* Report that we are now vacuuming indexes */
-pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
-                             PROGRESS_VACUUM_PHASE_VACUUM_INDEX);
+/*
+ * Report that we are now vacuuming indexes
+ * and the number of indexes to vacuum.
+ */
+progress_start_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_INDEX;
+progress_start_val[1] = vacrel->nindexes;
+pgstat_progress_update_multi_param(2, progress_start_index,
+                                   progress_start_val);

>According to our code style guideline[1], we limit line lengths so
>that the code is readable in an 80-column window. Some comments
 >   updated in this patch seem too short.

I will correct this.

>I think it's better to define "void *arg".

Agree

>---
>+/*
>+ * A Leader process that receives this message
>+ * must be ready to update progress.
>+ */
>+Assert(pcxt->parallel_progress_callback);
>+Assert(pcxt->parallel_progress_callback_arg);
>+
>+/* Report progress */
>+pcxt->parallel_progress_callback(pcxt->parallel_progress_callback_arg);

>I think the parallel query infra should not require
>parallel_progress_callback_arg to always be set. I think it can be
>NULL.

This assertion is inside the new 'P' message type handling.
If a leader is consuming this message, it must have a
progress callback set. Right now we set the callback only
in the parallel vacuum case, so not all leaders will be prepared
to handle this message.

Would you agree this is needed for safety?

case 'P':               /* Parallel progress reporting */
    {
        /*
         * A Leader process that receives this message
         * must be ready to update progress.
         */
        Assert(pcxt->parallel_progress_callback);
        Assert(pcxt->parallel_progress_callback_arg);

---
>+void
>+parallel_vacuum_update_progress(void *arg)
>+{
>+    ParallelVacuumState *pvs = (ParallelVacuumState *) arg;
>+
>+    Assert(!IsParallelWorker());
>+
>+    if (pvs)
>+        pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_COMPLETED,
>+            pg_atomic_add_fetch_u32(&(pvs->shared->nindexes_completed), 1));
>+}

>Since parallel vacuum always sets the arg, I think we don't need to check it.

The arg is only set during parallel vacuum; during non-parallel vacuum,
it's NULL. This check can be removed, but I realize now that we do need
an Assert(pvs). Do you agree?
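
Something like this (a sketch of the revised callback based on the discussion
above, not the final patch):

void
parallel_vacuum_update_progress(void *arg)
{
    ParallelVacuumState *pvs = (ParallelVacuumState *) arg;

    Assert(!IsParallelWorker());
    Assert(pvs);

    pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_COMPLETED,
                                 pg_atomic_add_fetch_u32(&(pvs->shared->nindexes_completed), 1));
}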

--

Regards,

Sami Imseih
Amazon Web Services (AWS)




Re: Add index scan progress to pg_stat_progress_vacuum

2023-01-12 Thread Imseih (AWS), Sami
Thanks for the feedback and I apologize for the delay in response.

>I think the problem here is that you're basically trying to work around the
>lack of an asynchronous state update mechanism between leader and workers. The
>workaround is to add a lot of different places that poll whether there has
>been any progress. And you're not doing that by integrating with the existing
>machinery for interrupt processing (i.e. CHECK_FOR_INTERRUPTS()), but by
>developing a new mechanism.

>I think your best bet would be to integrate with HandleParallelMessages().

You are correct. I have been trying to work around the async nature
of parallel workers performing the index vacuum. As you have pointed out,
integrating with HandleParallelMessages does appear to be the proper way.
Doing so will also avoid having to do any progress updates in the index AMs.

In the attached patch, the parallel workers send a new type of protocol message
to the leader, called 'P', which signals the leader that it should handle a
progress update. The leader then performs the progress update by invoking a
callback set in the ParallelContext; this is done inside HandleParallelMessages.
In the index vacuum case, the callback is parallel_vacuum_update_progress.

The new message does not contain a payload; it's merely used to signal the
leader that it can invoke a progress update.

Also, if the leader performs the index vacuum, it can call
parallel_vacuum_update_progress directly inside vacuumparallel.c.
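
For reference, wiring the callback up on the leader side looks roughly like this
(a sketch; the exact spot in vacuumparallel.c where the ParallelContext is set up
is illustrative):

    pcxt->parallel_progress_callback = parallel_vacuum_update_progress;
    pcxt->parallel_progress_callback_arg = pvs;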

Regards,

Sami Imseih
Amazon Web Services (AWS)






v22-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch
Description: v22-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch


Re: Add index scan progress to pg_stat_progress_vacuum

2023-01-06 Thread Imseih (AWS), Sami
> My point is whether we should show
> indexes_total throughout the vacuum execution (even also in not
>  relevant phases such as heap scanning/vacuum/truncation).

That is a good point. We should show indexes_total
and indexes_completed only during the relevant phases.

V21 addresses this along with a documentation fix.

indexes_total and indexes_completed can only be a value > 0 while in the
"vacuuming indexes" or "cleaning up indexes" phases of vacuum.

indexes_total is set at the start of the index vacuum cycle or index cleanup,
and reset to 0, along with indexes_completed, at the end of the index vacuum
cycle or index cleanup.

Also, for the progress updates in vacuumlazy.c that should be done atomically,
I made a change to use pgstat_progress_update_multi_param.
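
For example, resetting both counters atomically at the end of the cycle looks
roughly like this (a sketch; the array names and exact constant names are
illustrative):

    const int   progress_index[] = {
        PROGRESS_VACUUM_INDEXES_TOTAL,
        PROGRESS_VACUUM_INDEXES_PROCESSED
    };
    const int64 progress_val[] = {0, 0};

    pgstat_progress_update_multi_param(2, progress_index, progress_val);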

Regards,

--
Sami Imseih
Amazon Web Services: https://aws.amazon.com



v21-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch
Description: v21-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch


Re: Add index scan progress to pg_stat_progress_vacuum

2023-01-05 Thread Imseih (AWS), Sami
>Similar to above three cases, vacuum can bypass index vacuuming if
>there are almost zero TIDs. Should we set indexes_total to 0 in this
>case too? If so, I think we can set both indexes_total and
>indexes_completed at the beginning of the index vacuuming/cleanup and
>reset them at the end. 

Unlike the other 3 cases, in which the vacuum and cleanup are totally skipped,
a cleanup still occurs when the index vacuum is bypassed. From what I can tell,
this is to allow for things like a gin pending list cleanup even if the index
is not vacuumed. There could be other reasons as well.

if (bypass)
{
    /*
     * There are almost zero TIDs.  Behave as if there were precisely
     * zero: bypass index vacuuming, but do index cleanup.
     *
     * We expect that the ongoing VACUUM operation will finish very
     * quickly, so there is no point in considering speeding up as a
     * failsafe against wraparound failure. (Index cleanup is expected to
     * finish very quickly in cases where there were no ambulkdelete()
     * calls.)
     */
    vacrel->do_index_vacuuming = false;
}

So it seems like we should still report the total number of indexes as we are
currently doing in the patch.

With that said, the documentation should make this clearer.

For indexes_total, the description should be:

   Number of indexes that will be vacuumed or cleaned up. This value will be
   0 if there are no indexes to vacuum, INDEX_CLEANUP is set to OFF, or
   vacuum failsafe is triggered. See 

For indexes_completed, it should be:

   Number of indexes vacuumed in the current vacuum cycle when the
   phase is vacuuming indexes, or the number
   of indexes cleaned up in the cleaning up indexes
   phase.


Regards,

--
Sami Imseih
Amazon Web Services: https://aws.amazon.com



Re: Add index scan progress to pg_stat_progress_vacuum

2023-01-04 Thread Imseih (AWS), Sami
Thanks for the review!

Addressed the comments.

> "Increment the indexes completed." (dot at the end) instead?

I used the commenting format used in other places in this
file, which includes a double dash, i.e.
/* Wraparound emergency -- end current index scan */

> It seems to me that "#define REPORT_PARALLEL_VACUUM_EVERY_PAGES 
> ((BlockNumber) (1024 * 1024 * 1024 / BLCKSZ))" would be fine too.

I kept this the same since it matches what we are doing in other places, such
as FAILSAFE_EVERY_PAGES.

v20 attached.


Regards,

--
Sami Imseih
Amazon Web Services: https://aws.amazon.com



v20-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch
Description: v20-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch


Re: Add index scan progress to pg_stat_progress_vacuum

2023-01-03 Thread Imseih (AWS), Sami
> cirrus-ci.com/task/4557389261701120

I had earlier compiled without --enable-cassert,
which is why the compilation errors did not show up in my
build.

Fixed in v19.

Thanks

--
Sami Imseih
Amazon Web Services: https://aws.amazon.com





v19-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch
Description: v19-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch


Re: Add index scan progress to pg_stat_progress_vacuum

2023-01-01 Thread Imseih (AWS), Sami
>cfbot is complaining that this patch no longer applies.  Sami, would you
>mind rebasing it?

Rebased patch attached.

--
Sami Imseih
Amazon Web Services: https://aws.amazon.com



v18-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch
Description: v18-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch


Re: Call lazy_check_wraparound_failsafe earlier for parallel vacuum

2022-12-22 Thread Imseih (AWS), Sami
>I adjusted the FAILSAFE_EVERY_PAGES comments, which now point out that
>FAILSAFE_EVERY_PAGES is a power-of-two. The implication is that the
>compiler is all but guaranteed to be able to reduce the modulo
>division into a shift in the lazy_scan_heap loop, at the point of the
>failsafe check. I doubt that it would really matter if the compiler
>had to generate a DIV instruction, but it seems like a good idea to
>avoid it on general principle, at least in performance sensitive code.

Thank you! 

Regards,

Sami Imseih
Amazon Web Services (AWS)



Re: Call lazy_check_wraparound_failsafe earlier for parallel vacuum

2022-12-20 Thread Imseih (AWS), Sami
Attached is a patch to check scanned pages rather
than blockno. 
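
The gist of the change is roughly the following (a sketch, assuming scanned_pages
on the vacuum state is the counter used for the check):

    if (vacrel->scanned_pages % FAILSAFE_EVERY_PAGES == 0)
        lazy_check_wraparound_failsafe(vacrel);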

Regards,

Sami Imseih
Amazon Web Services (AWS)




v1-0001-fixed-when-wraparound-failsafe-is-checked.patch
Description: v1-0001-fixed-when-wraparound-failsafe-is-checked.patch

