Re: Fwd: temp_file_limit?

2022-12-23 Thread Frits Jalvingh
Hi Justin,

As our queries are generated, I decided to create a peephole-optimizer kind
of thing that scans the generated SQL AST to find multiple conditions on the
same table reference. I can then use our metadata to see whether these
references are expected to be correlated. This creates about 20 statistics
sets, including the one you indicated. That at least makes the problematic
query have a stable and very fast plan (so far). I had hoped for some more
improvement with other queries, but that has not yet been evident ;)
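
For reference, the generated statements follow the pattern you suggested;
the one for the case from this thread looks like this (the other ~20 objects
are analogous, derived from our metadata):

| CREATE STATISTICS s_h_sturingslabel_ssm_stats ON soort, code FROM s_h_sturingslabel_ssm;
| ANALYZE s_h_sturingslabel_ssm;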

Thanks a lot for the tips and your help!

Cordially,

Frits

On Tue, Dec 20, 2022 at 10:11 PM Justin Pryzby  wrote:

> On Mon, Dec 19, 2022 at 09:10:27PM +0100, Frits Jalvingh wrote:
> > @justin
> >
> > I tried the create statistics variant and that definitely improves the
> > estimate, and with that one of the "bad" cases (the one with the 82
> > minute plan) now creates a good plan using only a few seconds.
> > That is a worthwhile path to follow. A bit hard to do, because those
> > conditions can be anything, but I can probably calculate the ones used
> > per customer and create those correlation statistics from that... It is
> > definitely better than tweaking the "poor man's query hints" enable_
> > 8-/ which is really not helping with plan stability either.
> >
> > That will be a lot of work, but I'll let you know the results ;)
>
> Yeah, if the conditions are arbitrary, then it's going to be more
> difficult.  Hopefully you don't have too many columns.  :)
>
> I suggest enabling auto_explain and monitoring for queries which were
> slow, and retroactively adding statistics to those columns which are
> most-commonly queried, and which have correlations (which the planner
> doesn't otherwise know about).
>
> You won't want to have more than a handful of columns in a stats object
> (since it requires factorial(N) complexity), but you can have multiple
> stats objects with different combinations of columns (and, in v14,
> expressions).  You can also set a lower stats target to make the cost a
> bit lower.
>
> You could try to check which columns are correlated, either by running:
> | SELECT COUNT(1),col1,col2 FROM tbl GROUP BY 2,3 ORDER BY 1;
> for different combinations of columns.
>
> Or by creating a tentative/experimental stats object on a handful of
> columns at a time for which you have an intuition about their
> correlation, and then checking the calculated dependencies FROM
> pg_stats_ext.  You may need to do something clever to use that for
> arbitrary columns.  Maybe this is a start.
> | SELECT dep.value::float, tablename, attnames, dep.key, exprs FROM
> (SELECT (json_each_text(dependencies::text::json)).* AS dep, * FROM
> pg_stats_ext)dep WHERE dependencies IS NOT NULL ORDER BY 1 DESC ; -- AND
> regexp_count(key, ',') < 2
>
> --
> Justin
>


Re: Fwd: temp_file_limit?

2022-12-20 Thread Justin Pryzby
On Mon, Dec 19, 2022 at 09:10:27PM +0100, Frits Jalvingh wrote:
> @justin
> 
> I tried the create statistics variant and that definitely improves the
> estimate, and with that one of the "bad" cases (the one with the 82 minute
> plan) now creates a good plan using only a few seconds.
> That is a worthwhile path to follow. A bit hard to do, because those
> conditions can be anything, but I can probably calculate the ones used per
> customer and create those correlation statistics from that... It is
> definitely better than tweaking the "poor man's query hints" enable_
> 8-/ which is really not helping with plan stability either.
> 
> That will be a lot of work, but I'll let you know the results ;)

Yeah, if the conditions are arbitrary, then it's going to be more
difficult.  Hopefully you don't have too many columns.  :)

I suggest enabling auto_explain and monitoring for queries which were
slow, and retroactively adding statistics to those columns which are
most-commonly queried, and which have correlations (which the planner
doesn't otherwise know about).
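
A minimal auto_explain setup for that could look like this in postgresql.conf
(a sketch; the threshold is only an example, log_analyze adds overhead, and
shared_preload_libraries needs a restart):

| shared_preload_libraries = 'auto_explain'
| auto_explain.log_min_duration = 30s
| auto_explain.log_analyze = on
| auto_explain.log_nested_statements = on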

You won't want to have more than a handful of columns in a stats object
(since it requires factorial(N) complexity), but you can have multiple
stats objects with different combinations of columns (and, in v14,
expressions).  You can also set a lower stats target to make the cost a
bit lower.
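
For example (a sketch; col1/col2/col3 and the target are placeholders for
whichever combinations you end up needing, and ALTER STATISTICS ... SET
STATISTICS is v13+):

| CREATE STATISTICS tbl_col1_col2_stats (dependencies) ON col1, col2 FROM tbl;
| CREATE STATISTICS tbl_col1_col3_stats (dependencies, ndistinct) ON col1, col3 FROM tbl;
| ALTER STATISTICS tbl_col1_col2_stats SET STATISTICS 100;  -- lower target, cheaper ANALYZE
| ANALYZE tbl;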

You could try to check which columns are correlated, either by running:
| SELECT COUNT(1),col1,col2 FROM tbl GROUP BY 2,3 ORDER BY 1;
for different combinations of columns.
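For instance, with the columns from the stats object suggested earlier in the
thread (untested):
| SELECT COUNT(1), soort, code FROM s_h_sturingslabel_ssm GROUP BY 2,3 ORDER BY 1 DESC;
If a few combinations account for most of the rows, the columns are correlated
and a dependencies stats object should pay off.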

Or by creating a tentative/experimental stats object on a handful of
columns at a time for which you have an intuition about their
correlation, and then checking the calculated dependencies FROM
pg_stats_ext.  You may need to do something clever to use that for
arbitrary columns.  Maybe this is a start.
| SELECT dep.value::float, tablename, attnames, dep.key, exprs
|   FROM (SELECT (json_each_text(dependencies::text::json)).* AS dep, * FROM pg_stats_ext) dep
|   WHERE dependencies IS NOT NULL
|   ORDER BY 1 DESC;  -- AND regexp_count(key, ',') < 2
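
Once the objects exist and the table is analyzed, the per-table view is also
readable directly, e.g. (using the table from earlier in the thread):
| SELECT statistics_name, attnames, dependencies FROM pg_stats_ext WHERE tablename = 's_h_sturingslabel_ssm';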

-- 
Justin




Re: Fwd: temp_file_limit?

2022-12-19 Thread Frits Jalvingh
@Thomas


Thanks for helping identify the issue. I think it would be nice to clean
up those obsolete files during the run, because running out of disk is
really not a good thing to have ;) Of course the bad estimates leading to
the resize are the real issue, but this at least makes it less bad.

Thanks for the tip about disabling parallel_hash but I also found it in the
source. As mentioned before I disable (on production systems) all
parallelism, not just for this issue but to prevent the OOM killer from
killing Postgres - which happens way more often with parallelism on...


Re: Fwd: temp_file_limit?

2022-12-19 Thread Frits Jalvingh
@justin

I tried the create statistics variant and that definitely improves the
estimate, and with that one of the "bad" cases (the one with the 82 minute
plan) now creates a good plan using only a few seconds.
That is a worthwhile path to follow. A bit hard to do, because those
conditions can be anything, but I can probably calculate the ones used per
customer and create those correlation statistics from that... It is
definitely better than tweaking the "poor man's query hints" enable_
8-/ which is really not helping with plan stability either.

That will be a lot of work, but I'll let you know the results ;)


Re: Fwd: temp_file_limit?

2022-12-19 Thread Thomas Munro
On Tue, Dec 20, 2022 at 8:59 AM Frits Jalvingh  wrote:
> @ranier
> These files ONLY exist during the query. They get deleted as soon as the 
> query terminates, by Postgres itself. Once the query terminates pgsql_tmp is 
> completely empty. Considering what Thomas said (and the actual occurrence of 
> the files he mentioned) this does seem to be the more likely cause to me.

I'm working on some bug fixes near this area at the moment, so I'll
also see if I can figure out how to implement the missing eager
cleanup of earlier generations.  It's still a pretty bad scenario once
you reach it (repartitioning repeatedly, that is) and the solution to
that is probably much harder, but it's obviously not great to waste
temporary disk space like that.  BTW you can disable just parallel
hash with enable_parallel_hash=false.
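
For a quick test that is a one-liner per session (or instance-wide via
ALTER SYSTEM plus pg_reload_conf(), if it turns out to help):

| SET enable_parallel_hash = off;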




Re: Fwd: temp_file_limit?

2022-12-19 Thread Frits Jalvingh
@ranier
These files ONLY exist during the query. They get deleted as soon as the
query terminates, by Postgres itself. Once the query terminates pgsql_tmp
is completely empty. Considering what Thomas said (and the actual
occurrence of the files he mentioned) this does seem to be the more likely
cause to me.


Re: Fwd: temp_file_limit?

2022-12-19 Thread Justin Pryzby
On Sun, Dec 18, 2022 at 06:29:41PM +0100, Frits Jalvingh wrote:
> By itself I'm used to bad query performance in Postgresql; our application
> only does bulk queries and Postgres quite often makes terrible plans for
> those, but with set enable_nestloop=false set always most of them at least
> execute. The remaining failing queries are almost 100% caused by bad join
> sequences; I plan to work around those by forcing the join order from our
> application. For instance, the exact same query above can also generate the
> following plan (this one was created by manually setting
> join_collapse_limit = 1, but fast variants also occur quite often when
> disabling parallelism):

I, too, ended up setting enable_nestloop=false for our report queries,
to avoid the worst-case plans.

But you should also try to address the rowcount misestimates.  This
underestimates the rowcount by a factor of 69 (or 138 in the plan you
sent today):

| (soort = 'MSL'::text) AND (code = 'DAE'::text)

If those conditions are correlated, you can improve the estimate by
adding extended stats object.

| CREATE STATISTICS s_h_sturingslabel_ssm_stats ON soort, code FROM s_h_sturingslabel_ssm;
| ANALYZE s_h_sturingslabel_ssm;

Unfortunately, stats objects currently only improve scans, and not
joins, so that might *improve* some queries, but it won't resolve the
worst problems:

| Hash Join (cost=22,832.23..44,190.21 rows=185 width=47) (actual time=159.725..2,645,634.918 rows=28,086,472,886 loops=1)

Maybe you can improve that by adjusting the stats target or ndistinct...
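
For example (an untested sketch; the target and ndistinct values are only to
illustrate the knobs, not recommendations for your data):

| ALTER TABLE s_h_sturingslabel_ssm ALTER COLUMN soort SET STATISTICS 1000;
| ALTER TABLE s_h_sturingslabel_ssm ALTER COLUMN code SET (n_distinct = 50);
| ANALYZE s_h_sturingslabel_ssm;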

-- 
Justin




Re: Fwd: temp_file_limit?

2022-12-19 Thread Ranier Vilela
On Mon, Dec 19, 2022 at 4:29 PM Frits Jalvingh  wrote:

> Ok, just to make sure that I understand correctly:
> The parallel hash implementation needs to resize its table because of a
> mismatch in expected tuple count. I do expect this to be true: Postgres
> often grossly underestimates the expected row counts in our queries.
> This is not fully implemented yet: removing the "old" files is not yet
> done, so every time the table resizes it creates a new set of files and the
> old ones remain.
> I assume that the "used file size" only includes the "current" set of
> files, and that the old ones are not counted towards that amount? That
> would explain why it overallocates, of course.
>
That is not necessarily what is happening.
Could you try manually deleting (rm) these files as the postgres user?
It's an ugly and dirty test, but it could indicate whether files are really
being left behind without being deleted by Postgres.

Alternatively, you could compile a version with
CHECK_WRITE_VS_EXTEND set, and try to fetch as much information from the
logs as possible,
as has been indicated by others here.


> By itself I now know what to do: I just need to disable all parallelism (
> •̀ᴗ•́ )و ̑̑
>
> I usually do that anyway because it makes queries die randomly. This is
> just another reason.
>
> I restarted that query with max_parallel_workers_per_gather=0, and this
> does not seem to use tempspace at all. It was not exactly fast, it took 82
> minutes of a single process running at 100% cpu.
> https://explain.depesz.com/s/HedE
>
Anyway, see the hint page (https://explain.depesz.com/s/HedE#hints),
maybe it will be useful.

regards,
Ranier Vilela


Re: Fwd: temp_file_limit?

2022-12-19 Thread Frits Jalvingh
Ok, just to make sure that I understand correctly:
The parallel hash implementation needs to resize its table because of a
mismatch in expected tuple count. I do expect this to be true: Postgres
often grossly underestimates the expected row counts in our queries.
This is not fully implemented yet: removing the "old" files is not yet
done, so every time the table resizes it creates a new set of files and the
old ones remain.
I assume that the "used file size" only includes the "current" set of
files, and that the old ones are not counted towards that amount? That
would explain why it overallocates, of course.

By itself I now know what to do: I just need to disable all parallelism (
•̀ᴗ•́ )و ̑̑

I usually do that anyway because it makes queries die randomly. This is
just another reason.

I restarted that query with max_parallel_workers_per_gather=0, and this
does not seem to use tempspace at all. It was not exactly fast, it took 82
minutes of a single process running at 100% cpu.
https://explain.depesz.com/s/HedE

Thanks a lot for your help Justin, and all others that chimed in too.

Next round is to try to get that query to run in the 5 seconds that we know
it can do, reliably.


On Mon, Dec 19, 2022 at 6:46 PM Justin Pryzby  wrote:

> On Mon, Dec 19, 2022 at 06:27:57PM +0100, Frits Jalvingh wrote:
> > I have listed the files during that run,
>
> > 213M -rw--- 1 postgres postgres 213M dec 19 17:46 i100of128.p0.0
> > 207M -rw--- 1 postgres postgres 207M dec 19 17:46 i100of128.p1.0
> > 210M -rw--- 1 postgres postgres 210M dec 19 17:49 i100of256.p0.0
> > 211M -rw--- 1 postgres postgres 211M dec 19 17:49 i100of256.p1.0
> > 188M -rw--- 1 postgres postgres 188M dec 19 17:53 i100of512.p0.0
> [...]
>
> I think that proves Thomas' theory.  I'm not sure how that helps you,
> though...
>
> On Mon, Dec 19, 2022 at 01:51:33PM +1300, Thomas Munro wrote:
> > One possibility is that you've hit a case that needs several rounds of
> > repartitioning (because of a failure to estimate the number of tuples
> > well), but we can't see that because you didn't show EXPLAIN (ANALYZE)
> > output (understandably if it runs out of disk space before
> > completing...).  The parallel hash code doesn't free up the previous
> > generations' temporary files; it really only needs two generations'
> > worth concurrently (the one it's reading from and the one it's writing
> > to).  In rare cases where more generations are needed it could unlink
> > the older ones -- that hasn't been implemented yet.  If you set
> > log_temp_files = 0 to log temporary file names, it should be clear if
> > it's going through multiple rounds of repartitioning, from the names
> > (...of32..., ...of64..., ...of128..., ...of256..., ...).
>
> --
> Justin Pryzby
> System Administrator
> Telsasoft
> +1-952-707-8581
>


Re: Fwd: temp_file_limit?

2022-12-19 Thread Justin Pryzby
On Mon, Dec 19, 2022 at 06:27:57PM +0100, Frits Jalvingh wrote:
> I have listed the files during that run,

> 213M -rw--- 1 postgres postgres 213M dec 19 17:46 i100of128.p0.0
> 207M -rw--- 1 postgres postgres 207M dec 19 17:46 i100of128.p1.0
> 210M -rw--- 1 postgres postgres 210M dec 19 17:49 i100of256.p0.0
> 211M -rw--- 1 postgres postgres 211M dec 19 17:49 i100of256.p1.0
> 188M -rw--- 1 postgres postgres 188M dec 19 17:53 i100of512.p0.0
[...]

I think that proves Thomas' theory.  I'm not sure how that helps you,
though...

On Mon, Dec 19, 2022 at 01:51:33PM +1300, Thomas Munro wrote:
> One possibility is that you've hit a case that needs several rounds of
> repartitioning (because of a failure to estimate the number of tuples
> well), but we can't see that because you didn't show EXPLAIN (ANALYZE)
> output (understandably if it runs out of disk space before
> completing...).  The parallel hash code doesn't free up the previous
> generations' temporary files; it really only needs two generations'
> worth concurrently (the one it's reading from and the one it's writing
> to).  In rare cases where more generations are needed it could unlink
> the older ones -- that hasn't been implemented yet.  If you set
> log_temp_files = 0 to log temporary file names, it should be clear if
> it's going through multiple rounds of repartitioning, from the names
> (...of32..., ...of64..., ...of128..., ...of256..., ...).

-- 
Justin Pryzby
System Administrator
Telsasoft
+1-952-707-8581




Re: Fwd: temp_file_limit?

2022-12-19 Thread Justin Pryzby
On Mon, Dec 19, 2022 at 05:57:42PM +0100, Frits Jalvingh wrote:
> @justin:
> 
> Ran the query again. Top shows the following processes:
>    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+

Thanks

> root@chatelet:/d2/var/lib/postgresql/15/main/base# du --max=2 -mx
> ./pgsql_tmp |sort -nr
> 412021 ./pgsql_tmp/pgsql_tmp650830.3.fileset
> 412021 ./pgsql_tmp
> ^^^ a few seconds after this last try the query aborted:
> ERROR:  temporary file size exceeds temp_file_limit (104857600kB)
> 
> > One possibility is that there are files in the tmpdir, which have been
> > unlinked, but are still opened, so their space hasn't been reclaimed.
> > You could check for that by running lsof -nn |grep pgsql_tmp Any deleted
> > files would say things like 'DEL|deleted|inode|no such'
>
> I do not really understand what you would like me to do, and when. The disk
> space is growing, and it is actual files under pgsql_tmp?

Run this during the query as either postgres or root:
| lsof -nn |grep pgsql_tmp |grep -E 'DEL|deleted|inode|no such'

Any files it lists would be interesting to know about.

> Hope this tells you something, please let me know if you would like more
> info, and again - thanks!

I think Thomas' idea is more likely.  We'd want to know the names of
files being written, either as logged by log_temp_files or from 
| find pgsql_tmp -ls
during the query.

-- 
Justin




Re: Fwd: temp_file_limit?

2022-12-19 Thread Frits Jalvingh
@justin:

Ran the query again. Top shows the following processes:
   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 650830 postgres  20   0 7503,2m   2,6g   2,6g R 100,0   4,2  12:46.34 postgres: jal datavault_317_prd [local] EXPLAIN
 666141 postgres  20   0 7486,3m   2,6g   2,6g R 100,0   4,1   2:10.24 postgres: parallel worker for PID 650830

Your command shows, during execution:
root@chatelet:/d2/var/lib/postgresql/15/main/base# du --max=2 -mx
./pgsql_tmp |sort -nr
68629 ./pgsql_tmp/pgsql_tmp650830.3.fileset
68629 ./pgsql_tmp

root@chatelet:/d2/var/lib/postgresql/15/main/base# du --max=2 -mx
./pgsql_tmp |sort -nr
194494 ./pgsql_tmp
194493 ./pgsql_tmp/pgsql_tmp650830.3.fileset

root@chatelet:/d2/var/lib/postgresql/15/main/base# du --max=2 -mx
./pgsql_tmp |sort -nr
335289 ./pgsql_tmp/pgsql_tmp650830.3.fileset
335289 ./pgsql_tmp

root@chatelet:/d2/var/lib/postgresql/15/main/base# du --max=2 -mx
./pgsql_tmp |sort -nr
412021 ./pgsql_tmp/pgsql_tmp650830.3.fileset
412021 ./pgsql_tmp
^^^ a few seconds after this last try the query aborted:
ERROR:  temporary file size exceeds temp_file_limit (104857600kB)

> One possibility is that there are files in the tmpdir, which have been
> unlinked, but are still opened, so their space hasn't been reclaimed.
> You could check for that by running lsof -nn |grep pgsql_tmp Any deleted
> files would say things like 'DEL|deleted|inode|no such'
>
I do not really understand what you would like me to do, and when. The disk
space is growing, and it is actual files under pgsql_tmp?

Hope this tells you something, please let me know if you would like more
info, and again - thanks!


Re: Fwd: temp_file_limit?

2022-12-19 Thread Frits Jalvingh
Hehehe, that is not the worst plan ;) I did that once to debug a deadlock
in the JDBC driver when talking with Postgres, but it's not an adventure
I'd like to repeat right now ;)



Re: Fwd: temp_file_limit?

2022-12-19 Thread Ranier Vilela
On Mon, Dec 19, 2022 at 11:45 AM Frits Jalvingh  wrote:

> Hi Ranier, thanks for your help.
>
> I do not have more disks lying around, and I fear that if it does not
> complete with 1.3TB of disk space it might not be that likely that adding
> 750GB would work...
> Postgres version: the original (prd) issue was on 10.x. I also tested it
> on 14.x with the same issue. I then upgraded my machine to 15.1 to make
> sure to report on the latest version, and all information mentioned in this
> thread is from that version.
>
Could you run with a debug-compiled Postgres version?
https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD

Maybe some light will appear.

regards,
Ranier Vilela



Re: Fwd: temp_file_limit?

2022-12-19 Thread Frits Jalvingh
Hi Ranier, thanks for your help.

I do not have more disks lying around, and I fear that if it does not
complete with 1.3TB of disk space it might not be that likely that adding
750GB would work...
Postgres version: the original (prd) issue was on 10.x. I also tested it on
14.x with the same issue. I then upgraded my machine to 15.1 to make sure
to report on the latest version, and all information mentioned in this
thread is from that version.

btw, this query generates quite different plans when tweaking things like
enable_nestloop=false/true, and the "fast" plan requires enable_nestloop=true
and join_collapse_limit=1 (5-second response). An odd thing is that both plans
contain only one nested loop (a cross join; I think it cannot do that without
one), but the general plan changes a lot. I am trying to get output from that
second plan, because this one just loops using CPU, not disk...
Perhaps that one will finish with some statistics...
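
For the record, the session settings for that fast variant are just the
following (a sketch; the query itself is our generated SQL and not shown
here):

| SET enable_nestloop = on;
| SET join_collapse_limit = 1;
| EXPLAIN (ANALYZE) <generated query>;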


Re: Fwd: temp_file_limit?

2022-12-19 Thread Ranier Vilela
On Mon, Dec 19, 2022 at 6:47 AM Frits Jalvingh  wrote:

>
> The test is running on Ubuntu 22.04.1, x86_64, the disk is an NVMe 2TB
> WD850X with ext4 as a file system.
>
It's probably not a production environment.
Any chance of adding another 2TB NVMe, just for the temp files?
To see if Postgres can finish the queries and provide more information?
What exactly is the version of Postgres (14.???) that you are using?

regards,
Ranier Vilela


Re: Fwd: temp_file_limit?

2022-12-19 Thread Frits Jalvingh
Hi Tom and Thomas, thanks for your help.

@Tom:
If it really is per-process then I would have expected it to die after
200GB was used?
As far as "valid bug" is concerned: I had hoped this would be per session,
as this at least delivers a reasonable and usable limit; it is easy to
control the number of sessions/statements in execution.
If it really is per process then the limit is not really useful, just like
work_mem: the execution plan of a query determines the number of processes
(up to the max, at least that is way better than work_mem) and that can
change whenever Postgres feels a new plan is in order. I can understand
that solving this might be harder (although to me it looks like just a
little bit of shared memory per session to keep a number). To me it does
not sound like an invalid bug, just one you do not want to solve now ;) And
the real problem, for me, is actually that both work_mem and
temp_file_limit should be for the entire instance/cluster ;) I know that
that is even harder.

For us it means we really cannot use Postgres parallelism: it is infinitely
better to have a query that runs longer but which finishes than to have the
database die and recover randomly with OOM or with disk space filling up
killing random queries. Which is a bit of a pity, ofc.

@Justin
The test is running on Ubuntu 22.04.1, x86_64, the disk is an NVMe 2TB
WD850X with ext4 as a file system.
I will collect the other data around tmpfiles hopefully later today.
I have already set max_parallel_workers_per_gather to 1. I will probably
disable all parallelism for the next runs to see whether that makes the
size limit more workable..


Re: Fwd: temp_file_limit?

2022-12-18 Thread Thomas Munro
On Mon, Dec 19, 2022 at 1:51 PM Thomas Munro  wrote:
> It's really the limit for a single file

Oops, sorry I take that back.  It should be per process.




Re: Fwd: temp_file_limit?

2022-12-18 Thread Thomas Munro
On Mon, Dec 19, 2022 at 9:11 AM Justin Pryzby  wrote:
> On Sun, Dec 18, 2022 at 06:29:41PM +0100, Frits Jalvingh wrote:
> > Just to be clear: my real question is: why is temp_file_limit not
> > working at the specified size? Because this is my real problem: when a
> > query is dying like this it will also kill other queries because these
> > are also running out of space. Even when the limit is per-process it
> > should not have exceeded 200GB imo.

It's really the limit for a single file (or virtual file because we
split them on 1GB boundaries, probably well past time we stopped doing
that), but we create many temporary files for various reasons.  One
possibility is that you've hit a case that needs several rounds of
repartitioning (because of a failure to estimate the number of tuples
well), but we can't see that because you didn't show EXPLAIN (ANALYZE)
output (understandably if it runs out of disk space before
completing...).  The parallel hash code doesn't free up the previous
generations' temporary files; it really only needs two generations'
worth concurrently (the one it's reading from and the one it's writing
to).  In rare cases where more generations are needed it could unlink
the older ones -- that hasn't been implemented yet.  If you set
log_temp_files = 0 to log temporary file names, it should be clear if
it's going through multiple rounds of repartitioning, from the names
(...of32..., ...of64..., ...of128..., ...of256..., ...).
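
For reference, that is a one-liner in postgresql.conf, or per session if
you're superuser (0 logs every temporary file; a positive value logs only
files larger than that many kB):

| SET log_temp_files = 0;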




Re: Fwd: temp_file_limit?

2022-12-18 Thread Justin Pryzby
On Sun, Dec 18, 2022 at 06:29:41PM +0100, Frits Jalvingh wrote:
> Just to be clear: my real question is: why is temp_file_limit not
> working at the specified size? Because this is my real problem: when a
> query is dying like this it will also kill other queries because these
> are also running out of space. Even when the limit is per-process it
> should not have exceeded 200GB imo.

What OS and filesystem are in use ?

Could you list the tmpdir when it's getting huge?  The filenames include
the PID, which would indicate if there's another process involved, or a
bug allowed it to get huge.
sudo du --max=2 -mx ./pgsql_tmp |sort -nr

BTW, pg_ls_tmpdir() hides directories, so you shouldn't rely on it for
listing temporary directories...
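
If you want to look from SQL anyway, listing the directory itself does show
the per-query .fileset subdirectories that pg_ls_tmpdir() skips (needs
superuser or suitable grants; the path below is for the default tablespace):
| SELECT * FROM pg_ls_dir('base/pgsql_tmp');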

One possibility is that there are files in the tmpdir, which have been
unlinked, but are still opened, so their space hasn't been reclaimed.
You could check for that by running lsof -nn |grep pgsql_tmp Any deleted
files would say things like 'DEL|deleted|inode|no such'

> BTW, if that limit is really per process instead of per
> session/query then that is a Very Bad Thing(tm), because this makes the
> limit effectively worthless - if a query can spawn 8 parallel processes
> then you can suddenly, without any form of control, again fill up that disk.

8 is the default value of max_worker_processes and max_parallel_workers,
but 2 is the default value of max_parallel_workers_per_gather.  You're
free to change the default value to balance it with the temp_file_limit
(as suggested by the earlier-mentioned commit).
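
A sketch of balancing the two in postgresql.conf (the numbers only illustrate
the arithmetic; per Tom's note, each process of a query can use up to
temp_file_limit on its own):

| temp_file_limit = 100GB
| max_parallel_workers_per_gather = 1
| # worst case for one parallel query: leader + 1 worker = roughly 2 x 100GB of temp files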

-- 
Justin




Re: Fwd: temp_file_limit?

2022-12-18 Thread Tom Lane
Frits Jalvingh  writes:
> Just to be clear: my real question is: why is temp_file_limit not working
> at the specified size?

I've not looked at that code lately, but I strongly suspect that
it's implemented in such a way that it's a per-process limit, not a
per-session limit.  So each parallel worker could use up that much
space.

It's also possible that you've found an actual bug, but without
a reproducer case nobody's going to take that possibility too
seriously.  We're unlikely to accept "the limit should work
across multiple processes" as a valid bug though.  That would
require a vastly more complicated implementation.

regards, tom lane




Fwd: temp_file_limit?

2022-12-18 Thread Frits Jalvingh
Hi Justin, thanks for your help!

Simple things first:
- I am running a single query on a developer machine. Nothing else uses the
database at that point.
- The database runs on a disk that has 473GB in use and 1.3T still free. I
am watching the increase in size used (watch df -hl /d2).
- If I remove the temp_file_limit the query will run until it has used the
1.3TB that was free, then it dies.
- when it runs I see two PG processes active: a main and a worker process
for that main.

I  hope this answers some of the questions: yes, the query is the one using
the tempspace; it is the only one running; it uses only one parallel worker.

Just to be clear: my real question is: why is temp_file_limit not working
at the specified size? Because this is my real problem: when a query is
dying like this it will also kill other queries because these are also
running out of space. Even when the limit is per-process it should not have
exceeded 200GB imo. BTW, if that limit is really per process instead of per
session/query then that is a Very Bad Thing(tm), because this makes the
limit effectively worthless - if a query can spawn 8 parallel processes
then you can suddenly, without any form of control, again fill up that disk.

I'm not really asking for a solution to the bad performance, but hints are
always welcome so I'll include the requested info below:

With the failing plan the query never finishes; it just uses 1.3TB of
space, then dies.
This also means I cannot explain analyze as this does not produce output
when the query dies. This is a pretty terrible bug in my eyes, because you
cannot get the info when it's most needed. If I ever have time left to work
on Postgres' code this will be the first thing to fix 8-/

Anyway. The plan that fails badly is this one:
---
 Unique  (cost=37360.85..37360.86 rows=1 width=42)
   ->  Sort  (cost=37360.85..37360.85 rows=1 width=42)
         Sort Key: (COALESCE(tijd.tijdkey, 'Unknown'::character varying)), s_h_eenheid_ssm.identificatie
         ->  Hash Join  (cost=34899.49..37360.84 rows=1 width=42)
               Hash Cond: ((ve03678.calender_id)::text = (COALESCE(tijd.tijdkey, 'Unknown'::character varying))::text)
               Join Filter: ((s_h_eenheid_ssm.dv_start_dts <= tijd.einddatum) AND (s_h_eenheid_ssm.dv_end_dts > tijd.einddatum) AND (l_eenheid_sturingslabel_ssm_pe.dv_start_dts <= tijd.einddatum) AND (l_eenheid_sturingslabel_ssm_pe.dv_end_dts > tijd.einddatum) AND (sturingslabel_pe.dv_start_dts <= tijd.einddatum) AND (sturingslabel_pe.dv_end_dts > tijd.einddatum))
               ->  Gather  (cost=34897.66..37358.98 rows=1 width=65)
                     Workers Planned: 1
                     ->  Parallel Hash Join  (cost=33897.66..36358.88 rows=1 width=65)
                           Hash Cond: ((s_h_eenheid_ssm.id_h_eenheid = l_eenheid_sturingslabel_ssm_pe.id_h_eenheid) AND (COALESCE(s_h_eenheid_ssm.id_s, '-1'::integer) = ve03678.eenheid_id))
                           ->  Parallel Seq Scan on s_h_eenheid_ssm  (cost=0.00..2326.55 rows=17955 width=34)
                           ->  Parallel Hash  (cost=33896.02..33896.02 rows=109 width=47)
                                 ->  Parallel Hash Join  (cost=18850.80..33896.02 rows=109 width=47)
                                       Hash Cond: (ve03678.ve03678 = sturingslabel_pe.datum)
                                       ->  Parallel Seq Scan on ve0367801 ve03678  (cost=0.00..12584.92 rows=655792 width=15)
                                       ->  Parallel Hash  (cost=18850.78..18850.78 rows=1 width=40)
                                             ->  Parallel Hash Join  (cost=15458.27..18850.78 rows=1 width=40)
                                                   Hash Cond: (l_eenheid_sturingslabel_ssm_pe.id_h_sturingslabel = sturingslabel_pe.id_h_sturingslabel)
                                                   ->  Parallel Seq Scan on l_eenheid_sturingslabel_ssm l_eenheid_sturingslabel_ssm_pe  (cost=0.00..2963.36 rows=114436 width=24)
                                                   ->  Parallel Hash  (cost=15458.26..15458.26 rows=1 width=24)
                                                         ->  Parallel Seq Scan on s_h_sturingslabel_ssm sturingslabel_pe  (cost=0.00..15458.26 rows=1 width=24)
                                                               Filter: ((soort = 'MSL'::text) AND (code = 'DAE'::text))
               ->  Hash  (cost=1.37..1.37 rows=37 width=11)
                     ->  Seq Scan on tijd  (cost=0.00..1.37 rows=37 width=11)
(24 rows)
https://explain.depesz.com/s/qwsh

By itself I'm used to bad query performance in Postgresql;