Re: Death postgres

2023-05-12 Thread Peter J. Holzer
On 2023-05-12 17:41:37 +0200, Marc Millas wrote:
> On Fri, May 12, 2023 at 8:31 AM Peter J. Holzer  wrote:
> My guess is that the amount of parallelism is the problem.
> 
> work_mem is a per-node limit. Even a single process can use a multiple of
> work_mem if the query contains nested nodes (which almost every query
> does, but most nodes don't need much memory). With 5 parallel workers,
> the total consumption will be 5 times that. So to prevent the OOM
> condition you would need to reduce work_mem or max_parallel_workers (at
> least for this query).
> 
> 
> we have more than 100GB RAM and only 1 user, with one request running.
> work_mem is set to 10MB.

I don't remember you mentioning that before, so I had to guess.

>   for oom to kill due to work_mem it means that for one
> request with 2 left join, postgres needs more than 10.000 work_mem buffers.
> to me, it seems difficult to believe.

Yes. work_mem is unlikely to be the culprit here. It must be something
else which is eating the RAM. But I think it's hard to reproduce that
with the information you've given us so far.

> but that postgres may need that RAM space for hashing or whatever
> other similar purpose seems more probable. no ? 

That should adhere to the work_mem limit, too.

I'm not a core developer (actually not a Postgres developer at all, just
a developer of other stuff which happens to use Postgres), but I
remember that there was an issue some years ago where a hash table could
grow much larger than estimated by the planner. That issue was fixed,
but maybe you've run into a similar case?


> The description temp_file_limit says "...the maximum amount of disk
> space that *a process* can use...". So with 5 workers that's 210*5 =
> 1050 GB total. Again, you may want to reduce either temp_file_limit or
> max_parallel_workers.
> 
> Yes, but if so, we may have had a request canceled due to temp_file limit,
> which was not the case. 

Well, you've said that the disk space grew by "1.1 TB". I understand
that to mean "something between 1.05 TB and 1.15 TB", not exactly
11000 Bytes (or 1209462790554 Bytes or 1100213452800 Bytes,
depending on how you define "TB"). 1050 GB is within that range, so you
might just have missed that limit.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: Death postgres

2023-05-12 Thread Marc Millas
On Fri, May 12, 2023 at 8:31 AM Peter J. Holzer  wrote:

> On 2023-05-11 21:27:57 +0200, Marc Millas wrote:
> > the 75 lines in each tables are not NULLs but '' empty varchar,
> which,
> > obviously is not the same thing.
> > and which perfectly generates 500 billions lines for the left join.
> > So, no planner or statistics pbs. apologies for the time wasted.
>
> No problem. Glad to have solved that puzzle.
>
> > Back to the initial pb: if, with temp_file_limit positioned to 210 GB,
> > I try to run the select * from table_a left join table_b on the col_a
> > (which contains the 75 '' on both tables)
> > then postgres do crash, killed by oom, after having taken 1.1 TB of
> additional
> > disk space.
>
> My guess is that the amount of parallelism is the problem.
>
> work_mem is a per-node limit. Even a single process can use a multiple of
> work_mem if the query contains nested nodes (which almost every query
> does, but most nodes don't need much memory). With 5 parallel workers,
> the total consumption will be 5 times that. So to prevent the OOM
> condition you would need to reduce work_mem or max_parallel_workers (at
> least for this query).
>

we have more than 100 GB of RAM and only 1 user, with one request running.
work_mem is set to 10MB. For the OOM killer to strike because of work_mem, it
would mean that for one request with 2 left joins, postgres needs more than
10,000 work_mem buffers.
To me, that seems difficult to believe. But that postgres may need that RAM
for hashing or some other similar purpose seems more probable, no?

>
> The description temp_file_limit says "...the maximum amount of disk
> space that *a process* can use...". So with 5 workers that's 210*5 =
> 1050 GB total. Again, you may want to reduce either temp_file_limit or
> max_parallel_workers.
>
Yes, but if so, we would have had a request canceled due to temp_file_limit,
which was not the case.

>
> > to my understanding, before postgres 13, hash aggregate did eat RAM
> limitless
> > in such circumstances.
> > but in 14.2 ??
> > (I know, 14.8 is up...)
>
> Maybe the older version of postgres didn't use as many workers for that
> query (or maybe not parallelize it at all)?
>
> hp
>
> --
>_  | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |   challenge!"
>


Re: Death postgres

2023-05-12 Thread Peter J. Holzer
On 2023-05-11 21:27:57 +0200, Marc Millas wrote:
> the 75 lines in each tables are not NULLs but '' empty varchar, which,
> obviously is not the same thing.
> and which perfectly generates 500 billions lines for the left join.
> So, no planner or statistics pbs. apologies for the time wasted.

No problem. Glad to have solved that puzzle.

> Back to the initial pb: if, with temp_file_limit positioned to 210 GB,
> I try to run the select * from table_a left join table_b on the col_a
> (which contains the 75 '' on both tables)
> then postgres do crash, killed by oom, after having taken 1.1 TB of additional
> disk space. 

My guess is that the amount of parallelism is the problem.

work_mem is a per-node limit. Even a single process can use a multiple of
work_mem if the query contains nested nodes (which almost every query
does, but most nodes don't need much memory). With 5 parallel workers,
the total consumption will be 5 times that. So to prevent the OOM
condition you would need to reduce work_mem or max_parallel_workers (at
least for this query).
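
For example, something along these lines, run in the same session as the
query (the values are only illustrative, not a recommendation):

BEGIN;
SET LOCAL work_mem = '4MB';                     -- per-sort/per-hash budget
SET LOCAL max_parallel_workers_per_gather = 0;  -- no parallel workers for this query
-- run the problematic SELECT here
COMMIT;

Both settings revert automatically at COMMIT or ROLLBACK because of SET LOCAL.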

The description temp_file_limit says "...the maximum amount of disk
space that *a process* can use...". So with 5 workers that's 210*5 =
1050 GB total. Again, you may want to reduce either temp_file_limit or
max_parallel_workers.
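
Spelled out as a sketch (using the numbers from this thread; the lower value
at the end is only an illustration):

-- worst-case temp space = temp_file_limit * number of processes
-- 210 GB * 5 workers = 1050 GB
SET temp_file_limit = '42GB';  -- per-process cap; needs superuser, or set it in postgresql.conf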

> to my understanding, before postgres 13, hash aggregate did eat RAM limitless
> in such circumstances.
> but in 14.2 ??
> (I know, 14.8 is up...)

Maybe the older version of postgres didn't use as many workers for that
query (or maybe didn't parallelize it at all)?

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: Death postgres

2023-05-11 Thread Marc Millas
On Thu, May 11, 2023 at 1:56 AM Peter J. Holzer  wrote:

> On 2023-05-10 22:52:47 +0200, Marc Millas wrote:
> > On Wed, May 10, 2023 at 7:24 PM Peter J. Holzer 
> wrote:
> >
> > On 2023-05-10 16:35:04 +0200, Marc Millas wrote:
> > >  Unique  (cost=72377463163.02..201012533981.80 rows=1021522829864
> width=
> > 97)
> > >->  Gather Merge  (cost=72377463163.02..195904919832.48 rows=
> > 1021522829864 width=97)
> > ...
> > >->  Parallel Hash Left Join  (cost=
> > 604502.76..1276224253.51 rows=204304565973 width=97)
> > >  Hash Cond: ((t1.col_ano)::text =
> (t2.col_ano)::text)
> > ...
> > >
> > > //so.. the planner guess that those 2 join will generate 1000
> billions
> > rows...
> >
> > Are some of the col_ano values very frequent? If say the value 42
> occurs
> > 1 million times in both table_a and table_b, the join will create 1
> > trillion rows for that value alone. That doesn't explain the crash
> or the
> > disk usage, but it would explain the crazy cost (and would probably
> be a
> > hint that this query is unlikely to finish in any reasonable time).
> >
> >
> > good guess, even if a bit surprising: there is one (and only one)
> "value" which
> > fit your supposition: NULL
>
> But NULL doesn't equal NULL, so that would result in only one row in the
> left join. So that's not it.
>


so, apo...

the 75 lines in each table are not NULLs but '' empty varchars, which,
obviously, is not the same thing,
and which readily generates 500 billion lines for the left join.
So, no planner or statistics problems; apologies for the time wasted.
Back to the initial problem:
if, with temp_file_limit set to 210 GB, I try to run the select *
from table_a left join table_b on col_a (which contains the 75 ''
in both tables),
then postgres does crash, killed by the OOM killer, after having consumed
1.1 TB of additional disk space.
the explain plan guesses 512 planned partitions. (obviously, I cannot
provide an explain analyze...)

to my understanding, before postgres 13, hash aggregate could eat RAM
without limit in such circumstances.
but in 14.2??
(I know, 14.8 is out...)


> hp
>
> --
>_  | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |   challenge!"
>



Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com


Re: Death postgres

2023-05-11 Thread Marc Millas
On Thu, May 11, 2023 at 1:56 AM Peter J. Holzer  wrote:

> On 2023-05-10 22:52:47 +0200, Marc Millas wrote:
> > On Wed, May 10, 2023 at 7:24 PM Peter J. Holzer 
> wrote:
> >
> > On 2023-05-10 16:35:04 +0200, Marc Millas wrote:
> > >  Unique  (cost=72377463163.02..201012533981.80 rows=1021522829864
> width=
> > 97)
> > >->  Gather Merge  (cost=72377463163.02..195904919832.48 rows=
> > 1021522829864 width=97)
> > ...
> > >->  Parallel Hash Left Join  (cost=
> > 604502.76..1276224253.51 rows=204304565973 width=97)
> > >  Hash Cond: ((t1.col_ano)::text =
> (t2.col_ano)::text)
> > ...
> > >
> > > //so.. the planner guess that those 2 join will generate 1000
> billions
> > rows...
> >
> > Are some of the col_ano values very frequent? If say the value 42
> occurs
> > 1 million times in both table_a and table_b, the join will create 1
> > trillion rows for that value alone. That doesn't explain the crash
> or the
> > disk usage, but it would explain the crazy cost (and would probably
> be a
> > hint that this query is unlikely to finish in any reasonable time).
> >
> >
> > good guess, even if a bit surprising: there is one (and only one)
> "value" which
> > fit your supposition: NULL
>
> But NULL doesn't equal NULL, so that would result in only one row in the
> left join. So that's not it.
>

if so... how ???

>
> hp
>
> --
>_  | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |   challenge!"
>


Re: Death postgres

2023-05-10 Thread Peter J. Holzer
On 2023-05-10 22:52:47 +0200, Marc Millas wrote:
> On Wed, May 10, 2023 at 7:24 PM Peter J. Holzer  wrote:
> 
> On 2023-05-10 16:35:04 +0200, Marc Millas wrote:
> >  Unique  (cost=72377463163.02..201012533981.80 rows=1021522829864 width=
> 97)
> >    ->  Gather Merge  (cost=72377463163.02..195904919832.48 rows=
> 1021522829864 width=97)
> ...
> >                ->  Parallel Hash Left Join  (cost=
> 604502.76..1276224253.51 rows=204304565973 width=97)
> >                      Hash Cond: ((t1.col_ano)::text = 
> (t2.col_ano)::text)
> ...
> >
> > //so.. the planner guess that those 2 join will generate 1000 billions
> rows...
> 
> Are some of the col_ano values very frequent? If say the value 42 occurs
> 1 million times in both table_a and table_b, the join will create 1
> trillion rows for that value alone. That doesn't explain the crash or the
> disk usage, but it would explain the crazy cost (and would probably be a
> hint that this query is unlikely to finish in any reasonable time).
> 
> 
> good guess, even if a bit surprising: there is one (and only one) "value" 
> which
> fit your supposition: NULL

But NULL doesn't equal NULL, so each of those rows would produce only a
single, unmatched row in the left join. So that's not it.
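
A tiny self-contained illustration of the difference, using only inline
VALUES (nothing from the real tables):

SELECT count(*) FROM (VALUES (NULL::text), (NULL)) a(x)
  LEFT JOIN (VALUES (NULL::text), (NULL)) b(x) ON a.x = b.x;
-- 2: each NULL on the left stays a single, unmatched row

SELECT count(*) FROM (VALUES (''::text), ('')) a(x)
  LEFT JOIN (VALUES (''::text), ('')) b(x) ON a.x = b.x;
-- 4: every '' on the left matches every '' on the right, so rows multiply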

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: Death postgres

2023-05-10 Thread Marc Millas
On Wed, May 10, 2023 at 7:24 PM Peter J. Holzer  wrote:

> On 2023-05-10 16:35:04 +0200, Marc Millas wrote:
> >  Unique  (cost=72377463163.02..201012533981.80 rows=1021522829864
> width=97)
> >->  Gather Merge  (cost=72377463163.02..195904919832.48
> rows=1021522829864 width=97)
> ...
> >->  Parallel Hash Left Join
>  (cost=604502.76..1276224253.51 rows=204304565973 width=97)
> >  Hash Cond: ((t1.col_ano)::text = (t2.col_ano)::text)
> ...
> >
> > //so.. the planner guess that those 2 join will generate 1000 billions
> rows...
>
> Are some of the col_ano values very frequent? If say the value 42 occurs
> 1 million times in both table_a and table_b, the join will create 1
> trillion rows for that value alone. That doesn't explain the crash or the
> disk usage, but it would explain the crazy cost (and would probably be a
> hint that this query is unlikely to finish in any reasonable time).
>
> hp
>
good guess, even if a bit surprising: there is one (and only one) "value"
which fits your supposition: NULL,
75 in each table, which perfectly fits the planner's row estimate.
One question: what is postgres doing when it has planned to hash 1000 billion
rows?
Did postgres create an appropriate ""space"" to handle those 1000 billion
hash values?
 thanks,
MM

> --
>_  | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |   challenge!"
>


Re: Death postgres

2023-05-10 Thread Peter J. Holzer
On 2023-05-10 16:35:04 +0200, Marc Millas wrote:
>  Unique  (cost=72377463163.02..201012533981.80 rows=1021522829864 width=97)
>    ->  Gather Merge  (cost=72377463163.02..195904919832.48 rows=1021522829864 
> width=97)
...
>                ->  Parallel Hash Left Join  (cost=604502.76..1276224253.51 
> rows=204304565973 width=97)
>                      Hash Cond: ((t1.col_ano)::text = (t2.col_ano)::text)
...
> 
> //so.. the planner guess that those 2 join will generate 1000 billions rows...

Are some of the col_ano values very frequent? If say the value 42 occurs
1 million times in both table_a and table_b, the join will create 1
trillion rows for that value alone. That doesn't explain the crash or the
disk usage, but it would explain the crazy cost (and would probably be a
hint that this query is unlikely to finish in any reasonable time).
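
One quick way to check for that kind of skew is to look at the most common
values PostgreSQL has already collected (a sketch; the schema/table/column
names are simply the ones from this thread):

SELECT tablename, attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE schemaname = 'myschema'
  AND tablename IN ('table_a', 'table_b')
  AND attname = 'col_ano';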

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: Death postgres

2023-05-10 Thread Laurenz Albe
On Wed, 2023-05-10 at 16:35 +0200, Marc Millas wrote:
> > > postgres 14.2 on Linux redhat
> > > 
> > > temp_file_limit set around 210 GB.
> > > 
> > > a select request with 2 left join have crashed the server (oom killer) 
> > > after the postgres
> > > disk occupation did grow from 15TB to 16 TB.
> > > 
> > > What are the cases where postgres may grow without caring about 
> > > temp_file_limit ?
> > 
> > That's too little information for a decent answer.
> > One obvious answer is: if it is not writing temporary files.
> 
> so here is a little more info:
> 
>                                                         QUERY PLAN            
>                                              
> ---
>  Unique  (cost=72377463163.02..201012533981.80 rows=1021522829864 width=97)
>    ->  Gather Merge  (cost=72377463163.02..195904919832.48 rows=1021522829864 
> width=97)
>          Workers Planned: 5
>          ->  Sort  (cost=72377462162.94..72888223577.87 rows=204304565973 
> width=97)
>                Sort Key: t1.col_ine, (CASE WHEN (t2.col_ibi IS NULL) THEN 
> t3.col_ibi ELSE t2.col_ibi END)
>                ->  Parallel Hash Left Join  (cost=604502.76..1276224253.51 
> rows=204304565973 width=97)
>                      Hash Cond: ((t1.col_ano)::text = (t2.col_ano)::text)
>                      ->  Parallel Hash Left Join  (cost=300803.38..582295.38 
> rows=4857277 width=52)
>                            Hash Cond: ((t1.col_ne)::text = (t3.col_ne)::text)
>                            ->  Parallel Seq Scan on table_a t1  
> (cost=0.00..184052.35 rows=2616335 width=53)
>                            ->  Parallel Hash  (cost=243466.06..243466.06 
> rows=2965306 width=31)
>                                  ->  Parallel Seq Scan on table_b t3  
> (cost=0.00..243466.06 rows=2965306 width=31)
>                      ->  Parallel Hash  (cost=243466.06..243466.06 
> rows=2965306 width=34)
>                            ->  Parallel Seq Scan on table_b t2  
> (cost=0.00..243466.06 rows=2965306 width=34)
>  JIT:
>    Functions: 19
>    Options: Inlining true, Optimization true, Expressions true, Deforming true
> (17 rows)

Perhaps parallel query drives you OOM.  Does the problem also happen if
"max_parallel_workers_per_gather" is set to 0?

Yours,
Laurenz Albe




Re: Death postgres

2023-05-10 Thread Marc Millas
On Sun, May 7, 2023 at 8:42 PM Laurenz Albe 
wrote:

> On Sat, 2023-05-06 at 03:14 +0200, Marc Millas wrote:
> > postgres 14.2 on Linux redhat
> >
> > temp_file_limit set around 210 GB.
> >
> > a select request with 2 left join have crashed the server (oom killer)
> after the postgres
> > disk occupation did grow from 15TB to 16 TB.
> >
> > What are the cases where postgres may grow without caring about
> temp_file_limit ?
>
> That's too little information for a decent answer.
> One obvious answer is: if it is not writing temporary files.
>
> Yours,
> Laurenz Albe
>

Logical ...

so here is a little more info:

db=# analyze myschema.table_a;
ANALYZE

db=# with ta as(select 'myschema' as s, 'table_a' as t), p as (select *
from information_schema.columns, ta where table_schema=ta.s and
table_name=ta.t),  tableid as(select relid from pg_stat_user_tables, ta
where schemaname=ta.s and relname=ta.t) select staattnum, column_name,
stanullfrac, stadistinct from tableid, pg_statistic join p on
p.ordinal_position=staattnum where starelid=tableid.relid order by
staattnum;
 staattnum | column_name  | stanullfrac | stadistinct
---+--+-+-
 1 | col_ne |   0 |  -0.6100224
 2 | col_brg|  0.0208 |   6
 3 | col_ano|   0 |  447302
 4 | col_ine|   0 |  -0.5341927
 5 | col_cha|   0 |  11
(5 rows)

db=# select count(*) from myschema.table_a;
  count
--
 13080776
(1 row)

db=# select count(distinct col_ano) from myschema.table_a;
  count
--
 10149937
(1 row)


// stats look somewhat absurd, as analyze guesses 20 times fewer distinct
values than a select distinct does on column col_ano...
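
(Aside: one way to give ANALYZE a better chance on such a column is to raise
its statistics target before re-analyzing - a sketch only, not something that
was run here.)

ALTER TABLE myschema.table_a ALTER COLUMN col_ano SET STATISTICS 5000;
ANALYZE myschema.table_a;
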
db=# analyze myschema.table_b;
ANALYZE
db=# with ta as(select 'myschema' as s, 'table_b' as t), p as (select *
from information_schema.columns, ta where table_schema=ta.s and
table_name=ta.t),  tableid as(select relid from pg_stat_user_tables, ta
where schemaname=ta.s and relname=ta.t) select staattnum, column_name,
stanullfrac, stadistinct from tableid, pg_statistic join p on
p.ordinal_position=staattnum where starelid=tableid.relid order by
staattnum;
 staattnum |   column_name   | stanullfrac | stadistinct
---+-+-+-
 1 | col_nerg  |   0 |  161828
 2 | col_ibi   |   0 |  362161
 3 | col_imi   |   0 |  381023
 4 | col_ipi   |   0 |  391915
 5 | col_ne|   0 | -0.53864235
 6 | col_ano   |   0 |  482225
(6 rows)

db=# select count(*) from myschema.table_b;
  count
--
 14811485
(1 row)

db=# select count(distinct col_ano) from myschema.table_b;
  count
--
 10149937
(1 row)

//same remark


db=# explain  select distinct t1.col_ine,  case when t2.col_ibi is null
then t3.col_ibi else t2.col_ibi end  from myschema.table_a t1 left join
myschema.table_b t2 on t1.col_ano=t2.col_ano Left join myschema.table_b t3
on t1.NUM_ENQ=t3.NUM_ENQ;
QUERY PLAN

---
 Unique  (cost=72377463163.02..201012533981.80 rows=1021522829864 width=97)
   ->  Gather Merge  (cost=72377463163.02..195904919832.48
rows=1021522829864 width=97)
 Workers Planned: 5
 ->  Sort  (cost=72377462162.94..72888223577.87 rows=204304565973
width=97)
   Sort Key: t1.col_ine, (CASE WHEN (t2.col_ibi IS NULL) THEN
t3.col_ibi ELSE t2.col_ibi END)
   ->  Parallel Hash Left Join  (cost=604502.76..1276224253.51
rows=204304565973 width=97)
 Hash Cond: ((t1.col_ano)::text = (t2.col_ano)::text)
 ->  Parallel Hash Left Join
 (cost=300803.38..582295.38 rows=4857277 width=52)
   Hash Cond: ((t1.col_ne)::text =
(t3.col_ne)::text)
   ->  Parallel Seq Scan on table_a t1
 (cost=0.00..184052.35 rows=2616335 width=53)
   ->  Parallel Hash  (cost=243466.06..243466.06
rows=2965306 width=31)
 ->  Parallel Seq Scan on table_b t3
 (cost=0.00..243466.06 rows=2965306 width=31)
 ->  Parallel Hash  (cost=243466.06..243466.06
rows=2965306 width=34)
   ->  Parallel Seq Scan on table_b t2
 (cost=0.00..243466.06 rows=2965306 width=34)
 JIT:
   Functions: 19
   Options: Inlining true, Optimization true, Expressions true, Deforming
true
(17 rows)

//so.. the planner guesses that those 2 joins will generate 1000 billion
rows...

//so, I try to change stats

db=# alter table myschema.table_a alter column col_ano set
(n_distinct=-0.8);
ALTER TABLE
db=# analyze myschema.table_a;
ANALYZE
db=# with ta as(select 'myschema' as s, 'table_a' as t), p as (select *
from 

Re: Death postgres

2023-05-07 Thread Laurenz Albe
On Sat, 2023-05-06 at 03:14 +0200, Marc Millas wrote:
> postgres 14.2 on Linux redhat
> 
> temp_file_limit set around 210 GB.
> 
> a select request with 2 left join have crashed the server (oom killer) after 
> the postgres
> disk occupation did grow from 15TB to 16 TB.
> 
> What are the cases where postgres may grow without caring about 
> temp_file_limit ?

That's too little information for a decent answer.
One obvious answer is: if it is not writing temporary files.
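
One way to check whether the query writes temp files at all is the standard
statistics view (a sketch; the counters are cumulative, so compare a reading
taken before the query with one taken after):

SELECT datname, temp_files, temp_bytes
FROM pg_stat_database
WHERE datname = current_database();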

Yours,
Laurenz Albe




Re: Death postgres

2023-05-07 Thread Adrian Klaver

On 5/7/23 04:46, Marc Millas wrote:



On Sat, May 6, 2023 at 21:46, Adrian Klaver wrote:


On 5/6/23 10:13, Marc Millas wrote:
 >
 >

 >     When you restarted the server where there any warnings shown?
 >
 > Sadly, I cannot. Will be done next tuesday.

Cannot do what:

1) Get to the log to see if there are warnings?

2) Restart the server?

Your original post said the server crashed.

If that was the case how can you do any of the below without
restarting it?
Infrastructure is managed by another company. The db is within a
container.

So.. If I run a pg_ctl restart, the container... Vanished.
So for each main thing, I must ask... By mail..


1) The question remains: if the container crashes when you run pg_ctl,
how can you do the following?:



Using psql can you \d ?

Yes, and no pb to check pg_statistic,...


Can you select from any other table in the database?

Yes


2) You need to provide a more detailed explanation of the setup that the 
Postgres server is running in.


a) The host OS and version.

b) The container specifications.

   1) What container software and version?

   2) 'Hardware' specifications.

c) Some indication of the size of the data sets you are querying. Also 
the query you used.




 >
 >
 >     Using psql can you \d ?
 >
 > Yes, and no pb to check pg_statistic,...
 >
 >
 >     Can you select from any other table in the database?
 >
 > Yes
 >
 >
 >
 >
 >
 >
 >     --
 >     Adrian Klaver
 > adrian.kla...@aklaver.com 
>
 >

-- 
Adrian Klaver

adrian.kla...@aklaver.com 



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Death postgres

2023-05-07 Thread Marc Millas
On Sat, May 6, 2023 at 21:46, Adrian Klaver wrote:

> On 5/6/23 10:13, Marc Millas wrote:
> >
> >
>
> > When you restarted the server where there any warnings shown?
> >
> > Sadly, I cannot. Will be done next tuesday.
>
> Cannot do what:
>
> 1) Get to the log to see if there are warnings?
>
> 2) Restart the server?
>
> Your original post said the server crashed.
>
> If that was the case how can you do any of the below without restarting it?
> Infrastructure is managed by another company. The db is within a
> container.
>
So... if I run a pg_ctl restart, the container... vanishes.
So for each main thing, I must ask... by mail...

> >
> >
> > Using psql can you \d ?
> >
> > Yes, and no pb to check pg_statistic,...
> >
> >
> > Can you select from any other table in the database?
> >
> > Yes
> >
> >
> >
> >
> >
> >
> > --
> > Adrian Klaver
> > adrian.kla...@aklaver.com 
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: Death postgres

2023-05-06 Thread Adrian Klaver

On 5/6/23 10:13, Marc Millas wrote:






When you restarted the server, were there any warnings shown?

Sadly, I cannot. Will be done next tuesday.


Cannot do what:

1) Get to the log to see if there are warnings?

2) Restart the server?

Your original post said the server crashed.

If that was the case how can you do any of the below without restarting it?




Using psql can you \d ?

Yes, and no pb to check pg_statistic,...


Can you select from any other table in the database?

Yes






-- 
Adrian Klaver

adrian.kla...@aklaver.com 



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Death postgres

2023-05-06 Thread Marc Millas
On Sat, May 6, 2023 at 18:11, Adrian Klaver wrote:

> On 5/6/23 05:25, Marc Millas wrote:
> >
> >
> > On Sat, May 6, 2023 at 06:18, Adrian Klaver wrote:
> >
> > On 5/5/23 18:14, Marc Millas wrote:
> >  > Hi,
> >  >
> >  > postgres 14.2 on Linux redhat
> >  >
> >  > temp_file_limit set around 210 GB.
> >  >
> >  > a select request with 2 left join have crashed the server (oom
> > killer)
> >  > after the postgres disk occupation did grow from 15TB to 16 TB.
> >
> > The result of EXPLAIN  would be helpful.
> > Sure!
> >
> > But. One of the table looks "inaccessible" since.
> > Ie. Even explain select * from the_table didnt answer and must be killed
> > by control c
>
> When you restarted the server where there any warnings shown?
>
Sadly, I cannot. Will be done next tuesday.

>
> Using psql can you \d ?
>
Yes, and no problem to check pg_statistic,...

>
> Can you select from any other table in the database?
>
Yes

>
>
>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: Death postgres

2023-05-06 Thread Adrian Klaver

On 5/6/23 05:25, Marc Millas wrote:



On Sat, May 6, 2023 at 06:18, Adrian Klaver wrote:


On 5/5/23 18:14, Marc Millas wrote:
 > Hi,
 >
 > postgres 14.2 on Linux redhat
 >
 > temp_file_limit set around 210 GB.
 >
 > a select request with 2 left join have crashed the server (oom
killer)
 > after the postgres disk occupation did grow from 15TB to 16 TB.

The result of EXPLAIN  would be helpful.
Sure!

But one of the tables looks "inaccessible" since then.
I.e., even explain select * from the_table didn't answer and had to be
killed with Ctrl-C


When you restarted the server, were there any warnings shown?

Using psql can you \d ?

Can you select from any other table in the database?





--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Death postgres

2023-05-06 Thread Thomas Guyot

On 2023-05-05 21:14, Marc Millas wrote:

Hi,

postgres 14.2 on Linux redhat

temp_file_limit set around 210 GB.

a select request with 2 left join have crashed the server (oom killer) 
after the postgres disk occupation did grow from 15TB to 16 TB.


What are the cases where postgres may grow without caring about 
temp_file_limit ?


thanks,



Some OSes, like IIRC RHEL9, now default to tmpfs for /tmp - if your temp
files are written on a tmpfs then that may very well trigger the OOM killer,
because the temp files used up all the RAM.


Check the filesystem type of your temp file's location.
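
From inside the database you can at least see where those files will go (a
sketch; checking the filesystem type of that path is then an OS-level step):

SELECT name, setting FROM pg_settings
WHERE name IN ('data_directory', 'temp_tablespaces');
-- with temp_tablespaces empty, temp files land under <data_directory>/base/pgsql_tmp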

--
Thomas




Re: Death postgres

2023-05-06 Thread Ron

On 5/6/23 08:52, Marc Millas wrote:


On Sat, May 6, 2023 at 15:15, Ron wrote:



[snip]


If your question is about temp_file_limit, don't distract us with OOM
issues.

My question is how postgres can use space without caring about 
temp_file_limit. The oom info is kind of hint about the context as, as 
said, one select did generate both things


It's a distraction to lead with "OOM killed my process".  Evidence of this
is that all the respondents have talked about is memory, not disk space.


--
Born in Arizona, moved to Babylonia.

Re: Death postgres

2023-05-06 Thread Marc Millas
On Sat, May 6, 2023 at 15:15, Ron wrote:

> On 5/6/23 07:19, Marc Millas wrote:
>
>
>
> On Sat, May 6, 2023 at 09:46, Peter J. Holzer wrote:
>
>> On 2023-05-06 03:14:20 +0200, Marc Millas wrote:
>> > postgres 14.2 on Linux redhat
>> >
>> > temp_file_limit set around 210 GB.
>> >
>> > a select request with 2 left join have crashed the server (oom killer)
>> after
>> > the postgres disk occupation did grow from 15TB to 16 TB.
>>
>
> "15TB" and "16TB" are pretty low-resolution.  For example, 15.4TB rounds
> *down* to 15TB, while 15.6TB rounds *up* to 16TB, while they are in fact
> only 200GB apart.
>
> Heck, even 15.4TB and 15.6TB are low-resolution.  temp_file_limit may
> actually be working.
>
>
>> temp_file_limit limits the space a process may use on disk while the OOM
>> killer gets activated when the system runs out of RAM. So these seem to
>> be unrelated.
>>
>> hp
>>
> Its clear that oom killer is triggered by RAM and temp_file is a disk
> thing...
> But the sudden growth of disk space usage and RAM did happen exactly at
> the very same time, with only one user connected, and only one query
> running...
>
>
> If your question is about temp_file_limit, don't distract us with OOM
> issues.
>
My question is how postgres can use disk space without caring about
temp_file_limit. The OOM info is a kind of hint about the context since, as
said, one select generated both things.

>
> --
> Born in Arizona, moved to Babylonia.
>


Re: Death postgres

2023-05-06 Thread Marc Millas
On Sat, May 6, 2023 at 15:15, Ron wrote:

> On 5/6/23 07:19, Marc Millas wrote:
>
>
>
> On Sat, May 6, 2023 at 09:46, Peter J. Holzer wrote:
>
>> On 2023-05-06 03:14:20 +0200, Marc Millas wrote:
>> > postgres 14.2 on Linux redhat
>> >
>> > temp_file_limit set around 210 GB.
>> >
>> > a select request with 2 left join have crashed the server (oom killer)
>> after
>> > the postgres disk occupation did grow from 15TB to 16 TB.
>>
>
> "15TB" and "16TB" are pretty low-resolution.  For example, 15.4TB rounds
> *down* to 15TB, while 15.6TB rounds *up* to 16TB, while they are in fact
> only 200GB apart.
>
> Heck, even 15.4TB and 15.6TB are low-resolution.  temp_file_limit may
> actually be working.
>

It was... 15.2 and became 16.3...

>
>
>> temp_file_limit limits the space a process may use on disk while the OOM
>> killer gets activated when the system runs out of RAM. So these seem to
>> be unrelated.
>>
>> hp
>>
> Its clear that oom killer is triggered by RAM and temp_file is a disk
> thing...
> But the sudden growth of disk space usage and RAM did happen exactly at
> the very same time, with only one user connected, and only one query
> running...
>
>
> If your question is about temp_file_limit, don't distract us with OOM
> issues.
>
> --
> Born in Arizona, moved to Babylonia.
>


Re: Death postgres

2023-05-06 Thread Ron

On 5/6/23 07:19, Marc Millas wrote:



On Sat, May 6, 2023 at 09:46, Peter J. Holzer wrote:

On 2023-05-06 03:14:20 +0200, Marc Millas wrote:
> postgres 14.2 on Linux redhat
>
> temp_file_limit set around 210 GB.
>
> a select request with 2 left join have crashed the server (oom
killer) after
> the postgres disk occupation did grow from 15TB to 16 TB.



"15TB" and "16TB" are pretty low-resolution.  For example, 15.4TB rounds 
*down* to 15TB, while 15.6TB rounds *up* to 16TB, while they are in fact 
only 200GB apart.


Heck, even 15.4TB and 15.6TB are low-resolution.  temp_file_limit may 
actually be working.




temp_file_limit limits the space a process may use on disk while the OOM
killer gets activated when the system runs out of RAM. So these seem to
be unrelated.

        hp

Its clear that oom killer is triggered by RAM and temp_file is a disk 
thing...
But the sudden growth of disk space usage and RAM did happen exactly at 
the very same time, with only one user connected, and only one query 
running...


If your question is about temp_file_limit, don't distract us with OOM issues.

--
Born in Arizona, moved to Babylonia.

Re: Death postgres

2023-05-06 Thread Marc Millas
On Sat, May 6, 2023 at 06:18, Adrian Klaver wrote:

> On 5/5/23 18:14, Marc Millas wrote:
> > Hi,
> >
> > postgres 14.2 on Linux redhat
> >
> > temp_file_limit set around 210 GB.
> >
> > a select request with 2 left join have crashed the server (oom killer)
> > after the postgres disk occupation did grow from 15TB to 16 TB.
>
> The result of EXPLAIN  would be helpful.
> Sure!
>
But one of the tables looks "inaccessible" since then.
I.e., even explain select * from the_table didn't answer and had to be
killed with Ctrl-C

> >
> > What are the cases where postgres may grow without caring about
> > temp_file_limit ?
> >
> > thanks,
> >
> >
> >
> >
> > Marc MILLAS
> > Senior Architect
> > +33607850334
> > www.mokadb.com 
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: Death postgres

2023-05-06 Thread Marc Millas
On Sat, May 6, 2023 at 09:46, Peter J. Holzer wrote:

> On 2023-05-06 03:14:20 +0200, Marc Millas wrote:
> > postgres 14.2 on Linux redhat
> >
> > temp_file_limit set around 210 GB.
> >
> > a select request with 2 left join have crashed the server (oom killer)
> after
> > the postgres disk occupation did grow from 15TB to 16 TB.
>
> temp_file_limit limits the space a process may use on disk while the OOM
> killer gets activated when the system runs out of RAM. So these seem to
> be unrelated.
>
> hp
>
It's clear that the OOM killer is triggered by RAM and temp_file_limit is a
disk thing...
But the sudden growth of disk space usage and RAM happened at exactly the
same time, with only one user connected and only one query running...

>
> --
>_  | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |   challenge!"
>


Re: Death postgres

2023-05-06 Thread Peter J. Holzer
On 2023-05-06 03:14:20 +0200, Marc Millas wrote:
> postgres 14.2 on Linux redhat
> 
> temp_file_limit set around 210 GB.
> 
> a select request with 2 left join have crashed the server (oom killer) after
> the postgres disk occupation did grow from 15TB to 16 TB.

temp_file_limit limits the space a process may use on disk while the OOM
killer gets activated when the system runs out of RAM. So these seem to
be unrelated.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: Death postgres

2023-05-05 Thread Adrian Klaver

On 5/5/23 18:14, Marc Millas wrote:

Hi,

postgres 14.2 on Linux redhat

temp_file_limit set around 210 GB.

a select request with 2 left join have crashed the server (oom killer) 
after the postgres disk occupation did grow from 15TB to 16 TB.


The result of EXPLAIN  would be helpful.



What are the cases where postgres may grow without caring about 
temp_file_limit ?


thanks,




Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com 



--
Adrian Klaver
adrian.kla...@aklaver.com





Death postgres

2023-05-05 Thread Marc Millas
Hi,

postgres 14.2 on Linux redhat

temp_file_limit set around 210 GB.

a select request with 2 left joins has crashed the server (OOM killer)
after the postgres disk occupation grew from 15 TB to 16 TB.

What are the cases where postgres may grow without caring about
temp_file_limit ?

thanks,




Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com