Re: Postgresql 16.3 Out Of Memory

2024-06-03 Thread Greg Sabino Mullane
On Mon, Jun 3, 2024 at 6:19 AM Radu Radutiu  wrote:

> Do you have any idea how to further debug the problem?
>

Putting aside the issue of non-reclaimed memory for now, can you show us
the actual query? The explain analyze you provided shows it doing an awful
lot of joins and then returning 14+ million rows to the client. Surely the
client does not need that many rows?

You can probably also prevent OOM by lowering your work_mem and/or your
max_parallel_workers_per_gather.
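
For example (a sketch only; the values below are illustrative, not
recommendations, and both settings can also be changed per session to test):

    -- lower the per-operation memory budget and the parallel fan-out, then reload
    ALTER SYSTEM SET work_mem = '32MB';
    ALTER SYSTEM SET max_parallel_workers_per_gather = 2;
    SELECT pg_reload_conf();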

Cheers,
Greg


Postgresql 16.3 Out Of Memory

2024-06-03 Thread Radu Radutiu
Hello,

I have an out of memory problem after upgrading from postgresql 12 to 16.3.
I have identified one query that can reproduce the error on demand. Once
the query starts, it will eventually exhaust all RAM and swap until the OOM
killer stops PostgreSQL.
The setup is as follows:
- One VMware VM, RHEL 8.9, 64 GB RAM, 16 vCPU, 32 GB swap. The database is used
by a local Java application that takes up to 22 GB (hard limit). The
application uses 2 db connections (long running, app uses a connection
pool). The database has about 10 tables, with 3 large tables involved in
the problem query:
 relation   | total_size
---+
 outputrequest | 217 GB
 inputrequest  | 149 GB
 tran  | 5041 MB

I have the following settings:
shared_buffers = '8192MB'
effective_cache_size = '24GB'
maintenance_work_mem = '2GB'
checkpoint_completion_target = '0.9'
wal_buffers = '16MB'
default_statistics_target = '100'
random_page_cost = '1.1'
effective_io_concurrency = '200'
work_mem = '104857kB'
min_wal_size = '1GB'
max_wal_size = '4GB'
max_worker_processes = '16'
max_parallel_workers_per_gather = '4'
max_parallel_workers = '16'
max_parallel_maintenance_workers = '4'
jit = 'off'

Besides the OOM, there is another (possibly related) problem: the PostgreSQL
memory usage is significantly higher than shared_buffers plus the memory used
per connection. On my system with shared_buffers = 8 GB and work_mem ~ 100 MB,
I see memory usage of 20 GB with the system running just simple inserts (the
memory might have increased due to previous queries).
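
For scale, a back-of-the-envelope estimate (my own reasoning, based on the
documented behaviour that work_mem is a per-sort/per-hash budget and that a
parallel hash shares roughly work_mem x participants): with work_mem =
104857kB (~102 MB) and max_parallel_workers_per_gather = 4, a single parallel
hash join may be allowed around 102 MB x 5, roughly 512 MB (more if
hash_mem_multiplier > 1), and a plan with several hash and sort nodes can
legitimately reach several GB in one query before anything counts as a leak.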

I have found that if I set enable_parallel_hash = 'off', I still get high
memory usage for the problem query (above 30 GB) but no OOM. With
enable_parallel_hash = 'on', I always get OOM. For simplicity I've restricted
the postgresql systemd service to cap memory at 36 GB (so that I don't have to
wait for the 32 GB of swap to fill up before the OOM). This is the memory
usage I get when running the query, just before systemd restarts postgres:
35.9 GB, just below the 36 GB limit (the query is running with EXPLAIN
(ANALYZE, BUFFERS); the other 2 idle connections are from the application):

[postgres@ips3 ~]$ systemctl status postgresql-16; top -u postgres -n 1 -c
● postgresql-16.service - PostgreSQL 16 database server
   Loaded: loaded (/etc/systemd/system/postgresql-16.service; enabled;
vendor preset: disabled)
  Drop-In: /etc/systemd/system/postgresql-16.service.d
   └─override.conf
   Active: active (running) since Mon 2024-06-03 04:23:16 +08; 11h ago
 Docs: https://www.postgresql.org/docs/16/static/
  Process: 283703 ExecStartPre=/usr/pgsql-16/bin/postgresql-16-check-db-dir
${PGDATA} (code=exited, status=0/SUCCESS)
 Main PID: 283709 (postgres)
Tasks: 14 (limit: 408404)
   Memory: 35.9G (limit: 36.0G)
   CGroup: /system.slice/postgresql-16.service
   ├─283709 /usr/pgsql-16/bin/postgres -D /var/lib/pgsql/16/data/
   ├─283710 postgres: logger
   ├─295499 postgres: checkpointer
   ├─295500 postgres: background writer
   ├─295506 postgres: walwriter
   ├─295507 postgres: autovacuum launcher
   ├─295508 postgres: logical replication launcher
   ├─295511 postgres: db1 db1 127.0.0.1(37720) idle
   ├─295946 postgres: postgres db1 [local] EXPLAIN
   ├─295947 postgres: parallel worker for PID 295946
   ├─295948 postgres: parallel worker for PID 295946
   ├─295949 postgres: parallel worker for PID 295946
   ├─295950 postgres: parallel worker for PID 295946
   └─296037 postgres: db1 db1 127.0.0.1(41708) idle

top - 16:14:32 up 14 days,  4:42,  1 user,  load average: 5.41, 3.58, 2.25
Tasks: 405 total,   6 running, 399 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.1 us,  5.2 sy,  0.0 ni, 71.2 id, 21.3 wa,  0.7 hi,  0.4 si,  0.0 st
MiB Mem :  63873.4 total,   2162.0 free,  57177.8 used,   4533.6 buff/cache
MiB Swap:  32260.0 total,   3813.8 free,  28446.2 used.   4349.0 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 295947 postgres  20   0   21.2g   7.4g 654168 R  20.0  11.8   1:31.81 postgres: parallel worker for PID 295946
 295949 postgres  20   0   21.3g   7.5g 656316 R  20.0  12.0   1:32.57 postgres: parallel worker for PID 295946
 295950 postgres  20   0   21.2g   7.5g 654756 R  20.0  12.1   1:31.99 postgres: parallel worker for PID 295946
 295946 postgres  20   0   21.2g   7.4g 655684 R  13.3  11.8   1:31.86 postgres: postgres db1 [local] EXPLAIN
 295948 postgres  20   0   21.6g   6.0g 656024 R  13.3   9.6   1:37.91 postgres: parallel worker for PID 295946
 283709 postgres  20   0 8960628  44144  43684 S   0.0   0.1   0:42.28 /usr/pgsql-16/bin/postgres -D /var/lib/pgsql/16/data/
 283710 postgres  20   0  355288   4876   4576 S   0.0   0.0   0:04.42 postgres: logger
 295499 postgres  20   0 8960936 107376 106868 S   0.0   0.2   0:01.75 postgres: checkpointer
 295500
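
For reference, the memory cap was applied with a systemd drop-in along these
lines (a sketch; the actual override.conf contents were not posted, and
MemoryMax= assumes cgroup v2, with MemoryLimit= being the cgroup v1
equivalent):

    # /etc/systemd/system/postgresql-16.service.d/override.conf (illustrative)
    [Service]
    MemoryMax=36G

followed by systemctl daemon-reload and a restart of the service.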

Re: Postgres Out Of Memory Crash

2023-11-06 Thread Merlin Moncure
On Thu, Nov 2, 2023 at 4:13 AM Avi Weinberg  wrote:

> Hi experts,
>
>
>
> I'm using Patroni Postgres installation and noticed that twice already
> postgres crashed due to out of memory.  I'm using logical replication with
> around 30-40 active subscribers on this machine.  The machine has 128GB but
> only 32GB is allocated to Postgres.  How can I know what is actually
> causing the out of memory issue?  Is it caused by not optimal postgres
> configuration or something else?
>
>
>
> /usr/lib/postgresql/13/bin/postgres -D
> /home/postgres/pgdata/pgroot/data
> --config-file=/home/postgres/pgdata/pgroot/data/postgresql.conf --port=5432
> --cluster_name=postgres-cluster --wal_level=logical --hot_standby=on
> --max_connections=533 --max_wal_senders=90 --max_prepared_transactions=0
> --max_locks_per_transaction=64 --track_commit_timestamp=on
> --max_replication_slots=90 --max_worker_processes=30 --wal_log_hints=on
>

A couple of things here. You don't really allocate memory to postgres;
rather, you set up memory reserved for certain kinds of buffering operations
via shared_buffers and other, less important settings. This memory is
carefully managed and is likely not what is underneath your OOM condition.
Ironically, increasing shared buffers can make your problem more likely to
occur, as you are taking memory away from other tasks.

>  --max_connections=533

Your problem is probably at least partially here. This number was configured
specifically, implying you were running out of connections and had to crank
this value up.

If you have a lot of idle connections sitting around (say, if you have
several application servers managing connection pools),  a sudden spike in
memory load and/or connection utilization can cause this to occur.  Suggest
one or more of:
* lowering shared buffers
* implementing pgbouncer and lowering max_connections
* increasing physical memory
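
As a quick check of how many of those connection slots are actually doing work
at any given moment, something along these lines helps (a sketch):

    -- count connections by state; many 'idle' rows suggest pooling would help
    SELECT state, count(*)
    FROM pg_stat_activity
    GROUP BY state
    ORDER BY count(*) DESC;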

Also,
> I'm using logical replication with around 30-40 active subscribers on
this machine.

Hm.  Have you considered a streaming replica setup, so that you can attach
read-only processes to the standby?

merlin


Re: Postgres Out Of Memory Crash

2023-11-02 Thread Laurenz Albe
On Thu, 2023-11-02 at 09:12 +, Avi Weinberg wrote:
> I'm using Patroni Postgres installation and noticed that twice already 
> postgres
> crashed due to out of memory.  I'm using logical replication with around 30-40
> active subscribers on this machine.  The machine has 128GB but only 32GB is 
> allocated
> to Postgres.  How can I know what is actually causing the out of memory issue?
> Is it caused by not optimal postgres configuration or something else?

You should look into the PostgreSQL log.  That should show a message like

  LOG:  server process (PID 16024) was terminated by signal 9: Killed
  DETAIL:  Failed process was running: SELECT ...

It is not certain, but often that statement is the one that used up
all that memory.  At least it is a starting point for your investigation.
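
If the kernel OOM killer was involved, its report also ends up in the system
log; something like this can help correlate the two sides (a sketch; paths and
tools depend on your distribution):

    # PostgreSQL side: the killed backend and the statement it was running
    grep -iE 'out of memory|terminated by signal 9' /path/to/postgresql.log
    # kernel side: the OOM killer report
    dmesg -T | grep -iE 'oom-killer|Killed process'
    journalctl -k | grep -i oom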

Yours,
Laurenz Albe




Postgres Out Of Memory Crash

2023-11-02 Thread Avi Weinberg
Hi experts,

I'm using Patroni Postgres installation and noticed that twice already postgres 
crashed due to out of memory.  I'm using logical replication with around 30-40 
active subscribers on this machine.  The machine has 128GB but only 32GB is 
allocated to Postgres.  How can I know what is actually causing the out of 
memory issue?  Is it caused by not optimal postgres configuration or something 
else?

/usr/lib/postgresql/13/bin/postgres -D /home/postgres/pgdata/pgroot/data 
--config-file=/home/postgres/pgdata/pgroot/data/postgresql.conf --port=5432 
--cluster_name=postgres-cluster --wal_level=logical --hot_standby=on 
--max_connections=533 --max_wal_senders=90 --max_prepared_transactions=0 
--max_locks_per_transaction=64 --track_commit_timestamp=on 
--max_replication_slots=90 --max_worker_processes=30 --wal_log_hints=on


Oct 27 07:05:31 node2 kernel: postgres invoked oom-killer: gfp_mask=0xd0, 
order=0, oom_score_adj=993
Oct 27 07:05:31 node2 kernel: postgres 
cpuset=docker-6ae67e04710619972d3b1ab5d4c69c318d001c2da47fecee121cdc60279a14a0.scope
 mems_allowed=0
Oct 27 07:05:31 node2 kernel: CPU: 6 PID: 15536 Comm: postgres Kdump: 
loaded Tainted: G    T 3.10.0-1160.el7.x86_64 #1
Oct 27 07:05:31 node2 kernel: Hardware name: Kontron MSP8040/4008, BIOS 
Core: 5.11, MSP804x: 1.57.0943FC77 05/06/2020
Oct 27 07:05:31 node2 kernel: Call Trace:
Oct 27 07:05:31 node2 kernel: [] dump_stack+0x19/0x1b
Oct 27 07:05:31 node2 kernel: [] dump_header+0x90/0x229
Oct 27 07:05:31 node2 kernel: [] ? 
ep_poll_callback+0xf8/0x220
Oct 27 07:05:31 node2 kernel: [] ? 
find_lock_task_mm+0x56/0xc0
Oct 27 07:05:31 node2 kernel: [] ? 
try_get_mem_cgroup_from_mm+0x28/0x60
Oct 27 07:05:31 node2 kernel: [] 
oom_kill_process+0x2cd/0x490
Oct 27 07:05:31 node2 kernel: [] 
mem_cgroup_oom_synchronize+0x55c/0x590
Oct 27 07:05:31 node2 kernel: [] ? 
mem_cgroup_charge_common+0xc0/0xc0
Oct 27 07:05:31 node2 kernel: [] 
pagefault_out_of_memory+0x14/0x90
Oct 27 07:05:31 node2 kernel: [] mm_fault_error+0x6a/0x157
Oct 27 07:05:31 node2 kernel: [] 
__do_page_fault+0x491/0x500
Oct 27 07:05:31 node2 kernel: [] do_page_fault+0x35/0x90
Oct 27 07:05:31 node2 kernel: [] page_fault+0x28/0x30
Oct 27 07:05:31 node2 kernel: Task in 
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice/docker-6ae67e0471061997$
Oct 27 07:05:31 node2 kernel: memory: usage 32768000kB, limit 32768000kB, 
failcnt 144867
Oct 27 07:05:31 node2 kernel: memory+swap: usage 32768000kB, limit 
9007199254740988kB, failcnt 0
Oct 27 07:05:31 node2 kernel: kmem: usage 0kB, limit 9007199254740988kB, 
failcnt 0
Oct 27 07:05:31 node2 kernel: Memory cgroup stats for 
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice:
 cache:$
Oct 27 07:05:31 node2 kernel: Memory cgroup stats for 
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice/docker-$
Oct 27 07:05:31 node2 kernel: Memory cgroup stats for 
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice/docker-$
Oct 27 07:05:31 node2 kernel: [ pid ]   uid  tgid total_vm  rss nr_ptes 
swapents oom_score_adj name
Oct 27 07:05:31 node2 kernel: [13159] 0 13159  2391   3 
   0  -998 pause
Oct 27 07:05:31 node2 kernel: [13322] 0 13322 1095   97   8 
   0   993 dumb-init
Oct 27 07:05:31 node2 kernel: [13335] 0 13335 1156  171   8 
   0   993 sh
Oct 27 07:05:31 node2 kernel: [13411] 0 13411 1137   98   8 
   0   993 runsvdir
Oct 27 07:05:31 node2 kernel: [13438] 0 13438 1099   98   7 
   0   993 runsv
Oct 27 07:05:31 node2 kernel: [13439] 0 13439 1099   98   7 
   0   993 runsv
Oct 27 07:05:31 node2 kernel: [13440]   101 1344027026 1186  54 
   0   993 pgqd
Oct 27 07:05:31 node2 kernel: [13441]   101 13441   155215 8237 101 
   0   993 patroni
Oct 27 07:05:31 node2 kernel: [19532]   101 19532  174003346817 171 
   0   993 postgres
Oct 27 07:05:31 node2 kernel: [19542]   101 19542  1767874 6713 121 
   0   993 postgres
Oct 27 07:05:31 node2 kernel: [19546]   101 19546  1740173  14450313166 
   0   993 postgres
Oct 27 07:05:31 node2 kernel: [19547]   101 19547  174006920060 171 
   0   993 postgres
Oct 27 07:05:31 node2 kernel: [19548]   101 19548  1740027 4821  86 
   0   993 postgres
Oct 27 07:05:31 node2 kernel: [19549]   101 19549  1740283 1011  91 
   0   993 postgres
Oct 27 07:05:31 node2 kernel: [19549]   101 19549

Re: PostgreSQL 14.4 ERROR: out of memory issues

2022-08-07 Thread Aleš Zelený
Hello,

> The problem Joe spotted is that a variable name is misspelled in one
> place. It should (probably) be external_complete_id, not
> _external_complete_id.


Oh, copy-paste issue :-)


> > better solution proposal?
>
> I think you can make that clearer by using IS [NOT] DISTINCT FROM:
>
> SELECT ... simple join of two tables...
> WHERE opd.id_data_provider = _id_data_provider
> AND external_id IS NOT DISTINCT FROM _external_id
> AND external_complete_id IS NOT DISTINCT FROM _external_complete_id
> ;
>
> However, your version may be faster, as last time I looked (it's been
> some time) the optimizer wasn't especially good at handling DISTINCT
> FROM (probably because it's so rarely used).

DISTINCT FROM was the original version, but it ignores indexes (130 ms vs
0.3 ms), so you are absolutely correct.

Kind regards
Ales Zeleny


Re: PostgreSQL 14.4 ERROR: out of memory issues

2022-08-06 Thread Peter J. Holzer
On 2022-08-05 17:47:02 +0200, Aleš Zelený wrote:
> the construct surprised me when I saw it in the function the first time, but 
> it
> is correct and works as expected - it allows writing the function as SQL
> instead of PLPGSQL while it ensures that for a parameter null value it
> evaluates to true instead of filtering the resultset.
> What is the issue

The problem Joe spotted is that a variable name is misspelled in one
place. It should (probably) be external_complete_id, not
_external_complete_id.

> better solution proposal?

I think you can make that clearer by using IS [NOT] DISTINCT FROM:

SELECT ... simple join of two tables...
WHERE opd.id_data_provider = _id_data_provider
    AND external_id IS NOT DISTINCT FROM _external_id
AND external_complete_id IS NOT DISTINCT FROM _external_complete_id
;

However, your version may be faster, as last time I looked (it's been
some time) the optimizer wasn't especially good at handling DISTINCT
FROM (probably because it's so rarely used).

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: PostgreSQL 14.4 ERROR: out of memory issues

2022-08-05 Thread Aleš Zelený
Hello,

The construct surprised me when I saw it in the function for the first time,
but it is correct and works as expected: it allows writing the function in SQL
instead of PL/pgSQL while ensuring that a NULL parameter value evaluates to
true instead of filtering the result set.
What is the issue/better solution proposal?

Kind regards Ales Zeleny


On Thu, Aug 4, 2022 at 23:15, Joe Conway  wrote:

> On 8/4/22 08:34, Aleš Zelený wrote:
> >> SELECT ... simple join of two tables...
> >>  WHERE opd.id_data_provider = _id_data_provider
> >>AND CASE WHEN _external_id IS NULL
> >> THEN external_id IS NULL
> >> ELSE external_id = _external_id
> >>END
> >>AND CASE WHEN _external_complete_id IS NULL
> >> THEN _external_complete_id IS NULL
>
> Unrelated to your question, but shouldn't that actually read:
>
> AND CASE WHEN _external_complete_id IS NULL
>  THEN external_complete_id IS NULL
>  ^^^
>
> >> ELSE external_complete_id = _external_complete_id
> >>END;
> >> $function$
>
> --
> Joe Conway
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>


Re: PostgreSQL 14.4 ERROR: out of memory issues

2022-08-04 Thread Joe Conway

On 8/4/22 08:34, Aleš Zelený wrote:

SELECT ... simple join of two tables...
      WHERE opd.id_data_provider = _id_data_provider
        AND CASE WHEN _external_id IS NULL
                 THEN external_id IS NULL
                 ELSE external_id = _external_id
            END
        AND CASE WHEN _external_complete_id IS NULL
                 THEN _external_complete_id IS NULL


Unrelated to your question, but shouldn't that actually read:

   AND CASE WHEN _external_complete_id IS NULL
THEN external_complete_id IS NULL
^^^


                 ELSE external_complete_id = _external_complete_id
            END;
$function$


--
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: PostgreSQL 14.4 ERROR: out of memory issues

2022-08-04 Thread Aleš Zelený
Hello,

thanks for the information and the link!

Ales

On Thu, Aug 4, 2022 at 1:05, Tom Lane  wrote:

> Aleš Zelený  writes:
> > CREATE OR REPLACE FUNCTION tescase (_id_data_provider integer,
> _external_id
> > bigint DEFAULT NULL::bigint, _external_complete_id character varying
> > DEFAULT NULL::character varying)
> >  RETURNS TABLE(id_gm bigint, id_opp bigint, id_opp_state integer)
> >  LANGUAGE sql
> > AS $function$
> > SELECT ... simple join of two tables...
> >  WHERE opd.id_data_provider = _id_data_provider
> >AND CASE WHEN _external_id IS NULL
> > THEN external_id IS NULL
> > ELSE external_id = _external_id
> >END
> >AND CASE WHEN _external_complete_id IS NULL
> > THEN _external_complete_id IS NULL
> > ELSE external_complete_id = _external_complete_id
> >END;
> > $function$
> > ;
>
> > It is a kind of creative construct for me, but it works. The key here is
> > that if I replace at least one of the "CASEd" where conditions, it seems
> > not to suffer from the memory leak issue.
>
> > Finally, I've found, that even having the function as is and before the
> > test disabling JIT (SET jit = off;) and calling the function 100k times,
> > RssAnon memory for the given process is stable and only 3612 kB, while
> when
> > JIT is enabled (the default setting on the server suffering from the
> memory
> > leak, RssAnon memory for the given process growth in a linear manner over
> > time (canceled when it reached 5GB).
>
> Ah.  I bet this is another instance of the known memory leakage problems
> with JIT inlining [1].  Per Andres' comments in that thread, it seems
> hard to solve properly.  For now all I can recommend is to disable that.
>
> regards, tom lane
>
> [1]
> https://www.postgresql.org/message-id/flat/20201001021609.GC8476%40telsasoft.com
>


Re: PostgreSQL 14.4 ERROR: out of memory issues

2022-08-03 Thread Tom Lane
Aleš Zelený  writes:
> CREATE OR REPLACE FUNCTION tescase (_id_data_provider integer, _external_id
> bigint DEFAULT NULL::bigint, _external_complete_id character varying
> DEFAULT NULL::character varying)
>  RETURNS TABLE(id_gm bigint, id_opp bigint, id_opp_state integer)
>  LANGUAGE sql
> AS $function$
> SELECT ... simple join of two tables...
>  WHERE opd.id_data_provider = _id_data_provider
>AND CASE WHEN _external_id IS NULL
> THEN external_id IS NULL
> ELSE external_id = _external_id
>END
>AND CASE WHEN _external_complete_id IS NULL
> THEN _external_complete_id IS NULL
> ELSE external_complete_id = _external_complete_id
>END;
> $function$
> ;

> It is a kind of creative construct for me, but it works. The key here is
> that if I replace at least one of the "CASEd" where conditions, it seems
> not to suffer from the memory leak issue.

> Finally, I've found, that even having the function as is and before the
> test disabling JIT (SET jit = off;) and calling the function 100k times,
> RssAnon memory for the given process is stable and only 3612 kB, while when
> JIT is enabled (the default setting on the server suffering from the memory
> leak, RssAnon memory for the given process growth in a linear manner over
> time (canceled when it reached 5GB).

Ah.  I bet this is another instance of the known memory leakage problems
with JIT inlining [1].  Per Andres' comments in that thread, it seems
hard to solve properly.  For now all I can recommend is to disable that.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/20201001021609.GC8476%40telsasoft.com
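
If it helps, jit can be disabled at several scopes rather than only globally;
a sketch (the database and role names below are placeholders):

    ALTER SYSTEM SET jit = off;            -- whole cluster, then SELECT pg_reload_conf()
    ALTER DATABASE mydb SET jit = off;     -- one database
    ALTER ROLE app_user SET jit = off;     -- one role
    SET jit = off;                         -- just the current session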




Re: PostgreSQL 14.4 ERROR: out of memory issues

2022-08-03 Thread Aleš Zelený
Hello,

I did some testing and the result is that on PG 12 there are no such
problems. PG 13 and PG 14.3 tests will follow, but based on monitoring the
processes' RssAnon memory, I've found a correlation that only processes owned
by a single user are suffering from the memory growth (it looks like a memory
leak to me).
For the given DB user, I've configured log_min_duration_statement to 0 and
afterwards analyzed the Postgres server log with all the statements in the
time window when I spotted significant memory growth for a single process in
a short time (a few minutes).
This correlation pointed my attention to a SQL function that is called
frequently, and I've tried to do some testing with it.
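
(A sketch of the per-role logging switch used for that; the role name is a
placeholder:)

    ALTER ROLE app_user SET log_min_duration_statement = 0;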



CREATE OR REPLACE FUNCTION tescase (_id_data_provider integer, _external_id
bigint DEFAULT NULL::bigint, _external_complete_id character varying
DEFAULT NULL::character varying)
 RETURNS TABLE(id_gm bigint, id_opp bigint, id_opp_state integer)
 LANGUAGE sql
AS $function$
SELECT ... simple join of two tables...
 WHERE opd.id_data_provider = _id_data_provider
   AND CASE WHEN _external_id IS NULL
THEN external_id IS NULL
ELSE external_id = _external_id
   END
   AND CASE WHEN _external_complete_id IS NULL
THEN _external_complete_id IS NULL
ELSE external_complete_id = _external_complete_id
   END;
$function$
;

It is a kind of creative construct for me, but it works. The key here is
that if I replace at least one of the "CASEd" where conditions, it seems
not to suffer from the memory leak issue.

Finally, I've found that, with the function as is, if I disable JIT before
the test (SET jit = off;) and call the function 100k times, RssAnon memory for
the given process stays stable at only 3612 kB, while with JIT enabled (the
default setting on the server suffering from the memory leak), RssAnon memory
for the given process grows linearly over time (I canceled the test when it
reached 5 GB).
This was tested on real application data that I cannot share, but if I get a
chance to prepare a synthetic test case, I think it will be time to submit it
as a bug.
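
For what it's worth, a minimal sketch of the single-session loop used for such
a test (parameter values are placeholders, and whether a DO block exercises
JIT exactly like the application's calls is an assumption):

    SET jit = off;   -- repeat the run with jit = on to compare RssAnon growth
    DO $$
    BEGIN
      FOR i IN 1..100000 LOOP
        PERFORM * FROM tescase(1, NULL, NULL);
      END LOOP;
    END
    $$;
    -- meanwhile, in a shell: grep RssAnon /proc/<backend_pid>/status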

In the smaps for the given process, it is possible to identify the address of
a heap-allocated memory region which constantly grows with every execution
(while the sizes of the other mapped heap segments are static), but I have no
clue whether that might help; it looks like this:

while :; do date; sed -n '/^02d93000.*/,/^VmFlags/p' /proc/31600/smaps;
sleep 2; done

Út srp  2 17:10:18 CEST 2022
02d93000-20fe1000 rw-p  00:00 0
 [heap]
Size: 493880 kB
Rss:  492784 kB
Pss:  492784 kB
Shared_Clean:  0 kB
Shared_Dirty:  0 kB
Private_Clean: 0 kB
Private_Dirty:492784 kB
Referenced:   492560 kB
Anonymous:492784 kB
AnonHugePages: 0 kB
Swap:  0 kB
KernelPageSize:4 kB
MMUPageSize:   4 kB
Locked:0 kB
VmFlags: rd wr mr mp me ac sd
Út srp  2 17:10:20 CEST 2022
02d93000-21003000 rw-p  00:00 0
 [heap]
Size: 494016 kB
Rss:  493660 kB
Pss:  493660 kB
Shared_Clean:  0 kB
Shared_Dirty:  0 kB
Private_Clean: 0 kB
Private_Dirty:493660 kB
Referenced:   493436 kB
Anonymous:493660 kB
AnonHugePages: 0 kB
Swap:  0 kB
KernelPageSize:4 kB
MMUPageSize:   4 kB
Locked:0 kB
VmFlags: rd wr mr mp me ac sd

Thanks for any hints or comments.
Ales


Re: PostgreSQL 14.4 ERROR: out of memory issues

2022-07-20 Thread Aleš Zelený
Hello,

On Mon, Jul 18, 2022 at 21:04, Tom Lane  wrote:

> Aleš Zelený  writes:
> > postgres=# show shared_buffers ;
> >  shared_buffers
> > 
> >  10GB
> > (1 row)
>
> Oh!  In that case, there is absolutely nothing to see here.
> This chunk:
>
> > 7fd9b0551000 10827040K rw-s- zero (deleted)
>
> must be your shared-memory region, and it's accounting for just
> about all of the process's memory space.  There is no reason
> to suspect a leak.
>

I agree this 10 GB is expected to be accounted for by shared buffers (just a
size correlation) and that therefore there is no memory leak here.



> I think you've been fooled by the fact that "top" and some other
> tools frequently don't report a shared-memory page as part of a
> process's usage until that process touches that particular page.
> Thus, the reported memory usage of a Postgres backend will grow
> over time as it randomly happens to touch different buffers within
> the shared buffer arena.  That does not constitute a leak, but
> it does make such tools next door to useless for detecting
> actual leaks :-(.  You can only believe there's a leak if the
> reported usage doesn't level off after reaching the vicinity of
> your shared memory size.
>

On the other hand, if we do have 10 GB of shared buffers plus ~14 MB reported
by the top memory context dump (even if /proc//status reports RssAnon
memory only when touched), why is the process VM peak ~12 GB?
-bash-4.2$ cat 20220718_200230.status.17048.log
Name: postmaster
Umask: 0077
State: S (sleeping)
Tgid: 17048
Ngid: 0
Pid: 17048
PPid: 4476
TracerPid: 0
Uid: 26 26 26 26
Gid: 26 26 26 26
FDSize: 1024
Groups: 26
VmPeak: 12477752 kB
VmSize: 12439740 kB
VmLck:   0 kB
VmPin:   0 kB
VmHWM: 9811560 kB
VmRSS: 9811560 kB
RssAnon: 1283964 kB
RssFile:   15784 kB
RssShmem: 8511812 kB
VmData: 1282860 kB
VmStk: 132 kB
VmExe:7816 kB
VmLib:   85312 kB
VmPTE:   23716 kB
VmSwap:   0 kB
Threads: 1
SigQ: 0/160197
SigPnd: 
ShdPnd: 
SigBlk: 0040
SigIgn: 01701800
SigCgt: 0001a0006287
CapInh: 
CapPrm: 
CapEff: 
CapBnd: 001f
CapAmb: 
NoNewPrivs: 0
Seccomp: 0
Speculation_Store_Bypass: thread vulnerable
Cpus_allowed: ff
Cpus_allowed_list: 0-23
Mems_allowed:
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0003
Mems_allowed_list: 0-1
voluntary_ctxt_switches: 5273344
nonvoluntary_ctxt_switches: 13651

I'd expect a VmPeak size close to 10 GB (assuming all shared memory pages were
touched) for shared_buffers plus ~14 MB of private memory, but
VmPeak: 12477752 kB - 10485760 kB of shared buffers = 1991992 kB, which is a
bit more than the RssAnon: 1283964 kB reported for the process, but close.

pmap taken in the same time:

-bash-4.2$ cat 20220718_200230.pmap.17048.log | grep -e anon -e deleted
00db3000200K rw---   [ anon ]
02d15000504K rw---   [ anon ]
02d93000 934476K rw---   [ anon ]  <<<= this is suspected to be
responsible for the OutOfMemory -> failing fork(), reinitializing the cluster
7fd989776000 311304K rw---   [ anon ]  <<<= together with this one
7fd9a8d75000  32772K rw---   [ anon ]
7fd9acb65000 20K rw---   [ anon ]
7fd9affc1000372K rw---   [ anon ]
7fd9b023d000   1028K rw---   [ anon ]
7fd9b054b000 24K rw---   [ anon ]
7fd9b0551000 10827040K rw-s- zero (deleted)   <<<= expected to be
shared_buffers
7fdc45f18000 16K rw---   [ anon ]
7fdc4d50c000184K rw---   [ anon ]
7fdc4de16000  4K rw---   [ anon ]
7fdc4e03c000  8K rw---   [ anon ]
7fdc4fd1e000 84K rw---   [ anon ]
7fdc4ff6f000  8K rw---   [ anon ]
7fdc508d7000  8K rw---   [ anon ]
7fdc50d59000  4K rw---   [ anon ]
7fdc5139e000  8K rw---   [ anon ]
7fdc520f3000 40K rw---   [ anon ]
7fdc52902000 20K rw---   [ anon ]
7fdc52ead000 16K rw---   [ anon ]
7fdc532af000  4K rw---   [ anon ]
7fdc53e2b000 16K rw---   [ anon ]
7fdc544db000 16K rw---   [ anon ]
7fdc54cc8000  8K rw---   [ anon ]
7fdc54f1a000   1588K rw---   [ anon ]
7fdc550dc000104K rw---   [ anon ]
7fdc550fc000  8K r-x--   [ anon ]
7fdc550ff000  4K rw---   [ anon ]
7fdc55102000  4K rw---   [ anon ]
7fffd38f  8K r-x--   [ anon ]
ff60  4K r-x--   [ anon ]


The machine has 40 GB of RAM. It is a dedicated DB server; only a single PG
instance is running there. We have run out of memory again since my last
reply, but right now it looks healthy:

Re: Out Of Memory

2022-07-19 Thread Adrian Klaver

On 7/19/22 10:54 AM, ghoostin...@mail.com wrote:

Again reply to list also.

Ccing list.


What should I do, and what are the requirements?
———-


The link explains what you should do.

What requirements are you talking about?

This is probably something that the Odoo community will be better able 
to answer:


https://www.odoo.com/forum/help-1


--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Out Of Memory

2022-07-19 Thread Adrian Klaver

On 7/19/22 9:56 AM, ghoostin...@mail.com wrote:

Please reply to list also.
Ccing list


 > 1) Detail how you are doing migration
1 - I'm using the OpenUpgrade OCA tool
2 - Creating a new db and restoring my db into it (36GB, and some tables have
5M+ rows)
3 - Executing an SQL script to uninstall unwanted modules
4 - Switching to v12.0
-> gitrecupdate 12.0
5 - Executing a shell command to uninstall unwanted modules
-> echo "self.env['base.module.upgrade'].upgrade_module()" |
./odoo/odoo-bin shell --config ./odoo12.conf -d bt --stop-after-init
6 - Switching to 13.0 & executing OpenUpgrade
-> gitrecupdate 13.0
-> ./OCA/OpenUpgrade/odoo-bin --config ./openupgrade13.conf -d
bt -u all --stop-after-init
7 - Switching to 14.0
-> gitrecupdate 14.0
-> ./odoo/odoo-bin --config ./odoo14.conf -d bt -u all
--stop-after-init


I would read:

https://oca.github.io/OpenUpgrade/after_migration.html

I have to believe that doing serial migrations with no cleanup is going to
eat up space when you have a database of 36 GB, 30 GB of RAM, and 100 GB of
disk.
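
A quick way to keep an eye on how close you are (the database name is taken
from the commands above; the data-directory path is an assumption):

    df -h /var/lib/postgresql   # free space on the volume holding the cluster
    psql -d bt -c "SELECT pg_size_pretty(pg_database_size(current_database()));"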





2) Are you doing this on the same machine?
- Yes, I did it on an AWS server
3) OS and version
- Ubuntu 20.04.4 LTS


4) Hardware specifications for machine e.g. memory, disk space, etc.
- Amazon AWS
- Instance Type: r6gd.xlarge
- OS: Ubuntu
- RAM: 30GB, Disk Space: 100GB


Note: it works up until the migration to 13.0, at the stock account migration

—— - - -
On 19/07/2022 at 3:00 PM, Adrian Klaver wrote:


On 7/19/22 06:16, ghoostin...@mail.com wrote:

Hello, I'm using Odoo ERP v12 Community and I want to migrate to v14. I have
tables containing more than 5M rows, and after 8 hours of execution it shows
the message "out of memory", so I need your help.


And we need information:

1) Detail how you are doing migration.

2) Are you doing this on the same machine?

3) OS and version.

4) Hardware specifications for machine e.g. memory, disk space, etc.


--
Adrian Klaver
adrian.kla...@aklaver.com




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Out Of Memory

2022-07-19 Thread Adrian Klaver

On 7/19/22 06:16, ghoostin...@mail.com wrote:

Hello, I'm using Odoo ERP v12 Community and I want to migrate to v14. I have
tables containing more than 5M rows, and after 8 hours of execution it shows
the message "out of memory", so I need your help.


And we need information:

1) Detail how you are doing migration.

2) Are you doing this on the same machine?

3) OS and version.

4) Hardware specifications for machine e.g. memory, disk space, etc.


--
Adrian Klaver
adrian.kla...@aklaver.com




Out Of Memory

2022-07-19 Thread ghoostinger
Hello, I'm using Odoo ERP v12 Community and I want to migrate to v14. I have
tables containing more than 5M rows, and after 8 hours of execution it shows
the message "out of memory", so I need your help.




Re: PostgreSQL 14.4 ERROR: out of memory issues

2022-07-18 Thread Tom Lane
Aleš Zelený  writes:
> postgres=# show shared_buffers ;
>  shared_buffers
> 
>  10GB
> (1 row)

Oh!  In that case, there is absolutely nothing to see here.
This chunk:

> 7fd9b0551000 10827040K rw-s- zero (deleted)

must be your shared-memory region, and it's accounting for just
about all of the process's memory space.  There is no reason
to suspect a leak.

I think you've been fooled by the fact that "top" and some other
tools frequently don't report a shared-memory page as part of a
process's usage until that process touches that particular page.
Thus, the reported memory usage of a Postgres backend will grow
over time as it randomly happens to touch different buffers within
the shared buffer arena.  That does not constitute a leak, but
it does make such tools next door to useless for detecting
actual leaks :-(.  You can only believe there's a leak if the
reported usage doesn't level off after reaching the vicinity of
your shared memory size.
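
One way to separate the two in practice is the Rss breakdown in
/proc/PID/status, e.g. (using the example backend PID from earlier in the
thread):

    grep -E 'Rss(Anon|File|Shmem)' /proc/17048/status
    # RssShmem grows as shared buffers are touched; RssAnon is the private
    # memory to watch for a real leak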

So back to why you're getting these out-of-memory failures:
we still don't have much info about that.  I wonder whether
there is something watching the total usage reported by "top"
and taking action based on that entirely-misleading number.
The kernel itself should be aware that there's no extra memory
pressure from N backends all using the same shared memory
segment, but it can be hard for outside tools to know that.

At this point I suspect that PG 14.3 vs 14.4 is a red herring,
and what you should be looking into is what else you updated
at about the same time, particularly in the line of container
management tools or the like (if you use any).

regards, tom lane




Re: PostgreSQL 14.4 ERROR: out of memory issues

2022-07-18 Thread Pavel Stehule
On Mon, Jul 18, 2022 at 20:26, Aleš Zelený  wrote:

>
> On Mon, Jul 18, 2022 at 16:25, Tom Lane  wrote:
>
>> Aleš Zelený  writes:
>> > after some time, I've found a process consuming over 1GB of memory"
>> > -bash-4.2$ grep RssAnon /proc/*/status | sort -nk2 | tail
>> > /proc/17048/status:RssAnon: 1053952 kB
>>
>> > Here are memory contexts for PID 17048:
>>
>> > TopMemoryContext: 422592 total in 14 blocks; 42536 free (169 chunks);
>> > 380056 used
>> ...
>> > Grand total: 14312808 bytes in 4752 blocks; 3920880 free (1043 chunks);
>> > 10391928 used
>>
>> OK, so PG's normal memory consumption is only ~14MB.  Where'd the
>> rest of it go?
>>
>> > -bash-4.2$ pmap 17048
>> ...
>> > 02d93000 838092K rw---   [ anon ]
>> > 7fd999777000 180232K rw---   [ anon ]
>> > 7fd9a8d75000  32772K rw---   [ anon ]
>>
>
> and a few hours later it even grew:
>
> -bash-4.2$ cat 20220718_200230.pmap.17048.log | grep anon
> 00db3000200K rw---   [ anon ]
> 02d15000504K rw---   [ anon ]
> 02d93000 934476K rw---   [ anon ]
> 7fd989776000 311304K rw---   [ anon ]
> 7fd9a8d75000  32772K rw---   [ anon ]
> 7fd9acb65000 20K rw---   [ anon ]
> 7fd9affc1000372K rw---   [ anon ]
>
> From previous observation I know, that the process RssAnon memory grew
> over time, sometimes there are some steps. Still, generally, the growth is
> linear until the process finishes or we run out of memory, and the cluster
> is reinitialized by the postmaster.
> No such behavior on older versions until 14.3 (including).
>
> ...
>> Probably the 838M chunk is shared memory?  Is that within hailing
>> distance of your shared_buffers setting?
>>
>
> postgres=# show shared_buffers ;
>  shared_buffers
> 
>  10GB
> (1 row)
>
> ...
>> > 7fd9b0551000 10827040K rw-s- zero (deleted)
>>
>
> These 10GB matches 10GB configured as shared buffers.
>
>
>> ...
>> And here we have the culprit, evidently ... but what the dickens
>> is it?  I can't think of any mechanism within Postgres that would
>> create such an allocation.
>>
>
I checked the code, and there is a lot of usage of the malloc function.

static void *
GenerationAlloc(MemoryContext context, Size size)
{
    GenerationContext *set = (GenerationContext *) context;
    GenerationBlock *block;
    GenerationChunk *chunk;
    Size        chunk_size = MAXALIGN(size);
    Size        required_size = chunk_size + Generation_CHUNKHDRSZ;

    /* is it an over-sized chunk? if yes, allocate special block */
    if (chunk_size > set->allocChunkLimit)
    {
        Size        blksize = required_size + Generation_BLOCKHDRSZ;

        block = (GenerationBlock *) malloc(blksize);
        if (block == NULL)
            return NULL;
Could some memory be lost in this allocation?

Regards

Pavel



>
>> regards, tom lane
>>
>
> Kind regards
> Ales Zeleny
>


Re: PostgreSQL 14.4 ERROR: out of memory issues

2022-07-18 Thread Aleš Zelený
On Mon, Jul 18, 2022 at 16:25, Tom Lane  wrote:

> Aleš Zelený  writes:
> > after some time, I've found a process consuming over 1GB of memory"
> > -bash-4.2$ grep RssAnon /proc/*/status | sort -nk2 | tail
> > /proc/17048/status:RssAnon: 1053952 kB
>
> > Here are memory contexts for PID 17048:
>
> > TopMemoryContext: 422592 total in 14 blocks; 42536 free (169 chunks);
> > 380056 used
> ...
> > Grand total: 14312808 bytes in 4752 blocks; 3920880 free (1043 chunks);
> > 10391928 used
>
> OK, so PG's normal memory consumption is only ~14MB.  Where'd the
> rest of it go?
>
> > -bash-4.2$ pmap 17048
> ...
> > 02d93000 838092K rw---   [ anon ]
> > 7fd999777000 180232K rw---   [ anon ]
> > 7fd9a8d75000  32772K rw---   [ anon ]
>

and a few hours later it even grew:

-bash-4.2$ cat 20220718_200230.pmap.17048.log | grep anon
00db3000200K rw---   [ anon ]
02d15000504K rw---   [ anon ]
02d93000 934476K rw---   [ anon ]
7fd989776000 311304K rw---   [ anon ]
7fd9a8d75000  32772K rw---   [ anon ]
7fd9acb65000 20K rw---   [ anon ]
7fd9affc1000372K rw---   [ anon ]

From previous observations I know that the process RssAnon memory grew over
time; sometimes there are steps. Still, generally, the growth is
linear until the process finishes or we run out of memory and the cluster
is reinitialized by the postmaster.
No such behavior on older versions up to and including 14.3.

...
> Probably the 838M chunk is shared memory?  Is that within hailing
> distance of your shared_buffers setting?
>

postgres=# show shared_buffers ;
 shared_buffers

 10GB
(1 row)

...
> > 7fd9b0551000 10827040K rw-s- zero (deleted)
>

These 10GB matches 10GB configured as shared buffers.


> ...
> And here we have the culprit, evidently ... but what the dickens
> is it?  I can't think of any mechanism within Postgres that would
> create such an allocation.
>
> regards, tom lane
>

Kind regards
Ales Zeleny


Re: PostgreSQL 14.4 ERROR: out of memory issues

2022-07-18 Thread Tom Lane
Aleš Zelený  writes:
> after some time, I've found a process consuming over 1GB of memory"
> -bash-4.2$ grep RssAnon /proc/*/status | sort -nk2 | tail
> /proc/17048/status:RssAnon: 1053952 kB

> Here are memory contexts for PID 17048:

> TopMemoryContext: 422592 total in 14 blocks; 42536 free (169 chunks);
> 380056 used
...
> Grand total: 14312808 bytes in 4752 blocks; 3920880 free (1043 chunks);
> 10391928 used

OK, so PG's normal memory consumption is only ~14MB.  Where'd the
rest of it go?

> -bash-4.2$ pmap 17048
...
> 02d93000 838092K rw---   [ anon ]
> 7fd999777000 180232K rw---   [ anon ]
> 7fd9a8d75000  32772K rw---   [ anon ]
...
Probably the 838M chunk is shared memory?  Is that within hailing
distance of your shared_buffers setting?
...
> 7fd9b0551000 10827040K rw-s- zero (deleted)
...
And here we have the culprit, evidently ... but what the dickens
is it?  I can't think of any mechanism within Postgres that would
create such an allocation.

regards, tom lane




Re: PostgreSQL 14.4 ERROR: out of memory issues

2022-07-18 Thread Aleš Zelený
Hello,




On Thu, Jul 14, 2022 at 23:31, Aleš Zelený  wrote:

>
>
> On Thu, Jul 14, 2022 at 23:11, Tomas Vondra
> <tomas.von...@enterprisedb.com> wrote:
>
>>
>>
>> On 7/14/22 21:25, Aleš Zelený wrote:
>> >
>> > On Wed, Jul 13, 2022 at 2:20, Michael Paquier <mich...@paquier.xyz> wrote:
>> >
>> > On Mon, Jul 11, 2022 at 10:50:23AM +0200, Aleš Zelený wrote:
>> > > So far, it has happened three times (during a single week) from
>> > the 14.3 ->
>> > > 14.4 upgrade, before 14.4 we haven't suffered from such an issue.
>> > >
>> > > Questions:
>> > > 1)  Can we safely downgrade from 14.4 to 14.3 by shutting down the
>> > instance
>> > > and reinstalling 14.3 PG packages (to prove, that the issue
>> > disappear)?
>> > > 2) What is the best way to diagnose what is the root cause?
>> >
>> > Hmm.  14.4 has nothing in its release notes that would point to a
>> > change in the vacuum or autovacuum's code paths:
>> > https://www.postgresql.org/docs/14/release-14-4.html#id-1.11.6.5.4
>> >
>> >
>> > There is nothing specific after a look at the changes as of, and I
>> am
>> > not grabbing anything that would imply a change in memory context
>> > handling either:
>> > `git log --stat REL_14_3..REL_14_4`
>> > `git diff REL_14_3..REL_14_4 -- *.c`
>> >
>> > Saying that, you should be able to downgrade safely as there are no
>> > changes in WAL format or such that would break things.  Saying that,
>> > the corruption issue caused by CONCURRENTLY is something you'd still
>> > have to face.
>> >
>> >
>> > Thanks, good to know that, we can use it for a test case, since we
>> > already hit the CONCURRENTLY bug on 14.3.
>> >
>> > > 2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app=
>> > ERROR:  out of
>> > > memory
>> > > 2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app=
>> > DETAIL:  Failed
>> > > on request of size 152094068 in memory context
>> > "TopTransactionContext".
>> > > 2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app=
>> CONTEXT:
>> > >  automatic vacuum of table "prematch.replication.tab_queue_tmp"
>> >
>> > This is the interesting part.  Do you happen to use logical
>> > replication in a custom C++ plugin?
>> >
>> >
>> > We are using logical replication to other instances (pg_output) and
>> > decoderbufs
>> > https://github.com/debezium/postgres-decoderbufs for other
>> > applications.
>> >
>>
>> This is probably just a red herring - std::bad_alloc is what the process
>> that runs into the overcommit limit gets. But the real issue (e.g.
>> memory leak) is likely somewhere else - different part of the code,
>> different process ...
>>
>> > ...
>> >
>> > Checking the RssAnon from proc/pid/status I've found some points where
>> > RssAnon memory usage grew very steep for a minute, but no "suspicious"
>> > queries/arguments were found in the instance logfile.
>> >
>> > Any hint, on how to get the root cause would be appreciated since so far
>> > I've failed to isolate the issue reproducible testcase.
>> > At least I hope that looking for the RssAnon process memory is an
>> > appropriate metric, if not, let me know and I'll try to update the
>> > monitoring to get the root cause.
>> >
>> > I can imagine a workaround with client application regular reconnect...,
>> > but up to 14.3 it works, so I'd like to fix the issue either on our
>> > application side or at PG side if it is a PG problem.
>> >
>>
>> I think it'd be interesting to get memory context stats from the
>> processes consuming a lot of memory. If you know which processes are
>> suspect (and it seems you know, because if a reconnect helps it's the
>> backend handling the connection), you can attach a debugger and do
>>
>>$ gdb -p $PID
>>call MemoryContextStats(TopMemoryContext)
>>
>> which will log info about memory contexts, just like autovacuum.
>> Hopefully that

Re: PostgreSQL 14.4 ERROR: out of memory issues

2022-07-14 Thread Aleš Zelený
On Thu, Jul 14, 2022 at 23:11, Tomas Vondra
<tomas.von...@enterprisedb.com> wrote:

>
>
> On 7/14/22 21:25, Aleš Zelený wrote:
> >
> > On Wed, Jul 13, 2022 at 2:20, Michael Paquier <mich...@paquier.xyz> wrote:
> >
> > On Mon, Jul 11, 2022 at 10:50:23AM +0200, Aleš Zelený wrote:
> > > So far, it has happened three times (during a single week) from
> > the 14.3 ->
> > > 14.4 upgrade, before 14.4 we haven't suffered from such an issue.
> > >
> > > Questions:
> > > 1)  Can we safely downgrade from 14.4 to 14.3 by shutting down the
> > instance
> > > and reinstalling 14.3 PG packages (to prove, that the issue
> > disappear)?
> > > 2) What is the best way to diagnose what is the root cause?
> >
> > Hmm.  14.4 has nothing in its release notes that would point to a
> > change in the vacuum or autovacuum's code paths:
> > https://www.postgresql.org/docs/14/release-14-4.html#id-1.11.6.5.4
> >
> > There is nothing specific after a look at the changes as of, and I am
> > not grabbing anything that would imply a change in memory context
> > handling either:
> > `git log --stat REL_14_3..REL_14_4`
> > `git diff REL_14_3..REL_14_4 -- *.c`
> >
> > Saying that, you should be able to downgrade safely as there are no
> > changes in WAL format or such that would break things.  Saying that,
> > the corruption issue caused by CONCURRENTLY is something you'd still
> > have to face.
> >
> >
> > Thanks, good to know that, we can use it for a test case, since we
> > already hit the CONCURRENTLY bug on 14.3.
> >
> > > 2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app=
> > ERROR:  out of
> > > memory
> > > 2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app=
> > DETAIL:  Failed
> > > on request of size 152094068 in memory context
> > "TopTransactionContext".
> > > 2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app=
> CONTEXT:
> > >  automatic vacuum of table "prematch.replication.tab_queue_tmp"
> >
> > This is the interesting part.  Do you happen to use logical
> > replication in a custom C++ plugin?
> >
> >
> > We are using logical replication to other instances (pg_output) and
> > decoderbufs
> > https://github.com/debezium/postgres-decoderbufs for other
> > applications.
> >
>
> This is probably just a red herring - std::bad_alloc is what the process
> that runs into the overcommit limit gets. But the real issue (e.g.
> memory leak) is likely somewhere else - different part of the code,
> different process ...
>
> > ...
> >
> > Checking the RssAnon from proc/pid/status I've found some points where
> > RssAnon memory usage grew very steep for a minute, but no "suspicious"
> > queries/arguments were found in the instance logfile.
> >
> > Any hint, on how to get the root cause would be appreciated since so far
> > I've failed to isolate the issue reproducible testcase.
> > At least I hope that looking for the RssAnon process memory is an
> > appropriate metric, if not, let me know and I'll try to update the
> > monitoring to get the root cause.
> >
> > I can imagine a workaround with client application regular reconnect...,
> > but up to 14.3 it works, so I'd like to fix the issue either on our
> > application side or at PG side if it is a PG problem.
> >
>
> I think it'd be interesting to get memory context stats from the
> processes consuming a lot of memory. If you know which processes are
> suspect (and it seems you know, because if a reconnect helps it's the
> backend handling the connection), you can attach a debugger and do
>
>$ gdb -p $PID
>call MemoryContextStats(TopMemoryContext)
>
> which will log info about memory contexts, just like autovacuum.
> Hopefully that tells us memory context is bloated, and that might point
> us to particular part of the code.
>

If the RssAnon memory is a good indicator, I can then determine the backends
and dump their memory contexts.
It'll take me some time since I'm out of the office on vacation, but I'll
manage it somehow.

Thanks to all for the hints!

Aleš

>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: PostgreSQL 14.4 ERROR: out of memory issues

2022-07-14 Thread Tomas Vondra



On 7/14/22 21:25, Aleš Zelený wrote:
> 
> On Wed, Jul 13, 2022 at 2:20, Michael Paquier <mich...@paquier.xyz> wrote:
> 
> On Mon, Jul 11, 2022 at 10:50:23AM +0200, Aleš Zelený wrote:
> > So far, it has happened three times (during a single week) from
> the 14.3 ->
> > 14.4 upgrade, before 14.4 we haven't suffered from such an issue.
> >
> > Questions:
> > 1)  Can we safely downgrade from 14.4 to 14.3 by shutting down the
> instance
> > and reinstalling 14.3 PG packages (to prove, that the issue
> disappear)?
> > 2) What is the best way to diagnose what is the root cause?
> 
> Hmm.  14.4 has nothing in its release notes that would point to a
> change in the vacuum or autovacuum's code paths:
> https://www.postgresql.org/docs/14/release-14-4.html#id-1.11.6.5.4
> 
> There is nothing specific after a look at the changes as of, and I am
> not grabbing anything that would imply a change in memory context
> handling either:
> `git log --stat REL_14_3..REL_14_4`
> `git diff REL_14_3..REL_14_4 -- *.c`
> 
> Saying that, you should be able to downgrade safely as there are no
> changes in WAL format or such that would break things.  Saying that,
> the corruption issue caused by CONCURRENTLY is something you'd still
> have to face.
> 
> 
> Thanks, good to know that, we can use it for a test case, since we
> already hit the CONCURRENTLY bug on 14.3.
> 
> > 2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app=
> ERROR:  out of
> > memory
> > 2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app=
> DETAIL:  Failed
> > on request of size 152094068 in memory context
> "TopTransactionContext".
> > 2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:
> >  automatic vacuum of table "prematch.replication.tab_queue_tmp"
> 
> This is the interesting part.  Do you happen to use logical
> replication in a custom C++ plugin?
> 
> 
> We are using logical replication to other instances (pg_output) and
> decoderbufs
> https://github.com/debezium/postgres-decoderbufs
> <https://github.com/debezium/postgres-decoderbufs> for other applications.
>  

This is probably just a red herring - std::bad_alloc is what the process
that runs into the overcommit limit gets. But the real issue (e.g.
memory leak) is likely somewhere else - different part of the code,
different process ...

> ...
> 
> Checking the RssAnon from proc/pid/status I've found some points where
> RssAnon memory usage grew very steep for a minute, but no "suspicious"
> queries/arguments were found in the instance logfile.
>  
> Any hint, on how to get the root cause would be appreciated since so far
> I've failed to isolate the issue reproducible testcase.
> At least I hope that looking for the RssAnon process memory is an
> appropriate metric, if not, let me know and I'll try to update the
> monitoring to get the root cause.
> 
> I can imagine a workaround with client application regular reconnect...,
> but up to 14.3 it works, so I'd like to fix the issue either on our
> application side or at PG side if it is a PG problem.
> 

I think it'd be interesting to get memory context stats from the
processes consuming a lot of memory. If you know which processes are
suspect (and it seems you know, because if a reconnect helps it's the
backend handling the connection), you can attach a debugger and do

   $ gdb -p $PID
   call MemoryContextStats(TopMemoryContext)

which will log info about memory contexts, just like autovacuum.
Hopefully that tells us which memory context is bloated, and that might point
us to a particular part of the code.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: PostgreSQL 14.4 ERROR: out of memory issues

2022-07-14 Thread Pavel Stehule
On Thu, Jul 14, 2022 at 21:26, Aleš Zelený  wrote:

>
> On Wed, Jul 13, 2022 at 2:20, Michael Paquier  wrote:
>
>> On Mon, Jul 11, 2022 at 10:50:23AM +0200, Aleš Zelený wrote:
>> > So far, it has happened three times (during a single week) from the
>> 14.3 ->
>> > 14.4 upgrade, before 14.4 we haven't suffered from such an issue.
>> >
>> > Questions:
>> > 1)  Can we safely downgrade from 14.4 to 14.3 by shutting down the
>> instance
>> > and reinstalling 14.3 PG packages (to prove, that the issue disappear)?
>> > 2) What is the best way to diagnose what is the root cause?
>>
>> Hmm.  14.4 has nothing in its release notes that would point to a
>> change in the vacuum or autovacuum's code paths:
>> https://www.postgresql.org/docs/14/release-14-4.html#id-1.11.6.5.4
>>
>> There is nothing specific after a look at the changes as of, and I am
>> not grabbing anything that would imply a change in memory context
>> handling either:
>> `git log --stat REL_14_3..REL_14_4`
>> `git diff REL_14_3..REL_14_4 -- *.c`
>>
>> Saying that, you should be able to downgrade safely as there are no
>> changes in WAL format or such that would break things.  Saying that,
>> the corruption issue caused by CONCURRENTLY is something you'd still
>> have to face.
>>
>>
> Thanks, good to know that, we can use it for a test case, since we already
> hit the CONCURRENTLY bug on 14.3.
>
> > 2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR:  out
>> of
>> > memory
>> > 2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL:
>> Failed
>> > on request of size 152094068 in memory context "TopTransactionContext".
>> > 2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:
>> >  automatic vacuum of table "prematch.replication.tab_queue_tmp"
>>
>> This is the interesting part.  Do you happen to use logical
>> replication in a custom C++ plugin?
>>
>
> We are using logical replication to other instances (pg_output) and
> decoderbufs
> https://github.com/debezium/postgres-decoderbufs for other applications.
>
>
>> > 2022-07-02 14:48:47 CEST [4476]: [43-1] user=,db=,host=,app= LOG:  could
>> > not fork worker process: Cannot allocate memory
>> > terminate called after throwing an instance of 'std::bad_alloc'
>> >   what():  std::bad_alloc
>> >
>> > DETAIL: parameters: $1 = '1', $2 = '1748010445', $3 = '0', $4 = '1000'
>> > terminate
>> > called after throwing an instance of 'std::bad_alloc' terminate called
>> > after throwing an instance of 'std::bad_alloc' what(): what():
>> > std::bad_allocstd::bad_alloc 2022-07-08 14:54:23 CEST [4476]: [49-1]
>> > user=,db=,host=,app= LOG: background worker "parallel worker" (PID
>> 25251)
>> > was terminated by signal 6: Aborted
>> > 2022-07-08 14:54:23 CEST [4476]: [51-1] user=,db=,host=,app= LOG:
>> >  terminating any other active server processes
>>
>> Looks like something is going wrong in the memory handling of one of
>> your C++ extensions here.  If you can isolate an issue using a query
>> without any custom code, that would be a Postgres problem, but I think
>> that you are missing a trick in it.
>>
>
> Here are extensions installed in the database served by the cluster:
> prematch=# \dx
> List of installed extensions
> Name| Version |   Schema   |
>  Description
>
> +-++
>  amcheck| 1.3 | public | functions for verifying
> relation integrity
>  dblink | 1.2 | public | connect to other PostgreSQL
> databases from within a database
>  file_fdw   | 1.0 | public | foreign-data wrapper for flat
> file access
>  hstore | 1.8 | public | data type for storing sets of
> (key, value) pairs
>  hypopg | 1.3.1   | public | Hypothetical indexes for
> PostgreSQL
>  pageinspect| 1.9 | public | inspect the contents of
> database pages at a low level
>  pg_buffercache | 1.3 | public | examine the shared buffer
> cache
>  pg_stat_kcache | 2.2.0   | public | Kernel statistics gathering
>  pg_stat_statements | 1.9 | public | track planning and execution
> statistics of all SQL statements executed
>  pgcrypto   | 1.3 | public | cryptographic functions
>  pgstattuple

Re: PostgreSQL 14.4 ERROR: out of memory issues

2022-07-14 Thread Aleš Zelený
On Wed, Jul 13, 2022 at 2:20, Michael Paquier  wrote:

> On Mon, Jul 11, 2022 at 10:50:23AM +0200, Aleš Zelený wrote:
> > So far, it has happened three times (during a single week) from the 14.3
> ->
> > 14.4 upgrade, before 14.4 we haven't suffered from such an issue.
> >
> > Questions:
> > 1)  Can we safely downgrade from 14.4 to 14.3 by shutting down the
> instance
> > and reinstalling 14.3 PG packages (to prove, that the issue disappear)?
> > 2) What is the best way to diagnose what is the root cause?
>
> Hmm.  14.4 has nothing in its release notes that would point to a
> change in the vacuum or autovacuum's code paths:
> https://www.postgresql.org/docs/14/release-14-4.html#id-1.11.6.5.4
>
> There is nothing specific after a look at the changes as of, and I am
> not grabbing anything that would imply a change in memory context
> handling either:
> `git log --stat REL_14_3..REL_14_4`
> `git diff REL_14_3..REL_14_4 -- *.c`
>
> Saying that, you should be able to downgrade safely as there are no
> changes in WAL format or such that would break things.  Saying that,
> the corruption issue caused by CONCURRENTLY is something you'd still
> have to face.
>
>
Thanks, good to know that, we can use it for a test case, since we already
hit the CONCURRENTLY bug on 14.3.

> 2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR:  out of
> > memory
> > 2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL:
> Failed
> > on request of size 152094068 in memory context "TopTransactionContext".
> > 2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:
> >  automatic vacuum of table "prematch.replication.tab_queue_tmp"
>
> This is the interesting part.  Do you happen to use logical
> replication in a custom C++ plugin?
>

We are using logical replication to other instances (pg_output) and
decoderbufs
https://github.com/debezium/postgres-decoderbufs for other applications.


> > 2022-07-02 14:48:47 CEST [4476]: [43-1] user=,db=,host=,app= LOG:  could
> > not fork worker process: Cannot allocate memory
> > terminate called after throwing an instance of 'std::bad_alloc'
> >   what():  std::bad_alloc
> >
> > DETAIL: parameters: $1 = '1', $2 = '1748010445', $3 = '0', $4 = '1000'
> > terminate
> > called after throwing an instance of 'std::bad_alloc' terminate called
> > after throwing an instance of 'std::bad_alloc' what(): what():
> > std::bad_allocstd::bad_alloc 2022-07-08 14:54:23 CEST [4476]: [49-1]
> > user=,db=,host=,app= LOG: background worker "parallel worker" (PID 25251)
> > was terminated by signal 6: Aborted
> > 2022-07-08 14:54:23 CEST [4476]: [51-1] user=,db=,host=,app= LOG:
> >  terminating any other active server processes
>
> Looks like something is going wrong in the memory handling of one of
> your C++ extensions here.  If you can isolate an issue using a query
> without any custom code, that would be a Postgres problem, but I think
> that you are missing a trick in it.
>

Here are extensions installed in the database served by the cluster:
prematch=# \dx
                                List of installed extensions
        Name        | Version |   Schema   |                         Description
--------------------+---------+------------+------------------------------------------------------------------------
 amcheck            | 1.3     | public     | functions for verifying relation integrity
 dblink             | 1.2     | public     | connect to other PostgreSQL databases from within a database
 file_fdw           | 1.0     | public     | foreign-data wrapper for flat file access
 hstore             | 1.8     | public     | data type for storing sets of (key, value) pairs
 hypopg             | 1.3.1   | public     | Hypothetical indexes for PostgreSQL
 pageinspect        | 1.9     | public     | inspect the contents of database pages at a low level
 pg_buffercache     | 1.3     | public     | examine the shared buffer cache
 pg_stat_kcache     | 2.2.0   | public     | Kernel statistics gathering
 pg_stat_statements | 1.9     | public     | track planning and execution statistics of all SQL statements executed
 pgcrypto           | 1.3     | public     | cryptographic functions
 pgstattuple        | 1.5     | public     | show tuple-level statistics
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
 plpgsql_check      | 2.1     | public     | extended check for plpgsql functions
 postgres_fdw       | 1.1     | public     | foreign-data wrapper for remote PostgreSQL servers
 tablefunc          | 1.0     | public     | functions that manipulate whole tables, including crosstab
(15 rows)

When we started experiencing these issues, based on Prometheus
node-exporte

Re: PostgreSQL 14.4 ERROR: out of memory issues

2022-07-12 Thread Michael Paquier
On Mon, Jul 11, 2022 at 10:50:23AM +0200, Aleš Zelený wrote:
> So far, it has happened three times (during a single week) from the 14.3 ->
> 14.4 upgrade, before 14.4 we haven't suffered from such an issue.
> 
> Questions:
> 1)  Can we safely downgrade from 14.4 to 14.3 by shutting down the instance
> and reinstalling 14.3 PG packages (to prove, that the issue disappear)?
> 2) What is the best way to diagnose what is the root cause?

Hmm.  14.4 has nothing in its release notes that would point to a
change in the vacuum or autovacuum's code paths:
https://www.postgresql.org/docs/14/release-14-4.html#id-1.11.6.5.4

There is nothing specific after a look at the changes, and I am not
seeing anything that would imply a change in memory context handling
either:
`git log --stat REL_14_3..REL_14_4`
`git diff REL_14_3..REL_14_4 -- *.c`

That said, you should be able to downgrade safely as there are no
changes in WAL format or the like that would break things.  However,
the corruption issue caused by CONCURRENTLY is something you'd still
have to face.

> 2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR:  out of
> memory
> 2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL:  Failed
> on request of size 152094068 in memory context "TopTransactionContext".
> 2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:
>  automatic vacuum of table "prematch.replication.tab_queue_tmp"

This is the interesting part.  Do you happen to use logical
replication in a custom C++ plugin?

> 2022-07-02 14:48:47 CEST [4476]: [43-1] user=,db=,host=,app= LOG:  could
> not fork worker process: Cannot allocate memory
> terminate called after throwing an instance of 'std::bad_alloc'
>   what():  std::bad_alloc
>
> DETAIL: parameters: $1 = '1', $2 = '1748010445', $3 = '0', $4 = '1000'
> terminate
> called after throwing an instance of 'std::bad_alloc' terminate called
> after throwing an instance of 'std::bad_alloc' what(): what():
> std::bad_allocstd::bad_alloc 2022-07-08 14:54:23 CEST [4476]: [49-1]
> user=,db=,host=,app= LOG: background worker "parallel worker" (PID 25251)
> was terminated by signal 6: Aborted
> 2022-07-08 14:54:23 CEST [4476]: [51-1] user=,db=,host=,app= LOG:
>  terminating any other active server processes

Looks like something is going wrong in the memory handling of one of
your C++ extensions here.  If you can isolate an issue using a query
without any custom code, that would be a Postgres problem, but I think
that you are missing a trick in it.
--
Michael


signature.asc
Description: PGP signature


Re: PostgreSQL 14.4 ERROR: out of memory issues

2022-07-11 Thread Aleš Zelený
Hello,

On Mon, Jul 11, 2022 at 14:47 Pavel Stehule
wrote:

> Hi
>
> It looks like a memory leak
>

As we are collecting some data for the Postgres processes, I've found processes
with large RssAnon (anonymous) memory allocations:

Some examples:
2022-07-08T14:50:24+0200 Name: postmaster Pid: 31522 RssAnon:  742600 kB
2022-07-08T14:51:13+0200 Name: postmaster Pid: 31522 RssAnon:  742600 kB
2022-07-08T14:52:05+0200 Name: postmaster Pid: 31522 RssAnon:  742852 kB
2022-07-08T14:52:58+0200 Name: postmaster Pid: 31522 RssAnon:  743112 kB
2022-07-08T14:53:50+0200 Name: postmaster Pid: 31522 RssAnon:  743112 kB
2022-07-08T14:50:24+0200 Name: postmaster Pid: 31647 RssAnon:  684632 kB
2022-07-08T14:51:13+0200 Name: postmaster Pid: 31647 RssAnon:  684632 kB
2022-07-08T14:52:05+0200 Name: postmaster Pid: 31647 RssAnon:  684676 kB
2022-07-08T14:52:58+0200 Name: postmaster Pid: 31647 RssAnon:  684700 kB
2022-07-08T14:53:50+0200 Name: postmaster Pid: 31647 RssAnon:  684824 kB
2022-07-08T14:50:27+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:51:16+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:52:08+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:53:01+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:53:53+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB

Other processes use less memory:
2022-07-08T14:52:08+0200 Name: postmaster Pid: 3869 RssAnon:3256 kB
2022-07-08T14:53:02+0200 Name: postmaster Pid: 3869 RssAnon:3256 kB
2022-07-08T14:53:54+0200 Name: postmaster Pid: 3869 RssAnon:3256 kB
2022-07-08T14:50:27+0200 Name: postmaster Pid: 4217 RssAnon:2532 kB
2022-07-08T14:51:16+0200 Name: postmaster Pid: 4217 RssAnon:2532 kB



And Shared memory:
2022-07-08T14:59:12+0200 Name: postmaster Pid: 5719 RssShmem:  908264 kB
2022-07-08T14:59:20+0200 Name: postmaster Pid: 5719 RssShmem:  908264 kB
2022-07-08T14:59:28+0200 Name: postmaster Pid: 5719 RssShmem:  908264 kB
2022-07-08T14:59:37+0200 Name: postmaster Pid: 5719 RssShmem:  908264 kB
2022-07-08T14:59:45+0200 Name: postmaster Pid: 5719 RssShmem:  908264 kB
2022-07-08T14:59:53+0200 Name: postmaster Pid: 5719 RssShmem:  908264 kB
2022-07-08T14:50:27+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:51:16+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:52:08+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:53:01+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:53:53+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB

while other processes use fairly little shared memory:

2022-07-08T14:55:25+0200 Name: postmaster Pid: 8521 RssShmem:1988 kB
2022-07-08T14:55:33+0200 Name: postmaster Pid: 8521 RssShmem:1988 kB
2022-07-08T14:55:40+0200 Name: postmaster Pid: 8521 RssShmem:2104 kB
2022-07-08T14:55:49+0200 Name: postmaster Pid: 8521 RssShmem:2104 kB
2022-07-08T14:55:57+0200 Name: postmaster Pid: 8521 RssShmem:2104 kB
2022-07-08T14:56:06+0200 Name: postmaster Pid: 8521 RssShmem:2104 kB
2022-07-08T14:56:15+0200 Name: postmaster Pid: 8521 RssShmem:2104 kB

Which processes are better candidates for investigation, the ones using more
anonymous memory or the ones using more shared memory?
I can do the (tedious) work to get a list of statements that a given PID
executed (at least the statements that appear in the minute snapshots).
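
A minimal sketch of how those heavy PIDs could be mapped back to what they are
running right now (pg_stat_activity only shows the current statement, not
history; the PIDs below are simply the high-RssAnon backends from the snapshots
above):

  SELECT pid, usename, application_name, state, backend_start, query_start,
         left(query, 200) AS current_query
  FROM pg_stat_activity
  WHERE pid IN (31522, 31647, 7866)   -- high-RssAnon PIDs from the snapshots
  ORDER BY pid;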


>   ErrorContext: 8192 total in 1 blocks; 7928 free (5 chunks); 264 used
>> Grand total: 1456224 bytes in 195 blocks; 378824 free (165 chunks);
>> 1077400 used
>> 2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR:  out
>> of memory
>> 2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL:
>>  Failed on request of size 152094068 in memory context
>> "TopTransactionContext".
>> 2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:
>>  automatic vacuum of table "prematch.replication.tab_queue_tmp"
>>
>> 2022-07-02 14:48:47 CEST [4476]: [42-1] user=,db=,host=,app= LOG:  could
>> not fork worker process: Cannot allocate memory
>> 2022-07-02 14:48:47 CEST [4476]: [43-1] user=,db=,host=,app= LOG:  could
>> not fork worker process: Cannot allocate memory
>> terminate called after throwing an instance of 'std::bad_alloc'
>>   what():  std::bad_alloc
>>
> 2022-07-02 14:48:47 CEST [4476]: [44-1] user=,db=,host=,app= LOG:
>>  background worker "parallel worker" (PID 4303) was terminated by signal 6:
>> Aborted
>>
>>
> But what you see is probably just a side effect. Do you use some extension
> written in C++? This is a C++ error message. This process kills Postgres.
>

These extensions are installed (and no change from 14.3):

                List of installed extensions
        Name        | Version |   Schema   |          Description
--------------------+---------+------------+----------------------

Re: PostgreSQL 14.4 ERROR: out of memory issues

2022-07-11 Thread Pavel Stehule
Hi

It looks like a memory leak

  ErrorContext: 8192 total in 1 blocks; 7928 free (5 chunks); 264 used
> Grand total: 1456224 bytes in 195 blocks; 378824 free (165 chunks);
> 1077400 used
> 2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR:  out of
> memory
> 2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL:
>  Failed on request of size 152094068 in memory context
> "TopTransactionContext".
> 2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:
>  automatic vacuum of table "prematch.replication.tab_queue_tmp"
>
> 2022-07-02 14:48:47 CEST [4476]: [42-1] user=,db=,host=,app= LOG:  could
> not fork worker process: Cannot allocate memory
> 2022-07-02 14:48:47 CEST [4476]: [43-1] user=,db=,host=,app= LOG:  could
> not fork worker process: Cannot allocate memory
> terminate called after throwing an instance of 'std::bad_alloc'
>   what():  std::bad_alloc
>
2022-07-02 14:48:47 CEST [4476]: [44-1] user=,db=,host=,app= LOG:
>  background worker "parallel worker" (PID 4303) was terminated by signal 6:
> Aborted
>
>
But what you see is probably just a side effect. Do you use some extension
written in C++? This is a C++ error message. This process kills Postgres.

It can be related to some operation on prematch.replication.tab_queue_tmp,
because in both cases an autovacuum was started.
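
As a sketch of how to see how often autovacuum hits that table (assuming the
query is run in the prematch database; these columns exist in
pg_stat_all_tables on 14):

  SELECT schemaname, relname, last_autovacuum, autovacuum_count,
         n_live_tup, n_dead_tup
  FROM pg_stat_all_tables
  WHERE schemaname = 'replication' AND relname = 'tab_queue_tmp';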


>
>
> hba parser context: 25600 total in 6 blocks; 4464 free (8 chunks);
> 21136 used
>   ErrorContext: 8192 total in 1 blocks; 7928 free (5 chunks); 264 used
> Grand total: 1456224 bytes in 195 blocks; 378824 free (165 chunks);
> 1077400 used
> 2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR:  out
> of memory
> 2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL:
>  Failed on request of size 152094068 in memory context
> "TopTransactionContext".
> 2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:
>  automatic vacuum of table "prematch.replication.tab_queue_tmp"
>

I am not sure, but it looks like a corrupted table,
prematch.replication.tab_queue_tmp.
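
Since amcheck 1.3 is already installed, here is a rough sketch of how that table
could be checked for corruption (assuming it is run in the prematch database and
that all indexes on the table are btrees):

  -- heap check (verify_heapam is available in amcheck 1.3 on PostgreSQL 14)
  SELECT * FROM verify_heapam('replication.tab_queue_tmp');

  -- btree index checks, including the heap-vs-index comparison
  SELECT indexrelid::regclass AS index, bt_index_check(indexrelid, true)
  FROM pg_index
  WHERE indrelid = 'replication.tab_queue_tmp'::regclass;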


> And a few days later again:
> 2022-07-08 14:54:22 CEST [4791]: [81186-1]
> user=app_evaluation,db=prematch,host=172.25.0.80,app=Evaluation_STS_UAT
> DETAIL: parameters: $1 = '1', $2 = '1748010445', $3 = '0', $4 = '1000' 
> terminate
> called after throwing an instance of 'std::bad_alloc' terminate called
> after throwing an instance of 'std::bad_alloc' what(): what():
> std::bad_allocstd::bad_alloc 2022-07-08 14:54:23 CEST [4476]: [49-1]
> user=,db=,host=,app= LOG: background worker "parallel worker" (PID 25251)
> was terminated by signal 6: Aborted
> 2022-07-08 14:54:23 CEST [4476]: [51-1] user=,db=,host=,app= LOG:
>  terminating any other active server processes
>
>
> Thanks Ales Zeleny
>


Re: PostgreSQL 14.4 ERROR: out of memory issues

2022-07-11 Thread Rob Sargent



> On Jul 11, 2022, at 2:50 AM, Aleš Zelený  wrote:
> 
> 
> Hello,
> 
> we are experiencing out-of-memory issues after the Postgres upgrade from 14.3 to 
> 14.4 on CentOS7.
> 
> OS: CentOS7
> 
> No OOM killer messages in the syslog. SWAP is disabled.
> 
> -bash-4.2$ sysctl -q vm.overcommit_memory
> vm.overcommit_memory = 2
> 
> [root@sts-uat-pgsql100 ~]# swapon -vs
> [root@sts-uat-pgsql100 ~]# 
> 
> [root@sts-uat-pgsql100 ~]# sysctl -q vm.overcommit_ratio
> vm.overcommit_ratio = 95
> 
> 
> -bash-4.2$ uname -r
> 3.10.0-1160.42.2.el7.x86_64
> 
> -bash-4.2$ free -h
>               total        used        free      shared  buff/cache   available
> Mem:            39G        7,8G        220M         10G         31G         20G
> Swap:            0B          0B          0B
> 
> 
> 
> postgres=# show shared_buffers ;
>  shared_buffers
> 
>  10GB
> (1 row)
> 
> postgres=# show  work_mem ;
>  work_mem
> --
>  4MB
> (1 row)
> 
> postgres=# show maintenance_work_mem ;
>  maintenance_work_mem
> --
>  512MB
> (1 row)
> 
> postgres=# show max_parallel_workers;
>  max_parallel_workers
> --
>  8
> (1 row)
> 
> postgres=# show max_parallel_maintenance_workers ;
>  max_parallel_maintenance_workers
> --
>  2
> (1 row)
> 
> postgres=# select count(*) from pg_stat_activity ;
>  count
> ---
> 93
> (1 row)
> 
> postgres=# show max_connections ;
>  max_connections
> -
>  1000
> (1 row)
> 
> No changes were made on the application side before the out-of-memory 
> errors happened.
> 
> 
> 
> So far, it has happened three times (during a single week) from the 14.3 -> 
> 14.4 upgrade, before 14.4 we haven't suffered from such an issue.
> 
> Questions:
> 1)  Can we safely downgrade from 14.4 to 14.3 by shutting down the instance 
> and reinstalling 14.3 PG packages (to prove, that the issue disappear)?

> 2) What is the best way to diagnose what is the root cause?
You can turn on sql logging to see what query is causing the crash. 
Has any new code added anything with 
   a.id!= b.id
on any large table?
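
A sketch of what "turn on sql logging" could look like (log_statement = 'all' is
very verbose; log_min_duration_statement = 0 is a common alternative):

  ALTER SYSTEM SET log_statement = 'all';
  SELECT pg_reload_conf();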




> What we have started to collect:

When you think you see it happening you can query the
System tables for currently running queries. pg_stat*


> - vmstat -nwt (per second)
> - vmstat -nawt (per





PostgreSQL 14.4 ERROR: out of memory issues

2022-07-11 Thread Aleš Zelený
Hello,

we are experiencing out-of-memory issues after the Postgres upgrade from 14.3
to 14.4 on CentOS7.

OS: CentOS7

No OOM killer messages in the syslog. SWAP is disabled.

-bash-4.2$ sysctl -q vm.overcommit_memory
vm.overcommit_memory = 2

[root@sts-uat-pgsql100 ~]# swapon -vs
[root@sts-uat-pgsql100 ~]#

[root@sts-uat-pgsql100 ~]# sysctl -q vm.overcommit_ratio
vm.overcommit_ratio = 95


-bash-4.2$ uname -r
3.10.0-1160.42.2.el7.x86_64

-bash-4.2$ free -h
              total        used        free      shared  buff/cache   available
Mem:            39G        7,8G        220M         10G         31G         20G
Swap:            0B          0B          0B



postgres=# show shared_buffers ;
 shared_buffers

 10GB
(1 row)

postgres=# show  work_mem ;
 work_mem
--
 4MB
(1 row)

postgres=# show maintenance_work_mem ;
 maintenance_work_mem
--
 512MB
(1 row)

postgres=# show max_parallel_workers;
 max_parallel_workers
--
 8
(1 row)

postgres=# show max_parallel_maintenance_workers ;
 max_parallel_maintenance_workers
--
 2
(1 row)

postgres=# select count(*) from pg_stat_activity ;
 count
---
93
(1 row)

postgres=# show max_connections ;
 max_connections
-
 1000
(1 row)

No changes were made on the application side before the out-of-memory
errors happened.



So far, it has happened three times (during a single week) from the 14.3 ->
14.4 upgrade, before 14.4 we haven't suffered from such an issue.

Questions:
1)  Can we safely downgrade from 14.4 to 14.3 by shutting down the instance
and reinstalling 14.3 PG packages (to prove, that the issue disappear)?
2) What is the best way to diagnose what is the root cause?

What we have started to collect:
- vmstat -nwt (per second)
- vmstat -nawt (per second)

~once a minute:
df for /dev/shm
/proc/meminfo
/proc/<pid>/stat for all PG processes
/proc/<pid>/cmdline for all PG processes
/proc/<pid>/status for all PG processes
/proc/<pid>/smaps for all PG processes
list from pg_stat_activity
pmap -x for all PG processes
ps aux listing
smem -u --abbreviate
smem -m --abbreviate
smem -w -k

Except the "pmpap" all of the metrics were already collected before the
last out of meory issue.


Error messages:
--
postgresql-Fri.log:2022-07-01 06:49:53 CEST [4476]: [20-1]
user=,db=,host=,app= LOG:  could not fork worker process: Cannot allocate
memory
postgresql-Fri.log:2022-07-01 06:49:53 CEST [4476]: [21-1]
user=,db=,host=,app= LOG:  could not fork worker process: Cannot allocate
memory
postgresql-Fri.log:2022-07-01 06:50:02 CEST [4476]: [22-1]
user=,db=,host=,app= LOG:  could not fork new process for connection:
Cannot allocate memory
postgresql-Fri.log:2022-07-01 06:50:02 CEST [4476]: [23-1]
user=,db=,host=,app= LOG:  could not fork new process for connection:
Cannot allocate memory
postgresql-Fri.log:2022-07-01 06:50:02 CEST [4476]: [24-1]
user=,db=,host=,app= LOG:  could not fork new process for connection:
Cannot allocate memory
...
2022-07-02 14:48:07 CEST [3930]: [2-1] user=,db=,host=,app= LOG:  automatic
vacuum of table "prematch.monitoring.tab_replication_clock": index scans: 0
pages: 0 removed, 36 remain, 0 skipped due to pins, 34 skipped
frozen
tuples: 0 removed, 203 remain, 201 are dead but not yet removable,
oldest xmin: 269822444
index scan not needed: 0 pages from table (0.00% of total) had 0
dead item identifiers removed
I/O timings: read: 0.000 ms, write: 0.000 ms
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 67 hits, 0 misses, 0 dirtied
WAL usage: 1 records, 0 full page images, 245 bytes
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
TopMemoryContext: 279360 total in 10 blocks; 15384 free (9 chunks); 263976
used
  TopTransactionContext: 8192 total in 1 blocks; 7280 free (12 chunks); 912
used
  TOAST to main relid map: 65536 total in 4 blocks; 35168 free (15 chunks);
30368 used
  AV worker: 16384 total in 2 blocks; 10080 free (10 chunks); 6304 used
Autovacuum Portal: 8192 total in 1 blocks; 7720 free (0 chunks); 472
used
  Vacuum: 8192 total in 1 blocks; 7768 free (0 chunks); 424 used
  Operator class cache: 8192 total in 1 blocks; 552 free (0 chunks); 7640
used
  smgr relation table: 16384 total in 2 blocks; 4592 free (2 chunks); 11792
used
  pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1448
free (0 chunks); 6744 used
  TransactionAbortContext: 32768 total in 1 blocks; 32504 free (0 chunks);
264 used
  Portal hash: 8192 total in 1 blocks; 552 free (0 chunks); 7640 used
  TopPortalContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
  Relcache by OID: 16384 total in 2 blocks; 3504 free (2 chunks); 12880 used
  CacheMemoryContext: 524288 total in 7 blocks; 123552 free (4 chunks);
400736 used
index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used:
tab_queue_tmp_pkey
index info: 2048 tota

RE: Upgrade from 11.3 to 13.1 failed with out of memory

2021-04-07 Thread Mihalidesová Jana

Hi,

Thanks a lot for information.

Best regards,
 Jana

-Original Message-
From: Magnus Hagander  
Sent: Tuesday, April 6, 2021 3:23 PM
To: Mihalidesová Jana 
Cc: pgsql-general@lists.postgresql.org
Subject: Re: Upgrade from 11.3 to 13.1 failed with out of memory

On Tue, Apr 6, 2021 at 3:08 PM Mihalidesová Jana  
wrote:
>
>
>
> Hi,
>
>
>
> I have an approximately 560 GB database and am trying to upgrade it from 11.3
> to 13.1. I've successfully upgraded the dev, test and ref environments, but on
> production pg_dump failed with out of memory. Of course, the dev, test and ref
> databases are much smaller than the production database.
>
> We are using the OID data type, so there are a lot of large objects;
> pg_largeobject is 59 GB.
>
> The upgrade process fails during the schema-only pg_dump, so I'm confused why
> the 35 GB of RAM that is free on the server is not enough when no table data is
> dumped. When I tried to run the same pg_dump command by hand as during the
> upgrade, it fails at the line "pg_dump: reading large objects".
>
>
>
> Creating dump of global objects 
> "/pgsql/bin/13.1_/bin/pg_dumpall" --host /pgsql/data/ --port 50432 
> --username XX --globals-only --quote-all-identifiers --binary-upgrade 
> --verbose -f pg_upgrade_dump_globals.sql >> "pg_upgrade_utility.log" 2>&1
>
> ok
>
> Creating dump of database schemas
>
> "/pgsql/bin/13.1_/bin/pg_dump" --host /pgsql/data/ --port 
> 50432 --username XX --schema-only --quote-all-identifiers 
> --binary-upgrade --format=custom --verbose 
> --file="pg_upgrade_dump_16384.custom" 'dbname=' >> 
> "pg_upgrade_dump_16384.log" 2>&1
>
>
>
> *failure*
>
> There were problems executing ""/pgsql/bin/13.1_/bin/pg_dump" --host 
> /pgsql/data/ --port 50432 --username pgpnip --schema-only 
> --quote-all-identifiers --binary-upgrade --format=custom --verbose 
> --file="pg_upgrade_dump_16384.custom" 'dbname=' >> 
> "pg_upgrade_dump_16384.log" 2>&1"
>
>
>
>
>
> Do you have any idea how to upgrade the database? This is my upgrade command:
>
>
>
> /pgsql/bin/13.1_/bin/pg_upgrade -k -b /pgsql/bin/11.3_/bin -B 
> /pgsql/bin/13.1_/bin -d /pgsql/data/ -D 
> /pgsql/data//.new
>


This is unfortunately a known limitation in pg_dump (and therefore by proxy it 
becomes a problem with pg_upgrade) when you have many large objects. It doesn't 
really matter how big they are, it matters how
*many* they are. It takes a long time and uses crazy amounts of memory, but 
that's unfortunately where it's at. You'd have the same problem with a plain 
dump/reload as well, not just the "binary upgrade mode".

There's been some recent work on trying to find a remedy for this, but nothing 
is available at this point. You'll need to either trim the number of objects if 
you can (by maybe manually dumping them out to files before the restore and 
then reloading them back in later), or just add more memory/swap to the machine.

Long term you should probably consider switching to using bytea columns when 
you have that many objects.

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/


Re: Upgrade from 11.3 to 13.1 failed with out of memory

2021-04-06 Thread Magnus Hagander
On Tue, Apr 6, 2021 at 3:08 PM Mihalidesová Jana
 wrote:
>
>
>
> Hi,
>
>
>
> I have an approximately 560 GB database and am trying to upgrade it from 11.3
> to 13.1. I've successfully upgraded the dev, test and ref environments, but on
> production pg_dump failed with out of memory. Of course, the dev, test and ref
> databases are much smaller than the production database.
>
> We are using the OID data type, so there are a lot of large objects;
> pg_largeobject is 59 GB.
>
> The upgrade process fails during the schema-only pg_dump, so I'm confused why
> the 35 GB of RAM that is free on the server is not enough when no table data is
> dumped. When I tried to run the same pg_dump command by hand as during the
> upgrade, it fails at the line "pg_dump: reading large objects".
>
>
>
> Creating dump of global objects 
> "/pgsql/bin/13.1_/bin/pg_dumpall" --host /pgsql/data/ --port 50432 
> --username XX --globals-only --quote-all-identifiers --binary-upgrade 
> --verbose -f pg_upgrade_dump_globals.sql >> "pg_upgrade_utility.log" 2>&1
>
> ok
>
> Creating dump of database schemas
>
> "/pgsql/bin/13.1_/bin/pg_dump" --host /pgsql/data/ --port 50432 
> --username XX --schema-only --quote-all-identifiers --binary-upgrade 
> --format=custom --verbose --file="pg_upgrade_dump_16384.custom" 'dbname=' 
> >> "pg_upgrade_dump_16384.log" 2>&1
>
>
>
> *failure*
>
> There were problems executing ""/pgsql/bin/13.1_/bin/pg_dump" --host 
> /pgsql/data/ --port 50432 --username pgpnip --schema-only 
> --quote-all-identifiers --binary-upgrade --format=custom --verbose 
> --file="pg_upgrade_dump_16384.custom" 'dbname=' >> 
> "pg_upgrade_dump_16384.log" 2>&1"
>
>
>
>
>
> Do you have any idea how to upgrade the database? This is my upgrade command:
>
>
>
> /pgsql/bin/13.1_/bin/pg_upgrade -k -b /pgsql/bin/11.3_/bin -B 
> /pgsql/bin/13.1_/bin -d /pgsql/data/ -D /pgsql/data//.new
>


This is unfortunately a known limitation in pg_dump (and therefore by
proxy it becomes a problem with pg_upgrade) when you have many large
objects. It doesn't really matter how big they are, it matters how
*many* they are. It takes a long time and uses crazy amounts of
memory, but that's unfortunately where it's at. You'd have the same
problem with a plain dump/reload as well, not just the "binary upgrade
mode".

There's been some recent work on trying to find a remedy for this, but
nothing is available at this point. You'll need to either trim the
number of objects if you can (by maybe manually dumping them out to
files before the restore and then reloading them back in later), or
just add more memory/swap to the machine.

Long term you should probably consider switching to using bytea
columns when you have that many objects.
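
A hypothetical sketch of such a migration, assuming a table "docs" that stores
its payload as a large object referenced from an oid column "content_oid" (both
names are made up for illustration):

  ALTER TABLE docs ADD COLUMN content bytea;
  UPDATE docs SET content = lo_get(content_oid);   -- lo_get() returns the LO as bytea
  -- after verifying the copy:
  -- SELECT lo_unlink(content_oid) FROM docs;
  -- ALTER TABLE docs DROP COLUMN content_oid;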

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Upgrade from 11.3 to 13.1 failed with out of memory

2021-04-06 Thread Mihalidesová Jana

Hi,

I have an approximately 560 GB database and am trying to upgrade it from 11.3 to 
13.1. I've successfully upgraded the dev, test and ref environments, but on the 
production pg_dump failed with out of memory. Of course, the dev, test and ref 
databases are much smaller than the production database.
We are using the OID data type, so there are a lot of large objects; 
pg_largeobject is 59 GB.
The upgrade process fails during the schema-only pg_dump, so I'm confused why the 
35 GB of RAM that is free on the server is not enough when no table data is 
dumped. When I tried to run the same pg_dump command by hand as during the 
upgrade, it fails at the line "pg_dump: reading large objects".

Creating dump of global objects 
"/pgsql/bin/13.1_/bin/pg_dumpall" --host /pgsql/data/ --port 50432 
--username XX --globals-only --quote-all-identifiers --binary-upgrade 
--verbose -f pg_upgrade_dump_globals.sql >> "pg_upgrade_utility.log" 2>&1
ok
Creating dump of database schemas
"/pgsql/bin/13.1_/bin/pg_dump" --host /pgsql/data/ --port 50432 
--username XX --schema-only --quote-all-identifiers --binary-upgrade 
--format=custom --verbose --file="pg_upgrade_dump_16384.custom" 'dbname=' 
>> "pg_upgrade_dump_16384.log" 2>&1

*failure*
There were problems executing ""/pgsql/bin/13.1_/bin/pg_dump" --host 
/pgsql/data/ --port 50432 --username pgpnip --schema-only 
--quote-all-identifiers --binary-upgrade --format=custom --verbose 
--file="pg_upgrade_dump_16384.custom" 'dbname=' >> 
"pg_upgrade_dump_16384.log" 2>&1"


Do you have any idea how to upgrade the database? This is my upgrade command:

/pgsql/bin/13.1_/bin/pg_upgrade -k -b /pgsql/bin/11.3_/bin -B 
/pgsql/bin/13.1_/bin -d /pgsql/data/ -D /pgsql/data//.new

Thank you for any ideas.

Best regards,

Jana Mihalidesova
Database Administrator

►CETIN a.s.
Českomoravská 2510/19, 190 00 Praha 9
m: +420 603 419 862t: +420 238 465 074
jana.mihalides...@cetin.cz



Re: Out of memory with "create extension postgis"

2020-08-03 Thread Daniel Westermann (DWE)
>I am aware that the behavior is different from what we've seen last week but 
>this is how it looks today.
>Anything we missed or did not do correctly?

Finally, this can be reproduced quite easily by installing this extension: 
https://de.osdn.net/projects/pgstoreplans/downloads/72297/pg_store_plans12-1.4-1.el7.x86_64.rpm/
As soon as pg_store_plans is in shared_preload_libraries, the postgis extension 
cannot be installed anymore. This is the exact package list:

[root@rhel77 ~]# rpm -qa | egrep "postgres|postgis|store_plan"
postgresql12-server-12.3-5PGDG.rhel7.x86_64
postgresql12-contrib-12.3-5PGDG.rhel7.x86_64
postgresql12-12.3-5PGDG.rhel7.x86_64
postgis30_12-3.0.1-5.rhel7.x86_64
pg_store_plans12-1.4-1.el7.x86_64
postgresql12-libs-12.3-5PGDG.rhel7.x86_64
postgresql12-llvmjit-12.3-5PGDG.rhel7.x86_64

I will open an issue on the extensions' GitHub repo.

Regards
Daniel






Re: Out of memory with "create extension postgis"

2020-08-03 Thread Daniel Westermann (DWE)
>> Here is a new one with bt at the end:

>That's just showing the stack when the backend is idle waiting for input.
>We need to capture the stack at the moment when the "out of memory" error
>is reported (errfinish() should be the top of stack).

Then I don't know what/how to do it. Here is a complete trace of what we did 
today:

-- First session 

$ psql
# select pg_backend_pid();
 pg_backend_pid

  22480
(1 row)


-- Second session

# gdb -p 22480
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 22480
Reading symbols from /usr/pgsql-12/bin/postgres...Reading symbols from 
/usr/lib/debug/usr/pgsql-12/bin/postgres.debug...done.
done.
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols 
found)...done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libxml2.so.2...Reading symbols from 
/lib64/libxml2.so.2...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libxml2.so.2
Reading symbols from /lib64/libpam.so.0...Reading symbols from 
/lib64/libpam.so.0...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libpam.so.0
Reading symbols from /lib64/libssl.so.10...Reading symbols from 
/lib64/libssl.so.10...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libssl.so.10
Reading symbols from /lib64/libcrypto.so.10...Reading symbols from 
/lib64/libcrypto.so.10...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypto.so.10
Reading symbols from /lib64/libgssapi_krb5.so.2...Reading symbols from 
/lib64/libgssapi_krb5.so.2...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libgssapi_krb5.so.2
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libldap-2.4.so.2...Reading symbols from 
/lib64/libldap-2.4.so.2...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libldap-2.4.so.2
Reading symbols from /lib64/libicui18n.so.50...Reading symbols from 
/lib64/libicui18n.so.50...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libicui18n.so.50
Reading symbols from /lib64/libicuuc.so.50...Reading symbols from 
/lib64/libicuuc.so.50...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libicuuc.so.50
Reading symbols from /lib64/libsystemd.so.0...Reading symbols from 
/lib64/libsystemd.so.0...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libsystemd.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols 
found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libz.so.1...Reading symbols from 
/lib64/libz.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libz.so.1
Reading symbols from /lib64/liblzma.so.5...Reading symbols from 
/lib64/liblzma.so.5...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/liblzma.so.5
Reading symbols from /lib64/libaudit.so.1...Reading symbols from 
/lib64/libaudit.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libaudit.so.1
Reading symbols from /lib64/libkrb5.so.3...Reading symbols from 
/lib64/libkrb5.so.3...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libkrb5.so.3
Reading symbols from /lib64/libcom_err.so.2...Reading symbols from 
/lib64/libcom_err.so.2...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libcom_err.so.2
Reading symbols from /lib64/libk5crypto.so.3...Reading symbols from 
/lib64/libk5crypto.so

Re: Out of memory with "create extension postgis"

2020-07-30 Thread Tom Lane
"Daniel Westermann (DWE)"  writes:
>> Umm ... you didn't issue a "bt" when you got to errfinish, so there's
>> no useful info here.

> Here is a new one with bt at the end:

That's just showing the stack when the backend is idle waiting for input.
We need to capture the stack at the moment when the "out of memory" error
is reported (errfinish() should be the top of stack).

>> libraries to have gotten linked into a Postgres backend.  What
>> extensions are you using?

> These are the extensions in use:
>  plpythonu  | 1.0 | pg_catalog | PL/PythonU untrusted procedural 
> language

Hm.  If you've been actively using plpython in this session, perhaps
libpython would have pulled all this other weirdness in with it.

regards, tom lane




Re: Out of memory with "create extension postgis"

2020-07-30 Thread Daniel Westermann (DWE)
From: Tom Lane 
Sent: Wednesday, July 29, 2020 17:05
To: Daniel Westermann (DWE) 
Cc: pgsql-general@lists.postgresql.org 
Subject: Re: Out of memory with "create extension postgis" 
 
"Daniel Westermann (DWE)"  writes:
>> So this is what we got today. In the log file there is this:

>> 2020-07-29 16:33:23 CEST 101995 ERROR:  out of memory
>> 2020-07-29 16:33:23 CEST 101995 DETAIL:  Failed on request of size 8265691 
>> in memory context "PortalContext".
>> 2020-07-29 16:33:23 CEST 101995 STATEMENT:  create extension postgis;

>Is there not a dump of memory context sizes just before the "ERROR: out of
>memory" line?  It should look something like

>TopMemoryContext: 68720 total in 5 blocks; 17040 free (15 chunks); 51680 used
>  MessageContext: 8192 total in 1 blocks; 6880 free (1 chunks); 1312 used
>...
>  ErrorContext: 8192 total in 1 blocks; 7928 free (3 chunks); 264 used
>Grand total: 1063328 bytes in 190 blocks; 312376 free (159 chunks); 750952 used

>(this taken from an idle backend, so numbers from a backend that's hit
>OOM would be a lot larger).  If you don't see that then you must be
>using some logging mechanism that fails to capture the postmaster's
>stderr output, such as syslog.  If your postmaster start script doesn't
>actually send stderr to /dev/null, you might find the context map in some
>other log file.

Thanks for the hint, will check

>Umm ... you didn't issue a "bt" when you got to errfinish, so there's
>no useful info here.

Here is a new one with bt at the end:

Breakpoint 1 at 0x87e210: file elog.c, line 411.
Continuing.

Breakpoint 1, errfinish (dummy=dummy@entry=0) at elog.c:411
411 elog.c: No such file or directory.
Continuing.

Breakpoint 1, errfinish (dummy=dummy@entry=0) at elog.c:411
411 in elog.c
Missing separate debuginfos, use: debuginfo-install CGAL-4.7-1.rhel7.1.x86_64 
boost-date-time-1.53.0-27.el7.x86_64 boost-serialization-1.53.0-27.el7.x86_64 
boost-system-1.53.0-27.el7.x86_64 boost-thread-1.53.0-27.el7.x86_64 
gmp-6.0.0-15.el7.x86_64 jbigkit-libs-2.0-11.el7.x86_64 
json-c-0.11-4.el7_0.x86_64 libcurl-7.29.0-54.el7_7.2.x86_64 
libidn-1.28-4.el7.x86_64 libjpeg-turbo-1.2.90-8.el7.x86_64 
libssh2-1.8.0-3.el7.x86_64 libtiff-4.0.3-32.el7.x86_64 mpfr-3.1.1-4.el7.x86_64 
postgis30_12-3.0.1-5.rhel7.x86_64 protobuf-c-1.0.2-3.el7.x86_64 
sqlite33-libs-3.30.1-1.rhel7.x86_64
Continuing.

Program received signal SIGINT, Interrupt.
0x7f94f4af2e93 in __epoll_wait_nocancel () from /lib64/libc.so.6
#0  0x7f94f4af2e93 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x0073fdae in WaitEventSetWaitBlock (nevents=1, 
occurred_events=0x7ffcf3b4bc30, cur_timeout=-1, set=0x27c3718) at latch.c:1080
#2  WaitEventSetWait (set=0x27c3718, timeout=timeout@entry=-1, 
occurred_events=occurred_events@entry=0x7ffcf3b4bc30, nevents=nevents@entry=1, 
wait_event_info=wait_event_info@entry=100663296) at latch.c:1032
#3  0x0064fbd7 in secure_read (port=0x27c3900, ptr=0xd45a80 
, len=8192) at be-secure.c:185
#4  0x0065aa38 in pq_recvbuf () at pqcomm.c:964
#5  0x0065b655 in pq_getbyte () at pqcomm.c:1007
#6  0x00761aaa in SocketBackend (inBuf=0x7ffcf3b4bda0) at postgres.c:341
#7  ReadCommand (inBuf=0x7ffcf3b4bda0) at postgres.c:514
#8  PostgresMain (argc=, argv=argv@entry=0x27cb420, 
dbname=0x27cb2e8 "pcl_l800", username=) at postgres.c:4189
#9  0x00484022 in BackendRun (port=, port=) at postmaster.c:4448
#10 BackendStartup (port=0x27c3900) at postmaster.c:4139
#11 ServerLoop () at postmaster.c:1704
#12 0x006f14c3 in PostmasterMain (argc=argc@entry=3, 
argv=argv@entry=0x278c280) at postmaster.c:1377
#13 0x00484f23 in main (argc=3, argv=0x278c280) at main.c:228
A debugging session is active.

Inferior 1 [process 97279] will be detached.

Quit anyway? (y or n) Detaching from program: /usr/pgsql-12/bin/postgres, 
process 97279


>> Missing separate debuginfos, use: debuginfo-install 
>> CGAL-4.7-1.rhel7.1.x86_64 boost-date-time-1.53.0-27.el7.x86_64 
>> boost-serialization-1.53.0-27.el7.x86_64 boost-system-1.53.0-27.el7.x86_64 
>> boost-thread-1.53.0-27.el7.x86_64 gmp-6.0.0-15.el7.x86_64 
>> jbigkit-libs-2.0-11.el7.x86_64 json-c-0.11-4.el7_0.x86_64 
>> libcurl-7.29.0-54.el7_7.2.x86_64 libidn-1.28-4.el7.x86_64 
>> libjpeg-turbo-1.2.90-8.el7.x86_64 libssh2-1.8.0-3.el7.x86_64 
>> libtiff-4.0.3-32.el7.x86_64 mpfr-3.1.1-4.el7.x86_64 
>> postgis30_12-3.0.1-5.rhel7.x86_64 protobuf-c-1.0.2-3.el7.x86_64 
>> sqlite33-libs-3.30.1-1.rhel7.x86_64

>This is a little weird and scary; I would not expect most of those
>libraries to have gotten linked into a Postgres backend.  What
>extensions are you using?  (And what the devil would any of them
>want with sqlite or libcurl?  boost-thread is even scarier, because
>we absolutely do

Re: Out of memory with "create extension postgis"

2020-07-29 Thread Tom Lane
"Daniel Westermann (DWE)"  writes:
> So this is what we got today. In the log file there is this:

> 2020-07-29 16:33:23 CEST 101995 ERROR:  out of memory
> 2020-07-29 16:33:23 CEST 101995 DETAIL:  Failed on request of size 8265691 in 
> memory context "PortalContext".
> 2020-07-29 16:33:23 CEST 101995 STATEMENT:  create extension postgis;

Is there not a dump of memory context sizes just before the "ERROR: out of
memory" line?  It should look something like

TopMemoryContext: 68720 total in 5 blocks; 17040 free (15 chunks); 51680 used
  MessageContext: 8192 total in 1 blocks; 6880 free (1 chunks); 1312 used
...
  ErrorContext: 8192 total in 1 blocks; 7928 free (3 chunks); 264 used
Grand total: 1063328 bytes in 190 blocks; 312376 free (159 chunks); 750952 used

(this taken from an idle backend, so numbers from a backend that's hit
OOM would be a lot larger).  If you don't see that then you must be
using some logging mechanism that fails to capture the postmaster's
stderr output, such as syslog.  If your postmaster start script doesn't
actually send stderr to /dev/null, you might find the context map in some
other log file.
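
A quick sketch to see which logging mechanism is in use (if logging_collector is
off and log_destination is syslog, the stderr context dump can easily get lost):

  SELECT name, setting
  FROM pg_settings
  WHERE name IN ('log_destination', 'logging_collector', 'syslog_facility');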

> The GDB session shows this:

Umm ... you didn't issue a "bt" when you got to errfinish, so there's
no useful info here.

> Missing separate debuginfos, use: debuginfo-install CGAL-4.7-1.rhel7.1.x86_64 
> boost-date-time-1.53.0-27.el7.x86_64 boost-serialization-1.53.0-27.el7.x86_64 
> boost-system-1.53.0-27.el7.x86_64 boost-thread-1.53.0-27.el7.x86_64 
> gmp-6.0.0-15.el7.x86_64 jbigkit-libs-2.0-11.el7.x86_64 
> json-c-0.11-4.el7_0.x86_64 libcurl-7.29.0-54.el7_7.2.x86_64 
> libidn-1.28-4.el7.x86_64 libjpeg-turbo-1.2.90-8.el7.x86_64 
> libssh2-1.8.0-3.el7.x86_64 libtiff-4.0.3-32.el7.x86_64 
> mpfr-3.1.1-4.el7.x86_64 postgis30_12-3.0.1-5.rhel7.x86_64 
> protobuf-c-1.0.2-3.el7.x86_64 sqlite33-libs-3.30.1-1.rhel7.x86_64

This is a little weird and scary; I would not expect most of those
libraries to have gotten linked into a Postgres backend.  What
extensions are you using?  (And what the devil would any of them
want with sqlite or libcurl?  boost-thread is even scarier, because
we absolutely do not support multithreading in a backend process.)

regards, tom lane




Re: Out of memory with "create extension postgis"

2020-07-29 Thread Daniel Westermann (DWE)
>>>> "Daniel Westermann (DWE)"  writes:
>>>>> The process eats all the available memory and finally dies:
>>>>> # create extension postgis;
>>>>> ERROR:  out of memory
>>>>> DETAIL:  Failed on request of size 8265691 in memory context 
>>>>> "PortalContext".
>>>>> Time: 773569.877 ms (12:53.570)

>>> There is nothing in the log file except these (not from today, but the 
>>> messages are always the same):
>>> 2020-07-03 16:52:16 CEST 53617 LOG:  server process (PID 54070) was 
>>> terminated by signal 9: Killed

>>A process that was killed by the OOM killer would not have managed to
>>produce an "out of memory" ERROR report, so these two are different
>>symptoms.  You need to reproduce the first case, or you won't have
>>any luck setting an error breakpoint either.

>You're right, that was before we adjusted the oom behavior. Will get back once 
>I have more information.

So this is what we got today. In the log file there is this:

2020-07-29 16:33:23 CEST 101995 ERROR:  out of memory
2020-07-29 16:33:23 CEST 101995 DETAIL:  Failed on request of size 8265691 in 
memory context "PortalContext".
2020-07-29 16:33:23 CEST 101995 STATEMENT:  create extension postgis;
2020-07-29 16:35:00 CEST 106695 LOG:  disconnection: session time: 0:04:45.200 
user=monitor database=pcl_l800 host=127.0.0.1 port=52160
2020-07-29 16:35:00 CEST 107063 LOG:  connection received: host=127.0.0.1 
port=52212
2020-07-29 16:35:00 CEST 107063 LOG:  connection authorized: user=monitor 
database=pdb_l80_oiz application_name=pg_statsinfod
2020-07-29 16:35:00 CEST 53630 ALERT:  pg_statsinfo: memory swap size exceeds 
threshold in snapshot '2020-07-29 16:35:00' --- 17383584 KiB (threshold = 
100 KiB)
2020-07-29 16:35:15 CEST 107063 LOG:  disconnection: session time: 0:00:14.770 
user=monitor database=pdb_l80_oiz host=127.0.0.1 port=52212
2020-07-29 16:35:15 CEST 107269 LOG:  connection received: host=127.0.0.1 
port=52220

The GDB session shows this:
(gdb) set pagination off
(gdb) set logging file postgis.log
(gdb)
(gdb) set logging on
Copying output to postgis.log.
(gdb) b errfinish
Breakpoint 1 at 0x87e210: file elog.c, line 411.
(gdb) cont
Continuing.
Breakpoint 1, errfinish (dummy=dummy@entry=0) at elog.c:411
411     elog.c: No such file or directory.
(gdb) cont
Continuing.
Breakpoint 1, errfinish (dummy=dummy@entry=0) at elog.c:411
411     in elog.c
Missing separate debuginfos, use: debuginfo-install CGAL-4.7-1.rhel7.1.x86_64 
boost-date-time-1.53.0-27.el7.x86_64 boost-serialization-1.53.0-27.el7.x86_64 
boost-system-1.53.0-27.el7.x86_64 boost-thread-1.53.0-27.el7.x86_64 
gmp-6.0.0-15.el7.x86_64 jbigkit-libs-2.0-11.el7.x86_64 
json-c-0.11-4.el7_0.x86_64 libcurl-7.29.0-54.el7_7.2.x86_64 
libidn-1.28-4.el7.x86_64 libjpeg-turbo-1.2.90-8.el7.x86_64 
libssh2-1.8.0-3.el7.x86_64 libtiff-4.0.3-32.el7.x86_64 mpfr-3.1.1-4.el7.x86_64 
postgis30_12-3.0.1-5.rhel7.x86_64 protobuf-c-1.0.2-3.el7.x86_64 
sqlite33-libs-3.30.1-1.rhel7.x86_64
(gdb)
Continuing.


The GDB logfile shows this:
[root@tstm49012 ~]# cat postgis.log
Breakpoint 1 at 0x87e210: file elog.c, line 411.
Continuing.
Breakpoint 1, errfinish (dummy=dummy@entry=0) at elog.c:411
411     elog.c: No such file or directory.
Continuing.
Breakpoint 1, errfinish (dummy=dummy@entry=0) at elog.c:411
411     in elog.c
Missing separate debuginfos, use: debuginfo-install CGAL-4.7-1.rhel7.1.x86_64 
boost-date-time-1.53.0-27.el7.x86_64 boost-serialization-1.53.0-27.el7.x86_64 
boost-system-1.53.0-27.el7.x86_64 boost-thread-1.53.0-27.el7.x86_64 
gmp-6.0.0-15.el7.x86_64 jbigkit-libs-2.0-11.el7.x86_64 
json-c-0.11-4.el7_0.x86_64 libcurl-7.29.0-54.el7_7.2.x86_64 
libidn-1.28-4.el7.x86_64 libjpeg-turbo-1.2.90-8.el7.x86_64 
libssh2-1.8.0-3.el7.x86_64 libtiff-4.0.3-32.el7.x86_64 mpfr-3.1.1-4.el7.x86_64 
postgis30_12-3.0.1-5.rhel7.x86_64 protobuf-c-1.0.2-3.el7.x86_64 
sqlite33-libs-3.30.1-1.rhel7.x86_64
Continuing.
Program received signal SIGINT, Interrupt.
0x7f94f4af2e93 in __epoll_wait_nocancel () from /lib64/libc.so.6
Detaching from program: /usr/pgsql-12/bin/postgres, process 101995
[root@tstm49012 ~]#

Is that of any help?

Regards
Daniel










Re: Out of memory with "create extension postgis"

2020-07-28 Thread Daniel Westermann (DWE)
>>> "Daniel Westermann (DWE)"  writes:
>>>> The process eats all the available memory and finally dies:
>>>> # create extension postgis;
>>>> ERROR:  out of memory
>>>> DETAIL:  Failed on request of size 8265691 in memory context 
>>>> "PortalContext".
>>>> Time: 773569.877 ms (12:53.570)

>> There is nothing in the log file except these (not from today, but the 
>> messages are always the same):
>> 2020-07-03 16:52:16 CEST 53617 LOG:  server process (PID 54070) was 
>> terminated by signal 9: Killed

>A process that was killed by the OOM killer would not have managed to
>produce an "out of memory" ERROR report, so these two are different
>symptoms.  You need to reproduce the first case, or you won't have
>any luck setting an error breakpoint either.

You're right, that was before we adjusted the oom behavior. Will get back once 
I have more information.

Regards
Daniel


Re: Out of memory with "create extension postgis"

2020-07-28 Thread Tom Lane
"Daniel Westermann (DWE)"  writes:
>> "Daniel Westermann (DWE)"  writes:
>>> The process eats all the available memory and finally dies:
>>> # create extension postgis;
>>> ERROR:  out of memory
>>> DETAIL:  Failed on request of size 8265691 in memory context 
>>> "PortalContext".
>>> Time: 773569.877 ms (12:53.570)

> There is nothing in the log file except these (not from today, but the 
> messages are always the same):
> 2020-07-03 16:52:16 CEST 53617 LOG:  server process (PID 54070) was 
> terminated by signal 9: Killed

A process that was killed by the OOM killer would not have managed to
produce an "out of memory" ERROR report, so these two are different
symptoms.  You need to reproduce the first case, or you won't have
any luck setting an error breakpoint either.

If you can't manage to get back to that state, you might get somewhere
by waiting for the process to grow large and then attaching with gdb
and getting a stack trace.  That's a bit less reliable than the
errfinish approach, though.

regards, tom lane




Re: Out of memory with "create extension postgis"

2020-07-28 Thread Daniel Westermann (DWE)
>"Daniel Westermann (DWE)"  writes:
>> we have a very strange behavior on PostgreSQL 12.3 when we try to create the 
>> extension postgis. Postgres and postgis have both been installed from 
>> packages:
>> ...
>> The process eats all the available memory and finally dies:
>> # create extension postgis;
>> ERROR:  out of memory
>> DETAIL:  Failed on request of size 8265691 in memory context "PortalContext".
>> Time: 773569.877 ms (12:53.570)

>Quite odd.  There should have been a memory context dump written to the
>postmaster's stderr, can you show that?  Also possibly useful would be
>a backtrace (set a breakpoint at errfinish):

>https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend<https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend>

There is nothing in the log file except these (not from today, but the messages 
are always the same):
2020-07-03 16:52:16 CEST 53617 LOG:  server process (PID 54070) was terminated 
by signal 9: Killed
2020-07-03 16:52:16 CEST 53617 DETAIL:  Failed process was running: create 
extension postgis;
2020-07-03 16:52:16 CEST 53617 LOG:  terminating any other active server 
processes

I will take care of the backtrace

Regards
Daniel



Re: Out of memory with "create extension postgis"

2020-07-28 Thread Tom Lane
"Daniel Westermann (DWE)"  writes:
> we have a very strange behavior on PostgreSQL 12.3 when we try to create the 
> extension postgis. Postgres and postgis have both been installed from 
> packages:
> ...
> The process eats all the available memory and finally dies:
> # create extension postgis;
> ERROR:  out of memory
> DETAIL:  Failed on request of size 8265691 in memory context "PortalContext".
> Time: 773569.877 ms (12:53.570)

Quite odd.  There should have been a memory context dump written to the
postmaster's stderr, can you show that?  Also possibly useful would be
a backtrace (set a breakpoint at errfinish):

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

regards, tom lane




Out of memory with "create extension postgis"

2020-07-28 Thread Daniel Westermann (DWE)
Hi,

we have a very strange behavior on PostgreSQL 12.3 when we try to create the 
extension postgis. Postgres and postgis have both been installed from packages:

postgresql12-server-12.3-5PGDG
postgis30_12-3.0.1-5.rhel7

The OS is RedHat 7.7.

Creating other extensions works fine, we only see this for postgis. A gdb 
session attached to the psql process while creating the extension gives:

Program received signal SIGINT, Interrupt.
0x7f19c0de9740 in __read_nocancel () from /lib64/libpthread.so.0
Detaching from program: /usr/pgsql-12/bin/psql, process 44202

The process eats all the available memory and finally dies:
# create extension postgis;
ERROR:  out of memory
DETAIL:  Failed on request of size 8265691 in memory context "PortalContext".
Time: 773569.877 ms (12:53.570)


Stats file from /proc:

# cat status
Name:   psql
Umask:  0022
State:  S (sleeping)
Tgid:   45958
Ngid:   0
Pid:    45958
PPid:   44075
TracerPid:      0
Uid:    280     280     280     280
Gid:    280     280     280     280
FDSize: 256
Groups: 280
VmPeak:   184604 kB
VmSize:   184600 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:      4304 kB
VmRSS:      4304 kB
RssAnon:             980 kB
RssFile:            3324 kB
RssShmem:              0 kB
VmData:      672 kB
VmStk:       132 kB
VmExe:       604 kB
VmLib:     11288 kB
VmPTE:       180 kB
VmSwap:        0 kB
Threads:        1
SigQ:   0/15635
SigPnd: 
ShdPnd: 
SigBlk: 
SigIgn: 
SigCgt: 00018002
CapInh: 
CapPrm: 
CapEff: 
CapBnd: 001f
CapAmb: 
NoNewPrivs:     0
Seccomp:        0
Speculation_Store_Bypass:       thread vulnerable
Cpus_allowed:   ,,,
Cpus_allowed_list:      0-127
Mems_allowed:   
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0001
Mems_allowed_list:      0
voluntary_ctxt_switches:        47

stack file from /proc:
# cat stack
[] poll_schedule_timeout+0x55/0xc0
[] do_sys_poll+0x48d/0x590
[] SyS_poll+0x74/0x110
[] system_call_fastpath+0x25/0x2a
[] 0x

limits from /proc:
# cat limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    0                    bytes
Max resident set          unlimited            unlimited            bytes
Max processes             4096                 15635                processes
Max open files            1024                 4096                 files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       15635                15635                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

Stracing the psql process shows no activity. Looking at the used swap shows 
that the postmaster uses a huge amount:

find /proc -maxdepth 2 -path "/proc/[0-9]*/status" -readable -exec awk -v 
FS=":" '{process[$1]=$2;sub(/^[ \t]+/,"",process[$1]);} END 
{if(process["VmSwap"] && process["VmSwap"] != "0 kB") printf "%10s %-30s 
%20s\n",process["Pid"],process["Name"],process["VmSwap"]}' '{}' \;
...
     48043 psql                                1004 kB
     48044 postmaster                      21948064 kB
     49059 postmaster                          1008 kB
     52550 rhsmcertd                            144 kB
...
 
Any hints what could be the issue? Can I provide any other information that 
would help troubleshooting this issue?

Thanks in advance
Daniel






AW: Out of memory in big transactions after upgrade to 12.2

2020-04-06 Thread Jan Strube
Thanks a lot so far for all your answers. work_mem is 4 MB and max_connections 
is 100, no pooling.

As additional info, maybe I should mention that we do an update on one table 
which cascades to some other tables, updating tens of thousands of rows there 
and triggering the Perl functions for every row. I'm not sure this was clear 
from the stack trace.


From: Michael Lewis [mailto:mle...@entrata.com]
Sent: Friday, April 3, 2020 18:30
To: David Day 
Cc: Pavel Stehule ; Jan Strube 
; pgsql-general@lists.postgresql.org
Subject: Re: Out of memory in big transactions after upgrade to 12.2

If you didn't turn it off, you have parallel workers on by default with v12. If 
work_mem is set high, memory use may be much higher as each node in a complex 
plan could end up executing in parallel.

Also, do you use a connection pooler such as pgbouncer or pgpool? What is 
max_connections set to?


Re: Out of memory in big transactions after upgrade to 12.2

2020-04-03 Thread Michael Lewis
If you didn't turn it off, you have parallel workers on by default with
v12. If work_mem is set high, memory use may be much higher as each node in
a complex plan could end up executing in parallel.

Also, do you use a connection pooler such as pgbouncer or pgpool? What is
max_connections set to?
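
A sketch of how to check these settings and, for a single session, rule out
parallel query while testing (SET only affects the current session):

  SHOW max_parallel_workers_per_gather;
  SHOW work_mem;
  SET max_parallel_workers_per_gather = 0;   -- then re-run the suspect statement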

>


Re: DB running out of memory issues after upgrade

2020-02-23 Thread Peter J. Holzer
On 2020-02-18 18:10:08 +, Nagaraj Raj wrote:
> Below are the same configurations in the .conf file before and after the upgrade
> 
> show max_connections; = 1743
[...]
> show work_mem = "4MB"

This is an interesting combination: So you expect a large number of
connections but each one should use very little RAM?
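
A sketch of how to pull the settings that matter for that back-of-the-envelope
calculation in one go (a single sort or hash can use up to work_mem, and a
complex query may perform several such operations per connection):

  SELECT name, setting, unit
  FROM pg_settings
  WHERE name IN ('max_connections', 'work_mem', 'shared_buffers',
                 'maintenance_work_mem', 'autovacuum_max_workers');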

[...]

> here are some syslogs:
> 
> 2020-02-16 21:01:17 UTC [-]The database process was killed by the OS
> due to excessive memory consumption. 
> 2020-02-16 13:41:16 UTC [-]The database process was killed by the OS
> due to excessive memory consumption. 

The oom-killer produces a huge block of messages which you can find with
dmesg or in your syslog. It looks something like this:

Feb 19 19:06:53 akran kernel: [3026711.344817] platzangst invoked oom-killer: 
gfp_mask=0x15080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), nodemask=(null), order=1, 
oom_score_adj=0
Feb 19 19:06:53 akran kernel: [3026711.344819] platzangst cpuset=/ 
mems_allowed=0-1
Feb 19 19:06:53 akran kernel: [3026711.344825] CPU: 7 PID: 2012 Comm: 
platzangst Tainted: G   OE4.15.0-74-generic #84-Ubuntu
Feb 19 19:06:53 akran kernel: [3026711.344826] Hardware name: Dell Inc. 
PowerEdge R630/02C2CP, BIOS 2.1.7 06/16/2016
Feb 19 19:06:53 akran kernel: [3026711.344827] Call Trace:
Feb 19 19:06:53 akran kernel: [3026711.344835]  dump_stack+0x6d/0x8e
Feb 19 19:06:53 akran kernel: [3026711.344839]  dump_header+0x71/0x285
...
Feb 19 19:06:53 akran kernel: [3026711.344893] RIP: 0033:0x7f292d076b1c
Feb 19 19:06:53 akran kernel: [3026711.344894] RSP: 002b:7fff187ef240 
EFLAGS: 0246 ORIG_RAX: 0038
Feb 19 19:06:53 akran kernel: [3026711.344895] RAX: ffda RBX: 
7fff187ef240 RCX: 7f292d076b1c
Feb 19 19:06:53 akran kernel: [3026711.344896] RDX:  RSI: 
 RDI: 01200011
Feb 19 19:06:53 akran kernel: [3026711.344897] RBP: 7fff187ef2b0 R08: 
7f292d596740 R09: 009d43a0
Feb 19 19:06:53 akran kernel: [3026711.344897] R10: 7f292d596a10 R11: 
0246 R12: 
Feb 19 19:06:53 akran kernel: [3026711.344898] R13: 0020 R14: 
 R15: 
Feb 19 19:06:53 akran kernel: [3026711.344899] Mem-Info:
Feb 19 19:06:53 akran kernel: [3026711.344905] active_anon:14862589 
inactive_anon:1133875 isolated_anon:0
Feb 19 19:06:53 akran kernel: [3026711.344905]  active_file:467 
inactive_file:371 isolated_file:0
Feb 19 19:06:53 akran kernel: [3026711.344905]  unevictable:0 dirty:3 
writeback:0 unstable:0
...
Feb 19 19:06:53 akran kernel: [3026711.344985] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Feb 19 19:06:53 akran kernel: [3026711.344997] [  823]     0   823    44909        0         106496      121             0 lvmetad
Feb 19 19:06:53 akran kernel: [3026711.344999] [ 1354]     0  1354    11901        3         135168      112             0 rpcbind
Feb 19 19:06:53 akran kernel: [3026711.345000] [ 1485]     0  1485    69911       99         180224      159             0 accounts-daemon
...
Feb 19 19:06:53 akran kernel: [3026711.345345] Out of memory: Kill process 
25591 (postgres) score 697 or sacrifice child
Feb 19 19:06:53 akran kernel: [3026711.346563] Killed process 25591 (postgres) 
total-vm:71116948kB, anon-rss:52727552kB, file-rss:0kB, shmem-rss:3023196kB

The most interesting lines are usually the last two: In this case they
tell us that the process killed was a postgres process and it occupied
about 71 GB of virtual memory at that time. That was clearly the right
choice since the machine has only 64 GB of RAM. Sometimes it is less
clear and then you might want to scroll through the (usually long) list
of processes to see if there are other processes which need suspicious
amounts of RAM or maybe if there are just more of them than you would
expect.
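A quick way to jump straight to those last lines (assuming the kernel log is
still in the ring buffer or in the journal):

  dmesg -T | grep -E "Out of memory|Killed process" | tail -n 5
  # or, on systemd machines where the ring buffer has already rotated:
  journalctl -k | grep -E "oom-killer|Out of memory|Killed process" | tail -n 5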


> I identified one simple select which consuming more memory and here is the
> query plan,
> 
> 
> 
> "Result  (cost=0.00..94891854.11 rows=3160784900 width=288)"
> "  ->  Append  (cost=0.00..47480080.61 rows=3160784900 width=288)"
> "->  Seq Scan on msghist  (cost=0.00..15682777.12 rows=312949 
> width
> =288)"
> "  Filter: (((data -> 'info'::text) ->> 'status'::text) =
> 'CLOSE'::text)"
> "->  Seq Scan on msghist msghist_1  (cost=0.00..189454.50 
> rows=31294900
> width=288)"
> "  Filter: (((data -> 'info'::text) ->> 'status'::text) =
> 'CLOSE'::text)"

So: How much memory does that use? It produces a huge number of rows
(more than 3 billion) but it doesn't do much with them, so I wouldn't
expect the postgres process itself to use much memory. Are you sure it's
the postgres process and not the application which uses a lot of memory?

hp

-- 
   _  | Peter J. Holzer| Story must make more sen

Re: DB running out of memory issues after upgrade

2020-02-18 Thread Merlin Moncure
On Tue, Feb 18, 2020 at 12:10 PM Nagaraj Raj  wrote:
>
> Below are the same configurations in the .conf file before and after the upgrade
>
> show max_connections; = 1743
> show shared_buffers = "4057840kB"
> show effective_cache_size =  "8115688kB"
> show maintenance_work_mem = "259MB"
> show checkpoint_completion_target = "0.9"
> show wal_buffers = "16MB"
> show default_statistics_target = "100"
> show random_page_cost = "1.1"
> show effective_io_concurrency =" 200"
> show work_mem = "4MB"
> show min_wal_size = "256MB"
> show max_wal_size = "2GB"
> show max_worker_processes = "8"
> show max_parallel_workers_per_gather = "2"

This smells like oom killer for sure.  how did you resolve some of
these values.  In particular max_connections and effective_cache_size.
  How much memory is in this server?
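As a back-of-the-envelope illustration with the numbers quoted above (a
simplification, since a single query can use several work_mem-sized
allocations while an idle connection uses none):

  echo "$((1743 * 4)) MB"   # 6972 MB, i.e. ~6.8 GB, if every connection used just one work_mem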

merlin




Re: DB running out of memory issues after upgrade

2020-02-18 Thread Nagaraj Raj
Below are the same configurations in the .conf file before and after the upgrade:

show max_connections; = 1743
show shared_buffers = "4057840kB"
show effective_cache_size = "8115688kB"
show maintenance_work_mem = "259MB"
show checkpoint_completion_target = "0.9"
show wal_buffers = "16MB"
show default_statistics_target = "100"
show random_page_cost = "1.1"
show effective_io_concurrency = "200"
show work_mem = "4MB"
show min_wal_size = "256MB"
show max_wal_size = "2GB"
show max_worker_processes = "8"
show max_parallel_workers_per_gather = "2"

here are some system logs:

2020-02-16 21:01:17 UTC [-]The database process was killed by the OS due to excessive memory consumption.
2020-02-16 13:41:16 UTC [-]The database process was killed by the OS due to excessive memory consumption.

I identified one simple select which is consuming more memory, and here is the 
query plan:


"Result  (cost=0.00..94891854.11 rows=3160784900 width=288)""  ->  Append  
(cost=0.00..47480080.61 rows=3160784900 width=288)""        ->  Seq Scan on 
msghist  (cost=0.00..15682777.12 rows=312949 width=288)""              
Filter: (((data -> 'info'::text) ->> 'status'::text) = 'CLOSE'::text)""        
->  Seq Scan on msghist msghist_1  (cost=0.00..189454.50 rows=31294900 
width=288)""              Filter: (((data -> 'info'::text) ->> 'status'::text) 
= 'CLOSE'::text)"


Thanks,


On Tuesday, February 18, 2020, 09:59:37 AM PST, Tomas Vondra wrote:
 On Tue, Feb 18, 2020 at 05:46:28PM +, Nagaraj Raj wrote:
>after upgrading Postgres to v9.6.11 from v9.6.9 the DB is running out of memory;
>no workload has changed before and after the upgrade.
>
>spec: RAM 16gb,4vCore
>Any bug reported like this or suggestions on how to fix this issue? I 
>appreciate the response..!! 
>

This bug report (in fact, we don't know if it's a bug, but OK) is
woefully incomplete :-(

The server log is mostly useless, unfortunately - it just says a bunch
of processes were killed (by OOM killer, most likely) so the server has
to restart. It tells us nothing about why the backends consumed so much
memory etc.

What would help us is knowing how much memory was the backend (killed by
OOM) consuming, which should be in dmesg.

And then MemoryContextStats output - you need to connect to a backend
consuming a lot of memory using gdb (before it gets killed) and do

  (gdb) p MemoryContextStats(TopMemoryContext)
  (gdb) q

and show us the output printed into server log. If it's a backend
running a query, it'd help knowing the execution plan.

It would also help knowing the non-default configuration, i.e. stuff
tweaked in postgresql.conf.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 


  

Re: DB running out of memory issues after upgrade

2020-02-18 Thread Tomas Vondra

On Tue, Feb 18, 2020 at 05:46:28PM +, Nagaraj Raj wrote:

after upgrading Postgres to v9.6.11 from v9.6.9 the DB is running out of memory; 
no workload has changed before and after the upgrade. 

spec: RAM 16gb,4vCore
Any bug reported like this or suggestions on how to fix this issue? I 
appreciate the response..!! 



This bug report (in fact, we don't know if it's a bug, but OK) is
woefully incomplete :-(

The server log is mostly useless, unfortunately - it just says a bunch
of processes were killed (by OOM killer, most likely) so the server has
to restart. It tells us nothing about why the backends consumed so much
memory etc.

What would help us is knowing how much memory was the backend (killed by
OOM) consuming, which should be in dmesg.

And then MemoryContextStats output - you need to connect to a backend
consuming a lot of memory using gdb (before it gets killed) and do

 (gdb) p MemoryContextStats(TopMemoryContext)
 (gdb) q

and show us the output printed into server log. If it's a backend
running a query, it'd help knowing the execution plan.

It would also help knowing the non-default configuration, i.e. stuff
tweaked in postgresql.conf.
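Both pieces of information can be collected non-interactively; a sketch, where
the backend PID 12345 is a placeholder:

  # dump the memory context statistics of one suspicious backend into the server log
  gdb -batch -p 12345 -ex 'print MemoryContextStats(TopMemoryContext)'
  # one way to list the settings that differ from their built-in defaults
  psql -At -c "SELECT name, setting FROM pg_settings WHERE source NOT IN ('default', 'override');"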

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





PG 10 vs. 11: Large increase in memory usage when selecting BYTEA data (actually out of memory with PG11)

2019-03-13 Thread Matthias Otterbach
Dear mailing list,

I am currently testing an application for which I previously used PostgreSQL 10 
with the current PostgreSQL 11.2 release. During the tests I experienced out of 
memory errors of my database which I could not explain, they seem to affect 
tables containing large BYTEA data.

I was able to narrow them down to a small example query where I also 
experienced a much higher memory usage with PostgreSQL 11 compared to previous 
versions.

For my comparison I worked on a local VM with only 4 GB memory configured, a 
plain Ubuntu 18.04.2 installation (including all updates) and the current 
PostgreSQL 10.7 resp. 11.2 installation installed from 
http://apt.postgresql.org/pub/repos/apt/dists/bionic-pgdg/. I created a table 
containing a large amount of BYTEA data using these statements:

> create table public.a (b bytea);
> insert into public.a select repeat('0', 1024 * 1024 * 100)::bytea from 
> generate_series(1, 300);
> select pg_size_pretty(sum(length(b))) from public.a;

In total I now have ~ 29 GB of data (actually 300 rows à 100 MB) and start 
selecting all data using a small Java program (current JDBC driver 42.2.5).

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

String sql = "SELECT b FROM public.a";
try (Connection connection = DriverManager.getConnection(
        "jdbc:postgresql://127.0.0.1:5432/postgres?currentSchema=public",
        "username", "password")) {
    connection.setAutoCommit(false);        // required so the driver can fetch with a cursor
    try (PreparedStatement ps = connection.prepareStatement(sql)) {
        ps.setFetchSize(5);                 // we do not want to run out of Java heap space
        try (ResultSet rs = ps.executeQuery()) {
            int i = 0;
            while (rs.next()) {
                // just loop over all data and do something with it; a line is printed
                // every 10 rows with the length (the other conditions are never true
                // with this test data)
                i++;
                byte[] b = rs.getBytes(1);
                if (i % 10 == 0 || b == null || b.length <= i) {
                    System.err.println("Row " + i + ": " + (b != null ? b.length : null));
                }
            }
        }
    }
}

The Java program actually just executes SELECT b FROM public.a and keeps 
streaming more rows doing something with the content, also the execution plan 
for my query is fairly simple - actually it seems to be just a sequential scan 
(with both versions).

With PostgreSQL 10.7 the program went through fine (with plenty of free memory 
on my database VM, actually including the OS there was never used more than 1 
GB on the VM). With PostgreSQL 11.2 the memory of my postgres process (pid of 
my session) keeps increasing and finally crashes after I fetched only about 8 
GB of the data:

TopMemoryContext: 67424 total in 5 blocks; 12656 free (10 chunks); 54768 used
TableSpace cache: 8192 total in 1 blocks; 2096 free (0 chunks); 6096 used
TopTransactionContext: 8192 total in 1 blocks; 7744 free (1 chunks); 448 used
pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1456 free (0 
chunks); 6736 used
RowDescriptionContext: 8192 total in 1 blocks; 6896 free (0 chunks); 1296 used
MessageContext: 8192 total in 1 blocks; 6752 free (1 chunks); 1440 used
Operator class cache: 8192 total in 1 blocks; 560 free (0 chunks); 7632 used
smgr relation table: 16384 total in 2 blocks; 4600 free (2 chunks); 11784 used
TransactionAbortContext: 32768 total in 1 blocks; 32512 free (0 chunks); 256 
used
Portal hash: 8192 total in 1 blocks; 560 free (0 chunks); 7632 used
TopPortalContext: 8192 total in 1 blocks; 7392 free (0 chunks); 800 used
PortalContext: 1024 total in 1 blocks; 552 free (0 chunks); 472 used: C_1
ExecutorState: 4294976384 total in 17 blocks; 4080 free (0 chunks); 4294972304 
used
printtup: 314581120 total in 3 blocks; 7936 free (8 chunks); 314573184 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
PortalContext: 1024 total in 1 blocks; 744 free (1 chunks); 280 used: 
Relcache by OID: 16384 total in 2 blocks; 3512 free (2 chunks); 12872 used
CacheMemoryContext: 524288 total in 7 blocks; 133056 free (5 chunks); 391232 
used
[...]
Grand total: 4610628736 bytes in 192 blocks; 315312 free (137 chunks); 
4610313424 used
2019-03-13 10:11:54.628 UTC [1350] postgres@postgres ERROR: out of memory
2019-03-13 10:11:54.628 UTC [1350] postgres@postgres DETAIL: Failed on request 
of size 268435456 in memory context "ExecutorState".
2019-03-13 10:11:54.628 UTC [1350] postgres@postgres STATEMENT: SELECT b FROM 
public.a

Am I even posting this to the right list (sorry if I chose the wrong one)? I 
also already thought about filing a bug report, but it could be a bug in either 
PostgreSQL or the PG JDBC driver (I suspect the database itself, as I use the 
same driver against PG 10). Would we expect PG 11 to use that much more memory 
than PG 10? Is it maybe a known bug? (I did a quick look at pg-bugs but did not 
find anything concerning this problem.)

Actually my process crashed after I've selected only about 80 rows - with PG 10 
I was able to load all 300 rows (and eve

Re: Ran out of memory retrieving query results.

2019-03-11 Thread Andreas Kretschmer




Am 11.03.19 um 06:44 schrieb Nanda Kumar:

Hello Team,

Can you please help with the below issue? The below error occurred when I ran 
the select statement for a huge data volume.

  Error Details :

  Ran out of memory retrieving query results.


you should provide more details, for instance (at least)
* os and pg version
* how much ram contains the machine
* config-parameters (shared_buffers, work_mem, max_connections)
* running activities
* the query itself
* table-definition
* how large are the tables
* the EXPLAIN of the query


Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com




RE: Ran out of memory retrieving query results.

2019-03-11 Thread Nanda Kumar
Hello Team,

Can you please help with the below issue? The below error occurred when I ran 
the select statement for a huge data volume.

 Error Details :

 Ran out of memory retrieving query results.

Regards
Nanda Kumar.M
SmartStream Technologies
Eastland Citadel | 5th Floor | 102 Hosur Road | Banglore 560 095 | India
nanda.ku...@smartstream-stp.com | www.smartstream-stp.com
Mob
+91 99720 44779
Tel
+91 80617 64107



-Original Message-
From: Stephen Frost [mailto:sfr...@snowman.net]
Sent: 09 March 2019 00:19
To: Nanda Kumar
Cc: pgsql-general Owner
Subject: Re: Ran out of memory retrieving query results.

Greetings,

You need to email pgsql-general@lists.postgresql.org with your question, this 
address is for the moderators.

Thanks!

Stephen

* Nanda Kumar (nanda.ku...@smartstream-stp.com) wrote:
> Hello Team,
>
>
>
> I am getting error when I try to run the select query. Kindly help me in 
> fixing this issue.
>
>
>
> Error Details :
>
>
>
> Ran out of memory retrieving query results.
>
>
>
> Screenshot of the error :
>
>
>
>
>
>
> Regards
>
> Nanda Kumar.M
>
> SmartStream Technologies
>
> Eastland Citadel | 5th Floor | 102 Hosur Road | Banglore 560 095 |
> India
>
> nanda.ku...@smartstream-stp.com | www.smartstream-stp.com
>
> Mob +91 99720 44779
>
> 






Re: Out of memory: Kill process nnnn (postmaster) score nn or sacrifice child

2019-02-13 Thread Andreas Kretschmer
On 12 February 2019 17:20:09 CET, Vikas Sharma  wrote:
>Hello All,
>
>I have a 4 node PostgreSQL 9.6 cluster with streaming replication.  we
>encounter today the Out of Memory  Error on the Master which resulted
>in
>All postres  processes restarted and cluster recovered itself. Please
>let
>me know the best way to diagnose this issue.
>

typical reason for oom-kill are too high values for work_mem.


Regards, Andreas

-- 
2ndQuadrant - The PostgreSQL Support Company



Re: Out of memory: Kill process nnnn (postmaster) score nn or sacrifice child

2019-02-13 Thread Vikas Sharma
Thank you Adrian for the reply. I did check the postgres processes running
around the time when the OOM killer was invoked; there were lots of high-CPU
postgres processes running long-running selects.
I am not sure how to interpret the memory terms appearing in the Linux dmesg
output or /var/log/messages, but I can see that out of memory happened and the
postmaster invoked the OOM killer.

Regards
Vikas Sharma

On Tue, 12 Feb 2019 at 16:39, Adrian Klaver 
wrote:

> On 2/12/19 8:20 AM, Vikas Sharma wrote:
> > Hello All,
> >
> > I have a 4 node PostgreSQL 9.6 cluster with streaming replication.  we
> > encounter today the Out of Memory  Error on the Master which resulted in
> > All postres  processes restarted and cluster recovered itself. Please
> > let me know the best way to diagnose this issue.
>
> For a start look back further in the Postgres log then the below. What
> is shown below is the effects of the OOM killer. What you need to look
> for is the statement that caused Postgres memory to increase to the
> point that the OOM killer was invoked.
>
> >
> >
> >
> > The error seen in the postgresql log:
> >
> > 2019-02-12 10:55:17 GMT LOG:  terminating any other active server
> processes
> > 2019-02-12 10:55:17 GMT WARNING:  terminating connection because of
> > crash of another server process
> > 2019-02-12 10:55:17 GMT DETAIL:  The postmaster has commanded this
> > server process to roll back the current transaction and exit, because
> > another server process exited abnormally and possibly corrupted shared
> > memory.
> > 2019-02-12 10:55:17 GMT HINT:  In a moment you should be able to
> > reconnect to the database and repeat your command.
> > 2019-02-12 10:55:17 GMT WARNING:  terminating connection because of
> > crash of another server process
> > 2019-02-12 10:55:17 GMT DETAIL:  The postmaster has commanded this
> > server process to roll back the current transaction and exit, because
> > another server process exited abnormally and possibly corrupted shared
> > memory.
> > 2019-02-12 10:55:17 GMT HINT:  In a moment you should be able to
> > reconnect to the database and repeat your command.
> > 2019-02-12 10:55:17 GMT WARNING:  terminating connection because of
> > crash of another server process
> > -
> >
> > Error from dmesg on linux:
> > ---
> > [4331093.885622] Out of memory: Kill process n (postmaster) score nn
> > or sacrifice child
> > [4331093.890225] Killed process n (postmaster) total-vm:18905944kB,
> > anon-rss:1747460kB, file-rss:4kB, shmem-rss:838220kB
> >
> > Thanks & Best Regards
> > Vikas Sharma
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: Out of memory: Kill process nnnn (postmaster) score nn or sacrifice child

2019-02-12 Thread Adrian Klaver

On 2/12/19 8:20 AM, Vikas Sharma wrote:

Hello All,

I have a 4 node PostgreSQL 9.6 cluster with streaming replication.  we 
encounter today the Out of Memory  Error on the Master which resulted in 
All postres  processes restarted and cluster recovered itself. Please 
let me know the best way to diagnose this issue.


For a start look back further in the Postgres log then the below. What 
is shown below is the effects of the OOM killer. What you need to look 
for is the statement that caused Postgres memory to increase to the 
point that the OOM killer was invoked.
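One rough way to do that, assuming statement or duration logging is enabled and
the log sits in the usual location for this kind of install (both are
assumptions):

  grep -B 200 "terminating any other active server processes" \
       /var/lib/pgsql/9.6/data/pg_log/postgresql-*.log | grep -E "STATEMENT:|duration:"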






The error seen in the postgresql log:

2019-02-12 10:55:17 GMT LOG:  terminating any other active server processes
2019-02-12 10:55:17 GMT WARNING:  terminating connection because of 
crash of another server process
2019-02-12 10:55:17 GMT DETAIL:  The postmaster has commanded this 
server process to roll back the current transaction and exit, because 
another server process exited abnormally and possibly corrupted shared 
memory.
2019-02-12 10:55:17 GMT HINT:  In a moment you should be able to 
reconnect to the database and repeat your command.
2019-02-12 10:55:17 GMT WARNING:  terminating connection because of 
crash of another server process
2019-02-12 10:55:17 GMT DETAIL:  The postmaster has commanded this 
server process to roll back the current transaction and exit, because 
another server process exited abnormally and possibly corrupted shared 
memory.
2019-02-12 10:55:17 GMT HINT:  In a moment you should be able to 
reconnect to the database and repeat your command.
2019-02-12 10:55:17 GMT WARNING:  terminating connection because of 
crash of another server process

-

Error from dmesg on linux:
---
[4331093.885622] Out of memory: Kill process n (postmaster) score nn 
or sacrifice child
[4331093.890225] Killed process n (postmaster) total-vm:18905944kB, 
anon-rss:1747460kB, file-rss:4kB, shmem-rss:838220kB


Thanks & Best Regards
Vikas Sharma



--
Adrian Klaver
adrian.kla...@aklaver.com



Out of memory: Kill process nnnn (postmaster) score nn or sacrifice child

2019-02-12 Thread Vikas Sharma
Hello All,

I have a 4 node PostgreSQL 9.6 cluster with streaming replication.  we
encounter today the Out of Memory  Error on the Master which resulted in
All postres  processes restarted and cluster recovered itself. Please let
me know the best way to diagnose this issue.



The error seen in the postgresql log:

2019-02-12 10:55:17 GMT LOG:  terminating any other active server processes
2019-02-12 10:55:17 GMT WARNING:  terminating connection because of crash
of another server process
2019-02-12 10:55:17 GMT DETAIL:  The postmaster has commanded this server
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.
2019-02-12 10:55:17 GMT HINT:  In a moment you should be able to reconnect
to the database and repeat your command.
2019-02-12 10:55:17 GMT WARNING:  terminating connection because of crash
of another server process
2019-02-12 10:55:17 GMT DETAIL:  The postmaster has commanded this server
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.
2019-02-12 10:55:17 GMT HINT:  In a moment you should be able to reconnect
to the database and repeat your command.
2019-02-12 10:55:17 GMT WARNING:  terminating connection because of crash
of another server process
-

Error from dmesg on linux:
---
[4331093.885622] Out of memory: Kill process n (postmaster) score nn or
sacrifice child
[4331093.890225] Killed process n (postmaster) total-vm:18905944kB,
anon-rss:1747460kB, file-rss:4kB, shmem-rss:838220kB

Thanks & Best Regards
Vikas Sharma


RE: pg_dump out of memory for large table with LOB

2018-11-21 Thread Jean-Marc Lessard
JMLessard wrote:
> What about updates where the bytea does not change? Will a new copy of the 
> bytea be made in the toast table, or will the new row point to the original 
> bytea?
> > https://www.postgresql.org/docs/current/storage-toast.html says:
> > The TOAST management code is triggered only when a row value to be stored 
> > in a table is wider than TOAST_TUPLE_THRESHOLD bytes (normally 2 kB). The 
> > TOAST code will compress
> > and/or move field values out-of-line until the row value is shorter than 
> > TOAST_TUPLE_TARGET bytes (also normally 2 kB, adjustable) or no more gains 
> > can be had. During an UPDATE
> > operation, values of unchanged fields are normally preserved as-is; so an 
> > UPDATE of a row with out-of-line values incurs no TOAST costs if none of 
> > the out-of-line values change.
> Does it mean there is no cost incurred to generate the out-of-line TOAST, but 
> that a copy of the bytea is still made for the new row?

I benchmarked it as follows:
UPDATE table SET mod_tim=mod_tim;
The relpages of the table doubled, but the relpages of the toast table did not 
change.

Jean-Marc Lessard
Administrateur de base de données / Database Administrator
Ultra Electronics Forensic Technology Inc.
T +1 514 489 4247 x4164
www.ultra-forensictechnology.com


RE: pg_dump out of memory for large table with LOB

2018-11-15 Thread Jean-Marc Lessard
Thanks to Daniel Verite, nice answer, really helpful :)
It summarizes what I have read in the doc and blogs.

What about updates where the bytea does not change? Will a new copy of the bytea 
be made in the toast table, or will the new row point to the original bytea?
> https://www.postgresql.org/docs/current/storage-toast.html says
> The TOAST management code is triggered only when a row value to be stored in 
> a table is wider than TOAST_TUPLE_THRESHOLD bytes (normally 2 kB). The TOAST 
> code will compress
> and/or move field values out-of-line until the row value is shorter than 
> TOAST_TUPLE_TARGET bytes (also normally 2 kB, adjustable) or no more gains 
> can be had. During an UPDATE
> operation, values of unchanged fields are normally preserved as-is; so an 
> UPDATE of a row with out-of-line values incurs no TOAST costs if none of the 
> out-of-line values change.

Does it mean there is no cost incurred to generate the out-of-line TOAST, but 
that a copy of the bytea is still made for the new row?


Jean-Marc Lessard
Administrateur de base de données / Database Administrator
Ultra Electronics Forensic Technology Inc.
T +1 514 489 4247 x4164
www.ultra-forensictechnology.com


RE: pg_dump out of memory for large table with LOB

2018-11-15 Thread Daniel Verite
Jean-Marc Lessard wrote:

> Another area where LOB hurts is the storage. LOB are broken and stored in 2K
> pieces.
> Due to the block header, only three 2k pieces fit in an 8k block wasting 25%
> of space (in fact pgstattuple reports ~ 20%).

Yes. bytea stored as TOAST is sliced into pieces of 2000 bytes, versus
2048 bytes for large objects. And that makes a significant difference
when packing these slices because 2000*4+page overhead+
4*(row overhead) is just under the default size of 8192 bytes per page,
whereas 2048*4+(page overhead)+4*(row overhead)
is obviously a bit over 8192, since 2048*4=8192.

If the data is compressible, the difference may be less obvious because
the slices in pg_largeobject are compressed individually
(as opposed to bytea that gets compressed as a whole),
so more than 3 slices can fit in a page inside pg_largeobject.
The post-compression size can be known with pg_column_size(),
versus octet_length() that gives the pre-compression size.
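For example, the two can be compared side by side for the biggest values in a
bytea column (table and column names below are placeholders):

  psql -c "SELECT octet_length(bytes) AS raw_size, pg_column_size(bytes) AS stored_size
           FROM my_table ORDER BY octet_length(bytes) DESC LIMIT 5;"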
 
> Would you recommend bytea over LOB considering that the max LOB size is well
> bellow 1GB?
> Are bytea preferable in terms of support by the community, performance,
> feature, etc?

For the storage and pg_dump issues, bytea seems clearly preferable
in your case.
As for the performance aspect, large objects are excellent because their
API never requires a binary<->text conversion.
This may be different with bytea. The C API provided by libpq allows to
retrieve and send bytea in binary format, for instance through
PQexecParams(), but most drivers implemented on top of libpq use only 
the text representation for all datatypes, because it's simpler for them.
So you may want to check the difference in sending and retrieving
your biggest binary objects with your particular app/language/framework
stored in a bytea column versus large objects.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: pg_dump out of memory for large table with LOB

2018-11-14 Thread Ron

On 11/14/2018 11:14 AM, Jean-Marc Lessard wrote:

Adrien Nayrat wrote:
> With 17 million LO, it could eat lot of memory ;)
Yes it does.

I did several tests and here are my observations.

First memory settings are:
shared_buffers = 3GB
work_mem = 32Mb
maintenance_work_mem = 1GB
effective_cache_size = 9MB
bytea_output = 'escape'


Why escape instead of hex?


--
Angular momentum makes the world go 'round.


Re: pg_dump out of memory for large table with LOB

2018-11-14 Thread Tom Lane
Jean-Marc Lessard  writes:
> Would you recommend bytea over LOB considering that the max LOB size is well 
> bellow 1GB?

Yes, probably.  The reason that pg_dump has trouble with lots of small
BLOBs is the 9.0-era decision to treat BLOBs as independent objects
having their own owners, privilege attributes, and archive TOC entries
--- it's really the per-BLOB TOC entries that are causing the issue
for you here.  That model is fine as long as BLOBs are, uh, large.
If you're using them as replacements for bytea, the overhead is going
to be prohibitive.

regards, tom lane



Re: pg_dump out of memory for large table with LOB

2018-11-11 Thread Adrien Nayrat
Hello,

On 11/10/18 12:49 AM, Jean-Marc Lessard wrote:
> The dumped table is 0.5TB, 17 million rows and LOB uses about 99% of the 
> space.
> 

If I understand correctly, you have 17 million large objects?

I do not recall exactly and maybe I am wrong. But it seems pg_dump has to
allocate memory for each object to dump :
addBoundaryDependencies:

for (i = 0; i < numObjs; i++)
[...]

case DO_BLOB_DATA:
/* Data objects: must come between the boundaries */
addObjectDependency(dobj, preDataBound->dumpId);
addObjectDependency(postDataBound, dobj->dumpId);
break;

addObjectDependency:

[...]
pg_malloc(dobj->allocDeps * sizeof(DumpId));


With 17 million LO, it could eat a lot of memory ;)
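A rough idea of how many of those per-object allocations pg_dump will have to
make can be had by counting the large objects up front (a sketch):

  psql -At -c "SELECT count(*) FROM pg_largeobject_metadata;"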





Re: pg_dump out of memory for large table with LOB

2018-11-10 Thread Adrian Klaver

On 11/10/18 2:46 PM, Ron wrote:

On 11/09/2018 05:49 PM, Jean-Marc Lessard wrote:


I am running PostgreSQL 9.6.5 on x86_64-pc-mingw64, compiled by 
gcc.exe (Rev5, Built by MSYS2 project) 4.9.2, 64-bit


on win2012 with 12Gb RAM

The dumped table is 0.5TB, 17 million rows and LOB uses about 99% of 
the space.


The pg_dump consumes the entire system memory and swap, then 
terminates with out of memory error


Is it a bug or normal behavior?

If I do not include LOB in the dump, it works fine.

Here is the dump output:

C:\Users\Administrator> pg_dump -h localhost -Fc -a -b -t signatures 
-v > d:\postgresql\sig.dmp


pg_dump: last built-in OID is 16383

...

pg_dump: reading row security enabled for table "ibisl1.signatures"

pg_dump: reading policies for table "ibisl1.signatures"

pg_dump: reading large objects

pg_dump: reading dependency data

pg_dump: saving encoding = UTF8

pg_dump: saving standard_conforming_strings = on

out of memory



This looks similar to the recent thread "Trouble Upgrading Postgres".
https://www.postgresql.org/message-id/flat/CAFw6%3DU2oz9rTF0qa0LFMg91bu%3Dhdisfu2-xXU1%3D%3DD7yBif%2B2uw%40mail.gmail.com

Specifically, message 
ce239c9c-68f2-43e6-a6b6-81c66d0f4...@manitou-mail.org 
<https://www.postgresql.org/message-id/ce239c9c-68f2-43e6-a6b6-81c66d0f46e5%40manitou-mail.org>


"The hex expansion performed by COPY must allocate twice that size,
plus the rest of the row, and if that resulting size is above 1GB, it
will error out with the message you mentioned upthread:
ERROR: invalid memory alloc request size .
So there's no way it can deal with the contents over 500MB, and the
ones just under that limit may also be problematic."


I don't think this is the case. The above is an issue because of the maximum 
length of a string that Postgres can process. LO's are different creatures:


https://www.postgresql.org/docs/11/lo-implementation.html


It would help to see the memory configuration values set for the cluster:

https://www.postgresql.org/docs/11/lo-implementation.html




And message 89b5b622-4c79-4c95-9ad4-b16d0d0da...@manitou-mail.org

"It's undoubtedly very annoying that a database can end up with

non-pg_dump'able contents, but it's not an easy problem to solve. Some 
time ago, work was done to extend the 1GB limit but eventually it got 
scratched. The thread in [1] discusses many details of the problem and 
why the proposed solution were mostly a band aid. Basically, the specs 
of COPY and other internal aspects of Postgres are from the 32-bit era 
when putting the size of an entire CDROM in a single row/column was not 
anticipated as a valid use case. It's still a narrow use case today and 
applications that need to store big pieces of data like that should 
slice them in chunks, a bit like in pg_largeobject, except in much 
larger chunks, like 1MB.


[1] pg_dump / copy bugs with "big lines" ? 
https://www.postgresql.org/message-id/1836813.YmyOrS99PX%40ronan.dunklau.fr 
<https://www.postgresql.org/message-id/1836813.ymyors9...@ronan.dunklau.fr>


  "



*Jean-Marc Lessard*
Administrateur de base de données / Database Administrator
Ultra Electronics Forensic Technology Inc.
*T* +1 514 489 4247 x4164
www.ultra-forensictechnology.com <http://www.ultra-forensictechnology.com>



--
Angular momentum makes the world go 'round.



--
Adrian Klaver
adrian.kla...@aklaver.com



pg_dump out of memory for large table with LOB

2018-11-10 Thread Jean-Marc Lessard
I am running PostgreSQL 9.6.5 on x86_64-pc-mingw64, compiled by gcc.exe (Rev5, 
Built by MSYS2 project) 4.9.2, 64-bit
on win2012 with 12Gb RAM
The dumped table is 0.5TB, 17 million rows and LOB uses about 99% of the space.
The pg_dump consumes the entire system memory and swap, then terminates with 
out of memory error
Is it a bug or normal behavior?
If I do not include LOB in the dump, it works fine.

Here is the dump output:
C:\Users\Administrator> pg_dump -h localhost -Fc -a -b -t signatures -v > 
d:\postgresql\sig.dmp
pg_dump: last built-in OID is 16383
...
pg_dump: reading row security enabled for table "ibisl1.signatures"
pg_dump: reading policies for table "ibisl1.signatures"
pg_dump: reading large objects
pg_dump: reading dependency data
pg_dump: saving encoding = UTF8
pg_dump: saving standard_conforming_strings = on
out of memory

Jean-Marc Lessard
Administrateur de base de données / Database Administrator
Ultra Electronics Forensic Technology Inc.
T +1 514 489 4247 x4164
www.ultra-forensictechnology.com<http://www.ultra-forensictechnology.com>


Re: Out of Memory

2018-09-29 Thread Peter J. Holzer
On 2018-09-28 07:23:59 +0200, Laurenz Albe wrote:
> Rob Sargent wrote:
> > > Christoph Moench-Tegeder wrote:
> > > > ## Laurenz Albe (laurenz.a...@cybertec.at):
> > > > 
> > > > > vm.overcommit_memory = 2
> > > > > vm_overcommit_ratio = 100
> > > > > 
> > > > > Linux commits (swap * overcommit_ratio * RAM / 100),
> > > > 
> > > >  ^
> > > >  That should be a "+".
> > > 
> > > Yes; shame on me for careless typing, and thank you for the
> > > correction.
> > 
> > Are there any parentheses needed in that formula?
> 
> No.  It is swap space plus a certain percentage of RAM.
> 
> I don't know how the Linux developers came up with that
> weird formula.

I suspect they copied it from some other Unix.

Traditionally, Unix allocated all memory in the swap space. You could
say that the swap space was the "real memory" and RAM was a cache for
that (Hence the rule that swap must be at least as large as RAM and
should preferably be 2 or 4 times the size of RAM). So, when Unixes
started to allow allocating more virtual memory than swap space, they
were "overcommitting". 

But for Linux that doesn't make much sense, since a page lived either in
RAM /or/ in swap right from the start, so the limit was always RAM+swap,
not swap alone, and you are only overcommitting if you exceeded the size
of the sum. The overcommitment in Linux is of a different kind: Linux
uses copy on write wherever it can (e.g. when forking processes, but
also when mallocing memory), and a CoW page may or may not be written in
the future. It only needs additional space when it's actually written,
so by counting the page only once (hoping that there will be enough
space if and when that page is written) the kernel is overcommitting
memory. 

hp


-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 




Re: Out of Memory

2018-09-27 Thread Laurenz Albe
Rob Sargent wrote:
> > Christoph Moench-Tegeder wrote:
> > > ## Laurenz Albe (laurenz.a...@cybertec.at):
> > > 
> > > > vm.overcommit_memory = 2
> > > > vm_overcommit_ratio = 100
> > > > 
> > > > Linux commits (swap * overcommit_ratio * RAM / 100),
> > > 
> > >  ^
> > >  That should be a "+".
> > 
> > Yes; shame on me for careless typing, and thank you for the
> > correction.
> 
> Are there any parentheses needed in that formula?

No.  It is swap space plus a certain percentage of RAM.

I don't know how the Linux developers came up with that
weird formula.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Out of Memory

2018-09-27 Thread Rob Sargent



> On Sep 27, 2018, at 3:45 PM, Laurenz Albe  wrote:
> 
> Christoph Moench-Tegeder wrote:
>> ## Laurenz Albe (laurenz.a...@cybertec.at):
>> 
>>> vm.overcommit_memory = 2
>>> vm_overcommit_ratio = 100
>>> 
>>> Linux commits (swap * overcommit_ratio * RAM / 100),
>> 
>>  ^
>>  That should be a "+".
> 
> Yes; shame on me for careless typing, and thank you for the
> correction.
Are there any parentheses needed in that formula?



Re: Out of Memory

2018-09-27 Thread Laurenz Albe
Christoph Moench-Tegeder wrote:
> ## Laurenz Albe (laurenz.a...@cybertec.at):
> 
> > vm.overcommit_memory = 2
> > vm_overcommit_ratio = 100
> > 
> > Linux commits (swap * overcommit_ratio * RAM / 100),
> 
>   ^
>   That should be a "+".

Yes; shame on me for careless typing, and thank you for the
correction.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Out of Memory

2018-09-27 Thread Christoph Moench-Tegeder
## Laurenz Albe (laurenz.a...@cybertec.at):

> vm.overcommit_memory = 2
> vm_overcommit_ratio = 100
> 
> Linux commits (swap * overcommit_ratio * RAM / 100),
  ^
  That should be a "+".

See Documentation/sysctl/vm.txt and Documentation/vm/overcommit-accounting.rst
in your kernel tree.

> so without any swap the correct value would be 100.
> 
> I don't know how safe it is to go entirely without swap.

That's completely safe. Even more: if your production machine ever needs
swap, it's already mostly dead in the water. So there's also no benefit in
having the kernel copy stuff to swap "just in case" (read: set sysctl
vm.swappiness to 0) and use the swap disk space for something more
productive. If your memory usage ever exceeds available memory, something
already went horribly wrong, and you want to know about it right now.
A good way to know about this kind of mistake is having stuff fail
hard, obviously and loudly (i.e. Out Of Memory). Do not try to save
some day by swapping: stuff is "sometimes slow" and if you really need
the swap, everything crawls to a halt anyway.
Of course, this does not hold for personal workstations and
the like where you might want to use the swap space for suspend to
disk, or have some buffer in case of runaway processes.

Regards,
Christoph

-- 
Spare Space.



Re: Out of Memory

2018-09-26 Thread Laurenz Albe
greigwise wrote:
> All right.. one more thing here.  Any suggestions for how to set overcommit
> on a postgres db server with 16 GB of RAM and no swap?   I think I want
> vm.overcommit_memory = 2, as I understand that prevents the OOM killer from
> zapping me.   Is 100% the right way to go for overcommit_ratio?  Is there a
> drawback to this?

vm.overcommit_memory = 2
vm_overcommit_ratio = 100

Linux commits (swap * overcommit_ratio * RAM / 100),
so without any swap the correct value would be 100.
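Rather than recomputing the formula by hand, the limit the kernel actually
enforces can be read back directly (a sketch):

  sysctl vm.overcommit_memory vm.overcommit_ratio
  grep -E "^(CommitLimit|Committed_AS)" /proc/meminfo   # enforced limit vs. memory currently committed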

I don't know how safe it is to go entirely without swap.
I would add a little bit just to be sure.

Yours,
Laurenz Albe




Re: Out of Memory

2018-09-26 Thread greigwise
All right.. one more thing here.  Any suggestions for how to set overcommit
on a postgres db server with 16 GB of RAM and no swap?   I think I want
vm.overcommit_memory = 2, as I understand that prevents the OOM killer from
zapping me.   Is 100% the right way to go for overcommit_ratio?  Is there a
drawback to this?

Thanks again.
Greig



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html



Re: Out of Memory

2018-09-26 Thread greigwise
I think I figured it out:

vm.overcommit_memory = 2
vm.overcommit_ratio = 50

Only allows me to use 50% of my RAM... ugh!  I have 16 GB, so when only 8 is
left, I start seeing OOM.  Will increase this setting and see if it helps.

Thanks everyone for the help.

Greig



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html



Re: Out of Memory

2018-09-26 Thread greigwise
Tom Lane wrote:
> greigwise  writes:
>> Is it possible that the fact that my stack size is limited is what is
>> causing my issue?
> 
> No.  If you were hitting that limit you'd get a message specifically
> talking about stack.
> 
>   regards, tom lane

Well, darn.  I'm at a loss... any suggestions of what I should do next to
troubleshoot this?

Thanks.
Greig



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html



Re: Out of Memory

2018-09-26 Thread Tom Lane
greigwise  writes:
> Is it possible that the fact that my stack size is limited is what is
> causing my issue?

No.  If you were hitting that limit you'd get a message specifically
talking about stack.

regards, tom lane



Re: Out of Memory

2018-09-26 Thread greigwise
There is also this:

-bash-4.2$ prlimit -p 6590
RESOURCE   DESCRIPTION                             SOFT      HARD UNITS
AS         address space limit                unlimited unlimited bytes
CORE       max core file size                         0 unlimited blocks
CPU        CPU time                           unlimited unlimited seconds
DATA       max data size                      unlimited unlimited bytes
FSIZE      max file size                      unlimited unlimited blocks
LOCKS      max number of file locks held      unlimited unlimited
MEMLOCK    max locked-in-memory address space     65536     65536 bytes
MSGQUEUE   max bytes in POSIX mqueues            819200    819200 bytes
NICE       max nice prio allowed to raise             0         0
NOFILE     max number of open files                1024      4096
NPROC      max number of processes                 4096     59341
RSS        max resident set size              unlimited unlimited pages
RTPRIO     max real-time priority                     0         0
RTTIME     timeout for real-time tasks        unlimited unlimited microsecs
SIGPENDING max number of pending signals          59341     59341
STACK      max stack size                       8388608 unlimited bytes

Is it possible that the fact that my stack size is limited is what is
causing my issue?

Thanks,
Greig



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html



Re: Out of Memory

2018-09-25 Thread Tory M Blue
On Tue, Sep 25, 2018 at 2:05 PM PT  wrote:

> On Tue, 25 Sep 2018 11:34:19 -0700 (MST)
> greigwise  wrote:
>
> > Well, I've been unsuccessful so far on creating a standalone test.
> >
> > I have put some scripting in place to capture some additional
> information on
> > the server with the out of memory issues.   I have a script which just
> > periodically dumps the output of free -m to a text file.
> >
> > So, the output of free -m immediately before and after the out of memory
> > error looks like this:
> >
> > Just before:
> >               total        used        free      shared  buff/cache   available
> > Mem:          14877        2978         132        3553       11766        7943
> > Swap:             0           0           0
> >
> > Just after:
> >               total        used        free      shared  buff/cache   available
> > Mem:          14877        2946         649        3548       11280        7982
> > Swap:             0           0           0
> >
> > If I have nearly 8 GB of memory left, why am I getting out of memory
> errors?
>
> Doesn't the default NUMA setup mean that it can't actually allocate all
> the available memory to a single NUMA zone (or whatever it's called)?
>
> Or am I talking ancient Linux history with that?
>
> --
> Bill Moran
>
>
By the way we have similar issues running in our smaller stage
environment.  9.5.6 on CentOS 7.2

Only ever my stage environment on smaller AWS t2 boxen. Memory looks fine
but Postgres says it has no access to any. And my settings are really low.

Not sure if it's the same issue, but we run into it a few times a year in
my stage environment. Also running postgres in unlimited mode for ulimit.

Tory


Re: Out of Memory

2018-09-25 Thread PT
On Tue, 25 Sep 2018 11:34:19 -0700 (MST)
greigwise  wrote:

> Well, I've been unsuccessful so far on creating a standalone test.   
> 
> I have put some scripting in place to capture some additional information on
> the server with the out of memory issues.   I have a script which just
> periodically dumps the output of free -m to a text file.  
> 
> So, the output of free -m immediately before and after the out of memory
> error looks like this:
> 
> Just before:
>               total        used        free      shared  buff/cache   available
> Mem:          14877        2978         132        3553       11766        7943
> Swap:             0           0           0
> 
> Just after: 
>               total        used        free      shared  buff/cache   available
> Mem:          14877        2946         649        3548       11280        7982
> Swap:             0           0           0
> 
> If I have nearly 8 GB of memory left, why am I getting out of memory errors?

Doesn't the default NUMA setup mean that it can't actually allocate all
the available memory to a single NUMA zone (or whatever it's called)?

Or am I talking ancient Linux history with that?

-- 
Bill Moran



Re: Out of Memory

2018-09-25 Thread greigwise
Tom Lane wrote:
> greigwise  writes:
>> If I have nearly 8 GB of memory left, why am I getting out of memory
>> errors?
> 
> Probably the postmaster is running under restrictive ulimit settings.
> 
>   regards, tom lane

If I login as the user which runs postmaster, I get this:

-bash-4.2$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 59341
max locked memory   (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) 4096
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

Max memory size is unlimited as is virtual memory... is there something else
there I should be concerned with in regard to out of memory?
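One more thing worth checking is that the running postmaster really has those
limits, since a service started by systemd or an init script can run under
different ulimits than a login shell; a sketch, with an assumed data directory
path:

  head -1 /var/lib/pgsql/9.6/data/postmaster.pid            # first line is the postmaster PID
  cat /proc/$(head -1 /var/lib/pgsql/9.6/data/postmaster.pid)/limits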

Thanks again.
Greig



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html



Re: Out of Memory

2018-09-25 Thread Tom Lane
greigwise  writes:
> If I have nearly 8 GB of memory left, why am I getting out of memory errors?

Probably the postmaster is running under restrictive ulimit settings.

regards, tom lane



Re: Out of Memory

2018-09-25 Thread greigwise
Well, I've been unsuccessful so far on creating a standalone test.   

I have put some scripting in place to capture some additional information on
the server with the out of memory issues.   I have a script which just
periodically dumps the output of free -m to a text file.  

So, the output of free -m immediately before and after the out of memory
error looks like this:

Just before:
              total        used        free      shared  buff/cache   available
Mem:          14877        2978         132        3553       11766        7943
Swap:             0           0           0

Just after: 
              total        used        free      shared  buff/cache   available
Mem:          14877        2946         649        3548       11280        7982
Swap:             0           0           0

If I have nearly 8 GB of memory left, why am I getting out of memory errors?

Thanks,
Greig



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html



Re: Out of Memory

2018-09-21 Thread greigwise
Well, we are 64-bit. I'll see if I can make some kind of self-contained
test to repeat it.

Thanks,
Greig



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html



Re: Out of Memory

2018-09-20 Thread Tom Lane
greigwise  writes:
> Hello, I'm running postgres 9.6.10 on Centos 7.   Seeing the occasional out
> of memory error trying to run a query.   In the logs I see something like
> this:

> Grand total: 462104832 bytes in 795 blocks; 142439136 free (819860 chunks);
> 319665696 used
> 2018-09-20 18:08:01 UTC  5ba3e1a2.7a8a dbname ERROR:  out of memory
> 2018-09-20 18:08:01 UTC  5ba3e1a2.7a8a dbname DETAIL:  Failed on request
> of size 2016.

> If I have 142439136 free, then why am I failing on a request of size 2016?

The free space must be in contexts other than the one that last little
request wanted space in.  Overall, you've got about 460MB of space
consumed in that session, so it's not *that* surprising that you got OOM.
(At least, it's unsurprising on a 32-bit machine.  If the server is
64-bit I'd have thought the kernel would be a bit more liberal.)

But anyway, this looks like a mighty inefficient usage pattern at best,
and maybe a memory leak at worst.  Can you create a self-contained test
case that does this?

regards, tom lane



Re: Out of Memory

2018-09-20 Thread greigwise
ulimit -a for the postgres user shows memory unlimited.

numactl --hardware gives command not found.

Thanks again.

Greig



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html



Re: Out of Memory

2018-09-20 Thread Paul Carlucci
My first two guesses are ulimit or numa.  numactl --hardware will show your
how many nodes your box has and if you're exhausting any of them.

On Thu, Sep 20, 2018, 6:11 PM greigwise  wrote:

> Hello, I'm running postgres 9.6.10 on Centos 7.   Seeing the occasional out
> of memory error trying to run a query.   In the logs I see something like
> this:
>
> Grand total: 462104832 bytes in 795 blocks; 142439136 free (819860 chunks);
> 319665696 used
> 2018-09-20 18:08:01 UTC  5ba3e1a2.7a8a dbname ERROR:  out of memory
> 2018-09-20 18:08:01 UTC  5ba3e1a2.7a8a dbname DETAIL:  Failed on
> request
> of size 2016.
>
> If I have 142439136 free, then why am I failing on a request of size 2016?
>
> Am I misunderstanding here?
>
> Thanks,
> Greig Wise
>
>
>
> --
> Sent from:
> http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html
>
>


Out of Memory

2018-09-20 Thread greigwise
Hello, I'm running postgres 9.6.10 on Centos 7.   Seeing the occasional out
of memory error trying to run a query.   In the logs I see something like
this:

Grand total: 462104832 bytes in 795 blocks; 142439136 free (819860 chunks);
319665696 used
2018-09-20 18:08:01 UTC  5ba3e1a2.7a8a dbname ERROR:  out of memory
2018-09-20 18:08:01 UTC  5ba3e1a2.7a8a dbname DETAIL:  Failed on request
of size 2016.

If I have 142439136 free, then why am I failing on a request of size 2016?

Am I misunderstanding here?

Thanks,
Greig Wise



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html



Re: pg_dump out of memory

2018-07-04 Thread Andy Colson

On 07/04/2018 12:31 AM, David Rowley wrote:

On 4 July 2018 at 14:43, Andy Colson  wrote:

I moved a physical box to a VM, and set its memory to 1Gig.  Everything
runs fine except one backup:


/pub/backup# pg_dump -Fc -U postgres -f wildfire.backup wildfire

pg_dump: Dumping the contents of table "ofrrds" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR:  out of memory
DETAIL:  Failed on request of size 1073741823.
pg_dump: The command was: COPY public.ofrrds (id, updateddate, bytes) TO
stdout;


There will be less memory pressure on the server if the pg_dump was
performed from another host. When running pg_dump locally the 290MB
bytea value will be allocated in both the backend process pg_dump is
using and pg_dump itself. Running the backup remotely won't require
the latter to be allocated on the server.
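In other words, something along these lines run from a machine with enough
free memory (the host name here is a placeholder):

  pg_dump -h db-server -U postgres -Fc -f wildfire.backup wildfire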


I've been reducing my memory settings:

maintenance_work_mem = 80MB
work_mem = 5MB
shared_buffers = 200MB


You may also get it to work by reducing shared_buffers further.
work_mem won't have any affect, neither will maintenance_work_mem.

Failing that, the suggestions of more RAM and/or swap look good.



Adding more RAM to the VM is the simplest option.  It just seems a waste because 
of one backup.

Thanks all.

-Andy



Re: pg_dump out of memory

2018-07-03 Thread George Neuner
On Tue, 3 Jul 2018 21:43:38 -0500, Andy Colson 
wrote:

>Hi All,
>
>I moved a physical box to a VM, and set its memory to 1Gig.  Everything
>runs fine except one backup:
>
>
>/pub/backup# pg_dump -Fc -U postgres -f wildfire.backup wildfire
>
>pg_dump: Dumping the contents of table "ofrrds" failed: PQgetResult() failed.
>pg_dump: Error message from server: ERROR:  out of memory
>DETAIL:  Failed on request of size 1073741823.
^^

pg_dump is trying to allocate 1GB.  Obviously it can't if 1GB is all
you have.


>pg_dump: The command was: COPY public.ofrrds (id, updateddate, bytes) TO
>stdout;
>
>wildfire=# \dt+ ofrrds
>List of relations
>  Schema |  Name  | Type  | Owner | Size  | Description
>++---+---+---+-
>  public | ofrrds | table | andy  | 15 MB |
>
>
>wildfire=# \d ofrrds
>   Table "public.ofrrds"
>Column|  Type  | Modifiers
>-++---
>  id  | character varying(100) | not null
>  updateddate | bigint | not null
>  bytes   | bytea  |
>Indexes:
> "ofrrds_pk" PRIMARY KEY, btree (id)
>

There must be a heck of a lot of data in that bytea column.


>I'm not sure how to get this backup to run.  Any hints would be appreciated.

As Adrian mentioned already, you're going to have to give it more
memory somehow.  Either more RAM or a big swap file.
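If adding RAM to the VM is inconvenient, a temporary swap file is often enough
to get one oversized dump through; a sketch, with an arbitrary size and path:

  dd if=/dev/zero of=/swapfile bs=1M count=2048
  chmod 600 /swapfile && mkswap /swapfile && swapon /swapfile
  # and once the dump has finished:
  swapoff /swapfile && rm /swapfile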

George




Re: pg_dump out of memory

2018-07-03 Thread Adrian Klaver

On 07/03/2018 08:28 PM, Andy Colson wrote:

On 07/03/2018 10:21 PM, Adrian Klaver wrote:

On 07/03/2018 07:43 PM, Andy Colson wrote:

Hi All,

I moved a physical box to a VM, and set its memory to 1Gig.  Everything
runs fine except one backup:


/pub/backup# pg_dump -Fc -U postgres -f wildfire.backup wildfire

pg_dump: Dumping the contents of table "ofrrds" failed: PQgetResult() failed.

pg_dump: Error message from server: ERROR:  out of memory
DETAIL:  Failed on request of size 1073741823.
pg_dump: The command was: COPY public.ofrrds (id, updateddate, bytes) TO
stdout;



I'm not sure how to get this backup to run.  Any hints would be 
appreciated.


Maybe:

1) Try:
 pg_dump -t ofrrds
    to dump only that table.



It didn't work.  I get the same error.


Well all I can think of is to give the VM more memory.



Also, I'm running Slackware 14.2, and PG 9.5.11

-Andy






--
Adrian Klaver
adrian.kla...@aklaver.com



Re: pg_dump out of memory

2018-07-03 Thread Andy Colson

On 07/03/2018 10:21 PM, Adrian Klaver wrote:

On 07/03/2018 07:43 PM, Andy Colson wrote:

Hi All,

I moved a physical box to a VM, and set its memory to 1Gig.  Everything
runs fine except one backup:


/pub/backup# pg_dump -Fc -U postgres -f wildfire.backup wildfire

pg_dump: Dumping the contents of table "ofrrds" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR:  out of memory
DETAIL:  Failed on request of size 1073741823.
pg_dump: The command was: COPY public.ofrrds (id, updateddate, bytes) TO
stdout;



I'm not sure how to get this backup to run.  Any hints would be appreciated.


Maybe:

1) Try:
 pg_dump -t ofrrds
    to dump only that table.



It didn't work.  I get the same error.

Also, I'm running Slackware 14.2, and PG 9.5.11

-Andy





Re: pg_dump out of memory

2018-07-03 Thread Adrian Klaver

On 07/03/2018 07:43 PM, Andy Colson wrote:

Hi All,

I moved a physical box to a VM, and set its memory to 1Gig.  Everything
runs fine except one backup:


/pub/backup# pg_dump -Fc -U postgres -f wildfire.backup wildfire

pg_dump: Dumping the contents of table "ofrrds" failed: PQgetResult() failed.

pg_dump: Error message from server: ERROR:  out of memory
DETAIL:  Failed on request of size 1073741823.
pg_dump: The command was: COPY public.ofrrds (id, updateddate, bytes) TO
stdout;

I've been reducing my memory settings:

maintenance_work_mem = 80MB
work_mem = 5MB
shared_buffers = 200MB

But it doesn't seem to make a difference.
The table looks like:

wildfire=# \dt+ ofrrds
    List of relations
  Schema |  Name  | Type  | Owner | Size  | Description
++---+---+---+-
  public | ofrrds | table | andy  | 15 MB |


wildfire=# \d ofrrds
   Table "public.ofrrds"
    Column    |  Type  | Modifiers
-++---
  id  | character varying(100) | not null
  updateddate | bigint | not null
  bytes   | bytea  |
Indexes:
     "ofrrds_pk" PRIMARY KEY, btree (id)


wildfire=# select id, length(bytes) from ofrrds;
        id         |  length
-------------------+-----------
 muc_rooms         | 152330241
 packet_count      |  76165121
 muc_users         |  76165121
 sessions          |  76165121
 muc_occupants     |   9520641
 muc_traffic       |
 server_bytes      | 301850625
 proxyTransferRate |  76165121
 server_sessions   | 152330241
 conversations     | 304660481
(10 rows)
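
For what it's worth, those lengths line up with the failing request size (my own
rough arithmetic, so treat it as a sanity check rather than a diagnosis): with the
default bytea_output = 'hex', COPY renders each bytea as '\x' plus two hex characters
per byte, so the largest rows need server-side output buffers of roughly 600 MB, and
that buffer grows by doubling until it is capped at the 1 GB allocation limit, which
matches the "request of size 1073741823" in the error.

  select id,
         length(bytes)         as raw_bytes,
         2 * length(bytes) + 2 as approx_copy_text_bytes
  from ofrrds
  order by raw_bytes desc nulls last;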


I'm not sure how to get this backup to run.  Any hints would be 
appreciated.


Maybe:

1) Try:
pg_dump -t ofrrds
   to dump only that table.

2) If that works then:

pg_dump -T ofrrds
   to dump everything but that table.
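
Spelled out against the command from the original report, those two steps would look
something like this (a sketch; the dump file names here are made up):

  pg_dump -Fc -U postgres -t ofrrds -f ofrrds.backup wildfire
  pg_dump -Fc -U postgres -T ofrrds -f wildfire-rest.backup wildfire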



-Andy







--
Adrian Klaver
adrian.kla...@aklaver.com



pg_dump out of memory

2018-07-03 Thread Andy Colson

Hi All,

I moved a physical box to a VM, and set its memory to 1Gig.  Everything
runs fine except one backup:


/pub/backup# pg_dump -Fc -U postgres -f wildfire.backup wildfire

pg_dump: Dumping the contents of table "ofrrds" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR:  out of memory
DETAIL:  Failed on request of size 1073741823.
pg_dump: The command was: COPY public.ofrrds (id, updateddate, bytes) TO
stdout;

I've been reducing my memory settings:

maintenance_work_mem = 80MB
work_mem = 5MB
shared_buffers = 200MB

But it doesn't seem to make a difference.
The table looks like this:

wildfire=# \dt+ ofrrds
                  List of relations
 Schema |  Name  | Type  | Owner | Size  | Description
--------+--------+-------+-------+-------+-------------
 public | ofrrds | table | andy  | 15 MB |


wildfire=# \d ofrrds
          Table "public.ofrrds"
   Column    |          Type          | Modifiers
-------------+------------------------+-----------
 id          | character varying(100) | not null
 updateddate | bigint                 | not null
 bytes       | bytea                  |
Indexes:
    "ofrrds_pk" PRIMARY KEY, btree (id)


wildfire=# select id, length(bytes) from ofrrds;
        id         |  length
-------------------+-----------
 muc_rooms         | 152330241
 packet_count      |  76165121
 muc_users         |  76165121
 sessions          |  76165121
 muc_occupants     |   9520641
 muc_traffic       |
 server_bytes      | 301850625
 proxyTransferRate |  76165121
 server_sessions   | 152330241
 conversations     | 304660481
(10 rows)


I'm not sure how to get this backup to run.  Any hints would be appreciated.

-Andy





Re: Out of memory error with PG10.3, 10.4 but not 9.3.19

2018-06-12 Thread Christophe combet
 On Wednesday, 16 May 2018 at 09:48:54 UTC+2, ChatPristi wrote:
 
 
 Dear all,

I have a SELECT command (on partitioned tables) that failed with:

psql:/tmp/query.txt:1: ERROR:  out of memory
DETAIL:  Cannot enlarge string buffer containing 1073741818 bytes by 32 more bytes.

I got the error with PG 10.3 and 10.4 on CentOS 7 up-to-date. The command works
with a smaller database.
The command works with the same database with PG 9.3.19 on RHEL 6.9 up-to-date.

I attach the EXPLAIN SELECT command.

Apart from rewriting the query, is there any parameter in postgresql.conf that could
be changed to make the query work?
Thank you very much for any help.

Hello,
Any ideas where the OOM comes from in 10.4 while 9.3 is working?

Thanks.

  

Fw: Re: Out of memory error with PG10.3, 10.4 but not 9.3.19

2018-06-06 Thread ChatPristi
 Forgot to CC the list, sorry...

>Well, instead of an explain output which takes 2.4MB compressed and
>9.6MB uncompressed (take it as unreadable), could you produce a
>self-contained test case with a glimpse of the schema you are using?
>Where does the OOM happen, and how did you change your partitioned table
>schema?  Are you using the native partitioning instead?

Michael,
Thank you for your answer.

Sorry for the unreadable explain output.
I attached a SQL dump with 2 entities loaded in the database (2,872,265 
entities in the actual database), the actual query and the actual output.

The OOM happens during the query (SELECT): after about 9 minutes the memory of the
postgres process increases to 8 GB and the OOM message appears.

Partitioning is done by inheritance.
After a complete reload of the database into PG 10.4 the OOM still exists.
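
For anyone following along, the two styles being distinguished here look roughly like
this (a sketch; the table and column names are made up, not taken from the attached
dump):

  -- inheritance-based partitioning (what this report describes)
  create table events (id bigint, created date, payload text);
  create table events_2018 (
      check (created >= date '2018-01-01' and created < date '2019-01-01')
  ) inherits (events);

  -- native (declarative) partitioning, available from PG 10
  create table events_p (id bigint, created date, payload text)
      partition by range (created);
  create table events_p_2018 partition of events_p
      for values from ('2018-01-01') to ('2019-01-01');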


--
Michael


Re: Out of memory error with PG10.3, 10.4 but not 9.3.19

2018-05-18 Thread Peter J. Holzer
On 2018-05-15 15:02:48 +, ChatPristi wrote:
> I have a SELECT command (in partitionned tables) that failed with:
> psql:/tmp/query.txt:1: ERROR:  out of memory
> DETAIL:  Cannot enlarge string buffer containing 1073741818 bytes by 32 more
> bytes.

1073741818 is a bit less than 1GB and 1073741818+32 is a bit more. So
you are obviously hitting a 1GB limit here.
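
Worked out (my numbers, for anyone checking): a single palloc chunk is capped at
MaxAllocSize = 2^30 - 1 = 1,073,741,823 bytes, and

    1,073,741,818 + 32 = 1,073,741,850 > 1,073,741,823

so enlarging that buffer by even 32 more bytes has to fail.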

Given that 1GB is the maximum length of a character type value in
PostgreSQL and the error message mentions a "string buffer", I suspect
that your query tries to construct a very long string. Try to rewrite
the query so that it creates several shorter strings instead.
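
As a purely hypothetical illustration (nothing below is from the original query, which
was never posted in readable form): if the statement builds one giant string, for
example with string_agg(), splitting the aggregation by some key keeps each result
value well under the limit.

  -- a single aggregate can blow past the 1 GB text limit
  select string_agg(payload, ',') from big_table;

  -- the same data as one much shorter string per group
  select grp, string_agg(payload, ',') from big_table group by grp;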

hp




-- 
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | h...@hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>




Re: Out of memory error with PG10.3, 10.4 but not 9.3.19

2018-05-16 Thread Michael Paquier
On Tue, May 15, 2018 at 03:02:48PM +, ChatPristi wrote:
> I got the error with PG 10.3 and 10.4 on CentOS 7 up-to-date.The
> command works with a smaller size database. 
> The command works with the same database with PG 9.3.19 on RHEL 6.9
> up-to-date.
> 
> I attach the EXPLAIN SELECT command.

Well, instead of an explain output which takes 2.4MB compressed and
9.6MB uncompressed (take it as unreadable), could you produce a
self-contained test case with a glimpse of the schema you are using?
Where does the OOM happen, and how did you change your partitioned table
schema?  Are you using the native partitioning instead?
--
Michael




Re: error 53200 out of memory

2018-03-20 Thread Adrian Klaver

On 03/20/2018 12:08 AM, francis cherat wrote:

Hello,

there is no message in /var/log/messages


How about the Postgres logs?
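
If it is not obvious where those end up, the server itself can tell you (a sketch; run
from psql as a superuser on the 9.3 cluster in question):

  show logging_collector;
  show log_destination;
  show log_directory;
  show log_filename;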



Regards



--
Adrian Klaver
adrian.kla...@aklaver.com



RE: error 53200 out of memory

2018-03-20 Thread francis cherat
Hello,

there is no message in /var/log/messages

Regards

From: francis cherat <renault...@hotmail.com>
Sent: Friday, 16 March 2018 20:27
To: Adrian Klaver; pgsql-gene...@postgresql.org
Subject: RE: error 53200 out of memory

I don't think so, but I am not in my office and can't connect to this server.
I will answer you on Monday.

Regards

From: Adrian Klaver <adrian.kla...@aklaver.com>
Sent: Friday, 16 March 2018 16:26
To: francis cherat; pgsql-gene...@postgresql.org
Subject: Re: error 53200 out of memory

On 03/16/2018 04:00 AM, francis cherat wrote:
> Hello,
>
> we got error 53200 after a SQL statement
>
> [5-1] ERROR:  53200: out of memory
> [6-1] DETAIL:  Failed on request of size 1572864.
> [7-1] LOCATION:  AllocSetRealloc, aset.c:973
>
> in jboss logs we have got those errors
>
> org.jboss.logging.jboss-logging - 3.1.2.GA | ERROR: out of memory
> Detail: Failed on request of size 1572864.
> executing failed
> org.hibernate.exception.GenericJDBCException: could not extract ResultSet
>
> cluster configuration
> server_version       | 9.3.16
> effective_cache_size | 12800MB
> shared_buffers       | 384MB
> work_mem             | 384MB
>
> Server configuration
> RHEL 6.5
> RAM: 16 GB
> 2 CPUs
>
> Thanks for your feedback

Is there anything in the Postgres and/or system logs from the same time
that might shed light on this?



--
Adrian Klaver
adrian.kla...@aklaver.com


RE: error 53200 out of memory

2018-03-16 Thread francis cherat
I don't think so, but I am not in my office and can't connect to this server.
I will answer you on Monday.

Regards

From: Adrian Klaver <adrian.kla...@aklaver.com>
Sent: Friday, 16 March 2018 16:26
To: francis cherat; pgsql-gene...@postgresql.org
Subject: Re: error 53200 out of memory

On 03/16/2018 04:00 AM, francis cherat wrote:
> Hello,
>
> we got error 53200 after a SQL statement
>
> [5-1] ERROR:  53200: out of memory
> [6-1] DETAIL:  Failed on request of size 1572864.
> [7-1] LOCATION:  AllocSetRealloc, aset.c:973
>
> in jboss logs we have got those errors
>
> org.jboss.logging.jboss-logging - 3.1.2.GA | ERROR: out of memory
> Detail: Failed on request of size 1572864.
> executing failed
> org.hibernate.exception.GenericJDBCException: could not extract ResultSet
>
> cluster configuration
> server_version       | 9.3.16
> effective_cache_size | 12800MB
> shared_buffers       | 384MB
> work_mem             | 384MB
>
> Server configuration
> RHEL 6.5
> RAM: 16 GB
> 2 CPUs
>
> Thanks for your feedback

Is there anything in the Postgres and/or system logs from the same time
that might shed light on this?



--
Adrian Klaver
adrian.kla...@aklaver.com

