Re: Out of Memory errors are frustrating as heck!

2019-08-23 Thread Gunther
Hi all, I am connecting to a discussion back from April this year. My 
data has grown and now I am running into new out of memory situations. 
Meanwhile the world has turned from 11.2 to 11.5, which I just installed, 
only to find the same out of memory error.


Have any of the things discussed and proposed, especially this last one 
by Tomas Vondra, been applied to the 11 releases? Should I try these 
older patches from April?


regards,
-Gunther

For what it is worth, this is what I am getting:

TopMemoryContext: 67424 total in 5 blocks; 7184 free (7 chunks); 60240 used
pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 416 free (0 chunks); 7776 used
TopTransactionContext: 8192 total in 1 blocks; 7720 free (1 chunks); 472 used
Operator lookup cache: 24576 total in 2 blocks; 10760 free (3 chunks); 13816 used
TableSpace cache: 8192 total in 1 blocks; 2096 free (0 chunks); 6096 used
Type information cache: 24352 total in 2 blocks; 2624 free (0 chunks); 21728 used
RowDescriptionContext: 8192 total in 1 blocks; 6896 free (0 chunks); 1296 used
MessageContext: 8388608 total in 11 blocks; 3094872 free (4 chunks); 5293736 used
JoinRelHashTable: 16384 total in 2 blocks; 5576 free (1 chunks); 10808 used
Operator class cache: 8192 total in 1 blocks; 560 free (0 chunks); 7632 used
smgr relation table: 32768 total in 3 blocks; 12720 free (8 chunks); 20048 used
TransactionAbortContext: 32768 total in 1 blocks; 32512 free (0 chunks); 256 used
Portal hash: 8192 total in 1 blocks; 560 free (0 chunks); 7632 used
TopPortalContext: 8192 total in 1 blocks; 7664 free (0 chunks); 528 used
PortalContext: 1024 total in 1 blocks; 624 free (0 chunks); 400 used:
ExecutorState: 202528536 total in 19 blocks; 433464 free (12 chunks); 202095072 used
HashTableContext: 8192 total in 1 blocks; 7656 free (0 chunks); 536 used
HashBatchContext: 10615104 total in 261 blocks; 7936 free (0 chunks); 10607168 used
HashTableContext: 8192 total in 1 blocks; 7688 free (1 chunks); 504 used
HashBatchContext: 13079304 total in 336 blocks; 7936 free (0 chunks); 13071368 used
TupleSort main: 49208 total in 3 blocks; 8552 free (7 chunks); 40656 used
Caller tuples: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
Subplan HashTable Temp Context: 1024 total in 1 blocks; 768 free (0 chunks); 256 used
Subplan HashTable Context: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
Subplan HashTable Temp Context: 1024 total in 1 blocks; 768 free (0 chunks); 256 used
Subplan HashTable Context: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
Subp
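
Side note: the per-context statistics above are what PostgreSQL writes to the 
server log when a backend fails to allocate memory. On 11.x the same dump can 
also be produced on demand from a running backend with a debugger 
(pg_log_backend_memory_contexts() only exists from PostgreSQL 14 on). A rough 
sketch, assuming gdb and debug symbols are available; the output goes to that 
backend's stderr, i.e. the server log:

# attach to the suspect backend ($BACKEND_PID is a placeholder for its process id)
gdb -p $BACKEND_PID
(gdb) call MemoryContextStats(TopMemoryContext)
(gdb) detach
(gdb) quit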

Re: Out of Memory errors are frustrating as heck!

2019-08-23 Thread Tom Lane
Gunther  writes:
> Hi all, I am connecting to a discussion back from April this year. My 
> data has grown and now I am running into new out of memory situations. 

It doesn't look like this has much of anything to do with the hash-table
discussion.  The big hog is an ExprContext:

> ExprContext: 1107296256 total in 142 blocks; 6328 free (101 chunks); 
> 1107289928 used

So there's something leaking in there, but this isn't enough info
to guess what.

regards, tom lane




Re: Out of Memory errors are frustrating as heck!

2019-08-23 Thread Gunther
OK, I went back through that old thread, and I noticed an early opinion 
by a certain Peter who said that I should provision some swap space. 
Since I had plenty of disk and no other option, I tried that. And it did 
some magic. This is a steady state now:


top - 14:07:32 up 103 days,  9:57,  5 users,  load average: 1.33, 1.05, 0.54
Tasks: 329 total,   2 running, 117 sleeping,   0 stopped,   0 zombie
%Cpu(s): 31.0 us, 11.4 sy,  0.0 ni, 35.3 id, 22.3 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  7910376 total,   120524 free,  2174940 used,  5614912 buff/cache
KiB Swap: 16777212 total, 16777212 free,        0 used.  3239724 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 5068 postgres  20   0 4352496   4.0g   2.0g R  76.4 52.6   3:01.39 postgres: postgres integrator [local] INSERT
  435 root      20   0       0      0      0 S   4.0  0.0  10:52.38 [kswapd0]

And the nice thing is, the backend server process appears to be bounded 
at 4 GB, so there isn't really a "memory leak". Also, the swap space 
isn't actually being used. This may have to do with the vm.* sysctl 
settings for overcommit, etc.


 * vm.overcommit_memory = 2 -- values are
 o 0 -- estimate free memory
 o 1 -- always assume there is enough memory
 o 2 -- no over-commit; allocate only within the following two parameters
 * vm.overcommit_kbytes = 0 -- how many kB above swap can be over-committed, EITHER this OR
 * vm.overcommit_ratio = 50 -- percent of main memory that can be committed over swap,
 o with 0 swap, that percent can be committed
 o i.e., with this, of 8 GB, 4 GB are reserved for the buffer cache
 o not a good idea probably
 o at least we should allow 75% committed, i.e., 6 GB of 8 GB, leaving
 + 2 GB of buffer cache
 + 2 GB of shared buffers
 + 4 GB of all other memory

I have vm.overcommit_memory = 2, _kbytes = 0, _ratio = 50. So this means 
that with _ratio = 50 I can commit 50% of main memory, i.e. 4 GB, and this 
is exactly what the server process wants. So with little impact on the 
available buffer cache I am in a fairly good position now. The swap (which 
in my case I set at 2 x main memory = 16 GB) serves as a buffer to smooth 
out this peak usage without ever actually paging.


I suppose even without swap I could have set vm.overcommit_ratio = 75, 
and I notice now that I had already noted as much (the above bullet 
points are my own notes).
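
For reference, with vm.overcommit_memory = 2 and vm.overcommit_kbytes = 0 the 
kernel's commit limit works out to roughly swap + RAM * overcommit_ratio / 100, 
so here 16 GB + 8 GB * 0.50 = 20 GB of committable address space. A minimal 
sketch for inspecting and adjusting this (the value 75 is only the example 
discussed above, not a recommendation):

# show the current overcommit settings and the resulting accounting
sysctl vm.overcommit_memory vm.overcommit_ratio vm.overcommit_kbytes
grep -E 'CommitLimit|Committed_AS' /proc/meminfo

# raise the ratio for the running system; put the same setting in
# /etc/sysctl.conf (or a file under /etc/sysctl.d/) to persist it
sysctl -w vm.overcommit_ratio=75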


Anyway, for now, I am good. Thank you very much.

regards,
-Gunther




pg_basebackup is taking an unusually long time with Postgres 11.3

2019-08-23 Thread andy andy
Hi Folks,


I am having trouble setting up replication with Postgres 11.3.
pg_basebackup is taking an unusually long time for a small Postgres
database. Is there anything wrong in my configuration, or anything I could
do to speed up pg_basebackup?


I recently upgraded from Postgres 9.2.1, using a similar postgres
configuration apart from some updates to the config for Postgres 11.3. I am
using pg_basebackup to replicate from the master, over a secure ssh tunnel
between master and slave; i.e., there is an ssh tunnel that forwards data
from localhost port 5433 on the slave to the master server's port 5432.
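
A tunnel of that shape could be set up roughly like this (the host name
"master-host" and the OS user are placeholders, not taken from the setup
described above):

# run on the slave: forward local port 5433 through ssh to port 5432 on the master
ssh -N -f -L 5433:localhost:5432 postgres@master-host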


pg_basebackup is taking about 30 seconds.

c12-array2-c1:/# du ./path/to/database

249864  ./nimble/var/private/config/versions/group/sodb
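
(du reports 1 KiB blocks by default, so 249864 corresponds to roughly 244 MB.)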


pg_basebackup -D $PGSQL_BASEBKUP_PATH -U $DBUSER -c fast -l $backup_name -h localhost -p 5433 --wal-method=stream -Pv -s 10


postgresql.conf:

….

max_connections = 100   # (change requires restart)

# Note:  Increasing max_connections costs ~400 bytes of shared memory per

# connection slot, plus lock space (see max_locks_per_transaction).

#superuser_reserved_connections = 3 # (change requires restart)

unix_socket_directories = '/var/run/postgresql'   # (change requires restart)

…..

# - Security and Authentication -

#authentication_timeout = 1min  # 1s-600s

#ssl = off  # (change requires restart)

#ssl_ciphers = 'ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH'  # allowed SSL ciphers

# (change requires restart)

#ssl_renegotiation_limit = 512MB  # amount of data between renegotiations

#ssl_cert_file = 'server.crt'   # (change requires restart)

#ssl_key_file = 'server.key'# (change requires restart)

#ssl_ca_file = ''   # (change requires restart)

#ssl_crl_file = ''  # (change requires restart)

#password_encryption = on

#db_user_namespace = off

# Kerberos and GSSAPI

#krb_server_keyfile = ''

#krb_srvname = 'postgres'   # (Kerberos only)

#krb_caseins_users = off

# - TCP Keepalives -

# see "man 7 tcp" for details

#tcp_keepalives_idle = 0# TCP_KEEPIDLE, in seconds;

# 0 selects the system default

#tcp_keepalives_interval = 0# TCP_KEEPINTVL, in seconds;

# 0 selects the system default

#tcp_keepalives_count = 0   # TCP_KEEPCNT;

# 0 selects the system default

…..

shared_buffers = 32MB # 32 or 300MB based on model

# (change requires restart)

#temp_buffers = 8MB # min 800kB

….

work_mem = 10MB # min 64kB

#maintenance_work_mem = 16MB# min 1MB

…..

wal_level = replica   # minimal, archive, or hot_standby

# (change requires restart)

#fsync = on # turns forced synchronization on or off

#synchronous_commit = on# synchronization level;

# off, local, remote_write, or on

wal_sync_method = open_sync # the default is the first option

# supported by the operating system:

#   open_datasync

#   fdatasync (default on Linux)

#   fsync

#   fsync_writethrough

#   open_sync

#full_page_writes = on  # recover from partial page writes

#wal_buffers = -1   # min 32kB, -1 sets based on shared_buffers

# (change requires restart)

#wal_writer_delay = 200ms   # 1-10000 milliseconds

#commit_delay = 0   # range 0-100000, in microseconds

#commit_siblings = 5# range 1-1000

# - Checkpoints -

#checkpoint_segments = 3  # in logfile segments, min 1, 16MB each

checkpoint_timeout = 1min   # range 30s-1h

#checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0 - 1.0

#checkpoint_warning = 30s   # 0 disables

# - Archiving -

archive_mode = on # allows archiving to be done

# (change requires restart)

archive_command = '/bin/true' # command to use to archive a logfile segment

# placeholders: %p = path of file to archive

#   %f = file name only

# e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'

#archive_timeout = 0# force a logfile segment switch after this