Re: Out of Memory errors are frustrating as heck!
Hi all, I am connecting to a discussion back from April this year. My data has grown and now I am running into new out of memory situations. Meanwhile the world turned from 11.2 to 11.5 which I just installed only to find the same out of memory error.

Have any of the things discussed and proposed, especially this last one by Tomas Vondra, been applied to the 11 releases? Should I try these older patches from April?

regards,
-Gunther

For what it is worth, this is what I am getting:

TopMemoryContext: 67424 total in 5 blocks; 7184 free (7 chunks); 60240 used
  pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 416 free (0 chunks); 7776 used
  TopTransactionContext: 8192 total in 1 blocks; 7720 free (1 chunks); 472 used
  Operator lookup cache: 24576 total in 2 blocks; 10760 free (3 chunks); 13816 used
  TableSpace cache: 8192 total in 1 blocks; 2096 free (0 chunks); 6096 used
  Type information cache: 24352 total in 2 blocks; 2624 free (0 chunks); 21728 used
  RowDescriptionContext: 8192 total in 1 blocks; 6896 free (0 chunks); 1296 used
  MessageContext: 8388608 total in 11 blocks; 3094872 free (4 chunks); 5293736 used
    JoinRelHashTable: 16384 total in 2 blocks; 5576 free (1 chunks); 10808 used
  Operator class cache: 8192 total in 1 blocks; 560 free (0 chunks); 7632 used
  smgr relation table: 32768 total in 3 blocks; 12720 free (8 chunks); 20048 used
  TransactionAbortContext: 32768 total in 1 blocks; 32512 free (0 chunks); 256 used
  Portal hash: 8192 total in 1 blocks; 560 free (0 chunks); 7632 used
  TopPortalContext: 8192 total in 1 blocks; 7664 free (0 chunks); 528 used
    PortalContext: 1024 total in 1 blocks; 624 free (0 chunks); 400 used:
      ExecutorState: 202528536 total in 19 blocks; 433464 free (12 chunks); 202095072 used
        HashTableContext: 8192 total in 1 blocks; 7656 free (0 chunks); 536 used
          HashBatchContext: 10615104 total in 261 blocks; 7936 free (0 chunks); 10607168 used
        HashTableContext: 8192 total in 1 blocks; 7688 free (1 chunks); 504 used
          HashBatchContext: 13079304 total in 336 blocks; 7936 free (0 chunks); 13071368 used
        TupleSort main: 49208 total in 3 blocks; 8552 free (7 chunks); 40656 used
          Caller tuples: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
          Subplan HashTable Temp Context: 1024 total in 1 blocks; 768 free (0 chunks); 256 used
          Subplan HashTable Context: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
          Subplan HashTable Temp Context: 1024 total in 1 blocks; 768 free (0 chunks); 256 used
          Subplan HashTable Context: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
          Subp
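[As an aside for anyone staring at a dump like the one above: a rough way to see which context names account for most of the memory is to total the "used" figures per context name. This is only a sketch, assuming the dump has been saved to a file (memctx.txt is a placeholder name) with one context per line:

awk -F'[;:]' '/ used/ {
    name = $1; gsub(/^[ \t]+/, "", name)      # context name is everything before the first colon
    used = $NF                                # last field is normally "NNN used"
    if (used !~ /[0-9]/) used = $(NF - 1)     # some lines end in "used:" plus a portal name
    gsub(/[^0-9]/, "", used)
    sum[name] += used
} END {
    for (n in sum) printf "%14d  %s\n", sum[n], n
}' memctx.txt | sort -rn | head -20
]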
Re: Out of Memory errors are frustrating as heck!
Gunther writes:
> Hi all, I am connecting to a discussion back from April this year. My
> data has grown and now I am running into new out of memory situations.

It doesn't look like this has much of anything to do with the hash-table discussion. The big hog is an ExprContext:

> ExprContext: 1107296256 total in 142 blocks; 6328 free (101 chunks);
> 1107289928 used

So there's something leaking in there, but this isn't enough info to guess what.

regards, tom lane
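[For what it's worth, when the dump that comes with the error message isn't enough, one commonly used way to get a fresh, complete per-context breakdown is to attach gdb to the affected backend while the problem query is running and have it print the memory context tree into the server log. A minimal sketch, assuming debug symbols for the postgres binary are installed and 12345 stands in for the backend PID:

# Attach to the running backend (12345 is a placeholder PID) and dump its
# memory context statistics; the output goes to that backend's stderr,
# i.e. normally into the server log.
gdb -batch -p 12345 -ex 'call MemoryContextStats(TopMemoryContext)'
]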
Re: Out of Memory errors are frustrating as heck!
OK, I went back through that old thread, and I noticed an early opinion by a certain Peter who said that I should provision some swap space. Since I had plenty of disk and no other option, I tried that. And it did some magic. Here, this is a steady state now:

top - 14:07:32 up 103 days,  9:57,  5 users,  load average: 1.33, 1.05, 0.54
Tasks: 329 total,   2 running, 117 sleeping,   0 stopped,   0 zombie
%Cpu(s): 31.0 us, 11.4 sy,  0.0 ni, 35.3 id, 22.3 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  7910376 total,   120524 free,  2174940 used,  5614912 buff/cache
KiB Swap: 16777212 total, 16777212 free,        0 used.  3239724 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 5068 postgres  20   0 4352496   4.0g   2.0g R 76.4 52.6   3:01.39 postgres: postgres integrator [local] INSERT
  435 root      20   0       0      0      0 S  4.0  0.0  10:52.38 [kswapd0]

and the nice thing is, the backend server process appears to be bounded at 4 GB, so there isn't really a "memory leak". And also, the swap space isn't really being used. This may have to do with these vm.* sysctl settings, overcommit, etc.:

* vm.overcommit_memory = 2 -- values are
  o 0 -- estimate free memory
  o 1 -- always assume there is enough memory
  o 2 -- no over-commit, allocate only inside the following two parameters
* vm.overcommit_kbytes = 0 -- how many kB above swap can be over-committed, EITHER this OR
* vm.overcommit_ratio = 50 -- percent of main memory that can be committed over swap,
  o with 0 swap, that percent can be committed
  o i.e., of these 8 GB, 4 GB are reserved for the buffer cache
  o not a good idea probably
  o at least we should allow 75% committed, i.e., 6 GB of 8 GB, leaving
    + 2 GB of buffer cache
    + 2 GB of shared buffers
    + 4 GB of all other memory

I have vm.overcommit_memory = 2, _kbytes = 0, _ratio = 50. So this means with _ratio = 50 I can commit 50% of memory, 4 GB, and this is exactly what the server process wants (see the worked sketch after the quoted message below). So with little impact on the available buffer cache I am in a fairly good position now. The swap (which in my case I set at 2 x main memory = 16 GB) serves as a buffer to smooth out this peak usage without ever actually paging. I suppose even without swap I could have set vm.overcommit_ratio = 75, and I notice now that I had already commented this much (the above bullet points are my own notes).

Anyway, for now, I am good. Thank you very much.

regards,
-Gunther

On 8/23/2019 9:17, Gunther wrote:
> Hi all, I am connecting to a discussion back from April this year. My data has grown and now
> I am running into new out of memory situations. Meanwhile the world turned from 11.2 to 11.5
> which I just installed only to find the same out of memory error. Have any of the things
> discussed and proposed, especially this last one by Tomas Vondra, been applied to the 11
> releases? Should I try these older patches from April?
> regards,
> -Gunther
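[A minimal sketch of the commit-limit arithmetic described in the message above, assuming vm.overcommit_memory = 2 and the figures from this thread (8 GB of RAM, 16 GB of swap); the commands only read back the current settings and what the kernel reports:

# With vm.overcommit_memory = 2 and vm.overcommit_kbytes = 0, the kernel's limit is roughly:
#   CommitLimit = swap + RAM * vm.overcommit_ratio / 100
# With the settings above: 16 GB + 8 GB * 50/100 = 20 GB committable.
# Without any swap, ratio 50 would cap commitments at 4 GB of the 8 GB,
# and ratio 75 would allow 6 GB, which is the trade-off discussed above.
sysctl vm.overcommit_memory vm.overcommit_ratio vm.overcommit_kbytes
grep -E 'CommitLimit|Committed_AS' /proc/meminfo
]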
pg_basebackup is taking an unusually long time with Postgres 11.3
Hi Folks,

I am having trouble setting up replication with Postgres 11.3. pg_basebackup is taking an unusually long time for a small Postgres database. Is anything wrong in my configuration, or is there anything I could do to speed up pg_basebackup?

I recently upgraded from Postgres 9.2.1. I am using a similar postgres configuration, apart from some updates to the config for Postgres 11.3. I am using pg_basebackup to replicate from the master, over a secure ssh tunnel between master and slave, i.e. there is an ssh tunnel that forwards data from localhost port 5433 on the slave to the master server's port 5432. pg_basebackup is taking about 30 seconds.

c12-array2-c1:/# du ./path/to/database
249864  ./nimble/var/private/config/versions/group/sodb

pg_basebackup -D $PGSQL_BASEBKUP_PATH -U $DBUSER -c fast -l $backup_name -h localhost -p 5433 --wal-method=stream -Pv -s 10

postgresql.conf:

….
max_connections = 100                   # (change requires restart)
# Note: Increasing max_connections costs ~400 bytes of shared memory per
# connection slot, plus lock space (see max_locks_per_transaction).
#superuser_reserved_connections = 3     # (change requires restart)
unix_socket_directories = '/var/run/postgresql'   # (change requires restart)
…..

# - Security and Authentication -

#authentication_timeout = 1min          # 1s-600s
#ssl = off                              # (change requires restart)
#ssl_ciphers = 'ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH'   # allowed SSL ciphers
                                        # (change requires restart)
#ssl_renegotiation_limit = 512MB        # amount of data between renegotiations
#ssl_cert_file = 'server.crt'           # (change requires restart)
#ssl_key_file = 'server.key'            # (change requires restart)
#ssl_ca_file = ''                       # (change requires restart)
#ssl_crl_file = ''                      # (change requires restart)
#password_encryption = on
#db_user_namespace = off

# Kerberos and GSSAPI
#krb_server_keyfile = ''
#krb_srvname = 'postgres'               # (Kerberos only)
#krb_caseins_users = off

# - TCP Keepalives -
# see "man 7 tcp" for details

#tcp_keepalives_idle = 0                # TCP_KEEPIDLE, in seconds;
                                        # 0 selects the system default
#tcp_keepalives_interval = 0            # TCP_KEEPINTVL, in seconds;
                                        # 0 selects the system default
#tcp_keepalives_count = 0               # TCP_KEEPCNT;
                                        # 0 selects the system default
…..
shared_buffers = 32MB                   # 32 or 300MB based on model
                                        # (change requires restart)
#temp_buffers = 8MB                     # min 800kB
….
work_mem = 10MB                         # min 64kB
#maintenance_work_mem = 16MB            # min 1MB
…..
wal_level = replica                     # minimal, archive, or hot_standby
                                        # (change requires restart)
#fsync = on                             # turns forced synchronization on or off
#synchronous_commit = on                # synchronization level;
                                        # off, local, remote_write, or on
wal_sync_method = open_sync             # the default is the first option
                                        # supported by the operating system:
                                        #   open_datasync
                                        #   fdatasync (default on Linux)
                                        #   fsync
                                        #   fsync_writethrough
                                        #   open_sync
#full_page_writes = on                  # recover from partial page writes
#wal_buffers = -1                       # min 32kB, -1 sets based on shared_buffers
                                        # (change requires restart)
#wal_writer_delay = 200ms               # 1-10000 milliseconds
#commit_delay = 0                       # range 0-100000, in microseconds
#commit_siblings = 5                    # range 1-1000

# - Checkpoints -

#checkpoint_segments = 3                # in logfile segments, min 1, 16MB each
checkpoint_timeout = 1min               # range 30s-1h
#checkpoint_completion_target = 0.5     # checkpoint target duration, 0.0 - 1.0
#checkpoint_warning = 30s               # 0 disables

# - Archiving -

archive_mode = on               # allows archiving to be done
                                # (change requires restart)
archive_command = '/bin/true'   # command to use to archive a logfile segment
                                # placeholders: %p = path of file to archive
                                #               %f = file name only
                                # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
#archive_timeout = 0            # force a logfile segment switch after this
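[For reference, a minimal sketch of the tunnel-plus-backup setup described above, with placeholder host and user names; timing a run through the tunnel and comparing it against a direct run on the master is one way to see whether the tunnel or the server side accounts for the time:

# Forward local port 5433 on the standby to port 5432 on the master (names are placeholders).
ssh -N -L 5433:localhost:5432 postgres@master-host &

# Run the base backup through the tunnel and time it.
time pg_basebackup -D "$PGSQL_BASEBKUP_PATH" -U "$DBUSER" -c fast -l "$backup_name" \
    -h localhost -p 5433 --wal-method=stream -P -v -s 10
]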