Re: [PERFORM] 60 core performance with 9.3
I've been assisting Mark with the benchmarking of these new servers. The drop-off in both throughput and CPU utilisation that we've been observing as the client count increases has led me to investigate which lwlocks are dominant at different client counts. I've recompiled postgres with Andres' LWLock improvements, Kevin's libnuma patch, and with LWLOCK_STATS enabled. The LWLOCK_STATS below suggest that ProcArrayLock might be the main source of locking that's causing throughput to take a dive as the client count increases beyond the core count.

wal_buffers = 256MB
checkpoint_segments = 1920
wal_sync_method = open_datasync

pgbench -s 2000 -T 600

Results:

 clients |  tps
---------+-------
       6 |  9490
      12 | 17558
      24 | 25681
      48 | 41175
      96 | 48954
     192 | 31887
     384 | 15564

LWLOCK_STATS at 48 clients

 Lock              |   Blk  | SpinDelay | Blk % | SpinDelay %
-------------------+--------+-----------+-------+-------------
 BufFreelistLock   |  31144 |        11 |  1.64 |        1.62
 ShmemIndexLock    |    192 |         1 |  0.01 |        0.15
 OidGenLock        |  32648 |        14 |  1.72 |        2.06
 XidGenLock        |  35731 |        18 |  1.88 |        2.64
 ProcArrayLock     | 291121 |       215 | 15.36 |       31.57
 SInvalReadLock    |  32136 |        13 |  1.70 |        1.91
 SInvalWriteLock   |  32141 |        12 |  1.70 |        1.76
 WALBufMappingLock |  31662 |        15 |  1.67 |        2.20
 WALWriteLock      | 825380 |        45 | 36.31 |        6.61
 CLogControlLock   | 583458 |       337 | 26.93 |       49.49

LWLOCK_STATS at 96 clients

 Lock              |   Blk   | SpinDelay | Blk % | SpinDelay %
-------------------+---------+-----------+-------+-------------
 BufFreelistLock   |   62954 |        12 |  1.54 |        0.27
 ShmemIndexLock    |   62635 |         4 |  1.54 |        0.09
 OidGenLock        |   92232 |        22 |  2.26 |        0.50
 XidGenLock        |   98326 |        18 |  2.41 |        0.41
 ProcArrayLock     |  928871 |      3188 | 22.78 |       72.57
 SInvalReadLock    |   58392 |        13 |  1.43 |        0.30
 SInvalWriteLock   |   57429 |        14 |  1.41 |        0.32
 WALBufMappingLock |  138375 |        14 |  3.39 |        0.32
 WALWriteLock      | 1480707 |        42 | 36.31 |        0.96
 CLogControlLock   | 1098239 |      1066 | 26.93 |       27.27

LWLOCK_STATS at 384 clients

 Lock              |   Blk   | SpinDelay | Blk % | SpinDelay %
-------------------+---------+-----------+-------+-------------
 BufFreelistLock   |  184298 |       158 |  1.93 |        0.03
 ShmemIndexLock    |  183573 |       164 |  1.92 |        0.03
 OidGenLock        |  184558 |       173 |  1.93 |        0.03
 XidGenLock        |  200239 |       213 |  2.09 |        0.04
 ProcArrayLock     | 4035527 |    579666 | 42.22 |       98.62
 SInvalReadLock    |  182204 |       152 |  1.91 |        0.03
 SInvalWriteLock   |  182898 |       137 |  1.91 |        0.02
 WALBufMappingLock |  219936 |       215 |  2.30 |        0.04
 WALWriteLock      | 3172725 |       457 | 24.67 |        0.08
 CLogControlLock   | 1012458 |      6423 | 10.59 |        1.09

The same test done with a read-only workload shows virtually no SpinDelay at all.

Any thoughts or comments on these results are welcome!

Regards,

Matt.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
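For reference, the run above can be reproduced roughly as follows. This is a sketch: only `-s 2000` and `-T 600` come from the post; the database name, thread counts, and build flags are assumptions.

```shell
# Build postgres with lock statistics enabled (assumed configure invocation):
#   ./configure CFLAGS="-DLWLOCK_STATS" && make && make install

# Initialise a scale-2000 pgbench database once, then sweep the client counts.
pgbench -i -s 2000 bench
for c in 6 12 24 48 96 192 384; do
    pgbench -c "$c" -j "$c" -T 600 bench
done
# With LWLOCK_STATS compiled in, per-lock block/spin counts are written to
# the server log as each backend exits.
```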
Re: [PERFORM] 60 core performance with 9.3
On 31/07/14 00:47, Tomas Vondra wrote:
> On 30 Červenec 2014, 14:39, Tom Lane wrote:
>> "Tomas Vondra" writes:
>>> On 30 Červenec 2014, 3:44, Mark Kirkwood wrote:
>>>> While these numbers look great in the middle range (12-96 clients), the
>>>> benefit looks to be tailing off as client numbers increase. Also running
>>>> with no stats (and hence no auto vacuum or analyze) is way too scary!
>>
>>> By disabling the statistics collector you lose runtime counters - number
>>> of sequential/index scans on a table, tuples read from a relation, etc.
>>> But it does not influence VACUUM or planning at all.
>>
>> It does break autovacuum.
>
> Of course, you're right. It throws away info about how much data was
> modified and when the table was last (auto)vacuumed. This is clear proof
> that I really need to drink at least one cup of coffee in the morning
> before doing anything.

Lol - thanks for taking a look anyway. Yes, coffee is often an important part of the exercise.

Regards

Mark
Re: [PERFORM] 60 core performance with 9.3
Hi Tomas,

Unfortunately I think you are mistaken - disabling the stats collector (i.e. track_counts = off) means that autovacuum has no idea about when/if it needs to start a worker (as it uses those counts to decide), and hence you lose all automatic vacuum and analyze as a result.

With respect to comments like "it shouldn't make a difference" etc, well, the profile suggests otherwise, and the change in tps numbers supports the observation.

regards

Mark

On 30/07/14 20:42, Tomas Vondra wrote:
> On 30 Červenec 2014, 3:44, Mark Kirkwood wrote:
>> While these numbers look great in the middle range (12-96 clients), the
>> benefit looks to be tailing off as client numbers increase. Also running
>> with no stats (and hence no auto vacuum or analyze) is way too scary!
>
> I assume you've disabled the statistics collector, which has nothing to do
> with vacuum or analyze. There are two kinds of statistics in PostgreSQL -
> data distribution statistics (which are collected by ANALYZE and stored in
> actual tables within the database) and runtime statistics (which are
> collected by the stats collector and stored in a file somewhere on the
> disk).
>
> By disabling the statistics collector you lose runtime counters - number
> of sequential/index scans on a table, tuples read from a relation, etc.
> But it does not influence VACUUM or planning at all.
>
> Also, it's mostly async (send over UDP and you're done) and shouldn't make
> much difference unless you have a large number of objects. There are ways
> to improve this (e.g. by placing the stat files into a tmpfs).
>
> Tomas
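The dependency Mark describes is visible in the collector views that autovacuum reads; a minimal check along these lines (the table name is only an example):

```sql
SHOW track_counts;  -- must be on for autovacuum to see any activity

-- autovacuum launches workers based on these per-table collector counters;
-- with the collector disabled they stay frozen and no worker ever fires.
SELECT relname, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'pgbench_accounts';
```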
Re: [PERFORM] Why you should turn on Checksums with SSDs
On Wed, Jul 30, 2014 at 4:01 AM, Tomas Vondra wrote:
> On 30 Červenec 2014, 5:12, Josh Berkus wrote:
>> Explained here:
>> https://www.usenix.org/system/files/conference/fast13/fast13-final80.pdf
>>
>> 13 out of 15 tested SSD's had various kinds of corruption on a power-out.
>>
>> (thanks, Neil!)
>
> Well, only four of the devices supposedly had power-loss protection
> (battery, capacitor, ...) so I guess it's not really that surprising that
> the remaining 11 devices failed in a test like this. Although it really
> shouldn't damage the device, as apparently happened during the tests.
>
> Too bad they haven't mentioned which SSDs they've been testing
> specifically. While I understand the reason for that (HP Labs can't just
> point at products from other companies), it significantly limits the
> usefulness of the study. Too many companies are producing crappy
> consumer-level devices and advertising them as "enterprise". I could name
> a few ...
>
> Maybe it could be deciphered using the information in the paper
> (power-loss protection, year of release, ...).
>
> I'd expect to see the Intel 320/710 there, but that seems not to be the
> case, because those devices were released in 2011 and all four devices
> with power-loss protection have year=2012. Or maybe it's the year when
> that particular device was manufactured?

Take a look here:

http://hardware.slashdot.org/story/13/12/27/208249/power-loss-protected-ssds-tested-only-intel-s3500-passes

"Only the end-of-lifed Intel 320 and its newer replacement, the S3500, survived unscathed. The conclusion: if you care about data even when power could be unreliable, only buy Intel SSDs."

merlin
[PERFORM] Slow create temp table
Greetings,

Any help regarding a sporadic and mysterious issue would be much appreciated. A pgsql function for loading in data is occasionally taking 12+ hours to complete versus its normal 1-2 hours, due to a slowdown at the CREATE TEMP TABLE step. During slow runs of the function, the temp table data file is being written to at 8192 bytes/second. This rate was consistent at the 5 hour mark and up until I cancelled the query 6 hrs in. An immediate rerun of the function finished in an hour; the temp table file size was 226 MB and it was created in ~15 mins.

PostgreSQL 9.3.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit
Linux 2.6.32-431.20.3.el6.x86_64 #1 SMP Thu Jun 19 21:14:45 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
ProLiant DL380p Gen8, 2 x E5-2620 (hyperthreading on), 96 GB
pgsql data dir mounted on 25 x ssd storage array, connected via fibre channel; pg_xlog on a RAID 10 hdd array
deadline scheduler, 8192 readahead

             name             |      current_setting      |       source
------------------------------+---------------------------+--------------------
 application_name             | psql                      | client
 archive_command              | *                         | configuration file
 archive_mode                 | on                        | configuration file
 autovacuum                   | on                        | configuration file
 autovacuum_max_workers       | 6                         | configuration file
 bgwriter_delay               | 40ms                      | configuration file
 bgwriter_lru_maxpages        | 1000                      | configuration file
 bgwriter_lru_multiplier      | 3                         | configuration file
 checkpoint_completion_target | 0.9                       | configuration file
 checkpoint_segments          | 1024                      | configuration file
 checkpoint_timeout           | 30min                     | configuration file
 client_encoding              | UTF8                      | client
 cpu_operator_cost            | 0.5                       | configuration file
 cpu_tuple_cost               | 0.5                       | configuration file
 DateStyle                    | ISO, MDY                  | configuration file
 deadlock_timeout             | 10s                       | configuration file
 default_text_search_config   | pg_catalog.english        | configuration file
 effective_cache_size         | 70GB                      | configuration file
 effective_io_concurrency     | 6                         | configuration file
 full_page_writes             | on                        | configuration file
 hot_standby                  | on                        | configuration file
 hot_standby_feedback         | on                        | configuration file
 lc_messages                  | en_US.UTF-8               | configuration file
 lc_monetary                  | en_US.UTF-8               | configuration file
 lc_numeric                   | en_US.UTF-8               | configuration file
 lc_time                      | en_US.UTF-8               | configuration file
 listen_addresses             | *                         | configuration file
 log_autovacuum_min_duration  | 1s                        | configuration file
 log_checkpoints              | on                        | configuration file
 log_destination              | csvlog                    | configuration file
 log_file_mode                | 0600                      | configuration file
 log_filename                 | postgresql-%a.log         | configuration file
 log_lock_waits               | on                        | configuration file
 log_min_duration_statement   | 250ms                     | configuration file
 log_rotation_age             | 1d                        | configuration file
 log_rotation_size            | 0                         | configuration file
 log_statement                | ddl                       | configuration file
 log_timezone                 | America/New_York          | configuration file
 log_truncate_on_rotation     | on                        | configuration file
 logging_collector            | on                        | configuration file
 maintenance_work_mem         | 2400MB                    | configuration file
 max_connections              | 1000                      | configuration file
 max_stack_depth              | 5MB                       | configuration file
 max_wal_senders              | 5                         | configuration file
 port                         | 5432                      | command line
 random_page_cost             | 4                         | session
 seq_page_cost                | 1                         | configuration file
 shared_buffers               | 8GB                       | configuration file
 shared_preload_libraries     | auto_explain              | configuration file
 stats_temp_directory         | /var/lib/pgsql_stat_tmpfs | configuration file
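The listing above appears to be the usual dump of non-default settings; a query along these lines produces it (the poster didn't show the exact query used):

```sql
SELECT name, current_setting(name) AS current_setting, source
FROM pg_settings
WHERE source <> 'default'
ORDER BY name;
```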
Re: [PERFORM] 60 core performance with 9.3
On 30 Červenec 2014, 14:39, Tom Lane wrote:
> "Tomas Vondra" writes:
>> On 30 Červenec 2014, 3:44, Mark Kirkwood wrote:
>>> While these numbers look great in the middle range (12-96 clients), the
>>> benefit looks to be tailing off as client numbers increase. Also running
>>> with no stats (and hence no auto vacuum or analyze) is way too scary!
>
>> By disabling the statistics collector you lose runtime counters - number
>> of sequential/index scans on a table, tuples read from a relation, etc.
>> But it does not influence VACUUM or planning at all.
>
> It does break autovacuum.

Of course, you're right. It throws away info about how much data was modified and when the table was last (auto)vacuumed. This is clear proof that I really need to drink at least one cup of coffee in the morning before doing anything.

Tomas
Re: [PERFORM] 60 core performance with 9.3
"Tomas Vondra" writes:
> On 30 Červenec 2014, 3:44, Mark Kirkwood wrote:
>> While these numbers look great in the middle range (12-96 clients), the
>> benefit looks to be tailing off as client numbers increase. Also running
>> with no stats (and hence no auto vacuum or analyze) is way too scary!

> By disabling the statistics collector you lose runtime counters - number of
> sequential/index scans on a table, tuples read from a relation, etc. But
> it does not influence VACUUM or planning at all.

It does break autovacuum.

			regards, tom lane
Re: [PERFORM] Very slow planning performance on partition table
I think I understand what happened now. I have another monitor script that runs periodically and calls pg_cancel_backend and pg_terminate_backend for those hanging update SQLs. However, for some unknown reason the cancel and terminate commands don't take effect on the pgsql side for those update SQLs. But I think the pgbouncer side was notified by the cancel or terminate command. It then drops the old connections and creates new ones, while those old connections still hang on the pgsql side. That's why the connection status shows CLOSE_WAIT and there are more processes on the pgsql side than pgbouncer has defined.

So the root cause is still on the pgsql side. It shouldn't hang there. What the hanging process was doing is in my previous posts. There are many identical concurrent SQL statements which update a partitioned table without the partition key specified in the conditions. The gdb trace shows this:

(gdb) bt
#0  0x7f8cea310db7 in semop () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x005f97d3 in PGSemaphoreLock ()
#2  0x00638153 in LWLockAcquire ()
#3  0x004a90d0 in ginTraverseLock ()
#4  0x004a9d0b in ginFindLeafPage ()
#5  0x004a8377 in ginInsertItemPointers ()
#6  0x004a4548 in ginEntryInsert ()
#7  0x004ae687 in ginInsertCleanup ()
#8  0x004af3d6 in ginHeapTupleFastInsert ()
#9  0x004a4ab1 in gininsert ()
#10 0x00709b15 in FunctionCall6Coll ()
#11 0x0047b6b7 in index_insert ()
#12 0x0057f475 in ExecInsertIndexTuples ()
#13 0x0058bf07 in ExecModifyTable ()
#14 0x005766e3 in ExecProcNode ()
#15 0x00575ad4 in standard_ExecutorRun ()
#16 0x0064718f in ProcessQuery ()
#17 0x006473b7 in PortalRunMulti ()
#18 0x00647e8a in PortalRun ()
#19 0x00645160 in PostgresMain ()
#20 0x0060459e in ServerLoop ()
#21 0x006053bc in PostmasterMain ()
#22 0x005a686b in main ()
(gdb) q

It will just hang there forever and finally blocks all other update SQLs if I don't stop pgbouncer. When this happens, all the CPUs are utilized by those hanging processes and the server load is very, very high. It stays at several hundred, compared with about 20 normally, which causes the performance problem for all tasks on the server.
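For anyone chasing a similar hang, the stuck backends can be listed before attaching gdb; a sketch using the 9.3 catalog columns:

```sql
-- In 9.3, pg_stat_activity.waiting is true only for heavyweight-lock waits;
-- a backend stuck inside an LWLock (as in the backtrace above) shows
-- waiting = false, so sort by runtime instead.
SELECT pid, waiting, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC;

-- pg_cancel_backend()/pg_terminate_backend() send signals that a backend
-- only honours at interrupt points, which is consistent with them having
-- no effect on these uninterruptible LWLock waits.
```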
Re: [PERFORM] Why you should turn on Checksums with SSDs
On 30 Červenec 2014, 5:12, Josh Berkus wrote:
> Explained here:
> https://www.usenix.org/system/files/conference/fast13/fast13-final80.pdf
>
> 13 out of 15 tested SSD's had various kinds of corruption on a power-out.
>
> (thanks, Neil!)

Well, only four of the devices supposedly had power-loss protection (battery, capacitor, ...) so I guess it's not really that surprising that the remaining 11 devices failed in a test like this. Although it really shouldn't damage the device, as apparently happened during the tests.

Too bad they haven't mentioned which SSDs they've been testing specifically. While I understand the reason for that (HP Labs can't just point at products from other companies), it significantly limits the usefulness of the study. Too many companies are producing crappy consumer-level devices and advertising them as "enterprise". I could name a few ...

Maybe it could be deciphered using the information in the paper (power-loss protection, year of release, ...).

I'd expect to see the Intel 320/710 there, but that seems not to be the case, because those devices were released in 2011 and all four devices with power-loss protection have year=2012. Or maybe it's the year when that particular device was manufactured?

regards
Tomas
Re: [PERFORM] 60 core performance with 9.3
On 30 Červenec 2014, 3:44, Mark Kirkwood wrote:
>
> While these numbers look great in the middle range (12-96 clients), the
> benefit looks to be tailing off as client numbers increase. Also running
> with no stats (and hence no auto vacuum or analyze) is way too scary!

I assume you've disabled the statistics collector, which has nothing to do with vacuum or analyze. There are two kinds of statistics in PostgreSQL - data distribution statistics (which are collected by ANALYZE and stored in actual tables within the database) and runtime statistics (which are collected by the stats collector and stored in a file somewhere on the disk).

By disabling the statistics collector you lose runtime counters - number of sequential/index scans on a table, tuples read from a relation, etc. But it does not influence VACUUM or planning at all.

Also, it's mostly async (send over UDP and you're done) and shouldn't make much difference unless you have a large number of objects. There are ways to improve this (e.g. by placing the stat files into a tmpfs).

Tomas
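The tmpfs suggestion looks like this in practice. A sketch only: the mount point and size are assumptions, not from the thread.

```shell
# Create a small RAM-backed directory for the stats files (run as root;
# an fstab entry would make it persistent across reboots):
mkdir -p /var/run/pg_stats_tmp
mount -t tmpfs -o size=64M,uid=postgres,gid=postgres tmpfs /var/run/pg_stats_tmp

# Then point the collector at it in postgresql.conf and reload:
#   stats_temp_directory = '/var/run/pg_stats_tmp'
#   pg_ctl reload -D "$PGDATA"
```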