Re: [PERFORM] 60 core performance with 9.3

2014-07-30 Thread Matt Clarkson
I've been assisting Mark with the benchmarking of these new servers. 

The drop off in both throughput and CPU utilisation that we've been
observing as the client count increases has led me to investigate which
lwlocks are dominant at different client counts.

I've recompiled postgres with Andres' LWLock improvements, Kevin's
libnuma patch, and LWLOCK_STATS enabled.
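
For reference, a minimal sketch of how LWLOCK_STATS can be switched on at
build time (the configure invocation here is illustrative, not our exact
build):

  # define LWLOCK_STATS when compiling; each backend then dumps its
  # per-lock acquisition/block counters to stderr when it exits
  ./configure --prefix=$HOME/pg-lwstats CPPFLAGS='-DLWLOCK_STATS'
  make -j4 && make install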

The LWLOCK_STATS below suggest that ProcArrayLock might be the main
source of lock contention that's causing throughput to take a dive as the
client count increases beyond the core count.


wal_buffers = 256MB
checkpoint_segments = 1920
wal_sync_method = open_datasync

pgbench -s 2000 -T 600
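
Each client count below was a separate 600 s run; a sketch of the sweep
(the -c/-j values are an assumption, since only -s and -T appear above):

  # database initialised once beforehand with: pgbench -i -s 2000
  for c in 6 12 24 48 96 192 384; do
      pgbench -c $c -j $c -T 600 pgbench
  done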


Results:

 clients |  tps
---------+-------
       6 |  9490
      12 | 17558
      24 | 25681
      48 | 41175
      96 | 48954
     192 | 31887
     384 | 15564

LWLOCK_STATS at 48 clients

 Lock               |     Blk | SpinDelay | Blk % | SpinDelay %
--------------------+---------+-----------+-------+-------------
 BufFreelistLock    |   31144 |        11 |  1.64 |        1.62
 ShmemIndexLock     |     192 |         1 |  0.01 |        0.15
 OidGenLock         |   32648 |        14 |  1.72 |        2.06
 XidGenLock         |   35731 |        18 |  1.88 |        2.64
 ProcArrayLock      |  291121 |       215 | 15.36 |       31.57
 SInvalReadLock     |   32136 |        13 |  1.70 |        1.91
 SInvalWriteLock    |   32141 |        12 |  1.70 |        1.76
 WALBufMappingLock  |   31662 |        15 |  1.67 |        2.20
 WALWriteLock       |  825380 |        45 | 36.31 |        6.61
 CLogControlLock    |  583458 |       337 | 26.93 |       49.49
 
 
   
LWLOCK_STATS at 96 clients

 Lock               |     Blk | SpinDelay | Blk % | SpinDelay %
--------------------+---------+-----------+-------+-------------
 BufFreelistLock    |   62954 |        12 |  1.54 |        0.27
 ShmemIndexLock     |   62635 |         4 |  1.54 |        0.09
 OidGenLock         |   92232 |        22 |  2.26 |        0.50
 XidGenLock         |   98326 |        18 |  2.41 |        0.41
 ProcArrayLock      |  928871 |      3188 | 22.78 |       72.57
 SInvalReadLock     |   58392 |        13 |  1.43 |        0.30
 SInvalWriteLock    |   57429 |        14 |  1.41 |        0.32
 WALBufMappingLock  |  138375 |        14 |  3.39 |        0.32
 WALWriteLock       | 1480707 |        42 | 36.31 |        0.96
 CLogControlLock    | 1098239 |      1066 | 26.93 |       27.27
 
 
 
LWLOCK_STATS at 384 clients

 Lock               |     Blk | SpinDelay | Blk % | SpinDelay %
--------------------+---------+-----------+-------+-------------
 BufFreelistLock    |  184298 |       158 |  1.93 |        0.03
 ShmemIndexLock     |  183573 |       164 |  1.92 |        0.03
 OidGenLock         |  184558 |       173 |  1.93 |        0.03
 XidGenLock         |  200239 |       213 |  2.09 |        0.04
 ProcArrayLock      | 4035527 |    579666 | 42.22 |       98.62
 SInvalReadLock     |  182204 |       152 |  1.91 |        0.03
 SInvalWriteLock    |  182898 |       137 |  1.91 |        0.02
 WALBufMappingLock  |  219936 |       215 |  2.30 |        0.04
 WALWriteLock       | 3172725 |       457 | 24.67 |        0.08
 CLogControlLock    | 1012458 |      6423 | 10.59 |        1.09
 

The same test run with a read-only workload shows virtually no SpinDelay
at all.
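
For comparison, the read-only runs used pgbench's select-only mode; a
sketch (the client count here is illustrative):

  pgbench -S -c 96 -j 96 -T 600 pgbench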


Any thoughts or comments on these results are welcome!


Regards,
Matt.







Re: [PERFORM] 60 core performance with 9.3

2014-07-30 Thread Mark Kirkwood

On 31/07/14 00:47, Tomas Vondra wrote:
> On 30 Červenec 2014, 14:39, Tom Lane wrote:
>> "Tomas Vondra"  writes:
>>> On 30 Červenec 2014, 3:44, Mark Kirkwood wrote:
>>>> While these numbers look great in the middle range (12-96 clients), the
>>>> benefit looks to be tailing off as client numbers increase. Also running
>>>> with no stats (and hence no auto vacuum or analyze) is way too scary!
>>>
>>> By disabling the statistics collector you lose runtime counters - number of
>>> sequential/index scans on a table, tuples read from a relation, etc. But
>>> it does not influence VACUUM or planning at all.
>>
>> It does break autovacuum.
>
> Of course, you're right. It throws away info about how much data was
> modified and when the table was last (auto)vacuumed.
>
> It's clear proof that I really need to drink at least one cup of
> coffee before doing anything in the morning.



Lol - thanks for taking a look anyway. Yes, coffee is often an important 
part of the exercise.


Regards

Mark





Re: [PERFORM] 60 core performance with 9.3

2014-07-30 Thread Mark Kirkwood

Hi Tomas,

Unfortunately I think you are mistaken - disabling the stats collector 
(i.e. track_counts = off) means that autovacuum has no idea about 
when/if it needs to start a worker (as it uses those counts to decide), 
and hence you lose all automatic vacuum and analyze as a result.
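
A quick way to see the linkage (an illustrative psql session; the table
name is hypothetical):

  psql <<'SQL'
  SHOW track_counts;
  -- the autovacuum launcher decides from these per-table counters;
  -- with track_counts = off they stop advancing, so nothing triggers:
  SELECT relname, n_dead_tup, last_autovacuum
  FROM pg_stat_user_tables
  WHERE relname = 'pgbench_accounts';
  SQL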


With respect to comments like "it shouldn't make a difference", etc., well,
the profile suggests otherwise, and the change in tps numbers supports
the observation.


regards

Mark

On 30/07/14 20:42, Tomas Vondra wrote:
> On 30 Červenec 2014, 3:44, Mark Kirkwood wrote:
>> While these numbers look great in the middle range (12-96 clients), the
>> benefit looks to be tailing off as client numbers increase. Also running
>> with no stats (and hence no auto vacuum or analyze) is way too scary!
>
> I assume you've disabled the statistics collector, which has nothing to do
> with vacuum or analyze.
>
> There are two kinds of statistics in PostgreSQL - data distribution
> statistics (which are collected by ANALYZE and stored in actual tables
> within the database) and runtime statistics (which are collected by the
> stats collector and stored in a file somewhere on the disk).
>
> By disabling the statistics collector you lose runtime counters - number of
> sequential/index scans on a table, tuples read from a relation, etc. But
> it does not influence VACUUM or planning at all.
>
> Also, it's mostly async (send over UDP and you're done) and shouldn't make
> much difference unless you have a large number of objects. There are ways to
> improve this (e.g. by placing the stat files into a tmpfs).
>
> Tomas







Re: [PERFORM] Why you should turn on Checksums with SSDs

2014-07-30 Thread Merlin Moncure
On Wed, Jul 30, 2014 at 4:01 AM, Tomas Vondra  wrote:
> On 30 Červenec 2014, 5:12, Josh Berkus wrote:
>> Explained here:
>> https://www.usenix.org/system/files/conference/fast13/fast13-final80.pdf
>>
>> 13 out of 15 tested SSDs had various kinds of corruption on a power-out.
>>
>> (thanks, Neil!)
>
> Well, only four of the devices supposedly had power-loss protection
> (battery, capacitor, ...) so I guess it's not really that surprising that
> the remaining 11 devices failed in a test like this. Although it really
> shouldn't damage the device, as apparently happened during the tests.
>
> Too bad they haven't mentioned which SSDs they've been testing
> specifically. While I understand the reason for that (HP Labs can't just
> point at products from other companies), it significantly limits the
> usefulness of the study. Too many companies are producing crappy
> consumer-level devices, advertising them as "enterprise". I could name a
> few ...
>
> Maybe it could be deciphered using the information in the paper
> (power-loss protection, year of release, ...).
>
> I'd expect to see the Intel 320/710 there, but that seems not to be the
> case, because those devices were released in 2011 and all four devices
> with power-loss protection have year=2012. Or maybe it's the year when
> that particular device was manufactured?

Take a look here:
http://hardware.slashdot.org/story/13/12/27/208249/power-loss-protected-ssds-tested-only-intel-s3500-passes

"Only the end-of-lifed Intel 320 and its newer replacement, the S3500,
survived unscathed. The conclusion: if you care about data even when
power could be unreliable, only buy Intel SSDs.""
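
For anyone acting on the subject line's advice: on 9.3, data checksums can
only be enabled cluster-wide at initdb time (a sketch; the data directory
path is illustrative):

  initdb --data-checksums -D /var/lib/pgsql/9.3/data
  # confirm afterwards:
  pg_controldata /var/lib/pgsql/9.3/data | grep -i checksum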

merlin




[PERFORM] Slow create temp table

2014-07-30 Thread Clinton Adams
Greetings,

Any help regarding a sporadic and mysterious issue would be much appreciated.

A pgsql function for loading data is occasionally taking 12+ hours
to complete versus its normal 1-2 hours, due to a slowdown at the
CREATE TEMP TABLE step. During slow runs of the function, the temp
table data file is written to at 8192 bytes/second. This rate was
still consistent at the 5-hour mark, up until I canceled the query 6
hours in. An immediate rerun of the function finished in an hour; the
temp table file was 226 MB and was created in ~15 minutes.
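
A sketch of how such a write rate can be watched from outside the session
(the query filter, database OID, and temp file name below are illustrative):

  # find the backend running the slow statement
  psql -c "SELECT pid, now() - query_start AS runtime
           FROM pg_stat_activity
           WHERE query LIKE 'CREATE TEMP TABLE%';"

  # temp-table relation files carry a t prefix under the database
  # directory; watch one grow
  watch -n 60 'ls -l $PGDATA/base/16384/t3_*'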

PostgreSQL 9.3.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit

Linux 2.6.32-431.20.3.el6.x86_64 #1 SMP Thu Jun 19 21:14:45 UTC 2014
x86_64 x86_64 x86_64 GNU/Linux

ProLiant DL380p Gen8, 2 x E5-2620  (hyperthreading on)
96 GB
pgsql data dir mounted on 25 x ssd storage array, connected via fibre
channel, pg_xlog on a RAID 10 hdd array
deadline scheduler
8192 readahead

 name                         | current_setting           | source
------------------------------+---------------------------+--------------------
 application_name             | psql                      | client
 archive_command              | *                         | configuration file
 archive_mode                 | on                        | configuration file
 autovacuum                   | on                        | configuration file
 autovacuum_max_workers       | 6                         | configuration file
 bgwriter_delay               | 40ms                      | configuration file
 bgwriter_lru_maxpages        | 1000                      | configuration file
 bgwriter_lru_multiplier      | 3                         | configuration file
 checkpoint_completion_target | 0.9                       | configuration file
 checkpoint_segments          | 1024                      | configuration file
 checkpoint_timeout           | 30min                     | configuration file
 client_encoding              | UTF8                      | client
 cpu_operator_cost            | 0.5                       | configuration file
 cpu_tuple_cost               | 0.5                       | configuration file
 DateStyle                    | ISO, MDY                  | configuration file
 deadlock_timeout             | 10s                       | configuration file
 default_text_search_config   | pg_catalog.english        | configuration file
 effective_cache_size         | 70GB                      | configuration file
 effective_io_concurrency     | 6                         | configuration file
 full_page_writes             | on                        | configuration file
 hot_standby                  | on                        | configuration file
 hot_standby_feedback         | on                        | configuration file
 lc_messages                  | en_US.UTF-8               | configuration file
 lc_monetary                  | en_US.UTF-8               | configuration file
 lc_numeric                   | en_US.UTF-8               | configuration file
 lc_time                      | en_US.UTF-8               | configuration file
 listen_addresses             | *                         | configuration file
 log_autovacuum_min_duration  | 1s                        | configuration file
 log_checkpoints              | on                        | configuration file
 log_destination              | csvlog                    | configuration file
 log_file_mode                | 0600                      | configuration file
 log_filename                 | postgresql-%a.log         | configuration file
 log_lock_waits               | on                        | configuration file
 log_min_duration_statement   | 250ms                     | configuration file
 log_rotation_age             | 1d                        | configuration file
 log_rotation_size            | 0                         | configuration file
 log_statement                | ddl                       | configuration file
 log_timezone                 | America/New_York          | configuration file
 log_truncate_on_rotation     | on                        | configuration file
 logging_collector            | on                        | configuration file
 maintenance_work_mem         | 2400MB                    | configuration file
 max_connections              | 1000                      | configuration file
 max_stack_depth              | 5MB                       | configuration file
 max_wal_senders              | 5                         | configuration file
 port                         | 5432                      | command line
 random_page_cost             | 4                         | session
 seq_page_cost                | 1                         | configuration file
 shared_buffers               | 8GB                       | configuration file
 shared_preload_libraries     | auto_explain              | configuration file
 stats_temp_directory         | /var/lib/pgsql_stat_tmpfs | configuration file

Re: [PERFORM] 60 core performance with 9.3

2014-07-30 Thread Tomas Vondra
On 30 Červenec 2014, 14:39, Tom Lane wrote:
> "Tomas Vondra"  writes:
>> On 30 Červenec 2014, 3:44, Mark Kirkwood wrote:
>>> While these numbers look great in the middle range (12-96 clients), the
>>> benefit looks to be tailing off as client numbers increase. Also running
>>> with no stats (and hence no auto vacuum or analyze) is way too scary!
>
>> By disabling the statistics collector you lose runtime counters - number of
>> sequential/index scans on a table, tuples read from a relation, etc. But
>> it does not influence VACUUM or planning at all.
>
> It does break autovacuum.

Of course, you're right. It throws away info about how much data was
modified and when the table was last (auto)vacuumed.

It's clear proof that I really need to drink at least one cup of
coffee before doing anything in the morning.

Tomas





Re: [PERFORM] 60 core performance with 9.3

2014-07-30 Thread Tom Lane
"Tomas Vondra"  writes:
> On 30 Červenec 2014, 3:44, Mark Kirkwood wrote:
>> While these numbers look great in the middle range (12-96 clients), then
>> benefit looks to be tailing off as client numbers increase. Also running
>> with no stats (and hence no auto vacuum or analyze) is way too scary!

> By disabling statistics collector you loose runtime counters - number of
> sequential/index scans on a table, tuples read from a relation aetc. But
> it does not influence VACUUM or planning at all.

It does break autovacuum.

regards, tom lane




Re: [PERFORM] Very slow planning performance on partition table

2014-07-30 Thread Rural Hunter
I think I understand what happened now. I have another monitor script
that runs periodically and calls pg_cancel_backend and pg_terminate_backend
for those hanging update statements. However, for some unknown reason the
cancel and terminate commands don't take effect on the pgsql side for
those update statements.
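
A minimal sketch of that kind of monitor query (the state filter, pattern,
and threshold are illustrative, not the actual script):

  psql -c "SELECT pg_cancel_backend(pid)   -- or pg_terminate_backend(pid)
           FROM pg_stat_activity
           WHERE state = 'active'
             AND query LIKE 'update%'
             AND now() - query_start > interval '10 minutes';"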


But I think the pgbouncer side was notified by the cancel or terminate
command. It then drops the old connections and creates new ones, while the
old connections still hang on the pgsql side. That's why the connection
status shows CLOSE_WAIT and there are more processes on the pgsql side
than pgbouncer has defined. So the root cause is still on the pgsql side;
it shouldn't hang there. What the hanging processes were doing is in my
previous posts: many identical concurrent statements updating a
partitioned table without the partition key specified in the conditions.
The gdb trace shows this:

(gdb) bt
#0  0x7f8cea310db7 in semop () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x005f97d3 in PGSemaphoreLock ()
#2  0x00638153 in LWLockAcquire ()
#3  0x004a90d0 in ginTraverseLock ()
#4  0x004a9d0b in ginFindLeafPage ()
#5  0x004a8377 in ginInsertItemPointers ()
#6  0x004a4548 in ginEntryInsert ()
#7  0x004ae687 in ginInsertCleanup ()
#8  0x004af3d6 in ginHeapTupleFastInsert ()
#9  0x004a4ab1 in gininsert ()
#10 0x00709b15 in FunctionCall6Coll ()
#11 0x0047b6b7 in index_insert ()
#12 0x0057f475 in ExecInsertIndexTuples ()
#13 0x0058bf07 in ExecModifyTable ()
#14 0x005766e3 in ExecProcNode ()
#15 0x00575ad4 in standard_ExecutorRun ()
#16 0x0064718f in ProcessQuery ()
#17 0x006473b7 in PortalRunMulti ()
#18 0x00647e8a in PortalRun ()
#19 0x00645160 in PostgresMain ()
#20 0x0060459e in ServerLoop ()
#21 0x006053bc in PostmasterMain ()
#22 0x005a686b in main ()
(gdb) q

It just hangs there forever and eventually blocks all the other update
statements if I don't stop pgbouncer. When this happens, all the CPUs are
consumed by the hanging processes and the server load is very high: it
stays at several hundred, compared with about 20 normally, which causes
performance problems for every task on the server.
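
The ginHeapTupleFastInsert/ginInsertCleanup frames in the backtrace point
at GIN's fast-update pending list. One knob commonly discussed for that
code path (not something verified in this thread) is disabling fastupdate
on the affected index, so inserts bypass the shared pending list (the
index name is illustrative):

  psql -c "ALTER INDEX my_gin_index SET (fastupdate = off);"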





Re: [PERFORM] Why you should turn on Checksums with SSDs

2014-07-30 Thread Tomas Vondra
On 30 Červenec 2014, 5:12, Josh Berkus wrote:
> Explained here:
> https://www.usenix.org/system/files/conference/fast13/fast13-final80.pdf
>
> 13 out of 15 tested SSDs had various kinds of corruption on a power-out.
>
> (thanks, Neil!)

Well, only four of the devices supposedly had power-loss protection
(battery, capacitor, ...) so I guess it's not really that surprising that
the remaining 11 devices failed in a test like this. Although it really
shouldn't damage the device, as apparently happened during the tests.

Too bad they haven't mentioned which SSDs they've been testing
specifically. While I understand the reason for that (HP Labs can't just
point at products from other companies), it significantly limits the
usefulness of the study. Too many companies are producing crappy
consumer-level devices, advertising them as "enterprise". I could name a
few ...

Maybe it could be deciphered using the information in the paper
(power-loss protection, year of release, ...).

I'd expect to see the Intel 320/710 there, but that seems not to be the
case, because those devices were released in 2011 and all four devices
with power-loss protection have year=2012. Or maybe it's the year when
that particular device was manufactured?


regards
Tomas





Re: [PERFORM] 60 core performance with 9.3

2014-07-30 Thread Tomas Vondra
On 30 Červenec 2014, 3:44, Mark Kirkwood wrote:
>
> While these numbers look great in the middle range (12-96 clients), the
> benefit looks to be tailing off as client numbers increase. Also running
> with no stats (and hence no auto vacuum or analyze) is way too scary!

I assume you've disabled the statistics collector, which has nothing to do
with vacuum or analyze.

There are two kinds of statistics in PostgreSQL - data distribution
statistics (which are collected by ANALYZE and stored in actual tables
within the database) and runtime statistics (which are collected by the
stats collector and stored in a file somewhere on the disk).

By disabling the statistics collector you lose runtime counters - number of
sequential/index scans on a table, tuples read from a relation, etc. But
it does not influence VACUUM or planning at all.

Also, it's mostly async (send over UDP and you're done) and shouldn't make
much difference unless you have a large number of objects. There are ways to
improve this (e.g. by placing the stat files into a tmpfs).
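
A sketch of the tmpfs variant (the mount point and size are illustrative;
stats_temp_directory only needs a reload, not a restart):

  # mount a small tmpfs for the runtime stats files
  mount -t tmpfs -o size=256M tmpfs /var/lib/pgsql_stat_tmpfs
  # point postgresql.conf at it:
  #   stats_temp_directory = '/var/lib/pgsql_stat_tmpfs'
  psql -c "SELECT pg_reload_conf();"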

Tomas


