[PERFORM] pgtune + configurations with 9.3

2014-10-30 Thread Tory M Blue
Greetings all,

I'm trying to wrap my head around updating my configuration files, which
have probably been fairly static since before 8.4.

I've got some beefy hardware, but I have some tables that are over 57GB raw
and end up at 140GB after indexes are applied. One index creation took 7
hours today, so it's time to dive in and see where I'm lacking and what I
should be tweaking.
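
(For reference, a query along these lines is one way to see the table vs.
index split per table; the LIMIT is arbitrary and this is just a sketch:

  psql -c "SELECT relname,
                  pg_size_pretty(pg_table_size(oid))   AS table_size,
                  pg_size_pretty(pg_indexes_size(oid)) AS index_size
             FROM pg_class
            WHERE relkind = 'r'
            ORDER BY pg_total_relation_size(oid) DESC
            LIMIT 10;")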

I looked at pgtune again today and the numbers it's spitting out took me
aback; they are huge. In all the historical conversations and attempts, a few
of these larger numbers netted reduced performance rather than better
performance (but that was on older versions of Postgres).

So I come here today to seek some affirmation that these numbers look good
and that I should look at putting them into my config, either staged or in
one fell swoop.

I will start at the same time migrating my config to the latest 9.3
template...

Postgres Version: 9.3.4, Slony 2.1.3 (migrating to 2.2).
CentOS 6.x, 2.6.32-431.5.1.el6.x86_64
Big HP Boxen.

32 cores, 256GB of RAM. The DB is roughly 175GB in size, but many tables
are hundreds of millions of rows.

The pgtune configuration that was spit out based on the information above:

max_connections = 300
shared_buffers = 64GB
effective_cache_size = 192GB
work_mem = 223696kB
maintenance_work_mem = 2GB
checkpoint_segments = 32
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
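
If it helps with staging these, the context column in pg_settings shows which
of the changed parameters only need a reload versus a full postmaster restart.
A quick sketch:

  psql -c "SELECT name, setting, unit, context
             FROM pg_settings
            WHERE name IN ('shared_buffers', 'effective_cache_size',
                           'work_mem', 'maintenance_work_mem',
                           'checkpoint_segments',
                           'checkpoint_completion_target',
                           'wal_buffers', 'default_statistics_target');"

'postmaster' means a restart, 'sighup' means a reload is enough, and 'user'
can even be changed per session.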

*my current configuration:*

max_connections = 300
shared_buffers = 2000MB
effective_cache_size = 7GB
work_mem = 6GB
maintenance_work_mem = 10GB  -- bumped this to try to get my reindexes done
checkpoint_segments = 100
#wal_buffers = 64kB
#default_statistics_target = 10

Here is my complete configuration (this is my Slony slave server, so fsync
is off and archiving is off, but on my primary fsync=on and archive_mode=on).

listen_addresses = '*'
max_connections = 300
shared_buffers = 2000MB
max_prepared_transactions = 0
work_mem = 6GB
maintenance_work_mem = 10GB
fsync = off
checkpoint_segments = 100
checkpoint_timeout = 10min
checkpoint_warning = 3600s
wal_level = archive
archive_mode = off
archive_command = 'tar -czvpf /pg_archives/%f.tgz %p'
archive_timeout = 10min
random_page_cost = 2.0
effective_cache_size = 7GB
log_destination = 'stderr'
logging_collector = on
log_directory = '/data/logs'
log_filename = 'pgsql-%m-%d.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_min_messages = info
log_min_duration_statement = 15s
log_line_prefix = '%t %d %u %r %p %m'
log_lock_waits = on
log_timezone = 'US/Pacific'
autovacuum_max_workers = 3
autovacuum_vacuum_threshold = 1000
autovacuum_analyze_threshold = 2000
datestyle = 'iso, mdy'
timezone = 'US/Pacific'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
deadlock_timeout = 5s
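
If I stage the changes, the apply-and-verify loop would presumably be
something like this (the data directory path is just a placeholder, and
shared_buffers in particular needs a full restart rather than a reload):

  # edit postgresql.conf, then reload for settings that allow it
  pg_ctl -D /path/to/data reload
  # shared_buffers (and wal_buffers) only take effect after a restart
  pg_ctl -D /path/to/data restart -m fast
  # confirm what the running server actually picked up
  psql -c "SHOW shared_buffers;"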

Also, while it apparently doesn't matter anymore in 9.3, my sysctl.conf has

kernel.shmmax = 68719476736
kernel.shmall = 4294967296

And pgtune recommended:

kernel.shmmax=137438953472
kernel.shmall=33554432
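
For what it's worth, those two recommended values are consistent with each
other: shmmax is in bytes and shmall is in pages, so with the usual 4kB page
size both work out to 128GB. Quick arithmetic:

  getconf PAGE_SIZE                                  # typically 4096
  echo $((137438953472 / 1024 / 1024 / 1024))GB      # shmmax: 128GB
  echo $((33554432 * 4096 / 1024 / 1024 / 1024))GB   # shmall * 4kB pages: 128GB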

Also of note in my sysctl.conf config:

vm.zone_reclaim_mode = 0
vm.swappiness = 10
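
These can be applied and double-checked without a reboot if needed, e.g.:

  sysctl -w vm.zone_reclaim_mode=0
  sysctl -w vm.swappiness=10
  sysctl vm.zone_reclaim_mode vm.swappiness   # verify the live values
  sysctl -p                                   # or just re-read /etc/sysctl.conf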

Thanks for the assistance; watching these index creations crawl along when
you know you have so many more compute cycles to provide makes one go
crazy.

Tory


Re: [PERFORM] Incredibly slow restore times after 9.0 -> 9.2 upgrade

2014-10-30 Thread jmcdonagh
I just had a thought: I know some of these tables are in need of a vacuuming.
Could it be that the dump is dumping a bunch of garbage that the restore then
has to sift through? I don't know enough details to know if this is a dumb
thought or not.

The restore to RDS took roughly the same amount of time. My next move is to
try on a fast instance store, and also do a Postgres 9.0 restore of a pure SQL
dump, but that won't really be a great test since I use the custom format. I'm
assuming here that I can't take the custom dump from 9.2 and apply it to
9.0, or can I?
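
One note on the pure-SQL test: if I'm not mistaken, pg_restore can emit a
plain SQL script from the existing custom-format dump, so it wouldn't need a
separate pg_dump run (file names below are just placeholders):

  pg_restore -f /tmp/dump.sql /path/to/dump.custom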





Re: [PERFORM] Incredibly slow restore times after 9.0 -> 9.2 upgrade

2014-10-30 Thread Jerry Sievers
jmcdonagh <joseph.e.mcdon...@gmail.com> writes:

 I just had a thought: I know some of these tables are in need of a vacuuming.
 Could it be that the dump is dumping a bunch of garbage that the restore then
 has to sift through? I don't know enough details to know if this is a dumb
 thought or not.

No.  However it's true that the dump will take a bit longer having to
scan a bloated table rather than a tight one.

The dump will only output the live rows; psql or pg_restore, whichever
you're using on the target side, will not have to step over any junk.
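
If you want a rough idea of how much dead-row bloat the dump side has to
scan past, something along these lines works (a sketch, run against the
source database):

  psql -c "SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
             FROM pg_stat_user_tables
            ORDER BY n_dead_tup DESC
            LIMIT 10;"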

HTH


 The restore to RDS took roughly the same amount of time. My next move is to
 try on a fast instance store, and also do a Postgres 9.0 restore of a pure SQL
 dump, but that won't really be a great test since I use the custom format. I'm
 assuming here that I can't take the custom dump from 9.2 and apply it to
 9.0, or can I?




-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consult...@comcast.net
p: 312.241.7800




Re: [PERFORM] Incredibly slow restore times after 9.0 -> 9.2 upgrade

2014-10-30 Thread Tomas Vondra
On 29.10.2014 16:12, jmcdonagh wrote:
 Hi Tomas- thank you for your thoughtful response!
 
 
 Tomas Vondra wrote
 On 28.10.2014 21:55, jmcdonagh wrote:
 Hi, we have a nightly job that restores current production data to
 the development databases in a 'warm spare' database so that if the
 developers need fresh data, it's ready during the day. When we moved
 from 9.0 to 9.2 suddenly the restores began to take from a few hours
 to more like 15 hours or so. We're in Amazon EC2, I've tried new EBS
 volumes, warmed them up, threw IOPS at them, pretty much all the
 standard stuff to get more disk performance.

 So, if I understand it correctly, you've been restoring into 9.0, then
 you switched to 9.2 and it's much slower?
 
 Yes- the move was done using snapshots, so it does involve new volumes, but
 I have created new volumes since then to rule out a single bad volume.

My advice would be to do some basic low-level performance tests to rule
this out. Use dd or (better) fio to test basic I/O performance; it's
much easier to spot issues that way.
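
A minimal sketch of what I mean (paths, sizes and block size are just
examples, adjust to your layout):

  # crude sequential-write baseline with dd, O_DIRECT to bypass the page cache
  dd if=/dev/zero of=/path/to/pgdata/ddtest bs=8k count=500000 oflag=direct

  # somewhat more realistic random-write test with fio
  fio --name=randwrite --directory=/path/to/pgdata --rw=randwrite \
      --bs=8k --size=4G --direct=1 --numjobs=1 --runtime=60 --time_based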

 Tomas Vondra wrote
 Is the 9.2 configured equally to 9.0? If you do something like this

    SELECT name, setting
      FROM pg_settings
     WHERE source = 'configuration file';

 on both versions, what do you get?
 
 I no longer have the 9.0 box up but we do track configuration via
 puppet and git. The only configuration change made for 9.2 is:
 
 -#standard_conforming_strings = off
 +standard_conforming_strings = off

Compared to 9.0, I suppose? Anyway, post the non-default config values
at least for 9.2, please.

 'Cause we have an old app that needs this setting on; otherwise we'd
 spend a lot of time trying to fix it.

I doubt standard_conforming_strings has anything to do with the issues.

 Tomas Vondra wrote
 Here's the thing, the disk isn't saturated. The behavior I'm seeing 
 seems very odd to me; I'm seeing the source disk which holds the dump
 saturated by reads, which is great, but then I just see nothing being
 written to the postgres volume. Just nothing happening, then a
 small burst. There is no write queue backup on the destination disk
 either. if I look at pg_stat_activity I'll see something like:

 COPY salesforce_reconciliation (salesforce_id, email,
 advisor_salesforce_id, processed) FROM stdin

 and even for small tables, that seems to take a very long time even
 though the destination disk is almost at 0 utilization.

 So, where's the bottleneck? Clearly, there's one, so is it a CPU, a
 disk or something else? Or maybe network, because you're using EBS?

 What do you mean by 'utilization'? How do you measure that?
 
 The bottleneck is I/O somehow. I say somehow because I see iowait
 averaging about 50% between two CPUs, but there are just no writes to
 the destination EBS volume really happening, just reads from the
 disk where the source dump is located, then bursts of writes to the
 destination volume every so often. It's kind of puzzling. This is
 happening on multiple database servers, in multiple availability
 zones. Driving me bonkers.
 
 What I mean by utilization is util% from iostat -m -x 1.

I find this rather contradictory. At one moment you say the disk isn't
saturated, the next moment you say you're I/O bound.

Also, iowait (as reported e.g. by 'top') is tricky to interpret
correctly, especially on multi-CPU systems (there's a nice intro to the
complexity in [1]). It's really difficult to interpret the 50% iowait
without more info about what's happening on the machine.

IMHO, the utilization (the %util reported by iostat) is much easier to
interpret, because it means '% of time the device was servicing
requests'. It has issues too, because 100% does not mean 'saturated'
(especially on RAID arrays that can service multiple requests in
parallel), but it's better than iowait.

If I had to guess based on your info, I'd bet you're CPU bound, so
there's very little idle time and about 50% of it is spent waiting for
I/O requests (hence the 50% iowait). But in total the amount of I/O is
very small, so %util is ~0.

Please, post a few lines of 'iostat -x -k 1' output. Samples from 'top'
and 'vmstat 1' would be handy too.
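
Something like this captures a minute of all three in one go, if that's
easier:

  iostat -x -k 1 60  > iostat.log &
  vmstat 1 60        > vmstat.log &
  top -b -d 1 -n 60  > top.log &
  wait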

regards
Tomas

[1] http://veithen.blogspot.cz/2013/11/iowait-linux.html


