Dear Pietro, are you sure about effective_io_concurrency = 30? Could you please explain the type of disk storage?
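In 9.5 effective_io_concurrency only drives prefetching for bitmap heap scans, and the useful value depends heavily on the storage: roughly the number of spindles for rotating disks, much higher for SSDs, and often very small for a single virtualized volume. A minimal sketch for checking and adjusting it (the value 2 below is only a placeholder, not a recommendation for your hardware):

  -- check what the instance is currently using
  SHOW effective_io_concurrency;

  -- placeholder value: a single virtual disk usually behaves like one or two
  -- spindles, so something far below 30 may be more realistic
  ALTER SYSTEM SET effective_io_concurrency = 2;
  SELECT pg_reload_conf();

On SSD-backed storage the opposite holds and values well above 30 can make sense, which is why the disk type matters here.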
On 14/Nov/2016 12:46, "Pietro Pugni" <pietro.pu...@gmail.com> wrote:

> Dear list,
> I'm looking for some guidelines on how to optimize the configuration
> of a production database dedicated to a DWH application.
> I run the application on different machines and have solved several
> issues so far, but I am struggling with a production environment
> running Red Hat 6.7 and PostgreSQL 9.5.3.
> My application does a lot of reads and many writes (plain "SELECT …
> INTO" and "INSERT", no "UPDATE"), but an order of magnitude fewer
> writes than reads.
> The workflow consists of two big blocks: an ETL phase and the
> workloads on the data imported during the ETL phase.
>
> The biggest schema has about 1.2 billion rows distributed over about
> ten tables; many of those tables are partitioned and have indexes. At
> the moment the database stores two schemas, but I plan to add another
> three schemas of similar size.
>
> The machine is virtualized and has 8 CPUs at about 3GHz, 64GB of RAM
> and 5TB of storage. It runs Red Hat 6.7, kernel 2.6.x.
>
> The configuration changes I made so far are:
> max_connections = 30
> shared_buffers = 32GB
> work_mem = 256MB
> maintenance_work_mem = 4GB
> effective_io_concurrency = 30
> checkpoint_completion_target = 0.9
> random_page_cost = 2.0
> effective_cache_size = 48GB
> default_statistics_target = 1000
>
> autovacuum is on and the collation is 'C'.
>
> The first issue I faced was with maintenance_work_mem: I set it to
> 16GB and the server silently crashed during a VACUUM because I didn't
> consider that autovacuum could take up to autovacuum_max_workers *
> maintenance_work_mem (roughly 48GB). So I lowered maintenance_work_mem
> to 4GB and it worked. *Should I set maintenance_work_mem to a smaller
> value (1GB) after the ETL terminates, or can I leave it at 4GB without
> degrading the overall performance?*
>
> The second issue emerged during an intensive parallel query. I
> implemented a splitter that parallelizes certain kinds of queries.
> There were 8 similar queries running, working on 8 disjoint subsets of
> the same table; this table has roughly 4.5 million rows. These queries
> use SELECT DISTINCT, ORDER BY, OVER (PARTITION BY … ORDER BY) and
> COALESCE(). At a certain point the server crashed and I found the
> following error in the logs:
>
> *postgres server process was terminated by signal 9: Killed*
>
> After some research, I suspected it was the OOM killer, and running
> "dmesg" confirmed it. Reading the documentation and this answer on SO
> ( http://stackoverflow.com/questions/16418173/psql-seems-to-timeout-with-long-queries ),
> I realized that the issue is probably due to a misconfiguration. The
> values I set for this pg instance don't seem to be so wrong, except
> maybe maintenance_work_mem. I will certainly disable memory overcommit
> as suggested by the official docs (
> https://www.postgresql.org/docs/current/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT ),
> but *I was wondering if I could tune the configuration a little
> better. Can someone give me some more advice?*
>
> I run the same application with different data (and workload) on other
> machines, but they have different configurations (Ubuntu 16.04). On
> one of them I previously disabled virtual memory overcommit and never
> experienced that issue, but that machine has 128GB of RAM.
>
> I hope to have been clear enough.
> Thank you everyone
> Pietro
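On the maintenance_work_mem question: with the default autovacuum_max_workers = 3, the worst case at 16GB was indeed 3 * 16GB ≈ 48GB, which matches what you saw. Since 9.4 there is a separate autovacuum_work_mem setting, so one option (just a sketch, the exact values are assumptions) is to cap the autovacuum workers globally and raise maintenance_work_mem only in the ETL session:

  -- cap each autovacuum worker and keep the global manual default conservative
  ALTER SYSTEM SET autovacuum_work_mem = '1GB';
  ALTER SYSTEM SET maintenance_work_mem = '1GB';
  SELECT pg_reload_conf();

  -- only in the ETL session, for manual VACUUM / CREATE INDEX:
  SET maintenance_work_mem = '4GB';

That way the background workers can never gang up to tens of GB, while the ETL still gets the large value where it helps.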
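Regarding the 8 parallel queries: each sort, hash and DISTINCT step in every backend can use up to work_mem, so 8 sessions * a few such nodes * 256MB easily adds several GB on top of the 32GB of shared_buffers, and on a 64GB box that does not leave much slack for the OS cache. Purely as an illustration of the shape I have in mind (table and column names are invented), one of the 8 subset queries might look like:

  -- hypothetical per-subset query; the modulo picks one of the 8 disjoint slices
  SELECT DISTINCT
         person_id,
         COALESCE(event_type, 'unknown') AS event_type,
         row_number() OVER (PARTITION BY person_id ORDER BY event_date) AS rn
  FROM   events
  WHERE  person_id % 8 = 0
  ORDER  BY person_id, rn;

If that is roughly the pattern, lowering work_mem just for those sessions (SET work_mem = '64MB';) or trimming shared_buffers a bit should keep the total comfortably under the 64GB of RAM.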
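For the OOM killer itself, the kernel-resources page you linked essentially comes down to turning off overcommit (plus optionally protecting the postmaster via its OOM score adjustment); the ratio line below is a placeholder that depends on your RAM/swap layout:

  # /etc/sysctl.conf
  vm.overcommit_memory = 2
  # vm.overcommit_ratio may need tuning; the kernel default is 50

Just be aware that with vm.overcommit_memory = 2 the backends will start failing allocations with out-of-memory errors instead of being killed, so the memory budget still has to fit.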