Justin,

Thanks for the extensive reading list, very educative.
After reading
https://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/
I was thinking that it could be a NUMA/THP-related problem.
Turning off THP solved the "swap storm" problem. Some queries are even 40%
faster with THP off. Then also turning off KSM improved performance by
another 5%.
I was seriously worried about this issue as we received today another
server with 144GB of RAM.
I will try to post a little summary of all the suggestions I received via
this thread later this week/early next week.
Thanks!

Charles

On Tue, Jul 18, 2017 at 8:01 PM, Justin Pryzby <[email protected]> wrote:

> On Tue, Jul 18, 2017 at 02:13:58PM -0300, Claudio Freire wrote:
> > On Tue, Jul 18, 2017 at 1:01 PM, Claudio Freire <[email protected]> wrote:
> > > On Tue, Jul 18, 2017 at 6:20 AM, Charles Nadeau
> > > <[email protected]> wrote:
> > >> Claudio,
> > >>
> > >> At one moment
> > >> during the query, there is a write storm to the swap drive (a bit like this
> > >> case:
> > >> https://www.postgresql.org/message-id/AANLkTi%3Diw4fC2RgTxhw0aGpyXANhOT%3DXBnjLU1_v6PdA%40mail.gmail.com).
> > >> I can hardly explain it as there is plenty of memory on this server.
> > >
> > > That sounds a lot like NUMA zone_reclaim issues:
> > >
> > > https://www.postgresql.org/message-id/[email protected]
> >
> > I realize you have zone_reclaim_mode set to 0. Still, the symptoms are
> > eerily similar.
>
> Did you look at disabling KSM and/or THP ?
>
> sudo sh -c 'echo 2 >/sys/kernel/mm/ksm/run'
>
> https://www.postgresql.org/message-id/20170524155855.GH31097%40telsasoft.com
> https://www.postgresql.org/message-id/CANQNgOrD02f8mR3Y8Pi=[email protected]
> https://www.postgresql.org/message-id/CAHyXU0y9hviyKWvQZxX5UWfH9M2LYvwvAOPQ_DUPva2b71t12g%40mail.gmail.com
> https://www.postgresql.org/message-id/20130716195834.[email protected]
> https://www.postgresql.org/message-id/CAE_gQfW3dBiELcOppYN6v%3D8%2B%2BpEeywD7iXGw-OT3doB8SXO4_A%40mail.gmail.com
> https://www.postgresql.org/message-id/flat/1436268563235-5856914.post%40n5.nabble.com#[email protected]
> https://www.postgresql.org/message-id/CAL_0b1tJOZCx3Lo3Eve1RqGaT%2BJJ_[email protected]
> https://www.postgresql.org/message-id/[email protected]
> https://www.postgresql.org/message-id/1415981309.90631.YahooMailNeo%40web133205.mail.ir2.yahoo.com
> https://www.postgresql.org/message-id/CAHyXU0yXYpCXN4%3D81ZDRQu-oGzrcq2qNAXDpyz4oiQPPAGk4ew%40mail.gmail.com
> https://www.pythian.com/blog/performance-tuning-hugepages-in-linux/
> http://structureddata.org/2012/06/18/linux-6-transparent-huge-pages-and-hadoop-workloads/
>
> Justin
>

--
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/
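
For anyone finding this thread in the archives: the message above does not say exactly how THP was turned off, so the following is only a rough sketch of one common way to inspect and disable THP and KSM through sysfs. It assumes the standard upstream-kernel paths (some distribution kernels use a different THP directory), and the function names `active_mode`, `show_settings` and `disable_thp_ksm` are made up for illustration. The changes do not survive a reboot.

```shell
#!/bin/sh
# Sketch only: paths assume a mainline kernel layout and may differ
# on some distributions.
THP_DIR=/sys/kernel/mm/transparent_hugepage
KSM_RUN=/sys/kernel/mm/ksm/run

# active_mode (hypothetical helper): print the bracketed choice from a
# sysfs line such as "always madvise [never]".
active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\(.*\)].*/\1/p'
}

# show_settings: report the current THP and KSM state; changes nothing.
show_settings() {
    if [ -r "$THP_DIR/enabled" ]; then
        echo "THP enabled: $(active_mode "$(cat "$THP_DIR/enabled")")"
    fi
    if [ -r "$THP_DIR/defrag" ]; then
        echo "THP defrag:  $(active_mode "$(cat "$THP_DIR/defrag")")"
    fi
    if [ -r "$KSM_RUN" ]; then
        echo "KSM run:     $(cat "$KSM_RUN")"
    fi
}

# disable_thp_ksm: turn THP off and stop KSM until the next reboot.
# Must be run as root; it is defined here but not called.
disable_thp_ksm() {
    echo never > "$THP_DIR/enabled"
    echo never > "$THP_DIR/defrag"
    echo 2 > "$KSM_RUN"   # 2 = stop ksmd and unmerge already-merged pages
}

show_settings
```

Running the script as-is only prints the current state. Calling `disable_thp_ksm` as root applies the change, and it is probably worth re-running the slow queries before and after to confirm the effect on a given machine.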
