Matt, you might consider using multiple PostgreSQL replicas in hot standby mode with pgpool in front of them; it may help, but only a little. Because of the way Spacewalk does its queries, I doubt it will help much: the safeties in pgpool will assume just about every query may write, and send it to the master anyway.

A better but more maintenance-intensive option you may want to consider is table partitioning, to alleviate the locks:

https://www.postgresql.org/docs/9.1/static/ddl-partitioning.html
https://wiki.postgresql.org/wiki/Table_partitioning

This is a scaled-down version of what the commercial data-warehouse versions of PostgreSQL do. It's a bit of a pain, but it works well for alleviating locking issues; I've used it with multi-terabyte databases in the past with great success, though it takes a lot of planning and testing to get right. A sketch of the approach follows below.
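To make that concrete, here is a minimal sketch of the 9.1-era inheritance-based partitioning described in the linked docs. The "events" table is hypothetical; whichever large, contended Spacewalk table you actually partition would take the same shape:

    -- Parent table: queries and inserts target this name.
    CREATE TABLE events (
        id         bigserial,
        created_at timestamptz NOT NULL,
        payload    text
    );

    -- Monthly children: the CHECK constraints let the planner skip
    -- partitions that cannot match a query's WHERE clause
    -- (constraint_exclusion is 'partition' by default).
    CREATE TABLE events_2016_10 (
        CHECK (created_at >= '2016-10-01' AND created_at < '2016-11-01')
    ) INHERITS (events);
    CREATE TABLE events_2016_11 (
        CHECK (created_at >= '2016-11-01' AND created_at < '2016-12-01')
    ) INHERITS (events);

    -- Route inserts on the parent into the matching child.
    CREATE OR REPLACE FUNCTION events_insert_router() RETURNS trigger AS $$
    BEGIN
        IF NEW.created_at >= '2016-10-01' AND NEW.created_at < '2016-11-01' THEN
            INSERT INTO events_2016_10 VALUES (NEW.*);
        ELSIF NEW.created_at >= '2016-11-01' AND NEW.created_at < '2016-12-01' THEN
            INSERT INTO events_2016_11 VALUES (NEW.*);
        ELSE
            RAISE EXCEPTION 'no partition for %', NEW.created_at;
        END IF;
        RETURN NULL;  -- the row is already stored in a child table
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER events_insert_trg
        BEFORE INSERT ON events
        FOR EACH ROW EXECUTE PROCEDURE events_insert_router();

The payoff for locking is that maintenance (VACUUM, reindexing, bulk deletes) and hot inserts land on different child tables, so they stop contending for the same locks, and dropping a whole child replaces a huge locking DELETE.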
On Tue, Oct 11, 2016 at 10:04 PM, Matt Moldvan <m...@moldvan.com> wrote:
> Yeah, it was quite a trial to get it to scale to that degree, so I tried a lot of different things. I increased max_fds in the jabber configs and the ulimits for the jabber user, and it's pretty stable now, though even the back-end Python code wasn't written for multiple OSA dispatchers (one overwrites the other's database password in the rhnpushdispatcher table in the main database).
>
> As for proxies, we have 4 in each datacenter, but even that was bringing down the database, due to what I realized later was OSAD restarting on all systems when Puppet ran in the morning. That, coupled with the database locks being taken on the snapshot tables (because Spacewalk for some reason thought the base channels were updated daily), made the UI completely unresponsive and unusable for our operations folks trying to run patching for hundreds of systems at a time.
>
> My main struggle with the proxies was putting them behind F5 local and global load balancers. I signed the SSL certs with the name of the global WIP, and unfortunately, the way the jabber S2S component works, it tried to use that same name for all the proxies. I realized later that I was trying to fit a square peg into a round hole, so I fixed up the Puppet module, disabled snapshots, disabled Jabber on the proxies, and pointed the clients at the masters instead. That was after months of frustration and a growing chorus of complaints about the UI on the masters.
>
> Anyway, having the proxies helped reduce the load on the masters significantly, as they still function with the other components, like Apache httpd and Squid, for caching the RPMs.
>
> On Tue, Oct 11, 2016 at 5:02 PM Paul Robert Marino <prmari...@gmail.com> wrote:
>>
>> Matt, you have a lot of clients, so those numbers start to make sense for increasing the database connections. Also, OSAD isn't meant to scale that high; in fact, you should run out of file handles for it before it even gets to that many clients. Furthermore, I hope you are using Spacewalk proxies; if not, you may find they help you a lot.
>>
>> As for the original poster: look at this page, specifically work_mem: https://www.postgresql.org/docs/9.2/static/runtime-config-resource.html . I would be willing to bet that is your issue, because the default isn't enough for Spacewalk. The easiest way to confirm it is to look in the PostgreSQL base data directory (/var/lib/pgsql/data by default) for a directory called pgsql_tmp; if there are any files in it, you know queries are spilling to disk. You can also look for any queries that have been running for a long time and run EXPLAIN ANALYZE on them. I normally do not suggest this site or any like it, because they often give very wrong answers, but in this case it's not a bad one: http://dba.stackexchange.com/questions/112079/slow-query-performance-due-to-temporary-file . I just don't suggest going straight to 64MB.
>>
>> On Tue, Oct 11, 2016 at 8:37 AM, Allan Moraes <al...@allanmoraes.com.br> wrote:
>> > Thank you for the tips.
>> >
>> > In this case there is 6GB of memory available, and the high I/O occurs on the postgres disk; I/O on the other disks is normal. The system is not using swap, and there is 3GB of swap available.
>> >
>> > I will separate the postgres server and apply your tips.
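Paul's pgsql_tmp check can also be done from inside the database. A sketch, assuming PostgreSQL 9.2 or later (the temp_files/temp_bytes counters and the pid/state/query column names date from 9.2; 9.1 spells the last two procpid/current_query):

    -- How much has each database spilled to temp files since the stats were reset?
    SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_written
      FROM pg_stat_database
     ORDER BY temp_bytes DESC;

    -- Statements that have been running for a long time.
    SELECT pid, now() - query_start AS runtime, state, left(query, 60) AS query
      FROM pg_stat_activity
     WHERE state <> 'idle'
     ORDER BY runtime DESC NULLS LAST;

    -- Watch a sort spill, then stop spilling, as work_mem grows. With 1MB the
    -- plan reports "Sort Method: external merge  Disk: ..."; with 32MB it
    -- switches to an in-memory quicksort. SET is session-local, so you can
    -- test a value before touching postgresql.conf.
    SET work_mem = '1MB';
    EXPLAIN ANALYZE SELECT g FROM generate_series(1, 100000) AS g ORDER BY g;
    SET work_mem = '32MB';
    EXPLAIN ANALYZE SELECT g FROM generate_series(1, 100000) AS g ORDER BY g;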
>> >
>> > 2016-10-10 21:58 GMT-03:00 Matt Moldvan <m...@moldvan.com>:
>> >>
>> >> We have about 6,000 systems to manage, and it was unusable otherwise... I had way too much trouble trying to get OSAD to work through proxies and F5 load balancers, so I ended up pointing them all at two masters that still use the same Postgres database VM. I was also toying with having the database be the back end for OSAD, so with that in mind the number of concurrent clients would often reach higher-than-usual numbers... I tried a lot of different things to get Spacewalk stable, usable, and failing over properly, so I don't know that any of my recommendations or environment-specific settings will be a silver bullet for anyone else, but it can't hurt to try, and learn in the process.
>> >>
>> >> On Mon, Oct 10, 2016 at 6:23 PM Paul Robert Marino <prmari...@gmail.com> wrote:
>> >>>
>> >>> Tuning for 5,000 clients is nuts; that would hurt your performance. Try running pgtune for about 50 to maybe 500 clients max, and I would try the lower setting first.
>> >>>
>> >>> Now let's talk about the high I/O. That usually happens when you don't have enough working memory in PostgreSQL's configuration. When that happens, PostgreSQL creates temp files, which are slow and cause a lot of write I/O during read operations, because it has to spill the data out to the temp files. Note that setting the number of connections too high will exacerbate that issue if it's the root cause.
>> >>>
>> >>> By the way, I've managed up to 400 clients with Spacewalk and never had to disable the snapshots.
>> >>>
>> >>> On Mon, Oct 10, 2016 at 4:48 PM, Matt Moldvan <m...@moldvan.com> wrote:
>> >>> > I had similar issues and ended up first breaking out the database onto its own VM, then increasing the Postgres debug logging. I saw that there were a large number of operations running against the snapshot tables, with locks and so on being held for long periods. In /etc/rhn/rhn.conf, try disabling snapshots with:
>> >>> >
>> >>> > enable_snapshots = 0
>> >>> >
>> >>> > I also did quite a bit of Postgres tuning using pgtune, for 5,000 clients or so:
>> >>> >
>> >>> > pgtune -i data/postgresql.conf -o ./data/postgresql.conf.new -c 5000
>> >>> >
>> >>> > Another thing that may help is installing pgbadger to analyze your Postgres logs... it has some nice visualizations of the types of queries and tables involved, which may point you in the right direction if snapshots aren't the reason for the high utilization: https://github.com/dalibo/pgbadger
>> >>> >
>> >>> > Hope that helps.
>> >>> >
>> >>> > On Mon, Oct 10, 2016 at 4:06 PM Konstantin Raskoshnyi <konra...@gmail.com> wrote:
>> >>> >>
>> >>> >> Because all your systems request information from Spacewalk, and the default installation doesn't make any sense if you have more than 50 machines, so you need to tune Postgres, Tomcat, and Linux itself.
>> >>> >>
>> >>> >> On Mon, Oct 10, 2016 at 12:34 PM, Allan Moraes <al...@allanmoraes.com.br> wrote:
>> >>> >>>
>> >>> >>> Hi,
>> >>> >>> On my CentOS 7 server, Spacewalk 2.4 and PostgreSQL are installed from the default installation.
>> >>> >>> Via iotop, I can see PostgreSQL writing a lot of information all day long. Why does this occur?
>> >
>> > --
>> > Regards...
>> >
>> > Allan Moraes
>> > - Linux Consulting at Venda e Cia
>> > - Founder and Editor at MySQL Box
>> > - Linux System Administrator and MySQL DBA at Umbler
>> >
>> > Cel: (51) 81885480
>> > E-mail: al...@mysqlbox.com.br
>> > Skype: al...@allanmoraes.com.br
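Several replies above trace the unresponsive UI to locks held on the snapshot tables, and pg_locks will show that directly. A sketch, again assuming 9.2+ pg_stat_activity column names; the rhnsnapshot% pattern is an assumption about how the Spacewalk schema names its snapshot tables:

    -- Which snapshot tables are locked right now, and by which backends?
    SELECT c.relname, l.mode, l.granted, l.pid
      FROM pg_locks l
      JOIN pg_class c ON c.oid = l.relation
     WHERE c.relname ILIKE 'rhnsnapshot%';

    -- Who is blocking whom? (Simplified to relation-level locks.)
    SELECT blocked.pid        AS blocked_pid,
           blocked_act.query  AS blocked_query,
           blocking.pid       AS blocking_pid,
           blocking_act.query AS blocking_query
      FROM pg_locks blocked
      JOIN pg_locks blocking
        ON blocking.locktype = blocked.locktype
       AND blocking.database IS NOT DISTINCT FROM blocked.database
       AND blocking.relation IS NOT DISTINCT FROM blocked.relation
       AND blocking.pid <> blocked.pid
       AND blocking.granted
      JOIN pg_stat_activity blocked_act  ON blocked_act.pid  = blocked.pid
      JOIN pg_stat_activity blocking_act ON blocking_act.pid = blocking.pid
     WHERE NOT blocked.granted
       AND blocked.locktype = 'relation';

If the blocking queries turn out to be snapshot writes, that supports the enable_snapshots = 0 suggestion from earlier in the thread.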