Sorry I did not see this thread until now.

Paul Eggert wrote:
> Can't browse source code via Savannah now. Here are symptoms:

A 502 Bad Gateway is a symptom of the machine being too overloaded to process the request before the web server times out talking to the CGI process. In this case the gitweb process.

The machine is surging in load because the Internet is a hostile place and people are the problem. It is getting hit with a large botnet that has over 3 million addresses. (Therefore I suspect it is composed of either compromised security cameras or compromised phones, but I have no real idea.) And there is never just one botnet. There are actually multiple botnets operating concurrently! Because people are the problem.

We are mitigating things as best as we can to shed the abuse load but still provide member services. Basically though the problem is that on average everything is fine, but when the thundering herd of the botnets increases the machine browns out for a while and we see 502 returns due to the load, and so sometimes it is not usable. Please be patient and try again after a while.

Member access through ssh is almost always better than the web http based protocols because ssh is authenticated and http is anonymous, and anonymity leads to more abuse.

I have scripted in some mitigations which block the botnet addresses dynamically. This means that although things surge, the system is able to shed that load after a bit of running.

> Browsing via the Savannah web didn't work. The not-working was for some time
> (a few minutes? don't recall exactly), which is why I reported it.

We always appreciate feedback! And it allows us to communicate and share what is happening that is causing the problems.

> Now it's failing, with different symptoms:
>
> $ wget -S -O- 'https://git.savannah.gnu.org/gitweb/?p=gettext.git'
...
> ... and at this point it hangs indefinitely.
...
> So I suspect vcs3 is the culprit somehow.

Yes. This happened yesterday. I saw it mostly in real-time because another user reported it on IRC and I spotted it and was able to jump onto the problem within a minute.
And then the machine locked up entirely. Which prevented me from getting more than the main part of the problem.

Earlier in the day something caused the vcs3 machine to have wonky kernel problems (hard to describe and I have no idea of the cause) and to drop the configured swap partition. And without swap the OOM Killer started killing things. Therefore I rolled the git service from vcs3 to vcs2 so I could work on vcs3.

It looked like systemd disabled the custom swap on service, which I know had been enabled because it is a scripted provisioning. I enabled it again and rebooted to verify. And I noticed that it was trying to set up suspend-resume from swap, which is undesired there, so I reconfigured that and rebooted again. All being good I rolled git from vcs2 back onto vcs3 again.

And that's when more trouble started after a while. The problem there was that the botnet block list didn't get loaded after reboot. I had _thought_ that I had it working but that detail did not work. And so vcs3 was open to the large 3 million strong botnet and eventually it was overwhelmed and fell prey to it. I don't think it should have locked up regardless but eventually it did anyway.

I rolled git back from vcs3 onto vcs2 and let it run the night there because I did not have more time to deal with it at that moment.

Now you might be asking why not leave it on vcs2? vcs2 is Trisquel 9 on xfs, which is a good combination, but it needs to be upgraded to Trisquel 11 and currently is running with rsync disabled due to the recent rsync vulnerability. vcs3 is Trisquel 11 on btrfs and fully updated, with all services including rsync fully upgraded and running. (Note that I personally am not a fan of btrfs and do not like the configuration there but that was not my decision to make.) The theory goes that vcs3 is the better system and more new VMs will have the exact same configuration in the future. If it can't handle it then better for us to find out.
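
For anyone curious what "blocking the botnet addresses dynamically" and "reloading the block list after reboot" can look like in principle, here is a minimal sketch. It is not the script running on Savannah. The class name, the thresholds, and the JSON state file are all made up for illustration, and a real setup would feed the result to the firewall rather than keep it in a Python dict. The idea is simply to count recent requests per address, block an address for a while once it exceeds a limit, and save the blocked set so it can be loaded again at boot.

```python
import json
import time
from collections import defaultdict, deque


class DynamicBlockList:
    """Hypothetical sketch: block addresses that send more than
    max_hits requests within window seconds, for block_for seconds."""

    def __init__(self, max_hits=20, window=60.0, block_for=3600.0):
        self.max_hits = max_hits
        self.window = window
        self.block_for = block_for
        self.hits = defaultdict(deque)  # address -> recent request times
        self.blocked = {}               # address -> time the block expires

    def is_blocked(self, addr, now=None):
        now = time.time() if now is None else now
        expires = self.blocked.get(addr)
        if expires is None:
            return False
        if expires <= now:
            del self.blocked[addr]      # the block has aged out
            return False
        return True

    def record(self, addr, now=None):
        """Record one request; return True if addr is blocked."""
        now = time.time() if now is None else now
        if self.is_blocked(addr, now):
            return True
        recent = self.hits[addr]
        recent.append(now)
        # Forget requests that have fallen out of the window.
        while recent and recent[0] <= now - self.window:
            recent.popleft()
        if len(recent) > self.max_hits:
            self.blocked[addr] = now + self.block_for
            del self.hits[addr]
            return True
        return False

    def save(self, path):
        """Write the blocked set to disk so it survives a reboot."""
        with open(path, "w") as f:
            json.dump(self.blocked, f)

    def load(self, path, now=None):
        """Reload a saved blocked set, skipping expired entries."""
        now = time.time() if now is None else now
        with open(path) as f:
            saved = json.load(f)
        self.blocked.update({a: t for a, t in saved.items() if t > now})
```

The part that bit vcs3 corresponds to the load() step here: if nothing calls it at boot then the machine comes up with an empty list and has to learn all of the abusive addresses over again while under full load.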
I need to spend some time to do various cleaning on vcs2 and then upgrade it to Trisquel 11. It's just life and time which is keeping everything from happening all at once. And lately for me life has been a problem. Having vcs2 to fall back onto when vcs3 suffers has been a huge good thing and I am a little worried about disturbing the functionality on vcs2. But we can't stay on Trisquel 9 forever due to the ongoing security issues.

Bob