Re: [Savannah-hackers-public] Anyone have any news on Savannah?
It looks like the automatic restart is working. But the underlying problem remains. After a restart...

  [Mon Aug 15 17:47:40.409406 2022] [mpm_prefork:notice] [pid 16147] AH00173: SIGHUP received. Attempting to restart
  [Mon Aug 15 17:47:41.241212 2022] [mpm_prefork:notice] [pid 16147] AH00163: Apache/2.4.29 (Trisquel_GNU/Linux) OpenSSL/1.1.1 configured -- resuming normal operations
  [Mon Aug 15 17:47:41.241352 2022] [core:notice] [pid 16147] AH00094: Command line: '/usr/sbin/apache2'
  postdrop: warning: mail_queue_enter: create file maildrop/338453.17005: Permission denied
  postdrop: warning: mail_queue_enter: create file maildrop/933049.17009: Permission denied
  postdrop: warning: mail_queue_enter: create file maildrop/339760.17005: Permission denied
  postdrop: warning: mail_queue_enter: create file maildrop/934393.17009: Permission denied
  ...

Endless logs of mail failure, which I don't yet understand. It would not be normal for the system to be sending mail at the rate that attempts are being logged. Also, testing shows that email is working okay otherwise. Something is trying to send a lot of mail but fortunately failing. The access logs don't immediately point to a culprit.

But at least it has been automatically restarted successfully and that is keeping the web site online. I continue to poke at things...

Bob

P.S. I really wish this were nginx instead of apache. Every time I am forced to deal with apache problems I feel like I am being punished for bad karma from a previous life. But maybe I am...
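
One way to narrow down what is generating the postdrop warnings quoted above is to tally them. The following is a minimal Python sketch, not something run on the Savannah host. It assumes the number after the dot in the maildrop file name (17005 and 17009 in the log) identifies the submitting process; that reading is an assumption and is not confirmed in this thread. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#   postdrop: warning: mail_queue_enter: create file maildrop/338453.17005: Permission denied
# Group 1 is the part before the dot, group 2 the part after it.
POSTDROP_RE = re.compile(
    r"postdrop: warning: mail_queue_enter: "
    r"create file maildrop/(\d+)\.(\d+): Permission denied"
)


def tally_postdrop_failures(lines):
    """Count failed maildrop writes per file-name suffix.

    The suffix is ASSUMED to identify the submitting process.
    """
    counts = Counter()
    for line in lines:
        match = POSTDROP_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# The four warning lines quoted in the message above.
sample = [
    "postdrop: warning: mail_queue_enter: create file maildrop/338453.17005: Permission denied",
    "postdrop: warning: mail_queue_enter: create file maildrop/933049.17009: Permission denied",
    "postdrop: warning: mail_queue_enter: create file maildrop/339760.17005: Permission denied",
    "postdrop: warning: mail_queue_enter: create file maildrop/934393.17009: Permission denied",
]

for suffix, count in sorted(tally_postdrop_failures(sample).items()):
    print(suffix, count)
```

If the suffix assumption holds, a small set of suffixes with large counts would suggest a few long-lived processes retrying, rather than many independent senders.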
Re: [Savannah-hackers-public] Anyone have any news on Savannah?
On Mon, 2022-08-15 at 13:28 -0600, Bob Proulx wrote:
> Paul Smith wrote:
> > Unfortunately it seems to be down / under attack again this morning
> > :(
>
> I see that my attempts at mitigation of the current problem are
> failing. I am looking into things again now. I'll type this in
> stream of consciousness as I work the problem and then send it.

Thanks, the site is currently working for me. And I was interested to read the notes on your debug session :).
Re: [Savannah-hackers-public] Anyone have any news on Savannah?
Paul Smith wrote:
> Unfortunately it seems to be down / under attack again this morning :(

I see that my attempts at mitigation of the current problem are failing. I am looking into things again now. I'll type this in stream of consciousness as I work the problem and then send it.

I may have fixed the insufficient mitigation for the apache web server wedging up. I had thought that "apachectl graceful" should be sufficient to restart the apache server. But no, it was not. The apache logs were filled with attempts to restart using that command and obviously it wasn't working.

It appears that what is happening is that external client agents are connecting to the apache web server and hanging onto connections forever. This causes apache to spawn up to the MaxRequestWorkers number of processes and hold there. And it doesn't matter what the number is. Eventually all server processes are consumed regardless. Processes respond to other requests until the malicious agent eventually grabs them and holds onto them. That's when no one else can get a response from the site.

Meanwhile the memory used by that number of processes must stay below the memory available. If it is larger than what is available then of course that's bad and the system starts swapping. I reduced the number from the large 255 value to 32 yesterday on the previous pass at this tuning.

The graceful action politely waits for the client to finish processing. In this case that never occurs, because the abusive agents are not polite and hold on forever. Therefore using graceful is insufficient. That is why it requires the forced stop action to make apache stop talking to the abusive agents and drop all connections. And of course then a start works.

I changed that to be a full "apachectl stop" followed by an "apachectl start" and that worked. Rather than simply restarting by hand, I changed the automatic mitigation and let it run, which tested that it would automatically detect the problem and restart okay. But I am going to configure the "restart" action.
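
The forced stop-then-start sequence described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the mitigation script actually running on the server: the function names, the use of pgrep to count apache2 processes, and the 30 second timeout are all hypothetical. The sketch waits for the old processes to exit before issuing the start.

```python
import subprocess
import time


def apache_process_count():
    """Count running apache2 processes with pgrep (ASSUMED to be installed)."""
    result = subprocess.run(
        ["pgrep", "-x", "apache2"],
        capture_output=True, text=True, check=False,
    )
    # pgrep prints one PID per line and prints nothing when there is no match.
    return len(result.stdout.split())


def run_apachectl(action):
    """Run 'apachectl <action>', raising if the command fails."""
    subprocess.run(["apachectl", action], check=True)


def forced_restart(run=run_apachectl, count=apache_process_count,
                   timeout=30.0, poll=0.5, sleep=time.sleep):
    """Stop apache, wait until no old processes remain, then start it.

    Returns True if the start was issued, or False if the old processes
    were still present after 'timeout' seconds (hypothetical default).
    """
    run("stop")
    waited = 0.0
    while count() > 0:
        if waited >= timeout:
            return False
        sleep(poll)
        waited += poll
    run("start")
    return True
```

A monitoring job could call forced_restart() once it decides the server is wedged. The run, count and sleep parameters exist only so the logic can be exercised without touching a real server.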
It makes me nervous that the automation triggers "stop;start" immediately with no delay. I worry that it might take a moment for the stop to propagate and for all of the old apache server processes to actually exit. And if they don't exit then the start can't succeed. It's a race condition due to the asynchronous behavior of systemd. Though the automation would trigger again, and that attempt might succeed if the previous one failed. It feels like "apachectl restart" would handle this more reliably, since that instructs the supervisor process to perform the restart. I will configure it while the current malicious agents are active, which helps us test it.

[[ In this investigation I see that on systemd systems there is an "interesting" control loop. apachectl calls systemd and systemd calls apachectl. This would be a loop except for a programmed-in internal variable that breaks the loop. Isn't it a wonderful world that we now live in? Not! ]]

If the restart action proves insufficient (though I think it will work) then I will script up a stronger, more explicit restart action. Let's hope that isn't needed.

Additionally I tuned the apache server to have a smaller number of MaxRequestWorkers than the default 255, which is more than the available memory can support. I have it currently set to a more conservative 32. Now that things are observable again, it appears that each process uses about 25MB of memory, for about 800MB consumed in total in this configuration. We could increase that number somewhat and use more memory for server processes. Increasing it would eat into the available file system buffer cache though. Life is a tradeoff. But this seems pretty responsive right now. I'll leave it this way for a while and observe.

Additionally I configured MaxConnectionsPerChild to be 100 rather than unlimited. This will cause apache to restart server processes after that number of client connections have been handled.
That's just a useful "random bug" guard, restarting processes every so often rather than never.

> Don't people have better things to do with their lives? Sigh.

It's why we can't have nice things. :-(

Bob
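
The memory arithmetic behind the MaxRequestWorkers choice in the message above can be written out as a small Python sketch. The 25MB per process and the 32 and 255 worker counts come from the message; the 1200MB budget in the last line is a made-up figure for illustration only, and the function names are hypothetical.

```python
def total_worker_memory_mb(workers, per_process_mb):
    """Memory consumed if every worker process is running."""
    return workers * per_process_mb


def max_workers_for_budget(budget_mb, per_process_mb):
    """Largest worker count whose combined memory fits within budget_mb."""
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    return int(budget_mb // per_process_mb)


# Figures from the message: about 25MB per process, 32 workers configured.
print(total_worker_memory_mb(32, 25))    # 800
# The earlier setting of 255 workers at the same per-process size.
print(total_worker_memory_mb(255, 25))   # 6375
# Hypothetical budget of 1200MB set aside for apache processes.
print(max_workers_for_budget(1200, 25))  # 48
```

Whatever budget is chosen, memory given to worker processes is memory taken from the file system buffer cache, which is the tradeoff mentioned in the message.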
Re: [Savannah-hackers-public] Anyone have any news on Savannah?
On Sun, 2022-08-14 at 22:06 -0600, Bob Proulx wrote:
> Paul Smith wrote:
> > I haven't been able to reach the Savannah website for most of the
> > day.
> > Things like the Git service are available but the website is not.
>
> Thanks for the report. It appears that some agent was pounding on
> the web site. There were max processes of apache2 web server running
> and nothing making progress. I killed all and restarted the web
> server and things seem to be functional now.
>
> It looks like the fail2ban dynamic rules were not transferred over to
> the new system. We have some custom rules there that help block
> abusive agents. I'll get those set up on the system.

Thanks Bob. Unfortunately it seems to be down / under attack again this morning :(

Don't people have better things to do with their lives? Sigh.