Re: [Savannah-hackers-public] Anyone have any news on Savannah?

2022-08-15 Thread Bob Proulx
It looks like the automatic restart is working.  But the underlying
problem remains.

After a restart...

[Mon Aug 15 17:47:40.409406 2022] [mpm_prefork:notice] [pid 16147] AH00173: SIGHUP received.  Attempting to restart
[Mon Aug 15 17:47:41.241212 2022] [mpm_prefork:notice] [pid 16147] AH00163: Apache/2.4.29 (Trisquel_GNU/Linux) OpenSSL/1.1.1 configured -- resuming normal operations
[Mon Aug 15 17:47:41.241352 2022] [core:notice] [pid 16147] AH00094: Command line: '/usr/sbin/apache2'
postdrop: warning: mail_queue_enter: create file maildrop/338453.17005: Permission denied
postdrop: warning: mail_queue_enter: create file maildrop/933049.17009: Permission denied
postdrop: warning: mail_queue_enter: create file maildrop/339760.17005: Permission denied
postdrop: warning: mail_queue_enter: create file maildrop/934393.17009: Permission denied
...

Endless logs of mail failure.  Which I don't yet understand.  It would
not be normal for the system to be sending mail at the rate that
attempts are being logged.  Also testing shows that email is working
okay otherwise.  Something is trying to send a lot of mail but
fortunately failing.  The access logs don't immediately point to a
culprit.  But at least it has been automatically restarted
successfully and that is keeping the web site online.
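
One way to start narrowing down the source of the mail attempts might
be to tally the postdrop warnings by the numeric suffix of the
maildrop file name.  This is only a sketch: it assumes that suffix
identifies the submitting process, which the log itself does not
confirm, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches entries such as:
#   postdrop: warning: mail_queue_enter: create file maildrop/338453.17005: Permission denied
# Whitespace is matched loosely so entries wrapped across lines still match.
PATTERN = re.compile(
    r"postdrop: warning: mail_queue_enter: create file\s+"
    r"maildrop/(\d+)\.(\d+):\s+Permission denied"
)


def count_postdrop_failures(log_text):
    """Count failures grouped by the numeric suffix of the maildrop file.

    Assumption: the suffix after the dot identifies the submitting process.
    """
    return Counter(m.group(2) for m in PATTERN.finditer(log_text))


# The four warnings quoted in the message above.
sample = """\
postdrop: warning: mail_queue_enter: create file maildrop/338453.17005:
Permission denied
postdrop: warning: mail_queue_enter: create file maildrop/933049.17009:
Permission denied
postdrop: warning: mail_queue_enter: create file maildrop/339760.17005:
Permission denied
postdrop: warning: mail_queue_enter: create file maildrop/934393.17009:
Permission denied
"""

for suffix, count in count_postdrop_failures(sample).most_common():
    print(suffix, count)
```

A small number of distinct suffixes with high counts would suggest a
few long-running processes retrying in a loop rather than many
separate senders.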

I continue to poke at things...

Bob

P.S. I really wish this were nginx instead of apache.  Every time I am
forced to deal with apache problems I feel like I am being punished
for bad karma from a previous life.  But maybe I am...



Re: [Savannah-hackers-public] Anyone have any news on Savannah?

2022-08-15 Thread Bob Proulx
Paul Smith wrote:
> Unfortunately it seems to be down / under attack again this morning :(

I see that my attempts at mitigation of the current problem are
failing.  I am looking into things again now.  I'll type this in
stream of consciousness as I work the problem and then send it.

I may have fixed the insufficient mitigation for the apache web server
wedging up.  I had thought that "apachectl graceful" should be
sufficient to restart the apache server.  But no it was not.  The
apache logs were filled with attempts to restart using that command
and obviously it wasn't working.

It appears that what is happening is that external client agents are
connecting to the apache web server and hanging onto connections
forever.  This causes apache to spawn up to the MaxRequestWorkers
number and hold there.  And the number doesn't matter.  Eventually
all server processes are consumed regardless of the limit.  Processes
respond to other requests until the malicious agent eventually grabs
them and holds onto them.  That's when no one else can get a response
from the site.
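
The point that the limit only changes how long exhaustion takes can be
illustrated with a toy model.  This is a simplified sketch, not a
measurement: the tick-based model and the rate of 4 held connections
per tick are invented for illustration.

```python
def ticks_until_exhausted(max_workers, held_per_tick):
    """Return how many ticks pass before no worker is free.

    Toy model: each tick the abusive client grabs `held_per_tick`
    workers and never releases them.  Well-behaved requests finish
    within a tick, so they never shrink the pool permanently.
    """
    if held_per_tick <= 0:
        raise ValueError("held_per_tick must be positive")
    free = max_workers
    ticks = 0
    while free > 0:
        free -= min(held_per_tick, free)
        ticks += 1
    return ticks


# A larger MaxRequestWorkers only delays the point where nothing is free.
for limit in (32, 255):
    print(limit, ticks_until_exhausted(limit, held_per_tick=4))
```

In this model a pool of 32 is gone after 8 ticks and a pool of 255
after 64.  Raising the limit buys time but does not change the
outcome.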

Meanwhile the memory used by that many processes must stay below the
memory available for use.  If it is larger than what is available
then of course that's bad and we start swapping.  I reduced the limit
from the large 255 value to 32 yesterday on the previous pass at this
tuning.

The graceful action politely waits for the client to finish
processing.  Which in this case never occurs because the abusive
agents are not polite and hold on forever.  Therefore using graceful
is insufficient.  That is why it requires the forced stop action to
have apache stop talking to the abusive agents forcibly and to drop
all connections.  And of course then a start works.

I changed that to be a full "apachectl stop" followed by an
"apachectl start" and that worked.  Rather than simply restarting by
hand, I changed the automatic mitigation and let it trigger, which
also tested that it would automatically detect the problem and
restart okay.

But I am going to configure the "restart" action.  It makes me nervous
that the automation triggers "stop;start" immediately with no delay.
I worry that it might take a moment for the stop to propagate and all
of the old apache server processes to actually exit.  And if they
don't exit then the restart action can't succeed.  It's a race
condition due to the asynchronous behavior of systemd.  Though the
automation would trigger again and that one might succeed if the
previous action failed.  Feels like the apachectl restart would handle
this action more reliably since that instructs the supervisor process
to perform the restart.  Will configure it while the current malicious
agents are active to help us test for it.

[[ In this investigation I see that on systemd systems there is an
"interesting" control loop.  apachectl calls systemd and systemd calls
apachectl.  This would be a loop except for a programmed-in internal
variable that breaks the loop.  Isn't it a wonderful world that we now
live in?  Not! ]]

If the restart action proves insufficient (though I think it will
work) then I will script up a stronger more explicit restart action.
Let's hope that isn't needed.
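
One possible shape for such a more explicit restart, purely as a
sketch: stop, wait until no old server processes remain, and only then
start.  The function names, the 30 second timeout, and the use of
"pgrep -c -x apache2" to count processes are assumptions for
illustration, not what is configured on the system.

```python
import subprocess
import time


def stop_wait_start(stop, start, count_running, timeout=30.0, poll=0.5,
                    sleep=time.sleep, clock=time.monotonic):
    """Stop the server, wait for old processes to exit, then start it.

    Returns True if start was issued, False if old processes were still
    present when the timeout expired (in which case start is not called).
    """
    stop()
    deadline = clock() + timeout
    while count_running() > 0:
        if clock() >= deadline:
            return False
        sleep(poll)
    start()
    return True


def apachectl(action):
    """Run 'apachectl <action>', for example 'stop' or 'start'."""
    subprocess.run(["apachectl", action], check=False)


def apache_process_count():
    """Count processes named apache2 (assumes pgrep is available)."""
    out = subprocess.run(["pgrep", "-c", "-x", "apache2"],
                         capture_output=True, text=True, check=False)
    return int(out.stdout.strip() or 0)


# Hypothetical wiring, to be run as root on the web server:
#   stop_wait_start(lambda: apachectl("stop"),
#                   lambda: apachectl("start"),
#                   apache_process_count)
```

Polling for the old processes removes the race between stop and start.
A False return leaves the decision about a forced kill to the caller
rather than doing it silently.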

Additionally I tuned the apache server to have a smaller number of
MaxRequestWorkers than the default 255 which is more than the memory
available.  I have it currently set to a more conservative 32.  Though
now that things are observable again it appears that each process uses
about 25MB of memory for about 800MB of memory consumed in this
configuration.  We could increase that number somewhat and use more
memory for server processes.  Increasing this would eat into available
file system buffer cache though.  Life is a tradeoff.  But this seems
pretty responsive right now.  I'll leave it this way for a while and
observe.
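
The arithmetic behind those numbers, as a small sketch.  The 25MB per
process figure is the rough observation above.  The 2048MB budget in
the last line is a made-up example value, not the memory actually
available on the server.

```python
def worker_memory_mb(max_request_workers, mb_per_process):
    """Approximate memory used when every worker slot is occupied."""
    return max_request_workers * mb_per_process


def max_workers_for_budget(budget_mb, mb_per_process):
    """Largest worker count that fits in a given memory budget."""
    return budget_mb // mb_per_process


print(worker_memory_mb(32, 25))          # current setting
print(worker_memory_mb(255, 25))         # the default limit
print(max_workers_for_budget(2048, 25))  # hypothetical 2048MB budget
```

At 25MB per process, 32 workers come to 800MB while the default of 255
would need 6375MB.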

Additionally I configured MaxConnectionsPerChild to be 100 rather than
unlimited.  This will cause apache to restart server processes after
that number of client requests have been handled.  That's just a
useful "random bug" guard restarting processes every so often rather
than never.

> Don't people have better things to do with their lives?  Sigh.

It's why we can't have nice things.  :-(

Bob



Re: [Savannah-hackers-public] Anyone have any news on Savannah?

2022-08-15 Thread Paul Smith
On Sun, 2022-08-14 at 22:06 -0600, Bob Proulx wrote:
> Paul Smith wrote:
> > I haven't been able to reach the Savannah website for most of the
> > day.
> > Things like the Git service are available but the website is not.
> 
> Thanks for the report.  It appears that some agent was pounding on
> the web site.  There were max processes of apache2 web server running
> and nothing making progress.  I killed all and restarted the web
> server and things seem to be functional now.
> 
> It looks like the fail2ban dynamic rules were not transferred over to
> the new system.  We have some custom rules there that help block
> abusive agents.  I'll get those set up on the system.

Thanks Bob.

Unfortunately it seems to be down / under attack again this morning :(

Don't people have better things to do with their lives?  Sigh.



Re: [Savannah-hackers-public] Anyone have any news on Savannah?

2022-08-14 Thread Bob Proulx
Hi Paul,

Paul Smith wrote:
> I haven't been able to reach the Savannah website for most of the day.
> Things like the Git service are available but the website is not.

Thanks for the report.  It appears that some agent was pounding on the
web site.  There were max processes of apache2 web server running and
nothing making progress.  I killed all and restarted the web server
and things seem to be functional now.

It looks like the fail2ban dynamic rules were not transferred over to
the new system.  We have some custom rules there that help block
abusive agents.  I'll get those set up on the system.

Bob



[Savannah-hackers-public] Anyone have any news on Savannah?

2022-08-14 Thread Paul Smith
I haven't been able to reach the Savannah website for most of the day.
Things like the Git service are available but the website is not.