Re: savannah is down

Bob Proulx Wed, 13 May 2026 16:29:55 -0700

Riccardo Mottola wrote:
> Everything started working again after a couple of hours.


Makes me think it was simply overloading.

> For info, my current IP ...

I do not find your address in any of the current bans.

> AI scrapers are becoming a plague much worse than crawlers. One reason more
> I dislike this AI hype.
> I am part of other Open Source projects and we all have more or less similar
> problems.
> It usually affects wikis, CVS/SVN/GIT browsers, bug trackers and such.

It's an asymmetrical problem.  Small projects are small.  They have one
public facing web server.  They start off as an idea with no
resources.  They face a hundred million proxy bots hitting them
from well-funded AI start-up companies, none of which have yet
posted a profit but which are funded with a zillion dollars in
startup money.

> I was alarmed that SVN repos were down.

For svn note that there is also the svn:// protocol, served by
svnserve.  It's on a different port.  It might survive when the https
port is overwhelmed.  As you saw, the ssh:// protocol was also
available during the same time.

There are not yet any mirrors available for the subversion
repositories.  Yet.  It's in the plan.  Just need server resources and
then time to get them set up.

Things continue to evolve.  Maybe we set up a new port secured with
HTTP basic authentication, in order to have a known good access path
for members?  Maybe.

The HTTP protocol has really opened up the Internet.  It seems that
everything is using it.  And therefore people have in their mind that
it is the best way to do things.  But HTTP almost always means a
resource-heavy backend server behind it.

Let's say someone has a 1 GB RAM virtual machine for a server.  That's
a typical size for a lot of uses.  It used to be more than sufficient.
Let's say that a backend process uses 50 MB for the process size to do
something.  That's not very large these days.  Doing the math...  If I
didn't make some silly error...  That's only 20 backend server
processes before all memory is consumed.  We must limit the number of
backend processes that can be spawned to 20.
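
As a rough sketch of that arithmetic (Python, assuming decimal units so
that 1 GB is 1000 MB, and ignoring the memory the operating system
itself needs; the function name is only for illustration):

```python
# Figures from the example above; 1 GB is taken as 1000 MB (assumption).
RAM_MB = 1000
PROCESS_MB = 50

def max_backends(ram_mb: int, process_mb: int) -> int:
    """Upper bound on backend processes that fit in memory.

    Ignores everything else running on the machine, so a real limit
    would be set somewhat lower than this.
    """
    return ram_mb // process_mb

print(max_backends(RAM_MB, PROCESS_MB))  # 20
```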

A botnet AI scraper hammers the machine with 500 queries per second.
That will cause all 20 backend processes to be 100% busy.  Let's say
that machine has 16 cores.  Twenty busy processes are still going to
keep all of the cpu cores 100% busy.  In practice there will be I/O
wait time for
storage preventing it.  The machine load average will rise to 20, plus
a few more for other miscellaneous processes associated.  So maybe
level at a load of 25.  And it will be completely max'd out.  It will
be operating at its capacity.  But the botnet with millions of proxy
clients available will have plenty more resources to keep the system
overwhelmed.  And even if we scaled that 1 GB to 256 GB that only
increases linearly the number of bots that could be served.  That is
not enough scaling and the botnets would still overwhelm us.
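
To illustrate why adding memory only scales capacity linearly, here is
a small sketch.  The figure of 5 requests finished per second by each
backend process is purely an assumption for illustration, as are the
function names.  It also ignores CPU and storage limits, which would
bite well before memory does at the larger size.

```python
PROCESS_MB = 50        # per-process size from the example above
REQS_PER_WORKER = 5    # assumed: requests one backend finishes per second
OFFERED_RPS = 500      # scraper query rate from the example above

def workers(ram_mb: int) -> int:
    """Backend processes that fit in memory (memory-only bound)."""
    return ram_mb // PROCESS_MB

def capacity_rps(ram_mb: int) -> int:
    """Requests per second the backends could finish, under the assumptions."""
    return workers(ram_mb) * REQS_PER_WORKER

small = capacity_rps(1_000)     # 1 GB machine
large = capacity_rps(256_000)   # 256 GB machine

# Capacity grows by exactly the same factor as the memory did.
print(small, large, large // small)
print("1 GB machine keeps up with the scraper:", small >= OFFERED_RPS)
```

Memory went up 256 times and so did the bound, no more.  A botnet that
can raise its query rate by a larger factor than that is still ahead.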

Also remember that these AI scrapers are not doing svn checkouts.
They are web browsing the repository as if it were a web page,
downloading each of the hash numbered object files without regard.
This is a useless wasted activity for them.  But they don't care.
They have seemingly endless resources right now.  They should be doing
an svn checkout and then scraping their own local sandbox.  But
instead, since the repository is available by https://, it is being
uselessly scraped, regardless of the robots.txt files saying not to
do it to these directories.

I propose this as an illustration that using HTTP for everything and
treating everything as a web protocol has this unintended consequence
that it allows for the horde of scrapers to damage us.  If it were a
different dedicated protocol then this would not be happening at this
level of abuse.  The svn:// and git:// protocol services have so
far not been the main target of abuse.

Bob
