Roi Martin wrote:
> First of all, sorry for the late reply. Some personal stuff has kept me
> away from the computer for quite some time.

No worries! I am also juggling life, the universe, and everything. We
are all getting by as best as we can. For example I am writing this on
a train while out on travel right now.

> However, with regard to "https://cgit.git.savannah.gnu.org/cgit/" and
> "https://gitweb.git.savannah.gnu.org/gitweb/", you said that these URLs
> are very likely to change and should not be used in documentation.
>
> Bob Proulx writes:
> > https://cgit.git.savannah.gnu.org//cgit/
> > https://gitweb.git.savannah.gnu.org//gitweb/
> >
> > Those names and paths might need to change but for the moment maybe
> > having them even temporarily will give some relief? I would avoid
> > using them in documentation which will persist because again they are
> > likely not stable naming yet. You have been warned.

Yes. I did say that. But it has been a year now and I think things
have solidified. In particular the above convention helps with the
combinations of permutations of redirects that are needed. For one
example the following is rather programmatic and regular.

    location /git {
        return 302 $scheme://$scheme.$server_name$request_uri;
    }

> Now that all redirects seem to be implemented:
>
>   $ curl -isS https://git.savannah.gnu.org/git/ | grep Location
>   Location: https://https.git.savannah.gnu.org/git/
>   $ curl -isS https://git.savannah.gnu.org/cgit/ | grep Location
>   Location: https://cgit.git.savannah.gnu.org/cgit/
>   $ curl -isS https://git.savannah.gnu.org/gitweb/ | grep Location
>   Location: https://gitweb.git.savannah.gnu.org/gitweb/
>
> What are the recommended URLs to use in documentation?
> "https://{https,cgit,gitweb}.git.savannah.gnu.org" or
> "https://git.savannah.gnu.org"?

That is a good question. I know people prefer shorter ones (look at
the shorter debbugs URLs for an example) and the redirects will then
go to the best place. But that does require the primary to be online
and available and fails if it is not.
Whereas the longer URLs to the mirror pool will work independently of
the primary.

With the redirects in place the load on the primary has been reduced
hugely. Basically it went from a load average of 40+ with 100% of 8
CPUs in use down to an average of 2 with idle CPU available. And so
perhaps I no longer feel strongly about it one way or the other until
and unless some new problem arises. Do you have an opinion? What do
you think?

> While I was using curl to check the previous redirects,
> git.savannah.gnu.org stopped answering my requests. However, it worked
> again when tethering from my phone. Out of curiosity, do you
> temporarily blacklist hosts showing a suspicious behaviour? For
> instance, using curl instead of a browser or git to access these URLs?
> Maybe this is what happened last time?

In the battle against the AI scrapers taking the site offline we are
blocklisting addresses very dynamically. We are blocking at multiple
levels and layers. The site router has blocks, mostly blocking entire
ASN network blocks. The individual systems in the primary pool of
core servers block on each machine. (These are currently not
coordinated but likely will be in the future.) The mirror pool
systems have their own firewalls completely independent of the primary
systems. So there are a lot of specific things that apply to one
machine separately from the others.

For obvious reasons no one wants to discuss publicly specific details
of how the dynamic blocking rules are being created and implemented.
No one wants to give a scraper vendor clues about how to blast through
their defenses. But what I can say is that the problem scrapers do
not obey the robots.txt file, ignoring it completely, and falsify their
User-Agent strings to appear as much as they can like user web
browsers.
This makes it tricky to separate the humans we want to have
access from the scrapers, which should simply clone the repositories
and have full access to the free software source code that way rather
than hammering on a site until it goes offline.

So that's the long answer. The shorter answer is yes. Yes, your
address might have been blocked by tripping one of the rules. If you
tell us your address (you can send it privately to the private
[email protected] list) then we can look to see if so and why. And we
would need to know which system you were connecting to as well, since
as described they are both independent and coupled and we would need
to look in both places.

Bob
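
For readers who want to see how regular the redirect naming quoted
above is, here is a minimal Python sketch of the mapping. It is only
an illustration, not the server configuration: the function name
redirect_target and the SERVICE_LABELS table are hypothetical, the
request hostname stands in for nginx's $server_name, and the cgit and
gitweb entries are inferred from the Location headers in the quoted
curl output, since only the /git rule appears in the message.

```python
from urllib.parse import urlsplit

# Hypothetical table: first path segment -> label prepended to the host.
# None means "use the request scheme", mirroring the quoted /git rule
# ($scheme://$scheme.$server_name$request_uri). The cgit and gitweb
# entries are assumptions inferred from the quoted curl output.
SERVICE_LABELS = {
    "git": None,
    "cgit": "cgit",
    "gitweb": "gitweb",
}


def redirect_target(url):
    """Return the assumed 302 target for url, or None if no rule applies."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    first = segments[1] if len(segments) > 1 else ""
    if first not in SERVICE_LABELS:
        return None
    label = SERVICE_LABELS[first] or parts.scheme
    # Rebuild the equivalent of $request_uri: path plus optional query.
    request_uri = parts.path
    if parts.query:
        request_uri += "?" + parts.query
    # Assumption: the request hostname stands in for $server_name.
    return f"{parts.scheme}://{label}.{parts.hostname}{request_uri}"


if __name__ == "__main__":
    for path in ("/git/", "/cgit/", "/gitweb/"):
        source = "https://git.savannah.gnu.org" + path
        print(source, "->", redirect_target(source))
```

Under those assumptions the three URLs from the curl session map to
the same Location values shown there, and any other path yields no
redirect in this sketch.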
