Follow-up Comment #10, sr #111374 (group administration):
> One of the cgit mirror sites was stale for groff by about two whole weeks!

Anything I say here won't help my case. I know I should just stand up
and own up that the mirrors were stale. And I will do that because I
had one job and that was to keep the mirrors in sync and clearly that
failed.

But the logs at the time only showed that the mirror was stale by about
4 days, not 2 weeks. Why might that be different? I think git may have
fooled things here because if one commits locally then that's the date.
Then a few days later that is pushed and the date the git log shows is
the date of the commit, not the push. So sure the log may show the last
commit to be 2 weeks ago but that is not a direct insight into how long
it has been since the last successful mirror sync. But I had one job
and when that broke down it is still a fail. Sorry!

Fortunately git when fetching will do the right thing. And I counted
upon that in the design of things. If a mirror is behind and git is
asked to fetch from it then nothing happens. Later when git is asked to
fetch and it hits an updated mirror then of course it will fetch the new
bits. As the RR-DNS rotates around through the addresses things will
work out for the CI/CD automated systems.

How does Round-Robin-DNS work?

Yes as you surmise there are completely independent mirror systems
running in parallel. There are currently 7 systems with two more in the
queue to be added. It's growing to be somewhat of a large collection to
manage. The primary is the upstream system that is used for git push.
The mirrors shield the primary from the load from the AI scraper bots
and DDoS attacks.

RR-DNS is only somewhat randomly distributed. It depends upon client
implementation. Every DNS query will rotate through the list of
addresses. But clients such as web browsers will cache the DNS lookup
and therefore stick with one address rather than rotate through the
list. There are other limitations.
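
As a rough illustration of the address pool described above, here is a
minimal sketch that lists every address the resolver returns for a
round-robin name. It assumes a glibc system where getent supports the
ahosts database. The helper names unique_addrs and list_mirror_addrs
are hypothetical and are not part of any Savannah tooling.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not Savannah tooling.

# Reduce `getent ahosts`-style lines on stdin ("ADDRESS TYPE [NAME]")
# to a sorted list of unique addresses, one per line.
unique_addrs() {
    awk 'NF { print $1 }' | LC_ALL=C sort -u
}

# Print every address the resolver currently returns for a hostname.
# Assumes glibc getent with the ahosts database.
list_mirror_addrs() {
    getent ahosts "$1" | unique_addrs
}

# Example (needs network access):
#   list_mirror_addrs cgit.git.savannah.gnu.org
```

The addresses printed are the pool that non-caching clients such as git
and wget rotate through from one run to the next.
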
The technique is somewhat more thoroughly described on Wikipedia.

    https://en.wikipedia.org/wiki/Round-robin_DNS

The advantage is that it is a completely independently distributed
technique. There is no other fully independent distributed technique
available to us to use with standard servers, networking, and DNS.
Vendors such as CloudFlare and Akamai use stronger infrastructure
methods of distribution which are not readily available to us. So even
though RR-DNS is not without drawbacks it is the best system readily
available to us.

I know that HA (High Availability) systems such as haproxy, traefik,
nginx proxy, and others are often suggested. But they are not
distributed across sites. They are great HA solutions within a single
datacenter location. They are not designed for distributing across a
heterogeneous collection of volunteer contributed systems across many
datacenters. Though haproxy is an excellent solution within a
datacenter site. So I will just say this in order to get ahead of it.

> I'm not seeing any hostname at the top of any cgit pages, and don't remember
> having ever seen it. Also, I would expect such a hostname in the footer:

There are only a few options available when using cgit and one of them
is the cgit root-desc string. So I embedded the hostname in that
string. It's not perfect but it is at least a clue.

Here is an automated way to print that string. This will show the
mirror that is being queried for each wget run.

    wget -O- -q https://cgit.git.savannah.gnu.org/cgit/ | grep cgit.browser

Since wget does not cache the DNS lookup between runs it will perform a
new DNS query each time. It is operating as git itself will operate.
It will rotate through the RR-DNS address list. A web browser such as
Firefox, Chromium, or others will cache the result though and stick with
one address until the cache expires in the browser.

> ... this didn't work due to HTTPS default and inappropriate or missing SSL
> certificate matches

Cloning over the git-daemon protocol is the easiest way to probe
individual IP addresses. For example, picking a single address from the
list, one can clone from a selected mirror like this.

    git clone --depth=1 git://15.204.9.231/test-project.git
    git clone --depth=1 git://"[2604:2dc0:202:300::5d3]"/test-project.git

Don't fixate on that IP address. It's dynamic. That one in particular
is going to be removed from the pool soon. But it should be online
today.

I have a checker which is testing that the service is online using the
above technique. That reports if it is online. It does not know if it
is stale. I need to set up something different which will test for
stale repositories. There are a thousand plus repositories however so
it will need to be something which is a surrogate for the collection.
It is not practical to test each of the thousand individually all of the
time.

To dig into a cgit page one needs something a little more invasive.
Saying cgit of course implies HTTPS protocol. Without using a chroot
and overriding /etc/hosts or creating an nsswitch module we just have to
use the address and disable certificate checking because the certificate
won't match the address. (I suppose we could now add IP certificates.)

    wget -q -O- --no-check-certificate --header="Host: cgit.git.savannah.gnu.org" https://15.204.9.231/cgit/test-project.git/commit/ | grep -F '/commit/?id='

That gets messier very quickly. That's also part of my automated
testing to ensure that the gitweb, cgit, http, https services are all
operating.

I don't know what else to add here so I am just going to post this and
keep working on things.

    _______________________________________________________

Reply to this item at:

  <https://savannah.nongnu.org/support/?111374>

_______________________________________________
Message sent via Savannah
https://savannah.nongnu.org/
