On Mon, Oct 12, 2009 at 3:04 AM, Bernie Innocenti <[email protected]> wrote:
> On Mon, 12-10-2009 at 08:19 +0200, Sean DALY wrote:
>> ideally, we could take the VLC approach, proposing a link which hits
>> the mirror script, since editors and bloggers just copy/paste the
>> first link they find.
>
> With the Mirrorbrain setup just put in place by David, requests to
> download.sugarlabs.org will be bounced to the best available mirror
> automatically.
>
>
>> by the way did the filename change impact the mirrors? Perhaps Marten
>> can tell us?

Sorry for the long post. Mirror management systems are often considered
black boxes, yet they touch many parts of the project. So it seems
important that we all have a basic understanding of how they work and
how the mirror management system will affect our individual workflows.

From a user's point of view the mirror system is invisible. It is very
similar to the download button on
http://www.mozilla.com/en-US/firefox/personal.html . The user clicks a
button (link) and the rest happens behind the scenes. If you watch the
status bar in the lower left corner of Firefox as you click the
download link, you can see the redirect flashing by.

The mirror system works through a couple of processes:

1. The mirrors pull updates from the primary server. Individual mirrors
   are controlled by their local mirror maintainers. Those maintainers
   configure their mirrors to update from download.sugarlabs.org. It is
   common for maintainers to sync against the primary server anywhere
   from once an hour to once a day. A good example of mirror ages can
   be seen at http://download.services.openoffice.org/mirmon/

2. Mirrorbrain pings all of the mirrors once a minute to make sure that
   they are still alive. (Redirecting to a dead mirror is not good.)

3. Mirrorbrain checks what files are available on each mirror every
   five minutes. (This is a check on step 1 to determine _when_ each
   individual mirror has updated itself.)

4. When a download request comes in, Mirrorbrain uses the information
   gathered in steps 1-3 to correctly redirect the download request:

   4.1. When a download request comes in to download.sugarlabs.org,
        Mirrorbrain checks to see if the file exists in /srv/uploads
        (the name is a bit of a kludge... /var/www-sugarlabs/download
        was a symlink to /srv/upload for historical purposes). If the
        file does not exist, the user receives a file not found error.

   4.2. If the file exists, Mirrorbrain checks the file size. Anything
        smaller than 4K is served directly.
        (It is not worth the database lookup and redirect traffic for
        files smaller than 4K.)

   4.3. If a request is for a file larger than 4K:

        4.3.1. Mirrorbrain determines the physical location where the
               download request originated (for example, Onalaska, WI,
               US, North America or Berlin, Germany, Europe).

        4.3.2. Mirrorbrain searches its database for the closest (good)
               mirror which has the requested file (as determined in
               step 3).

        4.3.3. If a good mirror is found, the download request is
               redirected to the mirror.

        4.3.4. If no mirror is found, the file is served straight from
               download.sugarlabs.org .

There are some side effects of this process:

1. download.sugarlabs.org is the weak link. If it goes down, the entire
   mirror system becomes inoperable. Because of this, the
   infrastructure team hosts d.sl.o on a very reliable machine.

2. There is a lag between when files are available for download and
   when they are available on the mirrors. During this lag, a file is
   served directly from download.sugarlabs.org. At current traffic
   levels that is not a problem.

   2.1. Projects with popular products usually add a third layer called
        staging. For example, when a new version of Firefox (or Fedora)
        is released, downloads spike immediately, so the mirrors
        compete with normal downloads for copies of the content. This
        competition can crash the primary server. Instead, the mirrors
        synchronize against staging. A new popular product is first
        added to the staging tree a couple of days before the actual
        public release date. This gives the mirrors several days to
        update. On the public release date the file is added to the
        download tree. At this point it is available for public
        download and the mirrors have already been pre-seeded.

   2.2. A harder challenge will be activities.sugarlabs.org . When
        activities are approved they are immediately made available for
        public download. This could be a problem if every student in
        Uruguay updates their computer within minutes of a large and
        popular activity such as Etoys being released.
        The good news is that we have at least a year before that
        becomes an issue, and the Mozilla and Mirrorbrain developers
        are also working on the issue.

3. Security. We are going to have to consider that mirrors can be
   hijacked. ISOs will have to be shipped with md5 hashes. Each md5
   hash is small enough (under the 4K threshold) that it is always
   served from the primary server. This will make it harder to attack
   both the ISO and the hash. The activity installer will need to check
   the md5 hash of activity bundles before installing them. The hash is
   calculated as part of the process to upload to a.sl.o.

4. Download tree size. We are going to have to consider the size of the
   download tree. For example, the tree currently has 40 to 50 SoaS
   snapshots which take about 20GB of space. We are going to have to
   determine what gets mirrored. I am looking at setting up two
   separate rsync groups called 'releases' and 'entire.' This would
   allow individual mirror maintainers to choose between the small
   'releases' group and the entire tree.

I'll try to put this onto the CDN wiki page.

david
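
For anyone who prefers code to prose, here is a rough Python sketch of
the redirect decision described in step 4. It is not Mirrorbrain's
actual code, just an illustration under some assumptions: the Mirror
record, the choose_response function, the example URLs and the single
"distance" number (standing in for the geographic lookup in 4.3.1) are
all made up for this example. The post does not say what happens to a
file of exactly 4K; the sketch treats it like a large file.

```python
from dataclasses import dataclass
from typing import Optional

SMALL_FILE_LIMIT = 4 * 1024  # the 4K threshold from step 4.2


@dataclass
class Mirror:
    """Hypothetical record of what the primary knows about one mirror."""
    url: str          # base URL of the mirror
    alive: bool       # result of the once-a-minute ping (step 2)
    files: frozenset  # paths seen by the last file scan (step 3)
    distance: float   # stand-in for closeness to the requester (4.3.1)


def choose_response(path: str, size: Optional[int], mirrors: list) -> str:
    """Return what the primary server would do for one request.

    size is the file size on the primary, or None if the file is absent.
    """
    if size is None:
        return "404"    # 4.1: file does not exist on the primary
    if size < SMALL_FILE_LIMIT:
        return "serve"  # 4.2: too small to be worth a redirect
    # 4.3.2: only live mirrors that already have the file qualify
    candidates = [m for m in mirrors if m.alive and path in m.files]
    if not candidates:
        return "serve"  # 4.3.4: fall back to the primary server
    best = min(candidates, key=lambda m: m.distance)
    # 4.3.3: redirect to the closest good mirror
    return "redirect " + best.url.rstrip("/") + "/" + path.lstrip("/")
```

Note that a freshly uploaded file falls through to "serve" until the
next file scan finds it on a mirror, which is the lag described in side
effect 2.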

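
The md5 check mentioned under security could look roughly like the
following. This is only a sketch of the idea, not the activity
installer's code: the function names md5_of_file and bundle_is_intact
are invented here, and it assumes the expected hash arrives as a hex
string fetched from the primary server.

```python
import hashlib


def md5_of_file(path, chunk_size=64 * 1024):
    """Compute the md5 hex digest of a file without loading it whole."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large ISOs do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bundle_is_intact(path, expected_md5):
    """Return True if the downloaded file matches the published hash."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

An installer would download the bundle from whichever mirror it was
redirected to, fetch the hash from the primary, and refuse to install
when bundle_is_intact returns False.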
