Re: System update spec proposal
Christopher Blizzard wrote:
> On Wed, 2007-06-27 at 12:39 -0400, Mike C. Fletcher wrote:
>> Could we get a summary of what the problem is:
> The main objection to vserver from all the kernel hackers (and those
> of us that have to support them!) is that it's a huge patch

Okay, so a set of smaller, more targeted patches is preferred.

> that touches core kernel bits

Isn't that going to have to happen in all cases where you can provide total isolation of the various elements? I'd think that adding capability-based controls to the major sub-systems would require some modifications. Of course, how core and how extensive the changes are is probably the real issue.

> and it has no plans to make it upstream.

That sounds like a serious problem for maintainability, yes.

Alright, so from my understanding of vserver it sounds like what we want is something along these lines:

* chroot fixes
    o probably the least intrusive and most widely useful part of the system: patches to close a couple of security holes in the chroot system (the fchdir hole and the mknod hole, IIRC)
    o the goal is to make chroot perform reliably and reasonably securely, to provide file-system isolation
    o it might be possible to pull just this work out of vserver
* overlay/COW filesystem
    o the goal here is to allow for a tree of overlays with read-only/read-write support on each layer
    o aufs is the closest project at the moment?
* hardware access capabilities and rate limiting
    o memory
    o cpu
    o disk io
    o network interfaces

where we would want each of those patches to be as small, maintainable and elegant as possible, with all targeted for upstream inclusion. I'm guessing that it's the third set that causes the patch size to balloon, as it seems rather involved. The first two alone, though, should provide quite a lot of the functionality we want and be reasonably general in their interest.

So I guess we'd want three kernel hackers (or so) to work on the three projects simultaneously. First is probably a small project. Second is likely just a matter of massaging the code. Third is an involved project that would likely have to be working with SELinux and the like to try to integrate everything.

Anyway, that's just the way it sounds to me,
Mike

--
Mike C. Fletcher
Designer, VR Plumber, Coder
http://www.vrplumber.com
http://blog.vrplumber.com

___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel
Re: System update spec proposal
On 6/28/07, Wayne Davison [EMAIL PROTECTED] wrote:
> I wouldn't recommend deploying rsync 3 widely just yet. I'm going to be
> working on finalizing the release in the near future, but there is
> still a chance that protocol 30 (which is new for this release) may
> still need to be changed a bit before it is released. The program is
> stable enough that I use it for all my own rsync copying, but I also
> ensure that my installed versions get updated for new releases.

As I've written before, I think that splitting up the rsync per-directory will solve our immediate resource worries, although we'll shortly validate that with some testing. We certainly remain interested in your work on version 3.0.

One rsync feature I would like to see is a '--hints directory' option that would tell the remote rsync that the local machine is highly likely to have the files in the specified remote directory. When blocks mismatch, the local machine can send the hash of the block it has, and if that block is in the remote hints directory the remote machine can send a binary diff instead of the complete contents of the block. That could greatly reduce the bandwidth used.

(With --fuzzy I suppose each block in the hints directory could be hashed and checked for possible matches to remote blocks, but without --fuzzy the --hints directory would be required to match the directory structure of the rsync'ed directory, to allow using one hints file at a time for lower memory requirements. The --fuzzy form would be useful when rsyncing (say) linux-2.6.21/ when both sides have linux-2.6.20/ available, while the non-fuzzy form would be faster when I'm updating local:linux/ to remote:linux-2.6.21/ while giving the remote a hint that I previously had remote:linux-2.6.20/ in there.)

--scott
--
( http://cscott.net/ )
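The proposed --hints behaviour can be illustrated with a toy block classifier. This is purely hypothetical: rsync has no such option, and the names here (plan_transfer, the 4-byte block size) are invented for the sketch. It only shows how an index of hint-block hashes lets mismatched blocks be sent as small diffs rather than in full.

```python
import hashlib

BLOCK = 4  # tiny block size for the demo; a real tool would use KB-sized blocks


def split_blocks(data, size=BLOCK):
    """Chop data into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def sha1(block):
    return hashlib.sha1(block).hexdigest()


def plan_transfer(local_data, target_data, hints_data):
    """Classify each block of the target, from the sender's point of view:
    'match' (receiver already has it, send nothing), 'diff' (the
    receiver's block hash is in the hints index, send a binary diff),
    or 'full' (send the whole block)."""
    hint_index = {sha1(b) for b in split_blocks(hints_data)}
    plan = []
    for local, target in zip(split_blocks(local_data),
                             split_blocks(target_data)):
        if local == target:
            plan.append("match")
        elif sha1(local) in hint_index:
            plan.append("diff")
        else:
            plan.append("full")
    return plan
```

With local data "AAAABBBBCCCC", a target of "AAAAXXXXDDDD", and "BBBB" present in the hints directory, the plan comes out as one matched block, one diffable block, and one full transfer, which is exactly the bandwidth saving the proposal is after.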
Re: System update spec proposal
On Wed, 2007-06-27 at 17:43 -0400, Polychronis Ypodimatopoulos wrote:
> I wrote some code to deal with both problems, so that you have a
> qualitative hop count (offering real numbers instead of just integers
> for hop count) for all nodes in the mesh without
> broadcasting/multicasting any frames. The following snapshot only
> shows one neighbor with perfect reception.
> http://web.media.mit.edu/~ypod/teleporter/dump.png
> I can clean up the code and upload it somewhere.

I would very much appreciate that.
Re: System update spec proposal
On Wed, 2007-06-27 at 18:37 -0400, Ivan Krstić wrote:
> On Jun 27, 2007, at 2:57 AM, David Woodhouse wrote:
>> Nevertheless, it's an accurate description of what happened.
> Let's agree to disagree.

Sounds like a fine plan. As long as we're united on the common goal to drop vserver as soon as possible and replace it with something which is viable and supportable, I really don't care enough to argue about whatever else we might have disagreed upon.

I certainly didn't mean to place blame at your door -- you needed input from kernel hackers and you didn't get it because we were all busy doing other things. That's not your fault.

> In any case, it doesn't matter at this point. We have work to do.

Indeed.

--
dwmw2
Re: System update spec proposal
On Tue, 2007-06-26 at 14:21 -0400, Christopher Blizzard wrote:
> First about approach: you should have given this feedback earlier
> rather than later since Alex has been off working on an implementation
> and if you're not giving feedback early then you're wasting Alex's
> time. Also I would have appreciated it if you had given direct
> feedback to Alex instead of just dropping your own proposal from
> space. It's a crappy thing to do.

I owe Ivan an apology here. I should have handled this in private email instead of on a public mailing list.

--Chris
Re: System update spec proposal
On Tue, 2007-06-26 at 20:45 -0400, Ivan Krstić wrote:
> On Jun 26, 2007, at 7:23 PM, David Woodhouse wrote:
>> because the people working on the security stuff let it all slide for
>> too long and now have declared that we don't have time to do anything
>> sensible.
> That's a cutely surreal take on things -- I really appreciate you
> trying to defuse the situation with offbeat humor :)

Nevertheless, it's an accurate description of what happened. To avoid ruffling feathers unnecessarily I suppose I should have made it clear that there is no blame to be assigned here -- the kernel hackers who would ideally have worked on this were simply busy doing more important things like power management, and didn't do anything more than just say "No, VServer is not workable", which evidently wasn't taken sufficiently seriously.

The plan of record is to use this vserver crap for as short a period of time as possible, until we can implement something which is supportable upstream.

--
dwmw2
Re: System update spec proposal
On Tue, 2007-06-26 at 13:55 -0400, Ivan Krstić wrote:
> Software updates on the One Laptop per Child's XO laptop

First some stray comments:

> 1.4. Design note: rsync scalability
> ---
> rsync is a known CPU hog on the server side. It would be absolutely
> infeasible to support a very large number of users from a single rsync
> server. This is far less of a problem in our scenario for three
> reasons:

What about CPU hogging on the school server? That seems likely to be far less beefy than the centralized server.

> The most up-to-date bundle for each activity in the set is accessed,
> and the first several kilobytes downloaded. Since bundles are simple
> ZIP files, the downloaded data will contain the ZIP file index which
> stores byte offsets for the constituent compressed files. The updater
> then locates the bundle manifest in each index and makes a HTTP
> request with the respective byte range to each bundle origin. At the
> end of this process, the updater has cheaply obtained a set of
> manifests of the files in all available activity updates.

ZIP files have the file index at the *end* of the file.

Now for comments on the general approach:

First of all, there seems to be an exceptional amount of confusion as to exactly how some form of atomic updating of the system will happen. Some people talk about overlays, others about vserver; I myself have thrown in the filesystem transaction idea. I must say this area seems very uncertain, and I worry that this will result in the implementation of none of these options... But anyway, the exact way these updates are applied is quite orthogonal to how you download the bits required for the update, or how you discover new updates. So far I've mainly been working on this part, in order to avoid blocking on the confusion I mentioned above.

As to using rsync for the file transfers: this seems worse than the trivial manifest + sha1-named files on http approach I've been working on, especially with the optional usage of bsdiff I just committed. We already know (have to know, in fact, so we can strongly verify them) the contents on both the laptop and the target image. To drop all this knowledge and have rsync reconstruct it at runtime seems both a waste and a possible performance problem (e.g. cpu and memory overload on the school server, and rsync re-hashing files on the laptop using up battery). You talk about the time it takes to implement another approach, but it's really quite simple, and I have most of it done already. The only hard part is the atomic applying of the bits. Also, there seems to be development needed for the rsync approach too, as there is e.g. no support for xattrs in the current protocol.

I've got the code for discovering local instances of upgrades and downloading them already working. I'll try to make it do an actual (non-atomic, unsafe) upgrade of an XO this week.

I have a general question on how this vserver/overlay/whatever system is supposed to handle system files that are not part of the system image, but still exist in the root file system. For instance, take /var/log/messages or /dev/log. Where are they stored? Are they mixed in with the other system files? If so, then rolling back to an older version will give you e.g. your old log files back. Also, that could complicate the usage of rsync: if you use --delete then it would delete these files (as they are not on the server).

Also, your document contains a lot of comments about what will be in FRS and not. Does this mean you're working on actually developing this system for FRS?
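On the byte-range detail raised above: the ZIP "index" (the central directory) sits at the end of the file, so a manifest-fetching updater would request the file's tail, not its head. A minimal sketch of locating the central directory from a tail of bytes follows; the helper name and the 1024-byte tail are arbitrary choices for the example, and over HTTP the tail would come from a Range request such as "bytes=-1024".

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"  # End Of Central Directory record signature


def parse_eocd(tail):
    """Locate the EOCD record in the final bytes of a ZIP file and
    return (entry_count, central_dir_size, central_dir_offset)."""
    i = tail.rfind(EOCD_SIG)
    if i < 0:
        raise ValueError("EOCD not found; fetch a larger tail")
    # EOCD layout: sig(4) disk_no(2) cd_start_disk(2) entries_this_disk(2)
    #              entries_total(2) cd_size(4) cd_offset(4) comment_len(2)
    (_sig, _disk, _cd_disk, _disk_entries, entries, cd_size, cd_off,
     _comment_len) = struct.unpack_from("<4s4H2IH", tail, i)
    return entries, cd_size, cd_off


# Build a tiny bundle in memory to demonstrate (hypothetical member names).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("manifest", "hello")
    z.writestr("activity/run.py", "print('hi')")
data = buf.getvalue()

entries, cd_size, cd_off = parse_eocd(data[-1024:])
```

Once cd_off and cd_size are known, a second byte-range request can pull just the central directory, which lists the offset of every member, so the manifest itself can then be fetched with a third small range request.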
Re: System update spec proposal
On Tue, 2007-06-26 at 15:03 -0400, Mike C. Fletcher wrote:
> My understanding here is that Alex's system is currently
> point-to-point, but that we don't yet have any way to distribute it on
> the mesh? That is, that we are looking at a research project to
> determine how to make it work in-the-field using some form of
> mesh-based discovery protocol that's going to try to optimise for
> connecting to local laptops. I personally don't care which way we
> distribute, but I'm wary of having to have some mesh-network-level
> hacking implemented to provide discovery of an update server.

I'm not sure what you mean here exactly. Discovery is done using avahi, a well-known protocol which we are already using in many places on the laptop. The actual downloading of files uses http, which is a well-known protocol with many implementations. The only thing I'm missing atm is a way to tell which ip addresses to prefer downloading from because they are close. This information is already available in the mesh routing tables in the network driver (and possibly the arp cache), and it's just a question of getting this info and using it to drive which servers to pick for downloading.

>> Basically aside from the vserver bits, which no one has seen, I don't
>> see a particular advantage to using rsync. In fact, I see serious
>> downsides since it misses some of the key critical advantages of
>> using our own tool, not the least of which is that we can make our
>> tool do what we want, and with rsync you're talking about changing
>> the protocols.
> Hmm, interestingly I see using our own tool as a disadvantage -- not a
> huge one, but a disadvantage nonetheless -- in that we have more
> untested code on the system (and we already have a lot), and in this
> case, in a critical must-never-fail system. For instance, what happens
> if the user is never connected to another XO or school server, but
> only connects to a (non-mesh) WiFi network? Does the mesh-broadcast
> upgrade discovery protocol work in that case?

Avahi works fine for these cases too; after all, it was originally created for normal networks. However, if you never come close to another OLPC machine, then we won't find a machine to upgrade against. It's quite trivial to make it pull from any http server on the net, but that has to be either polled (which I don't like) or initiated manually (which might be fine).
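The "prefer close addresses" step described above could look something like the following sketch. The neighbour-table text format here ('ra=<mac> metric=<n>' per line) is an assumption for illustration only; the real fwt_list_neigh output from the libertas driver differs, and whether a lower metric really means a closer node would need checking against the driver.

```python
def rank_servers(servers, neigh_table, arp_cache):
    """Order candidate update servers by mesh closeness.

    servers: IP addresses of update sources discovered via Avahi.
    neigh_table: neighbour list as text, one 'ra=<mac> metric=<n>' entry
        per line (an assumed format for this sketch).
    arp_cache: mapping of MAC address -> IP address.

    Servers with no known metric sort last, so the laptop still falls
    back to a multi-hop or off-mesh source when no neighbour matches.
    """
    metric_by_ip = {}
    for line in neigh_table.splitlines():
        fields = dict(part.split("=", 1) for part in line.split())
        ip = arp_cache.get(fields.get("ra"))
        if ip is not None:
            metric_by_ip[ip] = int(fields["metric"])
    # Assumption: lower metric = closer node; unknown servers go last.
    return sorted(servers, key=lambda ip: metric_by_ip.get(ip, float("inf")))
```

The point of the sketch is only that the ranking is a small, self-contained join between three data sources the laptop already has: Avahi results, the driver's neighbour table, and the ARP cache.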
Re: System update spec proposal
Alexander Larsson wrote:
> On Tue, 2007-06-26 at 15:03 -0400, Mike C. Fletcher wrote:
> ...
> I'm not sure what you mean here exactly. Discovery is done using
> avahi, a well known protocol which we are already using in many places
> on the laptop. The actual downloading of files uses http, which is a
> well known protocol with many implementations. The only thing I'm
> missing atm is a way to tell which ip addresses to prefer downloading
> from since they are close. This information is already available in
> the mesh routing tables in the network driver (and possibly the arp
> cache), and it's just a question of getting this info and using it to
> drive which servers to pick for downloading.

Ah, somehow in the discussions I'd come under the impression that the only way the system would be allowed to work was a single-hop network link on the mesh. If we already have the information and can always have a fallback, even if it means going a number of hops across the network, that works. Looking back I see that was actually a discussion of bandwidth characteristics that wasn't intended to imply an absolute requirement. I'm reasonably happy with the approach of using Avahi and only using the network topology to inform the decision of which server to use.

That said, I would be more comfortable if the fallback included a way for the laptop to check a well-known machine every X period (e.g. as in Ivan's proposal) and, if there's no locally discovered source, use a publicly available source as the HTTP source by default.

>> Hmm, interestingly I see using our own tool as a disadvantage, not a
>> huge one, but a disadvantage nonetheless, in that we have more
>> untested code on the system (and we already have a lot), and in this
>> case, in a critical must-never-fail system. For instance, what
>> happens if the user is never connected to another XO or school
>> server, but only connects to a (non-mesh) WiFi network? Does the
>> mesh-broadcast upgrade discovery protocol work in that case?
> Avahi works fine for these cases too. Of course, since it was
> originally created for normal networks. However, if you never come
> close to another OLPC machine, then we won't find a machine to upgrade
> against.

Sorry, I should have been clearer that the latter case (never coming close to another OLPC) was the one I was concerned about. I realise such situations will represent only a small percentage of children, but a small percentage of 50,000,000 or so users is a huge number of people to have to teach how to manually upgrade their machines.

> It's quite trivial to make it pull from any http server on the net,
> but that has to be either polled (which I don't like) or initiated
> manually (which might be fine).

I'd advocate that the piece of Ivan's proposal wherein a central mechanism allows even a completely isolated machine to find updates automatically is a good idea. It's a fairly trivial proposal that way: in the same "check to see if we've been stolen" request, download 4 bytes (or so) telling us the currently available version. If we can't get it locally after X period, try to get it from the server (using whatever protocol, be it your own, rsync, BitTorrent or Telepathy (cute; I was actually writing "telepathy" there and then realised it was the name of our networking library)). That is, I'd like to see a robust, automatic mechanism for triggering the laptop's search, and a fallback position that allows for resolution even in the isolated-user case that doesn't require user intervention.

Enjoy,
Mike

--
Mike C. Fletcher
Designer, VR Plumber, Coder
http://www.vrplumber.com
http://blog.vrplumber.com
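The "download 4 bytes (or so)" fallback proposed above can be sketched as below. Everything here is hypothetical (the beacon as an ASCII build number, the function names); it just shows that the isolated-laptop check can be a trivial, cheap poll piggybacked on the existing anti-theft contact.

```python
def check_for_update(local_version, fetch_beacon):
    """Return the newer available build number, or None.

    fetch_beacon is any callable returning the few bytes a well-known
    server publishes as the latest build number (e.g. b"542"). It is
    injected so the transport (HTTP, the anti-theft channel, ...) stays
    out of the picture.
    """
    try:
        latest = int(fetch_beacon().strip())
    except (OSError, ValueError):
        return None  # server unreachable or beacon garbled: do nothing
    return latest if latest > local_version else None
```

A laptop that finds no local Avahi source after its waiting period would run this check against the public server and, on a non-None result, fall back to downloading the update from a well-known URL.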
Re: System update spec proposal
On Wed, 2007-06-27 at 17:42 +0200, Alexander Larsson wrote:
> I'm not sure what you mean here exactly. Discovery is done using
> avahi, a well known protocol which we are already using in many places
> on the laptop. [...] The only thing I'm missing atm is a way to tell
> which ip addresses to prefer downloading from since they are close.
> This information is already available in the mesh routing tables in
> the network driver (and possibly the arp cache), and it's just a
> question of getting this info and using it to drive which servers to
> pick for downloading.

So, like Michail said, do something like:

    n = 0
    while (true) {
        buf = output of `iwpriv msh0 fwt_list_neigh n`
        if (buf is empty)
            break               /* all done */
        parse buf into fields
        hwaddr = fields[0]      /* grab the 'ra' field (the first one) */
        ip4addr = lookup_hwaddr_in_arp(hwaddr)
        /* do something with ip4addr */
        n++
    }

Look on the 'olpc' branch of the libertas driver here:

http://git.infradead.org/?p=libertas-2.6.git;a=blob;f=drivers/net/wireless/libertas/README;hb=olpc
http://git.infradead.org/?p=libertas-2.6.git;a=blob;f=drivers/net/wireless/libertas/ioctl.c;hb=olpc

The README has a description of the command, and ioctl.c has the implementation. Just search for the string "neigh" and you'll find it.
Dan
Re: System update spec proposal
On Wed, 2007-06-27 at 12:26 -0400, Mike C. Fletcher wrote:
> That said, I would be more comfortable if the fallback included a way
> for the laptop to check a well-known machine every X period (e.g. in
> Ivan's proposal) and if there's no locally discovered source, use a
> publicly available source as the HTTP source by default

I think that you're talking about the mail from Scott, not Ivan. And I think that we'll do something like that, yeah. That's one of the easiest parts of the update system. (Also one of the worst mistakes if you get it wrong, a la the Hour of Terror: http://www.justdave.net/dave/2005/05/01/one-hour-of-terror/ -- but that's beside the point.)

--Chris
Re: System update spec proposal
On Wed, 2007-06-27 at 12:39 -0400, Mike C. Fletcher wrote:
> Could we get a summary of what the problem is:

The main objection to vserver from all the kernel hackers (and those of us that have to support them!) is that it's a huge patch that touches core kernel bits and it has no plans to make it upstream. Yes, it's used in a lot of interesting places successfully, but that doesn't mean it's a supportable-over-the-long-term solution. Scale has nothing to do with long-term supportability. And these laptops have to be supported for at least 5 years.

This isn't a new discussion; it's been going on for months and months. Just quietly, that's all.

--Chris
Re: System update spec proposal
On Wed, 2007-06-27 at 17:31 +0200, Alexander Larsson wrote:
> I have a general question on how this vserver/overlay/whatever system
> is supposed to handle system files that are not part of the system
> image, but still exist in the root file system. For instance, take
> /var/log/messages or /dev/log. Where are they stored? Are they mixed
> in with the other system files? If so, then rolling back to an older
> version will give you e.g. your old log files back. Also, that could
> complicate the usage of rsync. If you use --delete then it would
> delete these files (as they are not on the server).

Just a note about these particular files: I don't think that in the final version we're going to be running a kernel or syslog daemon. We're running them right now because they are useful for debugging, but I don't want those out there in the field taking up memory and writing to the flash when they don't have to be. I suspect that for most users they will have very little use.

--Chris
Re: System update spec proposal
Hi,

>> I have a general question on how this vserver/overlay/whatever system
>> is supposed to handle system files that are not part of the system
>> image, but still exist in the root file system. For instance, take
>> /var/log/messages or /dev/log? Where are they stored? Are they mixed
>> in with the other system files?
> Just a note about these particular files. I don't think that on the
> final version that we're going to be running a kernel or syslog
> daemon. We're running them right now because they are useful for
> debugging but I don't want those out there in the field taking up
> memory and writing to the flash when they don't have to be. I suspect
> that for most users they will have very little use.

We're currently using a tmpfs for these (/var/log, /var/run, plus others) so they aren't being written to the flash at all. I'd rather have us keep doing that than turn them off, since we're throwing away useful in-the-field debugging information (for anyone who wants to help fix laptops remotely, not just us) if we turn them off.

- Chris.

--
Chris Ball [EMAIL PROTECTED]
Re: System update spec proposal
On Jun 27, 2007, at 1:50 PM, Christopher Blizzard wrote:
> I think that you're talking about the mail from Scott, not Ivan.

It was my mail; my proposal explicitly talks about this.

--
Ivan Krstić [EMAIL PROTECTED] | GPG: 0x147C722D
Re: System update spec proposal
On Jun 27, 2007, at 2:57 AM, David Woodhouse wrote:
> Nevertheless, it's an accurate description of what happened.

Let's agree to disagree. In any case, it doesn't matter at this point. We have work to do.

--
Ivan Krstić [EMAIL PROTECTED] | GPG: 0x147C722D
System update spec proposal
Software updates on the One Laptop per Child's XO laptop

0. Problem statement and scope
==============================

This document aims to specify the mechanism for updating software on the XO-1 laptop. When we talk about updating software, we are referring both to system software such as the OS and the core services controlled by OLPC that are required for the laptop's basic operation, and to any installed user-facing applications (activities), both those provided by OLPC and those provided by third parties.

1. System updater
=================

1.1. Core goals
---------------

The three core goals of a software update tool (hereafter "updater") for the XO are as follows:

* Security

Given the initial age group of our users, defaulting to automatic detection and installation of updates is the only reasonable solution, both to be able to apply security patches in a timely fashion, and to enable users to benefit from rapid development and improvements in the software they're using. Automatic updates, however, are a security issue unto themselves: compromising the update system in any way can provide an attacker with the ability to wreak havoc across entire installed bases of laptops while bypassing -- by design -- all the security measures on the machine. Therefore, the security of the updater is paramount and must be its first design goal.

* Uncompromising emphasis on fault-tolerance

Given the scale of our deployment, the relatively high complexity of our network stack when compared to currently-common deployments, the unreliability of Internet connectivity even when available, and perhaps most importantly our desire for participating countries to soon begin customizing the official OLPC OS images to best suit them, it is clear that our updater must be fault-tolerant. This is both in the simple sense -- cryptographic checksums need to be used to ensure updates were received correctly -- and in the more complex sense that the likelihood of a human error with regard to update preparation goes up proportionally to the number of different base OS images at play. A fault-tolerant updater will therefore allow _unconditional_ rollback of the most recently applied update. Unconditional here means that, barring the failure of other parts of the system which are dependencies of the updater (e.g. the filesystem), the updater must always know how to correctly unapply an applied update, even if the update was malformed.

* Low bandwidth

For much the same reasons (project scale, Internet access scarcity and unreliability) that require fault-tolerance from the updater, the tool must take maximum care to minimize data transfer requirements. This means, concretely, that a delta-based approach must be utilized by the updater, with a keyframe or "heavy" update being strictly a fallback in the unlikely case an update path cannot be constructed from the available or reachable delta sets.

1.2. Design
-----------

It is given, due to requirements imposed by the Bitfrost security platform, that a laptop will attempt to make daily contact with the OLPC anti-theft servers. During that interaction, the laptop will post its system software version, and the response provided by the anti-theft service will optionally contain a relative URL of a more recent OS image. If such a pointer has been received and the laptop is behind a known school server, it will probe the school server via rsync at the provided relative URL to determine whether the server has cached the update locally. If the update is not available locally, the laptop will wait up to 24 hours, checking approximately hourly whether the school server has obtained the update. If at the end of this wait period the school server still does not have a local copy of the update, it is assumed to be malfunctioning, and the laptop will contact an upstream master server directly by using the URL provided originally by the anti-theft service.

In any of these three cases (school server has update immediately, school server has update after delay, upstream master has update), we say the laptop has 'found an update source'. Once an update source has been found, the laptop will invoke the standard rsync tool over a plaintext (unsecured) connection via the rsync protocol -- not piped through a shell of any kind -- to bring its own files up to date with the more recent version of the system. rsync uses a network-efficient binary diff algorithm, which satisfies goal 3.

1.3. Design note: peer-to-peer updates
--------------------------------------

It is desirable to provide viral update functionality at a later date, such that two laptops with different software versions (and without any notion of trust) can engage
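The source-selection policy in section 1.2 reduces to a small loop. The sketch below is not from the spec's implementation; the probe functions and the sleep hook are invented names, injected so the policy can be exercised without a network or a real clock.

```python
def find_update_source(probe_school, probe_upstream, sleep,
                       max_wait_hours=24):
    """Wait-then-fall-back policy from section 1.2.

    probe_school / probe_upstream are callables returning an update
    source identifier (e.g. an rsync URL) or None. The school server is
    re-probed roughly hourly for up to max_wait_hours; after that it is
    presumed malfunctioning and the upstream master is contacted
    directly using the URL the anti-theft service originally provided.
    """
    for _ in range(max_wait_hours):
        source = probe_school()
        if source is not None:
            return source
        sleep(3600)  # roughly an hour between probes
    return probe_upstream()
```

In a real updater, probe_school would be the rsync probe against the school server's relative URL, and the returned source would then be handed to the plain rsync invocation described above.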
Re: System update spec proposal
A few notes follow here.

First about approach: you should have given this feedback earlier rather than later, since Alex has been off working on an implementation, and if you're not giving feedback early then you're wasting Alex's time. Also I would have appreciated it if you had given direct feedback to Alex instead of just dropping your own proposal from space. It's a crappy thing to do.

So, notes on the proposal:

1. There's a lot in here about vserver + updates, and all of that is fine. But we've been pretty careful in our proposals to point out that how you get the bits to the box is different from how they are applied. I don't see anything in here around vserver that couldn't use Alex's system instead of rsync. So there's no added value there.

2. rsync is a huge hammer in this case. In fact, I think it's too much of a hammer. We've used it ourselves for these image update systems over the last few years (see also: stateless linux) and it always made things pretty hard, because you have to use lots of random exceptions during its execution, and once it starts you can't really control what it does. It's good for moving live image to live image, but I wouldn't really want to use it for an image update system, especially one that will be as distributed as this. Simply put, I see rsync as more of a tool for sysadmins than for a task like this. I think that we need something that's actually designed to solve the problems at hand, rather than seeing the hammer we have on the shelf and thinking that it's obviously the right solution.

3. It doesn't really solve the scaling + bandwidth problems in the same way as Alex's tool does. It still requires a server and doesn't let you propagate changes out to servers as easily as his code does.

Basically, aside from the vserver bits, which no one has seen, I don't see a particular advantage to using rsync. In fact, I see serious downsides, since it misses some of the key critical advantages of using our own tool, not the least of which is that we can make our tool do what we want, and with rsync you're talking about changing the protocols.

Anyway, I'll let Alex respond with more technical points if he chooses to.

--Chris
Re: System update spec proposal
Ivan Krstić wrote: On Jun 26, 2007, at 2:21 PM, Christopher Blizzard wrote: Also I would have appreciated it if you had given direct feedback to Alex instead of just dropping your own proposal from space. It's a crappy thing to do. Let's not make this about approach on a public mailing list, please. Actually, while I found the response a bit harsh, could I suggest that what the project needs is *more* public discussion all around? Not necessarily about approach, but about half-formed plans, ideas and rationale. Having discussion move offline into some private channel is a good way to prevent anyone outside the offices from knowing *why* things are happening, and to have things blow up when decisions appear to come from on high. For instance:

* Whole projects are surfacing after weeks or months of development. Papers and implementations are starting and stopping without anyone knowing what's going on or, more importantly, *why* they have been done. Witness the immediate counter-proposals to Alex's implementation of the point-to-point protocol.

* VServer only appeared in public discussions yesterday or so AFAIK, yet it's apparently already the chosen path for doing the system compartmentalization.

We need more draft-level discussions, more discussion of plans, rationales, ideas and approaches. The discussions don't need to be long, they don't need to be formal and well structured; they just need to be sufficient to give people an idea of what they will need to write for in a few months. I realise we're on a very tight schedule, but it's extremely difficult to help with the project if we don't know what's going on inside it. Assume good will and proper intentions on the part of all people until *proven* wrong (repeatedly), and only then assume ignorance of the proper path until proved wrong, and only then assume misguidance yet a thirst for knowledge until proved wrong.
Even if proved wrong many times, attempt to find a way to solve the issue politely and respectfully. We are all working to make a better world, and there should be no egos involved if we are doing things right. Anyway, just my thoughts, Mike -- Mike C. Fletcher Designer, VR Plumber, Coder http://www.vrplumber.com http://blog.vrplumber.com
Re: System update spec proposal
On Jun 26, 2007, at 5:15 PM, Daniel Monteiro Basso wrote: And the same document rendered using jsCrossmark is available at: http://www.lec.ufrgs.br/~dmbasso/jsCrossmark/systemUpdate.html Looks great! Please, Ivan, consider answering my e-mail about Crossmark. I sent it privately because I wasn't on the list before, but now you can answer it openly. Hm. I don't have any mail from you after a message from May 21st that I answered a couple of days later. Could you please send me copies of any other messages you sent? In general, if I don't answer non-urgent mail in 3-5 days, it's safe to assume something's wrong and I probably haven't seen the message. -- Ivan Krstić [EMAIL PROTECTED] | GPG: 0x147C722D
Re: System update spec proposal
On Tue, 2007-06-26 at 15:59 -0400, Mike C. Fletcher wrote: * VServer only appeared in public discussions yesterday or so AFAIK, yet it's apparently already the chosen path for doing the system compartmentalization. It's a short-term hack, because the people working on the security stuff let it all slide for too long and now have declared that we don't have time to do anything sensible. It will be dropped as soon as possible, because we know it's not a viable and supportable plan in the long (or even medium) term. -- dwmw2
Re: System update spec proposal
On Tue, 2007-06-26 at 18:50 -0400, C. Scott Ananian wrote: On 6/26/07, Christopher Blizzard [EMAIL PROTECTED] wrote: A note about the history of using rsync. We used rsync as the basis for a lot of the Stateless Linux work that we did a few years ago. That approach (although using LVM snapshots instead of CoW snapshots) did basically exactly what you've proposed here, and we used to kill servers all the time with only a handful of clients. Other people report that it's easy to take out other servers using rsync. It's pretty fragile and it doesn't scale well to entire-filesystem updates. That's just based on our experience of building systems like what you're suggesting here, and how we got to where we are today. I can try to get some benchmark numbers to validate this one way or the other. My understanding is that rsync is a memory hog because it builds the complete list of filenames to sync before doing anything; 'killing servers' would be their running out of memory. Rsync 3.0 claims to fix this problem, which may also be mitigated by the relatively small scale of our use: my laptop's debian/unstable build has 1,345,731 files. Rsync documents using 100 bytes per file, so that's 100M of core required. Not hard to see that 10 clients or so would tax a machine with 1G of main memory. In contrast, XO build 465 has 23,327 files: ~2M of memory. 100 kids simultaneously updating equals 2G of memory, which is within our specs for the school server. Almost two orders of magnitude fewer files for the XO vs. a 'standard' distribution ought to fix the scaling problem, even without moving to rsync 3.0. --scott I think that in our case it wasn't just memory; it was also seeking all over the disk. We could probably solve that easily by stuffing the entire image into memory (and it will fit, easily), but your comment serves to prove another point: that firing up a program that has to do a lot of computation every time a client connects is something that's deeply wrong.
And that's just for the central server. Also, 2G of memory on the school server is nice - as long as you don't expect to do anything else. Or as long as you don't want to do what I mention above and shove everything into memory to avoid the thrashing problem. --Chris
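Scott's back-of-envelope numbers can be checked directly. This is a sketch that takes the thread's ~100-bytes-per-file figure at face value; the constants come from the messages above, not from measurement, and real rsync memory use varies with options and filename lengths (as Tridge notes later in the thread).

```python
# Back-of-envelope rsync file-list memory estimate, using the ~100 bytes/file
# figure and the file counts quoted in the thread. These are rough numbers.
BYTES_PER_FILE = 100  # rsync's documented approximate per-file cost

def file_list_bytes(n_files, bytes_per_file=BYTES_PER_FILE):
    """Approximate rsync file-list memory for one client, in bytes."""
    return n_files * bytes_per_file

debian_files = 1_345_731  # Scott's debian/unstable laptop build
xo_files = 23_327         # XO build 465

print(f"debian/unstable: ~{file_list_bytes(debian_files) / 2**20:.0f} MiB per client")
print(f"XO build 465:    ~{file_list_bytes(xo_files) / 2**20:.1f} MiB per client")
print(f"XO has ~{debian_files / xo_files:.0f}x fewer files")
```

The point being argued survives the rounding either way: the XO image's file list is roughly two orders of magnitude smaller than a general-purpose distribution's.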
Re: System update spec proposal
On Jun 26, 2007, at 7:23 PM, David Woodhouse wrote: because the people working on the security stuff let it all slide for too long and now have declared that we don't have time to do anything sensible. That's a cutely surreal take on things -- I really appreciate you trying to defuse the situation with offbeat humor :) -- Ivan Krstić [EMAIL PROTECTED] | GPG: 0x147C722D
Re: System update spec proposal
On Jun 26, 2007, at 7:21 PM, Christopher Blizzard wrote: Also, 2G of memory on the school server is nice - as long as you don't expect to do anything else. Or as long as you don't want to do what I mention above and shove everything into memory to avoid the thrashing problem. I see no reason why the school server should be configured to allow more than 10-20 simultaneous client updates. We're not going for real-time update propagation here. Our largest schools can get updated within a day or so at that rate, and all others within at most a few hours. -- Ivan Krstić [EMAIL PROTECTED] | GPG: 0x147C722D
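As a sanity check on that rate, here is a rough capacity estimate. Only the 10-20 concurrent-slot figure comes from the message above; the school size and minutes-per-update are hypothetical placeholder values chosen for illustration.

```python
# Rough whole-school update-time estimate. The laptop count and per-client
# update time below are made-up illustrative numbers, NOT figures from the
# thread; only the concurrent-slot cap (10-20) is from the discussion.
def hours_to_update(n_laptops, concurrent_slots, minutes_per_update):
    """Wall-clock hours to update a whole school, batch by batch."""
    batches = -(-n_laptops // concurrent_slots)  # ceiling division
    return batches * minutes_per_update / 60

# e.g. a hypothetical 1000-laptop school, 20 slots, 10 minutes per update:
print(f"~{hours_to_update(1000, 20, 10):.1f} hours")  # ~8.3 hours
```

Under those assumed numbers even a large school finishes in well under a day, consistent with the claim above.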
Re: System update spec proposal
Scott, Rsync documents using 100 bytes per file, so that's 100M of core required. That 100 bytes per file is very approximate. It also increases quite a lot if you use --delete, and increases as well if you use --hard-links. Other options have smaller, but non-zero, impacts on memory usage, and of course it depends on the filenames themselves. If rsync is going to be used on low-memory machines, then the job could be broken up into several pieces: do multiple rsync runs, each synchronising a portion of the filesystem (e.g. each directory under /usr). Alternatively, talk to Wayne Davison about rsync 3.0. One of the core things that brings is lower memory usage (essentially automating the breakup into directory trees that I mentioned above). I had hoped to have time to write a new synchronisation tool for OLPC that would be much more memory efficient, take advantage of multicast, and use a changeset-like approach to complete OS updates, but various things have gotten in the way of me contributing serious time to the OLPC project, for which I apologise. I could review any rsync-based scripts you have, though, and offer suggestions on getting the most out of rsync. Cheers, Tridge
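The suggested breakup into per-directory runs could be sketched like this. The server URL, target path, and directory list are hypothetical, and the sketch only constructs the rsync command lines rather than executing them.

```python
# Sketch of splitting one big rsync run into per-directory runs, so that each
# run's in-memory file list stays small. The source URL, destination, and
# directory names are hypothetical placeholders; commands are printed, not run.
def per_directory_commands(src, dest, top_dirs):
    """Yield one rsync argv per top-level directory."""
    for d in top_dirs:
        yield ["rsync", "-a", "--delete", f"{src}/{d}/", f"{dest}/{d}/"]

for argv in per_directory_commands(
        "rsync://schoolserver/xo-image", "/mnt/root",
        ["usr", "etc", "opt", "var"]):
    print(" ".join(argv))
```

Each run then only ever holds one subtree's file list in memory, which is the mitigation described above (and which rsync 3.0 reportedly automates).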
Re: System update spec proposal
Chris, but your comment serves to prove another point: that firing up a program that has to do a lot of computation every time a client connects is something that's deeply wrong. And that's just for the central server. yes, very true. What rsync as a daemon should do is mmap a pre-prepared file list, and you generate that file list using cron. For the OLPC case this isn't as hard as for the general rsync case, as you know that all the clients will be passing the same options to the server, so the same pre-prepared file list can be used for all of them. In the general rsync case we can't guarantee that, which is what makes it harder (though we could use a map file named after a hash of the options). Coding/testing this takes time though :( Cheers, Tridge
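The pre-prepared, mmapped file list described above might look something like this sketch. The cache naming scheme and one-path-per-line format are assumptions made for illustration; rsync itself has no such feature today, which is exactly the coding/testing work being described.

```python
# Sketch of the cron-plus-mmap idea above: a periodic job walks the tree once
# and writes a file list under a name derived from a hash of the client
# options; the daemon then mmaps that list instead of rescanning the tree on
# every connection. Cache layout and list format are made up for illustration.
import hashlib
import mmap
import os

def cache_path(cache_dir, options):
    """Cache-file name keyed by a hash of the (sorted) client options."""
    digest = hashlib.sha1(" ".join(sorted(options)).encode()).hexdigest()
    return os.path.join(cache_dir, f"filelist-{digest}")

def generate_file_list(tree, cache_dir, options):
    """The cron-job half: walk the tree once and write the list to disk."""
    paths = sorted(
        os.path.join(root, name)
        for root, _dirs, files in os.walk(tree) for name in files)
    with open(cache_path(cache_dir, options), "w") as f:
        f.write("\n".join(paths))

def load_file_list(cache_dir, options):
    """The daemon half: mmap the pre-built list; no per-client tree walk."""
    with open(cache_path(cache_dir, options), "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return bytes(m).decode().splitlines()
```

Because every XO client passes the same options, a single cached list serves all of them; the hash-of-options key only matters for the general rsync case Tridge mentions.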
Re: System update spec proposal
On 6/26/07, Ivan Krstić [EMAIL PROTECTED] wrote: term one. It's still my *strong* hunch that we are not going to run into any issues whatsoever given our update sizes and the fact that we're serving them from reasonably beefy school server machines, so adding this functionality to rsync would easily be a post-FRS goal. I concur. Rate-limiting is certainly a viable option for FRS if server resources are an issue, and the *network* characteristics of rsync are certainly in the right ballpark. I suspect that we won't need to hack rsync ourselves at all, since rsync 3.0 will Do What We Want. But we'll see what Wayne says about the timeline of rsync 3.0. Scott, are you willing to do a few tests and grab some real numbers, using previous OLPC OS images, for resource utilization on the school server in the face of e.g. 5, 10, 20, 50 parallel updates? I might need some help getting access to enough clients, but I have no problem doing the benchmarks. Tridge, do you have any recommendations about benchmarking rsync? --scott -- ( http://cscott.net/ )
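A minimal harness for the parallel-update test proposed above might look like this. The client counts match the ones suggested in the message; the worker is a sleep placeholder where a real rsync client invocation would go (real host names and image paths would be guesses, so they are left out).

```python
# Minimal parallel-client benchmark harness for the resource-utilization tests
# discussed above. The worker here is a no-op/sleep placeholder; a real run
# would substitute an actual rsync client invocation against the school server.
import time
from concurrent.futures import ThreadPoolExecutor

def run_clients(worker, n_clients):
    """Run `worker` once per simulated client, in parallel; return seconds."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        futures = [pool.submit(worker) for _ in range(n_clients)]
        for f in futures:
            f.result()  # re-raise any worker failure
    return time.monotonic() - start

if __name__ == "__main__":
    for n in (5, 10, 20, 50):
        elapsed = run_clients(lambda: time.sleep(0.01), n)
        print(f"{n:3d} clients: {elapsed:.3f}s")
```

This only measures wall-clock time on the driving machine; server-side memory and I/O would still need to be observed separately (e.g. on the school server itself while the harness runs).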
Re: System update spec proposal
On Jun 26, 2007, at 11:46 PM, C. Scott Ananian wrote: I suspect that we won't need to hack rsync ourselves at all, since rsync 3.0 will Do What We Want I understood rsync 3.0 is smart about breaking up file-list generation into smaller chunks to be better about memory usage, but that's orthogonal to the pregenerated-mmapped-file-list optimization. The former helps memory consumption, but you're still needlessly stat()ing static data left and right. Although, again, our updates are sufficiently small that I expect the stat() calls to just be hitting the VFS cache and not actually cost us much of anything. -- Ivan Krstić [EMAIL PROTECTED] | GPG: 0x147C722D