Re: System update spec proposal

2007-06-30 Thread Mike C. Fletcher
Christopher Blizzard wrote:
 On Wed, 2007-06-27 at 12:39 -0400, Mike C. Fletcher wrote:
  Could we get a summary of what the problem is:

 The main objection to vserver from all the kernel hackers (and those of
 us that have to support them!) is that it's a huge patch 
Okay, so a set of smaller, more targeted patches is preferred.
 that touches
 core kernel bits 
Isn't that going to have to happen in all cases where you can provide 
total isolation of the various elements?  I'd think that adding 
capability-based control to the major sub-systems would require some 
modifications.  Of course, how core and how extensive those 
modifications are is probably the real issue.
 and there are no plans to push it upstream.
That sounds like a serious problem for maintainability, yes.

Alright, so from my understanding of vserver it sounds like what we want 
is something along these lines:

* chroot fixes
  o probably the least intrusive and most widely useful part of
    the system: patches to close a couple of security holes in
    the chroot system (the fchdir hole and the mknod hole IIRC);
    see the sketch below
  o the goal here is to make chroot perform reliably and reasonably
    securely enough to provide file-system isolation
  o it might be possible to pull just this work out of vserver
* overlay/COW filesystem
  o the goal here is to allow for a tree of overlays with read-only
    or read-write support on each layer
  o aufs is the closest project at the moment?
* hardware access capabilities and rate limiting
  o memory
  o cpu
  o disk io
  o network interfaces

where we would want each of those patches to be as small, maintainable 
and elegant as possible, with all of them targeted for upstream 
inclusion.  I'm guessing that it's the third set that causes the patch 
size to balloon, as it seems rather involved.  The first two alone, 
though, should provide quite a lot of the functionality we want and be 
reasonably general in their interest.

So I guess we'd want three kernel hackers (or so) to work on the three 
projects simultaneously.  The first is probably a small project.  The 
second is likely just a matter of massaging the code.  The third is an 
involved project that would likely have to work with SELinux and the 
like to try to integrate everything.

Anyway, that's just the way it sounds to me,
Mike

-- 

  Mike C. Fletcher
  Designer, VR Plumber, Coder
  http://www.vrplumber.com
  http://blog.vrplumber.com



Re: System update spec proposal

2007-06-29 Thread C. Scott Ananian
On 6/28/07, Wayne Davison [EMAIL PROTECTED] wrote:
 I wouldn't recommend deploying rsync 3 widely just yet.  I'm going to be
 working on finalizing the release in the near future, but there is still
 a chance that protocol 30 (which is new for this release) may still need
 to be changed a bit before it is released.  The program is stable enough
 that I use it for all my own rsync copying, but I also ensure that my
 installed versions get updated for new releases.

As I've written before, I think that splitting up the rsync
per-directory will solve our immediate resource worries, although
we'll shortly validate that with some testing.  We certainly remain
interested in your work on version 3.0.

One rsync feature I would like to see is a '--hints directory' 
option that would tell the remote rsync that the local machine is 
highly likely to have the files in the specified remote directory. 
When blocks mismatch, the local machine can send the hash of the block 
it has, and if that block is in the remote hints directory the remote 
machine can send a binary diff instead of the complete contents of the 
block.  That could greatly reduce the bandwidth used.

(With --fuzzy I suppose each block in the hints directory could be 
hashed and checked for possible matches to remote blocks; without 
--fuzzy, the --hints directory would be required to match the directory 
structure of the rsync'ed directory, so that only one hints file needs 
to be in memory at a time.  The --fuzzy form would be useful when 
rsyncing (say) linux-2.6.21/ when both sides have linux-2.6.20/ 
available, while the non-fuzzy form would be faster when I'm updating 
local:linux/ to remote:linux-2.6.21/ while giving the remote a hint 
that I previously had remote:linux-2.6.20/ in there.)
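
To make the hint lookup concrete, a rough sketch (the fixed block size 
and the use of SHA-1 are my assumptions here, not rsync's actual 
internals):

    import hashlib

    BLOCK = 4096

    def block_table(hint_path):
        """Hash every block of a hint file: hash -> offset."""
        table = {}
        with open(hint_path, 'rb') as f:
            off = 0
            while True:
                blk = f.read(BLOCK)
                if not blk:
                    break
                table[hashlib.sha1(blk).digest()] = off
                off += len(blk)
        return table

    def answer_mismatch(block_hash, table, hint_path):
        """Remote side: return the hint block to diff against, or
        None to signal that the full block must be sent."""
        off = table.get(block_hash)
        if off is None:
            return None
        with open(hint_path, 'rb') as f:
            f.seek(off)
            return f.read(BLOCK)    # base block for a binary diff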
 --scott

-- 
 ( http://cscott.net/ )


Re: System update spec proposal

2007-06-28 Thread Alexander Larsson
On Wed, 2007-06-27 at 17:43 -0400, Polychronis Ypodimatopoulos wrote:

 I wrote some code to deal with both problems, so that you have a
 qualitative hop count (offering real numbers instead of just integers
 for hop count) for all nodes in the mesh without
 broadcasting/multicasting any frames. The following snapshot only shows
 one neighbor with perfect reception.
 
 http://web.media.mit.edu/~ypod/teleporter/dump.png
 
 I can clean up the code and upload it somewhere.

I would very much appreciate that.
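
(For reference, one plausible shape for such a real-valued hop count, 
computed from per-link reception quality; this is just a guess at the 
metric, and the actual code may well differ:)

    def effective_hops(link_qualities):
        """A perfect link (quality 1.0) costs one hop;
        a lossy link costs proportionally more."""
        return sum(1.0 / q for q in link_qualities)

    effective_hops([1.0])        # one neighbor, perfect reception: 1.0
    effective_hops([0.9, 0.5])   # two lossy hops: ~3.11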



Re: System update spec proposal

2007-06-28 Thread David Woodhouse
On Wed, 2007-06-27 at 18:37 -0400, Ivan Krstić wrote:
 On Jun 27, 2007, at 2:57 AM, David Woodhouse wrote:
  Nevertheless, it's an accurate description of what happened.
 
 Let's agree to disagree. 

Sounds like a fine plan.

As long as we're united on the common goal to drop vserver as soon as
possible and replace it with something which is viable and supportable,
I really don't care enough to argue about whatever else we might have
disagreed upon. I certainly didn't mean to place blame at your door --
you needed input from kernel hackers and you didn't get it because we
were all busy doing other things. That's not your fault.

 In any case, it doesn't matter at this point. We have work to do.

Indeed.

-- 
dwmw2



Re: System update spec proposal

2007-06-28 Thread Christopher Blizzard
On Tue, 2007-06-26 at 14:21 -0400, Christopher Blizzard wrote:
 
 First about approach: you should have given this feedback earlier
 rather
 than later since Alex has been off working on an implementation and if
 you're not giving feedback early then you're wasting Alex's time.
 Also
 I would have appreciated it if you had given direct feedback to Alex
 instead of just dropping your own proposal from space.  It's a crappy
 thing to do.
 

I owe Ivan an apology here.  I should have handled this in private email
instead of on a public mailing list.

--Chris



Re: System update spec proposal

2007-06-27 Thread David Woodhouse
On Tue, 2007-06-26 at 20:45 -0400, Ivan Krstić wrote:
 On Jun 26, 2007, at 7:23 PM, David Woodhouse wrote:
  because the people working on the security stuff
  let it all slide for too long and now have declared that we don't have
  time to do anything sensible.
 
 That's a cutely surreal take on things -- I really appreciate you  
 trying to defuse the situation with offbeat humor :)

Nevertheless, it's an accurate description of what happened. To avoid
ruffling feathers unnecessarily I suppose I should have made it clear
that there is no blame to be assigned here -- the kernel hackers who
would ideally have worked on this were simply busy doing more important
things like power management, and didn't do anything more than just say
"No, vserver is not workable", which evidently wasn't taken sufficiently
seriously.

The plan of record is to use this vserver crap for as short a period of
time as possible, until we can implement something which is supportable
upstream.

-- 
dwmw2



Re: System update spec proposal

2007-06-27 Thread Alexander Larsson
On Tue, 2007-06-26 at 13:55 -0400, Ivan Krstić wrote:
 Software updates on the One Laptop per Child's XO laptop
 

First some stray comments:

 1.4. Design note: rsync scalability
 -----------------------------------
 
 rsync is a known CPU hog on the server side. It would be absolutely
 infeasible to support a very large number of users from a single rsync
 server. This is far less of a problem in our scenario for three reasons:

What about CPU hogging on the school server? It is likely to be far
less beefy than the centralized server.

 The most up-to-date bundle for each activity in the set is accessed, and
 the first several kilobytes downloaded. Since bundles are simple ZIP
 files, the downloaded data will contain the ZIP file index which stores
 byte offsets for the constituent compressed files. The updater then
 locates the bundle manifest in each index and makes a HTTP request with
 the respective byte range to each bundle origin. At the end of this
 process, the updater has cheaply obtained a set of manifests of the
 files in all available activity updates.

Zip files have the file index at the *end* of the file, so downloading
the first several kilobytes won't get you the index; the updater would
need to fetch the tail of each bundle instead.
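
A minimal sketch of pulling just the index with HTTP range requests 
(the 64 KiB tail size is an assumption, and a robust EOCD parser would 
also need to handle archive comments and ZIP64):

    import struct
    import urllib.request

    def http_range(url, start, end):
        """Fetch bytes [start, end] (inclusive) of a remote file."""
        req = urllib.request.Request(
            url, headers={'Range': 'bytes=%d-%d' % (start, end)})
        return urllib.request.urlopen(req).read()

    def central_directory(url, size, tail=65536):
        """Read a remote ZIP's index without the body; size is the
        file's total length (e.g. from a HEAD request)."""
        buf = http_range(url, max(0, size - tail), size - 1)
        i = buf.rfind(b'PK\x05\x06')          # EOCD signature
        cd_size, cd_off = struct.unpack('<II', buf[i + 12:i + 20])
        return http_range(url, cd_off, cd_off + cd_size - 1)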

Now for comments on the general approach:

First of all, there seems to be an exceptional amount of confusion as to
exactly how some form of atomic updating of the system will happen. Some
people talk about overlays, others about vserver, and I myself have
thrown in the filesystem transaction idea. I must say this area seems
very uncertain, and I worry that it will end with none of these options
being implemented...

But anyway, the exact way these updates are applied is quite orthogonal
to how you download the bits required for the update, or how you
discover new updates. So far I've been mainly working on this part in
order to avoid blocking on the confusion I mentioned above.

As to using rsync for the file transfers: this seems worse than the
trivial manifest + sha1-named files over HTTP approach I've been working
on, especially with the optional use of bsdiff I just committed. We
already know (have to know, in fact, so we can strongly verify them) the
contents of both the laptop and the target image. To drop all this
knowledge and have rsync reconstruct it at runtime seems both a waste
and a possible performance problem (e.g. CPU and memory overload on the
school server, and rsync re-hashing files on the laptop, using up
battery).
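
The scheme really is about this simple; a sketch, where the
'sha1  path' manifest format is my shorthand rather than the actual
file layout:

    import hashlib, os

    def missing_blobs(manifest_text, store_dir):
        """Return the hashes we still need to download into the
        directory of sha1-named files."""
        need = []
        for line in manifest_text.splitlines():
            sha1, _, path = line.partition('  ')
            blob = os.path.join(store_dir, sha1)
            if not os.path.exists(blob):
                need.append(sha1)
                continue
            with open(blob, 'rb') as f:      # strong verification
                if hashlib.sha1(f.read()).hexdigest() != sha1:
                    need.append(sha1)        # corrupt blob: re-fetch
        return need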

You talk about the time it takes to implement another approach, but it's
really quite simple, and I have most of it done already. The only hard
part is the atomic applying of the bits. Also, there seems to be
development needed for the rsync approach too, as there is e.g. no
support for xattrs in the current protocol.

I've got the code for discovering local instances of upgrades and
downloading them already working. I'll try to make it do an actual
(non-atomic, unsafe) upgrade of an XO this week.

I have a general question on how this vserver/overlay/whatever system is
supposed to handle system files that are not part of the system image,
but still exist in the root file system. Take, for
instance, /var/log/messages or /dev/log: where are they stored? Are they
mixed in with the other system files? If so, then rolling back to an
older version will give you e.g. your old log files back. That could
also complicate the use of rsync: if you use --delete, it would delete
these files (as they are not on the server).

Also, your document contains a lot of comments about what will be in FRS
and what will not. Does this mean you're working on actually developing
this system for FRS?



Re: System update spec proposal

2007-06-27 Thread Alexander Larsson
On Tue, 2007-06-26 at 15:03 -0400, Mike C. Fletcher wrote:

 My understanding here is that Alex's system is currently point-to-point, 
 but that we don't yet have any way to distribute it on the mesh?  That 
 is, that we are looking at a research project to determine how to make 
 it work in-the-field using some form of mesh-based discovery protocol 
 that's going to try to optimise for connecting to local laptops.  I 
 personally don't care which way we distribute, but I'm wary of having to 
 have some mesh-network-level hacking implemented to provide discovery of 
 an update server.

I'm not sure what you mean here exactly. Discovery is done using Avahi,
a well-known protocol which we are already using in many places on the
laptop. The actual downloading of files uses HTTP, which is a well-known
protocol with many implementations.

The only thing I'm missing at the moment is a way to tell which IP
addresses to prefer downloading from because they are close. This
information is already available in the mesh routing tables in the
network driver (and possibly the ARP cache), and it's just a question of
getting this info and using it to drive which servers to pick for
downloading.
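
From a shell-level view, the discovery side amounts to something like
the sketch below (the _olpc_update._tcp service type is made up for the
example; whatever type we actually register would go there):

    import subprocess

    # -r resolve addresses, -t terminate after the initial results,
    # -p machine-parseable output.
    out = subprocess.check_output(
        ['avahi-browse', '-rtp', '_olpc_update._tcp'])
    for line in out.decode().splitlines():
        if line.startswith('='):     # '=' rows are resolved services
            f = line.split(';')
            host, addr, port = f[6], f[7], f[8]
            print('candidate update server %s at %s:%s'
                  % (host, addr, port))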

  Basically aside from the vserver bits, which no one has seen, I don't
  see a particular advantage to using rsync.  In fact, I see serious
  downsides since it misses some of the key critical advantages of using
  our own tool not the least of which is that we can make our tool do what
  we want and with rsync you're talking about changing the protocols.

 Hmm, interestingly I see using our own tool as a disadvantage, not a 
 huge one, but a disadvantage nonetheless, in that we have more untested 
 code on the system (and we already have a lot), and in this case, in a 
 critical must-never-fail system.  For instance, what happens if the user 
 is never connected to another XO or school server, but only connects to 
 a (non-mesh) WiFi network?  Does the mesh-broadcast upgrade discovery 
 protocol work in that case?

Avahi works fine for these cases too, of course, since it was originally
created for normal networks. However, if you never come close to another
OLPC machine, then we won't find a machine to upgrade against. It's
quite trivial to make it pull from any HTTP server on the net, but that
has to be either polled (which I don't like) or initiated manually
(which might be fine).




Re: System update spec proposal

2007-06-27 Thread Mike C. Fletcher
Alexander Larsson wrote:
 On Tue, 2007-06-26 at 15:03 -0400, Mike C. Fletcher wrote:
   
...
 I'm not sure what you mean here exactly. Discovery is done using avahi,
 a well known protocol which we are already using in many places on the
 laptop. The actual downloading of file uses http, which is a well known
 protocol with many implementations.

 The only thing i'm missing atm is a way to tell which ip addresses to
 prefer downloading from since they are close. This information is
 already availible in the mesh routing tables in the network driver (and
 possibly the arp cache), and its just a question of getting this info
 and using it to drive what servers to pick for downloading.
   
Ah, somehow in the discussions I'd come under the impression that the 
only way the system would be allowed to work was a single-hop network 
link on the mesh.  If we already have the information and can always 
fall back, even if it means going a number of hops across the network, 
that's fine.  Looking back, I see that was actually a discussion of 
bandwidth characteristics that wasn't intended to imply an absolute 
requirement.  I'm reasonably happy with the approach of using Avahi and 
only using the network topology to inform the decision on which server 
to use.

That said, I would be more comfortable if the fallback included a way 
for the laptop to check a well-known machine every X period (e.g. as in 
Ivan's proposal) and, if there's no locally discovered source, use a 
publicly available source as the HTTP source by default.

 Hmm, interestingly I see using our own tool as a disadvantage, not a 
 huge one, but a disadvantage nonetheless, in that we have more untested 
 code on the system (and we already have a lot), and in this case, in a 
 critical must-never-fail system.  For instance, what happens if the user 
 is never connected to another XO or school server, but only connects to 
 a (non-mesh) WiFi network?  Does the mesh-broadcast upgrade discovery 
 protocol work in that case?
 

 Avahi works fine for these cases too. Of course, since it was originally
 created for normal networks. However, if you never come close to another
 OLPC machine, then we won't find a machine to upgrade against. 
Sorry, I should have been clearer that the latter case (not coming close 
to another OLPC) was the one I was concerned about.  I realise such 
situations will represent only a small percentage of children, but a 
small percentage of 50,000,000 or so users is a huge number of people to 
have to teach how to manually upgrade their machines.
 Its quite
 trivial to make it pull from any http server on the net, but that has to
 be either polled (which I don't like) or initiated manually (which might
 be fine).
   
I'd advocate that the piece of Ivan's proposal wherein a central 
mechanism allows even a completely isolated machine to find an update 
automatically is a good idea.  It's a fairly trivial proposal that way: 
in the same "check to see if we've been stolen" request, download 4 
bytes (or so) telling us the currently available version.  If we can't 
get it locally after X period, try to get it from the server (using 
whatever protocol, be it your own, rsync, BitTorrent or Telepathy 
(cute -- I was actually writing telepathy there and then realised it 
was the name of our networking library)).  That is, I'd like to see a 
robust, automatic mechanism for triggering the laptop's search, and a 
fallback position that allows for resolution even in the isolated-user 
case without requiring user intervention.
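
Something as simple as this sketch would do (the URL and the plain 
integer build stamp are made up for illustration):

    import urllib.request

    def newer_build_available(current_build):
        """Piggyback on the daily check-in: fetch a few bytes naming
        the newest build and compare with what we're running."""
        stamp = urllib.request.urlopen(
            'http://updates.laptop.org/latest-build').read(16)
        return int(stamp.decode().strip()) > current_build

    if newer_build_available(465):
        pass  # kick off local (Avahi) discovery, then HTTP fallback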

Enjoy,
Mike

-- 

  Mike C. Fletcher
  Designer, VR Plumber, Coder
  http://www.vrplumber.com
  http://blog.vrplumber.com



Re: System update spec proposal

2007-06-27 Thread Dan Williams
On Wed, 2007-06-27 at 17:42 +0200, Alexander Larsson wrote:
 On Tue, 2007-06-26 at 15:03 -0400, Mike C. Fletcher wrote:
 
  My understanding here is that Alex's system is currently point-to-point, 
  but that we don't yet have any way to distribute it on the mesh?  That 
  is, that we are looking at a research project to determine how to make 
  it work in-the-field using some form of mesh-based discovery protocol 
  that's going to try to optimise for connecting to local laptops.  I 
  personally don't care which way we distribute, but I'm wary of having to 
  have some mesh-network-level hacking implemented to provide discovery of 
  an update server.
 
 I'm not sure what you mean here exactly. Discovery is done using avahi,
 a well known protocol which we are already using in many places on the
 laptop. The actual downloading of file uses http, which is a well known
 protocol with many implementations.
 
 The only thing i'm missing atm is a way to tell which ip addresses to
 prefer downloading from since they are close. This information is
 already availible in the mesh routing tables in the network driver (and
 possibly the arp cache), and its just a question of getting this info
 and using it to drive what servers to pick for downloading.

So, like Michail said, do something like:

import subprocess

def mesh_neighbor_ips():
    """Walk the mesh forwarding table's neighbor list and resolve
    each hardware address to an IPv4 address via the ARP cache."""
    ips = []
    n = 0
    while True:
        buf = subprocess.check_output(
            ['iwpriv', 'msh0', 'fwt_list_neigh', str(n)]).strip()
        if buf == b'(null)':
            break                    # all done
        fields = buf.split()         # parse buf into fields
        hwaddr = fields[0].decode()  # grab the 'ra' field (1st one)
        # look the MAC up in the ARP cache, e.g. by parsing /proc/net/arp
        ip4addr = lookup_hwaddr_in_arp(hwaddr)
        ips.append(ip4addr)          # do something with ip4addr
        n += 1
    return ips

Look on the 'olpc' branch of the libertas driver here:

http://git.infradead.org/?p=libertas-2.6.git;a=blob;f=drivers/net/wireless/libertas/README;hb=olpc
http://git.infradead.org/?p=libertas-2.6.git;a=blob;f=drivers/net/wireless/libertas/ioctl.c;hb=olpc

The README has a description of the command, and ioctl.c has the
implementation.  Just search for the string "neigh" and you'll find it.

Dan

   Basically aside from the vserver bits, which no one has seen, I don't
   see a particular advantage to using rsync.  In fact, I see serious
   downsides since it misses some of the key critical advantages of using
   our own tool not the least of which is that we can make our tool do what
   we want and with rsync you're talking about changing the protocols.
 
  Hmm, interestingly I see using our own tool as a disadvantage, not a 
  huge one, but a disadvantage nonetheless, in that we have more untested 
  code on the system (and we already have a lot), and in this case, in a 
  critical must-never-fail system.  For instance, what happens if the user 
  is never connected to another XO or school server, but only connects to 
  a (non-mesh) WiFi network?  Does the mesh-broadcast upgrade discovery 
  protocol work in that case?
 
 Avahi works fine for these cases too. Of course, since it was originally
 created for normal networks. However, if you never come close to another
 OLPC machine, then we won't find a machine to upgrade against. Its quite
 trivial to make it pull from any http server on the net, but that has to
 be either polled (which I don't like) or initiated manually (which might
 be fine).
 
 



Re: System update spec proposal

2007-06-27 Thread Christopher Blizzard
On Wed, 2007-06-27 at 12:26 -0400, Mike C. Fletcher wrote:
 
 That said, I would be more comfortable if the fallback included a way 
 for the laptop to check a well-known machine every X period (e.g. in 
 Ivan's proposal) and if there's no locally discovered source, use a 
 publicly available source as the HTTP source by default 

I think that you're talking about the mail from Scott, not Ivan.  And I
think that we'll do something like that, yeah.  That's one of the
easiest parts of the update system.  (It's also one of the worst
mistakes if you get it wrong, à la the Hour of Terror:
http://www.justdave.net/dave/2005/05/01/one-hour-of-terror/ -- but
that's beside the point.)

--Chris



Re: System update spec proposal

2007-06-27 Thread Christopher Blizzard
On Wed, 2007-06-27 at 12:39 -0400, Mike C. Fletcher wrote:
 
 Could we get a summary of what the problem is: 

The main objection to vserver from all the kernel hackers (and those of
us that have to support them!) is that it's a huge patch that touches
core kernel bits, and there are no plans to push it upstream.  Yes, it's
used successfully in a lot of interesting places, but that doesn't mean
it's a supportable-over-the-long-term solution.  Scale has nothing to do
with long-term supportability.  And these laptops have to be supported
for at least 5 years.

This isn't a new discussion; it's been going on for months and months.
Just quietly, that's all.

--Chris



Re: System update spec proposal

2007-06-27 Thread Christopher Blizzard
On Wed, 2007-06-27 at 17:31 +0200, Alexander Larsson wrote:
 
 I have a general question on how this vserver/overlay/whatever system
 is
 supposed to handle system files that are not part of the system image,
 but still exist in the root file system. For instance,
 take /var/log/messages or /dev/log? Where are they stored? Are they
 mixed in with the other system files? If so, then rolling back to an
 older version will give you e.g. your old log files back. Also, that
 could be complicating the usage of rsync. If you use --delete then it
 would delete these files (as they are not on the server).
 

Just a note about these particular files.  I don't think that the final
version is going to be running a kernel log or syslog daemon.  We're
running them right now because they are useful for debugging, but I
don't want those out there in the field taking up memory and writing to
the flash when they don't have to be.  I suspect that for most users
they will have very little use.

--Chris



Re: System update spec proposal

2007-06-27 Thread Chris Ball
Hi,

I have a general question on how this vserver/overlay/whatever
system is supposed to handle system files that are not part of the
system image, but still exist in the root file system. For
instance, take /var/log/messages or /dev/log? Where are they
stored? Are they mixed in with the other system files? 

Just a note about these particular files.  I don't think that on
the final version that we're going to be running a kernel or syslog
daemon. We're running them right now because they are useful for
debugging but I don't want those out there in the field taking up
memory and writing to the flash when they don't have to be.  I
suspect that for most users they will have very little use.

We're currently using a tmpfs for these (/var/log, /var/run, plus others)
so they aren't being written to the flash at all.  I'd rather have us
keep doing that than turn them off, since we're throwing away useful
in-the-field debugging information (for anyone who wants to help fix
laptops remotely, not just us) if we turn them off.

- Chris.
-- 
Chris Ball   [EMAIL PROTECTED]


Re: System update spec proposal

2007-06-27 Thread Ivan Krstić
On Jun 27, 2007, at 1:50 PM, Christopher Blizzard wrote:
 I think that you're talking about the mail from Scott, not Ivan.

It was my mail; my proposal explicitly talks about this.

--
Ivan Krstić [EMAIL PROTECTED] | GPG: 0x147C722D



Re: System update spec proposal

2007-06-27 Thread Ivan Krstić
On Jun 27, 2007, at 2:57 AM, David Woodhouse wrote:
 Nevertheless, it's an accurate description of what happened.

Let's agree to disagree. In any case, it doesn't matter at this  
point. We have work to do.

--
Ivan Krstić [EMAIL PROTECTED] | GPG: 0x147C722D



System update spec proposal

2007-06-26 Thread Ivan Krstić
Software updates on the One Laptop per Child's XO laptop
========================================================


0. Problem statement and scope
==============================

This document aims to specify the mechanism for updating software on the
XO-1 laptop. When we talk about updating software, we are referring both
to system software, such as the OS and the core services controlled by
OLPC that are required for the laptop's basic operation, and to any
installed user-facing applications ("activities"), both those provided
by OLPC and those provided by third parties.




1. System updater
=================

1.1. Core goals
---------------

The three core goals of a software update tool (hereafter "updater") for
the XO are as follows:

 * Security
     Given the initial age group of our users, the only reasonable
     solution is to default to automatic detection and installation of
     updates, both to be able to apply security patches in a timely
     fashion, and to enable users to benefit from rapid development and
     improvements in the software they're using. Automatic updates,
     however, are a security issue unto themselves: compromising the
     update system in any way can provide an attacker with the ability
     to wreak havoc across entire installed bases of laptops while
     bypassing -- by design -- all the security measures on the
     machine. Therefore, the security of the updater is paramount and
     must be its first design goal.

 * Uncompromising emphasis on fault-tolerance
     Given the scale of our deployment, the relatively high complexity
     of our network stack when compared to currently-common
     deployments, the unreliability of Internet connectivity even when
     available, and perhaps most importantly our desire for
     participating countries to soon begin customizing the official
     OLPC OS images to best suit them, it is clear that our updater
     must be fault-tolerant. This is both in the simple sense --
     cryptographic checksums need to be used to ensure updates were
     received correctly -- and in the more complex sense that the
     likelihood of a human error with regard to update preparation
     goes up proportionally to the number of different base OS images
     at play. A fault-tolerant updater will therefore allow
     _unconditional_ rollback of the most recently applied update.
     "Unconditional" here means that, barring the failure of other
     parts of the system which are dependencies of the updater
     (e.g. the filesystem), the updater must always know how to
     correctly unapply an applied update, even if the update was
     malformed.

 * Low bandwidth
     For much the same reasons (project scale, Internet access
     scarcity and unreliability) that require fault-tolerance from the
     updater, the tool must take maximum care to minimize data
     transfer requirements. This means, concretely, that a delta-based
     approach must be utilized by the updater, with a "keyframe" or
     heavy update being strictly a fallback in the unlikely case an
     update path cannot be constructed from the available or reachable
     delta sets.



1.2. Design
-----------

It is given, due to requirements imposed by the Bitfrost security
platform, that a laptop will attempt to make daily contact with the
OLPC anti-theft servers. During that interaction, the laptop will post
its system software version, and the response provided by the
anti-theft service will optionally contain a relative URL of a more
recent OS image.

If such a pointer has been received and the laptop is behind a known
school server, it will probe the school server via rsync at the provided
relative URL to determine whether the server has cached the update
locally. If the update is not available locally, the laptop will wait up
to 24 hours, checking approximately hourly whether the school server has
obtained the update. If at the end of this wait period the school server
still does not have a local copy of the update, it is assumed to be
malfunctioning, and the laptop will contact an upstream master server
directly by using the URL provided originally by the anti-theft service.

In any of these three cases (school server has update immediately,
school server has update after delay, upstream master has update), we
say the laptop has 'found an update source'.
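
In pseudocode terms, the source-finding step looks roughly like this (a
sketch only: has_cached_update() stands in for the rsync probe described
above, and the exact timing is illustrative):

    import time

    def find_update_source(relative_url, school_server, master_server):
        # Prefer the school server's cached copy; poll roughly hourly
        # for up to 24 hours before declaring it malfunctioning.
        deadline = time.time() + 24 * 60 * 60
        while time.time() < deadline:
            if has_cached_update(school_server, relative_url):
                return school_server + relative_url
            time.sleep(60 * 60)
        # School server assumed broken: fall back to upstream master.
        return master_server + relative_url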

Once an update source has been found, the laptop will invoke the
standard rsync tool over a plaintext (unsecured) connection via the
rsync protocol -- not piped through a shell of any kind -- to bring
its own files up to date with the more recent version of the
system. rsync uses a network-efficient binary diff algorithm which
satisfies goal 3.
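
Concretely, the invocation would look something like the sketch below
(module and version paths are illustrative only). Passing an argument
vector directly, with no shell involved, is what keeps the transfer from
being piped through a shell:

    import subprocess

    subprocess.check_call([
        'rsync', '-a', '--delete',
        'rsync://schoolserver/updates/os-465/',
        '/versions/updates/465/',
    ])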



1.3. Design note: peer-to-peer updates
--------------------------------------

It is desirable to provide viral update functionality at a later date,
such that two laptops with different software versions (and without any
notion of trust) can engage 

Re: System update spec proposal

2007-06-26 Thread Christopher Blizzard
A few notes follow here.

First about approach: you should have given this feedback earlier rather
than later since Alex has been off working on an implementation and if
you're not giving feedback early then you're wasting Alex's time.  Also
I would have appreciated it if you had given direct feedback to Alex
instead of just dropping your own proposal from space.  It's a crappy
thing to do.

So notes on the proposal:

1. There's a lot in here about vserver + updates and all of that is
fine.  But we've been pretty careful in our proposals to point out that
how you get to the bits to the box is different than how they are
applied.  I don't see anything in here around vserver that couldn't use
alex's system instead of rsync.  So there's no added value there.

2. rsync is a huge hammer in this case.  In fact, I think it's too much
of a hammer.  We've used it in the past ourselves for these image update
systems over the last few years (see also: Stateless Linux) and it
always made things pretty hard, because you have to use lots of random
exceptions during its execution, and once it starts you can't really
control what it does.  It's good for moving live image to live image,
but I wouldn't really want to use it for an image update system --
especially one that will be as distributed as this.  Simply put, I see
rsync as more of a tool for sysadmins than for a task like this.  I
think that we need something that's actually designed to solve the
problems at hand rather than seeing the hammer we have on the shelf and
thinking that it's obviously the right solution.

3. It doesn't really solve the scaling + bandwidth problems in the same
way as Alex's tool does.  It still requires a server and doesn't let you
propagate changes out to servers as easily as his code does.

Basically, aside from the vserver bits, which no one has seen, I don't
see a particular advantage to using rsync.  In fact, I see serious
downsides, since it misses some of the key critical advantages of using
our own tool -- not the least of which is that we can make our tool do
what we want, whereas with rsync you're talking about changing the
protocols.

Anyway, I'll let Alex respond with more technical points if he chooses
to.

--Chris



Re: System update spec proposal

2007-06-26 Thread Mike C. Fletcher
Ivan Krstić wrote:
 On Jun 26, 2007, at 2:21 PM, Christopher Blizzard wrote:
  Also
  I would have appreciated it if you had given direct feedback to Alex
  instead of just dropping your own proposal from space.  It's a crappy
  thing to do.

 Let's not make this about approach on a public mailing list, please.
Actually, while I found the response a bit harsh, could I suggest that 
what the project needs is *more* public discussion all around -- not 
necessarily about approach, but about half-formed plans, ideas and 
rationale.  Having discussion move offline into some private channel is 
a good way to prevent anyone outside the offices from knowing *why* 
things are happening, and to have things blow up when decisions appear 
to come from on high.

For instance:

* Whole projects are surfacing after weeks or months of development.
  Papers and implementations are starting and stopping without
  anyone knowing what's going on or, more importantly, knowing *why*
  they have been done.  Witness the immediate counter-proposals to
  Alex's implementation of the point-to-point protocol.
* VServer only appeared in public discussions yesterday or so
  AFAIK, yet it's apparently already the chosen path for doing the
  system compartmentalization.

We need more draft-level discussions, more discussion of plans, 
rationales, ideas and approaches.  The discussions don't need to be 
long, they don't need to be formal and well structured; they just need 
to be sufficient to give people an idea of what they'll be writing for 
in a few months.  I realise we're on a very tight schedule, but it's 
extremely difficult to help with the project if we don't know what's 
going on inside it.

Assume good will and proper intentions on the part of all people until 
*proven* wrong (repeatedly), and only then assume ignorance of the 
proper path until proved wrong, and only then assume misguidance yet a 
thirst for knowledge until proved wrong.  Even if proved wrong many 
times, attempt to find a way to solve the issue politely and 
respectfully.  We are all working to make a better world, and there 
should be no egos involved if we are doing things right.

Anyway, just my thoughts,
Mike

-- 

  Mike C. Fletcher
  Designer, VR Plumber, Coder
  http://www.vrplumber.com
  http://blog.vrplumber.com



Re: System update spec proposal

2007-06-26 Thread Ivan Krstić
On Jun 26, 2007, at 5:15 PM, Daniel Monteiro Basso wrote:
 And the same document rendered using jsCrossmark is available at:
 http://www.lec.ufrgs.br/~dmbasso/jsCrossmark/systemUpdate.html

Looks great!

 Please Ivan, consider answering my e-mail about Crossmark. I sent you
 privately because I wasn't on the list before, but now you can  
 answer it
 openly.

Hm. I don't have any mail from you after a message from May 21st that  
I answered a couple of days later. Could you please send me copies of  
any other messages you sent? In general, if I don't answer non-urgent  
mail in 3-5 days, it's safe to assume something's wrong and I  
probably haven't seen the message.

--
Ivan Krstić [EMAIL PROTECTED] | GPG: 0x147C722D



Re: System update spec proposal

2007-06-26 Thread David Woodhouse
On Tue, 2007-06-26 at 15:59 -0400, Mike C. Fletcher wrote:
 * VServer only appeared in public discussions yesterday or so
   AFAIK, yet it's apparently already the chosen path for doing the
   system compartmentalization. 

It's a short-term hack, because the people working on the security stuff
let it all slide for too long and now have declared that we don't have
time to do anything sensible. It will be dropped as soon as possible,
because we know it's not a viable and supportable plan in the long (or
even medium) term.

-- 
dwmw2



Re: System update spec proposal

2007-06-26 Thread Christopher Blizzard
On Tue, 2007-06-26 at 18:50 -0400, C. Scott Ananian wrote:
 On 6/26/07, Christopher Blizzard [EMAIL PROTECTED] wrote:
  A note about the history of using rsync.  We used rsync as the basis for
  a lot of the Stateless Linux work that we did a few years ago.  That
  approach (although using LVM snapshots instead of CoW snapshots)
  basically did exactly what you've proposed here.  And we used to kill
  servers all the time with only a handful of clients.  Other people
  report that it's easy to take out other servers using rsync.  It's
  pretty fragile and it doesn't scale well to entire filesystem updates.
  That's just based on our experience of building systems like what you're
  suggesting here and how we got to where we are today.
 
 I can try to get some benchmark numbers to validate this one way or
 the other.  My understanding is that rsync is a memory hog because it
 builds the complete list of filenames to sync before doing anything.
 'Killing servers' would be their running out of memory.  Rsync 3.0
 claims to fix this problem, which may also be mitigated by the
 relatively small scale of our use:  my laptop's debian/unstable build
 has 1,345,731 files. Rsync documents using 100 bytes per file, so
 that's 100M of core required. Not hard to see that 10 clients or so
 would tax a machine with 1G main memory.  In contrast, XO build 465
 has 23,327 files: ~2M of memory.  100 kids simultaneously updating
 equals 2G of memory, which is within our specs for the school server.
 Almost two orders of magnitude fewer files for the XO vs. a 'standard'
 distribution ought to fix the scaling problem, even without moving to
 rsync 3.0.
  --scott
 

I think that in our case it wasn't just memory, it was also seeking all
over the disk.  We could probably solve that easily by stuffing the
entire image into memory (and it will fit, easily), but your comment
serves to prove another point: firing up a program that has to do a lot
of computation every time a client connects is something that's deeply
wrong.  And that's just for the central server.

Also, 2G of memory on the school server is nice - as long as you don't
expect to do anything else.  Or as long as you don't want to do what I
mention above and shove everything into memory to avoid the thrashing
problem.

--Chris



Re: System update spec proposal

2007-06-26 Thread Ivan Krstić
On Jun 26, 2007, at 7:23 PM, David Woodhouse wrote:
 because the people working on the security stuff
 let it all slide for too long and now have declared that we don't have
 time to do anything sensible.

That's a cutely surreal take on things -- I really appreciate you  
trying to defuse the situation with offbeat humor :)

--
Ivan Krstić [EMAIL PROTECTED] | GPG: 0x147C722D



Re: System update spec proposal

2007-06-26 Thread Ivan Krstić
On Jun 26, 2007, at 7:21 PM, Christopher Blizzard wrote:
 Also, 2G of memory on the school server is nice - as long as you don't
 expect to do anything else.  Or as long as you don't want to do what I
 mention above and shove everything into memory to avoid the thrashing
 problem.

I see no reason why the school server should be configured to allow
more than 10-20 simultaneous client updates. We're not going for
real-time update propagation here. Our largest schools can get updated
within a day or so at that rate, and all others within at most a few
hours.

--
Ivan Krstić [EMAIL PROTECTED] | GPG: 0x147C722D



Re: System update spec proposal

2007-06-26 Thread tridge
Scott,

  Rsync documents using 100 bytes per file, so that's 100M of core
  required.

That 100 bytes per file is very approximate. It increases quite
a lot if you use --delete, and also increases if you use
--hard-links. Other options have smaller, but non-zero, impacts on the
memory usage, and of course it depends on the filenames themselves.

If rsync is going to be used on low-memory machines, then it could be
broken up into several pieces: do multiple rsync runs, each
synchronising a portion of the filesystem (e.g. each directory under
/usr), as sketched below.
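
A minimal sketch (the partitioning and the paths are illustrative):

    import subprocess

    # One rsync run per top-level tree keeps each server-side file
    # list small, instead of holding the whole filesystem's list in
    # memory at once.
    for tree in ('bin', 'etc', 'lib', 'sbin', 'usr'):
        subprocess.check_call([
            'rsync', '-a', '--delete',
            'rsync://schoolserver/image/%s/' % tree,
            '/%s/' % tree,
        ])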

Alternatively, talk to Wayne Davison about rsync 3.0. One of the core
things it brings is lower memory usage (essentially automating the
breakup into directory trees that I mentioned above).

I had hoped to have time to write a new synchronisation tool for OLPC
that would be much more memory efficient and take advantage of
multicast, using a changeset-like approach to complete OS updates, but
various things have gotten in the way of me contributing serious time
to the OLPC project, for which I apologise. I could review any
rsync-based scripts you have, though, and offer suggestions on getting
the most out of rsync.

Cheers, Tridge


Re: System update spec proposal

2007-06-26 Thread tridge
Chris,

  but your comment serves to prove another point: that firing up a
  program that has to do a lot of computation every time a client
  connects is something that's deeply wrong.  And that's just for the
  central server.

yes, very true. What rsync as a daemon should do is mmap a
pre-prepared file list, and you generate that file list using
cron. 

For the OLPC case this isn't as hard as for the general rsync case, as
you know that all the clients will be passing the same options to the
server, so the same pre-prepared file list can be used for all of
them. In the general rsync case we can't guarantee that, which is what
makes it harder (though we could use a map file named after a hash of
the options).

Coding/testing this takes time though :(

Cheers, Tridge


Re: System update spec proposal

2007-06-26 Thread C. Scott Ananian
On 6/26/07, Ivan Krstić [EMAIL PROTECTED] wrote:
 term one. It's still my *strong* hunch that we are not going to run
 into any issues whatsoever given our update sizes and the fact that
 we're serving them from reasonably beefy school server machines, so
 adding this functionality to rsync would easily be a post-FRS goal.

I concur.  Rate-limiting is certainly a viable option for FRS if
server resources are an issue, and the *network* characteristics of
rsync are certainly in the right ballpark.  I suspect that we won't
need to hack rsync ourselves at all, since rsync 3.0 will Do What We
Want.  But we'll see what Wayne says about the timeline of rsync 3.0.

 Scott, are you willing to do a few tests and grab some real numbers,
 using previous OLPC OS images, for resource utilization on the school
 server in the face of e.g. 5, 10, 20, 50 parallel updates?

I might need some help getting access to enough clients, but I have no
problem doing the benchmarks.  Tridge, do you have any recommendations
about benchmarking rsync?
 --scott

-- 
 ( http://cscott.net/ )


Re: System update spec proposal

2007-06-26 Thread Ivan Krstić
On Jun 26, 2007, at 11:46 PM, C. Scott Ananian wrote:
 I suspect that we won't
 need to hack rsync ourselves at all, since rsync 3.0 will Do What We
 Want

I understood rsync 3.0 is smart about breaking up file list generation
into smaller chunks to be better about memory usage, but that's
orthogonal to the pregenerated-mmapped-file-list optimization. The
former helps memory consumption, but you're still needlessly stat()ing
static data left and right. Although, again, our updates are
sufficiently small that I expect the stat() calls to just hit the VFS
cache and not actually cost us much of anything.

--
Ivan Krstić [EMAIL PROTECTED] | GPG: 0x147C722D
