Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?
Hi Rusty,

> I didn't implement this because I wanted the server to be able to cache the reply easily (ie. new story goes up on /., everyone sends old hash, reply gets served from accelerator).

I don't think that caching these will work as well as you might expect. In your example of slashdot, it gives a different reply every time, even for the same user. Try two wgets of slashdot.org and run a diff between the results.

It would work for static pages, but with static pages you don't really need delta-encoding, as you'll get a good hit rate with the normal cache tag mechanisms that browsers and proxies already use.

Cheers, Tridge
___
Server-devel mailing list
Server-devel@lists.laptop.org
http://lists.laptop.org/listinfo/server-devel
Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?
Hi Toby,

> The plan was to include something like an sha1 hash of the original file in the response headers. Then once the file has been decoded you can check to make sure it matches. If not you can resend the request without the block hash header and get the file the old-fashioned way.

Re-sending HTTP requests can be dangerous. The request might have triggered an action like "delete the last person from the list". When you resend it, it could delete two users rather than one. Remember that one of the aims of this work is to allow caching of dynamic requests, so you can't just assume the pages are marked as cacheable (which usually implies that a 2nd request won't do any harm).

Certainly including a strong whole-page hash is a good idea, but if the strong hash doesn't match, then I think you need to return an error, just like if you got a network outage.

The per-block rolling hash should also be randomly seeded as Martin mentioned. That way if the user does ask for the page again then the hashing will be different. You need to send that seed along with the request.

In practice hashing errors will be extremely rare. It is extremely rare for rsync to need a 2nd pass, and it uses a much weaker rolling hash (I think I used 16 bits by default for the per block hashes). The ability to do multiple passes is what allows rsync to get away with such a small hash, but I remember that when I was testing the multiple-pass code I needed to weaken it even more to get any reasonable chance of a 2nd pass so I could be sure the code worked.

Cheers, Tridge
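The two safeguards discussed above (a per-request seed for the block hashes, and a strong whole-page hash that produces an error instead of a silent re-send) can be sketched as follows. This is only an illustration under stated assumptions: keyed BLAKE2 truncated to 4 bytes stands in for a seeded rolling hash (it is not a rolling hash), and the names `block_hashes`, `verify_page`, and `DecodeMismatch`, along with the 64-byte block size, are hypothetical and not taken from crcsync or rsync.

```python
import hashlib
import os


class DecodeMismatch(Exception):
    """The rebuilt page failed the strong whole-page check."""


def block_hashes(data: bytes, seed: bytes, block_size: int = 64) -> list[bytes]:
    """Short keyed hash of each block; a different seed changes every value.

    Assumption: keyed BLAKE2 is a stand-in for a seeded rolling hash.
    """
    return [
        hashlib.blake2b(data[i:i + block_size], key=seed, digest_size=4).digest()
        for i in range(0, len(data), block_size)
    ]


def verify_page(decoded: bytes, expected_sha1_hex: str) -> bytes:
    """Return the decoded page if its SHA-1 matches, otherwise raise."""
    if hashlib.sha1(decoded).hexdigest() != expected_sha1_hex:
        # Report this like a network failure; do not re-send the request,
        # since the original request may already have had side effects.
        raise DecodeMismatch("strong hash mismatch")
    return decoded


old_page = b"<html>" + b"story text " * 30 + b"</html>"
seed = os.urandom(8)  # fresh for each request, sent along with the hashes
request_hashes = block_hashes(old_page, seed)
```

If the user asks for the page again, a new seed gives a different set of block hashes, so a collision that spoiled one attempt is very unlikely to repeat on the next.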
Re: Fwd: Dailymotion for XO laptop
Rob,

> I think it is worth the effort, and would like to figure out how we on the Gnash team can do this.

Great! I'll contact you off list to discuss some of the ways we do this for Samba and see if I can give you some help getting started.

Cheers, Tridge
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel
Re: Fwd: Dailymotion for XO laptop
Ed,

> 1. There are two kinds of good patent attorneys. One kind works pro bono for free software and the other gets paid big bucks by patent holders.

There is a 3rd kind - the kind that works for a law firm specifically funded to assist free software projects. For example, the Software Freedom Law Center.

> It's a question of two competing sides of a specific and very detailed technical and legal argument being thrashed out in the press and in the courts.

No, it does not need to be thrashed out either in the press or in court. The aim is to avoid both. What you need to do is produce detailed claim charts, along with a set of non-infringement arguments. Once you have those then you can work out how to write your code so as to avoid the patent. This isn't always possible, but it often is.

> If you head to the Groklaw web site, you can see this sort of thing (from the free software side). This process does not take a few weeks but *decades*.

Not true at all. I have handled the patent avoidance for Samba for a long time now, and it has generally taken me a few weeks per patent with a good patent attorney to come up with a solid non-infringement argument. Those are intensive weeks, but it is certainly not years. If it took 'decades' then what you would be doing is waiting for the patent to expire.

Whether the effort involved is worth it depends on how much of an impediment these codec patents are to the success of the OLPC project. In the case of Samba we can't just choose to use another protocol, so avoiding patents via non-infringement is our only choice. If we can't do it then we have to shut down the project. That makes almost any level of effort worthwhile.

For OLPC the need for these codecs is almost certainly not as critical, so perhaps the effort is not worthwhile. That is not really for me to judge. I just wanted to point out that the existence of a patent in connection with a codec is not necessarily a show stopper.

Cheers, Tridge
Re: Another proxy idea - web proxy logs from school servers?
Ian,

> A somewhat relevant idea. http://wiki.openmoko.org/wiki/Server:WebProxy You can get really quite large benefits from a local cache.

The problem with this approach is this line in the description:

> then informing the proxy of which version of the page it has

That means the proxy needs to keep all old versions of every page for any client that might need them in order to take the diff. The key trick with rproxy is that it avoids the need to keep all the old versions.

The way rproxy works is this:

- the laptop talks to the school server proxy
- the school server proxy talks to a well connected upstream proxy
- the well connected upstream proxy talks to the world

The school server proxy keeps one old version for any URL. When a request comes in for a new copy of that page then it sends the request to the upstream proxy, but tagged with approx 100 bytes of rsync style rolling block hashes of the old page it has. The upstream server then fetches the current copy of the page, and can use just the new page plus the 100 bytes to calculate a binary diff between the old and new page (the diff algorithm doesn't need access to the old page). The diff is sent to the school server, which applies it to the old page to generate the new page. A strong checksum is also calculated and checked.

The result is:

- no extra storage requirements on upstream server
- store one copy of the page (with the usual LRU cache stuff) on the school server
- no changes on the laptop
- reduced bandwidth usage over the link from the school to the upstream server

Another neat feature is that it also works for different URLs on the same server. If the school server doesn't find the exact URL in its cache, it can send a hash of the best matching URL it has. So for example if someone has previously visited:

  http://some.site/foo?user=fred

then someone else visits:

  http://some.site/foo?user=mary

then there is often a lot of common data between the two URLs. That common data won't go over the school's internet link. If there isn't any common data then you pay the 100 byte price, but nothing more. You don't even pay the price of calculating the rolling hashes, as that can be done once while reading the page originally, in parallel with the memcpy (and at extremely low cost).

It doesn't produce as small diff sizes as you get with a real local diff, but the savings on disk usage make it worthwhile.

Cheers, Tridge
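The signature/delta exchange described above can be illustrated with a toy version. It is a sketch under several assumptions, not rproxy's actual code or wire format: the block size is a tiny 16 bytes, Adler-32 is recomputed at every offset instead of being rolled, SHA-1 is used as the strong per-block check (so the signature here is larger than the roughly 100 bytes mentioned), and `signature`, `make_delta`, and `apply_delta` are hypothetical names.

```python
import hashlib
import zlib

BLOCK = 16  # tiny block size so the demo is easy to follow (an assumption)


def signature(old: bytes) -> dict[int, tuple[bytes, int]]:
    """School server: weak hash -> (strong hash, block index) per full block."""
    sig = {}
    for index in range(len(old) // BLOCK):
        block = old[index * BLOCK:(index + 1) * BLOCK]
        sig.setdefault(zlib.adler32(block), (hashlib.sha1(block).digest(), index))
    return sig


def make_delta(new: bytes, sig: dict[int, tuple[bytes, int]]) -> list:
    """Upstream proxy: needs only the new page and the signature."""
    ops, literal, pos = [], bytearray(), 0
    while pos < len(new):
        window = new[pos:pos + BLOCK]
        hit = sig.get(zlib.adler32(window)) if len(window) == BLOCK else None
        if hit and hit[0] == hashlib.sha1(window).digest():
            if literal:  # flush pending literal bytes before the copy op
                ops.append(("data", bytes(literal)))
                literal.clear()
            ops.append(("copy", hit[1]))
            pos += BLOCK
        else:
            literal.append(new[pos])
            pos += 1
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """School server: rebuild the new page from its old copy plus the delta."""
    out = bytearray()
    for kind, value in ops:
        out += old[value * BLOCK:(value + 1) * BLOCK] if kind == "copy" else value
    return bytes(out)


old = b"<html><body>Hello fred, here is today's news: nothing.</body></html>"
new = b"<html><body>Hello mary, here is today's news: nothing.</body></html>"
ops = make_delta(new, signature(old))
rebuilt = apply_delta(old, ops)
```

In this example only the block containing the changed name and the short tail travel as literal data; the other blocks are sent as small copy references into the page the school server already holds.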
web proxy logs from school servers?
Is anyone gathering up the web proxy logs from any of the school servers? (I presume they do run a proxy cache)

The reason I ask is that if the logs show the kids are hitting dynamic web sites frequently, and that multiple kids are hitting the same site, then it might be worth installing rproxy on the school servers. That would allow effective caching of dynamic pages.

Normally a proxy cache can't cache a page like slashdot.org (not that the kids would be reading slashdot!) as it changes each time. By using rproxy, only the changes would come across the slow link, thus saving valuable bandwidth.

Whether it's worthwhile all depends on the ratios of dynamic and non-dynamic content, and whether kids tend to flock to the same sites or reload a dynamic site often enough that it's worth the extra hundred or so bytes per request that rproxy adds. So we'd need logs to work out if it's worth the effort.

rproxy hasn't had much TLC lately, but that's easy enough to fix.

Cheers, Tridge
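As a rough idea of what such a log analysis could look like, here is a sketch that assumes the school servers run Squid and write its native access.log format (timestamp, elapsed, client, result/status, bytes, method, URL, ...). The function names, the `min_clients` threshold, and the sample lines are made up for illustration; a cache miss is also only a crude proxy for "dynamic", since first fetches of static content miss too.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def summarise(lines):
    """Per-host request count, cache misses, and distinct clients."""
    stats = defaultdict(lambda: {"requests": 0, "misses": 0, "clients": set()})
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip malformed lines
        client, result, url = fields[2], fields[3], fields[6]
        host = urlsplit(url).hostname
        if host is None:
            continue  # entries without a parsable host, e.g. CONNECT lines
        entry = stats[host]
        entry["requests"] += 1
        entry["clients"].add(client)
        if "MISS" in result.split("/")[0]:
            entry["misses"] += 1
    return stats


def candidates(stats, min_clients=2):
    """Hosts fetched by several clients where every request was a miss."""
    return sorted(
        host for host, e in stats.items()
        if len(e["clients"]) >= min_clients and e["misses"] == e["requests"]
    )


# Made-up sample lines in Squid's native log layout.
SAMPLE = [
    "1180000000.123 250 10.0.0.5 TCP_MISS/200 41230 GET http://slashdot.org/ - DIRECT/192.0.2.1 text/html",
    "1180000030.456 240 10.0.0.9 TCP_MISS/200 41310 GET http://slashdot.org/ - DIRECT/192.0.2.1 text/html",
    "1180000040.789 3 10.0.0.9 TCP_HIT/200 5120 GET http://example.org/logo.png - NONE/- image/png",
]
likely_wins = candidates(summarise(SAMPLE))
```

Hosts that show up in `likely_wins` are the ones where delta encoding has a chance of paying for the extra bytes per request.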
Re: web proxy logs from school servers?
James,

> I had thought of rproxy between the school server and the XO browser, if the time taken to checksum is low enough, and if there is a local cache on the XO.

Gosh no, I don't propose putting rproxy on the laptops themselves. The way it would work is this:

- install rproxy on the school server
- install another rproxy on a well connected server (eg. rproxy.laptop.org) which might also chain to a squid or similar.

The laptops would be unmodified. The school server would receive the normal http requests from the laptops. It would add the rproxy tags and forward to the upstream proxy. The upstream proxy will see the tags and would get the page, then do the delta encoding and send to the school server. The school server decodes, and sends a standard http reply to the laptop.

The end result is lower bandwidth usage over the link between the school server and the world. So no extra storage on the laptops and no software change on them.

Cheers, Tridge
Re: web proxy logs from school servers?
Adrian,

> Hm, i haven't seen rproxy before; integrating it into Squid could prove to be rather interesting..

Yes, I had a student looking at that a while ago (back when I was an academic), but the student got tempted by a .com job and stopped his degree :-)

It's not totally necessary though, as you can chain proxies together. The way I've generally used rproxy at home in the past has been to chain it to a squid on the upstream side. It saved a lot when I was on a slow link, but now I have ADSL2+ I have it turned off as it doesn't really matter. Chaining it also has the advantage of keeping the code very simple.

For school servers, we'd probably want to round-robin or similar between multiple upstream servers, each running a squid. One of the nice features of the delta encoding that rproxy uses is that the subsequent requests don't need to be against the same upstream proxy as the original request for the compression to be effective.

Cheers, Tridge
Re: web proxy logs from school servers?
Adrian,

> If the school server team decides to run Squid then I'm open to have a chat about what sorts of things can be done to improve the experience, like rproxy.

Great. We really need some logs showing the pattern of requests the kids are using though. I suspect my own browsing patterns would be very different from kids', and I'd want to know that we will gain enough for this to be worthwhile. I suspect it would be a significant win, but that's only a guess.

Cheers, Tridge
no serial number ?
I've got a B2-1 which I've just upgraded ready for a demo to a local Australian senator. Unfortunately I've been bitten by the activation process. At powerup it says:

  S/N Unknown

then:

  could not activate this XO
  Serial number: SHF

When I try the activation procedure, I instead get:

  No serial number in mfg data
  No serial number
  User power button to power off

plus a sad face icon. Any suggestions?

Under the battery cover it says b2-7 and gives a serial number of SHF70600215.

Cheers, Tridge

PS: Are there any other olpc owners in Canberra who could bring a machine along to a meeting with Senator Lundy next week?
Re: no serial number ?
> I wonder how the WP tag got set?

WP == write protect? You think the flash is write protected?
Re: System update spec proposal
Scott,

> Rsync documents using 100 bytes per file, so that's 100M of core required.

That 100 bytes per file is very approximate. It also increases quite a lot if you use --delete and also increases if you use --hard-links. Other options have smaller, but non-zero, impacts on the memory usage, and of course it depends on the filenames themselves.

If rsync is going to be used on low memory machines, then it could be broken up into several pieces. So do multiple rsync runs, each synchronising a portion of the filesystem (eg. each directory under /usr).

Alternatively, talk to Wayne Davison about rsync 3.0. One of the core things that brings is lower memory usage (essentially automating the breakup into directory trees that I mentioned above).

I had hoped to have time to write a new synchronisation tool for OLPC that would be much more memory efficient and take advantage of multicast, using a changeset-like approach to complete OS update, but various things have gotten in the way of me contributing serious time to the OLPC project, for which I apologise.

I could review any rsync based scripts you have though, and offer suggestions on getting the most out of rsync.

Cheers, Tridge
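A minimal sketch of the "several pieces" idea, assuming the list of subdirectories is known in advance: each piece becomes its own rsync invocation, so each run only has to hold the file list for one subtree in memory. The helper names, the `updates.example.org::build` source, and the subdirectory list are hypothetical; only the `rsync` command and its `-a` option are real.

```python
import subprocess


def rsync_commands(src, dest, subdirs, extra_args=("-a",)):
    """Build one rsync command per subdirectory of the tree."""
    return [
        ["rsync", *extra_args, f"{src}/{name}/", f"{dest}/{name}/"]
        for name in subdirs
    ]


def run_all(commands):
    """Run the pieces one after another, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Hypothetical source and subdirectory list, for illustration only.
commands = rsync_commands(
    "updates.example.org::build/usr", "/usr", ["bin", "lib", "share"]
)
# run_all(commands)  # left commented out: this would really invoke rsync
```

Files that sit directly in the top-level directory are not covered by the per-subdirectory runs and would need a separate pass.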
Re: System update spec proposal
Chris,

> but your comment serves to prove another point: that firing up a program that has to do a lot of computation every time a client connects is something that's deeply wrong. And that's just for the central server.

Yes, very true. What rsync as a daemon should do is mmap a pre-prepared file list, and you generate that file list using cron.

For the OLPC case this isn't as hard as for the general rsync case, as you know that all the clients will be passing the same options to the server, so the same pre-prepared file list can be used for all of them. In the general rsync case we can't guarantee that, which is what makes it harder (though we could use a map file named after a hash of the options).

Coding/testing this takes time though :(

Cheers, Tridge
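The pre-prepared file list idea can be sketched outside rsync as follows. Everything here is an assumption for illustration: the tab-separated list format is invented and is not rsync's internal file list, the function names are hypothetical, and sorting the options before hashing assumes their order does not matter.

```python
import hashlib
import mmap
import os


def list_path(cache_dir, options):
    """File-list path named after a hash of the client's option set."""
    key = hashlib.sha1("\0".join(sorted(options)).encode()).hexdigest()
    return os.path.join(cache_dir, f"filelist-{key}")


def build_list(root, cache_dir, options):
    """Run from cron: walk the tree once and publish the list atomically."""
    target = list_path(cache_dir, options)
    tmp = target + ".tmp"
    with open(tmp, "w", encoding="utf-8") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # deterministic traversal order
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                st = os.lstat(full)
                rel = os.path.relpath(full, root)
                out.write(f"{st.st_size}\t{int(st.st_mtime)}\t{rel}\n")
    os.replace(tmp, target)  # readers never see a half-written list
    return target


def open_list(cache_dir, options):
    """Per connection: map the prepared list instead of rescanning the tree."""
    with open(list_path(cache_dir, options), "rb") as f:
        # The mapping stays valid after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

With this split, the expensive tree walk happens once per cron run, and each client connection costs only an `mmap` of a file the kernel can share between server processes.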