Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?

2009-04-05 Thread tridge
Hi Rusty,

 I didn't implement this because I wanted the server to be able to cache
  the reply easily (ie. new story goes up on /., everyone sends old hash, 
  reply gets served from accelerator).

I don't think that caching these will work as well as you might
expect. In your example of slashdot, it gives a different reply every
time, even for the same user. Try two wget runs against slashdot.org
and diff the results.

It would work for static pages, but with static pages you don't really
need delta-encoding, as you'll get a good hit rate with the normal
cache tag mechanisms that browsers and proxies already use.

Cheers, Tridge


Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?

2009-03-31 Thread tridge
Hi Toby,

  The plan was to include something like an sha1 hash of the original file in
  the response headers. Then once the file has been decoded you can check to
  make sure it matches. If not you can resend the request without the block
  hash header and get the file the old-fashioned way.

re-sending http requests can be dangerous. The request might have
triggered an action like deleting the last person from a list; when
you resend it, you could delete two users rather than one.

Remember that one of the aims of this work is to allow caching of
dynamic requests, so you can't just assume the pages are marked as
cacheable (which usually implies that a 2nd request won't do any
harm).

Certainly including a strong whole-page hash is a good idea, but if
the strong hash doesn't match, then I think you need to return an
error, just as you would for a network outage.

The per-block rolling hash should also be randomly seeded, as Martin
mentioned. That way, if the user does ask for the page again, the
hashing will be different. You need to send that seed along with the
request.
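
To make that concrete, here is a minimal sketch of the client side,
assuming a made-up header encoding (the real CRCsync/rproxy wire
format, hash functions and header names will differ; adler32 just
stands in for the rolling checksum):

    import hashlib
    import os
    import zlib

    BLOCK_SIZE = 4096

    def block_hashes(data, seed):
        """Weak per-block hashes mixed with a random seed, so a retry after
        a (rare) collision will use different hash values."""
        hashes = []
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            # adler32 stands in for the real rolling checksum here
            hashes.append(zlib.adler32(seed + block) & 0xffff)
        return hashes

    def make_request_headers(old_page):
        seed = os.urandom(4)
        return {
            "X-Block-Hash-Seed": seed.hex(),   # header names are made up
            "X-Block-Hashes": ",".join("%04x" % h
                                       for h in block_hashes(old_page, seed)),
        }

    def verify(decoded_page, strong_hash_from_server):
        # If this doesn't match, treat it as an error (like a network
        # outage) rather than silently re-issuing a possibly
        # non-idempotent request.
        return hashlib.sha1(decoded_page).hexdigest() == strong_hash_from_server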

In practice hashing errors will be extremely rare. rsync very rarely
needs a 2nd pass, and it uses a much weaker rolling hash (I think I
used 16 bits by default for the per-block hashes). The ability to do
multiple passes is what allows rsync to get away with such a small
hash, but I remember that when I was testing the multiple-pass code I
had to weaken the hash even further to get any reasonable chance of a
2nd pass, just so I could be sure the code worked.

Cheers, Tridge


Re: Fwd: Dailymotion for XO laptop

2008-01-16 Thread tridge
Rob,

  I think it is worth the effort, and would like to figure out how we on
  the Gnash team can do this.

great! I'll contact you off list to discuss some of the ways we do
this for Samba and see if I can give you some help getting started.

Cheers, Tridge


Re: Fwd: Dailymotion for XO laptop

2008-01-14 Thread tridge
Ed,

  1. There are two kinds of good patent attorneys. One kind works pro 
  bono for free software and the other gets paid big bucks by patent 
  holders.

There is a 3rd kind - the kind that works for a law firm specifically
funded to assist free software projects. For example, the Software
Freedom Law Center.

  It's a question of two competing sides of a specific and very
  detailed technical and legal argument being thrashed out in the
  press and in the courts.

no, it does not need to be thrashed out either in the press or in
court. The aim is to avoid both. What you need to do is produce
detailed claim charts, along with a set of non-infringement
arguments. Once you have those, you can work out how to write your
code so as to avoid the patent. This isn't always possible, but it
often is.

  If you head to the Groklaw web site, you can see this sort of thing
  (from the free software side). This process does not take a few
  weeks but *decades*.

Not true at all. I have handled the patent avoidance for Samba for a
long time now, and it has generally taken me a few weeks per patent
with a good patent attorney to come up with a solid non-infringement
argument. Those are intensive weeks, but it is certainly not years.

If it took 'decades', you would simply be waiting for the patent to
expire.

Whether the effort involved is worth it depends on how much of an
impediment these codec patents are to the success of the OLPC
project. In the case of Samba we can't just choose to use another
protocol, so avoiding patents via non-infringement is our only
choice. If we can't do it then we have to shut down the project. That
makes almost any level of effort worthwhile. For OLPC the need for
these codecs is almost certainly not as critical, so perhaps the
effort is not worthwhile. That is not really for me to judge. I just
wanted to point out that the existence of a patent in connection with
a codec is not necessarily a showstopper.

Cheers, Tridge


Re: Another proxy idea - web proxy logs from school servers?

2007-12-08 Thread tridge
Ian,

  A somewhat relevant idea.
  
  http://wiki.openmoko.org/wiki/Server:WebProxy
  
  You can get really quite large benefits from a local cache.

The problem with this approach is this line in the description:

 then informing the proxy of which version of the page it has

That means the proxy needs to keep all old versions of every page for
any client that might need them in order to take the diff. The key
trick with rproxy is that it avoids the need to keep all the old
versions.

The way rproxy works is this:

 - the laptop talks to the school server proxy

 - the school server proxy talks to a well connected upstream proxy

 - the well connected upstream proxy talks to the world

The school server proxy keeps one old version of any URL. When a
request comes in for a new copy of that page, it sends the request to
the upstream proxy, tagged with approximately 100 bytes of rsync-style
rolling block hashes of the old page it has. The upstream server then
fetches the current copy of the page, and can use just the new page
plus those 100 bytes to calculate a binary diff between the old and
new page (the diff algorithm doesn't need access to the old page).

The diff is sent to the school server, which applies it to the old
page to generate the new page. A strong checksum is also calculated
and checked.
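
As a toy illustration of that flow (not the real rproxy code or wire
format -- the block size, the delta representation and the use of
adler32 are all just assumptions, and a real implementation uses a
rolling hash so matches can start at any byte offset):

    import zlib

    BLOCK = 4096

    def hash_blocks(old_page):
        """School server: small per-block hashes of its cached copy."""
        return [zlib.adler32(old_page[i:i + BLOCK]) & 0xffff
                for i in range(0, len(old_page), BLOCK)]

    def make_delta(new_page, old_hashes):
        """Upstream proxy: needs only the new page plus the hashes,
        never the old page itself."""
        index = {h: i for i, h in enumerate(old_hashes)}
        delta, literal = [], bytearray()
        for off in range(0, len(new_page), BLOCK):
            block = new_page[off:off + BLOCK]
            match = index.get(zlib.adler32(block) & 0xffff)
            if match is not None:
                if literal:
                    delta.append(("data", bytes(literal)))
                    literal = bytearray()
                delta.append(("copy", match))  # "copy block N of your old page"
            else:
                literal.extend(block)
        if literal:
            delta.append(("data", bytes(literal)))
        return delta

    def apply_delta(old_page, delta):
        """School server: rebuild the new page from the delta and its old
        copy, then check the whole-page strong checksum (omitted here)."""
        out = bytearray()
        for kind, value in delta:
            if kind == "copy":
                out.extend(old_page[value * BLOCK:(value + 1) * BLOCK])
            else:
                out.extend(value)
        return bytes(out)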

The result is:

 - no extra storage requirements on upstream server

 - store one copy of the page (with the usual LRU cache stuff) on the
   school server

 - no changes on the laptop

 - reduced bandwidth usage over the link from the school to the
   upstream server

Another neat feature is that it also works across different URLs on the
same server. If the school server doesn't find the exact URL in its
cache, it can send the block hashes of the best-matching URL it has. So
for example if someone has previously visited:

  http://some.site/foo?user=fred

then someone else visits:

  http://some.site/foo?user=mary

then there is often a lot of common data between the two URLs. That
common data won't go over the school's internet link. If there isn't
any common data then you pay the 100-byte price, but nothing more. You
don't even pay the price of calculating the rolling hashes, as that
can be done once while reading the page originally, in parallel with
the memcpy (and at extremely low cost).
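
The "best matching URL" selection could be as simple as something like
this (the real heuristic rproxy uses may well differ; this is just a
guess for illustration):

    def best_cached_match(url, cached_urls):
        """Pick the cached URL sharing the longest common prefix with the
        requested one.  Returns None if the cache is empty."""
        def common_prefix_len(a, b):
            n = 0
            for x, y in zip(a, b):
                if x != y:
                    break
                n += 1
            return n
        return max(cached_urls,
                   key=lambda c: common_prefix_len(url, c),
                   default=None)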

It doesn't produce diffs as small as you would get with a real local
diff, but the savings in disk usage make it worthwhile.

Cheers, Tridge


web proxy logs from school servers?

2007-12-06 Thread tridge
Is anyone gathering up the web proxy logs from any of the school
servers? (I presume they do run a proxy cache)

The reason I ask is that if the logs show the kids are hitting dynamic
web sites frequently, and that multiple kids are hitting the same
site, then it might be worth installing rproxy on the school
servers. That would allow effective caching of dynamic
pages. Normally a proxy cache can't cache a page like slashdot.org
(not that the kids would be reading slashdot!) as it changes each
time. By using rproxy, only the changes would come across the slow
link, thus saving valuable bandwidth.

Whether it's worthwhile all depends on the ratio of dynamic to
non-dynamic content, and whether kids tend to flock to the same
sites or reload a dynamic site often enough that it's worth the extra
hundred or so bytes per request that rproxy adds. So we'd need logs to
work out if it's worth the effort.
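
A first cut at that analysis might look like this, assuming Squid's
native access.log format on stdin (the field position of the URL is an
assumption about the log format in use):

    from collections import Counter
    import sys

    # Count how often the same URL is requested more than once -- repeat
    # hits on dynamic pages are where rproxy's delta encoding pays off.
    hits = Counter()
    for line in sys.stdin:
        fields = line.split()
        if len(fields) > 6:
            hits[fields[6]] += 1   # URL field in Squid's native log format

    repeats = {url: n for url, n in hits.items() if n > 1}
    total = sum(hits.values())
    print(f"{total} requests, {sum(repeats.values())} to URLs fetched more than once")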

rproxy hasn't had much TLC lately, but that's easy enough to fix.

Cheers, Tridge


Re: web proxy logs from school servers?

2007-12-06 Thread tridge
James,

  I had thought of rproxy between the school server and the XO browser, if
  the time taken to checksum is low enough, and if there is a local cache
  on the XO.

gosh no, I don't propose putting rproxy on the laptops themselves.

The way it would work is this:

 - install rproxy on the school server
 - install another rproxy on a well connected server
   (eg. rproxy.laptop.org) which might also chain to a squid or
   similar. 

The laptops would be unmodified. The school server would receive the
normal http requests from the laptops. It would add the rproxy tags
and forward them to the upstream proxy. The upstream proxy would see
the tags, fetch the page, do the delta encoding and send the result to
the school server. The school server decodes it and sends a standard
http reply to the laptop.

The end result is lower bandwidth usage over the link between the
school server and the world.

So no extra storage on the laptops and no software change on them. 

Cheers, Tridge


Re: web proxy logs from school servers?

2007-12-06 Thread tridge
Adrian,

  Hm, i haven't seen rproxy before; integrating it into Squid could
  prove to be rather interesting..

yes, I had a student looking at that a while ago (back when I was an
academic), but the student got tempted by a .com job and stopped his
degree :-)

It's not totally necessary though, as you can chain proxies
together. The way I've generally used rproxy at home in the past has
been to chain it to a squid on the upstream side. It saved a lot when
I was on a slow link, but now that I have ADSL2+ I have it turned off
as it doesn't really matter.

Chaining it also has the advantage of keeping the code very
simple. For school servers, we'd probably want to round-robin or
similar between multiple upstream servers, each running a squid. One
of the nice features of the delta encoding that rproxy uses is that
subsequent requests don't need to go to the same upstream proxy as
the original request for the compression to be effective.

Cheers, Tridge


Re: web proxy logs from school servers?

2007-12-06 Thread tridge
Adrian,

  If the school server team decides to run Squid then I'm open to have a
  chat about what sorts of things can be done to improve the experience,
  like rproxy.

great. We really need some logs showing the pattern of requests the
kids are making though. I suspect my own browsing patterns would be
very different from the kids', and I'd want to know that we will gain
enough for this to be worthwhile. I suspect it would be a significant
win, but that's only a guess.

Cheers, Tridge


no serial number ?

2007-12-05 Thread tridge
I've got a B2-1 which I've just upgraded, ready for a demo to a local
Australian senator. Unfortunately I've been bitten by the activation
process.

At powerup it says S/N Unknown, then 

  could not activate this XO
  Serial number: SHF

When I try the activation procedure, I instead get:

 No serial number in mfg data
 No serial number
 Use power button to power off

plus a sad face icon.

Any suggestions? 

Under the battery cover it says b2-7 and gives a serial number of
SHF70600215 

Cheers, Tridge

PS: Are there any other olpc owners in Canberra who could bring a
machine along to a meeting with Senator Lundy next week?


Re: no serial number ?

2007-12-05 Thread tridge
  I wonder how the WP tag got set?

WP == write protect?

You think the flash is write protected? 


Re: System update spec proposal

2007-06-26 Thread tridge
Scott,

  Rsync documents using 100 bytes per file, so that's 100M of core
  required.

That 100 bytes per file is very approximate. It increases quite a
lot if you use --delete, and again if you use --hard-links. Other
options have smaller, but non-zero, impacts on the memory usage, and
of course it depends on the filenames themselves.

If rsync is going to be used on low-memory machines, then the job
could be broken up into several pieces: do multiple rsync runs, each
synchronising a portion of the filesystem (eg. each directory under
/usr).
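
A sketch of that approach, with a made-up server name and an
illustrative choice of subtrees and rsync options:

    import subprocess

    # Sync each subtree in its own rsync run so the in-memory file list
    # stays small.  The module name, server and option set are
    # illustrative only, not a recommendation.
    SUBTREES = ["usr/bin", "usr/lib", "usr/share", "etc", "boot"]

    for subtree in SUBTREES:
        subprocess.run(
            ["rsync", "-a", "--delete",
             f"rsync://updates.example.org/os/{subtree}/",
             f"/{subtree}/"],
            check=True)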

Alternatively, talk to Wayne Davison about rsync 3.0. One of the core
things it brings is lower memory usage (essentially automating the
breakup into directory trees that I mentioned above).

I had hoped to have time to write a new synchronisation tool for OLPC
that would be much more memory efficient, take advantage of multicast,
and use a changeset-like approach to complete OS updates, but various
things have gotten in the way of me contributing serious time to the
OLPC project, for which I apologise. I could review any rsync-based
scripts you have though, and offer suggestions on getting the most out
of rsync.

Cheers, Tridge


Re: System update spec proposal

2007-06-26 Thread tridge
Chris,

  but your comment serves to prove another point: that firing up a
  program that has to do a lot of computation every time a client
  connects is something that's deeply wrong.  And that's just for the
  central server.

yes, very true. What rsync as a daemon should do is mmap a
pre-prepared file list, with that file list generated from
cron.

For the OLPC case this isn't as hard as for the general rsync case, as
you know that all the clients will be passing the same options to the
server, so the same pre-prepared file list can be used for all of
them. In the general rsync case we can't guarantee that, which is what
makes it harder (though we could use a map file named after a hash of
the options).
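
That last idea could be as simple as keying the cached, mmap-able file
list on a digest of the client's options, so identical option sets
share one pre-generated list (names and paths here are made up):

    import hashlib
    import os

    CACHE_DIR = "/var/cache/rsyncd-filelists"   # hypothetical location

    def filelist_cache_path(options):
        """One pre-generated file list per distinct set of client options."""
        digest = hashlib.sha1("\0".join(sorted(options)).encode()).hexdigest()
        return os.path.join(CACHE_DIR, digest)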

Coding/testing this takes time though :(

Cheers, Tridge