Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?

2009-04-01 Thread Toby Collett
So a quick question: what sort of HTTP transfers is chunking most often
used for? I believe we will get poor results from the method for most types
of binary data, which tend to be the larger files. In the web context these
will generally either not have changed at all (in which case traditional
caching will help) or will have changed completely, in which case the hashing
is just overhead. Happy to be corrected on this point.

Actually, while we are on this thought: do we want to add the strong hash to
the request headers so the upstream server can reply with "use the cached
version"? This would allow the server side to compensate for sites that don't
use correct cache headers (e.g. static images with no cache information).
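
Something along these lines is what I have in mind (the header name and the
"use the cached version" reply are made up, nothing like this exists in the
modules yet):

    GET /static/logo.png HTTP/1.1
    Host: example.org
    X-CRCSync-Strong-Hash: <base64 SHA-1 of the cached copy>

    HTTP/1.1 304 Not Modified
    (or whatever equivalent "use your cached copy" reply we settle on)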

One alternative to failing on error is to hold a copy on the server end for
a short period so we can retransmit it unencoded, but this is probably
unacceptable overhead on the server side, especially if we can't manage to
maintain a TCP session for the retry.

Are there any headers sent with each HTTP chunk? We could always put our
strong hash across these, assuming that chunking is defined at the source and
not repartitioned by caches and proxies in between.

Toby


Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?

2009-03-31 Thread Toby Collett
One thing we need to do is think about the headers carefully, as this is the
aspect of the project we could promote as a web standard. There is a large
amount of flexibility we could put into this, but as Rusty has said, if
there is a way someone can implement a protocol wrong, they will. So we need
to keep it as simple as possible.

At the moment we append the block size and the hashes for the blocks to the
request. The response has a content encoding set, and will need a strong
hash added. The number of blocks is fixed at 20 for the moment, with a hash
size of 30 bits, which felt like a nice balance between overhead and
performance. This keeps our header at around the 128 byte mark once you have
base64 encoded the hashes (we don't pad the base64 encoding, so each 30-bit
hash encodes to 5 base64 characters).
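
As a very rough sketch of how the request side could be put together (the
header name and exact format are made up for illustration, and crc32
truncated to 30 bits stands in for the real crcsync rolling hash; the actual
modules are C, not Python):

    import base64, struct, zlib

    def block_hashes(cached_body, num_blocks=20):
        # Split the cached response into num_blocks blocks and hash each one.
        block_size = max(1, len(cached_body) // num_blocks)
        hashes = []
        for i in range(num_blocks):
            block = cached_body[i * block_size:(i + 1) * block_size]
            h = zlib.crc32(block) & ((1 << 30) - 1)
            # 30 bits fit exactly into 5 unpadded base64 characters: pack the
            # hash into the top 30 bits of a 32-bit word, keep the first 5 chars.
            enc = base64.b64encode(struct.pack('>I', h << 2))[:5]
            hashes.append(enc.decode('ascii'))
        return block_size, hashes

    def build_request_header(cached_body):
        # Hypothetical header name; 20 hashes of 5 characters each plus the
        # block size keeps the value around the 128 byte mark mentioned above.
        block_size, hashes = block_hashes(cached_body)
        return 'X-CRCSync-Blocks: %d %s' % (block_size, ' '.join(hashes))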

The other aspect we need to standardise is the encoding of the response.
Again, at the moment this is a very simplistic binary encoding. The response
is encoded in sections, each beginning with either an 'L' to indicate a
literal section or a 'B' to indicate a matched block (actually we could make
one the default and save a few bytes here). A literal section then has a 4
byte int in network byte order for the size of the literal section, followed
by the data. A block section has a single byte indicating the block number.
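
For what it is worth, a decoder for that encoding is only a handful of lines.
This is just a sketch (Python rather than the module's C, and it assumes the
same block size the request hashes were built from):

    import struct

    def decode_response(encoded, cached_body, block_size):
        # Rebuild the new file from the 'L'/'B' encoding described above.
        out = bytearray()
        pos = 0
        while pos < len(encoded):
            tag = encoded[pos:pos + 1]
            pos += 1
            if tag == b'L':              # literal: 4-byte length, then data
                (length,) = struct.unpack('>I', encoded[pos:pos + 4])
                pos += 4
                # a sanity cap on length here (e.g. against Content-Length)
                # would guard against the corrupted-length risk noted below
                out += encoded[pos:pos + length]
                pos += length
            elif tag == b'B':            # matched block from the cached copy
                block_no = encoded[pos]
                pos += 1
                out += cached_body[block_no * block_size:(block_no + 1) * block_size]
            else:
                raise ValueError('unknown section tag %r' % tag)
        return bytes(out)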

There is no error checking in the encoding itself; this is assumed to be
taken care of in other layers, and we throw in a strong hash on the whole
file to make sure the result is correct. There is a risk that if the literal
length field gets corrupted we could try to read a very large amount of data;
I'm not sure if this is acceptable.

Toby



2009/3/31 Gervase Markham g...@mozilla.org

 On 25/03/09 18:20, Toby Collett wrote:

 Not a GSoC project, just a project (crcsync is the name at the moment).
 Initial target is a double proxy server, one at each end of the slow link,
 with dreams of web standards and browser integration following.

 Seems to me that both projects need the same upstream server extension
 to be able to send the deltas down. Current state of the apache modules
 is that all the major pieces are in place, but not a lot of testing and no
 optimisation has been carried out yet.


 OK. So maybe the browser integration for this, or at least the groundwork
 for it, is what our SoC project should be. Particularly if you have Apache
 modules that work already.

 See
 https://wiki.mozilla.org/Community:SummerOfCode09:WebPagesOverRsync
 for where we are at the moment. We are getting incredible amounts of
 interest in this project - more than all the others combined. It seems like
 an idea whose time has come.

 Gerv






Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?

2009-03-31 Thread Toby Collett
We are only using 30-bit hashes, so even if it were a perfect hash it is
possible you could get a collision. Having said that, our collision space is
only the single web request, which should reduce the chance of error.
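
A back-of-the-envelope number, assuming an ideal 30-bit hash, the current 20
block hashes, and a made-up 100 KB response being rolled over:

    # Rough chance of some window of the new file spuriously matching one of
    # the 20 block hashes (ideal 30-bit hash, hypothetical 100 KB response).
    positions = 100 * 1024          # candidate rolling-window positions
    block_hashes = 20
    hash_space = 2 ** 30
    print(positions * block_hashes / hash_space)   # ~0.002

So somewhere in the region of one bad block match per five hundred responses
of that size, which is why the strong hash over the whole file (and a
fall-back to a plain request) is still needed.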

Toby

2009/4/1 Martin Langhoff martin.langh...@gmail.com

 On Mon, Mar 30, 2009 at 8:26 PM, Toby Collett t...@plan9.net.nz wrote:
  There is no error checking in the encoding itself; this is assumed to be
  taken care of in other layers, and we throw in a strong hash on the whole
  file to make sure this is correct.

 Is that right? I thought what Rusty was saying re crcsync is that crc
 is strong, even when rolling?

 cheers,


 m
 --
  martin.langh...@gmail.com
  mar...@laptop.org -- School Server Architect
  - ask interesting questions
  - don't get distracted with shiny stuff  - working code first
  - http://wiki.laptop.org/go/User:Martinlanghoff






Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?

2009-03-31 Thread Toby Collett
The plan was to include something like an SHA-1 hash of the original file in
the response headers. Then, once the file has been decoded, you can check
that it matches. If not, you can resend the request without the block hash
header and get the file the old-fashioned way.
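
Roughly what I have in mind on the client end (the header name and the
refetch hook are made up for the sketch):

    import hashlib

    def verify_or_refetch(decoded_body, response_headers, refetch):
        # Compare the decoded file against the strong hash in the response;
        # on a mismatch, fall back to an ordinary request without block hashes.
        expected = response_headers.get('X-CRCSync-SHA1')
        if expected is None or hashlib.sha1(decoded_body).hexdigest() == expected:
            return decoded_body
        return refetch(send_block_hashes=False)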

Toby

2009/4/1 Martin Langhoff martin.langh...@gmail.com

 On Tue, Mar 31, 2009 at 8:32 PM, Toby Collett t...@plan9.net.nz wrote:
  We are only using 30 bit hashes, so even if it was a perfect hash it is
  possible you could get a collision. Having said that our collision space is
  only the single web request, so should reduce chances of error.

 IIRC, if rsync thinks there was a collision on the weak hash, it rolls
 again through the file with the weak hash and a different seed.

 Maybe we could include a differently seeded fingerprint?

 Is that what you were thinking?

 cheers,



 m
 --
  martin.langh...@gmail.com
  mar...@laptop.org -- School Server Architect
  - ask interesting questions
  - don't get distracted with shiny stuff  - working code first
  - http://wiki.laptop.org/go/User:Martinlanghoff






Re: [Server-devel] Apache proxy CRCsync mozilla gsoc project?

2009-03-25 Thread Toby Collett
Not a GSoC project, just a project (crcsync is the name at the moment).
Initial target is a double proxy server, one at each end of the slow link, with
dreams of web standards and browser integration following.

Seems to me that both projects need the same upstream server extension to be
able to send the deltas down. Current state of the apache modules is that
all the major pieces are in place, but not a lot of testing and no optimisation
has been carried out yet.

Toby

2009/3/26 Gervase Markham g...@mozilla.org

 On 23/03/09 11:19, Martin Langhoff wrote:

 Fantastic! I assume the rsync-http now know of the vastly superior
 karma of crcsync over the 2-hash method of rsync.


 Er, not really. After a lunchtime conversation with tridge at LCA where he
 told me about his original project, I just thought it would be cool and put
 it up on our SoC list. So I know very little about what's possible.

  If the Apache mods
 and Mozilla speak the same protocol, then machines behind
 bandwidth-constrained links will be in much better shape. I can see
 3G-internet providers pushing this too.


 Clearly, it's worth making sure everyone's on the same page. I see this as
 a killer app for Firefox on low-bandwidth links; we'll have every small-town
 and developing-world ISP which still has dial-up customers telling their
 customers "use Firefox to make your Internet faster". They'd install the
 compression server on their web proxy, and voila.

 Have I understood correctly? Is Martin coordinating a GSoC project to do an
 apache extension for delta-compression-over-HTTP?

 Gerv






Re: [Server-devel] Apache proxy CRCsync

2009-03-23 Thread Toby Collett
Hi Alex,
I think you are on the right track. There is a third option, which is to add
a few extra configuration options to the cache module to make it more
aggressive about caching: basically, cache everything except pages marked
'private' (and possibly even those, as long as you can ensure the server is
secure).

Another aspect of our use of the cache module at the moment is that the disk
cache module has a large portion of code in common with the client module.
It would be nice to add a few hooks to this module and reduce our code
duplication.

That being said, a module that doesn't require modifications to the Apache
source would be an advantage; however, I think header hacking would be the
only way to achieve this, so we probably can't get away with it.

Martin's comments about the garbage collection (which just arrived) are an
important consideration; otherwise the cache could get out of control quite
quickly.

Toby


Re: [Server-devel] Apache proxy CRCsync

2009-03-16 Thread Toby Collett
Great to hear you got it running. Unfortunately I only have about a two-week
head start on you on the Apache front, so I am sure lots of things will get
neater as we go along.

2009/3/16 Alex Wulms alex.wu...@scarlet.be

 Hi Toby,

 I managed to get it working on my PC under suse 11.1 with apache 2.2.10.

 When I configured a dedicated debug log per virtual server, I noticed that
 the crccache_client and crccache_server modules were both invoked in both
 virtual servers. Judging from the error log you sent me, that is also the
 case on your server.

 I have made following changes to fix the situation:

 1) Move the 'CacheEnable crccache_client' directive (for the 'default'
 virtual server) inside the VirtualHost tag. Apparently it is applied
 globally as long as it is outside the VirtualHost tag, regardless of the
 config file in which it appears.

Seems like a sensible change.


 2) Introduce a new directive 'CRCCacheServer on'.
 This directive is checked by mod_crccache_server in the
 crccache_server_header_parser_handler.
 It is specified in the VirtualHost tag of the upstream_proxy of the virtual
 server.
 Apparently modules get loaded globally and functions like
 the ..._header_parser_handler get invoked for each virtual server, so they
 must check themselves if they should be enabled or disabled in a given
 virtual server. I found this through Google, which pointed me to a forum
 where somebody else had faced a similar problem.

Makes sense


 I also realize why I only found cached files
 under /var/cache/apache2/mod_crccache_server and not under ..._client.
 It is because the crccache_client.conf and crccache_server.conf files both
 use the parameter CacheRoot to store the cache directory. These parameters
 are apparently also global. The fact that they are in two different config
 files does not automagically store them in a module-specific namespace. So I
 have renamed the parameters to differentiate between the client and the
 server module.

Actually only the client end should need CacheRoot; the server side doesn't
need caching at all. You could configure a standard Apache cache if you
wanted, but it probably won't gain much.


 I have also noticed that, although the server module reads these
 parameters, they actually don't get used by the current code. Are they there
 due to copy-paste reasons or are they already there for future enhancements,
 in order to cache stuff temporarily on the server side?

Just copy and paste, I guess. I think I left them there so I would have
something to base other parameters on if we need them server side.


 I have pushed my changes to the repository. Please review them. I'm still
 new
 to Apache development so I might have misinterpreted some things.

 Thanks and kind regards,
 Alex


