Re: [Server-devel] Apache proxy CRCsync & mozilla gsoc project?
Hi Rusty,

> I didn't implement this because I wanted the server to be able to cache
> the reply easily (ie. new story goes up on /., everyone sends old hash,
> reply gets served from accelerator).

I don't think that caching these will work as well as you might expect. In your example of slashdot, it gives a different reply every time, even for the same user. Try two wgets of slashdot.org and run a diff between the results.

It would work for static pages, but with static pages you don't really need delta-encoding, as you'll get a good hit rate with the normal cache-tag mechanisms that browsers and proxies already use.

Cheers, Tridge

___
Server-devel mailing list
Server-devel@lists.laptop.org
http://lists.laptop.org/listinfo/server-devel
Re: [Server-devel] Apache proxy CRCsync & mozilla gsoc project?
> If the cache just blows away the cached base page it was using when one of
> these errors occurs, that should Do The Right Thing even without seed.

I see some concurrency issues and race conditions if the cache would simply blow away the cached base page, in case two different concurrent requests are using the same base page and only one of them suffers from the checksum clash. I think using a random seed per request would indeed be a safer approach to handle the manual retry scenario.

Nevertheless, the chance of failure should indeed be reduced by using a stronger hash.
Re: [Server-devel] Apache proxy CRCsync & mozilla gsoc project?
On Wednesday 01 April 2009 19:40:37 Martin Langhoff wrote:
> On Wed, Apr 1, 2009 at 8:29 AM, Rusty Russell wrote:
> > Yes, we need to chunk, because we can't hand the data on to the client until
> > we've verified it, at least in a serious implementation.
>
> Hmmm. If I understand you right, the concern is that the rolling hash
> matches in locations that aren't a true match.

Yep, any hash can have collisions. It's not one or two hashes, it's the number of bits.

> IOWs, a blind, unchecked "delete-last-user" action in a webapp is a
> bug in the webapp. It is ok to fail, as long as the retry will use a
> different seed...

Yes, but it changes the balance; I assumed that 1 in 1000 was a fair failure rate. If we want one in a million, we need more hash bits :)

If the cache just blows away the cached base page it was using when one of these errors occurs, that should Do The Right Thing even without a seed.

A more radical approach is to use a random seed, but don't have a total-content hash: say "meh, it's unreliable, but unlikely to fail", though I don't think the world is ready to accept such engineered-in-failure.

Cheers, Rusty.
Re: [Server-devel] Apache proxy CRCsync & mozilla gsoc project?
On Thu, 2009-04-02 at 07:17 +1300, Toby Collett wrote:
> So a quick question, what sort of http transfers are chunking most
> often used for?

Dynamically generated content is the scenario for chunked transfers; since you don't know the length a priori, some other method of indicating the message length is necessary.

- Jim
--
Jim Gettys
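Jim's point about framing can be made concrete. Below is a minimal toy round-trip of the HTTP/1.1 chunked transfer coding (hex chunk size, CRLF, data, CRLF, terminated by a zero-size chunk) -- an illustration of the mechanism, not anything from the Apache code:

```python
# Minimal sketch of HTTP/1.1 chunked transfer coding.
# Each chunk is "<hex size>\r\n<data>\r\n"; a zero-size chunk ends the body.

def chunk_encode(parts):
    """Encode an iterable of byte strings as a chunked message body."""
    out = b""
    for part in parts:
        if part:  # a zero-length chunk would terminate the body early
            out += b"%x\r\n" % len(part) + part + b"\r\n"
    return out + b"0\r\n\r\n"

def chunk_decode(body):
    """Decode a chunked body back into the original payload."""
    payload = b""
    pos = 0
    while True:
        nl = body.index(b"\r\n", pos)
        size = int(body[pos:nl], 16)
        if size == 0:
            return payload
        payload += body[nl + 2:nl + 2 + size]
        pos = nl + 2 + size + 2  # skip the data plus its trailing CRLF
```

Because each chunk carries its own length, the sender can start transmitting before knowing the total size -- which is exactly why dynamic content uses it.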
Re: [Server-devel] Apache proxy CRCsync & mozilla gsoc project?
On Wed, Apr 1, 2009 at 8:17 PM, Toby Collett wrote:
> So a quick question, what sort of http transfers are chunking most often
> used for? I believe we will get poor results with the method for most types
> of binary data, which tend to be the larger files. In the web context these
> will generally have not changed at all (in which case traditional caching
> will help) or will have changed completely in which case the hashing is just
> overhead. Happy to be corrected on this point.

I agree. We can apply this method exclusively to text/* and */xml mimetypes.

And something I forgot on the earlier pro-streaming notes: the memory model of apache doesn't really release memory back to the kernel -- it keeps it in the per-process memory pool. This means that if we have unbounded memory allocations (such as buffering whole requests), then our memory usage will be terrible. It's a bit less horrid with worker threads, but in general, apache modules usually strive to maintain fixed-size buffers. So it's something to keep in mind :-)

cheers,

m
--
martin.langh...@gmail.com
mar...@laptop.org -- School Server Architect
- ask interesting questions
- don't get distracted with shiny stuff - working code first
- http://wiki.laptop.org/go/User:Martinlanghoff
Re: [Server-devel] Apache proxy CRCsync & mozilla gsoc project?
So a quick question, what sort of http transfers are chunking most often used for? I believe we will get poor results with the method for most types of binary data, which tend to be the larger files. In the web context these will generally have not changed at all (in which case traditional caching will help) or will have changed completely, in which case the hashing is just overhead. Happy to be corrected on this point.

Actually, while we are on this thought: do we want to add the strong hash to the request headers, so the upstream server can reply with "use the cached version"? This would allow the server side to correct for sites that don't use correct cache headers (i.e. static images with no cache information).

One alternative to the fail-on-error approach is to hold a copy on the server end for a short period so we can retransmit unencoded, but this is probably unacceptable overhead on the server side, especially if we can't manage to maintain a TCP session for the retry.

Are there any headers sent with each http chunk? We could always put our strong hash across these, assuming that chunking is defined at source and not repartitioned by caches and proxies in between.

Toby
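The strong-hash-in-the-request idea above could look roughly like the sketch below. The header name and the 304-style short-circuit are purely hypothetical, invented here to illustrate the handshake -- they are not part of the actual crccache modules:

```python
import hashlib

# Hypothetical sketch: the client advertises a strong hash of its cached
# base page; the server short-circuits when the page is unchanged.
# "Crcsync-Base-Hash" is an invented header name, not a real one.

def make_request_headers(cached_body):
    """Client side: advertise a strong hash of the cached base page."""
    return {"Crcsync-Base-Hash": hashlib.sha1(cached_body).hexdigest()}

def server_response(current_body, request_headers):
    """Server side: tell the client to reuse its cache when nothing changed."""
    if request_headers.get("Crcsync-Base-Hash") == hashlib.sha1(current_body).hexdigest():
        return ("304 use-cached", b"")
    return ("200 full-or-delta", current_body)
```

This is conceptually the same mechanism as HTTP's `If-None-Match`/ETag validation, just driven by a content hash the client computes itself, which is what lets it work for sites that send no cache headers at all.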
Re: [Server-devel] Apache proxy CRCsync & mozilla gsoc project?
If we go for the 'fail in case of mismatch' approach, we can keep streaming. We simply have to make sure that we close the connection before we have streamed the last block if we discover a global checksum mismatch. And that is just a matter of putting the global checksum validation before the 'stream last block back to client' step in the decoder logic.

Alex WULMS
Lead Developer/Systems Engineer
Tel: +32 2 655 3931
Information Systems - SWIFT.COM Development
S.W.I.F.T. SCRL

>-Original Message-
>From: Martin Langhoff [mailto:martin.langh...@gmail.com]
>Sent: Wednesday, April 01, 2009 11:11 AM
>To: Rusty Russell
>Cc: Toby Collett; Gervase Markham; Alex Wulms; XS Devel; WULMS Alexander;
>tri...@samba.org; angxia Huang;
>j...@freedesktop.org; http-crcs...@lists.laptop.org
>Subject: Re: Apache proxy CRCsync & mozilla gsoc project?
>
>On Wed, Apr 1, 2009 at 8:29 AM, Rusty Russell wrote:
>> Yes, we need to chunk, because we can't hand the data on to the client until
>> we've verified it, at least in a serious implementation.
>
>Hmmm. If I understand you right, the concern is that the rolling hash
>matches in locations that aren't a true match.
>
>Can we do anything that is still efficient and retains the ability to
>stream? Maybe the client can send 2 hashes in the header, same block
>size but seeded differently?
>
>Or is the problem with the delta blocks we send?... (doesn't seem
>likely to prevent a streaming implementation, but maybe I'm missing
>something)
>
>> Since we're going to error out on the fail case, I'll switch the code to do
>> 64-bit checksums (not right now, but soon: what we have is good enough for
>> testing).
>
>Does 2 hashes make the error condition so unlikely that we can assume
>it won't happen normally? Also - delivery of HTTP payloads is not
>guaranteed. As Tridge said, non-cacheable GETs may be non-idempotent,
>but they sometimes fail to complete for any of many reasons, and the
>user has a big fat Refresh button right there in the web browser.
>
>IOWs, a blind, unchecked "delete-last-user" action in a webapp is a
>bug in the webapp. It is ok to fail, as long as the retry will use a
>different seed...
>
>cheers,
>
>m
>ps: cc'd the http-crcsync list, which is more appropriate...
>--
> martin.langh...@gmail.com
> mar...@laptop.org -- School Server Architect
> - ask interesting questions
> - don't get distracted with shiny stuff - working code first
> - http://wiki.laptop.org/go/User:Martinlanghoff
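Alex's "validate the global checksum before streaming the last block" ordering can be sketched in a few lines. This is an illustrative model only (using SHA-1 as the whole-page hash and an exception in place of closing the connection), not the actual mod_crccache decoder:

```python
import hashlib

def decode_stream(sections, expected_sha1, emit):
    """Stream decoded sections to `emit`, holding back the final one until
    the whole-page hash has been verified. On mismatch we abort before the
    last section goes out -- the real module would close the connection so
    the client sees a truncated (and hence invalid) reply."""
    digest = hashlib.sha1()
    pending = None
    for data in sections:
        if pending is not None:
            emit(pending)          # everything but the last section streams freely
        digest.update(data)
        pending = data
    if digest.hexdigest() != expected_sha1:
        raise IOError("global checksum mismatch -- close connection before last block")
    if pending is not None:
        emit(pending)              # only now is the final section released
```

The client-visible effect is that a corrupted reconstruction always arrives short, so an ordinary length/transfer check catches it.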
Re: [Server-devel] Apache proxy CRCsync & mozilla gsoc project?
On Wed, Apr 1, 2009 at 8:29 AM, Rusty Russell wrote:
> Yes, we need to chunk, because we can't hand the data on to the client until
> we've verified it, at least in a serious implementation.

Hmmm. If I understand you right, the concern is that the rolling hash matches in locations that aren't a true match.

Can we do anything that is still efficient and retains the ability to stream? Maybe the client can send 2 hashes in the header, same block size but seeded differently?

Or is the problem with the delta blocks we send?... (doesn't seem likely to prevent a streaming implementation, but maybe I'm missing something)

> Since we're going to error out on the fail case, I'll switch the code to do
> 64-bit checksums (not right now, but soon: what we have is good enough for
> testing).

Does 2 hashes make the error condition so unlikely that we can assume it won't happen normally? Also - delivery of HTTP payloads is not guaranteed. As Tridge said, non-cacheable GETs may be non-idempotent, but they sometimes fail to complete for any of many reasons, and the user has a big fat Refresh button right there in the web browser.

IOWs, a blind, unchecked "delete-last-user" action in a webapp is a bug in the webapp. It is ok to fail, as long as the retry will use a different seed...

cheers,

m
ps: cc'd the http-crcsync list, which is more appropriate...
Re: [Server-devel] Apache proxy CRCsync & mozilla gsoc project?
On Wednesday 01 April 2009 11:11:23 tri...@samba.org wrote:
> The per-block rolling hash should also be randomly seeded as Martin
> mentioned. That way if the user does ask for the page again then the
> hashing will be different. You need to send that seed along with the
> request.

Hi Tridge,

I didn't implement this because I wanted the server to be able to cache the reply easily (ie. new story goes up on /., everyone sends old hash, reply gets served from accelerator). But then I assumed a re-get on fail.

Rusty.
Re: [Server-devel] Apache proxy CRCsync & mozilla gsoc project?
On Wednesday 01 April 2009 15:52:22 Martin Langhoff wrote:
> On Wed, Apr 1, 2009 at 12:48 AM, Rusty Russell wrote:
> > Well, 'strong' here is relative. In order to keep the checksum length
> > finite and hence encode more blocks we only use a portion of the bits; it's
> > a tradeoff. And so an overall checksum is important, just to verify that
> > the final result is correct.
>
> Hmmm, if we need an overall checksum...
>
> - The server cannot stream data to the client because it has to wait
> until it has all of it. Even if our current implementation doesn't
> have this, having a protocol that allows streaming is high on my list.

Yes, we need to chunk, because we can't hand the data on to the client until we've verified it, at least in a serious implementation.

> - Aren't we back to the 2-hashes-will-get-us-sued square?

Nope, that's two hashes *per-block* IIRC.

> frankly, a hash collision that has the same content length and over
> the same syntax format (html/xml) is so rare as to be... well, not
> really something I would expect :-)

Tridge said 16 bits, but actually it's 48 bits per block (32-bit adler + 16-bit strong). Since we're going to error out on the fail case, I'll switch the code to do 64-bit checksums (not right now, but soon: what we have is good enough for testing).

Thanks, Rusty.
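For a rough feel of the numbers being weighed here: with a k-bit per-block hash, each rolling-hash comparison matches falsely with probability about 2**-k, so a union bound gives roughly M * 2**-k over M comparisons. The sketch below just evaluates that estimate; the page size is an arbitrary assumption, and this ignores the structure of the real adler+strong hash pair:

```python
# Back-of-envelope failure odds for k-bit per-block hashes: a false match at
# any one comparison happens with probability about 2**-k, so over M
# rolling-hash comparisons the failure probability is roughly M * 2**-k
# (union bound; a fine approximation while the result is small).

def approx_failure_odds(hash_bits, comparisons):
    return comparisons / float(2 ** hash_bits)

# e.g. a 1 MB page checked at every byte offset (assumed scenario):
M = 1_000_000
for bits in (30, 48, 64):
    print(bits, "bits ->", approx_failure_odds(bits, M))
```

On those assumptions, 30 bits gives odds in the one-in-a-thousand neighbourhood per large page, 48 bits is comfortably past one-in-a-million, and 64 bits leaves a wide margin -- which matches the direction of the trade-off discussed above.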
Re: [Server-devel] Apache proxy CRCsync & mozilla gsoc project?
On Wed, Apr 1, 2009 at 12:48 AM, Rusty Russell wrote:
> Well, 'strong' here is relative. In order to keep the checksum length finite
> and hence encode more blocks we only use a portion of the bits; it's a
> tradeoff. And so an overall checksum is important, just to verify that the
> final result is correct.

Hmmm, if we need an overall checksum...

- The server cannot stream data to the client because it has to wait until it has all of it. Even if our current implementation doesn't have this, having a protocol that allows streaming is high on my list.

- Aren't we back to the 2-hashes-will-get-us-sued square?

frankly, a hash collision that has the same content length and over the same syntax format (html/xml) is so rare as to be... well, not really something I would expect :-)

cheers,

m
Re: [Server-devel] Apache proxy CRCsync & mozilla gsoc project?
Hi Toby,

> The plan was to include something like an sha1 hash of the original file in
> the response headers. Then once the file has been decoded you can check to
> make sure it matches. If not you can resend the request without the block
> hash header and get the file the old-fashioned way.

Re-sending http requests can be dangerous. The request might have triggered an action like "delete the last person from the list". When you resend, it could delete two users rather than one. Remember that one of the aims of this work is to allow caching of dynamic requests, so you can't just assume the pages are marked as cacheable (which usually implies that a 2nd request won't do any harm).

Certainly including a strong whole-page hash is a good idea, but if the strong hash doesn't match, then I think you need to return an error, just like if you got a network outage.

The per-block rolling hash should also be randomly seeded as Martin mentioned. That way if the user does ask for the page again then the hashing will be different. You need to send that seed along with the request.

In practice hashing errors will be extremely rare. It is extremely rare for rsync to need a 2nd pass, and it uses a much weaker rolling hash (I think I used 16 bits by default for the per-block hashes). The ability to do multiple passes is what allows rsync to get away with such a small hash, but I remember that when I was testing the multiple-pass code I needed to weaken it even more to get any reasonable chance of a 2nd pass so I could be sure the code worked.

Cheers, Tridge
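The per-request seeding idea could be modelled as below. Here `zlib.adler32` stands in for the real crcsync rolling hash, and prepending the seed bytes is a deliberate simplification -- a true rolling implementation would fold the seed into its state differently -- but it shows the property that matters: the same page re-requested with a fresh seed yields entirely different block hashes, so a retry cannot hit the same unlucky collision twice:

```python
import zlib

# Illustrative only: per-request seeding of block checksums. The seed would
# travel with the request so the server computes the same seeded hashes.

def seeded_block_hashes(page, block_size, seed):
    """Hash each fixed-size block of `page`, mixed with a per-request seed."""
    seed_bytes = seed.to_bytes(4, "big")
    return [zlib.adler32(seed_bytes + page[i:i + block_size]) & 0xFFFFFFFF
            for i in range(0, len(page), block_size)]
```

A retry flow would pick a new random seed, recompute the hashes over the cached base page, and send both seed and hashes upstream.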
Re: [Server-devel] Apache proxy CRCsync & mozilla gsoc project?
On Tuesday 31 March 2009 23:29:23 Martin Langhoff wrote:
> On Mon, Mar 30, 2009 at 8:26 PM, Toby Collett wrote:
> > There is no error checking in the encoding itself, this is assumed to be
> > taken care of in other layers, and we throw in a strong hash on the whole
> > file to make sure this is correct.
>
> Is that right? I thought what Rusty was saying re crcsync is that crc
> is strong, even when rolling?

Well, 'strong' here is relative. In order to keep the checksum length finite and hence encode more blocks we only use a portion of the bits; it's a tradeoff. And so an overall checksum is important, just to verify that the final result is correct.

Cheers, Rusty.
Re: [Server-devel] Apache proxy CRCsync & mozilla gsoc project?
The plan was to include something like an sha1 hash of the original file in the response headers. Then once the file has been decoded you can check to make sure it matches. If not, you can resend the request without the block hash header and get the file the old-fashioned way.

Toby

2009/4/1 Martin Langhoff
> On Tue, Mar 31, 2009 at 8:32 PM, Toby Collett wrote:
> > We are only using 30 bit hashes, so even if it was a perfect hash it is
> > possible you could get a collision. Having said that our collision space
> > is only the single web request, so that should reduce the chance of error.
>
> IIRC, if rsync thinks there was a collision on the weak hash, it rolls
> again through the file with the weak hash and a different seed.
>
> Maybe we could include a differently seeded fingerprint?
>
> Is that what you were thinking?
>
> cheers,
>
> m

--
This email is intended for the addressee only and may contain privileged and/or confidential information
Re: [Server-devel] Apache proxy CRCsync & mozilla gsoc project?
We are only using 30 bit hashes, so even if it was a perfect hash it is possible you could get a collision. Having said that, our collision space is only the single web request, so that should reduce the chance of error.

Toby

2009/4/1 Martin Langhoff
> On Mon, Mar 30, 2009 at 8:26 PM, Toby Collett wrote:
> > There is no error checking in the encoding itself, this is assumed to be
> > taken care of in other layers, and we throw in a strong hash on the whole
> > file to make sure this is correct.
>
> Is that right? I thought what Rusty was saying re crcsync is that crc
> is strong, even when rolling?
>
> cheers,
>
> m
Re: [Server-devel] Apache proxy CRCsync & mozilla gsoc project?
On Tue, Mar 31, 2009 at 8:32 PM, Toby Collett wrote:
> We are only using 30 bit hashes, so even if it was a perfect hash it is
> possible you could get a collision. Having said that our collision space is
> only the single web request, so should reduce chances of error.

IIRC, if rsync thinks there was a collision on the weak hash, it rolls again through the file with the weak hash and a different seed.

Maybe we could include a differently seeded fingerprint?

Is that what you were thinking?

cheers,

m
Re: [Server-devel] Apache proxy CRCsync & mozilla gsoc project?
On Mon, Mar 30, 2009 at 8:26 PM, Toby Collett wrote:
> There is no error checking in the encoding itself, this is assumed to be
> taken care of in other layers, and we throw in a strong hash on the whole
> file to make sure this is correct.

Is that right? I thought what Rusty was saying re crcsync is that crc is strong, even when rolling?

cheers,

m
Re: [Server-devel] Apache proxy CRCsync & mozilla gsoc project?
One thing we need to do is think about the headers carefully, as this is the aspect of the project we could promote as a web standard. There is a large amount of flexibility we could put into this, but as Rusty has said, if there is a way someone can implement a protocol wrong, they will. So we need to keep it as simple as possible.

At the moment we append the block size and hashes for the blocks to the request. The response has a content encoding set, and will need a strong hash added. The number of blocks is fixed at 20 for the moment, with a hash size of 30 bits, which felt like a nice balance between overhead and performance. This keeps our header at around the 128-byte mark when you have base64 encoded the hashes (we don't pad the base64 encoding, so 30 bits -> 5 bytes).

The other aspect we need to standardise is the encoding of the response. Again, at the moment this is a very simplistic binary encoding. The response is encoded in sections, each beginning with either an 'L' to indicate a literal section or a 'B' to indicate a matched block (actually we could make one a default and save a few bytes here). A literal section then has a 4-byte int in network byte order for the size of the literal section, followed by the data. A block section has a single byte indicating the block number.

There is no error checking in the encoding itself; this is assumed to be taken care of in other layers, and we throw in a strong hash on the whole file to make sure this is correct. There is a risk that if we get a corruption of the literal length field we could try to read a very large amount of data; not sure if this is acceptable.

Toby

2009/3/31 Gervase Markham
> On 25/03/09 18:20, Toby Collett wrote:
>> Not a GSoC project, just a project (crcsync is the name at the moment).
>> Initial target is a double proxy server, one each end of the slow link,
>> with dreams of web standards and browser integration following.
>> Seems to me that both projects need the same upstream server extension
>> to be able to send the deltas down. Current state of the apache modules
>> is that all the major pieces are in place but not a lot of testing and no
>> optimisation has been carried out yet.
>
> OK. So maybe the browser integration for this, or at least the groundwork
> for it, is what our SoC project should be. Particularly if you have Apache
> modules that work already.
>
> See
> https://wiki.mozilla.org/Community:SummerOfCode09:WebPagesOverRsync
> for where we are at the moment. We are getting incredible amounts of
> interest in this project - more than all the others combined. It seems like
> an idea whose time has come.
>
> Gerv
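The L/B wire format Toby describes can be sketched directly. This is an illustrative encoder/decoder for the layout as described in the thread ('L' + 4-byte network-order length + data, 'B' + one-byte block number), not code taken from the apache module:

```python
import struct

# Sketch of the described response encoding: literal sections are
# 'L' + 4-byte network-byte-order length + data; matched blocks are
# 'B' + a one-byte block number referring into the client's cached page.

def encode(sections):
    """sections: list of ('L', bytes) or ('B', block_number) tuples."""
    out = b""
    for kind, value in sections:
        if kind == "L":
            out += b"L" + struct.pack("!I", len(value)) + value
        else:
            out += b"B" + struct.pack("!B", value)
    return out

def decode(data, blocks):
    """Rebuild the page, resolving 'B' references against cached `blocks`."""
    page, pos = b"", 0
    while pos < len(data):
        if data[pos:pos + 1] == b"L":
            (size,) = struct.unpack_from("!I", data, pos + 1)
            page += data[pos + 5:pos + 5 + size]
            pos += 5 + size
        else:
            page += blocks[data[pos + 1]]
            pos += 2
    return page
```

Note how the risk Toby mentions shows up here: a corrupted literal length field would make the decoder slurp an arbitrary amount of data, so a real implementation would want to bound `size` against the remaining message length.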
Re: [Server-devel] Apache proxy CRCsync & mozilla gsoc project?
Not a GSoC project, just a project (crcsync is the name at the moment). Initial target is a double proxy server, one each end of the slow link, with dreams of web standards and browser integration following.

Seems to me that both projects need the same upstream server extension to be able to send the deltas down. Current state of the apache modules is that all the major pieces are in place, but not a lot of testing and no optimisation has been carried out yet.

Toby

2009/3/26 Gervase Markham
> On 23/03/09 11:19, Martin Langhoff wrote:
>> Fantastic! I assume the rsync-http now know of the vastly superior
>> karma of crcsync over the 2-hash method of rsync.
>
> Er, not really. After a lunchtime conversation with tridge at LCA where he
> told me about his original project, I just thought it would be cool and put
> it up on our SoC list. So I know very little about what's possible.
>
>> If the Apache mods
>> and Mozilla speak the same protocol, then machines behind
>> bandwidth-constrained links will be in much better shape. I can see
>> 3G-internet providers pushing this too.
>
> Clearly, it's worth making sure everyone's on the same page. I see this as
> a killer app for Firefox on low-bandwidth links; we'll have every smalltown
> and developing-world ISP which still has dial-up customers telling their
> customers "use Firefox to make your Internet faster". They'd install the
> compression server on their web proxy, and voila.
>
> Have I understood correctly? Is Martin coordinating a GSoC project to do an
> apache extension for delta-compression-over-HTTP?
>
> Gerv
Re: [Server-devel] Apache proxy CRCsync
Hi Alex,

I think you are on the right track. There is a third option, which is to add a few extra configuration options to the cache module to make it more aggressive about caching: basically to cache everything except pages marked 'private' (and possibly even those, as long as you can ensure the server is secure).

Another aspect of our use of the cache module at the moment is that the disk cache module has a large portion of code in common with the client module. It would be nice to add a few hooks to this module and reduce our code duplication.

That being said, a module that doesn't require modifications to the apache source is an advantage; however, I think header hacking would be the only way to achieve this, so we probably can't get away with it.

Martin's comments about the garbage collection (which just arrived) are an important consideration, otherwise the cache could get out of control quite quickly.

Toby
Re: [Server-devel] Apache proxy CRCsync
On Mon, Mar 23, 2009 at 10:05 PM, Alex Wulms wrote:
> What are your thoughts on this subject?

I'm not Toby, but I do have some notes from the chat we had with Rusty back in January. The idea at the time was that:

- The 'normal' caching proxy would cache things that have good caching headers. Our crcsync smarts are not needed there. That cache has its own garbage collection logic, based on cache headers.

- The crcsync proxy wants to cache "the rest" -- the formally uncacheable content, perhaps taking some decision by content type (prefer html, xml, text mimetypes; ignore others?), and its garbage collector has a very different logic (something LRU-ish?).

That's a big advantage of this code being in-process with a normal caching proxy: it can ignore the stuff that the caching proxy is handling, and help with the requests that "aren't cacheable" according to their server (most requests these days, unfortunately)...

cheers,

m
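The "LRU-ish" garbage collection sketched above might look like this in miniature; the class name and the entry-count budget are invented for illustration (a real module would budget bytes on disk, not entry counts):

```python
from collections import OrderedDict

# Illustrative LRU store for base pages of formally "uncacheable" responses:
# the crcsync proxy keeps them only as delta bases, so it is free to evict
# the least recently used entry whenever its budget is exceeded.

class BasePageStore:
    def __init__(self, max_entries=1000):
        self.max_entries = max_entries
        self.pages = OrderedDict()

    def get(self, url):
        page = self.pages.get(url)
        if page is not None:
            self.pages.move_to_end(url)  # mark as recently used
        return page

    def put(self, url, body):
        self.pages[url] = body
        self.pages.move_to_end(url)
        while len(self.pages) > self.max_entries:
            self.pages.popitem(last=False)  # evict least recently used
```

Unlike the header-driven expiry of the normal cache, nothing here ever goes stale in a correctness sense: an evicted base page just means the next request is served in full instead of as a delta.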
Re: [Server-devel] Apache proxy CRCsync
Hi Toby,

I did not have much time last week to work on the project but want to continue again this week.

I have been thinking about the integration between the standard cache module of apache and the crccache_client cache-handler module. At this moment, the cache module unfortunately does not invoke crccache_client for most dynamic pages; the web/application servers indicate that those pages are not cacheable, either by setting an expiry time in the past, or by setting appropriate cache-control headers, or a combination of the two. And the cache module respects that, as a good http citizen.

But the whole idea of crccache is that those pages should be stored by crccache_client anyway, but get refetched and then delta'd by the crccache_client/crccache_server chain on the next request. So one way or another, crccache_client/crccache_server should trick the cache module into caching those dynamic pages.

I see two potential ways to make this happen:

Option 1) crccache_client (or crccache_server?) modifies the cache-related response headers before returning the response back to the cache module. It would modify the headers in such a way that the cache module would decide that the page must be cached but revalidated at the next request. This would require no modifications to the cache module, but I do consider it a not-so-clean hack, because we would have to reverse engineer the cache module to understand when to modify the headers and when not to. Which is obviously fragile, because future enhancements to the cache module could potentially break such logic.

Option 2) We introduce some new header(s) that crccache can inject in the response to indicate to the cache module that the pages will be cached by a delta/crcsync-aware cache handler. And then we adapt the cache module itself to understand this new header and to cache normally-not-cacheable pages if this header is present (and send them for revalidation to the crccache handler upon next request).
I see this as a cleaner solution. Though, before immediately starting to implement it, I believe it should be analysed in a little more detail. Especially with respect to the future, when crccache will talk to some servers that are crcsync-aware and can directly handle the encoding themselves, while crccache will at the same time also still talk to many current-gen servers that do not know this http extension.

What are your thoughts on this subject?

Thanks and kind regards,
Alex

Op maandag 16 maart 2009, schreef Toby Collett:
> Great to hear you got it running, unfortunately I only have about a two
> week head start on you with regard to the apache front, so I am sure lots
> of things will get neater as we go along.
>
> 2009/3/16 Alex Wulms
> > Hi Toby,
> >
> > I managed to get it working on my PC under suse 11.1 with apache 2.2.10.
> >
> > When I configured a dedicated debug log per virtual server, I noticed
> > that the crccache_client and crccache_server modules were both invoked
> > in both virtual servers. Judging from the error log you sent me, that is
> > also the case on your server.
> >
> > I have made following changes to fix the situation:
> >
> > 1) Move the 'CacheEnable crccache_client' directive (for the 'default'
> > virtual server) inside the tag. Apparently it is applied globally
> > as long as it is outside the tag, regardless of the config file in
> > which it appears.
>
> Seems like a sensible change.
>
> > 2) Introduce a new directive 'CRCCacheServer on'.
> > This directive is checked by mod_crccache_server in the
> > crccache_server_header_parser_handler.
> > It is specified in the tag of the upstream_proxy of the virtual
> > server.
> > Apparently modules get loaded globally and functions like
> > the ..._header_parser_handler get invoked for each virtual server, so
> > they must check themselves if they should be enabled or disabled in a
> > given virtual server.
> > I found this through google, which pointed me to a forum where somebody
> > else had faced a similar problem.
>
> Makes sense
>
> > I also realize why I only found cached files
> > under /var/cache/apache2/mod_crccache_server and not under ..._client.
> > It is because the crccache_client.conf and crccache_server.conf files
> > both use the parameter CacheRoot to store the cache directory. These
> > parameters are apparently also global. The fact that they are in two
> > different config files does not automagically store them in a module
> > specific namespace. So I have renamed the parameters to differentiate
> > between the client and the server module.
>
> Actually only the client end should need the CacheRoot at all; the server
> side doesn't need caching at all. You could configure a standard apache
> cache if you wanted, but it probably won't gain much.
>
> > I have also noticed that, although the server module reads these
> > parameters, they actually don't get used by
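Putting Alex's fixes together, the configuration might look roughly like the sketch below. Apart from `CacheEnable crccache_client`, `CacheRoot` and `CRCCacheServer` (which are named in the thread), every directive name and value here is a guess for illustration, including the renamed client cache-root parameter:

```apache
# Hypothetical sketch only -- not the real mod_crccache configuration.

# Client-side proxy: delta-decodes replies against its local base-page cache.
# CacheEnable now sits inside the VirtualHost so it no longer applies globally.
<VirtualHost *:3128>
    CacheEnable crccache_client /
    # renamed from the global CacheRoot; actual directive name is a guess
    CRCClientCacheRoot /var/cache/apache2/mod_crccache_client
</VirtualHost>

# Upstream proxy: crcsync-encodes responses; per Toby, it needs no cache
# of its own, so only the enable flag is set.
<VirtualHost *:8080>
    CRCCacheServer on
</VirtualHost>
```

The key point from the thread is reflected here: module directives are global unless scoped inside a VirtualHost, so both the enable directives and the cache-root parameters must be kept per-virtual-host (and per-module) to stop the client and server halves trampling each other.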
[Server-devel] Apache proxy CRCsync & mozilla gsoc project?
Hi,

Tridge just cc'd me on a GSoC rsync-http Mozilla project; given that Martin is coordinating an Apache proxy plugin, I thought I'd send a big inclusive mail to make sure we all know about each other!

My involvement: a crcsync module in CCAN which can be used as a (simplified) librsync.

Cheers!
Rusty.
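For anyone joining the thread, the core idea behind crcsync/librsync-style delta transfer can be sketched in a few lines. This is an illustration only, not the CCAN crcsync API: the function names are mine, `zlib.crc32` stands in for the real checksum, and the hash is recomputed per window rather than rolled incrementally as a real implementation would do.

```python
import zlib

BLOCK = 4  # toy block size; real implementations use much larger blocks

def block_hashes(data: bytes, block: int = BLOCK) -> dict:
    """Client side: hash each fixed-size block of the cached page."""
    return {zlib.crc32(data[i:i + block]): i
            for i in range(0, len(data) - block + 1, block)}

def delta_encode(new: bytes, hashes: dict, block: int = BLOCK) -> list:
    """Server side: slide over the new page; emit ('copy', offset) for
    blocks the client already holds, ('lit', byte) otherwise."""
    ops, i = [], 0
    while i < len(new):
        h = zlib.crc32(new[i:i + block]) if i + block <= len(new) else None
        if h is not None and h in hashes:
            ops.append(('copy', hashes[h]))
            i += block
        else:
            ops.append(('lit', new[i:i + 1]))
            i += 1
    return ops

def delta_decode(ops: list, old: bytes, block: int = BLOCK) -> bytes:
    """Client side: rebuild the new page from the ops and the cached page."""
    out = bytearray()
    for kind, arg in ops:
        out += old[arg:arg + block] if kind == 'copy' else arg
    return bytes(out)
```

Note that a false CRC match in `delta_encode` would silently corrupt the rebuilt page, which is exactly the collision/seed/hash-strength trade-off debated earlier in this thread.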
Re: [Server-devel] Apache proxy CRCsync & mozilla gsoc project?
On Mon, Mar 23, 2009 at 7:29 AM, Rusty Russell wrote:
> Tridge just cc'd me on a GSoC rsync-http Mozilla project; given that
> Martin is coordinating an apache proxy plugin, I thought I'd send a big
> inclusive mail to make sure we all know about each other!
>
> My involvement: a crcsync module in CCAN which can be used as a
> (simplified) librsync.

Fantastic! I assume the rsync-http people now know of the vastly superior karma of crcsync over the 2-hash method of rsync.

If the Apache mods and Mozilla speak the same protocol, then machines behind bandwidth-constrained links will be in much better shape. I can see 3G-internet providers pushing this too.

Also cc'ing Jim Gettys -- our long-held hope is that the resulting extension to the HTTP protocol is something that can be folded into a future HTTP spec.

Pushing buttons to create http-crcs...@lists.laptop.org to serve as a coordination point.

cheers,
martin

--
 martin.langh...@gmail.com
 mar...@laptop.org -- School Server Architect
 - ask interesting questions
 - don't get distracted with shiny stuff - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff
Re: [Server-devel] Apache proxy CRCsync
Great to hear you got it running. Unfortunately I only have about a two week head start on you on the Apache front, so I am sure lots of things will get neater as we go along.

2009/3/16 Alex Wulms
> Hi Toby,
>
> I managed to get it working on my PC under SUSE 11.1 with Apache 2.2.10.
>
> When I configured a dedicated debug log per virtual server, I noticed
> that the crccache_client and crccache_server modules were both invoked
> in both virtual servers. Judging from the error log you sent me, that
> is also the case on your server.
>
> I have made the following changes to fix the situation:
>
> 1) Move the 'CacheEnable crccache_client' directive (for the 'default'
> virtual server) inside the <VirtualHost> tag. Apparently it is applied
> globally as long as it is outside that tag, regardless of the config
> file in which it appears.

Seems like a sensible change.

> 2) Introduce a new directive 'CRCCacheServer on'. This directive is
> checked by mod_crccache_server in the
> crccache_server_header_parser_handler. It is specified in the
> <VirtualHost> tag of the upstream_proxy virtual server. Apparently
> modules get loaded globally, and functions like the
> ..._header_parser_handler get invoked for each virtual server, so they
> must check themselves whether they should be enabled or disabled in a
> given virtual server. I found this through Google, which pointed me to
> a forum where somebody else had faced a similar problem.

Makes sense.

> I also realize why I only found cached files under
> /var/cache/apache2/mod_crccache_server and not under ..._client. It is
> because the crccache_client.conf and crccache_server.conf files both
> use the parameter CacheRoot for the cache directory. These parameters
> are apparently also global; the fact that they are in two different
> config files does not automagically put them in a module-specific
> namespace. So I have renamed the parameters to differentiate between
> the client and the server module.

Actually only the client end should need the CacheRoot at all; the server side doesn't need caching at all. You could configure a standard Apache cache if you wanted, but it probably won't gain much.

> I have also noticed that, although the server module reads these
> parameters, they actually don't get used by the current code. Are they
> there due to copy&paste reasons or are they already there for future
> enhancements, in order to cache stuff temporarily on the server side?

Just copy and paste I guess; I think I left them there so I have something to base other parameters on if we need them server side.

> I have pushed my changes to the repository. Please review them. I'm
> still new to Apache development so I might have misinterpreted some
> things.
>
> Thanks and kind regards,
> Alex

--
This email is intended for the addressee only and may contain privileged and/or confidential information
Re: [Server-devel] Apache proxy CRCsync
Hi Toby,

I managed to get it working on my PC under SUSE 11.1 with Apache 2.2.10.

When I configured a dedicated debug log per virtual server, I noticed that the crccache_client and crccache_server modules were both invoked in both virtual servers. Judging from the error log you sent me, that is also the case on your server.

I have made the following changes to fix the situation:

1) Move the 'CacheEnable crccache_client' directive (for the 'default' virtual server) inside the <VirtualHost> tag. Apparently it is applied globally as long as it is outside that tag, regardless of the config file in which it appears.

2) Introduce a new directive 'CRCCacheServer on'. This directive is checked by mod_crccache_server in the crccache_server_header_parser_handler. It is specified in the <VirtualHost> tag of the upstream_proxy virtual server. Apparently modules get loaded globally, and functions like the ..._header_parser_handler get invoked for each virtual server, so they must check themselves whether they should be enabled or disabled in a given virtual server. I found this through Google, which pointed me to a forum where somebody else had faced a similar problem.

I also realize why I only found cached files under /var/cache/apache2/mod_crccache_server and not under ..._client. It is because the crccache_client.conf and crccache_server.conf files both use the parameter CacheRoot for the cache directory. These parameters are apparently also global; the fact that they are in two different config files does not automagically put them in a module-specific namespace. So I have renamed the parameters to differentiate between the client and the server module.

I have also noticed that, although the server module reads these parameters, they actually don't get used by the current code. Are they there due to copy&paste reasons or are they already there for future enhancements, in order to cache stuff temporarily on the server side?

I have pushed my changes to the repository. Please review them. I'm still new to Apache development so I might have misinterpreted some things.

Thanks and kind regards,
Alex

On Sunday 15 March 2009, Toby Collett wrote:
> Not much time to work on crcsync this weekend, but I have enabled block
> replacement, so in theory the latest version in git should be able to
> be used to serve web pages.
>
> Toby
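Putting the changes Alex describes together, the two virtual hosts might be configured along these lines. This is an untested sketch: 'CacheEnable crccache_client' and 'CRCCacheServer on' come from the mail itself, but the renamed cache-root directive name, the ports, and the layout are my guesses, not taken from the module source.

```apache
# crccache_client.conf -- the downstream (client-side) proxy
<VirtualHost *:8080>
    # Must live inside the <VirtualHost> tag; outside it, the
    # directive applies globally to every virtual server.
    CacheEnable crccache_client /
    # Renamed from the global CacheRoot so it no longer clashes with
    # the server module's parameter (directive name is a guess).
    CRCClientCacheRoot /var/cache/apache2/mod_crccache_client
</VirtualHost>

# crccache_server.conf -- the upstream_proxy (server-side) virtual host
<VirtualHost *:8081>
    # The module is loaded globally, so this new directive tells it
    # to activate for this virtual server only.
    CRCCacheServer on
</VirtualHost>
```

Per Toby's remark, only the client side actually needs a cache root; the server module currently reads but never uses its copy of the parameter.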
Re: [Server-devel] Apache proxy CRCsync
Not much time to work on crcsync this weekend, but I have enabled block replacement, so in theory the latest version in git should be able to be used to serve web pages.

Toby
[Server-devel] Apache proxy CRCsync
This is a brief email to mention that I've set up a few skeleton things related to the crcsync + apache proxy push.

There is an unofficial (but apparently well maintained and popular) git mirror of the apache sources at http://jukka.zitting.name/git/ - so I cloned the httpd.git repo, and used it as the base for a new project repository at repo.or.cz. So we now have:

 - GIT repository: http://repo.or.cz/w/httpd-crcsyncproxy.git
 - Project 'homepage': http://wiki.laptop.org/go/Apache_Proxy_CRCsync - I
   admit it's horrid, if anyone wants to expand/improve...

cheers,
martin

--
 martin.langh...@gmail.com
 mar...@laptop.org -- School Server Architect
 - ask interesting questions
 - don't get distracted with shiny stuff - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff