#11813: Stale caches with trac and transparent proxies
----------------------------+-----------------------------------------------
   Reporter:  vbraun        |          Owner:  mvngu, schilly
       Type:  defect        |         Status:  new           
   Priority:  major         |      Milestone:  sage-4.7.2    
  Component:  website/wiki  |       Keywords:                
Work_issues:                |       Upstream:  N/A           
   Reviewer:                |         Author:                
     Merged:                |   Dependencies:                
----------------------------+-----------------------------------------------

New description:

 Many sites run transparent web proxies. That should be fine, but Simon
 King and I both recently ran into a bug where an attempt to download a
 patch from trac returned an old version of the patch. Needless to say,
 this is very dangerous for development.

 To reproduce, you need to have a transparent proxy in front of you, and
 then
   1. Upload a patch to trac
   2. Download the patch (the proxy will cache it)
   3. Upload a new version of the patch under the same name
   4. Download the patch again - under some circumstances the old version
 of the patch is served by the (not so) transparent proxy.

 This just happened to me with `trac11115-cached_cython.patch`. If I
 download it from boxen (without a proxy), I receive the following HTTP
 headers:
 {{{
 vbraun@boxen:~$ wget -O- -S http://trac.sagemath.org/sage_trac/raw-attachment/ticket/11115/trac11115-cached_cython.patch | md5sum
 --05:39:42--  http://trac.sagemath.org/sage_trac/raw-attachment/ticket/11115/trac11115-cached_cython.patch
            => `-'
 Resolving trac.sagemath.org... 128.208.160.197
 Connecting to trac.sagemath.org|128.208.160.197|:80... connected.
 HTTP request sent, awaiting response...
   HTTP/1.1 200 Ok
   Date: Sun, 18 Sep 2011 12:39:42 GMT
   Server: Apache/2.2.8 (Ubuntu) DAV/2 SVN/1.5.1 mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.8 OpenSSL/0.9.8g mod_wsgi/2.0
   ETag: W/"anonymous/Sat, 17 Sep 2011 21:06:12 GMT/False"
   Content-Disposition: attachment
   Content-Length: 151548
   Last-Modified: Sat, 17 Sep 2011 21:06:12 GMT
   Keep-Alive: timeout=15, max=1000
   Connection: Keep-Alive
   Content-Type: text/x-diff; charset=iso-8859-15
 Length: 151,548 (148K) [text/x-diff]

 100%[=============================================================================>] 151,548       --.--K/s

 05:39:42 (161.15 MB/s) - `-' saved [151548/151548]

 0dc42d7f8d3ae270eb65927ed942ad24  -
 }}}
 This is the correct patch. But behind my proxy, I receive a stale copy:
 {{{
 wget -O- -S http://trac.sagemath.org/sage_trac/raw-attachment/ticket/11115/trac11115-cached_cython.patch | md5sum
 --2011-09-18 13:37:47--  http://trac.sagemath.org/sage_trac/raw-attachment/ticket/11115/trac11115-cached_cython.patch
 Resolving trac.sagemath.org... 128.208.160.197
 Connecting to trac.sagemath.org|128.208.160.197|:80... connected.
 HTTP request sent, awaiting response...
   HTTP/1.0 200 OK
   Date: Sat, 17 Sep 2011 20:37:09 GMT
   Server: Apache/2.2.8 (Ubuntu) DAV/2 SVN/1.5.1 mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.8 OpenSSL/0.9.8g mod_wsgi/2.0
   ETag: W/"anonymous/Thu, 26 May 2011 07:16:22 GMT/False"
   Content-Disposition: attachment
   Content-Length: 151609
   Last-Modified: Thu, 26 May 2011 07:16:22 GMT
   Content-Type: text/x-diff; charset=iso-8859-15
   Age: 57638
   X-Cache: HIT from fw.stp.dias.ie
   X-Cache-Lookup: HIT from fw.stp.dias.ie:3128
   Via: 1.1 fw.stp.dias.ie:3128 (squid/2.7.STABLE9)
   Connection: keep-alive
 Length: 151609 (148K) [text/x-diff]
 Saving to: “STDOUT”

 100%[============================================================>] 151,609     --.-K/s   in 0.002s

 2011-09-18 13:37:47 (77.8 MB/s) - written to stdout [151609/151609]

 f88ca8ad9090aeacb6dc0c726dcc76b5  -
 }}}
 HTTP provides the ETag header to control cache freshness. The proxy
 (squid/2.7.STABLE9) should have checked with the trac server to see if the
 cached ETag `W/"anonymous/Thu, 26 May 2011 07:16:22 GMT/False"` is still
 up-to-date. If the resource were still up to date the trac server would
 reply `HTTP 304 Not Modified`, but since the ETag changed the trac server
 should reply with the new version of the patch. I don't have access to the
 server logs so I can't say what happened for sure, but something is
 broken.
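
 The revalidation step described here can be sketched in a few lines of
 Python. This is only an illustration of the expected exchange, not trac's
 or squid's actual code: the function name `origin_reply` is hypothetical,
 and the comparison is simplified to a single entity tag (no tag lists, no
 `*`). The two ETag values are copied from the wget transcripts above.

```python
def origin_reply(if_none_match, current_etag):
    """Return the status code an origin server would send for a conditional GET.

    Simplified sketch: compares one entity tag and ignores lists and "*".
    """
    if if_none_match == current_etag:
        return 304  # Not Modified: the cached copy may be reused
    return 200      # the full, current representation is sent instead


# ETag values copied from the two wget transcripts above.
cached_etag = 'W/"anonymous/Thu, 26 May 2011 07:16:22 GMT/False"'
current_etag = 'W/"anonymous/Sat, 17 Sep 2011 21:06:12 GMT/False"'

print(origin_reply(cached_etag, current_etag))   # 200: the proxy's copy is stale
print(origin_reply(current_etag, current_etag))  # 304: the copy is still current
```

 Had the proxy sent the cached tag back to the server, the first case would
 have applied and the new patch would have been delivered.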

 A workaround is to set `Pragma: no-cache` in the client request (i.e.
 use `wget --no-cache`), but it's easy to forget to do that.
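
 For scripted downloads, the same workaround can be sketched with the
 Python standard library. The helper names (`uncached_request`, `md5_hex`,
 `fetch_md5`) are made up for this example. It sends the `Pragma: no-cache`
 header mentioned above plus its HTTP/1.1 counterpart `Cache-Control:
 no-cache`, and it assumes the proxy honours them.

```python
import hashlib
import urllib.request


def uncached_request(url):
    """Build a GET request that asks caches on the path to revalidate."""
    return urllib.request.Request(
        url,
        headers={
            "Pragma": "no-cache",         # understood by HTTP/1.0 caches
            "Cache-Control": "no-cache",  # understood by HTTP/1.1 caches
        },
    )


def md5_hex(data):
    """Hex MD5 digest of the downloaded bytes, like piping into md5sum."""
    return hashlib.md5(data).hexdigest()


def fetch_md5(url):
    """Download url with the no-cache request headers and return its MD5."""
    with urllib.request.urlopen(uncached_request(url)) as response:
        return md5_hex(response.read())
```

 Calling `fetch_md5` on the raw-attachment URL from behind the proxy and
 comparing the result with the checksum obtained on boxen would show
 whether the proxy still serves the stale copy.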

 Irrespective of who exactly is at fault, we should configure the trac
 server to never allow caching of the patches, since their integrity is
 crucial for us and client-side caching doesn't really buy us much. To
 that end, I propose configuring Apache to add the following headers to
 all resources under `/sage_trac/raw-attachment`:
 {{{
 Cache-Control: no-cache
 Expires: Thu, 01 Jan 1970 00:00:00 GMT
 }}}
 This covers both the HTTP/1.0 (`Expires`) and the HTTP/1.1
 (`Cache-Control`) cache control mechanisms.
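
 The proposal itself is an Apache configuration change. Purely as an
 illustration of the intended effect, the same policy can be sketched as a
 WSGI middleware (the `Server` header above shows `mod_wsgi`). This is a
 swapped-in technique, not the proposed fix. The class name
 `NoCacheRawAttachments` is hypothetical, and the default prefix assumes
 the `raw-attachment` spelling seen in the download URLs.

```python
NO_CACHE_HEADERS = [
    ("Cache-Control", "no-cache"),                  # HTTP/1.1 caches
    ("Expires", "Thu, 01 Jan 1970 00:00:00 GMT"),   # HTTP/1.0 caches
]


class NoCacheRawAttachments:
    """WSGI middleware that adds the proposed headers to raw attachments."""

    def __init__(self, app, prefix="/sage_trac/raw-attachment/"):
        self.app = app
        self.prefix = prefix  # assumed URL prefix of the raw attachments

    def __call__(self, environ, start_response):
        path = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
        if not path.startswith(self.prefix):
            # Everything else passes through untouched.
            return self.app(environ, start_response)

        def patched_start_response(status, headers, exc_info=None):
            # Drop any existing values of the two headers, then append ours.
            replaced = {name.lower() for name, _ in NO_CACHE_HEADERS}
            kept = [(n, v) for n, v in headers if n.lower() not in replaced]
            return start_response(status, kept + NO_CACHE_HEADERS, exc_info)

        return self.app(environ, patched_start_response)
```

 Wrapping the application object once, e.g. `application =
 NoCacheRawAttachments(application)`, would be enough. The HTML pages keep
 whatever headers trac already sends.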

 See also upstream bug http://trac.edgewall.org/ticket/6367

--

Comment(by vbraun):

 Replying to [comment:5 leif]:
 > But the purpose of `~/.wgetrc` in this case would be to ''always''
 > disable caching (by default), such that it wouldn't matter whether you're
 > behind a proxy or not (provided the proxy isn't broken and doesn't refuse
 > to bypass caching).

 So you are suggesting that every Sage developer put a particular entry in
 `~/.wgetrc` on every one of his laptops, just to be safe in case he ever
 leaves the house with one, when we could just work around it with a few
 lines in the Apache `httpd.conf`.

 > Well, humans are more likely to read the comments on a ticket, so they
 > actually see that a patch was re-uploaded / modified (though they perhaps
 > don't look at the file modification times of the downloaded files, which
 > one IMHO should do).

 The HTML version does not get erroneously cached; the bug manifests only
 with the raw attachment. Trac serves the HTML version with
 `Cache-control: must-revalidate`:
 {{{
 vbraun@boxen:~$ wget -O- -S http://trac.sagemath.org/sage_trac/attachment/ticket/11115/trac11115-cached_cython.patch | md5sum
 --10:49:49--  http://trac.sagemath.org/sage_trac/attachment/ticket/11115/trac11115-cached_cython.patch
            => `-'
 Resolving trac.sagemath.org... 128.208.160.197
 Connecting to trac.sagemath.org|128.208.160.197|:80... connected.
 HTTP request sent, awaiting response...
   HTTP/1.1 200 Ok
   Date: Sun, 18 Sep 2011 17:49:49 GMT
   Server: Apache/2.2.8 (Ubuntu) DAV/2 SVN/1.5.1 mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.8 OpenSSL/0.9.8g mod_wsgi/2.0
   ETag: W/"anonymous/Sat, 17 Sep 2011 21:06:12 GMT/False"
   Cache-control: must-revalidate
   Set-Cookie: trac_form_token=6ee2168bc6a1bd4e46d5ac03; Path=/sage_trac
   Set-Cookie: trac_session=36b6b4eaaf22a880e1451a6a; expires=Sat, 17-Dec-2011 17:49:53 GMT; Path=/sage_trac
   Content-Length: 750922
   Vary: Accept-Encoding
   Keep-Alive: timeout=15, max=1000
   Connection: Keep-Alive
   Content-Type: text/html;charset=utf-8
 Length: 750,922 (733K) [text/html]

 100%[===========================================================================================================================>] 750,922       --.--K/s

 10:49:53 (181.60 MB/s) - `-' saved [750922/750922]

 0ee8396915e5be21797f03b88cacd53c  -
 }}}
 The `Vary: Accept-Encoding` header is very wrong, though. Looking at the
 trac trac (:-), this seems to be a known bug:
 http://trac.edgewall.org/ticket/6367. That ticket says: "Also note that
 Request.send_file() function does not send a Cache-Control header. That
 should be OK if Vary * is sent". This seems to be the issue: raw
 attachments have neither a `Cache-control` nor a `Vary: *` header.
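
 That observation can be checked mechanically. The sketch below is an
 illustration only: the function name `has_cache_policy` is hypothetical,
 and treating `Expires` as a third acceptable signal is an assumption of
 this example, not something the upstream ticket says. The two header sets
 are abridged from the wget transcripts in this ticket.

```python
def has_cache_policy(headers):
    """True if the response tells caches explicitly how to treat it."""
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    return (
        "cache-control" in lowered
        or "expires" in lowered          # assumption: also counts as a policy
        or lowered.get("vary") == "*"
    )


# Abridged from the raw-attachment response in the ticket description.
raw_attachment = {
    "ETag": 'W/"anonymous/Sat, 17 Sep 2011 21:06:12 GMT/False"',
    "Content-Disposition": "attachment",
    "Last-Modified": "Sat, 17 Sep 2011 21:06:12 GMT",
    "Content-Type": "text/x-diff; charset=iso-8859-15",
}

# Abridged from the HTML attachment page response shown above.
html_page = {
    "ETag": 'W/"anonymous/Sat, 17 Sep 2011 21:06:12 GMT/False"',
    "Cache-control": "must-revalidate",
    "Vary": "Accept-Encoding",
    "Content-Type": "text/html;charset=utf-8",
}

print(has_cache_policy(raw_attachment))  # False
print(has_cache_policy(html_page))       # True
```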

 And I don't manually check that a downloaded file has the right time
 stamp. I have a computer to do menial tasks for me, not the other way
 round :-)

-- 
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/11813#comment:7>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB
