#11813: Stale caches with trac and transparent proxies
----------------------------+-----------------------------------------------
Reporter: vbraun | Owner: mvngu, schilly
Type: defect | Status: new
Priority: major | Milestone: sage-4.7.2
Component: website/wiki | Keywords:
Work_issues: | Upstream: N/A
Reviewer: | Author:
Merged: | Dependencies:
----------------------------+-----------------------------------------------
New description:
Many sites run transparent web proxies. That should be fine, but Simon
King and I both recently ran into a bug where an attempt to download a
patch from trac returned an old version of the patch. Needless to say,
this is very dangerous for development.

To reproduce, you need to have a transparent proxy in front of you, and
then
1. Upload a patch to trac
2. Download the patch (the proxy will cache it)
3. Upload a new version of the patch under the same name
4. Download the patch again. Under some circumstances the (not so)
transparent proxy serves the old version of the patch.

This just happened to me with `trac11115-cached_cython.patch`. If I
download it from boxen (without a proxy), I receive the following HTTP
headers:
{{{
vbraun@boxen:~$ wget -O- -S http://trac.sagemath.org/sage_trac/raw-attachment/ticket/11115/trac11115-cached_cython.patch | md5sum
--05:39:42-- http://trac.sagemath.org/sage_trac/raw-attachment/ticket/11115/trac11115-cached_cython.patch
=> `-'
Resolving trac.sagemath.org... 128.208.160.197
Connecting to trac.sagemath.org|128.208.160.197|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 Ok
Date: Sun, 18 Sep 2011 12:39:42 GMT
Server: Apache/2.2.8 (Ubuntu) DAV/2 SVN/1.5.1 mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.8 OpenSSL/0.9.8g mod_wsgi/2.0
ETag: W/"anonymous/Sat, 17 Sep 2011 21:06:12 GMT/False"
Content-Disposition: attachment
Content-Length: 151548
Last-Modified: Sat, 17 Sep 2011 21:06:12 GMT
Keep-Alive: timeout=15, max=1000
Connection: Keep-Alive
Content-Type: text/x-diff; charset=iso-8859-15
Length: 151,548 (148K) [text/x-diff]
100%[=============================================================================>]
151,548 --.--K/s
05:39:42 (161.15 MB/s) - `-' saved [151548/151548]
0dc42d7f8d3ae270eb65927ed942ad24 -
}}}
This is the correct patch. But behind my proxy, I receive a stale copy:
{{{
wget -O- -S http://trac.sagemath.org/sage_trac/raw-attachment/ticket/11115/trac11115-cached_cython.patch | md5sum
--2011-09-18 13:37:47-- http://trac.sagemath.org/sage_trac/raw-attachment/ticket/11115/trac11115-cached_cython.patch
Resolving trac.sagemath.org... 128.208.160.197
Connecting to trac.sagemath.org|128.208.160.197|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.0 200 OK
Date: Sat, 17 Sep 2011 20:37:09 GMT
Server: Apache/2.2.8 (Ubuntu) DAV/2 SVN/1.5.1 mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.8 OpenSSL/0.9.8g mod_wsgi/2.0
ETag: W/"anonymous/Thu, 26 May 2011 07:16:22 GMT/False"
Content-Disposition: attachment
Content-Length: 151609
Last-Modified: Thu, 26 May 2011 07:16:22 GMT
Content-Type: text/x-diff; charset=iso-8859-15
Age: 57638
X-Cache: HIT from fw.stp.dias.ie
X-Cache-Lookup: HIT from fw.stp.dias.ie:3128
Via: 1.1 fw.stp.dias.ie:3128 (squid/2.7.STABLE9)
Connection: keep-alive
Length: 151609 (148K) [text/x-diff]
Saving to: “STDOUT”
100%[============================================================>]
151,609 --.-K/s in 0.002s
2011-09-18 13:37:47 (77.8 MB/s) - written to stdout [151609/151609]
f88ca8ad9090aeacb6dc0c726dcc76b5 -
}}}
HTTP provides the ETag header so that caches can validate stored copies.
The proxy (squid/2.7.STABLE9) should have checked with the trac server
whether the cached ETag `W/"anonymous/Thu, 26 May 2011 07:16:22 GMT/False"`
is still current. If the resource were unchanged, the trac server would
reply `HTTP 304 Not Modified`, but since the ETag changed the trac server
should reply with the new version of the patch. I don't have access to the
server logs, so I can't say for sure what happened, but something is
broken.
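
The validation step described above can be sketched in a few lines of Python. This is only a model of the decision the origin server makes for a conditional request, not trac's or squid's actual code; the function names `strip_weak` and `revalidate` are made up for illustration, and the two ETag values are the ones from the wget output above.

```python
def strip_weak(etag):
    """Drop the W/ prefix so weak and strong forms of a tag compare equal."""
    return etag[2:] if etag.startswith("W/") else etag


def revalidate(if_none_match, current_etag):
    """Return the status an origin server would send for a conditional GET.

    if_none_match is the ETag the cache holds; current_etag is the ETag of
    the resource as it exists on the server right now.
    """
    if strip_weak(if_none_match) == strip_weak(current_etag):
        return 304  # Not Modified: the cached copy may be served
    return 200      # changed: the full, new body is sent


stale = 'W/"anonymous/Thu, 26 May 2011 07:16:22 GMT/False"'
fresh = 'W/"anonymous/Sat, 17 Sep 2011 21:06:12 GMT/False"'
print(revalidate(stale, fresh))  # 200
print(revalidate(fresh, fresh))  # 304
```

With the ETags from this ticket, a revalidation request from the proxy would have been answered with the full new patch, which is why the stale hit suggests that no such request was made.
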
A workaround is to set `Pragma: no-cache` in the client request (i.e.
use `wget --no-cache`), but it's easy to forget that.
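
For scripted downloads, the same workaround could look roughly like the following Python sketch. The helper name `no_cache_request` is hypothetical, and sending `Cache-Control: no-cache` alongside `Pragma: no-cache` is an addition here to cover HTTP/1.1 caches as well; whether a given proxy honours either header is a separate question.

```python
import urllib.request


def no_cache_request(url):
    """Build a GET request that asks intermediaries to bypass their cache."""
    return urllib.request.Request(
        url,
        headers={
            "Pragma": "no-cache",         # HTTP/1.0 mechanism
            "Cache-Control": "no-cache",  # HTTP/1.1 mechanism (assumed extra)
        },
    )


req = no_cache_request(
    "http://trac.sagemath.org/sage_trac/raw-attachment/ticket/11115/"
    "trac11115-cached_cython.patch"
)
# urllib.request.urlopen(req) would perform the download; it is left out
# here so that the sketch does not touch the network.
print(sorted(req.header_items()))
```

This has the same weakness as the wget flag: every client has to remember to do it.
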
Irrespective of who is precisely at fault, we should configure the trac
server to never allow caching of the patches, since their integrity is
crucial for us and client-side caching doesn't really buy us much. To
that end, I propose configuring Apache to add the following headers to
all resources under `/sage_trac/raw-attachment`:
{{{
Cache-Control: no-cache
Expires: Thu, 01 Jan 1970 00:00:00 GMT
}}}
hitting both the HTTP/1.0 and 1.1 cache control mechanisms.
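
The proposal is to do this in the Apache configuration. Purely to illustrate the intended effect, here is a minimal sketch of the same rule as a Python WSGI wrapper, under the assumption that the trac instance is reachable through a WSGI entry point. The class name `NoCacheAttachments` and its `prefix` parameter are hypothetical and not part of trac.

```python
NO_CACHE_HEADERS = [
    ("Cache-Control", "no-cache"),                  # HTTP/1.1 caches
    ("Expires", "Thu, 01 Jan 1970 00:00:00 GMT"),   # HTTP/1.0 caches
]


class NoCacheAttachments:
    """WSGI wrapper that forces the no-cache headers on raw attachments."""

    def __init__(self, app, prefix="/sage_trac/raw-attachment"):
        self.app = app
        self.prefix = prefix

    def __call__(self, environ, start_response):
        path = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
        if not path.startswith(self.prefix):
            # Everything else is passed through untouched.
            return self.app(environ, start_response)

        def patched_start_response(status, headers, exc_info=None):
            # Drop any existing values for the headers we are about to set.
            replaced = {name.lower() for name, _ in NO_CACHE_HEADERS}
            kept = [(n, v) for n, v in headers if n.lower() not in replaced]
            return start_response(status, kept + NO_CACHE_HEADERS, exc_info)

        return self.app(environ, patched_start_response)
```

Wrapping the application as `NoCacheAttachments(app)` would leave the HTML pages alone and only touch responses whose path starts with the raw-attachment prefix.
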
See also upstream bug http://trac.edgewall.org/ticket/6367
--
Comment(by vbraun):
Replying to [comment:5 leif]:
> But the purpose of `~/.wgetrc` in this case would be to ''always''
> disable caching (by default), such that it wouldn't matter whether you're
> behind a proxy or not (provided the proxy isn't broken and doesn't refuse
> to bypass caching).

So you are suggesting that every Sage developer put a particular entry in
`~/.wgetrc` on all of his laptops, just to be safe if he ever leaves his
house with one, when we could just work around it with a few lines in the
Apache `httpd.conf`.
> Well, humans are more likely to read the comments on a ticket, so they
> actually see that a patch was re-uploaded / modified (though they perhaps
> don't look at the file modification times of the downloaded files, which
> one IMHO should do).

The HTML version does not get erroneously cached; the bug manifests only
with the raw attachment. Trac dishes out the HTML version with
`Cache-control: must-revalidate`:
{{{
vbraun@boxen:~$ wget -O- -S http://trac.sagemath.org/sage_trac/attachment/ticket/11115/trac11115-cached_cython.patch | md5sum
--10:49:49-- http://trac.sagemath.org/sage_trac/attachment/ticket/11115/trac11115-cached_cython.patch
=> `-'
Resolving trac.sagemath.org... 128.208.160.197
Connecting to trac.sagemath.org|128.208.160.197|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 Ok
Date: Sun, 18 Sep 2011 17:49:49 GMT
Server: Apache/2.2.8 (Ubuntu) DAV/2 SVN/1.5.1 mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.8 OpenSSL/0.9.8g mod_wsgi/2.0
ETag: W/"anonymous/Sat, 17 Sep 2011 21:06:12 GMT/False"
Cache-control: must-revalidate
Set-Cookie: trac_form_token=6ee2168bc6a1bd4e46d5ac03; Path=/sage_trac
Set-Cookie: trac_session=36b6b4eaaf22a880e1451a6a; expires=Sat, 17-Dec-2011 17:49:53 GMT; Path=/sage_trac
Content-Length: 750922
Vary: Accept-Encoding
Keep-Alive: timeout=15, max=1000
Connection: Keep-Alive
Content-Type: text/html;charset=utf-8
Length: 750,922 (733K) [text/html]
100%[===========================================================================================================================>]
750,922 --.--K/s
10:49:53 (181.60 MB/s) - `-' saved [750922/750922]
0ee8396915e5be21797f03b88cacd53c -
}}}
The `Vary: Accept-Encoding` header is very wrong, though. Looking at the
trac trac (:-), this seems to be a known bug:
http://trac.edgewall.org/ticket/6367. That ticket says: "Also note that
Request.send_file() function does not send a Cache-Control header. That
should be OK if Vary * is sent". This seems to be the issue: raw
attachments have neither a `Cache-control` nor a `Vary: *` header.
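
That observation can be checked mechanically. The following Python sketch tests a set of response headers for exactly this condition; the function name `cache_guard_missing` is made up for illustration, and the two header dictionaries are abridged from the wget outputs quoted in this ticket.

```python
def cache_guard_missing(headers):
    """Return True if a response has neither Cache-Control nor `Vary: *`."""
    lowered = {name.lower(): value for name, value in headers.items()}
    has_cache_control = "cache-control" in lowered
    vary_values = [v.strip() for v in lowered.get("vary", "").split(",")]
    return not has_cache_control and "*" not in vary_values


raw_attachment = {
    "ETag": 'W/"anonymous/Sat, 17 Sep 2011 21:06:12 GMT/False"',
    "Content-Disposition": "attachment",
    "Last-Modified": "Sat, 17 Sep 2011 21:06:12 GMT",
    "Content-Type": "text/x-diff; charset=iso-8859-15",
}
html_page = {
    "ETag": 'W/"anonymous/Sat, 17 Sep 2011 21:06:12 GMT/False"',
    "Cache-control": "must-revalidate",
    "Vary": "Accept-Encoding",
    "Content-Type": "text/html;charset=utf-8",
}
print(cache_guard_missing(raw_attachment))  # True
print(cache_guard_missing(html_page))       # False
```
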
And I don't manually check that a downloaded file has the right time
stamp; I have a computer to do menial tasks for me, not the other way
round :-)
--
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/11813#comment:7>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB