Re: mod_cache thundering herd bug
On 19 Apr 2014, at 10:26 PM, Eric Covener cove...@gmail.com wrote:

Graham -- related subject brought up either in Denver or in the bug. It seems that when we serve a stale file while the cache is locked, the Age headers are small instead of large. I got totally lost trying to track down the issue; maybe it makes sense to you? It's almost as if the time of the revalidation is somehow updated early and the delta in the stale cache hits is based off of that.

All thundering herd does is, after letting the first conditional request through, serve stale data (RFC willing) until that conditional request comes back or a specific maximum time is reached, whichever comes first.

The most valuable piece of information in this process is the reason variable, which describes the reason why something wasn't eligible for caching. In httpd v2.4 the X-Cache-Detail header will give this to you; in httpd v2.2 you'll need to log at DEBUG level to get this:

ap_log_rerror(APLOG_MARK, APLOG_DEBUG, 0, r, "cache: %s not cached. Reason: %s", r->unparsed_uri, reason);

The questions to answer are:

- Is there stale content to serve? No stale content, no thundering herd protection.
- If stale content is being deleted, identify why that is. This is likely to be unrelated to thundering herd, but rather in other parts of mod_cache.

Regards,
Graham
--
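The behaviour Graham describes can be sketched as a small decision function. This is only an illustration under stated assumptions: the type and function names (herd_state, herd_decide, the HERD_* values) are hypothetical and are not mod_cache identifiers, and the real module spreads this logic across several hooks. The lock_max_age_sec parameter stands in for the "specific maximum time" (the role CacheLockMaxAge plays in the configuration posted later in this thread).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical outcome names for this sketch; not mod_cache identifiers. */
typedef enum {
    HERD_SERVE_FRESH,    /* cached copy is still fresh */
    HERD_GO_TO_BACKEND,  /* fetch or revalidate; this request goes through */
    HERD_SERVE_STALE     /* a revalidation is in flight; answer from the stale copy */
} herd_action;

typedef struct {
    bool   have_entry;       /* is there any cached copy at all? */
    bool   is_fresh;         /* is that copy still within its freshness lifetime? */
    bool   may_serve_stale;  /* "RFC willing", e.g. no must-revalidate */
    bool   lock_held;        /* has another request already gone to the backend? */
    time_t lock_taken_at;    /* when that request took the lock */
} herd_state;

herd_action herd_decide(const herd_state *s, time_t now, double lock_max_age_sec)
{
    assert(s != NULL);

    /* No stale content, no thundering herd protection. */
    if (!s->have_entry)
        return HERD_GO_TO_BACKEND;

    if (s->is_fresh)
        return HERD_SERVE_FRESH;

    /* Stale entry: serve it only while a sufficiently young lock exists. */
    if (s->lock_held && s->may_serve_stale &&
        difftime(now, s->lock_taken_at) < lock_max_age_sec)
        return HERD_SERVE_STALE;

    /* First request after expiry, or the lock is too old: go to the backend. */
    return HERD_GO_TO_BACKEND;
}
```

In this sketch, a stale entry whose lock was taken 2 seconds ago with a 5 second maximum yields HERD_SERVE_STALE, while the same entry with a 10 second old lock falls through to the backend. If the stale entry has been deleted (have_entry is false), every request reaches the backend, which is the failure mode the second question above asks about.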
Re: mod_cache thundering herd bug
On 21 Apr 2014, at 06:38, Graham Leggett minf...@sharp.fm wrote:

On 19 Apr 2014, at 10:26 PM, Eric Covener cove...@gmail.com wrote:

Graham -- related subject brought up either in Denver or in the bug. It seems that when we serve a stale file while the cache is locked, the Age headers are small instead of large. I got totally lost trying to track down the issue; maybe it makes sense to you? It's almost as if the time of the revalidation is somehow updated early and the delta in the stale cache hits is based off of that.

All thundering herd does is, after letting the first conditional request through, serve stale data (RFC willing) until that conditional request comes back or a specific maximum time is reached, whichever comes first.

The most valuable piece of information in this process is the reason variable, which describes the reason why something wasn't eligible for caching. In httpd v2.4 the X-Cache-Detail header will give this to you; in httpd v2.2 you'll need to log at DEBUG level to get this:

ap_log_rerror(APLOG_MARK, APLOG_DEBUG, 0, r, "cache: %s not cached. Reason: %s", r->unparsed_uri, reason);

The questions to answer are:

- Is there stale content to serve? No stale content, no thundering herd protection.
- If stale content is being deleted, identify why that is. This is likely to be unrelated to thundering herd, but rather in other parts of mod_cache.

Covener - Are you talking about my comments in #16 on the ticket? (https://issues.apache.org/bugzilla/show_bug.cgi?id=50317#c16)

If so, do either you or Graham have thoughts on the Age header getting returned with stale content? In my testing, when stale content is returned, no Age header is set, which appears to be a violation of HTTP 1.1.
Re: mod_cache thundering herd bug
Covener - Are you talking about my comments in #16 on the ticket? (https://issues.apache.org/bugzilla/show_bug.cgi?id=50317#c16)

If so, do either you or Graham have thoughts on the Age header getting returned with stale content? In my testing, when stale content is returned, no Age header is set, which appears to be a violation of HTTP 1.1.

Yes. I think it's not that it's unset, but that the calculation somehow uses the revalidation-in-progress check time as the basis.

--
Eric Covener
cove...@gmail.com
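To make the hypothesis above concrete, here is a minimal sketch of the age calculation from RFC 2616 section 13.2.3. It is not mod_cache code: the stored_times struct and current_age function are hypothetical names, and whether mod_cache actually overwrites the stored timestamps when a revalidation starts is exactly the unconfirmed suspicion in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Timestamps in seconds; field names follow RFC 2616 section 13.2.3. */
typedef struct {
    long date_value;     /* Date header of the stored response */
    long age_value;      /* Age header of the stored response, 0 if absent */
    long request_time;   /* when the cache sent the request to the origin */
    long response_time;  /* when the cache received the response */
} stored_times;

static long max_long(long a, long b) { return a > b ? a : b; }

/* current_age as defined by the RFC 2616 age calculation. */
long current_age(const stored_times *t, long now)
{
    assert(t != NULL);

    long apparent_age           = max_long(0, t->response_time - t->date_value);
    long corrected_received_age = max_long(apparent_age, t->age_value);
    long response_delay         = t->response_time - t->request_time;
    long corrected_initial_age  = corrected_received_age + response_delay;
    long resident_time          = now - t->response_time;

    return corrected_initial_age + resident_time;
}
```

With an entry stored at t=1000 (response received at 1001) and a stale hit at t=1040, the function returns 41. If the three timestamps had instead been reset to 1038, the moment a revalidation began, the same hit would report an age of 2, which would match the "small instead of large" Age values described earlier.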
Re: mod_cache thundering herd bug
On Tue, Apr 8, 2014 at 4:11 PM, Jim Riggs apache-li...@riggs.me wrote:

https://issues.apache.org/bugzilla/show_bug.cgi?id=50317

While we are at ApacheCon, I would love to address this nasty bug with someone familiar with 2.2's mod_cache. Our sites were brought down a few times last year before we finally tracked it down to being this particular bug. I am using a crude backport of the 2.3 patch (r1023398) in 2.2. It works, but I don't know if it is correct. Can someone look at this one with me? We really need to get this fixed in 2.2, because there is NO thundering herd protection at all as things stand right now.

Graham -- related subject brought up either in Denver or in the bug. It seems that when we serve a stale file while the cache is locked, the Age headers are small instead of large. I got totally lost trying to track down the issue; maybe it makes sense to you? It's almost as if the time of the revalidation is somehow updated early and the delta in the stale cache hits is based off of that.

--
Eric Covener
cove...@gmail.com
Re: mod_cache thundering herd bug
r1023398 for 2.2: http://people.apache.org/~covener/patches/httpd-2.2.x-thunder.diff

The remove_url() prevents other threads from serving a stale cached file during refresh of a slow response, but it's unnecessary to have a separate path because the refresh has to deal with 200s already. When the remove_url() was added, there was no thundering herd lock and no ability to serve stale content while one guy was reloading.

covener, mrumph, and I looked at this today at ApacheCon. I updated the bug with some comments and attached this patch.

https://issues.apache.org/bugzilla/show_bug.cgi?id=50317

Hello,

Thank you very much for the patch, but *it doesn't work*. When I run an ab test (/usr/bin/ab -k -c 5 -n 10 http://host/url), the application gets more than one request:

1.1.1.1 - - [14/Apr/2014:14:01:58 +0200] GET /url HTTP/1.0 200 42398 9A68DBA96CED90DC517F7D6302F5A748.gpi-app1 1163 1163
1.1.1.1 - - [14/Apr/2014:14:02:05 +0200] GET /url HTTP/1.0 200 42398 D378685BBD4FB87C63A3A867ABFAFB3E.gpi-app1 2931 2930
1.1.1.1 - - [14/Apr/2014:14:02:05 +0200] GET /url HTTP/1.0 200 42398 8B77A0C68FC6F16E0BA3A89C7A614E1A.gpi-app1 2992 2991
1.1.1.1 - - [14/Apr/2014:14:02:05 +0200] GET /url HTTP/1.0 200 42398 57A48B49FB6C52E28F1FA97DDFCDC0C8.gpi-app1 3007 3006
1.1.1.1 - - [14/Apr/2014:14:02:05 +0200] GET /url HTTP/1.0 200 42398 71573080388181B3C55E88CB4BFAB890.gpi-app1 3051 3051
1.1.1.1 - - [14/Apr/2014:14:02:06 +0200] GET /url HTTP/1.0 200 42398 38DA8533D4F9B4046A2F607071652E94.gpi-app1 1412 1412

Here is more information on how to reproduce it.
*Compilation*

cd /tmp
svn co http://svn.apache.org/repos/asf/httpd/httpd/branches/2.2.x
cd 2.2.x/
svn co http://svn.apache.org/repos/asf/apr/apr/branches/1.4.x srclib/apr
svn co http://svn.apache.org/repos/asf/apr/apr-util/branches/1.4.x srclib/apr-util
./buildconf
./configure --prefix=/etc/httpd --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin \
    --mandir=/usr/share/man --libdir=/usr/lib64 --sysconfdir=/etc/httpd/conf \
    --includedir=/usr/include/httpd --libexecdir=/usr/lib64/httpd/modules --datadir=/var/www \
    --with-installbuilddir=/usr/lib64/httpd/build --with-mpm=prefork --with-apr=/usr --with-apr-util=/usr \
    --enable-suexec --with-suexec --with-suexec-caller=apache --with-suexec-docroot=/var/www \
    --with-suexec-logfile=/var/log/httpd/suexec.log --with-suexec-bin=/usr/sbin/suexec \
    --with-suexec-uidmin=500 --with-suexec-gidmin=100 --enable-pie --with-pcre \
    --enable-mods-shared=all --enable-ssl --with-ssl --enable-proxy --enable-cache --enable-disk-cache \
    --enable-ldap --enable-authnz-ldap --enable-cgid --enable-authn-anon --enable-authn-alias \
    --disable-imagemap
patch -p0 < /root/rpmbuild/SOURCES/httpd-2.2.x-thunder.patch
make
make install

*Configuration*

<VirtualHost host:80>
... ...
## Cache
CacheRoot /tmp/cache
CacheEnable disk /
CacheDisable /static/
CacheMinFileSize 0
CacheMaxFileSize 1048576
CacheDirLevels 2
CacheDirLength 2
CacheLock on
CacheLockPath /tmp/mod_cache-lock
CacheLockMaxAge 5
CacheIgnoreHeaders ETag Set-Cookie
Header unset Expires
Header unset Cache-Control
Header always set Cache-Control max-age=30,stale-while-revalidate=15
</VirtualHost>

Best Regards
Maciej Bogucki
Re: mod_cache thundering herd bug
r1023398 for 2.2: http://people.apache.org/~covener/patches/httpd-2.2.x-thunder.diff

The remove_url() prevents other threads from serving a stale cached file during refresh of a slow response, but it's unnecessary to have a separate path because the refresh has to deal with 200s already. When the remove_url() was added, there was no thundering herd lock and no ability to serve stale content while one guy was reloading.

On Tue, Apr 8, 2014 at 2:11 PM, Jim Riggs apache-li...@riggs.me wrote:

https://issues.apache.org/bugzilla/show_bug.cgi?id=50317

While we are at ApacheCon, I would love to address this nasty bug with someone familiar with 2.2's mod_cache. Our sites were brought down a few times last year before we finally tracked it down to being this particular bug. I am using a crude backport of the 2.3 patch (r1023398) in 2.2. It works, but I don't know if it is correct. Can someone look at this one with me? We really need to get this fixed in 2.2, because there is NO thundering herd protection at all as things stand right now.

- Jim

--
Eric Covener
cove...@gmail.com
Re: mod_cache thundering herd bug
On 9 Apr 2014, at 14:46, Eric Covener cove...@gmail.com wrote:

r1023398 for 2.2: http://people.apache.org/~covener/patches/httpd-2.2.x-thunder.diff

The remove_url() prevents other threads from serving a stale cached file during refresh of a slow response, but it's unnecessary to have a separate path because the refresh has to deal with 200s already. When the remove_url() was added, there was no thundering herd lock and no ability to serve stale content while one guy was reloading.

covener, mrumph, and I looked at this today at ApacheCon. I updated the bug with some comments and attached this patch.

https://issues.apache.org/bugzilla/show_bug.cgi?id=50317
mod_cache thundering herd bug
https://issues.apache.org/bugzilla/show_bug.cgi?id=50317

While we are at ApacheCon, I would love to address this nasty bug with someone familiar with 2.2's mod_cache. Our sites were brought down a few times last year before we finally tracked it down to being this particular bug. I am using a crude backport of the 2.3 patch (r1023398) in 2.2. It works, but I don't know if it is correct. Can someone look at this one with me? We really need to get this fixed in 2.2, because there is NO thundering herd protection at all as things stand right now.

- Jim