Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
On Wed, 13 Sep 2006, Davi Arnaut wrote:

> I'm working on this. You may want to check my proposal at
> http://verdesmares.com/Apache/proposal.txt

Will it be possible to do away with one file for headers and one file for
body in mod_disk_cache with this scheme?

The thing is that I've been pounding seriously on mod_disk_cache to make
it able to sustain rather heavy load on not-so-heavy equipment, and part
of that effort was to wrap headers and body into one file, mainly for the
following reasons:

* Fewer files, fewer open()s (small gain).

* Much easier to purge old entries from the cache (huge gain). Simply list
  all files in the cache, sort by atime and remove the oldest. The old
  way, using htcacheclean, took ages and had less useful removal criteria.

* No synchronisation issues between the header file and body file: unlink
  one file and the entry is gone.

That's only one of many changes made, but I found it crucial for an
architecture that stays consistent without relying on locks. It made it
rather easy to implement things like serving files from the cache while
they are still being cached, reusing expired cache entries when the
originating file is found to be unmodified, and so on. But the largest
gain is still the cache cleaning process.

The code is used in production and seems stable. However, I haven't had
any response to the first (trivial) patch I sent, so I don't know if
there's any interest in this.

/Nikke
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se      | [EMAIL PROTECTED]
---
 Does the Little Mermaid wear an algebra?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
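[Editor's note: the atime-based purge described above can be sketched in
plain POSIX C. This is a minimal illustration, not code from the patch;
the function name, flat-directory assumption, and size handling are all
invented for the sketch.]

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

struct entry {
    char path[4096];
    time_t atime;
    long long size;
};

static int by_atime(const void *a, const void *b)
{
    const struct entry *ea = a, *eb = b;
    return (ea->atime > eb->atime) - (ea->atime < eb->atime);
}

/* Remove the least recently used files in 'dir' until at least
 * 'bytes_to_free' bytes have been freed.  Returns bytes freed, or -1 on
 * error.  Flat directory only, to keep the sketch short. */
long long purge_oldest(const char *dir, long long bytes_to_free)
{
    DIR *d = opendir(dir);
    struct dirent *de;
    struct entry *list = NULL;
    size_t n = 0, cap = 0;
    long long freed = 0;

    if (d == NULL)
        return -1;

    while ((de = readdir(d)) != NULL) {
        struct stat st;
        struct entry e;

        if (de->d_name[0] == '.')
            continue;
        snprintf(e.path, sizeof(e.path), "%s/%s", dir, de->d_name);
        if (stat(e.path, &st) != 0 || !S_ISREG(st.st_mode))
            continue;
        e.atime = st.st_atime;
        e.size = (long long)st.st_size;

        if (n == cap) {
            cap = cap ? cap * 2 : 64;
            list = realloc(list, cap * sizeof(*list));
            if (list == NULL) {
                closedir(d);
                return -1;
            }
        }
        list[n++] = e;
    }
    closedir(d);

    if (n > 0)
        qsort(list, n, sizeof(*list), by_atime);  /* oldest atime first */

    for (size_t i = 0; i < n && freed < bytes_to_free; i++) {
        if (unlink(list[i].path) == 0)  /* one unlink and the entry is gone */
            freed += list[i].size;
    }
    free(list);
    return freed;
}
```

With headers and body in one file, a single stat/sort/unlink pass like
this is the whole cleanup; there is no header/body pair to keep in sync.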
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
On Thu, 14 Sep 2006, Graham Leggett wrote:

> Niklas Edmundsson wrote:
>
>> Will it be possible to do away with one file for headers and one file
>> for body in mod_disk_cache with this scheme?
>
> This definitely has lots of advantages - however HTTP/1.1 requires that
> it be possible to modify the headers on a cached entry independently of
> the cached body. As long as this is catered for, it should be fine.

Our patch allows for this: the body is simply stored at an offset, with
some logic to detect headers larger than the offset and cope with that
too (albeit this introduces a risk of bad data being sent to the client
due to the lockless design, so you really want to avoid it by making the
offset large enough).

Since seek()ing and writing at an offset doesn't occupy disk space on
normal unix filesystems, there isn't a problem in having the data at a
rather large offset, but I don't know how non-unix systems behave in this
regard.

/Nikke
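[Editor's note: the "body at an offset, gap as a sparse hole" layout can
be sketched as follows. This is an illustration, not the patch's code;
the 64 KB offset and the function name are invented.]

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define BODY_OFFSET (64 * 1024)  /* hypothetical fixed header area */

/* Write headers at offset 0 and the body at a fixed offset in one file.
 * On typical unix filesystems the untouched gap is a hole and occupies
 * no disk blocks, so a large offset costs nothing. */
int write_cache_file(const char *path, const char *headers, const char *body)
{
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);

    if (fd < 0)
        return -1;
    if (pwrite(fd, headers, strlen(headers), 0) < 0 ||      /* header area */
        pwrite(fd, body, strlen(body), BODY_OFFSET) < 0) {  /* body area   */
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Headers can later be rewritten in the reserved region without touching
the body, which is what the HTTP/1.1 requirement above asks for.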
[PATCH] (resend) mod_disk_cache LFS-aware config
To facilitate the merging of our large mod_disk_cache fixup I will send
small patches that fix various bugs, so that they can be applied
incrementally to trunk, with the relevant discussion limited to each
patch, and without me having to respin entire patchsets due to trivial
fixes to patches like this one. If you want larger patchsets instead of
this baby-steps approach, that's fine by me, but small pieces usually
allow for easier review when merging.

This patch, and the jumbo patch with all fixes, are also attached to bug
#39380.

This patch makes it possible to configure mod_disk_cache to cache files
that are larger than the LFS limit. While at it, I implemented error
handling so it no longer silently accepts things like "CacheMinFileSize
barf". Actual LFS support (the current code eats all address space/memory
on 32bit boxes) will come in a separate patch once this is committed.

/Nikke
Index: mod_disk_cache.c
===================================================================
--- mod_disk_cache.c	(revision 416365)
+++ mod_disk_cache.c	(working copy)
@@ -334,14 +334,14 @@ static int create_entity(cache_handle_t
     if (len > conf->maxfs) {
         ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
                      "disk_cache: URL %s failed the size check "
-                     "(%" APR_OFF_T_FMT " > %" APR_SIZE_T_FMT ")",
+                     "(%" APR_OFF_T_FMT " > %" APR_OFF_T_FMT ")",
                      key, len, conf->maxfs);
         return DECLINED;
     }
     if (len >= 0 && len < conf->minfs) {
         ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
                      "disk_cache: URL %s failed the size check "
-                     "(%" APR_OFF_T_FMT " < %" APR_SIZE_T_FMT ")",
+                     "(%" APR_OFF_T_FMT " < %" APR_OFF_T_FMT ")",
                      key, len, conf->minfs);
         return DECLINED;
     }
@@ -1026,7 +1026,7 @@ static apr_status_t store_body(cache_han
     if (dobj->file_size > conf->maxfs) {
         ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
                      "disk_cache: URL %s failed the size check "
-                     "(%" APR_OFF_T_FMT " > %" APR_SIZE_T_FMT ")",
+                     "(%" APR_OFF_T_FMT " > %" APR_OFF_T_FMT ")",
                      h->cache_obj->key, dobj->file_size, conf->maxfs);
         /* Remove the intermediate cache file and return non-APR_SUCCESS */
         file_cache_errorcleanup(dobj, r);
@@ -1050,7 +1050,7 @@ static apr_status_t store_body(cache_han
     if (dobj->file_size < conf->minfs) {
         ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
                      "disk_cache: URL %s failed the size check "
-                     "(%" APR_OFF_T_FMT " < %" APR_SIZE_T_FMT ")",
+                     "(%" APR_OFF_T_FMT " < %" APR_OFF_T_FMT ")",
                      h->cache_obj->key, dobj->file_size, conf->minfs);
         /* Remove the intermediate cache file and return non-APR_SUCCESS */
         file_cache_errorcleanup(dobj, r);
@@ -1137,15 +1137,25 @@ static const char
 {
     disk_cache_conf *conf = ap_get_module_config(parms->server->module_config,
                                                  &disk_cache_module);
-    conf->minfs = atoi(arg);
+
+    if (apr_strtoff(&conf->minfs, arg, NULL, 0) != APR_SUCCESS ||
+        conf->minfs < 0)
+    {
+        return "CacheMinFileSize argument must be a non-negative integer "
+               "representing the min size of a file to cache in bytes.";
+    }
     return NULL;
 }
+
 static const char *set_cache_maxfs(cmd_parms *parms, void *in_struct_ptr,
                                    const char *arg)
 {
     disk_cache_conf *conf = ap_get_module_config(parms->server->module_config,
                                                  &disk_cache_module);
-    conf->maxfs = atoi(arg);
+    if (apr_strtoff(&conf->maxfs, arg, NULL, 0) != APR_SUCCESS ||
+        conf->maxfs < 0)
+    {
+        return "CacheMaxFileSize argument must be a non-negative integer "
+               "representing the max size of a file to cache in bytes.";
+    }
     return NULL;
 }
Index: mod_disk_cache.h
===================================================================
--- mod_disk_cache.h	(revision 416365)
+++ mod_disk_cache.h	(working copy)
@@ -88,8 +88,8 @@ typedef struct {
     apr_size_t cache_root_len;
     int dirlevels;               /* Number of levels of subdirectories */
     int dirlength;               /* Length of subdirectory names */
-    apr_size_t minfs;            /* minumum file size for cached files */
-    apr_size_t maxfs;            /* maximum file size for cached files */
+    apr_off_t minfs;             /* minimum file size for cached files */
+    apr_off_t maxfs;             /* maximum file size for cached files */
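[Editor's note: the validation the patch performs with apr_strtoff() can
be sketched in plain C for readers without APR at hand. strtoll stands in
for apr_strtoff; the function name and error string are invented.]

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a size directive argument the way the patch does: a full-string
 * numeric parse plus a non-negative check.  Returns NULL on success or
 * an error message, mirroring the httpd config-handler convention. */
const char *parse_size(const char *arg, long long *out)
{
    char *end;
    long long val;

    errno = 0;
    val = strtoll(arg, &end, 0);    /* base 0, like apr_strtoff(..., 0) */
    if (errno != 0 || end == arg || *end != '\0' || val < 0)
        return "argument must be a non-negative integer (size in bytes)";
    *out = val;
    return NULL;
}
```

This is exactly what rejects "CacheMinFileSize barf": the parse stops at
the first non-digit, `*end` is not the terminator, and an error string is
returned instead of silently storing atoi()'s 0.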
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
On Thu, 14 Sep 2006, Davi Arnaut wrote:

> On 14/09/2006, at 04:24, Niklas Edmundsson wrote:
>
>>> I'm working on this. You may want to check my proposal at
>>> http://verdesmares.com/Apache/proposal.txt
>>
>> Will it be possible to do away with one file for headers and one file
>> for body in mod_disk_cache with this scheme?
>
> http://verdesmares.com/Apache/patches/016.patch

OK. You seem to dump the body right after the headers though, so you
won't be able to do header rewrites. Also, it's rather unnecessary to
call the files .cache if there is only one type of file ;)

/Nikke
Re: [PATCH] (resend) mod_disk_cache LFS-aware config
On Thu, 14 Sep 2006, Graham Leggett wrote:

> On Thu, September 14, 2006 11:17 am, Niklas Edmundsson wrote:
>
>> To facilitate the merging of our large mod_disk_cache fixup I will
>> send small patches that fix various bugs so that they can be applied
>> incrementally to trunk ...
>
> +1. This also makes it easier when more than one person is working on
> patchsets to integrate both patches.

Yup. The situation seems to be complicated somewhat by Davi working on
the cache thingies, and doing more than just poking around in the
mod_cache infrastructure... However, it seems that we really should start
merging stuff into a tree, be it trunk or cache-dev or whatever, before
we are sitting on two hard-to-merge trees, both of which hold significant
improvements.

As said, our stuff is stable in production (except for one bug that I
suspect is an apache/apr bug; more about that when/if we get to that
part) and transforms mod_disk_cache from unusable for us to performing
nicely, with an approximately 90% cache hit rate when serving
ftp.acc.umu.se, ftp.gnome.org, se.releases/archive.ubuntu.com,
releases.mozilla.org, ftp.se.debian.org ...

/Nikke
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
On Thu, 14 Sep 2006, Davi Arnaut wrote:

>>> I'm working on this. You may want to check my proposal at
>>> http://verdesmares.com/Apache/proposal.txt
>>
>>> http://verdesmares.com/Apache/patches/016.patch
>>
>> OK. You seem to dump the body right after the headers though, so you
>> won't be able to do header rewrites.
>
> Could you kindly point me to the cache code that rewrites only the
> headers?

If I remember correctly, the code in 2.2.3 only does whole-file
revalidation. The next logical step (which our patch takes) is to make it
understand that if the source file hasn't changed you don't have to copy
the whole file, since it's enough to just update the headers. Our patch
does this because it's needed to get decent performance when juggling DVD
images (yes, recaching a 4GB file is rather expensive).

There are a couple of trivial improvements like this that need to be done
in mod_disk_cache and that depend on the underlying disk storage layer
being done right. However, given the current state of mod_disk_cache,
almost everything is an improvement...

>> Also, it's rather unnecessary to call the files .cache if there is
>> only one type of file ;)
>
> That's convenience; there may be other types of files in the same cache
> directory that are created by other tools.

That seems silly to me; the cache directory structure should be strictly
private to the cache.

/Nikke
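[Editor's note: the "update only the headers on revalidation" step can be
sketched as an in-place rewrite of a reserved header region, assuming the
body sits at a fixed offset as described earlier in the thread. The
region size and function name are invented for the sketch.]

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define HDR_AREA 8192  /* illustrative reserved header region */

/* Rewrite only the header region of a combined header+body cache file.
 * The body, stored at offset HDR_AREA, is never touched, so a 4GB body
 * needs no copy when only the headers are freshened. */
int update_headers(const char *path, const char *new_headers)
{
    size_t len = strlen(new_headers) + 1;   /* include the terminator */
    int fd;
    ssize_t w;

    if (len > HDR_AREA)
        return -1;                  /* would overrun into the body region */
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    w = pwrite(fd, new_headers, len, 0);    /* header region only */
    close(fd);
    return (w == (ssize_t)len) ? 0 : -1;
}
```

The layout with the body dumped directly after the headers (as in the
016.patch under discussion) cannot do this, because a grown header would
overwrite the start of the body.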
Re: [PATCH] (resend) mod_disk_cache LFS-aware config
On Thu, 14 Sep 2006, Graham Leggett wrote:

> On Thu, September 14, 2006 2:41 pm, Niklas Edmundsson wrote:
>
>> Yup. The situation seems to be complicated somewhat by Davi working on
>> the cache thingies ... before we are sitting on two hard-to-merge
>> trees, both of which hold significant improvements.
>
> Small patches are easy to merge, easy to review, and unlikely to clash
> - what I am keen to do is start finding all the small fixes in both
> your and Davi's code, and see them all applied. Davi's patches are
> already reasonably small - is it possible to break up your patch into
> discrete bits as well?

That's my intention. I had hoped for my small patches to be applied to
trunk one by one as they come, fixing eventual objections patch by patch
instead of having to respin large patchsets that depend on each other. At
least for the small obvious fixes this should be doable. I.e., make a
small patch, submit for review/commit, fix/redo if needed, and on to the
next patch.

When there are more complex changes I'll probably have to do multi-part
patchsets, but I really want to avoid them since they're a pain when it
comes to rejections.

/Nikke
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
On Fri, 15 Sep 2006, Brian Akins wrote:

> The separate header and body files work wonderfully for performance
> (filling multiple gig interfaces and/or 30k requests/sec. on rather
> modest hardware). If you have them all in one, it can make the sendfile
> for the body cumbersome.

If you write to the file using mmap on Linux, then sendfile() breaks,
yes. mmap didn't give any major performance benefit for the body copy
though, so it doesn't matter and we don't use it. This is really a Linux
bug, since a non-overlapping write/sendfile should be OK.

> If you somehow track what entries are in the cache, it is very easy to
> purge entries.

Extra tracking sounds unnecessary if you can do it in a way that doesn't
need it.

> At ApacheCon, I'll talk some about our version of mod_cache.
> Unfortunately, I can't share code :( But I can tell you the separate
> files way is not a performance or housekeeping issue.

If you have the index I can agree. However, I don't see how you can do a
lockless design with multiple files and an index that can do:

* Clients read from the cache as files are being cached.
* Only one session caches the same file.
* Header/body updates.
* No index/files out-of-sync issues. Ever.

With locks, yes, it's possible, but also a hassle to get right with
performance intact.

The current mod_disk_cache seems to be designed for small files and
enough memory to hide the problems caused by the design. If your files
fit into the OS cache then it doesn't matter if hundreds of sessions are
caching the same file; it'll work out eventually without reduced
performance. This isn't the case when each file (a DVD image) is bigger
than your memory and doesn't fit in the OS file cache. In fact you can
tell that the author never even considered this, given the way the body
is copied (on 32bit you lose).

We, as an ftp mirror operated by a non-profit computer club, have a
slightly different use case, with single files larger than machine RAM
and a working set approximately 40 times larger than RAM. Some bad design
decisions in mod_disk_cache become really visible in this environment.

/Nikke
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
On Sun, 17 Sep 2006, Graham Leggett wrote:

> Niklas Edmundsson wrote:
>
>> However, I don't see how you can do a lockless design with multiple
>> files and an index that can do:
>> * Clients read from the cache as files are being cached.
>> * Only one session caches the same file.
>> * Header/body updates.
>> * No index/files out-of-sync issues. Ever.
>
> Thinking about this some more I do see a race during purging - a cache
> thread could read the header, the purge deletes header and body, and
> then the cache thread reads the body, and interprets the missing body
> as "the body is still coming".

The easiest way to deal with this might be to have a timeout: if the body
hasn't shown up within $timeout, then something went bad - DECLINE,
meaning that the cache layer thinks it should cache the file and acts
accordingly. You actually want this fallback anyway, and it's probably
enough to deal with the purge problem. The purge should delete the oldest
unused entries anyway, so hitting that case shouldn't be too common.

And yes, since this scheme can only leave stray files on disk, which can
be cleaned up by purging, I can agree that it'll work. However, I
strongly believe that the purge should not have to read each header file
the way htcacheclean currently does, since that puts such a strain on the
cache filesystem. A file system traversal should be enough.

Anyhow, I can probably rather easily adapt our patches to do it this way
if that's what people want. I'm not entirely sure what the gain would be
though, since it's a tad more housekeeping work and double the number of
inodes to traverse during a purge... But that is future work.

I haven't had any comment on my current patch (lfs-config) yet, so I'm
not entirely sure whether it looks OK and I should proceed with the next
patch, or what. I'm not that well versed in all the APIs involved, and
stuff that looks right to me might have a much better, more Apache-ish
solution, so I don't want to get carried away creating huge patchsets
only to have the first one rejected because my coding style sucks...
However, I can understand if you want a complete patch that solves the
LFS issues, but then you'll have to tell me, since I'm not a mind
reader ;)

/Nikke
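[Editor's note: the timeout fallback proposed above can be sketched as a
polling wait on the cache file. The poll interval, timeout handling, and
function name are invented; a real implementation would live inside the
provider's read path and DECLINE on failure.]

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Poll until 'path' holds at least 'want' bytes, up to 'timeout_ms'
 * milliseconds.  Returns 0 when the data shows up, -1 on timeout - at
 * which point the caller treats the entry as bad/purged and falls back
 * to fetching from the backend. */
int wait_for_body(const char *path, off_t want, int timeout_ms)
{
    struct timespec nap = { 0, 50 * 1000 * 1000 };  /* 50 ms */
    int waited_ms = 0;

    for (;;) {
        struct stat st;

        if (stat(path, &st) == 0 && st.st_size >= want)
            return 0;               /* body (or enough of it) has arrived */
        if (waited_ms >= timeout_ms)
            return -1;              /* give up: purged or writer died */
        nanosleep(&nap, NULL);
        waited_ms += 50;
    }
}
```

This covers both the purge race (the body never arrives because it was
unlinked) and a crashed writer, with one mechanism.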
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
On Mon, 18 Sep 2006, Brian Akins wrote:

> Niklas Edmundsson wrote:
>
>> Extra tracking sounds unnecessary if you can do it in a way that
>> doesn't need it.
>
> It's not "extra", it's just adding some tracking. When an object gets
> cached, log (sql, db, whatever) that /blah/foo/bar.html is cached as
> /cache/x/y/something.meta. Then it's very easy to ask the store "what
> is /blah/foo/bar.html cached as?" There may be multiples because of
> vary.

Extra, because you already have the needed info to puzzle the things
together...

>> * Clients read from the cache as files are being cached.
>
> That's the hard one, IMO.

But the implementation was rather easy once the "cache to separate file
and mv to correct location" stuff was ripped out. Or, as easy as building
your own bucket type is.

>> * Only one session caches the same file.
>
> Easy to do if we use deterministic tmp files and not the way we
> currently do it. Then all you have to do when creating temp files is
> use O_EXCL.

Or, if we skip the tmp files altogether.

>> * Header/Body updates.
>
> Easier with separate files, like mod_disk_cache does now.

True.

>> * No index/files out-of-sync issues. Ever.
>
> Hard to guarantee, but not impossible. Always add to the index when
> storing a file and remove when deleting. This should use something like
> providers so it's not in core cache code and can be easily modified.
>
>> With locks, yes it's possible but also a hassle to get right with
>> performance intact.
>
> Not really that hard. Trust me, it has been done...

I'll take your word for that.

>> We, as an ftp mirror operated by a non-profit computer club, have a
>> slightly different use case with single files larger than machine RAM
>> and a working set approx 40 times larger than RAM. Some bad design
>> decisions in mod_disk_cache become really visible in this environment.
>
> Seems to me you should approach the problem differently, like rsyncing
> the mirrored content. I don't know your environment; that was just what
> I came up with off the top of my head.

Try rsyncing a few TB of content onto a few hundred GB of cache disk and
see how that works out for you :)

Our setup is briefly described here, by the way:
http://ftp.acc.umu.se/mirror/ftp-about.html

/Nikke
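[Editor's note: the O_EXCL single-writer trick mentioned above - a
deterministic per-entry file name plus O_CREAT|O_EXCL - can be sketched
in a few lines. The function name is invented.]

```c
#include <errno.h>
#include <fcntl.h>

/* With a deterministic per-entry file name, O_CREAT|O_EXCL lets the
 * kernel pick exactly one winner; every other session backs off without
 * any locks.  Returns a write fd if this caller won the right to cache
 * the entry, -1 if another session already has it (or on error). */
int claim_cache_entry(const char *path)
{
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0644);

    if (fd < 0 && errno == EEXIST)
        return -1;   /* another session is (or was) caching this entry */
    return fd;
}
```

This is what makes "only one session caches the same file" cheap: the
atomicity lives in the filesystem, not in an index or a lock.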
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
On Mon, 18 Sep 2006, Brian Akins wrote:

> Graham Leggett wrote:
>
>> I have not seen inside the htcacheclean code, why is the code reading
>> the headers? In theory the cache should be purged based on last access
>> time, deleted as space is needed.
>
> Everyone should be mounting cache directories noatime, unless they
> don't care about performance...

Actually, a cache on xfs mounted with atime doesn't seem to be a
performance killer, oddly enough... Our frontends had no problems
surviving 1k requests/s during the latest mozilla-update barrage. Other
mirrors had problems, so it seems we ended up taking the majority of the
load...

That said: yes, noatime is quicker, but if you want to be able to clean
your cache often (think new Linux distro release, which quickly fills up
the cache with new content), atime plus a filesystem traversal is a
better combined solution than having to open and read every header.

/Nikke
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
On Wed, 20 Sep 2006, Brian Akins wrote:

> Niklas Edmundsson wrote:
>
>> Actually, a cache on xfs mounted with atime doesn't seem to be a
>> performance killer, oddly enough... Our frontends had no problems
>> surviving 1k requests/s during the latest mozilla-update barrage.
>
> 1k requests/second is not really that much... 10k requests/second is
> more what I'm used to. XFS sucks for us as cache storage. It tends to
> choke under some traffic patterns (reads vs writes). ext3 is actually
> more reliable for us. Reiserfs is interesting, but tends to go haywire
> from time to time.

I think the key difference here is our average file size... We don't need
that many requests/s to bottom out gige, normally.

> We clean our cache often because we have a really quick way to find the
> size and remove the oldest expired objects first. Every cache store
> gets recorded in SQLite with info about the object (size, mtime, expire
> time, url, key, etc.). Makes it trivial to write cron jobs to do cache
> management.

Yup.

/Nikke
[PATCH] mod_disk_cache working LFS (filecopy)
This patch depends on the "mod_disk_cache LFS-aware config" patch
submitted earlier, and is for trunk. It makes caching of large files
possible on 32bit machines by:

* Realising that a file is a file and can be copied as such, without
  reading the whole thing into memory first.

* When a file is cached by copying, replacing the brigade with a new one
  referring to the cached file, so we don't have to read the file from
  the backend again when sending the response to the client.

* When a file is cached by copying, keeping the file even if the client
  aborts the connection, since we know that the response is valid.

* Checking a few more return values, to be able to do the above reliably
  in the appropriate places.

The thing is mildly tested, but it's a subset of our much larger patchset
that's been in production since June. I'm able to get a 4.3GB file from a
32bit machine with 1GB of memory using mod_disk_cache, and the md5sum is
correct afterwards. The old behaviour was eating all the address
space/memory and segfaulting.

I'll attach the thing to bug #39380 as well.

/Nikke

--- mod_disk_cache.c.1-lfsconfig	2006-09-18 12:19:56.000000000 +0200
+++ mod_disk_cache.c	2006-09-26 09:35:51.000000000 +0200
@@ -157,7 +157,16 @@ static apr_status_t file_cache_el_final(
     if (dobj->tfd) {
         apr_status_t rv;
 
-        apr_file_close(dobj->tfd);
+        rv = apr_file_close(dobj->tfd);
+        dobj->tfd = NULL;
+
+        if (rv != APR_SUCCESS) {
+            ap_log_error(APLOG_MARK, APLOG_DEBUG, rv, r->server,
+                         "disk_cache: closing tempfile failed: %s",
+                         dobj->tempfile);
+            apr_file_remove(dobj->tempfile, r->pool);
+            return rv;
+        }
 
         /* This assumes that the tempfile is on the same file system
          * as the cache_root. If not, then we need a file copy/move
@@ -169,9 +178,8 @@ static apr_status_t file_cache_el_final(
                          "disk_cache: rename tempfile to datafile failed: %s -> %s",
                          dobj->tempfile, dobj->datafile);
             apr_file_remove(dobj->tempfile, r->pool);
+            return rv;
         }
-
-        dobj->tfd = NULL;
     }
 
     return APR_SUCCESS;
@@ -976,15 +984,133 @@ static apr_status_t store_headers(cache_
     return APR_SUCCESS;
 }
 
+
+static apr_status_t copy_body(apr_file_t *srcfd, apr_off_t srcoff,
+                              apr_file_t *destfd, apr_off_t destoff,
+                              apr_off_t len)
+{
+    apr_status_t rc;
+    apr_size_t size;
+    apr_finfo_t finfo;
+    apr_time_t starttime = apr_time_now();
+    char buf[CACHE_BUF_SIZE];
+
+    if (srcoff != 0) {
+        rc = apr_file_seek(srcfd, APR_SET, &srcoff);
+        if (rc != APR_SUCCESS) {
+            return rc;
+        }
+    }
+
+    if (destoff != 0) {
+        rc = apr_file_seek(destfd, APR_SET, &destoff);
+        if (rc != APR_SUCCESS) {
+            return rc;
+        }
+    }
+
+    /* Tried doing this with mmap, but sendfile on Linux got confused when
+       sending a file while it was being written to from an mmapped area.
+       The traditional way seems to be good enough, and less complex.
+     */
+    while (len > 0) {
+        size = MIN(len, CACHE_BUF_SIZE);
+
+        rc = apr_file_read_full(srcfd, buf, size, NULL);
+        if (rc != APR_SUCCESS) {
+            return rc;
+        }
+
+        rc = apr_file_write_full(destfd, buf, size, NULL);
+        if (rc != APR_SUCCESS) {
+            return rc;
+        }
+        len -= size;
+    }
+
+    /* Check if file has changed during copying. This is not 100% foolproof
+       due to NFS attribute caching when on NFS etc. */
+    /* FIXME: Can we assume that we're always copying an entire file? In that
+       case we can check if the current filesize matches the length
+       we think it is */
+    rc = apr_file_info_get(&finfo, APR_FINFO_MTIME, srcfd);
+    if (rc != APR_SUCCESS) {
+        return rc;
+    }
+    if (starttime < finfo.mtime) {
+        return APR_EGENERAL;
+    }
+
+    return APR_SUCCESS;
+}
+
+
+static apr_status_t replace_brigade_with_cache(cache_handle_t *h,
+                                               request_rec *r,
+                                               apr_bucket_brigade *bb)
+{
+    apr_status_t rv;
+    int flags;
+    apr_bucket *e;
+    core_dir_config *pdcfg = ap_get_module_config(r->per_dir_config,
+                                                  &core_module);
+    disk_cache_object_t *dobj = (disk_cache_object_t *) h->cache_obj->vobj;
+
+    flags = APR_READ|APR_BINARY;
+#if APR_HAS_SENDFILE
+    flags |= ((pdcfg->enable_sendfile
Re: [PATCH] mod_disk_cache working LFS (filecopy)
On Tue, 26 Sep 2006, Issac Goldstand wrote:

> Forgive me for missing the obvious, but why not just use mod_file_cache
> for this? I recall you mentioning that your use of mod_cache was for
> locally caching very large remote files, so I don't see how this would
> help in any case, since the file doesn't exist locally when being
> stored; and if the file is otherwise known to be on the file system,
> there's no reason to keep it in mod_disk_cache's cache area (in any
> case, it wouldn't improve performance - only mod_file_cache would). So
> what am I missing?

Apache Module mod_file_cache:
Description: Caches a static list of files in memory

This has little to do with a setup like ours (ftp.acc.umu.se):

* NFS backend with lots of storage (multiple TB), not lots of
  bandwidth/performance.
* Multiple frontends with (relatively) fast cache storage.
* A working set of a couple of hundred GB which changes daily.

By using caching frontends we can easily fill our available 2Gbit even
though the backend can only do about 300-400Mbit. This is possible
because of a cache hit rate of about 90%.

/Nikke
Re: [PATCH] mod_disk_cache working LFS (filecopy)
On Tue, 26 Sep 2006, Graham Leggett wrote:

> On Tue, September 26, 2006 1:00 pm, Joe Orton wrote:
>
>> This was discussed a while back. I think this is an API problem which
>> needs to be fixed at API level, not something which should be worked
>> around by adding bucket-type-specific hacks.
>
> API changes won't be backportable to v2.2.x though, although you're
> right.

Won't that method mean that caching the file happens at the speed the
client reads it? So, with only one session caching a file and
read-while-caching (i.e. the features you want in the end) you can get
the following scenario:

- A slow client starts downloading a large file. First access, so the
  file is being cached - slowly.
- Fast clients start downloading the same file, but slowly, paced by the
  slow client.

I suspect this also means that if the caching client hangs up before the
caching is finished, we'll have to toss what's been cached so far - or do
we get that error before the brigade is destroyed?

In any case, it sounds like a better way to do it than the current
always-eat-your-memory-and-die solution, but I think we'll be needing
that kludge to get good behaviour in our
caching-frontend-for-ftpserver case...

/Nikke
Re: [PATCH] mod_disk_cache working LFS (filecopy)
On Tue, 26 Sep 2006, Graham Leggett wrote:

> Niklas Edmundsson wrote:
>
>> * Realising that a file is a file and can be copied as such, without
>>   reading the whole thing into memory first.
>> * When a file is cached by copying, replace the brigade with a new one
>>   referring to the cached file so we don't have to read the file from
>>   the backend again when sending a response to the client.
>
> As I read the code, the copy is completed before an attempt is made to
> deliver the copy to the network. This should in theory stop a slow
> initial client from holding up faster following clients, if the caching
> is still in transit. Is this correct?

This is the original design of mod_disk_cache (which isn't changed by
this patch), so yes. In practice this isn't enough when dealing with
large files, so in our production code (the hideously large jumbo patch)
this is fixed by read-while-caching and by spawning a thread to do the
caching in the background, while delivering the response (via
read-while-caching) to the client that initiated the caching.

/Nikke
Re: svn commit: r450188 - /httpd/httpd/trunk/modules/cache/mod_disk_cache.c
On Wed, 27 Sep 2006, Graham Leggett wrote:

> Ruediger Pluem wrote:
>
>> Are we sure that we do not iterate too often (> 100 times) over this
>> during the lifetime of a request? I would say 'No, we do not iterate
>> too often', but I think a crosscheck by someone else is a good idea.
>> Otherwise we would have a potential temporary memory leak here.
>
> We would copy the body once per request, surely? That's how I read it -
> copy_body would be called once, resulting in the buffer being declared
> once, and reused inside the copy_body loop.

The code is very picky about there only being a single, complete body, so
it should only be called once per request.

/Nikke
Re: svn commit: r450105 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h
On Wed, 27 Sep 2006, Joe Orton wrote: I don't get it - as discussed, this approach is completely unsound. There is no reason to assume it's possible to copy the entire content into the cache before sending anything to the client just because it happens to be a FILE bucket (think slow NFS servers). That is something which needs to be *fixed*, not explicitly hard-coded. Yes, it has to be fixed eventually. Until then we're better off with gradual improvements than just saying the solution isn't perfect. mod_disk_cache isn't exactly the best code out there, so it'll take a while to get it decent, and it'll take more than a single patch to do it. I fully agree that the "copy everything into cache and then reply" method is utterly stupid, but that's the way mod_disk_cache currently works, and despite that it got tagged as stable... In an effort to improve things, I'll start taking more stuff from our jumbo patch and building smaller incremental patches that will eventually mean that mod_disk_cache will have read-while-caching. When that code is in, there will be a plethora of options on how to solve the "client that does the caching request hangs" problem, which we have kludged to work in the NFS-backend case. As said, we have kludged it to work in our setup (which has a slow NFS backend), but the perfect solution will have to come from people who know all the deep magic in httpd, and I know I'm not that person. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Only together can we turn him to the dark side of the Force. - Emperor =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: [PATCH] mod_disk_cache working LFS (filecopy)
On Wed, 27 Sep 2006, Graham Leggett wrote: On Wed, September 27, 2006 11:07 am, Niklas Edmundsson wrote: In practice this isn't enough when dealing with large files, so in our production code (the hideously large jumbopatch) this is fixed by read-while-caching and spawning a thread to do the caching in the background while delivering the response (by read-while-caching) to the client that initiated the caching. A thread makes sense for platforms that support threads, but we would need some kind of functional behaviour for platforms that don't have threads. Would the option of spawning a process to copy the file also work, leaving the original process to read-while-cache the response for the benefit of the client? We have code for it, it's just untested since we're using the worker mpm. We'll deal with that when I get to those patches. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Dyslexia rules KO. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
[PATCH] mod_cache: Don't log bogus errors
The following patch should eliminate bogus error log entries similar to: [Wed Sep 27 15:31:29 2006] [error] (-3)Unknown error 18446744073709551613: cache: error returned while trying to return disk cached data If I have understood things right, AP_FILTER_ERROR only means that an error has occurred and that an error web page has already been sent (documented in CHANGES, of all places). The additional garbage in the error log doesn't make anyone happy... /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Don't give away the homeworld. - Babylon 5 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Index: mod_cache.c
===================================================================
--- mod_cache.c (revision 450405)
+++ mod_cache.c (working copy)
@@ -244,10 +244,12 @@ static int cache_url_handler(request_rec
         out = apr_brigade_create(r->pool, r->connection->bucket_alloc);
         rv = ap_pass_brigade(r->output_filters, out);
         if (rv != APR_SUCCESS) {
-            ap_log_error(APLOG_MARK, APLOG_ERR, rv, r->server,
-                         "cache: error returned while trying to return %s "
-                         "cached data",
-                         cache->provider_name);
+            if(rv != AP_FILTER_ERROR) {
+                ap_log_error(APLOG_MARK, APLOG_ERR, rv, r->server,
+                             "cache: error returned while trying to return %s "
+                             "cached data",
+                             cache->provider_name);
+            }
             return rv;
         }
Re: [PATCH] mod_disk_cache working LFS (filecopy)
On Sat, 30 Sep 2006, Davi Arnaut wrote: Hi, Wouldn't you avoid a lot of complexity in this patch if you just deleted from the brigade the implicitly created heap buckets while reading file buckets ? Something like: store_body: .. if (is_file_bucket(bucket)) copy_file_bucket(bucket, bb); Probably, but that doesn't allow for creating a thread/process that does the copying in the background, which is my long term goal. Also, simply doing bucket_delete like that means that the file will never be sent to the client, which is a bad thing IMO ;) /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Excuse me, is that a toupee or do you have a tribble on your head =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: [PATCH] mod_disk_cache working LFS (filecopy)
On Sun, 1 Oct 2006, Davi Arnaut wrote: store_body: .. if (is_file_bucket(bucket)) copy_file_bucket(bucket, bb); Probably, but that doesn't allow for creating a thread/process that does the copying in the background, which is my long term goal. Also, simply doing bucket_delete like that means that the file will never be sent to the client, which is a bad thing IMO ;) Shame on me, but I said something like.. :) I guess the attached patch does the same (plus mmap, et cetera) and is much simpler. Comments ? Simpler, yes. But it only has the benefit of not eating all your memory... * It leaves the brigade containing the uncached entity, so it will cause the backend to first deliver stuff to be cached and then stuff to the client. * When this evolves to wanting to spawn a thread/process to do the copying you'll need the is this a file-thingie anyway (at least I need it, but I might have missed some nifty feature in APR). /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Oohhh. Jedi Master. Yoda. You seek Yoda. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: [PATCH] mod_disk_cache working LFS (filecopy)
On Mon, 2 Oct 2006, Davi Arnaut wrote: Simpler, yes. But it only has the benefit of not eating all your memory... Well, that was the goal. Maybe we could merge this one instead and work together on the other goals. As I have said before, we have a large patchset that fixes a bunch of problems. However, since the wish was to merge it in pieces, I have started breaking it into pieces for merging. If the intent is a total redesign of mod_disk_cache, i.e. you're not interested in these patches at all, please say so, so I don't waste more work on bending our patches into something that works when applying them one by one and then doing QA on the thing. * It leaves the brigade containing the uncached entity, so it will cause the backend to first deliver stuff to be cached and then stuff to the client. Yeah, that's how mod_disk_cache works. I think it's possible to work around this limitation without using threads by keeping each cache instance with its own brigades and flushing it occasionally with non-blocking i/o. The replace_brigade_with_cache() function simply replaced the brigade with an instance pointing to the cached entity. Or we could move all disk i/o to a mod_disk_cache-exclusive thread pool; it could be configurable at build time whether or not to use a thread pool. Comments? I would be very happy if people would fix the mod_disk_cache mess so I didn't have to. However, since no one seems to have produced something usable for our usage in the timeframe mod_disk_cache has existed, I was forced to hack on it. I'm trying my best to not give up on having it merged as I know that there are other sites interested in it, and now that the first trivial bit has been applied I'm hoping that people will at least look at the rest... There are bound to be cool apachier ways to solve some of the problems, but given that our patch is stable in production and has generally much higher code quality than mod_disk_cache (ever heard of error checking/handling?) it would be nice if people at least could look at the whole thing before starting to complain about the complexity of small parts (or code not touched by the patch, for that matter). * When this evolves to wanting to spawn a thread/process to do the copying you'll need the "is this a file" thingie anyway (at least I need it, but I might have missed some nifty feature in APR). You would just need to copy the remaining buckets (granted, if there are no concurrency problems) and send them to a per-process thread pool. And when not having threads? /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- To avoid seeing a fool, break your mirror. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: Coding style
On Mon, 2 Oct 2006, Garrett Rooney wrote: Or the even more readable: rv = do_something(args); if (rv == APR_SUCCESS) { } +1 for simple code like this. It comes naturally when you need to do stuff like rv = dostuff(...); if(rv != APR_SUCCESS && rv != whatever) { ... and is also less likely to cause ugly linewraps when using functions_with_long_names(and, a, large, list, of, arguments) ... /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Sexy: Uses feather. Kinky: Uses entire chicken. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
[PATCH] sendfile_nonblocking broken in trunk
I stumbled upon this when porting the mod_disk_cache read-while-caching feature to trunk. r-w-c uses a diskcache bucket which it morphs into file buckets as more data becomes available. I.e., it starts with a brigade containing: FILE-DISKCACHE FILE is sendfile:d as usual by core_filters, and when DISKCACHE is bucket_read() it morphs the bucket into a 0-length HEAP bucket, a FILE bucket and the remains in a trailing DISKCACHE bucket, i.e.: HEAP-FILE-DISKCACHE send_brigade_nonblocking() correctly does the bucket_read and moves on to the next bucket, which it correctly identifies as a FILE bucket and tries to sendfile_nonblocking(). sendfile_nonblocking() takes the _brigade_ as an argument, gets the first bucket from the brigade, finds it not to be a FILE bucket and barfs. The attached fix is trivial, and I really can't understand why sendfile_nonblocking() was taking a brigade instead of a bucket as argument in the first place. On a side note, in send_brigade_nonblocking() it's unnecessary to queue 0-length writes to the iovec. It probably won't make any difference at all in real world performance but it's obviously not optimal ;) Also, I'm not at all fond of all those "XXX: We really should log/return error/foo here" lines. It's not THAT hard doing it while coding, or at least doing a final touchup before submitting a patch/committing code... /Nikke - now able to do some QA before submitting mod_disk_cache patches. -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- My stereo's ½-fixed, said Tom monotonously.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Index: core_filters.c
===================================================================
--- core_filters.c (revision 452869)
+++ core_filters.c (working copy)
@@ -330,7 +330,7 @@ static apr_status_t writev_nonblocking(a
 #if APR_HAS_SENDFILE
 static apr_status_t sendfile_nonblocking(apr_socket_t *s,
-                                         apr_bucket_brigade *bb,
+                                         apr_bucket *bucket,
                                          apr_size_t *cumulative_bytes_written,
                                          conn_rec *c);
 #endif
@@ -567,7 +567,7 @@ static apr_status_t send_brigade_nonbloc
                     return rv;
                 }
             }
-            rv = sendfile_nonblocking(s, bb, bytes_written, c);
+            rv = sendfile_nonblocking(s, bucket, bytes_written, c);
             if (nvec > 0) {
                 (void)apr_socket_opt_set(s, APR_TCP_NOPUSH, 0);
             }
@@ -730,21 +730,21 @@ static apr_status_t writev_nonblocking(a
 #if APR_HAS_SENDFILE
 static apr_status_t sendfile_nonblocking(apr_socket_t *s,
-                                         apr_bucket_brigade *bb,
+                                         apr_bucket *bucket,
                                          apr_size_t *cumulative_bytes_written,
                                          conn_rec *c)
 {
     apr_status_t rv = APR_SUCCESS;
-    apr_bucket *bucket;
     apr_bucket_file *file_bucket;
     apr_file_t *fd;
     apr_size_t file_length;
     apr_off_t file_offset;
     apr_size_t bytes_written = 0;
 
-    bucket = APR_BRIGADE_FIRST(bb);
     if (!APR_BUCKET_IS_FILE(bucket)) {
-        /* XXX log a "this should never happen" message */
+        ap_log_error(APLOG_MARK, APLOG_ERR, rv, c->base_server,
+                     "core_filter: sendfile_nonblocking: "
+                     "this should never happen");
         return APR_EGENERAL;
     }
     file_bucket = (apr_bucket_file *)(bucket->data);
Re: [PATCHES] mod_disk_cache read-while-caching
On Thu, 5 Oct 2006, Niklas Edmundsson wrote: OK, here comes the latest two patches in the mod_disk_cache improvement parody. I'll attach these patches to bug #39380, but with less comments. I discovered a few misses, mostly not NULL:ing fd pointers when closing them, missing close/flush, and some unnecessary code duplication instead of calling the right helper in replace_brigade_with_cache(). The misses are in the loadstore-patch, so I would recommend applying this before reviewing the results even though it's generated from a file with the read-while-caching patch applied. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Real men write self-modifying code =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

--- mod_disk_cache.c.rwc        2006-10-06 14:22:27.000000000 +0200
+++ mod_disk_cache.c    2006-10-08 19:17:31.000000000 +0200
@@ -676,6 +676,7 @@ static apr_status_t open_header_timeout(
     while(1) {
         if(dobj->hfd) {
             apr_file_close(dobj->hfd);
+            dobj->hfd = NULL;
         }
         rc = open_header(h, r, key, conf);
         if(rc != APR_SUCCESS && rc != CACHE_ENODATA) {
@@ -1209,6 +1210,7 @@ static apr_status_t recall_headers(cache
     }
 
     apr_file_close(dobj->hfd);
+    dobj->hfd = NULL;
 
     ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
                  "disk_cache: Recalled headers for URL %s", dobj->name);
@@ -1556,6 +1558,7 @@ static apr_status_t store_headers(cache_
         rv = apr_file_open(&dobj->hfd, dobj->hdrsfile,
                            APR_WRITE | APR_BINARY | APR_BUFFERED, 0, r->pool);
         if (rv != APR_SUCCESS) {
+            dobj->hfd = NULL;
             return rv;
         }
     }
@@ -1590,6 +1593,19 @@ static apr_status_t store_headers(cache_
         return rv;
     }
 
+    /* If the body size is unknown, the header file will be rewritten later
+       so we can't close it */
+    if(dobj->initial_size < 0) {
+        rv = apr_file_flush(dobj->hfd);
+    }
+    else {
+        rv = apr_file_close(dobj->hfd);
+        dobj->hfd = NULL;
+    }
+    if(rv != APR_SUCCESS) {
+        return rv;
+    }
+
     ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
                  "disk_cache: Stored headers for URL %s", dobj->name);
 
     return APR_SUCCESS;
@@ -1666,23 +1682,20 @@ static apr_status_t replace_brigade_with
                                          apr_bucket_brigade *bb)
 {
     apr_status_t rv;
-    int flags;
     apr_bucket *e;
-    core_dir_config *pdcfg = ap_get_module_config(r->per_dir_config,
-                                                  &core_module);
     disk_cache_object_t *dobj = (disk_cache_object_t *) h->cache_obj->vobj;
 
-    flags = APR_READ|APR_BINARY;
-#if APR_HAS_SENDFILE
-    flags |= ((pdcfg->enable_sendfile == ENABLE_SENDFILE_OFF)
-              ? 0 : APR_SENDFILE_ENABLED);
-#endif
-
-    rv = apr_file_open(&dobj->fd, dobj->datafile, flags, 0, r->pool);
+    if(dobj->fd) {
+        apr_file_close(dobj->fd);
+        dobj->fd = NULL;
+    }
+    rv = open_body_timeout(r, dobj->name, dobj);
     if (rv != APR_SUCCESS) {
-        ap_log_error(APLOG_MARK, APLOG_ERR, rv, r->server,
-                     "disk_cache: Error opening datafile %s for URL %s",
-                     dobj->datafile, dobj->name);
+        if(rv != CACHE_EDECLINED) {
+            ap_log_error(APLOG_MARK, APLOG_ERR, rv, r->server,
+                         "disk_cache: Error opening datafile %s for URL %s",
+                         dobj->datafile, dobj->name);
+        }
         return rv;
     }
@@ -1922,14 +1935,12 @@ static apr_status_t store_body(cache_han
 
     /* All checks were fine, close output file */
     rv = apr_file_close(dobj->fd);
+    dobj->fd = NULL;
     if(rv != APR_SUCCESS) {
        file_cache_errorcleanup(dobj, r);
        return rv;
     }
 
-    ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
-                 "disk_cache: Body for URL %s cached.", dobj->name);
-
     /* Redirect to cachefile if we copied a plain file */
     if(copy_file) {
         rv = replace_brigade_with_cache(h, r, bb);
[PATCH] mod_disk_cache background copy
This patch implements copying a file in the background so the client initiating the caching can get the file delivered by read-while-caching instead of having to wait for the file to finish. I'll attach it to bug #39380 as well, with less comments. The method used here is rather crude, but works well enough in practice. It should suffice as a first step of implementing this functionality. Known missing features: * Documentation for the CacheMinBGSize parameter, the minimum file size to do background caching. Typically set to what your backend can deliver in approx 250ms at normal load (given a 200ms sleep loop). * It doesn't set the stacksize for the background thread; it made stuff unloadable on AIX, which probably means some symbol is missing in an export table somewhere. * Testing of the forked variation. This has only had testing with the worker MPM on Unix. Known areas of possible improvements: * Figure out why the cleanup-function isn't run before the fd's are closed so the private pool can be removed. * I suppose it's possible to use cross-thread fd's with some setaside-magic instead of opening new fd's in the bgcopy thread. * Experiment with a separate copy-files-thread spawned at initialization for threaded environments. * The forked thingie could probably use a few cleanups. In practice I don't think those improvements will give much in terms of performance but it sure would be more elegant :) /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- It's funny how the Earth never opens up and swallows you when you want it to.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

--- mod_disk_cache.c.ls-rwc-fixups      2006-10-08 19:17:31.000000000 +0200
+++ mod_disk_cache.c    2006-10-08 19:11:40.000000000 +0200
@@ -22,6 +22,8 @@
 #include "util_filter.h"
 #include "util_script.h"
 #include "util_charset.h"
+#include "ap_mpm.h"
+
 
 /*
  * mod_disk_cache: Disk Based HTTP 1.1 Cache.
@@ -1677,6 +1679,272 @@ static apr_status_t copy_body(apr_pool_t
 }
 
 
+/* Provide srcfile and srcinfo containing
+   APR_FINFO_INODE|APR_FINFO_MTIME to make sure we have opened the right file
+   (someone might have just replaced it which messes up things).
+*/
+static apr_status_t copy_body_nofd(apr_pool_t *p, const char *srcfile,
+                                   apr_off_t srcoff, apr_finfo_t *srcinfo,
+                                   const char *destfile, apr_off_t destoff,
+                                   apr_off_t len)
+{
+    apr_status_t rc;
+    apr_file_t *srcfd, *destfd;
+    apr_finfo_t finfo;
+
+    rc = apr_file_open(&srcfd, srcfile, APR_READ | APR_BINARY, 0, p);
+    if(rc != APR_SUCCESS) {
+        return rc;
+    }
+    rc = apr_file_info_get(&finfo, APR_FINFO_INODE|APR_FINFO_MTIME, srcfd);
+    if(rc != APR_SUCCESS) {
+        return rc;
+    }
+    if(srcinfo->inode != finfo.inode || srcinfo->mtime < finfo.mtime) {
+        return APR_EGENERAL;
+    }
+
+    rc = apr_file_open(&destfd, destfile, APR_WRITE | APR_BINARY, 0, p);
+    if(rc != APR_SUCCESS) {
+        return rc;
+    }
+
+    rc = copy_body(p, srcfd, srcoff, destfd, destoff, len);
+    apr_file_close(srcfd);
+    if(rc != APR_SUCCESS) {
+        apr_file_close(destfd);
+        return rc;
+    }
+
+    return apr_file_close(destfd);
+}
+
+
+#if APR_HAS_THREADS
+static apr_status_t bgcopy_thread_cleanup(void *data)
+{
+    copyinfo *ci = data;
+    apr_status_t rc, ret;
+    apr_pool_t *p;
+
+    /* FIXME: Debug */
+    ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, ci->s,
+                 "disk_cache: bgcopy_thread_cleanup: %s -> %s",
+                 ci->srcfile, ci->destfile);
+
+    rc = apr_thread_join(&ret, ci->t);
+    if(rc != APR_SUCCESS) {
+        ap_log_error(APLOG_MARK, APLOG_ERR, rc, ci->s,
+                     "disk_cache: bgcopy_thread_cleanup: apr_thread_join "
+                     "failed %s -> %s", ci->srcfile, ci->destfile);
+        return rc;
+    }
+    if(ret != APR_SUCCESS) {
+        ap_log_error(APLOG_MARK, APLOG_ERR, ret, ci->s,
+                     "disk_cache: Background caching body %s -> %s failed",
+                     ci->srcfile, ci->destfile);
+    }
+
+    /* FIXME: Debug */
+    ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, ci->s,
+                 "disk_cache: bgcopy_thread_cleanup: SUCCESS %s -> %s",
+                 ci->srcfile, ci->destfile);
+
+    /* Destroy our private pool */
+    p = ci->pool;
+    apr_pool_destroy(p);
+
+    return APR_SUCCESS;
+}
+
+
+static void *bgcopy_thread(apr_thread_t *t, void *data)
+{
+    copyinfo *ci = data;
+    apr_pool_t *p;
+    apr_status_t rc;
+
+    p = apr_thread_pool_get(t);
+
+    /* FIXME: Debug */
+    ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, ci->s,
+                 "disk_cache: bgcopy_thread: start %s -> %s",
+                 ci->srcfile, ci->destfile
Re: [PATCH] mod_disk_cache background copy
On Wed, 11 Oct 2006, Graham Leggett wrote: This patch implements copying a file in the background so the client initiating the caching can get the file delivered by read-while-caching instead of having to wait for the file to finish. Something that Joe Orton raised, and that I've been looking into in more detail. The copy_body function currently only supports file buckets, which specifically excludes buckets generated by say mod_proxy, or mod_cgi. From my testing, for these non file buckets, the response is downloaded and cached fully, then the client gets fed data. Initially I understood this as an optimisation specific to files, assuming that file buckets were the only buckets that could potentially exceed available RAM, but the case where non file buckets are present is currently unhandled. I don't have enough knowledge of httpd internals to be sure, but don't the data-generating types insert flush buckets in the stream to avoid this? That said, mod_disk_cache seems to be totally unaware of flush buckets, so I'm either barking up the wrong tree, it's handled on a higher level, or it isn't handled. In theory, the copy body should be able to read from any brigade, rather than just a file brigade, in such a way that it doesn't try and load 4.7GB into RAM at once for file buckets. The original reason for copy_body() was to have something that could be used in a background thread, and the only thing I'm sure can be copied in the background is plain files. Everything else must be handled the old way. I am jetlagged right now and can't think straight any more today, will carry on looking at this tomorrow :) OK. I'll be away for a week or so and might lag quite a bit in replying to stuff. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Huh? What? Am I on-line? =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: mod_disk_cache summarization
On Mon, 23 Oct 2006, Graham Leggett wrote: Was busy cleaning up some other odds and ends, will be back on the cache code again shortly. I'm awaiting the verdict on how to resolve the "lead request hangs" problem before I submit more patches; I feel it's important enough to be solved before I start submitting fixes/improvements to the following items for mod_disk_cache: * On-disk header fixes to not break when moving between 32/64 bit builds; include the filename so we can fill in r->filename so %f in LogFormat works. * More assorted small cleanups (mostly error handling). * Allow disk cache to realise that a (large) file is the same regardless of which URL is used to access it. Reduces cache disk usage a lot for sites like ours that's known as ftp.acc.umu.se, ftp.se.debian.org, ftp.gnome.org, se.releases.ubuntu.com, releases.mozilla.org and so on. * Add option to not try to remove cache directories in the cache structure. IMHO, this should never be needed since the cache directory should not be excessively deep (which the broken defaults lead to). Davi had a fix for the cache dir layout I think, and I personally think that neither mod_disk_cache nor htcacheclean should do rmdir. * Eventually add option to have header and body in the same cachefile. * Probably more stuff that I don't remember without looking in the jumbopatch. Also, I suspect that there is documentation that needs to be updated, more than just new options. While working with this I have understood that there are two rather different uses for mod_disk_cache: either as a cache in a proxy, or as a way to make an FTP-server frontend reduce the load of its file server backend. For the FTP-server frontend usage we see the following characteristics: large files, relatively few requests/s. It's important to keep files that are frequently accessed in cache (they might be large), hence have the cache filesystem mounted with atime and clean the cache based on atime.
This works nicely for us using XFS, and cleaning by atime is much quicker and uses fewer resources than htcacheclean. Others here are more clued in on the proxy-cache use case, but as I understand it the keywords are many small files, many requests/s, so you need to mount with noatime and use htcacheclean. Tuning tips in the documentation for these rather different cases would probably be appreciated. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- All this will be for nothing unless we go to the stars : Babylon 5 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: mod_disk_cache summarization
On Tue, 24 Oct 2006, Graham Leggett wrote: * Allow disk cache to realise that a (large) file is the same regardless of which URL is used to access it. Reduces cache disk usage a lot for sites like ours that's known as ftp.acc.umu.se, ftp.se.debian.org, ftp.gnome.org, se.releases.ubuntu.com, releases.mozilla.org and so on. Perhaps this could be as simple as using ServerName and ServerAlias (unless the name of the site is part of the URL, which will happen in the forward proxy case) to reduce the cached URL to a canonical form before storing and/or retrieving from the cache. We have a few different servernames depending on which site it's serving (needs to cater for official download locations and so on) so I guess that won't help much. * Add option to not try to remove cache directories in the cache structure. IMHO, this should never be needed since the cache directory should not be excessively deep (which the broken defaults lead to). Davi had a fix for the cache dir layout I think, and I personally think that neither mod_disk_cache nor htcacheclean should do rmdir. It makes sense that mod_disk_cache shouldn't do it, but perhaps it should be tunable for htcacheclean. Arguably. But if you ever need to remove directories in the cache hierarchy you should really start to wonder why they were created in the first place... * Eventually add option to have header and body in the same cachefile. Is there an advantage to this? IIRC Brian reported that a body in a separate file can take advantage of sendfile, and is as a result much faster. We use combined header/body, and sendfile works flawlessly. Linux sendfile has problems when writing to a sendfile():d file with mmap, and all sendfiles have problems with overlapping sendfile/writes. The main advantage is half the number of inodes, and that by removing one file you get rid of both the header and body. I suspect that the performance gain is minimal though.
A more formal cache cleanup process needs to be fleshed out, giving the options above both as options in code, and as documentation as you say. The comparison of your and Brian's experiences shows two extremes of high-volume caches: one low hits, large files; the other high hits, small files. This should make for some useful tuning information. The extreme difference is what makes me think that we should acknowledge that they exist and provide the relevant knobs where necessary. As it looks right now, those knobs tend to be more OS/filesystem specific, but that might change as this evolves. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Buy a 486-33 you can reboot faster.. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: mod_disk_cache summarization
On Tue, 24 Oct 2006, Joe Orton wrote: IMO: for a general purpose cache it is not appropriate to stop and try to write the entire response to the cache before serving anything. This is existing mod_disk_cache behaviour; the patches reduce these problems. Maybe not in a perfect way, but in a way good enough to show really noticeable improvements. Since improving this mess is a gradual process, you'll have to live with kludges until the optimal solution is there. The alternative would be to do a completely new perfectly designed cache, which given the time it has taken to get mod_cache/mod_disk_cache even near a usable state simply won't happen... You can't both have "we want fixes in small incremental pieces" and "this thing sucks, make it perfect at once". /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Buy a 486-33 you can reboot faster.. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: svn commit: r468373 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_cache.c modules/cache/mod_cache.h modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h modules/cache/
On Fri, 27 Oct 2006, Graham Leggett wrote: On Fri, October 27, 2006 4:38 pm, Davi Arnaut wrote: Where is pdconf ? Check out all those APR_HAS_SENDFILE. Aaargh... will fix. The purpose of that code was originally to make EnableSendfile Off in the config file work. APR_HAS_SENDFILE only tells you that APR has sendfile. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- My favorite color? Red. No, BluAHHH! =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: svn commit: r467655 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/mod_cache.xml modules/cache/mod_cache.c modules/cache/mod_cache.h
On Wed, 25 Oct 2006, Graham Leggett wrote: I managed to solve this problem last night. snip This is what this code needed: Someone with a clue on the Apache internals so stuff can be solved properly. I have said it before and say it again: I'm not that guy, but I know what functionality is needed for our usecase. People have complained at the kludges present in my patches, and yes, they were kludgy. However, they miss the big point: Despite the kludges they get the job done, with the end result being something usable for our usecase. With good performance, no less. If I can improve stuff from the state "unusable" to "actually pretty good" with kludges, then this should be a rather obvious hint that things suck and should be fixed. To just keep repeating "this is no good" probably won't achieve this. If the goal is to never accept code that isn't perfect, mod*cache never should have been committed to the httpd tree, and probably most modules (including mod_example) too. Once in a while you have to acknowledge that committed code is crap, and accept patches, albeit kludges, if they improve the situation. Otherwise you might end up with code that keeps on rotting away (mod_example is a good example, again). I would have been most happy if this had been fixed ages ago so I hadn't been forced to spend lots and lots of hours kludging stuff together. At least, my kludges seem to have sparked some development in this area, so they have served some purpose other than enabling a non-profit computer club to build an FTP/HTTP server that actually works. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- My favorite color? Red. No, BluAHHH! =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: svn commit: r468373 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_cache.c modules/cache/mod_cache.h modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h modules/cache/
On Fri, 27 Oct 2006, Graham Leggett wrote:

>> Err. We had the data in memory, we are going to read it back from disk again just in order to not block? That's nonsense.
>
> Agreed. Please explain.

This is a disk cache. Why would you write expensive bucket data to cache, and then expensive bucket data to the network? That's plain stupid. And when you have a file backend, you want to hit your disk cache and not the backend when delivering data to a client.

People might think that this doesn't matter, but for large files, especially ones larger than the RAM in your machine, you usually go disk-bound without much help from the OS disk cache. Also, httpd seems to be faster delivering data by sendfile than delivering data from memory buckets. That's more of a performance bug in httpd, though.

/Nikke

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED]
 ---
 Monolith Auto Sales Center: My God! It's full of cars!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
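The delivery path being argued for — serve the body straight from the cache file via sendfile() instead of replaying memory buckets through the filter chain — can be sketched in plain POSIX terms. httpd itself goes through APR file buckets; the function below is purely illustrative:

```c
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Sketch: stream a cached body to 'out_fd' (normally a client socket)
 * with sendfile(), so the kernel copies pages directly out of the page
 * cache instead of bouncing the data through userspace buffers. */
static ssize_t send_cached_body(int out_fd, const char *cache_path)
{
    int in_fd = open(cache_path, O_RDONLY);
    struct stat st;
    off_t off = 0;
    ssize_t total = 0;

    if (in_fd < 0 || fstat(in_fd, &st) < 0)
        return -1;

    while (off < st.st_size) {
        ssize_t n = sendfile(out_fd, in_fd, &off, st.st_size - off);
        if (n <= 0) {           /* error, or would-block on a socket */
            total = -1;
            break;
        }
        total += n;
    }
    close(in_fd);
    return total;
}
```

On Linux the copy goes page cache to destination descriptor in kernel space, which is why serving the cached file can beat re-sending heap buckets even though the data was "already in memory".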
Re: mod_disk_cache summarization
On Tue, 24 Oct 2006, Graham Leggett wrote:

> On Tue, October 24, 2006 2:48 pm, Niklas Edmundsson wrote:
>
> Perhaps this could be as simple as using ServerName and ServerAlias (unless the name of the site is part of the URL, which will happen in the forward proxy case) to reduce the cached URL to a canonical form before storing and/or retrieving from the cache.

We have a few different servernames depending on which site it's serving (needs to cater for official download locations and so on), so I guess that won't help much.

> How is it configured? Is this in a virtual host like so?
>
>   <VirtualHost ip.address:port>
>       ServerName ftp.gnome.se
>       ServerAlias ftp.somewhere.else
>       ServerAlias ftp.whatever
>       ...
>   </VirtualHost>
>
> If the URLs change (i.e. the directories are different) then it's a different story.

Different VHosts meaning different URLs/directories, pointing to the same files...

/Nikke

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED]
 ---
 Colleges don't make fools; they only develop them
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: cache: the store_body interface
On Mon, 30 Oct 2006, Ruediger Pluem wrote:

> BTW: Does anybody know if MMAP for writing files is possible / makes sense / improves performance?

It reduces some data copying, so it's a tad cheaper to mmap. But on Linux you can't do sendfile from a file that's being written to with mmap, and since I wanted to be able to do read-while-caching I dropped the mmap-write idea, since the drawbacks were way larger than the benefits. YMMV.

/Nikke

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED]
 ---
 DIME: A dollar after taxes.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
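For reference, writing a cache file through a shared mapping looks like this in plain POSIX terms (an illustrative sketch, not httpd code — note the Linux caveat above about sendfile() from a file with writable mappings):

```c
#include <sys/mman.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Sketch: write 'len' bytes into a file via a shared writable mapping.
 * The memcpy() lands directly in the page cache; a plain write() would
 * first copy from the user buffer into kernel space. */
static int mmap_write_file(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) < 0) {   /* size the file up front */
        close(fd);
        return -1;
    }
    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memcpy(map, data, len);    /* the actual "write" */
    munmap(map, len);          /* dirty pages stay in the page cache */
    close(fd);
    return 0;
}
```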
Re: mod_disk_cache and mod_include bugs and suggestions
On Mon, 15 Jan 2007, Graham Leggett wrote:

> In order for caches to work, the Last-Modified or ETag headers need to be set correctly on the content, and this isn't always the case. When this happens, content isn't cached.

Another module with this problem is the mod_dir directory index generator, whose output also isn't cacheable, for the same reasons SSI output isn't. Modern httpd releases can work around this if you set "IndexOptions TrackModified"; see the docs for more info and limitations.

/Nikke

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED]
 ---
 ODOSCAN.EXE - Gets the Quaraks out of your Hard Drive!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
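The workaround mentioned is a one-line configuration change (the directory path here is just an example):

```apache
# Make mod_autoindex derive a Last-Modified header from the newest
# entry in the directory, so generated indexes become cacheable.
<Directory "/srv/ftp">
    IndexOptions TrackModified
</Directory>
```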
Re: Solved: mod_disk_cache and mod_include bugs and suggestions
On Wed, 17 Jan 2007, Giuliano Gavazzi wrote:

> I have a solution for the r470455 mod_disk_cache not caching SSI. There are two points where the module seems incorrect to me, changing those makes it work:

Since you're talking about the code on trunk, you should be warned that the current state is somewhat unreliable due to merging patches which then ran into an implementation discussion that never got solved (I think). Last I heard, the current plan is to revoke most patches and redo stuff. However, since I'm the one to blame for the patches that have been partially landed on trunk (which are the parts you touch) I can provide my comments on your solutions (and I hope that others can chime in where I'm wrong).

First, don't reindent code when not needed. That only serves to make your patch hard to read.

> 1) in store_body the condition (!APR_BUCKET_IS_EOS(APR_BRIGADE_LAST(bb))) was incorrectly stopping the flow from ever going past (for static and dynamic pages). I moved it, changing the condition. I will post the patch tomorrow.

From looking at the patch I can only say "huh?". The brigade is complete when EOS is present, and only then can you complete the storing procedure. From a quick look at your patch I can't see how it could change things (instead of dropping out if not EOS, you have a big if-chunk if it indeed is EOS, only adding an indentation level). I might have missed some detail, but it's not obvious from the hard-to-read patch...

> 2) in store_disk_headers nothing should happen (well, it should just return or never be called) if dobj->initial_size < 0.

It should be called, and it should do stuff. One of the points of those patches is to solve the thundering herd problem, simply described as: when a frequently accessed object expires, all accesses are served directly by your backend until one access has completed successfully and the cache has been able to store it. This is Bad if it causes your backend to grind to a halt.
To avoid this, the header is always written when the cache thinks it should cache something. Other requests will find this header; if the size is unknown they will wait until it's updated with the correct size, otherwise they will do read-while-caching and return the contents as the file is being cached.

> Those two changes make the header cache file store the correct resource size also for dynamic pages.

It stores the size, but doing so it breaks quite a few things. I think it would be best if someone (Graham?) could revoke the status of mod_disk_cache on trunk to the agreed last good status, which is essentially the same as 2.2.4 if I remember correctly. As for your problems, I would recommend staying on 2.2.4 proper and looking further into the issue of Expires/Last-Modified headers.

/Nikke

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED]
 ---
 And tomorrow will be like today, only more so.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
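The read-while-caching scheme described above amounts to a reader that tolerates a body file still growing under it. A simplified POSIX sketch — the real code uses APR and the size recorded in the on-disk header, and the polling interval here is arbitrary:

```c
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

/* Sketch: copy 'total' bytes from a cache file to 'out_fd', tolerating
 * a file that is still being appended to by the caching request.
 * Hitting EOF before 'total' bytes just means the writer is behind,
 * so back off briefly and retry instead of giving up. */
static int read_while_caching(int out_fd, const char *path, off_t total)
{
    int fd = open(path, O_RDONLY);
    off_t done = 0;
    char buf[8192];

    if (fd < 0)
        return -1;
    while (done < total) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0) {           /* writer hasn't caught up yet */
            usleep(10000);      /* crude; real code would cap the wait */
            continue;
        }
        if (write(out_fd, buf, (size_t)n) != n) {
            close(fd);
            return -1;
        }
        done += n;
    }
    close(fd);
    return 0;
}
```

When the total size is not yet known, the reader first has to wait for the header to be rewritten with the final size — which is why store_disk_headers must run even for responses of unknown length.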
mod_disk_cache jumbopatch - new revision
I uploaded a new version of our mod_disk_cache jumbopatch for httpd 2.2.4 to http://issues.apache.org/bugzilla/show_bug.cgi?id=39380

It's what we've been using for a couple of months now (modulo the upgrade to httpd 2.2.4) and should be considered fairly stable. It has survived all sorts of pathetic load-cases on http://ftp.acc.umu.se/ (also known as ftp.se.debian.org, ftp.gnome.org, se.releases.ubuntu.com, se.archive.ubuntu.com, releases.mozilla.org), including our NFS backend going berserk and bottoming out at a few MB/s when all frontends wanted to cache 300GB of new debian-weekly-build ISOs.

Highlights since the previous patch:

* Reverted to separate files for header and data; there were too many corner cases, and having the data file separate allows us to reuse the cached data for other purposes (for example rsync).

* Fixed on-disk headers to be stored in an easily machine-parseable format which allows for error checking, instead of a human-readable form that doesn't.

* Attaching the background thread to the connection pool instead of the request pool allows restarts to work; the thing doesn't crash when you do "apachectl graceful" anymore :)

* Lots of error handling fixes and corner cases; we triggered most of them when our backend went berserk-go-slow-mode.

* Deletes cached files when the cache decides that the object is stale for real; previously it only NULL:ed the data structure in memory, causing other requests to read headers etc.

Not mentioned in bugzilla, this is probably also relevant:

* The cache-file path for headers is hashed on the URL, for bodies on r->filename if present. This allows using the same cache with external programs (for example rsync).

For those interested in using the same cache for rsync, we have whipped up an open-wrapper (uses LD_PRELOAD) which seems to be doing the job nicely. It can't cache as much metadata as mod_disk_cache, but it is able to reuse the cached bodies at least, which is a good thing if you have a lot of client sites that rsync the same trees daily.
We're awaiting some progress on mod_ftp to be able to cache FTP too; all usable ftpd's we have seen use chroot(), which causes trouble when trying to wrap open() and friends to access files outside the chroot ;)

/Nikke

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED]
 ---
 What? Hey. Beverly. - Picard
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
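The hashed cache-path layout mentioned above can be sketched like this. The hash, the directory fan-out and the paths are invented for illustration — the real patch derives its paths from a proper digest of the key:

```c
#include <stdio.h>
#include <stdint.h>

/* Toy FNV-1a hash standing in for the real digest. */
static uint64_t fnv1a(const char *s)
{
    uint64_t h = 1469598103934665603ULL;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

/* Sketch: build a cache-file path from a key.  Headers would be keyed
 * on the URL and bodies on r->filename, so an external program (rsync
 * via an open() wrapper, say) can find a body by hashing the local
 * path the same way. */
static void cache_path(const char *root, const char *key,
                       char *buf, size_t len)
{
    uint64_t h = fnv1a(key);
    /* Two levels of subdirectories keep directory sizes sane. */
    snprintf(buf, len, "%s/%02x/%02x/%016llx", root,
             (unsigned)(h & 0xff), (unsigned)((h >> 8) & 0xff),
             (unsigned long long)h);
}
```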
Re: Solved: mod_disk_cache and mod_include bugs and suggestions
On Wed, 17 Jan 2007, Giuliano Gavazzi wrote:

> rv = apr_file_seek(dobj->hfd, APR_SET, &off); does not rewind if the file has been opened with APR_FOPEN_BUFFERED. Now, I

This is an APR bug; I submitted a bug report for it a while ago. I worked around it by not using buffering at all and writing larger chunks when writing headers.

> There is another potential problem in store_headers, if the headers file is

Is this in trunk or in 2.2.4 proper? You should probably ignore mod_disk_cache on trunk until that situation is settled. If you're into trying patches you could give my mod_disk_cache jumbopatch a spin; note however that it's only been tested for mostly static content (directory indexes being an exception). You can find the patch at http://issues.apache.org/bugzilla/show_bug.cgi?id=39380 ...

/Nikke

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED]
 ---
 Preserve wildlife... pickle a squirrel.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: Solved: mod_disk_cache and mod_include bugs and suggestions
On Wed, 17 Jan 2007, Giuliano Gavazzi wrote:

>> Is this in trunk or in 2.2.4 proper? You should probably ignore mod_disk_cache on trunk until that situation is settled.
>
> I could work on mod_disk_cache from 2.2.4 proper, and find what causes the bug with SSI pages, but I do not see why I should spend another couple of days on it now that I have fixed the r470455 (that is trunk) release. After all the situation might settle quicklier (I think Shakespeare used this...) if I submit my patches! And I suppose now you agree that the extra indentation on my patch stemmed from the broken nature of the original code!

Actually I don't. Either you or I have misunderstood how buckets work, since the rest of the code should syntactically be equivalent. Or I'm missing some fine detail somewhere. Anyhow, since the code on trunk probably is ReallyBroken(tm) I wouldn't waste my time on it until the situation has been cleared up.

>> If you're into trying patches you could give my mod_disk_cache jumbopatch a spin, note however that it's only been tested for mostly static content (directory indexes being an exception). You can find the patch at http://issues.apache.org/bugzilla/show_bug.cgi?id=39380 ...
>
> I think I will give it a spin, more to give you feedback on possible issues with SSI.

Do that.

/Nikke

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED]
 ---
 I think he did a little too much LDS. - Kirk
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: Solved: mod_disk_cache and mod_include bugs and suggestions
On Thu, 18 Jan 2007, Giuliano Gavazzi wrote:

>> Actually I don't. Either you or I have misunderstood how buckets work, since the rest of the code should syntactically be equivalent. Or I'm missing some fine detail somewhere.
>
> Perhaps I do not understand buckets fully (and brigades), but this seems to be clear enough. The fine detail is in the original code (sorry for repeating myself): while (e != APR_BRIGADE_SENTINEL(bb)) {

Ugh. I see it now. The version on trunk has a different form of solution to the read-while-caching problem than my patch, and that solution depends on other stuff in trunk. If I remember correctly you grafted the trunk version onto 2.2.4, and that's bound to fail. Either test trunk, or 2.2.4. Don't mix files freely between them and expect stuff to work ;)

> I have also tested your patch (httpd-2.2.4-mod_disk_cache-jumbo20070117.patch) and in my limited test it works for SSI, but does not seem to be less prone than my patch of r470455 to hammering of the back-end. It is actually a tad worse. A test on localhost with an SSI calling this script:
>
>   #!/bin/sh
>   echo `date` >> foo.log
>   sleep 10
>   echo bar
>
> with:
>
>   /usr/local/apache2/bin/ab -c 10 -n 20 URL
>
> gives 13 calls to the backend with yours and 12 with mine. 18 failures out of 20 (for length) in yours, and no failures in mine. Actually, it seems that yours confuses ab, as it reported a length 2 bytes short, and not corresponding to the one in the header file. The throughput is about the same.

What's your update timeout? If you have a sleep 10 in the script, you'll need an update timeout longer than that or you'll always fail. It shouldn't report different lengths, though. Enable debug logging in httpd and review the debug log in order to find out exactly where it falls short.

Regarding it hitting the backend many times, that's probably due to the small window between "I have no cached copy, I need to cache it so I let it travel along the filter chain" and "I have stuff to write, let's create a cache file".
ab hits the page at exactly the same time, so it will trigger this. My patches try hard to detect when it's happening, and only one instance will do the actual caching, but since I haven't looked at the particular issues with dynamic content the code tends to lean towards correctness (old behaviour) rather than performance. It replaces the brigade with an instance of the cached file when it detects that the object is already being cached. For stuff with unknown size (usually dynamic content) it can't do this, so it's bound to hit your backend. I have no clue how to solve this with the current cache design, but I'm sure there are more clued people here when it comes to caching and dynamic content.

/Nikke

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED]
 ---
 Want to forget all your troubles? Wear tight shoes.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
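One classic way to shrink that window is to let the filesystem elect a single caching request with O_CREAT|O_EXCL: whoever creates the header file first wins and caches, everyone else declines. This is a sketch of the idea, not the actual mod_disk_cache code:

```c
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>

/* Sketch: atomically elect one caching request per URL.  Returns an
 * fd if this request won the right to cache the object, or -1 if
 * another request is already caching it (EEXIST) or on real errors. */
static int try_become_cacher(const char *header_path)
{
    int fd = open(header_path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0 && errno == EEXIST)
        return -1;   /* someone else is caching; serve from backend/cache */
    return fd;
}
```

The open() either creates the file or fails atomically, so two simultaneous requests can never both think they are the cacher — no lock files or mutexes needed.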
mod_cache: save filter recalls body to non-empty brigade?
In mod_cache, recall_body() is called in cache_save_filter() when revalidating an entity. However, if I have understood things correctly, the brigade is already populated when the save filter is called, so calling recall_body() in this case would place additional stuff in the bucket brigade. Wouldn't it be more correct to empty the brigade before calling recall_body()? Or am I missing something? This is mod_cache in vanilla httpd 2.2.4, by the way.

/Nikke

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED]
 ---
 9 out of 10 priests prefer young boys to Doom.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: mod_cache: save filter recalls body to non-empty brigade?
On Wed, 24 Jan 2007, Graham Leggett wrote:

> On Wed, January 24, 2007 2:15 pm, Niklas Edmundsson wrote:
>> In mod_cache, recall_body() is called in the cache_save_filter() when revalidating an entity. However, if I have understood things correctly the brigade is already populated when the save filter is called, so calling recall_body() in this case would place additional stuff in the bucket brigade. Wouldn't it be more correct to empty the brigade before calling recall_body()? Or am I missing something?
>
> I think the theory is that recall_body() should only be called on a 304 not modified (with no body), so in theory there is no existing body present, so no need to clear the brigade.

Ah. Then it makes sense. I only saw that it checked if status == OK, but I see now that I was looking at the wrong status value ;)

> Of course practically you don't want to make assumptions about the emptiness of the existing brigade, so clearing the brigade as a first step makes definite sense.

OK. Do you want a patch for it, or will you fix it yourself? The cache situation on trunk isn't completely clear, so maybe the patches that should be revoked from there should be cleaned up first...

/Nikke

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED]
 ---
 Reality--what a concept!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: mod_cache: save filter recalls body to non-empty brigade?
On Wed, 24 Jan 2007, Plüm, Rüdiger, VF EITO wrote:

>> Of course practically you don't want to make assumptions about the emptiness of the existing brigade, so clearing the brigade as a first step makes definite sense.
>
> It is not needed to clear the brigade, because the brigade passed to the filter is named "in", and the one where recall_body stores the cached file is "bb". In the case of a recalled body we pass bb down the chain, not in.

Ah, of course.

/Nikke

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED]
 ---
 Air pollution is a mist demeanor.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: 3.0 - Proposed Goals
On Wed, 14 Feb 2007, Garrett Rooney wrote:

>> - Rewrite how Brigades, Buckets and filters work. Possibly replace them with other models. I haven't been able to personally consolidate my thoughts on how to 'fix' filters, but I am sure we can have plenty of long threads about it :-)
>
> I think a big part of this should be documenting how filters are supposed to interact with the rest of the system. Right now it seems to be very much a "well, I looked at this other module and did what it did", and it's quite easy to start depending on behavior in the system that isn't actually documented to work that way.

This hits a rather sweet spot, it seems. Browsing the current httpd module/developer docco I find gems like http://httpd.apache.org/docs/2.2/developer/modules.html — one would think that now that 2.2 is released, at least the 1.3-to-2.0 converting docco would have evolved into something better than "it's a start"... Also, we have http://httpd.apache.org/docs/2.2/developer/API.html ... It seems that the most current API docco is for 1.3, but at least there's a nice disclaimer telling you that it's obsolete but some information might be correct.

So yes, I fully agree that documentation is needed. It's a pain trying to figure out how stuff works (or is supposed to work) when the docco is two major releases behind... One problem here is that this kind of docco usually needs to be written by those who hate to write it: the core programmers.

/Nikke

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED]
 ---
 I should have done this a long time ago. - Picard
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: 3.0 - Proposed Goals
On Wed, 14 Feb 2007, Nick Kew wrote:

> On Wed, 14 Feb 2007 15:41:38 +0100 (MET) Niklas Edmundsson [EMAIL PROTECTED] wrote:
>> One problem here is that this kind of docco usually needs to be made by those who hate to write it: the core programmers.
>
> The core programmers use the core programmer documentation, aka the source code. In particular, the .h files, which give you detailed API documentation.

Mkay. However, the source and header files aren't very good in the "how it's supposed to work" department. You usually end up looking at a module that implements stuff the wrong way. mod_example might be the ultimate example of this ;)

> For higher-level documentation of Apache 2.2, follow my .sig.

Remove the stale docco and point there from the httpd website, then?

/Nikke

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED]
 ---
 A bird in the bush can't mess in your hand!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Please backport mod_cache PR 41475 to 2.2.5 ...
Hi! I might be jumping the gun here, but I'd really like to see the fix for PR 41475 backported to 2.2.5. We're hitting this issue when mirroring the firefox installer which has a space in the filename... We'll probably apply the fix locally, but it would be nice to have the mod_cache fixes in 2.2.5 so we don't have to keep track of them when upgrading... /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- I am NOT a computer nerd! I am a techno-weenie. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: [RFC] Guide to writing output filters
On Sat, 17 Mar 2007, Ruediger Pluem wrote:

> On 03/16/2007 11:55 PM, Joe Orton wrote:
>> http://people.apache.org/~jorton/output-filters.html
>>
>> How does this look? Anything missed out, anything that doesn't make sense? I think this covers most of the major problems in output filters which keep coming up.
>
> Thanks for doing this. It looks very good to me, especially as it gives us a set of rules and best practices, even though I think there might be a discussion on the details.

As a not-so-clued person on httpd internals I have to whole-heartedly agree and add a "Bravo!" to this effort. httpd is seriously lacking on the devel-docco front, meaning that the little in-tree documentation and the examples that exist are generally outdated or broken, and out-of-tree docco doesn't count in this regard. This is truly a step in the right direction.

Now, if only someone clued could have a go at the existing pages that say "this should be improved/updated/written", life would be bliss :) And yes, I know that writing documentation is a drag. However, in the long run it pays off. Really.

/Nikke

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED]
 ---
 Is virus a 'micro' organism?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
mod_ftp, status and progress?
Hi all!

What is the current status/progress on mod_ftp? I haven't seen much on [EMAIL PROTECTED] about it since the graduation...

In any case, we'd really like to get mod_ftp into a usable state so we can use it on our anonftp frontends. We currently use vsftpd and are really happy with it, but we would REALLY want to have the cache handled by mod_cache used by FTP too. We have already convinced rsync into using the cache via an LD_PRELOAD hack; unfortunately this doesn't work too well with ftpd's, since they rely on chroot() to work. So, how close is mod_ftp to handling this for us?

I'll list the requirements we have and comment on them, and hopefully the Really Clued Ones will chime in with additional comments/status/etc. In order to use mod_ftp on ftp.acc.umu.se (which runs httpd 2.2.4 with our mod_disk_cache jumbopatch) we need it to:

* Play well with mod_cache; if a file has been requested with HTTP, an FTP request should reuse the cached copy. Last time I checked, mod_ftp only did subrequests, which mod_cache didn't act on. Of course files requested with FTP and thus cached should cause an HTTP request to use the cached copy (although with a revalidation to get current headers, I guess). I think this won't work too well with vanilla mod_disk_cache; however, our mod_disk_cache jumbo patch caches the body in a hash based on r->filename to solve the name-space issues.

* Only anonymous read-only access is required. I think this is working today.

* Download-related items like file listings, continuation, etc MUST work.

* Both passive and active mode MUST work. I think there were some issues causing it to always use 0.0.0.0 in passive mode last time I checked.

* Large file support MUST work (we serve DVD images). Last time I checked there was a whole slew of LFS issues with the mod_ftp globbing code, which was simply broken since it didn't use the information gathered by configure and didn't use APR - it should be ripped out and replaced with APR stuff instead, IMHO.
It makes no sense to keep that mess just to keep httpd 2.0 compatibility...

* IPv6 MUST work. I think this is being addressed.

* Probably something that I forgot, so it can't be that important ;)

I (and other fellow computer club people) would be happy to hack on small bugs and issues, but the more hairy things like the mod_ftp/mod_cache interaction and the globbing mess really need a Clued Httpd Developer sorting out the various odds and ends.

/Nikke

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED]
 ---
 PRIME DIRECTIVE, MY ASS! Phasers on maximum! Load photon torpedoes!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
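The LD_PRELOAD open-wrapper mentioned above can be sketched as follows. This is an illustrative reconstruction, not the actual ACC wrapper: cache_lookup(), its path scheme and the /var/cache location are all invented, and, as noted, the trick cannot follow an ftpd into its chroot():

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical cache lookup: map a requested path to where a cached
 * body might live.  A real wrapper would hash the path the same way
 * the mod_disk_cache patch does and check that the body exists. */
static const char *cache_lookup(const char *path, char *buf, size_t len)
{
    if (strncmp(path, "/ftp/", 5) == 0) {
        snprintf(buf, len, "/var/cache/httpd/body/%s", path + 5);
        return buf;
    }
    return NULL;                /* not something we cache */
}

/* Interposed open(): try the cache first for read-only opens, then
 * fall back to the real file.  Compiled into a shared object and
 * LD_PRELOADed, this overrides libc's open() for dynamically linked
 * programs such as rsync. */
int open(const char *path, int flags, ...)
{
    static int (*real_open)(const char *, int, ...);
    char buf[4096];
    const char *cached;
    int mode = 0;

    if (!real_open)
        real_open = (int (*)(const char *, int, ...))
                        dlsym(RTLD_NEXT, "open");

    if (flags & O_CREAT) {      /* the mode arg only exists with O_CREAT */
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, int);
        va_end(ap);
    }

    cached = ((flags & O_ACCMODE) == O_RDONLY)
                 ? cache_lookup(path, buf, sizeof(buf)) : NULL;
    if (cached) {
        int fd = real_open(cached, flags);
        if (fd >= 0)
            return fd;          /* cache hit */
    }
    return real_open(path, flags, mode);
}
```

dlsym(RTLD_NEXT, "open") resolves to the next definition after the preloaded object, i.e. the real libc open(), so uncached paths behave exactly as before.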
Re: mod_ftp, status and progress?
On Fri, 23 Mar 2007, William A. Rowe, Jr. wrote:

>> * Play well with mod_cache; if a file has been requested with HTTP, an FTP request should reuse the cached copy. Last time I checked, mod_ftp only did subrequests, which mod_cache didn't act on.
>
> In terms of using 'top level' requests in lieu of subrequests, it's not low hanging fruit but definitely worth the refactoring. Doing this against httpd trunk/ will show up the APIs that httpd is missing for providers of resource-based servers such as ftp.

OK. This will need more investigation then; the easiest solution would seem to be to get the subrequest interaction with mod_cache right. Bright insights are welcome.

>> * Both passive and active mode MUST work. I think there were some issues causing it to always use 0.0.0.0 in passive mode last time I checked.
>
> fixed afaik, unless you are on win2000 and didn't DisableWin32AcceptEx in the 2.2.4 release (apr 1.2.8). The fix is in trunk and will percolate out as apr 1.2.9 (or later) with 2.2.5.

Nice.

>> * Large file support MUST work (we serve DVD images). Last time I checked there was a whole slew of LFS issues with the mod_ftp globbing code, which was simply broken since it didn't use the information gathered by configure and didn't use APR - it should be ripped out and replaced with APR stuff instead, IMHO. It makes no sense to keep that mess just to keep httpd 2.0 compatibility...
>
> Patches welcome, yes this needs some refactoring.

Any thoughts on how to do this? My mind tends to be focused on what needs to work for anonftp in this regard, and that means naively listing a directory without any thoughts on file permissions and such. If a file/directory is within the anonftp tree it's OK to include it in the listing. However, if there's some special care that needs to be taken when supporting file uploads (only listing directories which the auth:ed user has access to, and other special stuff) I will probably miss this unless someone clued does the high level design.

>> * IPv6 MUST work.
>> I think this is being addressed.
>
> I'd fixed the traditional interfaces (PORT/PASV) but we need to hack together EPRT/EPSV support, yet.

OK. This shouldn't be too hard, given that EPRT/EPSV don't differ too much from PORT/PASV.

/Nikke

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED]
 ---
 It is always darkest before it goes totally black.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: PATCH 19824 -- enhancement to mod_expires
On Sat, 31 Mar 2007, Jeffrey Friedl wrote:

> so that images are cached essentially forever, but this means that they can not reasonably be updated in place. However, with this patch, you might use "ExpiresByType image/jpeg aged 2 days THEN 10 years ELSE 1 hour" to allow for some initial tweaking.

I think it would make more sense to use the same behaviour as mod_cache instead of having hard-coded expire times when it comes to entities which have a Last-Modified header, i.e. newly modified entities get a short expiry while stuff not changed for a while gets a long one.

/Nikke

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED]
 ---
 Where will YOU be when your laxative starts working?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
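For reference, the mod_cache behaviour referred to is its heuristic freshness calculation: when no explicit expiry information is present, an entry stays fresh for CacheLastModifiedFactor times its current age (the factor defaults to 0.1), so long-unchanged objects earn long expiries. A minimal sketch of the arithmetic:

```c
#include <time.h>

/* Sketch of mod_cache's heuristic freshness: with no Expires/max-age,
 * let the entry stay fresh for 'factor' times its current age, where
 * age = now - Last-Modified.  Freshly changed objects expire fast,
 * stable ones slowly. */
static time_t heuristic_expiry(time_t now, time_t last_modified,
                               double factor)
{
    double age = difftime(now, last_modified);
    if (age < 0)
        age = 0;                /* clock skew: treat as brand new */
    return now + (time_t)(age * factor);
}
```

With the default factor of 0.1, an object untouched for ten days is considered fresh for one more day, which gives roughly the "aged" behaviour the patch hard-codes, without any per-type tuning.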
Re: mod_cache: 304 on HEAD (bug 41230)
On Wed, 11 Apr 2007, Niklas Edmundsson wrote:

> Would the correct fix be to check for r->header_only in cache_select(), or is there even more funky stuff going on? You don't want the cached object to be removed just because you got a HEAD request when it really isn't stale but just in need of revalidation. Ideally the HEAD request would cause the object to be revalidated if possible, but we can live with HEAD requests just doing fallback without touching the cache. I can whip up a patch for it, but I suspect that you guys are more clued on the deep magic involved :)

Looking a bit further, I think that something like this would actually be enough:

---8<---
--- mod_cache.c.orig    2007-04-11 13:29:14.000000000 +0200
+++ mod_cache.c 2007-04-11 14:06:29.000000000 +0200
@@ -456,7 +456,7 @@ static int cache_save_filter(ap_filter_t
          */
         reason = "No Last-Modified, Etag, or Expires headers";
     }
-    else if (r->header_only) {
+    else if (r->header_only && !cache->stale_handle) {
         /* HEAD requests */
         reason = "HTTP HEAD request";
     }
@@ -589,11 +589,12 @@ static int cache_save_filter(ap_filter_t
             cache->provider->remove_entity(cache->stale_handle);
             /* Treat the request as if it wasn't conditional. */
             cache->stale_handle = NULL;
+            rv = !OK;
         }
     }
 
-    /* no cache handle, create a new entity */
-    if (!cache->handle) {
+    /* no cache handle, create a new entity only for non-HEAD request */
+    if (!cache->handle && !r->header_only) {
         rv = cache_create_entity(r, size);
         info = apr_pcalloc(r->pool, sizeof(cache_info));
         /* We only set info->status upon the initial creation. */
---8<---

If I have understood things right, this would:

- Accept revalidations even though it's a HEAD, if the object wasn't stale.

- Bail out if the object is stale and it's a HEAD.

I haven't tried it yet though; I'm just trying to get a grasp of things. I have no clue on whether other things would break due to the fact that it's revalidated based on a HEAD instead of a GET, for example.
/Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- I am Mr. T of Borg. I pity da fool that resists me. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
[PATCH] mod_cache 304 on HEAD (bug 41230)
On Wed, 11 Apr 2007, Niklas Edmundsson wrote: Looking a bit further, I think that something like this would actually be enough: snip, included as an attachment I have now tested this patch, and it seems to solve the problem. This is on httpd-2.2.4 + patch for PR41475 + our mod_disk_cache patches. Without the patch, a HEAD on a cached expired object that isn't modified will unconditionally return 304 and furthermore cause the cached object to be deleted. We believe that this is the explanation of why it has been so hard to track down this bug - it only bites one user, that user usually has no clue about what happened, and even if we try to reproduce it immediately afterwards it won't trigger. With the patch things work as expected:
- A HEAD on a cached expired object that isn't modified will update the cache header and return the proper return code; it follows the same code path as other requests on expired unmodified objects.
- A HEAD on a cached expired object that IS modified will remove the object from cache and then decline the opportunity to cache the object.
I request that this is reviewed, committed and proposed for backport to httpd 2.2.5. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- The pain is bad enough. Don't go poetic on me. - Madeline =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

--- mod_cache.c.orig	2007-04-11 13:29:14.000000000 +0200
+++ mod_cache.c	2007-04-11 14:06:29.000000000 +0200
@@ -456,7 +456,7 @@ static int cache_save_filter(ap_filter_t
          */
         reason = "No Last-Modified, Etag, or Expires headers";
     }
-    else if (r->header_only) {
+    else if (r->header_only && !cache->stale_handle) {
        /* HEAD requests */
         reason = "HTTP HEAD request";
     }
@@ -589,11 +589,12 @@ static int cache_save_filter(ap_filter_t
             cache->provider->remove_entity(cache->stale_handle);
             /* Treat the request as if it wasn't conditional. */
             cache->stale_handle = NULL;
+            rv = !OK;
         }
     }
-    /* no cache handle, create a new entity */
-    if (!cache->handle) {
+    /* no cache handle, create a new entity only for non-HEAD request */
+    if (!cache->handle && !r->header_only) {
         rv = cache_create_entity(r, size);
         info = apr_pcalloc(r->pool, sizeof(cache_info));
         /* We only set info->status upon the initial creation. */
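The two hunks above amount to a pair of boolean decisions. The following is a minimal standalone sketch of that logic (the struct and function names are invented for illustration; this is not the actual mod_cache code): a HEAD request alone no longer declines caching when a stale handle exists, and a new cache entity is only created for non-HEAD requests.

```c
#include <stddef.h>

/* Hypothetical condensed view of the flags the patch consults. */
typedef struct {
    int header_only;   /* request was a HEAD */
    int stale_handle;  /* we hold a stale cached object being revalidated */
    int handle;        /* we already hold a live cache handle */
} req_state;

/* Returns 1 if the save filter should decline with "HTTP HEAD request".
 * After the patch, a HEAD that revalidates a stale entry is NOT declined. */
static int decline_for_head(const req_state *s)
{
    return s->header_only && !s->stale_handle;
}

/* Returns 1 if a new cache entity should be created.
 * After the patch, HEAD requests never create a new entity. */
static int create_entity(const req_state *s)
{
    return !s->handle && !s->header_only;
}
```

The point of the second predicate is that a plain HEAD carries no body, so creating a fresh entity for it could only cache a truncated object.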
Re: [PATCH] mod_cache 304 on HEAD (bug 41230)
On Mon, 16 Apr 2007, Ruediger Pluem wrote: I have now tested this patch, and it seems to solve the problem. This is on httpd-2.2.4 + patch for PR41475 + our mod_disk_cache patches. Without the patch, a HEAD on a cached expired object that isn't modified will unconditionally return 304 and furthermore cause the cached object to be deleted. We believe that this is the explanation of why it has been so hard to track down this bug - it only bites one user, that user usually has no clue about what happened, and even if we try to reproduce it immediately afterwards it won't trigger. With the patch things work as expected: - A HEAD on a cached expired object that isn't modified will update the cache header and return the proper return code; it follows the same code path as other requests on expired unmodified objects. - A HEAD on a cached expired object that IS modified will remove the object from cache and then decline the opportunity to cache the object. Are you really sure that it gets deleted? cache->provider->remove_entity does not really remove the object from the cache. Only cache->provider->remove_url does this. Yes, but the CACHE_REMOVE_URL filter will remove it, right? It removes the CACHE_REMOVE_URL filter only after it has decided that it's actually caching the response, so it will bite in that case. I consider the CACHE_SAVE filter already as hard to read (not your fault by any means), but from my point of view your patch does increase this (specifically I think about the "rv = !OK" line. I know that a similar trick is done some lines above, but I don't like that one either). I also found "rv = !OK" ugly, but I just followed the established style to create a minimal patch without extra fuss. Feel free to clean stuff up to improve readability, as long as the bug gets fixed I'm happy :) /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Push any key.
Then push the any other key. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: [PATCH] mod_cache 304 on HEAD (bug 41230)
On Tue, 17 Apr 2007, Plüm, Rüdiger, VF-Group wrote: Are you really sure that it gets deleted? cache->provider->remove_entity does not really remove the object from the cache. Only cache->provider->remove_url does this. Yes, but the CACHE_REMOVE_URL filter will remove it, right? It removes the CACHE_REMOVE_URL filter only after it has decided that it's actually caching the response, so it will bite in that case. But only if there is a cache->handle or a cache->stale_handle. We have neither, as cache->stale_handle is set to NULL. Ah, of course. Looking closer I find that as part of our hacking on mod_disk_cache we fixed remove_entity to also remove the cache files. If I remember correctly it was leaving stale cache files in some code paths, and I guess that this was one of them. And we never figured out why there are both remove_entity and remove_url anyway; even the mod_cache code seems to get them confused... /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- HEALTH: The slowest possible rate of dying. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: mod_ftp, status and progress?
On Thu, 26 Apr 2007, Jim Jagielski wrote: On Apr 18, 2007, at 1:22 PM, Guenter Knauf wrote: Hi, the current code fails to build for Win32 target. This is because ftp_glob.c seems not APR-ised yet; I'm actually looking at removing the whole glob stuff and emulating it as regexes... Wouldn't apr_match_glob() be a better starting point? I don't really see the point of going via regexes... /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Luckily, I'm out of hairs to split! =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: mod_ftp, status and progress?
On Thu, 26 Apr 2007, Jim Jagielski wrote: I'm actually looking at removing the whole glob stuff and emulating it as regexes... Wouldn't apr_match_glob() be a better starting point? I don't really see the point of going via regexes... I was thinking for 2.0.x compatibility... Wouldn't it be better to focus on 2.2.x and onwards? OK, there's a lot of people still running 1.3 and 2.0, but that doesn't mean that we have to make it run on all of them... I'm all for focusing on getting it usable for 2.2+, and if people really want the httpd-tree mod_ftp that bad they can see it as yet another good reason to upgrade. There's a lot of work that needs to be done in order to have mod_ftp usable, and making the code more complex in order to support the previous stable httpd version doesn't really sound that appealing. Not that I'm against backward compatibility, but I'd prefer seeing a clean design for the current/future httpd version and the compat stuff handled by wrapper functions stashed in a mod_ftp_20compat.c or something like that. In any case, as long as the code is readable and works. The current mod_ftp globbing stuff is simply a mess. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Reality is for people who can't handle Star Trek. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: mod_ftp, status and progress?
On Fri, 27 Apr 2007, Jim Jagielski wrote: I'm actually looking at removing the whole glob stuff and emulating it as regexes... Wouldn't apr_match_glob() be a better starting point? I don't really see the point of going via regexes... I was thinking for 2.0.x compatibility... Wouldn't it be better to focus on 2.2.x and onwards? OK, there's a lot of people still running 1.3 and 2.0, but that doesn't mean that we have to make it run on all of them... Why? Really, it's no big deal to ensure it runs on both. I'm not against keeping compatibility. However I feel that the right way to do it would be to design stuff for current httpd and then add glue for the backward compat stuff (and not doing it the #ifdef-mess-way). So, going for regexes just because apr_match_glob() doesn't exist in 2.0.x seems a bit sub-optimal... /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- You're the security chief-shouldn't you be out securing something? =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
mod_ftp: [PATCH] Make REST work with large files
Attached is a patch written some time ago to make the REST command grok large files on LFS-capable platforms by using apr_strtoff() instead of strtol(). It's untested, mostly because I didn't have a test server handy at the moment. Thought I should submit the patch before I lost it in the twisty maze of svn trees though. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Fiddle: Friction of a horse's tail on cat's entrails. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Index: modules/ftp/ftp_commands.c
===================================================================
--- modules/ftp/ftp_commands.c	(revision 533227)
+++ modules/ftp/ftp_commands.c	(working copy)
@@ -1784,33 +1784,18 @@
 static int ftp_cmd_rest(request_rec *r, const char *arg)
 {
     ftp_connection *fc = ftp_get_module_config(r->request_config);
-    conn_rec *c = r->connection;
     char *endp;
+    apr_status_t rv;
 
-    /* XXX: shortcoming, cannot restart >2GB. Must be solved in
-     * APR, or we need to use
-     *   int len;
-     *   res = sscanf(arg, "%" APR_OFF_T_FMT "%n", &fc->restart_point, &len);
-     *   end = arg + len;
-     * and test that res == 2. Dunno how portable or safe this gross
-     * hack would be in real life.
-     */
-    fc->restart_point = strtol(arg, &endp, 10);
-    if (((*arg == '\0') || (*endp != '\0')) || fc->restart_point < 0) {
-        fc->response_notes = apr_pstrdup(r->pool, "REST requires a an"
-                                         " integer value greater than zero");
+    rv = apr_strtoff(&(fc->restart_point), arg, &endp, 10);
+    if (rv != APR_SUCCESS || ((*arg == '\0') || (*endp != '\0')) ||
+        fc->restart_point < 0)
+    {
+        fc->response_notes = apr_pstrdup(r->pool, "REST requires a"
+                                         " non-negative integer value");
         return FTP_REPLY_SYNTAX_ERROR;
     }
-    /* Check overflow condition */
-    if (fc->restart_point == LONG_MAX) {
-        ap_log_error(APLOG_MARK, APLOG_WARNING|APLOG_NOERRNO, 0,
-                     c->base_server,
-                     "Client attempted an invalid restart point");
-        /* XXX: possible overflow, continue gracefully? Many other FTP
-         * client do not check overflow conditions in the REST command.
-         */
-    }
     fc->response_notes = apr_psprintf(r->pool, "Restarting at %" APR_OFF_T_FMT
                                       ". Send STORE or RETRIEVE to initiate"
                                       " transfer.", fc->restart_point);
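The validation pattern the patch applies can be shown standalone. Below is a sketch using C's standard strtoll in place of apr_strtoff (the helper name parse_rest and the error convention are invented for the example): the whole argument must parse as a number and the resulting restart point must be non-negative.

```c
#include <stdlib.h>
#include <errno.h>

/* Parse a REST argument into a 64-bit offset.
 * Returns 0 on success, -1 on a syntax error (the patch replies with
 * FTP_REPLY_SYNTAX_ERROR in that case). */
static int parse_rest(const char *arg, long long *point)
{
    char *endp;

    errno = 0;
    *point = strtoll(arg, &endp, 10);

    /* reject empty input, trailing junk, overflow, and negative offsets */
    if (errno != 0 || *arg == '\0' || *endp != '\0' || *point < 0)
        return -1;
    return 0;
}
```

The key improvement over the strtol() original is that offsets beyond 2GB parse cleanly on LFS-capable platforms, so the separate LONG_MAX overflow check becomes unnecessary.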
[PATCH] mod_cache: Don't follow NULL pointers.
We encountered the following bug: httpd segfaulted due to a client emitting Cache-Control: max-age=216000, max-stale which is a perfectly valid header. The segfault is caused by the fact that ap_cache_liststr() sets the value pointer to NULL when there is no value, and this isn't checked at all in the cases when a value pointer is passed. I think that this patch catches all those occurrences. I'm not proud of the solution for max-stale without value, but it should do the job... In any case, this is a bug that should be fixed ASAP and queued for inclusion in httpd 2.2.5, since it segfaults your httpd even with valid headers... /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- I am Yoda of Borg. Assimilated you will be, hmmm? =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

--- ../../../dist/modules/cache/cache_util.c	2006-10-13 01:11:33.000000000 +0200
+++ cache_util.c	2007-05-02 10:26:08.000000000 +0200
@@ -243,7 +243,8 @@
     age = ap_cache_current_age(info, age_c, r->request_time);
 
     /* extract s-maxage */
-    if (cc_cresp && ap_cache_liststr(r->pool, cc_cresp, "s-maxage", &val)) {
+    if (cc_cresp && ap_cache_liststr(r->pool, cc_cresp, "s-maxage", &val)
+        && val != NULL) {
         smaxage = apr_atoi64(val);
     }
     else {
@@ -252,7 +253,8 @@
     /* extract max-age from request */
     if (!conf->ignorecachecontrol
-        && cc_req && ap_cache_liststr(r->pool, cc_req, "max-age", &val)) {
+        && cc_req && ap_cache_liststr(r->pool, cc_req, "max-age", &val)
+        && val != NULL) {
         maxage_req = apr_atoi64(val);
     }
     else {
@@ -260,7 +262,8 @@
     }
     /* extract max-age from response */
-    if (cc_cresp && ap_cache_liststr(r->pool, cc_cresp, "max-age", &val)) {
+    if (cc_cresp && ap_cache_liststr(r->pool, cc_cresp, "max-age", &val)
+        && val != NULL) {
         maxage_cresp = apr_atoi64(val);
     }
     else {
@@ -282,7 +285,14 @@
     /* extract max-stale */
     if (cc_req && ap_cache_liststr(r->pool, cc_req, "max-stale", &val)) {
-        maxstale = apr_atoi64(val);
+        if (val != NULL) {
+            maxstale = apr_atoi64(val);
+        }
+        else {
+            /* If no value is assigned to max-stale, then the client is willing
+             * to accept a stale response of any age */
+            maxstale = APR_INT64_C(0x7fffffffffffffff); /* No APR_INT64_MAX? */
+        }
     }
     else {
         maxstale = 0;
@@ -290,7 +300,8 @@
     /* extract min-fresh */
     if (!conf->ignorecachecontrol
-        && cc_req && ap_cache_liststr(r->pool, cc_req, "min-fresh", &val)) {
+        && cc_req && ap_cache_liststr(r->pool, cc_req, "min-fresh", &val)
+        && val != NULL) {
         minfresh = apr_atoi64(val);
     }
     else {
Re: [PATCH] mod_cache: Don't follow NULL pointers.
On Wed, 2 May 2007, Niklas Edmundsson wrote: We encountered the following bug: httpd segfaulted due to a client emitting Cache-Control: max-age=216000, max-stale which is a perfectly valid header. The segfault is caused by the fact that ap_cache_liststr() sets the value pointer to NULL when there is no value, and this isn't checked at all in the cases when a value pointer is passed. I think that this patch catches all those occurances. Or so I thought. It turned out that ap_cache_liststr() didn't set the value pointer to NULL in all cases where it should. Now it does. I'm not proud of the solution for max-stale without value, but it should do the job... It did, but it caused the freshness calculation to overflow so the end result was bollocks. I hard-coded 100 years for the max-stale without value case, not pretty but it works. Updated patch attached. /Nikke - not fond of fixing bugs with core-files as the only source of information :/ -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- REJECTION: When your imaginary friends won't talk to you. 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

--- ../../../dist/modules/cache/cache_util.c	2006-10-13 01:11:33.000000000 +0200
+++ cache_util.c	2007-05-02 14:01:06.000000000 +0200
@@ -243,7 +243,8 @@
     age = ap_cache_current_age(info, age_c, r->request_time);
 
     /* extract s-maxage */
-    if (cc_cresp && ap_cache_liststr(r->pool, cc_cresp, "s-maxage", &val)) {
+    if (cc_cresp && ap_cache_liststr(r->pool, cc_cresp, "s-maxage", &val)
+        && val != NULL) {
         smaxage = apr_atoi64(val);
     }
     else {
@@ -252,7 +253,8 @@
     /* extract max-age from request */
     if (!conf->ignorecachecontrol
-        && cc_req && ap_cache_liststr(r->pool, cc_req, "max-age", &val)) {
+        && cc_req && ap_cache_liststr(r->pool, cc_req, "max-age", &val)
+        && val != NULL) {
         maxage_req = apr_atoi64(val);
     }
     else {
@@ -260,7 +262,8 @@
     }
     /* extract max-age from response */
-    if (cc_cresp && ap_cache_liststr(r->pool, cc_cresp, "max-age", &val)) {
+    if (cc_cresp && ap_cache_liststr(r->pool, cc_cresp, "max-age", &val)
+        && val != NULL) {
         maxage_cresp = apr_atoi64(val);
     }
     else {
@@ -282,7 +285,16 @@
     /* extract max-stale */
     if (cc_req && ap_cache_liststr(r->pool, cc_req, "max-stale", &val)) {
-        maxstale = apr_atoi64(val);
+        if (val != NULL) {
+            maxstale = apr_atoi64(val);
+        }
+        else {
+            /* If no value is assigned to max-stale, then the client is willing
+             * to accept a stale response of any age */
+            /* Let's pretend 100 years is enough, need some margin here
+             * or the freshness calculation later will overflow */
+            maxstale = APR_INT64_C(86400*365*100);
+        }
     }
     else {
         maxstale = 0;
@@ -290,7 +302,8 @@
     /* extract min-fresh */
     if (!conf->ignorecachecontrol
-        && cc_req && ap_cache_liststr(r->pool, cc_req, "min-fresh", &val)) {
+        && cc_req && ap_cache_liststr(r->pool, cc_req, "min-fresh", &val)
+        && val != NULL) {
         minfresh = apr_atoi64(val);
     }
     else {
@@ -419,6 +432,9 @@
                              next - val_start);
             }
         }
+        else {
+            *val = NULL;
+        }
     }
     return 1;
 }
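The contract the patch enforces can be shown with a tiny self-contained parser. The helper below (find_directive is an invented name, deliberately much simpler than ap_cache_liststr) illustrates the crucial point: a directive such as max-stale may legally appear with or without "=value", so the caller must always check for a NULL value pointer before calling apr_atoi64 on it.

```c
#include <string.h>
#include <stdlib.h>

/* Look up a directive in a Cache-Control header string.
 * Returns 1 if the directive is present; *val then points at its value,
 * or is set to NULL for a valueless directive such as a bare "max-stale". */
static int find_directive(const char *header, const char *name,
                          const char **val)
{
    const char *p = strstr(header, name);

    if (p == NULL)
        return 0;                /* directive absent */
    p += strlen(name);
    if (*p == '=')
        *val = p + 1;            /* value starts after '=' */
    else
        *val = NULL;             /* valueless directive: caller must check! */
    return 1;
}
```

With the header from the bug report, "max-age=216000, max-stale", the max-stale lookup succeeds but yields a NULL value, which is exactly the case the unpatched code dereferenced.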
Re: mod_ftp, status and progress?
On Wed, 2 May 2007, Jim Jagielski wrote: In fact, to be honest, it would be easier still to just update ftp_direntry_get() to use apr_fnmatch(), since we always want to support globbing. ftp_direntry_get already does most of what makes apr_match_glob attractive in the first place. Should have a patch to commit later on tomorrow, after some more tests :) I suspect that you're fixing the large file issues while you're at it? Another thing I noticed when we started to look at mod_ftp (looking at strace/truss output trying to figure out why things didn't work) was that it stats all entries in a directory twice, first explicitly and then via the subreq. Wouldn't the subreq be enough? It's no biggie for now, but it would be nice to get rid of unnecessary stats as a bonus ;) /Nikke - eager to give it a spin :) -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- You wanted to make it law. Make it a good one. - Picard =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: mod_ftp, status and progress?
On Thu, 3 May 2007, William A. Rowe, Jr. wrote: Another thing I noticed when we started to look at mod_ftp (looking at strace/truss output trying to figure out why things didn't work) was that it stats all entries in a directory twice, first explicitly and then via the subreq. Wouldn't the subreq be enough? It's no biggie for now, but it would be nice to get rid of unnecessary stats as a bonus ;) This is a separate issue; we need to refactor out 90% of the subrequests and treat these as top-level requests. OK. I was under the impression that those subrequests were made to filter out stuff you don't have access to from the directory listings, but I stand corrected. I discovered while trying to accommodate named virtual hosts (the hack to let [EMAIL PROTECTED] resolve to the host.com vhost) it's simply not worth hacking one without the other. Ah. Am I right in guessing that making it play well with mod_cache would come more or less for free after the request refactoring is done? /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- He who laughs last is probably your boss! =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: ftp glob/limits?
On Mon, 14 May 2007, William A. Rowe, Jr. wrote: What would folks think about changing if (ap_strchr_c(arg, '*') != NULL) { /* Prevent DOS attacks, only allow one segment to have a wildcard */ int found = 0; /* The number of segments with a wildcard */ to permit multiple wildcards, but to restrict the number of matches returned (configurable with a directive, of course)? Over a small pattern space, uploads/*/* is often very useful. What would be the sane default? 1,000 entries? For anonftp usage I would prefer the restrictive behaviour; it's good enough for most users, and most decent ftpd's already do it that way. For example, you can find this in ls.c in vsftpd:
--8<--
 * Note that pattern matching is only supported within the last path
 * component. For example, searching for /a/b/? will work, but searching
 * for /a/?/c will not.
--8<--
which is a sane behaviour for a public server in my world. For non-anonftp usage limiting the number of matches might be OK, if the thing stops recursion when hitting the limit and doesn't just limit the reply sent to the client ;) So my vote would be to default to restrictive; a more relaxed behaviour must be explicitly configured. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- A bird in hand makes brushing your teeth difficult. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
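The restrictive vsftpd-style policy discussed above can be sketched as a small check (glob_allowed is a hypothetical helper written for this example, not mod_ftp or vsftpd code): a pattern is accepted only when all wildcards are confined to the last path segment.

```c
#include <string.h>

/* Return 1 if wildcards appear only in the final path segment, 0 otherwise.
 * This mirrors the restrictive policy: /a/b/? is fine, /a/?/c is not. */
static int glob_allowed(const char *pattern)
{
    const char *last = strrchr(pattern, '/');
    /* if there is no '/', the whole pattern IS the last segment */
    const char *seg_end = (last != NULL) ? last : pattern;
    const char *p;

    /* any wildcard before the final segment makes the pattern illegal */
    for (p = pattern; p < seg_end; p++) {
        if (*p == '*' || *p == '?')
            return 0;
    }
    return 1;
}
```

Under this policy a pattern like uploads/*/* is rejected outright, which is what prevents the recursive directory-walk DOS the thread is worried about.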
Any progress on PR41230 (HEAD issues on cached items)?
Has there been any progress on PR41230? I submitted a patch that at least seems to improve the situation that now seems to have seen some testing by others as well. As I have stated before, it would be really nice if a fix for this could be committed, be it my patch or some other solution. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Don't hide your contempt of the contemptible! =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: Any progress on PR41230 (HEAD issues on cached items)?
On Fri, 18 May 2007, Justin Erenkrantz wrote: On 5/17/07, Niklas Edmundsson [EMAIL PROTECTED] wrote: Has there been any progress on PR41230? I submitted a patch that at least seems to improve the situation that now seems to have seen some testing by others as well. As I have stated before, it would be really nice if a fix for this could be committed, be it my patch or some other solution. I've committed a variant of this patch to trunk in r539620. Thanks! Great! Now it just needs to be included in 2.2.x and I'll be even more happy :) /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Old mufflers never die, they get exhausted. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: mod_cache: Don't update when req max-age=0?
On Mon, 21 May 2007, Graham Leggett wrote: Since max-age=0 requests can't be fulfilled without revalidating the object they don't benefit from this header rewrite, and requests with max-age!=0 that can benefit from the header rewrite won't be affected by this change. Am I making sense? Have I missed something fundamental? At first glance, doing this I think will break RFC2616 compliance, and if it does break RFC compliance then I think it should not be default behaviour. However if it does solve a real problem for admins, then having a directive allowing the admin to enable this behaviour does make sense. Why would it break RFC compliance? This request will never benefit from the headers being saved to disk, and the headers returned to the client should of course be those that resulted from the revalidation of the object. The only difference is that they aren't saved to disk too. The only difference I can see is that you can't probe that the previous request was a max-age=0 by doing a max-age!=0 request afterwards... Zooming out a little bit, this seems to fall into the category of RFC violations that allow the cache to either hit the backend less, or hit the backend not at all, for the benefit of an admin who knows what they are doing. A simple set of directives that allow an admin to break RFC compliance under certain circumstances in order to achieve certain goals does make sense. Yup. CacheIgnoreCacheControl is one of those; we use it on the offloaders that only serve large files that we know don't need the RFC behaviour. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Sir, We are receiving 285,000 Hails. - Crusher =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: mod_cache: Don't update when req max-age=0?
On Tue, 22 May 2007, Henrik Nordstrom wrote: On Tue, 22 May 2007 at 11:40 +0200, Niklas Edmundsson wrote:
--8<---
Does anybody see a problem with changing mod_cache to not update the stored headers when the request has max-age=0, the body turns out not to be stale and the on-disk header hasn't expired?
--8<---
My understanding: It's fine from an RFC point of view for the cache to completely ignore a 304 and not update the stored entity at all. But the response to this request should be the merge of the two responses, assuming the conditional was added by the cache. This is in line with my understanding, and since the response merging is being done today the only change would be to skip storing the header to disk. I think it would be wise to only skip the storing for the max-age=0 case though. Should I try to whip up a patch for it then? /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Radioactive halibut will make fission chips. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: mod_cache: Don't update when req max-age=0?
On Thu, 24 May 2007, Sander Striker wrote:
--8<---
Does anybody see a problem with changing mod_cache to not update the stored headers when the request has max-age=0, the body turns out not to be stale and the on-disk header hasn't expired?
--8<---
My understanding: It's fine from an RFC point of view for the cache to completely ignore a 304 and not update the stored entity at all. But the response to this request should be the merge of the two responses, assuming the conditional was added by the cache. This is in line with my understanding, and since the response merging is being done today the only change would be to skip storing the header to disk. I think it would be wise to only skip the storing for the max-age=0 case though. Why limit it to the max-age=0 case? Isn't it a general improvement? Consider a default cache lifetime of 86400 seconds, and requests coming in with max-age=4 (we see a lot of mozilla downloads with this, for example). If you don't rewrite the on-disk headers you'll end up always hitting your backend when you pass an age of 4. In the max-age=0 case you only force an unnecessary header write, because:

a) The written header won't be useful for other requests with max-age=0. A ground rule of caching is to not save stuff that's never used.

b) Requests with max-age!=0 aren't helped much by it; the only penalty would be when a max-age!=0 request causes a header rewrite that a max-age=0 access would have performed. Doing this single rewrite instead of potentially thousands of rewrites due to max-age=0 is a rather big win.

c) RFC-wise it seems to me that a not-modified object is a not-modified object. There is no guarantee that the next request will hit the same cache, so nothing can expect a max-age=0 request to force a cache to rewrite its headers and then access it with max-age!=0 and get headers of that age.

d) Also, an object tends to be accessed with more-or-less the same max-age. So storing headers in the max-age=0 case just because the object might be accessed with max-age!=0 makes no sense, since it's more likely that the next request to this object will have the same max-age.

/Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Did I just step on someone's toes again?? =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
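The argument in a) and b) rests on how a request's max-age tightens the freshness limit. Here is a condensed sketch of that interaction (invented helper, not the real mod_cache calculation, which also involves max-stale and min-fresh): the request may only shorten the window in which a cached entity counts as fresh, so max-age=0 always forces revalidation regardless of what the stored headers say.

```c
/* Is a cached entity fresh enough to serve without revalidating?
 * age         - current age of the cached entity, in seconds
 * lifetime    - the cache's freshness lifetime for this entity
 * req_maxage  - max-age from the request, or -1 if absent */
static int is_fresh(long age, long lifetime, long req_maxage)
{
    long limit = lifetime;

    /* the request may only tighten the freshness limit, never widen it */
    if (req_maxage >= 0 && req_maxage < limit)
        limit = req_maxage;

    return age < limit;
}
```

Since no value of age satisfies age < 0, a max-age=0 request can never be answered from freshly rewritten headers, which is why writing them out on its behalf is wasted work.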
mod_disk_cache jumbopatch - 20070727 revision
I have uploaded the version of our mod_disk_cache jumbopatch that we've been using on ftp.acc.umu.se for some time now to http://issues.apache.org/bugzilla/show_bug.cgi?id=39380 for those who want a one-patch solution to using our modifications. Cut-and-paste from the bugzilla attachment comment: httpd 2.2.4 - mod_disk_cache jumbo patch - lfs/diskformat/read-while-caching etc. A snapshot from 20070727 of our mod_disk_cache jumbo patch and some assorted additional patches that are needed for stability. We've been running this for a couple of months on ftp.acc.umu.se, and it has survived Debian/Ubuntu/Mozilla releases gracefully. This version plays well with other entities using/updating the cache. We are using an open()-wrapper in combination with rsync which lets rsync utilise the cached bodies, and also cache files. This patch is provided mostly as a one-patch solution for other sites that wish to use these mod_disk_cache modifications. Highlights since the previous patch:
* More corner case error fixes, most of them triggered by Mozilla releases.
* Greatly reduced duplicated data in the cache when using an NFS backend by hashing the body on the source file's device and inode when available. HTTPD has already done the stat() of the file for us, so it's essentially free.
* Tidied up the handling of updated files; only delete files in cache if they're really obsolete.
/Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- We are ATT of Borg, MCI will be assimilated =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
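The device-and-inode deduplication idea can be illustrated with a small sketch. The function name and the file-name format below are invented for the example (the jumbopatch's actual naming scheme is not shown in this thread); the point is only that two paths reaching the same underlying file - common with an NFS backend - produce the same cache body name, because st_dev/st_ino identify the file rather than the path.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Derive a cache body file name from already-available stat() data,
 * so hardlinked or multiply-mounted source files share one cache entry. */
static void body_cache_name(const struct stat *sb, char *buf, size_t len)
{
    snprintf(buf, len, "body.%llx.%llx",
             (unsigned long long) sb->st_dev,
             (unsigned long long) sb->st_ino);
}
```

Since httpd has already stat()ed the source file for the request, keying on these fields adds no extra system calls, matching the "essentially free" remark above.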
mod_limitipconn for httpd 2.2 and mod_cache
Hi! Attached is a version of mod_limitipconn.c that works in conjunction with mod_cache and httpd-2.2. We've been using this on ftp.acc.umu.se for some time now without any issues. The main problem with mod_limitipconn-0.22 was that since mod_cache runs as a quick handler, mod_limitipconn also must run as a quick handler, with all the benefits and drawbacks that brings. Download the tarball from http://dominia.org/djao/limitipconn2.html , extract it, replace mod_limitipconn.c with this version and follow the build instructions. I would really like this to be made part of httpd; it's really needed when running a file-download site due to the scarily large number of demented download manager clients out there. However, I have not received any response from the original author on the matter. From what I have understood of the license it should be OK to merge into httpd if you want, but I think that you guys are way more clued in on that matter than me. This is a summary of the changes made:
* Rewritten to run as a Quick Handler, before mod_cache.
* Configuration directives are now set per VHost (Directory/Location contexts are only available after the Quick Handler has been run). This means that any Location containers have to be deleted in existing configs.
* Fixed configuration merging, so per-vhost settings use defaults set at the server level.
* By running as a Quick Handler we don't go through the entire lookup phase (resolve path, stat file, etc.) before we get the possibility to block a request. This gives a clear performance enhancement.
* Made the handler exit as soon as possible, doing the easy checks first.
* Don't do a subrequest to look up the MIME type if we don't have MIME-type-specific config.
* Count connections in closing and logging state too; we don't want to be DOS'd by clients behind buggy firewalls and so on.
* Added debug messages for easy debugging.
* Reduced loglevel from ERR to INFO for reject logging.
In any case, I hope that this can be of use for others than us. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- We are ATT of Borg, MCI will be assimilated =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

/*
 * Copyright (C) 2000-2002 David Jao [EMAIL PROTECTED]
 *
 * Permission is hereby granted, free of charge, to any person
 * obtaining a copy of this software and associated documentation
 * files (the "Software"), to deal in the Software without
 * restriction, including without limitation the rights to use, copy,
 * modify, merge, publish, distribute, sublicense, and/or sell copies
 * of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice, this permission notice, and the
 * following disclaimer shall be included in all copies or substantial
 * portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
 * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 * DEALINGS IN THE SOFTWARE.
 *
 */

#include "httpd.h"
#include "http_config.h"
#include "http_request.h"
#include "http_protocol.h"
#include "http_core.h"
#include "http_main.h"
#include "http_log.h"
#include "ap_mpm.h"
#include "apr_strings.h"
#include "scoreboard.h"

#define MODULE_NAME "mod_limitipconn"
#define MODULE_VERSION "0.22"

module AP_MODULE_DECLARE_DATA limitipconn_module;

static int server_limit, thread_limit;

typedef struct {
    signed int limit;   /* max number of connections per IP */
    /* array of MIME types exempt from limit checking */
    apr_array_header_t *no_limit;
    int no_limit_set;
    /* array of MIME types to limit check; all other types are exempt */
    apr_array_header_t *excl_limit;
    int excl_limit_set;
} limitipconn_config;

static void *limitipconn_create_config(apr_pool_t *p, server_rec *s)
{
    limitipconn_config *cfg = (limitipconn_config *)
        apr_pcalloc(p, sizeof(*cfg));

    /* default configuration: no limit (unset), and both arrays are empty */
    cfg->limit = -1;
    cfg->no_limit = apr_array_make(p, 0, sizeof(char *));
    cfg->excl_limit = apr_array_make(p, 0, sizeof(char *));

    return cfg;
}

/* Simple merge: Per vhost entries overrides main server entries */
static void *limitipconn_merge_config(apr_pool_t *p, void *BASE, void *ADD)
{
    limitipconn_config *base = BASE;
    limitipconn_config *add = ADD;
    limitipconn_config *cfg
[PATCH]: mod_cache: don't store headers that will never be used
Attached is a patch for mod_cache (patch is for httpd-2.2.4) that implements what I suggested in May (see the entire thread at http://mail-archives.apache.org/mod_mbox/httpd-dev/200705.mbox/[EMAIL PROTECTED] ). The problem is that cached objects that get hammered with Cache-Control: max-age=0 requests get their on-disk headers rewritten for each request, and since max-age=0 requests are always revalidated (hence the rewriting in the first place) those rewritten on-disk headers will never be used. Since the ground rule of caching is to cache stuff that's being reused, this is rather suboptimal. The solution is to NOT rewrite the on-disk headers when the following conditions are true:
- The body is NOT stale (ie. HTTP_NOT_MODIFIED when revalidating)
- The on-disk header hasn't expired.
- The request has max-age=0
This is perfectly OK with RFC2616 10.3.5 and does NOT break anything. The patch is tested on httpd-2.2.4 and works as expected according to my tests. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- A pretty .GIF is like a melody =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

--- ../dist/modules/cache/mod_cache.c	2006-12-08 13:56:00.0 +0100
+++ modules/cache/mod_cache.c	2007-07-28 22:17:48.0 +0200
@@ -305,7 +305,7 @@
     cache_server_conf *conf;
     const char *cc_out, *cl;
     const char *exps, *lastmods, *dates, *etag;
-    apr_time_t exp, date, lastmod, now;
+    apr_time_t exp, date, lastmod, now, staleexp = APR_DATE_BAD;
     apr_off_t size;
     cache_info *info = NULL;
     char *reason;
@@ -582,6 +582,8 @@
         /* Oh, hey. It isn't that stale! Yay! */
         cache->handle = cache->stale_handle;
         info = cache->handle->cache_obj->info;
+        /* Save stale expiry timestamp for later perusal */
+        staleexp = info->expire;
         rv = OK;
     }
     else {
@@ -736,14 +738,41 @@
         ap_cache_accept_headers(cache->handle, r, 1);
     }
-    /* Write away header information to cache. It is possible that we are
-     * trying to update headers for an entity which has already been cached.
-     *
-     * This may fail, due to an unwritable cache area. E.g. filesystem full,
-     * permissions problems or a read-only (re)mount. This must be handled
-     * later.
-     */
-    rv = cache->provider->store_headers(cache->handle, r, info);
+    /* Avoid storing on-disk headers that are never used. When the following
+     * conditions are fulfilled:
+     * - The body is NOT stale (ie. HTTP_NOT_MODIFIED when revalidating)
+     * - The on-disk header hasn't expired.
+     * - The request has max-age=0
+     * Then there is no use to update the on-disk header since it won't be used
+     * by other max-age=0 requests since they are always revalidated, and we
+     * know it's likely there will be more max-age=0 requests since objects
+     * tend to have the same access pattern.
+     * Luckily for us RFC2616 10.3.5 last paragraph allows us to NOT update the
+     * on-disk headers if we don't want to on HTTP_NOT_MODIFIED.
+     */
+    rv = APR_EGENERAL;
+    if (cache->stale_handle && staleexp != APR_DATE_BAD && now < staleexp) {
+        const char *cc_req;
+        char *val;
+
+        cc_req = apr_table_get(r->headers_in, "Cache-Control");
+        if (cc_req && ap_cache_liststr(r->pool, cc_req, "max-age", &val)
+            && val != NULL && apr_atoi64(val) == 0)
+        {
+            /* Yay, we can skip storing the on-disk header */
+            rv = APR_SUCCESS;
+        }
+    }
+    if (rv != APR_SUCCESS) {
+        /* Write away header information to cache. It is possible that we are
+         * trying to update headers for an entity which has already been cached.
+         *
+         * This may fail, due to an unwritable cache area. E.g. filesystem full,
+         * permissions problems or a read-only (re)mount. This must be handled
+         * later.
+         */
+        rv = cache->provider->store_headers(cache->handle, r, info);
+    }

 /* Did we just update the cached headers on a revalidated response? *
Re: [PATCH]: mod_cache: don't store headers that will never be used
On Sun, 29 Jul 2007, Graham Leggett wrote: What may make this workable is the combination of The body is NOT stale with max-age=0. The danger of not writing the headers is that an entity, once stale, will not be freshened when the spec says it should, and will cause a thundering herd of conditional requests to a backend server. This issue has been badly understood by some in the past, who suggested that the ability to update cached entities be removed. We need to make very sure that by fixing one problem we don't introduce another. You missed the condition that the header wasn't expired, right? To reiterate: - The body is NOT stale (ie. HTTP_NOT_MODIFIED when revalidating) - The on-disk header hasn't expired. - The request has max-age=0 Since the on-disk header hasn't expired AND the body is unchanged, you'll have the same data in the cache except for the Expires header. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- * . . . . . - Tribble Mother and Young =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: [PATCH]: mod_cache: don't store headers that will never be used
On Sun, 29 Jul 2007, Roy T. Fielding wrote: The solution is to NOT rewrite the on-disk headers when the following conditions are true: - The body is NOT stale (ie. HTTP_NOT_MODIFIED when revalidating) - The on-disk header hasn't expired. - The request has max-age=0 This is perfectly OK with RFC2616 10.3.5 and does NOT break anything. No, it breaks the refreshing of the on-disk header with a new Date field representing its new age. The patch would cause a prefetching spider to fail to do its intended job of refreshing all cached entries even when they are not yet stale, which is something that content management systems do all the time when fronted by a caching server. Uh, OK... So they are dependent upon having the Date/Expires header updated, since this is the only thing that will be affected by this patch... Stale content will be refreshed as usual. Since you generally never know if you will be talking to the same cache (think DNS record pointing to multiple caching hosts) I hadn't even imagined people trying to be clever and forcing updates this way since it's kind of a special case that it would work ;) I'm especially intrigued by the fact that stuff is depending on the Date/Expires header in a cache being exactly what it thinks it should be, sounds kind of broken to me... As I said before, address the problem you have by adding a directive to either ignore such requests from abusive downloaders or to define a minimum age for certain cached objects. HTTP does not require the cache configuration to be that of a transparent cache -- it only defines how a cache configured to be transparent should work. I would really like to understand why it wouldn't work before resorting to such solutions. Much of the response I have got on this has been of the "it will not work" variety while people obviously haven't read carefully enough to realise that the condition they state won't work isn't even affected.
However, if stuff is really depending on Date/Expires being what it thinks it is (*shiver*) then I guess there won't be any other options... /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- * . . . . . - Tribble Mother and Young =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: [PATCH]: mod_cache: don't store headers that will never be used
On Mon, 30 Jul 2007, Niklas Edmundsson wrote: However, if stuff is really depending on Date/Expires being what it thinks it is (*shiver*) then I guess there won't be any other options... Here's a version with a config directive, defaults to disabled. Thoughts? /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- No, no, nurse! I said SLIP off his SPECTACLES!! =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

--- ../dist/modules/cache/mod_cache.c	2006-12-08 13:56:00.0 +0100
+++ modules/cache/mod_cache.c	2007-07-30 14:17:17.0 +0200
@@ -305,7 +305,7 @@
     cache_server_conf *conf;
     const char *cc_out, *cl;
     const char *exps, *lastmods, *dates, *etag;
-    apr_time_t exp, date, lastmod, now;
+    apr_time_t exp, date, lastmod, now, staleexp = APR_DATE_BAD;
     apr_off_t size;
     cache_info *info = NULL;
     char *reason;
@@ -582,6 +582,8 @@
         /* Oh, hey. It isn't that stale! Yay! */
         cache->handle = cache->stale_handle;
         info = cache->handle->cache_obj->info;
+        /* Save stale expiry timestamp for later perusal */
+        staleexp = info->expire;
         rv = OK;
     }
     else {
@@ -736,14 +738,41 @@
         ap_cache_accept_headers(cache->handle, r, 1);
     }
-    /* Write away header information to cache. It is possible that we are
-     * trying to update headers for an entity which has already been cached.
-     *
-     * This may fail, due to an unwritable cache area. E.g. filesystem full,
-     * permissions problems or a read-only (re)mount. This must be handled
-     * later.
-     */
-    rv = cache->provider->store_headers(cache->handle, r, info);
+    rv = APR_EGENERAL;
+    if (conf->relaxupdates && cache->stale_handle
+        && staleexp != APR_DATE_BAD && now < staleexp)
+    {
+        /* Avoid storing on-disk headers that are never used. When the
+         * following conditions are fulfilled:
+         * - The body is NOT stale (ie. HTTP_NOT_MODIFIED when revalidating)
+         * - The on-disk header hasn't expired.
+         * - The request has max-age=0
+         * Then there is no use to update the on-disk header since it won't be
+         * used by other max-age=0 requests since they are always revalidated,
+         * and we know it's likely there will be more max-age=0 requests since
+         * objects tend to have the same access pattern.
+         */
+        const char *cc_req;
+        char *val;
+
+        cc_req = apr_table_get(r->headers_in, "Cache-Control");
+        if (cc_req && ap_cache_liststr(r->pool, cc_req, "max-age", &val)
+            && val != NULL && apr_atoi64(val) == 0)
+        {
+            /* Yay, we can skip storing the on-disk header */
+            rv = APR_SUCCESS;
+        }
+    }
+    if (rv != APR_SUCCESS) {
+        /* Write away header information to cache. It is possible that we are
+         * trying to update headers for an entity which has already been cached.
+         *
+         * This may fail, due to an unwritable cache area. E.g. filesystem full,
+         * permissions problems or a read-only (re)mount. This must be handled
+         * later.
+         */
+        rv = cache->provider->store_headers(cache->handle, r, info);
+    }

 /* Did we just update the cached headers on a revalidated response? *

@@ -896,6 +925,8 @@
     /* array of headers that should not be stored in cache */
     ps->ignore_headers = apr_array_make(p, 10, sizeof(char *));
     ps->ignore_headers_set = CACHE_IGNORE_HEADERS_UNSET;
+    ps->relaxupdates = 0;
+    ps->relaxupdates_set = 0;
     return ps;
 }
@@ -941,6 +972,10 @@
     (overrides->ignore_headers_set == CACHE_IGNORE_HEADERS_UNSET)
     ? base->ignore_headers
     : overrides->ignore_headers;
+    ps->relaxupdates =
+        (overrides->relaxupdates_set == 0)
+        ? base->relaxupdates
+        : overrides->relaxupdates;
     return ps;
 }
 static const char *set_cache_ignore_no_last_mod(cmd_parms *parms, void *dummy,
@@ -1119,6 +1154,19 @@
     return NULL;
 }
+static const char *set_cache_relaxupdates(cmd_parms *parms, void *dummy,
+                                          int flag)
+{
+    cache_server_conf *conf;
+
+    conf =
+        (cache_server_conf *)ap_get_module_config(parms->server->module_config,
+                                                  &cache_module);
+    conf->relaxupdates = flag;
+    conf->relaxupdates_set = 1;
+    return NULL;
+}
+
 static int cache_post_config(apr_pool_t *p, apr_pool_t *plog,
                              apr_pool_t *ptemp, server_rec *s)
 {
@@ -1171,6 +1219,11 @@
     AP_INIT_TAKE1("CacheLastModifiedFactor", set_cache_factor, NULL, RSRC_CONF,
                   "The factor used to estimate Expires date from LastModified date
Re: [PATCH]: mod_cache: don't store headers that will never be used
On Tue, 31 Jul 2007, Sander Striker wrote: Here's a version with a config directive, defaults to disabled. Silly Q; a directive? Or a env var that can be scoped in interesting ways using mod_setenvif and/or mod_rewrite? Most of our proxy behavior overrides are in terms of envvars. They are much more flexible to being tuned per-browser, per-backend etc. Directive, envvar, I don't think Niklas cares much. Can we make up our mind please? I have no clue on the envvar-stuff, so I don't think I'm qualified to have an opinion. CacheIgnoreCacheControl et al are config directives currently and I have the gut feeling that they should all either be envvar-thingies or config directives, and that starting to mix stuff will only end in confusion and despair ;) I prefer a config-option that I can set serverwide without too much fuss since we want this behaviour on all files. If this can also be accomplished with envvar-stuff then sure. One way might be to do a config directive for now, and deal with the envvar-stuff separately. Related, this config option might also be of interest for mod_disk_cache to enable similar optimizations. What would the good way be to accomplish this? /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- * - Tribble þ oð oð - Tribbles and Rock! =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: 1.3 bugs
On Thu, 2 Aug 2007, Jim Jagielski wrote: It's easy to be brave when being heartless :) Lots of WONTFIX :) Actually, it's more heartless to just leave the bugs without feedback. It gives people the impression that the developers simply don't care, and they will most likely never submit a bug report again. This is especially true if the reporter had come up with a fix and produced a patch... /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Death is nature's way of telling you to slow down... =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: [PATCH]: mod_cache: don't store headers that will never be used
On Tue, 31 Jul 2007, Niklas Edmundsson wrote: Any opinions on this? Here's a version with a config directive, defaults to disabled. Silly Q; a directive? Or a env var that can be scoped in interesting ways using mod_setenvif and/or mod_rewrite? Most of our proxy behavior overrides are in terms of envvars. They are much more flexible to being tuned per-browser, per-backend etc. Directive, envvar, I don't think Niklas cares much. Can we make up our mind please? I have no clue on the envvar-stuff, so I don't think I'm qualified to have an opinion. CacheIgnoreCacheControl et al are config directives currently and I have the gut feeling that they should all either be envvar-thingies or config directives, and that starting to mix stuff will only end in confusion and despair ;) I prefer a config-option that I can set serverwide without too much fuss since we want this behaviour on all files. If this can also be accomplished with envvar-stuff then sure. One way might be to do a config directive for now, and deal with the envvar-stuff separately. Related, this config option might also be of interest for mod_disk_cache to enable similar optimizations. What would the good way be to accomplish this? /Nikke /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Now, what was that magic word? Shazam? WHAM! Nah - Garibaldi =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: CHANGES
On Wed, 8 Aug 2007, Jim Jagielski wrote: I know I've said this before, but having copies of Changes in Apache 2.2.5 under the -trunk CHANGES file, as well as the 2.0.x stuff in both trunk and 2.2 means that we are pretty much assured that they will get out of sync. I'd like to re-propose that the CHANGES files only refer to changes related to that MAJOR.MINOR release and, at the end, refer people to the other CHANGES files for historical purposes (except for Apache 1.3 which will maintain CHANGES history since the beginning). Comments? I'd like to actually implement this for the TR Friday. I'm no committer or anything, but it sounds like the sane way to do it. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- I keep trying to lose weight, but it keeps finding me =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: [PATCH]: mod_cache: don't store headers that will never be used (fwd)
I think that this discussion kind of got lost due to vacations or something... In any case, I'd really like to get some closure. The discussion starts here for those of you that have deleted the thread: http://mail-archives.apache.org/mod_mbox/httpd-dev/200707.mbox/[EMAIL PROTECTED] (the permalink doesn't seem to show the nifty thread list, you have to click a bit for that). What I'd like answered is: - Was the latest patch as suggested OK? - What's the correct way of getting the mod_cache configuration from the mod_disk_cache module? /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Operator...give me the no for 999, QUICK! =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= -- Forwarded message -- From: Niklas Edmundsson [EMAIL PROTECTED] To: dev@httpd.apache.org Date: Wed, 8 Aug 2007 09:28:48 +0200 (MEST) Subject: Re: [PATCH]: mod_cache: don't store headers that will never be used Reply-To: dev@httpd.apache.org On Tue, 31 Jul 2007, Niklas Edmundsson wrote: Any opinions on this? Here's a version with a config directive, defaults to disabled. Silly Q; a directive? Or a env var that can be scoped in interesting ways using mod_setenvif and/or mod_rewrite? Most of our proxy behavior overrides are in terms of envvars. They are much more flexible to being tuned per-browser, per-backend etc. Directive, envvar, I don't think Niklas cares much. Can we make up our mind please? I have no clue on the envvar-stuff, so I don't think I'm qualified to have an opinion.
CacheIgnoreCacheControl et al are config directives currently and I have the gut feeling that they should all either be envvar-thingies or config directives, and that starting to mix stuff will only end in confusion and despair ;) I prefer a config-option that I can set serverwide without too much fuss since we want this behaviour on all files. If this can also be accomplished with envvar-stuff then sure. One way might be to do a config directive for now, and deal with the envvar-stuff separately. Related, this config option might also be of interest for mod_disk_cache to enable similar optimizations. What would the good way be to accomplish this? /Nikke /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Now, what was that magic word? Shazam? WHAM! Nah - Garibaldi =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: [PATCH]: mod_cache: don't store headers that will never be used (fwd)
On Wed, 10 Oct 2007, Graham Leggett wrote: Niklas Edmundsson wrote: What I'd like answered is: - Was the latest patch as suggested OK? The latest patch was the one with a directive, which is +1 from me - though is it possible to add documentation for the directive? Sure. Is http://apache-server.com/tutorials/ATdocs-project.html the relevant docco-documentation? Should it be a combined patch with both code and docs? - What's the correct way of getting the mod_cache configuration from the mod_disk_cache module? Look inside mod_proxy_http.c for a function called ap_proxy_read_headers(). In it, the module mod_proxy_http reads the config from the module mod_proxy. Thanks, I'll take a look :) /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Self-made man: A horrible example of unskilled labor. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Proposal: Increase request in worker_score
Hi all! We've been annoyed by the fact that the status page as served by mod_status only shows the first 64 bytes of the current requests for a couple of years now. We know that it's only meant to be a hint, not the complete request in all conditions, but the problem is that 64 bytes is just too short to be useful in a lot of cases. When using httpd for serving files on an FTP server you usually see the directory and the first characters of the filename, very annoying. Locally we've been running with a patch[1] to increase the size from 64 bytes to 192 bytes for a while now (since httpd 2.2.4 was released) with no ill effects. Admittedly, our servers are configured with MaxClients 6000 so they don't see the insane amount of simultaneous accesses as some bigger configurations out there. However, as a useful improvement we would like to propose that the request entry in worker_score is increased from 64 bytes to 128 bytes. This would cover most cases we've seen of missing just a couple of characters in mod_status to be able to determine which file is being accessed... In terms of memory footprint it would mean the following:

sizeof(worker_score) on:
32bit Linux (Ubuntu 6.06): from 224 bytes to 288 bytes
64bit Linux (Ubuntu 7.04): from 264 bytes to 328 bytes

Summing this up for a server configured for MaxClients 20000 it would mean:
32bit: from 4375kB to 5625kB
64bit: from 5156kB to 6406kB

Since we're talking about memory footprint increases in the megabyte range for a server configured for 20000 connections I can't see that the increased memory consumption should be a problem. To be honest though, I would really prefer having it increased to something like 192 or 256 bytes. 256 bytes would mean an increase of 3750kB for 20000 MaxClients. Not much of a big deal on modern (and not-so-modern) hardware IMHO. Thoughts? [1] - our patch to increase request scoreboard size.
--- ../dist/include/scoreboard.h	2006-07-12 05:38:44.0 +0200
+++ ./include/scoreboard.h	2007-09-20 20:24:41.0 +0200
@@ -125,7 +125,7 @@
 #endif
     apr_time_t last_used;
     char client[32];    /* Keep 'em small... */
-    char request[64];   /* We just want an idea... */
+    char request[192];  /* We just want an idea... */
     char vhost[32];     /* What virtual host is being accessed? */
 };

/Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Reformat Hard Drive! Are you SURE (Y/Y)? =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
mod_disk_cache jumbopatch - 20071016 revision for 2.2.6.
Hi all! I've uploaded a httpd 2.2.6-adapted version of our mod_disk_cache jumbo patch that we're using at ftp.acc.umu.se to http://issues.apache.org/bugzilla/show_bug.cgi?id=39380 for those who want a one-patch solution to using our modifications. The only changes from the last version are making the patch apply to httpd 2.2.6 and fixing the bugs that were fixed in the vanilla 2.2.6 version. It's survived one Ubuntu release, so it's fairly stable. We typically saw 250MB/s (we only have 2x gigabit) being delivered from the ftp cluster and the backend doing around 5-20MB/s serving up uncached files and file system traversals. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- * * * - Tribbles O O O - Tribbles on drugs =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: Apache 2.2 MPM Worker Virtual Memory Usage
On Sun, 21 Oct 2007, Ruediger Pluem wrote: What is your setting for ThreadsPerChild? On my Linux each thread consumes 8MB of virtual memory (I assume for stack and other thread private data) as shown by pmap. This can sum up to a large amount of memory. This is due to linux libc setting the thread stack size using the stack resource limit. We have the following in our apache httpd startup script: # NPTL (modern Linux threads) defaults the thread stack size to the setting # of your stack resource limit. The system-wide default for this is 8MB, # which is waaay exaggerated when running httpd. # 512kB should be more than enough (AIX manages on 96kB, Netware on 64kB). ulimit -s 512 We didn't bother with trying to lower it more, but I've run the same httpd config on IBM AIX 5.1 with the default 96kB thread stack size without problems. This could probably be worked around in httpd/APR by calling setrlimit before starting the threads, however I think it's probably better to just document this Linux thread bogosity and let vendors fix their httpd startup scripts though. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- No boom now. Boom tomorrow...there's ALWAYS a boom tomorrow...BOOM! =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: Proposal: Increase request in worker_score
On Sun, 21 Oct 2007, William A. Rowe, Jr. wrote: Could we start by increasing the existing one, which is rather easily done, and then move on to doing it the fancy way? If someone has a fancy-patch right now I'm all for that, but pending that I'd prefer landing some sort of improvement... I don't quite see the reasoning of having 2-steps to a solution, an intermediate that doesn't land in 2.2 or 2.4)... Just the logic of having some improvement committed if noone gets round to doing it the fancy way. If it gets replaced by a fancy solution it's not that much extra work that has been put into it, but if it's forgotten we'll have to live with yet another major version with this annoyingly small default buffer. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Can I have someone to eat? - Spike =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: Segmentation fault( SSL enable Apache 2.2.6(64 bit)
On Thu, 25 Oct 2007, Renu Tiwari wrote: Hi, We have configured Apache 2.2.6(64 bit) with openssl-0.9.8g on AIX5.2(64 bit). Build openssl source after setting BUILD_MODE=64. The issue is, when we start the Apache web server(./apachectl start), we are getting segmentation fault in error_log. This issue is coming only when openssl is coming into the picture. What cud be the possible reason? Please reply. Did your openssl 64bit build pass make test (or make check, whatever it's called)? /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- If I wanted your opinion, I would have given you one =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
RE: Segmentation fault( SSL enable Apache 2.2.6(64 bit)
On Thu, 25 Oct 2007, Renu Tiwari wrote: No, when I tried doing make test, it failed. But when I tried to run make install and make, there I didn't get any error. make install doesn't do the test. If your openssl doesn't pass make test, then it's broken. Fix that first. Also I have copied the same SSL-enabled Apache webserver on AIX 5.3(64 bit) there it is working perfectly fine. Does this depend on the kernel also. As AIX5.3 is 64-bit kernel and AIX 5.2 is 32-bit kernel m/c. But our application is running as 64 bit application. Shouldn't matter, could be that you're hitting some bug that's been fixed. You might want to check your C runtime patch levels, and other patches too for that matter. -Original Message- From: Niklas Edmundsson [mailto:[EMAIL PROTECTED] Sent: Thursday, October 25, 2007 2:36 PM To: 'dev@httpd.apache.org' Subject: Re: Segmentation fault( SSL enable Apache 2.2.6(64 bit) On Thu, 25 Oct 2007, Renu Tiwari wrote: Hi, We have configured Apache 2.2.6(64 bit) with openssl-0.9.8g on AIX5.2(64 bit). Build openssl source after setting BUILD_MODE=64. The issue is, when we start the Apache web server(./apachectl start), we are getting segmentation fault in error_log. This issue is coming only when openssl is coming into the picture. What cud be the possible reason? Please reply. Did your openssl 64bit build pass make test (or make check, whatever it's called)? /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- If I wanted your opinion, I would have given you one =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Taken as a whole, the universe is absurd =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: Proxying subrequests
On Sat, 27 Oct 2007, Paul Querna wrote: -0.9 on enabling this by default in mod_includes. Make it possible to turn it on via httpd.conf, but never on by default I agree. And it should have huge warning signs, and a long descriptive name that does not invite to let's try this and see if it solves my problem. Cross-site-include-holes are nasty, and I see it as a feature that they are not supported ;) /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- The only one who can destroy your Tasha now, is you. Q =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: Proposal: Increase request in worker_score
On Wed, 31 Oct 2007, Jim Jagielski wrote: For those interested, check out http://svn.apache.org/viewvc?rev=590641&view=rev passes tests and works as expected, at least in my limited testing :) Again, the main focus in this was to resolve the issue in a 2.2-friendly way. So I'd like to get additional feedback with that in mind before I propose a backport. Seems reasonable for 2.2. I'm not too keen on the directive name though, but since I have no better suggestion I'll be quiet now ;) /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Vell, Zaphod's just zis guy you know? - Gag Halfrunt. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: svn commit: r601843 - in /httpd/mod_ftp/trunk: STATUS include/mod_ftp.h
On Thu, 6 Dec 2007, William A. Rowe, Jr. wrote: First question, are there testers who will test/vote on the module? I'm game for testing. Our environment is strictly anonftp read-only though, so I won't test the non-anon stuff. Having the thing work with mod_cache would be absolute bliss, but I guess that's an item to chew on after the first release ;) + * FTPLimit* family of directives share an FTPLimitDBFile across hosts, +yet fail to scope their tracking records to the corresponding host. If there's no fix, I'd just mark those directives as experimental and call it baked. If it's documented how it works, that's fine. Although, if they don't scope correctly per-host they should probably be server-wide to avoid confusion by people that don't read the fine print. Who's interested in seeing a TR and helping make the release happen? I'm keen on helping out. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- If you call your doctor Bones, YMBAT =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: [VOTE] initial release of httpd-mod_ftp-0.9.0
On Tue, 18 Dec 2007, William A. Rowe, Jr. wrote: Please fetch up the newly prepared httpd-mod_ftp-0.9.0.tar.[gz|bz2] (and its md5/asc sigs) from: http://httpd.apache.org/dev/dist/mod_ftp/ review, take it for a spin, and cast your choice As I mentioned, the perms of the installed httpd include directory were corrupted to 664 by the first candidate, so I've withdrawn it. Proceeding to tag the next crack at an alpha/beta 0.9.1 tomorrow, You might want to have a go at the configure.apxs before doing that. It seems to contain some bashisms that show up on debian/ubuntu machines, which use dash as /bin/sh: % ./configure.apxs test: 8: ==: unexpected operator test: 19: ==: unexpected operator Configuring mod_ftp for APXS ... The thing is that == is not a valid /bin/sh style test expression. It should probably be just =, or test -z $var ... On the positive side, the thing builds on both Linux and AIX (out of tree, for httpd 2.2.6). I'll await the 0.9.1 tag before doing more elaborate tests though. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- I'm not crazy, I just don't give a s#!t =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
What's the right way to spawn a child in modules?
Hi all! I'm currently working on beating mod_disk_cache into submission, with the goal of it being able to deliver data while caching a file (this started as bug #39380). I have solved most of the problems, I'll submit patches when they have passed the scrutiny of my fellow computer club admins. The goal is to make the thing usable on http://ftp.acc.umu.se/ after all (and no, we don't have a budget so we can't compensate for bad code with more hardware ;). Anyhow, my real question: What's the right way to spawn a child in modules? The problem is that the current mod_disk_cache design means that the first one to request an uncached file gets to wait until it's cached, since the caching is done by that request. That can be a long time to wait for a reply when you're caching a 4GB DVD image from a slow backend. The naive solution is to spawn a child that does the copying, letting the request be processed simultaneously. Is this doable? Would it be considered offensive to do apr_thread_create() if threads are available and fork() otherwise? Other ways to solve this? /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- This building is so high, the elevator shows movies. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: What's the right way to spawn a child in modules?
On Thu, 27 Apr 2006, Brian Akins wrote: Would it be considered offensive to do apr_thread_create() if threads are available and fork() otherwise? sounds reasonable - having only thought about it for 10 seconds.. OK. I'll try then and see how it plays out. Any particular reason your backends are slow? Currently: Old hardware. In the future: Gigabit Ethernet. I might add that our FTP mirror has a bunch of DVD images, and even at full gigabit speed it takes some 40 seconds to cache it and that's simply too long before the server starts responding by sending data to the client. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Go Ahead.. We're cleared for wierd. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: Possible new cache architecture
On Mon, 1 May 2006, Davi Arnaut wrote: More important, if we stick with the key/data concept it's possible to implement the header/body relationship under single or multiple keys. I've been hacking on mod_disk_cache to make it: * Only store one set of data when one uncached item is accessed simultaneously (currently all requests cache the file and the last finished cache process wins). * Don't wait until the whole item is cached, reply while caching (currently it stalls). * Don't block the requesting thread when requesting a large uncached item, cache in the background and reply while caching (currently it stalls). This is mostly aimed at serving huge static files from a slow disk backend (typically an NFS export from a server holding all the disk), such as http://ftp.acc.umu.se/ and http://ftp.heanet.ie/ . Doing this with the current mod_disk_cache disk layout was not possible, doing the above without unnecessary locking means: * More or less atomic operations, so caching headers and data in separate files gets very messy if you want to keep consistency. * You can't use tempfiles since you want to be able to figure out where the data is to be able to reply while caching. * You want to know the size of the data in order to tell when you're done (ie the current size of a file isn't necessarily the real size of the body since it might be caching while we're reading it). In the light of our experiences, I really think that you want to have a concept that allows you to keep the bond between header and data. Yes, you can patch up a missing bond by requiring locking and stuff, but I really prefer not having to lock cache files when doing read access. When it comes to making the common case fast, a lockless design is very much preferred. However, if all those issues are sorted out in the layer above disk cache then the above observations become more or less moot. 
In any case the patch is more or less finished, independent testing and auditing haven't been done yet but I can submit a preliminary jumbo-patch if people are interested in having a look at it now. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Want to forget all your troubles? Wear tight shoes. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
mod_disk_cache patch, preview edition (was: new cache arch)
On Tue, 2 May 2006, Graham Leggett wrote: I've been hacking on mod_disk_cache to make it: * Only store one set of data when one uncached item is accessed simultaneously (currently all requests cache the file and the last finished cache process wins). * Don't wait until the whole item is cached, reply while caching (currently it stalls). * Don't block the requesting thread when requesting a large uncached item, cache in the background and reply while caching (currently it stalls). This is great, in doing this you've been solving a proxy bug that was first reported in 1998 :). OK. Stuck in the File under L for Later pile? ;) The only things to be careful of are for Cache-Control: no-cache and friends to be handled gracefully (the partially cached file should be marked as delete-me so that the current request creates a new cache file / no cache file. Existing running downloads should be unaffected by this.), and for backend failures (either a timeout or a premature socket close) to cause the cache entry to be invalidated and deleted. I haven't changed the handling of this, so any bugs in this regard shouldn't be my fault at least ;) Regarding partially cached files, it understands when caching a file has failed and so on. * More or less atomic operations, so caching headers and data in separate files gets very messy if you want to keep consistency. Keep in mind that HTTP/1.1 compliance requires that the headers be updatable without changing the body. They are. It seek():s to an offset where the body is stored so headers can be updated as long as they don't grow too much. * You can't use tempfiles since you want to be able to figure out where the data is to be able to reply while caching. * You want to know the size of the data in order to tell when you're done (ie the current size of a file isn't necessarily the real size of the body since it might be caching while we're reading it). 
The cache already wants to know the size of the data so that it can decide whether it's prepared to try and cache the file in the first place, so in theory this should not be a problem. The need-size-issue goes for retrievals as well. You also have the "size unknown right now" issue, which this patch solves by writing a header with the size -1 and then updating it when the size is known. In any case the patch is more or less finished, independent testing and auditing haven't been done yet but I can submit a preliminary jumbo-patch if people are interested in having a look at it now. Post it, people can take a look. OK. It's attached. It has only had mild testing using the worker mpm with mmap enabled, it needs a bit more testing and auditing before trusting it too hard. Note that this patch fixes a whole slew of other issues along the way, the most notable ones being LFS on 32bit arch, don't eat all your 32bit memory/address space when caching huge files, provide r->filename so %f in LogFormat works, and other smaller issues. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- I am Zirofsky of Borg. I will reassimilate Alaska and Finland. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= httpd-2.2.2-mod_disk_cache-jumbo20060502.patch.gz Description: Binary data
Re: Possible new cache architecture
On Tue, 2 May 2006, Graham Leggett wrote: This is great, in doing this you've been solving a proxy bug that was first reported in 1998 :). This already works in the case you get the data from the proxy backend. It does not work for local files that get cached (the scenario Niklas uses the cache for). Ok then I have misunderstood - I was referring to the thundering herd problem. Exactly what is the thundering herd problem? I can guess the general problem, but without a more precise definition I can't really say if my patch fixes it or not. If it's: * Link to latest GNOME Live CD gets published on Slashdot. * A gazillion users click the link to download it. * mod_disk_cache starts a new instance of caching the file for each request, until someone has completed caching the file. Then this patch solves the problem regardless of whether it's a static file or dynamically generated content since it only allows one instance to cache the file (OK, there's a small hole so there can be multiple instances but it's way smaller than now), all other instances deliver data as the caching process is writing it. Additionally, if it's a static file that's allowed to be cached in the background it solves: * Reduce chance of user getting bored since the data is delivered while being cached. * The user got bored and closed the connection so the painfully cached file gets deleted. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Illiterate? Write for information! =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: Possible new cache architecture
On Tue, 2 May 2006, Plüm, Rüdiger, VF EITO wrote: Another thing: I guess on systems with no mmap support the current implementation of mod_disk_cache will eat up a lot of memory if you cache a large local file, because it transforms the file bucket(s) into heap buckets in this case. Even if mmap is present I think that mod_disk_cache causes the file buckets to be transformed into many mmap buckets if the file is large. Thus we do not use sendfile in the case we cache the file. Correct. When caching a 4.3GB file on a 32bit arch it gets so bad that mmap eats all your address space and the thing segfaults. I initially thought it was eating memory, but that's only if you have mmap disabled. In the case that a brigade only contains file_buckets it might be possible to copy this brigade, send it up the chain and process the copy of the brigade for disk storage afterwards. Of course this opens a race if the file gets changed in between these operations. This approach does not work with socket or pipe buckets for obvious reasons. Even heap buckets seem to be a somewhat critical idea because of the added memory usage. I did the somewhat naive approach of only doing background caching when the buckets refer to a single sequential file. It's not perfect, but it solves the main case where you get a huge amount of data to store ... /Nikke - stumbled upon more than one bug when digging into mod_disk_cache -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Anything is edible if it's chopped finely enough =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: Possible new cache architecture
On Tue, 2 May 2006, Graham Leggett wrote: If it's: * Link to latest GNOME Live CD gets published on Slashdot. * A gazillion users click the link to download it. * mod_disk_cache starts a new instance of caching the file for each request, until someone has completed caching the file. Then this is the thundering herd problem :) OK :) Either a site is slashdotted (as in your case), or a cached entry expires, and suddenly the backend gets nailed until at least one request wins, then we are back to normal serving from the cache. In your case, the backend is the disk, while in the bug from 1998, the backend was another webserver. Either way, same problem. OK. Then this patch solves the problem regardless of whether it's a static file or dynamically generated content since it only allows one instance to cache the file (OK, there's a small hole so there can be multiple instances but it's way smaller than now), all other instances deliver data as the caching process is writing it. Additionally, if it's a static file that's allowed to be cached in the background it solves: * Reduce chance of user getting bored since the data is delivered while being cached. * The user got bored and closed the connection so the painfully cached file gets deleted. Hmmm - thinking about this, we try to cache the brigade (all X GB of it) first, then we try to write it to the network, thus the delay. Does your patch solve all of these already, or are they planned? It solves everything I've mentioned. The solution is probably not perfect for the not-static-file case since it falls back to the old behaviour of caching the whole file, but it should be a lot better than the current mod_disk_cache since the rest of the threads get reply-while-caching. There are issues here with the fact that the result is discarded if the connection is aborted, but I'm not familiar enough with apache filter internals to state that you can keep the result even though the connection is aborted. 
/Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Anything is edible if it's chopped finely enough =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: mod_disk_cache patch, preview edition (was: new cache arch)
On Tue, 2 May 2006, Graham Leggett wrote: The need-size-issue goes for retrievals as well. If you are going to read from partially cached files, you need a total size field as well as a flag to say "give up, this attempt at caching failed" Are there partially cached files? If I request the last 200 bytes of a 4.3GB DVD image, the bucket brigade contains the complete file... The headers say ranges and all sorts of things but they don't match what's cached. What may be useful is a cache header with some metadata in it giving the total size and a download failed flag, which goes in front of the headers. The metadata can also contain the offset of the body. I solved it with size in the body and a timeout mechanism, a download failed flag doesn't cope with segfaults. OK. It's attached. It has only had mild testing using the worker mpm with mmap enabled, it needs a bit more testing and auditing before trusting it too hard. Note that this patch fixes a whole slew of other issues along the way, the most notable ones being LFS on 32bit arch, don't eat all your 32bit memory/address space when caching huge files, provide r->filename so %f in LogFormat works, and other smaller issues. Is it possible to split the patch into separate fixes for each issue (where practical)? It makes it easier to digest. It's possible, but since I needed to hammer so hard at mod_disk_cache to get it in the shape I wanted it I set out to first get the whole thing working and then worry about breaking the patch into manageable pieces. For example, by doing it all-incremental there would have been a dozen or so disk format change-patches, and I really don't think you would have wanted that :) As said, this is a preliminary jumbo patch for those interested in how we tackled the various problems involved (or those who love to take bleeding edge code for a spin and watch it falling into pieces when hitting a weird corner case ;). 
Also the other fixes can be committed immediately/soon, depending on how simple they are, which will simplify the final patch. Yup. I'll update bug#39380 when we feel that we have a good solution. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- To err is Human. To blame someone else is politics. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=