Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities

2006-09-14 Thread Niklas Edmundsson

On Wed, 13 Sep 2006, Davi Arnaut wrote:

I'm working on this. You may want to check my proposal at 
http://verdesmares.com/Apache/proposal.txt


Will it be possible to do away with one file for headers and one file 
for body in mod_disk_cache with this scheme?


The thing is that I've been pounding seriously at mod_disk_cache to 
make it able to sustain rather heavy load on not-so-heavy equipment, 
and part of that effort was to wrap headers and body into one file, 
mainly for the following purposes:


* Fewer files, fewer open():s (small gain).
* Much easier to purge old entries from the cache (huge gain).
  Simply list all files in the cache, sort by atime and remove the
  oldest (see the sketch after this list). The old way using
  htcacheclean took ages and had less useful removal criteria.
* No synchronisation issues between the header file and body file:
  unlink one file and the entry is gone.
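For illustration, here's a minimal sketch (mine, not from the patch; it
assumes a single flat cache directory and APR) of what "list all files,
sort by atime, remove the oldest" can look like:

#include <stdlib.h>
#include "apr_file_io.h"
#include "apr_strings.h"
#include "apr_tables.h"

typedef struct {
    const char *name;
    apr_time_t atime;
} purge_ent;

static int cmp_atime(const void *a, const void *b)
{
    const purge_ent *pa = a, *pb = b;
    return (pa->atime > pb->atime) - (pa->atime < pb->atime);
}

/* Remove the 'nremove' least recently accessed files in 'dir' (sketch:
   a real purger would also look at sizes and recurse into subdirs). */
static apr_status_t purge_oldest(const char *dir, int nremove, apr_pool_t *p)
{
    apr_dir_t *d;
    apr_finfo_t finfo;
    apr_array_header_t *ents = apr_array_make(p, 64, sizeof(purge_ent));
    purge_ent *e;
    apr_status_t rc;
    int i;

    rc = apr_dir_open(&d, dir, p);
    if (rc != APR_SUCCESS) {
        return rc;
    }
    while (apr_dir_read(&finfo, APR_FINFO_NAME | APR_FINFO_TYPE | APR_FINFO_ATIME,
                        d) == APR_SUCCESS) {
        if (finfo.filetype != APR_REG) {
            continue;                        /* skip "." ".." and subdirs */
        }
        e = apr_array_push(ents);
        e->name = apr_pstrcat(p, dir, "/", finfo.name, NULL);
        e->atime = finfo.atime;
    }
    apr_dir_close(d);

    /* Oldest atime first, then unlink from the front of the list. */
    qsort(ents->elts, ents->nelts, sizeof(purge_ent), cmp_atime);
    for (i = 0; i < nremove && i < ents->nelts; i++) {
        apr_file_remove(((purge_ent *)ents->elts)[i].name, p);
    }
    return APR_SUCCESS;
}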

That's only one of many changes made, but I found it crucial for having 
an architecture that's consistent without relying on locks. This made 
it rather easy to implement things like serving files from the cache 
while they are still being cached, reusing expired cached files if the 
originating file is found to be unmodified, and so on.


But the largest gain is still the cache cleaning process.

The stuff is used in production and seems stable; however, I haven't 
had any response to the first (trivial) patch I sent, so I don't know 
if there's any interest in this.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Does the Little Mermaid wear an algebra?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities

2006-09-14 Thread Niklas Edmundsson

On Thu, 14 Sep 2006, Graham Leggett wrote:


Niklas Edmundsson wrote:

Will it be possible to do away with one file for headers and one file for 
body in mod_disk_cache with this scheme?


This definitely has lots of advantages - however HTTP/1.1 requires that it be 
possible to modify the headers on a cached entry independently of the cached 
body. As long as this is catered for, it should be fine.


Our patch allows for this: the body is simply stored at an offset, with 
some logic to detect headers larger than the offset and cope with that 
too (albeit this introduces a risk of bad data being sent to the 
client due to the lockless design, so you really want to avoid it by 
making the offset large enough).


Since seeking and writing at an offset doesn't occupy disk space on 
normal Unix filesystems, there isn't a problem with having the data at 
a rather large offset, but I don't know how non-Unix filesystems behave 
in this regard.
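As a rough sketch of the layout (my illustration, not the patch code;
the 64KB offset is an assumed constant): the header lives at the start
of the cache file and the body is written at a fixed offset, which on
typical Unix filesystems leaves the gap as a sparse hole:

#include "apr_file_io.h"

/* Assumed layout constant (illustrative only): the body always starts
 * here, leaving room for the on-disk header to be rewritten in place. */
#define BODY_OFFSET ((apr_off_t)65536)

static apr_status_t write_body_at_offset(apr_file_t *fd,
                                         const char *buf, apr_size_t len)
{
    apr_off_t off = BODY_OFFSET;
    apr_status_t rv = apr_file_seek(fd, APR_SET, &off);

    if (rv != APR_SUCCESS) {
        return rv;
    }
    /* On most Unix filesystems the bytes between the end of the header
     * and this offset are never allocated, so a large offset is free. */
    return apr_file_write_full(fd, buf, len, NULL);
}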


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 To refuse praise is to seek praise twice.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


[PATCH] (resend) mod_disk_cache LFS-aware config

2006-09-14 Thread Niklas Edmundsson


To facilitate the merging of our large mod_disk_cache fixup I will 
send small patches that fix various bugs, so that they can be applied 
incrementally to trunk, with the relevant discussion limited to those 
patches and without me having to respin entire patchsets due to trivial 
fixes to patches like this one.


If you want larger patchsets instead of this baby-steps approach 
that's fine by me, but small pieces usually allow for easier review 
when merging.


This patch and the jumbo patch with all fixes are also attached to bug 
#39380.


This patch makes it possible to configure mod_disk_cache to cache 
files that are larger than the LFS limit. While at it, I implemented 
error handling so it no longer accepts things like 
"CacheMinFileSize barf".


Actual LFS support (the current code eats all address space/memory on 
32-bit boxes) will come in a separate patch once this is committed.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Ensign.  How do I get to Ten-Forward? - Picard
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Index: mod_disk_cache.c
===================================================================
--- mod_disk_cache.c    (revision 416365)
+++ mod_disk_cache.c    (working copy)
@@ -334,14 +334,14 @@ static int create_entity(cache_handle_t
     if (len > conf->maxfs) {
         ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
                      "disk_cache: URL %s failed the size check "
-                     "(%" APR_OFF_T_FMT " > %" APR_SIZE_T_FMT ")",
+                     "(%" APR_OFF_T_FMT " > %" APR_OFF_T_FMT ")",
                      key, len, conf->maxfs);
         return DECLINED;
     }
     if (len >= 0 && len < conf->minfs) {
         ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
                      "disk_cache: URL %s failed the size check "
-                     "(%" APR_OFF_T_FMT " < %" APR_SIZE_T_FMT ")",
+                     "(%" APR_OFF_T_FMT " < %" APR_OFF_T_FMT ")",
                      key, len, conf->minfs);
         return DECLINED;
     }
@@ -1026,7 +1026,7 @@ static apr_status_t store_body(cache_han
     if (dobj->file_size > conf->maxfs) {
         ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
                      "disk_cache: URL %s failed the size check "
-                     "(%" APR_OFF_T_FMT " > %" APR_SIZE_T_FMT ")",
+                     "(%" APR_OFF_T_FMT " > %" APR_OFF_T_FMT ")",
                      h->cache_obj->key, dobj->file_size, conf->maxfs);
         /* Remove the intermediate cache file and return non-APR_SUCCESS */
         file_cache_errorcleanup(dobj, r);
@@ -1050,7 +1050,7 @@ static apr_status_t store_body(cache_han
     if (dobj->file_size < conf->minfs) {
         ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
                      "disk_cache: URL %s failed the size check "
-                     "(%" APR_OFF_T_FMT " < %" APR_SIZE_T_FMT ")",
+                     "(%" APR_OFF_T_FMT " < %" APR_OFF_T_FMT ")",
                      h->cache_obj->key, dobj->file_size, conf->minfs);
         /* Remove the intermediate cache file and return non-APR_SUCCESS */
         file_cache_errorcleanup(dobj, r);
@@ -1137,15 +1137,25 @@ static const char
 {
     disk_cache_conf *conf = ap_get_module_config(parms->server->module_config,
                                                  &disk_cache_module);
-    conf->minfs = atoi(arg);
+
+    if (apr_strtoff(&conf->minfs, arg, NULL, 0) != APR_SUCCESS ||
+        conf->minfs < 0)
+    {
+        return "CacheMinFileSize argument must be a non-negative integer "
+               "representing the min size of a file to cache in bytes.";
+    }
     return NULL;
 }
+
 static const char
 *set_cache_maxfs(cmd_parms *parms, void *in_struct_ptr, const char *arg)
 {
     disk_cache_conf *conf = ap_get_module_config(parms->server->module_config,
                                                  &disk_cache_module);
-    conf->maxfs = atoi(arg);
+    if (apr_strtoff(&conf->maxfs, arg, NULL, 0) != APR_SUCCESS ||
+        conf->maxfs < 0)
+    {
+        return "CacheMaxFileSize argument must be a non-negative integer "
+               "representing the max size of a file to cache in bytes.";
+    }
     return NULL;
 }
 
Index: mod_disk_cache.h
===================================================================
--- mod_disk_cache.h    (revision 416365)
+++ mod_disk_cache.h    (working copy)
@@ -88,8 +88,8 @@ typedef struct {
     apr_size_t cache_root_len;
     int dirlevels;               /* Number of levels of subdirectories */
     int dirlength;               /* Length of subdirectory names */
-    apr_size_t minfs;            /* minumum file size for cached files */
-    apr_size_t maxfs;            /* maximum file size for cached files */
+    apr_off_t minfs;             /* minimum file size for cached files */
+    apr_off_t maxfs

Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities

2006-09-14 Thread Niklas Edmundsson

On Thu, 14 Sep 2006, Davi Arnaut wrote:



On 14/09/2006, at 04:24, Niklas Edmundsson wrote:


On Wed, 13 Sep 2006, Davi Arnaut wrote:

I'm working on this. You may want to check my proposal at 
http://verdesmares.com/Apache/proposal.txt


Will it be possible to do away with one file for headers and one file for 
body in mod_disk_cache with this scheme?


http://verdesmares.com/Apache/patches/016.patch


OK. You seem to dump the body right after the headers though, so you 
won't be able to do header rewrites.


Also, it's rather unnecessary to call the files .cache if there is 
only one type of file ;)


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 You have learned much, young one. - Vader
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: [PATCH] (resend) mod_disk_cache LFS-aware config

2006-09-14 Thread Niklas Edmundsson

On Thu, 14 Sep 2006, Graham Leggett wrote:


On Thu, September 14, 2006 11:17 am, Niklas Edmundsson wrote:


To facilitate the merging of our large mod_disk_cache fixup I will
send small patches that fix various bugs so that they can be applied
incrementally to trunk with relevant discussion limited to those
patches and me not having to respin entire patchsets due to trivial
fixes to patches like this one.


+1.

This also makes it easier when more than one person is working on
patchsets to integrate both patches.


Yup. The situation seems to be complicated somewhat by Davi working on 
the cache-thingies, and doing more than just poking around in the 
mod_cache infrastructure...


However, it seems that we really should start merging stuff in a tree, 
be it trunk or cache-dev or whatever, before we are sitting on two 
hard-to-merge trees which both hold significant improvements.


As said, our stuff is stable in production (except for one bug that I 
suspect is an Apache/APR bug, more about that when/if we get to that 
part) and transforms mod_disk_cache from unusable for us to performing 
nicely, with an approx. 90% cache hit rate when serving ftp.acc.umu.se, 
ftp.gnome.org, se.releases/archive.ubuntu.com, releases.mozilla.org, 
ftp.se.debian.org ...


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 You have learned much, young one. - Vader
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities

2006-09-14 Thread Niklas Edmundsson

On Thu, 14 Sep 2006, Davi Arnaut wrote:

I'm working on this. You may want to check my proposal at 
http://verdesmares.com/Apache/proposal.txt
Will it be possible to do away with one file for headers and one file 
for body in mod_disk_cache with this scheme?


http://verdesmares.com/Apache/patches/016.patch


OK. You seem to dump the body right after the headers though, so you won't 
be able to do header rewrites.


Could you kindly point me to the cache code that rewrites only the headers ?


If I remember correctly the code in 2.2.3 only does whole-file 
revalidation. The next logical step (which our patch does) is to make 
it understand that if the source file hasn't changed you don't have to 
copy the whole file, since it's enough to just update the headers.


Our patch does this because it's needed to get decent performance 
when juggling DVD images (yes, recaching a 4GB file is rather 
expensive).
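A minimal sketch of that header-only update, assuming the single-file
layout with the body at a fixed offset described earlier (my
illustration, not the patch itself):

#include "apr_file_io.h"

/* Rewrite only the header area at the start of the cache file; the
 * (possibly multi-GB) body at its fixed offset is left untouched.
 * 'hdrbuf'/'hdrlen' are assumed to be the freshly serialized headers,
 * already known to fit below the body offset. */
static apr_status_t update_headers_in_place(apr_file_t *fd,
                                            const char *hdrbuf,
                                            apr_size_t hdrlen)
{
    apr_off_t off = 0;
    apr_status_t rv = apr_file_seek(fd, APR_SET, &off);

    if (rv != APR_SUCCESS) {
        return rv;
    }
    rv = apr_file_write_full(fd, hdrbuf, hdrlen, NULL);
    if (rv != APR_SUCCESS) {
        return rv;
    }
    return apr_file_flush(fd);
}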


There are a couple of trivial improvements like this that need to be 
done in mod_disk_cache and that depend on the underlying disk storage 
layer being done right. However, given the current state of 
mod_disk_cache almost everything is an improvement...


Also, it's rather unnecessary to call the files .cache if there is only 
one type of file ;)


That's convenience; there may be other types of files in the same cache 
directory that are created by other tools.


That seems silly to me; the cache directory structure should be 
strictly private to the cache.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 You have learned much, young one. - Vader
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: [PATCH] (resend) mod_disk_cache LFS-aware config

2006-09-14 Thread Niklas Edmundsson

On Thu, 14 Sep 2006, Graham Leggett wrote:


On Thu, September 14, 2006 2:41 pm, Niklas Edmundsson wrote:


Yup. The situation seems to be complicated somewhat by Davi working on
the cache-thingies, and doing more than just poking around in the
mod_cache infrastructure...

However, it seems that we really should start merging stuff in a tree,
be it trunk or cache-dev or whatever, before we are sitting on two
hard-to-merge trees which both holds significant improvements.


Small patches are easy to merge, easy to review, and unlikely to clash -
what I am keen to do is start finding all the small fixes in both your and
Davi's code, and see them all applied. Davi's patches are already
reasonably small - is it possible to break up your patch into discrete
bits as well?


That's my intention. I had hoped for my small patches to be applied to 
trunk one by one as they come, fixing any objections patch by 
patch instead of having to respin large patchsets that depend on 
each other. At least for the small obvious fixes this should be doable.


I.e., make a small patch, submit it for review/commit, fix/redo if 
needed. On to the next patch.


When there are more complex changes I'll probably have to do 
multi-part patchsets, but I really want to avoid them since they're a 
pain when it comes to rejections.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Read the docs. Wow, what a radical concept!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities

2006-09-16 Thread Niklas Edmundsson

On Fri, 15 Sep 2006, Brian Akins wrote:

The separate header and body files work wonderfully for performance (filling 
multiple gig interfaces and/or 30k requests/sec on rather modest hardware). 
If you have them all in one, it can make the sendfile for the body 
cumbersome.


If you write to the file using mmap on Linux, then sendfile() breaks, 
yes. mmap didn't give any major performance benefit for the body copy 
though, so it doesn't matter and we don't use it. This is really a 
Linux bug, since non-overlapping write/sendfile should be OK.


If you somehow track what entries are in the cache, it is very easy to purge 
entries.


Extra tracking sounds unnecessary if you can do it in a way that 
doesn't need it.


At Apachecon, I'll talk some about our version of mod_cache. 
Unfortunately, I can't share code :( But I can tell you the separate 
files way is not a performance or housekeeping issue.


If you have the index, I can agree.

However, I don't see how you can do a lockless design with multiple 
files and an index that can do:


* Clients read from the cache as files are being cached.
* Only one session caches the same file.
* Header/Body updates.
* No index/files out-of-sync issues. Ever.

With locks, yes it's possible but also a hassle to get right with 
performance intact.


The current mod_disk_cache seems to be designed for small files and 
enough memory to hide the problems caused by the design. If you have 
files that fit into the OS cache then it doesn't matter if hundreds of 
sessions are caching the same file; it'll work out eventually without 
reduced performance. This isn't the case when each file (DVD image) is 
bigger than your memory and doesn't fit in the OS file cache. In fact 
you can tell that the author never even considered this, given the way 
the body is copied (on 32-bit you lose).


We, as an FTP mirror operated by a non-profit computer club, have a 
slightly different use case, with single files larger than machine RAM 
and a working set approx. 40 times larger than RAM. Some bad design 
decisions in mod_disk_cache become really visible in this 
environment.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 I wish I had a snappy Trek Message to put here...
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities

2006-09-18 Thread Niklas Edmundsson

On Sun, 17 Sep 2006, Graham Leggett wrote:


Niklas Edmundsson wrote:

However, I don't see how you can do a lockless design with multiple files 
and an index that can do:


* Clients read from the cache as files are being cached.
* Only one session caches the same file.
* Header/Body updates.
* No index/files out-of-sync issues. Ever.


Thinking about this some more I do see a race during purging - a cache thread 
could read the header, the purge deletes header and body, and then the cache 
thread reads the body and interprets the missing body as meaning the body is 
still coming.


The easiest way to deal with this might be to have a timeout: if the 
body hasn't shown up within $timeout, then something went bad and we 
DECLINE, meaning that the cache layer thinks it should cache the file 
and acts accordingly. You actually want this fallback anyway, and it's 
probably enough to deal with the purge problem. The purge should 
delete the oldest unused entries anyway, so hitting that case 
shouldn't be too common.
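A minimal sketch of that timeout fallback (my interpretation, not code
from our patch; the 200ms poll and 30s limit are arbitrary numbers):

#include "apr_file_io.h"
#include "apr_time.h"

static apr_status_t open_body_with_timeout(apr_file_t **fd, const char *path,
                                           apr_pool_t *p, int *give_up)
{
    apr_time_t deadline = apr_time_now() + apr_time_from_sec(30);
    apr_status_t rv;

    *give_up = 0;
    while ((rv = apr_file_open(fd, path, APR_READ | APR_BINARY, 0, p))
           != APR_SUCCESS) {
        if (!APR_STATUS_IS_ENOENT(rv)) {
            return rv;              /* a real error, not "not there yet" */
        }
        if (apr_time_now() > deadline) {
            /* Body never showed up (purged, or the caching session died):
             * signal the caller to DECLINE so the cache layer recaches it. */
            *give_up = 1;
            return rv;
        }
        apr_sleep(200000);          /* 200 ms */
    }
    return APR_SUCCESS;
}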


And yes, since this scheme can only cause stray on-disk files that 
can be cleaned up by purging, I can agree that it'll work. However, I 
strongly believe that the purging should not have to read each header 
file the way htcacheclean currently does, since that puts such a 
strain on the cache filesystem. A file system traversal should be 
enough.


Anyhow, I can probably rather easily adapt our patches to do it this 
way if that's what people want. I'm not entirely sure what the gain 
would be though, since it's a tad more housekeeping work and double 
the number of inodes to traverse during a purge...


But that is future work. I haven't had any comment on my current 
patch (lfs-config) yet, so I'm not entirely sure whether it 
seems OK and I should proceed with the next patch, or what. I'm not 
that well versed in all the APIs involved, and stuff that looks right to 
me might have a much better, more Apache-ish solution, so I don't want 
to get carried away creating huge patchsets only to have the first one 
rejected because my coding style sucks... However, I can understand if 
you want a complete patch that solves the LFS issues, but then you'll 
have to tell me, since I'm not a mind reader ;)


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
  --- tribbles playing follow-the-leader
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities

2006-09-20 Thread Niklas Edmundsson

On Mon, 18 Sep 2006, Brian Akins wrote:


Niklas Edmundsson wrote:



Extra tracking sounds unnecessary if you can do it in a way that
doesn't need it.


It's not extra, it's just adding some tracking. When an object gets cached, 
log (SQL, db, whatever) that /blah/foo/bar.html is cached as 
/cache/x/y/something.meta. Then it's very easy to ask the store: what is 
/blah/foo/bar.html cached as? There may be multiples because of Vary.


Extra because you already have the needed info to puzzle the things 
together...



* Clients read from the cache as files are being cached.


That's the hard one, IMO


But the implementation was rather easy once the "cache to separate 
file and mv to correct location" stuff was ripped out. Or, as easy as 
building your own bucket type is.


* Only one session caches the same file.


Easy to do if we use deterministic tmp files and not the way we currently do 
it. Then all you have to do is use O_EXCL when creating the temp files.


Or, if we skip the tmp files altogether.
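A small sketch of that approach (hedged, not the actual patch): open the
final cache file name directly with APR's O_EXCL equivalent, so the first
session wins the right to cache and everyone else backs off:

#include "apr_file_io.h"

/* Try to become the one session that caches this entry: with a
 * deterministic cache file name, APR_EXCL makes the open fail for
 * everyone but the first caller. APR_STATUS_IS_EEXIST() on the result
 * distinguishes "someone else is already caching it" from real errors. */
static apr_status_t claim_cache_slot(apr_file_t **fd, const char *cachefile,
                                     apr_pool_t *p)
{
    return apr_file_open(fd, cachefile,
                         APR_WRITE | APR_CREATE | APR_EXCL | APR_BINARY,
                         APR_OS_DEFAULT, p);
}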


* Header/Body updates.


Easier with separate files, like mod_disk_cache does now.


True.


* No index/files out-of-sync issues. Ever.


Hard to guarantee, but not impossible. Always add to the index when storing a 
file and remove it when deleting. This should use something like providers so 
it's not in core cache code and can be easily modified.



With locks, yes it's possible but also a hassle to get right with
performance intact.


Not really that hard.  Trust me it has been done...


I'll take your word for that.


We, as a ftp mirror operated by a non-profit computer club, have a
slightly different usecase with single files larger than machine RAM
and a working set of approx 40 times larger than RAM. Some bad design
decisions in mod_disk_cache becomes really visible in this
environment.


Seems to me you should approach the problem differently, like rsyncing the 
mirrored content. I don't know your environment; that was just what I came up 
with off the top of my head.


Try rsyncing a few TB of content onto a few hundred GB of cache disk 
and see how that works out for you :)


Our setup is briefly described here by the way:
http://ftp.acc.umu.se/mirror/ftp-about.html

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 A closed mouth gathers no feet.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities

2006-09-20 Thread Niklas Edmundsson

On Mon, 18 Sep 2006, Brian Akins wrote:


Graham Leggett wrote:


I have not seen inside the htcacheclean code, why is the code reading the
headers? In theory the cache should be purged based on last access time,
deleted as space is needed.


Everyone should be mounting cache directories noatime, unless they don't care 
about performance...


Actually, a cache on XFS mounted with atime doesn't seem to be a 
performance killer, oddly enough... Our frontends had no problems 
surviving 1k requests/s during the latest mozilla-update barrage. 
Other mirrors had problems, so it seems we ended up taking the 
majority of the load...


That said: yes, noatime is quicker, but if you want to be able to clean 
your cache often (think of a new Linux distro release which quickly fills 
up the cache with new content), atime + fs traversal is a better 
combined solution than having to open/read every header.



/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 That's not a bug. It's supposed to do that.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities

2006-09-20 Thread Niklas Edmundsson

On Wed, 20 Sep 2006, Brian Akins wrote:


Niklas Edmundsson wrote:
don't care about performance...


Actually, cache on xfs mounted with atime doesn't seem to be a performance 
killer oddly enough... Our frontends had no problems surviving 1k 
requests/s during the latest mozilla-update-barrage.


1k requests/second is not really that much... 10k requests/second is more 
what I'm used to. XFS sucks for us as cache storage. It tends to crock 
under some traffic patterns (reads vs writes). ext3 is actually more 
reliable for us. Reiserfs is interesting, but tends to go haywire from time 
to time.


I think the key difference here is our average file size... We don't 
normally need that many requests/s to saturate GigE.


We clean our cache often because we have a really quick way to find the size 
and remove the oldest expired objects first. Every cache store gets recorded 
in SQLite with info about the object (size, mtime, expire time, url, key, 
etc.). Makes it trivial to write cron jobs to do cache management.


Yup.

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Don't force it, use a bigger hammer
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


[PATCH] mod_disk_cache working LFS (filecopy)

2006-09-26 Thread Niklas Edmundsson


This patch depends on mod_disk_cache LFS-aware config submitted 
earlier and is for trunk.


It makes caching of large files possible on 32bit machines by:

* Realising that a file is a file and can be copied as such, without
  reading the whole thing into memory first.
* When a file is cached by copying, replace the brigade with a new one
  referring to the cached file (see the sketch after this list) so we
  don't have to read the file from the backend again when sending a
  response to the client.
* When a file is cached by copying, keep the file even if the client
  aborts the connection, since we know that the response is valid.
* Check a few more return values to be able to add successfully in
  the appropriate places above.
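A rough sketch of that brigade replacement (mine, hedged; not the patch's
replace_brigade_with_cache()):

#include "httpd.h"
#include "apr_buckets.h"

/* Throw away the brigade's current contents and replace them with a single
 * file bucket pointing at the freshly written cache file, plus EOS. Note:
 * apr_bucket_file_create() takes an apr_size_t length, so on 32-bit a body
 * larger than 4GB would have to be split across several file buckets. */
static apr_status_t brigade_from_cachefile(apr_bucket_brigade *bb,
                                           apr_file_t *cachefd,
                                           apr_off_t offset, apr_size_t len,
                                           request_rec *r)
{
    apr_bucket *e;

    apr_brigade_cleanup(bb);
    e = apr_bucket_file_create(cachefd, offset, len, r->pool,
                               bb->bucket_alloc);
    APR_BRIGADE_INSERT_TAIL(bb, e);
    e = apr_bucket_eos_create(bb->bucket_alloc);
    APR_BRIGADE_INSERT_TAIL(bb, e);
    return APR_SUCCESS;
}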

The thing is mildly tested, but it's a subset of our much larger 
patchset that's been in production since June.


I'm able to get a 4.3GB file from a 32-bit machine with 1GB of memory 
using mod_disk_cache, and the md5sum is correct afterwards. The old 
behaviour was eating all the address space/memory and segfaulting.


I'll attach the thing to bug #39380 as well.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Great thing about being a Slayer? Kicking ass is comfort food. - Buffy
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

--- mod_disk_cache.c.1-lfsconfig    2006-09-18 12:19:56.0 +0200
+++ mod_disk_cache.c    2006-09-26 09:35:51.0 +0200
@@ -157,7 +157,16 @@ static apr_status_t file_cache_el_final(
     if (dobj->tfd) {
         apr_status_t rv;
 
-        apr_file_close(dobj->tfd);
+        rv = apr_file_close(dobj->tfd);
+        dobj->tfd = NULL;
+
+        if(rv != APR_SUCCESS) {
+            ap_log_error(APLOG_MARK, APLOG_DEBUG, rv, r->server,
+                         "disk_cache: closing tempfile failed: %s",
+                         dobj->tempfile);
+            apr_file_remove(dobj->tempfile, r->pool);
+            return rv;
+        }
 
         /* This assumes that the tempfile is on the same file system
          * as the cache_root. If not, then we need a file copy/move
@@ -169,9 +178,8 @@ static apr_status_t file_cache_el_final(
                          "disk_cache: rename tempfile to datafile failed:"
                          " %s -> %s", dobj->tempfile, dobj->datafile);
             apr_file_remove(dobj->tempfile, r->pool);
+            return rv;
         }
-
-        dobj->tfd = NULL;
     }
 
     return APR_SUCCESS;
@@ -976,15 +984,133 @@ static apr_status_t store_headers(cache_
     return APR_SUCCESS;
 }
 
+
+static apr_status_t copy_body(apr_file_t *srcfd, apr_off_t srcoff,
+                              apr_file_t *destfd, apr_off_t destoff,
+                              apr_off_t len)
+{
+    apr_status_t rc;
+    apr_size_t size;
+    apr_finfo_t finfo;
+    apr_time_t starttime = apr_time_now();
+    char buf[CACHE_BUF_SIZE];
+
+    if(srcoff != 0) {
+        rc = apr_file_seek(srcfd, APR_SET, &srcoff);
+        if(rc != APR_SUCCESS) {
+            return rc;
+        }
+    }
+
+    if(destoff != 0) {
+        rc = apr_file_seek(destfd, APR_SET, &destoff);
+        if(rc != APR_SUCCESS) {
+            return rc;
+        }
+    }
+
+    /* Tried doing this with mmap, but sendfile on Linux got confused when
+       sending a file while it was being written to from an mmapped area.
+       The traditional way seems to be good enough, and less complex.
+     */
+    while(len > 0) {
+        size=MIN(len, CACHE_BUF_SIZE);
+
+        rc = apr_file_read_full (srcfd, buf, size, NULL);
+        if(rc != APR_SUCCESS) {
+            return rc;
+        }
+
+        rc = apr_file_write_full(destfd, buf, size, NULL);
+        if(rc != APR_SUCCESS) {
+            return rc;
+        }
+        len -= size;
+    }
+
+    /* Check if file has changed during copying. This is not 100% foolproof
+       due to NFS attribute caching when on NFS etc. */
+    /* FIXME: Can we assume that we're always copying an entire file? In that
+              case we can check if the current filesize matches the length
+              we think it is */
+    rc = apr_file_info_get(&finfo, APR_FINFO_MTIME, srcfd);
+    if(rc != APR_SUCCESS) {
+        return rc;
+    }
+    if(starttime < finfo.mtime) {
+        return APR_EGENERAL;
+    }
+
+    return APR_SUCCESS;
+}
+
+
+static apr_status_t replace_brigade_with_cache(cache_handle_t *h,
+                                               request_rec *r,
+                                               apr_bucket_brigade *bb)
+{
+    apr_status_t rv;
+    int flags;
+    apr_bucket *e;
+    core_dir_config *pdcfg = ap_get_module_config(r->per_dir_config,
+                                                  &core_module);
+    disk_cache_object_t *dobj = (disk_cache_object_t *) h->cache_obj->vobj;
+
+    flags = APR_READ|APR_BINARY;
+#if APR_HAS_SENDFILE
+    flags |= ((pdcfg->enable_sendfile

Re: [PATCH] mod_disk_cache working LFS (filecopy)

2006-09-26 Thread Niklas Edmundsson

On Tue, 26 Sep 2006, Issac Goldstand wrote:

Forgive me for missing the obvious, but why not just use mod_file_cache for 
this? I recall you mentioning that your use of mod_cache was for locally 
caching very large remote files, so I don't see how this would help in any 
case, since the file doesn't exist locally when being stored; and if the file 
is otherwise known to be on the file system, there's no reason to keep it in 
mod_disk_cache's cache area (in any case, it wouldn't improve performance - 
only mod_file_cache would). So what am I missing?


Apache Module mod_file_cache
Description: Caches a static list of files in memory

This has little to do with a setup like ours (ftp.acc.umu.se):
* NFS backend with lots of storage (multiple TB), not lots of
  bandwidth/performance.
* Multiple frontends with (relatively) fast cache storage.
* A working set of a couple of hundred GB which changes daily.

By using caching frontends we can easily fill our available 2Gbit even 
though the backend can only do about 300-400Mbit. This is possible 
because of a cache hit rate of about 90%.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 One family builds a wall, two families enjoy it.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: [PATCH] mod_disk_cache working LFS (filecopy)

2006-09-26 Thread Niklas Edmundsson

On Tue, 26 Sep 2006, Graham Leggett wrote:


On Tue, September 26, 2006 1:00 pm, Joe Orton wrote:


This was discussed a while back.  I think this is an API problem which
needs to be fixed at API level, not something which should be worked
around by adding bucket-type-specific hacks.


API changes won't be backportable to v2.2.x though, although you're right.


Won't that method mean that caching the file will happen at the speed 
the client reads the file?


So, with only one session caching a file and read-while-caching (i.e. 
the features you want in the end) you can get the following scenario:

- Slow client starts downloading a large file. First access, so the
  file is getting cached slowly.
- Fast client starts downloading the same file, but slowly, due to the
  pacing of the slow client.

I suspect that this also means that if the caching client hangs up 
before the caching is finished we'll have to toss what's been cached 
so far, or do we get that error before the brigade is destroyed?


In any case, it sounds like a better way to do it than the current 
always-eat-your-memory-and-die solution, but I think that we'll be 
needing that kludge to get good behaviour in our 
caching-frontend-for-ftpserver-case ...


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 But I was going into Toshi Station to pick up some power converters...
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: [PATCH] mod_disk_cache working LFS (filecopy)

2006-09-27 Thread Niklas Edmundsson

On Tue, 26 Sep 2006, Graham Leggett wrote:


Niklas Edmundsson wrote:


* Realising that a file is a file and can be copied as such, without
  reading the whole thing into memory first.
* When a file is cached by copying, replace the brigade with a new one
  refering to the cached file so we don't have to read the file from
  the backend again when sending a response to the client.


As I read the code, the copy is completed before an attempt is made to 
deliver the copy to the network. This should in theory stop a slow initial 
client from holding up faster following clients, if the caching is still in 
transit.


Is this correct?


This is the original design of mod_disk_cache (which isn't changed by 
this patch), so yes.


In practice this isn't enough when dealing with large files, so in our 
production code (the hideously large jumbopatch) this is fixed by 
read-while-caching and spawning a thread to do the caching in the 
background while delivering the response (by read-while-caching) to 
the client that initiated the caching.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Why don't you go teach someone who actually needs to learn?! - Buffy
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: svn commit: r450188 - /httpd/httpd/trunk/modules/cache/mod_disk_cache.c

2006-09-27 Thread Niklas Edmundsson

On Wed, 27 Sep 2006, Graham Leggett wrote:


Ruediger Pluem wrote:

Are we sure that we do not iterate too often (> 100) over this during the 
lifetime
of a request? I would say 'No, we do not iterate too often', but I think a 
crosscheck
by someone else is a good idea. Otherwise we would have a potential 
temporary memory

leak here.


We would copy the body once per request, surely? That's how I read it - 
copy_body would be called once, resulting in the buffer being declared once, 
and reused inside the copy_body loop.


The code is very picky about there only being a single, complete, 
body so it should only be called once per request.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 I am Jay Leno of Borg : Kevin's love life is irrelevant (and non-existing)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: svn commit: r450105 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h

2006-09-27 Thread Niklas Edmundsson

On Wed, 27 Sep 2006, Joe Orton wrote:


I don't get it - as discussed, this approach is completely unsound.
There is no reason to assume it's possible to copy the entire content
into the cache before sending anything to the client just because it
happens to be a FILE bucket (think slow NFS servers).  That is something
which needs to be *fixed*, not explicitly hard-coded.


Yes, it has to be fixed eventually. Until then we're better off with 
gradual improvements than just saying the solution isn't perfect.


mod_disk_cache isn't exactly the best code out there, so it'll take a 
while to get it decent, and it'll take more than a single patch to do 
it. I fully agree that the "copy everything into cache and then reply" 
method is utterly stupid, but that's the way mod_disk_cache currently 
works, and despite that it got tagged as stable...


In an effort to improve things, I'll start taking more stuff from 
our jumbo patch and building smaller incremental patches that will 
eventually mean that mod_disk_cache will have read-while-caching.


When that code is in, there will be a plethora of options on how to 
solve the "client that does the caching request hangs" problem, which 
we have kludged to work in the NFS-backend case. As said, we have 
kludged it to work in our setup (which has a slow NFS backend), but 
the perfect solution will have to come from people who know all the 
deep magic in httpd, and I know I'm not that person.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Only together can we turn him to the dark side of the Force. - Emperor
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: [PATCH] mod_disk_cache working LFS (filecopy)

2006-09-27 Thread Niklas Edmundsson

On Wed, 27 Sep 2006, Graham Leggett wrote:


On Wed, September 27, 2006 11:07 am, Niklas Edmundsson wrote:


In practice this isn't enough when dealing with large files, so in our
production code (the hideously large jumbopatch) this is fixed by
read-while-caching and spawning a thread to do the caching in the
background while delivering the response (by read-while-caching) to
the client that initiated the caching.


A thread makes sense for platforms that support threads, but we would need
some kind of functional behaviour for platforms that don't have threads.
Would the option of spawning a process to copy the file also work, leaving
the original process to read-while-cache the response for the benefit of
the client?


We have code for it; it's just untested since we're using the worker 
MPM. We'll deal with that when I get to those patches.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Dyslexia rules KO.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


[PATCH] mod_cache: Don't log bogus errors

2006-09-27 Thread Niklas Edmundsson


The following patch should eliminate bogus error log entries similar 
to:

[Wed Sep 27 15:31:29 2006] [error] (-3)Unknown error 18446744073709551613: 
cache: error returned while trying to return disk cached data

If I have understood things right, AP_FILTER_ERROR only means that an 
error has occurred and that an error web page has already been sent 
(documented in CHANGES, of all places). The additional garbage in the 
error log doesn't make anyone happy...



/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Don't give away the homeworld. - Babylon 5
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Index: mod_cache.c
===================================================================
--- mod_cache.c (revision 450405)
+++ mod_cache.c (working copy)
@@ -244,10 +244,12 @@ static int cache_url_handler(request_rec
         out = apr_brigade_create(r->pool, r->connection->bucket_alloc);
         rv = ap_pass_brigade(r->output_filters, out);
         if (rv != APR_SUCCESS) {
-            ap_log_error(APLOG_MARK, APLOG_ERR, rv, r->server,
-                         "cache: error returned while trying to return %s "
-                         "cached data",
-                         cache->provider_name);
+            if(rv != AP_FILTER_ERROR) {
+                ap_log_error(APLOG_MARK, APLOG_ERR, rv, r->server,
+                             "cache: error returned while trying to return %s "
+                             "cached data",
+                             cache->provider_name);
+            }
             return rv;
         }
 


Re: [PATCH] mod_disk_cache working LFS (filecopy)

2006-10-01 Thread Niklas Edmundsson

On Sat, 30 Sep 2006, Davi Arnaut wrote:


Hi,

Wouldn't you avoid a lot of complexity in this patch
if you just deleted from the brigade the implicitly
created heap buckets while reading file buckets ?

Something like:

store_body:
.. if (is_file_bucket(bucket))
copy_file_bucket(bucket, bb);


Probably, but that doesn't allow for creating a thread/process that 
does the copying in the background, which is my long term goal.


Also, simply doing bucket_delete like that means that the file will 
never be sent to the client, which is a bad thing IMO ;)


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Excuse me, is that a toupee or do you have a tribble on your head
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: [PATCH] mod_disk_cache working LFS (filecopy)

2006-10-02 Thread Niklas Edmundsson

On Sun, 1 Oct 2006, Davi Arnaut wrote:


store_body:
.. if (is_file_bucket(bucket))
copy_file_bucket(bucket, bb);


Probably, but that doesn't allow for creating a thread/process that
does the copying in the background, which is my long term goal.

Also, simply doing bucket_delete like that means that the file will
never be sent to the client, which is a bad thing IMO ;)


Shame on me, but I said something like.. :)

I guess the attached patch does the same (plus mmap, et cetera) and is
much simpler. Comments ?


Simpler, yes. But it only has the benefit of not eating all your 
memory...


* It leaves the brigade containing the uncached entity, so it will
   cause the backend to first deliver stuff to be cached and then stuff
   to the client.
* When this evolves into wanting to spawn a thread/process to do the
   copying you'll need the "is this a file" thingie anyway (at least I
   need it, but I might have missed some nifty feature in APR).

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Oohhh.  Jedi Master.  Yoda.  You seek Yoda.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: [PATCH] mod_disk_cache working LFS (filecopy)

2006-10-02 Thread Niklas Edmundsson

On Mon, 2 Oct 2006, Davi Arnaut wrote:


Simpler, yes. But it only has the benefit of not eating all your
memory...


Well, that was the goal. Maybe we could merge this one instead and work
together on the other goals.


As I have said before, we have a large patchset that fixes a bunch of 
problems. However, since the wish was to merge it in pieces, I have 
started breaking it into pieces for merging.


If the intent is a total redesign of mod_disk_cache, i.e. you're not 
interested in these patches at all, please say so, so that I don't 
waste a lot of work on bending our patches into something that works 
when applied one by one and then doing QA on the thing.



* It leaves the brigade containing the uncached entity, so it will
   cause the backend to first deliver stuff to be cached and then stuff
   to the client.


Yeah, that's how mod_disk_cache works. I think it's possible to work
around this limitation without using threads by keeping each cache
instance with its own brigades and flushing them occasionally with
non-blocking I/O.


The replace_brigade_with_cache() function simply replaces the brigade 
with one pointing to the cached entity.



Or we could move all disk i/o to a mod_disk_cache exclusive thread pool,
it could be configurable at build time whether or not to use a thread pool.

Comments ?


I would be very happy if people would fix the mod_disk_cache mess so I 
didn't have to. However, since no one seems to have produced something 
usable for our usage in the timeframe mod_disk_cache has existed, I was 
forced to hack on it.


I'm trying my best not to give up on having it merged, as I know that 
there are other sites interested in it, and now that the first trivial 
bit has been applied I'm hoping that people will at least look at the 
rest... There are bound to be cooler, more Apache-ish ways to solve some 
of the problems, but given that our patch is stable in production and 
has generally much higher code quality than mod_disk_cache (ever heard 
of error checking/handling?), it would be nice if people could at least 
look at the whole thing before starting to complain about the complexity 
of small parts (or code not touched by the patch, for that matter).



* When this evolves to wanting to spawn a thread/process to do the
   copying you'll need the is this a file-thingie anyway (at least I
   need it, but I might have missed some nifty feature in APR).


You would just need to copy the remaining buckets (granted, if there are
no concurrency problems) and send them to a per-process thread pool.


And when not having threads?

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 To avoid seeing a fool, break your mirror.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: Coding style

2006-10-03 Thread Niklas Edmundsson

On Mon, 2 Oct 2006, Garrett Rooney wrote:


Or the even more readable:

rv = do_something(args);
if (rv == APR_SUCCESS) {

}


+1 for simple code like this.

It comes naturally when you need to do stuff like
rv = dostuff(...);
if(rv != APR_SUCCESS && rv != whatever) {
...

and is also less likely to cause ugly linewraps when using
functions_with_long_names(and, a, large, list, of, arguments) ...

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Sexy: Uses feather.  Kinky: Uses entire chicken.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


[PATCH] sendfile_nonblocking broken in trunk

2006-10-04 Thread Niklas Edmundsson


I stumbled upon this when porting the mod_disk_cache 
read-while-caching feature to trunk. r-w-c uses a diskcache bucket 
which it morphs into file buckets as more data becomes available.


Ie, it starts with a brigade containing:
FILE -> DISKCACHE

FILE is sendfile:d as usual by core_filters, and when DISKCACHE is 
bucket_read() it morphs the bucket into a 0-length HEAP bucket, a FILE 
bucket and the remains in a trailing DISKCACHE bucket, ie: 
HEAP -> FILE -> DISKCACHE


send_brigade_nonblocking() correctly does the bucket_read and moves on 
to the next bucket which it correctly identifies as a FILE bucket and 
tries to sendfile_nonblocking().


sendfile_nonblocking() takes the _brigade_ as an argument, gets the 
first bucket from the brigade, finds it not to be a FILE bucket and 
barfs.


The attached fix is trivial, and I really can't understand why 
sendfile_nonblocking() was taking a brigade instead of a bucket as 
argument in the first place.


On a side note, in send_brigade_nonblocking() it's unnecessary to 
queue 0-length writes to the iovec. It probably won't make any 
difference at all in real-world performance, but it's obviously not 
optimal ;)
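For illustration only (a sketch, not a patch against the real
send_brigade_nonblocking()), the kind of check meant here is simply
refusing to queue empty chunks into the iovec:

#include <sys/uio.h>
#include "apr.h"

/* Append a chunk to an iovec array, skipping zero-length chunks (such as
 * the 0-length buckets produced by the morphing described above) so they
 * never occupy a slot. Returns the new element count. */
static int iovec_append(struct iovec *vec, int nvec, int maxvec,
                        const char *data, apr_size_t length)
{
    if (length == 0 || nvec >= maxvec) {
        return nvec;
    }
    vec[nvec].iov_base = (void *)data;
    vec[nvec].iov_len = length;
    return nvec + 1;
}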


Also, I'm not at all fond of all those "XXX: We really should 
log/return error/foo here" lines. It's not THAT hard doing it while 
coding, or at least doing a final touchup before submitting a 
patch/committing code...


/Nikke - now able to do some QA before submitting mod_disk_cache
 patches.
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 My stereo's «-fixed, said Tom monotonously.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Index: core_filters.c
===================================================================
--- core_filters.c  (revision 452869)
+++ core_filters.c  (working copy)
@@ -330,7 +330,7 @@ static apr_status_t writev_nonblocking(a
 
 #if APR_HAS_SENDFILE
 static apr_status_t sendfile_nonblocking(apr_socket_t *s,
-                                         apr_bucket_brigade *bb,
+                                         apr_bucket *bucket,
                                          apr_size_t *cumulative_bytes_written,
                                          conn_rec *c);
 #endif
@@ -567,7 +567,7 @@ static apr_status_t send_brigade_nonbloc
                     return rv;
                 }
             }
-            rv = sendfile_nonblocking(s, bb, &bytes_written, c);
+            rv = sendfile_nonblocking(s, bucket, &bytes_written, c);
             if (nvec > 0) {
                 (void)apr_socket_opt_set(s, APR_TCP_NOPUSH, 0);
             }
@@ -730,21 +730,21 @@ static apr_status_t writev_nonblocking(a
 #if APR_HAS_SENDFILE
 
 static apr_status_t sendfile_nonblocking(apr_socket_t *s,
-                                         apr_bucket_brigade *bb,
+                                         apr_bucket *bucket,
                                          apr_size_t *cumulative_bytes_written,
                                          conn_rec *c)
 {
     apr_status_t rv = APR_SUCCESS;
-    apr_bucket *bucket;
     apr_bucket_file *file_bucket;
     apr_file_t *fd;
     apr_size_t file_length;
     apr_off_t file_offset;
     apr_size_t bytes_written = 0;
 
-    bucket = APR_BRIGADE_FIRST(bb);
     if (!APR_BUCKET_IS_FILE(bucket)) {
-        /* XXX log a "this should never happen" message */
+        ap_log_error(APLOG_MARK, APLOG_ERR, rv, c->base_server,
+                     "core_filter: sendfile_nonblocking: "
+                     "this should never happen");
         return APR_EGENERAL;
     }
     file_bucket = (apr_bucket_file *)(bucket->data);


Re: [PATCHES] mod_disk_cache read-while-caching

2006-10-08 Thread Niklas Edmundsson

On Thu, 5 Oct 2006, Niklas Edmundsson wrote:

OK, here come the latest two patches in the mod_disk_cache improvement 
parody. I'll attach these patches to bug #39380, but with fewer comments.


I discovered a few misses, mostly not NULLing fd pointers when 
closing them, a missing close/flush, and some unnecessary code 
duplication instead of calling the right helper in 
replace_brigade_with_cache().


The misses are in the loadstore-patch, so I would recommend applying 
this before reviewing the results even though it's generated from a 
file with the read-while-caching patch applied.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Real men write self-modifying code
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

--- mod_disk_cache.c.rwc    2006-10-06 14:22:27.0 +0200
+++ mod_disk_cache.c    2006-10-08 19:17:31.0 +0200
@@ -676,6 +676,7 @@ static apr_status_t open_header_timeout(
     while(1) {
         if(dobj->hfd) {
             apr_file_close(dobj->hfd);
+            dobj->hfd = NULL;
         }
         rc = open_header(h, r, key, conf);
         if(rc != APR_SUCCESS && rc != CACHE_ENODATA) {
@@ -1209,6 +1210,7 @@ static apr_status_t recall_headers(cache
     }
 
     apr_file_close(dobj->hfd);
+    dobj->hfd = NULL;
 
     ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
                  "disk_cache: Recalled headers for URL %s", dobj->name);
@@ -1556,6 +1558,7 @@ static apr_status_t store_headers(cache_
         rv = apr_file_open(&dobj->hfd, dobj->hdrsfile,
                            APR_WRITE | APR_BINARY | APR_BUFFERED, 0, r->pool);
         if (rv != APR_SUCCESS) {
+            dobj->hfd = NULL;
             return rv;
         }
     }
@@ -1590,6 +1593,19 @@ static apr_status_t store_headers(cache_
         return rv;
     }
 
+    /* If the body size is unknown, the header file will be rewritten later
+       so we can't close it */
+    if(dobj->initial_size < 0) {
+        rv = apr_file_flush(dobj->hfd);
+    }
+    else {
+        rv = apr_file_close(dobj->hfd);
+        dobj->hfd = NULL;
+    }
+    if(rv != APR_SUCCESS) {
+        return rv;
+    }
+
     ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
                  "disk_cache: Stored headers for URL %s", dobj->name);
     return APR_SUCCESS;
@@ -1666,23 +1682,20 @@ static apr_status_t replace_brigade_with
                                                apr_bucket_brigade *bb)
 {
     apr_status_t rv;
-    int flags;
     apr_bucket *e;
-    core_dir_config *pdcfg = ap_get_module_config(r->per_dir_config,
-                                                  &core_module);
     disk_cache_object_t *dobj = (disk_cache_object_t *) h->cache_obj->vobj;
 
-    flags = APR_READ|APR_BINARY;
-#if APR_HAS_SENDFILE
-    flags |= ((pdcfg->enable_sendfile == ENABLE_SENDFILE_OFF)
-              ? 0 : APR_SENDFILE_ENABLED);
-#endif
-
-    rv = apr_file_open(&dobj->fd, dobj->datafile, flags, 0, r->pool);
+    if(dobj->fd) {
+        apr_file_close(dobj->fd);
+        dobj->fd = NULL;
+    }
+    rv = open_body_timeout(r, dobj->name, dobj);
     if (rv != APR_SUCCESS) {
-        ap_log_error(APLOG_MARK, APLOG_ERR, rv, r->server,
-                     "disk_cache: Error opening datafile %s for URL %s",
-                     dobj->datafile, dobj->name);
+        if(rv != CACHE_EDECLINED) {
+            ap_log_error(APLOG_MARK, APLOG_ERR, rv, r->server,
+                         "disk_cache: Error opening datafile %s for URL %s",
+                         dobj->datafile, dobj->name);
+        }
         return rv;
     }
 
@@ -1922,14 +1935,12 @@ static apr_status_t store_body(cache_han
 
     /* All checks were fine, close output file */
     rv = apr_file_close(dobj->fd);
+    dobj->fd = NULL;
     if(rv != APR_SUCCESS) {
         file_cache_errorcleanup(dobj, r);
         return rv;
     }
 
-    ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
-                 "disk_cache: Body for URL %s cached.", dobj->name);
-
     /* Redirect to cachefile if we copied a plain file */
     if(copy_file) {
        rv = replace_brigade_with_cache(h, r, bb);


[PATCH] mod_disk_cache background copy

2006-10-08 Thread Niklas Edmundsson


This patch implements copying a file in the background so the client 
initiating the caching can get the file delivered by 
read-while-caching instead of having to wait for the file to finish.


I'll attach it to bug #39380 as well, with fewer comments.

The method used here is rather crude, but works well enough in 
practice. It should suffice as a first step of implementing this 
functionality.


Known missing features:
* Documentation for the CacheMinBGSize parameter, the minimum file
  size for doing background caching. Typically set to what your backend
  can deliver in approx. 250ms at normal load (given the 200ms sleep loop).
* It doesn't set the stack size for the background thread; that made
  stuff unloadable on AIX, which probably means some symbol is missing
  in an export table somewhere.
* Testing of the forked variation. This has only had testing with the
  worker MPM on Unix.

Known areas of possible improvement:
* Figure out why the cleanup function isn't run before the fd's are
  closed, so the private pool can be removed.
* I suppose it's possible to use cross-thread fd's with some
  setaside magic instead of opening new fd's in the bgcopy thread.
* Experiment with a separate copy-files thread spawned at
  initialization for threaded environments.
* The forked thingie could probably use a few cleanups.

In practice I don't think those improvements will give much in terms 
of performance but it sure would be more elegant :)


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 It's funny how the Earth never opens up and swallows you when you want it to.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

--- mod_disk_cache.c.ls-rwc-fixups  2006-10-08 19:17:31.0 +0200
+++ mod_disk_cache.c    2006-10-08 19:11:40.0 +0200
@@ -22,6 +22,8 @@
 #include "util_filter.h"
 #include "util_script.h"
 #include "util_charset.h"
+#include "ap_mpm.h"
+
 
 /*
  * mod_disk_cache: Disk Based HTTP 1.1 Cache.
@@ -1677,6 +1679,272 @@ static apr_status_t copy_body(apr_pool_t
 }
 
 
+/* Provide srcfile and srcinfo containing
+   APR_FINFO_INODE|APR_FINFO_MTIME to make sure we have opened the right file
+   (someone might have just replaced it which messes up things).
+*/
+static apr_status_t copy_body_nofd(apr_pool_t *p, const char *srcfile,
+                                   apr_off_t srcoff, apr_finfo_t *srcinfo,
+                                   const char *destfile, apr_off_t destoff,
+                                   apr_off_t len)
+{
+    apr_status_t rc;
+    apr_file_t *srcfd, *destfd;
+    apr_finfo_t finfo;
+
+    rc = apr_file_open(&srcfd, srcfile, APR_READ | APR_BINARY, 0, p);
+    if(rc != APR_SUCCESS) {
+        return rc;
+    }
+    rc = apr_file_info_get(&finfo, APR_FINFO_INODE|APR_FINFO_MTIME, srcfd);
+    if(rc != APR_SUCCESS) {
+        return rc;
+    }
+    if(srcinfo->inode != finfo.inode || srcinfo->mtime < finfo.mtime) {
+        return APR_EGENERAL;
+    }
+
+    rc = apr_file_open(&destfd, destfile, APR_WRITE | APR_BINARY, 0, p);
+    if(rc != APR_SUCCESS) {
+        return rc;
+    }
+
+    rc = copy_body(p, srcfd, srcoff, destfd, destoff, len);
+    apr_file_close(srcfd);
+    if(rc != APR_SUCCESS) {
+        apr_file_close(destfd);
+        return rc;
+    }
+
+    return apr_file_close(destfd);
+}
+
+
+#if APR_HAS_THREADS
+static apr_status_t bgcopy_thread_cleanup(void *data)
+{
+    copyinfo *ci = data;
+    apr_status_t rc, ret;
+    apr_pool_t *p;
+
+    /* FIXME: Debug */
+    ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, ci->s,
+                 "disk_cache: bgcopy_thread_cleanup: %s -> %s",
+                 ci->srcfile, ci->destfile);
+
+    rc = apr_thread_join(&ret, ci->t);
+    if(rc != APR_SUCCESS) {
+        ap_log_error(APLOG_MARK, APLOG_ERR, rc, ci->s,
+                     "disk_cache: bgcopy_thread_cleanup: apr_thread_join "
+                     "failed %s -> %s", ci->srcfile, ci->destfile);
+        return rc;
+    }
+    if(ret != APR_SUCCESS) {
+        ap_log_error(APLOG_MARK, APLOG_ERR, ret, ci->s,
+                     "disk_cache: Background caching body %s -> %s failed",
+                     ci->srcfile, ci->destfile);
+    }
+
+    /* FIXME: Debug */
+    ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, ci->s,
+                 "disk_cache: bgcopy_thread_cleanup: SUCCESS %s -> %s",
+                 ci->srcfile, ci->destfile);
+
+    /* Destroy our private pool */
+    p = ci->pool;
+    apr_pool_destroy(p);
+
+    return APR_SUCCESS;
+}
+
+
+static void *bgcopy_thread(apr_thread_t *t, void *data)
+{
+    copyinfo *ci = data;
+    apr_pool_t *p;
+    apr_status_t rc;
+
+    p = apr_thread_pool_get(t);
+
+    /* FIXME: Debug */
+    ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, ci->s,
+                 "disk_cache: bgcopy_thread: start %s -> %s",
+                 ci->srcfile, ci->destfile

Re: [PATCH] mod_disk_cache background copy

2006-10-11 Thread Niklas Edmundsson

On Wed, 11 Oct 2006, Graham Leggett wrote:

This patch implements copying a file in the background so the client 
initiating the caching can get the file delivered by read-while-caching 
instead of having to wait for the file to finish.


Something that Joe Orton raised, and that I've been looking into in more 
detail.


The copy_body function currently only supports file buckets, which 
specifically excludes buckets generated by say mod_proxy, or mod_cgi. From my 
testing, for these non file buckets, the response is downloaded and cached 
fully, then the client gets fed data. Initially I understood this as an 
optimisation specific to files, assuming that file buckets were the only 
buckets that could potentially exceed available RAM, but the case where non 
file buckets are present is currently unhandled.


I don't have enough knowledge of httpd internals to be sure, but 
don't the data-generating types insert flush buckets in the stream 
to avoid this? That said, mod_disk_cache seems to be totally unaware 
of flush buckets, so either I'm barking up the wrong tree, it's handled 
at a higher level, or it isn't handled.


In theory, the copy body should be able to read from any brigade, rather than 
just a file brigade, in such a way that it doesn't try and load 4.7GB into 
RAM at once for file buckets.


The original reason for copy_body() was to have something that could 
be used in a background thread, and the only thing I'm sure can be 
copied in the background is plain files. Everything else must be 
handled the old way.


I am jetlagged right now and can't think straight any more today, will carry 
on looking at this tomorrow :)


OK. I'll be away for a week or so and might lag quite a bit in 
replying to stuff.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Huh?  What?  Am I on-line?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_disk_cache summarization

2006-10-24 Thread Niklas Edmundsson

On Mon, 23 Oct 2006, Graham Leggett wrote:

Was busy cleaning up some other odds and ends, will be back on the cache code 
again shortly.


I'm awaiting the verdict on how to resolve the "lead request hangs" 
problem before I submit more patches, I feel it's important enough to 
be solved before I start submitting fixes/improvements to the 
following items for mod_disk_cache:


* On disk header fixes to not break when moving between 32/64 bit
  builds, include filename so we can fill in r->filename so %f in
  LogFormat works.
* More assorted small cleanups (mostly error handling).
* Allow disk cache to realise that a (large) file is the same
  regardless of which URL is used to access it. Reduces cache disk
  usage a lot for sites like ours that's known by ftp.acc.umu.se,
  ftp.se.debian.org, ftp.gnome.org, se.releases.ubuntu.com,
  releases.mozilla.org and so on.
* Add option to not try to remove cache directories in the cache
  structure. IMHO, this should never be needed since the cache
  directory should not be excessively deep (which the broken defaults
  leads to). Davi had a fix for the cache dir layout I think, and I
  personally think that neither mod_disk_cache nor htcacheclean should
  do rmdir.
* Eventually add option to have header and body in the same cachefile.
* Probably more stuff that I don't remember without looking in the
  jumbopatch.

Also, I suspect that there is documentation that needs to be updated, 
more than just new options.


While working with this I have understood that there are two rather 
different uses for mod_disk_cache: either as a cache in a proxy or as 
a way to make an FTP-server frontend reduce the load on its file server 
backend.


For the FTP-server frontend usage we see the following 
characteristics: Large files, relatively few requests/s. It's 
important to keep files that are frequently accessed in cache (they 
might be large), hence have cache filesystem mounted with atime and 
clean cache based on atime. This works nicely for us using XFS, and 
cleaning by atime is much quicker and uses less resources than 
htcacheclean.
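
For illustration, a minimal standalone sketch of that kind of atime-based
cleaning (assumed invocation and limits, not the actual tool we run):
collect the plain files under the cache root, sort them by atime and
unlink the oldest N. A real cleaner would of course also have to keep
header/body pairs consistent, which the sketch ignores.

#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

struct entry { char *path; time_t atime; };

static struct entry *entries;
static size_t nentries, capacity;

static int collect(const char *path, const struct stat *sb,
                   int typeflag, struct FTW *ftwbuf)
{
    (void)ftwbuf;
    if (typeflag != FTW_F)
        return 0;                          /* plain files only, never rmdir */
    if (nentries == capacity) {
        capacity = capacity ? capacity * 2 : 1024;
        entries = realloc(entries, capacity * sizeof(*entries));
        if (entries == NULL)
            return -1;
    }
    entries[nentries].path = strdup(path);
    entries[nentries].atime = sb->st_atime;
    nentries++;
    return 0;
}

static int by_atime(const void *a, const void *b)
{
    const struct entry *ea = a, *eb = b;
    return (ea->atime > eb->atime) - (ea->atime < eb->atime);
}

int main(int argc, char **argv)
{
    size_t i, nremove;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <cacheroot> <files-to-remove>\n", argv[0]);
        return 1;
    }
    nremove = (size_t)atol(argv[2]);

    if (nftw(argv[1], collect, 64, FTW_PHYS) != 0) {
        perror("nftw");
        return 1;
    }
    qsort(entries, nentries, sizeof(*entries), by_atime);

    for (i = 0; i < nentries && i < nremove; i++) {
        printf("removing %s\n", entries[i].path);    /* oldest atime first */
        unlink(entries[i].path);
    }
    return 0;
}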


Others here are more clued on the proxy-cache-usecase, but as I 
understand it the keywords are many small files and many requests/s, so 
you need to mount with noatime and use htcacheclean.


Tuning tips in the documentation for these rather different cases 
would probably be appreciated.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 All this will be for nothing unless we go to the stars :  Babylon 5
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_disk_cache summarization

2006-10-24 Thread Niklas Edmundsson

On Tue, 24 Oct 2006, Graham Leggett wrote:


* Allow disk cache to realise that a (large) file is the same
   regardless of which URL is used to access it. Reduces cache disk
   usage a lot for sites like ours that's known by ftp.acc.umu.se,
   ftp.se.debian.org, ftp.gnome.org, se.releases.ubuntu.com,
   releases.mozilla.org and so on.


Perhaps this could be as simple as using ServerName and ServerAlias
(unless the name of the site is part of the URL, which will happen in the
forward proxy case) to reduce the cached URL to a canonical form before
storing and or retrieving from the cache.


We have a few different servernames depending on which site it's 
serving (needs to cater for official download locations and so on) so 
I guess that won't help much.



* Add option to not try to remove cache directories in the cache
   structure. IMHO, this should never be needed since the cache
   directory should not be excessively deep (which the broken defaults
   leads to). Davi had a fix for the cache dir layout I think, and I
   personally think that neither mod_disk_cache nor htcacheclean should
   do rmdir.


It makes sense that mod_disk_cache shouldn't do it, but perhaps it should
be tunable for htcacheclean.


Arguably. But if you ever need to remove directories in the cache 
hierarchy you should really start to wonder why they were created in 
the first place...



* Eventually add option to have header and body in the same cachefile.


Is there an advantage to this? IIRC Brian reported that a body in a
separate file can take advantage of sendfile, as is as a result much
faster.


We use combined header/body, and sendfile works flawlessly. Linux 
sendfile has problems when writing to a sendfile():d file with 
mmap, and all sendfiles have problems with overlapping 
sendfile/writes.


The main advantage is half the number of inodes and that by removing 
one file you get rid of both the header and body. I suspect that the 
performance gain is minimal though.
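
For what it's worth, a minimal sketch of the combined-file layout (assumed
offset and helper, not the code we ship): headers at the start of the
cache file, the body at a fixed offset, so one unlink() removes both.
Seeking past EOF before writing gives a sparse file on normal unix
filesystems, so the gap costs no disk space.

#include "apr_file_io.h"

#define BODY_OFFSET ((apr_off_t)8192)     /* assumed "large enough" */

static apr_status_t store_at_offset(apr_file_t *fd,
                                    const char *hdrs, apr_size_t hdrlen,
                                    const char *body, apr_size_t bodylen)
{
    apr_off_t off = BODY_OFFSET;
    apr_status_t rc;

    /* Headers first, at the start of the file */
    rc = apr_file_write_full(fd, hdrs, hdrlen, NULL);
    if (rc != APR_SUCCESS) {
        return rc;
    }

    /* Body at a fixed offset; headers larger than the offset would have
     * to be detected and handled separately. */
    rc = apr_file_seek(fd, APR_SET, &off);
    if (rc != APR_SUCCESS) {
        return rc;
    }
    return apr_file_write_full(fd, body, bodylen, NULL);
}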



A more formal cache cleanup process needs to be fleshed out, giving the
options above both as options in code, and as documentation as you say.

The comparison of your and Brian's experience are two ends of extremes on
high volume caches, one low hits large files, the second high hits small
files. This should make for some useful tuning information.


The extreme difference is what makes me think that we should 
acknowledge that they exist and provide the relevant knobs where 
necessary. As it looks right now, those knobs tend to be more 
OS/filesystem specific, but that might change as this evolves.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Buy a 486-33 you can reboot faster..
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_disk_cache summarization

2006-10-24 Thread Niklas Edmundsson

On Tue, 24 Oct 2006, Joe Orton wrote:


IMO: for a general purpose cache it is not appropriate to stop and try
to write the entire response to the cache before serving anything.


This is existing mod_disk_cache behaviour, the patches reduce these 
problems. Maybe not in a perfect way, but in a way good enough to show 
really noticeable improvements.


Since improving this mess is a gradual process, you'll have to live 
with kludges until the optimal solution is there. The alternative 
would be to do a completely new perfectly designed cache, which given 
the time it has taken to get mod_cache/mod_disk_cache even near a 
usable state simply won't happen...


You can't both have "we want fixes in small incremental pieces" and 
"this thing sucks, make it perfect at once".


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Buy a 486-33 you can reboot faster..
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: svn commit: r468373 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_cache.c modules/cache/mod_cache.h modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h modules/cache/

2006-10-27 Thread Niklas Edmundsson

On Fri, 27 Oct 2006, Graham Leggett wrote:


On Fri, October 27, 2006 4:38 pm, Davi Arnaut wrote:


Where is pdconf ? Check out all those APR_HAS_SENDFILE.


Aaargh... will fix.


The purpose of that code was originally to make "EnableSendfile Off" 
in the config file work. APR_HAS_SENDFILE only tells you that APR has 
sendfile.
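
A minimal sketch (assuming the httpd 2.2 core_dir_config layout) of the
kind of runtime check that code was there to make:

#include "httpd.h"
#include "http_config.h"
#include "http_core.h"

/* Sendfile is only usable if APR has it AND the admin hasn't configured
 * "EnableSendfile Off" for this directory. */
static int may_use_sendfile(request_rec *r)
{
#if APR_HAS_SENDFILE
    core_dir_config *coreconf = ap_get_module_config(r->per_dir_config,
                                                     &core_module);
    return coreconf->enable_sendfile != ENABLE_SENDFILE_OFF;
#else
    return 0;
#endif
}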


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 My favorite color?  Red.  No, BluAHHH!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: svn commit: r467655 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/mod_cache.xml modules/cache/mod_cache.c modules/cache/mod_cache.h

2006-10-27 Thread Niklas Edmundsson

On Wed, 25 Oct 2006, Graham Leggett wrote:


I managed to solve this problem last night.


snip

This is what this code needed: Someone with a clue on the apache 
internals so stuff can be solved properly. I have said it before and 
say it again: I'm not that guy, but I know what functionality is 
needed for our usecase.


People have complained at the kludges present in my patches, and yes 
they were kludgy. However, they miss the big point: Despite the 
kludges they get the job done, with the end result being something 
usable for our usecase. With good performance, no less. If I can 
improve stuff from the state unusable to actually-pretty-good with 
kludges, then this should be a rather obvious hint that things suck 
and should be fixed. To just keep repeating "this is no good" probably 
won't achieve this.


If the goal is to never accept code that isn't perfect, mod*cache 
never should have been committed to the httpd tree, and probably most 
modules (including mod_example) too. Once in a while you have to 
acknowledge that committed code is crap, and accept patches, albeit 
kludges, if it improves the situation. Otherwise you might end up with 
code that keeps on rotting away (mod_example is a good example, 
again).


I would have been most happy if this had been fixed ages ago so I 
hadn't been forced to spend lots and lots of hours kludging stuff 
together. At least, my kludges seem to have sparked some development 
in this area, so they have served some purpose other than enabling a 
non-profit computer club building a FTP/HTTP server that actually 
works.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 My favorite color?  Red.  No, BluAHHH!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: svn commit: r468373 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_cache.c modules/cache/mod_cache.h modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h modules/cache/

2006-10-27 Thread Niklas Edmundsson

On Fri, 27 Oct 2006, Graham Leggett wrote:


Err. We had the data in memory, we are going to read it back from disk
again just in order to not block ? That's nonsense.


Agreed.


Please explain.

This is a disk cache. Why would you write expensive bucket data to cache,
and then expensive bucket data to the network?

That's plain stupid.


And when you have a file backend, you want to hit your disk cache and 
not the backend when delivering data to a client. People might think 
that this doesn't matter, but for large files, especially larger than 
RAM in your machine, you usually go disk-bound without much help from 
the OS disk cache.


Also, httpd seems to be faster delivering data by sendfile than 
delivering data from memory buckets. That's more of a performance bug 
in httpd though.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Monolith Auto Sales Center: My God! It's full of cars!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_disk_cache summarization

2006-10-27 Thread Niklas Edmundsson

On Tue, 24 Oct 2006, Graham Leggett wrote:


On Tue, October 24, 2006 2:48 pm, Niklas Edmundsson wrote:


Perhaps this could be as simple as using ServerName and ServerAlias
(unless the name of the site is part of the URL, which will happen in
the
forward proxy case) to reduce the cached URL to a canonical form before
storing and or retrieving from the cache.


We have a few different servernames depending on which site it's
serving (needs to cater for official download locations and so on) so
I guess that won't help much.


How it is configured? Is this in a virtual host like so?

<VirtualHost ip.address:port>
 ServerName ftp.gnome.se
 ServerAlias ftp.somewhere.else
 ServerAlias ftp.whatever
 ...
</VirtualHost>

If the URLs change (ie the directories are different) then its a different
story.


Different VHosts meaning different URLs/directories, pointing to the 
same files...


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Colleges don't make fools; they only develop them
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: cache: the store_body interface

2006-10-31 Thread Niklas Edmundsson

On Mon, 30 Oct 2006, Ruediger Pluem wrote:


BTW: Does anybody know if MMAP for writing files is possible / makes sense /
improves performance?


It reduces some data-copying, so it's a tad cheaper to mmap. But, on 
Linux you can't do sendfile from a file that's being written to with 
mmap, and since I wanted to be able to do read-while-caching I dropped 
the mmap-write-idea since the drawbacks was way larger than the 
benefits.


YMMV

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 DIME: A dollar after taxes.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_disk_cache and mod_include bugs and suggestions

2007-01-15 Thread Niklas Edmundsson

On Mon, 15 Jan 2007, Graham Leggett wrote:

In order for caches to work, the Last-Modified or ETag headers need to be set 
correctly on the content, and this isn't always the case. When this happens, 
content isn't cached.


Another module with this problem is the mod_dir directory index generator, 
which also isn't cacheable for the same reasons SSI aren't.


Modern httpd releases can work around this if you set 
IndexOptions TrackModified, look in the docs for more info and 
limitations.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 ODOSCAN.EXE - Gets the Quaraks out of your Hard Drive!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: Solved: mod_disk_cache and mod_include bugs and suggestions

2007-01-17 Thread Niklas Edmundsson

On Wed, 17 Jan 2007, Giuliano Gavazzi wrote:


I have a solution for the r470455 mod_disk_cache not caching SSI.
There are two points where the module seems incorrect to me, changing those 
makes it work:


Since you're talking about the code on trunk, you should be warned 
that the current state is somewhat unreliable due to merging patches 
which then ran into an implementation discussion that never got solved 
(I think). Last I heard, the current plan is to revoke most patches 
and redo stuff.


However, since I'm the one to blame for the patches that has been 
partially landed on trunk (which is the parts you touch) I can provide 
my comments on your solutions (and I hope that others can chime in 
where I'm wrong).


First, don't reindent code when not needed. That only serves to make 
your patch hard to read.


1) in store_body the condition (!APR_BUCKET_IS_EOS(APR_BRIGADE_LAST(bb))) was 
incorrectly stopping the flow from ever going past (for static and dynamic 
pages). I moved it, changing the condition. I will post the patch tomorrow.


From looking at the patch I can only say "huh?". The brigade is 
complete when EOS is present, and only then can you complete the 
storing procedure. From a quick look at your patch I can't see how it 
could change things (instead of dropping out if not EOS you have a big 
if-chunk if it indeed is EOS, only adding an indentation level).


I might have missed some detail, but it's not obvious from the 
hard-to-read patch...


2) in store_disk_headers nothing should happen (well, it should just return 
or never be called) if the dobj->initial_size < 0.


It should be called, and it should do stuff.

One of the points of those patches are to solve the thundering herd 
problem, simply described as when a frequently accessed object is 
expired all accesses are served directly by your backend until one 
access has completed successfully and the cache has been able to store 
it. This is Bad if it causes your backend to grind to a halt.


To avoid this, the header is always written when the cache thinks it 
should cache something. Other requests will find this header, and if 
the size is unknown they will wait until it's updated with the correct 
size, otherwise they will do read-while-caching and return the 
contents as the file is being cached.
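
A minimal sketch of what the waiting side can look like (hypothetical
names and on-disk layout, not the patch code): re-read the header until
the size field has been filled in, or give up after a timeout and fall
back to the backend.

#include "apr_file_io.h"
#include "apr_time.h"

#define UNKNOWN_SIZE     ((apr_off_t)-1)
#define HDR_WAIT_SLEEP   (APR_USEC_PER_SEC / 5)        /* 200 ms */
#define HDR_WAIT_TIMEOUT apr_time_from_sec(10)

typedef struct {
    apr_off_t file_size;      /* -1 until the caching request knows it */
} disk_cache_info_t;          /* hypothetical on-disk header */

static apr_status_t wait_for_known_size(const char *hdrfile,
                                        apr_off_t *size, apr_pool_t *p)
{
    apr_time_t deadline = apr_time_now() + HDR_WAIT_TIMEOUT;
    disk_cache_info_t info;
    apr_file_t *fd;
    apr_status_t rc;

    do {
        rc = apr_file_open(&fd, hdrfile, APR_READ | APR_BINARY,
                           APR_OS_DEFAULT, p);
        if (rc != APR_SUCCESS)
            return rc;
        rc = apr_file_read_full(fd, &info, sizeof(info), NULL);
        apr_file_close(fd);
        if (rc != APR_SUCCESS)
            return rc;
        if (info.file_size != UNKNOWN_SIZE) {
            *size = info.file_size;
            return APR_SUCCESS;            /* ready: read-while-caching */
        }
        apr_sleep(HDR_WAIT_SLEEP);         /* someone else is still caching */
    } while (apr_time_now() < deadline);

    return APR_TIMEUP;                     /* give up, hit the backend */
}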


Those two changes make the header cache file store the correct resource size 
also for dynamic pages.


It stores the size, but doing so it breaks quite a few things.

I think it would be best if someone (Graham?) could revoke the status 
of mod_disk_cache on trunk to the agreed last good status, which is 
essentially the same as 2.2.4 if I remember correctly.


As for your problems, I would recommend staying on 2.2.4 proper and 
look further into the issue of expired/last-modified headers.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 And tomorrow will be like today, only more so.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


mod_disk_cache jumbopatch - new revision

2007-01-17 Thread Niklas Edmundsson


I uploaded a new version of our mod_disk_cache jumbopatch for httpd 
2.2.4 to http://issues.apache.org/bugzilla/show_bug.cgi?id=39380


It's what we've been using for a couple of months now (modulo upgrade 
to httpd 2.2.4) and should be considered fairly stable. It has 
survived all sorts of pathetic load-cases on http://ftp.acc.umu.se/ 
(also known as ftp.se.debian.org ftp.gnome.org, 
se.releases.ubuntu.com, se.archive.ubuntu.com, releases.mozilla.org) 
including our nfs backend going bezerk and bottoming out at a few MB/s 
when all frontends wanted to cache 300GB of new 
debian-weekly-build-isos.


Highlights from previous patch:
* Reverted to separate files for header and data, there were too many
  corner cases and having the data file separate allows us to reuse
  the cached data for other purposes (for example rsync).
* Fixed on-disk headers to be stored in an easily machine-parseable
  format which allows for error checking, instead of the human-readable
  form that doesn't (see the sketch after this list).
* Attaching the background thread to the connection instead of request
  pool allows for restarts to work, the thing doesn't crash when you
  do apachectl graceful anymore :)
* Lots of error handling fixes and corner cases, we triggered most of
  them when our backend went bezerk-go-slow-mode.
* Deletes cached files when cache decides that the object is stale for
  real, previously it only NULL:ed the data structure in memory
  causing other requests to read headers etc.
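
On the machine-parseable on-disk header point above, a hypothetical
layout (not the actual format in the patch) that survives 32/64-bit
moves looks roughly like this: a magic and a version up front, and only
fixed-width fields, so a foreign or stale header file can be rejected
instead of silently misparsed.

#include "apr.h"

#define DISK_FORMAT_MAGIC   0x4e434446u    /* arbitrary marker */
#define DISK_FORMAT_VERSION 1

typedef struct {
    apr_uint32_t magic;       /* DISK_FORMAT_MAGIC */
    apr_uint32_t version;     /* bumped whenever the layout changes */
    apr_int64_t  file_size;   /* explicit 64 bit, never off_t/long */
    apr_int64_t  lastmod;     /* apr_time_t stored as 64 bit */
    apr_uint32_t name_len;    /* length of the filename that follows */
} disk_cache_format_t;

/* Reject anything not written by this exact format version. */
static int format_is_usable(const disk_cache_format_t *f)
{
    return f->magic == DISK_FORMAT_MAGIC
        && f->version == DISK_FORMAT_VERSION;
}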

Not mentioned in bugzilla, this is probably also relevant:
* Cache-file-path for Headers are hashed on URL, body on r->filename
  if present. This allows for using the same cache with external
  programs (for example rsync).

For those interested in using the same cache for rsync, we have 
whipped up an open-wrapper (uses LD_PRELOAD) which seems to be doing 
the job nicely. It can't cache as much metadata as mod_disk_cache, 
but it is able to reuse the cached bodies at least, which is a good 
thing if you have a lot of client sites that rsync the same trees 
daily.


We're awaiting some progress on mod_ftp to be able to cache ftp too, 
all usable ftpd's we have seen uses chroot() which causes trouble when 
trying to wrap open() and friends to access files outside the chroot 
;)



/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 What?  Hey.  Beverly. - Picard
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: Solved: mod_disk_cache and mod_include bugs and suggestions

2007-01-17 Thread Niklas Edmundsson

On Wed, 17 Jan 2007, Giuliano Gavazzi wrote:


  rv = apr_file_seek(dobj->hfd, APR_SET, &off);

does not rewind if the file has been opened with APR_FOPEN_BUFFERED. Now, I


This is an APR bug, I submitted a bug report for it a while ago. I 
worked around it by not using buffering at all and writing larger 
chunks when writing headers.



There is another potential problem in store_headers, if the headers file is


Is this in trunk or in 2.2.4 proper? You should probably ignore 
mod_disk_cache on trunk until that situation is settled.


If you're into trying patches you could give my mod_disk_cache 
jumbopatch a spin, note however that it's only been tested for mostly 
static content (directory indexes being an exception). You can find 
the patch at http://issues.apache.org/bugzilla/show_bug.cgi?id=39380 
...


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Preserve wildlife... pickle a sqirrel.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: Solved: mod_disk_cache and mod_include bugs and suggestions

2007-01-17 Thread Niklas Edmundsson

On Wed, 17 Jan 2007, Giuliano Gavazzi wrote:

Is this in trunk or in 2.2.4 proper? You should probably ignore 
mod_disk_cache on trunk until that situation is settled.




I could work on mod_disk_cache from 2.2.4 proper, and find what causes the 
bug with SSI pages, but I do not see why I should spend another couple of 
days on it now that I have fixed the r470455 (that is trunk) release. After 
all the situation might settle quicklier (I think Shakespeare used this...) 
if I submit my patches! And I suppose now you agree that the extra 
indentation on my patch stemmed from the broken nature of the original code!


Actually I don't. Either you or me have misunderstood how buckets 
work, since the rest of the code should syntactically be equivalent. 
Or I'm missing some fine detail somewhere.


Anyhow, since the code on trunk probably is ReallyBroken(tm) I 
wouldn't waste my time on it until the situation has been cleared up.


If you're into trying patches you could give my mod_disk_cache jumbopatch a 
spin, note however that it's only been tested for mostly static content 
(directory indexes being an exception). You can find the patch at 
http://issues.apache.org/bugzilla/show_bug.cgi?id=39380 ...


I think I will give it a spin, more to give you feedback on possible issues 
with SSI.


Do that.

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 I think he did a little too much LDS. - Kirk
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: Solved: mod_disk_cache and mod_include bugs and suggestions

2007-01-18 Thread Niklas Edmundsson

On Thu, 18 Jan 2007, Giuliano Gavazzi wrote:

Actually I don't. Either you or me have misunderstood how buckets work, 
since the rest of the code should syntactically be equivalent. Or I'm 
missing some fine detail somewhere.


Perhaps I do not understand buckets fully (and brigades), but this seems to 
be clear enough.

The fine detail is in the original code (sorry for repeating myself):

while (e != APR_BRIGADE_SENTINEL(bb)) {


Ugh. I see it now.

The version on trunk has a different form of solution to the 
read-while-caching-problem than my patch, and that solution depends on 
other stuff in trunk. If I remember correctly you crafted the trunk 
version onto 2.2.4, and that's bound to fail.


Either test trunk, or 2.2.4. Don't mix files freely between them and 
expect stuff to work ;)



I have also tested your patch 
(httpd-2.2.4-mod_disk_cache-jumbo20070117.patch) and in my limited test it 
works for SSI, but does not seem to be less prone than my patch of r470455 to 
hammering of the back-end. It is actually a tad worse.


A test on localhost with an SSI calling this script:

#!/bin/sh
echo `date` >> foo.log
sleep 10
echo bar


with:

/usr/local/apache2/bin/ab -c 10 -n 20 <URL>

gives 13 calls to the backend with yours and 12 with mine. 18 failures out of 
20 (for length) in yours, and no failures in mine. Actually, it seems that 
yours confuses ab, as it reported a length 2 bytes short, and not 
corresponding to the one in the header file.

The throughput is about the same.


What's your update timeout? If you have a sleep 10 in the script, 
you'll need an update timeout longer than that or you'll always fail.


It shouldn't report different lengths though.

Enable debug logging in httpd and review the debug log in order to 
find out exactly where it falls short.


Regarding it hitting backend many times, that's probably due to the 
small window between "I have no cached copy, I need to cache it so I 
let it travel along the filter chain" and "I have stuff to write, 
let's create a cache file". ab hits the page at exactly the same time, 
so it will trigger it. My patches try hard to detect when it's 
happening and only one instance will do the actual caching, but since 
I haven't looked at the particular issues with dynamic content the 
code tends to lean towards correctness (old behaviour) rather than 
performance.


It replaces the brigade with an instance of the cached file when it 
detects that it's already being cached. For stuff with unknown size 
(usually dynamic content) it can't do this, so it's bound to hit your 
backend. I have no clue on how to solve this with the current cache 
design, but I'm sure there are more clued people here when it comes to 
caching and dynamic content.
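
For the known-size case the core of it is not much more than this
(assumed helper, not the jumbopatch itself): drop whatever the backend
put in the brigade and hand the client a file bucket over the cache file
instead.

#include "apr_buckets.h"

static apr_status_t serve_from_cache_file(apr_bucket_brigade *bb,
                                          apr_file_t *fd, apr_off_t len,
                                          apr_pool_t *p)
{
    apr_bucket *e;

    apr_brigade_cleanup(bb);                 /* drop the backend's data */

    e = apr_bucket_file_create(fd, 0, (apr_size_t)len, p,
                               bb->bucket_alloc);
    APR_BRIGADE_INSERT_TAIL(bb, e);

    e = apr_bucket_eos_create(bb->bucket_alloc);
    APR_BRIGADE_INSERT_TAIL(bb, e);

    return APR_SUCCESS;
}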


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Want to forget all your troubles? Wear tight shoes.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


mod_cache: save filter recalls body to non-empty brigade?

2007-01-24 Thread Niklas Edmundsson


In mod_cache, recall_body() is called in the cache_save_filter() when 
revalidating an entity.


However, if I have understood things correctly the brigade is already 
populated when the save filter is called, so calling recall_body() in 
this case would place additional stuff in the bucket brigade.


Wouldn't it be more correct to empty the brigade before calling 
recall_body()? Or am I missing something?


This is mod_cache in vanilla httpd 2.2.4 by the way.

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 9 out of 10 priests prefer young boys to Doom.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_cache: save filter recalls body to non-empty brigade?

2007-01-24 Thread Niklas Edmundsson

On Wed, 24 Jan 2007, Graham Leggett wrote:


On Wed, January 24, 2007 2:15 pm, Niklas Edmundsson wrote:


In mod_cache, recall_body() is called in the cache_save_filter() when
revalidating an entity.

However, if I have understood things correctly the brigade is already
populated when the save filter is called, so calling recall_body() in
this case would place additional stuff in the bucket brigade.

Wouldn't it be more correct to empty the brigade before calling
recall_body()? Or am I missing something?


I think the theory is that recall_body() should only be called on a 304
not modified (with no body), so in theory there is no existing body
present, so no need to clear the brigade.


Ah. Then it makes sense. I only saw that it checked if status == OK, 
but I see now that I was looking at the wrong status value ;)



Of course practically you don't want to make assumptions about the
emptiness of the existing brigade, so clearing the brigade as a first step
makes definite sense.


OK. Do you want a patch for it, or will you fix it yourself? The 
cache-situation on trunk isn't completely clear, so maybe those 
patches that should be revoked from there should be cleaned up 
first...


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Reality--what a concept!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_cache: save filter recalls body to non-empty brigade?

2007-01-25 Thread Niklas Edmundsson

On Wed, 24 Jan 2007, Plüm, Rüdiger, VF EITO wrote:


Of course practically you don't want to make assumptions about the
emptiness of the existing brigade, so clearing the brigade as
a first step
makes definite sense.


It is not needed to clear the brigade, because the brigade passed to 
the filter is named "in", the one where recall_body stores the cached 
file is "bb". In the case of a recalled body we pass "bb" down the chain, 
not "in".


Ah, of course.

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Air pollution is a mist demeanor.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: 3.0 - Proposed Goals

2007-02-14 Thread Niklas Edmundsson

On Wed, 14 Feb 2007, Garrett Rooney wrote:


- Rewrite how Brigades, Buckets and filters work.  Possibly replace them
with other models. I haven't been able to personally consolidate my
thoughts on how to 'fix' filters, but I am sure we can plenty of long
threads about it :-)


I think a big part of this should be documenting how filters are
supposed to interact with the rest of the system.  Right now it seems
to be very much a well, I looked at this other module and did what it
did, and it's quite easy to start depending on behavior in the system
that isn't actually documented to work that way.


This hits a rather sweet spot it seems. Browsing the current httpd 
module/developer docco I find gems like:

http://httpd.apache.org/docs/2.2/developer/modules.html

One would think that now that 2.2 is released at least the 1.3->2.0 
converting docco would have evolved to something better than "it's a 
start" ...


Also, we have http://httpd.apache.org/docs/2.2/developer/API.html ... 
It seems that the most current API docco is for 1.3, but at least 
there's a nice disclaimer telling that it's obsolete but some 
information might be correct.


So yes, I fully agree that documentation is needed. It's a pain trying 
to figure out how stuff (are supposed to) work when the docco is two 
major releases behind...


One problem here is that this kind of docco usually needs to be made 
by those who hate to write it: the core programmers.



/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 I should have done this a long time ago. - Picard
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: 3.0 - Proposed Goals

2007-02-14 Thread Niklas Edmundsson

On Wed, 14 Feb 2007, Nick Kew wrote:


On Wed, 14 Feb 2007 15:41:38 +0100 (MET)
Niklas Edmundsson [EMAIL PROTECTED] wrote:


One problem here is that this kind of docco usually needs to be made
by those who hate to write it: the core programmers.


The core programmers use the core programmer documentation,
aka the source code.  In particular, the .h files, which
give you detailed API documentation.


Mkay. However, the source and header files aren't very good in the 
"how it's supposed to work" department. You usually end up looking at 
a module that implements stuff the wrong way. mod_example might be the 
ultimate example of this ;)



For higher-level documentation of Apache 2.2, follow my .sig.


Remove stale docco and point there from the httpd website, then?

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 A bird in the bush can't mess in your hand!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Please backport mod_cache PR 41475 to 2.2.5 ...

2007-02-23 Thread Niklas Edmundsson


Hi!

I might be jumping the gun here, but I'd really like to see the fix 
for PR 41475 backported to 2.2.5. We're hitting this issue when 
mirroring the firefox installer which has a space in the filename...


We'll probably apply the fix locally, but it would be nice to have the 
mod_cache fixes in 2.2.5 so we don't have to keep track of them when 
upgrading...


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 I am NOT a computer nerd! I am a techno-weenie.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: [RFC] Guide to writing output filters

2007-03-17 Thread Niklas Edmundsson

On Sat, 17 Mar 2007, Ruediger Pluem wrote:


On 03/16/2007 11:55 PM, Joe Orton wrote:

http://people.apache.org/~jorton/output-filters.html

How does this look?  Anything missed out, anything that doesn't make
sense?  I think this covers most of the major problems in output filters
which keep coming up.


Thanks for doing this. It looks very good to me, especially as it gives
us a set of rules and best practises even though I think there might be
a discussion on the details.


As a not-so-clued person on httpd internals I have to whole-heartedly 
agree and add a Bravo! to this effort.


httpd is seriously lacking on the devel-docco-front, meaning that the 
little in-tree documentation and examples that exists is generally 
outdated or broken, and out-of-tree docco doesn't count in this 
regard. This is truly a step in the right direction.


Now, if only someone clued could have a go at the existing pages that 
says this should be improved/updated/written life would be bliss :)


And yes, I know that writing documentation is a drag. However, in the 
long run it pays off. Really.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Is virus a 'micro' organism?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


mod_ftp, status and progress?

2007-03-23 Thread Niklas Edmundsson


Hi all!

What is the current status/progress on mod_ftp? I haven't seen much on 
[EMAIL PROTECTED] about it since the graduation...


In any case, we'd really like to get mod_ftp in a usable state so we 
can use it on our anonftp frontends. We currently use vsftpd and are 
really happy with it, but we would REALLY want to have the cache 
handled by mod_cache used by FTP too. We have already convinced rsync 
into using the cache by a LD_PRELOAD hack, unfortunately this doesn't 
work too well with ftpd's since they rely on chroot() to work.


So, how close is mod_ftp to handle this for us? I'll list the 
requirements we have and comment, and hopefully the Really Clued Ones 
will chime in with additional comments/status/etc:


In order to use mod_ftp on ftp.acc.umu.se (which runs httpd 2.2.4 with 
our mod_disk_cache jumbopatch) we need it to:


* Play well with mod_cache, if a file has been requested with HTTP a
  FTP request should reuse the cached copy. Last time I checked
  mod_ftp only did subrequests which mod_cache didn't act on. Of
  course files requested with FTP and thus cached should cause a HTTP
  request to use the cached copy (although with a revalidation to get
  current headers I guess). I think this won't work too well with
  vanilla mod_disk_cache, however our mod_disk_cache jumbo patch
  caches the body in a hash based on r->filename to solve the
  name-space issues.

* Only anonymous read-only-access is required. I think this is
  working today.

* Download-related items like file listings, continuation, etc MUST
  work.

* Both passive and active mode MUST work. I think there was some
  issues causing it to always use 0.0.0.0 in passive mode last time I
  checked.

* Large file support MUST work (we serve DVD images). Last time I
  checked there was a whole slew of LFS issues with the mod_ftp
  globbing code which was simply broken since it didn't use the information
  gathered by configure and didn't use APR - it should be ripped out
  and replaced with APR stuff instead IMHO. It makes no sense to keep
  that mess just to keep httpd 2.0 compatibility...

* IPv6 MUST work. I think this is being addressed.

* Probably something that I forgot so it can't be that important ;)

I (and other fellow computer club people) would be happy to hack on 
small bugs and issues, but the more hairy things like the 
mod_ftp/mod_cache interaction and the globbing mess really needs a 
Clued Httpd Developer sorting out the various odds and ends.



/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 PRIME DIRECTIVE, MY ASS! Phasers on maximum!  Load photon torpedoes!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_ftp, status and progress?

2007-03-26 Thread Niklas Edmundsson

On Fri, 23 Mar 2007, William A. Rowe, Jr. wrote:


* Play well with mod_cache, if a file has been requested with HTTP a
  FTP request should reuse the cached copy. Last time I checked
  mod_ftp only did subrequests which mod_cache didn't act on.


In terms of  using 'top level' requests in lieu of subrequests, it's
not low hanging fruit but definitely worth the refactoring.  Doing this
against httpd trunk/ will show up the API's that httpd is missing for
providers of resource-based servers such as ftp.


OK. This will need more investigation then, the easiest solution
would seem to be to get the subrequest interaction with mod_cache 
right. Bright insights are welcome.



* Both passive and active mode MUST work. I think there was some
  issues causing it to always use 0.0.0.0 in passive mode last time I
  checked.


fixed afaik, unless you are on win2000 and didn't DisableWin32AcceptEx
in 2.2.4 release (apr 1.2.8).  The fix is in trunk and will percolate
out as apr 1.2.9 (or later) with 2.2.5.


Nice.


* Large file support MUST work (we serve DVD images). Last time I
  checked there was a whole slew of LFS issues with the mod_ftp
  globbing code which was simply broken since it didn't use the information
  gathered by configure and didn't use APR - it should be ripped out
  and replaced with APR stuff instead IMHO. It makes no sense to keep
  that mess just o keep httpd 2.0 compatibility...


Patches welcome, yes this needs some refactoring.


Any thoughts on how to do this? My mind tends to be focused on what 
needs to work for anonftp in this regard, and that means naively 
listing a directory without any thoughts on file permissions and such. 
If a file/directory is within the anonftp tree it's OK to include it 
in the listing.


However, if there's some special care that needs to be done when 
supporting file uploads (only listing directories which the auth:ed 
user have access to and other special stuff) I will probably miss this 
unless someone clued does the high level design.



* IPv6 MUST work. I think this is being addressed.


I'd fixed the traditional interfaces (PORT/PASV) but we need to hack
together EPRT/EPSV support, yet.


OK. This shouldn't be too hard, given that EPRT/EPSV doesn't differ 
too much from PORT/PASV.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 It is always darkest before it goes totally black.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: PATCH 19824 -- enhancement to mod_expires

2007-04-01 Thread Niklas Edmundsson

On Sat, 31 Mar 2007, Jeffrey Friedl wrote:


so that images are cached essentially forever, but this means that they can
not reasonably be updated in place. However, with this patch, you might use

  ExpiresByType image/jpeg aged 2 days  THEN  10 years  ELSE  1 hour

to allow for some initial tweaking.


I think it would make more sense to use the same behaviour as 
mod_cache instead of having hard-coded expire-times when it comes to 
entities which have a last-modified header, i.e. newly modified entities 
get a low expire while stuff not changed for a while gets a high 
expire.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Where will YOU be when your laxative starts working?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_cache: 304 on HEAD (bug 41230)

2007-04-11 Thread Niklas Edmundsson

On Wed, 11 Apr 2007, Niklas Edmundsson wrote:

Would the correct fix be to check for r->header_only in cache_select(), or 
are there even more funky stuff going on? You don't want the cached object to 
be removed just because you got a HEAD request when it really isn't stale but 
just in need of revalidation. Ideally the HEAD request would cause the object 
to be revalidated if possible, but we can live with head requests just doing 
fallback without touching the cache.


I can whip up a patch for it, but I suspect that you guys are more clued on 
the deep magic involved :)


Looking a bit further, I think that something like this would actually 
be enough:

---8<--
--- mod_cache.c.orig    2007-04-11 13:29:14.0 +0200
+++ mod_cache.c 2007-04-11 14:06:29.0 +0200
@@ -456,7 +456,7 @@ static int cache_save_filter(ap_filter_t
          */
         reason = "No Last-Modified, Etag, or Expires headers";
     }
-    else if (r->header_only) {
+    else if (r->header_only && !cache->stale_handle) {
         /* HEAD requests */
         reason = "HTTP HEAD request";
     }
@@ -589,11 +589,12 @@ static int cache_save_filter(ap_filter_t
             cache->provider->remove_entity(cache->stale_handle);
             /* Treat the request as if it wasn't conditional. */
             cache->stale_handle = NULL;
+            rv = !OK;
         }
     }
 
-    /* no cache handle, create a new entity */
-    if (!cache->handle) {
+    /* no cache handle, create a new entity only for non-HEAD request */
+    if (!cache->handle && !r->header_only) {
         rv = cache_create_entity(r, size);
         info = apr_pcalloc(r->pool, sizeof(cache_info));
         /* We only set info->status upon the initial creation. */
---8<--

If I have understood things right this would:
- Accept revalidations even though it's a HEAD if the object wasn't
  stale.
- Bail out if the object is stale and it's a HEAD.

I haven't tried it yet though, I'm just trying to get a grasp of 
things. I have no clue on whether other things would break due to the 
fact that it's revalidated based on a HEAD instead of a GET, for 
example.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 I am Mr. T of Borg. I pity da fool that resists me.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


[PATCH] mod_cache 304 on HEAD (bug 41230)

2007-04-13 Thread Niklas Edmundsson

On Wed, 11 Apr 2007, Niklas Edmundsson wrote:

Looking a bit further, I think that something like this would actually be 
enough:

snip, included as an attachment

I have now tested this patch, and it seems to solve the problem. This 
is on httpd-2.2.4 + patch for PR41475 + our mod_disk_cache patches.


Without the patch a HEAD on a cached expired object that isn't 
modified will unconditionally return 304 and furthermore cause the 
cached object to be deleted. We believe that this is the explanation 
to why it has been so hard to track down this bug - it only bites one 
user and that user usually has no clue on what happens, and even if we 
try to reproduce it immediately afterwards it won't trigger.


With the patch stuff works like expected:
- A HEAD on a cached expired object that isn't modified will update
  the cache header and return the proper return code, it follows the
  same code path as other requests on expired unmodified objects.
- A HEAD on a cached expired object that IS modified will remove the
  object from cache and then decline the opportunity to cache the
  object.

I request that this is reviewed, commited and proposed for backport to 
httpd 2.2.5.



/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 The pain is bad enough. Don't go poetic on me. - Madeline
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

--- mod_cache.c.orig    2007-04-11 13:29:14.0 +0200
+++ mod_cache.c 2007-04-11 14:06:29.0 +0200
@@ -456,7 +456,7 @@ static int cache_save_filter(ap_filter_t
          */
         reason = "No Last-Modified, Etag, or Expires headers";
     }
-    else if (r->header_only) {
+    else if (r->header_only && !cache->stale_handle) {
         /* HEAD requests */
         reason = "HTTP HEAD request";
     }
@@ -589,11 +589,12 @@ static int cache_save_filter(ap_filter_t
             cache->provider->remove_entity(cache->stale_handle);
             /* Treat the request as if it wasn't conditional. */
             cache->stale_handle = NULL;
+            rv = !OK;
         }
     }
 
-    /* no cache handle, create a new entity */
-    if (!cache->handle) {
+    /* no cache handle, create a new entity only for non-HEAD request */
+    if (!cache->handle && !r->header_only) {
         rv = cache_create_entity(r, size);
         info = apr_pcalloc(r->pool, sizeof(cache_info));
         /* We only set info->status upon the initial creation. */


Re: [PATCH] mod_cache 304 on HEAD (bug 41230)

2007-04-17 Thread Niklas Edmundsson

On Mon, 16 Apr 2007, Ruediger Pluem wrote:


I have now tested this patch, and it seems to solve the problem. This is
on httpd-2.2.4 + patch for PR41475 + our mod_disk_cache patches.

Without the patch a HEAD on a cached expired object that isn't modified
will unconditionally return 304 and furthermore cause the cached object
to be deleted. We believe that this is the explanation to why it has
been so hard to track down this bug - it only bites one user and that
user usually has no clue on what happens, and even if we try to
reproduce it immediately afterwards it won't trigger.

With the patch stuff works like expected:
- A HEAD on a cached expired object that isn't modified will update
  the cache header and return the proper return code, it follows the
  same code path as other requests on expired unmodified objects.
- A HEAD on a cached expired object that IS modified will remove the
  object from cache and then decline the opportunity to cache the
  object.


Are you really sure that it gets deleted? cache->provider->remove_entity does
not really remove the object from the cache. Only cache->provider->remove_url
does this.


Yes, but the CACHE_REMOVE_URL filter will remove it, right? It removes 
the CACHE_REMOVE_URL filter only after it has decided that it's 
actually caching the response so it will bite in that case.


I consider the CACHE_SAVE filter already as hard to read (not your 
fault by any means), but from my point of view your patch does 
increase this (specifically I think about the "rv = !OK" line. I know 
that a similar trick is done some lines above, but I don't like that 
one either).


I also found rv=!OK ugly, but I just followed the established style to 
create a minimal patch without extra fuzz. Feel free to clean stuff up 
to improve readability, as long as the bug gets fixed I'm happy :)



/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Push any key. Then push the any other key.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: [PATCH] mod_cache 304 on HEAD (bug 41230)

2007-04-17 Thread Niklas Edmundsson

On Tue, 17 Apr 2007, Plüm, Rüdiger, VF-Group wrote:


Are you really sure that it gets deleted?

cache->provider->remove_entity does

not really remove the object from the cache. Only

cache->provider->remove_url

does this.


Yes, but the CACHE_REMOVE_URL filter will remove it, right?
It removes
the CACHE_REMOVE_URL filter only after it has decided that it's
actually caching the response so it will bite in that case.


But only if there is a cache->handle or a cache->stale_handle. We have neither, as
cache->stale_handle is set to NULL.


Ah, of course.

Looking closer I find that as a part of our hacking on mod_disk_cache 
we fixed remove_entity to also remove the cache-files. If I remember 
correctly it was leaving stale cache files in some code paths, I guess 
that this was one of them. And we never figured out why there was both 
remove_entity and remove_url anyway, even the mod_cache-code seems to 
get them confused...


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 HEALTH: The slowest possible rate of dying.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_ftp, status and progress?

2007-04-26 Thread Niklas Edmundsson

On Thu, 26 Apr 2007, Jim Jagielski wrote:



On Apr 18, 2007, at 1:22 PM, Guenter Knauf wrote:


Hi,
the current code fails to build for Win32 target.
This is because ftp_glob.c seems not APR-ised yet;


I'm actually looking at removing the whole glob stuff
and emulating it as regexes...


Wouldn't apr_match_glob() be a better starting point? I don't really 
see the point of going via regexes...
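
For reference, apr_match_glob() usage is about this simple (a sketch with
an assumed output path, not mod_ftp code); it hands back the matching
names, without the directory part, in an array. Whether its
one-directory-at-a-time behaviour is enough for FTP LIST semantics is a
separate question.

#include "httpd.h"
#include "http_protocol.h"
#include "apr_fnmatch.h"
#include "apr_tables.h"

static apr_status_t list_matches(request_rec *r, const char *pattern)
{
    apr_array_header_t *matches;
    apr_status_t rv;
    int i;

    rv = apr_match_glob(pattern, &matches, r->pool);
    if (rv != APR_SUCCESS) {
        return rv;
    }
    for (i = 0; i < matches->nelts; i++) {
        const char *name = ((char **)matches->elts)[i];
        ap_rprintf(r, "%s\r\n", name);       /* e.g. as part of a listing */
    }
    return APR_SUCCESS;
}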


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Luckily, I'm out of hairs to split!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_ftp, status and progress?

2007-04-26 Thread Niklas Edmundsson

On Thu, 26 Apr 2007, Jim Jagielski wrote:


I'm actually looking at removing the whole glob stuff
and emulating it as regexes...


Wouldn't apr_match_glob() be a better starting point? I don't really see 
the point of going via regexes...


I was thinking for 2.0.x compatibility...


Wouldn't it be better to focus on 2.2.x and onwards? OK, there's a lot 
of people still running 1.3 and 2.0, but that doesn't mean that we 
have to make it run on all of them...


I'm all for focusing on getting it usable for 2.2+, and if people 
really want the httpd-tree mod_ftp that bad they can see it as yet 
another good reason to upgrade. There's a lot of work that needs to be 
done in order to have mod_ftp usable, and making the code more complex 
in order to support the previous stable httpd version doesn't really 
sound that appealing.


Not that I'm against backward compatibility, but I'd prefer seeing a 
clean design for the current/future httpd version and the compat stuff 
handled by wrapper functions stashed in a mod_ftp_20compat.c or 
something like that.


In any case, as long as the code is readable and works. The current 
mod_ftp globbing stuff is simply a mess.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Reality is for people who can't handle Star Trek.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_ftp, status and progress?

2007-04-27 Thread Niklas Edmundsson

On Fri, 27 Apr 2007, Jim Jagielski wrote:


I'm actually looking at removing the whole glob stuff
and emulating it as regexes...
Wouldn't apr_match_glob() be a better starting point? I don't really see 
the point of going via regexes...


I was thinking for 2.0.x compatibility...


Wouldn't it be better to focus on 2.2.x and onwards? OK, there's a lot of 
people still running 1.3 and 2.0, but that doesn't mean that we have to 
make it run on all of them...


Why? Really, it's no big deal to ensure it runs on both.


I'm not against keeping compatibility. However I feel that the right 
way to do it would be to design stuff for current httpd and then add 
glue for the backward compat stuff (and not doing it the 
#ifdef-mess-way).


So, going for regexes just because apr_match_glob() doesn't exist in 
2.0.x seems a bit sub-optimal...


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 You're the security chief-shouldn't you be out securing something?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


mod_ftp: [PATCH] Make REST work with large files

2007-04-27 Thread Niklas Edmundsson


Attached is a patch written some time ago to make the REST command 
grok large files on LFS-capable platforms by using apr_strtoff() 
instead of strtol().


It's untested, mostly because I didn't have a test server handy at the 
moment. Thought I should submit the patch before I lost it in the 
twisty maze of svn trees though.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Fiddle: Friction of a horse's tail on cat's entrails.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=Index: modules/ftp/ftp_commands.c
===
--- modules/ftp/ftp_commands.c  (revision 533227)
+++ modules/ftp/ftp_commands.c  (working copy)
@@ -1784,33 +1784,18 @@
 static int ftp_cmd_rest(request_rec *r, const char *arg)
 {
 ftp_connection *fc = ftp_get_module_config(r-request_config);
-conn_rec *c = r-connection;
 char *endp;
+apr_status_t rv;
 
-/* XXX: shortcoming, cannot restart  ~2GB.  Must be solved in 
- * APR, or we need to use 
- *  int len;
- *  res = sscanf(arg,%APR_OFF_T_FMT%n, fc-restart_point, len);
- *  end = arg + len;
- * and test that res == 2. Dunno how portable or safe this gross
- * hack would be in real life.
- */
-    fc->restart_point = strtol(arg, &endp, 10);
-    if (((*arg == '\0') || (*endp != '\0')) || fc->restart_point < 0) {
-        fc->response_notes = apr_pstrdup(r->pool, "REST requires a an "
-                                         "integer value greater than zero");
+    rv = apr_strtoff(&(fc->restart_point), arg, &endp, 10);
+    if (rv != APR_SUCCESS || ((*arg == '\0') || (*endp != '\0')) ||
+        fc->restart_point < 0)
+    {
+        fc->response_notes = apr_pstrdup(r->pool, "REST requires a "
+                                         "non-negative integer value");
 return FTP_REPLY_SYNTAX_ERROR;
 }
 
-/* Check overflow condition */
-    if (fc->restart_point == LONG_MAX) {
-        ap_log_error(APLOG_MARK, APLOG_WARNING|APLOG_NOERRNO, 0,
-                     c->base_server,
-                     "Client attempted an invalid restart point");
-/* XXX: possible overflow, continue gracefully?  Many other FTP
- * client do not check overflow conditions in the REST command.
- */
-}
     fc->response_notes = apr_psprintf(r->pool, "Restarting at %" APR_OFF_T_FMT
                                       ". Send STORE or RETRIEVE to initiate "
                                       "transfer.", fc->restart_point);


[PATCH] mod_cache: Don't follow NULL pointers.

2007-05-02 Thread Niklas Edmundsson


We encountered the following bug: httpd segfaulted due to a client 
emitting "Cache-Control: max-age=216000, max-stale", which is a 
perfectly valid header.


The segfault is caused by the fact that ap_cache_liststr() sets the 
value pointer to NULL when there is no value, and this isn't checked 
at all in the cases when a value pointer is passed.


I think that this patch catches all those occurrences.

I'm not proud of the solution for max-stale without value, but it 
should do the job...


In any case, this is a bug that should be fixed ASAP and queued for 
inclusion in httpd 2.2.5 since it segfaults your httpd even with valid 
headers...



/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 I am Yoda of Borg.  Assimilated you will be, hmmm?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

--- ../../../dist/modules/cache/cache_util.c   2006-10-13 01:11:33.0 +0200
+++ cache_util.c    2007-05-02 10:26:08.0 +0200
@@ -243,7 +243,8 @@
     age = ap_cache_current_age(info, age_c, r->request_time);
 
 /* extract s-maxage */
-    if (cc_cresp && ap_cache_liststr(r->pool, cc_cresp, "s-maxage", &val)) {
+    if (cc_cresp && ap_cache_liststr(r->pool, cc_cresp, "s-maxage", &val)
+        && val != NULL) {
 smaxage = apr_atoi64(val);
 }
 else {
@@ -252,7 +253,8 @@
 
 /* extract max-age from request */
     if (!conf->ignorecachecontrol
-        && cc_req && ap_cache_liststr(r->pool, cc_req, "max-age", &val)) {
+        && cc_req && ap_cache_liststr(r->pool, cc_req, "max-age", &val)
+        && val != NULL) {
 maxage_req = apr_atoi64(val);
 }
 else {
@@ -260,7 +262,8 @@
 }
 
 /* extract max-age from response */
-if (cc_cresp  ap_cache_liststr(r-pool, cc_cresp, max-age, val)) {
+if (cc_cresp  ap_cache_liststr(r-pool, cc_cresp, max-age, val)
+ val != NULL) {
 maxage_cresp = apr_atoi64(val);
 }
 else {
@@ -282,7 +285,14 @@
 
 /* extract max-stale */
 if (cc_req  ap_cache_liststr(r-pool, cc_req, max-stale, val)) {
-maxstale = apr_atoi64(val);
+if(val != NULL) {
+maxstale = apr_atoi64(val);
+}
+else {
+/* If no value is assigned to max-stale, then the client is willing
+ * to accept a stale response of any age */
+maxstale = APR_INT64_C(0x7fff); /* No APR_INT64_MAX? */
+}
 }
 else {
 maxstale = 0;
@@ -290,7 +300,8 @@
 
 /* extract min-fresh */
 if (!conf-ignorecachecontrol
- cc_req  ap_cache_liststr(r-pool, cc_req, min-fresh, val)) {
+ cc_req  ap_cache_liststr(r-pool, cc_req, min-fresh, val)
+ val != NULL) {
 minfresh = apr_atoi64(val);
 }
 else {


Re: [PATCH] mod_cache: Don't follow NULL pointers.

2007-05-02 Thread Niklas Edmundsson

On Wed, 2 May 2007, Niklas Edmundsson wrote:

We encountered the following bug: httpd segfaulted due to a client emitting 
"Cache-Control: max-age=216000, max-stale", which is a perfectly valid header.


The segfault is caused by the fact that ap_cache_liststr() sets the value 
pointer to NULL when there is no value, and this isn't checked at all in the 
cases when a value pointer is passed.


I think that this patch catches all those occurrences.


Or so I thought.

It turned out that ap_cache_liststr() didn't set the value pointer to 
NULL in all cases where it should. Now it does.


I'm not proud of the solution for max-stale without value, but it should do 
the job...


It did, but it caused the freshness calculation to overflow so the end 
result was bollocks. I hard-coded 100 years for the max-stale without 
value case, not pretty but it works.


Updated patch attached.


/Nikke - not fond of fixing bugs with core-files as the only source of
 information :/
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 REJECTION: When your imaginary friends won't talk to you.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

--- ../../../dist/modules/cache/cache_util.c   2006-10-13 01:11:33.000000000 +0200
+++ cache_util.c        2007-05-02 14:01:06.000000000 +0200
@@ -243,7 +243,8 @@
     age = ap_cache_current_age(info, age_c, r->request_time);
 
     /* extract s-maxage */
-    if (cc_cresp && ap_cache_liststr(r->pool, cc_cresp, "s-maxage", &val)) {
+    if (cc_cresp && ap_cache_liststr(r->pool, cc_cresp, "s-maxage", &val)
+        && val != NULL) {
         smaxage = apr_atoi64(val);
     }
     else {
@@ -252,7 +253,8 @@
 
     /* extract max-age from request */
     if (!conf->ignorecachecontrol
-        && cc_req && ap_cache_liststr(r->pool, cc_req, "max-age", &val)) {
+        && cc_req && ap_cache_liststr(r->pool, cc_req, "max-age", &val)
+        && val != NULL) {
         maxage_req = apr_atoi64(val);
     }
     else {
@@ -260,7 +262,8 @@
     }
 
     /* extract max-age from response */
-    if (cc_cresp && ap_cache_liststr(r->pool, cc_cresp, "max-age", &val)) {
+    if (cc_cresp && ap_cache_liststr(r->pool, cc_cresp, "max-age", &val)
+        && val != NULL) {
         maxage_cresp = apr_atoi64(val);
     }
     else {
@@ -282,7 +285,16 @@
 
     /* extract max-stale */
     if (cc_req && ap_cache_liststr(r->pool, cc_req, "max-stale", &val)) {
-        maxstale = apr_atoi64(val);
+        if(val != NULL) {
+            maxstale = apr_atoi64(val);
+        }
+        else {
+            /* If no value is assigned to max-stale, then the client is willing
+             * to accept a stale response of any age */
+            /* Let's pretend 100 years is enough, we need some margin here
+             * or the freshness calculation later will overflow */
+            maxstale = APR_INT64_C(86400*365*100);
+        }
     }
     else {
         maxstale = 0;
@@ -290,7 +302,8 @@
 
     /* extract min-fresh */
     if (!conf->ignorecachecontrol
-        && cc_req && ap_cache_liststr(r->pool, cc_req, "min-fresh", &val)) {
+        && cc_req && ap_cache_liststr(r->pool, cc_req, "min-fresh", &val)
+        && val != NULL) {
         minfresh = apr_atoi64(val);
     }
     else {
@@ -419,6 +432,9 @@
                                           next - val_start);
             }
         }
+        else {
+            *val = NULL;
+        }
     }
     return 1;
 }


Re: mod_ftp, status and progress?

2007-05-03 Thread Niklas Edmundsson

On Wed, 2 May 2007, Jim Jagielski wrote:


In fact, to be honest, it would be easier still to just
update ftp_direntry_get() to use apr_fnmatch(), since we
always want to support globing. ftp_direntry_get already
does most of what makes apr_match_glob attractive in
the 1st place.



Should have a patch to commit later on tomorrow,
after some more tests :)


I suspect that you're fixing the large file issues while you're at it?

Another thing I noticed when we started to look at mod_ftp (looking at 
strace/truss-output trying to figure out why things didn't work) was 
that it stats all entries in a directory twice, first explicitly and 
then via the subreq. Wouldn't the subreq be enough? It's no biggie for 
now, but it would be nice to get rid of unnecessary stats as a bonus 
;)



/Nikke - eager to give it a spin :)
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 You wanted to make it law. Make it a good one. - Picard
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_ftp, status and progress?

2007-05-03 Thread Niklas Edmundsson

On Thu, 3 May 2007, William A. Rowe, Jr. wrote:


Another thing I noticed when we started to look at mod_ftp (looking at
strace/truss-output trying to figure out why things didn't work) was
that it stats all entries in a directory twice, first explicitly and
then via the subreq. Wouldn't the subreq be enough? It's no biggie for
now, but it would be nice to get rid of unnecessary stats as a bonus ;)


This is a separate issue; we need to refactor out 90% of the subrequests
and treat these at top level requests.


OK. I was under the impression that those subrequests were made to 
filter out stuff you don't have access to from the directory listings, 
but I stand corrected.



I discovered while trying to accommodate named virtual hosts (the hack to
let [EMAIL PROTECTED] resolve to the host.com vhost) it's simply not worth
hacking one without the other.


Ah. Am I right in guessing that making it play well with mod_cache 
would come more or less for free after the request-refactoring is 
done?


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 He who laughs last is probably your boss!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: ftp glob/limits?

2007-05-15 Thread Niklas Edmundsson

On Mon, 14 May 2007, William A. Rowe, Jr. wrote:


What would folks think about changing

   if (ap_strchr_c(arg, '*') != NULL) {
   /* Prevent DOS attacks, only allow one segment to have a wildcard */
   int found = 0;   /* The number of segments with a wildcard */

to permit multiple wildcards, but to restrict the number of matches
returned (configurable with a directive, of course)?

Over a small pattern space, uploads/*/* is often very useful.

What would be the sane default?  1,000 entries?


For anonftp usage I would prefer the restrictive behaviour, it's good 
enough for most users and most decent ftpd's already do it that way.


For example, you can find this in ls.c in vsftpd:
--8--
   * Note that pattern matching is only supported within the last path
   * component. For example, searching for /a/b/? will work, but searching
   * for /a/?/c will not.
--8--

which is a sane behaviour for a public server in my world.
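
For illustration, the "wildcards only in the last path component" rule
boils down to a check like this (just a sketch with made-up names, not
vsftpd's or mod_ftp's actual code):

--8<--
#include <string.h>

/* Return 1 if wildcard characters only occur in the last path segment,
 * i.e. "/a/b/?" is accepted but "/a/?/c" is rejected. */
static int wildcard_only_in_last_segment(const char *path)
{
    const char *last_slash = strrchr(path, '/');
    const char *seg_start = last_slash ? last_slash + 1 : path;
    const char *p;

    for (p = path; p < seg_start; p++) {
        if (*p == '*' || *p == '?' || *p == '[') {
            return 0; /* wildcard before the final segment */
        }
    }
    return 1;
}
--8<--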

For non-anonftp usage limiting the number of matches might be OK, if 
the thing stops recursion when hitting the limit and doesn't just limit 
the reply sent to the client ;)


So my vote would be default to restrictive, a more relaxed behaviour 
must be explicitly configured.



/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 A bird in hand makes brushing your teeth difficult.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Any progress on PR41230 (HEAD issues on cached items)?

2007-05-17 Thread Niklas Edmundsson


Has there been any progress on PR41230? I submitted a patch that at 
least seems to improve the situation that now seems to have seen some 
testing by others as well.


As I have stated before, it would be really nice if a fix for this 
could be committed, be it my patch or some other solution.



/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Don't hide your contempt of the contemptible!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: Any progress on PR41230 (HEAD issues on cached items)?

2007-05-21 Thread Niklas Edmundsson

On Fri, 18 May 2007, Justin Erenkrantz wrote:


On 5/17/07, Niklas Edmundsson [EMAIL PROTECTED] wrote:


Has there been any progress on PR41230? I submitted a patch that at
least seems to improve the situation that now seems to have seen some
testing by others as well.

As I have stated before, it would be really nice if a fix for this
could be committed, be it my patch or some other solution.


I've committed a variant of this patch to trunk in r539620.  Thanks!


Great!

Now it just needs to be included in 2.2.x and I'll be even more happy 
:)


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Old mufflers never die, they get exhausted.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_cache: Don't update when req max-age=0?

2007-05-21 Thread Niklas Edmundsson

On Mon, 21 May 2007, Graham Leggett wrote:


Since max-age=0 requests can't be fulfilled without revalidating the
object they don't benefit from this header rewrite, and requests with
max-age!=0 that can benefit from the header rewrite won't be affected
by this change.

Am I making sense? Have I missed something fundamental?


At first glance, doing this I think will break RFC2616 compliance, and if
it does break RFC compliance then I think it should not be default
behaviour. However if it does solve a real problem for admins, then having
a directive allowing the admin to enable this behaviour does make sense.


Why would it break RFC compliance? This request will never benefit from 
the headers being saved to disk, and the headers returned to the 
client should of course be those that resulted of the revalidation of 
the object. The only difference is that they aren't saved to disk too.


The only difference I can see is that you can't probe that the 
previous request was a max-age=0 one by doing a max-age!=0 request 
afterwards...



Zooming out a little bit, this seems to fall into the category of RFC
violations that allow the cache to either hit the backend less, or hit the
backend not at all, for the benefit of an admin who knows whet they are
doing.

A simple set of directives that allow an admin to break RFC compliance
under certain circumstances in order to achieve certain goals does make
sense.


Yup. CacheIgnoreCacheControl is one of those, we use it on the 
offloaders that only serve large files that we know don't need the 
RFC behaviour.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Sir, We are receiving 285,000 Hails. - Crusher
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_cache: Don't update when req max-age=0?

2007-05-24 Thread Niklas Edmundsson

On Tue, 22 May 2007, Henrik Nordstrom wrote:


tis 2007-05-22 klockan 11:40 +0200 skrev Niklas Edmundsson:


-8---
Does anybody see a problem with changing mod_cache to not update the
stored headers when the request has max-age=0, the body turns out not
to be stale and the on-disk header hasn't expired?
-8---


My understanding:

It's fine in an RFC point of view for the cache to completely ignore a
304 and not update the stored entity at all. But the response to this
request should be the merge of the two responses assuming the
conditional was added by the cache.


This is in line with my understanding, and since the response-merging 
is being done today the only change that would be done is to skip 
storing the header to disk. I think it would be wise to only skip the 
storing for the max-age=0 case though.


Should I try to whip up a patch for it then?


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Radioactive halibut will make fission chips.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_cache: Don't update when req max-age=0?

2007-05-24 Thread Niklas Edmundsson

On Thu, 24 May 2007, Sander Striker wrote:


 -8---
 Does anybody see a problem with changing mod_cache to not update the
 stored headers when the request has max-age=0, the body turns out not
 to be stale and the on-disk header hasn't expired?
 -8---

 My understanding:

 It's fine in an RFC point of view for the cache to completely ignore a
 304 and not update the stored entity at all. But the response to this
 request should be the merge of the two responses assuming the
 conditional was added by the cache.

This is in line with my understanding, and since the response-merging
is being done today the only change that would be done is to skip
storing the header to disk. I think it would be wise to only skip the
storing for the max-age=0 case though.


Why limit it to the max-age=0 case?  Isn't it a general improvement?


Consider a default cache lifetime of 86400 seconds, and requests 
coming in with max-age=4 (we see a lot of mozilla downloads with 
this, for example). If you don't rewrite the on-disk headers you'll 
end up always hitting your backend when you pass an age of 4.


In the max-age=0 case you only force an unnecessary header write, 
because:

a) The written header won't be useful for other requests with
   max-age=0. A ground rule of caching is to not save stuff that's
   never used.
b) Requests with max-age!=0 aren't helped much by it; the only penalty
   would be when a max-age!=0 request causes a header rewrite that
   a max-age=0 access would have performed. Doing this single rewrite
   instead of potentially thousands of rewrites due to max-age=0
   is a rather big win.
c) RFC-wise it seems to me that a not-modified object is a
   not-modified object. There is no guarantee that next request will
   hit the same cache, so nothing can expect a max-age=0 request to
   force a cache to rewrite its headers and then access it with
   max-age!=0 and get headers of that age.
d) Also, an object tend to be accessed with more-or-less the same
   max-age. So to store headers in the max-age=0 case just because it
   might be accessed by max-age!=0 makes no sense, since it's more
   likely that the next request to this object will have the same
   max-age.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Did I just step on someones toes again??
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


mod_disk_cache jumbopatch - 20070727 revision

2007-07-27 Thread Niklas Edmundsson


I have uploaded the version of our mod_disk_cache jumbopatch that 
we've been using on ftp.acc.umu.se for some time now to

http://issues.apache.org/bugzilla/show_bug.cgi?id=39380
for those who want a one-patch solution to using our modifications.

Cut&paste from the bugzilla attachment comment:
httpd 2.2.4 - mod_disk_cache jumbo patch - lfs/diskformat/read-while-caching 
etc.

A snapshot from 20070727 of our mod_disk_cache jumbo patch and some 
assorted additional patches that are needed for stability. We've been 
running this for a couple of months on ftp.acc.umu.se, and it has 
survived Debian/Ubuntu/Mozilla releases gracefully.


This version plays well with other entities using/updating the cache. 
We are using an open()-wrapper in combination with rsync which lets 
rsync utilise the cached bodies, and also cache files.


This patch is provided mostly as a one-patch solution for other sites 
that wish to use these mod_disk_cache modifications.


Highlights from previous patch:
* More corner case error fixes, most of them triggered by Mozilla
  releases.
* Greatly reduced duplicated data in the cache when using an NFS
  backend by hashing the body on the source file's device and inode when
  available (see the sketch after this list). HTTPD has already done the
  stat() of the file for us, so it's essentially free.
* Tidied up the handling of updated files, only delete files in cache
  if they're really obsolete.
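
To make the device/inode hashing above concrete, here is a rough sketch
(function name made up, not the code in the bugzilla attachment) of
deriving a body key from the stat info httpd already has in r->finfo:

--8<--
#include "httpd.h"
#include "apr_strings.h"

/* Sketch: key the cached body on device+inode when the stat info is
 * complete, so several names for the same file share one cached body. */
static const char *body_cache_key(request_rec *r)
{
    if (r->finfo.filetype == APR_REG
        && (r->finfo.valid & APR_FINFO_IDENT) == APR_FINFO_IDENT) {
        return apr_psprintf(r->pool, "%lx.%lx",
                            (unsigned long) r->finfo.device,
                            (unsigned long) r->finfo.inode);
    }
    return NULL; /* fall back to hashing the URL as usual */
}
--8<--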



/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 We are ATT of Borg, MCI will be assimilated
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


mod_limitipconn for httpd 2.2 and mod_cache

2007-07-27 Thread Niklas Edmundsson


Hi!

Attached is a version of mod_limitipconn.c that works in conjunction 
with mod_cache and httpd-2.2. We've been using this on ftp.acc.umu.se 
for some time now without any unwanted issues.


The main problem with mod_limitipconn-0.22 was that since mod_cache 
runs as a quick handler, mod_limitipconn also must run as a quick 
handler with all those benefits and drawbacks.


Download the tarball from http://dominia.org/djao/limitipconn2.html , 
extract it, and replace mod_limitipconn.c with this version and follow 
the build instructions.


I would really wish that this was made part of httpd, it's really 
needed when running a file-download site due to the scarily large 
amount of demented download manager clients out there.


However, I have not received any response from the original author on 
the matter. From what I have understood of the license it should be OK 
to merge into httpd if you want though, but I think that you guys are 
way more clued in that matter than me.


This is a summary of the changes made:
* Rewritten to run as a Quick Handler, before mod_cache (a sketch of the
   hook registration follows after this list).
* Configuration directives are now set per VHost (Directory/Location
   are only available after the Quick Handler has been run). This means
   that any Location containers have to be deleted in existing configs.
* Fixed configuration merging, so per-vhost settings use defaults set
   at the server level.
* By running as a Quick Handler we don't go through the entire lookup
   phase (resolve path, stat file, etc) before we get the possibility
   to block a request. This gives a clear performance enhancement.
* Made the handler exit as soon as possible, doing the easy checks
   first.
* Don't do subrequest to lookup MIME type if we don't have mime-type
   specific config.
* Count connections in closing and logging state too, we don't want to
   be DOS'd by clients behind buggy firewalls and so on.
* Added debug messages for easy debugging.
* Reduced loglevel from ERR to INFO for reject-logging.

In any case, I hope that this can be of use for others than us.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 We are ATT of Borg, MCI will be assimilated
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

/*
 * Copyright (C) 2000-2002 David Jao [EMAIL PROTECTED]
 *
 * Permission is hereby granted, free of charge, to any person
 * obtaining a copy of this software and associated documentation
 * files (the "Software"), to deal in the Software without
 * restriction, including without limitation the rights to use, copy,
 * modify, merge, publish, distribute, sublicense, and/or sell copies
 * of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice, this permission notice, and the
 * following disclaimer shall be included in all copies or substantial
 * portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT.  IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
 * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 * DEALINGS IN THE SOFTWARE.
 *
 */

#include "httpd.h"
#include "http_config.h"
#include "http_request.h"
#include "http_protocol.h"
#include "http_core.h"
#include "http_main.h"
#include "http_log.h"
#include "ap_mpm.h"
#include "apr_strings.h"
#include "scoreboard.h"

#define MODULE_NAME "mod_limitipconn"
#define MODULE_VERSION "0.22"

module AP_MODULE_DECLARE_DATA limitipconn_module;

static int server_limit, thread_limit;

typedef struct {
    signed int limit;   /* max number of connections per IP */

    /* array of MIME types exempt from limit checking */
    apr_array_header_t *no_limit;
    int no_limit_set;

    /* array of MIME types to limit check; all other types are exempt */
    apr_array_header_t *excl_limit;
    int excl_limit_set;
} limitipconn_config;

static void *limitipconn_create_config(apr_pool_t *p, server_rec *s)
{
    limitipconn_config *cfg = (limitipconn_config *)
                              apr_pcalloc(p, sizeof (*cfg));

    /* default configuration: no limit (unset), and both arrays are empty */
    cfg->limit = -1;
    cfg->no_limit = apr_array_make(p, 0, sizeof(char *));
    cfg->excl_limit = apr_array_make(p, 0, sizeof(char *));

    return cfg;
}

/* Simple merge: Per vhost entries overrides main server entries */
static void *limitipconn_merge_config(apr_pool_t *p, void *BASE, void *ADD)
{
    limitipconn_config *base = BASE;
    limitipconn_config *add  = ADD;

    limitipconn_config *cfg

[PATCH]: mod_cache: don't store headers that will never be used

2007-07-29 Thread Niklas Edmundsson


Attached is a patch for mod_cache (patch is for httpd-2.2.4) that 
implements what I suggested in May (see the entire thread at
http://mail-archives.apache.org/mod_mbox/httpd-dev/200705.mbox/[EMAIL PROTECTED] 
).


The problem is that cached objects that get hammered with 
"Cache-Control: max-age=0" requests will get their on-disk headers 
rewritten for each request, and since max-age=0 requests are always revalidated 
(hence the rewriting in the first place) those rewritten on-disk 
headers will never be used. Since the ground rule of caching is to 
cache stuff that's being reused this is rather suboptimal.


The solution is to NOT rewrite the on-disk headers when the following 
conditions are true:

- The body is NOT stale (ie. HTTP_NOT_MODIFIED when revalidating)
- The on-disk header hasn't expired.
- The request has max-age=0

This is perfectly OK with RFC2616 10.3.5 and does NOT break anything.

Patch is tested on httpd-2.2.4 and works as expected according to my 
tests.



/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 A pretty .GIF is like a melody
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

--- ../dist/modules/cache/mod_cache.c   2006-12-08 13:56:00.000000000 +0100
+++ modules/cache/mod_cache.c   2007-07-28 22:17:48.000000000 +0200
@@ -305,7 +305,7 @@
     cache_server_conf *conf;
     const char *cc_out, *cl;
     const char *exps, *lastmods, *dates, *etag;
-    apr_time_t exp, date, lastmod, now;
+    apr_time_t exp, date, lastmod, now, staleexp=APR_DATE_BAD;
     apr_off_t size;
     cache_info *info = NULL;
     char *reason;
@@ -582,6 +582,8 @@
             /* Oh, hey.  It isn't that stale!  Yay! */
             cache->handle = cache->stale_handle;
             info = cache->handle->cache_obj->info;
+            /* Save stale expiry timestamp for later perusal */
+            staleexp = info->expire;
             rv = OK;
         }
         else {
@@ -736,14 +738,41 @@
         ap_cache_accept_headers(cache->handle, r, 1);
     }
 
-    /* Write away header information to cache. It is possible that we are
-     * trying to update headers for an entity which has already been cached.
-     *
-     * This may fail, due to an unwritable cache area. E.g. filesystem full,
-     * permissions problems or a read-only (re)mount. This must be handled
-     * later.
-     */
-    rv = cache->provider->store_headers(cache->handle, r, info);
+    /* Avoid storing on-disk headers that are never used. When the following
+     * conditions are fulfilled:
+     * - The body is NOT stale (ie. HTTP_NOT_MODIFIED when revalidating)
+     * - The on-disk header hasn't expired.
+     * - The request has max-age=0
+     * Then there is no use to update the on-disk header since it won't be used
+     * by other max-age=0 requests since they are always revalidated, and we
+     * know it's likely there will be more max-age=0 requests since objects
+     * tend to have the same access pattern.
+     * Luckily for us RFC2616 10.3.5 last paragraph allows us to NOT update the
+     * on-disk headers if we don't want to on HTTP_NOT_MODIFIED.
+     */
+    rv = APR_EGENERAL;
+    if(cache->stale_handle && staleexp != APR_DATE_BAD && now < staleexp) {
+        const char *cc_req;
+        char *val;
+
+        cc_req = apr_table_get(r->headers_in, "Cache-Control");
+        if(cc_req && ap_cache_liststr(r->pool, cc_req, "max-age", &val) &&
+           val != NULL && apr_atoi64(val) == 0)
+        {
+            /* Yay, we can skip storing the on-disk header */
+            rv = APR_SUCCESS;
+        }
+    }
+    if(rv != APR_SUCCESS) {
+        /* Write away header information to cache. It is possible that we are
+         * trying to update headers for an entity which has already been
+         * cached.
+         *
+         * This may fail, due to an unwritable cache area. E.g. filesystem
+         * full, permissions problems or a read-only (re)mount. This must be
+         * handled later.
+         */
+        rv = cache->provider->store_headers(cache->handle, r, info);
+    }
 
     /* Did we just update the cached headers on a revalidated response?
      *


Re: [PATCH]: mod_cache: don't store headers that will never be used

2007-07-30 Thread Niklas Edmundsson

On Sun, 29 Jul 2007, Graham Leggett wrote:

What may make this workable is the combination of The body is NOT stale 
with max-age=0. The danger of not writing the headers is that an entity, 
once stale, will not be freshened when the spec says it should, and will 
cause a thundering herd of conditional requests to a backend server. This 
issue has been badly understood by some in the past, who suggested that the 
ability to update cached entities be removed. We need to make very sure that 
by fixing one problem we don't introduce another.


You missed the condition that the header wasn't expired, right? To 
reiterate:

- The body is NOT stale (ie. HTTP_NOT_MODIFIED when revalidating)
- The on-disk header hasn't expired.
- The request has max-age=0

Since the on-disk header hasn't expired AND the body is unchanged, 
you'll have the same data in the cache except for the Expires header.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 * . . . . .   - Tribble Mother and Young
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: [PATCH]: mod_cache: don't store headers that will never be used

2007-07-30 Thread Niklas Edmundsson

On Sun, 29 Jul 2007, Roy T. Fielding wrote:

The solution is to NOT rewrite the on-disk headers when the following 
conditions are true:

- The body is NOT stale (ie. HTTP_NOT_MODIFIED when revalidating)
- The on-disk header hasn't expired.
- The request has max-age=0

This is perfectly OK with RFC2616 10.3.5 and does NOT break anything.


No, it breaks the refreshing of the on-disk header with a new Date
field representing its new age.  The patch would cause a prefetching
spider to fail to do its intended job of refreshing all cached entries
even when they are not yet stale, which is something that content
management systems do all the time when fronted by a caching server.


Uh, OK... So they are dependent upon having the Date/Expires header 
updated, since this is the only thing that will be affected by this 
patch... Stale content will be refreshed as usual.


Since you generally never know if you will be talking to the same 
cache (think DNS record pointing to multiple caching hosts) I hadn't 
even imagined people trying to be clever and forcing updates this 
way since it's kind of a special case that it would work ;)


I'm especially intrigued by the fact that stuff is depending on the 
Date/Expires header in a cache being exactly what it thinks it should 
be, sounds kind of broken to me...



As I said before, address the problem you have by adding a directive
to either ignore such requests from abusive downloaders or to define
a minimum age for certain cached objects.  HTTP does not require the
cache configuration to be that of a transparent cache -- it only
defines how a cache configured to be transparent should work.


I would really like to understand why it wouldn't work before 
resorting to such solutions. Much of the response I have got on this 
has been of the "it will not work" variety while people obviously 
haven't read carefully enough to realise that the condition they state 
won't work isn't even affected.


However, if stuff is really depending on Date/Expires being what it 
thinks it is (*shiver*) then I guess there won't be any other 
options...


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 * . . . . .   - Tribble Mother and Young
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: [PATCH]: mod_cache: don't store headers that will never be used

2007-07-30 Thread Niklas Edmundsson

On Mon, 30 Jul 2007, Niklas Edmundsson wrote:

However, if stuff is really depending on Date/Expires being what it thinks it 
is (*shiver*) then I guess there won't be any other options...


Here's a version with a config directive, defaults to disabled.

Thoughts?


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 No, no, nurse! I said SLIP off his SPECTACLES!!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

--- ../dist/modules/cache/mod_cache.c   2006-12-08 13:56:00.000000000 +0100
+++ modules/cache/mod_cache.c   2007-07-30 14:17:17.000000000 +0200
@@ -305,7 +305,7 @@
     cache_server_conf *conf;
     const char *cc_out, *cl;
     const char *exps, *lastmods, *dates, *etag;
-    apr_time_t exp, date, lastmod, now;
+    apr_time_t exp, date, lastmod, now, staleexp=APR_DATE_BAD;
     apr_off_t size;
     cache_info *info = NULL;
     char *reason;
@@ -582,6 +582,8 @@
             /* Oh, hey.  It isn't that stale!  Yay! */
             cache->handle = cache->stale_handle;
             info = cache->handle->cache_obj->info;
+            /* Save stale expiry timestamp for later perusal */
+            staleexp = info->expire;
             rv = OK;
         }
         else {
@@ -736,14 +738,41 @@
         ap_cache_accept_headers(cache->handle, r, 1);
     }
 
-    /* Write away header information to cache. It is possible that we are
-     * trying to update headers for an entity which has already been cached.
-     *
-     * This may fail, due to an unwritable cache area. E.g. filesystem full,
-     * permissions problems or a read-only (re)mount. This must be handled
-     * later.
-     */
-    rv = cache->provider->store_headers(cache->handle, r, info);
+    rv = APR_EGENERAL;
+    if(conf->relaxupdates && cache->stale_handle &&
+       staleexp != APR_DATE_BAD && now < staleexp)
+    {
+        /* Avoid storing on-disk headers that are never used. When the
+         * following conditions are fulfilled:
+         * - The body is NOT stale (ie. HTTP_NOT_MODIFIED when revalidating)
+         * - The on-disk header hasn't expired.
+         * - The request has max-age=0
+         * Then there is no use to update the on-disk header since it won't be
+         * used by other max-age=0 requests since they are always revalidated,
+         * and we know it's likely there will be more max-age=0 requests since
+         * objects tend to have the same access pattern.
+         */
+        const char *cc_req;
+        char *val;
+
+        cc_req = apr_table_get(r->headers_in, "Cache-Control");
+        if(cc_req && ap_cache_liststr(r->pool, cc_req, "max-age", &val) &&
+           val != NULL && apr_atoi64(val) == 0)
+        {
+            /* Yay, we can skip storing the on-disk header */
+            rv = APR_SUCCESS;
+        }
+    }
+    if(rv != APR_SUCCESS) {
+        /* Write away header information to cache. It is possible that we are
+         * trying to update headers for an entity which has already been
+         * cached.
+         *
+         * This may fail, due to an unwritable cache area. E.g. filesystem
+         * full, permissions problems or a read-only (re)mount. This must be
+         * handled later.
+         */
+        rv = cache->provider->store_headers(cache->handle, r, info);
+    }
 
     /* Did we just update the cached headers on a revalidated response?
      *
@@ -896,6 +925,8 @@
     /* array of headers that should not be stored in cache */
     ps->ignore_headers = apr_array_make(p, 10, sizeof(char *));
     ps->ignore_headers_set = CACHE_IGNORE_HEADERS_UNSET;
+    ps->relaxupdates = 0;
+    ps->relaxupdates_set = 0;
     return ps;
 }
 
@@ -941,6 +972,10 @@
         (overrides->ignore_headers_set == CACHE_IGNORE_HEADERS_UNSET)
         ? base->ignore_headers
         : overrides->ignore_headers;
+    ps->relaxupdates  =
+        (overrides->relaxupdates_set == 0)
+        ? base->relaxupdates
+        : overrides->relaxupdates;
     return ps;
 }
 static const char *set_cache_ignore_no_last_mod(cmd_parms *parms, void *dummy,
@@ -1119,6 +1154,19 @@
     return NULL;
 }
 
+static const char *set_cache_relaxupdates(cmd_parms *parms, void *dummy,
+                                          int flag)
+{
+    cache_server_conf *conf;
+
+    conf =
+        (cache_server_conf *)ap_get_module_config(parms->server->module_config,
+                                                  &cache_module);
+    conf->relaxupdates = flag;
+    conf->relaxupdates_set = 1;
+    return NULL;
+}
+
 static int cache_post_config(apr_pool_t *p, apr_pool_t *plog,
                              apr_pool_t *ptemp, server_rec *s)
 {
@@ -1171,6 +1219,11 @@
     AP_INIT_TAKE1("CacheLastModifiedFactor", set_cache_factor, NULL, RSRC_CONF,
                   "The factor used to estimate Expires date from "
                   "LastModified date
Re: [PATCH]: mod_cache: don't store headers that will never be used

2007-07-31 Thread Niklas Edmundsson

On Tue, 31 Jul 2007, Sander Striker wrote:


Here's a version with a config directive, defaults to disabled.


Silly Q; a directive?  Or a env var that can be scoped in interesting
ways using mod_setenvif and/or mod_rewrite?

Most of our proxy behavior overrides are in terms of envvars.  They are
much more flexible to being tuned per-browser, per-backend etc.


Directive, envvar, I don't think Niklas cares much.  Can we make up our
mind please?


I have no clue on the envvar-stuff, so I don't think I'm qualified to 
have an opinion. CacheIgnoreCacheControl et al are config directives 
currently and I have the gut feeling that they should all either be 
envvar-thingies or config directives, and that starting to mix stuff 
will only end in confusion and despair ;)


I prefer a config-option that I can set serverwide without too much 
fuss since we want this behaviour on all files. If this can also be 
accomplished with envvar-stuff then sure.


One way might be to do a config directive for now, and deal with the 
envvar-stuff separately.


Related, this config option might also be of interest for 
mod_disk_cache to enable similar optimizations. What would the good 
way be to accomplish this?


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 *   - Tribble þ   oð  oð - Tribbles and Rock!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: 1.3 bugs

2007-08-02 Thread Niklas Edmundsson

On Thu, 2 Aug 2007, Jim Jagielski wrote:


It's easy to be brave when being heartless :)

Lots of WONTFIX :)


Actually, it's more heartless to just leave the bugs without feedback. 
It gives people the impression that the developers simply don't care, 
and they will most likely never submit a bug report again.


This is especially true if the reporter had come up with a fix and 
produced a patch...


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Death is nature's way of telling you to slow down...
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: [PATCH]: mod_cache: don't store headers that will never be used

2007-08-08 Thread Niklas Edmundsson

On Tue, 31 Jul 2007, Niklas Edmundsson wrote:

Any opinions on this?


Here's a version with a config directive, defaults to disabled.


Silly Q; a directive?  Or a env var that can be scoped in interesting
ways using mod_setenvif and/or mod_rewrite?

Most of our proxy behavior overrides are in terms of envvars.  They are
much more flexible to being tuned per-browser, per-backend etc.


Directive, envvar, I don't think Niklas cares much.  Can we make up our
mind please?


I have no clue on the envvar-stuff, so I don't think I'm qualified to have an 
opinion. CacheIgnoreCacheControl et al are config directives currently and I 
have the gut feeling that they should all either be envvar-thingies or config 
directives, and that starting to mix stuff will only end in confusion and 
despair ;)


I prefer a config-option that I can set serverwide without too much fuss 
since we want this behaviour on all files. If this can also be accomplished 
with envvar-stuff then sure.


One way might be to do a config directive for now, and deal with the 
envvar-stuff separately.


Related, this config option might also be of interest for mod_disk_cache to 
enable similar optimizations. What would the good way be to accomplish 
this?


/Nikke




/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Now, what was that magic word? Shazam? WHAM! Nah - Garibaldi
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: CHANGES

2007-08-08 Thread Niklas Edmundsson

On Wed, 8 Aug 2007, Jim Jagielski wrote:


I know I've said this before, but having copies of
Changes in Apache 2.2.5 under the -trunk CHANGES file,
as well as the 2.0.x stuff in both trunk and 2.2
means that we are pretty much assured that they will
get out of sync.

I'd like to re-propose that the CHANGES files only
refer to changes related to that MAJOR.MINOR release
and, at the end, refer people to the other CHANGES
files for historical purposes (except for Apache 1.3
which will maintain CHANGES history since the beginning).

Comments? I'd like to actually implement this for
the TR Friday.


I'm no committer or anything, but it sounds like the sane way to do 
it.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 I keep trying to lose weight, but it keeps finding me
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: [PATCH]: mod_cache: don't store headers that will never be used (fwd)

2007-10-10 Thread Niklas Edmundsson


I think that this discussion kind of got lost due to vacations or 
something...


In any case, I'd really like to get some closure.

The discussion starts here for those of you that has deleted the 
thread:

http://mail-archives.apache.org/mod_mbox/httpd-dev/200707.mbox/[EMAIL PROTECTED]
(the permalink doesn't seem to show the nifty thread list, you have to 
click a bit for that).


What I'd like answered is:
- Was the latest patch as suggested OK?
- What's the correct way of getting the mod_cache configuration from
  the mod_disk_cache module?


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Operator...give me the no for 999, QUICK!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

-- Forwarded message --
From: Niklas Edmundsson [EMAIL PROTECTED]
To: dev@httpd.apache.org
Date: Wed, 8 Aug 2007 09:28:48 +0200 (MEST)
Subject: Re: [PATCH]: mod_cache: don't store headers that will never be used
Reply-To: dev@httpd.apache.org

On Tue, 31 Jul 2007, Niklas Edmundsson wrote:

Any opinions on this?


Here's a version with a config directive, defaults to disabled.


Silly Q; a directive?  Or a env var that can be scoped in interesting
ways using mod_setenvif and/or mod_rewrite?

Most of our proxy behavior overrides are in terms of envvars.  They are
much more flexible to being tuned per-browser, per-backend etc.


Directive, envvar, I don't think Niklas cares much.  Can we make up our
mind please?


I have no clue on the envvar-stuff, so I don't think I'm qualified to have an 
opinion. CacheIgnoreCacheControl et al are config directives currently and I 
have the gut feeling that they should all either be envvar-thingies or config 
directives, and that starting to mix stuff will only end in confusion and 
despair ;)


I prefer a config-option that I can set serverwide without too much fuss 
since we want this behaviour on all files. If this can also be accomplished 
with envvar-stuff then sure.


One way might be to do a config directive for now, and deal with the 
envvar-stuff separately.


Related, this config option might also be of interest for mod_disk_cache to 
enable similar optimizations. What would the good way be to accomplish 
this?


/Nikke




/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Now, what was that magic word? Shazam? WHAM! Nah - Garibaldi
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: [PATCH]: mod_cache: don't store headers that will never be used (fwd)

2007-10-17 Thread Niklas Edmundsson

On Wed, 10 Oct 2007, Graham Leggett wrote:


Niklas Edmundsson wrote:


What I'd like answered is:
- Was the latest patch as suggested OK?


The latest patch was the one with a directive, which is +1 from me - though 
is it possible to add documentation for the directive?


Sure. Is http://apache-server.com/tutorials/ATdocs-project.html the 
relevant docco-documentation? Should it be a combined patch with both 
code and docs?



- What's the correct way of getting the mod_cache configuration from
  the mod_disk_cache module?


Look inside mod_proxy_http.c for a function called ap_proxy_read_headers(). 
In it, the module mod_proxy_http reads the config from the module mod_proxy.


Thanks, I'll take a look :)
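
For reference, the pattern Graham points at boils down to something like
this (a sketch; relaxupdates is the field added by the patch earlier in
this thread, and the extern declaration is only needed if mod_cache.h
doesn't already provide one):

--8<--
#include "httpd.h"
#include "http_config.h"
#include "mod_cache.h"          /* cache_server_conf */

extern module AP_MODULE_DECLARE_DATA cache_module;

static int cache_relaxupdates_enabled(request_rec *r)
{
    cache_server_conf *conf = (cache_server_conf *)
        ap_get_module_config(r->server->module_config, &cache_module);

    return conf ? conf->relaxupdates : 0;
}
--8<--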

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Self-made man: A horrible example of unskilled labor.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Proposal: Increase request in worker_score

2007-10-20 Thread Niklas Edmundsson


Hi all!

We've been annoyed by the fact that the status page as served by 
mod_status only shows the first 64 bytes of the current requests for 
a couple of years now.


We know that it's only meant to be a hint, not the complete request in 
all conditions, but the problem is that 64 bytes is just too short to 
be useful in a lot of cases. When using httpd for serving files on an 
FTP server you usually see the directory and the first characters of 
the filename, very annoying.


Locally we've been running with a patch[1] to increase the size from 
64 bytes to 192 bytes for a while now (since httpd 2.2.4 was released) 
with no ill effects. Admittedly, our servers are configured to 
MaxClients 6000 so they don't see the insane amount of simultaneous 
accesses as some bigger configurations out there.


However, as a useful improvement we would like to propose that the 
request entry in worker_score is increased from 64 bytes to 128 bytes. 
This would cover most cases we've seen of missing just a couple of 
characters in mod_status to be able to determine which file is being 
accessed...


In terms of memory footprint it would mean the following:

sizeof(worker_score) on:
32bit Linux (Ubuntu 6.06) from 224 bytes to 288 bytes
64bit Linux (Ubuntu 7.04) from 264 bytes to 328 bytes

Summing this up for a server configured for MaxClients 20000 it would 
mean:

32bit from 4375kB to 5625kB
64bit from 5156kB to 6406kB

Since we're talking about memory footprint increases in the 
megabyte-range for a server configured for 20000 connections I can't 
see that the increased memory consumption should be a problem.


To be honest though, I would really prefer having it increased to 
something like 192 or 256 bytes. 256 bytes would mean an increase of 
3750kB for 20000 MaxClients. Not much of a big deal on modern (and 
not-so-modern) hardware IMHO.


Thoughts?


[1] - our patch to increase request scoreboard size.
--- ../dist/include/scoreboard.h        2006-07-12 05:38:44.000000000 +0200
+++ ./include/scoreboard.h      2007-09-20 20:24:41.000000000 +0200
@@ -125,7 +125,7 @@
 #endif
     apr_time_t last_used;
     char client[32];   /* Keep 'em small... */
-    char request[64];  /* We just want an idea... */
+    char request[192]; /* We just want an idea... */
     char vhost[32];    /* What virtual host is being accessed? */
 };



/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Reformat Hard Drive!  Are you SURE (Y/Y)?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


mod_disk_cache jumbopatch - 20071016 revision for 2.2.6.

2007-10-21 Thread Niklas Edmundsson


Hi all!

I've uploaded a httpd 2.2.6-adapted version of our mod_disk_cache 
jumbo patch that we're using at ftp.acc.umu.se to 
http://issues.apache.org/bugzilla/show_bug.cgi?id=39380

for those who want a one-patch solution to using our modifications.

The only changes from the last version are making the patch apply to 
httpd 2.2.6 and fixing the bugs fixed in the vanilla 2.2.6 version.


It's survived one Ubuntu release, so it's fairly stable. We typically 
saw 250MB/s (we only have 2 x gigabit) being delivered from the ftp 
cluster and the backend doing around 5-20MB/s serving up uncached 
files and file system traversals.



/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 * * * - Tribbles  O O O - Tribbles on drugs
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: Apache 2.2 MPM Worker Virtual Memory Usage

2007-10-21 Thread Niklas Edmundsson

On Sun, 21 Oct 2007, Ruediger Pluem wrote:


What is your setting for ThreadsPerChild?



On my Linux each thread consumes 8MB of virtual memory (I assume for 
stack and other thread private data) as shown by pmap. This can sum 
up to a large amount of memory.


This is due to Linux libc setting the thread stack size using the 
stack resource limit. We have the following in our apache httpd 
startup script:


# NPTL (modern Linux threads) defaults the thread stack size to the setting
# of your stack resource limit. The system-wide default for this is 8MB,
# which is waaay exaggerated when running httpd.
# 512kB should be more than enough (AIX manages on 96kB, Netware on 64kB).
ulimit -s 512

We didn't bother with trying to lower it more, but I've run the same 
httpd config on IBM AIX 5.1 with the default 96kB thread stack size 
without problems.


This could probably be worked around in httpd/APR by calling setrlimit 
before starting the threads, but I think it's better to just document 
this Linux thread bogosity and let vendors fix their httpd startup 
scripts.
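
For the record, the in-httpd/APR workaround would amount to something
like this (a sketch only, not actual httpd code), run before the MPM
spawns its threads:

--8<--
#include <sys/resource.h>

/* Shrink the stack resource limit so NPTL picks a smaller default
 * per-thread stack, mirroring the "ulimit -s 512" in the startup script. */
static void shrink_thread_stacks(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) == 0) {
        rl.rlim_cur = 512 * 1024;
        if (rl.rlim_max != RLIM_INFINITY && rl.rlim_cur > rl.rlim_max) {
            rl.rlim_cur = rl.rlim_max;
        }
        setrlimit(RLIMIT_STACK, &rl);
    }
}
--8<--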


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 No boom now.  Boom tomorrow...there's ALWAYS a boom tomorrow...BOOM!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: Proposal: Increase request in worker_score

2007-10-22 Thread Niklas Edmundsson

On Sun, 21 Oct 2007, William A. Rowe, Jr. wrote:

Could we start by increasing the existing one, which is rather easily done, 
and then move on to doing it the fancy way? If someone has a fancy-patch 
right now I'm all for that, but pending that I'd prefer landing some sort 
of improvement...


I don't quite see the reasoning of having 2-steps to a solution, an
intermediate that doesn't land in 2.2 or 2.4)...


Just the logic of having some improvement committed if no one gets 
round to doing it the fancy way. If it gets replaced by a fancy 
solution it's not that much extra work that has been put into it, but 
if it's forgotten we'll have to live with yet another major version 
with this annoyingly small default buffer.



/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Can I have someone to eat? - Spike
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: Segmentation fault( SSL enable Apache 2.2.6(64 bit)

2007-10-25 Thread Niklas Edmundsson

On Thu, 25 Oct 2007, Renu Tiwari wrote:



Hi,

We have configured Apache 2.2.6(64 bit) with openssl-0.9.8g on AIX5.2(64 bit).
Build openssl source after setting BUILD_MODE=64.

The issue is, when we start the Apache web server(./apachectl start), we are 
getting segmentation fault in error_log.

This issue is coming only when openssl is coming into the picture.

What cud be the possible reason? Please reply.


Did your openssl 64bit build pass make test (or make check, whatever 
it's called)?


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 If I wanted your opinion, I would have given you one
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


RE: Segmentation fault( SSL enable Apache 2.2.6(64 bit)

2007-10-25 Thread Niklas Edmundsson

On Thu, 25 Oct 2007, Renu Tiwari wrote:





No, when I tried doing make test, it failed.


But when I tried to run make install and make, there I didn't 
get any error.


make install doesn't do the test.

If your openssl doesn't pass make test, then it's broken. Fix that 
first.


Also I have copied the same SSL-enabled Apache webserver on AIX 
5.3(64 bit) there it is working perfectly fine.


Does this depend on the kernel also. As AIX5.3 is 64-bit kernel and 
AIX 5.2 is 32-bit kernel m/c.



But our application is running as 64 bit application.


Shouldn't matter, could be that you're hitting some bug that's been 
fixed. You might want to check your C runtime patch levels, and other 
patches too for that matter.




-----Original Message-----
From: Niklas Edmundsson [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 25, 2007 2:36 PM
To: 'dev@httpd.apache.org'
Subject: Re: Segmentation fault( SSL enable Apache 2.2.6(64 bit)

On Thu, 25 Oct 2007, Renu Tiwari wrote:

Hi,

We have configured Apache 2.2.6(64 bit) with openssl-0.9.8g on AIX5.2(64 bit).
Build openssl source after setting BUILD_MODE=64.

The issue is, when we start the Apache web server(./apachectl start), we are 
getting segmentation fault in error_log.

This issue is coming only when openssl is coming into the picture.

What cud be the possible reason? Please reply.

Did your openssl 64bit build pass make test (or make check, whatever 
it's called)?

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 If I wanted your opinion, I would have given you one
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=




/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Taken as a whole, the universe is absurd
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: Proxying subrequests

2007-10-28 Thread Niklas Edmundsson

On Sat, 27 Oct 2007, Paul Querna wrote:


-0.9 on enabling this by default in mod_includes.  Make it possible to
turn it on via httpd.conf, but never on by default


I agree.

And it should have huge warning signs, and a long descriptive name 
that does not invite a "let's try this and see if it solves my 
problem" approach.


Cross-site-include-holes are nasty, and I see it as a feature that 
they are not supported ;)


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 The only one who can destroy your Tasha now, is you.  Q
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: Proposal: Increase request in worker_score

2007-10-31 Thread Niklas Edmundsson

On Wed, 31 Oct 2007, Jim Jagielski wrote:


For those interested, check out

   http://svn.apache.org/viewvc?rev=590641&view=rev

pasts tests and works as expected, at least in my limited
testing :)

Again, the main focus in this was to resolve the issue in a
2.2-friendly way. So I'd like to get additional feedback
with that in mind before I propose a backport.


Seems reasonable for 2.2. I'm not too keen on the directive name 
though, but since I have no better suggestion I'll be quiet now ;)



/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Vell, Zaphod's just zis guy you know? - Gag Halfrunt.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: svn commit: r601843 - in /httpd/mod_ftp/trunk: STATUS include/mod_ftp.h

2007-12-06 Thread Niklas Edmundsson

On Thu, 6 Dec 2007, William A. Rowe, Jr. wrote:


First question, are there testers who will test/vote on the module?


I'm game for testing. Our environment is strictly anonftp read-only 
though, so I won't test the non-anon stuff. Having the thing work with 
mod_cache would be absolute bliss, but I guess that's an item to chew 
on after the first release ;)



+  * FTPLimit* family of directives share an FTPLimitDBFile across hosts,
+yet fail to scope their tracking records to the corresponding host.


If there's no fix, I'd just mark those directives as experimental and call
it baked.


If it's documented how it works, that's fine. Although, if they don't 
scope correctly per-host they should probably be server-wide to avoid 
confusion by people that don't read the fine print.



Who's interested in seeing a TR and helping make the release happen?


I'm keen on helping out.

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 If you call your doctor Bones, YMBAT
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: [VOTE] initial release of httpd-mod_ftp-0.9.0

2007-12-18 Thread Niklas Edmundsson

On Tue, 18 Dec 2007, William A. Rowe, Jr. wrote:


Please fetch up the newly prepared httpd-mod_ftp-0.9.0.tar.[gz|bz2]
(and its md5/asc sigs) from:

  http://httpd.apache.org/dev/dist/mod_ftp/

review, take it for a spin, and cast your choice


As I mentioned, the perms of the installed httpd include directory
were corrupted to 664 by the first candidate, so I've withdrawn it.

Proceeding to tag the next crack at an alpha/beta 0.9.1 tomorrow,


You might want to have a go at configure.apxs before doing that. It
seems to contain some bashisms that show up on Debian/Ubuntu machines
which use dash as /bin/sh:


% ./configure.apxs
test: 8: ==: unexpected operator
test: 19: ==: unexpected operator
Configuring mod_ftp for APXS
...

The thing is that == is not a valid /bin/sh-style test expression. It
should probably be just = (for example, test "$var" = "yes"), or
test -z "$var" ...


On the positive side, the thing builds on both Linux and AIX (out of 
tree, for httpd 2.2.6). I'll await the 0.9.1 tag before doing more 
elaborate tests though.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 I'm not crazy, I just don't give a s#!t
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


What's the right way to spawn a child in modules?

2006-04-27 Thread Niklas Edmundsson


Hi all!

I'm currently working on beating mod_disk_cache into submission, with
the goal of it being able to deliver data while caching a file (this
started as bug #39380). I have solved most of the problems; I'll
submit patches when they have passed the scrutiny of my fellow
computer club admins. The goal is to make the thing usable on
http://ftp.acc.umu.se/ after all (and no, we don't have a budget so we
can't compensate for bad code with more hardware ;).


Anyhow, my real question: What's the right way to spawn a child in 
modules?


The problem is that the current mod_disk_cache design means that the 
first one to request an uncached file gets to wait until it's cached, 
since the caching is done by that request. That can be a long time to 
wait for a reply when you're caching a 4GB DVD image from a slow 
backend.


The naive solution is to spawn a child that does the copying, letting
the request be processed simultaneously. Is this doable?


Would it be considered offensive to do apr_thread_create() if threads 
are available and fork() otherwise?
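
Roughly what I have in mind, as a minimal sketch (do_the_caching() is
just a placeholder for the actual copy work, and error handling is
omitted):

#include <stdlib.h>
#include "apr.h"
#include "apr_thread_proc.h"

static void do_the_caching(void *data)
{
    (void)data;   /* placeholder for the actual copy-while-caching work */
}

#if APR_HAS_THREADS
static void * APR_THREAD_FUNC copy_thread(apr_thread_t *thd, void *data)
{
    do_the_caching(data);
    apr_thread_exit(thd, APR_SUCCESS);
    return NULL;
}
#endif

static apr_status_t start_background_copy(void *data, apr_pool_t *p)
{
#if APR_HAS_THREADS
    apr_thread_t *thd;
    return apr_thread_create(&thd, NULL, copy_thread, data, p);
#else
    apr_proc_t proc;
    apr_status_t rv = apr_proc_fork(&proc, p);
    if (rv == APR_INCHILD) {
        do_the_caching(data);   /* the child does the slow copy */
        exit(0);
    }
    return (rv == APR_INPARENT) ? APR_SUCCESS : rv;
#endif
}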


Other ways to solve this?


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 This building is so high, the elevator shows movies.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: What's the right way to spawn a child in modules?

2006-04-28 Thread Niklas Edmundsson

On Thu, 27 Apr 2006, Brian Akins wrote:

Would it be considered offensive to do apr_thread_create() if threads are 
available and fork() otherwise?


sounds reasonable - having only thought about it for 10 seconds..


OK. I'll try then and see how it plays out.


Any particular reason your backends are slow?


Currently: Old hardware. In the future: Gigabit Ethernet.

I might add that our FTP mirror has a bunch of DVD images, and even at
full gigabit speed it takes some 40 seconds to cache one (a 4.3GB
image at roughly 120MB/s of wire speed is already close to 40
seconds), and that's simply too long before the server starts
responding by sending data to the client.



/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Go Ahead.. We're cleared for wierd.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: Possible new cache architecture

2006-05-02 Thread Niklas Edmundsson

On Mon, 1 May 2006, Davi Arnaut wrote:


More important, if we stick with the key/data concept it's possible to
implement the header/body relationship under single or multiple keys.


I've been hacking on mod_disk_cache to make it:
* Only store one set of data when one uncached item is accessed
  simultaneously (currently all requests cache the file and the last
  finished cache process wins).
* Don't wait until the whole item is cached, reply while caching
  (currently it stalls).
* Don't block the requesting thread when requesting a large uncached
  item, cache in the background and reply while caching (currently it
  stalls).

This is mostly aimed at serving huge static files from a slow disk 
backend (typically an NFS export from a server holding all the disk), 
such as http://ftp.acc.umu.se/ and http://ftp.heanet.ie/ .


Doing this with the current mod_disk_cache disk layout was not
possible; doing the above without unnecessary locking means:


* More or less atomic operations, so caching headers and data in
  separate files gets very messy if you want to keep consistency.
* You can't use tempfiles since you want to be able to figure out
  where the data is to be able to reply while caching.
* You want to know the size of the data in order to tell when you're
  done (ie the current size of a file isn't necessarily the real size
  of the body since it might be caching while we're reading it).

In the light of our experiences, I really think that you want to have
a concept that allows you to keep the bond between header and data.
Yes, you can patch up a missing bond by requiring locking and stuff,
but I really prefer not having to lock cache files when doing read
access. When it comes to making the common case fast, a lockless
design is very much preferred.
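
For concreteness, here's a very rough sketch of the single-file idea:
a fixed-size header area at the start of the cache file, with the body
written at a known offset. The names and the 8k offset are made up for
illustration; this is not the actual on-disk format from the patch:

#include "apr_file_io.h"

#define CACHE_BODY_OFFSET 8192          /* room for the headers to grow a bit */

typedef struct {
    apr_int32_t format;                 /* on-disk format version */
    apr_off_t   file_size;              /* -1 until the body size is known */
    apr_off_t   body_offset;            /* where the body starts */
} disk_cache_info_t;

static apr_status_t write_body_chunk(apr_file_t *fd, apr_off_t pos,
                                     const char *buf, apr_size_t len)
{
    apr_off_t off = CACHE_BODY_OFFSET + pos;
    apr_status_t rv = apr_file_seek(fd, APR_SET, &off);

    if (rv != APR_SUCCESS) {
        return rv;
    }
    return apr_file_write_full(fd, buf, len, NULL);
}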


However, if all those issues are sorted out in the layer above disk 
cache then the above observations becomes more or less moot.


In any case the patch is more or less finished; independent testing
and auditing haven't been done yet, but I can submit a preliminary
jumbo-patch if people are interested in having a look at it now.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Want to forget all your troubles? Wear tight shoes.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


mod_disk_cache patch, preview edition (was: new cache arch)

2006-05-02 Thread Niklas Edmundsson

On Tue, 2 May 2006, Graham Leggett wrote:


I've been hacking on mod_disk_cache to make it:
* Only store one set of data when one uncached item is accessed
   simultaneously (currently all requests cache the file and the last
   finished cache process wins).
* Don't wait until the whole item is cached, reply while caching
   (currently it stalls).
* Don't block the requesting thread when requesting a large uncached
   item, cache in the background and reply while caching (currently it
   stalls).


This is great, in doing this you've been solving a proxy bug that was
first reported in 1998 :).


OK. Stuck in the "File under L for Later" pile? ;)


The only things to be careful of is for Cache-Control: no-cache and
friends to be handled gracefully (the partially cached file should be
marked as delete-me so that the current request creates a new cache file
/ no cache file. Existing running downloads should be unaffected by
this.), and for backend failures (either a timeout or a premature socket
close) to cause the cache entry to be invalidated and deleted.


I haven't changed the handling of this, so any bugs in this regard 
shouldn't be my fault at least ;)


Regarding partially cached files, it understands when caching a file 
has failed and so on.



* More or less atomic operations, so caching headers and data in
   separate files gets very messy if you want to keep consistency.


Keep in mind that HTTP/1.1 compliance requires that the headers be
updatable without changing the body.


They are. It seek():s to an offset where the body is stored so 
headers can be updated as long as they don't grow too much.



* You can't use tempfiles since you want to be able to figure out
   where the data is to be able to reply while caching.
* You want to know the size of the data in order to tell when you're
   done (ie the current size of a file isn't necessarily the real size
   of the body since it might be caching while we're reading it).


The cache already wants to know the size of the data so that it can decide
whether it's prepared to try and cache the file in the first place, so in
theory this should not be a problem.


The need-size-issue goes for retrievals as well.

You also have the "size unknown right now" issue, which this patch
solves by writing a header with the size set to -1 and then updating
it when the size is known.
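
Roughly like this (a simplified sketch, not the patch's real header
layout): write the header with file_size set to -1, and once the body
is complete, seek back to the start of the file and rewrite it:

#include "apr_file_io.h"

typedef struct {
    apr_off_t file_size;    /* -1 while the body is still being written */
} disk_cache_hdr_t;

static apr_status_t update_cached_size(apr_file_t *fd, apr_off_t real_size)
{
    disk_cache_hdr_t hdr;
    apr_off_t off = 0;      /* the header lives at the start of the file */
    apr_status_t rv;

    hdr.file_size = real_size;
    rv = apr_file_seek(fd, APR_SET, &off);
    if (rv != APR_SUCCESS) {
        return rv;
    }
    return apr_file_write_full(fd, &hdr, sizeof(hdr), NULL);
}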



In any case the patch is more or less finished; independent testing
and auditing haven't been done yet, but I can submit a preliminary
jumbo-patch if people are interested in having a look at it now.


Post it, people can take a look.


OK. It's attached. It has only had mild testing using the worker mpm
with mmap enabled; it needs a bit more testing and auditing before
trusting it too hard.


Note that this patch fixes a whole slew of other issues along the way,
the most notable ones being LFS on 32bit arch, not eating all your
32bit memory/address space when caching huge files, providing
r->filename so %f in LogFormat works, and other smaller issues.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 I am Zirofsky of Borg. I will reassimilate Alaska and Finland.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

httpd-2.2.2-mod_disk_cache-jumbo20060502.patch.gz
Description: Binary data


Re: Possible new cache architecture

2006-05-02 Thread Niklas Edmundsson

On Tue, 2 May 2006, Graham Leggett wrote:


This is great, in doing this you've been solving a proxy bug that was
first reported in 1998 :).


This already works in the case you get the data from the proxy backend.
It does not work for local files that get cached (the scenario Niklas
uses the cache for).


Ok then I have misunderstood - I was referring to the thundering herd
problem.


Exactly what is the thundering herd problem? I can guess the general 
problem, but without a more precise definition I can't really say if 
my patch fixes it or not.


If it's:
* Link to latest GNOME Live CD gets published on Slashdot.
* A gazillion users click the link to download it.
* mod_disk_cache starts a new instance of caching the file for each
  request, until someone has completed caching the file.

Then this patch solves the problem regardless of whether it's a static
file or dynamically generated content, since it only allows one
instance to cache the file (OK, there's a small hole so there can be
multiple instances, but it's way smaller than now); all other
instances deliver data as the caching process is writing it.
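
One lockless way to get that single-caching-instance behaviour is to
let whoever wins an exclusive create become the caching instance; a
sketch of the idea (my simplification, not necessarily exactly what
the patch does):

#include "apr_file_io.h"

/* Whoever manages to create the cache file exclusively does the
 * caching; everybody else just reads the same file as it grows. */
static int become_caching_instance(const char *path, apr_file_t **fd,
                                   apr_pool_t *p)
{
    apr_status_t rv = apr_file_open(fd, path,
                                    APR_WRITE | APR_CREATE | APR_EXCL,
                                    APR_OS_DEFAULT, p);

    return (rv == APR_SUCCESS);   /* non-zero: we cache, others only read */
}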


Additionally, if it's a static file that's allowed to be cached in 
the background it solves:

* Reduces the chance of the user getting bored, since the data is
  delivered while being cached.
* Avoids the case where the user gets bored and closes the connection,
  causing the painfully cached file to be deleted.

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Illiterate?  Write for information!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: Possible new cache architecture

2006-05-02 Thread Niklas Edmundsson

On Tue, 2 May 2006, Plüm, Rüdiger, VF EITO wrote:


Another thing: I guess on systems with no mmap support the current 
implementation
of mod_disk_cache will eat up a lot of memory if you cache a large local file,
because it transforms the file bucket(s) into heap buckets in this case.
Even if mmap is present I think that mod_disk_cache causes the file buckets
to be transformed into many mmap buckets if the file is large. Thus we do not
use sendfile in the case we cache the file.


Correct. When caching a 4.3GB file on a 32bit arch it gets so bad that
mmap eats all your address space and the thing segfaults. I initially
thought it was eating memory, but that's only if you have mmap
disabled.
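
The general remedy is to copy in bounded chunks instead of mmap():ing
the whole file; a simplified sketch (CHUNK is a made-up size, and this
is not the actual patch code):

#include "apr_file_io.h"

#define CHUNK (64 * 1024)

static apr_status_t copy_file_chunked(apr_file_t *in, apr_file_t *out,
                                      apr_pool_t *p)
{
    char *buf = apr_palloc(p, CHUNK);
    apr_status_t rv;

    do {
        apr_size_t len = CHUNK;

        rv = apr_file_read(in, buf, &len);
        if (rv != APR_SUCCESS && rv != APR_EOF) {
            return rv;
        }
        if (len > 0) {
            apr_status_t wrv = apr_file_write_full(out, buf, len, NULL);
            if (wrv != APR_SUCCESS) {
                return wrv;
            }
        }
    } while (rv != APR_EOF);

    return APR_SUCCESS;
}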



In the case that a brigade only contains file_buckets it might be possible to
copy this brigade, send it up the chain and process the copy of the brigade
for disk storage afterwards. Of course this opens a race if the file gets
changed in between these operations.
This approach does not work with socket or pipe buckets for obvious reasons.
Even heap buckets seem to be a somewhat critical idea because of the
added memory usage.


I did the somewhat naive approach of only doing background caching 
when the buckets refer to a single sequential file. It's not perfect, 
but it solves the main case where you get a huge amount of data to 
store ...
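
A simplified sketch of that check: walk the brigade and require that
everything except EOS is a file bucket (the real test also has to make
sure the buckets refer to one file, in order, which is omitted here):

#include "apr_buckets.h"

static int brigade_is_plain_file(apr_bucket_brigade *bb)
{
    apr_bucket *e;

    for (e = APR_BRIGADE_FIRST(bb);
         e != APR_BRIGADE_SENTINEL(bb);
         e = APR_BUCKET_NEXT(e)) {
        if (APR_BUCKET_IS_EOS(e)) {
            continue;
        }
        if (!APR_BUCKET_IS_FILE(e)) {
            return 0;   /* heap, pipe, socket etc: fall back to old behaviour */
        }
    }
    return 1;
}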



/Nikke - stumbled upon more than one bug when digging into
 mod_disk_cache
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Anything is edible if it's chopped finely enough
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: Possible new cache architecture

2006-05-02 Thread Niklas Edmundsson

On Tue, 2 May 2006, Graham Leggett wrote:


If it's:
* Link to latest GNOME Live CD gets published on Slashdot.
* A gazillion users click the link to download it.
* mod_disk_cache starts a new instance of caching the file for each
   request, until someone has completed caching the file.


Then this is the thundering herd problem :)


OK :)


Either a site is slashdotted (as in your case), or a cached entry expires,
and suddenly the backend gets nailed until at least one request wins,
then we are back to normal serving from the cache.

In your case, the backend is the disk, while in the bug from 1998, the
backend was another webserver. Either way, same problem.


OK.


Then this patch solves the problem regardless of whether it's a static
file or dynamically generated content, since it only allows one
instance to cache the file (OK, there's a small hole so there can be
multiple instances, but it's way smaller than now); all other
instances deliver data as the caching process is writing it.



Additionally, if it's a static file that's allowed to be cached in
the background it solves:
* Reduces the chance of the user getting bored, since the data is
   delivered while being cached.
* Avoids the case where the user gets bored and closes the connection,
   causing the painfully cached file to be deleted.


Hmmm - thinking about this: we try to cache the brigade (all X GB of
it) first, then we try to write it to the network, thus the delay.
Does your patch solve all of these already, or are they planned?


It solves everything I've mentioned. The solution is probably not 
perfect for the not-static-file case since it falls back to the old 
behaviour of caching the whole file, but it should be a lot better 
than the current mod_disk_cache since the rest of the threads get 
reply-while-caching. There are issues here with the fact that the 
result is discarded if the connection is aborted, but I'm not familiar 
enough with Apache filter internals to state that you can keep the 
result even though the connection is aborted.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Anything is edible if it's chopped finely enough
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_disk_cache patch, preview edition (was: new cache arch)

2006-05-02 Thread Niklas Edmundsson

On Tue, 2 May 2006, Graham Leggett wrote:


The need-size-issue goes for retrievals as well.


If you are going to read from partially cached files, you need a total
size field as well as a flag to say "give up, this attempt at caching
failed".


Are there partially cached files? If I request the last 200 bytes of a
4.3GB DVD image, the bucket brigade contains the complete file... The
headers say ranges and all sorts of things, but they don't match
what's cached.



What may be useful is a cache header with some metadata in it giving the
total size and a download failed flag, which goes in front of the
headers. The metadata can also contain the offset of the body.


I solved it with the size of the body stored in the header plus a
timeout mechanism; a download-failed flag doesn't cope with segfaults.
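
One way to do the timeout on the reader side (a sketch of the idea
with made-up constants, not the actual patch code): a reader that has
caught up with the caching instance polls the file size and gives up
if it stops growing:

#include "apr_file_io.h"
#include "apr_time.h"

static apr_status_t wait_for_more_data(apr_file_t *fd, apr_off_t have)
{
    apr_finfo_t finfo;
    apr_time_t deadline = apr_time_now() + apr_time_from_sec(30);

    while (apr_time_now() < deadline) {
        apr_status_t rv = apr_file_info_get(&finfo, APR_FINFO_SIZE, fd);

        if (rv != APR_SUCCESS) {
            return rv;
        }
        if (finfo.size > have) {
            return APR_SUCCESS;   /* the caching instance is still making progress */
        }
        apr_sleep(100000);        /* 100 ms */
    }
    return APR_TIMEUP;            /* the writer probably died (or segfaulted) */
}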



OK. It's attached. It has only had mild testing using the worker mpm
with mmap enabled, it needs a bit more testing and auditing before
trusting it too hard.

Note that this patch fixes a whole slew of other issues along the way,
the most notable ones being LFS on 32bit arch, not eating all your
32bit memory/address space when caching huge files, providing
r->filename so %f in LogFormat works, and other smaller issues.


Is it possible to split the patch into separate fixes for each issue
(where practical)? It makes it easier to digest.


It's possible, but since I needed to hammer so hard at mod_disk_cache
to get it into the shape I wanted, I set out to first get the whole
thing working and then worry about breaking the patch into manageable
pieces. For example, by doing it all-incremental there would have been
a dozen or so disk format change-patches, and I really don't think you
would have wanted that :)


As said, this is a preliminary jumbo patch for those interested in how
we tackled the various problems involved (or those who love to take
bleeding-edge code for a spin and watch it fall to pieces when it hits
a weird corner case ;).



Also the other fixes can be committed immediately/soon, depending on how
simple they are, which will simplify the final patch.


Yup. I'll update bug#39380 when we feel that we have a good solution.

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 To err is Human. To blame someone else is politics.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


  1   2   3   >