Re: apr_inet_pton
Cesar Silveira wrote: Hello, Not sure if this is the right place to ask this. Does anyone know why apr_inet_pton seems not to be available in the APR library's public API? By looking at the source code I have found that it is only implemented in UNIX-specific files. However, I was able to take the code and simply paste and compile it on Visual Studio 2005, and it worked perfectly. I'll be really glad if anyone can provide me some input on that. APR questions are better suited to the apr-dev mailing list. What do you need to do with apr_inet_pton that cannot be done with other public APR functions? Regards, -- Davi Arnaut
Re: Is async the answer
Akins, Brian wrote: On 1/18/08 3:07 PM, Colm MacCarthaigh [EMAIL PROTECTED] wrote: That's not even a consideration, async is really for dynamic content, proxies, and other non-sendfile content. For dynamic stuff, X-sendfile works well. (Just really starting to play with that, liking it so far). The proxy that the LiveJournal folks wrote, I think, copies all the data from the origin server into a file and then uses sendfile to send to the client... Doesn't this limit the network bandwidth to the bandwidth of the disk and/or file system? -- Davi Arnaut
Re: Is async the answer
Graham Leggett wrote: Davi Arnaut wrote: The proxy that the LiveJournal folks wrote, I think, copies all the data from the origin server into a file and then uses sendfile to send to the client... Doesn't this limit the network bandwidth to the bandwidth of the disk and/or file system? Yes, and the effective bandwidth of the disk can be significantly higher than both the cache backend (which is often expensive) and the network frontend (which has potentially slow clients tying up your resources). Don't forget that your cache disk is most often RAM backed, meaning effectively your cache disk is a ramdisk, with all the speed advantages that go with it. This is true for expensive hardware and very well designed operating systems and file systems... and the space is not infinite. But... OK. Back to the topic: I thought that one of the key points of async/event based servers was that we use software to scale and not hardware (so that hardware is not the bottleneck)... like serving thousands of slow clients from commodity hardware. -- Davi Arnaut
Re: Integrity of Apache source code
Andrew Beverley wrote: Hi, I hope that this is the correct mailing list for this question, and that you can easily provide a quick response. I am currently working within the UK Ministry of Defence, and am trying to get the Apache web server accredited as software able to be installed on one of our defence networks. However, one of the barriers I am coming up against is the argument that, because it is open source, someone could contribute a Trojan horse to the code and that the code could be included in the official product. What I would like to know, so that I can dispel this, is what procedures are in place to prevent this happening? I know that all downloads are digitally signed, but what other procedures are in place? For example, how is code signed off for inclusion in production releases? The trojan horse argument is a flawed argument commonly used against open source projects; a Google search on the subject will pop up a lot of articles on it. For example, this fine article: http://www.dwheeler.com/secure-programs/Secure-Programs-HOWTO/open-source-security.html On a personal note, I think that the distributed development nature of Apache HTTPD (and other open source projects) is what makes me more confident to trust it, and given its 50+% market share I'm pretty sure other people trust it too. On a side note, the NSA (National Security Agency) has taken the open source route and they don't seem to be worried :-) Regards, Davi Arnaut
Re: svn commit: r595813 - /httpd/httpd/branches/2.2.x/STATUS
Ruediger Pluem wrote: On 11/16/2007 09:34 PM, [EMAIL PROTECTED] wrote: Author: davi Date: Fri Nov 16 12:34:22 2007 New Revision: 595813 URL: http://svn.apache.org/viewvc?rev=595813&view=rev Log: Remove proposal, fix is already backported (r574884) No, it's not. jorton's solution (r591384) is better. Please re-add the proposal. See jorton's comment on PR 43865. /me stops committing while on a holiday trip Regards, Davi Arnaut
Re: Amsterdam sandbox
Justin Erenkrantz wrote: On Nov 14, 2007 9:07 AM, Issac Goldstand [EMAIL PROTECTED] wrote: Well, not to overstate the obvious, but aren't we implying here that serf will become an integral part of apr-util (at least that's what I'd understood)? As such, serf wouldn't be forked as much as absorbed... IMO, serf is and should be a standalone library. -- justin Under (or not) the Apache umbrella? -- Davi Arnaut
Re: Apachelounge problems
William A. Rowe, Jr. wrote: Hmmm... seems that - even though we've *repeated* this multiple times, we have to state this again. Contents of http://httpd.apache.org/dev/dist/ are *development* tarballs and not for any distribution. Just out of curiosity, why don't we name the tarballs as such? httpd-2.2.5-RC.tar.gz or httpd-2.2.5-dev.tar.gz, and if the tarball gets enough votes, just rename it to httpd-2.2.5.tar.gz -- Davi Arnaut
Re: svn commit: r563839 - /httpd/httpd/trunk/CHANGES
[EMAIL PROTECTED] wrote: Author: jim Date: Wed Aug 8 05:41:21 2007 New Revision: 563839 [..] Changes with Apache 2.2.5 + + *) mod_deflate: fix protocol handling in deflate input filter + PR 23287 [Nick Kew] + + *) mod_proxy: fix buffer overflow issue + PR 41144 [Davi Arnaut] BTW, the above description should be corrected to: http://svn.apache.org/viewvc/httpd/httpd/branches/2.0.x/CHANGES?r1=563329&r2=563328&pathrev=563329 -- Davi Arnaut
Re: svn commit: r563839 - /httpd/httpd/trunk/CHANGES
Jim Jagielski wrote: On Aug 8, 2007, at 8:47 AM, Davi Arnaut wrote: [EMAIL PROTECTED] wrote: Author: jim Date: Wed Aug 8 05:41:21 2007 New Revision: 563839 [..] Changes with Apache 2.2.5 + + *) mod_deflate: fix protocol handling in deflate input filter + PR 23287 [Nick Kew] + + *) mod_proxy: fix buffer overflow issue + PR 41144 [Davi Arnaut] BTW, the above description should be corrected to: http://svn.apache.org/viewvc/httpd/httpd/branches/2.0.x/CHANGES?r1=563329&r2=563328&pathrev=563329 I'm guessing you didn't read the log message :) Relax, it was just a gentle reminder :) Another question. There are some recent changes in apr that fix a few issues (hp-ux sendfile, day of the year, dbm) which were originally reported as httpd (2.2.4) problems; wouldn't it be better to ship 2.2.5 with newer apr/apr-util releases? -- Davi Arnaut
Re: svn commit: r563147 - /httpd/httpd/trunk/server/mpm/experimental/event/fdqueue.c
Ruediger Pluem wrote: On 08/06/2007 04:20 PM, [EMAIL PROTECTED] wrote: Author: jim Date: Mon Aug 6 07:20:24 2007 New Revision: 563147 URL: http://svn.apache.org/viewvc?view=rev&rev=563147 Log: These atomics expect apr_uint32_t *... The expectation, of course, is that the add/inc still works as expected even though we are using signed values. Hm. I have a bad feeling here. These atomics are platform specific and seem to be designed for unsigned ints. Are we really sure that they can handle signed ints as well? Comments from an atomics guru from APR? I'm no guru on the subject, but AFAIK it's not reliable because of the underlying signed-to-unsigned conversions and the unsigned arithmetic of the various instructions used. Looking at the code, I think we could use some other scheme to track the blocked threads. -- Davi Arnaut
Re: mod_cache summary and plan
Graham Leggett wrote: Davi Arnaut wrote: . Problem: You have described two separate problems below. No, and it seems you are deeply confused about what buckets and brigades represent. You already committed what, four fixes to the same problem? Each time we point out your wrong assumptions you come up with yet another bogus fix. Could you please stop for a moment and listen? IMHO, you haven't presented any acceptable fix and you keep trying to fix things by yourself without discussing on the list first. And more important, discussing on the list means that you have to hear other people's comments. For a moment forget about file buckets and large files; what's really at stake is proxy/cache brigade management when the arrival rate is too high (e.g. a single 4.7GB file bucket, high-rate input data to be consumed at a relatively low rate). By operating as a normal output filter mod_cache must deal with potentially large brigades of (possibly) different (other than the stock ones) bucket types created by other filters on the chain. This first problem has largely been solved, bar some testing. Those fixes were vetoed if I remember correctly. The solution was to pass the output filter through the save_body() hook, and let the save_body() code decide for itself when the best time is to write the bucket(s) to the network. For example in the disk cache, the apr_bucket_read() loop will read chunks of the 4.7GB file 4MB at a time. This chunk will be cached, and then this chunk will be written to the network, then cleaned up. Rinse, repeat. Previously, save_body() was expected to save all 4.7GB to the cache, and then only write the first byte to the network possibly minutes later. If a filter was present before cache that for any reason converted file buckets into heap buckets (for example mod_deflate), then save_body() would try to store 4.7GB of heap buckets in RAM to pass to the network later, and boom. You just described what I've said in other words. 
Listen, if you don't change your attitude a bit I won't continue arguing with you, it's pointless. How mod_disk_cache chooses to send data to the network is an entirely separate issue, detailed below. NO! It's the same problem. The problem arises from the fact that the mod_disk_cache store function traverses the brigade by itself, reading each bucket in order to write its contents to disk, potentially filling the memory with large chunks of data allocated/created by the bucket type's read function (e.g. file bucket). To put this another way: The core problem in the old cache code was the assumption that it was practical to call apr_bucket_read() on the same data _twice_ - once during caching, once during network write. No, the core problem is how it manages the bucket/brigade (deep down it's the same problem, but ...). This assumption isn't valid, thus the recent fixes. . Constraints: No threads/forked processes. Bucket type specific workarounds won't work. No core changes/knowledge; easily back-portable fixes are preferable. . Proposed solution: File buffering (or a part of Graham's last approach). The solution consists of using the cache file as an output buffer by splitting the buckets into smaller chunks and writing them to disk. Once written (apr_file_write_full) a new file bucket is created with the offset and size of the just-written buffer. The old bucket is deleted. After that, the new bucket is inserted into a temporary (empty) brigade and sent down the output filter stack for (probably) network i/o. At a quick glance, this solution may sound absurd -- the chunk is already in memory, and the output filter might need it again in memory soon. But there's no silver bullet, and it's a simple enough approach to solve the growing memory problem without incurring performance penalties. 
As soon as apr_file_write_full() is executed, the bucket just saved to the disk cache is also in kernel buffer memory - meaning that a corresponding apr_bucket_read() afterwards in the network code reads data already cached in kernel memory. I've just said that in the e-mail, but you deleted it. In performance testing, on files small enough to be buffered by the kernel (a few MB), the initial part of the download after caching is very fast. What this technique does is guarantee that regardless of the source of the response, be it a file, a CGI, or a proxy, what gets written to the network is always a file, and always takes advantage of kernel-based file performance features. I've just described that. Maybe my English was poor in the e-mail. -- Davi Arnaut
Re: mod_cache summary and plan
Ruediger Pluem wrote: On 10/29/2006 04:39 PM, Davi Arnaut wrote: Graham Leggett wrote: Davi Arnaut wrote: . Problem: You have described two separate problems below. No, and it seems you are deeply confused about what buckets and brigades represent. You already committed what, four fixes to the same problem? Each time we point out your wrong assumptions you come up with yet another bogus fix. Could you please stop for a moment and listen? IMHO, you haven't presented any acceptable fix and you keep trying to fix things by yourself without discussing on the list first. And more important, discussing on the list means that you have to hear other people's comments. The solution was to pass the output filter through the save_body() hook, and let the save_body() code decide for itself when the best time is to write the bucket(s) to the network. For example in the disk cache, the apr_bucket_read() loop will read chunks of the 4.7GB file 4MB at a time. This chunk will be cached, and then this chunk will be written to the network, then cleaned up. Rinse, repeat. Previously, save_body() was expected to save all 4.7GB to the cache, and then only write the first byte to the network possibly minutes later. If a filter was present before cache that for any reason converted file buckets into heap buckets (for example mod_deflate), then save_body() would try to store 4.7GB of heap buckets in RAM to pass to the network later, and boom. You just described what I've said in other words. Listen, if you don't change your attitude a bit I won't continue arguing with you, it's pointless. I do not really like the way the discussion goes here. If I remember correctly we had a very similar discussion between you and Graham several months ago regarding the need for mod_cache to be RFC compliant. Yes, but it was about the generic cache architecture. It may be that we circle around the same things again and again and sometimes this may be even unproductive. 
But this way for sure we do not get anything productive out of this (and the past proved it). As we had this in the past, I try to throw a flag very early to avoid wasting everybody's time with the back and forth following such things. If you are frustrated by Graham's responses and the situation, please try to express this a little differently and less personalized. I'm not frustrated, I just wanted to tell him what I thought. I think he is a smart guy and works hard on the issues, but we could achieve much more by collaborating constructively in a sane manner (step by step). -- Davi Arnaut
Re: mod_cache summary and plan
Graham Leggett wrote: Davi Arnaut wrote: I've just described that. Maybe my English was poor in the e-mail. Your English is spot on; unfortunately the aggressive nature of your email isn't. You are not going to bully anybody on this list into accepting any patch, it's not how this project works. I'm not bullying anyone. This is not a personal attack, it was a public call for you to adjust the process. It is quite clear to me that you are upset that your patches were not accepted as is. Unfortunately your patches break existing compliance with RFC2616 in the cache, and in the process introduce a significant performance penalty. This has been pointed out to you before, and not just by me. That's exactly the problem, I'm not trying to compete with you. I'm not upset if my patches are not accepted, I just want the best possible solution that satisfies the community. I called on you to work together with everybody on the list before committing. My patches were intended as an experiment; they weren't even targeted at trunk (cache refactor branch). I don't care if the patches are going to be committed or not. I don't contribute to prove to anyone that I'm better or anything else, I contribute because I really enjoy working on some parts of httpd/apr. I'm not going to dispute with you over whether my suggestions or yours are accepted. I just want everybody to be heard in the process and the final result to please the majority. Your recent comments on patches contributed have made it clear that you neither understand the patches so committed, nor have you actually run the code in question. I respectfully request you fix both these issues before continuing any work on this cache. I don't want to solve this problem alone, I don't have all the answers. But I do know that last week's jumbo patches didn't advance the issue any further because they were vetoed -- not because they were wrong, but because you didn't work with everybody before committing them. 
I will let this thread die now; it was created to gather a consensus but failed miserably. I just hope our minor disagreements won't interfere with us working on mod_cache in the future. I will repeat again, I'm not attacking you. I was pursuing what I thought was better for mod_cache. -- Davi Arnaut
Re: mod_cache summary and plan
Graham Leggett wrote: Davi Arnaut wrote: You are not going to bully anybody on this list into accepting any patch, it's not how this project works. I'm not bullying anyone. This is not a personal attack, it was a public call for you to adjust the process. Let's not fool ourselves, it was a personal attack. I'm not fooling myself. Bullying me is not going to force me to adjust the Apache process that has been in place on this project since I joined it 8 years ago. That process is not mine to change, and the sooner you realise that the better for us all. I will let this thread die now; it was created to gather a consensus but failed miserably. I just hope our minor disagreements won't interfere with us working on mod_cache in the future. I will repeat again, I'm not attacking you. I was pursuing what I thought was better for mod_cache. mod_cache doesn't get better when the maintainers get fed up by the constant barrage of your abuse and give it up as a bad idea. Declaring end of thread doesn't give us any guarantee that we won't see more of the same trolling from you. Fair enough, I withdraw all my patches and I'm going to unsubscribe from the list. Have a nice day. -- Davi Arnaut
mod_cache summary and plan
Hi, It's quite clear that without some agreement we won't be able to actually fix mod_cache's shortcomings. The idea now is to gather our efforts to get consensus on the proposed fixes and commit them one by one. The current high priority issues can be summarized as: * Buffering . Problem: For a moment forget about file buckets and large files; what's really at stake is proxy/cache brigade management when the arrival rate is too high (e.g. a single 4.7GB file bucket, high-rate input data to be consumed at a relatively low rate). By operating as a normal output filter mod_cache must deal with potentially large brigades of (possibly) different (other than the stock ones) bucket types created by other filters on the chain. The problem arises from the fact that the mod_disk_cache store function traverses the brigade by itself, reading each bucket in order to write its contents to disk, potentially filling the memory with large chunks of data allocated/created by the bucket type's read function (e.g. file bucket). . Constraints: No threads/forked processes. Bucket type specific workarounds won't work. No core changes/knowledge; easily back-portable fixes are preferable. . Proposed solution: File buffering (or a part of Graham's last approach). The solution consists of using the cache file as an output buffer by splitting the buckets into smaller chunks and writing them to disk. Once written (apr_file_write_full) a new file bucket is created with the offset and size of the just-written buffer. The old bucket is deleted. After that, the new bucket is inserted into a temporary (empty) brigade and sent down the output filter stack for (probably) network i/o. At a quick glance, this solution may sound absurd -- the chunk is already in memory, and the output filter might need it again in memory soon. But there's no silver bullet, and it's a simple enough approach to solve the growing memory problem without incurring performance penalties. 
The memory usage is kept low because the deleted buckets will be released to the free list. Performance won't be hit because, if there is enough memory system-wide, the reads from the core output filter are going to be served by the page cache, or better yet, sendfile()d. . Comments: Do you agree/disagree? Better solution? * Thundering herd/parallel downloads * Partial downloads * Code quality Let's settle the buffering issue first. -- Davi Arnaut
Re: svn commit: r468373 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_cache.c modules/cache/mod_cache.h modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h modules/cache/mod_mem_cache.c
{
-    ap_log_error(APLOG_MARK, APLOG_INFO, 0, r->server,
-                 "disk_cache: Caching body for URL %s", dobj->name);
+    e = APR_BRIGADE_FIRST(bb);
+    while (e != APR_BRIGADE_SENTINEL(bb)) {
-    for (e = APR_BRIGADE_FIRST(bb);
-         e != APR_BRIGADE_SENTINEL(bb);
-         e = APR_BUCKET_NEXT(e))
-    {
-        const char *str;
-        apr_size_t length, written;
+        const char *str;
+        apr_size_t length, written;
+        apr_off_t offset = 0;
-        /* Ignore the non-data-buckets */
-        if (APR_BUCKET_IS_METADATA(e)) {
-            continue;
-        }
+        /* try write all data buckets to the cache, except for metadata buckets */
+        if (!APR_BUCKET_IS_METADATA(e)) {

Swapping the metadata check would make the code much more readable:

    if (is_metadata)
        deal with it
        continue
    handle normal bucket

+            /* read in a bucket fragment */
             rv = apr_bucket_read(e, &str, &length, APR_BLOCK_READ);
             if (rv != APR_SUCCESS) {
                 ap_log_error(APLOG_MARK, APLOG_ERR, rv, r->server,
-                             "disk_cache: Error when reading bucket for URL %s",
+                             "disk_cache: Error when reading bucket for URL %s, aborting request",
                              dobj->name);
                 file_cache_errorcleanup(dobj, r);
+                /* not being able to read the bucket is fatal,
+                 * return this up the filter stack
+                 */
                 return rv;
             }
+
+            /* try write the bucket fragment to the cache */
+            apr_file_seek(dobj->fd, APR_END, &offset);
             rv = apr_file_write_full(dobj->fd, str, length, &written);
-            if (rv != APR_SUCCESS) {
+            offset = -(apr_off_t)written;
+            apr_file_seek(dobj->fd, APR_END, &offset);
+            /* if the cache write was successful, swap the original bucket
+             * with a file bucket pointing to the same data in the cache.
+             *
+             * This is done because:
+             *
+             * - The ap_core_output_filter can take advantage of its ability
+             *   to do non blocking writes on file buckets.
+             *
+             * - We are prevented from the need to read the original bucket
+             *   a second time inside ap_core_output_filter, which could be
+             *   expensive or memory consuming.
+             *
+             * - The cache, in theory, should be faster than the backend,
+             *   otherwise there would be little point in caching in the first
+             *   place.
+             */
+            if (APR_SUCCESS == rv) {
+
+                /* remove and destroy the original bucket from the brigade */
+                b = e;
+                e = APR_BUCKET_NEXT(e);
+                APR_BUCKET_REMOVE(b);
+                apr_bucket_destroy(b);

aka apr_bucket_delete

+                /* Is our network connection still alive?
+                 * If not, we must continue caching the file, so keep looping.
+                 * We will return the error at the end when caching is done.
+                 */
+                if (APR_SUCCESS == dobj->frv) {
+
+                    /* insert a file bucket pointing to the cache into our temporary brigade */
+                    if (diskcache_brigade_insert(dobj->tmpbb, dobj->fd, dobj->file_size,
+                                                 written,
+                                                 dobj->updtimeout, r->pool) == NULL) {
+                        return APR_ENOMEM;
+                    }

Err. We had the data in memory, and we are going to read it back from disk again just in order to not block? That's nonsense. We don't need a special bucket type! Also, are you really sure you are tracking all those buckets with the correct offset? The file pointer offset is unique for a single fd. I will stop the review here; I have to calm down to really digest what's happening. -- Davi Arnaut
Re: mod_disk_cache summarization
Graham Leggett wrote: On Mon, October 23, 2006 10:50 pm, Davi Arnaut wrote: AFAIK all major platforms provide one, even win32. I even made an incomplete APR abstraction for file notification: http://haxent.com/~davi/apr/notify Is it possible to add doxygen comments in the apr_notify.h file, so as to confirm exactly how to use it? I would like to look at this for apr. The code is a bit outdated; I have integrated it into a generic event abstraction for apr, a la kevent, that I'm working on. I'll try to post something this weekend to the list with the appropriate documentation. -- Davi Arnaut
Re: svn commit: r468373 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_cache.c modules/cache/mod_cache.h modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h modules/cache/
Niklas Edmundsson wrote: On Fri, 27 Oct 2006, Graham Leggett wrote: Err. We had the data in memory, and we are going to read it back from disk again just in order to not block? That's nonsense. Agreed. Please explain. This is a disk cache. Why would you write expensive bucket data to cache, and then expensive bucket data to the network? That's plain stupid. And when you have a file backend, you want to hit your disk cache and not the backend when delivering data to a client. People might think that this doesn't matter, but for large files, especially files larger than the RAM in your machine, you usually go disk-bound without much help from the OS disk cache. But that's a corner case. There is no reason for doing this for small files (the common case). For example, in an enterprise-grade server memory is cheap and permanent storage is slow and expensive. Also, httpd seems to be faster delivering data by sendfile than delivering data from memory buckets. That's more of a performance bug in httpd though. It's not all httpd's fault. Sendfile avoids memory copies/context switches/etc behind the scenes (in the kernel) because it talks directly to the underlying file system. When you already have the memory in user space there isn't much that you can do; the kernel has to copy it, etc. -- Davi Arnaut
Re: svn commit: r467655 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/mod_cache.xml modules/cache/mod_cache.c modules/cache/mod_cache.h
Graham Leggett wrote: On Fri, October 27, 2006 6:25 pm, Niklas Edmundsson wrote: I would have been most happy if this had been fixed ages ago so I hadn't been forced to spend lots and lots of hours kludging stuff together. At least, my kludges seem to have sparked some development in this area, so they have served some purpose other than enabling a non-profit computer club to build an FTP/HTTP server that actually works. When Brian Atkins stood up at Apachecon and presented a talk pretty much summarised as we don't use Apache's cache because it sucked, and here's why, that was a wake up call to realise that the disk cache in its current form sucks. We have significant contributions from two people - Davi Arnaut and Niklas Edmundsson - and I've been integrating the issues fixed by both these contributions into a coherent workable whole, so that the effort spent by these people isn't wasted. Both of their efforts have focused on different aspects of the cache, making this workable. Some parts are not RFC compliant, other parts are not implemented elegantly enough, but these are details that need to be raised, addressed and fixed, not used as a feeble excuse to abandon the effort and return to some cache code that nobody wants to use. I particularly don't care if my patches are not used if a better solution comes up. It takes time for ideas to mature. I see lots of comments on the code, but the comments are summarised as the cache is fine as it is. It isn't. If it was fine, key users of network caching wouldn't be standing up saying they're using something else. No one said that, but we must think before acting. -- Davi Arnaut
Re: ROADMAP for mod_cache
Paul Querna wrote: I think we have had enough -1s passed over the last couple days related to mod_cache, that it indicates there is not a consensus on what should be done, nor on how it should be done. I think we need to stop committing code for a couple days, and try to make a rudimentary plan for the intended changes to mod_cache, and try to gather some kind of consensus on them, rather than continuing down the path of -1s and long ass mailing list threads. Thoughts? This is definitely a good and obvious idea, but those discussions only start when people begin committing code. I'm all in favor of posting the ideas/code to the list before committing as long as we all work towards choosing a common goal. Graham was right in committing the code because he was the only committer active in the discussion (except for a few comments/patches from Joe Orton). Placing a call for a common goal and not actually suggesting one is not that helpful. Let's see if we can attract more attention to the cache problems now. Graham, could you please summarize the problems we want to solve and the possible solutions and send them to the list? Thanks, -- Davi Arnaut
Re: svn commit: r468373 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_cache.c modules/cache/mod_cache.h modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h modules/cache/mo
Greg Marr wrote: At 01:41 PM 10/27/2006, Davi Arnaut wrote: Niklas Edmundsson wrote: And when you have a file backend, you want to hit your disk cache and not the backend when delivering data to a client. People might think that this doesn't matter, but for large files, especially files larger than the RAM in your machine, you usually go disk-bound without much help from the OS disk cache. But that's a corner case. There is no reason for doing this for small files (the common case). For example, in an enterprise-grade server memory is cheap and permanent storage is slow and expensive. So why would this server be using mod_disk_cache in that case? Shouldn't this server be using mod_mem_cache? Selecting mod_disk_cache over mod_mem_cache means it's better to serve the cache from disk rather than from memory. If serving from the disk on the original request is too slow, then wouldn't serving from the disk on the subsequent requests be too slow as well? Because the data is already in memory. Why should I write something to disk, erase it from memory, and read it again shortly? Why should I take care of something that is the job of the OS? Why should I thrash the VM constantly? A proxy/cache's common purpose is to cache web content; this means we have a large number of small files (think html pages, images, etc) that we must keep in permanent storage, but if we have memory, let's use it! -- Davi Arnaut
Re: svn commit: r468373 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_cache.c modules/cache/mod_cache.h modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h modules/cache/mo
Davi Arnaut wrote: Greg Marr wrote: At 01:41 PM 10/27/2006, Davi Arnaut wrote: Niklas Edmundsson wrote: And when you have a file backend, you want to hit your disk cache and not the backend when delivering data to a client. People might think that this doesn't matter, but for large files, especially files larger than the RAM in your machine, you usually go disk-bound without much help from the OS disk cache. But that's a corner case. There is no reason for doing this for small files (the common case). For example, in an enterprise-grade server memory is cheap and permanent storage is slow and expensive. So why would this server be using mod_disk_cache in that case? Shouldn't this server be using mod_mem_cache? Selecting mod_disk_cache over mod_mem_cache means it's better to serve the cache from disk rather than from memory. If serving from the disk on the original request is too slow, then wouldn't serving from the disk on the subsequent requests be too slow as well? Because the data is already in memory. Why should I write something to disk, erase it from memory, and read it again shortly? Why should I take care of something that is the job of the OS? Why should I thrash the VM constantly? I should note that if there is enough memory those reads will be served by the page cache. Later we could think about reading directly into mmap-able files, changing their protection (from rw to ro) as soon as we are done writing. -- Davi Arnaut
Re: svn commit: r467655 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/mod_cache.xml modules/cache/mod_cache.c modules/cache/mod_cache.h
Graham Leggett wrote: On Wed, October 25, 2006 11:48 pm, Davi Arnaut wrote: No, we start with a file bucket on the brigade. I will try to explain what happens to see if we are talking about the same thing. Suppose we have a brigade containing a file bucket, and the file size is 4.7GB. We want to read it fully. When we call apr_bucket_read() on this bucket, we end up calling the bucket read function (file_bucket_read). What does the file bucket read do? If mmap is supported, it mmaps APR_MMAP_LIMIT bytes (4MB) by creating a new mmap bucket and splits the bucket. So, after calling read on a file bucket you have two buckets on the brigade. The first one is the mmap bucket and the last is the file bucket with an updated offset. The same thing happens if mmap is not supported, but the bucket type will be a heap bucket. If we don't delete or flush those implicitly created buckets they will keep the whole file in memory, but one single read will not put the entire file in memory. What Joe's patch does is remove this first implicitly created bucket from the brigade, placing it on a temporary brigade for sending it to the client. Ok, this makes more sense, but in its current form the temporary brigade should not be necessary. But it would be better. We don't need to keep creating brigades with apr_brigade_split(); we could just move the buckets to a temporary brigade.
Take a look at apr_brigade_split:

APU_DECLARE(apr_bucket_brigade *) apr_brigade_split(apr_bucket_brigade *b,
                                                    apr_bucket *e)
{
    apr_bucket_brigade *a;
    apr_bucket *f;

    a = apr_brigade_create(b->p, b->bucket_alloc);
    /* Return an empty brigade if there is nothing left in
     * the first brigade to split off */
    if (e != APR_BRIGADE_SENTINEL(b)) {
        f = APR_RING_LAST(&b->list);
        APR_RING_UNSPLICE(e, f, link);
        APR_RING_SPLICE_HEAD(&a->list, e, f, apr_bucket, link);
    }

    APR_BRIGADE_CHECK_CONSISTENCY(a);
    APR_BRIGADE_CHECK_CONSISTENCY(b);

    return a;
}

If we used a tmpbb we could replace the apr_brigade_split with something similar to:

    if (e != APR_BRIGADE_SENTINEL(b)) {
        f = APR_RING_LAST(&b->list);
        APR_RING_UNSPLICE(e, f, link);
        APR_RING_SPLICE_HEAD(&a->list, e, f, apr_bucket, link);
    }

And instead of destroying the brigade, we could only do a call to:

    apr_brigade_cleanup(tmpbb);

While disk cache goes through the brigade, it replaces buckets on the incoming brigade with a bucket referring to the cached file. I think a number of the thorny memory problems in mod_*cache have come about because the brigade is read twice - once by store_body, and once by ap_core_output_filter. Maybe. Buckets are reference counted and get reused pretty quickly. The replacing of the buckets with file buckets in the brigade by disk cache means the brigade will only be read once - by save_body(). Right, I liked your approach. I just wanted to be sure everyone is on the same page on this issue. That's why splitting the brigade with magical values (16MB) is not such a good idea, because the bucket type knows better and will split the bucket anyway. Right, the picture is getting clearer. Look closer at the cache code in mod_cache, right at the end of the CACHE_SAVE filter. Notice how store_body() is called, followed by ap_pass_brigade(). Notice how store_body() is expected to handle the complete brigade, all 4.7GB of it, before ap_pass_brigade() has a look see. Now - create a 4.7GB file on your drive.
Copy this file to a second filename. Time this and tell me how long it takes. Do you see the further problem? Yup. The split-before-save_body patch wasn't described well enough by me - it is designed to prevent save_body() from having to handle bucket sizes that are impractically large to handle by a cache, both in terms of memory, and in terms of time. That is clear. -- Davi Arnaut
Re: svn commit: r467655 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/mod_cache.xml modules/cache/mod_cache.c modules/cache/mod_cache.h
..

+static int do_store_body(cache_request_rec *cache,
+                         ap_filter_t *f,
+                         apr_bucket_brigade *in)
+{
+    apr_bucket *e;
+    apr_bucket_brigade *bb;
+    apr_status_t rv, rv2;
+    cache_server_conf *conf;
+
+    conf = (cache_server_conf *) ap_get_module_config(f->r->server->module_config,
+                                                      &cache_module);
+
+    /* try split any buckets larger than threshold */
+    rv = APR_SUCCESS; /* successful unless found otherwise */
+    rv2 = APR_SUCCESS;
+    if (conf->maxbucketsize > 0) {
+        e = APR_BRIGADE_FIRST(in);
+        while (e != APR_BRIGADE_SENTINEL(in)) {

&& (rv == APR_SUCCESS && rv2 == APR_SUCCESS) { Otherwise we may keep walking through the brigade unnecessarily.

+
+            /* if necessary, split the brigade and send what we have so far */
+            if (APR_SUCCESS == apr_bucket_split(e, conf->maxbucketsize)) {
+                e = APR_BUCKET_NEXT(e);
+                bb = in;
+                in = apr_brigade_split(bb, e);
+
+                /* if store body fails, don't try store body again */
+                if (APR_SUCCESS == rv) {

Drop if.

+                    rv = cache->provider->store_body(cache->handle, f->r, bb);
+                }
+
+                /* try write split brigade to the filter stack and network */
+                if (APR_SUCCESS == rv2) {

Drop if.

+                    rv2 = ap_pass_brigade(f->next, bb);
+                }
+                apr_brigade_destroy(bb);
+            }
+            else {
+                e = APR_BUCKET_NEXT(e);
+            }
+        }
+    }
+

Nice patch. -- Davi Arnaut
Re: svn commit: r467655 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/mod_cache.xml modules/cache/mod_cache.c modules/cache/mod_cache.h
Graham Leggett wrote: On Wed, October 25, 2006 4:14 pm, Davi Arnaut wrote: && (rv == APR_SUCCESS && rv2 == APR_SUCCESS) { Otherwise we may keep walking through the brigade unnecessarily. This is intentional - we don't want a failure on store_body() to cause a premature abort on the network write. Huh? The loop will break and the whole brigade will be sent at once a few lines below. Why would we want to keep splitting the brigade? There is no reason to keep breaking the buckets if we won't store them, or am I missing something? -- Davi Arnaut
Re: svn commit: r467655 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/mod_cache.xml modules/cache/mod_cache.c modules/cache/mod_cache.h
Joe Orton wrote: On Wed, Oct 25, 2006 at 01:44:48PM -0000, Graham Leggett wrote: Author: minfrin Date: Wed Oct 25 06:44:47 2006 New Revision: 467655 URL: http://svn.apache.org/viewvc?view=rev&rev=467655 Log: mod_cache: Fix an out of memory condition that occurs when the cache tries to save huge files (greater than RAM). Buckets bigger than a tuneable threshold are split into smaller buckets before being passed to mod_disk_cache, etc. PR 39380 Another couple of hundred lines of code and even a new config directive, and this still doesn't get close to actually fixing the problem! -1 already, this code is just not getting better. mod_disk_cache is still liable to eat all your RAM in that apr_bucket_read() loop, the apr_bucket_split() is not guaranteed to work for a morphing bucket type. It is simple enough to fix this problem without adding all this code and without all the stuff in r450105 too, something like the below. And you almost got it right. We don't want to stop caching if the client's connection fails, and we must not slow the caching because of a slow client (that's why I didn't pass the brigade). In the end, yours does almost the same as mine [1] except that I didn't pass the new buckets up the chain before deleting them (and it was file-bucket exclusive). Copying to disk tends to be faster than sending to a client. -- Davi Arnaut 1 http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=115971884005419&w=2
Re: svn commit: r467655 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/mod_cache.xml modules/cache/mod_cache.c modules/cache/mod_cache.h
Joe Orton wrote: On Wed, Oct 25, 2006 at 01:54:04PM -0300, Davi Arnaut wrote: Joe Orton wrote: Another couple of hundred lines of code and even a new config directive, and this still doesn't get close to actually fixing the problem! -1 already, this code is just not getting better. mod_disk_cache is still liable to eat all your RAM in that apr_bucket_read() loop, the apr_bucket_split() is not guaranteed to work for a morphing bucket type. It is simple enough to fix this problem without adding all this code and without all the stuff in r450105 too, something like the below. And you almost got it right. We don't want to stop caching if the client's connection fails ...a simple enough change. Yes. and we must not slow the caching because of a slow client (that's why I didn't pass the brigade). There is no other acceptable solution AFAICS. Buffering the entire brigade (either to disk, or into RAM as the current code does) before writing to the client is not OK, polling on buckets is not possible, using threads is not OK, using non-blocking writes up the output filter chain is not possible. Any other ideas? I think we should build the necessary infrastructure to solve this problem once and for all. Be it non-blocking writes, AIO, or adding support to the core output filter to write data to different destinations. -- Davi Arnaut
Re: svn commit: r467655 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/mod_cache.xml modules/cache/mod_cache.c modules/cache/mod_cache.h
William A. Rowe, Jr. wrote: Paul Querna wrote: That's not going to be possible until 3.0. Good luck :) Paul's right. The next expectation we break: that a protocol spends its life on a single thread. That is really a 3.0 magnitude change because it will break modules in a major way. Then, the final expectation to break is that a -request- spends its life on a given thread. That's earth shattering to developers, and really is beyond the 3.0 release (if we want to see 3.0 adopted in the next few years). Ok. I will come back in a few years :) j/k How about moving the proxy/cache parts to use their own core filters/architecture? I obviously lack experience, but in my limited understanding the proxy/cache code is being restrained because we can't improve core features that may affect the HTTP serving parts. In a proxy/cache situation no one cares about php/python, cgi, or whatever filters. Can't we change the rules for apache (core and modules) operating as a proxy/cache more easily? I don't know all possible implications of such a move, but it may work. -- Davi Arnaut
Re: svn commit: r467655 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/mod_cache.xml modules/cache/mod_cache.c modules/cache/mod_cache.h
Graham Leggett wrote: Davi Arnaut wrote: Yes, first because size_t is 32 bits :). When you do a read like this on a file bucket, the whole bucket is not read into memory. The read function splits the bucket and changes the current bucket to refer to data that was read. 32 bits is 4GB. A large number of webservers don't have memory that size, thus the problem. The problem lies in that those new buckets keep accumulating in the brigade! See my patch again. Where? I was referring to "it will attempt to read all 4.7GB". Such a thing does not exist! We start with one 4.7GB bucket in a brigade. No, we start with a file bucket on the brigade. I will try to explain what happens to see if we are talking about the same thing. Suppose we have a brigade containing a file bucket, and the file size is 4.7GB. We want to read it fully. When we call apr_bucket_read() on this bucket, we end up calling the bucket read function (file_bucket_read). What does the file bucket read do? If mmap is supported, it mmaps APR_MMAP_LIMIT bytes (4MB) by creating a new mmap bucket and splits the bucket. So, after calling read on a file bucket you have two buckets on the brigade. The first one is the mmap bucket and last is file bucket with an updated offset. The same thing happens if mmap is not supported, but the bucket type will be a heap bucket. If we don't delete or flush those implicitly created buckets they will keep the whole file in memory, but one single read will not put the entire file in memory. What Joe's patch does is remove this first implicitly created bucket from the brigade, placing it on the brigade on a temporary brigade for sending it to the client. That's why splitting the brigade with magical values (16MB) is not such a good idea, because the bucket type knows better and will split the bucket anyway. -- Davi Arnaut
Re: svn commit: r467655 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/mod_cache.xml modules/cache/mod_cache.c modules/cache/mod_cache.h
Davi Arnaut wrote: Graham Leggett wrote: Davi Arnaut wrote: Yes, first because size_t is 32 bits :). When you do a read like this on a file bucket, the whole bucket is not read into memory. The read function splits the bucket and changes the current bucket to refer to data that was read. 32 bits is 4GB. A large number of webservers don't have memory that size, thus the problem. The problem lies in that those new buckets keep accumulating in the brigade! See my patch again. Where? I was referring to "it will attempt to read all 4.7GB". Such a thing does not exist! We start with one 4.7GB bucket in a brigade. No, we start with a file bucket on the brigade. I will try to explain what happens to see if we are talking about the same thing. Suppose we have a brigade containing a file bucket, and the file size is 4.7GB. We want to read it fully. When we call apr_bucket_read() on this bucket, we end up calling the bucket read function (file_bucket_read). What does the file bucket read do? If mmap is supported, it mmaps APR_MMAP_LIMIT bytes (4MB) by creating a new mmap bucket and splits the bucket. So, after calling read on a file bucket you have two buckets on the brigade. The first one is the mmap bucket and last is file bucket with an updated offset. s/last/last one is the old/ The same thing happens if mmap is not supported, but the bucket type will be a heap bucket. If we don't delete or flush those implicitly created buckets they will keep the whole file in memory, but one single read will not put the entire file in memory. s/keep the whole file in memory/will accumulate on the brigade, implying that the whole file is in memory/ What Joe's patch does is remove this first implicitly created bucket from the brigade, placing it on the brigade on a temporary brigade for sending it to the client. s/on the brigade// That's why splitting the brigade with magical values (16MB) is not such a good idea, because the bucket type knows better and will split the bucket anyway.
-- Davi Arnaut
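The accumulation problem described in the thread above can be made concrete with a toy model. This is plain C and deliberately not APR code; CHUNK stands in for APR_MMAP_LIMIT, and the counting scheme is my own illustration of why deleting consumed buckets keeps memory bounded:

```c
#define CHUNK (4 * 1024 * 1024L)  /* stands in for APR_MMAP_LIMIT (4MB) */

/* Each simulated apr_bucket_read() on a file bucket "morphs" it: a heap
 * (or mmap) bucket holding up to CHUNK bytes appears on the brigade and
 * the file bucket's offset advances. Returns the peak number of
 * in-memory buckets the brigade held during the full read. */
static long peak_memory_buckets(long file_size, int delete_after_use)
{
    long in_memory = 0, peak = 0, remaining = file_size;

    while (remaining > 0) {
        long n = remaining < CHUNK ? remaining : CHUNK;
        remaining -= n;
        in_memory++;                 /* the read created a heap/mmap bucket */
        if (in_memory > peak)
            peak = in_memory;
        if (delete_after_use)
            in_memory--;             /* caller deletes the consumed bucket */
    }
    return peak;
}
```

With deletion the peak stays at one bucket (bounded memory); without it, a 4.7GB file leaves on the order of a thousand 4MB buckets on the brigade, which is exactly the out-of-memory condition the patches address.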
Re: mod_disk_cache summarization
Graham Leggett wrote: Brian Akins wrote: Can someone please summarize the various patches for mod_disk_cache that have been floating around in the last couple weeks? I have looked at the patches but wasn't real sure of the general philosophy/methodology behind them. In essence, the patches solve the thundering herd problem. This involved some significant changes to the way disk cache worked, some of which are still being reviewed. What's been committed so far is a temporary workaround to the "4.7GB file buckets are being loaded into RAM" problem. The workaround detects if the bucket being read is a file bucket, and copies the file into the cache using file read and file write, instead of bucket read (and thus crash). Joe Orton raised concern about the targeting of file buckets for special treatment in the disk cache, and I've been trying to find the right way to handle this. Looking at the network write filter, it also seemed to follow the logic "if file bucket, special handling (possibly SENDFILE), else read then write". So far the way that seems most elegant to me is to create a file write filter based on the network write filter, that hides all the special bucket handling magic. Still working on it though. Have you seen my patch to address this issue? IMHO, it is far less complex and less expensive than the committed workaround. Next up was the fix for thundering herd itself - this involved teaching the read-then-write loop inside disk cache to be broken into two possible paths: read from output filter stack and write to cache file, followed by read from cache file and write to network. .. The next patch, which was posted by Niklas but not committed yet, solves the problem of the lead reading the entire response, then sending it to the client as mentioned above. It involves disk_cache forking a process/thread as appropriate, splitting the read response / write to cache from the read from cache / write to network, which now run independently of each other.
I am trying to find ways of removing the need for the extra fork/thread. One option is to perform non-blocking writes in the read from cache / write to network code, only iterating to the next write if the previous write is complete (i.e. no blocking). This means the write to client will always be the same speed or slower than the read from backend, which is exactly what we want. I didn't like that sleep/fstat hackery. How about using a real file notification scheme? -- Davi Arnaut
Re: mod_disk_cache summarization
Graham Leggett wrote: Davi Arnaut wrote: Have you seen my patch to address this issue? IMHO, it is far less complex and less expensive than the committed workaround. No - I went through your patches in some detail, but I didn't see one that addressed this problem specifically. Once thundering herd is solved, I plan to commit all your refactoring patches. http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=115971884005419&w=2 I didn't like that sleep/fstat hackery. How about using a real file notification scheme? The prerequisite is that APR needs to be taught about this scheme, and it has to work portably across all platforms. AFAIK all major platforms provide one, even win32. I even made an incomplete APR abstraction for file notification: http://haxent.com/~davi/apr/notify -- Davi Arnaut
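For reference, the Linux flavour of such a file notification scheme is inotify. A minimal sketch of what would replace the sleep/fstat polling (Linux-only; the helper names are mine and error handling is trimmed):

```c
#include <poll.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Start watching 'path' for modifications; returns an inotify fd, or -1. */
static int watch_file(const char *path)
{
    int ifd = inotify_init1(IN_NONBLOCK);
    if (ifd < 0)
        return -1;
    if (inotify_add_watch(ifd, path, IN_MODIFY) < 0) {
        close(ifd);
        return -1;
    }
    return ifd;
}

/* Block up to timeout_ms waiting for a modification event.
 * Returns 1 if the file was modified, 0 on timeout, negative on error. */
static int wait_for_change(int ifd, int timeout_ms)
{
    struct pollfd p = { .fd = ifd, .events = POLLIN };
    int rc = poll(&p, 1, timeout_ms);
    if (rc <= 0)
        return rc;
    char buf[4096];
    (void)read(ifd, buf, sizeof buf);   /* drain the queued events */
    return 1;
}
```

A reader tailing a cache file being filled by another process would call wait_for_change() instead of sleeping and re-fstat()ing; kqueue (BSD) and ReadDirectoryChangesW (win32) are the equivalents an APR abstraction would have to cover.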
Re: svn commit: r454683 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/mod_rewrite.xml modules/mappers/mod_rewrite.c
Nick Kew wrote: On Tuesday 10 October 2006 13:00, Ruediger Pluem wrote: On 10/10/2006 12:33 PM, wrote: Author: niq Date: Tue Oct 10 03:33:06 2006 New Revision: 454683 URL: http://svn.apache.org/viewvc?view=rev&rev=454683 Log: Add SQL Query capability to RewriteMap Modified: httpd/httpd/trunk/modules/mappers/mod_rewrite.c URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/modules/mappers/mod_rewrite.c?view=diff&rev=454683&r1=454682&r2=454683

+static char *lookup_map_dbd(request_rec *r, char *key, const char *label)
+{
+    apr_status_t rv;
+    apr_dbd_prepared_t *stmt;
+    apr_dbd_results_t *res = NULL;
+    apr_dbd_row_t *row = NULL;
+    const char *ret = NULL;
+    int n = 0;
+    ap_dbd_t *db = dbd_acquire(r);
+
+    stmt = apr_hash_get(db->prepared, label, APR_HASH_KEY_STRING);
+
+    rv = apr_dbd_pvselect(db->driver, r->pool, db->handle, &res,
+                          stmt, 0, key, NULL);
+    if (rv != 0) {
+        ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
+                      "rewritemap: error querying for %s", key);

Missing return? res may end up NULL.. -- Davi Arnaut
Re: [PATCH] mod_disk_cache working LFS (filecopy)
Niklas Edmundsson wrote: On Sun, 1 Oct 2006, Davi Arnaut wrote: store_body: .. if (is_file_bucket(bucket)) copy_file_bucket(bucket, bb); Probably, but that doesn't allow for creating a thread/process that does the copying in the background, which is my long term goal. Also, simply doing bucket_delete like that means that the file will never be sent to the client, which is a bad thing IMO ;) Shame on me, but I said "something like".. :) I guess the attached patch does the same (plus mmap, et cetera) and is much simpler. Comments? Simpler, yes. But it only has the benefit of not eating all your memory... Well, that was the goal. Maybe we could merge this one instead and work together on the other goals. * It leaves the brigade containing the uncached entity, so it will cause the backend to first deliver stuff to be cached and then stuff to the client. Yeah, that's how mod_disk_cache works. I think it's possible to work around this limitation without using threads by keeping each cache instance with its own brigades and flushing it occasionally with non-blocking I/O. Or we could move all disk I/O to a mod_disk_cache exclusive thread pool; it could be configurable at build time whether or not to use a thread pool. Comments? * When this evolves to wanting to spawn a thread/process to do the copying you'll need the "is this a file"-thingie anyway (at least I need it, but I might have missed some nifty feature in APR). You would just need to copy the remaining buckets (granted there are no concurrency problems) and send them to a per-process thread pool. -- Davi Arnaut
Re: [PATCH] mod_disk_cache working LFS (filecopy)
Niklas Edmundsson wrote: On Mon, 2 Oct 2006, Davi Arnaut wrote: Simpler, yes. But it only has the benefit of not eating all your memory... Well, that was the goal. Maybe we could merge this one instead and work together on the other goals. As I have said before, we have a large patchset that fixes a bunch of problems. However, since the wish was to merge it in pieces, I have started breaking it into pieces for merging. Thank you. If the intent is a total redesign of mod_disk_cache, i.e. you're not interested in these patches at all, please say so and I would not have wasted a lot of work on bending our patches to get something that works when applying them one by one and then doing QA on the thing. Your work is never wasted, but the overall goal is to achieve a good code quality whether it comes from you, me or anyone - personally speaking, since I'm not an Apache committer. I'm also interested in the same features you are, but we have to think about long-term maintainability/quality to avoid half-baked short-term solutions which waste everybody's work. * It leaves the brigade containing the uncached entity, so it will cause the backend to first deliver stuff to be cached and then stuff to the client. Yeah, that's how mod_disk_cache works. I think it's possible to work around this limitation without using threads by keeping each cache instance with its own brigades and flushing it occasionally with non-blocking I/O. The replace_brigade_with_cache() function simply replaced the brigade with an instance pointing to the cached entity. Or we could move all disk I/O to a mod_disk_cache exclusive thread pool; it could be configurable at build time whether or not to use a thread pool. Comments? I would be very happy if people would fix the mod_disk_cache mess so I didn't have to. However, since no one seems to have produced something usable for our usage in the timeframe mod_disk_cache has existed, I was forced to hack on it.
I'm trying my best not to give up on having it merged, as I know that there are other sites interested in it, and now that the first trivial bit has been applied I'm hoping that people will at least look at the rest... There are bound to be cool, apachier ways to solve some of the problems, but given that our patch is stable in production and has a generally much higher code quality than mod_disk_cache (ever heard of error checking/handling?) it would be nice if people at least could look at the whole thing before starting to complain about the complexity of small parts (or code not touched by the patch, for that matter). As I said, we should always strive for the best possible quality or we are doomed to keep fixing it over and over. * When this evolves to wanting to spawn a thread/process to do the copying you'll need the "is this a file"-thingie anyway (at least I need it, but I might have missed some nifty feature in APR). You would just need to copy the remaining buckets (granted there are no concurrency problems) and send them to a per-process thread pool. And when not having threads? Non-blocking I/O when possible. -- Davi Arnaut
Re: [PATCH] mod_disk_cache working LFS (filecopy)
Ruediger Pluem wrote: On 09/26/2006 01:00 PM, Joe Orton wrote: On Tue, Sep 26, 2006 at 10:52:18AM +0200, Niklas Edmundsson wrote: This patch depends on the mod_disk_cache LFS-aware config submitted earlier and is for trunk. It makes caching of large files possible on 32bit machines by: * Realising that a file is a file and can be copied as such, without reading the whole thing into memory first. This was discussed a while back. I think this is an API problem which needs to be fixed at the API level, not something which should be worked around by adding bucket-type-specific hacks. The store_body callback needs to be able to operate like a real output filter so it can avoid the buffering-buckets-into-memory problem; as in:

while (brigade not empty)
    1. read data from bucket
    2. write data to disk
    3. pass bucket up output filter chain
    4. delete bucket

I agree. But in the case that we want to save a file bucket, can't we use sendfile? Something like ap(r)_store_file_bucket which would store it via sendfile or splice on Linux would be really neat. Of course a lot more details would need to be worked out, e.g. how and if this function changes the file bucket and what happens to the position of the src and target file pointers in the file. Linux sendfile() only works from mmapable fds to sockets. splice(): one of the two fds must refer to a pipe (afaik). tee(): both fds must refer to pipes. -- Davi Arnaut
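To illustrate the splice() constraint mentioned above: since neither end of a file-to-file copy is a pipe, the data has to bounce through one. A Linux-only sketch (the helper name is mine, error handling is minimal):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Copy from file fd 'src' to file fd 'dst' using splice(). Neither fd is
 * a pipe, so the data moves file -> pipe -> file, staying in the kernel
 * the whole way (no userspace buffer). Returns bytes copied, or -1. */
static long copy_via_splice(int src, int dst)
{
    int p[2];
    long total = 0;

    if (pipe(p) != 0)
        return -1;
    for (;;) {
        ssize_t in = splice(src, NULL, p[1], NULL, 64 * 1024, SPLICE_F_MOVE);
        if (in < 0) { total = -1; break; }
        if (in == 0) break;                        /* EOF on src */
        ssize_t off = 0;
        while (off < in) {                         /* drain the pipe */
            ssize_t out = splice(p[0], NULL, dst, NULL,
                                 (size_t)(in - off), SPLICE_F_MOVE);
            if (out <= 0) { total = -1; goto done; }
            off += out;
        }
        total += in;
    }
done:
    close(p[0]);
    close(p[1]);
    return total;
}
```

The 64KB chunk matches the default pipe capacity; a hypothetical ap(r)_store_file_bucket could use this shape on Linux and fall back to read/write loops elsewhere.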
Re: [PATCH] mod_disk_cache working LFS (filecopy)
Niklas Edmundsson wrote: On Sat, 30 Sep 2006, Davi Arnaut wrote: Hi, Wouldn't you avoid a lot of complexity in this patch if you just deleted from the brigade the implicitly created heap buckets while reading file buckets? Something like: store_body: .. if (is_file_bucket(bucket)) copy_file_bucket(bucket, bb); Probably, but that doesn't allow for creating a thread/process that does the copying in the background, which is my long term goal. Also, simply doing bucket_delete like that means that the file will never be sent to the client, which is a bad thing IMO ;) Shame on me, but I said "something like".. :) I guess the attached patch does the same (plus mmap, et cetera) and is much simpler. Comments? -- Davi Arnaut

Index: trunk/modules/cache/mod_disk_cache.c
===================================================================
--- trunk.orig/modules/cache/mod_disk_cache.c   2006-10-01 12:56:55.000000000 -0300
+++ trunk/modules/cache/mod_disk_cache.c        2006-10-01 12:57:34.000000000 -0300
@@ -984,126 +984,51 @@
     return APR_SUCCESS;
 }

+apr_status_t copy_file_bucket(apr_bucket *b, apr_file_t *fd, apr_off_t *file_size)
+{
+    apr_size_t len;
+    const char *str;
+    apr_size_t written;
+    apr_status_t rv, rc;
+    apr_bucket *ref, *e, *s;
-
-static apr_status_t copy_body(apr_pool_t *p,
-                              apr_file_t *srcfd, apr_off_t srcoff,
-                              apr_file_t *destfd, apr_off_t destoff,
-                              apr_off_t len)
-{
-    apr_status_t rc;
-    apr_size_t size;
-    apr_finfo_t finfo;
-    apr_time_t starttime = apr_time_now();
-
-    char *buf = apr_palloc(p, CACHE_BUF_SIZE);
-    if (!buf) {
-        return APR_ENOMEM;
-    }
-
-    if(srcoff != 0) {
-        rc = apr_file_seek(srcfd, APR_SET, &srcoff);
-        if(rc != APR_SUCCESS) {
-            return rc;
-        }
-    }
+    rv = apr_bucket_copy(b, &e);
-
-    if(destoff != 0) {
-        rc = apr_file_seek(destfd, APR_SET, &destoff);
-        if(rc != APR_SUCCESS) {
-            return rc;
-        }
+    if (rv != APR_SUCCESS) {
+        return rv;
     }
-
-    /* Tried doing this with mmap, but sendfile on Linux got confused when
-       sending a file while it was being written to from an mmapped area.
-       The traditional way seems to be good enough, and less complex.
-     */
-    while(len > 0) {
-        size = MIN(len, CACHE_BUF_SIZE);
+    APR_BUCKET_INIT(e);
+    s = APR_BUCKET_NEXT(e);
-
-        rc = apr_file_read_full(srcfd, buf, size, NULL);
-        if(rc != APR_SUCCESS) {
-            return rc;
-        }
+    do {
+        rv = apr_bucket_read(e, &str, &len, APR_BLOCK_READ);
-
-        rc = apr_file_write_full(destfd, buf, size, NULL);
-        if(rc != APR_SUCCESS) {
-            return rc;
+        if (rv != APR_SUCCESS && rv != APR_EOF) {
+            break;
        }
-        len -= size;
-    }
-
-    /* Check if file has changed during copying. This is not 100% foolproof
-       due to NFS attribute caching when on NFS etc. */
-    /* FIXME: Can we assume that we're always copying an entire file? In that
-       case we can check if the current filesize matches the length
-       we think it is */
-    rc = apr_file_info_get(&finfo, APR_FINFO_MTIME, srcfd);
-    if(rc != APR_SUCCESS) {
-        return rc;
-    }
-    if(starttime < finfo.mtime) {
-        return APR_EGENERAL;
-    }
-
-    return APR_SUCCESS;
-}
-
-
-static apr_status_t replace_brigade_with_cache(cache_handle_t *h,
-                                               request_rec *r,
-                                               apr_bucket_brigade *bb)
-{
-    apr_status_t rv;
-    int flags;
-    apr_bucket *e;
-    core_dir_config *pdcfg = ap_get_module_config(r->per_dir_config,
-                                                  &core_module);
-    disk_cache_object_t *dobj = (disk_cache_object_t *) h->cache_obj->vobj;
-    flags = APR_READ|APR_BINARY;
-#if APR_HAS_SENDFILE
-    flags |= ((pdcfg->enable_sendfile == ENABLE_SENDFILE_OFF)
-              ? 0 : APR_SENDFILE_ENABLED);
-#endif
+    rc = apr_file_write_full(fd, str, len, &written);
-
-    rv = apr_file_open(&dobj->fd, dobj->datafile, flags, 0, r->pool);
-    if (rv != APR_SUCCESS) {
-        ap_log_error(APLOG_MARK, APLOG_ERR, rv, r->server,
-                     "disk_cache: Error opening datafile %s for URL %s",
-                     dobj->datafile, dobj->name);
-        return rv;
-    }
+    if (rc != APR_SUCCESS) {
+        rv = rc;
+        break;
+    }
-
-    /* First, empty the brigade */
-    e = APR_BRIGADE_FIRST(bb);
-    while (e != APR_BRIGADE_SENTINEL(bb)) {
-        apr_bucket *d;
-        d = e;
+        ref = e;
+        *file_size += written;
         e = APR_BUCKET_NEXT(e);
-        apr_bucket_delete(d);
-    }
+        apr_bucket_delete(ref);
+    } while (e != s && rv == APR_SUCCESS);
-
-    /* Then, populate it with our cached instance */
-    rv = recall_body(h, r->pool, bb);
-    if (rv
Re: [PATCH] mod_disk_cache working LFS (filecopy)
On 26/09/2006, at 05:52, Niklas Edmundsson wrote: This patch depends on the mod_disk_cache LFS-aware config submitted earlier and is for trunk. It makes caching of large files possible on 32bit machines by: * Realising that a file is a file and can be copied as such, without reading the whole thing into memory first. Hi, Wouldn't you avoid a lot of complexity in this patch if you just deleted from the brigade the implicitly created heap buckets while reading file buckets? Something like:

store_body:
    ..
    if (is_file_bucket(bucket))
        copy_file_bucket(bucket, bb);

copy_file_bucket:
    while (bucket != sentinel(bb) && is_file_bucket(bucket)) {
        rv = apr_bucket_read(..)
        if (is_heap_bucket(bucket)) {
            apr_bucket *tmp = bucket;
            bucket = bucket_next(tmp);
            bucket_delete(tmp);
        }
        else
            bucket = bucket_next(bucket);
    }

IMHO, this is a lot better than all that added complexity for a corner case. -- Davi Arnaut
Re: svn commit: r450188 - /httpd/httpd/trunk/modules/cache/mod_disk_cache.c
On 26/09/2006, at 17:35, [EMAIL PROTECTED] wrote: Author: minfrin Date: Tue Sep 26 13:35:42 2006 New Revision: 450188

+
+    char *buf = apr_palloc(p, CACHE_BUF_SIZE);
+    if (!buf) {
+        return APR_ENOMEM;
+    }

IIRC, apache abort()s on memory allocation errors. -- Davi Arnaut
Re: [patch 10/16] fix up coding style issues
On 20/09/2006, at 06:20, Joe Orton wrote: Hi Davi, On Tue, Sep 19, 2006 at 11:34:03PM -0300, Davi Arnaut wrote: Clean up code style in the cache code and shrink the mod_mem_cache store_body function. The casts to/from void * are unnecessary and could just be removed rather than being whitespace-adjusted. Yes, certainly. --- modules/cache/mod_mem_cache.c.orig +++ modules/cache/mod_mem_cache.c @@ -52,27 +52,30 @@ module AP_MODULE_DECLARE_DATA mem_cache_module; -typedef enum { +typedef enum +{ The traditional style is to keep the brace on the same line for data type definitions (but not with function definitions). See e.g. grep struct httpd.h :) I used the GNU indent command line as cited at http://httpd.apache.org/dev/styleguide.html Anyway, I should have double checked the result. -- Davi Arnaut
Re: [patch 09/16] simplify array and table serialization
On 20/09/2006, at 10:16, Brian Akins wrote: Davi Arnaut wrote: Simplify the array and table serialization code, separating it from the underlying I/O operations. Probably faster to just put everything in an iovec (think writev). Probably not; apr_brigade_writev does (quite) the same. -- Davi Arnaut
Re: [patch 09/16] simplify array and table serialization
On 20/09/2006, at 11:00, Brian Akins wrote: Davi Arnaut wrote: On 20/09/2006, at 10:16, Brian Akins wrote: Davi Arnaut wrote: Simplify the array and table serialization code, separating it from the underlying I/O operations. Probably faster to just put everything in an iovec (think writev). Probably not; apr_brigade_writev does (quite) the same. Doesn't mean apr_brigade_writev does it fast either... If the serialization simply returned an iovec, mod_mem_cache could use apr_pstrcatv and mod_disk_cache could use apr_file_writev. There are no brigades _yet_; also pay attention to the comment: /* XXX this is a temporary function, it will be removed later */ -- Davi Arnaut
Re: [patch 09/16] simplify array and table serialization
On 20/09/2006, at 11:58, Davi Arnaut wrote:

On 20/09/2006, at 11:00, Brian Akins wrote:

Probably not; apr_brigade_writev does (quite) the same.

Doesn't mean apr_brigade_writev does it fast either... If the serialization simply returned an iovec, mod_mem_cache could use apr_pstrcatv and mod_disk_cache could use apr_file_writev.

There are no brigades _yet_; also, pay attention to the comment:

/* XXX this is a temporary function, it will be removed later */

And yes, once we have brigades in between, it is much better/faster to use the writev functions.

-- Davi Arnaut
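Brian's suggestion -- have the serializer return an iovec so mod_mem_cache can flatten it (apr_pstrcatv-style) and mod_disk_cache can hand it to writev -- can be sketched without APR. The header-table type and both helpers below are illustrative stand-ins, not the actual patch code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>   /* struct iovec */

/* A header table entry, standing in for apr_table_entry_t */
struct hdr { const char *key; const char *val; };

/* Serialize headers as "key:val\n" pieces into an iovec array.
 * Returns the number of iovec slots used (4 per header). */
static int hdrs_to_iovec(const struct hdr *h, int nhdrs, struct iovec *iov)
{
    int i, n = 0;
    for (i = 0; i < nhdrs; i++) {
        iov[n].iov_base = (void *) h[i].key; iov[n++].iov_len = strlen(h[i].key);
        iov[n].iov_base = (void *) ":";      iov[n++].iov_len = 1;
        iov[n].iov_base = (void *) h[i].val; iov[n++].iov_len = strlen(h[i].val);
        iov[n].iov_base = (void *) "\n";     iov[n++].iov_len = 1;
    }
    return n;
}

/* Flatten an iovec into one malloc'd, NUL-terminated block --
 * roughly what apr_pstrcatv does with a pool. */
static char *iovec_flatten(const struct iovec *iov, int n, size_t *lenp)
{
    size_t len = 0, off = 0;
    int i;
    char *buf;

    for (i = 0; i < n; i++)
        len += iov[i].iov_len;
    buf = malloc(len + 1);
    for (i = 0; i < n; i++) {
        memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }
    buf[len] = '\0';
    if (lenp)
        *lenp = len;
    return buf;
}
```

A disk-backed cache would hand the same iovec array straight to writev(2) (or apr_file_writev) instead of flattening it, which is the point of returning the iovec rather than a pre-joined buffer.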
[patch 02/16] revamped mod_disk_cache directory structure
This patch converts the mod_disk_cache cache directory structure to a uniformly distributed N-level hierarchy. The admin specifies the number of levels (or directories) and the files are scattered across the directories.

Example:

    CacheRoot /cache/
    # CacheDirLevels n l1 l2 ... ln
    CacheDirLevels 3 4 256 16

In this example, the directory tree will be three levels deep. The first level will have 4 directories, each of the 4 first-level directories will have 256 directories (second level), and so on down to the last level.

Directory tree layout for the example:

    /cache/
    /cache/0d
    /cache/0d/36
    /cache/0d/36/06
    /cache/0d/36/06/7a50a5c38a73abdb6db28a2b0f6881e5.data
    /cache/0d/36/06/7a50a5c38a73abdb6db28a2b0f6881e5.header

This patch adds support for symlinking the directories to separate disks or partitions by creating the cache files in their destination directory. The symlinks can also be used to load-balance the cache files between disks, because each last-level directory gets the same (on average) number of files -- on a cache setup with 4 first-level directories each one receives 25%; linking three of them to disk A and the last one to disk B yields a 75/25 distribution between disks A (75) and B (25).
Index: modules/cache/cache_util.c
===================================================================
--- modules/cache/cache_util.c.orig
+++ modules/cache/cache_util.c
@@ -19,6 +19,7 @@
 #include "mod_cache.h"
 #include "ap_provider.h"
+#include "util_md5.h"

 /* -------------------------------------------------------------- */

@@ -489,54 +490,49 @@
     y[sizeof(j) * 2] = '\0';
 }

-static void cache_hash(const char *it, char *val, int ndepth, int nlength)
+static unsigned int cdb_string_hash(const char *str)
 {
-    apr_md5_ctx_t context;
-    unsigned char digest[16];
-    char tmp[22];
-    int i, k, d;
-    unsigned int x;
-    static const char enc_table[64] =
-    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_@";
-
-    apr_md5_init(&context);
-    apr_md5_update(&context, (const unsigned char *) it, strlen(it));
-    apr_md5_final(digest, &context);
-
-    /* encode 128 bits as 22 characters, using a modified uuencoding
-     * the encoding is 3 bytes -> 4 characters, i.e. 128 bits is
-     * 5 x 3 bytes + 1 byte -> 5 * 4 characters + 2 characters
-     */
-    for (i = 0, k = 0; i < 15; i += 3) {
-        x = (digest[i] << 16) | (digest[i + 1] << 8) | digest[i + 2];
-        tmp[k++] = enc_table[x >> 18];
-        tmp[k++] = enc_table[(x >> 12) & 0x3f];
-        tmp[k++] = enc_table[(x >> 6) & 0x3f];
-        tmp[k++] = enc_table[x & 0x3f];
-    }
-
-    /* one byte left */
-    x = digest[15];
-    tmp[k++] = enc_table[x >> 2];        /* use up 6 bits */
-    tmp[k++] = enc_table[(x << 4) & 0x3f];
-
-    /* now split into directory levels */
-    for (i = k = d = 0; d < ndepth; ++d) {
-        memcpy(&val[i], &tmp[k], nlength);
-        k += nlength;
-        val[i + nlength] = '/';
-        i += nlength + 1;
-    }
-    memcpy(&val[i], &tmp[k], 22 - k);
-    val[i + 22 - k] = '\0';
-}
-
-CACHE_DECLARE(char *)ap_cache_generate_name(apr_pool_t *p, int dirlevels,
-                                            int dirlength, const char *name)
-{
-    char hashfile[66];
-    cache_hash(name, hashfile, dirlevels, dirlength);
-    return apr_pstrdup(p, hashfile);
+    unsigned int hash = 5381;
+
+    while (*str) {
+        hash += (hash << 5);
+        hash ^= *str++;
+    }
+
+    return hash;
+}
+
+#define MD5_HEXDIGESTSIZE (APR_MD5_DIGESTSIZE * 2 + 1)
+
+CACHE_DECLARE(char *)ap_cache_generate_name(apr_pool_t *p, unsigned int nlevels,
+                                            unsigned int *dirlevels,
+                                            unsigned int *divisors,
+                                            const char *name)
+{
+    char *md5_hash;
+    unsigned int i;
+    char *ptr, *key;
+    unsigned char h;
+    unsigned int cdb_hash;
+    static const char hex[] = "0123456789abcdef";
+
+    md5_hash = ap_md5_binary(p, (unsigned char *) name, (int) strlen(name));
+
+    cdb_hash = cdb_string_hash(name);
+
+    key = ptr = apr_palloc(p, nlevels * LEVEL_DIR_LENGTH +
+                           MD5_HEXDIGESTSIZE);
+
+    for (i = 0; i < nlevels; i++) {
+        h = (cdb_hash / divisors[i]) % dirlevels[i];
+        *ptr++ = hex[h >> 4];
+        *ptr++ = hex[h & 0xf];
+        *ptr++ = '/';
+    }
+
+    memcpy(ptr, md5_hash, MD5_HEXDIGESTSIZE);
+
+    return key;
 }

 /* Create a new table consisting of those elements from an input

Index: modules/cache/mod_cache.h
===================================================================
--- modules/cache/mod_cache.h.orig
+++ modules/cache/mod_cache.h
@@ -72,6 +72,8 @@
 #include "apr_atomic.h"

+#define LEVEL_DIR_LENGTH 3
+
 #ifndef MAX
 #define MAX(a,b) ((a) > (b) ? (a) : (b))
 #endif
@@ -274,8 +276,9 @@
 CACHE_DECLARE(apr_time_t) ap_cache_hex2usec(const char *x);
 CACHE_DECLARE(void) ap_cache_usec2hex(apr_time_t j, char *y);
-CACHE_DECLARE(char *)
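The level-directory computation in ap_cache_generate_name() above can be exercised outside APR. The sketch below is a self-contained rendition of the CDB-style string hash and the per-level index math, (hash / divisors[i]) % dirlevels[i]; the helper name make_level_prefix, the fixed output buffer, and the divisor values in the usage are illustrative, not part of the patch:

```c
#include <assert.h>
#include <string.h>

/* CDB-style string hash, as used by the patch to pick directories */
static unsigned int cdb_string_hash(const char *str)
{
    unsigned int hash = 5381;

    while (*str) {
        hash += (hash << 5);   /* hash * 33 */
        hash ^= (unsigned char) *str++;
    }
    return hash;
}

/* Build the "xx/yy/zz/" level prefix for a key. Illustrative helper,
 * not the APR code: each level index is the hash divided by that
 * level's divisor, reduced modulo the level's directory count. */
static void make_level_prefix(const char *key, unsigned int nlevels,
                              const unsigned int *dirlevels,
                              const unsigned int *divisors,
                              char *out /* >= nlevels * 3 + 1 bytes */)
{
    static const char hex[] = "0123456789abcdef";
    unsigned int i, hash = cdb_string_hash(key);
    char *ptr = out;

    for (i = 0; i < nlevels; i++) {
        unsigned char h = (hash / divisors[i]) % dirlevels[i];
        *ptr++ = hex[h >> 4];
        *ptr++ = hex[h & 0xf];
        *ptr++ = '/';
    }
    *ptr = '\0';
}
```

With dirlevels = {4, 256, 16} every key lands in one of 4 * 256 * 16 leaf directories, which is what makes the 75/25 symlink split in the patch description come out even on average. A natural (assumed) choice of divisors is the running product of the previous levels' sizes, so the levels consume different bits of the hash.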
[patch 04/16] shrink cache_url_handler
Move parts of cache_url_handler() to a smaller add_cache_filters() function that is easier to read and understand.

Index: modules/cache/mod_cache.c
===================================================================
--- modules/cache/mod_cache.c.orig
+++ modules/cache/mod_cache.c
@@ -48,6 +48,44 @@
  * oh well.
  */

+static void add_cache_filters(request_rec *r, cache_request_rec *cache)
+{
+    /*
+     * Add cache_save filter to cache this request. Choose
+     * the correct filter by checking if we are a subrequest
+     * or not.
+     */
+    if (r->main) {
+        ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS,
+                     r->server,
+                     "Adding CACHE_SAVE_SUBREQ filter for %s",
+                     r->uri);
+        ap_add_output_filter_handle(cache_save_subreq_filter_handle,
+                                    NULL, r, r->connection);
+    }
+    else {
+        ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS,
+                     r->server, "Adding CACHE_SAVE filter for %s",
+                     r->uri);
+        ap_add_output_filter_handle(cache_save_filter_handle,
+                                    NULL, r, r->connection);
+    }
+
+    ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS, r->server,
+                 "Adding CACHE_REMOVE_URL filter for %s",
+                 r->uri);
+
+    /* Add cache_remove_url filter to this request to remove a
+     * stale cache entry if needed. Also put the current cache
+     * request rec in the filter context, as the request that
+     * is available later during running the filter maybe
+     * different due to an internal redirect.
+     */
+    cache->remove_url_filter =
+        ap_add_output_filter_handle(cache_remove_url_filter_handle,
+                                    cache, r, r->connection);
+}
+
 static int cache_url_handler(request_rec *r, int lookup)
 {
     apr_status_t rv;
@@ -113,41 +151,7 @@
     if (rv != OK) {
         if (rv == DECLINED) {
             if (!lookup) {
-
-                /*
-                 * Add cache_save filter to cache this request. Choose
-                 * the correct filter by checking if we are a subrequest
-                 * or not.
-                 */
-                if (r->main) {
-                    ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS,
-                                 r->server,
-                                 "Adding CACHE_SAVE_SUBREQ filter for %s",
-                                 r->uri);
-                    ap_add_output_filter_handle(cache_save_subreq_filter_handle,
-                                                NULL, r, r->connection);
-                }
-                else {
-                    ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS,
-                                 r->server, "Adding CACHE_SAVE filter for %s",
-                                 r->uri);
-                    ap_add_output_filter_handle(cache_save_filter_handle,
-                                                NULL, r, r->connection);
-                }
-
-                ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS, r->server,
-                             "Adding CACHE_REMOVE_URL filter for %s",
-                             r->uri);
-
-                /* Add cache_remove_url filter to this request to remove a
-                 * stale cache entry if needed. Also put the current cache
-                 * request rec in the filter context, as the request that
-                 * is available later during running the filter maybe
-                 * different due to an internal redirect.
-                 */
-                cache->remove_url_filter =
-                    ap_add_output_filter_handle(cache_remove_url_filter_handle,
-                                                cache, r, r->connection);
+                add_cache_filters(r, cache);
             }
             else {
                 if (cache->stale_headers) {
--
[patch 05/16] pass uri_meets_conditions values by reference
Don't pass 'large' objects by value when not needed; it wastes time and space.

Index: modules/cache/cache_util.c
===================================================================
--- modules/cache/cache_util.c.orig
+++ modules/cache/cache_util.c
@@ -28,40 +28,41 @@
 /* Determine if url matches the hostname, scheme and port and path
  * in filter. All but the path comparisons are case-insensitive.
  */
-static int uri_meets_conditions(apr_uri_t filter, int pathlen, apr_uri_t url)
+static int uri_meets_conditions(const apr_uri_t *filter, apr_size_t pathlen,
+                                const apr_uri_t *url)
 {
     /* Compare the hostnames */
-    if (filter.hostname) {
-        if (!url.hostname) {
+    if (filter->hostname) {
+        if (!url->hostname) {
             return 0;
         }
-        else if (strcasecmp(filter.hostname, url.hostname)) {
+        else if (strcasecmp(filter->hostname, url->hostname)) {
             return 0;
         }
     }

     /* Compare the schemes */
-    if (filter.scheme) {
-        if (!url.scheme) {
+    if (filter->scheme) {
+        if (!url->scheme) {
             return 0;
         }
-        else if (strcasecmp(filter.scheme, url.scheme)) {
+        else if (strcasecmp(filter->scheme, url->scheme)) {
             return 0;
         }
     }

     /* Compare the ports */
-    if (filter.port_str) {
-        if (url.port_str && filter.port != url.port) {
+    if (filter->port_str) {
+        if (url->port_str && filter->port != url->port) {
             return 0;
         }
         /* NOTE: ap_port_of_scheme will return 0 if given NULL input */
-        else if (filter.port != apr_uri_port_of_scheme(url.scheme)) {
+        else if (filter->port != apr_uri_port_of_scheme(url->scheme)) {
             return 0;
         }
     }
-    else if (url.port_str && filter.scheme) {
-        if (apr_uri_port_of_scheme(filter.scheme) == url.port) {
+    else if (url->port_str && filter->scheme) {
+        if (apr_uri_port_of_scheme(filter->scheme) == url->port) {
             return 0;
         }
     }
@@ -69,12 +70,11 @@
     /* Url has met all of the filter conditions so far, determine
      * if the paths match.
      */
-    return !strncmp(filter.path, url.path, pathlen);
+    return !strncmp(filter->path, url->path, pathlen);
 }

 CACHE_DECLARE(cache_provider_list *)ap_cache_get_providers(request_rec *r,
-                                                  cache_server_conf *conf,
-                                                  apr_uri_t uri)
+                                                  cache_server_conf *conf)
 {
     cache_provider_list *providers = NULL;
     int i;
@@ -83,7 +83,7 @@
     for (i = 0; i < conf->cacheenable->nelts; i++) {
         struct cache_enable *ent =
             (struct cache_enable *) conf->cacheenable->elts;
-        if (uri_meets_conditions(ent[i].url, ent[i].pathlen, uri)) {
+        if (uri_meets_conditions(&ent[i].url, ent[i].pathlen, &r->parsed_uri)) {
             /* Fetch from global config and add to the list. */
             cache_provider *provider;
             provider = ap_lookup_provider(CACHE_PROVIDER_GROUP, ent[i].type,
@@ -120,7 +120,7 @@
     for (i = 0; i < conf->cachedisable->nelts; i++) {
         struct cache_disable *ent =
             (struct cache_disable *) conf->cachedisable->elts;
-        if (uri_meets_conditions(ent[i].url, ent[i].pathlen, uri)) {
+        if (uri_meets_conditions(&ent[i].url, ent[i].pathlen, &r->parsed_uri)) {
             /* Stop searching now. */
             return NULL;
         }

Index: modules/cache/mod_cache.c
===================================================================
--- modules/cache/mod_cache.c.orig
+++ modules/cache/mod_cache.c
@@ -108,7 +108,7 @@
     /*
      * Which cache module (if any) should handle this request?
      */
-    if (!(providers = ap_cache_get_providers(r, conf, r->parsed_uri))) {
+    if (!(providers = ap_cache_get_providers(r, conf))) {
         return DECLINED;
     }

Index: modules/cache/mod_cache.h
===================================================================
--- modules/cache/mod_cache.h.orig
+++ modules/cache/mod_cache.h
@@ -280,7 +280,7 @@
                                             unsigned int *dirlevels,
                                             unsigned int *divisors,
                                             const char *name);
-CACHE_DECLARE(cache_provider_list *)ap_cache_get_providers(request_rec *r, cache_server_conf *conf, apr_uri_t uri);
+CACHE_DECLARE(cache_provider_list *)ap_cache_get_providers(request_rec *r, cache_server_conf *conf);
 CACHE_DECLARE(int) ap_cache_liststr(apr_pool_t *p, const char *list,
                                     const char *key, char **val);
 CACHE_DECLARE(const char *)ap_cache_tokstr(apr_pool_t *p, const char *list, const char **str);
--
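The saving here is mechanical: apr_uri_t is a struct of a dozen-odd pointers and integers, and passing it by value copies the whole thing on every call, while a const pointer copies one machine word and still documents that the callee won't modify it. A minimal illustration (the uri_t type and same_host helper are hypothetical stand-ins, not the APR type):

```c
#include <assert.h>
#include <string.h>
#include <strings.h>   /* strcasecmp */

/* Hypothetical stand-in for apr_uri_t: several fields, so pass-by-value
 * would copy all of them on every call. */
typedef struct {
    const char *scheme;
    const char *hostname;
    const char *port_str;
    const char *path;
    unsigned short port;
} uri_t;

/* By-reference version: only one pointer is copied, and const
 * documents that the callee does not modify the caller's struct. */
static int same_host(const uri_t *a, const uri_t *b)
{
    if (!a->hostname || !b->hostname)
        return a->hostname == b->hostname;   /* match only if both absent */
    return strcasecmp(a->hostname, b->hostname) == 0;
}
```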
[patch 03/16] shrink cache_save_filter
Move parts of cache_save_filter() to a smaller check_cacheable_request() function that is easier to read and understand. Also, a useless assignment is removed.

Index: modules/cache/mod_cache.c
===================================================================
--- modules/cache/mod_cache.c.orig
+++ modules/cache/mod_cache.c
@@ -313,65 +313,11 @@
  * Finally, pass the data to the next filter (the network or whatever)
  */

-static int cache_save_filter(ap_filter_t *f, apr_bucket_brigade *in)
+static const char *check_cacheable_request(request_rec *r,
+                                           cache_request_rec *cache,
+                                           cache_server_conf *conf)
 {
-    int rv = !OK;
-    int date_in_errhdr = 0;
-    request_rec *r = f->r;
-    cache_request_rec *cache;
-    cache_server_conf *conf;
-    const char *cc_out, *cl;
-    const char *exps, *lastmods, *dates, *etag;
-    apr_time_t exp, date, lastmod, now;
-    apr_off_t size;
-    cache_info *info = NULL;
-    char *reason;
-    apr_pool_t *p;
-
-    conf = (cache_server_conf *) ap_get_module_config(r->server->module_config,
-                                                      &cache_module);
-
-    /* Setup cache_request_rec */
-    cache = (cache_request_rec *) ap_get_module_config(r->request_config,
-                                                       &cache_module);
-    if (!cache) {
-        /* user likely configured CACHE_SAVE manually; they should really use
-         * mod_cache configuration to do that
-         */
-        cache = apr_pcalloc(r->pool, sizeof(cache_request_rec));
-        ap_set_module_config(r->request_config, &cache_module, cache);
-    }
-
-    reason = NULL;
-    p = r->pool;
-    /*
-     * Pass Data to Cache
-     * ------------------
-     * This section passes the brigades into the cache modules, but only
-     * if the setup section (see below) is complete.
-     */
-    if (cache->block_response) {
-        /* We've already sent down the response and EOS. So, ignore
-         * whatever comes now.
-         */
-        return APR_SUCCESS;
-    }
-
-    /* have we already run the cachability check and set up the
-     * cached file handle?
-     */
-    if (cache->in_checked) {
-        /* pass the brigades into the cache, then pass them
-         * up the filter stack
-         */
-        rv = cache->provider->store_body(cache->handle, r, in);
-        if (rv != APR_SUCCESS) {
-            ap_log_error(APLOG_MARK, APLOG_DEBUG, rv, r->server,
-                         "cache: Cache provider's store_body failed!");
-            ap_remove_output_filter(f);
-        }
-        return ap_pass_brigade(f->next, in);
-    }
+    apr_time_t exp, lastmod;
+    const char *exps, *lastmods, *etag, *cc_out, *reason = NULL;

     /*
      * Setup Data in Cache
@@ -439,11 +385,11 @@
          * We include 304 Not Modified here too as this is the origin server
          * telling us to serve the cached copy.
          */
-        reason = apr_psprintf(p, "Response status %d", r->status);
+        reason = apr_psprintf(r->pool, "Response status %d", r->status);
     }
     else if (exps != NULL && exp == APR_DATE_BAD) {
         /* if a broken Expires header is present, don't cache it */
-        reason = apr_pstrcat(p, "Broken expires header: ", exps, NULL);
+        reason = apr_pstrcat(r->pool, "Broken expires header: ", exps, NULL);
     }
     else if (exp != APR_DATE_BAD && exp < r->request_time)
     {
@@ -522,20 +468,16 @@
         reason = "r->no_cache present";
     }

-    if (reason) {
-        ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
-                     "cache: %s not cached. Reason: %s", r->unparsed_uri,
-                     reason);
+    cache->exp = exp;
+    cache->lastmod = lastmod;

-        /* remove this filter from the chain */
-        ap_remove_output_filter(f);
-
-        /* ship the data up the stack */
-        return ap_pass_brigade(f->next, in);
-    }
+    return reason;
+}

-    /* Make it so that we don't execute this path again. */
-    cache->in_checked = 1;
+static apr_off_t request_cl(request_rec *r, apr_bucket_brigade *in)
+{
+    const char *cl;
+    apr_off_t size;

     /* Set the content length if known.
      */
@@ -580,48 +522,18 @@
         }
     }

-    /* It's safe to cache the response.
-     *
-     * There are two possiblities at this point:
-     * - cache->handle == NULL. In this case there is no previously
-     *   cached entity anywhere on the system. We must create a brand
-     *   new entity and store the response in it.
-     * - cache->stale_handle != NULL. In this case there is a stale
-     *   entity in the system which needs to be replaced by new
-     *   content (unless the result was 304 Not Modified, which means
-     *   the cached entity is actually fresh, and we should update
-     *   the headers).
-     */
-
-    /* Did we have a stale cache entry that really is stale? */
-    if (cache->stale_handle) {
-        if (r->status == HTTP_NOT_MODIFIED) {
-            /* Oh, hey. It isn't that stale! Yay! */
-            cache->handle =
[patch 08/16] dont delete empty cache directories
Don't delete empty cache directories; it is too expensive and unnecessary. They can be removed later with htcacheclean.

Index: modules/cache/mod_disk_cache.c
===================================================================
--- modules/cache/mod_disk_cache.c.orig
+++ modules/cache/mod_disk_cache.c
@@ -579,45 +579,6 @@
         }
     }

-    /* now delete directories as far as possible up to our cache root */
-    if (dobj->root) {
-        const char *str_to_copy;
-
-        str_to_copy = dobj->hdrsfile ? dobj->hdrsfile : dobj->datafile;
-        if (str_to_copy) {
-            char *dir, *slash, *q;
-
-            dir = apr_pstrdup(p, str_to_copy);
-
-            /* remove filename */
-            slash = strrchr(dir, '/');
-            *slash = '\0';
-
-            /*
-             * now walk our way back to the cache root, delete everything
-             * in the way as far as possible
-             *
-             * Note: due to the way we constructed the file names in
-             * header_file and data_file, we are guaranteed that the
-             * cache_root is suffixed by at least one '/' which will be
-             * turned into a terminating null by this loop. Therefore,
-             * we won't either delete or go above our cache root.
-             */
-            for (q = dir + dobj->root_len; *q ; ) {
-                ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, NULL,
-                             "disk_cache: Deleting directory %s from cache",
-                             dir);
-
-                rc = apr_dir_remove(dir, p);
-                if (rc != APR_SUCCESS && !APR_STATUS_IS_ENOENT(rc)) {
-                    break;
-                }
-                slash = strrchr(q, '/');
-                *slash = '\0';
-            }
-        }
-    }
-
     return OK;
 }
--
[patch 00/16] SoC 2006, cache refactoring
Hi,

My first round of patches for the cache-refactor branch. They are mostly cleanups, bug fixes, and small but useful enhancements.

The patches are also available at http://verdesmares.com/Apache/patches/

-- Davi Arnaut
[patch 01/16] dont cache expired objects
Don't cache responses with an Expires date in the past; otherwise mod_cache will always try to cache the URL. This bug might lead to numerous rename() errors on win32 if the URL was previously cached.

Index: modules/cache/mod_cache.c
===================================================================
--- modules/cache/mod_cache.c.orig
+++ modules/cache/mod_cache.c
@@ -445,6 +445,11 @@
         /* if a broken Expires header is present, don't cache it */
         reason = apr_pstrcat(p, "Broken expires header: ", exps, NULL);
     }
+    else if (exp != APR_DATE_BAD && exp < r->request_time)
+    {
+        /* if an Expires header is in the past, don't cache it */
+        reason = "Expires header already expired, not cacheable";
+    }
     else if (r->args && exps == NULL) {
         /* if query string present but no expiration time, don't cache it
          * (RFC 2616/13.9)
          */
--
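The new condition is plain date arithmetic: cache only when the Expires value parsed successfully and still lies in the future at request time. A stand-alone version with time_t (the patch itself uses apr_time_t and APR_DATE_BAD; the names below are stand-ins):

```c
#include <assert.h>
#include <time.h>

#define DATE_BAD ((time_t) -1)   /* stand-in for APR_DATE_BAD */

/* Returns 1 when an Expires value should stop us from caching:
 * it parsed correctly but is already in the past at request time.
 * An unparseable date is handled by a separate "broken header" check. */
static int expires_already_past(time_t exp, time_t request_time)
{
    return exp != DATE_BAD && exp < request_time;
}
```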
[patch 10/16] fix up coding style issues
Clean up code style in the cache code and shrink the mod_mem_cache store_body function.

Index: modules/cache/mod_mem_cache.c
===================================================================
--- modules/cache/mod_mem_cache.c.orig
+++ modules/cache/mod_mem_cache.c
@@ -52,27 +52,30 @@
 module AP_MODULE_DECLARE_DATA mem_cache_module;

-typedef enum {
+typedef enum
+{
     CACHE_TYPE_FILE = 1,
     CACHE_TYPE_HEAP,
     CACHE_TYPE_MMAP
 } cache_type_e;

-typedef struct {
-    char* hdr;
-    char* val;
+typedef struct
+{
+    char *hdr;
+    char *val;
 } cache_header_tbl_t;

-typedef struct mem_cache_object {
+typedef struct mem_cache_object
+{
     cache_type_e type;
     apr_ssize_t num_header_out;
     apr_ssize_t num_req_hdrs;
     cache_header_tbl_t *header_out;
-    cache_header_tbl_t *req_hdrs; /* for Vary negotiation */
+    cache_header_tbl_t *req_hdrs;   /* for Vary negotiation */
     apr_size_t m_len;
     void *m;
     apr_os_file_t fd;
-    apr_int32_t flags;  /* File open flags */
+    apr_int32_t flags;              /* File open flags */
     long priority;      /** the priority of this entry */
     long total_refs;    /** total number of references this entry has had */
@@ -80,14 +83,15 @@
 } mem_cache_object_t;

-typedef struct {
+typedef struct
+{
     apr_thread_mutex_t *lock;
     cache_cache_t *cache_cache;

     /* Fields set by config directives */
     apr_size_t min_cache_object_size;   /* in bytes */
     apr_size_t max_cache_object_size;   /* in bytes */
-    apr_size_t max_cache_size; /* in bytes */
+    apr_size_t max_cache_size;          /* in bytes */
     apr_size_t max_object_cnt;
     cache_pqueue_set_priority cache_remove_algorithm;
@@ -95,8 +99,11 @@
      * we haven't yet seen EOS */
     apr_off_t max_streaming_buffer_size;
 } mem_cache_conf;
+
 static mem_cache_conf *sconf;

+static void cleanup_cache_object(cache_object_t *obj);
+
 #define DEFAULT_MAX_CACHE_SIZE 100*1024
 #define DEFAULT_MIN_CACHE_OBJECT_SIZE 0
 #define DEFAULT_MAX_CACHE_OBJECT_SIZE 1
@@ -104,15 +111,6 @@
 #define DEFAULT_MAX_STREAMING_BUFFER_SIZE 10
 #define CACHEFILE_LEN 20

-/* Forward declarations */
-static int remove_entity(cache_handle_t *h);
-static apr_status_t store_headers(cache_handle_t *h, request_rec *r, cache_info *i);
-static apr_status_t store_body(cache_handle_t *h, request_rec *r, apr_bucket_brigade *b);
-static apr_status_t recall_headers(cache_handle_t *h, request_rec *r);
-static apr_status_t recall_body(cache_handle_t *h, apr_pool_t *p, apr_bucket_brigade *bb);
-
-static void cleanup_cache_object(cache_object_t *obj);
-
 static long memcache_get_priority(void*a)
 {
     cache_object_t *obj = (cache_object_t *)a;
@@ -132,31 +130,34 @@
 static void memcache_set_pos(void *a, apr_ssize_t pos)
 {
-    cache_object_t *obj = (cache_object_t *)a;
+    cache_object_t *obj = (cache_object_t *) a;
     mem_cache_object_t *mobj = obj->vobj;

     apr_atomic_set32(&mobj->pos, pos);
 }
+
 static apr_ssize_t memcache_get_pos(void *a)
 {
-    cache_object_t *obj = (cache_object_t *)a;
+    cache_object_t *obj = (cache_object_t *) a;
     mem_cache_object_t *mobj = obj->vobj;

     return apr_atomic_read32(&mobj->pos);
 }

-static apr_size_t memcache_cache_get_size(void*a)
+static apr_size_t memcache_cache_get_size(void *a)
 {
-    cache_object_t *obj = (cache_object_t *)a;
+    cache_object_t *obj = (cache_object_t *) a;
     mem_cache_object_t *mobj = obj->vobj;
     return mobj->m_len;
 }
+
 /** callback to get the key of a item */
-static const char* memcache_cache_get_key(void*a)
+static const char *memcache_cache_get_key(void *a)
 {
-    cache_object_t *obj = (cache_object_t *)a;
+    cache_object_t *obj = (cache_object_t *) a;
     return obj->key;
 }
+
 /**
  * memcache_cache_free()
  * memcache_cache_free is a callback that is only invoked by a thread
@@ -165,9 +166,9 @@
  * has been ejected from the cache. decrement the refcount and if the refcount drops
  * to 0, cleanup the cache object.
  */
-static void memcache_cache_free(void*a)
+static void memcache_cache_free(void *a)
 {
-    cache_object_t *obj = (cache_object_t *)a;
+    cache_object_t *obj = (cache_object_t *) a;

     /* Decrement the refcount to account for the object being ejected
      * from the cache. If the refcount is 0, free the object.
@@ -176,13 +177,14 @@
         cleanup_cache_object(obj);
     }
 }
+
 /*
  * functions return a 'negative' score since priority queues
  * dequeue the object with the highest value first
 */
 static long memcache_lru_algorithm(long queue_clock, void *a)
 {
-    cache_object_t *obj = (cache_object_t *)a;
+    cache_object_t *obj = (cache_object_t *) a;
     mem_cache_object_t *mobj = obj->vobj;

     if (mobj->priority == 0)
         mobj->priority = queue_clock - mobj->total_refs;
@@ -196,17 +198,17 @@
 static long memcache_gdsf_algorithm(long queue_clock, void *a)
 {
-    cache_object_t *obj = (cache_object_t *)a;
+    cache_object_t *obj =
[patch 12/16] use APR pools in mod_mem_cache
Convert mod_mem_cache to use APR memory pool functions by creating a root pool for object persistence across requests. This also eliminates the need for custom serialization code.

Index: modules/cache/mod_mem_cache.c
===================================================================
--- modules/cache/mod_mem_cache.c.orig
+++ modules/cache/mod_mem_cache.c
@@ -59,19 +59,12 @@
     CACHE_TYPE_MMAP
 } cache_type_e;

-typedef struct
-{
-    char *hdr;
-    char *val;
-} cache_header_tbl_t;
-
 typedef struct mem_cache_object
 {
+    apr_pool_t *pool;
     cache_type_e type;
-    apr_ssize_t num_header_out;
-    apr_ssize_t num_req_hdrs;
-    cache_header_tbl_t *header_out;
-    cache_header_tbl_t *req_hdrs;   /* for Vary negotiation */
+    apr_table_t *header_out;
+    apr_table_t *req_hdrs;          /* for Vary negotiation */
     apr_size_t m_len;
     void *m;
     apr_os_file_t fd;
@@ -212,19 +205,6 @@
 {
     mem_cache_object_t *mobj = obj->vobj;

-    /* TODO:
-     * We desperately need a more efficient way of allocating objects. We're
-     * making way too many malloc calls to create a fully populated
-     * cache object...
-     */
-
-    /* Cleanup the cache_object_t */
-    if (obj->key) {
-        free((void *) obj->key);
-    }
-
-    free(obj);
-
     /* Cleanup the mem_cache_object_t */
     if (mobj) {
         if (mobj->m) {
@@ -237,18 +217,9 @@
             close(mobj->fd);
 #endif
         }
-        if (mobj->header_out) {
-            if (mobj->header_out[0].hdr)
-                free(mobj->header_out[0].hdr);
-            free(mobj->header_out);
-        }
-        if (mobj->req_hdrs) {
-            if (mobj->req_hdrs[0].hdr)
-                free(mobj->req_hdrs[0].hdr);
-            free(mobj->req_hdrs);
-        }
-        free(mobj);
     }
+
+    apr_pool_destroy(mobj->pool);
 }

 static apr_status_t decrement_refcount(void *arg)
@@ -339,9 +310,10 @@
 static int create_entity(cache_handle_t *h, cache_type_e type_e,
                          request_rec *r, const char *key, apr_off_t len)
 {
+    apr_status_t rv;
+    apr_pool_t *pool;
     cache_object_t *obj, *tmp_obj;
     mem_cache_object_t *mobj;
-    apr_size_t key_len;

     if (len == -1) {
         /* Caching a streaming response. Assume the response is
@@ -375,25 +347,21 @@
         }
     }

-    /* Allocate and initialize cache_object_t */
-    obj = calloc(1, sizeof(*obj));
-    if (!obj) {
-        return DECLINED;
-    }
-    key_len = strlen(key) + 1;
-    obj->key = malloc(key_len);
-    if (!obj->key) {
-        cleanup_cache_object(obj);
+    rv = apr_pool_create(&pool, NULL);
+
+    if (rv != APR_SUCCESS) {
+        ap_log_error(APLOG_MARK, APLOG_WARNING, rv, r->server,
+                     "mem_cache: Failed to create memory pool.");
         return DECLINED;
     }
-    memcpy((void *) obj->key, key, key_len);
+
+    /* Allocate and initialize cache_object_t */
+    obj = apr_pcalloc(pool, sizeof(*obj));
+    obj->key = apr_pstrdup(pool, key);

     /* Allocate and init mem_cache_object_t */
-    mobj = calloc(1, sizeof(*mobj));
-    if (!mobj) {
-        cleanup_cache_object(obj);
-        return DECLINED;
-    }
+    mobj = apr_pcalloc(pool, sizeof(*mobj));
+    mobj->pool = pool;

     /* Finish initing the cache object */
     apr_atomic_set32(&obj->refcount, 1);
@@ -545,65 +513,6 @@
     return OK;
 }

-static apr_status_t serialize_table(cache_header_tbl_t **obj,
-                                    apr_ssize_t *nelts, apr_table_t *table)
-{
-    const apr_array_header_t *elts_arr = apr_table_elts(table);
-    apr_table_entry_t *elts = (apr_table_entry_t *) elts_arr->elts;
-    apr_ssize_t i;
-    apr_size_t len = 0;
-    apr_size_t idx = 0;
-    char *buf;
-
-    *nelts = elts_arr->nelts;
-    if (*nelts == 0) {
-        *obj = NULL;
-        return APR_SUCCESS;
-    }
-    *obj = malloc(sizeof(cache_header_tbl_t) * elts_arr->nelts);
-    if (NULL == *obj) {
-        return APR_ENOMEM;
-    }
-    for (i = 0; i < elts_arr->nelts; ++i) {
-        len += strlen(elts[i].key);
-        len += strlen(elts[i].val);
-        len += 2;  /* Extra space for NULL string terminator for key and val */
-    }
-
-    /* Transfer the headers into a contiguous memory block */
-    buf = malloc(len);
-    if (!buf) {
-        free(*obj);
-        *obj = NULL;
-        return APR_ENOMEM;
-    }
-
-    for (i = 0; i < *nelts; ++i) {
-        (*obj)[i].hdr = &buf[idx];
-        len = strlen(elts[i].key) + 1;  /* Include NULL terminator */
-        memcpy(&buf[idx], elts[i].key, len);
-        idx += len;
-
-        (*obj)[i].val = &buf[idx];
-        len = strlen(elts[i].val) + 1;
-        memcpy(&buf[idx], elts[i].val, len);
-        idx += len;
-    }
-    return APR_SUCCESS;
-}
-
-static int unserialize_table(cache_header_tbl_t *ctbl, int num_headers,
-                             apr_table_t *t)
-{
-    int i;
-
-    for (i = 0; i < num_headers; ++i) {
-        apr_table_addn(t, ctbl[i].hdr, ctbl[i].val);
-    }
-
-    return
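The payoff of the pool conversion is lifetime management: one apr_pool_destroy() releases the object, its key, and both header tables at once, which is why the hand-rolled serialize/unserialize and free() bookkeeping above can simply be deleted. The toy arena below shows that lifetime model; it is a sketch of the idea under simple assumptions (one malloc per allocation, no subpools), not APR's actual allocator:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy arena: every allocation is linked to the arena and released in
 * one shot by arena_destroy(), like apr_pool_destroy(). */
struct chunk { struct chunk *next; };
typedef struct { struct chunk *head; } arena_t;

static void *arena_alloc(arena_t *a, size_t n)
{
    struct chunk *c = malloc(sizeof(*c) + n);
    c->next = a->head;
    a->head = c;
    return c + 1;            /* usable memory starts after the link header */
}

static char *arena_strdup(arena_t *a, const char *s)
{
    size_t n = strlen(s) + 1;
    return memcpy(arena_alloc(a, n), s, n);
}

static void arena_destroy(arena_t *a)
{
    while (a->head) {
        struct chunk *next = a->head->next;
        free(a->head);
        a->head = next;
    }
}
```

A cache object, its key copy, and its header copies are all carved from one arena; dropping the object is a single arena_destroy() call, mirroring mobj->pool in the patch.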
[patch 16/16] remove duplicated defines
Remove duplicated defines.

Index: modules/cache/mod_cache.h
===================================================================
--- modules/cache/mod_cache.h.orig
+++ modules/cache/mod_cache.h
@@ -311,27 +311,6 @@

 /* hooks */

-/* Create a set of CACHE_DECLARE(type), CACHE_DECLARE_NONSTD(type) and
- * CACHE_DECLARE_DATA with appropriate export and import tags for the platform
- */
-#if !defined(WIN32)
-#define CACHE_DECLARE(type)            type
-#define CACHE_DECLARE_NONSTD(type)     type
-#define CACHE_DECLARE_DATA
-#elif defined(CACHE_DECLARE_STATIC)
-#define CACHE_DECLARE(type)            type __stdcall
-#define CACHE_DECLARE_NONSTD(type)     type
-#define CACHE_DECLARE_DATA
-#elif defined(CACHE_DECLARE_EXPORT)
-#define CACHE_DECLARE(type)            __declspec(dllexport) type __stdcall
-#define CACHE_DECLARE_NONSTD(type)     __declspec(dllexport) type
-#define CACHE_DECLARE_DATA             __declspec(dllexport)
-#else
-#define CACHE_DECLARE(type)            __declspec(dllimport) type __stdcall
-#define CACHE_DECLARE_NONSTD(type)     __declspec(dllimport) type
-#define CACHE_DECLARE_DATA             __declspec(dllimport)
-#endif
-
 APR_DECLARE_OPTIONAL_FN(apr_status_t, ap_cache_generate_key,
                         (request_rec *r, apr_pool_t *p, char **key));
--
[patch 06/16] dont choke on empty URI path
According to my reading of RFC 3986 (section 6.2.3), if a URI contains an authority component and an empty path, the empty path is to be treated as equivalent to "/". It explicitly cites the following four URIs as equivalent:

    http://example.com
    http://example.com/
    http://example.com:/
    http://example.com:80/

Index: modules/cache/cache_util.c
===================================================================
--- modules/cache/cache_util.c.orig
+++ modules/cache/cache_util.c
@@ -67,9 +67,23 @@
         }
     }

+    /* For HTTP caching purposes, an empty (NULL) path is equivalent to
+     * a single "/" path. RFCs 3986/2396
+     */
+    if (!url->path) {
+        if (*filter->path == '/' && pathlen == 1) {
+            return 1;
+        }
+        else {
+            return 0;
+        }
+    }
+
     /* Url has met all of the filter conditions so far, determine
      * if the paths match.
      */
+
     return !strncmp(filter->path, url->path, pathlen);
 }
--
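The rule the patch implements: when the URL carries an authority but no path, compare as if the path were "/". A self-contained version of just the path comparison (the function name is illustrative, not the patch's):

```c
#include <assert.h>
#include <string.h>

/* Compare a filter path against a URL path, treating a missing (NULL)
 * URL path as the single "/" path per RFC 3986 section 6.2.3. */
static int paths_match(const char *filter_path, size_t pathlen,
                       const char *url_path)
{
    if (!url_path) {
        /* empty path only matches the "/" filter */
        return filter_path[0] == '/' && pathlen == 1;
    }
    return strncmp(filter_path, url_path, pathlen) == 0;
}
```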
[patch 07/16] shrink cache_select
Move parts of cache_select() to a smaller cache_check_request() function that is easier to read and understand.

Index: modules/cache/cache_storage.c
===================================================================
--- modules/cache/cache_storage.c.orig
+++ modules/cache/cache_storage.c
@@ -169,6 +169,115 @@
     }
 }

+static int cache_check_request(cache_handle_t *h, request_rec *r,
+                               cache_request_rec *cache,
+                               cache_provider_list *list)
+{
+    int fresh;
+    char *vary = NULL;
+
+    /*
+     * Check Content-Negotiation - Vary
+     *
+     * At this point we need to make sure that the object we found in
+     * the cache is the same object that would be delivered to the
+     * client, when the effects of content negotiation are taken into
+     * effect.
+     *
+     * In plain english, we want to make sure that a language-negotiated
+     * document in one language is not given to a client asking for a
+     * language negotiated document in a different language by mistake.
+     *
+     * This code makes the assumption that the storage manager will
+     * cache the req_hdrs if the response contains a Vary
+     * header.
+     *
+     * RFC2616 13.6 and 14.44 describe the Vary mechanism.
+     */
+    vary = apr_pstrdup(r->pool, apr_table_get(h->resp_hdrs, "Vary"));
+    while (vary && *vary) {
+        char *name = vary;
+        const char *h1, *h2;
+
+        /* isolate header name */
+        while (*vary && !apr_isspace(*vary) && (*vary != ','))
+            ++vary;
+        while (*vary && (apr_isspace(*vary) || (*vary == ','))) {
+            *vary = '\0';
+            ++vary;
+        }
+
+        /*
+         * is this header in the request and the header in the cached
+         * request identical? If not, we give up and do a straight get
+         */
+        h1 = apr_table_get(r->headers_in, name);
+        h2 = apr_table_get(h->req_hdrs, name);
+        if (h1 == h2) {
+            /* both headers NULL, so a match - do nothing */
+        }
+        else if (h1 && h2 && !strcmp(h1, h2)) {
+            /* both headers exist and are equal - do nothing */
+        }
+        else {
+            /* headers do not match, so Vary failed */
+            ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS,
+                         r->server,
+                         "cache_select_url(): Vary header mismatch.");
+            return DECLINED;
+        }
+    }
+
+    cache->provider = list->provider;
+    cache->provider_name = list->provider_name;
+
+    /* Is our cached response fresh enough? */
+    fresh = ap_cache_check_freshness(h, r);
+    if (!fresh) {
+        const char *etag, *lastmod;
+
+        ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS, r->server,
+                     "Cached response for %s isn't fresh. Adding/replacing "
+                     "conditional request headers.", r->uri);
+
+        /* Make response into a conditional */
+        cache->stale_headers = apr_table_copy(r->pool, r->headers_in);
+
+        /* We can only revalidate with our own conditionals: remove the
+         * conditions from the original request.
+         */
+        apr_table_unset(r->headers_in, "If-Match");
+        apr_table_unset(r->headers_in, "If-Modified-Since");
+        apr_table_unset(r->headers_in, "If-None-Match");
+        apr_table_unset(r->headers_in, "If-Range");
+        apr_table_unset(r->headers_in, "If-Unmodified-Since");
+
+        etag = apr_table_get(h->resp_hdrs, "ETag");
+        lastmod = apr_table_get(h->resp_hdrs, "Last-Modified");
+
+        if (etag || lastmod) {
+            /* If we have a cached etag and/or Last-Modified add in
+             * our own conditionals.
+             */
+
+            if (etag) {
+                apr_table_set(r->headers_in, "If-None-Match", etag);
+            }
+
+            if (lastmod) {
+                apr_table_set(r->headers_in, "If-Modified-Since", lastmod);
+            }
+            cache->stale_handle = h;
+        }
+
+        return DECLINED;
+    }
+
+    return OK;
+}
+
 /*
  * select a specific URL entity in the cache
  *
@@ -201,110 +310,13 @@
     while (list) {
         switch ((rv = list->provider->open_entity(h, r, key))) {
         case OK: {
-            char *vary = NULL;
-            int fresh;

             if (list->provider->recall_headers(h, r) != APR_SUCCESS) {
                 /* TODO: Handle this error */
                 return DECLINED;
             }

-            /*
-             * Check Content-Negotiation - Vary
-             *
-             * At this point we need to make sure that the object we found in
-             * the cache is the same object that would be delivered to the
-             * client, when the effects of content negotiation are taken into
-             * effect.
-             *
-             * In plain english, we want to make sure that a language-negotiated
-             * document in one language is
[patch 15/16] fix a FIXME (clean cache_object_t)
Move members from cache_object_t to mem_cache_object_t that are only required for mod_mem_cache.

Index: modules/cache/mod_cache.h
===
--- modules/cache/mod_cache.h.orig
+++ modules/cache/mod_cache.h
@@ -181,23 +181,12 @@
 /* cache handle information */
-/* XXX TODO On the next structure change/MMN bump,
- * count must become an apr_off_t, representing
- * the potential size of disk cached objects.
- * Then dig for
- * XXX Bad Temporary Cast - see cache_object_t notes
- */
 typedef struct cache_object cache_object_t;
 struct cache_object {
 const char *key;
-cache_object_t *next;
 cache_info info;
 /* Opaque portion (specific to the implementation) of the cache object */
 void *vobj;
-/* FIXME: These are only required for mod_mem_cache. */
-apr_size_t count; /* Number of body bytes written to the cache so far */
-int complete;
-apr_uint32_t refcount; /* refcount and bit flag to cleanup object */
 };
 typedef struct cache_handle cache_handle_t;
Index: modules/cache/mod_mem_cache.c
===
--- modules/cache/mod_mem_cache.c.orig
+++ modules/cache/mod_mem_cache.c
@@ -71,9 +71,10 @@
 apr_int32_t flags; /* File open flags */
 long priority; /** the priority of this entry */
 long total_refs; /** total number of references this entry has had */
-apr_uint32_t pos; /** the position of this entry in the cache */
-
+apr_off_t count; /* Number of body bytes written to the cache so far */
+int complete;
+apr_uint32_t refcount; /* refcount and bit flag to cleanup object */
 } mem_cache_object_t;
 typedef struct
@@ -162,11 +163,12 @@
 static void memcache_cache_free(void *a)
 {
 cache_object_t *obj = (cache_object_t *) a;
+mem_cache_object_t *mobj = obj->vobj;
 /* Decrement the refcount to account for the object being ejected
  * from the cache. If the refcount is 0, free the object.
  */
-if (!apr_atomic_dec32(&obj->refcount)) {
+if (!apr_atomic_dec32(&mobj->refcount)) {
 cleanup_cache_object(obj);
 }
 }
@@ -225,6 +227,7 @@
 static apr_status_t decrement_refcount(void *arg)
 {
 cache_object_t *obj = (cache_object_t *) arg;
+mem_cache_object_t *mobj = obj->vobj;
 /* If obj->complete is not set, the cache update failed and the
  * object needs to be removed from the cache then cleaned up.
@@ -232,7 +235,7 @@
  * cache already, so make sure it is really still in the cache
  * before attempting to remove it.
  */
-if (!obj->complete) {
+if (!mobj->complete) {
 cache_object_t *tobj = NULL;
 if (sconf->lock) {
 apr_thread_mutex_lock(sconf->lock);
@@ -240,7 +243,7 @@
 tobj = cache_find(sconf->cache_cache, obj->key);
 if (tobj == obj) {
 cache_remove(sconf->cache_cache, obj);
-apr_atomic_dec32(&obj->refcount);
+apr_atomic_dec32(&mobj->refcount);
 }
 if (sconf->lock) {
 apr_thread_mutex_unlock(sconf->lock);
@@ -248,7 +251,7 @@
 }
 /* If the refcount drops to 0, cleanup the cache object */
-if (!apr_atomic_dec32(&obj->refcount)) {
+if (!apr_atomic_dec32(&mobj->refcount)) {
 cleanup_cache_object(obj);
 }
 return APR_SUCCESS;
@@ -257,6 +260,7 @@
 static apr_status_t cleanup_cache_mem(void *sconfv)
 {
 cache_object_t *obj;
+mem_cache_object_t *mobj;
 mem_cache_conf *co = (mem_cache_conf *) sconfv;
 if (!co) {
@@ -271,8 +275,9 @@
 }
 obj = cache_pop(co->cache_cache);
 while (obj) {
+mobj = obj->vobj;
 /* Iterate over the cache and clean up each unreferenced entry */
-if (!apr_atomic_dec32(&obj->refcount)) {
+if (!apr_atomic_dec32(&mobj->refcount)) {
 cleanup_cache_object(obj);
 }
 obj = cache_pop(co->cache_cache);
@@ -364,9 +369,9 @@
 mobj->pool = pool;
 /* Finish initing the cache object */
-apr_atomic_set32(&obj->refcount, 1);
+apr_atomic_set32(&mobj->refcount, 1);
 mobj->total_refs = 1;
-obj->complete = 0;
+mobj->complete = 0;
 obj->vobj = mobj;
 /* Safe cast: We tested sconf->max_cache_object_size above */
 mobj->m_len = (apr_size_t) len;
@@ -390,12 +395,13 @@
 tmp_obj = (cache_object_t *) cache_find(sconf->cache_cache, key);
 if (!tmp_obj) {
+mem_cache_object_t *tmp_mobj = obj->vobj;
 cache_insert(sconf->cache_cache, obj);
 /* Add a refcount to account for the reference by the
  * hashtable in the cache. Refcount should be 2 now, one
  * for this thread, and one for the cache.
  */
-apr_atomic_inc32(&obj->refcount);
+apr_atomic_inc32(&tmp_mobj->refcount);
 }
 if (sconf->lock) {
 apr_thread_mutex_unlock(sconf->lock);
@@ -434,6 +440,7 @@
 static int open_entity(cache_handle_t * h, request_rec *r, const char *key)
 {
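The ownership rule the patch preserves — each holder (a worker thread, the cache hashtable) owns one reference, and whoever drops the last one frees the object — can be sketched with C11 atomics. The refcount now lives in the provider-private object, reached through the opaque vobj pointer. Structure and function names here are illustrative, not the httpd ones:

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Provider-private part (what mem_cache_object_t holds after the patch). */
typedef struct { atomic_uint refcount; int complete; } mem_obj;
/* Shared part: only the key and the opaque pointer remain. */
typedef struct { const char *key; void *vobj; } cache_obj;

static int freed_objects = 0;   /* instrumentation for the sketch */

static void cleanup_cache_object(cache_obj *obj) {
    free(obj->vobj);
    free(obj);
    freed_objects++;
}

/* atomic_fetch_sub returns the value *before* the subtraction, so a
 * return of 1 means this caller just dropped the last reference. */
static void release(cache_obj *obj) {
    mem_obj *mobj = obj->vobj;
    if (atomic_fetch_sub(&mobj->refcount, 1) == 1)
        cleanup_cache_object(obj);
}

static cache_obj *make_obj(void) {
    cache_obj *obj = malloc(sizeof *obj);
    mem_obj *mobj = malloc(sizeof *mobj);
    atomic_init(&mobj->refcount, 2);   /* one ref for the thread, one for the cache */
    mobj->complete = 0;
    obj->key = "";
    obj->vobj = mobj;
    return obj;
}
```

APR's apr_atomic_dec32() similarly returns zero only when the count reaches zero, which is why the httpd code tests `if (!apr_atomic_dec32(...))` before cleaning up.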
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
On 14/09/2006, at 04:24, Niklas Edmundsson wrote:

On Wed, 13 Sep 2006, Davi Arnaut wrote:

I'm working on this. You may want to check my proposal at http://verdesmares.com/Apache/proposal.txt

Will it be possible to do away with one file for headers and one file for body in mod_disk_cache with this scheme?

http://verdesmares.com/Apache/patches/016.patch

The thing is that I've been pounding seriously at mod_disk_cache to make it able to sustain rather heavy load on not-so-heavy equipment, and part of that effort was to wrap headers and body into one file, mainly for the following reasons:

* Fewer files, fewer open()s (small gain)
* Much easier to purge old entries from the cache (huge gain). Simply list all files in the cache, sort by atime and remove the oldest. The old way using htcacheclean took ages and had less useful removal criteria.
* No synchronisation issues between the header file and body file; unlink one and it's gone.

That's only one of many changes made, but I found it to be crucial for an architecture that's consistent without relying on locks. This made it rather easy to implement things like serving files that are currently being cached from the cache, reusing expired cached files if the originating file is found to be unmodified, and so on. But the largest gain is still the cache cleaning process.

The stuff is used in production and seems stable; however, I haven't had any response to the first (trivial) patch sent, so I don't know if there's any interest in this.

/Nikke

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED]
---------------------------------------------------------------------------
 Does the Little Mermaid wear an algebra?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
On 14/09/2006, at 04:39, Graham Leggett wrote:

Niklas Edmundsson wrote:

Will it be possible to do away with one file for headers and one file for body in mod_disk_cache with this scheme?

This definitely has lots of advantages - however HTTP/1.1 requires that it be possible to modify the headers on a cached entry independently of the cached body. As long as this is catered for, it should be fine.

This is not a top priority since there is currently no complete support for it in mod_cache (partial responses and such), but it would be nice to have. We could easily extend the format later to support it.

-- Davi Arnaut
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
On 14/09/2006, at 05:08, Issac Goldstand wrote:

This looks familiar. I seem to remember seeing patches for this a few months back. Were they not committed to trunk? If not, is there any reason why not? I'd hate to spend serious time making modifications only to have to redo the work when this (pretty major) patchset gets committed...

Probably because I have not yet submitted it for inclusion and it's not finished. A new cache-dev branch would be really nice - hint hint :)

-- Davi Arnaut

Davi Arnaut wrote:

On 13/09/2006, at 16:29, Issac Goldstand wrote:

Hi all, I've been hacking at mod_cache a bit, and was surprised to find that part of the decision to serve previously cached content or not was being made by the backend provider and not mod_cache; specifically, the expiration date of the content seems to be checked by mod_disk_cache (as part of open_entity), and if the provider check fails, mod_cache doesn't even know about the entity (and therefore, in the case of a caching proxy, can't treat it as a possibly stale entity upon which it can just do a conditional GET and possibly get a 304, rather than requiring mod_proxy to re-request the entire entity again). When I originally started looking at the family of cache modules, I assumed that all of the decision-making logic would be in mod_cache, while the mod_xxx_cache providers would be dumb file-stores (at least, as far as mod_cache is concerned). Is this not the case?

I'm working on this. You may want to check my proposal at http://verdesmares.com/Apache/proposal.txt

If it is, would patches be acceptable if I have the time to try to rectify the situation (at least somewhat)?

http://verdesmares.com/Apache/patches/022.patch

I'm still working on it, things may change radically.

-- Davi Arnaut
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
On 14/09/2006, at 08:48, Graham Leggett wrote:

On Thu, September 14, 2006 1:42 pm, Davi Arnaut wrote:

This is not a top priority since actually there is no complete support for it in mod_cache (partial responses and such), but it would be nice to have it.

HTTP/1.1 compliance is mandatory for the cache. If it doesn't work now, it needs to be fixed.

The cache is required to send the client the most up-to-date response; that doesn't mean it must cache it. What I meant is: _if_ it causes significant slowdowns on a common cache-hit path, it is _probably_ better to just revalidate the whole entity.

-- Davi Arnaut
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
On 14/09/2006, at 09:06, Niklas Edmundsson wrote:

On Thu, 14 Sep 2006, Davi Arnaut wrote:

On 14/09/2006, at 04:24, Niklas Edmundsson wrote:

On Wed, 13 Sep 2006, Davi Arnaut wrote:

I'm working on this. You may want to check my proposal at http://verdesmares.com/Apache/proposal.txt

Will it be possible to do away with one file for headers and one file for body in mod_disk_cache with this scheme?

http://verdesmares.com/Apache/patches/016.patch

OK. You seem to dump the body right after the headers, though, so you won't be able to do header rewrites.

Could you kindly point me to the cache code that rewrites only the headers?

Also, it's rather unnecessary to call the files .cache if there is only one type of file ;)

That's for convenience; there may be other types of files in the same cache directory, created by other tools.

-- Davi Arnaut
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
On 14/09/2006, at 09:21, Davi Arnaut wrote:

On 14/09/2006, at 09:06, Niklas Edmundsson wrote:

On Thu, 14 Sep 2006, Davi Arnaut wrote:

On 14/09/2006, at 04:24, Niklas Edmundsson wrote:

On Wed, 13 Sep 2006, Davi Arnaut wrote:

I'm working on this. You may want to check my proposal at http://verdesmares.com/Apache/proposal.txt

Will it be possible to do away with one file for headers and one file for body in mod_disk_cache with this scheme?

http://verdesmares.com/Apache/patches/016.patch

OK. You seem to dump the body right after the headers, though, so you won't be able to do header rewrites.

Could you kindly point me to the cache code that rewrites only the headers?

Oops, I spoke too much too soon :)

Also, it's rather unnecessary to call the files .cache if there is only one type of file ;)

That's for convenience; there may be other types of files in the same cache directory, created by other tools.

-- Davi Arnaut
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
On 13/09/2006, at 16:29, Issac Goldstand wrote:

Hi all, I've been hacking at mod_cache a bit, and was surprised to find that part of the decision to serve previously cached content or not was being made by the backend provider and not mod_cache; specifically, the expiration date of the content seems to be checked by mod_disk_cache (as part of open_entity), and if the provider check fails, mod_cache doesn't even know about the entity (and therefore, in the case of a caching proxy, can't treat it as a possibly stale entity upon which it can just do a conditional GET and possibly get a 304, rather than requiring mod_proxy to re-request the entire entity again). When I originally started looking at the family of cache modules, I assumed that all of the decision-making logic would be in mod_cache, while the mod_xxx_cache providers would be dumb file-stores (at least, as far as mod_cache is concerned). Is this not the case?

I'm working on this. You may want to check my proposal at http://verdesmares.com/Apache/proposal.txt

If it is, would patches be acceptable if I have the time to try to rectify the situation (at least somewhat)?

http://verdesmares.com/Apache/patches/022.patch

I'm still working on it, things may change radically.

-- Davi Arnaut
Re: How to detect the mpm used via apxs?
On 19/08/2006, at 07:55, Mladen Turk wrote:

Hi, does anyone know how to detect, using apxs, which MPM is in use, so that a module can be built with or without threading code depending on whether the MPM is prefork or a threaded one like worker?

Why use apxs? mpm.h has:

#define WORKER_MPM
#define MPM_NAME "Worker"

The only way I can imagine is to parse the stdout from httpd, but IMHO something like apxs -q MPM should do the trick, like it does for CFLAGS or INCLUDES.

Regards, Mladen.
Re: svn commit: r426604 - in /httpd/httpd/branches/httpd-proxy-scoreboard: modules/proxy/ support/
On 28/07/2006, at 13:34, [EMAIL PROTECTED] wrote:

Author: jfclere
Date: Fri Jul 28 09:33:58 2006
New Revision: 426604
...

+
+static const slotmem_storage_method *checkstorage = NULL;
+static ap_slotmem_t *myscore=NULL;

Indentation consistency? myscore = NULL

+
+if (!port) {
+if (strcmp(scheme, "ajp") == 0)
+port = 8009;
+else if (strcmp(scheme, "http") == 0)
+port = 80;
+else
+port = 443;
+}

apr_uri_port_of_scheme? (it may not have the "ajp" scheme, a patch is appreciated)

+rv = apr_sockaddr_info_get(&epsv_addr, hostname, APR_INET, port, 0, pool);
+if (rv != APR_SUCCESS) {
+ap_log_error(APLOG_MARK, APLOG_ERR, 0, NULL,
+ "apr_sockaddr_info_get failed");
+apr_socket_close(newsock);
+return rv;
+}

ap_log_error(.. APLOG_ERR, rv, NULL ..) so we may have a hint why it failed

+rv = apr_socket_timeout_set(newsock, 10);
+if (rv != APR_SUCCESS) {
+ap_log_error(APLOG_MARK, APLOG_WARNING, 0, NULL,
+"apr_socket_timeout_set");
+apr_socket_close(newsock);
+return rv;

same for ap_log_error

+rv = apr_socket_connect(newsock, epsv_addr);
+if (rv != APR_SUCCESS) {
+ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, NULL,
+"apr_socket_connect failed");
+apr_socket_close(newsock);
+return rv;
+}

same for ap_log_error... and so on.

+
+if (balancer_name)
+strcpy(workerconf->balancer_name, balancer_name);
+workerconf->id = worker->id;
+workerconf->retry = worker->retry;
+workerconf->lbfactor = worker->lbfactor;
+if (worker->name)
+strcpy(workerconf->name, worker->name);
+if (worker->scheme)
+strcpy(workerconf->scheme, worker->scheme);
+if (worker->hostname)
+strcpy(workerconf->hostname, worker->hostname);
+if (worker->route)
+strcpy(workerconf->route, worker->route);
+if (worker->redirect)
+strcpy(workerconf->redirect, worker->redirect);

strncpy?

+/* allocate the data */
+*worker = apr_pcalloc(pool, sizeof(proxy_worker));
+if (workerconf->balancer_name)
+*balancer_name = apr_pcalloc(pool, strlen(workerconf->balancer_name));
+else
+*balancer_name = NULL;

allocated for what? the string is not copied. Also, shouldn't it be strlen(..) + 1?

+/* make the module usuable from outside */
+health_worker_method *health_checker_get_storage()
+{
+return(worker_storage);
+}
+
+/* handle the slotmem storage */
+void health_checker_init_slotmem_storage(slotmem_storage_method *storage)
+{
+checkstorage = storage;
+}
+slotmem_storage_method *health_checker_get_slotmem_storage()
+{
+return(checkstorage);
+}
+
+/* handle the slotmen itself */
+void health_checker_init_slotmem(ap_slotmem_t *score)
+{
+ myscore = score;
+}
+ap_slotmem_t *health_checker_get_slotmem()
+{
+return(myscore);
+}

static APR_INLINE ...

+char * ap_server_root_relative(apr_pool_t *p, const char *name)
+{
+char *fname;
+
+/* XXX: apr_filepath_merge better? */
+if (basedir && name[0] != '/') {
+fname = apr_pcalloc(p, strlen(basedir)+strlen(name)+1);
+strcpy(fname, basedir);
+strcat(fname, "/");
+strcat(fname, name);

apr_pstrcat?
[patch 00/10] SoC 2006 -- extending the cache architecture
Hi,

Here is the first code drop for my SoC proposal, "Redesigning and extending the Apache cache architecture" (http://verdesmares.com/Apache/proposal.txt). The patches are mainly cleanups and bug fixes, plus a new mod_disk_cache directory structure. Further cleanups are still required. Other than that, I changed a bit how and which storage callbacks are called.

Also, I would like to thank my mentor, Paul Querna, for his patience.

-- Davi Arnaut
[patch 02/10] revamped mod_disk_cache directory structure
This patch converts the mod_disk_cache cache directory structure to a uniformly distributed N-level hierarchy. The admin specifies the number of levels (and the number of directories per level), and the files are scattered across the directories.

Example:

CacheRoot /cache/
# CacheDirLevels n l1 l2 ... ln
CacheDirLevels 2 4 256 16

In this example, the directory tree will be three levels deep. The first level will have 4 directories, each of the 4 first-level directories will have 256 directories (second level), and so on down to the last level.

Directory tree layout for the example:

/cache/
/cache/0d
/cache/0d/36
/cache/0d/36/06
/cache/0d/36/06/7a50a5c38a73abdb6db28a2b0f6881e5.data
/cache/0d/36/06/7a50a5c38a73abdb6db28a2b0f6881e5.header

This patch adds support for symlinking the directories to separate disks or partitions by creating the cache files in their destination directory. The symlinks can also be used to load-balance the cache files between disks, because each last-level directory gets (on average) the same number of files -- on a cache setup with 4 first-level directories each one receives 25%, so linking three of them to disk A and the last one to disk B yields a 75/25 distribution between disks A (75) and B (25).
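The scattering described above can be sketched as follows: one string hash supplies an independent "digit" per level, extracted with a per-level divisor (1, l1, l1*l2, ...), and each digit is rendered as a two-hex-character directory name. This is an illustrative sketch of the idea, not the mod_disk_cache implementation (the patch precomputes the divisors and appends an MD5 hex digest as the file name):

```c
#include <string.h>

/* cdb-style string hash (h = h*33 ^ c), matching the patch's variant. */
static unsigned int string_hash(const char *str) {
    unsigned int hash = 5381;
    while (*str) {
        hash += hash << 5;                 /* hash *= 33 ... */
        hash ^= (unsigned char) *str++;    /* ... xor next byte */
    }
    return hash;
}

/* Writes "xx/yy/.../" into out; each level is a two-hex-digit directory
 * chosen uniformly among dirlevels[i] names. Assumes dirlevels[i] <= 256. */
static void level_path(const char *name, unsigned int nlevels,
                       const unsigned int *dirlevels, char *out) {
    static const char hex[] = "0123456789abcdef";
    unsigned int h = string_hash(name), divisor = 1;
    for (unsigned int i = 0; i < nlevels; i++) {
        unsigned int d = (h / divisor) % dirlevels[i];
        *out++ = hex[(d >> 4) & 0xf];
        *out++ = hex[d & 0xf];
        *out++ = '/';
        divisor *= dirlevels[i];
    }
    *out = '\0';
}
```

Because each level's digit is drawn independently and uniformly, every last-level directory receives the same expected share of files, which is what makes the symlink-based 75/25 split between disks work.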
Index: trunk/modules/cache/cache_util.c === --- trunk.orig/modules/cache/cache_util.c +++ trunk/modules/cache/cache_util.c @@ -19,6 +19,7 @@ #include mod_cache.h #include ap_provider.h +#include util_md5.h /* -- */ @@ -489,54 +490,49 @@ CACHE_DECLARE(void) ap_cache_usec2hex(ap y[sizeof(j) * 2] = '\0'; } -static void cache_hash(const char *it, char *val, int ndepth, int nlength) +static unsigned int cdb_string_hash(const char *str) { -apr_md5_ctx_t context; -unsigned char digest[16]; -char tmp[22]; -int i, k, d; -unsigned int x; -static const char enc_table[64] = -ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_@; - -apr_md5_init(context); -apr_md5_update(context, (const unsigned char *) it, strlen(it)); -apr_md5_final(digest, context); - -/* encode 128 bits as 22 characters, using a modified uuencoding - * the encoding is 3 bytes - 4 characters* i.e. 128 bits is - * 5 x 3 bytes + 1 byte - 5 * 4 characters + 2 characters - */ -for (i = 0, k = 0; i 15; i += 3) { -x = (digest[i] 16) | (digest[i + 1] 8) | digest[i + 2]; -tmp[k++] = enc_table[x 18]; -tmp[k++] = enc_table[(x 12) 0x3f]; -tmp[k++] = enc_table[(x 6) 0x3f]; -tmp[k++] = enc_table[x 0x3f]; -} - -/* one byte left */ -x = digest[15]; -tmp[k++] = enc_table[x 2];/* use up 6 bits */ -tmp[k++] = enc_table[(x 4) 0x3f]; - -/* now split into directory levels */ -for (i = k = d = 0; d ndepth; ++d) { -memcpy(val[i], tmp[k], nlength); -k += nlength; -val[i + nlength] = '/'; -i += nlength + 1; -} -memcpy(val[i], tmp[k], 22 - k); -val[i + 22 - k] = '\0'; -} - -CACHE_DECLARE(char *)ap_cache_generate_name(apr_pool_t *p, int dirlevels, -int dirlength, const char *name) -{ -char hashfile[66]; -cache_hash(name, hashfile, dirlevels, dirlength); -return apr_pstrdup(p, hashfile); +unsigned int hash = 5381; + +while (*str) { +hash += (hash 5); +hash ^= *str++; +} + +return hash; +} + +#define MD5_HEXDIGESTSIZE (APR_MD5_DIGESTSIZE * 2 + 1) + +CACHE_DECLARE(char *)ap_cache_generate_name(apr_pool_t *p, unsigned int 
nlevels, +unsigned int *dirlevels, +unsigned int *divisors, +const char *name) +{ +char *md5_hash; +unsigned int i; +char *ptr, *key; +unsigned char h; +unsigned int cdb_hash; +static const char hex[] = 0123456789abcdef; + +md5_hash = ap_md5_binary(p, (unsigned char *) name, (int) strlen(name)); + +cdb_hash = cdb_string_hash(name); + +key = ptr = apr_palloc(p, nlevels * LEVEL_DIR_LENGTH + + MD5_HEXDIGESTSIZE); + +for (i = 0; i nlevels; i++) { +h = (cdb_hash / divisors[i]) % dirlevels[i]; +*ptr++ = hex[h 4]; +*ptr++ = hex[h 0xf]; +*ptr++ = '/'; +} + +memcpy(ptr, md5_hash, MD5_HEXDIGESTSIZE); + +return key; } /* Create a new table consisting of those elements from an input Index: trunk/modules/cache/mod_cache.h === --- trunk.orig/modules/cache/mod_cache.h +++ trunk/modules/cache/mod_cache.h @@ -72,6 +72,8 @@ #include apr_atomic.h +#define LEVEL_DIR_LENGTH3 + #ifndef MAX #define MAX(a,b)((a) (b) ? (a) : (b)) #endif @@ -274,8 +276,9 @@ CACHE_DECLARE(void) ap_cache_accept_head CACHE_DECLARE(apr_time_t)
[patch 01/10] dont cache expired objects
Don't cache responses with an Expires date in the past; otherwise the cache code will _always_ try to cache the URL, leading to numerous rename() errors on win32 since the URL is already cached.

Index: trunk/modules/cache/mod_cache.c
===
--- trunk.orig/modules/cache/mod_cache.c
+++ trunk/modules/cache/mod_cache.c
@@ -426,6 +426,11 @@ static int cache_save_filter(ap_filter_t
 /* if a broken Expires header is present, don't cache it */
 reason = apr_pstrcat(p, "Broken expires header: ", exps, NULL);
 }
+else if (exps != NULL && exp != APR_DATE_BAD && exp < r->request_time)
+{
+/* if an Expires header is in the past, don't cache it */
+reason = "Expires header already expired, not cacheable";
+}
 else if (r->args && exps == NULL) {
 /* if query string present but no expiration time, don't cache it
  * (RFC 2616/13.9)
--
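The rule the patch adds sits between "Expires is unparseable" and "query string without expiry". A minimal sketch of that decision, using plain time_t instead of apr_time_t and modelling APR_DATE_BAD as (time_t)-1 (check_expires is a hypothetical helper, not the mod_cache code):

```c
#include <time.h>
#include <stddef.h>

#define DATE_BAD ((time_t) -1)

/* Returns a reason string if the response must not be cached because of
 * its Expires header, else NULL (cacheable as far as Expires goes). */
static const char *check_expires(const char *exps_header, time_t exp,
                                 time_t request_time) {
    if (exps_header != NULL && exp == DATE_BAD)
        return "Broken expires header";
    if (exps_header != NULL && exp != DATE_BAD && exp < request_time)
        return "Expires header already expired, not cacheable";
    return NULL;
}
```

Without the middle clause, an already-expired response is stored, found stale on every later request, fetched again, and stored again, which is the rename() churn the changelog describes.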
[patch 06/10] dont choke on empty URI path
According to my reading of RFC 3986 (section 6.2.3), if a URI contains an authority component and an empty path, the empty path is equivalent to "/". It explicitly cites the following four URIs as equivalent:

http://example.com
http://example.com/
http://example.com:/
http://example.com:80/

Index: trunk/modules/cache/cache_util.c
===
--- trunk.orig/modules/cache/cache_util.c
+++ trunk/modules/cache/cache_util.c
@@ -67,9 +67,23 @@ static int uri_meets_conditions(const ap
 }
 }
+/* For HTTP caching purposes, an empty (NULL) path is equivalent to
+ * a single "/" path. RFCs 3986/2396
+ */
+
+if (!url->path) {
+if (*filter->path == '/' && pathlen == 1) {
+return 1;
+}
+else {
+return 0;
+}
+}
+
 /* Url has met all of the filter conditions so far, determine
  * if the paths match.
  */
+
 return !strncmp(filter->path, url->path, pathlen);
 }
--
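The path comparison after the patch can be sketched in isolation: a NULL path (how an unset path appears in a parsed URI structure) matches only a filter of exactly "/", and otherwise the filter path is a prefix of the URI path. Illustrative sketch, not the cache_util code:

```c
#include <string.h>
#include <stddef.h>

/* NULL url_path models an empty path, per RFC 3986 section 6.2.3
 * equivalent to "/" when an authority component is present. */
static int paths_match(const char *filter_path, size_t pathlen,
                       const char *url_path) {
    if (url_path == NULL) {
        /* empty path only matches a filter of exactly "/" */
        return filter_path[0] == '/' && pathlen == 1;
    }
    return strncmp(filter_path, url_path, pathlen) == 0;
}
```

This is what lets CacheEnable for "/" catch a request for http://example.com with no trailing slash, instead of choking on the empty path.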
[patch 03/10] shrink cache_save_filter
Move parts of cache_save_filter() to a smaller check_cacheable_request() function that is easier to read and understand. Also, a useless assignment is removed.

Index: trunk/modules/cache/mod_cache.c
===
--- trunk.orig/modules/cache/mod_cache.c
+++ trunk/modules/cache/mod_cache.c
@@ -294,65 +294,11 @@ static int cache_out_filter(ap_filter_t
  * Finally, pass the data to the next filter (the network or whatever)
  */
-static int cache_save_filter(ap_filter_t *f, apr_bucket_brigade *in)
+static const char* check_cacheable_request(request_rec *r,
+cache_request_rec *cache, cache_server_conf *conf)
 {
-int rv = !OK;
-int date_in_errhdr = 0;
-request_rec *r = f->r;
-cache_request_rec *cache;
-cache_server_conf *conf;
-const char *cc_out, *cl;
-const char *exps, *lastmods, *dates, *etag;
-apr_time_t exp, date, lastmod, now;
-apr_off_t size;
-cache_info *info = NULL;
-char *reason;
-apr_pool_t *p;
-
-conf = (cache_server_conf *) ap_get_module_config(r->server->module_config,
- &cache_module);
-
-/* Setup cache_request_rec */
-cache = (cache_request_rec *) ap_get_module_config(r->request_config,
- &cache_module);
-if (!cache) {
-/* user likely configured CACHE_SAVE manually; they should really use
- * mod_cache configuration to do that
- */
-cache = apr_pcalloc(r->pool, sizeof(cache_request_rec));
-ap_set_module_config(r->request_config, &cache_module, cache);
-}
-
-reason = NULL;
-p = r->pool;
-/*
- * Pass Data to Cache
- * --
- * This section passes the brigades into the cache modules, but only
- * if the setup section (see below) is complete.
- */
-if (cache->block_response) {
-/* We've already sent down the response and EOS. So, ignore
- * whatever comes now.
- */
-return APR_SUCCESS;
-}
-
-/* have we already run the cachability check and set up the
- * cached file handle?
- */
-if (cache->in_checked) {
-/* pass the brigades into the cache, then pass them
- * up the filter stack
- */
-rv = cache->provider->store_body(cache->handle, r, in);
-if (rv != APR_SUCCESS) {
-ap_log_error(APLOG_MARK, APLOG_DEBUG, rv, r->server,
- "cache: Cache provider's store_body failed!");
-ap_remove_output_filter(f);
-}
-return ap_pass_brigade(f->next, in);
-}
+apr_time_t exp, lastmod;
+const char *exps, *lastmods, *etag, *cc_out, *reason = NULL;
 /*
  * Setup Data in Cache
@@ -420,11 +366,11 @@ static int cache_save_filter(ap_filter_t
  * We include 304 Not Modified here too as this is the origin server
  * telling us to serve the cached copy.
  */
-reason = apr_psprintf(p, "Response status %d", r->status);
+reason = apr_psprintf(r->pool, "Response status %d", r->status);
 }
 else if (exps != NULL && exp == APR_DATE_BAD) {
 /* if a broken Expires header is present, don't cache it */
-reason = apr_pstrcat(p, "Broken expires header: ", exps, NULL);
+reason = apr_pstrcat(r->pool, "Broken expires header: ", exps, NULL);
 }
 else if (exps != NULL && exp != APR_DATE_BAD && exp < r->request_time)
 {
@@ -503,20 +449,16 @@ static int cache_save_filter(ap_filter_t
 reason = "r->no_cache present";
 }
-if (reason) {
-ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
- "cache: %s not cached. Reason: %s", r->unparsed_uri,
- reason);
+cache->exp = exp;
+cache->lastmod = lastmod;
-/* remove this filter from the chain */
-ap_remove_output_filter(f);
-
-/* ship the data up the stack */
-return ap_pass_brigade(f->next, in);
-}
+return reason;
+}
-/* Make it so that we don't execute this path again. */
-cache->in_checked = 1;
+static apr_off_t request_cl(request_rec *r, apr_bucket_brigade *in)
+{
+const char *cl;
+apr_off_t size;
 /* Set the content length if known.
  */
@@ -561,48 +503,18 @@ static int cache_save_filter(ap_filter_t
 }
 }
-/* It's safe to cache the response.
- *
- * There are two possiblities at this point:
- * - cache->handle == NULL. In this case there is no previously
- * cached entity anywhere on the system. We must create a brand
- * new entity and store the response in it.
- * - cache->stale_handle != NULL. In this case there is a stale
- * entity in the system which needs to be replaced by new
- * content (unless the result was 304 Not Modified, which means
- * the cached entity is actually fresh, and we should update
- * the headers).
- */
-
-/* Did we have a stale cache entry
[patch 05/10] pass uri_meets_conditions values by reference
Don't pass 'large' objects by value when not needed; it wastes time and space.

Index: trunk/modules/cache/cache_util.c
===
--- trunk.orig/modules/cache/cache_util.c
+++ trunk/modules/cache/cache_util.c
@@ -28,40 +28,41 @@ extern module AP_MODULE_DECLARE_DATA cac
 /* Determine if url matches the hostname, scheme and port and path
  * in filter. All but the path comparisons are case-insensitive.
  */
-static int uri_meets_conditions(apr_uri_t filter, int pathlen, apr_uri_t url)
+static int uri_meets_conditions(const apr_uri_t *filter, apr_size_t pathlen,
+const apr_uri_t *url)
 {
 /* Compare the hostnames */
-if(filter.hostname) {
-if (!url.hostname) {
+if(filter->hostname) {
+if (!url->hostname) {
 return 0;
 }
-else if (strcasecmp(filter.hostname, url.hostname)) {
+else if (strcasecmp(filter->hostname, url->hostname)) {
 return 0;
 }
 }
 /* Compare the schemes */
-if(filter.scheme) {
-if (!url.scheme) {
+if(filter->scheme) {
+if (!url->scheme) {
 return 0;
 }
-else if (strcasecmp(filter.scheme, url.scheme)) {
+else if (strcasecmp(filter->scheme, url->scheme)) {
 return 0;
 }
 }
 /* Compare the ports */
-if(filter.port_str) {
-if (url.port_str && filter.port != url.port) {
+if(filter->port_str) {
+if (url->port_str && filter->port != url->port) {
 return 0;
 }
 /* NOTE: ap_port_of_scheme will return 0 if given NULL input */
-else if (filter.port != apr_uri_port_of_scheme(url.scheme)) {
+else if (filter->port != apr_uri_port_of_scheme(url->scheme)) {
 return 0;
 }
 }
-else if(url.port_str && filter.scheme) {
-if (apr_uri_port_of_scheme(filter.scheme) == url.port) {
+else if(url->port_str && filter->scheme) {
+if (apr_uri_port_of_scheme(filter->scheme) == url->port) {
 return 0;
 }
 }
@@ -69,12 +70,11 @@ static int uri_meets_conditions(apr_uri_
 /* Url has met all of the filter conditions so far, determine
  * if the paths match.
  */
-return !strncmp(filter.path, url.path, pathlen);
+return !strncmp(filter->path, url->path, pathlen);
 }
 CACHE_DECLARE(cache_provider_list *)ap_cache_get_providers(request_rec *r,
- cache_server_conf *conf,
- apr_uri_t uri)
+ cache_server_conf *conf)
 {
 cache_provider_list *providers = NULL;
 int i;
@@ -83,7 +83,7 @@ CACHE_DECLARE(cache_provider_list *)ap_c
 for (i = 0; i < conf->cacheenable->nelts; i++) {
 struct cache_enable *ent =
 (struct cache_enable *)conf->cacheenable->elts;
-if (uri_meets_conditions(ent[i].url, ent[i].pathlen, uri)) {
+if (uri_meets_conditions(&ent[i].url, ent[i].pathlen, &r->parsed_uri)) {
 /* Fetch from global config and add to the list. */
 cache_provider *provider;
 provider = ap_lookup_provider(CACHE_PROVIDER_GROUP, ent[i].type,
@@ -120,7 +120,7 @@ CACHE_DECLARE(cache_provider_list *)ap_c
 for (i = 0; i < conf->cachedisable->nelts; i++) {
 struct cache_disable *ent =
 (struct cache_disable *)conf->cachedisable->elts;
-if (uri_meets_conditions(ent[i].url, ent[i].pathlen, uri)) {
+if (uri_meets_conditions(&ent[i].url, ent[i].pathlen, &r->parsed_uri)) {
 /* Stop searching now. */
 return NULL;
 }
Index: trunk/modules/cache/mod_cache.c
===
--- trunk.orig/modules/cache/mod_cache.c
+++ trunk/modules/cache/mod_cache.c
@@ -106,7 +106,7 @@ static int cache_url_handler(request_rec
 /*
  * Which cache module (if any) should handle this request?
  */
-if (!(providers = ap_cache_get_providers(r, conf, r->parsed_uri))) {
+if (!(providers = ap_cache_get_providers(r, conf))) {
 return DECLINED;
 }
Index: trunk/modules/cache/mod_cache.h
===
--- trunk.orig/modules/cache/mod_cache.h
+++ trunk/modules/cache/mod_cache.h
@@ -280,7 +280,7 @@ CACHE_DECLARE(char *) ap_cache_generate_
 unsigned int *dirlevels,
 unsigned int *divisors,
 const char *name);
-CACHE_DECLARE(cache_provider_list *)ap_cache_get_providers(request_rec *r, cache_server_conf *conf, apr_uri_t uri);
+CACHE_DECLARE(cache_provider_list *)ap_cache_get_providers(request_rec *r, cache_server_conf *conf);
 CACHE_DECLARE(int) ap_cache_liststr(apr_pool_t *p, const char *list,
[patch 04/10] shrink cache_url_handler
Move parts of cache_url_handler() to a smaller add_cache_filters() function that is easier to read and understand.

Index: trunk/modules/cache/mod_cache.c
===
--- trunk.orig/modules/cache/mod_cache.c
+++ trunk/modules/cache/mod_cache.c
@@ -48,6 +48,44 @@ static ap_filter_rec_t *cache_remove_url
  * oh well.
  */
+static void add_cache_filters(request_rec *r, cache_request_rec *cache)
+{
+/*
+ * Add cache_save filter to cache this request. Choose
+ * the correct filter by checking if we are a subrequest
+ * or not.
+ */
+if (r->main) {
+ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS,
+r->server,
+"Adding CACHE_SAVE_SUBREQ filter for %s",
+r->uri);
+ap_add_output_filter_handle(cache_save_subreq_filter_handle,
+NULL, r, r->connection);
+}
+else {
+ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS,
+r->server, "Adding CACHE_SAVE filter for %s",
+r->uri);
+ap_add_output_filter_handle(cache_save_filter_handle,
+NULL, r, r->connection);
+}
+
+ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS, r->server,
+"Adding CACHE_REMOVE_URL filter for %s",
+r->uri);
+
+/* Add cache_remove_url filter to this request to remove a
+ * stale cache entry if needed. Also put the current cache
+ * request rec in the filter context, as the request that
+ * is available later during running the filter maybe
+ * different due to an internal redirect.
+ */
+cache->remove_url_filter =
+ap_add_output_filter_handle(cache_remove_url_filter_handle,
+cache, r, r->connection);
+}
+
 static int cache_url_handler(request_rec *r, int lookup)
 {
 apr_status_t rv;
@@ -111,41 +149,7 @@ static int cache_url_handler(request_rec
 if (rv != OK) {
 if (rv == DECLINED) {
 if (!lookup) {
-
-/*
- * Add cache_save filter to cache this request. Choose
- * the correct filter by checking if we are a subrequest
- * or not.
- */
-if (r->main) {
-ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS,
- r->server,
- "Adding CACHE_SAVE_SUBREQ filter for %s",
- r->uri);
-ap_add_output_filter_handle(cache_save_subreq_filter_handle,
-NULL, r, r->connection);
-}
-else {
-ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS,
- r->server, "Adding CACHE_SAVE filter for %s",
- r->uri);
-ap_add_output_filter_handle(cache_save_filter_handle,
-NULL, r, r->connection);
-}
-
-ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS, r->server,
- "Adding CACHE_REMOVE_URL filter for %s",
- r->uri);
-
-/* Add cache_remove_url filter to this request to remove a
- * stale cache entry if needed. Also put the current cache
- * request rec in the filter context, as the request that
- * is available later during running the filter maybe
- * different due to an internal redirect.
- */
-cache->remove_url_filter =
-ap_add_output_filter_handle(cache_remove_url_filter_handle,
-cache, r, r->connection);
+add_cache_filters(r, cache);
 }
 else {
 if (cache->stale_headers) {
--
[patch 08/10] unify storage providers entry calls
Later on, these will handle all the HTTP-specific caching nuances.

Index: trunk/modules/cache/cache_storage.c
===
--- trunk.orig/modules/cache/cache_storage.c
+++ trunk/modules/cache/cache_storage.c
@@ -24,6 +24,30 @@ extern module AP_MODULE_DECLARE_DATA cac

 /* -- */

+apr_status_t cache_store_entity_headers(cache_request_rec *cache,
+                                        request_rec *r, cache_info *info)
+{
+    return cache->provider->store_headers(cache->handle, r, info);
+}
+
+apr_status_t cache_store_entity_body(cache_request_rec *cache, request_rec *r,
+                                     apr_bucket_brigade *bb)
+{
+    return cache->provider->store_body(cache->handle, r, bb);
+}
+
+apr_status_t cache_recall_entity_headers(const cache_provider *provider,
+                                         cache_handle_t *h, request_rec *r)
+{
+    return provider->recall_headers(h, r);
+}
+
+apr_status_t cache_recall_entity_body(cache_request_rec *cache,
+                                      apr_pool_t *pool, apr_bucket_brigade *bb)
+{
+    return cache->provider->recall_body(cache->handle, pool, bb);
+}
+
 /*
  * delete all URL entities from the cache
  *
@@ -311,7 +335,7 @@ int cache_select(request_rec *r)
         switch ((rv = list->provider->open_entity(h, r, key))) {
         case OK: {
-            if (list->provider->recall_headers(h, r) != APR_SUCCESS) {
+            if (cache_recall_entity_headers(list->provider, h, r) != APR_SUCCESS) {
                 /* TODO: Handle this error */
                 return DECLINED;
             }

Index: trunk/modules/cache/mod_cache.c
===
--- trunk.orig/modules/cache/mod_cache.c
+++ trunk/modules/cache/mod_cache.c
@@ -271,7 +271,7 @@ static int cache_out_filter(ap_filter_t
     r->status = cache->handle->cache_obj->info.status;

     /* recall_headers() was called in cache_select() */
-    cache->provider->recall_body(cache->handle, r->pool, bb);
+    cache_recall_entity_body(cache, r->pool, bb);

     /* This filter is done once it has served up its content */
     ap_remove_output_filter(f);
@@ -676,7 +676,7 @@ static int cache_save_filter(ap_filter_t
     /* pass the brigades into the cache, then pass them
      * up the filter stack */
-    rv = cache->provider->store_body(cache->handle, r, in);
+    rv = cache_store_entity_body(cache, r, in);
     if (rv != APR_SUCCESS) {
         ap_log_error(APLOG_MARK, APLOG_DEBUG, rv, r->server,
                      "cache: Cache provider's store_body failed!");
@@ -781,7 +781,7 @@ static int cache_save_filter(ap_filter_t
      * permissions problems or a read-only (re)mount. This must be handled
      * later.
      */
-    rv = cache->provider->store_headers(cache->handle, r, info);
+    rv = cache_store_entity_headers(cache, r, info);

     /* Did we just update the cached headers on a revalidated response?
      *
@@ -809,7 +809,7 @@ static int cache_save_filter(ap_filter_t
         APR_BRIGADE_INSERT_TAIL(bb, bkt);
     }
     else {
-        cache->provider->recall_body(cache->handle, r->pool, bb);
+        cache_recall_entity_body(cache, r->pool, bb);
     }
     cache->block_response = 1;
@@ -845,7 +845,7 @@ static int cache_save_filter(ap_filter_t
         return ap_pass_brigade(f->next, in);
     }
-    rv = cache->provider->store_body(cache->handle, r, in);
+    rv = cache_store_entity_body(cache, r, in);
     if (rv != APR_SUCCESS) {
         ap_log_error(APLOG_MARK, APLOG_DEBUG, rv, r->server,
                      "cache: store_body failed");

Index: trunk/modules/cache/mod_cache.h
===
--- trunk.orig/modules/cache/mod_cache.h
+++ trunk/modules/cache/mod_cache.h
@@ -305,13 +305,11 @@ apr_status_t cache_generate_key_default(
  */
 const char* cache_create_key( request_rec*r );

-/*
-apr_status_t cache_store_entity_headers(cache_handle_t *h, request_rec *r, cache_info *info);
-apr_status_t cache_store_entity_body(cache_handle_t *h, request_rec *r, apr_bucket_brigade *bb);
-
-apr_status_t cache_recall_entity_headers(cache_handle_t *h, request_rec *r);
-apr_status_t cache_recall_entity_body(cache_handle_t *h, apr_pool_t *p, apr_bucket_brigade *bb);
-*/
+apr_status_t cache_store_entity_headers(cache_request_rec *cache, request_rec *r, cache_info *info);
+apr_status_t cache_store_entity_body(cache_request_rec *cache, request_rec *r, apr_bucket_brigade *bb);
+
+apr_status_t cache_recall_entity_headers(const cache_provider *provider, cache_handle_t *h, request_rec *r);
+apr_status_t cache_recall_entity_body(cache_request_rec *cache, apr_pool_t *pool, apr_bucket_brigade *bb);

 /* hooks */
--
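The point of these wrappers is that every store/recall now funnels through one function, so HTTP-specific logic can later be added in a single place instead of at every call site. A minimal standalone sketch of that indirection; all type and function names here are illustrative stand-ins, not the real mod_cache API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the mod_cache types: a provider
 * exposes raw callbacks, and thin wrapper functions give the cache core a
 * single entry point per operation. */
typedef int cache_status_t;            /* stand-in for apr_status_t */
#define CACHE_OK 0

typedef struct cache_provider {
    cache_status_t (*store_headers)(void *handle, const char *key);
    cache_status_t (*store_body)(void *handle, const char *data);
} cache_provider;

typedef struct cache_request {
    const cache_provider *provider;
    void *handle;
} cache_request;

/* Wrapper: all store_headers calls funnel through here, so protocol-specific
 * checks could later be added once instead of at every call site. */
static cache_status_t cache_store_entity_headers(cache_request *c,
                                                 const char *key)
{
    return c->provider->store_headers(c->handle, key);
}

/* A trivial provider implementation used only for illustration. */
static int headers_stored = 0;
static cache_status_t dummy_store_headers(void *handle, const char *key)
{
    (void)handle; (void)key;
    headers_stored = 1;           /* record that the callback was reached */
    return CACHE_OK;
}
```

The call sites only name the wrapper; swapping the provider (disk, memory) changes nothing above this line.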
[patch 09/10] move open_entity
Move the open_entity call into cache_recall_entity_headers, adding a new urlkey parameter.

Index: trunk/modules/cache/cache_storage.c
===
--- trunk.orig/modules/cache/cache_storage.c
+++ trunk/modules/cache/cache_storage.c
@@ -37,8 +37,16 @@ apr_status_t cache_store_entity_body(cac
 }

 apr_status_t cache_recall_entity_headers(const cache_provider *provider,
-                                         cache_handle_t *h, request_rec *r)
+                                         cache_handle_t *h, request_rec *r,
+                                         const char *urlkey)
 {
+    apr_status_t rv;
+
+    rv = provider->open_entity(h, r, urlkey);
+
+    if (rv != OK)
+        return rv;
+
     return provider->recall_headers(h, r);
 }

@@ -332,14 +340,9 @@ int cache_select(request_rec *r)
     list = cache->providers;

     while (list) {
-        switch ((rv = list->provider->open_entity(h, r, key))) {
-        case OK: {
-
-            if (cache_recall_entity_headers(list->provider, h, r) != APR_SUCCESS) {
-                /* TODO: Handle this error */
-                return DECLINED;
-            }
+        rv = cache_recall_entity_headers(list->provider, h, r, key);

+        if (rv == APR_SUCCESS) {
             if (cache_check_request(h, r, cache, list) != OK) {
                 return DECLINED;
             }
@@ -350,16 +353,11 @@ int cache_select(request_rec *r)
             cache->handle = h;
             return OK;
         }
-        case DECLINED: {
+        else {
             /* try again with next cache type */
             list = list->next;
             continue;
         }
-        default: {
-            /* oo-er! an error */
-            return rv;
-        }
-        }
     }
     return DECLINED;
 }

Index: trunk/modules/cache/mod_cache.h
===
--- trunk.orig/modules/cache/mod_cache.h
+++ trunk/modules/cache/mod_cache.h
@@ -308,7 +308,8 @@ const char* cache_create_key( request_re
 apr_status_t cache_store_entity_headers(cache_request_rec *cache, request_rec *r, cache_info *info);
 apr_status_t cache_store_entity_body(cache_request_rec *cache, request_rec *r, apr_bucket_brigade *bb);

-apr_status_t cache_recall_entity_headers(const cache_provider *provider, cache_handle_t *h, request_rec *r);
+apr_status_t cache_recall_entity_headers(const cache_provider *provider, cache_handle_t *h, request_rec *r,
+                                         const char *urlkey);
 apr_status_t cache_recall_entity_body(cache_request_rec *cache, apr_pool_t *pool, apr_bucket_brigade *bb);

 /* hooks */
--
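The reshaped cache_select() loop relies on the wrapper mapping an open failure straight to its status, so a miss simply falls through to the next provider. A toy sketch of that open-then-recall composition; the stub provider and all names are hypothetical:

```c
#include <assert.h>

#define OK 0
#define DECLINED (-1)

/* Hypothetical provider with separate open/recall steps, and a wrapper that
 * composes them the way cache_recall_entity_headers() does in the patch:
 * open the entity first and short-circuit on failure. */
typedef struct provider {
    int (*open_entity)(const char *urlkey);
    int (*recall_headers)(void);
} provider;

static int recall_entity_headers(const provider *p, const char *urlkey)
{
    int rv = p->open_entity(urlkey);   /* locate the cached entity */
    if (rv != OK)
        return rv;                     /* e.g. DECLINED: no such entry */
    return p->recall_headers();        /* only recall once open succeeded */
}

/* Illustration-only provider: pretend only keys starting with 'h' exist. */
static int open_stub(const char *urlkey)
{
    return urlkey[0] == 'h' ? OK : DECLINED;
}
static int recall_stub(void)
{
    return OK;
}
```

The caller no longer needs a three-way switch; one status value covers both the open and the recall step.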
[patch 07/10] shrink cache_select
Move parts of cache_select() to a smaller cache_check_request() function that is easier to read and understand.

Index: trunk/modules/cache/cache_storage.c
===
--- trunk.orig/modules/cache/cache_storage.c
+++ trunk/modules/cache/cache_storage.c
@@ -169,6 +169,115 @@ CACHE_DECLARE(void) ap_cache_accept_head
     }
 }

+static int cache_check_request(cache_handle_t *h, request_rec *r,
+                               cache_request_rec *cache,
+                               cache_provider_list *list)
+{
+    int fresh;
+    char *vary = NULL;
+
+    /*
+     * Check Content-Negotiation - Vary
+     *
+     * At this point we need to make sure that the object we found in
+     * the cache is the same object that would be delivered to the
+     * client, when the effects of content negotiation are taken into
+     * effect.
+     *
+     * In plain english, we want to make sure that a language-negotiated
+     * document in one language is not given to a client asking for a
+     * language negotiated document in a different language by mistake.
+     *
+     * This code makes the assumption that the storage manager will
+     * cache the req_hdrs if the response contains a Vary
+     * header.
+     *
+     * RFC2616 13.6 and 14.44 describe the Vary mechanism.
+     */
+    vary = apr_pstrdup(r->pool, apr_table_get(h->resp_hdrs, "Vary"));
+    while (vary && *vary) {
+        char *name = vary;
+        const char *h1, *h2;
+
+        /* isolate header name */
+        while (*vary && !apr_isspace(*vary) && (*vary != ','))
+            ++vary;
+        while (*vary && (apr_isspace(*vary) || (*vary == ','))) {
+            *vary = '\0';
+            ++vary;
+        }
+
+        /*
+         * is this header in the request and the header in the cached
+         * request identical? If not, we give up and do a straight get
+         */
+        h1 = apr_table_get(r->headers_in, name);
+        h2 = apr_table_get(h->req_hdrs, name);
+        if (h1 == h2) {
+            /* both headers NULL, so a match - do nothing */
+        }
+        else if (h1 && h2 && !strcmp(h1, h2)) {
+            /* both headers exist and are equal - do nothing */
+        }
+        else {
+            /* headers do not match, so Vary failed */
+            ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS,
+                         r->server,
+                         "cache_select_url(): Vary header mismatch.");
+            return DECLINED;
+        }
+    }
+
+    cache->provider = list->provider;
+    cache->provider_name = list->provider_name;
+
+    /* Is our cached response fresh enough? */
+    fresh = ap_cache_check_freshness(h, r);
+    if (!fresh) {
+        const char *etag, *lastmod;
+
+        ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS, r->server,
+                     "Cached response for %s isn't fresh. Adding/replacing "
+                     "conditional request headers.", r->uri);
+
+        /* Make response into a conditional */
+        cache->stale_headers = apr_table_copy(r->pool,
+                                              r->headers_in);
+
+        /* We can only revalidate with our own conditionals: remove the
+         * conditions from the original request.
+         */
+        apr_table_unset(r->headers_in, "If-Match");
+        apr_table_unset(r->headers_in, "If-Modified-Since");
+        apr_table_unset(r->headers_in, "If-None-Match");
+        apr_table_unset(r->headers_in, "If-Range");
+        apr_table_unset(r->headers_in, "If-Unmodified-Since");
+
+        etag = apr_table_get(h->resp_hdrs, "ETag");
+        lastmod = apr_table_get(h->resp_hdrs, "Last-Modified");
+
+        if (etag || lastmod) {
+            /* If we have a cached etag and/or Last-Modified add in
+             * our own conditionals.
+             */
+            if (etag) {
+                apr_table_set(r->headers_in, "If-None-Match", etag);
+            }
+
+            if (lastmod) {
+                apr_table_set(r->headers_in, "If-Modified-Since",
+                              lastmod);
+            }
+            cache->stale_handle = h;
+        }
+
+        return DECLINED;
+    }
+
+    return OK;
+}
+
 /*
  * select a specific URL entity in the cache
  *
@@ -201,110 +310,13 @@ int cache_select(request_rec *r)
     while (list) {
         switch ((rv = list->provider->open_entity(h, r, key))) {
         case OK: {
-            char *vary = NULL;
-            int fresh;

             if (list->provider->recall_headers(h, r) != APR_SUCCESS) {
                 /* TODO: Handle this error */
                 return DECLINED;
             }

-            /*
-             * Check Content-Negotiation - Vary
-             *
-             * At this point we need to make sure that the object we found in
-             * the cache is the same object that would be delivered to the
-             * client, when the effects of content negotiation are taken into
-             * effect.
-             *
-             * In plain english, we
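The Vary comparison inside cache_check_request() can be sketched in isolation: for every header named in the cached response's Vary field, the value in the current request must equal the value recorded with the cached request, with "both absent" also counting as a match. A simplified standalone version using libc instead of APR tables; the lookup callbacks stand in for apr_table_get() and are purely illustrative:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Lookup callback standing in for apr_table_get() on a header table. */
typedef const char *(*header_lookup)(const char *name);

/* Returns 1 if every header named in `vary` matches between the current
 * request and the cached request, 0 on the first mismatch. */
static int vary_headers_match(const char *vary, header_lookup req,
                              header_lookup cached_req)
{
    char name[64];
    size_t i;

    while (vary && *vary) {
        /* isolate one comma/space separated header name */
        while (*vary && (isspace((unsigned char)*vary) || *vary == ','))
            ++vary;
        for (i = 0; *vary && !isspace((unsigned char)*vary) && *vary != ','
                 && i < sizeof(name) - 1; ++i)
            name[i] = *vary++;
        name[i] = '\0';
        if (i == 0)
            break;

        {
            const char *h1 = req(name);
            const char *h2 = cached_req(name);
            if (h1 == NULL && h2 == NULL)
                ;                           /* both absent: match */
            else if (h1 && h2 && strcmp(h1, h2) == 0)
                ;                           /* both present and equal */
            else
                return 0;                   /* Vary mismatch */
        }
    }
    return 1;
}

/* Illustration-only request header tables. */
static const char *req_en(const char *n)
{ return strcmp(n, "Accept-Language") == 0 ? "en" : NULL; }
static const char *req_de(const char *n)
{ return strcmp(n, "Accept-Language") == 0 ? "de" : NULL; }
```

A mismatch means the cached variant was negotiated for a different request, so the cache must decline and fetch fresh content.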
[patch 10/10] combine open_entity and recall_headers callbacks
An open_entity call that is never paired with a close_entity is pointless; the two callbacks can be collapsed into a single call.

Index: trunk/modules/cache/cache_storage.c
===
--- trunk.orig/modules/cache/cache_storage.c
+++ trunk/modules/cache/cache_storage.c
@@ -40,14 +40,7 @@ apr_status_t cache_recall_entity_headers
                                          cache_handle_t *h, request_rec *r,
                                          const char *urlkey)
 {
-    apr_status_t rv;
-
-    rv = provider->open_entity(h, r, urlkey);
-
-    if (rv != OK)
-        return rv;
-
-    return provider->recall_headers(h, r);
+    return provider->recall_headers(h, r, urlkey);
 }

 apr_status_t cache_recall_entity_body(cache_request_rec *cache,

Index: trunk/modules/cache/mod_cache.h
===
--- trunk.orig/modules/cache/mod_cache.h
+++ trunk/modules/cache/mod_cache.h
@@ -213,12 +213,10 @@ typedef struct {
     int (*remove_entity) (cache_handle_t *h);
     apr_status_t (*store_headers)(cache_handle_t *h, request_rec *r, cache_info *i);
     apr_status_t (*store_body)(cache_handle_t *h, request_rec *r, apr_bucket_brigade *b);
-    apr_status_t (*recall_headers) (cache_handle_t *h, request_rec *r);
+    apr_status_t (*recall_headers) (cache_handle_t *h, request_rec *r, const char *urlkey);
     apr_status_t (*recall_body) (cache_handle_t *h, apr_pool_t *p, apr_bucket_brigade *bb);
     int (*create_entity) (cache_handle_t *h, request_rec *r, const char *urlkey, apr_off_t len);
-    int (*open_entity) (cache_handle_t *h, request_rec *r,
-                        const char *urlkey);
     int (*remove_url) (cache_handle_t *h, apr_pool_t *p);
 } cache_provider;

Index: trunk/modules/cache/mod_disk_cache.c
===
--- trunk.orig/modules/cache/mod_disk_cache.c
+++ trunk/modules/cache/mod_disk_cache.c
@@ -57,7 +57,7 @@ module AP_MODULE_DECLARE_DATA disk_cache
 static int remove_entity(cache_handle_t *h);
 static apr_status_t store_headers(cache_handle_t *h, request_rec *r, cache_info *i);
 static apr_status_t store_body(cache_handle_t *h, request_rec *r, apr_bucket_brigade *b);
-static apr_status_t recall_headers(cache_handle_t *h, request_rec *r);
+static apr_status_t recall_headers(cache_handle_t *h, request_rec *r, const char *urlkey);
 static apr_status_t recall_body(cache_handle_t *h, apr_pool_t *p, apr_bucket_brigade *bb);
 static apr_status_t read_array(request_rec *r, apr_array_header_t* arr, apr_file_t *file);
@@ -777,9 +777,15 @@ static apr_status_t read_table(cache_han
  * @@@: XXX: FIXME: currently the headers are passed thru un-merged.
  * Is that okay, or should they be collapsed where possible?
  */
-static apr_status_t recall_headers(cache_handle_t *h, request_rec *r)
+static apr_status_t recall_headers(cache_handle_t *h, request_rec *r,
+                                   const char *urlkey)
 {
-    disk_cache_object_t *dobj = (disk_cache_object_t *) h->cache_obj->vobj;
+    disk_cache_object_t *dobj;
+
+    if (open_entity(h, r, urlkey) != OK)
+        return DECLINED;
+
+    dobj = (disk_cache_object_t *) h->cache_obj->vobj;

     /* This case should not happen... */
     if (!dobj->hfd) {
@@ -1203,7 +1209,6 @@ static const cache_provider cache_disk_p
     recall_headers,
     recall_body,
     create_entity,
-    open_entity,
     remove_url,
 };

Index: trunk/modules/cache/mod_mem_cache.c
===
--- trunk.orig/modules/cache/mod_mem_cache.c
+++ trunk/modules/cache/mod_mem_cache.c
@@ -108,7 +108,7 @@ static mem_cache_conf *sconf;
 static int remove_entity(cache_handle_t *h);
 static apr_status_t store_headers(cache_handle_t *h, request_rec *r, cache_info *i);
 static apr_status_t store_body(cache_handle_t *h, request_rec *r, apr_bucket_brigade *b);
-static apr_status_t recall_headers(cache_handle_t *h, request_rec *r);
+static apr_status_t recall_headers(cache_handle_t *h, request_rec *r, const char *urlkey);
 static apr_status_t recall_body(cache_handle_t *h, apr_pool_t *p, apr_bucket_brigade *bb);
 static void cleanup_cache_object(cache_object_t *obj);
@@ -628,10 +628,16 @@ static int remove_url(cache_handle_t *h,
     return OK;
 }

-static apr_status_t recall_headers(cache_handle_t *h, request_rec *r)
+static apr_status_t recall_headers(cache_handle_t *h, request_rec *r,
+                                   const char *urlkey)
 {
     int rc;
-    mem_cache_object_t *mobj = (mem_cache_object_t*) h->cache_obj->vobj;
+    mem_cache_object_t *mobj;
+
+    if (open_entity(h, r, urlkey) != OK)
+        return DECLINED;
+
+    mobj = (mem_cache_object_t*) h->cache_obj->vobj;

     h->req_hdrs = apr_table_make(r->pool, mobj->num_req_hdrs);
     h->resp_hdrs = apr_table_make(r->pool, mobj->num_header_out);
@@ -1053,7 +1059,6 @@ static const
[PATCH] revamped mod_disk_cache directory structure
Hi,

This patch converts the mod_disk_cache cache directory structure to a uniformly distributed two-level hierarchy. The admin specifies the number of level-1 and level-2 directories, and the files are scattered across the level-2 directories.

Also, with this patch it is possible to assign directories to separate partitions, because the temporary files are created in the destination directory.

For example, running Apache/proxy+cache for a small network:

[EMAIL PROTECTED] cache]# sh files-per-directory.sh
dir: 00/ subs: 139 files: 632 size: 4.8M
dir: 01/ subs: 156 files: 765 size: 5.7M
dir: 02/ subs: 144 files: 626 size: 4.8M
dir: 03/ subs: 160 files: 714 size: 6.1M
dir: 04/ subs: 169 files: 820 size: 5.9M
dir: 05/ subs: 131 files: 590 size: 4.1M
dir: 06/ subs: 148 files: 677 size: 5.3M
dir: 07/ subs: 142 files: 644 size: 5.8M
dir: 08/ subs: 148 files: 749 size: 5.8M
dir: 09/ subs: 158 files: 711 size: 6.3M
dir: 0A/ subs: 146 files: 666 size: 5.1M
dir: 0B/ subs: 157 files: 701 size: 5.1M
dir: 0C/ subs: 157 files: 671 size: 5.2M
dir: 0D/ subs: 157 files: 711 size: 5.7M
dir: 0E/ subs: 149 files: 704 size: 5.6M
dir: 0F/ subs: 158 files: 742 size: 5.8M

-- Davi Arnaut

Index: modules/cache/cache_util.c
===
--- modules/cache/cache_util.c (revision 423984)
+++ modules/cache/cache_util.c (working copy)
@@ -19,6 +19,7 @@
 #include "mod_cache.h"
 #include "ap_provider.h"
+#include "util_md5.h"

 /* -- */
@@ -489,54 +490,31 @@
     y[sizeof(j) * 2] = '\0';
 }

-static void cache_hash(const char *it, char *val, int ndepth, int nlength)
+static unsigned int cdb_string_hash(const char *str)
 {
-    apr_md5_ctx_t context;
-    unsigned char digest[16];
-    char tmp[22];
-    int i, k, d;
-    unsigned int x;
-    static const char enc_table[64] =
-        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_@";
+    unsigned int hash = 5381;

-    apr_md5_init(&context);
-    apr_md5_update(&context, (const unsigned char *) it, strlen(it));
-    apr_md5_final(digest, &context);
+    while (*str)
+        hash = 33 * hash + *str++;

-    /* encode 128 bits as 22 characters, using a modified uuencoding
-     * the encoding is 3 bytes -> 4 characters* i.e. 128 bits is
-     * 5 x 3 bytes + 1 byte -> 5 * 4 characters + 2 characters
-     */
-    for (i = 0, k = 0; i < 15; i += 3) {
-        x = (digest[i] << 16) | (digest[i + 1] << 8) | digest[i + 2];
-        tmp[k++] = enc_table[x >> 18];
-        tmp[k++] = enc_table[(x >> 12) & 0x3f];
-        tmp[k++] = enc_table[(x >> 6) & 0x3f];
-        tmp[k++] = enc_table[x & 0x3f];
-    }
-
-    /* one byte left */
-    x = digest[15];
-    tmp[k++] = enc_table[x >> 2];        /* use up 6 bits */
-    tmp[k++] = enc_table[(x << 4) & 0x3f];
-
-    /* now split into directory levels */
-    for (i = k = d = 0; d < ndepth; ++d) {
-        memcpy(&val[i], &tmp[k], nlength);
-        k += nlength;
-        val[i + nlength] = '/';
-        i += nlength + 1;
-    }
-    memcpy(&val[i], &tmp[k], 22 - k);
-    val[i + 22 - k] = '\0';
+    return hash;
 }

-CACHE_DECLARE(char *)ap_cache_generate_name(apr_pool_t *p, int dirlevels,
-                                            int dirlength, const char *name)
+CACHE_DECLARE(char *)ap_cache_generate_name(apr_pool_t *p, unsigned int L1,
+                                            unsigned int L2, const char *name)
 {
-    char hashfile[66];
-    cache_hash(name, hashfile, dirlevels, dirlength);
-    return apr_pstrdup(p, hashfile);
+    char *key;
+    char *md5_hash;
+    unsigned int cdb_hash;
+
+    md5_hash = ap_md5_binary(p, (unsigned char *) name, (int) strlen(name));
+
+    cdb_hash = cdb_string_hash(md5_hash) / L2;
+
+    key = apr_psprintf(p, "%02X/%02X/%s", (cdb_hash / L2) % L1,
+                       cdb_hash % L2, md5_hash);
+
+    return key;
 }

 /* Create a new table consisting of those elements from an input

Index: modules/cache/mod_cache.h
===
--- modules/cache/mod_cache.h (revision 423984)
+++ modules/cache/mod_cache.h (working copy)
@@ -274,8 +274,8 @@
 CACHE_DECLARE(apr_time_t) ap_cache_hex2usec(const char *x);
 CACHE_DECLARE(void) ap_cache_usec2hex(apr_time_t j, char *y);

-CACHE_DECLARE(char *) ap_cache_generate_name(apr_pool_t *p, int dirlevels,
-                                             int dirlength,
+CACHE_DECLARE(char *) ap_cache_generate_name(apr_pool_t *p, unsigned int L1,
+                                             unsigned int L2,
                                              const char *name);
 CACHE_DECLARE(cache_provider_list *)ap_cache_get_providers(request_rec *r,
                                                            cache_server_conf *conf, apr_uri_t uri);
 CACHE_DECLARE(int) ap_cache_liststr(apr_pool_t *p, const char *list,

Index: modules/cache/mod_disk_cache.c
===
--- modules/cache/mod_disk_cache.c (revision
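The key layout in the patch can be sketched without APR: hash the (already hex-encoded) name with the djb2-style function and split the hash across two directory levels as "XX/YY/name". This is a deliberately simplified sketch; it skips the MD5 step and pool allocation, and the function names are made up for illustration:

```c
#include <stdio.h>
#include <string.h>

/* djb2-style string hash, as in the patch's cdb_string_hash(). */
static unsigned int cdb_string_hash(const char *str)
{
    unsigned int hash = 5381;
    while (*str)
        hash = 33 * hash + (unsigned char)*str++;
    return hash;
}

/* Builds the "level1/level2/name" path into buf. The patch derives the two
 * directory indices from one hash value so entries spread uniformly over
 * L1 x L2 directories. */
static void cache_key_path(char *buf, size_t len, const char *name,
                           unsigned int L1, unsigned int L2)
{
    unsigned int h = cdb_string_hash(name);
    snprintf(buf, len, "%02X/%02X/%s", (h / L2) % L1, h % L2, name);
}
```

With L1 = L2 = 16 this reproduces the `00/` through `0F/` top-level directories seen in the listing above.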
Re: [PATCH] revamped mod_disk_cache directory structure
On 20/07/2006, at 17:06, Colm MacCarthaigh wrote: On Thu, Jul 20, 2006 at 11:58:01AM -0300, Davi Arnaut wrote: Also, with this patch it is possible to assign directories to separate partitions because the temporary files are created in the destination directory. I'm not sure it goes far enough though. What if an admin has two filesystems/disks they can store the cache on, or what if it's 7? CacheDirLevels n 256 for n = 1,2,...,7,... What if one is a 160GB filesystem and the other only 10GB? This was not the scope of the patch, but it's one step toward it. I think we can add load balancing later. If we're going to tackle these kinds of problems, we need to look at how things like diablo (which gets it right IMO) do it. diablo? the game? :-) -- Davi Arnaut
Re: [PATCH] Don't cache expired objects
On Thu, 22 Jun 2006 06:35:05 -0700 Paul Querna [EMAIL PROTECTED] wrote: Davi Arnaut wrote: Don't cache requests with an Expires date in the past; otherwise the cache code will _always_ try to cache the URL, leading to numerous rename() errors - since the URL is already cached. ...

Index: modules/cache/mod_cache.c
===
--- modules/cache/mod_cache.c (revision 414393)
+++ modules/cache/mod_cache.c (working copy)
@@ -426,6 +426,9 @@
     /* if a broken Expires header is present, don't cache it */
     reason = apr_pstrcat(p, "Broken expires header: ", exps, NULL);
 }
+else if (exps != NULL && exp != APR_DATE_BAD && exp < apr_time_now()) {
+    reason = "Expires header already expired, not cacheable";
+}
 else if (r->args && exps == NULL) {
     /* if query string present but no expiration time, don't cache it
      * (RFC 2616/13.9)

Instead of calling apr_time_now(), shouldn't it re-use r->request_time here, saving a possibly expensive call? Otherwise, I think it's a good fix.

Good catch, thanks!

-- Davi Arnaut

Index: mod_cache.c
===
--- mod_cache.c (revision 416367)
+++ mod_cache.c (working copy)
@@ -426,6 +426,9 @@
     /* if a broken Expires header is present, don't cache it */
     reason = apr_pstrcat(p, "Broken expires header: ", exps, NULL);
 }
+else if (exps != NULL && exp != APR_DATE_BAD && exp < r->request_time) {
+    reason = "Expires header already expired, not cacheable";
+}
 else if (r->args && exps == NULL) {
     /* if query string present but no expiration time, don't cache it
      * (RFC 2616/13.9)
[PATCH] Don't cache expired objects
Hi,

Don't cache requests with an Expires date in the past; otherwise the cache code will _always_ try to cache the URL, leading to numerous rename() errors - since the URL is already cached.

-- Davi Arnaut

Index: modules/cache/mod_cache.c
===
--- modules/cache/mod_cache.c (revision 414393)
+++ modules/cache/mod_cache.c (working copy)
@@ -426,6 +426,9 @@
     /* if a broken Expires header is present, don't cache it */
     reason = apr_pstrcat(p, "Broken expires header: ", exps, NULL);
 }
+else if (exps != NULL && exp != APR_DATE_BAD && exp < apr_time_now()) {
+    reason = "Expires header already expired, not cacheable";
+}
 else if (r->args && exps == NULL) {
     /* if query string present but no expiration time, don't cache it
      * (RFC 2616/13.9)
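The added condition is easy to isolate: cache only when the Expires value parsed successfully and does not already lie behind the request time. A standalone sketch, with APR_DATE_BAD modelled as 0 and illustrative names throughout:

```c
#include <time.h>

/* Stand-in for APR_DATE_BAD: the value apr_date_parse_http() style
 * parsers return when the header could not be parsed. */
#define DATE_BAD ((time_t)0)

/* Returns 1 if the entity carried an Expires date that is valid but
 * already in the past (stale on arrival, so not cacheable), 0 otherwise.
 * `have_expires` stands in for the patch's exps != NULL test. */
static int expired_on_arrival(int have_expires, time_t exp,
                              time_t request_time)
{
    return have_expires && exp != DATE_BAD && exp < request_time;
}
```

Note the order of the guards mirrors the patch: a missing header and an unparseable header are both excluded before the comparison, so only a genuinely past date blocks caching.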
SoC 2006
Hi,

I'm starting to work on my SoC 2006 proposal [1] and I want to let you guys know how I will eventually proceed and what to expect. Your comments and suggestions are welcome.

- What am I trying to achieve?

A cache framework that features three layers of granularity: storage, protocol and provider. On arrival of a request (ie: http request), the protocol layer places filters into the protocol pipeline and decides whether or not this content should be cached. If so, it requests a cache provider from the provider layer. The provider layer then mediates and dispatches the data from the protocol modules to the various storage modules. The I/O operations are handled by the storage layer (storage modules), which retrieves and stores key/data pairs.

- How will I proceed?

Initially, I will be refactoring the current mod_cache code, without altering its behavior. There are functions (ie: cache_save_filter) taking more than 500 lines. I will separate those large functions into smaller functions without changing functionality at all, making the core more readable. This will also help me fully understand the structure of the cache code. (Separate patches.)

Further down the road, I want to rework the store_body|headers and recall_body|headers functions into single store and recall functions, moving the HTTP-specific bits back to mod_cache, and see how well it works (later, those bits will be consolidated in a brand-new mod_cache_http - protocol layer). When I hit this point I'm going to send a new report.

- Notes

Don't expect much work on this in June, my semester is ending (tests.. oh my) and there is the World Soccer Cup (FIFA) which nearly stops the country (Brazil is expected to win its sixth championship).

-- Davi Arnaut

1. Redesigning and extending the Apache cache architecture. http://verdesmares.com/Apache/proposal.txt
SoC 2006, v1
Doh, did you run a spell-check before you hit the send button? :)

Hi,

I'm starting to work on my SoC 2006 proposal [1] and I want to let you guys know how I will eventually proceed and what to expect. Your comments and suggestions are welcome.

- What am I trying to achieve?

A cache framework that features three layers of granularity: storage, protocol and provider. On arrival of a request (ie: http request), the protocol layer places filters into the protocol pipeline and decides whether or not this content should be cached. If so, it requests a cache provider from the provider layer. The provider layer then mediates and dispatches the data from the protocol modules to the various storage modules. The I/O operations are handled by the storage layer (storage modules), which retrieves and stores key/data pairs.

- How will I proceed?

Initially, I will be refactoring the current mod_cache code, without altering its behavior. There are functions (ie: cache_save_filter) taking more than 500 lines. I will separate those large functions into smaller functions without changing functionality at all, making the core more readable. This will also help me fully understand the structure of the cache code. (Separate patches.)

Further down the road, I want to rework the store_body|headers and recall_body|headers functions into single store and recall functions, moving the HTTP-specific bits back to mod_cache, and see how well it works (later, those bits will be consolidated in a brand-new mod_cache_http - protocol layer). When I hit this point I'm going to send a new report.

- Notes

Don't expect much work on this in June, my semester is ending (tests.. oh my) and there is the World Soccer Cup (FIFA) which nearly stops the country (Brazil is expected to win its sixth championship).

-- Davi Arnaut

1. Redesigning and extending the Apache cache architecture. http://verdesmares.com/Apache/proposal.txt
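The proposed layering can be sketched as vtables: the storage layer sees only opaque key/data pairs, and the provider layer mediates between protocol code and a chosen storage module. Everything below is a hypothetical toy to make the separation concrete, not the proposal's actual API:

```c
#include <string.h>

/* storage layer: stores and retrieves key/data pairs; here a trivial
 * single-slot in-memory store stands in for a real storage module. */
typedef struct storage_ops {
    int (*store)(const char *key, const char *data);
    const char *(*recall)(const char *key);
} storage_ops;

static char slot_key[64];
static char slot_data[256];

static int mem_store(const char *key, const char *data)
{
    strncpy(slot_key, key, sizeof(slot_key) - 1);
    strncpy(slot_data, data, sizeof(slot_data) - 1);
    return 0;
}

static const char *mem_recall(const char *key)
{
    return strcmp(slot_key, key) == 0 ? slot_data : NULL;
}

/* provider layer: mediates between protocol modules (which decide what is
 * cacheable; not shown) and the storage modules. */
typedef struct provider {
    const storage_ops *storage;
} provider;

static int provider_store(provider *p, const char *key, const char *data)
{
    return p->storage->store(key, data);
}

static const char *provider_recall(provider *p, const char *key)
{
    return p->storage->recall(key);
}
```

The protocol layer would sit above provider_store()/provider_recall(), handling HTTP nuances such as Vary and freshness without ever touching storage I/O directly.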
[PATCH] No temporary file left behind
Hi,

Remove mod_disk_cache temporary files when rename() fails; otherwise they may accumulate in excess.

-- Davi Arnaut

Index: modules/cache/mod_disk_cache.c
===
--- modules/cache/mod_disk_cache.c (revision 409900)
+++ modules/cache/mod_disk_cache.c (working copy)
@@ -165,7 +165,10 @@
  */
 rv = apr_file_rename(dobj->tempfile, dobj->datafile, r->pool);
 if (rv != APR_SUCCESS) {
-    /* XXX log */
+    ap_log_error(APLOG_MARK, APLOG_DEBUG, rv, r->server,
+                 "disk_cache: rename tempfile to datafile failed: %s -> %s",
+                 dobj->tempfile, dobj->datafile);
+    apr_file_remove(dobj->tempfile, r->pool);
 }
 dobj->tfd = NULL;
@@ -854,6 +857,7 @@
     ap_log_error(APLOG_MARK, APLOG_DEBUG, rv, r->server,
                  "disk_cache: rename tempfile to varyfile failed: %s -> %s",
                  dobj->tempfile, dobj->hdrsfile);
+    apr_file_remove(dobj->tempfile, r->pool);
     return rv;
 }
@@ -946,6 +950,7 @@
     ap_log_error(APLOG_MARK, APLOG_ERR, rv, r->server,
                  "disk_cache: rename tempfile to hdrsfile failed: %s -> %s",
                  dobj->tempfile, dobj->hdrsfile);
+    apr_file_remove(dobj->tempfile, r->pool);
     return rv;
 }
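The rule the patch applies at each site is the same: commit the finished temporary file into place with a rename, and if that fails, delete the temporary so failed attempts cannot pile up on disk. A portable sketch using plain stdio instead of apr_file_*; the function name is made up for illustration:

```c
#include <stdio.h>

/* Try to move a completed temporary file to its final name. On failure
 * (e.g. missing target directory, cross-device rename, permissions),
 * a real module would log the error; either way the temporary is removed
 * so it cannot accumulate. */
static int commit_or_discard(const char *tempfile, const char *destfile)
{
    if (rename(tempfile, destfile) != 0) {
        remove(tempfile);   /* drop the orphaned temporary */
        return -1;
    }
    return 0;
}
```

The same discard-on-failure step is what the patch adds after each of the three failing apr_file_rename() calls.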
[PATCH] properly set the parent pid on shutdown
Hi,

On WIN9X machines, before signaling the parent process in ap_signal_parent() we ought to set the proper parent pid retrieved by GetWindowThreadProcessId(). Otherwise, the event will be signaled without being properly opened, because the parent pid will remain the same as the calling process pid.

With this patch it's possible to stop the httpd daemon with the standard -k stop signal on WIN9X.

-- Davi Arnaut

Index: server/mpm/winnt/service.c
===
--- server/mpm/winnt/service.c (revision 409410)
+++ server/mpm/winnt/service.c (working copy)
@@ -35,6 +35,7 @@
 #undef _WINUSER_
 #include <winuser.h>

+extern DWORD parent_pid;
 static char *mpm_service_name = NULL;
 static char *mpm_display_name = NULL;
@@ -1289,8 +1290,10 @@
      * provided the class is ApacheWin95ServiceMonitor
      */
     hwnd = FindWindow("ApacheWin95ServiceMonitor", mpm_service_name);
-    if (hwnd && GetWindowThreadProcessId(hwnd, &service_pid))
+    if (hwnd && GetWindowThreadProcessId(hwnd, &service_pid)) {
+        parent_pid = service_pid;
         globdat.ssStatus.dwCurrentState = SERVICE_RUNNING;
+    }
     else {
         globdat.ssStatus.dwCurrentState = SERVICE_STOPPED;

Index: server/mpm/winnt/mpm_winnt.c
===
--- server/mpm/winnt/mpm_winnt.c (revision 409410)
+++ server/mpm/winnt/mpm_winnt.c (working copy)
@@ -57,7 +57,7 @@
 OSVERSIONINFO osver; /* VER_PLATFORM_WIN32_NT */

-static DWORD parent_pid;
+DWORD parent_pid;
 DWORD my_pid;
 int ap_threads_per_child = 0;
Re: [PATCH] properly set the parent pid on shutdown
On Fri, 26 May 2006 13:48:05 -0500 William A. Rowe, Jr. [EMAIL PROTECTED] wrote: +1 on visual review. Anyone have a win9x install handy to verify that this works? Sorry, none of my multiboots still boot 9x. I've personally tested it on WIN98 (virtual machine). Davi Arnaut wrote: Hi, On WIN9X machines, before signaling the parent process in ap_signal_parent() we ought to set the proper parent pid retrieved by GetWindowThreadProcessId(). Otherwise, the event will be signaled without being properly opened, because the parent pid will remain the same as the calling process pid. With this patch it's possible to stop the httpd daemon with the standard -k stop signal on WIN9X. -- Davi Arnaut
Re: How to get at the apr_socket_t?
On Thu, 11 May 2006 13:36:44 -0700 Tyler MacDonald [EMAIL PROTECTED] wrote: Tyler MacDonald [EMAIL PROTECTED] wrote: I'm working on a raw tcp/ip handler which is going to have to do nonblocking I/O. I'd like to get direct access to the apr_socket_t from the request_rec/conn_rec... OK, I found this in mod_proxy:

apr_socket_t *client_socket = ap_get_module_config(r->connection->conn_config, &core_module);

But that is only available if CORE_PRIVATE is defined... Does that mean my module needs to define CORE_PRIVATE in order to get access to an apr_socket_t? That seems dirty to me... is there some better way?

How about: conn_rec->cs->pfd->desc ?

-- Davi Arnaut
Re: Possible new cache architecture
On Wed, 3 May 2006 14:31:06 +0200 (SAST) Graham Leggett [EMAIL PROTECTED] wrote: On Wed, May 3, 2006 1:26 am, Davi Arnaut said: Then you will end up with code that does not meet the requirements of HTTP, and you will have wasted your time. Yeah, right! How? Hey, you are using the Monty Python argument style. Can you point to even one requirement of HTTP that my_cache_provider won't meet? Yes. Atomic insertions and deletions, the ability to update headers independently of body, etc etc, just go back and read the thread. I can't argue with a zombie, you keep repeating the same misunderstandings. Seriously, please move this off list to keep the noise out of people's inboxes. Fine, I give up. -- Davi Arnaut
Re: Possible new cache architecture
On Wed, 3 May 2006 11:39:02 -0700 Roy T. Fielding [EMAIL PROTECTED] wrote: On May 3, 2006, at 5:56 AM, Davi Arnaut wrote: On Wed, 3 May 2006 14:31:06 +0200 (SAST) Graham Leggett [EMAIL PROTECTED] wrote: On Wed, May 3, 2006 1:26 am, Davi Arnaut said: Then you will end up with code that does not meet the requirements of HTTP, and you will have wasted your time. Yeah, right! How? Hey, you are using the Monty Python argument style. Can you point to even one requirement of HTTP that my_cache_provider won't meet? Yes. Atomic insertions and deletions, the ability to update headers independently of body, etc etc, just go back and read the thread. I can't argue with a zombie, you keep repeating the same misunderstandings. Seriously, please move this off list to keep the noise out of people's inboxes. Fine, I give up. For the record, Graham's statements were entirely correct, Brian's suggested architecture would slow the HTTP cache, and your responses have been amazingly childish for someone who has earned zero credibility on this list. Fine, I do have zero credibility. I suggest you stop defending a half-baked design theory and just go ahead and implement something as a patch. If it works, that's great. If it slows the HTTP cache, I will veto it myself. I'm already doing this. There is, of course, no reason why the HTTP cache has to use some new middle-layer back-end cache, so maybe you could just stop arguing about vaporware and simply implement a single mod_backend_cache that doesn't try to be all things to all people. Implement it and then convince people on the basis of measurements. That is a heck of a lot easier than convincing everyone to dump the current code based on an untested theory. I just wanted to get comments (the original idea wasn't mine). It wasn't my intention to flame anyone, I'm not mad or anything. I was just stating my opinion. I may be wrong, but I don't give up easily. :) -- Davi Arnaut
Re: Possible new cache architecture
On Tue, 2 May 2006 11:22:31 +0200 (MEST) Niklas Edmundsson [EMAIL PROTECTED] wrote: On Mon, 1 May 2006, Davi Arnaut wrote: More important, if we stick with the key/data concept it's possible to implement the header/body relationship under single or multiple keys. I've been hacking on mod_disk_cache to make it:

* Only store one set of data when an uncached item is accessed simultaneously (currently all requests cache the file and the last cache process to finish wins).
* Don't wait until the whole item is cached, reply while caching (currently it stalls).
* Don't block the requesting thread when requesting a large uncached item, cache in the background and reply while caching (currently it stalls).

This is mostly aimed at serving huge static files from a slow disk backend (typically an NFS export from a server holding all the disks), such as http://ftp.acc.umu.se/ and http://ftp.heanet.ie/ . Doing this with the current mod_disk_cache disk layout was not possible; doing the above without unnecessary locking means:

* More or less atomic operations, so caching headers and data in separate files gets very messy if you want to keep consistency.
* You can't use tempfiles, since you want to be able to figure out where the data is to be able to reply while caching.
* You want to know the size of the data in order to tell when you're done (i.e. the current size of a file isn't necessarily the real size of the body, since it might still be caching while we're reading it).

In the light of our experiences, I really think that you want to have a concept that allows you to keep the bond between header and data. Yes, you can patch up a missing bond by requiring locking and such, but I really prefer not having to lock cache files when doing read access. When it comes to making the common case fast, a lockless design is very much preferred. I will repeat once again: there is no locking involved, unless your format for storing the header/data is really wrong.
_The data format is up to the module using it_, while the storage backend is a completely different issue. However, if all those issues are sorted out in the layer above the disk cache then the above observations become more or less moot. Yes, that's the point. In any case the patch is more or less finished; independent testing and auditing haven't been done yet, but I can submit a preliminary jumbo-patch if people are interested in having a look at it now. -- Davi Arnaut
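Niklas's reply-while-caching point can be sketched without locks: if the writer records the final body size before the data, a reader can follow the growing cache file and knows exactly when it is done. A minimal illustrative sketch (the struct and names like `announced_len` are assumptions, not the actual patch):

```c
/* Sketch of "reply while caching": the writer records the expected
 * body size up front, so a reader can follow a growing cache file
 * without locks and knows when it has delivered everything.
 * announced_len and bytes_on_disk are illustrative names. */
#include <stddef.h>

typedef struct {
    size_t announced_len;   /* final body size, stored before the data */
    size_t bytes_on_disk;   /* grows as background caching proceeds */
} cache_entry;

/* Bytes a reader may serve right now, given what it already delivered.
 * 0 means either "wait for the writer to append more" or, once
 * delivered equals announced_len, "the whole body has been served". */
static size_t readable_now(const cache_entry *e, size_t delivered)
{
    if (delivered >= e->announced_len)
        return 0;                       /* fully served */
    if (e->bytes_on_disk <= delivered)
        return 0;                       /* writer hasn't caught up yet */
    return e->bytes_on_disk - delivered;
}
```

The point of the up-front size is exactly the one Niklas makes: a file's current size alone cannot tell a reader whether the body is complete.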
Re: Possible new cache architecture
On Tue, 2 May 2006 15:40:30 +0200 (SAST) Graham Leggett [EMAIL PROTECTED] wrote: On Tue, May 2, 2006 3:24 pm, Brian Akins said: - the cache says cool, will send my copy upstream. Oops, where has my data gone?. So, the cache says, okay, must get content the old fashioned way (proxy, filesystem, magic fairies, etc.). Where's the issue? To rephrase it, a whole lot of extra code, which has to be written and debugged, has to say oops, ok sorry backend about the If-None-Match, I thought I had it cached but I actually didn't, please can I have the full file?. Then the backend gives you a response with different headers to those you already delivered to the frontend. Oops. There is no such scenario. I will simulate a request using the disk_cache format:

. Incoming client requests URI /foo/bar/baz
. Request goes through mod_http_cache, generates a hash off of the URI
. mod_http_cache asks mod_cache for the data associated with key: hash.header
. No data:
  . Fetch from upstream
. Data fetched:
  . If format #1 (contains a list of Vary headers):
    . Use each header name (from .header) with our request values (headers_in) to regenerate the hash using HeaderName + HeaderValue + URI
    . Ask mod_cache for data with key: hash.header
    . No data:
      . Fetch from upstream
    . Data:
      . Serve data to client
  . If format #2:
    . Serve data to client

Where is the difference? Keeping the code as simple as possible will keep your code bug free, which means less time debugging for you, and less time for end users trying to figure out the cause of their weird symptoms. We are trying to keep it as simple as possible by separating the storage layer from the protocol layer. -- Davi Arnaut
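The two-stage lookup in that walkthrough hinges on deriving a second, variant-specific key from a Vary header name plus the request's own value for it. A toy sketch of that key derivation (djb2 stands in for whatever hash the real code uses; all names here are illustrative, not mod_disk_cache's actual functions):

```c
/* Toy sketch of the variant key derivation: the first lookup key
 * comes from the URI alone; if the stored entry is a Vary list
 * (format #1), a second key is derived from
 * HeaderName + HeaderValue + URI, as in the flow above. */
#include <stdio.h>

/* djb2, standing in for the real key hash. */
static unsigned long hash_key(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Combine a Vary header name, the request's value for it, and the
 * URI to derive the variant-specific cache key. */
static unsigned long variant_key(const char *vary_name,
                                 const char *request_value,
                                 const char *uri)
{
    char buf[512];
    snprintf(buf, sizeof(buf), "%s%s%s", vary_name, request_value, uri);
    return hash_key(buf);
}
```

Two requests that differ only in, say, Accept-Encoding land on different keys, which is how separate variants can live side by side in a plain key/value store.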
Re: Possible new cache architecture
On Tue, 2 May 2006 17:22:00 +0200 (SAST) Graham Leggett [EMAIL PROTECTED] wrote: On Tue, May 2, 2006 7:06 pm, Davi Arnaut said: There is no such scenario. I will simulate a request using the disk_cache format: The way HTTP caching works is a lot more complex than in your example, you haven't taken into account conditional HTTP requests. I've taken into account the actual mod_disk_cache code! Let me try to translate your typical scenario. A typical conditional scenario goes like this: - Browser asks for URL from httpd. Same. - mod_cache has a cached copy by looking up the headers BUT - it's stale. mod_cache converts the browser's original request to a conditional request by adding the header If-None-Match. sed s/mod_cache/mod_http_cache - The backend server answers no worries, what you have is still fresh by sending a 304 Not Modified. sed s/mod_cache/mod_http_cache - mod_cache takes the headers from the 304, and replaces the headers on the cached entry, in the process making the entry fresh again. sed s/mod_cache/mod_http_cache - mod_cache hands the cached data back to the browser. sed s/mod_cache/mod_http_cache Read http://www.ietf.org/rfc/rfc2616.txt section 13 (mainly) to see in detail how this works. Again: we do not want to change the semantics, we only want to separate the HTTP-specific part from the storage-specific part. The HTTP-specific parts of mod_disk_cache, mod_mem_cache and mod_cache are moved to a mod_http_cache, while retaining the storage-specific parts. And mod_cache is the one that will combine those two layers. Again: it's the same thing as if we were replacing all mod_disk_cache file operations with hash table operations. -- Davi Arnaut
Re: Possible new cache architecture
On Tue, 02 May 2006 23:31:13 +0200 Graham Leggett [EMAIL PROTECTED] wrote: Davi Arnaut wrote: The way HTTP caching works is a lot more complex than in your example, you haven't taken into account conditional HTTP requests. I've taken into account the actual mod_disk_cache code! mod_disk_cache doesn't contain any of the conditional HTTP request code, which is why you're not seeing it there. Please keep in mind that the existing mod_cache framework's goal is to be a fully HTTP/1.1 compliant, content generator neutral, efficient, error free and high performance cache. Moving towards and keeping with the above goals is a far higher priority than simplifying the generic backend cache interface. To sum up - the cache backend must fulfill the requirements of the cache frontend (generic or not), which in turn must fulfill the requirements of the users, who are browsers, web robot code, and humans. To try and prioritise this the other way round is putting the cart before the horse. Graham, what I want is to be able to write a mod_cache backend _without_ having to worry about HTTP. _NOT_ to rewrite mod_disk/proxy/cache/whatever! You keep talking about HTTP this, HTTP that; I won't change the way it currently works. I just want to place a glue layer between the storage and the HTTP part. I could even wrap around your code:

typedef struct {
    apr_status_t (*fetch) (cache_handle_t *h, const char *key, apr_bucket_brigade *bb);
    apr_status_t (*store) (cache_handle_t *h, const char *key, apr_bucket_brigade *bb);
    int (*remove) (const char *key);
} my_cache_provider;

typedef struct {
    const char *key_headers;
    const char *key_body;
} my_cache_object;

create_entity:
    my_cache_object *obj;
    obj->key_headers = hash_headers(request, whatever);
    obj->key_body = hash_body(request, whatever);

open_entity:
    my_cache_object *obj;
    my_provider->fetch(h, obj->key_headers, header_brigade);
    // if necessary, update obj->key_headers/body (vary..)

remove_url:
    my_provider->remove(obj->key_headers);
    my_provider->remove(obj->key_body);

remove_entity: nop

store_headers:
    my_cache_object *obj;
    // if necessary, update obj->key_headers (vary..)
    my_provider->store(h, obj->key_headers, header_brigade);

store_body:
    my_cache_object *obj;
    my_provider->store(h, obj->key_body, body_brigade);

recall_headers:
    my_cache_object *obj;
    my_provider->fetch(h, obj->key_headers, header_brigade);

recall_body:
    my_cache_object *obj;
    my_provider->fetch(h, obj->key_body, body_brigade);

-- Davi Arnaut
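To illustrate that a fetch/store/remove triple like my_cache_provider really needs no HTTP knowledge, here is a toy in-memory provider with the same shape, using plain strings where the real interface would use cache_handle_t and bucket brigades. The fixed-size table and every name here are illustrative assumptions, not the actual mod_cache API:

```c
/* Toy in-memory provider: a fixed-size key/value table with the
 * fetch/store/remove shape sketched above. Nothing in it knows
 * about headers, bodies, or Vary; those concepts live one layer up. */
#include <stdlib.h>
#include <string.h>

#define SLOTS 16

static struct { char *key; char *value; } table[SLOTS];

/* Portable strdup replacement, so the sketch is plain C99. */
static char *dup_str(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    return p ? strcpy(p, s) : NULL;
}

static int mem_store(const char *key, const char *value)
{
    int i, free_slot = -1;
    for (i = 0; i < SLOTS; i++) {
        if (table[i].key == NULL) {
            if (free_slot < 0)
                free_slot = i;
        } else if (strcmp(table[i].key, key) == 0) {
            free(table[i].value);       /* update existing key in place */
            table[i].value = dup_str(value);
            return 0;
        }
    }
    if (free_slot < 0)
        return -1;                      /* table full */
    table[free_slot].key = dup_str(key);
    table[free_slot].value = dup_str(value);
    return 0;
}

static const char *mem_fetch(const char *key)
{
    int i;
    for (i = 0; i < SLOTS; i++)
        if (table[i].key && strcmp(table[i].key, key) == 0)
            return table[i].value;
    return NULL;                        /* cache miss */
}

static int mem_remove(const char *key)
{
    int i;
    for (i = 0; i < SLOTS; i++)
        if (table[i].key && strcmp(table[i].key, key) == 0) {
            free(table[i].key);
            free(table[i].value);
            table[i].key = table[i].value = NULL;
            return 0;
        }
    return -1;                          /* not cached */
}
```

A protocol layer storing headers under hash.header and the body under a second key would call nothing but these three functions, which is the separation being argued for.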
Re: Possible new cache architecture
On Wed, 03 May 2006 01:09:03 +0200 Graham Leggett [EMAIL PROTECTED] wrote: Davi Arnaut wrote: Graham, what I want is to be able to write a mod_cache backend _without_ having to worry about HTTP. Then you will end up with code that does not meet the requirements of HTTP, and you will have wasted your time. Yeah, right! How? Hey, you are using the Monty Python argument style. Can you point to even one requirement of HTTP that my_cache_provider won't meet? Please go through _all_ of the mod_cache architecture, and not just mod_disk_cache. Also read and understand HTTP/1.1 gateways and caches, and as you want to create a generic cache, read and understand mod_ldap, a module that will probably benefit from the availability of a generic cache. Then step back and see that mod_cache is a small part of a bigger picture. At this point you'll see that as nice as your idea of a simple generic cache interface is, it's not going to be the most elegant solution to the problem. blah, blah.. you essentially said: I don't want a simpler interface, I think the current mess is more elegant. I have shown you that I can even wrap your messy cache_provider hooks into a much simpler one; how can anything else be more elegant? -- Davi Arnaut
Re: Possible new cache architecture
On Mon, 01 May 2006 14:51:53 +0200 Graham Leggett [EMAIL PROTECTED] wrote: Davi Arnaut wrote: mod_cache need not be HTTP specific, it only needs the ability to cache multiple entities (data, headers) under the same key, and be able to replace zero or more entities independently of the other entities (think updating headers without updating content). mod_cache needs only to cache key/value pairs. The key/value format is up to the mod_cache user. It's a design flaw to create problems that have to be specially coded around, when you can avoid the problem entirely. Maybe I'm missing something; what problems do you foresee? The cache needs to be generic, yes - but there is no need to stick to the key/value cliché of cache code, if a variation to this is going to make your life significantly easier. And the variation is..? -- Davi Arnaut
Re: Possible new cache architecture
On Mon, 01 May 2006 09:02:31 -0400 Brian Akins [EMAIL PROTECTED] wrote: Here is a scenario. We will assume a cache hit. I think the usage scenario is clear. Moving on, I would like to be able to stack up the cache providers (like the Apache filter chain). Basically, mod_cache will expose the functions:

add(key, value, expiration, flag)
get(key)
remove(key)

mod_cache will then pass the request (add/get or remove) down the chain, similar to the Apache filter chain, i.e.:

apr_status_t mem_cache_get_filter(ap_cache_filter_t *f, apr_bucket_brigade *bb, ...);
apr_status_t disk_cache_get_filter(ap_cache_filter_t *f, apr_bucket_brigade *bb, ...);

This way it would be possible for one cache to act as a cache of another cache provider; mod_mem_cache would work as a small/fast MRU cache for mod_disk_cache. -- Davi Arnaut
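The stacking idea can be sketched as a chain of responsibility: each layer exposes the get/add pair, a get that misses falls through to the next (slower) layer, and a hit is promoted into every faster layer above it, which is exactly how mod_mem_cache could front mod_disk_cache. All names and the one-slot layer below are hypothetical, not the proposed API:

```c
/* Chain-of-providers sketch: a miss in one layer falls through to
 * the next; a hit is re-added to the faster layers above it. */
#include <stddef.h>
#include <string.h>

typedef struct cache_layer {
    const char *(*get)(struct cache_layer *self, const char *key);
    int (*add)(struct cache_layer *self, const char *key, const char *value);
    struct cache_layer *next;           /* slower layer behind this one */
} cache_layer;

static const char *chain_get(cache_layer *top, const char *key)
{
    cache_layer *l, *fast;
    for (l = top; l != NULL; l = l->next) {
        const char *v = l->get(l, key);
        if (v != NULL) {
            /* Promote the hit into every faster layer above it. */
            for (fast = top; fast != l; fast = fast->next)
                fast->add(fast, key, v);
            return v;
        }
    }
    return NULL;                        /* miss in every layer */
}

/* One-slot toy layer, just enough to demonstrate the chain. */
typedef struct {
    cache_layer base;
    const char *key;
    const char *value;
} one_slot;

static const char *slot_get(cache_layer *self, const char *key)
{
    one_slot *s = (one_slot *)self;
    return (s->key && strcmp(s->key, key) == 0) ? s->value : NULL;
}

static int slot_add(cache_layer *self, const char *key, const char *value)
{
    one_slot *s = (one_slot *)self;
    s->key = key;
    s->value = value;
    return 0;
}
```

The promotion step is what makes the front layer behave like an MRU cache: whatever was last fetched through the chain ends up in the fastest layer.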
Re: Possible new cache architecture
On Mon, 01 May 2006 15:46:58 -0400 Brian Akins [EMAIL PROTECTED] wrote: Graham Leggett wrote: That's two hits to find whether something is cached. You must have two hits if you support vary. How are races prevented? shouldn't be any. something is in the cache or not. if one piece of an http object is not valid or in cache, the object is invalid. Although other variants may be valid/in cache. More important, if we stick with the key/data concept it's possible to implement the header/body relationship under single or multiple keys. I think Brian wants mod_cache to be only a layer (glue) between the underlying providers and the cache users. Each set of problems is better dealt with in its own layer. The storage layer (cache providers) will only worry about storing the key/data pairs (and expiring them?) while the protocol layer will deal with the underlying concepts of each protocol (mod_http_cache). The current design leads to bloat: just look at mem_cache and disk_cache, both have their own duplicated quirks (serialize/unserialize, et cetera) and need special handling of the headers and file format. Under the new design this duplication will be gone, since we will consolidate the HTTP-specific part and generalize the storage part. -- Davi Arnaut
Re: Possible new cache architecture
On Mon, 01 May 2006 22:46:44 +0200 Graham Leggett [EMAIL PROTECTED] wrote: Brian Akins wrote: That's two hits to find whether something is cached. You must have two hits if you support vary. You need only one - bring up the original cached entry with the key, and then use cheap subkeys over a very limited data set to find both the variants and the header/data. How are races prevented? shouldn't be any. something is in the cache or not. if one piece of an http object is not valid or in cache, the object is invalid. Although other variants may be valid/in cache. I can think of one race off the top of my head: - the browser says send me this URL. - the cache has it cached, but it's stale, so it asks the backend If-None-Match. - the cache reaper comes along, says oh, this is stale, and reaps the cached body (which is independent, remember?). The data is no longer cached even though the headers still exist. - The backend says 304 Not Modified. - the cache says cool, will send my copy upstream. Oops, where has my data gone? Sorry, but this only happens in your imagination. It's pretty obvious that mod_cache_http will handle this. The end user will probably experience this as oh, the website had a glitch, let me try again, so it won't be reported as a bug. No. Ok, so you tried to lock the body before going to the backend, but searching for and locking the body would have been an additional wasted cache hit if the backend answered with its own body. Not to mention having to write and debug code to do this. Locks are not necessary; perhaps you are imagining something very different. If a data body disappears under mod_http_cache it is not a big deal! It will refuse to serve the request from the cache and a new version of the page will be cached. Races need to be properly handled, and atomic cache operations will go a long way to prevent them. I think we are discussing apples and oranges.
First, we only want to *organize* the current cache code into a more layered solution. The current semantics won't change, yet! -- Davi Arnaut
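Davi's lock-free answer to the race boils down to one rule: an entry is served from cache only when every piece of it is present, and any missing piece (say, a reaped body after a 304) simply turns the request into a miss that re-caches from the backend. A minimal sketch of that decision (the enum and function names are illustrative):

```c
/* Sketch of the lock-free recovery rule described above: serve from
 * cache only when the 304 revalidation succeeded AND the body is
 * still present; in every other case fall back to the backend and
 * re-cache. No locking is involved. */
typedef enum { SERVE_CACHED, SERVE_BACKEND } serve_decision;

static serve_decision after_revalidation(int got_304, int body_present)
{
    if (got_304 && body_present)
        return SERVE_CACHED;    /* headers refreshed, body from cache */
    return SERVE_BACKEND;       /* reaped body or full response: refetch */
}
```

The reaped-body case (got_304 true, body_present false) is exactly the race Graham describes, and it degrades to an ordinary miss rather than an error.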
Re: Possible new cache architecture
On Sun, 30 Apr 2006 22:38:23 +0200 Graham Leggett [EMAIL PROTECTED] wrote: Brian Akins wrote: mod_http_cache could just cache headers and data as separate cache entries. The potential danger with this is for race conditions to happen while expiring cache entries. If the data entity expired before the header entity, it could potentially confuse the cache - is the entry cached or not? The headers say yes, the data says no. If both the data and header have the same expiration time they should both be removed atomically, but this would be hard to achieve. The trick is to set the header to expire before the data. Also, this would confuse the cache user, not the cache itself. So a given HTTP object may actually have 3 entries in the cache:

- first entry says: Vary on x,y,z
- second entry is headers for the new key (generated with the vary info)
- third entry is the actual data

Each variant should be an independent cached entry; the cache should allow different variants to be cached side by side. As far as mod_cache is concerned these are 3 independent entries, but mod_http_cache knows how to stitch them together. mod_cache should *not* be HTTP specific in any way. mod_cache need not be HTTP specific, it only needs the ability to cache multiple entities (data, headers) under the same key, and be able to replace zero or more entities independently of the other entities (think updating headers without updating content). mod_cache needs only to cache key/value pairs. The key/value format is up to the mod_cache user. -- Davi Arnaut
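The header-expires-before-data trick can be pinned down in one helper: at insertion time, give the body a strictly later expiry than the headers, so a reaper removing entries in expiry order can never leave live headers pointing at an already-reaped body. An illustrative sketch (names and the one-tick margin are assumptions):

```c
/* Sketch of the expiry ordering mentioned above: the header entry
 * always expires strictly before its body, so the dangling case is
 * "body without headers" (an ordinary miss) rather than "headers
 * without body" (the confusing race). */
static void set_expiries(long now, long ttl,
                         long *header_expiry, long *body_expiry)
{
    *header_expiry = now + ttl;
    *body_expiry = *header_expiry + 1;  /* body outlives its headers */
}
```

Since every lookup starts from the headers, an expired header entry hides the still-present body, and the body is reclaimed harmlessly a moment later.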
[PATCH]: upstream server timeout
According to RFC 2616, whenever a connection to an upstream server fails due to a timeout (or DNS failure) the proxy server should return status 504 (Gateway Timeout). -- Davi Arnaut

Index: mod_proxy_http.c
===================================================================
--- mod_proxy_http.c    (revision 398058)
+++ mod_proxy_http.c    (working copy)
@@ -1689,7 +1689,7 @@
     /* Step Two: Make the Connection */
     if (ap_proxy_connect_backend(proxy_function, backend, worker, r->server)) {
         if (r->proxyreq == PROXYREQ_PROXY)
-            status = HTTP_NOT_FOUND;
+            status = HTTP_GATEWAY_TIME_OUT;
         else
             status = HTTP_SERVICE_UNAVAILABLE;
         goto cleanup;
mod_proxy yields a 404 on timeout
Hi, When mod_proxy fails to connect to a backend (ap_proxy_connect_backend) due to a timeout, or something else, and the connection is a standard proxy request (i.e. non-reverse) it returns status code 404 (Not Found). Is there any reason why we are not returning a 504 (Gateway Timeout) code? Quoting RFC 2616: 10.5.5 504 Gateway Timeout The server, while acting as a gateway or proxy, did not receive a timely response from the upstream server specified by the URI (e.g. HTTP, FTP, LDAP) or some other auxiliary server (e.g. DNS) it needed to access in attempting to complete the request. -- Davi Arnaut