Re: apr_inet_pton
Cesar Silveira wrote: Hello, Not sure if this is the right place to ask this. Does anyone know why apr_inet_pton seems not to be available in the APR library's public API? By looking at the source code I have found that it is only implemented in UNIX-specific files. However, I was able to take the code and simply paste and compile it on Visual Studio 2005, and it worked perfectly. I'll be really glad if anyone can provide me some input on that. APR questions are better suited to the apr-dev mailing list. What do you need to do with apr_inet_pton that cannot be done with other public APR functions? Regards, -- Davi Arnaut
Re: Is async the answer
Akins, Brian wrote: On 1/18/08 3:07 PM, Colm MacCarthaigh [EMAIL PROTECTED] wrote: That's not even a consideration, async is really for dynamic content, proxies, and other non-sendfile content. For dynamic stuff, X-sendfile works well. (Just really starting to play with that, liking it so far). The proxy that the LiveJournal folks wrote, I think, copies all the data from the origin server into a file and then uses sendfile to send to the client... Doesn't this limit the network bandwidth to the bandwidth of the disk and/or file system? -- Davi Arnaut
Re: Is async the answer
Graham Leggett wrote: Davi Arnaut wrote: The proxy that the LiveJournal folks wrote, I think, copies all the data from the origin server into a file and then uses sendfile to send to the client... Doesn't this limit the network bandwidth to the bandwidth of the disk and/or file system? Yes, and the effective bandwidth of the disk can be significantly higher than both the cache backend (which is often expensive) and the network frontend (which has potentially slow clients tying up your resources). Don't forget that your cache disk is most often RAM backed, meaning effectively your cache disk is a ramdisk, with all the speed advantages that go with it. This is true for expensive hardware and very well designed operating systems and file systems... and the space is not infinite. But... OK. Back to the topic: I thought that one of the key points of async/event based servers was that we use software to scale and not hardware (so that hardware is not the bottleneck)... like serving thousands of slow clients from commodity hardware. -- Davi Arnaut
Re: Integrity of Apache source code
Andrew Beverley wrote: Hi, I hope that this is the correct mailing list for this question, and that you can easily provide a quick response. I am currently working within the UK Ministry of Defence, and am trying to get the Apache web server accredited as software able to be installed on one of our defence networks. However, one of the barriers I am coming up against is the argument that, because it is open source, someone could contribute a Trojan horse to the code and that the code could be included in the official product. What I would like to know, so that I can dispel this, is what procedures are in place to prevent this happening? I know that all downloads are digitally signed, but what other procedures are in place? For example, how is code signed off for inclusion in production releases? The trojan horse argument is a flawed argument commonly used against open source projects; a Google search on the subject will pop up a lot of articles on it. For example, this fine article: http://www.dwheeler.com/secure-programs/Secure-Programs-HOWTO/open-source-security.html On a personal note, I think that the distributed development nature of Apache HTTPD (and other open source projects) is what makes me more confident to trust it, and given its 50+% market share I'm pretty sure other people trust it too. On a side note, the NSA (National Security Agency) has taken the open source route and they don't seem to be worried :-) Regards, Davi Arnaut
Re: svn commit: r595813 - /httpd/httpd/branches/2.2.x/STATUS
Ruediger Pluem wrote: On 11/16/2007 09:34 PM, [EMAIL PROTECTED] wrote: Author: davi Date: Fri Nov 16 12:34:22 2007 New Revision: 595813 URL: http://svn.apache.org/viewvc?rev=595813&view=rev Log: Remove proposal, fix is already backported (r574884) No, it's not. jorton's solution (r591384) is better. Please re-add the proposal. See jorton's comment on PR 43865. /me stops committing while on a holiday trip Regards, Davi Arnaut
Re: Amsterdam sandbox
Justin Erenkrantz wrote: On Nov 14, 2007 9:07 AM, Issac Goldstand [EMAIL PROTECTED] wrote: Well, not to overstate the obvious, but aren't we implying here that serf will become an integral part of apr-util (at least that's what I'd understood)? As such, serf wouldn't be forked as much as absorbed... IMO, serf is and should be a standalone library. -- justin Under (or not) the Apache umbrella? -- Davi Arnaut
Re: Apachelounge problems
William A. Rowe, Jr. wrote: Hmmm... seems that - even though we've *repeated* this multiple times, we have to state this again. Contents of http://httpd.apache.org/dev/dist/ are *development* tarballs and not for any distribution. Just out of curiosity, why don't we name the tarballs as such? httpd-2.2.5-RC.tar.gz or httpd-2.2.5-dev.tar.gz, and if the tarball gets enough votes, just rename it to httpd-2.2.5.tar.gz -- Davi Arnaut
Re: svn commit: r563839 - /httpd/httpd/trunk/CHANGES
[EMAIL PROTECTED] wrote: Author: jim Date: Wed Aug 8 05:41:21 2007 New Revision: 563839 [..] Changes with Apache 2.2.5 + + *) mod_deflate: fix protocol handling in deflate input filter + PR 23287 [Nick Kew] + + *) mod_proxy: fix buffer overflow issue + PR 41144 [Davi Arnaut] BTW, the above description should be corrected to: http://svn.apache.org/viewvc/httpd/httpd/branches/2.0.x/CHANGES?r1=563329&r2=563328&pathrev=563329 -- Davi Arnaut
Re: svn commit: r563839 - /httpd/httpd/trunk/CHANGES
Jim Jagielski wrote: On Aug 8, 2007, at 8:47 AM, Davi Arnaut wrote: [EMAIL PROTECTED] wrote: Author: jim Date: Wed Aug 8 05:41:21 2007 New Revision: 563839 [..] Changes with Apache 2.2.5 + + *) mod_deflate: fix protocol handling in deflate input filter + PR 23287 [Nick Kew] + + *) mod_proxy: fix buffer overflow issue + PR 41144 [Davi Arnaut] BTW, the above description should be corrected to: http://svn.apache.org/viewvc/httpd/httpd/branches/2.0.x/CHANGES?r1=563329&r2=563328&pathrev=563329 I'm guessing you didn't read the log message :) Relax, it was just a gentle reminder :) Another question. There are some recent changes in apr that fix a few issues (hp-ux sendfile, day of the year, dbm) which were originally reported as httpd (2.2.4) problems; wouldn't it be better to ship 2.2.5 with newer apr/apr-util releases? -- Davi Arnaut
Re: svn commit: r563147 - /httpd/httpd/trunk/server/mpm/experimental/event/fdqueue.c
Ruediger Pluem wrote: On 08/06/2007 04:20 PM, [EMAIL PROTECTED] wrote: Author: jim Date: Mon Aug 6 07:20:24 2007 New Revision: 563147 URL: http://svn.apache.org/viewvc?view=rev&rev=563147 Log: These atomics expect apr_uint32_t *... The expectation, of course, is that the add/inc still works as expected even though we are using signed values. Hm. I have a bad feeling here. These atomics are platform specific and seem to be designed for unsigned ints. Are we really sure that they can handle signed ints as well? Comments from an atomics guru from APR? I'm no guru on the subject, but AFAIK it's not reliable because of the underlying signed-to-unsigned conversions and the unsigned arithmetic of the various instructions used. Looking at the code, I think we could use some other scheme to track the blocked threads. -- Davi Arnaut
Re: mod_cache summary and plan
Graham Leggett wrote: Davi Arnaut wrote: . Problem: You have described two separate problems below. No, and it seems you are deeply confused about what buckets and brigades represent. You already committed what, four fixes to the same problem? Each time we point out your wrong assumptions you come up with yet another bogus fix. Could you please stop for a moment and listen? IMHO, you haven't presented any acceptable fix and you keep trying to fix things by yourself without discussing on the list first. And more important, discussing on the list means that you have to hear other people's comments. For a moment forget about file buckets and large files; what's really at stake is proxy/cache brigade management when the arrival rate is too high (e.g. a single 4.7GB file bucket, high-rate input data to be consumed at a relatively low rate). By operating as a normal output filter mod_cache must deal with potentially large brigades of (possibly) different (other than the stock ones) bucket types created by other filters on the chain. This first problem has largely been solved, bar some testing. Those fixes were vetoed if I remember correctly. The solution was to pass the output filter through the save_body() hook, and let the save_body() code decide for itself when the best time is to write the bucket(s) to the network. For example in the disk cache, the apr_bucket_read() loop will read chunks of the 4.7GB file 4MB at a time. This chunk will be cached, and then this chunk will be written to the network, then cleaned up. Rinse, repeat. Previously, save_body() was expected to save all 4.7GB to the cache, and then only write the first byte to the network possibly minutes later. If a filter was present before cache that for any reason converted file buckets into heap buckets (for example mod_deflate), then save_body() would try to store 4.7GB of heap buckets in RAM to pass to the network later, and boom. You just described what I've said in other words. 
Listen, if you don't change your attitude a bit I won't continue arguing with you, it's pointless. How mod_disk_cache chooses to send data to the network is an entirely separate issue, detailed below. NO! It's the same problem. The problem arises from the fact that the mod_disk_cache store function traverses the brigade by itself, reading each bucket in order to write its contents to disk, potentially filling the memory with large chunks of data allocated/created by the bucket type's read function (e.g. file bucket). To put this another way: The core problem in the old cache code was the assumption that it was practical to call apr_bucket_read() on the same data _twice_ - once during caching, once during network write. No, the core problem is how it manages the bucket/brigade (deep down it's the same problem, but ...). This assumption isn't valid, thus the recent fixes. . Constraints: No threads/forked processes. Bucket type specific workarounds won't work. No core changes/knowledge; easily back-portable fixes are preferable. . Proposed solution: File buffering (or a part of Graham's last approach). The solution consists of using the cache file as an output buffer by splitting the buckets into smaller chunks and writing them to disk. Once written (apr_file_write_full) a new file bucket is created with the offset and size of the just-written buffer. The old bucket is deleted. After that, the new bucket is inserted into a temporary (empty) brigade and sent down the output filter stack for (probably) network i/o. At a quick glance, this solution may sound absurd -- the chunk is already in memory, and the output filter might need it again in memory soon. But there's no silver bullet, and it's a simple enough approach to solve the growing memory problem without incurring performance penalties. 
As soon as apr_file_write_full() is executed, the bucket just saved to the disk cache is also in kernel buffer memory - meaning that a corresponding apr_bucket_read() afterwards in the network code reads data already cached in kernel memory. I've just said that in the e-mail, but you deleted it. In performance testing, on files small enough to be buffered by the kernel (a few MB), the initial part of the download after caching is very fast. What this technique does is guarantee that regardless of the source of the response, be it a file, a CGI, or a proxy, what gets written to the network is always a file, and always takes advantage of kernel-based file performance features. I've just described that. Maybe my English was poor in the e-mail. -- Davi Arnaut
Re: mod_cache summary and plan
Ruediger Pluem wrote: On 10/29/2006 04:39 PM, Davi Arnaut wrote: Graham Leggett wrote: Davi Arnaut wrote: . Problem: You have described two separate problems below. No, and it seems you are deeply confused about what buckets and brigades represent. You already committed what, four fixes to the same problem? Each time we point out your wrong assumptions you come up with yet another bogus fix. Could you please stop for a moment and listen? IMHO, you haven't presented any acceptable fix and you keep trying to fix things by yourself without discussing on the list first. And more important, discussing on the list means that you have to hear other people's comments. The solution was to pass the output filter through the save_body() hook, and let the save_body() code decide for itself when the best time is to write the bucket(s) to the network. For example in the disk cache, the apr_bucket_read() loop will read chunks of the 4.7GB file 4MB at a time. This chunk will be cached, and then this chunk will be written to the network, then cleaned up. Rinse, repeat. Previously, save_body() was expected to save all 4.7GB to the cache, and then only write the first byte to the network possibly minutes later. If a filter was present before cache that for any reason converted file buckets into heap buckets (for example mod_deflate), then save_body() would try to store 4.7GB of heap buckets in RAM to pass to the network later, and boom. You just described what I've said in other words. Listen, if you don't change your attitude a bit I won't continue arguing with you, it's pointless. I do not really like the way the discussion goes here. If I remember correctly we had a very similar discussion between you and Graham several months ago regarding the need for mod_cache to be RFC compliant. Yes, but it was about the generic cache architecture. It may be that we circle around the same things again and again and sometimes this may be even unproductive. 
But this way for sure we do not get anything productive out of this (and the past proved it). As we had this in the past, I try to throw a flag very early to avoid wasting everybody's time with the back and forth following such things. If you are frustrated by Graham's responses and the situation, please try to express this a little differently and less personalized. I'm not frustrated, I just wanted to tell him what I thought. I think he is a smart guy and works hard on the issues, but we could achieve much more by collaborating constructively in a sane manner (step by step). -- Davi Arnaut
Re: mod_cache summary and plan
Graham Leggett wrote: Davi Arnaut wrote: I've just described that. Maybe my English was poor in the e-mail. Your English is spot on; unfortunately the aggressive nature of your email isn't. You are not going to bully anybody on this list into accepting any patch, it's not how this project works. I'm not bullying anyone. This is not a personal attack, it was a public call for you to adjust the process. It is quite clear to me that you are upset that your patches were not accepted as is. Unfortunately your patches break existing compliance with RFC2616 in the cache, and in the process introduce a significant performance penalty. This has been pointed out to you before, and not just by me. That's exactly the problem, I'm not trying to compete with you. I'm not upset if my patches are not accepted, I just want the best possible solution that satisfies the community. I called on you to work together with everybody on the list before committing. My patches were intended as an experiment; they weren't even targeted at trunk (cache refactor branch). I don't care if the patches are going to be committed or not. I don't contribute to prove to anyone that I'm better or anything else, I contribute because I really enjoy working on some parts of httpd/apr. I'm not going to dispute with you over whether my suggestions or yours are accepted. I just want everybody to be heard in the process and the final result to please the majority. Your recent comments on patches contributed have made it clear that you neither understand the patches so committed, nor have you actually run the code in question. I respectfully request you fix both these issues before continuing any work on this cache. I don't want to solve this problem alone, I don't have all the answers. But I do know that last week's jumbo patches didn't advance the issue any further because they were vetoed -- not because they were wrong, but because you didn't work with everybody before committing them. 
I will let this thread die now; it was created to gather a consensus but failed miserably. I just hope our minor disagreements won't interfere with us working on mod_cache in the future. I will repeat again, I'm not attacking you. I was pursuing what I thought was better for mod_cache. -- Davi Arnaut
Re: mod_cache summary and plan
Graham Leggett wrote: Davi Arnaut wrote: You are not going to bully anybody on this list into accepting any patch, it's not how this project works. I'm not bullying anyone. This is not a personal attack, it was a public call for you to adjust the process. Let's not fool ourselves, it was a personal attack. I'm not fooling myself. Bullying me is not going to force me to adjust the Apache process that has been in place on this project since I joined it 8 years ago. That process is not mine to change, and the sooner you realise that the better for us all. I will let this thread die now; it was created to gather a consensus but failed miserably. I just hope our minor disagreements won't interfere with us working on mod_cache in the future. I will repeat again, I'm not attacking you. I was pursuing what I thought was better for mod_cache. mod_cache doesn't get better when the maintainers get fed up by the constant barrage of your abuse and give it up as a bad idea. Declaring end of thread doesn't give us any guarantee that we won't see more of the same trolling from you. Fair enough, I withdraw all my patches and I'm going to unsubscribe from the list. Have a nice day. -- Davi Arnaut
mod_cache summary and plan
Hi, It's quite clear that without some agreement we won't be able to actually fix mod_cache's shortcomings. The idea now is to gather our efforts to get consensus on the proposed fixes and commit them one by one. The current high priority issues can be summarized as: * Buffering . Problem: For a moment forget about file buckets and large files; what's really at stake is proxy/cache brigade management when the arrival rate is too high (e.g. a single 4.7GB file bucket, high-rate input data to be consumed at a relatively low rate). By operating as a normal output filter mod_cache must deal with potentially large brigades of (possibly) different (other than the stock ones) bucket types created by other filters on the chain. The problem arises from the fact that the mod_disk_cache store function traverses the brigade by itself, reading each bucket in order to write its contents to disk, potentially filling the memory with large chunks of data allocated/created by the bucket type's read function (e.g. file bucket). . Constraints: No threads/forked processes. Bucket type specific workarounds won't work. No core changes/knowledge; easily back-portable fixes are preferable. . Proposed solution: File buffering (or a part of Graham's last approach). The solution consists of using the cache file as an output buffer by splitting the buckets into smaller chunks and writing them to disk. Once written (apr_file_write_full) a new file bucket is created with the offset and size of the just-written buffer. The old bucket is deleted. After that, the new bucket is inserted into a temporary (empty) brigade and sent down the output filter stack for (probably) network i/o. At a quick glance, this solution may sound absurd -- the chunk is already in memory, and the output filter might need it again in memory soon. But there's no silver bullet, and it's a simple enough approach to solve the growing memory problem without incurring performance penalties. 
The memory usage is kept low because the deleted buckets will be released to the free list. Performance won't be hit because, if there is enough memory system-wide, the reads from the core output filter are going to be served by the page cache, or better yet, sendfile()d. . Comments: Do you agree/disagree? Better solution? * Thundering herd/parallel downloads * Partial downloads * Code quality Let's settle the buffering issue first. -- Davi Arnaut
Re: svn commit: r468373 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_cache.c modules/cache/mod_cache.h modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h modules/cache/mod_mem_cache.c
{
-    ap_log_error(APLOG_MARK, APLOG_INFO, 0, r->server,
-                 "disk_cache: Caching body for URL %s", dobj->name);
+    e = APR_BRIGADE_FIRST(bb);
+    while (e != APR_BRIGADE_SENTINEL(bb)) {
-    for (e = APR_BRIGADE_FIRST(bb);
-         e != APR_BRIGADE_SENTINEL(bb);
-         e = APR_BUCKET_NEXT(e))
-    {
-        const char *str;
-        apr_size_t length, written;
+        const char *str;
+        apr_size_t length, written;
+        apr_off_t offset = 0;
-        /* Ignore the non-data-buckets */
-        if (APR_BUCKET_IS_METADATA(e)) {
-            continue;
-        }
+        /* try write all data buckets to the cache, except for metadata buckets */
+        if (!APR_BUCKET_IS_METADATA(e)) {

Swapping the metadata check would make the code much more readable:

    if (is_metadata)
        deal with it
        continue
    handle normal bucket

+            /* read in a bucket fragment */
             rv = apr_bucket_read(e, &str, &length, APR_BLOCK_READ);
             if (rv != APR_SUCCESS) {
                 ap_log_error(APLOG_MARK, APLOG_ERR, rv, r->server,
-                             "disk_cache: Error when reading bucket for URL %s",
+                             "disk_cache: Error when reading bucket for URL %s, aborting request",
                              dobj->name);
                 file_cache_errorcleanup(dobj, r);
+                /* not being able to read the bucket is fatal,
+                 * return this up the filter stack
+                 */
                 return rv;
             }
+
+            /* try write the bucket fragment to the cache */
+            apr_file_seek(dobj->fd, APR_END, &offset);
             rv = apr_file_write_full(dobj->fd, str, length, &written);
-            if (rv != APR_SUCCESS) {
+            offset = -(apr_off_t)written;
+            apr_file_seek(dobj->fd, APR_END, &offset);
+            /* if the cache write was successful, swap the original bucket
+             * with a file bucket pointing to the same data in the cache.
+             *
+             * This is done because:
+             *
+             * - The ap_core_output_filter can take advantage of its ability
+             *   to do non blocking writes on file buckets.
+             *
+             * - We are prevented from the need to read the original bucket
+             *   a second time inside ap_core_output_filter, which could be
+             *   expensive or memory consuming.
+             *
+             * - The cache, in theory, should be faster than the backend,
+             *   otherwise there would be little point in caching in the first
+             *   place.
+             */
+            if (APR_SUCCESS == rv) {
+
+                /* remove and destroy the original bucket from the brigade */
+                b = e;
+                e = APR_BUCKET_NEXT(e);
+                APR_BUCKET_REMOVE(b);
+                apr_bucket_destroy(b);

aka apr_bucket_delete

+                /* Is our network connection still alive?
+                 * If not, we must continue caching the file, so keep looping.
+                 * We will return the error at the end when caching is done.
+                 */
+                if (APR_SUCCESS == dobj->frv) {
+
+                    /* insert a file bucket pointing to the cache into our temporary brigade */
+                    if (diskcache_brigade_insert(dobj->tmpbb, dobj->fd, dobj->file_size,
+                                                 written,
+                                                 dobj->updtimeout, r->pool) == NULL) {
+                        return APR_ENOMEM;
+                    }

Err. We had the data in memory, and we are going to read it back from disk again just in order to not block? That's nonsense. We don't need a special bucket type! Also, are you really sure you are tracking all those buckets with the correct offset? The file pointer offset is unique for a single fd. I will stop the review here; I have to calm down to really digest what's happening. -- Davi Arnaut
Re: mod_disk_cache summarization
Graham Leggett wrote: On Mon, October 23, 2006 10:50 pm, Davi Arnaut wrote: AFAIK all major platforms provide one, even win32. I even made an incomplete APR abstraction for file notification: http://haxent.com/~davi/apr/notify Is it possible to add doxygen comments in the apr_notify.h file, so as to confirm exactly how to use it? I would like to look at this for apr. The code is a bit outdated; I have integrated it into a generic event abstraction for apr, a la kevent, that I'm working on. I'll try to post something this weekend to the list with the appropriate documentation. -- Davi Arnaut
Re: svn commit: r468373 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_cache.c modules/cache/mod_cache.h modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h modules/cache/
Niklas Edmundsson wrote: On Fri, 27 Oct 2006, Graham Leggett wrote: Err. We had the data in memory, and we are going to read it back from disk again just in order to not block? That's nonsense. Agreed. Please explain. This is a disk cache. Why would you write expensive bucket data to cache, and then expensive bucket data to the network? That's plain stupid. And when you have a file backend, you want to hit your disk cache and not the backend when delivering data to a client. People might think that this doesn't matter, but for large files, especially files larger than the RAM in your machine, you usually go disk-bound without much help from the OS disk cache. But that's a corner case. There is no reason for doing this for small files (the common case). For example, in an enterprise-grade server memory is cheap and permanent storage is slow and expensive. Also, httpd seems to be faster delivering data by sendfile than delivering data from memory buckets. That's more of a performance bug in httpd though. It's not all httpd's fault. Sendfile avoids memory copies/context switches/etc behind the scenes (in the kernel) because it talks directly to the underlying file system. When you already have the memory in user space there isn't much that you can do; the kernel has to copy it, etc. -- Davi Arnaut
Re: svn commit: r467655 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/mod_cache.xml modules/cache/mod_cache.c modules/cache/mod_cache.h
Graham Leggett wrote: On Fri, October 27, 2006 6:25 pm, Niklas Edmundsson wrote: I would have been most happy if this had been fixed ages ago so I hadn't been forced to spend lots and lots of hours kludging stuff together. At least, my kludges seem to have sparked some development in this area, so they have served some purpose other than enabling a non-profit computer club to build an FTP/HTTP server that actually works. When Brian Atkins stood up at Apachecon and presented a talk pretty much summarised as we don't use Apache's cache because it sucked, and here's why, that was a wake up call to realise that the disk cache in its current form sucks. We have significant contributions from two people - Davi Arnaut and Niklas Edmundsson - and I've been integrating the issues fixed by both these contributions into a coherent workable whole, so that the effort spent by these people isn't wasted. Both of their efforts have focused on different aspects of the cache, making this workable. Some parts are not RFC compliant, other parts are not implemented elegantly enough, but these are details that need to be raised, addressed and fixed, not used as a feeble excuse to abandon the effort and return to some cache code that nobody wants to use. I particularly don't care if my patches are not used if a better solution comes up. It takes time for ideas to mature. I see lots of comments on the code, but the comments are summarised as the cache is fine as it is. It isn't. If it was fine, key users of network caching wouldn't be standing up saying they're using something else. No one said that, but we must think before acting. -- Davi Arnaut
Re: ROADMAP for mod_cache
Paul Querna wrote: I think we have had enough -1s passed over the last couple days related to mod_cache, that it indicates there is not a consensus on what should be done, nor on how it should be done. I think we need to stop committing code for a couple days, and try to make a rudimentary plan for the intended changes to mod_cache, and try to gather some kind of consensus on them, rather than continuing down the path of -1s and long ass mailing list threads. Thoughts? This is definitely a good and obvious idea, but those discussions only start when people begin committing code. I'm all in favor of posting the ideas/code to the list before committing as long as we all work towards choosing a common goal. Graham was right in committing the code because he was the only committer active in the discussion (except for a few comments/patches from Joe Orton). Placing a call for a common goal and not actually suggesting one is not that helpful. Let's see if we can attract more attention to the cache problems now. Graham, could you please summarize the problems we want to solve and the possible solutions and send them to the list? Thanks, -- Davi Arnaut
Re: svn commit: r468373 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_cache.c modules/cache/mod_cache.h modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h modules/cache/mo
Greg Marr wrote: At 01:41 PM 10/27/2006, Davi Arnaut wrote: Niklas Edmundsson wrote: And when you have a file backend, you want to hit your disk cache and not the backend when delivering data to a client. People might think that this doesn't matter, but for large files, especially files larger than the RAM in your machine, you usually go disk-bound without much help from the OS disk cache. But that's a corner case. There is no reason for doing this for small files (the common case). For example, in an enterprise-grade server memory is cheap and permanent storage is slow and expensive. So why would this server be using mod_disk_cache in that case? Shouldn't this server be using mod_mem_cache? Selecting mod_disk_cache over mod_mem_cache means it's better to serve the cache from disk rather than from memory. If serving from the disk on the original request is too slow, then wouldn't serving from the disk on the subsequent requests be too slow as well? Because the data is already in memory. Why should I write something to disk, erase it from memory, and read it again shortly? Why should I take care of something that is the job of the OS? Why should I thrash the VM constantly? A proxy/cache's common purpose is to cache web content; this means we have a large number of small files (think html pages, images, etc) that we must keep in permanent storage, but if we have memory, let's use it! -- Davi Arnaut
Re: svn commit: r468373 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_cache.c modules/cache/mod_cache.h modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h modules/cache/mo
Davi Arnaut wrote: Greg Marr wrote: At 01:41 PM 10/27/2006, Davi Arnaut wrote: Niklas Edmundsson wrote: And when you have a file backend, you want to hit your disk cache and not the backend when delivering data to a client. People might think that this doesn't matter, but for large files, especially files larger than the RAM in your machine, you usually go disk-bound without much help from the OS disk cache. But that's a corner case. There is no reason for doing this for small files (the common case). For example, in an enterprise-grade server memory is cheap and permanent storage is slow and expensive. So why would this server be using mod_disk_cache in that case? Shouldn't this server be using mod_mem_cache? Selecting mod_disk_cache over mod_mem_cache means it's better to serve the cache from disk rather than from memory. If serving from the disk on the original request is too slow, then wouldn't serving from the disk on the subsequent requests be too slow as well? Because the data is already in memory. Why should I write something to disk, erase it from memory, and read it again shortly? Why should I take care of something that is the job of the OS? Why should I thrash the VM constantly? I should note that if there is enough memory those reads will be served by the page cache. Later we could think about reading directly into mmap-able files, changing their protection (from rw to ro) as soon as we are done writing. -- Davi Arnaut
Re: svn commit: r467655 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/mod_cache.xml modules/cache/mod_cache.c modules/cache/mod_cache.h
Graham Leggett wrote: On Wed, October 25, 2006 11:48 pm, Davi Arnaut wrote: No, we start with a file bucket on the brigade. I will try to explain what happens to see if we are talking about the same thing. Suppose we have a brigade containing a file bucket, and the file size is 4.7GB. We want to read it fully. When we call apr_bucket_read() on this bucket, we end up calling the bucket read function (file_bucket_read). What does the file bucket read do? If mmap is supported, it mmaps APR_MMAP_LIMIT bytes (4MB) by creating a new mmap bucket and splits the bucket. So, after calling read on a file bucket you have two buckets on the brigade. The first one is the mmap bucket and the last is the file bucket with an updated offset. The same thing happens if mmap is not supported, but the bucket type will be a heap bucket. If we don't delete or flush those implicitly created buckets they will keep the whole file in memory, but one single read will not put the entire file in memory. What Joe's patch does is remove this first implicitly created bucket from the brigade, placing it on a temporary brigade for sending it to the client. Ok, this makes more sense, but in its current form the temporary brigade should not be necessary. But it would be better. We don't need to keep creating brigades with apr_brigade_split(); we could just move the buckets to a temporary brigade.
Take a look at apr_brigade_split:

APU_DECLARE(apr_bucket_brigade *) apr_brigade_split(apr_bucket_brigade *b,
                                                    apr_bucket *e)
{
    apr_bucket_brigade *a;
    apr_bucket *f;

    a = apr_brigade_create(b->p, b->bucket_alloc);
    /* Return an empty brigade if there is nothing left in
     * the first brigade to split off */
    if (e != APR_BRIGADE_SENTINEL(b)) {
        f = APR_RING_LAST(&b->list);
        APR_RING_UNSPLICE(e, f, link);
        APR_RING_SPLICE_HEAD(&a->list, e, f, apr_bucket, link);
    }

    APR_BRIGADE_CHECK_CONSISTENCY(a);
    APR_BRIGADE_CHECK_CONSISTENCY(b);

    return a;
}

If we used a tmpbb we could replace the apr_brigade_split with something similar to:

    if (e != APR_BRIGADE_SENTINEL(b)) {
        f = APR_RING_LAST(&b->list);
        APR_RING_UNSPLICE(e, f, link);
        APR_RING_SPLICE_HEAD(&a->list, e, f, apr_bucket, link);
    }

And instead of destroying the brigade, we could only do a call to:

    apr_brigade_cleanup(tmpbb);

While disk cache goes through the brigade, it replaces buckets on the incoming brigade with a bucket referring to the cached file. I think a number of the thorny memory problems in mod_*cache have come about because the brigade is read twice - once by store_body, and once by ap_core_output_filter. Maybe. Buckets are reference counted and get reused pretty quickly. The replacing of the buckets with file buckets in the brigade by disk cache means the brigade will only be read once - by save_body(). Right, I liked your approach. I just wanted to be sure everyone is on the same page on this issue. That's why splitting the brigade with magical values (16MB) is not such a good idea, because the bucket type knows better and will split the bucket anyway. Right, the picture is getting clearer. Look closer at the cache code in mod_cache, right at the end of the CACHE_SAVE filter. Notice how store_body() is called, followed by ap_pass_brigade(). Notice how store_body() is expected to handle the complete brigade, all 4.7GB of it, before ap_pass_brigade() has a look see. Now - create a 4.7GB file on your drive.
Copy this file to a second filename. Time this and tell me how long it takes. Do you see the further problem? Yup. The split-before-save_body patch wasn't described well enough by me - it is designed to prevent save_body() from having to handle bucket sizes that are impractically large to handle by a cache, both in terms of memory, and in terms of time. That is clear. -- Davi Arnaut
Re: svn commit: r467655 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/mod_cache.xml modules/cache/mod_cache.c modules/cache/mod_cache.h
..

+static int do_store_body(cache_request_rec *cache,
+                         ap_filter_t *f,
+                         apr_bucket_brigade *in)
+{
+    apr_bucket *e;
+    apr_bucket_brigade *bb;
+    apr_status_t rv, rv2;
+    cache_server_conf *conf;
+
+    conf = (cache_server_conf *) ap_get_module_config(f->r->server->module_config,
+                                                      &cache_module);
+
+    /* try split any buckets larger than threshold */
+    rv = APR_SUCCESS; /* successful unless found otherwise */
+    rv2 = APR_SUCCESS;
+    if (conf->maxbucketsize > 0) {
+        e = APR_BRIGADE_FIRST(in);
+        while (e != APR_BRIGADE_SENTINEL(in)) {

&& (rv == APR_SUCCESS && rv2 == APR_SUCCESS) { Otherwise we may keep walking through the brigade unnecessarily.

+
+            /* if necessary, split the brigade and send what we have so far */
+            if (APR_SUCCESS == apr_bucket_split(e, conf->maxbucketsize)) {
+                e = APR_BUCKET_NEXT(e);
+                bb = in;
+                in = apr_brigade_split(bb, e);
+
+                /* if store body fails, don't try store body again */
+                if (APR_SUCCESS == rv) {

Drop if.

+                    rv = cache->provider->store_body(cache->handle, f->r, bb);
+                }
+
+                /* try write split brigade to the filter stack and network */
+                if (APR_SUCCESS == rv2) {

Drop if.

+                    rv2 = ap_pass_brigade(f->next, bb);
+                }
+                apr_brigade_destroy(bb);
+            }
+            else {
+                e = APR_BUCKET_NEXT(e);
+            }
+        }
+    }
+

Nice patch. -- Davi Arnaut
Re: svn commit: r467655 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/mod_cache.xml modules/cache/mod_cache.c modules/cache/mod_cache.h
Graham Leggett wrote: On Wed, October 25, 2006 4:14 pm, Davi Arnaut wrote: && (rv == APR_SUCCESS && rv2 == APR_SUCCESS) { Otherwise we may keep walking through the brigade unnecessarily. This is intentional - we don't want a failure on store_body() to cause a premature abort on the network write. Huh? The loop will break and the whole brigade will be sent at once a few lines below. Why would we want to keep splitting the brigade? There is no reason to keep breaking the buckets if we won't store them, or am I missing something? -- Davi Arnaut
Re: svn commit: r467655 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/mod_cache.xml modules/cache/mod_cache.c modules/cache/mod_cache.h
Joe Orton wrote: On Wed, Oct 25, 2006 at 01:44:48PM -0000, Graham Leggett wrote: Author: minfrin Date: Wed Oct 25 06:44:47 2006 New Revision: 467655 URL: http://svn.apache.org/viewvc?view=rev&rev=467655 Log: mod_cache: Fix an out of memory condition that occurs when the cache tries to save huge files (greater than RAM). Buckets bigger than a tuneable threshold are split into smaller buckets before being passed to mod_disk_cache, etc. PR 39380 Another couple of hundred lines of code and even a new config directive, and this still doesn't get close to actually fixing the problem! -1 already, this code is just not getting better. mod_disk_cache is still liable to eat all your RAM in that apr_bucket_read() loop, the apr_bucket_split() is not guaranteed to work for a morphing bucket type. It is simple enough to fix this problem without adding all this code and without all the stuff in r450105 too, something like the below. And you almost got it right. We don't want to stop caching if the client's connection fails, and we must not slow the caching because of a slow client (that's why I didn't pass the brigade). In the end, yours does almost the same as mine [1] except that I didn't pass the new buckets up the chain before deleting them (and it was file-bucket exclusive). Copying to disk tends to be faster than sending to a client. -- Davi Arnaut 1 http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=115971884005419&w=2
Re: svn commit: r467655 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/mod_cache.xml modules/cache/mod_cache.c modules/cache/mod_cache.h
Joe Orton wrote: On Wed, Oct 25, 2006 at 01:54:04PM -0300, Davi Arnaut wrote: Joe Orton wrote: Another couple of hundred lines of code and even a new config directive, and this still doesn't get close to actually fixing the problem! -1 already, this code is just not getting better. mod_disk_cache is still liable to eat all your RAM in that apr_bucket_read() loop, the apr_bucket_split() is not guaranteed to work for a morphing bucket type. It is simple enough to fix this problem without adding all this code and without all the stuff in r450105 too, something like the below. And you almost got it right. We don't want to stop caching if the client's connection fails ...a simple enough change. Yes. and we must not slow the caching because of a slow client (that's why I didn't pass the brigade). There is no other acceptable solution AFAICS. Buffering the entire brigade (either to disk, or into RAM as the current code does) before writing to the client is not OK, polling on buckets is not possible, using threads is not OK, using non-blocking writes up the output filter chain is not possible. Any other ideas? I think we should build the necessary infrastructure to solve this problem once and for all. Be it non-blocking writes, AIO, or adding support to the core output filter to write data to different destinations. -- Davi Arnaut
Re: svn commit: r467655 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/mod_cache.xml modules/cache/mod_cache.c modules/cache/mod_cache.h
William A. Rowe, Jr. wrote: Paul Querna wrote: That's not going to be possible until 3.0. Good luck :) Paul's right. The next expectation we break: that a protocol spends its life on a single thread. That is really a 3.0 magnitude change because it will break modules in a major way. Then, the final expectation to break is that a -request- spends its life on a given thread. That's earth shattering to developers, and really is beyond the 3.0 release (if we want to see 3.0 adopted in the next few years). Ok. I will come back in a few years :) j/k How about moving the proxy/cache parts to use their own core filters/architecture? I obviously lack experience, but in my limited understanding the proxy/cache code is being restrained because we can't improve core features that may affect the HTTP serving parts. In a proxy/cache situation no one cares about php/python, cgi, or whatever filters. Can't we change the rules for apache (core and modules) operating as a proxy/cache more easily? I don't know all possible implications of such a move, but it may work. -- Davi Arnaut
Re: svn commit: r467655 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/mod_cache.xml modules/cache/mod_cache.c modules/cache/mod_cache.h
Graham Leggett wrote: Davi Arnaut wrote: Yes, first because size_t is 32 bits :). When you do a read like this on a file bucket, the whole bucket is not read into memory. The read function splits the bucket and changes the current bucket to refer to data that was read. 32 bits is 4GB. A large number of webservers don't have memory that size, thus the problem. The problem lies in that those new buckets keep accumulating in the brigade! See my patch again. Where? I was referring to "it will attempt to read all 4.7GB". Such a thing does not exist! We start with one 4.7GB bucket in a brigade. No, we start with a file bucket on the brigade. I will try to explain what happens to see if we are talking about the same thing. Suppose we have a brigade containing a file bucket, and the file size is 4.7GB. We want to read it fully. When we call apr_bucket_read() on this bucket, we end up calling the bucket read function (file_bucket_read). What does the file bucket read do? If mmap is supported, it mmaps APR_MMAP_LIMIT bytes (4MB) by creating a new mmap bucket and splits the bucket. So, after calling read on a file bucket you have two buckets on the brigade. The first one is the mmap bucket and last is file bucket with an updated offset. The same thing happens if mmap is not supported, but the bucket type will be a heap bucket. If we don't delete or flush those implicitly created buckets they will keep the whole file in memory, but one single read will not put the entire file in memory. What Joe's patch does is remove this first implicitly created bucket from the brigade, placing it on the brigade on a temporary brigade for sending it to the client. That's why splitting the brigade with magical values (16MB) is not such a good idea, because the bucket type knows better and will split the bucket anyway. -- Davi Arnaut
Re: svn commit: r467655 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/mod_cache.xml modules/cache/mod_cache.c modules/cache/mod_cache.h
Davi Arnaut wrote: Graham Leggett wrote: Davi Arnaut wrote: Yes, first because size_t is 32 bits :). When you do a read like this on a file bucket, the whole bucket is not read into memory. The read function splits the bucket and changes the current bucket to refer to data that was read. 32 bits is 4GB. A large number of webservers don't have memory that size, thus the problem. The problem lies in that those new buckets keep accumulating in the brigade! See my patch again. Where? I was referring to "it will attempt to read all 4.7GB". Such a thing does not exist! We start with one 4.7GB bucket in a brigade. No, we start with a file bucket on the brigade. I will try to explain what happens to see if we are talking about the same thing. Suppose we have a brigade containing a file bucket, and the file size is 4.7GB. We want to read it fully. When we call apr_bucket_read() on this bucket, we end up calling the bucket read function (file_bucket_read). What does the file bucket read do? If mmap is supported, it mmaps APR_MMAP_LIMIT bytes (4MB) by creating a new mmap bucket and splits the bucket. So, after calling read on a file bucket you have two buckets on the brigade. The first one is the mmap bucket and last is file bucket with an updated offset. s/last/last one is the old/ The same thing happens if mmap is not supported, but the bucket type will be a heap bucket. If we don't delete or flush those implicitly created buckets they will keep the whole file in memory, but one single read will not put the entire file in memory. s/keep the whole file in memory/will accumulate on the brigade, implying that the whole file is in memory/ What Joe's patch does is remove this first implicitly created bucket from the brigade, placing it on the brigade on a temporary brigade for sending it to the client. s/on the brigade// That's why splitting the brigade with magical values (16MB) is not such a good idea, because the bucket type knows better and will split the bucket anyway.
-- Davi Arnaut
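The accumulation problem described in the thread above can be made concrete with a toy model. This is plain C and deliberately not APR code; CHUNK stands in for APR_MMAP_LIMIT, and the counting scheme is my own illustration of why deleting consumed buckets keeps memory bounded:

```c
#define CHUNK (4 * 1024 * 1024L)  /* stands in for APR_MMAP_LIMIT (4MB) */

/* Each simulated apr_bucket_read() on a file bucket "morphs" it: a heap
 * (or mmap) bucket holding up to CHUNK bytes appears on the brigade and
 * the file bucket's offset advances. Returns the peak number of
 * in-memory buckets the brigade held during the full read. */
static long peak_memory_buckets(long file_size, int delete_after_use)
{
    long in_memory = 0, peak = 0, remaining = file_size;

    while (remaining > 0) {
        long n = remaining < CHUNK ? remaining : CHUNK;
        remaining -= n;
        in_memory++;                 /* the read created a heap/mmap bucket */
        if (in_memory > peak)
            peak = in_memory;
        if (delete_after_use)
            in_memory--;             /* caller deletes the consumed bucket */
    }
    return peak;
}
```

With deletion the peak stays at one bucket (bounded memory); without it, a 4.7GB file leaves on the order of a thousand 4MB buckets on the brigade, which is exactly the out-of-memory condition the patches address.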
Re: mod_disk_cache summarization
Graham Leggett wrote: Brian Akins wrote: Can someone please summarize the various patches for mod_disk_cache that have been floating around in the last couple weeks? I have looked at the patches but wasn't real sure of the general philosophy/methodology behind them. In essence, the patches solve the thundering herd problem. This involved some significant changes to the way disk cache worked, some of which are still being reviewed. What's been committed so far is a temporary workaround to the "4.7GB file buckets are being loaded into RAM" problem. The workaround detects if the bucket being read is a file bucket, and copies the file into the cache using file read and file write, instead of bucket read (and thus crash). Joe Orton raised concern about the targeting of file buckets for special treatment in the disk cache, and I've been trying to find the right way to handle this. Looking at the network write filter, it also seemed to follow the logic "if file bucket, special handling (possibly SENDFILE), else read then write". So far the way that seems most elegant to me is to create a file write filter based on the network write filter, that hides all the special bucket handling magic. Still working on it though. Have you seen my patch to address this issue? IMHO, it is far less complex and less expensive than the committed workaround. Next up was the fix for thundering herd itself - this involved teaching the read-then-write loop inside disk cache to be broken into two possible paths: read from output filter stack and write to cache file, followed by read from cache file and write to network. .. The next patch, which was posted by Niklas but not committed yet, solves the problem of the lead reading the entire response, then sending it to the client as mentioned above. It involves disk_cache forking a process/thread as appropriate, splitting the read response / write to cache from the read from cache / write to network, which now run independently of each other.
I am trying to find ways of removing the need for the extra fork/thread. One option is to perform non-blocking writes in the read from cache / write to network code, only iterating to the next write if the previous write is complete (i.e. no blocking). This means the write to client will always be the same speed or slower than the read from backend, which is exactly what we want. I didn't like that sleep/fstat hackery. How about using a real file notification scheme? -- Davi Arnaut
Re: mod_disk_cache summarization
Graham Leggett wrote: Davi Arnaut wrote: Have you seen my patch to address this issue? IMHO, it is far less complex and less expensive than the committed workaround. No - I went through your patches in some detail, but I didn't see one that addressed this problem specifically. Once thundering herd is solved, I plan to commit all your refactoring patches. http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=115971884005419&w=2 I didn't like that sleep/fstat hackery. How about using a real file notification scheme? The prerequisite is that APR needs to be taught about this scheme, and it has to work portably across all platforms. AFAIK all major platforms provide one, even win32. I even made an incomplete APR abstraction for file notification: http://haxent.com/~davi/apr/notify -- Davi Arnaut
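For reference, the Linux flavour of such a file notification scheme is inotify. A minimal sketch of what would replace the sleep/fstat polling (Linux-only; the helper names are mine and error handling is trimmed):

```c
#include <poll.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Start watching 'path' for modifications; returns an inotify fd, or -1. */
static int watch_file(const char *path)
{
    int ifd = inotify_init1(IN_NONBLOCK);
    if (ifd < 0)
        return -1;
    if (inotify_add_watch(ifd, path, IN_MODIFY) < 0) {
        close(ifd);
        return -1;
    }
    return ifd;
}

/* Block up to timeout_ms waiting for a modification event.
 * Returns 1 if the file was modified, 0 on timeout, negative on error. */
static int wait_for_change(int ifd, int timeout_ms)
{
    struct pollfd p = { .fd = ifd, .events = POLLIN };
    int rc = poll(&p, 1, timeout_ms);
    if (rc <= 0)
        return rc;
    char buf[4096];
    (void)read(ifd, buf, sizeof buf);   /* drain the queued events */
    return 1;
}
```

A reader tailing a cache file being filled by another process would call wait_for_change() instead of sleeping and re-fstat()ing; kqueue (BSD) and ReadDirectoryChangesW (win32) are the equivalents an APR abstraction would have to cover.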
Re: svn commit: r454683 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/mod_rewrite.xml modules/mappers/mod_rewrite.c
Nick Kew wrote: On Tuesday 10 October 2006 13:00, Ruediger Pluem wrote: On 10/10/2006 12:33 PM, wrote: Author: niq Date: Tue Oct 10 03:33:06 2006 New Revision: 454683 URL: http://svn.apache.org/viewvc?view=rev&rev=454683 Log: Add SQL Query capability to RewriteMap Modified: httpd/httpd/trunk/modules/mappers/mod_rewrite.c URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/modules/mappers/mod_rewrite.c?view=diff&rev=454683&r1=454682&r2=454683

+static char *lookup_map_dbd(request_rec *r, char *key, const char *label)
+{
+    apr_status_t rv;
+    apr_dbd_prepared_t *stmt;
+    apr_dbd_results_t *res = NULL;
+    apr_dbd_row_t *row = NULL;
+    const char *ret = NULL;
+    int n = 0;
+    ap_dbd_t *db = dbd_acquire(r);
+
+    stmt = apr_hash_get(db->prepared, label, APR_HASH_KEY_STRING);
+
+    rv = apr_dbd_pvselect(db->driver, r->pool, db->handle, &res,
+                          stmt, 0, key, NULL);
+    if (rv != 0) {
+        ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
+                      "rewritemap: error querying for %s", key);

Missing return? res may end up NULL.. -- Davi Arnaut
Re: [PATCH] mod_disk_cache working LFS (filecopy)
Niklas Edmundsson wrote: On Sun, 1 Oct 2006, Davi Arnaut wrote: store_body: .. if (is_file_bucket(bucket)) copy_file_bucket(bucket, bb); Probably, but that doesn't allow for creating a thread/process that does the copying in the background, which is my long term goal. Also, simply doing bucket_delete like that means that the file will never be sent to the client, which is a bad thing IMO ;) Shame on me, but I said "something like".. :) I guess the attached patch does the same (plus mmap, et cetera) and is much simpler. Comments? Simpler, yes. But it only has the benefit of not eating all your memory... Well, that was the goal. Maybe we could merge this one instead and work together on the other goals. * It leaves the brigade containing the uncached entity, so it will cause the backend to first deliver stuff to be cached and then stuff to the client. Yeah, that's how mod_disk_cache works. I think it's possible to work around this limitation without using threads by keeping each cache instance with its own brigades and flushing it occasionally with non-blocking I/O. Or we could move all disk I/O to a mod_disk_cache exclusive thread pool; it could be configurable at build time whether or not to use a thread pool. Comments? * When this evolves to wanting to spawn a thread/process to do the copying you'll need the "is this a file"-thingie anyway (at least I need it, but I might have missed some nifty feature in APR). You would just need to copy the remaining buckets (granted there are no concurrency problems) and send them to a per-process thread pool. -- Davi Arnaut
Re: [PATCH] mod_disk_cache working LFS (filecopy)
Niklas Edmundsson wrote: On Mon, 2 Oct 2006, Davi Arnaut wrote: Simpler, yes. But it only has the benefit of not eating all your memory... Well, that was the goal. Maybe we could merge this one instead and work together on the other goals. As I have said before, we have a large patchset that fixes a bunch of problems. However, since the wish was to merge it in pieces, I have started breaking it into pieces for merging. Thank you. If the intent is a total redesign of mod_disk_cache, i.e. you're not interested in these patches at all, please say so and I would not have wasted a lot of work on bending our patches to get something that works when applying them one by one and then doing QA on the thing. Your work is never wasted, but the overall goal is to achieve a good code quality whether it comes from you, me or anyone - personally speaking, since I'm not an Apache committer. I'm also interested in the same features you are, but we have to think about long-term maintainability/quality to avoid half-baked short-term solutions which waste everybody's work. * It leaves the brigade containing the uncached entity, so it will cause the backend to first deliver stuff to be cached and then stuff to the client. Yeah, that's how mod_disk_cache works. I think it's possible to work around this limitation without using threads by keeping each cache instance with its own brigades and flushing it occasionally with non-blocking I/O. The replace_brigade_with_cache() function simply replaced the brigade with an instance pointing to the cached entity. Or we could move all disk I/O to a mod_disk_cache exclusive thread pool; it could be configurable at build time whether or not to use a thread pool. Comments? I would be very happy if people would fix the mod_disk_cache mess so I didn't have to. However, since no one seems to have produced something usable for our usage in the timeframe mod_disk_cache has existed, I was forced to hack on it.
I'm trying my best not to give up on having it merged, as I know that there are other sites interested in it, and now that the first trivial bit has been applied I'm hoping that people will at least look at the rest... There are bound to be cool, apachier ways to solve some of the problems, but given that our patch is stable in production and has a generally much higher code quality than mod_disk_cache (ever heard of error checking/handling?) it would be nice if people at least could look at the whole thing before starting to complain about the complexity of small parts (or code not touched by the patch, for that matter). As I said, we should always strive for the best possible quality or we are doomed to keep fixing it over and over. * When this evolves to wanting to spawn a thread/process to do the copying you'll need the "is this a file"-thingie anyway (at least I need it, but I might have missed some nifty feature in APR). You would just need to copy the remaining buckets (granted there are no concurrency problems) and send them to a per-process thread pool. And when not having threads? Non-blocking I/O when possible. -- Davi Arnaut
Re: [PATCH] mod_disk_cache working LFS (filecopy)
Ruediger Pluem wrote: On 09/26/2006 01:00 PM, Joe Orton wrote: On Tue, Sep 26, 2006 at 10:52:18AM +0200, Niklas Edmundsson wrote: This patch depends on the mod_disk_cache LFS-aware config submitted earlier and is for trunk. It makes caching of large files possible on 32bit machines by: * Realising that a file is a file and can be copied as such, without reading the whole thing into memory first. This was discussed a while back. I think this is an API problem which needs to be fixed at the API level, not something which should be worked around by adding bucket-type-specific hacks. The store_body callback needs to be able to operate like a real output filter so it can avoid the buffering-buckets-into-memory problem; as in:

while (brigade not empty)
    1. read data from bucket
    2. write data to disk
    3. pass bucket up output filter chain
    4. delete bucket

I agree. But in the case that we want to save a file bucket, can't we use sendfile? Something like ap(r)_store_file_bucket which would store it via sendfile or splice on Linux would be really neat. Of course a lot more details would need to be worked out, e.g. how and if this function changes the file bucket and what happens to the position of the src and target file pointers in the file. Linux sendfile() only works from mmapable fds to sockets. splice(): one of the two fds must refer to a pipe (afaik). tee(): both fds must refer to pipes. -- Davi Arnaut
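To illustrate the splice() constraint mentioned above: since neither end of a file-to-file copy is a pipe, the data has to bounce through one. A Linux-only sketch (the helper name is mine, error handling is minimal):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Copy from file fd 'src' to file fd 'dst' using splice(). Neither fd is
 * a pipe, so the data moves file -> pipe -> file, staying in the kernel
 * the whole way (no userspace buffer). Returns bytes copied, or -1. */
static long copy_via_splice(int src, int dst)
{
    int p[2];
    long total = 0;

    if (pipe(p) != 0)
        return -1;
    for (;;) {
        ssize_t in = splice(src, NULL, p[1], NULL, 64 * 1024, SPLICE_F_MOVE);
        if (in < 0) { total = -1; break; }
        if (in == 0) break;                        /* EOF on src */
        ssize_t off = 0;
        while (off < in) {                         /* drain the pipe */
            ssize_t out = splice(p[0], NULL, dst, NULL,
                                 (size_t)(in - off), SPLICE_F_MOVE);
            if (out <= 0) { total = -1; goto done; }
            off += out;
        }
        total += in;
    }
done:
    close(p[0]);
    close(p[1]);
    return total;
}
```

The 64KB chunk matches the default pipe capacity; a hypothetical ap(r)_store_file_bucket could use this shape on Linux and fall back to read/write loops elsewhere.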
Re: [PATCH] mod_disk_cache working LFS (filecopy)
Niklas Edmundsson wrote: On Sat, 30 Sep 2006, Davi Arnaut wrote: Hi, Wouldn't you avoid a lot of complexity in this patch if you just deleted from the brigade the implicitly created heap buckets while reading file buckets? Something like: store_body: .. if (is_file_bucket(bucket)) copy_file_bucket(bucket, bb); Probably, but that doesn't allow for creating a thread/process that does the copying in the background, which is my long term goal. Also, simply doing bucket_delete like that means that the file will never be sent to the client, which is a bad thing IMO ;) Shame on me, but I said "something like".. :) I guess the attached patch does the same (plus mmap, et cetera) and is much simpler. Comments? -- Davi Arnaut

Index: trunk/modules/cache/mod_disk_cache.c
===================================================================
--- trunk.orig/modules/cache/mod_disk_cache.c   2006-10-01 12:56:55.000000000 -0300
+++ trunk/modules/cache/mod_disk_cache.c        2006-10-01 12:57:34.000000000 -0300
@@ -984,126 +984,51 @@
     return APR_SUCCESS;
 }

+apr_status_t copy_file_bucket(apr_bucket *b, apr_file_t *fd, apr_off_t *file_size)
+{
+    apr_size_t len;
+    const char *str;
+    apr_size_t written;
+    apr_status_t rv, rc;
+    apr_bucket *ref, *e, *s;
-
-static apr_status_t copy_body(apr_pool_t *p,
-                              apr_file_t *srcfd, apr_off_t srcoff,
-                              apr_file_t *destfd, apr_off_t destoff,
-                              apr_off_t len)
-{
-    apr_status_t rc;
-    apr_size_t size;
-    apr_finfo_t finfo;
-    apr_time_t starttime = apr_time_now();
-
-    char *buf = apr_palloc(p, CACHE_BUF_SIZE);
-    if (!buf) {
-        return APR_ENOMEM;
-    }
-
-    if(srcoff != 0) {
-        rc = apr_file_seek(srcfd, APR_SET, &srcoff);
-        if(rc != APR_SUCCESS) {
-            return rc;
-        }
-    }
+    rv = apr_bucket_copy(b, &e);
-
-    if(destoff != 0) {
-        rc = apr_file_seek(destfd, APR_SET, &destoff);
-        if(rc != APR_SUCCESS) {
-            return rc;
-        }
+    if (rv != APR_SUCCESS) {
+        return rv;
     }
-
-    /* Tried doing this with mmap, but sendfile on Linux got confused when
-       sending a file while it was being written to from an mmapped area.
-       The traditional way seems to be good enough, and less complex.
-     */
-    while(len > 0) {
-        size = MIN(len, CACHE_BUF_SIZE);
+    APR_BUCKET_INIT(e);
+    s = APR_BUCKET_NEXT(e);
-
-        rc = apr_file_read_full(srcfd, buf, size, NULL);
-        if(rc != APR_SUCCESS) {
-            return rc;
-        }
+    do {
+        rv = apr_bucket_read(e, &str, &len, APR_BLOCK_READ);
-
-        rc = apr_file_write_full(destfd, buf, size, NULL);
-        if(rc != APR_SUCCESS) {
-            return rc;
+        if (rv != APR_SUCCESS && rv != APR_EOF) {
+            break;
        }
-        len -= size;
-    }
-
-    /* Check if file has changed during copying. This is not 100% foolproof
-       due to NFS attribute caching when on NFS etc. */
-    /* FIXME: Can we assume that we're always copying an entire file? In that
-       case we can check if the current filesize matches the length
-       we think it is */
-    rc = apr_file_info_get(&finfo, APR_FINFO_MTIME, srcfd);
-    if(rc != APR_SUCCESS) {
-        return rc;
-    }
-    if(starttime < finfo.mtime) {
-        return APR_EGENERAL;
-    }
-
-    return APR_SUCCESS;
-}
-
-
-static apr_status_t replace_brigade_with_cache(cache_handle_t *h,
-                                               request_rec *r,
-                                               apr_bucket_brigade *bb)
-{
-    apr_status_t rv;
-    int flags;
-    apr_bucket *e;
-    core_dir_config *pdcfg = ap_get_module_config(r->per_dir_config,
-                                                  &core_module);
-    disk_cache_object_t *dobj = (disk_cache_object_t *) h->cache_obj->vobj;
-    flags = APR_READ|APR_BINARY;
-#if APR_HAS_SENDFILE
-    flags |= ((pdcfg->enable_sendfile == ENABLE_SENDFILE_OFF)
-              ? 0 : APR_SENDFILE_ENABLED);
-#endif
+    rc = apr_file_write_full(fd, str, len, &written);
-
-    rv = apr_file_open(&dobj->fd, dobj->datafile, flags, 0, r->pool);
-    if (rv != APR_SUCCESS) {
-        ap_log_error(APLOG_MARK, APLOG_ERR, rv, r->server,
-                     "disk_cache: Error opening datafile %s for URL %s",
-                     dobj->datafile, dobj->name);
-        return rv;
-    }
+    if (rc != APR_SUCCESS) {
+        rv = rc;
+        break;
+    }
-
-    /* First, empty the brigade */
-    e = APR_BRIGADE_FIRST(bb);
-    while (e != APR_BRIGADE_SENTINEL(bb)) {
-        apr_bucket *d;
-        d = e;
+        ref = e;
+        *file_size += written;
         e = APR_BUCKET_NEXT(e);
-        apr_bucket_delete(d);
-    }
+        apr_bucket_delete(ref);
+    } while (e != s && rv == APR_SUCCESS);
-
-    /* Then, populate it with our cached instance */
-    rv = recall_body(h, r->pool, bb);
-    if (rv
Re: [PATCH] mod_disk_cache working LFS (filecopy)
On 26/09/2006, at 05:52, Niklas Edmundsson wrote: This patch depends on the mod_disk_cache LFS-aware config submitted earlier and is for trunk. It makes caching of large files possible on 32bit machines by: * Realising that a file is a file and can be copied as such, without reading the whole thing into memory first. Hi, Wouldn't you avoid a lot of complexity in this patch if you just deleted from the brigade the implicitly created heap buckets while reading file buckets? Something like:

store_body:
    ..
    if (is_file_bucket(bucket))
        copy_file_bucket(bucket, bb);

copy_file_bucket:
    while (bucket != sentinel(bb) && is_file_bucket(bucket)) {
        rv = apr_bucket_read(..)
        if (is_heap_bucket(bucket)) {
            apr_bucket *tmp = bucket;
            bucket = bucket_next(tmp);
            bucket_delete(tmp);
        }
        else
            bucket = bucket_next(bucket);
    }

IMHO, this is a lot better than all that added complexity for a corner case. -- Davi Arnaut
Re: svn commit: r450188 - /httpd/httpd/trunk/modules/cache/mod_disk_cache.c
On 26/09/2006, at 17:35, [EMAIL PROTECTED] wrote: Author: minfrin Date: Tue Sep 26 13:35:42 2006 New Revision: 450188

+
+    char *buf = apr_palloc(p, CACHE_BUF_SIZE);
+    if (!buf) {
+        return APR_ENOMEM;
+    }

IIRC, apache abort()s on memory allocation errors. -- Davi Arnaut
Re: [patch 10/16] fix up coding style issues
On 20/09/2006, at 06:20, Joe Orton wrote: Hi Davi, On Tue, Sep 19, 2006 at 11:34:03PM -0300, Davi Arnaut wrote: Clean up code style in the cache code and shrink the mod_mem_cache store_body function. The casts to/from void * are unnecessary and could just be removed rather than being whitespace-adjusted. Yes, certainly. --- modules/cache/mod_mem_cache.c.orig +++ modules/cache/mod_mem_cache.c @@ -52,27 +52,30 @@ module AP_MODULE_DECLARE_DATA mem_cache_module; -typedef enum { +typedef enum +{ The traditional style is to keep the brace on the same line for data type definitions (but not with function definitions). See e.g. grep struct httpd.h :) I used the GNU indent command line as cited at http://httpd.apache.org/dev/styleguide.html Anyway, I should have double checked the result. -- Davi Arnaut
Re: [patch 09/16] simplify array and table serialization
On 20/09/2006, at 10:16, Brian Akins wrote: Davi Arnaut wrote: Simplify the array and table serialization code, separating it from the underlying I/O operations. Probably faster to just put everything in an iovec (think writev). Probably not; apr_brigade_writev does (quite) the same. -- Davi Arnaut
Re: [patch 09/16] simplify array and table serialization
On 20/09/2006, at 11:00, Brian Akins wrote: Davi Arnaut wrote: On 20/09/2006, at 10:16, Brian Akins wrote: Davi Arnaut wrote: Simplify the array and table serialization code, separating it from the underlying I/O operations. Probably faster to just put everything in an iovec (think writev). Probably not; apr_brigade_writev does (quite) the same. Doesn't mean apr_brigade_writev does it fast either... If the serialization simply returned an iovec, mod_mem_cache could use apr_pstrcatv and mod_disk_cache could use apr_file_writev. There are no brigades _yet_; also pay attention to the comment: /* XXX this is a temporary function, it will be removed later */ -- Davi Arnaut
Re: [patch 09/16] simplify array and table serialization
On 20/09/2006, at 11:58, Davi Arnaut wrote:

On 20/09/2006, at 11:00, Brian Akins wrote:

Probably not; apr_brigade_writev does (quite) the same.

Doesn't mean apr_brigade_writev does it fast either... If the serialization simply returned an iovec, mod_mem_cache could use apr_pstrcatv and mod_disk_cache could use apr_file_writev.

There are no brigades _yet_; also, pay attention to the comment:

/* XXX this is a temporary function, it will be removed later */

And yes, once we have brigades in between, it is much better/faster to use the writev functions.

-- Davi Arnaut
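Brian's suggestion -- have the serializer return an iovec so mod_mem_cache can flatten it (apr_pstrcatv-style) and mod_disk_cache can hand it to writev -- can be sketched without APR. The header-table type and both helpers below are illustrative stand-ins, not the actual patch code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>   /* struct iovec */

/* A header table entry, standing in for apr_table_entry_t */
struct hdr { const char *key; const char *val; };

/* Serialize headers as "key:val\n" pieces into an iovec array.
 * Returns the number of iovec slots used (4 per header). */
static int hdrs_to_iovec(const struct hdr *h, int nhdrs, struct iovec *iov)
{
    int i, n = 0;
    for (i = 0; i < nhdrs; i++) {
        iov[n].iov_base = (void *) h[i].key; iov[n++].iov_len = strlen(h[i].key);
        iov[n].iov_base = (void *) ":";      iov[n++].iov_len = 1;
        iov[n].iov_base = (void *) h[i].val; iov[n++].iov_len = strlen(h[i].val);
        iov[n].iov_base = (void *) "\n";     iov[n++].iov_len = 1;
    }
    return n;
}

/* Flatten an iovec into one malloc'd, NUL-terminated block --
 * roughly what apr_pstrcatv does with a pool. */
static char *iovec_flatten(const struct iovec *iov, int n, size_t *lenp)
{
    size_t len = 0, off = 0;
    int i;
    char *buf;

    for (i = 0; i < n; i++)
        len += iov[i].iov_len;
    buf = malloc(len + 1);
    for (i = 0; i < n; i++) {
        memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }
    buf[len] = '\0';
    if (lenp)
        *lenp = len;
    return buf;
}
```

A disk-backed cache would hand the same iovec array straight to writev(2) (or apr_file_writev) instead of flattening it, which is the point of returning the iovec rather than a pre-joined buffer.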
[patch 02/16] revamped mod_disk_cache directory structure
This patch converts the mod_disk_cache cache directory structure to a uniformly distributed N-level hierarchy. The admin specifies the number of levels (or directories) and the files are scattered across the directories.

Example:

    CacheRoot /cache/
    # CacheDirLevels n l1 l2 ... ln
    CacheDirLevels 3 4 256 16

In this example, the directory tree will be three levels deep. The first level will have 4 directories, each of the 4 first-level directories will have 256 directories (second level), and so on down to the last level.

Directory tree layout for the example:

    /cache/
    /cache/0d
    /cache/0d/36
    /cache/0d/36/06
    /cache/0d/36/06/7a50a5c38a73abdb6db28a2b0f6881e5.data
    /cache/0d/36/06/7a50a5c38a73abdb6db28a2b0f6881e5.header

This patch adds support for symlinking the directories to separate disks or partitions by creating the cache files in their destination directory. The symlinks can also be used to load-balance the cache files between disks, because each last-level directory gets the same (on average) number of files -- on a cache setup with 4 first-level directories each one receives 25%; linking three of them to disk A and the last one to disk B yields a 75/25 distribution between disks A (75) and B (25).
Index: modules/cache/cache_util.c
===================================================================
--- modules/cache/cache_util.c.orig
+++ modules/cache/cache_util.c
@@ -19,6 +19,7 @@
 #include "mod_cache.h"
 #include "ap_provider.h"
+#include "util_md5.h"

 /* -------------------------------------------------------------- */

@@ -489,54 +490,49 @@
     y[sizeof(j) * 2] = '\0';
 }

-static void cache_hash(const char *it, char *val, int ndepth, int nlength)
+static unsigned int cdb_string_hash(const char *str)
 {
-    apr_md5_ctx_t context;
-    unsigned char digest[16];
-    char tmp[22];
-    int i, k, d;
-    unsigned int x;
-    static const char enc_table[64] =
-    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_@";
-
-    apr_md5_init(&context);
-    apr_md5_update(&context, (const unsigned char *) it, strlen(it));
-    apr_md5_final(digest, &context);
-
-    /* encode 128 bits as 22 characters, using a modified uuencoding
-     * the encoding is 3 bytes -> 4 characters, i.e. 128 bits is
-     * 5 x 3 bytes + 1 byte -> 5 * 4 characters + 2 characters
-     */
-    for (i = 0, k = 0; i < 15; i += 3) {
-        x = (digest[i] << 16) | (digest[i + 1] << 8) | digest[i + 2];
-        tmp[k++] = enc_table[x >> 18];
-        tmp[k++] = enc_table[(x >> 12) & 0x3f];
-        tmp[k++] = enc_table[(x >> 6) & 0x3f];
-        tmp[k++] = enc_table[x & 0x3f];
-    }
-
-    /* one byte left */
-    x = digest[15];
-    tmp[k++] = enc_table[x >> 2];        /* use up 6 bits */
-    tmp[k++] = enc_table[(x << 4) & 0x3f];
-
-    /* now split into directory levels */
-    for (i = k = d = 0; d < ndepth; ++d) {
-        memcpy(&val[i], &tmp[k], nlength);
-        k += nlength;
-        val[i + nlength] = '/';
-        i += nlength + 1;
-    }
-    memcpy(&val[i], &tmp[k], 22 - k);
-    val[i + 22 - k] = '\0';
-}
-
-CACHE_DECLARE(char *)ap_cache_generate_name(apr_pool_t *p, int dirlevels,
-                                            int dirlength, const char *name)
-{
-    char hashfile[66];
-    cache_hash(name, hashfile, dirlevels, dirlength);
-    return apr_pstrdup(p, hashfile);
+    unsigned int hash = 5381;
+
+    while (*str) {
+        hash += (hash << 5);
+        hash ^= *str++;
+    }
+
+    return hash;
+}
+
+#define MD5_HEXDIGESTSIZE (APR_MD5_DIGESTSIZE * 2 + 1)
+
+CACHE_DECLARE(char *)ap_cache_generate_name(apr_pool_t *p, unsigned int nlevels,
+                                            unsigned int *dirlevels,
+                                            unsigned int *divisors,
+                                            const char *name)
+{
+    char *md5_hash;
+    unsigned int i;
+    char *ptr, *key;
+    unsigned char h;
+    unsigned int cdb_hash;
+    static const char hex[] = "0123456789abcdef";
+
+    md5_hash = ap_md5_binary(p, (unsigned char *) name, (int) strlen(name));
+
+    cdb_hash = cdb_string_hash(name);
+
+    key = ptr = apr_palloc(p, nlevels * LEVEL_DIR_LENGTH +
+                           MD5_HEXDIGESTSIZE);
+
+    for (i = 0; i < nlevels; i++) {
+        h = (cdb_hash / divisors[i]) % dirlevels[i];
+        *ptr++ = hex[h >> 4];
+        *ptr++ = hex[h & 0xf];
+        *ptr++ = '/';
+    }
+
+    memcpy(ptr, md5_hash, MD5_HEXDIGESTSIZE);
+
+    return key;
 }

 /* Create a new table consisting of those elements from an input

Index: modules/cache/mod_cache.h
===================================================================
--- modules/cache/mod_cache.h.orig
+++ modules/cache/mod_cache.h
@@ -72,6 +72,8 @@
 #include "apr_atomic.h"

+#define LEVEL_DIR_LENGTH 3
+
 #ifndef MAX
 #define MAX(a,b) ((a) > (b) ? (a) : (b))
 #endif
@@ -274,8 +276,9 @@
 CACHE_DECLARE(apr_time_t) ap_cache_hex2usec(const char *x);
 CACHE_DECLARE(void) ap_cache_usec2hex(apr_time_t j, char *y);
-CACHE_DECLARE(char *)
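The level-directory computation in ap_cache_generate_name() above can be exercised outside APR. The sketch below is a self-contained rendition of the CDB-style string hash and the per-level index math, (hash / divisors[i]) % dirlevels[i]; the helper name make_level_prefix, the fixed output buffer, and the divisor values in the usage are illustrative, not part of the patch:

```c
#include <assert.h>
#include <string.h>

/* CDB-style string hash, as used by the patch to pick directories */
static unsigned int cdb_string_hash(const char *str)
{
    unsigned int hash = 5381;

    while (*str) {
        hash += (hash << 5);   /* hash * 33 */
        hash ^= (unsigned char) *str++;
    }
    return hash;
}

/* Build the "xx/yy/zz/" level prefix for a key. Illustrative helper,
 * not the APR code: each level index is the hash divided by that
 * level's divisor, reduced modulo the level's directory count. */
static void make_level_prefix(const char *key, unsigned int nlevels,
                              const unsigned int *dirlevels,
                              const unsigned int *divisors,
                              char *out /* >= nlevels * 3 + 1 bytes */)
{
    static const char hex[] = "0123456789abcdef";
    unsigned int i, hash = cdb_string_hash(key);
    char *ptr = out;

    for (i = 0; i < nlevels; i++) {
        unsigned char h = (hash / divisors[i]) % dirlevels[i];
        *ptr++ = hex[h >> 4];
        *ptr++ = hex[h & 0xf];
        *ptr++ = '/';
    }
    *ptr = '\0';
}
```

With dirlevels = {4, 256, 16} every key lands in one of 4 * 256 * 16 leaf directories, which is what makes the 75/25 symlink split in the patch description come out even on average. A natural (assumed) choice of divisors is the running product of the previous levels' sizes, so the levels consume different bits of the hash.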
[patch 04/16] shrink cache_url_handler
Move parts of cache_url_handler() to a smaller add_cache_filters() function that is easier to read and understand.

Index: modules/cache/mod_cache.c
===================================================================
--- modules/cache/mod_cache.c.orig
+++ modules/cache/mod_cache.c
@@ -48,6 +48,44 @@
  * oh well.
  */

+static void add_cache_filters(request_rec *r, cache_request_rec *cache)
+{
+    /*
+     * Add cache_save filter to cache this request. Choose
+     * the correct filter by checking if we are a subrequest
+     * or not.
+     */
+    if (r->main) {
+        ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS,
+                     r->server,
+                     "Adding CACHE_SAVE_SUBREQ filter for %s",
+                     r->uri);
+        ap_add_output_filter_handle(cache_save_subreq_filter_handle,
+                                    NULL, r, r->connection);
+    }
+    else {
+        ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS,
+                     r->server, "Adding CACHE_SAVE filter for %s",
+                     r->uri);
+        ap_add_output_filter_handle(cache_save_filter_handle,
+                                    NULL, r, r->connection);
+    }
+
+    ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS, r->server,
+                 "Adding CACHE_REMOVE_URL filter for %s",
+                 r->uri);
+
+    /* Add cache_remove_url filter to this request to remove a
+     * stale cache entry if needed. Also put the current cache
+     * request rec in the filter context, as the request that
+     * is available later during running the filter maybe
+     * different due to an internal redirect.
+     */
+    cache->remove_url_filter =
+        ap_add_output_filter_handle(cache_remove_url_filter_handle,
+                                    cache, r, r->connection);
+}
+
 static int cache_url_handler(request_rec *r, int lookup)
 {
     apr_status_t rv;
@@ -113,41 +151,7 @@
     if (rv != OK) {
         if (rv == DECLINED) {
             if (!lookup) {
-
-                /*
-                 * Add cache_save filter to cache this request. Choose
-                 * the correct filter by checking if we are a subrequest
-                 * or not.
-                 */
-                if (r->main) {
-                    ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS,
-                                 r->server,
-                                 "Adding CACHE_SAVE_SUBREQ filter for %s",
-                                 r->uri);
-                    ap_add_output_filter_handle(cache_save_subreq_filter_handle,
-                                                NULL, r, r->connection);
-                }
-                else {
-                    ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS,
-                                 r->server, "Adding CACHE_SAVE filter for %s",
-                                 r->uri);
-                    ap_add_output_filter_handle(cache_save_filter_handle,
-                                                NULL, r, r->connection);
-                }
-
-                ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS, r->server,
-                             "Adding CACHE_REMOVE_URL filter for %s",
-                             r->uri);
-
-                /* Add cache_remove_url filter to this request to remove a
-                 * stale cache entry if needed. Also put the current cache
-                 * request rec in the filter context, as the request that
-                 * is available later during running the filter maybe
-                 * different due to an internal redirect.
-                 */
-                cache->remove_url_filter =
-                    ap_add_output_filter_handle(cache_remove_url_filter_handle,
-                                                cache, r, r->connection);
+                add_cache_filters(r, cache);
             }
             else {
                 if (cache->stale_headers) {
--
[patch 05/16] pass uri_meets_conditions values by reference
Don't pass 'large' objects by value when not needed; it wastes time and space.

Index: modules/cache/cache_util.c
===================================================================
--- modules/cache/cache_util.c.orig
+++ modules/cache/cache_util.c
@@ -28,40 +28,41 @@
 /* Determine if url matches the hostname, scheme and port and path
  * in filter. All but the path comparisons are case-insensitive.
  */
-static int uri_meets_conditions(apr_uri_t filter, int pathlen, apr_uri_t url)
+static int uri_meets_conditions(const apr_uri_t *filter, apr_size_t pathlen,
+                                const apr_uri_t *url)
 {
     /* Compare the hostnames */
-    if (filter.hostname) {
-        if (!url.hostname) {
+    if (filter->hostname) {
+        if (!url->hostname) {
             return 0;
         }
-        else if (strcasecmp(filter.hostname, url.hostname)) {
+        else if (strcasecmp(filter->hostname, url->hostname)) {
             return 0;
         }
     }

     /* Compare the schemes */
-    if (filter.scheme) {
-        if (!url.scheme) {
+    if (filter->scheme) {
+        if (!url->scheme) {
             return 0;
         }
-        else if (strcasecmp(filter.scheme, url.scheme)) {
+        else if (strcasecmp(filter->scheme, url->scheme)) {
             return 0;
         }
     }

     /* Compare the ports */
-    if (filter.port_str) {
-        if (url.port_str && filter.port != url.port) {
+    if (filter->port_str) {
+        if (url->port_str && filter->port != url->port) {
             return 0;
         }
         /* NOTE: ap_port_of_scheme will return 0 if given NULL input */
-        else if (filter.port != apr_uri_port_of_scheme(url.scheme)) {
+        else if (filter->port != apr_uri_port_of_scheme(url->scheme)) {
             return 0;
         }
     }
-    else if (url.port_str && filter.scheme) {
-        if (apr_uri_port_of_scheme(filter.scheme) == url.port) {
+    else if (url->port_str && filter->scheme) {
+        if (apr_uri_port_of_scheme(filter->scheme) == url->port) {
             return 0;
         }
     }
@@ -69,12 +70,11 @@
     /* Url has met all of the filter conditions so far, determine
      * if the paths match.
      */
-    return !strncmp(filter.path, url.path, pathlen);
+    return !strncmp(filter->path, url->path, pathlen);
 }

 CACHE_DECLARE(cache_provider_list *)ap_cache_get_providers(request_rec *r,
-                                                  cache_server_conf *conf,
-                                                  apr_uri_t uri)
+                                                  cache_server_conf *conf)
 {
     cache_provider_list *providers = NULL;
     int i;
@@ -83,7 +83,7 @@
     for (i = 0; i < conf->cacheenable->nelts; i++) {
         struct cache_enable *ent =
             (struct cache_enable *) conf->cacheenable->elts;
-        if (uri_meets_conditions(ent[i].url, ent[i].pathlen, uri)) {
+        if (uri_meets_conditions(&ent[i].url, ent[i].pathlen, &r->parsed_uri)) {
             /* Fetch from global config and add to the list. */
             cache_provider *provider;
             provider = ap_lookup_provider(CACHE_PROVIDER_GROUP, ent[i].type,
@@ -120,7 +120,7 @@
     for (i = 0; i < conf->cachedisable->nelts; i++) {
         struct cache_disable *ent =
             (struct cache_disable *) conf->cachedisable->elts;
-        if (uri_meets_conditions(ent[i].url, ent[i].pathlen, uri)) {
+        if (uri_meets_conditions(&ent[i].url, ent[i].pathlen, &r->parsed_uri)) {
             /* Stop searching now. */
             return NULL;
         }

Index: modules/cache/mod_cache.c
===================================================================
--- modules/cache/mod_cache.c.orig
+++ modules/cache/mod_cache.c
@@ -108,7 +108,7 @@
     /*
      * Which cache module (if any) should handle this request?
      */
-    if (!(providers = ap_cache_get_providers(r, conf, r->parsed_uri))) {
+    if (!(providers = ap_cache_get_providers(r, conf))) {
         return DECLINED;
     }

Index: modules/cache/mod_cache.h
===================================================================
--- modules/cache/mod_cache.h.orig
+++ modules/cache/mod_cache.h
@@ -280,7 +280,7 @@
                                             unsigned int *dirlevels,
                                             unsigned int *divisors,
                                             const char *name);
-CACHE_DECLARE(cache_provider_list *)ap_cache_get_providers(request_rec *r, cache_server_conf *conf, apr_uri_t uri);
+CACHE_DECLARE(cache_provider_list *)ap_cache_get_providers(request_rec *r, cache_server_conf *conf);
 CACHE_DECLARE(int) ap_cache_liststr(apr_pool_t *p, const char *list,
                                     const char *key, char **val);
 CACHE_DECLARE(const char *)ap_cache_tokstr(apr_pool_t *p, const char *list, const char **str);
--
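The saving here is mechanical: apr_uri_t is a struct of a dozen-odd pointers and integers, and passing it by value copies the whole thing on every call, while a const pointer copies one machine word and still documents that the callee won't modify it. A minimal illustration (the uri_t type and same_host helper are hypothetical stand-ins, not the APR type):

```c
#include <assert.h>
#include <string.h>
#include <strings.h>   /* strcasecmp */

/* Hypothetical stand-in for apr_uri_t: several fields, so pass-by-value
 * would copy all of them on every call. */
typedef struct {
    const char *scheme;
    const char *hostname;
    const char *port_str;
    const char *path;
    unsigned short port;
} uri_t;

/* By-reference version: only one pointer is copied, and const
 * documents that the callee does not modify the caller's struct. */
static int same_host(const uri_t *a, const uri_t *b)
{
    if (!a->hostname || !b->hostname)
        return a->hostname == b->hostname;   /* match only if both absent */
    return strcasecmp(a->hostname, b->hostname) == 0;
}
```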
[patch 03/16] shrink cache_save_filter
Move parts of cache_save_filter() to a smaller check_cacheable_request() function that is easier to read and understand. Also, a useless assignment is removed.

Index: modules/cache/mod_cache.c
===================================================================
--- modules/cache/mod_cache.c.orig
+++ modules/cache/mod_cache.c
@@ -313,65 +313,11 @@
  * Finally, pass the data to the next filter (the network or whatever)
  */

-static int cache_save_filter(ap_filter_t *f, apr_bucket_brigade *in)
+static const char *check_cacheable_request(request_rec *r,
+                                           cache_request_rec *cache,
+                                           cache_server_conf *conf)
 {
-    int rv = !OK;
-    int date_in_errhdr = 0;
-    request_rec *r = f->r;
-    cache_request_rec *cache;
-    cache_server_conf *conf;
-    const char *cc_out, *cl;
-    const char *exps, *lastmods, *dates, *etag;
-    apr_time_t exp, date, lastmod, now;
-    apr_off_t size;
-    cache_info *info = NULL;
-    char *reason;
-    apr_pool_t *p;
-
-    conf = (cache_server_conf *) ap_get_module_config(r->server->module_config,
-                                                      &cache_module);
-
-    /* Setup cache_request_rec */
-    cache = (cache_request_rec *) ap_get_module_config(r->request_config,
-                                                       &cache_module);
-    if (!cache) {
-        /* user likely configured CACHE_SAVE manually; they should really use
-         * mod_cache configuration to do that
-         */
-        cache = apr_pcalloc(r->pool, sizeof(cache_request_rec));
-        ap_set_module_config(r->request_config, &cache_module, cache);
-    }
-
-    reason = NULL;
-    p = r->pool;
-    /*
-     * Pass Data to Cache
-     * ------------------
-     * This section passes the brigades into the cache modules, but only
-     * if the setup section (see below) is complete.
-     */
-    if (cache->block_response) {
-        /* We've already sent down the response and EOS. So, ignore
-         * whatever comes now.
-         */
-        return APR_SUCCESS;
-    }
-
-    /* have we already run the cachability check and set up the
-     * cached file handle?
-     */
-    if (cache->in_checked) {
-        /* pass the brigades into the cache, then pass them
-         * up the filter stack
-         */
-        rv = cache->provider->store_body(cache->handle, r, in);
-        if (rv != APR_SUCCESS) {
-            ap_log_error(APLOG_MARK, APLOG_DEBUG, rv, r->server,
-                         "cache: Cache provider's store_body failed!");
-            ap_remove_output_filter(f);
-        }
-        return ap_pass_brigade(f->next, in);
-    }
+    apr_time_t exp, lastmod;
+    const char *exps, *lastmods, *etag, *cc_out, *reason = NULL;

     /*
      * Setup Data in Cache
@@ -439,11 +385,11 @@
          * We include 304 Not Modified here too as this is the origin server
          * telling us to serve the cached copy.
          */
-        reason = apr_psprintf(p, "Response status %d", r->status);
+        reason = apr_psprintf(r->pool, "Response status %d", r->status);
     }
     else if (exps != NULL && exp == APR_DATE_BAD) {
         /* if a broken Expires header is present, don't cache it */
-        reason = apr_pstrcat(p, "Broken expires header: ", exps, NULL);
+        reason = apr_pstrcat(r->pool, "Broken expires header: ", exps, NULL);
     }
     else if (exp != APR_DATE_BAD && exp < r->request_time)
     {
@@ -522,20 +468,16 @@
         reason = "r->no_cache present";
     }

-    if (reason) {
-        ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
-                     "cache: %s not cached. Reason: %s", r->unparsed_uri,
-                     reason);
+    cache->exp = exp;
+    cache->lastmod = lastmod;

-        /* remove this filter from the chain */
-        ap_remove_output_filter(f);
-
-        /* ship the data up the stack */
-        return ap_pass_brigade(f->next, in);
-    }
+    return reason;
+}

-    /* Make it so that we don't execute this path again. */
-    cache->in_checked = 1;
+static apr_off_t request_cl(request_rec *r, apr_bucket_brigade *in)
+{
+    const char *cl;
+    apr_off_t size;

     /* Set the content length if known.
      */
@@ -580,48 +522,18 @@
         }
     }

-    /* It's safe to cache the response.
-     *
-     * There are two possiblities at this point:
-     * - cache->handle == NULL. In this case there is no previously
-     *   cached entity anywhere on the system. We must create a brand
-     *   new entity and store the response in it.
-     * - cache->stale_handle != NULL. In this case there is a stale
-     *   entity in the system which needs to be replaced by new
-     *   content (unless the result was 304 Not Modified, which means
-     *   the cached entity is actually fresh, and we should update
-     *   the headers).
-     */
-
-    /* Did we have a stale cache entry that really is stale? */
-    if (cache->stale_handle) {
-        if (r->status == HTTP_NOT_MODIFIED) {
-            /* Oh, hey. It isn't that stale! Yay! */
-            cache->handle =
[patch 08/16] dont delete empty cache directories
Don't delete empty cache directories; it is too expensive and unnecessary. They can be removed later with htcacheclean.

Index: modules/cache/mod_disk_cache.c
===================================================================
--- modules/cache/mod_disk_cache.c.orig
+++ modules/cache/mod_disk_cache.c
@@ -579,45 +579,6 @@
         }
     }

-    /* now delete directories as far as possible up to our cache root */
-    if (dobj->root) {
-        const char *str_to_copy;
-
-        str_to_copy = dobj->hdrsfile ? dobj->hdrsfile : dobj->datafile;
-        if (str_to_copy) {
-            char *dir, *slash, *q;
-
-            dir = apr_pstrdup(p, str_to_copy);
-
-            /* remove filename */
-            slash = strrchr(dir, '/');
-            *slash = '\0';
-
-            /*
-             * now walk our way back to the cache root, delete everything
-             * in the way as far as possible
-             *
-             * Note: due to the way we constructed the file names in
-             * header_file and data_file, we are guaranteed that the
-             * cache_root is suffixed by at least one '/' which will be
-             * turned into a terminating null by this loop. Therefore,
-             * we won't either delete or go above our cache root.
-             */
-            for (q = dir + dobj->root_len; *q ; ) {
-                ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, NULL,
-                             "disk_cache: Deleting directory %s from cache",
-                             dir);
-
-                rc = apr_dir_remove(dir, p);
-                if (rc != APR_SUCCESS && !APR_STATUS_IS_ENOENT(rc)) {
-                    break;
-                }
-                slash = strrchr(q, '/');
-                *slash = '\0';
-            }
-        }
-    }
-
     return OK;
 }
--
[patch 00/16] SoC 2006, cache refactoring
Hi,

My first round of patches for the cache-refactor branch. They are mostly cleanups, bug fixes, and small but useful enhancements.

The patches are also available at http://verdesmares.com/Apache/patches/

-- Davi Arnaut
[patch 01/16] dont cache expired objects
Don't cache responses with an Expires date in the past; otherwise mod_cache will always try to cache the URL. This bug might lead to numerous rename() errors on win32 if the URL was previously cached.

Index: modules/cache/mod_cache.c
===================================================================
--- modules/cache/mod_cache.c.orig
+++ modules/cache/mod_cache.c
@@ -445,6 +445,11 @@
         /* if a broken Expires header is present, don't cache it */
         reason = apr_pstrcat(p, "Broken expires header: ", exps, NULL);
     }
+    else if (exp != APR_DATE_BAD && exp < r->request_time)
+    {
+        /* if an Expires header is in the past, don't cache it */
+        reason = "Expires header already expired, not cacheable";
+    }
     else if (r->args && exps == NULL) {
         /* if query string present but no expiration time, don't cache it
          * (RFC 2616/13.9)
          */
--
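The new condition is plain date arithmetic: cache only when the Expires value parsed successfully and still lies in the future at request time. A stand-alone version with time_t (the patch itself uses apr_time_t and APR_DATE_BAD; the names below are stand-ins):

```c
#include <assert.h>
#include <time.h>

#define DATE_BAD ((time_t) -1)   /* stand-in for APR_DATE_BAD */

/* Returns 1 when an Expires value should stop us from caching:
 * it parsed correctly but is already in the past at request time.
 * An unparseable date is handled by a separate "broken header" check. */
static int expires_already_past(time_t exp, time_t request_time)
{
    return exp != DATE_BAD && exp < request_time;
}
```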
[patch 10/16] fix up coding style issues
Clean up code style in the cache code and shrink the mod_mem_cache store_body function.

Index: modules/cache/mod_mem_cache.c
===================================================================
--- modules/cache/mod_mem_cache.c.orig
+++ modules/cache/mod_mem_cache.c
@@ -52,27 +52,30 @@
 module AP_MODULE_DECLARE_DATA mem_cache_module;

-typedef enum {
+typedef enum
+{
     CACHE_TYPE_FILE = 1,
     CACHE_TYPE_HEAP,
     CACHE_TYPE_MMAP
 } cache_type_e;

-typedef struct {
-    char* hdr;
-    char* val;
+typedef struct
+{
+    char *hdr;
+    char *val;
 } cache_header_tbl_t;

-typedef struct mem_cache_object {
+typedef struct mem_cache_object
+{
     cache_type_e type;
     apr_ssize_t num_header_out;
     apr_ssize_t num_req_hdrs;
     cache_header_tbl_t *header_out;
-    cache_header_tbl_t *req_hdrs; /* for Vary negotiation */
+    cache_header_tbl_t *req_hdrs;   /* for Vary negotiation */
     apr_size_t m_len;
     void *m;
     apr_os_file_t fd;
-    apr_int32_t flags;  /* File open flags */
+    apr_int32_t flags;              /* File open flags */
     long priority;      /** the priority of this entry */
     long total_refs;    /** total number of references this entry has had */
@@ -80,14 +83,15 @@
 } mem_cache_object_t;

-typedef struct {
+typedef struct
+{
     apr_thread_mutex_t *lock;
     cache_cache_t *cache_cache;

     /* Fields set by config directives */
     apr_size_t min_cache_object_size;   /* in bytes */
     apr_size_t max_cache_object_size;   /* in bytes */
-    apr_size_t max_cache_size; /* in bytes */
+    apr_size_t max_cache_size;          /* in bytes */
     apr_size_t max_object_cnt;
     cache_pqueue_set_priority cache_remove_algorithm;
@@ -95,8 +99,11 @@
      * we haven't yet seen EOS */
     apr_off_t max_streaming_buffer_size;
 } mem_cache_conf;
+
 static mem_cache_conf *sconf;

+static void cleanup_cache_object(cache_object_t *obj);
+
 #define DEFAULT_MAX_CACHE_SIZE 100*1024
 #define DEFAULT_MIN_CACHE_OBJECT_SIZE 0
 #define DEFAULT_MAX_CACHE_OBJECT_SIZE 1
@@ -104,15 +111,6 @@
 #define DEFAULT_MAX_STREAMING_BUFFER_SIZE 10
 #define CACHEFILE_LEN 20

-/* Forward declarations */
-static int remove_entity(cache_handle_t *h);
-static apr_status_t store_headers(cache_handle_t *h, request_rec *r, cache_info *i);
-static apr_status_t store_body(cache_handle_t *h, request_rec *r, apr_bucket_brigade *b);
-static apr_status_t recall_headers(cache_handle_t *h, request_rec *r);
-static apr_status_t recall_body(cache_handle_t *h, apr_pool_t *p, apr_bucket_brigade *bb);
-
-static void cleanup_cache_object(cache_object_t *obj);
-
 static long memcache_get_priority(void*a)
 {
     cache_object_t *obj = (cache_object_t *)a;
@@ -132,31 +130,34 @@
 static void memcache_set_pos(void *a, apr_ssize_t pos)
 {
-    cache_object_t *obj = (cache_object_t *)a;
+    cache_object_t *obj = (cache_object_t *) a;
     mem_cache_object_t *mobj = obj->vobj;

     apr_atomic_set32(&mobj->pos, pos);
 }
+
 static apr_ssize_t memcache_get_pos(void *a)
 {
-    cache_object_t *obj = (cache_object_t *)a;
+    cache_object_t *obj = (cache_object_t *) a;
     mem_cache_object_t *mobj = obj->vobj;

     return apr_atomic_read32(&mobj->pos);
 }

-static apr_size_t memcache_cache_get_size(void*a)
+static apr_size_t memcache_cache_get_size(void *a)
 {
-    cache_object_t *obj = (cache_object_t *)a;
+    cache_object_t *obj = (cache_object_t *) a;
     mem_cache_object_t *mobj = obj->vobj;
     return mobj->m_len;
 }
+
 /** callback to get the key of a item */
-static const char* memcache_cache_get_key(void*a)
+static const char *memcache_cache_get_key(void *a)
 {
-    cache_object_t *obj = (cache_object_t *)a;
+    cache_object_t *obj = (cache_object_t *) a;
     return obj->key;
 }
+
 /**
  * memcache_cache_free()
  * memcache_cache_free is a callback that is only invoked by a thread
@@ -165,9 +166,9 @@
  * has been ejected from the cache. decrement the refcount and if the refcount drops
  * to 0, cleanup the cache object.
  */
-static void memcache_cache_free(void*a)
+static void memcache_cache_free(void *a)
 {
-    cache_object_t *obj = (cache_object_t *)a;
+    cache_object_t *obj = (cache_object_t *) a;

     /* Decrement the refcount to account for the object being ejected
      * from the cache. If the refcount is 0, free the object.
@@ -176,13 +177,14 @@
         cleanup_cache_object(obj);
     }
 }
+
 /*
  * functions return a 'negative' score since priority queues
  * dequeue the object with the highest value first
 */
 static long memcache_lru_algorithm(long queue_clock, void *a)
 {
-    cache_object_t *obj = (cache_object_t *)a;
+    cache_object_t *obj = (cache_object_t *) a;
     mem_cache_object_t *mobj = obj->vobj;

     if (mobj->priority == 0)
         mobj->priority = queue_clock - mobj->total_refs;
@@ -196,17 +198,17 @@
 static long memcache_gdsf_algorithm(long queue_clock, void *a)
 {
-    cache_object_t *obj = (cache_object_t *)a;
+    cache_object_t *obj =
[patch 12/16] use APR pools in mod_mem_cache
Convert mod_mem_cache to use APR memory pool functions by creating a root pool for object persistence across requests. This also eliminates the need for custom serialization code.

Index: modules/cache/mod_mem_cache.c
===================================================================
--- modules/cache/mod_mem_cache.c.orig
+++ modules/cache/mod_mem_cache.c
@@ -59,19 +59,12 @@
     CACHE_TYPE_MMAP
 } cache_type_e;

-typedef struct
-{
-    char *hdr;
-    char *val;
-} cache_header_tbl_t;
-
 typedef struct mem_cache_object
 {
+    apr_pool_t *pool;
     cache_type_e type;
-    apr_ssize_t num_header_out;
-    apr_ssize_t num_req_hdrs;
-    cache_header_tbl_t *header_out;
-    cache_header_tbl_t *req_hdrs;   /* for Vary negotiation */
+    apr_table_t *header_out;
+    apr_table_t *req_hdrs;          /* for Vary negotiation */
     apr_size_t m_len;
     void *m;
     apr_os_file_t fd;
@@ -212,19 +205,6 @@
 {
     mem_cache_object_t *mobj = obj->vobj;

-    /* TODO:
-     * We desperately need a more efficient way of allocating objects. We're
-     * making way too many malloc calls to create a fully populated
-     * cache object...
-     */
-
-    /* Cleanup the cache_object_t */
-    if (obj->key) {
-        free((void *) obj->key);
-    }
-
-    free(obj);
-
     /* Cleanup the mem_cache_object_t */
     if (mobj) {
         if (mobj->m) {
@@ -237,18 +217,9 @@
             close(mobj->fd);
 #endif
         }
-        if (mobj->header_out) {
-            if (mobj->header_out[0].hdr)
-                free(mobj->header_out[0].hdr);
-            free(mobj->header_out);
-        }
-        if (mobj->req_hdrs) {
-            if (mobj->req_hdrs[0].hdr)
-                free(mobj->req_hdrs[0].hdr);
-            free(mobj->req_hdrs);
-        }
-        free(mobj);
     }
+
+    apr_pool_destroy(mobj->pool);
 }

 static apr_status_t decrement_refcount(void *arg)
@@ -339,9 +310,10 @@
 static int create_entity(cache_handle_t *h, cache_type_e type_e,
                          request_rec *r, const char *key, apr_off_t len)
 {
+    apr_status_t rv;
+    apr_pool_t *pool;
     cache_object_t *obj, *tmp_obj;
     mem_cache_object_t *mobj;
-    apr_size_t key_len;

     if (len == -1) {
         /* Caching a streaming response. Assume the response is
@@ -375,25 +347,21 @@
         }
     }

-    /* Allocate and initialize cache_object_t */
-    obj = calloc(1, sizeof(*obj));
-    if (!obj) {
-        return DECLINED;
-    }
-    key_len = strlen(key) + 1;
-    obj->key = malloc(key_len);
-    if (!obj->key) {
-        cleanup_cache_object(obj);
+    rv = apr_pool_create(&pool, NULL);
+
+    if (rv != APR_SUCCESS) {
+        ap_log_error(APLOG_MARK, APLOG_WARNING, rv, r->server,
+                     "mem_cache: Failed to create memory pool.");
         return DECLINED;
     }
-    memcpy((void *) obj->key, key, key_len);
+
+    /* Allocate and initialize cache_object_t */
+    obj = apr_pcalloc(pool, sizeof(*obj));
+    obj->key = apr_pstrdup(pool, key);

     /* Allocate and init mem_cache_object_t */
-    mobj = calloc(1, sizeof(*mobj));
-    if (!mobj) {
-        cleanup_cache_object(obj);
-        return DECLINED;
-    }
+    mobj = apr_pcalloc(pool, sizeof(*mobj));
+    mobj->pool = pool;

     /* Finish initing the cache object */
     apr_atomic_set32(&obj->refcount, 1);
@@ -545,65 +513,6 @@
     return OK;
 }

-static apr_status_t serialize_table(cache_header_tbl_t **obj,
-                                    apr_ssize_t *nelts, apr_table_t *table)
-{
-    const apr_array_header_t *elts_arr = apr_table_elts(table);
-    apr_table_entry_t *elts = (apr_table_entry_t *) elts_arr->elts;
-    apr_ssize_t i;
-    apr_size_t len = 0;
-    apr_size_t idx = 0;
-    char *buf;
-
-    *nelts = elts_arr->nelts;
-    if (*nelts == 0) {
-        *obj = NULL;
-        return APR_SUCCESS;
-    }
-    *obj = malloc(sizeof(cache_header_tbl_t) * elts_arr->nelts);
-    if (NULL == *obj) {
-        return APR_ENOMEM;
-    }
-    for (i = 0; i < elts_arr->nelts; ++i) {
-        len += strlen(elts[i].key);
-        len += strlen(elts[i].val);
-        len += 2;  /* Extra space for NULL string terminator for key and val */
-    }
-
-    /* Transfer the headers into a contiguous memory block */
-    buf = malloc(len);
-    if (!buf) {
-        free(*obj);
-        *obj = NULL;
-        return APR_ENOMEM;
-    }
-
-    for (i = 0; i < *nelts; ++i) {
-        (*obj)[i].hdr = &buf[idx];
-        len = strlen(elts[i].key) + 1;  /* Include NULL terminator */
-        memcpy(&buf[idx], elts[i].key, len);
-        idx += len;
-
-        (*obj)[i].val = &buf[idx];
-        len = strlen(elts[i].val) + 1;
-        memcpy(&buf[idx], elts[i].val, len);
-        idx += len;
-    }
-    return APR_SUCCESS;
-}
-
-static int unserialize_table(cache_header_tbl_t *ctbl, int num_headers,
-                             apr_table_t *t)
-{
-    int i;
-
-    for (i = 0; i < num_headers; ++i) {
-        apr_table_addn(t, ctbl[i].hdr, ctbl[i].val);
-    }
-
-    return
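The payoff of the pool conversion is lifetime management: one apr_pool_destroy() releases the object, its key, and both header tables at once, which is why the hand-rolled serialize/unserialize and free() bookkeeping above can simply be deleted. The toy arena below shows that lifetime model; it is a sketch of the idea under simple assumptions (one malloc per allocation, no subpools), not APR's actual allocator:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy arena: every allocation is linked to the arena and released in
 * one shot by arena_destroy(), like apr_pool_destroy(). */
struct chunk { struct chunk *next; };
typedef struct { struct chunk *head; } arena_t;

static void *arena_alloc(arena_t *a, size_t n)
{
    struct chunk *c = malloc(sizeof(*c) + n);
    c->next = a->head;
    a->head = c;
    return c + 1;            /* usable memory starts after the link header */
}

static char *arena_strdup(arena_t *a, const char *s)
{
    size_t n = strlen(s) + 1;
    return memcpy(arena_alloc(a, n), s, n);
}

static void arena_destroy(arena_t *a)
{
    while (a->head) {
        struct chunk *next = a->head->next;
        free(a->head);
        a->head = next;
    }
}
```

A cache object, its key copy, and its header copies are all carved from one arena; dropping the object is a single arena_destroy() call, mirroring mobj->pool in the patch.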
[patch 16/16] remove duplicated defines
Remove duplicated defines.

Index: modules/cache/mod_cache.h
===================================================================
--- modules/cache/mod_cache.h.orig
+++ modules/cache/mod_cache.h
@@ -311,27 +311,6 @@

 /* hooks */

-/* Create a set of CACHE_DECLARE(type), CACHE_DECLARE_NONSTD(type) and
- * CACHE_DECLARE_DATA with appropriate export and import tags for the platform
- */
-#if !defined(WIN32)
-#define CACHE_DECLARE(type)            type
-#define CACHE_DECLARE_NONSTD(type)     type
-#define CACHE_DECLARE_DATA
-#elif defined(CACHE_DECLARE_STATIC)
-#define CACHE_DECLARE(type)            type __stdcall
-#define CACHE_DECLARE_NONSTD(type)     type
-#define CACHE_DECLARE_DATA
-#elif defined(CACHE_DECLARE_EXPORT)
-#define CACHE_DECLARE(type)            __declspec(dllexport) type __stdcall
-#define CACHE_DECLARE_NONSTD(type)     __declspec(dllexport) type
-#define CACHE_DECLARE_DATA             __declspec(dllexport)
-#else
-#define CACHE_DECLARE(type)            __declspec(dllimport) type __stdcall
-#define CACHE_DECLARE_NONSTD(type)     __declspec(dllimport) type
-#define CACHE_DECLARE_DATA             __declspec(dllimport)
-#endif
-
 APR_DECLARE_OPTIONAL_FN(apr_status_t, ap_cache_generate_key,
                         (request_rec *r, apr_pool_t *p, char **key));
--
[patch 06/16] dont choke on empty URI path
According to my reading of RFC 3986 (section 6.2.3), if a URI contains an authority component and an empty path, the empty path is to be treated as equivalent to "/". It explicitly cites the following four URIs as equivalent:

    http://example.com
    http://example.com/
    http://example.com:/
    http://example.com:80/

Index: modules/cache/cache_util.c
===================================================================
--- modules/cache/cache_util.c.orig
+++ modules/cache/cache_util.c
@@ -67,9 +67,23 @@
         }
     }

+    /* For HTTP caching purposes, an empty (NULL) path is equivalent to
+     * a single "/" path. RFCs 3986/2396
+     */
+    if (!url->path) {
+        if (*filter->path == '/' && pathlen == 1) {
+            return 1;
+        }
+        else {
+            return 0;
+        }
+    }
+
     /* Url has met all of the filter conditions so far, determine
      * if the paths match.
      */
+
     return !strncmp(filter->path, url->path, pathlen);
 }
--
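The rule the patch implements: when the URL carries an authority but no path, compare as if the path were "/". A self-contained version of just the path comparison (the function name is illustrative, not the patch's):

```c
#include <assert.h>
#include <string.h>

/* Compare a filter path against a URL path, treating a missing (NULL)
 * URL path as the single "/" path per RFC 3986 section 6.2.3. */
static int paths_match(const char *filter_path, size_t pathlen,
                       const char *url_path)
{
    if (!url_path) {
        /* empty path only matches the "/" filter */
        return filter_path[0] == '/' && pathlen == 1;
    }
    return strncmp(filter_path, url_path, pathlen) == 0;
}
```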
[patch 07/16] shrink cache_select
Move parts of cache_select() to a smaller cache_check_request() function that is easier to read and understand.

Index: modules/cache/cache_storage.c
===================================================================
--- modules/cache/cache_storage.c.orig
+++ modules/cache/cache_storage.c
@@ -169,6 +169,115 @@
     }
 }

+static int cache_check_request(cache_handle_t *h, request_rec *r,
+                               cache_request_rec *cache,
+                               cache_provider_list *list)
+{
+    int fresh;
+    char *vary = NULL;
+
+    /*
+     * Check Content-Negotiation - Vary
+     *
+     * At this point we need to make sure that the object we found in
+     * the cache is the same object that would be delivered to the
+     * client, when the effects of content negotiation are taken into
+     * effect.
+     *
+     * In plain english, we want to make sure that a language-negotiated
+     * document in one language is not given to a client asking for a
+     * language negotiated document in a different language by mistake.
+     *
+     * This code makes the assumption that the storage manager will
+     * cache the req_hdrs if the response contains a Vary
+     * header.
+     *
+     * RFC2616 13.6 and 14.44 describe the Vary mechanism.
+     */
+    vary = apr_pstrdup(r->pool, apr_table_get(h->resp_hdrs, "Vary"));
+    while (vary && *vary) {
+        char *name = vary;
+        const char *h1, *h2;
+
+        /* isolate header name */
+        while (*vary && !apr_isspace(*vary) && (*vary != ','))
+            ++vary;
+        while (*vary && (apr_isspace(*vary) || (*vary == ','))) {
+            *vary = '\0';
+            ++vary;
+        }
+
+        /*
+         * is this header in the request and the header in the cached
+         * request identical? If not, we give up and do a straight get
+         */
+        h1 = apr_table_get(r->headers_in, name);
+        h2 = apr_table_get(h->req_hdrs, name);
+        if (h1 == h2) {
+            /* both headers NULL, so a match - do nothing */
+        }
+        else if (h1 && h2 && !strcmp(h1, h2)) {
+            /* both headers exist and are equal - do nothing */
+        }
+        else {
+            /* headers do not match, so Vary failed */
+            ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS,
+                         r->server,
+                         "cache_select_url(): Vary header mismatch.");
+            return DECLINED;
+        }
+    }
+
+    cache->provider = list->provider;
+    cache->provider_name = list->provider_name;
+
+    /* Is our cached response fresh enough? */
+    fresh = ap_cache_check_freshness(h, r);
+    if (!fresh) {
+        const char *etag, *lastmod;
+
+        ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS, r->server,
+                     "Cached response for %s isn't fresh. Adding/replacing "
+                     "conditional request headers.", r->uri);
+
+        /* Make response into a conditional */
+        cache->stale_headers = apr_table_copy(r->pool, r->headers_in);
+
+        /* We can only revalidate with our own conditionals: remove the
+         * conditions from the original request.
+         */
+        apr_table_unset(r->headers_in, "If-Match");
+        apr_table_unset(r->headers_in, "If-Modified-Since");
+        apr_table_unset(r->headers_in, "If-None-Match");
+        apr_table_unset(r->headers_in, "If-Range");
+        apr_table_unset(r->headers_in, "If-Unmodified-Since");
+
+        etag = apr_table_get(h->resp_hdrs, "ETag");
+        lastmod = apr_table_get(h->resp_hdrs, "Last-Modified");
+
+        if (etag || lastmod) {
+            /* If we have a cached etag and/or Last-Modified add in
+             * our own conditionals.
+             */
+
+            if (etag) {
+                apr_table_set(r->headers_in, "If-None-Match", etag);
+            }
+
+            if (lastmod) {
+                apr_table_set(r->headers_in, "If-Modified-Since", lastmod);
+            }
+            cache->stale_handle = h;
+        }
+
+        return DECLINED;
+    }
+
+    return OK;
+}
+
 /*
  * select a specific URL entity in the cache
  *
@@ -201,110 +310,13 @@
     while (list) {
         switch ((rv = list->provider->open_entity(h, r, key))) {
         case OK: {
-            char *vary = NULL;
-            int fresh;

             if (list->provider->recall_headers(h, r) != APR_SUCCESS) {
                 /* TODO: Handle this error */
                 return DECLINED;
             }

-            /*
-             * Check Content-Negotiation - Vary
-             *
-             * At this point we need to make sure that the object we found in
-             * the cache is the same object that would be delivered to the
-             * client, when the effects of content negotiation are taken into
-             * effect.
-             *
-             * In plain english, we want to make sure that a language-negotiated
-             * document in one language is
[patch 15/16] fix a FIXME (clean cache_object_t)
Move members from cache_object_t to mem_cache_object_t that are only required for mod_mem_cache.

Index: modules/cache/mod_cache.h
===
--- modules/cache/mod_cache.h.orig
+++ modules/cache/mod_cache.h
@@ -181,23 +181,12 @@
 /* cache handle information */
-/* XXX TODO On the next structure change/MMN bump,
- * count must become an apr_off_t, representing
- * the potential size of disk cached objects.
- * Then dig for
- * XXX Bad Temporary Cast - see cache_object_t notes
- */
 typedef struct cache_object cache_object_t;
 struct cache_object {
 const char *key;
-cache_object_t *next;
 cache_info info;
 /* Opaque portion (specific to the implementation) of the cache object */
 void *vobj;
-/* FIXME: These are only required for mod_mem_cache. */
-apr_size_t count; /* Number of body bytes written to the cache so far */
-int complete;
-apr_uint32_t refcount; /* refcount and bit flag to cleanup object */
 };
 typedef struct cache_handle cache_handle_t;
Index: modules/cache/mod_mem_cache.c
===
--- modules/cache/mod_mem_cache.c.orig
+++ modules/cache/mod_mem_cache.c
@@ -71,9 +71,10 @@
 apr_int32_t flags; /* File open flags */
 long priority; /** the priority of this entry */
 long total_refs; /** total number of references this entry has had */
-apr_uint32_t pos; /** the position of this entry in the cache */
-
+apr_off_t count; /* Number of body bytes written to the cache so far */
+int complete;
+apr_uint32_t refcount; /* refcount and bit flag to cleanup object */
 } mem_cache_object_t;
 typedef struct
@@ -162,11 +163,12 @@
 static void memcache_cache_free(void *a)
 {
 cache_object_t *obj = (cache_object_t *) a;
+mem_cache_object_t *mobj = obj->vobj;
 /* Decrement the refcount to account for the object being ejected
  * from the cache. If the refcount is 0, free the object.
  */
-if (!apr_atomic_dec32(&obj->refcount)) {
+if (!apr_atomic_dec32(&mobj->refcount)) {
 cleanup_cache_object(obj);
 }
 }
@@ -225,6 +227,7 @@
 static apr_status_t decrement_refcount(void *arg)
 {
 cache_object_t *obj = (cache_object_t *) arg;
+mem_cache_object_t *mobj = obj->vobj;
 /* If obj->complete is not set, the cache update failed and the
  * object needs to be removed from the cache then cleaned up.
@@ -232,7 +235,7 @@
  * cache already, so make sure it is really still in the cache
  * before attempting to remove it.
  */
-if (!obj->complete) {
+if (!mobj->complete) {
 cache_object_t *tobj = NULL;
 if (sconf->lock) {
 apr_thread_mutex_lock(sconf->lock);
@@ -240,7 +243,7 @@
 tobj = cache_find(sconf->cache_cache, obj->key);
 if (tobj == obj) {
 cache_remove(sconf->cache_cache, obj);
-apr_atomic_dec32(&obj->refcount);
+apr_atomic_dec32(&mobj->refcount);
 }
 if (sconf->lock) {
 apr_thread_mutex_unlock(sconf->lock);
@@ -248,7 +251,7 @@
 }
 /* If the refcount drops to 0, cleanup the cache object */
-if (!apr_atomic_dec32(&obj->refcount)) {
+if (!apr_atomic_dec32(&mobj->refcount)) {
 cleanup_cache_object(obj);
 }
 return APR_SUCCESS;
@@ -257,6 +260,7 @@
 static apr_status_t cleanup_cache_mem(void *sconfv)
 {
 cache_object_t *obj;
+mem_cache_object_t *mobj;
 mem_cache_conf *co = (mem_cache_conf *) sconfv;
 if (!co) {
@@ -271,8 +275,9 @@
 }
 obj = cache_pop(co->cache_cache);
 while (obj) {
+mobj = obj->vobj;
 /* Iterate over the cache and clean up each unreferenced entry */
-if (!apr_atomic_dec32(&obj->refcount)) {
+if (!apr_atomic_dec32(&mobj->refcount)) {
 cleanup_cache_object(obj);
 }
 obj = cache_pop(co->cache_cache);
@@ -364,9 +369,9 @@
 mobj->pool = pool;
 /* Finish initing the cache object */
-apr_atomic_set32(&obj->refcount, 1);
+apr_atomic_set32(&mobj->refcount, 1);
 mobj->total_refs = 1;
-obj->complete = 0;
+mobj->complete = 0;
 obj->vobj = mobj;
 /* Safe cast: We tested sconf->max_cache_object_size above */
 mobj->m_len = (apr_size_t) len;
@@ -390,12 +395,13 @@
 tmp_obj = (cache_object_t *) cache_find(sconf->cache_cache, key);
 if (!tmp_obj) {
+mem_cache_object_t *tmp_mobj = obj->vobj;
 cache_insert(sconf->cache_cache, obj);
 /* Add a refcount to account for the reference by the
  * hashtable in the cache. Refcount should be 2 now, one
  * for this thread, and one for the cache.
  */
-apr_atomic_inc32(&obj->refcount);
+apr_atomic_inc32(&tmp_mobj->refcount);
 }
 if (sconf->lock) {
 apr_thread_mutex_unlock(sconf->lock);
@@ -434,6 +440,7 @@
 static int open_entity(cache_handle_t * h, request_rec *r, const char *key)
 {
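The ownership rule the patch preserves — each holder (a worker thread, the cache hashtable) owns one reference, and whoever drops the last one frees the object — can be sketched with C11 atomics. The refcount now lives in the provider-private object, reached through the opaque vobj pointer. Structure and function names here are illustrative, not the httpd ones:

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Provider-private part (what mem_cache_object_t holds after the patch). */
typedef struct { atomic_uint refcount; int complete; } mem_obj;
/* Shared part: only the key and the opaque pointer remain. */
typedef struct { const char *key; void *vobj; } cache_obj;

static int freed_objects = 0;   /* instrumentation for the sketch */

static void cleanup_cache_object(cache_obj *obj) {
    free(obj->vobj);
    free(obj);
    freed_objects++;
}

/* atomic_fetch_sub returns the value *before* the subtraction, so a
 * return of 1 means this caller just dropped the last reference. */
static void release(cache_obj *obj) {
    mem_obj *mobj = obj->vobj;
    if (atomic_fetch_sub(&mobj->refcount, 1) == 1)
        cleanup_cache_object(obj);
}

static cache_obj *make_obj(void) {
    cache_obj *obj = malloc(sizeof *obj);
    mem_obj *mobj = malloc(sizeof *mobj);
    atomic_init(&mobj->refcount, 2);   /* one ref for the thread, one for the cache */
    mobj->complete = 0;
    obj->key = "";
    obj->vobj = mobj;
    return obj;
}
```

APR's apr_atomic_dec32() similarly returns zero only when the count reaches zero, which is why the httpd code tests `if (!apr_atomic_dec32(...))` before cleaning up.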
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
On 14/09/2006, at 04:24, Niklas Edmundsson wrote:

On Wed, 13 Sep 2006, Davi Arnaut wrote:

I'm working on this. You may want to check my proposal at http://verdesmares.com/Apache/proposal.txt

Will it be possible to do away with one file for headers and one file for body in mod_disk_cache with this scheme?

http://verdesmares.com/Apache/patches/016.patch

The thing is that I've been pounding seriously at mod_disk_cache to make it able to sustain rather heavy load on not-so-heavy equipment, and part of that effort was to wrap headers and body into one file, mainly for the following reasons:

* Fewer files, fewer open()s (small gain)
* Much easier to purge old entries from the cache (huge gain). Simply list all files in the cache, sort by atime and remove the oldest. The old way using htcacheclean took ages and had less useful removal criteria.
* No synchronisation issues between the header file and body file; unlink one and it's gone.

That's only one of many changes made, but I found it to be crucial for an architecture that's consistent without relying on locks. This made it rather easy to implement things like serving files that are currently being cached from the cache, reusing expired cached files if the originating file is found to be unmodified, and so on. But the largest gain is still the cache cleaning process.

The stuff is used in production and seems stable; however, I haven't had any response to the first (trivial) patch sent, so I don't know if there's any interest in this.

/Nikke

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED]
---------------------------------------------------------------------------
 Does the Little Mermaid wear an algebra?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
On 14/09/2006, at 04:39, Graham Leggett wrote:

Niklas Edmundsson wrote:

Will it be possible to do away with one file for headers and one file for body in mod_disk_cache with this scheme?

This definitely has lots of advantages - however HTTP/1.1 requires that it be possible to modify the headers on a cached entry independently of the cached body. As long as this is catered for, it should be fine.

This is not a top priority since there is currently no complete support for it in mod_cache (partial responses and such), but it would be nice to have. We could easily extend the format later to support it.

-- Davi Arnaut
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
On 14/09/2006, at 05:08, Issac Goldstand wrote:

This looks familiar. I seem to remember seeing patches for this a few months back. Were they not committed to trunk? If not, is there any reason why not? I'd hate to spend serious time making modifications only to have to redo the work when this (pretty major) patchset gets committed...

Probably because I have not yet submitted it for inclusion and it's not finished. A new cache-dev branch would be really nice - hint hint :)

-- Davi Arnaut

Davi Arnaut wrote:

On 13/09/2006, at 16:29, Issac Goldstand wrote:

Hi all, I've been hacking at mod_cache a bit, and was surprised to find that part of the decision to serve previously cached content or not was being made by the backend provider and not mod_cache; specifically, the expiration date of the content seems to be checked by mod_disk_cache (as part of open_entity), and if the provider check fails, mod_cache doesn't even know about the entity (and therefore, in the case of a caching proxy, can't treat it as a possibly stale entity upon which it can just do a conditional GET and possibly get a 304, rather than requiring mod_proxy to re-request the entire entity again). When I originally started looking at the family of cache modules, I assumed that all of the decision-making logic would be in mod_cache, while the mod_xxx_cache providers would be dumb file-stores (at least, as far as mod_cache is concerned). Is this not the case?

I'm working on this. You may want to check my proposal at http://verdesmares.com/Apache/proposal.txt

If it is, would patches be acceptable if I have the time to try to rectify the situation (at least somewhat)?

http://verdesmares.com/Apache/patches/022.patch

I'm still working on it, things may change radically.

-- Davi Arnaut
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
On 14/09/2006, at 08:48, Graham Leggett wrote:

On Thu, September 14, 2006 1:42 pm, Davi Arnaut wrote:

This is not a top priority since actually there is no complete support for it in mod_cache (partial responses and such), but it would be nice to have it.

HTTP/1.1 compliance is mandatory for the cache. If it doesn't work now, it needs to be fixed.

The cache is required to send the client the most up-to-date response; that doesn't mean it must cache it. What I meant is: _if_ it causes significant slowdowns on a common cache-hit path, it is _probably_ better to just revalidate the whole entity.

-- Davi Arnaut
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
On 14/09/2006, at 09:06, Niklas Edmundsson wrote:

On Thu, 14 Sep 2006, Davi Arnaut wrote:

On 14/09/2006, at 04:24, Niklas Edmundsson wrote:

On Wed, 13 Sep 2006, Davi Arnaut wrote:

I'm working on this. You may want to check my proposal at http://verdesmares.com/Apache/proposal.txt

Will it be possible to do away with one file for headers and one file for body in mod_disk_cache with this scheme?

http://verdesmares.com/Apache/patches/016.patch

OK. You seem to dump the body right after the headers, though, so you won't be able to do header rewrites.

Could you kindly point me to the cache code that rewrites only the headers?

Also, it's rather unnecessary to call the files .cache if there is only one type of file ;)

That's for convenience; there may be other types of files in the same cache directory, created by other tools.

-- Davi Arnaut
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
On 14/09/2006, at 09:21, Davi Arnaut wrote:

On 14/09/2006, at 09:06, Niklas Edmundsson wrote:

On Thu, 14 Sep 2006, Davi Arnaut wrote:

On 14/09/2006, at 04:24, Niklas Edmundsson wrote:

On Wed, 13 Sep 2006, Davi Arnaut wrote:

I'm working on this. You may want to check my proposal at http://verdesmares.com/Apache/proposal.txt

Will it be possible to do away with one file for headers and one file for body in mod_disk_cache with this scheme?

http://verdesmares.com/Apache/patches/016.patch

OK. You seem to dump the body right after the headers, though, so you won't be able to do header rewrites.

Could you kindly point me to the cache code that rewrites only the headers?

Oops, I spoke too much too soon :)

Also, it's rather unnecessary to call the files .cache if there is only one type of file ;)

That's for convenience; there may be other types of files in the same cache directory, created by other tools.

-- Davi Arnaut
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
On 13/09/2006, at 16:29, Issac Goldstand wrote:

Hi all, I've been hacking at mod_cache a bit, and was surprised to find that part of the decision to serve previously cached content or not was being made by the backend provider and not mod_cache; specifically, the expiration date of the content seems to be checked by mod_disk_cache (as part of open_entity), and if the provider check fails, mod_cache doesn't even know about the entity (and therefore, in the case of a caching proxy, can't treat it as a possibly stale entity upon which it can just do a conditional GET and possibly get a 304, rather than requiring mod_proxy to re-request the entire entity again). When I originally started looking at the family of cache modules, I assumed that all of the decision-making logic would be in mod_cache, while the mod_xxx_cache providers would be dumb file-stores (at least, as far as mod_cache is concerned). Is this not the case?

I'm working on this. You may want to check my proposal at http://verdesmares.com/Apache/proposal.txt

If it is, would patches be acceptable if I have the time to try to rectify the situation (at least somewhat)?

http://verdesmares.com/Apache/patches/022.patch

I'm still working on it, things may change radically.

-- Davi Arnaut
Re: How to detect the mpm used via apxs?
On 19/08/2006, at 07:55, Mladen Turk wrote:

Hi, does anyone know how to detect, using apxs, which MPM is in use, so that a module can be built with or without threading code depending on whether the MPM is prefork or a threaded one like worker?

Why use apxs? mpm.h has:

#define WORKER_MPM
#define MPM_NAME "Worker"

The only way I can imagine is to parse the stdout from httpd, but IMHO something like apxs -q MPM should do the trick, like it does for CFLAGS or INCLUDES.

Regards, Mladen.
Re: svn commit: r426604 - in /httpd/httpd/branches/httpd-proxy-scoreboard: modules/proxy/ support/
On 28/07/2006, at 13:34, [EMAIL PROTECTED] wrote:

Author: jfclere
Date: Fri Jul 28 09:33:58 2006
New Revision: 426604
...

+
+static const slotmem_storage_method *checkstorage = NULL;
+static ap_slotmem_t *myscore=NULL;

Indentation consistency? myscore = NULL

+
+if (!port) {
+if (strcmp(scheme, "ajp") == 0)
+port = 8009;
+else if (strcmp(scheme, "http") == 0)
+port = 80;
+else
+port = 443;
+}

apr_uri_port_of_scheme? (it may not have the "ajp" scheme, a patch is appreciated)

+rv = apr_sockaddr_info_get(&epsv_addr, hostname, APR_INET, port, 0, pool);
+if (rv != APR_SUCCESS) {
+ap_log_error(APLOG_MARK, APLOG_ERR, 0, NULL,
+ "apr_sockaddr_info_get failed");
+apr_socket_close(newsock);
+return rv;
+}

ap_log_error(.. APLOG_ERR, rv, NULL ..) so we may have a hint why it failed

+rv = apr_socket_timeout_set(newsock, 10);
+if (rv != APR_SUCCESS) {
+ap_log_error(APLOG_MARK, APLOG_WARNING, 0, NULL,
+"apr_socket_timeout_set");
+apr_socket_close(newsock);
+return rv;

same for ap_log_error

+rv = apr_socket_connect(newsock, epsv_addr);
+if (rv != APR_SUCCESS) {
+ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, NULL,
+"apr_socket_connect failed");
+apr_socket_close(newsock);
+return rv;
+}

same for ap_log_error... and so on.

+
+if (balancer_name)
+strcpy(workerconf->balancer_name, balancer_name);
+workerconf->id = worker->id;
+workerconf->retry = worker->retry;
+workerconf->lbfactor = worker->lbfactor;
+if (worker->name)
+strcpy(workerconf->name, worker->name);
+if (worker->scheme)
+strcpy(workerconf->scheme, worker->scheme);
+if (worker->hostname)
+strcpy(workerconf->hostname, worker->hostname);
+if (worker->route)
+strcpy(workerconf->route, worker->route);
+if (worker->redirect)
+strcpy(workerconf->redirect, worker->redirect);

strncpy?

+/* allocate the data */
+*worker = apr_pcalloc(pool, sizeof(proxy_worker));
+if (workerconf->balancer_name)
+*balancer_name = apr_pcalloc(pool, strlen(workerconf->balancer_name));
+else
+*balancer_name = NULL;

allocated for what? the string is not copied. Also, shouldn't it be strlen(..) + 1?

+/* make the module usuable from outside */
+health_worker_method *health_checker_get_storage()
+{
+return(worker_storage);
+}
+
+/* handle the slotmem storage */
+void health_checker_init_slotmem_storage(slotmem_storage_method *storage)
+{
+checkstorage = storage;
+}
+slotmem_storage_method *health_checker_get_slotmem_storage()
+{
+return(checkstorage);
+}
+
+/* handle the slotmen itself */
+void health_checker_init_slotmem(ap_slotmem_t *score)
+{
+ myscore = score;
+}
+ap_slotmem_t *health_checker_get_slotmem()
+{
+return(myscore);
+}

static APR_INLINE ...

+char * ap_server_root_relative(apr_pool_t *p, const char *name)
+{
+char *fname;
+
+/* XXX: apr_filepath_merge better? */
+if (basedir && name[0] != '/') {
+fname = apr_pcalloc(p, strlen(basedir)+strlen(name)+1);
+strcpy(fname, basedir);
+strcat(fname, "/");
+strcat(fname, name);

apr_pstrcat?
[patch 00/10] SoC 2006 -- extending the cache architecture
Hi,

Here is the first code drop for my SoC proposal, "Redesigning and extending the Apache cache architecture" (http://verdesmares.com/Apache/proposal.txt). The patches are mainly cleanups and bug fixes, plus a new mod_disk_cache directory structure. Further cleanups are still required. Other than that, I changed a bit how and which storage callbacks are called.

Also, I would like to thank my mentor, Paul Querna, for his patience.

-- Davi Arnaut
[patch 02/10] revamped mod_disk_cache directory structure
This patch converts the mod_disk_cache cache directory structure to a uniformly distributed N-level hierarchy. The admin specifies the number of levels (and the number of directories per level), and the files are scattered across the directories.

Example:

CacheRoot /cache/
# CacheDirLevels n l1 l2 ... ln
CacheDirLevels 2 4 256 16

In this example, the directory tree will be three levels deep. The first level will have 4 directories, each of the 4 first-level directories will have 256 directories (second level), and so on down to the last level.

Directory tree layout for the example:

/cache/
/cache/0d
/cache/0d/36
/cache/0d/36/06
/cache/0d/36/06/7a50a5c38a73abdb6db28a2b0f6881e5.data
/cache/0d/36/06/7a50a5c38a73abdb6db28a2b0f6881e5.header

This patch adds support for symlinking the directories to separate disks or partitions by creating the cache files in their destination directory. The symlinks can also be used to load-balance the cache files between disks, because each last-level directory gets (on average) the same number of files -- on a cache setup with 4 first-level directories each one receives 25%, so linking three of them to disk A and the last one to disk B yields a 75/25 distribution between disks A (75) and B (25).
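The scattering described above can be sketched as follows: one string hash supplies an independent "digit" per level, extracted with a per-level divisor (1, l1, l1*l2, ...), and each digit is rendered as a two-hex-character directory name. This is an illustrative sketch of the idea, not the mod_disk_cache implementation (the patch precomputes the divisors and appends an MD5 hex digest as the file name):

```c
#include <string.h>

/* cdb-style string hash (h = h*33 ^ c), matching the patch's variant. */
static unsigned int string_hash(const char *str) {
    unsigned int hash = 5381;
    while (*str) {
        hash += hash << 5;                 /* hash *= 33 ... */
        hash ^= (unsigned char) *str++;    /* ... xor next byte */
    }
    return hash;
}

/* Writes "xx/yy/.../" into out; each level is a two-hex-digit directory
 * chosen uniformly among dirlevels[i] names. Assumes dirlevels[i] <= 256. */
static void level_path(const char *name, unsigned int nlevels,
                       const unsigned int *dirlevels, char *out) {
    static const char hex[] = "0123456789abcdef";
    unsigned int h = string_hash(name), divisor = 1;
    for (unsigned int i = 0; i < nlevels; i++) {
        unsigned int d = (h / divisor) % dirlevels[i];
        *out++ = hex[(d >> 4) & 0xf];
        *out++ = hex[d & 0xf];
        *out++ = '/';
        divisor *= dirlevels[i];
    }
    *out = '\0';
}
```

Because each level's digit is drawn independently and uniformly, every last-level directory receives the same expected share of files, which is what makes the symlink-based 75/25 split between disks work.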
Index: trunk/modules/cache/cache_util.c === --- trunk.orig/modules/cache/cache_util.c +++ trunk/modules/cache/cache_util.c @@ -19,6 +19,7 @@ #include mod_cache.h #include ap_provider.h +#include util_md5.h /* -- */ @@ -489,54 +490,49 @@ CACHE_DECLARE(void) ap_cache_usec2hex(ap y[sizeof(j) * 2] = '\0'; } -static void cache_hash(const char *it, char *val, int ndepth, int nlength) +static unsigned int cdb_string_hash(const char *str) { -apr_md5_ctx_t context; -unsigned char digest[16]; -char tmp[22]; -int i, k, d; -unsigned int x; -static const char enc_table[64] = -ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_@; - -apr_md5_init(context); -apr_md5_update(context, (const unsigned char *) it, strlen(it)); -apr_md5_final(digest, context); - -/* encode 128 bits as 22 characters, using a modified uuencoding - * the encoding is 3 bytes - 4 characters* i.e. 128 bits is - * 5 x 3 bytes + 1 byte - 5 * 4 characters + 2 characters - */ -for (i = 0, k = 0; i 15; i += 3) { -x = (digest[i] 16) | (digest[i + 1] 8) | digest[i + 2]; -tmp[k++] = enc_table[x 18]; -tmp[k++] = enc_table[(x 12) 0x3f]; -tmp[k++] = enc_table[(x 6) 0x3f]; -tmp[k++] = enc_table[x 0x3f]; -} - -/* one byte left */ -x = digest[15]; -tmp[k++] = enc_table[x 2];/* use up 6 bits */ -tmp[k++] = enc_table[(x 4) 0x3f]; - -/* now split into directory levels */ -for (i = k = d = 0; d ndepth; ++d) { -memcpy(val[i], tmp[k], nlength); -k += nlength; -val[i + nlength] = '/'; -i += nlength + 1; -} -memcpy(val[i], tmp[k], 22 - k); -val[i + 22 - k] = '\0'; -} - -CACHE_DECLARE(char *)ap_cache_generate_name(apr_pool_t *p, int dirlevels, -int dirlength, const char *name) -{ -char hashfile[66]; -cache_hash(name, hashfile, dirlevels, dirlength); -return apr_pstrdup(p, hashfile); +unsigned int hash = 5381; + +while (*str) { +hash += (hash 5); +hash ^= *str++; +} + +return hash; +} + +#define MD5_HEXDIGESTSIZE (APR_MD5_DIGESTSIZE * 2 + 1) + +CACHE_DECLARE(char *)ap_cache_generate_name(apr_pool_t *p, unsigned int 
nlevels, +unsigned int *dirlevels, +unsigned int *divisors, +const char *name) +{ +char *md5_hash; +unsigned int i; +char *ptr, *key; +unsigned char h; +unsigned int cdb_hash; +static const char hex[] = 0123456789abcdef; + +md5_hash = ap_md5_binary(p, (unsigned char *) name, (int) strlen(name)); + +cdb_hash = cdb_string_hash(name); + +key = ptr = apr_palloc(p, nlevels * LEVEL_DIR_LENGTH + + MD5_HEXDIGESTSIZE); + +for (i = 0; i nlevels; i++) { +h = (cdb_hash / divisors[i]) % dirlevels[i]; +*ptr++ = hex[h 4]; +*ptr++ = hex[h 0xf]; +*ptr++ = '/'; +} + +memcpy(ptr, md5_hash, MD5_HEXDIGESTSIZE); + +return key; } /* Create a new table consisting of those elements from an input Index: trunk/modules/cache/mod_cache.h === --- trunk.orig/modules/cache/mod_cache.h +++ trunk/modules/cache/mod_cache.h @@ -72,6 +72,8 @@ #include apr_atomic.h +#define LEVEL_DIR_LENGTH3 + #ifndef MAX #define MAX(a,b)((a) (b) ? (a) : (b)) #endif @@ -274,8 +276,9 @@ CACHE_DECLARE(void) ap_cache_accept_head CACHE_DECLARE(apr_time_t)
[patch 01/10] dont cache expired objects
Don't cache responses with an Expires date in the past; otherwise the cache code will _always_ try to cache the URL, leading to numerous rename() errors on win32 since the URL is already cached.

Index: trunk/modules/cache/mod_cache.c
===
--- trunk.orig/modules/cache/mod_cache.c
+++ trunk/modules/cache/mod_cache.c
@@ -426,6 +426,11 @@ static int cache_save_filter(ap_filter_t
 /* if a broken Expires header is present, don't cache it */
 reason = apr_pstrcat(p, "Broken expires header: ", exps, NULL);
 }
+else if (exps != NULL && exp != APR_DATE_BAD && exp < r->request_time)
+{
+/* if an Expires header is in the past, don't cache it */
+reason = "Expires header already expired, not cacheable";
+}
 else if (r->args && exps == NULL) {
 /* if query string present but no expiration time, don't cache it
  * (RFC 2616/13.9)
--
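The rule the patch adds sits between "Expires is unparseable" and "query string without expiry". A minimal sketch of that decision, using plain time_t instead of apr_time_t and modelling APR_DATE_BAD as (time_t)-1 (check_expires is a hypothetical helper, not the mod_cache code):

```c
#include <time.h>
#include <stddef.h>

#define DATE_BAD ((time_t) -1)

/* Returns a reason string if the response must not be cached because of
 * its Expires header, else NULL (cacheable as far as Expires goes). */
static const char *check_expires(const char *exps_header, time_t exp,
                                 time_t request_time) {
    if (exps_header != NULL && exp == DATE_BAD)
        return "Broken expires header";
    if (exps_header != NULL && exp != DATE_BAD && exp < request_time)
        return "Expires header already expired, not cacheable";
    return NULL;
}
```

Without the middle clause, an already-expired response is stored, found stale on every later request, fetched again, and stored again, which is the rename() churn the changelog describes.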
[patch 06/10] dont choke on empty URI path
According to my reading of RFC 3986 (section 6.2.3), if a URI contains an authority component and an empty path, the empty path is equivalent to "/". It explicitly cites the following four URIs as equivalent:

http://example.com
http://example.com/
http://example.com:/
http://example.com:80/

Index: trunk/modules/cache/cache_util.c
===
--- trunk.orig/modules/cache/cache_util.c
+++ trunk/modules/cache/cache_util.c
@@ -67,9 +67,23 @@ static int uri_meets_conditions(const ap
 }
 }
+/* For HTTP caching purposes, an empty (NULL) path is equivalent to
+ * a single "/" path. RFCs 3986/2396
+ */
+
+if (!url->path) {
+if (*filter->path == '/' && pathlen == 1) {
+return 1;
+}
+else {
+return 0;
+}
+}
+
 /* Url has met all of the filter conditions so far, determine
  * if the paths match.
  */
+
 return !strncmp(filter->path, url->path, pathlen);
 }
--
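The path comparison after the patch can be sketched in isolation: a NULL path (how an unset path appears in a parsed URI structure) matches only a filter of exactly "/", and otherwise the filter path is a prefix of the URI path. Illustrative sketch, not the cache_util code:

```c
#include <string.h>
#include <stddef.h>

/* NULL url_path models an empty path, per RFC 3986 section 6.2.3
 * equivalent to "/" when an authority component is present. */
static int paths_match(const char *filter_path, size_t pathlen,
                       const char *url_path) {
    if (url_path == NULL) {
        /* empty path only matches a filter of exactly "/" */
        return filter_path[0] == '/' && pathlen == 1;
    }
    return strncmp(filter_path, url_path, pathlen) == 0;
}
```

This is what lets CacheEnable for "/" catch a request for http://example.com with no trailing slash, instead of choking on the empty path.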
[patch 03/10] shrink cache_save_filter
Move parts of cache_save_filter() to a smaller check_cacheable_request() function that is easier to read and understand. Also, a useless assignment is removed.

Index: trunk/modules/cache/mod_cache.c
===
--- trunk.orig/modules/cache/mod_cache.c
+++ trunk/modules/cache/mod_cache.c
@@ -294,65 +294,11 @@ static int cache_out_filter(ap_filter_t
  * Finally, pass the data to the next filter (the network or whatever)
  */
-static int cache_save_filter(ap_filter_t *f, apr_bucket_brigade *in)
+static const char* check_cacheable_request(request_rec *r,
+cache_request_rec *cache, cache_server_conf *conf)
 {
-int rv = !OK;
-int date_in_errhdr = 0;
-request_rec *r = f->r;
-cache_request_rec *cache;
-cache_server_conf *conf;
-const char *cc_out, *cl;
-const char *exps, *lastmods, *dates, *etag;
-apr_time_t exp, date, lastmod, now;
-apr_off_t size;
-cache_info *info = NULL;
-char *reason;
-apr_pool_t *p;
-
-conf = (cache_server_conf *) ap_get_module_config(r->server->module_config,
- &cache_module);
-
-/* Setup cache_request_rec */
-cache = (cache_request_rec *) ap_get_module_config(r->request_config,
- &cache_module);
-if (!cache) {
-/* user likely configured CACHE_SAVE manually; they should really use
- * mod_cache configuration to do that
- */
-cache = apr_pcalloc(r->pool, sizeof(cache_request_rec));
-ap_set_module_config(r->request_config, &cache_module, cache);
-}
-
-reason = NULL;
-p = r->pool;
-/*
- * Pass Data to Cache
- * --
- * This section passes the brigades into the cache modules, but only
- * if the setup section (see below) is complete.
- */
-if (cache->block_response) {
-/* We've already sent down the response and EOS. So, ignore
- * whatever comes now.
- */
-return APR_SUCCESS;
-}
-
-/* have we already run the cachability check and set up the
- * cached file handle?
- */
-if (cache->in_checked) {
-/* pass the brigades into the cache, then pass them
- * up the filter stack
- */
-rv = cache->provider->store_body(cache->handle, r, in);
-if (rv != APR_SUCCESS) {
-ap_log_error(APLOG_MARK, APLOG_DEBUG, rv, r->server,
- "cache: Cache provider's store_body failed!");
-ap_remove_output_filter(f);
-}
-return ap_pass_brigade(f->next, in);
-}
+apr_time_t exp, lastmod;
+const char *exps, *lastmods, *etag, *cc_out, *reason = NULL;
 /*
  * Setup Data in Cache
@@ -420,11 +366,11 @@ static int cache_save_filter(ap_filter_t
  * We include 304 Not Modified here too as this is the origin server
  * telling us to serve the cached copy.
  */
-reason = apr_psprintf(p, "Response status %d", r->status);
+reason = apr_psprintf(r->pool, "Response status %d", r->status);
 }
 else if (exps != NULL && exp == APR_DATE_BAD) {
 /* if a broken Expires header is present, don't cache it */
-reason = apr_pstrcat(p, "Broken expires header: ", exps, NULL);
+reason = apr_pstrcat(r->pool, "Broken expires header: ", exps, NULL);
 }
 else if (exps != NULL && exp != APR_DATE_BAD && exp < r->request_time)
 {
@@ -503,20 +449,16 @@ static int cache_save_filter(ap_filter_t
 reason = "r->no_cache present";
 }
-if (reason) {
-ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
- "cache: %s not cached. Reason: %s", r->unparsed_uri,
- reason);
+cache->exp = exp;
+cache->lastmod = lastmod;
-/* remove this filter from the chain */
-ap_remove_output_filter(f);
-
-/* ship the data up the stack */
-return ap_pass_brigade(f->next, in);
-}
+return reason;
+}
-/* Make it so that we don't execute this path again. */
-cache->in_checked = 1;
+static apr_off_t request_cl(request_rec *r, apr_bucket_brigade *in)
+{
+const char *cl;
+apr_off_t size;
 /* Set the content length if known.
  */
@@ -561,48 +503,18 @@ static int cache_save_filter(ap_filter_t
 }
 }
-/* It's safe to cache the response.
- *
- * There are two possiblities at this point:
- * - cache->handle == NULL. In this case there is no previously
- * cached entity anywhere on the system. We must create a brand
- * new entity and store the response in it.
- * - cache->stale_handle != NULL. In this case there is a stale
- * entity in the system which needs to be replaced by new
- * content (unless the result was 304 Not Modified, which means
- * the cached entity is actually fresh, and we should update
- * the headers).
- */
-
-/* Did we have a stale cache entry
[patch 05/10] pass uri_meets_conditions values by reference
Don't pass 'large' objects by value when not needed; it wastes time and space.

Index: trunk/modules/cache/cache_util.c
===
--- trunk.orig/modules/cache/cache_util.c
+++ trunk/modules/cache/cache_util.c
@@ -28,40 +28,41 @@ extern module AP_MODULE_DECLARE_DATA cac
 /* Determine if url matches the hostname, scheme and port and path
  * in filter. All but the path comparisons are case-insensitive.
  */
-static int uri_meets_conditions(apr_uri_t filter, int pathlen, apr_uri_t url)
+static int uri_meets_conditions(const apr_uri_t *filter, apr_size_t pathlen,
+const apr_uri_t *url)
 {
 /* Compare the hostnames */
-if(filter.hostname) {
-if (!url.hostname) {
+if(filter->hostname) {
+if (!url->hostname) {
 return 0;
 }
-else if (strcasecmp(filter.hostname, url.hostname)) {
+else if (strcasecmp(filter->hostname, url->hostname)) {
 return 0;
 }
 }
 /* Compare the schemes */
-if(filter.scheme) {
-if (!url.scheme) {
+if(filter->scheme) {
+if (!url->scheme) {
 return 0;
 }
-else if (strcasecmp(filter.scheme, url.scheme)) {
+else if (strcasecmp(filter->scheme, url->scheme)) {
 return 0;
 }
 }
 /* Compare the ports */
-if(filter.port_str) {
-if (url.port_str && filter.port != url.port) {
+if(filter->port_str) {
+if (url->port_str && filter->port != url->port) {
 return 0;
 }
 /* NOTE: ap_port_of_scheme will return 0 if given NULL input */
-else if (filter.port != apr_uri_port_of_scheme(url.scheme)) {
+else if (filter->port != apr_uri_port_of_scheme(url->scheme)) {
 return 0;
 }
 }
-else if(url.port_str && filter.scheme) {
-if (apr_uri_port_of_scheme(filter.scheme) == url.port) {
+else if(url->port_str && filter->scheme) {
+if (apr_uri_port_of_scheme(filter->scheme) == url->port) {
 return 0;
 }
 }
@@ -69,12 +70,11 @@ static int uri_meets_conditions(apr_uri_
 /* Url has met all of the filter conditions so far, determine
  * if the paths match.
  */
-return !strncmp(filter.path, url.path, pathlen);
+return !strncmp(filter->path, url->path, pathlen);
 }
 CACHE_DECLARE(cache_provider_list *)ap_cache_get_providers(request_rec *r,
- cache_server_conf *conf,
- apr_uri_t uri)
+ cache_server_conf *conf)
 {
 cache_provider_list *providers = NULL;
 int i;
@@ -83,7 +83,7 @@ CACHE_DECLARE(cache_provider_list *)ap_c
 for (i = 0; i < conf->cacheenable->nelts; i++) {
 struct cache_enable *ent =
 (struct cache_enable *)conf->cacheenable->elts;
-if (uri_meets_conditions(ent[i].url, ent[i].pathlen, uri)) {
+if (uri_meets_conditions(&ent[i].url, ent[i].pathlen, &r->parsed_uri)) {
 /* Fetch from global config and add to the list. */
 cache_provider *provider;
 provider = ap_lookup_provider(CACHE_PROVIDER_GROUP, ent[i].type,
@@ -120,7 +120,7 @@ CACHE_DECLARE(cache_provider_list *)ap_c
 for (i = 0; i < conf->cachedisable->nelts; i++) {
 struct cache_disable *ent =
 (struct cache_disable *)conf->cachedisable->elts;
-if (uri_meets_conditions(ent[i].url, ent[i].pathlen, uri)) {
+if (uri_meets_conditions(&ent[i].url, ent[i].pathlen, &r->parsed_uri)) {
 /* Stop searching now. */
 return NULL;
 }
Index: trunk/modules/cache/mod_cache.c
===
--- trunk.orig/modules/cache/mod_cache.c
+++ trunk/modules/cache/mod_cache.c
@@ -106,7 +106,7 @@ static int cache_url_handler(request_rec
 /*
  * Which cache module (if any) should handle this request?
  */
-if (!(providers = ap_cache_get_providers(r, conf, r->parsed_uri))) {
+if (!(providers = ap_cache_get_providers(r, conf))) {
 return DECLINED;
 }
Index: trunk/modules/cache/mod_cache.h
===
--- trunk.orig/modules/cache/mod_cache.h
+++ trunk/modules/cache/mod_cache.h
@@ -280,7 +280,7 @@ CACHE_DECLARE(char *) ap_cache_generate_
 unsigned int *dirlevels,
 unsigned int *divisors,
 const char *name);
-CACHE_DECLARE(cache_provider_list *)ap_cache_get_providers(request_rec *r, cache_server_conf *conf, apr_uri_t uri);
+CACHE_DECLARE(cache_provider_list *)ap_cache_get_providers(request_rec *r, cache_server_conf *conf);
 CACHE_DECLARE(int) ap_cache_liststr(apr_pool_t *p, const char *list,
[patch 04/10] shrink cache_url_handler
Move parts of cache_url_handler() to a smaller add_cache_filters() function that is easier to read and understand.

Index: trunk/modules/cache/mod_cache.c
===
--- trunk.orig/modules/cache/mod_cache.c
+++ trunk/modules/cache/mod_cache.c
@@ -48,6 +48,44 @@ static ap_filter_rec_t *cache_remove_url
  * oh well.
  */
+static void add_cache_filters(request_rec *r, cache_request_rec *cache)
+{
+/*
+ * Add cache_save filter to cache this request. Choose
+ * the correct filter by checking if we are a subrequest
+ * or not.
+ */
+if (r->main) {
+ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS,
+r->server,
+"Adding CACHE_SAVE_SUBREQ filter for %s",
+r->uri);
+ap_add_output_filter_handle(cache_save_subreq_filter_handle,
+NULL, r, r->connection);
+}
+else {
+ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS,
+r->server, "Adding CACHE_SAVE filter for %s",
+r->uri);
+ap_add_output_filter_handle(cache_save_filter_handle,
+NULL, r, r->connection);
+}
+
+ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS, r->server,
+"Adding CACHE_REMOVE_URL filter for %s",
+r->uri);
+
+/* Add cache_remove_url filter to this request to remove a
+ * stale cache entry if needed. Also put the current cache
+ * request rec in the filter context, as the request that
+ * is available later during running the filter maybe
+ * different due to an internal redirect.
+ */
+cache->remove_url_filter =
+ap_add_output_filter_handle(cache_remove_url_filter_handle,
+cache, r, r->connection);
+}
+
 static int cache_url_handler(request_rec *r, int lookup)
 {
 apr_status_t rv;
@@ -111,41 +149,7 @@ static int cache_url_handler(request_rec
 if (rv != OK) {
 if (rv == DECLINED) {
 if (!lookup) {
-
-/*
- * Add cache_save filter to cache this request. Choose
- * the correct filter by checking if we are a subrequest
- * or not.
- */
-if (r->main) {
-ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS,
- r->server,
- "Adding CACHE_SAVE_SUBREQ filter for %s",
- r->uri);
-ap_add_output_filter_handle(cache_save_subreq_filter_handle,
-NULL, r, r->connection);
-}
-else {
-ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS,
- r->server, "Adding CACHE_SAVE filter for %s",
- r->uri);
-ap_add_output_filter_handle(cache_save_filter_handle,
-NULL, r, r->connection);
-}
-
-ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS, r->server,
- "Adding CACHE_REMOVE_URL filter for %s",
- r->uri);
-
-/* Add cache_remove_url filter to this request to remove a
- * stale cache entry if needed. Also put the current cache
- * request rec in the filter context, as the request that
- * is available later during running the filter maybe
- * different due to an internal redirect.
- */
-cache->remove_url_filter =
-ap_add_output_filter_handle(cache_remove_url_filter_handle,
-cache, r, r->connection);
+add_cache_filters(r, cache);
 }
 else {
 if (cache->stale_headers) {
--
[patch 08/10] unify storage providers entry calls
Later on, these will handle all the HTTP-specific caching nuances.

Index: trunk/modules/cache/cache_storage.c
===
--- trunk.orig/modules/cache/cache_storage.c
+++ trunk/modules/cache/cache_storage.c
@@ -24,6 +24,30 @@ extern module AP_MODULE_DECLARE_DATA cac

 /* -- */

+apr_status_t cache_store_entity_headers(cache_request_rec *cache,
+                                        request_rec *r, cache_info *info)
+{
+    return cache->provider->store_headers(cache->handle, r, info);
+}
+
+apr_status_t cache_store_entity_body(cache_request_rec *cache, request_rec *r,
+                                     apr_bucket_brigade *bb)
+{
+    return cache->provider->store_body(cache->handle, r, bb);
+}
+
+apr_status_t cache_recall_entity_headers(const cache_provider *provider,
+                                         cache_handle_t *h, request_rec *r)
+{
+    return provider->recall_headers(h, r);
+}
+
+apr_status_t cache_recall_entity_body(cache_request_rec *cache,
+                                      apr_pool_t *pool, apr_bucket_brigade *bb)
+{
+    return cache->provider->recall_body(cache->handle, pool, bb);
+}
+
 /*
  * delete all URL entities from the cache
  *
@@ -311,7 +335,7 @@ int cache_select(request_rec *r)
         switch ((rv = list->provider->open_entity(h, r, key))) {
         case OK: {
-            if (list->provider->recall_headers(h, r) != APR_SUCCESS) {
+            if (cache_recall_entity_headers(list->provider, h, r) != APR_SUCCESS) {
                 /* TODO: Handle this error */
                 return DECLINED;
             }

Index: trunk/modules/cache/mod_cache.c
===
--- trunk.orig/modules/cache/mod_cache.c
+++ trunk/modules/cache/mod_cache.c
@@ -271,7 +271,7 @@ static int cache_out_filter(ap_filter_t
     r->status = cache->handle->cache_obj->info.status;

     /* recall_headers() was called in cache_select() */
-    cache->provider->recall_body(cache->handle, r->pool, bb);
+    cache_recall_entity_body(cache, r->pool, bb);

     /* This filter is done once it has served up its content */
     ap_remove_output_filter(f);
@@ -676,7 +676,7 @@ static int cache_save_filter(ap_filter_t
     /* pass the brigades into the cache, then pass them
      * up the filter stack */
-    rv = cache->provider->store_body(cache->handle, r, in);
+    rv = cache_store_entity_body(cache, r, in);
     if (rv != APR_SUCCESS) {
         ap_log_error(APLOG_MARK, APLOG_DEBUG, rv, r->server,
                      "cache: Cache provider's store_body failed!");
@@ -781,7 +781,7 @@ static int cache_save_filter(ap_filter_t
      * permissions problems or a read-only (re)mount. This must be handled
      * later.
      */
-    rv = cache->provider->store_headers(cache->handle, r, info);
+    rv = cache_store_entity_headers(cache, r, info);

     /* Did we just update the cached headers on a revalidated response?
      *
@@ -809,7 +809,7 @@ static int cache_save_filter(ap_filter_t
         APR_BRIGADE_INSERT_TAIL(bb, bkt);
     }
     else {
-        cache->provider->recall_body(cache->handle, r->pool, bb);
+        cache_recall_entity_body(cache, r->pool, bb);
     }
     cache->block_response = 1;
@@ -845,7 +845,7 @@ static int cache_save_filter(ap_filter_t
         return ap_pass_brigade(f->next, in);
     }
-    rv = cache->provider->store_body(cache->handle, r, in);
+    rv = cache_store_entity_body(cache, r, in);
     if (rv != APR_SUCCESS) {
         ap_log_error(APLOG_MARK, APLOG_DEBUG, rv, r->server,
                      "cache: store_body failed");

Index: trunk/modules/cache/mod_cache.h
===
--- trunk.orig/modules/cache/mod_cache.h
+++ trunk/modules/cache/mod_cache.h
@@ -305,13 +305,11 @@ apr_status_t cache_generate_key_default(
  */
 const char* cache_create_key( request_rec*r );

-/*
-apr_status_t cache_store_entity_headers(cache_handle_t *h, request_rec *r, cache_info *info);
-apr_status_t cache_store_entity_body(cache_handle_t *h, request_rec *r, apr_bucket_brigade *bb);
-
-apr_status_t cache_recall_entity_headers(cache_handle_t *h, request_rec *r);
-apr_status_t cache_recall_entity_body(cache_handle_t *h, apr_pool_t *p, apr_bucket_brigade *bb);
-*/
+apr_status_t cache_store_entity_headers(cache_request_rec *cache, request_rec *r, cache_info *info);
+apr_status_t cache_store_entity_body(cache_request_rec *cache, request_rec *r, apr_bucket_brigade *bb);
+
+apr_status_t cache_recall_entity_headers(const cache_provider *provider, cache_handle_t *h, request_rec *r);
+apr_status_t cache_recall_entity_body(cache_request_rec *cache, apr_pool_t *pool, apr_bucket_brigade *bb);

 /* hooks */
--
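The point of these wrappers is that every store/recall now funnels through one function, so HTTP-specific logic can later be added in a single place instead of at every call site. A minimal standalone sketch of that indirection; all type and function names here are illustrative stand-ins, not the real mod_cache API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the mod_cache types: a provider
 * exposes raw callbacks, and thin wrapper functions give the cache core a
 * single entry point per operation. */
typedef int cache_status_t;            /* stand-in for apr_status_t */
#define CACHE_OK 0

typedef struct cache_provider {
    cache_status_t (*store_headers)(void *handle, const char *key);
    cache_status_t (*store_body)(void *handle, const char *data);
} cache_provider;

typedef struct cache_request {
    const cache_provider *provider;
    void *handle;
} cache_request;

/* Wrapper: all store_headers calls funnel through here, so protocol-specific
 * checks could later be added once instead of at every call site. */
static cache_status_t cache_store_entity_headers(cache_request *c,
                                                 const char *key)
{
    return c->provider->store_headers(c->handle, key);
}

/* A trivial provider implementation used only for illustration. */
static int headers_stored = 0;
static cache_status_t dummy_store_headers(void *handle, const char *key)
{
    (void)handle; (void)key;
    headers_stored = 1;           /* record that the callback was reached */
    return CACHE_OK;
}
```

The call sites only name the wrapper; swapping the provider (disk, memory) changes nothing above this line.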
[patch 09/10] move open_entity
Move the open_entity call into cache_recall_entity_headers, adding a new urlkey parameter.

Index: trunk/modules/cache/cache_storage.c
===
--- trunk.orig/modules/cache/cache_storage.c
+++ trunk/modules/cache/cache_storage.c
@@ -37,8 +37,16 @@ apr_status_t cache_store_entity_body(cac
 }

 apr_status_t cache_recall_entity_headers(const cache_provider *provider,
-                                         cache_handle_t *h, request_rec *r)
+                                         cache_handle_t *h, request_rec *r,
+                                         const char *urlkey)
 {
+    apr_status_t rv;
+
+    rv = provider->open_entity(h, r, urlkey);
+
+    if (rv != OK)
+        return rv;
+
     return provider->recall_headers(h, r);
 }

@@ -332,14 +340,9 @@ int cache_select(request_rec *r)
     list = cache->providers;

     while (list) {
-        switch ((rv = list->provider->open_entity(h, r, key))) {
-        case OK: {
-
-            if (cache_recall_entity_headers(list->provider, h, r) != APR_SUCCESS) {
-                /* TODO: Handle this error */
-                return DECLINED;
-            }
+        rv = cache_recall_entity_headers(list->provider, h, r, key);

+        if (rv == APR_SUCCESS) {
             if (cache_check_request(h, r, cache, list) != OK) {
                 return DECLINED;
             }
@@ -350,16 +353,11 @@ int cache_select(request_rec *r)
             cache->handle = h;
             return OK;
         }
-        case DECLINED: {
+        else {
             /* try again with next cache type */
             list = list->next;
             continue;
         }
-        default: {
-            /* oo-er! an error */
-            return rv;
-        }
-        }
     }
     return DECLINED;
 }

Index: trunk/modules/cache/mod_cache.h
===
--- trunk.orig/modules/cache/mod_cache.h
+++ trunk/modules/cache/mod_cache.h
@@ -308,7 +308,8 @@ const char* cache_create_key( request_re
 apr_status_t cache_store_entity_headers(cache_request_rec *cache, request_rec *r, cache_info *info);
 apr_status_t cache_store_entity_body(cache_request_rec *cache, request_rec *r, apr_bucket_brigade *bb);

-apr_status_t cache_recall_entity_headers(const cache_provider *provider, cache_handle_t *h, request_rec *r);
+apr_status_t cache_recall_entity_headers(const cache_provider *provider, cache_handle_t *h, request_rec *r,
+                                         const char *urlkey);
 apr_status_t cache_recall_entity_body(cache_request_rec *cache, apr_pool_t *pool, apr_bucket_brigade *bb);

 /* hooks */
--
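The reshaped cache_select() loop relies on the wrapper mapping an open failure straight to its status, so a miss simply falls through to the next provider. A toy sketch of that open-then-recall composition; the stub provider and all names are hypothetical:

```c
#include <assert.h>

#define OK 0
#define DECLINED (-1)

/* Hypothetical provider with separate open/recall steps, and a wrapper that
 * composes them the way cache_recall_entity_headers() does in the patch:
 * open the entity first and short-circuit on failure. */
typedef struct provider {
    int (*open_entity)(const char *urlkey);
    int (*recall_headers)(void);
} provider;

static int recall_entity_headers(const provider *p, const char *urlkey)
{
    int rv = p->open_entity(urlkey);   /* locate the cached entity */
    if (rv != OK)
        return rv;                     /* e.g. DECLINED: no such entry */
    return p->recall_headers();        /* only recall once open succeeded */
}

/* Illustration-only provider: pretend only keys starting with 'h' exist. */
static int open_stub(const char *urlkey)
{
    return urlkey[0] == 'h' ? OK : DECLINED;
}
static int recall_stub(void)
{
    return OK;
}
```

The caller no longer needs a three-way switch; one status value covers both the open and the recall step.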
[patch 07/10] shrink cache_select
Move parts of cache_select() to a smaller cache_check_request() function that is easier to read and understand.

Index: trunk/modules/cache/cache_storage.c
===
--- trunk.orig/modules/cache/cache_storage.c
+++ trunk/modules/cache/cache_storage.c
@@ -169,6 +169,115 @@ CACHE_DECLARE(void) ap_cache_accept_head
     }
 }

+static int cache_check_request(cache_handle_t *h, request_rec *r,
+                               cache_request_rec *cache,
+                               cache_provider_list *list)
+{
+    int fresh;
+    char *vary = NULL;
+
+    /*
+     * Check Content-Negotiation - Vary
+     *
+     * At this point we need to make sure that the object we found in
+     * the cache is the same object that would be delivered to the
+     * client, when the effects of content negotiation are taken into
+     * effect.
+     *
+     * In plain english, we want to make sure that a language-negotiated
+     * document in one language is not given to a client asking for a
+     * language negotiated document in a different language by mistake.
+     *
+     * This code makes the assumption that the storage manager will
+     * cache the req_hdrs if the response contains a Vary
+     * header.
+     *
+     * RFC2616 13.6 and 14.44 describe the Vary mechanism.
+     */
+    vary = apr_pstrdup(r->pool, apr_table_get(h->resp_hdrs, "Vary"));
+    while (vary && *vary) {
+        char *name = vary;
+        const char *h1, *h2;
+
+        /* isolate header name */
+        while (*vary && !apr_isspace(*vary) && (*vary != ','))
+            ++vary;
+        while (*vary && (apr_isspace(*vary) || (*vary == ','))) {
+            *vary = '\0';
+            ++vary;
+        }
+
+        /*
+         * is this header in the request and the header in the cached
+         * request identical? If not, we give up and do a straight get
+         */
+        h1 = apr_table_get(r->headers_in, name);
+        h2 = apr_table_get(h->req_hdrs, name);
+        if (h1 == h2) {
+            /* both headers NULL, so a match - do nothing */
+        }
+        else if (h1 && h2 && !strcmp(h1, h2)) {
+            /* both headers exist and are equal - do nothing */
+        }
+        else {
+            /* headers do not match, so Vary failed */
+            ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS,
+                         r->server,
+                         "cache_select_url(): Vary header mismatch.");
+            return DECLINED;
+        }
+    }
+
+    cache->provider = list->provider;
+    cache->provider_name = list->provider_name;
+
+    /* Is our cached response fresh enough? */
+    fresh = ap_cache_check_freshness(h, r);
+    if (!fresh) {
+        const char *etag, *lastmod;
+
+        ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS, r->server,
+                     "Cached response for %s isn't fresh. Adding/replacing "
+                     "conditional request headers.", r->uri);
+
+        /* Make response into a conditional */
+        cache->stale_headers = apr_table_copy(r->pool,
+                                              r->headers_in);
+
+        /* We can only revalidate with our own conditionals: remove the
+         * conditions from the original request.
+         */
+        apr_table_unset(r->headers_in, "If-Match");
+        apr_table_unset(r->headers_in, "If-Modified-Since");
+        apr_table_unset(r->headers_in, "If-None-Match");
+        apr_table_unset(r->headers_in, "If-Range");
+        apr_table_unset(r->headers_in, "If-Unmodified-Since");
+
+        etag = apr_table_get(h->resp_hdrs, "ETag");
+        lastmod = apr_table_get(h->resp_hdrs, "Last-Modified");
+
+        if (etag || lastmod) {
+            /* If we have a cached etag and/or Last-Modified add in
+             * our own conditionals.
+             */
+            if (etag) {
+                apr_table_set(r->headers_in, "If-None-Match", etag);
+            }
+
+            if (lastmod) {
+                apr_table_set(r->headers_in, "If-Modified-Since",
+                              lastmod);
+            }
+            cache->stale_handle = h;
+        }
+
+        return DECLINED;
+    }
+
+    return OK;
+}
+
 /*
  * select a specific URL entity in the cache
  *
@@ -201,110 +310,13 @@ int cache_select(request_rec *r)
     while (list) {
         switch ((rv = list->provider->open_entity(h, r, key))) {
         case OK: {
-            char *vary = NULL;
-            int fresh;

             if (list->provider->recall_headers(h, r) != APR_SUCCESS) {
                 /* TODO: Handle this error */
                 return DECLINED;
             }

-            /*
-             * Check Content-Negotiation - Vary
-             *
-             * At this point we need to make sure that the object we found in
-             * the cache is the same object that would be delivered to the
-             * client, when the effects of content negotiation are taken into
-             * effect.
-             *
-             * In plain english, we
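The Vary comparison inside cache_check_request() can be sketched in isolation: for every header named in the cached response's Vary field, the value in the current request must equal the value recorded with the cached request, with "both absent" also counting as a match. A simplified standalone version using libc instead of APR tables; the lookup callbacks stand in for apr_table_get() and are purely illustrative:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Lookup callback standing in for apr_table_get() on a header table. */
typedef const char *(*header_lookup)(const char *name);

/* Returns 1 if every header named in `vary` matches between the current
 * request and the cached request, 0 on the first mismatch. */
static int vary_headers_match(const char *vary, header_lookup req,
                              header_lookup cached_req)
{
    char name[64];
    size_t i;

    while (vary && *vary) {
        /* isolate one comma/space separated header name */
        while (*vary && (isspace((unsigned char)*vary) || *vary == ','))
            ++vary;
        for (i = 0; *vary && !isspace((unsigned char)*vary) && *vary != ','
                 && i < sizeof(name) - 1; ++i)
            name[i] = *vary++;
        name[i] = '\0';
        if (i == 0)
            break;

        {
            const char *h1 = req(name);
            const char *h2 = cached_req(name);
            if (h1 == NULL && h2 == NULL)
                ;                           /* both absent: match */
            else if (h1 && h2 && strcmp(h1, h2) == 0)
                ;                           /* both present and equal */
            else
                return 0;                   /* Vary mismatch */
        }
    }
    return 1;
}

/* Illustration-only request header tables. */
static const char *req_en(const char *n)
{ return strcmp(n, "Accept-Language") == 0 ? "en" : NULL; }
static const char *req_de(const char *n)
{ return strcmp(n, "Accept-Language") == 0 ? "de" : NULL; }
```

A mismatch means the cached variant was negotiated for a different request, so the cache must decline and fetch fresh content.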
[patch 10/10] combine open_entity and recall_headers callbacks
An open_entity call that is never paired with a close_entity is pointless; the two callbacks can be collapsed into a single call.

Index: trunk/modules/cache/cache_storage.c
===
--- trunk.orig/modules/cache/cache_storage.c
+++ trunk/modules/cache/cache_storage.c
@@ -40,14 +40,7 @@ apr_status_t cache_recall_entity_headers
                                          cache_handle_t *h, request_rec *r,
                                          const char *urlkey)
 {
-    apr_status_t rv;
-
-    rv = provider->open_entity(h, r, urlkey);
-
-    if (rv != OK)
-        return rv;
-
-    return provider->recall_headers(h, r);
+    return provider->recall_headers(h, r, urlkey);
 }

 apr_status_t cache_recall_entity_body(cache_request_rec *cache,

Index: trunk/modules/cache/mod_cache.h
===
--- trunk.orig/modules/cache/mod_cache.h
+++ trunk/modules/cache/mod_cache.h
@@ -213,12 +213,10 @@ typedef struct {
     int (*remove_entity) (cache_handle_t *h);
     apr_status_t (*store_headers)(cache_handle_t *h, request_rec *r, cache_info *i);
     apr_status_t (*store_body)(cache_handle_t *h, request_rec *r, apr_bucket_brigade *b);
-    apr_status_t (*recall_headers) (cache_handle_t *h, request_rec *r);
+    apr_status_t (*recall_headers) (cache_handle_t *h, request_rec *r, const char *urlkey);
     apr_status_t (*recall_body) (cache_handle_t *h, apr_pool_t *p, apr_bucket_brigade *bb);
     int (*create_entity) (cache_handle_t *h, request_rec *r, const char *urlkey, apr_off_t len);
-    int (*open_entity) (cache_handle_t *h, request_rec *r,
-                        const char *urlkey);
     int (*remove_url) (cache_handle_t *h, apr_pool_t *p);
 } cache_provider;

Index: trunk/modules/cache/mod_disk_cache.c
===
--- trunk.orig/modules/cache/mod_disk_cache.c
+++ trunk/modules/cache/mod_disk_cache.c
@@ -57,7 +57,7 @@ module AP_MODULE_DECLARE_DATA disk_cache
 static int remove_entity(cache_handle_t *h);
 static apr_status_t store_headers(cache_handle_t *h, request_rec *r, cache_info *i);
 static apr_status_t store_body(cache_handle_t *h, request_rec *r, apr_bucket_brigade *b);
-static apr_status_t recall_headers(cache_handle_t *h, request_rec *r);
+static apr_status_t recall_headers(cache_handle_t *h, request_rec *r, const char *urlkey);
 static apr_status_t recall_body(cache_handle_t *h, apr_pool_t *p, apr_bucket_brigade *bb);
 static apr_status_t read_array(request_rec *r, apr_array_header_t* arr, apr_file_t *file);
@@ -777,9 +777,15 @@ static apr_status_t read_table(cache_han
  * @@@: XXX: FIXME: currently the headers are passed thru un-merged.
  * Is that okay, or should they be collapsed where possible?
  */
-static apr_status_t recall_headers(cache_handle_t *h, request_rec *r)
+static apr_status_t recall_headers(cache_handle_t *h, request_rec *r,
+                                   const char *urlkey)
 {
-    disk_cache_object_t *dobj = (disk_cache_object_t *) h->cache_obj->vobj;
+    disk_cache_object_t *dobj;
+
+    if (open_entity(h, r, urlkey) != OK)
+        return DECLINED;
+
+    dobj = (disk_cache_object_t *) h->cache_obj->vobj;

     /* This case should not happen... */
     if (!dobj->hfd) {
@@ -1203,7 +1209,6 @@ static const cache_provider cache_disk_p
     recall_headers,
     recall_body,
     create_entity,
-    open_entity,
     remove_url,
 };

Index: trunk/modules/cache/mod_mem_cache.c
===
--- trunk.orig/modules/cache/mod_mem_cache.c
+++ trunk/modules/cache/mod_mem_cache.c
@@ -108,7 +108,7 @@ static mem_cache_conf *sconf;
 static int remove_entity(cache_handle_t *h);
 static apr_status_t store_headers(cache_handle_t *h, request_rec *r, cache_info *i);
 static apr_status_t store_body(cache_handle_t *h, request_rec *r, apr_bucket_brigade *b);
-static apr_status_t recall_headers(cache_handle_t *h, request_rec *r);
+static apr_status_t recall_headers(cache_handle_t *h, request_rec *r, const char *urlkey);
 static apr_status_t recall_body(cache_handle_t *h, apr_pool_t *p, apr_bucket_brigade *bb);
 static void cleanup_cache_object(cache_object_t *obj);
@@ -628,10 +628,16 @@ static int remove_url(cache_handle_t *h,
     return OK;
 }

-static apr_status_t recall_headers(cache_handle_t *h, request_rec *r)
+static apr_status_t recall_headers(cache_handle_t *h, request_rec *r,
+                                   const char *urlkey)
 {
     int rc;
-    mem_cache_object_t *mobj = (mem_cache_object_t*) h->cache_obj->vobj;
+    mem_cache_object_t *mobj;
+
+    if (open_entity(h, r, urlkey) != OK)
+        return DECLINED;
+
+    mobj = (mem_cache_object_t*) h->cache_obj->vobj;

     h->req_hdrs = apr_table_make(r->pool, mobj->num_req_hdrs);
     h->resp_hdrs = apr_table_make(r->pool, mobj->num_header_out);
@@ -1053,7 +1059,6 @@ static const
[PATCH] revamped mod_disk_cache directory structure
Hi,

This patch converts the mod_disk_cache cache directory structure to a uniformly distributed two-level hierarchy. The admin specifies the number of level-1 and level-2 directories, and the files are scattered across the level-2 directories.

Also, with this patch it is possible to assign directories to separate partitions, because the temporary files are created in the destination directory.

For example, running Apache/proxy+cache for a small network:

[EMAIL PROTECTED] cache]# sh files-per-directory.sh
dir: 00/ subs: 139 files: 632 size: 4.8M
dir: 01/ subs: 156 files: 765 size: 5.7M
dir: 02/ subs: 144 files: 626 size: 4.8M
dir: 03/ subs: 160 files: 714 size: 6.1M
dir: 04/ subs: 169 files: 820 size: 5.9M
dir: 05/ subs: 131 files: 590 size: 4.1M
dir: 06/ subs: 148 files: 677 size: 5.3M
dir: 07/ subs: 142 files: 644 size: 5.8M
dir: 08/ subs: 148 files: 749 size: 5.8M
dir: 09/ subs: 158 files: 711 size: 6.3M
dir: 0A/ subs: 146 files: 666 size: 5.1M
dir: 0B/ subs: 157 files: 701 size: 5.1M
dir: 0C/ subs: 157 files: 671 size: 5.2M
dir: 0D/ subs: 157 files: 711 size: 5.7M
dir: 0E/ subs: 149 files: 704 size: 5.6M
dir: 0F/ subs: 158 files: 742 size: 5.8M

-- Davi Arnaut

Index: modules/cache/cache_util.c
===
--- modules/cache/cache_util.c (revision 423984)
+++ modules/cache/cache_util.c (working copy)
@@ -19,6 +19,7 @@
 #include "mod_cache.h"
 #include "ap_provider.h"
+#include "util_md5.h"

 /* -- */
@@ -489,54 +490,31 @@
     y[sizeof(j) * 2] = '\0';
 }

-static void cache_hash(const char *it, char *val, int ndepth, int nlength)
+static unsigned int cdb_string_hash(const char *str)
 {
-    apr_md5_ctx_t context;
-    unsigned char digest[16];
-    char tmp[22];
-    int i, k, d;
-    unsigned int x;
-    static const char enc_table[64] =
-        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_@";
+    unsigned int hash = 5381;

-    apr_md5_init(&context);
-    apr_md5_update(&context, (const unsigned char *) it, strlen(it));
-    apr_md5_final(digest, &context);
+    while (*str)
+        hash = 33 * hash + *str++;

-    /* encode 128 bits as 22 characters, using a modified uuencoding
-     * the encoding is 3 bytes -> 4 characters* i.e. 128 bits is
-     * 5 x 3 bytes + 1 byte -> 5 * 4 characters + 2 characters
-     */
-    for (i = 0, k = 0; i < 15; i += 3) {
-        x = (digest[i] << 16) | (digest[i + 1] << 8) | digest[i + 2];
-        tmp[k++] = enc_table[x >> 18];
-        tmp[k++] = enc_table[(x >> 12) & 0x3f];
-        tmp[k++] = enc_table[(x >> 6) & 0x3f];
-        tmp[k++] = enc_table[x & 0x3f];
-    }
-
-    /* one byte left */
-    x = digest[15];
-    tmp[k++] = enc_table[x >> 2];        /* use up 6 bits */
-    tmp[k++] = enc_table[(x << 4) & 0x3f];
-
-    /* now split into directory levels */
-    for (i = k = d = 0; d < ndepth; ++d) {
-        memcpy(&val[i], &tmp[k], nlength);
-        k += nlength;
-        val[i + nlength] = '/';
-        i += nlength + 1;
-    }
-    memcpy(&val[i], &tmp[k], 22 - k);
-    val[i + 22 - k] = '\0';
+    return hash;
 }

-CACHE_DECLARE(char *)ap_cache_generate_name(apr_pool_t *p, int dirlevels,
-                                            int dirlength, const char *name)
+CACHE_DECLARE(char *)ap_cache_generate_name(apr_pool_t *p, unsigned int L1,
+                                            unsigned int L2, const char *name)
 {
-    char hashfile[66];
-    cache_hash(name, hashfile, dirlevels, dirlength);
-    return apr_pstrdup(p, hashfile);
+    char *key;
+    char *md5_hash;
+    unsigned int cdb_hash;
+
+    md5_hash = ap_md5_binary(p, (unsigned char *) name, (int) strlen(name));
+
+    cdb_hash = cdb_string_hash(md5_hash) / L2;
+
+    key = apr_psprintf(p, "%02X/%02X/%s", (cdb_hash / L2) % L1,
+                       cdb_hash % L2, md5_hash);
+
+    return key;
 }

 /* Create a new table consisting of those elements from an input

Index: modules/cache/mod_cache.h
===
--- modules/cache/mod_cache.h (revision 423984)
+++ modules/cache/mod_cache.h (working copy)
@@ -274,8 +274,8 @@
 CACHE_DECLARE(apr_time_t) ap_cache_hex2usec(const char *x);
 CACHE_DECLARE(void) ap_cache_usec2hex(apr_time_t j, char *y);

-CACHE_DECLARE(char *) ap_cache_generate_name(apr_pool_t *p, int dirlevels,
-                                             int dirlength,
+CACHE_DECLARE(char *) ap_cache_generate_name(apr_pool_t *p, unsigned int L1,
+                                             unsigned int L2,
                                              const char *name);
 CACHE_DECLARE(cache_provider_list *)ap_cache_get_providers(request_rec *r,
                                                            cache_server_conf *conf, apr_uri_t uri);
 CACHE_DECLARE(int) ap_cache_liststr(apr_pool_t *p, const char *list,

Index: modules/cache/mod_disk_cache.c
===
--- modules/cache/mod_disk_cache.c (revision
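The key layout in the patch can be sketched without APR: hash the (already hex-encoded) name with the djb2-style function and split the hash across two directory levels as "XX/YY/name". This is a deliberately simplified sketch; it skips the MD5 step and pool allocation, and the function names are made up for illustration:

```c
#include <stdio.h>
#include <string.h>

/* djb2-style string hash, as in the patch's cdb_string_hash(). */
static unsigned int cdb_string_hash(const char *str)
{
    unsigned int hash = 5381;
    while (*str)
        hash = 33 * hash + (unsigned char)*str++;
    return hash;
}

/* Builds the "level1/level2/name" path into buf. The patch derives the two
 * directory indices from one hash value so entries spread uniformly over
 * L1 x L2 directories. */
static void cache_key_path(char *buf, size_t len, const char *name,
                           unsigned int L1, unsigned int L2)
{
    unsigned int h = cdb_string_hash(name);
    snprintf(buf, len, "%02X/%02X/%s", (h / L2) % L1, h % L2, name);
}
```

With L1 = L2 = 16 this reproduces the `00/` through `0F/` top-level directories seen in the listing above.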
Re: [PATCH] revamped mod_disk_cache directory structure
On 20/07/2006, at 17:06, Colm MacCarthaigh wrote: On Thu, Jul 20, 2006 at 11:58:01AM -0300, Davi Arnaut wrote: Also, with this patch it is possible to assign directories to separate partitions because the temporary files are created in the destination directory. I'm not sure it goes far enough though. What if an admin has two filesystems/disks they can store the cache on, or what if it's 7? CacheDirLevels n 256 for n = 1,2,...,7,... What if one is a 160GB filesystem and the other only 10GB? This was not the scope of the patch, but it's one step toward it. I think we can add load balancing later. If we're going to tackle these kinds of problems, we need to look at how things like diablo (which gets it right IMO) do it. diablo? the game? :-) -- Davi Arnaut
Re: [PATCH] Don't cache expired objects
On Thu, 22 Jun 2006 06:35:05 -0700 Paul Querna [EMAIL PROTECTED] wrote: Davi Arnaut wrote: Don't cache requests with an Expires date in the past; otherwise the cache code will _always_ try to cache the URL, leading to numerous rename() errors - since the URL is already cached. ...

Index: modules/cache/mod_cache.c
===
--- modules/cache/mod_cache.c (revision 414393)
+++ modules/cache/mod_cache.c (working copy)
@@ -426,6 +426,9 @@
     /* if a broken Expires header is present, don't cache it */
     reason = apr_pstrcat(p, "Broken expires header: ", exps, NULL);
 }
+else if (exps != NULL && exp != APR_DATE_BAD && exp < apr_time_now()) {
+    reason = "Expires header already expired, not cacheable";
+}
 else if (r->args && exps == NULL) {
     /* if query string present but no expiration time, don't cache it
      * (RFC 2616/13.9)

Instead of calling apr_time_now(), shouldn't it re-use r->request_time here, saving a possibly expensive call? Otherwise, I think it's a good fix.

Good catch, thanks!

-- Davi Arnaut

Index: mod_cache.c
===
--- mod_cache.c (revision 416367)
+++ mod_cache.c (working copy)
@@ -426,6 +426,9 @@
     /* if a broken Expires header is present, don't cache it */
     reason = apr_pstrcat(p, "Broken expires header: ", exps, NULL);
 }
+else if (exps != NULL && exp != APR_DATE_BAD && exp < r->request_time) {
+    reason = "Expires header already expired, not cacheable";
+}
 else if (r->args && exps == NULL) {
     /* if query string present but no expiration time, don't cache it
      * (RFC 2616/13.9)
[PATCH] Don't cache expired objects
Hi,

Don't cache requests with an Expires date in the past; otherwise the cache code will _always_ try to cache the URL, leading to numerous rename() errors - since the URL is already cached.

-- Davi Arnaut

Index: modules/cache/mod_cache.c
===
--- modules/cache/mod_cache.c (revision 414393)
+++ modules/cache/mod_cache.c (working copy)
@@ -426,6 +426,9 @@
     /* if a broken Expires header is present, don't cache it */
     reason = apr_pstrcat(p, "Broken expires header: ", exps, NULL);
 }
+else if (exps != NULL && exp != APR_DATE_BAD && exp < apr_time_now()) {
+    reason = "Expires header already expired, not cacheable";
+}
 else if (r->args && exps == NULL) {
     /* if query string present but no expiration time, don't cache it
      * (RFC 2616/13.9)
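The added condition is easy to isolate: cache only when the Expires value parsed successfully and does not already lie behind the request time. A standalone sketch, with APR_DATE_BAD modelled as 0 and illustrative names throughout:

```c
#include <time.h>

/* Stand-in for APR_DATE_BAD: the value apr_date_parse_http() style
 * parsers return when the header could not be parsed. */
#define DATE_BAD ((time_t)0)

/* Returns 1 if the entity carried an Expires date that is valid but
 * already in the past (stale on arrival, so not cacheable), 0 otherwise.
 * `have_expires` stands in for the patch's exps != NULL test. */
static int expired_on_arrival(int have_expires, time_t exp,
                              time_t request_time)
{
    return have_expires && exp != DATE_BAD && exp < request_time;
}
```

Note the order of the guards mirrors the patch: a missing header and an unparseable header are both excluded before the comparison, so only a genuinely past date blocks caching.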
SoC 2006
Hi,

I'm starting to work on my SoC 2006 proposal [1] and I want to let you guys know how I will eventually proceed and what to expect. Your comments and suggestions are welcome.

- What am I trying to achieve?

A cache framework that features three layers of granularity: storage, protocol and provider. On arrival of a request (ie: http request), the protocol layer places filters into the protocol pipeline and decides whether or not this content should be cached. If so, it requests a cache provider from the provider layer. The provider layer then mediates and dispatches the data from the protocol modules to the various storage modules. The I/O operations are handled by the storage layer (storage modules), which retrieves and stores key/data pairs.

- How will I proceed?

Initially, I will be refactoring the current mod_cache code, without altering its behavior. There are functions (ie: cache_save_filter) taking more than 500 lines. I will separate those large functions into smaller functions without changing functionality at all, making the core more readable. This will also help me fully understand the structure of the cache code. (Separate patches.)

Further down the road, I want to rework the store_body|headers and recall_body|headers functions into single store and recall functions, moving the HTTP-specific bits back to mod_cache, and see how well it works (later, those bits will be consolidated in a brand-new mod_cache_http - protocol layer). When I hit this point I'm going to send a new report.

- Notes

Don't expect much work on this in June, my semester is ending (tests.. oh my) and there is the World Soccer Cup (FIFA) which nearly stops the country (Brazil is expected to win its sixth championship).

-- Davi Arnaut

1. Redesigning and extending the Apache cache architecture. http://verdesmares.com/Apache/proposal.txt
SoC 2006, v1
Doh, did you run a spell-check before you hit the send button? :)

Hi,

I'm starting to work on my SoC 2006 proposal [1] and I want to let you guys know how I will eventually proceed and what to expect. Your comments and suggestions are welcome.

- What am I trying to achieve?

A cache framework that features three layers of granularity: storage, protocol and provider. On arrival of a request (ie: http request), the protocol layer places filters into the protocol pipeline and decides whether or not this content should be cached. If so, it requests a cache provider from the provider layer. The provider layer then mediates and dispatches the data from the protocol modules to the various storage modules. The I/O operations are handled by the storage layer (storage modules), which retrieves and stores key/data pairs.

- How will I proceed?

Initially, I will be refactoring the current mod_cache code, without altering its behavior. There are functions (ie: cache_save_filter) taking more than 500 lines. I will separate those large functions into smaller functions without changing functionality at all, making the core more readable. This will also help me fully understand the structure of the cache code. (Separate patches.)

Further down the road, I want to rework the store_body|headers and recall_body|headers functions into single store and recall functions, moving the HTTP-specific bits back to mod_cache, and see how well it works (later, those bits will be consolidated in a brand-new mod_cache_http - protocol layer). When I hit this point I'm going to send a new report.

- Notes

Don't expect much work on this in June, my semester is ending (tests.. oh my) and there is the World Soccer Cup (FIFA) which nearly stops the country (Brazil is expected to win its sixth championship).

-- Davi Arnaut

1. Redesigning and extending the Apache cache architecture. http://verdesmares.com/Apache/proposal.txt
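The proposed layering can be sketched as vtables: the storage layer sees only opaque key/data pairs, and the provider layer mediates between protocol code and a chosen storage module. Everything below is a hypothetical toy to make the separation concrete, not the proposal's actual API:

```c
#include <string.h>

/* storage layer: stores and retrieves key/data pairs; here a trivial
 * single-slot in-memory store stands in for a real storage module. */
typedef struct storage_ops {
    int (*store)(const char *key, const char *data);
    const char *(*recall)(const char *key);
} storage_ops;

static char slot_key[64];
static char slot_data[256];

static int mem_store(const char *key, const char *data)
{
    strncpy(slot_key, key, sizeof(slot_key) - 1);
    strncpy(slot_data, data, sizeof(slot_data) - 1);
    return 0;
}

static const char *mem_recall(const char *key)
{
    return strcmp(slot_key, key) == 0 ? slot_data : NULL;
}

/* provider layer: mediates between protocol modules (which decide what is
 * cacheable; not shown) and the storage modules. */
typedef struct provider {
    const storage_ops *storage;
} provider;

static int provider_store(provider *p, const char *key, const char *data)
{
    return p->storage->store(key, data);
}

static const char *provider_recall(provider *p, const char *key)
{
    return p->storage->recall(key);
}
```

The protocol layer would sit above provider_store()/provider_recall(), handling HTTP nuances such as Vary and freshness without ever touching storage I/O directly.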
[PATCH] No temporary file left behind
Hi,

Remove mod_disk_cache temporary files when rename() fails; otherwise they may accumulate in excess.

-- Davi Arnaut

Index: modules/cache/mod_disk_cache.c
===
--- modules/cache/mod_disk_cache.c (revision 409900)
+++ modules/cache/mod_disk_cache.c (working copy)
@@ -165,7 +165,10 @@
  */
 rv = apr_file_rename(dobj->tempfile, dobj->datafile, r->pool);
 if (rv != APR_SUCCESS) {
-    /* XXX log */
+    ap_log_error(APLOG_MARK, APLOG_DEBUG, rv, r->server,
+                 "disk_cache: rename tempfile to datafile failed: %s -> %s",
+                 dobj->tempfile, dobj->datafile);
+    apr_file_remove(dobj->tempfile, r->pool);
 }
 dobj->tfd = NULL;
@@ -854,6 +857,7 @@
     ap_log_error(APLOG_MARK, APLOG_DEBUG, rv, r->server,
                  "disk_cache: rename tempfile to varyfile failed: %s -> %s",
                  dobj->tempfile, dobj->hdrsfile);
+    apr_file_remove(dobj->tempfile, r->pool);
     return rv;
 }
@@ -946,6 +950,7 @@
     ap_log_error(APLOG_MARK, APLOG_ERR, rv, r->server,
                  "disk_cache: rename tempfile to hdrsfile failed: %s -> %s",
                  dobj->tempfile, dobj->hdrsfile);
+    apr_file_remove(dobj->tempfile, r->pool);
     return rv;
 }
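The rule the patch applies at each site is the same: commit the finished temporary file into place with a rename, and if that fails, delete the temporary so failed attempts cannot pile up on disk. A portable sketch using plain stdio instead of apr_file_*; the function name is made up for illustration:

```c
#include <stdio.h>

/* Try to move a completed temporary file to its final name. On failure
 * (e.g. missing target directory, cross-device rename, permissions),
 * a real module would log the error; either way the temporary is removed
 * so it cannot accumulate. */
static int commit_or_discard(const char *tempfile, const char *destfile)
{
    if (rename(tempfile, destfile) != 0) {
        remove(tempfile);   /* drop the orphaned temporary */
        return -1;
    }
    return 0;
}
```

The same discard-on-failure step is what the patch adds after each of the three failing apr_file_rename() calls.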
[PATCH] properly set the parent pid on shutdown
Hi,

On WIN9X machines, before signaling the parent process in ap_signal_parent() we ought to set the proper parent pid retrieved by GetWindowThreadProcessId(). Otherwise, the event will be signaled without being properly opened, because the parent pid will remain the same as the calling process pid.

With this patch it's possible to stop the httpd daemon with the standard -k stop signal on WIN9X.

-- Davi Arnaut

Index: server/mpm/winnt/service.c
===
--- server/mpm/winnt/service.c (revision 409410)
+++ server/mpm/winnt/service.c (working copy)
@@ -35,6 +35,7 @@
 #undef _WINUSER_
 #include <winuser.h>

+extern DWORD parent_pid;
 static char *mpm_service_name = NULL;
 static char *mpm_display_name = NULL;
@@ -1289,8 +1290,10 @@
      * provided the class is ApacheWin95ServiceMonitor
      */
     hwnd = FindWindow("ApacheWin95ServiceMonitor", mpm_service_name);
-    if (hwnd && GetWindowThreadProcessId(hwnd, &service_pid))
+    if (hwnd && GetWindowThreadProcessId(hwnd, &service_pid)) {
+        parent_pid = service_pid;
         globdat.ssStatus.dwCurrentState = SERVICE_RUNNING;
+    }
     else {
         globdat.ssStatus.dwCurrentState = SERVICE_STOPPED;

Index: server/mpm/winnt/mpm_winnt.c
===
--- server/mpm/winnt/mpm_winnt.c (revision 409410)
+++ server/mpm/winnt/mpm_winnt.c (working copy)
@@ -57,7 +57,7 @@
 OSVERSIONINFO osver; /* VER_PLATFORM_WIN32_NT */

-static DWORD parent_pid;
+DWORD parent_pid;
 DWORD my_pid;
 int ap_threads_per_child = 0;
Re: [PATCH] properly set the parent pid on shutdown
On Fri, 26 May 2006 13:48:05 -0500 William A. Rowe, Jr. [EMAIL PROTECTED] wrote: +1 on visual review. Anyone have a win9x install handy to verify that this works? Sorry, none of my multiboots still boot 9x. I've personally tested it on WIN98 (virtual machine). Davi Arnaut wrote: Hi, On WIN9X machines, before signaling the parent process in ap_signal_parent() we ought to set the proper parent pid retrieved by GetWindowThreadProcessId(). Otherwise, the event will be signaled without being properly opened, because the parent pid will remain the same as the calling process pid. With this patch it's possible to stop the httpd daemon with the standard -k stop signal on WIN9X. -- Davi Arnaut
Re: How to get at the apr_socket_t?
On Thu, 11 May 2006 13:36:44 -0700 Tyler MacDonald [EMAIL PROTECTED] wrote: Tyler MacDonald [EMAIL PROTECTED] wrote: I'm working on a raw tcp/ip handler which is going to have to do nonblocking I/O. I'd like to get direct access to the apr_socket_t from the request_rec/conn_rec... OK, I found this in mod_proxy:

apr_socket_t *client_socket = ap_get_module_config(r->connection->conn_config, &core_module);

But that is only available if CORE_PRIVATE is defined... Does that mean my module needs to define CORE_PRIVATE in order to get access to an apr_socket_t? That seems dirty to me... is there some better way?

How about: conn_rec->cs->pfd->desc ?

-- Davi Arnaut
Re: Possible new cache architecture
On Wed, 3 May 2006 14:31:06 +0200 (SAST) Graham Leggett [EMAIL PROTECTED] wrote: On Wed, May 3, 2006 1:26 am, Davi Arnaut said: Then you will end up with code that does not meet the requirements of HTTP, and you will have wasted your time. Yeah, right! How? Hey, you are using the Monty Python argument style. Can you point to even one requirement of HTTP that my_cache_provider won't meet? Yes. Atomic insertions and deletions, the ability to update headers independently of body, etc etc, just go back and read the thread. I can't argue with a zombie, you keep repeating the same misunderstandings. Seriously, please move this off list to keep the noise out of people's inboxes. Fine, I give up. -- Davi Arnaut
Re: Possible new cache architecture
On Wed, 3 May 2006 11:39:02 -0700 Roy T. Fielding [EMAIL PROTECTED] wrote: On May 3, 2006, at 5:56 AM, Davi Arnaut wrote: On Wed, 3 May 2006 14:31:06 +0200 (SAST) Graham Leggett [EMAIL PROTECTED] wrote: On Wed, May 3, 2006 1:26 am, Davi Arnaut said: Then you will end up with code that does not meet the requirements of HTTP, and you will have wasted your time. Yeah, right! How? Hey, you are using the Monty Python argument style. Can you point to even one requirement of HTTP that my_cache_provider won't meet? Yes. Atomic insertions and deletions, the ability to update headers independently of body, etc etc, just go back and read the thread. I can't argue with a zombie, you keep repeating the same misunderstandings. Seriously, please move this off list to keep the noise out of people's inboxes. Fine, I give up. For the record, Graham's statements were entirely correct, Brian's suggested architecture would slow the HTTP cache, and your responses have been amazingly childish for someone who has earned zero credibility on this list. Fine, I do have zero credibility. I suggest you stop defending a half-baked design theory and just go ahead and implement something as a patch. If it works, that's great. If it slows the HTTP cache, I will veto it myself. I'm already doing this. There is, of course, no reason why the HTTP cache has to use some new middle-layer back-end cache, so maybe you could just stop arguing about vaporware and simply implement a single mod_backend_cache that doesn't try to be all things to all people. Implement it and then convince people on the basis of measurements. That is a heck of a lot easier than convincing everyone to dump the current code based on an untested theory. I just wanted to get comments (the original idea wasn't mine). It wasn't my intention to flame anyone, I'm not mad or anything. I was just stating my opinion. I may be wrong, but I don't give up easily. :) -- Davi Arnaut
Re: Possible new cache architecture
On Tue, 2 May 2006 11:22:31 +0200 (MEST) Niklas Edmundsson [EMAIL PROTECTED] wrote: On Mon, 1 May 2006, Davi Arnaut wrote: More important, if we stick with the key/data concept it's possible to implement the header/body relationship under single or multiple keys. I've been hacking on mod_disk_cache to make it:

* Only store one set of data when an uncached item is accessed simultaneously (currently all requests cache the file and the last cache process to finish wins).
* Don't wait until the whole item is cached, reply while caching (currently it stalls).
* Don't block the requesting thread when requesting a large uncached item, cache in the background and reply while caching (currently it stalls).

This is mostly aimed at serving huge static files from a slow disk backend (typically an NFS export from a server holding all the disks), such as http://ftp.acc.umu.se/ and http://ftp.heanet.ie/ . Doing this with the current mod_disk_cache disk layout was not possible; doing the above without unnecessary locking means:

* More or less atomic operations, so caching headers and data in separate files gets very messy if you want to keep consistency.
* You can't use tempfiles, since you want to be able to figure out where the data is to be able to reply while caching.
* You want to know the size of the data in order to tell when you're done (i.e. the current size of a file isn't necessarily the real size of the body, since it might still be caching while we're reading it).

In the light of our experiences, I really think that you want to have a concept that allows you to keep the bond between header and data. Yes, you can patch up a missing bond by requiring locking and such, but I really prefer not having to lock cache files when doing read access. When it comes to making the common case fast, a lockless design is very much preferred. I will repeat once again: there is no locking involved, unless your format for storing the header/data is really wrong.
_The data format is up to the module using it_, while the storage backend is a completely different issue. However, if all those issues are sorted out in the layer above the disk cache then the above observations become more or less moot. Yes, that's the point. In any case the patch is more or less finished; independent testing and auditing haven't been done yet, but I can submit a preliminary jumbo-patch if people are interested in having a look at it now. -- Davi Arnaut
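Niklas's reply-while-caching point can be sketched without locks: if the writer records the final body size before the data, a reader can follow the growing cache file and knows exactly when it is done. A minimal illustrative sketch (the struct and names like `announced_len` are assumptions, not the actual patch):

```c
/* Sketch of "reply while caching": the writer records the expected
 * body size up front, so a reader can follow a growing cache file
 * without locks and knows when it has delivered everything.
 * announced_len and bytes_on_disk are illustrative names. */
#include <stddef.h>

typedef struct {
    size_t announced_len;   /* final body size, stored before the data */
    size_t bytes_on_disk;   /* grows as background caching proceeds */
} cache_entry;

/* Bytes a reader may serve right now, given what it already delivered.
 * 0 means either "wait for the writer to append more" or, once
 * delivered equals announced_len, "the whole body has been served". */
static size_t readable_now(const cache_entry *e, size_t delivered)
{
    if (delivered >= e->announced_len)
        return 0;                       /* fully served */
    if (e->bytes_on_disk <= delivered)
        return 0;                       /* writer hasn't caught up yet */
    return e->bytes_on_disk - delivered;
}
```

The point of the up-front size is exactly the one Niklas makes: a file's current size alone cannot tell a reader whether the body is complete.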
Re: Possible new cache architecture
On Tue, 2 May 2006 15:40:30 +0200 (SAST) Graham Leggett [EMAIL PROTECTED] wrote: On Tue, May 2, 2006 3:24 pm, Brian Akins said: - the cache says cool, will send my copy upstream. Oops, where has my data gone?. So, the cache says, okay, must get content the old fashioned way (proxy, filesystem, magic fairies, etc.). Where's the issue? To rephrase it, a whole lot of extra code, which has to be written and debugged, has to say oops, ok sorry backend about the If-None-Match, I thought I had it cached but I actually didn't, please can I have the full file?. Then the backend gives you a response with different headers to those you already delivered to the frontend. Oops. There is no such scenario. I will simulate a request using the disk_cache format:

. Incoming client requests URI /foo/bar/baz
. Request goes through mod_http_cache, generates a hash off of the URI
. mod_http_cache asks mod_cache for the data associated with key: hash.header
. No data:
  . Fetch from upstream
. Data fetched:
  . If format #1 (contains a list of Vary headers):
    . Use each header name (from .header) with our request values (headers_in) to regenerate the hash using HeaderName + HeaderValue + URI
    . Ask mod_cache for data with key: hash.header
    . No data:
      . Fetch from upstream
    . Data:
      . Serve data to client
  . If format #2:
    . Serve data to client

Where is the difference? Keeping the code as simple as possible will keep your code bug free, which means less time debugging for you, and less time for end users trying to figure out the cause of their weird symptoms. We are trying to keep it as simple as possible by separating the storage layer from the protocol layer. -- Davi Arnaut
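The two-stage lookup in that walkthrough hinges on deriving a second, variant-specific key from a Vary header name plus the request's own value for it. A toy sketch of that key derivation (djb2 stands in for whatever hash the real code uses; all names here are illustrative, not mod_disk_cache's actual functions):

```c
/* Toy sketch of the variant key derivation: the first lookup key
 * comes from the URI alone; if the stored entry is a Vary list
 * (format #1), a second key is derived from
 * HeaderName + HeaderValue + URI, as in the flow above. */
#include <stdio.h>

/* djb2, standing in for the real key hash. */
static unsigned long hash_key(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Combine a Vary header name, the request's value for it, and the
 * URI to derive the variant-specific cache key. */
static unsigned long variant_key(const char *vary_name,
                                 const char *request_value,
                                 const char *uri)
{
    char buf[512];
    snprintf(buf, sizeof(buf), "%s%s%s", vary_name, request_value, uri);
    return hash_key(buf);
}
```

Two requests that differ only in, say, Accept-Encoding land on different keys, which is how separate variants can live side by side in a plain key/value store.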
Re: Possible new cache architecture
On Tue, 2 May 2006 17:22:00 +0200 (SAST) Graham Leggett [EMAIL PROTECTED] wrote: On Tue, May 2, 2006 7:06 pm, Davi Arnaut said: There is no such scenario. I will simulate a request using the disk_cache format: The way HTTP caching works is a lot more complex than in your example, you haven't taken into account conditional HTTP requests. I've taken into account the actual mod_disk_cache code! Let me try to translate your typical scenario. A typical conditional scenario goes like this: - Browser asks for URL from httpd. Same. - mod_cache has a cached copy by looking up the headers BUT - it's stale. mod_cache converts the browser's original request to a conditional request by adding the header If-None-Match. sed s/mod_cache/mod_http_cache - The backend server answers no worries, what you have is still fresh by sending a 304 Not Modified. sed s/mod_cache/mod_http_cache - mod_cache takes the headers from the 304, and replaces the headers on the cached entry, in the process making the entry fresh again. sed s/mod_cache/mod_http_cache - mod_cache hands the cached data back to the browser. sed s/mod_cache/mod_http_cache Read http://www.ietf.org/rfc/rfc2616.txt section 13 (mainly) to see in detail how this works. Again: we do not want to change the semantics, we only want to separate the HTTP-specific part from the storage-specific part. The HTTP-specific parts of mod_disk_cache, mod_mem_cache and mod_cache are moved to a mod_http_cache, while retaining the storage-specific parts. And mod_cache is the one that will combine those two layers. Again: it's the same thing as if we were replacing all mod_disk_cache file operations with hash table operations. -- Davi Arnaut
Re: Possible new cache architecture
On Tue, 02 May 2006 23:31:13 +0200 Graham Leggett [EMAIL PROTECTED] wrote: Davi Arnaut wrote: The way HTTP caching works is a lot more complex than in your example, you haven't taken into account conditional HTTP requests. I've taken into account the actual mod_disk_cache code! mod_disk_cache doesn't contain any of the conditional HTTP request code, which is why you're not seeing it there. Please keep in mind that the existing mod_cache framework's goal is to be a fully HTTP/1.1 compliant, content generator neutral, efficient, error free and high performance cache. Moving towards and keeping with the above goals is a far higher priority than simplifying the generic backend cache interface. To sum up - the cache backend must fulfill the requirements of the cache frontend (generic or not), which in turn must fulfill the requirements of the users, who are browsers, web robot code, and humans. To try and prioritise this the other way round is putting the cart before the horse. Graham, what I want is to be able to write a mod_cache backend _without_ having to worry about HTTP. _NOT_ to rewrite mod_disk/proxy/cache/whatever! You keep talking about HTTP this, HTTP that; I won't change the way it currently works. I just want to place a glue layer between the storage and the HTTP part. I could even wrap around your code:

typedef struct {
    apr_status_t (*fetch) (cache_handle_t *h, const char *key, apr_bucket_brigade *bb);
    apr_status_t (*store) (cache_handle_t *h, const char *key, apr_bucket_brigade *bb);
    int (*remove) (const char *key);
} my_cache_provider;

typedef struct {
    const char *key_headers;
    const char *key_body;
} my_cache_object;

create_entity:
    my_cache_object *obj;
    obj->key_headers = hash_headers(request, whatever);
    obj->key_body = hash_body(request, whatever);

open_entity:
    my_cache_object *obj;
    my_provider->fetch(h, obj->key_headers, header_brigade);
    // if necessary, update obj->key_headers/body (vary..)

remove_url:
    my_provider->remove(obj->key_headers);
    my_provider->remove(obj->key_body);

remove_entity: nop

store_headers:
    my_cache_object *obj;
    // if necessary, update obj->key_headers (vary..)
    my_provider->store(h, obj->key_headers, header_brigade);

store_body:
    my_cache_object *obj;
    my_provider->store(h, obj->key_body, body_brigade);

recall_headers:
    my_cache_object *obj;
    my_provider->fetch(h, obj->key_headers, header_brigade);

recall_body:
    my_cache_object *obj;
    my_provider->fetch(h, obj->key_body, body_brigade);

-- Davi Arnaut
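To illustrate that a fetch/store/remove triple like my_cache_provider really needs no HTTP knowledge, here is a toy in-memory provider with the same shape, using plain strings where the real interface would use cache_handle_t and bucket brigades. The fixed-size table and every name here are illustrative assumptions, not the actual mod_cache API:

```c
/* Toy in-memory provider: a fixed-size key/value table with the
 * fetch/store/remove shape sketched above. Nothing in it knows
 * about headers, bodies, or Vary; those concepts live one layer up. */
#include <stdlib.h>
#include <string.h>

#define SLOTS 16

static struct { char *key; char *value; } table[SLOTS];

/* Portable strdup replacement, so the sketch is plain C99. */
static char *dup_str(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    return p ? strcpy(p, s) : NULL;
}

static int mem_store(const char *key, const char *value)
{
    int i, free_slot = -1;
    for (i = 0; i < SLOTS; i++) {
        if (table[i].key == NULL) {
            if (free_slot < 0)
                free_slot = i;
        } else if (strcmp(table[i].key, key) == 0) {
            free(table[i].value);       /* update existing key in place */
            table[i].value = dup_str(value);
            return 0;
        }
    }
    if (free_slot < 0)
        return -1;                      /* table full */
    table[free_slot].key = dup_str(key);
    table[free_slot].value = dup_str(value);
    return 0;
}

static const char *mem_fetch(const char *key)
{
    int i;
    for (i = 0; i < SLOTS; i++)
        if (table[i].key && strcmp(table[i].key, key) == 0)
            return table[i].value;
    return NULL;                        /* cache miss */
}

static int mem_remove(const char *key)
{
    int i;
    for (i = 0; i < SLOTS; i++)
        if (table[i].key && strcmp(table[i].key, key) == 0) {
            free(table[i].key);
            free(table[i].value);
            table[i].key = table[i].value = NULL;
            return 0;
        }
    return -1;                          /* not cached */
}
```

A protocol layer storing headers under hash.header and the body under a second key would call nothing but these three functions, which is the separation being argued for.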
Re: Possible new cache architecture
On Wed, 03 May 2006 01:09:03 +0200 Graham Leggett [EMAIL PROTECTED] wrote: Davi Arnaut wrote: Graham, what I want is to be able to write a mod_cache backend _without_ having to worry about HTTP. Then you will end up with code that does not meet the requirements of HTTP, and you will have wasted your time. Yeah, right! How? Hey, you are using the Monty Python argument style. Can you point to even one requirement of HTTP that my_cache_provider won't meet? Please go through _all_ of the mod_cache architecture, and not just mod_disk_cache. Also read and understand HTTP/1.1 gateways and caches, and as you want to create a generic cache, read and understand mod_ldap, a module that will probably benefit from the availability of a generic cache. Then step back and see that mod_cache is a small part of a bigger picture. At this point you'll see that as nice as your idea of a simple generic cache interface is, it's not going to be the most elegant solution to the problem. blah, blah.. you essentially said: I don't want a simpler interface, I think the current mess is more elegant. I have shown you that I can even wrap your messy cache_provider hooks into a much simpler one; how can anything else be more elegant? -- Davi Arnaut
Re: Possible new cache architecture
On Mon, 01 May 2006 14:51:53 +0200 Graham Leggett [EMAIL PROTECTED] wrote: Davi Arnaut wrote: mod_cache need not be HTTP specific, it only needs the ability to cache multiple entities (data, headers) under the same key, and be able to replace zero or more entities independently of the other entities (think updating headers without updating content). mod_cache needs only to cache key/value pairs. The key/value format is up to the mod_cache user. It's a design flaw to create problems that have to be specially coded around, when you can avoid the problem entirely. Maybe I'm missing something; what problems do you foresee? The cache needs to be generic, yes - but there is no need to stick to the key/value cliché of cache code, if a variation to this is going to make your life significantly easier. And the variation is..? -- Davi Arnaut
Re: Possible new cache architecture
On Mon, 01 May 2006 09:02:31 -0400 Brian Akins [EMAIL PROTECTED] wrote: Here is a scenario. We will assume a cache hit. I think the usage scenario is clear. Moving on, I would like to be able to stack up the cache providers (like the Apache filter chain). Basically, mod_cache will expose the functions:

add(key, value, expiration, flag)
get(key)
remove(key)

mod_cache will then pass the request (add/get or remove) down the chain, similar to the Apache filter chain, i.e.:

apr_status_t mem_cache_get_filter(ap_cache_filter_t *f, apr_bucket_brigade *bb, ...);
apr_status_t disk_cache_get_filter(ap_cache_filter_t *f, apr_bucket_brigade *bb, ...);

This way it would be possible for one cache to act as a cache of another cache provider; mod_mem_cache would work as a small/fast MRU cache for mod_disk_cache. -- Davi Arnaut
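The stacking idea can be sketched as a chain of responsibility: each layer exposes the get/add pair, a get that misses falls through to the next (slower) layer, and a hit is promoted into every faster layer above it, which is exactly how mod_mem_cache could front mod_disk_cache. All names and the one-slot layer below are hypothetical, not the proposed API:

```c
/* Chain-of-providers sketch: a miss in one layer falls through to
 * the next; a hit is re-added to the faster layers above it. */
#include <stddef.h>
#include <string.h>

typedef struct cache_layer {
    const char *(*get)(struct cache_layer *self, const char *key);
    int (*add)(struct cache_layer *self, const char *key, const char *value);
    struct cache_layer *next;           /* slower layer behind this one */
} cache_layer;

static const char *chain_get(cache_layer *top, const char *key)
{
    cache_layer *l, *fast;
    for (l = top; l != NULL; l = l->next) {
        const char *v = l->get(l, key);
        if (v != NULL) {
            /* Promote the hit into every faster layer above it. */
            for (fast = top; fast != l; fast = fast->next)
                fast->add(fast, key, v);
            return v;
        }
    }
    return NULL;                        /* miss in every layer */
}

/* One-slot toy layer, just enough to demonstrate the chain. */
typedef struct {
    cache_layer base;
    const char *key;
    const char *value;
} one_slot;

static const char *slot_get(cache_layer *self, const char *key)
{
    one_slot *s = (one_slot *)self;
    return (s->key && strcmp(s->key, key) == 0) ? s->value : NULL;
}

static int slot_add(cache_layer *self, const char *key, const char *value)
{
    one_slot *s = (one_slot *)self;
    s->key = key;
    s->value = value;
    return 0;
}
```

The promotion step is what makes the front layer behave like an MRU cache: whatever was last fetched through the chain ends up in the fastest layer.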
Re: Possible new cache architecture
On Mon, 01 May 2006 15:46:58 -0400 Brian Akins [EMAIL PROTECTED] wrote: Graham Leggett wrote: That's two hits to find whether something is cached. You must have two hits if you support vary. How are races prevented? shouldn't be any. something is in the cache or not. if one piece of an http object is not valid or in cache, the object is invalid. Although other variants may be valid/in cache. More important, if we stick with the key/data concept it's possible to implement the header/body relationship under single or multiple keys. I think Brian wants mod_cache to be only a layer (glue) between the underlying providers and the cache users. Each set of problems is better dealt with in its own layer. The storage layer (cache providers) will only worry about storing the key/data pairs (and expiring them?) while the protocol layer will deal with the underlying concepts of each protocol (mod_http_cache). The current design leads to bloat: just look at mem_cache and disk_cache, both have their own duplicated quirks (serialize/unserialize, et cetera) and need special handling of the headers and file format. Under the new design this duplication will be gone, since we will consolidate the HTTP-specific part and generalize the storage part. -- Davi Arnaut
Re: Possible new cache architecture
On Mon, 01 May 2006 22:46:44 +0200 Graham Leggett [EMAIL PROTECTED] wrote: Brian Akins wrote: That's two hits to find whether something is cached. You must have two hits if you support vary. You need only one - bring up the original cached entry with the key, and then use cheap subkeys over a very limited data set to find both the variants and the header/data. How are races prevented? shouldn't be any. something is in the cache or not. if one piece of an http object is not valid or in cache, the object is invalid. Although other variants may be valid/in cache. I can think of one race off the top of my head: - the browser says send me this URL. - the cache has it cached, but it's stale, so it asks the backend If-None-Match. - the cache reaper comes along, says oh, this is stale, and reaps the cached body (which is independent, remember?). The data is no longer cached even though the headers still exist. - The backend says 304 Not Modified. - the cache says cool, will send my copy upstream. Oops, where has my data gone? Sorry, but this only happens in your imagination. It's pretty obvious that mod_cache_http will handle this. The end user will probably experience this as oh, the website had a glitch, let me try again, so it won't be reported as a bug. No. Ok, so you tried to lock the body before going to the backend, but searching for and locking the body would have been an additional wasted cache hit if the backend answered with its own body. Not to mention having to write and debug code to do this. Locks are not necessary; perhaps you are imagining something very different. If a data body disappears under mod_http_cache it is not a big deal! It will refuse to serve the request from the cache and a new version of the page will be cached. Races need to be properly handled, and atomic cache operations will go a long way to prevent them. I think we are discussing apples and oranges.
First, we only want to *organize* the current cache code into a more layered solution. The current semantics won't change, yet! -- Davi Arnaut
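Davi's lock-free answer to the race boils down to one rule: an entry is served from cache only when every piece of it is present, and any missing piece (say, a reaped body after a 304) simply turns the request into a miss that re-caches from the backend. A minimal sketch of that decision (the enum and function names are illustrative):

```c
/* Sketch of the lock-free recovery rule described above: serve from
 * cache only when the 304 revalidation succeeded AND the body is
 * still present; in every other case fall back to the backend and
 * re-cache. No locking is involved. */
typedef enum { SERVE_CACHED, SERVE_BACKEND } serve_decision;

static serve_decision after_revalidation(int got_304, int body_present)
{
    if (got_304 && body_present)
        return SERVE_CACHED;    /* headers refreshed, body from cache */
    return SERVE_BACKEND;       /* reaped body or full response: refetch */
}
```

The reaped-body case (got_304 true, body_present false) is exactly the race Graham describes, and it degrades to an ordinary miss rather than an error.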
Re: Possible new cache architecture
On Sun, 30 Apr 2006 22:38:23 +0200 Graham Leggett [EMAIL PROTECTED] wrote: Brian Akins wrote: mod_http_cache could just cache headers and data as separate cache entries. The potential danger with this is for race conditions to happen while expiring cache entries. If the data entity expired before the header entity, it could potentially confuse the cache - is the entry cached or not? The headers say yes, the data says no. If both the data and header have the same expiration time they should both be removed atomically, but this would be hard to achieve. The trick is to set the header to expire before the data. Also, this would confuse the cache user, not the cache itself. So a given HTTP object may actually have 3 entries in the cache:

- first entry says: Vary on x,y,z
- second entry is headers for the new key (generated with the vary info)
- third entry is the actual data

Each variant should be an independent cached entry; the cache should allow different variants to be cached side by side. As far as mod_cache is concerned these are 3 independent entries, but mod_http_cache knows how to stitch them together. mod_cache should *not* be HTTP specific in any way. mod_cache need not be HTTP specific, it only needs the ability to cache multiple entities (data, headers) under the same key, and be able to replace zero or more entities independently of the other entities (think updating headers without updating content). mod_cache needs only to cache key/value pairs. The key/value format is up to the mod_cache user. -- Davi Arnaut
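The header-expires-before-data trick can be pinned down in one helper: at insertion time, give the body a strictly later expiry than the headers, so a reaper removing entries in expiry order can never leave live headers pointing at an already-reaped body. An illustrative sketch (names and the one-tick margin are assumptions):

```c
/* Sketch of the expiry ordering mentioned above: the header entry
 * always expires strictly before its body, so the dangling case is
 * "body without headers" (an ordinary miss) rather than "headers
 * without body" (the confusing race). */
static void set_expiries(long now, long ttl,
                         long *header_expiry, long *body_expiry)
{
    *header_expiry = now + ttl;
    *body_expiry = *header_expiry + 1;  /* body outlives its headers */
}
```

Since every lookup starts from the headers, an expired header entry hides the still-present body, and the body is reclaimed harmlessly a moment later.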
[PATCH]: upstream server timeout
According to RFC 2616, whenever a connection to an upstream server fails due to a timeout (or DNS failure) the proxy server should return status 504 (Gateway Timeout). -- Davi Arnaut

Index: mod_proxy_http.c
===================================================================
--- mod_proxy_http.c    (revision 398058)
+++ mod_proxy_http.c    (working copy)
@@ -1689,7 +1689,7 @@
     /* Step Two: Make the Connection */
     if (ap_proxy_connect_backend(proxy_function, backend, worker, r->server)) {
         if (r->proxyreq == PROXYREQ_PROXY)
-            status = HTTP_NOT_FOUND;
+            status = HTTP_GATEWAY_TIME_OUT;
         else
             status = HTTP_SERVICE_UNAVAILABLE;
         goto cleanup;
mod_proxy yields a 404 on timeout
Hi, When mod_proxy fails to connect to a backend (ap_proxy_connect_backend) due to a timeout, or something else, and the connection is a standard proxy request (i.e. non-reverse) it returns status code 404 (Not Found). Is there any reason why we are not returning a 504 (Gateway Timeout) code? Quoting RFC 2616: 10.5.5 504 Gateway Timeout The server, while acting as a gateway or proxy, did not receive a timely response from the upstream server specified by the URI (e.g. HTTP, FTP, LDAP) or some other auxiliary server (e.g. DNS) it needed to access in attempting to complete the request. -- Davi Arnaut