Re: Possible new cache architecture

2006-05-01 Thread Davi Arnaut
On Mon, 01 May 2006 22:46:44 +0200
Graham Leggett <[EMAIL PROTECTED]> wrote:

> Brian Akins wrote:
> 
> >> That's two hits to find whether something is cached.
> > 
> > You must have two hits if you support vary.
> 
> You need only one - bring up the original cached entry with the key, and 
> then use cheap subkeys over a very limited data set to find both the 
> variants and the header/data.
> 
> >> How are races prevented?
> > 
> > shouldn't be any.  something is in the cache or not.  if one "piece" of 
> > an http "object" is not valid or in cache, the object is invalid. 
> > Although other variants may be valid/in cache.
> 
> I can think of one race off the top of my head:
> 
> - the browser says "send me this URL".
> 
> - the cache has it cached, but it's stale, so it asks the backend 
> "If-None-Match".
> 
> - the cache reaper comes along, says "oh, this is stale", and reaps the 
> cached body (which is independent, remember?). The data is no longer 
> cached even though the headers still exist.
> 
> - The backend says "304 Not Modified".
> 
> - the cache says "cool, will send my copy upstream. Oops, where has my 
> data gone?".

Sorry, but this only happens in your imagination. It's pretty obvious
that mod_http_cache will handle this.

> The end user will probably experience this as "oh, the website had a 
> glitch, let me try again", so it won't be reported as a bug.

No.

> Ok, so you tried to lock the body before going to the backend, but 
> searching for and locking the body would have been an additional wasted 
> cache hit if the backend answered with its own body. Not to mention 
> having to write and debug code to do this.

Locks are not necessary; perhaps you are imagining something very different.
If a data body disappears under mod_http_cache it is not a big deal! It will
refuse to serve the request from the cache and a new version of the page will
be cached.
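
The fallback described above can be sketched roughly as follows. This is only
an illustration, not mod_http_cache code: `cache_action_t` and
`after_revalidation` are hypothetical names, and a reaped piece is modelled as
a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome of a revalidation; not an httpd type. */
typedef enum { SERVE_FROM_CACHE, REFETCH_AND_STORE } cache_action_t;

/*
 * Decide what to do once the backend has answered a conditional request.
 * headers/body are the cached pieces, or NULL if that piece was reaped.
 */
cache_action_t after_revalidation(int backend_status,
                                  const char *headers, const char *body)
{
    /* Only HTTP status codes are expected here. */
    assert(backend_status >= 100 && backend_status <= 599);

    /* A 304 lets us reuse the cached copy, but only if every piece survived. */
    if (backend_status == 304 && headers != NULL && body != NULL)
        return SERVE_FROM_CACHE;

    /* Fresh response, or a piece went missing: fetch again and re-cache. */
    return REFETCH_AND_STORE;
}
```

With this shape, a body that vanishes between the conditional request and the
304 simply turns the request into a cache miss.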

> Races need to be properly handled, and atomic cache operations will go a 
> long way to prevent them.

I think we are discussing apples and oranges. First, we only want to *organize*
the current cache code into a more layered solution. The current semantics won't
change, yet!

--
Davi Arnaut


Re: Possible new cache architecture

2006-05-01 Thread William A. Rowe, Jr.

Graham Leggett wrote:
> Brian Akins wrote:
> 
> >> That's two hits to find whether something is cached.
> > 
> > You must have two hits if you support vary.
> 
> You need only one - bring up the original cached entry with the key, and 
> then use cheap subkeys over a very limited data set to find both the 
> variants and the header/data.
> 
> >> How are races prevented?
> > 
> > shouldn't be any.  something is in the cache or not.  if one "piece" 
> > of an http "object" is not valid or in cache, the object is invalid. 
> > Although other variants may be valid/in cache.
> 
> I can think of one race off the top of my head:
> 
> - the browser says "send me this URL".
> 
> - the cache has it cached, but it's stale, so it asks the backend 
> "If-None-Match".
> 
> - the cache reaper comes along, says "oh, this is stale", and reaps the 
> cached body (which is independent, remember?). The data is no longer 
> cached even though the headers still exist.
> 
> - The backend says "304 Not Modified".
> 
> - the cache says "cool, will send my copy upstream. Oops, where has my 
> data gone?".

I think that can be avoided by, instead of reaping the cached body, actually
setting aside the cached body (public > private), by changing its key or
whatnot.  Then - throw it away after the backend says "200 OK", and replace
it with something new.  Or, rekey it a second time (private > public) when
the backend reports "304 NOT MODIFIED".

In the race, one will set it aside looking for another, the second will make
a fresh request (it doesn't see it in the cache), and either the first or
second request will wrap up -last- to place the final copy back into the
cache, replacing the document from the winner.  No harm no foul.
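
A rough sketch of the set-aside idea, assuming a toy in-memory table rather
than any real httpd cache provider. All names here (`toy_cache_t`,
`cache_set_aside`, `cache_restore`, `cache_replace`, the "private:" key
prefix) are made up for illustration, and locking is left out entirely.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SLOTS  8
#define KEYLEN 96

/* Hypothetical toy cache: a small table of key -> body strings. */
typedef struct { char key[KEYLEN]; char body[KEYLEN]; int used; } slot_t;
typedef struct { slot_t slots[SLOTS]; } toy_cache_t;

slot_t *cache_find(toy_cache_t *c, const char *key)
{
    for (int i = 0; i < SLOTS; i++)
        if (c->slots[i].used && strcmp(c->slots[i].key, key) == 0)
            return &c->slots[i];
    return NULL;
}

int cache_put(toy_cache_t *c, const char *key, const char *body)
{
    slot_t *s = cache_find(c, key);
    for (int i = 0; s == NULL && i < SLOTS; i++)
        if (!c->slots[i].used)
            s = &c->slots[i];
    if (s == NULL)
        return -1;                       /* table full */
    snprintf(s->key, KEYLEN, "%s", key);
    snprintf(s->body, KEYLEN, "%s", body);
    s->used = 1;
    return 0;
}

/* public > private: hide the stale entry under a per-request key. */
int cache_set_aside(toy_cache_t *c, const char *key, int req_id,
                    char priv[KEYLEN])
{
    slot_t *s = cache_find(c, key);
    if (s == NULL)
        return -1;
    snprintf(priv, KEYLEN, "private:%d:%s", req_id, key);
    snprintf(s->key, KEYLEN, "%s", priv);
    return 0;
}

/* 304: private > public.  Whoever finishes last replaces the public copy. */
int cache_restore(toy_cache_t *c, const char *priv, const char *key)
{
    slot_t *s = cache_find(c, priv);
    slot_t *pub = cache_find(c, key);
    if (s == NULL)
        return -1;
    if (pub != NULL)
        pub->used = 0;
    snprintf(s->key, KEYLEN, "%s", key);
    return 0;
}

/* 200: throw the set-aside copy away and store the fresh body. */
int cache_replace(toy_cache_t *c, const char *priv, const char *key,
                  const char *body)
{
    slot_t *s = cache_find(c, priv);
    if (s != NULL)
        s->used = 0;
    return cache_put(c, key, body);
}
```

While the entry is set aside, a lookup under the public key misses, so a
second request just fetches its own copy.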

Bill


Re: [PATCH] #39275 MaxClients on startup [Was: Bug in 2.0.56-dev]

2006-05-01 Thread Greg Ames

Jeff Trawick wrote:

> I have been working with a user on one of these fork bomb scenarios
> and assumed it was the child_init hook.  But after giving them a test
> fix that relies on a child setting scoreboard fields in child_main
> before child-init hooks run, and also adds some debugging traces
> related to calling child-init hooks, it is clear that their stall
> occurs BEFORE the child-init hook.  Which leaves a stretch of fairly
> simple code.
> 
> (Best theory is bad stuff happening in an atfork handler registered by
> a third-party module or some library it uses.  But that's beside the
> point.)

after more thought, there is a simpler patch that should do the job.  the key to both of 
these is how threads in SERVER_DEAD state with a pid in the scoreboard are treated.  that 
combination means that perform_idle_server_maintenance (p_i_s_m) forked on a previous 
timer pop but some thread never made it into SERVER_STARTING state.


the difference:  this patch just counts those potential threads as idle, and allows 
MinSpareThreads worth of processes to be forked before putting on the brakes.  the 
previous patch pauses the forking immediately when the strange situation is detected but 
requires more code and a new variable.  I'm leaning toward this one because it is simpler. 
 opinions?


Greg

--- server/mpm/worker/worker.c  (revision 398659)
+++ server/mpm/worker/worker.c  (working copy)
@@ -1422,7 +1422,7 @@
  */
 if (ps->pid != 0) { /* XXX just set all_dead_threads in outer for
loop if no pid?  not much else matters */
-if (status <= SERVER_READY && status != SERVER_DEAD &&
+if (status <= SERVER_READY &&
 !ps->quiescing &&
 ps->generation == ap_my_generation) {
 ++idle_thread_count;
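
To make the effect of the one-line change concrete, here is a self-contained
sketch of the counting logic. It is not the worker MPM code: `toy_process_t`
and `count_idle_threads` are hypothetical, and the status values are
reproduced from memory of httpd's scoreboard.h ordering, so treat them as an
assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror the ordering in httpd's scoreboard.h. */
enum { SERVER_DEAD = 0, SERVER_STARTING = 1, SERVER_READY = 2,
       SERVER_BUSY_READ = 3 };

/* Hypothetical stand-in for one scoreboard process slot. */
typedef struct {
    int pid;                  /* 0 means no process in this slot */
    int quiescing;
    int generation;
    const int *thread_status; /* one status per thread slot */
    size_t nthreads;
} toy_process_t;

/*
 * Count idle threads in one process slot.  With count_dead_as_idle == 0 this
 * follows the old condition (SERVER_DEAD excluded); with 1 it follows the
 * patched condition, where a dead thread slot in a process that has a pid is
 * treated as a thread that is about to start.
 */
int count_idle_threads(const toy_process_t *ps, int my_generation,
                       int count_dead_as_idle)
{
    int idle = 0;

    assert(ps != NULL);
    if (ps->pid == 0)
        return 0;

    for (size_t i = 0; i < ps->nthreads; i++) {
        int status = ps->thread_status[i];
        if (status <= SERVER_READY
            && (count_dead_as_idle || status != SERVER_DEAD)
            && !ps->quiescing
            && ps->generation == my_generation)
            ++idle;
    }
    return idle;
}
```

For a process with two dead, one starting, one ready and one busy thread, the
old condition counts 2 idle threads and the patched one counts 4, which is
what slows further forking.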




Re: Possible new cache architecture

2006-05-01 Thread Graham Leggett

Brian Akins wrote:

>> That's two hits to find whether something is cached.
> 
> You must have two hits if you support vary.

You need only one - bring up the original cached entry with the key, and 
then use cheap subkeys over a very limited data set to find both the 
variants and the header/data.

>> How are races prevented?
> 
> shouldn't be any.  something is in the cache or not.  if one "piece" of 
> an http "object" is not valid or in cache, the object is invalid. 
> Although other variants may be valid/in cache.

I can think of one race off the top of my head:

- the browser says "send me this URL".

- the cache has it cached, but it's stale, so it asks the backend 
"If-None-Match".


- the cache reaper comes along, says "oh, this is stale", and reaps the 
cached body (which is independent, remember?). The data is no longer 
cached even though the headers still exist.


- The backend says "304 Not Modified".

- the cache says "cool, will send my copy upstream. Oops, where has my 
data gone?".


The end user will probably experience this as "oh, the website had a 
glitch, let me try again", so it won't be reported as a bug.


Ok, so you tried to lock the body before going to the backend, but 
searching for and locking the body would have been an additional wasted 
cache hit if the backend answered with its own body. Not to mention 
having to write and debug code to do this.


Races need to be properly handled, and atomic cache operations will go a 
long way to prevent them.


Regards,
Graham
--




Re: Possible new cache architecture

2006-05-01 Thread Davi Arnaut
On Mon, 01 May 2006 15:46:58 -0400
Brian Akins <[EMAIL PROTECTED]> wrote:

> Graham Leggett wrote:
> 
> > That's two hits to find whether something is cached.
> 
> You must have two hits if you support vary.
> 
> > How are races prevented?
> 
> shouldn't be any.  something is in the cache or not.  if one "piece" of 
> an http "object" is not valid or in cache, the object is invalid. 
> Although other variants may be valid/in cache.
> 

More important, if we stick with the key/data concept it's possible to
implement the header/body relationship under single or multiple keys.

I think Brian wants mod_cache to be only a layer (glue) between the
underlying providers and the cache users. Each set of problems is better
dealt with in its own layer. The storage layer (cache providers) is going
to worry only about storing the key/data pairs (and expiring?) while the
"protocol" layer will deal with the underlying concepts of each protocol
(mod_http_cache).

The current design leads to bloat, just look at mem_cache and disk_cache,
both have their own duplicated quirks (serialize/unserialize, et cetera)
and need special handling of the headers and file format. Under the new
design this duplication will be gone: think of it as assembling the
HTTP-specific part and generalizing the storage part.

--
Davi Arnaut


Re: Possible new cache architecture

2006-05-01 Thread Brian Akins

William A. Rowe, Jr. wrote:

> And, of course, inserting the hit once it's composed is important, and can
> happen in parallel (3 clients looking for the same, and then fetching the
> same page from the origin).  But it's harmless if the insertion is mutex
> protected, and the insertion can only happen once the page is fetched
> complete.

in the case of mod_disk_cache the way I would do it is to have a 
deterministic tempfile rather than use apr_tempfile, and open it EXCL.
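
A minimal sketch of that idea using plain POSIX calls instead of APR, so it
is only an approximation of what mod_disk_cache would do. `claim_tempfile`
and `publish_tempfile` are hypothetical helpers, and the ".tmp" suffix is an
arbitrary choice for the deterministic name.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Try to become the single writer for a cache entry.  The temp name is
 * derived from the final name, so every process computes the same path and
 * O_EXCL lets exactly one of them create it.  Returns an fd on success, or
 * -1 with errno == EEXIST when another writer already holds the file.
 */
int claim_tempfile(const char *final_path, char *tmp_path, size_t n)
{
    int len = snprintf(tmp_path, n, "%s.tmp", final_path);

    if (len < 0 || (size_t)len >= n) {
        errno = ENAMETOOLONG;
        return -1;
    }
    return open(tmp_path, O_WRONLY | O_CREAT | O_EXCL, 0600);
}

/* Once the body is completely written, move it into place in one step. */
int publish_tempfile(int fd, const char *tmp_path, const char *final_path)
{
    assert(fd >= 0);
    if (close(fd) != 0)
        return -1;
    return rename(tmp_path, final_path);
}
```

Clients that lose the race see EEXIST and can serve straight from the origin
without caching; a crashed writer would leave the temp file behind, which
this sketch does not try to clean up.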


--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies


Re: Possible new cache architecture

2006-05-01 Thread William A. Rowe, Jr.

Brian Akins wrote:
> Graham Leggett wrote:
> 
>> That's two hits to find whether something is cached.
> 
> You must have two hits if you support vary.

Well, one to three hits.  One, if you use an arbitrary page (MRU or most
frequently referenced would be most optimal, but it really doesn't matter)
and then determine what varies, and if you are in the right place, or what
that right place is (page by language, or whatever fields it varied by.)

Three hits or more if your variant also varies ;)

>> How are races prevented?
> 
> shouldn't be any.  something is in the cache or not.  if one "piece" of 
> an http "object" is not valid or in cache, the object is invalid. 
> Although other variants may be valid/in cache.

And, of course, inserting the hit once it's composed is important, and can
happen in parallel (3 clients looking for the same, and then fetching the
same page from the origin).  But it's harmless if the insertion is mutex
protected, and the insertion can only happen once the page is fetched
complete.


Re: Possible new cache architecture

2006-05-01 Thread Brian Akins

Graham Leggett wrote:

> That's two hits to find whether something is cached.

You must have two hits if you support vary.

> How are races prevented?

shouldn't be any.  something is in the cache or not.  if one "piece" of 
an http "object" is not valid or in cache, the object is invalid. 
Although other variants may be valid/in cache.


--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies


Re: Possible new cache architecture

2006-05-01 Thread Brian Akins

Graham Leggett wrote:

> Or you 
> can avoid this issue entirely by building a generic cache that works 
> with key/subkey/data.

and then you have to find a way to bridge the gap between this interface 
and all the key/value caches that currently exist (memcache being the 
most popular example).

what if mod_http_cache had a way to "record" its cached objects? It 
could keep up with the relationships there.  Basically, you have a 
provider that has a few functions that get called whenever 
mod_http_cache caches or expires an object.



--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies


Re: Possible new cache architecture

2006-05-01 Thread Graham Leggett

Brian Akins wrote:

> Nope.  Look at the way the current http cache works. An http "object," 
> headers and data, is only valid if both headers and data are valid.

That's two hits to find whether something is cached.

How are races prevented?

Regards,
Graham
--




Re: Possible new cache architecture

2006-05-01 Thread Brian Akins

Graham Leggett wrote:

> the independent caching of variants. 

The example I posted should address this issue.

I also have some ideas concerning the thundering herd problem, it's just 
a matter of whether you think it should be handled in cache or http_cache.




--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies


Re: Possible new cache architecture

2006-05-01 Thread Graham Leggett

Davi Arnaut wrote:

>> It's a design flaw to create problems that have to be specially coded 
>> around, when you can avoid the problem entirely.
> 
> Maybe I'm missing something, what problems do you foresee ?

There are lots of issues that were uncovered when I split the proxy and 
cache code for httpd v2.0.


A web cache requires two separately alterable cached entities (headers, 
body) just for caching a single variant. This pair of entities needs to 
expire and/or be forcibly expired (think Cache-Control no-cache) 
atomically. Sure, you can code and debug a lot of code to try and create 
the effect of atomically expiring multiple cache entries at once. Or you 
can avoid this issue entirely by building a generic cache that works 
with key/subkey/data.
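
One way to picture the key/subkey/data idea is the toy table below. It is a
sketch under assumptions, not a proposed mod_cache API: the `sk_*` names and
the two fixed subkeys are invented for illustration, and real code would need
locking and proper storage.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { SUB_HEADERS, SUB_BODY, SUB_COUNT };   /* hypothetical subkeys */
#define ENTRIES 4
#define DATALEN 128

/* One entry per key; the subkeys live inside it. */
typedef struct {
    char key[DATALEN];
    char data[SUB_COUNT][DATALEN];
    int  present[SUB_COUNT];
    int  used;
} sk_entry_t;

typedef struct { sk_entry_t entries[ENTRIES]; } sk_cache_t;

static sk_entry_t *sk_lookup(sk_cache_t *c, const char *key)
{
    for (int i = 0; i < ENTRIES; i++)
        if (c->entries[i].used && strcmp(c->entries[i].key, key) == 0)
            return &c->entries[i];
    return NULL;
}

/* Store or replace a single subkey without touching the others. */
int sk_put(sk_cache_t *c, const char *key, int sub, const char *data)
{
    sk_entry_t *e;

    assert(sub >= 0 && sub < SUB_COUNT);
    e = sk_lookup(c, key);
    for (int i = 0; e == NULL && i < ENTRIES; i++) {
        if (!c->entries[i].used) {
            e = &c->entries[i];
            memset(e, 0, sizeof *e);
            snprintf(e->key, DATALEN, "%s", key);
            e->used = 1;
        }
    }
    if (e == NULL)
        return -1;                       /* table full */
    snprintf(e->data[sub], DATALEN, "%s", data);
    e->present[sub] = 1;
    return 0;
}

const char *sk_get(sk_cache_t *c, const char *key, int sub)
{
    sk_entry_t *e;

    assert(sub >= 0 && sub < SUB_COUNT);
    e = sk_lookup(c, key);
    return (e != NULL && e->present[sub]) ? e->data[sub] : NULL;
}

/* Expiring the key drops headers and body in the same step. */
void sk_expire(sk_cache_t *c, const char *key)
{
    sk_entry_t *e = sk_lookup(c, key);
    if (e != NULL)
        e->used = 0;
}
```

Headers can be refreshed on their own with `sk_put(..., SUB_HEADERS, ...)`,
while `sk_expire` can never leave a body behind without its headers.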


There are a number of other issues that have been listed as bugs since 
httpd v1.3 that are still present, most notably the thundering herd 
problem, and the independent caching of variants. There is no point in 
refactoring the cache code if the new code isn't going to be 
significantly better than the existing code.


Regards,
Graham
--




Re: plain file name of a request

2006-05-01 Thread William A. Rowe, Jr.

Greg Ames wrote:
> Markus Litz wrote:
> 
>> Hello,
>> 
>> how can i get the filename only of the requested uri? For example if 
>> "http://www.example.com/test.html" is requested, i only want 
>> "test.html". request_rec::filename only gives the full filename on disk.
> 
> basename(r->filename)

:)  Or portably, apr_filepath_name_get() declared in apr_lib.h
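
For readers without APR at hand, the same result can be sketched in plain C.
`path_leaf` is a hypothetical helper written for this note; it assumes '/' as
the only separator, which as far as I recall is also how
apr_filepath_name_get() behaves.

```c
#include <assert.h>
#include <string.h>

/*
 * Return a pointer to the last component of a '/'-separated path.
 * The result points into the original string; nothing is copied.
 */
const char *path_leaf(const char *path)
{
    const char *slash;

    assert(path != NULL);
    slash = strrchr(path, '/');
    return slash != NULL ? slash + 1 : path;
}
```

In a module this would be applied to r->filename; note that the name on disk
is not always the last segment of the requested URI.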


Re: plain file name of a request

2006-05-01 Thread Greg Ames

Markus Litz wrote:

> Hello,
> 
> how can i get the filename only of the requested uri? For example if 
> "http://www.example.com/test.html" is requested, i only want "test.html". 
> request_rec::filename only gives the full filename on disk.

basename(r->filename)

Greg


Re: svn commit: r398494 - in /httpd/site/trunk: docs/security/vulnerabilities_13.html docs/security/vulnerabilities_20.html docs/security/vulnerabilities_22.html xdocs/security/vulnerabilities_22.xml

2006-05-01 Thread Paul Querna
Mark J Cox wrote:
>> This killed the list of vulnerabilities for all versions. Was this intended?
>> And if yes, where can they be found now?
> 
> Must be someone with bad java foo, fixing.
> 

Er. ya. It wasn't my intention to break stuff, I just ran build.sh and
it kept saying it wanted to do this

java version "1.5.0_06"

Intel Mac.

How could a version of java change the behavior of the site build stuff?

-Paul


Re: svn commit: r398494 - in /httpd/site/trunk: docs/security/vulnerabilities_13.html docs/security/vulnerabilities_20.html docs/security/vulnerabilities_22.html xdocs/security/vulnerabilities_22.xml

2006-05-01 Thread Mark J Cox
> This killed the list of vulnerabilities for all versions. Was this intended?
> And if yes, where can they be found now?

Must be someone with bad java foo, fixing.

Mark
--
Mark J Cox | www.awe.com/mark





Re: Possible new cache architecture

2006-05-01 Thread Brian Akins

Davi Arnaut wrote:

> This way it would be possible for one cache to act as a cache of another
> cache provider, mod_mem_cache would work as a small/fast MRU cache for
> mod_disk_cache.

Slightly off subject, but in my testing, mod_disk_cache is much faster 
than mod_mem_cache.  Thanks to sendfile!

I was thinking about scenarios where each cache had its local cache 
(disk, mem, whatever) with memcache behind it.  That way each "object" 
only has to be generated once for the entire "farm."  This would be an 
easy way to have a distributed cache.

Also, the squid type htcp (or icp) could be a fallback for the local 
cache as well without mucking up all the proxy and cache code.



--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies


Re: Possible new cache architecture

2006-05-01 Thread Davi Arnaut
On Mon, 01 May 2006 09:02:31 -0400
Brian Akins <[EMAIL PROTECTED]> wrote:

> Here is a scenario.  We will assume a cache "hit."

I think the usage scenario is clear. Moving on, I would like to be able to stack
up the cache providers (like the apache filter chain). Basically, mod_cache
will expose the functions:

add(key, value, expiration, flag)
get(key)
remove(key)

mod_cache will then pass the request (add/get or remove) down the chain,
similar to apache filter chain. ie:

apr_status_t mem_cache_get_filter(ap_cache_filter_t *f,
  apr_bucket_brigade *bb, ...);

apr_status_t disk_cache_get_filter(ap_cache_filter_t *f,
   apr_bucket_brigade *bb, ...);

This way it would be possible for one cache to act as a cache of another
cache provider, mod_mem_cache would work as a small/fast MRU cache for
mod_disk_cache.
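
The stacking could be pictured with a sketch like the one below. It does not
use the filter-style signatures above: `provider_t` and `chain_get` are
hypothetical, each provider holds a single entry to keep things short, and
copying a hit up into the faster provider is my assumption about how the
"small/fast" layer would be filled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical provider holding at most one entry. */
typedef struct provider provider_t;
struct provider {
    const char *name;
    const char *key;    /* pointers are stored, not copied: the caller */
    const char *value;  /* must keep the strings alive                 */
    provider_t *next;   /* slower provider further down the chain      */
    int hits;
};

/*
 * Look a key up in the first provider; on a miss, ask the next one and
 * remember the answer here so the following lookup is served locally.
 */
const char *chain_get(provider_t *p, const char *key)
{
    const char *v;

    if (p == NULL)
        return NULL;                       /* fell off the end: miss */

    if (p->key != NULL && strcmp(p->key, key) == 0) {
        p->hits++;
        return p->value;
    }

    v = chain_get(p->next, key);
    if (v != NULL) {                       /* promote into this layer */
        p->key = key;
        p->value = v;
    }
    return v;
}
```

With a "mem" provider in front of a "disk" provider, the first lookup is
answered by disk and the second by mem.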

--
Davi Arnaut



Re: Possible new cache architecture

2006-05-01 Thread Davi Arnaut
On Mon, 01 May 2006 14:51:53 +0200
Graham Leggett <[EMAIL PROTECTED]> wrote:

> Davi Arnaut wrote:
> 
> >> mod_cache need not be HTTP specific, it only needs the ability to cache 
> >> multiple entities (data, headers) under the same key, and be able to 
> >> replace zero or more entities independently of the other entities (think 
> >> updating headers without updating content).
> > 
> > mod_cache needs only to cache key/value pairs. The key/value format is up to
> > the mod_cache user.
> 
> It's a design flaw to create problems that have to be specially coded 
> around, when you can avoid the problem entirely.

Maybe I'm missing something, what problems do you foresee ?

> The cache needs to be generic, yes - but there is no need to stick to 
> the "key/value" cliché of cache code, if a variation to this is going to 
> make your life significantly easier.
> 

And the variation is..?

--
Davi Arnaut


Re: Possible new cache architecture

2006-05-01 Thread Brian Akins

Here is a scenario.  We will assume a cache "hit."

Client asks for http://domain/uri.html?args

mod_http_cache generates a key: http-domain-uri.html-args-header

asks mod_cache for value with this key.

mod_cache fetches the value, looks at expire time, it's good, and returns 
the "blob"


mod_http_cache examines blob, it's vary information on Accept-Encoding.

mod_http_cache generates a new key: http-domain-uri.html-args-header-gzip 
(value from client)


asks mod_cache for value with this key.

mod_cache fetches the value, looks at expire time, it's good, and returns 
the "blob"


mod_http_cache examines blob, it's a normal header blob. The request does 
not "meet conditions", so we need to get the data.


mod_http_cache generates a new key: http-domain-uri.html-args-data-gzip 
(value from client)


asks mod_cache for value with this key.

mod_cache fetches the value, looks at expire time, it's good, and returns 
the "blob"



mod_http_cache returns headers and data to client.


Notice there is a pattern to this...
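
The pattern in the scenario above can be sketched as three lookups against a
plain key/value store. Everything here is illustrative: `kv_t`, `make_key`
and `fetch_object` are hypothetical, the store is a flat array, and the
"vary:Accept-Encoding" marker is an invented stand-in for whatever format the
vary blob would really have.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define VARY_MARKER "vary:Accept-Encoding"   /* assumed blob format */

typedef struct { const char *key; const char *value; } kv_t;

/* Plain key/value lookup; the store knows nothing about HTTP. */
const char *kv_get(const kv_t *store, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(store[i].key, key) == 0)
            return store[i].value;
    return NULL;
}

/* Build "base-part" or "base-part-variant"; returns -1 if it does not fit. */
int make_key(char *buf, size_t n, const char *base, const char *part,
             const char *variant)
{
    int len = variant != NULL
        ? snprintf(buf, n, "%s-%s-%s", base, part, variant)
        : snprintf(buf, n, "%s-%s", base, part);
    return (len >= 0 && (size_t)len < n) ? 0 : -1;
}

/* The object is valid only if every piece is found; returns 0 or -1. */
int fetch_object(const kv_t *store, size_t n, const char *base,
                 const char *encoding, const char **headers,
                 const char **data)
{
    char key[256];
    const char *variant = NULL;
    const char *blob;

    assert(encoding != NULL);

    /* Hit 1: the generic header key, which may only say "this varies". */
    if (make_key(key, sizeof key, base, "header", NULL) != 0)
        return -1;
    blob = kv_get(store, n, key);
    if (blob == NULL)
        return -1;

    /* Hit 2: the variant's own headers, keyed by the client's value. */
    if (strcmp(blob, VARY_MARKER) == 0) {
        variant = encoding;
        if (make_key(key, sizeof key, base, "header", variant) != 0)
            return -1;
        blob = kv_get(store, n, key);
        if (blob == NULL)
            return -1;
    }
    *headers = blob;

    /* Hit 3: the body for the same variant. */
    if (make_key(key, sizeof key, base, "data", variant) != 0)
        return -1;
    *data = kv_get(store, n, key);
    return *data != NULL ? 0 : -1;
}
```

The storage side only ever sees independent keys; the stitching happens in
the HTTP layer.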
--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies


[Fwd: svn commit: r398585 - in /httpd/site/trunk: docs/download.html docs/index.html xdocs/download.xml xdocs/index.xml]

2006-05-01 Thread William A. Rowe, Jr.
> +Win32 Binary (Self extracting): <a href="[preferred]/httpd/binaries/win32/apache_1.3.35-win32-x86-no_src.exe">apache_1.3.35-win32-x86-no_src.exe</a>



There is no more .exe (and won't be).  By 2006 everyone has at least
msiexec 1.10 installed ;-)

Only -src.msi and -no_src.msi remain for 1.3, while 2.0 and 2.2 will have
-ssl.msi and -no-ssl.msi flavors.

Bill



Re: Possible new cache architecture

2006-05-01 Thread Brian Akins

Davi Arnaut wrote:

> mod_cache needs only to cache key/value pairs. The key/value format is up to
> the mod_cache user.

correct.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies


Re: Possible new cache architecture

2006-05-01 Thread Graham Leggett

Davi Arnaut wrote:

>> mod_cache need not be HTTP specific, it only needs the ability to cache 
>> multiple entities (data, headers) under the same key, and be able to 
>> replace zero or more entities independently of the other entities (think 
>> updating headers without updating content).
> 
> mod_cache needs only to cache key/value pairs. The key/value format is up to
> the mod_cache user.

It's a design flaw to create problems that have to be specially coded 
around, when you can avoid the problem entirely.


The cache needs to be generic, yes - but there is no need to stick to 
the "key/value" cliché of cache code, if a variation to this is going to 
make your life significantly easier.


Regards,
Graham
--




Re: Possible new cache architecture

2006-05-01 Thread Brian Akins

Graham Leggett wrote:

> The potential danger with this is for race conditions to happen while 
> expiring cache entries. If the data entity expired before the header 
> entity, it potentially could confuse the cache - is the entry cached or 
> not? The headers say yes, data says no.

Nope.  Look at the way the current http cache works. An http "object," 
headers and data, is only valid if both headers and data are valid.

> Each variant should be an independent cached entry, the cache should 
> allow different variants to be cached side by side.

Yes.  Each is distinguished by its key.

As far as mod_cache is concerned these are 3 independent entries, but 
mod_http_cache knows how to "stitch" them together.

>> mod_cache should *not* be HTTP specific in any way.
> 
> mod_cache need not be HTTP specific, it only needs the ability to cache 
> multiple entities (data, headers) under the same key, 

No.

> In other words, there must be the ability to cache by a key and a subkey.

No. mod_http_cache generates new keys for headers (key.header), data 
(key.data) and each variant (key1.header, key2.header, key1.data... 
etc.).  As far as the underlying generic cache is concerned, they are 
all independent entries.



--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies


Re: svn commit: r398492 - in /httpd/site/trunk: docs/download.html docs/index.html xdocs/download.xml xdocs/index.xml

2006-05-01 Thread Ruediger Pluem


On 05/01/2006 03:25 AM, [EMAIL PROTECTED] wrote:
> Author: pquerna
> Date: Sun Apr 30 18:25:38 2006
> New Revision: 398492
> 
> URL: http://svn.apache.org/viewcvs?rev=398492&view=rev
> Log:
> Rev website for 2.2.2
> 
> Modified:
> httpd/site/trunk/docs/download.html
> httpd/site/trunk/docs/index.html
> httpd/site/trunk/xdocs/download.xml
> httpd/site/trunk/xdocs/index.xml

I see that 2.2.2 and 2.0.58 are announced. What about 1.3.35? Did it not hit 
the mirrors in time?

Regards

Rüdiger



Re: svn commit: r398494 - in /httpd/site/trunk: docs/security/vulnerabilities_13.html docs/security/vulnerabilities_20.html docs/security/vulnerabilities_22.html xdocs/security/vulnerabilities_22.xml

2006-05-01 Thread Ruediger Pluem


On 05/01/2006 03:32 AM, [EMAIL PROTECTED] wrote:
> Author: pquerna
> Date: Sun Apr 30 18:32:18 2006
> New Revision: 398494
> 
> URL: http://svn.apache.org/viewcvs?rev=398494&view=rev
> Log:
> rebuild all.
> 
> Modified:
> httpd/site/trunk/docs/security/vulnerabilities_13.html
> httpd/site/trunk/docs/security/vulnerabilities_20.html
> httpd/site/trunk/docs/security/vulnerabilities_22.html
> httpd/site/trunk/xdocs/security/vulnerabilities_22.xml

This killed the list of vulnerabilities for all versions. Was this intended?
And if yes, where can they be found now?

Anyway, many thanks for doing this release work :-).

Regards

Rüdiger