RE: httpd attempts to open file.html/.htaccess (is this a bug?)

2007-06-13 Thread Allen Pulsifer
Hello William,

Thanks for the suggestions.  I have a fix that is pretty simple (and
therefore, I hope, unlikely to break anything ;-).  Later today, after I've
compiled and tested it on both Windows and Linux, I'll post it to the list.

Allen



RE: httpd attempts to open file.html/.htaccess (is this a bug?)

2007-06-12 Thread Allen Pulsifer
> > When processing a GET /.../file.html, Apache httpd briefly treats
> > file.html as a directory and attempts to open
> > docroot/.../file.html/.htaccess.  The os returns ENOTDIR, and then
> > processing of the request continues.
>
> Yes, this is a somewhat known issue.  Previously it caused issues with
> earlier versions of reiserfs4:
> http://issues.apache.org/bugzilla/show_bug.cgi?id=31126
>
> Rici explains more details here:
> http://marc.info/?l=apache-httpd-dev&m=109470495819687&w=4


Hello Paul and Dev List,

Thanks for the reply.  I checked out the links and did some code tracing
with the debugger.  As one of the links pointed out, the problem is in the
block of code attached below from ap_directory_walk() in server/request.c

This block of code is contained in the directory walk that looks for
symlinks and .htaccess files.  It is executed immediately after appending the
next path segment, which is either a subdirectory or the file name.

The if test at the top of the block attempts to optimize by skipping the
statements that follow it.  The comment on the if test states:

  * If...we knew r->filename was a file, and
  * if...we have strict (case-sensitive) filenames, or
  *  we know the canonical_filename matches to _this_ name, and
  * if...we have allowed symlinks
  * skip the lstat and dummy up an APR_DIR value for thisinfo.

The first problem with the if test is that it doesn't recognize when the
segment is actually the file name itself, and therefore the type is APR_REG
rather than APR_DIR.  This could easily be fixed, but there may be a few
other problems.
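
To make the symptom concrete, here is a minimal sketch (not httpd code) that
assumes a POSIX system with a writable /tmp.  The helper names
probe_htaccess() and demo_probe_regular_file() are made up for illustration.
It opens <file>/.htaccess beneath a regular file, which is the lookup the
walk performs when it mistakes the final segment for a directory, and returns
the errno that open() reported:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Try to open "<path>/.htaccess" read-only.
 * Returns 0 on success, otherwise the errno set by open(). */
int probe_htaccess(const char *path)
{
    char buf[4096];
    int n = snprintf(buf, sizeof buf, "%s/.htaccess", path);
    assert(n > 0 && (size_t)n < sizeof buf);   /* path must fit in buf */

    int fd = open(buf, O_RDONLY);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}

/* Create a throwaway regular file (the name is arbitrary), probe it as if
 * it were a directory, clean up, and return the errno that was observed.
 * Returns -1 if the temporary file could not be created. */
int demo_probe_regular_file(void)
{
    char tmpl[] = "/tmp/dirwalk-demo-XXXXXX";
    int fd = mkstemp(tmpl);
    if (fd < 0)
        return -1;
    close(fd);

    int err = probe_htaccess(tmpl);
    unlink(tmpl);
    return err;
}
```

On Linux, demo_probe_regular_file() should return ENOTDIR, which matches the
error seen when httpd probes file.html/.htaccess.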

First, it should be mentioned that the optimization can be removed and then
httpd will behave correctly: it will not do a spurious access on
file.html/.htaccess.  However, when the optimization is removed, it will
then do a stat on each component in the file path, when it might not need
to.

Let's first look at the lines of code that follow the if optimization and
look at the conditions under which they are not necessary.  Before starting
though, let's note that prior to beginning the directory walk,
ap_directory_walk() does a stat on the full file name, using the
APR_FINFO_MIN parameter.

Later, without optimization, it would then do a stat on each component in
the path, as follows:

1. Do a stat on the path component, looking at the link info
(APR_FINFO_LINK) rather than the target info.

2. Test if stat returned an error.  Note that since the initial stat on the
full path did not return an error, the stat on the component will never
return an error (assuming the program logic is correct).  This can therefore
always be optimized out.

3. Fix up the path name if the actual component name does not match.  A
mismatch is only possible with a file system that is not case sensitive, so
this step can be optimized out if (a) the file system is case sensitive,
(b) we already know the names match, or (c) we don't care whether they
match.

4. If the path is a link, run resolve_symlink().  This function will always
return success when OPT_SYM_LINKS (FollowSymLinks) is enabled.

5. If the path points at anything other than a directory, end processing.

So basically, these processing steps can be skipped whenever (1)
FollowSymLinks is enabled AND (2) the file system is case sensitive.
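
The five steps above can be modeled roughly as follows.  This is a simplified
sketch, not the code in server/request.c: the seg_kind and walk_result types
and walk_components() are hypothetical, each component's stat result is
passed in rather than read from disk, a case-sensitive file system is assumed
(so step 3 is left out), and a symlink is simply refused when following is
disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What an lstat-style call reported for one path component (hypothetical). */
typedef enum {
    SEG_MISSING,
    SEG_DIR,
    SEG_REG,
    SEG_LINK_TO_DIR,
    SEG_LINK_TO_REG
} seg_kind;

/* Outcome of walking the components of one request path. */
typedef struct {
    size_t stats_done;    /* per-component stat calls issued             */
    size_t dirs_entered;  /* components where .htaccess would be probed  */
    bool   ok;            /* false if the walk failed or was refused     */
} walk_result;

walk_result walk_components(const seg_kind *segs, size_t n, bool follow_symlinks)
{
    walk_result r = { 0, 0, true };
    assert(segs != NULL || n == 0);

    for (size_t i = 0; i < n; ++i) {
        seg_kind k = segs[i];
        ++r.stats_done;                     /* step 1: stat the component */

        if (k == SEG_MISSING) {             /* step 2: the stat failed */
            r.ok = false;
            return r;
        }

        /* step 3 (case fix-up) omitted: case-sensitive file system assumed */

        if (k == SEG_LINK_TO_DIR || k == SEG_LINK_TO_REG) {   /* step 4 */
            if (!follow_symlinks) {         /* simplification: just refuse */
                r.ok = false;
                return r;
            }
            k = (k == SEG_LINK_TO_DIR) ? SEG_DIR : SEG_REG;
        }

        if (k != SEG_DIR)                   /* step 5: not a directory, stop */
            return r;

        ++r.dirs_entered;
    }
    return r;
}
```

In this model, a path such as dir/dir/file costs three stat calls but enters
only two directories; the final regular file ends the walk without a
.htaccess probe.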

It seems to me that the optimization should actually read:

If (filesystem is case sensitive AND OPT_SYM_LINKS is enabled AND we did a
successful stat on the full file path) Then:
{
    If (the path to test is the full path AND the full path points at a
    regular file) Then: end processing

    Else: assume the path to test is a dir and skip the stat
}

These are the two things I'm concerned about:

1. In the current optimization, the comment says:

  * if...we have strict (case-sensitive) filenames, or
  *  we know the canonical_filename matches to _this_ name, and

while the actual code says:

#ifdef CASE_BLIND_FILESYSTEM
 (filename_len <= canonical_len)
#endif

At first examination, it looks like the comment describes the correct
implementation, but how does the test for filename_len <= canonical_len
ensure that canonical_filename matches to _this_ name?  Can anyone verify
this is correct?

2. When OPT_SYM_LINKS is enabled, resolve_symlink() does not test
OPT_SYM_OWNER, i.e., OPT_SYM_LINKS overrides OPT_SYM_OWNER.  The
optimization however insists that OPT_SYM_LINKS is set while OPT_SYM_OWNER
is unset.

Which of these two is correct?  Should resolve_symlink() always check
OPT_SYM_OWNER, even if OPT_SYM_LINKS is enabled, or should the optimization
only check OPT_SYM_LINKS?
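
The difference between the two readings can be shown with a small bitmask
sketch.  The DEMO_* flag values below are placeholders chosen for
illustration, not the real OPT_SYM_LINKS and OPT_SYM_OWNER values, and the
function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the real constants live in httpd's headers. */
enum {
    DEMO_SYM_LINKS = 1 << 0,   /* stands in for OPT_SYM_LINKS */
    DEMO_SYM_OWNER = 1 << 1    /* stands in for OPT_SYM_OWNER */
};

/* Reading A: the optimization as described (LINKS set, OWNER unset). */
bool skip_allowed_strict(unsigned opts)
{
    return (opts & (DEMO_SYM_LINKS | DEMO_SYM_OWNER)) == DEMO_SYM_LINKS;
}

/* Reading B: resolve_symlink() as described (LINKS alone is enough). */
bool skip_allowed_links_only(unsigned opts)
{
    return (opts & DEMO_SYM_LINKS) != 0;
}

/* True when the two readings disagree for this option set. */
bool readings_disagree(unsigned opts)
{
    /* only the two demo flags are meaningful in this sketch */
    assert((opts & ~(unsigned)(DEMO_SYM_LINKS | DEMO_SYM_OWNER)) == 0);
    return skip_allowed_strict(opts) != skip_allowed_links_only(opts);
}
```

Under these assumptions the two tests disagree only when both flags are set,
which is exactly the configuration the question is about.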

Thanks,

Allen


-

THE PROBLEMATIC BLOCK OF CODE

/* First optimization;
 * If...we knew r->filename was a file, and
 * if...we have strict (case-sensitive) filenames, or
 *  we know the canonical_filename matches to _this_ name, and
 * if...we have allowed symlinks
 * skip the lstat and dummy 

Re: httpd attempts to open file.html/.htaccess (is this a bug?)

2007-06-12 Thread William A. Rowe, Jr.
Allen Pulsifer wrote:
 
> Hello Paul and Dev List,
>
> Thanks for the reply.  I checked out the links and did some code tracing
> with the debugger.  As one of the links pointed out, the problem is in the
> block of code attached below from ap_directory_walk() in server/request.c

Just a quick note to thank you, Allen, for the most thorough analysis of
the optimizations of dir_walk.  I'm partially to blame (followed by others
who attempted to optimize further :-) and would love to see an optimization
model which is more generic, e.g. not entangled with the specifics of
'I'm for directories' or 'I'm for patterns'...  It's great to have your
reference to help debug and to correct the functioning of dir_walk, and we
hope you'll participate in testing/confirming any proposed fixes.

My thought for the next step is to divide dir_walk into cache code (was this
opaque pattern hit before?) and dir/file handling code, with fixes (which
your patch suggests) and perhaps even clearly splitting out the REG vs. DIR
cases into separate phases.

We are open to all suggestions.


Re: httpd attempts to open file.html/.htaccess (is this a bug?)

2007-06-11 Thread Giuliano Gavazzi


On 11 Jun 2007, at 15:01, Allen Pulsifer wrote:


> When processing a GET /.../file.html, Apache httpd briefly treats
> file.html as a directory and attempts to open
> docroot/.../file.html/.htaccess.  The os returns ENOTDIR, and then
> processing of the request continues.

[...]

> Does anyone else see the same behavior?  Is this a bug?
>
> Configuration: Apache httpd v 2.2.4 running on a default installation of
> CentOS-5 (ext3 filesystem).  Tested with stock configuration distributed
> with CentOS-5, as well as a stock installation compiled from the source.
>
> Only change to httpd.conf is:
> AllowOverride None changed to AllowOverride All


Same here (2.2.4 on Mac OS X 10.4.9), but with AllowOverride None for the
relevant directory.


Giuliano


Re: httpd attempts to open file.html/.htaccess (is this a bug?)

2007-06-11 Thread Paul Querna
Allen Pulsifer wrote:
> Summary:
>
> When processing a GET /.../file.html, Apache httpd briefly treats
> file.html as a directory and attempts to open
> docroot/.../file.html/.htaccess.  The os returns ENOTDIR, and then
> processing of the request continues.

Yes, this is a somewhat known issue.  Previously it caused issues with
earlier versions of reiserfs4:
http://issues.apache.org/bugzilla/show_bug.cgi?id=31126

Rici explains more details here:
http://marc.info/?l=apache-httpd-dev&m=109470495819687&w=4

It would be nice to fix the root issue

-Paul