When processing a GET /.../file.html, Apache httpd briefly treats
file.html as a directory and attempts to open
docroot/.../file.html/.htaccess. The OS returns ENOTDIR, and then
processing of the request continues.
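The ENOTDIR result can be reproduced outside httpd with a few lines of C. This is only a sketch for illustration: probe_htaccess() is a hypothetical helper, not an httpd function, and it assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not httpd code): try to open "<path>/.htaccess".
 * Returns 0 if the open succeeds, otherwise the errno value from open(). */
static int probe_htaccess(const char *path)
{
    char buf[4096];
    int n;
    int fd;

    assert(path != NULL);

    n = snprintf(buf, sizeof(buf), "%s/.htaccess", path);
    if (n < 0 || (size_t)n >= sizeof(buf))
        return ENAMETOOLONG;

    fd = open(buf, O_RDONLY);
    if (fd < 0)
        return errno;   /* ENOTDIR when <path> is a regular file */

    close(fd);
    return 0;
}
```

When path names a regular file such as file.html, the open fails with ENOTDIR, which matches the spurious access described above.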
Yes, this is a known issue. It previously caused problems with
earlier versions of reiserfs4:
http://issues.apache.org/bugzilla/show_bug.cgi?id=31126
Rici explains more details here:
http://marc.info/?l=apache-httpd-dev&m=109470495819687&w=4
Hello Paul and Dev List,
Thanks for the reply. I checked out the links and did some code tracing
with the debugger. As one of the links pointed out, the problem is in the
block of code attached below from ap_directory_walk() in server/request.c
This block of code is contained in the directory walk that looks for sym
links and .htaccess files. It is executed immediately after appending the
next path segment, which is either a subdirectory or the file name.
The if test at the top of the block attempts to optimize by skipping the
statements that follow it. The comment on the if test states:
* If...we knew r->filename was a file, and
* if...we have strict (case-sensitive) filenames, or
* we know the canonical_filename matches to _this_ name, and
* if...we have allowed symlinks
* skip the lstat and dummy up an APR_DIR value for thisinfo.
The first problem with the if test is that it doesn't recognize when the
segment is actually the file name itself, and therefore the type is APR_REG
rather than APR_DIR. This could easily be fixed, but there may be a few
other problems.
First, it should be mentioned that the optimization can be removed and then
httpd will behave correctly: it will not do a spurious access on
file.html/.htaccess. However, when the optimization is removed, it will
then do a stat on each component in the file path, when it might not need
to.
Let's first look at the lines of code that follow the if optimization and
look at the conditions under which they are not necessary. Before starting
though, let's note that prior to beginning the directory walk,
ap_directory_walk() does a stat on the full file name, using the
APR_FINFO_MIN parameter.
Later, without optimization, it would then do a stat on each component in
the path, as follows:
1. Do a stat on the path component, looking at the link info
(APR_FINFO_LINK) rather than the target info.
2. Test if stat returned an error. Note that since the initial stat on the
full path did not return an error, the stat on the component will never
return an error (assuming the program logic is correct). This test can
therefore always be optimized out.
3. Fix up the path name if the actual component name info does not match. A
mismatch is only possible with a file system that is not case sensitive, and
therefore can be optimized out if either (a) the file system is case
sensitive or (b) we already know they match; or (c) we don't care if they
match or not.
4. If the path is a link, run resolve_symlink(). This function will always
return success when OPT_SYM_LINKS (FollowSymLinks) is enabled.
5. If the path points at anything other than a directory, end processing.
So basically, these processing steps can be skipped whenever (1)
FollowSymLinks is enabled AND (2) the file system is case sensitive.
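To make the cost of the unoptimized walk concrete, here is a rough sketch of a per-component lstat loop. It is not the httpd code: walk_path() is a hypothetical function, it uses plain POSIX lstat() in place of the APR calls, it omits the case fix-up of step 3, and it treats symlinks as always allowed (the FollowSymLinks case).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical sketch: lstat every component of an absolute path.
 * Returns the number of lstat calls made, or -errno on failure. */
static int walk_path(const char *path)
{
    char buf[4096];
    size_t len;
    size_t i;
    int calls = 0;

    assert(path != NULL);
    len = strlen(path);
    if (len == 0 || len >= sizeof(buf) || path[0] != '/')
        return -EINVAL;
    memcpy(buf, path, len + 1);

    for (i = 1; i <= len; i++) {
        struct stat st;
        char saved;

        if (buf[i] != '/' && buf[i] != '\0')
            continue;               /* still inside a segment */
        if (buf[i - 1] == '/')
            continue;               /* empty segment ("//" or trailing "/") */

        saved = buf[i];
        buf[i] = '\0';              /* terminate at the current component */
        calls++;
        if (lstat(buf, &st) != 0)   /* steps 1 and 2: stat the link, check error */
            return -errno;
        buf[i] = saved;

        /* Step 5: anything other than a directory (or an allowed link)
         * ends the walk. */
        if (!S_ISDIR(st.st_mode) && !S_ISLNK(st.st_mode))
            break;
    }
    return calls;
}
```

A path with N components costs N lstat calls here, which is the overhead the optimization is trying to avoid.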
It seems to me that the optimization should actually read:
If (filesystem is case sensitive AND OPT_SYM_LINKS is enabled AND we did a
successful stat on the full file path) Then:
{
    If (the path to test is the full path AND the full path points at a
        regular file) Then: end processing
    Else: assume the path to test is a dir and skip the stat
}
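The proposed test could be written roughly as follows. This is a sketch only: the enum names and proposed_shortcut() are made up for illustration and stand in for the APR file types and the state that ap_directory_walk() actually tracks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the APR file types (APR_NOFILE, APR_REG, APR_DIR). */
typedef enum { FT_NOFILE, FT_REG, FT_DIR } file_type;

/* What the walk should do with the current path segment. */
typedef enum { WALK_STAT_NEEDED, WALK_ASSUME_DIR, WALK_END } walk_action;

/* full_path_type is FT_NOFILE when the initial stat on the full path failed. */
static walk_action proposed_shortcut(bool case_sensitive_fs,
                                     bool follow_symlinks,
                                     file_type full_path_type,
                                     bool is_full_path)
{
    if (!(case_sensitive_fs && follow_symlinks && full_path_type != FT_NOFILE))
        return WALK_STAT_NEEDED;    /* optimization does not apply */

    if (is_full_path && full_path_type == FT_REG)
        return WALK_END;            /* reached the file itself */

    return WALK_ASSUME_DIR;         /* skip the stat, treat as a directory */
}

void proposed_shortcut_examples(void)
{
    /* Intermediate segment of a path to a regular file: no stat needed. */
    assert(proposed_shortcut(true, true, FT_REG, false) == WALK_ASSUME_DIR);
    /* Final segment (file.html): stop, so file.html/.htaccess is never tried. */
    assert(proposed_shortcut(true, true, FT_REG, true) == WALK_END);
    /* FollowSymLinks disabled: fall back to the per-component stat. */
    assert(proposed_shortcut(true, false, FT_REG, true) == WALK_STAT_NEEDED);
}
```

The point of the second case is that the final segment is recognized as a regular file, so the walk ends instead of dummying up a directory.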
These are the two things I'm concerned about:
1. In the current optimization, the comment says:
* if...we have strict (case-sensitive) filenames, or
* we know the canonical_filename matches to _this_ name, and
while the actual code says:
#ifdef CASE_BLIND_FILESYSTEM
(filename_len <= canonical_len)
#endif
At first examination, it looks like the comment describes the correct
implementation, but how does the test for filename_len <= canonical_len
ensure that canonical_filename matches to _this_ name? Can anyone verify
this is correct?
2. When OPT_SYM_LINKS is enabled, resolve_symlink() does not test
OPT_SYM_OWNER, i.e., OPT_SYM_LINKS overrides OPT_SYM_OWNER. The
optimization however insists that OPT_SYM_LINKS is set while OPT_SYM_OWNER
is unset.
Which of these two is correct? Should resolve_symlink() always check
OPT_SYM_OWNER, even if OPT_SYM_LINKS is enabled, or should the optimization
only check OPT_SYM_LINKS?
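To show the mismatch concretely, the two behaviours can be modelled as bit tests. This is a sketch under assumptions: the DEMO_ values are illustrative bits, not the real constants from the httpd headers, and the two functions only model the behaviour described above rather than reproduce the actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the real constants live in httpd's headers. */
enum {
    DEMO_OPT_SYM_LINKS = 1 << 0,    /* stands in for OPT_SYM_LINKS */
    DEMO_OPT_SYM_OWNER = 1 << 1     /* stands in for OPT_SYM_OWNER */
};

/* The optimization as described: OPT_SYM_LINKS set AND OPT_SYM_OWNER unset. */
static bool shortcut_allowed(int opts)
{
    return (opts & (DEMO_OPT_SYM_LINKS | DEMO_OPT_SYM_OWNER))
           == DEMO_OPT_SYM_LINKS;
}

/* resolve_symlink() as described: OPT_SYM_LINKS alone skips the owner check. */
static bool owner_check_skipped(int opts)
{
    return (opts & DEMO_OPT_SYM_LINKS) != 0;
}

void symlink_option_examples(void)
{
    int both = DEMO_OPT_SYM_LINKS | DEMO_OPT_SYM_OWNER;

    /* Only FollowSymLinks: both tests agree. */
    assert(shortcut_allowed(DEMO_OPT_SYM_LINKS));
    assert(owner_check_skipped(DEMO_OPT_SYM_LINKS));

    /* Both options set: the two tests disagree. */
    assert(!shortcut_allowed(both));
    assert(owner_check_skipped(both));
}
```

With both options set, the shortcut is refused while the owner check is still skipped, which is the inconsistency I am asking about.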
Thanks,
Allen
-
THE PROBLEMATIC BLOCK OF CODE
/* First optimization;
* If...we knew r->filename was a file, and
* if...we have strict (case-sensitive) filenames, or
* we know the canonical_filename matches to _this_ name, and
* if...we have allowed symlinks
* skip the lstat and dummy