Edit report at https://bugs.php.net/bug.php?id=47070&edit=1
ID: 47070
Comment by: darrel dot opry at gmail dot com
Reported by: darrel dot opry at gmail dot com
Summary: php_stream_locate_url_wrapper fails without authority section
Status: Open
Type: Feature/Change Request
Package: Streams related
Operating System: Ubuntu 8.10
PHP Version: 5.2.8
Block user comment: N
Private report: N
New Comment:
Here is my non-C developer's take on how this function could be
re-implemented to be a little easier to follow. My general view is that the
'file' scheme handling in this function (authority and path parsing) is out
of place and should be moved to the actual plain files stream wrapper.
In general, I believe stream wrappers and code utilizing stream wrappers
could be made a little more efficient by fully parsing the URL in this
function, possibly with parse_url, using the resulting scheme to select a
stream wrapper, and passing the results of that same parse into the stream
methods instead of the URL itself. At the very least, this would mean passing
the pre-parsed authority and path sections into stream_open, with the
parsed-out path as the path. This would save developers from making a
duplicate call to parse_url in interpreted code.
/* {{{ php_stream_locate_url_wrapper */
PHPAPI php_stream_wrapper *php_stream_locate_url_wrapper(const char *path, char **path_for_open, int options TSRMLS_DC)
{
    HashTable *wrapper_hash = (FG(stream_wrappers) ? FG(stream_wrappers) : &url_stream_wrappers_hash);
    php_stream_wrapper **wrapper = NULL;
    const char *ptr_scheme_delimiter = NULL;
    const char *scheme = NULL;
    int scheme_delimiter_position = 0;
    if (path_for_open) {
        *path_for_open = (char*)path;
    }
    if (options & IGNORE_URL) {
        return (options & STREAM_LOCATE_WRAPPERS_ONLY) ? NULL : &php_plain_files_wrapper;
    }
    // Loop over path as long as we have valid scheme characters
    // (alphanumerics, '+', '-', '.') until we reach the scheme delimiter.
    for (ptr_scheme_delimiter = path;
         isalnum((int)*ptr_scheme_delimiter) || *ptr_scheme_delimiter == '+' ||
         *ptr_scheme_delimiter == '-' || *ptr_scheme_delimiter == '.';
         ptr_scheme_delimiter++) {
        scheme_delimiter_position++;
    }
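    // The 'data:' special case is dropped here: the existing code only needs
    // it because data URIs have no "//" after the colon. Once "//" is no
    // longer required, any scheme followed by ':' is accepted.
    // NOTE: scheme is heap-allocated and is never efree()d in this draft.
    if ((*ptr_scheme_delimiter == ':') && (scheme_delimiter_position > 1)) {
        scheme = estrndup(path, scheme_delimiter_position);
    }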
    // Convert the legacy zlib wrapper name. This should be removed, or
    // compress.zlib should be registered under both schemes until it is
    // officially removed.
    if (scheme && scheme_delimiter_position == 4 && !strncasecmp(scheme, "zlib", 4)) {
        scheme = "compress.zlib";
        scheme_delimiter_position = 13;
        php_error_docref(NULL TSRMLS_CC, E_WARNING, "Use of \"zlib:\" wrapper is deprecated; please use \"compress.zlib://\" instead");
    }
    // If we matched a scheme...
    if (scheme) {
        // attempt to look up the wrapper in the wrapper hash using the scheme.
        if (FAILURE == zend_hash_find(wrapper_hash, (char*)scheme, scheme_delimiter_position + 1, (void**)&wrapper)) {
            char wrapper_name[32];
            int name_len = scheme_delimiter_position;
            /* clamp a local copy so the later path offsets stay untouched */
            if (name_len >= (int)sizeof(wrapper_name)) {
                name_len = sizeof(wrapper_name) - 1;
            }
            PHP_STRLCPY(wrapper_name, scheme, sizeof(wrapper_name), name_len);
            php_error_docref(NULL TSRMLS_CC, E_WARNING, "Unable to find the wrapper \"%s\" - did you forget to enable it when you configured PHP?", wrapper_name);
            wrapper = NULL;
        }
    }
    // If we didn't find a wrapper yet, we should try defaulting to the file
    // scheme.
    if (!wrapper) {
        // If the options say we support wrappers only, we should exit now
        // before going any further.
        if (options & STREAM_LOCATE_WRAPPERS_ONLY) {
            if (options & REPORT_ERRORS) {
                php_error_docref(NULL TSRMLS_CC, E_WARNING, "Unable to find a suitable stream wrapper for %s.", path);
            }
            return NULL;
        }
        // Let's proceed on the assumption that we're working with files and
        // try to locate the wrapper.
        if (FG(stream_wrappers)) {
            /* Check again, the original check might have not known the protocol name */
            if (!wrapper && zend_hash_find(wrapper_hash, "file", sizeof("file"), (void**)&wrapper) == FAILURE) {
                if (options & REPORT_ERRORS) {
                    php_error_docref(NULL TSRMLS_CC, E_WARNING, "file: wrapper is disabled in the server configuration");
                }
                return NULL;
            }
        }
    }
    // If we have a wrapper and matched a scheme and it wasn't file... let's
    // handle some errors.
    if (wrapper && scheme && strcasecmp(scheme, "file") != 0) {
        if ((*wrapper)->is_url && (options & STREAM_DISABLE_URL_PROTECTION) == 0
            && (!PG(allow_url_fopen)
                || (((options & STREAM_OPEN_FOR_INCLUDE) || PG(in_user_include)) && !PG(allow_url_include))
            )
        ) {
            if (options & REPORT_ERRORS) {
                /* scheme is NUL-terminated, so it can be passed to %s directly */
                if (!PG(allow_url_fopen)) {
                    php_error_docref(NULL TSRMLS_CC, E_WARNING, "%s: wrapper is disabled in the server configuration by allow_url_fopen=0", scheme);
                } else {
                    php_error_docref(NULL TSRMLS_CC, E_WARNING, "%s: wrapper is disabled in the server configuration by allow_url_include=0", scheme);
                }
            }
            return NULL;
        }
        return *wrapper;
    }
    // If we've gotten here we're using file, whether it was defaulted or not.
    // The following code could be simplified if it were internalized in the
    // plain files wrapper. In general, the work of handling the authority and
    // path sections of the URI should be handed off to the stream wrappers.
    // It would be really nice if the stream wrapper methods received a parsed
    // URL in their stream_open methods, saving developers the trouble of
    // parsing out the authority and path in interpreted code.
    if (scheme && strcasecmp(scheme, "file") == 0) {
        int localhost = 0;
        if (!strncasecmp(path, "file://localhost/", 17)) {
            localhost = 1;
        }
        // Validate that we're only using localhost
#ifdef PHP_WIN32
        if (localhost == 0 && path[scheme_delimiter_position+3] != '\0' && path[scheme_delimiter_position+3] != '/' && path[scheme_delimiter_position+4] != ':') {
#else
        if (localhost == 0 && path[scheme_delimiter_position+3] != '\0' && path[scheme_delimiter_position+3] != '/') {
#endif
            if (options & REPORT_ERRORS) {
                php_error_docref(NULL TSRMLS_CC, E_WARNING, "remote host file access not supported, %s", path);
            }
            return NULL;
        }
        // so fix up the paths for files...
        if (path_for_open) {
            /* skip past protocol and :/, but handle windows correctly */
            *path_for_open = (char*)path + scheme_delimiter_position + 1;
            if (localhost == 1) {
                (*path_for_open) += 11;
            }
            while (*(++*path_for_open)=='/');
#ifdef PHP_WIN32
            if (*(*path_for_open + 1) != ':')
#endif
                (*path_for_open)--;
        }
    }
    return &php_plain_files_wrapper;
}
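As a rough, standalone illustration of the scheme-keyed selection this
rewrite is aiming for (it is not PHP's actual hash-table code), the sketch
below picks a wrapper from a small table and falls back to file when no
registered scheme matches. The wrapper struct, the wrappers table and the
find_wrapper and locate helpers are hypothetical names, and the
case-insensitive comparison is an assumption.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strncasecmp (POSIX) */

/* Hypothetical stand-in for a registered stream wrapper. */
struct wrapper {
    const char *scheme;
    int is_url;
};

/* Illustrative table only; real wrappers are registered at runtime. */
static const struct wrapper wrappers[] = {
    { "file", 0 },
    { "http", 1 },
    { "compress.zlib", 0 },
};

/* Returns the entry registered for scheme[0..len), or NULL if none. */
static const struct wrapper *find_wrapper(const char *scheme, size_t len)
{
    size_t i;
    for (i = 0; i < sizeof(wrappers) / sizeof(wrappers[0]); i++) {
        if (strlen(wrappers[i].scheme) == len &&
            strncasecmp(wrappers[i].scheme, scheme, len) == 0) {
            return &wrappers[i];
        }
    }
    return NULL;
}

/* Uses the text before the first ':' as the scheme when it is longer than
 * one character; anything unrecognised falls back to the file entry. */
static const struct wrapper *locate(const char *path)
{
    const char *colon = strchr(path, ':');
    const struct wrapper *w = NULL;
    if (colon != NULL && colon - path > 1) {
        w = find_wrapper(path, (size_t)(colon - path));
    }
    return w != NULL ? w : find_wrapper("file", 4);
}
```

With this table, locate("compress.zlib:archive.gz") selects the
compress.zlib entry even though no "//" follows the colon, while
locate("/etc/hosts") falls back to file.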
Previous Comments:
------------------------------------------------------------------------
[2011-11-12 05:50:46] dopry at rynassociates dot com
I believe the code that needs to be modified is in main/streams/streams.c,
in the function php_stream_locate_url_wrapper:
    // iterate over path while the current character is alphanumeric,
    // +, -, or .
    for (p = path; isalnum((int)*p) || *p == '+' || *p == '-' || *p == '.'; p++) {
        n++;
    }
    // if the current value of p is ':' and n is more than one character,
    // and either the characters following p are "//" or the first five
    // characters are "data:"
    if ((*p == ':') && (n > 1) && (!strncmp("//", p+1, 2) || (n == 4 && !memcmp("data:", path, 5)))) {
        protocol = path;
    } else if (n == 5 && strncasecmp(path, "zlib:", 5) == 0) {
        /* BC with older php scripts and zlib wrapper */
        protocol = "compress.zlib";
        n = 13;
        php_error_docref(NULL TSRMLS_CC, E_WARNING, "Use of \"zlib:\" wrapper is deprecated; please use \"compress.zlib://\" instead");
    }
I believe the solution is to remove the !strncmp("//", p+1, 2) check from
the if statement above, so that // is not a required part of the validation.
The logic that follows for returning the stream wrapper looks like it could
be further optimized by re-ordering and grouping some of the conditionals.
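To make the proposed change concrete, here is a minimal standalone sketch
that puts the quoted condition next to the relaxed one. The helper names
scheme_run, has_scheme_strict and has_scheme_relaxed are hypothetical, and
the sketch only covers scheme detection, not the wrapper lookup that
follows it.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of the leading run of scheme characters (alnum, '+', '-', '.'). */
static size_t scheme_run(const char *path)
{
    size_t n = 0;
    while (isalnum((unsigned char)path[n]) || path[n] == '+' ||
           path[n] == '-' || path[n] == '.') {
        n++;
    }
    return n;
}

/* The condition as quoted above: "scheme://" or the special case "data:". */
static int has_scheme_strict(const char *path)
{
    size_t n = scheme_run(path);
    return path[n] == ':' && n > 1 &&
           (strncmp("//", path + n + 1, 2) == 0 ||
            (n == 4 && memcmp("data:", path, 5) == 0));
}

/* The proposed condition: any "scheme:" counts. The n > 1 test is kept,
 * so a single-letter prefix such as "C:" is still not a scheme. */
static int has_scheme_relaxed(const char *path)
{
    size_t n = scheme_run(path);
    return path[n] == ':' && n > 1;
}
```

Under these definitions "myscheme:path/to/file" is rejected by the strict
check and accepted by the relaxed one, while "http://example.com/" and
"data:text/plain,hi" pass both.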
------------------------------------------------------------------------
[2009-07-27 08:39:56] [email protected]
Reclassified.
------------------------------------------------------------------------
[2009-01-15 16:25:37] darrel dot opry at gmail dot com
A note regarding backwards compatibility: this change shouldn't
interfere with existing user-defined stream wrappers, since they are
currently not called at all when a URL is used without the authority
section.
------------------------------------------------------------------------
[2009-01-15 16:14:16] darrel dot opry at gmail dot com
I did RTFM. As a developer and user of PHP I disagree with your
assertion. The use and parsing of the URL schema for streams is not
the same as that of parse_url which is recommended for parsing
incoming paths.
The documentation specifically uses the wording URL in many places,
right to the point of having a setting for allow_url_fopen.
The issue with the current approach is that it does not recognize
legally delimited schemes properly and pass them to the underlying
stream wrapper. It seems to make the invalid assumption that :// or
// is the scheme delimiter when it is in fact :. It's a minor parsing
issue that could probably be corrected by an experienced C developer
in under 20 minutes.
Why is this an issue you may ask...
parse_url properly parses the scheme, user, password, host, port, and
path.
If I want to use parse_url within my stream wrapper I have to
concatenate the host and path elements if I use a URL in the form of
file://path/to/file.txt or I have to use a url in the form of
file://localhost/path/to/file.txt if I want to avoid this
concatenation.
A secondary impact: if I'm storing URLs to resources in the database,
I now have to store the authority section as well. This is suboptimal
for me as an application developer as the number of resources I'm
storing references to increases.
So maybe you should go RYOFM, before dismissing something out of hand
just because it references an RFC without thinking of the
implications.
Basically I think most developers really want their stream wrapper
called if scheme: is properly designated, so they can write standards
compliant code.
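A small standalone sketch of the generic scheme / authority / path split
shows where the concatenation problem comes from. The url_parts struct and
the split_url helper are hypothetical, and the split is deliberately
simplified: it does not validate scheme characters or separate out user,
password and port the way parse_url does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type: pointers into the original string, no copies. */
struct url_parts {
    const char *scheme;
    size_t scheme_len;
    const char *authority;   /* NULL when there is no "//" section */
    size_t authority_len;
    const char *path;        /* remainder of the string, NUL-terminated */
};

/* Splits "scheme:[//authority]path".
 * Returns 0 on success, -1 when no scheme delimiter is found. */
static int split_url(const char *url, struct url_parts *out)
{
    const char *colon = strchr(url, ':');
    const char *rest;
    if (colon == NULL || colon == url) {
        return -1;
    }
    out->scheme = url;
    out->scheme_len = (size_t)(colon - url);
    rest = colon + 1;
    if (strncmp(rest, "//", 2) == 0) {
        /* The authority runs up to the next '/', '?' or '#'. */
        rest += 2;
        out->authority = rest;
        out->authority_len = strcspn(rest, "/?#");
        out->path = rest + out->authority_len;
    } else {
        out->authority = NULL;
        out->authority_len = 0;
        out->path = rest;
    }
    return 0;
}
```

Splitting "file://path/to/file.txt" this way puts "path" in the authority
slot and leaves "/to/file.txt" as the path, which is why the two have to be
glued back together. "myscheme:path/to/file.txt" has no authority at all,
so its path comes back intact.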
------------------------------------------------------------------------
[2009-01-15 15:26:34] [email protected]
RTFM: http://www.php.net/fopen (look for the explanation of what PHP expects
the filename parameter to look like). You can put as many RFCs here as you
like, but it's never said anywhere that fopen() expects some RFC-style
scheme. :)
------------------------------------------------------------------------
The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
https://bugs.php.net/bug.php?id=47070