Edit report at https://bugs.php.net/bug.php?id=47070&edit=1
ID: 47070
Comment by: darrel dot opry at gmail dot com
Reported by: darrel dot opry at gmail dot com
Summary: php_stream_locate_url_wrapper fails without authority section
Status: Open
Type: Feature/Change Request
Package: Streams related
Operating System: Ubuntu 8.10
PHP Version: 5.2.8
Block user comment: N
Private report: N

New Comment:

Here is my non-C developer's take on how this function could be re-implemented to be a little easier to follow. My general take is that the 'file' scheme handling in this function (authority and path parsing) is out of place and should be moved to the actual plain files stream wrapper.

In general I believe stream wrappers, and code utilizing stream wrappers, could be made a little more efficient by fully parsing the URL in this function, possibly with parse_url, using the resulting scheme to select a stream wrapper, and passing the results of that same parse_url call into the stream methods instead of the URL itself. At the very least, the pre-parsed authority and path sections could be passed into stream_open, with the parsed-out path passed in as the path argument. This would save developers from making a duplicate call to parse_url from interpreted code.

/* {{{ php_stream_locate_url_wrapper */
PHPAPI php_stream_wrapper *php_stream_locate_url_wrapper(const char *path, char **path_for_open, int options TSRMLS_DC)
{
    HashTable *wrapper_hash = (FG(stream_wrappers) ? FG(stream_wrappers) : &url_stream_wrappers_hash);
    php_stream_wrapper **wrapper = NULL;
    const char *ptr_scheme_delimiter = NULL;
    const char *scheme = NULL;
    int scheme_delimiter_position = 0;

    if (path_for_open) {
        *path_for_open = (char*)path;
    }

    if (options & IGNORE_URL) {
        return (options & STREAM_LOCATE_WRAPPERS_ONLY) ? NULL : &php_plain_files_wrapper;
    }

    // Loop over path as long as we have valid protocol characters [:alnum:,+,-,.] until we reach the scheme delimiter.
    for (ptr_scheme_delimiter = path; isalnum((int)*ptr_scheme_delimiter) || *ptr_scheme_delimiter == '+' || *ptr_scheme_delimiter == '-' || *ptr_scheme_delimiter == '.'; ptr_scheme_delimiter++) {
        scheme_delimiter_position++;
    }

    // Why is 'data:' being checked for here? With '//' no longer required,
    // any scheme longer than one character followed by ':' matches, 'data:' included.
    if ((*ptr_scheme_delimiter == ':') && (scheme_delimiter_position > 1)) {
        scheme = estrndup(path, scheme_delimiter_position);
    }

    // convert zlib stream wrapper name. This should be removed or compress.zlib should be
    // registered under both schemes until it is officially removed.
    if (scheme && scheme_delimiter_position == 4 && !strncasecmp(scheme, "zlib", 4)) {
        scheme = "compress.zlib";
        scheme_delimiter_position = 13;
        php_error_docref(NULL TSRMLS_CC, E_WARNING, "Use of \"zlib:\" wrapper is deprecated; please use \"compress.zlib://\" instead");
    }

    // if we matched a scheme...
    if (scheme) {
        // attempt to look up the wrapper from the wrapper hash using the scheme.
        if (FAILURE == zend_hash_find(wrapper_hash, (char*)scheme, scheme_delimiter_position + 1, (void**)&wrapper)) {
            char wrapper_name[32];

            if (scheme_delimiter_position >= sizeof(wrapper_name)) {
                scheme_delimiter_position = sizeof(wrapper_name) - 1;
            }
            PHP_STRLCPY(wrapper_name, scheme, sizeof(wrapper_name), scheme_delimiter_position);

            php_error_docref(NULL TSRMLS_CC, E_WARNING, "Unable to find the wrapper \"%s\" - did you forget to enable it when you configured PHP?", wrapper_name);

            wrapper = NULL;
        }
    }

    // If we didn't find a wrapper yet we should try defaulting to the file scheme.
    if (!wrapper) {
        // if the configuration says we support wrappers only, we should exit now before moving further.
        if (options & STREAM_LOCATE_WRAPPERS_ONLY) {
            if (options & REPORT_ERRORS) {
                php_error_docref(NULL TSRMLS_CC, E_WARNING, "Unable to find a suitable stream wrapper for %s.", path);
            }
            return NULL;
        }

        // let's proceed with the assumption we're working with files and try to locate the wrapper.
        if (FG(stream_wrappers)) {
            /* Check again, the original check might have not known the protocol name */
            if (!wrapper && zend_hash_find(wrapper_hash, "file", sizeof("file"), (void**)&wrapper) == FAILURE) {
                if (options & REPORT_ERRORS) {
                    php_error_docref(NULL TSRMLS_CC, E_WARNING, "file: wrapper is disabled in the server configuration");
                }
                return NULL;
            }
        }
    }

    // if we have a wrapper and matched a scheme and it wasn't file... let's handle some errors.
    if (wrapper && scheme && strncasecmp(scheme, "file", 4) != 0) {
        if ((*wrapper)->is_url &&
            (options & STREAM_DISABLE_URL_PROTECTION) == 0 &&
            (!PG(allow_url_fopen) ||
             (((options & STREAM_OPEN_FOR_INCLUDE) || PG(in_user_include)) && !PG(allow_url_include)))) {
            if (options & REPORT_ERRORS) {
                /* work on a NUL-terminated copy of the scheme */
                char *protocol_dup = estrndup(scheme, scheme_delimiter_position);

                if (!PG(allow_url_fopen)) {
                    php_error_docref(NULL TSRMLS_CC, E_WARNING, "%s: wrapper is disabled in the server configuration by allow_url_fopen=0", protocol_dup);
                } else {
                    php_error_docref(NULL TSRMLS_CC, E_WARNING, "%s: wrapper is disabled in the server configuration by allow_url_include=0", protocol_dup);
                }
                efree(protocol_dup);
            }
            return NULL;
        }
        return *wrapper;
    }

    // if we've gotten here we're using file whether it was defaulted or not....
    // the following code could be simplified if internalized to the php plain files wrapper.
    // in general the work of handling the authority and path sections of the URI should be handed off
    // to the stream wrappers. It would be real nice if the stream wrapper methods received a parsed url
    // in their stream_open methods, saving developers the trouble of parsing out authority and
    // path in interpreted code.
    int localhost = 0;

    if (!strncasecmp(path, "file://localhost/", 17)) {
        localhost = 1;
    }

    // Validate that we're only using localhost
#ifdef PHP_WIN32
    if (localhost == 0 && path[scheme_delimiter_position+3] != '\0' && path[scheme_delimiter_position+3] != '/' && path[scheme_delimiter_position+4] != ':') {
#else
    if (localhost == 0 && path[scheme_delimiter_position+3] != '\0' && path[scheme_delimiter_position+3] != '/') {
#endif
        if (options & REPORT_ERRORS) {
            php_error_docref(NULL TSRMLS_CC, E_WARNING, "remote host file access not supported, %s", path);
        }
        return NULL;
    }

    // so fix up the paths for files...
    if (path_for_open) {
        /* skip past protocol and :/, but handle windows correctly */
        *path_for_open = (char*)path + scheme_delimiter_position + 1;
        if (localhost == 1) {
            (*path_for_open) += 11;
        }
        while (*(++*path_for_open)=='/');
#ifdef PHP_WIN32
        if (*(*path_for_open + 1) != ':')
#endif
            (*path_for_open)--;
    }

    return &php_plain_files_wrapper;
}

Previous Comments:
------------------------------------------------------------------------
[2011-11-12 05:50:46] dopry at rynassociates dot com

I believe the code that needs to be modified is in main/streams/streams.c, in the function php_stream_locate_url_wrapper:

    // iterate over path while the current character is alphanumeric,
    // +, -, or .
    for (p = path; isalnum((int)*p) || *p == '+' || *p == '-' || *p == '.'; p++) {
        n++;
    }

    // if the current value of p is : and n is not the first character
    // and either the characters following p are // or the first five
    // characters are data:
    if ((*p == ':') && (n > 1) && (!strncmp("//", p+1, 2) || (n == 4 && !memcmp("data:", path, 5)))) {
        protocol = path;
    } else if (n == 5 && strncasecmp(path, "zlib:", 5) == 0) {
        /* BC with older php scripts and zlib wrapper */
        protocol = "compress.zlib";
        n = 13;
        php_error_docref(NULL TSRMLS_CC, E_WARNING, "Use of \"zlib:\" wrapper is deprecated; please use \"compress.zlib://\" instead");
    }

I believe the solution is to remove !strncmp("//", p+1, 2) from the if statement above so that // is not a required part of the validation. The logic that follows for returning the stream wrapper looks like it can be further optimized by re-ordering/grouping some of the conditionals.

------------------------------------------------------------------------
[2009-07-27 08:39:56] j...@php.net

Reclassified.

------------------------------------------------------------------------
[2009-01-15 16:25:37] darrel dot opry at gmail dot com

A note regarding backwards compatibility: this change shouldn't interfere with existing user-defined stream wrappers, since they are currently not called at all when used without the authority section.

------------------------------------------------------------------------
[2009-01-15 16:14:16] darrel dot opry at gmail dot com

I did RTFM. As a developer and user of PHP I disagree with your assertion. The use and parsing of the URL scheme for streams is not the same as that of parse_url, which is recommended for parsing incoming paths. The documentation specifically uses the wording URL in many places, right to the point of having a setting for allow_url_fopen. The issue with the current approach is that it does not recognize legally delimited schemes properly and pass them to the underlying stream wrapper.
It seems to make the invalid assumption that :// or // is the scheme delimiter when it is in fact :. It's a minor parsing issue that could probably be corrected by an experienced C developer in under 20 minutes.

Why is this an issue, you may ask? parse_url properly parses the scheme, user, password, host, port, and path. If I want to use parse_url within my stream wrapper, I have to concatenate the host and path elements if I use a URL in the form of file://path/to/file.txt, or I have to use a URL in the form of file://localhost/path/to/file.txt if I want to avoid this concatenation. A secondary impact: if I'm storing URLs to resources in the database, I now have to store the authority section as well. This is suboptimal for me as an application developer as the number of resources I'm storing references to increases.

So maybe you should go RYOFM before dismissing something out of hand just because it references an RFC, without thinking of the implications. Basically I think most developers really want their stream wrapper called if scheme: is properly designated, so they can write standards-compliant code.

------------------------------------------------------------------------
[2009-01-15 15:26:34] j...@php.net

RTFM: http://www.php.net/fopen (look for the explanation of what PHP expects the filename parameter to look like... you can put as many RFCs here as you like, but it's never said anywhere that fopen() expects some RFC-style scheme.. :)

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view the rest of the comments, please view the bug report online at https://bugs.php.net/bug.php?id=47070
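
For readers following the thread, the disagreement comes down to what ends the scheme: a lone ':' or the sequence '://'. The block below is a minimal standalone C sketch of that difference, not PHP's actual implementation. The helper name scheme_length and its require_slashes flag are hypothetical, introduced only for illustration, and the data: and zlib: special cases from the quoted source are deliberately left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of PHP): return the length of the scheme at
 * the start of `path`, or 0 if no scheme is recognized.
 *
 *   require_slashes != 0  the scheme must be followed by "://"
 *   require_slashes == 0  a lone ':' is enough to end the scheme
 */
size_t scheme_length(const char *path, int require_slashes)
{
    size_t n = 0;

    assert(path != NULL);

    /* Scheme characters, as in the quoted loop: alphanumerics, '+', '-', '.' */
    while (isalnum((unsigned char)path[n]) || path[n] == '+' ||
           path[n] == '-' || path[n] == '.') {
        n++;
    }

    /*
     * Require ':' after more than one character (the n > 1 test in the
     * quoted check). A single letter such as the "C" in "C:\dir" is
     * therefore never treated as a scheme.
     */
    if (path[n] != ':' || n <= 1) {
        return 0;
    }

    /* Optionally insist on the authority marker "//" right after ':'. */
    if (require_slashes && strncmp(path + n + 1, "//", 2) != 0) {
        return 0;
    }

    return n;
}
```

With require_slashes set, "myscheme:path/to/file.txt" yields 0 (no scheme found), which is the behaviour the report objects to; with it cleared, the same string yields 8, the length of "myscheme".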