Edit report at https://bugs.php.net/bug.php?id=54369&edit=1

ID:                 54369
Comment by:         woody dot gilk at gmail dot com
Reported by:        tomas dot brastavicius at quantum dot lt
Summary:            [PATCH] parse_url() incorrectly determines the start of
                    query and fragment parts
Status:             Open
Type:               Bug
Package:            URL related
PHP Version:        Irrelevant
Block user comment: N
Private report:     N

New Comment:

According to the RFC, the URL http://www.example.com?foo=bar is a
completely valid URL. To quote:

> For example, the URI <mailto:f...@example.com> has a path of
> "f...@example.com", whereas the URI <foo://info.example.com?fred> has an
> empty path.

There is nothing in the RFC spec that says a path must be included in
the URL.

Please fix this bug.


Previous Comments:
------------------------------------------------------------------------
[2011-06-29 21:37:39] lenzai2004-dev at yahoo dot com

The point is not whether the patch is relevant or not.
For this bug and other cases, parse_url() is returning a corrupt result.

It could be fixed in 2 ways:
- patch it
- or detect the invalid URL and return an error.

I've been trying to use this function, and after a significant volume of
URLs I always find cases where it returns incorrect data.
I had to rewrite everything in PHP and it's quite slow.

------------------------------------------------------------------------
[2011-05-17 20:12:50] tomas dot brastavicius at quantum dot lt

Changed the report name as described in the bug report spec.

------------------------------------------------------------------------
[2011-04-03 19:36:33] tokul at users dot sourceforge dot net

You can't argue that the function is broken and needs fixes if you feed
it broken data and expect good output. Use valid URLs in your tests if
you want to show that the function is broken.

------------------------------------------------------------------------
[2011-04-03 18:36:42] tomas dot brastavicius at quantum dot lt

One more comment about this issue:
http://marc.info/?l=php-internals&m=130183094107548&w=2

------------------------------------------------------------------------
[2011-04-03 18:09:08] tomas dot brastavicius at quantum dot lt

Another comment about this issue:
http://marc.info/?l=php-internals&m=130183032307080&w=2

@Peter

Yes, according to RFC 1738 the test URLs are not valid. But:

1. It is not defined that parse_url() parses URLs according to RFC 1738.
2. parse_url() "is not meant to validate given URL". See
   http://php.net/manual/en/function.parse-url.php
3. Why is it better to return an invalid hostname ("#" and "/" are
   invalid characters, current parse_url() version) instead of an
   invalid query or fragment (patched parse_url() version)?

@tokul at users dot sourceforge dot net

Checked.

My arguments for accepting the patch are as follows:

1. The "Return Values" section of the parse_url() documentation clearly
   states that the query and fragment components start after the "?"
   and "#" characters respectively.
2. I don't know of any specification that allows "#" and "?" in
   hostnames (does anyone know of one?), but I know of at least RFC3986
   (unfortunately I am working with) that allows the "/" character in
   both the query and fragment parts. See
   http://tools.ietf.org/html/rfc3986.html#section-3.4 and
   http://tools.ietf.org/html/rfc3986.html#section-3.5
3. It has been already stated (although different content) that
   parse_url() parses URLs according to RFC3986. See
   http://bugs.php.net/bug.php?id=50484. Maybe Adam Harvey knows more?

------------------------------------------------------------------------


The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

    https://bugs.php.net/bug.php?id=54369
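
For readers who want to check the RFC 3986 argument made in the comments above, here is a minimal sketch. It is written in Python, not PHP, and it is not how parse_url() is implemented; it only applies the generic splitting regular expression given in RFC 3986, Appendix B. The function name split_uri is hypothetical, and the third URL is an illustrative assumption (it is not one of the test URLs from the bug report).

```python
import re

# Generic URI-splitting regular expression from RFC 3986, Appendix B.
# Group 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(uri):
    """Split a URI reference into the five RFC 3986 components.

    Hypothetical helper for illustration only. Components that are
    absent come back as None; an empty path comes back as "".
    """
    m = URI_RE.match(uri)  # every sub-pattern is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


if __name__ == "__main__":
    for uri in (
        "http://www.example.com?foo=bar",   # URL from the new comment
        "foo://info.example.com?fred",      # URI quoted from the RFC
        "http://example.com?a/b#c/d",       # assumed example: "/" in query and fragment
    ):
        print(uri, "->", split_uri(uri))
```

Under this regular expression the authority stops at the first "/", "?" or "#", so all three URIs yield an empty path, and in the third the "/" characters stay inside the query ("a/b") and the fragment ("c/d") rather than ending up in the host.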