Edit report at http://bugs.php.net/bug.php?id=54369&edit=1
ID: 54369
User updated by: tomas dot brastavicius at quantum dot lt
Reported by: tomas dot brastavicius at quantum dot lt
-Summary: parse_url() incorrectly determines the start of
query and fragment parts of URL
+Summary: [PATCH] parse_url() incorrectly determines the start
of query and fragment parts
Status: Open
Type: Bug
Package: URL related
PHP Version: Irrelevant
Block user comment: N
Private report: N
New Comment:
Changed report name as described in the bug report spec.
Previous Comments:
------------------------------------------------------------------------
[2011-04-03 19:36:33] tokul at users dot sourceforge dot net
You can't argue that the function is broken and needs fixing if you feed
it broken data and expect good output. Use valid URLs in your tests if you
want to show that the function is broken.
------------------------------------------------------------------------
[2011-04-03 18:36:42] tomas dot brastavicius at quantum dot lt
One more comment about this issue:
http://marc.info/?l=php-internals&m=130183094107548&w=2
------------------------------------------------------------------------
[2011-04-03 18:09:08] tomas dot brastavicius at quantum dot lt
Another comment about this issue:
http://marc.info/?l=php-internals&m=130183032307080&w=2
@Peter
Yes, according to RFC 1738 the test URLs are not valid. But:
1. Nowhere is it defined that parse_url() parses URLs according to RFC
1738.
2. parse_url() "is not meant to validate given URL". See
http://php.net/manual/en/function.parse-url.php
3. Why is it better to return an invalid hostname ("#" and "/" are invalid
characters; current parse_url() version) than an invalid query or
fragment (patched parse_url() version)?
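To make point 3 concrete, here is a minimal sketch of a check a caller could run on the returned host. The helper name host_has_delimiters() is hypothetical and not part of PHP; the host strings are copied from the "Actual result" section of this report.

```php
<?php
// Hypothetical helper, not part of PHP: reports whether a host string
// contains "/", "?" or "#", none of which can appear in a hostname.
function host_has_delimiters($host)
{
    // strpbrk() returns false when none of the listed characters occur.
    return strpbrk($host, '/?#') !== false;
}

// Host values as returned by the current parse_url() for the test URLs.
var_dump(host_has_delimiters('www.example.com#fra'));
var_dump(host_has_delimiters('www.example.com?p=1'));
// A host as returned by the patched version.
var_dump(host_has_delimiters('www.example.com'));
```

The first two calls report a delimiter inside the host, the third does not.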
@tokul at users dot sourceforge dot net
Checked
My arguments for accepting the patch are as follows:
1. The "Return Values" section of the parse_url() documentation clearly
states that the query and fragment components start after the "?" and "#"
characters respectively.
2. I don't know of any specification that allows "#" and "?" in
hostnames (does anyone?), but I know of at least RFC 3986 (which,
unfortunately, I am working with) that allows the "/" character in both
the query and fragment parts. See
http://tools.ietf.org/html/rfc3986.html#section-3.4 and
http://tools.ietf.org/html/rfc3986.html#section-3.5
3. It has already been stated (although in a different context) that
parse_url() parses URLs according to RFC 3986. See
http://bugs.php.net/bug.php?id=50484. Maybe Adam Harvey knows more?
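For comparison, the splitting behaviour argued for above can be sketched with the regular expression given in RFC 3986, Appendix B. This is only an illustration, not the attached patch and not how parse_url() is implemented. The function name split_uri() and the "authority" key are assumptions made for this sketch; the authority may also carry user info and a port, so it is not the same as parse_url()'s "host".

```php
<?php
// Sketch only: split a URI reference with the regular expression from
// RFC 3986, Appendix B. split_uri() is a hypothetical name, not a PHP function.
function split_uri($uri)
{
    // "~" is used as the delimiter because "/" and "#" occur in the pattern.
    $re = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';
    preg_match($re, $uri, $m);
    // preg_match() omits trailing unmatched groups; pad them with ''.
    $m += array_fill(0, 10, '');

    $parts = array();
    if ($m[1] !== '') { $parts['scheme'] = $m[2]; }
    if ($m[3] !== '') { $parts['authority'] = $m[4]; } // may be '' for "http://#x"
    if ($m[5] !== '') { $parts['path'] = $m[5]; }
    if ($m[6] !== '') { $parts['query'] = $m[7]; }     // text after the first "?"
    if ($m[8] !== '') { $parts['fragment'] = $m[9]; }  // text after the first "#"
    return $parts;
}

var_dump(split_uri('http://www.example.com#fra/gment'));
var_dump(split_uri('http://www.example.com?p=1/param'));
```

With this expression the authority ends at the first "/", "?" or "#", so a "/" that follows "?" or "#" stays in the query or fragment. For the two host-less test URLs the authority comes back as an empty string, which a caller could treat as a failure.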
------------------------------------------------------------------------
[2011-04-03 14:10:58] tokul at users dot sourceforge dot net
Check the URL encoding documentation first.
http://en.wikipedia.org/wiki/Percent-encoding
Then fix your $url value. You are using a reserved character for another
purpose.
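A minimal sketch of what this suggestion amounts to, assuming the "/" is meant as data rather than as a path separator. The variable names are only illustrative; rawurlencode() is the standard PHP function for percent-encoding.

```php
<?php
// Percent-encode the values before building the URL, so that the "/"
// inside them cannot be taken for a path separator.
$fragment = rawurlencode('fra/gment'); // "/" becomes "%2F"
$param    = rawurlencode('1/param');

echo 'http://www.example.com#' . $fragment . "\n";
echo 'http://www.example.com?p=' . $param . "\n";
```

This only sidesteps the question for URLs the caller builds; it does not help when the URL comes from elsewhere.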
------------------------------------------------------------------------
[2011-03-24 15:46:33] tomas dot brastavicius at quantum dot lt
Description:
------------
Attached patch fixes the issue.
Test script:
---------------
<?php
$url = 'http://www.example.com#fra/gment';
echo $url . "\n";
var_dump(parse_url($url));
$url = 'http://www.example.com?p=1/param';
echo $url . "\n";
var_dump(parse_url($url));
// No host, should return false
$url = 'http://#fra/gment';
echo $url . "\n";
var_dump(parse_url($url));
// No host, should return false
$url = 'http://?p=1/param';
echo $url . "\n";
var_dump(parse_url($url));
Expected result:
----------------
http://www.example.com#fra/gment
array(3) {
["scheme"]=>
string(4) "http"
["host"]=>
string(15) "www.example.com"
["fragment"]=>
string(9) "fra/gment"
}
http://www.example.com?p=1/param
array(3) {
["scheme"]=>
string(4) "http"
["host"]=>
string(15) "www.example.com"
["query"]=>
string(9) "p=1/param"
}
http://#fra/gment
bool(false)
http://?p=1/param
bool(false)
Actual result:
--------------
http://www.example.com#fra/gment
array(3) {
["scheme"]=>
string(4) "http"
["host"]=>
string(19) "www.example.com#fra"
["path"]=>
string(6) "/gment"
}
http://www.example.com?p=1/param
array(3) {
["scheme"]=>
string(4) "http"
["host"]=>
string(19) "www.example.com?p=1"
["path"]=>
string(6) "/param"
}
http://#fra/gment
array(3) {
["scheme"]=>
string(4) "http"
["host"]=>
string(4) "#fra"
["path"]=>
string(6) "/gment"
}
http://?p=1/param
array(3) {
["scheme"]=>
string(4) "http"
["host"]=>
string(4) "?p=1"
["path"]=>
string(6) "/param"
}
------------------------------------------------------------------------