Edit report at http://bugs.php.net/bug.php?id=54369&edit=1
ID: 54369
Comment by: tomas dot brastavicius at quantum dot lt
Reported by: tomas dot brastavicius at quantum dot lt
Summary: parse_url() incorrectly determines the start of
query and fragment parts of URL
Status: Open
Type: Bug
Package: URL related
PHP Version: Irrelevant
Block user comment: N
Private report: N
New Comment:
One more comment about this issue:
http://marc.info/?l=php-internals&m=130183094107548&w=2
Previous Comments:
------------------------------------------------------------------------
[2011-04-03 18:09:08] tomas dot brastavicius at quantum dot lt
Another comment about this issue:
http://marc.info/?l=php-internals&m=130183032307080&w=2
@Peter
Yes, according to RFC 1738 the test URLs are not valid. But:
1. Nothing states that parse_url() parses URLs according to RFC 1738.
2. parse_url() "is not meant to validate given URL". See
http://php.net/manual/en/function.parse-url.php
3. Why is it better to return an invalid hostname ("#" and "/" are
invalid characters; current parse_url() version) than an invalid query
or fragment (patched parse_url() version)?
@tokul at users dot sourceforge dot net
Checked
My arguments for accepting the patch are as follows:
1. The "Return Values" section of the parse_url() documentation clearly
states that the query and fragment components start after the "?" and
"#" characters respectively.
2. I don't know of any specification that allows "#" and "?" in
hostnames (does anyone?), but I do know that RFC 3986 (which,
unfortunately, I have to work with) allows the "/" character in both the
query and fragment parts. See
http://tools.ietf.org/html/rfc3986.html#section-3.4 and
http://tools.ietf.org/html/rfc3986.html#section-3.5
3. It has already been stated (although in a different context) that
parse_url() parses URLs according to RFC 3986. See
http://bugs.php.net/bug.php?id=50484. Maybe Adam Harvey knows more?
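To illustrate point 2, here is a minimal sketch of how the split looks
when the regular expression from RFC 3986, Appendix B is applied. The
helper name split_url_rfc3986() is made up for this example and is not
part of PHP or of the attached patch; it only shows that the authority
ends at the first "/", "?" or "#".

```php
<?php
// Hypothetical helper, for illustration only (not part of PHP or the patch).
// Splits a URL with the regular expression from RFC 3986, Appendix B.
// The authority group [^/?#]* stops at the first "/", "?" or "#",
// so neither "?" nor "#" can end up inside the host.
function split_url_rfc3986($url)
{
    $pattern = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';
    preg_match($pattern, $url, $m);

    $parts = array();
    // Groups 1, 3, 6 and 8 include the delimiter, so a non-empty match
    // there means the component is present (even if its value is empty).
    if (isset($m[1]) && $m[1] !== '') { $parts['scheme'] = $m[2]; }
    if (isset($m[3]) && $m[3] !== '') { $parts['authority'] = $m[4]; }
    if (isset($m[5]) && $m[5] !== '') { $parts['path'] = $m[5]; }
    if (isset($m[6]) && $m[6] !== '') { $parts['query'] = $m[7]; }
    if (isset($m[8]) && $m[8] !== '') { $parts['fragment'] = $m[9]; }
    return $parts;
}

var_dump(split_url_rfc3986('http://www.example.com#fra/gment'));
var_dump(split_url_rfc3986('http://?p=1/param'));
```

For the URLs without a host, this sketch reports an empty authority
rather than false; a caller that wants the "should return false"
behaviour would have to reject an empty authority itself.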
------------------------------------------------------------------------
[2011-04-03 14:10:58] tokul at users dot sourceforge dot net
Check the URL encoding documentation first.
http://en.wikipedia.org/wiki/Percent-encoding
Then fix your $url value. You are using a reserved character for another
purpose.
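A minimal sketch of what this suggestion appears to amount to, assuming
the fragment text is under the caller's control: percent-encode it with
rawurlencode() before building the URL, so that "/" travels as "%2F".
The variable names are only illustrative.

```php
<?php
// Percent-encode the fragment so the reserved "/" is sent as "%2F".
$fragment = 'fra/gment';
$url = 'http://www.example.com#' . rawurlencode($fragment);
echo $url . "\n"; // http://www.example.com#fra%2Fgment

// The receiving side can recover the original text with rawurldecode().
echo rawurldecode('fra%2Fgment') . "\n"; // fra/gment
```

This sidesteps the parsing question rather than answering it: the
unencoded URLs in the test script are still split as shown under
"Actual result".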
------------------------------------------------------------------------
[2011-03-24 15:46:33] tomas dot brastavicius at quantum dot lt
Description:
------------
The attached patch fixes the issue.
Test script:
---------------
<?php
$url = 'http://www.example.com#fra/gment';
echo $url . "\n";
var_dump(parse_url($url));
$url = 'http://www.example.com?p=1/param';
echo $url . "\n";
var_dump(parse_url($url));
// No host, should return false
$url = 'http://#fra/gment';
echo $url . "\n";
var_dump(parse_url($url));
// No host, should return false
$url = 'http://?p=1/param';
echo $url . "\n";
var_dump(parse_url($url));
Expected result:
----------------
http://www.example.com#fra/gment
array(3) {
["scheme"]=>
string(4) "http"
["host"]=>
string(15) "www.example.com"
["fragment"]=>
string(9) "fra/gment"
}
http://www.example.com?p=1/param
array(3) {
["scheme"]=>
string(4) "http"
["host"]=>
string(15) "www.example.com"
["query"]=>
string(9) "p=1/param"
}
http://#fra/gment
bool(false)
http://?p=1/param
bool(false)
Actual result:
--------------
http://www.example.com#fra/gment
array(3) {
["scheme"]=>
string(4) "http"
["host"]=>
string(19) "www.example.com#fra"
["path"]=>
string(6) "/gment"
}
http://www.example.com?p=1/param
array(3) {
["scheme"]=>
string(4) "http"
["host"]=>
string(19) "www.example.com?p=1"
["path"]=>
string(6) "/param"
}
http://#fra/gment
array(3) {
["scheme"]=>
string(4) "http"
["host"]=>
string(4) "#fra"
["path"]=>
string(6) "/gment"
}
http://?p=1/param
array(3) {
["scheme"]=>
string(4) "http"
["host"]=>
string(4) "?p=1"
["path"]=>
string(6) "/param"
}
------------------------------------------------------------------------