Edit report at http://bugs.php.net/bug.php?id=54089&edit=1
ID: 54089
User updated by: nicolas dot grekas+php at gmail dot com
Reported by: nicolas dot grekas+php at gmail dot com
Summary: token_get_all with regards to __halt_compiler is not binary safe
-Status: Open
+Status: Closed
Type: Bug
Package: Unknown/Other Function
Operating System: Any
PHP Version: 5.3.5
Assigned To: iliaa
Block user comment: N
Private report: N
New Comment:
PHP 5.3.6 has been released with the patch...
So token_get_all() now stops at T_HALT_COMPILER for good :)
Previous Comments:
------------------------------------------------------------------------
[2011-03-10 08:26:30] nicolas dot grekas+php at gmail dot com
Really, the current patch is a step backward: I can no longer do things
that were easy before (getting the halt_compiler_offset with
token_get_all)...
Please consider reverting it!
------------------------------------------------------------------------
[2011-03-03 15:44:33] nicolas dot grekas+php at gmail dot com
Sorry to reopen. As 5.3.6 is in RC, I just want to be sure my previous
comment has been read. What about reverting the patch?
------------------------------------------------------------------------
[2011-03-01 10:15:47] nicolas dot grekas+php at gmail dot com
Thanks for the patch. After reading it, I'm not sure it really helps,
since stopping at T_HALT_COMPILER was already easy to do in plain PHP.
In fact, it may make things worse: if I now want to access the data
after T_HALT_COMPILER in PHP using the tokenizer, I have to write more
code, because that data is missing from the token array.
There is also a corner case: __halt_compiler is always followed by 3
valid tokens: "(", ")", then ";" or T_CLOSE_TAG, with any number of
T_WHITESPACE/T_COMMENT/T_DOC_COMMENT tokens in between.
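The token sequence described above can be walked in plain PHP. What
follows is only a minimal sketch, not code from the patch or from the
tokenizer extension: find_halt_compiler_end() and
halt_compiler_data_offset() are hypothetical helper names, and the
offset is only meaningful under the assumption that every byte before
the sequence is covered by a token (no "Unexpected character in input"
earlier in the source).

```php
<?php
// Hypothetical helper (not part of the tokenizer extension): returns the
// index of the token that closes a __halt_compiler sequence, or null.
function find_halt_compiler_end(array $tokens)
{
    $expected = null; // stays null until T_HALT_COMPILER is seen

    foreach ($tokens as $i => $token)
    {
        $type = is_array($token) ? $token[0] : $token;

        if (null === $expected)
        {
            if (T_HALT_COMPILER === $type) $expected = array('(', ')', ';');
            continue;
        }

        // Whitespace and comments may appear anywhere inside the sequence
        if (T_WHITESPACE === $type || T_COMMENT === $type || T_DOC_COMMENT === $type) continue;

        $want = array_shift($expected);
        $ok = $type === $want || (';' === $want && T_CLOSE_TAG === $type);

        if (!$ok) return null;      // malformed sequence
        if (!$expected) return $i;  // "(", ")" and the terminator were all seen
    }

    return null;
}

// Hypothetical helper: byte offset of the data that follows the sequence.
// Assumption: every byte before that point is covered by a token.
function halt_compiler_data_offset($code)
{
    $tokens = @token_get_all($code);
    $end = find_halt_compiler_end($tokens);

    if (null === $end) return null;

    $offset = 0;

    for ($i = 0; $i <= $end; ++$i)
        $offset += strlen(is_array($tokens[$i]) ? $tokens[$i][1] : $tokens[$i]);

    return $offset;
}
```

With these helpers, substr($code, halt_compiler_data_offset($code))
would give the raw data after the terminator, whether or not the token
array itself still contains it.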
My view is that this "bug" can be fixed by introducing a new
T_UNEXPECTED_CHARACTER token type, matching those "Unexpected character
in input" warnings: this would make token_get_all binary safe. Is it a
good idea? I don't know whether it's difficult to implement, or whether
it would introduce a BC break, so maybe a "Won't fix" on this bug is
enough?
Could the patch be reverted? I'm afraid that's the best option for
tokenizer users...
Here is what I was using before the patch to work around this binary
incompatibility:
<?php
// New token type matching an "Unexpected character in input" warning
define('T_UNEXPECTED_CHARACTER', -1);

// $code holds the PHP source to tokenize; @ silences the lexer warnings
$src_tokens = @token_get_all($code);
$bin_tokens = array();
$offset = 0;
$i = -1;

while (isset($src_tokens[++$i]))
{
    // Token text: element 1 of an array token, or the one-character string
    $t = isset($src_tokens[$i][1]) ? $src_tokens[$i][1] : $src_tokens[$i];

    // Bytes skipped by the lexer sit between $offset and this token's start
    while ($t[0] !== $code[$offset])
        $bin_tokens[] = array(T_UNEXPECTED_CHARACTER, $code[$offset++]);

    $offset += strlen($t);
    $bin_tokens[] = $src_tokens[$i];
    unset($src_tokens[$i]);
}

// Here, $bin_tokens contains binary safe tokens
?>
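One way to check a token array for binary safety is to concatenate the
text of every token and compare the result with the original source.
The sketch below assumes two hypothetical helpers, tokens_to_source()
and is_lossless(), which are not part of the tokenizer extension. It
accepts both the arrays returned by token_get_all() and arrays of the
form built by the workaround above.

```php
<?php
// Hypothetical helper: rebuild the source text from a token array.
function tokens_to_source(array $tokens)
{
    $src = '';

    foreach ($tokens as $token)
        $src .= is_array($token) ? $token[1] : $token; // array token or one-character string

    return $src;
}

// Hypothetical helper: true when no byte of $code is missing from $tokens.
function is_lossless($code, array $tokens)
{
    return tokens_to_source($tokens) === $code;
}
```

Note that the workaround loop only restores bytes found in front of a
token, so anything dropped after the last token would still make
is_lossless($code, $bin_tokens) return false.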
------------------------------------------------------------------------
[2011-02-28 16:18:35] [email protected]
This bug has been fixed in SVN.
Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
Thank you for the report, and for helping us make PHP better.
------------------------------------------------------------------------
[2011-02-28 16:18:28] [email protected]
Automatic comment from SVN on behalf of iliaa
Revision: http://svn.php.net/viewvc/?view=revision&revision=308761
Log: Fixed bug #54089 (token_get_all() does not stop after
__halt_compiler).
------------------------------------------------------------------------
The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/bug.php?id=54089