Edit report at https://bugs.php.net/bug.php?id=54089&edit=1
ID: 54089 Updated by: s...@php.net Reported by: nicolas dot grekas+php at gmail dot com Summary: token_get_all with regards to __halt_compiler is not binary safe -Status: Assigned +Status: Closed Type: Bug Package: Unknown/Other Function Operating System: Any PHP Version: 5.3.5 Assigned To: iliaa Block user comment: N Private report: N New Comment: This bug has been fixed in SVN. Snapshots of the sources are packaged every three hours; this change will be in the next snapshot. You can grab the snapshot at http://snaps.php.net/. For Windows: http://windows.php.net/snapshots/ Thank you for the report, and for helping us make PHP better. Previous Comments: ------------------------------------------------------------------------ [2011-09-15 14:32:46] ni...@php.net The following patch has been added/updated: Patch Name: tokenizer_patch_full.txt Revision: 1316097166 URL: https://bugs.php.net/patch-display.php?bug=54089&patch=tokenizer_patch_full.txt&revision=1316097166 ------------------------------------------------------------------------ [2011-09-15 14:27:27] ni...@php.net The following patch has been added/updated: Patch Name: tokenizer_patch.txt Revision: 1316096847 URL: https://bugs.php.net/patch-display.php?bug=54089&patch=tokenizer_patch.txt&revision=1316096847 ------------------------------------------------------------------------ [2011-09-13 17:11:54] ni...@php.net The following patch has been added/updated: Patch Name: tokenizer_patch_full.txt Revision: 1315933914 URL: https://bugs.php.net/patch-display.php?bug=54089&patch=tokenizer_patch_full.txt&revision=1315933914 ------------------------------------------------------------------------ [2011-09-13 15:50:49] ni...@php.net The following patch has been added/updated: Patch Name: tokenizer_patch.txt Revision: 1315929049 URL: https://bugs.php.net/patch-display.php?bug=54089&patch=tokenizer_patch.txt&revision=1315929049 ------------------------------------------------------------------------ [2011-09-13 07:49:54] nicolas dot grekas+php at gmail dot com Excerpt from internals: Nikita Popov Fri, Sep 9, 2011 at 09:15 [...] The problem with the patch is, that there are some tokens after T_HALT_COMPILER that are of interest, namely the '(' ')' ';'. After the patch it is impossible to get those tokens, without either relexing the code after T_HALT_COMPILER (that way you get the binary data problem back, just with much more complex code) or writing a regular expression to match it (which is really hard, as there may be any token dropped by the PHP parser in there, i.e. whitespace, comments, PHP tags). [...] I would like this patch to be reverted on the 5.4 and trunk branches. I assume it's too late to revert it on 5.3, as it has been there for some time already. It is just counterproductive. (Alternatively one could fix token_get_all to return the (); tokens after __halt_compiler, too, but that would be hard, probably.) -- Ferenc Kovacs Fri, Sep 9, 2011 at 10:01 I think that it wouldn't be too hard. >From a quick glance on the code, we should introduce a new local variable, set that to true where we break now ( http://svn.php.net/viewvc/php/php-src/trunk/ext/tokenizer/tokenizer.c?view=markup#l155 ) and don't break there but for the next ';'. another maybe less confusing solution would be to explicitly add '(', ')' and ';' to the result in the T_HALT_COMPILER condition before breking out of the loop. I will create a patch for this afternoon. or could there be other important tokens after the __halt_compiler() which should be present in the token_get_all() result? -- Nicolas Grekas Fri, Sep 9, 2011 at 10:46 > don't break there but for the next ';'. You can also just count the number of semantic token after T_HALT_COMPILER (ie excluding whitespace and comments) and once you hit 3, halt. > less confusing solution would be to explicitly add '(', ')' and ';' to the > result in the T_HALT_COMPILER condition before breking out of the loop. If you mean verifying that '(', ')' and (';' or T_CLOSE_TAG) are effectively following T_HALT_COMPILER, I think that's part of the syntax analyser's job, not tokenizer's. If you're ok with this argument, then just couting 3 tokens is really the most basic "syntax analysis" we have to do to fix the pb, don't you think? > could there be other important tokens after the __halt_compiler() > which should be present in the token_get_all() result? Maybe the binary data itself, as a big T_INLINE_HTML for example ? Also, if token_get_all you be made binary safe, that would be very cool ! (no more eating of \x00-\x1F inside regular code) :) -- Nikita Popov Fri, Sep 9, 2011 at 19:39 > You can also just count the number of semantic token after T_HALT_COMPILER > (ie excluding whitespace and comments) and once you hit 3, halt. > [...] > Maybe the binary data itself, as a big T_INLINE_HTML for example ? In favor of both proposals! Returning the next 3 tokens should be quite easy [1]. Returning the rest as an T_INLINE_HTML makes sense too, as extracting the data is probably what you want. Though I have no idea how to implement that ^^ [1]: https://github.com/nikic/php-src/commit/2d4cfa05947f04de447635ca5748b3b58defbfaf (Not tested, only guessing) -- Nikita Popov Tue, Sep 13, 2011 at 09:16 I just set up an PHP environment and wrote a proper patch (including test changes) to make it collect the next three tokens. It's a git patch and I'm not sure whether it's compatible with SVN patches. I would love it if this would go into 5.4 before beta. I didn't know how one could fetch the rest into T_INLINE_HTML, so I'm hoping on help here from someone who actually knowns C there :) ------------------------------------------------------------------------ The remainder of the comments for this report are too long. To view the rest of the comments, please view the bug report online at https://bugs.php.net/bug.php?id=54089 -- Edit this bug report at https://bugs.php.net/bug.php?id=54089&edit=1