[PHP-DEV] Re: Reverting ext/mbstring patch
Hey,

I think I can fix it somehow. Please don't be hasty with it. I am going to look into it.

Moriyoshi

On Tue, Mar 1, 2011 at 11:35 PM, Dmitry Stogov <dmi...@zend.com> wrote:
> Hi,
>
> I'm going to revert Moriyoshi's patch from December and some of the fixes that followed. I like the idea of the patch, but it just doesn't work as expected. It breaks 10 tests, but in general it breaks most things related to Unicode (declare statement, multibyte scripts, exif support for Unicode, multibyte POST requests).
>
> I tried to fix it myself, but I just can't understand how it should work (it's too big). It also has several places where integers are mixed up with pointers, the old API is mixed with the new one, and so on.
>
> I'm going to revert it (apply the attached patch) on Thursday.
>
> Following is the list of failed tests:
>
> Shift_JIS request [tests/basic/029.phpt]
> Testing declare statement with several type values [Zend/tests/declare_001.phpt]
> Zend Multibyte and ShiftJIS [Zend/tests/multibyte/multibyte_encoding_001.phpt]
> Zend Multibyte and UTF-8 BOM [Zend/tests/multibyte/multibyte_encoding_002.phpt]
> Zend Multibyte and UTF-16 BOM [Zend/tests/multibyte/multibyte_encoding_003.phpt]
> encoding conversion from script encoding into internal encoding [Zend/tests/multibyte/multibyte_encoding_005.phpt]
> 086: bracketed namespace with encoding [Zend/tests/ns_086.phpt]
> Check for exif_read_data, Unicode user comment [ext/exif/tests/exif003.phpt]
> Check for exif_read_data, Unicode WinXP tags [ext/exif/tests/exif004.phpt]
> Test mb_get_info() function [ext/mbstring/tests/mb_get_info.phpt]
>
> Thanks. Dmitry.

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
[PHP-DEV] RFC: built-in web server in CLI.
Hi,

Just to let you know that I wrote an RFC about a built-in web server feature with which PHP can serve content without the help of a web server. That would be handy for development purposes. If interested, have a look at http://wiki.php.net/rfc/builtinwebserver .

Regards,
Moriyoshi
Re: [PHP-DEV] RFC: built-in web server in CLI.
On 3/2/11 12:55 PM, Moriyoshi Koizumi wrote:
> Hi,
>
> Just to let you know that I wrote an RFC about a built-in web server feature with which PHP can serve content without the help of a web server. That would be handy for development purposes. If interested, have a look at http://wiki.php.net/rfc/builtinwebserver .

I like it. We need to go through it very carefully and look for security-related issues, though. Make sure all memory handling is safe.

-Rasmus
Re: [PHP-DEV] RFC: built-in web server in CLI.
On Wed, Mar 2, 2011 at 9:59 PM, Rasmus Lerdorf <ras...@lerdorf.com> wrote:
> On 3/2/11 12:55 PM, Moriyoshi Koizumi wrote:
>> Hi,
>>
>> Just to let you know that I wrote an RFC about a built-in web server feature with which PHP can serve content without the help of a web server. That would be handy for development purposes. If interested, have a look at http://wiki.php.net/rfc/builtinwebserver .
>
> I like it. We need to go through it very carefully and look for security-related issues, though. Make sure all memory handling is safe.

Same here, very handy. I would not worry too much about security-related issues, as such a built-in server should really be used for development purposes only (yes, users do bad things even if we tell them not to :).

--
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
Re: [PHP-DEV] RFC: built-in web server in CLI.
Moriyoshi Koizumi wrote:
> Hi,
>
> Just to let you know that I wrote an RFC about a built-in web server feature with which PHP can serve content without the help of a web server. That would be handy for development purposes. If interested, have a look at http://wiki.php.net/rfc/builtinwebserver .
>
> Regards,
> Moriyoshi

I like the idea.

Regarding the patch (https://gist.github.com/835698):

I don't see a switch to disable the internal parse on configure.

I'd expect the files to be in their own folder inside sapi, while still being able to bundle them into a single binary.

Why is this needed on Windows?
+ ADD_FLAG(LIBS_CLI, ws2_32.lib);
Surely PHP will already link with the sockets library for its own functions.

The HTTP parser code seems to be copied from https://github.com/ry/http-parser, and it may not be a good idea to modify it downstream, but it seems to do more than is strictly needed by PHP (e.g. it handles more methods than a PHP server would make use of). It also seems to be a hand-coded lexer, which is much more verbose than a set of rules.

The patch looks messy, as it splits main() into two functions, so it gets hard to follow, but it is probably good overall.

The change from php_printf to printf in line 3988 looks wrong.

Any special reason to disable it on PHP_CLI_WIN32_NO_CONSOLE?
Re: [PHP-DEV] Volnitsky substring search algo
Stas Malyshev wrote:
> Hi!
>
> http://volnitsky.com/project/str_search/
>
> I'm not sure it'd be easy to integrate this into the PHP codebase as-is, given that it relies on C++ standard libraries which PHP makes no use of (and thus potentially introduces a world of dependencies and complexities into the build process). I'm sure it can be re-done in pure standard C; then it can be tested in PHP, and if it's better, I don't see why it can't be integrated.

Not really. It only uses std::search, which would be equivalent to the current zend_memnstr(). And std::numeric_limits can be replaced with a limits.h macro.

I'd be more concerned about the "only for little-endian platforms and where access to misaligned W is allowed" remark. PHP is also available for big-endian architectures, but that seems easy to handle. Some architectures supported by PHP probably won't accept misaligned access, though, so it would also need a configure test to disable it.
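Both portability concerns can probably be handled in plain C. The sketch below assumes a 16-bit W (the width is an assumption here; a later message in the thread uses W_size = 4). Copying the bytes with memcpy() avoids misaligned loads on strict-alignment CPUs, and assembling the value byte by byte yields the same W, and therefore the same hash slot, on little- and big-endian hosts. All helper names are hypothetical; they come from neither the Volnitsky code nor the PHP source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load a 16-bit W from a possibly misaligned address. memcpy() does not
 * fault on strict-alignment CPUs, but the value still follows the host
 * byte order, so hash slots would differ between architectures. */
static uint16_t load_w16_native(const unsigned char *p)
{
    uint16_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Load a 16-bit W as little-endian on every host: no alignment
 * requirement, and the same value on big- and little-endian CPUs. */
static uint16_t load_w16_le(const unsigned char *p)
{
    return (uint16_t)(p[0] | ((unsigned)p[1] << 8));
}

/* Run-time byte-order probe (a configure-time test could do the same). */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Map a W to one of table_size hash slots. UINT16_MAX from <stdint.h>
 * stands in for std::numeric_limits<uint16_t>::max(). */
static size_t w_slot(const unsigned char *p, size_t table_size)
{
    assert(table_size > 0 && table_size <= (size_t)UINT16_MAX + 1);
    return load_w16_le(p) % table_size;
}
```

Whether the byte-by-byte load is as fast as a native unaligned load on x86 would need benchmarking; compilers often merge such loads into one instruction, but that is not guaranteed.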
Re: [PHP-DEV] Volnitsky substring search algo
On Sun, Feb 27, 2011 at 12:50 PM, Jordi Boggiano <j.boggi...@seld.be> wrote:
> http://volnitsky.com/project/str_search/

The algorithm seems flawed to me, at least in its reference implementation. There does not seem to be any guarantee that the returned position is really the *first* occurrence of the needle in the haystack.

It's easy to see with a needle that is a repetition of the same character:

SS = 'a' * 1000
W_size = 4

The hash map build will be clogged, with the value 1 stored in the first slot for SS. As a consequence, the algorithm will step through the haystack, trying to confirm a match for the needle at the step position. If a match is found there, it will discard any previous matches that could be valid at this position.

All of these haystacks return the same position, 998:

S = 'b' * 998 + 'a' * 1000 (correct)
S = 'b' * 997 + 'a' * 1001 (incorrect, should return 997)
S = 'b' * 996 + 'a' * 1002 (incorrect, should return 996)
S = 'b' * 995 + 'a' * 1003 (incorrect, should return 995)

The implementation could be fixed (by adding an explicit string match when building the hash table, and by storing *all* the occurrences of a given W in SS), but that would increase the overall cost (both computing and memory) of the algorithm.

Damien Tournoud
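Whatever shortcut a fast search takes, its result has to agree with a plain left-to-right scan. A minimal sketch of such a scan in C, usable as a test oracle when checking a Volnitsky-style implementation; first_occurrence() and make_haystack() are hypothetical helpers, not PHP functions (zend_memnstr() is the real one mentioned in the thread). For a haystack of k 'b' characters followed by at least 1000 'a' characters, the correct answer for a needle of 1000 'a' characters is k.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Naive O(n*m) search: index of the FIRST occurrence of needle in
 * haystack, or -1 if there is none. Slow but obviously correct. */
static long first_occurrence(const char *hay, size_t hay_len,
                             const char *needle, size_t needle_len)
{
    assert(needle_len > 0);
    if (needle_len > hay_len)
        return -1;
    for (size_t i = 0; i + needle_len <= hay_len; i++) {
        if (memcmp(hay + i, needle, needle_len) == 0)
            return (long)i;
    }
    return -1;
}

/* Build `bs` 'b' characters followed by `as` 'a' characters
 * (not NUL-terminated; the caller frees the buffer). */
static char *make_haystack(size_t bs, size_t as)
{
    char *s = malloc(bs + as);
    if (s == NULL)
        return NULL;
    memset(s, 'b', bs);
    memset(s + bs, 'a', as);
    return s;
}
```

A fuzz loop comparing a candidate implementation against first_occurrence() on random inputs over a small alphabet would likely catch the repeated-character case described above.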
Re: [PHP-DEV] RFC: built-in web server in CLI.
2011/3/3 Ángel González <keis...@gmail.com>:
> Moriyoshi Koizumi wrote:
>> Hi,
>>
>> Just to let you know that I wrote an RFC about a built-in web server feature with which PHP can serve content without the help of a web server. That would be handy for development purposes. If interested, have a look at http://wiki.php.net/rfc/builtinwebserver .
>>
>> Regards,
>> Moriyoshi
>
> I like the idea.
>
> Regarding the patch (https://gist.github.com/835698):
>
> I don't see a switch to disable the internal parse on configure.

I don't see any obvious reason it should be possible to turn it off through a build option. The only problem is the increase in binary size, which I guess is quite small.

> I'd expect the files to be in their own folder inside sapi, while still being able to bundle them into a single binary.
>
> Why is this needed on Windows?
> + ADD_FLAG(LIBS_CLI, ws2_32.lib);
> Surely PHP will already link with the sockets library for its own functions.

Of course: the objects that go directly into php.exe depend on WinSock functions. The other socket-related code lives inside php5.dll (php5ts.dll), whose imported symbols cannot be referred to from php.exe, unlike with ELF shared objects.

> The HTTP parser code seems to be copied from https://github.com/ry/http-parser, and it may not be a good idea to modify it downstream, but it seems to do more than is strictly needed by PHP (e.g. it handles more methods than a PHP server would make use of). It also seems to be a hand-coded lexer, which is much more verbose than a set of rules.

Do we really have to look into the parser right now? I don't think we have to limit the methods the server can accept, since there is no reason to limit them as long as the server can deal with them, and I don't find it a problem for it to be hand-coded either.

> The patch looks messy, as it splits main() into two functions, so it gets hard to follow, but it is probably good overall.

Assuming you are referring to the option-parsing portion of the code: yes, it's a bit messy, but I had to do it that way because the runtime initialization procedure is very different from the ordinary CLI.

> The change from php_printf to printf in line 3988 looks wrong.

php_printf() eventually redirects the output to sapi_module.ub_write(), which should only be available after proper SAPI initialization. The changed part can be reached before that initialization, and it makes absolutely no sense to use php_printf() when you simply want to print a message to the console before the script starts.

> Any special reason to disable it on PHP_CLI_WIN32_NO_CONSOLE?

The cli-win32 version of PHP doesn't have an associated console and is meant to be used for applications without console interaction (i.e. GUI applications). So it doesn't make sense to enable this feature for it.

Regards,
Moriyoshi
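The ordering problem with php_printf() can be illustrated with a simplified model. In this sketch all names are hypothetical (demo_sapi is not PHP's real sapi_module_struct): the write callback only exists after startup, so a message printed earlier has to go through plain stdio. The helper picks the path at run time; the patch simply calls printf() at that point because that code is known to run before initialization.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for a SAPI module: ub_write stays NULL until
 * startup has completed. */
typedef size_t (*ub_write_fn)(const char *str, size_t len);

struct demo_sapi {
    ub_write_fn ub_write;
};

static struct demo_sapi demo_sapi_module = { NULL };

static void demo_sapi_startup(ub_write_fn writer)
{
    assert(writer != NULL);
    demo_sapi_module.ub_write = writer;
}

/* Format a message and route it through the SAPI writer if one is
 * installed, otherwise straight to stdout (printf-style behaviour). */
static int demo_message(const char *fmt, ...)
{
    char buf[512];
    va_list ap;
    int n;
    size_t len;

    va_start(ap, fmt);
    n = vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);
    if (n < 0)
        return n;
    len = (size_t)n < sizeof buf ? (size_t)n : sizeof buf - 1;

    if (demo_sapi_module.ub_write == NULL)
        return (int)fwrite(buf, 1, len, stdout); /* too early for the SAPI */
    return (int)demo_sapi_module.ub_write(buf, len);
}

/* Example SAPI writer that records output in memory. */
static char captured[256];
static size_t captured_len;

static size_t capture_write(const char *str, size_t len)
{
    size_t room = sizeof captured - 1 - captured_len;
    if (len > room)
        len = room;
    memcpy(captured + captured_len, str, len);
    captured_len += len;
    captured[captured_len] = '\0';
    return len;
}
```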
Re: [PHP-DEV] RFC: built-in web server in CLI.
On Wed, Mar 2, 2011 at 11:55 PM, Moriyoshi Koizumi <m...@mozo.jp> wrote:
> Hi,
>
> Just to let you know that I wrote an RFC about a built-in web server feature with which PHP can serve content without the help of a web server. That would be handy for development purposes. If interested, have a look at http://wiki.php.net/rfc/builtinwebserver .

Interesting, indeed.

I noticed that you hardcode mimetypes and index_files.

Mimetypes can probably be obtained from the system (we even had some extension doing that). And index_files should be configurable, because there are some situations when people don't want any index files at all.

Also, it would be good to be able to configure which files are actually parsed by PHP, not just served. Currently, these are only .php files.

--
Alexey Zakhlestin, http://twitter.com/jimi_dini
http://www.milkfarmsoft.com/
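Taking MIME types from the system could look roughly like this. A sketch, assuming the table is the text of a mime.types-style file (one "type/subtype ext1 ext2 ..." entry per line, '#' starting a comment, as in /etc/mime.types on many Unix systems) already read into memory; mime_lookup() is a hypothetical helper, not part of the patch or of any PHP extension.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Find the MIME type for `ext` (without the dot) in mime.types-style
 * text. On success copies the type into out (NUL-terminated) and
 * returns 1; returns 0 if the extension is not listed or the type
 * does not fit into out. */
static int mime_lookup(const char *table, const char *ext,
                       char *out, size_t out_size)
{
    const char *line = table;
    size_t ext_len = strlen(ext);

    assert(out_size > 0);
    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        const char *end = eol != NULL ? eol : line + strlen(line);
        const char *p = line;
        const char *type = NULL;
        size_t type_len = 0;

        for (;;) {
            while (p < end && isspace((unsigned char)*p))
                p++;
            if (p >= end || *p == '#')
                break;                        /* end of line or comment */
            const char *tok = p;
            while (p < end && !isspace((unsigned char)*p) && *p != '#')
                p++;
            size_t tok_len = (size_t)(p - tok);

            if (type == NULL) {
                type = tok;                   /* first field: the MIME type */
                type_len = tok_len;
            } else if (tok_len == ext_len && memcmp(tok, ext, ext_len) == 0) {
                if (type_len >= out_size)
                    return 0;                 /* caller's buffer too small */
                memcpy(out, type, type_len);
                out[type_len] = '\0';
                return 1;
            }
        }
        line = eol != NULL ? eol + 1 : end;
    }
    return 0;
}
```

A server could load such a file once at startup and keep a small built-in table only as the fallback when no file is present.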
[PHP-DEV] Re: Reverting ext/mbstring patch
Hi Moriyoshi,

OK, I thought the email was lost, so ignore the email I just resent.

In general I like your patch and I would be glad to see it fixed. I already tried to make some fixes; see the attached patch.

Thanks. Dmitry.

On 03/02/2011 11:51 PM, Moriyoshi Koizumi wrote:
> Hey,
>
> I think I can fix it somehow. Please don't be hasty with it. I am going to look into it.
>
> Moriyoshi
>
> On Tue, Mar 1, 2011 at 11:35 PM, Dmitry Stogov <dmi...@zend.com> wrote:
>> Hi,
>>
>> I'm going to revert Moriyoshi's patch from December and some of the fixes that followed. I like the idea of the patch, but it just doesn't work as expected. It breaks 10 tests, but in general it breaks most things related to Unicode (declare statement, multibyte scripts, exif support for Unicode, multibyte POST requests).
>>
>> I tried to fix it myself, but I just can't understand how it should work (it's too big). It also has several places where integers are mixed up with pointers, the old API is mixed with the new one, and so on.
>>
>> I'm going to revert it (apply the attached patch) on Thursday.
>>
>> Following is the list of failed tests:
>>
>> Shift_JIS request [tests/basic/029.phpt]
>> Testing declare statement with several type values [Zend/tests/declare_001.phpt]
>> Zend Multibyte and ShiftJIS [Zend/tests/multibyte/multibyte_encoding_001.phpt]
>> Zend Multibyte and UTF-8 BOM [Zend/tests/multibyte/multibyte_encoding_002.phpt]
>> Zend Multibyte and UTF-16 BOM [Zend/tests/multibyte/multibyte_encoding_003.phpt]
>> encoding conversion from script encoding into internal encoding [Zend/tests/multibyte/multibyte_encoding_005.phpt]
>> 086: bracketed namespace with encoding [Zend/tests/ns_086.phpt]
>> Check for exif_read_data, Unicode user comment [ext/exif/tests/exif003.phpt]
>> Check for exif_read_data, Unicode WinXP tags [ext/exif/tests/exif004.phpt]
>> Test mb_get_info() function [ext/mbstring/tests/mb_get_info.phpt]
>>
>> Thanks. Dmitry.
Index: ext/exif/exif.c
===================================================================
--- ext/exif/exif.c	(revision 308813)
+++ ext/exif/exif.c	(working copy)
@@ -2664,13 +2664,13 @@
                 decode = ImageInfo->decode_unicode_le;
             }
             if (zend_multibyte_encoding_converter(
-                    pszInfoPtr,
+                    (unsigned char**)pszInfoPtr,
                     &len,
-                    szValuePtr,
+                    (unsigned char*)szValuePtr,
                     ByteCount,
-                    ImageInfo->encode_unicode,
-                    decode
-                    TSRMLS_CC) != 0) {
+                    zend_multibyte_fetch_encoding(ImageInfo->encode_unicode TSRMLS_CC),
+                    zend_multibyte_fetch_encoding(decode TSRMLS_CC)
+                    TSRMLS_CC) < 0) {
                 len = exif_process_string_raw(pszInfoPtr, szValuePtr, ByteCount);
             }
             return len;
@@ -2684,13 +2684,13 @@
             szValuePtr = szValuePtr+8;
             ByteCount -= 8;
             if (zend_multibyte_encoding_converter(
-                    pszInfoPtr,
+                    (unsigned char**)pszInfoPtr,
                     &len,
-                    szValuePtr,
+                    (unsigned char*)szValuePtr,
                     ByteCount,
-                    ImageInfo->encode_jis,
-                    ImageInfo->motorola_intel ? ImageInfo->decode_jis_be : ImageInfo->decode_jis_le
-                    TSRMLS_CC) != 0) {
+                    zend_multibyte_fetch_encoding(ImageInfo->encode_jis TSRMLS_CC),
+                    zend_multibyte_fetch_encoding(ImageInfo->motorola_intel ? ImageInfo->decode_jis_be : ImageInfo->decode_jis_le TSRMLS_CC)
+                    TSRMLS_CC) < 0) {
                 len = exif_process_string_raw(pszInfoPtr, szValuePtr, ByteCount);
             }
             return len;
@@ -2723,13 +2723,13 @@
 
     /* Copy the comment */
     if (zend_multibyte_encoding_converter(
-            &xp_field->value,
+            (unsigned char**)&xp_field->value,
             &xp_field->size,
-            szValuePtr,
+            (unsigned char*)szValuePtr,
             ByteCount,
-            ImageInfo->encode_unicode,
-            ImageInfo->motorola_intel ? ImageInfo->decode_unicode_be : ImageInfo->decode_unicode_le
-            TSRMLS_CC) != 0) {
+            zend_multibyte_fetch_encoding(ImageInfo->encode_unicode TSRMLS_CC),
+            zend_multibyte_fetch_encoding(ImageInfo->motorola_intel ? ImageInfo->decode_unicode_be : ImageInfo->decode_unicode_le TSRMLS_CC)
+            TSRMLS_CC) < 0) {
         xp_field->size = exif_process_string_raw(&xp_field->value, szValuePtr, ByteCount);
     }
     return xp_field->size;
Index: ext/mbstring/tests/mb_encoding_aliases.phpt
===================================================================
--- ext/mbstring/tests/mb_encoding_aliases.phpt	(revision 308813)
+++ ext/mbstring/tests/mb_encoding_aliases.phpt	(working copy)
@@ -13,26 +13,28 @@
 ?>
 --EXPECTF--
 Warning: mb_encoding_aliases() expects exactly 1 parameter, 0 given in %s on line 2
-array(10) {
+array(11) {
   [0]=>
   string(14) "ANSI_X3.4-1968"
   [1]=>
   string(14) "ANSI_X3.4-1986"
   [2]=>
+  string(7) "IBM-367"
+  [3]=>
   string(6) "IBM367"
-  [3]=>
+  [4]=>
   string(9) "ISO646-US"
-  [4]=>
+  [5]=>
   string(16) "ISO_646.irv:1991"
-  [5]=>
+  [6]=>
   string(8) "US-ASCII"
-  [6]=>
+  [7]=>
   string(5) "cp367"
-  [7]=>
+  [8]=>
   string(7) "csASCII"
-  [8]=>
+  [9]=>
   string(8) "iso-ir-6"
-  [9]=>
+  [10]=>
   string(2) "us"
 }
 array(0) {
Index: ext/mbstring/mbstring.c
===================================================================
--- ext/mbstring/mbstring.c	(revision 308813)
+++ ext/mbstring/mbstring.c	(working copy)
@@ -2910,7