[PHP-DEV] Re: Reverting ext/mbstring patch

2011-03-02 Thread Moriyoshi Koizumi
Hey,

I think I can fix it somehow.  Please don't be hasty with it.  I am
going to look into it.

Moriyoshi

On Tue, Mar 1, 2011 at 11:35 PM, Dmitry Stogov dmi...@zend.com wrote:
 Hi,

 I'm going to revert Moriyoshi's patch from December and some of the follow-up fixes.

 I like the idea of the patch, but it just doesn't work as expected.
 It breaks 10 tests, but in general it breaks most things related to Unicode
 (declare statement, multibyte scripts, exif support for Unicode, multibyte
 POST requests).

 I tried to fix it myself, but I just can't understand how it should work
 (it's too big). It also has several places where integers are mixed up with
 pointers, the old API is mixed with the new one, and so on.

 I'm going to revert (apply the attached patch) on Thursday.

 Following is the list of failed tests:

 Shift_JIS request [tests/basic/029.phpt]
 Testing declare statement with several type values
 [Zend/tests/declare_001.phpt]
 Zend Multibyte and ShiftJIS
 [Zend/tests/multibyte/multibyte_encoding_001.phpt]
 Zend Multibyte and UTF-8 BOM
 [Zend/tests/multibyte/multibyte_encoding_002.phpt]
 Zend Multibyte and UTF-16 BOM
 [Zend/tests/multibyte/multibyte_encoding_003.phpt]
 encoding conversion from script encoding into internal encoding
 [Zend/tests/multibyte/multibyte_encoding_005.phpt]
 086: bracketed namespace with encoding [Zend/tests/ns_086.phpt]
 Check for exif_read_data, Unicode user comment [ext/exif/tests/exif003.phpt]
 Check for exif_read_data, Unicode WinXP tags [ext/exif/tests/exif004.phpt]
 Test mb_get_info() function [ext/mbstring/tests/mb_get_info.phpt]

 Thanks. Dmitry.


-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



[PHP-DEV] RFC: built-in web server in CLI.

2011-03-02 Thread Moriyoshi Koizumi
Hi,

Just to let you know that I wrote an RFC about a built-in web server
feature with which PHP can serve content without the help of a web
server.  That would be handy for development purposes.

If interested, have a look at http://wiki.php.net/rfc/builtinwebserver .

Regards,
Moriyoshi




Re: [PHP-DEV] RFC: built-in web server in CLI.

2011-03-02 Thread Rasmus Lerdorf
On 3/2/11 12:55 PM, Moriyoshi Koizumi wrote:
 Hi,
 
 Just to let you know that I wrote a RFC about built-in web server
 feature with which PHP can serve contents without a help of web
 servers.  That would be handy for development purpose.
 
 If interested, have a look at http://wiki.php.net/rfc/builtinwebserver .

I like it. Need to go through it very carefully and look for
security-related issues though. Make sure all memory handling is safe.

-Rasmus




Re: [PHP-DEV] RFC: built-in web server in CLI.

2011-03-02 Thread Pierre Joye
On Wed, Mar 2, 2011 at 9:59 PM, Rasmus Lerdorf ras...@lerdorf.com wrote:
 On 3/2/11 12:55 PM, Moriyoshi Koizumi wrote:
 Hi,

 Just to let you know that I wrote a RFC about built-in web server
 feature with which PHP can serve contents without a help of web
 servers.  That would be handy for development purpose.

 If interested, have a look at http://wiki.php.net/rfc/builtinwebserver .

 I like it. Need to go through it very carefully and look for
 security-related issues though. Make sure all memory handling is safe.

Same here, very handy. I would not worry too much about security-related
issues, as such a built-in server should really be used for
development purposes only (yes, users do bad things even if we tell
them not to :).

-- 
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org




Re: [PHP-DEV] RFC: built-in web server in CLI.

2011-03-02 Thread Ángel González
Moriyoshi Koizumi wrote:
 Hi,

 Just to let you know that I wrote a RFC about built-in web server
 feature with which PHP can serve contents without a help of web
 servers.  That would be handy for development purpose.

 If interested, have a look at http://wiki.php.net/rfc/builtinwebserver .

 Regards,
 Moriyoshi
I like the idea.

Regarding the patch (https://gist.github.com/835698):
I don't see a switch to disable the internal parse on configure.

I'd expect the files to be in their own folder inside sapi, even being
able to bundle them in a single binary.

Why is this needed on Windows?

+ ADD_FLAG(LIBS_CLI, ws2_32.lib);

Surely php will already link with the sockets library for its own functions.

The http parser code seems copied from https://github.com/ry/http-parser and
it may not be a good idea to modify it downstream, but it seems to do more
things than strictly needed by PHP (e.g. there are more methods than those a
PHP server would make use of).
It also seems to be a hand-coded lexer, so it is much more verbose than a
set of rules.

The patch looks messy as it splits main in two functions, so it gets
hard to follow, but is probably good overall.

The change from php_printf to printf in line 3988 looks wrong.

Any special reason to disable it on PHP_CLI_WIN32_NO_CONSOLE ?





Re: [PHP-DEV] Volnitsky substring search algo

2011-03-02 Thread Ángel González
Stas Malyshev wrote:
 Hi!

 http://volnitsky.com/project/str_search/

 I'm not sure it'd be easy to integrate this into PHP codebase as-is,
 provided it relies on C++ standard libraries which PHP makes no use of
 (and thus potentially introduces a world of dependencies and
 complexities into the build process). I'm sure it can be re-done in
 pure standard C, then it can be tested in PHP and if it's better - I
 don't see why it can't be integrated.

Not really. It only uses std::search, which would be equivalent to the
current zend_memnstr(). And std::numeric_limits can be replaced with a
limits.h macro.
I'd be more concerned about the "only for little-endian platforms and
where access to misaligned W is allowed" remark. PHP is also available
for big-endian architectures, but that seems easy to handle. Some
architectures supported by PHP probably won't accept misaligned access,
so it would also need a configure test to disable it.
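
Both concerns, byte order and misaligned access, can be sidestepped by
assembling the word W from individual bytes instead of reading it through a
cast pointer. The following is a minimal Python sketch of that idea, assuming
a 2-byte W; the name read_w is hypothetical and nothing here is taken from the
reference implementation. In C, the analogous approach would combine the bytes
with shifts in the same way.

```python
def read_w(buf: bytes, offset: int) -> int:
    """Build a 2-byte word from buf[offset:offset + 2] in a fixed byte order.

    Reading byte by byte never performs a multi-byte load at an odd address,
    and the explicit shift fixes the byte order regardless of the host CPU.
    """
    return buf[offset] | (buf[offset + 1] << 8)


data = b"abcd"
# Every offset is allowed, including odd ones.
words = [read_w(data, i) for i in range(len(data) - 1)]
print([hex(w) for w in words])
```

The cost is a couple of extra byte loads per word, so whether this keeps the
speed advantage of the algorithm would have to be measured.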






Re: [PHP-DEV] Volnitsky substring search algo

2011-03-02 Thread Damien Tournoud
On Sun, Feb 27, 2011 at 12:50 PM, Jordi Boggiano j.boggi...@seld.be wrote:

 http://volnitsky.com/project/str_search/


The algorithm seems flawed to me, at least in its reference implementation.
There does not seem to be any guarantee that the returned position is really
the *first* occurrence of the needle in the haystack.

It's easy to see with needle being a repetition of the same character:
  SS = 'a' * 1000
  W_size = 4

The hash map build will be clogged, with the value 1 stored in the first
slot for SS. As a consequence, the algorithm will step through the haystack,
trying to confirm a match for the needle at the step position. If a match is
found there, it will discard any previous matches that could be valid at
this position. All of the following haystacks return the same position, 998:
  S = 'b' * 997 + 'a' * 1000 (correct)
  S = 'b' * 996 + 'a' * 1001 (incorrect, should return 997)
  S = 'b' * 995 + 'a' * 1002 (incorrect, should return 996)
  S = 'b' * 994 + 'a' * 1003 (incorrect, should return 995)

The implementation could be fixed (by adding an explicit string matching
when building the hash table, and by storing *all* the occurrences of a
given W in SS), but that will increase the overall cost (both computing and
memory) of the algorithm.
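
The expected positions in the examples above can be cross-checked against a
plain first-occurrence search. This is a minimal Python sketch and not part of
the reference implementation: it assumes 1-based positions (as the figures
above imply), a filler character 'b' for the haystack prefix, and prefix
lengths of 997, 996, 995 and 994; the helper name first_occurrence is
hypothetical.

```python
def first_occurrence(haystack: str, needle: str) -> int:
    """Return the 1-based position of the first match, or 0 if there is none."""
    # str.find returns a 0-based index, or -1 when the needle is absent.
    return haystack.find(needle) + 1


needle = "a" * 1000  # SS: a repetition of the same character

# Haystacks of total length 1997: a filler prefix followed by a run of 'a'.
for prefix_len in (997, 996, 995, 994):
    haystack = "b" * prefix_len + "a" * (1997 - prefix_len)
    print(prefix_len, first_occurrence(haystack, needle))
```

A correct substring search has to agree with this oracle on every row, so an
implementation that reports 998 for all four haystacks fails the last three.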

Damien Tournoud


Re: [PHP-DEV] RFC: built-in web server in CLI.

2011-03-02 Thread Moriyoshi Koizumi
2011/3/3 Ángel González keis...@gmail.com:
 Moriyoshi Koizumi wrote:
 Hi,

 Just to let you know that I wrote a RFC about built-in web server
 feature with which PHP can serve contents without a help of web
 servers.  That would be handy for development purpose.

 If interested, have a look at http://wiki.php.net/rfc/builtinwebserver .

 Regards,
 Moriyoshi
 I like the idea.

 Regarding the patch (https://gist.github.com/835698):
 I don't see a switch to disable the internal parse on configure.

I don't see any obvious reason why it should be possible to turn it off
through a build option.  The only problem is the increase in binary size,
which I guess is quite small.

 I'd expect the files to be on its own folder inside sapi, even being
 able to
 bundle them in a single binary.

 Why is this needed on WIndows?

 + ADD_FLAG(LIBS_CLI, ws2_32.lib);

 Surely php will already link with the sockets library for its own functions.

The objects that are linked directly into php.exe depend on WinSock
functions. The other socket-related code is inside php5.dll
(php5ts.dll), whose imported symbols cannot be referenced from php.exe,
unlike with ELF shared objects.

 The http parser code seems copied from https://github.com/ry/http-parser and
 it may not be a good idea to modify it downstream, but it  seems to do more
 things than strictly needed by php (eg. there are more methods than those a
 php server would take use).
 It also seems to be a hand-coded lexer, so that's much more verbose than a
 set of rules.

Do we really have to look into the parser right now?  I don't think we
have to limit the methods that the server can accept, since there is no
reason to restrict methods that the server can deal with.  I don't find
it a problem for it to be hand-coded either.

 The patch looks messy as it splits main in two functions, so it gets
 hard to follow,
 but is probably good overall.

Assuming you are referring to the option-parsing portion of the
code, yes, it's a bit messy, but I had to do so because the runtime
initialization procedure is very different from that of the ordinary CLI.

 The change from php_printf to printf in line 3988 looks wrong.

php_printf() eventually redirects the output to
sapi_module.ub_write(), which should only be available after proper
SAPI initialization.  The changed part can be reached before the
initialization and it absolutely makes no sense to use php_printf()
when you simply want to print a message text before the script starts
in the console.

 Any special reason to disable it on PHP_CLI_WIN32_NO_CONSOLE ?

The cli-win32 version of PHP doesn't have an associated console and is
meant for creating applications without console interaction
(i.e. GUI applications).  So it doesn't make sense to enable this feature for it.

Regards,
Moriyoshi




Re: [PHP-DEV] RFC: built-in web server in CLI.

2011-03-02 Thread Alexey Zakhlestin
On Wed, Mar 2, 2011 at 11:55 PM, Moriyoshi Koizumi m...@mozo.jp wrote:
 Hi,

 Just to let you know that I wrote a RFC about built-in web server
 feature with which PHP can serve contents without a help of web
 servers.  That would be handy for development purpose.

 If interested, have a look at http://wiki.php.net/rfc/builtinwebserver .

Interesting, indeed.

I noticed that you hardcode mimetypes and index_files.
Mimetypes can probably be obtained from the system — we even had some
extension doing that.
And index_files should be configurable, because there are some
situations when people don't want any mime-types at all.
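
As an illustration of looking MIME types up in a system-wide table instead of
a hardcoded list, here is a minimal Python sketch. Python's standard mimetypes
module consults files such as /etc/mime.types where they exist and otherwise
falls back to its built-in table. The fallback type and the helper name
content_type_for are assumptions, and this is not how the patch or any PHP
extension does it.

```python
import mimetypes

DEFAULT_TYPE = "application/octet-stream"  # assumed fallback for unknown files


def content_type_for(path: str) -> str:
    """Guess a Content-Type from the file extension."""
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or DEFAULT_TYPE


for name in ("index.html", "logo.png", "data.unknownext"):
    print(name, content_type_for(name))
```

Keeping the fallback in one named constant makes it a natural candidate for a
configuration setting, in the same spirit as making index_files configurable.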

Also, it would be good to be able to configure which files are
actually parsed by PHP, not just served. Currently, these are only
.php files.

-- 
Alexey Zakhlestin, http://twitter.com/jimi_dini
http://www.milkfarmsoft.com/




[PHP-DEV] Re: Reverting ext/mbstring patch

2011-03-02 Thread Dmitry Stogov

Hi Moriyoshi,

OK, I thought the email was lost, so ignore the email I just resent.

In general I like your patch and I would be glad to see it fixed.

I already tried to make some fixes.
See the attached patch.

Thanks. Dmitry.

On 03/02/2011 11:51 PM, Moriyoshi Koizumi wrote:

Hey,

I think I can fix it somehow.  Please don't be haste with it.  I am
going to look into it.

Moriyoshi

On Tue, Mar 1, 2011 at 11:35 PM, Dmitry Stogov dmi...@zend.com wrote:

Hi,

I'm going to revert Moriyoshi patch from December and some following fixes.

I like the idea of the patch, but it just doesn't work as expected.
It breaks 10 tests, but in general it breaks most things related to Unicode
(declare statement, multibyte scripts, exif support for Unicode, multibyte
POST requests).

I tried to fix it myself, but I just can't understand how it should work
(it's too big). It also has several places where integers messed with
pointers, old API messed with new one and so on.

I'm going to revert (apply the attached patch) on Thursday.

Following is the list of failed tests:

Shift_JIS request [tests/basic/029.phpt]
Testing declare statement with several type values
[Zend/tests/declare_001.phpt]
Zend Multibyte and ShiftJIS
[Zend/tests/multibyte/multibyte_encoding_001.phpt]
Zend Multibyte and UTF-8 BOM
[Zend/tests/multibyte/multibyte_encoding_002.phpt]
Zend Multibyte and UTF-16 BOM
[Zend/tests/multibyte/multibyte_encoding_003.phpt]
encoding conversion from script encoding into internal encoding
[Zend/tests/multibyte/multibyte_encoding_005.phpt]
086: bracketed namespace with encoding [Zend/tests/ns_086.phpt]
Check for exif_read_data, Unicode user comment [ext/exif/tests/exif003.phpt]
Check for exif_read_data, Unicode WinXP tags [ext/exif/tests/exif004.phpt]
Test mb_get_info() function [ext/mbstring/tests/mb_get_info.phpt]

Thanks. Dmitry.



Index: ext/exif/exif.c
===================================================================
--- ext/exif/exif.c	(revision 308813)
+++ ext/exif/exif.c	(working copy)
@@ -2664,13 +2664,13 @@
 decode = ImageInfo->decode_unicode_le;
 			}
 			if (zend_multibyte_encoding_converter(
-	pszInfoPtr, 
+	(unsigned char**)pszInfoPtr, 
 	len, 
-	szValuePtr,
+	(unsigned char*)szValuePtr,
 	ByteCount,
-	ImageInfo->encode_unicode,
-	decode
-	TSRMLS_CC) != 0) {
+	zend_multibyte_fetch_encoding(ImageInfo->encode_unicode TSRMLS_CC),
+	zend_multibyte_fetch_encoding(decode TSRMLS_CC)
+	TSRMLS_CC)  0) {
 len = exif_process_string_raw(pszInfoPtr, szValuePtr, ByteCount);
 			}
 			return len;
@@ -2684,13 +2684,13 @@
 			szValuePtr = szValuePtr+8;
 			ByteCount -= 8;
 			if (zend_multibyte_encoding_converter(
-	pszInfoPtr, 
+	(unsigned char**)pszInfoPtr, 
 	len, 
-	szValuePtr,
+	(unsigned char*)szValuePtr,
 	ByteCount,
-	ImageInfo->encode_jis,
-	ImageInfo->motorola_intel ? ImageInfo->decode_jis_be : ImageInfo->decode_jis_le
-	TSRMLS_CC) != 0) {
+	zend_multibyte_fetch_encoding(ImageInfo->encode_jis TSRMLS_CC),
+	zend_multibyte_fetch_encoding(ImageInfo->motorola_intel ? ImageInfo->decode_jis_be : ImageInfo->decode_jis_le TSRMLS_CC)
+	TSRMLS_CC)  0) {
 len = exif_process_string_raw(pszInfoPtr, szValuePtr, ByteCount);
 			}
 			return len;
@@ -2723,13 +2723,13 @@
 
 	/* Copy the comment */
 	if (zend_multibyte_encoding_converter(
-			xp_field->value, 
+			(unsigned char**)xp_field->value, 
 			xp_field->size, 
-			szValuePtr,
+			(unsigned char*)szValuePtr,
 			ByteCount,
-			ImageInfo->encode_unicode,
-			ImageInfo->motorola_intel ? ImageInfo->decode_unicode_be : ImageInfo->decode_unicode_le
-			TSRMLS_CC) != 0) {
+			zend_multibyte_fetch_encoding(ImageInfo->encode_unicode TSRMLS_CC),
+			zend_multibyte_fetch_encoding(ImageInfo->motorola_intel ? ImageInfo->decode_unicode_be : ImageInfo->decode_unicode_le TSRMLS_CC)
+			TSRMLS_CC)  0) {
 		xp_field->size = exif_process_string_raw(xp_field->value, szValuePtr, ByteCount);
 	}
 	return xp_field->size;
Index: ext/mbstring/tests/mb_encoding_aliases.phpt
===================================================================
--- ext/mbstring/tests/mb_encoding_aliases.phpt	(revision 308813)
+++ ext/mbstring/tests/mb_encoding_aliases.phpt	(working copy)
@@ -13,26 +13,28 @@
 ?>
 --EXPECTF--
 Warning: mb_encoding_aliases() expects exactly 1 parameter, 0 given in %s on line 2
-array(10) {
+array(11) {
   [0]=>
   string(14) "ANSI_X3.4-1968"
   [1]=>
   string(14) "ANSI_X3.4-1986"
   [2]=>
+  string(7) "IBM-367"
+  [3]=>
   string(6) "IBM367"
-  [3]=>
+  [4]=>
   string(9) "ISO646-US"
-  [4]=>
+  [5]=>
   string(16) "ISO_646.irv:1991"
-  [5]=>
+  [6]=>
   string(8) "US-ASCII"
-  [6]=>
+  [7]=>
   string(5) "cp367"
-  [7]=>
+  [8]=>
   string(7) "csASCII"
-  [8]=>
+  [9]=>
   string(8) "iso-ir-6"
-  [9]=>
+  [10]=>
   string(2) "us"
 }
 array(0) {
Index: ext/mbstring/mbstring.c
===================================================================
--- ext/mbstring/mbstring.c	(revision 308813)
+++ ext/mbstring/mbstring.c	(working copy)
@@ -2910,7