RE: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c
Hi Rasmus,

Will your patch support strings with special characters ('<', '>', '&')?

Thanks. Dmitry.

-----Original Message-----
From: Rasmus Lerdorf [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 15, 2006 10:04 PM
To: php-cvs@lists.php.net
Subject: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c

rasmus          Thu Jun 15 18:03:31 2006 UTC

  Modified files:
    /php-src/ext/soap   php_encoding.c
  Log:
  I don't think the call to xmlNodeSetContentLen() is needed here and it is
  causing performance problems because it tries to parse the blob and create
  a subtree. Because we are escaping the string anyway, we are never going
  to get a subtree, but the entity parsing that is done by
  xmlNodeSetContentLen() is killing performance on large blobs of text. On
  one recent example it took a couple of minutes to parse whereas if we just
  create a text node like this and set the contents to the raw string it is
  down to milliseconds. As far as I can tell all the tests pass with this
  patch.

http://cvs.php.net/viewcvs.cgi/php-src/ext/soap/php_encoding.c?r1=1.127&r2=1.128&diff_format=u
Index: php-src/ext/soap/php_encoding.c
diff -u php-src/ext/soap/php_encoding.c:1.127 php-src/ext/soap/php_encoding.c:1.128
--- php-src/ext/soap/php_encoding.c:1.127       Fri May 26 09:04:53 2006
+++ php-src/ext/soap/php_encoding.c     Thu Jun 15 18:03:30 2006
@@ -17,7 +17,7 @@
   | Dmitry Stogov [EMAIL PROTECTED] |
   +----------------------------------------------------------------------+
 */
-/* $Id: php_encoding.c,v 1.127 2006/05/26 09:04:53 dmitry Exp $ */
+/* $Id: php_encoding.c,v 1.128 2006/06/15 18:03:30 rasmus Exp $ */
 #include <time.h>
@@ -728,7 +728,7 @@
 static xmlNodePtr to_xml_string(encodeTypePtr type, zval *data, int style, xmlNodePtr parent)
 {
-    xmlNodePtr ret;
+    xmlNodePtr ret, text;
     char *str;
     int new_len;
     TSRMLS_FETCH();
@@ -738,13 +738,15 @@
     FIND_ZVAL_NULL(data, ret, style);
     if (Z_TYPE_P(data) == IS_STRING) {
-        str = php_escape_html_entities(Z_STRVAL_P(data), Z_STRLEN_P(data), &new_len, 0, 0, NULL TSRMLS_CC);
+        str = estrndup(Z_STRVAL_P(data), Z_STRLEN_P(data));
+        new_len = Z_STRLEN_P(data);
     } else {
         zval tmp = *data;
         zval_copy_ctor(&tmp);
         convert_to_string(&tmp);
-        str = php_escape_html_entities(Z_STRVAL(tmp), Z_STRLEN(tmp), &new_len, 0, 0, NULL TSRMLS_CC);
+        str = estrndup(Z_STRVAL(tmp), Z_STRLEN(tmp));
+        new_len = Z_STRLEN(tmp);
         zval_dtor(&tmp);
     }
@@ -766,7 +768,8 @@
         soap_error1(E_ERROR, "Encoding: string '%s' is not a valid utf-8 string", str);
     }
-    xmlNodeSetContentLen(ret, str, new_len);
+    text = xmlNewTextLen(str, new_len);
+    xmlAddChild(ret, text);
     efree(str);
     if (style == SOAP_ENCODED) {

--
PHP CVS Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c
Yes, those chars are exactly what was causing the performance problem, actually. xmlNewTextLen() will call the internal libxml entity encoder, but it won't try to allocate each entity for use by the subtree. It was this entity allocation code in xmlNodeSetContentLen() that was slowing everything down. Because we were calling php_escape_html_entities() on the blob before passing it in, it wouldn't create any sub-nodes anyway, so the whole thing was a bit redundant, at least if I am understanding this correctly.

-Rasmus

Dmitry Stogov wrote:
> Hi Rasmus,
> Will your patch support strings with special characters ('<', '>', '&')?
> Thanks. Dmitry.
Re: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c
Just to clarify that: a node created with xmlNewTextLen() will contain the raw text, and those 3 characters are escaped during serialization. xmlNodeSetContentLen(), on the other hand, since it was being called from the scope of an element, was building a subtree of text and entity nodes. With large amounts of data (containing many entities mixed with text) this is extremely slow and memory intensive due to the number of nodes that have to be created and then ultimately freed. This also slowed down serialization due to having to traverse so many nodes.

Rob

Rasmus Lerdorf wrote:
> Yes, those chars are exactly what was causing the performance problem,
> actually. xmlNewTextLen() will call the internal libxml entity encoder,
> but it won't try to allocate each entity for use by the subtree. It was
> this entity allocation code in xmlNodeSetContentLen() that was slowing
> everything down. Because we were calling php_escape_html_entities() on
> the blob before passing it in, it wouldn't create any sub-nodes anyway,
> so the whole thing was a bit redundant, at least if I am understanding
> this correctly.
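As an illustration of the point that a text node can keep the raw bytes and have the three special characters replaced only when the document is written out, here is a minimal standalone sketch in plain C. It is not libxml2's serializer and not code from ext/soap; the name escape_xml_text and its malloc-based interface are assumptions made for the example. The work is a linear scan over one buffer, with no per-entity node allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a libxml2 or PHP function).
 * Returns a newly malloc'ed, NUL-terminated copy of the first len bytes
 * of src with '&', '<' and '>' replaced by XML entities.
 * The caller frees the result. Returns NULL if allocation fails. */
static char *escape_xml_text(const char *src, size_t len)
{
    size_t i, out_len = 0;
    char *out, *p;

    assert(src != NULL);

    /* First scan: compute the exact size of the escaped text. */
    for (i = 0; i < len; i++) {
        switch (src[i]) {
        case '&': out_len += 5; break;            /* &amp; */
        case '<':
        case '>': out_len += 4; break;            /* &lt; or &gt; */
        default:  out_len += 1; break;
        }
    }

    out = malloc(out_len + 1);
    if (out == NULL) {
        return NULL;
    }

    /* Second scan: copy bytes, substituting entities as we go. */
    p = out;
    for (i = 0; i < len; i++) {
        switch (src[i]) {
        case '&': memcpy(p, "&amp;", 5); p += 5; break;
        case '<': memcpy(p, "&lt;", 4);  p += 4; break;
        case '>': memcpy(p, "&gt;", 4);  p += 4; break;
        default:  *p++ = src[i];         break;
        }
    }
    *p = '\0';
    return out;
}
```

Calling escape_xml_text("a<b>&c", 6) yields a buffer holding a&lt;b&gt;&amp;c. The cost grows with the length of the input only, which is the behaviour the thread attributes to the text-node approach, in contrast to building one node per entity.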
RE: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c
Hi George,

It seems your patch is wrong. It breaks ext/soap/tests/schema/schema047.phpt and ext/soap/tests/schema/schema049.phpt. I reverted the patch. Please provide a test case: what is not working for you?

Thanks. Dmitry.

-----Original Message-----
From: George Schlossnagle [mailto:[EMAIL PROTECTED]
Sent: Friday, October 07, 2005 2:30 AM
To: php-cvs@lists.php.net
Subject: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c

gschlossnagle           Thu Oct 6 18:30:11 2005 EDT

  Modified files:
    /php-src/ext/soap   php_encoding.c
  Log:
  support complex types in restrictions and extensions

http://cvs.php.net/diff.php/php-src/ext/soap/php_encoding.c?r1=1.107&r2=1.108&ty=u
Index: php-src/ext/soap/php_encoding.c
diff -u php-src/ext/soap/php_encoding.c:1.107 php-src/ext/soap/php_encoding.c:1.108
--- php-src/ext/soap/php_encoding.c:1.107       Thu Sep 29 06:00:59 2005
+++ php-src/ext/soap/php_encoding.c     Thu Oct 6 18:30:08 2005
@@ -17,7 +17,7 @@
   | Dmitry Stogov [EMAIL PROTECTED] |
   +----------------------------------------------------------------------+
 */
-/* $Id: php_encoding.c,v 1.107 2005/09/29 10:00:59 dmitry Exp $ */
+/* $Id: php_encoding.c,v 1.108 2005/10/06 22:30:08 gschlossnagle Exp $ */
 #include <time.h>
@@ -319,6 +319,10 @@
             node = encode->to_xml_after(&encode->details, node, style);
         }
     }
+    if(!node) {
+        node = xmlNewNode(NULL,"BOGUS");
+        xmlAddChild(parent, node);
+    }
     return node;
 }
@@ -1536,6 +1540,7 @@
         enc = sdlType->encode;
         while (enc && enc->details.sdl_type &&
+               enc->details.sdl_type->kind != XSD_TYPEKIND_COMPLEX &&
                enc->details.sdl_type->kind != XSD_TYPEKIND_SIMPLE &&
                enc->details.sdl_type->kind != XSD_TYPEKIND_LIST &&
                enc->details.sdl_type->kind != XSD_TYPEKIND_UNION) {
@@ -1545,11 +1550,8 @@
             zval *tmp = get_zval_property(data, "_" TSRMLS_CC);
             if (tmp) {
                 xmlParam = master_to_xml(enc, tmp, style, parent);
-            } else if (prop == NULL) {
-                xmlParam = master_to_xml(enc, data, style, parent);
             } else {
-                xmlParam = xmlNewNode(NULL,"BOGUS");
-                xmlAddChild(parent, xmlParam);
+                xmlParam = master_to_xml(enc, data, style, parent);
             }
         } else {
             xmlParam = xmlNewNode(NULL,"BOGUS");
@@ -1558,6 +1560,7 @@
     } else if (sdlType->kind == XSD_TYPEKIND_EXTENSION &&
                sdlType->encode && type != &sdlType->encode->details) {
         if (sdlType->encode->details.sdl_type &&
+            sdlType->encode->details.sdl_type->kind != XSD_TYPEKIND_COMPLEX &&
             sdlType->encode->details.sdl_type->kind != XSD_TYPEKIND_SIMPLE &&
             sdlType->encode->details.sdl_type->kind != XSD_TYPEKIND_LIST &&
             sdlType->encode->details.sdl_type->kind != XSD_TYPEKIND_UNION) {
@@ -1567,12 +1570,9 @@
             if (tmp) {
                 xmlParam = master_to_xml(sdlType->encode, tmp, style, parent);
-            } else if (prop == NULL) {
-                xmlParam = master_to_xml(sdlType->encode, data, style, parent);
             } else {
-                xmlParam = xmlNewNode(NULL,"BOGUS");
-                xmlAddChild(parent, xmlParam);
-            }
+                xmlParam = master_to_xml(sdlType->encode, data, style, parent);
+            }
             }
         } else {
             xmlParam = xmlNewNode(NULL,"BOGUS");
Re: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c
Test 49 looks totally incorrect to me. You've derived the object by restriction in accordance with section 2.5.2.1 of the Schema Part 2 spec:

  [Definition:] A datatype is said to be ·derived· by restriction from
  another datatype when values for zero or more ·constraining facet·s are
  specified that serve to constrain its ·value space· and/or its lexical
  space to a subset of the base type.

Check http://www.w3.org/TR/xmlschema-2/#derivation-by-restriction and http://www.w3.org/TR/xmlschema-1/#Complex_Type_Definition_details for more details.

But you aren't constraining the type to be a subset of the base type; in fact you ignore the base type completely and just implement your own parameter there, which seems to violate the way restrictions work. Test 49 _should_ look like this:

Further, if you specify no constraints on a restriction, then you extend off that restriction. I've added test 81 which validates this (and fails with your reversion of my patch).

Here's a patch that makes 81 pass correctly and leaves 47 working. As I've stated, 49 just looks wrong:

George

On Oct 7, 2005, at 5:03 AM, Dmitry Stogov wrote:
> Hi George,
> It seems your patch is wrong. It breaks
> ext/soap/tests/schema/schema047.phpt and
> ext/soap/tests/schema/schema049.phpt. I reverted the patch. Please
> provide a test case: what is not working for you?
> Thanks. Dmitry.

George Schlossnagle
-- Vice President of Engineering
-- OmniTI Computer Consulting
-- http://www.omniti.com
RE: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c
On Wed, 8 Sep 2004, Dmitry Stogov wrote:
> Hi, I should make a decision. Can anybody point me to some utf-8
> specification document?

http://www.unicode.org/faq/utf_bom.html#37
http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf section 3.9, which proves that Rob is right and I was wrong about using UTF-8 as the Unicode encoding standard. (Though theoretically you could use UTF-8 for 4-byte encodings using up to 6 bytes.)

Besides this, I do not think that we should introduce copied versions into our extensions, but just block it from being used with a configure check for this specific libxml2 version. This also should not be done on an extension level, but generally for PHP. (Or, in case we really want to add a copied (+fixed) function, we should do that in ext/libxml so that all extensions can make use of it.)

Derick

--
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org
RE: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c
OK. I won't add support for 6-byte characters. You are right: the proper place for this function is ext/libxml. I will be glad to use it if somebody implements this function there.

Thanks. Dmitry.

-----Original Message-----
From: Derick Rethans [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 08, 2004 10:34
To: Dmitry Stogov
Cc: 'Rob Richards'; 'Marcus Boerger'; 'Dmitry Stogov'; [EMAIL PROTECTED]
Subject: RE: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c

On Wed, 8 Sep 2004, Dmitry Stogov wrote:
> Hi, I should make a decision. Can anybody point me to some utf-8
> specification document?

http://www.unicode.org/faq/utf_bom.html#37
http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf section 3.9, which proves that Rob is right and I was wrong about using UTF-8 as the Unicode encoding standard. (Though theoretically you could use UTF-8 for 4-byte encodings using up to 6 bytes.)

Besides this, I do not think that we should introduce copied versions into our extensions, but just block it from being used with a configure check for this specific libxml2 version. This also should not be done on an extension level, but generally for PHP. (Or, in case we really want to add a copied (+fixed) function, we should do that in ext/libxml so that all extensions can make use of it.)

Derick

--
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org
Re: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c
Thank you. At least I know I'm still borderline and haven't gone completely insane. It was changed in RFC 3629.

It should probably be added in ext/libxml, as libxml 2.6.13 (and it looks like .12 as well) is badly broken here, dating from when a bug fix was made in the function. Previous versions have a bug with the 2-byte check (certain invalid strings are returned as valid). Dmitry's code is almost exactly what's in libxml cvs for the function now, so the code should at least be used for <= 2.6.13.

Rob

----- Original Message -----
From: Derick Rethans

> I should make a decision. Can anybody point me to some utf-8
> specification document?

http://www.unicode.org/faq/utf_bom.html#37
http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf section 3.9, which proves that Rob is right and I was wrong about using UTF-8 as the Unicode encoding standard. (Though theoretically you could use UTF-8 for 4-byte encodings using up to 6 bytes.)

Besides this, I do not think that we should introduce copied versions into our extensions, but just block it from being used with a configure check for this specific libxml2 version. This also should not be done on an extension level, but generally for PHP. (Or, in case we really want to add a copied (+fixed) function, we should do that in ext/libxml so that all extensions can make use of it.)
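For readers who want to see what the 4-byte limit from RFC 3629 looks like in code, here is a minimal sketch of a validator. It is not the php_soap_xmlCheckUTF8() from the commit under discussion and not libxml2's xmlCheckUTF8(); the name utf8_is_valid is hypothetical. Unlike the committed function, this sketch takes an explicit length instead of relying on a NUL terminator, and it additionally rejects overlong forms, surrogate code points and values above U+10FFFF, which is stricter than anything the thread asks for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PHP or libxml2.
 * Returns 1 if the first len bytes of s are well-formed UTF-8 in the
 * RFC 3629 sense (at most 4 bytes per character), otherwise 0. */
static int utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL);

    while (i < len) {
        unsigned char c = s[i++];
        unsigned long cp;   /* decoded code point */
        unsigned long min;  /* smallest value allowed for this length */
        int need;           /* continuation bytes still expected */

        if (c < 0x80) {
            continue;                               /* 1 byte: ASCII */
        } else if ((c & 0xe0) == 0xc0) {
            need = 1; cp = c & 0x1f; min = 0x80;    /* 2-byte lead */
        } else if ((c & 0xf0) == 0xe0) {
            need = 2; cp = c & 0x0f; min = 0x800;   /* 3-byte lead */
        } else if ((c & 0xf8) == 0xf0) {
            need = 3; cp = c & 0x07; min = 0x10000; /* 4-byte lead */
        } else {
            return 0;   /* stray continuation byte, or a 5/6-byte lead */
        }

        while (need-- > 0) {
            if (i >= len || (s[i] & 0xc0) != 0x80) {
                return 0;                   /* truncated or bad byte */
            }
            cp = (cp << 6) | (unsigned long)(s[i++] & 0x3f);
        }

        if (cp < min) {
            return 0;                       /* overlong encoding */
        }
        if (cp >= 0xd800 && cp <= 0xdfff) {
            return 0;                       /* UTF-16 surrogate range */
        }
        if (cp > 0x10ffff) {
            return 0;                       /* beyond the Unicode range */
        }
    }
    return 1;
}
```

The 5-byte and 6-byte lead bytes (0xf8 and 0xfc patterns) fall through to the final else branch and are rejected, which matches the position that the function should stop at 4 bytes.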
Re: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c
Hello Dmitry,

that's missing a few lines:

        } else if ((c & 0xfc) == 0xf8) {
            if ((s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80) {
                return 0;
            }
        } else if ((c & 0xfe) == 0xfc) {
            if ((s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80) {
                return 0;
            }

regards
marcus

Tuesday, September 7, 2004, 4:34:46 PM, you wrote:

dmitry          Tue Sep 7 10:34:46 2004 EDT

  Modified files:
    /php-src/ext/soap   php_encoding.c
  Log:
  Make ext/soap work around libxml2 bug in xmlCheckUTF8 (2.6.7-2.6.13)

http://cvs.php.net/diff.php/php-src/ext/soap/php_encoding.c?r1=1.74&r2=1.75&ty=u
Index: php-src/ext/soap/php_encoding.c
diff -u php-src/ext/soap/php_encoding.c:1.74 php-src/ext/soap/php_encoding.c:1.75
--- php-src/ext/soap/php_encoding.c:1.74        Thu Aug 26 14:40:10 2004
+++ php-src/ext/soap/php_encoding.c     Tue Sep 7 10:34:46 2004
@@ -17,7 +17,7 @@
   | Dmitry Stogov [EMAIL PROTECTED] |
   +----------------------------------------------------------------------+
 */
-/* $Id: php_encoding.c,v 1.74 2004/08/26 18:40:10 dmitry Exp $ */
+/* $Id: php_encoding.c,v 1.75 2004/09/07 14:34:46 dmitry Exp $ */
 #include <time.h>
@@ -581,6 +581,32 @@
     return ret;
 }
+static int php_soap_xmlCheckUTF8(const unsigned char *s)
+{
+    int i;
+    unsigned char c;
+
+    for (i = 0; (c = s[i++]);) {
+        if ((c & 0x80) == 0) {
+        } else if ((c & 0xe0) == 0xc0) {
+            if ((s[i++] & 0xc0) != 0x80) {
+                return 0;
+            }
+        } else if ((c & 0xf0) == 0xe0) {
+            if ((s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80) {
+                return 0;
+            }
+        } else if ((c & 0xf8) == 0xf0) {
+            if ((s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80) {
+                return 0;
+            }
+        } else {
+            return 0;
+        }
+    }
+    return 1;
+}
+
 static xmlNodePtr to_xml_string(encodeTypePtr type, zval *data, int style, xmlNodePtr parent)
 {
     xmlNodePtr ret;
@@ -612,12 +638,12 @@
             efree(str);
             str = estrdup(xmlBufferContent(out));
             new_len = n;
-        } else if (!xmlCheckUTF8(str)) {
+        } else if (!php_soap_xmlCheckUTF8(str)) {
             soap_error1(E_ERROR, "Encoding: string '%s' is not a valid utf-8 string", str);
         }
         xmlBufferFree(out);
         xmlBufferFree(in);
-    } else if (!xmlCheckUTF8(str)) {
+    } else if (!php_soap_xmlCheckUTF8(str)) {
         soap_error1(E_ERROR, "Encoding: string '%s' is not a valid utf-8 string", str);
     }

--
Best regards,
Marcus mailto:[EMAIL PROTECTED]
Re: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c
utf-8 is now limited to 4 bytes so imo it should be left as is.

Rob

----- Original Message -----
From: Marcus Boerger

> that's missing a few lines:
>
>         } else if ((c & 0xfc) == 0xf8) {
>             if ((s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80) {
>                 return 0;
>             }
>         } else if ((c & 0xfe) == 0xfc) {
>             if ((s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80) {
>                 return 0;
>             }
RE: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c
Hi,

I think you are right. I will add them.

Thanks. Dmitry.

-----Original Message-----
From: Marcus Boerger [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 07, 2004 22:46
To: Dmitry Stogov
Cc: [EMAIL PROTECTED]
Subject: Re: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c

Hello Dmitry,

that's missing a few lines:

        } else if ((c & 0xfc) == 0xf8) {
            if ((s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80) {
                return 0;
            }
        } else if ((c & 0xfe) == 0xfc) {
            if ((s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80) {
                return 0;
            }

regards
marcus
RE: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c
Hi,

I should make a decision. Can anybody point me to some utf-8 specification document?

Thanks. Dmitry.

-----Original Message-----
From: Rob Richards [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 08, 2004 03:24
To: Marcus Boerger; Dmitry Stogov
Cc: [EMAIL PROTECTED]
Subject: Re: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c

utf-8 is now limited to 4 bytes so imo it should be left as is.

Rob

----- Original Message -----
From: Marcus Boerger

> that's missing a few lines:
>
>         } else if ((c & 0xfc) == 0xf8) {
>             if ((s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80) {
>                 return 0;
>             }
>         } else if ((c & 0xfe) == 0xfc) {
>             if ((s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80) {
>                 return 0;
>             }
Re: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c
On Fri, 13 Feb 2004, Dmitry Stogov wrote:

> dmitry          Fri Feb 13 03:29:17 2004 EDT
>
>   Modified files:
>     /php-src/ext/soap   php_encoding.c
>   Log:
>   BUGFIX

Would be cool to mention what you fixed ;-)

Derick