RE: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c

2006-06-15 Thread Dmitry Stogov
Hi Rasmus,

Will your patch support strings with special characters ('<', '>', '&')?

Thanks. Dmitry.

 -----Original Message-----
 From: Rasmus Lerdorf [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, June 15, 2006 10:04 PM
 To: php-cvs@lists.php.net
 Subject: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c 
 
 
 rasmus		Thu Jun 15 18:03:31 2006 UTC
 
   Modified files:  
 /php-src/ext/soap php_encoding.c 
   Log:
   I don't think the call to xmlNodeSetContentLen() is needed here and
   it is causing performance problems because it tries to parse the blob
   and create a subtree.  Because we are escaping the string anyway, we
   are never going to get a subtree, but the entity parsing that is done
   by xmlNodeSetContentLen() is killing performance on large blobs of
   text.  On one recent example it took a couple of minutes to parse
   whereas if we just create a text node like this and set the contents
   to the raw string it is down to milliseconds.  As far as I can tell
   all the tests pass with this patch.
   
   
 http://cvs.php.net/viewcvs.cgi/php-src/ext/soap/php_encoding.c?r1=1.127&r2=1.128&diff_format=u
 Index: php-src/ext/soap/php_encoding.c
 diff -u php-src/ext/soap/php_encoding.c:1.127 php-src/ext/soap/php_encoding.c:1.128
 --- php-src/ext/soap/php_encoding.c:1.127	Fri May 26 09:04:53 2006
 +++ php-src/ext/soap/php_encoding.c	Thu Jun 15 18:03:30 2006
 @@ -17,7 +17,7 @@
    |          Dmitry Stogov [EMAIL PROTECTED]                             |
    +----------------------------------------------------------------------+
  */
 -/* $Id: php_encoding.c,v 1.127 2006/05/26 09:04:53 dmitry Exp $ */
 +/* $Id: php_encoding.c,v 1.128 2006/06/15 18:03:30 rasmus Exp $ */
  
  #include <time.h>
  
 @@ -728,7 +728,7 @@
  
  static xmlNodePtr to_xml_string(encodeTypePtr type, zval *data, int style, xmlNodePtr parent)
  {
 -	xmlNodePtr ret;
 +	xmlNodePtr ret, text;
  	char *str;
  	int new_len;
  	TSRMLS_FETCH();
 @@ -738,13 +738,15 @@
  	FIND_ZVAL_NULL(data, ret, style);
  
  	if (Z_TYPE_P(data) == IS_STRING) {
 -		str = php_escape_html_entities(Z_STRVAL_P(data), Z_STRLEN_P(data), &new_len, 0, 0, NULL TSRMLS_CC);
 +		str = estrndup(Z_STRVAL_P(data), Z_STRLEN_P(data));
 +		new_len = Z_STRLEN_P(data);
  	} else {
  		zval tmp = *data;
  
  		zval_copy_ctor(&tmp);
  		convert_to_string(&tmp);
 -		str = php_escape_html_entities(Z_STRVAL(tmp), Z_STRLEN(tmp), &new_len, 0, 0, NULL TSRMLS_CC);
 +		str = estrndup(Z_STRVAL(tmp), Z_STRLEN(tmp));
 +		new_len = Z_STRLEN(tmp);
  		zval_dtor(&tmp);
  	}
  
 @@ -766,7 +768,8 @@
  		soap_error1(E_ERROR,  "Encoding: string '%s' is not a valid utf-8 string", str);
  	}
  
 -	xmlNodeSetContentLen(ret, str, new_len);
 +	text = xmlNewTextLen(str, new_len);
 +	xmlAddChild(ret, text);
  	efree(str);
  
  	if (style == SOAP_ENCODED) {
 
 -- 
 PHP CVS Mailing List (http://www.php.net/)
 To unsubscribe, visit: http://www.php.net/unsub.php
 
 
 


Re: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c

2006-06-15 Thread Rasmus Lerdorf
Yes, those characters are exactly what was causing the performance
problem.


xmlNewTextLen() will call the internal libxml entity encoder, but it 
won't try to allocate each entity for use by the subtree.  It was this 
entity allocation code in xmlNodeSetContentLen() that was slowing 
everything down.  And because we were calling php_escape_html_entities() 
on the blob before passing it in, it wouldn't create any sub-nodes 
anyway, so the whole thing was a bit redundant, at least if I am 
understanding this correctly.


-Rasmus





Re: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c

2006-06-15 Thread Rob Richards

Just to clarify that:

xmlNewTextLen() creates a text node that holds the raw text, and those 
three characters are escaped during serialization.
xmlNodeSetContentLen(), on the other hand, since it was being called from 
the scope of an element, was building a subtree of text and entity 
nodes. With large amounts of data (containing many entities mixed with 
text) this is extremely slow and memory intensive due to the number of 
nodes that have to be created and then ultimately freed. This also slowed 
down serialization due to having to traverse so many nodes.
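
The idea of keeping the text raw and escaping only at output time can be
sketched as follows. This is a minimal illustration, not libxml2's actual
serializer: escape_xml_text() is a hypothetical helper written for this
example, and it assumes only the three characters mentioned above need
escaping in element text.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT part of libxml2: escape '<', '>' and '&' in a
 * single pass over the raw text, as a serializer can do when the node just
 * holds the raw string.  Returns a malloc'ed, NUL-terminated buffer that
 * the caller must free, or NULL on allocation failure. */
char *escape_xml_text(const char *raw, size_t len)
{
	size_t i, out_len = 0;
	char *out, *p;

	/* First pass: compute the escaped length. */
	for (i = 0; i < len; i++) {
		switch (raw[i]) {
		case '<':
		case '>': out_len += 4; break;  /* "&lt;" or "&gt;" */
		case '&': out_len += 5; break;  /* "&amp;" */
		default:  out_len += 1; break;
		}
	}

	out = malloc(out_len + 1);
	if (out == NULL) {
		return NULL;
	}

	/* Second pass: copy, replacing the three special characters. */
	p = out;
	for (i = 0; i < len; i++) {
		switch (raw[i]) {
		case '<': memcpy(p, "&lt;", 4);  p += 4; break;
		case '>': memcpy(p, "&gt;", 4);  p += 4; break;
		case '&': memcpy(p, "&amp;", 5); p += 5; break;
		default:  *p++ = raw[i];         break;
		}
	}
	*p = '\0';
	return out;
}
```

The work here is linear in the length of the text and needs one
allocation, whereas a subtree of text and entity nodes needs one
allocation per fragment. In ext/soap itself no such helper is needed:
per the patch, xmlNewTextLen() followed by xmlAddChild() is enough.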


Rob




RE: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c

2005-10-07 Thread Dmitry Stogov
Hi George,

It seems your patch is wrong.
It breaks ext/soap/tests/schema/schema047.phpt and
ext/soap/tests/schema/schema049.phpt.

I reverted the patch.

Please provide a test case. What is not working for you?

Thanks. Dmitry.

 -----Original Message-----
 From: George Schlossnagle [mailto:[EMAIL PROTECTED] 
 Sent: Friday, October 07, 2005 2:30 AM
 To: php-cvs@lists.php.net
 Subject: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c 
 
 
 gschlossnagle Thu Oct  6 18:30:11 2005 EDT
 
   Modified files:  
 /php-src/ext/soap php_encoding.c 
   Log:
   support complex types in restrictions and extensions
   
 http://cvs.php.net/diff.php/php-src/ext/soap/php_encoding.c?r1=1.107&r2=1.108&ty=u
 Index: php-src/ext/soap/php_encoding.c
 diff -u php-src/ext/soap/php_encoding.c:1.107 php-src/ext/soap/php_encoding.c:1.108
 --- php-src/ext/soap/php_encoding.c:1.107	Thu Sep 29 06:00:59 2005
 +++ php-src/ext/soap/php_encoding.c	Thu Oct  6 18:30:08 2005
 @@ -17,7 +17,7 @@
    |          Dmitry Stogov [EMAIL PROTECTED]                             |
    +----------------------------------------------------------------------+
  */
 -/* $Id: php_encoding.c,v 1.107 2005/09/29 10:00:59 dmitry Exp $ */
 +/* $Id: php_encoding.c,v 1.108 2005/10/06 22:30:08 gschlossnagle Exp $ */
  
  #include <time.h>
  
 @@ -319,6 +319,10 @@
  			node = encode->to_xml_after(&encode->details, node, style);
  		}
  	}
 +	if(!node) {
 +		node = xmlNewNode(NULL,"BOGUS");
 +		xmlAddChild(parent, node);
 +	}
  	return node;
  }
  
 @@ -1536,6 +1540,7 @@
  
  			enc = sdlType->encode;
  			while (enc && enc->details.sdl_type &&
 +			       enc->details.sdl_type->kind != XSD_TYPEKIND_COMPLEX &&
  			       enc->details.sdl_type->kind != XSD_TYPEKIND_SIMPLE &&
  			       enc->details.sdl_type->kind != XSD_TYPEKIND_LIST &&
  			       enc->details.sdl_type->kind != XSD_TYPEKIND_UNION) {
 @@ -1545,11 +1550,8 @@
  				zval *tmp = get_zval_property(data, "_" TSRMLS_CC);
  				if (tmp) {
  					xmlParam = master_to_xml(enc, tmp, style, parent);
 -				} else if (prop == NULL) {
 -					xmlParam = master_to_xml(enc, data, style, parent);
  				} else {
 -					xmlParam = xmlNewNode(NULL,"BOGUS");
 -					xmlAddChild(parent, xmlParam);
 +					xmlParam = master_to_xml(enc, data, style, parent);
  				}
  			} else {
  				xmlParam = xmlNewNode(NULL,"BOGUS");
 @@ -1558,6 +1560,7 @@
  		} else if (sdlType->kind == XSD_TYPEKIND_EXTENSION &&
  		           sdlType->encode && type != &sdlType->encode->details) {
  			if (sdlType->encode->details.sdl_type &&
 +			    sdlType->encode->details.sdl_type->kind != XSD_TYPEKIND_COMPLEX &&
  			    sdlType->encode->details.sdl_type->kind != XSD_TYPEKIND_SIMPLE &&
  			    sdlType->encode->details.sdl_type->kind != XSD_TYPEKIND_LIST &&
  			    sdlType->encode->details.sdl_type->kind != XSD_TYPEKIND_UNION) {
 @@ -1567,12 +1570,9 @@
  
  				if (tmp) {
  					xmlParam = master_to_xml(sdlType->encode, tmp, style, parent);
 -				} else if (prop == NULL) {
 -					xmlParam = master_to_xml(sdlType->encode, data, style, parent);
  				} else {
 -					xmlParam = xmlNewNode(NULL,"BOGUS");
 -					xmlAddChild(parent, xmlParam);
 -				}
 +					xmlParam = master_to_xml(sdlType->encode, data, style, parent);
 +				}
  			}
  		} else {
  			xmlParam = xmlNewNode(NULL,"BOGUS");
 



Re: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c

2005-10-07 Thread George Schlossnagle
Test 49 looks totally incorrect to me.  You've derived the object by  
restriction in accordance with section 2.5.2.1 of the Schema Part 2  
spec:

[Definition:]  A datatype is said to be ·derived· by restriction from  
another datatype when values for zero or more ·constraining facet·s  
are specified that serve to constrain its ·value space· and/or its  
lexical space to a subset of the base type.

(Check http://www.w3.org/TR/xmlschema-2/#derivation-by-restriction and  
http://www.w3.org/TR/xmlschema-1/#Complex_Type_Definition_details for  
more details.)

But you aren't constraining the type to be a subset of the base type;  
in fact you ignore the base type completely and just implement your  
own parameter there, which seems to violate the way restrictions work.


Test 49 _should_ look like this:





Further, if you specify no constraints on a restriction, then you  
extend off that restriction.  I've added test 81 which validates this  
(and fails with your reversion of my patch).


Here's a patch that makes 81 pass correctly and leaves 47 working.   
As I've stated, 49 just looks wrong:




George





George Schlossnagle

-- Vice President of Engineering
-- OmniTI Computer Consulting
-- http://www.omniti.com



RE: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c

2004-09-08 Thread Derick Rethans
On Wed, 8 Sep 2004, Dmitry Stogov wrote:

 Hi,

 I should make a decision.
 Can anybody point me to some utf-8 specification document?

http://www.unicode.org/faq/utf_bom.html#37
http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf
section 3.9, which proves that Rob is right and I was wrong for using
UTF-8 as Unicode encoding standard.

(Though theoretically the UTF-8 scheme could be extended past 4-byte
encodings, up to 6 bytes.)

Besides this, I do not think that we should introduce copied versions
into our extensions, but just block it from being used with a configure
check for this specific libxml2 version. This also should not be done on
an extension level, but generally for PHP. (Or, in case we really
want to add a copied (+fixed) function, we should do that in ext/libxml
so that all extensions can make use of it.)

Derick

-- 
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org




RE: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c

2004-09-08 Thread Dmitry Stogov
OK. I won't add support for 6-byte characters.

You are right. The proper place for this function is ext/libxml.
I will be glad to use it if somebody implements this function there.

Thanks. Dmitry.




Re: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c

2004-09-08 Thread Rob Richards
Thank you. At least I know I'm still borderline and haven't gone completely
insane.
It was changed in RFC 3629.

It should probably be added in ext/libxml, as libxml 2.6.13 (and it looks
like .12 as well) is badly broken here since a bug fix was made in the
function. Previous versions have a bug with the 2-byte check (certain
invalid strings are returned as valid). Dmitry's code is almost exactly
what's in libxml cvs for the function now, so the code should at least be
used for <= 2.6.13.
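
For reference, a stricter check along the lines of RFC 3629 can be
sketched like this. It is an illustration only: utf8_is_valid() is a
hypothetical name, and this is neither Dmitry's committed function nor
libxml2's xmlCheckUTF8(). Besides limiting sequences to 4 bytes it also
rejects overlong forms (for example the 2-byte sequence C0 80),
surrogates, and values above U+10FFFF. Whether overlong input is the
2-byte bug mentioned above is an assumption; the thread does not say.

```c
/* Hypothetical validator: returns 1 if the NUL-terminated string is
 * well-formed UTF-8 in the sense of RFC 3629, 0 otherwise. */
int utf8_is_valid(const unsigned char *s)
{
	while (*s) {
		unsigned char c = *s;
		unsigned long cp;  /* decoded code point */
		int n;             /* number of continuation bytes expected */
		int k;

		if (c < 0x80) {                     /* 0xxxxxxx: plain ASCII */
			s++;
			continue;
		} else if ((c & 0xe0) == 0xc0) {    /* 110xxxxx */
			n = 1;
			cp = c & 0x1f;
		} else if ((c & 0xf0) == 0xe0) {    /* 1110xxxx */
			n = 2;
			cp = c & 0x0f;
		} else if ((c & 0xf8) == 0xf0) {    /* 11110xxx */
			n = 3;
			cp = c & 0x07;
		} else {
			return 0;  /* stray continuation byte, or a 5/6-byte lead */
		}

		for (k = 1; k <= n; k++) {
			/* A NUL byte fails this test as well, so the loop never
			 * reads past the end of the string. */
			if ((s[k] & 0xc0) != 0x80) {
				return 0;
			}
			cp = (cp << 6) | (s[k] & 0x3f);
		}

		/* Reject overlong forms, UTF-16 surrogates, and > U+10FFFF. */
		if ((n == 1 && cp < 0x80) ||
		    (n == 2 && cp < 0x800) ||
		    (n == 3 && cp < 0x10000) ||
		    (cp >= 0xd800 && cp <= 0xdfff) ||
		    cp > 0x10ffff) {
			return 0;
		}
		s += n + 1;
	}
	return 1;
}
```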

Rob




Re: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c

2004-09-07 Thread Marcus Boerger
Hello Dmitry,

  that's missing a few lines:

		} else if ((c & 0xfc) == 0xf8) {
			if ((s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 ||
			    (s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80) {
				return 0;
			}
		} else if ((c & 0xfe) == 0xfc) {
			if ((s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 ||
			    (s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 ||
			    (s[i++] & 0xc0) != 0x80) {
				return 0;
			}

regards
marcus


Tuesday, September 7, 2004, 4:34:46 PM, you wrote:

 dmitry		Tue Sep  7 10:34:46 2004 EDT

   Modified files:  
 /php-src/ext/soap php_encoding.c 
   Log:
   Make ext/soap work around libxml2 bug in xmlCheckUTF8 (2.6.7-2.6.13)
  
  
 http://cvs.php.net/diff.php/php-src/ext/soap/php_encoding.c?r1=1.74&r2=1.75&ty=u
 Index: php-src/ext/soap/php_encoding.c
 diff -u php-src/ext/soap/php_encoding.c:1.74 php-src/ext/soap/php_encoding.c:1.75
 --- php-src/ext/soap/php_encoding.c:1.74	Thu Aug 26 14:40:10 2004
 +++ php-src/ext/soap/php_encoding.c	Tue Sep  7 10:34:46 2004
 @@ -17,7 +17,7 @@
    |          Dmitry Stogov [EMAIL PROTECTED]                             |
    +----------------------------------------------------------------------+
  */
 -/* $Id: php_encoding.c,v 1.74 2004/08/26 18:40:10 dmitry Exp $ */
 +/* $Id: php_encoding.c,v 1.75 2004/09/07 14:34:46 dmitry Exp $ */
  
  #include <time.h>
  
 @@ -581,6 +581,32 @@
  	return ret;
  }
  
 +static int php_soap_xmlCheckUTF8(const unsigned char *s)
 +{
 +	int i;
 +	unsigned char c;
 +
 +	for (i = 0; (c = s[i++]);) {
 +		if ((c & 0x80) == 0) {
 +		} else if ((c & 0xe0) == 0xc0) {
 +			if ((s[i++] & 0xc0) != 0x80) {
 +				return 0;
 +			}
 +		} else if ((c & 0xf0) == 0xe0) {
 +			if ((s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80) {
 +				return 0;
 +			}
 +		} else if ((c & 0xf8) == 0xf0) {
 +			if ((s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80 || (s[i++] & 0xc0) != 0x80) {
 +				return 0;
 +			}
 +		} else {
 +			return 0;
 +		}
 +	}
 +	return 1;
 +}
 +
  static xmlNodePtr to_xml_string(encodeTypePtr type, zval *data, int style, xmlNodePtr parent)
  {
  	xmlNodePtr ret;
 @@ -612,12 +638,12 @@
  			efree(str);
  			str = estrdup(xmlBufferContent(out));
  			new_len = n;
 -		} else if (!xmlCheckUTF8(str)) {
 +		} else if (!php_soap_xmlCheckUTF8(str)) {
  			soap_error1(E_ERROR,  "Encoding: string '%s' is not a valid utf-8 string", str);
  		}
  		xmlBufferFree(out);
  		xmlBufferFree(in);
 -	} else if (!xmlCheckUTF8(str)) {
 +	} else if (!php_soap_xmlCheckUTF8(str)) {
  		soap_error1(E_ERROR,  "Encoding: string '%s' is not a valid utf-8 string", str);
  	}
  
 




-- 
Best regards,
 Marcus                            mailto:[EMAIL PROTECTED]




Re: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c

2004-09-07 Thread Rob Richards
utf-8 is now limited to 4 bytes so imo it should be left as is.
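
A small sketch of what that limit means for the lead byte, using the
same masks as the committed check. utf8_seq_len() is a hypothetical
helper for illustration, not a function from php-src or libxml2: the
lead patterns 111110xx and 1111110x, which Marcus's extra branches would
accept as 5- and 6-byte sequences, are treated as invalid.

```c
/* Hypothetical helper: total length in bytes of the UTF-8 sequence that
 * starts with lead byte c, or 0 if c cannot start a sequence. */
int utf8_seq_len(unsigned char c)
{
	if ((c & 0x80) == 0x00) return 1;  /* 0xxxxxxx */
	if ((c & 0xe0) == 0xc0) return 2;  /* 110xxxxx */
	if ((c & 0xf0) == 0xe0) return 3;  /* 1110xxxx */
	if ((c & 0xf8) == 0xf0) return 4;  /* 11110xxx */
	/* 10xxxxxx is a continuation byte; 111110xx and 1111110x were the
	 * 5- and 6-byte lead bytes of the older, longer scheme. */
	return 0;
}
```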

Rob




RE: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c

2004-09-07 Thread Dmitry Stogov
Hi,

I think, you are right.
I will add them.

Thanks. Dmitry.




RE: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c

2004-09-07 Thread Dmitry Stogov
Hi,

I should make a decision.
Can anybody point me to some utf-8 specification document?

Thanks. Dmitry.




Re: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c

2004-02-13 Thread Derick Rethans
On Fri, 13 Feb 2004, Dmitry Stogov wrote:

 dmitry		Fri Feb 13 03:29:17 2004 EDT

   Modified files:
 /php-src/ext/soap php_encoding.c
   Log:
   BUGFIX

Would be cool to mention what you fixed ;-)

Derick
