Just to clarify that.

A text node created with xmlNewTextLen holds the raw text, and those 3 characters are escaped during serialization. xmlNodeSetContentLen, on the other hand, was being called on an element, so it was building a subtree of text and entity nodes. With large amounts of data (containing many entities mixed with text) this is extremely slow and memory intensive because of the number of nodes that have to be created and ultimately freed. It also slowed down serialization, since so many nodes had to be traversed.
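
To put a rough number on the node count, here is a toy model in C. It is not libxml2's algorithm: count_subtree_nodes is a hypothetical helper, and it assumes that every "&name;" reference becomes its own node and every run of text between references becomes a text node.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model (an assumption, not libxml2 code): count the nodes needed if
 * each "&name;" reference gets its own node and each run of plain text
 * between references gets one text node. */
static size_t count_subtree_nodes(const char *s, size_t len)
{
    size_t nodes = 0;
    size_t i = 0;
    int in_text = 0;   /* nonzero while inside a run of plain text */

    assert(s != NULL || len == 0);

    while (i < len) {
        if (s[i] == '&') {
            const char *semi = memchr(s + i, ';', len - i);
            if (semi != NULL) {
                nodes++;                      /* one entity node */
                i = (size_t)(semi - s) + 1;   /* skip past the ';' */
                in_text = 0;
                continue;
            }
        }
        if (!in_text) {
            nodes++;                          /* start of a new text node */
            in_text = 1;
        }
        i++;
    }
    return nodes;
}
```

Under this model "a &lt; b &amp; c" needs 5 nodes, while a single raw text node needs 1 regardless of how many special characters the string contains. The real ratio depends on what libxml2 actually does with each reference.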

Rob

Rasmus Lerdorf wrote:
Yes, those chars are exactly what was causing the performance problem, actually.

xmlNewTextLen() will call the internal libxml entity encoder, but it won't try to allocate each entity for use by the subtree. It was this entity allocation code in xmlNodeSetContentLen() that was slowing everything down. And because we were calling php_escape_html_entities() on the blob before passing it in, it wouldn't create any sub-nodes anyway, so the whole thing was a bit redundant, at least if I am understanding this correctly.
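
For illustration, escaping a raw text node on output amounts to something like the sketch below. escape_xml_text is a hypothetical helper, not libxml2's encoder, and it handles only the three characters mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not libxml2 code): copy src into dst, replacing
 * '<', '>' and '&' with their XML entities. Returns the number of bytes
 * written, excluding the terminating NUL, or (size_t)-1 if dst is too
 * small. */
static size_t escape_xml_text(const char *src, size_t len,
                              char *dst, size_t cap)
{
    size_t out = 0;
    size_t i;

    assert(src != NULL && dst != NULL);
    if (cap == 0)
        return (size_t)-1;

    for (i = 0; i < len; i++) {
        const char *rep = src + i;   /* default: copy the byte as-is */
        size_t n = 1;

        switch (src[i]) {
        case '<': rep = "&lt;";  n = 4; break;
        case '>': rep = "&gt;";  n = 4; break;
        case '&': rep = "&amp;"; n = 5; break;
        default: break;
        }
        if (out + n + 1 > cap)       /* keep room for the NUL */
            return (size_t)-1;
        memcpy(dst + out, rep, n);
        out += n;
    }
    dst[out] = '\0';
    return out;
}
```

The point is that this is one linear pass over one buffer at output time, with no per-entity node allocation.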

-Rasmus


Dmitry Stogov wrote:
Hi Rasmus,

Will your patch support strings with special characters ('<', '>', '&')?

Thanks. Dmitry.

-----Original Message-----
From: Rasmus Lerdorf [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 15, 2006 10:04 PM
To: php-cvs@lists.php.net
Subject: [PHP-CVS] cvs: php-src /ext/soap php_encoding.c

rasmus        Thu Jun 15 18:03:31 2006 UTC

  Modified files:
    /php-src/ext/soap    php_encoding.c
  Log:
  I don't think the call to xmlNodeSetContentLen() is needed here and
  it is causing performance problems because it tries to parse the blob
  and create a subtree.  Because we are escaping the string anyway, we
  are never going to get a subtree, but the entity parsing that is done
  by xmlNodeSetContentLen() is killing performance on large blobs of
  text.  On one recent example it took a couple of minutes to parse
  whereas if we just create a text node like this and set the contents
  to the raw string it is down to milliseconds.  As far as I can tell
  all the tests pass with this patch.

http://cvs.php.net/viewcvs.cgi/php-src/ext/soap/php_encoding.c?r1=1.127&r2=1.128&diff_format=u
Index: php-src/ext/soap/php_encoding.c
diff -u php-src/ext/soap/php_encoding.c:1.127 php-src/ext/soap/php_encoding.c:1.128
--- php-src/ext/soap/php_encoding.c:1.127    Fri May 26 09:04:53 2006
+++ php-src/ext/soap/php_encoding.c    Thu Jun 15 18:03:30 2006
@@ -17,7 +17,7 @@
   | Dmitry Stogov <[EMAIL PROTECTED]> |
   +----------------------------------------------------------------------+
 */
-/* $Id: php_encoding.c,v 1.127 2006/05/26 09:04:53 dmitry Exp $ */
+/* $Id: php_encoding.c,v 1.128 2006/06/15 18:03:30 rasmus Exp $ */
 #include <time.h>
@@ -728,7 +728,7 @@
 static xmlNodePtr to_xml_string(encodeTypePtr type, zval *data, int style, xmlNodePtr parent)
 {
-    xmlNodePtr ret;
+    xmlNodePtr ret, text;
     char *str;
     int new_len;
     TSRMLS_FETCH();
@@ -738,13 +738,15 @@
     FIND_ZVAL_NULL(data, ret, style);
     if (Z_TYPE_P(data) == IS_STRING) {
-        str = php_escape_html_entities(Z_STRVAL_P(data), Z_STRLEN_P(data), &new_len, 0, 0, NULL TSRMLS_CC);
+        str = estrndup(Z_STRVAL_P(data), Z_STRLEN_P(data));
+        new_len = Z_STRLEN_P(data);
     } else {
         zval tmp = *data;
         zval_copy_ctor(&tmp);
         convert_to_string(&tmp);
-        str = php_escape_html_entities(Z_STRVAL(tmp), Z_STRLEN(tmp), &new_len, 0, 0, NULL TSRMLS_CC);
+        str = estrndup(Z_STRVAL(tmp), Z_STRLEN(tmp));
+        new_len = Z_STRLEN(tmp);
         zval_dtor(&tmp);
     }
@@ -766,7 +768,8 @@
         soap_error1(E_ERROR, "Encoding: string '%s' is not a valid utf-8 string", str);
     }
-    xmlNodeSetContentLen(ret, str, new_len);
+    text = xmlNewTextLen(str, new_len);
+    xmlAddChild(ret, text);
     efree(str);
     if (style == SOAP_ENCODED) {

--
PHP CVS Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php





