[ http://issues.apache.org/jira/browse/XERCESC-1363?page=comments#action_60360 ]

David Earlam commented on XERCESC-1363:
---------------------------------------

I tried out Christian's patch to substring.

patch -p0 < XMLString.cpp.patch

Using domprint with a Visual Studio .NET 2003 build of the Xerces 2.6 DLL 
(release), the run took ~300 seconds without the patch, but only about 
18 seconds after rebuilding with the patch applied.

(Previous tests were run with a VC6-built binary Xerces 2.5.0 DLL.)

This fixes the performance problem. Thanks.

Interestingly, domprint'ing most of my real XML data is now faster when 
validated (-n -s -wfpp=on) than when not validated. I guess the whitespace 
collapsing between list items means the console I/O has less work to do.


Note: There's another overloaded substring method in XMLString. It takes a 
const char* const srcStr, rather than a const XMLCh* const srcStr. Perhaps the 
same change should be applied to that function too?
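
To illustrate why a substring change can matter this much, here is a minimal 
sketch. It is not the actual patch and not Xerces code: the names 
subStringRescan, subStringKnownLen and tokenize are hypothetical, and the 
assumption (taken from the suspicion in the original report) is that the slow 
version re-measures the whole source string on every call, while the fast one 
is handed a length that was measured once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: re-scans the whole source on every call, so each
// call costs O(length of src) no matter how short the requested piece is.
std::string subStringRescan(const char* src, std::size_t start, std::size_t end) {
    const std::size_t len = std::strlen(src);  // full scan, every call
    assert(start <= end && end <= len);
    return std::string(src + start, end - start);
}

// Hypothetical helper: the caller passes the already-known length, so the
// cost depends only on the size of the piece being copied.
std::string subStringKnownLen(const char* src, std::size_t start,
                              std::size_t end, std::size_t srcLen) {
    assert(start <= end && end <= srcLen);
    return std::string(src + start, end - start);
}

// XML whitespace: space, tab, line feed, carriage return.
static bool isSpace(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// Splits a whitespace-separated list. The source is measured once and the
// length is reused for every token, keeping the whole pass linear.
std::vector<std::string> tokenize(const char* src) {
    const std::size_t len = std::strlen(src);
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < len) {
        while (i < len && isSpace(src[i])) ++i;   // skip separators
        const std::size_t start = i;
        while (i < len && !isSpace(src[i])) ++i;  // consume one token
        if (i > start) {
            tokens.push_back(subStringKnownLen(src, start, i, len));
        }
    }
    return tokens;
}
```

Swapping subStringRescan into the loop gives the same tokens, but a list of n 
items then costs n full scans of the source, which is the quadratic growth 
that would explain the timings above.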

David


>  DataTypeListValidator extraordinarily slow  for long lists
> -----------------------------------------------------------
>
>          Key: XERCESC-1363
>          URL: http://issues.apache.org/jira/browse/XERCESC-1363
>      Project: Xerces-C++
>         Type: Bug
>   Components: Validating Parser (Schema) (Xerces 1.5 or up only)
>     Versions: 2.5.0, 2.6.0
>  Environment: Windows 2000
>     Reporter: David Earlam
>     Priority: Minor
>  Attachments: XMLString.cpp.patch, pq.zip
>
> Validating an XML instance against a Schema with an unbounded xsd:list type 
> can take far more than O(n) processing time, where n is the number 
> of items in the list.
> To reproduce use this Schema:
> pq.xsd
> <?xml version="1.0" encoding="utf-8" ?>
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>       xmlns:pqns="http://swsis.cambridge.arm.com/~dearlam/xercestest/" 
> targetNamespace="http://swsis.cambridge.arm.com/~dearlam/xercestest/"
>       elementFormDefault="qualified" version="0.1">
>       <xs:annotation>
>               <xs:documentation xml:lang="en">
>               XML schema for Hofstadter's Gödel pq-System.
>               
>               Test data for list data type validation.
>        </xs:documentation>
>       </xs:annotation>
>       <xs:element name="pqData" type="pqns:pqDataType"></xs:element>
>       <xs:complexType name="pqDataType">
>               <xs:complexContent>
>                       <xs:restriction base="xs:anyType">
>                               <xs:sequence minOccurs="1" maxOccurs="1">
>                                       <xs:element name="dashes" 
> type="pqns:dashBlockType"></xs:element>
>                                       <xs:element name="p" type="xs:string" 
> xsi:nill="true"></xs:element>
>                                       <xs:element name="dashes" 
> type="pqns:dashBlockType"></xs:element>
>                                       <xs:element name="q" type="xs:string" 
> xsi:nill="true"></xs:element>
>                                       <xs:element name="dashes" 
> type="pqns:dashBlockType"></xs:element>
>                               </xs:sequence>
>                       </xs:restriction>
>               </xs:complexContent>
>       </xs:complexType>
>       <xs:complexType name="porqType">
>               <xs:simpleContent>
>                       <xs:extension base="xs:string"></xs:extension>
>               </xs:simpleContent>
>       </xs:complexType>
>       <xs:complexType name="dashBlockType">
>               <xs:simpleContent>
>                       <xs:extension base="pqns:dataDashes"></xs:extension>
>               </xs:simpleContent>
>       </xs:complexType>
>       <xs:simpleType name="Dash">
>               <xs:restriction base="xs:string">
>                       <xs:pattern value="[\-]"></xs:pattern>
>               </xs:restriction>
>       </xs:simpleType>
>       <xs:simpleType name="dataDashes">
>               <xs:restriction base="pqns:DashList">
>                       <xs:minLength value="0" />
>               </xs:restriction>
>       </xs:simpleType>
>       <xs:simpleType name="DashList">
>               <xs:list itemType="pqns:Dash"></xs:list>
>       </xs:simpleType>
> </xs:schema>
> and this XML file
> pqData0.xml
> <?xml version="1.0" encoding="utf-8" ?> 
> <pqData xmlns='http://swsis.cambridge.arm.com/~dearlam/xercestest/'
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:schemaLocation="http://swsis.cambridge.arm.com/~dearlam/xercestest/
>  http://swsis.cambridge.arm.com/~dearlam/xercestest/pq.xsd">
> <dashes>
> - -
> </dashes>
> <p/>
> <dashes>-</dashes>
> <q/>
> <dashes>-</dashes>
> </pqData>
> (replacing swsis.cambridge.arm.com/~dearlam/xercestest with your location)
> Then use 
>   domprint -wfpp=on pqData0.xml
> and
>   domprint -n -s -wfpp=on pqData0.xml
> to print the XML non-validating and validating.
> Both print in about the same short time. OK.
> Now, edit pqData0.xml as pqData1.xml and replace
> - - 
> with 4000 lines of
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
> - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> This gives a file of about 500 KB (which mimics my real data).
> If you then try
>   domprint -wfpp=on pqData1.xml
> and
>   domprint -n -s -wfpp=on pqData1.xml 
> the first prints instantly (pipe it to NUL if you like), but the second 
> consumes 99% CPU for 230 seconds, then prints. 
> That's only about 2 KB per second!
> --
> (My suspicion is XMLString::tokenizeString is using subString() to calculate 
> the string length
> way too many times...)
> kind regards,
> David
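
The pqData1.xml described in the quoted report can also be generated rather 
than edited by hand. The following is only a sketch: dashLine, dashBlock and 
pqDocument are hypothetical helper names, and the namespace and schema URL 
are passed in because the report asks you to substitute your own location. 
The quoted sample line appears to hold 67 dashes, so 4000 lines of 67 dashes 
should land near the file size mentioned.

```cpp
#include <cstddef>
#include <string>

// One line of single dashes separated by single spaces, e.g. "- - -".
std::string dashLine(std::size_t dashes) {
    std::string line;
    for (std::size_t i = 0; i < dashes; ++i) {
        if (i > 0) line += ' ';
        line += '-';
    }
    return line;
}

// The requested number of identical dash lines, each ended by a newline.
std::string dashBlock(std::size_t lines, std::size_t dashesPerLine) {
    const std::string line = dashLine(dashesPerLine);
    std::string block;
    block.reserve(lines * (line.size() + 1));
    for (std::size_t i = 0; i < lines; ++i) {
        block += line;
        block += '\n';
    }
    return block;
}

// A pqData instance shaped like pqData0.xml, with the first <dashes>
// element filled by the generated block. ns and schemaUrl are supplied by
// the caller and must match wherever pq.xsd is actually hosted.
std::string pqDocument(const std::string& ns, const std::string& schemaUrl,
                       std::size_t lines, std::size_t dashesPerLine) {
    std::string doc;
    doc += "<?xml version=\"1.0\" encoding=\"utf-8\" ?>\n";
    doc += "<pqData xmlns='" + ns + "'\n";
    doc += " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n";
    doc += " xsi:schemaLocation=\"" + ns + " " + schemaUrl + "\">\n";
    doc += "<dashes>\n" + dashBlock(lines, dashesPerLine) + "</dashes>\n";
    doc += "<p/>\n<dashes>-</dashes>\n<q/>\n<dashes>-</dashes>\n";
    doc += "</pqData>\n";
    return doc;
}
```

Writing pqDocument(ns, schemaUrl, 4000, 67) to a file with std::ofstream 
should give an input comparable to pqData1.xml for the two domprint runs.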

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira

