DataTypeListValidator extraordinarily slow for long lists
---------------------------------------------------------
         Key: XERCESC-1363
         URL: http://issues.apache.org/jira/browse/XERCESC-1363
     Project: Xerces-C++
        Type: Bug
  Components: Validating Parser (Schema) (Xerces 1.5 or up only)
    Versions: 2.5.0, 2.6.0
 Environment: Windows 2000
    Reporter: David Earlam
    Priority: Minor

Validating an XML instance against a Schema with an unbounded xsd:list type can take far more than O(n) processing time, where n is the number of items in the list.

To reproduce, use this Schema:

pq.xsd

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:pqns="http://swsis.cambridge.arm.com/~dearlam/xercestest/"
           targetNamespace="http://swsis.cambridge.arm.com/~dearlam/xercestest/"
           elementFormDefault="qualified" version="0.1">
  <xs:annotation>
    <xs:documentation xml:lang="en">
      XML schema for Hofstadter's Gödel pq-System.
      Test data for list data type validation.
    </xs:documentation>
  </xs:annotation>
  <xs:element name="pqData" type="pqns:pqDataType"></xs:element>
  <xs:complexType name="pqDataType">
    <xs:complexContent>
      <xs:restriction base="xs:anyType">
        <xs:sequence minOccurs="1" maxOccurs="1">
          <xs:element name="dashes" type="pqns:dashBlockType"></xs:element>
          <xs:element name="p" type="xs:string" xsi:nill="true"></xs:element>
          <xs:element name="dashes" type="pqns:dashBlockType"></xs:element>
          <xs:element name="q" type="xs:string" xsi:nill="true"></xs:element>
          <xs:element name="dashes" type="pqns:dashBlockType"></xs:element>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="porqType">
    <xs:simpleContent>
      <xs:extension base="xs:string"></xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <xs:complexType name="dashBlockType">
    <xs:simpleContent>
      <xs:extension base="pqns:dataDashes"></xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <xs:simpleType name="Dash">
    <xs:restriction base="xs:string">
      <xs:pattern value="[\-]"></xs:pattern>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="dataDashes">
    <xs:restriction base="pqns:DashList">
      <xs:minLength value="0" />
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="DashList">
    <xs:list itemType="pqns:Dash"></xs:list>
  </xs:simpleType>
</xs:schema>

and this XML file:

pqData0.xml

<?xml version="1.0" encoding="utf-8" ?>
<pqData xmlns='http://swsis.cambridge.arm.com/~dearlam/xercestest/'
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://swsis.cambridge.arm.com/~dearlam/xercestest/ http://swsis.cambridge.arm.com/~dearlam/xercestest/pq.xsd">
  <dashes> - - </dashes>
  <p/>
  <dashes>-</dashes>
  <q/>
  <dashes>-</dashes>
</pqData>

(replacing swsis.cambridge.arm.com/~dearlam/xercestest with your location)

Then use

    domprint -wfpp=on pqData0.xml

and

    domprint -n -s -wfpp=on pqData0.xml

to print the XML without and with schema validation. Both complete in about the same short time, as expected.

Now save a copy of pqData0.xml as pqData1.xml and replace the "- -" content of the first dashes element with 4000 lines of

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

This gives a file of about 500 KB (which mimics my real data).

If you then try

    domprint -wfpp=on pqData1.xml

and

    domprint -n -s -wfpp=on pqData1.xml

the first prints instantly (pipe it to NUL if you like), but the second consumes 99% CPU for 230 seconds and only then prints. That's about 2 KB per second!

(My suspicion is that XMLString::tokenizeString is using subString() in a way that recalculates the string length far too many times...)

kind regards,

David
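To illustrate the suspicion above, here is a minimal C++ sketch of how a tokenizer can go quadratic if its substring helper re-measures the whole buffer for every token. This is not the Xerces-C++ source, and I have not confirmed that the library behaves this way; the names subStringRescan, tokenizeRescan and tokenizeOnce are hypothetical, and plain char strings stand in for XMLCh buffers. The charsScanned counter only exists to make the extra work visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// XML whitespace characters that separate xsd:list items.
bool isXmlSpace(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// Hypothetical substring helper: copies [start, end) out of a NUL-terminated
// buffer, but first re-measures the entire buffer to range-check its
// arguments. That strlen call is O(n) on every invocation.
std::string subStringRescan(const char* src, std::size_t start, std::size_t end,
                            std::size_t& charsScanned) {
    const std::size_t len = std::strlen(src);
    charsScanned += len;  // record the cost of the rescan
    assert(start <= end && end <= len);
    return std::string(src + start, end - start);
}

// Quadratic variant: one full-length rescan per token, so k tokens in a
// buffer of n characters cost roughly k * n character visits.
std::vector<std::string> tokenizeRescan(const char* src, std::size_t& charsScanned) {
    std::vector<std::string> tokens;
    const std::size_t len = std::strlen(src);
    std::size_t i = 0;
    while (i < len) {
        while (i < len && isXmlSpace(src[i])) ++i;   // skip separators
        if (i == len) break;
        const std::size_t start = i;
        while (i < len && !isXmlSpace(src[i])) ++i;  // find end of token
        tokens.push_back(subStringRescan(src, start, i, charsScanned));
    }
    return tokens;
}

// Linear variant: the length is measured once and each token is copied
// directly from known offsets.
std::vector<std::string> tokenizeOnce(const char* src) {
    std::vector<std::string> tokens;
    const std::size_t len = std::strlen(src);
    std::size_t i = 0;
    while (i < len) {
        while (i < len && isXmlSpace(src[i])) ++i;
        if (i == len) break;
        const std::size_t start = i;
        while (i < len && !isXmlSpace(src[i])) ++i;
        tokens.emplace_back(src + start, i - start);
    }
    return tokens;
}
```

Both functions return the same tokens; only the amount of scanning differs. If something like the first variant were in play, a 500 KB buffer holding a few hundred thousand one-character items would need on the order of 10^11 character visits, which would be in the right range to explain minutes of CPU time. That is an estimate under the assumptions above, not a measurement.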