DataTypeListValidator extraordinarily slow for long lists
---------------------------------------------------------

         Key: XERCESC-1363
         URL: http://issues.apache.org/jira/browse/XERCESC-1363
     Project: Xerces-C++
        Type: Bug
  Components: Validating Parser (Schema) (Xerces 1.5 or up only)  
    Versions: 2.5.0, 2.6.0    
 Environment: Windows 2000
    Reporter: David Earlam
    Priority: Minor


Validating an XML instance against a Schema that uses an unbounded xsd:list
type can take far more than O(n) processing time, where n is the number of
items in the list.

To reproduce use this Schema:

pq.xsd

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:pqns="http://swsis.cambridge.arm.com/~dearlam/xercestest/"
        targetNamespace="http://swsis.cambridge.arm.com/~dearlam/xercestest/"
        elementFormDefault="qualified" version="0.1">
        <xs:annotation>
                <xs:documentation xml:lang="en">
                XML schema for Hofstadter's Gödel pq-System.
                
                Test data for list data type validation.
         </xs:documentation>
        </xs:annotation>
        <xs:element name="pqData" type="pqns:pqDataType"></xs:element>
        <xs:complexType name="pqDataType">
                <xs:complexContent>
                        <xs:restriction base="xs:anyType">
                                <xs:sequence minOccurs="1" maxOccurs="1">
                                        <xs:element name="dashes" 
type="pqns:dashBlockType"></xs:element>
                                        <xs:element name="p" type="xs:string" 
nillable="true"></xs:element>
                                        <xs:element name="dashes" 
type="pqns:dashBlockType"></xs:element>
                                        <xs:element name="q" type="xs:string" 
nillable="true"></xs:element>
                                        <xs:element name="dashes" 
type="pqns:dashBlockType"></xs:element>
                                </xs:sequence>
                        </xs:restriction>
                </xs:complexContent>
        </xs:complexType>
        <xs:complexType name="porqType">
                <xs:simpleContent>
                        <xs:extension base="xs:string"></xs:extension>
                </xs:simpleContent>
        </xs:complexType>
        <xs:complexType name="dashBlockType">
                <xs:simpleContent>
                        <xs:extension base="pqns:dataDashes"></xs:extension>
                </xs:simpleContent>
        </xs:complexType>
        <xs:simpleType name="Dash">
                <xs:restriction base="xs:string">
                        <xs:pattern value="[\-]"></xs:pattern>
                </xs:restriction>
        </xs:simpleType>
        <xs:simpleType name="dataDashes">
                <xs:restriction base="pqns:DashList">
                        <xs:minLength value="0" />
                </xs:restriction>
        </xs:simpleType>
        <xs:simpleType name="DashList">
                <xs:list itemType="pqns:Dash"></xs:list>
        </xs:simpleType>
</xs:schema>

and this XML file

pqData0.xml

<?xml version="1.0" encoding="utf-8" ?> 
<pqData xmlns='http://swsis.cambridge.arm.com/~dearlam/xercestest/'
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://swsis.cambridge.arm.com/~dearlam/xercestest/
 http://swsis.cambridge.arm.com/~dearlam/xercestest/pq.xsd">
<dashes>
- -
</dashes>
<p/>
<dashes>-</dashes>
<q/>
<dashes>-</dashes>
</pqData>

(replacing swsis.cambridge.arm.com/~dearlam/xercestest with your location)

Then use 
  domprint -wfpp=on pqData0.xml
and
  domprint -n -s -wfpp=on pqData0.xml
to print the XML without and with schema validation.

Both runs finish in about the same short time, as expected.

Now save a copy of pqData0.xml as pqData1.xml and replace the line
- - 
with 4000 lines of
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
- - - - - - - - - - - - - - - - - - - - - - - - - - -

This gives a file of about 500 KB (which mimics my real data).
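For anyone who would rather generate the large file than edit it by hand, here is a minimal sketch. The function names (makeDashLine, makePqData, writePqData) are made up for this example, and the number of dashes per line is a parameter because the sample line above was wrapped by the mailer; 67 is only my reading of it.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// One line of list data: `dashes` single-dash items separated by single spaces.
std::string makeDashLine(std::size_t dashes) {
    std::string line;
    for (std::size_t i = 0; i < dashes; ++i) {
        if (i != 0) line += ' ';
        line += '-';
    }
    return line;
}

// Builds an instance document shaped like pqData0.xml, with `lines` lines of
// dashes in the first <dashes> element. `base` is the location that hosts
// pq.xsd and must end with '/'; it doubles as the target namespace.
std::string makePqData(const std::string& base, std::size_t lines,
                       std::size_t dashesPerLine) {
    assert(!base.empty() && base.back() == '/');
    const std::string dashLine = makeDashLine(dashesPerLine);

    std::string xml;
    xml += "<?xml version=\"1.0\" encoding=\"utf-8\" ?>\n";
    xml += "<pqData xmlns='" + base + "'\n";
    xml += "xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n";
    xml += "xsi:schemaLocation=\"" + base + " " + base + "pq.xsd\">\n";
    xml += "<dashes>\n";
    for (std::size_t i = 0; i < lines; ++i) {
        xml += dashLine;
        xml += '\n';
    }
    xml += "</dashes>\n<p/>\n<dashes>-</dashes>\n<q/>\n<dashes>-</dashes>\n";
    xml += "</pqData>\n";
    return xml;
}

// Writes the document to `path`; returns false if the stream reported an error.
bool writePqData(const std::string& path, const std::string& xml) {
    std::ofstream out(path.c_str(), std::ios::binary);
    out << xml;
    return static_cast<bool>(out);
}
```

Calling writePqData("pqData1.xml", makePqData(base, 4000, 67)) with your own base location should give a file in the same size range as the one described above.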

If you then try

  domprint -wfpp=on pqData1.xml
and
  domprint -n -s -wfpp=on pqData1.xml 
the first prints instantly (pipe it to NUL if you like), but the second
consumes 99% CPU for 230 seconds before it prints.

That's only about 2 KB per second!

--
(My suspicion is that XMLString::tokenizeString uses subString() in a way
that recalculates the string length far too many times...)
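To illustrate what I mean, here is a small standalone sketch. It is not the Xerces source: tokenizeRemeasuring, tokenizeSinglePass and subStringRemeasuring are hypothetical names, and the code only models the suspected pattern, where the helper that extracts each token first measures the whole source string. The charsMeasured counter shows how much length-scanning each variant does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

namespace {

bool isSpace(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// Hypothetical helper modelling the suspected behaviour: it measures the
// whole source string on every call before copying the requested range.
std::string subStringRemeasuring(const char* src, std::size_t start,
                                 std::size_t end, std::size_t& charsMeasured) {
    const std::size_t len = std::strlen(src);  // O(n) work per token
    charsMeasured += len;
    if (start > end || end > len) return std::string();
    return std::string(src + start, end - start);
}

}  // namespace

// Splits on whitespace, extracting each token through the re-measuring helper.
// With n tokens in a string of length L this scans roughly n * L characters.
std::vector<std::string> tokenizeRemeasuring(const char* src,
                                             std::size_t& charsMeasured) {
    assert(src != nullptr);
    std::vector<std::string> tokens;
    const std::size_t len = std::strlen(src);
    charsMeasured += len;
    std::size_t i = 0;
    while (i < len) {
        while (i < len && isSpace(src[i])) ++i;
        const std::size_t start = i;
        while (i < len && !isSpace(src[i])) ++i;
        if (i > start) {
            tokens.push_back(subStringRemeasuring(src, start, i, charsMeasured));
        }
    }
    return tokens;
}

// Same split, but the length is measured once and tokens are copied directly.
std::vector<std::string> tokenizeSinglePass(const char* src,
                                            std::size_t& charsMeasured) {
    assert(src != nullptr);
    std::vector<std::string> tokens;
    const std::size_t len = std::strlen(src);
    charsMeasured += len;
    std::size_t i = 0;
    while (i < len) {
        while (i < len && isSpace(src[i])) ++i;
        const std::size_t start = i;
        while (i < len && !isSpace(src[i])) ++i;
        if (i > start) tokens.push_back(std::string(src + start, i - start));
    }
    return tokens;
}
```

Both functions return the same tokens; only the counter differs. For a list of a few hundred thousand one-character items, the re-measuring variant ends up scanning tens of billions of characters, which would be in line with minutes of CPU time rather than milliseconds.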


kind regards,
David

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira

