[ http://issues.apache.org/jira/browse/XERCESC-1363?page=comments#action_60347 ]

David Earlam commented on XERCESC-1363:
---------------------------------------
I wrote "That's about 2 bytes per second !"
I meant "That's about 2 kilobytes per second !"

> DataTypeListValidator extraordinarily slow for long lists
> -----------------------------------------------------------
>
>          Key: XERCESC-1363
>          URL: http://issues.apache.org/jira/browse/XERCESC-1363
>      Project: Xerces-C++
>         Type: Bug
>   Components: Validating Parser (Schema) (Xerces 1.5 or up only)
>     Versions: 2.5.0, 2.6.0
>  Environment: Windows 2000
>     Reporter: David Earlam
>     Priority: Minor
>  Attachments: pq.zip
>
> Validating an XML instance against a Schema with an unbounded xsd:list type
> can take much greater than O(n) processing resources, where n is the number
> of items in the list.
> To reproduce use this Schema:
> pq.xsd
> <?xml version="1.0" encoding="utf-8" ?>
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
>            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>            xmlns:pqns="http://swsis.cambridge.arm.com/~dearlam/xercestest/"
>            targetNamespace="http://swsis.cambridge.arm.com/~dearlam/xercestest/"
>            elementFormDefault="qualified" version="0.1">
>   <xs:annotation>
>     <xs:documentation xml:lang="en">
>       XML schema for Hofstadter's Gödel pq-System.
>
>       Test data for list data type validation.
>     </xs:documentation>
>   </xs:annotation>
>   <xs:element name="pqData" type="pqns:pqDataType"></xs:element>
>   <xs:complexType name="pqDataType">
>     <xs:complexContent>
>       <xs:restriction base="xs:anyType">
>         <xs:sequence minOccurs="1" maxOccurs="1">
>           <xs:element name="dashes" type="pqns:dashBlockType"></xs:element>
>           <xs:element name="p" type="xs:string" xsi:nill="true"></xs:element>
>           <xs:element name="dashes" type="pqns:dashBlockType"></xs:element>
>           <xs:element name="q" type="xs:string" xsi:nill="true"></xs:element>
>           <xs:element name="dashes" type="pqns:dashBlockType"></xs:element>
>         </xs:sequence>
>       </xs:restriction>
>     </xs:complexContent>
>   </xs:complexType>
>   <xs:complexType name="porqType">
>     <xs:simpleContent>
>       <xs:extension base="xs:string"></xs:extension>
>     </xs:simpleContent>
>   </xs:complexType>
>   <xs:complexType name="dashBlockType">
>     <xs:simpleContent>
>       <xs:extension base="pqns:dataDashes"></xs:extension>
>     </xs:simpleContent>
>   </xs:complexType>
>   <xs:simpleType name="Dash">
>     <xs:restriction base="xs:string">
>       <xs:pattern value="[\-]"></xs:pattern>
>     </xs:restriction>
>   </xs:simpleType>
>   <xs:simpleType name="dataDashes">
>     <xs:restriction base="pqns:DashList">
>       <xs:minLength value="0" />
>     </xs:restriction>
>   </xs:simpleType>
>   <xs:simpleType name="DashList">
>     <xs:list itemType="pqns:Dash"></xs:list>
>   </xs:simpleType>
> </xs:schema>
> and this XML file
> pqData0.xml
> <?xml version="1.0" encoding="utf-8" ?>
> <pqData xmlns='http://swsis.cambridge.arm.com/~dearlam/xercestest/'
>         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>         xsi:schemaLocation="http://swsis.cambridge.arm.com/~dearlam/xercestest/
>         http://swsis.cambridge.arm.com/~dearlam/xercestest/pq.xsd">
>   <dashes>
>   - -
>   </dashes>
>   <p/>
>   <dashes>-</dashes>
>   <q/>
>   <dashes>-</dashes>
> </pqData>
> (replacing swsis.cambridge.arm.com/~dearlam/xercestest with your location)
> Then use
> domprint -wfpp=on pqData0.xml
> and
> domprint -n -s -wfpp=on pqData0.xml
> to print the
> XML non-validating and validating.
> Both print in an equally short time. OK.
> Now, save a copy of pqData0.xml as pqData1.xml and replace
> - -
> with 4000 lines of
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> This gives a 500 KB file (which mimics my real data).
> If you then try
> domprint -wfpp=on pqData1.xml
> and
> domprint -n -s -wfpp=on pqData1.xml
> the first prints instantly (pipe it to NUL if you like), but the second
> consumes 99% CPU for 230 seconds, then prints.
> That's about 2 bytes per second !
> --
> (My suspicion is XMLString::tokenizeString is using subString() to calculate
> the string length
> way too many times...)
> kind regards,
> David
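
For illustration only: the suspicion in the report is that the list tokenizer re-measures the string for every token, which would make the cost grow roughly with the square of the list length. The sketch below shows what a single-pass split on XML whitespace could look like. It uses std::string rather than Xerces' XMLCh strings, and the names tokenizeLinear and isXmlSpace are hypothetical, not part of the Xerces-C++ API. It is not taken from the Xerces source and is not a proposed patch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Separators for xsd:list items: space, tab, line feed, carriage return.
static bool isXmlSpace(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// Hypothetical helper (not the Xerces-C++ API): split a list value into
// its items in a single left-to-right pass.
std::vector<std::string> tokenizeLinear(const std::string& value) {
    std::vector<std::string> tokens;
    const std::size_t len = value.size();  // length is read once, up front
    std::size_t i = 0;
    while (i < len) {
        // Skip any run of separators before the next item.
        while (i < len && isXmlSpace(value[i])) ++i;
        const std::size_t start = i;
        // Advance to the end of the item.
        while (i < len && !isXmlSpace(value[i])) ++i;
        // Copy only the item's own characters; nothing is re-scanned.
        if (i > start) tokens.emplace_back(value, start, i - start);
    }
    return tokens;
}
```

Each character is visited a constant number of times, so the work is proportional to the length of the input. On the 500 KB pqData1.xml content this kind of loop should be far from the 230 seconds reported above, though that has not been measured here.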