[ http://issues.apache.org/jira/browse/XERCESC-1363?page=comments#action_60353 ] Christian Will commented on XERCESC-1363: -----------------------------------------
Hi,

The problem is that XMLString::subString(...) always calculates the length of the source string, even when that string is extremely long and we only copy a few characters from it. Here is my proposal to fix that: I removed the string-length call and instead check, inside the loop that copies the characters, whether we have reached the end of the source buffer. If we reach the end of the source buffer before reaching the end index, we throw an error. This should be much faster. I'll attach a patch file.

Regards,
Christian Will

> DataTypeListValidator extraordinarily slow for long lists
> -----------------------------------------------------------
>
> Key: XERCESC-1363
> URL: http://issues.apache.org/jira/browse/XERCESC-1363
> Project: Xerces-C++
> Type: Bug
> Components: Validating Parser (Schema) (Xerces 1.5 or up only)
> Versions: 2.5.0, 2.6.0
> Environment: Windows 2000
> Reporter: David Earlam
> Priority: Minor
> Attachments: pq.zip
>
> Validating an XML instance against a Schema with an unbounded xsd:list type
> can consume much more than O(n) processing resources, where n is the number
> of items in the list.
> To reproduce, use this Schema:
> pq.xsd
> <?xml version="1.0" encoding="utf-8" ?>
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xmlns:pqns="http://swsis.cambridge.arm.com/~dearlam/xercestest/"
> targetNamespace="http://swsis.cambridge.arm.com/~dearlam/xercestest/"
> elementFormDefault="qualified" version="0.1">
> <xs:annotation>
> <xs:documentation xml:lang="en">
> XML schema for Hofstadter's Gödel pq-System.
>
> Test data for list data type validation.
> </xs:documentation>
> </xs:annotation>
> <xs:element name="pqData" type="pqns:pqDataType"></xs:element>
> <xs:complexType name="pqDataType">
> <xs:complexContent>
> <xs:restriction base="xs:anyType">
> <xs:sequence minOccurs="1" maxOccurs="1">
> <xs:element name="dashes"
> type="pqns:dashBlockType"></xs:element>
> <xs:element name="p" type="xs:string"
> xsi:nill="true"></xs:element>
> <xs:element name="dashes"
> type="pqns:dashBlockType"></xs:element>
> <xs:element name="q" type="xs:string"
> xsi:nill="true"></xs:element>
> <xs:element name="dashes"
> type="pqns:dashBlockType"></xs:element>
> </xs:sequence>
> </xs:restriction>
> </xs:complexContent>
> </xs:complexType>
> <xs:complexType name="porqType">
> <xs:simpleContent>
> <xs:extension base="xs:string"></xs:extension>
> </xs:simpleContent>
> </xs:complexType>
> <xs:complexType name="dashBlockType">
> <xs:simpleContent>
> <xs:extension base="pqns:dataDashes"></xs:extension>
> </xs:simpleContent>
> </xs:complexType>
> <xs:simpleType name="Dash">
> <xs:restriction base="xs:string">
> <xs:pattern value="[\-]"></xs:pattern>
> </xs:restriction>
> </xs:simpleType>
> <xs:simpleType name="dataDashes">
> <xs:restriction base="pqns:DashList">
> <xs:minLength value="0" />
> </xs:restriction>
> </xs:simpleType>
> <xs:simpleType name="DashList">
> <xs:list itemType="pqns:Dash"></xs:list>
> </xs:simpleType>
> </xs:schema>
> and this XML file:
> pqData0.xml
> <?xml version="1.0" encoding="utf-8" ?>
> <pqData xmlns='http://swsis.cambridge.arm.com/~dearlam/xercestest/'
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:schemaLocation="http://swsis.cambridge.arm.com/~dearlam/xercestest/
> http://swsis.cambridge.arm.com/~dearlam/xercestest/pq.xsd">
> <dashes>
> - -
> </dashes>
> <p/>
> <dashes>-</dashes>
> <q/>
> <dashes>-</dashes>
> </pqData>
> (replacing swsis.cambridge.arm.com/~dearlam/xercestest with your location)
> Then use
> domprint -wfpp=on pqData0.xml
> and
> domprint -n -s -wfpp=on pqData0.xml
> to print the XML non-validating and validating.
> Both print equally quickly. OK.
> Now copy pqData0.xml to pqData1.xml and replace
> - -
> with 4000 lines of
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> This gives a 500 KB file (which mimics my real data).
> If you then try
> domprint -wfpp=on pqData1.xml
> and
> domprint -n -s -wfpp=on pqData1.xml
> the first prints instantly (pipe it to NUL if you like), but the second
> consumes 99% CPU for 230 seconds, then prints.
> That's only about 2 KB per second!
> --
> (My suspicion is that XMLString::tokenizeString calls subString(), and so
> recalculates the string length, way too many times...)
> kind regards,
> David