Hello Joseph, Thanks for your feedback.
It would also be nice if we could avoid maintaining two XMLStringBuffers when the normalized and non-normalized attribute values are equivalent. We'd have to investigate whether tracking this is costly. We've made a number of improvements to attribute processing in the last two releases, but there's certainly room for more. I just noticed that we're passing two parameters to scanAttributeValue which are never accessed by the method.

As for your first suggestion, there are documents out in the world (SVG, for instance) which have large attribute values. For such documents the parser may iterate over two large strings only to determine that it still needs to create a new String object, which would degrade performance.

On Tue, 10 Feb 2004, Joseph Shraibman wrote:
> I recently profiled a program of mine, and the top two stack traces were:
>
> CPU SAMPLES BEGIN (total = 4019) Tue Feb 10 01:23:18 2004
> rank   self  accum count trace method
>    1 17.54% 17.54%   705  4454 java.lang.String.<init>
>    2 17.52% 35.06%   704  4463 org.apache.xerces.xni.XMLString.toString
>    3  7.86% 42.92%   316  4459 java.lang.StringBuffer.toString
>    4  7.56% 50.49%   304  4458 java.lang.StringBuffer.<init>
>    5  7.51% 58.00%   302  4475 com.xtenit.xml.PathContentHandler.setCurrPath
>    6  7.39% 65.39%   297  4474 java.lang.StringBuffer.toString
>    7  6.87% 72.26%   276  4472 java.lang.StringBuffer.<init>
>    8  6.82% 79.07%   274  4468 com.xtenit.xml.PathContentHandler.setCurrPath
>    9  5.55% 84.62%   223  4469 java.lang.StringBuffer.expandCapacity
>   10  1.39% 86.02%    56  4473 java.lang.StringBuffer.expandCapacity
>
> TRACE 4454:
>     java.lang.String.<init>(String.java:199)
>     org.apache.xerces.xni.XMLString.toString(<Unknown>:Unknown line)
>     org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanAttribute(<Unknown>:Unknown line)
>     org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(<Unknown>:Unknown line)
>
> TRACE 4463:
>     org.apache.xerces.xni.XMLString.toString(<Unknown>:Unknown line)
>     org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanAttribute(<Unknown>:Unknown line)
>     org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(<Unknown>:Unknown line)
>     org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(<Unknown>:Unknown line)
>
> =================================
>
> I notice in scanAttribute():
>
>     scanAttributeValue(fTempString, fTempString2,
>                        fAttributeQName.rawname, attributes,
>                        attrIndex, isVC, fCurrentElement.rawname);
>     attributes.setValue(attrIndex, fTempString.toString());
>     attributes.setNonNormalizedValue(attrIndex,
>                                      fTempString2.toString());
>
> The only time fTempString is not the same as fTempString2 is when the
> value has either a non space whitespace char (\r\n\t) or there is an
> entity (&string;). The vast majority of the time they are in fact the
> same (at least with the xml I'm dealing with) so it seems to me we can
> get rid of one of the two toString() calls.
>
> There are two ways to do this:
> 1) in scanAttribute() compare the two XMLStrings.
> Something like this:
>
>     scanAttributeValue(fTempString, fTempString2,
>                        fAttributeQName.rawname, attributes,
>                        attrIndex, isVC, fCurrentElement.rawname);
>     String string1 = fTempString.toString();
>     attributes.setValue(attrIndex, string1);
>     String string2 = fTempString2.equals(string1) ? string1 :
>                      fTempString2.toString();
>     attributes.setNonNormalizedValue(attrIndex, string2);
>
> - or -
>
> 2) have scanAttributeValue() return a boolean indicating if it found an
> entity or a non-space whitespace char.
>
> I think 2 might be faster, but 1 is easier to implement and makes the
> code less messy. Thoughts?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]

---------------------------
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]
E-mail: [EMAIL PROTECTED]
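For comparison with the equals() check in option 1, here is a minimal sketch of option 2, in which the scan reports whether normalization changed anything. The caller then never compares two potentially large values (the SVG concern above) and builds a second String only when the two values really differ. This is a self-contained illustration, not Xerces code: AttrValueSketch, AttrValues, normalizeWhitespace and scan are hypothetical names, entity references are not handled, and line-end normalization (CR LF to LF) is assumed to have happened already.

```java
/**
 * Hypothetical sketch (not Xerces code) of option 2: the scanning routine
 * reports whether attribute-value normalization changed anything, so the
 * caller creates a second String only when the two values actually differ.
 * Assumes line-end normalization already happened upstream and ignores
 * entity references; only tab, LF and CR are replaced by a space.
 */
public final class AttrValueSketch {

    /** Holds the normalized value and the (possibly shared) non-normalized value. */
    static final class AttrValues {
        final String value;
        final String nonNormalizedValue;

        AttrValues(String value, String nonNormalizedValue) {
            this.value = value;
            this.nonNormalizedValue = nonNormalizedValue;
        }
    }

    /**
     * Copies raw[offset, offset + length) into out, replacing each tab, LF or CR
     * with a space. Returns true if at least one character was replaced.
     */
    static boolean normalizeWhitespace(char[] raw, int offset, int length, StringBuilder out) {
        out.setLength(0);
        boolean changed = false;
        for (int i = offset; i < offset + length; i++) {
            char c = raw[i];
            if (c == '\t' || c == '\n' || c == '\r') {
                out.append(' ');
                changed = true;
            } else {
                out.append(c);
            }
        }
        return changed;
    }

    /** Builds both attribute values, sharing one String when they are identical. */
    static AttrValues scan(char[] raw, int offset, int length, StringBuilder scratch) {
        boolean changed = normalizeWhitespace(raw, offset, length, scratch);
        String value = scratch.toString();
        // The flag stands in for a character-by-character comparison of two
        // possibly large values: no second String unless something changed.
        String nonNormalized = changed ? new String(raw, offset, length) : value;
        return new AttrValues(value, nonNormalized);
    }

    public static void main(String[] args) {
        StringBuilder scratch = new StringBuilder();

        char[] path = "M 10 10 L 20 20".toCharArray();
        AttrValues a = scan(path, 0, path.length, scratch);
        System.out.println(a.value == a.nonNormalizedValue); // true: one shared String

        char[] tabbed = "a\tb".toCharArray();
        AttrValues b = scan(tabbed, 0, tabbed.length, scratch);
        System.out.println(b.value);                         // a b
        System.out.println(b.value == b.nonNormalizedValue); // false: values differ
    }
}
```

The flag is computed while the characters are being copied anyway, so in the common unchanged case the extra cost is one boolean per value rather than a second pass. Whether tracking the equivalent state inside the real scanAttributeValue is equally cheap is the open question raised above.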
