Hello Joseph,

Thanks for your feedback.

It would also be nice if we could avoid maintaining two XMLStringBuffers
when the normalized and non-normalized attribute values are equivalent.
We'd have to investigate whether tracking this is costly. We've made a
number of improvements to attribute processing in the last two releases,
but there's certainly room for more. I just noticed we're passing two
parameters to scanAttributeValue which the method never accesses.

As for your first suggestion, there are documents (SVG for instance) out
in the world which have large attribute values. For those, the parser may
iterate over two large strings only to find that they differ, so it still
needs to create a second String object. The comparison would be pure
overhead and would degrade performance for such documents.
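
To make the alternative concrete, here is a minimal, self-contained sketch
of the flag-based idea (your option 2), where the scanning step itself
reports whether normalization changed anything, so no second pass over the
value is needed. This is not Xerces code: AttributeValueSketch,
ScannedValue, normalize and scan are hypothetical names, and the sketch
only handles whitespace normalization, not entity references.

```java
/** Sketch only: none of these names come from Xerces. */
final class AttributeValueSketch {

    /** Both forms of one attribute value (hypothetical holder type). */
    static final class ScannedValue {
        final String normalized;
        final String nonNormalized;

        ScannedValue(String normalized, String nonNormalized) {
            this.normalized = normalized;
            this.nonNormalized = nonNormalized;
        }
    }

    /**
     * Copies ch[offset .. offset+length) into out, replacing tab, line feed
     * and carriage return with a space. Returns true if any character was
     * replaced, i.e. if the normalized value differs from the raw one.
     * Entity references are deliberately not handled in this sketch.
     */
    static boolean normalize(char[] ch, int offset, int length, StringBuilder out) {
        boolean changed = false;
        for (int i = offset; i < offset + length; i++) {
            char c = ch[i];
            if (c == '\t' || c == '\n' || c == '\r') {
                out.append(' ');
                changed = true;
            } else {
                out.append(c);
            }
        }
        return changed;
    }

    /** Builds both values, sharing one String when nothing was normalized. */
    static ScannedValue scan(char[] ch, int offset, int length) {
        StringBuilder buffer = new StringBuilder(length);
        boolean changed = normalize(ch, offset, length, buffer);
        String normalized = buffer.toString();
        // Only pay for a second String when the two values really differ.
        String nonNormalized = changed ? new String(ch, offset, length) : normalized;
        return new ScannedValue(normalized, nonNormalized);
    }

    public static void main(String[] args) {
        char[] plain = "0 0 100 100".toCharArray();
        ScannedValue a = scan(plain, 0, plain.length);
        System.out.println(a.normalized == a.nonNormalized); // true: one shared String

        char[] tabbed = "M0,0\tL10,10".toCharArray();
        ScannedValue b = scan(tabbed, 0, tabbed.length);
        System.out.println(b.normalized == b.nonNormalized); // false: two Strings
        System.out.println(b.normalized);                    // M0,0 L10,10
    }
}
```

The flag costs one boolean write per replaced character, and the large
values are never compared after the fact. Whether the same bookkeeping is
cheap inside the real scanner is exactly what would need measuring.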

On Tue, 10 Feb 2004, Joseph Shraibman wrote:

> I recently profiled a program of mine, and the top two stack traces were:
>
>
> CPU SAMPLES BEGIN (total = 4019) Tue Feb 10 01:23:18 2004
> rank   self  accum   count trace method
>     1 17.54% 17.54%     705  4454 java.lang.String.<init>
>     2 17.52% 35.06%     704  4463 org.apache.xerces.xni.XMLString.toString
>     3  7.86% 42.92%     316  4459 java.lang.StringBuffer.toString
>     4  7.56% 50.49%     304  4458 java.lang.StringBuffer.<init>
>     5  7.51% 58.00%     302  4475 com.xtenit.xml.PathContentHandler.setCurrPath
>     6  7.39% 65.39%     297  4474 java.lang.StringBuffer.toString
>     7  6.87% 72.26%     276  4472 java.lang.StringBuffer.<init>
>     8  6.82% 79.07%     274  4468 com.xtenit.xml.PathContentHandler.setCurrPath
>     9  5.55% 84.62%     223  4469 java.lang.StringBuffer.expandCapacity
>    10  1.39% 86.02%      56  4473 java.lang.StringBuffer.expandCapacity
>
> TRACE 4454:
>          java.lang.String.<init>(String.java:199)
>          org.apache.xerces.xni.XMLString.toString(<Unknown>:Unknown line)
>          org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanAttribute(<Unknown>:Unknown line)
>          org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(<Unknown>:Unknown line)
>
> TRACE 4463:
>          org.apache.xerces.xni.XMLString.toString(<Unknown>:Unknown line)
>          org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanAttribute(<Unknown>:Unknown line)
>          org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(<Unknown>:Unknown line)
>          org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(<Unknown>:Unknown line)
>
> =================================
>
> I notice in scanAttribute():
>
>          scanAttributeValue(fTempString, fTempString2,
>                             fAttributeQName.rawname, attributes,
>                             attrIndex, isVC,fCurrentElement.rawname);
>          attributes.setValue(attrIndex, fTempString.toString());
>          attributes.setNonNormalizedValue(attrIndex,
>                             fTempString2.toString());
>
>
> The only time fTempString is not the same as fTempString2 is when the
> value has either a non-space whitespace char (\r, \n, \t) or there is an
> entity (&string;).  The vast majority of the time they are in fact the
> same (at least with the xml I'm dealing with) so it seems to me we can
> get rid of one of the two toString() calls.
>
> There are two ways to do this:
> 1) in scanAttribute() compare the two XMLStrings.
> Something like this:
>
>          scanAttributeValue(fTempString, fTempString2,
>                             fAttributeQName.rawname, attributes,
>                             attrIndex, isVC,fCurrentElement.rawname);
>          String string1 = fTempString.toString();
>          attributes.setValue(attrIndex, string1);
>          String string2 = fTempString2.equals(string1)
>                  ? string1 : fTempString2.toString();
>          attributes.setNonNormalizedValue(attrIndex, string2);
>
> - or -
>
> 2) have scanAttributeValue() return a boolean indicating if it found an
> entity or a non-space whitespace char.
>
> I think 2 might be faster, but 1 is easier to implement and makes the
> code less messy.  Thoughts?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]

---------------------------
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]
