RE: Performance degradation

Alberto Massari Wed, 05 May 2004 03:38:10 -0700

Thanks for the analysis; I'll see what I can do to fix the degradation.

Alberto

At 11.30 04/05/2004 -0500, Qiu, Wenning wrote:

About half of the performance degradation in DOMWriterImpl was due to namespace handling. Even applications that do not use namespaces have to pay the price for it.

The overhead comes from the data structure used to handle namespaces: a stack of hash maps of namespace bindings. The current implementation creates a hash map for every element node and pushes it onto the stack as it traverses the DOM tree. My test case did not use any namespaces, yet half of DOMWriterImpl's 65% performance drop came from constructing and destroying this stack of empty hash maps.

I wonder if the following optimization is possible:

1) Both DOMParser and DOMBuilder have a DoNamespace feature. Is it possible for the DOMDocument to carry that feature flag, so that DOMWriter can bypass namespace handling completely when it knows the DOM does not use namespaces?

2) Even when namespaces are used, it seems highly inefficient to create and destroy a hash map for each DOMElement node. One is only necessary when that node introduces a new namespace binding.

It seems to me there are definitely places where we can optimize DOMWriter's performance.

After replacing DOMWriter with my own serializer, my application built with xercesc.2.5.0, STLport.4.6.2, libhoard.2.1.2d and expat.1.95.7 runs about 10% faster than the same code built with xercesc.2.1.0, STLport.4.5.3, libhoard.2.1.0 and expat.1.95.4. I hope I can drop my serializer once DOMWriter becomes more efficient.

-Wenning Qiu

-----Original Message-----
From: Qiu, Wenning [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 28, 2004 5:11 PM
To: [EMAIL PROTECTED]
Subject: RE: Performance degradation

I appreciate the feedback on this subject; it has been very helpful.

I profiled my application with Quantify and would like to report the results I obtained.

I compared xercesc.2.1.0 with xercesc.2.5.0, both built with STLport.4.5.3. libhoard was taken out since it would not work with Quantify. The profiled program is single-threaded; it parses XML messages or builds a DOM, then serializes the DOM using xercesc::DOMWriter.

I noticed only a slight performance drop for parsing (~0.5%) and DOM building (~3%). The bottleneck turns out to be serialization, where performance degrades by around 60%.

DOMWriterImpl::writeNode() used 75,800,391 cycles with Xerces.2.1.0 and 124,670,672 cycles with Xerces.2.5.0, i.e. it is 64.5% slower.

The major contributors are:

1) XMLFormatter::operator << (const unsigned short*const) used 39,802,051 cycles with Xerces.2.1.0 and 41,861,068 cycles with Xerces.2.5.0, i.e. 5.2% slower.

2) XMLFormatter::operator << (const unsigned short) used 16,760,000 cycles with Xerces.2.1.0 and 18,032,000 cycles with Xerces.2.5.0, i.e. 7.6% slower.

3) Some new functions in Xerces.2.5.0 also contributed:
   o xercesc_2_5::RefHashTableOf<unsigned short>::RefHashTableOf(): 16.8% of total cycles. Almost all of the time is spent calling MemoryManagerImpl::allocate() and XMemory::operator new().
   o xercesc_2_5::XMLFormatter::specialFormat(): 9.33% of total cycles.
   o xercesc_2_5::BaseRefVectorOf<xercesc_2_5::RefHashTableOf<unsigned short> >::removeLastElement(): 9.23% of total cycles.
   o xercesc_2_5::XMemory::operator new(): 4.06% of total cycles.

In addition to the degradation in processing time, xerces.2.5.0 does not seem to scale beyond two threads when DOM serialization is involved.


-----Original Message-----
From: Karande Samir [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 28, 2004 2:00 PM
To: '[EMAIL PROTECTED]'
Subject: RE: Performance degradation


Hi Wenning,
        I have seen similar performance degradation in the past (though not
because of a Xerces upgrade). It turned out that some of the components I
was using had their own malloc/new implementations, and Hoard could not
override the malloc/new calls made through those components.
Unfortunately, most of the memory management schemes in third-party
components are not optimized for SMP systems, and you would start seeing a
lot of context switching between threads/processes due to idle waits in
malloc.
        If the new memory allocation scheme in Xerces is preventing Hoard
from taking over memory management, it's likely that you would see
performance degradation. Maybe we want to supply libc's new/malloc to the
Xerces parser calls explicitly, if that's possible.

I hope this helps.

-Samir

-----Original Message-----
From: Qiu, Wenning [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 28, 2004 2:47 PM
To: [EMAIL PROTECTED]
Subject: RE: Performance degradation

Yes, I am using a Solaris box with four processors.

-----Original Message-----
From: Karande Samir [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 28, 2004 12:35 PM
To: '[EMAIL PROTECTED]'
Subject: RE: Performance degradation


Hi Wenning,
        Do you use multiprocessor (SMP) systems? I am assuming an SMP system
because you are using Hoard.
-Samir

-----Original Message-----
From: Qiu, Wenning [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 27, 2004 5:16 PM
To: [EMAIL PROTECTED]
Subject: RE: Performance degradation

Hi Neil,

You are right; the "per-document memory heap" in Xerces's DOM implementation
is still there. I missed that when I looked at the source code.

My application accepts XML messages, parses them into DOM, converts the DOM
to messages in a proprietary format, and sends them to a server. When
response messages come back from the server, my application parses the
proprietary response messages, builds a DOM, then serializes the DOM into an
XML byte stream and sends it out.

We use expat for parsing because of its better performance, so the
functionality we actually use from Xerces is DOM building and
serialization. Validation is not involved.


Thanks,
Wenning Qiu

-----Original Message-----
From: Neil Graham [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 27, 2004 2:56 PM
To: [EMAIL PROTECTED]
Subject: RE: Performance degradation

Hi Wenning,

If by the "per-document memory heap" you're referring to the way Xerces's
DOM implementation works, then nothing has changed since 2.2.  The same
memory paradigm is used.

The pluggable memory management certainly does introduce overhead:  every
time the parser needs memory, it has to reach out to a virtual function
instead of directly calling the system libraries.  Some work was done to
mitigate the parser's habit of creating and destroying short-lived objects
in the 2.4 time-frame, and this bought back a good portion of the
performance that the pluggable memory scheme cost.

To be more helpful, we'd have to understand the characteristics of your
application.  I conjecture you're DOM-based; do you do any validation?  If
so, then some of the grammar caching/persistence capabilities introduced
since 2.3 might be helpful to you.

Cheers,
Neil
Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  [EMAIL PROTECTED]

"Qiu, Wenning"

<[EMAIL PROTECTED]>
04/27/2004 03:29 PM
Please respond to xerces-c-dev

To: <[EMAIL PROTECTED]>
cc:
Subject: RE: Performance degradation


I have yet to Quantify my application, but from a brief look at the
xercesc.2.3.0 source code, it seems that the per-document memory heap is
gone with the introduction of the pluggable memory manager. The default
memory manager just turns around and calls new() and delete(), which means
higher overhead for handling a large number of small objects. I suspect
that the default memory manager causes the performance degradation, but I
have to wait until my application is quantified to prove it.

Is the per-document memory heap logic provided somewhere in the source
distribution as a memory manager implementation? It seems more reasonable
to provide that as the default memory manager.

      -----Original Message-----
      From: Jesse Pelton [mailto:[EMAIL PROTECTED]
      Sent: Tuesday, April 27, 2004 9:24 AM
      To: [EMAIL PROTECTED]
      Subject: RE: Performance degradation

      Hmm. I wonder if the pluggable memory manager introduced in 2.3 is
      responsible for the degradation. If I understand your benchmarks
      correctly, changing from Xerces 2.2 or earlier to 2.3 or later
      results in a 28% decrease in message throughput, from 50/sec to
      36/sec. That's pretty serious.

      Can you profile your application to see if there are any obvious
      bottlenecks in Xerces or elsewhere? Knowing where the problem lies
      would help you and/or the maintainers address it.

       From: Qiu, Wenning [mailto:[EMAIL PROTECTED]
       Sent: Tuesday, April 27, 2004 9:59 AM
       To: [EMAIL PROTECTED]
       Subject: Performance degradation

Hi, All


             We have observed a performance degradation when upgrading some
             third-party packages in our production systems.


             We are currently using xercesc.2.1.0 with STLport.4.5.3,
             libhoard.2.1.0 and expat.1.95.4. We are looking at upgrading
             to xercesc.2.5.0, STLport.4.6.2, libhoard.2.1.2d and
             expat.1.95.7.


             Our current production code can process about 40 messages per
             CPU-second in our test environment, while the new build with
             all new third-party packages can do only 36 per CPU-second.
             However, when built with xercesc.2.1.0 (or 2.2.0),
             STLport.4.6.2, libhoard.2.1.2d and expat.1.95.7, it can handle
             close to 50 messages per CPU-second.
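Since these figures are throughput in messages per CPU-second, they can also be read as CPU time per message, t = 1/throughput. Using only the numbers above, the Xerces change alone (50 vs. 36 messages per CPU-second, with the other packages held at their new versions) is a 28% throughput drop, which is the same as roughly 39% more CPU time per message:

```latex
\begin{aligned}
t &= \frac{1}{\text{throughput}} \\
t_{\text{2.1.0/2.2.0}} &= \tfrac{1}{50}\ \text{s} = 20\ \text{ms} \\
t_{\text{current}} &= \tfrac{1}{40}\ \text{s} = 25\ \text{ms} \\
t_{\text{all new}} &= \tfrac{1}{36}\ \text{s} \approx 27.8\ \text{ms} \\
1 - \tfrac{36}{50} &= 0.28 \quad \text{(throughput drop)} \\
\tfrac{t_{\text{all new}}}{t_{\text{2.1.0/2.2.0}}} &= \tfrac{50}{36} \approx 1.39 \quad \text{(CPU time per message)}
\end{aligned}
```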


             We have tested all xercesc releases since 2.1.0; the
             performance drop appears to start with 2.3.0 and persists
             through the latest release.


             Is there a way to turn off the unwanted features in the new
             releases so that good performance is retained?


             Does anybody know when performance will be addressed in
             future releases?


             For now it looks like we can move up to 2.2.0 at most, since
             performance is of great importance for our system.

Thanks for any feedback.


             Wenning Qiu
             CSG Systems Inc.
             Phone: (402)963-8364
             Email: [EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


______________________________________________________________________
  This email message has been scanned by PineApp Mail-Secure and has been
found clean.
