Sample code improvement

Daniel J. Pyra Wed, 27 Feb 2002 03:27:14 -0800

Hi XercesC community!

I would like to announce an improvement to the DOMPrint sample program (for
XercesC 1.6.0).
DOMPrint works well, but it slows down significantly for documents in which a
single element holds a very large amount of text data (> 2 MB). Putting so much
data in one element is probably not a good idea, but the existing system I
must deal with leaves me no choice: my interface receives large files encoded
in Base64. I have adapted the DOMPrint program so that it can write XML
documents to any C++ stream.
In DOMPrint.cpp, inside the function
ostream& operator<<(ostream& target, DOM_Node& toWrite), the call
gFormatter->formatBuf("large buffer", "large count", ...)
is the bottleneck. It seems better to divide the large buffer into smaller
pieces and format each piece separately.


***
// original source:

...
unsigned long lent = nodeValue.length() ;

switch (toWrite.getNodeType())
{
    case DOM_Node::TEXT_NODE:
    {
        gFormatter->formatBuf(nodeValue.rawBuffer(), lent,
                              XMLFormatter::CharEscapes) ;
        break ;
    }

...
***

***
// suggested source:

...
unsigned long lent = nodeValue.length() ;

switch (toWrite.getNodeType())
{
    case DOM_Node::TEXT_NODE:
    {
        /*
        Index of the first character of the portion of text data
        currently being written to the stream
        */
        unsigned long ind = 0UL ;

        /*
        Text values are written out in portions of at most
        XERCESC_XMLFRMBUF_SIZE characters, so writing one "large" text node
        behaves like writing several smaller ones.
        */
        while( ind + XERCESC_XMLFRMBUF_SIZE < lent )
        {
            gFormatter->formatBuf(&nodeValue.rawBuffer()[ind],
                                  XERCESC_XMLFRMBUF_SIZE,
                                  XMLFormatter::CharEscapes) ;
            ind += XERCESC_XMLFRMBUF_SIZE ;
        }

        // Write only the remaining (lent - ind) characters; passing lent
        // here would read past the end of the buffer.
        gFormatter->formatBuf(&nodeValue.rawBuffer()[ind], lent - ind,
                              XMLFormatter::CharEscapes) ;
        break ;
    }

...
***

After analysing the source of the XMLFormatter class, I set
XERCESC_XMLFRMBUF_SIZE to 65536, which seems to suit its internal
transcoding buffer. With this change, writing XML documents that contain an
element with a very large amount of text data no longer slows down.
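The post does not show where XERCESC_XMLFRMBUF_SIZE is defined. A minimal sketch, assuming it is a plain constant of the same type as lent, together with the number of formatBuf() calls the suggested loop makes for a text node: one call per started chunk, and a single call even for empty text. For a 2 MB node of 2,097,152 characters this gives 32 calls. formatBufCalls is a hypothetical helper for illustration only.

```cpp
// Assumed definition; the post only states the value 65536.
const unsigned long XERCESC_XMLFRMBUF_SIZE = 65536UL;

// Hypothetical helper: how many times the suggested loop calls
// gFormatter->formatBuf() for a text node of 'lent' characters.
// The while loop writes full chunks while more than one chunk remains;
// the final call then writes the last 1..XERCESC_XMLFRMBUF_SIZE
// characters (0 characters only when the text is empty).
unsigned long formatBufCalls(unsigned long lent)
{
    if (lent == 0)
        return 1;  // the final call is still made, with a count of 0
    // Ceiling division written to avoid overflow for very large lent.
    return 1 + (lent - 1) / XERCESC_XMLFRMBUF_SIZE;
}
```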

All the documents I process are encoded in UTF-8. I have tested the code on
Windows 2000 (VC++) and HP-UX 11 (aC++) with XercesC 1.6.0. Any ideas,
comments, or further improvements are welcome.

Daniel Pyra
Software Engineer
[EMAIL PROTECTED]
Poland
