Re: Sample code improvement

PeiYong PY Zhang Thu, 07 Mar 2002 15:11:41 -0800

Daniel Pyra,

>gFormatter->formatBuf("large buffer", "large count",.) is the bottle neck.


   Could you please elaborate on why this function is the bottleneck?

>After analysing source of XMLFormatter class I assigned for
>XERCESC_XMLFRMBUF_SIZE value 65536, which looks suit for internal transcoding
>buffer

XMLFormatter::formatBuf() sends transcodeTo() a block of data no larger than
kTmpBufSize (which is 16 * 1024 = 2^14) at a time, as shown below:

                const unsigned int srcCount = tmpPtr - srcPtr;
                const unsigned srcChars = srcCount > kTmpBufSize ?
                                          kTmpBufSize : srcCount;

                const unsigned int outBytes = fXCoder->transcodeTo
                (
                    srcPtr
                    , srcChars
                    , fTmpBuf
                    , kTmpBufSize
                    , charsEaten
                    , unRepOpts
                );

By limiting each formatBuf() call to 65536, you reduce the number of
transcodeTo() invocations made within a single formatBuf() call, but the overall
number of transcodeTo() invocations stays the same once your loop around
formatBuf() is taken into account.
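
To make the counting argument concrete, here is a minimal sketch that models only the size cap shown above. It is not Xerces code: kInnerCap stands in for kTmpBufSize, and both function names are hypothetical. It also ignores that the real formatBuf() splits runs at characters that need escaping, so treat it as an illustration of the arithmetic only.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: mirrors kTmpBufSize from the snippet above (16 * 1024).
const std::size_t kInnerCap = 16 * 1024;

// Hypothetical model of a single formatBuf() call: counts the transcodeTo()
// calls needed when each call accepts at most kInnerCap source characters.
std::size_t transcodeCallsForOneFormatBuf(std::size_t count)
{
    std::size_t calls = 0;
    while (count > 0) {
        const std::size_t n = count > kInnerCap ? kInnerCap : count;
        count -= n;
        ++calls;
    }
    return calls;
}

// Hypothetical model of a caller that splits `total` characters into pieces of
// at most `chunk` and calls formatBuf() once per piece. The last call is given
// only the remaining tail (total - ind).
std::size_t transcodeCallsChunked(std::size_t total, std::size_t chunk)
{
    assert(chunk > 0);
    std::size_t calls = 0;
    std::size_t ind = 0;
    while (ind + chunk < total) {
        calls += transcodeCallsForOneFormatBuf(chunk);
        ind += chunk;
    }
    calls += transcodeCallsForOneFormatBuf(total - ind);
    return calls;
}
```

Under these assumptions, a 3 MB text node (3 * 1024 * 1024 characters) needs the same number of capped transcodeTo() calls whether it is passed in one piece or in 65536-character pieces, because 65536 is an exact multiple of the inner cap.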

>All documents I process are encoded in UTF-8. I have tested the code in
>Windows 2000 (VC++) and HP-UX 11 (a++) environments with XercesC 1.6.0. Any
>ideas, comments, further improvements are welcome.

   Did you try your enhancement on DOMPrint (not your application code) and
test it against your sample XML files? Is there any significant improvement? If
so, would you mind sending us your sample XML files? Thanks.

PeiYong



"Daniel J. Pyra" wrote:

> Hi XercesC community!
>
> I would like to announce an improvement for DOMPrint sample program (for
> XercesC 1.6.0).
> DOMPrint works pretty well, but for documents with a tag storing very large
> portion of text data (> 2MB) it slows down significantly. Probably it is not
> a good idea to put such data in a single tag, but in existing system which I
> must deal with, I have no choice - my interface receives large files saved
> in Base64 format. I have adopted DOMPrint program, so I can write XML
> documents in any C++ stream.
> In function ostream& operator<<(ostream& target, DOM_Node& toWrite) in
> DOMPrint.cpp file a call
> gFormatter->formatBuf("large buffer", "large count",.)
> is the bottle neck. Probably it is a good idea to divide "large buffer" into
> pieces.
>
> ***
> // original source:
>
> ...
> unsigned long lent = nodeValue.length() ;
>
> switch (toWrite.getNodeType())
> {
>     case DOM_Node::TEXT_NODE:
>     {
>         gFormatter->formatBuf(nodeValue.rawBuffer(), lent,
> XMLFormatter::CharEscapes) ;
>         break ;
>     }
>
> ...
> ***
>
> ***
> // suggested source:
>
> ...
> unsigned long lent = nodeValue.length() ;
>
> switch (toWrite.getNodeType())
> {
>     case DOM_Node::TEXT_NODE:
>     {
>         /*
>         Index of the beginning of data portion of tag being put in a stream
>         */
>         unsigned long ind = 0L ;
>
>         /*
>         Tag values are being written out in portions <=
> XERCESC_XMLFRMBUF_SIZE. Then putting "large" tags does not differ from
> multiply putting smaller tags.
>         */
>         while( ind + XERCESC_XMLFRMBUF_SIZE < lent )
>         {
>             gFormatter->formatBuf(&nodeValue.rawBuffer()[ind],
> XERCESC_XMLFRMBUF_SIZE, XMLFormatter::CharEscapes) ;
>             ind += XERCESC_XMLFRMBUF_SIZE ;
>         }
>
>         /*
>         Write the remaining tail: only (lent - ind) characters are left.
>         */
>         gFormatter->formatBuf(&nodeValue.rawBuffer()[ind], lent - ind,
> XMLFormatter::CharEscapes) ;
>         break ;
>     }
>
> ...
> ***
>
> After analysing source of XMLFormatter class I assigned for
> XERCESC_XMLFRMBUF_SIZE value 65536, which looks suit for internal
> transcoding buffer. Now there is no decreasing of speed while calling
> function for XML documents with a tag storing very large portion of text
> data.
>
> All documents I process are encoded in UTF-8. I have tested the code in
> Windows 2000 (VC++) and HP-UX 11 (a++) environments with XercesC 1.6.0. Any
> ideas, comments, further improvements are welcome.
>
> Daniel Pyra
> Software Engineer
> [EMAIL PROTECTED]
> Poland
>
>   ------------------------------------------------------------------------
>                    Name: domprint.txt
>    domprint.txt    Type: Plain Text (text/plain)
>                Encoding: quoted-printable
>
>   ------------------------------------------------------------------------
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]


