Hi XercesC community!
I would like to announce an improvement to the DOMPrint sample program (for
XercesC 1.6.0).
DOMPrint works pretty well, but for documents with a tag storing a very large
portion of text data (> 2MB) it slows down significantly. Putting such data in
a single tag is probably not a good idea, but in the existing system I must
deal with I have no choice: my interface receives large files saved in Base64
format. I have adapted the DOMPrint program so that I can write XML documents
to any C++ stream.
In the function ostream& operator<<(ostream& target, DOM_Node& toWrite) in
DOMPrint.cpp, the call
gFormatter->formatBuf("large buffer", "large count", ...)
is the bottleneck. It is probably a good idea to divide "large buffer" into
pieces.
***
// original source:
...
unsigned long lent = nodeValue.length() ;
switch (toWrite.getNodeType())
{
case DOM_Node::TEXT_NODE:
{
gFormatter->formatBuf(nodeValue.rawBuffer(), lent,
XMLFormatter::CharEscapes) ;
break ;
}
...
***
***
// suggested source:
...
unsigned long lent = nodeValue.length() ;
switch (toWrite.getNodeType())
{
case DOM_Node::TEXT_NODE:
{
/*
Index of the start of the portion of the tag value that is written to the
stream next
*/
unsigned long ind = 0L ;
/*
Tag values are written out in portions <= XERCESC_XMLFRMBUF_SIZE, so
writing a "large" tag is no different from writing several smaller tags.
*/
while( ind + XERCESC_XMLFRMBUF_SIZE < lent )
{
gFormatter->formatBuf(&nodeValue.rawBuffer()[ind],
XERCESC_XMLFRMBUF_SIZE, XMLFormatter::CharEscapes) ;
ind += XERCESC_XMLFRMBUF_SIZE ;
}
// write the remaining tail: only lent - ind characters are left
gFormatter->formatBuf(&nodeValue.rawBuffer()[ind], lent - ind,
XMLFormatter::CharEscapes) ;
break ;
}
...
***
After analysing the source of the XMLFormatter class I set
XERCESC_XMLFRMBUF_SIZE to 65536, which looks suitable for the internal
transcoding buffer. With this change there is no longer a slowdown when
writing XML documents with a tag storing a very large portion of text
data.
All documents I process are encoded in UTF-8. I have tested the code in
Windows 2000 (VC++) and HP-UX 11 (aC++) environments with XercesC 1.6.0. Any
ideas, comments, or further improvements are welcome.
Daniel Pyra
Software Engineer
[EMAIL PROTECTED]
Poland