DOM performance/coding issue

Erik Rydgren Mon, 15 Sep 2003 01:14:24 -0700

As previously stated, getElementsByTagNameNS is the bad boy in your
code. What the previous writers didn't explain is why.


getElementsByTagNameNS operates on ALL nodes in a subtree. It therefore
has to traverse the whole subtree and compare the tag name of EVERY
element against your string. If your document is large, you get the
picture. It is as inefficient as doing a table scan for each database
lookup. Of course, if you can't make any assumptions about what your
data structure looks like, getElementsByTagNameNS is very powerful. But
in your case it sounds like you already know what your data structure
looks like, so you can write more efficient code by doing the iteration
and comparison yourself.

Something like this (I have used a pseudo function equal to clarify the
code):

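DOMNode* pNode = m_pDocument->getDocumentElement()->getFirstChild();
while (pNode) {
  // Only element nodes can be PatchData; this skips text and comments.
  if (pNode->getNodeType() == DOMNode::ELEMENT_NODE &&
      equal(pNode->getNodeName(), "PatchData")) {
    DOMNode* pInnerNode = pNode->getFirstChild();
    while (pInnerNode) {
      if (pInnerNode->getNodeType() == DOMNode::ELEMENT_NODE &&
          equal(pInnerNode->getNodeName(), "CbName")) {
        // An element has no node value of its own; the string is in
        // its child text node.
        DOMNode* pText = pInnerNode->getFirstChild();
        if (pText && equal(pText->getNodeValue(), "your searchstring")) {
          // Bingo!!! We found it, process pNode
          break; // No need to search for more inner nodes
        }
      }
      pInnerNode = pInnerNode->getNextSibling(); // advance, or we loop forever
    }
  }
  pNode = pNode->getNextSibling();
}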
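
To put rough numbers on the difference, here is a small self-contained
sketch. It does not use Xerces at all: Node, makeDocument,
countSubtreeCompares and countChildCompares are made-up names for a toy
tree, and the root element name "Root" is an assumption because your
sample XML does not show one. It only counts how many tag name
comparisons each approach needs for a document shaped like yours.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a DOM element. This is NOT the Xerces DOMNode class.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Subtree search: every descendant element has its name compared, which
// is roughly the work a getElementsByTagNameNS-style lookup has to do.
std::size_t countSubtreeCompares(const Node& root) {
    std::size_t compares = 0;
    for (const Node& child : root.children) {
        ++compares;                               // compare this element's name
        compares += countSubtreeCompares(child);  // then descend into it
    }
    return compares;
}

// Child walk: only the direct children of one parent are compared.
std::size_t countChildCompares(const Node& parent) {
    return parent.children.size();
}

// Builds a hypothetical root with `patches` PatchData children, each
// holding the five leaf elements from the sample XML.
Node makeDocument(std::size_t patches) {
    Node patch{"PatchData", {}};
    for (const char* leaf : {"CbName", "Location", "Type", "Width", "Enabled"}) {
        patch.children.push_back(Node{leaf, {}});
    }
    Node root{"Root", {}};
    root.children.assign(patches, patch);
    return root;
}
```

With makeDocument(1000), countSubtreeCompares gives 6000 (each PatchData
plus its five children), while countChildCompares on the root gives
1000. The gap grows with every extra level or field in the document.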

And if you know that you will never see anything other than PatchData
elements under the root node, you don't have to compare the tag name
for those either.
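
A minimal sketch of that shortcut, again on a toy tree rather than the
Xerces classes: Element and findPatches are hypothetical names, and the
code assumes every child of the root is a PatchData element. Since you
said you want the whole set of matches, it collects them all instead of
stopping at the first one.

```cpp
#include <string>
#include <vector>

// Toy element with its text content stored inline. NOT a Xerces type.
struct Element {
    std::string name;
    std::string text;
    std::vector<Element> children;
};

// Returns every child of `root` whose CbName child has the text `cbName`.
// The outer loop does no name comparison at all, on the assumption that
// only PatchData elements live directly under the root.
std::vector<const Element*> findPatches(const Element& root,
                                        const std::string& cbName) {
    std::vector<const Element*> matches;
    for (const Element& patch : root.children) {
        for (const Element& field : patch.children) {
            if (field.name == "CbName") {
                if (field.text == cbName) {
                    matches.push_back(&patch);
                }
                break;  // CbName found; skip the remaining fields
            }
        }
    }
    return matches;
}
```

The returned pointers refer into the tree passed in, so they are only
valid for as long as that tree is alive and unmodified.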

Regards
Erik Rydgren
Mandarin FS
Sweden

> -----Original Message-----
> From: David Hoffer [mailto:[EMAIL PROTECTED]
> Sent: den 15 september 2003 00:06
> To: [EMAIL PROTECTED]
> Subject: DOM performance/coding issue
> 
> I have a question about how to effectively parse a large DOM document.
> For example, I have a lot of 'PatchData' elements.  I am looking for the
> set of these where the child element 'CbName' matches a certain string.
> The following code takes 30 seconds just to loop through the DOMNodeList
> before it finds the first matching child.  How can I make this faster?
> 
> DOMNodeList* pDOMNodeList =
> m_pDocument->getDocumentElement()->getElementsByTagNameNS(NULL,
> L"PatchData");
> 
> for (int i=0; i<pDOMNodeList->getLength(); i++)
> {
>       DOMNode* pDOMNode = pDOMNodeList->item(i);
>       DOMNodeList* pDOMNodeList = ((const
> DOMElement*)pDOMNode)->getElementsByTagNameNS(NULL, L"CbName");
>       pDOMNode = pDOMNodeList->item(0);
>       DOMNode* pChildNode = pDOMNode->getFirstChild();
>       std::wstring wstrTagValue = pChildNode->getNodeValue();
> 
>       if (wcscmp(wstrTagValue.c_str(), m_wstrColorbarName.c_str()) == 0)
>       {
>               // This is one of the ones I want...I spend very little time
>               // here.
>       }
> 
>       // I spend 30 seconds here, in a loop...
> }
> 
> My XML structure is like...
> <PatchData>
>     <CbName>abc</CbName>
>     <Location>0</Location>
>     <Type>0</Type>
>     <Width>5.15</Width>
>     <Enabled>true</Enabled>
> </PatchData>
> <PatchData>
>     <CbName>abc</CbName>
>     <Location>1</Location>
>     <Type>1</Type>
>     <Width>5.25</Width>
>     <Enabled>false</Enabled>
> </PatchData>
> 
> -dh


