RE: Implementing getTextContent

Erik Rydgren Wed, 02 Apr 2003 00:58:40 -0800

I agree that this is a good way of providing both functionalities and it
actually made implementation a bit easier.
I took the liberty of writing the whole shebang down in code.
This is now written directly from the specification.
As always the code compiles but it is untested. Please verify.


Regards

Erik Rydgren
Mandarinen systems AB
Sweden

--- CODE ---

const XMLCh*     DOMNodeImpl::getTextContent(XMLCh* pzBuffer, unsigned int& rnBufferLength) const
{
  unsigned int nRemainingBuffer = rnBufferLength;
  rnBufferLength = 0;
  if (pzBuffer)
    *pzBuffer = 0;

  DOMNode *thisNode = castToNode(this);
  switch (thisNode->getNodeType()) {
    case DOMNode::ELEMENT_NODE:
    case DOMNode::ENTITY_NODE:
    case DOMNode::ENTITY_REFERENCE_NODE:
    case DOMNode::DOCUMENT_FRAGMENT_NODE:
    {
      DOMNode* current = thisNode->getFirstChild();
      while (current != NULL) {
        if (current->getNodeType() != DOMNode::COMMENT_NODE &&
            current->getNodeType() != DOMNode::PROCESSING_INSTRUCTION_NODE)
        {
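          // DOMNodeImpl is not a base class of DOMNode, so a plain C-style
          // cast is not safe here; go through castToNodeImpl() instead.
          if (pzBuffer) {
            // Copy pass: append the child's text after what is already written.
            unsigned int nContentLength = nRemainingBuffer;
            castToNodeImpl(current)->getTextContent(pzBuffer + rnBufferLength, nContentLength);
            rnBufferLength += nContentLength;
            nRemainingBuffer -= nContentLength;
          }
          else {
            // Sizing pass: only accumulate the required length.
            unsigned int nContentLength = 0;
            castToNodeImpl(current)->getTextContent(NULL, nContentLength);
            rnBufferLength += nContentLength;
          }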
        }
        current = current->getNextSibling();
      }
    }
    break;

    case DOMNode::ATTRIBUTE_NODE:
    case DOMNode::TEXT_NODE:
    case DOMNode::CDATA_SECTION_NODE:
    case DOMNode::COMMENT_NODE:
    case DOMNode::PROCESSING_INSTRUCTION_NODE:
    {
      const XMLCh* pzValue = thisNode->getNodeValue();
      unsigned int nStrLen = XMLString::stringLen(pzValue);
      if (pzBuffer) {
        unsigned int nContentLength = (nRemainingBuffer >= nStrLen) ? nStrLen : nRemainingBuffer;
        XMLString::copyNString(pzBuffer + rnBufferLength, pzValue, nContentLength);
        rnBufferLength += nContentLength;
        nRemainingBuffer -= nContentLength;
      }
      else {
        rnBufferLength += nStrLen;
      }
    }
    break;
  }
  return pzBuffer;
}

const XMLCh*     DOMNodeImpl::getTextContent() const
{
  unsigned int nBufferLength = 0;
  getTextContent(NULL, nBufferLength);
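  // allocate() takes a size in bytes, so scale the character count by sizeof(XMLCh).
  XMLCh* pzBuffer = (XMLCh*)((DOMDocumentImpl*)getOwnerDocument())->allocate((nBufferLength + 1) * sizeof(XMLCh));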
  getTextContent(pzBuffer, nBufferLength);
  pzBuffer[nBufferLength] = 0;
  return pzBuffer;
}

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: 2 April 2003 02:52
To: [EMAIL PROTECTED]
Subject: RE: Implementing getTextContent






Hi all,

What about another overload which allows for a user-supplied buffer for the
text, along with an argument which specifies the maximum number of
characters to copy, returning the number of characters actually copied?
That, along with an implementation that allows passing a null pointer for
the buffer to indicate that the call should just determine how large a
buffer is needed, would be the most efficient way to go.  Then the
original API can be implemented using the new one.
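
A minimal, standalone sketch of that calling convention, assuming a toy
Node type in place of the real DOM classes (Node, its members, and the use
of char16_t instead of XMLCh are all stand-ins for illustration, not Xerces
API). A null buffer means "just tell me the size"; otherwise at most
maxChars characters are copied and the count is returned.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node: its own text plus child nodes.
struct Node {
    std::u16string text;
    std::vector<Node> children;
};

// Copies at most maxChars characters of the node's concatenated text into
// buffer and returns the number of characters copied. No terminator is
// written. If buffer is null, nothing is copied and the return value is
// the total number of characters a full copy would need.
std::size_t getTextContent(const Node& node, char16_t* buffer, std::size_t maxChars)
{
    if (buffer == nullptr) {
        // Sizing pass: add up the lengths without touching any memory.
        std::size_t needed = node.text.size();
        for (const Node& child : node.children)
            needed += getTextContent(child, nullptr, 0);
        return needed;
    }

    // Copy pass: this node's text first, then the children in order,
    // stopping as soon as the caller's limit is reached.
    std::size_t copied = std::min(node.text.size(), maxChars);
    node.text.copy(buffer, copied);
    for (const Node& child : node.children) {
        if (copied == maxChars)
            break;
        copied += getTextContent(child, buffer + copied, maxChars - copied);
    }
    return copied;
}

// The original no-argument API, implemented on top of the overload above.
std::u16string getTextContent(const Node& node)
{
    std::size_t needed = getTextContent(node, nullptr, 0);
    std::u16string result(needed, u'\0');
    getTextContent(node, &result[0], needed);
    return result;
}
```

A caller would typically call once with a null buffer to learn the size,
allocate, and call again to copy; a caller that only wants a prefix can
skip the first call and pass a small fixed buffer.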

Just my 2 cents worth...

Dave



From: "Neil Graham" <[EMAIL PROTECTED]m>
Sent: 04/01/2003 02:52 PM
Please respond to xerces-c-dev
To: [EMAIL PROTECTED]
cc: (bcc: David N Bertoni/Cambridge/IBM)
Subject: RE: Implementing getTextContent



Hi Erik and Gareth,

FWIW, I'm fairly strongly of the view that we need to preserve consistency
here so that the parser owns this memory.  It's always bad to be
inconsistent but isn't it especially bad with things as subtle as memory
management?

Besides, the memory here is no larger than the sum of all the textual
children of the node; given how heavy the DOM tends to be, I think this
should be acceptable so long as it is documented in a FAQ somewhere.
After all, if a user just wants text then surely she should be using SAX;
otherwise there are likely to be lots of other operations being performed
(i.e., getTextContent shouldn't be used that often).

Cheers,
Neil
Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  [EMAIL PROTECTED]




From: "Erik Rydgren" <[EMAIL PROTECTED]darinen.se>
Sent: 04/01/2003 05:01 AM
Please respond to xerces-c-dev
To: <[EMAIL PROTECTED]>
cc:
Subject: RE: Implementing getTextContent




I agree, I like consistency.
Let's register the string with the document by allocating the memory on the
document's heap.
However, the result can be very large. Should we allow the user to release
the memory when it is no longer needed? How?
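
One possible shape for this, as a rough standalone sketch only: the
document keeps ownership of every buffer it hands out, and an optional
release() lets the caller give a large one back early. The Document class
and its allocate(), release() and liveBlocks() members are hypothetical
names for illustration; this is not how DOMDocumentImpl manages its heap.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical document that owns every text buffer it hands out.
class Document {
public:
    // Returns a zero-filled buffer with room for `chars` characters plus
    // a terminator. The document keeps ownership: anything not released
    // earlier is freed when the document itself is destroyed.
    char16_t* allocate(std::size_t chars)
    {
        blocks_.push_back(std::unique_ptr<char16_t[]>(new char16_t[chars + 1]()));
        return blocks_.back().get();
    }

    // Optional early release of a buffer obtained from allocate().
    // Returns false if the pointer does not belong to this document.
    bool release(const char16_t* p)
    {
        for (auto it = blocks_.begin(); it != blocks_.end(); ++it) {
            if (it->get() == p) {
                blocks_.erase(it);
                return true;
            }
        }
        return false;
    }

    // Number of buffers the document currently owns.
    std::size_t liveBlocks() const { return blocks_.size(); }

private:
    std::vector<std::unique_ptr<char16_t[]>> blocks_;
};
```

The linear search in release() keeps the sketch short; the point is only
that the user never has to free anything, but may do so for big results.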

/ Erik

-----Original Message-----
From: Gareth Reakes [mailto:[EMAIL PROTECTED]
Sent: 1 April 2003 11:55
To: [EMAIL PROTECTED]
Subject: RE: Implementing getTextContent


Hi,
there are no other methods in the DOM interface that you have to
do that with. My opinion is that we allow the document to manage this as
well for consistency. Anyone else got an opinion?

Gareth


--
Gareth Reakes, Head of Product Development  +44-1865-203192
DecisionSoft Limited                        http://www.decisionsoft.com
XML Development and Services




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

RE: Implementing getTextContent

Reply via email to