RE: Newby question: Get Text of a Node

Anthony Dodd 4 Apr 2000 08:02:19 -0000

I wrote the following function to emulate XML4J's TXElements.getText(), which returned a string containing the text of all child nodes of a particular element. This might be of some use to you. Since comments are a subclass of CharacterData, this function will include comments in the string returned.

/**
 * Return all text associated with this Node and its children without considering entities.
 * @param node the node providing the context for the search
 * @return Text associated with all children, or "" if no children.
 */
public static final java.lang.String getText(org.w3c.dom.Node node) {
    //Initialise the string buffer so we'll at least return "".
    StringBuffer sb = new StringBuffer("");
    try {
      org.w3c.dom.NodeList nl = node.getChildNodes();
      for (int i = 0; i < nl.getLength(); i++) {
        //Is it a subclass of org.w3c.dom.CharacterData, e.g. Text, CDATASection, Comment?
        if (nl.item(i) instanceof org.w3c.dom.CharacterData) {
          sb.append(((org.w3c.dom.CharacterData)nl.item(i)).getData());
        }
        //Recursively examine the children of this node
        sb.append(getText(nl.item(i)));
      }
    } catch(java.lang.Exception e) {
      //Needs to be handled ?
      e.printStackTrace();
    }
    //Convert the contents to a String
    return sb.toString();
}
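
To see the effect of treating comments as CharacterData, here is a small, self-contained sketch. It assumes a modern JDK with the JAXP parser (javax.xml.parsers) rather than XML4J; the class name GetTextDemo, the parse helper and the sample document are made up for illustration, and getText is a condensed copy of the function above without the try/catch.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class GetTextDemo {
    // Condensed copy of getText: append every CharacterData child
    // (Text, CDATASection, Comment), then recurse into each child.
    static String getText(Node node) {
        StringBuffer sb = new StringBuffer();
        NodeList nl = node.getChildNodes();
        for (int i = 0; i < nl.getLength(); i++) {
            Node child = nl.item(i);
            if (child instanceof CharacterData) {
                sb.append(((CharacterData) child).getData());
            }
            sb.append(getText(child));
        }
        return sb.toString();
    }

    // Hypothetical helper (not from the original post): parse an XML string
    // with the JDK's default DOM parser, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample: a comment and a nested element inside NOTE.
        Document doc = parse("<NOTE>This is <!--hidden-->the <B>text</B></NOTE>");
        System.out.println(getText(doc.getDocumentElement()));
    }
}
```

Run as written, this should print "This is hiddenthe text": the comment's content ends up in the returned string along with the real text.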

Regards

Anthony Dodd

-----Original Message-----
From: Andy Heninger [mailto:[EMAIL PROTECTED]
Sent: 03 April 2000 17:51
To: [EMAIL PROTECTED]
Subject: Re: Newby question: Get Text of a Node

The text of an element is contained in text nodes that are children of the element node. It's always a little awkward to get it out.

Elements can have any number of children, so you may want to generalize your code a bit. Even if there are no nested elements, comments and entity reference nodes can still appear. You probably want to ignore comments altogether and recursively look for more text nodes among the children of any entity reference nodes.

It may be slightly simpler to use the Node.getFirstChild() and getNextSibling() methods for navigation, rather than the node list.
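
A rough sketch of that approach, written in Java against org.w3c.dom so it lines up with the getText function elsewhere in this thread. The class name SiblingWalkDemo, the parse helper, the sample document and the choice to also descend into nested elements are assumptions for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SiblingWalkDemo {
    // Collect text below 'node', skipping comments.
    static String textOf(Node node) {
        StringBuilder sb = new StringBuilder();
        collect(node, sb);
        return sb.toString();
    }

    // Walk the children with getFirstChild()/getNextSibling()
    // instead of indexing into a NodeList.
    private static void collect(Node parent, StringBuilder sb) {
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            switch (child.getNodeType()) {
                case Node.TEXT_NODE:
                case Node.CDATA_SECTION_NODE:
                    sb.append(child.getNodeValue());
                    break;
                case Node.ENTITY_REFERENCE_NODE:
                case Node.ELEMENT_NODE: // assumption: nested elements contribute text too
                    collect(child, sb);
                    break;
                default:
                    // Comments, processing instructions, etc. are ignored.
                    break;
            }
        }
    }

    // Hypothetical helper: parse an XML string with the JDK's default DOM parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<NOTE>This is <!--hidden-->the <B>text</B></NOTE>");
        System.out.println(textOf(doc.getDocumentElement())); // prints "This is the text"
    }
}
```

Dropping the ELEMENT_NODE case would restrict the result to the element's direct text (plus anything under entity references), which may be closer to what a flat element like NOTE needs.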

-- Andy

----- Original Message -----
From: Nathan Troxler
To: [EMAIL PROTECTED]
Sent: Saturday, April 01, 2000 1:15 AM
Subject: Newby question: Get Text of a Node

But I am a bit confused about getting the Text inside a Tag:

In the XML I have:

...

<NOTE>This is the text to read for the note</NOTE>

...

Currently I use the following, which seems to work, but also seems a bit complicated. Is there an easier/safer way to get the text ("This is the text to read for the note")?

// Handler to extract the text of a Node with Name "NOTE"
//
void XMLParser::ParseVerbNote (DOM_Node& rNote)
{

/* No, this obviously does not work!!!
DOMString ds = rNote.getNodeValue();
CString   cs = ds.rawBuffer();
*/

CString note("");

DOM_NodeList childs = rNote.getChildNodes();

// Works, but seems complicated! Is the direct text of a node always childs.item(0) ???

int cntCld = childs.getLength();
if( cntCld != 0 ) {
for(int j = 0; j < cntCld; j++ ) {
   DOM_Node child = childs.item(j);
   DOMString ds = child.getNodeValue();
   note = ds.rawBuffer();
}
}

dp_bufAgt->SetVerbNote((LPCTSTR)note); // writing to my internal data format
}

Thanks for any hints.

Nathan Troxler

[EMAIL PROTECTED]
