Re: [xml] iterating through an XML document?

Torsten Mohr Wed, 13 Jun 2007 16:23:34 -0700

Hello Michael,

Thanks a lot for your explanation; it helped a great deal.


At the moment, the purpose of iterating through the document is just
to get to know libxml2 and how to use its functions in principle.

I just made the changes you proposed, and I can now see the
attributes/properties.

For reference, here is the new function show() with your suggestions.
I did not keep the formatting, as I only print the output for learning
purposes:

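#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

/* Recursively print elements, their attributes, and text nodes. */
void show(xmlNode* node, int indent) {
  xmlNode* n;
  int i;
  xmlAttr* attr;
  xmlChar* ac;
  xmlChar* val;

  for(n = node; n; n = n->next) {
    if(n->type == XML_ELEMENT_NODE) {
      for(i = 0; i < indent; i++) printf(" ");
      printf("<<%s>>\n", n->name);
      /* The attribute list of an element is called "properties". */
      attr = n->properties;
      while(attr) {
        /* xmlGetProp returns a copy that must be released with xmlFree. */
        ac = xmlGetProp(n, attr->name);
        for(i = 0; i < indent+2; i++) printf(" ");
        printf("<%s><%s>\n", attr->name, ac);
        xmlFree(ac);
        attr = attr->next;
      }
      show(n->children, indent+2);
    }
    else if(n->type == XML_TEXT_NODE) {
      for(i = 0; i < indent; i++) printf(" ");
      val = xmlNodeGetContent(n);
      /* xmlStrlen takes an xmlChar* and returns an int, which matches %i. */
      printf("c:%i:<%s>\n", xmlStrlen(val), val);
      xmlFree(val);
    }
  }
}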

However, too many text nodes seem to be output: even in places where
there is no content, there is a text node containing only whitespace
characters.

Do you know why this happens?  How can I skip them?

Here is the XML file, followed by the output of show().
Text nodes are printed in the format "c:length:<text>".

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <node1>content of node 1</node1>
  <node2/>
  <node3 attribute="yes" foo="bar">this node has attributes</node3>
  <node4>other way to create content (which is also a node)</node4>
  <node5>
    <node51 odd="no"/>
    <node52 odd="yes"/>
    <node53 odd="no"/>
  </node5>
  <node6>
    <node61 odd="no"/>
    <node62 odd="yes"/>
    <node63 odd="no"/>
  </node6>
</root>

Output:

<<root>>
  c:3:<

  <<node1>>
    c:17:<content of node 1>
  c:3:<

  <<node2>>
  c:3:<

  <<node3>>
    <attribute><yes>
    <foo><bar>
    c:24:<this node has attributes>
  c:3:<

  <<node4>>
    c:50:<other way to create content (which is also a node)>
  c:3:<

  <<node5>>
    c:5:<

    <<node51>>
      <odd><no>
    c:5:<

    <<node52>>
      <odd><yes>
    c:5:<

    <<node53>>
      <odd><no>
    c:3:<

  c:3:<

  <<node6>>
    c:5:<

    <<node61>>
      <odd><no>
    c:5:<

    <<node62>>
      <odd><yes>
    c:5:<

    <<node63>>
      <odd><no>
    c:3:<

  c:1:<



Thanks for any hints,
Torsten.



Regarding the text elements i still have some issues, it seems there
are some

On Thursday, 14 June 2007 00:40, you wrote:
> Hello, Torsten -
>
> You'll probably get other replies from the list, but here's a couple
> quick pointers to help you get started.
>
> Libxml uses a "loose polymorphism" approach in the node tree, as you've
> already noted -- you need to inspect the "type" field of the node to
> determine what you're dealing with.  The tree isn't entirely contained
> by the next and children nodes, however; depending on the type of the
> node, you sometimes need to statically cast the pointer to get at the
> internals.
>
> The default node type, "xmlNode", is also the "Element" type, which is
> convenient because that's the most common case.  An additional confusing
> detail is that the attribute list is named "properties" for some reason,
> which is one of those historical details that nobody can change now.
>
> Also, make certain not to confuse the DTD structures in tree.h with the
> node structures -- "xmlElement" and "xmlAttribute" are the definitions
> in the DTD, while "xmlNode" and "xmlAttr" are the actual nodes.
>
> In your case, you want code that looks like this (I'm doing this from
> memory, so excuse me if I get some of the capitalization and names wrong):
>
> if (n->type == XML_ELEMENT_NODE) {
>     printf("<%s", n->name);
>     xmlAttr *attr = n->properties;
>     while (attr) {
>         xmlChar *attrVal = xmlGetProp(n, attr->name);
>         // Note that I am skipping the handling of namespaces here; use
>         // the "nsDef" field to figure those out
>         printf(" %s=\"%s\"", attr->name, attrVal);
>         xmlFree(attrVal);
>         attr = attr->next;
>     }
>     printf(">");
>     show(n->children, indent+2);
>     printf("</%s>", n->name);
> } else if (n->type == XML_TEXT_NODE) {
>     xmlChar *val = xmlNodeGetContent(n);
>     printf("%s", val);
>     xmlFree(val);
> } else ... (handle XML_CDATA_SECTION_NODE, COMMENT_NODE, PI_NODE, etc...)
>
> So, a couple interesting things to note about this:
> 1. Attributes are found by walking the "properties" list of the node.
> We know it's there because our type matched ELEMENT_NODE.
> 2. We can't just print out the value of the attribute, because it might
> contain entity references (things like &amp;).  You could walk the list
> yourself if you were very clever, but it's much easier and safer to just
> call xmlGetProp which does all that for you.  However, you need to free
> that memory when you're done with it, hence the call to xmlFree.
> 3. When we encounter a text node, we also need to resolve the entities,
> so we use the helpful "xmlNodeGetContent" function which does the same
> thing, and also needs to be cleaned up when we're done.
>
> Now, I should caution you that what you've done here is NOT the same as
> serializing the document back to XML!  This effectively throws out all
> the careful entity escaping that was in the original document... you
> could have bogus attribute values, and bad characters in your text, as a
> result of this, so it's really not safe to treat this output as XML.
>
> If you really want to get the XML back, the easiest thing to do is to
> just serialize it out with one of the "xmlDocDump" or "xmlNodeDump"
> functions.  There's a bunch of them and you can probably find one that
> does what you want.
>
> Hope that helps.
>
> Best -
> Michael
> --
> Cisco Systems/XML Engineering
> (formerly Reactivity, Inc.)
>
> Torsten Mohr wrote:
> > I have now written some code to read this file into memory and get its
> > root node, and I'd like to output the document recursively.  I want to
> > do this to get to know libxml2 and how to iterate through a document:
> >
> >
> > void show(xmlNode* node, int indent) {
> >   xmlNode* n;
> >   int i;
> >
> >   for(n = node; n; n = n->next) {
> >     if(n->type == XML_ELEMENT_NODE) {
> >       for(i = 0; i < indent; i++) printf(" ");
> >       printf("<%s> <%s>\n", n->name, xmlIsBlankNode(n) ? "<empty>" :
> > xmlNodeGetContent(n));
> >       show(n->children, indent+2);
> >     }
> >     if(n->type == XML_ATTRIBUTE_NODE) {
> >       for(i = 0; i < indent; i++) printf(" ");
> >       printf("<%s>+<%s>\n", n->name, xmlIsBlankNode(n) ? "<empty>" :
> > xmlNodeGetContent(n));
> >     }
> >   }
> > }
> >
> >
> > It does not exactly do what i want, i can't see any attributes like
> > foo="bar" or others.  Also, for nodes that do not have text, some empty
> > lines are printed, not the string "<empty>" as i want it to be.
> >
> >
> > I hope i don't mix up names, i'm not sure when to use attribute and
> > when property.
> >
> >
> > To use libxml2 in my own program, I'd like to know how to:
> > - test if a node has a content or not
> > - test what attributes (or properties?) a node has
> >
> > It would be great if anybody could give me a hint on how to do this.
> >
> >
> > Best regards,
> > Torsten.
> > _______________________________________________
> > xml mailing list, project page  http://xmlsoft.org/
> > [email protected]
> > http://mail.gnome.org/mailman/listinfo/xml
