Re: [xml] iterating through an XML document?

Torsten Mohr Thu, 14 Jun 2007 13:37:20 -0700

Hi,

>   In general no. Please do not try to assume you will be able to get
> libxml2 to ignore data. This may work or not, and the DTD is usually not a
> guarantee because documents are usually not valid. Instead of trying to build
> a dangerous pile of assumptions to try to avoid processing a few nodes,
> please code the full algorithm, and skip those nodes there. You will avoid
> wasting a lot of time on design, coding, testing and when your users
> actually start to use the code. It's not like testing if a node is text and
> just white spaces is hard so what ???


Thanks for your hints.  OK, you convinced me easily; of course I want to
write proper code without any assumptions that at some point break my code.
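
The whitespace test mentioned in the quoted reply can be sketched as a small
helper.  This is only an illustration: is_blank_text is a made-up name, not a
libxml2 function, and it takes a plain byte string so that it compiles without
the library.  libxml2 itself provides xmlIsBlankNode() for the same purpose.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of libxml2): returns 1 if s is NULL, empty,
 * or consists only of whitespace bytes, and 0 otherwise.  The scan stops at
 * the first non-whitespace byte. */
static int is_blank_text(const unsigned char *s)
{
    if (s == NULL)
        return 1;
    for (; *s != '\0'; s++) {
        if (!isspace(*s))
            return 0;
    }
    return 1;
}
```

In a traversal, a check such as
n->type == XML_TEXT_NODE && is_blank_text(n->content) followed by continue
would skip those nodes without modifying the tree at all.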

Also, as an interim solution I tried to iterate over the (already loaded)
document and remove the text nodes that contain only whitespace.

It seems that looping over node->children and removing some of them in that
same loop is not a good idea; at least glibc aborts my program due to
double-freeing memory, presumably because the loop still follows n->next
after n has been freed.  So I had to restructure it.

At the moment the relevant part of my program looks like the code below.
I iterate recursively through the nodes, handling the text nodes first.  If
I find an empty one, I remove it and start over (the function returns 1 and
the caller repeats the call).  Once all empty text nodes on a level are
removed, I iterate over the element nodes and process their children the
same way.  At startup I call remove_empty(root_node).


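#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <libxml/tree.h>

/* Removes the first whitespace-only text node found among node and its
 * following siblings and returns 1 so that the caller restarts the scan;
 * returns 0 once this level and everything below it is clean. */
int do_remove_empty(xmlNode* node) {
  xmlNode* n;
  int i, is_empty, len;
  xmlChar* val;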

  for(n = node; n; n = n->next) {
    if(n->type == XML_TEXT_NODE) {
      val = xmlNodeGetContent(n);
      len = strlen((const char*)val);

      is_empty = 1;
      for(i = 0; i < len; i++) {
        printf("%02X ", val[i]);
        if(!isspace(val[i])) {
          is_empty = 0;
        }
      }
      printf("\n");
      xmlFree(val);

      if(is_empty) {
        printf("unlinking %p\n", (void*)n);
        xmlUnlinkNode(n);
        xmlFreeNode(n);
        /* n is gone now, so n->next must not be read; restart instead */
        return 1;
      }
    }
  }

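  /* No blank text nodes left on this level: clean each element's children,
   * re-reading n->children every time because the first child may have
   * been freed by the previous call. */
  for(n = node; n; n = n->next) {
    if(n->type == XML_ELEMENT_NODE) {
      while(do_remove_empty(n->children)) {
        /* repeat until nothing was removed */
      }
    }
  }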

  return 0;
}


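/* node must not itself be a removable text node (the document's root
 * element is fine), because the same pointer is reused after each pass. */
void remove_empty(xmlNode* node) {
  int s;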

  do {
    s = do_remove_empty(node);
  } while(s);
}
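
For comparison, the restart can be avoided by remembering the next sibling
before a node is freed, so the loop never reads freed memory.  The sketch
below shows only that idea.  It deliberately uses a made-up stand-in struct
(mini_node and the mini_* helpers are hypothetical, not libxml2 types) so
that it compiles without the library; with libxml2 the unlink-and-free step
would be xmlUnlinkNode() followed by xmlFreeNode(), as in the code above.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for xmlNode, with the same link field names. */
enum mini_type { MINI_ELEMENT, MINI_TEXT };

typedef struct mini_node {
    enum mini_type type;
    char *content;                      /* used by text nodes only */
    struct mini_node *parent, *children, *next, *prev;
} mini_node;

static mini_node *mini_new(enum mini_type type, const char *content)
{
    mini_node *n = calloc(1, sizeof *n);
    if (n == NULL)
        abort();
    n->type = type;
    if (content != NULL) {
        n->content = malloc(strlen(content) + 1);
        if (n->content == NULL)
            abort();
        strcpy(n->content, content);
    }
    return n;
}

/* Appends child as the last child of parent. */
static void mini_append(mini_node *parent, mini_node *child)
{
    mini_node *last = parent->children;
    child->parent = parent;
    if (last == NULL) {
        parent->children = child;
        return;
    }
    while (last->next != NULL)
        last = last->next;
    last->next = child;
    child->prev = last;
}

/* Detaches a leaf node from its siblings and parent, then frees it. */
static void mini_unlink_free(mini_node *n)
{
    if (n->prev != NULL)
        n->prev->next = n->next;
    else if (n->parent != NULL)
        n->parent->children = n->next;
    if (n->next != NULL)
        n->next->prev = n->prev;
    free(n->content);
    free(n);
}

static int mini_is_blank(const char *s)
{
    for (; s != NULL && *s != '\0'; s++) {
        if (!isspace((unsigned char)*s))
            return 0;
    }
    return 1;
}

/* Removes all whitespace-only text nodes below parent in a single pass
 * and returns how many were removed. */
static int remove_blank_children(mini_node *parent)
{
    int removed = 0;
    mini_node *n = parent->children;
    while (n != NULL) {
        mini_node *next = n->next;      /* saved before n may be freed */
        if (n->type == MINI_TEXT && mini_is_blank(n->content)) {
            mini_unlink_free(n);
            removed++;
        } else if (n->type == MINI_ELEMENT) {
            removed += remove_blank_children(n);
        }
        n = next;
    }
    return removed;
}
```

Because each node is visited once, this avoids rescanning a level after every
removal; the trade-off is that the loop body has to be careful never to touch
n after the free.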



void show(xmlNode* node, int indent) {
  xmlNode* n;
  int i;
  xmlAttr* attr;
  xmlChar* ac;
  xmlChar* val;

  for(n = node; n; n = n->next) {
    if(n->type == XML_ELEMENT_NODE) {
      for(i = 0; i < indent; i++) printf(" ");
      printf("<%s>\n", n->name);
      attr = n->properties;
      while(attr) {
        ac = xmlGetProp(n, attr->name);
        for(i = 0; i < indent+2; i++) printf(" ");
        printf("<%s><%s>\n", attr->name, ac);
        xmlFree(ac);
        attr = attr->next;
      }
      show(n->children, indent+2);
    }
    else if(n->type == XML_TEXT_NODE) {
      for(i = 0; i < indent; i++) printf(" ");
      val = xmlNodeGetContent(n);
      printf("c:%i:<%s>\n", xmlStrlen(val), val);
      xmlFree(val);
    }
  }
}


Best regards,
Torsten.
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
