Hi,
> In general no. Please do not try to assume you will be able to get
> libxml2 to ignore data. This may work or not, and the DTD is usually not a
> guarantee because documents are usually not valid. Instead of trying to build
> a dangerous pile of assumptions to try to avoid processing a few nodes,
> please code the full algorithm, and skip those nodes there. You will avoid
> wasting a lot of time on design, coding, testing and when your users
> actually start to use the code. It's not like testing if a node is text and
> just white spaces is hard so what ???

Thanks for your hints. OK, you convinced me easily; of course I want to
write proper code without assumptions that will break it at some point.

As an interim solution, I tried iterating over the already loaded document
and removing the text nodes that contain only white space.

Removing children of a node inside the same loop that iterates over
node->children turned out to be a bad idea: glibc aborts my program because
of a double free. So the relevant part of my program now looks like the
code below. I iterate recursively through the nodes, handling the text
nodes first. If I find an empty one, I remove it and start over
(do_remove_empty returns 1 and the caller repeats the call). Once all empty
text nodes at a level are removed, I iterate over the element nodes and
process their children. At startup I call remove_empty(root_node).

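#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <libxml/tree.h>

/* Remove at most one white-space-only text node from the sibling list
 * starting at node. Returns 1 if a node was removed (the caller must
 * then start over), 0 if the list and all subtrees are clean. */
int do_remove_empty(xmlNode* node) {
    xmlNode* n;
    int i, is_empty, len;
    xmlChar* val;

    for(n = node; n; n = n->next) {
        if(n->type == XML_TEXT_NODE) {
            val = xmlNodeGetContent(n);
            /* xmlNodeGetContent() may return NULL; treat that as empty */
            len = val ? (int)strlen((const char*)val) : 0;
            is_empty = 1;
            for(i = 0; i < len; i++) {
                printf("%02X ", val[i]);
                if(!isspace(val[i])) {
                    is_empty = 0;
                }
            }
            printf("\n");
            xmlFree(val);
            if(is_empty) {
                printf("unlinking %p\n", (void*)n);
                xmlUnlinkNode(n);
                xmlFreeNode(n);
                /* n is gone, so n->next must not be read: start over */
                return 1;
            }
        }
    }
    for(n = node; n; n = n->next) {
        if(n->type == XML_ELEMENT_NODE) {
            /* n->children is re-read on every pass, so it is never stale */
            while(do_remove_empty(n->children)) {
            }
        }
    }
    return 0;
}

/* node must not itself be a blank text node: it would be freed and then
 * passed to do_remove_empty() again. The root element is safe. */
void remove_empty(xmlNode* node) {
    while(do_remove_empty(node)) {
    }
}
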
/* Print the element tree with attributes and text nodes, for debugging. */
void show(xmlNode* node, int indent) {
    xmlNode* n;
    int i;
    xmlAttr* attr;
    xmlChar* ac;
    xmlChar* val;

    for(n = node; n; n = n->next) {
        if(n->type == XML_ELEMENT_NODE) {
            for(i = 0; i < indent; i++) printf(" ");
            printf("<%s>\n", n->name);
            attr = n->properties;
            while(attr) {
                ac = xmlGetProp(n, attr->name);
                for(i = 0; i < indent+2; i++) printf(" ");
                printf("<%s><%s>\n", attr->name, ac);
                xmlFree(ac);
                attr = attr->next;
            }
            show(n->children, indent+2);
        }
        else if(n->type == XML_TEXT_NODE) {
            for(i = 0; i < indent; i++) printf(" ");
            val = xmlNodeGetContent(n);
            /* xmlStrlen() returns int and accepts NULL, unlike strlen() */
            printf("c:%i:<%s>\n", xmlStrlen(val), val ? (const char*)val : "");
            xmlFree(val);
        }
    }
}
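
For comparison, here is a minimal sketch of a single-pass variant. The
double free in my first attempt presumably came from reading n->next after
n had already been freed; saving the next pointer before removing a node
avoids that without restarting the scan. To keep the sketch compilable
without libxml2 it uses a hypothetical stand-in struct node, and the names
new_node, is_blank and remove_blank are made up for illustration. With
libxml2 the removal step would be xmlUnlinkNode(n) followed by
xmlFreeNode(n).

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for xmlNode: only the fields this sketch needs. */
struct node {
    struct node *next;      /* next sibling */
    struct node *children;  /* first child, NULL if none */
    int is_text;            /* 1 for text nodes, 0 for elements */
    char *content;          /* text content, NULL for elements */
};

/* Allocate a node; content is copied if it is not NULL. */
static struct node *new_node(int is_text, const char *content)
{
    struct node *n = calloc(1, sizeof *n);
    if (n == NULL)
        abort();
    n->is_text = is_text;
    if (content != NULL) {
        n->content = malloc(strlen(content) + 1);
        if (n->content == NULL)
            abort();
        strcpy(n->content, content);
    }
    return n;
}

/* Return 1 if s is NULL, empty, or consists only of white space. */
static int is_blank(const char *s)
{
    if (s == NULL)
        return 1;
    for (; *s != '\0'; s++) {
        if (!isspace((unsigned char)*s))
            return 0;       /* first non-space character decides */
    }
    return 1;
}

/* Remove blank text nodes from the sibling list *head and, recursively,
 * from all child lists. Returns the number of nodes removed. */
static int remove_blank(struct node **head)
{
    int removed = 0;
    struct node **link = head;  /* pointer that currently points at n */

    assert(head != NULL);
    while (*link != NULL) {
        struct node *n = *link;
        struct node *next = n->next;  /* saved before n may be freed */

        if (n->is_text && is_blank(n->content)) {
            *link = next;             /* unlink n from the list */
            free(n->content);
            free(n);
            removed++;
        } else {
            if (!n->is_text)
                removed += remove_blank(&n->children);
            link = &n->next;          /* n stays, move on */
        }
    }
    return removed;
}
```

Calling remove_blank(&list) once on the top-level list cleans the whole
tree, so no outer retry loop is needed. Each node is visited exactly once,
whereas the restart approach above rescans a sibling list after every
removal.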
Best regards,
Torsten.
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml