Hi,

This is using C++/gcc with libxml2 2.7.2.

I am trying to add an attribute to a node, and this raises the error "error : string is not in UTF-8". I am using the following call:

    xmlSetProp(currentNode,
               (const xmlChar *) kAttribName,
               (const xmlChar *) "http://www.w3.org/2000/09/xmldsig#");

Looking at the stack trace, the error originates from xmlNewPropInternal(..), where xmlCheckUTF8(value) returns 0.

I am baffled as to why xmlCheckUTF8 would fail when passed this string: "http://www.w3.org/2000/09/xmldsig#"

Every character in it is ASCII, so inside the for loop the first if statement, if ((c & 0x80) == 0x00), is taken each time. There is no check for NUL termination, so the loop treats the NUL at the end of the string as just another 1-byte character, runs past it, grabs garbage, and ultimately returns 0.

    int
    xmlCheckUTF8(const unsigned char *utf)
    {
        int ix;
        unsigned char c;

        if (utf == NULL)
            return(0);
        /*
         * utf is a string of 1, 2, 3 or 4 bytes.  The valid strings
         * are as follows (in "bit format"):
         *    0xxxxxxx                                valid 1-byte
         *    110xxxxx 10xxxxxx                       valid 2-byte
         *    1110xxxx 10xxxxxx 10xxxxxx              valid 3-byte
         *    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx     valid 4-byte
         */
        for (ix = 0;;) {      /* string is 0-terminated */
            c = utf[ix];
            if ((c & 0x80) == 0x00) {        /* 1-byte code, starts with 10 */
                ix++;
            } else if ((c & 0xe0) == 0xc0) { /* 2-byte code, starts with 110 */
                if ((utf[ix+1] & 0xc0) != 0x80)
                    return 0;
                ix += 2;
            } else if ((c & 0xf0) == 0xe0) { /* 3-byte code, starts with 1110 */
                if (((utf[ix+1] & 0xc0) != 0x80) ||
                    ((utf[ix+2] & 0xc0) != 0x80))
                    return 0;
                ix += 3;
            } else if ((c & 0xf8) == 0xf0) { /* 4-byte code, starts with 11110 */
                if (((utf[ix+1] & 0xc0) != 0x80) ||
                    ((utf[ix+2] & 0xc0) != 0x80) ||
                    ((utf[ix+3] & 0xc0) != 0x80))
                    return 0;
                ix += 4;
            } else                           /* unknown encoding */
                return 0;
        }
        return(1);
    }

Am I missing something very fundamental here?

Thanks
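
P.S. For comparison, a minimal sketch of a variant of that loop that stops at the terminating 0 byte. The name check_utf8_terminated is hypothetical and this is not the libxml2 source; the only assumed change from the listing quoted above is that the loop condition tests the byte it has just read.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libxml2: same lead-byte checks as the
 * xmlCheckUTF8 listing quoted above, but the loop ends at the NUL. */
int check_utf8_terminated(const unsigned char *utf)
{
    int ix;
    unsigned char c;

    if (utf == NULL)
        return 0;

    /* Read one lead byte per iteration; leave the loop when it is 0. */
    for (ix = 0; (c = utf[ix]) != 0;) {
        if ((c & 0x80) == 0x00) {            /* 1-byte code: 0xxxxxxx */
            ix++;
        } else if ((c & 0xe0) == 0xc0) {     /* 2-byte code: 110xxxxx */
            if ((utf[ix+1] & 0xc0) != 0x80)
                return 0;
            ix += 2;
        } else if ((c & 0xf0) == 0xe0) {     /* 3-byte code: 1110xxxx */
            if (((utf[ix+1] & 0xc0) != 0x80) ||
                ((utf[ix+2] & 0xc0) != 0x80))
                return 0;
            ix += 3;
        } else if ((c & 0xf8) == 0xf0) {     /* 4-byte code: 11110xxx */
            if (((utf[ix+1] & 0xc0) != 0x80) ||
                ((utf[ix+2] & 0xc0) != 0x80) ||
                ((utf[ix+3] & 0xc0) != 0x80))
                return 0;
            ix += 4;
        } else {                             /* unknown lead byte */
            return 0;
        }
    }
    return 1;                                /* reached the terminator */
}
```

With a loop like this, the all-ASCII namespace string above would be accepted. A sequence truncated by the terminator is still rejected, because a 0 byte never matches the 10xxxxxx continuation pattern and the || chain stops at the first failing byte.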
