Hi,
Prashant R wrote:
> Hi,
>
> This is using C++/gcc with libxml2 2.7.2.
>
> I am trying to add an attribute to a node, which raises the error
> "error : string is not in UTF-8"
>
> I am using the API
> xmlSetProp(currentNode, (const xmlChar *) kAttribName,
>            (const xmlChar *) "http://www.w3.org/2000/09/xmldsig#")
>
>
> Looking at the stack trace, the error originates from
> xmlNewPropInternal(..)
>
> where
> xmlCheckUTF8(value) returns 0
>
> I am baffled as to why xmlCheckUTF8 would fail when passing this
> string: "http://www.w3.org/2000/09/xmldsig#"
> Basically, inside the for loop the first if statement is taken
> (if ((c & 0x80) == 0x00)).
>
> There isn't a check for NUL termination, so the loop runs past the
> NUL character at the end of the string, then grabs garbage and
> ultimately returns 0.
I am baffled as to why you think there is no check for the
terminating NUL character.
>
>
> int
> xmlCheckUTF8(const unsigned char *utf)
> {
>     int ix;
>     unsigned char c;
>
>     if (utf == NULL)
>         return(0);
>     /*
>      * utf is a string of 1, 2, 3 or 4 bytes. The valid strings
>      * are as follows (in "bit format"):
>      *    0xxxxxxx                             valid 1-byte
>      *    110xxxxx 10xxxxxx                    valid 2-byte
>      *    1110xxxx 10xxxxxx 10xxxxxx           valid 3-byte
>      *    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx  valid 4-byte
>      */
>     for (ix = 0;;) { /* string is 0-terminated */
>         c = utf[ix];
No, that line (in the released source for at least 5 years) has been
    for (ix = 0; (c = utf[ix]);) {
The loop condition is the terminator check: the loop ends as soon as
c is 0. Why is yours different?
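
To make the point concrete, here is a minimal standalone sketch of the
same loop shape. It is only an illustration, not the libxml2 source:
the name check_utf8_sketch is made up for this message, and the body
simply mirrors the structure of the function you quoted, with the
terminator test in the loop condition.

```c
#include <stddef.h>

/* Hypothetical stand-in for illustration; NOT the libxml2 function. */
int check_utf8_sketch(const unsigned char *utf)
{
    size_t ix = 0;
    unsigned char c;

    if (utf == NULL)
        return 0;

    /* The loop condition is the terminator check: stop when c == 0. */
    while ((c = utf[ix]) != 0) {
        if ((c & 0x80) == 0x00) {            /* 0xxxxxxx: 1-byte code */
            ix += 1;
        } else if ((c & 0xe0) == 0xc0) {     /* 110xxxxx: 2-byte code */
            if ((utf[ix + 1] & 0xc0) != 0x80)
                return 0;
            ix += 2;
        } else if ((c & 0xf0) == 0xe0) {     /* 1110xxxx: 3-byte code */
            if ((utf[ix + 1] & 0xc0) != 0x80 ||
                (utf[ix + 2] & 0xc0) != 0x80)
                return 0;
            ix += 3;
        } else if ((c & 0xf8) == 0xf0) {     /* 11110xxx: 4-byte code */
            if ((utf[ix + 1] & 0xc0) != 0x80 ||
                (utf[ix + 2] & 0xc0) != 0x80 ||
                (utf[ix + 3] & 0xc0) != 0x80)
                return 0;
            ix += 4;
        } else {                             /* not a valid lead byte */
            return 0;
        }
    }
    return 1;
}
```

With the test in the loop condition, a plain ASCII string such as your
namespace URI falls through the 1-byte branch for every character and
the function returns 1 at the terminator. A truncated multi-byte
sequence is also safe: the terminating 0 byte fails the
(b & 0xc0) == 0x80 continuation test, and the short-circuit || stops
before any byte past the terminator is read.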
>
>         if ((c & 0x80) == 0x00) { /* 1-byte code, starts with 10 */
>             ix++;
>         } else if ((c & 0xe0) == 0xc0) {/* 2-byte code, starts with 110 */
>             if ((utf[ix+1] & 0xc0 ) != 0x80)
>                 return 0;
>             ix += 2;
>         } else if ((c & 0xf0) == 0xe0) {/* 3-byte code, starts with 1110 */
>             if (((utf[ix+1] & 0xc0) != 0x80) ||
>                 ((utf[ix+2] & 0xc0) != 0x80))
>                 return 0;
>             ix += 3;
>         } else if ((c & 0xf8) == 0xf0) {/* 4-byte code, starts with 11110 */
>             if (((utf[ix+1] & 0xc0) != 0x80) ||
>                 ((utf[ix+2] & 0xc0) != 0x80) ||
>                 ((utf[ix+3] & 0xc0) != 0x80))
>                 return 0;
>             ix += 4;
>         } else /* unknown encoding */
>             return 0;
>     }
>     return(1);
> }
>
> Am I missing something very fundamental here?
>
> Thanks
> _______________________________________________
> xml mailing list, project page http://xmlsoft.org/
> [email protected]
> http://mail.gnome.org/mailman/listinfo/xml
>
Bill