Re: [xml] xmlSetProp reports error - "error : string is not in UTF-8" for a URL !

William M. Brack Wed, 11 Mar 2009 23:01:50 -0700

Hi,
Prashant R wrote:
> Hi,
>
> This is using C++/gcc with libxml2 2.7.2.
>
> I am trying to add an attribute to a node, and that raises the error
> "error : string is not in UTF-8"
>
> I am using the API
> xmlSetProp(currentNode, (const xmlChar *) kAttribName,
>            (const xmlChar *) "http://www.w3.org/2000/09/xmldsig#");
>
>
> Looking at the stack trace, the error originates from
> xmlNewPropInternal(..), where xmlCheckUTF8(value) returns 0.
>
> I am baffled as to why xmlCheckUTF8 would fail when passed this
> string: "http://www.w3.org/2000/09/xmldsig#"
> Basically, inside the for loop the first if statement is taken
> (if ((c & 0x80) == 0x00)).
>
> There isn't a check for NUL termination, so the loop runs past the
> NUL character at the end of the string, then grabs garbage and
> ultimately returns 0.


I am baffled as to why you think there is no check for the
terminating NUL character.

> int
> xmlCheckUTF8(const unsigned char *utf)
> {
>     int ix;
>     unsigned char c;
>
>     if (utf == NULL)
>         return(0);
>     /*
>      * utf is a string of 1, 2, 3 or 4 bytes.  The valid strings
>      * are as follows (in "bit format"):
>      *    0xxxxxxx                                      valid 1-byte
>      *    110xxxxx 10xxxxxx                             valid 2-byte
>      *    1110xxxx 10xxxxxx 10xxxxxx                    valid 3-byte
>      *    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx           valid 4-byte
>      */
>     for (ix = 0;;) {      /* string is 0-terminated */
>         c = utf[ix];

No. In the released source, for at least the past 5 years, that line
has been

      for (ix = 0; (c = utf[ix]);) {

so the loop ends as soon as it reads the terminating 0 byte.
Why is yours different?
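
To make the difference between the two loop headers concrete, here is a minimal sketch. It is not the libxml2 source: `check_utf8`, its `limit` parameter and its `stop_at_nul` flag are hypothetical names introduced only for this illustration, and `limit` exists so that the variant without a terminator test stays inside the buffer instead of reading arbitrary memory.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in modeled on the quoted function; not libxml2 source.
 * stop_at_nul != 0: the loop ends at the 0 byte, like (c = utf[ix]).
 * stop_at_nul == 0: the loop has no terminator test, as in the quoted copy.
 * `limit` (buffer size in bytes) is an addition so that the demonstration
 * never reads outside the buffer.
 */
static int check_utf8(const unsigned char *utf, size_t limit, int stop_at_nul)
{
    size_t ix = 0;

    if (utf == NULL)
        return 0;
    while (ix < limit) {
        unsigned char c = utf[ix];
        size_t need, k;

        if (stop_at_nul && c == 0)
            break;                      /* string is 0-terminated */
        if ((c & 0x80) == 0x00)
            need = 1;                   /* 0xxxxxxx, includes the 0 byte */
        else if ((c & 0xe0) == 0xc0)
            need = 2;                   /* 110xxxxx */
        else if ((c & 0xf0) == 0xe0)
            need = 3;                   /* 1110xxxx */
        else if ((c & 0xf8) == 0xf0)
            need = 4;                   /* 11110xxx */
        else
            return 0;                   /* unknown encoding */
        if (need > limit - ix)
            return 0;                   /* sequence would leave the buffer */
        for (k = 1; k < need; k++)
            if ((utf[ix + k] & 0xc0) != 0x80)
                return 0;               /* continuation must be 10xxxxxx */
        ix += need;
    }
    return 1;
}

/* Small self-check; the trailing 0xff plays the role of "garbage". */
static void demo(void)
{
    static const unsigned char url[] = "http://www.w3.org/2000/09/xmldsig#";
    static const unsigned char buf[] = { 'a', 'b', 'c', 0, 0xff };

    assert(check_utf8(url, sizeof url, 1) == 1);
    assert(check_utf8(buf, sizeof buf, 1) == 1);   /* stops at the 0 byte */
    assert(check_utf8(buf, sizeof buf, 0) == 0);   /* runs on into 0xff */
}
```

With the terminator test in place the URL passes. Without it, the 0 byte is accepted as an ordinary 1-byte code and the result depends on whatever bytes happen to follow the string, which matches the symptom described in the quoted report.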

>         if ((c & 0x80) == 0x00) { /* 1-byte code, starts with 0 */
>             ix++;
>         } else if ((c & 0xe0) == 0xc0) { /* 2-byte code, starts with 110 */
>             if ((utf[ix+1] & 0xc0) != 0x80)
>                 return 0;
>             ix += 2;
>         } else if ((c & 0xf0) == 0xe0) { /* 3-byte code, starts with 1110 */
>             if (((utf[ix+1] & 0xc0) != 0x80) ||
>                 ((utf[ix+2] & 0xc0) != 0x80))
>                 return 0;
>             ix += 3;
>         } else if ((c & 0xf8) == 0xf0) { /* 4-byte code, starts with 11110 */
>             if (((utf[ix+1] & 0xc0) != 0x80) ||
>                 ((utf[ix+2] & 0xc0) != 0x80) ||
>                 ((utf[ix+3] & 0xc0) != 0x80))
>                 return 0;
>             ix += 4;
>         } else /* unknown encoding */
>             return 0;
>     }
>     return(1);
> }
>
> Am I missing something very fundamental here ?
>
> Thanks
> _______________________________________________
> xml mailing list, project page  http://xmlsoft.org/
> [email protected]
> http://mail.gnome.org/mailman/listinfo/xml
>
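
The bit-format table in the quoted comment can be read as a lead-byte classifier. The sketch below is an illustration only; `utf8_seq_len` is a hypothetical helper, not a libxml2 function. It applies the same masks as the quoted code and returns the expected sequence length, or 0 for a byte that cannot start a sequence.

```c
#include <assert.h>

/* Hypothetical helper (not part of libxml2): length of the UTF-8 sequence
 * announced by lead byte c, or 0 if c cannot start a sequence. */
static int utf8_seq_len(unsigned char c)
{
    if ((c & 0x80) == 0x00) return 1;   /* 0xxxxxxx */
    if ((c & 0xe0) == 0xc0) return 2;   /* 110xxxxx */
    if ((c & 0xf0) == 0xe0) return 3;   /* 1110xxxx */
    if ((c & 0xf8) == 0xf0) return 4;   /* 11110xxx */
    return 0;                           /* 10xxxxxx or 11111xxx */
}

/* Small self-check of the classification. */
static void demo_lead_bytes(void)
{
    assert(utf8_seq_len('h') == 1);
    assert(utf8_seq_len(0xC3) == 2);
    assert(utf8_seq_len(0xE2) == 3);
    assert(utf8_seq_len(0xF0) == 4);
    assert(utf8_seq_len(0x80) == 0);    /* bare continuation byte */
    assert(utf8_seq_len(0x00) == 1);    /* the 0 byte matches 0xxxxxxx */
}
```

Note that the 0 byte also matches the 1-byte pattern, so the masks alone never end the scan; termination has to come from the loop condition.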

Bill

