Re: Please don't use locale-dependent C standard library functions (was: Re: Please don't use functions from ctype.h and strings.h)

2020-06-15 Thread Frederik Braun
I'm in the process of setting up documentation & examples on how to
implement new static analysis checks.

If we're OK with turning new usages of these functions into errors, I
can help whoever is volunteering to do this.

Am 12.06.20 um 22:40 schrieb Jeff Gilbert:
> It would be great to have CI linting for these!
> 
> On Fri, Jun 12, 2020 at 2:19 AM Henri Sivonen  wrote:
>>
>> This is an occasional re-reminder that anything in the C standard
>> library that is locale-sensitive is fundamentally broken and should
>> not be used.
>>
>> Today's example is strerr(), which returns a string that is meant to
>> be rendered to the user, but the string isn't guaranteed to be UTF-8.
>>
>> On Mon, Aug 27, 2018 at 3:04 PM Henri Sivonen  wrote:
>>>
>>> Please don't use the functions from ctype.h and strings.h.
>>>
>>> See:
>>> https://daniel.haxx.se/blog/2018/01/30/isalnum-is-not-my-friend/
>>> https://daniel.haxx.se/blog/2008/10/15/strcasecmp-in-turkish/
>>> https://stackoverflow.com/questions/2898228/can-isdigit-legitimately-be-locale-dependent-in-c
>>>
>>> In addition to these being locale-sensitive, the functions from
>>> ctype.h are defined to take (signed) int with the value space of
>>> *unsigned* char or EOF and other argument values are Undefined
>>> Behavior. Therefore, on platforms where char is signed, passing a char
>>> sign-extends to int and invokes UB if the most-significant bit of the
>>> char was set! Bug filed 15 years ago!
>>> https://bugzilla.mozilla.org/show_bug.cgi?id=216952 (I'm not aware of
>>> implementations doing anything surprising with this UB but there
>>> exists precedent for *compiler* writers looking at the standard
>>> *library* UB language and taking calls into standard library functions
>>> as optimization-guiding assertions about the values of their
>>> arguments, so better not risk it.)
>>>
>>> For isfoo(), please use mozilla::IsAsciiFoo() from mozilla/TextUtils.h.
>>>
>>> For tolower() and toupper(), please use ToLowerCaseASCII() and
>>> ToUpperCaseASCII() from nsUnicharUtils.h
>>>
>>> For strcasecmp() and strncasecmp(), please use their nsCRT::-prefixed
>>> versions from nsCRT.h.
>>>
>>> (Ideally, we should scrub these from vendored C code, too, since being
>>> in third-party code doesn't really make the above problems go away.)
>>>
>>> --
>>> Henri Sivonen
>>> hsivo...@mozilla.com
>>
>>
>>
>> --
>> Henri Sivonen
>> hsivo...@mozilla.com
>> ___
>> dev-platform mailing list
>> dev-platform@lists.mozilla.org
>> https://lists.mozilla.org/listinfo/dev-platform
> ___
> dev-platform mailing list
> dev-platform@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-platform
> 
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Please don't use locale-dependent C standard library functions (was: Re: Please don't use functions from ctype.h and strings.h)

2020-06-12 Thread Jeff Gilbert
It would be great to have CI linting for these!

On Fri, Jun 12, 2020 at 2:19 AM Henri Sivonen  wrote:
>
> This is an occasional re-reminder that anything in the C standard
> library that is locale-sensitive is fundamentally broken and should
> not be used.
>
> Today's example is strerr(), which returns a string that is meant to
> be rendered to the user, but the string isn't guaranteed to be UTF-8.
>
> On Mon, Aug 27, 2018 at 3:04 PM Henri Sivonen  wrote:
> >
> > Please don't use the functions from ctype.h and strings.h.
> >
> > See:
> > https://daniel.haxx.se/blog/2018/01/30/isalnum-is-not-my-friend/
> > https://daniel.haxx.se/blog/2008/10/15/strcasecmp-in-turkish/
> > https://stackoverflow.com/questions/2898228/can-isdigit-legitimately-be-locale-dependent-in-c
> >
> > In addition to these being locale-sensitive, the functions from
> > ctype.h are defined to take (signed) int with the value space of
> > *unsigned* char or EOF and other argument values are Undefined
> > Behavior. Therefore, on platforms where char is signed, passing a char
> > sign-extends to int and invokes UB if the most-significant bit of the
> > char was set! Bug filed 15 years ago!
> > https://bugzilla.mozilla.org/show_bug.cgi?id=216952 (I'm not aware of
> > implementations doing anything surprising with this UB but there
> > exists precedent for *compiler* writers looking at the standard
> > *library* UB language and taking calls into standard library functions
> > as optimization-guiding assertions about the values of their
> > arguments, so better not risk it.)
> >
> > For isfoo(), please use mozilla::IsAsciiFoo() from mozilla/TextUtils.h.
> >
> > For tolower() and toupper(), please use ToLowerCaseASCII() and
> > ToUpperCaseASCII() from nsUnicharUtils.h
> >
> > For strcasecmp() and strncasecmp(), please use their nsCRT::-prefixed
> > versions from nsCRT.h.
> >
> > (Ideally, we should scrub these from vendored C code, too, since being
> > in third-party code doesn't really make the above problems go away.)
> >
> > --
> > Henri Sivonen
> > hsivo...@mozilla.com
>
>
>
> --
> Henri Sivonen
> hsivo...@mozilla.com
> ___
> dev-platform mailing list
> dev-platform@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-platform
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform