Re: Please don't use functions from ctype.h and strings.h

2020-06-26 Thread Jason Orendorff
This turns out to be super easy to do in clang-tidy, so I took bug 1485588.

-j

On Wed, Jun 24, 2020 at 2:35 PM Chris Peterson 
wrote:

> On 8/27/2018 7:00 AM, Henri Sivonen wrote:
> > I think it's worthwhile to have a lint, but regexps are likely to have
> > false positives, so using clang-tidy is probably better.
> >
> > A bug is on file:https://bugzilla.mozilla.org/show_bug.cgi?id=1485588
> >
> > On Mon, Aug 27, 2018 at 4:06 PM, Tom Ritter  wrote:
> >> Is this something worth making a lint over?  It's pretty easy to make
> >> regex-based lints, e.g.
> >>
> >> yml-only based lint:
> >>
> https://searchfox.org/mozilla-central/source/tools/lint/cpp-virtual-final.yml
> >>
> >> yml+python for slightly more complicated regexing:
> >>
> https://searchfox.org/mozilla-central/source/tools/lint/mingw-capitalization.yml
> >>
> https://searchfox.org/mozilla-central/source/tools/lint/cpp/mingw-capitalization.py
>
>
> Bug 1642825 recently added a "rejected words" lint. It was intended to
> warn about words like "blacklist" and "whitelist", but dangerous C
> function names could easily be added to the list:
>
> https://searchfox.org/mozilla-central/source/tools/lint/rejected-words.yml
>
> A "good enough" solution that can find real bugs now is preferable to a
> cleaner clang-tidy solution someday, maybe. (The clang-tidy lint bug
> 1485588 was filed two years ago.)
> ___
> dev-platform mailing list
> dev-platform@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-platform
>
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Please don't use functions from ctype.h and strings.h

2020-06-25 Thread Henri Sivonen
On Wed, Jun 24, 2020 at 10:35 PM Chris Peterson  wrote:
>
> On 8/27/2018 7:00 AM, Henri Sivonen wrote:
> > I think it's worthwhile to have a lint, but regexps are likely to have
> > false positives, so using clang-tidy is probably better.
> >
> > A bug is on file:https://bugzilla.mozilla.org/show_bug.cgi?id=1485588
> >
> > On Mon, Aug 27, 2018 at 4:06 PM, Tom Ritter  wrote:
> >> Is this something worth making a lint over?  It's pretty easy to make
> >> regex-based lints, e.g.
> >>
> >> yml-only based lint:
> >> https://searchfox.org/mozilla-central/source/tools/lint/cpp-virtual-final.yml
> >>
> >> yml+python for slightly more complicated regexing:
> >> https://searchfox.org/mozilla-central/source/tools/lint/mingw-capitalization.yml
> >> https://searchfox.org/mozilla-central/source/tools/lint/cpp/mingw-capitalization.py
>
>
> Bug 1642825 recently added a "rejected words" lint. It was intended to
> warn about words like "blacklist" and "whitelist", but dangerous C
> function names could easily be added to the list:
>
> https://searchfox.org/mozilla-central/source/tools/lint/rejected-words.yml
>
> A "good enough" solution that can find real bugs now is preferable to a
> cleaner clang-tidy solution someday, maybe. (The clang-tidy lint bug
> 1485588 was filed two years ago.)

Thanks. Filed https://bugzilla.mozilla.org/show_bug.cgi?id=1648390

-- 
Henri Sivonen
hsivo...@mozilla.com
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Please don't use functions from ctype.h and strings.h

2020-06-24 Thread Chris Peterson

On 8/27/2018 7:00 AM, Henri Sivonen wrote:

I think it's worthwhile to have a lint, but regexps are likely to have
false positives, so using clang-tidy is probably better.

A bug is on file:https://bugzilla.mozilla.org/show_bug.cgi?id=1485588

On Mon, Aug 27, 2018 at 4:06 PM, Tom Ritter  wrote:

Is this something worth making a lint over?  It's pretty easy to make
regex-based lints, e.g.

yml-only based lint:
https://searchfox.org/mozilla-central/source/tools/lint/cpp-virtual-final.yml

yml+python for slightly more complicated regexing:
https://searchfox.org/mozilla-central/source/tools/lint/mingw-capitalization.yml
https://searchfox.org/mozilla-central/source/tools/lint/cpp/mingw-capitalization.py



Bug 1642825 recently added a "rejected words" lint. It was intended to 
warn about words like "blacklist" and "whitelist", but dangerous C 
function names could easily be added to the list:


https://searchfox.org/mozilla-central/source/tools/lint/rejected-words.yml

A "good enough" solution that can find real bugs now is preferable to a 
cleaner clang-tidy solution someday, maybe. (The clang-tidy lint bug 
1485588 was filed two years ago.)

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Please don't use locale-dependent C standard library functions (was: Re: Please don't use functions from ctype.h and strings.h)

2020-06-15 Thread Frederik Braun
I'm in the process of setting up documentation & examples on how to
implement new static analysis checks.

If we're OK with turning new usages of these functions into errors, I
can help whoever is volunteering to do this.

Am 12.06.20 um 22:40 schrieb Jeff Gilbert:
> It would be great to have CI linting for these!
> 
> On Fri, Jun 12, 2020 at 2:19 AM Henri Sivonen  wrote:
>>
>> This is an occasional re-reminder that anything in the C standard
>> library that is locale-sensitive is fundamentally broken and should
>> not be used.
>>
>> Today's example is strerr(), which returns a string that is meant to
>> be rendered to the user, but the string isn't guaranteed to be UTF-8.
>>
>> On Mon, Aug 27, 2018 at 3:04 PM Henri Sivonen  wrote:
>>>
>>> Please don't use the functions from ctype.h and strings.h.
>>>
>>> See:
>>> https://daniel.haxx.se/blog/2018/01/30/isalnum-is-not-my-friend/
>>> https://daniel.haxx.se/blog/2008/10/15/strcasecmp-in-turkish/
>>> https://stackoverflow.com/questions/2898228/can-isdigit-legitimately-be-locale-dependent-in-c
>>>
>>> In addition to these being locale-sensitive, the functions from
>>> ctype.h are defined to take (signed) int with the value space of
>>> *unsigned* char or EOF and other argument values are Undefined
>>> Behavior. Therefore, on platforms where char is signed, passing a char
>>> sign-extends to int and invokes UB if the most-significant bit of the
>>> char was set! Bug filed 15 years ago!
>>> https://bugzilla.mozilla.org/show_bug.cgi?id=216952 (I'm not aware of
>>> implementations doing anything surprising with this UB but there
>>> exists precedent for *compiler* writers looking at the standard
>>> *library* UB language and taking calls into standard library functions
>>> as optimization-guiding assertions about the values of their
>>> arguments, so better not risk it.)
>>>
>>> For isfoo(), please use mozilla::IsAsciiFoo() from mozilla/TextUtils.h.
>>>
>>> For tolower() and toupper(), please use ToLowerCaseASCII() and
>>> ToUpperCaseASCII() from nsUnicharUtils.h
>>>
>>> For strcasecmp() and strncasecmp(), please use their nsCRT::-prefixed
>>> versions from nsCRT.h.
>>>
>>> (Ideally, we should scrub these from vendored C code, too, since being
>>> in third-party code doesn't really make the above problems go away.)
>>>
>>> --
>>> Henri Sivonen
>>> hsivo...@mozilla.com
>>
>>
>>
>> --
>> Henri Sivonen
>> hsivo...@mozilla.com
>> ___
>> dev-platform mailing list
>> dev-platform@lists.mozilla.org
>> https://lists.mozilla.org/listinfo/dev-platform
> ___
> dev-platform mailing list
> dev-platform@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-platform
> 
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Please don't use locale-dependent C standard library functions (was: Re: Please don't use functions from ctype.h and strings.h)

2020-06-12 Thread Jeff Gilbert
It would be great to have CI linting for these!

On Fri, Jun 12, 2020 at 2:19 AM Henri Sivonen  wrote:
>
> This is an occasional re-reminder that anything in the C standard
> library that is locale-sensitive is fundamentally broken and should
> not be used.
>
> Today's example is strerr(), which returns a string that is meant to
> be rendered to the user, but the string isn't guaranteed to be UTF-8.
>
> On Mon, Aug 27, 2018 at 3:04 PM Henri Sivonen  wrote:
> >
> > Please don't use the functions from ctype.h and strings.h.
> >
> > See:
> > https://daniel.haxx.se/blog/2018/01/30/isalnum-is-not-my-friend/
> > https://daniel.haxx.se/blog/2008/10/15/strcasecmp-in-turkish/
> > https://stackoverflow.com/questions/2898228/can-isdigit-legitimately-be-locale-dependent-in-c
> >
> > In addition to these being locale-sensitive, the functions from
> > ctype.h are defined to take (signed) int with the value space of
> > *unsigned* char or EOF and other argument values are Undefined
> > Behavior. Therefore, on platforms where char is signed, passing a char
> > sign-extends to int and invokes UB if the most-significant bit of the
> > char was set! Bug filed 15 years ago!
> > https://bugzilla.mozilla.org/show_bug.cgi?id=216952 (I'm not aware of
> > implementations doing anything surprising with this UB but there
> > exists precedent for *compiler* writers looking at the standard
> > *library* UB language and taking calls into standard library functions
> > as optimization-guiding assertions about the values of their
> > arguments, so better not risk it.)
> >
> > For isfoo(), please use mozilla::IsAsciiFoo() from mozilla/TextUtils.h.
> >
> > For tolower() and toupper(), please use ToLowerCaseASCII() and
> > ToUpperCaseASCII() from nsUnicharUtils.h
> >
> > For strcasecmp() and strncasecmp(), please use their nsCRT::-prefixed
> > versions from nsCRT.h.
> >
> > (Ideally, we should scrub these from vendored C code, too, since being
> > in third-party code doesn't really make the above problems go away.)
> >
> > --
> > Henri Sivonen
> > hsivo...@mozilla.com
>
>
>
> --
> Henri Sivonen
> hsivo...@mozilla.com
> ___
> dev-platform mailing list
> dev-platform@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-platform
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Please don't use locale-dependent C standard library functions (was: Re: Please don't use functions from ctype.h and strings.h)

2020-06-12 Thread Henri Sivonen
This is an occasional re-reminder that anything in the C standard
library that is locale-sensitive is fundamentally broken and should
not be used.

Today's example is strerr(), which returns a string that is meant to
be rendered to the user, but the string isn't guaranteed to be UTF-8.

On Mon, Aug 27, 2018 at 3:04 PM Henri Sivonen  wrote:
>
> Please don't use the functions from ctype.h and strings.h.
>
> See:
> https://daniel.haxx.se/blog/2018/01/30/isalnum-is-not-my-friend/
> https://daniel.haxx.se/blog/2008/10/15/strcasecmp-in-turkish/
> https://stackoverflow.com/questions/2898228/can-isdigit-legitimately-be-locale-dependent-in-c
>
> In addition to these being locale-sensitive, the functions from
> ctype.h are defined to take (signed) int with the value space of
> *unsigned* char or EOF and other argument values are Undefined
> Behavior. Therefore, on platforms where char is signed, passing a char
> sign-extends to int and invokes UB if the most-significant bit of the
> char was set! Bug filed 15 years ago!
> https://bugzilla.mozilla.org/show_bug.cgi?id=216952 (I'm not aware of
> implementations doing anything surprising with this UB but there
> exists precedent for *compiler* writers looking at the standard
> *library* UB language and taking calls into standard library functions
> as optimization-guiding assertions about the values of their
> arguments, so better not risk it.)
>
> For isfoo(), please use mozilla::IsAsciiFoo() from mozilla/TextUtils.h.
>
> For tolower() and toupper(), please use ToLowerCaseASCII() and
> ToUpperCaseASCII() from nsUnicharUtils.h
>
> For strcasecmp() and strncasecmp(), please use their nsCRT::-prefixed
> versions from nsCRT.h.
>
> (Ideally, we should scrub these from vendored C code, too, since being
> in third-party code doesn't really make the above problems go away.)
>
> --
> Henri Sivonen
> hsivo...@mozilla.com



-- 
Henri Sivonen
hsivo...@mozilla.com
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Please don't use functions from ctype.h and strings.h

2018-08-27 Thread Henri Sivonen
I think it's worthwhile to have a lint, but regexps are likely to have
false positives, so using clang-tidy is probably better.

A bug is on file: https://bugzilla.mozilla.org/show_bug.cgi?id=1485588

On Mon, Aug 27, 2018 at 4:06 PM, Tom Ritter  wrote:
> Is this something worth making a lint over?  It's pretty easy to make
> regex-based lints, e.g.
>
> yml-only based lint:
> https://searchfox.org/mozilla-central/source/tools/lint/cpp-virtual-final.yml
>
> yml+python for slightly more complicated regexing:
> https://searchfox.org/mozilla-central/source/tools/lint/mingw-capitalization.yml
> https://searchfox.org/mozilla-central/source/tools/lint/cpp/mingw-capitalization.py
>
> -tom
>
> On Mon, Aug 27, 2018 at 7:04 AM, Henri Sivonen  wrote:
>>
>> Please don't use the functions from ctype.h and strings.h.
>>
>> See:
>> https://daniel.haxx.se/blog/2018/01/30/isalnum-is-not-my-friend/
>> https://daniel.haxx.se/blog/2008/10/15/strcasecmp-in-turkish/
>>
>> https://stackoverflow.com/questions/2898228/can-isdigit-legitimately-be-locale-dependent-in-c
>>
>> In addition to these being locale-sensitive, the functions from
>> ctype.h are defined to take (signed) int with the value space of
>> *unsigned* char or EOF and other argument values are Undefined
>> Behavior. Therefore, on platforms where char is signed, passing a char
>> sign-extends to int and invokes UB if the most-significant bit of the
>> char was set! Bug filed 15 years ago!
>> https://bugzilla.mozilla.org/show_bug.cgi?id=216952 (I'm not aware of
>> implementations doing anything surprising with this UB but there
>> exists precedent for *compiler* writers looking at the standard
>> *library* UB language and taking calls into standard library functions
>> as optimization-guiding assertions about the values of their
>> arguments, so better not risk it.)
>>
>> For isfoo(), please use mozilla::IsAsciiFoo() from mozilla/TextUtils.h.
>>
>> For tolower() and toupper(), please use ToLowerCaseASCII() and
>> ToUpperCaseASCII() from nsUnicharUtils.h
>>
>> For strcasecmp() and strncasecmp(), please use their nsCRT::-prefixed
>> versions from nsCRT.h.
>>
>> (Ideally, we should scrub these from vendored C code, too, since being
>> in third-party code doesn't really make the above problems go away.)
>>
>> --
>> Henri Sivonen
>> hsivo...@mozilla.com
>> ___
>> dev-platform mailing list
>> dev-platform@lists.mozilla.org
>> https://lists.mozilla.org/listinfo/dev-platform
>
>



-- 
Henri Sivonen
hsivo...@mozilla.com
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Please don't use functions from ctype.h and strings.h

2018-08-27 Thread Tom Ritter
Is this something worth making a lint over?  It's pretty easy to make
regex-based lints, e.g.

yml-only based lint:
https://searchfox.org/mozilla-central/source/tools/lint/cpp-virtual-final.yml

yml+python for slightly more complicated regexing:
https://searchfox.org/mozilla-central/source/tools/lint/mingw-capitalization.yml
https://searchfox.org/mozilla-central/source/tools/lint/cpp/mingw-capitalization.py

-tom

On Mon, Aug 27, 2018 at 7:04 AM, Henri Sivonen  wrote:

> Please don't use the functions from ctype.h and strings.h.
>
> See:
> https://daniel.haxx.se/blog/2018/01/30/isalnum-is-not-my-friend/
> https://daniel.haxx.se/blog/2008/10/15/strcasecmp-in-turkish/
> https://stackoverflow.com/questions/2898228/can-isdigit-
> legitimately-be-locale-dependent-in-c
>
> In addition to these being locale-sensitive, the functions from
> ctype.h are defined to take (signed) int with the value space of
> *unsigned* char or EOF and other argument values are Undefined
> Behavior. Therefore, on platforms where char is signed, passing a char
> sign-extends to int and invokes UB if the most-significant bit of the
> char was set! Bug filed 15 years ago!
> https://bugzilla.mozilla.org/show_bug.cgi?id=216952 (I'm not aware of
> implementations doing anything surprising with this UB but there
> exists precedent for *compiler* writers looking at the standard
> *library* UB language and taking calls into standard library functions
> as optimization-guiding assertions about the values of their
> arguments, so better not risk it.)
>
> For isfoo(), please use mozilla::IsAsciiFoo() from mozilla/TextUtils.h.
>
> For tolower() and toupper(), please use ToLowerCaseASCII() and
> ToUpperCaseASCII() from nsUnicharUtils.h
>
> For strcasecmp() and strncasecmp(), please use their nsCRT::-prefixed
> versions from nsCRT.h.
>
> (Ideally, we should scrub these from vendored C code, too, since being
> in third-party code doesn't really make the above problems go away.)
>
> --
> Henri Sivonen
> hsivo...@mozilla.com
> ___
> dev-platform mailing list
> dev-platform@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-platform
>
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Please don't use functions from ctype.h and strings.h

2018-08-27 Thread Henri Sivonen
Please don't use the functions from ctype.h and strings.h.

See:
https://daniel.haxx.se/blog/2018/01/30/isalnum-is-not-my-friend/
https://daniel.haxx.se/blog/2008/10/15/strcasecmp-in-turkish/
https://stackoverflow.com/questions/2898228/can-isdigit-legitimately-be-locale-dependent-in-c

In addition to these being locale-sensitive, the functions from
ctype.h are defined to take (signed) int with the value space of
*unsigned* char or EOF and other argument values are Undefined
Behavior. Therefore, on platforms where char is signed, passing a char
sign-extends to int and invokes UB if the most-significant bit of the
char was set! Bug filed 15 years ago!
https://bugzilla.mozilla.org/show_bug.cgi?id=216952 (I'm not aware of
implementations doing anything surprising with this UB but there
exists precedent for *compiler* writers looking at the standard
*library* UB language and taking calls into standard library functions
as optimization-guiding assertions about the values of their
arguments, so better not risk it.)

For isfoo(), please use mozilla::IsAsciiFoo() from mozilla/TextUtils.h.

For tolower() and toupper(), please use ToLowerCaseASCII() and
ToUpperCaseASCII() from nsUnicharUtils.h

For strcasecmp() and strncasecmp(), please use their nsCRT::-prefixed
versions from nsCRT.h.

(Ideally, we should scrub these from vendored C code, too, since being
in third-party code doesn't really make the above problems go away.)

-- 
Henri Sivonen
hsivo...@mozilla.com
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform