Re: Please don't use functions from ctype.h and strings.h
This turns out to be super easy to do in clang-tidy, so I took bug 1485588. -j On Wed, Jun 24, 2020 at 2:35 PM Chris Peterson wrote: > On 8/27/2018 7:00 AM, Henri Sivonen wrote: > > I think it's worthwhile to have a lint, but regexps are likely to have > > false positives, so using clang-tidy is probably better. > > > > A bug is on file:https://bugzilla.mozilla.org/show_bug.cgi?id=1485588 > > > > On Mon, Aug 27, 2018 at 4:06 PM, Tom Ritter wrote: > >> Is this something worth making a lint over? It's pretty easy to make > >> regex-based lints, e.g. > >> > >> yml-only based lint: > >> > https://searchfox.org/mozilla-central/source/tools/lint/cpp-virtual-final.yml > >> > >> yml+python for slightly more complicated regexing: > >> > https://searchfox.org/mozilla-central/source/tools/lint/mingw-capitalization.yml > >> > https://searchfox.org/mozilla-central/source/tools/lint/cpp/mingw-capitalization.py > > > Bug 1642825 recently added a "rejected words" lint. It was intended to > warn about words like "blacklist" and "whitelist", but dangerous C > function names could easily be added to the list: > > https://searchfox.org/mozilla-central/source/tools/lint/rejected-words.yml > > A "good enough" solution that can find real bugs now is preferable to a > cleaner clang-tidy solution someday, maybe. (The clang-tidy lint bug > 1485588 was filed two years ago.) > ___ > dev-platform mailing list > dev-platform@lists.mozilla.org > https://lists.mozilla.org/listinfo/dev-platform > ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Please don't use functions from ctype.h and strings.h
On Wed, Jun 24, 2020 at 10:35 PM Chris Peterson wrote: > > On 8/27/2018 7:00 AM, Henri Sivonen wrote: > > I think it's worthwhile to have a lint, but regexps are likely to have > > false positives, so using clang-tidy is probably better. > > > > A bug is on file:https://bugzilla.mozilla.org/show_bug.cgi?id=1485588 > > > > On Mon, Aug 27, 2018 at 4:06 PM, Tom Ritter wrote: > >> Is this something worth making a lint over? It's pretty easy to make > >> regex-based lints, e.g. > >> > >> yml-only based lint: > >> https://searchfox.org/mozilla-central/source/tools/lint/cpp-virtual-final.yml > >> > >> yml+python for slightly more complicated regexing: > >> https://searchfox.org/mozilla-central/source/tools/lint/mingw-capitalization.yml > >> https://searchfox.org/mozilla-central/source/tools/lint/cpp/mingw-capitalization.py > > > Bug 1642825 recently added a "rejected words" lint. It was intended to > warn about words like "blacklist" and "whitelist", but dangerous C > function names could easily be added to the list: > > https://searchfox.org/mozilla-central/source/tools/lint/rejected-words.yml > > A "good enough" solution that can find real bugs now is preferable to a > cleaner clang-tidy solution someday, maybe. (The clang-tidy lint bug > 1485588 was filed two years ago.) Thanks. Filed https://bugzilla.mozilla.org/show_bug.cgi?id=1648390 -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Please don't use functions from ctype.h and strings.h
On 8/27/2018 7:00 AM, Henri Sivonen wrote: I think it's worthwhile to have a lint, but regexps are likely to have false positives, so using clang-tidy is probably better. A bug is on file:https://bugzilla.mozilla.org/show_bug.cgi?id=1485588 On Mon, Aug 27, 2018 at 4:06 PM, Tom Ritter wrote: Is this something worth making a lint over? It's pretty easy to make regex-based lints, e.g. yml-only based lint: https://searchfox.org/mozilla-central/source/tools/lint/cpp-virtual-final.yml yml+python for slightly more complicated regexing: https://searchfox.org/mozilla-central/source/tools/lint/mingw-capitalization.yml https://searchfox.org/mozilla-central/source/tools/lint/cpp/mingw-capitalization.py Bug 1642825 recently added a "rejected words" lint. It was intended to warn about words like "blacklist" and "whitelist", but dangerous C function names could easily be added to the list: https://searchfox.org/mozilla-central/source/tools/lint/rejected-words.yml A "good enough" solution that can find real bugs now is preferable to a cleaner clang-tidy solution someday, maybe. (The clang-tidy lint bug 1485588 was filed two years ago.) ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Please don't use locale-dependent C standard library functions (was: Re: Please don't use functions from ctype.h and strings.h)
I'm in the process of setting up documentation & examples on how to implement new static analysis checks. If we're OK with turning new usages of these functions into errors, I can help whoever is volunteering to do this. Am 12.06.20 um 22:40 schrieb Jeff Gilbert: > It would be great to have CI linting for these! > > On Fri, Jun 12, 2020 at 2:19 AM Henri Sivonen wrote: >> >> This is an occasional re-reminder that anything in the C standard >> library that is locale-sensitive is fundamentally broken and should >> not be used. >> >> Today's example is strerr(), which returns a string that is meant to >> be rendered to the user, but the string isn't guaranteed to be UTF-8. >> >> On Mon, Aug 27, 2018 at 3:04 PM Henri Sivonen wrote: >>> >>> Please don't use the functions from ctype.h and strings.h. >>> >>> See: >>> https://daniel.haxx.se/blog/2018/01/30/isalnum-is-not-my-friend/ >>> https://daniel.haxx.se/blog/2008/10/15/strcasecmp-in-turkish/ >>> https://stackoverflow.com/questions/2898228/can-isdigit-legitimately-be-locale-dependent-in-c >>> >>> In addition to these being locale-sensitive, the functions from >>> ctype.h are defined to take (signed) int with the value space of >>> *unsigned* char or EOF and other argument values are Undefined >>> Behavior. Therefore, on platforms where char is signed, passing a char >>> sign-extends to int and invokes UB if the most-significant bit of the >>> char was set! Bug filed 15 years ago! >>> https://bugzilla.mozilla.org/show_bug.cgi?id=216952 (I'm not aware of >>> implementations doing anything surprising with this UB but there >>> exists precedent for *compiler* writers looking at the standard >>> *library* UB language and taking calls into standard library functions >>> as optimization-guiding assertions about the values of their >>> arguments, so better not risk it.) >>> >>> For isfoo(), please use mozilla::IsAsciiFoo() from mozilla/TextUtils.h. >>> >>> For tolower() and toupper(), please use ToLowerCaseASCII() and >>> ToUpperCaseASCII() from nsUnicharUtils.h >>> >>> For strcasecmp() and strncasecmp(), please use their nsCRT::-prefixed >>> versions from nsCRT.h. >>> >>> (Ideally, we should scrub these from vendored C code, too, since being >>> in third-party code doesn't really make the above problems go away.) >>> >>> -- >>> Henri Sivonen >>> hsivo...@mozilla.com >> >> >> >> -- >> Henri Sivonen >> hsivo...@mozilla.com >> ___ >> dev-platform mailing list >> dev-platform@lists.mozilla.org >> https://lists.mozilla.org/listinfo/dev-platform > ___ > dev-platform mailing list > dev-platform@lists.mozilla.org > https://lists.mozilla.org/listinfo/dev-platform > ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Please don't use locale-dependent C standard library functions (was: Re: Please don't use functions from ctype.h and strings.h)
It would be great to have CI linting for these! On Fri, Jun 12, 2020 at 2:19 AM Henri Sivonen wrote: > > This is an occasional re-reminder that anything in the C standard > library that is locale-sensitive is fundamentally broken and should > not be used. > > Today's example is strerr(), which returns a string that is meant to > be rendered to the user, but the string isn't guaranteed to be UTF-8. > > On Mon, Aug 27, 2018 at 3:04 PM Henri Sivonen wrote: > > > > Please don't use the functions from ctype.h and strings.h. > > > > See: > > https://daniel.haxx.se/blog/2018/01/30/isalnum-is-not-my-friend/ > > https://daniel.haxx.se/blog/2008/10/15/strcasecmp-in-turkish/ > > https://stackoverflow.com/questions/2898228/can-isdigit-legitimately-be-locale-dependent-in-c > > > > In addition to these being locale-sensitive, the functions from > > ctype.h are defined to take (signed) int with the value space of > > *unsigned* char or EOF and other argument values are Undefined > > Behavior. Therefore, on platforms where char is signed, passing a char > > sign-extends to int and invokes UB if the most-significant bit of the > > char was set! Bug filed 15 years ago! > > https://bugzilla.mozilla.org/show_bug.cgi?id=216952 (I'm not aware of > > implementations doing anything surprising with this UB but there > > exists precedent for *compiler* writers looking at the standard > > *library* UB language and taking calls into standard library functions > > as optimization-guiding assertions about the values of their > > arguments, so better not risk it.) > > > > For isfoo(), please use mozilla::IsAsciiFoo() from mozilla/TextUtils.h. > > > > For tolower() and toupper(), please use ToLowerCaseASCII() and > > ToUpperCaseASCII() from nsUnicharUtils.h > > > > For strcasecmp() and strncasecmp(), please use their nsCRT::-prefixed > > versions from nsCRT.h. > > > > (Ideally, we should scrub these from vendored C code, too, since being > > in third-party code doesn't really make the above problems go away.) > > > > -- > > Henri Sivonen > > hsivo...@mozilla.com > > > > -- > Henri Sivonen > hsivo...@mozilla.com > ___ > dev-platform mailing list > dev-platform@lists.mozilla.org > https://lists.mozilla.org/listinfo/dev-platform ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Please don't use locale-dependent C standard library functions (was: Re: Please don't use functions from ctype.h and strings.h)
This is an occasional re-reminder that anything in the C standard library that is locale-sensitive is fundamentally broken and should not be used. Today's example is strerr(), which returns a string that is meant to be rendered to the user, but the string isn't guaranteed to be UTF-8. On Mon, Aug 27, 2018 at 3:04 PM Henri Sivonen wrote: > > Please don't use the functions from ctype.h and strings.h. > > See: > https://daniel.haxx.se/blog/2018/01/30/isalnum-is-not-my-friend/ > https://daniel.haxx.se/blog/2008/10/15/strcasecmp-in-turkish/ > https://stackoverflow.com/questions/2898228/can-isdigit-legitimately-be-locale-dependent-in-c > > In addition to these being locale-sensitive, the functions from > ctype.h are defined to take (signed) int with the value space of > *unsigned* char or EOF and other argument values are Undefined > Behavior. Therefore, on platforms where char is signed, passing a char > sign-extends to int and invokes UB if the most-significant bit of the > char was set! Bug filed 15 years ago! > https://bugzilla.mozilla.org/show_bug.cgi?id=216952 (I'm not aware of > implementations doing anything surprising with this UB but there > exists precedent for *compiler* writers looking at the standard > *library* UB language and taking calls into standard library functions > as optimization-guiding assertions about the values of their > arguments, so better not risk it.) > > For isfoo(), please use mozilla::IsAsciiFoo() from mozilla/TextUtils.h. > > For tolower() and toupper(), please use ToLowerCaseASCII() and > ToUpperCaseASCII() from nsUnicharUtils.h > > For strcasecmp() and strncasecmp(), please use their nsCRT::-prefixed > versions from nsCRT.h. > > (Ideally, we should scrub these from vendored C code, too, since being > in third-party code doesn't really make the above problems go away.) > > -- > Henri Sivonen > hsivo...@mozilla.com -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Please don't use functions from ctype.h and strings.h
I think it's worthwhile to have a lint, but regexps are likely to have false positives, so using clang-tidy is probably better. A bug is on file: https://bugzilla.mozilla.org/show_bug.cgi?id=1485588 On Mon, Aug 27, 2018 at 4:06 PM, Tom Ritter wrote: > Is this something worth making a lint over? It's pretty easy to make > regex-based lints, e.g. > > yml-only based lint: > https://searchfox.org/mozilla-central/source/tools/lint/cpp-virtual-final.yml > > yml+python for slightly more complicated regexing: > https://searchfox.org/mozilla-central/source/tools/lint/mingw-capitalization.yml > https://searchfox.org/mozilla-central/source/tools/lint/cpp/mingw-capitalization.py > > -tom > > On Mon, Aug 27, 2018 at 7:04 AM, Henri Sivonen wrote: >> >> Please don't use the functions from ctype.h and strings.h. >> >> See: >> https://daniel.haxx.se/blog/2018/01/30/isalnum-is-not-my-friend/ >> https://daniel.haxx.se/blog/2008/10/15/strcasecmp-in-turkish/ >> >> https://stackoverflow.com/questions/2898228/can-isdigit-legitimately-be-locale-dependent-in-c >> >> In addition to these being locale-sensitive, the functions from >> ctype.h are defined to take (signed) int with the value space of >> *unsigned* char or EOF and other argument values are Undefined >> Behavior. Therefore, on platforms where char is signed, passing a char >> sign-extends to int and invokes UB if the most-significant bit of the >> char was set! Bug filed 15 years ago! >> https://bugzilla.mozilla.org/show_bug.cgi?id=216952 (I'm not aware of >> implementations doing anything surprising with this UB but there >> exists precedent for *compiler* writers looking at the standard >> *library* UB language and taking calls into standard library functions >> as optimization-guiding assertions about the values of their >> arguments, so better not risk it.) >> >> For isfoo(), please use mozilla::IsAsciiFoo() from mozilla/TextUtils.h. >> >> For tolower() and toupper(), please use ToLowerCaseASCII() and >> ToUpperCaseASCII() from nsUnicharUtils.h >> >> For strcasecmp() and strncasecmp(), please use their nsCRT::-prefixed >> versions from nsCRT.h. >> >> (Ideally, we should scrub these from vendored C code, too, since being >> in third-party code doesn't really make the above problems go away.) >> >> -- >> Henri Sivonen >> hsivo...@mozilla.com >> ___ >> dev-platform mailing list >> dev-platform@lists.mozilla.org >> https://lists.mozilla.org/listinfo/dev-platform > > -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Please don't use functions from ctype.h and strings.h
Is this something worth making a lint over? It's pretty easy to make regex-based lints, e.g. yml-only based lint: https://searchfox.org/mozilla-central/source/tools/lint/cpp-virtual-final.yml yml+python for slightly more complicated regexing: https://searchfox.org/mozilla-central/source/tools/lint/mingw-capitalization.yml https://searchfox.org/mozilla-central/source/tools/lint/cpp/mingw-capitalization.py -tom On Mon, Aug 27, 2018 at 7:04 AM, Henri Sivonen wrote: > Please don't use the functions from ctype.h and strings.h. > > See: > https://daniel.haxx.se/blog/2018/01/30/isalnum-is-not-my-friend/ > https://daniel.haxx.se/blog/2008/10/15/strcasecmp-in-turkish/ > https://stackoverflow.com/questions/2898228/can-isdigit- > legitimately-be-locale-dependent-in-c > > In addition to these being locale-sensitive, the functions from > ctype.h are defined to take (signed) int with the value space of > *unsigned* char or EOF and other argument values are Undefined > Behavior. Therefore, on platforms where char is signed, passing a char > sign-extends to int and invokes UB if the most-significant bit of the > char was set! Bug filed 15 years ago! > https://bugzilla.mozilla.org/show_bug.cgi?id=216952 (I'm not aware of > implementations doing anything surprising with this UB but there > exists precedent for *compiler* writers looking at the standard > *library* UB language and taking calls into standard library functions > as optimization-guiding assertions about the values of their > arguments, so better not risk it.) > > For isfoo(), please use mozilla::IsAsciiFoo() from mozilla/TextUtils.h. > > For tolower() and toupper(), please use ToLowerCaseASCII() and > ToUpperCaseASCII() from nsUnicharUtils.h > > For strcasecmp() and strncasecmp(), please use their nsCRT::-prefixed > versions from nsCRT.h. > > (Ideally, we should scrub these from vendored C code, too, since being > in third-party code doesn't really make the above problems go away.) > > -- > Henri Sivonen > hsivo...@mozilla.com > ___ > dev-platform mailing list > dev-platform@lists.mozilla.org > https://lists.mozilla.org/listinfo/dev-platform > ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Please don't use functions from ctype.h and strings.h
Please don't use the functions from ctype.h and strings.h. See: https://daniel.haxx.se/blog/2018/01/30/isalnum-is-not-my-friend/ https://daniel.haxx.se/blog/2008/10/15/strcasecmp-in-turkish/ https://stackoverflow.com/questions/2898228/can-isdigit-legitimately-be-locale-dependent-in-c In addition to these being locale-sensitive, the functions from ctype.h are defined to take (signed) int with the value space of *unsigned* char or EOF and other argument values are Undefined Behavior. Therefore, on platforms where char is signed, passing a char sign-extends to int and invokes UB if the most-significant bit of the char was set! Bug filed 15 years ago! https://bugzilla.mozilla.org/show_bug.cgi?id=216952 (I'm not aware of implementations doing anything surprising with this UB but there exists precedent for *compiler* writers looking at the standard *library* UB language and taking calls into standard library functions as optimization-guiding assertions about the values of their arguments, so better not risk it.) For isfoo(), please use mozilla::IsAsciiFoo() from mozilla/TextUtils.h. For tolower() and toupper(), please use ToLowerCaseASCII() and ToUpperCaseASCII() from nsUnicharUtils.h For strcasecmp() and strncasecmp(), please use their nsCRT::-prefixed versions from nsCRT.h. (Ideally, we should scrub these from vendored C code, too, since being in third-party code doesn't really make the above problems go away.) -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform