Re: strcasecmp raises its...
On 5/19/22 7:54 PM, Stefan Eissing wrote:
>> On 19.05.2022 at 17:20, Ruediger Pluem wrote:
>>
>>> +1 from me for replacing our protocol+config handling code with the
>>> ap_cstr_casecmp().
>>
>> +1. Just to mention: Christophe already did quite some work in this area.
>
> Thanks, Christophe.
>
> For my understanding: the code in APR for tables uses strcasecmp() and I am
> probably just too stupid to see where this is redefined?

From my point of view apr_tables uses strcasecmp and this is not redefined. I guess one of the reasons is that the apr_tables code has been in APR forever, while the apr_cstr code only came into APR in 2016, and the apr_tables code was not adjusted after that. The other reason, at least for 1.x, might be concerns that switching to apr_cstr could change the behavior of apr_tables in a way that is incompatible with the versioning rules of APR. But this is a discussion for dev@apr.

Regards

Rüdiger
Re: strcasecmp raises its...
> On 19.05.2022 at 17:20, Ruediger Pluem wrote:
>
> On 5/19/22 5:15 PM, Stefan Eissing wrote:
>>> On 19.05.2022 at 16:44, Joe Orton wrote:
>>>
>>> On Wed, May 18, 2022 at 05:34:22PM +0200, Ruediger Pluem wrote:
>>>> On 5/18/22 4:55 PM, Joe Orton wrote:
>>>>> I think for httpd it is only safe and sane to run httpd with LANG=C, we
>>>>> do this in the default service scripts in Fedora/RHEL for a very long
>>>>> time. Other than the protocol parsing issues you can get in non-C
>>>>> locales, you can also get "surprises" when sort order can change with
>>>>> the system locale, impacting e.g. config file load ordering and more.
>>>>
>>>> Don't you need a locale sensitive case insensitive string comparison in
>>>> case of case blind file systems which support extended
>>>> latin characters? I know these Germans with their Umlaute :-).
>>>
>>> Heh. Well, I got away with it so far :)
>>>
>>>>> So IMHO it is probably sufficient & simpler to adjust apachectl to set
>>>>> LANG=C rather than trying to eliminate strcasecmp, and add another
>>>>> strcasecmp() reimplementation in APR, in this case.
>>>>
>>>> We already have this implementation in APR and we use the
>>>> httpd one which is just a forward port from APR to httpd until we require a
>>>> sufficient recent APR version in several places.
>>>> The question is just if we should use them everywhere and thus do the
>>>> correct thing no matter what locale is set.
>>>
>>> Ah, I missed that, thanks.
>>>
>>> +1 from me on doing replacement of strcasecmp() with the
>>> locale-insensitive versions then. At least with config options, protocol
>>> data, it is definitely right.
>>
>> +1 from me for replacing our protocol+config handling code with the
>> ap_cstr_casecmp().
>
> +1. Just to mention: Christophe already did quite some work in this area.

Thanks, Christophe.

For my understanding: the code in APR for tables uses strcasecmp() and I am probably just too stupid to see where this is redefined?

Kind Regards,
Stefan

> Regards
>
> Rüdiger
Re: strcasecmp raises its...
On 5/19/22 5:15 PM, Stefan Eissing wrote:
>> On 19.05.2022 at 16:44, Joe Orton wrote:
>>
>> On Wed, May 18, 2022 at 05:34:22PM +0200, Ruediger Pluem wrote:
>>> On 5/18/22 4:55 PM, Joe Orton wrote:
>>>> I think for httpd it is only safe and sane to run httpd with LANG=C, we
>>>> do this in the default service scripts in Fedora/RHEL for a very long
>>>> time. Other than the protocol parsing issues you can get in non-C
>>>> locales, you can also get "surprises" when sort order can change with
>>>> the system locale, impacting e.g. config file load ordering and more.
>>>
>>> Don't you need a locale sensitive case insensitive string comparison in
>>> case of case blind file systems which support extended
>>> latin characters? I know these Germans with their Umlaute :-).
>>
>> Heh. Well, I got away with it so far :)
>>
>>>> So IMHO it is probably sufficient & simpler to adjust apachectl to set
>>>> LANG=C rather than trying to eliminate strcasecmp, and add another
>>>> strcasecmp() reimplementation in APR, in this case.
>>>
>>> We already have this implementation in APR and we use the
>>> httpd one which is just a forward port from APR to httpd until we require a
>>> sufficient recent APR version in several places.
>>> The question is just if we should use them everywhere and thus do the
>>> correct thing no matter what locale is set.
>>
>> Ah, I missed that, thanks.
>>
>> +1 from me on doing replacement of strcasecmp() with the
>> locale-insensitive versions then. At least with config options, protocol
>> data, it is definitely right.
>
> +1 from me for replacing our protocol+config handling code with the
> ap_cstr_casecmp().

+1. Just to mention: Christophe already did quite some work in this area.

Regards

Rüdiger
Re: strcasecmp raises its...
> On 19.05.2022 at 16:44, Joe Orton wrote:
>
> On Wed, May 18, 2022 at 05:34:22PM +0200, Ruediger Pluem wrote:
>> On 5/18/22 4:55 PM, Joe Orton wrote:
>>> I think for httpd it is only safe and sane to run httpd with LANG=C, we
>>> do this in the default service scripts in Fedora/RHEL for a very long
>>> time. Other than the protocol parsing issues you can get in non-C
>>> locales, you can also get "surprises" when sort order can change with
>>> the system locale, impacting e.g. config file load ordering and more.
>>
>> Don't you need a locale sensitive case insensitive string comparison in case
>> of case blind file systems which support extended
>> latin characters? I know these Germans with their Umlaute :-).
>
> Heh. Well, I got away with it so far :)
>
>>> So IMHO it is probably sufficient & simpler to adjust apachectl to set
>>> LANG=C rather than trying to eliminate strcasecmp, and add another
>>> strcasecmp() reimplementation in APR, in this case.
>>
>> We already have this implementation in APR and we use the
>> httpd one which is just a forward port from APR to httpd until we require a
>> sufficient recent APR version in several places.
>> The question is just if we should use them everywhere and thus do the
>> correct thing no matter what locale is set.
>
> Ah, I missed that, thanks.
>
> +1 from me on doing replacement of strcasecmp() with the
> locale-insensitive versions then. At least with config options, protocol
> data, it is definitely right.

+1 from me for replacing our protocol+config handling code with ap_cstr_casecmp().

Cheers,
Stefan
Re: strcasecmp raises its...
On Wed, May 18, 2022 at 05:34:22PM +0200, Ruediger Pluem wrote:
> On 5/18/22 4:55 PM, Joe Orton wrote:
> > I think for httpd it is only safe and sane to run httpd with LANG=C, we
> > do this in the default service scripts in Fedora/RHEL for a very long
> > time. Other than the protocol parsing issues you can get in non-C
> > locales, you can also get "surprises" when sort order can change with
> > the system locale, impacting e.g. config file load ordering and more.
>
> Don't you need a locale sensitive case insensitive string comparison in case
> of case blind file systems which support extended
> latin characters? I know these Germans with their Umlaute :-).

Heh. Well, I got away with it so far :)

> > So IMHO it is probably sufficient & simpler to adjust apachectl to set
> > LANG=C rather than trying to eliminate strcasecmp, and add another
> > strcasecmp() reimplementation in APR, in this case.
>
> We already have this implementation in APR and we use the
> httpd one which is just a forward port from APR to httpd until we require a
> sufficient recent APR version in several places.
> The question is just if we should use them everywhere and thus do the correct
> thing no matter what locale is set.

Ah, I missed that, thanks.

+1 from me on replacing strcasecmp() with the locale-insensitive versions then. At least for config options and protocol data, it is definitely right.

Regards, Joe
Re: strcasecmp raises its...
On 5/18/22 7:17 PM, Nick Kew wrote:
>> On 18 May 2022, at 16:34, Ruediger Pluem wrote:
>>
>> Rüdiger
>
> What locale are YOU in there? Any attempt at locale is going to have to draw
> lines:

de_DE.UTF-8

> what are the rules for when Ruediger == Rüdiger?

There can be none, because I can transcribe ü to ue, but not every ue is an ü the other way around.

Regards

Rüdiger
Re: strcasecmp raises its...
> On 18.05.2022 at 19:17, Nick Kew wrote:
>
>> On 18 May 2022, at 16:34, Ruediger Pluem wrote:
>>
>> Rüdiger
>
> What locale are YOU in there? Any attempt at locale is going to have to draw
> lines:
> what are the rules for when Ruediger == Rüdiger?
>
> In a WWW (and hence httpd) context, internationalised domain names raise all
> kinds
> of issues, including for us potentially breaking case-insensitivity rules in
> matching
> hostnames, and perhaps other configuration matters. What happens if we make
> locale a configurable parameter for hostnames and use strcasecmp_l?

It is not restricted to that. In the Turkish locale, strcasecmp("file", "FILE") can be != 0 ("can be" because POSIX leaves the result unspecified outside the POSIX locale, but it has been a real-world issue in curl in the past).

If we enforce locale to "C" in apachectl, that seems to solve the issue. However, using our own ap_cstr_casecmp in protocol functions seems like a good idea.

Kind Regards,
Stefan

> --
> Nick Kew
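To make the Turkish-locale point concrete, here is a minimal sketch in C. It is not code from httpd, APR or curl: `fold_ascii`, `ascii_only_casecmp` and `compare_in_locale` are hypothetical names invented for this illustration, and the sketch assumes an ASCII-compatible character set. It tries to switch to a named locale, then compares two strings once with POSIX `strcasecmp()` and once with an ASCII-only fold, so the two results can be looked at side by side. Whether `strcasecmp()` really disagrees depends on the libc and on whether the locale is installed at all, so no particular outcome is claimed for it.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <strings.h>

/* Fold only ASCII 'A'..'Z'; assumes an ASCII-compatible character set. */
static int fold_ascii(int c)
{
    return (c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c;
}

/* Hypothetical locale-independent comparison (illustration only). */
static int ascii_only_casecmp(const char *s1, const char *s2)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;

    for (;; a++, b++) {
        int ca = fold_ascii(*a);
        int cb = fold_ascii(*b);
        if (ca != cb)
            return ca - cb;
        if (ca == '\0')
            return 0;
    }
}

/*
 * Try to switch to locale_name, compare a and b both ways, then restore
 * the previous locale. Returns 1 if the locale could be set, 0 otherwise
 * (in which case strcasecmp() ran under whatever locale was active).
 */
int compare_in_locale(const char *locale_name, const char *a, const char *b,
                      int *posix_equal, int *ascii_equal)
{
    char saved[512];
    const char *current;
    int switched;

    assert(locale_name && a && b && posix_equal && ascii_equal);

    /* Copy the current locale name; later setlocale() calls may overwrite it. */
    current = setlocale(LC_ALL, NULL);
    snprintf(saved, sizeof saved, "%s", current ? current : "C");

    switched = (setlocale(LC_ALL, locale_name) != NULL);

    *posix_equal = (strcasecmp(a, b) == 0);          /* may depend on the locale */
    *ascii_equal = (ascii_only_casecmp(a, b) == 0);  /* never depends on it */

    setlocale(LC_ALL, saved);
    return switched;
}
```

Calling `compare_in_locale("tr_TR.UTF-8", "file", "FILE", &p, &a)` always leaves `a` set to 1, while `p` is only guaranteed to be 1 when the comparison runs under the "C"/POSIX locale. That is the argument for using a locale-independent comparison on protocol data even if apachectl also forces LANG=C.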
Re: strcasecmp raises its...
> On 18 May 2022, at 16:34, Ruediger Pluem wrote:
>
> Rüdiger

What locale are YOU in there? Any attempt at locale is going to have to draw lines: what are the rules for when Ruediger == Rüdiger?

In a WWW (and hence httpd) context, internationalised domain names raise all kinds of issues, including for us potentially breaking case-insensitivity rules in matching hostnames, and perhaps other configuration matters. What happens if we make locale a configurable parameter for hostnames and use strcasecmp_l?

--
Nick Kew
Re: strcasecmp raises its...
On 5/18/22 4:55 PM, Joe Orton wrote:
> On Wed, May 18, 2022 at 12:53:57PM +0200, Ruediger Pluem wrote:
>> On 5/18/22 12:19 PM, Stefan Eissing wrote:
>>> 2022 and we discuss strcasecmp() again?
>>>
>>> Background: OpenSSL 3.0.3 added OPENSSL_strcasecmp() and friends and there
>>> are several issue around their implementation. Up to this version, they
>>> relied on the POSIX strcasecmp(). Whatever their reasons for their change...
>>>
>>> Checking our sources, we have ap_cstr_casecmp() that does the right thing.
>>> But
>>> - we do not use it everywhere
>>> - it is not part of APR which relies on the POSIX strcasecmp(), esp.
>>> apr_table does.
>>
>> It is, but it may not be used where it possibly should:
>>
>> https://apr.apache.org/docs/apr/1.7/group__apr__cstr.html
>>
>>> I want to handshake with you regarding this:
>>> 1. should we scan our sources for strcasecmp and replace it with
>>> ap_cstr_casecmp()?
>>
>> If I remember correctly ap_cstr_casecmp was only designed to be used for
>> comparisons of HTTP protocol strings as it is locale
>> agnostic. Hence I am not sure if it is correct to use it everywhere. From
>> the documentation:
>>
>> **
>> * Perform a case-insensitive comparison of two strings @a str1 and @a str2,
>> * treating upper and lower case values of the 26 standard C/POSIX alphabetic
>> * characters as equivalent. Extended latin characters outside of this set
>> * are treated as unique octets, irrespective of the current locale.
>>
>> Hence it might be wrong to use it in cases where you need to respect the
>> locale.
>
> Are there really any cases like that in httpd?
>
> I think for httpd it is only safe and sane to run httpd with LANG=C, we
> do this in the default service scripts in Fedora/RHEL for a very long
> time. Other than the protocol parsing issues you can get in non-C
> locales, you can also get "surprises" when sort order can change with
> the system locale, impacting e.g. config file load ordering and more.

Don't you need a locale-sensitive, case-insensitive string comparison for case-blind file systems that support extended Latin characters? I know these Germans with their Umlaute :-).

> So IMHO it is probably sufficient & simpler to adjust apachectl to set
> LANG=C rather than trying to eliminate strcasecmp, and add another
> strcasecmp() reimplementation in APR, in this case.

We already have this implementation in APR, and in several places we use the httpd one, which is just a forward port from APR to httpd until we require a sufficiently recent APR version. The question is just whether we should use them everywhere and thus do the correct thing no matter what locale is set.

Regards

Rüdiger
Re: strcasecmp raises its...
On Wed, May 18, 2022 at 12:53:57PM +0200, Ruediger Pluem wrote:
> On 5/18/22 12:19 PM, Stefan Eissing wrote:
> > 2022 and we discuss strcasecmp() again?
> >
> > Background: OpenSSL 3.0.3 added OPENSSL_strcasecmp() and friends and there
> > are several issue around their implementation. Up to this version, they
> > relied on the POSIX strcasecmp(). Whatever their reasons for their change...
> >
> > Checking our sources, we have ap_cstr_casecmp() that does the right thing.
> > But
> > - we do not use it everywhere
> > - it is not part of APR which relies on the POSIX strcasecmp(), esp.
> > apr_table does.
>
> It is, but it may not be used where it possibly should:
>
> https://apr.apache.org/docs/apr/1.7/group__apr__cstr.html
>
> > I want to handshake with you regarding this:
> > 1. should we scan our sources for strcasecmp and replace it with
> > ap_cstr_casecmp()?
>
> If I remember correctly ap_cstr_casecmp was only designed to be used for
> comparisons of HTTP protocol strings as it is locale
> agnostic. Hence I am not sure if it is correct to use it everywhere. From the
> documentation:
>
> **
> * Perform a case-insensitive comparison of two strings @a str1 and @a str2,
> * treating upper and lower case values of the 26 standard C/POSIX alphabetic
> * characters as equivalent. Extended latin characters outside of this set
> * are treated as unique octets, irrespective of the current locale.
>
> Hence it might be wrong to use it in cases where you need to respect the
> locale.

Are there really any cases like that in httpd?

I think it is only safe and sane to run httpd with LANG=C; we have done this in the default service scripts in Fedora/RHEL for a very long time. Other than the protocol parsing issues you can get in non-C locales, you can also get "surprises" when sort order changes with the system locale, impacting e.g. config file load ordering and more.

So IMHO it is probably sufficient & simpler to adjust apachectl to set LANG=C rather than trying to eliminate strcasecmp, and add another strcasecmp() reimplementation in APR, in this case.

Regards, Joe
Re: strcasecmp raises its...
On 5/18/22 12:19 PM, Stefan Eissing wrote:
> 2022 and we discuss strcasecmp() again?
>
> Background: OpenSSL 3.0.3 added OPENSSL_strcasecmp() and friends and there
> are several issue around their implementation. Up to this version, they
> relied on the POSIX strcasecmp(). Whatever their reasons for their change...
>
> Checking our sources, we have ap_cstr_casecmp() that does the right thing.
> But
> - we do not use it everywhere
> - it is not part of APR which relies on the POSIX strcasecmp(), esp.
> apr_table does.

It is, but it may not be used where it possibly should:

https://apr.apache.org/docs/apr/1.7/group__apr__cstr.html

> I want to handshake with you regarding this:
> 1. should we scan our sources for strcasecmp and replace it with
> ap_cstr_casecmp()?

If I remember correctly ap_cstr_casecmp was only designed to be used for comparisons of HTTP protocol strings as it is locale agnostic. Hence I am not sure if it is correct to use it everywhere. From the documentation:

 **
 * Perform a case-insensitive comparison of two strings @a str1 and @a str2,
 * treating upper and lower case values of the 26 standard C/POSIX alphabetic
 * characters as equivalent. Extended latin characters outside of this set
 * are treated as unique octets, irrespective of the current locale.

Hence it might be wrong to use it in cases where you need to respect the locale.

Regards

Rüdiger
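The documented behaviour quoted above can be sketched in a few lines of C. This is only an illustration of the described semantics, not the APR or httpd implementation: `ascii_fold` and `sketch_cstr_casecmp` are hypothetical names, and the range check assumes an ASCII-compatible character set (it would not hold on EBCDIC, where the letters are not contiguous).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fold only the 26 ASCII upper-case letters to lower case. Every other
 * octet, including bytes of extended Latin characters, is left unchanged.
 */
static unsigned char ascii_fold(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/*
 * Hypothetical helper following the quoted documentation: returns 0 when
 * the strings are equal ignoring ASCII case, otherwise the difference of
 * the first pair of folded octets that differ. The locale is never consulted.
 */
int sketch_cstr_casecmp(const char *s1, const char *s2)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;

    assert(s1 != NULL && s2 != NULL);

    for (;;) {
        unsigned char ca = ascii_fold(*a++);
        unsigned char cb = ascii_fold(*b++);

        if (ca != cb)
            return (int)ca - (int)cb;
        if (ca == '\0')
            return 0;
    }
}
```

With this sketch "Content-Type" and "content-type" compare equal, while the UTF-8 encodings of "Rüdiger" and "RÜdiger" do not, because the bytes of ü and Ü are treated as distinct octets. That is the property wanted for HTTP protocol strings, and also the reason it may be the wrong tool where the locale has to be respected.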