Re: strcasecmp raises its...

2022-05-20 Thread Ruediger Pluem



On 5/19/22 7:54 PM, Stefan Eissing wrote:
> 
> 
>> On 19.05.2022 at 17:20, Ruediger Pluem wrote:
>>
>>

>>>
>>> +1 from me for replacing our protocol+config handling code with the 
>>> ap_cstr_casecmp().
>>
>> +1. Just to mention: Christophe already did quite some work in this area.
> 
> Thanks, Christophe.
> 
> For my understanding: the code in APR for tables uses strcasecmp() and I am 
> probably just too stupid to see where this is redefined?
> 

From my point of view apr_tables uses strcasecmp and this is not redefined. I
guess one of the reasons is that the apr_tables stuff
has been there forever in APR, the apr_cstr stuff only came into APR in 2016,
and the code of apr_tables was not adjusted after
this. The other reason, at least for 1.x, might be concerns that switching to
apr_cstr might change the behavior of apr_tables in a
way that is incompatible with the versioning rules of APR.
But this is a discussion for dev@apr.

Regards

Rüdiger


Re: strcasecmp raises its...

2022-05-19 Thread Stefan Eissing



> On 19.05.2022 at 17:20, Ruediger Pluem wrote:
> 
> 
> 
> On 5/19/22 5:15 PM, Stefan Eissing wrote:
>> 
>> 
>>> On 19.05.2022 at 16:44, Joe Orton wrote:
>>> 
>>> On Wed, May 18, 2022 at 05:34:22PM +0200, Ruediger Pluem wrote:
>>>> On 5/18/22 4:55 PM, Joe Orton wrote:
>>>>> I think for httpd it is only safe and sane to run httpd with LANG=C; we
>>>>> have done this in the default service scripts in Fedora/RHEL for a very long
>>>>> time. Other than the protocol parsing issues you can get in non-C
>>>>> locales, you can also get "surprises" when sort order can change with
>>>>> the system locale, impacting e.g. config file load ordering and more.
>>>>
>>>> Don't you need a locale sensitive case insensitive string comparison in
>>>> case of case blind file systems which support extended
>>>> latin characters? I know these Germans with their Umlaute :-).
>>> 
>>> Heh. Well, I got away with it so far :)
>>> 
>>>>> So IMHO it is probably sufficient & simpler to adjust apachectl to set
>>>>> LANG=C rather than trying to eliminate strcasecmp, and add another
>>>>> strcasecmp() reimplementation in APR, in this case.
>>>>
>>>> We already have this implementation in APR and we use the
>>>> httpd one, which is just a forward port from APR to httpd until we require
>>>> a sufficiently recent APR version, in several places.
>>>> The question is just if we should use them everywhere and thus do the
>>>> correct thing no matter what locale is set.
>>> 
>>> Ah, I missed that, thanks.
>>> 
>>> +1 from me on doing replacement of strcasecmp() with the 
>>> locale-insensitive versions then. At least with config options, protocol 
>>> data, it is definitely right.
>>> 
>> 
>> +1 from me for replacing our protocol+config handling code with the 
>> ap_cstr_casecmp().
> 
> +1. Just to mention: Christophe already did quite some work in this area.

Thanks, Christophe.

For my understanding: the code in APR for tables uses strcasecmp() and I am 
probably just too stupid to see where this is redefined?

Kind Regards,
Stefan

> 
> Regards
> 
> Rüdiger



Re: strcasecmp raises its...

2022-05-19 Thread Ruediger Pluem



On 5/19/22 5:15 PM, Stefan Eissing wrote:
> 
> 
>> On 19.05.2022 at 16:44, Joe Orton wrote:
>>
>> On Wed, May 18, 2022 at 05:34:22PM +0200, Ruediger Pluem wrote:
>>> On 5/18/22 4:55 PM, Joe Orton wrote:
>>>> I think for httpd it is only safe and sane to run httpd with LANG=C; we
>>>> have done this in the default service scripts in Fedora/RHEL for a very long
>>>> time. Other than the protocol parsing issues you can get in non-C
>>>> locales, you can also get "surprises" when sort order can change with
>>>> the system locale, impacting e.g. config file load ordering and more.
>>>
>>> Don't you need a locale sensitive case insensitive string comparison in 
>>> case of case blind file systems which support extended
>>> latin characters? I know these Germans with their Umlaute :-).
>>
>> Heh. Well, I got away with it so far :)
>>
>>>> So IMHO it is probably sufficient & simpler to adjust apachectl to set
>>>> LANG=C rather than trying to eliminate strcasecmp, and add another
>>>> strcasecmp() reimplementation in APR, in this case.
>>>
>>> We already have this implementation in APR and we use the
>>> httpd one, which is just a forward port from APR to httpd until we require a
>>> sufficiently recent APR version, in several places.
>>> The question is just if we should use them everywhere and thus do the 
>>> correct thing no matter what locale is set.
>>
>> Ah, I missed that, thanks.
>>
>> +1 from me on doing replacement of strcasecmp() with the 
>> locale-insensitive versions then. At least with config options, protocol 
>> data, it is definitely right.
>>
> 
> +1 from me for replacing our protocol+config handling code with the 
> ap_cstr_casecmp().

+1. Just to mention: Christophe already did quite some work in this area.

Regards

Rüdiger



Re: strcasecmp raises its...

2022-05-19 Thread Stefan Eissing



> On 19.05.2022 at 16:44, Joe Orton wrote:
> 
> On Wed, May 18, 2022 at 05:34:22PM +0200, Ruediger Pluem wrote:
>> On 5/18/22 4:55 PM, Joe Orton wrote:
>>> I think for httpd it is only safe and sane to run httpd with LANG=C; we
>>> have done this in the default service scripts in Fedora/RHEL for a very long
>>> time. Other than the protocol parsing issues you can get in non-C
>>> locales, you can also get "surprises" when sort order can change with 
>>> the system locale, impacting e.g. config file load ordering and more.
>> 
>> Don't you need a locale sensitive case insensitive string comparison in case 
>> of case blind file systems which support extended
>> latin characters? I know these Germans with their Umlaute :-).
> 
> Heh. Well, I got away with it so far :)
> 
>>> So IMHO it is probably sufficient & simpler to adjust apachectl to set 
>>> LANG=C rather than trying to eliminate strcasecmp, and add another 
>>> strcasecmp() reimplementation in APR, in this case.
>> 
>> We already have this implementation in APR and we use the
>> httpd one, which is just a forward port from APR to httpd until we require a
>> sufficiently recent APR version, in several places.
>> The question is just if we should use them everywhere and thus do the 
>> correct thing no matter what locale is set.
> 
> Ah, I missed that, thanks.
> 
> +1 from me on doing replacement of strcasecmp() with the 
> locale-insensitive versions then. At least with config options, protocol 
> data, it is definitely right.
> 

+1 from me for replacing our protocol+config handling code with the 
ap_cstr_casecmp().

Cheers,
Stefan

Re: strcasecmp raises its...

2022-05-19 Thread Joe Orton
On Wed, May 18, 2022 at 05:34:22PM +0200, Ruediger Pluem wrote:
> On 5/18/22 4:55 PM, Joe Orton wrote:
> > I think for httpd it is only safe and sane to run httpd with LANG=C; we
> > have done this in the default service scripts in Fedora/RHEL for a very long
> > time. Other than the protocol parsing issues you can get in non-C
> > locales, you can also get "surprises" when sort order can change with 
> > the system locale, impacting e.g. config file load ordering and more.
> 
> Don't you need a locale sensitive case insensitive string comparison in case 
> of case blind file systems which support extended
> latin characters? I know these Germans with their Umlaute :-).

Heh. Well, I got away with it so far :)

> > So IMHO it is probably sufficient & simpler to adjust apachectl to set 
> > LANG=C rather than trying to eliminate strcasecmp, and add another 
> > strcasecmp() reimplementation in APR, in this case.
> 
> We already have this implementation in APR and we use the
> httpd one, which is just a forward port from APR to httpd until we require a
> sufficiently recent APR version, in several places.
> The question is just if we should use them everywhere and thus do the correct 
> thing no matter what locale is set.

Ah, I missed that, thanks.

+1 from me on doing replacement of strcasecmp() with the 
locale-insensitive versions then. At least with config options, protocol 
data, it is definitely right.

Regards, Joe




Re: strcasecmp raises its...

2022-05-19 Thread Ruediger Pluem



On 5/18/22 7:17 PM, Nick Kew wrote:
> 
>> On 18 May 2022, at 16:34, Ruediger Pluem wrote:
>>
>> Rüdiger
> 
> What locale are YOU in there?  Any attempt at locale is going to have to
> draw lines:

de_DE.UTF-8

> what are the rules for when Ruediger == Rüdiger?

There can be none, because I can transcribe ü to ue, but not every ue is an ü
the other way around.

Regards

Rüdiger



Re: strcasecmp raises its...

2022-05-18 Thread Stefan Eissing



> On 18.05.2022 at 19:17, Nick Kew wrote:
> 
> 
>> On 18 May 2022, at 16:34, Ruediger Pluem wrote:
>> 
>> Rüdiger
> 
> What locale are YOU in there?  Any attempt at locale is going to have to
> draw lines:
> what are the rules for when Ruediger == Rüdiger?
> 
> In a WWW (and hence httpd) context, internationalised domain names raise
> all kinds of issues, including for us potentially breaking
> case-insensitivity rules in matching hostnames, and perhaps other
> configuration matters.  What happens if we make locale a configurable
> parameter for hostnames and use strcasecmp_l?

It is not restricted to that. In the Turkish locale strcasecmp("file", "FILE")
can be != 0 ("can be" because POSIX leaves the result unspecified outside the
POSIX locale, but it has been a real-world issue in curl in the past).

If we enforce locale to "C" in apachectl, that seems to solve the issue. 
However using our own ap_cstr_casecmp in protocol functions seems like a good 
idea.
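
To make the two points above concrete, here is a minimal sketch (not httpd or
APR code) that runs the POSIX strcasecmp() under a chosen locale and then
restores the previous one. The helper name sketch_casecmp_in_locale is made up
for illustration, and any Turkish locale name you pass in (for example
"tr_TR.UTF-8") is an assumption: names differ between systems and the locale
may not be installed at all.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical helper, for illustration only.
 * Runs POSIX strcasecmp(a, b) with LC_ALL temporarily set to locale_name.
 * Returns 0 and stores the comparison result in *result on success,
 * or -1 if the locale could not be queried, saved, or selected.
 * Not thread-safe: setlocale() changes process-wide state.
 */
int sketch_casecmp_in_locale(const char *locale_name,
                             const char *a, const char *b, int *result)
{
    char saved[256];
    const char *current = setlocale(LC_ALL, NULL);

    if (current == NULL || strlen(current) >= sizeof(saved))
        return -1;
    /* Copy the name: later setlocale() calls may overwrite the string. */
    snprintf(saved, sizeof(saved), "%s", current);

    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;              /* locale not available on this system */

    *result = strcasecmp(a, b);

    setlocale(LC_ALL, saved);   /* restore whatever was active before */
    return 0;
}
```

With "C" as the locale, comparing "file" and "FILE" yields 0, since that is
the one case POSIX pins down. With a Turkish locale the call may fail (locale
missing) or may report a mismatch, depending on the libc, so I would not rely
on either outcome.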

Kind Regards,
Stefan

> 
> -- 
> Nick Kew



Re: strcasecmp raises its...

2022-05-18 Thread Nick Kew


> On 18 May 2022, at 16:34, Ruediger Pluem wrote:
> 
> Rüdiger

What locale are YOU in there?  Any attempt at locale is going to have to
draw lines:
what are the rules for when Ruediger == Rüdiger?

In a WWW (and hence httpd) context, internationalised domain names raise
all kinds of issues, including for us potentially breaking
case-insensitivity rules in matching hostnames, and perhaps other
configuration matters.  What happens if we make locale a configurable
parameter for hostnames and use strcasecmp_l?

-- 
Nick Kew

Re: strcasecmp raises its...

2022-05-18 Thread Ruediger Pluem



On 5/18/22 4:55 PM, Joe Orton wrote:
> On Wed, May 18, 2022 at 12:53:57PM +0200, Ruediger Pluem wrote:
>>
>>
>> On 5/18/22 12:19 PM, Stefan Eissing wrote:
>>> 2022 and we discuss strcasecmp() again?
>>>
>>> Background: OpenSSL 3.0.3 added OPENSSL_strcasecmp() and friends and there 
>>> are several issues around their implementation. Up to this version, they 
>>> relied on the POSIX strcasecmp(). Whatever their reasons for their change...
>>>
>>> Checking our sources, we have ap_cstr_casecmp() that does the right thing. 
>>> But 
>>> - we do not use it everywhere
>>> - it is not part of APR which relies on the POSIX strcasecmp(), esp. 
>>> apr_table does.
>>
>> It is, but it may not be used where it possibly should:
>>
>> https://apr.apache.org/docs/apr/1.7/group__apr__cstr.html
>>
>>>
>>> I want to handshake with you regarding this:
>>> 1. should we scan our sources for strcasecmp and replace it with 
>>> ap_cstr_casecmp()?
>>
>> If I remember correctly ap_cstr_casecmp was only designed to be used for 
>> comparisons of HTTP protocol strings as it is locale
>> agnostic. Hence I am not sure if it is correct to use it everywhere. From 
>> the documentation:
>>
>> /**
>>  * Perform a case-insensitive comparison of two strings @a str1 and @a str2,
>>  * treating upper and lower case values of the 26 standard C/POSIX alphabetic
>>  * characters as equivalent. Extended latin characters outside of this set
>>  * are treated as unique octets, irrespective of the current locale.
>>
>> Hence it might be wrong to use it in cases where you need to respect the 
>> locale.
> 
> Are there really any cases like that in httpd?
> 
> I think for httpd it is only safe and sane to run httpd with LANG=C; we
> have done this in the default service scripts in Fedora/RHEL for a very long
> time. Other than the protocol parsing issues you can get in non-C
> locales, you can also get "surprises" when sort order can change with 
> the system locale, impacting e.g. config file load ordering and more.

Don't you need a locale sensitive case insensitive string comparison in case of 
case blind file systems which support extended
latin characters? I know these Germans with their Umlaute :-).

> 
> So IMHO it is probably sufficient & simpler to adjust apachectl to set 
> LANG=C rather than trying to eliminate strcasecmp, and add another 
> strcasecmp() reimplementation in APR, in this case.

We already have this implementation in APR and we use the
httpd one, which is just a forward port from APR to httpd until we require a
sufficiently recent APR version, in several places.
The question is just if we should use them everywhere and thus do the correct 
thing no matter what locale is set.

Regards

Rüdiger


Re: strcasecmp raises its...

2022-05-18 Thread Joe Orton
On Wed, May 18, 2022 at 12:53:57PM +0200, Ruediger Pluem wrote:
> 
> 
> On 5/18/22 12:19 PM, Stefan Eissing wrote:
> > 2022 and we discuss strcasecmp() again?
> > 
> > Background: OpenSSL 3.0.3 added OPENSSL_strcasecmp() and friends and there 
> > are several issues around their implementation. Up to this version, they 
> > relied on the POSIX strcasecmp(). Whatever their reasons for their change...
> > 
> > Checking our sources, we have ap_cstr_casecmp() that does the right thing. 
> > But 
> > - we do not use it everywhere
> > - it is not part of APR which relies on the POSIX strcasecmp(), esp. 
> > apr_table does.
> 
> It is, but it may not be used where it possibly should:
> 
> https://apr.apache.org/docs/apr/1.7/group__apr__cstr.html
> 
> > 
> > I want to handshake with you regarding this:
> > 1. should we scan our sources for strcasecmp and replace it with 
> > ap_cstr_casecmp()?
> 
> If I remember correctly ap_cstr_casecmp was only designed to be used for 
> comparisons of HTTP protocol strings as it is locale
> agnostic. Hence I am not sure if it is correct to use it everywhere. From the 
> documentation:
> 
> /**
>  * Perform a case-insensitive comparison of two strings @a str1 and @a str2,
>  * treating upper and lower case values of the 26 standard C/POSIX alphabetic
>  * characters as equivalent. Extended latin characters outside of this set
>  * are treated as unique octets, irrespective of the current locale.
> 
> Hence it might be wrong to use it in cases where you need to respect the 
> locale.

Are there really any cases like that in httpd?

I think for httpd it is only safe and sane to run httpd with LANG=C; we
have done this in the default service scripts in Fedora/RHEL for a very long
time. Other than the protocol parsing issues you can get in non-C
locales, you can also get "surprises" when sort order can change with 
the system locale, impacting e.g. config file load ordering and more.

So IMHO it is probably sufficient & simpler to adjust apachectl to set 
LANG=C rather than trying to eliminate strcasecmp, and add another 
strcasecmp() reimplementation in APR, in this case.

Regards, Joe



Re: strcasecmp raises its...

2022-05-18 Thread Ruediger Pluem



On 5/18/22 12:19 PM, Stefan Eissing wrote:
> 2022 and we discuss strcasecmp() again?
> 
> Background: OpenSSL 3.0.3 added OPENSSL_strcasecmp() and friends and there 
> are several issues around their implementation. Up to this version, they 
> relied on the POSIX strcasecmp(). Whatever their reasons for their change...
> 
> Checking our sources, we have ap_cstr_casecmp() that does the right thing. 
> But 
> - we do not use it everywhere
> - it is not part of APR which relies on the POSIX strcasecmp(), esp. 
> apr_table does.

It is, but it may not be used where it possibly should:

https://apr.apache.org/docs/apr/1.7/group__apr__cstr.html

> 
> I want to handshake with you regarding this:
> 1. should we scan our sources for strcasecmp and replace it with 
> ap_cstr_casecmp()?

If I remember correctly ap_cstr_casecmp was only designed to be used for 
comparisons of HTTP protocol strings as it is locale
agnostic. Hence I am not sure if it is correct to use it everywhere. From the 
documentation:

/**
 * Perform a case-insensitive comparison of two strings @a str1 and @a str2,
 * treating upper and lower case values of the 26 standard C/POSIX alphabetic
 * characters as equivalent. Extended latin characters outside of this set
 * are treated as unique octets, irrespective of the current locale.

Hence it might be wrong to use it in cases where you need to respect the locale.
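
As a minimal sketch of the semantics that documentation describes (this is not
the actual APR or httpd implementation, and the names sketch_ascii_lower and
sketch_cstr_casecmp are made up for illustration), a comparison that folds
only A-Z and never consults the locale could look like this. It assumes an
ASCII execution character set.

```c
/*
 * Hypothetical helper: fold only the 26 ASCII upper-case letters.
 * Every other octet, including extended latin characters, is returned
 * unchanged. Assumes an ASCII execution character set.
 */
static unsigned char sketch_ascii_lower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/*
 * Hypothetical stand-in for an ap_cstr_casecmp()-style comparison.
 * Returns <0, 0 or >0 like strcmp(), ignoring case for A-Z only and
 * never looking at the current locale.
 */
int sketch_cstr_casecmp(const char *s1, const char *s2)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;

    for (;;) {
        unsigned char ca = sketch_ascii_lower(*a);
        unsigned char cb = sketch_ascii_lower(*b);

        if (ca != cb)
            return (int)ca - (int)cb;   /* first differing octet decides */
        if (ca == '\0')
            return 0;                   /* both strings ended together */
        a++;
        b++;
    }
}
```

With this, "Content-Type" and "content-type" compare equal in any locale,
while the UTF-8 encodings of ü and Ü stay distinct because their octets are
not touched.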


Regards

Rüdiger