Re: [PHP-DEV] [RFC] Locale-independent case conversion

2021-10-11 Thread Hans Henrik Bergan
dang, yeah completely missed that, sorry

On Mon, 11 Oct 2021 at 13:14, Tim Starling  wrote:

> On 11/10/21 9:33 pm, Hans Henrik Bergan wrote:
> > @Nicolas i hope mediawiki doesn't run on Windows, because that
> > escapeshellarg-replacement you did is not valid for windows.
>
> Maybe you missed the Windows-specific code on lines 113-146.
>
> <
> https://gerrit.wikimedia.org/g/mediawiki/libs/Shellbox/+/refs/changes/48/722548/5/src/Shellbox.php#113
> >
>
> -- Tim Starling
>
>


Re: [PHP-DEV] [RFC] Locale-independent case conversion

2021-10-11 Thread Tim Starling
On 11/10/21 9:33 pm, Hans Henrik Bergan wrote:
> @Nicolas i hope mediawiki doesn't run on Windows, because that
> escapeshellarg-replacement you did is not valid for windows.

Maybe you missed the Windows-specific code on lines 113-146.



-- Tim Starling

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] Locale-independent case conversion

2021-10-11 Thread Hans Henrik Bergan
@Nicolas i hope mediawiki doesn't run on Windows, because that
escapeshellarg-replacement you did is not valid for windows.

the code

prints:
echo 'foo && whoami && echo '

and when i run that in bash i get:

C:\Users\hansh>echo 'foo && whoami && echo '
'foo
laptop-1plmku02\hansh
'
- whoami was executed even though it ran through mediawiki's escapeshellarg
replacement

PS creating a proper escapeshellarg for windows is actually difficult, see
https://docs.microsoft.com/en-gb/archive/blogs/twistylittlepassagesallalike/everyone-quotes-command-line-arguments-the-wrong-way
(and even php's upstream escapeshellarg() doesn't get it quite right on
Windows, corrupting more data than it strictly needs to corrupt, but i
don't recall the specifics)


On Mon, 11 Oct 2021 at 09:38, Nicolas Grekas 
wrote:

> Le lun. 11 oct. 2021 à 03:33, Tim Starling  a
> écrit :
>
> > On 4/10/21 9:08 pm, Nikita Popov wrote:
> > >
> > > Hi Tim,
> > >
> > > Thanks for creating this proposal, it looks great!
> > >
> > > I think this is a very beneficial change, and the amount of
> > > incorrect locale-dependent calls we had just in php-src further
> > > convinced me of this: We're generally aware of the problem, and we
> > > still made this mistake. Many times.
> > >
> > > The only open question I have is regarding the ctype_* functions.
> > > One might argue that these functions should be locale-independent as
> > > well. Certainly, whenever I have used ctype_digit() I only intended
> > > it to match [0-9]. It seems like some people try to use
> > > ctype_alpha() in a locale-sensitive way
> > > (
> >
> https://stackoverflow.com/questions/19929965/php-setlocale-not-working-for-ctype-alpha-check
> > > <
> >
> https://stackoverflow.com/questions/19929965/php-setlocale-not-working-for-ctype-alpha-check
> > >)
> > > and then fail because it doesn't support UTF-8.
> > >
> > OK, I removed ctype_tolower() and ctype_toupper() from the RFC and the
> > PR since they would be incompatible with a move towards a
> > locale-independent ctype extension.
> >
> > The non-controversial parts of the PR were split and merged, so I
> > rebased the PR and updated the RFC accordingly.
> >
> > Do you think the RFC is ready for voting now?
> >
> >
> > > PS: Regarding escapeshellarg(), are you aware of the array command
> > > support for proc_open() that was added in PHP 7.4? That does away
> > > the need to escape arguments.
> >
> > It doesn't really help us. I recently wrote a new shell command
> > execution system for MediaWiki called Shellbox. As part of that
> > project, I reviewed how shell execution is used in the MediaWiki
> > ecosystem. There are a lot of callers which are using shell features,
> > for example redirecting inputs or outputs, or constructing pipelines.
> > I didn't really want to break them all or reimplement those features
> > without the shell. And we have security and containerization wrappers
> > which depend on construction of a shell command string. So we need to
> > be able to construct shell command strings safely.
> >
> > After studying locale sensitivity for this RFC, I decided to get rid
> > of escapeshellarg() from MediaWiki. Instead we are doing our own shell
> > escaping:
> >
> > https://gerrit.wikimedia.org/r/c/mediawiki/libs/Shellbox/+/722548
> >
> > I also made MediaWiki use a fixed locale, instead of being configurable.
> >
>
> Hi Tim,
>
> thanks for the RFC and for the above pointers, I'm going to have a look at
> Symfony Process to follow your lead!
>
> About the RFC, I just have one note:
>
> > I didn't include strnatcasecmp() and natcasesort() in this RFC, because
> they also use isdigit() and isspace(), and because they are intended for
> natural language processing. They could be migrated in future.
>
> Despite their name, I never used *natcase* functions for natural language
> processing. I use them eg to sort lists of files in a directory, to account
> for numbers mainly. But that's not what I would call natural language
> processing. I'm not aware of anyone using them for that actually. I'm
> wondering if it's a good idea to postpone migrating them to an hypothetical
> future as to me, the whole reasoning of the RFC applies to them.
>
> Regards,
> Nicolas
>


Re: [PHP-DEV] [RFC] Locale-independent case conversion

2021-10-11 Thread Nicolas Grekas
Le lun. 11 oct. 2021 à 03:33, Tim Starling  a
écrit :

> On 4/10/21 9:08 pm, Nikita Popov wrote:
> >
> > Hi Tim,
> >
> > Thanks for creating this proposal, it looks great!
> >
> > I think this is a very beneficial change, and the amount of
> > incorrect locale-dependent calls we had just in php-src further
> > convinced me of this: We're generally aware of the problem, and we
> > still made this mistake. Many times.
> >
> > The only open question I have is regarding the ctype_* functions.
> > One might argue that these functions should be locale-independent as
> > well. Certainly, whenever I have used ctype_digit() I only intended
> > it to match [0-9]. It seems like some people try to use
> > ctype_alpha() in a locale-sensitive way
> > (
> https://stackoverflow.com/questions/19929965/php-setlocale-not-working-for-ctype-alpha-check
> > <
> https://stackoverflow.com/questions/19929965/php-setlocale-not-working-for-ctype-alpha-check
> >)
> > and then fail because it doesn't support UTF-8.
> >
> OK, I removed ctype_tolower() and ctype_toupper() from the RFC and the
> PR since they would be incompatible with a move towards a
> locale-independent ctype extension.
>
> The non-controversial parts of the PR were split and merged, so I
> rebased the PR and updated the RFC accordingly.
>
> Do you think the RFC is ready for voting now?
>
>
> > PS: Regarding escapeshellarg(), are you aware of the array command
> > support for proc_open() that was added in PHP 7.4? That does away
> > the need to escape arguments.
>
> It doesn't really help us. I recently wrote a new shell command
> execution system for MediaWiki called Shellbox. As part of that
> project, I reviewed how shell execution is used in the MediaWiki
> ecosystem. There are a lot of callers which are using shell features,
> for example redirecting inputs or outputs, or constructing pipelines.
> I didn't really want to break them all or reimplement those features
> without the shell. And we have security and containerization wrappers
> which depend on construction of a shell command string. So we need to
> be able to construct shell command strings safely.
>
> After studying locale sensitivity for this RFC, I decided to get rid
> of escapeshellarg() from MediaWiki. Instead we are doing our own shell
> escaping:
>
> https://gerrit.wikimedia.org/r/c/mediawiki/libs/Shellbox/+/722548
>
> I also made MediaWiki use a fixed locale, instead of being configurable.
>

Hi Tim,

thanks for the RFC and for the above pointers, I'm going to have a look at
Symfony Process to follow your lead!

About the RFC, I just have one note:

> I didn't include strnatcasecmp() and natcasesort() in this RFC, because
they also use isdigit() and isspace(), and because they are intended for
natural language processing. They could be migrated in future.

Despite their name, I never used *natcase* functions for natural language
processing. I use them eg to sort lists of files in a directory, to account
for numbers mainly. But that's not what I would call natural language
processing. I'm not aware of anyone using them for that actually. I'm
wondering if it's a good idea to postpone migrating them to an hypothetical
future as to me, the whole reasoning of the RFC applies to them.

Regards,
Nicolas