Re: [PHP-DEV] Directory separators on Windows

2017-04-03 Thread Fleshgrinder
On 4/2/2017 8:28 PM, Rowan Collins wrote:
> On 02/04/2017 09:09, Fleshgrinder wrote:
>> Your strategy works in these examples, but the example I gave was
>> different. Imagine that we have `/a/b/../c` which we would normalize to
>> `/a/c`. However, the `b` component is actually a symbolic link to `x/y`.
>> Hence, the real version of the path is `/a/x/c` and not `/a/c` as we
>> would have normalized it to.
> 
> Both strategies are equally valid, as long as you know which is in use.
> There are many common tools outside PHP which use both approaches, and
> situations where you might actually want the string-based approach, even
> if filesystem access is available.
> 
> See for instance this discussion of pwd:
> http://unix.stackexchange.com/q/331208/70530 In summary, POSIX specifies
> "-L" (logical) which uses $PWD as set by the shell as you navigate, and
> "-P" (physical) which resolves backwards through the ".." links in the
> file system.
> 
> The same is true for other operations - for instance, the below demo in
> bash shows one interpretation in "ls" and the other in "cd".
> 
> 
> /tmp/demo$ ls -lR
> .:
> drwxr-xr-x 2 vagrant vagrant 4096 Apr  2 18:21 foo
> drwxr-xr-x 3 vagrant vagrant 4096 Apr  2 18:05 other
> 
> ./foo:
> lrwxrwxrwx 1 vagrant vagrant 21 Apr  2 18:21 bar -> /tmp/demo/other/thing
> 
> ./other:
> drwxr-xr-x 2 vagrant vagrant 4096 Apr  2 18:06 thing
> 
> /tmp/demo$ ls foo/bar/..
> thing
> 
> /tmp/demo$ cd foo/bar/..
> /tmp/demo/foo$ ls
> bar
> 
> Regards,
> 

I get your point, and I have to agree here.
`normalize_path`/`Path::normalize` would be the counterpart to
`realpath`/`Path::canonicalize`.



-- 
Richard "Fleshgrinder" Fussenegger





Re: [PHP-DEV] Directory separators on Windows

2017-04-02 Thread Rowan Collins

On 02/04/2017 09:09, Fleshgrinder wrote:

Your strategy works in these examples, but the example I gave was
different. Imagine that we have `/a/b/../c` which we would normalize to
`/a/c`. However, the `b` component is actually a symbolic link to `x/y`.
Hence, the real version of the path is `/a/x/c` and not `/a/c` as we
would have normalized it to.


Both strategies are equally valid, as long as you know which is in use. 
There are many common tools outside PHP which use both approaches, and 
situations where you might actually want the string-based approach, even 
if filesystem access is available.


See for instance this discussion of pwd: 
http://unix.stackexchange.com/q/331208/70530 In summary, POSIX specifies 
"-L" (logical) which uses $PWD as set by the shell as you navigate, and 
"-P" (physical) which resolves backwards through the ".." links in the 
file system.


The same is true for other operations - for instance, the below demo in 
bash shows one interpretation in "ls" and the other in "cd".



/tmp/demo$ ls -lR
.:
drwxr-xr-x 2 vagrant vagrant 4096 Apr  2 18:21 foo
drwxr-xr-x 3 vagrant vagrant 4096 Apr  2 18:05 other

./foo:
lrwxrwxrwx 1 vagrant vagrant 21 Apr  2 18:21 bar -> /tmp/demo/other/thing

./other:
drwxr-xr-x 2 vagrant vagrant 4096 Apr  2 18:06 thing

/tmp/demo$ ls foo/bar/..
thing

/tmp/demo$ cd foo/bar/..
/tmp/demo/foo$ ls
bar

Regards,

--
Rowan Collins
[IMSoP]


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] Directory separators on Windows

2017-04-02 Thread Fleshgrinder
On 4/1/2017 6:15 PM, Anatol Belski wrote:
> Basically, it is the same as your points 8., 9. and 10. - it deals
> with the given path itself, so no symlinks, etc. In the snippet
> /a/b/../c it's parsed like follows
> 
> - parse up to /a/b/../
> - scroll back to /a
> - append the remain so it becomes /a/c
> 
> Similar process is with /a/./b would become /a/b and others. It is
> string traversing only. What is done with dirname() uses this
> approach. In general one can say - normalization is a path
> simplification, no drive access like realpath() does. For example, it
> lets to know the path itself would be correct before it comes to
> actual file operation, and not bother with I/O otherwise.
> 

Your strategy works in these examples, but the example I gave was
different. Imagine that we have `/a/b/../c` which we would normalize to
`/a/c`. However, the `b` component is actually a symbolic link to `x/y`.
Hence, the real version of the path is `/a/x/c` and not `/a/c` as we
would have normalized it to.
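
A minimal sketch of that difference, assuming a POSIX system where symlinks
can be created. The temporary directory layout and all variable names are made
up for illustration:

```php
<?php
// Hypothetical demo tree under the system temp directory.
$base = realpath(sys_get_temp_dir()) . '/path_demo_' . uniqid();
mkdir("$base/a/x/y", 0777, true);   // real directory a/x/y
mkdir("$base/a/x/c");               // real directory a/x/c
symlink('x/y', "$base/a/b");        // a/b is a symlink to x/y (relative to a/)

$path = "$base/a/b/../c";

// String-based view: drop "b/.." without looking at the disk.
$logical = str_replace('/b/../', '/', $path);

// Filesystem view: realpath() follows the link first, then applies "..".
$physical = realpath($path);

echo $logical, "\n";   // ends in /a/c
echo $physical, "\n";  // ends in /a/x/c

// Clean up the demo tree again.
unlink("$base/a/b");
rmdir("$base/a/x/c");
rmdir("$base/a/x/y");
rmdir("$base/a/x");
rmdir("$base/a");
rmdir($base);
```

The two results name different directories, which is exactly why the caller
has to know which strategy is in use.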

On 4/1/2017 6:15 PM, Anatol Belski wrote:
> As mentioned in an earlier post, it might make sense to have flags to
> control the behavior. Maybe a signature like
> 
> string canonicalize_path(string $path, int $flags = 0);
> 
> The function OFC knows the current platform. Flags like
> PATH_TARGET_WINDOWS | PATH_UNIXIFY would control the path separator
> behaviors. Generally, regarding path without drive letter - on
> Windows I'd strongly advise not to use it in configs, etc.
> because of multiple root issues mentioned already. But in principle,
> say one has same FS structure on different platforms and just wants
> to mirror it, that would be ok with flags like PATH_TARGET_LINUX |
> PATH_STRIP_DRIVE as Linux implies forward slashes. Or otherwise, fe
> the reverse case - generating a path on Linux that is to be used on
> Windows, flags might contain only PATH_TARGET_WINDOWS which would
> produce backslashes as system default. Maybe that's too much or
> unrelated, and only platform targets should be provided, dunno, just
> a mind game for now.
> 

I hope you notice how this function is exploding in complexity. I beg
for classes, with clear responsibilities and small methods that do one
thing.

On 4/1/2017 6:15 PM, Anatol Belski wrote:
> These last 3 points, as well as above one, are canonicalization. Of
> course, in the imaginary function, it could be decoupled like
> PATH_NO_CANONIC if it's not wanted, or PATH_CANONICALIZE_ONLY to omit
> other conversions. It's only about to have the behaviors sensible. Fe
> possible other flags could be PATH_STRIP_TRAILING_SLASH,
> PATH_ALLOW_RELATIVE and other fine things. But by default, the
> function should do the default thing for the target platform, based
> on the current platform. Thus, producing NFD for Mac and NFC
> otherwise, backslash for Windows and forward slash otherwise, other
> thing that will for sure popup. As mentioned earlier, still this
> requires some re-implementations of the platform APIs, even we'd talk
> about slashes only - for ASCII paths I'm not sure we even can
> differentiate the UTF-8 encoding  forms without involving yet another
> library, so this might be tricky. Simply exposing the part of
> realpath() processing might solve several things for one given
> platform, that's for sure. The initial case Rasmus reported was about
> crossplatform handling, but the topic is indeed slightly bigger than
> just path separators, so IMO the convenient way were to care about a
> crossplatform approach. I've no info, how badly such crossplatform
> path issues are indeed relevant, so it might be another story to
> investigate before one starts any implementation. At least, grouping
> some cases and thought, maybe as an RFC, could be good to track the
> topic.
> 

I agree mostly:

- We should not call it canonicalization (I used the word too), but
rather normalization. The former is used in other languages and means
realpath there. This could be confusing.
- Leaving the stripping of the trailing separator to the user means that
other users never know what they get, which is bad. The normalization
should always use one strategy here.

-- 
Richard "Fleshgrinder" Fussenegger






RE: [PHP-DEV] Directory separators on Windows

2017-04-01 Thread Anatol Belski


> -----Original Message-----
> From: Fleshgrinder [mailto:p...@fleshgrinder.com]
> Sent: Saturday, April 1, 2017 2:43 PM
> To: Anatol Belski <weltl...@outlook.de>; Rasmus Schultz
> <ras...@mindplay.dk>
> Cc: PHP internals <internals@lists.php.net>
> Subject: Re: [PHP-DEV] Directory separators on Windows
> 
> On 4/1/2017 2:01 PM, Anatol Belski wrote:
> > 1. optionally - yes, otherwise it should do platform default
> > 2. no, this kind of operation is a pure parsing, no I/O related checks
> > needed
> > 3. irrelevant, but can be defined
> >
> > Other points yet I'd care about
> > - result should be correct for target platform disregarding actual
> > platform, fe target Linux path Windows, or Windows path on Mac, etc.
> > - validation, particularly for reserved words and chars, also other
> > platform aspects
> > - encodings have to be respected, or UTF-8 only, to define
> > - probably should be compatible with PHP stream wrapper namespaces
> >
> >
> > Thanks
> >
> > Anatol
> >
> 
> 1. How do you envision that? If the path is `/a/b/../c` where only `/a`
> exists right now? It's unresolvable, assuming that `../` points to `/a`
> is wrong if `b/` is a symbolic link that points to `/x/y`.
> 
> 2. Here I agree, casing cannot be decided without hitting the
> filesystem. Some are case-sensitive, some insensitive, and others
> configurable.
> 
Basically, it is the same as your points 8, 9 and 10: it deals with the
given path itself, so no symlinks, etc. The snippet /a/b/../c is parsed as
follows

- parse up to /a/b/../
- scroll back to /a
- append the remainder so it becomes /a/c

The process is similar for /a/./b, which would become /a/b, and so on. It is
string traversal only. dirname() uses this approach. In general one can say
that normalization is path simplification, with no drive access like
realpath() does. For example, it lets you know whether the path itself would
be correct before it comes to an actual file operation, so you don't bother
with I/O otherwise.
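
The parse / scroll back / append steps can be sketched as a small stack-based
routine. This is only an illustration under stated assumptions: the name
`normalize_dots` is hypothetical (it is not a PHP function), only `/` is
treated as a separator, and drive letters are not handled.

```php
<?php
/**
 * Hypothetical sketch: resolve "." and ".." by string traversal only.
 * There is no filesystem access, so symlinks are never considered.
 */
function normalize_dots(string $path): string
{
    $absolute = $path !== '' && $path[0] === '/';
    $stack = [];
    foreach (explode('/', $path) as $part) {
        if ($part === '' || $part === '.') {
            continue;                      // skip empty and "." components
        }
        if ($part === '..') {
            if ($stack && end($stack) !== '..') {
                array_pop($stack);         // "scroll back" one component
            } elseif (!$absolute) {
                $stack[] = '..';           // keep leading ".." of relative paths
            }
            continue;
        }
        $stack[] = $part;                  // ordinary component: append
    }
    $result = implode('/', $stack);
    if ($absolute) {
        return '/' . $result;
    }
    return $result === '' ? '.' : $result;
}

echo normalize_dots('/a/b/../c'), "\n"; // /a/c
echo normalize_dots('/a/./b'), "\n";    // /a/b
```

Note that this sketch also drops a leading `./`, which differs from what
point 9 in the quoted list asks for.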

> 3. Does not matter for Windows itself, it is case-insensitive.
> 
> (I continue the numbering for the points you raised.)
> 
> 4. How would we go about normalizing a Windows path to POSIX? `C:\a` is not
> necessarily the same as `/a`, or should it produce `C:/a`?
>
As mentioned in an earlier post, it might make sense to have flags to control
the behavior. Maybe a signature like

string canonicalize_path(string $path, int $flags = 0);

The function of course knows the current platform. Flags like
PATH_TARGET_WINDOWS | PATH_UNIXIFY would control the path separator behavior.
Generally, regarding paths without a drive letter: on Windows I'd strongly
advise not to use them in configs, etc. because of the multiple-root issues
mentioned already. But in principle, say one has the same FS structure on
different platforms and just wants to mirror it; that would be OK with flags
like PATH_TARGET_LINUX | PATH_STRIP_DRIVE, as Linux implies forward slashes.
Or, for example, take the reverse case of generating a path on Linux that is
to be used on Windows: the flags might contain only PATH_TARGET_WINDOWS, which
would produce backslashes as the system default. Maybe that's too much or
unrelated, and only platform targets should be provided, dunno, just a mind
game for now.
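
To make the flag idea concrete, here is a rough userland sketch. Everything in
it is an assumption for illustration: none of these constants exist in PHP,
the helper is called `convert_separators` (hypothetical) rather than
`canonicalize_path`, and it only covers separators and drive stripping.

```php
<?php
// Hypothetical flag constants; PHP defines none of these.
const PATH_TARGET_WINDOWS = 1;
const PATH_TARGET_LINUX   = 2;
const PATH_STRIP_DRIVE    = 4;

function convert_separators(string $path, int $flags = 0): string
{
    if ($flags & PATH_STRIP_DRIVE) {
        // Remove a leading drive specifier such as "C:".
        $path = preg_replace('/^[A-Za-z]:/', '', $path);
    }
    if ($flags & PATH_TARGET_WINDOWS) {
        return str_replace('/', '\\', $path);
    }
    if ($flags & PATH_TARGET_LINUX) {
        return str_replace('\\', '/', $path);
    }
    // No target given: fall back to the current platform's separator.
    return str_replace(['/', '\\'], DIRECTORY_SEPARATOR, $path);
}

echo convert_separators('C:\a\b', PATH_TARGET_LINUX | PATH_STRIP_DRIVE), "\n"; // /a/b
echo convert_separators('a/b', PATH_TARGET_WINDOWS), "\n";                     // a\b
```

Even this small subset needs three branches, which hints at how quickly a
single flag-driven function could grow.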

> 5. 
> 
> 6. I vote for UTF-8 only. We already have locale dependent filesystem
> functions, which also makes them kind of weird to use, especially in
> libraries. Another very important aspect to take care of this point is
> normalization forms. Filesystems generally store stuff as is, that means
> that we can create two files with the same name, at least by the looks of
> it, which are actually different ones. Think of `ä` which can also be
> `ä`. It is generally most advisable to stick to NFC, because that is
> also how users usually produce those chars.
> 
> 
Yeah, UTF-8 would probably be the simplest for a cross-platform
implementation. Regarding the encoding variant, that's where more care would
be needed. For example, see https://github.com/aws/aws-cli/issues/1639 ;
that's where we would care about PATH_TARGET_MAC specific things. It is
comparable to the situation where you want to escapeshell* something, but the
result will be invalid on another platform or possibly with another shell,
given how it currently works.
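
A small sketch of why the normalization form matters for file names. The two
strings below render the same but differ byte for byte; the `Normalizer` call
is only attempted if the intl extension happens to be loaded.

```php
<?php
$nfc = "\u{00E4}";    // U+00E4, precomposed "a with diaeresis" (NFC)
$nfd = "a\u{0308}";   // "a" followed by U+0308 combining diaeresis (NFD)

var_dump($nfc === $nfd);   // bool(false): different byte sequences
var_dump(strlen($nfc));    // int(2)
var_dump(strlen($nfd));    // int(3)

// With ext/intl available, both can be brought to the same form.
if (class_exists('Normalizer')) {
    var_dump(Normalizer::normalize($nfd, Normalizer::FORM_C) === $nfc); // bool(true)
}
```
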
> 7.  just forward I'd say.
> 
> 8. Collapse multiple separators (e.g. `a//b` ~> `a/b`).
> 
> 9. Resolve self-references, unless they are leading (e.g. `a/./b` ~> `a/b` but
> `./a/b` stays `./a/b`).
> 
> 10. Trim separators from the end (e.g. `a/` ~> `a`).
> 
These last 3 points, as well as above one, are canonicalization. Of course, in 
the imaginary function, it could be decoupled like PATH_NO_CANONIC

Re: [PHP-DEV] Directory separators on Windows

2017-04-01 Thread Rasmus Schultz
10 thumbs up ;-)

But this really demonstrates how badly we need this function - I bet any
number of those points may or may not be covered by any number of
implementations in the wild.

It would be so nice to have this done "right", once and for all.


On Sat, Apr 1, 2017 at 2:42 PM, Fleshgrinder  wrote:

> On 4/1/2017 2:01 PM, Anatol Belski wrote:
> > 1. optionally - yes, otherwise it should do platform default
> > 2. no, this kind of operation is a pure parsing, no I/O related checks
> > needed
> > 3. irrelevant, but can be defined
> >
> > Other points yet I'd care about
> > - result should be correct for target platform disregarding actual
> > platform, fe target Linux path Windows, or Windows path on Mac, etc.
> > - validation, particularly for reserved words and chars, also other
> > platform aspects
> > - encodings have to be respected, or UTF-8 only, to define
> > - probably should be compatible with PHP stream wrapper namespaces
> >
> >
> > Thanks
> >
> > Anatol
> >
>
> 1. How do you envision that? If the path is `/a/b/../c` where only `/a`
> exists right now? It's unresolvable, assuming that `../` points to `/a`
> is wrong if `b/` is a symbolic link that points to `/x/y`.
>
> 2. Here I agree, casing cannot be decided without hitting the
> filesystem. Some are case-sensitive, some insensitive, and others
> configurable.
>
> 3. Does not matter for Windows itself, it is case-insensitive.
>
> (I continue the numbering for the points you raised.)
>
> 4. How would we go about normalizing a Windows path to POSIX? `C:\a` is
> not necessarily the same as `/a`, or should it produce `C:/a`?
>
> 5. 
>
> 6. I vote for UTF-8 only. We already have locale dependent filesystem
> functions, which also makes them kind of weird to use, especially in
> libraries. Another very important aspect to take care of this point is
> normalization forms. Filesystems generally store stuff as is, that means
> that we can create two files with the same name, at least by the looks of
> it, which are actually different ones. Think of `ä` which can also be
> `ä`. It is generally most advisable to stick to NFC, because that is
> also how users usually produce those chars.
>
> 7.  just forward I'd say.
>
> 8. Collapse multiple separators (e.g. `a//b` ~> `a/b`).
>
> 9. Resolve self-references, unless they are leading (e.g. `a/./b` ~>
> `a/b` but `./a/b` stays `./a/b`).
>
> 10. Trim separators from the end (e.g. `a/` ~> `a`).
>
> --
> Richard "Fleshgrinder" Fussenegger
>


Re: [PHP-DEV] Directory separators on Windows

2017-04-01 Thread Fleshgrinder
On 4/1/2017 2:01 PM, Anatol Belski wrote:
> 1. optionally - yes, otherwise it should do platform default
> 2. no, this kind of operation is a pure parsing, no I/O related checks needed
> 3. irrelevant, but can be defined
> 
> Other points yet I'd care about
> - result should be correct for target platform disregarding actual platform, 
> fe target Linux path Windows, or Windows path on Mac, etc.
> - validation, particularly for reserved words and chars, also other platform 
> aspects
> - encodings have to be respected, or UTF-8 only, to define
> - probably should be compatible with PHP stream wrapper namespaces
> 
> 
> Thanks
> 
> Anatol
> 

1. How do you envision that? If the path is `/a/b/../c` where only `/a`
exists right now? It's unresolvable, assuming that `../` points to `/a`
is wrong if `b/` is a symbolic link that points to `/x/y`.

2. Here I agree, casing cannot be decided without hitting the
filesystem. Some are case-sensitive, some insensitive, and others
configurable.

3. Does not matter for Windows itself, it is case-insensitive.

(I continue the numbering for the points you raised.)

4. How would we go about normalizing a Windows path to POSIX? `C:\a` is
not necessarily the same as `/a`, or should it produce `C:/a`?

5. 

6. I vote for UTF-8 only. We already have locale dependent filesystem
functions, which also makes them kind of weird to use, especially in
libraries. Another very important aspect to take care of here is
normalization forms. Filesystems generally store names as is, which means
that we can create two files with the same name, at least by the looks of
it, which are actually different ones. Think of `ä` which can also be
`ä`. It is generally most advisable to stick to NFC, because that is
also how users usually produce those chars.

7.  just forward I'd say.

8. Collapse multiple separators (e.g. `a//b` ~> `a/b`).

9. Resolve self-references, unless they are leading (e.g. `a/./b` ~>
`a/b` but `./a/b` stays `./a/b`).

10. Trim separators from the end (e.g. `a/` ~> `a`).
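
Points 8 to 10 can be sketched in a few lines of string handling. The function
name `tidy_path` is hypothetical, and the sketch assumes the separators have
already been converted to `/`:

```php
<?php
function tidy_path(string $path): string
{
    $absolute = $path !== '' && $path[0] === '/';

    // 8. Collapse runs of separators: "a//b" becomes "a/b".
    $path = preg_replace('#/{2,}#', '/', $path);

    // 9. Drop "." components, except a leading one: "a/./b" becomes "a/b".
    $kept = [];
    foreach (explode('/', $path) as $i => $part) {
        if ($part === '.' && $i > 0) {
            continue;
        }
        $kept[] = $part;
    }
    $path = implode('/', $kept);

    // Keep the root itself intact, e.g. for "/." or "/".
    if ($path === '' && $absolute) {
        return '/';
    }

    // 10. Trim trailing separators: "a/" becomes "a".
    return strlen($path) > 1 ? rtrim($path, '/') : $path;
}

echo tidy_path('a//b'), "\n";   // a/b
echo tidy_path('a/./b'), "\n";  // a/b
echo tidy_path('./a/b'), "\n";  // ./a/b
echo tidy_path('a/'), "\n";     // a
```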

-- 
Richard "Fleshgrinder" Fussenegger




RE: [PHP-DEV] Directory separators on Windows

2017-04-01 Thread Anatol Belski
Hi,

> -----Original Message-----
> From: Rasmus Schultz [mailto:ras...@mindplay.dk]
> Sent: Saturday, April 1, 2017 11:13 AM
> To: Pierre Joye <pierre@gmail.com>
> Cc: Kris Craig <kris.cr...@gmail.com>; Sara Golemon <poll...@php.net>; PHP
> internals <internals@lists.php.net>
> Subject: Re: [PHP-DEV] Directory separators on Windows
> 
> > Also ucfirst is useless (or any case operations)
> 
> It's not useless, if you want a normalized path on Windows, it has to
> include a drive-letter, and Windows FS isn't case-sensitive.
> 
> > Right now realpath will fail if the path does not exist
> 
> I know, that's one reason I don't use it.
> 
> It kind of solves a different problem, e.g. resolves ".." and "." elements in
> paths... as a rule, I don't ever use relative paths, but it would certainly
> be nice to have a realpath() that works for files that haven't been created
> yet.
> 
> I don't think you can simply make realpath() also normalize the path, as this
> would be a breaking change?
> 
> I guess an improved realpath() could be used internally as part of a
> normalize_path() function, but it's not enough on its own, since the real
> path will still have platform-specific directory-separators, so a
> normalize_path() function would still be useful if realpath() gets improved.
> 
> So to summarize, a normalize_path() function should:
> 
> 1. Fully normalize to an absolute path with no platform-specific separators
> 2. Have corrected case (for files/dirs that do exist.)
> 3. Have normalized (upper-case) drive-letter on Windows
> 
1. optionally - yes, otherwise it should do platform default
2. no, this kind of operation is a pure parsing, no I/O related checks needed
3. irrelevant, but can be defined

Other points I'd also care about
- the result should be correct for the target platform regardless of the
actual platform, e.g. a Linux target path on Windows, or a Windows path on
Mac, etc.
- validation, particularly for reserved words and chars, and other platform
aspects
- encodings have to be respected, or UTF-8 only, to define
- probably should be compatible with PHP stream wrapper namespaces


Thanks

Anatol

> There's also network file-system paths on Windows with a different syntax to
> consider? I don't know much about that...
> 
> 
> On Fri, Mar 31, 2017 at 11:40 AM, Pierre Joye <pierre@gmail.com> wrote:
> 
> > On Fri, Mar 31, 2017 at 3:32 PM, Rasmus Schultz <ras...@mindplay.dk>
> > wrote:
> > > Well, this is the opposite of what I'm asking for, and does not
> > > address the case where paths have been persisted in a file or
> > > database and the data gets accessed from different OS.
> > >
> > > I understand the reasons given for not changing this behavior in PHP
> > > itself, so maybe we could have a standard function that normalizes
> > > paths to forward slashes? e.g. basically:
> > >
> > > /**
> > >  * Normalize a filesystem path.
> > >  *
> > >  * On windows systems, replaces backslashes with forward slashes
> > >  * and ensures drive-letter in upper-case.
> > >  *
> > >  * @param string $path
> > >  *
> > >  * @return string normalized path
> > >  */
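> > > function normalize_path( $path ) {
> > >     $path = str_replace('\\', '/', $path);
> > >
> > >     // Use [] instead of the removed {} string offset; isset() guards short paths.
> > >     return isset($path[1]) && $path[1] === ':'
> > >         ? ucfirst($path)
> > >         : $path;
> > > }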
> >
> > Also ucfirst is useless (or any case operations). realpath goes
> > further down by solving ugly things like  \\\ or // (code
> > concatenating paths without checking trailing /\).
> >
> > > At least WordPress, Drupal and probably most major CMS and
> > > frameworks have this function or something equivalent.
> >
> > Now I remember why they have to do that.
> >
> > realpath is not fully exposed in userland. virtual_file_ex should be
> > used and provide the option to validate path or not. Right now
> > realpath will fail if the path does not exist. I would suggest to
> > expose this functionality/option and that will solve the need to
> > implement such things in userland.
> >
> > ps: I discussed that long time with Dmitry and forgot to implement it,
> > I take the blame for not having that in 7.x :)
> >
> > Cheers,
> > Pierre
> >


Re: [PHP-DEV] Directory separators on Windows

2017-04-01 Thread Fleshgrinder
On 4/1/2017 1:03 PM, Anatol Belski wrote:
> " A Uniform Resource Identifier (URI) is a compact sequence of 
> characters that identifies an abstract or physical resource" they
> say. Fits perfectly with PHP streams.
> 

The problem I was referring to is not a semantic one. The problem is that
the code cannot easily distinguish between local and remote files. Of
course there are functions for it again, but this would be better
expressed as part of the type system. I know that this is kind of alien
to the primitive obsessive world of PHP, but proper type systems can
help a lot to make code simpler.

That being said, it's totally off topic here. :P

On 4/1/2017 1:03 PM, Anatol Belski wrote:
> Yeah, though that draft still ignores many Windows variants ☹
> 
> We went anyway a bit too deep in this complex matter. Probably a
> separate function is where the opinions could be joined.
> 
> Thanks
> 
> Anatol
> 

Agree, this is my last response on this here. :)

-- 
Richard "Fleshgrinder" Fussenegger




RE: [PHP-DEV] Directory separators on Windows

2017-04-01 Thread Anatol Belski


> -----Original Message-----
> From: Fleshgrinder [mailto:p...@fleshgrinder.com]
> Sent: Saturday, April 1, 2017 12:00 AM
> To: Anatol Belski <a...@php.net>; internals@lists.php.net; Rasmus Schultz
> <ras...@mindplay.dk>
> Subject: Re: [PHP-DEV] Directory separators on Windows
> 
> On 3/31/2017 9:29 PM, Anatol Belski wrote:
> > I can only link to this 
> >
> > http://git.php.net/?p=php-src.git;a=commitdiff;h=ec78507bd46a05f77dbde3fa4091ab4c91e61cad
> >
> >  the new implementation was consistent but had to be reverted in 7.1
> > partially, because of BC, even the use is inappropriate. Well, still
> > normalization on Windows means having '\\' in terms of the platform
> > API used, but just as a show case. The dirname function itself is
> > based on the PHP implementation, not a platform API. But also, it
> > would produce same path with different separators on different
> > platform, if normalized.
> >
> 
> A good example that showcases that we actually could normalize to slashes,
> don't you think. :)
> 
Nope, actually the opposite. It's more an illustration of what shouldn't be
done, namely fixing in core what actually belongs in an app. But for BC, it's
another point.

> Besides, I still believe that it is very wrong of PHP to treat URIs/URLs the
> same as paths. A path can be a URI, but a URI should only be a path if it has
> the `file://` scheme. The current approach just asks for remote code
> inclusion, URL fopen anyone? Different story though.
> 
" A Uniform Resource Identifier (URI) is a compact sequence of
characters that identifies an abstract or physical resource" they say. Fits
perfectly with PHP streams.

> On 3/31/2017 9:29 PM, Anatol Belski wrote:
> > You're right, they both are documented. What is not defined is the
> > cross platform handling. There are some documents, yes, like RFC 3986,
> > or RFC 1738 and RFC 8089 which are still in the proposed state.
> > However there is none I knew that would care about crossplatform
> > nuances in full extent. Particularly an RFC defining all the possible
> > behaviors of the file:// scheme is what were needed, I guess. Thus my
> > conclusion is to take the path of less resistance, as what is not
> > defined is not necessary good but also is not necessary broken. Yeah,
> > it is complex, and particularly in PHP historically grown, and just
> > touching the water surface might already produce some high waves.
> >
> > The functions mentioned - of course, it were up to an application to
> > decide what to use it in a particular situation, but not forcibly
> > changing the core handling. Like in the snippet above, you would have
> > currently to do dirname(realpath($path)), but that is also not
> > crossplatform and won't work on a nonexistent file. So another
> > function instead of realpath, like dirname(normalize_path($path,
> > UNIXIFY_SLASH)) were in use. The implementation might be tricky in
> > some parts, but in general doable.
> >
> > Regards
> >
> > Anatol
> >
> 
> Well, RFC 8089 has many examples in its appendix regarding Windows. It's true
> that they say that it is non-standard, however, it is how Windows deals with
> it since IE4.
> 
> https://blogs.msdn.microsoft.com/freeassociations/2005/05/19/the-bizarre-and-unhappy-story-of-file-urls/
> 
Yeah, though that draft still ignores many Windows variants ☹

We went anyway a bit too deep in this complex matter. Probably a separate 
function is where the opinions could be joined.

Thanks

Anatol


Re: [PHP-DEV] Directory separators on Windows

2017-04-01 Thread Fleshgrinder
On 4/1/2017 11:13 AM, Rasmus Schultz wrote:
> So to summarize, a normalize_path() function should:
> 
> 1. Fully normalize to an absolute path with no platform-specific separators
> 2. Have corrected case (for files/dirs that do exist.)
> 3. Have normalized (upper-case) drive-letter on Windows
> 
> There's also network file-system paths on Windows with a different syntax
> to consider? I don't know much about that...
> 

1. cannot be guaranteed by a normalization function, because the parts
the dots point to might not exist. Resolving them without knowing if we
are dealing with a symbolic or hard link is impossible.

UNC paths work the same as normal paths, the only difference is their
prefix (e.g. `\\ComputerName\`), in other words, they can be treated
like a schemeless URL.

Verbatim paths are not supported by PHP anyways, hence, they can be ignored.
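
As a rough sketch of the "schemeless URL" view of UNC paths: the helper name
`split_unc` is hypothetical, it accepts both slash styles, and it makes no
attempt to recognize verbatim paths.

```php
<?php
/**
 * Hypothetical helper: split a UNC path into host and path parts,
 * similar to how "//host/path" would be read in a schemeless URL.
 * Returns null when the input does not start with two separators.
 */
function split_unc(string $path): ?array
{
    if (!preg_match('#^[\\\\/]{2}([^\\\\/]+)(.*)$#', $path, $m)) {
        return null;
    }
    return [
        'host' => $m[1],
        'path' => str_replace('\\', '/', $m[2]),
    ];
}

var_dump(split_unc('\\\\ComputerName\\Share\\file.txt'));
// host is "ComputerName", path is "/Share/file.txt"
var_dump(split_unc('C:\\a')); // NULL
```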

-- 
Richard "Fleshgrinder" Fussenegger




Re: [PHP-DEV] Directory separators on Windows

2017-04-01 Thread Rasmus Schultz
> Also ucfirst is useless (or any case operations)

It's not useless, if you want a normalized path on Windows, it has to
include a drive-letter, and Windows FS isn't case-sensitive.

> Right now realpath will fail if the path does not exist

I know, that's one reason I don't use it.

It kind of solves a different problem, e.g. resolves ".." and "." elements
in paths... as a rule, I don't ever use relative paths, but it would
certainly be nice to have a realpath() that works for files that haven't
been created yet.

I don't think you can simply make realpath() also normalize the path, as
this would be a breaking change?

I guess an improved realpath() could be used internally as part of a
normalize_path() function, but it's not enough on its own, since the real
path will still have platform-specific directory-separators, so a
normalize_path() function would still be useful if realpath() gets improved.

So to summarize, a normalize_path() function should:

1. Fully normalize to an absolute path with no platform-specific separators
2. Have corrected case (for files/dirs that do exist.)
3. Have normalized (upper-case) drive-letter on Windows

There are also network file-system paths on Windows with a different syntax
to consider? I don't know much about that...


On Fri, Mar 31, 2017 at 11:40 AM, Pierre Joye  wrote:

> On Fri, Mar 31, 2017 at 3:32 PM, Rasmus Schultz 
> wrote:
> > Well, this is the opposite of what I'm asking for, and does not address
> > the case where paths have been persisted in a file or database and the
> > data gets accessed from different OS.
> >
> > I understand the reasons given for not changing this behavior in PHP
> > itself, so maybe we could have a standard function that normalizes paths
> > to forward slashes? e.g. basically:
> >
> > /**
> >  * Normalize a filesystem path.
> >  *
> >  * On windows systems, replaces backslashes with forward slashes
> >  * and ensures drive-letter in upper-case.
> >  *
> >  * @param string $path
> >  *
> >  * @return string normalized path
> >  */
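> > function normalize_path( $path ) {
> >     $path = str_replace('\\', '/', $path);
> >
> >     // Use [] instead of the removed {} string offset; isset() guards short paths.
> >     return isset($path[1]) && $path[1] === ':'
> >         ? ucfirst($path)
> >         : $path;
> > }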
>
> Also ucfirst is useless (or any case operations). realpath goes
> further down by solving ugly things like  \\\ or // (code
> concatenating paths without checking trailing /\).
>
> > At least WordPress, Drupal and probably most major CMS and frameworks
> > have this function or something equivalent.
>
> Now I remember why they have to do that.
>
> realpath is not fully exposed in userland. virtual_file_ex should be
> used and provide the option to validate path or not. Right now
> realpath will fail if the path does not exist. I would suggest to
> expose this functionality/option and that will solve the need to
> implement such things in userland.
>
> ps: I discussed that long time with Dmitry and forgot to implement it,
> I take the blame for not having that in 7.x :)
>
> Cheers,
> Pierre
>


Re: [PHP-DEV] Directory separators on Windows

2017-03-31 Thread Fleshgrinder
On 3/31/2017 9:29 PM, Anatol Belski wrote:
> I can only link to this 
> 
> http://git.php.net/?p=php-src.git;a=commitdiff;h=ec78507bd46a05f77dbde3fa4091ab4c91e61cad
>
>  the new implementation was consistent but had to be reverted in 7.1
> partially, because of BC, even the use is inappropriate. Well, still
> normalization on Windows means having '\\' in terms of the platform
> API used, but just as a show case. The dirname function itself is
> based on the PHP implementation, not a platform API. But also, it
> would produce same path with different separators on different
> platform, if normalized.
> 

A good example that showcases that we actually could normalize to
slashes, don't you think. :)

Besides, I still believe that it is very wrong of PHP to treat URIs/URLs
the same as paths. A path can be a URI, but a URI should only be a path
if it has the `file://` scheme. The current approach just asks for
remote code inclusion, URL fopen anyone? Different story though.

On 3/31/2017 9:29 PM, Anatol Belski wrote:
> You're right, they both are documented. What is not defined is the
> cross platform handling. There are some documents, yes, like RFC
> 3986, or RFC 1738 and RFC 8089 which are still in the proposed state.
> However there is none I knew that would care about crossplatform
> nuances in full extent. Particularly an RFC defining all the possible
> behaviors of the file:// scheme is what were needed, I guess. Thus my
> conclusion is to take the path of less resistance, as what is not
> defined is not necessary good but also is not necessary broken. Yeah,
> it is complex, and particularly in PHP historically grown, and just
> touching the water surface might already produce some high waves.
> 
> The functions mentioned - of course, it were up to an application to
> decide what to use it in a particular situation, but not forcibly
> changing the core handling. Like in the snippet above, you would have
> currently to do dirname(realpath($path)), but that is also not
> crossplatform and won't work on a nonexistent file. So another
> function instead of realpath, like dirname(normalize_path($path,
> UNIXIFY_SLASH)) were in use. The implementation might be tricky in
> some parts, but in general doable.
> 
> Regards
> 
> Anatol
> 

Well, RFC 8089 has many examples in its appendix regarding Windows. It's
true that they say that it is non-standard, however, it is how Windows
has dealt with it since IE4.

https://blogs.msdn.microsoft.com/freeassociations/2005/05/19/the-bizarre-and-unhappy-story-of-file-urls/

-- 
Richard "Fleshgrinder" Fussenegger

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



RE: [PHP-DEV] Directory separators on Windows

2017-03-31 Thread Anatol Belski


> -Original Message-
> From: Fleshgrinder [mailto:p...@fleshgrinder.com]
> Sent: Friday, March 31, 2017 8:32 PM
> To: Anatol Belski <a...@php.net>; internals@lists.php.net; Rasmus Schultz
> <ras...@mindplay.dk>
> Subject: Re: [PHP-DEV] Directory separators on Windows
> 
> 
> $ php71 -a
> php > echo dirname('C:\Folder/Resource\Resource');
> C:\Folder/Resource
> 
> hmmm... just one example, this is what this whole discussion is about.
> We are already super inconsistent. It seems as if this is not producing
> any issues with PHP itself, as well as at least every extension I ever
> interacted with.
> 
I can only link to this 

http://git.php.net/?p=php-src.git;a=commitdiff;h=ec78507bd46a05f77dbde3fa4091ab4c91e61cad

The new implementation was consistent but had to be partially reverted in 7.1 
because of BC, even though the usage is inappropriate. Well, normalization on 
Windows still means having '\\' in terms of the platform API used, but this is 
just a showcase. The dirname function itself is based on the PHP 
implementation, not on a platform API. But also, if normalized, it would 
produce the same path with different separators on different platforms.

> Of course things are very different when it is about outputting paths and
> forwarding them to other programs, which might be super shitty. (I look at
> you protoc from Google, **grrr**.) However, that is something where
> `realpath`/`path_canonicalize`/`path_normalize` would come into play, and
> something I would leave to the applications. Choosing the right situation
> where the path requires those actions is impossible.
> 
> We could also consistently convert paths to their native form. Hence, above
> example would result in `C:\Folder\Resource`, or even `\\?\C:\Folder\Resource`
> (verbatim path, no further fiddling allowed).
> 
> Both POSIX and Windows paths are well documented. However, it's not an easy
> topic, that is for sure, and using slashes everywhere might be more
> destructive than I anticipate.
> 
You're right, they are both documented. What is not defined is the 
cross-platform handling. There are some documents, yes, like RFC 3986, or 
RFC 1738 and RFC 8089 which are still in the proposed state. However, there is 
none I know of that covers the cross-platform nuances to the full extent. In 
particular, an RFC defining all the possible behaviors of the file:// scheme 
is what would be needed, I guess. Thus my conclusion is to take the path of 
least resistance, as what is not defined is not necessarily good but is also 
not necessarily broken. Yeah, it is complex, and has grown historically in PHP 
in particular, and just touching the water surface might already produce some 
high waves.

The functions mentioned: of course, it would be up to an application to decide 
what to use in a particular situation, rather than forcibly changing the core 
handling. As in the snippet above, you would currently have to do 
dirname(realpath($path)), but that is also not cross-platform and won't work on 
a nonexistent file. So another function could be used instead of realpath, like 
dirname(normalize_path($path, UNIXIFY_SLASH)). The implementation might be 
tricky in some parts, but in general it is doable.
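
To illustrate the dirname(normalize_path($path, UNIXIFY_SLASH)) idea, here is 
a minimal userland sketch. Neither normalize_path() nor UNIXIFY_SLASH exists 
in PHP; both names are taken from the suggestion above and are assumptions. 
The sketch only rewrites separators as strings and never touches the 
filesystem, so it also works for nonexistent files.

```php
<?php
// Hypothetical flag from the discussion; not a real PHP constant.
const UNIXIFY_SLASH = 1;

// Hypothetical string-only normalizer; it does not check that the path exists.
function normalize_path(string $path, int $flags = 0): string
{
    if ($flags & UNIXIFY_SLASH) {
        // Turn every backslash into a forward slash.
        $path = str_replace('\\', '/', $path);
    }
    return $path;
}

echo dirname(normalize_path('C:\Folder/Resource\Resource', UNIXIFY_SLASH)), "\n";
// C:/Folder/Resource
```

Because the separators are unified before dirname() sees the path, the result 
should no longer depend on which separators the platform recognizes.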

Regards

Anatol



Re: [PHP-DEV] Directory separators on Windows

2017-03-31 Thread Fleshgrinder
Hey :)

On 3/31/2017 7:51 PM, Anatol Belski wrote:
> Well, there was slightly more in your msg, thus the response 
> 

Not really:

On 3/30/2017 8:05 PM, Fleshgrinder wrote:
> Windows and paths is a complicated and lengthy story.
> 
> TL;DR all versions of Windows are able to deal with slashes, and we 
> could easily use slashes everywhere all the time.
> 

The rest was under the heading "History".

On 3/31/2017 7:51 PM, Anatol Belski wrote:
> Path normalization and forward slash everywhere are two different
> things. Having forward slash just because it is supported - nope,
> it's more an issue and should not be done. The path can be used
> everywhere - in the script itself, passed to external prog, written
> into a file, etc. The suggested "always forward slash" will cause
> endless conversion back and forth, in both user space and internally.
> Please check the 7.1 related parts, or even earlier versions, we
> already have to do some conversions because of these and similar
> matters, doing yet more while introducing breakages for existing
> software doesn't sound necessary. Any individual case in the given
> app is what matters.
> 

$ php71 -a
php > echo dirname('C:\Folder/Resource\Resource');
C:\Folder/Resource

hmmm... just one example, this is what this whole discussion is about.
We are already super inconsistent. It seems as if this is not producing
any issues with PHP itself, as well as at least every extension I ever
interacted with.

Of course things are very different when it is about outputting paths
and forwarding them to other programs, which might be super shitty. (I
look at you protoc from Google, **grrr**.) However, that is something
where `realpath`/`path_canonicalize`/`path_normalize` would come into
play, and something I would leave to the applications. Choosing the
right situation where the path requires those actions is impossible.

We could also consistently convert paths to their native form. Hence,
above example would result in `C:\Folder\Resource`, or even
`\\?\C:\Folder\Resource` (verbatim path, no further fiddling allowed).

On 3/31/2017 7:51 PM, Anatol Belski wrote:
> Yep, a function to normalize path were doable. But again, the current
> implementations are platform dependent and use platform APIs. Such a
> function might need a re-implementations of those APIs, to produce
> results platform independently, that are valid on the target
> platform. Otherwise, more generalization doesn't look like having a
> base in absence of a consistent specs, at least I haven't seen any.
> Well, until someone takes it in the hand and files a draft to IETF
> 
> 
> Regards
> 
> Anatol
> 

Both POSIX and Windows paths are well documented. However, it's not an
easy topic, that is for sure, and using slashes everywhere might be more
destructive than I anticipate.

-- 
Richard "Fleshgrinder" Fussenegger




RE: [PHP-DEV] Directory separators on Windows

2017-03-31 Thread Anatol Belski


> -Original Message-
> From: Fleshgrinder [mailto:p...@fleshgrinder.com]
> Sent: Friday, March 31, 2017 6:29 PM
> To: Anatol Belski <a...@php.net>; internals@lists.php.net; Rasmus Schultz
> <ras...@mindplay.dk>
> Subject: Re: [PHP-DEV] Directory separators on Windows
> 
> On 3/31/2017 12:33 PM, Anatol Belski wrote:
> > Regarding the path variants support - it's not quite that way. PHP
> > streams abstract many things, for both simplicity and security. The
> > current state has historically grown on these two factors. So far I
> > can tell, the only what we don't support is a drive relative path and
> > don't handle several irrelevant prefixes like device UID.
> >
> > While in general the info above is correct, things still stay platform
> > dependent in many cases, while supported in PHP, too. Fe using "/" to
> > access drive root ofc works, but might be surprisingly wrong if CWD is
> > changed to another drive. Well, that's the platform nuance, with DOS
> > one can have multiple roots.  In other cases, like UNC, links or
> > lately the long path prefix, the handling with PHP streams is
> > completely transparent to the consuming script.
> >
> > A given case with a generated file is clearly the app responsibility.
> > It is likely, that generated files moved between systems can cause
> > arbitrary issues disregarding the actual platform. The mentioned case
> > belongs to the same group, where I'd say there is no and cannot be a
> > plausible general "fix". In addition to the EOL example by Rowan,
> > another one of same could be escapeshell* functions. Taking in account
> > also
> >
> > - backward compatibility - platform specific - compatibility with
> > dependency libs, especially where it's impossible to integrate PHP
> > streams - absence of the cross platform specifications, which is IMO
> > the most of issue
> >
> > Even if we'd abstract ourselves from the initial app responsibility
> > case - there are the portability nuances that are not simply to clear
> > away by just renaming 'a' to 'b'.
> >
> > Regards
> >
> > Anatol
> >
> 
> Slow with the horses, we were only talking about backslash vs. slash, not
> anything else. I only explained the various paths that are available on
> Windows.
> 
Well, there was slightly more in your msg, thus the response 

> We could use slashes everywhere, because every platform that is still in
> existence supports it. That's about it, we cannot do much more, well, maybe
> some normalization (e.g. self-references like `a/./b` to `a/b`, or removing
> multiple slashes `a//b` to `a/b`). That's about it. Any other cross-platform
> issues are not solvable, and must be handled by applications.
>
Path normalization and forward slashes everywhere are two different things. 
Using forward slashes just because they are supported: no, that is more of an 
issue and should not be done. The path can be used anywhere: in the script 
itself, passed to an external program, written into a file, etc. The suggested 
"always forward slash" would cause endless conversion back and forth, in both 
user space and internally. Please check the 7.1 related parts, or even earlier 
versions: we already have to do some conversions because of these and similar 
matters, and doing yet more while introducing breakage for existing software 
doesn't sound necessary. What matters is the individual case in the given app.
 
> A proper path abstraction would be awesome. Of course I would prefer an
> object for it, but offering a `path_canonicalize` function as well for
> starters is good too.
> 
Yep, a function to normalize paths would be doable. But again, the current 
implementations are platform dependent and use platform APIs. Such a function 
might need reimplementations of those APIs in order to produce, platform 
independently, results that are valid on the target platform. Otherwise, more 
generalization doesn't seem to have a basis in the absence of a consistent 
spec; at least I haven't seen any. Well, until someone takes it in hand and 
files a draft with the IETF 

Regards

Anatol


Re: [PHP-DEV] Directory separators on Windows

2017-03-31 Thread Fleshgrinder
On 3/31/2017 12:33 PM, Anatol Belski wrote:
> Regarding the path variants support - it's not quite that way. PHP
> streams abstract many things, for both simplicity and security. The
> current state has historically grown on these two factors. So far I
> can tell, the only what we don't support is a drive relative path and
> don't handle several irrelevant prefixes like device UID.
> 
> While in general the info above is correct, things still stay
> platform dependent in many cases, while supported in PHP, too. Fe
> using "/" to access drive root ofc works, but might be surprisingly
> wrong if CWD is changed to another drive. Well, that's the platform
> nuance, with DOS one can have multiple roots.  In other cases, like
> UNC, links or lately the long path prefix, the handling with PHP
> streams is completely transparent to the consuming script.
> 
> A given case with a generated file is clearly the app responsibility.
> It is likely, that generated files moved between systems can cause
> arbitrary issues disregarding the actual platform. The mentioned case
> belongs to the same group, where I'd say there is no and cannot be a
> plausible general "fix". In addition to the EOL example by Rowan,
> another one of same could be escapeshell* functions. Taking in
> account also
> 
> - backward compatibility - platform specific - compatibility with
> dependency libs, especially where it's impossible to integrate PHP
> streams - absence of the cross platform specifications, which is IMO
> the most of issue
> 
> Even if we'd abstract ourselves from the initial app responsibility
> case - there are the portability nuances that are not simply to clear
> away by just renaming 'a' to 'b'.
> 
> Regards
> 
> Anatol
> 

Slow with the horses, we were only talking about backslash vs. slash,
not anything else. I only explained the various paths that are available
on Windows.

We could use slashes everywhere, because every platform that is still in
existence supports them. We cannot do much more than that, except maybe
some normalization (e.g. self-references like `a/./b` to `a/b`, or
removing multiple slashes `a//b` to `a/b`). Any other cross-platform
issues are not solvable, and must be handled by applications.
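
The kind of normalization meant here could look roughly like the sketch
below. The function name is made up for illustration, and the sketch is
purely string based: it leaves `..` alone (because of symbolic links) and
does not know about UNC or verbatim prefixes, which would need extra care.

```php
<?php
// Hypothetical helper: purely lexical, never touches the filesystem.
function path_normalize_lexical(string $path): string
{
    // Treat both separators alike and emit forward slashes.
    $path = str_replace('\\', '/', $path);
    $absolute = $path !== '' && $path[0] === '/';

    $kept = [];
    foreach (explode('/', $path) as $segment) {
        // Empty segments come from repeated slashes; "." is a self-reference.
        if ($segment === '' || $segment === '.') {
            continue;
        }
        $kept[] = $segment;
    }

    $result = implode('/', $kept);
    if ($absolute) {
        return '/' . $result;
    }
    // A relative path that normalizes to nothing is the current directory.
    return $result === '' ? '.' : $result;
}

echo path_normalize_lexical('a/./b'), "\n";     // a/b
echo path_normalize_lexical('a//b'), "\n";      // a/b
echo path_normalize_lexical('/a//./b/'), "\n";  // /a/b
```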

A proper path abstraction would be awesome. Of course I would prefer an
object for it, but offering a `path_canonicalize` function as well for
starters is good too.

-- 
Richard "Fleshgrinder" Fussenegger




Re: [PHP-DEV] Directory separators on Windows

2017-03-31 Thread S.A.N
> +1
> Can be used, for convert NAMESPACE to filepath in autoload )
>
> 
> function __autoload($path)
> {
> include convert_seperators($path);
> }
>
>
> On Windows, it is what realpath does.
>

No, realpath() does not use `include_path`.
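
A rough sketch of the difference: realpath() resolves relative paths against
the current working directory, while stream_resolve_include_path() searches
the include_path. The helper name class_to_relative_path() is made up for
this example, and the directory layout it implies is only an assumption.

```php
<?php
// Hypothetical helper: turn a class name into a relative file path by
// replacing namespace separators with forward slashes.
function class_to_relative_path(string $class): string
{
    return str_replace('\\', '/', ltrim($class, '\\')) . '.php';
}

spl_autoload_register(function (string $class): void {
    // Let include_path decide where the file lives; false means "not found".
    $file = stream_resolve_include_path(class_to_relative_path($class));
    if ($file !== false) {
        require $file;
    }
});

echo class_to_relative_path('\Vendor\Package\ClassName'), "\n";
// Vendor/Package/ClassName.php
```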




RE: [PHP-DEV] Directory separators on Windows

2017-03-31 Thread Anatol Belski
Hi,

> -Original Message-
> From: Fleshgrinder [mailto:p...@fleshgrinder.com]
> Sent: Thursday, March 30, 2017 8:05 PM
> To: Rasmus Schultz <ras...@mindplay.dk>; PHP internals
> <internals@lists.php.net>
> Subject: Re: [PHP-DEV] Directory separators on Windows
> 
> On 3/30/2017 3:25 PM, Rasmus Schultz wrote:
> > Thoughts?
> >
> 
> Windows and paths is a complicated and lengthy story.
> 
> TL;DR all versions of Windows are able to deal with slashes, and we could
> easily use slashes everywhere all the time.
> 
> # History
> The story why Windows is using the backslash might be of interest, read:
> 
> http://blogs.msdn.com/b/larryosterman/archive/2005/06/24/432386.aspx
> 
> This also explains that Windows IS supporting forward slashes since at least
> the 1990s. However, there are programs that have significant problems with it,
> but usually those are old or otherwise shitty programs.
> 
> There are various ways paths can be represented in Windows, the so called path
> variants. There are 7 in total:
> 
> 1. Root
> 2. Disk
> 3. UNC
> 4. Device Namespace
> 5. Verbatim Disk
> 6. Verbatim UNC
> 7. Verbatim Device Namespace
> 
> ## Root
> This works just like on Unix and can be either `\` or `/`. It always refers to
> the root directory of the current drive.
> 
> ### Home
> PowerShell also supports the home short-hand `~` like Unix systems, however,
> `cmd.exe` does not.
> 
> ## Disk
> This is the one we all know. The drive letter comes first, followed by a
> colon `:`, and then continues with the actual path.
> 
> `C:\Folder\Resource`
> `C:/Folder/Resource`
> 
> ## UNC
> Is short for **Universal Naming Convention** or **Uniform Naming
> Convention** allows one to refer to network paths or server shares.
> 
> `\\ComputerName\SharedFolder\Resource`
> `//ComputerName/SharedFolder/Resource`
> 
> It also has an extended form for web resource:
> 
> `\\ComputerName[@SSL][@Port]\SharedFolder\Resource`
> 
> ## Device Namespace
> This allows one to directly address special devices, or again the disks
> themselves.
> 
> `\\.\Device\Resource`
> `//./Device/Resource`
> 
> ## Verbatim *
> The verbatim paths work exactly the same way as the respective normal
> counterpart, the difference is that the slash to backslash conversion does NOT
> happen auto-magically:
> 
> `\\?\C:\Folder\Resource`
> `\\?\Server\Share`
> `\\?\UNC\Server\Share`
> 
> https://en.wikipedia.org/wiki/Path_(computing)
> 
> I highly recommend you to have a look at Rust's path implementation, as it
> takes care of all these things in a very intelligent manner. It is also capable
> of dealing with all variants of paths in Windows, unlike PHP which only
> supports a few:
> 
> https://doc.rust-lang.org/std/path/index.html
> 
Regarding the path variants support - it's not quite that way. PHP streams 
abstract many things, for both simplicity and security. The current state has 
grown historically on these two factors. As far as I can tell, the only thing 
we don't support is drive-relative paths, and we don't handle several 
irrelevant prefixes like device UID.

While the info above is correct in general, things still remain platform 
dependent in many cases, even where they are supported in PHP. For example, 
using "/" to access the drive root works of course, but might be surprisingly 
wrong if the CWD has changed to another drive. Well, that's the platform 
nuance: with DOS one can have multiple roots. In other cases, like UNC, links, 
or lately the long path prefix, the handling with PHP streams is completely 
transparent to the consuming script.

The given case with a generated file is clearly the app's responsibility. It 
is likely that generated files moved between systems can cause arbitrary 
issues regardless of the actual platform. The mentioned case belongs to the 
same group, where I'd say there is not, and cannot be, a plausible general 
"fix". In addition to the EOL example by Rowan, another example of the same 
kind would be the escapeshell* functions. Also taking into account

- backward compatibility
- platform specifics
- compatibility with dependency libs, especially where it's impossible to 
integrate PHP streams
- the absence of cross-platform specifications, which is IMO the biggest issue

even if we set aside the initial app responsibility case, there are 
portability nuances that cannot simply be cleared away by just renaming 'a' to 
'b'. 

Regards

Anatol


Re: [PHP-DEV] Directory separators on Windows

2017-03-31 Thread Pierre Joye
On Fri, Mar 31, 2017 at 3:32 PM, Rasmus Schultz  wrote:
> Well, this is the opposite of what I'm asking for, and does not address the
> case where paths have been persisted in a file or database and the data
> gets accessed from different OS.
>
> I understand the reasons given for not changing this behavior in PHP
> itself, so maybe we could have a standard function that normalizes paths to
> forward slashes? e.g. basically:
>
> /**
>  * Normalize a filesystem path.
>  *
>  * On windows systems, replaces backslashes with forward slashes
>  * and ensures drive-letter in upper-case.
>  *
>  * @param string $path
>  *
>  * @return string normalized path
>  */
> function normalize_path( $path ) {
>     $path = str_replace('\\', '/', $path);
>
>     // Upper-case the drive letter, if the path starts with one.
>     return isset($path[1]) && $path[1] === ':'
>         ? ucfirst($path)
>         : $path;
> }

Also, ucfirst (or any case operation) is useless. realpath goes
further by also resolving ugly things like \\\ or // (from code
concatenating paths without checking for a trailing / or \).

> At least WordPress, Drupal and probably most major CMS and frameworks have
> this function or something equivalent. .

Now I remember why they have to do that.

realpath is not fully exposed in userland. virtual_file_ex should be
used, with an option to validate the path or not. Right now
realpath will fail if the path does not exist. I would suggest
exposing this functionality/option; that would remove the need to
implement such things in userland.
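
A quick way to see the limitation described above: realpath() returns false
for anything that is not on disk, so it cannot be used to normalize a path
that is yet to be created. The file name below is made up and only assumed
not to exist.

```php
<?php
// A path under the temp directory that (by assumption) does not exist.
$missing = sys_get_temp_dir() . '/no-such-dir-' . uniqid() . '/file.txt';

var_dump(realpath($missing));                      // bool(false)
var_dump(realpath(sys_get_temp_dir()) !== false);  // bool(true)
```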

ps: I discussed that long time with Dmitry and forgot to implement it,
I take the blame for not having that in 7.x :)

Cheers,
Pierre




Re: [PHP-DEV] Directory separators on Windows

2017-03-31 Thread Rasmus Schultz
Well, this is the opposite of what I'm asking for, and does not address the
case where paths have been persisted in a file or database and the data
gets accessed from a different OS.

I understand the reasons given for not changing this behavior in PHP
itself, so maybe we could have a standard function that normalizes paths to
forward slashes? e.g. basically:

/**
 * Normalize a filesystem path.
 *
 * On windows systems, replaces backslashes with forward slashes
 * and ensures drive-letter in upper-case.
 *
 * @param string $path
 *
 * @return string normalized path
 */
function normalize_path( $path ) {
    $path = str_replace('\\', '/', $path);

    // Upper-case the drive letter, if the path starts with one.
    return isset($path[1]) && $path[1] === ':'
        ? ucfirst($path)
        : $path;
}

At least WordPress, Drupal and probably most major CMS and frameworks have
this function or something equivalent.

This function is too trivial to ship as a separate package, but at the same
time, it's too error-prone and repetitive for every framework/project to
implement (and test) for itself... In my opinion, it's common enough that
it ought to just be built-in?
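
As a rough illustration of the use case (paths kept as keys of a persisted
map), here is a variant of the function above written with square-bracket
string offsets and type declarations, which are my additions. The file names
are invented for the example.

```php
<?php
// Userland sketch of the normalizer discussed above.
function normalize_path(string $path): string
{
    $path = str_replace('\\', '/', $path);

    // Upper-case the drive letter, if the path starts with one.
    return isset($path[1]) && $path[1] === ':'
        ? ucfirst($path)
        : $path;
}

// Paths as they might have been recorded on Windows and on Linux.
$recorded = [
    'migrations\001_init.sql',
    'c:\db\002_users.sql',
    'migrations/001_init.sql',
];

$map = [];
foreach ($recorded as $path) {
    $map[normalize_path($path)] = true; // equivalent paths share one key
}

echo json_encode(array_keys($map), JSON_UNESCAPED_SLASHES), "\n";
// ["migrations/001_init.sql","C:/db/002_users.sql"]
```

Normalizing before the map is written means the JSON file looks the same no
matter which OS produced it.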


On Thu, Mar 30, 2017 at 5:45 PM, Kris Craig  wrote:

>
> On Mar 30, 2017 8:21 AM, "Sara Golemon"  wrote:
> >
> > My first thought is UNC paths.  On windows a file server share is
> > denoted by \\host\share . if you combine that with relative paths
> > produced from PHP, you end up in the dubious situation of
> > "\\host\share/path/to/file" <--- wat?
> >
> > Overall, it smells of magic.
> >
> > -Sara
> >
> > On Thu, Mar 30, 2017 at 8:25 AM, Rasmus Schultz wrote:
> > > Today, I ran into a very hard-to-debug problem, in which paths (to SQL
> > > files, in a database migration script) were kept in a map, persisted
> > > to a JSON file, and this file was moved from a Windows to a Linux
> > > file-system - because the paths on the Linux system had forward
> > > slashes, the files appeared to be missing from the map.
> > >
> > > Related questions are very commonly asked by Windows users, indicating
> > > that this is a common problem:
> > >
> > > http://stackoverflow.com/questions/14743548/php-on-windows-path-comes-up-with-backward-slash
> > > http://stackoverflow.com/questions/5642785/php-a-good-way-to-universalize-paths-across-oss-slash-directions
> > > http://stackoverflow.com/questions/6510468/is-there-a-way-to-force-php-on-windows-to-provide-paths-with-forward-slashes
> > >
> > > The answers that are usually given (use DIRECTORY_SEPARATOR, use
> > > str_replace() etc.) is that by default you automatically get
> > > cross-platform inconsistencies, and the workarounds end up
> > > complicating code everywhere, and sometimes lead to other (sometimes
> > > worse) portability problems.
> > >
> > > The problem is worsened by functions like glob() and the SPL
> > > directory/file traversal objects also producing inconsistent results.
> > >
> > > Returning backslashes on Windows seems rather unnecessary in the first
> > > place, since forward slashes work just fine?
> > >
> > > Might I suggest changing this behavior, such that file-system paths are
> > > consistently returned with a forward slash?
> > >
> > > Though this is more likely to fix rather than create issues, this
> > > could be a breaking change in some cases, so there should probably be
> > > an INI setting that enables the old behavior.
> > >
> > > Thoughts?
> >
> > --
> > PHP Internals - PHP Runtime Development Mailing List
> > To unsubscribe, visit: http://www.php.net/unsub.php
> >
>
> Another option would be to create a function that converts all slashes in
> a given input string to whatever the directory separator should be on that
> platform.  This way, devs wouldn't have to deal with bulky aliases like
> DIRECTORY_SEPARATOR cluttering up their code.
>
> For example:
>
> <?php
> print convert_seperators( '/some\directory/' );
>
> ?>
>
> The above would output "/some/directory" on Linux and "\some\directory" on
> Windows.
>
> --Kris
>


Re: [PHP-DEV] Directory separators on Windows

2017-03-30 Thread Pierre Joye
On Mar 31, 2017 8:19 AM, "S.A.N"  wrote:

> Another option would be to create a function that converts all slashes in
> a given input string to whatever the directory separator should be on that
> platform.  This way, devs wouldn't have to deal with bulky aliases like
> DIRECTORY_SEPARATOR cluttering up their code.
>
> For example:
>
> <?php
> print convert_seperators( '/some\directory/' );
>
> ?>
>
> The above would output "/some/directory" on Linux and "\some\directory" on
> Windows.

+1
Can be used, for convert NAMESPACE to filepath in autoload )



Re: [PHP-DEV] Directory separators on Windows

2017-03-30 Thread S.A.N
> Another option would be to create a function that converts all slashes in a
> given input string to whatever the directory separator should be on that
> platform.  This way, devs wouldn't have to deal with bulky aliases like
> DIRECTORY_SEPARATOR cluttering up their code.
>
> For example:
>
> <?php
> print convert_seperators( '/some\directory/' );
>
> ?>
> ?>
>
> The above would output "/some/directory" on Linux and "\some\directory" on
> Windows.

+1
Can be used, for convert NAMESPACE to filepath in autoload )
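
A userland sketch of the proposed function, spelled convert_separators()
here. It is hypothetical and not part of PHP; the optional second parameter
is my addition so that the result can be shown for both platforms. Note that
a trailing separator in the input is kept.

```php
<?php
// Hypothetical userland version of the proposed function.
function convert_separators(string $path, string $separator = DIRECTORY_SEPARATOR): string
{
    // Replace both kinds of slash with the requested separator in one pass.
    return strtr($path, ['/' => $separator, '\\' => $separator]);
}

echo convert_separators('/some\directory/', '/'), "\n";   // /some/directory/
echo convert_separators('/some\directory/', '\\'), "\n";  // \some\directory\
echo convert_separators('Vendor\Package\ClassName', '/'), ".php\n";
// Vendor/Package/ClassName.php
```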




Re: [PHP-DEV] Directory separators on Windows

2017-03-30 Thread Fleshgrinder
On 3/30/2017 3:25 PM, Rasmus Schultz wrote:
> Thoughts?
> 

Windows and paths is a complicated and lengthy story.

TL;DR all versions of Windows are able to deal with slashes, and we
could easily use slashes everywhere all the time.

# History
The story why Windows is using the backslash might be of interest, read:

http://blogs.msdn.com/b/larryosterman/archive/2005/06/24/432386.aspx

This also explains that Windows has supported forward slashes since at
least the 1990s. However, there are programs that have significant
problems with them, but usually those are old or otherwise shitty programs.

There are various ways paths can be represented in Windows, the so
called path variants. There are 7 in total:

1. Root
2. Disk
3. UNC
4. Device Namespace
5. Verbatim Disk
6. Verbatim UNC
7. Verbatim Device Namespace

## Root
This works just like on Unix and can be either `\` or `/`. It always
refers to the root directory of the current drive.

### Home
PowerShell also supports the home short-hand `~` like Unix systems,
however, `cmd.exe` does not.

## Disk
This is the one we all know. The drive letter comes first, followed by a
colon `:`, and then continues with the actual path.

`C:\Folder\Resource`
`C:/Folder/Resource`

## UNC
UNC is short for **Universal Naming Convention** or **Uniform Naming
Convention** and allows one to refer to network paths or server shares.

`\\ComputerName\SharedFolder\Resource`
`//ComputerName/SharedFolder/Resource`

It also has an extended form for web resources:

`\\ComputerName[@SSL][@Port]\SharedFolder\Resource`

## Device Namespace
This allows one to directly address special devices, or again the disks
themselves.

`\\.\Device\Resource`
`//./Device/Resource`

## Verbatim *
The verbatim paths work exactly the same way as the respective normal
counterpart, the difference is that the slash to backslash conversion
does NOT happen auto-magically:

`\\?\C:\Folder\Resource`
`\\?\Server\Share`
`\\?\UNC\Server\Share`

https://en.wikipedia.org/wiki/Path_(computing)
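
To make the variants above a bit more concrete, here is a rough sketch of
how they could be told apart by prefix. The function name and the returned
labels are made up for this example; nothing like it exists in PHP, and the
checks are deliberately simplistic.

```php
<?php
// Hypothetical classifier for the path variants listed above.
function classify_windows_path(string $path): string
{
    // Verbatim prefixes only exist with backslashes, so test them first.
    if (strncmp($path, '\\\\?\\', 4) === 0) {
        $rest = substr($path, 4);
        if (strncasecmp($rest, 'UNC\\', 4) === 0) {
            return 'verbatim-unc';
        }
        if (preg_match('/^[A-Za-z]:/', $rest) === 1) {
            return 'verbatim-disk';
        }
        return 'verbatim';
    }

    // From here on, both separators are accepted.
    $p = str_replace('\\', '/', $path);
    if (strncmp($p, '//./', 4) === 0) {
        return 'device-namespace';
    }
    if (strncmp($p, '//', 2) === 0) {
        return 'unc';
    }
    if (preg_match('/^[A-Za-z]:/', $p) === 1) {
        return 'disk';
    }
    if ($p !== '' && $p[0] === '/') {
        return 'root';
    }
    return 'relative';
}

echo classify_windows_path('C:/Folder/Resource'), "\n";            // disk
echo classify_windows_path('\\\\ComputerName\\SharedFolder'), "\n"; // unc
echo classify_windows_path('//./Device/Resource'), "\n";           // device-namespace
echo classify_windows_path('\\\\?\\C:\\Folder\\Resource'), "\n";    // verbatim-disk
echo classify_windows_path('\\\\?\\UNC\\Server\\Share'), "\n";      // verbatim-unc
```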

I highly recommend you to have a look at Rust's path implementation, as
it takes care of all these things in a very intelligent manner. It is
also capable of dealing with all variants of paths in Windows, unlike
PHP which only supports a few:

https://doc.rust-lang.org/std/path/index.html

-- 
Richard "Fleshgrinder" Fussenegger




Re: [PHP-DEV] Directory separators on Windows

2017-03-30 Thread Walter Parker
On Thu, Mar 30, 2017 at 8:21 AM, Sara Golemon  wrote:

> My first thought is UNC paths.  On windows a file server share is
> denoted by \\host\share . if you combine that with relative paths
> produced from PHP, you end up in the dubious situation of
> "\\host\share/path/to/file" <--- wat?
>
> Overall, it smells of magic.
>
> -Sara
>
> On Thu, Mar 30, 2017 at 8:25 AM, Rasmus Schultz wrote:
> > Today, I ran into a very hard-to-debug problem, in which paths (to SQL
> > files, in a database migration script) were kept in a map, persisted to
> > a JSON file, and this file was moved from a Windows to a Linux
> > file-system - because the paths on the Linux system had forward slashes,
> > the files appeared to be missing from the map.
> >
> > Related questions are very commonly asked by Windows users, indicating
> > that this is a common problem:
> >
> > http://stackoverflow.com/questions/14743548/php-on-windows-path-comes-up-with-backward-slash
> > http://stackoverflow.com/questions/5642785/php-a-good-way-to-universalize-paths-across-oss-slash-directions
> > http://stackoverflow.com/questions/6510468/is-there-a-way-to-force-php-on-windows-to-provide-paths-with-forward-slashes
> >
> > The answers that are usually given (use DIRECTORY_SEPARATOR, use
> > str_replace() etc.) is that by default you automatically get
> > cross-platform inconsistencies, and the workarounds end up complicating
> > code everywhere, and sometimes lead to other (sometimes worse)
> > portability problems.
> >
> > The problem is worsened by functions like glob() and the SPL
> > directory/file traversal objects also producing inconsistent results.
> >
> > Returning backslashes on Windows seems rather unnecessary in the first
> > place, since forward slashes work just fine?
> >
> > Might I suggest changing this behavior, such that file-system paths are
> > consistently returned with a forward slash?
> >
> > Though this is more likely to fix rather than create issues, this could
> > be a breaking change in some cases, so there should probably be an INI
> > setting that enables the old behavior.
> >
> > Thoughts?
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>

UNC pathing also works with forward slashes. For example, in PowerShell the
following is valid and works if your host is named UNC1 and you have admin
rights to the server.

//UNC1/C$/


-- 
The greatest dangers to liberty lurk in insidious encroachment by men of
zeal, well-meaning but without understanding.   -- Justice Louis D. Brandeis


Re: [PHP-DEV] Directory separators on Windows

2017-03-30 Thread Kris Craig
On Mar 30, 2017 8:21 AM, "Sara Golemon"  wrote:
>
> My first thought is UNC paths.  On windows a file server share is
> denoted by \\host\share . if you combine that with relative paths
> produced from PHP, you end up in the dubious situation of
> "\\host\share/path/to/file" <--- wat?
>
> Overall, it smells of magic.
>
> -Sara
>
> On Thu, Mar 30, 2017 at 8:25 AM, Rasmus Schultz wrote:
> > Today, I ran into a very hard-to-debug problem, in which paths (to SQL
> > files, in a database migration script) were kept in a map, persisted to
> > a JSON file, and this file was moved from a Windows to a Linux
> > file-system - because the paths on the Linux system had forward slashes,
> > the files appeared to be missing from the map.
> >
> > Related questions are very commonly asked by Windows users, indicating
> > that this is a common problem:
> >
> > http://stackoverflow.com/questions/14743548/php-on-windows-path-comes-up-with-backward-slash
> > http://stackoverflow.com/questions/5642785/php-a-good-way-to-universalize-paths-across-oss-slash-directions
> > http://stackoverflow.com/questions/6510468/is-there-a-way-to-force-php-on-windows-to-provide-paths-with-forward-slashes
> >
> > The answers that are usually given (use DIRECTORY_SEPARATOR, use
> > str_replace() etc.) is that by default you automatically get
> > cross-platform inconsistencies, and the workarounds end up complicating
> > code everywhere, and sometimes lead to other (sometimes worse)
> > portability problems.
> >
> > The problem is worsened by functions like glob() and the SPL
> > directory/file traversal objects also producing inconsistent results.
> >
> > Returning backslashes on Windows seems rather unnecessary in the first
> > place, since forward slashes work just fine?
> >
> > Might I suggest changing this behavior, such that file-system paths are
> > consistently returned with a forward slash?
> >
> > Though this is more likely to fix rather than create issues, this could
> > be a breaking change in some cases, so there should probably be an INI
> > setting that enables the old behavior.
> >
> > Thoughts?
>

Another option would be to create a function that converts all slashes in a
given input string to whatever the directory separator should be on that
platform.  This way, devs wouldn't have to deal with bulky constants like
DIRECTORY_SEPARATOR cluttering up their code.

For example:



The above would output "/some/directory" on Linux and "\some\directory" on
Windows.
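
A minimal sketch of what such a helper might look like. The name
to_native_separators() and its optional second parameter are hypothetical,
chosen only for illustration; nothing like it exists in PHP itself.

```php
<?php
// Hypothetical helper: name and signature are illustrative only,
// not an existing PHP function.
function to_native_separators(string $path, string $separator = DIRECTORY_SEPARATOR): string
{
    // Replace both kinds of slash with the requested separator.
    return str_replace(['/', '\\'], $separator, $path);
}

// Uses the platform's own separator by default.
echo to_native_separators('/some/directory'), PHP_EOL;
```

The explicit $separator argument is only there so the behaviour for either
platform can be checked from one machine; note that a blanket replacement
like this would also rewrite backslashes that are legitimately part of a
file name on Linux.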

--Kris


Re: [PHP-DEV] Directory separators on Windows

2017-03-30 Thread Sara Golemon
My first thought is UNC paths.  On Windows a file server share is
denoted by \\host\share . If you combine that with relative paths
produced from PHP, you end up in the dubious situation of
"\\host\share/path/to/file" <--- wat?

Overall, it smells of magic.
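
The mixed-separator case is easy to reproduce with plain string
concatenation. A minimal sketch, where the host, share and relative path
are placeholder values and no real share is touched:

```php
<?php
// Placeholder values for illustration; no real share is accessed.
$share    = '\\\\host\\share';   // the UNC prefix \\host\share
$relative = 'path/to/file';      // relative path with forward slashes

// Naive concatenation yields a path with both separator styles.
$mixed = $share . '/' . $relative;
echo $mixed, PHP_EOL; // \\host\share/path/to/file
```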

-Sara

On Thu, Mar 30, 2017 at 8:25 AM, Rasmus Schultz  wrote:
> Today, I ran into a very hard-to-debug problem, in which paths (to SQL
> files, in a database migration script) were kept in a map, persisted to a
> JSON file, and this file was moved from a Windows to a Linux file-system -
> because the paths on the Linux system had forward slashes, the files
> appeared to be missing from the map.
>
> Related questions are very commonly asked by Windows users, indicating that
> this is a common problem:
>
> http://stackoverflow.com/questions/14743548/php-on-windows-path-comes-up-with-backward-slash
> http://stackoverflow.com/questions/5642785/php-a-good-way-to-universalize-paths-across-oss-slash-directions
> http://stackoverflow.com/questions/6510468/is-there-a-way-to-force-php-on-windows-to-provide-paths-with-forward-slashes
>
> The answers that are usually given (use DIRECTORY_SEPARATOR, use
> str_replace() etc.) is that by default you automatically get cross-platform
> inconsistencies, and the workarounds end up complicating code everywhere,
> and sometimes lead to other (sometimes worse) portability problems.
>
> The problem is worsened by functions like glob() and the SPL directory/file
> traversal objects also producing inconsistent results.
>
> Returning backslashes on Windows seems rather unnecessary in the first
> place, since forward slashes work just fine?
>
> Might I suggest changing this behavior, such that file-system paths are
> consistently returned with a forward slash?
>
> Though this is more likely to fix rather than create issues, this could be
> a breaking change in some cases, so there should probably be an INI setting
> that enables the old behavior.
>
> Thoughts?




Re: [PHP-DEV] Directory separators on Windows

2017-03-30 Thread Ryan Pallas
On Thu, Mar 30, 2017 at 8:05 AM, Rowan Collins wrote:

> On 30 March 2017 14:25:02 BST, Rasmus Schultz  wrote:
>
> >Returning backslashes on Windows seems rather unnecessary in the first
> >place, since forward slashes work just fine?
>
> This may be true when using the paths within PHP, but is it true outside
> of it? If your JSON file had been read in by a .net application, or used to
> generate a DOS/NT batch file, wouldn't forward slashes there have been just
> as broken as backslashes on a Linux box?
>
>
In my experience, forward slashes work just fine in .NET 4.0+ (haven't ever
used less than 4.0, so I won't claim to know), PowerShell and batch files.
Command prompt deals with it just fine.


> Sadly, I fear this is like trying to automate line ending conversion - the
> more you try to avoid being platform-specific, the more awkward cases you
> introduce.
>

I tend to agree. It's really not that hard to handle in the application
itself, instead of relying on the language to perform some magic. We
generally know that magic features aren't so great, so let's not go adding
more.


>
> Regards,
>
> --
> Rowan Collins
> [IMSoP]
>


Re: [PHP-DEV] Directory separators on Windows

2017-03-30 Thread Pierre Joye
On Mar 30, 2017 8:25 PM, "Rasmus Schultz"  wrote:

Today, I ran into a very hard-to-debug problem, in which paths (to SQL
files, in a database migration script) were kept in a map, persisted to a
JSON file, and this file was moved from a Windows to a Linux file-system -
because the paths on the Linux system had forward slashes, the files
appeared to be missing from the map.

Related questions are very commonly asked by Windows users, indicating that
this is a common problem:

http://stackoverflow.com/questions/14743548/php-on-windows-path-comes-up-with-backward-slash
http://stackoverflow.com/questions/5642785/php-a-good-way-to-universalize-paths-across-oss-slash-directions
http://stackoverflow.com/questions/6510468/is-there-a-way-to-force-php-on-windows-to-provide-paths-with-forward-slashes

The answers that are usually given (use DIRECTORY_SEPARATOR, use
str_replace() etc.) is that by default you automatically get cross-platform
inconsistencies, and the workarounds end up complicating code everywhere,
and sometimes lead to other (sometimes worse) portability problems.

The problem is worsened by functions like glob() and the SPL directory/file
traversal objects also producing inconsistent results.

Returning backslashes on Windows seems rather unnecessary in the first
place, since forward slashes work just fine?

Might I suggest changing this behavior, such that file-system paths are
consistently returned with a forward slash?

Though this is more likely to fix rather than create issues, this could be
a breaking change in some cases, so there should probably be an INI setting
that enables the old behavior.

Thoughts?


It is true (works) only on Windows because PHP does the conversion
transparently for you.

It will fail miserably if your JSON strings are processed as paths by
other tools or languages that do not do this magic for you.

Cheers
Pierre


Re: [PHP-DEV] Directory separators on Windows

2017-03-30 Thread Rowan Collins
On 30 March 2017 14:25:02 BST, Rasmus Schultz  wrote:

>Returning backslashes on Windows seems rather unnecessary in the first
>place, since forward slashes work just fine?

This may be true when using the paths within PHP, but is it true outside of it? 
If your JSON file had been read in by a .net application, or used to generate a 
DOS/NT batch file, wouldn't forward slashes there have been just as broken as 
backslashes on a Linux box?

Sadly, I fear this is like trying to automate line ending conversion - the more 
you try to avoid being platform-specific, the more awkward cases you introduce.

Regards,

-- 
Rowan Collins
[IMSoP]




[PHP-DEV] Directory separators on Windows

2017-03-30 Thread Rasmus Schultz
Today, I ran into a very hard-to-debug problem, in which paths (to SQL
files, in a database migration script) were kept in a map, persisted to a
JSON file, and this file was moved from a Windows to a Linux file-system -
because the paths stored on Windows used backslashes while the paths on the
Linux system had forward slashes, the files appeared to be missing from the
map.
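
A minimal sketch of how this kind of mismatch can arise, and of the
str_replace() style of workaround. The file name and the shape of the map
are made up for illustration; they are not taken from the actual migration
script.

```php
<?php
// Hypothetical migration map keyed by path, as it might be written on Windows.
$map  = ['migrations\\001_init.sql' => true];
$json = json_encode($map);

// On Linux the same file is looked up with forward slashes, so the key is not found.
$restored = json_decode($json, true);
var_dump(isset($restored['migrations/001_init.sql'])); // bool(false)

// Workaround: normalize the keys to forward slashes before persisting.
$normalized = [];
foreach ($map as $path => $value) {
    $normalized[str_replace('\\', '/', $path)] = $value;
}
var_dump(isset($normalized['migrations/001_init.sql'])); // bool(true)
```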

Related questions are very commonly asked by Windows users, indicating that
this is a common problem:

http://stackoverflow.com/questions/14743548/php-on-windows-path-comes-up-with-backward-slash
http://stackoverflow.com/questions/5642785/php-a-good-way-to-universalize-paths-across-oss-slash-directions
http://stackoverflow.com/questions/6510468/is-there-a-way-to-force-php-on-windows-to-provide-paths-with-forward-slashes

The problem with the answers that are usually given (use DIRECTORY_SEPARATOR,
use str_replace() etc.) is that by default you automatically get
cross-platform inconsistencies, and the workarounds end up complicating code
everywhere, and sometimes lead to other (sometimes worse) portability
problems.

The problem is worsened by functions like glob() and the SPL directory/file
traversal objects also producing inconsistent results.
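
For illustration, this is roughly the kind of wrapper that ends up in user
code today. The function name glob_forward_slashes() is hypothetical, and
the sketch assumes that a backslash in a result is always a separator,
which holds on Windows but not necessarily on Linux, where a backslash can
be part of a file name.

```php
<?php
// Hypothetical wrapper (not part of PHP): glob() with results
// normalized to forward slashes.
function glob_forward_slashes(string $pattern): array
{
    $matches = glob($pattern);
    if ($matches === false) {
        // glob() returns false on error; treat that as "no matches" here.
        return [];
    }

    return array_map(
        static function (string $path): string {
            return str_replace('\\', '/', $path);
        },
        $matches
    );
}
```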

Returning backslashes on Windows seems rather unnecessary in the first
place, since forward slashes work just fine?

Might I suggest changing this behavior, such that file-system paths are
consistently returned with a forward slash?

Though this is more likely to fix rather than create issues, this could be
a breaking change in some cases, so there should probably be an INI setting
that enables the old behavior.

Thoughts?