Hi alistair

I'm not expert of file system. Now I like your idea to have explicit
messages such as canonalize or expand.

Stef

On Tue, Jun 27, 2017 at 8:56 PM, Alistair Grant <[email protected]> wrote:
> Hi Sven,
>
> On Tue, Jun 27, 2017 at 01:31:42PM +0200, Sven Van Caekenberghe wrote:
>> Hi Alistair,
>>
>> I think it is great that you are working on FileSystem and are trying
>> to contribute. I appreciate your effort to find consensus.
>
> :-)
>
>
>> But, and please take this as positive criticism, I feel a bit uneasy
>> when I read your reasoning, but maybe I am wrong.
>
> Thanks for taking the trouble to reply.  Hopefully I can address some of
> your unease below.
>
>
>> You see, the way I understand FileSystem (what I think was the
>> original design idea), it is a cross platform abstraction over
>> concrete file systems and paths.
>
> It does appear to be the case that effort was put in to avoid specifying
> a separator, but it makes an assumption that is incorrect. The class
> comments of path state that the seperator can be part of a file name,
> and this is incorrect for every disk file system Pharo supports (as far
> as I know).
>
>
>
>> A FileReference holds a FileSystem and a Path. A Path is just a
>> collection of elements that leads to a location, the leaf possibly
>> being interpreted as a file with an extension (although that last
>> point is just a convention). It seems great effort was put into
>> avoiding the use of separators.
>
>> Pharo is an object oriented system with appropriate abstractions,
>> FileSystem gives us one approach to this difficult problem.
>>
>> In your reasoning, your truth, your reference is always the Unix file
>> system and the way paths are handled there. You expect a number of
>> things based on that, but maybe that was/is not the design goal.
>
> I do have a bias towards the Unix filesystem, and you're correct that
> all the examples I provided use the Unix file system, but I've tried to
> make sure that the changes I've made apply equally to Windows file
> systems (more below).
>
>
>> I said this before, but I am not sure we should interpret the argument of #/.
>
> I'll come back to this later :-)
>
>
>> I think that
>>
>>   FileLocator root / 'foo' / 'bar' / 'readme.txt'.
>>
>> is more abstract (fitting to the original goal) then
>>
>>   '/foo' asFileReference / 'bar/readme.txt'.
>>
>> because it totally avoids a reference to the platform dependent path 
>> separator.
>
> I wasn't trying to suggest that code would normally be written this way.
> It is more likely that '/foo' asFileReference is created early in the
> code, and much later an input is supplied which is 'bar/readme.txt'.  At
> that point, the natural thing to do is send #/ as above.
>
> The problem with only supporting the first case above is that it puts
> the onus of parsing the string on the user.  We could add a another
> method that parses the supplied string, but that will just make the
> interface more confusing, and we already have a steady stream of
> messages to the list from people getting confused by this.
>
>
>> The same goes for '..' as
>>
>>   (FileLocator root / 'foo' / 'bar' / 'down') parent / 'readme.txt'.
>>
>> is a more object oriented way to write '/foo/bar/down/../readme.txt', IMO.
>
> Agreed, but as mentioned above, we also have to deal with input strings
> from external sources.
>
>
>> Windows is an important target for Pharo, they use $\ not $/.
>
> The patch handles that, so on windows I can do:
>
> ('C:\cygwin' asFileReference / 'usr\local\bin') parent  " File @ 
> C:\cygwin\usr\local"
>
> (just to be clear, this is an example of how the strings are handled,
> I'm not expecting anyone to write code like the above).
>
>
>> Do you see my point ?
>
> I think so, but... :-)
>
>
>> BTW, I consider the fact that current,
>>
>>  '/foo' asFileReference / 'bar/readme.txt'
>>
>> works with #exists even though 'bar/readme.txt' was not fully
>> parsed/resolved a bug, or a happy coincidence at best.
>
> I agree that it is a happy coincedence, but it is also intuitive.
>
>
>> Another point is the distinction between internal/external and/or
>> concrete/abstract paths. Should something like '..' remain part of a
>> path. Never, always, only when it can be resolved ? What about special
>> characters ?
>
> I'm arguing that '..' should be left in the path unless explicitly told
> to be removed, so that things like symbolic links are handled properly
> (by the file system itself).  (Thanks to Denis for reminding me about
> this in an earlier email thread)
>
> While it's nice to be completely general, is there a supported file
> system that doesn't interpret ".." as the parent directory?
>
> As an aside, another patch I'll be submitting fixes symbolic link
> handling so we could conveivably write a #canonicalizeOnFileSystem that
> would properly handle 'symboliclink/..', but it would be slow.
>
>
>> Should we support ~ ? Sometimes I would like that too, but maybe
>>
>>   FileLocator home
>>
>> is enough.
>
> As Subbu points out in a later message, things like '~' are handled by
> the shell.  We could consider adding something like #expand to handle
> '~' and shell variables, e.g. $HOME, but I'd make that a separate patch.
>
> It would also need to be done carefully as $ and ~ are valid characters
> within file names (only / and the null character are not allowed in
> Posix file systems, Windows has more characters that are not allowed).
>
>
>
>> All this being said, I think FileSystem can and should be improved,
>> but carefully.
>>
>> It would be good if more people joined this discussion.
>>
>> Sven
>
> I think I understand the original goals, and I don't disagree with them,
> however:
>
> 1. they are based on an arguably incorrect assumption (the
> separator character can be part of a file name),
>
> 2. they've caused quite a bit of confusion due to unexpected behaviour,
>
> 3. if support for a file system that allows the separator in file names
> is ever added, we will still have to be able to parse a full path
> correctly (which the current code wouldn't do)
>
> 4. I don't think these changes conflict with the original goals, all the
> examples you provided above still function exactly the same.
>
> Does that help reduce your uneasiness?
>
> Thanks again,
> Alistair
>
>
>
>
>> > On 26 Jun 2017, at 10:12, Alistair Grant <[email protected]> wrote:
>> >
>> > Pharo's FileSystem behaviour is currently somewhat inconsistent in its
>> > treatment of path strings.
>> >
>> > To be able to demonstrate the limitations and fixes, I'm assuming that
>> > the following files and directories have been created (/dev/shm is a
>> > memory based file system in Ubuntu):
>> >
>> > cd /dev/shm
>> > mkdir -p d1/d2/d3/d4
>> > touch d1/t1.txt d1/d2/t2.txt d1/d2/d3/t3.txt d1/d2/d3/d4/t4.txt
>> > pushd d1 && ln -s d2/d3 ./s3 && popd
>> >
>> >
>> >
>> > 1. Path's are canonicalised during initial creation, but are not when
>> > extending the path, i.e. sending #/
>> >
>> > E.g.
>> >
>> > '/dev/shm/d1/d2/../t1.txt' asFileReference  " File @ /dev/shm/d1/t1.txt"
>> >
>> > '/dev/shm/d1/' asFileReference / 'd2/../t1.txt'  " File @ 
>> > /dev/shm/d1/d2/../t1.txt"
>> >
>> > Automatic canonicalisation is problematic as the code doesn't handle
>> > symbolic links in the path name:
>> >
>> > ('/dev/shm/d1/' asFileReference / 's3/../t2.txt') exists  " true (correct 
>> > answer)"
>> >
>> > '/dev/shm/d1/s3/../t2.txt' asFileReference exists  " false (wrong answer)"
>> >
>> >
>> > 2. In an attempt to be completely general, the argument to #/ is assumed
>> > to be a single directory / file.  This is incorrect as the path
>> > delimiter is not allowed to be part of the file name.  The result is
>> > that path comparisons and operations such as #parent give unexpected
>> > results.
>> >
>> > E.g.
>> >
>> > ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') exists  " true"
>> >
>> > ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') parent  " File @ 
>> > /dev/shm/d1"
>> >
>> >
>> > My PR modified FileSystem so that:
>> >
>> > 1. Canonicalisation is separated out as an operation that has to be
>> > manually requested:
>> >
>> > '/dev/shm/d1/d2/../t1.txt' asFileReference  " File @ 
>> > /dev/shm/d1/d2/../t1.txt"
>> >
>> > '/dev/shm/d1/d2/../t1.txt' asFileReference canonicalize  " File @ 
>> > /dev/shm/d1/t1.txt"
>> >
>> >
>> > 2. The argument to #/ can be either a single file / directory name or a
>> > path.
>> >
>> > ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') parent  " File @ 
>> > /dev/shm/d1/d2/d3"
>> >
>> >
>> >
>> > 3. Adds unit tests to cover the changed behaviour.
>> >
>> >
>> > While I believe that these changes improve Pharo overall, making it more
>> > intuitive and consistent, modifying the path creation to remove
>> > canonicalisation is not backward compatible and requires a change to
>> > PathTest>>testRelativeFromStringNormalization and
>> > PathTest>>testRelativeFromStringNormalizationParent to manually
>> > canonicalise the paths.
>> >
>> > Before I submit the patch, does anyone have any strong objections to the
>> > changes as described above?
>> >
>> > Thanks,
>> > Alistair
>

Reply via email to