Hi alistair I'm not expert of file system. Now I like your idea to have explicit messages such as canonalize or expand.
Stef On Tue, Jun 27, 2017 at 8:56 PM, Alistair Grant <[email protected]> wrote: > Hi Sven, > > On Tue, Jun 27, 2017 at 01:31:42PM +0200, Sven Van Caekenberghe wrote: >> Hi Alistair, >> >> I think it is great that you are working on FileSystem and are trying >> to contribute. I appreciate your effort to find consensus. > > :-) > > >> But, and please take this as positive criticism, I feel a bit uneasy >> when I read your reasoning, but maybe I am wrong. > > Thanks for taking the trouble to reply. Hopefully I can address some of > your unease below. > > >> You see, the way I understand FileSystem (what I think was the >> original design idea), it is a cross platform abstraction over >> concrete file systems and paths. > > It does appear to be the case that effort was put in to avoid specifying > a separator, but it makes an assumption that is incorrect. The class > comments of path state that the seperator can be part of a file name, > and this is incorrect for every disk file system Pharo supports (as far > as I know). > > > >> A FileReference holds a FileSystem and a Path. A Path is just a >> collection of elements that leads to a location, the leaf possibly >> being interpreted as a file with an extension (although that last >> point is just a convention). It seems great effort was put into >> avoiding the use of separators. > >> Pharo is an object oriented system with appropriate abstractions, >> FileSystem gives us one approach to this difficult problem. >> >> In your reasoning, your truth, your reference is always the Unix file >> system and the way paths are handled there. You expect a number of >> things based on that, but maybe that was/is not the design goal. > > I do have a bias towards the Unix filesystem, and you're correct that > all the examples I provided use the Unix file system, but I've tried to > make sure that the changes I've made apply equally to Windows file > systems (more below). > > >> I said this before, but I am not sure we should interpret the argument of #/. > > I'll come back to this later :-) > > >> I think that >> >> FileLocator root / 'foo' / 'bar' / 'readme.txt'. >> >> is more abstract (fitting to the original goal) then >> >> '/foo' asFileReference / 'bar/readme.txt'. >> >> because it totally avoids a reference to the platform dependent path >> separator. > > I wasn't trying to suggest that code would normally be written this way. > It is more likely that '/foo' asFileReference is created early in the > code, and much later an input is supplied which is 'bar/readme.txt'. At > that point, the natural thing to do is send #/ as above. > > The problem with only supporting the first case above is that it puts > the onus of parsing the string on the user. We could add a another > method that parses the supplied string, but that will just make the > interface more confusing, and we already have a steady stream of > messages to the list from people getting confused by this. > > >> The same goes for '..' as >> >> (FileLocator root / 'foo' / 'bar' / 'down') parent / 'readme.txt'. >> >> is a more object oriented way to write '/foo/bar/down/../readme.txt', IMO. > > Agreed, but as mentioned above, we also have to deal with input strings > from external sources. > > >> Windows is an important target for Pharo, they use $\ not $/. > > The patch handles that, so on windows I can do: > > ('C:\cygwin' asFileReference / 'usr\local\bin') parent " File @ > C:\cygwin\usr\local" > > (just to be clear, this is an example of how the strings are handled, > I'm not expecting anyone to write code like the above). > > >> Do you see my point ? > > I think so, but... :-) > > >> BTW, I consider the fact that current, >> >> '/foo' asFileReference / 'bar/readme.txt' >> >> works with #exists even though 'bar/readme.txt' was not fully >> parsed/resolved a bug, or a happy coincidence at best. > > I agree that it is a happy coincedence, but it is also intuitive. > > >> Another point is the distinction between internal/external and/or >> concrete/abstract paths. Should something like '..' remain part of a >> path. Never, always, only when it can be resolved ? What about special >> characters ? > > I'm arguing that '..' should be left in the path unless explicitly told > to be removed, so that things like symbolic links are handled properly > (by the file system itself). (Thanks to Denis for reminding me about > this in an earlier email thread) > > While it's nice to be completely general, is there a supported file > system that doesn't interpret ".." as the parent directory? > > As an aside, another patch I'll be submitting fixes symbolic link > handling so we could conveivably write a #canonicalizeOnFileSystem that > would properly handle 'symboliclink/..', but it would be slow. > > >> Should we support ~ ? Sometimes I would like that too, but maybe >> >> FileLocator home >> >> is enough. > > As Subbu points out in a later message, things like '~' are handled by > the shell. We could consider adding something like #expand to handle > '~' and shell variables, e.g. $HOME, but I'd make that a separate patch. > > It would also need to be done carefully as $ and ~ are valid characters > within file names (only / and the null character are not allowed in > Posix file systems, Windows has more characters that are not allowed). > > > >> All this being said, I think FileSystem can and should be improved, >> but carefully. >> >> It would be good if more people joined this discussion. >> >> Sven > > I think I understand the original goals, and I don't disagree with them, > however: > > 1. they are based on an arguably incorrect assumption (the > separator character can be part of a file name), > > 2. they've caused quite a bit of confusion due to unexpected behaviour, > > 3. if support for a file system that allows the separator in file names > is ever added, we will still have to be able to parse a full path > correctly (which the current code wouldn't do) > > 4. I don't think these changes conflict with the original goals, all the > examples you provided above still function exactly the same. > > Does that help reduce your uneasiness? > > Thanks again, > Alistair > > > > >> > On 26 Jun 2017, at 10:12, Alistair Grant <[email protected]> wrote: >> > >> > Pharo's FileSystem behaviour is currently somewhat inconsistent in its >> > treatment of path strings. >> > >> > To be able to demonstrate the limitations and fixes, I'm assuming that >> > the following files and directories have been created (/dev/shm is a >> > memory based file system in Ubuntu): >> > >> > cd /dev/shm >> > mkdir -p d1/d2/d3/d4 >> > touch d1/t1.txt d1/d2/t2.txt d1/d2/d3/t3.txt d1/d2/d3/d4/t4.txt >> > pushd d1 && ln -s d2/d3 ./s3 && popd >> > >> > >> > >> > 1. Path's are canonicalised during initial creation, but are not when >> > extending the path, i.e. sending #/ >> > >> > E.g. >> > >> > '/dev/shm/d1/d2/../t1.txt' asFileReference " File @ /dev/shm/d1/t1.txt" >> > >> > '/dev/shm/d1/' asFileReference / 'd2/../t1.txt' " File @ >> > /dev/shm/d1/d2/../t1.txt" >> > >> > Automatic canonicalisation is problematic as the code doesn't handle >> > symbolic links in the path name: >> > >> > ('/dev/shm/d1/' asFileReference / 's3/../t2.txt') exists " true (correct >> > answer)" >> > >> > '/dev/shm/d1/s3/../t2.txt' asFileReference exists " false (wrong answer)" >> > >> > >> > 2. In an attempt to be completely general, the argument to #/ is assumed >> > to be a single directory / file. This is incorrect as the path >> > delimiter is not allowed to be part of the file name. The result is >> > that path comparisons and operations such as #parent give unexpected >> > results. >> > >> > E.g. >> > >> > ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') exists " true" >> > >> > ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') parent " File @ >> > /dev/shm/d1" >> > >> > >> > My PR modified FileSystem so that: >> > >> > 1. Canonicalisation is separated out as an operation that has to be >> > manually requested: >> > >> > '/dev/shm/d1/d2/../t1.txt' asFileReference " File @ >> > /dev/shm/d1/d2/../t1.txt" >> > >> > '/dev/shm/d1/d2/../t1.txt' asFileReference canonicalize " File @ >> > /dev/shm/d1/t1.txt" >> > >> > >> > 2. The argument to #/ can be either a single file / directory name or a >> > path. >> > >> > ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') parent " File @ >> > /dev/shm/d1/d2/d3" >> > >> > >> > >> > 3. Adds unit tests to cover the changed behaviour. >> > >> > >> > While I believe that these changes improve Pharo overall, making it more >> > intuitive and consistent, modifying the path creation to remove >> > canonicalisation is not backward compatible and requires a change to >> > PathTest>>testRelativeFromStringNormalization and >> > PathTest>>testRelativeFromStringNormalizationParent to manually >> > canonicalise the paths. >> > >> > Before I submit the patch, does anyone have any strong objections to the >> > changes as described above? >> > >> > Thanks, >> > Alistair >
