Re: [racket-dev] Symlink trouble
Tried it and it works perfectly. Thanks!

On Wed, 17 Apr 2013 16:39:22 +0200, Matthew Flatt wrote:
> Yes, I think Racket should use PWD --- if the expansion of soft links produces the same path as getcwd(), which seems to be what "/bin/pwd" does.
>
> Should Racket also set PWD (optionally, but by default) when it creates a subprocess? I think probably so.
>
> To make sure we're all on the same page:
>
> The general problem is that there can be more than one filesystem path that reaches a file. It would be great if we could normalize every path to a canonical form, but path normalization in general seems to be intractable due to the possibilities of soft links, hard links, multiple mount points, case-sensitivity choices, and probably other twists that I'm forgetting. We have therefore settled on different definitions of "same file", depending on the context.
>
> For module paths, "same file" involves only syntactic normalizations of the pathname (e.g., no checking for soft links). Various pieces of the system are carefully implemented to be consistent with syntactic normalization. For example, suppose that PLTCOLLECTS is set to "/home/mflatt/plt", but "/home/mflatt" is a symlink to "/Users/mflatt"; pathnames associated to modules that are accessed via collection will consistently use "/home/mflatt", and not somehow hop over to "/Users/mflatt". As long as a user is similarly consistent when supplying paths, it all works out.
>
> Unfortunately, `current-directory' is a place where you don't get to choose the path. You might say "/home/mflatt/plt" to get to a Racket installation, but to initialize `current-directory', the path gets turned into an inode and back to a path via getcwd() --- exactly the sort of thing that breaks a syntactic view of "same".
>
> The PWD environment variable addresses the problem with getcwd(): nice shells set PWD based on a syntactic derivation of the current directory, instead of an inode-based derivation. So, Racket should take advantage of the information that nice shells provide. Probably it should also act as a nice shell by default.
>
> (As it happens, I use "csh" on Mac OS X, and it's not nice in the above sense. That helps explain why I never got PWD vs. cwd() before.)
>
> At Wed, 17 Apr 2013 12:06:29 +0200, Tobias Hammer wrote:
>> Hi,
>>
>> I am currently implementing an application that relies heavily on Racket's great serialize functionality to exchange data between Racket processes on different computers. That worked well until I stumbled over a very confusing behavior of Racket's filesystem and module path resolution.
>>
>> I will first explain what I observed and then why it causes some trouble:
>> * relative (module) paths are resolved with something like (or (current-load-directory) (current-directory))
>> * collection paths are resolved with (find-executable-path (find-system-path 'exec-file) (find-system-path 'collects-dir)) for the system collection and with the given path for the others
>> * you can require a module both relatively and via a collection; if they resolve to the same name, there is no error
>>
>> serialize stores the module path and symbol where the deserialize function can be found. It's interesting how this module path is determined:
>> * If the file containing the deserialize identifier (implemented by hand, or the file where, e.g., serializable-struct is used) is loaded via a collection, then the serialized stream contains a collection path (determined via identifier binding and mpi magic)
>> * If this file is loaded relatively, the fallback method with current-(load)-directory is used
>>
>> Nothing special so far, but the fun starts with how current-directory is initialized. On *nix systems it uses getcwd(), but this function returns the path with all symbolic links resolved (getcwd is only a thin OS wrapper, and the OS provides nothing else). This little detail can easily break the serialization framework (and maybe other things too).
>>
>> The scenario is a file that is in a path containing a symlink and that is in the current collections, e.g.
>> /abc/symlink/more/def/file.rkt
>> and PLTCOLLECTS="/abc/symlink/more:"
>> and file.rkt contains a serializable-struct definition.
>>
>> Now one Racket process loads "file.rkt" relatively, serializes a struct instance, and sends it to another Racket process. The other process loads def/file via collection and deserializes the struct. The receiver now has a struct that is of a different type and that it can't access. This fails because the serialized data contains the absolute, symlink-free path, which differs from the path the receiver used to load file.rkt (because symlinks are not resolved for collection dirs).
>>
>> The same happens, of course, when the data is sent to another computer that has a symlink in the path to file.rkt, even if both load it the same way.
>>
>> The confusing thing is that from the user's point of view everything is consistent. His working directory and collections all point to the same location.
>>
>> It is clear that this behavior is by far not lim
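Tobias's observation that getcwd() returns a symlink-free path can be reproduced outside Racket with a short POSIX C sketch. The helper name `cwd_through_symlink` and the temporary directory layout are illustrative assumptions and do not come from Racket's runtime. The code creates `<tmp>/real/more`, adds a soft link `<tmp>/link` that points at `<tmp>/real`, changes into `<tmp>/link/more`, and records what getcwd() reports.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates <tmp>/real/more and a soft link <tmp>/link -> <tmp>/real,
   changes into <tmp>/link/more, and stores the path used in `via` and
   the path reported by getcwd() in `seen`. Returns 0 on success. */
static int cwd_through_symlink(char *via, char *seen, size_t n) {
    char tmpl[] = "/tmp/cwddemoXXXXXX";
    if (mkdtemp(tmpl) == NULL) return -1;

    /* /tmp is itself a soft link on some systems (e.g. macOS), so resolve
       the base first; then the only link inside `via` is the one made here. */
    char *base = realpath(tmpl, NULL);
    if (base == NULL) return -1;

    char real[1024], more[1024], lnk[1024];
    snprintf(real, sizeof real, "%s/real", base);
    snprintf(more, sizeof more, "%s/real/more", base);
    snprintf(lnk, sizeof lnk, "%s/link", base);
    snprintf(via, n, "%s/link/more", base);

    int rc = -1;
    if (mkdir(real, 0700) == 0 && mkdir(more, 0700) == 0 &&
        symlink(real, lnk) == 0 && chdir(via) == 0 &&
        getcwd(seen, n) != NULL)
        rc = 0;

    /* Leave the temporary tree before removing it. */
    if (chdir("/") != 0) rc = -1;
    unlink(lnk);
    rmdir(more);
    rmdir(real);
    rmdir(base);
    free(base);
    return rc;
}
```

After a successful call, `via` ends in "/link/more" but `seen` ends in "/real/more". The process's working directory is tracked as a directory, not as the path used to reach it, so only something like PWD can carry the link-preserving spelling.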
Re: [racket-dev] Symlink trouble
On Wed, 17 Apr 2013 17:25:02 +0200, Matthew Flatt wrote:
>> That matches my observations. Files accessed via collection always keep their paths 'as is'. But it is enough to start a program via racket instead of racket -l what/ever to break this.
>
> I should have mentioned that you could use `racket ' to avoid the problem, which is a workaround that I have used often.

Thanks. I didn't know that. That seems to solve it for everything relying on the module path of the initial file. Unfortunately, but understandably, current-directory is not affected.

_ Racket Developers list: http://lists.racket-lang.org/dev
Re: [racket-dev] Symlink trouble
At Wed, 17 Apr 2013 17:19:58 +0200, Tobias Hammer wrote:
> On Wed, 17 Apr 2013 16:39:22 +0200, Matthew Flatt wrote:
>> For module paths, "same file" involves only syntactic normalizations of the pathname (e.g., no checking for soft links). Various pieces of the system are carefully implemented to be consistent with syntactic normalization. For example, suppose that PLTCOLLECTS is set to "/home/mflatt/plt", but "/home/mflatt" is a symlink to "/Users/mflatt"; pathnames associated to modules that are accessed via collection will consistently use "/home/mflatt", and not somehow hop over to "/Users/mflatt". As long as a user is similarly consistent when supplying paths, it all works out.
>
> That matches my observations. Files accessed via collection always keep their paths 'as is'. But it is enough to start a program via racket instead of racket -l what/ever to break this.

I should have mentioned that you could use `racket ' to avoid the problem, which is a workaround that I have used often.

>> So, Racket should take advantage of the information that nice shells provide. Probably it should also act as a nice shell by default.
>
> What exactly do you mean by acting as a nice shell? Setting PWD for subprocesses? In that sense it should definitely be nice (by default).

Yes.

>> (As it happens, I use "csh" on Mac OS X, and it's not nice in the above sense. That helps explain why I never got PWD vs. cwd() before.)
>
> Just tried bash, csh and ksh on Linux and they all seem to set PWD. But I can't tell if that's the default or specific to my installation.

I think it may be part of the BSD legacy for Mac OS X. On my Linux installations, csh works as you describe.
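In the sense agreed on here, acting as a nice shell means exporting PWD alongside the real working directory whenever a child process is started. The following is a minimal POSIX C sketch under that assumption. `spawn_in_dir` is a hypothetical helper and is not Racket's `subprocess` implementation. In the child, it changes directory with chdir(), sets PWD to the same syntactically derived path, and then execs the program.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] (searched in PATH) with `dir` as its working directory
   and, like a "nice" shell, also exports PWD=dir so the child sees the
   syntactic path rather than only what getcwd() would report.
   Returns the child's exit status, 127 if chdir/setenv/exec fails in
   the child, or -1 if fork/waitpid fails or the child is killed. */
static int spawn_in_dir(const char *dir, char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        /* Child: make the OS-level cwd and PWD name the same directory. */
        if (chdir(dir) != 0 || setenv("PWD", dir, 1) != 0) _exit(127);
        execvp(argv[0], argv);
        _exit(127);  /* only reached if execvp failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Setting PWD only after chdir() succeeds keeps the two from disagreeing. Shells such as bash typically keep an inherited PWD when it names their actual working directory, which is what lets the link-preserving path survive across the process boundary.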
Re: [racket-dev] Symlink trouble
On Wed, 17 Apr 2013 16:39:22 +0200, Matthew Flatt wrote:
> Yes, I think Racket should use PWD --- if the expansion of soft links produces the same path as getcwd(), which seems to be what "/bin/pwd" does.

That check is even better than the one I had in mind. It should prevent any possible ambiguities.

> Should Racket also set PWD (optionally, but by default) when it creates a subprocess? I think probably so.

Yes, I think that is a sensible default. Setting only the OS cwd, as Racket currently does, would cause the same problems for any subprocess that uses this information (at least most shells and, in the future, Racket).

> To make sure we're all on the same page:
>
> The general problem is that there can be more than one filesystem path that reaches a file. It would be great if we could normalize every path to a canonical form, but path normalization in general seems to be intractable due to the possibilities of soft links, hard links, multiple mount points, case-sensitivity choices, and probably other twists that I'm forgetting. We have therefore settled on different definitions of "same file", depending on the context.

Right.

> For module paths, "same file" involves only syntactic normalizations of the pathname (e.g., no checking for soft links). Various pieces of the system are carefully implemented to be consistent with syntactic normalization. For example, suppose that PLTCOLLECTS is set to "/home/mflatt/plt", but "/home/mflatt" is a symlink to "/Users/mflatt"; pathnames associated to modules that are accessed via collection will consistently use "/home/mflatt", and not somehow hop over to "/Users/mflatt". As long as a user is similarly consistent when supplying paths, it all works out.

That matches my observations. Files accessed via collection always keep their paths 'as is'. But it is enough to start a program via racket instead of racket -l what/ever to break this. Therefore it seems to be the normal case that it fails (at least for console racket).

> Unfortunately, `current-directory' is a place where you don't get to choose the path. You might say "/home/mflatt/plt" to get to a Racket installation, but to initialize `current-directory', the path gets turned into an inode and back to a path via getcwd() --- exactly the sort of thing that breaks a syntactic view of "same".

Correct.

> The PWD environment variable addresses the problem with getcwd(): nice shells set PWD based on a syntactic derivation of the current directory, instead of an inode-based derivation. So, Racket should take advantage of the information that nice shells provide. Probably it should also act as a nice shell by default.

What exactly do you mean by acting as a nice shell? Setting PWD for subprocesses? In that sense it should definitely be nice (by default).

> (As it happens, I use "csh" on Mac OS X, and it's not nice in the above sense. That helps explain why I never got PWD vs. cwd() before.)

Just tried bash, csh and ksh on Linux and they all seem to set PWD. But I can't tell if that's the default or specific to my installation.
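A purely lexical normalizer illustrates the syntactic notion of "same file" used for module paths. The C sketch below uses the hypothetical name `syntactic_normalize` and shows the idea only; it is not Racket's own path simplification. It collapses repeated separators, drops "." elements, and lets ".." cancel the preceding element, all without consulting the filesystem.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Purely syntactic normalization of an absolute path: collapses "//",
   drops "." and lets ".." cancel the previous element (never going
   above the root). The filesystem is never consulted, so soft links
   are kept exactly as written. Returns a malloc'd string, or NULL if
   the path is not absolute or allocation fails. */
static char *syntactic_normalize(const char *path) {
    if (path[0] != '/') return NULL;
    char *out = malloc(strlen(path) + 2);
    if (out == NULL) return NULL;
    size_t o = 0;                     /* out[0..o) holds "/elem/elem..." */
    const char *p = path;
    while (*p) {
        while (*p == '/') p++;        /* skip separators */
        const char *start = p;
        while (*p && *p != '/') p++;  /* find the end of this element */
        size_t n = (size_t)(p - start);
        if (n == 0 || (n == 1 && start[0] == '.')) continue;
        if (n == 2 && start[0] == '.' && start[1] == '.') {
            while (o > 0 && out[o - 1] != '/') o--;  /* drop last element */
            if (o > 0) o--;                          /* and its separator */
            continue;
        }
        out[o++] = '/';
        memcpy(out + o, start, n);
        o += n;
    }
    if (o == 0) out[o++] = '/';       /* everything cancelled: the root */
    out[o] = '\0';
    return out;
}
```

Because the filesystem is never consulted, "/home/mflatt/../x" becomes "/home/x" even when "/home/mflatt" is a soft link to "/Users/mflatt". In that case the kernel would resolve the same string to "/Users/x". A syntactic view of "same" is therefore safe only when every party spells paths the same way, and that is exactly the consistency getcwd() breaks.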
Re: [racket-dev] Symlink trouble
Matthew Flatt wrote at 04/17/2013 10:39 AM:
> It would be great if we could normalize every path to a canonical form, but path normalization in general seems to be intractable due to the possibilities of soft links, hard links, multiple mount points, case-sensitivity choices, and probably other twists that I'm forgetting.

Agreed intractable, and agreed with your approach.

Just FYI for anyone who Googles this in the future and wants a limited but tractable, pretty-good path canonicalization for their application (at least on Unix-y filesystems): "canonicalize-path" from the following library might be good enough:

http://www.neilvandyke.org/racket-path-misc/

(I wrote this for "scan-mediafiles" in "http://www.neilvandyke.org/racket-mediafile/", and put it in a separate PLaneT package because I've needed such a procedure from time to time.)

Neil V.
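For comparison with the purely syntactic approach, a "pretty-good" Unix canonicalization can rely on POSIX realpath(), which expands every soft link, "." and ".." element. This is a swapped-in C sketch, not the implementation of Neil's `canonicalize-path` (the thread does not show that code), and `canonical_same` is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if a and b name the same path after realpath() expands
   every soft link, "." and ".." element; 0 if the expansions differ;
   -1 if either path cannot be resolved (for example, it does not
   exist). Hard links, the same file system mounted in two places, and
   case-insensitive file systems may still compare as different. */
static int canonical_same(const char *a, const char *b) {
    char *ra = realpath(a, NULL);
    char *rb = realpath(b, NULL);
    int result = (ra == NULL || rb == NULL) ? -1 : strcmp(ra, rb) == 0;
    free(ra);  /* free(NULL) is a no-op */
    free(rb);
    return result;
}
```

Matthew's list of twists predicts the limits of this approach: string equality after realpath() still misses hard links and file systems mounted at more than one point. Comparing `st_dev` and `st_ino` from stat() would catch hard links, at the cost of no longer producing a printable canonical path.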
Re: [racket-dev] Symlink trouble
Yes, I think Racket should use PWD --- if the expansion of soft links produces the same path as getcwd(), which seems to be what "/bin/pwd" does.

Should Racket also set PWD (optionally, but by default) when it creates a subprocess? I think probably so.

To make sure we're all on the same page:

The general problem is that there can be more than one filesystem path that reaches a file. It would be great if we could normalize every path to a canonical form, but path normalization in general seems to be intractable due to the possibilities of soft links, hard links, multiple mount points, case-sensitivity choices, and probably other twists that I'm forgetting. We have therefore settled on different definitions of "same file", depending on the context.

For module paths, "same file" involves only syntactic normalizations of the pathname (e.g., no checking for soft links). Various pieces of the system are carefully implemented to be consistent with syntactic normalization. For example, suppose that PLTCOLLECTS is set to "/home/mflatt/plt", but "/home/mflatt" is a symlink to "/Users/mflatt"; pathnames associated to modules that are accessed via collection will consistently use "/home/mflatt", and not somehow hop over to "/Users/mflatt". As long as a user is similarly consistent when supplying paths, it all works out.

Unfortunately, `current-directory' is a place where you don't get to choose the path. You might say "/home/mflatt/plt" to get to a Racket installation, but to initialize `current-directory', the path gets turned into an inode and back to a path via getcwd() --- exactly the sort of thing that breaks a syntactic view of "same".

The PWD environment variable addresses the problem with getcwd(): nice shells set PWD based on a syntactic derivation of the current directory, instead of an inode-based derivation. So, Racket should take advantage of the information that nice shells provide. Probably it should also act as a nice shell by default.
(As it happens, I use "csh" on Mac OS X, and it's not nice in the above sense. That helps explain why I never got PWD vs. cwd() before.)
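Matthew's proposed rule for initializing the current directory can be sketched in C: trust PWD only if expanding its soft links yields the same path that getcwd() reports. `initial_directory` is a hypothetical helper, and the fixed 4096-byte buffer is an assumption made to keep the sketch short. Racket's actual runtime code may differ.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Chooses the initial working directory as proposed in the thread:
   prefer $PWD, which a nice shell derives syntactically and which may
   still contain soft links, but only if expanding its soft links gives
   exactly the path getcwd() reports. Otherwise fall back to getcwd().
   Returns a malloc'd string, or NULL on failure. */
static char *initial_directory(void) {
    char cwd[4096];  /* assumption: long enough for this sketch */
    if (getcwd(cwd, sizeof cwd) == NULL) return NULL;

    const char *pwd = getenv("PWD");
    if (pwd != NULL && pwd[0] == '/') {
        char *expanded = realpath(pwd, NULL);  /* expands every soft link */
        int same = expanded != NULL && strcmp(expanded, cwd) == 0;
        free(expanded);
        if (same) return strdup(pwd);  /* keep the link-preserving path */
    }
    return strdup(cwd);  /* stale or missing PWD: use the inode-based path */
}
```

A stale PWD, left behind by a parent that changed directory without updating it, fails the comparison and is ignored. POSIX also specifies that `pwd -L` ignores a PWD containing "." or ".." elements; this sketch omits that extra check.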