PROPOSAL: New libc function

    const char *getexecpath(void);
returns the pathname that was passed to execve(2), unmodified.

Thoughts?


DETAILS:

This will be implemented by a new ELF auxiliary vector entry
AT_EXECPATH (or maybe AT_NETBSD_EXECPATH).

Programs can use this to get files relative to exactly the access path
to the executable that the caller used.  Call it ${execpath} in the
following expressions for comparison.

This path may not be absolute.  Users who want an absolute path can do
roughly:

    $(pwd)/${execpath}

Users who want some notion of canonical absolute path can do roughly:

    $(cd $(dirname ${execpath}) && pwd)/$(basename ${execpath})

or use readlink(1), realpath(3), &c.

For programs run with fexecve(2), the answer is NULL because there is
no well-defined reliable answer.


STATUS QUO:

We have several methods to get something similar, but nothing exactly
the same, and none of them is defined independently of concurrent file
system activity:

- AT_SUN_EXECNAME is currently passed as $(pwd)/${execpath}.

  => From 2007 to 2015, AT_SUN_EXECNAME was ${execpath} if absolute
     and omitted altogether otherwise.

  => Since what I propose is currently always a suffix of
     AT_SUN_EXECNAME, we can simply add the new auxiliary vector entry
     as a pointer into the same buffer.

- /proc/self/exe is a symlink to whatever $(pwd)/${execpath} resolved
  to at exec time.

- /proc/curproc/file is the executable vnode itself (as if it were
  hard-linked there).

- sysctl {CTL_KERN, KERN_PROC_ARGS, -1, KERN_PROC_PATHNAME} gives what
  $(pwd)/${execpath} resolved to at exec time.

- For programs run with fexecve, all of these instead return what
  $(cd $(dirname ${execpath}) && pwd)/$(basename ${execpath}) resolved
  to at exec time using vnode_to_path (with the caveat that namecache
  eviction may lead this to fail altogether).
Of these methods, there is no reliable way to recover exactly the
original path that was given to exec, because a program given
/foo/bar/baz can't distinguish whether $(pwd) was /foo and ${execpath}
was bar/baz, or $(pwd) was /foo/bar and ${execpath} was baz.

In contrast, with procfs mounted, it is possible to recover the
vnode_to_path answer even for programs not run with fexecve: open
/proc/self/file and use fcntl F_GETPATH.  And, with just getexecpath(),
it is always possible to recover the $(pwd)/${execpath} currently
passed as AT_SUN_EXECNAME, by simply prepending getcwd() output (and
without adding new races, either).


OTHER SYSTEMS:

Other operating systems also have similar but slightly different
methods -- and I would guess that they can all fail to give any answer
at all in some cases of fexecve:

- FreeBSD's AT_EXECPATH is ${execpath} verbatim if it is absolute, or
  roughly what $(cd $(dirname ${execpath}) && pwd)/$(basename ${execpath})
  resolves to at _exec_ time if it is relative.

  . FreeBSD ELF AT_EXECPATH:
    https://cgit.freebsd.org/src/tree/sys/kern/imgact_elf.c?id=35a2229b67914ff1a4bae6334ad5015aa603967a#n1467
  . imgp->execpathp initialization:
    https://cgit.freebsd.org/src/tree/sys/kern/kern_exec.c?id=35a2229b67914ff1a4bae6334ad5015aa603967a#n1700
  . imgp->execpath initialization:
    https://cgit.freebsd.org/src/tree/sys/kern/kern_exec.c?id=35a2229b67914ff1a4bae6334ad5015aa603967a#n501

- FreeBSD's sysctl {CTL_KERN, KERN_PROC, KERN_PROC_PATHNAME, -1},
  /proc/self/exe, and /proc/self/file all give roughly what
  $(cd $(dirname ${execpath}) && pwd)/$(basename ${execpath}) resolves
  to at _query_ time, rather than at exec time -- specifically, the
  directory is resolved at exec time and its vnode is persistently
  stored in the struct proc, but the directory's path (the pwd part)
  is resolved at query time.  (If the directory or file has been
  deleted, something else happens.)

  I have seen applications explicitly prefer the AT_EXECPATH semantics
  (passing absolute paths through verbatim) because the sysctl and
  /proc semantics `may not return the desired path if there are
  multiple hardlinks to the file'.  Note that /proc/self/file is _not_
  a `hard link' to the executable file -- it has the same semantics as
  /proc/self/exe.

  Note that the MIB ordering is different from NetBSD.  (Yes, I've
  found this bug in pkgsrc patches that were evidently not tested!)

  . sysctl kern.proc.pathname:
    https://cgit.freebsd.org/src/tree/sys/kern/kern_proc.c?id=35a2229b67914ff1a4bae6334ad5015aa603967a#n3389
    https://cgit.freebsd.org/src/tree/sys/kern/kern_proc.c?id=35a2229b67914ff1a4bae6334ad5015aa603967a#n2325
  . /proc/*/exe:
    https://cgit.freebsd.org/src/tree/sys/fs/procfs/procfs.c?id=35a2229b67914ff1a4bae6334ad5015aa603967a#n193
    https://cgit.freebsd.org/src/tree/sys/fs/procfs/procfs.c?id=35a2229b67914ff1a4bae6334ad5015aa603967a#n74
  . proc_get_binpath:
    https://cgit.freebsd.org/src/tree/sys/kern/kern_proc.c?id=35a2229b67914ff1a4bae6334ad5015aa603967a#n2254

- I think Linux /proc/self/exe has the same semantics as FreeBSD
  /proc/self/exe, but the code is unclear and I got bored of chasing it
  or experimenting.

- Solaris's AT_SUN_EXECNAME and getexecname() are ${execpath} but with
  intermediate ./ and ../ components simplified.  Not necessarily
  absolute.  I don't think any symlinks are resolved, only intermediate
  ./ and ../ components simplified, but I'm not sure about symlinks.

  . Oracle documentation:
    https://docs.oracle.com/cd/E36784_01/html/E36874/getexecname-3c.html
  . illumos source reference:
    o lookuppn simplifies intermediate ./ and ../ components:
      https://github.com/illumos/illumos-gate/blob/d3fbc1f35b71e399da966ef9ed66f66762d4afba/usr/src/uts/common/fs/lookup.c#L504-L547
    o Resolved path is copied to args->pathname in exec:
      https://github.com/illumos/illumos-gate/blob/d3fbc1f35b71e399da966ef9ed66f66762d4afba/usr/src/uts/common/os/exec.c#L357
    o args->pathname is fed into AT_SUN_EXECNAME in exec:
      https://github.com/illumos/illumos-gate/blob/d3fbc1f35b71e399da966ef9ed66f66762d4afba/usr/src/uts/common/os/exec.c#L1750-L1765

- macOS's _NSGetExecutablePath gives something that I'm not sure is
  guaranteed to be absolute, but it is documented _not_ to resolve
  symlinks (and I'm guessing it may not resolve intermediate ./ or ../
  components either):

  https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/dyld.3.html

  I also got bored trying to chase through the code at
  <https://github.com/apple-opensource/dyld> after this route:

  . _NSGetExecutablePath:
    https://github.com/apple-opensource/dyld/blob/e3f88907bebb8421f50f0943595f6874de70ebe0/dyld3/APIs.cpp#L679-L698
  . AllImages::imagePath(const closure::Image *):
    https://github.com/apple-opensource/dyld/blob/e3f88907bebb8421f50f0943595f6874de70ebe0/dyld3/AllImages.cpp#L987-L997
  . Image::path():
    https://github.com/apple-opensource/dyld/blob/e3f88907bebb8421f50f0943595f6874de70ebe0/dyld3/Closure.cpp#L209-L215
  . pathWithHash, which looks like an ELF auxv equivalent, but I don't
    know where that gets passed in; nothing obvious turned up in a
    quick search of <https://github.com/apple-opensource/xnu>:
    https://github.com/apple-opensource/dyld/blob/e3f88907bebb8421f50f0943595f6874de70ebe0/dyld3/Closure.h#L81


POSTSCRIPT:

This came up while I was investigating why lang/racket stopped building
on NetBSD (which turned out to be because it was trying to resolve
/proc/curproc/file as if it were a symlink, and then trying to open its
own data files relative to that -- under /proc/curproc):

https://github.com/racket/racket/issues/5122

The investigation led me to file a PR (still open) for disagreement
between static executables and dynamic executables over what the main
object name should be according to dl_iterate_phdr, which led me to
find that FreeBSD's /proc/self/exe is slightly different from ours, and
so on:

PR lib/58865: static and dynamic dl_iterate_phdr disagree on main
object name
https://gnats.NetBSD.org/58865

Maybe we should also record the directory vnode of each process's
executable so a variant of the vnode_to_path logic can be made to work
without relying on the namecache, like FreeBSD does for its semantics.
But it's not clear to me that some notion of canonical absolute path is
the right thing; I think the verbatim access path used by the execve(2)
caller is more likely to be useful, easier to understand, and clearer
to define.