Re: [Toybox] Add remaining pwd options
Sorry for this repeated hair-splitting.

On 01/12/13 at 11:33pm, Rob Landley wrote:
On 01/10/2013 02:25:13 PM, Felix Janda wrote:
On 01/02/13 at 12:41am, Rob Landley wrote:

What I did was disable #3 in the case where cwd doesn't exist. So the new rule #3 is:

3) If cwd exists and $PWD doesn't point to it, fall back to -P.

Thanks for the clarification. Your version of 3) depends on whether pwd is builtin or not. Do you mean something like "If getcwd() fails ..."?

cwd is what getcwd() returns. $PWD is an environment variable.

I wanted to differentiate between the current working directory and its name. Doesn't an unlinked current working directory still exist for the processes it's the cwd of?

(I was wrong that 3) depends on whether pwd is builtin or not, since child processes inherit cwd.)

BTW, in the case that one has deleted and recreated one's current working directory one could also use cd . to get to the new directory.

Good to know. (This means the shell is special casing . as well as ... I need to read the susv4 shell stuff thoroughly, it's been years...)

The susv4 page special cases . and .. a bit, but it seems to me only in the $CDPATH handling.

Ah, I see that you don't care about $CDPATH from the about page.

$CDPATH and $PWD are separate.

I just read http://landley.net/toybox/about.html:

And some things (like $CDPATH support in cd) await a good explanation of why to bother with them.

and interpreted it as a reluctance to implement $CDPATH support.

Then I think one can leave out step 5 on susv4's page on cd, and cd . is no more special than cd dir; it does a chdir to $PWD/. or $PWD/dir respectively and then updates $PWD to its canonical form (and modifies $OLDPWD also if necessary).

Um, steps 4 and 8 are the ones that say cd . and .. are special?

Step 4 means that $CDPATH shouldn't be taken into account when you do something like cd ./dir or cd ../dir. Step 8 describes the usual formal process of simplifying a path (removing dot components and so on). Of course here . and .. are treated specially, but this treatment affects only $PWD, since chdir("/some/dir/.") should do the same as chdir("/some/dir"). Step 9 looks like fun...

Another interesting situation is if your current directory /dir has been moved to /olddir and say /dir has been recreated. Then cd . will move you to the new directory whereas cd $(pwd -P) will preserve your cwd and fix up $PWD (at least for a shell behaving posixly correct).

Preserving the cwd is what I wanted to do, yes.

Imagine the same situation but with /dir not being recreated after being moved. Then cd . should fail according to susv4 since $PWD/. = /dir/., which does not exist. Would you like to have cd . behave the same as cd $(pwd) in this case? Bash does this if not in POSIX mode. Busybox ash doesn't do this and for some reason even cd $(pwd) fails.

I want the great mass of existing shell scripts to work, which means reproducing historical behavior. Posix is (mostly) a reasonable consensus documentation of historical behavior.

Ok

Felix
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net
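The purely textual simplification that the message attributes to step 8 of the susv4 cd page can be sketched roughly as follows. This is only an illustration, not toybox or shell code: the function name canon_logical is made up here, and it assumes an absolute input path and ignores symlinks entirely (which is the point of the "logical" view).

```c
#include <stddef.h>
#include <string.h>

/* Textually simplify an absolute path: drop empty and "." components and
 * let ".." remove the component before it. Never touches the filesystem.
 * Hypothetical helper; returns 0 on success, -1 if the path is not
 * absolute or the output buffer is too small. */
int canon_logical(const char *path, char *out, size_t outlen)
{
  size_t len = 0;

  if (!path || path[0] != '/' || outlen < 2) return -1;

  while (*path) {
    size_t n;

    while (*path == '/') path++;          /* skip repeated slashes */
    n = strcspn(path, "/");               /* length of this component */
    if (!n) break;

    if (n == 1 && path[0] == '.') {
      /* "." names the same directory: nothing to do */
    } else if (n == 2 && path[0] == '.' && path[1] == '.') {
      /* ".." drops the last component already written, if any */
      while (len && out[len - 1] != '/') len--;
      if (len) len--;
    } else {
      if (len + 1 + n + 1 > outlen) return -1;
      out[len++] = '/';
      memcpy(out + len, path, n);
      len += n;
    }
    path += n;
  }

  if (!len) out[len++] = '/';             /* everything cancelled out */
  out[len] = 0;
  return 0;
}
```

With this, "/dir/." simplifies to "/dir" without consulting the filesystem, which is why a shell following the standard can end up calling chdir() on a name that no longer refers to the directory the process is actually in.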
Re: [Toybox] Add remaining pwd options
On 01/13/2013 04:34:56 AM, Felix Janda wrote:

Sorry for this repeated hair-splitting.

Eh, it happens. :)

I'm constantly mucking about in areas I'm brand new to (or haven't got the background for, or last messed with so long ago I've forgotten important things, or while massively distracted and sleep deprived...), so I'm wrong a lot. I just try to fix it when I notice.

On 01/12/13 at 11:33pm, Rob Landley wrote:
On 01/10/2013 02:25:13 PM, Felix Janda wrote:
On 01/02/13 at 12:41am, Rob Landley wrote:

What I did was disable #3 in the case where cwd doesn't exist. So the new rule #3 is:

3) If cwd exists and $PWD doesn't point to it, fall back to -P.

Thanks for the clarification. Your version of 3) depends on whether pwd is builtin or not. Do you mean something like "If getcwd() fails ..."?

cwd is what getcwd() returns. $PWD is an environment variable.

I wanted to differentiate between the current working directory and its name.

The kernel has two magic symlinks as part of each process's state:

1) "." which is set by chdir() and returned by getcwd().

2) "/" which is set by chroot() and not really returned by anything because it's what other paths are explained relative _to_.

Inside the kernel, each points to a dentry, which is pinned by the reference so you don't have to worry about it going away. (It can be invalidated by deleting the dentry's attached inode, but I believe it's still around as a zombie until the reference count drops to zero. And there's a horrible magic syscall called switch_root that iterates through every process in the system and redirects the . and / links of every process from one of these to another, but that does horrible latency spike locking.)

Each dentry has a .. link, which is not a process attribute, but a dentry attribute. (Or is it inode? The fact dentries aren't really independent of an underlying inode is half the reason you can't hardlink directories.) Anyway, .. is implemented by following the dentry parent pointer, with the exception that / pointing to the current dentry is treated the same as the dentry parent pointer being NULL.

Yes, this means that if you go:

  mkdir("sub"); chroot("sub"); chdir("../../../../../../../../.."); chroot(".");

You can escape a chroot. Moving the / symlink _under_ . means the .. test won't hit it, you see. There's no >= test here, just ==.

Anyway, given a dentry the kernel can traverse up to the root (either equal to / or where the dentry parent pointer is NULL) to work out the absolute path to this dentry, and since each dentry only has one parent pointer there's only _one_ absolute path to a given dentry.

Does that help?

Doesn't an unlinked current working directory still exist for the processes it's the cwd of?

You have a pointer to a zombie dentry, the parent pointer of which is NULL. It's been unlinked from the tree but won't be garbage collected until the reference count falls to zero. I'd guess the corresponding inode has been freed and thus the inode pointer is also NULL (thus freeing up actual disk space, unlike a filehandle to an open _file_), but I'd have to go look at the kernel source to know for sure.

(I was wrong that 3) depends on whether pwd is builtin or not, since child processes inherit cwd.)

Child processes inherit environment variables too, but a child process can't change the parent's attributes.

(Ok, it could ptrace it but that's HORRIBLE and we're not doing that. Sorry, reflexive action anytime anyone, including me, says "you can't do X". There's usually a bad way to do it. I have a black belt in bad ways to do things, and a lot of experience in cleaning them up to look presentable. I do the "don't ask questions, post errors" thing to _myself_ all the time.)

BTW, in the case that one has deleted and recreated one's current working directory one could also use cd . to get to the new directory.

Good to know. (This means the shell is special casing . as well as ... I need to read the susv4 shell stuff thoroughly, it's been years...)

The susv4 page special cases . and .. a bit, but it seems to me only in the $CDPATH handling.

Ah, I see that you don't care about $CDPATH from the about page.

$CDPATH and $PWD are separate.

I just read http://landley.net/toybox/about.html:

I read http://pubs.opengroup.org/onlinepubs/9699919799/utilities/cd.html

(About the above URL: don't ask me why www.opengroup.org redirects to pubs.opengroup.org but just opengroup.org says the service is discontinued. The safe thing to do is probably just http://pubs.opengroup.org/onlinepubs/9699919799/download/ and use a file:// url on the local disk. That's what I do most of the time, and then have to dig up a public URL when I want to point somebody else at a page...)

And some things (like $CDPATH support in cd) await a good explanation of why to bother with
Re: [Toybox] Add remaining pwd options
On 12/30/12 at 05:47pm, Rob Landley wrote:
On 12/30/2012 05:16:41 AM, Felix Janda wrote:
On 12/30/12 at 04:43am, Rob Landley wrote:

POSIX contains many surprises. In the section on environment variables it says that $PWD should be set if pwd -P was specified. What happens if an error happens seems unspecified.

Sorry, this is wrong. It has been changed between SUSv3 and SUSv4. Now pwd must not change $PWD. (It would be really nice to have SUSv4 man pages...)

Translation: pwd must be a shell builtin running within the shell's process ID, and cannot sanely be implemented any other way.

It would be nice if they would just _identify_ these. I did a pass to find them (in the roadmap), but missed this.

I agree that it's sensible to have it as a builtin. I'm still not sure whether an external implementation can't be sane, though.

Let's go back to the situation of a directory /dir deleted in a subshell. What is then the path name of the current working directory of the shell? (I'd say it's undefined.) Both getcwd() and stat("/dir") fail in this situation for both the shell and external commands. Does the builtin pwd have any advantage over the external pwd in making sure that $PWD is sane?

Sigh. And the whole "PWD defaults to -P unless POSIXLY_CORRECT" thing above: while I'm sure that code is in there, it's not actually what it's doing here. Because GNU code is INSANE, and someone somewhere thought this tangle of corner cases might help somehow.

Right, in the case of a deleted directory $PWD is all we've got, so have -L (which is the default) print it but first validate it's an absolute path with no .. in it. Only validate that current directory and path directory point to the same place if there IS a current directory. If that's not what they want, -P exists.

In the corner case shouldn't pwd (-L and -P) just give an error message? ($PWD does not contain an absolute pathname of the current working directory.)

If something deletes the directory you're working in, cd .. should work if the directory above you exists. That can't happen if $PWD isn't there.

What exactly is the relation of this to the pwd command? cd .. should call chdir() with $PWD/.. after canonicalization. In contrast to pwd, cd _has_ to be builtin since a chdir() in a child process won't affect the parent shell.

Also, when a directory gets deleted and recreated I do cd $(pwd) all the time. It's useful to still have pwd if some other process takes out the directory you're in.

Ok, I see that this is handy. Alternatively one could use cd $PWD. I find that this application really contradicts POSIX since here . and $PWD are completely different directories.

Your fun corner case is still strange. From playing around a bit, bash seems to keep the PWD somewhere internally in addition to the environment variable (pwd still works after unsetting $PWD). On the other hand pwd -P seems to reset this internal state for some reason. Maybe it's a bug. dash also seems to keep some internal state, but pwd still works after pwd -P has failed.

Felix
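The remark that cd has to be a builtin because a chdir() in a child process won't affect the parent can be checked with a small sketch along these lines. The function name is invented for this illustration and the error handling is deliberately minimal; it only assumes a POSIX system with fork().

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that does chdir("/"), then compare the parent's cwd
 * before and after. Hypothetical demo helper: returns 1 if the parent's
 * cwd is unchanged, 0 if it changed, -1 if any step failed. */
int child_chdir_leaves_parent_alone(void)
{
  char before[PATH_MAX], after[PATH_MAX];
  int status;
  pid_t pid;

  if (!getcwd(before, sizeof(before))) return -1;

  pid = fork();
  if (pid < 0) return -1;
  if (!pid) _exit(chdir("/") ? 1 : 0);    /* child: change only its own cwd */

  if (waitpid(pid, &status, 0) < 0) return -1;
  if (!WIFEXITED(status) || WEXITSTATUS(status)) return -1;
  if (!getcwd(after, sizeof(after))) return -1;

  return !strcmp(before, after);
}
```

The same reasoning applies to environment variables: an external pwd gets its own copy of $PWD and cannot update the shell's copy, which is the point made elsewhere in the thread about pwd effectively being a builtin.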
Re: [Toybox] Add remaining pwd options
On 12/29/2012 07:38:24 AM, Felix Janda wrote:

POSIX says that pwd should behave the same as pwd -L.

The current pwd -P should behave the same way as the previous version of pwd. It just returns the getcwd() output.

pwd -L does just check whether the environment variable PWD is also a valid current working directory and uses that instead of the output of getcwd() if that's the case.

Here's a fun corner case:

  $ cd
  $ mkdir fruit
  $ cd fruit
  $ (cd .. && rmdir fruit)
  $ ls -l
  total 0
  $ pwd
  /home/landley/fruit
  $ pwd -L
  /home/landley/fruit
  $ pwd -P
  pwd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
  $ pwd -L
  pwd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
  $ pwd
  pwd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory

The amount of magic inherent in that behavior is kind of mind-boggling. If you can't getcwd() then it's happy printing $PWD, until you call pwd -P and that somehow invalidates $PWD? (Which means pwd is totally a shell builtin because a child process can't persistently set an environment variable in the parent process).

Sigh. And the whole "PWD defaults to -P unless POSIXLY_CORRECT" thing above: while I'm sure that code is in there, it's not actually what it's doing here. Because GNU code is INSANE, and someone somewhere thought this tangle of corner cases might help somehow.

Right, in the case of a deleted directory $PWD is all we've got, so have -L (which is the default) print it but first validate it's an absolute path with no .. in it. Only validate that current directory and path directory point to the same place if there IS a current directory. If that's not what they want, -P exists.

Rob
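The getcwd() failure behind that corner case can be reproduced outside the shell with something like the sketch below. The helper name and the /tmp scratch location are assumptions made for this example; on Linux the reported errno is expected to be ENOENT, though other systems may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

/* Create a scratch directory, chdir() into it, remove it by name, then
 * ask getcwd() where we are. Hypothetical demo helper: returns the errno
 * from getcwd() (0 if it unexpectedly succeeded), or -1 if setup failed.
 * The original cwd is restored through a saved file descriptor. */
int getcwd_errno_after_rmdir(void)
{
  char tmpl[] = "/tmp/pwd-demo-XXXXXX", buf[PATH_MAX];
  int saved = open(".", O_RDONLY), result = -1;

  if (saved < 0) return -1;

  if (mkdtemp(tmpl)) {
    if (!chdir(tmpl)) {
      /* The process still holds a reference to the directory, but it no
       * longer has a name, so there is no path for getcwd() to report. */
      if (!rmdir(tmpl)) result = getcwd(buf, sizeof(buf)) ? 0 : errno;
    } else rmdir(tmpl);
  }

  if (fchdir(saved)) result = -1;         /* go back where we started */
  close(saved);
  return result;
}
```

In this state the only remaining record of where the process "was" is whatever string the shell kept in $PWD, which is why the thread settles on printing a validated $PWD for -L when there is no current directory to compare against.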
Re: [Toybox] Add remaining pwd options
On 12/30/2012 04:47:13 AM, Felix Janda wrote:

Thanks for the various clarifications and making pwd -L check for dot and dot-dot as described in the standard.

Looking at the POSIX man page toysh should set $PWD at some point, too. Right now we have

toysh is hugely incomplete and I just got it to segfault by playing with 'cd'. After I deal with mount/umount/losetup I'm going to try to do a cleanup pass on it and actually start on environment variable support.

Alas, toysh was never nearly as finished as people seem to think it is...

Rob
Re: [Toybox] Add remaining pwd options
On 12/28/2012 03:24:17 PM, Felix Janda wrote:

Hi, the first patch adds the -L and -P options to pwd as specified by POSIX.

The test script again uses stat. This time in order to get inode numbers of directories.

For future reference adding the test in the same commit as the changes being tested is probably ok.

I've applied this patch, but am going to have to take a closer look at it in the morning. (You added a -L option which... is a NOP? Huh, what posix specifies here is kind of insane, there's no way to get the raw getcwd() output. The -L stuff is all about $PWD, and if that doesn't have a valid value it falls back to -P which does a realpath() on the data to strip symlinks...? I need to read this when I'm more awake, this standard is written for a system that stores state differently than linux. The current working directory is a process attribute used directly by the vfs, it's not an environment variable...)

I think the fix is to have -L _not_ be the default, and to have pwd return the raw getcwd() output when neither -L nor -P is specified... but that's a technical violation of posix...

Rob
Re: [Toybox] Add remaining pwd options
On 12/29/2012 07:38:24 AM, Felix Janda wrote:
On 12/29/12 at 03:53am, Rob Landley wrote:
On 12/28/2012 03:24:17 PM, Felix Janda wrote:

Hi, the first patch adds the -L and -P options to pwd as specified by POSIX.

The test script again uses stat. This time in order to get inode numbers of directories.

For future reference adding the test in the same commit as the changes being tested is probably ok.

Ok.

I've applied this patch, but am going to have to take a closer look at it in the morning. (You added a -L option which... is a NOP? Huh, what posix specifies here is kind of insane, there's no way to get the raw getcwd() output. The -L stuff is all about $PWD, and if that doesn't have a valid value it falls back to -P which does a realpath() on the data to strip symlinks...? I need to read this when I'm more awake, this standard is written for a system that stores state differently than linux. The current working directory is a process attribute used directly by the vfs, it's not an environment variable...)

I think the fix is to have -L _not_ be the default, and to have pwd return the raw getcwd() output when neither -L nor -P is specified... but that's a technical violation of posix...

POSIX says that pwd should behave the same as pwd -L.

Posix seems to believe that the PWD environment variable is where the current directory is stored, which is not how Linux works. On linux, getcwd() returns one of two process-specific vfs attributes (chdir() sets . and chroot() sets /, and neither of those is an environment variable). If you export PWD=/blah that's not the same as calling chdir.

The current pwd -P should behave the same way as the previous version of pwd. It just returns the getcwd() output.

Which is always an abspath. (I checked.)

I think what happens when you cd through a symlink is that the shell saves the path you descended into in $PWD, and then if you cd .. it chops off the last path component instead of actually dereferencing .. (which would wind up somewhere other than the directory you came from). So pwd -L is showing you the shell's view of things (using the $PWD environment variable), and pwd -P is showing you the realpath().

And what this basically means is pwd is more or less a shell builtin, the standard just isn't EXPLAINING it clearly.

pwd -L does just check whether the environment variable PWD is also a valid current working directory and uses that instead of the output of getcwd() if that's the case.

Posix goes on at some length about no . or .. in it. I added logic to do this, but haven't checked it in yet.

So according to POSIX we have:

  $ cd /tmp
  $ ln -s . a
  $ cd a
  $ export PWD=/tmp/a
  $ pwd
  /tmp/a
  $ pwd -P
  /tmp

Actually at least bash seems to update PWD automatically so that the export statement is unnecessary.

Indeed, bash updates PWD. (Unless you assign it to something else or unset it.)

It's maybe interesting to see what coreutils is doing. A fragment:

I never look at gnu source if I can avoid it. I sometimes run that stuff under strace, but mostly I just read the docs and work out tests.

  /* POSIX requires a default of -L, but most scripts expect -P. */
  bool logical = (getenv ("POSIXLY_CORRECT") != NULL);

The rule is "Anything gnu does is a bad idea", and there are about as many exceptions to that as any other rule.

I think I understand _why_ -L is doing that, and the sanity checks are so if somebody tries to futz around with pwd to point somewhere else (or in a way the shell wouldn't have set it), we discard it and give the abspath instead for security-ish reasons. But the user friendly path may have $HOME be a symlink with the abspath on /mnt/vol2 or something, and we want to default to giving the PWD the user actually remembers.

The implementation of pwd -L could also use realpath instead of stat.

Stat's easier.

Taking a further look at POSIX I think that the option string should be ">0LP[-LP]" instead of ">0LP[!LP]".

I already made that change locally. :)

Felix

Rob
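Putting the pieces of this message together, the -L decision might be sketched as below. This is not the toybox implementation: the function names are invented here, it stats "." rather than the getcwd() output, it compares st_dev as well as st_ino (the posted patch compares only st_ino), and it leaves out the deleted-directory case discussed elsewhere in the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>

/* Return 1 if path is absolute and contains no "." or ".." components,
 * which is the shape POSIX asks of $PWD. Hypothetical helper. */
int is_clean_abspath(const char *path)
{
  if (!path || *path != '/') return 0;

  while (*path) {
    size_t n;

    while (*path == '/') path++;
    n = strcspn(path, "/");
    if (n == 1 && path[0] == '.') return 0;
    if (n == 2 && path[0] == '.' && path[1] == '.') return 0;
    path += n;
  }
  return 1;
}

/* Pick what "pwd -L" would print: env_pwd if it is a clean absolute path
 * naming the same directory as ".", otherwise the physical path (for
 * example the getcwd() result). Hypothetical helper. */
const char *choose_logical_pwd(const char *env_pwd, const char *physical)
{
  struct stat here, there;

  if (is_clean_abspath(env_pwd) && !stat(".", &here)
      && !stat(env_pwd, &there) && here.st_dev == there.st_dev
      && here.st_ino == there.st_ino) return env_pwd;

  return physical;
}
```

A caller would pass getenv("PWD") and the getcwd() result; with PWD=walrus, as in the posted test script, the first check fails and the physical path is returned.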
[Toybox] Add remaining pwd options
Hi,

the first patch adds the -L and -P options to pwd as specified by POSIX.

The test script again uses stat. This time in order to get inode numbers of directories.

Felix

# HG changeset patch
# User Felix Janda <felix.ja...@posteo.de>
# Date 1356627399 -3600
# Node ID 592dab5e536c053ac8b8696f368045f76c8a30b9
# Parent  017b8fd3c9ac5a86dd849831622c4878fddebe5d
Add options -L and -P to pwd.

diff -r 017b8fd3c9ac -r 592dab5e536c toys/posix/pwd.c
--- a/toys/posix/pwd.c	Wed Dec 26 19:39:51 2012 -0600
+++ b/toys/posix/pwd.c	Thu Dec 27 17:56:39 2012 +0100
@@ -3,26 +3,34 @@
  * Copyright 2006 Rob Landley <r...@landley.net>
  *
  * See http://opengroup.org/onlinepubs/9699919799/utilities/echo.html
- *
- * TODO: add -L -P
 
-USE_PWD(NEWTOY(pwd, NULL, TOYFLAG_BIN))
+USE_PWD(NEWTOY(pwd, ">0LP[!LP]", TOYFLAG_BIN))
 
 config PWD
   bool "pwd"
   default y
   help
-    usage: pwd
+    usage: pwd [-L|-P]
 
     The print working directory command prints the current directory.
+
+    -P  Avoid all symlinks
+    -L  Use the value of the environment variable PWD if valid
+
+    The option -L is implied by default.
 */
 
+#define FOR_pwd
 #include "toys.h"
 
 void pwd_main(void)
 {
-  char *pwd = xgetcwd();
+  char *pwd = xgetcwd(), *env_pwd;
+  struct stat st[2];
 
-  xprintf("%s\n", pwd);
+  if (!(toys.optflags & FLAG_P) && (env_pwd = getenv("PWD")) &&
+    !stat(pwd, &st[0]) && !stat(env_pwd, &st[1]) &&
+    (st[0].st_ino == st[1].st_ino)) xprintf("%s\n", env_pwd);
+  else xprintf("%s\n", pwd);
   if (CFG_TOYBOX_FREE) free(pwd);
 }

# HG changeset patch
# User Felix Janda <felix.ja...@posteo.de>
# Date 1356729021 -3600
# Node ID f5b0f21ef92f73e13c3415d8449be86d9c531186
# Parent  dbf0480c88f4895724d719738c7d75ffc9f6c957
Add some tests for pwd.

diff --git a/scripts/test/pwd.test b/scripts/test/pwd.test
new file mode 100755
--- /dev/null
+++ b/scripts/test/pwd.test
@@ -0,0 +1,26 @@
+#!/bin/bash
+
+[ -f testing.sh ] && . testing.sh
+
+#testing "name" "command" "result" "infile" "stdin"
+
+#TODO: Find better tests
+
+testing "pwd" "[ $(stat -c %i $(pwd)) = $(stat -c %i .) ] && echo yes" \
+	"yes\n" "" ""
+testing "pwd -P" "[ $(stat -c %i $(pwd -P)) = $(stat -c %i .) ] && echo yes" \
+	"yes\n" "" ""
+
+
+ln -s . sym
+cd sym
+testing "pwd" "[ $(stat -c %i $(pwd)) = $(stat -c %i $PWD) ] && echo yes" \
+	"yes\n" "" ""
+testing "pwd -P" "[ $(stat -c %i $(pwd -P)) = $(stat -c %i $PWD) ] || echo yes" \
+	"yes\n" "" ""
+cd ..
+rm sym
+
+export PWD=walrus
+testing "pwd (bad PWD)" "[ $(pwd) = $(cd . ; pwd) ] && echo yes" \
+	"yes\n" "" ""