Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
Additionally, when GLOB_NOCHECK is in flags, glob() is not expected to call stat() and return 0 paths if the file does not exist. It is up to the application to note the increase in the count by 1 and compare the result against pattern to see whether it needs to do a stat() separately. One of the examples, or Application Usage, could make this more explicit. I would prefer to see a separate error return value for this case, e.g. GLOB_NOEXIST, as a more efficient means of testing for it, or to remove the qualification from GLOB_NOMATCH.

On Thursday, August 1, 2019 Geoff Clare wrote:

Stephane Chazelas wrote, on 01 Aug 2019:
>
> It's also not clear what the interaction between GLOB_MARK and
> GLOB_NOCHECK would be. If a pattern expands to itself because it
> can't find a match, should it still call stat on it?

Not clear? Seems crystal clear to me.

GLOB_NOCHECK: "... shall return a list consisting of only pattern".
No allowance for a slash to be appended.

GLOB_MARK: "Each pathname that is a directory that matches pattern ..."
If pattern does not match anything, then it is not "a directory that
matches pattern" even if a directory with the same name exists.

-- 
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
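A minimal sketch of the application-side check suggested at the top of this message: call glob() with GLOB_NOCHECK, and when exactly one result comes back that is identical to the pattern, do a separate lstat() to find out whether anything by that name exists. The helper name pattern_came_back_unmatched() is hypothetical, and calling lstat() on the raw pattern is a simplification that ignores backslash escapes in the pattern.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <glob.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, not part of any libc. Returns:
 *    1  glob() found nothing and handed the pattern back unchanged,
 *    0  the result names at least one existing file,
 *   -1  glob() itself failed (e.g. GLOB_NOSPACE).
 */
int pattern_came_back_unmatched(const char *pattern)
{
    glob_t g;
    struct stat st;
    int unmatched = 0;

    assert(pattern != NULL);
    memset(&g, 0, sizeof g);

    if (glob(pattern, GLOB_NOCHECK, NULL, &g) != 0) {
        globfree(&g);
        return -1;
    }

    /*
     * One result equal to the pattern is ambiguous: it is either a real
     * file with that literal name or the GLOB_NOCHECK fallback. Only a
     * separate lstat() can tell the two apart.
     */
    if (g.gl_pathc == 1 && strcmp(g.gl_pathv[0], pattern) == 0)
        unmatched = (lstat(pattern, &st) != 0);

    globfree(&g);
    return unmatched;
}
```

A dedicated return value such as the proposed GLOB_NOEXIST would make the extra lstat() unnecessary; no such value exists in the current standard.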
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
2019-08-01 15:51:10 +0100, Geoff Clare: > Stephane Chazelas wrote, on 01 Aug 2019: > > > > It's also not clear what the interaction between GLOB_MARK and > > GLOB_NOCHECK would be. If a pattern expands to itself because it > > can't find a match, should it still call stat on it? > > Not clear? Seems crystal clear to me. > > GLOB_NOCHECK: "... shall return a list consisting of only pattern". > No allowance for a slash to be appended. > > GLOB_MARK: "Each pathname that is a directory that matches pattern ..." > If pattern does not match anything, then it is not "a directory that > matches pattern" even if a directory with the same name exists. [...] Makes sense, though in my example the pathname *did* match the pattern. Only that unreadable/* file was not found by glob() as the unreadable directory wasn't readable. It was searchable though which meant a stat on the "unreadable/*" directory succeeded. In any case, those implementations will happily add a / even on directories that don't match the pattern like in unreadable/[*] or unreadable/\*, so those implementations are not compliant. Related question, also related to the backslash issue (bugid:1234). In: glob("\\foo") Should glob look for foo in the current directory via a listing of it or via lstat("foo")? Looking at the glibc implementation, I see that glob("foo") does a lstat("foo") without NOCHECK and nothing at all with NOCHECK. While for glob("\\foo"), it searches for it in the directory listing (both with and without NOCHECK). The MARK is added based on a stat() done on the result of the expansion (with NOCHECK, either foo or \foo). IOW, it behaves the same as for glob("[f]oo"). -- Stephane
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
Stephane Chazelas wrote, on 01 Aug 2019: > > It's also not clear what the interaction between GLOB_MARK and > GLOB_NOCHECK would be. If a pattern expands to itself because it > can't find a match, should it still call stat on it? Not clear? Seems crystal clear to me. GLOB_NOCHECK: "... shall return a list consisting of only pattern". No allowance for a slash to be appended. GLOB_MARK: "Each pathname that is a directory that matches pattern ..." If pattern does not match anything, then it is not "a directory that matches pattern" even if a directory with the same name exists. -- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
2019-07-27 10:49:39 +, Austin Group Bug Tracker:
[...]
> > If, during the search, a directory is encountered that cannot
> > be opened or read and errfunc is not a null pointer, glob()
> > calls (*errfunc()) with two arguments.
> [...]
> > 2. The eerrno argument is the value of errno from the
> > failure, as set by opendir(), readdir(), or stat().
> > (Other values may be used to report other errors not
> > explicitly documented for those functions.)
>
> (Note: does that mean glob() has to call those 3 functions (as
> opposed to open(O_DIRECTORY)/getdents() or any other API)? Why
> stat(), shouldn't that be lstat()?)
[...]

I'm only realising now (after reading the musl mailing-list thread) that the stat() above may be referring to GLOB_MARK, which tells glob() to append "/" to directories. For that, glob() would need to do a stat(), which all implementations seem to be doing (but none seem to report errors when that one fails).

It's also not clear what the interaction between GLOB_MARK and GLOB_NOCHECK would be. If a pattern expands to itself because it can't find a match, should it still call stat on it?
glibc and diet seem to, musl doesn't:

    $ mkdir -p 'unreadable/*' unreadable/dir
    $ chmod 111 unreadable
    $ ~/glob-and-mark-glibc 'unreadable/*'
    unreadable: Permission denied
    ret=0 count=1
    - unreadable/*/
    $ ~/glob-and-mark-diet 'unreadable/*'
    unreadable: Permission denied
    ret=0 count=1
    - unreadable/*/
    $ ~/glob-and-mark-musl 'unreadable/*'
    unreadable/: Permission denied
    ret=0 count=1
    - unreadable/*

(btw, dietlibc's glob() implementation is very buggy, probably not worth considering here)

Where glob-and-mark.c is:

    #include <glob.h>
    #include <stdio.h>
    #include <string.h>

    /* Report each path glob() failed on, then let the search continue. */
    int errfunc(const char *epath, int eerrno)
    {
      printf("%s: %s\n", epath, strerror(eerrno));
      return 0;
    }

    int main(int argc, char* argv[])
    {
      int r;
      size_t i;
      glob_t globbuf;

      if (argc < 2)
        return 2;
      r = glob(argv[1], GLOB_MARK|GLOB_NOCHECK, errfunc, &globbuf);
      /* gl_pathc is a size_t, hence %zu */
      printf("ret=%d count=%zu\n", r, globbuf.gl_pathc);
      if (!r) {
        for (i = 0; i < globbuf.gl_pathc; i++)
          printf("- %s\n", globbuf.gl_pathv[i]);
      }
      return 0;
    }

-- 
Stephane
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
Stephane Chazelas wrote, on 30 Jul 2019: > > 2019-07-30 15:31:13 +0100, Geoff Clare: > [...] > > It's not invention because the standard already requires it. (It also > > requires EACCES, ELOOP, ENAMETOOLONG, ENOENT, ENOTDIR to be treated as > > errors. The question is which ones the standard should be changed to > > say are ignored.) > [...] > > By that reading, you could also say that > > test -f /etc/passwd/file > > should report an error because of the ENOTDIR error returned by > stat(). How is it different? It's different because the purpose of test -f is to test whether the file exists (and is a regular file). Thus an exit status of 1 is the result of a successful execution which indicates that the file does not exist. The ENOTDIR does not prevent test -f from performing the task of determining whether the file exists (it is an indication that the file does not exist), and so should not be treated as an error. The purpose of pathname expansion in the shell is to replace a pattern with a list of pathnames that match that pattern. If an error occurs which prevents the shell from performing that task, then the shell should treat it as an error, except for cases that the standard explicitly says should be ignored (which is what we're proposing to add to fix the problem). -- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
2019-07-30 15:31:13 +0100, Geoff Clare:
[...]
> It's not invention because the standard already requires it. (It also
> requires EACCES, ELOOP, ENAMETOOLONG, ENOENT, ENOTDIR to be treated as
> errors. The question is which ones the standard should be changed to
> say are ignored.)
[...]

By that reading, you could also say that

    test -f /etc/passwd/file

should report an error because of the ENOTDIR error returned by stat(). How is it different?

Surely the "errors" utilities are meant to report are those that they *consider* an error, not every error returned by any of the syscalls their implementation makes.

IMO, it's a bit far-fetched to see the spec as requiring sh to fail upon an ENOENT error from lstat() here (that would mean the */file expansion could only succeed if all the non-hidden files in the current directory were searchable directories and contained a "file" entry), though it wouldn't harm to make it more explicit.

-- 
Stephane
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
Robert Elz wrote, on 30 Jul 2019: > > | I strongly disagree that EMFILE and ENFILE should be ignored in the > | shell. That leads to the execution of commands with an unchanged > | pattern when there are matching files it should have used instead. > > Same thing for all the other errors. Further your desire here seems > to be invention - which is not something that should be happening. > Which current shells actually abort glob expansions when they get EMFILE > when opening a directory to read, or similar (which is the easy one to test) ? So far nobody has identified a shell that does it. It's not invention because the standard already requires it. (It also requires EACCES, ELOOP, ENAMETOOLONG, ENOENT, ENOTDIR to be treated as errors. The question is which ones the standard should be changed to say are ignored.) > | The command might succeed (operating on the wrong file) and the user > | would not be alerted to the problem. Particularly bad if it was an > | rm command. > > Yes, all that might happen - so it might if there's an EACCESS or EPERM > error, consider > > rm [abc]*/*.c > > where there's directories with names like "a102" "cxyz" "bletch" ... > which contain various *.c files that we want to remove. > > Now consider what happens when the (or a) previous command to that one was > > chmod a-rwx [abc]* > > all of the attempts to opem a102 cxyz bletch (etc) now fail, and > the pattern is not expanded, and rm gets given the pattern as the > file name to remove, and proceeds to delete my (very precious) file > '[abc]*/*.c' (which is a file in a directory with a name that doesn't > start with a b or c, so the chmod command did not protect it. > > "Particularly bad" > > Still that is what happens, and and what's more, is really what everone > expects will happen, and generally wants to happen (except those who want > some kind of "nomatch" error behaviour, which does not include me). 
While that's true, the difference is that this behaviour is consistent given those specific file system contents, whereas EMFILE and ENFILE might or might not happen at different times. There are also good reasons to want EACCES not to be treated as an error and this case is then collateral damage that we accept as being worth suffering; but there is no reason not to treat EMFILE and ENFILE as an error. > It isn't really all that bad in practice, as people rarely name files > with names that look like patterns, except from the occasional > accidental 'cat foo* > bar*' type error, and when they have done that > the problem tends to be more "how do I get rid of just that file" rather > that "why did that file get deleted". > > The (much less likely, except for users attempting to shoot themselves > in the foot, deliberately) EMFILE and ENFILE cases are not different at > all. However, they are easily preventable, so why not do that? -- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
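A minimal sketch of the position argued above, expressed as a glob() errfunc: descriptor exhaustion (EMFILE, ENFILE) stops the search, everything else is let through. The function name abort_on_fd_exhaustion() is hypothetical, and whether a given implementation actually calls errfunc for these errors is not something this sketch can guarantee.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical errfunc for glob(). A non-zero return makes glob() stop
 * the scan and return GLOB_ABORTED; zero lets the search continue.
 */
int abort_on_fd_exhaustion(const char *epath, int eerrno)
{
    assert(epath != NULL);

    if (eerrno == EMFILE || eerrno == ENFILE) {
        /* Transient and unrelated to file system contents: report it. */
        fprintf(stderr, "%s: %s\n", epath, strerror(eerrno));
        return 1;
    }

    /* EACCES and friends: consistent for given file system contents,
     * so the search carries on. */
    return 0;
}
```

It would be passed as the third argument of glob(), without GLOB_ERR in flags, so that only the selected errors abort the expansion.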
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
2019-07-30 14:29:23 +0100, Geoff Clare:
[...]
> > Do you have a better suggestion?
>
> Unless one of the implementations changes to do something better
> before we get too far into work on Issue 8, I think the only
> choices we have are the Solaris behaviour, the GNU/BSD behaviour,
> the GNU/BSD "done right" (ELOOP/ENAMETOOLONG/ENOENT/ENOTDIR all
> treated the same), or allow some or all of these behaviours.
[...]

Great, thanks. I think we concur. My vote, as already stated, would be:

- ENOTDIR errors upon opendir() shall be ignored
- ENOENT/ENAMETOOLONG/ELOOP may be ignored.

That is, the Solaris behaviour (also that of other old BSDs and newer musl) would not be allowed, as */*.c returning an ENOTDIR error is definitely a bug IMO; the GNU/FreeBSD behaviour would be allowed.

Do we want to allow lstat() errors (other than ENOENT/ENOTDIR) to be reported? (I changed my mind on that and now think it would not be that confusing.)

I've now tested musl 1.1.21 and diet 0.34 on Linux, which are actually quite different from the GNU/Solaris/FreeBSD implementations mentioned above.

For musl, first, it seems that in */*.c, it actually uses the entry-type information returned by readdir() and doesn't call opendir() on entries that are neither directory nor symlink. It still returns an error upon ENOTDIR if there are symlinks to regular files in the current directory or if called with regfile/*.c.

It calls stat() instead of lstat() for the */file glob (again, skipping the non-dir-non-symlink files), so would not expand a dir/file broken symlink, and reports the stat() errors other than ENOENT (including ENOTDIR in */file when the current directory contains a symlink to a regular file).

dietlibc seems to behave quite differently as well. In */*.c, it does a stat() on each file in the current directory to determine which are directories (and if there's no matching one, calls opendir("*"), most likely causing an ENOENT error), so won't report ENOTDIR errors there (except in race condition cases).

In */file, it doesn't use stat/lstat but reads the content of the directories looking for a "file" entry (so fails on unreadable dirs instead of unsearchable ones).

In any case, it ignores the ENOTDIR errors on opendir(), even in the regfile/*.c case.

-- 
Stephane
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
Date: Tue, 30 Jul 2019 14:39:54 +0100
From: Geoff Clare
Message-ID: <20190730133954.GB27152@lt2.masqnet>

  | This thread is specifically about the glob() function in XSH.

Yes, I know, see my previous reply to Stephane. If I replied to a message in the wrong thread, when I should have used the other one, apologies - I wasn't paying all that much attention. It just posed a question I thought worthy of a reply (but apparently really in the context of the other bug report.)

  | I strongly disagree that EMFILE and ENFILE should be ignored in the
  | shell. That leads to the execution of commands with an unchanged
  | pattern when there are matching files it should have used instead.

Same thing for all the other errors. Further your desire here seems to be invention - which is not something that should be happening. Which current shells actually abort glob expansions when they get EMFILE when opening a directory to read, or similar (which is the easy one to test)?

  | The command might succeed (operating on the wrong file) and the user
  | would not be alerted to the problem. Particularly bad if it was an
  | rm command.

Yes, all that might happen - so it might if there's an EACCES or EPERM error, consider

    rm [abc]*/*.c

where there are directories with names like "a102" "cxyz" "bletch" ... which contain various *.c files that we want to remove.

Now consider what happens when the (or a) previous command to that one was

    chmod a-rwx [abc]*

All of the attempts to open a102 cxyz bletch (etc) now fail, and the pattern is not expanded, and rm gets given the pattern as the file name to remove, and proceeds to delete my (very precious) file '[abc]*/*.c' (which is a file in a directory with a name that doesn't start with a, b or c, so the chmod command did not protect it).

"Particularly bad"

Still that is what happens, and what's more, is really what everyone expects will happen, and generally wants to happen (except those who want some kind of "nomatch" error behaviour, which does not include me).

It isn't really all that bad in practice, as people rarely name files with names that look like patterns, except from the occasional accidental 'cat foo* > bar*' type error, and when they have done that the problem tends to be more "how do I get rid of just that file" rather than "why did that file get deleted".

The (much less likely, except for users attempting to shoot themselves in the foot, deliberately) EMFILE and ENFILE cases are not different at all.

kre
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
Date:Tue, 30 Jul 2019 14:35:47 +0100 From:Stephane Chazelas Message-ID: <20190730133547.mcqvnbaz3wmms...@chaz.gmail.com> | While I generally agree for bugid:1275 -- for shell globs or | glob() without GLOB_ERR As I said (and also in response to the first part of Geoff's subsequent message) for the case you're concerned with I don't much care (might be a good idea just to delete the GLOB_ERR flag though, and avoid the issue - any implementation that finds a need for it can then just implement it as an extension, and can make it handle whatever errors they need it to handle.) | implementations to report some errors like ENFILE/EFAULT/ENOMEM; | not that any does ATM), ash based shells will certainly report ENOMEM errors, if they occur while saving generated pathnames (which, even though it is really hard to generate on any modern system, is the likely case for ENOMEM to be seen). On the other hand if ENOMEM comes from opendir() not being able to allocate buffer space to manage the read from the directory, it will be ignored just the same as if opendir() returns failure because of ENOENT. | it's a different matter for glob() with | the GLOB_ERR flag in bugid:1273 discussed in this thread which | is explicitly meant to report errors You may have noticed that I haven't been paying that much attention around here recently (busy with other things) but I was actually aware of the distinction, I just don't care about that part of it, as ... | (and before you ask, no, I | don't know of any application that uses that API. Does anyone?). Not me. kre
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
Robert Elz wrote, on 30 Jul 2019: > > My recommendation would be to forbid glob (at least in the shell, > I don't care so much about whatever other uses there are) This thread is specifically about the glob() function in XSH. The similar issues with pathname expansion in the shell are covered by bug 1275. > from ever > returning any error status or issuing any error messages from operations > related to its path search -- errors from shell memory management - exhausing > available mem because the list of found paths contains too manny and they're > too long, or similar problems, is a different issue .. similarly if the > shell needs to fork because of the way its glob code is implemented, and > the fork fails, or in such a case if the shell needs a pipe to, or shared > memory with, its child to receive the results, and that fails to be > stablished. > > Glob isn't the right tool (it isn't a tool at all) to find file system > problems, whether they're problems we generally want glob to hide > (ENOTDIR because the first '*' in */*.c matched a regular file, or > ENOTDIR which might be because a directory inode got corrupted and is > now appearing to the filesystem as if it were a regular file, or a fifo > or something) EIO, errors about symlink loops, or absurldly long pathnames > or pathname components, or anything else (including EMFILE and ENFILE). I strongly disagree that EMFILE and ENFILE should be ignored in the shell. That leads to the execution of commands with an unchanged pattern when there are matching files it should have used instead. > If glob fails to match files that the user thought should be matched, then > the user needs to investigate The command might succeed (operating on the wrong file) and the user would not be alerted to the problem. Particularly bad if it was an rm command. -- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
2019-07-30 20:18:43 +0700, Robert Elz: [...] > If a sys call executed by glob while searching fails, then it should treat > that exactly the same as ENOENT (the thing simply doesn't exist for glob > purposes) and continue with whatever is next. [...] While I generally agree for bugid:1275 -- for shell globs or glob() without GLOB_ERR (though I wouldn't object to allowing implementations to report some errors like ENFILE/EFAULT/ENOMEM; not that any does ATM), it's a different matter for glob() with the GLOB_ERR flag in bugid:1273 discussed in this thread which is explicitly meant to report errors (and before you ask, no, I don't know of any application that uses that API. Does anyone?). -- Stephane
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
Stephane Chazelas wrote, on 30 Jul 2019: > > 2019-07-30 11:27:15 +0100, Geoff Clare: > [...] > > > For ENOENT, that can be seen as a pathological case worth > > > reporting as well, especially in the */*.c case where the > > > current directory contains broken symlinks. > > > > That's inconsistent with your position on ENOTDIR. > > > > If regfile exists then you claim regfile/*.c isn't going to produce any > > matches, so it should be ignored. Likewise if brokensymlink exists > > then brokensymlink/*.c isn't goint to produce any matches so to be > > consistent you should also want that to be ignored. > [...] > > But in long/path/with/spaghetty/symlinks/*/*.c, the fact that an > extra symlink brings you over the limit (of number of links for > ELOOP or of path length for ENAMETOOLONG) prevents you from > listing that directory for a reason that is worth reporting IMO. > > While there's no doubt in my mind that asking glob() to report > ENOTDIR errors in */*.c is wrong. Your argument above for ELOOP and ENAMETOOLONG applies equally well to ENOTDIR: long/path/with/regfile/in/the/middle/*/*.c The only consistent way to treat them is for all of the errors related to file system content to be ignored or for all of them to be errors. > That would be like asking that > ls -LR or find -L report them as well (ls -LR reports a ENOTDIR > error when a non-directory/file is passed as argument or is > found in the target of a symlink, but obviously not for the > non-directory files it finds by reading a directory, maybe that > can be adapted for glob()). Perhaps it could, but it would be invention - no current implementation does anything like that. > I don't think there's an ideal way to deal with it. That > interface is already broken/misdesigned in that it reports the > EACCESS errors in */*.c and not */file. 
Not reporting the > ENOTDIR error is definitely an improvement, at least in the case > of opening a directory that results from wildcard expansion (one > could argue glob() shouldn't try to open it in the first place), > not sure about ENOENT/ELOOP/ENAMETOOLONG. > > Do you have a better suggestion? Unless one of the implementations changes to do something better before we get too far into work on Issue 8, I think the only choices we have are the Solaris behaviour, the GNU/BSD behaviour, the GNU/BSD "done right" (ELOOP/ENAMETOOLONG/ENOENT/ENOTDIR all treated the same), or allow some or all of these behaviours. -- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
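The candidate behaviours listed above can be written down as a small table. This is a sketch only: the enum and function names are hypothetical, and the per-implementation entries follow what is reported elsewhere in this thread (Solaris reports ENOTDIR, GNU ignores ENOTDIR only, FreeBSD ignores ENOENT and ENOTDIR), not an independent survey of the libraries.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names for the behaviours discussed in the thread. */
enum glob_policy {
    POLICY_SOLARIS,     /* every opendir() failure is reported        */
    POLICY_GNU,         /* ENOTDIR ignored                            */
    POLICY_FREEBSD,     /* ENOTDIR and ENOENT ignored                 */
    POLICY_DONE_RIGHT   /* all four path-content errors treated alike */
};

/* Returns 1 if the given policy silently ignores errno value e. */
int policy_ignores(enum glob_policy p, int e)
{
    assert(e > 0);  /* POSIX errno values are positive */

    switch (p) {
    case POLICY_SOLARIS:
        return 0;
    case POLICY_GNU:
        return e == ENOTDIR;
    case POLICY_FREEBSD:
        return e == ENOTDIR || e == ENOENT;
    case POLICY_DONE_RIGHT:
        return e == ENOTDIR || e == ENOENT ||
               e == ELOOP || e == ENAMETOOLONG;
    }
    return 0;
}
```

Laid out this way, the consistency argument is visible: only the first and last rows treat ELOOP, ENAMETOOLONG, ENOENT and ENOTDIR uniformly.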
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
Date: Tue, 30 Jul 2019 12:43:54 +0100
From: Stephane Chazelas
Message-ID: <20190730114354.72qqdwckkidsd...@chaz.gmail.com>

  | Do you have a better suggestion?

My recommendation would be to forbid glob (at least in the shell, I don't care so much about whatever other uses there are) from ever returning any error status or issuing any error messages from operations related to its path search. Errors from shell memory management (exhausting available memory because the list of found paths contains too many and they're too long, or similar problems) are a different issue; similarly if the shell needs to fork because of the way its glob code is implemented, and the fork fails, or in such a case if the shell needs a pipe to, or shared memory with, its child to receive the results, and that fails to be established.

Glob isn't the right tool (it isn't a tool at all) to find file system problems, whether they're problems we generally want glob to hide (ENOTDIR because the first '*' in */*.c matched a regular file, or ENOTDIR which might be because a directory inode got corrupted and is now appearing to the filesystem as if it were a regular file, or a fifo or something), EIO, errors about symlink loops, or absurdly long pathnames or pathname components, or anything else (including EMFILE and ENFILE).

If a sys call executed by glob while searching fails, then it should treat that exactly the same as ENOENT (the thing simply doesn't exist for glob purposes) and continue with whatever is next.

If glob fails to match files that the user thought should be matched, then the user needs to investigate - whether the cause eventually is determined to be incorrect permissions somewhere, a typo in the script or an arg given to the script (incorrect name), over long pathnames, symlink loops, or bad blocks on the drive; other tools will find that (ls, find, even cat or cp).

There's no real advantage having glob trying to deal with all these cases, nor in attempting to divide up the errors between the "good" ones and the "bad" ones - there's no way to do that that will satisfy everyone, as this dispute over ELOOP (etc) shows.

What's more, frankly, it is ludicrous to claim that a script should abort itself when one of these (quite rare) errors occurs, because it might do the wrong thing, while at the same time proclaiming that it has to continue for one of the much more common errors (like bad permission settings causing EACCES or a typing mistake generating ENOENT or ENOTDIR). If the script is going to do the wrong thing in one case, it will do the same wrong thing in the other case as well.

kre
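A rough sketch of what this recommendation looks like from the calling side, using the existing glob() interface: GLOB_ERR is masked out and no errfunc is supplied, so search failures are simply treated as "nothing there", while failures unrelated to the search (such as GLOB_NOSPACE) are still returned. The wrapper name glob_quiet() and its count parameter are hypothetical, and the sketch does not account for GLOB_APPEND or GLOB_DOOFFS.

```c
#include <assert.h>
#include <glob.h>
#include <stddef.h>

/*
 * Hypothetical wrapper. Returns 0 when the search completed (including
 * the "no match" case, with *count set to 0) and -1 for failures that
 * are not search errors, e.g. GLOB_NOSPACE.
 */
int glob_quiet(const char *pattern, int flags, glob_t *out, size_t *count)
{
    int r;

    assert(pattern != NULL && out != NULL && count != NULL);

    /* No GLOB_ERR and no errfunc: unreadable or missing directories
     * never stop the scan and are never reported. */
    r = glob(pattern, flags & ~GLOB_ERR, NULL, out);

    if (r == 0) {
        *count = out->gl_pathc;
        return 0;
    }

    *count = 0;
    return (r == GLOB_NOMATCH) ? 0 : -1;
}
```

The caller would still call globfree() on the glob_t afterwards, as with a direct glob() call.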
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
2019-07-30 11:27:15 +0100, Geoff Clare:
[...]
> > For ENOENT, that can be seen as a pathological case worth
> > reporting as well, especially in the */*.c case where the
> > current directory contains broken symlinks.
>
> That's inconsistent with your position on ENOTDIR.
>
> If regfile exists then you claim regfile/*.c isn't going to produce any
> matches, so it should be ignored. Likewise if brokensymlink exists
> then brokensymlink/*.c isn't goint to produce any matches so to be
> consistent you should also want that to be ignored.
[...]

But in long/path/with/spaghetty/symlinks/*/*.c, the fact that an extra symlink brings you over the limit (of number of links for ELOOP or of path length for ENAMETOOLONG) prevents you from listing that directory for a reason that is worth reporting IMO.

Meanwhile, there's no doubt in my mind that asking glob() to report ENOTDIR errors in */*.c is wrong. That would be like asking that ls -LR or find -L report them as well (ls -LR reports an ENOTDIR error when a non-directory file is passed as argument or is found in the target of a symlink, but obviously not for the non-directory files it finds by reading a directory; maybe that can be adapted for glob()).

I don't think there's an ideal way to deal with it. That interface is already broken/misdesigned in that it reports the EACCES errors in */*.c and not */file. Not reporting the ENOTDIR error is definitely an improvement, at least in the case of opening a directory that results from wildcard expansion (one could argue glob() shouldn't try to open it in the first place); not sure about ENOENT/ELOOP/ENAMETOOLONG.

Do you have a better suggestion?

-- 
Stephane
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
2019-07-30 10:48:49 +0100, Geoff Clare: [...] > The odd thing about all these implementations that ignore ENOTDIR and ENOENT > (or don't but think they should), is that they are not following either of > the possible interpretations of the current text. > > If they want to interpret it literally and only report an error when they > encounter an existing directory that they can't open, then they should not > just ignore ENOTDIR and ENOENT from opendir(), they should also ignore > ELOOP and ENAMETOOLONG. [...] Note that there's only one implementation (that I found) that ignores ENOENT: FreeBSD (also found on macOS). ENOTDIR was added to glibc/gnulib/uclibc/dietlibc because of */*.c returning an error on non-directory files in the current directory which is a common, normal case where we don't want an error to be reported. While ELOOP and ENAMETOOLONG are pathological case which as you said in a related discussion could be worth reporting. For ENOENT, that can be seen as a pathological case worth reporting as well, especially in the */*.c case where the current directory contains broken symlinks. That's why in my proposed resolution, I left it open whether to specify the GNU or FreeBSD behaviour or allow both. We could make it: - ENOTDIR errors upon opendir() shall be ignored - ENOENT/ENAMETOOLONG/ELOOP may be ignored. Or we could allow all existing implementations and replace that "shall" with a "should" or "may". -- Stephane
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
Stephane Chazelas wrote, on 29 Jul 2019: > > 2019-07-29 12:12:28 +0100, Geoff Clare: > [...] > > > in */*.c, Solaris returns with an error if the current directory > > > contains a non-directory file (and calls errfunc() with ENOTDIR > > > and that file), which is not wanted. > > > > True, but there's no way round that because GLOB_ERR can't distinguish > > these cases. It's "all or nothing". > > > > > IMO, GLOB_ERR should be about failure to expand the glob. > > > The ENOTDIR error when expanding /etc/passwd/*.c is not > > > preventing the glob from expanding (to nothing). If passwd was a > > > symlink to some inaccessible area, then it would. > > > > To me the point of having GLOB_ERR and errfunc as two different > > error reporting mechanisms is that GLOB_ERR is "all or nothing" > > and errfunc lets you be more selective. You said yourself in the bug > > that the Solaris behaviour is "more flexible in that the caller can > > use a errfunc that ignores ENOENT/ENOTDIR to emulate the GNU/FreeBSD > > behaviour". > [...] > > Yes, but to me that sounds more like the Solaris behaviour is > bogus and there's a way to work around it. > > From https://reviews.freebsd.org/rS304284 > https://reviews.freebsd.org/rS304284#C38376190OL661 > FreeBSD implementated that ignoring of ENOENT/ENOTDIR for POSIX > compliance in 2016. > > For the ENOTDIR ignoring in GNU libc, that was in 1999 following > a bug report (libc/1032 which I coudn't find). See > https://sourceware.org/git/?p=glibc.git;a=commit;h=647361287ddb2d52ffe9dbbfe2bd27ed76dc2c56 > > NetBSD has this comment in the code: > > /* > * Posix/XOpen: glob should return when it encounters a > * directory that it cannot open or read > * XXX: Should we ignore ENOTDIR and ENOENT though? > * I think that Posix had in mind EPERM... > */ > > (ITTM EACCESS). 
The odd thing about all these implementations that ignore ENOTDIR and ENOENT (or don't but think they should), is that they are not following either of the possible interpretations of the current text. If they want to interpret it literally and only report an error when they encounter an existing directory that they can't open, then they should not just ignore ENOTDIR and ENOENT from opendir(), they should also ignore ELOOP and ENAMETOOLONG. -- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
2019-07-29 13:13:03 +0100, Stephane Chazelas: [...] > For the ENOTDIR ignoring in GNU libc, that was in 1999 following > a bug report (libc/1032 which I coudn't find). See > https://sourceware.org/git/?p=glibc.git;a=commit;h=647361287ddb2d52ffe9dbbfe2bd27ed76dc2c56 [...] The bug report can be seen at https://sourceware.org/ml/libc-alpha/1999-q1/msg00498.html Somebody noted that Solaris 7 had the same problem, but it was fixed nonetheless https://sourceware.org/ml/libc-alpha/1999-05/msg4.html -- Stephane
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
2019-07-29 13:13:03 +0100, Stephane Chazelas: [...] > NetBSD has this comment in the code: > > /* > * Posix/XOpen: glob should return when it encounters a > * directory that it cannot open or read > * XXX: Should we ignore ENOTDIR and ENOENT though? > * I think that Posix had in mind EPERM... > */ [...] OpenBSD has: /* TODO: don't call for ENOENT or ENOTDIR? */ the same as in FreeBSD before the 2016 fix. It's the same comment that could be found in the BSD code in 1990, when the glob() function was added. It can be found in tcsh, nvi, sudo and perl code as well, and in the opensolaris/illumos glob(). Most likely that TODO is still in the Solaris code. glob(3) is a POSIX invention, isn't it? I couldn't find it in SVR4. I wonder how other SYSV-derived OSes (those that, unlike Solaris, have no BSD heritage) behave. uClibc, musl and dietlibc behave like GNU (ignore ENOTDIR, not ENOENT) AFAICT from reading the code. musl seems to do some extra processing on EACCES; I've not looked much further into it. -- Stephane
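For anyone who would rather observe a libc's behaviour than infer it from the source, a minimal probe along these lines may help. It is only a sketch: probe_errfunc, probe_glob and rc_name are made-up names, and the only library interfaces assumed are glob(), globfree() and the GLOB_* constants from <glob.h>. The callback always returns 0, so whether the expansion is abandoned depends on GLOB_ERR alone.

```c
#include <errno.h>
#include <glob.h>
#include <stdio.h>
#include <string.h>

static int probe_calls;   /* how many times glob() invoked the callback */

/* Hypothetical callback: log the failure and ask glob() to carry on, so
 * that only GLOB_ERR decides whether the expansion is abandoned. */
static int probe_errfunc(const char *epath, int eerrno)
{
    probe_calls++;
    fprintf(stderr, "  errfunc(\"%s\", %s)\n", epath, strerror(eerrno));
    return 0;
}

/* Map glob()'s return value to a printable name. */
static const char *rc_name(int rc)
{
    switch (rc) {
    case 0:            return "0";
    case GLOB_NOMATCH: return "GLOB_NOMATCH";
    case GLOB_ABORTED: return "GLOB_ABORTED";
    case GLOB_NOSPACE: return "GLOB_NOSPACE";
    default:           return "(other)";
    }
}

/* Hypothetical helper: expand pattern with GLOB_ERR set and report what
 * happened. Returns glob()'s own return value. */
static int probe_glob(const char *pattern)
{
    glob_t g;
    int rc;

    memset(&g, 0, sizeof g);          /* keeps globfree() safe after a failure */
    probe_calls = 0;
    rc = glob(pattern, GLOB_ERR, probe_errfunc, &g);
    printf("%s: %s, %d errfunc call(s)\n", pattern, rc_name(rc), probe_calls);
    globfree(&g);
    return rc;
}
```

Calling probe_glob("/etc/passwd/*.c"), or probe_glob("*/*.c") in a directory that holds a regular file, should show which camp an implementation falls into: GLOB_ABORTED where ENOTDIR is treated as an error, GLOB_NOMATCH (or a list of matches) where it is ignored.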
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
2019-07-29 12:12:28 +0100, Geoff Clare: [...] > > in */*.c, Solaris returns with an error if the current directory > > contains a non-directory file (and calls errfunc() with ENOTDIR > > and that file), which is not wanted. > > True, but there's no way round that because GLOB_ERR can't distinguish > these cases. It's "all or nothing". > > > IMO, GLOB_ERR should be about failure to expand the glob. > > The ENOTDIR error when expanding /etc/passwd/*.c is not > > preventing the glob from expanding (to nothing). If passwd was a > > symlink to some inaccessible area, then it would. > > To me the point of having GLOB_ERR and errfunc as two different > error reporting mechanisms is that GLOB_ERR is "all or nothing" > and errfunc lets you be more selective. You said yourself in the bug > that the Solaris behaviour is "more flexible in that the caller can > use a errfunc that ignores ENOENT/ENOTDIR to emulate the GNU/FreeBSD > behaviour". [...] Yes, but to me that sounds more like the Solaris behaviour is bogus and there's a way to work around it. From https://reviews.freebsd.org/rS304284 https://reviews.freebsd.org/rS304284#C38376190OL661 FreeBSD implemented that ignoring of ENOENT/ENOTDIR for POSIX compliance in 2016. For the ENOTDIR ignoring in GNU libc, that was in 1999 following a bug report (libc/1032 which I couldn't find). See https://sourceware.org/git/?p=glibc.git;a=commit;h=647361287ddb2d52ffe9dbbfe2bd27ed76dc2c56 NetBSD has this comment in the code: /* * Posix/XOpen: glob should return when it encounters a * directory that it cannot open or read * XXX: Should we ignore ENOTDIR and ENOENT though? * I think that Posix had in mind EPERM... */ (ITTM EACCES). -- Stephane
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
Stephane Chazelas wrote, on 29 Jul 2019: > > 2019-07-29 11:43:11 +0100, Geoff Clare: > [...] > > > But here I'm saying that the ENOENT/ENOTDIR errors should be > > > ignored with GLOB_ERR. It can already be implied to some extent > > > in that if you get those errors then it's not "directories" > > > you're trying to open (so it's not a case there "it encounters a > > > *directory* that it cannot open or read), but still the Solaris > > > implementation (for both ENOENT and ENOTDIR) and GNU > > > implementations (for ENOENT) still return errors. > > > > I think you're interpreting the current text too literally. My > > reading is that it is trying to describe what happens when glob() > > attempts to open what it expects to be a directory and gets an error. > > The Solaris behaviour seems like the right thing to do. If an > > application calls glob() to expand /etc/passwd/*.c without GLOB_NOCHECK > > and with GLOB_ERR then I think the application writer would want glob() > > to indicate that there's a problem with the pattern, not just that there > > are no matches. > [...] > > But then > > in */*.c, Solaris returns with an error if the current directory > contains a non-directory file (and calls errfunc() with ENOTDIR > and that file), which is not wanted. True, but there's no way round that because GLOB_ERR can't distinguish these cases. It's "all or nothing". > IMO, GLOB_ERR should be about failure to expand the glob. > The ENOTDIR error when expanding /etc/passwd/*.c is not > preventing the glob from expanding (to nothing). If passwd was a > symlink to some inaccessible area, then it would. To me the point of having GLOB_ERR and errfunc as two different error reporting mechanisms is that GLOB_ERR is "all or nothing" and errfunc lets you be more selective. You said yourself in the bug that the Solaris behaviour is "more flexible in that the caller can use a errfunc that ignores ENOENT/ENOTDIR to emulate the GNU/FreeBSD behaviour". 
-- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
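To make the "errfunc lets you be more selective" point concrete, here is a minimal sketch of the kind of callback the bug report alludes to. The names ignore_missing_errfunc and expand_selectively are invented for illustration; the sketch assumes an implementation that, like Solaris, reports ENOENT and ENOTDIR through errfunc, and it deliberately leaves GLOB_ERR unset so that the callback alone decides when to stop.

```c
#include <errno.h>
#include <glob.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback. Returning 0 tells glob() to carry on; returning
 * non-zero makes glob() stop and return GLOB_ABORTED. */
static int ignore_missing_errfunc(const char *epath, int eerrno)
{
    if (eerrno == ENOENT || eerrno == ENOTDIR)
        return 0;   /* nothing under epath can match: not a failure */

    fprintf(stderr, "glob: %s: %s\n", epath, strerror(eerrno));
    return 1;       /* anything else (EACCES, ELOOP, ...) aborts */
}

/* Hypothetical wrapper: no GLOB_ERR here, because GLOB_ERR is "all or
 * nothing" and would override the callback's decision to continue. */
static int expand_selectively(const char *pattern, glob_t *out)
{
    return glob(pattern, 0, ignore_missing_errfunc, out);
}
```

With this callback a caller gets roughly the FreeBSD-style outcome on any implementation, at the cost of having to supply errfunc everywhere instead of setting one flag.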
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
2019-07-29 11:43:11 +0100, Geoff Clare: [...] > > But here I'm saying that the ENOENT/ENOTDIR errors should be > > ignored with GLOB_ERR. It can already be implied to some extent > > in that if you get those errors then it's not "directories" > > you're trying to open (so it's not a case where "it encounters a > > *directory* that it cannot open or read"), but still the Solaris > > implementation (for both ENOENT and ENOTDIR) and GNU > > implementations (for ENOENT) still return errors. > > I think you're interpreting the current text too literally. My > reading is that it is trying to describe what happens when glob() > attempts to open what it expects to be a directory and gets an error. > The Solaris behaviour seems like the right thing to do. If an > application calls glob() to expand /etc/passwd/*.c without GLOB_NOCHECK > and with GLOB_ERR then I think the application writer would want glob() > to indicate that there's a problem with the pattern, not just that there > are no matches. [...] But then in */*.c, Solaris returns with an error if the current directory contains a non-directory file (and calls errfunc() with ENOTDIR and that file), which is not wanted. IMO, GLOB_ERR should be about failure to expand the glob. The ENOTDIR error when expanding /etc/passwd/*.c is not preventing the glob from expanding (to nothing). If passwd was a symlink to some inaccessible area, then it would. (Again, there's the problem of lstat() failures that are not reported, but that's a different problem.) -- Stephane
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
Stephane Chazelas wrote, on 29 Jul 2019: > > 2019-07-29 10:45:35 +0100, Geoff Clare: > [...] > > I noticed the same problem when I was working on the wording changes > > to glob() as part of the pathname expansion fixes that arose from > > bug 1255, which is why the proposed change in my email on 25th July > > had: > > > > | In glob() change GLOB_ERR from: > > | > > | Cause glob() to return when it encounters a directory that it > > | cannot open or read. Ordinarily, glob() continues to find matches. > > | > > | to: > > | > > | Cause glob() to return when an attempt to open, read or search a > > | directory fails because of an error condition that is related to > > | file system contents. If this flag is not set, glob() shall > > | not treat such conditions as an error, and shall continue to > > | look for matches. > > > > plus similar fixes further down the page. > [...] > > But here I'm saying that the ENOENT/ENOTDIR errors should be > ignored with GLOB_ERR. It can already be implied to some extent > in that if you get those errors then it's not "directories" > you're trying to open (so it's not a case there "it encounters a > *directory* that it cannot open or read), but still the Solaris > implementation (for both ENOENT and ENOTDIR) and GNU > implementations (for ENOENT) still return errors. I think you're interpreting the current text too literally. My reading is that it is trying to describe what happens when glob() attempts to open what it expects to be a directory and gets an error. The Solaris behaviour seems like the right thing to do. If an application calls glob() to expand /etc/passwd/*.c without GLOB_NOCHECK and with GLOB_ERR then I think the application writer would want glob() to indicate that there's a problem with the pattern, not just that there are no matches. -- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
2019-07-29 10:45:35 +0100, Geoff Clare: [...] > I noticed the same problem when I was working on the wording changes > to glob() as part of the pathname expansion fixes that arose from > bug 1255, which is why the proposed change in my email on 25th July > had: > > | In glob() change GLOB_ERR from: > | > | Cause glob() to return when it encounters a directory that it > | cannot open or read. Ordinarily, glob() continues to find matches. > | > | to: > | > | Cause glob() to return when an attempt to open, read or search a > | directory fails because of an error condition that is related to > | file system contents. If this flag is not set, glob() shall > | not treat such conditions as an error, and shall continue to > | look for matches. > > plus similar fixes further down the page. [...] But here I'm saying that the ENOENT/ENOTDIR errors should be ignored with GLOB_ERR. It can already be implied to some extent in that if you get those errors then it's not "directories" you're trying to open (so it's not a case where "it encounters a *directory* that it cannot open or read"), but the Solaris implementation (for both ENOENT and ENOTDIR) and GNU implementations (for ENOENT) still return errors. -- Stephane
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
Austin Group Bug Tracker wrote, on 27 Jul 2019: > > The following issue has been SUBMITTED. > == > http://austingroupbugs.net/view.php?id=1273 > == > In the XSH glob() specification, > > For GLOB_ERR, the spec says: > > > Cause glob() to return when it encounters a directory that it > > cannot open or read. Ordinarily, glob() continues to find > > matches. I noticed the same problem when I was working on the wording changes to glob() as part of the pathname expansion fixes that arose from bug 1255, which is why the proposed change in my email on 25th July had: | In glob() change GLOB_ERR from: | | Cause glob() to return when it encounters a directory that it | cannot open or read. Ordinarily, glob() continues to find matches. | | to: | | Cause glob() to return when an attempt to open, read or search a | directory fails because of an error condition that is related to | file system contents. If this flag is not set, glob() shall | not treat such conditions as an error, and shall continue to | look for matches. plus similar fixes further down the page. -- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
[1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
A NOTE has been added to this issue. == http://austingroupbugs.net/view.php?id=1273 == Reported By:stephane Assigned To: == Project:1003.1(2016)/Issue7+TC2 Issue ID: 1273 Category: System Interfaces Type: Error Severity: Objection Priority: normal Status: New Name: Stephane Chazelas Organization: User Reference: Section:glob() Page Number:1109, 1110 (in 2018 edition) Line Number:35742, 35768 Interp Status: --- Final Accepted Text: == Date Submitted: 2019-07-27 10:49 UTC Last Modified: 2019-07-28 07:03 UTC == Summary:glob()'s GLOB_ERR/errfunc and non-directory files == -- (0004495) stephane (reporter) - 2019-07-28 07:03 http://austingroupbugs.net/view.php?id=1273#c4495 -- Re: http://austingroupbugs.net/view.php?id=1273#c4494 > The real problem with the interface is that it doesn't allow reporting the > lstat() errors in the */foo/bar/baz cases. Since errfunc only takes a path > and errno arguments, calling it with a subdir/foo/bar/baz and EACCES for > instance could cause confusion and imply subdir/foo/bar/baz is a directory > that cannot be read, while actually it's probably either subdir, > subdir/foo or subdir/foo/bar that is not searchable. I'm not sure we want > to force implementations to stat() those 3 directories just to report an > error. Anyway, stat() would not be the right tool; access(X_OK) would be more appropriate in that case. If subdir is not searchable then a */foo/bar/ba[z] would call errfunc(subdir/foo/bar, EACCES), so it would be acceptable for an implementation to just do access(subdir/foo/bar, X_OK) if they wanted to (that would not cover the other lstat() error cases though).
Issue History
Date Modified     Username        Field                    Change
======================================================================
2019-07-27 10:49  stephane        New Issue
2019-07-27 10:49  stephane        Name                     => Stephane Chazelas
2019-07-27 10:49  stephane        Section                  => glob()
2019-07-27 10:49  stephane        Page Number              => 1109, 1110 (in 2018 edition)
2019-07-27 10:49  stephane        Line Number              => 35742, 35768
2019-07-28 00:48  Don Cragun      Interp Status            => ---
2019-07-28 00:48  Don Cragun      Category                 Shell and Utilities => System Interfaces
2019-07-28 01:42  shware_systems  Note Added: 0004493
2019-07-28 06:44  stephane        Note Added: 0004494
2019-07-28 07:03  stephane        Note Added: 0004495
======================================================================
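As an illustration of the access(X_OK) idea in note 0004495, the following sketch walks the directory prefixes of a path and reports the first one that access() rejects. The helper name first_unsearchable_prefix is invented, and nothing here is taken from an actual glob() implementation; note also that access() checks against the real rather than the effective user ID, and that it fails for missing or non-directory prefixes too, so the result says "this prefix is the problem", not necessarily "this directory is unsearchable".

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not part of any glob() implementation.
 * Walks the directory prefixes of path from left to right and returns a
 * malloc()ed copy of the first one that access(X_OK) rejects, leaving
 * errno as set by access(). Returns NULL if every prefix passes, if path
 * has no directory component, or if memory runs out. The final component
 * is never tested. The caller frees the result. */
static char *first_unsearchable_prefix(const char *path)
{
    size_t len = strlen(path);
    char *buf, *p;

    if (len == 0 || (buf = malloc(len + 1)) == NULL)
        return NULL;
    memcpy(buf, path, len + 1);

    /* Start at buf + 1 so a leading '/' is not cut to an empty prefix. */
    for (p = strchr(buf + 1, '/'); p != NULL; p = strchr(p + 1, '/')) {
        *p = '\0';                    /* temporarily truncate to the prefix */
        if (access(buf, X_OK) != 0)
            return buf;               /* e.g. "subdir" or "subdir/foo" */
        *p = '/';                     /* restore and try the next prefix */
    }
    free(buf);
    return NULL;
}
```

For subdir/foo/bar/baz this tests subdir, subdir/foo and subdir/foo/bar in turn, which is exactly the extra work the note is reluctant to impose on implementations; an application that wants a precise diagnostic can do it itself from within errfunc or after glob() returns.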
[1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
A NOTE has been added to this issue. == http://austingroupbugs.net/view.php?id=1273 == Reported By:stephane Assigned To: == Project:1003.1(2016)/Issue7+TC2 Issue ID: 1273 Category: System Interfaces Type: Error Severity: Objection Priority: normal Status: New Name: Stephane Chazelas Organization: User Reference: Section:glob() Page Number:1109, 1110 (in 2018 edition) Line Number:35742, 35768 Interp Status: --- Final Accepted Text: == Date Submitted: 2019-07-27 10:49 UTC Last Modified: 2019-07-28 06:44 UTC == Summary:glob()'s GLOB_ERR/errfunc and non-directory files == -- (0004494) stephane (reporter) - 2019-07-28 06:44 http://austingroupbugs.net/view.php?id=1273#c4494 -- Re: http://austingroupbugs.net/view.php?id=1273#c4493 Yes, it's actually not clear how stat() is meant to be used here. I had assumed lstat() was meant instead, as in the ./*/file cases where implementations don't open the subdirs of ., but instead try lstat(./subdir/file) on each of them. But since GLOB_ERR/errfunc are meant to report errors upon opening/reading *directories*, they can't report errors of lstat(). Maybe the spec wants implementations to call stat() on directories to check if they are searchable? If we step back from the implementation detail to look at what the intention of the interface should be: AFAICT a glob(*/*.c) should return the matching files and GLOB_ERR/errfunc should identify the problems that prevent us from doing so. /etc/passwd/*.c or non-existing/*.c doesn't match any file. The ENOTDIR/ENOENT failure upon trying to open those non-directories is not an error preventing us from expanding the glob; on the contrary, it's confirmation that the glob can't match. Where it becomes more ambiguous is when ELOOP/ENAMETOOLONG is returned (where the files may exist using a shortened path). FreeBSD's glob() does return errors in those cases, which IMO sounds like the best thing to do.
The real problem with the interface is that it doesn't allow reporting the lstat() errors in the */foo/bar/baz cases. Since errfunc only takes a path and errno arguments, calling it with a subdir/foo/bar/baz and EACCES for instance could cause confusion and imply subdir/foo/bar/baz is a directory that cannot be read, while actually it's probably either subdir, subdir/foo or subdir/foo/bar that is not searchable. I'm not sure we want to force implementations to stat() those 3 directories just to report an error. Maybe we don't want to over-specify here and just say GLOB_ERR/errfunc should report the errors upon accessing directories (and directories or files assumed to be directories only) that prevent it from expanding the glob pattern, without going into details of the implementation. An application usage section could then clarify that non-existing/*.c should not be reported as an error, since the ENOENT failure of accessing the non-existing assumed-to-be-directory doesn't prevent us from expanding the glob, quite the contrary.
Issue History
Date Modified     Username        Field                    Change
======================================================================
2019-07-27 10:49  stephane        New Issue
2019-07-27 10:49  stephane        Name                     => Stephane Chazelas
2019-07-27 10:49  stephane        Section                  => glob()
2019-07-27 10:49  stephane        Page Number              => 1109, 1110 (in 2018 edition)
2019-07-27 10:49  stephane        Line Number              => 35742, 35768
2019-07-28 00:48  Don Cragun      Interp Status            => ---
2019-07-28 00:48  Don Cragun      Category                 Shell and Utilities => System Interfaces
2019-07-28 01:42  shware_systems  Note Added: 0004493
2019-07-28 06:44  stephane        Note Added: 0004494
======================================================================
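The three-way distinction drawn in note 0004494 can be written down as a small classifier, which is also roughly the shape an errfunc's switch statement would take. This is a sketch of one reading of the note, not of any specified behaviour; the enum and function names are invented, and the treatment of ELOOP/ENAMETOOLONG as "ambiguous" simply follows the wording of the note.

```c
#include <errno.h>

/* Hypothetical names, not taken from any libc or from the standard. */
enum errclass {
    ERRCLASS_NO_MATCH,    /* failure confirms the pattern cannot match here */
    ERRCLASS_AMBIGUOUS,   /* the file may still exist via a shorter path */
    ERRCLASS_BLOCKING     /* failure prevents glob() from knowing the answer */
};

static enum errclass classify_glob_errno(int eerrno)
{
    switch (eerrno) {
    case ENOENT:          /* non-existing/*.c */
    case ENOTDIR:         /* /etc/passwd/*.c */
        return ERRCLASS_NO_MATCH;
    case ELOOP:
    case ENAMETOOLONG:
        return ERRCLASS_AMBIGUOUS;
    default:              /* EACCES and anything else */
        return ERRCLASS_BLOCKING;
    }
}
```

Under this reading, FreeBSD's behaviour corresponds to treating both ERRCLASS_AMBIGUOUS and ERRCLASS_BLOCKING as errors and ERRCLASS_NO_MATCH as silence.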
[1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
A NOTE has been added to this issue. == http://austingroupbugs.net/view.php?id=1273 == Reported By:stephane Assigned To: == Project:1003.1(2016)/Issue7+TC2 Issue ID: 1273 Category: System Interfaces Type: Error Severity: Objection Priority: normal Status: New Name: Stephane Chazelas Organization: User Reference: Section:glob() Page Number:1109, 1110 (in 2018 edition) Line Number:35742, 35768 Interp Status: --- Final Accepted Text: == Date Submitted: 2019-07-27 10:49 UTC Last Modified: 2019-07-28 01:42 UTC == Summary:glob()'s GLOB_ERR/errfunc and non-directory files == -- (0004493) shware_systems (reporter) - 2019-07-28 01:42 http://austingroupbugs.net/view.php?id=1273#c4493 -- Re: - I don't think we want to force implementations to literally call opendir()/readdir()/lstat() (in any case, that "stat()" is wrong). Not sure how to phrase it though. Those are examples of interfaces that may return error codes errfunc is expected to process, that I see, not a requirement glob() implementations have to use them and only them. So, use of lstat() is allowed, as is directly accessing a host through syscalls that affect errno, bypassing use of the listed interfaces entirely. All that is missing is "e.g." after "failure," and ", or other standard interfaces," after "those interfaces" in the parenthetical part, to emphasize they are examples. What may be helpful is a table of standard errno values that are to be passed to errfunc, whichever interface or implementation private code generates them, so applications don't need to guess what case labels errfunc's switch statement may have to process. 
Issue History
Date Modified     Username        Field                    Change
======================================================================
2019-07-27 10:49  stephane        New Issue
2019-07-27 10:49  stephane        Name                     => Stephane Chazelas
2019-07-27 10:49  stephane        Section                  => glob()
2019-07-27 10:49  stephane        Page Number              => 1109, 1110 (in 2018 edition)
2019-07-27 10:49  stephane        Line Number              => 35742, 35768
2019-07-28 00:48  Don Cragun      Interp Status            => ---
2019-07-28 00:48  Don Cragun      Category                 Shell and Utilities => System Interfaces
2019-07-28 01:42  shware_systems  Note Added: 0004493
======================================================================
[1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
The following issue has been UPDATED. == http://austingroupbugs.net/view.php?id=1273 == Reported By:stephane Assigned To: == Project:1003.1(2016)/Issue7+TC2 Issue ID: 1273 Category: System Interfaces Type: Error Severity: Objection Priority: normal Status: New Name: Stephane Chazelas Organization: User Reference: Section:glob() Page Number:1109, 1110 (in 2018 edition) Line Number:35742, 35768 Interp Status: --- Final Accepted Text: == Date Submitted: 2019-07-27 10:49 UTC Last Modified: 2019-07-28 00:48 UTC == Summary:glob()'s GLOB_ERR/errfunc and non-directory files == Issue History Date ModifiedUsername FieldChange == 2019-07-27 10:49 stephane New Issue 2019-07-27 10:49 stephane Name => Stephane Chazelas 2019-07-27 10:49 stephane Section => glob() 2019-07-27 10:49 stephane Page Number => 1109, 1110 (in 2018 edition) 2019-07-27 10:49 stephane Line Number => 35742, 35768 2019-07-28 00:48 Don Cragun Interp Status => --- 2019-07-28 00:48 Don Cragun Category Shell and Utilities => System Interfaces ==
Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
2019-07-27 10:49:39 +, Austin Group Bug Tracker: [...] > Category: Shell and Utilities [...] Sorry, my bad: that should have been "System Interfaces". It doesn't seem I can change it after the fact. -- Stephane
[1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
The following issue has been SUBMITTED. == http://austingroupbugs.net/view.php?id=1273 == Reported By:stephane Assigned To: == Project:1003.1(2016)/Issue7+TC2 Issue ID: 1273 Category: Shell and Utilities Type: Error Severity: Objection Priority: normal Status: New Name: Stephane Chazelas Organization: User Reference: Section:glob() Page Number:1109, 1110 (in 2018 edition) Line Number:35742, 35768 Interp Status: --- Final Accepted Text: == Date Submitted: 2019-07-27 10:49 UTC Last Modified: 2019-07-27 10:49 UTC == Summary:glob()'s GLOB_ERR/errfunc and non-directory files Description: In the XSH glob() specification, For GLOB_ERR, the spec says: > Cause glob() to return when it encounters a directory that it > cannot open or read. Ordinarily, glob() continues to find > matches. (Note: it's not clear what "Ordinarily" means here. When errfunc is set and returns non-zero, glob() doesn't continue, is it ordinary?). For errfunc: > If, during the search, a directory is encountered that cannot > be opened or read and errfunc is not a null pointer, glob() > calls (*errfunc()) with two arguments. [...] > 2. The eerrno argument is the value of errno from the > failure, as set by opendir(), readdir(), or stat(). > (Other values may be used to report other errors not > explicitly documented for those functions.) (Note: does that mean glob() has to call those 3 functions (as opposed to open(O_DIRECTORY)/getdents() or any other API)? Why stat(), shouldn't that be lstat()?) First (and that's still not the case I'm making here), it's not obvious what /directories/ glob() will try to open. It can be somewhat inferred from the spec, as the pathname expansion specification refers to directories that must be readable (which implies they are going to be read) and some that only need to be searchable (implying they're not going to be read). 
But maybe the spec should be more explicit, as it's not obvious for instance that in */*.c the current directory and all the subdirs are going to be read, while in */foo.c, only the current directory is read (and all subdirs/foo.c lstat()ed), so if there's a non-readable subdir, only the former will fail (or cause errfunc to be invoked). Now, to get to the point, the spec refers to "directories" that can't be opened. What about a /etc/passwd/*.c glob? /etc/passwd is not a directory, and opendir("/etc/passwd"), if called, would fail with ENOTDIR. Does that mean glob() should not call opendir() here, or that it should ignore opendir()'s error when errno is ENOTDIR? What about */*.c where there's at least one non-directory non-hidden file in the current directory? What if there's a broken symlink or a symlink to a file that is not accessible (and so for which we can't tell whether the symlink is a directory or not)? I've done tests with the FreeBSD 12.0, Solaris 10 and GNU libc 2.27 implementations of glob() and they all differ significantly, the Solaris one being the least compliant with what I can infer the spec to require, and FreeBSD's the most. On Solaris /etc/passwd/*.c glob(GLOB_ERR) fails (and calls errfunc with /etc/passwd, ENOTDIR), same for */*.c in a directory that contains a non-hidden regular file. Only FreeBSD's glob(GLOB_ERR) doesn't fail on non-existent/*.c or */*.c in a directory that contains a broken symlink. The other two call errfunc with ENOENT. For */*.c in a directory that contains a symlink to a non-accessible area, they all fail (call errfunc with EACCES). Same with */*/*.c if the current directory contains a subdir that is readable but not searchable (note that whether glob() could tell whether entries of that directory are directories or not depends on whether readdir() returns that information or not; either way, we can't tell for symlinks). Desired Action: At this point, I just want to start the discussion as to how best to fix it.
- The "ordinarily" should probably be changed to "if errfunc is NULL" - I don't think we want to force implementations to literally call opendir()/readdir()/lstat() (in any case, that "stat()" is wrong). Not sure how to phrase it though. - we should probably clarify which directories glob() is meant to try opening, or which files glob() is meant to invoke opendir() or equivalent on. - and then what to do for