Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-08-01 Thread Shware Systems

Additionally, when GLOB_NOCHECK is in flags, glob() is not expected to call
stat() and return 0 paths if no such file exists. It is on the application to
note the increase in count by 1 and compare the result against pattern to see
whether it needs to do a stat() separately. One of the examples or the
Application Usage could make this more explicit. I would prefer to see a
separate error return value for this case, e.g. GLOB_NOEXIST, as a more
efficient means of testing for it, or to remove the qualification from
GLOB_NOMATCH.
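
As an illustration of the check described above, here is a minimal sketch
(not taken from the standard or from any implementation). The helper name
expand_nocheck() and its return codes are assumptions made up for this
example; only glob(), globfree() and lstat() are real interfaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <glob.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: expand `pattern` with GLOB_NOCHECK and classify
 * the outcome.
 *   1  the result is something other than the raw pattern (real matches)
 *   2  the single result equals the pattern and a file with that literal
 *      name exists (a genuine match, or a file glob() could not list)
 *   0  the single result equals the pattern and no such file exists
 *  -1  glob() itself failed
 * On a non-negative return the caller should globfree(out).
 */
int expand_nocheck(const char *pattern, glob_t *out)
{
    struct stat st;

    if (glob(pattern, GLOB_NOCHECK, NULL, out) != 0)
        return -1;
    if (out->gl_pathc != 1 || strcmp(out->gl_pathv[0], pattern) != 0)
        return 1;
    /* Ambiguous case: the application has to check for itself. */
    return lstat(pattern, &st) == 0 ? 2 : 0;
}
```

The extra lstat() is the cost being objected to here: a distinct return
value from glob() would make it unnecessary.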

On Thursday, August 1, 2019 Geoff Clare  wrote:

Stephane Chazelas  wrote, on 01 Aug 2019:

>
> It's also not clear what the interaction between GLOB_MARK and
> GLOB_NOCHECK would be. If a pattern expands to itself because it
> can't find a match, should it still call stat on it?


Not clear?  Seems crystal clear to me.

GLOB_NOCHECK: "... shall return a list consisting of only pattern".
No allowance for a slash to be appended.

GLOB_MARK: "Each pathname that is a directory that matches pattern ..."
If pattern does not match anything, then it is not "a directory that
matches pattern" even if a directory with the same name exists.

--
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England





Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-08-01 Thread Stephane Chazelas
2019-08-01 15:51:10 +0100, Geoff Clare:
> Stephane Chazelas  wrote, on 01 Aug 2019:
> >
> > It's also not clear what the interaction between GLOB_MARK and
> > GLOB_NOCHECK would be. If a pattern expands to itself because it
> > can't find a match, should it still call stat on it?
> 
> Not clear?  Seems crystal clear to me.
> 
> GLOB_NOCHECK: "... shall return a list consisting of only pattern".
> No allowance for a slash to be appended.
> 
> GLOB_MARK: "Each pathname that is a directory that matches pattern ..."
> If pattern does not match anything, then it is not "a directory that
> matches pattern" even if a directory with the same name exists.
[...]

Makes sense, though in my example the pathname *did* match the
pattern. Only that unreadable/* file was not found by glob() as
the unreadable directory wasn't readable. It was searchable
though which meant a stat on the "unreadable/*" directory
succeeded.

In any case, those implementations will happily add a / even on
directories that don't match the pattern like in unreadable/[*]
or unreadable/\*, so those implementations are not compliant.

Related question, also related to the backslash issue
(bugid:1234). In:

glob("\\foo")

Should glob look for foo in the current directory via a listing of
it or via lstat("foo")?

Looking at the glibc implementation, I see that glob("foo") does
a lstat("foo") without NOCHECK and nothing at all with NOCHECK.
While for glob("\\foo"), it searches for it in the directory
listing (both with and without NOCHECK). The MARK is added based
on a stat() done on the result of the expansion (with NOCHECK,
either foo or \foo). IOW, it behaves the same as for
glob("[f]oo").
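
To compare the three spellings on a given implementation, a small sketch
along these lines can be used. The helper first_expansion() is a name
invented for this example; it only reports what glob() returned and says
nothing about whether the library used a directory listing or lstat() to
get there.

```c
#define _POSIX_C_SOURCE 200809L
#include <glob.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: expand `pattern` with `flags` and copy the first
 * resulting pathname into buf. Returns the number of pathnames glob()
 * produced, or -1 if glob() returned non-zero (including GLOB_NOMATCH).
 */
int first_expansion(const char *pattern, int flags, char *buf, size_t size)
{
    glob_t g;
    int n;

    if (glob(pattern, flags, NULL, &g) != 0)
        return -1;
    n = (int)g.gl_pathc;
    snprintf(buf, size, "%s", n > 0 ? g.gl_pathv[0] : "");
    globfree(&g);
    return n;
}
```

Called with "foo", "\\foo" and "[f]oo" in a directory that contains foo,
all three are expected to expand to foo when GLOB_NOESCAPE is not set. The
difference discussed above only shows up in which system calls are made,
which needs a tracing tool to observe.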

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-08-01 Thread Geoff Clare
Stephane Chazelas  wrote, on 01 Aug 2019:
>
> It's also not clear what the interaction between GLOB_MARK and
> GLOB_NOCHECK would be. If a pattern expands to itself because it
> can't find a match, should it still call stat on it?

Not clear?  Seems crystal clear to me.

GLOB_NOCHECK: "... shall return a list consisting of only pattern".
No allowance for a slash to be appended.

GLOB_MARK: "Each pathname that is a directory that matches pattern ..."
If pattern does not match anything, then it is not "a directory that
matches pattern" even if a directory with the same name exists.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-08-01 Thread Stephane Chazelas
2019-07-27 10:49:39 +, Austin Group Bug Tracker:
[...]
> > If, during the search, a directory is encountered that cannot
> > be opened or read and errfunc is not a null pointer, glob()
> > calls (*errfunc()) with two arguments.
> [...]
> >   2. The eerrno argument is the value of errno from the
> >  failure, as set by opendir(), readdir(), or stat().
> >  (Other values may be used to report other errors not
> >  explicitly documented for those functions.)
> 
> (Note: does that mean glob() has to call those 3 functions (as
> opposed to open(O_DIRECTORY)/getdents() or any other API)? Why
> stat(), shouldn't that be lstat()?)
[...]

I'm only realising now (after reading the musl mailing-list
thread) that the stat() above may be referring to GLOB_MARK, which
tells glob() to append "/" to directories. For that, glob()
would need to do a stat(), which all implementations seem to
do (but none seems to report errors when that one fails).

It's also not clear what the interaction between GLOB_MARK and
GLOB_NOCHECK would be. If a pattern expands to itself because it
can't find a match, should it still call stat on it? glibc and
diet seem to, musl doesn't:

$ mkdir -p 'unreadable/*' unreadable/dir
$ chmod 111 unreadable
$ ~/glob-and-mark-glibc 'unreadable/*'
unreadable: Permission denied
ret=0 count=1
- unreadable/*/
$ ~/glob-and-mark-diet 'unreadable/*'
unreadable: Permission denied
ret=0 count=1
- unreadable/*/
$ ~/glob-and-mark-musl 'unreadable/*'
unreadable/: Permission denied
ret=0 count=1
- unreadable/*

(btw, dietlibc's glob() implementation is very buggy, probably
not worth considering here)

Where glob-and-mark.c is:

#include <glob.h>
#include <stdio.h>
#include <string.h>
/* Report each directory glob() could not open or read, then carry on. */
int errfunc(const char *epath, int eerrno)
{
  printf("%s: %s\n", epath, strerror(eerrno));
  return 0;
}
int main(int argc, char* argv[])
{
  int r;
  size_t i;
  glob_t globbuf;
  if (argc < 2)
    return 1;
  r = glob(argv[1], GLOB_MARK|GLOB_NOCHECK, errfunc, &globbuf);
  printf("ret=%d count=%zu\n", r, globbuf.gl_pathc);
  if (!r) {
    for (i = 0; i < globbuf.gl_pathc; i++)
      printf("- %s\n", globbuf.gl_pathv[i]);
  }
  return 0;
}

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-31 Thread Geoff Clare
Stephane Chazelas  wrote, on 30 Jul 2019:
>
> 2019-07-30 15:31:13 +0100, Geoff Clare:
> [...]
> > It's not invention because the standard already requires it. (It also
> > requires EACCES, ELOOP, ENAMETOOLONG, ENOENT, ENOTDIR to be treated as
> > errors.  The question is which ones the standard should be changed to
> > say are ignored.)
> [...]
> 
> By that reading, you could also say that
> 
> test -f /etc/passwd/file
> 
> should report an error because of the ENOTDIR error returned by
> stat(). How is it different?

It's different because the purpose of test -f is to test whether the
file exists (and is a regular file).  Thus an exit status of 1 is
the result of a successful execution which indicates that the file
does not exist.  The ENOTDIR does not prevent test -f from performing
the task of determining whether the file exists (it is an indication
that the file does not exist), and so should not be treated as an error.

The purpose of pathname expansion in the shell is to replace a
pattern with a list of pathnames that match that pattern.  If an
error occurs which prevents the shell from performing that task,
then the shell should treat it as an error, except for cases that
the standard explicitly says should be ignored (which is what we're
proposing to add to fix the problem).
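
A minimal sketch of that distinction for the test -f case follows. The
function name and the choice to report every other errno value as -1 are
assumptions for illustration only; which other errors a real test utility
treats as "not an error" is not something this sketch tries to settle.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>

/*
 * Hypothetical helper modelled on "test -f":
 *   1  path exists and is a regular file
 *   0  it does not (ENOENT and ENOTDIR both answer the question)
 *  -1  some other failure kept us from answering
 */
int is_regular_file(const char *path)
{
    struct stat st;

    if (stat(path, &st) == 0)
        return S_ISREG(st.st_mode) ? 1 : 0;
    switch (errno) {
    case ENOENT:
    case ENOTDIR:
        /* The failure itself tells us the file is not there. */
        return 0;
    default:
        return -1;
    }
}
```

For /etc/passwd/file this returns 0: the ENOTDIR from stat() is the answer
rather than an obstacle to getting one.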

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-30 Thread Stephane Chazelas
2019-07-30 15:31:13 +0100, Geoff Clare:
[...]
> It's not invention because the standard already requires it. (It also
> requires EACCES, ELOOP, ENAMETOOLONG, ENOENT, ENOTDIR to be treated as
> errors.  The question is which ones the standard should be changed to
> say are ignored.)
[...]

By that reading, you could also say that

test -f /etc/passwd/file

should report an error because of the ENOTDIR error returned by
stat(). How is it different?

Surely the "errors" utilities are meant to report are those that
they *consider* an error, not every error by any of the syscall
their implementation makes.

IMO, it's a bit far-fetched to see the spec as requiring sh to
fail upon an ENOENT error upon lstat() here (that would mean
*/file expansion could only succeed if all the non-hidden files
in the current directory were searchable directories and
contained a "file" entry) though it wouldn't harm to make it
more explicit.

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-30 Thread Geoff Clare
Robert Elz  wrote, on 30 Jul 2019:
>
>   | I strongly disagree that EMFILE and ENFILE should be ignored in the
>   | shell.  That leads to the execution of commands with an unchanged
>   | pattern when there are matching files it should have used instead.
> 
> Same thing for all the other errors.   Further your desire here seems
> to be invention - which is not something that should be happening.
> Which current shells actually abort glob expansions when they get EMFILE
> when opening a directory to read, or similar (which is the easy one to test) ?

So far nobody has identified a shell that does it.

It's not invention because the standard already requires it. (It also
requires EACCES, ELOOP, ENAMETOOLONG, ENOENT, ENOTDIR to be treated as
errors.  The question is which ones the standard should be changed to
say are ignored.)

>   | The command might succeed (operating on the wrong file) and the user
>   | would not be alerted to the problem.  Particularly bad if it was an
>   | rm command.
> 
> Yes, all that might happen - so it might if there's an EACCES or EPERM
> error, consider
> 
>   rm [abc]*/*.c
> 
> where there's directories with names like "a102" "cxyz" "bletch" ...
> which contain various *.c files that we want to remove.
> 
> Now consider what happens when the (or a) previous command to that one was
> 
>   chmod a-rwx [abc]*
> 
> all of the attempts to open a102 cxyz bletch (etc) now fail, and
> the pattern is not expanded, and rm gets given the pattern as the
> file name to remove, and proceeds to delete my (very precious) file
> '[abc]*/*.c' (which is a file in a directory with a name that doesn't
> start with a, b or c, so the chmod command did not protect it).
> 
> "Particularly bad"
> 
> Still that is what happens, and what's more, is really what everyone
> expects will happen, and generally wants to happen (except those who want
> some kind of "nomatch" error behaviour, which does not include me).

While that's true, the difference is that this behaviour is consistent
given those specific file system contents, whereas EMFILE and ENFILE
might or might not happen at different times.  There are also good
reasons to want EACCES not to be treated as an error and this case
is then collateral damage that we accept as being worth suffering;
but there is no reason not to treat EMFILE and ENFILE as an error.

> It isn't really all that bad in practice, as people rarely name files
> with names that look like patterns, except from the occasional
> accidental 'cat foo* > bar*' type error, and when they have done that
> the problem tends to be more "how do I get rid of just that file" rather
> than "why did that file get deleted".
> 
> The (much less likely, except for users attempting to shoot themselves
> in the foot, deliberately) EMFILE and ENFILE cases are not different at
> all.

However, they are easily preventable, so why not do that?
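
For the glob() function, the same split between transient resource errors
and errors that depend on file system content can be expressed in an
errfunc. This is only a sketch: the function name is made up, and whether
a given implementation actually calls errfunc when opendir() fails with
EMFILE or ENFILE is an assumption that would need checking.

```c
#include <errno.h>
#include <glob.h>

/*
 * Hypothetical glob() error callback: a non-zero return makes glob()
 * stop and return GLOB_ABORTED. Only descriptor exhaustion is treated
 * as fatal; everything else is skipped.
 */
int abort_on_fd_exhaustion(const char *epath, int eerrno)
{
    (void)epath;
    return eerrno == EMFILE || eerrno == ENFILE;
}
```

It would be passed as the third argument, e.g.
glob(pattern, 0, abort_on_fd_exhaustion, &g), without GLOB_ERR so that the
callback alone decides.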

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-30 Thread Stephane Chazelas
2019-07-30 14:29:23 +0100, Geoff Clare:
[...]
> > Do you have a better suggestion?
> 
> Unless one of the implementations changes to do something better
> before we get too far into work on Issue 8, I think the only
> choices we have are the Solaris behaviour, the GNU/BSD behaviour,
> the GNU/BSD "done right" (ELOOP/ENAMETOOLONG/ENOENT/ENOTDIR all
> treated the same), or allow some or all of these behaviours.
[...]

Great, thanks. I think we concur. My vote, as already stated
would be:

- ENOTDIR errors upon opendir() shall be ignored
- ENOENT/ENAMETOOLONG/ELOOP may be ignored.

That is, the Solaris behaviour (shared by other old BSDs and newer
musl) would not be allowed, as */*.c returning an ENOTDIR error is
definitely a bug IMO, while the GNU/FreeBSD behaviour would be allowed.

Do we want to allow lstat() errors (other than ENOENT/ENOTDIR)
to be reported? (I changed my mind on that and now think it would
not be that confusing.)

I've now tested musl 1.1.21 and diet 0.34 on Linux which are
actually quite different from the GNU/Solaris/FreeBSD mentioned
above.

For musl, first, it seems that in */*.c, it actually uses the
entry-type information returned by readdir() and doesn't call
opendir() on entries that are neither directory nor symlink.

It still returns an error upon ENOTDIR if there are symlinks to
regular files in the current directory or if called with
regfile/*.c

It calls stat() instead of lstat() for the */file glob (again,
skipping the non-dir-non-symlink files), (so would not expand a
dir/file broken symlink) and reports the stat() errors other
than ENOENT (including ENOTDIR in */file when the current
directory contains a symlink to a regular file).

dietlibc seems to behave quite differently as well. In */*.c, it
does a stat() on each file in the current directory to determine
which are directories (and if there's no matching one calls
opendir("*") most likely causing an ENOENT error), so won't
report ENOTDIR errors there (except in race condition cases). In
*/file, it doesn't use stat/lstat but reads the content of the
directories looking for a "file" entry (so fails on unreadable
dirs instead of unsearchable ones). In any case, it ignores the
ENOTDIR errors on opendir(), even in the regfile/*.c case.
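
The entry-type approach described for musl can be sketched roughly as
below. This is not musl's code: the function name is invented, d_type and
the DT_* constants are an extension that is assumed to be available (hence
_DEFAULT_SOURCE), and the lstat() fallback for DT_UNKNOWN is a design
choice of this example.

```c
#define _DEFAULT_SOURCE   /* assumed to expose d_type and DT_* on this libc */
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: count the non-hidden entries of `dir` that are
 * worth an opendir() when expanding something like dir/STAR/STAR.c,
 * i.e. directories and symlinks. Returns -1 if `dir` cannot be opened.
 */
int count_dir_candidates(const char *dir)
{
    DIR *d = opendir(dir);
    struct dirent *e;
    int n = 0;

    if (d == NULL)
        return -1;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue;               /* a leading dot is not matched by '*' */
        if (e->d_type == DT_DIR || e->d_type == DT_LNK) {
            n++;
        } else if (e->d_type == DT_UNKNOWN) {
            /* The file system gave no type: ask lstat() instead. */
            char path[4096];
            struct stat st;

            snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
            if (lstat(path, &st) == 0 &&
                (S_ISDIR(st.st_mode) || S_ISLNK(st.st_mode)))
                n++;
        }
        /* Regular files, fifos etc. are skipped: no opendir(), no ENOTDIR. */
    }
    closedir(d);
    return n;
}
```

Symlinks stay in the candidate set because their targets are unknown at
this point, which is why a symlink to a regular file can still produce
ENOTDIR later.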

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-30 Thread Robert Elz
Date:Tue, 30 Jul 2019 14:39:54 +0100
From:Geoff Clare 
Message-ID:  <20190730133954.GB27152@lt2.masqnet>

  | This thread is specifically about the glob() function in XSH.

Yes, I know, see my previous reply to Stephane.

If I replied to a message in the wrong thread, when I should have
used the other one, apologies - I wasn't paying all that much attention.
It just posed a question I thought worthy of a reply (but apparently
really in the context of the other bug report.)

  | I strongly disagree that EMFILE and ENFILE should be ignored in the
  | shell.  That leads to the execution of commands with an unchanged
  | pattern when there are matching files it should have used instead.

Same thing for all the other errors.   Further your desire here seems
to be invention - which is not something that should be happening.
Which current shells actually abort glob expansions when they get EMFILE
when opening a directory to read, or similar (which is the easy one to test) ?

  | The command might succeed (operating on the wrong file) and the user
  | would not be alerted to the problem.  Particularly bad if it was an
  | rm command.

Yes, all that might happen - so it might if there's an EACCES or EPERM
error, consider

rm [abc]*/*.c

where there's directories with names like "a102" "cxyz" "bletch" ...
which contain various *.c files that we want to remove.

Now consider what happens when the (or a) previous command to that one was

chmod a-rwx [abc]*

all of the attempts to open a102 cxyz bletch (etc) now fail, and
the pattern is not expanded, and rm gets given the pattern as the
file name to remove, and proceeds to delete my (very precious) file
'[abc]*/*.c' (which is a file in a directory with a name that doesn't
start with a, b or c, so the chmod command did not protect it).

"Particularly bad"

Still that is what happens, and what's more, is really what everyone
expects will happen, and generally wants to happen (except those who want
some kind of "nomatch" error behaviour, which does not include me).

It isn't really all that bad in practice, as people rarely name files
with names that look like patterns, except from the occasional
accidental 'cat foo* > bar*' type error, and when they have done that
the problem tends to be more "how do I get rid of just that file" rather
than "why did that file get deleted".

The (much less likely, except for users attempting to shoot themselves
in the foot, deliberately) EMFILE and ENFILE cases are not different at
all.

kre



Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-30 Thread Robert Elz
Date:Tue, 30 Jul 2019 14:35:47 +0100
From:Stephane Chazelas 
Message-ID:  <20190730133547.mcqvnbaz3wmms...@chaz.gmail.com>

  | While I generally agree for bugid:1275 -- for shell globs or
  | glob() without GLOB_ERR

As I said (and also in response to the first part of Geoff's
subsequent message) for the case you're concerned with I don't
much care (might be a good idea just to delete the GLOB_ERR flag
though, and avoid the issue - any implementation that finds a
need for it can then just implement it as an extension, and can
make it handle whatever errors they need it to handle.)

  | implementations to report some errors like ENFILE/EFAULT/ENOMEM;
  | not that any does ATM),

ash based shells will certainly report ENOMEM errors, if they occur
while saving generated pathnames (which, even though it is really
hard to generate on any modern system, is the likely case for ENOMEM
to be seen).   On the other hand if ENOMEM comes from opendir() not
being able to allocate buffer space to manage the read from the
directory, it will be ignored just the same as if opendir() returns
failure because of ENOENT.

  | it's a different matter for glob() with
  | the GLOB_ERR flag in bugid:1273 discussed in this thread which
  | is explicitly meant to report errors

You may have noticed that I haven't been paying that much
attention around here recently (busy with other things) but
I was actually aware of the distinction, I just don't care
about that part of it, as ...

  | (and before you ask, no, I
  | don't know of any application that uses that API. Does anyone?).

Not me.

kre



Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-30 Thread Geoff Clare
Robert Elz  wrote, on 30 Jul 2019:
>
> My recommendation would be to forbid glob (at least in the shell,
> I don't care so much about whatever other uses there are)

This thread is specifically about the glob() function in XSH.
The similar issues with pathname expansion in the shell are
covered by bug 1275.

> from ever
> returning any error status or issuing any error messages from operations
> related to its path search -- errors from shell memory management - exhausting
> available mem because the list of found paths contains too many and they're
> too long, or similar problems, is a different issue .. similarly if the
> shell needs to fork because of the way its glob code is implemented, and
> the fork fails, or in such a case if the shell needs a pipe to, or shared
> memory with, its child to receive the results, and that fails to be
> established.
> 
> Glob isn't the right tool (it isn't a tool at all) to find file system
> problems, whether they're problems we generally want glob to hide
> (ENOTDIR because the first '*' in */*.c matched a regular file, or
> ENOTDIR which might be because a directory inode got corrupted and is
> now appearing to the filesystem as if it were a regular file, or a fifo
> or something) EIO, errors about symlink loops, or absurdly long pathnames
> or pathname components, or anything else (including EMFILE and ENFILE).

I strongly disagree that EMFILE and ENFILE should be ignored in the
shell.  That leads to the execution of commands with an unchanged
pattern when there are matching files it should have used instead.

> If glob fails to match files that the user thought should be matched, then
> the user needs to investigate

The command might succeed (operating on the wrong file) and the user
would not be alerted to the problem.  Particularly bad if it was an
rm command.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-30 Thread Stephane Chazelas
2019-07-30 20:18:43 +0700, Robert Elz:
[...]
> If a sys call executed by glob while searching fails, then it should treat
> that exactly the same as ENOENT (the thing simply doesn't exist for glob
> purposes) and continue with whatever is next.
[...]

While I generally agree for bugid:1275 -- for shell globs or
glob() without GLOB_ERR (though I wouldn't object to allowing
implementations to report some errors like ENFILE/EFAULT/ENOMEM;
not that any does ATM), it's a different matter for glob() with
the GLOB_ERR flag in bugid:1273 discussed in this thread which
is explicitly meant to report errors (and before you ask, no, I
don't know of any application that uses that API. Does anyone?).

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-30 Thread Geoff Clare
Stephane Chazelas  wrote, on 30 Jul 2019:
>
> 2019-07-30 11:27:15 +0100, Geoff Clare:
> [...]
> > > For ENOENT, that can be seen as a pathological case worth
> > > reporting as well, especially in the */*.c case where the
> > > current directory contains broken symlinks.
> > 
> > That's inconsistent with your position on ENOTDIR.
> > 
> > If regfile exists then you claim regfile/*.c isn't going to produce any
> > matches, so it should be ignored.  Likewise if brokensymlink exists
> > then brokensymlink/*.c isn't going to produce any matches so to be
> > consistent you should also want that to be ignored.
> [...]
> 
> But in long/path/with/spaghetti/symlinks/*/*.c, the fact that an
> extra symlink brings you over the limit (of number of links for
> ELOOP or of path length for ENAMETOOLONG) prevents you from
> listing that directory for a reason that is worth reporting IMO.
> 
> While there's no doubt in my mind that asking glob() to report
> ENOTDIR errors in */*.c is wrong.

Your argument above for ELOOP and ENAMETOOLONG applies equally
well to ENOTDIR:

long/path/with/regfile/in/the/middle/*/*.c

The only consistent way to treat them is for all of the errors
related to file system content to be ignored or for all of them
to be errors.

> That would be like asking that
> ls -LR or find -L report them as well (ls -LR reports a ENOTDIR
> error when a non-directory/file is passed as argument or is
> found in the target of a symlink, but obviously not for the
> non-directory files it finds by reading a directory, maybe that
> can be adapted for glob()).

Perhaps it could, but it would be invention - no current
implementation does anything like that.

> I don't think there's an ideal way to deal with it. That
> interface is already broken/misdesigned in that it reports the
> EACCES errors in */*.c and not */file. Not reporting the
> ENOTDIR error is definitely an improvement, at least in the case
> of opening a directory that results from wildcard expansion (one
> could argue glob() shouldn't try to open it in the first place),
> not sure about ENOENT/ELOOP/ENAMETOOLONG.
> 
> Do you have a better suggestion?

Unless one of the implementations changes to do something better
before we get too far into work on Issue 8, I think the only
choices we have are the Solaris behaviour, the GNU/BSD behaviour,
the GNU/BSD "done right" (ELOOP/ENAMETOOLONG/ENOENT/ENOTDIR all
treated the same), or allow some or all of these behaviours.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-30 Thread Robert Elz
Date:Tue, 30 Jul 2019 12:43:54 +0100
From:Stephane Chazelas 
Message-ID:  <20190730114354.72qqdwckkidsd...@chaz.gmail.com>

  | Do you have a better suggestion?

My recommendation would be to forbid glob (at least in the shell,
I don't care so much about whatever other uses there are) from ever
returning any error status or issuing any error messages from operations
related to its path search -- errors from shell memory management - exhausting
available mem because the list of found paths contains too many and they're
too long, or similar problems, is a different issue .. similarly if the
shell needs to fork because of the way its glob code is implemented, and
the fork fails, or in such a case if the shell needs a pipe to, or shared
memory with, its child to receive the results, and that fails to be
established.

Glob isn't the right tool (it isn't a tool at all) to find file system
problems, whether they're problems we generally want glob to hide
(ENOTDIR because the first '*' in */*.c matched a regular file, or
ENOTDIR which might be because a directory inode got corrupted and is
now appearing to the filesystem as if it were a regular file, or a fifo
or something) EIO, errors about symlink loops, or absurdly long pathnames
or pathname components, or anything else (including EMFILE and ENFILE).

If a sys call executed by glob while searching fails, then it should treat
that exactly the same as ENOENT (the thing simply doesn't exist for glob
purposes) and continue with whatever is next.

If glob fails to match files that the user thought should be matched, then
the user needs to investigate - whether the cause eventually is determined
to be incorrect permissions somewhere, a typo in the script or an arg given
to the script (incorrect name), over long pathnames, symlink loops,
or bad blocks on the drive; other tools will find that (ls, find, even cat
or cp).   There's no real advantage having glob trying to deal with all
these cases, nor in attempting to divide up the errors between the "good"
ones and the "bad" ones - there's no way to do that that will satisfy
everyone, as this dispute over ELOOP (etc) shows.

What's more, frankly, it is ludicrous to claim that a script should
abort itself when one of these (quite rare) errors occurs, because it
might do the wrong thing, while at the same time proclaiming that it
has to continue for one of the much more common errors (like bad permission
settings causing EACCES or a typing mistake generating ENOENT or ENOTDIR).
If the script is going to do the wrong thing in one case, it will do the
same wrong thing in the other case as well.
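
For callers of the glob() function (as opposed to the shell), something
close to this recommendation can already be had by passing neither
GLOB_ERR nor an errfunc. The wrapper below is a sketch under that
assumption; the name quiet_glob() and its return convention are invented
for illustration.

```c
#include <glob.h>
#include <stddef.h>

/*
 * Hypothetical wrapper: directories that cannot be opened or read are
 * silently skipped, and "no match" is reported as zero results rather
 * than as an error. Returns the number of matches, or -1 for failures
 * unrelated to the path search (e.g. GLOB_NOSPACE).
 * The caller should globfree(out) when the return value is positive.
 */
long quiet_glob(const char *pattern, glob_t *out)
{
    int r = glob(pattern, 0, NULL, out);    /* no GLOB_ERR, no errfunc */

    if (r == 0)
        return (long)out->gl_pathc;
    if (r == GLOB_NOMATCH)
        return 0;
    return -1;
}
```

Memory exhaustion still surfaces as -1 here, in line with the point above
that such failures are a different issue from path-search errors.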

kre




Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-30 Thread Stephane Chazelas
2019-07-30 11:27:15 +0100, Geoff Clare:
[...]
> > For ENOENT, that can be seen as a pathological case worth
> > reporting as well, especially in the */*.c case where the
> > current directory contains broken symlinks.
> 
> That's inconsistent with your position on ENOTDIR.
> 
> If regfile exists then you claim regfile/*.c isn't going to produce any
> matches, so it should be ignored.  Likewise if brokensymlink exists
> then brokensymlink/*.c isn't going to produce any matches so to be
> consistent you should also want that to be ignored.
[...]

But in long/path/with/spaghetti/symlinks/*/*.c, the fact that an
extra symlink brings you over the limit (of number of links for
ELOOP or of path length for ENAMETOOLONG) prevents you from
listing that directory for a reason that is worth reporting IMO.

While there's no doubt in my mind that asking glob() to report
ENOTDIR errors in */*.c is wrong. That would be like asking that
ls -LR or find -L report them as well (ls -LR reports a ENOTDIR
error when a non-directory/file is passed as argument or is
found in the target of a symlink, but obviously not for the
non-directory files it finds by reading a directory, maybe that
can be adapted for glob()).

I don't think there's an ideal way to deal with it. That
interface is already broken/misdesigned in that it reports the
EACCES errors in */*.c and not */file. Not reporting the
ENOTDIR error is definitely an improvement, at least in the case
of opening a directory that results from wildcard expansion (one
could argue glob() shouldn't try to open it in the first place),
not sure about ENOENT/ELOOP/ENAMETOOLONG.

Do you have a better suggestion?

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-30 Thread Stephane Chazelas
2019-07-30 10:48:49 +0100, Geoff Clare:
[...]
> The odd thing about all these implementations that ignore ENOTDIR and ENOENT
> (or don't but think they should), is that they are not following either of
> the possible interpretations of the current text.
> 
> If they want to interpret it literally and only report an error when they
> encounter an existing directory that they can't open, then they should not
> just ignore ENOTDIR and ENOENT from opendir(), they should also ignore
> ELOOP and ENAMETOOLONG.
[...]

Note that there's only one implementation (that I found) that
ignores ENOENT: FreeBSD (also found on macOS). ENOTDIR was added
to glibc/gnulib/uclibc/dietlibc because of */*.c returning an
error on non-directory files in the current directory which is a
common, normal case where we don't want an error to be reported.

While ELOOP and ENAMETOOLONG are pathological cases which, as you
said in a related discussion, could be worth reporting.

For ENOENT, that can be seen as a pathological case worth
reporting as well, especially in the */*.c case where the
current directory contains broken symlinks.

That's why in my proposed resolution, I left it open whether to
specify the GNU or FreeBSD behaviour or allow both. We could
make it:
- ENOTDIR errors upon opendir() shall be ignored
- ENOENT/ENAMETOOLONG/ELOOP may be ignored.

Or we could allow all existing implementations and replace that
"shall" with a "should" or "may".
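
On an implementation that reports all of these errors, an application can
approximate the proposed rules itself through errfunc. The sketch below
assumes exactly the two-tier split listed above; the names policy_errfunc()
and glob_policy() are hypothetical, and the file-scope flag is only there
because errfunc has no user-data argument.

```c
#include <errno.h>
#include <glob.h>

/* 1: also ignore the "may be ignored" errors; 0: report them. */
static int lenient = 1;

/* Hypothetical errfunc: 0 means "carry on", non-zero aborts glob(). */
int policy_errfunc(const char *epath, int eerrno)
{
    (void)epath;
    switch (eerrno) {
    case ENOTDIR:                   /* "shall be ignored" */
        return 0;
    case ENOENT:
    case ENAMETOOLONG:
    case ELOOP:                     /* "may be ignored" */
        return lenient ? 0 : 1;
    default:                        /* e.g. EACCES: give up */
        return 1;
    }
}

/* Hypothetical wrapper selecting which variant of the policy to apply. */
int glob_policy(const char *pattern, int also_ignore_optional, glob_t *out)
{
    lenient = also_ignore_optional;
    return glob(pattern, 0, policy_errfunc, out);
}
```

With also_ignore_optional set this is close to the FreeBSD behaviour
described above; with it cleared it is closer to the GNU one.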

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-30 Thread Geoff Clare
Stephane Chazelas  wrote, on 29 Jul 2019:
>
> 2019-07-29 12:12:28 +0100, Geoff Clare:
> [...]
> > > in */*.c, Solaris returns with an error if the current directory
> > > contains a non-directory file (and calls errfunc() with ENOTDIR
> > > and that file), which is not wanted.
> > 
> > True, but there's no way round that because GLOB_ERR can't distinguish
> > these cases.  It's "all or nothing".  
> > 
> > > IMO, GLOB_ERR should be about failure to expand the glob.
> > > The ENOTDIR error when expanding /etc/passwd/*.c is not
> > > preventing the glob from expanding (to nothing). If passwd was a
> > > symlink to some inaccessible area, then it would.
> > 
> > To me the point of having GLOB_ERR and errfunc as two different
> > error reporting mechanisms is that GLOB_ERR is "all or nothing"
> > and errfunc lets you be more selective.  You said yourself in the bug
> > that the Solaris behaviour is "more flexible in that the caller can
> > use a errfunc that ignores ENOENT/ENOTDIR to emulate the GNU/FreeBSD
> > behaviour".
> [...]
> 
> Yes, but to me that sounds more like the Solaris behaviour is
> bogus and there's a way to work around it.
> 
> From https://reviews.freebsd.org/rS304284
> https://reviews.freebsd.org/rS304284#C38376190OL661
> FreeBSD implemented that ignoring of ENOENT/ENOTDIR for POSIX
> compliance in 2016.
> 
> For the ENOTDIR ignoring in GNU libc, that was in 1999 following
> a bug report (libc/1032 which I couldn't find). See
> https://sourceware.org/git/?p=glibc.git;a=commit;h=647361287ddb2d52ffe9dbbfe2bd27ed76dc2c56
> 
> NetBSD has this comment in the code:
> 
> /*
>  * Posix/XOpen: glob should return when it encounters a
>  * directory that it cannot open or read
>  * XXX: Should we ignore ENOTDIR and ENOENT though?
>  * I think that Posix had in mind EPERM...
>  */
> 
> (I think they meant EACCES.)

The odd thing about all these implementations that ignore ENOTDIR and ENOENT
(or don't but think they should), is that they are not following either of
the possible interpretations of the current text.

If they want to interpret it literally and only report an error when they
encounter an existing directory that they can't open, then they should not
just ignore ENOTDIR and ENOENT from opendir(), they should also ignore
ELOOP and ENAMETOOLONG.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-29 Thread Stephane Chazelas
2019-07-29 13:13:03 +0100, Stephane Chazelas:
[...]
> For the ENOTDIR ignoring in GNU libc, that was in 1999 following
> a bug report (libc/1032 which I couldn't find). See
> https://sourceware.org/git/?p=glibc.git;a=commit;h=647361287ddb2d52ffe9dbbfe2bd27ed76dc2c56
[...]

The bug report can be seen at
https://sourceware.org/ml/libc-alpha/1999-q1/msg00498.html

Somebody noted that Solaris 7 had the same behaviour, but glibc was
changed nonetheless:
https://sourceware.org/ml/libc-alpha/1999-05/msg4.html

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-29 Thread Stephane Chazelas
2019-07-29 13:13:03 +0100, Stephane Chazelas:
[...]
> NetBSD has this comment in the code:
> 
> /*
>  * Posix/XOpen: glob should return when it encounters a
>  * directory that it cannot open or read
>  * XXX: Should we ignore ENOTDIR and ENOENT though?
>  * I think that Posix had in mind EPERM...
>  */
[...]

OpenBSD has:

/* TODO: don't call for ENOENT or ENOTDIR? */

the same as in FreeBSD before the 2016 fix. It's the same
comment that could be found in 1990 in the BSD code, when the
glob() function was added. It can be found in tcsh, nvi, sudo
and perl code as well. And in opensolaris/illumos glob(). Most
likely that TODO is still in the Solaris code.

glob(3) is a POSIX invention, isn't it? I couldn't find it in
SVR4. I wonder how other SYSV-derived OSes (those that, unlike
Solaris, have no BSD heritage) behave.

uclibc, musl and dietlibc behave like GNU (ignore ENOTDIR, not
ENOENT) AFAICT from reading the code. musl seems to do some
extra processing on EACCES; I've not looked much further into
it.

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-29 Thread Stephane Chazelas
2019-07-29 12:12:28 +0100, Geoff Clare:
[...]
> > in */*.c, Solaris returns with an error if the current directory
> > contains a non-directory file (and calls errfunc() with ENOTDIR
> > and that file), which is not wanted.
> 
> True, but there's no way round that because GLOB_ERR can't distinguish
> these cases.  It's "all or nothing".  
> 
> > IMO, GLOB_ERR should be about failure to expand the glob.
> > The ENOTDIR error when expanding /etc/passwd/*.c is not
> > preventing the glob from expanding (to nothing). If passwd was a
> > symlink to some inaccessible area, then it would.
> 
> To me the point of having GLOB_ERR and errfunc as two different
> error reporting mechanisms is that GLOB_ERR is "all or nothing"
> and errfunc lets you be more selective.  You said yourself in the bug
> that the Solaris behaviour is "more flexible in that the caller can
> use a errfunc that ignores ENOENT/ENOTDIR to emulate the GNU/FreeBSD
> behaviour".
[...]

Yes, but to me that sounds more like the Solaris behaviour is
bogus and there's a way to work around it.

From https://reviews.freebsd.org/rS304284
https://reviews.freebsd.org/rS304284#C38376190OL661
FreeBSD implemented that ignoring of ENOENT/ENOTDIR for POSIX
compliance in 2016.

For the ENOTDIR ignoring in GNU libc, that was in 1999 following
a bug report (libc/1032 which I couldn't find). See
https://sourceware.org/git/?p=glibc.git;a=commit;h=647361287ddb2d52ffe9dbbfe2bd27ed76dc2c56

NetBSD has this comment in the code:

/*
 * Posix/XOpen: glob should return when it encounters a
 * directory that it cannot open or read
 * XXX: Should we ignore ENOTDIR and ENOENT though?
 * I think that Posix had in mind EPERM...
 */

(I think they meant EACCES.)

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-29 Thread Geoff Clare
Stephane Chazelas  wrote, on 29 Jul 2019:
>
> 2019-07-29 11:43:11 +0100, Geoff Clare:
> [...]
> > > But here I'm saying that the ENOENT/ENOTDIR errors should be
> > > ignored with GLOB_ERR. That can already be inferred to some
> > > extent: if you get those errors, then it's not "directories"
> > > you're trying to open (so it's not a case where "it encounters a
> > > *directory* that it cannot open or read"), yet the Solaris
> > > implementation (for both ENOENT and ENOTDIR) and the GNU
> > > implementation (for ENOENT) still return errors.
> > 
> > I think you're interpreting the current text too literally. My
> > reading is that it is trying to describe what happens when glob()
> > attempts to open what it expects to be a directory and gets an error.
> > The Solaris behaviour seems like the right thing to do.  If an
> > application calls glob() to expand /etc/passwd/*.c without GLOB_NOCHECK
> > and with GLOB_ERR then I think the application writer would want glob()
> > to indicate that there's a problem with the pattern, not just that there
> > are no matches.
> [...]
> 
> But then
> 
> in */*.c, Solaris returns with an error if the current directory
> contains a non-directory file (and calls errfunc() with ENOTDIR
> and that file), which is not wanted.

True, but there's no way round that because GLOB_ERR can't distinguish
these cases.  It's "all or nothing".  

> IMO, GLOB_ERR should be about failure to expand the glob.
> The ENOTDIR error when expanding /etc/passwd/*.c is not
> preventing the glob from expanding (to nothing). If passwd was a
> symlink to some inaccessible area, then it would.

To me the point of having GLOB_ERR and errfunc as two different
error reporting mechanisms is that GLOB_ERR is "all or nothing"
and errfunc lets you be more selective.  You said yourself in the bug
that the Solaris behaviour is "more flexible in that the caller can
use a errfunc that ignores ENOENT/ENOTDIR to emulate the GNU/FreeBSD
behaviour".
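
As a minimal sketch of that "more selective" use of errfunc: the callback
below ignores ENOENT and ENOTDIR and stops glob() on anything else. The
names selective_errfunc and glob_selective are made up for illustration and
come from no implementation; GLOB_ERR is deliberately left unset so that the
callback alone decides.

```c
#include <errno.h>
#include <glob.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback, not taken from any implementation:
 * ENOENT and ENOTDIR are ignored, any other error stops glob(). */
static int selective_errfunc(const char *epath, int eerrno)
{
    if (eerrno == ENOENT || eerrno == ENOTDIR)
        return 0;   /* zero: glob() keeps looking for matches */
    fprintf(stderr, "glob: %s: %s\n", epath, strerror(eerrno));
    return 1;       /* non-zero: glob() stops and returns GLOB_ABORTED */
}

/* Hypothetical wrapper. GLOB_ERR is deliberately left out of the flags;
 * with it set, every reported error would abort regardless of errfunc. */
static int glob_selective(const char *pattern, glob_t *pglob)
{
    return glob(pattern, 0, selective_errfunc, pglob);
}
```

With this callback a pattern such as /etc/passwd/*.c should simply yield
GLOB_NOMATCH whether or not the implementation reports ENOTDIR to errfunc.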

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-29 Thread Stephane Chazelas
2019-07-29 11:43:11 +0100, Geoff Clare:
[...]
> > But here I'm saying that the ENOENT/ENOTDIR errors should be
> > ignored with GLOB_ERR. That can already be inferred to some
> > extent: if you get those errors, then it's not "directories"
> > you're trying to open (so it's not a case where "it encounters a
> > *directory* that it cannot open or read"), yet the Solaris
> > implementation (for both ENOENT and ENOTDIR) and the GNU
> > implementation (for ENOENT) still return errors.
> 
> I think you're interpreting the current text too literally. My
> reading is that it is trying to describe what happens when glob()
> attempts to open what it expects to be a directory and gets an error.
> The Solaris behaviour seems like the right thing to do.  If an
> application calls glob() to expand /etc/passwd/*.c without GLOB_NOCHECK
> and with GLOB_ERR then I think the application writer would want glob()
> to indicate that there's a problem with the pattern, not just that there
> are no matches.
[...]

But then

in */*.c, Solaris returns with an error if the current directory
contains a non-directory file (and calls errfunc() with ENOTDIR
and that file), which is not wanted.

IMO, GLOB_ERR should be about failure to expand the glob.
The ENOTDIR error when expanding /etc/passwd/*.c is not
preventing the glob from expanding (to nothing). If passwd was a
symlink to some inaccessible area, then it would.

(but again, there's the problem of lstat() failures that are not
reported, but that's a different problem).

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-29 Thread Geoff Clare
Stephane Chazelas  wrote, on 29 Jul 2019:
>
> 2019-07-29 10:45:35 +0100, Geoff Clare:
> [...]
> > I noticed the same problem when I was working on the wording changes
> > to glob() as part of the pathname expansion fixes that arose from
> > bug 1255, which is why the proposed change in my email on 25th July
> > had:
> > 
> > | In glob() change GLOB_ERR from:
> > | 
> > | Cause glob() to return when it encounters a directory that it
> > | cannot open or read. Ordinarily, glob() continues to find matches.
> > | 
> > | to:
> > | 
> > | Cause glob() to return when an attempt to open, read or search a
> > | directory fails because of an error condition that is related to
> > | file system contents.  If this flag is not set, glob() shall
> > | not treat such conditions as an error, and shall continue to
> > | look for matches.
> > 
> > plus similar fixes further down the page.
> [...]
> 
> But here I'm saying that the ENOENT/ENOTDIR errors should be
> ignored with GLOB_ERR. That can already be inferred to some
> extent: if you get those errors, then it's not "directories"
> you're trying to open (so it's not a case where "it encounters a
> *directory* that it cannot open or read"), yet the Solaris
> implementation (for both ENOENT and ENOTDIR) and the GNU
> implementation (for ENOENT) still return errors.

I think you're interpreting the current text too literally. My
reading is that it is trying to describe what happens when glob()
attempts to open what it expects to be a directory and gets an error.
The Solaris behaviour seems like the right thing to do.  If an
application calls glob() to expand /etc/passwd/*.c without GLOB_NOCHECK
and with GLOB_ERR then I think the application writer would want glob()
to indicate that there's a problem with the pattern, not just that there
are no matches.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-29 Thread Stephane Chazelas
2019-07-29 10:45:35 +0100, Geoff Clare:
[...]
> I noticed the same problem when I was working on the wording changes
> to glob() as part of the pathname expansion fixes that arose from
> bug 1255, which is why the proposed change in my email on 25th July
> had:
> 
> | In glob() change GLOB_ERR from:
> | 
> | Cause glob() to return when it encounters a directory that it
> | cannot open or read. Ordinarily, glob() continues to find matches.
> | 
> | to:
> | 
> | Cause glob() to return when an attempt to open, read or search a
> | directory fails because of an error condition that is related to
> | file system contents.  If this flag is not set, glob() shall
> | not treat such conditions as an error, and shall continue to
> | look for matches.
> 
> plus similar fixes further down the page.
[...]

But here I'm saying that the ENOENT/ENOTDIR errors should be
ignored with GLOB_ERR. That can already be inferred to some
extent: if you get those errors, then it's not "directories"
you're trying to open (so it's not a case where "it encounters a
*directory* that it cannot open or read"), yet the Solaris
implementation (for both ENOENT and ENOTDIR) and the GNU
implementation (for ENOENT) still return errors.

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-29 Thread Geoff Clare
Austin Group Bug Tracker  wrote, on 27 Jul 2019:
>
> The following issue has been SUBMITTED. 
> == 
> http://austingroupbugs.net/view.php?id=1273 
> == 

> In the XSH glob() specification, 
> 
> For GLOB_ERR, the spec says:
> 
> > Cause glob() to return when it encounters a directory that it
> > cannot open or read. Ordinarily, glob() continues to find
> > matches.

I noticed the same problem when I was working on the wording changes
to glob() as part of the pathname expansion fixes that arose from
bug 1255, which is why the proposed change in my email on 25th July
had:

| In glob() change GLOB_ERR from:
| 
| Cause glob() to return when it encounters a directory that it
| cannot open or read. Ordinarily, glob() continues to find matches.
| 
| to:
| 
| Cause glob() to return when an attempt to open, read or search a
| directory fails because of an error condition that is related to
| file system contents.  If this flag is not set, glob() shall
| not treat such conditions as an error, and shall continue to
| look for matches.

plus similar fixes further down the page.
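
For illustration only, a caller relying on GLOB_ERR in the "all or nothing"
sense might look like the sketch below. The helper name expand_strict and
its return convention are assumptions made for this example. Which error
conditions actually lead to GLOB_ABORTED is exactly what differs between
implementations.

```c
#include <glob.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the number of matches, 0 if nothing
 * matched, or -1 if glob() gave up (GLOB_ABORTED or GLOB_NOSPACE). */
static long expand_strict(const char *pattern)
{
    glob_t g;
    long n;

    memset(&g, 0, sizeof g);        /* keeps globfree() safe on every path */
    switch (glob(pattern, GLOB_ERR, NULL, &g)) {
    case 0:
        n = (long)g.gl_pathc;
        globfree(&g);
        return n;
    case GLOB_NOMATCH:
        return 0;                   /* no error, just no match */
    default:
        globfree(&g);               /* release any partial results */
        return -1;
    }
}
```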

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



[1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-28 Thread Austin Group Bug Tracker


A NOTE has been added to this issue. 
== 
http://austingroupbugs.net/view.php?id=1273 
== 
Reported By:         stephane
Assigned To:
==
Project:             1003.1(2016)/Issue7+TC2
Issue ID:            1273
Category:            System Interfaces
Type:                Error
Severity:            Objection
Priority:            normal
Status:              New
Name:                Stephane Chazelas
Organization:
User Reference:
Section:             glob()
Page Number:         1109, 1110 (in 2018 edition)
Line Number:         35742, 35768
Interp Status:       ---
Final Accepted Text:
==
Date Submitted:      2019-07-27 10:49 UTC
Last Modified:       2019-07-28 07:03 UTC
==
Summary:             glob()'s GLOB_ERR/errfunc and non-directory files
== 

-- 
 (0004495) stephane (reporter) - 2019-07-28 07:03
 http://austingroupbugs.net/view.php?id=1273#c4495 
-- 
Re: http://austingroupbugs.net/view.php?id=1273#c4494
> The real problem with the interface is that it doesn't allow reporting
> the lstat() errors in the */foo/bar/baz cases. Since errfunc only takes
> a path and an errno argument, calling it with subdir/foo/bar/baz and
> EACCES, for instance, could cause confusion and imply that
> subdir/foo/bar/baz is a directory that cannot be read, while actually
> it's probably either subdir, subdir/foo or subdir/foo/bar that is not
> searchable. I'm not sure we want to force implementations to stat()
> those 3 directories just to report an error.

Anyway, stat() would not be the right tool for that; access(X_OK) would be
more appropriate. If subdir is not searchable, then */foo/bar/ba[z] would
call errfunc("subdir/foo/bar", EACCES), so it would be acceptable for an
implementation to just do access("subdir/foo/bar", X_OK) if it wanted to
(that would not cover the other lstat() error cases, though).
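
A rough sketch of what probing with access(X_OK) could look like, should an
implementation or an application want to know which leading directory is the
culprit. The helper first_unsearchable_prefix is hypothetical; it checks
every leading directory of the path, which goes further than the single
access() call suggested above.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: finds the shortest leading directory prefix of
 * path for which access(prefix, X_OK) fails and copies it into buf.
 * Returns 0 and stores errno in *err when such a prefix is found.
 * Returns -1 when every leading directory passes the check, or when
 * path does not fit in buf. The last component is never tested. */
static int first_unsearchable_prefix(const char *path, char *buf,
                                     size_t size, int *err)
{
    size_t len = strlen(path);
    size_t i;

    if (len >= size)
        return -1;
    for (i = 1; i < len; i++) {
        if (path[i] != '/' || path[i - 1] == '/')
            continue;               /* not the end of a component */
        memcpy(buf, path, i);
        buf[i] = '\0';
        if (access(buf, X_OK) != 0) {
            *err = errno;
            return 0;
        }
    }
    return -1;
}
```

For subdir/foo/bar/baz this tests subdir, subdir/foo and subdir/foo/bar in
that order. Note that access() checks against the real user and group IDs,
so the answer may differ from what lstat() would see in a set-user-ID
program.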

Issue History
Date Modified     Username        Field          Change
==
2019-07-27 10:49  stephane        New Issue
2019-07-27 10:49  stephane        Name           => Stephane Chazelas
2019-07-27 10:49  stephane        Section        => glob()
2019-07-27 10:49  stephane        Page Number    => 1109, 1110 (in 2018 edition)
2019-07-27 10:49  stephane        Line Number    => 35742, 35768
2019-07-28 00:48  Don Cragun      Interp Status  => ---
2019-07-28 00:48  Don Cragun      Category       Shell and Utilities => System Interfaces
2019-07-28 01:42  shware_systems  Note Added: 0004493
2019-07-28 06:44  stephane        Note Added: 0004494
2019-07-28 07:03  stephane        Note Added: 0004495
==




[1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-28 Thread Austin Group Bug Tracker


A NOTE has been added to this issue. 
== 
http://austingroupbugs.net/view.php?id=1273 
== 

-- 
 (0004494) stephane (reporter) - 2019-07-28 06:44
 http://austingroupbugs.net/view.php?id=1273#c4494 
-- 
Re: http://austingroupbugs.net/view.php?id=1273#c4493

Yes, it's actually not clear how stat() is meant to be used here. I had
assumed that lstat() was meant instead, as in the ./*/file case, where
implementations don't open the subdirectories of . but instead try
lstat("./subdir/file") for each of them.

But since GLOB_ERR/errfunc are meant to report errors upon opening or
reading *directories*, they can't report lstat() errors. Maybe the spec
wants implementations to call stat() on directories to check if they are
searchable?

If we step back from the implementation detail to look at what the
intention of the interface should be: AFAICT a glob(*/*.c) should return
the matching files and GLOB_ERR/errfunc should identify the problems that
prevent us from doing so.

/etc/passwd/*.c or non-existing/*.c doesn't match any file. The
ENOTDIR/ENOENT failure upon trying to open those non-directories is not
an error preventing us from expanding the glob; on the contrary, it is
confirmation that the glob can't match.

Where it becomes more ambiguous is when ELOOP/ENAMETOOLONG is returned
(where the files may exist using a shortened path). FreeBSD's glob() does
return errors in those cases which IMO sounds like the best thing to do.
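
The distinction drawn here can be sketched as a small classifier. The enum
and the function name classify_opendir_errno are invented for illustration;
the grouping follows the reasoning in this note, not any wording in the
standard.

```c
#include <errno.h>

/* Hypothetical classification of an errno value from a failed attempt
 * to open something that the pattern expects to be a directory. */
enum failure_class {
    CONFIRMS_NO_MATCH,    /* ENOENT, ENOTDIR: nothing there can match */
    AMBIGUOUS,            /* ELOOP, ENAMETOOLONG: files may still exist */
    PREVENTS_EXPANSION    /* anything else, e.g. EACCES */
};

static enum failure_class classify_opendir_errno(int eerrno)
{
    switch (eerrno) {
    case ENOENT:
    case ENOTDIR:
        return CONFIRMS_NO_MATCH;
    case ELOOP:
    case ENAMETOOLONG:
        return AMBIGUOUS;
    default:
        return PREVENTS_EXPANSION;
    }
}
```

Under this reading, FreeBSD's behaviour amounts to reporting everything
except the CONFIRMS_NO_MATCH class.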

The real problem with the interface is that it doesn't allow reporting the
lstat() errors in the */foo/bar/baz cases. Since errfunc only takes a path
and an errno argument, calling it with subdir/foo/bar/baz and EACCES, for
instance, could cause confusion and imply that subdir/foo/bar/baz is a
directory that cannot be read, while actually it's probably either subdir,
subdir/foo or subdir/foo/bar that is not searchable. I'm not sure we want
to force implementations to stat() those 3 directories just to report an
error.

Maybe we don't want to over-specify here and just say GLOB_ERR/errfunc
should report the errors upon accessing directories (and directories or
files assumed to be directories only) that prevent it from expanding the
glob pattern without going into details of the implementation. And an
application usage section clarifying that non-existing/*.c should not be
reported as an error since the ENOENT failure of accessing the non-existing
assumed-to-be-directory doesn't prevent us from expanding the glob, quite
the contrary. 





[1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-27 Thread Austin Group Bug Tracker


A NOTE has been added to this issue. 
== 
http://austingroupbugs.net/view.php?id=1273 
== 

-- 
 (0004493) shware_systems (reporter) - 2019-07-28 01:42
 http://austingroupbugs.net/view.php?id=1273#c4493 
-- 
Re:
> - I don't think we want to force implementations to literally
>   call opendir()/readdir()/lstat() (in any case, that "stat()"
>   is wrong). Not sure how to phrase it though.


As I see it, those are examples of interfaces that may return error codes
errfunc is expected to process, not a requirement that glob()
implementations use them and only them. So use of lstat() is allowed, as is
accessing the host directly through syscalls that affect errno, bypassing
the listed interfaces entirely. All that is missing is "e.g." after
"failure," and ", or other standard interfaces," after "those interfaces"
in the parenthetical part, to emphasize that they are examples.

What may be helpful is a table of standard errno values that are to be
passed to errfunc, whichever interface or implementation private code
generates them, so applications don't need to guess what case labels
errfunc's switch statement may have to process. 





[1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-27 Thread Austin Group Bug Tracker


The following issue has been UPDATED. 
== 
http://austingroupbugs.net/view.php?id=1273 
== 

Issue History
Date Modified     Username        Field          Change
==
2019-07-27 10:49  stephane        New Issue
2019-07-27 10:49  stephane        Name           => Stephane Chazelas
2019-07-27 10:49  stephane        Section        => glob()
2019-07-27 10:49  stephane        Page Number    => 1109, 1110 (in 2018 edition)
2019-07-27 10:49  stephane        Line Number    => 35742, 35768
2019-07-28 00:48  Don Cragun      Interp Status  => ---
2019-07-28 00:48  Don Cragun      Category       Shell and Utilities => System Interfaces
==




Re: [1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-27 Thread Stephane Chazelas
2019-07-27 10:49:39 +, Austin Group Bug Tracker:
[...]
> Category:   Shell and Utilities
[...]

Sorry, my bad: that should have been "System Interfaces". It
seems I can't change it after the fact.

-- 
Stephane



[1003.1(2016)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2019-07-27 Thread Austin Group Bug Tracker


The following issue has been SUBMITTED. 
== 
http://austingroupbugs.net/view.php?id=1273 
== 
Reported By:         stephane
Assigned To:
==
Project:             1003.1(2016)/Issue7+TC2
Issue ID:            1273
Category:            Shell and Utilities
Type:                Error
Severity:            Objection
Priority:            normal
Status:              New
Name:                Stephane Chazelas
Organization:
User Reference:
Section:             glob()
Page Number:         1109, 1110 (in 2018 edition)
Line Number:         35742, 35768
Interp Status:       ---
Final Accepted Text:
==
Date Submitted:      2019-07-27 10:49 UTC
Last Modified:       2019-07-27 10:49 UTC
==
Summary:             glob()'s GLOB_ERR/errfunc and non-directory files
Description: 
In the XSH glob() specification, 

For GLOB_ERR, the spec says:

> Cause glob() to return when it encounters a directory that it
> cannot open or read. Ordinarily, glob() continues to find
> matches.

(Note: it's not clear what "Ordinarily" means here. When errfunc
is set and returns non-zero, glob() doesn't continue, is it
ordinary?).

For errfunc:

> If, during the search, a directory is encountered that cannot
> be opened or read and errfunc is not a null pointer, glob()
> calls (*errfunc()) with two arguments.
[...]
>   2. The eerrno argument is the value of errno from the
>  failure, as set by opendir(), readdir(), or stat().
>  (Other values may be used to report other errors not
>  explicitly documented for those functions.)

(Note: does that mean glob() has to call those 3 functions (as
opposed to open(O_DIRECTORY)/getdents() or any other API)? Why
stat(), shouldn't that be lstat()?)

First (and that's still not the case I'm making here), it's not
obvious what /directories/ glob() will try to open.

It can be somewhat inferred from the spec, as the pathname
expansion specification refers to directories that must be
readable (which implies they are going to be read) and some that
only need to be searchable (implying they're not going to be
read).

But maybe the spec should be more explicit, as it's not obvious,
for instance, that in */*.c the current directory and all the
subdirs are going to be read, while in */foo.c only the current
directory is read (and each subdir/foo.c is lstat()ed), so if
there's a non-readable subdir, only the former will fail (or
cause errfunc to be invoked).
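
To make the */foo.c case concrete, here is a simplified sketch of how such
an expansion might proceed. It is not taken from any real glob(); the
function name count_star_slash_name is made up, and a real implementation
would collect pathnames instead of counting them.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical sketch of expanding "top/STAR/name" where name has no
 * wildcard: only top is opened and read, each candidate is lstat()ed.
 * Returns the number of matches, or -1 if top cannot be opened. */
static int count_star_slash_name(const char *top, const char *name)
{
    DIR *d = opendir(top);
    struct dirent *e;
    struct stat st;
    char path[4096];
    int len, n = 0;

    if (d == NULL)
        return -1;      /* the only failure errfunc could be told about */
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue;   /* "*" does not match hidden names */
        len = snprintf(path, sizeof path, "%s/%s/%s", top, e->d_name, name);
        if (len < 0 || (size_t)len >= sizeof path)
            continue;   /* candidate too long for this sketch */
        if (lstat(path, &st) == 0)
            n++;        /* lstat() failures are silently dropped */
    }
    closedir(d);
    return n;
}
```

The subdirectories themselves are never opened, so a subdir that is
searchable but not readable goes unnoticed here, whereas */*.c would have to
opendir() it.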

Now, to get to the point, the spec refers to "directories" that
can't be opened.

What about a /etc/passwd/*.c glob? /etc/passwd is not a
directory, so opendir("/etc/passwd"), if called, would fail with
ENOTDIR. Does that mean glob() should not call opendir() here, or
that it should ignore opendir()'s error when errno is ENOTDIR?

What about */*.c where there's at least one non-directory
non-hidden file in the current directory? What if there's a
broken symlink or a symlink to a file that is not accessible
(and so for which we can't tell whether the symlink is a
directory or not)?

I've done tests with the FreeBSD 12.0, Solaris 10 and GNU libc
2.27 implementations of glob() and they all differ
significantly, the Solaris one being the least compliant with
what I infer the spec to require, and FreeBSD's the most.

On Solaris /etc/passwd/*.c glob(GLOB_ERR) fails (and calls
errfunc with /etc/passwd, ENOTDIR), same for */*.c in a
directory that contains a non-hidden regular file.

Only FreeBSD's glob(GLOB_ERR) doesn't fail on non-existent/*.c
or */*.c in a directory that contains a broken symlink. The
other two call errfunc with ENOENT.

For */*.c in a directory that contains a symlink to a
non-accessible area, they all fail (call errfunc with EACCES).
Same with */*/*.c if the current directory contains a subdir
that is readable but not searchable (note that whether glob()
could tell whether entries of that directory are directories or
not depends on whether readdir() returns that information or
not; either way, we can't tell for symlinks).
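
Comparisons of this kind can be reproduced with a small probe along the
following lines. This is a sketch, not the program used for the tests
above; probe, probe_errfunc and the two counters are names chosen for the
example.

```c
#include <glob.h>
#include <stdio.h>
#include <string.h>

static int probe_calls;       /* errfunc invocations during the last probe */
static int probe_last_errno;  /* eerrno passed to the most recent one */

/* Records each report and returns 0, so that only GLOB_ERR can abort. */
static int probe_errfunc(const char *epath, int eerrno)
{
    probe_calls++;
    probe_last_errno = eerrno;
    fprintf(stderr, "  errfunc(\"%s\", %s)\n", epath, strerror(eerrno));
    return 0;
}

/* Runs one glob() call and prints what the implementation did. */
static int probe(const char *pattern, int flags)
{
    glob_t g;
    int rc;

    memset(&g, 0, sizeof g);  /* keeps globfree() safe whatever glob() does */
    probe_calls = 0;
    probe_last_errno = 0;
    rc = glob(pattern, flags, probe_errfunc, &g);
    fprintf(stderr, "%s: rc=%d, %lu match(es), %d errfunc call(s)\n",
            pattern, rc, (unsigned long)g.gl_pathc, probe_calls);
    globfree(&g);
    return rc;
}
```

Calling probe("/etc/passwd/*.c", GLOB_ERR) and probe("*/*.c", GLOB_ERR) in a
directory prepared with a regular file, a broken symlink and an unreadable
subdirectory should show the differences described above.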

Desired Action: 
At this point, I just want to start the discussion as to how
best to fix it.

- The "ordinarily" should probably be changed to "if errfunc is
  NULL"

- I don't think we want to force implementations to literally
  call opendir()/readdir()/lstat() (in any case, that "stat()"
  is wrong). Not sure how to phrase it though.

- we should probably clarify which directories glob() is meant
  to try opening, or which files glob() is meant to invoke
  opendir() or equivalent on.

- and then what to do for