Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-02 Thread Wojtek Lerch via austin-group-l at The Open Group
> Yes I made the flexible member a "short" on purpose -- I wanted that
> byte of padding before the flexible array.
>  
>   
>  
> No, the sizeof can't be 5 or 6 unless the implementation is okay with
> unaligned access.  If I declare an array of these structs, the int32
> inside each element needs to be aligned to a multiple of 4 --
> therefore the size of the struct must be a multiple of 4 as well. 
> The same applies to a struct without a flexible member.
>  
>   
>  
> No, the requirements on sizeof have nothing to do with how many flex
> members are "present".  All that is required is that the sizeof is
> either the same as it would be for a struct without the flexible
> member (which is still 8, on any implementation that requires
> alignment), or greater, if the struct requires more padding
> (presumably also for alignment).  Apart from that, the C standard
> says nothing about whether there's enough room between the offsetof
> and the sizeof for one or more elements of the flexible array.
>  
>   
>  
> What you described with malloc() has nothing to do with what the C
> standard refers to as “padding”.
>  
>   
>  
> Also, while I understand the need to page-align data structures in
> some situations, I still don’t see its relevance to a discussion of
> the C standard’s requirements regarding padding in struct types and
> how it’s affected by flexible arrays.
>  
>   
>  
> From: shwaresyst <mailto:shwares...@aol.com> 
> Sent: September 2, 2020 1:58 PM
> To: Wojtek Lerch <mailto:wle...@blackberry.com>; 
> mailto:austin-group-l@opengroup.org
> Subject: RE: [1003.1(2013)/Issue7+TC1 697]: Adding of a
> getdirentries() function
>  
>   
>  
> That example still has a byte of added padding, or the offsetof would
> be 5. The sizeof value is just incorrect, as it assumes one flex
> member is present. It should be 5 or 6, and which is the required
> value is what is ambiguous.
>  
> As you say, these are used most often with malloc(). Padding after
> the array is usually an artifact of this operation. You do a
> malloc(12) and you may get 16 or 32 bytes actually allocated. Mapping
> this as a short s[] an application can safely access s[5], but a
> compiler may not block an access to s[7] too, in that the memory for
> it is allocated. You map a long long l[] and you can only access l[0]
> safely, the remaining 4 bytes out of the 12 plus what malloc adds are
> tail padding, but a compiler may allow an l[1] access because the
> total allocated permits it.
>  
> I mentioned page aligned because when you are buffering multiple
> sectors directly from media the malloc()s for these will usually be
> in multiples of pages, and efficient management of these happens when
> these don't straddle pages so are page aligned too. Such isn't
> required by the standard, but it's common enough as desirable
> aligned_alloc() was added. As I've seen no one use FLA as an acronym
> for flexible array, I consider VLA as applying to any array of
> indeterminate size, sorry if this confuses anyone.
>  
>   
>  
> On Tuesday, September 1, 2020 Wojtek Lerch <mailto:wle...@blackberry.com>
> wrote:
>  
> My understanding is that they meant to allow an implementation where
>  “struct a { int32_t x; char y; short flex[]; }”  produces
>  sizeof(struct a)==8  but  offsetof(struct a,flex)==6.
>  
>  
>  
> I don’t like that they talk about padding “after” the flexible member
> – since the flexible array has a flexible size, rather than a

RE: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-02 Thread shwaresyst via austin-group-l at The Open Group

No, it does not need to be aligned to a multiple of 4, except on some lame RISC 
architectures. The logical model is that unaligned accesses are always permitted; 
aligned accesses are the exception, not the rule. This is why the wording is that 
padding bytes "may" be added, not "shall" be added. The standard expects 
applications to use int_fastN_t or int_leastN_t types if they want to take 
advantage of platform-specific alignment optimizations. The allocation 
functions only recently added the only alignment requirement, namely that any 
pointer returned be aligned for an access to an intmax_t value, and that the 
region be minimally sizeof(intmax_t) in length.
On Wednesday, September 2, 2020 Wojtek Lerch  wrote:
Yes I made the flexible member a "short" on purpose -- I wanted that byte of 
padding before the flexible array.
 
  
 
No, the sizeof can't be 5 or 6 unless the implementation is okay with unaligned 
access.  If I declare an array of these structs, the int32 inside each element 
needs to be aligned to a multiple of 4 -- therefore the size of the struct must 
be a multiple of 4 as well.  The same applies to a struct without a flexible 
member.
 
  
 
No, the requirements on sizeof have nothing to do with how many flex members 
are "present".  All that is required is that the sizeof is either the same as 
it would be for a struct without the flexible member (which is still 8, on any 
implementation that requires alignment), or greater, if the struct requires 
more padding (presumably also for alignment).  Apart from that, the C standard 
says nothing about whether there's enough room between the offsetof and the 
sizeof for one or more elements of the flexible array.
 
  
 
What you described with malloc() has nothing to do with what the C standard 
refers to as “padding”.
 
  
 
Also, while I understand the need to page-align data structures in some 
situations, I still don’t see its relevance to a discussion of the C standard’s 
requirements regarding padding in struct types and how it’s affected by 
flexible arrays.
 
  
 
From: shwaresyst  
Sent: September 2, 2020 1:58 PM
To: Wojtek Lerch ; austin-group-l@opengroup.org
Subject: RE: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() 
function
 
  
 
That example still has a byte of added padding, or the offsetof would be 5. The 
sizeof value is just incorrect, as it assumes one flex member is present. It 
should be 5 or 6, and which is the required value is what is ambiguous.
 
As you say, these are used most often with malloc(). Padding after the array is 
usually an artifact of this operation. You do a malloc(12) and you may get 16 
or 32 bytes actually allocated. Mapping this as a short s[] an application can 
safely access s[5], but a compiler may not block an access to s[7] too, in that 
the memory for it is allocated. You map a long long l[] and you can only access 
l[0] safely, the remaining 4 bytes out of the 12 plus what malloc adds are tail 
padding, but a compiler may allow an l[1] access because the total allocated 
permits it.
 
I mentioned page aligned because when you are buffering multiple sectors 
directly from media the malloc()s for these will usually be in multiples of 
pages, and efficient managem

Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-02 Thread Steffen Nurpmeso via austin-group-l at The Open Group
Hello Jörg.

Joerg Schilling wrote in
 <5f4fabb0.NZ6ZB9gXMVdfs/6x%joerg.schill...@fokus.fraunhofer.de>:
 |Steffen Nurpmeso via austin-group-l at The Open Group  wrote:
 |
 |> I personally would say that these should be skipped.  The data is
 |> copied over to user buffers, and these entries are simply not
 |> copied.  That seems to be the best.  The Group does not seem to
 |> want to add DT_WHITEOUT or similar things.
 |
 |A nice idea from 1986 from SunOS-3.5 that did not make it into SVr4...
 |
 |The question is whether this is POSIX compliant at all. If you like to see
 |such entries, I would expect that you need to open() the directory with
 |a specific open flag first.
 |
 |So my questions:
 |
 |- When do you see such entries?
 |
 |- What happens when you stat() such a name?

These are good questions.  I have never seen them myself, i never
used union mounts (on *BSD).  In the tracker discussion you will
find myself digging through FreeBSD C library code, and i think it
was no good.  To answer your questions, i seem to recall that
whiteout is used on union mounts, and i think if you try to stat
one of those their overlay that exists in an upper layer is found
instead.

The Plan9 operating system of Bell Labs as developed by the real,
real heroes of the scene makes (made, but for the still existing
and continued 9front fork) heavy use of such bind mounts.  The
Plan9Port code base which makes lots of this code available on
POSIX systems makes use of the getdents/direntries system calls
for its directory listings, that much i remember.  I would have to look
at how _they_ handle such entries in their bind mounts.

--steffen
|
|Der Kragenbaer,The moon bear,
|der holt sich munter   he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-02 Thread Steffen Nurpmeso via austin-group-l at The Open Group
Philip Guenther via austin-group-l at The Open Group wrote in
 :
 |On Tue, Sep 1, 2020 at 1:20 PM Steffen Nurpmeso  wrote:
 |> Philip Guenther wrote in
 |>  :
 |>|On Tue, Sep 1, 2020 at 6:22 AM Steffen Nurpmeso via austin-group-l at The
 |>|Open Group  wrote:
 |>|> Robert Elz via austin-group-l at The Open Group wrote in
 |>|>  <9252.1598969...@jinx.noi.kre.to>:
 |>|>|Date:Tue, 1 Sep 2020 10:32:55 +0100
 |>|>|From:"Geoff Clare via austin-group-l at The Open Group" \
 |>|>|
 |>|>|Message-ID:  <20200901093255.GA7629@localhost>
 |>|>   ...
 |>|>|What's more important is what happens if the application buffer isn't
 |>|big enough for the next entry.  What do the existing getdents()
 |>|>|implementations do in that case?   If they're all the same then
 |>  ..
 |>|> Isn't that covered nicely by the posted text?  There must be space
 |>|> for at least one entry, otherwise EINVAL occurs?  And upon success
 |>  ..
 |>|A quick review of FreeBSD, NetBSD, and OpenBSD finds they all return
 |>|EINVAL if the buffer isn't "big enough".
 |>|For OpenBSD, the minimum buffer size is 512 bytes; if I'm reading it
 |>|correctly NetBSD is similar, possibly varying based on filesystem
 |>|formatting.
 |>|FreeBSD requires space for the next entry.
 |>
 |> They document that "the size must be greater than or equal to the
 |> block size associated with the file", but i cannot find this "next
 |> entry" of yours?
 |>
 |
 |What document are you quoting from?

  $ git show origin/master:lib/libc/sys/getdirentries.2|mandoc|less

It uses "next" only in conjunction with seeking or entry hopping.
But if you used "next" to mean "at least one", then this
is just a misunderstanding.

 |I actually tested the behavior of FreeBSD 11.3 getdirentries(2), after
 |looking at the code.
 |The manpage on that system says:

..EINVAL, yes..

 |...
 |>|At least NetBSD and OpenBSD will return an entry with d_ino == 0 if the
 |>|first entry in a block is removed.  I suspect others may do this as well;
 |>|glibc at least includes code to skip such entries in its generic
 |>|readdir() implementation.
 |>|
 |>|The question really is "is this supposed to be an API that can be
 |>|trivially supported by all the existing versions, even if that makes it
 |>|more clunky to use, or should it be easy to use even if every single
 |>|existing implementation needs to bend?"
 |>
 |> But .. it is already supported by all, and it has always been
 |> used?!
 |
 |As I've been describing, that is not true for at least NetBSD and OpenBSD:
 | * they both require the buffer to be some size larger than a single entry
 | * they both can return entries with d_ino == 0 and d_name that doesn't
 |   correspond to a file in the directory
 |
 |"The same, but different" means "NOT THE SAME".  Code that follows the
 |proposed description on those points would not behave as expected if used
 |with NetBSD's or OpenBSD's getdents(2).

All i can say is that i would skip those entries.
There is no undeletion facility, so passing as much information as
possible about directory content is fine, but it must be somehow
useful in the end; and then specific programs which include very
specific headers and use very specific ioctls can do _that_ job.
My opinion.
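To make the "skip those entries" option concrete, here is a minimal sketch of a reader that hops through a buffer of records chained by d_reclen and ignores slots whose d_ino is 0, the case Philip describes for NetBSD and OpenBSD. The layout struct demo_dent and the helpers demo_put() and demo_count_live() are hypothetical, chosen for illustration only; real getdents() records differ between systems in member types and order.

```c
#include <assert.h>
#include <limits.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdlib.h>   /* malloc() gives storage aligned for any record */
#include <string.h>

/* Hypothetical record layout for illustration; real getdents() records
   use other member types and orders. */
struct demo_dent {
    unsigned long  d_ino;     /* 0 marks a deleted slot on some systems */
    unsigned short d_reclen;  /* bytes from this record to the next one */
    char           d_name[];  /* NUL-terminated file name */
};

/* Round n up so that the next record starts suitably aligned. */
static size_t demo_round(size_t n)
{
    size_t a = alignof(struct demo_dent);
    return (n + a - 1) / a * a;
}

/* Test helper: write one record at byte offset off of buf (which must be
   suitably aligned, e.g. from malloc()) and return the next offset. */
size_t demo_put(char *buf, size_t off, unsigned long ino, const char *name)
{
    struct demo_dent *d = (struct demo_dent *)(buf + off);
    size_t namelen = strlen(name) + 1;                 /* include the NUL */
    size_t reclen = demo_round(offsetof(struct demo_dent, d_name) + namelen);

    assert(reclen <= USHRT_MAX);
    d->d_ino = ino;
    d->d_reclen = (unsigned short)reclen;
    memcpy(d->d_name, name, namelen);
    return off + reclen;
}

/* Hop from record to record by adding d_reclen and count the entries
   that refer to a file, skipping d_ino == 0 slots. */
size_t demo_count_live(const char *buf, size_t nbytes)
{
    size_t live = 0;

    for (size_t off = 0; off < nbytes;) {
        const struct demo_dent *d = (const struct demo_dent *)(buf + off);

        if (d->d_reclen == 0)
            break;                  /* malformed record: avoid looping */
        if (d->d_ino != 0)
            live++;
        off += d->d_reclen;
    }
    return live;
}
```

A real reader would hand the surviving names to its caller rather than count them; the point is only that skipping costs one extra test per record.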

  ...
 |>|If the former, then define a minimum buffer size
 |>|(pathconf(_PC_DIRBUFMIN)...?), permit d_ino==0 as entries where d_name and
 |>
 |> However there is _PC_NAME_MAX already, and so the number must be
 |> nearby, no?  Isn't that overengineering?
 |
 |The buffers required by NetBSD and OpenBSD are larger than and unrelated to
 |the value returned by pathconf(_PC_NAME_MAX) and instead related to a block
 |size of the filesystem.

In fact i personally would not call this function with less than
a page full of memory, conditionally more.  That reminds me that
it is a pity that there is no "get_usable_size" for a size and
"get_usable_size_ptr" (or so, overloading not in C) for malloc.
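A minimal sketch of the "at least a page" rule of thumb, extended with the st_blksize of the directory to reflect the FreeBSD wording quoted earlier in this thread ("greater than or equal to the block size associated with the file"). dirbuf_size() is a hypothetical helper; sysconf(_SC_PAGESIZE) and fstat() are standard POSIX, but the result is not guaranteed to match the minimum that NetBSD or OpenBSD actually enforce.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: choose a buffer size for reading directory entries
   from the open directory fd. Policy: at least one page, and at least the
   st_blksize that fstat() reports for the directory. */
size_t dirbuf_size(int fd)
{
    size_t size = 4096;                  /* fallback if sysconf() fails */
    long page = sysconf(_SC_PAGESIZE);
    struct stat st;

    assert(fd >= 0);
    if (page > 0)
        size = (size_t)page;
    if (fstat(fd, &st) == 0 && st.st_blksize > 0 &&
        (size_t)st.st_blksize > size)
        size = (size_t)st.st_blksize;
    return size;
}
```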

 |>|d_type are unspecified, and let d_name be either a fixed fix array or a
 |>|flexible array member.
 |>
 |> You know, my personal position would be to just skip those entries
 |> when copying data over to user buffers.  The costs of walking over
 |> the user buffer once (if it is done like that, and that memory
 |> should be hot even, then) seem to be low compared to collecting
 |> the directory entry information.
 |
 |Sure, I'm fine with this new API specifying that those deleted entries be
 |suppressed, but it's inconsistent to do that and then insist that d_name's
 |nature be unspecified to make the API compatible with existing
 |implementation, despite making it more annoying to program with.

As far as i understood the d_name nature was about reuse of
existing structures which use fixed-size names, and has nothing to
do with "currently" non-existing files, to which i count whiteouts
too, by the way.

 |>|If the latter, then require it to work with very small buffers, require all
 

RE: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-02 Thread Wojtek Lerch via austin-group-l at The Open Group
Yes I made the flexible member a "short" on purpose -- I wanted that byte of 
padding before the flexible array.



No, the sizeof can't be 5 or 6 unless the implementation is okay with unaligned 
access.  If I declare an array of these structs, the int32 inside each element 
needs to be aligned to a multiple of 4 -- therefore the size of the struct must 
be a multiple of 4 as well.  The same applies to a struct without a flexible 
member.
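The stride argument can be checked with a short C11 sketch. One caveat, assuming C11 6.7.2.1p3: a structure with a flexible array member may not itself be an element of an array, so the array part of the argument is shown on the plain counterpart struct a_plain (a hypothetical name), while static_assert records the rule that the flexible member can only add trailing padding.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Wojtek's example, plus the same struct with the flexible member removed. */
struct a       { int32_t x; char y; short flex[]; };
struct a_plain { int32_t x; char y; };

/* Offset of arr[i].x in an array of struct a_plain: element i starts at
   i * sizeof(struct a_plain), so x is only aligned in every element if the
   size is a multiple of the structure's alignment. */
size_t x_offset_in_array(size_t i)
{
    return i * sizeof(struct a_plain) + offsetof(struct a_plain, x);
}

/* Holds on every conforming implementation: sizeof is always a multiple
   of the alignment, so array elements stay aligned. */
static_assert(sizeof(struct a_plain) % alignof(struct a_plain) == 0,
              "array stride preserves alignment");

/* C11 6.7.2.1p18: struct a is laid out as if flex were omitted, except
   that it may have more trailing padding. */
static_assert(sizeof(struct a) >= sizeof(struct a_plain),
              "the flexible member can only add trailing padding");
```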



No, the requirements on sizeof have nothing to do with how many flex members 
are "present".  All that is required is that the sizeof is either the same as 
it would be for a struct without the flexible member (which is still 8, on any 
implementation that requires alignment), or greater, if the struct requires 
more padding (presumably also for alignment).  Apart from that, the C standard 
says nothing about whether there's enough room between the offsetof and the 
sizeof for one or more elements of the flexible array.



What you described with malloc() has nothing to do with what the C standard 
refers to as “padding”.



Also, while I understand the need to page-align data structures in some 
situations, I still don’t see its relevance to a discussion of the C standard’s 
requirements regarding padding in struct types and how it’s affected by 
flexible arrays.

From: shwaresyst 
Sent: September 2, 2020 1:58 PM
To: Wojtek Lerch ; austin-group-l@opengroup.org
Subject: RE: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() 
function


That example still has a byte of added padding, or the offsetof would be 5. The 
sizeof value is just incorrect, as it assumes one flex member is present. It 
should be 5 or 6, and which is the required value is what is ambiguous.

As you say, these are used most often with malloc(). Padding after the array is 
usually an artifact of this operation. You do a malloc(12) and you may get 16 
or 32 bytes actually allocated. Mapping this as a short s[] an application can 
safely access s[5], but a compiler may not block an access to s[7] too, in that 
the memory for it is allocated. You map a long long l[] and you can only access 
l[0] safely, the remaining 4 bytes out of the 12 plus what malloc adds are tail 
padding, but a compiler may allow an l[1] access because the total allocated 
permits it.

I mentioned page aligned because when you are buffering multiple sectors 
directly from media the malloc()s for these will usually be in multiples of 
pages, and efficient management of these happens when these don't straddle 
pages so are page aligned too. Such isn't required by the standard, but it's 
common enough as desirable aligned_alloc() was added. As I've seen no one use 
FLA as an acronym for flexible array, I consider VLA as applying to any array 
of indeterminate size, sorry if this confuses anyone.


On Tuesday, September 1, 2020 Wojtek Lerch <wle...@blackberry.com> wrote:

My understanding is that they meant to allow an implementation where  “struct a 
{ int32_t x; char y; short flex[]; }”  produces  sizeof(struct a)==8  but  
offsetof(struct a,flex)==6.
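A common way to stay independent of how much trailing padding an implementation places after offsetof(struct a, flex) is to allocate the larger of sizeof(struct a) and the offset plus the element storage. A minimal sketch, with a_size() and a_new() as hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct a { int32_t x; char y; short flex[]; };

/* Bytes to allocate for a struct a whose flex[] holds n elements. The
   tight size is offsetof + n * sizeof(short); taking the maximum with
   sizeof(struct a) also covers small n, e.g. n == 0 when sizeof is 8
   but offsetof(struct a, flex) is 6. */
size_t a_size(size_t n)
{
    size_t tight;

    assert(n <= (SIZE_MAX - offsetof(struct a, flex)) / sizeof(short));
    tight = offsetof(struct a, flex) + n * sizeof(short);
    return tight > sizeof(struct a) ? tight : sizeof(struct a);
}

/* Allocate a struct a with room for n flex elements (the contents of
   flex[] are left uninitialized). Returns NULL on allocation failure. */
struct a *a_new(size_t n)
{
    struct a *p = malloc(a_size(n));

    if (p != NULL) {
        p->x = 0;
        p->y = 0;
    }
    return p;
}
```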



I don’t like that they talk about padding “after” the flexible member – since 
the flexible array has a flexible size, rather than a zero size, that padding 
really overlaps the beginning of the array.



Personally I think that the standard could be made clearer if a structure with 
a flexible member were considered an incomplete type.  You wouldn’t be allowed 
to apply sizeof to it at all, and you wouldn’t be able to declare objects whose 
type is the structure, but you could still use pointers to it and dereference 
members – since the main purpose of such structures is to allocate them via 
malloc(), I don’t think anybody would mind those restrictions.



Also, I don’t understand why struct s would need to be page aligned or why you 
mention a VLA.  A flexible array is not a VLA, in the sense C uses the term.



From: shwaresyst <shwares...@aol.com>
Sent: September 1, 2020 4:55 PM
To: Wojtek Lerch <wle...@blackberry.com>; austin-group-l@opengroup.org
Subject: RE: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() 
function



What that refers to, it looks, is any tail padding for the structure as a 
whole. The standard still permits internal padding between individual fields as 
required, e.g. a struct s { short a; double b[]; } might need 6 bytes of this 
padding to align access for b[0]. This would still be needed if b[] only has a 
few members as a VLA but s is being page aligned, and so would reserve a lot of 
tail padding too. There would be 2 padding regions, however, is what that 
change forces.





On Tuesday, September 1, 2020 Wojtek Lerch <wle...@blackberry.com> wrote:

Actually the intent was the opposite.  The original C99 did contain a wording 
that matches y

RE: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-02 Thread shwaresyst via austin-group-l at The Open Group

That example still has a byte of added padding, or the offsetof would be 5. The 
sizeof value is just incorrect, as it assumes one flex member is present. It 
should be 5 or 6, and which is the required value is what is ambiguous.


As you say, these are used most often with malloc(). Padding after the array is 
usually an artifact of this operation. You do a malloc(12) and you may get 16 
or 32 bytes actually allocated. Mapping this as a short s[] an application can 
safely access s[5], but a compiler may not block an access to s[7] too, in that 
the memory for it is allocated. You map a long long l[] and you can only access 
l[0] safely, the remaining 4 bytes out of the 12 plus what malloc adds are tail 
padding, but a compiler may allow an l[1] access because the total allocated 
permits it.

I mentioned page aligned because when you are buffering multiple sectors 
directly from media the malloc()s for these will usually be in multiples of 
pages, and efficient management of these happens when these don't straddle 
pages so are page aligned too. Such isn't required by the standard, but it's 
common enough as desirable aligned_alloc() was added. As I've seen no one use 
FLA as an acronym for flexible array, I consider VLA as applying to any array 
of indeterminate size, sorry if this confuses anyone.
On Tuesday, September 1, 2020 Wojtek Lerch  wrote:
My understanding is that they meant to allow an implementation where  “struct a 
{ int32_t x; char y; short flex[]; }”  produces  sizeof(struct a)==8  but  
offsetof(struct a,flex)==6.
 
  
 
I don’t like that they talk about padding “after” the flexible member – since 
the flexible array has a flexible size, rather than a zero size, that padding 
really overlaps the beginning of the array.
 
  
 
Personally I think that the standard could be made clearer if a structure with 
a flexible member were considered an incomplete type.  You wouldn’t be allowed 
to apply sizeof to it at all, and you wouldn’t be able to declare objects whose 
type is the structure, but you could still use pointers to it and dereference 
members – since the main purpose of such structures is to allocate them via 
malloc(), I don’t think anybody would mind those restrictions.
 
  
 
Also, I don’t understand why struct s would need to be page aligned or why you 
mention a VLA.  A flexible array is not a VLA, in the sense C uses the term.
 
  
 
From: shwaresyst  
Sent: September 1, 2020 4:55 PM
To: Wojtek Lerch ; austin-group-l@opengroup.org
Subject: RE: [1003.1(2013)/Issue7+TC1 697]: Adding of a getdirentries() 
function
 
  
 
What that refers to, it looks, is any tail padding for the structure as a 
whole. The standard still permits internal padding between individual fields as 
required, e.g. a struct s { short a; double b[]; } might need 6 bytes of this 
padding to align access for b[0]. This would still be needed if b[] only has a 
few members as a VLA but s is being page aligned, and so would reserve a lot of 
tail padding too. There would be 2 padding regions, however, is what that 
change forces.
 
  
 
On Tuesday, September 1, 2020 Wojtek Lerch  wrote:
 
Actually the intent was the opposite.  The original C99 did contain a wording 
that matches your interpretation:
 
 
 
… the size of the structure shall be equal to the offset of the last element of 
an otherwise identical structure that replaces the flexible array member with 
an array of unspecified length.
 
 
 
But this was reported as a defect, and corrected in TC2.
 
 
 
Summary
 6.7.2.1 Structure and union specifiers, paragraphs 15 and 16 require that any 
padding for alignment of a structure containing a flexible array member must 
precede the flexible array member.  This contradicts existing

Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-02 Thread Wojtek Lerch via austin-group-l at The Open Group

On 2020-09-02 10:34, Joerg Schilling wrote:

Wojtek Lerch via austin-group-l at The Open Group 
 wrote:


A structure member can be a "flexible array" in standard C, but that's not the 
same thing as a VLA.

Are you speaking about array[] in contrast to array[size] with size being a
variable?



Yes.

(Pedantically speaking, the "size" can be any arbitrary expression 
rather than just a variable; either way, because it has to be computed 
at runtime, such syntax is only allowed inside a function.)
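To illustrate the distinction, a minimal sketch contrasting the two: a flexible array member (empty brackets, only as the last member of a struct, storage supplied by malloc()) and a VLA (length computed at run time, only at block scope). The names struct name, name_new() and sum_squares() are hypothetical. VLAs are optional since C11, hence the __STDC_NO_VLA__ guard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Flexible array member: "char text[]" has no size and is only allowed
   as the last member of a struct; malloc() supplies the storage. */
struct name {
    size_t len;
    char text[];
};

struct name *name_new(const char *s)
{
    size_t len;
    struct name *p;

    assert(s != NULL);
    len = strlen(s);
    /* sizeof(struct name) may include padding that overlaps text[];
       over-allocating by that amount is harmless. */
    p = malloc(sizeof(struct name) + len + 1);
    if (p == NULL)
        return NULL;
    p->len = len;
    memcpy(p->text, s, len + 1);
    return p;
}

#ifndef __STDC_NO_VLA__
/* Variable length array: the length of sq[] is evaluated at run time.
   Allowed for block-scope objects like this one, never as a struct member. */
long sum_squares(size_t n)
{
    long total = 0;

    if (n == 0)
        return 0;           /* a VLA must have a length greater than zero */
    long sq[n];
    for (size_t i = 0; i < n; i++) {
        sq[i] = (long)(i * i);
        total += sq[i];
    }
    return total;
}
#endif
```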




Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-02 Thread Joerg Schilling via austin-group-l at The Open Group
Wojtek Lerch via austin-group-l at The Open Group 
 wrote:

> A structure member can be a "flexible array" in standard C, but that's not 
> the same thing as a VLA.

Are you speaking about array[] in contrast to array[size] with size being a 
variable?

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-02 Thread Joerg Schilling via austin-group-l at The Open Group
Steffen Nurpmeso via austin-group-l at The Open Group 
 wrote:

> I personally would say that these should be skipped.  The data is
> copied over to user buffers, and these entries are simply not
> copied.  That seems to be the best.  The Group does not seem to
> want to add DT_WHITEOUT or similar things.

A nice idea from 1986 from SunOS-3.5 that did not make it into SVr4...

The question is whether this is POSIX compliant at all. If you like to see
such entries, I would expect that you need to open() the directory with
a specific open flag first.

So my questions:

-   When do you see such entries?

-   What happens when you stat() such a name?

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-02 Thread Geoff Clare via austin-group-l at The Open Group
Steffen Nurpmeso wrote, on 01 Sep 2020:
> 
>  |Do the existing implementations ever return such things?   Do they
>  |hide them by making the reclen of the previous entry (if there is
>  |one in the buffer) bigger, or do they squash them out, moving the
>  |next existing entry down to follow immediately after the previous one
>  |(where all the reclen's are as small as possible to contain the
>  |struct header, the name (and its \0) and alignment padding.)   This is
>  |a case where we don't necessarily need to specify one scheme that
>  |must be used - we can leave that for the implementation, as long as
>  |applications are informed what might happen.
> 
> The proposed text says that filenames are NUL terminated and
> hopping from entry to entry happens by adding the reclen to the
> current entry (cast to char *).  So it seems there could be data
> in between.

I have put a proposed rationale addition in the etherpad to make it
clear that this solution is allowed:

Some existing getdents() functions include deleted
directory entries in buf, marked with a special value of
one of the structure members. This behavior is not allowed for
posix_getdents(), although the data from a deleted
directory entry may be present in buf in the form of extra
padding on the end of the previous entry.
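The arrangement this rationale allows, where a deleted entry survives only as extra padding at the end of the previous entry, can be sketched by growing the previous record's d_reclen to cover the deleted one. This assumes a hypothetical record layout (struct demo_dent, with demo_put() and demo_fold_deleted() as illustrative helpers), not the actual posix_getdents() structure; a deleted first record has no predecessor and is left alone in this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record layout, for illustration only. */
struct demo_dent {
    unsigned long  d_ino;     /* 0 marks a deleted entry */
    unsigned short d_reclen;  /* bytes from this record to the next one */
    char           d_name[];  /* NUL-terminated file name */
};

static size_t demo_round(size_t n)
{
    size_t a = alignof(struct demo_dent);
    return (n + a - 1) / a * a;
}

/* Test helper: write one record at offset off (buf suitably aligned,
   e.g. from malloc()) and return the offset of the next record. */
size_t demo_put(char *buf, size_t off, unsigned long ino, const char *name)
{
    struct demo_dent *d = (struct demo_dent *)(buf + off);
    size_t namelen = strlen(name) + 1;
    size_t reclen = demo_round(offsetof(struct demo_dent, d_name) + namelen);

    assert(reclen <= USHRT_MAX);
    d->d_ino = ino;
    d->d_reclen = (unsigned short)reclen;
    memcpy(d->d_name, name, namelen);
    return off + reclen;
}

/* Fold each deleted record into the preceding record's d_reclen, so its
   bytes become trailing padding of that record. Returns the number of
   records a reader hopping by d_reclen will now visit. */
size_t demo_fold_deleted(char *buf, size_t nbytes)
{
    struct demo_dent *prev = NULL;
    size_t visible = 0;

    for (size_t off = 0; off < nbytes;) {
        struct demo_dent *d = (struct demo_dent *)(buf + off);
        size_t len = d->d_reclen;

        if (len == 0)
            break;                          /* malformed record */
        if (d->d_ino == 0 && prev != NULL &&
            (size_t)prev->d_reclen + len <= USHRT_MAX) {
            prev->d_reclen = (unsigned short)(prev->d_reclen + len);
        } else {
            prev = d;
            visible++;
        }
        off += len;
    }
    return visible;
}
```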

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-02 Thread Geoff Clare via austin-group-l at The Open Group
Philip Guenther wrote, on 01 Sep 2020:
>
> If a posix_getdents() implementation returned the names of all the files
> that ever existed in the given directory, including ones that were removed
> before the fd for this call was opened, what requirement in the standard
> would that violate?  I don't see any, thus my suggested wording for such a
> requirement.

It would not comply with the very first sentence of the description:

The posix_getdents() function shall attempt to read directory
entries from the directory associated with the open file
descriptor fildes and shall place information about the directory
entries and the files they refer to in ...

Note "the files they refer to" (present tense).

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-02 Thread Geoff Clare via austin-group-l at The Open Group
Wojtek Lerch wrote, on 01 Sep 2020:
>
> Geoff Clare wrote:
> > We can't require d_name in struct dirent to be a VLA since there are 
> > implementations where it is not.
> 
> Another good reason is that standard C does not allow structure members to be 
> VLAs.

Mea culpa.  I tried to save some typing by using VLA instead of flexible
array member, thinking they amounted to the same thing. Thanks for the
correction.  (Shows how little attention I have paid to both, since
they were - up to now - not relevant to POSIX.)




Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread Philip Guenther via austin-group-l at The Open Group
On Tue, Sep 1, 2020 at 1:20 PM Steffen Nurpmeso  wrote:

> Philip Guenther wrote in
>  :
>  |On Tue, Sep 1, 2020 at 6:22 AM Steffen Nurpmeso via austin-group-l at The
>  |Open Group  wrote:
>  |> Robert Elz via austin-group-l at The Open Group wrote in
>  |>  <9252.1598969...@jinx.noi.kre.to>:
>  |>|Date:Tue, 1 Sep 2020 10:32:55 +0100
>  |>|From:"Geoff Clare via austin-group-l at The Open Group" \
>  |>|
>  |>|Message-ID:  <20200901093255.GA7629@localhost>
>  |>   ...
>  |>|What's more important is what happens if the application buffer isn't
>  |>|big enough for the next entry.  What do the existing getdents()
>  |>|implementations do in that case?   If they're all the same then
>  ..
>  |> Isn't that covered nicely by the posted text?  There must be space
>  |> for at least one entry, otherwise EINVAL occurs?  And upon success
>  ..
>  |A quick review of FreeBSD, NetBSD, and OpenBSD finds they all return
>  |EINVAL if the buffer isn't "big enough".
>  |For OpenBSD, the minimum buffer size is 512 bytes; if I'm reading it
>  |correctly NetBSD is similar, possibly varying based on filesystem
>  |formatting.
>  |FreeBSD requires space for the next entry.
>
> They document that "the size must be greater than or equal to the
> block size associated with the file", but i cannot find this "next
> entry" of yours?
>

What document are you quoting from?

I actually tested the behavior of FreeBSD 11.3 getdirentries(2), after
looking at the code.
The manpage on that system says:

 [EINVAL]   The file referenced by fd is not a directory, or
            nbytes is too small for returning a directory entry
            or block of entries, or the current position pointer is
            invalid.

...

>  |At least NetBSD and OpenBSD will return an entry with d_ino == 0 if the
>  |first entry in a block is removed.  I suspect others may do this as well;
>  |glibc at least includes code to skip such entries in its generic
>  |readdir() implementation.
>  |
>  |The question really is "is this supposed to be an API that can be
>  |trivially supported by all the existing versions, even if that makes it
>  |more clunky to use, or should it be easy to use even if every single
>  |existing implementation needs to bend?"
>
> But .. it is already supported by all, and it has always been
> used?!


As I've been describing, that is not true for at least NetBSD and OpenBSD:
 * they both require the buffer to be some size larger than a single entry
 * they both can return entries with d_ino == 0 and d_name that doesn't
   correspond to a file in the directory

"The same, but different" means "NOT THE SAME".  Code that follows the
proposed description on those points would not behave as expected if used
with NetBSD's or OpenBSD's getdents(2).



> And i like this forward-looking approach that has been
> taken by the group, having that stat(2) call removed is fine.
> (Even though that is easily doable with the current standard and
> fstatat(), which is a totally different situation to twenty years
> ago!  Yay!)
>
>  |If the former, then define a minimum buffer size
>  |(pathconf(_PC_DIRBUFMIN)...?), permit d_ino==0 as entries where d_name and
>
> However there is _PC_NAME_MAX already, and so the number must be
> nearby, no?  Isn't that overengineering?
>

The buffers required by NetBSD and OpenBSD are larger than and unrelated to
the value returned by pathconf(_PC_NAME_MAX) and instead related to a block
size of the filesystem.



>  |d_type are unspecified, and let d_name be either a fixed-size array or a
>  |flexible array member.
>
> You know, my personal position would be to just skip those entries
> when copying data over to user buffers.  The costs of walking over
> the user buffer once (if it is done like that, and that memory
> should be hot even, then) seem to be low compared to collecting
> the directory entry information.
>

Sure, I'm fine with this new API specifying that those deleted entries be
suppressed, but it's inconsistent to do that and then insist that d_name's
nature be unspecified to make the API compatible with existing
implementations, despite making it more annoying to program with.



>  |If the latter, then require it to work with very small buffers, require
>  |all entries to have valid d_name and d_type, and specify d_name as a FAM.
>
> That d_type from the start would be great.
>

If you mean "require d_type to have a real value and never DT_UNKNOWN",
then that's a step further which I don't think any existing getd*ent* API
has taken and which would make this API _slower_  than readdir() on some
implementation+filesystem combos.


Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread shwaresyst via austin-group-l at The Open Group

No, that is not what I would want, nor would anyone else. NAME_MAX doesn't 
guarantee that no d_name will ever be longer than this value; what it says is 
that all drivers for file systems provided by the implementation are capable of 
processing names up to that length. Some of those file systems may support much 
longer names too; the standard leaves that open. Because of this latter 
possibility, no compile-time constant suitable for use in a macro can guarantee 
that EINVAL won't occur. Something that examines the media at runtime is 
required; a macro might be an alias for it, as a wrapper, but something still 
needs to be implemented to be wrapped.
On Tuesday, September 1, 2020 Steffen Nurpmeso  wrote:
shwaresyst wrote in
 <1739483391.1543785.1598977118...@mail.yahoo.com>:
 |No, it couldn't introduce such a macro, because such would have to \
 |assume all d_name entries are the same length. Adding an option to \

Well it has to go for NAME_MAX + the_size_of_posix_dent for each
and every entry, this is what you want here?  Except for what
Philip Guenther said, of course.  But if it would be left
implementation defined then even that could be covered by the
macro, better than by anything else.

I for one feel you are very brave to apply sizeof() to anything
with a "flexible array member", i would not dare that for portable
code.  (But my code has to work with ISO C89 too, so i have to use
macros to switch between [a-number] and [] as applicable, and also
to SIZEOF these types.)

Really, you are very brave!  Just the bugs i had to work around
since 2018 or what for a really tiny set of primitive tools!
(Like some gregarious animal not inlining for -Os, and another
huge one requiring explicit this-> to find superclass fields in
one class, but not the other.)

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter          he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)


Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread Steffen Nurpmeso via austin-group-l at The Open Group
Philip Guenther wrote in
 :
 |On Tue, Sep 1, 2020 at 6:22 AM Steffen Nurpmeso via austin-group-l at The
 |Open Group  wrote:
 |> Robert Elz via austin-group-l at The Open Group wrote in
 |>  <9252.1598969...@jinx.noi.kre.to>:
 |>|Date:Tue, 1 Sep 2020 10:32:55 +0100
 |>|From:"Geoff Clare via austin-group-l at The Open Group" \
 |>|
 |>|Message-ID:  <20200901093255.GA7629@localhost>
 |>   ...
 |>|What's more important is what happens if the application buffer isn't
 |>|big enough for the next entry.  What do the existing getdents()
 |>|implementations do in that case?   If they're all the same then
 ..
 |> Isn't that covered nicely by the posted text?  There must be space
 |> for at least one entry, otherwise EINVAL occurs?  And upon success
 ..
 |A quick review of FreeBSD, NetBSD, and OpenBSD finds they all return EINVAL
 |if the buffer isn't "big enough".
 |For OpenBSD, the minimum buffer size is 512 bytes; if I'm reading it
 |correctly NetBSD is similar, possibly varying based on filesystem
 |formatting.
 |FreeBSD requires space for the next entry.

They document that "the size must be greater than or equal to the
block size associated with the file", but i cannot find this "next
entry" of yours?

 |>|Similarly for what is done for directory pieces that don't contain
 |>|files, on filesystems that allow that (inode number == 0 or perhaps
 |>|a file type for "dummy entry" or something, or whatever).
 |>
 |> I personally would say that these should be skipped.  The data is
 ...
 |> copied.  That seems to be the best.  The Group does not seem to
 |> want to add DT_WHITEOUT or similar things.
 |
 |DT_WHITEOUT is different, related to union mounts.

Yes.  I think it was mentioned in the tracker discussion.

 |>|Do the existing implementations ever return such things?   Do they
 |>
 |> I personally have not seen it, but this likely is a very
 |> filesystem dependent thing, which possibly even changes over time
 |
 |At least NetBSD and OpenBSD will return an entry with d_ino == 0 if the
 |first entry in a block is removed.  I suspect others may do this as well;
 |glibc at least includes code to skip such entries in its generic readdir()
 |implementation.
 |
 |The question really is "is this supposed to be an API that can be trivially
 |supported by all the existing versions, even if that makes it more clunky
 |to use, or should it be easy to use even if every single existing
 |implementation needs to bend?"

But .. it is already supported by all, and it has always been
used?!  And i like this forward-looking approach that has been
taken by the group, having that stat(2) call removed is fine.
(Even though that is easily doable with the current standard and
fstatat(), which is a totally different situation to twenty years
ago!  Yay!)

 |If the former, then define a minimum buffer size
 |(pathconf(_PC_DIRBUFMIN)...?), permit d_ino==0 as entries where d_name and

However there is _PC_NAME_MAX already, and so the number must be
nearby, no?  Isn't that overengineering?

 |d_type are unspecified, and let d_name be either a fixed-size array or a
 |flexible array member.

You know, my personal position would be to just skip those entries
when copying data over to user buffers.  The costs of walking over
the user buffer once (if it is done like that, and that memory
should be hot even, then) seem to be low compared to collecting
the directory entry information.

 |If the latter, then require it to work with very small buffers, require all
 |entries to have valid d_name and d_type, and specify d_name as a FAM.

That d_type from the start would be great.

--steffen



RE: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread Wojtek Lerch via austin-group-l at The Open Group
My understanding is that they meant to allow an implementation where  “struct a 
{ int32_t x; char y; short flex[]; }”  produces  sizeof(struct a)==8  but  
offsetof(struct a,flex)==6.

I don’t like that they talk about padding “after” the flexible member – since 
the flexible array has a flexible size, rather than a zero size, that padding 
really overlaps the beginning of the array.

Personally I think that the standard could be made clearer if a structure with 
a flexible member were considered an incomplete type.  You wouldn’t be allowed 
to apply sizeof to it at all, and you wouldn’t be able to declare objects whose 
type is the structure, but you could still use pointers to it and dereference 
members – since the main purpose of such structures is to allocate them via 
malloc(), I don’t think anybody would mind those restrictions.

Also, I don’t understand why struct s would need to be page aligned or why you 
mention a VLA.  A flexible array is not a VLA, in the sense C uses the term.

From: shwaresyst 
Sent: September 1, 2020 4:55 PM
To: Wojtek Lerch ; austin-group-l@opengroup.org
Subject: RE: [1003.1(2013)/Issue7+TC1 697]: Adding of a getdirentries() 
function


What that refers to, it seems, is any tail padding for the structure as a 
whole. The standard still permits internal padding between individual fields as 
required, e.g. a struct s { short a; double b[]; } might need 6 bytes of this 
padding to align access for b[0]. This would still be needed if b[] only has a 
few members as a VLA but s is being page aligned, and so would reserve a lot of 
tail padding too. What that change forces, however, is two padding regions.


On Tuesday, September 1, 2020 Wojtek Lerch <wle...@blackberry.com> wrote:

Actually the intent was the opposite.  The original C99 did contain a wording 
that matches your interpretation:



… the size of the structure shall be equal to the offset of the last element of 
an otherwise identical structure that replaces the flexible array member with 
an array of unspecified length.



But this was reported as a defect, and corrected in TC2.



Summary
 6.7.2.1 Structure and union specifiers, paragraphs 15 and 16 require that any 
padding for alignment of a structure containing a flexible array member must 
precede the flexible array member.  This contradicts existing implementations.  
We do not believe this was the intent of the C99 specification.

Details

If a struct contains a flexible array member and also requires padding for 
alignment, then the current C99 specification requires the implementation to 
put this padding before the flexible array member.  However, existing 
implementations, including at least GNU C, Compaq C, and Sun C, put the padding 
after the flexible array member.

The layout used by existing implementations can be more efficient. Furthermore, 
requiring these existing implementations to change their layout would break 
binary backwards compatibility with previous versions.



See DR282 for more details: 
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_282.htm





From: shwaresyst <shwares...@aol.com>
Sent: September 1, 2020 2:27 PM
To: Wojtek Lerch <wle...@blackberry.com>; 
austin-group-l@opengroup.org
Subject: RE: [1003.1(2013)/Issue7+TC1 697]: Adding of a getdirentries() 
function



I agree some additional clarity might be useful there, in the C standard. I'm 
reading it as the intent being sizeof is equivalent to offsetof the VLA in 
accordance with the restrictions placed on it by use of the . or -> operators, 
which may not need extra bytes (so >vla == ( + sizeof(s)) is a truism, in 
other words) but it is not that specific.





On Tuesday, September 1, 2020 Wojtek Lerch <wle...@blackberry.com> wrote:

That sounds a little backwards – it’s everything else that works as if the 
flexible (not “variable”) member were not present.  The sizeof operator, as an 
exception, can return a greater value.  (The “.” and “->” operators are another 
exception.)



The standard does not say how much greater the value may be, or promise that it 
must be greater, even if padding is necessary to align the flexible member – as 
far as I can tell, sizeof(structure) can be less than offsetof(structure, 
flexible).



From: austin-group-l@opengroup.org
Sent: September 1, 2020 10:52 AM
To: g...@opengroup.org; 
austin-group-l@opengroup.org<mailto:austin-gr

Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread Steffen Nurpmeso via austin-group-l at The Open Group
shwaresyst wrote in
 <1739483391.1543785.1598977118...@mail.yahoo.com>:
 |No, it couldn't introduce such a macro, because such would have to \
 |assume all d_name entries are the same length. Adding an option to \

Well it has to go for NAME_MAX + the_size_of_posix_dent for each
and every entry, this is what you want here?  Except for what
Philip Guenther said, of course.  But if it would be left
implementation defined then even that could be covered by the
macro, better than by anything else.

I for one feel you are very brave to apply sizeof() to anything
with a "flexible array member", i would not dare that for portable
code.  (But my code has to work with ISO C89 too, so i have to use
macros to switch between [a-number] and [] as applicable, and also
to SIZEOF these types.)

Really, you are very brave!  Just the bugs i had to work around
since 2018 or what for a really tiny set of primitive tools!
(Like some gregarious animal not inlining for -Os, and another
huge one requiring explicit this-> to find superclass fields in
one class, but not the other.)

--steffen



Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread Steffen Nurpmeso via austin-group-l at The Open Group
Wojtek Lerch via austin-group-l at The Open Group wrote in
 :
 |Geoff Clare wrote:
 |> We can't require d_name in struct dirent to be a VLA since there \
 |> are implementations where it is not.
 |
 |Another good reason is that standard C does not allow structure members \
 |to be VLAs.
 |
 |C11 6.7.2.1#9 "A member of a structure or union may have any complete \
 |object type other than a variably modified type."

And there is __STDC_NO_VLA__ under conditional feature macros in
the draft i have.

 |If implementations that define d_name as a VLA do in fact exist, they'd \
 |have to use some strange compiler extension.  (GCC does allow VLAs \
 |in structures, but only when the struct is defined inside a function \
 |-- a typedef in a header will not work.)
 |
 |A structure member can be a "flexible array" in standard C, but that's \
 |not the same thing as a VLA.

As usual, "the last element of a structure with more than one
named member may have an incomplete array type; this is called
a flexible array member."
So, it would be a syntax error as a VLA, since that requires an
unflexible definition of its variability for sure.

--steffen



RE: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread shwaresyst via austin-group-l at The Open Group

What that refers to, it seems, is any tail padding for the structure as a 
whole. The standard still permits internal padding between individual fields as 
required, e.g. a struct s { short a; double b[]; } might need 6 bytes of this 
padding to align access for b[0]. This would still be needed if b[] only has a 
few members as a VLA but s is being page aligned, and so would reserve a lot of 
tail padding too. What that change forces, however, is two padding regions.
On Tuesday, September 1, 2020 Wojtek Lerch  wrote:
Actually the intent was the opposite.  The original C99 did contain a wording 
that matches your interpretation:

… the size of the structure shall be equal to the offset of the last element of 
an otherwise identical structure that replaces the flexible array member with 
an array of unspecified length.

But this was reported as a defect, and corrected in TC2.

Summary
 6.7.2.1 Structure and union specifiers, paragraphs 15 and 16 require that any 
padding for alignment of a structure containing a flexible array member must 
precede the flexible array member.  This contradicts existing implementations.  
We do not believe this was the intent of the C99 specification.

Details

If a struct contains a flexible array member and also requires padding for 
alignment, then the current C99 specification requires the implementation to 
put this padding before the flexible array member.  However, existing 
implementations, including at least GNU C, Compaq C, and Sun C, put the 
padding after the flexible array member.

The layout used by existing implementations can be more efficient. Furthermore, 
requiring these existing implementations to change their layout would break 
binary backwards compatibility with previous versions.

See DR282 for more 
details: http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_282.htm

From: shwaresyst  
Sent: September 1, 2020 2:27 PM
To: Wojtek Lerch ; austin-group-l@opengroup.org
Subject: RE: [1003.1(2013)/Issue7+TC1 697]: Adding of a getdirentries() 
function

I agree some additional clarity might be useful there, in the C standard. I'm 
reading it as the intent being sizeof is equivalent to offsetof the VLA in 
accordance with the restrictions placed on it by use of the . or -> operators, 
which may not need extra bytes (so >vla == ( + sizeof(s)) is a truism, in 
other words) but it is not that specific.

On Tuesday, September 1, 2020 Wojtek Lerch  wrote:

That sounds a little backwards – it’s everything else that works as if the 
flexible (not “variable”) member were not present.  The sizeof operator, as an 
exception, can return a greater value.  (The “.” and “->” operators are another 
exception.)

The standard does not say how much greater the value may be, or promise that it 
must be greater, even if padding is necessary to align the flexible member – as 
far as I can tell, sizeof(structure) can be less than offsetof(structure, 
flexible).

From: austin-group-l@opengroup.org 
Sent: September 1, 2020 10:52 AM
To: g...@opengroup.org; austin-group-l@opengroup.org
Subject: Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() 
function

It's my understanding, by C11 6.7.2.1p18, sizeof on a struct with a variable 
array works as if the variable member was not present, but does count any bytes 
added for alignment padding, as this will be a fixed amount for each use of the 
struct. It is up to the application, like with variable argument lists, to 
establish a protocol that allows it to determine the effective size of the 
final member.
 
This transmission (including any attachments) may contain confidential 
information, privileged material (including material protected by the 
solicitor-client or other applicable privileges), or constitute non-public 
information. Any use of this information by anyone other than the intended 
recipient is p

Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread Philip Guenther via austin-group-l at The Open Group
On Tue, Sep 1, 2020 at 6:22 AM Steffen Nurpmeso via austin-group-l at The
Open Group  wrote:

> Robert Elz via austin-group-l at The Open Group wrote in
>  <9252.1598969...@jinx.noi.kre.to>:
>  |Date:Tue, 1 Sep 2020 10:32:55 +0100
>  |From:"Geoff Clare via austin-group-l at The Open Group" \
>  |
>  |Message-ID:  <20200901093255.GA7629@localhost>
>   ...
>  |What's more important is what happens if the application buffer isn't
>  |big enough for the next entry.  What do the existing getdents()
>  |implementations do in that case?   If they're all the same then
>  |posix_getdents() should do the same thing (EINVAL?  E2BIG?) - if they
>  |differ, then we can decide what's best.
>
> Isn't that covered nicely by the posted text?  There must be space
> for at least one entry, otherwise EINVAL occurs?  And upon success
> "a non-negative integer shall be returned indicating the number of
> bytes occupied by the posix_dent structures placed in
> buf", which even for a non-native tongue implies that there
> may be pad left.  I think you are overcomplicating here.
>

A quick review of FreeBSD, NetBSD, and OpenBSD finds they all return EINVAL
if the buffer isn't "big enough".
For OpenBSD, the minimum buffer size is 512 bytes; if I'm reading it
correctly NetBSD is similar, possibly varying based on filesystem
formatting.
FreeBSD requires space for the next entry.


>  |Similarly for what is done for directory pieces that don't contain
>  |files, on filesystems that allow that (inode number == 0 or perhaps
>  |a file type for "dummy entry" or something, or whatever).
>
> I personally would say that these should be skipped.  The data is
> copied over to user buffers, and these entries are simply not
> copied.  That seems to be the best.  The Group does not seem to
> want to add DT_WHITEOUT or similar things.
>

DT_WHITEOUT is different, related to union mounts.

>  |Do the existing implementations ever return such things?   Do they
>
> I personally have not seen it, but this likely is a very
> filesystem dependent thing, which possibly even changes over time


At least NetBSD and OpenBSD will return an entry with d_ino == 0 if the
first entry in a block is removed.  I suspect others may do this as well;
glibc at least includes code to skip such entries in its generic readdir()
implementation.


The question really is "is this supposed to be an API that can be trivially
supported by all the existing versions, even if that makes it more clunky
to use, or should it be easy to use even if every single existing
implementation needs to bend?"

If the former, then define a minimum buffer size
(pathconf(_PC_DIRBUFMIN)...?), permit d_ino==0 as entries where d_name and
d_type are unspecified, and let d_name be either a fixed-size array or a
flexible array member.

If the latter, then require it to work with very small buffers, require all
entries to have valid d_name and d_type, and specify d_name as a FAM.


Philip Guenther


RE: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread Wojtek Lerch via austin-group-l at The Open Group
Actually the intent was the opposite.  The original C99 did contain a wording 
that matches your interpretation:

… the size of the structure shall be equal to the offset of the last element of 
an otherwise identical structure that replaces the flexible array member with 
an array of unspecified length.

But this was reported as a defect, and corrected in TC2.


Summary
 6.7.2.1 Structure and union specifiers, paragraphs 15 and 16 require that any 
padding for alignment of a structure containing a flexible array member must 
precede the flexible array member.  This contradicts existing implementations.  
We do not believe this was the intent of the C99 specification.

Details

If a struct contains a flexible array member and also requires padding for 
alignment, then the current C99 specification requires the implementation to 
put this padding before the flexible array member.  However, existing 
implementations, including at least GNU C, Compaq C, and Sun C, put the padding 
after the flexible array member.

The layout used by existing implementations can be more efficient. Furthermore, 
requiring these existing implementations to change their layout would break 
binary backwards compatibility with previous versions.

See DR282 for more details: 
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_282.htm


From: shwaresyst 
Sent: September 1, 2020 2:27 PM
To: Wojtek Lerch ; austin-group-l@opengroup.org
Subject: RE: [1003.1(2013)/Issue7+TC1 697]: Adding of a getdirentries() 
function


I agree some additional clarity might be useful there, in the C standard. I'm 
reading it as the intent being sizeof is equivalent to offsetof the VLA in 
accordance with the restrictions placed on it by use of the . or -> operators, 
which may not need extra bytes (so >vla == ( + sizeof(s)) is a truism, in 
other words) but it is not that specific.


On Tuesday, September 1, 2020 Wojtek Lerch <wle...@blackberry.com> wrote:

That sounds a little backwards – it’s everything else that works as if the 
flexible (not “variable”) member were not present.  The sizeof operator, as an 
exception, can return a greater value.  (The “.” and “->” operators are another 
exception.)



The standard does not say how much greater the value may be, or promise that it 
must be greater, even if padding is necessary to align the flexible member – as 
far as I can tell, sizeof(structure) can be less than offsetof(structure, 
flexible).



From: austin-group-l@opengroup.org
Sent: September 1, 2020 10:52 AM
To: g...@opengroup.org; 
austin-group-l@opengroup.org
Subject: Re: [1003.1(2013)/Issue7+TC1 697]: Adding of a getdirentries() 
function


It's my understanding, by C11 6.7.2.1p18, sizeof on a struct with a variable 
array works as if the variable member was not present, but does count any bytes 
added for alignment padding, as this will be a fixed amount for each use of the 
struct. It is up to the application, like with variable argument lists, to 
establish a protocol that allows it to determine the effective size of the 
final member.



Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread Philip Guenther via austin-group-l at The Open Group
On Tue, Sep 1, 2020 at 5:40 AM Geoff Clare via austin-group-l at The Open
Group  wrote:

> > --
> >  (0004958) philip-guenther (reporter) - 2020-08-30 23:06
> >  https://austingroupbugs.net/view.php?id=697#c4958
> > --
> > The proposed text includes:
> > The d_name member shall be a filename string, and (if not dot or
> > dot-dot) shall contain the same byte sequence as the last pathname
> > component of the string used to create the directory entry, plus the
> > terminating null byte.
> >
> > That would seem to require that all returned entries correspond to
> > filenames that existed in the directory at _some_ point in time.
>
> It is just copied from existing text for readdir() in Issue 8 draft 1.
> See bug 293.
>
> -- 
> Geoff Clare 
> The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England

That part is fine, that the returned names match the creation names.  My
concern in that comment is that there's no requirement on posix_getdents()
to only return _currently_ existing names.

If a posix_getdents() implementation returned the names of all the files
that ever existed in the given directory, including ones that were removed
before the fd for this call was opened, what requirement in the standard
would that violate?  I don't see any, thus my suggested wording for such a
requirement.

Philip


RE: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread shwaresyst via austin-group-l at The Open Group

I agree some additional clarity might be useful there, in the C standard. I read 
the intent as being that sizeof is equivalent to offsetof the VLA, in accordance 
with the restrictions placed on it by use of the . or -> operators, which may not 
need extra bytes (in other words, that the VLA member starts exactly sizeof(s) 
bytes past the start of s is a truism), but it is not that specific.
On Tuesday, September 1, 2020 Wojtek Lerch  wrote:
That sounds a little backwards – it’s everything else that works as if the 
flexible (not “variable”) member were not present.  The sizeof operator, as an 
exception, can return a greater value.  (The “.” and “->” operators are another 
exception.)

The standard does not say how much greater the value may be, or promise that it 
must be greater, even if padding is necessary to align the flexible member – as 
far as I can tell, sizeof(structure) can be less than offsetof(structure, 
flexible).

From: austin-group-l@opengroup.org 
Sent: September 1, 2020 10:52 AM
To: g...@opengroup.org; austin-group-l@opengroup.org
Subject: Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() 
function

It's my understanding, by C11 6.7.2.1p18, sizeof on a struct with a variable 
array works as if the variable member was not present, but does count any bytes 
added for alignment padding, as this will be a fixed amount for each use of the 
struct. It is up to the application, like with variable argument lists, to 
establish a protocol that allows it to determine the effective size of the 
final member.


RE: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread Wojtek Lerch via austin-group-l at The Open Group
That sounds a little backwards – it’s everything else that works as if the 
flexible (not “variable”) member were not present.  The sizeof operator, as an 
exception, can return a greater value.  (The “.” and “->” operators are another 
exception.)

The standard does not say how much greater the value may be, or promise that it 
must be greater, even if padding is necessary to align the flexible member – as 
far as I can tell, sizeof(structure) can be less than offsetof(structure, 
flexible).

From: austin-group-l@opengroup.org 
Sent: September 1, 2020 10:52 AM
To: g...@opengroup.org; austin-group-l@opengroup.org
Subject: Re: [1003.1(2013)/Issue7+TC1 697]: Adding of a getdirentries() 
function


It's my understanding, by C11 6.7.2.1p18, sizeof on a struct with a variable 
array works as if the variable member was not present, but does count any bytes 
added for alignment padding, as this will be a fixed amount for each use of the 
struct. It is up to the application, like with variable argument lists, to 
establish a protocol that allows it to determine the effective size of the 
final member.



RE: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread Wojtek Lerch via austin-group-l at The Open Group
Geoff Clare wrote:
> We can't require d_name in struct dirent to be a VLA since there are 
> implementations where it is not.

Another good reason is that standard C does not allow structure members to be 
VLAs.

C11 6.7.2.1#9 "A member of a structure or union may have any complete object 
type other than a variably modified type."

If implementations that define d_name as a VLA do in fact exist, they'd have to 
use some strange compiler extension.  (GCC does allow VLAs in structures, but 
only when the struct is defined inside a function -- a typedef in a header will 
not work.)

A structure member can be a "flexible array" in standard C, but that's not the 
same thing as a VLA.




Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread shwaresyst via austin-group-l at The Open Group

No, it couldn't introduce such a macro, because such a macro would have to assume 
all d_name entries are the same length. Adding an option to the interface to do a 
count, as a vararg parameter, and directly malloc the necessary space, returned 
via my suggested change to buf as a **, is plausible. Since we are merging 
common behaviors with this interface introduction, not describing a single 
reference implementation, such changes are permitted if someone commits to 
doing an implementation, afaik.
On Tuesday, September 1, 2020 Steffen Nurpmeso via austin-group-l at The Open 
Group  wrote:
Geoff Clare via austin-group-l at The Open Group wrote in
 <20200901143300.GB24606@localhost>:
 |> -- 
 |>  (0004953) philip-guenther (reporter) - 2020-08-28 22:52
 |>  https://www.austingroupbugs.net/view.php?id=697#c4953 
 |> -- 
 |> I think the unspecified nature of the d_name member in the new posix_dent
 |> makes writing portable software more difficult while providing only
 |> minimal benefit to programs that don't care.  I would support requiring
 |> it to be a flexible array member and thus eliminating the error of
 |> declaring an array and trying to walk it via indexing instead of by
 |> advancing a char pointer by d_reclen.
 |I think we should keep the requirements for d_name the same between
 |struct dirent and struct posix_dent.  Some implementations of
 |getdents() and getdirentries() use struct dirent and they should be
 |able to make posix_getdents() a synonym (or a light wrapper) for the
 |existing function by making struct posix_dent be identical to struct
 |dirent.  We can't require d_name in struct dirent to be a VLA since
 |there are implementations where it is not.

The standard could also introduce a macro that could be used to
size a buffer accordingly, something like (very ugly)
POSIX_GETDENTS_BYTES_FOR_DENTS(number-of-desired-dents), and use
it in the example.
That way, errors in buffer space allocation would not even be
introduced (except, maybe, for possible integer overflows).

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter          he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread Geoff Clare via austin-group-l at The Open Group
shwaresyst wrote, on 01 Sep 2020:
> 
> It's my understanding, by C11 6.7.2.1p18, sizeof on a struct with a variable 
> array works as if the variable member was not present

Thanks for the pointer. 

So it looks like removing "or equal to" is all that is needed here.

> On Tuesday, September 1, 2020 Geoff Clare via austin-group-l at The Open 
> Group  wrote:
> Per Mildner wrote, on 30 Aug 2020:
> >
> > "The posix_getdents() function shall ... place ... posix_dent structures in 
> > the buffer pointed to by buf up to a maximum of nbyte bytes"
> > "The array d_name ... shall contain a filename of at most {NAME_MAX} bytes 
> > followed by a terminating null byte." (so could need up to {NAME_MAX} + 1 
> > bytes).
> > "Implementations may define the d_name array .. to ... use a flexible array 
> > member" (meaning the d_name array does not affect the "size of the 
> > posix_dent structure").
> > 
> > Does the above not imply that the following should use "greater than", 
> > rather than "greater than or equal", to make room for "a terminating null 
> > byte"?
> > 
> > "The number of posix_dent structures populated in buf ... shall be at least 
> > one if nbyte is greater than or equal to the size of the posix_dent 
> > structure plus {NAME_MAX} ..."
> > 
> 
> Good catch. This text predates the stuff about d_name possibly being a
> flexible array member, and needs updating.  For now I have marked "or
> equal to" for deletion in the etherpad, but I think there is still a
> problem with "size of the posix_dent structure" as you can't use
> sizeof(struct posix_dent) if d_name is a flexible array member.
> 
> To be correct it would have to distinguish the two cases:
> 
>     ... greater than {NAME_MAX} plus
>     * the size of the posix_dent structure, if d_name is not a
>       flexible array member, or
>     * the offset of d_name in the posix_dent structure, if d_name is a
>       flexible array member.
> 
> but I'm not sure how useful this would be to applications.  In any case
> it would be highly unusual for an application to use such a small buffer.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread Steffen Nurpmeso via austin-group-l at The Open Group
Geoff Clare via austin-group-l at The Open Group wrote in
 <20200901143300.GB24606@localhost>:
 |> -- 
 |>  (0004953) philip-guenther (reporter) - 2020-08-28 22:52
 |>  https://www.austingroupbugs.net/view.php?id=697#c4953 
 |> -- 
 |> I think the unspecified nature of the d_name member in the new posix_dent
 |> makes writing portable software more difficult while providing only
 |> minimal benefit to programs that don't care.  I would support requiring
 |> it to be a flexible array member and thus eliminating the error of
 |> declaring an array and trying to walk it via indexing instead of by
 |> advancing a char pointer by d_reclen.
 |
 |I think we should keep the requirements for d_name the same between
 |struct dirent and struct posix_dent.  Some implementations of
 |getdents() and getdirentries() use struct dirent and they should be
 |able to make posix_getdents() a synonym (or a light wrapper) for the
 |existing function by making struct posix_dent be identical to struct
 |dirent.  We can't require d_name in struct dirent to be a VLA since
 |there are implementations where it is not.

The standard could also introduce a macro that could be used to
size a buffer accordingly, something like (very ugly)
POSIX_GETDENTS_BYTES_FOR_DENTS(number-of-desired-dents), and use
it in the example.
That way, errors in buffer space allocation would not even be
introduced (except, maybe, for possible integer overflows).

--steffen



Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread Steffen Nurpmeso via austin-group-l at The Open Group
Robert Elz via austin-group-l at The Open Group wrote in
 <9252.1598969...@jinx.noi.kre.to>:
 |Date:        Tue, 1 Sep 2020 10:32:55 +0100
 |From:        "Geoff Clare via austin-group-l at The Open Group"
 |Message-ID:  <20200901093255.GA7629@localhost>
 |
 || but I'm not sure how useful this would be to applications.  In any case
 || it would be highly unusual for an application to use such a small buffer.
 |
 |I would suggest that we stop worrying about telling applications how
 |to write code to use the interfaces, or at least until the interface is
 |properly specified.

But the interfaces are decades old, isn't that wording too strong?

   if (_impl->cast.ui1p < _impl->maxcast || a_FillBuffer(_impl)) {
      de = _impl->cast.dep;
      _impl->cast.ui1p += de->d_reclen;
   } else
      de = NIL;

This code worked two decades ago, and it would be great if it
would still work two decades from now.

   _impl->cast.ui1p = _impl->buffer;
   _impl->maxcast = _impl->buffer + u.osret;

where u.osret is the (positive) value that posix_getdents() returned.

 |What's more important is what happens if the application buffer isn't
 |big enough for the next entry.  What do the existing getdents()
 |implementations do in that case?   If they're all the same then
 |posix_getdents() should do the same thing (EINVAL?  E2BIG?) - if they
 |differ, then we can decide what's best.

Isn't that covered nicely by the posted text?  There must be space
for at least one entry, otherwise EINVAL occurs?  And upon success
"a non-negative integer shall be returned indicating the number of
bytes occupied by the posix_dent structures placed in
buf", which, even to a non-native speaker, implies that there
may be padding left over.  I think you are overcomplicating here.

 |Similarly for what is done for directory pieces that don't contain
 |files, on filesystems that allow that (inode number == 0 or perhaps
 |a file type for "dummy entry" or something, or whatever).

I personally would say that these should be skipped.  The data is
copied over to user buffers, and these entries are simply not
copied.  That seems to be the best.  The Group does not seem to
want to add DT_WHITEOUT or similar things.

With directory entries you always have races, as you surely know,
so any data you see can only ever be an indication; the standard
itself talks about races here (and introduced the "at" series of
functions to overcome some of them).

 |Do the existing implementations ever return such things?   Do they

I personally have not seen it, but this likely is a very
filesystem dependent thing, which possibly even changes over time.

 |hide them by making the reclen of the previous entry (if there is
 |one in the buffer) bigger, or do they squash them out, moving the
 |next existing entry down to follow immediately after the previous one
 |(where all the reclen's are as small as possible to contain the
 |struct header, the name (and its \0) and alignment padding.)   This is
 |a case where we don't necessarily need to specify one scheme that
 |must be used - we can leave that for the implementation, as long as
 |applications are informed what might happen.

The proposed text says that filenames are NUL-terminated and that
hopping from entry to entry happens by adding d_reclen to the
current entry (cast to char *).  So it seems there could be data
in between.  Empty names are not allowed, so this.  As some
getdents implementations used d_ino fields and others d_fileno,
it may be necessary anyway to create a very small posix_getdents()
wrapper around the system call, one which boils the large amount of
filesystem information down to what posix_dent actually provides?

 |If after all of this (and perhaps more) is worked out, if there is
 |an example application fragment that can usefully be included to
 |demonstrate how the interface might be used, then fine - but this
 |is a bonus extra, not really required in the standard.

It is wonderful you say this, having a way to directly read
directory content into buffers without having to use these other
functions, which may perform memory allocations which may impose
locking noise etc etc, and getting the d_type field directly as
well, and you know _how_ terrible it was to write that code in the
past, where you possibly even had to have valid path names around
in order to stat(2) a directory entry, at least for the
theoretical case that d_type does not exist!

Thanks,

--steffen



Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread Geoff Clare via austin-group-l at The Open Group
> -- 
>  (0004958) philip-guenther (reporter) - 2020-08-30 23:06
>  https://austingroupbugs.net/view.php?id=697#c4958 
> -- 
> The proposed text includes:
> The d_name member shall be a filename string, and (if not dot or
> dot-dot) shall contain the same byte sequence as the last pathname
> component of the string used to create the directory entry, plus the
> terminating null byte.
> 
> That would seem to require that all returned entries correspond to
> filenames that existed in the directory at _some_ point in time.

It is just copied from existing text for readdir() in Issue 8 draft 1.
See bug 293.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread Geoff Clare via austin-group-l at The Open Group
> -- 
>  (0004953) philip-guenther (reporter) - 2020-08-28 22:52
>  https://www.austingroupbugs.net/view.php?id=697#c4953 
> -- 
> I think the unspecified nature of the d_name member in the new posix_dent
> makes writing portable software more difficult while providing only minimal
> benefit to programs that don't care.  I would support requiring it to be a
> flexible array member and thus eliminating the error of declaring an array
> and trying to walk it via indexing instead of by advancing a char pointer
> by d_reclen.

I think we should keep the requirements for d_name the same between
struct dirent and struct posix_dent.  Some implementations of
getdents() and getdirentries() use struct dirent and they should be
able to make posix_getdents() a synonym (or a light wrapper) for the
existing function by making struct posix_dent be identical to struct
dirent.  We can't require d_name in struct dirent to be a VLA since
there are implementations where it is not.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread Geoff Clare via austin-group-l at The Open Group
> -- 
>  (0004949) kre (reporter) - 2020-08-28 17:52
>  https://www.austingroupbugs.net/view.php?id=697#c4949 
> -- 
> I suspect that the intent of all of this is good, but this one
> phrase's wording (wrt the d_name field, it is used for that
> field in both struct dirent and struct posix_dent):
> 
>  but shall contain a filename of at most {NAME_MAX} bytes
> 
> is incomprehensible to me.   I can read it as saying
> 
>  no file names longer than NAME_MAX bytes can ever occur in a d_name
> or as
>  the d_name field must be able to contain file names at least
>  NAME_MAX bytes long
> 
> I suspect that the latter is most likely what was intended, but I really
> don't know for sure - that one allows implementations to support
> filesystems where directories might contain names longer than can be
> regularly passed to system calls, for example, and generally allowing
> implementation extensions is desirable, but the former allows an
> application to declare an array of NAME_MAX+1 bytes and be sure that
> any d_name entry will fit.

The current wording is from bug 291 - see bugnote 578 which includes this:

At line 7577 (XBD dirent.h DESCRIPTION), change:

The character array d_name is of unspecified size, but the number of
bytes preceding the terminating null byte shall not exceed {NAME_MAX}.

to:

The array d_name is of unspecified size, but shall contain a filename
of at most {NAME_MAX} bytes followed by a terminating null byte.

If you want to get this changed, you should look back at the discussions
that led to the above change to understand the reasons behind it, and
then submit a separate Mantis bug in the Issue7+TC2 project.

Bug 697 is not the right place to make extra changes of this nature - it
should be limited to what's needed to add posix_getdents().

> 
> This phrase ought to be reworded to make it clear what is intended there.
> Since it is talking about a variable length array, better phrasing would
> probably concentrate on what bounds exist for the size of that array,
> rather than the length of what might be stored within it.

The array itself, in the struct dirent definition, can be 1 byte in size
(as stated in existing RATIONALE).  It can also be a VLA, which has no
size.  So talking about the size of the array is not meaningful. The only
thing that matters is how many bytes are stored at that address.

> And while I'm here, I don't think we need to be providing C tutorials,
> so I'd drop all the stuff about arrays of posix_dent structures
> completely - attempting to write a program using such a thing would be
> folly 

The reason for including it is that some implementations might have
d_name as full size, others might use the d_name[1] trick or have it
as a VLA.  An application writer who develops an application on an
implementation with a full size d_name might use:

struct posix_dent buf[100];

and it would work fine on that system, but would cause problems if
it is later ported to other systems.  This is entirely in keeping with
the purpose of the APPLICATION USAGE section, i.e. to warn about
potential portability issues.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread Robert Elz via austin-group-l at The Open Group
Date:        Tue, 1 Sep 2020 10:32:55 +0100
From:        "Geoff Clare via austin-group-l at The Open Group"
Message-ID:  <20200901093255.GA7629@localhost>


  | but I'm not sure how useful this would be to applications.  In any case
  | it would be highly unusual for an application to use such a small buffer.

I would suggest that we stop worrying about telling applications how
to write code to use the interfaces, or at least until the interface is
properly specified.

What's more important is what happens if the application buffer isn't
big enough for the next entry.  What do the existing getdents()
implementations do in that case?   If they're all the same then
posix_getdents() should do the same thing (EINVAL?  E2BIG?) - if they
differ, then we can decide what's best.

Similarly for what is done for directory pieces that don't contain
files, on filesystems that allow that (inode number == 0 or perhaps
a file type for "dummy entry" or something, or whatever).

Do the existing implementations ever return such things?   Do they
hide them by making the reclen of the previous entry (if there is
one in the buffer) bigger, or do they squash them out, moving the
next existing entry down to follow immediately after the previous one
(where all the reclen's are as small as possible to contain the
struct header, the name (and its \0) and alignment padding.)   This is
a case where we don't necessarily need to specify one scheme that
must be used - we can leave that for the implementation, as long as
applications are informed what might happen.

If after all of this (and perhaps more) is worked out, if there is
an example application fragment that can usefully be included to
demonstrate how the interface might be used, then fine - but this
is a bonus extra, not really required in the standard.

kre



Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread Geoff Clare via austin-group-l at The Open Group
Per Mildner wrote, on 30 Aug 2020:
>
> "The posix_getdents() function shall ... place ... posix_dent structures in 
> the buffer pointed to by buf up to a maximum of nbyte bytes"
> "The array d_name ... shall contain a filename of at most {NAME_MAX} bytes 
> followed by a terminating null byte." (so could need up to {NAME_MAX} + 1 
> bytes).
> "Implementations may define the d_name array .. to ... use a flexible array 
> member" (meaning the d_name array does not affect the "size of the posix_dent 
> structure").
> 
> Does the above not imply that the following should use "greater than", rather 
> than "greater than or equal", to make room for "a terminating null byte"?
> 
> "The number of posix_dent structures populated in buf ... shall be at least 
> one if nbyte is greater than or equal to the size of the posix_dent structure 
> plus {NAME_MAX} ..."
> 

Good catch. This text predates the stuff about d_name possibly being a
flexible array member, and needs updating.  For now I have marked "or
equal to" for deletion in the etherpad, but I think there is still a
problem with "size of the posix_dent structure" as you can't use
sizeof(struct posix_dent) if d_name is a flexible array member.

To be correct it would have to distinguish the two cases:

... greater than {NAME_MAX} plus
* the size of the posix_dent structure, if d_name is not a
  flexible array member, or
* the offset of d_name in the posix_dent structure, if d_name is a
  flexible array member.

but I'm not sure how useful this would be to applications.  In any case
it would be highly unusual for an application to use such a small buffer.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-08-30 Thread Per Mildner via austin-group-l at The Open Group
"The posix_getdents() function shall ... place ... posix_dent structures in the 
buffer pointed to by buf up to a maximum of nbyte bytes"
"The array d_name ... shall contain a filename of at most {NAME_MAX} bytes 
followed by a terminating null byte." (so could need up to {NAME_MAX} + 1 
bytes).
"Implementations may define the d_name array .. to ... use a flexible array 
member" (meaning the d_name array does not affect the "size of the posix_dent 
structure").

Does the above not imply that the following should use "greater than", rather 
than "greater than or equal", to make room for "a terminating null byte"?

"The number of posix_dent structures populated in buf ... shall be at least one 
if nbyte is greater than or equal to the size of the posix_dent structure plus 
{NAME_MAX} ..."



Per Mildner
Ph.D.
Digital Systems
Department Computer Science
Unit Computer Systems

D: +46 10 228 43 11
per.mild...@ri.se

RISE Research Institutes of Sweden | ri.se