On Wed, 3 Jul 2019 at 18:37, Joe Orton <jor...@redhat.com> wrote:
>
> On Wed, Jul 03, 2019 at 02:43:02PM +0300, Ivan Zhakov wrote:
> > They also make the existing old API unusable for many cases thus
> > making a switch to the new API mandatory, even though it doesn't have
> > to be so (see below).
> >
> > APR 1.x did not specify that the result of apr_dir_read() has to be 
> > allocated
> > in dir->pool:
> >
> >   
> > https://apr.apache.org/docs/apr/1.7/group__apr__dir.html#ga3e4ee253e0c712160bee10bfb9c8e4a8
> >
> > This function does not accept a pool argument, and I think that it's natural
> > to assume limited lifetime in such case. This is what the current major APR
> > users (httpd, svn) do, and other users should probably be ready for that 
> > too,
> > since on Win32 the subsequent calls to apr_dir_read() already invalidate the
> > previous apr_finfo_t.
>
> Are there many examples in the APR API where returned strings are not
> either pool-allocated or constant (process lifetime)?  It seems unusual
> and very surprising to me.
>
> A hypothetical API consumer can have made either assumption:
>
> a) of constant memory (true with limits on Win32, breaks for Unix), or
> b) of pool-allocated string lifetime (works for Unix, breaks for Win32)
>
> neither is documented and both are broken by current implementations. It
> is impossible to satisfy both so anybody can argue that fixing either
> implementation to match the other is a regression.
>
> Doubling down on (a) over (b) seems strongly inferior to me, the API
> cannot support this without introducing runtime overhead for all callers
> (a new iterpool in the dir object), so I'd rather fix this properly with
> a new API.
>
> I'd be fine with leaving Win32 apr_dir_read() behaviour as-is for 1.x at
> the same as introducing _pread() if desired - making the strdup only
> happen for _pread would only be a minor tweak, not a whole new subpool.
>

There is example of apr_hash_t.ITERATOR that is statically allocated in
the hash object and returned to the caller.

Also the native readdir() API behaves the same way [1]:
[[[
The returned pointer, and pointers within the structure, might be invalidated
or the structure or the storage areas might be overwritten by a subsequent call
to readdir() on the same directory stream.
]]]

>From this perspective the current behavior of apr_dir_read on Unix looks
inconsistent and surprising for the API user because it is impossible to use
in a loop without unbounded memory usage.

I strongly agree that the revved version of this function that accepts a POOL
argument will be good from both usage and implementation terms. I would be +1
on adding such API. But I also think that unless we fix the original issue
by using the preallocated buffer/iterpool, the whole thing would still be
problematic for any existing API user.

More detailed: if we take this opportunity to choose and document the API to
consistently return a temporary result, then as the caller I can either process
the data during iteration or manually put it into a long-living pool if that is
necessary. I think that all current callers work with the API this way. And I
can also switch to the new revved function to manually specify the pool and
better control the allocations. If I would like to support both APR 1.x and 2.x
the compatibility is also straightforward to achieve.

If we instead choose an option where the results may be allocated
in the dir object pool, then as the caller I cannot avoid unbounded memory
usage when supporting both APR 1.x and 2.x. And this approach also introduces
the unavoidable the unbounded memory usage for all Win32 users that
do not update their code.

[1] 
https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html#tag_16_475


-- 
Ivan Zhakov

Reply via email to