Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

2019-09-07 Thread Ivan Zhakov
On Wed, 28 Aug 2019 at 10:08, Joe Orton  wrote:
>
> On Sat, Aug 24, 2019 at 05:39:17PM +0300, Ivan Zhakov wrote:
> > For what it's worth, I'm -1 on the changes done in r1862071 and
> > r1862435 for the reasons listed above.
> >
> > PS: No idea if I can actually veto changes, because while I'm a
> > committer on the APR project, I am not in its PMC. But "-1" properly
> > expresses my technical opinion on this topic.
>
> OK fair enough, Ivan - reverted in r1866019 and please go ahead with the
> iterpool-based proposal - thanks a lot for reviewing this.
>
Thanks!

I'm currently busy with other thing, but I'll add this problem to my
TODO list and will try to fix it later.

-- 
Ivan Zhakov


Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

2019-08-28 Thread Joe Orton
On Sat, Aug 24, 2019 at 05:39:17PM +0300, Ivan Zhakov wrote:
> For what it's worth, I'm -1 on the changes done in r1862071 and
> r1862435 for the reasons listed above.
> 
> PS: No idea if I can actually veto changes, because while I'm a
> committer on the APR project, I am not in its PMC. But "-1" properly
> expresses my technical opinion on this topic.

OK fair enough, Ivan - reverted in r1866019 and please go ahead with the 
iterpool-based proposal - thanks a lot for reviewing this.

Regards, Joe




Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

2019-08-27 Thread Branko Čibej
On 27.08.2019 11:25, Ruediger Pluem wrote:
>
> On 08/25/2019 04:04 PM, Branko Čibej wrote:
>> On 24.08.2019 16:39, Ivan Zhakov wrote:
>>> On Thu, 25 Jul 2019 at 15:58, Ivan Zhakov  wrote:
 On Mon, 8 Jul 2019 at 20:05, Ivan Zhakov  wrote:
> On Wed, 3 Jul 2019 at 18:37, Joe Orton  wrote:
>> On Wed, Jul 03, 2019 at 02:43:02PM +0300, Ivan Zhakov wrote:
>>> They also make the existing old API unusable for many cases thus
>>> making a switch to the new API mandatory, even though it doesn't have
>>> to be so (see below).
>>>
>>> APR 1.x did not specify that the result of apr_dir_read() has to be 
>>> allocated
>>> in dir->pool:
>>>
>>>   
>>> https://apr.apache.org/docs/apr/1.7/group__apr__dir.html#ga3e4ee253e0c712160bee10bfb9c8e4a8
>>>
>>> This function does not accept a pool argument, and I think that it's 
>>> natural
>>> to assume limited lifetime in such case. This is what the current major 
>>> APR
>>> users (httpd, svn) do, and other users should probably be ready for 
>>> that too,
>>> since on Win32 the subsequent calls to apr_dir_read() already 
>>> invalidate the
>>> previous apr_finfo_t.
>> Are there many examples in the APR API where returned strings are not
>> either pool-allocated or constant (process lifetime)?  It seems unusual
>> and very surprising to me.
>>
>> A hypothetical API consumer can have made either assumption:
>>
>> a) of constant memory (true with limits on Win32, breaks for Unix), or
>> b) of pool-allocated string lifetime (works for Unix, breaks for Win32)
>>
>> neither is documented and both are broken by current implementations. It
>> is impossible to satisfy both so anybody can argue that fixing either
>> implementation to match the other is a regression.
>>
>> Doubling down on (a) over (b) seems strongly inferior to me, the API
>> cannot support this without introducing runtime overhead for all callers
>> (a new iterpool in the dir object), so I'd rather fix this properly with
>> a new API.
>>
>> I'd be fine with leaving Win32 apr_dir_read() behaviour as-is for 1.x at
>> the same as introducing _pread() if desired - making the strdup only
>> happen for _pread would only be a minor tweak, not a whole new subpool.
>>
> There is example of apr_hash_t.ITERATOR that is statically allocated in
> the hash object and returned to the caller.
>
> Also the native readdir() API behaves the same way [1]:
> [[[
> The returned pointer, and pointers within the structure, might be 
> invalidated
> or the structure or the storage areas might be overwritten by a 
> subsequent call
> to readdir() on the same directory stream.
> ]]]
>
> From this perspective the current behavior of apr_dir_read on Unix looks
> inconsistent and surprising for the API user because it is impossible to 
> use
> in a loop without unbounded memory usage.
>
> I strongly agree that the revved version of this function that accepts a 
> POOL
> argument will be good from both usage and implementation terms. I would 
> be +1
> on adding such API. But I also think that unless we fix the original issue
> by using the preallocated buffer/iterpool, the whole thing would still be
> problematic for any existing API user.
>
> More detailed: if we take this opportunity to choose and document the API 
> to
> consistently return a temporary result, then as the caller I can either 
> process
> the data during iteration or manually put it into a long-living pool if 
> that is
> necessary. I think that all current callers work with the API this way. 
> And I
> can also switch to the new revved function to manually specify the pool 
> and
> better control the allocations. If I would like to support both APR 1.x 
> and 2.x
> the compatibility is also straightforward to achieve.
>
> If we instead choose an option where the results may be allocated
> in the dir object pool, then as the caller I cannot avoid unbounded memory
> usage when supporting both APR 1.x and 2.x. And this approach also 
> introduces
> the unavoidable the unbounded memory usage for all Win32 users that
> do not update their code.
>
> [1] 
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html#tag_16_475
>
 Is there any feedback or thoughts on these proposed changes?

 Thanks!

>>> For what it's worth, I'm -1 on the changes done in r1862071 and
>>> r1862435 for the reasons listed above.
>>>
>>> PS: No idea if I can actually veto changes, because while I'm a
>>> committer on the APR project, I am not in its PMC. But "-1" properly
>>> expresses my technical opinion on this topic.
>>
>> I agree with your analysis. I think it would be best if you just
>> implemented your 

Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

2019-08-27 Thread Ruediger Pluem



On 08/25/2019 04:04 PM, Branko Čibej wrote:
> On 24.08.2019 16:39, Ivan Zhakov wrote:
>> On Thu, 25 Jul 2019 at 15:58, Ivan Zhakov  wrote:
>>> On Mon, 8 Jul 2019 at 20:05, Ivan Zhakov  wrote:
 On Wed, 3 Jul 2019 at 18:37, Joe Orton  wrote:
> On Wed, Jul 03, 2019 at 02:43:02PM +0300, Ivan Zhakov wrote:
>> They also make the existing old API unusable for many cases thus
>> making a switch to the new API mandatory, even though it doesn't have
>> to be so (see below).
>>
>> APR 1.x did not specify that the result of apr_dir_read() has to be 
>> allocated
>> in dir->pool:
>>
>>   
>> https://apr.apache.org/docs/apr/1.7/group__apr__dir.html#ga3e4ee253e0c712160bee10bfb9c8e4a8
>>
>> This function does not accept a pool argument, and I think that it's 
>> natural
>> to assume limited lifetime in such case. This is what the current major 
>> APR
>> users (httpd, svn) do, and other users should probably be ready for that 
>> too,
>> since on Win32 the subsequent calls to apr_dir_read() already invalidate 
>> the
>> previous apr_finfo_t.
> Are there many examples in the APR API where returned strings are not
> either pool-allocated or constant (process lifetime)?  It seems unusual
> and very surprising to me.
>
> A hypothetical API consumer can have made either assumption:
>
> a) of constant memory (true with limits on Win32, breaks for Unix), or
> b) of pool-allocated string lifetime (works for Unix, breaks for Win32)
>
> neither is documented and both are broken by current implementations. It
> is impossible to satisfy both so anybody can argue that fixing either
> implementation to match the other is a regression.
>
> Doubling down on (a) over (b) seems strongly inferior to me, the API
> cannot support this without introducing runtime overhead for all callers
> (a new iterpool in the dir object), so I'd rather fix this properly with
> a new API.
>
> I'd be fine with leaving Win32 apr_dir_read() behaviour as-is for 1.x at
> the same as introducing _pread() if desired - making the strdup only
> happen for _pread would only be a minor tweak, not a whole new subpool.
>
 There is example of apr_hash_t.ITERATOR that is statically allocated in
 the hash object and returned to the caller.

 Also the native readdir() API behaves the same way [1]:
 [[[
 The returned pointer, and pointers within the structure, might be 
 invalidated
 or the structure or the storage areas might be overwritten by a subsequent 
 call
 to readdir() on the same directory stream.
 ]]]

 From this perspective the current behavior of apr_dir_read on Unix looks
 inconsistent and surprising for the API user because it is impossible to 
 use
 in a loop without unbounded memory usage.

 I strongly agree that the revved version of this function that accepts a 
 POOL
 argument will be good from both usage and implementation terms. I would be 
 +1
 on adding such API. But I also think that unless we fix the original issue
 by using the preallocated buffer/iterpool, the whole thing would still be
 problematic for any existing API user.

 More detailed: if we take this opportunity to choose and document the API 
 to
 consistently return a temporary result, then as the caller I can either 
 process
 the data during iteration or manually put it into a long-living pool if 
 that is
 necessary. I think that all current callers work with the API this way. 
 And I
 can also switch to the new revved function to manually specify the pool and
 better control the allocations. If I would like to support both APR 1.x 
 and 2.x
 the compatibility is also straightforward to achieve.

 If we instead choose an option where the results may be allocated
 in the dir object pool, then as the caller I cannot avoid unbounded memory
 usage when supporting both APR 1.x and 2.x. And this approach also 
 introduces
 the unavoidable the unbounded memory usage for all Win32 users that
 do not update their code.

 [1] 
 https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html#tag_16_475

>>> Is there any feedback or thoughts on these proposed changes?
>>>
>>> Thanks!
>>>
>> For what it's worth, I'm -1 on the changes done in r1862071 and
>> r1862435 for the reasons listed above.
>>
>> PS: No idea if I can actually veto changes, because while I'm a
>> committer on the APR project, I am not in its PMC. But "-1" properly
>> expresses my technical opinion on this topic.
> 
> 
> I agree with your analysis. I think it would be best if you just
> implemented your proposal, including the revised version of the API. The
> unbounded-memory case you describe is pretty much a showstopper. The
> allocation choices do have 

Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

2019-08-25 Thread Branko Čibej
On 24.08.2019 16:39, Ivan Zhakov wrote:
> On Thu, 25 Jul 2019 at 15:58, Ivan Zhakov  wrote:
>> On Mon, 8 Jul 2019 at 20:05, Ivan Zhakov  wrote:
>>> On Wed, 3 Jul 2019 at 18:37, Joe Orton  wrote:
 On Wed, Jul 03, 2019 at 02:43:02PM +0300, Ivan Zhakov wrote:
> They also make the existing old API unusable for many cases thus
> making a switch to the new API mandatory, even though it doesn't have
> to be so (see below).
>
> APR 1.x did not specify that the result of apr_dir_read() has to be 
> allocated
> in dir->pool:
>
>   
> https://apr.apache.org/docs/apr/1.7/group__apr__dir.html#ga3e4ee253e0c712160bee10bfb9c8e4a8
>
> This function does not accept a pool argument, and I think that it's 
> natural
> to assume limited lifetime in such case. This is what the current major 
> APR
> users (httpd, svn) do, and other users should probably be ready for that 
> too,
> since on Win32 the subsequent calls to apr_dir_read() already invalidate 
> the
> previous apr_finfo_t.
 Are there many examples in the APR API where returned strings are not
 either pool-allocated or constant (process lifetime)?  It seems unusual
 and very surprising to me.

 A hypothetical API consumer can have made either assumption:

 a) of constant memory (true with limits on Win32, breaks for Unix), or
 b) of pool-allocated string lifetime (works for Unix, breaks for Win32)

 neither is documented and both are broken by current implementations. It
 is impossible to satisfy both so anybody can argue that fixing either
 implementation to match the other is a regression.

 Doubling down on (a) over (b) seems strongly inferior to me, the API
 cannot support this without introducing runtime overhead for all callers
 (a new iterpool in the dir object), so I'd rather fix this properly with
 a new API.

 I'd be fine with leaving Win32 apr_dir_read() behaviour as-is for 1.x at
 the same as introducing _pread() if desired - making the strdup only
 happen for _pread would only be a minor tweak, not a whole new subpool.

>>> There is example of apr_hash_t.ITERATOR that is statically allocated in
>>> the hash object and returned to the caller.
>>>
>>> Also the native readdir() API behaves the same way [1]:
>>> [[[
>>> The returned pointer, and pointers within the structure, might be 
>>> invalidated
>>> or the structure or the storage areas might be overwritten by a subsequent 
>>> call
>>> to readdir() on the same directory stream.
>>> ]]]
>>>
>>> From this perspective the current behavior of apr_dir_read on Unix looks
>>> inconsistent and surprising for the API user because it is impossible to use
>>> in a loop without unbounded memory usage.
>>>
>>> I strongly agree that the revved version of this function that accepts a 
>>> POOL
>>> argument will be good from both usage and implementation terms. I would be 
>>> +1
>>> on adding such API. But I also think that unless we fix the original issue
>>> by using the preallocated buffer/iterpool, the whole thing would still be
>>> problematic for any existing API user.
>>>
>>> More detailed: if we take this opportunity to choose and document the API to
>>> consistently return a temporary result, then as the caller I can either 
>>> process
>>> the data during iteration or manually put it into a long-living pool if 
>>> that is
>>> necessary. I think that all current callers work with the API this way. And 
>>> I
>>> can also switch to the new revved function to manually specify the pool and
>>> better control the allocations. If I would like to support both APR 1.x and 
>>> 2.x
>>> the compatibility is also straightforward to achieve.
>>>
>>> If we instead choose an option where the results may be allocated
>>> in the dir object pool, then as the caller I cannot avoid unbounded memory
>>> usage when supporting both APR 1.x and 2.x. And this approach also 
>>> introduces
>>> the unavoidable the unbounded memory usage for all Win32 users that
>>> do not update their code.
>>>
>>> [1] 
>>> https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html#tag_16_475
>>>
>> Is there any feedback or thoughts on these proposed changes?
>>
>> Thanks!
>>
> For what it's worth, I'm -1 on the changes done in r1862071 and
> r1862435 for the reasons listed above.
>
> PS: No idea if I can actually veto changes, because while I'm a
> committer on the APR project, I am not in its PMC. But "-1" properly
> expresses my technical opinion on this topic.


I agree with your analysis. I think it would be best if you just
implemented your proposal, including the revised version of the API. The
unbounded-memory case you describe is pretty much a showstopper. The
allocation choices do have to be documented better, IMO.

That would be trunk and 1.7.x, although we can't add a revised API in
the 1.7.x line since it's released already. For compatibility it would

Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

2019-08-24 Thread Ivan Zhakov
On Thu, 25 Jul 2019 at 15:58, Ivan Zhakov  wrote:
>
> On Mon, 8 Jul 2019 at 20:05, Ivan Zhakov  wrote:
> >
> > On Wed, 3 Jul 2019 at 18:37, Joe Orton  wrote:
> > >
> > > On Wed, Jul 03, 2019 at 02:43:02PM +0300, Ivan Zhakov wrote:
> > > > They also make the existing old API unusable for many cases thus
> > > > making a switch to the new API mandatory, even though it doesn't have
> > > > to be so (see below).
> > > >
> > > > APR 1.x did not specify that the result of apr_dir_read() has to be 
> > > > allocated
> > > > in dir->pool:
> > > >
> > > >   
> > > > https://apr.apache.org/docs/apr/1.7/group__apr__dir.html#ga3e4ee253e0c712160bee10bfb9c8e4a8
> > > >
> > > > This function does not accept a pool argument, and I think that it's 
> > > > natural
> > > > to assume limited lifetime in such case. This is what the current major 
> > > > APR
> > > > users (httpd, svn) do, and other users should probably be ready for 
> > > > that too,
> > > > since on Win32 the subsequent calls to apr_dir_read() already 
> > > > invalidate the
> > > > previous apr_finfo_t.
> > >
> > > Are there many examples in the APR API where returned strings are not
> > > either pool-allocated or constant (process lifetime)?  It seems unusual
> > > and very surprising to me.
> > >
> > > A hypothetical API consumer can have made either assumption:
> > >
> > > a) of constant memory (true with limits on Win32, breaks for Unix), or
> > > b) of pool-allocated string lifetime (works for Unix, breaks for Win32)
> > >
> > > neither is documented and both are broken by current implementations. It
> > > is impossible to satisfy both so anybody can argue that fixing either
> > > implementation to match the other is a regression.
> > >
> > > Doubling down on (a) over (b) seems strongly inferior to me, the API
> > > cannot support this without introducing runtime overhead for all callers
> > > (a new iterpool in the dir object), so I'd rather fix this properly with
> > > a new API.
> > >
> > > I'd be fine with leaving Win32 apr_dir_read() behaviour as-is for 1.x at
> > > the same as introducing _pread() if desired - making the strdup only
> > > happen for _pread would only be a minor tweak, not a whole new subpool.
> > >
> >
> > There is example of apr_hash_t.ITERATOR that is statically allocated in
> > the hash object and returned to the caller.
> >
> > Also the native readdir() API behaves the same way [1]:
> > [[[
> > The returned pointer, and pointers within the structure, might be 
> > invalidated
> > or the structure or the storage areas might be overwritten by a subsequent 
> > call
> > to readdir() on the same directory stream.
> > ]]]
> >
> > From this perspective the current behavior of apr_dir_read on Unix looks
> > inconsistent and surprising for the API user because it is impossible to use
> > in a loop without unbounded memory usage.
> >
> > I strongly agree that the revved version of this function that accepts a 
> > POOL
> > argument will be good from both usage and implementation terms. I would be 
> > +1
> > on adding such API. But I also think that unless we fix the original issue
> > by using the preallocated buffer/iterpool, the whole thing would still be
> > problematic for any existing API user.
> >
> > More detailed: if we take this opportunity to choose and document the API to
> > consistently return a temporary result, then as the caller I can either 
> > process
> > the data during iteration or manually put it into a long-living pool if 
> > that is
> > necessary. I think that all current callers work with the API this way. And 
> > I
> > can also switch to the new revved function to manually specify the pool and
> > better control the allocations. If I would like to support both APR 1.x and 
> > 2.x
> > the compatibility is also straightforward to achieve.
> >
> > If we instead choose an option where the results may be allocated
> > in the dir object pool, then as the caller I cannot avoid unbounded memory
> > usage when supporting both APR 1.x and 2.x. And this approach also 
> > introduces
> > the unavoidable the unbounded memory usage for all Win32 users that
> > do not update their code.
> >
> > [1] 
> > https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html#tag_16_475
> >
> Is there any feedback or thoughts on these proposed changes?
>
> Thanks!
>

For what it's worth, I'm -1 on the changes done in r1862071 and
r1862435 for the reasons listed above.

PS: No idea if I can actually veto changes, because while I'm a
committer on the APR project, I am not in its PMC. But "-1" properly
expresses my technical opinion on this topic.

--
Ivan Zhakov


Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

2019-07-25 Thread Ivan Zhakov
On Mon, 8 Jul 2019 at 20:05, Ivan Zhakov  wrote:
>
> On Wed, 3 Jul 2019 at 18:37, Joe Orton  wrote:
> >
> > On Wed, Jul 03, 2019 at 02:43:02PM +0300, Ivan Zhakov wrote:
> > > They also make the existing old API unusable for many cases thus
> > > making a switch to the new API mandatory, even though it doesn't have
> > > to be so (see below).
> > >
> > > APR 1.x did not specify that the result of apr_dir_read() has to be 
> > > allocated
> > > in dir->pool:
> > >
> > >   
> > > https://apr.apache.org/docs/apr/1.7/group__apr__dir.html#ga3e4ee253e0c712160bee10bfb9c8e4a8
> > >
> > > This function does not accept a pool argument, and I think that it's 
> > > natural
> > > to assume limited lifetime in such case. This is what the current major 
> > > APR
> > > users (httpd, svn) do, and other users should probably be ready for that 
> > > too,
> > > since on Win32 the subsequent calls to apr_dir_read() already invalidate 
> > > the
> > > previous apr_finfo_t.
> >
> > Are there many examples in the APR API where returned strings are not
> > either pool-allocated or constant (process lifetime)?  It seems unusual
> > and very surprising to me.
> >
> > A hypothetical API consumer can have made either assumption:
> >
> > a) of constant memory (true with limits on Win32, breaks for Unix), or
> > b) of pool-allocated string lifetime (works for Unix, breaks for Win32)
> >
> > neither is documented and both are broken by current implementations. It
> > is impossible to satisfy both so anybody can argue that fixing either
> > implementation to match the other is a regression.
> >
> > Doubling down on (a) over (b) seems strongly inferior to me, the API
> > cannot support this without introducing runtime overhead for all callers
> > (a new iterpool in the dir object), so I'd rather fix this properly with
> > a new API.
> >
> > I'd be fine with leaving Win32 apr_dir_read() behaviour as-is for 1.x at
> > the same as introducing _pread() if desired - making the strdup only
> > happen for _pread would only be a minor tweak, not a whole new subpool.
> >
>
> There is example of apr_hash_t.ITERATOR that is statically allocated in
> the hash object and returned to the caller.
>
> Also the native readdir() API behaves the same way [1]:
> [[[
> The returned pointer, and pointers within the structure, might be invalidated
> or the structure or the storage areas might be overwritten by a subsequent 
> call
> to readdir() on the same directory stream.
> ]]]
>
> From this perspective the current behavior of apr_dir_read on Unix looks
> inconsistent and surprising for the API user because it is impossible to use
> in a loop without unbounded memory usage.
>
> I strongly agree that the revved version of this function that accepts a POOL
> argument will be good from both usage and implementation terms. I would be +1
> on adding such API. But I also think that unless we fix the original issue
> by using the preallocated buffer/iterpool, the whole thing would still be
> problematic for any existing API user.
>
> More detailed: if we take this opportunity to choose and document the API to
> consistently return a temporary result, then as the caller I can either 
> process
> the data during iteration or manually put it into a long-living pool if that 
> is
> necessary. I think that all current callers work with the API this way. And I
> can also switch to the new revved function to manually specify the pool and
> better control the allocations. If I would like to support both APR 1.x and 
> 2.x
> the compatibility is also straightforward to achieve.
>
> If we instead choose an option where the results may be allocated
> in the dir object pool, then as the caller I cannot avoid unbounded memory
> usage when supporting both APR 1.x and 2.x. And this approach also introduces
> the unavoidable the unbounded memory usage for all Win32 users that
> do not update their code.
>
> [1] 
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html#tag_16_475
>
Is there any feedback or thoughts on these proposed changes?

Thanks!

--
Ivan Zhakov


Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

2019-07-08 Thread Ivan Zhakov
On Wed, 3 Jul 2019 at 18:37, Joe Orton  wrote:
>
> On Wed, Jul 03, 2019 at 02:43:02PM +0300, Ivan Zhakov wrote:
> > They also make the existing old API unusable for many cases thus
> > making a switch to the new API mandatory, even though it doesn't have
> > to be so (see below).
> >
> > APR 1.x did not specify that the result of apr_dir_read() has to be 
> > allocated
> > in dir->pool:
> >
> >   
> > https://apr.apache.org/docs/apr/1.7/group__apr__dir.html#ga3e4ee253e0c712160bee10bfb9c8e4a8
> >
> > This function does not accept a pool argument, and I think that it's natural
> > to assume limited lifetime in such case. This is what the current major APR
> > users (httpd, svn) do, and other users should probably be ready for that 
> > too,
> > since on Win32 the subsequent calls to apr_dir_read() already invalidate the
> > previous apr_finfo_t.
>
> Are there many examples in the APR API where returned strings are not
> either pool-allocated or constant (process lifetime)?  It seems unusual
> and very surprising to me.
>
> A hypothetical API consumer can have made either assumption:
>
> a) of constant memory (true with limits on Win32, breaks for Unix), or
> b) of pool-allocated string lifetime (works for Unix, breaks for Win32)
>
> neither is documented and both are broken by current implementations. It
> is impossible to satisfy both so anybody can argue that fixing either
> implementation to match the other is a regression.
>
> Doubling down on (a) over (b) seems strongly inferior to me, the API
> cannot support this without introducing runtime overhead for all callers
> (a new iterpool in the dir object), so I'd rather fix this properly with
> a new API.
>
> I'd be fine with leaving Win32 apr_dir_read() behaviour as-is for 1.x at
> the same as introducing _pread() if desired - making the strdup only
> happen for _pread would only be a minor tweak, not a whole new subpool.
>

There is example of apr_hash_t.ITERATOR that is statically allocated in
the hash object and returned to the caller.

Also the native readdir() API behaves the same way [1]:
[[[
The returned pointer, and pointers within the structure, might be invalidated
or the structure or the storage areas might be overwritten by a subsequent call
to readdir() on the same directory stream.
]]]

>From this perspective the current behavior of apr_dir_read on Unix looks
inconsistent and surprising for the API user because it is impossible to use
in a loop without unbounded memory usage.

I strongly agree that the revved version of this function that accepts a POOL
argument will be good from both usage and implementation terms. I would be +1
on adding such API. But I also think that unless we fix the original issue
by using the preallocated buffer/iterpool, the whole thing would still be
problematic for any existing API user.

More detailed: if we take this opportunity to choose and document the API to
consistently return a temporary result, then as the caller I can either process
the data during iteration or manually put it into a long-living pool if that is
necessary. I think that all current callers work with the API this way. And I
can also switch to the new revved function to manually specify the pool and
better control the allocations. If I would like to support both APR 1.x and 2.x
the compatibility is also straightforward to achieve.

If we instead choose an option where the results may be allocated
in the dir object pool, then as the caller I cannot avoid unbounded memory
usage when supporting both APR 1.x and 2.x. And this approach also introduces
the unavoidable the unbounded memory usage for all Win32 users that
do not update their code.

[1] 
https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html#tag_16_475


-- 
Ivan Zhakov


Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

2019-07-03 Thread Ruediger Pluem



On 07/03/2019 05:37 PM, Joe Orton wrote:
> On Wed, Jul 03, 2019 at 02:43:02PM +0300, Ivan Zhakov wrote:
>> They also make the existing old API unusable for many cases thus 
>> making a switch to the new API mandatory, even though it doesn't have 
>> to be so (see below).
>>
>> APR 1.x did not specify that the result of apr_dir_read() has to be allocated
>> in dir->pool:
>>
>>   
>> https://apr.apache.org/docs/apr/1.7/group__apr__dir.html#ga3e4ee253e0c712160bee10bfb9c8e4a8
>>
>> This function does not accept a pool argument, and I think that it's natural
>> to assume limited lifetime in such case. This is what the current major APR
>> users (httpd, svn) do, and other users should probably be ready for that too,
>> since on Win32 the subsequent calls to apr_dir_read() already invalidate the
>> previous apr_finfo_t.
> 
> Are there many examples in the APR API where returned strings are not 
> either pool-allocated or constant (process lifetime)?  It seems unusual 
> and very surprising to me.
> 
> A hypothetical API consumer can have made either assumption:
> 
> a) of constant memory (true with limits on Win32, breaks for Unix), or
> b) of pool-allocated string lifetime (works for Unix, breaks for Win32)
> 
> neither is documented and both are broken by current implementations. It 
> is impossible to satisfy both so anybody can argue that fixing either 
> implementation to match the other is a regression.  
> 
> Doubling down on (a) over (b) seems strongly inferior to me, the API 
> cannot support this without introducing runtime overhead for all callers 
> (a new iterpool in the dir object), so I'd rather fix this properly with 
> a new API.
> 
> I'd be fine with leaving Win32 apr_dir_read() behaviour as-is for 1.x at 
> the same as introducing _pread() if desired - making the strdup only 
> happen for _pread would only be a minor tweak, not a whole new subpool.
> 
+1

Regards

Rüdiger




Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

2019-07-03 Thread Joe Orton
On Wed, Jul 03, 2019 at 02:43:02PM +0300, Ivan Zhakov wrote:
> They also make the existing old API unusable for many cases thus 
> making a switch to the new API mandatory, even though it doesn't have 
> to be so (see below).
>
> APR 1.x did not specify that the result of apr_dir_read() has to be allocated
> in dir->pool:
> 
>   
> https://apr.apache.org/docs/apr/1.7/group__apr__dir.html#ga3e4ee253e0c712160bee10bfb9c8e4a8
> 
> This function does not accept a pool argument, and I think that it's natural
> to assume limited lifetime in such case. This is what the current major APR
> users (httpd, svn) do, and other users should probably be ready for that too,
> since on Win32 the subsequent calls to apr_dir_read() already invalidate the
> previous apr_finfo_t.

Are there many examples in the APR API where returned strings are not 
either pool-allocated or constant (process lifetime)?  It seems unusual 
and very surprising to me.

A hypothetical API consumer can have made either assumption:

a) of constant memory (true with limits on Win32, breaks for Unix), or
b) of pool-allocated string lifetime (works for Unix, breaks for Win32)

neither is documented and both are broken by current implementations. It 
is impossible to satisfy both so anybody can argue that fixing either 
implementation to match the other is a regression.  

Doubling down on (a) over (b) seems strongly inferior to me, the API 
cannot support this without introducing runtime overhead for all callers 
(a new iterpool in the dir object), so I'd rather fix this properly with 
a new API.

I'd be fine with leaving Win32 apr_dir_read() behaviour as-is for 1.x at 
the same as introducing _pread() if desired - making the strdup only 
happen for _pread would only be a minor tweak, not a whole new subpool.

Regards, Joe


Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

2019-07-03 Thread Ivan Zhakov
On Tue, 2 Jul 2019 at 13:18, Joe Orton  wrote:
>
> On Tue, Jul 02, 2019 at 09:59:20AM +0200, Branko Čibej wrote:
> > On 02.07.2019 08:49, Joe Orton wrote:
> > > On Thu, Jun 27, 2019 at 05:01:56PM +0300, Ivan Zhakov wrote:
> > >> On Tue, 25 Jun 2019 at 17:21,  wrote:
> > >>
> > >>> Author: jorton
> > >>> Date: Tue Jun 25 14:21:56 2019
> > >>> New Revision: 1862071
> > >>>
> > >>> URL: http://svn.apache.org/viewvc?rev=1862071=rev
> > >>> Log:
> > >>> Add apr_dir_pread(), a variant of apr_dir_read() which allows callers
> > >>> to read a directory with constant memory consumption:
> > > ...
> > >> I'm not sure it's best fix. Better solution would be allocate buffer for
> > >> dirname + MAX_FILE_NAME in apr_dir_t and reuse it on every iteration. 
> > >> Win32
> > >> already has such code.
> >
> > > I didn't look at win32 too closely, so on win32 currently doing:
> > >
> > >   apr_dir_open(, ...);
> > >   apr_dir_read(, ..., x);
> > >   apr_dir_read(, ..., x);
> > >
> > > invalidates f1?  That's pretty ugly, there is no indication in the API
> > > that apr_dir_read() has side-effects like that.
> >
> > The 2.0 docstring says explicitly that "memory is allocated in the pool
> > passed to apr_dir_open()". While this is really bad API design (hence
> > apr_dir_pread() ...), it pretty much forbids the implementation from
> > reusing an internal name buffer. Looks like the Windows implementation
> > needs fixing?
>
> Reading the win32 code a bit more, it is not constant-memory even with
> the buffer in apr_dir_t, because it can allocate from dir->pool (and
> even create cleanups!) in the more_finfo() call, so I think the current
> behaviour is not justifiable.
>
> Ivan, do you have a counter-proposal to apr_dir_pread() to allow
> constant memory while reading a directory?  If we could have a hard
> constraint on apr_dir_read() *never* allocating any memory I guess that
> would work but appears to require substantial changes to the win32 side
> which I'm not qualified for.
>
> IMO not pstrdup'ing the filenames on Win32 is an API violation.  You
> have to be very clear in documenting things like that if someone is
> passing in a struct *.
>
Hi Joe,

I think that these changes to apr_dir_read() may be troublesome, because
they basically *introduce* unbounded memory usage for any existing user of
apr_dir_read() on Win32. They also make the existing old API unusable for
many cases thus making a switch to the new API mandatory, even though it
doesn't have to be so (see below).

APR 1.x did not specify that the result of apr_dir_read() has to be allocated
in dir->pool:

  
https://apr.apache.org/docs/apr/1.7/group__apr__dir.html#ga3e4ee253e0c712160bee10bfb9c8e4a8

This function does not accept a pool argument, and I think that it's natural
to assume limited lifetime in such case. This is what the current major APR
users (httpd, svn) do, and other users should probably be ready for that too,
since on Win32 the subsequent calls to apr_dir_read() already invalidate the
previous apr_finfo_t.

With all that in mind, here is what I propose instead:

1) First, fix the apr_dir_read()'s unbounded memory usage without introducing
   the new API.

  I think that this can be done by using the preallocated buffer for the
  filename or by introducing an internal `iterpool` that will be cleared
  by subsequent apr_dir_read() calls.

   If there are other Win32-related unbounded memory usages in the current
   apr_dir_read() implementation, I'll also be willing to fix them.

2) For APR 2.0, revv apr_dir_read() to apr_dir_read2() that accepts two pools:
   `result_pool` and `scratch_pool`.

   I think that this would be an appropriate change by itself (by giving the
   caller more fine-grained control over the allocations), but I also see it
   as an improvement that is unrelated to (1).

   In other words, I think that we should fix the existing problem first by
   changing or plumbing the existing API and only then add the new API that
   improves over it, because doing so doesn't make the switch to the new API
   mandatory for existing users who want to avoid regressions.

-- 
Ivan Zhakov


Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

2019-07-02 Thread Joe Orton
On Tue, Jul 02, 2019 at 08:22:47AM -0500, Greg Stein wrote:
> On Tue, Jul 2, 2019 at 8:04 AM Branko Čibej  wrote:
> > Perhaps this is the right time to note (again) that over in Subversion
> > land, we found that functions should take _two_ pool parameters: one for
> > allocating the returned values and one for temporary allocations. This
> > is better than functions creating temporary subpools for temporaries,
> > because it allows the API consumer better control over allocations.
> 
> https://subversion.apache.org/docs/community-guide/conventions.html#apr-pools

No argument from me, applying this guidance consistently to the APR API 
would be a massive improvement - and also a massive BC break!

APR is particularly bad about embedding pool pointers in public 
structures - sockaddr is a big PITA here.



Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

2019-07-02 Thread Greg Stein
On Tue, Jul 2, 2019 at 8:04 AM Branko Čibej  wrote:
>...

> Perhaps this is the right time to note (again) that over in Subversion
> land, we found that functions should take _two_ pool parameters: one for
> allocating the returned values and one for temporary allocations. This
> is better than functions creating temporary subpools for temporaries,
> because it allows the API consumer better control over allocations.
>

https://subversion.apache.org/docs/community-guide/conventions.html#apr-pools


Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

2019-07-02 Thread Branko Čibej
On 02.07.2019 14:04, Joe Orton wrote:
> On Tue, Jul 02, 2019 at 11:18:31AM +0100, Joe Orton wrote:
>> Reading the win32 code a bit more, it is not constant-memory even with 
>> the buffer in apr_dir_t, because it can allocate from dir->pool (and 
>> even create cleanups!) in the more_finfo() call, so I think the current 
>> behaviour is not justifiable.
> Also, since the Unix implementation calls apr_stat(), which takes a 
> pool, I actually can't imagine there is another way around the need for 
> a pool passed to apr_dir_read() call to guarantee constant memory.

Any function that allocates memory should take a pool parameter. We
can't change apr_dir_read() retroactively, but we can deprecate it and I
think we should.

Perhaps this is the right time to note (again) that over in Subversion
land, we found that functions should take _two_ pool parameters: one for
allocating the returned values and one for temporary allocations. This
is better than functions creating temporary subpools for temporaries,
because it allows the API consumer better control over allocations.

-- Brane



Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

2019-07-02 Thread Joe Orton
On Tue, Jul 02, 2019 at 11:18:31AM +0100, Joe Orton wrote:
> Reading the win32 code a bit more, it is not constant-memory even with 
> the buffer in apr_dir_t, because it can allocate from dir->pool (and 
> even create cleanups!) in the more_finfo() call, so I think the current 
> behaviour is not justifiable.

Also, since the Unix implementation calls apr_stat(), which takes a 
pool, I actually can't imagine there is another way around the need for 
a pool passed to apr_dir_read() call to guarantee constant memory.



Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

2019-07-02 Thread Joe Orton
On Tue, Jul 02, 2019 at 09:59:20AM +0200, Branko Čibej wrote:
> On 02.07.2019 08:49, Joe Orton wrote:
> > On Thu, Jun 27, 2019 at 05:01:56PM +0300, Ivan Zhakov wrote:
> >> On Tue, 25 Jun 2019 at 17:21,  wrote:
> >>
> >>> Author: jorton
> >>> Date: Tue Jun 25 14:21:56 2019
> >>> New Revision: 1862071
> >>>
> >>> URL: http://svn.apache.org/viewvc?rev=1862071=rev
> >>> Log:
> >>> Add apr_dir_pread(), a variant of apr_dir_read() which allows callers
> >>> to read a directory with constant memory consumption:
> > ...
> >> I'm not sure it's best fix. Better solution would be allocate buffer for
> >> dirname + MAX_FILE_NAME in apr_dir_t and reuse it on every iteration. Win32
> >> already has such code.
>
> > I didn't look at win32 too closely, so on win32 currently doing:
> >
> >   apr_dir_open(, ...);
> >   apr_dir_read(, ..., x);
> >   apr_dir_read(, ..., x);
> >
> > invalidates f1?  That's pretty ugly, there is no indication in the API 
> > that apr_dir_read() has side-effects like that.
> 
> The 2.0 docstring says explicitly that "memory is allocated in the pool
> passed to apr_dir_open()". While this is really bad API design (hence
> apr_dir_pread() ...), it pretty much forbids the implementation from
> reusing an internal name buffer. Looks like the Windows implementation
> needs fixing?

Reading the win32 code a bit more, it is not constant-memory even with 
the buffer in apr_dir_t, because it can allocate from dir->pool (and 
even create cleanups!) in the more_finfo() call, so I think the current 
behaviour is not justifiable.

Ivan, do you have a counter-proposal to apr_dir_pread() to allow 
constant memory while reading a directory?  If we could have a hard 
constraint on apr_dir_read() *never* allocating any memory I guess that 
would work but appears to require substantial changes to the win32 side 
which I'm not qualified for.

IMO not pstrdup'ing the filenames on Win32 is an API violation.  You 
have to be very clear in documenting things like that if someone is 
passing in a struct *.

Regards, Joe


Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

2019-07-02 Thread Branko Čibej
On 02.07.2019 08:49, Joe Orton wrote:
> On Thu, Jun 27, 2019 at 05:01:56PM +0300, Ivan Zhakov wrote:
>> On Tue, 25 Jun 2019 at 17:21,  wrote:
>>
>>> Author: jorton
>>> Date: Tue Jun 25 14:21:56 2019
>>> New Revision: 1862071
>>>
>>> URL: http://svn.apache.org/viewvc?rev=1862071=rev
>>> Log:
>>> Add apr_dir_pread(), a variant of apr_dir_read() which allows callers
>>> to read a directory with constant memory consumption:
> ...
>> I'm not sure it's best fix. Better solution would be allocate buffer for
>> dirname + MAX_FILE_NAME in apr_dir_t and reuse it on every iteration. Win32
>> already has such code.
> I didn't look at win32 too closely, so on win32 currently doing:
>
>   apr_dir_open(, ...);
>   apr_dir_read(, ..., x);
>   apr_dir_read(, ..., x);
>
> invalidates f1?  That's pretty ugly, there is no indication in the API 
> that apr_dir_read() has side-effects like that.

The 2.0 docstring says explicitly that "memory is allocated in the pool
passed to apr_dir_open()". While this is really bad API design (hence
apr_dir_pread() ...), it pretty much forbids the implementation from
reusing an internal name buffer. Looks like the Windows implementation
needs fixing?

-- Brane



Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

2019-07-02 Thread Joe Orton
On Thu, Jun 27, 2019 at 05:01:56PM +0300, Ivan Zhakov wrote:
> On Tue, 25 Jun 2019 at 17:21,  wrote:
> 
> > Author: jorton
> > Date: Tue Jun 25 14:21:56 2019
> > New Revision: 1862071
> >
> > URL: http://svn.apache.org/viewvc?rev=1862071=rev
> > Log:
> > Add apr_dir_pread(), a variant of apr_dir_read() which allows callers
> > to read a directory with constant memory consumption:
...
> I'm not sure it's best fix. Better solution would be allocate buffer for
> dirname + MAX_FILE_NAME in apr_dir_t and reuse it on every iteration. Win32
> already has such code.

I didn't look at win32 too closely, so on win32 currently doing:

  apr_dir_open(, ...);
  apr_dir_read(, ..., x);
  apr_dir_read(, ..., x);

invalidates f1?  That's pretty ugly, there is no indication in the API 
that apr_dir_read() has side-effects like that.




Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

2019-06-27 Thread Ivan Zhakov
On Tue, 25 Jun 2019 at 17:21,  wrote:

> Author: jorton
> Date: Tue Jun 25 14:21:56 2019
> New Revision: 1862071
>
> URL: http://svn.apache.org/viewvc?rev=1862071=rev
> Log:
> Add apr_dir_pread(), a variant of apr_dir_read() which allows callers
> to read a directory with constant memory consumption:
>
> * include/apr_file_info.h: Add warning on memory consumption for
>   apr_dir_read; declare apr_dir_pread.
>
> * file_io/unix/dir.c (apr_dir_pread): Rename from apr_dir_read and
>   take pool argument.  (apr_dir_read): Reimplement using it.
>
> * file_io/win32/dir.c, file_io/os2/dir.c: Likewise, but untested.


> * test/testdir.c (test_pread) [APR_POOL_DEBUG]: Add test case.
>
> I'm not sure it's best fix. Better solution would be allocate buffer for
dirname + MAX_FILE_NAME in apr_dir_t and reuse it on every iteration. Win32
already has such code.


-- 
Ivan Zhakov