Re: [RFC] apr_array_find()

2018-04-28 Thread Jim Jagielski
FWIW, I also agree w/ Greg's PoV regarding this.

> On Apr 27, 2018, at 3:23 PM, Greg Stein  wrote:
> 
> At one point, we used to use an svn_array_find() or some such. We deprecated 
> it. Creating a function to do comparisons, then set up a function call... It 
> was just annoying. Iteration over an APR array is easy, and very clear. There 
> is basically no code or mental overload. Using a comparator function, there 
> is.
> 
> -0 on the concept.
> 
> Cheers,
> -g
> 
> ps. and if written, it *should* allow for structs; we use those often in svn. 
> Avoids alloc'ing a structure and sticking a ptr in the array.
> 
> On Fri, Apr 27, 2018, 13:46 William A Rowe Jr  > wrote:
> SVN's extensions to apr largely exist here;
> 
> http://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_subr/ 
> 
> 
> Because they are based on apr and rooted in that ecosystem, there are a 
> number of useful extensions in that repo that were never pushed upstream, but 
> are not "withheld". Nobody had the time to submit and champion them.
> 
> On Fri, Apr 27, 2018, 13:25 Jim Riggs  > wrote:
> > On 27 Apr 2018, at 11:50, William A Rowe Jr  > > wrote:
> > 
> > If we code this abstractly, comparator declaration is not type-safe, but can
> > be declared so that it is usable by table, hash, b-tree and many other 
> > approaches to data organization. With clever use of wrappers, we should
> > be able to make a derivation (counted-byte-string tables, for example)
> > behave in a type-safe manner at no performance penalty.
> > 
> > We should cross check the subversion util feature set to ensure we don't
> > have something ready to borrow, already.
> 
> I fear things have now started going over my head. :-) I'll try to keep up.
> 



Re: [RFC] apr_array_find()

2018-04-28 Thread William A Rowe Jr
On Sat, Apr 28, 2018 at 5:59 AM, Branko Čibej  wrote:

> On 27.04.2018 21:23, Greg Stein wrote:
> > At one point, we used to use an svn_array_find() or some such. We
> > deprecated it. Creating a function to do comparisons, then set up a
> > function call... It was just annoying. Iteration over an APR array is
> > easy, and very clear. There is basically no code or mental overload.
> > Using a comparator function, there is.
>
> static int compare(const void* x, const void* y) {
>return strcmp(*(const char**) x, (const char*) y));
>// Oops ... is the first argument the pointer to the
>// array element and the second the search key, or is
>// it the other way around? Or do we have to dereference
>// the search key in order to make these arguments
>// symmetric?
> }
>
> // ... lots of code in between
>
> int i = apr_array_find(array, compare, "key");
>
> The first form is definitely easier to understand and nicely localized.
> "Explicit is better than implicit" and all that.
>

If you wanted to pursue this, I'd definitely apr_{type}_compare_set()
the comparator callback at array creation. Saves stack prep on lots
of calls, and avoids individual errors passing a comparator off in
some infrequently used code path. Use apr_uintptr_t if you want to
portably pass an int-or-ptr portably as the lhs/rhs argument. But
Greg makes a good case for comparing structs passed on the stack
(svn_errno is such a struct), and this requires the _compare_set()
to accept an incomplete type. This likely can't be done in 1.x (and
would be problematic when porting certain code to 2.x) because
apr_array_t was a very transparent type (where to hide the pointer
to comparator fn?)


Re: [RFC] apr_array_find()

2018-04-28 Thread Branko Čibej
On 27.04.2018 21:23, Greg Stein wrote:
> At one point, we used to use an svn_array_find() or some such. We
> deprecated it. Creating a function to do comparisons, then set up a
> function call... It was just annoying. Iteration over an APR array is
> easy, and very clear. There is basically no code or mental overload.
> Using a comparator function, there is.

Yes, I was thinking along those lines, too ... consider:

const char* key = "key";
int i;
for (i = 0; i < array->nelts; ++i) {
if (0 == strcmp(key, APR_ARRAY_IDX(array, i, const char*)))
break;
}


Compare that to:

static int compare(const void* x, const void* y) {
   return strcmp(*(const char**) x, (const char*) y));
   // Oops ... is the first argument the pointer to the
   // array element and the second the search key, or is
   // it the other way around? Or do we have to dereference
   // the search key in order to make these arguments
   // symmetric?
}

// ... lots of code in between

int i = apr_array_find(array, compare, "key");


The first form is definitely easier to understand and nicely localized.
"Explicit is better than implicit" and all that.

-- Brane


>
> -0 on the concept.
>
> Cheers,
> -g
>
> ps. and if written, it *should* allow for structs; we use those often
> in svn. Avoids alloc'ing a structure and sticking a ptr in the array.
>
> On Fri, Apr 27, 2018, 13:46 William A Rowe Jr  > wrote:
>
> SVN's extensions to apr largely exist here;
>
> http://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_subr/
>
> Because they are based on apr and rooted in that ecosystem, there
> are a number of useful extensions in that repo that were never
> pushed upstream, but are not "withheld". Nobody had the time to
> submit and champion them.
>
> On Fri, Apr 27, 2018, 13:25 Jim Riggs  > wrote:
>
> > On 27 Apr 2018, at 11:50, William A Rowe Jr
> > wrote:
> >
> > If we code this abstractly, comparator declaration is not
> type-safe, but can
> > be declared so that it is usable by table, hash, b-tree and
> many other
> > approaches to data organization. With clever use of
> wrappers, we should
> > be able to make a derivation (counted-byte-string tables,
> for example)
> > behave in a type-safe manner at no performance penalty.
> >
> > We should cross check the subversion util feature set to
> ensure we don't
> > have something ready to borrow, already.
>
> I fear things have now started going over my head. :-) I'll
> try to keep up.
>



Re: [RFC] apr_array_find()

2018-04-27 Thread Greg Stein
At one point, we used to use an svn_array_find() or some such. We
deprecated it. Creating a function to do comparisons, then set up a
function call... It was just annoying. Iteration over an APR array is easy,
and very clear. There is basically no code or mental overload. Using a
comparator function, there is.

-0 on the concept.

Cheers,
-g

ps. and if written, it *should* allow for structs; we use those often in
svn. Avoids alloc'ing a structure and sticking a ptr in the array.

On Fri, Apr 27, 2018, 13:46 William A Rowe Jr  wrote:

> SVN's extensions to apr largely exist here;
>
> http://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_subr/
>
> Because they are based on apr and rooted in that ecosystem, there are a
> number of useful extensions in that repo that were never pushed upstream,
> but are not "withheld". Nobody had the time to submit and champion them.
>
> On Fri, Apr 27, 2018, 13:25 Jim Riggs  wrote:
>
>> > On 27 Apr 2018, at 11:50, William A Rowe Jr 
>> wrote:
>> >
>> > If we code this abstractly, comparator declaration is not type-safe,
>> but can
>> > be declared so that it is usable by table, hash, b-tree and many other
>> > approaches to data organization. With clever use of wrappers, we should
>> > be able to make a derivation (counted-byte-string tables, for example)
>> > behave in a type-safe manner at no performance penalty.
>> >
>> > We should cross check the subversion util feature set to ensure we don't
>> > have something ready to borrow, already.
>>
>> I fear things have now started going over my head. :-) I'll try to keep
>> up.
>>
>>


Re: [RFC] apr_array_find()

2018-04-27 Thread William A Rowe Jr
SVN's extensions to apr largely exist here;

http://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_subr/

Because they are based on apr and rooted in that ecosystem, there are a
number of useful extensions in that repo that were never pushed upstream,
but are not "withheld". Nobody had the time to submit and champion them.

On Fri, Apr 27, 2018, 13:25 Jim Riggs  wrote:

> > On 27 Apr 2018, at 11:50, William A Rowe Jr  wrote:
> >
> > If we code this abstractly, comparator declaration is not type-safe, but
> can
> > be declared so that it is usable by table, hash, b-tree and many other
> > approaches to data organization. With clever use of wrappers, we should
> > be able to make a derivation (counted-byte-string tables, for example)
> > behave in a type-safe manner at no performance penalty.
> >
> > We should cross check the subversion util feature set to ensure we don't
> > have something ready to borrow, already.
>
> I fear things have now started going over my head. :-) I'll try to keep up.
>
>


Re: [RFC] apr_array_find()

2018-04-27 Thread Jim Riggs
> On 27 Apr 2018, at 11:50, William A Rowe Jr  wrote:
> 
> If we code this abstractly, comparator declaration is not type-safe, but can
> be declared so that it is usable by table, hash, b-tree and many other 
> approaches to data organization. With clever use of wrappers, we should
> be able to make a derivation (counted-byte-string tables, for example)
> behave in a type-safe manner at no performance penalty.
> 
> We should cross check the subversion util feature set to ensure we don't
> have something ready to borrow, already.

I fear things have now started going over my head. :-) I'll try to keep up.



Re: [RFC] apr_array_find()

2018-04-27 Thread Jim Riggs
> On 27 Apr 2018, at 11:44, Jim Riggs  wrote:
> 
>> Major quibble: The comparator should take a parameter that receives the
>> array element size. Careful implementations will either use that size or
>> check that it's the correct size. APR arrays can contain whole
>> structures, not just pointers.
> 
> Again, the comparator has to be smart. It should know what it is receiving. 
> They may be ints, structs, strings, whatever. It can and should cast as 
> necessary. The void* is just a way of saying "I'm passing you something. You 
> have to know how to handle it." It may or may not be a pointer. I have 
> successfully tested it with char* (using strcmp/strcasecmp) as well as int 
> and char using custom comp callback functions. They all worked as expected.

Though, I now admit that this presumably only works for types where 
sizeof() <= sizeof(void*) due to casting, but that seems like a 
documentation issue, not a technical one. Personally, I haven't seen or worked 
with many apr_arrays being used with anything other than pointers, not to say 
there aren't lots of them. (I am the newish guy on the block.)

Heck, even if we were to restrict this function to only working with arrays 
whose elements are pointers, it would still be useful and help remove some code 
duplication, IMO.



Re: [RFC] apr_array_find()

2018-04-27 Thread William A Rowe Jr
On Fri, Apr 27, 2018 at 11:44 AM, Jim Riggs  wrote:

> > On 27 Apr 2018, at 05:06, Branko Čibej  wrote:
> >
> > On 27.04.2018 11:30, Nick Kew wrote:
> >>> In my use cases, the search may often be strings, but not always, so I
> thought that maybe APR could/should provide something more generic. The
> above functions could then be rewritten to use this new function. Here are
> my thoughts:
> >>>
> >>> 1. The search item (needle) can be anything rather than just a string
> as above.
> >>> 2. The caller provides a callback that compares the values to needle
> in the same way strcmp()/strcasecmp() do. That is, 0 means "equal". In
> fact, strcmp()/strcasecmp() can -- and probably often will -- be used as
> the callback for strings.
> >> There’s a bit of a mismatch between that and the “can be anything”.
> >> The latter would call for a size parameter that could be passed to
> memcmp,
> >> as well as of course strncmp/strncasecmp.
> >
> > You can't just use any of the string comparison functions directly
> > anyway: first because the signature is wrong, and second because there's
> > no requirement that array elements are pointers.
>
> Nick - I don't expect that comp would ever do just a memcmp...unless
> that's how someone implements it, but then that implementation would have
> to know what kind of data (and the size) it is dealing with. The comparator
> would be a custom written function that expects int or char* or struct* or
> _ and do whatever it considers an equality comparison. It may receive
> struct*s and only compare one or two members to consider them "equal" in
> the comparator's context.
>
> Branko - First, I can use them directly with either a compiler warning
> (implicit cast char* to void*) or an explicit cast. Second, this does not
> require that the elements be pointers. If they are char*, though, strcmp
> and strcasecmp work fine.
>
> >>> 3. A start (input) and index (output) parameter can be used to iterate
> and find all occurrences.
> >>> What do others think? Is there value in this, or is it just me?
> >> I’ve had occasion to search an array, though I don’t think I’ve needed
> anything more
> >> generic than the HTTPD functions you name above.
> >>
> >>> +typedef int (apr_array_find_comparison_callback_fn_t)(const void *x,
> const void *y);
> >> Minor quibble: the name seems quite tortuously overdescriptive!
>
> Agreed, but it's not something that really gets used much, though. I just
> based it off the apr_table_do callback's name.
>

If we code this abstractly, comparator declaration is not type-safe, but can
be declared so that it is usable by table, hash, b-tree and many other
approaches to data organization. With clever use of wrappers, we should
be able to make a derivation (counted-byte-string tables, for example)
behave in a type-safe manner at no performance penalty.

We should cross check the subversion util feature set to ensure we don't
have something ready to borrow, already.


Re: [RFC] apr_array_find()

2018-04-27 Thread Jim Riggs
> On 27 Apr 2018, at 05:06, Branko Čibej  wrote:
> 
> On 27.04.2018 11:30, Nick Kew wrote:
>>> In my use cases, the search may often be strings, but not always, so I 
>>> thought that maybe APR could/should provide something more generic. The 
>>> above functions could then be rewritten to use this new function. Here are 
>>> my thoughts:
>>> 
>>> 1. The search item (needle) can be anything rather than just a string as 
>>> above.
>>> 2. The caller provides a callback that compares the values to needle in the 
>>> same way strcmp()/strcasecmp() do. That is, 0 means "equal". In fact, 
>>> strcmp()/strcasecmp() can -- and probably often will -- be used as the 
>>> callback for strings.
>> There’s a bit of a mismatch between that and the “can be anything”.
>> The latter would call for a size parameter that could be passed to memcmp,
>> as well as of course strncmp/strncasecmp.
> 
> You can't just use any of the string comparison functions directly
> anyway: first because the signature is wrong, and second because there's
> no requirement that array elements are pointers.

Nick - I don't expect that comp would ever do just a memcmp...unless that's how 
someone implements it, but then that implementation would have to know what 
kind of data (and the size) it is dealing with. The comparator would be a 
custom written function that expects int or char* or struct* or _ and do 
whatever it considers an equality comparison. It may receive struct*s and only 
compare one or two members to consider them "equal" in the comparator's context.

Branko - First, I can use them directly with either a compiler warning 
(implicit cast char* to void*) or an explicit cast. Second, this does not 
require that the elements be pointers. If they are char*, though, strcmp and 
strcasecmp work fine.


>>> 3. A start (input) and index (output) parameter can be used to iterate and 
>>> find all occurrences.
>>> What do others think? Is there value in this, or is it just me?
>> I’ve had occasion to search an array, though I don’t think I’ve needed 
>> anything more
>> generic than the HTTPD functions you name above.
>> 
>>> +typedef int (apr_array_find_comparison_callback_fn_t)(const void *x, const 
>>> void *y);
>> Minor quibble: the name seems quite tortuously overdescriptive!

Agreed, but it's not something that really gets used much, though. I just based 
it off the apr_table_do callback's name.


> Major quibble: The comparator should take a parameter that receives the
> array element size. Careful implementations will either use that size or
> check that it's the correct size. APR arrays can contain whole
> structures, not just pointers.

Again, the comparator has to be smart. It should know what it is receiving. 
They may be ints, structs, strings, whatever. It can and should cast as 
necessary. The void* is just a way of saying "I'm passing you something. You 
have to know how to handle it." It may or may not be a pointer. I have 
successfully tested it with char* (using strcmp/strcasecmp) as well as int and 
char using custom comp callback functions. They all worked as expected.


> 
>>> +  for (i = start; (i < arr->nelts) && comp(needle, *(void **)(arr->elts + 
>>> (arr->elt_size * i))); i++) {
>>> +/* empty */
>>> +  }
> 
> The comparator should receive the pointer to the array element, not the
> array element interpreted as a pointer! See above, array elements don't
> have to be pointers. Why else do you think apr_array_t::elt_size exists?
> Or the APR_ARRAY_IDX() and APR_ARRAY_PUSH() macros?

Again, see above. It is only void* to say "I am passing you something, but 
*you* know what it really is."


>> Why not make that much more readable and put the substance inside the loop?
>> for (i …) {
>>if (!comp(…)) {
>>  set *index;
>>  set return value;
>>  break;
>>}
>> }
> 
> +1

Agreed:

diff -r 335976d9e7cb tables/apr_tables.c
--- a/tables/apr_tables.c   Wed Apr 04 16:41:28 2018 +
+++ b/tables/apr_tables.c   Fri Apr 27 11:40:48 2018 -0500
@@ -281,6 +281,31 @@
 return res;
 }

+APR_DECLARE(void *) apr_array_find(apr_array_header_t *arr,
+   const void *needle,
+   apr_array_find_comparison_callback_fn_t 
*comp,
+   int start,
+   int *index)
+{
+  int i;
+
+  if (!arr || !needle || !comp || (start < 0)) {
+return NULL;
+  }
+
+  for (i = start; i < arr->nelts; i++) {
+if (comp(needle, *(void **)(arr->elts + (arr->elt_size * i))) == 0) {
+  if (index) {
+*index = i;
+  }
+
+  return *(void **)(arr->elts + (arr->elt_size * i));
+}
+  }
+
+  return NULL;
+}
+

 /*
  *



Re: [RFC] apr_array_find()

2018-04-27 Thread Branko Čibej
On 27.04.2018 11:30, Nick Kew wrote:
>> On 27 Apr 2018, at 03:37, Jim Riggs  wrote:
>>
>> I am working on some httpd changes/features/ideas and have multiple needs to 
>> find items in an array. In httpd, we currently have these:
>>
>> AP_DECLARE(int) ap_array_str_index(const apr_array_header_t *array,
>>   const char *s,
>>   int start);
>>
>> AP_DECLARE(int) ap_array_str_contains(const apr_array_header_t *array,
>>  const char *s);
> Both of which could - perhaps should - really be APR functions!
>
> That is, with one reservation, that applies to both those and your proposal.
> This is (inevitably) linear search.  If we provide search within APR, we could
> be misleading developers into using it for very big arrays, where they should
> be avoiding linear search.
>
> I think I’ve been through that thought process myself:
> 1. I need a searchable set structure.
> 2. Hmm.  Why do I need to reinvent the obvious wheel of apr_array_find()???
> 3. Oh I see, it’s linear search.
> 4. How scalable do I want my array to be?
> 5. Damn.  Better use hash instead.
>
> In the absence of step 2, I’d probably never have reached steps 3-5.

If the prerequisite for using the correct data structure is the presence
or absence of an API which documents "linear search," the damage is
probably already too big anyway ... It's not our job to teach
programmers about data structures and algorithms.


>> In my use cases, the search may often be strings, but not always, so I 
>> thought that maybe APR could/should provide something more generic. The 
>> above functions could then be rewritten to use this new function. Here are 
>> my thoughts:
>>
>> 1. The search item (needle) can be anything rather than just a string as 
>> above.
>> 2. The caller provides a callback that compares the values to needle in the 
>> same way strcmp()/strcasecmp() do. That is, 0 means "equal". In fact, 
>> strcmp()/strcasecmp() can -- and probably often will -- be used as the 
>> callback for strings.
> There’s a bit of a mismatch between that and the “can be anything”.
> The latter would call for a size parameter that could be passed to memcmp,
> as well as of course strncmp/strncasecmp.

You can't just use any of the string comparison functions directly
anyway: first because the signature is wrong, and second because there's
no requirement that array elements are pointers.

>> 3. A start (input) and index (output) parameter can be used to iterate and 
>> find all occurrences.
>> What do others think? Is there value in this, or is it just me?
> I’ve had occasion to search an array, though I don’t think I’ve needed 
> anything more
> generic than the HTTPD functions you name above.
>
>> +typedef int (apr_array_find_comparison_callback_fn_t)(const void *x, const 
>> void *y);
> Minor quibble: the name seems quite tortuously overdescriptive!

Major quibble: The comparator should take a parameter that receives the
array element size. Careful implementations will either use that size or
check that it's the correct size. APR arrays can contain whole
structures, not just pointers.

>> +  for (i = start; (i < arr->nelts) && comp(needle, *(void **)(arr->elts + 
>> (arr->elt_size * i))); i++) {
>> +/* empty */
>> +  }

The comparator should receive the pointer to the array element, not the
array element interpreted as a pointer! See above, array elements don't
have to be pointers. Why else do you think apr_array_t::elt_size exists?
Or the APR_ARRAY_IDX() and APR_ARRAY_PUSH() macros?

> Why not make that much more readable and put the substance inside the loop?
> for (i …) {
> if (!comp(…)) {
>   set *index;
>   set return value;
>   break;
> }
> }

+1


-- Brane



Re: [RFC] apr_array_find()

2018-04-27 Thread Nick Kew

> On 27 Apr 2018, at 03:37, Jim Riggs  wrote:
> 
> I am working on some httpd changes/features/ideas and have multiple needs to 
> find items in an array. In httpd, we currently have these:
> 
> AP_DECLARE(int) ap_array_str_index(const apr_array_header_t *array,
>   const char *s,
>   int start);
> 
> AP_DECLARE(int) ap_array_str_contains(const apr_array_header_t *array,
>  const char *s);

Both of which could - perhaps should - really be APR functions!

That is, with one reservation, that applies to both those and your proposal.
This is (inevitably) linear search.  If we provide search within APR, we could
be misleading developers into using it for very big arrays, where they should
be avoiding linear search.

I think I’ve been through that thought process myself:
1. I need a searchable set structure.
2. Hmm.  Why do I need to reinvent the obvious wheel of apr_array_find()???
3. Oh I see, it’s linear search.
4. How scalable do I want my array to be?
5. Damn.  Better use hash instead.

In the absence of step 2, I’d probably never have reached steps 3-5.

> In my use cases, the search may often be strings, but not always, so I 
> thought that maybe APR could/should provide something more generic. The above 
> functions could then be rewritten to use this new function. Here are my 
> thoughts:
> 
> 1. The search item (needle) can be anything rather than just a string as 
> above.
> 2. The caller provides a callback that compares the values to needle in the 
> same way strcmp()/strcasecmp() do. That is, 0 means "equal". In fact, 
> strcmp()/strcasecmp() can -- and probably often will -- be used as the 
> callback for strings.

There’s a bit of a mismatch between that and the “can be anything”.
The latter would call for a size parameter that could be passed to memcmp,
as well as of course strncmp/strncasecmp.

> 3. A start (input) and index (output) parameter can be used to iterate and 
> find all occurrences.

> What do others think? Is there value in this, or is it just me?

I’ve had occasion to search an array, though I don’t think I’ve needed anything 
more
generic than the HTTPD functions you name above.

> +typedef int (apr_array_find_comparison_callback_fn_t)(const void *x, const 
> void *y);

Minor quibble: the name seems quite tortuously overdescriptive!

> +  for (i = start; (i < arr->nelts) && comp(needle, *(void **)(arr->elts + 
> (arr->elt_size * i))); i++) {
> +/* empty */
> +  }

Why not make that much more readable and put the substance inside the loop?
for (i …) {
if (!comp(…)) {
  set *index;
  set return value;
  break;
}
}

— 
Nick Kew