Re: apr_json object key doublons

2018-09-01 Thread Yann Ylavic
On Sat, Sep 1, 2018 at 1:49 PM Graham Leggett  wrote:
>
> On 01 Sep 2018, at 13:13, Yann Ylavic  wrote:
>
> > So how about the attached patch too, allowing for case sensitive 
> > apr_tables? :p
> > apr_json could then use apr_table_make_ex()…
>
> I like the idea of a case sensitive apr_table, it’s something that would have 
> been useful to me in the past.

Yeah, quite tangential to JSON but indeed useful, I'll commit it
independently if no objection arises.

>
> Using it in the JSON code I’m still not keen on, as we need the hard security 
> guarantees of the apr_hash method signature, which has just one mode of 
> operation.

I'm not sure what you mean by the "hard security guarantees of the
apr_hash method signature", for me the only hard guarantee of
apr_hash_t is to not lose data, but that's also a guarantee of
apr_table :)

Speaking of JSON, the downside of apr_hash are no-doublons and the
need for klen all over the place, so for me the right structure is
(caseless-)apr_table.

> We’re also still forced to unnecessarily strndup all our keys on the way in 
> to make them NUL terminated.

No we don't, in my patch apr_json_string_create() does apr_strmemdup()
only if the given len != APR_JSON_VALUE_STRING (which makes sense both
to make sure that the given string is NUL-terminated and to provide a
way to duplicate it if it's not already or the scope doesn't fit).
Then C string key version of apr_json_object_{set,add}() uses
APR_JSON_VALUE_STRING to build the json string key and
apr_table_setn/addn() to fill the table, so no duplication occurs by
default.
The way to duplicate the key for the user would be to
apr_json_object_{set,add}_ex(apr_json_string_create(key, strlen(key),
value, ...) explicitely, but that's not the "default" API.

Otherwise everything should work as before, and I like the _set/add()
semantics (borrowed from apr_table) nice w.r.t. doublons.


Regards,
Yann.


Re: apr_json object key doublons

2018-09-01 Thread Graham Leggett
On 01 Sep 2018, at 13:13, Yann Ylavic  wrote:

> So how about the attached patch too, allowing for case sensitive apr_tables? 
> :p
> apr_json could then use apr_table_make_ex()…

I like the idea of a case sensitive apr_table, it’s something that would have 
been useful to me in the past.

Using it in the JSON code I’m still not keen on, as we need the hard security 
guarantees of the apr_hash method signature, which has just one mode of 
operation. We’re also still forced to unnecessarily strndup all our keys on the 
way in to make them NUL terminated.

Regards,
Graham
—



Re: apr_json object key doublons

2018-09-01 Thread Yann Ylavic
On Sat, Sep 1, 2018 at 1:06 AM Graham Leggett  wrote:
>
> In the case of the apr_table.h API, keys are case insensitive, meaning that 
> if we use apr_tables.h in turn apr_json.h will in turn no longer be RFC 
> compliant (object keys will fail comparisons), and therefore neither will any 
> of the JOSE code.

Ah yes, right, I miised that.

>
> As a result, we cannot use the apr_tables.h API for this and remain compliant.

So how about the attached patch too, allowing for case sensitive apr_tables? :p
apr_json could then use apr_table_make_ex()...

Regards,
Yann.
Index: include/apr_tables.h
===
--- include/apr_tables.h	(revision 1839763)
+++ include/apr_tables.h	(working copy)
@@ -222,13 +222,26 @@ APR_DECLARE(char *) apr_array_pstrcat(apr_pool_t *
   const apr_array_header_t *arr,
   const char sep);
 
+#define APR_TABLE_CASELESS 0x1 /** ignore keys' case */
+
 /**
  * Make a new table.
  * @param p The pool to allocate the pool out of
  * @param nelts The number of elements in the initial table.
+ * @param flags Bitmask of APR_TABLE_* flags.
  * @return The new table.
  * @warning This table can only store text data
  */
+APR_DECLARE(apr_table_t *) apr_table_make_ex(apr_pool_t *p, int nelts,
+ int flags);
+
+/**
+ * Make a new table, ignoring keys' case.
+ * @param p The pool to allocate the pool out of
+ * @param nelts The number of elements in the initial table.
+ * @return The new table.
+ * @warning This table can only store text data
+ */
 APR_DECLARE(apr_table_t *) apr_table_make(apr_pool_t *p, int nelts);
 
 /**
Index: tables/apr_tables.c
===
--- tables/apr_tables.c	(revision 1839763)
+++ tables/apr_tables.c	(working copy)
@@ -303,9 +303,9 @@ APR_DECLARE(char *) apr_array_pstrcat(apr_pool_t *
  * 4 bytes, normalized for case-insensitivity and packed into
  * an int...this checksum allows us to do a single integer
  * comparison as a fast check to determine whether we can
- * skip a strcasecmp
+ * skip a str[case]cmp
  */
-#define COMPUTE_KEY_CHECKSUM(key, checksum)\
+#define COMPUTE_KEY_CHECKSUM(key, checksum, nc)\
 {  \
 const char *k = (key); \
 apr_uint32_t c = (apr_uint32_t)*k; \
@@ -325,7 +325,9 @@ APR_DECLARE(char *) apr_array_pstrcat(apr_pool_t *
 c = (apr_uint32_t)*++k;\
 checksum |= c; \
 }  \
-checksum &= CASE_MASK; \
+if ((nc)) {\
+checksum &= CASE_MASK; \
+}  \
 }
 
 /** The opaque string-content table type */
@@ -341,6 +343,10 @@ struct apr_table_t {
 /** Who created the array. */
 void *creator;
 #endif
+/** The compare function (str[case]cmp) */
+int (*cmp)(const char*, const char*);
+/** Bitmask of APR_TABLE_* flags */
+int flags;
 /* An index to speed up table lookups.  The way this works is:
  *   - Hash the key into the index:
  * - index_first[TABLE_HASH(key)] is the offset within
@@ -397,7 +403,8 @@ APR_DECLARE(int) apr_is_empty_table(const apr_tabl
 return ((t == NULL) || (t->a.nelts == 0));
 }
 
-APR_DECLARE(apr_table_t *) apr_table_make(apr_pool_t *p, int nelts)
+APR_DECLARE(apr_table_t *) apr_table_make_ex(apr_pool_t *p, int nelts,
+ int flags)
 {
 apr_table_t *t = apr_palloc(p, sizeof(apr_table_t));
 
@@ -406,9 +413,21 @@ APR_DECLARE(int) apr_is_empty_table(const apr_tabl
 t->creator = __builtin_return_address(0);
 #endif
 t->index_initialized = 0;
+if (flags & APR_TABLE_CASELESS) {
+t->cmp = strcasecmp;
+}
+else {
+t->cmp = strcmp;
+}
+t->flags = flags;
 return t;
 }
 
+APR_DECLARE(apr_table_t *) apr_table_make(apr_pool_t *p, int nelts)
+{
+return apr_table_make_ex(p, nelts, APR_TABLE_CASELESS);
+}
+
 APR_DECLARE(apr_table_t *) apr_table_copy(apr_pool_t *p, const apr_table_t *t)
 {
 apr_table_t *new = apr_palloc(p, sizeof(apr_table_t));
@@ -428,6 +447,8 @@ APR_DECLARE(apr_table_t *) apr_table_copy(apr_pool
 memcpy(new->index_first, t->index_first, sizeof(int) * TABLE_HASH_SIZE);
 memcpy(new->index_last, t->index_last, sizeof(int) * TABLE_HASH_SIZE);
 new->index_initialized = t->index_initialized;
+new->flags = t->flags;
+new->cmp = t->cmp;
 return new;
 }
 
@@ -438,6 +459,9 @@ APR_DECLARE(apr_table_t *) apr_table_clone(apr_poo
 apr_table_t *new = apr_table_make(p, array->nelts);
 int i;
 
+new->flags = t->flags;
+new->cmp = t->cmp;
+
 for (i = 0; i < array->nelts; i++) {
 apr_table_add(new, elts[i].key, elts[i].val);
 }
@@ -483,13 +507,13 @@ APR_DECLARE(const char *) apr_table_get

Re: apr_json object key doublons

2018-08-31 Thread Graham Leggett
On 31 Aug 2018, at 22:19, Yann Ylavic  wrote:

> Actually this updated patch, rather.
> There is no more copy with an apr_table than with an apr_hash (we
> still point to the given key).
> I also added apr_json_object_add() for explicitely allowing doublons,
> while apr_json_object_set() still replaces as before, similarly to
> apr_table_add() vs apr_table_set().
> 
> WDYT?

To fill you in, the purpose of the apr_json.h API is to be RFC8259 compliant, 
which in turn is being used an implementation of JOSE support for APR 
(RFC7515/RFC7516/RFC7519), which in turn depends on a fully RFC compliant 
version of a JSON encode/decoder, meeting specific security requirements.

In the case of the apr_table.h API, keys are case insensitive, meaning that if 
we use apr_tables.h in turn apr_json.h will in turn no longer be RFC compliant 
(object keys will fail comparisons), and therefore neither will any of the JOSE 
code.

As a result, we cannot use the apr_tables.h API for this and remain compliant.

The JOSE code was ready earlier today, but the API changes have meant I need to 
update the code to the new API before being able to commit anything (you just 
beat me to it). Will do that tomorrow when I have had some sleep.

Regards,
Graham
—



Re: apr_json object key doublons

2018-08-31 Thread Yann Ylavic
[CC dev@ this time]

On Fri, Aug 31, 2018 at 3:20 PM Yann Ylavic  wrote:
>
> On Fri, Aug 31, 2018 at 2:41 PM Graham Leggett  wrote:
> >
> > On 31 Aug 2018, at 14:26, Yann Ylavic  wrote:
> >
> > > I wonder if we shouldn't change the apr_hash_t (currently used to
> > > handle JSON objects) to an apr_table_t, such that key doublons are not
> > > an issue (this isn't one in JSON specification AFAICT).
> > >
> > > Then we could get rid of 'klen' handling in several places (NUL
> > > terminated string is fine for keys IMHO, the question is more about
> > > strdup or not) while APR_JSON_FLAGS_STRICT would still do its job for
> > > overlays (and could be extended to apr_json_object_set w/ something
> > > like APR_JSON_FLAGS_MULTI, mutually exclusive).
> >
> > I wondered about this for a while, I’m keen to keep the code as it is, to 
> > follow the ap_hash_t interface with key and key length as closely as 
> > possible.
> >
> > When decoding JSON, the original strings are referenced in place, and this 
> > saves both time and memory. If the strings became NUL terminated we would 
> > have to strndup each one, which would slow us down and almost double the 
> > memory footprint.
> >
>
> I was thinking of something like the attached patch.
> We don't really need to duplicate if not asked to (i.e. given len ==
> APR_JSON_VALUE_STRING).
> Wouldn't that work?

Actually this updated patch, rather.
There is no more copy with an apr_table than with an apr_hash (we
still point to the given key).
I also added apr_json_object_add() for explicitely allowing doublons,
while apr_json_object_set() still replaces as before, similarly to
apr_table_add() vs apr_table_set().

WDYT?

Regards,
Yann.
Index: include/apr_json.h
===
--- include/apr_json.h	(revision 1839763)
+++ include/apr_json.h	(working copy)
@@ -28,7 +28,6 @@
 #include "apr.h"
 #include "apr_pools.h"
 #include "apr_tables.h"
-#include "apr_hash.h"
 #include "apr_strings.h"
 #include "apr_buckets.h"
 
@@ -169,7 +168,7 @@ struct apr_json_object_t {
 /** The key value pairs in the object are in this list */
 APR_RING_HEAD(apr_json_object_list_t, apr_json_kv_t) list;
 /** JSON object */
-apr_hash_t *hash;
+apr_table_t *table;
 };
 
 /**
@@ -198,12 +197,14 @@ APR_DECLARE(apr_json_value_t *) apr_json_value_cre
  *
  * @param pool The pool to allocate from.
  * @param val The UTF-8 encoded string value.
- * @param len The length of the string, or APR_JSON_VALUE_STRING.
+ * @param len The length of the string, or APR_JSON_VALUE_STRING;
+ *the former duplicates (and NUL terminates) val up to
+ *the given len, while the latter uses the pointer only.
  * @return The apr_json_value_t structure.
  */
-APR_DECLARE(apr_json_value_t *)
-apr_json_string_create(apr_pool_t *pool, const char *val,
-apr_ssize_t len) __attribute__((nonnull(1)));
+APR_DECLARE(apr_json_value_t *) apr_json_string_create(apr_pool_t *pool,
+const char *val, apr_ssize_t len)
+__attribute__((nonnull(1)));
 
 /**
  * Allocate and return a JSON array.
@@ -269,11 +270,9 @@ APR_DECLARE(apr_json_value_t *)
 __attribute__((nonnull(1)));
 
 /**
- * Associate a value with a key in a JSON object.
+ * Associate a value with a key in a JSON object, replacing doublon(s).
  * @param obj The JSON object.
  * @param key Pointer to the key.
- * @param klen Length of the key, or APR_JSON_VALUE_STRING if NUL
- * terminated.
  * @param val Value to associate with the key.
  * @param pool Pool to use.
  * @return APR_SUCCESS on success, APR_EINVAL if the key is
@@ -281,22 +280,42 @@ APR_DECLARE(apr_json_value_t *)
  * @remark If the value is NULL the key value pair is deleted.
  */
 APR_DECLARE(apr_status_t) apr_json_object_set(apr_json_value_t *obj,
-const char *key, apr_ssize_t klen, apr_json_value_t *val,
-apr_pool_t *pool) __attribute__((nonnull(1, 2, 5)));
+const char *key, apr_json_value_t *val, apr_pool_t *pool)
+__attribute__((nonnull(1, 2, 4)));
 
 /**
+ * Associate a value with a key in a JSON object, with possible doublon(s).
+ * @param obj The JSON object.
+ * @param key Pointer to the key.
+ * @param val Value to associate with the key.
+ * @param pool Pool to use.
+ * @return APR_SUCCESS on success, APR_EINVAL if the key is
+ *   NULL or not a string, or the object is not an APR_JSON_OBJECT.
+ * @remark If the value is NULL the key value pair is deleted.
+ */
+APR_DECLARE(apr_status_t) apr_json_object_add(apr_json_value_t *obj,
+const char *key, apr_json_value_t *val, apr_pool_t *pool)
+__attribute__((nonnull(1, 2, 4)));
+
+/**
  * Look up the value associated with a key in a JSON object.
  * @param obj The JSON object.
  * @param key Pointer to the key.
- * @param klen Length of the key, or APR_JSON_VALUE_STRING if NUL
- * terminated.
  * @return Returns NULL if the key is not present.
  */
 APR

Re: apr_json object key doublons

2018-08-31 Thread Graham Leggett
On 31 Aug 2018, at 14:26, Yann Ylavic  wrote:

> I wonder if we shouldn't change the apr_hash_t (currently used to
> handle JSON objects) to an apr_table_t, such that key doublons are not
> an issue (this isn't one in JSON specification AFAICT).
> 
> Then we could get rid of 'klen' handling in several places (NUL
> terminated string is fine for keys IMHO, the question is more about
> strdup or not) while APR_JSON_FLAGS_STRICT would still do its job for
> overlays (and could be extended to apr_json_object_set w/ something
> like APR_JSON_FLAGS_MULTI, mutually exclusive).

I wondered about this for a while, I’m keen to keep the code as it is, to 
follow the ap_hash_t interface with key and key length as closely as possible.

When decoding JSON, the original strings are referenced in place, and this 
saves both time and memory. If the strings became NUL terminated we would have 
to strndup each one, which would slow us down and almost double the memory 
footprint.

Regards,
Graham
—



apr_json object key doublons

2018-08-31 Thread Yann Ylavic
I wonder if we shouldn't change the apr_hash_t (currently used to
handle JSON objects) to an apr_table_t, such that key doublons are not
an issue (this isn't one in JSON specification AFAICT).

Then we could get rid of 'klen' handling in several places (NUL
terminated string is fine for keys IMHO, the question is more about
strdup or not) while APR_JSON_FLAGS_STRICT would still do its job for
overlays (and could be extended to apr_json_object_set w/ something
like APR_JSON_FLAGS_MULTI, mutually exclusive).

WDYT?