Re: apr_json object key doublons
On Sat, Sep 1, 2018 at 1:49 PM Graham Leggett wrote: > > On 01 Sep 2018, at 13:13, Yann Ylavic wrote: > > > So how about the attached patch too, allowing for case sensitive > > apr_tables? :p > > apr_json could then use apr_table_make_ex()… > > I like the idea of a case sensitive apr_table, it’s something that would have > been useful to me in the past. Yeah, quite tangential to JSON but indeed useful, I'll commit it independently if no objection arises. > > Using it in the JSON code I’m still not keen on, as we need the hard security > guarantees of the apr_hash method signature, which has just one mode of > operation. I'm not sure what you mean by the "hard security guarantees of the apr_hash method signature", for me the only hard guarantee of apr_hash_t is to not lose data, but that's also a guarantee of apr_table :) Speaking of JSON, the downside of apr_hash are no-doublons and the need for klen all over the place, so for me the right structure is (caseless-)apr_table. > We’re also still forced to unnecessarily strndup all our keys on the way in > to make them NUL terminated. No we don't, in my patch apr_json_string_create() does apr_strmemdup() only if the given len != APR_JSON_VALUE_STRING (which makes sense both to make sure that the given string is NUL-terminated and to provide a way to duplicate it if it's not already or the scope doesn't fit). Then C string key version of apr_json_object_{set,add}() uses APR_JSON_VALUE_STRING to build the json string key and apr_table_setn/addn() to fill the table, so no duplication occurs by default. The way to duplicate the key for the user would be to apr_json_object_{set,add}_ex(apr_json_string_create(key, strlen(key), value, ...) explicitely, but that's not the "default" API. Otherwise everything should work as before, and I like the _set/add() semantics (borrowed from apr_table) nice w.r.t. doublons. Regards, Yann.
Re: apr_json object key doublons
On 01 Sep 2018, at 13:13, Yann Ylavic wrote: > So how about the attached patch too, allowing for case sensitive apr_tables? > :p > apr_json could then use apr_table_make_ex()… I like the idea of a case sensitive apr_table, it’s something that would have been useful to me in the past. Using it in the JSON code I’m still not keen on, as we need the hard security guarantees of the apr_hash method signature, which has just one mode of operation. We’re also still forced to unnecessarily strndup all our keys on the way in to make them NUL terminated. Regards, Graham —
Re: apr_json object key doublons
On Sat, Sep 1, 2018 at 1:06 AM Graham Leggett wrote: > > In the case of the apr_table.h API, keys are case insensitive, meaning that > if we use apr_tables.h in turn apr_json.h will in turn no longer be RFC > compliant (object keys will fail comparisons), and therefore neither will any > of the JOSE code. Ah yes, right, I miised that. > > As a result, we cannot use the apr_tables.h API for this and remain compliant. So how about the attached patch too, allowing for case sensitive apr_tables? :p apr_json could then use apr_table_make_ex()... Regards, Yann. Index: include/apr_tables.h === --- include/apr_tables.h (revision 1839763) +++ include/apr_tables.h (working copy) @@ -222,13 +222,26 @@ APR_DECLARE(char *) apr_array_pstrcat(apr_pool_t * const apr_array_header_t *arr, const char sep); +#define APR_TABLE_CASELESS 0x1 /** ignore keys' case */ + /** * Make a new table. * @param p The pool to allocate the pool out of * @param nelts The number of elements in the initial table. + * @param flags Bitmask of APR_TABLE_* flags. * @return The new table. * @warning This table can only store text data */ +APR_DECLARE(apr_table_t *) apr_table_make_ex(apr_pool_t *p, int nelts, + int flags); + +/** + * Make a new table, ignoring keys' case. + * @param p The pool to allocate the pool out of + * @param nelts The number of elements in the initial table. + * @return The new table. + * @warning This table can only store text data + */ APR_DECLARE(apr_table_t *) apr_table_make(apr_pool_t *p, int nelts); /** Index: tables/apr_tables.c === --- tables/apr_tables.c (revision 1839763) +++ tables/apr_tables.c (working copy) @@ -303,9 +303,9 @@ APR_DECLARE(char *) apr_array_pstrcat(apr_pool_t * * 4 bytes, normalized for case-insensitivity and packed into * an int...this checksum allows us to do a single integer * comparison as a fast check to determine whether we can - * skip a strcasecmp + * skip a str[case]cmp */ -#define COMPUTE_KEY_CHECKSUM(key, checksum)\ +#define COMPUTE_KEY_CHECKSUM(key, checksum, nc)\ { \ const char *k = (key); \ apr_uint32_t c = (apr_uint32_t)*k; \ @@ -325,7 +325,9 @@ APR_DECLARE(char *) apr_array_pstrcat(apr_pool_t * c = (apr_uint32_t)*++k;\ checksum |= c; \ } \ -checksum &= CASE_MASK; \ +if ((nc)) {\ +checksum &= CASE_MASK; \ +} \ } /** The opaque string-content table type */ @@ -341,6 +343,10 @@ struct apr_table_t { /** Who created the array. */ void *creator; #endif +/** The compare function (str[case]cmp) */ +int (*cmp)(const char*, const char*); +/** Bitmask of APR_TABLE_* flags */ +int flags; /* An index to speed up table lookups. The way this works is: * - Hash the key into the index: * - index_first[TABLE_HASH(key)] is the offset within @@ -397,7 +403,8 @@ APR_DECLARE(int) apr_is_empty_table(const apr_tabl return ((t == NULL) || (t->a.nelts == 0)); } -APR_DECLARE(apr_table_t *) apr_table_make(apr_pool_t *p, int nelts) +APR_DECLARE(apr_table_t *) apr_table_make_ex(apr_pool_t *p, int nelts, + int flags) { apr_table_t *t = apr_palloc(p, sizeof(apr_table_t)); @@ -406,9 +413,21 @@ APR_DECLARE(int) apr_is_empty_table(const apr_tabl t->creator = __builtin_return_address(0); #endif t->index_initialized = 0; +if (flags & APR_TABLE_CASELESS) { +t->cmp = strcasecmp; +} +else { +t->cmp = strcmp; +} +t->flags = flags; return t; } +APR_DECLARE(apr_table_t *) apr_table_make(apr_pool_t *p, int nelts) +{ +return apr_table_make_ex(p, nelts, APR_TABLE_CASELESS); +} + APR_DECLARE(apr_table_t *) apr_table_copy(apr_pool_t *p, const apr_table_t *t) { apr_table_t *new = apr_palloc(p, sizeof(apr_table_t)); @@ -428,6 +447,8 @@ APR_DECLARE(apr_table_t *) apr_table_copy(apr_pool memcpy(new->index_first, t->index_first, sizeof(int) * TABLE_HASH_SIZE); memcpy(new->index_last, t->index_last, sizeof(int) * TABLE_HASH_SIZE); new->index_initialized = t->index_initialized; +new->flags = t->flags; +new->cmp = t->cmp; return new; } @@ -438,6 +459,9 @@ APR_DECLARE(apr_table_t *) apr_table_clone(apr_poo apr_table_t *new = apr_table_make(p, array->nelts); int i; +new->flags = t->flags; +new->cmp = t->cmp; + for (i = 0; i < array->nelts; i++) { apr_table_add(new, elts[i].key, elts[i].val); } @@ -483,13 +507,13 @@ APR_DECLARE(const char *) apr_table_get
Re: apr_json object key doublons
On 31 Aug 2018, at 22:19, Yann Ylavic wrote: > Actually this updated patch, rather. > There is no more copy with an apr_table than with an apr_hash (we > still point to the given key). > I also added apr_json_object_add() for explicitely allowing doublons, > while apr_json_object_set() still replaces as before, similarly to > apr_table_add() vs apr_table_set(). > > WDYT? To fill you in, the purpose of the apr_json.h API is to be RFC8259 compliant, which in turn is being used an implementation of JOSE support for APR (RFC7515/RFC7516/RFC7519), which in turn depends on a fully RFC compliant version of a JSON encode/decoder, meeting specific security requirements. In the case of the apr_table.h API, keys are case insensitive, meaning that if we use apr_tables.h in turn apr_json.h will in turn no longer be RFC compliant (object keys will fail comparisons), and therefore neither will any of the JOSE code. As a result, we cannot use the apr_tables.h API for this and remain compliant. The JOSE code was ready earlier today, but the API changes have meant I need to update the code to the new API before being able to commit anything (you just beat me to it). Will do that tomorrow when I have had some sleep. Regards, Graham —
Re: apr_json object key doublons
[CC dev@ this time] On Fri, Aug 31, 2018 at 3:20 PM Yann Ylavic wrote: > > On Fri, Aug 31, 2018 at 2:41 PM Graham Leggett wrote: > > > > On 31 Aug 2018, at 14:26, Yann Ylavic wrote: > > > > > I wonder if we shouldn't change the apr_hash_t (currently used to > > > handle JSON objects) to an apr_table_t, such that key doublons are not > > > an issue (this isn't one in JSON specification AFAICT). > > > > > > Then we could get rid of 'klen' handling in several places (NUL > > > terminated string is fine for keys IMHO, the question is more about > > > strdup or not) while APR_JSON_FLAGS_STRICT would still do its job for > > > overlays (and could be extended to apr_json_object_set w/ something > > > like APR_JSON_FLAGS_MULTI, mutually exclusive). > > > > I wondered about this for a while, I’m keen to keep the code as it is, to > > follow the ap_hash_t interface with key and key length as closely as > > possible. > > > > When decoding JSON, the original strings are referenced in place, and this > > saves both time and memory. If the strings became NUL terminated we would > > have to strndup each one, which would slow us down and almost double the > > memory footprint. > > > > I was thinking of something like the attached patch. > We don't really need to duplicate if not asked to (i.e. given len == > APR_JSON_VALUE_STRING). > Wouldn't that work? Actually this updated patch, rather. There is no more copy with an apr_table than with an apr_hash (we still point to the given key). I also added apr_json_object_add() for explicitely allowing doublons, while apr_json_object_set() still replaces as before, similarly to apr_table_add() vs apr_table_set(). WDYT? Regards, Yann. Index: include/apr_json.h === --- include/apr_json.h (revision 1839763) +++ include/apr_json.h (working copy) @@ -28,7 +28,6 @@ #include "apr.h" #include "apr_pools.h" #include "apr_tables.h" -#include "apr_hash.h" #include "apr_strings.h" #include "apr_buckets.h" @@ -169,7 +168,7 @@ struct apr_json_object_t { /** The key value pairs in the object are in this list */ APR_RING_HEAD(apr_json_object_list_t, apr_json_kv_t) list; /** JSON object */ -apr_hash_t *hash; +apr_table_t *table; }; /** @@ -198,12 +197,14 @@ APR_DECLARE(apr_json_value_t *) apr_json_value_cre * * @param pool The pool to allocate from. * @param val The UTF-8 encoded string value. - * @param len The length of the string, or APR_JSON_VALUE_STRING. + * @param len The length of the string, or APR_JSON_VALUE_STRING; + *the former duplicates (and NUL terminates) val up to + *the given len, while the latter uses the pointer only. * @return The apr_json_value_t structure. */ -APR_DECLARE(apr_json_value_t *) -apr_json_string_create(apr_pool_t *pool, const char *val, -apr_ssize_t len) __attribute__((nonnull(1))); +APR_DECLARE(apr_json_value_t *) apr_json_string_create(apr_pool_t *pool, +const char *val, apr_ssize_t len) +__attribute__((nonnull(1))); /** * Allocate and return a JSON array. @@ -269,11 +270,9 @@ APR_DECLARE(apr_json_value_t *) __attribute__((nonnull(1))); /** - * Associate a value with a key in a JSON object. + * Associate a value with a key in a JSON object, replacing doublon(s). * @param obj The JSON object. * @param key Pointer to the key. - * @param klen Length of the key, or APR_JSON_VALUE_STRING if NUL - * terminated. * @param val Value to associate with the key. * @param pool Pool to use. * @return APR_SUCCESS on success, APR_EINVAL if the key is @@ -281,22 +280,42 @@ APR_DECLARE(apr_json_value_t *) * @remark If the value is NULL the key value pair is deleted. */ APR_DECLARE(apr_status_t) apr_json_object_set(apr_json_value_t *obj, -const char *key, apr_ssize_t klen, apr_json_value_t *val, -apr_pool_t *pool) __attribute__((nonnull(1, 2, 5))); +const char *key, apr_json_value_t *val, apr_pool_t *pool) +__attribute__((nonnull(1, 2, 4))); /** + * Associate a value with a key in a JSON object, with possible doublon(s). + * @param obj The JSON object. + * @param key Pointer to the key. + * @param val Value to associate with the key. + * @param pool Pool to use. + * @return APR_SUCCESS on success, APR_EINVAL if the key is + * NULL or not a string, or the object is not an APR_JSON_OBJECT. + * @remark If the value is NULL the key value pair is deleted. + */ +APR_DECLARE(apr_status_t) apr_json_object_add(apr_json_value_t *obj, +const char *key, apr_json_value_t *val, apr_pool_t *pool) +__attribute__((nonnull(1, 2, 4))); + +/** * Look up the value associated with a key in a JSON object. * @param obj The JSON object. * @param key Pointer to the key. - * @param klen Length of the key, or APR_JSON_VALUE_STRING if NUL - * terminated. * @return Returns NULL if the key is not present. */ APR
Re: apr_json object key doublons
On 31 Aug 2018, at 14:26, Yann Ylavic wrote: > I wonder if we shouldn't change the apr_hash_t (currently used to > handle JSON objects) to an apr_table_t, such that key doublons are not > an issue (this isn't one in JSON specification AFAICT). > > Then we could get rid of 'klen' handling in several places (NUL > terminated string is fine for keys IMHO, the question is more about > strdup or not) while APR_JSON_FLAGS_STRICT would still do its job for > overlays (and could be extended to apr_json_object_set w/ something > like APR_JSON_FLAGS_MULTI, mutually exclusive). I wondered about this for a while, I’m keen to keep the code as it is, to follow the ap_hash_t interface with key and key length as closely as possible. When decoding JSON, the original strings are referenced in place, and this saves both time and memory. If the strings became NUL terminated we would have to strndup each one, which would slow us down and almost double the memory footprint. Regards, Graham —
apr_json object key doublons
I wonder if we shouldn't change the apr_hash_t (currently used to handle JSON objects) to an apr_table_t, such that key doublons are not an issue (this isn't one in JSON specification AFAICT). Then we could get rid of 'klen' handling in several places (NUL terminated string is fine for keys IMHO, the question is more about strdup or not) while APR_JSON_FLAGS_STRICT would still do its job for overlays (and could be extended to apr_json_object_set w/ something like APR_JSON_FLAGS_MULTI, mutually exclusive). WDYT?