Re: [PATCH v2] json_writer: new routines to create data in JSON format
On 3/23/2018 4:11 PM, René Scharfe wrote: Am 23.03.2018 um 20:55 schrieb Jeff Hostetler: +struct json_writer_level +{ + unsigned level_is_array : 1; + unsigned level_is_empty : 1; +}; + +struct json_writer +{ + struct json_writer_level *levels; + int nr, alloc; + struct strbuf json; +}; A simpler and probably more compact representation of is_array would be a strbuf with one char per level, e.g. '[' for an array and '{' for an object (or ']' and '}'). I don't understand the need to track emptiness per level. Only the top level array/object can ever be empty, can it? My expectation was that any sub-object or sub-array could be empty. That is, this should be valid (and the JSON parser in Python allows): {"a":{}, "b":[], "c":[[]], "d":[{}]} Sure, but the emptiness of finished arrays and objects doesn't matter for the purposes of error checking, comma setting or closing. At most one of them is empty *and* unclosed while writing the overall JSON object -- the last one opened: { {"a":{ {"a":{}, "b":[ {"a":{}, "b":[], "c":[ {"a":{}, "b":[], "c":[[ {"a":{}, "b":[], "c":[[]], "d":[ {"a":{}, "b":[], "c":[[]], "d":[{ Any of the earlier written arrays/objects are either closed or contain at least a half-done sub-array/object, which makes them non-empty. René good point. i'll revisit. thanks. Jeff
Re: [PATCH v2] json_writer: new routines to create data in JSON format
Am 23.03.2018 um 20:55 schrieb Jeff Hostetler: >>> +struct json_writer_level >>> +{ >>> + unsigned level_is_array : 1; >>> + unsigned level_is_empty : 1; >>> +}; >>> + >>> +struct json_writer >>> +{ >>> + struct json_writer_level *levels; >>> + int nr, alloc; >>> + struct strbuf json; >>> +}; >> >> A simpler and probably more compact representation of is_array would >> be a strbuf with one char per level, e.g. '[' for an array and '{' >> for an object (or ']' and '}'). >> >> I don't understand the need to track emptiness per level. Only the >> top level array/object can ever be empty, can it? > > My expectation was that any sub-object or sub-array could be empty. > That is, this should be valid (and the JSON parser in Python allows): > > {"a":{}, "b":[], "c":[[]], "d":[{}]} Sure, but the emptiness of finished arrays and objects doesn't matter for the purposes of error checking, comma setting or closing. At most one of them is empty *and* unclosed while writing the overall JSON object -- the last one opened: { {"a":{ {"a":{}, "b":[ {"a":{}, "b":[], "c":[ {"a":{}, "b":[], "c":[[ {"a":{}, "b":[], "c":[[]], "d":[ {"a":{}, "b":[], "c":[[]], "d":[{ Any of the earlier written arrays/objects are either closed or contain at least a half-done sub-array/object, which makes them non-empty. René
Re: [PATCH v2] json_writer: new routines to create data in JSON format
On 3/23/2018 2:01 PM, René Scharfe wrote: Am 21.03.2018 um 20:28 schrieb g...@jeffhostetler.com: From: Jeff Hostetler Add basic routines to generate data in JSON format. Signed-off-by: Jeff Hostetler --- Makefile| 2 + json-writer.c | 321 + json-writer.h | 86 + t/helper/test-json-writer.c | 420 t/t0019-json-writer.sh | 102 +++ 5 files changed, 931 insertions(+) create mode 100644 json-writer.c create mode 100644 json-writer.h create mode 100644 t/helper/test-json-writer.c create mode 100755 t/t0019-json-writer.sh diff --git a/Makefile b/Makefile index 1a9b23b..57f58e6 100644 --- a/Makefile +++ b/Makefile @@ -662,6 +662,7 @@ TEST_PROGRAMS_NEED_X += test-fake-ssh TEST_PROGRAMS_NEED_X += test-genrandom TEST_PROGRAMS_NEED_X += test-hashmap TEST_PROGRAMS_NEED_X += test-index-version +TEST_PROGRAMS_NEED_X += test-json-writer TEST_PROGRAMS_NEED_X += test-lazy-init-name-hash TEST_PROGRAMS_NEED_X += test-line-buffer TEST_PROGRAMS_NEED_X += test-match-trees @@ -815,6 +816,7 @@ LIB_OBJS += hashmap.o LIB_OBJS += help.o LIB_OBJS += hex.o LIB_OBJS += ident.o +LIB_OBJS += json-writer.o LIB_OBJS += kwset.o LIB_OBJS += levenshtein.o LIB_OBJS += line-log.o diff --git a/json-writer.c b/json-writer.c new file mode 100644 index 000..89a6abb --- /dev/null +++ b/json-writer.c @@ -0,0 +1,321 @@ +#include "cache.h" +#include "json-writer.h" + +static char g_ch_open[2] = { '{', '[' }; +static char g_ch_close[2] = { '}', ']' }; + +/* + * Append JSON-quoted version of the given string to 'out'. + */ +static void append_quoted_string(struct strbuf *out, const char *in) +{ + strbuf_addch(out, '"'); + for (/**/; *in; in++) { + unsigned char c = (unsigned char)*in; + if (c == '"') + strbuf_add(out, "\\\"", 2); + else if (c == '\\') + strbuf_add(out, "", 2); + else if (c == '\n') + strbuf_add(out, "\\n", 2); + else if (c == '\r') + strbuf_add(out, "\\r", 2); + else if (c == '\t') + strbuf_add(out, "\\t", 2); + else if (c == '\f') + strbuf_add(out, "\\f", 2); + else if (c == '\b') + strbuf_add(out, "\\b", 2); Using strbuf_addstr() here would result in the same object code (its strlen() call is inlined for constants) and avoid having to specify the redundant length 2. sure. thanks. + else if (c < 0x20) + strbuf_addf(out, "\\u%04x", c); + else + strbuf_addch(out, c); + } + strbuf_addch(out, '"'); +} + [...] + +void jw_object_double(struct json_writer *jw, const char *fmt, + const char *key, double value) +{ + assert_in_object(jw, key); + maybe_add_comma(jw); + + if (!fmt || !*fmt) + fmt = "%f"; + + append_quoted_string(&jw->json, key); + strbuf_addch(&jw->json, ':'); + strbuf_addf(&jw->json, fmt, value); Hmm. Can compilers check such a caller-supplied format matches the value's type? (I don't know how to specify a format attribute for GCC and Clang for this function.) I'll look into this. thanks. +} [...] + +void jw_object_sub(struct json_writer *jw, const char *key, + const struct json_writer *value) +{ + assert_is_terminated(value); + + assert_in_object(jw, key); + maybe_add_comma(jw); + + append_quoted_string(&jw->json, key); + strbuf_addch(&jw->json, ':'); + strbuf_addstr(&jw->json, value->json.buf); strbuf_addbuf() would be a better fit here -- it avoids a strlen() call and NUL handling issues. sure. thanks. i always forget about _addbuf(). +} + +void jw_object_inline_begin_array(struct json_writer *jw, const char *key) +{ + assert_in_object(jw, key); + maybe_add_comma(jw); + + append_quoted_string(&jw->json, key); + strbuf_addch(&jw->json, ':'); + + jw_array_begin(jw); +} Those duplicate calls in the last ten functions feel mind-numbing. A helper function for adding comma, key and colon might be a good idea. I'll see if I can add another macro to do the dirty work here. [...] diff --git a/json-writer.h b/json-writer.h new file mode 100644 index 000..ad38c95 --- /dev/null +++ b/json-writer.h @@ -0,0 +1,86 @@ +#ifndef JSON_WRITER_H +#define JSON_WRITER_H + +/* + * JSON data structures are defined at: + * http://json.org/ + * http://www.ietf.org/rfc/rfc7159.txt + * + * The JSON-writer API allows one to build JSON data structures using a + * simple wrapper around a "struct strbuf" buffer. It is intended as a + * simple API to build output strings; it is n
Re: [PATCH v2] json_writer: new routines to create data in JSON format
Am 21.03.2018 um 20:28 schrieb g...@jeffhostetler.com: > From: Jeff Hostetler > > Add basic routines to generate data in JSON format. > > Signed-off-by: Jeff Hostetler > --- > Makefile| 2 + > json-writer.c | 321 + > json-writer.h | 86 + > t/helper/test-json-writer.c | 420 > > t/t0019-json-writer.sh | 102 +++ > 5 files changed, 931 insertions(+) > create mode 100644 json-writer.c > create mode 100644 json-writer.h > create mode 100644 t/helper/test-json-writer.c > create mode 100755 t/t0019-json-writer.sh > > diff --git a/Makefile b/Makefile > index 1a9b23b..57f58e6 100644 > --- a/Makefile > +++ b/Makefile > @@ -662,6 +662,7 @@ TEST_PROGRAMS_NEED_X += test-fake-ssh > TEST_PROGRAMS_NEED_X += test-genrandom > TEST_PROGRAMS_NEED_X += test-hashmap > TEST_PROGRAMS_NEED_X += test-index-version > +TEST_PROGRAMS_NEED_X += test-json-writer > TEST_PROGRAMS_NEED_X += test-lazy-init-name-hash > TEST_PROGRAMS_NEED_X += test-line-buffer > TEST_PROGRAMS_NEED_X += test-match-trees > @@ -815,6 +816,7 @@ LIB_OBJS += hashmap.o > LIB_OBJS += help.o > LIB_OBJS += hex.o > LIB_OBJS += ident.o > +LIB_OBJS += json-writer.o > LIB_OBJS += kwset.o > LIB_OBJS += levenshtein.o > LIB_OBJS += line-log.o > diff --git a/json-writer.c b/json-writer.c > new file mode 100644 > index 000..89a6abb > --- /dev/null > +++ b/json-writer.c > @@ -0,0 +1,321 @@ > +#include "cache.h" > +#include "json-writer.h" > + > +static char g_ch_open[2] = { '{', '[' }; > +static char g_ch_close[2] = { '}', ']' }; > + > +/* > + * Append JSON-quoted version of the given string to 'out'. > + */ > +static void append_quoted_string(struct strbuf *out, const char *in) > +{ > + strbuf_addch(out, '"'); > + for (/**/; *in; in++) { > + unsigned char c = (unsigned char)*in; > + if (c == '"') > + strbuf_add(out, "\\\"", 2); > + else if (c == '\\') > + strbuf_add(out, "", 2); > + else if (c == '\n') > + strbuf_add(out, "\\n", 2); > + else if (c == '\r') > + strbuf_add(out, "\\r", 2); > + else if (c == '\t') > + strbuf_add(out, "\\t", 2); > + else if (c == '\f') > + strbuf_add(out, "\\f", 2); > + else if (c == '\b') > + strbuf_add(out, "\\b", 2); Using strbuf_addstr() here would result in the same object code (its strlen() call is inlined for constants) and avoid having to specify the redundant length 2. > + else if (c < 0x20) > + strbuf_addf(out, "\\u%04x", c); > + else > + strbuf_addch(out, c); > + } > + strbuf_addch(out, '"'); > +} > + > + > +static inline void begin(struct json_writer *jw, int is_array) > +{ > + ALLOC_GROW(jw->levels, jw->nr + 1, jw->alloc); > + > + jw->levels[jw->nr].level_is_array = !!is_array; > + jw->levels[jw->nr].level_is_empty = 1; > + > + strbuf_addch(&jw->json, g_ch_open[!!is_array]); > + > + jw->nr++; > +} > + > +/* > + * Assert that we have an open object at this level. > + */ > +static void inline assert_in_object(const struct json_writer *jw, const char > *key) > +{ > + if (!jw->nr) > + BUG("object: missing jw_object_begin(): '%s'", key); > + if (jw->levels[jw->nr - 1].level_is_array) > + BUG("object: not in object: '%s'", key); > +} > + > +/* > + * Assert that we have an open array at this level. > + */ > +static void inline assert_in_array(const struct json_writer *jw) > +{ > + if (!jw->nr) > + BUG("array: missing jw_begin()"); > + if (!jw->levels[jw->nr - 1].level_is_array) > + BUG("array: not in array"); > +} > + > +/* > + * Add comma if we have already seen a member at this level. > + */ > +static void inline maybe_add_comma(struct json_writer *jw) > +{ > + if (jw->levels[jw->nr - 1].level_is_empty) > + jw->levels[jw->nr - 1].level_is_empty = 0; > + else > + strbuf_addch(&jw->json, ','); > +} > + > +/* > + * Assert that the given JSON object or JSON array has been properly > + * terminated. (Has closing bracket.) > + */ > +static void inline assert_is_terminated(const struct json_writer *jw) > +{ > + if (jw->nr) > + BUG("object: missing jw_end(): '%s'", jw->json.buf); > +} > + > +void jw_object_begin(struct json_writer *jw) > +{ > + begin(jw, 0); > +} > + > +void jw_object_string(struct json_writer *jw, const char *key, const char > *value) > +{ > + assert_in_object(jw, key); > + maybe_add_comma(jw); > + > + append_quoted_string(&jw->json, key); > + strbuf_addch(&jw->json, ':'); > + append_quoted_string(&jw->json, value); > +} > + > +void jw_object_int(struct json_wri
Re: [PATCH v2] json_writer: new routines to create data in JSON format
On 3/21/2018 5:25 PM, Junio C Hamano wrote: g...@jeffhostetler.com writes: From: Jeff Hostetler Add basic routines to generate data in JSON format. And the point of having capability to write JSON data in our codebase is...? diff --git a/json-writer.c b/json-writer.c new file mode 100644 index 000..89a6abb --- /dev/null +++ b/json-writer.c @@ -0,0 +1,321 @@ +#include "cache.h" +#include "json-writer.h" + +static char g_ch_open[2] = { '{', '[' }; +static char g_ch_close[2] = { '}', ']' }; What's "g_" prefix? Global. Sorry, very old habits. + +/* + * Append JSON-quoted version of the given string to 'out'. + */ +static void append_quoted_string(struct strbuf *out, const char *in) +{ + strbuf_addch(out, '"'); + for (/**/; *in; in++) { + unsigned char c = (unsigned char)*in; It is clear enough to lose /**/, i.e. for (; *in; in++) { but for this one. I wonder if unsigned char c; strbuf_addch(out, '"'); while ((c = *in++) != '\0') { ... is easier to follow, though. either way is fine. will fix. +static inline void begin(struct json_writer *jw, int is_array) +{ + ALLOC_GROW(jw->levels, jw->nr + 1, jw->alloc); + + jw->levels[jw->nr].level_is_array = !!is_array; + jw->levels[jw->nr].level_is_empty = 1; An element of this array is a struct that represents a level, and everybody who accesses an element of that type knows it is talking about a level by the field that has the array being named as .levels[] (also [*1*]). In such a context, it is a bit too loud to name the fields with level_$blah. IOW, struct json_writer_level { unsigned is_array : 1; unsigned is_empty : 1; }; make sense. will fix. +struct json_writer_level +{ + unsigned level_is_array : 1; + unsigned level_is_empty : 1; +}; + +struct json_writer +{ + struct json_writer_level *levels; + int nr, alloc; + struct strbuf json; +}; [Footnote] *1* I personally prefer to call an array of things "thing[]", not "things[]", because then you can refer to an individual element e.g. "thing[4]" and read it as "the fourth thing". Unless the code often treats an array as a whole, that is, in which case, things[] is OK as you'll be calling the whole thing with the plural name (e.g. call that function and give all the things by passing things[]). In this case, one level instance is an element of a stack, and the code would be accessing one level at a time most of the time, so "writer.level[4].is_empty" would read more naturally than "writer.levels[4].level_is_empty". yeah, that makes sense. Thanks Jeff
Re: [PATCH v2] json_writer: new routines to create data in JSON format
g...@jeffhostetler.com writes: > From: Jeff Hostetler > > Add basic routines to generate data in JSON format. And the point of having capability to write JSON data in our codebase is...? > diff --git a/json-writer.c b/json-writer.c > new file mode 100644 > index 000..89a6abb > --- /dev/null > +++ b/json-writer.c > @@ -0,0 +1,321 @@ > +#include "cache.h" > +#include "json-writer.h" > + > +static char g_ch_open[2] = { '{', '[' }; > +static char g_ch_close[2] = { '}', ']' }; What's "g_" prefix? > + > +/* > + * Append JSON-quoted version of the given string to 'out'. > + */ > +static void append_quoted_string(struct strbuf *out, const char *in) > +{ > + strbuf_addch(out, '"'); > + for (/**/; *in; in++) { > + unsigned char c = (unsigned char)*in; It is clear enough to lose /**/, i.e. for (; *in; in++) { but for this one. I wonder if unsigned char c; strbuf_addch(out, '"'); while ((c = *in++) != '\0') { ... is easier to follow, though. > +static inline void begin(struct json_writer *jw, int is_array) > +{ > + ALLOC_GROW(jw->levels, jw->nr + 1, jw->alloc); > + > + jw->levels[jw->nr].level_is_array = !!is_array; > + jw->levels[jw->nr].level_is_empty = 1; An element of this array is a struct that represents a level, and everybody who accesses an element of that type knows it is talking about a level by the field that has the array being named as .levels[] (also [*1*]). In such a context, it is a bit too loud to name the fields with level_$blah. IOW, struct json_writer_level { unsigned is_array : 1; unsigned is_empty : 1; }; > +struct json_writer_level > +{ > + unsigned level_is_array : 1; > + unsigned level_is_empty : 1; > +}; > + > +struct json_writer > +{ > + struct json_writer_level *levels; > + int nr, alloc; > + struct strbuf json; > +}; [Footnote] *1* I personally prefer to call an array of things "thing[]", not "things[]", because then you can refer to an individual element e.g. "thing[4]" and read it as "the fourth thing". Unless the code often treats an array as a whole, that is, in which case, things[] is OK as you'll be calling the whole thing with the plural name (e.g. call that function and give all the things by passing things[]). In this case, one level instance is an element of a stack, and the code would be accessing one level at a time most of the time, so "writer.level[4].is_empty" would read more naturally than "writer.levels[4].level_is_empty".