On 1/12/24 17:39, Frode Nordahl wrote: > Hello, Ilya, > > This is a partial response as I've come across something that might > change the approach for this specific patch, and I thought it would be > pertinent to disclose that discovery as soon as possible. > > I'll respond to the rest as I verify the finding and continue working > through your comments on this patch.
OK. > > On Fri, Jan 12, 2024 at 1:26 AM Ilya Maximets <[email protected]> wrote: >> >> On 1/10/24 20:29, Frode Nordahl wrote: >>> Creating and destroying JSON objects may be time consuming. >>> >>> Add yielding counterparts of json_serialized_object_create() and >>> json_destroy__() functions that make use of the cooperative >>> multitasking module to yield during processing, allowing time >>> sensitive tasks in other parts of the program to be completed >>> during processing. >>> >>> Signed-off-by: Frode Nordahl <[email protected]> >>> --- >>> include/openvswitch/json.h | 16 ++++++++-- >>> lib/json.c | 63 ++++++++++++++++++++++++++------------ >>> 2 files changed, 56 insertions(+), 23 deletions(-) >>> >>> diff --git a/include/openvswitch/json.h b/include/openvswitch/json.h >>> index 35b403c29..39528f140 100644 >>> --- a/include/openvswitch/json.h >>> +++ b/include/openvswitch/json.h >>> @@ -81,6 +81,7 @@ struct json *json_boolean_create(bool); >>> struct json *json_string_create(const char *); >>> struct json *json_string_create_nocopy(char *); >>> struct json *json_serialized_object_create(const struct json *); >>> +struct json *json_serialized_object_create_with_yield(const struct json *); >>> struct json *json_integer_create(long long int); >>> struct json *json_real_create(double); >>> >>> @@ -137,7 +138,8 @@ struct json *json_from_stream(FILE *stream); >>> >>> enum { >>> JSSF_PRETTY = 1 << 0, /* Multiple lines with indentation, if >>> true. */ >>> - JSSF_SORT = 1 << 1 /* Object members in sorted order, if >>> true. */ >>> + JSSF_SORT = 1 << 1, /* Object members in sorted order, if >>> true. */ >>> + JSSF_YIELD = 1 << 2 /* Yield during processing */ >> >> This is a public header, but we're referencing a functionality >> of the internal library that users of a public header can't access. > > The flags are publicly available through the use of the > json_to_string() functions flags argument, which is why I thought it > would be natural to put this here. > > OTOH, I guess it would not be natural for anyone to use it other than > through the `_with_yield()` front-end, so I see what you mean. > > >> It might be better to create an internal header in lib/ and >> define this as an internal flag with a higher value (bit 7?). >> A macro or a single-value enum should be fine. >> >> Also, period at the end of a comment. >> >>> }; >>> char *json_to_string(const struct json *, int flags); >>> void json_to_ds(const struct json *, int flags, struct ds *); >>> @@ -158,14 +160,22 @@ json_clone(const struct json *json_) >>> return json; >>> } >>> >>> -void json_destroy__(struct json *json); >>> +void json_destroy__(struct json *json, bool yield); >> >> Same technically applies here, but this is an internal__ function, >> so may be fine to keep it like that, as users are not supposed to >> call it. >> >>> >>> /* Frees 'json' and everything it points to, recursively. */ >>> static inline void >>> json_destroy(struct json *json) >>> { >>> if (json && !--json->count) { >>> - json_destroy__(json); >>> + json_destroy__(json, false); >>> + } >>> +} >>> + >>> +static inline void >>> +json_destroy_with_yield(struct json *json) >>> +{ >>> + if (json && !--json->count) { >>> + json_destroy__(json, true); >>> } >>> } >> >> This last function can be moved to the internal header. >> Along with the prototype of the json_serialized_object_create_with_yield. > > If we put these in a private header, ovsdb/monitor.c would have to > include a "json-private.h", and that does not feel right to me? Yeah, here we have a different type of 'private' header, that's why I used the word 'internal' and not 'private'. json-private.h indeed sounds like something that is only for json-*.c to include, but I had in mind a header that in just in lib/ and not include/, i.e. available for internal users to include, but not exported to users of a shared library. I suppose we can just create lib/json.h and put this stuff there. We do have lib/flow.h and include/openvswitch/flow.h, for example. In this case, ovsdb/monitor.c can include internal json.h. > > I'll try to find some way to stash this away though if you want to > keep it away from the public interface. > >> The name is getting very long... >> >>> >>> diff --git a/lib/json.c b/lib/json.c >>> index aded8bb01..78ebabb18 100644 >>> --- a/lib/json.c >>> +++ b/lib/json.c >>> @@ -24,6 +24,7 @@ >>> #include <limits.h> >>> #include <string.h> >>> >>> +#include "cooperative-multitasking.h" >>> #include "openvswitch/dynamic-string.h" >>> #include "hash.h" >>> #include "openvswitch/shash.h" >>> @@ -181,14 +182,26 @@ json_string_create(const char *s) >>> return json_string_create_nocopy(xstrdup(s)); >>> } >>> >>> -struct json * >>> -json_serialized_object_create(const struct json *src) >>> +static struct json * >>> +json_serialized_object_create__(const struct json *src, int flags) >>> { >>> struct json *json = json_create(JSON_SERIALIZED_OBJECT); >>> - json->string = json_to_string(src, JSSF_SORT); >>> + json->string = json_to_string(src, flags); >>> return json; >>> } >>> >>> +struct json * >>> +json_serialized_object_create(const struct json *src) >>> +{ >>> + return json_serialized_object_create__(src, JSSF_SORT); >>> +} >>> + >>> +struct json * >>> +json_serialized_object_create_with_yield(const struct json *src) >>> +{ >>> + return json_serialized_object_create__(src, JSSF_SORT | JSSF_YIELD); >>> +} >>> + >>> struct json * >>> json_array_create_empty(void) >>> { >>> @@ -360,20 +373,20 @@ json_integer(const struct json *json) >>> return json->integer; >>> } >>> >>> -static void json_destroy_object(struct shash *object); >>> -static void json_destroy_array(struct json_array *array); >>> +static void json_destroy_object(struct shash *object, bool yield); >>> +static void json_destroy_array(struct json_array *array, bool yield); >>> >>> /* Frees 'json' and everything it points to, recursively. */ >>> void >>> -json_destroy__(struct json *json) >>> +json_destroy__(struct json *json, bool yield) >>> { >>> switch (json->type) { >>> case JSON_OBJECT: >>> - json_destroy_object(json->object); >>> + json_destroy_object(json->object, yield); >>> break; >>> >>> case JSON_ARRAY: >>> - json_destroy_array(&json->array); >>> + json_destroy_array(&json->array, yield); >>> break; >>> >>> case JSON_STRING: >>> @@ -395,13 +408,16 @@ json_destroy__(struct json *json) >>> } >>> >>> static void >>> -json_destroy_object(struct shash *object) >>> +json_destroy_object(struct shash *object, bool yield) >>> { >>> struct shash_node *node; >>> >>> SHASH_FOR_EACH_SAFE (node, object) { >>> struct json *value = node->data; >>> >>> + if (yield) { >>> + cooperative_multitasking_yield(); >>> + } >> >> Should this be moved to the top of the function? >> >>> json_destroy(value); >> >> Should be a conditional call to with_yield() version as well? >> Inner objects can be huge. >> >> AFAIU the code here, only objects yield. Other types of JSON >> do not yield. And that is fine. However, if the object only >> contains simple elements, yielding on each of them is likely >> unnecessary. If elements of the object are objects themselves, >> then they will yield. >> >> For example, in case of a JSON-formatted database, we will yield >> once per row in this case. IIUC, current patch will yield once >> per table, because an update is an array of table objects that >> contain row objects. >> >> What do you think? >> >>> shash_delete(object, node); >>> } >>> @@ -410,12 +426,13 @@ json_destroy_object(struct shash *object) >>> } >>> >>> static void >>> -json_destroy_array(struct json_array *array) >>> +json_destroy_array(struct json_array *array, bool yield) >>> { >>> size_t i; >>> >>> for (i = 0; i < array->n; i++) { >>> - json_destroy(array->elems[i]); >>> + yield ? json_destroy_with_yield(array->elems[i]) : >>> + json_destroy(array->elems[i]); >> >> Please, use an if statement instead. >> >>> } >>> free(array->elems); >>> } >>> @@ -1525,7 +1542,7 @@ static void json_serialize_object(const struct shash >>> *object, >>> struct json_serializer *); >>> static void json_serialize_array(const struct json_array *, >>> struct json_serializer *); >>> -static void json_serialize_string(const char *, struct ds *); >>> +static void json_serialize_string(const char *, struct json_serializer *); >>> >>> /* Converts 'json' to a string in JSON format, encoded in UTF-8, and >>> returns >>> * that string. The caller is responsible for freeing the returned string, >>> @@ -1598,7 +1615,7 @@ json_serialize(const struct json *json, struct >>> json_serializer *s) >>> break; >>> >>> case JSON_STRING: >>> - json_serialize_string(json->string, ds); >>> + json_serialize_string(json->string, s); >>> break; >>> >>> case JSON_SERIALIZED_OBJECT: >>> @@ -1631,7 +1648,7 @@ json_serialize_object_member(size_t i, const struct >>> shash_node *node, >>> indent_line(s); >>> } >>> >>> - json_serialize_string(node->name, ds); >>> + json_serialize_string(node->name, s); >>> ds_put_char(ds, ':'); >>> if (s->flags & JSSF_PRETTY) { >>> ds_put_char(ds, ' '); >>> @@ -1734,7 +1751,7 @@ static const char *chars_escaping[256] = { >>> }; >>> >>> static void >>> -json_serialize_string(const char *string, struct ds *ds) >>> +json_serialize_string(const char *string, struct json_serializer *s) >>> { >>> uint8_t c; >>> uint8_t c2; >>> @@ -1742,26 +1759,32 @@ json_serialize_string(const char *string, struct ds >>> *ds) >>> const char *escape; >>> const char *start; >>> >>> - ds_put_char(ds, '"'); >>> + ds_put_char(s->ds, '"'); >>> count = 0; >>> start = string; >>> while ((c = *string++) != '\0') { >>> + if (s->flags & JSSF_YIELD) { >>> + cooperative_multitasking_yield(); >>> + } >> >> This looks like an overkill. Here we're yielding per character >> in every JSON string. That's a lot. >> Can we yield per object like it's done for a destruction instead? > > Looking at this again, it looks like I have mixed up the code paths of > the type JSON_SERIALIZED_OBJECT with JSON_STRING so this is most > likely not needed. > > That also explains one overrun source I was not able to identify > before running out of time to send the initial patch set for review, > sorry about that! > > We most likely need a chunked version of ds_put_cstr() instead though, > as it is what most likely steals the time instead [0]. Will get back > as soon as I've verified. I'm not convinced this is a problem. Pre-serialized objects are copied with a single memcpy() call. It should be fast enough even if the string is 100MB long. I'm more concerned about not yielding while serializing objects that have large sub-objects. > > 0: > https://github.com/openvswitch/ovs/blob/60457a5e9ddc33809139e91b08634eacd766abb2/lib/json.c#L1605 > > -- > Frode Nordahl > >>> if (c >= ' ' && c != '"' && c != '\\') { >>> count++; >>> } else { >>> if (count) { >>> - ds_put_buffer(ds, start, count); >>> + ds_put_buffer(s->ds, start, count); >>> count = 0; >>> } >>> start = string; >>> escape = chars_escaping[c]; >>> while ((c2 = *escape++) != '\0') { >>> - ds_put_char(ds, c2); >>> + if (s->flags & JSSF_YIELD) { >>> + cooperative_multitasking_yield(); >>> + } >>> + ds_put_char(s->ds, c2); >>> } >>> } >>> } >>> if (count) { >>> - ds_put_buffer(ds, start, count); >>> + ds_put_buffer(s->ds, start, count); >>> } >>> - ds_put_char(ds, '"'); >>> + ds_put_char(s->ds, '"'); >>> } >> _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
