On Thursday, 31.12.2015 at 14:05, Antti Kantee wrote:
> Ruling out XML entirely because most of it is unnecessary is hardly a
> reason.  XML has been used successfully for runtime information in an
> embedded/microcontroller OS running on tens of kilobytes of memory, so it's
> definitely lightweight enough for our config use.  Concluding that we can't
> use XML because some implementations are bloatware is like deciding to not
> use kitchen knives because people who don't know which end is the handle
> will run into messy problems.
> 
> "Many people hate xml with a passion" would have been a better rationale ;)

I did write "(most of) XML". If anyone wants to go hunting for a small,
safe XML parser with a sane API (and data structures!), be my guest :-)

Anyhow, as you wrote, "all config formats suck to some extent", so the
discussion is mostly academic at this point.

> >I tried this for several days, and the resulting code was both hard to adapt
> >to changes in the JSON structure and totally unmaintaintable.
> 
> Can you illustrate by giving a few examples of the problem?  (e.g. by
> pointing to the relevant part of the diff)

A good comparison would be handle_net() on master vs. handle_interface() on
the branch. Some parts elided for brevity:

----master
        T_CHECKTYPE(t, data, JSMN_OBJECT, __func__);

        /* we expect straight key-value pairs (at least for now) */
        objsize = t->size;
        if (left < 2*objsize + 1) {
                return -1;
        }
        t++;

        /* ... */

        ifname = cloner = type = method = NULL;
        addr = mask = gw = NULL;

        for (i = 0; i < objsize; i++, t+=2) {
                const char *valuestr;
                key = t;
                value = t+1;

                T_CHECKTYPE(key, data, JSMN_STRING, __func__);
                T_CHECKSIZE(key, data, 1, __func__);

                T_CHECKTYPE(value, data, JSMN_STRING, __func__);
                T_CHECKSIZE(value, data, 0, __func__);

                valuestr = token2cstr(value, data);
                if (T_STREQ(key, data, "if")) {
                        ifname = valuestr;
                } else if (T_STREQ(key, data, "cloner")) {
                        cloner = valuestr;
                } else if (T_STREQ(key, data, "type")) {
                        type = valuestr;
                } else if (T_STREQ(key, data, "method")) {
                        method = valuestr;
                } else if (T_STREQ(key, data, "addr")) {
                        addr = valuestr;
                } else if (T_STREQ(key, data, "mask")) {
                        /* XXX: we could also pass mask as a number ... */
                        mask = valuestr;
                } else if (T_STREQ(key, data, "gw")) {
                        gw = valuestr;
                } else {
                        errx(1, "unexpected key \"%.*s\" in \"%s\"",
                            T_PRINTFSTAR(key, data), __func__);
                }
        }
----branch
        jexpect(jobject, v, __func__);

        ifname = v->n;
        addrs = create = NULL;
        for (jvalue **i = v->u.v; *i; ++i) {
                if (strcmp((*i)->n, "create") == 0) {
                        if ((*i)->d != jtrue && (*i)->d != jfalse) {
                                errx(1, "%s: expected BOOLEAN for key"
                                        "\"create\" in \"%s\"", __func__,
                                        ifname);
                        }
                        create = *i;
                } else if (strcmp((*i)->n, "addrs") == 0) {
                        jexpect(jarray, *i, __func__);
                        addrs = *i;
                } else {
                        warnx("%s: unexpected key \"%s\" in \"%s\", ignored",
                                __func__, (*i)->n, ifname);
                }
        }

        /* ... */

        for (jvalue **a = addrs->u.v; *a; ++a) {
                jexpect(jobject, *a, __func__);

                type = method = addr = NULL;
                for (jvalue **i = (*a)->u.v; *i; ++i) {
                        if (strcmp((*i)->n, "type") == 0) {
                                type = (*i)->u.s;
                        } else if (strcmp((*i)->n, "method") == 0) {
                                method = (*i)->u.s;
                        } else if (strcmp((*i)->n, "addr") == 0) {
                                addr = (*i)->u.s;
                        } else {
                                warnx("%s: unexpected key \"%s\""
                                        " in \"%s.addrs[]\"",
                                        __func__, (*i)->n, ifname);
                        }
                }

                if (!type || !method) {
                        errx(1, "%s: missing type/method in \"%s.addrs[]\"",
                                __func__, ifname);
                }
                if (strcmp(type, "inet") == 0) {
                        config_ipv4(ifname, method, addr);
                } else if (strcmp(type, "inet6") == 0) {
                        config_ipv6(ifname, method, addr);
                } else {
                        errx(1, "%s: address type \"%s\" not supported "
                                "in \"%s.addrs[]\"", __func__, type, ifname);
                }
        }
-----

Note that the existing code parses a much simpler data structure (in terms
of the JSON data), yet still needs to take several shortcuts in order to
keep track of where it is in the stream of tokens. This results in making
assumptions about the structure of the JSON data.

Compare with the new code where each iteration over a nested structure is
just a for() loop. No assumptions are made about the size (and thus
structure) of child objects, and no manipulation of 't' is required. In
fact, the code does not need to deal with the "expected size in tokens" of
an object at all.

This is a general theme throught the change from the current to the new
parser, which results in cleaner code (since less mechanics of parsing are
exposed) and more robust code with better error reporting and ability to
easily ignore unknown parts of the JSON structure. For another example,
view master:addbin() and branch:handle_bin() side by side in an editor.

The latter point is especially important in order to future-proof the spec:
A unikernel which understands an older version of the spec, if passed a
newer version, should be able to safely ignore the directives it does not
understand. 

> >Fair point, however I was not aware of what your plans were with kernonly
> >mode as it is still documented as "experimental", much less that
> >"supporting kernonly" was a requirement for config.c.
> 
> IIRC you yourself said earlier this year that you'd like to run ocaml
> programs without libc.  I don't admittedly know what sort of ocaml programs
> you'd want to run, but I'm guessing at least some of them might want e.g. a
> network setup, making me think that some sort of -k config support would
> derive already from your own goals.

Not so much what network setup they (Ocaml programs) might want, but also
what network setup they might *not* want. This brings up an interesting
question:

If I have a Mirage/Rumprun hybrid unikernel (whether using -k or not,
irrelevant) and I would like Rumprun to mount some block devices, but
Mirage to handle the network stack, can I express this using a single
JSON configuration?

I've not thought about this enough, but it would seem to imply that Rumprun
config would handle "fs", but not "net", which would be passed to Mirage
somehow. A strawman of how this could work:

1) We would need to specify the method that rumprun uses to pass the
configuration on to the application. This could be as simple as writing it
out to "/config.json".

2) In a hybrid unikernel as described above, presumably -lrumpnet_netinet
and -lrumpnet_netinet6 would not be linked in. This would have to be
exposed to config.c somehow (through a data structure with a list of
baked-in components at rumprun-bake time?), and on this basis (no netinet),
config.c would not handle those parts of the configuration.

I may be completely off on the wrong track here -- would appreciate input
from the Mirage (and other?) unikernel folks on this list. This would also
be a good discussion to have on the "cross-project" unikernel.org once the
discourse.org setup there is complete.

> Anyway, the requirement is to not support -k with the config revamp
> ("separate change"), the requirement is to not introduce changes which
> clearly take -k config in the wrong direction ("must not be torpedoed").

I don't see the new parser "torpedoing -k config", as the dependency on
libc interfaces is minimal and can be removed.

In my opinion, as I've argued above, the new parser is worth the cost of
the extra work needed to make it run only on top of bmk interfaces,
whenever that happens in the future.

Martin

Reply via email to