Re: load-git and event parsing

Linus Torvalds Fri, 14 Feb 2020 14:42:51 -0800

On Fri, Feb 14, 2020 at 1:48 PM Robert C. Helling <[email protected]> wrote:
>
> But once more, I have to admit, a quick read through did not really reveal to 
> me how it works when handling multiple strings.


The basic handling hasn't changed: the strings are still decoded into
the 'struct membuffer' consecutively.

That's fairly fundamental, because we need to get the strings out of
the 'line[]' thing that contains meta-data, where whitespace is
special since it delineated the tokens.

So as each string is decoded into that separate string-space, in the
'line[]' space all that remains is a single '"' character to mark that
the string has been decoded separately.

That all is exactly the same as it used to be, and hasn't changed.

What has changed is the "how do we use the end result" part.

The *simple* cases that just took the whole string (think lines line
"divemaster" or "notes" or whatever), used to do

   state->active_dive->divemaster = get_utf8(str);

where "get_utf8()" was just basically a fairly pointless "copy
membuffer". That was mainly because the xml parsing used to have
something like that ("utf8_string()") that actually does some
whitespace cleanup etc. The git format just did a raw copy, because
all de-quoting had already been done earlier (as per above).

So that pointless "get_utf8()" was just removed, and replaced with
"detach_cstring()", which just takes the string directly from the
membuffer and uses the allocation as-is.

And the more complex cases that use "parse_keyvalue_entry()" because
they have multiple key values on one line, were changed to have the
parser just notice the string leftover (that '"' character), and on
each such entry take the first string from the stringspace buffer, and
use that as the value.

Thus the

+       /* Did we get a string? Take it from the list of strings */
+       if (*val == '"')
+               val = pop_cstring(str, key);

thing.

The rest of the changes are just to make the users of
parse_keyvalue_entry() take advantage of this simplification, instead
of having magic rules for that one exptected string at the end.

So for example, the weightsystem code used to do:

        ws.description = get_utf8(str);
        .. loop over all the entries ..

and for the the descriptor case we used to just have:

        /* This is handled by the "get_utf8()" */
        if (!strcmp(key, "description"))
                return;

because we'd already "handled" this (without proper parsing) by just
blindly taking the string-space data as-is.

Now, that special case is simply gone: that blind "take the string
space and put it in ws.description" is gone, and instead the actual
"did we find a description" case does the obvious:

        if (!strcmp(key, "description")) {
                ws->description = value;
                return;
        }

instead - because now those cases are passed in the proper decoded
value thanks to that whole "Take it from the list of strings" logic.

> But I thought that given there is no clash between dive mode names and 
> strings that can appear as event names, simply make name=„OC“ and similar a 
> feature.

That doesn't fix the fundamental fragility, but yes, my other small
fixup patch effectively does that. Except it actually triggers on the
proper thing

Because you *could* have an event that is called "CCR", and your patch
gets that case wrong, and would change the event name there.

And it turns out that the fundamental string fragility had other
problems too. Because after looking at this, it turns out that we've
gotten this "one string per line" wrong in another place when we added
more fields.

The cylinder info was corrupted by me in commit c685c05ff4: it added a
"use='string'" for the gas use. But that meant that now cylinders had
two strings: the description, and the use. And I never noticed,
because I don't do CCR, so I only have "description". That one didn't
corrupt anything - it just meant that the use was never loaded,
because the loading did

        if (!strcmp(key, "use")) {
                cylinder->cylinder_use = cylinderuse_from_text(value);
                return;
        }

which _looks_ correct, but wasn't - the "value" used to be just the
single '"' character before (and the real value was in the string
space.

So fixing the string handling should fix that case too. Not that I -
still - have a test-case for it.

Side note: for at least the event "divemode=" case, the code never
should have used a utf8 string in the first place. All the divemodes
are just single tokens with no whitespace, so they didn't need the
string handling.

The cylinder "use=" case actually has a whitespace for the "not used"
case, which means that it actually did need the string handling. It
just used to get it wrong.

Regardless, it all *should* work now. But no, your hack to just do "is
the event name a divemode" really isn't right. It fixes your
test-case, but it doesn't fix the fundamental issue, and if any dive
computer ever were to return event names that look like divemodes, it
would be wrong.

                Linus
_______________________________________________
subsurface mailing list
[email protected]
http://lists.subsurface-divelog.org/cgi-bin/mailman/listinfo/subsurface

Re: load-git and event parsing

Reply via email to