On Tue, May 09, 2023 at 02:31:08PM +0200, Hanna Czenczek wrote:
> On 08.05.23 22:03, Eric Blake wrote:
> > Add some more strings that the user might send our way.  In
> > particular, some of these additions include FIXME comments showing
> > where our parser doesn't quite behave the way we want.
> > 
> > Signed-off-by: Eric Blake <ebl...@redhat.com>
> > ---
> >   tests/unit/test-cutils.c | 226 +++++++++++++++++++++++++++++++++++++--
> >   1 file changed, 215 insertions(+), 11 deletions(-)
> 
> I wonder: The plan is to have "1.5e+1k" be parsed as "1.5e" + endptr ==
> "+1k"; but "0x1p1" is not parsed at all (could be "0x1" + "p1"). Is that
> fully intentional?

Pre-series: "1.5e+1k" is parsed as "1" ".5e+1" "k", no slop
Post-series: "1.5e+1k" is parsed as "1" ".5" "e", slop "+1k"

If you call qemu_strtosz(,NULL,), both versions still give EINVAL
error (pre-patch: we detected an exponent in the middle portion of the
parse, which is not allowed; post-series: we detect slop, which is not
allowed).  If you call qemu_strtosz(,&endptr), the pre-patch version
fails with EINVAL but the new code succeeds.  But of all the callers,
the only one that passed non-NULL endptr did it's own check that *slop
is only whitespace ("+1k" fails that check), so the end result is that
at the high level, we reject "1.5e+1k" from the user, but the
rejection is now at a different point in the call stack.  The reason I
made the change is that it was less code to guarantee that the
internal qemu_strtod_finite() is passed a string that cannot possibly
contain an exponent, than to blindly call qemu_strtod_finite() on an
arbitrary suffix and then check after the fact on whether an exponent
was actually consumed as part of that parse.

For "0x1p1", the parse is "0x1" plus slop "p1" (unchanged by the
series).  This is rejected with EINVAL (the moment you have hex, we
insist on no suffix or slop, regardless of endptr being NULL or not).
I could have changed it similar to how I changed "1.5e+1k" to allow
the parse at the qemu_strtosz layer and then detect the slop at the
caller, but that was more lines of code and I didn't feel like it was
worth it.

Either way, the unit tests accurately cover the difference in parse
behavior, and I tried to make the documentation (by patch 11/11) be
consistent as well.  But since this is why we have a review process,
I'm open to feedback if you think I need to tweak which parse styles
we should be favoring.

> 
> (Similarly, "1.1.k" is also not parsed at all, but the problem there is not
> just two decimal points, but also that "1.1" would be an invalid size in
> itself, so it really shouldn’t be parsed at all.)

"1.1k" is a valid (if unusual) size.  This particular test addition
did not happen to improve coverage for my final code, but given that
qemu_strtod_finite(".1.k") would stop parsing after the ".1" and still
be pointing at '.', and our read-out-of-bounds was caused by assuming
that qemu_strtod_finite() failures leave *endptr == '.' (which was
invalidated for ".9e999" failing with ERANGE), it didn't hurt to add
the coverage.

> 
> I don’t think it matters to users, really, but I still wonder.

If any of our callers had actually done something like: "size %llx
followed by junk %s" with the parsed value and the junk string, then
it would matter.  But since all of the callers either passed in NULL
endptr (and got EINVAL before even knowing how much of the string was
parsed) or insisted that the junk be whitespace only, and did not
print details about the parsed value before reporting such errors,
you're right that I don't see this as mattering to end users.  But it
took me quite a bit of audit time to convince myself of that.

> > @@ -2875,6 +3056,20 @@ static void test_qemu_strtosz_trailing(void)
> >       err = qemu_strtosz(str, NULL, &res);
> >       g_assert_cmpint(err, ==, -EINVAL);
> >       g_assert_cmphex(res, ==, 0xbaadf00d);
> > +
> > +    /* FIXME overflow in fraction is buggy */
> > +    str = "1.5E999";
> > +    endptr = NULL;
> > +    res = 0xbaadf00d;
> > +    err = qemu_strtosz(str, &endptr, &res);
> > +    g_assert_cmpint(err, ==, 0);
> > +    g_assert_cmpuint(res, ==, EiB /* FIXME EiB * 1.5 */);
> > +    g_assert(endptr == str + 9 /* FIXME + 4 */);
> 
> This is “correct” (i.e. it’s the value we’ll get right now, which is the
> wrong one), but gcc complains that the array index is out of bounds
> (well...), which breaks the build.

Huh - wonder what build settings you are using that I wasn't for
seeing that warning - but it makes total sense that 'str + 9' would
trigger a build failure if we don't pad str with enough '\0' to keep
things in bounds.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Reply via email to