Re: [PATCH v3 0/9] Require error handling for dynamically created objects

Daniel P . Berrangé Fri, 06 Dec 2024 02:59:26 -0800

On Fri, Dec 06, 2024 at 09:25:54AM +0100, Markus Armbruster wrote:
> Daniel P. Berrangé <berra...@redhat.com> writes:
> 
> > On Wed, Dec 04, 2024 at 12:07:58PM +0100, Markus Armbruster wrote:
> >> To be fair, object_new() was not designed for use with user-provided
> >> type names.  When it chokes on type names not provided by the user, it's
> >> clearly a programming error, and assert() is a perfectly fine way to
> >> catch programming errors.  Same for qdev_new().
> >> 
> >> However, we do in fact use these functions with user-provided type
> >> names, if rarely.  When we do, we need to validate the type name before
> >> we pass it to them.
> >> 
> >> Trouble is the validation code is a bit involved, and reimplementing it
> >> everywhere it's needed is asking for bugs.
> >>
> >> Creating and using more interfaces that are more convenient for this
> >> purpose would avoid that.
> >
> > Yep, I don't have confidence in an API that will assert if the caller
> > forgot to validate the pre-conditions that can be triggered by user
> > input (or potentially other unexpected scenarios like something being
> > switched over to a module).
> 
> Modules broke object_new(), but I'd rather not call object_new()'s
> design bad for not accomodating a feature tacked on half-baked almost a
> decade later.  But let's discuss modules further down.
> 
> Asserting preconditions isn't the problem; this is how preconditions
> *should* be checked.  The problem is error-prone preconditions.


Yep, pre-conditions need to be something developers can be reasonably
expected to accurately comply with.

> Using string type names is in theory error-prone: the compiler cannot
> check the type name is valid.  It could be invalid because of a typo, or
> because it names a type that's not linked into this binary.


> The compiler could check with an enumeration, but then the header
> defining needed to be included basically everywhere QOM is used, and
> changed all the time.
> 
> So QOM went with strings.  I can't remember "invalid type name" bugs
> surviving even basic testing in more than a decade of QOM use.

Yep, at least for static object creation using since we're using the
pattern "object_new(TYPE_BLAH)" - even if TYPE_BLAH contains a typo,
it'll be the same typo passed in the .name = TYPE_BLAH of  TypeInfo,
so all will work fine if following normal code patterns.

> Except for *user-supplied* type names.  These need to be validated, we
> failed to factor out common validation code, and ended up with bugs in
> some of the copies.

Yep


> >> Three cases:
> >> 
> >> 1. Type name is literal string.  No change.  This is the most common
> >>    case.
> >> 
> >> 2. It's not.
> >> 
> >> 2a. Type name is user-provided.  This is rare.  We replace
> >> 
> >>         if (... guard ...) {
> >>             ... return failure ...
> >>         }
> >>         obj = object_new(...);
> >> 
> >>     by
> >> 
> >>         obj = object_new_dynamic(..., errp);
> >>         if (!obj) {
> >>             ... return failure ...
> >>         }
> >> 
> >>     This is an improvement.
> >> 
> >> 2b. It's not.  We should replace
> >> 
> >>         obj = object_new(...);
> >> 
> >>     by
> >> 
> >>         obj = object_new_dynamic(..., &error_abort);
> >> 
> >>     Exact same behavior, just wordier, to placate the compiler.
> >>     Tolerable as long as it's relatively rare.
> >> 
> >>     But I'm not sure it's worthwhile.  All it really does is helping
> >>     some towards not getting case 2a wrong.  But 2a is rare.
> >
> > Yes, 2a is fairly rare, but this is amplified by the consequences
> > of getting it wrong, which are an assert killing your running VM.
> > My goal was to make it much harder to screw up and trigger an
> > assert, even if that makes some valid uses more verbose.
> 
> Has this been a problem in practice?  We have thirteen years of
> experience...

No, but this series came out of Peter's proposal to introduce the
idea of Singleton classes, which would cause object_new to assert
in fun new scenarios. Effectively adding a new pre-condition and
would thus require all places which pass a dynamic type name to
object_new(), to be updated with a "if singleton..." check. I
wasn't happy with the idea of adding that precondition without a
way to enforce that we've not missed checks somewhere in the code.

Of course this pre-condition applies to static object_new calls
too, but those are less risky as the developer (probably) has the
mental context that the static object_new call is for a singleton.

> > I don't have a good answer for how to extend compile time validation
> > to cover non-user specified types that might be modules, without
> > changnig 'object_new' itself to add "Error **errp" and convert as
> > many callers as possible to propagate errors. That's a huge pile
> > of tedious work and in many cases would deteriorate  to &error_abort
> > since some key common use scenarios lack a "Error *errp" to propagate
> > into.
> 
> I can offer two ideas.
> 
> I'll start with devices for reasons that will become apparent in a
> minute.
> 
> The first idea is straighforward in conception: since the problem is
> modules breaking existing design assumptions, unbreak them.
> 
> Device creation cannot fail, only realize can.  Could we delay the
> problematic failure modes introduced by modules from creation to
> realize?
> 
> When creating the real thing fails, create a dummy instead.  Of course,
> the dummy needs to be sufficiently functional to provide for the things
> we do with devices before realize, such as introspection.
>
> Note that we already link information on modules into the binary, so
> that the binary knows which modules provide a certain object.  To enable
> sufficiently functional dummies, we'd have to link more.
> 
> The difficulty is "the things we do with devices before realize": do we
> even know?

Yeah, the idea of a dummy stub until realize is called fills me with
worry. It feels like something where it would be really easy to make
a mistake and have code that crashes interacting with an unrealized
object that doesn't have the struct fields you expect it to have, or
has the struct fields, but not initialized since no 'init' method
was run.

A slight refinement of your idea would be to break anything modular
into 2 distinct objects classes. MyDeviceFacade and MyDeviceImpl.
Creators of the device always call object_new(TYPE_MY_DEVICE_FACADE),
and the realize() method would load the module and make thje call
to object_new(TYPE_MY_DEVICE_IMPL).

Making something currently built-in, into a module, would involve
a bunch of tedious refactoring work, so I don't much like the
thought of choosing this as a design approach.

> The other difficulty is that objects don't have realize.  User-creatable
> objects have complete, which is kind of similar.  See also "Problem 5:
> QOM lacks a clear life cycle" in my memo "Dynamic & heterogeneous
> machines, initial configuration: problems"[*].

It would be nice to have a unified model between object and devices
for the complete/realize approach, but that's a slight tangent.

> The second idea is a variation of your idea to provide two interfaces
> for object creation, where using the wrong one won't compile: a common
> one that cannot fail, i.e. object_new(), and an uncommon one that can.
> Let's call that one object_try_new() for now.
> 
> Your proposed "string literal" as a useful approximation of "cannot
> fail".  Modules defeat that.
> 
> What if we switch from strings to something more expressive?
> 
> Step one: replace string type names by symbols
> 
> Change
> 
>     #define TYPE_FOO "foo"
> 
>     Object *object_new(const char *typename);
> 
> to something like
> 
>     extern const TypeInfoSafe foo_info;
>     #define TYPE_FOO &foo_info
> 
>     Object *object_new(const TypeInfoSafe *type_info);
> 
> Step two: different symbols for safe and unsafe types
> 
>     extern const TypeInfoUnsafe bar_info;
>     #define TYPE_BAR &bar_info
>     
>     Object *object_try_new(const TypeInfoUnsafe *type_info);
> 
> Now you cannot pass bar_info to object_new().
> 
> For a module-enabled TYPE_BAR, we already have something like
> 
>     module_obj(TYPE_BAR)
> 
> Make macro module_obj() require its argument to be TypeInfoUnsafe.
> 
> Voilà, the compiler enforces use of object_try_new() for objects
> provided by loadable modules.
> 
> There will be some fallout around computed type names such as
> ACCEL_OPS_NAME().  Fairly rare, I think.
> 
> More fallout around passing TYPE_ macros to functions that accept both
> safe and unsafe types.  How common is that?

Perhaps more common than we care to admit. eg most block device drivers
are safe, except for a few we modularized which are unsafe. Most ui
frontends would be safe, except for a few we modularized. This pattern
of "except for a few we modularized" has been repeated all over, and
conceptually that's not a bad thing, as we wanted to make it easy to
modularize things incrementally.

Looking at our current /usr/bin/qemu-system-XXX binaries, they range in
size from 6 MB to 30 MB, stripped, ignoring linked libraries. Considering
work on the qemu-system-any binary that is intended to unify all targets,
I wouldn't be surprised if it came out at over 100 MB with all devices
from all targets included.

Is qemu-system-any pushing us to a place where our approach to modules
is in fact wrong ?

Modularizing piecemeal let us cull the big offenders that pulled in
huge external libraries.

People still complain QEMU is "too big" and binaries linked to too
many legacy devices.

With my distro hat on, if we had 'qemu-system-any' would I really
want to have it as monolithic binary ?

I think I would want to have loadable TCG backends for each target,
and I would want all the devices for each target to be loadable too.
eg, so I could have a 'qemu-system-any' RPM with just the core, and
'qemu-system-modules-arm', 'qemu-system-modules-x86_64', etc, or
even more fine grained than that.

IOW, everything is a module by default. Not necccessarily
1 object == 1 module, more  "N objects == 1 module", but certainly
with very few objects built-in.

In such a world, IMHO, it doesn't make sense to have TypeInfoSafe
and TypeInfoUnsafe, with different object_new/object_try_new
methods. I think we would have to accept that object_new must
get an "Error **errp", and possibly even the 'init" method too.
It would force us to make sure we can propagage into errp in
all the key places we can't do so today wrt object lifecycles.


Overall I've talked myself into believing my series here is not
worthwhile, as it doesn't solve a big enough problem, and it
needs somethign more ambituous.

> >> Maybe module_object_new() and object_new_dynamic() could be fused into a
> >> single function with a better name.
> >> 
> >> > With this series, my objections to Peter Xu's singleton series[1]
> >> > would be largely nullified.
> >> >
> >> > [1] 
> >> > https://lists.nongnu.org/archive/html/qemu-devel/2024-10/msg05524.html
> 
> [*] Message-ID: <87o7d1i7ky....@pond.sub.org>
> https://lore.kernel.org/qemu-devel/87o7d1i7ky....@pond.sub.org/

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH v3 0/9] Require error handling for dynamically created objects

Reply via email to