On Fri, Dec 06, 2024 at 09:25:54AM +0100, Markus Armbruster wrote:
> Daniel P. Berrangé <berra...@redhat.com> writes:
>
> > On Wed, Dec 04, 2024 at 12:07:58PM +0100, Markus Armbruster wrote:
> >> To be fair, object_new() was not designed for use with user-provided
> >> type names. When it chokes on type names not provided by the user, it's
> >> clearly a programming error, and assert() is a perfectly fine way to
> >> catch programming errors. Same for qdev_new().
> >>
> >> However, we do in fact use these functions with user-provided type
> >> names, if rarely. When we do, we need to validate the type name before
> >> we pass it to them.
> >>
> >> Trouble is the validation code is a bit involved, and reimplementing it
> >> everywhere it's needed is asking for bugs.
> >>
> >> Creating and using more interfaces that are more convenient for this
> >> purpose would avoid that.
> >
> > Yep, I don't have confidence in an API that will assert if the caller
> > forgot to validate the pre-conditions that can be triggered by user
> > input (or potentially other unexpected scenarios like something being
> > switched over to a module).
>
> Modules broke object_new(), but I'd rather not call object_new()'s
> design bad for not accommodating a feature tacked on half-baked almost a
> decade later. But let's discuss modules further down.
>
> Asserting preconditions isn't the problem; this is how preconditions
> *should* be checked. The problem is error-prone preconditions.

Yep, pre-conditions need to be something developers can reasonably be
expected to comply with accurately.

> Using string type names is in theory error-prone: the compiler cannot
> check the type name is valid. It could be invalid because of a typo, or
> because it names a type that's not linked into this binary.
> The compiler could check with an enumeration, but then the header
> defining it would need to be included basically everywhere QOM is used,
> and would change all the time.
>
> So QOM went with strings. I can't remember "invalid type name" bugs
> surviving even basic testing in more than a decade of QOM use.

Yep, at least for static object creation, since we're using the pattern
"object_new(TYPE_BLAH)". Even if TYPE_BLAH contains a typo, it'll be the
same typo passed in the ".name = TYPE_BLAH" of the TypeInfo, so all will
work fine if following normal code patterns.

> Except for *user-supplied* type names. These need to be validated, we
> failed to factor out common validation code, and ended up with bugs in
> some of the copies.

Yep

> >> Three cases:
> >>
> >> 1. Type name is literal string. No change. This is the most common
> >>    case.
> >>
> >> 2. It's not.
> >>
> >> 2a. Type name is user-provided. This is rare. We replace
> >>
> >>         if (... guard ...) {
> >>             ... return failure ...
> >>         }
> >>         obj = object_new(...);
> >>
> >>     by
> >>
> >>         obj = object_new_dynamic(..., errp);
> >>         if (!obj) {
> >>             ... return failure ...
> >>         }
> >>
> >>     This is an improvement.
> >>
> >> 2b. It's not. We should replace
> >>
> >>         obj = object_new(...);
> >>
> >>     by
> >>
> >>         obj = object_new_dynamic(..., &error_abort);
> >>
> >>     Exact same behavior, just wordier, to placate the compiler.
> >>     Tolerable as long as it's relatively rare.
> >>
> >>     But I'm not sure it's worthwhile. All it really does is help a
> >>     bit towards not getting case 2a wrong. But 2a is rare.
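The 2a/2b rewrite quoted above amounts to moving type-name validation into a single fallible constructor instead of repeating a guard at every call site. A minimal standalone C sketch of that idea, under stated assumptions: `Error`, `toy_error_setg()`, `toy_object_new()`, `toy_object_new_dynamic()`, `create_from_user_input()` and the two-entry type registry are hypothetical toy stand-ins, not QEMU's real QOM or error API, and module loading is ignored entirely.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy stand-ins; QEMU's real Error and Object are different. */
typedef struct Error {
    char msg[128];
} Error;

typedef struct Object {
    const char *type_name;
} Object;

/* Toy registry: the type names "linked into this binary". */
static const char *const registered_types[] = { "foo", "bar" };

static int type_is_registered(const char *typename)
{
    size_t n = sizeof(registered_types) / sizeof(registered_types[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(registered_types[i], typename) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Toy error_setg(): store a formatted message unless errp is NULL. */
static void toy_error_setg(Error **errp, const char *fmt, ...)
{
    Error *err;
    va_list ap;

    if (!errp) {
        return; /* caller chose to ignore the error */
    }
    err = calloc(1, sizeof(*err));
    if (!err) {
        abort();
    }
    va_start(ap, fmt);
    vsnprintf(err->msg, sizeof(err->msg), fmt, ap);
    va_end(ap);
    *errp = err;
}

/* Precondition style: an unknown type name is a programming error. */
static Object *toy_object_new(const char *typename)
{
    Object *obj;

    assert(type_is_registered(typename));
    obj = calloc(1, sizeof(*obj));
    if (!obj) {
        abort();
    }
    obj->type_name = typename;
    return obj;
}

/* Fallible style: validation lives in one place, failure is reported. */
static Object *toy_object_new_dynamic(const char *typename, Error **errp)
{
    if (!type_is_registered(typename)) {
        toy_error_setg(errp, "unknown type '%s'", typename);
        return NULL;
    }
    return toy_object_new(typename);
}

/* Case 2a: the type name came from the user, so use the fallible form. */
static int create_from_user_input(const char *user_type, Error **errp)
{
    Object *obj = toy_object_new_dynamic(user_type, errp);

    if (!obj) {
        return -1; /* ... return failure ... */
    }
    free(obj); /* a real caller would keep and use the object */
    return 0;
}
```

In real QEMU code, case 2b would pass `&error_abort` rather than a caller's `errp`; the toy has no such sentinel, and passing NULL simply discards the error.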
> >
> > Yes, 2a is fairly rare, but this is amplified by the consequences
> > of getting it wrong, which are an assert killing your running VM.
> > My goal was to make it much harder to screw up and trigger an
> > assert, even if that makes some valid uses more verbose.
>
> Has this been a problem in practice? We have thirteen years of
> experience...

No, but this series came out of Peter's proposal to introduce the idea
of Singleton classes, which would cause object_new to assert in fun new
scenarios. That effectively adds a new pre-condition, and would thus
require every place which passes a dynamic type name to object_new() to
be updated with an "if singleton..." check. I wasn't happy with the idea
of adding that pre-condition without a way to enforce that we've not
missed checks somewhere in the code.

Of course this pre-condition applies to static object_new calls too, but
those are less risky, as the developer (probably) has the mental context
that the static object_new call is for a singleton.

> > I don't have a good answer for how to extend compile time validation
> > to cover non-user specified types that might be modules, without
> > changing 'object_new' itself to add "Error **errp" and convert as
> > many callers as possible to propagate errors. That's a huge pile
> > of tedious work and in many cases would deteriorate to &error_abort
> > since some key common use scenarios lack an "Error *errp" to propagate
> > into.
>
> I can offer two ideas.
>
> I'll start with devices for reasons that will become apparent in a
> minute.
>
> The first idea is straightforward in conception: since the problem is
> modules breaking existing design assumptions, unbreak them.
>
> Device creation cannot fail, only realize can. Could we delay the
> problematic failure modes introduced by modules from creation to
> realize?
>
> When creating the real thing fails, create a dummy instead.
> Of course, the dummy needs to be sufficiently functional to provide for
> the things we do with devices before realize, such as introspection.
>
> Note that we already link information on modules into the binary, so
> that the binary knows which modules provide a certain object. To enable
> sufficiently functional dummies, we'd have to link more.
>
> The difficulty is "the things we do with devices before realize": do we
> even know?

Yeah, the idea of a dummy stub until realize is called fills me with
worry. It feels like something where it would be really easy to make a
mistake and have code that crashes interacting with an unrealized object
that doesn't have the struct fields you expect it to have, or has the
struct fields but not initialized, since no 'init' method was run.

A slight refinement of your idea would be to break anything modular into
2 distinct object classes, MyDeviceFacade and MyDeviceImpl. Creators of
the device always call object_new(TYPE_MY_DEVICE_FACADE), and the
facade's realize() method would load the module and make the call to
object_new(TYPE_MY_DEVICE_IMPL).

Turning something currently built-in into a module would then involve a
bunch of tedious refactoring work, so I don't much like the thought of
choosing this as a design approach.

> The other difficulty is that objects don't have realize. User-creatable
> objects have complete, which is kind of similar. See also "Problem 5:
> QOM lacks a clear life cycle" in my memo "Dynamic & heterogeneous
> machines, initial configuration: problems"[*].

It would be nice to have a unified model between objects and devices for
the complete/realize approach, but that's a slight tangent.

> The second idea is a variation of your idea to provide two interfaces
> for object creation, where using the wrong one won't compile: a common
> one that cannot fail, i.e. object_new(), and an uncommon one that can.
> Let's call that one object_try_new() for now.
>
> You proposed "string literal" as a useful approximation of "cannot
> fail". Modules defeat that.
>
> What if we switch from strings to something more expressive?
>
> Step one: replace string type names by symbols
>
>     Change
>
>         #define TYPE_FOO "foo"
>
>         Object *object_new(const char *typename);
>
>     to something like
>
>         extern const TypeInfoSafe foo_info;
>         #define TYPE_FOO &foo_info
>
>         Object *object_new(const TypeInfoSafe *type_info);
>
> Step two: different symbols for safe and unsafe types
>
>         extern const TypeInfoUnsafe bar_info;
>         #define TYPE_BAR &bar_info
>
>         Object *object_try_new(const TypeInfoUnsafe *type_info);
>
>     Now you cannot pass bar_info to object_new().
>
>     For a module-enabled TYPE_BAR, we already have something like
>
>         module_obj(TYPE_BAR)
>
>     Make macro module_obj() require its argument to be TypeInfoUnsafe.
>
> Voilà, the compiler enforces use of object_try_new() for objects
> provided by loadable modules.
>
> There will be some fallout around computed type names such as
> ACCEL_OPS_NAME(). Fairly rare, I think.
>
> More fallout around passing TYPE_ macros to functions that accept both
> safe and unsafe types. How common is that?

Perhaps more common than we care to admit. For example, most block
device drivers are safe, except for a few we modularized which are
unsafe. Most UI frontends would be safe, except for a few we
modularized. This pattern of "except for a few we modularized" has been
repeated all over, and conceptually that's not a bad thing, as we wanted
to make it easy to modularize things incrementally.

Looking at our current /usr/bin/qemu-system-XXX binaries, they range in
size from 6 MB to 30 MB, stripped, ignoring linked libraries. Considering
the work on the qemu-system-any binary, which is intended to unify all
targets, I wouldn't be surprised if it came out at over 100 MB with all
devices from all targets included.

Is qemu-system-any pushing us to a place where our approach to modules
is in fact wrong?
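Markus's step one / step two relies purely on C type checking, which a standalone program can illustrate. A minimal sketch, assuming C11 (for `_Static_assert` and `_Generic`): `TypeInfoSafe`, `TypeInfoUnsafe`, `object_new()` and `object_try_new()` here are toy versions of the names in the proposal, not existing QEMU API; `module_available` is a hypothetical stand-in for "the module loaded", and `toy_module_obj()` is a hypothetical macro that performs only the type check the proposal asks module_obj() to do.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Two distinct struct types, so the compiler can tell them apart. */
typedef struct TypeInfoSafe {
    const char *name;
} TypeInfoSafe;

typedef struct TypeInfoUnsafe {
    const char *name;
    const char *module; /* hypothetical: module expected to provide it */
} TypeInfoUnsafe;

typedef struct Object {
    const char *type_name;
} Object;

/* Built-in type: always linked in, so creating it cannot fail. */
static const TypeInfoSafe foo_info = { .name = "foo" };
#define TYPE_FOO (&foo_info)

/* Module-provided type: creating it fails if the module is missing. */
static const TypeInfoUnsafe bar_info = { .name = "bar", .module = "bar-module" };
#define TYPE_BAR (&bar_info)

/* Toy stand-in for "the module is installed and loads". */
static int module_available;

/* Hypothetical check: rejects anything that is not a TypeInfoUnsafe. */
#define toy_module_obj(t) \
    _Static_assert(_Generic((t), const TypeInfoUnsafe *: 1, default: 0), \
                   "module-provided types must be TypeInfoUnsafe")

toy_module_obj(TYPE_BAR);       /* compiles */
/* toy_module_obj(TYPE_FOO);       would fail the static assertion */

/* Cannot fail: only accepts safe types. */
static Object *object_new(const TypeInfoSafe *type_info)
{
    Object *obj = calloc(1, sizeof(*obj));

    if (!obj) {
        abort();
    }
    obj->type_name = type_info->name;
    return obj;
}

/* May fail: only accepts unsafe types and says why it failed. */
static Object *object_try_new(const TypeInfoUnsafe *type_info,
                              const char **errmsg)
{
    Object *obj;

    if (!module_available) {
        *errmsg = "module providing the type is not available";
        return NULL;
    }
    obj = calloc(1, sizeof(*obj));
    if (!obj) {
        abort();
    }
    obj->type_name = type_info->name;
    return obj;
}

/*
 * object_new(TYPE_BAR) is diagnosed as an incompatible pointer type;
 * with -Werror=incompatible-pointer-types it is a hard error.
 */
```

A helper that must accept both kinds of type, the fallout case discussed above, would likely need two variants or its own `_Generic` dispatch, which is part of why the split gets awkward once most types are modular.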

Modularizing piecemeal let us cull the big offenders that pulled in huge
external libraries. People still complain QEMU is "too big" and that its
binaries link in too many legacy devices.

With my distro hat on, if we had 'qemu-system-any', would I really want
to ship it as a monolithic binary? I think I would want loadable TCG
backends for each target, and I would want all the devices for each
target to be loadable too. That way I could have a 'qemu-system-any' RPM
with just the core, plus 'qemu-system-modules-arm',
'qemu-system-modules-x86_64', etc., or even more fine-grained than that.

IOW, everything is a module by default. Not necessarily 1 object == 1
module, more "N objects == 1 module", but certainly with very few
objects built in.

In such a world, IMHO, it doesn't make sense to have TypeInfoSafe and
TypeInfoUnsafe, with different object_new/object_try_new methods. I
think we would have to accept that object_new must take an "Error
**errp", and possibly even the 'init' method too. It would force us to
make sure we can propagate into errp in all the key places in the object
lifecycle where we can't do so today.

Overall I've talked myself into believing my series here is not
worthwhile, as it doesn't solve a big enough problem, and it needs
something more ambitious.

> >> Maybe module_object_new() and object_new_dynamic() could be fused into a
> >> single function with a better name.
> >>
> >> > With this series, my objections to Peter Xu's singleton series[1]
> >> > would be largely nullified.
> >> >
> >> > [1]
> >> > https://lists.nongnu.org/archive/html/qemu-devel/2024-10/msg05524.html
>
> [*] Message-ID: <87o7d1i7ky....@pond.sub.org>
>     https://lore.kernel.org/qemu-devel/87o7d1i7ky....@pond.sub.org/

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|