Re: [Intel-gfx] IOCTL feature detection (Was: Re: [PATCH 8/8] drm/i915: Allow user to set cache at BO creation)
On 2023-04-26 04:52:43, Daniel Vetter wrote: > > Joonas asked me to put my thoughts here: > > - in general the "feature discovery by trying it out" approach is most > robust and hence preferred, but it's also not something that's required > when there's good reasons against it More robust in what sense? Userspace has to set up some scenario to use the interface, which hopefully is not too complex. It's difficult to predict what it may look like in the future since we are talking about undefined extensions. Userspace has to rely on the kernel making creating and destroying this thing essentially free. For I915_GEM_CREATE_EXT_PROTECTED_CONTENT, that did not work out. For I915_GEM_CREATE_EXT_SET_PAT, just wondering, since the PAT indices are platform specific, what might happen if we tried to choose some common index value to detect the extension in a generic manner? Could it potentially lead to unexpected behavior, or maybe just an error. I guess it's really extension specific what kind of issues potentially could arise. > tldr; prefer discovery, don't sweat it if not, definitely don't > overengineer this with some magic "give me all extensions" thing because > that results in guaranteed cross-component backporting pains sooner or > later. Or inconsistency, which defeats the point. I guess I don't know the full context of your concerns here. For returning the gem-create extensions, isn't this just a matter of returning the valid indices to the create_extensions array in i915_gem_create.c? Is that an over-engineered query? It seems weird that there's a reasonably well defined "extension" mechanism here, but no way for the kernel to just tell us what extensions it knows about. -Jordan
Re: [Intel-gfx] IOCTL feature detection (Was: Re: [PATCH 8/8] drm/i915: Allow user to set cache at BO creation)
On 4/26/2023 9:48 AM, Teres Alexis, Alan Previn wrote: On Wed, 2023-04-26 at 13:52 +0200, Daniel Vetter wrote: On Tue, Apr 25, 2023 at 04:41:54PM +0300, Joonas Lahtinen wrote: (+ Faith and Daniel as they have been involved in previous discussions) Quoting Jordan Justen (2023-04-24 20:13:00) On 2023-04-24 02:08:43, Tvrtko Ursulin wrote: alan:snip - the more a feature spans drivers/modules, the more it should be discovered by trying it out, e.g. dma-buf fence import/export was a huge discussion, luckily mesa devs figured out how to transparantly fall back at runtime so we didn't end up merging the separate feature flag (I think at least, can't find it). pxp being split across i915/me/fw/who knows what else is kinda similar so I'd heavily lean towards discovery by creating a context - pxp taking 8s to init a ctx sounds very broken, irrespective of anything else I think there has been a bit of confusion in regards to this timeout and to where it applies, so let me try to clarify to make sure we're all on the same page (Alan has already explained most of it below, but I'm going to go in a bit more detail and I want to make sure it's all in one place for reference). Before we can do any PXP operation, dependencies need to be satisfied, some of which are outside of i915. For MTL, these are: GSC FW needs to be loaded (~250 ms) HuC FW needs to be authenticated for PXP ops (~20 ms) MEI modules need to be bound (depends on the probe ordering, but usually a few secs) GSC SW proxy via MEI needs to be established (~500 ms normally, but can take a few seconds on the first boot after a firmware update) Due to the fact that these can take several seconds in total to complete, to avoid stalling driver load/resume for that long we moved the i915-side operations to a separate worker and we register i915 before they've completed. This means that we can get a PXP context creation call before all the dependencies are in place, in which case we do need to wait and that's where the 8s come from. After all the pieces are in place, a PXP context creation call is much faster (up to ~150 ms, which is the time required to start the PXP session if it is not already running). The reason why we suggested a dedicated getparam was to avoid requiring early users to wait for all of that to happen just to check the capability. By the time an user actually wants to use PXP, we're likely done with the prep steps (or at least we're far along with them) and therefore the wait will be short. Alan: Please be aware that: 1. the wait-timeout was changed to 1 second sometime back. 2. the I'm not deciding the time-out. I initially wanted to keep it at the same timeout as ADL (250 milisec) - and ask the UMD to retry if user needs it. (as per same ADL behavior). Daniele requested to move it to 8 seconds - but thru review process, we reduced it to 1 second. 3. In anycase, thats just the wait-timeout - and we know it wont succeed until ~6 seconds after i915 (~9 secs after boot). The issue isnt our hardware or i915 - its the component driver load <-- this is what's broken. I think the question here is whether the mei driver is taking a long time to probe or if it is just being probed late. In the latter case, I wouldn't call it broken. Details: PXP context is dependent on gsc-fw load, huc-firmware load, mei-gsc-proxy component driver load + bind, huc-authentication and gsc-proxy-init-handshake. Most of above steps begin rather quickly during i915 driver load - the delay seems to come from a very late mei-gsc-proxy component driver load. In fact the parent mei-me driver is only getting ~6 seconds after i915 init is done. That blocks the gsc-proxy-init-handshake and huc-authentication and lastly PXP. That said, what is broken is why it takes so long to get the component drivers to come up. NOTE: PXP isnt really doing anything differently in the context creation flow (in terms of time-consuming-steps compared to ADL) besides the extra dependency waits these. We can actually go back to the original timeout of 250 milisecs like we have in ADL but will fail if MESA calls in too early (but will succeed later) ... or... we can create the GET_PARAMs. A better idea would be to figure out how to control the driver load order and force mei driver + components to get called right after i915. I was informed there is no way to control this and changes here will likely not be accepted upstream. we could add a device link to mark i915 as a consumer of mei, but I believe that wouldn't work for 2 reasons 1 - on discrete, mei binds to a child device of i915, so the dependency is reversed 2 - the link might just delay the i915 load to after the mei load, which I'm not sure it is something we want (and at that point we could also just wait for mei to bind from within the i915 load). Daniele ++ Daniele - can you chime in? Take note that ADL has the same issue but for whatever reason, the d
Re: [Intel-gfx] IOCTL feature detection (Was: Re: [PATCH 8/8] drm/i915: Allow user to set cache at BO creation)
On Wed, 2023-04-26 at 13:52 +0200, Daniel Vetter wrote: > On Tue, Apr 25, 2023 at 04:41:54PM +0300, Joonas Lahtinen wrote: > > (+ Faith and Daniel as they have been involved in previous discussions) > > Quoting Jordan Justen (2023-04-24 20:13:00) > > > On 2023-04-24 02:08:43, Tvrtko Ursulin wrote: > > > > > > > > > alan:snip > - the more a feature spans drivers/modules, the more it should be > discovered by trying it out, e.g. dma-buf fence import/export was a huge > discussion, luckily mesa devs figured out how to transparantly fall back > at runtime so we didn't end up merging the separate feature flag (I > think at least, can't find it). pxp being split across i915/me/fw/who > knows what else is kinda similar so I'd heavily lean towards discovery > by creating a context > > - pxp taking 8s to init a ctx sounds very broken, irrespective of anything > else > Alan: Please be aware that: 1. the wait-timeout was changed to 1 second sometime back. 2. the I'm not deciding the time-out. I initially wanted to keep it at the same timeout as ADL (250 milisec) - and ask the UMD to retry if user needs it. (as per same ADL behavior). Daniele requested to move it to 8 seconds - but thru review process, we reduced it to 1 second. 3. In anycase, thats just the wait-timeout - and we know it wont succeed until ~6 seconds after i915 (~9 secs after boot). The issue isnt our hardware or i915 - its the component driver load <-- this is what's broken. Details: PXP context is dependent on gsc-fw load, huc-firmware load, mei-gsc-proxy component driver load + bind, huc-authentication and gsc-proxy-init-handshake. Most of above steps begin rather quickly during i915 driver load - the delay seems to come from a very late mei-gsc-proxy component driver load. In fact the parent mei-me driver is only getting ~6 seconds after i915 init is done. That blocks the gsc-proxy-init-handshake and huc-authentication and lastly PXP. That said, what is broken is why it takes so long to get the component drivers to come up. NOTE: PXP isnt really doing anything differently in the context creation flow (in terms of time-consuming-steps compared to ADL) besides the extra dependency waits these. We can actually go back to the original timeout of 250 milisecs like we have in ADL but will fail if MESA calls in too early (but will succeed later) ... or... we can create the GET_PARAMs. A better idea would be to figure out how to control the driver load order and force mei driver + components to get called right after i915. I was informed there is no way to control this and changes here will likely not be accepted upstream. ++ Daniele - can you chime in? Take note that ADL has the same issue but for whatever reason, the dependant mei component on ADL loaded much sooner - so it was never an issue that was caught but still existed on ADL time merge (if users customize the kernel + compositor for fastboot it will happen).(i realize I havent tested ADL with the new kernel configs that we use to also boot PXP on MTL - wonder if the new mei configs are causing the delay - i.e. ADL customer could suddenly see this 6 sec delay too. - something i have to check now)
Re: [Intel-gfx] IOCTL feature detection (Was: Re: [PATCH 8/8] drm/i915: Allow user to set cache at BO creation)
On Tue, Apr 25, 2023 at 04:41:54PM +0300, Joonas Lahtinen wrote: > (+ Faith and Daniel as they have been involved in previous discussions) > > Quoting Jordan Justen (2023-04-24 20:13:00) > > On 2023-04-24 02:08:43, Tvrtko Ursulin wrote: > > > > > > Being able to "list" supported extensions sounds like a reasonable > > > principle, albeit a departure from the design direction to date. > > > Which means there are probably no quick solutions. Also, AFAIU, only > > > PXP context create is the problematic one, right? Everything else is > > > pretty much instant or delayed allocation so super cheap to probe by > > > attempting to use. > > > > > > If I got that right and given this series is about > > > drm_i915_gem_create_ext I don't think this side discussion should be > > > blocking it. > > > > This still leaves the issue of no reasonable detection mechanism for > > the extension. > > I remember this exact discussion being repeated at least a few times in > the past 5 years. Previously the conclusion was always that detection by > trying to use the extension leads to least amount of uAPI surface. There > is also no concern of having mismatch in backporting of the query and the > functionality patches. > > Why exactly do you think it is more reasonable to do the following? > > check_if_extension_query_is_supported(); > > check_if_extension_xyz_is_supported(); > > > VS > > create_[context,buffer,whatever]_with_extension(); > > destroy_[context,buffer,whatever](); > > For contexts and buffers there's no overhead, and we're keeping the uAPI > surface smaller (less bugs, less validation effort). Additionally we > support checking combinations of extensions A, B and C without extra > effort. > > If we're not returning enough clear errnos, then that is something to > make sure we do. Joonas asked me to put my thoughts here: - in general the "feature discovery by trying it out" approach is most robust and hence preferred, but it's also not something that's required when there's good reasons against it - the more a feature spans drivers/modules, the more it should be discovered by trying it out, e.g. dma-buf fence import/export was a huge discussion, luckily mesa devs figured out how to transparantly fall back at runtime so we didn't end up merging the separate feature flag (I think at least, can't find it). pxp being split across i915/me/fw/who knows what else is kinda similar so I'd heavily lean towards discovery by creating a context - pxp taking 8s to init a ctx sounds very broken, irrespective of anything else - in practice there's not really a combinatorial explosion, for a lot of reasons: - kernel and userspace tend to assume/require implied features where it makes sense, e.g. in kms atomic implies planes and universal planes. mesa has been doing the same - you cold go all the way to what radeon/amdgpu have done for years with a single incrementing version. Probably not flexible enough for intel. - every pciid is it's own uapi, so a feature only needs to be tested on platforms where it didn't ship from the start. Also cuts down on runtime discovery a lot - mesa tends to only support down to current lts kernels and not older, which again cuts the combinations a lot - I did look through upstream kernel docs for general (driver) uapi recommendations, but there isn't really anything about good taste stuff, just a lot about not screwing up compatibility across kernels/platforms. tldr; prefer discovery, don't sweat it if not, definitely don't overengineer this with some magic "give me all extensions" thing because that results in guaranteed cross-component backporting pains sooner or later. Or inconsistency, which defeats the point. Cheers, Daniel > Regards, Joonas > > > If the discussion gets too complicated, then can we add > > a GET_PARAM for the SET_PAT extension? I'm hoping we could either come > > up with something better reasonably quickly, or i915/Xe can add a new > > param for each new extensions until a better approach is available. > > > > > Furthermore the PXP context create story is even more complicated, > > > in a way that it is not just about querying whether the extension is > > > supported, but the expensive check is something more complicated. > > > > > > Going back to implementation details for this proposed new feature, > > > one alternative to query could be something like: > > > > > >drm_i915_gem_create_ext.flags |= > > > I915_GEM_CREATE_EXT_FLAG_PROBE_EXTENSIONS; > > > > > > That would be somewhat more light weight to implement that the > > > i915_query route. And it appears it would work for all ioctls which > > > support extensions apart for i915_context_param_engines. > > > > This seems little better than the "try it, and if it works then it's > > supported". > > > > I'm not suggesting that userspace should be able to check that > > scenario x+y+z will work, but more a list of extensions that > > conc
Re: [Intel-gfx] IOCTL feature detection (Was: Re: [PATCH 8/8] drm/i915: Allow user to set cache at BO creation)
On 2023-04-25 06:41:54, Joonas Lahtinen wrote: > (+ Faith and Daniel as they have been involved in previous discussions) > > Quoting Jordan Justen (2023-04-24 20:13:00) > > On 2023-04-24 02:08:43, Tvrtko Ursulin wrote: > > > > > > Being able to "list" supported extensions sounds like a reasonable > > > principle, albeit a departure from the design direction to date. > > > Which means there are probably no quick solutions. Also, AFAIU, only > > > PXP context create is the problematic one, right? Everything else is > > > pretty much instant or delayed allocation so super cheap to probe by > > > attempting to use. > > > > > > If I got that right and given this series is about > > > drm_i915_gem_create_ext I don't think this side discussion should be > > > blocking it. > > > > This still leaves the issue of no reasonable detection mechanism for > > the extension. > > I remember this exact discussion being repeated at least a few times in > the past 5 years. Previously the conclusion was always that detection by > trying to use the extension leads to least amount of uAPI surface. There > is also no concern of having mismatch in backporting of the query and the > functionality patches. > > Why exactly do you think it is more reasonable to do the following? > > check_if_extension_query_is_supported(); > > check_if_extension_xyz_is_supported(); > As I've mentioned a couple times, I'm asking for query item that returns it all the extensions that conceivably could work. In theory it could be made a single query item which somehow then enumerates if something is a context extension or bo extension. But, it seems simpler for all if we just have a couple query items returning a simple u64 array for the few classes of extensions. > VS > > create_[context,buffer,whatever]_with_extension(); > > destroy_[context,buffer,whatever](); > > For contexts and buffers there's no overhead, There's no-overhead to creating and destroying things? I think the 8s timeout when trying create a protected content context shows it's not always quite that simple. Over time userspace will have to continue growing this set of create/destroy tests as new extensions are added. Simply so we can discover what extensions are supported. This doesn't seem like a very robust way to advertise extensions for an api. Another point ... say you need to debug why some simple app is failing to start the driver. One tool might be strace. But now you have a bunch of noise of calls from the driver creating and destroying things just to discover what extensions the kernel supports. It would be nice if all this was handled with a query item vs a bunch of create/destroys. > and we're keeping the uAPI surface smaller (less bugs, less > validation effort). I understand wanting to keep the kernel uapi and implementation simple. Is it too complicated to return a u64 array for a query item indicating the extensions supported for the various classes of extensions? I think in most cases it'll just be essentially shuffling across an array of u64 items. (Since most known extensions will always be supported by the kernel.) > Additionally we support checking combinations of extensions A, B and > C without extra effort. Regarding combinations of extensions, is that really something userspace needs to probe? I would think if userspace knows about the possible extensions, then it's on userspace to use combinations appropriately. But, in detecting extensions, it is possible that, say extension B relies on extension A. Now userspace may have to probe to see if extension A is supported, and then probe for extension B while using extension A. Essentially, probing for an extension could become a bit complicated. Vs the kernel just giving us a u64 array of known extensions... -Jordan
Re: [Intel-gfx] IOCTL feature detection (Was: Re: [PATCH 8/8] drm/i915: Allow user to set cache at BO creation)
On Tue, 2023-04-25 at 16:41 +0300, Joonas Lahtinen wrote: > (+ Faith and Daniel as they have been involved in previous discussions) An orthogonal (but losely related) question: Is PXP the only subsystem that has the unique problem of: Uses a delayed worker to complete all dependencies for init.. but they take so long that by the time compositors-mesa-init comes up, creation of PXP context blocks too long and may still likely failing and incorrectly assuming PXP is not supported. (when we don't have a GET_PARAM for it). I believe HuC has a similiar issue - but doesnt reflect in the UAPI but rather the cmd buffers. We don't have other subsystems that have this problem? (dependency on other startups?)
[Intel-gfx] IOCTL feature detection (Was: Re: [PATCH 8/8] drm/i915: Allow user to set cache at BO creation)
(+ Faith and Daniel as they have been involved in previous discussions) Quoting Jordan Justen (2023-04-24 20:13:00) > On 2023-04-24 02:08:43, Tvrtko Ursulin wrote: > > > > Being able to "list" supported extensions sounds like a reasonable > > principle, albeit a departure from the design direction to date. > > Which means there are probably no quick solutions. Also, AFAIU, only > > PXP context create is the problematic one, right? Everything else is > > pretty much instant or delayed allocation so super cheap to probe by > > attempting to use. > > > > If I got that right and given this series is about > > drm_i915_gem_create_ext I don't think this side discussion should be > > blocking it. > > This still leaves the issue of no reasonable detection mechanism for > the extension. I remember this exact discussion being repeated at least a few times in the past 5 years. Previously the conclusion was always that detection by trying to use the extension leads to least amount of uAPI surface. There is also no concern of having mismatch in backporting of the query and the functionality patches. Why exactly do you think it is more reasonable to do the following? check_if_extension_query_is_supported(); check_if_extension_xyz_is_supported(); VS create_[context,buffer,whatever]_with_extension(); destroy_[context,buffer,whatever](); For contexts and buffers there's no overhead, and we're keeping the uAPI surface smaller (less bugs, less validation effort). Additionally we support checking combinations of extensions A, B and C without extra effort. If we're not returning enough clear errnos, then that is something to make sure we do. Regards, Joonas > If the discussion gets too complicated, then can we add > a GET_PARAM for the SET_PAT extension? I'm hoping we could either come > up with something better reasonably quickly, or i915/Xe can add a new > param for each new extensions until a better approach is available. > > > Furthermore the PXP context create story is even more complicated, > > in a way that it is not just about querying whether the extension is > > supported, but the expensive check is something more complicated. > > > > Going back to implementation details for this proposed new feature, > > one alternative to query could be something like: > > > >drm_i915_gem_create_ext.flags |= > > I915_GEM_CREATE_EXT_FLAG_PROBE_EXTENSIONS; > > > > That would be somewhat more light weight to implement that the > > i915_query route. And it appears it would work for all ioctls which > > support extensions apart for i915_context_param_engines. > > This seems little better than the "try it, and if it works then it's > supported". > > I'm not suggesting that userspace should be able to check that > scenario x+y+z will work, but more a list of extensions that > conceivably could work. Normally this should just a matter of the > kernel unconditionally adding the newly implemented extension to the > list returned in the query call. > > If a GET_PARAM can be made for the PXP case, then it seems like a > query item returning CONTEXT_CREATE extensions could conditionally > omit that extension just as easily as implementing the proposed new > GET_PARAM. > > -Jordan