On Wed, 12 May 2021 16:19:15 -0400 Eric Farman <far...@linux.ibm.com> wrote:
> On Tue, 2021-05-11 at 17:11 +0200, Cornelia Huck wrote: > > hsch and csch basically have two parts: execute the command, > > and perform the halt/clear function. For fully emulated > > subchannels, it is pretty clear how it will work: check the > > subchannel state, and actually 'perform the halt/clear function' > > and set cc 0 if everything looks good. > > > > For passthrough subchannels, some of the checking is done > > within QEMU, but some has to be done within the kernel. QEMU's > > subchannel state may be such that we can perform the async > > function, but the kernel may still get a cc != 0 when it is > > actually executing the instruction. In that case, we need to > > set the condition actually encountered by the kernel; if we > > set cc 0 on error, we would actually need to inject an interrupt > > as well. > > > > Signed-off-by: Cornelia Huck <coh...@redhat.com> > > --- > > > > Stumbled over this during the vfio-ccw kernel locking discussions. > > > > This is probably a corner case, and I'm not sure how I can actually > > get this path excercised, but it passes my smoke tests. > > I'll see if I can hammer my way into some of this. Thanks, that would be cool. > > > > > Not sure whether this is the way to go. > > I think it seems reasonable. > > > The unit exceptions in the > > halt/clear error paths also seem slightly fishy. > > It is peculiar. Looking at the old POPS references, the unit exception > states that the --device-- detected something unusual, not really the > subchannel (which is how vfio-ccw is behaving). But, providing some > indication that something went seriously wrong is good. Which I guess > was the point of the UE code, even though it's obviously set up to be > generated after a failure on the START. Yes, I had copied it over when I wired up halt/clear. > > I guess at the least, we need to clean up the FCTL based on the > function that failed, rather than only cleaning up the START function. Makes sense. > The UE itself may just be an extra "hey this is busted" indicator. It still feels a bit odd, but I'm not sure that there's a better alternative. > > > > > --- > > hw/s390x/css.c | 34 ++++++++++++++++++++++++++++++---- > > hw/vfio/ccw.c | 4 ++-- > > 2 files changed, 32 insertions(+), 6 deletions(-) > > > > diff --git a/hw/s390x/css.c b/hw/s390x/css.c > > index bed46f5ec3a2..ce2e903ca25a 100644 > > --- a/hw/s390x/css.c > > +++ b/hw/s390x/css.c > > @@ -1206,23 +1206,49 @@ static void > > sch_handle_start_func_virtual(SubchDev *sch) > > > > } > > > > -static void sch_handle_halt_func_passthrough(SubchDev *sch) > > +static IOInstEnding sch_handle_halt_func_passthrough(SubchDev *sch) > > { > > int ret; > > > > ret = s390_ccw_halt(sch); > > if (ret == -ENOSYS) { > > sch_handle_halt_func(sch); > > + return IOINST_CC_EXPECTED; > > This is the fallback, so makes sense. You could fold it into the switch > table, but since that's for "stuff from the kernel" versus -ENOSYS says > "there's no way to call the kernel" I guess this is fine too. Yes, that was the reason I kept that separate. (I don't expect it to be a heavily exercised path nowadays anyway.) > > > + } > > + /* > > + * Some conditions may have been detected prior to starting the > > halt > > + * function; map them to the correct cc. > > + */ > > + switch (ret) { > > + case -EBUSY: > > + return IOINST_CC_BUSY; > > + case -ENODEV: > > + case -EACCES: > > + return IOINST_CC_NOT_OPERATIONAL; > > + default: > > + return IOINST_CC_EXPECTED; > > } > > } (...) > > diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c > > index e752c845e9e4..39275a917bd2 100644 > > --- a/hw/vfio/ccw.c > > +++ b/hw/vfio/ccw.c > > @@ -199,7 +199,7 @@ again: > > // This is for CLEAR > > > case 0: > > case -ENODEV: > > case -EACCES: > > - return 0; > > + return ret; > > case -EFAULT: > > default: > > sch_gen_unit_exception(sch); > > @@ -240,7 +240,7 @@ again: > > // This is for HALT > > > case -EBUSY: > > case -ENODEV: > > case -EACCES: > > - return 0; > > + return ret; > > Aside: How could we get EACCES for either HALT or CLEAR? I only see > that set in the normal request path, if we got a CC3 on the SSCH. We should only get -ENODEV, which basically indicates cc3 on the kernel side. For start, the kernel distinguishes between "you didn't specify a path in the orb that's actually available" and "device not operational", but we map everything to cc 3 in QEMU (and I don't think there's anything else we could do with that information anyway.) > > Can we scrub them, or do we need to update kernel > Documentation/s390/vfio-ccw.rst ? :) We could remove them, or log something if they come up unexpectedly; but their presence does not really hurt, either.