On Thu, 7 Sep 2017 16:58:31 +0800 Dong Jia Shi <bjsdj...@linux.vnet.ibm.com> wrote:
> * Halil Pasic <pa...@linux.vnet.ibm.com> [2017-09-06 16:43:42 +0200]: > > > > > > > On 09/06/2017 04:20 PM, Cornelia Huck wrote: > > > On Wed, 6 Sep 2017 14:25:13 +0200 > > > Halil Pasic <pa...@linux.vnet.ibm.com> wrote: > > > > > >> We have basically two possibilities/options which ask for different > > >> handling: > > >> 1) EFAULT is due to a bug in the vfio-ccw implementation > > >> (can be QEMU or kernel). > > >> 2) EFAULT is due to buggy channel program. > > >> > > >> Option 2) is basically to be handled with a channel-program check and > > >> setting primary secondary and alert status. For reference see PoP page > > >> 15-59 ("Designation of Storage Area"). An exception may be an invalid > > >> channel program address in the ORB. There the channel-program check ain't > > >> explicitly stated (although) I would expect one. It may be implied by the > > >> things on page 15-59 though. > > >> > > >> Option 1) is however very similar to other we have figured out that the > > >> implementation is broken situations and should be handled consequently. > > >> The current state of the discussion is with a unit exception. > > >> > > >> Does that make sense? > > > > > > I think the situation is slightly different here, though. For the orb > > > flags, we reject something out of hand because we have not implemented > > > it, and for that, unit exception sounds like a good fit. Processing > > > errors, however, are more similar to errors in the hardware, and as > > > such can probably be reported via something like equipment check. > > > > > > > Noted. Let's see what Dong Jia has to say, before we continuing a > > discussion on something (option 1) what may be irrelevant anyway. > > > > >> > > >> Now, Dong Jia, I need your help to figure out do we have option 1 or > > >> option 2 here? After quick look at the kernel code, it appears to me that > > >> I've seen both option 1 and option 2 (I'm afraid) -- but my assessment > > >> was really very superficial. > There are three cases (all in the kernel) that generate a -EFAULT ret > code: > a. vfio_ccw_mdev_write: copy_from_user() fails. > This is option 1. > > b. ccwchain_fetch_tic > It's mostly likely that the vfio-ccw driver processed the ccw chains > wrongly. (Actually I can not think of any other reason.) Me neither, I'd consider hitting this a bug in the implementation. > This is option 1. > > c. ccwchain_fetch_idal > When we find that an IDAW contents an invalid address > This is option 2. > > > >> > > >> I would expect option 2 to be handled differently (kernel provides the > > >> SCSW) though. So we would do an equipment check for the first two ("equipment", i.e. the software, is malfunctioning) and use a more appropriate way for the malformed idaw?