Re: [PATCH v2] virtio_blk: Add support for lifetime feature

2021-04-20 Thread Cornelia Huck
On Tue, 20 Apr 2021 06:08:29 -0400
"Michael S. Tsirkin"  wrote:

> On Tue, Apr 20, 2021 at 08:01:29AM +0100, Christoph Hellwig wrote:
> > Just to despit my 2 cents again:  I think the way this is specified
> > in the virtio spec is actively harmful and we should not suport it in
> > Linux.
> > 
> > If others override me we at least need to require a detailed
> > documentation of these fields as the virto spec does not provide it.
> > 
> > Please also do not add pointless over 80 character lines, and follow
> > the one value per sysfs file rule.  
> 
> Enrico would you like to raise the issues with the virtio TC
> for resolution?
> 

FWIW, I've opened https://github.com/oasis-tcs/virtio-spec/issues/106
to track this.



Re: [PATCH] vfio/pci: Add missing range check in vfio_pci_mmap

2021-04-13 Thread Cornelia Huck
On Mon, 12 Apr 2021 23:41:24 +0200
"Christian A. Ehrhardt"  wrote:

> When mmaping an extra device region verify that the region index
> derived from the mmap offset is valid.
> 
> Fixes: a15b1883fee1 ("vfio_pci: Allow mapping extra regions")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Christian A. Ehrhardt 
> ---
>  drivers/vfio/pci/vfio_pci.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Reviewed-by: Cornelia Huck 



Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-04-01 Thread Cornelia Huck
On Mon, 29 Mar 2021 17:10:53 -0600
Alex Williamson  wrote:

> On Tue, 23 Mar 2021 16:32:13 -0300
> Jason Gunthorpe  wrote:
> 
> > On Mon, Mar 22, 2021 at 10:40:16AM -0600, Alex Williamson wrote:

> > > So unless you want to do some bitkeeper archaeology, we've always
> > > allowed driver probes to fail and fall through to the next one, not
> > > even complaining with -ENODEV.  In practice it hasn't been an issue
> > > because how many drivers do you expect to have that would even try to
> > > claim a device.  
> > 
> > Do you know of anything using this ability? It might be helpful  
> 
> I don't.

I've been trying to remember why I added that patch to ignore all
errors (rather than only -ENODEV), but I suspect it might have been
related to the concurrent probing stuff I tried to implement back then.
The one instance of drivers matching to the same id I recall (s390
ctc/lcs) is actually not handled on the individual device level, but in
the meta ccwgroup driver; I don't remember anything else in the s390
case.

> 
> > > Ordering is only important when there's a catch-all so we need to
> > > figure out how to make that last among a class of drivers that will
> > > attempt to claim a device.  The softdep is a bit of a hack to do
> > > that, I'll admit, but I don't see how the alternate driver flavor
> > > universe solves having a catch-all either.
> > 
> > Haven't entirely got there yet, but I think the catch all probably has
> > to be handled by userspace udev/kmod in some way, as it is the only
> > thing that knows if there is a more specific module to load. This is
> > the biggest problem..
> > 
> > And again, I feel this is all a big tangent, especially now that HCH
> > wants to delete the nvlink stuff we should just leave igd alone.  
> 
> Determining which things stay in vfio-pci-core and which things are
> split to variant drivers and how those variant drivers can match the
> devices they intend to support seems very inline with this series.  If
> igd stays as part of vfio-pci-core then I think we're drawing a
> parallel to z-pci support, where a significant part of that support is
> a set of extra data structures exposed through capabilities to support
> userspace use of the device.  Therefore extra regions or data
> structures through capabilities, where we're not changing device
> access, except as required for the platform (not the device) seem to be
> things that fit within the core, right?  Thanks,
> 
> Alex

As we are only talking about extra data governed by a capability, I
don't really see a problem with keeping it in the vfio core.

For those devices that need more specialized treatment, maybe we need
some kind of priority-based matching? I.e., if we match a device with
drivers, start with the one with highest priority (the specialized
one), and have the generic driver at the lowest priority. A
higher-priority driver added later one should not affect already bound
devices (and would need manual intervention again.)

[I think this has come up in other places in the past as well, but I
don't have any concrete pointers handy.]



Re: [virtio-dev] Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-03-04 Thread Cornelia Huck
On Thu, 4 Mar 2021 16:24:16 +0800
Jason Wang  wrote:

> On 2021/3/3 4:29 下午, Cornelia Huck wrote:
> > On Wed, 3 Mar 2021 12:01:01 +0800
> > Jason Wang  wrote:
> >  
> >> On 2021/3/2 8:08 下午, Cornelia Huck wrote:  
> >>> On Mon, 1 Mar 2021 11:51:08 +0800
> >>> Jason Wang  wrote:
> >>> 
> >>>> On 2021/3/1 5:25 上午, Michael S. Tsirkin wrote:  
> >>>>> On Fri, Feb 26, 2021 at 04:19:16PM +0800, Jason Wang wrote:  
> >>>>>> On 2021/2/26 2:53 上午, Michael S. Tsirkin wrote:  
> >>>>>>> Confused. What is wrong with the above? It never reads the
> >>>>>>> field unless the feature has been offered by device.  
> >>>>>> So the spec said:
> >>>>>>
> >>>>>> "
> >>>>>>
> >>>>>> The following driver-read-only field, max_virtqueue_pairs only exists 
> >>>>>> if
> >>>>>> VIRTIO_NET_F_MQ is set.
> >>>>>>
> >>>>>> "
> >>>>>>
> >>>>>> If I read this correctly, there will be no max_virtqueue_pairs field 
> >>>>>> if the
> >>>>>> VIRTIO_NET_F_MQ is not offered by device? If yes the offsetof() 
> >>>>>> violates
> >>>>>> what spec said.
> >>>>>>
> >>>>>> Thanks  
> >>>>> I think that's a misunderstanding. This text was never intended to
> >>>>> imply that field offsets change beased on feature bits.
> >>>>> We had this pain with legacy and we never wanted to go back there.
> >>>>>
> >>>>> This merely implies that without VIRTIO_NET_F_MQ the field
> >>>>> should not be accessed. Exists in the sense "is accessible to driver".
> >>>>>
> >>>>> Let's just clarify that in the spec, job done.  
> >>>> Ok, agree. That will make things more eaiser.  
> >>> Yes, that makes much more sense.
> >>>
> >>> What about adding the following to the "Basic Facilities of a Virtio
> >>> Device/Device Configuration Space" section of the spec:
> >>>
> >>> "If an optional configuration field does not exist, the corresponding
> >>> space is still present, but reserved."  
> >>
> >> This became interesting after re-reading some of the qemu codes.
> >>
> >> E.g in virtio-net.c we had:
> >>
> >> *static VirtIOFeature feature_sizes[] = {
> >>       {.flags = 1ULL << VIRTIO_NET_F_MAC,
> >>    .end = endof(struct virtio_net_config, mac)},
> >>       {.flags = 1ULL << VIRTIO_NET_F_STATUS,
> >>    .end = endof(struct virtio_net_config, status)},
> >>       {.flags = 1ULL << VIRTIO_NET_F_MQ,
> >>    .end = endof(struct virtio_net_config, max_virtqueue_pairs)},
> >>       {.flags = 1ULL << VIRTIO_NET_F_MTU,
> >>    .end = endof(struct virtio_net_config, mtu)},
> >>       {.flags = 1ULL << VIRTIO_NET_F_SPEED_DUPLEX,
> >>    .end = endof(struct virtio_net_config, duplex)},
> >>       {.flags = (1ULL << VIRTIO_NET_F_RSS) | (1ULL <<
> >> VIRTIO_NET_F_HASH_REPORT),
> >>    .end = endof(struct virtio_net_config, supported_hash_types)},
> >>       {}
> >> };*
> >>
> >> *It has a implict dependency chain. E.g MTU doesn't presnet if
> >> DUPLEX/RSS is not offered ...
> >> *  
> > But I think it covers everything up to the relevant field, no? So MTU
> > is included if we have the feature bit, even if we don't have
> > DUPLEX/RSS.
> >
> > Given that a config space may be shorter (but must not collapse
> > non-existing fields), maybe a better wording would be:
> >
> > "If an optional configuration field does not exist, the corresponding
> > space will still be present if it is not at the end of the
> > configuration space (i.e., further configuration fields exist.)  
> 
> 
> This should work but I think we need to define the end of configuration 
> space first?

What about sidestepping this:

"...the corresponding space will still be present, unless no further
configuration fields exist."

?

> 
> > This
> > implies that a given field, if it exists, is always at the same offset
> > from the beginning of the configuration space."



Re: [virtio-dev] Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-03-03 Thread Cornelia Huck
On Wed, 3 Mar 2021 12:01:01 +0800
Jason Wang  wrote:

> On 2021/3/2 8:08 下午, Cornelia Huck wrote:
> > On Mon, 1 Mar 2021 11:51:08 +0800
> > Jason Wang  wrote:
> >  
> >> On 2021/3/1 5:25 上午, Michael S. Tsirkin wrote:  
> >>> On Fri, Feb 26, 2021 at 04:19:16PM +0800, Jason Wang wrote:  
> >>>> On 2021/2/26 2:53 上午, Michael S. Tsirkin wrote:  
> >>>>> Confused. What is wrong with the above? It never reads the
> >>>>> field unless the feature has been offered by device.  
> >>>> So the spec said:
> >>>>
> >>>> "
> >>>>
> >>>> The following driver-read-only field, max_virtqueue_pairs only exists if
> >>>> VIRTIO_NET_F_MQ is set.
> >>>>
> >>>> "
> >>>>
> >>>> If I read this correctly, there will be no max_virtqueue_pairs field if 
> >>>> the
> >>>> VIRTIO_NET_F_MQ is not offered by device? If yes the offsetof() violates
> >>>> what spec said.
> >>>>
> >>>> Thanks  
> >>> I think that's a misunderstanding. This text was never intended to
> >>> imply that field offsets change beased on feature bits.
> >>> We had this pain with legacy and we never wanted to go back there.
> >>>
> >>> This merely implies that without VIRTIO_NET_F_MQ the field
> >>> should not be accessed. Exists in the sense "is accessible to driver".
> >>>
> >>> Let's just clarify that in the spec, job done.  
> >>
> >> Ok, agree. That will make things more eaiser.  
> > Yes, that makes much more sense.
> >
> > What about adding the following to the "Basic Facilities of a Virtio
> > Device/Device Configuration Space" section of the spec:
> >
> > "If an optional configuration field does not exist, the corresponding
> > space is still present, but reserved."  
> 
> 
> This became interesting after re-reading some of the qemu codes.
> 
> E.g in virtio-net.c we had:
> 
> *static VirtIOFeature feature_sizes[] = {
>      {.flags = 1ULL << VIRTIO_NET_F_MAC,
>   .end = endof(struct virtio_net_config, mac)},
>      {.flags = 1ULL << VIRTIO_NET_F_STATUS,
>   .end = endof(struct virtio_net_config, status)},
>      {.flags = 1ULL << VIRTIO_NET_F_MQ,
>   .end = endof(struct virtio_net_config, max_virtqueue_pairs)},
>      {.flags = 1ULL << VIRTIO_NET_F_MTU,
>   .end = endof(struct virtio_net_config, mtu)},
>      {.flags = 1ULL << VIRTIO_NET_F_SPEED_DUPLEX,
>   .end = endof(struct virtio_net_config, duplex)},
>      {.flags = (1ULL << VIRTIO_NET_F_RSS) | (1ULL << 
> VIRTIO_NET_F_HASH_REPORT),
>   .end = endof(struct virtio_net_config, supported_hash_types)},
>      {}
> };*
> 
> *It has a implict dependency chain. E.g MTU doesn't presnet if 
> DUPLEX/RSS is not offered ...
> *

But I think it covers everything up to the relevant field, no? So MTU
is included if we have the feature bit, even if we don't have
DUPLEX/RSS.

Given that a config space may be shorter (but must not collapse
non-existing fields), maybe a better wording would be:

"If an optional configuration field does not exist, the corresponding
space will still be present if it is not at the end of the
configuration space (i.e., further configuration fields exist.) This
implies that a given field, if it exists, is always at the same offset
from the beginning of the configuration space."


> >
> > (Do we need to specify what a device needs to do if the driver tries to
> > access a non-existing field? We cannot really make assumptions about
> > config space accesses; virtio-ccw can just copy a chunk of config space
> > that contains non-existing fields...  
> 
> 
> Right, it looks to me ccw doesn't depends on config len which is kind of 
> interesting. I think the answer depends on the implementation of both 
> transport and device:

(virtio-ccw is a bit odd, because channel I/O does not have the concept
of a config space and I needed to come up with something)

> 
> Let's take virtio-net-pci as an example, it fills status unconditionally 
> in virtio_net_set_config() even if VIRTIO_NET_F_STATUS is not 
> negotiated. So with the above feature_sizes:
> 
> 1) if one of the MQ, MTU, DUPLEX or RSS is implemented, driver can still 
> read status in this case
> 2) if none of the above four is negotied, driver can only read a 0xFF 
> (virtio_config_readb())
> 
> 
> > I guess the device could ignore
> > writes and return zeroes on read?)  
> 
> 
> So consider the above, it might be too late to fix/clarify that in the 
> spec on the expected behaviour of reading/writing non-exist fields.

We could still advise behaviour via SHOULD, though I'm not sure it adds
much at this point in time.

It really feels a bit odd that a driver can still read and write fields
for features that it did not negotiate, but I fear we're stuck with it.

> 
> Thanks
> 
> 
> >
> > I've opened https://github.com/oasis-tcs/virtio-spec/issues/98 for the
> > spec issues.
> >  



Re: [virtio-dev] Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-03-02 Thread Cornelia Huck
On Mon, 1 Mar 2021 11:51:08 +0800
Jason Wang  wrote:

> On 2021/3/1 5:25 上午, Michael S. Tsirkin wrote:
> > On Fri, Feb 26, 2021 at 04:19:16PM +0800, Jason Wang wrote:  
> >> On 2021/2/26 2:53 上午, Michael S. Tsirkin wrote:  

> >>> Confused. What is wrong with the above? It never reads the
> >>> field unless the feature has been offered by device.  
> >>
> >> So the spec said:
> >>
> >> "
> >>
> >> The following driver-read-only field, max_virtqueue_pairs only exists if
> >> VIRTIO_NET_F_MQ is set.
> >>
> >> "
> >>
> >> If I read this correctly, there will be no max_virtqueue_pairs field if the
> >> VIRTIO_NET_F_MQ is not offered by device? If yes the offsetof() violates
> >> what spec said.
> >>
> >> Thanks  
> > I think that's a misunderstanding. This text was never intended to
> > imply that field offsets change beased on feature bits.
> > We had this pain with legacy and we never wanted to go back there.
> >
> > This merely implies that without VIRTIO_NET_F_MQ the field
> > should not be accessed. Exists in the sense "is accessible to driver".
> >
> > Let's just clarify that in the spec, job done.  
> 
> 
> Ok, agree. That will make things more eaiser.

Yes, that makes much more sense.

What about adding the following to the "Basic Facilities of a Virtio
Device/Device Configuration Space" section of the spec:

"If an optional configuration field does not exist, the corresponding
space is still present, but reserved."

(Do we need to specify what a device needs to do if the driver tries to
access a non-existing field? We cannot really make assumptions about
config space accesses; virtio-ccw can just copy a chunk of config space
that contains non-existing fields... I guess the device could ignore
writes and return zeroes on read?)

I've opened https://github.com/oasis-tcs/virtio-spec/issues/98 for the
spec issues.



Re: [virtio-dev] Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-02-25 Thread Cornelia Huck
On Thu, 25 Feb 2021 12:36:07 +0800
Jason Wang  wrote:

> On 2021/2/24 7:12 下午, Cornelia Huck wrote:
> > On Wed, 24 Feb 2021 17:29:07 +0800
> > Jason Wang  wrote:
> >  
> >> On 2021/2/23 6:58 下午, Cornelia Huck wrote:  
> >>> On Tue, 23 Feb 2021 18:31:07 +0800
> >>> Jason Wang  wrote:

> >>>> The problem is the MTU description for example:
> >>>>
> >>>> "The following driver-read-only field, mtu only exists if
> >>>> VIRTIO_NET_F_MTU is set."
> >>>>
> >>>> It looks to me need to use "if VIRTIO_NET_F_MTU is set by device".  
> >>> "offered by the device"? I don't think it should 'disappear' from the
> >>> config space if the driver won't use it. (Same for other config space
> >>> fields that are tied to feature bits.)  
> >>
> >> But what happens if e.g device doesn't offer VIRTIO_NET_F_MTU? It looks
> >> to according to the spec there will be no mtu field.  
> > I think so, yes.
> >  
> >> And a more interesting case is VIRTIO_NET_F_MQ is not offered but
> >> VIRTIO_NET_F_MTU offered. To me, it means we don't have
> >> max_virtqueue_pairs but it's not how the driver is wrote today.  
> > That would be a bug, but it seems to me that the virtio-net driver
> > reads max_virtqueue_pairs conditionally and handles absence of the
> > feature correctly?  
> 
> 
> Yes, see the avove codes:
> 
>      if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
>      int mtu = virtio_cread16(vdev,
>   offsetof(struct virtio_net_config,
>    mtu));
>      if (mtu < MIN_MTU)
>      __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
>      }
> 

Ouch, you're right. The virtio_cread accessors all operate on offsets
into a structure, it's just more obvious here.

> So it's probably too late to fix the driver.

It is never too late to fix a driver :)

It seems involved, though. We'd need different config space structures
based upon which set of features has been negotiated. It's not too bad
when features build upon each other and add fields at the end (this
should be fine right now, if the code remembered to check for the
feature), but can become ugly if an in-between field depends upon a
feature.

I guess we've been lucky that it seems to be an extremely uncommon
configuration.

> 
> 
> >  
> >>  
> >>>
> >>>> Otherwise readers (at least for me), may think the MTU is only valid
> >>>> if driver set the bit.  
> >>> I think it would still be 'valid' in the sense that it exists and has
> >>> some value in there filled in by the device, but a driver reading it
> >>> without negotiating the feature would be buggy. (Like in the kernel
> >>> code above; the kernel not liking the value does not make the field
> >>> invalid.)  
> >>
> >> See Michael's reply, the spec allows read the config before setting
> >> features.  
> > Yes, the period prior to finishing negotiation is obviously special.
> >  
> >>  
> >>> Maybe a statement covering everything would be:
> >>>
> >>> "The following driver-read-only field mtu only exists if the device
> >>> offers VIRTIO_NET_F_MTU and may be read by the driver during feature
> >>> negotiation and after VIRTIO_NET_F_MTU has been successfully
> >>> negotiated."
> >>> 
> >>>> 
> >>>>> Should we add a wording clarification to the spec?  
> >>>> I think so.  
> >>> Some clarification would be needed for each field that depends on a
> >>> feature; that would be quite verbose. Maybe we can get away with a
> >>> clarifying statement?
> >>>
> >>> "Some config space fields may depend on a certain feature. In that
> >>> case, the field exits if the device has offered the corresponding
> >>> feature,  
> >>
> >> So this implies for !VIRTIO_NET_F_MQ && VIRTIO_NET_F_MTU, the config
> >> will look like:
> >>
> >> struct virtio_net_config {
> >>       u8 mac[6];
> >>       le16 status;
> >>       le16 mtu;
> >> };
> >>  
> > I agree.  
> 
> 
> So consider it's probably too late to fix the driver which assumes some 
> field are always persent, it looks to me need fix the spec do declare 
> the fields are always existing in

Re: [virtio-dev] Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-02-24 Thread Cornelia Huck
On Wed, 24 Feb 2021 17:29:07 +0800
Jason Wang  wrote:

> On 2021/2/23 6:58 下午, Cornelia Huck wrote:
> > On Tue, 23 Feb 2021 18:31:07 +0800
> > Jason Wang  wrote:
> >  
> >> On 2021/2/23 6:04 下午, Cornelia Huck wrote:  
> >>> On Tue, 23 Feb 2021 17:46:20 +0800
> >>> Jason Wang  wrote:
> >>> 
> >>>> On 2021/2/23 下午5:25, Michael S. Tsirkin wrote:  
> >>>>> On Mon, Feb 22, 2021 at 09:09:28AM -0800, Si-Wei Liu wrote:  
> >>>>>> On 2/21/2021 8:14 PM, Jason Wang wrote:  
> >>>>>>> On 2021/2/19 7:54 下午, Si-Wei Liu wrote:  
> >>>>>>>> Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> >>>>>>>> for legacy") made an exception for legacy guests to reset
> >>>>>>>> features to 0, when config space is accessed before features
> >>>>>>>> are set. We should relieve the verify_min_features() check
> >>>>>>>> and allow features reset to 0 for this case.
> >>>>>>>>
> >>>>>>>> It's worth noting that not just legacy guests could access
> >>>>>>>> config space before features are set. For instance, when
> >>>>>>>> feature VIRTIO_NET_F_MTU is advertised some modern driver
> >>>>>>>> will try to access and validate the MTU present in the config
> >>>>>>>> space before virtio features are set.  
> >>>>>>> This looks like a spec violation:
> >>>>>>>
> >>>>>>> "
> >>>>>>>
> >>>>>>> The following driver-read-only field, mtu only exists if
> >>>>>>> VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the
> >>>>>>> driver to use.
> >>>>>>> "
> >>>>>>>
> >>>>>>> Do we really want to workaround this?  
> >>>>>> Isn't the commit 452639a64ad8 itself is a workaround for legacy guest?
> >>>>>>
> >>>>>> I think the point is, since there's legacy guest we'd have to support, 
> >>>>>> this
> >>>>>> host side workaround is unavoidable. Although I agree the violating 
> >>>>>> driver
> >>>>>> should be fixed (yes, it's in today's upstream kernel which exists for 
> >>>>>> a
> >>>>>> while now).  
> >>>>> Oh  you are right:
> >>>>>
> >>>>>
> >>>>> static int virtnet_validate(struct virtio_device *vdev)
> >>>>> {
> >>>>>if (!vdev->config->get) {
> >>>>>dev_err(>dev, "%s failure: config access 
> >>>>> disabled\n",
> >>>>>__func__);
> >>>>>return -EINVAL;
> >>>>>}
> >>>>>
> >>>>>if (!virtnet_validate_features(vdev))
> >>>>>return -EINVAL;
> >>>>>
> >>>>>if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
> >>>>>int mtu = virtio_cread16(vdev,
> >>>>> offsetof(struct 
> >>>>> virtio_net_config,
> >>>>>  mtu));
> >>>>>if (mtu < MIN_MTU)
> >>>>>__virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);  
> >>>> I wonder why not simply fail here?  
> >>> I think both failing or not accepting the feature can be argued to make
> >>> sense: "the device presented us with a mtu size that does not make
> >>> sense" would point to failing, "we cannot work with the mtu size that
> >>> the device presented us" would point to not negotiating the feature.
> >>> 
> >>>> 
> >>>>>}
> >>>>>
> >>>>>return 0;
> >>>>> }
> >>>>>
> >>>>> And the spec says:
> >>>>>
> >>>>>
> >>>>> The driver MUST follow this sequence to initialize a device:
> >>>>> 1. Reset the 

Re: [virtio-dev] Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-02-23 Thread Cornelia Huck
On Tue, 23 Feb 2021 18:31:07 +0800
Jason Wang  wrote:

> On 2021/2/23 6:04 下午, Cornelia Huck wrote:
> > On Tue, 23 Feb 2021 17:46:20 +0800
> > Jason Wang  wrote:
> >  
> >> On 2021/2/23 下午5:25, Michael S. Tsirkin wrote:  
> >>> On Mon, Feb 22, 2021 at 09:09:28AM -0800, Si-Wei Liu wrote:  
> >>>> On 2/21/2021 8:14 PM, Jason Wang wrote:  
> >>>>> On 2021/2/19 7:54 下午, Si-Wei Liu wrote:  
> >>>>>> Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> >>>>>> for legacy") made an exception for legacy guests to reset
> >>>>>> features to 0, when config space is accessed before features
> >>>>>> are set. We should relieve the verify_min_features() check
> >>>>>> and allow features reset to 0 for this case.
> >>>>>>
> >>>>>> It's worth noting that not just legacy guests could access
> >>>>>> config space before features are set. For instance, when
> >>>>>> feature VIRTIO_NET_F_MTU is advertised some modern driver
> >>>>>> will try to access and validate the MTU present in the config
> >>>>>> space before virtio features are set.  
> >>>>> This looks like a spec violation:
> >>>>>
> >>>>> "
> >>>>>
> >>>>> The following driver-read-only field, mtu only exists if
> >>>>> VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the
> >>>>> driver to use.
> >>>>> "
> >>>>>
> >>>>> Do we really want to workaround this?  
> >>>> Isn't the commit 452639a64ad8 itself is a workaround for legacy guest?
> >>>>
> >>>> I think the point is, since there's legacy guest we'd have to support, 
> >>>> this
> >>>> host side workaround is unavoidable. Although I agree the violating 
> >>>> driver
> >>>> should be fixed (yes, it's in today's upstream kernel which exists for a
> >>>> while now).  
> >>> Oh  you are right:
> >>>
> >>>
> >>> static int virtnet_validate(struct virtio_device *vdev)
> >>> {
> >>>   if (!vdev->config->get) {
> >>>   dev_err(>dev, "%s failure: config access 
> >>> disabled\n",
> >>>   __func__);
> >>>   return -EINVAL;
> >>>   }
> >>>
> >>>   if (!virtnet_validate_features(vdev))
> >>>   return -EINVAL;
> >>>
> >>>   if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
> >>>   int mtu = virtio_cread16(vdev,
> >>>offsetof(struct 
> >>> virtio_net_config,
> >>> mtu));
> >>>   if (mtu < MIN_MTU)
> >>>   __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);  
> >>
> >> I wonder why not simply fail here?  
> > I think both failing or not accepting the feature can be argued to make
> > sense: "the device presented us with a mtu size that does not make
> > sense" would point to failing, "we cannot work with the mtu size that
> > the device presented us" would point to not negotiating the feature.
> >  
> >>  
> >>>   }
> >>>
> >>>   return 0;
> >>> }
> >>>
> >>> And the spec says:
> >>>
> >>>
> >>> The driver MUST follow this sequence to initialize a device:
> >>> 1. Reset the device.
> >>> 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
> >>> 3. Set the DRIVER status bit: the guest OS knows how to drive the device.
> >>> 4. Read device feature bits, and write the subset of feature bits 
> >>> understood by the OS and driver to the
> >>> device. During this step the driver MAY read (but MUST NOT write) the 
> >>> device-specific configuration
> >>> fields to check that it can support the device before accepting it.
> >>> 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature 
> >>> bits after this step.
> >>> 6. Re-read device status to ensure the FEATURES_OK bit is still set: 
> >&g

Re: [virtio-dev] Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-02-23 Thread Cornelia Huck
On Tue, 23 Feb 2021 17:46:20 +0800
Jason Wang  wrote:

> On 2021/2/23 下午5:25, Michael S. Tsirkin wrote:
> > On Mon, Feb 22, 2021 at 09:09:28AM -0800, Si-Wei Liu wrote:  
> >>
> >> On 2/21/2021 8:14 PM, Jason Wang wrote:  
> >>> On 2021/2/19 7:54 下午, Si-Wei Liu wrote:  
>  Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
>  for legacy") made an exception for legacy guests to reset
>  features to 0, when config space is accessed before features
>  are set. We should relieve the verify_min_features() check
>  and allow features reset to 0 for this case.
> 
>  It's worth noting that not just legacy guests could access
>  config space before features are set. For instance, when
>  feature VIRTIO_NET_F_MTU is advertised some modern driver
>  will try to access and validate the MTU present in the config
>  space before virtio features are set.  
> >>>
> >>> This looks like a spec violation:
> >>>
> >>> "
> >>>
> >>> The following driver-read-only field, mtu only exists if
> >>> VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the
> >>> driver to use.
> >>> "
> >>>
> >>> Do we really want to workaround this?  
> >> Isn't the commit 452639a64ad8 itself is a workaround for legacy guest?
> >>
> >> I think the point is, since there's legacy guest we'd have to support, this
> >> host side workaround is unavoidable. Although I agree the violating driver
> >> should be fixed (yes, it's in today's upstream kernel which exists for a
> >> while now).  
> > Oh  you are right:
> >
> >
> > static int virtnet_validate(struct virtio_device *vdev)
> > {
> >  if (!vdev->config->get) {
> >  dev_err(>dev, "%s failure: config access disabled\n",
> >  __func__);
> >  return -EINVAL;
> >  }
> >
> >  if (!virtnet_validate_features(vdev))
> >  return -EINVAL;
> >
> >  if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
> >  int mtu = virtio_cread16(vdev,
> >   offsetof(struct virtio_net_config,
> >mtu));
> >  if (mtu < MIN_MTU)
> >  __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);  
> 
> 
> I wonder why not simply fail here?

I think both failing or not accepting the feature can be argued to make
sense: "the device presented us with a mtu size that does not make
sense" would point to failing, "we cannot work with the mtu size that
the device presented us" would point to not negotiating the feature.

> 
> 
> >  }
> >
> >  return 0;
> > }
> >
> > And the spec says:
> >
> >
> > The driver MUST follow this sequence to initialize a device:
> > 1. Reset the device.
> > 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
> > 3. Set the DRIVER status bit: the guest OS knows how to drive the device.
> > 4. Read device feature bits, and write the subset of feature bits 
> > understood by the OS and driver to the
> > device. During this step the driver MAY read (but MUST NOT write) the 
> > device-specific configuration
> > fields to check that it can support the device before accepting it.
> > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature 
> > bits after this step.
> > 6. Re-read device status to ensure the FEATURES_OK bit is still set: 
> > otherwise, the device does not
> > support our subset of features and the device is unusable.
> > 7. Perform device-specific setup, including discovery of virtqueues for the 
> > device, optional per-bus setup,
> > reading and possibly writing the device’s virtio configuration space, and 
> > population of virtqueues.
> > 8. Set the DRIVER_OK status bit. At this point the device is “live”.
> >
> >
> > Item 4 on the list explicitly allows reading config space before
> > FEATURES_OK.
> >
> > I conclude that VIRTIO_NET_F_MTU is set means "set in device features".  
> 
> 
> So this probably need some clarification. "is set" is used many times in 
> the spec that has different implications.

Before FEATURES_OK is set by the driver, I guess it means "the device
has offered the feature"; during normal usage, it means "the feature
has been negotiated". (This is a bit fuzzy for legacy mode.)

Should we add a wording clarification to the spec?



Re: [PATCH v2 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-02-19 Thread Cornelia Huck
On Mon, 15 Feb 2021 20:15:47 -0500
Tony Krowiak  wrote:

> This patch fixes a circular locking dependency in the CI introduced by
> commit f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM
> pointer invalidated"). The lockdep only occurs when starting a Secure
> Execution guest. Crypto virtualization (vfio_ap) is not yet supported for
> SE guests; however, in order to avoid CI errors, this fix is being
> provided.
> 
> The circular lockdep was introduced when the masks in the guest's APCB
> were taken under the matrix_dev->lock. While the lock is definitely
> needed to protect the setting/unsetting of the KVM pointer, it is not
> necessarily critical for setting the masks, so this will not be done under
> protection of the matrix_dev->lock.
> 
> Fixes: f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM 
> pointer invalidated")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Tony Krowiak 
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 119 +-
>  1 file changed, 84 insertions(+), 35 deletions(-)

I've been looking at the patch for a bit now and tried to follow down
the various paths; and while I think it's ok, I do not really have
enough confidence about that for a R-b. But have an

Acked-by: Cornelia Huck 



Re: [PATCH] vfio/type1: Use follow_pte()

2021-02-15 Thread Cornelia Huck
On Fri, 12 Feb 2021 12:01:50 -0700
Alex Williamson  wrote:

> follow_pfn() doesn't make sure that we're using the correct page
> protections, get the pte with follow_pte() so that we can test
> protections and get the pfn from the pte.
> 
> Fixes: 5cbf3264bc71 ("vfio/type1: Fix VA->PA translation for PFNMAP VMAs in 
> vaddr_get_pfn()")
> Signed-off-by: Alex Williamson 
> ---
>  drivers/vfio/vfio_iommu_type1.c |   14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 

With the function signature adapted:

Reviewed-by: Cornelia Huck 



Re: [PATCH 8/9] vfio/pci: use x86 naming instead of igd

2021-02-11 Thread Cornelia Huck
On Thu, 11 Feb 2021 11:29:37 -0500
Matthew Rosato  wrote:

> On 2/11/21 10:47 AM, Max Gurtovoy wrote:
> > 
> > On 2/2/2021 7:10 PM, Jason Gunthorpe wrote:  
> >> On Tue, Feb 02, 2021 at 05:06:59PM +0100, Cornelia Huck wrote:
> >>  
> >>> On the other side, we have the zdev support, which both requires s390
> >>> and applies to any pci device on s390.  
> >> Is there a reason why CONFIG_VFIO_PCI_ZDEV exists? Why not just always
> >> return the s390 specific data in VFIO_DEVICE_GET_INFO if running on
> >> s390?
> >>
> >> It would be like returning data from ACPI on other platforms.  
> > 
> > Agree.
> > 
> > all agree that I remove it ?  
> 
> I did some archives digging on the discussions around 
> CONFIG_VFIO_PCI_ZDEV and whether we should/should not have a Kconfig 
> switch around this; it was something that was carried over various 
> attempts to get the zdev support upstream, but I can't really find (or 
> think of) a compelling reason that a Kconfig switch must be kept for it. 
>   The bottom line is if you're on s390, you really want zdev support.
> 
> So: I don't have an objection so long as the net result is that 
> vfio_pci_zdev.o is always built in to vfio-pci(-core) for s390.

Yes, I also don't expect presence of the zdev stuff to confuse any
older userspace.

So, let's just drop CONFIG_VFIO_PCI_ZDEV and use CONFIG_S390 in lieu of
it (not changing the file name).



Re: [PATCH 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-02-11 Thread Cornelia Huck
On Wed, 10 Feb 2021 15:34:24 -0500
Tony Krowiak  wrote:

> On 2/10/21 5:53 AM, Cornelia Huck wrote:
> > On Tue,  9 Feb 2021 14:48:30 -0500
> > Tony Krowiak  wrote:
> >  
> >> This patch fixes a circular locking dependency in the CI introduced by
> >> commit f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM
> >> pointer invalidated"). The lockdep only occurs when starting a Secure
> >> Execution guest. Crypto virtualization (vfio_ap) is not yet supported for
> >> SE guests; however, in order to avoid CI errors, this fix is being
> >> provided.
> >>
> >> The circular lockdep was introduced when the masks in the guest's APCB
> >> were taken under the matrix_dev->lock. While the lock is definitely
> >> needed to protect the setting/unsetting of the KVM pointer, it is not
> >> necessarily critical for setting the masks, so this will not be done under
> >> protection of the matrix_dev->lock.
> >>
> >> Fixes: f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM 
> >> pointer invalidated")
> >> Cc: sta...@vger.kernel.org
> >> Signed-off-by: Tony Krowiak 
> >> ---
> >>   drivers/s390/crypto/vfio_ap_ops.c | 75 ++-
> >>   1 file changed, 45 insertions(+), 30 deletions(-)
> >>
> >>   static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
> >>   {
> >> -  kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> >> -  matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> >> -  vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
> >> -  kvm_put_kvm(matrix_mdev->kvm);
> >> -  matrix_mdev->kvm = NULL;
> >> +  if (matrix_mdev->kvm) {  
> > If you're doing setting/unsetting under matrix_dev->lock, is it
> > possible that matrix_mdev->kvm gets unset between here and the next
> > line, as you don't hold the lock?  
> 
> That is highly unlikely because the only place the matrix_mdev->kvm
> pointer is cleared is in this function which is called from only two
> places: the notifier that handles the VFIO_GROUP_NOTIFY_SET_KVM
> notification when the KVM pointer is cleared; the vfio_ap_mdev_release()
> function which is called when the mdev fd is closed (i.e., when the guest
> is shut down). The fact is, with the only end-to-end implementation
> currently available, the notifier callback is never invoked to clear
> the KVM pointer because the vfio_ap_mdev_release callback is
> invoked first and it unregisters the notifier callback.
> 
> Having said that, I suppose there is no guarantee that there will not
> be different userspace clients in the future that do things in a
> different order. At the very least, it wouldn't hurt to protect against
> that as you suggest below.

Yes, if userspace is able to use the interfaces in the certain way, we
should always make sure that nothing bad happens if it does so, even if
known userspace applications are well-behaved.

[Can we make an 'evil userspace' test program, maybe? The hardware
dependency makes this hard to run, though.]

> 
> >
> > Maybe you could
> > - grab a reference to kvm while holding the lock
> > - call the mask handling functions with that kvm reference
> > - lock again, drop the reference, and do the rest of the processing?
> >  
> >> +  kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> >> +  mutex_lock(_dev->lock);
> >> +  matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> >> +  vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
> >> +  kvm_put_kvm(matrix_mdev->kvm);
> >> +  matrix_mdev->kvm = NULL;
> >> +  mutex_unlock(_dev->lock);
> >> +  }
> >>   }  
> 



Re: [PATCH 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-02-10 Thread Cornelia Huck
On Tue,  9 Feb 2021 14:48:30 -0500
Tony Krowiak  wrote:

> This patch fixes a circular locking dependency in the CI introduced by
> commit f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM
> pointer invalidated"). The lockdep only occurs when starting a Secure
> Execution guest. Crypto virtualization (vfio_ap) is not yet supported for
> SE guests; however, in order to avoid CI errors, this fix is being
> provided.
> 
> The circular lockdep was introduced when the masks in the guest's APCB
> were taken under the matrix_dev->lock. While the lock is definitely
> needed to protect the setting/unsetting of the KVM pointer, it is not
> necessarily critical for setting the masks, so this will not be done under
> protection of the matrix_dev->lock.
> 
> Fixes: f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM 
> pointer invalidated")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Tony Krowiak 
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 75 ++-
>  1 file changed, 45 insertions(+), 30 deletions(-)
> 

>  static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
>  {
> - kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> - matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> - vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
> - kvm_put_kvm(matrix_mdev->kvm);
> - matrix_mdev->kvm = NULL;
> + if (matrix_mdev->kvm) {

If you're doing setting/unsetting under matrix_dev->lock, is it
possible that matrix_mdev->kvm gets unset between here and the next
line, as you don't hold the lock?

Maybe you could
- grab a reference to kvm while holding the lock
- call the mask handling functions with that kvm reference
- lock again, drop the reference, and do the rest of the processing?

> + kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> + mutex_lock(_dev->lock);
> + matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> + vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
> + kvm_put_kvm(matrix_mdev->kvm);
> + matrix_mdev->kvm = NULL;
> + mutex_unlock(_dev->lock);
> + }
>  }



Re: [PATCH -next] KVM: s390: Return the correct errno code

2021-02-04 Thread Cornelia Huck
On Thu, 4 Feb 2021 16:05:23 +0800
Zheng Yongjun  wrote:

> When valloc failed, should return ENOMEM rather than ENOBUF.
> 
> Signed-off-by: Zheng Yongjun 
> ---
>  arch/s390/kvm/interrupt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
> index 2f177298c663..6b7acc27cfa2 100644
> --- a/arch/s390/kvm/interrupt.c
> +++ b/arch/s390/kvm/interrupt.c
> @@ -2252,7 +2252,7 @@ static int get_all_floating_irqs(struct kvm *kvm, u8 
> __user *usrbuf, u64 len)
>*/
>   buf = vzalloc(len);
>   if (!buf)
> - return -ENOBUFS;
> + return -ENOMEM;
>  
>   max_irqs = len / sizeof(struct kvm_s390_irq);
>  

This breaks a user space interface (see the comment right above the
vzalloc).



Re: [PATCH 8/9] vfio/pci: use x86 naming instead of igd

2021-02-02 Thread Cornelia Huck
On Mon, 1 Feb 2021 11:42:30 -0700
Alex Williamson  wrote:

> On Mon, 1 Feb 2021 12:49:12 -0500
> Matthew Rosato  wrote:
> 
> > On 2/1/21 12:14 PM, Cornelia Huck wrote:  
> > > On Mon, 1 Feb 2021 16:28:27 +
> > > Max Gurtovoy  wrote:
> > > 
> > >> This patch doesn't change any logic but only align to the concept of
> > >> vfio_pci_core extensions. Extensions that are related to a platform
> > >> and not to a specific vendor of PCI devices should be part of the core
> > >> driver. Extensions that are specific for PCI device vendor should go
> > >> to a dedicated vendor vfio-pci driver.
> > > 
> > > My understanding is that igd means support for Intel graphics, i.e. a
> > > strict subset of x86. If there are other future extensions that e.g.
> > > only make sense for some devices found only on AMD systems, I don't
> > > think they should all be included under the same x86 umbrella.
> > > 
> > > Similar reasoning for nvlink, that only seems to cover support for some
> > > GPUs under Power, and is not a general platform-specific extension IIUC.
> > > 
> > > We can arguably do the zdev -> s390 rename (as zpci appears only on
> > > s390, and all PCI devices will be zpci on that platform), although I'm
> > > not sure about the benefit.
> > 
> > As far as I can tell, there isn't any benefit for s390 it's just 
> > "re-branding" to match the platform name rather than the zdev moniker, 
> > which admittedly perhaps makes it more clear to someone outside of s390 
> > that any PCI device on s390 is a zdev/zpci type, and thus will use this 
> > extension to vfio_pci(_core).  This would still be true even if we added 
> > something later that builds atop it (e.g. a platform-specific device 
> > like ism-vfio-pci).  Or for that matter, mlx5 via vfio-pci on s390x uses 
> > these zdev extensions today and would need to continue using them in a 
> > world where mlx5-vfio-pci.ko exists.
> > 
> > I guess all that to say: if such a rename matches the 'grand scheme' of 
> > this design where we treat arch-level extensions to vfio_pci(_core) as 
> > "vfio_pci_(arch)" then I'm not particularly opposed to the rename.  But 
> > by itself it's not very exciting :)  
> 
> This all seems like the wrong direction to me.  The goal here is to
> modularize vfio-pci into a core library and derived vendor modules that
> make use of that core library.  If existing device specific extensions
> within vfio-pci cannot be turned into vendor modules through this
> support and are instead redefined as platform specific features of the
> new core library, that feels like we're already admitting failure of
> this core library to support known devices, let alone future devices.
> 
> IGD is a specific set of devices.  They happen to rely on some platform
> specific support, whose availability should be determined via the
> vendor module probe callback.  Packing that support into an "x86"
> component as part of the core feels not only short sighted, but also
> avoids addressing the issues around how userspace determines an optimal
> module to use for a device.

Hm, it seems that not all current extensions to the vfio-pci code are
created equal.

IIUC, we have igd and nvlink, which are sets of devices that only show
up on x86 or ppc, respectively, and may rely on some special features
of those architectures/platforms. The important point is that you have
a device identifier that you can match a driver against.

On the other side, we have the zdev support, which both requires s390
and applies to any pci device on s390. If we added special handling for
ISM on s390, ISM would be in a category similar to igd/nvlink.

Now, if somebody plugs a mlx5 device into an s390, we would want both
the zdev support and the specialized mlx5 driver. Maybe zdev should be
some kind of library that can be linked into normal vfio-pci, into
vfio-pci-mlx5, and a hypothetical vfio-pci-ism? You always want zdev on
s390 (if configured into the kernel).



Re: [PATCH 6/9] vfio-pci/zdev: fix possible segmentation fault issue

2021-02-02 Thread Cornelia Huck
On Mon, 1 Feb 2021 13:47:57 -0700
Alex Williamson  wrote:

> On Mon, 1 Feb 2021 12:08:45 -0500
> Matthew Rosato  wrote:
> 
> > On 2/1/21 11:52 AM, Cornelia Huck wrote:  
> > > On Mon, 1 Feb 2021 16:28:25 +
> > > Max Gurtovoy  wrote:
> > > 
> > >> In case allocation fails, we must behave correctly and exit with error.
> > >>
> > >> Signed-off-by: Max Gurtovoy 
> > > 
> > > Fixes: e6b817d4b821 ("vfio-pci/zdev: Add zPCI capabilities to 
> > > VFIO_DEVICE_GET_INFO")
> > > 
> > > Reviewed-by: Cornelia Huck 
> > > 
> > > I think this should go in independently of this series. >
> > 
> > Agreed, makes sense to me -- thanks for finding.
> > 
> > Reviewed-by: Matthew Rosato   
> 
> I can grab this one, and 5/9.  Connie do you want to toss an R-b at
> 5/9?  Thanks,
> 
> Alex

Yes, makes sense to grab these two. R-b added.



Re: [PATCH 5/9] vfio-pci/zdev: remove unused vdev argument

2021-02-02 Thread Cornelia Huck
On Mon, 1 Feb 2021 16:28:24 +
Max Gurtovoy  wrote:

> Zdev static functions does not use vdev argument. Remove it.

s/does not use/do not use the/

> 
> Signed-off-by: Max Gurtovoy 
> ---
>  drivers/vfio/pci/vfio_pci_zdev.c | 20 
>  1 file changed, 8 insertions(+), 12 deletions(-)

Reviewed-by: Cornelia Huck 



Re: [PATCH 8/9] vfio/pci: use x86 naming instead of igd

2021-02-01 Thread Cornelia Huck
On Mon, 1 Feb 2021 16:28:27 +
Max Gurtovoy  wrote:

> This patch doesn't change any logic but only align to the concept of
> vfio_pci_core extensions. Extensions that are related to a platform
> and not to a specific vendor of PCI devices should be part of the core
> driver. Extensions that are specific for PCI device vendor should go
> to a dedicated vendor vfio-pci driver.

My understanding is that igd means support for Intel graphics, i.e. a
strict subset of x86. If there are other future extensions that e.g.
only make sense for some devices found only on AMD systems, I don't
think they should all be included under the same x86 umbrella.

Similar reasoning for nvlink, that only seems to cover support for some
GPUs under Power, and is not a general platform-specific extension IIUC.

We can arguably do the zdev -> s390 rename (as zpci appears only on
s390, and all PCI devices will be zpci on that platform), although I'm
not sure about the benefit.

> 
> For now, x86 extensions will include only igd.
> 
> Signed-off-by: Max Gurtovoy 
> ---
>  drivers/vfio/pci/Kconfig| 13 ++---
>  drivers/vfio/pci/Makefile   |  2 +-
>  drivers/vfio/pci/vfio_pci_core.c|  2 +-
>  drivers/vfio/pci/vfio_pci_private.h |  2 +-
>  drivers/vfio/pci/{vfio_pci_igd.c => vfio_pci_x86.c} |  0
>  5 files changed, 9 insertions(+), 10 deletions(-)
>  rename drivers/vfio/pci/{vfio_pci_igd.c => vfio_pci_x86.c} (100%)

(...)

> diff --git a/drivers/vfio/pci/vfio_pci_core.c 
> b/drivers/vfio/pci/vfio_pci_core.c
> index c559027def2d..e0e258c37fb5 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -328,7 +328,7 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
>  
>   if (vfio_pci_is_vga(pdev) &&
>   pdev->vendor == PCI_VENDOR_ID_INTEL &&
> - IS_ENABLED(CONFIG_VFIO_PCI_IGD)) {
> + IS_ENABLED(CONFIG_VFIO_PCI_X86)) {
>   ret = vfio_pci_igd_init(vdev);

This one explicitly checks for Intel devices, so I'm not sure why you
want to generalize this to x86?

>   if (ret && ret != -ENODEV) {
>   pci_warn(pdev, "Failed to setup Intel IGD regions\n");



Re: [PATCH 6/9] vfio-pci/zdev: fix possible segmentation fault issue

2021-02-01 Thread Cornelia Huck
On Mon, 1 Feb 2021 16:28:25 +
Max Gurtovoy  wrote:

> In case allocation fails, we must behave correctly and exit with error.
> 
> Signed-off-by: Max Gurtovoy 

Fixes: e6b817d4b821 ("vfio-pci/zdev: Add zPCI capabilities to 
VFIO_DEVICE_GET_INFO")

Reviewed-by: Cornelia Huck 

I think this should go in independently of this series.

> ---
>  drivers/vfio/pci/vfio_pci_zdev.c | 4 
>  1 file changed, 4 insertions(+)



Re: [PATCH RFC v1 0/3] Introduce vfio-pci-core subsystem

2021-01-28 Thread Cornelia Huck
On Tue, 26 Jan 2021 15:27:43 +0200
Max Gurtovoy  wrote:

> Hi Alex, Cornelia and Jason,
> 
> thanks for the reviewing this.
> 
> On 1/26/2021 5:34 AM, Alex Williamson wrote:
> > On Mon, 25 Jan 2021 20:45:22 -0400
> > Jason Gunthorpe  wrote:
> >  
> >> On Mon, Jan 25, 2021 at 04:31:51PM -0700, Alex Williamson wrote:
> >>  
> >>> We're supposed to be enlightened by a vendor driver that does nothing
> >>> more than pass the opaque device_data through to the core functions,
> >>> but in reality this is exactly the point of concern above.  At a
> >>> minimum that vendor driver needs to look at the vdev to get the
> >>> pdev,  
> >> The end driver already havs the pdev, the RFC doesn't go enough into
> >> those bits, it is a good comment.
> >>
> >> The dd_data pased to the vfio_create_pci_device() will be retrieved
> >> from the ops to get back to the end drivers data. This can cleanly
> >> include everything: the VF pci_device, PF pci_device, mlx5_core
> >> pointer, vfio_device and vfio_pci_device.
> >>
> >> This is why the example passes in the mvadev:
> >>
> >> +  vdev = vfio_create_pci_device(pdev, _vfio_pci_ops, mvadev);
> >>
> >> The mvadev has the PF, VF, and mlx5 core driver pointer.
> >>
> >> Getting that back out during the ops is enough to do what the mlx5
> >> driver needs to do, which is relay migration related IOCTLs to the PF
> >> function via the mlx5_core driver so the device can execute them on
> >> behalf of the VF.
> >>  
> >>> but then what else does it look at, consume, or modify.  Now we have
> >>> vendor drivers misusing the core because it's not clear which fields
> >>> are private and how public fields can be used safely,  
> >> The kernel has never followed rigid rules for data isolation, it is
> >> normal to have whole private structs exposed in headers so that
> >> container_of can be used to properly compose data structures.  
> > I reject this assertion, there are plenty of structs that clearly
> > indicate private vs public data or, as we've done in mdev, clearly
> > marking the private data in a "private" header and provide access
> > functions for public fields.  Including a "private" header to make use
> > of a "library" is just wrong.  In the example above, there's no way for
> > the mlx vendor driver to get back dd_data without extracting it from
> > struct vfio_pci_device itself.  
> 
> I'll create a better separation between private/public fields according 
> to my understanding for the V2.
> 
> I'll just mention that beyond this separation, future improvements will 
> be needed and can be done incrementally.
> 
> I don't think that we should do so many changes at one shut. The 
> incremental process is safer from subsystem point of view.
> 
> I also think that upstreaming mlx5_vfio_pci.ko and upstreaming vfio-pci 
> separation into 2 modules doesn't have to happen in one-shut.

The design can probably benefit from tossing a non-mlx5 driver into the
mix.

So, let me suggest the zdev support for that experiment (see
e6b817d4b8217a9528fcfd59719b924ab8a5ff23 and friends.) It is quite
straightforward: it injects some capabilities into the info ioctl. A
specialized driver would basically only need to provide special
handling for the ioctl callback and just forward anything else. It also
would not need any matching for device ids etc., as it would only make
sense on s390x, but regardless of the device. It could, however, help
us to get an idea what a good split would look like.

> 
> But again, to make our point in this RFC, I'll improve it for V2.
> 
> >  
> >> Look at struct device, for instance. Most of that is private to the
> >> driver core.
> >>
> >> A few 'private to vfio-pci-core' comments would help, it is good
> >> feedback to make that more clear.
> >>  
> >>> extensions potentially break vendor drivers, etc.  We're only even hand
> >>> waving that existing device specific support could be farmed out to new
> >>> device specific drivers without even going to the effort to prove that.  
> >> This is a RFC, not a complete patch series. The RFC is to get feedback
> >> on the general design before everyone comits alot of resources and
> >> positions get dug in.
> >>
> >> Do you really think the existing device specific support would be a
> >> problem to lift? It already looks pretty clean with the
> >> vfio_pci_regops, looks easy enough to lift to the parent.
> >>  
> >>> So far the TODOs rather mask the dirty little secrets of the
> >>> extension rather than showing how a vendor derived driver needs to
> >>> root around in struct vfio_pci_device to do something useful, so
> >>> probably porting actual device specific support rather than further
> >>> hand waving would be more helpful.  
> >> It would be helpful to get actual feedback on the high level design -
> >> someting like this was already tried in May and didn't go anywhere -
> >> are you surprised that we are reluctant to commit alot of resources
> >> doing a complete job just to have it go nowhere again?  
> > 

Re: [PATCH 4/4] vfio-pci/zdev: Introduce the zPCI I/O vfio region

2021-01-27 Thread Cornelia Huck
On Wed, 27 Jan 2021 08:53:05 -0700
Alex Williamson  wrote:

> On Wed, 27 Jan 2021 09:23:04 -0500
> Matthew Rosato  wrote:
> 
> > On 1/26/21 6:18 PM, Alex Williamson wrote:  
> > > On Mon, 25 Jan 2021 09:40:38 -0500
> > > Matthew Rosato  wrote:
> > > 
> > >> On 1/22/21 6:48 PM, Alex Williamson wrote:
> > >>> On Tue, 19 Jan 2021 15:02:30 -0500
> > >>> Matthew Rosato  wrote:
> > >>>
> >  Some s390 PCI devices (e.g. ISM) perform I/O operations that have very
> >  specific requirements in terms of alignment as well as the patterns in
> >  which the data is read/written. Allowing these to proceed through the
> >  typical vfio_pci_bar_rw path will cause them to be broken in up in 
> >  such a
> >  way that these requirements can't be guaranteed. In addition, ISM 
> >  devices
> >  do not support the MIO codepaths that might be triggered on vfio I/O 
> >  coming
> >  from userspace; we must be able to ensure that these devices use the
> >  non-MIO instructions.  To facilitate this, provide a new vfio region by
> >  which non-MIO instructions can be passed directly to the host kernel 
> >  s390
> >  PCI layer, to be reliably issued as non-MIO instructions.
> > 
> >  This patch introduces the new vfio VFIO_REGION_SUBTYPE_IBM_ZPCI_IO 
> >  region
> >  and implements the ability to pass PCISTB and PCILG instructions over 
> >  it,
> >  as these are what is required for ISM devices.
> > >>>
> > >>> There have been various discussions about splitting vfio-pci to allow
> > >>> more device specific drivers rather adding duct tape and bailing wire
> > >>> for various device specific features to extend vfio-pci.  The latest
> > >>> iteration is here[1].  Is it possible that such a solution could simply
> > >>> provide the standard BAR region indexes, but with an implementation that
> > >>> works on s390, rather than creating new device specific regions to
> > >>> perform the same task?  Thanks,
> > >>>
> > >>> Alex
> > >>>
> > >>> [1]https://lore.kernel.org/lkml/20210117181534.65724-1-mgurto...@nvidia.com/
> > >>>
> > >>
> > >> Thanks for the pointer, I'll have to keep an eye on this.  An approach
> > >> like this could solve some issues, but I think a main issue that still
> > >> remains with relying on the standard BAR region indexes (whether using
> > >> the current vfio-pci driver or a device-specific driver) is that QEMU
> > >> writes to said BAR memory region are happening in, at most, 8B chunks
> > >> (which then, in the current general-purpose vfio-pci code get further
> > >> split up into 4B iowrite operations).  The alternate approach I'm
> > >> proposing here is allowing for the whole payload (4K) in a single
> > >> operation, which is significantly faster.  So, I suspect even with a
> > >> device specific driver we'd want this sort of a region anyhow..
> > > 
> > > Why is this device specific behavior?  It would be a fair argument that
> > > acceptable device access widths for MMIO are always device specific, so
> > > we should never break them down.  Looking at the PCI spec, a TLP
> > > requires a dword (4-byte) aligned address with a 10-bit length field > 
> > > indicating the number of dwords, so up to 4K data as you suggest is the   
> > >  
> > 
> > Well, as I mentioned in a different thread, it's not really device   
> 
> Sorry, I tried to follow the thread, not sure it's possible w/o lots of
> preexisting s390 knowledge.
> 
> > specific behavior but rather architecture/s390x specific behavior; 
> > PCISTB is an interface available to all PCI devices on s390x, it just so 
> > happens that ISM is the first device type that is running into the 
> > nuance.  The architecture is designed in such a way that other devices 
> > can use the same interface in the same manner.  
> 
> As a platform access mechanism, this leans towards a platform specific
> implementation of the PCI BAR regions.
>  
> > To drill down a bit, the PCISTB payload can either be 'strict' or 
> > 'relaxed', which via the s390x architecture 'relaxed' means it's not 
> > dword-aligned but rather byte-aligned up to 4K.  We find out at init 
> > time which interface a device supports --  So, for a device that 
> > supports 'relaxed' PCISTB like ISM, an example would be a PCISTB of 11 
> > bytes coming from a non-dword-aligned address is permissible, which 
> > doesn't match the TLP from the spec as you described...  I believe this 
> > 'relaxed' operation that steps outside of the spec is unique to s390x. 
> > (Conversely, devices that use 'strict' PCISTB conform to the typical TLP 
> > definition)
> > 
> > This deviation from spec is to my mind is another argument to treat 
> > these particular PCISTB separately.  
> 
> I think that's just an accessor abstraction, we're not asking users to
> generate TLPs.  If we expose a byte granularity interface, some
> platforms might pass that directly to the PCISTB command, otherwise
> might 

Re: [PATCH 4/4] vfio-pci/zdev: Introduce the zPCI I/O vfio region

2021-01-25 Thread Cornelia Huck
On Mon, 25 Jan 2021 09:40:38 -0500
Matthew Rosato  wrote:

> On 1/22/21 6:48 PM, Alex Williamson wrote:
> > On Tue, 19 Jan 2021 15:02:30 -0500
> > Matthew Rosato  wrote:
> >   
> >> Some s390 PCI devices (e.g. ISM) perform I/O operations that have very
> >> specific requirements in terms of alignment as well as the patterns in
> >> which the data is read/written. Allowing these to proceed through the
> >> typical vfio_pci_bar_rw path will cause them to be broken in up in such a
> >> way that these requirements can't be guaranteed. In addition, ISM devices
> >> do not support the MIO codepaths that might be triggered on vfio I/O coming
> >> from userspace; we must be able to ensure that these devices use the
> >> non-MIO instructions.  To facilitate this, provide a new vfio region by
> >> which non-MIO instructions can be passed directly to the host kernel s390
> >> PCI layer, to be reliably issued as non-MIO instructions.
> >>
> >> This patch introduces the new vfio VFIO_REGION_SUBTYPE_IBM_ZPCI_IO region
> >> and implements the ability to pass PCISTB and PCILG instructions over it,
> >> as these are what is required for ISM devices.  
> > 
> > There have been various discussions about splitting vfio-pci to allow
> > more device specific drivers rather adding duct tape and bailing wire
> > for various device specific features to extend vfio-pci.  The latest
> > iteration is here[1].  Is it possible that such a solution could simply
> > provide the standard BAR region indexes, but with an implementation that
> > works on s390, rather than creating new device specific regions to
> > perform the same task?  Thanks,
> > 
> > Alex
> > 
> > [1]https://lore.kernel.org/lkml/20210117181534.65724-1-mgurto...@nvidia.com/
> >   
> 
> Thanks for the pointer, I'll have to keep an eye on this.  An approach 
> like this could solve some issues, but I think a main issue that still 
> remains with relying on the standard BAR region indexes (whether using 
> the current vfio-pci driver or a device-specific driver) is that QEMU 
> writes to said BAR memory region are happening in, at most, 8B chunks 
> (which then, in the current general-purpose vfio-pci code get further 
> split up into 4B iowrite operations).  The alternate approach I'm 
> proposing here is allowing for the whole payload (4K) in a single 
> operation, which is significantly faster.  So, I suspect even with a 
> device specific driver we'd want this sort of a region anyhow..

I'm also wondering about device specific vs architecture/platform
specific handling.

If we're trying to support ISM devices, that's device specific
handling; but if we're trying to add more generic things like the large
payload support, that's not necessarily tied to a device, is it? For
example, could a device support large payload if plugged into a z, but
not if plugged into another machine?



Re: [PATCH RFC v1 0/3] Introduce vfio-pci-core subsystem

2021-01-25 Thread Cornelia Huck
On Fri, 22 Jan 2021 16:04:21 -0400
Jason Gunthorpe  wrote:

> On Fri, Jan 22, 2021 at 12:25:03PM -0700, Alex Williamson wrote:

> > > In this way, we'll use the HW vendor driver core to manage the lifecycle
> > > of these devices. This is reasonable since only the vendor driver knows
> > > exactly about the status on its internal state and the capabilities of
> > > its acceleratots, for example.  
> > 
> > But mdev provides that too, or the vendor could write their own vfio  
> 
> Not really, mdev has a completely different lifecycle model that is
> not very compatible with what is required here.
> 
> And writing a VFIO driver is basically what this does, just a large
> portion of the driver is reusing code from the normal vfio-pci cases.

I think you cut out an important part of Alex' comment, so let me
repost it here:

"But mdev provides that too, or the vendor could write their own vfio
bus driver for the device, this doesn't really justify or delve deep
enough to show examples beyond "TODO" remarks for a vendor driver
actually interacting with vfio-pci-core in an extensible way.  One of
the concerns of previous efforts was that it's trying to directly
expose vfio-pci's implementation as an API for vendor drivers, I don't
really see that anything has changed in that respect here."

I'm missing the bigger picture of how this api is supposed to work out,
a driver with a lot of TODOs does not help much to figure out whether
this split makes sense and is potentially useful for a number of use
cases, or whether mdev (even with its different lifecycle model) or a
different vfio bus driver might be a better fit for the more involved
cases. (For example, can s390 ISM fit here?)



Re: [PATCH 1/1] s390/vfio-ap: No need to disable IRQ after queue reset

2021-01-21 Thread Cornelia Huck
On Thu, 21 Jan 2021 08:20:08 +0100
Halil Pasic  wrote:

> From: Tony Krowiak 
> 
> The queues assigned to a matrix mediated device are currently reset when:
> 
> * The VFIO_DEVICE_RESET ioctl is invoked
> * The mdev fd is closed by userspace (QEMU)
> * The mdev is removed from sysfs.
> 
> Immediately after the reset of a queue, a call is made to disable
> interrupts for the queue. This is entirely unnecessary because the reset of
> a queue disables interrupts, so this will be removed.
> 
> Furthermore, vfio_ap_irq_disable() does an unconditional PQAP/AQIC which
> can result in a specification exception (when the corresponding facility
> is not available), so this is actually a bugfix.
> 
> Signed-off-by: Tony Krowiak 
> [pa...@linux.ibm.com: minor rework before merging]
> Reviewed-by: Halil Pasic 
> Signed-off-by: Halil Pasic 
> Fixes: ec89b55e3bce ("s390: ap: implement PAPQ AQIC interception in kernel")
> Cc: 
> 
> ---
> 
> Since it turned out disabling the interrupts via PQAP/AQIC is not only
> unnecesary but also buggy, we decided to put this patch, which
> used to be apart of the series https://lkml.org/lkml/2020/12/22/757 on the 
> fast
> lane.
> 
> If the backports turn out to be a bother, which I hope won't be the case
> not, I am happy to help with those.
> 
> ---
>  drivers/s390/crypto/vfio_ap_drv.c |   6 +-
>  drivers/s390/crypto/vfio_ap_ops.c | 100 --
>  drivers/s390/crypto/vfio_ap_private.h |  12 ++--
>  3 files changed, 69 insertions(+), 49 deletions(-)
> 

(...)

> diff --git a/drivers/s390/crypto/vfio_ap_private.h 
> b/drivers/s390/crypto/vfio_ap_private.h
> index f46dde56b464..28e9d9989768 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -88,11 +88,6 @@ struct ap_matrix_mdev {
>   struct mdev_device *mdev;
>  };
>  
> -extern int vfio_ap_mdev_register(void);
> -extern void vfio_ap_mdev_unregister(void);
> -int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
> -  unsigned int retry);
> -
>  struct vfio_ap_queue {
>   struct ap_matrix_mdev *matrix_mdev;
>   unsigned long saved_pfn;
> @@ -100,5 +95,10 @@ struct vfio_ap_queue {
>  #define VFIO_AP_ISC_INVALID 0xff
>   unsigned char saved_isc;
>  };
> -struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q);
> +
> +int vfio_ap_mdev_register(void);
> +void vfio_ap_mdev_unregister(void);

Nit: was moving these two necessary?

> +int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q,
> +  unsigned int retry);
> +
>  #endif /* _VFIO_AP_PRIVATE_H_ */
> 
> base-commit: 9791581c049c10929e97098374dd1716a81fefcc

Anyway, if I didn't entangle myself in the various branches, this seems
sane.

Reviewed-by: Cornelia Huck 



Re: [PATCH RFC v1 0/3] Introduce vfio-pci-core subsystem

2021-01-19 Thread Cornelia Huck
On Mon, 18 Jan 2021 14:16:26 -0400
Jason Gunthorpe  wrote:

> On Mon, Jan 18, 2021 at 05:00:09PM +0100, Cornelia Huck wrote:
> 
> > > You can say that all the HW specific things are in the mlx5_vfio_pci
> > > driver. It is an unusual driver because it must bind to both the PCI
> > > VF with a pci_driver and to the mlx5_core PF using an
> > > auxiliary_driver. This is needed for the object lifetimes to be
> > > correct.  
> > 
> > Hm... I might be confused about the usage of the term 'driver' here.
> > IIUC, there are two drivers, one on the pci bus and one on the
> > auxiliary bus. Is the 'driver' you're talking about here more the
> > module you load (and not a driver in the driver core sense?)  
> 
> Here "driver" would be the common term meaning the code that realizes
> a subsytem for HW - so mlx5_vfio_pci is a VFIO driver because it
> ultimately creates a /dev/vfio* through the vfio subsystem.
> 
> The same way we usually call something like mlx5_en an "ethernet
> driver" not just a "pci driver"
> 
> > Yes, sure. But it also shows that mlx5_vfio_pci aka the device-specific
> > code is rather small in comparison to the common vfio-pci code.
> > Therefore my question whether it will gain more specific changes (that
> > cannot be covered via the auxiliary driver.)  
> 
> I'm not sure what you mean "via the auxiliary driver" - there is only
> one mlx5_vfio_pci, and the non-RFC version with all the migration code
> is fairly big.
> 
> The pci_driver contributes a 'struct pci_device *' and the
> auxiliary_driver contributes a 'struct mlx5_core_dev *'. mlx5_vfio_pci
> fuses them together into a VFIO device. Depending on the VFIO
> callback, it may use an API from the pci_device or from the
> mlx5_core_dev device, or both.

Let's rephrase my question a bit:

This proposal splits the existing vfio-pci driver into a "core"
component and code actually implementing the "driver" part. For mlx5,
an alternative "driver" is introduced that reuses the "core" component
and also hooks into mlx5-specific code parts via the auxiliary device
framework. (IIUC, the plan is to make existing special cases for
devices follow mlx5's lead later.)

I've been thinking of an alternative split: Keep vfio-pci as it is now,
but add an auxiliary device. For mlx5, an auxiliary device_driver can
match to that device and implement mlx5-specific things. From the code
in this RFC, it is not clear to me whether this would be feasible: most
callbacks seem to simply forward to the core component, and that might
be possible to be done by a purely auxiliary device_driver; but this
may or may not work well for additional functionality.

I guess my question is: into which callbacks will the additional
functionality hook? If there's no good way to do what they need to do
without manipulating the vfio-pci calls, my proposal will not work, and
this proposal looks like the better way. But it's hard to tell without
seeing the code, which is why I'm asking :)



Re: [PATCH 2/2] s390: mm: Fix secure storage access exception handling

2021-01-19 Thread Cornelia Huck
On Tue, 19 Jan 2021 11:38:10 +0100
Janosch Frank  wrote:

> On 1/19/21 11:25 AM, Christian Borntraeger wrote:
> > 
> > 
> > On 19.01.21 11:04, Janosch Frank wrote:  
> >> Turns out that the bit 61 in the TEID is not always 1 and if that's
> >> the case the address space ID and the address are
> >> unpredictable. Without an address and it's address space ID we can't
> >> export memory and hence we can only send a SIGSEGV to the process or
> >> panic the kernel depending on who caused the exception.
> >>
> >> Signed-off-by: Janosch Frank 
> >> Fixes: 084ea4d611a3d ("s390/mm: add (non)secure page access exceptions 
> >> handlers")
> >> Cc: sta...@vger.kernel.org  
> > 
> > Reviewed-by: Christian Borntraeger   
> 
> Thanks!
> 
> > 
> > some small things to consider (or to reject)
> >   
> >> ---
> >>  arch/s390/mm/fault.c | 14 ++
> >>  1 file changed, 14 insertions(+)
> >>
> >> diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
> >> index e30c7c781172..5442937e5b4b 100644
> >> --- a/arch/s390/mm/fault.c
> >> +++ b/arch/s390/mm/fault.c
> >> @@ -791,6 +791,20 @@ void do_secure_storage_access(struct pt_regs *regs)
> >>struct page *page;
> >>int rc;
> >>  
> >> +  /* There are cases where we don't have a TEID. */
> >> +  if (!(regs->int_parm_long & 0x4)) {
> >> +  /*
> >> +   * Userspace could for example try to execute secure
> >> +   * storage and trigger this. We should tell it that it
> >> +   * shouldn't do that.  
> > 
> > Maybe something like
> > /*
> >  * when this happens, userspace did something that it

s/when/When/ :)

> >  * was not supposed to do, e.g. branching into secure
> >  * secure memory. Trigger a segmentation fault.  
> >> +   */  
> 
> Sounds good
> 
> >> +  if (user_mode(regs)) {
> >> +  send_sig(SIGSEGV, current, 0);
> >> +  return;
> >> +  } else
> >> +  panic("Unexpected PGM 0x3d with TEID bit 61=0");  
> > 
> > use BUG instead of panic? That would kill this process, but it allows
> > people to maybe save unaffected data.  
> 
> That would make sense, will do

With BUG():

Reviewed-by: Cornelia Huck 


pgpICNfGBeLQd.pgp
Description: OpenPGP digital signature


Re: [PATCH 1/2] s390: uv: Fix sysfs max number of VCPUs reporting

2021-01-19 Thread Cornelia Huck
On Tue, 19 Jan 2021 05:04:01 -0500
Janosch Frank  wrote:

> The number reported by the query is N-1 and I think people reading the
> sysfs file would expect N instead. For users creating VMs there's no
> actual difference because KVM's limit is currently below the UV's
> limit.
> 
> The naming of the field is a bit misleading. Number in this context is
> used like ID and starts at 0. The query field denotes the maximum
> number that can be put into the VCPU number field in the "create
> secure CPU" UV call.
> 
> Signed-off-by: Janosch Frank 
> Fixes: a0f60f8431999 ("s390/protvirt: Add sysfs firmware interface for 
> Ultravisor information")
> Cc: sta...@vger.kernel.org
> ---
>  arch/s390/boot/uv.c| 2 +-
>  arch/s390/include/asm/uv.h | 4 ++--
>  arch/s390/kernel/uv.c  | 2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)

Acked-by: Cornelia Huck 



Re: [PATCH RFC v1 0/3] Introduce vfio-pci-core subsystem

2021-01-18 Thread Cornelia Huck
On Mon, 18 Jan 2021 11:10:20 -0400
Jason Gunthorpe  wrote:

> On Mon, Jan 18, 2021 at 02:38:06PM +0100, Cornelia Huck wrote:
> 
> > > These devices will be seen on the Auxiliary bus as:
> > > mlx5_core.vfio_pci.2048 -> 
> > > ../../../devices/pci:00/:00:02.0/:05:00.0/:06:00.0/:07:00.0/mlx5_core.vfio_pci.2048
> > > mlx5_core.vfio_pci.2304 -> 
> > > ../../../devices/pci:00/:00:02.0/:05:00.0/:06:00.0/:07:00.1/mlx5_core.vfio_pci.2304
> > > 
> > > 2048 represents BDF 08:00.0 and 2304 represents BDF 09:00.0 in decimal
> > > view. In this manner, the administrator will be able to locate the
> > > correct vfio-pci module it should bind the desired BDF to (by finding
> > > the pointer to the module according to the Auxiliary driver of that
> > > BDF).  
> > 
> > I'm not familiar with that auxiliary framework (it seems to be fairly
> > new?);   
> 
> Auxillary bus is for connecting two parts of the same driver that
> reside in different modules/subystems. Here it is connecting the vfio
> part to the core part of mlx5 (running on the PF).

Yes, that's also what I understood so far; still need to do more reading up.

> 
> > but can you maybe create an auxiliary device unconditionally and
> > contain all hardware-specific things inside a driver for it? Or is
> > that not flexible enough?  
> 
> The goal is to make a vfio device, auxiliary bus is only in the
> picture because a VF device under vfio needs to interact with the PF
> mlx5_core driver, auxiliary bus provides that connection.

Nod.

> 
> You can say that all the HW specific things are in the mlx5_vfio_pci
> driver. It is an unusual driver because it must bind to both the PCI
> VF with a pci_driver and to the mlx5_core PF using an
> auxiliary_driver. This is needed for the object lifetimes to be
> correct.

Hm... I might be confused about the usage of the term 'driver' here.
IIUC, there are two drivers, one on the pci bus and one on the
auxiliary bus. Is the 'driver' you're talking about here more the
module you load (and not a driver in the driver core sense?)

> 
> The PF is providing services to control a full VF which mlx5_vfio_pci
> converts into VFIO API.
> 
> > >  drivers/vfio/pci/Kconfig|   22 +-
> > >  drivers/vfio/pci/Makefile   |   16 +-
> > >  drivers/vfio/pci/mlx5_vfio_pci.c|  253 +++
> > >  drivers/vfio/pci/vfio_pci.c | 2386 +--
> > >  drivers/vfio/pci/vfio_pci_core.c| 2311 ++  
> > 
> > Especially regarding this diffstat...   
> 
> It is a bit misleading because it doesn't show the rename

Yes, sure. But it also shows that mlx5_vfio_pci aka the device-specific
code is rather small in comparison to the common vfio-pci code.
Therefore my question whether it will gain more specific changes (that
cannot be covered via the auxiliary driver.)



Re: [PATCH RFC v1 0/3] Introduce vfio-pci-core subsystem

2021-01-18 Thread Cornelia Huck
On Sun, 17 Jan 2021 18:15:31 +
Max Gurtovoy  wrote:

> Hi Alex and Cornelia,
> 
> This series split the vfio_pci driver into 2 parts: pci driver and a
> subsystem driver that will also be library of code. The pci driver,
> vfio_pci.ko will be used as before and it will bind to the subsystem
> driver vfio_pci_core.ko to register to the VFIO subsystem. This patchset
> if fully backward compatible. This is a typical Linux subsystem
> framework behaviour. This framework can be also adopted by vfio_mdev
> devices as we'll see in the below sketch.
> 
> This series is coming to solve the issues that were raised in the
> previous attempt for extending vfio-pci for vendor specific
> functionality: https://lkml.org/lkml/2020/5/17/376 by Yan Zhao.
> 
> This solution is also deterministic in a sense that when a user will
> bind to a vendor specific vfio-pci driver, it will get all the special
> goodies of the HW.
>  
> This subsystem framework will also ease on adding vendor specific
> functionality to VFIO devices in the future by allowing another module
> to provide the pci_driver that can setup number of details before
> registering to VFIO subsystem (such as inject its own operations).
> 
> Below we can see the proposed changes (this patchset only deals with
> VFIO_PCI subsystem but it can easily be extended to VFIO_MDEV subsystem
> as well):
> 
> +--+
> |  |
> |VFIO  |
> |  |
> +--+
> 
> ++++
> ||||
> |  VFIO_PCI_CORE ||  VFIO_MDEV_CORE|
> ||||
> ++++
> 
> +---+ +--++---+ +--+
> |   | |  ||   | |  |
> |   | |  ||   | |  |
> | VFIO_PCI  | | MLX5_VFIO_PCI|| VFIO_MDEV | |MLX5_VFIO_MDEV|
> |   | |  ||   | |  |
> |   | |  ||   | |  |
> +---+ +--++---+ +--+
> 
> First 2 patches introduce the above changes for vfio_pci and
> vfio_pci_core.
> 
> Patch (3/3) introduces a new mlx5 vfio-pci module that registers to VFIO
> subsystem using vfio_pci_core. It also registers to Auxiliary bus for
> binding to mlx5_core that is the parent of mlx5-vfio-pci devices. This
> will allow extending mlx5-vfio-pci devices with HW specific features
> such as Live Migration (mlx5_core patches are not part of this series
> that comes for proposing the changes need for the vfio pci subsystem).
> 
> These devices will be seen on the Auxiliary bus as:
> mlx5_core.vfio_pci.2048 -> 
> ../../../devices/pci:00/:00:02.0/:05:00.0/:06:00.0/:07:00.0/mlx5_core.vfio_pci.2048
> mlx5_core.vfio_pci.2304 -> 
> ../../../devices/pci:00/:00:02.0/:05:00.0/:06:00.0/:07:00.1/mlx5_core.vfio_pci.2304
> 
> 2048 represents BDF 08:00.0 and 2304 represents BDF 09:00.0 in decimal
> view. In this manner, the administrator will be able to locate the
> correct vfio-pci module it should bind the desired BDF to (by finding
> the pointer to the module according to the Auxiliary driver of that
> BDF).

I'm not familiar with that auxiliary framework (it seems to be fairly
new?); but can you maybe create an auxiliary device unconditionally and
contain all hardware-specific things inside a driver for it? Or is that
not flexible enough?

> 
> In this way, we'll use the HW vendor driver core to manage the lifecycle
> of these devices. This is reasonable since only the vendor driver knows
> exactly about the status on its internal state and the capabilities of
> its acceleratots, for example.
> 
> TODOs:
> 1. For this RFC we still haven't cleaned all vendor specific stuff that
>were merged in the past into vfio_pci (such as VFIO_PCI_IG and
>VFIO_PCI_NVLINK2).
> 2. Create subsystem module for VFIO_MDEV. This can be used for vendor
>specific scalable functions for example (SFs).
> 3. Add Live migration functionality for mlx5 SNAP devices
>(NVMe/Virtio-BLK).
> 4. Add Live migration functionality for mlx5 VFs
> 5. Add the needed functionality for mlx5_core
> 
> I would like to thank the great team that was involved in this
> development, design and internal review:
> Oren, Liran, Jason, Leon, Aviad, Shahaf, Gary, Artem, Kirti, Neo, Andy
> and others.
> 
> This series applies 

Re: [RFC 0/4] vfio-pci/zdev: Fixing s390 vfio-pci ISM support

2020-12-22 Thread Cornelia Huck
On Thu, 17 Dec 2020 11:04:48 -0500
Matthew Rosato  wrote:

> On 12/17/20 7:59 AM, Cornelia Huck wrote:
> > The basic question I have is whether it makes sense to specialcase the
> > ISM device (can we even find out that we're dealing with an ISM device
> > here?) to force the non-MIO instructions, as it is just a device with  
> 
> Yes, with the addition of the CLP data passed from the host via vfio 
> capabilities, we can tell this is an ISM device specifically via the 
> 'pft' field in VFOI_DEVICE_INFO_CAP_ZPCI_BASE.  We don't actually 
> surface that field to the guest itself in the Q PCI FN clp rsponse (has 
> to do with Function Measurement Block requirements) but we can certainly 
> use that information in QEMU to restrict this behavior to only ISM devices.
> 
> > some special requirements, or tie non-MIO to relaxed alignment. (Could
> > relaxed alignment devices in theory be served by MIO instructions as
> > well?)  
> 
> In practice, I think there are none today, but per the architecture it 
> IS possible to have relaxed alignment devices served by MIO 
> instructions, so we shouldn't rely on that bit alone as I'm doing in 
> this RFC.  I think instead relying on the pft value as I mention above 
> is what we have to do.

From what you write this looks like the best way to me as well.

> 
> > 
> > Another thing that came to my mind is whether we consider the guest to
> > be using a pci device and needing weird instructions to do that because
> > it's on s390, or whether it is issuing instructions for a device that
> > happens to be a pci device (sorry if that sounds a bit meta :)
> >   
> 
> Typically, I'd classify things as the former but I think ISM seems more 
> like the latter -- To me, ISM seems like less a classic PCI device and 
> more a device that happens to be using s390 PCI interfaces to accomplish 
> its goal.  But it's probably more of a case of this particular device 
> (and it's driver) are s390-specific and therefore built with the unique 
> s390 interface in-mind (and in fact invokes it directly rather than 
> through the general PCI layer), rather than fitting the typical PCI 
> device architecture on top of the s390 interface.

Nod, it certainly feels like that.



Re: [PATCH v4] s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated

2020-12-22 Thread Cornelia Huck
On Tue, 22 Dec 2020 10:37:01 -0500
Tony Krowiak  wrote:

> On 12/21/20 11:05 PM, Halil Pasic wrote:
> > On Mon, 21 Dec 2020 13:56:25 -0500
> > Tony Krowiak  wrote:

> >>   static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
> >>   unsigned long action, void *data)
> >>   {
> >> -  int ret;
> >> +  int ret, notify_rc = NOTIFY_DONE;
> >>struct ap_matrix_mdev *matrix_mdev;
> >>   
> >>if (action != VFIO_GROUP_NOTIFY_SET_KVM)
> >>return NOTIFY_OK;
> >>   
> >>matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
> >> +  mutex_lock(_dev->lock);
> >>   
> >>if (!data) {
> >> -  matrix_mdev->kvm = NULL;
> >> -  return NOTIFY_OK;
> >> +  if (matrix_mdev->kvm)
> >> +  vfio_ap_mdev_unset_kvm(matrix_mdev);
> >> +  notify_rc = NOTIFY_OK;
> >> +  goto notify_done;
> >>}
> >>   
> >>ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
> >>if (ret)
> >> -  return NOTIFY_DONE;
> >> +  goto notify_done;
> >>   
> >>/* If there is no CRYCB pointer, then we can't copy the masks */
> >>if (!matrix_mdev->kvm->arch.crypto.crycbd)
> >> -  return NOTIFY_DONE;
> >> +  goto notify_done;
> >>   
> >>kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
> >>  matrix_mdev->matrix.aqm,
> >>  matrix_mdev->matrix.adm);
> >>   
> >> -  return NOTIFY_OK;  
> > Shouldn't there be an
> >   + notify_rc = NOTIFY_OK;
> > here? I mean you initialize notify_rc to NOTIFY_DONE, in the !data branch
> > on success you set notify_rc to NOTIFY_OK, but in the !!data branch it
> > just stays NOTIFY_DONE. Or am I missing something?  
> 
> I don't think it matters much since NOTIFY_OK and NOTIFY_DONE have
> no further effect on processing of the notification queue, but I believe
> you are correct, this is a change from what we originally had. I can
> restore the original return values if you'd prefer.

Even if they have the same semantics now, that might change in the
future; restoring the original behaviour looks like the right thing to
do.

> 
> >
> > Otherwise LGTM!

Same here.

> >
> > Regards,
> > Halil
> >  
> >> +notify_done:
> >> +  mutex_unlock(_dev->lock);
> >> +  return notify_rc;
> >>   }
> >>  
> > [..]  
> 



Re: [RFC 0/4] vfio-pci/zdev: Fixing s390 vfio-pci ISM support

2020-12-17 Thread Cornelia Huck
On Fri, 11 Dec 2020 10:04:43 -0500
Matthew Rosato  wrote:

> On 12/11/20 10:01 AM, Matthew Rosato wrote:
> > On 12/11/20 9:35 AM, Cornelia Huck wrote:  

> >> Let me summarize this to make sure I understand this new region
> >> correctly:
> >>
> >> - some devices may have relaxed alignment/length requirements for
> >>    pcistb (and friends?)  
> > 
> > The relaxed alignment bit is really specific to PCISTB behavior, so the 
> > "and friends" doesn't apply there.

Ok.

> >   
> >> - some devices may actually require writes to be done in a large chunk
> >>    instead of being broken up (is that a strict subset of the devices
> >>    above?)  
> > 
> > Yes, this is specific to ISM devices, which are always a relaxed 
> > alignment/length device.
> > 
> > The inverse is an interesting question though (relaxed alignment devices 
> > that are not ISM, which you've posed as a possible future extension for 
> > emulated devices).  I'm not sure that any (real devices) exist where 
> > (relaxed_alignment && !ism), but 'what if' -- I guess the right approach 
> > would mean additional code in QEMU to handle relaxed alignment for the 
> > vfio mmio path as well (seen as pcistb_default in my qemu patchset) and 
> > being very specific in QEMU to only enable the region for an ism device.  
> 
> Let me be more precise there...  It would be additional code to handle 
> relaxed alignment for the default pcistb path (pcistb_default) which 
> would include BOTH emulated devices (should we ever surface the relaxed 
> alignment CLP bit and the guest kernel honor it) as well as any s390x 
> vfio-pci device that doesn't use this new I/O region described here.

Understood. Not sure if it is useful, but the important part is that we
could extend it if we wanted.

> 
> >   
> >> - some devices do not support the new MIO instructions (is that a
> >>    subset of the relaxed alignment devices? I'm not familiar with the
> >>    MIO instructions)
> >>  
> > 
> > The non-MIO requirement is again specific to ISM, which is a subset of 
> > the relaxed alignment devices.  In this case, the requirement is not 
> > limited to PCISTB, and that's why PCILG is also included here.  The ISM 
> > driver does not use PCISTG, and the only PCISTG instructions coming from 
> > the guest against an ISM device would be against the config space and 
> > those are OK to go through vfio still; so what was provided via the 
> > region is effectively the bare-minimum requirement to allow ISM to 
> > function properly in the guest.
> >   
> >> The patchsets introduce a new region that (a) is used by QEMU to submit
> >> writes in one go, and (b) makes sure to call into the non-MIO
> >> instructions directly; it's basically killing two birds with one stone
> >> for ISM devices. Are these two requirements (large writes and non-MIO)
> >> always going hand-in-hand, or is ISM just an odd device?  
> > 
> > I would say that ISM is definitely a special-case device, even just 
> > looking at the way it's implemented in the base kernel (interacting 
> > directly with the s390 kernel PCI layer in order to avoid use of MIO 
> > instructions -- no other driver does this).  But that said, having the 
> > two requirements hand-in-hand I think is not bad, though -- This 
> > approach ensures the specific instruction the guest wanted (or in this 
> > case, needed) is actually executed on the underlying host.

The basic question I have is whether it makes sense to specialcase the
ISM device (can we even find out that we're dealing with an ISM device
here?) to force the non-MIO instructions, as it is just a device with
some special requirements, or tie non-MIO to relaxed alignment. (Could
relaxed alignment devices in theory be served by MIO instructions as
well?)

Another thing that came to my mind is whether we consider the guest to
be using a pci device and needing weird instructions to do that because
it's on s390, or whether it is issuing instructions for a device that
happens to be a pci device (sorry if that sounds a bit meta :)

> > 
> > That said, the ability to re-use the large write for other devices would 
> > be nice -- but as hinted in the QEMU cover letter, this approach only 
> > works because ISM does not support MSI-X; using this approach for 
> > MSI-X-enabled devices breaks the MSI-X masking that vfio-pci does in 
> > QEMU (I tried an approach that used this region approach for all 3 
> > instructions as a test, PCISTG/PCISTB/PCILG, and threw it against mlx -- 
> > any writes against an MSI-X enabled bar will m

Re: [PATCH v3] s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated

2020-12-15 Thread Cornelia Huck
On Mon, 14 Dec 2020 11:56:17 -0500
Tony Krowiak  wrote:

> The vfio_ap device driver registers a group notifier with VFIO when the
> file descriptor for a VFIO mediated device for a KVM guest is opened to
> receive notification that the KVM pointer is set (VFIO_GROUP_NOTIFY_SET_KVM
> event). When the KVM pointer is set, the vfio_ap driver takes the
> following actions:
> 1. Stashes the KVM pointer in the vfio_ap_mdev struct that holds the state
>of the mediated device.
> 2. Calls the kvm_get_kvm() function to increment its reference counter.
> 3. Sets the function pointer to the function that handles interception of
>the instruction that enables/disables interrupt processing.
> 4. Sets the masks in the KVM guest's CRYCB to pass AP resources through to
>the guest.
> 
> In order to avoid memory leaks, when the notifier is called to receive
> notification that the KVM pointer has been set to NULL, the vfio_ap device
> driver should reverse the actions taken when the KVM pointer was set.
> 
> Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
> Signed-off-by: Tony Krowiak 
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 29 -----
>  1 file changed, 20 insertions(+), 9 deletions(-)

Reviewed-by: Cornelia Huck 



Re: [RFC 0/4] vfio-pci/zdev: Fixing s390 vfio-pci ISM support

2020-12-11 Thread Cornelia Huck
On Thu, 10 Dec 2020 10:51:23 -0500
Matthew Rosato  wrote:

> On 12/10/20 7:33 AM, Cornelia Huck wrote:
> > On Wed,  9 Dec 2020 15:27:46 -0500
> > Matthew Rosato  wrote:
> >   
> >> Today, ISM devices are completely disallowed for vfio-pci passthrough as
> >> QEMU will reject the device due to an (inappropriate) MSI-X check.
> >> However, in an effort to enable ISM device passthrough, I realized that the
> >> manner in which ISM performs block write operations is highly incompatible
> >> with the way that QEMU s390 PCI instruction interception and
> >> vfio_pci_bar_rw break up I/O operations into 8B and 4B operations -- ISM
> >> devices have particular requirements in regards to the alignment, size and
> >> order of writes performed.  Furthermore, they require that legacy/non-MIO
> >> s390 PCI instructions are used, which is also not guaranteed when the I/O
> >> is passed through the typical userspace channels.  
> > 
> > The part about the non-MIO instructions confuses me. How can MIO
> > instructions be generated with the current code, and why does changing  
> 
> So to be clear, they are not being generated at all in the guest as the 
> necessary facility is reported as unavailable.
> 
> Let's talk about Linux in LPAR / the host kernel:  When hardware that 
> supports MIO instructions is available, all userspace I/O traffic is 
> going to be routed through the MIO variants of the s390 PCI 
> instructions.  This is working well for other device types, but does not 
> work for ISM which does not support these variants.  However, the ISM 
> driver also does not invoke the userspace I/O routines for the kernel, 
> it invokes the s390 PCI layer directly, which in turn ensures the proper 
> PCI instructions are used -- This approach falls apart when the guest 
> ISM driver invokes those routines in the guest -- we (qemu) pass those 
> non-MIO instructions from the guest as memory operations through 
> vfio-pci, traversing through the vfio I/O layer in the guest 
> (vfio_pci_bar_rw and friends), where we then arrive in the host s390 PCI 
> layer -- where the MIO variant is used because the facility is available.
> 
> Per conversations with Niklas (on CC), it's not trivial to decide by the 
> time we reach the s390 PCI I/O layer to switch gears and use the non-MIO 
> instruction set.
> 
> > the write pattern help?  
> 
> The write pattern is a separate issue from non-MIO instruction 
> requirements...  Certain address spaces require specific instructions to 
> be used (so, no substituting PCISTG for PCISTB - that happens too by 
> default for any writes coming into the host s390 PCI layer that are 
> <=8B, and they all are when the PCISTB is broken up into 8B memory 
> operations that travel through vfio_pci_bar_rw, which further breaks 
> those up into 4B operations).  There's also a requirement for some 
> writes that the data, if broken up, be written in a certain order in 
> order to properly trigger events. :(  The ability to pass the entire 
> PCISTB payload vs breaking it into 8B chunks is also significantly faster.

Let me summarize this to make sure I understand this new region
correctly:

- some devices may have relaxed alignment/length requirements for
  pcistb (and friends?)
- some devices may actually require writes to be done in a large chunk
  instead of being broken up (is that a strict subset of the devices
  above?)
- some devices do not support the new MIO instructions (is that a
  subset of the relaxed alignment devices? I'm not familiar with the
  MIO instructions)

The patchsets introduce a new region that (a) is used by QEMU to submit
writes in one go, and (b) makes sure to call into the non-MIO
instructions directly; it's basically killing two birds with one stone
for ISM devices. Are these two requirements (large writes and non-MIO)
always going hand-in-hand, or is ISM just an odd device?

If there's an expectation that the new region will always use the
non-MIO instructions (in addition to the changed write handling), it
should be noted in the description for the region as well.



Re: [RFC 0/4] vfio-pci/zdev: Fixing s390 vfio-pci ISM support

2020-12-11 Thread Cornelia Huck
On Thu, 10 Dec 2020 17:14:24 +0100
Niklas Schnelle  wrote:

> On 12/10/20 4:51 PM, Matthew Rosato wrote:
> > On 12/10/20 7:33 AM, Cornelia Huck wrote:  
> >> On Wed,  9 Dec 2020 15:27:46 -0500
> >> Matthew Rosato  wrote:
> >>  
> >>> Today, ISM devices are completely disallowed for vfio-pci passthrough as
> >>> QEMU will reject the device due to an (inappropriate) MSI-X check.
> >>> However, in an effort to enable ISM device passthrough, I realized that 
> >>> the
> >>> manner in which ISM performs block write operations is highly incompatible
> >>> with the way that QEMU s390 PCI instruction interception and
> >>> vfio_pci_bar_rw break up I/O operations into 8B and 4B operations -- ISM
> >>> devices have particular requirements in regards to the alignment, size and
> >>> order of writes performed.  Furthermore, they require that legacy/non-MIO
> >>> s390 PCI instructions are used, which is also not guaranteed when the I/O
> >>> is passed through the typical userspace channels.  
> >>
> >> The part about the non-MIO instructions confuses me. How can MIO
> >> instructions be generated with the current code, and why does changing  
> > 
> > So to be clear, they are not being generated at all in the guest as the 
> > necessary facility is reported as unavailable.
> > 
> > Let's talk about Linux in LPAR / the host kernel:  When hardware that 
> > supports MIO instructions is available, all userspace I/O traffic is going 
> > to be routed through the MIO variants of the s390 PCI instructions.  This 
> > is working well for other device types, but does not work for ISM which 
> > does not support these variants.  However, the ISM driver also does not 
> > invoke the userspace I/O routines for the kernel, it invokes the s390 PCI 
> > layer directly, which in turn ensures the proper PCI instructions are used 
> > -- This approach falls apart when the guest ISM driver invokes those 
> > routines in the guest -- we (qemu) pass those non-MIO instructions from the 
> > guest as memory operations through vfio-pci, traversing through the vfio 
> > I/O layer in the guest (vfio_pci_bar_rw and friends), where we then arrive 
> > in the host s390 PCI layer -- where the MIO variant is used because the 
> > facility is available.  
> 
> Slight clarification since I think the word "userspace" is a bit overloaded as
> KVM folks often use it to talk about the guest even when that calls through 
> vfio.
> Application userspace (i.e. things like DPDK) can use PCI MIO Load/Stores
> directly on mmap()ed/ioremap()ed memory these don't go through the Kernel at
> all.
> QEMU while also in userspace on the other hand goes through the vfio_bar_rw()
> region which uses the common code _Kernel_ ioread()/iowrite() API. This Kernel
> ioread()/iowrite() API uses PCI MIO Load/Stores by default on machines that
> support them (z15 currently).  The ISM driver, knowing that its device does 
> not
> support MIO, goes around this API and directly calls zpci_store()/zpci_load().

Ok, thanks for the explanation.

> 
> 
> > 
> > Per conversations with Niklas (on CC), it's not trivial to decide by the 
> > time we reach the s390 PCI I/O layer to switch gears and use the non-MIO 
> > instruction set.  
> 
> Yes, we have some ideas about dynamically switching to legacy PCI stores in
> ioread()/iowrite() for devices that are set up for it but since that only gets
> an ioremap()ed address, a value and a size it would evolve such nasty things 
> as
> looking at this virtual address to determine if it includes a ZPCI_ADDR()
> cookie that we use to get to the function handle needed for the legacy PCI
> Load/Stores, while MIO PCI Load/Stores directly work on virtual addresses.
> 
> Now purely for the Kernel API we think this could work since that always
> allocates between VMALLOC_START and VMALLOC_END and we control where we put 
> the
> ZPCI_ADDR() cookie but I'm very hesitant to add something like that.
> 
> As for application userspace (DPDK) we do have a syscall
> (arch/s390/pci/pci_mmio.c) API that had a similar problem but we could make 
> use
> of the fact that our Architecture is pretty nifty with address spaces and just
> execute the MIO PCI Load/Store in the syscall _as if_ by the calling userspace
> application.

Is ISM (currently) the only device that needs to use the non-MIO
instructions, or are there others as well? Is there any characteristic
that a meta driver like vfio could discover, or is it a device quirk
you just need to know about?



Re: [RFC 1/4] s390/pci: track alignment/length strictness for zpci_dev

2020-12-11 Thread Cornelia Huck
On Thu, 10 Dec 2020 10:26:22 -0500
Matthew Rosato  wrote:

> On 12/10/20 5:33 AM, Cornelia Huck wrote:
> > On Wed,  9 Dec 2020 15:27:47 -0500
> > Matthew Rosato  wrote:
> >   
> >> Some zpci device types (e.g., ISM) follow different rules for length
> >> and alignment of pci instructions.  Recognize this and keep track of
> >> it in the zpci_dev.
> >>
> >> Signed-off-by: Matthew Rosato 
> >> Reviewed-by: Niklas Schnelle 
> >> Reviewed-by: Pierre Morel 
> >> ---
> >>   arch/s390/include/asm/pci.h | 3 ++-
> >>   arch/s390/include/asm/pci_clp.h | 4 +++-
> >>   arch/s390/pci/pci_clp.c | 1 +
> >>   3 files changed, 6 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
> >> index 2126289..f16ffba 100644
> >> --- a/arch/s390/include/asm/pci.h
> >> +++ b/arch/s390/include/asm/pci.h
> >> @@ -133,7 +133,8 @@ struct zpci_dev {
> >>u8  has_hp_slot : 1;
> >>u8  is_physfn   : 1;
> >>u8  util_str_avail  : 1;
> >> -  u8  reserved: 4;
> >> +  u8  relaxed_align   : 1;
> >> +  u8  reserved: 3;
> >>unsigned intdevfn;  /* DEVFN part of the RID*/
> >>   
> >>struct mutex lock;
> >> diff --git a/arch/s390/include/asm/pci_clp.h 
> >> b/arch/s390/include/asm/pci_clp.h
> >> index 1f4b666..9fb7cbf 100644
> >> --- a/arch/s390/include/asm/pci_clp.h
> >> +++ b/arch/s390/include/asm/pci_clp.h
> >> @@ -150,7 +150,9 @@ struct clp_rsp_query_pci_grp {
> >>u16 :  4;
> >>u16 noi : 12;   /* number of interrupts */
> >>u8 version;
> >> -  u8  :  6;
> >> +  u8  :  4;
> >> +  u8 relaxed_align:  1;   /* Relax length and alignment rules */
> >> +  u8  :  1;
> >>u8 frame:  1;
> >>u8 refresh  :  1;   /* TLB refresh mode */
> >>u16 reserved2;
> >> diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c
> >> index 153720d..630f8fc 100644
> >> --- a/arch/s390/pci/pci_clp.c
> >> +++ b/arch/s390/pci/pci_clp.c
> >> @@ -103,6 +103,7 @@ static void clp_store_query_pci_fngrp(struct zpci_dev 
> >> *zdev,
> >>zdev->max_msi = response->noi;
> >>zdev->fmb_update = response->mui;
> >>zdev->version = response->version;
> >> +  zdev->relaxed_align = response->relaxed_align;
> >>   
> >>switch (response->version) {
> >>case 1:  
> > 
> > Hm, what does that 'relaxed alignment' imply? Is that something that
> > can apply to emulated devices as well?
> >   
> The relaxed alignment simply loosens the rules on the PCISTB instruction 
> so that it doesn't have to be on particular boundaries / have a minimum 
> length restriction, these effectively allow ISM devices to use PCISTB 
> instead of PCISTG for just about everything.  If you have a look at the 
> patch "s390x/pci: Handle devices that support relaxed alignment" from 
> the linked qemu set, you can get an idea of what the bit changes via the 
> way qemu has to be more permissive of what the guest provides for PCISTB.

Ok, so it is basically a characteristic of a specific device that
changes the rules of what pcistb will accept.

> 
> Re: emulated devices...  The S390 PCI I/O layer in the guest is always 
> issuing strict? aligned I/O for PCISTB, and if it decided to later 
> change that behavior it would need to look at this CLP bit to decide 
> whether that would be a valid operation for a given PCI function anyway. 
>   This bit will remain off in the CLP response we give for emulated 
> devices, ensuring that should such a change occur in the guest s390 PCI 
> I/O layer, we'd just continue getting strictly-aligned PCISTB.

My question was more whether that was a feature that might make sense
to emulate on the hypervisor side for fully emulated devices. I'd like
to leave the door open for emulated devices to advertise this and make
it possible for guests using those devices to use pcistb with the
relaxed rules, if it makes sense.

I guess we can easily retrofit this if we come up with a use case for it.



Re: [RFC 0/4] vfio-pci/zdev: Fixing s390 vfio-pci ISM support

2020-12-10 Thread Cornelia Huck
On Wed,  9 Dec 2020 15:27:46 -0500
Matthew Rosato  wrote:

> Today, ISM devices are completely disallowed for vfio-pci passthrough as
> QEMU will reject the device due to an (inappropriate) MSI-X check.
> However, in an effort to enable ISM device passthrough, I realized that the
> manner in which ISM performs block write operations is highly incompatible
> with the way that QEMU s390 PCI instruction interception and
> vfio_pci_bar_rw break up I/O operations into 8B and 4B operations -- ISM
> devices have particular requirements in regards to the alignment, size and
> order of writes performed.  Furthermore, they require that legacy/non-MIO
> s390 PCI instructions are used, which is also not guaranteed when the I/O
> is passed through the typical userspace channels.

The part about the non-MIO instructions confuses me. How can MIO
instructions be generated with the current code, and why does changing
the write pattern help?

> 
> As a result, this patchset proposes a new VFIO region to allow a guest to
> pass certain PCI instruction intercepts directly to the s390 host kernel
> PCI layer for exeuction, pinning the guest buffer in memory briefly in
> order to execute the requested PCI instruction.
> 
> Matthew Rosato (4):
>   s390/pci: track alignment/length strictness for zpci_dev
>   vfio-pci/zdev: Pass the relaxed alignment flag
>   s390/pci: Get hardware-reported max store block length
>   vfio-pci/zdev: Introduce the zPCI I/O vfio region
> 
>  arch/s390/include/asm/pci.h |   4 +-
>  arch/s390/include/asm/pci_clp.h |   7 +-
>  arch/s390/pci/pci_clp.c |   2 +
>  drivers/vfio/pci/vfio_pci.c |   8 ++
>  drivers/vfio/pci/vfio_pci_private.h |   6 ++
>  drivers/vfio/pci/vfio_pci_zdev.c| 160 
> 
>  include/uapi/linux/vfio.h   |   4 +
>  include/uapi/linux/vfio_zdev.h  |  33 
>  8 files changed, 221 insertions(+), 3 deletions(-)
> 



Re: [RFC 1/4] s390/pci: track alignment/length strictness for zpci_dev

2020-12-10 Thread Cornelia Huck
On Wed,  9 Dec 2020 15:27:47 -0500
Matthew Rosato  wrote:

> Some zpci device types (e.g., ISM) follow different rules for length
> and alignment of pci instructions.  Recognize this and keep track of
> it in the zpci_dev.
> 
> Signed-off-by: Matthew Rosato 
> Reviewed-by: Niklas Schnelle 
> Reviewed-by: Pierre Morel 
> ---
>  arch/s390/include/asm/pci.h | 3 ++-
>  arch/s390/include/asm/pci_clp.h | 4 +++-
>  arch/s390/pci/pci_clp.c | 1 +
>  3 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
> index 2126289..f16ffba 100644
> --- a/arch/s390/include/asm/pci.h
> +++ b/arch/s390/include/asm/pci.h
> @@ -133,7 +133,8 @@ struct zpci_dev {
>   u8  has_hp_slot : 1;
>   u8  is_physfn   : 1;
>   u8  util_str_avail  : 1;
> - u8  reserved: 4;
> + u8  relaxed_align   : 1;
> + u8  reserved: 3;
>   unsigned intdevfn;  /* DEVFN part of the RID*/
>  
>   struct mutex lock;
> diff --git a/arch/s390/include/asm/pci_clp.h b/arch/s390/include/asm/pci_clp.h
> index 1f4b666..9fb7cbf 100644
> --- a/arch/s390/include/asm/pci_clp.h
> +++ b/arch/s390/include/asm/pci_clp.h
> @@ -150,7 +150,9 @@ struct clp_rsp_query_pci_grp {
>   u16 :  4;
>   u16 noi : 12;   /* number of interrupts */
>   u8 version;
> - u8  :  6;
> + u8  :  4;
> + u8 relaxed_align:  1;   /* Relax length and alignment rules */
> + u8  :  1;
>   u8 frame:  1;
>   u8 refresh  :  1;   /* TLB refresh mode */
>   u16 reserved2;
> diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c
> index 153720d..630f8fc 100644
> --- a/arch/s390/pci/pci_clp.c
> +++ b/arch/s390/pci/pci_clp.c
> @@ -103,6 +103,7 @@ static void clp_store_query_pci_fngrp(struct zpci_dev 
> *zdev,
>   zdev->max_msi = response->noi;
>   zdev->fmb_update = response->mui;
>   zdev->version = response->version;
> + zdev->relaxed_align = response->relaxed_align;
>  
>   switch (response->version) {
>   case 1:

Hm, what does that 'relaxed alignment' imply? Is that something that
can apply to emulated devices as well?



Re: [PATCH v2] vfio iommu type1: Improve vfio_iommu_type1_pin_pages performance

2020-12-09 Thread Cornelia Huck
On Tue, 8 Dec 2020 21:55:53 +0800
"xuxiaoyang (C)"  wrote:

> On 2020/11/21 15:58, xuxiaoyang (C) wrote:
> > vfio_pin_pages() accepts an array of unrelated iova pfns and processes
> > each to return the physical pfn.  When dealing with large arrays of
> > contiguous iovas, vfio_iommu_type1_pin_pages is very inefficient because
> > it is processed page by page.In this case, we can divide the iova pfn
> > array into multiple continuous ranges and optimize them.  For example,
> > when the iova pfn array is {1,5,6,7,9}, it will be divided into three
> > groups {1}, {5,6,7}, {9} for processing.  When processing {5,6,7}, the
> > number of calls to pin_user_pages_remote is reduced from 3 times to once.
> > For single page or large array of discontinuous iovas, we still use
> > vfio_pin_page_external to deal with it to reduce the performance loss
> > caused by refactoring.
> > 
> > Signed-off-by: Xiaoyang Xu 

(...)

> 
> hi Cornelia Huck, Eric Farman, Zhenyu Wang, Zhi Wang
> 
> vfio_pin_pages() accepts an array of unrelated iova pfns and processes
> each to return the physical pfn.  When dealing with large arrays of
> contiguous iovas, vfio_iommu_type1_pin_pages is very inefficient because
> it is processed page by page.  In this case, we can divide the iova pfn
> array into multiple continuous ranges and optimize them.  I have a set
> of performance test data for reference.
> 
> The patch was not applied
> 1 page   512 pages
> no huge pages: 1638ns   223651ns
> THP:   1668ns   222330ns
> HugeTLB:   1526ns   208151ns
> 
> The patch was applied
> 1 page   512 pages
> no huge pages   1735ns   167286ns
> THP:   1934ns   126900ns
> HugeTLB:   1713ns   102188ns
> 
> As Alex Williamson said, this patch lacks proof that it works in the
> real world. I think you will have some valuable opinions.

Looking at this from the vfio-ccw angle, I'm not sure how much this
would buy us, as we deal with IDAWs, which are designed so that they
can be non-contiguous. I guess this depends a lot on what the guest
does.

Eric, any opinion? Do you maybe also happen to have a test setup that
mimics workloads actually seen in the real world?



Re: [PATCH] s390/vfio-ap: Clean up vfio_ap resources when KVM pointer invalidated

2020-12-04 Thread Cornelia Huck
On Fri, 4 Dec 2020 11:48:24 -0500
Tony Krowiak  wrote:

> On 12/3/20 12:55 PM, Halil Pasic wrote:
> > On Wed,  2 Dec 2020 18:41:01 -0500
> > Tony Krowiak  wrote:
> >  
> >> The vfio_ap device driver registers a group notifier with VFIO when the
> >> file descriptor for a VFIO mediated device for a KVM guest is opened to
> >> receive notification that the KVM pointer is set (VFIO_GROUP_NOTIFY_SET_KVM
> >> event). When the KVM pointer is set, the vfio_ap driver stashes the pointer
> >> and calls the kvm_get_kvm() function to increment its reference counter.
> >> When the notifier is called to make notification that the KVM pointer has
> >> been set to NULL, the driver should clean up any resources associated with
> >> the KVM pointer and decrement its reference counter. The current
> >> implementation does not take care of this clean up.
> >>
> >> Signed-off-by: Tony Krowiak   
> > Do we need a Fixes tag? Do we need this backported? In my opinion
> > this is necessary since the interrupt patches.  
> 
> I'll put in a fixes tag:
> Fixes: 258287c994de (s390: vfio-ap: implement mediated device open callback)

The canonical format would be

Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")

> 
> Yes, this should probably be backported.
> 
> >  
> >> ---
> >>   drivers/s390/crypto/vfio_ap_ops.c | 21 +
> >>   1 file changed, 13 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
> >> b/drivers/s390/crypto/vfio_ap_ops.c
> >> index e0bde8518745..eeb9c9130756 100644
> >> --- a/drivers/s390/crypto/vfio_ap_ops.c
> >> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> >> @@ -1083,6 +1083,17 @@ static int vfio_ap_mdev_iommu_notifier(struct 
> >> notifier_block *nb,
> >>return NOTIFY_DONE;
> >>   }
> >>   
> >> +static void vfio_ap_mdev_put_kvm(struct ap_matrix_mdev *matrix_mdev)  
> > I don't like the name. The function does more that put_kvm. Maybe
> > something  like _disconnect_kvm()?  
> Since the vfio_ap_mdev_set_kvm() function is called by the
> notifier when the KVM pointer is set, how about:
> 
> vfio_ap_mdev_unset_kvm()
> 
> for when the KVM pointer is nullified?

Sounds good to me.



Re: [PATCH v1 2/5] vfio: platform: Switch to use platform_get_mem_or_io_resource()

2020-12-03 Thread Cornelia Huck
On Tue, 27 Oct 2020 19:58:03 +0200
Andy Shevchenko  wrote:

> Switch to use new platform_get_mem_or_io_resource() instead of
> home grown analogue.
> 
> Cc: Eric Auger 
> Cc: Alex Williamson 
> Cc: Cornelia Huck 
> Cc: k...@vger.kernel.org
> Signed-off-by: Andy Shevchenko 
> ---
>  drivers/vfio/platform/vfio_platform.c | 13 +
>  1 file changed, 1 insertion(+), 12 deletions(-)

Reviewed-by: Cornelia Huck 



Re: [PATCH v1 1/5] driver core: platform: Introduce platform_get_mem_or_io_resource()

2020-12-03 Thread Cornelia Huck
On Tue, 27 Oct 2020 19:58:02 +0200
Andy Shevchenko  wrote:

> There are at least few existing users of the proposed API which
> retrieves either MEM or IO resource from platform device.
> 
> Make it common to utilize in the existing and new users.
> 
> Signed-off-by: Andy Shevchenko 
> Cc: Eric Auger 
> Cc: Alex Williamson 
> Cc: Cornelia Huck 
> Cc: k...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Cc: Peng Hao 
> Cc: Arnd Bergmann 
> ---
>  include/linux/platform_device.h | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
> index 77a2aada106d..eb8d74744e29 100644
> --- a/include/linux/platform_device.h
> +++ b/include/linux/platform_device.h
> @@ -52,6 +52,19 @@ extern struct device platform_bus;
>  
>  extern struct resource *platform_get_resource(struct platform_device *,
> unsigned int, unsigned int);
> +static inline
> +struct resource *platform_get_mem_or_io_resource(struct platform_device 
> *pdev,

Minor nit: If I would want to break up the long line, I'd use

static inline struct resource *
platform_get_mem_or_io_resource(...)

> +  unsigned int num)
> +{
> + struct resource *res;
> +
> + res = platform_get_resource(pdev, IORESOURCE_MEM, num);
> + if (res)
> + return res;
> +
> + return platform_get_resource(pdev, IORESOURCE_IO, num);
> +}
> +
>  extern struct device *
>  platform_find_device_by_driver(struct device *start,
>  const struct device_driver *drv);

Reviewed-by: Cornelia Huck 



Re: [PATCH] s390/vfio-ap: Clean up vfio_ap resources when KVM pointer invalidated

2020-12-03 Thread Cornelia Huck
On Wed,  2 Dec 2020 18:41:01 -0500
Tony Krowiak  wrote:

> The vfio_ap device driver registers a group notifier with VFIO when the
> file descriptor for a VFIO mediated device for a KVM guest is opened to
> receive notification that the KVM pointer is set (VFIO_GROUP_NOTIFY_SET_KVM
> event). When the KVM pointer is set, the vfio_ap driver stashes the pointer
> and calls the kvm_get_kvm() function to increment its reference counter.
> When the notifier is called to make notification that the KVM pointer has
> been set to NULL, the driver should clean up any resources associated with
> the KVM pointer and decrement its reference counter. The current
> implementation does not take care of this clean up.
> 
> Signed-off-by: Tony Krowiak 
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 21 +
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
> b/drivers/s390/crypto/vfio_ap_ops.c
> index e0bde8518745..eeb9c9130756 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1083,6 +1083,17 @@ static int vfio_ap_mdev_iommu_notifier(struct 
> notifier_block *nb,
>   return NOTIFY_DONE;
>  }
>  
> +static void vfio_ap_mdev_put_kvm(struct ap_matrix_mdev *matrix_mdev)
> +{
> + if (matrix_mdev->kvm) {
> + kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> + matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> + vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
> + kvm_put_kvm(matrix_mdev->kvm);
> + matrix_mdev->kvm = NULL;
> + }
> +}
> +
>  static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>  unsigned long action, void *data)
>  {
> @@ -1095,7 +1106,7 @@ static int vfio_ap_mdev_group_notifier(struct 
> notifier_block *nb,
>   matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
>  
>   if (!data) {
> - matrix_mdev->kvm = NULL;
> + vfio_ap_mdev_put_kvm(matrix_mdev);

Hm. I'm wondering whether you need to hold the maxtrix_dev lock here as
well?

>   return NOTIFY_OK;
>   }
>  
> @@ -1222,13 +1233,7 @@ static void vfio_ap_mdev_release(struct mdev_device 
> *mdev)
>   struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  
>   mutex_lock(_dev->lock);
> - if (matrix_mdev->kvm) {
> - kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> - matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> - vfio_ap_mdev_reset_queues(mdev);
> - kvm_put_kvm(matrix_mdev->kvm);
> - matrix_mdev->kvm = NULL;
> - }
> + vfio_ap_mdev_put_kvm(matrix_mdev);
>   mutex_unlock(_dev->lock);
>  
>   vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,



Re: [PATCH] s390: cio: fix use-after-free in ccw_device_destroy_console

2020-12-01 Thread Cornelia Huck
On Tue, 1 Dec 2020 14:31:50 +0800
Qinglang Miao  wrote:

> put_device calls release function which do kfree() inside.
> So following use of sch would cause use-after-free bugs.
> 
> Fix these by simply adjusting the position of put_device.
> 
> Fixes: 37db8985b211 ("s390/cio: add basic protected virtualization support")
> Reported-by: Hulk Robot 
> Suggested-by: Cornelia Huck 
> Signed-off-by: Qinglang Miao 
> ---
>  This patch is indeed a v2 of older one. Considering that the
>  patch's name has changed, I think a normal prefix 'PATCH' is
>  better.
> 
>  drivers/s390/cio/device.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Cornelia Huck 



Re: [PATCH] scsi: zfcp: fix use-after-free in zfcp_unit_remove

2020-11-26 Thread Cornelia Huck
On Thu, 26 Nov 2020 09:27:41 +0800
Qinglang Miao  wrote:

> 在 2020/11/26 1:06, Benjamin Block 写道:
> > On Fri, Nov 20, 2020 at 03:48:54PM +0800, Qinglang Miao wrote:  
> >> kfree(port) is called in put_device(>dev) so that following
> >> use would cause use-after-free bug.
> >>
> >> The former put_device is redundant for device_unregister contains
> >> put_device already. So just remove it to fix this.
> >>
> >> Fixes: 86bdf218a717 ("[SCSI] zfcp: cleanup unit sysfs attribute usage")
> >> Reported-by: Hulk Robot 
> >> Signed-off-by: Qinglang Miao 
> >> ---
> >>   drivers/s390/scsi/zfcp_unit.c | 2 --
> >>   1 file changed, 2 deletions(-)
> >>
> >> diff --git a/drivers/s390/scsi/zfcp_unit.c b/drivers/s390/scsi/zfcp_unit.c
> >> index e67bf7388..664b77853 100644
> >> --- a/drivers/s390/scsi/zfcp_unit.c
> >> +++ b/drivers/s390/scsi/zfcp_unit.c
> >> @@ -255,8 +255,6 @@ int zfcp_unit_remove(struct zfcp_port *port, u64 
> >> fcp_lun)
> >>scsi_device_put(sdev);
> >>}
> >>   
> >> -  put_device(>dev);
> >> -
> >>device_unregister(>dev);  
> >>  >>return 0;  
> > 
> > Same as in the other mail for `zfcp_sysfs_port_remove_store()`. We
> > explicitly get a new ref in `_zfcp_unit_find()`, so we also need to put
> > that away again.
> >  
> Sorry, Benjamin, I don't think so, because device_unregister calls 
> put_device inside.
> 
> It seem's that another put_device before or after device_unregister is 
> useless and even might cause an use-after-free.

The issue here (and in the other patches that I had commented on) is
that the references have different origins. device_register() acquires
a reference, and that reference is given up when you call
device_unregister(). However, the code here grabs an extra reference,
and it of course has to give it up again when it no longer needs it.

This is something that is not that easy to spot by an automated check,
I guess?



Re: [PATCH v11 05/14] s390/vfio-ap: implement in-use callback for vfio_ap driver

2020-11-23 Thread Cornelia Huck
On Sat, 14 Nov 2020 00:47:22 +0100
Halil Pasic  wrote:

> On Fri, 13 Nov 2020 12:14:22 -0500
> Tony Krowiak  wrote:
> [..]
> > >>   }
> > >>   
> > >> +#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx 
> > >> " \
> > >> + "already assigned to %s"
> > >> +
> > >> +static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
> > >> + unsigned long *apm,
> > >> + unsigned long *aqm)
> > >> +{
> > >> +unsigned long apid, apqi;
> > >> +
> > >> +for_each_set_bit_inv(apid, apm, AP_DEVICES)
> > >> +for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
> > >> +pr_err(MDEV_SHARING_ERR, apid, apqi, 
> > >> mdev_name);  
> > > Isn't error rather severe for this? For my taste even warning would be
> > > severe for this.  
> > 
> > The user only sees a EADDRINUSE returned from the sysfs interface,
> > so Conny asked if I could log a message to indicate which APQNs are
> > in use by which mdev. I can change this to an info message, but it
> > will be missed if the log level is set higher. Maybe Conny can put in
> > her two cents here since she asked for this.
> >   
> 
> I'm looking forward to Conny's opinion. :)

(only just saw this; -ETOOMANYEMAILS)

It is probably not an error in the sense of "things are broken, this
cannot work"; but I'd consider this at least a warning "this does not
work as you intended".



Re: [PATCH] s390: cmf: fix use-after-free in enable_cmf

2020-11-20 Thread Cornelia Huck
On Fri, 20 Nov 2020 15:48:50 +0800
Qinglang Miao  wrote:

> kfree(cdev) is called in put_device in the error branch. So that
> device_unlock(>dev) would raise a use-after-free bug. In fact,
> there's no need to call device_unlock after put_device.
> 
> Fix it by adding simply return after put_device.
> 
> Fixes: a6ef15652d26 ("s390/cio: fix use after free in cmb processing")
> Reported-by: Hulk Robot 
> Signed-off-by: Qinglang Miao 
> ---
>  drivers/s390/cio/cmf.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/s390/cio/cmf.c b/drivers/s390/cio/cmf.c
> index 72dd2471e..e95ca476f 100644
> --- a/drivers/s390/cio/cmf.c
> +++ b/drivers/s390/cio/cmf.c
> @@ -1149,9 +1149,12 @@ int enable_cmf(struct ccw_device *cdev)
>   sysfs_remove_group(>dev.kobj, cmbops->attr_group);
>   cmbops->free(cdev);
>   }
> +
>  out:
> - if (ret)
> + if (ret) {
>   put_device(>dev);

The put_device() here undoes a get_device() further up in the function.
There is at least one more reference remaining, held by the caller of
enable_cmf(). Returning here would actually introduce a bug (missing
unlock).

> + return ret;
> + }
>  out_unlock:
>   device_unlock(>dev);
>   return ret;



Re: [PATCH] s390: cio: fix two use-after-free bugs in device.c

2020-11-19 Thread Cornelia Huck
On Fri, 20 Nov 2020 15:48:49 +0800
Qinglang Miao  wrote:

> put_device calls release function which do kfree() inside.
> So following use of sch would cause use-after-free bugs.
> 
> Fix these by simply adjusting the position of put_device.
> 
> Fixes: 37db8985b211 ("s390/cio: add basic protected virtualization support")
> Fixes: 74bd0d859dc3 ("s390/cio: fix unlocked access of online member")
> Reported-by: Hulk Robot 
> Signed-off-by: Qinglang Miao 
> ---
>  drivers/s390/cio/device.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/s390/cio/device.c b/drivers/s390/cio/device.c
> index b29fe8d50..69492417b 100644
> --- a/drivers/s390/cio/device.c
> +++ b/drivers/s390/cio/device.c
> @@ -1664,10 +1664,10 @@ void __init ccw_device_destroy_console(struct 
> ccw_device *cdev)
>   struct io_subchannel_private *io_priv = to_io_private(sch);
>  
>   set_io_private(sch, NULL);
> - put_device(>dev);
> - put_device(>dev);
>   dma_free_coherent(>dev, sizeof(*io_priv->dma_area),
> io_priv->dma_area, io_priv->dma_area_dma);
> + put_device(>dev);
> + put_device(>dev);

That change looks reasonable.

>   kfree(io_priv);
>  }
>  
> @@ -1774,8 +1774,8 @@ static int ccw_device_remove(struct device *dev)
> ret, cdev->private->dev_id.ssid,
> cdev->private->dev_id.devno);
>   /* Give up reference obtained in ccw_device_set_online(). */
> - put_device(>dev);
>   spin_lock_irq(cdev->ccwlock);
> + put_device(>dev);

As the comment above states, the put_device() gives up the reference
obtained in ccw_device_set_online(). There's at least one more
reference remaining (held by the caller of the remove function). Moving
the put_device() does not fix anything here.

>   }
>   ccw_device_set_timeout(cdev, 0);
>   cdev->drv = NULL;



Re: [PATCH v4 04/27] s390: fix kernel-doc markups

2020-11-16 Thread Cornelia Huck
On Mon, 16 Nov 2020 11:18:00 +0100
Mauro Carvalho Chehab  wrote:

> fix one typo:
>   ccw driver -> ccw_driver
> 
> and one function rename.
> 
> Signed-off-by: Mauro Carvalho Chehab 
> ---
>  arch/s390/include/asm/ccwdev.h | 2 +-
>  arch/s390/include/asm/cio.h| 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Cornelia Huck 



Re: [PATCH v3 4/5] vfio-pci/zdev: Add zPCI capabilities to VFIO_DEVICE_GET_INFO

2020-10-09 Thread Cornelia Huck
On Wed,  7 Oct 2020 14:56:23 -0400
Matthew Rosato  wrote:

> Define a new configuration entry VFIO_PCI_ZDEV for VFIO/PCI.
> 
> When this s390-only feature is configured we add capabilities to the
> VFIO_DEVICE_GET_INFO ioctl that describe features of the associated
> zPCI device and its underlying hardware.
> 
> This patch is based on work previously done by Pierre Morel.
> 
> Signed-off-by: Matthew Rosato 
> ---
>  drivers/vfio/pci/Kconfig|  13 
>  drivers/vfio/pci/Makefile   |   1 +
>  drivers/vfio/pci/vfio_pci.c |  37 ++
>  drivers/vfio/pci/vfio_pci_private.h |  12 +++
>  drivers/vfio/pci/vfio_pci_zdev.c| 143 
> 
>  5 files changed, 206 insertions(+)
>  create mode 100644 drivers/vfio/pci/vfio_pci_zdev.c

With the compilation fix,

Reviewed-by: Cornelia Huck 



Re: [PATCH v3 3/5] vfio: Introduce capability definitions for VFIO_DEVICE_GET_INFO

2020-10-09 Thread Cornelia Huck
On Wed,  7 Oct 2020 14:56:22 -0400
Matthew Rosato  wrote:

> Allow the VFIO_DEVICE_GET_INFO ioctl to include a capability chain.
> Add a flag indicating capability chain support, and introduce the
> definitions for the first set of capabilities which are specified to
> s390 zPCI devices.
> 
> Signed-off-by: Matthew Rosato 
> ---
>  include/uapi/linux/vfio.h  | 11 ++
>  include/uapi/linux/vfio_zdev.h | 78 
> ++
>  2 files changed, 89 insertions(+)
>  create mode 100644 include/uapi/linux/vfio_zdev.h

Reviewed-by: Cornelia Huck 



Re: [PATCH v2 5/5] MAINTAINERS: Add entry for s390 vfio-pci

2020-10-06 Thread Cornelia Huck
On Fri,  2 Oct 2020 16:00:44 -0400
Matthew Rosato  wrote:

> Add myself to cover s390-specific items related to vfio-pci.
> 
> Signed-off-by: Matthew Rosato 
> ---
>  MAINTAINERS | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 190c7fa..389c4ad 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -15162,6 +15162,14 @@ F:   Documentation/s390/vfio-ccw.rst
>  F:   drivers/s390/cio/vfio_ccw*
>  F:   include/uapi/linux/vfio_ccw.h
>  
> +S390 VFIO-PCI DRIVER
> +M:   Matthew Rosato 
> +L:   linux-s...@vger.kernel.org
> +L:   k...@vger.kernel.org
> +S:   Supported
> +F:   drivers/vfio/pci/vfio_pci_zdev.c
> +F:   include/uapi/linux/vfio_zdev.h
> +
>  S390 ZCRYPT DRIVER
>  M:   Harald Freudenberger 
>  L:   linux-s...@vger.kernel.org

Acked-by: Cornelia Huck 



Re: [PATCH v2 2/5] s390/pci: track whether util_str is valid in the zpci_dev

2020-10-06 Thread Cornelia Huck
On Fri,  2 Oct 2020 16:00:41 -0400
Matthew Rosato  wrote:

> We'll need to keep track of whether or not the byte string in util_str is
> valid and thus needs to be passed to a vfio-pci passthrough device.
> 
> Signed-off-by: Matthew Rosato 
> Acked-by: Niklas Schnelle 
> Acked-by: Christian Borntraeger 
> ---
>  arch/s390/include/asm/pci.h | 3 ++-
>  arch/s390/pci/pci_clp.c | 1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)

FWIW:

Acked-by: Cornelia Huck 



Re: [PATCH v2 3/5] vfio-pci/zdev: define the vfio_zdev header

2020-10-05 Thread Cornelia Huck
On Mon, 5 Oct 2020 12:16:10 -0400
Matthew Rosato  wrote:

> On 10/5/20 12:01 PM, Cornelia Huck wrote:
> > On Mon, 5 Oct 2020 09:52:25 -0400
> > Matthew Rosato  wrote:
> >   
> >> On 10/2/20 5:44 PM, Alex Williamson wrote:  
> >   
> >>> Can you discuss why a region with embedded capability chain is a better
> >>> solution than extending the VFIO_DEVICE_GET_INFO ioctl to support a
> >>> capability chain and providing this info there?  This all appears to be
> >>> read-only info, so what's the benefit of duplicating yet another  
> >>
> >> It is indeed read-only info, and the device region was defined as such.
> >>
> >> I would not necessarily be opposed to extending VFIO_DEVICE_GET_INFO
> >> with these defined as capabilities; I'd say a primary motivating factor
> >> to putting these in their own region was to avoid stuffing a bunch of
> >> s390-specific capabilities into a general-purpose ioctl response.  
> > 
> > Can't you make the zdev code register the capabilities? That would put
> > them nicely into their own configurable part.
> >   
> 
> I can still keep the code that adds these capabilities in the zdev .c 
> file, thus meaning they will only be added for s390 zpci devices -- but 
> the actual definition of them should probably instead be in vfio.h, no? 
> (maybe that's what you mean, but let's lay it out just in case)
> 
> The capability IDs would be shared with any other potential user of 
> VFIO_DEVICE_GET_INFO (I guess there is precedent for this already, 
> nvlink2 does this for vfio_region_info, see 
> VFIO_REGION_INFO_CAP_NVLINK2_SSATGT as an example).
> 
> Today, ZPCI would be the only users of VFIO_DEVICE_GET_INFO capability 
> chains.  Tomorrow, some other type might use them too.  Unless we want 
> to put a stake in the ground that says there will never be a case for a 
> capability that all devices share on VFIO_DEVICE_GET_INFO, I think we 
> should keep the IDs unique and define the capabilities in vfio.h but do 
> the corresponding add_capability() calls from a zdev-specific file.

Agreed. We should have enough space for multiple users, and I do not
consider reserving the IDs cluttering.



Re: [PATCH v2 3/5] vfio-pci/zdev: define the vfio_zdev header

2020-10-05 Thread Cornelia Huck
On Mon, 5 Oct 2020 09:52:25 -0400
Matthew Rosato  wrote:

> On 10/2/20 5:44 PM, Alex Williamson wrote:

> > Can you discuss why a region with embedded capability chain is a better
> > solution than extending the VFIO_DEVICE_GET_INFO ioctl to support a
> > capability chain and providing this info there?  This all appears to be
> > read-only info, so what's the benefit of duplicating yet another  
> 
> It is indeed read-only info, and the device region was defined as such.
> 
> I would not necessarily be opposed to extending VFIO_DEVICE_GET_INFO 
> with these defined as capabilities; I'd say a primary motivating factor 
> to putting these in their own region was to avoid stuffing a bunch of 
> s390-specific capabilities into a general-purpose ioctl response.

Can't you make the zdev code register the capabilities? That would put
them nicely into their own configurable part.

> 
> But if you're OK with that notion, I can give that a crack in v3.
> 
> > capability chain in a region?  It would also be possible to define four
> > separate device specific regions, one for each of these capabilities
> > rather than creating this chain.  It just seems like a strange approach  
> 
> I'm not sure if creating separate regions would be the right approach 
> though; these are just the first 4.  There will definitely be additional 
> capabilities in support of new zPCI features moving forward, I'm not 
> sure how many regions we really want to end up with.  Some might be as 
> small as a single field, which seems more in-line with capabilities vs 
> an entire region.

If we are expecting more of these in the future, going with GET_INFO
capabilities when adding new ones seems like the best approach.



Re: [RFC PATCH 0/3] KVM: Introduce "VM bugged" concept

2020-09-29 Thread Cornelia Huck
On Wed, 23 Sep 2020 15:45:27 -0700
Sean Christopherson  wrote:

> This series introduces a concept we've discussed a few times in x86 land.
> The crux of the problem is that x86 has a few cases where KVM could
> theoretically encounter a software or hardware bug deep in a call stack
> without any sane way to propagate the error out to userspace.
> 
> Another use case would be for scenarios where letting the VM live will
> do more harm than good, e.g. we've been using KVM_BUG_ON for early TDX
> enabling as botching anything related to secure paging all but guarantees
> there will be a flood of WARNs and error messages because lower level PTE
> operations will fail if an upper level operation failed.
> 
> The basic idea is to WARN_ONCE if a bug is encountered, kick all vCPUs out
> to userspace, and mark the VM as bugged so that no ioctls() can be issued
> on the VM or its devices/vCPUs.

I think this makes a lot of sense.

Are there other user space interactions where we want to generate an
error for a bugged VM, e.g. via eventfd?

And can we make the 'bugged' information available to user space in a
structured way?

> 
> RFC as I've done nowhere near enough testing to verify that rejecting the
> ioctls(), evicting running vCPUs, etc... works as intended.
> 
> Sean Christopherson (3):
>   KVM: Export kvm_make_all_cpus_request() for use in marking VMs as
> bugged
>   KVM: Add infrastructure and macro to mark VM as bugged
>   KVM: x86: Use KVM_BUG/KVM_BUG_ON to handle bugs that are fatal to the
> VM
> 
>  arch/x86/kvm/svm/svm.c   |  2 +-
>  arch/x86/kvm/vmx/vmx.c   | 23 
>  arch/x86/kvm/x86.c   |  4 
>  include/linux/kvm_host.h | 45 
>  virt/kvm/kvm_main.c  | 11 +-
>  5 files changed, 61 insertions(+), 24 deletions(-)
> 



Re: [PATCH 4/4] vfio-pci/zdev: use a device region to retrieve zPCI information

2020-09-22 Thread Cornelia Huck
On Sat, 19 Sep 2020 11:28:38 -0400
Matthew Rosato  wrote:

> Define a new configuration entry VFIO_PCI_ZDEV for VFIO/PCI.
> 
> When this s390-only feature is configured we initialize a new device
> region, VFIO_REGION_SUBTYPE_IBM_ZPCI_CLP, to hold information provided
> by the underlying hardware.
> 
> This patch is based on work previously done by Pierre Morel.
> 
> Signed-off-by: Matthew Rosato 
> ---
>  drivers/vfio/pci/Kconfig|  13 ++
>  drivers/vfio/pci/Makefile   |   1 +
>  drivers/vfio/pci/vfio_pci.c |   8 ++
>  drivers/vfio/pci/vfio_pci_private.h |  10 ++
>  drivers/vfio/pci/vfio_pci_zdev.c| 242 
> 

Maybe you want to add yourself to MAINTAINERS for the zdev-specific
files? You're probably better suited to review changes to the
zpci-specific code :)

>  5 files changed, 274 insertions(+)
>  create mode 100644 drivers/vfio/pci/vfio_pci_zdev.c

(...)

> +int vfio_pci_zdev_init(struct vfio_pci_device *vdev)
> +{
> + struct vfio_region_zpci_info *region;
> + struct zpci_dev *zdev;
> + size_t clp_offset;
> + int size;
> + int ret;
> +
> + if (!vdev->pdev->bus)
> + return -ENODEV;
> +
> + zdev = to_zpci(vdev->pdev);
> + if (!zdev)
> + return -ENODEV;
> +
> + /* Calculate size needed for all supported CLP features  */
> + size = sizeof(*region) +
> +sizeof(struct vfio_region_zpci_info_qpci) +
> +sizeof(struct vfio_region_zpci_info_qpcifg) +
> +(sizeof(struct vfio_region_zpci_info_util) + CLP_UTIL_STR_LEN) +
> +(sizeof(struct vfio_region_zpci_info_pfip) +
> + CLP_PFIP_NR_SEGMENTS);
> +
> + region = kmalloc(size, GFP_KERNEL);
> + if (!region)
> + return -ENOMEM;
> +
> + /* Fill in header */
> + region->argsz = size;
> + clp_offset = region->offset = sizeof(struct vfio_region_zpci_info);
> +
> + /* Fill the supported CLP features */
> + clp_offset = vfio_pci_zdev_add_qpci(zdev, region, clp_offset);
> + clp_offset = vfio_pci_zdev_add_qpcifg(zdev, region, clp_offset);
> + clp_offset = vfio_pci_zdev_add_util(zdev, region, clp_offset);
> + clp_offset = vfio_pci_zdev_add_pfip(zdev, region, clp_offset);

So, the regions are populated once. Can any of the values in the
hardware structures be modified by a guest? Or changed from the
hardware side?

> +
> + ret = vfio_pci_register_dev_region(vdev,
> + PCI_VENDOR_ID_IBM | VFIO_REGION_TYPE_PCI_VENDOR_TYPE,
> + VFIO_REGION_SUBTYPE_IBM_ZPCI_CLP, _pci_zdev_regops,
> + size, VFIO_REGION_INFO_FLAG_READ, region);
> + if (ret)
> + kfree(region);
> +
> + return ret;
> +}



Re: [PATCH 3/4] vfio-pci/zdev: define the vfio_zdev header

2020-09-22 Thread Cornelia Huck
On Sat, 19 Sep 2020 11:28:37 -0400
Matthew Rosato  wrote:

> We define a new device region in vfio.h to be able to get the ZPCI CLP
> information by reading this region from userspace.
> 
> We create a new file, vfio_zdev.h to define the structure of the new
> region defined in vfio.h
> 
> Signed-off-by: Matthew Rosato 
> ---
>  include/uapi/linux/vfio.h  |   5 ++
>  include/uapi/linux/vfio_zdev.h | 116 
> +
>  2 files changed, 121 insertions(+)
>  create mode 100644 include/uapi/linux/vfio_zdev.h
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 9204705..65eb367 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -326,6 +326,11 @@ struct vfio_region_info_cap_type {
>   * to do TLB invalidation on a GPU.
>   */
>  #define VFIO_REGION_SUBTYPE_IBM_NVLINK2_ATSD (1)
> +/*
> + * IBM zPCI specific hardware feature information for a devcie.  The contents
> + * of this region are mapped by struct vfio_region_zpci_info.
> + */
> +#define VFIO_REGION_SUBTYPE_IBM_ZPCI_CLP (2)

This is not really for a 10de vendor, but for all pci devices accessed
via zpci, isn't it?

We obviously want to avoid collisions here; not really sure how to
cover all possible vendors. Maybe just pick a high number?

>  
>  /* sub-types for VFIO_REGION_TYPE_GFX */
>  #define VFIO_REGION_SUBTYPE_GFX_EDID(1)
> diff --git a/include/uapi/linux/vfio_zdev.h b/include/uapi/linux/vfio_zdev.h
> new file mode 100644
> index 000..c9e4891
> --- /dev/null
> +++ b/include/uapi/linux/vfio_zdev.h
> @@ -0,0 +1,116 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + * Region definition for ZPCI devices
> + *
> + * Copyright IBM Corp. 2020
> + *
> + * Author(s): Pierre Morel 
> + *Matthew Rosato 
> + */
> +
> +#ifndef _VFIO_ZDEV_H_
> +#define _VFIO_ZDEV_H_
> +
> +#include 
> +
> +/**
> + * struct vfio_region_zpci_info - ZPCI information
> + *
> + * This region provides zPCI specific hardware feature information for a
> + * device.
> + *
> + * The ZPCI information structure is presented as a chain of CLP features

"CLP features" == "features returned by the CLP instruction", I guess?
Maybe mention that explicitly?

> + * defined below. argsz provides the size of the entire region, and offset
> + * provides the location of the first CLP feature in the chain.
> + *
> + */
> +struct vfio_region_zpci_info {
> + __u32 argsz;/* Size of entire payload */
> + __u32 offset;   /* Location of first entry */
> +} __packed;

This '__packed' annotation seems redundant. I think that all of these
structures should be defined in a way that packing is unneeded (which
seems to be the case on a quick browse.)

> +
> +/**
> + * struct vfio_region_zpci_info_hdr - ZPCI header information
> + *
> + * This structure is included at the top of each CLP feature to define what
> + * type of CLP feature is presented / the structure version. The next value
> + * defines the offset of the next CLP feature, and is an offset from the very
> + * beginning of the region (vfio_region_zpci_info).
> + *
> + * Each CLP feature must have it's own unique 'id'.

s/it's/its/

Is the 'id' something that is already provided by the CLP instruction?

> + */
> +struct vfio_region_zpci_info_hdr {
> + __u16 id;   /* Identifies the CLP type */
> + __u16   version;/* version of the CLP data */
> + __u32 next; /* Offset of next entry */
> +} __packed;
> +
> +/**
> + * struct vfio_region_zpci_info_qpci - Initial Query PCI information
> + *
> + * This region provides an initial set of data from the Query PCI Function

What does 'initial' mean in this context? Information you get for a
freshly initialized function?

> + * CLP.
> + */
> +#define VFIO_REGION_ZPCI_INFO_QPCI   1
> +
> +struct vfio_region_zpci_info_qpci {
> + struct vfio_region_zpci_info_hdr hdr;
> + __u64 start_dma;/* Start of available DMA addresses */
> + __u64 end_dma;  /* End of available DMA addresses */
> + __u16 pchid;/* Physical Channel ID */
> + __u16 vfn;  /* Virtual function number */
> + __u16 fmb_length;   /* Measurement Block Length (in bytes) */
> + __u8 pft;   /* PCI Function Type */
> + __u8 gid;   /* PCI function group ID */
> +} __packed;
> +
> +
> +/**
> + * struct vfio_region_zpci_info_qpcifg - Initial Query PCI Function Group 
> info
> + *
> + * This region provides an initial set of data from the Query PCI Function
> + * Group CLP.
> + */
> +#define VFIO_REGION_ZPCI_INFO_QPCIFG 2
> +
> +struct vfio_region_zpci_info_qpcifg {
> + struct vfio_region_zpci_info_hdr hdr;
> + __u64 dasm; /* DMA Address space mask */
> + __u64 msi_addr; /* MSI address */
> + __u64 flags;
> +#define VFIO_PCI_ZDEV_FLAGS_REFRESH 1 /* Use program-specified TLB refresh */
> + __u16 mui;  

Re: [PATCH 1/4] s390/pci: stash version in the zpci_dev

2020-09-21 Thread Cornelia Huck
On Mon, 21 Sep 2020 11:44:20 -0400
Matthew Rosato  wrote:

> On 9/21/20 11:01 AM, Cornelia Huck wrote:
> > On Sat, 19 Sep 2020 11:28:35 -0400
> > Matthew Rosato  wrote:
> >   
> >> In preparation for passing the info on to vfio-pci devices, stash the
> >> supported PCI version for the target device in the zpci_dev.  
> > 
> > Hm, what kind of version is that? The version of the zPCI interface?
> > 
> > Inquiring minds want to know :)
> >   
> 
> Ha :) It's related to PCI-SIG spec versions and which one the zPCI 
> facility supports for this device.

Thanks for the info :)

> 
> >>
> >> Signed-off-by: Matthew Rosato 
> >> ---
> >>   arch/s390/include/asm/pci.h | 1 +
> >>   arch/s390/pci/pci_clp.c | 1 +
> >>   2 files changed, 2 insertions(+)  
> >   
> 

FWIW,

Acked-by: Cornelia Huck 



Re: [PATCH 1/4] s390/pci: stash version in the zpci_dev

2020-09-21 Thread Cornelia Huck
On Sat, 19 Sep 2020 11:28:35 -0400
Matthew Rosato  wrote:

> In preparation for passing the info on to vfio-pci devices, stash the
> supported PCI version for the target device in the zpci_dev.

Hm, what kind of version is that? The version of the zPCI interface?

Inquiring minds want to know :)

> 
> Signed-off-by: Matthew Rosato 
> ---
>  arch/s390/include/asm/pci.h | 1 +
>  arch/s390/pci/pci_clp.c | 1 +
>  2 files changed, 2 insertions(+)



Re: [PATCH v2 2/2] vfio/pci: Don't regenerate vconfig for all BARs if !bardirty

2020-09-21 Thread Cornelia Huck
On Mon, 21 Sep 2020 12:51:16 +0800
Zenghui Yu  wrote:

> Now we regenerate vconfig for all the BARs via vfio_bar_fixup(), every time
> any offset of any of them are read. Though BARs aren't re-read regularly,
> the regeneration can be avoid if no BARs had been written since they were

s/avoid/avoided/

> last read, in which case the vdev->bardirty is false.

s/the//

> 
> Let's predicate the vfio_bar_fixup() on the bardirty so that it can return
> immediately if !bardirty.

Maybe

"Let's return immediately in vfio_bar_fixup() if bardirty is false." ?

> 
> Suggested-by: Alex Williamson 
> Signed-off-by: Zenghui Yu 
> ---
> * From v1:
>   - Per Alex's suggestion, let vfio_bar_fixup() test vdev->bardirty to
> avoid doing work if bardirty is false, instead of removing it entirely.
>   - Rewrite the commit message.
> 
>  drivers/vfio/pci/vfio_pci_config.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_config.c 
> b/drivers/vfio/pci/vfio_pci_config.c
> index d98843feddce..5e02ba07e8e8 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
> @@ -467,6 +467,9 @@ static void vfio_bar_fixup(struct vfio_pci_device *vdev)
>   __le32 *vbar;
>   u64 mask;
> 
> + if (!vdev->bardirty)

Finally, bardirty can actually affect something :)

> + return;
> +
>   vbar = (__le32 *)>vconfig[PCI_BASE_ADDRESS_0];
>  
>   for (i = 0; i < PCI_STD_NUM_BARS; i++, vbar++) {

Reviewed-by: Cornelia Huck 



Re: [PATCH] s390/vfio-ap: fix unregister GISC when KVM is already gone results in OOPS

2020-09-21 Thread Cornelia Huck
On Fri, 18 Sep 2020 13:02:34 -0400
Tony Krowiak  wrote:

> Attempting to unregister Guest Interruption Subclass (GISC) when the
> link between the matrix mdev and KVM has been removed results in the
> following:
> 
>"Kernel panic -not syncing: Fatal exception: panic_on_oops"

I'm wondering how we get there (why are we unregistering the gisc if
the mdev and kvm are not yet linked or are already unlinked?), so I
agree that the actual backchain would be helpful here.

> 
> This patch fixes this bug by verifying the matrix mdev and KVM are still
> linked prior to unregistering the GISC.
> 
> Signed-off-by: Tony Krowiak 
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 14 +-
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
> b/drivers/s390/crypto/vfio_ap_ops.c
> index e0bde8518745..847a88642644 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -119,11 +119,15 @@ static void vfio_ap_wait_for_irqclear(int apqn)
>   */
>  static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
>  {
> - if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev)

If checking for ->kvm is the right thing to do, I agree that moving the
check here would be easier to read.

> - kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->saved_isc);
> - if (q->saved_pfn && q->matrix_mdev)
> - vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev),
> -  >saved_pfn, 1);
> + if (q->matrix_mdev) {
> + if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev->kvm)
> + kvm_s390_gisc_unregister(q->matrix_mdev->kvm,
> +  q->saved_isc);
> + if (q->saved_pfn)
> + vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev),
> +  >saved_pfn, 1);
> + }
> +
>   q->saved_pfn = 0;
>   q->saved_isc = VFIO_AP_ISC_INVALID;
>  }



Re: [PATCH v10 07/16] s390/vfio-ap: sysfs attribute to display the guest's matrix

2020-09-17 Thread Cornelia Huck
On Fri, 21 Aug 2020 15:56:07 -0400
Tony Krowiak  wrote:

> The matrix of adapters and domains configured in a guest's CRYCB may
> differ from the matrix of adapters and domains assigned to the matrix mdev,
> so this patch introduces a sysfs attribute to display the matrix of a guest
> using the matrix mdev. For a matrix mdev denoted by $uuid, the crycb for a
> guest using the matrix mdev can be displayed as follows:
> 
>cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix
> 
> If a guest is not using the matrix mdev at the time the crycb is displayed,
> an error (ENODEV) will be returned.
> 
> Signed-off-by: Tony Krowiak 
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 58 +++
>  1 file changed, 58 insertions(+)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
> b/drivers/s390/crypto/vfio_ap_ops.c
> index efb229033f9e..30bf23734af6 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1119,6 +1119,63 @@ static ssize_t matrix_show(struct device *dev, struct 
> device_attribute *attr,
>  }
>  static DEVICE_ATTR_RO(matrix);
>  
> +static ssize_t guest_matrix_show(struct device *dev,
> +  struct device_attribute *attr, char *buf)
> +{
> + struct mdev_device *mdev = mdev_from_dev(dev);
> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> + char *bufpos = buf;
> + unsigned long apid;
> + unsigned long apqi;
> + unsigned long apid1;
> + unsigned long apqi1;
> + unsigned long napm_bits = matrix_mdev->shadow_apcb.apm_max + 1;
> + unsigned long naqm_bits = matrix_mdev->shadow_apcb.aqm_max + 1;
> + int nchars = 0;
> + int n;
> +
> + if (!vfio_ap_mdev_has_crycb(matrix_mdev))
> + return -ENODEV;
> +
> + apid1 = find_first_bit_inv(matrix_mdev->shadow_apcb.apm, napm_bits);
> + apqi1 = find_first_bit_inv(matrix_mdev->shadow_apcb.aqm, naqm_bits);
> +
> + mutex_lock(_dev->lock);
> +
> + if ((apid1 < napm_bits) && (apqi1 < naqm_bits)) {
> + for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm,
> +  napm_bits) {
> + for_each_set_bit_inv(apqi,
> +  matrix_mdev->shadow_apcb.aqm,
> +  naqm_bits) {
> + n = sprintf(bufpos, "%02lx.%04lx\n", apid,
> + apqi);
> + bufpos += n;
> + nchars += n;
> + }
> + }
> + } else if (apid1 < napm_bits) {
> + for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm,
> +  napm_bits) {
> + n = sprintf(bufpos, "%02lx.\n", apid);
> + bufpos += n;
> + nchars += n;
> + }
> + } else if (apqi1 < naqm_bits) {
> + for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm,
> +  naqm_bits) {
> + n = sprintf(bufpos, ".%04lx\n", apqi);
> + bufpos += n;
> + nchars += n;
> + }
> + }
> +
> + mutex_unlock(_dev->lock);
> +
> + return nchars;
> +}

This basically looks like a version of matrix_show() operating on the
shadow apcb. I'm wondering if we could consolidate these two functions
by passing in the structure to operate on as a parameter? Might not be
worth the effort, though.



Re: [PATCH v10 06/16] s390/vfio-ap: introduce shadow APCB

2020-09-17 Thread Cornelia Huck
On Fri, 21 Aug 2020 15:56:06 -0400
Tony Krowiak  wrote:

> The APCB is a field within the CRYCB that provides the AP configuration
> to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
> maintain it for the lifespan of the guest.
> 
> Signed-off-by: Tony Krowiak 
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 32 ++-
>  drivers/s390/crypto/vfio_ap_private.h |  2 ++
>  2 files changed, 29 insertions(+), 5 deletions(-)

(...)

> @@ -1202,13 +1223,12 @@ static int vfio_ap_mdev_group_notifier(struct 
> notifier_block *nb,
>   if (ret)
>   return NOTIFY_DONE;
>  
> - /* If there is no CRYCB pointer, then we can't copy the masks */
> - if (!matrix_mdev->kvm->arch.crypto.crycbd)
> + if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>   return NOTIFY_DONE;
>  
> - kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
> -   matrix_mdev->matrix.aqm,
> -   matrix_mdev->matrix.adm);
> + memcpy(_mdev->shadow_apcb, _mdev->matrix,
> +sizeof(matrix_mdev->shadow_apcb));
> + vfio_ap_mdev_commit_crycb(matrix_mdev);

We are sure that the shadow APCB always matches up as we are the only
ones manipulating the APCB in the CRYCB, right?

>  
>   return NOTIFY_OK;
>  }



Re: [PATCH v10 04/16] s390/zcrypt: driver callback to indicate resource in use

2020-09-17 Thread Cornelia Huck
On Tue, 15 Sep 2020 15:32:35 -0400
Tony Krowiak  wrote:

> On 9/14/20 11:29 AM, Cornelia Huck wrote:
> > On Fri, 21 Aug 2020 15:56:04 -0400
> > Tony Krowiak  wrote:
> >  
> >> Introduces a new driver callback to prevent a root user from unbinding
> >> an AP queue from its device driver if the queue is in use. The intent of
> >> this callback is to provide a driver with the means to prevent a root user
> >> from inadvertently taking a queue away from a matrix mdev and giving it to
> >> the host while it is assigned to the matrix mdev. The callback will
> >> be invoked whenever a change to the AP bus's sysfs apmask or aqmask
> >> attributes would result in one or more AP queues being removed from its
> >> driver. If the callback responds in the affirmative for any driver
> >> queried, the change to the apmask or aqmask will be rejected with a device
> >> in use error.
> >>
> >> For this patch, only non-default drivers will be queried. Currently,
> >> there is only one non-default driver, the vfio_ap device driver. The
> >> vfio_ap device driver facilitates pass-through of an AP queue to a
> >> guest. The idea here is that a guest may be administered by a different
> >> sysadmin than the host and we don't want AP resources to unexpectedly
> >> disappear from a guest's AP configuration (i.e., adapters, domains and
> >> control domains assigned to the matrix mdev). This will enforce the proper
> >> procedure for removing AP resources intended for guest usage which is to
> >> first unassign them from the matrix mdev, then unbind them from the
> >> vfio_ap device driver.
> >>
> >> Signed-off-by: Tony Krowiak 
> >> Reported-by: kernel test robot   
> > This looks a bit odd...  
> 
> I've removed all of those. These kernel test robot errors were flagged
> in the last series. The review comments from the robot suggested
> the reported-by, but I assume that was for patches intended to
> fix those errors, so I am removing these as per Christian's comments.

Yes, I think the Reported-by: mostly makes sense if you include a patch
to fix something on top.

> 
> >  
> >> ---
> >>   drivers/s390/crypto/ap_bus.c | 148 ---
> >>   drivers/s390/crypto/ap_bus.h |   4 +
> >>   2 files changed, 142 insertions(+), 10 deletions(-)
> >>  
> > (...)
> >  
> >> @@ -1107,12 +1118,70 @@ static ssize_t apmask_show(struct bus_type *bus, 
> >> char *buf)
> >>return rc;
> >>   }
> >>   
> >> +static int __verify_card_reservations(struct device_driver *drv, void 
> >> *data)
> >> +{
> >> +  int rc = 0;
> >> +  struct ap_driver *ap_drv = to_ap_drv(drv);
> >> +  unsigned long *newapm = (unsigned long *)data;
> >> +
> >> +  /*
> >> +   * No need to verify whether the driver is using the queues if it is the
> >> +   * default driver.
> >> +   */
> >> +  if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
> >> +  return 0;
> >> +
> >> +  /* The non-default driver's module must be loaded */
> >> +  if (!try_module_get(drv->owner))
> >> +  return 0;
> >> +
> >> +  if (ap_drv->in_use)
> >> +  if (ap_drv->in_use(newapm, ap_perms.aqm))
> >> +  rc = -EADDRINUSE;  
> > ISTR that Christian suggested -EBUSY in a past revision of this series?
> > I think that would be more appropriate.  
> 
> I went back and looked and sure enough, he did recommend that.
> You have a great memory! I didn't respond to that comment, so I
> must have missed it at the time.
> 
> I personally prefer EADDRINUSE because I think it is more indicative
> of the reason an AP resource can not be assigned back to the host
> drivers is because it is in use by a guest or, at the very least, reserved
> for use by a guest (i.e., assigned to an mdev). To say it is busy implies
> that the device is busy performing encryption services which may or
> may not be true at a given moment. Even if so, that is not the reason
> for refusing to allow reassignment of the device.

I have a different understanding of these error codes: EADDRINUSE is
something used in the networking context when an actual address is
already used elsewhere. EBUSY is more of a generic error that indicates
that a certain resource is not free to perform the requested operation;
it does not necessarily mean that the resource is currently actively
doing something. Kind of when you get EBUSY when trying to eject
something another program holds a reference on: that other program
might not actually be doing anything, but it potentially could.



Re: [PATCH 2/2] vfio/pci: Remove bardirty from vfio_pci_device

2020-09-17 Thread Cornelia Huck
On Thu, 17 Sep 2020 11:31:28 +0800
Zenghui Yu  wrote:

> It isn't clear what purpose the @bardirty serves. It might be used to avoid
> the unnecessary vfio_bar_fixup() invoking on a user-space BAR read, which
> is not required when bardirty is unset.
> 
> The variable was introduced in commit 89e1f7d4c66d ("vfio: Add PCI device
> driver") but never actually used, so it shouldn't be that important. Remove
> it.
> 
> Signed-off-by: Zenghui Yu 
> ---
>  drivers/vfio/pci/vfio_pci_config.c  | 7 ---
>  drivers/vfio/pci/vfio_pci_private.h | 1 -
>  2 files changed, 8 deletions(-)

Yes, it seems to have been write-only all the time.

Reviewed-by: Cornelia Huck 



Re: [PATCH 1/2] vfio/pci: Remove redundant declaration of vfio_pci_driver

2020-09-17 Thread Cornelia Huck
On Thu, 17 Sep 2020 11:31:27 +0800
Zenghui Yu  wrote:

> It was added by commit 137e5531351d ("vfio/pci: Add sriov_configure
> support") and actually unnecessary. Remove it.
> 
> Signed-off-by: Zenghui Yu 
> ---
>  drivers/vfio/pci/vfio_pci.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 1ab1f5cda4ac..da68e2f86622 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1862,7 +1862,6 @@ static const struct vfio_device_ops vfio_pci_ops = {
>  
>  static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);
>  static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck);
> -static struct pci_driver vfio_pci_driver;
>  
>  static int vfio_pci_bus_notifier(struct notifier_block *nb,
>    unsigned long action, void *data)

Reviewed-by: Cornelia Huck 



Re: [PATCH v3] vfio iommu: Add dma available capability

2020-09-16 Thread Cornelia Huck
On Tue, 15 Sep 2020 15:05:18 -0400
Matthew Rosato  wrote:

> Commit 492855939bdb ("vfio/type1: Limit DMA mappings per container")
> added the ability to limit the number of memory backed DMA mappings.
> However on s390x, when lazy mapping is in use, we use a very large
> number of concurrent mappings.  Let's provide the current allowable
> number of DMA mappings to userspace via the IOMMU info chain so that
> userspace can take appropriate mitigation.
> 
> Signed-off-by: Matthew Rosato 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 17 +
>  include/uapi/linux/vfio.h   | 15 +++
>  2 files changed, 32 insertions(+)

Reviewed-by: Cornelia Huck 



Re: [PATCH v2] vfio iommu: Add dma available capability

2020-09-15 Thread Cornelia Huck
On Mon, 14 Sep 2020 18:25:31 -0400
Matthew Rosato  wrote:

> Commit 492855939bdb ("vfio/type1: Limit DMA mappings per container")
> added the ability to limit the number of memory backed DMA mappings.
> However on s390x, when lazy mapping is in use, we use a very large
> number of concurrent mappings.  Let's provide the current allowable
> number of DMA mappings to userspace via the IOMMU info chain so that
> userspace can take appropriate mitigation.
> 
> Signed-off-by: Matthew Rosato 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 17 +
>  include/uapi/linux/vfio.h   | 16 
>  2 files changed, 33 insertions(+)

(...)

> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 9204705..a8cc4a5 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1039,6 +1039,22 @@ struct vfio_iommu_type1_info_cap_migration {
>   __u64   max_dirty_bitmap_size;  /* in bytes */
>  };
>  
> +/*
> + * The DMA available capability allows to report the current number of
> + * simultaneously outstanding DMA mappings that are allowed.
> + *
> + * The structures below define version 1 of this capability.

"The structure below defines..." ?

> + *
> + * max: specifies the maximum number of outstanding DMA mappings allowed.

I think you forgot to tweak that one:

"avail: specifies the current number of outstanding DMA mappings allowed."

?

> + */
> +#define VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL 3
> +
> +struct vfio_iommu_type1_info_dma_avail {
> + struct  vfio_info_cap_header header;
> + __u32   avail;
> +};
> +
> +
>  #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
>  
>  /**



Re: [PATCH] vfio: fix a missed vfio group put in vfio_pin_pages

2020-09-15 Thread Cornelia Huck
On Tue, 15 Sep 2020 08:28:35 +0800
Yan Zhao  wrote:

> when error occurs, need to put vfio group after a successful get.
> 
> Fixes: 95fc87b44104 (vfio: Selective dirty page tracking if IOMMU backed
> device pins pages)

The format of the Fixes: line should be

Fixes: 95fc87b44104 ("vfio: Selective dirty page tracking if IOMMU backed 
device pins pages")

> 
> Signed-off-by: Yan Zhao 
> ---
>  drivers/vfio/vfio.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)

Reviewed-by: Cornelia Huck 



Re: [PATCH v10 05/16] s390/vfio-ap: implement in-use callback for vfio_ap driver

2020-09-14 Thread Cornelia Huck
On Fri, 21 Aug 2020 15:56:05 -0400
Tony Krowiak  wrote:

> Let's implement the callback to indicate when an APQN
> is in use by the vfio_ap device driver. The callback is
> invoked whenever a change to the apmask or aqmask would
> result in one or more queue devices being removed from the driver. The
> vfio_ap device driver will indicate a resource is in use
> if the APQN of any of the queue devices to be removed are assigned to
> any of the matrix mdevs under the driver's control.
> 
> Signed-off-by: Tony Krowiak 
> ---
>  drivers/s390/crypto/vfio_ap_drv.c |  1 +
>  drivers/s390/crypto/vfio_ap_ops.c | 68 ---
>  drivers/s390/crypto/vfio_ap_private.h |  2 +
>  3 files changed, 53 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c 
> b/drivers/s390/crypto/vfio_ap_drv.c
> index 24cdef60039a..aae5b3d8e3fa 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -175,6 +175,7 @@ static int __init vfio_ap_init(void)
>   memset(_ap_drv, 0, sizeof(vfio_ap_drv));
>   vfio_ap_drv.probe = vfio_ap_queue_dev_probe;
>   vfio_ap_drv.remove = vfio_ap_queue_dev_remove;
> + vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
>   vfio_ap_drv.ids = ap_queue_ids;
>  
>   ret = ap_driver_register(_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
> b/drivers/s390/crypto/vfio_ap_ops.c
> index 2e37ee82e422..fc1aa6f947eb 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -515,18 +515,36 @@ vfio_ap_mdev_verify_queues_reserved_for_apid(struct 
> ap_matrix_mdev *matrix_mdev,
>   return 0;
>  }
>  
> +#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
> +  "already assigned to %s"

Ah, I spoke too soon; this is what I had been looking for :)



Re: [PATCH v10 04/16] s390/zcrypt: driver callback to indicate resource in use

2020-09-14 Thread Cornelia Huck
On Fri, 21 Aug 2020 15:56:04 -0400
Tony Krowiak  wrote:

> Introduces a new driver callback to prevent a root user from unbinding
> an AP queue from its device driver if the queue is in use. The intent of
> this callback is to provide a driver with the means to prevent a root user
> from inadvertently taking a queue away from a matrix mdev and giving it to
> the host while it is assigned to the matrix mdev. The callback will
> be invoked whenever a change to the AP bus's sysfs apmask or aqmask
> attributes would result in one or more AP queues being removed from its
> driver. If the callback responds in the affirmative for any driver
> queried, the change to the apmask or aqmask will be rejected with a device
> in use error.
> 
> For this patch, only non-default drivers will be queried. Currently,
> there is only one non-default driver, the vfio_ap device driver. The
> vfio_ap device driver facilitates pass-through of an AP queue to a
> guest. The idea here is that a guest may be administered by a different
> sysadmin than the host and we don't want AP resources to unexpectedly
> disappear from a guest's AP configuration (i.e., adapters, domains and
> control domains assigned to the matrix mdev). This will enforce the proper
> procedure for removing AP resources intended for guest usage which is to
> first unassign them from the matrix mdev, then unbind them from the
> vfio_ap device driver.
> 
> Signed-off-by: Tony Krowiak 
> Reported-by: kernel test robot 

This looks a bit odd...

> ---
>  drivers/s390/crypto/ap_bus.c | 148 ---
>  drivers/s390/crypto/ap_bus.h |   4 +
>  2 files changed, 142 insertions(+), 10 deletions(-)
> 

(...)

> @@ -1107,12 +1118,70 @@ static ssize_t apmask_show(struct bus_type *bus, char 
> *buf)
>   return rc;
>  }
>  
> +static int __verify_card_reservations(struct device_driver *drv, void *data)
> +{
> + int rc = 0;
> + struct ap_driver *ap_drv = to_ap_drv(drv);
> + unsigned long *newapm = (unsigned long *)data;
> +
> + /*
> +  * No need to verify whether the driver is using the queues if it is the
> +  * default driver.
> +  */
> + if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
> + return 0;
> +
> + /* The non-default driver's module must be loaded */
> + if (!try_module_get(drv->owner))
> + return 0;
> +
> + if (ap_drv->in_use)
> + if (ap_drv->in_use(newapm, ap_perms.aqm))
> + rc = -EADDRINUSE;

ISTR that Christian suggested -EBUSY in a past revision of this series?
I think that would be more appropriate.

Also, I know we have discussed this before, but it is very hard to
figure out the offending device(s) if the sysfs manipulation failed. Can
we at least drop something into the syslog? That would be far from
perfect, but it gives an admin at least a chance to figure out why they
got an error. Some more structured way that would be usable from tools
can still be added later.

> +
> + module_put(drv->owner);
> +
> + return rc;
> +}



Re: [PATCH] vfio: Fix typo of the device_state

2020-09-11 Thread Cornelia Huck
On Thu, 10 Sep 2020 20:25:08 +0800
Zenghui Yu  wrote:

> A typo fix ("_RUNNNG" => "_RUNNING") in comment block of the uapi header.
> 
> Signed-off-by: Zenghui Yu 
> ---
>  include/uapi/linux/vfio.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 920470502329..d4bd39e124bf 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -462,7 +462,7 @@ struct vfio_region_gfx_edid {
>   * 5. Resumed
>   *  |->|
>   *
> - * 0. Default state of VFIO device is _RUNNNG when the user application 
> starts.
> + * 0. Default state of VFIO device is _RUNNING when the user application 
> starts.
>   * 1. During normal shutdown of the user application, the user application 
> may
>   *optionally change the VFIO device state from _RUNNING to _STOP. This
>   *    transition is optional. The vendor driver must support this transition 
> but

Reviewed-by: Cornelia Huck 



Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

2020-09-08 Thread Cornelia Huck
On Tue, 1 Sep 2020 16:50:03 +0800
Jason Wang  wrote:

> On 2020/9/1 下午1:24, Kishon Vijay Abraham I wrote:
> > Hi,
> >
> > On 28/08/20 4:04 pm, Cornelia Huck wrote:  
> >> On Thu, 9 Jul 2020 14:26:53 +0800
> >> Jason Wang  wrote:
> >>
> >> [Let me note right at the beginning that I first noted this while
> >> listening to Kishon's talk at LPC on Wednesday. I might be very
> >> confused about the background here, so let me apologize beforehand for
> >> any confusion I might spread.]
> >>  
> >>> On 2020/7/8 下午9:13, Kishon Vijay Abraham I wrote:  
> >>>> Hi Jason,
> >>>>
> >>>> On 7/8/2020 4:52 PM, Jason Wang wrote:  
> >>>>> On 2020/7/7 下午10:45, Kishon Vijay Abraham I wrote:  
> >>>>>> Hi Jason,
> >>>>>>
> >>>>>> On 7/7/2020 3:17 PM, Jason Wang wrote:  
> >>>>>>> On 2020/7/6 下午5:32, Kishon Vijay Abraham I wrote:  
> >>>>>>>> Hi Jason,
> >>>>>>>>
> >>>>>>>> On 7/3/2020 12:46 PM, Jason Wang wrote:  
> >>>>>>>>> On 2020/7/2 下午9:35, Kishon Vijay Abraham I wrote:  
> >>>>>>>>>> Hi Jason,
> >>>>>>>>>>
> >>>>>>>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:  
> >>>>>>>>>>> On 2020/7/2 下午5:51, Michael S. Tsirkin wrote:  
> >>>>>>>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay 
> >>>>>>>>>>>> Abraham I wrote:  
> >>>>>>>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
> >>>>>>>>>>>>> communication over MMIO. This series enables rpmsg 
> >>>>>>>>>>>>> communication between
> >>>>>>>>>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1) Modify vhost to use standard Linux driver model
> >>>>>>>>>>>>> 2) Add support in vring to access virtqueue over MMIO
> >>>>>>>>>>>>> 3) Add vhost client driver for rpmsg
> >>>>>>>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver 
> >>>>>>>>>>>>> (uses vhost) for
> >>>>>>>>>>>>>      rpmsg communication between two SoCs connected to 
> >>>>>>>>>>>>> each other
> >>>>>>>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg 
> >>>>>>>>>>>>> communication
> >>>>>>>>>>>>>      between two SoCs connected via NTB
> >>>>>>>>>>>>> 6) Add configfs to configure the components
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> UseCase1 :
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>    VHOST RPMSG VIRTIO RPMSG
> >>>>>>>>>>>>> +   +
> >>>>>>>>>>>>> |   |
> >>>>>>>>>>>>> |   |
> >>>>>>>>>>>>> |   |
> >>>>>>>>>>>>> |   |
> >>>>>>>>>>>>> +-v--+ +--v---+
> >>>>>>>>>>>>> |   Linux    | | Linux    |
> >>>>>>>>>>>>> |  Endpoint  | | Root Complex |
> >>>>>>>>>>>>> | <->  |
> >>>>>>>>>>>>> |    | |  |
> >>>>>>>>>>>>> |    SOC1    | | SOC2 |
> >>>>>>>>>>>>> ++ +--+
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> UseCase 2:
> >>>>>>>>>>>>>
> >>&g

Re: [PATCH v11 0/2] s390: virtio: let arch validate VIRTIO features

2020-09-08 Thread Cornelia Huck
On Tue, 8 Sep 2020 00:39:51 +0200
Halil Pasic  wrote:

> On Mon,  7 Sep 2020 11:39:05 +0200
> Pierre Morel  wrote:
> 
> > Hi all,
> > 
> > The goal of the series is to give a chance to the architecture
> > to validate VIRTIO device features.  
> 
> Michael, is this going in via your tree?
> 

I believe Michael's tree is the right place for this, but I can also
queue it if I get an ack on patch 1.



Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

2020-08-28 Thread Cornelia Huck
On Thu, 9 Jul 2020 14:26:53 +0800
Jason Wang  wrote:

[Let me note right at the beginning that I first noted this while
listening to Kishon's talk at LPC on Wednesday. I might be very
confused about the background here, so let me apologize beforehand for
any confusion I might spread.]

> On 2020/7/8 下午9:13, Kishon Vijay Abraham I wrote:
> > Hi Jason,
> >
> > On 7/8/2020 4:52 PM, Jason Wang wrote:  
> >> On 2020/7/7 下午10:45, Kishon Vijay Abraham I wrote:  
> >>> Hi Jason,
> >>>
> >>> On 7/7/2020 3:17 PM, Jason Wang wrote:  
>  On 2020/7/6 下午5:32, Kishon Vijay Abraham I wrote:  
> > Hi Jason,
> >
> > On 7/3/2020 12:46 PM, Jason Wang wrote:  
> >> On 2020/7/2 下午9:35, Kishon Vijay Abraham I wrote:  
> >>> Hi Jason,
> >>>
> >>> On 7/2/2020 3:40 PM, Jason Wang wrote:  
>  On 2020/7/2 下午5:51, Michael S. Tsirkin wrote:  
> > On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I 
> > wrote:  
> >> This series enhances Linux Vhost support to enable SoC-to-SoC
> >> communication over MMIO. This series enables rpmsg communication 
> >> between
> >> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
> >>
> >> 1) Modify vhost to use standard Linux driver model
> >> 2) Add support in vring to access virtqueue over MMIO
> >> 3) Add vhost client driver for rpmsg
> >> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses 
> >> vhost) for
> >>     rpmsg communication between two SoCs connected to each 
> >> other
> >> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg 
> >> communication
> >>     between two SoCs connected via NTB
> >> 6) Add configfs to configure the components
> >>
> >> UseCase1 :
> >>
> >>   VHOST RPMSG VIRTIO RPMSG
> >>    +   +
> >>    |   |
> >>    |   |
> >>    |   |
> >>    |   |
> >> +-v--+ +--v---+
> >> |   Linux    | | Linux    |
> >> |  Endpoint  | | Root Complex |
> >> |    <->  |
> >> |    | |  |
> >> |    SOC1    | | SOC2 |
> >> ++ +--+
> >>
> >> UseCase 2:
> >>
> >>   VHOST RPMSG  VIRTIO 
> >> RPMSG
> >>    + +
> >>    | |
> >>    | |
> >>    | |
> >>    | |
> >>     +--v--+   
> >> +--v--+
> >>     | |   |
> >>  |
> >>     |    HOST1    |   |    
> >> HOST2    |
> >>     | |   |
> >>  |
> >>     +--^--+   
> >> +--^--+
> >>    | |
> >>    | |
> >> +-+
> >> |  +--v--+   
> >> +--v--+  |
> >> |  | |   | 
> >> |  |
> >> |  | EP  |   | EP  
> >> |  |
> >> |  | CONTROLLER1 |   | CONTROLLER2 
> >> |  |
> >> |  | <---> 
> >> |  |
> >> |  | |   | 
> >> |  |
> >> |  | |   | 
> >> |  |
> >> |  | |  SoC With Multiple EP Instances   | 
> >> |  |
> >> |  | |  (Configured using NTB Function)  | 
> >> |  |
> >> |  +-+   
> >> +-+  |
> >> 

Re: [PATCH v10 02/16] s390/vfio-ap: use new AP bus interface to search for queue devices

2020-08-28 Thread Cornelia Huck
On Thu, 27 Aug 2020 10:24:07 -0400
Tony Krowiak  wrote:

> On 8/25/20 6:13 AM, Cornelia Huck wrote:
> > On Fri, 21 Aug 2020 15:56:02 -0400
> > Tony Krowiak  wrote:

> >>   /**
> >> - * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
> >> - * @matrix_mdev: the associated mediated matrix
> >> + * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
> >>* @apqn: The queue APQN
> >>*
> >> - * Retrieve a queue with a specific APQN from the list of the
> >> - * devices of the vfio_ap_drv.
> >> - * Verify that the APID and the APQI are set in the matrix.
> >> + * Retrieve a queue with a specific APQN from the AP queue devices 
> >> attached to
> >> + * the AP bus.
> >>*
> >> - * Returns the pointer to the associated vfio_ap_queue
> >> + * Returns the pointer to the vfio_ap_queue with the specified APQN, or 
> >> NULL.
> >>*/
> >> -static struct vfio_ap_queue *vfio_ap_get_queue(
> >> -  struct ap_matrix_mdev *matrix_mdev,
> >> -  int apqn)
> >> +static struct vfio_ap_queue *vfio_ap_get_queue(unsigned long apqn)
> >>   {
> >> +  struct ap_queue *queue;
> >>struct vfio_ap_queue *q;
> >> -  struct device *dev;
> >>   
> >> -  if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
> >> -  return NULL;
> >> -  if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))  
> > I think you should add some explanation to the patch description why
> > testing the matrix bitmasks is not needed anymore.  
> 
> As a result of this comment, I took a closer look at the code to
> determine the reason for eliminating the matrix_mdev
> parameter. The reason is because the code below (i.e., find the device
> and get the driver data) was also repeated in the vfio_ap_irq_disable_apqn()
> function, so I replaced it with a call to the function above; however, the
> vfio_ap_irq_disable_apqn() function  does not have a reference to the
> matrix_mdev, so I eliminated the matrix_mdev parameter. Note that the
> vfio_ap_irq_disable_apqn() is called for each APQN assigned to a matrix
> mdev, so there is no need to test the bitmasks there.
> 
> The other place from which the function above is called is
> the handle_pqap() function which does have a reference to the
> matrix_mdev. In order to ensure the integrity of the instruction
> being intercepted - i.e., PQAP(AQIC) enable/disable IRQ for aN
> AP queue - the testing of the matrix bitmasks probably ought to
> be performed, so it will be done there instead of in the
> vfio_ap_get_queue() function above.

Should you add a comment that vfio_ap_get_queue() assumes that the
caller makes sure that this is only called for APQNs that are assigned
to a matrix?

> 
> 
> > +   queue = ap_get_qdev(apqn);
> > +   if (!queue)
> > return NULL;
> >   
> > -   dev = driver_find_device(_dev->vfio_ap_drv->driver, NULL,
> > -, match_apqn);
> > -   if (!dev)
> > -   return NULL;
> > -   q = dev_get_drvdata(dev);
> > -   q->matrix_mdev = matrix_mdev;
> > -   put_device(dev);
> > +   q = dev_get_drvdata(>ap_dev.device);
> > +   put_device(>ap_dev.device);
> >   
> > return q;
> >   }
> > (...)
> >  
> 



Re: [PATCH v10 01/16] s390/vfio-ap: add version vfio_ap module

2020-08-28 Thread Cornelia Huck
On Thu, 27 Aug 2020 10:39:07 -0400
Tony Krowiak  wrote:

> Currently there are two tools that probably need to be aware of
> the changes to these assignment interfaces:
> * The hades test framework has tests that will fail if run against
>     these patches that should be skipped if over-provisioning is
>     allowed. There are also tests under development to test the
>     function introduced by these patches that will fail if run against
>     an older version of the driver. These tests should be skipped in
>     that case.
> * There is a tool under development for configuring AP matrix
>     mediated devices that probably need to be aware of the change
>     introduced by this series.
> 
> Since a tool would have to first determine whether a new sysfs
> interface documenting facilities is available and it would only
> expose one facility at this point, it seems reasonable for these tools
> to check for the sysfs guest_matrix attribute to discern whether
> over-provisioning is available or not. I'll go ahead and remove this
> patch from the series.

Thanks for the explanation, that seems reasonable to me.



Re: [PATCH v10 01/16] s390/vfio-ap: add version vfio_ap module

2020-08-27 Thread Cornelia Huck
On Wed, 26 Aug 2020 10:49:47 -0400
Tony Krowiak  wrote:

> On 8/25/20 6:04 AM, Cornelia Huck wrote:
> > On Fri, 21 Aug 2020 15:56:01 -0400
> > Tony Krowiak  wrote:
> >  
> >> Let's set a version for the vfio_ap module so that automated regression
> >> tests can determine whether dynamic configuration tests can be run or
> >> not.
> >>
> >> Signed-off-by: Tony Krowiak 
> >> ---
> >>   drivers/s390/crypto/vfio_ap_drv.c | 2 ++
> >>   1 file changed, 2 insertions(+)
> >>
> >> diff --git a/drivers/s390/crypto/vfio_ap_drv.c 
> >> b/drivers/s390/crypto/vfio_ap_drv.c
> >> index be2520cc010b..f4ceb380dd61 100644
> >> --- a/drivers/s390/crypto/vfio_ap_drv.c
> >> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> >> @@ -17,10 +17,12 @@
> >>   
> >>   #define VFIO_AP_ROOT_NAME "vfio_ap"
> >>   #define VFIO_AP_DEV_NAME "matrix"
> >> +#define VFIO_AP_MODULE_VERSION "1.2.0"
> >>   
> >>   MODULE_AUTHOR("IBM Corporation");
> >>   MODULE_DESCRIPTION("VFIO AP device driver, Copyright IBM Corp. 2018");
> >>   MODULE_LICENSE("GPL v2");
> >> +MODULE_VERSION(VFIO_AP_MODULE_VERSION);
> >>   
> >>   static struct ap_driver vfio_ap_drv;
> >> 
> > Setting a version manually has some drawbacks:
> > - tools wanting to check for capabilities need to keep track which
> >versions support which features
> > - you need to remember to actually bump the version when adding a new,
> >visible feature
> > (- selective downstream backports may get into a pickle, but that's
> > arguably not your problem)
> >
> > Is there no way for a tool to figure out whether this is supported?
> > E.g., via existence of a sysfs file, or via a known error that will
> > occur. If not, it's maybe better to expose known capabilities via a
> > generic interface.  
> 
> This patch series introduces a new mediated device sysfs attribute,
> guest_matrix, so the automated tests could check for the existence
> of that interface. The problem I have with that is it will work for
> this version of the vfio_ap device driver - which may be all that is
> ever needed - but does not account for future enhancements
> which may need to be detected by tooling or automated tests.
> It seems to me that regardless of how a tool detects whether
> a feature is supported or not, it will have to keep track of that
> somehow.

Which enhancements? If you change the interface in an incompatible way,
you have a different problem anyway. If someone trying to use the
enhanced version of the interface gets an error on a kernel providing
an older version of the interface, that's a reasonable way to discover
support.

I think "discover device driver capabilities by probing" is less
burdensome and error prone than trying to match up capabilities with a
version number. If you expose a version number, a tool would still have
to probe that version number, and then consult with a list of features
per version, which can easily go out of sync.

> Can you provide more details about this generic interface of
> which you speak?

If that is really needed, I'd probably do a driver sysfs attribute that
exposes a list of documented capabilities (as integer values, or as a
bit.) But since tools can simply check for guest_matrix to find out
about support for this feature here, it seems like overkill to me --
unless you have a multitude of features waiting in queue that need to
be made discoverable.



Re: [PATCH v10 16/16] s390/vfio-ap: update docs to include dynamic config support

2020-08-25 Thread Cornelia Huck
On Fri, 21 Aug 2020 15:56:16 -0400
Tony Krowiak  wrote:

> Update the documentation in vfio-ap.rst to include information about the
> AP dynamic configuration support (i.e., hot plug of adapters, domains
> and control domains via the matrix mediated device's sysfs assignment
> attributes).
> 
> Signed-off-by: Tony Krowiak 
> ---
>  Documentation/s390/vfio-ap.rst | 362 ++---
>  1 file changed, 285 insertions(+), 77 deletions(-)
> 
> diff --git a/Documentation/s390/vfio-ap.rst b/Documentation/s390/vfio-ap.rst
> index e15436599086..8907aeca8fb7 100644
> --- a/Documentation/s390/vfio-ap.rst
> +++ b/Documentation/s390/vfio-ap.rst
> @@ -253,7 +253,7 @@ The process for reserving an AP queue for use by a KVM 
> guest is:
>  1. The administrator loads the vfio_ap device driver
>  2. The vfio-ap driver during its initialization will register a single 
> 'matrix'
> device with the device core. This will serve as the parent device for
> -   all mediated matrix devices used to configure an AP matrix for a guest.
> +   all matrix mediated devices used to configure an AP matrix for a guest.

This (and many other changes here) seems to be unrelated to the new
feature. Split that out into a separate patch that can be applied right
away? That would make this patch smaller and easier to review; it's
hard to figure out which parts deal with the new feature, and which parts
simply got an update.

Also, do you want to do similar wording changes in the QEMU
documentation for vfio-ap?

>  3. The /sys/devices/vfio_ap/matrix device is created by the device core
>  4. The vfio_ap device driver will register with the AP bus for AP queue 
> devices
> of type 10 and higher (CEX4 and newer). The driver will provide the 
> vfio_ap

(...)

> @@ -435,6 +481,10 @@ available to a KVM guest via the following CPU model 
> features:
> can be made available to the guest only if it is available on the host 
> (i.e.,
> facility bit 12 is set).
>  
> +4. apqi: Indicates AP queue interrupts are available on the guest. This 
> facility
> +   can be made available to the guest only if it is available on the host 
> (i.e.,
> +   facility bit 65 is set).
> +
>  Note: If the user chooses to specify a CPU model different than the 'host'
>  model to QEMU, the CPU model features and facilities need to be turned on
>  explicitly; for example::
> @@ -444,7 +494,7 @@ explicitly; for example::
>  A guest can be precluded from using AP features/facilities by turning them 
> off
>  explicitly; for example::
>  
> - /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off
> + /usr/bin/qemu-system-s390x ... -cpu 
> host,ap=off,apqci=off,apft=off,apqi=off

Isn't that an already existing facility that was simply lacking
documentation? If yes, split it off?

>  
>  Note: If the APFT facility is turned off (apft=off) for the guest, the guest
>  will not see any AP devices. The zcrypt device drivers that register for 
> type 10

(...)



Re: [PATCH v10 03/16] s390/vfio-ap: manage link between queue struct and matrix mdev

2020-08-25 Thread Cornelia Huck
On Fri, 21 Aug 2020 15:56:03 -0400
Tony Krowiak  wrote:

> Let's create links between each queue device bound to the vfio_ap device
> driver and the matrix mdev to which the queue is assigned. The idea is to
> facilitate efficient retrieval of the objects representing the queue
> devices and matrix mdevs as well as to verify that a queue assigned to
> a matrix mdev is bound to the driver.
> 
> The links will be created as follows:
> 
>* When the queue device is probed, if its APQN is assigned to a matrix
>  mdev, the structures representing the queue device and the matrix mdev
>  will be linked.
> 
>* When an adapter or domain is assigned to a matrix mdev, for each new
>  APQN assigned that references a queue device bound to the vfio_ap
>  device driver, the structures representing the queue device and the
>  matrix mdev will be linked.
> 
> The links will be removed as follows:
> 
>* When the queue device is removed, if its APQN is assigned to a matrix
>  mdev, the structures representing the queue device and the matrix mdev
>  will be unlinked.
> 
>* When an adapter or domain is unassigned from a matrix mdev, for each
>  APQN unassigned that references a queue device bound to the vfio_ap
>  device driver, the structures representing the queue device and the
>  matrix mdev will be unlinked.
> 
> Signed-off-by: Tony Krowiak 
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 132 +-
>  drivers/s390/crypto/vfio_ap_private.h |   2 +
>  2 files changed, 129 insertions(+), 5 deletions(-)
> 

(...)

> @@ -548,6 +557,87 @@ static int vfio_ap_mdev_verify_no_sharing(struct 
> ap_matrix_mdev *matrix_mdev)
>   return 0;
>  }
>  
> +enum qlink_type {

I think this is less of a type, and more of an action, so
maybe call this 'qlink_action' (and the function parameter below
'action'?)

> + LINK_APID,
> + LINK_APQI,
> + UNLINK_APID,
> + UNLINK_APQI,
> +};
> +
> +static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,
> + unsigned long apid, unsigned long apqi)
> +{
> + struct vfio_ap_queue *q;
> +
> + q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
> + if (q) {
> + q->matrix_mdev = matrix_mdev;
> + hash_add(matrix_mdev->qtable,
> +  >mdev_qnode, q->apqn);
> + }
> +}
> +
> +static void vfio_ap_mdev_unlink_queue(unsigned long apid, unsigned long apqi)
> +{
> + struct vfio_ap_queue *q;
> +
> + q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
> + if (q) {
> + q->matrix_mdev = NULL;
> + hash_del(>mdev_qnode);
> + }
> +}
> +
> +/**
> + * vfio_ap_mdev_link_queues
> + *
> + * @matrix_mdev: The matrix mdev to link.
> + * @type: The type of @qlink_id.
> + * @qlink_id: The APID or APQI of the queues to link.
> + *
> + * Sets or clears the links between the queues with the specified @qlink_id
> + * and the @matrix_mdev:
> + * @type == LINK_APID: Set the links between the @matrix_mdev and the
> + * queues with the specified @qlink_id (APID)
> + * @type == LINK_APQI: Set the links between the @matrix_mdev and the
> + * queues with the specified @qlink_id (APQI)
> + * @type == UNLINK_APID: Clear the links between the @matrix_mdev and the
> + *   queues with the specified @qlink_id (APID)
> + * @type == UNLINK_APQI: Clear the links between the @matrix_mdev and the
> + *   queues with the specified @qlink_id (APQI)
> + */
> +static void vfio_ap_mdev_link_queues(struct ap_matrix_mdev *matrix_mdev,
> +  enum qlink_type type,
> +  unsigned long qlink_id)
> +{
> + unsigned long id;
> +
> + switch (type) {
> + case LINK_APID:
> + for_each_set_bit_inv(id, matrix_mdev->matrix.aqm,
> +  matrix_mdev->matrix.aqm_max + 1)
> + vfio_ap_mdev_link_queue(matrix_mdev, qlink_id, id);
> + break;
> + case UNLINK_APID:
> + for_each_set_bit_inv(id, matrix_mdev->matrix.aqm,
> +  matrix_mdev->matrix.aqm_max + 1)
> + vfio_ap_mdev_unlink_queue(qlink_id, id);
> + break;
> + case LINK_APQI:
> + for_each_set_bit_inv(id, matrix_mdev->matrix.apm,
> +  matrix_mdev->matrix.apm_max + 1)
> + vfio_ap_mdev_link_queue(matrix_mdev, id, qlink_id);
> + break;
> + case UNLINK_APQI:
> + for_each_set_bit_inv(id, matrix_mdev->matrix.apm,
> +  matrix_mdev->matrix.apm_max + 1)
> + vfio_ap_mdev_link_queue(matrix_mdev, id, qlink_id);
> + break;
> + default:
> + WARN_ON_ONCE(1);
> + }
> +}
> +

(...)

I have not reviewed this 

Re: [PATCH v10 02/16] s390/vfio-ap: use new AP bus interface to search for queue devices

2020-08-25 Thread Cornelia Huck
On Fri, 21 Aug 2020 15:56:02 -0400
Tony Krowiak  wrote:

> This patch refactor's the vfio_ap device driver to use the AP bus's

s/refactor's/refactors/

> ap_get_qdev() function to retrieve the vfio_ap_queue struct containing
> information about a queue that is bound to the vfio_ap device driver.
> The bus's ap_get_qdev() function retrieves the queue device from a
> hashtable keyed by APQN. This is much more efficient than looping over
> the list of devices attached to the AP bus by several orders of
> magnitude.
> 
> Signed-off-by: Tony Krowiak 
> Reported-by: kernel test robot 
> ---
>  drivers/s390/crypto/vfio_ap_drv.c | 27 ++---
>  drivers/s390/crypto/vfio_ap_ops.c | 86 +++
>  drivers/s390/crypto/vfio_ap_private.h |  8 ++-
>  3 files changed, 59 insertions(+), 62 deletions(-)
> 

(...)

> diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
> b/drivers/s390/crypto/vfio_ap_ops.c
> index e0bde8518745..ad3925f04f61 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -26,43 +26,26 @@
>  
>  static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>  
> -static int match_apqn(struct device *dev, const void *data)
> -{
> - struct vfio_ap_queue *q = dev_get_drvdata(dev);
> -
> - return (q->apqn == *(int *)(data)) ? 1 : 0;
> -}
> -
>  /**
> - * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
> - * @matrix_mdev: the associated mediated matrix
> + * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
>   * @apqn: The queue APQN
>   *
> - * Retrieve a queue with a specific APQN from the list of the
> - * devices of the vfio_ap_drv.
> - * Verify that the APID and the APQI are set in the matrix.
> + * Retrieve a queue with a specific APQN from the AP queue devices attached 
> to
> + * the AP bus.
>   *
> - * Returns the pointer to the associated vfio_ap_queue
> + * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
>   */
> -static struct vfio_ap_queue *vfio_ap_get_queue(
> - struct ap_matrix_mdev *matrix_mdev,
> - int apqn)
> +static struct vfio_ap_queue *vfio_ap_get_queue(unsigned long apqn)
>  {
> + struct ap_queue *queue;
>   struct vfio_ap_queue *q;
> - struct device *dev;
>  
> - if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
> - return NULL;
> - if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))

I think you should add some explanation to the patch description why
testing the matrix bitmasks is not needed anymore.

> + queue = ap_get_qdev(apqn);
> + if (!queue)
>   return NULL;
>  
> - dev = driver_find_device(_dev->vfio_ap_drv->driver, NULL,
> -  , match_apqn);
> - if (!dev)
> - return NULL;
> - q = dev_get_drvdata(dev);
> - q->matrix_mdev = matrix_mdev;
> - put_device(dev);
> + q = dev_get_drvdata(>ap_dev.device);
> + put_device(>ap_dev.device);
>  
>   return q;
>  }

(...)



Re: [PATCH v10 01/16] s390/vfio-ap: add version vfio_ap module

2020-08-25 Thread Cornelia Huck
On Fri, 21 Aug 2020 15:56:01 -0400
Tony Krowiak  wrote:

> Let's set a version for the vfio_ap module so that automated regression
> tests can determine whether dynamic configuration tests can be run or
> not.
> 
> Signed-off-by: Tony Krowiak 
> ---
>  drivers/s390/crypto/vfio_ap_drv.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c 
> b/drivers/s390/crypto/vfio_ap_drv.c
> index be2520cc010b..f4ceb380dd61 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -17,10 +17,12 @@
>  
>  #define VFIO_AP_ROOT_NAME "vfio_ap"
>  #define VFIO_AP_DEV_NAME "matrix"
> +#define VFIO_AP_MODULE_VERSION "1.2.0"
>  
>  MODULE_AUTHOR("IBM Corporation");
>  MODULE_DESCRIPTION("VFIO AP device driver, Copyright IBM Corp. 2018");
>  MODULE_LICENSE("GPL v2");
> +MODULE_VERSION(VFIO_AP_MODULE_VERSION);
>  
>  static struct ap_driver vfio_ap_drv;
>  

Setting a version manually has some drawbacks:
- tools wanting to check for capabilities need to keep track which
  versions support which features
- you need to remember to actually bump the version when adding a new,
  visible feature
(- selective downstream backports may get into a pickle, but that's
arguably not your problem)

Is there no way for a tool to figure out whether this is supported?
E.g., via existence of a sysfs file, or via a known error that will
occur. If not, it's maybe better to expose known capabilities via a
generic interface.



Re: [PATCH v9 2/2] s390: virtio: PV needs VIRTIO I/O device protection

2020-08-21 Thread Cornelia Huck
On Wed, 19 Aug 2020 18:23:18 +0200
Pierre Morel  wrote:

> If protected virtualization is active on s390, VIRTIO has retricted

s/retricted/only restricted/

> access to the guest memory.
> Define CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS and export
> arch_has_restricted_virtio_memory_access to advertize VIRTIO if that's
> the case, preventing a host error on access attempt.
> 
> Signed-off-by: Pierre Morel 
> ---
>  arch/s390/Kconfig   |  1 +
>  arch/s390/mm/init.c | 11 +++
>  2 files changed, 12 insertions(+)

(...)

> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> index 6dc7c3b60ef6..8febd73ed6ca 100644
> --- a/arch/s390/mm/init.c
> +++ b/arch/s390/mm/init.c
> @@ -45,6 +45,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

I don't think you need this include anymore.

>  
>  pgd_t swapper_pg_dir[PTRS_PER_PGD] __section(.bss..swapper_pg_dir);
>  

(...)

With the nit fixed,

Reviewed-by: Cornelia Huck 



Re: [PATCH v9 1/2] virtio: let arch advertise guest's memory access restrictions

2020-08-21 Thread Cornelia Huck
On Wed, 19 Aug 2020 18:23:17 +0200
Pierre Morel  wrote:

> An architecture may restrict host access to guest memory.

"e.g. IBM s390 Secure Execution or AMD SEV"

Just to make clearer what you are referring to?

> 
> Provide a new Kconfig entry the architecture can select,
> CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS, when it provides
> the arch_has_restricted_virtio_memory_access callback to advertise

s/advertise/advertise to/

> VIRTIO common code when the architecture restricts memory access
> from the host.

"The common code can then fail the probe for any device where
VIRTIO_F_IOMMU_PLATFORM is required, but not set."

?

> 
> Signed-off-by: Pierre Morel 
> ---
>  drivers/virtio/Kconfig|  6 ++
>  drivers/virtio/virtio.c   | 15 +++
>  include/linux/virtio_config.h |  9 +
>  3 files changed, 30 insertions(+)
> 
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 5809e5f5b157..509f3b4d8ba1 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -6,6 +6,12 @@ config VIRTIO
> bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_RPMSG
> or CONFIG_S390_GUEST.
>  
> +config ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS
> + bool
> + help
> +   This option is selected by any architecture enforcing
> +   VIRTIO_F_IOMMU_PLATFORM

"This option is selected if the architecture may need to enforce
VIRTIO_F_IOMMU_PLATFORM."

?

> +
>  menuconfig VIRTIO_MENU
>   bool "Virtio drivers"
>   default y

(...)

Reviewed-by: Cornelia Huck 



Re: [PATCH v8 1/2] virtio: let arch validate VIRTIO features

2020-08-19 Thread Cornelia Huck
On Wed, 19 Aug 2020 10:50:18 +0200
Pierre Morel  wrote:

> On 2020-08-18 19:19, Cornelia Huck wrote:
> > On Tue, 18 Aug 2020 16:58:30 +0200
> > Pierre Morel  wrote:
> >   
> ...
> >> +config ARCH_HAS_RESTRICTED_MEMORY_ACCESS
> >> +  bool
> >> +  help
> >> +This option is selected by any architecture enforcing
> >> +VIRTIO_F_IOMMU_PLATFORM  
> > 
> > This option is only for a very specific case of "restricted memory
> > access", namely the kind that requires IOMMU_PLATFORM for virtio
> > devices. ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS? Or is this intended
> > to cover cases outside of virtio as well?  
> 
> AFAIK we did not identify other restrictions so adding VIRTIO in the 
> name should be the best thing to do.
> 
> If new restrictions appear they also may be orthogonal.
> 
> I will change to ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS if no one 
> complains.
> 
> >   
> >> +
> >>   menuconfig VIRTIO_MENU
> >>bool "Virtio drivers"
> >>default y
> >> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> >> index a977e32a88f2..1471db7d6510 100644
> >> --- a/drivers/virtio/virtio.c
> >> +++ b/drivers/virtio/virtio.c
> >> @@ -176,6 +176,10 @@ int virtio_finalize_features(struct virtio_device 
> >> *dev)
> >>if (ret)
> >>return ret;
> >>   
> >> +  ret = arch_has_restricted_memory_access(dev);
> >> +  if (ret)
> >> +  return ret;  
> > 
> > Hm, I'd rather have expected something like
> > 
> > if (arch_has_restricted_memory_access(dev)) {  
> 
> may be also change the callback name to
> arch_has_restricted_virtio_memory_access() ?

Yes, why not.

> 
> > // enforce VERSION_1 and IOMMU_PLATFORM
> > }
> > 
> > Otherwise, you're duplicating the checks in the individual architecture
> > callbacks again.  
> 
> Yes, I agree and go back this way.
> 
> > 
> > [Not sure whether the device argument would be needed here; are there
> > architectures where we'd only require IOMMU_PLATFORM for a subset of
> > virtio devices?]  
> 
> I don't think so and since we do the checks locally, we do not need the 
> device argument anymore.

Yes, that would also remove some layering entanglement.



Re: [PATCH v8 2/2] s390: virtio: PV needs VIRTIO I/O device protection

2020-08-18 Thread Cornelia Huck
On Tue, 18 Aug 2020 16:58:31 +0200
Pierre Morel  wrote:

> If protected virtualization is active on s390, the virtio queues are
> not accessible to the host, unless VIRTIO_F_IOMMU_PLATFORM has been
> negotiated.
> Define CONFIG_ARCH_HAS_RESTRICTED_MEMORY_ACCESS and export
> arch_has_restricted_memory_access to fail probe if that's
> not the case, preventing a host error on access attempt.
> 
> Signed-off-by: Pierre Morel 
> ---
>  arch/s390/Kconfig   |  1 +
>  arch/s390/mm/init.c | 30 ++
>  2 files changed, 31 insertions(+)
> 
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index 9cfd8de907cb..d4a3ef4fa27b 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -820,6 +820,7 @@ menu "Virtualization"
>  config PROTECTED_VIRTUALIZATION_GUEST
>   def_bool n
>   prompt "Protected virtualization guest support"
> + select ARCH_HAS_RESTRICTED_MEMORY_ACCESS
>   help
> Select this option, if you want to be able to run this
> kernel as a protected virtualization KVM guest.
> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> index 6dc7c3b60ef6..aec04d7dd089 100644
> --- a/arch/s390/mm/init.c
> +++ b/arch/s390/mm/init.c
> @@ -45,6 +45,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  pgd_t swapper_pg_dir[PTRS_PER_PGD] __section(.bss..swapper_pg_dir);
>  
> @@ -161,6 +162,35 @@ bool force_dma_unencrypted(struct device *dev)
>   return is_prot_virt_guest();
>  }
>  
> +#ifdef CONFIG_ARCH_HAS_RESTRICTED_MEMORY_ACCESS
> +/*
> + * arch_has_restricted_memory_access
> + * @dev: the VIRTIO device being added
> + *
> + * Return an error if required features are missing on a guest running
> + * with protected virtualization.
> + */
> +int arch_has_restricted_memory_access(struct virtio_device *dev)
> +{
> + if (!is_prot_virt_guest())
> + return 0;

If you just did a

return is_prot_virt_guest();

and did the virtio feature stuff in the virtio core, this function
would be short and sweet :)

> +
> + if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) {
> + dev_warn(>dev, "device must provide VIRTIO_F_VERSION_1\n");
> + return -ENODEV;
> + }
> +
> + if (!virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
> + dev_warn(>dev,
> +  "device must provide VIRTIO_F_IOMMU_PLATFORM\n");
> + return -ENODEV;
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(arch_has_restricted_memory_access);
> +#endif
> +
>  /* protected virtualization */
>  static void pv_init(void)
>  {



Re: [PATCH v8 1/2] virtio: let arch validate VIRTIO features

2020-08-18 Thread Cornelia Huck
On Tue, 18 Aug 2020 16:58:30 +0200
Pierre Morel  wrote:

> An architecture may need to validate the VIRTIO devices features
> based on architecture specifics.
> 
> Provide a new Kconfig entry, CONFIG_ARCH_HAS_RESTRICTED_MEMORY_ACCESS,
> the architecture can select when it provides a callback named
> arch_has_restricted_memory_access to validate the virtio device
> features.
> 
> Signed-off-by: Pierre Morel 
> ---
>  drivers/virtio/Kconfig| 6 ++
>  drivers/virtio/virtio.c   | 4 
>  include/linux/virtio_config.h | 9 +
>  3 files changed, 19 insertions(+)
> 
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 5809e5f5b157..eef09e3c92f9 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -6,6 +6,12 @@ config VIRTIO
> bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_RPMSG
> or CONFIG_S390_GUEST.
>  
> +config ARCH_HAS_RESTRICTED_MEMORY_ACCESS
> + bool
> + help
> +   This option is selected by any architecture enforcing
> +   VIRTIO_F_IOMMU_PLATFORM

This option is only for a very specific case of "restricted memory
access", namely the kind that requires IOMMU_PLATFORM for virtio
devices. ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS? Or is this intended
to cover cases outside of virtio as well?

> +
>  menuconfig VIRTIO_MENU
>   bool "Virtio drivers"
>   default y
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index a977e32a88f2..1471db7d6510 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -176,6 +176,10 @@ int virtio_finalize_features(struct virtio_device *dev)
>   if (ret)
>   return ret;
>  
> + ret = arch_has_restricted_memory_access(dev);
> + if (ret)
> + return ret;

Hm, I'd rather have expected something like

if (arch_has_restricted_memory_access(dev)) {
// enforce VERSION_1 and IOMMU_PLATFORM
}

Otherwise, you're duplicating the checks in the individual architecture
callbacks again.

[Not sure whether the device argument would be needed here; are there
architectures where we'd only require IOMMU_PLATFORM for a subset of
virtio devices?]

> +
>   if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
>   return 0;
>  
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index bb4cc4910750..f6b82541c497 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -459,4 +459,13 @@ static inline void virtio_cwrite64(struct virtio_device 
> *vdev,
>   _r; \
>   })
>  
> +#ifdef CONFIG_ARCH_HAS_RESTRICTED_MEMORY_ACCESS
> +int arch_has_restricted_memory_access(struct virtio_device *dev);
> +#else
> +static inline int arch_has_restricted_memory_access(struct virtio_device 
> *dev)
> +{
> + return 0;
> +}
> +#endif /* CONFIG_ARCH_HAS_RESTRICTED_MEMORY_ACCESS */
> +
>  #endif /* _LINUX_VIRTIO_CONFIG_H */



Re: [PATCH] vfio/type1: Add proper error unwind for vfio_iommu_replay()

2020-08-10 Thread Cornelia Huck
On Fri, 07 Aug 2020 14:14:48 -0600
Alex Williamson  wrote:

> The vfio_iommu_replay() function does not currently unwind on error,
> yet it does pin pages, perform IOMMU mapping, and modify the vfio_dma
> structure to indicate IOMMU mapping.  The IOMMU mappings are torn down
> when the domain is destroyed, but the other actions go on to cause
> trouble later.  For example, the iommu->domain_list can be empty if we
> only have a non-IOMMU backed mdev attached.  We don't currently check
> if the list is empty before getting the first entry in the list, which
> leads to a bogus domain pointer.  If a vfio_dma entry is erroneously
> marked as iommu_mapped, we'll attempt to use that bogus pointer to
> retrieve the existing physical page addresses.
> 
> This is the scenario that uncovered this issue, attempting to hot-add
> a vfio-pci device to a container with an existing mdev device and DMA
> mappings, one of which could not be pinned, causing a failure adding
> the new group to the existing container and setting the conditions
> for a subsequent attempt to explode.
> 
> To resolve this, we can first check if the domain_list is empty so
> that we can reject replay of a bogus domain, should we ever encounter
> this inconsistent state again in the future.  The real fix though is
> to add the necessary unwind support, which means cleaning up the
> current pinning if an IOMMU mapping fails, then walking back through
> the r-b tree of DMA entries, reading from the IOMMU which ranges are
> mapped, and unmapping and unpinning those ranges.  To be able to do
> this, we also defer marking the DMA entry as IOMMU mapped until all
> entries are processed, in order to allow the unwind to know the
> disposition of each entry.
> 
> Fixes: a54eb55045ae ("vfio iommu type1: Add support for mediated devices")
> Signed-off-by: Alex Williamson 
> ---
>  drivers/vfio/vfio_iommu_type1.c |   71 
> ++++---
>  1 file changed, 66 insertions(+), 5 deletions(-)

Reviewed-by: Cornelia Huck 



Re: [PATCH v1 0/1] s390: virtio-ccw: PV needs VIRTIO I/O device protection

2020-08-06 Thread Cornelia Huck
On Thu,  6 Aug 2020 16:23:01 +0200
Pierre Morel  wrote:

> Hi all,
> 
> In another series I proposed to add an architecture specific
> callback to fail feature negociation on architecture need.
> 
> In VIRTIO, we already have an entry to reject the features on the
> transport basis.
> 
> Transport is not architecture so I send a separate series in which
> we fail the feature negociation inside virtio_ccw_finalize_features,
> the virtio_config_ops.finalize_features for S390 CCW transport,
> when the device do not propose the VIRTIO_F_IOMMU_PLATFORM.
> 
> This solves the problem of crashing QEMU when this one is not using
> a CCW device with iommu_platform=on in S390.

This does work, and I'm tempted to queue this patch, but I'm wondering
whether we need to give up on a cross-architecture solution already
(especially keeping in mind that ccw is the only transport that is
really architecture-specific).

I know that we've gone through a few rounds already, and I'm not sure
whether we've been there already, but:

Could virtio_finalize_features() call an optional
arch_has_restricted_memory_access() function and do the enforcing of
IOMMU_PLATFORM? That would catch all transports, and things should work
once an architecture opts in. That direction also shouldn't be a
problem if virtio is a module.

> 
> Regards,
> Pierre
> 
> Regards,
> Pierre
> 
> Pierre Morel (1):
>   s390: virtio-ccw: PV needs VIRTIO I/O device protection
> 
>  drivers/s390/virtio/virtio_ccw.c | 24 +++-
>  1 file changed, 19 insertions(+), 5 deletions(-)
> 



Re: [PATCH] vfio-pci: Avoid recursive read-lock usage

2020-08-06 Thread Cornelia Huck
On Wed, 05 Aug 2020 11:58:05 -0600
Alex Williamson  wrote:

> A down_read on memory_lock is held when performing read/write accesses
> to MMIO BAR space, including across the copy_to/from_user() callouts
> which may fault.  If the user buffer for these copies resides in an
> mmap of device MMIO space, the mmap fault handler will acquire a
> recursive read-lock on memory_lock.  Avoid this by reducing the lock
> granularity.  Sequential accesses requiring multiple ioread/iowrite
> cycles are expected to be rare, therefore typical accesses should not
> see additional overhead.
> 
> VGA MMIO accesses are expected to be non-fatal regardless of the PCI
> memory enable bit to allow legacy probing, this behavior remains with
> a comment added.  ioeventfds are now included in memory access testing,
> with writes dropped while memory space is disabled.
> 
> Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on 
> disabled memory")
> Signed-off-by: Alex Williamson 
> ---
>  drivers/vfio/pci/vfio_pci_private.h |2 +
>  drivers/vfio/pci/vfio_pci_rdwr.c|  120 
> ---
>  2 files changed, 98 insertions(+), 24 deletions(-)

Reviewed-by: Cornelia Huck 



Re: [PATCH v3 38/38] virtio_net: use LE accessors for speed/duplex

2020-08-05 Thread Cornelia Huck
On Wed, 5 Aug 2020 09:45:00 -0400
"Michael S. Tsirkin"  wrote:

> Speed and duplex config fields depend on VIRTIO_NET_F_SPEED_DUPLEX
> which being 63>31 depends on VIRTIO_F_VERSION_1.
> 
> Accordingly, use LE accessors for these fields.
> 
> Reported-by: Cornelia Huck 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  drivers/net/virtio_net.c| 9 +
>  include/uapi/linux/virtio_net.h | 2 +-
>  2 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index ba38765dc490..0934b1ec5320 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2264,12 +2264,13 @@ static void virtnet_update_settings(struct 
> virtnet_info *vi)
>   if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_SPEED_DUPLEX))
>   return;
>  
> - speed = virtio_cread32(vi->vdev, offsetof(struct virtio_net_config,
> -   speed));
> + virtio_cread_le(vi->vdev, struct virtio_net_config, speed, );
> +
>   if (ethtool_validate_speed(speed))
>   vi->speed = speed;
> - duplex = virtio_cread8(vi->vdev, offsetof(struct virtio_net_config,
> -   duplex));
> +
> + virtio_cread_le(vi->vdev, struct virtio_net_config, duplex, );

Looks a bit odd for an u8, but does not really hurt.

> +
>   if (ethtool_validate_duplex(duplex))
>   vi->duplex = duplex;
>  }
> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> index 27d996f29dd1..3f55a4215f11 100644
> --- a/include/uapi/linux/virtio_net.h
> +++ b/include/uapi/linux/virtio_net.h
> @@ -99,7 +99,7 @@ struct virtio_net_config {
>* speed, in units of 1Mb. All values 0 to INT_MAX are legal.
>* Any other value stands for unknown.
>*/
> - __virtio32 speed;
> + __le32 speed;
>   /*
>* 0x00 - half duplex
>* 0x01 - full duplex

Reviewed-by: Cornelia Huck 



Re: [PATCH v2 18/24] mlxbf-tmfifo: sparse tags for config access

2020-08-04 Thread Cornelia Huck
On Mon, 3 Aug 2020 17:00:01 -0400
"Michael S. Tsirkin"  wrote:

> mlxbf-tmfifo accesses config space using native types -
> which works for it since the legacy virtio native types.
> 
> This will break if it ever needs to support modern virtio,
> so with new tags previously introduced for virtio net config,
> sparse now warns for this in drivers.
> 
> Since this is a legacy only device, fix it up using
> virtio_legacy_is_little_endian for now.

I'm wondering if the driver should make this more explicit?

No issues with the patch, though.

> 
> No functional changes.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  drivers/platform/mellanox/mlxbf-tmfifo.c | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)

Acked-by: Cornelia Huck 



Re: [PATCH v2 17/24] virtio_config: disallow native type fields

2020-08-04 Thread Cornelia Huck
On Mon, 3 Aug 2020 16:59:57 -0400
"Michael S. Tsirkin"  wrote:

> Transitional devices should all use __virtioXX types.

I think they should use __leXX for those fields that are not present
with legacy devices?

> Modern ones should use __leXX.
> _uXX type would be a bug.
> Let's prevent that.

That sounds right, though.

> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  include/linux/virtio_config.h | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index 64da491936f7..c68f58f3bf34 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -319,9 +319,8 @@ static inline __virtio64 cpu_to_virtio64(struct 
> virtio_device *vdev, u64 val)
>   __virtio_pick_type(x, __u8, __u8,   
> \
>   __virtio_pick_endian(x, __virtio16, __virtio32, __virtio64, 
> \
>   __virtio_pick_endian(x, __le16, __le32, __le64, 
> \
> - __virtio_pick_endian(x, __u16, __u32, __u64,
> \
> - /* No other type allowed */ 
> \
> - (void)0)
> + /* No other type allowed */ 
> \
> + (void)0
>  
>  #endif
>  



Re: [PATCH v2 16/24] virtio_scsi: correct tags for config space fields

2020-08-04 Thread Cornelia Huck
On Mon, 3 Aug 2020 16:59:45 -0400
"Michael S. Tsirkin"  wrote:

> Tag config space fields as having virtio endian-ness.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  drivers/scsi/virtio_scsi.c   |  4 ++--
>  include/uapi/linux/virtio_scsi.h | 20 ++--
>  2 files changed, 12 insertions(+), 12 deletions(-)

Reviewed-by: Cornelia Huck 



Re: [PATCH v2 15/24] virtio_pmem: correct tags for config space fields

2020-08-04 Thread Cornelia Huck
On Mon, 3 Aug 2020 16:59:41 -0400
"Michael S. Tsirkin"  wrote:

> Since this is a modern-only device,
> tag config space fields as having little endian-ness.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  include/uapi/linux/virtio_pmem.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Cornelia Huck 



Re: [PATCH v2 14/24] virtio_net: correct tags for config space fields

2020-08-04 Thread Cornelia Huck
On Mon, 3 Aug 2020 16:59:37 -0400
"Michael S. Tsirkin"  wrote:

> Tag config space fields as having virtio endian-ness.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  include/uapi/linux/virtio_net.h | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> index 19d23e5baa4e..27d996f29dd1 100644
> --- a/include/uapi/linux/virtio_net.h
> +++ b/include/uapi/linux/virtio_net.h
> @@ -87,19 +87,19 @@ struct virtio_net_config {
>   /* The config defining mac address (if VIRTIO_NET_F_MAC) */
>   __u8 mac[ETH_ALEN];
>   /* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
> - __u16 status;
> + __virtio16 status;
>   /* Maximum number of each of transmit and receive queues;
>* see VIRTIO_NET_F_MQ and VIRTIO_NET_CTRL_MQ.
>* Legal values are between 1 and 0x8000
>*/
> - __u16 max_virtqueue_pairs;
> + __virtio16 max_virtqueue_pairs;
>   /* Default maximum transmit unit advice */
> - __u16 mtu;
> + __virtio16 mtu;
>   /*
>* speed, in units of 1Mb. All values 0 to INT_MAX are legal.
>* Any other value stands for unknown.
>*/
> - __u32 speed;
> + __virtio32 speed;

Hm... VIRTIO_NET_F_SPEED_DUPLEX can only be negotiated if VERSION_1 has
also been negotiated; I think this should be __le32?

>   /*
>* 0x00 - half duplex
>* 0x01 - full duplex



Re: [PATCH v2 13/24] virtio_mem: correct tags for config space fields

2020-08-04 Thread Cornelia Huck
On Mon, 3 Aug 2020 16:59:32 -0400
"Michael S. Tsirkin"  wrote:

> Since this is a modern-only device,
> tag config space fields as having little endian-ness.
> 
> TODO: check other uses of __virtioXX types in this header,
> should probably be __leXX.

Yes, I think so.

> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  include/uapi/linux/virtio_mem.h | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)

Reviewed-by: Cornelia Huck 



  1   2   3   4   5   6   7   8   9   10   >