Re: [PATCH v15 00/13] s390/vfio-ap: dynamic configuration support

2021-04-08 Thread Halil Pasic
On Tue,  6 Apr 2021 11:31:09 -0400
Tony Krowiak  wrote:

> Tony Krowiak (13):
>   s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

The subsequent patches re-introduce this circular locking dependency
problem. See my kernel messages below for the details. The link we sever
in the above patch is re-introduced in several places. One of them is
assign_adapter_store().
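To illustrate why the three acquisitions in the report below form a cycle, here is a toy userspace model of the lock-order bookkeeping that lockdep performs conceptually: whenever lock B is taken while lock A is held, an A -> B edge is recorded, and an acquisition whose edge would close a cycle is flagged. The lock names mirror the report; the graph code is a simplified sketch of the idea, not lockdep's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy lock-order graph: a simplified model of what lockdep tracks. */
enum lock_id { MATRIX_DEV_LOCK, KVM_LOCK, VCPU_MUTEX, NR_LOCKS };

static const char *const lock_name[NR_LOCKS] = {
	[MATRIX_DEV_LOCK] = "&matrix_dev->lock",
	[KVM_LOCK]        = "&kvm->lock",
	[VCPU_MUTEX]      = "&vcpu->mutex",
};

/* edge[a][b] is true once b has been acquired while a was held. */
static bool edge[NR_LOCKS][NR_LOCKS];

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool reachable(int from, int to, bool seen[NR_LOCKS])
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (edge[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquire 'next' while holding 'held'". Returns false and prints a
 * warning, without recording the edge, if 'held' is already reachable from
 * 'next', i.e. the new edge would close a cycle.
 */
static bool record_acquire(enum lock_id held, enum lock_id next)
{
	bool seen[NR_LOCKS] = { false };

	assert(held < NR_LOCKS && next < NR_LOCKS);
	if (reachable(next, held, seen)) {
		printf("possible circular locking dependency: %s -> %s\n",
		       lock_name[held], lock_name[next]);
		return false;
	}
	edge[held][next] = true;
	return true;
}
```

Replaying the dependencies from the report (&vcpu->mutex taken under &kvm->lock in kvm_s390_cpus_to_pv(), &kvm->lock taken under &matrix_dev->lock via assign_adapter_store(), then &matrix_dev->lock taken under &vcpu->mutex in handle_pqap()) makes only the last call fail.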

Regards,
Halil

[  +0.000236] vfio_ap matrix: MDEV: Registered
[  +0.037919] vfio_mdev 4f77ad87-1e62-4959-8b7a-c677c98d2194: Adding to iommu group 1
[  +0.92] vfio_mdev 4f77ad87-1e62-4959-8b7a-c677c98d2194: MDEV: group_id = 1

[Apr 8 22:31] ==
[  +0.02] WARNING: possible circular locking dependency detected
[  +0.02] 5.12.0-rc6-00016-g5bea90816c56 #57 Not tainted
[  +0.02] --
[  +0.02] CPU 1/KVM/6651 is trying to acquire lock:
[  +0.02] cef9d508 (&matrix_dev->lock){+.+.}-{3:3}, at: handle_pqap+0x56/0x1c8 [vfio_ap]
[  +0.11] 
  but task is already holding lock:
[  +0.01] d41f4308 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x90/0x898 [kvm]
[  +0.38] 
  which lock already depends on the new lock.

[  +0.02] 
  the existing dependency chain (in reverse order) is:
[  +0.01] 
  -> #2 (&vcpu->mutex){+.+.}-{3:3}:
[  +0.04]        validate_chain+0x796/0xa20
[  +0.06]        __lock_acquire+0x420/0x7c8
[  +0.03]        lock_acquire.part.0+0xec/0x1e8
[  +0.02]        lock_acquire+0xb8/0x208
[  +0.02]        __mutex_lock+0xa2/0x928
[  +0.05]        mutex_lock_nested+0x32/0x40
[  +0.02]        kvm_s390_cpus_to_pv+0x4e/0xf8 [kvm]
[  +0.19]        kvm_s390_handle_pv+0x1ce/0x6b0 [kvm]
[  +0.18]        kvm_arch_vm_ioctl+0x3ec/0x550 [kvm]
[  +0.19]        kvm_vm_ioctl+0x40e/0x4a8 [kvm]
[  +0.18]        __s390x_sys_ioctl+0xc0/0x100
[  +0.04]        do_syscall+0x7e/0xd0
[  +0.43]        __do_syscall+0xc0/0xd8
[  +0.04]        system_call+0x72/0x98
[  +0.04] 
  -> #1 (&kvm->lock){+.+.}-{3:3}:
[  +0.04]        validate_chain+0x796/0xa20
[  +0.02]        __lock_acquire+0x420/0x7c8
[  +0.02]        lock_acquire.part.0+0xec/0x1e8
[  +0.02]        lock_acquire+0xb8/0x208
[  +0.03]        __mutex_lock+0xa2/0x928
[  +0.02]        mutex_lock_nested+0x32/0x40
[  +0.02]        kvm_arch_crypto_set_masks+0x4a/0x2b8 [kvm]
[  +0.18]        vfio_ap_mdev_refresh_apcb+0xd0/0xe0 [vfio_ap]
[  +0.03]        assign_adapter_store+0x1f2/0x240 [vfio_ap]
[  +0.03]        kernfs_fop_write_iter+0x13e/0x1e0
[  +0.03]        new_sync_write+0x10a/0x198
[  +0.03]        vfs_write.part.0+0x196/0x290
[  +0.02]        ksys_write+0x6c/0xf8
[  +0.03]        do_syscall+0x7e/0xd0
[  +0.02]        __do_syscall+0xc0/0xd8
[  +0.03]        system_call+0x72/0x98
[  +0.02] 
  -> #0 (&matrix_dev->lock){+.+.}-{3:3}:
[  +0.04]        check_noncircular+0x16e/0x190
[  +0.02]        check_prev_add+0xec/0xf38
[  +0.02]        validate_chain+0x796/0xa20
[  +0.02]        __lock_acquire+0x420/0x7c8
[  +0.02]        lock_acquire.part.0+0xec/0x1e8
[  +0.02]        lock_acquire+0xb8/0x208
[  +0.02]        __mutex_lock+0xa2/0x928
[  +0.02]        mutex_lock_nested+0x32/0x40
[  +0.03]        handle_pqap+0x56/0x1c8 [vfio_ap]
[  +0.02]        handle_pqap+0xe2/0x1d8 [kvm]
[  +0.19]        kvm_handle_sie_intercept+0x134/0x248 [kvm]
[  +0.19]        vcpu_post_run+0x2b6/0x580 [kvm]
[  +0.18]        __vcpu_run+0x27e/0x388 [kvm]
[  +0.19]        kvm_arch_vcpu_ioctl_run+0x10a/0x278 [kvm]
[  +0.18]        kvm_vcpu_ioctl+0x2cc/0x898 [kvm]
[  +0.18]        __s390x_sys_ioctl+0xc0/0x100
[  +0.03]        do_syscall+0x7e/0xd0
[  +0.02]        __do_syscall+0xc0/0xd8
[  +0.02]        system_call+0x72/0x98
[  +0.03] 
  other info that might help us debug this:

[  +0.01] Chain exists of:
&matrix_dev->lock --> &kvm->lock --> &vcpu->mutex

[  +0.05]  Possible unsafe locking scenario:

[  +0.01]        CPU0                    CPU1
[  +0.01]        ----                    ----
[  +0.02]   lock(&vcpu->mutex);
[  +0.02]                                lock(&kvm->lock);
[  +0.02]                                lock(&vcpu->mutex);
[  +0.02]   lock(&matrix_dev->lock);
[  +0.02] 
   *** DEADLOCK ***

[  +0.02] 2 locks held by CPU 1/KVM/6651:
[  +0.02]  #0: d41f4308 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x90/0x898 [kvm]
[  +0.23]  #1: da2fc508 (&kvm->srcu){}-{0:0}, at: __vcpu_run+0x1ec/0x388 [kvm]
[  +0.21] 
  stack backtrace:
[  +0.02] CPU: 6 PID: 6651 Comm: CPU 1/KVM Not tainted 5.12.0-rc6-00016-g5bea90816c56 #57
[  +0.04] Hardware name: IBM 8561 T01 701 (LPAR)
[  +0.01] Call Trace:
[  +0.02]  [<0002010e7ef0>] 

Re: [PATCH v14 00/13] s390/vfio-ap: dynamic configuration support

2021-04-01 Thread Halil Pasic
On Wed, 31 Mar 2021 11:22:43 -0400
Tony Krowiak  wrote:

> Change log v13-v14:
> --

While testing, I experienced this kernel panic:


[ 4422.479706] vfio_ap matrix: MDEV: Registered
[ 4422.516999] vfio_mdev b2013234-18b2-49bf-badd-a4be9c78b120: Adding to iommu group 1
[ 4422.517037] vfio_mdev b2013234-18b2-49bf-badd-a4be9c78b120: MDEV: group_id = 1
[ 4577.906708] vfio_mdev b2013234-18b2-49bf-badd-a4be9c78b120: Removing from iommu group 1
[ 4577.906917] vfio_mdev b2013234-18b2-49bf-badd-a4be9c78b120: MDEV: detaching iommu
[ 4577.908093] Unable to handle kernel pointer dereference in virtual kernel address space
[ 4577.908097] Failing address: 0006ec02f000 TEID: 0006ec02f403
[ 4577.908100] Fault in home space mode while using kernel ASCE.
[ 4577.908106] AS:00035eb4c007 R3:0024 
[ 4577.908126] Oops: 003b ilc:3 [#1] PREEMPT SMP 
[ 4577.908132] Modules linked in: vfio_ap vhost_vsock vmw_vsock_virtio_transport_common
vsock vhost vhost_iotlb kvm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT xt_tcpudp
nft_compat nf_nat_tftp nft_objref nf_conntrack_tftp nft_counter bridge stp llc
nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc s390_trng eadm_sch vfio_ccw vfio_mdev
mdev vfio_iommu_type1 vfio sch_fq_codel configfs ip_tables x_tables dm_service_time
ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390
sha256_s390 sha1_s390 sha_common nvme nvme_core zfcp scsi_transport_fc dm_multipath
scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_mod rng_core
autofs4
[ 4577.908181] CPU: 0 PID: 14315 Comm: nose2 Not tainted 5.12.0-rc5-00030-g4cd110385fa2 #55
[ 4577.908183] Hardware name: IBM 8561 T01 701 (LPAR)
[ 4577.908185] Krnl PSW : 0404e0018000 00035d2a50f4 (__lock_acquire+0xdc/0x7c8)
[ 4577.908194]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[ 4577.908232] Krnl GPRS: 00039d168d46 0006ec02f538 00035e7de940 

[ 4577.908235]  0001 
f9e04150
[ 4577.908237]00035fa8b100 006b6b6b680c417f f9e04150 
00035e61e8d0
[ 4577.908239]00035fa8b100  038010c4b7d8 
038010c4b738
[ 4577.908247] Krnl Code: 00035d2a50e4: eb110003000dsllg
%r1,%r1,3
[ 4577.908247]00035d2a50ea: b9080012agr %r1,%r2
[ 4577.908247]   #00035d2a50ee: e31003b80008ag  %r1,952
[ 4577.908247]   >00035d2a50f4: eb01107aagsi0(%r1),1
[ 4577.908247]00035d2a50fa: a718lhi %r1,-1
[ 4577.908247]00035d2a50fe: eb1103a800f8laa 
%r1,%r1,936
[ 4577.908247]00035d2a5104: ec18026b017ecij 
%r1,1,8,00035d2a55da
[ 4577.908247]00035d2a510a: c4180086d01flgrl
%r1,00035e37f148
[ 4577.908262] Call Trace:
[ 4577.908264]  [<00035d2a50f4>] __lock_acquire+0xdc/0x7c8 
[ 4577.908267]  [<00035d2a41ac>] lock_acquire.part.0+0xec/0x1e8 
[ 4577.908270]  [<00035d2a4360>] lock_acquire+0xb8/0x208 
[ 4577.908272]  [<00035de6fa2a>] _raw_spin_lock_irqsave+0x6a/0xd8 
[ 4577.908279]  [<00035d2874fe>] prepare_to_wait_event+0x2e/0x1e0 
[ 4577.908281]  [<03ff805d539a>] vfio_ap_mdev_remove_queue+0x122/0x148 [vfio_ap]
[ 4577.908287]  [<00035de20e94>] ap_device_remove+0x4c/0xf0 
[ 4577.908292]  [<00035db268a2>] __device_release_driver+0x18a/0x230 
[ 4577.908298]  [<00035db27cf0>] device_driver_detach+0x58/0xd0 
[ 4577.908301]  [<00035db25000>] device_reprobe+0x30/0xc0 
[ 4577.908304]  [<00035de22570>] __ap_revise_reserved+0x110/0x148 
[ 4577.908307]  [<00035db2408c>] bus_for_each_dev+0x7c/0xb8 
[ 4577.908310]  [<00035de2290c>] apmask_store+0xd4/0x118 
[ 4577.908313]  [<00035d639316>] kernfs_fop_write_iter+0x13e/0x1e0 
[ 4577.908317]  [<00035d542d22>] new_sync_write+0x10a/0x198 
[ 4577.908321]  [<00035d5433ee>] vfs_write.part.0+0x196/0x290 
[ 4577.908323]  [<00035d545f44>] ksys_write+0x6c/0xf8 
[ 4577.908326]  [<00035d1ce7ae>] do_syscall+0x7e/0xd0 
[ 4577.908330]  [<00035de5fc00>] __do_syscall+0xc0/0xd8 
[ 4577.908334]  [<00035de70c22>] system_call+0x72/0x98 
[ 4577.908337] INFO: lockdep is turned off.
[ 4577.908338] Last Breaking-Event-Address:
[ 4577.908340]  [<038010c4b648>] 0x38010c4b648
[ 4577.908345] Kernel panic - not syncing: Fatal exception: panic_on_oops


Re: [PATCH v5 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-03-25 Thread Halil Pasic
On Thu, 25 Mar 2021 08:46:40 -0400
Tony Krowiak  wrote:

> This patch fixes a lockdep splat introduced by commit f21916ec4826
> ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated").
> The lockdep splat only occurs when starting a Secure Execution guest.
> Crypto virtualization (vfio_ap) is not yet supported for SE guests;
> however, in order to avoid this problem when support becomes available,
> this fix is being provided.
> 
> The circular locking dependency was introduced when the setting of the
> masks in the guest's APCB was executed while holding the matrix_dev->lock.
> While the lock is definitely needed to protect the setting/unsetting of the
> matrix_mdev->kvm pointer, it is not necessarily critical for setting the
> masks; so, the matrix_dev->lock will be released while the masks are being
> set or cleared.
> 
> Keep in mind, however, that another process that takes the matrix_dev->lock
> can get control while the masks in the guest's APCB are being set or
> cleared as a result of the driver being notified that the KVM pointer
> has been set or unset. This could result in invalid access to the
> matrix_mdev->kvm pointer by the intervening process. To avoid this
> scenario, two new fields are being added to the ap_matrix_mdev struct:
> 
> struct ap_matrix_mdev {
>   ...
>   bool kvm_busy;
>   wait_queue_head_t wait_for_kvm;
>...
> };
> 
> The functions that handle notification that the KVM pointer value has
> been set or cleared will set the kvm_busy flag to true until they are done
> processing at which time they will set it to false and wake up the tasks on
> the matrix_mdev->wait_for_kvm wait queue. Functions that require
> access to matrix_mdev->kvm will sleep on the wait queue until they are
> awakened at which time they can safely access the matrix_mdev->kvm
> field.
> 
> Fixes: f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Tony Krowiak 
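The kvm_busy/wait_for_kvm handshake described in the commit message can be modeled in userspace with a pthread mutex and condition variable. This is a rough sketch assuming the two behave analogously: pthread_cond_wait() drops and retakes the mutex around the sleep, much like wait_event_cmd() with mutex_unlock()/mutex_lock() as its commands, and pthread_cond_broadcast() stands in for wake_up_all(). All names below are hypothetical; set_masks() is only a placeholder for kvm_arch_crypto_set_masks().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct ap_matrix_mdev. */
struct mdev_model {
	pthread_mutex_t lock;        /* plays the role of matrix_dev->lock */
	pthread_cond_t wait_for_kvm; /* plays the role of the wait queue */
	bool kvm_busy;               /* a set/unset of ->kvm is in flight */
	void *kvm;
	int masks_set;               /* counts placeholder mask updates */
};

static void model_init(struct mdev_model *m)
{
	pthread_mutex_init(&m->lock, NULL);
	pthread_cond_init(&m->wait_for_kvm, NULL);
	m->kvm_busy = false;
	m->kvm = NULL;
	m->masks_set = 0;
}

/* Placeholder for kvm_arch_crypto_set_masks(); called without m->lock. */
static void set_masks(struct mdev_model *m)
{
	m->masks_set++;
}

/* Step 1 of setting the KVM pointer: announce that it is in flux. */
static void model_set_kvm_begin(struct mdev_model *m)
{
	pthread_mutex_lock(&m->lock);
	m->kvm_busy = true;
	pthread_mutex_unlock(&m->lock);
}

/* Step 2: do the slow part unlocked, then publish and wake all waiters. */
static void model_set_kvm_finish(struct mdev_model *m, void *kvm)
{
	set_masks(m);

	pthread_mutex_lock(&m->lock);
	m->kvm = kvm;
	m->kvm_busy = false;
	pthread_cond_broadcast(&m->wait_for_kvm); /* like wake_up_all() */
	pthread_mutex_unlock(&m->lock);
}

/*
 * Reader (e.g. the reset path): sleep while kvm_busy is set.
 * pthread_cond_wait() drops m->lock while asleep and retakes it on wakeup.
 */
static void *model_read_kvm(void *arg)
{
	struct mdev_model *m = arg;
	void *kvm;

	pthread_mutex_lock(&m->lock);
	while (m->kvm_busy)
		pthread_cond_wait(&m->wait_for_kvm, &m->lock);
	kvm = m->kvm;
	pthread_mutex_unlock(&m->lock);
	return kvm;
}
```

Whether the reader gets the lock before or after model_set_kvm_finish() runs, it only ever sees the final pointer. The predicate loop around pthread_cond_wait() also tolerates spurious wakeups, similar to how wait_event_cmd() rechecks its condition after every wakeup.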

Reviewed-by: Halil Pasic 

I intend to give others a couple of workdays and, if nobody objects,
merge this. (I will wait till Tuesday.)

I've tested it and it does silence the lockdep splat.

Regards,
Halil


Re: [PATCH v4 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-03-18 Thread Halil Pasic
On Thu, 18 Mar 2021 14:38:53 -0400
Tony Krowiak  wrote:

> On 3/17/21 7:17 PM, Halil Pasic wrote:
> > On Wed, 10 Mar 2021 10:05:59 -0500
> > Tony Krowiak  wrote:
> >  
> >> -  ret = vfio_ap_mdev_reset_queues(mdev);
> >> +  matrix_mdev = mdev_get_drvdata(mdev);  
> > Is it guaranteed that matrix_mdev can't be NULL here? If yes, please
> > remind me of the mechanism that ensures this.
> >  
> >> +
> >> +  /*
> >> +   * If the KVM pointer is in the process of being set, wait until
> >> +   * the process has completed.
> >> +   */
> >> +  wait_event_cmd(matrix_mdev->wait_for_kvm,
> >> + matrix_mdev->kvm_busy == false,
> >> + mutex_unlock(&matrix_dev->lock),
> >> + mutex_lock(&matrix_dev->lock));
> >> +
> >> +  if (matrix_mdev->kvm)
> >> +  ret = vfio_ap_mdev_reset_queues(mdev);
> >> +  else
> >> +  ret = -ENODEV;  
> > Didn't we agree to make the call to vfio_ap_mdev_reset_queues()
> > unconditional again (for reference please take look at
> > Message-ID: <64afa72c-2d6a-2ca1-e576-34e15fa57...@linux.ibm.com>)?  
> 
> How about this:

Looks good. I will check the mdev code to see whether the check is really
needed. I'm curious when the sysfs files associated with a new mdev are
created. My guess is that this one comes in via a device-specific file
(not via the parent, as in the case of create), and that those may be
created after the create. But we can get rid of the check at any time, so I
really don't see it as something that would preclude merging this.

Regards,
Halil

> 
> static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
>                      unsigned int cmd, unsigned long arg)
> {
>      int ret = 0;
>      struct ap_matrix_mdev *matrix_mdev;
> 
>      ...
>      case VFIO_DEVICE_RESET:
>          matrix_mdev = mdev_get_drvdata(mdev);
>          WARN(!matrix_mdev, "Driver data missing from mdev!!");
> 
>          if (matrix_mdev) {
>              /*
>               * If the KVM pointer is in the process of being set, wait until
>               * the process has completed.
>               */
>              wait_event_cmd(matrix_mdev->wait_for_kvm,
>                     matrix_mdev->kvm_busy == false,
>                     mutex_unlock(&matrix_dev->lock),
>                     mutex_lock(&matrix_dev->lock));
> 
>              ret = vfio_ap_mdev_reset_queues(mdev);
>          }
>          break;
>      ...
> 
>      return ret;
> }
> 
> >
> > Regards,
> > Halil  
> 



Re: [PATCH v4 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-03-18 Thread Halil Pasic
On Thu, 18 Mar 2021 13:54:06 -0400
Tony Krowiak  wrote:

> > Is it guaranteed that matrix_mdev can't be NULL here? If yes, please
> > remind me of the mechanism that ensures this.  
> 
> The matrix_mdev is set as drvdata when the mdev is created and
> is only cleared when the mdev is removed. Likewise, this function
> is a callback defined by vfio in the vfio_ap_matrix_ops structure
> when the matrix_dev is registered and is intended to handle ioctl
> calls from userspace during the lifetime of the mdev. 

Yes, I've checked that these are all callbacks in the same struct, so
the callbacks are all registered simultaneously; i.e. it is not the case
that the ioctl callback gets registered only once drv_data is already set.
If there isn't a mechanism in core mdev, then I think we had better be
careful. I don't see anything in the vfio_ap code that would guarantee
the pointer is always valid.

> While I can't
> speak definitively to the guarantee, I think it is extremely unlikely
> that matrix_mdev would be NULL at this point. On the other hand,
> it wouldn't hurt to check for NULL and log an error or warning
> message (I prefer an error here) if NULL.

If we aren't absolutely sure this pointer is going to be always a valid
one, let's check it!

Regards,
Halil


Re: [PATCH v4 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-03-17 Thread Halil Pasic
On Wed, 10 Mar 2021 10:05:59 -0500
Tony Krowiak  wrote:

> - ret = vfio_ap_mdev_reset_queues(mdev);
> + matrix_mdev = mdev_get_drvdata(mdev);

Is it guaranteed that matrix_mdev can't be NULL here? If yes, please
remind me of the mechanism that ensures this.

> +
> + /*
> +  * If the KVM pointer is in the process of being set, wait until
> +  * the process has completed.
> +  */
> + wait_event_cmd(matrix_mdev->wait_for_kvm,
> +matrix_mdev->kvm_busy == false,
> +mutex_unlock(&matrix_dev->lock),
> +mutex_lock(&matrix_dev->lock));
> +
> + if (matrix_mdev->kvm)
> + ret = vfio_ap_mdev_reset_queues(mdev);
> + else
> + ret = -ENODEV;

Didn't we agree to make the call to vfio_ap_mdev_reset_queues()
unconditional again (for reference please take look at 
Message-ID: <64afa72c-2d6a-2ca1-e576-34e15fa57...@linux.ibm.com>)?

Regards,
Halil


Re: [PATCH v3 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-03-09 Thread Halil Pasic
On Thu, 4 Mar 2021 12:43:44 -0500
Tony Krowiak  wrote:

> On the other hand, if we don't have ->kvm because something broke,
> then we may be out of luck anyway. There will certainly be no
> way to unregister the GISC; however, it may still be possible
> to unpin the pages if we still have q->saved_pfn.
> 
> The point is, if the queue is bound to vfio_ap, it can be reset. If we can't
> clean up the IRQ resources because something is broken, then there
> is nothing we can do about that.

Especially since, with the recently added WARN_ONCE macros, calling
reset_queues unconditionally ain't that bad: we would at least see if there
is a problem with cleaning up the IRQ resources.

Let's make it unconditional again and observe. Can you send out a v4 with
this and the other issue fixed?

Regards,
Halil


Re: [PATCH v1 13/14] vfio: Remove extern from declarations across vfio

2021-03-08 Thread Halil Pasic
On Mon, 08 Mar 2021 14:49:42 -0700
Alex Williamson  wrote:

> Cleanup disrecommended usage and docs.
> 
> Signed-off-by: Alex Williamson 

Acked-by: Halil Pasic 


Re: [PATCH v3 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-03-03 Thread Halil Pasic
On Wed, 3 Mar 2021 11:41:22 -0500
Tony Krowiak  wrote:

> > How do you expect userspace to react to this -ENODEV?
> 
> The VFIO_DEVICE_RESET ioctl expects a return code.
> The vfio_ap_mdev_reset_queues() function can return -EIO or
> -EBUSY, so I would expect userspace to handle -ENODEV
> similarly to -EIO or any other non-zero return code. I also
> looked at all of the VFIO_DEVICE_RESET calls from QEMU to see
> how the return from the ioctl call is handled:
> 
> * ap: reports the reset failed along with the rc

And carries on as if nothing happened. There is not much smart
userspace can do in such a situation. Therefore the reset really
should not fail.

Please note that in this particular case, if the userspace would
opt for a retry, we would most likely end up in a retry loop.

> * ccw: doesn't check the rc
> * pci: kind of hard to follow without digging deep, but definitely
>   handles non-zero rc.
> 
> I think the caller should be notified whether the queues were
> successfully reset or not, and why; in this case, the answer is
> there are no devices to reset.

That is the wrong answer. The ioctl is supposed to reset the
ap_matrix_mdev device. The ap_matrix_mdev device still exists. Thus
returning -ENODEV is bogus.
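The retry-loop concern can be made concrete with a hypothetical userspace caller that retries a failed reset a bounded number of times. reset_v3() models the v3 behavior (return -ENODEV while no KVM pointer is set) and reset_v4() models an unconditional reset; neither function, nor the retry policy, is taken from QEMU or the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Model of the v3 reset: refuses with -ENODEV while no KVM pointer is set. */
static int reset_v3(const void *kvm)
{
	return kvm ? 0 : -ENODEV;
}

/* Model of an unconditional reset: the mdev exists, so reset its queues. */
static int reset_v4(const void *kvm)
{
	(void)kvm;
	return 0;
}

/*
 * Hypothetical caller policy: retry up to max_tries times.
 * Returns the number of attempts used, or -1 if every attempt failed.
 */
static int reset_with_retry(int (*reset)(const void *), const void *kvm,
			    int max_tries)
{
	for (int attempt = 1; attempt <= max_tries; attempt++) {
		if (reset(kvm) == 0)
			return attempt;
	}
	return -1;
}
```

Because nothing the caller does makes a KVM pointer appear, reset_v3() fails on every attempt when no guest is attached; an unbounded retry policy would spin forever, whereas the unconditional reset succeeds on the first try.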

Regards,
Halil


Re: [PATCH v3 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-03-03 Thread Halil Pasic
On Wed, 3 Mar 2021 12:10:11 -0500
Tony Krowiak  wrote:

> On 3/3/21 10:23 AM, Halil Pasic wrote:
> > On Tue,  2 Mar 2021 15:43:22 -0500
> > Tony Krowiak  wrote:
> >  
> >> This patch fixes a lockdep splat introduced by commit f21916ec4826
> >> ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated").
> >> The lockdep splat only occurs when starting a Secure Execution guest.
> >> Crypto virtualization (vfio_ap) is not yet supported for SE guests;
> >> however, in order to avoid this problem when support becomes available,
> >> this fix is being provided.  
> > [..]
> >  
> >> @@ -1038,14 +1116,28 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
> >>   {
> >>struct ap_matrix_mdev *m;
> >>
> >> -  list_for_each_entry(m, &matrix_dev->mdev_list, node) {
> >> -  if ((m != matrix_mdev) && (m->kvm == kvm))
> >> -  return -EPERM;
> >> -  }
> >> +  if (kvm->arch.crypto.crycbd) {
> >> +  matrix_mdev->kvm_busy = true;
> >>
> >> -  matrix_mdev->kvm = kvm;
> >> -  kvm_get_kvm(kvm);
> >> -  kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
> >> +  list_for_each_entry(m, _dev->mdev_list, node) {
> >> +  if ((m != matrix_mdev) && (m->kvm == kvm)) {
> >> +  wake_up_all(&matrix_mdev->wait_for_kvm);
> > This ain't no good. kvm_busy will remain true if we take this exit. The
> > wake_up_all() is not needed, because we hold the lock, so nobody can
> > observe the flag as long as we don't leave kvm_busy set.
> >
> > I suggest moving matrix_mdev->kvm_busy = true; after this loop, maybe right
> > before the unlock, and removing the wake_up_all().
> >  
> >> +  return -EPERM;
> >> +  }
> >> +  }
> >> +
> >> +  kvm_get_kvm(kvm);
> >> +  mutex_unlock(&matrix_dev->lock);
> >> +  kvm_arch_crypto_set_masks(kvm,
> >> +matrix_mdev->matrix.apm,
> >> +matrix_mdev->matrix.aqm,
> >> +matrix_mdev->matrix.adm);
> >> +  mutex_lock(&matrix_dev->lock);
> >> +  kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
> >> +  matrix_mdev->kvm = kvm;
> >> +  matrix_mdev->kvm_busy = false;
> >> +  wake_up_all(&matrix_mdev->wait_for_kvm);
> >> +  }
> >>
> >>return 0;
> >>   }  
> > [..]
> >  
> >> @@ -1300,7 +1406,21 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
> >>ret = vfio_ap_mdev_get_device_info(arg);
> >>break;
> >>case VFIO_DEVICE_RESET:
> >> -  ret = vfio_ap_mdev_reset_queues(mdev);
> >> +  matrix_mdev = mdev_get_drvdata(mdev);
> >> +
> >> +  /*
> >> +   * If the KVM pointer is in the process of being set, wait until
> >> +   * the process has completed.
> >> +   */
> >> +  wait_event_cmd(matrix_mdev->wait_for_kvm,
> >> + matrix_mdev->kvm_busy == false,
> >> + mutex_unlock(&matrix_dev->lock),
> >> + mutex_lock(&matrix_dev->lock));
> >> +
> >> +  if (matrix_mdev->kvm)
> >> +  ret = vfio_ap_mdev_reset_queues(mdev);
> >> +  else
> >> +  ret = -ENODEV;  
> > I don't think rejecting the reset is a good idea. I gave you a more detailed
> > explanation on the list, where we initially discussed this question.
> >
> > How do you expect userspace to react to this -ENODEV?
> 
> After reading your more detailed explanation, I have come to the
> conclusion that the test for matrix_mdev->kvm should not be
> performed here and the vfio_ap_mdev_reset_queues() function
> should be called regardless. Each queue assigned to the mdev
> that is also bound to the vfio_ap driver will get reset and its
> IRQ resources cleaned up if they haven't already been and the
> other required conditions are met (i.e., see 
> vfio_ap_mdev_free_irq_resources()).

My point is if !->kvm the other required conditions are not met. But
yes we can go back to unconditional vfio_ap_mdev_reset_queues(mdev),
and think about the necessity of performing a
vfio_ap_mdev_reset_queues() if !->kvm later as I proposed in the other
mail.

Regards,
Halil


Re: [PATCH v3 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-03-03 Thread Halil Pasic
On Tue,  2 Mar 2021 15:43:22 -0500
Tony Krowiak  wrote:

> This patch fixes a lockdep splat introduced by commit f21916ec4826
> ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated").
> The lockdep splat only occurs when starting a Secure Execution guest.
> Crypto virtualization (vfio_ap) is not yet supported for SE guests;
> however, in order to avoid this problem when support becomes available,
> this fix is being provided.

[..]

> @@ -1038,14 +1116,28 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>  {
>   struct ap_matrix_mdev *m;
> 
> - list_for_each_entry(m, &matrix_dev->mdev_list, node) {
> - if ((m != matrix_mdev) && (m->kvm == kvm))
> - return -EPERM;
> - }
> + if (kvm->arch.crypto.crycbd) {
> + matrix_mdev->kvm_busy = true;
> 
> - matrix_mdev->kvm = kvm;
> - kvm_get_kvm(kvm);
> - kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
> + list_for_each_entry(m, _dev->mdev_list, node) {
> + if ((m != matrix_mdev) && (m->kvm == kvm)) {
> + wake_up_all(&matrix_mdev->wait_for_kvm);

This ain't no good. kvm_busy will remain true if we take this exit. The
wake_up_all() is not needed, because we hold the lock, so nobody can
observe the flag as long as we don't leave kvm_busy set.

I suggest moving matrix_mdev->kvm_busy = true; after this loop, maybe right
before the unlock, and removing the wake_up_all().
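A single-threaded model of the duplicate-KVM check shows the problem with the early exit: if kvm_busy is raised before the scan, the -EPERM path returns with the flag still set, and anything later waiting for !kvm_busy would sleep indefinitely. Raising the flag only after the scan leaves nothing to undo on the error path. The array of mdevs, the struct, and both function names are hypothetical simplifications of mdev_list and vfio_ap_mdev_set_kvm(); locking and the crycbd check are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ap_matrix_mdev. */
struct mdev {
	void *kvm;
	bool kvm_busy;
};

/* As in v3: busy is raised before checking for a duplicate KVM. */
static int set_kvm_v3(struct mdev *all, size_t n, struct mdev *self, void *kvm)
{
	self->kvm_busy = true;
	for (size_t i = 0; i < n; i++)
		if (&all[i] != self && all[i].kvm == kvm)
			return -EPERM; /* bug: kvm_busy is left true */
	/* ... drop lock, set masks, retake lock ... */
	self->kvm = kvm;
	self->kvm_busy = false;
	return 0;
}

/* As suggested: only raise busy once the duplicate scan has passed. */
static int set_kvm_fixed(struct mdev *all, size_t n, struct mdev *self, void *kvm)
{
	for (size_t i = 0; i < n; i++)
		if (&all[i] != self && all[i].kvm == kvm)
			return -EPERM; /* nothing to undo, nobody to wake */
	self->kvm_busy = true;
	/* ... drop lock, set masks, retake lock ... */
	self->kvm = kvm;
	self->kvm_busy = false;
	return 0;
}
```

Since the flag was never raised on the error path of set_kvm_fixed(), there is no waiter that could need waking there, which is why the wake_up_all() can simply go away.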

> + return -EPERM;
> + }
> + }
> +
> + kvm_get_kvm(kvm);
> + mutex_unlock(&matrix_dev->lock);
> + kvm_arch_crypto_set_masks(kvm,
> +   matrix_mdev->matrix.apm,
> +   matrix_mdev->matrix.aqm,
> +   matrix_mdev->matrix.adm);
> + mutex_lock(&matrix_dev->lock);
> + kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
> + matrix_mdev->kvm = kvm;
> + matrix_mdev->kvm_busy = false;
> + wake_up_all(&matrix_mdev->wait_for_kvm);
> + }
> 
>   return 0;
>  }

[..]

> @@ -1300,7 +1406,21 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
>   ret = vfio_ap_mdev_get_device_info(arg);
>   break;
>   case VFIO_DEVICE_RESET:
> - ret = vfio_ap_mdev_reset_queues(mdev);
> + matrix_mdev = mdev_get_drvdata(mdev);
> +
> + /*
> +  * If the KVM pointer is in the process of being set, wait until
> +  * the process has completed.
> +  */
> + wait_event_cmd(matrix_mdev->wait_for_kvm,
> +matrix_mdev->kvm_busy == false,
> +mutex_unlock(&matrix_dev->lock),
> +mutex_lock(&matrix_dev->lock));
> +
> + if (matrix_mdev->kvm)
> + ret = vfio_ap_mdev_reset_queues(mdev);
> + else
> + ret = -ENODEV;

I don't think rejecting the reset is a good idea. I gave you a more detailed
explanation on the list, where we initially discussed this question.

How do you expect userspace to react to this -ENODEV?

Otherwise looks good to me!

I've tested your branch from yesterday (which looks to me like this patch
without the above check on ->kvm and reset) for the lockdep splat, but I
didn't do any comprehensive testing -- which would ensure that we didn't
break something else in the process. With the two issues fixed, and your
word that the patch was properly tested (except for the lockdep splat
which I tested myself), I feel comfortable with moving forward with this.

Regards,



Re: [PATCH v2 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-02-25 Thread Halil Pasic
On Thu, 25 Feb 2021 08:53:50 -0500
Tony Krowiak  wrote:

> If we add the proposed flag to indicate when the matrix_mdev->kvm
> pointer is in flux, then we can check that before allowing the functions
> in the list above to proceed.

I'm not against that. Go ahead!

Regards,
Halil


Re: [PATCH v2 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-02-25 Thread Halil Pasic
On Thu, 25 Feb 2021 10:25:24 -0500
Tony Krowiak  wrote:

> On 2/25/21 8:53 AM, Tony Krowiak wrote:
> >
> >
> > On 2/25/21 6:28 AM, Halil Pasic wrote:  
> >> On Wed, 24 Feb 2021 22:28:50 -0500
> >> Tony Krowiak  wrote:
> >>  
> >>>>>static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
> >>>>>{
> >>>>> -   kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> >>>>> -   matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> >>>>> -   vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
> >>>>> -   kvm_put_kvm(matrix_mdev->kvm);
> >>>>> -   matrix_mdev->kvm = NULL;
> >>>>> +   struct kvm *kvm;
> >>>>> +
> >>>>> +   if (matrix_mdev->kvm) {
> >>>>> +   kvm = matrix_mdev->kvm;
> >>>>> +   kvm_get_kvm(kvm);
> >>>>> +   matrix_mdev->kvm = NULL;  
> >>>> I think if there were two threads doing the unset in parallel, one
> >>>> of them could bail out and carry on before the cleanup is done. But
> >>>> since nothing much happens in release after that, I don't see an
> >>>> immediate problem.
> >>>>
> >>>> Another thing to consider is, that setting ->kvm to NULL arms
> >>>> vfio_ap_mdev_remove()...  
> >>> I'm not entirely sure what you mean by this, but my
> >>> assumption is that you are talking about the check
> >>> for matrix_mdev->kvm != NULL at the start of
> >>> that function.  
> >> Yes I was talking about the check
> >>
> >> static int vfio_ap_mdev_remove(struct mdev_device *mdev)
> >> {
> >>  struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> >>
> >>   
> >>  if (matrix_mdev->kvm)
> >>  return -EBUSY;
> >> ...
> >>  kfree(matrix_mdev);
> >> ...
> >> }
> >>
> >> As you see, we bail out if kvm is still set, otherwise we clean up the
> >> matrix_mdev which includes kfree-ing it. And vfio_ap_mdev_remove() is
> >> initiated via the sysfs, i.e. can be initiated at any time. If we were
> >> to free matrix_mdev in mdev_remove() and then carry on with kvm_unset()
> >> with mutex_lock(&matrix_dev->lock); that would be bad.
> >
> > I agree.
> >  
> >>  
> >>> The reason
> >>> matrix_mdev->kvm is set to NULL before giving up
> >>> the matrix_dev->lock is so that functions that check
> >>> for the presence of the matrix_mdev->kvm pointer,
> >>> such as assign_adapter_store() - will exit if they get
> >>> control while the masks are being cleared.  
> >> I disagree!
> >>
> >> static ssize_t assign_adapter_store(struct device *dev,
> >>  struct device_attribute *attr,
> >>  const char *buf, size_t count)
> >> {
> >>  int ret;
> >>  unsigned long apid;
> >>  struct mdev_device *mdev = mdev_from_dev(dev);
> >>  struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> >>
> >>   
> >>  /* If the guest is running, disallow assignment of adapter */
> >>  if (matrix_mdev->kvm)
> >>  return -EBUSY;
> >>
> >> We bail out when kvm != NULL, so having it set to NULL while the
> >> masks are being cleared will make these not bail out.
> >
> > You are correct, I am an idiot.
> >  
> >>> So what we have
> >>> here is a catch-22; in other words, we have the case
> >>> you pointed out above and the cases related to
> >>> assigning/unassigning adapters, domains and
> >>> control domains which should exit when a guest
> >>> is running.  
> >> See above.  
> >
> > Ditto.
> >  
> >>> I may have an idea to resolve this. Suppose we add:
> >>>
> >>> struct ap_matrix_mdev {
> >>>       ...
> >>>       bool kvm_busy;
> >>>       ...
> >>> }
> >>>
> >>> This flag will be set to true 

Re: [PATCH v2 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-02-25 Thread Halil Pasic
On Wed, 24 Feb 2021 22:28:50 -0500
Tony Krowiak  wrote:

> >>   static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
> >>   {
> >> -  kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> >> -  matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> >> -  vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
> >> -  kvm_put_kvm(matrix_mdev->kvm);
> >> -  matrix_mdev->kvm = NULL;
> >> +  struct kvm *kvm;
> >> +
> >> +  if (matrix_mdev->kvm) {
> >> +  kvm = matrix_mdev->kvm;
> >> +  kvm_get_kvm(kvm);
> >> +  matrix_mdev->kvm = NULL;  
> > I think if there were two threads doing the unset in parallel, one
> > of them could bail out and carry on before the cleanup is done. But
> > since nothing much happens in release after that, I don't see an
> > immediate problem.
> >
> > Another thing to consider is, that setting ->kvm to NULL arms
> > vfio_ap_mdev_remove()...  
> 
> I'm not entirely sure what you mean by this, but my
> assumption is that you are talking about the check
> for matrix_mdev->kvm != NULL at the start of
> that function. 

Yes I was talking about the check

static int vfio_ap_mdev_remove(struct mdev_device *mdev)
{
	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);

	if (matrix_mdev->kvm)
		return -EBUSY;
	...
	kfree(matrix_mdev);
	...
}

As you see, we bail out if kvm is still set, otherwise we clean up the
matrix_mdev which includes kfree-ing it. And vfio_ap_mdev_remove() is
initiated via the sysfs, i.e. can be initiated at any time. If we were
to free matrix_mdev in mdev_remove() and then carry on with kvm_unset()
with mutex_lock(&matrix_dev->lock); that would be bad.



> The reason
> matrix_mdev->kvm is set to NULL before giving up
> the matrix_dev->lock is so that functions that check
> for the presence of the matrix_mdev->kvm pointer,
> such as assign_adapter_store() - will exit if they get
> control while the masks are being cleared. 

I disagree!

static ssize_t assign_adapter_store(struct device *dev,
				    struct device_attribute *attr,
				    const char *buf, size_t count)
{
	int ret;
	unsigned long apid;
	struct mdev_device *mdev = mdev_from_dev(dev);
	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);

	/* If the guest is running, disallow assignment of adapter */
	if (matrix_mdev->kvm)
		return -EBUSY;

We bail out when kvm != NULL, so having it set to NULL while the
masks are being cleared will make these not bail out.
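The point can be checked with a small model of the gate: a check that looks only at ->kvm (as assign_adapter_store() and vfio_ap_mdev_remove() do) opens as soon as ->kvm is NULLed, even though the masks are still being cleared with the lock dropped. A gate that also consults a separate in-progress flag, along the lines Tony proposes below, stays closed during that window. The struct, the in_flux field, and the functions are hypothetical; v2 itself has no such flag, so in_flux here only describes the window.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the mdev state that sysfs handlers observe. */
struct mdev_state {
	void *kvm;    /* the pointer the existing checks look at */
	bool in_flux; /* masks still being set/cleared with the lock dropped */
};

/* Gate as in assign_adapter_store()/vfio_ap_mdev_remove(): ->kvm only. */
static int gate_kvm_only(const struct mdev_state *s)
{
	return s->kvm ? -EBUSY : 0;
}

/* Gate that also refuses while a set/unset is still in progress. */
static int gate_with_flag(const struct mdev_state *s)
{
	return (s->kvm || s->in_flux) ? -EBUSY : 0;
}

/* Window in the v2 unset path: ->kvm already NULL, masks not yet cleared. */
static struct mdev_state v2_unset_window(void)
{
	struct mdev_state s = { .kvm = NULL, .in_flux = true };

	return s;
}
```

In the window, gate_kvm_only() lets an assignment (or a remove that frees the mdev) through, while gate_with_flag() refuses it; with a guest running, both refuse.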

> So what we have
> here is a catch-22; in other words, we have the case
> you pointed out above and the cases related to
> assigning/unassigning adapters, domains and
> control domains which should exit when a guest
> is running.


See above.

> 
> I may have an idea to resolve this. Suppose we add:
> 
> struct ap_matrix_mdev {
>      ...
>      bool kvm_busy;
>      ...
> }
> 
> This flag will be set to true at the start of both the
> vfio_ap_mdev_set_kvm() and vfio_ap_mdev_unset_kvm()
> and set to false at the end. The assignment/unassignment
> and remove callback functions can test this flag and
> return -EBUSY if the flag is true. That will preclude assigning
> or unassigning adapters, domains and control domains when
> the KVM pointer is being set/unset. Likewise, removal of the
> mediated device will also be prevented while the KVM pointer
> is being set/unset.
> 
> In the case of the PQAP handler function, it can wait for the
> set/unset of the KVM pointer as follows:
> 
> while (matrix_mdev->kvm_busy) {
>         mutex_unlock(&matrix_dev->lock);
>         msleep(100);
>         mutex_lock(&matrix_dev->lock);
> }
>
> if (!matrix_mdev->kvm)
>         goto out_unlock;
>
> What say you?

I'm not sure. Since I disagree with your analysis above, it is difficult
to deal with the conclusion. I'm not against decoupling the tracking of
the state of the matrix_mdev device from the value of the kvm pointer. I
think we should first reach a common understanding of the problem before
we proceed to the solution.
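For comparison, here is a user-space sketch of the kvm_busy proposal quoted above. It swaps the msleep(100) polling loop for a POSIX condition variable (in the kernel, a wait queue would play a roughly similar role); `struct model_mdev`, the helpers and the -ENODEV return value are all hypothetical. The PQAP side waits while an update is in progress and only then looks at kvm.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields involved in the proposal. */
struct model_mdev {
	pthread_mutex_t lock;  /* plays the role of matrix_dev->lock */
	pthread_cond_t idle;   /* signalled whenever kvm_busy goes false */
	bool kvm_busy;
	void *kvm;
};

/* End of a set/unset; caller holds m->lock. */
static void model_finish_kvm_update(struct model_mdev *m, void *new_kvm)
{
	m->kvm = new_kvm;
	m->kvm_busy = false;
	pthread_cond_broadcast(&m->idle);
}

/* PQAP side: wait out an update in progress, then look at kvm. */
static int model_handle_pqap(struct model_mdev *m)
{
	int ret = 0;

	pthread_mutex_lock(&m->lock);
	while (m->kvm_busy)
		pthread_cond_wait(&m->idle, &m->lock); /* drops, then retakes */
	if (!m->kvm)
		ret = -ENODEV; /* hypothetical stand-in for "goto out_unlock" */
	pthread_mutex_unlock(&m->lock);
	return ret;
}

/* Completes a set_kvm that is already marked as in progress. */
static void *finish_set_kvm(void *arg)
{
	static int guest;
	struct model_mdev *m = arg;

	pthread_mutex_lock(&m->lock);
	model_finish_kvm_update(m, &guest);
	pthread_mutex_unlock(&m->lock);
	return NULL;
}

/* Lets a PQAP interception race against the completion of a set_kvm. */
static int demo_pqap_during_set(void)
{
	struct model_mdev m = { .kvm_busy = true, .kvm = NULL };
	pthread_t t;
	int ret;

	pthread_mutex_init(&m.lock, NULL);
	pthread_cond_init(&m.idle, NULL);

	ret = pthread_create(&t, NULL, finish_set_kvm, &m);
	if (ret == 0) {
		ret = model_handle_pqap(&m); /* waits until the set completes */
		pthread_join(t, NULL);
	} else {
		ret = -ret;
	}

	pthread_cond_destroy(&m.idle);
	pthread_mutex_destroy(&m.lock);
	return ret;
}
```

Waiting on the condition variable releases the mutex while blocked, so the set/unset path can make progress; the unlock/msleep/lock sequence achieves the same by polling.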

Regards,
Halil


Re: [PATCH v2 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-02-23 Thread Halil Pasic
On Mon, 15 Feb 2021 20:15:47 -0500
Tony Krowiak  wrote:

> This patch fixes a circular locking dependency in the CI introduced by
> commit f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM
> pointer invalidated"). The lockdep only occurs when starting a Secure
> Execution guest. Crypto virtualization (vfio_ap) is not yet supported for
> SE guests; however, in order to avoid CI errors, this fix is being
> provided.
> 
> The circular lockdep was introduced when the masks in the guest's APCB
> were taken under the matrix_dev->lock. While the lock is definitely
> needed to protect the setting/unsetting of the KVM pointer, it is not
> necessarily critical for setting the masks, so this will not be done under
> protection of the matrix_dev->lock.



With the one little thing I commented on below addressed: 
Acked-by: Halil Pasic   

This solution probably ain't a perfect one, but can't say I see a simple
way to get around this problem. For instance I played with the thought of
taking locks in a different order and keeping the critical sections
intact, but that has problems of its own. Tony should have the best
understanding of vfio_ap anyway.

In theory, the executions of vfio_ap_mdev_group_notifier() and
vfio_ap_mdev_release() could interleave, and we could lose a clear,
because with the lock dropped in between, several orderings of the
critical sections are possible. In practice I hope that won't happen
with QEMU.

Tony, you have given this a decent amount of testing, right?

I think we should move forward with this. Any objections? 
> 
> Fixes: f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Tony Krowiak 
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 119 +-
>  1 file changed, 84 insertions(+), 35 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 41fc2e4135fe..8574b6ecc9c5 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1027,8 +1027,21 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
>   * @matrix_mdev: a mediated matrix device
>   * @kvm: reference to KVM instance
>   *
> - * Verifies no other mediated matrix device has @kvm and sets a reference to
> - * it in @matrix_mdev->kvm.
> + * Sets all data for @matrix_mdev that are needed to manage AP resources
> + * for the guest whose state is represented by @kvm:
> + * 1. Verifies no other mediated device has a reference to @kvm.
> + * 2. Increments the ref count for @kvm so it doesn't disappear until the
> + *    vfio_ap driver is notified the pointer is being nullified.
> + * 3. Sets a reference to the PQAP hook (i.e., handle_pqap() function) into
> + *    @kvm to handle interception of the PQAP(AQIC) instruction.
> + * 4. Sets the masks supplying the AP configuration to the KVM guest.
> + * 5. Sets the KVM pointer into @kvm so the vfio_ap driver can access it.
> + *

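Could, for example, a PQAP(AQIC) run into an unset matrix_mdev->kvm with
this ordering, in theory? The hook is installed before the lock is
dropped, but matrix_mdev->kvm is only set after the lock is re-taken. I
don't think it's likely to happen in the wild, though. Why not set it up
before setting the masks?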

> + * Note: The matrix_dev->lock must be taken prior to calling
> + * this function; however, the lock will be temporarily released to avoid a
> + * potential circular lock dependency with other asynchronous processes that
> + * lock the kvm->lock mutex which is also needed to supply the guest's AP
> + * configuration.
>   *
>   * Return 0 if no other mediated matrix device has a reference to @kvm;
>   * otherwise, returns an -EPERM.
> @@ -1043,9 +1056,17 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>   return -EPERM;
>   }
>  
> - matrix_mdev->kvm = kvm;
> - kvm_get_kvm(kvm);
> - kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
> + if (kvm->arch.crypto.crycbd) {
> + kvm_get_kvm(kvm);
> + kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
> + mutex_unlock(&matrix_dev->lock);
> + kvm_arch_crypto_set_masks(kvm,
> +   matrix_mdev->matrix.apm,
> +   matrix_mdev->matrix.aqm,
> +   matrix_mdev->matrix.adm);
> + mutex_lock(&matrix_dev->lock);
> + matrix_mdev->kvm = kvm;
> + }
>  
>   return 0;
>  }
> @@ -1079,51 +1100,80 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
>   return NOTIFY_DONE;
>  }
>  
> +/**
> + * vfio_ap_mdev_unset_kvm
> + *
> + * @matrix_mdev: a matrix mediated device
> + *
> + * Perfo

Re: [PATCH 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-02-11 Thread Halil Pasic
On Thu, 11 Feb 2021 09:21:26 -0500
Tony Krowiak  wrote:

> Yes, it makes sense. I guess I didn't look closely at your
> suggestion when I said it was exactly what I implemented
> after agreeing with Connie. I had a slight difference in
> my implementation:
> 
> static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
> {
>      struct kvm *kvm;
> 
>      mutex_lock(&matrix_dev->lock);
> 
>      if (matrix_mdev->kvm) {
>          kvm = matrix_mdev->kvm;
>          mutex_unlock(&matrix_dev->lock);

The problem with this one is that as soon as we drop
the lock here, another thread can, in theory, execute
the critical section below, which drops our reference
to kvm via kvm_put_kvm(kvm). Thus when we enter
kvm_arch_crypto_clear_masks(), even though we are guaranteed
to have a non-NULL pointer, the pointee is not guaranteed
to be around. So, as Connie suggested, you had better take
another reference to kvm in the first critical section.
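The reference-counting point can be shown with a small replay. This is a sketch under simplifying assumptions: `struct model_kvm` is a hypothetical stand-in for struct kvm, a `freed` flag replaces actually freeing the object so the demo stays safe, and the helpers only loosely mimic kvm_get_kvm()/kvm_put_kvm(). The replay copies the pointer under the lock, lets a concurrent unset drop the reference vfio_ap took in set_kvm, and then checks whether the object is still alive when the masks would be cleared.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct kvm and its reference count. */
struct model_kvm {
	int users;          /* loosely like kvm->users_count */
	bool freed;         /* set instead of freeing, so the demo is safe */
	bool masks_cleared;
};

static void model_get_kvm(struct model_kvm *kvm)
{
	kvm->users++;
}

static void model_put_kvm(struct model_kvm *kvm)
{
	if (--kvm->users == 0)
		kvm->freed = true;  /* real code would free the object here */
}

/*
 * Replays: copy the pointer under the lock, drop the lock, a concurrent
 * unset drops vfio_ap's reference, then we clear the masks.
 * Returns true if the object was still alive when we used it.
 */
static bool replay_clear_masks(bool take_extra_ref)
{
	struct model_kvm obj = { .users = 1, .freed = false,
				 .masks_cleared = false };
	struct model_kvm *kvm = &obj;   /* local copy of matrix_mdev->kvm */
	bool alive;

	/* first critical section (lock held) */
	if (take_extra_ref)
		model_get_kvm(kvm);
	/* lock dropped here */

	/* concurrent unset path: drops the reference taken in set_kvm */
	model_put_kvm(&obj);

	/* clear the masks through our local pointer */
	alive = !kvm->freed;
	if (alive)
		kvm->masks_cleared = true;

	/* relock and drop our extra reference, if we took one */
	if (take_extra_ref)
		model_put_kvm(kvm);
	return alive;
}
```

Without the extra reference the pointer is still non-NULL but the object is already gone; with it, the object survives until our own put.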

Regards,
Halil

>          kvm_arch_crypto_clear_masks(kvm);
>          mutex_lock(&matrix_dev->lock);
>          kvm->arch.crypto.pqap_hook = NULL;
>          vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
>          matrix_mdev->kvm = NULL;
>          kvm_put_kvm(kvm);
>      }
> 
>      mutex_unlock(&matrix_dev->lock);
> }


Re: [PATCH 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-02-10 Thread Halil Pasic
On Wed, 10 Feb 2021 17:05:48 -0500
Tony Krowiak  wrote:

> On 2/10/21 10:32 AM, Halil Pasic wrote:
> > On Wed, 10 Feb 2021 16:24:29 +0100
> > Halil Pasic  wrote:
> >  
> >>> Maybe you could
> >>> - grab a reference to kvm while holding the lock
> >>> - call the mask handling functions with that kvm reference
> >>> - lock again, drop the reference, and do the rest of the processing?  
> >> I agree, matrix_mdev->kvm can go NULL any time and we are risking
> >> a null pointer dereference here.
> >>
> >> Another idea would be to do
> >>
> >>
> >> static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
> >> {
> >> 	struct kvm *kvm;
> >>
> >> 	mutex_lock(&matrix_dev->lock);
> >> 	if (matrix_mdev->kvm) {
> >> 		kvm = matrix_mdev->kvm;
> >> 		matrix_mdev->kvm = NULL;
> >> 		mutex_unlock(&matrix_dev->lock);
> >> 		kvm_arch_crypto_clear_masks(kvm);
> >> 		mutex_lock(&matrix_dev->lock);
> >> 		matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> > s/matrix_mdev->kvm/kvm
> >> 		vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
> >> 		kvm_put_kvm(kvm);
> >> 	}
> >> 	mutex_unlock(&matrix_dev->lock);
> >> }
> >>
> >> That way only one unset would actually do the unset and cleanup
> >> and every other invocation would bail out with only checking
> >> matrix_mdev->kvm.  
> > But the problem with that is that we enable the assign/unassign
> > prematurely, which could interfere with reset_queues(). Forget about
> > it.
> 
> Not sure what you mean by this.
> 
> 

I mean because above I first do
(1) matrix_mdev->kvm = NULL;
and then do 
(2) vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
another thread could do 
static ssize_t unassign_adapter_store(struct device *dev,
				      struct device_attribute *attr,
				      const char *buf, size_t count)
{
	int ret;
	unsigned long apid;
	struct mdev_device *mdev = mdev_from_dev(dev);
	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);

	/* If the guest is running, disallow un-assignment of adapter */
	if (matrix_mdev->kvm)
		return -EBUSY;
	...
}
between (1) and (2), and it would not bail out with -EBUSY, because kvm
is already NULL due to (1). That means we would change matrix_mdev->matrix,
and we would not reset the queues that correspond to the APID that was just
removed, because by the time we do the reset_queues(), those queues are
no longer in matrix_mdev->matrix.

Does that make sense?


Re: [PATCH 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-02-10 Thread Halil Pasic
On Wed, 10 Feb 2021 16:24:29 +0100
Halil Pasic  wrote:

> > Maybe you could
> > - grab a reference to kvm while holding the lock
> > - call the mask handling functions with that kvm reference
> > - lock again, drop the reference, and do the rest of the processing?  
> 
> I agree, matrix_mdev->kvm can go NULL any time and we are risking
> a null pointer dereference here.
> 
> Another idea would be to do
> 
> 
> static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
> {
> 	struct kvm *kvm;
>
> 	mutex_lock(&matrix_dev->lock);
> 	if (matrix_mdev->kvm) {
> 		kvm = matrix_mdev->kvm;
> 		matrix_mdev->kvm = NULL;
> 		mutex_unlock(&matrix_dev->lock);
> 		kvm_arch_crypto_clear_masks(kvm);
> 		mutex_lock(&matrix_dev->lock);
> 		matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
s/matrix_mdev->kvm/kvm
> 		vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
> 		kvm_put_kvm(kvm);
> 	}
> 	mutex_unlock(&matrix_dev->lock);
> }
> 
> That way only one unset would actually do the unset and cleanup
> and every other invocation would bail out with only checking
> matrix_mdev->kvm.

But the problem with that is that we enable the assign/unassign
prematurely, which could interfere with reset_queues(). Forget about
it.


Re: [PATCH 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-02-10 Thread Halil Pasic
On Wed, 10 Feb 2021 11:53:34 +0100
Cornelia Huck  wrote:

> On Tue,  9 Feb 2021 14:48:30 -0500
> Tony Krowiak  wrote:
> 
> > This patch fixes a circular locking dependency in the CI introduced by
> > commit f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM
> > pointer invalidated"). The lockdep only occurs when starting a Secure
> > Execution guest. Crypto virtualization (vfio_ap) is not yet supported for
> > SE guests; however, in order to avoid CI errors, this fix is being
> > provided.
> > 
> > The circular lockdep was introduced when the masks in the guest's APCB
> > were taken under the matrix_dev->lock. While the lock is definitely
> > needed to protect the setting/unsetting of the KVM pointer, it is not
> > necessarily critical for setting the masks, so this will not be done under
> > protection of the matrix_dev->lock.
> > 
> > Fixes: f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated")
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Tony Krowiak 
> > ---
> >  drivers/s390/crypto/vfio_ap_ops.c | 75 ++-
> >  1 file changed, 45 insertions(+), 30 deletions(-)
> >   
> 
> >  static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
> >  {
> > -   kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> > -   matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> > -   vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
> > -   kvm_put_kvm(matrix_mdev->kvm);
> > -   matrix_mdev->kvm = NULL;
> > +   if (matrix_mdev->kvm) {  
> 
> If you're doing setting/unsetting under matrix_dev->lock, is it
> possible that matrix_mdev->kvm gets unset between here and the next
> line, as you don't hold the lock?
> 
> Maybe you could
> - grab a reference to kvm while holding the lock
> - call the mask handling functions with that kvm reference
> - lock again, drop the reference, and do the rest of the processing?

I agree, matrix_mdev->kvm can go NULL any time and we are risking
a null pointer dereference here.

Another idea would be to do


static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
{
	struct kvm *kvm;

	mutex_lock(&matrix_dev->lock);
	if (matrix_mdev->kvm) {
		kvm = matrix_mdev->kvm;
		matrix_mdev->kvm = NULL;
		mutex_unlock(&matrix_dev->lock);
		kvm_arch_crypto_clear_masks(kvm);
		mutex_lock(&matrix_dev->lock);
		matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
		vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
		kvm_put_kvm(kvm);
	}
	mutex_unlock(&matrix_dev->lock);
}

That way only one unset would actually do the unset and cleanup
and every other invocation would bail out with only checking
matrix_mdev->kvm.

 
> > +   kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> > +   mutex_lock(&matrix_dev->lock);
> > +   matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> > +   vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
> > +   kvm_put_kvm(matrix_mdev->kvm);
> > +   matrix_mdev->kvm = NULL;
> > +   mutex_unlock(&matrix_dev->lock);
> > +   }
> >  }  
> 



Re: [PATCH v13 09/15] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device

2021-02-03 Thread Halil Pasic
On Wed, 3 Feb 2021 18:13:09 -0500
Tony Krowiak  wrote:

> On 1/12/21 12:55 PM, Halil Pasic wrote:
> > On Tue, 12 Jan 2021 02:12:51 +0100
> > Halil Pasic  wrote:
> >  
> >>> @@ -1347,8 +1437,11 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
> >>>   apqi = AP_QID_QUEUE(q->apqn);
> >>>   vfio_ap_mdev_reset_queue(apid, apqi, 1);
> >>>   
> >>> - if (q->matrix_mdev)
> >>> + if (q->matrix_mdev) {
> >>> + matrix_mdev = q->matrix_mdev;
> >>>   vfio_ap_mdev_unlink_queue(q);
> >>> + vfio_ap_mdev_refresh_apcb(matrix_mdev);
> >>> + }
> >>>   
> >>>   kfree(q);
> >>>   mutex_unlock(&matrix_dev->lock);
> > Shouldn't we first remove the queue from the APCB and then
> > reset? Sorry, I missed this one yesterday.  
> 
> I agreed to move the reset; however, if the remove callback is
> invoked due to a manual unbind of the queue and the queue is
> in use by a guest, the cleanup of the IRQ resources after the
> reset of the queue will not happen, because the link from the
> queue to the matrix mdev was removed. Consequently, I'm going
> to have to change patch 05/15 to split the vfio_ap_mdev_unlink_queue()
> function into two functions: one to remove the link from the matrix mdev to
> the queue, and one to remove the link from the queue to the matrix
> mdev.

Does that mean we should reset before the unlink (or before the second
part of it after the split up)?

I mean have a look at unassign_adapter_store() with all patches
of this series applied. It does an unlink but doesn't do any reset,
or cleanup IRQ resources. And after the unlink we can't clean up
the IRQ resources properly.
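Why the order matters can be sketched with a toy model, assuming (as in vfio_ap_free_aqic_resources() in the IRQ patch) that releasing a queue's IRQ resources needs the queue-to-mdev link. The structs and helpers are hypothetical simplifications; `pinned_pages` stands in for the page pinned via saved_pfn.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct model_mdev {
	int pinned_pages;                 /* pages pinned on the guest's behalf */
};

struct model_queue {
	struct model_mdev *matrix_mdev;   /* queue -> mdev link */
	bool has_saved_pfn;               /* IRQ resources still held */
};

/* Reset, then release IRQ resources; the release needs the link. */
static void model_reset_queue(struct model_queue *q)
{
	if (q->has_saved_pfn && q->matrix_mdev) {
		q->matrix_mdev->pinned_pages--;
		q->has_saved_pfn = false;
	}
}

static void model_unlink_queue(struct model_queue *q)
{
	q->matrix_mdev = NULL;
}

/* Returns the number of pages still pinned after removing the queue. */
static int replay_remove(bool reset_first)
{
	struct model_mdev m = { .pinned_pages = 1 };
	struct model_queue q = { .matrix_mdev = &m, .has_saved_pfn = true };

	if (reset_first) {
		model_reset_queue(&q);
		model_unlink_queue(&q);
	} else {
		model_unlink_queue(&q);
		model_reset_queue(&q);  /* link is gone: cleanup is skipped */
	}
	return m.pinned_pages;
}
```

Resetting before the unlink releases the pinned page; unlinking first leaks it.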

But before all this, we should resolve this circular lock dependency
problem in a satisfactory way. I'm quite worried about how it is going
to mesh with this series and dynamic AP pass-through.

Regards,
Halil

> Only the first will be used for the remove callback, which should
> be fine since the queue object is freed at the end of the remove
> function anyway.
> 
> >
> > Regards,
> > Halil  
> 



Re: [PATCH 1/1] s390/vfio-ap: No need to disable IRQ after queue reset

2021-01-21 Thread Halil Pasic
On Thu, 21 Jan 2021 09:20:44 +0100
Cornelia Huck  wrote:

> On Thu, 21 Jan 2021 08:20:08 +0100
> Halil Pasic  wrote:
[..]
> > --- a/drivers/s390/crypto/vfio_ap_private.h
> > +++ b/drivers/s390/crypto/vfio_ap_private.h
> > @@ -88,11 +88,6 @@ struct ap_matrix_mdev {
> > struct mdev_device *mdev;
> >  };
> >  
> > -extern int vfio_ap_mdev_register(void);
> > -extern void vfio_ap_mdev_unregister(void);
> > -int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
> > -unsigned int retry);
> > -
> >  struct vfio_ap_queue {
> > struct ap_matrix_mdev *matrix_mdev;
> > unsigned long saved_pfn;
> > @@ -100,5 +95,10 @@ struct vfio_ap_queue {
> >  #define VFIO_AP_ISC_INVALID 0xff
> > unsigned char saved_isc;
> >  };
> > -struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q);
> > +
> > +int vfio_ap_mdev_register(void);
> > +void vfio_ap_mdev_unregister(void);  
> 
> Nit: was moving these two necessary?
> 

No, not strictly necessary. I decided that having the data types
first, followed by the function prototypes in one place, is nicer.

> > +int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q,
> > +unsigned int retry);
> > +
> >  #endif /* _VFIO_AP_PRIVATE_H_ */
> > 
> > base-commit: 9791581c049c10929e97098374dd1716a81fefcc  
> 
> Anyway, if I didn't entangle myself in the various branches, this seems
> sane.
> 
> Reviewed-by: Cornelia Huck 
> 

Thank you very much!

Regards,
Halil


[PATCH 1/1] s390/vfio-ap: No need to disable IRQ after queue reset

2021-01-20 Thread Halil Pasic
From: Tony Krowiak 

The queues assigned to a matrix mediated device are currently reset when:

* The VFIO_DEVICE_RESET ioctl is invoked
* The mdev fd is closed by userspace (QEMU)
* The mdev is removed from sysfs.

Immediately after the reset of a queue, a call is made to disable
interrupts for the queue. This is entirely unnecessary because the reset of
a queue disables interrupts, so this will be removed.

Furthermore, vfio_ap_irq_disable() does an unconditional PQAP/AQIC which
can result in a specification exception (when the corresponding facility
is not available), so this is actually a bugfix.

Signed-off-by: Tony Krowiak 
[pa...@linux.ibm.com: minor rework before merging]
Reviewed-by: Halil Pasic 
Signed-off-by: Halil Pasic 
Fixes: ec89b55e3bce ("s390: ap: implement PAPQ AQIC interception in kernel")
Cc: 

---

Since it turned out that disabling the interrupts via PQAP/AQIC is not only
unnecessary but also buggy, we decided to put this patch, which
used to be a part of the series https://lkml.org/lkml/2020/12/22/757, on the
fast track.

If the backports turn out to be a bother, which I hope won't be the
case, I am happy to help with them.

---
 drivers/s390/crypto/vfio_ap_drv.c |   6 +-
 drivers/s390/crypto/vfio_ap_ops.c | 100 --
 drivers/s390/crypto/vfio_ap_private.h |  12 ++--
 3 files changed, 69 insertions(+), 49 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index be2520cc010b..7dc72cb718b0 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -71,15 +71,11 @@ static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
 static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
 {
struct vfio_ap_queue *q;
-   int apid, apqi;
 
mutex_lock(&matrix_dev->lock);
q = dev_get_drvdata(&apdev->device);
+   vfio_ap_mdev_reset_queue(q, 1);
dev_set_drvdata(&apdev->device, NULL);
-   apid = AP_QID_CARD(q->apqn);
-   apqi = AP_QID_QUEUE(q->apqn);
-   vfio_ap_mdev_reset_queue(apid, apqi, 1);
-   vfio_ap_irq_disable(q);
kfree(q);
mutex_unlock(&matrix_dev->lock);
 }
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index e0bde8518745..7ceb6c433b3b 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -25,6 +25,7 @@
 #define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
 
 static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
+static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
 
 static int match_apqn(struct device *dev, const void *data)
 {
@@ -49,20 +50,15 @@ static struct vfio_ap_queue *vfio_ap_get_queue(
int apqn)
 {
struct vfio_ap_queue *q;
-   struct device *dev;
 
if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
return NULL;
if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
return NULL;
 
-   dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
-                            &apqn, match_apqn);
-   if (!dev)
-   return NULL;
-   q = dev_get_drvdata(dev);
-   q->matrix_mdev = matrix_mdev;
-   put_device(dev);
+   q = vfio_ap_find_queue(apqn);
+   if (q)
+   q->matrix_mdev = matrix_mdev;
 
return q;
 }
@@ -119,13 +115,18 @@ static void vfio_ap_wait_for_irqclear(int apqn)
  */
 static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
 {
-   if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev)
+   if (!q)
+   return;
+   if (q->saved_isc != VFIO_AP_ISC_INVALID &&
+   !WARN_ON(!(q->matrix_mdev && q->matrix_mdev->kvm))) {
kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->saved_isc);
-   if (q->saved_pfn && q->matrix_mdev)
+   q->saved_isc = VFIO_AP_ISC_INVALID;
+   }
+   if (q->saved_pfn && !WARN_ON(!q->matrix_mdev)) {
vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev),
                 &q->saved_pfn, 1);
-   q->saved_pfn = 0;
-   q->saved_isc = VFIO_AP_ISC_INVALID;
+   q->saved_pfn = 0;
+   }
 }
 
 /**
@@ -144,7 +145,7 @@ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
  * Returns if ap_aqic function failed with invalid, deconfigured or
  * checkstopped AP.
  */
-struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
+static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
 {
struct ap_qirq_ctrl aqic_gisa = {};
struct ap_queue_status status;
@@ -1114,48 +1115,70 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
return NOTIFY_OK;
 }
 
-static void vfio_ap_irq_disable_apqn(int apqn)
+stat

Re: [PATCH v13 06/15] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device

2021-01-14 Thread Halil Pasic
On Thu, 14 Jan 2021 12:54:39 -0500
Tony Krowiak  wrote:

> >>   /**
> >>    * vfio_ap_mdev_verify_no_sharing
> >>    *
> >> - * Verifies that the APQNs derived from the cross product of the AP adapter IDs
> >> - * and AP queue indexes comprising the AP matrix are not configured for another
> >> - * mediated device. AP queue sharing is not allowed.
> >> + * Verifies that each APQN derived from the Cartesian product of the AP adapter
> >> + * IDs and AP queue indexes comprising the AP matrix are not configured for
> >> + * another mediated device. AP queue sharing is not allowed.
> >>    *
> >> - * @matrix_mdev: the mediated matrix device
> >> + * @matrix_mdev: the mediated matrix device to which the APQNs being verified
> >> + * are assigned.
> >> + * @mdev_apm: mask indicating the APIDs of the APQNs to be verified
> >> + * @mdev_aqm: mask indicating the APQIs of the APQNs to be verified
> >>    *
> >> - * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
> >> + * Returns 0 if the APQNs are not shared, otherwise; returns -EBUSY.
> >>    */
> >> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
> >> +static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
> >> +                                          unsigned long *mdev_apm,
> >> +                                          unsigned long *mdev_aqm)
> >>   {
> >>    struct ap_matrix_mdev *lstdev;
> >>    DECLARE_BITMAP(apm, AP_DEVICES);
> >> @@ -523,20 +426,31 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
> >> * We work on full longs, as we can only exclude the leftover
> >> * bits in non-inverse order. The leftover is all zeros.
> >> */
> >> -  if (!bitmap_and(apm, matrix_mdev->matrix.apm,
> >> -  lstdev->matrix.apm, AP_DEVICES))
> >> +  if (!bitmap_and(apm, mdev_apm, lstdev->matrix.apm, AP_DEVICES))
> >>continue;
> >>   
> >> -  if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
> >> -  lstdev->matrix.aqm, AP_DOMAINS))
> >> +  if (!bitmap_and(aqm, mdev_aqm, lstdev->matrix.aqm, AP_DOMAINS))
> >>continue;
> >>   
> >> -  return -EADDRINUSE;
> >> +  vfio_ap_mdev_log_sharing_err(dev_name(mdev_dev(lstdev->mdev)),
> >> +   apm, aqm);
> >> +
> >> +  return -EBUSY;  
> > Why do we change -EADDRINUSE to -EBUSY? This gets bubbled up to
> > userspace, right? So a tool that detects the "another mdev has it"
> > condition by checking for -EADDRINUSE would be confused...
> 
> Back in v8 of the series, Christian suggested the occurrences
> of -EADDRINUSE should be replaced by the more appropriate
> -EBUSY (Message ID ),
> so I changed it here. It does get bubbled up to userspace, so you make a 
> valid point. I will
> change it back. I will, however, set the value returned from the
> __verify_card_reservations() function in ap_bus.c to -EBUSY as
> suggested by Christian.
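The check being discussed relies on a simple property: the APQNs of a matrix are the Cartesian product of its adapter and domain masks, so two matrices share an APQN exactly when their `apm` masks intersect and their `aqm` masks intersect. A user-space sketch, assuming single-word masks and ignoring bit order (the driver uses AP_DEVICES/AP_DOMAINS-sized bitmaps with bitmap_and()); all names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened matrix: one word per mask. */
struct model_matrix {
	unsigned long apm;  /* adapter IDs (APIDs) */
	unsigned long aqm;  /* queue indexes (APQIs) */
};

/* apm x aqm overlaps iff both the apm and the aqm masks intersect. */
static bool model_shares_apqn(const struct model_matrix *a,
			      const struct model_matrix *b)
{
	return (a->apm & b->apm) && (a->aqm & b->aqm);
}

static int model_verify_no_sharing(const struct model_matrix *candidate,
				   const struct model_matrix *others,
				   size_t nr_others)
{
	for (size_t i = 0; i < nr_others; i++)
		if (model_shares_apqn(candidate, &others[i]))
			return -EADDRINUSE; /* the value the thread settles on */
	return 0;
}
```

Sharing an adapter alone is not a conflict; adapter 0 with domain 2 does not collide with adapter 0 and domains 0 and 1.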

As long as the error code for an ephemeral failure (we can't take a
lock right now) and the error code for a failure due to a sharing
conflict (which most likely requires admin action to be resolved)
are different, I'm fine.

Choosing EBUSY for a sharing conflict and something else for
can't-take-the-lock in the case of the bus attributes, while choosing
EADDRINUSE for a sharing conflict and EBUSY for can't-take-the-lock in
the case of the mdev attributes (assign_*; unassign_*), sounds confusing
to me, but is still better than conflating the two conditions. Maybe we
can choose EAGAIN or EWOULDBLOCK for can't-take-the-lock-right-now. I
don't know.

I'm open to suggestions. And if Christian wants to change this for
the already released interfaces, I will have to live with that. But it
has to be a conscious decision at least.

What I consider tricky about EBUSY is that, according to my intuition,
if (in pseudocode) object.operation(argument) returns -EBUSY, that tells
me the object is busy (i.e. is in the middle of something incompatible
with performing the operation). In our case, it is not the object that is
busy, but the resource denoted by the argument.
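A sketch of keeping the two failure modes apart, along the lines of the EAGAIN suggestion above. Everything here is hypothetical: a pthread mutex stands in for matrix_dev->lock, a trylock stands in for "can't take the lock right now", and the conflict check looks at adapters only. On Linux, EWOULDBLOCK has the same value as EAGAIN.

```c
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER; /* ~matrix_dev->lock */
static unsigned long demo_apm;                                /* this mdev's adapters */

/*
 * -EAGAIN      transient: the lock is held right now, a retry may succeed
 * -EADDRINUSE  persistent: already assigned elsewhere, needs an admin
 */
static int model_assign_adapter(unsigned long other_apm, unsigned int apid)
{
	int ret = 0;

	if (pthread_mutex_trylock(&demo_lock) != 0)
		return -EAGAIN;
	if (other_apm & (1UL << apid))
		ret = -EADDRINUSE;
	else
		demo_apm |= 1UL << apid;
	pthread_mutex_unlock(&demo_lock);
	return ret;
}

/* Calls the assign path while the lock is already held. */
static int demo_assign_while_locked(unsigned int apid)
{
	int ret;

	pthread_mutex_lock(&demo_lock);
	ret = model_assign_adapter(0, apid);  /* trylock fails: EBUSY */
	pthread_mutex_unlock(&demo_lock);
	return ret;
}
```

A caller can then retry on -EAGAIN and report -EADDRINUSE to the administrator, without having to guess which condition a single -EBUSY meant.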

Regards,
Halil


Re: [PATCH v13 06/15] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device

2021-01-14 Thread Halil Pasic
On Thu, 14 Jan 2021 12:54:39 -0500
Tony Krowiak  wrote:

> On 1/11/21 3:40 PM, Halil Pasic wrote:
> > On Tue, 22 Dec 2020 20:15:57 -0500
> > Tony Krowiak  wrote:
> >  
> >> The current implementation does not allow assignment of an AP adapter or
> >> domain to an mdev device if each APQN resulting from the assignment
> >> does not reference an AP queue device that is bound to the vfio_ap device
> >> driver. This patch allows assignment of AP resources to the matrix mdev as
> >> long as the APQNs resulting from the assignment:
> >> 1. Are not reserved by the AP BUS for use by the zcrypt device drivers.
> >> 2. Are not assigned to another matrix mdev.
> >>
> >> The rationale behind this is twofold:
> >> 1. The AP architecture does not preclude assignment of APQNs to an AP
> >>    configuration that are not available to the system.
> >> 2. APQNs that do not reference a queue device bound to the vfio_ap
> >>    device driver will not be assigned to the guest's CRYCB, so the
> >>    guest will not get access to queues not bound to the vfio_ap driver.
> > You didn't tell us about the changed error code.  
> 
> I am assuming you are talking about returning -EBUSY from
> the vfio_ap_mdev_verify_no_sharing() function instead of
> -EADDRINUSE. I'm going to change this back per your comments
> below.
> 
> >
> > Also notice that at this point we have neither filtering nor in-use.
> > This used to be patch 11, and most of that stuff used to be in place. But
> > I'm going to trust you if you say it's fine to enable it this early.
> 
> The patch order was changed due to your review comments in
> Message ID <20201126165431.6ef1457a.pa...@linux.ibm.com>,
> patch 07/17 in the v12 series. In order to ensure that only queues
> bound to the vfio_ap driver are given to the guest, I'm going to
> create a patch that will precede this one and introduce the
> filtering code currently introduced in patch 12/17, the hot
> plug patch.
> 

I don't want to delay this any further, so it's up to you. I don't think
we will get the in-between steps perfect anyway.

I've re-read the Message ID
<20201126165431.6ef1457a.pa...@linux.ibm.com> and I didn't
ask for this change. I pointed out a problem and said that maybe it could
be solved by reordering; I didn't think it through.

[..]


Re: [PATCH v13 02/15] s390/vfio-ap: No need to disable IRQ after queue reset

2021-01-13 Thread Halil Pasic
On Wed, 13 Jan 2021 19:46:03 -0500
Tony Krowiak  wrote:

> On 1/13/21 4:21 PM, Halil Pasic wrote:
> > On Wed, 13 Jan 2021 12:06:28 -0500
> > Tony Krowiak  wrote:
> >  
> >> On 1/11/21 11:32 AM, Halil Pasic wrote:  
> >>> On Tue, 22 Dec 2020 20:15:53 -0500
> >>> Tony Krowiak  wrote:
> >>> 
> >>>> The queues assigned to a matrix mediated device are currently reset when:
> >>>>
> >>>> * The VFIO_DEVICE_RESET ioctl is invoked
> >>>> * The mdev fd is closed by userspace (QEMU)
> >>>> * The mdev is removed from sysfs.
> >>>>
> >>>> Immediately after the reset of a queue, a call is made to disable
> >>>> interrupts for the queue. This is entirely unnecessary because the reset of
> >>>> a queue disables interrupts, so this will be removed.
> >>>>
> >>>> Signed-off-by: Tony Krowiak 
> >>>> ---
> >>>>drivers/s390/crypto/vfio_ap_drv.c |  1 -
> >>>>drivers/s390/crypto/vfio_ap_ops.c | 40 +--
> >>>>drivers/s390/crypto/vfio_ap_private.h |  1 -
> >>>>3 files changed, 26 insertions(+), 16 deletions(-)
> >>>>
> >>>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> >>>> index be2520cc010b..ca18c91afec9 100644
> >>>> --- a/drivers/s390/crypto/vfio_ap_drv.c
> >>>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> >>>> @@ -79,7 +79,6 @@ static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
> >>>>  apid = AP_QID_CARD(q->apqn);
> >>>>  apqi = AP_QID_QUEUE(q->apqn);
> >>>>  vfio_ap_mdev_reset_queue(apid, apqi, 1);
> >>>> -vfio_ap_irq_disable(q);
> >>>>  kfree(q);
> >>>>  mutex_unlock(&matrix_dev->lock);
> >>>>}
> >>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> >>>> index 7339043906cf..052f61391ec7 100644
> >>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
> >>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> >>>> @@ -25,6 +25,7 @@
> >>>>#define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
> >>>>
> >>>>static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
> >>>> +static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
> >>>>
> >>>>static int match_apqn(struct device *dev, const void *data)
> >>>>{
> >>>> @@ -49,20 +50,15 @@ static struct vfio_ap_queue *vfio_ap_get_queue(
> >>>>  int apqn)
> >>>>{
> >>>>  struct vfio_ap_queue *q;
> >>>> -struct device *dev;
> >>>>
> >>>>  if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
> >>>>  return NULL;
> >>>>  if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
> >>>>  return NULL;
> >>>>
> >>>> -dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> >>>> - &apqn, match_apqn);
> >>>> -if (!dev)
> >>>> -return NULL;
> >>>> -q = dev_get_drvdata(dev);
> >>>> -q->matrix_mdev = matrix_mdev;
> >>>> -put_device(dev);
> >>>> +q = vfio_ap_find_queue(apqn);
> >>>> +if (q)
> >>>> +q->matrix_mdev = matrix_mdev;
> >>>>
> >>>>  return q;
> >>>>}
> >>>> @@ -1126,24 +1122,27 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
> >>>>  return notify_rc;
> >>>>}
> >>>>
> >>>> -static void vfio_ap_irq_disable_apqn(int apqn)
> >>>> +static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
> >>>>{
> >>>>  struct device *dev;
> >>>> -struct vfio_ap_queue *q;
> >>>> +struct vfio_ap_queue *q = NULL;
> >>>>
> >>>>  dev = driver_find_

Re: [PATCH v13 05/15] s390/vfio-ap: manage link between queue struct and matrix mdev

2021-01-13 Thread Halil Pasic
On Wed, 13 Jan 2021 16:41:27 -0500
Tony Krowiak  wrote:

> On 1/11/21 2:17 PM, Halil Pasic wrote:
> > On Tue, 22 Dec 2020 20:15:56 -0500
> > Tony Krowiak  wrote:
> >  
> >> Let's create links between each queue device bound to the vfio_ap device
> >> driver and the matrix mdev to which the queue's APQN is assigned. The idea
> >> is to facilitate efficient retrieval of the objects representing the queue
> >> devices and matrix mdevs as well as to verify that a queue assigned to
> >> a matrix mdev is bound to the driver.
> >>
> >> The links will be created as follows:
> >>
> >> * When the queue device is probed, if its APQN is assigned to a matrix
> >>   mdev, the structures representing the queue device and the matrix 
> >> mdev
> >>   will be linked.
> >>
> >> * When an adapter or domain is assigned to a matrix mdev, for each new
> >>   APQN assigned that references a queue device bound to the vfio_ap
> >>   device driver, the structures representing the queue device and the
> >>   matrix mdev will be linked.
> >>
> >> The links will be removed as follows:
> >>
> >> * When the queue device is removed, if its APQN is assigned to a matrix
> >>   mdev, the structures representing the queue device and the matrix 
> >> mdev
> >>   will be unlinked.
> >>
> >> * When an adapter or domain is unassigned from a matrix mdev, for each
> >>   APQN unassigned that references a queue device bound to the vfio_ap
> >>   device driver, the structures representing the queue device and the
> >>   matrix mdev will be unlinked.
> >>
> >> Signed-off-by: Tony Krowiak   
> > Reviewed-by: Halil Pasic 
> >  

[..]

> >> +
> >>   int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
> >>   {
> >>struct vfio_ap_queue *q;
> >> @@ -1324,9 +1404,13 @@ int vfio_ap_mdev_probe_queue(struct ap_device 
> >> *apdev)
> >>q = kzalloc(sizeof(*q), GFP_KERNEL);
> >>if (!q)
> >>return -ENOMEM;
> >> +  mutex_lock(_dev->lock);
> >>dev_set_drvdata(>device, q);
> >>q->apqn = to_ap_queue(>device)->qid;
> >>q->saved_isc = VFIO_AP_ISC_INVALID;
> >> +  vfio_ap_queue_link_mdev(q);
> >> +  mutex_unlock(_dev->lock);
> >> +  
> > Does the critical section have to include more than just
> > vfio_ap_queue_link_mdev()? Did we need the critical section
> > before this patch?  
> 
> We did not need the critical section before this patch because
> the only function that retrieved the vfio_ap_queue via the queue
> device's drvdata was the remove callback. I included the initialization
> of the vfio_ap_queue object under lock because the
> vfio_ap_find_queue() function retrieves the vfio_ap_queue object from
> the queue device's drvdata so it might be advantageous to initialize
> it under the mdev lock. On the other hand, I can't come up with a good
> argument to change this.
> 
> 

I was asking out of curiosity, not because I want it changed. I was
also wondering whether somebody could see a partially initialized device:
we first call dev_set_drvdata() and only then finish the
initialization. Before 's390/vfio-ap: use new AP bus interface to search
for queue devices', which is the previous patch, we had the klist code
in between. It uses spinlocks, which, I think, ensure that all effects
of probe are seen when we get the queue from vfio_ap_find_queue(). But
with patch 4 in place, that is no longer the case. Or am I wrong?
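A minimal userspace sketch of the probe ordering under discussion, assuming hypothetical stand-in types (queue_model, queue_dev_model, matrix_mdev_model) rather than the real vfio_ap structures: the object is fully initialized and linked before its pointer is stored where a lookup can find it. This only illustrates the idea; across CPUs it is the lock held by both probe and the lookup path, not the store order by itself, that makes the initialization visible.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct ap_matrix_mdev, struct vfio_ap_queue
 * and the queue device; not the real kernel types. */
struct matrix_mdev_model {
	int id;
};

struct queue_model {
	int apqn;
	int saved_isc;
	struct matrix_mdev_model *matrix_mdev; /* link, NULL if the APQN is unassigned */
};

struct queue_dev_model {
	struct queue_model *drvdata; /* what dev_get_drvdata() would return */
};

#define ISC_INVALID 0xff /* hypothetical value standing in for VFIO_AP_ISC_INVALID */

/*
 * Build the object completely, link it, and store the pointer last.
 * In the kernel this whole body would run with matrix_dev->lock held.
 */
static int probe_queue(struct queue_dev_model *qdev, int apqn,
		       struct matrix_mdev_model *owner)
{
	struct queue_model *q;

	assert(qdev != NULL);
	q = calloc(1, sizeof(*q));
	if (!q)
		return -ENOMEM;
	q->apqn = apqn;
	q->saved_isc = ISC_INVALID;
	q->matrix_mdev = owner;
	qdev->drvdata = q; /* publish only after initialization is complete */
	return 0;
}

/* Lookup; callers are assumed to hold the same lock as probe_queue(). */
static struct queue_model *find_queue(const struct queue_dev_model *qdev)
{
	return qdev->drvdata;
}
```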

Regards,
Halil


Re: [PATCH v13 02/15] s390/vfio-ap: No need to disable IRQ after queue reset

2021-01-13 Thread Halil Pasic
On Wed, 13 Jan 2021 12:06:28 -0500
Tony Krowiak  wrote:

> On 1/11/21 11:32 AM, Halil Pasic wrote:
> > On Tue, 22 Dec 2020 20:15:53 -0500
> > Tony Krowiak  wrote:
> >  
> >> The queues assigned to a matrix mediated device are currently reset when:
> >>
> >> * The VFIO_DEVICE_RESET ioctl is invoked
> >> * The mdev fd is closed by userspace (QEMU)
> >> * The mdev is removed from sysfs.
> >>
> >> Immediately after the reset of a queue, a call is made to disable
> >> interrupts for the queue. This is entirely unnecessary because the reset of
> >> a queue disables interrupts, so this will be removed.
> >>
> >> Signed-off-by: Tony Krowiak 
> >> ---
> >>   drivers/s390/crypto/vfio_ap_drv.c |  1 -
> >>   drivers/s390/crypto/vfio_ap_ops.c | 40 +--
> >>   drivers/s390/crypto/vfio_ap_private.h |  1 -
> >>   3 files changed, 26 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/drivers/s390/crypto/vfio_ap_drv.c 
> >> b/drivers/s390/crypto/vfio_ap_drv.c
> >> index be2520cc010b..ca18c91afec9 100644
> >> --- a/drivers/s390/crypto/vfio_ap_drv.c
> >> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> >> @@ -79,7 +79,6 @@ static void vfio_ap_queue_dev_remove(struct ap_device 
> >> *apdev)
> >>apid = AP_QID_CARD(q->apqn);
> >>apqi = AP_QID_QUEUE(q->apqn);
> >>vfio_ap_mdev_reset_queue(apid, apqi, 1);
> >> -  vfio_ap_irq_disable(q);
> >>kfree(q);
> >>mutex_unlock(_dev->lock);
> >>   }
> >> diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
> >> b/drivers/s390/crypto/vfio_ap_ops.c
> >> index 7339043906cf..052f61391ec7 100644
> >> --- a/drivers/s390/crypto/vfio_ap_ops.c
> >> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> >> @@ -25,6 +25,7 @@
> >>   #define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
> >>   
> >>   static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
> >> +static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
> >>   
> >>   static int match_apqn(struct device *dev, const void *data)
> >>   {
> >> @@ -49,20 +50,15 @@ static struct vfio_ap_queue *(
> >>int apqn)
> >>   {
> >>struct vfio_ap_queue *q;
> >> -  struct device *dev;
> >>   
> >>if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
> >>return NULL;
> >>if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
> >>return NULL;
> >>   
> >> -  dev = driver_find_device(_dev->vfio_ap_drv->driver, NULL,
> >> -   , match_apqn);
> >> -  if (!dev)
> >> -  return NULL;
> >> -  q = dev_get_drvdata(dev);
> >> -  q->matrix_mdev = matrix_mdev;
> >> -  put_device(dev);
> >> +  q = vfio_ap_find_queue(apqn);
> >> +  if (q)
> >> +  q->matrix_mdev = matrix_mdev;
> >>   
> >>return q;
> >>   }
> >> @@ -1126,24 +1122,27 @@ static int vfio_ap_mdev_group_notifier(struct 
> >> notifier_block *nb,
> >>return notify_rc;
> >>   }
> >>   
> >> -static void (int apqn)
> >> +static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
> >>   {
> >>struct device *dev;
> >> -  struct vfio_ap_queue *q;
> >> +  struct vfio_ap_queue *q = NULL;
> >>   
> >>dev = driver_find_device(_dev->vfio_ap_drv->driver, NULL,
> >> , match_apqn);
> >>if (dev) {
> >>q = dev_get_drvdata(dev);
> >> -  vfio_ap_irq_disable(q);
> >>put_device(dev);
> >>}
> >> +
> >> +  return q;
> >>   }  
> > This hunk and the previous one are a rewrite of vfio_ap_get_queue() and
> > have next to nothing to do with the patch's objective. If we were at an
> > earlier stage, I would ask to split it up.  
> 
> The rewrite of vfio_ap_get_queue() definitely is related to this
> patch's objective. 

Definitely loosely related.

> Below, in the vfio_ap_mdev_reset_queue()
> function, there is the label 'free_aqic_resources' which is where
> the call to vfio_ap_free_aqic_resources() function is called.
> That function takes a struct vfio_ap_queue as an argument,
> so the object needs to be retrieved prior to calling the function.
> We can't use the vfio_ap_g

Re: [PATCH v13 14/15] s390/vfio-ap: handle AP bus scan completed notification

2021-01-12 Thread Halil Pasic
On Tue, 22 Dec 2020 20:16:05 -0500
Tony Krowiak  wrote:

> Implements the driver callback invoked by the AP bus when the AP bus
> scan has completed. Since this callback is invoked after binding the newly
> added devices to their respective device drivers, the vfio_ap driver will
> attempt to hot plug the adapters, domains and control domains into each
> guest using the matrix mdev to which they are assigned. Keep in mind that
> an adapter or domain can be plugged in only if:
> * Each APQN derived from the newly added APID of the adapter and the APQIs
>   already assigned to the guest's APCB references an AP queue device bound
>   to the vfio_ap driver
> * Each APQN derived from the newly added APQI of the domain and the APIDs
>   already assigned to the guest's APCB references an AP queue device bound
>   to the vfio_ap driver

As stated in my comment to your previous patch, I don't see the promised
mechanism for delaying hot plug (from probe). Without it we can't
consolidate the hot plugs, and on_scan_complete() is useless, because
the hot plugs have already been done by then.
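One possible shape for the deferral mechanism asked about above, as a minimal single-threaded sketch: probe binds the queue but skips the hot plug while the queue's adapter is still listed in the added-adapters bitmap, and the scan-complete handler then does a single refresh. An adapter is plugged only if every APQN it forms with the assigned domains has a bound queue device, following the condition in the quoted commit message. All names (ap_add, bound, mdev_model, adapter_pluggable) are hypothetical, only adapters are filtered, and the bitmaps are shrunk to 16 entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_APIDS 16 /* the architecture allows 256; kept small for the sketch */
#define NR_APQIS 16

/* Hypothetical model: which APQNs have a queue device bound to vfio_ap. */
static bool bound[NR_APIDS][NR_APQIS];
/* APIDs reported by the config-changed callback, not yet scanned. */
static bool ap_add[NR_APIDS];

struct mdev_model {
	bool apm[NR_APIDS];       /* adapters assigned to the mdev */
	bool aqm[NR_APQIS];       /* domains assigned to the mdev */
	bool guest_apm[NR_APIDS]; /* adapters currently plugged into the guest */
	int commits;              /* number of guest APCB updates */
};

/* Pluggable only if every APQN formed with the assigned domains is bound. */
static bool adapter_pluggable(const struct mdev_model *m, int apid)
{
	for (int apqi = 0; apqi < NR_APQIS; apqi++)
		if (m->aqm[apqi] && !bound[apid][apqi])
			return false;
	return true;
}

static void refresh_guest(struct mdev_model *m)
{
	bool next[NR_APIDS];

	for (int apid = 0; apid < NR_APIDS; apid++)
		next[apid] = m->apm[apid] && adapter_pluggable(m, apid);
	if (memcmp(next, m->guest_apm, sizeof(next)) != 0) {
		memcpy(m->guest_apm, next, sizeof(next));
		m->commits++; /* one suspend/update/resume cycle of the guest */
	}
}

/* Probe: bind the queue, but defer the hot plug while its adapter is in ap_add. */
static void probe(struct mdev_model *m, int apid, int apqi)
{
	assert(apid >= 0 && apid < NR_APIDS && apqi >= 0 && apqi < NR_APQIS);
	bound[apid][apqi] = true;
	if (!ap_add[apid])
		refresh_guest(m);
}

/* Scan complete: one refresh covering everything that was deferred. */
static void scan_complete(struct mdev_model *m)
{
	refresh_guest(m);
	memset(ap_add, 0, sizeof(ap_add));
}
```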

Regards,
Halil

> 
> Signed-off-by: Tony Krowiak 
> ---
>  drivers/s390/crypto/vfio_ap_drv.c |  1 +
>  drivers/s390/crypto/vfio_ap_ops.c | 21 +
>  drivers/s390/crypto/vfio_ap_private.h |  2 ++
>  3 files changed, 24 insertions(+)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c 
> b/drivers/s390/crypto/vfio_ap_drv.c
> index 2029d8392416..075495fc44c0 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -149,6 +149,7 @@ static int __init vfio_ap_init(void)
>   vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
>   vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
>   vfio_ap_drv.on_config_changed = vfio_ap_on_cfg_changed;
> + vfio_ap_drv.on_scan_complete = vfio_ap_on_scan_complete;
>   vfio_ap_drv.ids = ap_queue_ids;
>  
>   ret = ap_driver_register(_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
> b/drivers/s390/crypto/vfio_ap_ops.c
> index 8bbbd1dc7546..b8ed01297812 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1592,3 +1592,24 @@ void vfio_ap_on_cfg_changed(struct ap_config_info 
> *new_config_info,
>   vfio_ap_mdev_on_cfg_add();
>   mutex_unlock(_dev->lock);
>  }
> +
> +void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
> +   struct ap_config_info *old_config_info)
> +{
> + struct ap_matrix_mdev *matrix_mdev;
> +
> + mutex_lock(_dev->lock);
> + list_for_each_entry(matrix_mdev, _dev->mdev_list, node) {
> + if (bitmap_intersects(matrix_mdev->matrix.apm,
> +   matrix_dev->ap_add, AP_DEVICES) ||
> + bitmap_intersects(matrix_mdev->matrix.aqm,
> +   matrix_dev->aq_add, AP_DOMAINS) ||
> + bitmap_intersects(matrix_mdev->matrix.adm,
> +   matrix_dev->ad_add, AP_DOMAINS))
> + vfio_ap_mdev_refresh_apcb(matrix_mdev);
> + }
> +
> + bitmap_clear(matrix_dev->ap_add, 0, AP_DEVICES);
> + bitmap_clear(matrix_dev->aq_add, 0, AP_DOMAINS);
> + mutex_unlock(_dev->lock);
> +}
> diff --git a/drivers/s390/crypto/vfio_ap_private.h 
> b/drivers/s390/crypto/vfio_ap_private.h
> index b99b68968447..7f0f7c92e686 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -117,5 +117,7 @@ int vfio_ap_mdev_resource_in_use(unsigned long *apm, 
> unsigned long *aqm);
>  
>  void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
>   struct ap_config_info *old_config_info);
> +void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
> +   struct ap_config_info *old_config_info);
>  
>  #endif /* _VFIO_AP_PRIVATE_H_ */



Re: [PATCH v13 13/15] s390/vfio-ap: handle host AP config change notification

2021-01-12 Thread Halil Pasic
On Tue, 22 Dec 2020 20:16:04 -0500
Tony Krowiak  wrote:

> The motivation for config change notification is to enable the vfio_ap
> device driver to handle hot plug/unplug of AP queues for a KVM guest as a
> bulk operation. For example, if a new APID is dynamically assigned to the
> host configuration, then a queue device will be created for each APQN that
> can be formulated from the new APID and all APQIs already assigned to the
> host configuration. Each of these new queue devices will get bound to their
> respective driver one at a time, as they are created. In the case of the
> vfio_ap driver, if the APQN of the queue device being bound to the driver
> is assigned to a matrix mdev in use by a KVM guest, it will be hot plugged
> into the guest if possible. Given that the AP architecture allows for 256
> adapters and 256 domains, one can see the possibility of the vfio_ap
> driver's probe/remove callbacks getting invoked an inordinate number of
> times when the host configuration changes. Keep in mind that in order to
> plug/unplug an AP queue for a guest, the guest's VCPUs must be suspended,
> then the guest's AP configuration must be updated followed by the VCPUs
> being resumed. If this is done each time the probe or remove callback is
> invoked and there are hundreds or thousands of queues to be probed or
> removed, this would be incredibly inefficient and could have a large impact
> on guest performance. What the config notification does is allow us to
> make the changes to the guest in a single operation.
> 
> This patch implements the on_cfg_changed callback which notifies the
> AP device drivers that the host AP configuration has changed (i.e.,
> adapters, domains and/or control domains are added to or removed from the
> host AP configuration).
> 
> Adapters added to host configuration:
> * The APIDs of the adapters added will be stored in a bitmap contained
>   within the struct representing the matrix device which is the parent
>   device of all matrix mediated devices.
> * When a queue is probed, if the APQN of the queue being probed is
>   assigned to an mdev in use by a guest, the queue may get hot plugged
>   into the guest; however, if the APID of the adapter is contained in the
>   bitmap of adapters added, the queue hot plug operation will be skipped
>   until the AP bus notifies the driver that its scan operation has
>   completed (another patch).

I guess I should be able to find this in patch 14, but I can't.

> * When the vfio_ap driver is notified that the AP bus scan has completed,
>   the guest's APCB will be refreshed by filtering the mdev's matrix by
>   APID.
> 
> Domains added to host configuration:
> * The APQIs of the domains added will be stored in a bitmap contained
>   within the struct representing the matrix device which is the parent
>   device of all matrix mediated devices.
> * When a queue is probed, if the APQN of the queue being probed is
>   assigned to an mdev in use by a guest, the queue may get hot plugged
>   into the guest; however, if the APQI of the domain is contained in the
>   bitmap of domains added, the queue hot plug operation will be skipped
>   until the AP bus notifies the driver that its scan operation has
>   completed (another patch).
> 
> Control domains added to the host configuration:
> * The domain numbers of the domains added will be stored in a bitmap
>   contained within the struct representing the matrix device which is the
>   parent device of all matrix mediated devices.
> 
> When the vfio_ap device driver is notified that the AP bus scan has
> completed, the APCB for each matrix mdev to which the adapters, domains
> and control domains added are assigned will be refreshed. If a KVM guest is
> using the matrix mdev, the APCB will be hot plugged into the guest to
> refresh its AP configuration.
> 
> Adapters removed from configuration:
> * Each queue device with the APID identifying an adapter removed from
>   the host AP configuration will be unlinked from the matrix mdev to which
>   the queue's APQN is assigned.
> * When the vfio_ap driver's remove callback is invoked, if the queue
>   device is not linked to the matrix mdev, the refresh of the guest's
>   APCB will be skipped.
> 
> Domains removed from configuration:
> * Each queue device with the APQI identifying a domain removed from
>   the host AP configuration will be unlinked from the matrix mdev to which
>   the queue's APQN is assigned.
> * When the vfio_ap driver's remove callback is invoked, if the queue
>   device is not linked to the matrix mdev, the refresh of the guest's
>   APCB will be skipped.
> 
> If any queues with an APQN assigned to a given matrix mdev have been
> unlinked or any control domains assigned to a given matrix mdev have been
> removed from the host AP configuration, the APCB of the matrix mdev will
> be refreshed. If a KVM guest is using the matrix mdev, the APCB will be hot
> plugged into the guest to refresh its AP configuration.
> 
> 

Re: [PATCH v13 09/15] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device

2021-01-12 Thread Halil Pasic
On Tue, 12 Jan 2021 02:12:51 +0100
Halil Pasic  wrote:

> > @@ -1347,8 +1437,11 @@ void vfio_ap_mdev_remove_queue(struct ap_device 
> > *apdev)
> > apqi = AP_QID_QUEUE(q->apqn);
> > vfio_ap_mdev_reset_queue(apid, apqi, 1);
> >  
> > -   if (q->matrix_mdev)
> > +   if (q->matrix_mdev) {
> > +   matrix_mdev = q->matrix_mdev;
> > vfio_ap_mdev_unlink_queue(q);
> > +   vfio_ap_mdev_refresh_apcb(matrix_mdev);
> > +   }
> >  
> > kfree(q);
> > mutex_unlock(_dev->lock);  

Shouldn't we first remove the queue from the APCB and then
reset? Sorry, I missed this one yesterday.
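A sketch of the ordering suggested above, assuming (this is not stated in the thread) that the point is to keep the guest from reaching a queue while it is being reset: the queue is unlinked and dropped from the guest's APCB first, and only then reset. The struct, its fields and the trace buffer are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical trace of the steps taken while removing a queue. */
static char trace[64];

static void step(const char *name)
{
	strcat(trace, name); /* trace[] is large enough for the three steps */
}

struct queue_model {
	bool linked;        /* corresponds to q->matrix_mdev != NULL */
	bool in_guest_apcb; /* APQN currently visible to the guest */
};

/* Suggested order: unlink, refresh the APCB, and only then reset. */
static void remove_queue(struct queue_model *q)
{
	if (q->linked) {
		q->linked = false;
		step("unlink;");
		q->in_guest_apcb = false; /* stands in for the APCB refresh */
		step("refresh_apcb;");
	}
	assert(!q->in_guest_apcb); /* the guest must not see the queue during reset */
	step("reset;");
}
```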

Regards,
Halil


Re: [PATCH v13 12/15] s390/zcrypt: Notify driver on config changed and scan complete callbacks

2021-01-12 Thread Halil Pasic
On Tue, 22 Dec 2020 20:16:03 -0500
Tony Krowiak  wrote:

> This patch introduces an extension to the AP bus to notify device drivers
> when the host AP configuration changes - i.e., adapters, domains or
> control domains are added or removed. To that end, two new callbacks are
> introduced for AP device drivers:
> 
>   void (*on_config_changed)(struct ap_config_info *new_config_info,
> struct ap_config_info *old_config_info);
> 
>  This callback is invoked at the start of the AP bus scan
>  function when it determines that the host AP configuration information
>  has changed since the previous scan. This is done by storing
>  an old and current QCI info struct and comparing them. If there is any
>  difference, the callback is invoked.
> 
>  Note that when the AP bus scan detects that AP adapters, domains or
>  control domains have been removed from the host's AP configuration, it
>  will remove the associated devices from the AP bus subsystem's device
>  model. This callback gives the device driver a chance to respond to
>  the removal of the AP devices from the host configuration prior to
>  calling the device driver's remove callback. The primary purpose of
>  this callback is to allow the vfio_ap driver to do a bulk unplug of
>  all affected adapters, domains and control domains from affected
>  guests rather than unplugging them one at a time when the remove
>  callback is invoked.
> 
>   void (*on_scan_complete)(struct ap_config_info *new_config_info,
>struct ap_config_info *old_config_info);
> 
>  The on_scan_complete callback is invoked after the ap bus scan is
>  complete if the host AP configuration data has changed.
> 
>  Note that when the AP bus scan detects that adapters, domains or
>  control domains have been added to the host's configuration, it will
>  create new devices in the AP bus subsystem's device model. The primary
>  purpose of this callback is to allow the vfio_ap driver to do a bulk
>  plug of all affected adapters, domains and control domains into
>  affected guests rather than plugging them one at a time when the
>  probe callback is invoked.
> 
> Please note that changes to the apmask and aqmask do not trigger
> these two callbacks since the bus scan function is not invoked by changes
> to those masks.
> 
> Signed-off-by: Harald Freudenberger 
> Signed-off-by: Tony Krowiak 

Reviewed-by: Halil Pasic 
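The two callbacks described in the quoted commit message can be modelled as follows. This is a minimal sketch with a hypothetical, reduced config struct and a hypothetical bus_scan(); it only shows the contract described there: both callbacks fire only when the QCI data differs from that of the previous scan, on_config_changed() before devices are removed or added and on_scan_complete() after.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for struct ap_config_info. */
struct config_info_model {
	unsigned char apm[32]; /* 256 adapter bits */
	unsigned char aqm[32]; /* 256 domain bits */
	unsigned char adm[32]; /* 256 control-domain bits */
};

/* Mirrors the two callback signatures quoted in the commit message. */
struct driver_model {
	void (*on_config_changed)(struct config_info_model *new_info,
				  struct config_info_model *old_info);
	void (*on_scan_complete)(struct config_info_model *new_info,
				 struct config_info_model *old_info);
};

static struct config_info_model old_qci;

static void bus_scan(struct driver_model *drv, struct config_info_model *new_qci)
{
	bool changed;

	assert(drv != NULL && new_qci != NULL);
	changed = memcmp(&old_qci, new_qci, sizeof(old_qci)) != 0;

	if (changed && drv->on_config_changed)
		drv->on_config_changed(new_qci, &old_qci);

	/* ... remove vanished devices, create and probe new ones ... */

	if (changed && drv->on_scan_complete)
		drv->on_scan_complete(new_qci, &old_qci);

	old_qci = *new_qci; /* remember for the next scan */
}

/* Demo callbacks that only count invocations. */
static int changed_calls, complete_calls;

static void count_changed(struct config_info_model *new_info,
			  struct config_info_model *old_info)
{
	(void)new_info;
	(void)old_info;
	changed_calls++;
}

static void count_complete(struct config_info_model *new_info,
			   struct config_info_model *old_info)
{
	(void)new_info;
	(void)old_info;
	complete_calls++;
}
```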

[..]


Re: [PATCH v13 10/15] s390/zcrypt: driver callback to indicate resource in use

2021-01-12 Thread Halil Pasic
On Tue, 22 Dec 2020 20:16:01 -0500
Tony Krowiak  wrote:

> Introduces a new driver callback to prevent a root user from unbinding
> an AP queue from its device driver if the queue is in use. The callback
> will be invoked whenever a change to the AP bus's sysfs apmask or aqmask
> attributes would result in one or more AP queues being removed from its
> driver. If the callback responds in the affirmative for any driver
> queried, the change to the apmask or aqmask will be rejected with a device
> busy error.
> 
> For this patch, only non-default drivers will be queried. Currently,
> there is only one non-default driver, the vfio_ap device driver. The
> vfio_ap device driver facilitates pass-through of an AP queue to a
> guest. The idea here is that a guest may be administered by a different
> sysadmin than the host and we don't want AP resources to unexpectedly
> disappear from a guest's AP configuration (i.e., adapters and domains
> assigned to the matrix mdev). This will enforce the proper procedure for
> removing AP resources intended for guest usage which is to
> first unassign them from the matrix mdev, then unbind them from the
> vfio_ap device driver.
> 
> Signed-off-by: Tony Krowiak 
> Reviewed-by: Harald Freudenberger 

Reviewed-by: Halil Pasic 


Re: [PATCH v13 11/15] s390/vfio-ap: implement in-use callback for vfio_ap driver

2021-01-12 Thread Halil Pasic
On Tue, 12 Jan 2021 09:14:07 -0500
Matthew Rosato  wrote:

> On 1/11/21 8:20 PM, Halil Pasic wrote:
> > On Tue, 22 Dec 2020 20:16:02 -0500
> > Tony Krowiak  wrote:
> >   
> >> Let's implement the callback to indicate when an APQN
> >> is in use by the vfio_ap device driver. The callback is
> >> invoked whenever a change to the apmask or aqmask would
> >> result in one or more queue devices being removed from the driver. The
> >> vfio_ap device driver will indicate a resource is in use
> >> if the APQN of any of the queue devices to be removed is assigned to
> >> any of the matrix mdevs under the driver's control.
> >>
> >> There is potential for a deadlock condition between the matrix_dev->lock
> >> used to lock the matrix device during assignment of adapters and domains
> >> and the ap_perms_mutex locked by the AP bus when changes are made to the
> >> sysfs apmask/aqmask attributes.
> >>
> >> Consider the following scenario (courtesy of Halil Pasic):
> >> 1) apmask_store() takes ap_perms_mutex
> >> 2) assign_adapter_store() takes matrix_dev->lock
> >> 3) apmask_store() calls vfio_ap_mdev_resource_in_use() which tries
> >> to take matrix_dev->lock
> >> 4) assign_adapter_store() calls ap_apqn_in_matrix_owned_by_def_drv
> >> which tries to take ap_perms_mutex
> >>
> >> BANG!
> >>
> >> To resolve this issue, instead of using the mutex_lock(_dev->lock)
> >> function to lock the matrix device during assignment of an adapter or
> >> domain to a matrix_mdev as well as during the in_use callback, the
> >> mutex_trylock(_dev->lock) function will be used. If the lock is not
> >> obtained, then the assignment and in_use functions will terminate with
> >> -EBUSY.
> >>
> >> Signed-off-by: Tony Krowiak 
> >> ---
> >>   drivers/s390/crypto/vfio_ap_drv.c |  1 +
> >>   drivers/s390/crypto/vfio_ap_ops.c | 21 ++---
> >>   drivers/s390/crypto/vfio_ap_private.h |  2 ++
> >>   3 files changed, 21 insertions(+), 3 deletions(-)
> >>  
> > [..]  
> >>   }
> >> +
> >> +int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
> >> +{
> >> +  int ret;
> >> +
> >> +  if (!mutex_trylock(_dev->lock))
> >> +  return -EBUSY;
> >> +  ret = vfio_ap_mdev_verify_no_sharing(NULL, apm, aqm);  
> > 
> > If we detect that resources are in use, then we spit warnings to the
> > message log, right?
> > 
> > @Matt: Is your userspace tooling going to guarantee that this will never
> > happen?  
> 
> Yes, but only when using the tooling to modify apmask/aqmask.  You would 
> still be able to create such a scenario by bypassing the tooling and 
> invoking the sysfs interfaces directly.
> 
> 

Since, I suppose, the tooling is going to catch this anyway and produce
much better feedback to the user, I believe we should be fine downgrading
the severity to info or debug.

I would prefer not to produce a warning here, because I believe it is
likely to do more harm than good: it implies a kernel problem, and based
on the message I don't think anyone will conclude that it is a userspace
problem. But if everybody else agrees that we want a warning here, then
I can live with that as well.
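The trylock-based resolution of the deadlock described in the quoted commit message can be sketched in userspace C11, with atomic_flag standing in for the two mutexes. The function names mirror the ones in the scenario but are hypothetical models, the 1/0 in-use encoding is an assumption, and the sketch is single-threaded: it shows that the in-use check backs off with -EBUSY instead of waiting for matrix_dev->lock, so the lock-order inversion cannot turn into a deadlock.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical single-bit locks standing in for the two mutexes. */
static atomic_flag ap_perms_mutex = ATOMIC_FLAG_INIT;
static atomic_flag matrix_dev_lock = ATOMIC_FLAG_INIT;

static bool try_lock(atomic_flag *l)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set(l);
}

static void unlock(atomic_flag *l)
{
	atomic_flag_clear(l);
}

/* Models vfio_ap_mdev_resource_in_use(); called with ap_perms_mutex held. */
static int resource_in_use(bool apqn_assigned)
{
	int in_use;

	if (!try_lock(&matrix_dev_lock))
		return -EBUSY; /* back off instead of waiting: no ABBA deadlock */
	in_use = apqn_assigned ? 1 : 0; /* stands in for the sharing check */
	unlock(&matrix_dev_lock);
	return in_use;
}

/* Models apmask_store(): the AP bus takes ap_perms_mutex, then asks the driver. */
static int apmask_store(bool apqn_assigned)
{
	int rc;

	/* The real bus takes the mutex unconditionally; try_lock only
	 * keeps this single-threaded model from blocking. */
	if (!try_lock(&ap_perms_mutex))
		return -EBUSY;
	rc = resource_in_use(apqn_assigned);
	unlock(&ap_perms_mutex);
	if (rc < 0)
		return rc;            /* lock contended: reject with -EBUSY */
	return rc ? -EBUSY : 0;   /* in use: reject the mask change */
}
```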

Regards,
Halil


Re: [PATCH v13 11/15] s390/vfio-ap: implement in-use callback for vfio_ap driver

2021-01-11 Thread Halil Pasic
On Tue, 22 Dec 2020 20:16:02 -0500
Tony Krowiak  wrote:

> Let's implement the callback to indicate when an APQN
> is in use by the vfio_ap device driver. The callback is
> invoked whenever a change to the apmask or aqmask would
> result in one or more queue devices being removed from the driver. The
> vfio_ap device driver will indicate a resource is in use
> if the APQN of any of the queue devices to be removed is assigned to
> any of the matrix mdevs under the driver's control.
> 
> There is potential for a deadlock condition between the matrix_dev->lock
> used to lock the matrix device during assignment of adapters and domains
> and the ap_perms_mutex locked by the AP bus when changes are made to the
> sysfs apmask/aqmask attributes.
> 
> Consider the following scenario (courtesy of Halil Pasic):
> 1) apmask_store() takes ap_perms_mutex
> 2) assign_adapter_store() takes matrix_dev->lock
> 3) apmask_store() calls vfio_ap_mdev_resource_in_use() which tries
>to take matrix_dev->lock
> 4) assign_adapter_store() calls ap_apqn_in_matrix_owned_by_def_drv
>which tries to take ap_perms_mutex
> 
> BANG!
> 
> To resolve this issue, instead of using the mutex_lock(_dev->lock)
> function to lock the matrix device during assignment of an adapter or
> domain to a matrix_mdev as well as during the in_use callback, the
> mutex_trylock(_dev->lock) function will be used. If the lock is not
> obtained, then the assignment and in_use functions will terminate with
> -EBUSY.
> 
> Signed-off-by: Tony Krowiak 
> ---
>  drivers/s390/crypto/vfio_ap_drv.c |  1 +
>  drivers/s390/crypto/vfio_ap_ops.c | 21 ++---
>  drivers/s390/crypto/vfio_ap_private.h |  2 ++
>  3 files changed, 21 insertions(+), 3 deletions(-)
> 
[..]
>  }
> +
> +int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
> +{
> + int ret;
> +
> + if (!mutex_trylock(_dev->lock))
> + return -EBUSY;
> + ret = vfio_ap_mdev_verify_no_sharing(NULL, apm, aqm);

If we detect that resources are in use, then we spit warnings to the
message log, right?

@Matt: Is your userspace tooling going to guarantee that this will never
happen?

> + mutex_unlock(_dev->lock);
> +
> + return ret;
> +}
> diff --git a/drivers/s390/crypto/vfio_ap_private.h 
> b/drivers/s390/crypto/vfio_ap_private.h
> index d2d26ba18602..15b7cd74843b 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -107,4 +107,6 @@ struct vfio_ap_queue {
>  int vfio_ap_mdev_probe_queue(struct ap_device *queue);
>  void vfio_ap_mdev_remove_queue(struct ap_device *queue);
>  
> +int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
> +
>  #endif /* _VFIO_AP_PRIVATE_H_ */



Re: [PATCH v13 09/15] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device

2021-01-11 Thread Halil Pasic
(apid, apqi);
> + if (!vfio_ap_mdev_get_queue(matrix_mdev, apqn)) {
> + clear_bit_inv(apid, shadow_apcb->apm);
> + break;
> + }
> + }
> + }
> +}
> +
> +/**
> + * vfio_ap_mdev_refresh_apcb
> + *
> + * Filter APQNs assigned to the matrix mdev that do not reference an AP queue
> + * device bound to the vfio_ap device driver.
> + *
> + * @matrix_mdev:  the matrix mdev whose AP configuration is to be filtered
> + * @shadow_apcb:  the shadow of the KVM guest's APCB (contains AP 
> configuration
> + * for guest)
> + * @filter_apids: boolean value indicating whether the APQNs shall be 
> filtered
> + * by APID (true) or by APQI (false).
> + *

The signature in the doc comment and of the function do not match.
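For reference, the compare-then-commit pattern of the quoted vfio_ap_mdev_refresh_apcb() can be modelled like this: the filtered configuration is compared with the shadow APCB, and the guest is updated only when they differ. The types and the placeholder filter are hypothetical; like the quoted function, refresh_apcb() returns nothing, which is the mismatch with the doc comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical reduced model of struct ap_matrix: 256-bit masks as bytes. */
struct matrix_model {
	unsigned char apm[32];
	unsigned char aqm[32];
	unsigned char adm[32];
};

struct mdev_model {
	struct matrix_model matrix;      /* what the admin assigned */
	struct matrix_model shadow_apcb; /* what the guest was last given */
	bool has_crycb;                  /* a KVM guest with a CRYCB is attached */
	int commits;                     /* stand-in for kvm_arch_crypto_set_masks() calls */
};

/* Placeholder filter: the real one drops APQNs without a bound queue device. */
static void filter_apcb(const struct mdev_model *m, struct matrix_model *out)
{
	*out = m->matrix;
}

static void refresh_apcb(struct mdev_model *m)
{
	struct matrix_model shadow;

	assert(m != NULL);
	filter_apcb(m, &shadow);
	if (memcmp(&shadow, &m->shadow_apcb, sizeof(shadow)) != 0) {
		m->shadow_apcb = shadow;
		if (m->has_crycb)
			m->commits++; /* only now is the guest's configuration touched */
	}
}
```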

Since none of the complaints affects correctness, except maybe for the
QCI stuff:

Acked-by: Halil Pasic 

If it's good enough for you, it's good enough for me.

> + * Returns the number of APQNs remaining after filtering is complete.
> + */
> +static void vfio_ap_mdev_refresh_apcb(struct ap_matrix_mdev *matrix_mdev)
> +{
> + struct ap_matrix shadow_apcb;
> +
> + vfio_ap_mdev_filter_apcb(matrix_mdev, _apcb);
> +
> + if (memcmp(_apcb, _mdev->shadow_apcb,
> +sizeof(struct ap_matrix)) != 0) {
> + memcpy(_mdev->shadow_apcb, _apcb,
> +sizeof(struct ap_matrix));
> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
> + }
> +}
> +
>  static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device 
> *mdev)
>  {
>   struct ap_matrix_mdev *matrix_mdev;
> @@ -552,10 +634,6 @@ static ssize_t assign_adapter_store(struct device *dev,
>   struct mdev_device *mdev = mdev_from_dev(dev);
>   struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  
> - /* If the guest is running, disallow assignment of adapter */
> - if (matrix_mdev->kvm)
> - return -EBUSY;
> -
>   ret = kstrtoul(buf, 0, );
>   if (ret)
>   return ret;
> @@ -577,6 +655,7 @@ static ssize_t assign_adapter_store(struct device *dev,
>  
>   set_bit_inv(apid, matrix_mdev->matrix.apm);
>   vfio_ap_mdev_link_adapter(matrix_mdev, apid);
> + vfio_ap_mdev_refresh_apcb(matrix_mdev);
>  
>   mutex_unlock(_dev->lock);
>  
> @@ -619,10 +698,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
>   struct mdev_device *mdev = mdev_from_dev(dev);
>   struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  
> - /* If the guest is running, disallow un-assignment of adapter */
> - if (matrix_mdev->kvm)
> - return -EBUSY;
> -
>   ret = kstrtoul(buf, 0, );
>   if (ret)
>   return ret;
> @@ -633,6 +708,8 @@ static ssize_t unassign_adapter_store(struct device *dev,
>   mutex_lock(_dev->lock);
>   clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
>   vfio_ap_mdev_unlink_adapter(matrix_mdev, apid);
> + vfio_ap_mdev_refresh_apcb(matrix_mdev);
> +
>   mutex_unlock(_dev->lock);
>  
>   return count;
> @@ -691,10 +768,6 @@ static ssize_t assign_domain_store(struct device *dev,
>   struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>   unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
>  
> - /* If the guest is running, disallow assignment of domain */
> - if (matrix_mdev->kvm)
> - return -EBUSY;
> -
>   ret = kstrtoul(buf, 0, );
>   if (ret)
>   return ret;
> @@ -715,6 +788,7 @@ static ssize_t assign_domain_store(struct device *dev,
>  
>   set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>   vfio_ap_mdev_link_domain(matrix_mdev, apqi);
> + vfio_ap_mdev_refresh_apcb(matrix_mdev);
>  
>   mutex_unlock(_dev->lock);
>  
> @@ -757,10 +831,6 @@ static ssize_t unassign_domain_store(struct device *dev,
>   struct mdev_device *mdev = mdev_from_dev(dev);
>   struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  
> - /* If the guest is running, disallow un-assignment of domain */
> - if (matrix_mdev->kvm)
> - return -EBUSY;
> -
>   ret = kstrtoul(buf, 0, );
>   if (ret)
>   return ret;
> @@ -771,12 +841,24 @@ static ssize_t unassign_domain_store(struct device *dev,
>   mutex_lock(_dev->lock);
>   clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
>   vfio_ap_mdev_unlink_domain(matrix_mdev, apqi);
> + vfio_ap_mdev_refresh_apcb(matrix_mdev);
> +
>   mutex_unlock(_dev

Re: [PATCH v13 07/15] s390/vfio-ap: introduce shadow APCB

2021-01-11 Thread Halil Pasic
On Tue, 22 Dec 2020 20:15:58 -0500
Tony Krowiak  wrote:

> The APCB is a field within the CRYCB that provides the AP configuration
> to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
> maintain it for the lifespan of the guest.
> 
> Signed-off-by: Tony Krowiak 
> Reviewed-by: Halil Pasic 
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 15 +++
>  drivers/s390/crypto/vfio_ap_private.h |  2 ++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
> b/drivers/s390/crypto/vfio_ap_ops.c
> index 2d58b39977be..44b3a81cadfb 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -293,6 +293,20 @@ static void vfio_ap_matrix_init(struct ap_config_info 
> *info,
>   matrix->adm_max = info->apxa ? info->Nd : 15;
>  }
>  
> +static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
> +{
> + return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
> +}
> +
> +static void vfio_ap_mdev_commit_shadow_apcb(struct ap_matrix_mdev 
> *matrix_mdev)
> +{
> + if (vfio_ap_mdev_has_crycb(matrix_mdev))
> + kvm_arch_crypto_set_masks(matrix_mdev->kvm,
> +   matrix_mdev->shadow_apcb.apm,
> +   matrix_mdev->shadow_apcb.aqm,
> +   matrix_mdev->shadow_apcb.adm);
> +}
> +
>  static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>  {
>   struct ap_matrix_mdev *matrix_mdev;
> @@ -308,6 +322,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>  
>   matrix_mdev->mdev = mdev;
>   vfio_ap_matrix_init(_dev->info, _mdev->matrix);
> + vfio_ap_matrix_init(_dev->info, _mdev->shadow_apcb);
>   hash_init(matrix_mdev->qtable);
>   mdev_set_drvdata(mdev, matrix_mdev);
>   matrix_mdev->pqap_hook.hook = handle_pqap;
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index 4e5cc72fc0db..d2d26ba18602 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -75,6 +75,7 @@ struct ap_matrix {
>   * @list:allows the ap_matrix_mdev struct to be added to a list
>   * @matrix:  the adapters, usage domains and control domains assigned to the
>   *   mediated matrix device.
> + * @shadow_apcb:the shadow copy of the APCB field of the KVM guest's CRYCB
>   * @group_notifier: notifier block used for specifying callback function for
>   *   handling the VFIO_GROUP_NOTIFY_SET_KVM event
>   * @kvm: the struct holding guest's state
> @@ -82,6 +83,7 @@ struct ap_matrix {
>  struct ap_matrix_mdev {
>   struct list_head node;
>   struct ap_matrix matrix;
> + struct ap_matrix shadow_apcb;
>   struct notifier_block group_notifier;
>   struct notifier_block iommu_notifier;
>   struct kvm *kvm;

What happened to the following hunk from v12?

@@ -1218,13 +1233,9 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
if (ret)
return NOTIFY_DONE;
 
-   /* If there is no CRYCB pointer, then we can't copy the masks */
-   if (!matrix_mdev->kvm->arch.crypto.crycbd)
-   return NOTIFY_DONE;
-
-   kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
- matrix_mdev->matrix.aqm,
- matrix_mdev->matrix.adm);
+   memcpy(_mdev->shadow_apcb, _mdev->matrix,
+  sizeof(matrix_mdev->shadow_apcb));
+   vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
 
return NOTIFY_OK;
 }
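For readers following the shadow-APCB thread: the v12 hunk above first seeds the shadow copy from the assigned matrix and only then commits it to the guest. A minimal userspace sketch of that ordering follows; the structures are simplified stand-ins (the array sizes, `struct fake_crycb` and the `commits` counter are hypothetical), not the driver's real types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct ap_matrix (sizes are illustrative only). */
struct ap_matrix {
	unsigned long apm[4];	/* adapter mask */
	unsigned long aqm[4];	/* usage domain mask */
	unsigned long adm[4];	/* control domain mask */
};

/* Hypothetical stand-in for the APCB inside the guest's CRYCB. */
struct fake_crycb {
	struct ap_matrix apcb;
};

struct matrix_mdev_sketch {
	struct ap_matrix matrix;	/* what userspace assigned via sysfs */
	struct ap_matrix shadow_apcb;	/* what we intend to give the guest */
	struct fake_crycb *crycb;	/* NULL while no KVM/CRYCB is attached */
	int commits;			/* counts pushes to the guest */
};

/* Mirrors vfio_ap_mdev_has_crycb(): only commit when a CRYCB exists. */
static bool has_crycb(const struct matrix_mdev_sketch *m)
{
	return m->crycb != NULL;
}

/* Mirrors vfio_ap_mdev_commit_shadow_apcb(): push the shadow to the guest. */
static void commit_shadow_apcb(struct matrix_mdev_sketch *m)
{
	if (has_crycb(m)) {
		m->crycb->apcb = m->shadow_apcb;
		m->commits++;
	}
}

/* The hunk in question: without the memcpy the shadow keeps its initial
 * (empty) value, so neither the guest nor guest_matrix sees the assignment. */
static void on_set_kvm(struct matrix_mdev_sketch *m)
{
	memcpy(&m->shadow_apcb, &m->matrix, sizeof(m->shadow_apcb));
	commit_shadow_apcb(m);
}
```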


Re: [PATCH v13 08/15] s390/vfio-ap: sysfs attribute to display the guest's matrix

2021-01-11 Thread Halil Pasic
On Tue, 22 Dec 2020 20:15:59 -0500
Tony Krowiak  wrote:

> The matrix of adapters and domains configured in a guest's APCB may
> differ from the matrix of adapters and domains assigned to the matrix mdev,
> so this patch introduces a sysfs attribute to display the matrix of
> adapters and domains that are or will be assigned to the APCB of a guest
> that is or will be using the matrix mdev. For a matrix mdev denoted by
> $uuid, the guest matrix can be displayed as follows:
> 
>cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix
> 
> Signed-off-by: Tony Krowiak 

Reviewed-by: Halil Pasic 

But because vfio_ap_mdev_commit_shadow_apcb() is not used (see the previous
patch), the attribute won't show the guest matrix at this point. :(
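The guest_matrix attribute and matrix_show() share one formatter that walks the Cartesian product of the adapter and domain masks. The sketch below approximates its output format in userspace. It assumes s390's inverted bit numbering (bit 0 is the most significant bit of the first word, as with test_bit_inv()) and covers only the branch where both masks are non-empty; the `_sketch` helpers are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Number of bits in one bitmap word on the build host. */
#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/* Assumed to match s390 test_bit_inv(): bit 0 is the leftmost bit of word 0. */
static int test_bit_inv_sketch(unsigned long nr, const unsigned long *addr)
{
	unsigned long word = nr / BITS_PER_LONG_SKETCH;
	unsigned long shift = BITS_PER_LONG_SKETCH - 1 - nr % BITS_PER_LONG_SKETCH;

	return (addr[word] >> shift) & 1UL;
}

static void set_bit_inv_sketch(unsigned long nr, unsigned long *addr)
{
	unsigned long word = nr / BITS_PER_LONG_SKETCH;
	unsigned long shift = BITS_PER_LONG_SKETCH - 1 - nr % BITS_PER_LONG_SKETCH;

	addr[word] |= 1UL << shift;
}

/* One "AA.DDDD" line per APQN in apm x aqm; returns characters written.
 * Output simply stops if buf would overflow. */
static int matrix_show_sketch(const unsigned long *apm, unsigned long napm_bits,
			      const unsigned long *aqm, unsigned long naqm_bits,
			      char *buf, size_t len)
{
	int nchars = 0;

	if (len > 0)
		buf[0] = '\0';
	for (unsigned long apid = 0; apid < napm_bits; apid++) {
		if (!test_bit_inv_sketch(apid, apm))
			continue;
		for (unsigned long apqi = 0; apqi < naqm_bits; apqi++) {
			int n;

			if (!test_bit_inv_sketch(apqi, aqm))
				continue;
			if ((size_t)nchars >= len)
				return nchars;
			n = snprintf(buf + nchars, len - nchars, "%02lx.%04lx\n",
				     apid, apqi);
			if (n < 0 || (size_t)n >= len - nchars)
				return nchars;	/* truncated: stop here */
			nchars += n;
		}
	}
	return nchars;
}
```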

> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 51 ++-
>  1 file changed, 37 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 44b3a81cadfb..1b1d5975ee0e 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -894,29 +894,24 @@ static ssize_t control_domains_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RO(control_domains);
>  
> -static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
> -char *buf)
> +static ssize_t vfio_ap_mdev_matrix_show(struct ap_matrix *matrix, char *buf)
>  {
> - struct mdev_device *mdev = mdev_from_dev(dev);
> - struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>   char *bufpos = buf;
>   unsigned long apid;
>   unsigned long apqi;
>   unsigned long apid1;
>   unsigned long apqi1;
> - unsigned long napm_bits = matrix_mdev->matrix.apm_max + 1;
> - unsigned long naqm_bits = matrix_mdev->matrix.aqm_max + 1;
> + unsigned long napm_bits = matrix->apm_max + 1;
> + unsigned long naqm_bits = matrix->aqm_max + 1;
>   int nchars = 0;
>   int n;
>  
> - apid1 = find_first_bit_inv(matrix_mdev->matrix.apm, napm_bits);
> - apqi1 = find_first_bit_inv(matrix_mdev->matrix.aqm, naqm_bits);
> -
> - mutex_lock(_dev->lock);
> + apid1 = find_first_bit_inv(matrix->apm, napm_bits);
> + apqi1 = find_first_bit_inv(matrix->aqm, naqm_bits);
>  
>   if ((apid1 < napm_bits) && (apqi1 < naqm_bits)) {
> - for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm_bits) {
> - for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
> + for_each_set_bit_inv(apid, matrix->apm, napm_bits) {
> + for_each_set_bit_inv(apqi, matrix->aqm,
>naqm_bits) {
>   n = sprintf(bufpos, "%02lx.%04lx\n", apid,
>   apqi);
> @@ -925,25 +920,52 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
>   }
>   }
>   } else if (apid1 < napm_bits) {
> - for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm_bits) {
> + for_each_set_bit_inv(apid, matrix->apm, napm_bits) {
>   n = sprintf(bufpos, "%02lx.\n", apid);
>   bufpos += n;
>   nchars += n;
>   }
>   } else if (apqi1 < naqm_bits) {
> - for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, naqm_bits) {
> + for_each_set_bit_inv(apqi, matrix->aqm, naqm_bits) {
>   n = sprintf(bufpos, ".%04lx\n", apqi);
>   bufpos += n;
>   nchars += n;
>   }
>   }
>  
> + return nchars;
> +}
> +
> +static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
> +char *buf)
> +{
> + ssize_t nchars;
> + struct mdev_device *mdev = mdev_from_dev(dev);
> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> +
> + mutex_lock(_dev->lock);
> + nchars = vfio_ap_mdev_matrix_show(_mdev->matrix, buf);
>   mutex_unlock(_dev->lock);
>  
>   return nchars;
>  }
>  static DEVICE_ATTR_RO(matrix);
>  
> +static ssize_t guest_matrix_show(struct device *dev,
> +  struct device_attribute *attr, char *buf)
> +{
> + ssize_t nchars;
> + struct mdev_device *mdev = mdev_from_dev(dev);
> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> +
> + mutex_lock(_dev->lock);
> + nchars = vfio_ap_mdev_matrix_show(_mdev->shadow

Re: [PATCH v13 06/15] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device

2021-01-11 Thread Halil Pasic
On Tue, 22 Dec 2020 20:15:57 -0500
Tony Krowiak  wrote:

> The current implementation does not allow assignment of an AP adapter or
> domain to an mdev device if each APQN resulting from the assignment
> does not reference an AP queue device that is bound to the vfio_ap device
> driver. This patch allows assignment of AP resources to the matrix mdev as
> long as the APQNs resulting from the assignment:
>1. Are not reserved by the AP BUS for use by the zcrypt device drivers.
>2. Are not assigned to another matrix mdev.
> 
> The rationale behind this is twofold:
>1. The AP architecture does not preclude assignment of APQNs to an AP
>   configuration that are not available to the system.
>2. APQNs that do not reference a queue device bound to the vfio_ap
>   device driver will not be assigned to the guest's CRYCB, so the
>   guest will not get access to queues not bound to the vfio_ap driver.

You didn't tell us about the changed error code (-EADDRINUSE becomes -EBUSY).

Also notice that at this point we have neither filtering nor in-use.
This used to be patch 11, and most of that stuff used to be in place. But
I'm going to trust you if you say it's fine to enable it this early.

> 
> Signed-off-by: Tony Krowiak 
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 241 --
>  1 file changed, 62 insertions(+), 179 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index cdcc6378b4a5..2d58b39977be 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -379,134 +379,37 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
>   NULL,
>  };
>  
> -struct vfio_ap_queue_reserved {
> - unsigned long *apid;
> - unsigned long *apqi;
> - bool reserved;
> -};
> +#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
> +  "already assigned to %s"
>  
> -/**
> - * vfio_ap_has_queue
> - *
> - * @dev: an AP queue device
> - * @data: a struct vfio_ap_queue_reserved reference
> - *
> - * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
> - * apid or apqi specified in @data:
> - *
> - * - If @data contains both an apid and apqi value, then @data will be flagged
> - *   as reserved if the APID and APQI fields for the AP queue device matches
> - *
> - * - If @data contains only an apid value, @data will be flagged as
> - *   reserved if the APID field in the AP queue device matches
> - *
> - * - If @data contains only an apqi value, @data will be flagged as
> - *   reserved if the APQI field in the AP queue device matches
> - *
> - * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
> - * @data does not contain either an apid or apqi.
> - */
> -static int vfio_ap_has_queue(struct device *dev, void *data)
> +static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
> +  unsigned long *apm,
> +  unsigned long *aqm)
[..]
> - return 0;
> + for_each_set_bit_inv(apid, apm, AP_DEVICES)
> + for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
> + pr_warn(MDEV_SHARING_ERR, apid, apqi, mdev_name);

I would prefer dev_warn() here. We know which device is about to get
more queues, and this device can provide a clue regarding the initiator.

Also, I believe a warning is too heavy-handed here. Warnings should not
be ignored, yet AFAIU this is a condition that can emerge during normal
operation. Or am I wrong?
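Because an mdev's APQNs are the Cartesian product of its adapter and domain masks, two mdevs share a queue exactly when both mask intersections are non-empty, which is what the two bitmap_and() checks in vfio_ap_mdev_verify_no_sharing() test. A simplified sketch with plain 64-bit masks (no inverted bit order; `struct mdev_masks` and both helpers are hypothetical) also shows that naming the requesting device in each message, in the spirit of the dev_warn() suggestion, is straightforward.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified view of an mdev: bit n set == APID/APQI n. */
struct mdev_masks {
	const char *name;
	uint64_t apm;
	uint64_t aqm;
};

/* Shared APQN exists iff adapters AND domains both overlap. */
static int shares_apqn(const struct mdev_masks *a, const struct mdev_masks *b)
{
	return (a->apm & b->apm) != 0 && (a->aqm & b->aqm) != 0;
}

/* Log every shared APQN, naming both the requester and the current owner.
 * Returns the number of shared APQNs found. */
static int log_shared(const struct mdev_masks *requester,
		      const struct mdev_masks *owner)
{
	uint64_t apm = requester->apm & owner->apm;
	uint64_t aqm = requester->aqm & owner->aqm;
	int count = 0;

	for (unsigned int apid = 0; apid < 64; apid++) {
		if (!((apm >> apid) & 1))
			continue;
		for (unsigned int apqi = 0; apqi < 64; apqi++) {
			if (!((aqm >> apqi) & 1))
				continue;
			fprintf(stderr, "%s: queue %02x.%04x already assigned to %s\n",
				requester->name, apid, apqi, owner->name);
			count++;
		}
	}
	return count;
}
```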

>  }
>  
>  /**
>   * vfio_ap_mdev_verify_no_sharing
>   *
> - * Verifies that the APQNs derived from the cross product of the AP adapter IDs
> - * and AP queue indexes comprising the AP matrix are not configured for another
> - * mediated device. AP queue sharing is not allowed.
> + * Verifies that each APQN derived from the Cartesian product of the AP adapter
> + * IDs and AP queue indexes comprising the AP matrix are not configured for
> + * another mediated device. AP queue sharing is not allowed.
>   *
> - * @matrix_mdev: the mediated matrix device
> + * @matrix_mdev: the mediated matrix device to which the APQNs being verified
> + *are assigned.
> + * @mdev_apm: mask indicating the APIDs of the APQNs to be verified
> + * @mdev_aqm: mask indicating the APQIs of the APQNs to be verified
>   *
> - * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
> + * Returns 0 if the APQNs are not shared, otherwise; returns -EBUSY.
>   */
> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
> +static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
> +   unsigned long *mdev_apm,
> +   unsigned long *mdev_aqm)
>  {
>   struct ap_matrix_mdev *lstdev;
>   DECLARE_BITMAP(apm, AP_DEVICES);
> @@ -523,20 +426,31 @@ static int 

Re: [PATCH v13 05/15] s390/vfio-ap: manage link between queue struct and matrix mdev

2021-01-11 Thread Halil Pasic
On Tue, 22 Dec 2020 20:15:56 -0500
Tony Krowiak  wrote:

> Let's create links between each queue device bound to the vfio_ap device
> driver and the matrix mdev to which the queue's APQN is assigned. The idea
> is to facilitate efficient retrieval of the objects representing the queue
> devices and matrix mdevs as well as to verify that a queue assigned to
> a matrix mdev is bound to the driver.
> 
> The links will be created as follows:
> 
>* When the queue device is probed, if its APQN is assigned to a matrix
>  mdev, the structures representing the queue device and the matrix mdev
>  will be linked.
> 
>* When an adapter or domain is assigned to a matrix mdev, for each new
>  APQN assigned that references a queue device bound to the vfio_ap
>  device driver, the structures representing the queue device and the
>  matrix mdev will be linked.
> 
> The links will be removed as follows:
> 
>* When the queue device is removed, if its APQN is assigned to a matrix
>  mdev, the structures representing the queue device and the matrix mdev
>  will be unlinked.
> 
>* When an adapter or domain is unassigned from a matrix mdev, for each
>  APQN unassigned that references a queue device bound to the vfio_ap
>  device driver, the structures representing the queue device and the
>      matrix mdev will be unlinked.
> 
> Signed-off-by: Tony Krowiak 

Reviewed-by: Halil Pasic 
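A rough userspace model of the queue/mdev links described in this patch: an APQN packs the card (adapter) number above the queue (domain) index, and each mdev keeps a hash table of the queues linked to it, so a lookup no longer needs the bitmap checks plus a driver_find_device() walk. The 8-bit field layout is assumed to match the AP bus AP_MKQID()/AP_QID_CARD()/AP_QID_QUEUE() macros; the bucket count and singly linked chains are hypothetical simplifications of the kernel hashtable.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed APQN layout: card in bits 8-15, queue (domain) in bits 0-7. */
#define MKQID(card, queue)	((((card) & 0xff) << 8) | ((queue) & 0xff))
#define QID_CARD(qid)		(((qid) >> 8) & 0xff)
#define QID_QUEUE(qid)		((qid) & 0xff)

#define QTABLE_BUCKETS 8	/* hypothetical table size */

struct queue_sketch {
	unsigned long apqn;
	struct queue_sketch *next;	/* bucket chain */
};

struct mdev_sketch {
	struct queue_sketch *qtable[QTABLE_BUCKETS];
};

static size_t bucket(unsigned long apqn)
{
	return apqn % QTABLE_BUCKETS;
}

/* Link: a bound queue's APQN is assigned, or an assigned APQN's queue is probed. */
static void link_queue(struct mdev_sketch *m, struct queue_sketch *q)
{
	size_t b = bucket(q->apqn);

	q->next = m->qtable[b];
	m->qtable[b] = q;
}

/* Unlink: the APQN is unassigned, or the queue device is removed. */
static void unlink_queue(struct mdev_sketch *m, struct queue_sketch *q)
{
	struct queue_sketch **pp = &m->qtable[bucket(q->apqn)];

	while (*pp && *pp != q)
		pp = &(*pp)->next;
	if (*pp)
		*pp = q->next;
	q->next = NULL;
}

/* A linked queue is, by construction, both assigned and bound. The iterator
 * never yields NULL, so only the APQN needs comparing. */
static struct queue_sketch *get_queue(struct mdev_sketch *m, unsigned long apqn)
{
	for (struct queue_sketch *q = m->qtable[bucket(apqn)]; q; q = q->next)
		if (q->apqn == apqn)
			return q;
	return NULL;
}
```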

> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 140 +-
>  drivers/s390/crypto/vfio_ap_private.h |   3 +
>  2 files changed, 117 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 835c963ae16d..cdcc6378b4a5 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -27,33 +27,17 @@
>  static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>  static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
>  
> -/**
> - * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
> - * @matrix_mdev: the associated mediated matrix
> - * @apqn: The queue APQN
> - *
> - * Retrieve a queue with a specific APQN from the list of the
> - * devices of the vfio_ap_drv.
> - * Verify that the APID and the APQI are set in the matrix.
> - *
> - * Returns the pointer to the associated vfio_ap_queue
> - */
> -static struct vfio_ap_queue *vfio_ap_get_queue(
> - struct ap_matrix_mdev *matrix_mdev,
> - int apqn)
> +static struct vfio_ap_queue *
> +vfio_ap_mdev_get_queue(struct ap_matrix_mdev *matrix_mdev, unsigned long apqn)
>  {
> - struct vfio_ap_queue *q = NULL;
> -
> - if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
> - return NULL;
> - if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
> - return NULL;
> + struct vfio_ap_queue *q;
>  
> - q = vfio_ap_find_queue(apqn);
> - if (q)
> - q->matrix_mdev = matrix_mdev;
> + hash_for_each_possible(matrix_mdev->qtable, q, mdev_qnode, apqn) {
> + if (q && (q->apqn == apqn))
> + return q;
> + }
>  
> - return q;
> + return NULL;
>  }
>  
>  /**
> @@ -166,7 +150,6 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
> status.response_code);
>  end_free:
>   vfio_ap_free_aqic_resources(q);
> - q->matrix_mdev = NULL;
>   return status;
>  }
>  
> @@ -282,7 +265,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>   matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
>  struct ap_matrix_mdev, pqap_hook);
>  
> - q = vfio_ap_get_queue(matrix_mdev, apqn);
> + q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
>   if (!q)
>   goto out_unlock;
>  
> @@ -325,6 +308,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>  
>   matrix_mdev->mdev = mdev;
>   vfio_ap_matrix_init(_dev->info, _mdev->matrix);
> + hash_init(matrix_mdev->qtable);
>   mdev_set_drvdata(mdev, matrix_mdev);
>   matrix_mdev->pqap_hook.hook = handle_pqap;
>   matrix_mdev->pqap_hook.owner = THIS_MODULE;
> @@ -553,6 +537,50 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>   return 0;
>  }
>  
> +static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,
> + struct vfio_ap_queue *q)
> +{
> + if 

Re: [PATCH v13 02/15] s390/vfio-ap: No need to disable IRQ after queue reset

2021-01-11 Thread Halil Pasic
On Tue, 22 Dec 2020 20:15:53 -0500
Tony Krowiak  wrote:

> The queues assigned to a matrix mediated device are currently reset when:
> 
> * The VFIO_DEVICE_RESET ioctl is invoked
> * The mdev fd is closed by userspace (QEMU)
> * The mdev is removed from sysfs.
> 
> Immediately after the reset of a queue, a call is made to disable
> interrupts for the queue. This is entirely unnecessary because the reset of
> a queue disables interrupts, so this will be removed.
> 
> Signed-off-by: Tony Krowiak 
> ---
>  drivers/s390/crypto/vfio_ap_drv.c |  1 -
>  drivers/s390/crypto/vfio_ap_ops.c | 40 +--
>  drivers/s390/crypto/vfio_ap_private.h |  1 -
>  3 files changed, 26 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index be2520cc010b..ca18c91afec9 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -79,7 +79,6 @@ static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
>   apid = AP_QID_CARD(q->apqn);
>   apqi = AP_QID_QUEUE(q->apqn);
>   vfio_ap_mdev_reset_queue(apid, apqi, 1);
> - vfio_ap_irq_disable(q);
>   kfree(q);
>   mutex_unlock(_dev->lock);
>  }
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 7339043906cf..052f61391ec7 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -25,6 +25,7 @@
>  #define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
>  
>  static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
> +static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
>  
>  static int match_apqn(struct device *dev, const void *data)
>  {
> @@ -49,20 +50,15 @@ static struct vfio_ap_queue *(
>   int apqn)
>  {
>   struct vfio_ap_queue *q;
> - struct device *dev;
>  
>   if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
>   return NULL;
>   if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
>   return NULL;
>  
> - dev = driver_find_device(_dev->vfio_ap_drv->driver, NULL,
> -  , match_apqn);
> - if (!dev)
> - return NULL;
> - q = dev_get_drvdata(dev);
> - q->matrix_mdev = matrix_mdev;
> - put_device(dev);
> + q = vfio_ap_find_queue(apqn);
> + if (q)
> + q->matrix_mdev = matrix_mdev;
>  
>   return q;
>  }
> @@ -1126,24 +1122,27 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>   return notify_rc;
>  }
>  
> -static void vfio_ap_irq_disable_apqn(int apqn)
> +static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
>  {
>   struct device *dev;
> - struct vfio_ap_queue *q;
> + struct vfio_ap_queue *q = NULL;
>  
>   dev = driver_find_device(_dev->vfio_ap_drv->driver, NULL,
>, match_apqn);
>   if (dev) {
>   q = dev_get_drvdata(dev);
> - vfio_ap_irq_disable(q);
>   put_device(dev);
>   }
> +
> + return q;
>  }

This hunk and the previous one are a rewrite of vfio_ap_get_queue() and
have next to nothing to do with the patch's objective. If we were at an
earlier stage, I would ask to split it up.

>  
>  int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
>unsigned int retry)
>  {
>   struct ap_queue_status status;
> + struct vfio_ap_queue *q;
> + int ret;
>   int retry2 = 2;
>   int apqn = AP_MKQID(apid, apqi);
>  
> @@ -1156,18 +1155,32 @@ int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
>   status = ap_tapq(apqn, NULL);
>   }
>   WARN_ON_ONCE(retry2 <= 0);
> - return 0;
> + ret = 0;
> + goto free_aqic_resources;
>   case AP_RESPONSE_RESET_IN_PROGRESS:
>   case AP_RESPONSE_BUSY:
>   msleep(20);
>   break;
>   default:
>   /* things are really broken, give up */
> - return -EIO;
> + ret = -EIO;
> + goto free_aqic_resources;

Do we really want the unpin here? I mean the reset did not work and
we are giving up. So the irqs are potentially still enabled.

Without this patch we try to disable the interrupts using AQIC, and
do the cleanup after that.
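The ordering concern above can be sketched as a tiny control-flow model: after a successful reset, interrupts are already off and the AQIC resources can be freed; after a failed reset, attempt the AQIC disable first, as the pre-patch code did, and only then free. This is a hypothetical illustration of the sequence only; `reset_queue_sketch`, the op log and the boolean outcome are not driver code.

```c
#include <assert.h>
#include <errno.h>

/* Operations whose order we want to check. */
enum op { OP_RESET, OP_IRQ_DISABLE, OP_FREE_AQIC };

struct op_log {
	enum op ops[4];
	int n;
};

static void record(struct op_log *log, enum op op)
{
	if (log->n < 4)
		log->ops[log->n++] = op;
}

/* reset_ok stands in for the outcome of the ZAPQ retry loop. */
static int reset_queue_sketch(int reset_ok, struct op_log *log)
{
	record(log, OP_RESET);
	if (reset_ok) {
		/* A completed reset also disables interrupts: unpinning is safe. */
		record(log, OP_FREE_AQIC);
		return 0;
	}
	/* Reset failed, so interrupts may still be enabled: try AQIC first,
	 * then release the pinned page and other AQIC resources. */
	record(log, OP_IRQ_DISABLE);
	record(log, OP_FREE_AQIC);
	return -EIO;
}
```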

I'm aware, the comment say

Re: [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support

2021-01-07 Thread Halil Pasic
On Wed, 6 Jan 2021 10:16:24 -0500
Tony Krowiak  wrote:

> Ping
> 

pong

Will try to have a look in the next few days...


Re: [PATCH v5] s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated

2020-12-23 Thread Halil Pasic
On Tue, 22 Dec 2020 20:20:13 -0500
Tony Krowiak  wrote:

> The vfio_ap device driver registers a group notifier with VFIO when the
> file descriptor for a VFIO mediated device for a KVM guest is opened to
> receive notification that the KVM pointer is set (VFIO_GROUP_NOTIFY_SET_KVM
> event). When the KVM pointer is set, the vfio_ap driver takes the
> following actions:
> 1. Stashes the KVM pointer in the vfio_ap_mdev struct that holds the state
>of the mediated device.
> 2. Calls the kvm_get_kvm() function to increment its reference counter.
> 3. Sets the function pointer to the function that handles interception of
>the instruction that enables/disables interrupt processing.
> 4. Sets the masks in the KVM guest's CRYCB to pass AP resources through to
>the guest.
> 
> In order to avoid memory leaks, when the notifier is called to receive
> notification that the KVM pointer has been set to NULL, the vfio_ap device
> driver should reverse the actions taken when the KVM pointer was set.
> 
> Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Tony Krowiak 
> Reviewed-by: Halil Pasic 
> Reviewed-by: Cornelia Huck 

LGTM.

Christian, you wanted to pick this up yourself directly, right? I think we
are good to go!


Re: [PATCH v4] s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated

2020-12-22 Thread Halil Pasic
On Tue, 22 Dec 2020 16:57:06 +0100
Cornelia Huck  wrote:

> On Tue, 22 Dec 2020 10:37:01 -0500
> Tony Krowiak  wrote:
> 
> > On 12/21/20 11:05 PM, Halil Pasic wrote:  
> > > On Mon, 21 Dec 2020 13:56:25 -0500
> > > Tony Krowiak  wrote:  
> 
> > >>   static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
> > >> unsigned long action, void *data)
> > >>   {
> > >> -int ret;
> > >> +int ret, notify_rc = NOTIFY_DONE;
> > >>  struct ap_matrix_mdev *matrix_mdev;
> > >>   
> > >>  if (action != VFIO_GROUP_NOTIFY_SET_KVM)
> > >>  return NOTIFY_OK;
> > >>   
> > >>  matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
> > >> +mutex_lock(_dev->lock);
> > >>   
> > >>  if (!data) {
> > >> -matrix_mdev->kvm = NULL;
> > >> -return NOTIFY_OK;
> > >> +if (matrix_mdev->kvm)
> > >> +vfio_ap_mdev_unset_kvm(matrix_mdev);
> > >> +notify_rc = NOTIFY_OK;
> > >> +goto notify_done;
> > >>  }
> > >>   
> > >>  ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
> > >>  if (ret)
> > >> -return NOTIFY_DONE;
> > >> +goto notify_done;
> > >>   
> > >>  /* If there is no CRYCB pointer, then we can't copy the masks */
> > >>  if (!matrix_mdev->kvm->arch.crypto.crycbd)
> > >> -return NOTIFY_DONE;
> > >> +goto notify_done;
> > >>   
> > >>  kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
> > >>matrix_mdev->matrix.aqm,
> > >>matrix_mdev->matrix.adm);
> > >>   
> > >> -return NOTIFY_OK;
> > > Shouldn't there be an
> > >   +   notify_rc = NOTIFY_OK;
> > > here? I mean you initialize notify_rc to NOTIFY_DONE, in the !data branch
> > > on success you set notify_rc to NOTIFY_OK, but in the !!data branch it
> > > just stays NOTIFY_DONE. Or am I missing something?
> > 
> > I don't think it matters much since NOTIFY_OK and NOTIFY_DONE have
> > no further effect on processing of the notification queue, but I believe
> > you are correct, this is a change from what we originally had. I can
> > restore the original return values if you'd prefer.  
> 
> Even if they have the same semantics now, that might change in the
> future; restoring the original behaviour looks like the right thing to
> do.

I agree. Especially since we do care to preserve the behavior in
the !data branch. If there is no difference between the two, then it
would probably make sense to clean that up globally. 

Regards,
Halil


Re: [PATCH v4] s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated

2020-12-21 Thread Halil Pasic
On Mon, 21 Dec 2020 13:56:25 -0500
Tony Krowiak  wrote:

> The vfio_ap device driver registers a group notifier with VFIO when the
> file descriptor for a VFIO mediated device for a KVM guest is opened to
> receive notification that the KVM pointer is set (VFIO_GROUP_NOTIFY_SET_KVM
> event). When the KVM pointer is set, the vfio_ap driver takes the
> following actions:
> 1. Stashes the KVM pointer in the vfio_ap_mdev struct that holds the state
>of the mediated device.
> 2. Calls the kvm_get_kvm() function to increment its reference counter.
> 3. Sets the function pointer to the function that handles interception of
>the instruction that enables/disables interrupt processing.
> 4. Sets the masks in the KVM guest's CRYCB to pass AP resources through to
>the guest.
> 
> In order to avoid memory leaks, when the notifier is called to receive
> notification that the KVM pointer has been set to NULL, the vfio_ap device
> driver should reverse the actions taken when the KVM pointer was set.
> 
> Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Tony Krowiak 
> Reviewed-by: Halil Pasic 
> Reviewed-by: Cornelia Huck 

[..]

>  static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>  unsigned long action, void *data)
>  {
> - int ret;
> + int ret, notify_rc = NOTIFY_DONE;
>   struct ap_matrix_mdev *matrix_mdev;
>  
>   if (action != VFIO_GROUP_NOTIFY_SET_KVM)
>   return NOTIFY_OK;
>  
>   matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
> + mutex_lock(_dev->lock);
>  
>   if (!data) {
> - matrix_mdev->kvm = NULL;
> - return NOTIFY_OK;
> + if (matrix_mdev->kvm)
> + vfio_ap_mdev_unset_kvm(matrix_mdev);
> + notify_rc = NOTIFY_OK;
> + goto notify_done;
>   }
>  
>   ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
>   if (ret)
> - return NOTIFY_DONE;
> + goto notify_done;
>  
>   /* If there is no CRYCB pointer, then we can't copy the masks */
>   if (!matrix_mdev->kvm->arch.crypto.crycbd)
> - return NOTIFY_DONE;
> + goto notify_done;
>  
>   kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
> matrix_mdev->matrix.aqm,
> matrix_mdev->matrix.adm);
>  
> - return NOTIFY_OK;

Shouldn't there be an 
 +  notify_rc = NOTIFY_OK;
here? I mean you initialize notify_rc to NOTIFY_DONE, in the !data branch
on success you set notify_rc to NOTIFY_OK, but in the !!data branch it
just stays NOTIFY_DONE. Or am I missing something?
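A userspace sketch of the notifier with the suggested assignment added, so that both successful branches return NOTIFY_OK, the no-CRYCB path keeps NOTIFY_DONE, and every exit passes through the single unlock. NOTIFY_DONE/NOTIFY_OK are assumed to match <linux/notifier.h>; `struct fake_mdev`, its `locked` flag and the two helpers are hypothetical stand-ins for the mdev, matrix_dev->lock and the set/unset functions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to match <linux/notifier.h>. */
#define NOTIFY_DONE 0x0000
#define NOTIFY_OK   0x0001

struct fake_mdev {
	int locked;	/* stands in for matrix_dev->lock */
	void *kvm;
	int has_crycb;
	int masks_set;
};

static int set_kvm(struct fake_mdev *m, void *kvm)
{
	m->kvm = kvm;
	return 0;
}

static void unset_kvm(struct fake_mdev *m)
{
	m->masks_set = 0;	/* kvm_arch_crypto_clear_masks() */
	m->kvm = NULL;
}

static int group_notifier_sketch(struct fake_mdev *m, void *data)
{
	int notify_rc = NOTIFY_DONE;

	m->locked = 1;			/* mutex_lock() */
	if (!data) {
		if (m->kvm)
			unset_kvm(m);
		notify_rc = NOTIFY_OK;
		goto notify_done;
	}
	if (set_kvm(m, data))
		goto notify_done;
	if (!m->has_crycb)
		goto notify_done;	/* no CRYCB: keep NOTIFY_DONE as before */
	m->masks_set = 1;		/* kvm_arch_crypto_set_masks() */
	notify_rc = NOTIFY_OK;		/* the missing assignment */
notify_done:
	m->locked = 0;			/* mutex_unlock() */
	return notify_rc;
}
```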

Otherwise LGTM!

Regards,
Halil

> +notify_done:
> + mutex_unlock(_dev->lock);
> + return notify_rc;
>  }
> 

[..] 


Re: [PATCH v3] s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated

2020-12-16 Thread Halil Pasic
On Wed, 16 Dec 2020 10:58:48 +0100
Christian Borntraeger  wrote:

> On 16.12.20 02:21, Halil Pasic wrote:
> > On Tue, 15 Dec 2020 19:10:20 +0100
> > Christian Borntraeger  wrote:
> > 
> >>
> >>
> >> On 15.12.20 11:57, Halil Pasic wrote:
> >>> On Mon, 14 Dec 2020 11:56:17 -0500
> >>> Tony Krowiak  wrote:
> >>>
> >>>> The vfio_ap device driver registers a group notifier with VFIO when the
> >>>> file descriptor for a VFIO mediated device for a KVM guest is opened to
> >>>> receive notification that the KVM pointer is set (VFIO_GROUP_NOTIFY_SET_KVM
> >>>> event). When the KVM pointer is set, the vfio_ap driver takes the
> >>>> following actions:
> >>>> 1. Stashes the KVM pointer in the vfio_ap_mdev struct that holds the state
> >>>>of the mediated device.
> >>>> 2. Calls the kvm_get_kvm() function to increment its reference counter.
> >>>> 3. Sets the function pointer to the function that handles interception of
> >>>>the instruction that enables/disables interrupt processing.
> >>>> 4. Sets the masks in the KVM guest's CRYCB to pass AP resources through to
> >>>>the guest.
> >>>>
> >>>> In order to avoid memory leaks, when the notifier is called to receive
> >>>> notification that the KVM pointer has been set to NULL, the vfio_ap device
> >>>> driver should reverse the actions taken when the KVM pointer was set.
> >>>>
> >>>> Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
> >>>> Signed-off-by: Tony Krowiak 
> >>>> ---
> >>>>  drivers/s390/crypto/vfio_ap_ops.c | 29 -
> >>>>  1 file changed, 20 insertions(+), 9 deletions(-)
> >>>>
> >>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> >>>> index e0bde8518745..cd22e85588e1 100644
> >>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
> >>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> >>>> @@ -1037,8 +1037,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
> >>>>  {
> >>>>  struct ap_matrix_mdev *m;
> >>>>
> >>>> -mutex_lock(_dev->lock);
> >>>> -
> >>>>  list_for_each_entry(m, _dev->mdev_list, node) {
> >>>>  if ((m != matrix_mdev) && (m->kvm == kvm)) {
> >>>>  mutex_unlock(_dev->lock);
> >>>> @@ -1049,7 +1047,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
> >>>>  matrix_mdev->kvm = kvm;
> >>>>  kvm_get_kvm(kvm);
> >>>>  kvm->arch.crypto.pqap_hook = _mdev->pqap_hook;
> >>>> -mutex_unlock(_dev->lock);
> >>>>
> >>>>  return 0;
> >>>>  }
> >>>> @@ -1083,35 +1080,49 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
> >>>>  return NOTIFY_DONE;
> >>>>  }
> >>>>
> >>>> +static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
> >>>> +{
> >>>> +kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> >>>> +matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> >>>
> >>>
> >>> This patch LGTM. The only concern I have with it is whether a
> >>> different cpu is guaranteed to observe the above assignment as
> >>> an atomic operation. I think we didn't finish this discussion
> >>> at v1, or did we?
> >>
> >> You mean just this assigment:
> >>>> +matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> >> should either have the old or the new value, but not half zero, half old?
> >>
> > 
> > Yes, that is the assignment I was referring to. The old value will work as
> > well, because kvm holds a reference to this module while in the pqap_hook.
> >  
> >> Normally this should be ok (and I would consider this a compiler bug if
> >> this is split into two 32-bit zeroes). But if you really want to be sure,
> >> then we can use WRITE_ONCE.
> > 
> > Just out of curiosity: what would make this a bug? Is it the s390 ELF ABI,
> > some gcc feature, or even the C standard? Also, how exactly would
> > WRITE_ONCE, i.e. access via volatile, help in this particular situation?
> 
> I think it's a tricky thing and not strictly guaranteed, but there is a lot
> of code that relies on the atomicity of word-sized accesses. See for example
> the discussion here:
> https://lore.kernel.org/lkml/CAHk-=wgc4+kv9ailokw7cpp429rkcu+vja8cwafyojc3mtq...@mail.gmail.com/
> 
> WRITE_ONCE will not change the guarantees a lot, but it mostly documents
> that we assume atomic access here.

Thanks a lot! I've read it, and IMHO it seems to contradict a little the
section https://lwn.net/Articles/793253/#Store%20Tearing. From there, I also
learned that WRITE_ONCE (i.e. volatile access) can help, although I don't
really understand why. Of course, we don't need to be portable here, as this
is s390-only code. So we might be safe without anything, but I don't know.
I believe that if volatile were enough (under all circumstances), the C
standard wouldn't have introduced atomic types.
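To make the volatile-vs-atomic distinction concrete: the kernel's WRITE_ONCE() essentially performs a store through a volatile-qualified lvalue, which in practice keeps the compiler from eliding, merging or splitting an aligned word-sized store, but the C standard itself does not promise an untorn store for volatile. A C11 atomic store does carry that promise. A minimal userspace sketch, assuming a GCC/Clang toolchain for `__typeof__`; `WRITE_ONCE_SKETCH` and `struct hook` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct hook {
	int (*fn)(int);
};

/* Roughly what WRITE_ONCE() boils down to for a scalar: a volatile store.
 * Compiler-level protection only; not a tearing guarantee in ISO C. */
#define WRITE_ONCE_SKETCH(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

static struct hook *pqap_hook_plain;

/* C11 guarantees readers see either the old or the new pointer value.
 * Relaxed ordering suffices when only atomicity (no ordering) is needed. */
static _Atomic(struct hook *) pqap_hook_atomic;

static void set_hooks(struct hook *h)
{
	WRITE_ONCE_SKETCH(pqap_hook_plain, h);
	atomic_store_explicit(&pqap_hook_atomic, h, memory_order_relaxed);
}

static void clear_hooks(void)
{
	WRITE_ONCE_SKETCH(pqap_hook_plain, NULL);
	atomic_store_explicit(&pqap_hook_atomic, NULL, memory_order_relaxed);
}
```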

Regards,
Halil


Re: [PATCH v12 11/17] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device

2020-12-16 Thread Halil Pasic
On Wed, 16 Dec 2020 15:14:47 -0500
Tony Krowiak  wrote:

> 
> 
> On 11/28/20 8:17 PM, Halil Pasic wrote:
> > On Tue, 24 Nov 2020 16:40:10 -0500
> > Tony Krowiak  wrote:
> >
> >> The current implementation does not allow assignment of an AP adapter or
> >> domain to an mdev device if each APQN resulting from the assignment
> >> does not reference an AP queue device that is bound to the vfio_ap device
> >> driver. This patch allows assignment of AP resources to the matrix mdev as
> >> long as the APQNs resulting from the assignment:
> >> 1. Are not reserved by the AP BUS for use by the zcrypt device drivers.
> >> 2. Are not assigned to another matrix mdev.
> >>
> >> The rationale behind this is twofold:
> >> 1. The AP architecture does not preclude assignment of APQNs to an AP
> >>    configuration that are not available to the system.
> >> 2. APQNs that do not reference a queue device bound to the vfio_ap
> >>    device driver will not be assigned to the guest's CRYCB, so the
> >>    guest will not get access to queues not bound to the vfio_ap driver.
> >>
> >> Signed-off-by: Tony Krowiak 
> > Again, the code looks good. I'm still worried about all the incremental
> > changes (good for review) and their testability.
> 
> I'm not sure what your concern is here. Is there an expectation
> that each patch needs to be testable by itself, or is the concern
> whether the functionality in each patch can easily be tested en masse?

I was referring to the testability of each patch in the following
sense: can you (at least theoretically) write a test suite that has
perfect coverage and no false positives for every prefix of the
series applied?

BTW, I don't consider this a showstopper.

> 
> I'm not sure some of these changes can be tested with an
> automated test because the test code would have to be able to
> dynamically change the host's AP configuration and I don't know
> if there is currently a way to do this programmatically. In order to
> test the effects of dynamic host crypto configuration manually, one
> needs access to an SE or HMC with DPM.
> 

Nested virtualization should also give you this: you can change the
configuration of G2, which is the host for G3.

Regards,
Halil
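The assignment rule from the patch description above (APQNs must not be
reserved for the host's zcrypt drivers and must not belong to another
matrix mdev, while binding to vfio_ap is no longer required) can be
sketched as a check over the cross product of assigned adapters and
domains. This is a toy model under stated assumptions: 8x8 masks instead
of the real 256-bit ones, an APQN encoded as adapter number in the high
byte and domain number in the low byte, and the host-reserved set
modelled as an apmask/aqmask cross product. All names (struct toy_matrix,
toy_assignment_ok(), ...) are hypothetical, not the vfio_ap API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy 8x8 matrix: bit n of apm = adapter n, bit n of aqm = domain n. */
struct toy_matrix {
	uint8_t apm;
	uint8_t aqm;
};

/* APQN: adapter number in the high byte, domain number in the low byte. */
static inline uint16_t toy_apqn(unsigned int apid, unsigned int apqi)
{
	return (uint16_t)(((apid & 0xff) << 8) | (apqi & 0xff));
}

static bool matrix_has_apqn(const struct toy_matrix *m, uint16_t apqn)
{
	unsigned int apid = apqn >> 8, apqi = apqn & 0xff;

	return apid < 8 && apqi < 8 &&
	       ((m->apm >> apid) & 1) && ((m->aqm >> apqi) & 1);
}

/* True if no APQN in cand's cross product is reserved for the host or
 * already in another mdev's matrix. Whether a queue device is bound to
 * vfio_ap is deliberately NOT checked. */
static bool toy_assignment_ok(const struct toy_matrix *cand,
			      const struct toy_matrix *host_reserved,
			      const struct toy_matrix *others, int nothers)
{
	for (unsigned int apid = 0; apid < 8; apid++) {
		if (!((cand->apm >> apid) & 1))
			continue;
		for (unsigned int apqi = 0; apqi < 8; apqi++) {
			uint16_t apqn;

			if (!((cand->aqm >> apqi) & 1))
				continue;
			apqn = toy_apqn(apid, apqi);
			if (matrix_has_apqn(host_reserved, apqn))
				return false;
			for (int i = 0; i < nothers; i++)
				if (matrix_has_apqn(&others[i], apqn))
					return false;
		}
	}
	return true;
}
```

For example, with APQN 00.00 reserved for the host and APQN 01.02 in
another mdev, assigning adapter 2 with domains 0 and 1 passes, while any
assignment that yields 00.00 or 01.02 is refused.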


Re: [PATCH v3] s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated

2020-12-16 Thread Halil Pasic
On Wed, 16 Dec 2020 17:05:24 +0100
Christian Borntraeger  wrote:

> 
> 
> On 16.12.20 10:58, Christian Borntraeger wrote:
> > On 16.12.20 02:21, Halil Pasic wrote:
> >> On Tue, 15 Dec 2020 19:10:20 +0100
> >> Christian Borntraeger  wrote:
> >>
> >>>
> >>>
> >>> On 15.12.20 11:57, Halil Pasic wrote:
> >>>> On Mon, 14 Dec 2020 11:56:17 -0500
> >>>> Tony Krowiak  wrote:
> >>>>
> >>>>> The vfio_ap device driver registers a group notifier with VFIO when the
> >>>>> file descriptor for a VFIO mediated device for a KVM guest is opened to
> >>>>> receive notification that the KVM pointer is set (VFIO_GROUP_NOTIFY_SET_KVM
> >>>>> event). When the KVM pointer is set, the vfio_ap driver takes the
> >>>>> following actions:
> >>>>> 1. Stashes the KVM pointer in the vfio_ap_mdev struct that holds the state
> >>>>>    of the mediated device.
> >>>>> 2. Calls the kvm_get_kvm() function to increment its reference counter.
> >>>>> 3. Sets the function pointer to the function that handles interception of
> >>>>>    the instruction that enables/disables interrupt processing.
> >>>>> 4. Sets the masks in the KVM guest's CRYCB to pass AP resources through to
> >>>>>    the guest.
> >>>>>
> >>>>> In order to avoid memory leaks, when the notifier is called to receive
> >>>>> notification that the KVM pointer has been set to NULL, the vfio_ap device
> >>>>> driver should reverse the actions taken when the KVM pointer was set.
> >>>>>
> >>>>> Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
> >>>>> Signed-off-by: Tony Krowiak 
> >>>>> ---
> >>>>>  drivers/s390/crypto/vfio_ap_ops.c | 29 -
> >>>>>  1 file changed, 20 insertions(+), 9 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> >>>>> index e0bde8518745..cd22e85588e1 100644
> >>>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
> >>>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> >>>>> @@ -1037,8 +1037,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
> >>>>>  {
> >>>>>  	struct ap_matrix_mdev *m;
> >>>>>  
> >>>>> -	mutex_lock(&matrix_dev->lock);
> >>>>> -
> >>>>>  	list_for_each_entry(m, &matrix_dev->mdev_list, node) {
> >>>>>  		if ((m != matrix_mdev) && (m->kvm == kvm)) {
> >>>>>  			mutex_unlock(&matrix_dev->lock);
> >>>>> @@ -1049,7 +1047,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
> >>>>>  	matrix_mdev->kvm = kvm;
> >>>>>  	kvm_get_kvm(kvm);
> >>>>>  	kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
> >>>>> -	mutex_unlock(&matrix_dev->lock);
> >>>>>  
> >>>>>  	return 0;
> >>>>>  }
> >>>>> @@ -1083,35 +1080,49 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
> >>>>>  	return NOTIFY_DONE;
> >>>>>  }
> >>>>>  
> >>>>> +static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
> >>>>> +{
> >>>>> +	kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> >>>>> +	matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> >>>>
> >>>>
> >>>> This patch LGTM. The only concern I have with it is whether a
> >>>> different cpu is guaranteed to observe the above assignment as
> >>>> an atomic operation. I think we didn't finish this discussion
> >>>> at v1, or did we?
> >>>
> >>> You mean just this assignment:
> >>>>> +	matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> >>> should either have the old or the new value, but not half zero, half o

Re: [PATCH v3] s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated

2020-12-15 Thread Halil Pasic
On Tue, 15 Dec 2020 19:10:20 +0100
Christian Borntraeger  wrote:

> 
> 
> On 15.12.20 11:57, Halil Pasic wrote:
> > On Mon, 14 Dec 2020 11:56:17 -0500
> > Tony Krowiak  wrote:
> > 
> >> The vfio_ap device driver registers a group notifier with VFIO when the
> >> file descriptor for a VFIO mediated device for a KVM guest is opened to
> >> receive notification that the KVM pointer is set (VFIO_GROUP_NOTIFY_SET_KVM
> >> event). When the KVM pointer is set, the vfio_ap driver takes the
> >> following actions:
> >> 1. Stashes the KVM pointer in the vfio_ap_mdev struct that holds the state
> >>    of the mediated device.
> >> 2. Calls the kvm_get_kvm() function to increment its reference counter.
> >> 3. Sets the function pointer to the function that handles interception of
> >>    the instruction that enables/disables interrupt processing.
> >> 4. Sets the masks in the KVM guest's CRYCB to pass AP resources through to
> >>    the guest.
> >>
> >> In order to avoid memory leaks, when the notifier is called to receive
> >> notification that the KVM pointer has been set to NULL, the vfio_ap device
> >> driver should reverse the actions taken when the KVM pointer was set.
> >>
> >> Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
> >> Signed-off-by: Tony Krowiak 
> >> ---
> >>  drivers/s390/crypto/vfio_ap_ops.c | 29 -
> >>  1 file changed, 20 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> >> index e0bde8518745..cd22e85588e1 100644
> >> --- a/drivers/s390/crypto/vfio_ap_ops.c
> >> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> >> @@ -1037,8 +1037,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
> >>  {
> >>  	struct ap_matrix_mdev *m;
> >>  
> >> -	mutex_lock(&matrix_dev->lock);
> >> -
> >>  	list_for_each_entry(m, &matrix_dev->mdev_list, node) {
> >>  		if ((m != matrix_mdev) && (m->kvm == kvm)) {
> >>  			mutex_unlock(&matrix_dev->lock);
> >> @@ -1049,7 +1047,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
> >>  	matrix_mdev->kvm = kvm;
> >>  	kvm_get_kvm(kvm);
> >>  	kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
> >> -	mutex_unlock(&matrix_dev->lock);
> >>  
> >>  	return 0;
> >>  }
> >> @@ -1083,35 +1080,49 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
> >>  	return NOTIFY_DONE;
> >>  }
> >>  
> >> +static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
> >> +{
> >> +	kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> >> +	matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> > 
> > 
> > This patch LGTM. The only concern I have with it is whether a
> > different cpu is guaranteed to observe the above assignment as
> > an atomic operation. I think we didn't finish this discussion
> > at v1, or did we?
> 
> You mean just this assignment:
> >> +	matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> should either have the old or the new value, but not half zero, half old?
>

Yes, that is the assignment I was referring to. The old value will work as
well, because kvm holds a reference to this module while in the pqap_hook.
 
> Normally this should be ok (and I would consider it a compiler bug if
> this were split into two 32-bit zero stores). But if you really want to be
> sure, then we can use WRITE_ONCE.

Just my curiosity: what would make this a bug? Is it the s390 ELF ABI,
some gcc feature, or even the C standard? Also, how exactly would
WRITE_ONCE, i.e. access via volatile, help in this particular situation?

I agree that if the member is properly aligned (which it is), we are
normally/probably fine on s390x (which is also a given).

> I think we take this via the s390 tree? I can add the WRITE_ONCE
> when applying?

Yes, that works fine with me.

Reviewed-by: Halil Pasic 


Re: [PATCH v3] s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated

2020-12-15 Thread Halil Pasic
On Mon, 14 Dec 2020 11:56:17 -0500
Tony Krowiak  wrote:

> The vfio_ap device driver registers a group notifier with VFIO when the
> file descriptor for a VFIO mediated device for a KVM guest is opened to
> receive notification that the KVM pointer is set (VFIO_GROUP_NOTIFY_SET_KVM
> event). When the KVM pointer is set, the vfio_ap driver takes the
> following actions:
> 1. Stashes the KVM pointer in the vfio_ap_mdev struct that holds the state
>    of the mediated device.
> 2. Calls the kvm_get_kvm() function to increment its reference counter.
> 3. Sets the function pointer to the function that handles interception of
>    the instruction that enables/disables interrupt processing.
> 4. Sets the masks in the KVM guest's CRYCB to pass AP resources through to
>    the guest.
> 
> In order to avoid memory leaks, when the notifier is called to receive
> notification that the KVM pointer has been set to NULL, the vfio_ap device
> driver should reverse the actions taken when the KVM pointer was set.
> 
> Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
> Signed-off-by: Tony Krowiak 
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 29 -
>  1 file changed, 20 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index e0bde8518745..cd22e85588e1 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1037,8 +1037,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>  {
>  	struct ap_matrix_mdev *m;
>  
> -	mutex_lock(&matrix_dev->lock);
> -
>  	list_for_each_entry(m, &matrix_dev->mdev_list, node) {
>  		if ((m != matrix_mdev) && (m->kvm == kvm)) {
>  			mutex_unlock(&matrix_dev->lock);
> @@ -1049,7 +1047,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>  	matrix_mdev->kvm = kvm;
>  	kvm_get_kvm(kvm);
>  	kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
> -	mutex_unlock(&matrix_dev->lock);
>  
>  	return 0;
>  }
> @@ -1083,35 +1080,49 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
>  	return NOTIFY_DONE;
>  }
>  
> +static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
> +{
> +	kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> +	matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;


This patch LGTM. The only concern I have with it is whether a
different cpu is guaranteed to observe the above assignment as
an atomic operation. I think we didn't finish this discussion
at v1, or did we?

Regards,
Halil

> +	vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
> +	kvm_put_kvm(matrix_mdev->kvm);
> +	matrix_mdev->kvm = NULL;
> +}
> +
>  static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>  				       unsigned long action, void *data)
>  {
> -	int ret;
> +	int ret, notify_rc = NOTIFY_DONE;
>  	struct ap_matrix_mdev *matrix_mdev;
>  
>  	if (action != VFIO_GROUP_NOTIFY_SET_KVM)
>  		return NOTIFY_OK;
>  
>  	matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
> +	mutex_lock(&matrix_dev->lock);
>  
>  	if (!data) {
> -		matrix_mdev->kvm = NULL;
> -		return NOTIFY_OK;
> +		if (matrix_mdev->kvm)
> +			vfio_ap_mdev_unset_kvm(matrix_mdev);
> +		notify_rc = NOTIFY_OK;
> +		goto notify_done;
>  	}
>  
>  	ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
>  	if (ret)
> -		return NOTIFY_DONE;
> +		goto notify_done;
>  
>  	/* If there is no CRYCB pointer, then we can't copy the masks */
>  	if (!matrix_mdev->kvm->arch.crypto.crycbd)
> -		return NOTIFY_DONE;
> +		goto notify_done;
>  
>  	kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
>  				  matrix_mdev->matrix.aqm,
>  				  matrix_mdev->matrix.adm);
>  
> -	return NOTIFY_OK;
> +notify_done:
> +	mutex_unlock(&matrix_dev->lock);
> +	return notify_rc;
>  }
> 
>  static void vfio_ap_irq_disable_apqn(int apqn)
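The notifier above takes matrix_dev->lock once on entry and releases it
only at the notify_done label. For that to stay balanced, every helper
called with the lock held must leave the lock alone, including on its
error paths. A user-space sketch of that single-exit structure follows;
toy_lock()/toy_unlock() are a hypothetical stand-in for matrix_dev->lock
that assert on unbalanced use, all other names are hypothetical too, and
refusing a guest whose hook is already set only approximates the
mdev_list check in vfio_ap_mdev_set_kvm().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TOY_NOTIFY_DONE = 0, TOY_NOTIFY_OK = 1 };

struct toy_kvm {
	int refcount;
	void *pqap_hook;	/* non-NULL: some mdev already owns this guest */
};

struct toy_mdev {
	struct toy_kvm *kvm;
	int hook;		/* stands in for matrix_mdev->pqap_hook */
};

/* Stand-in for matrix_dev->lock that catches unbalanced use. */
static bool toy_lock_held;

static void toy_lock(void)
{
	assert(!toy_lock_held);
	toy_lock_held = true;
}

static void toy_unlock(void)
{
	assert(toy_lock_held);	/* a double unlock trips here */
	toy_lock_held = false;
}

/* Called with the lock held; never unlocks, even on failure. */
static int toy_set_kvm(struct toy_mdev *m, struct toy_kvm *kvm)
{
	if (kvm->pqap_hook)
		return -1;
	m->kvm = kvm;
	kvm->refcount++;
	kvm->pqap_hook = &m->hook;
	return 0;
}

/* Called with the lock held; undoes toy_set_kvm(). */
static void toy_unset_kvm(struct toy_mdev *m)
{
	m->kvm->pqap_hook = NULL;
	m->kvm->refcount--;
	m->kvm = NULL;
}

static int toy_group_notifier(struct toy_mdev *m, struct toy_kvm *data)
{
	int rc = TOY_NOTIFY_DONE;

	toy_lock();
	if (!data) {
		if (m->kvm)
			toy_unset_kvm(m);
		rc = TOY_NOTIFY_OK;
		goto out;
	}
	if (toy_set_kvm(m, data))
		goto out;
	rc = TOY_NOTIFY_OK;
out:
	toy_unlock();	/* the only unlock, on every path */
	return rc;
}
```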



Re: [PATCH] s390/vfio-ap: Clean up vfio_ap resources when KVM pointer invalidated

2020-12-13 Thread Halil Pasic
On Fri, 11 Dec 2020 16:08:53 -0500
Tony Krowiak  wrote:

> >>> +static void vfio_ap_mdev_put_kvm(struct ap_matrix_mdev *matrix_mdev)
> >>> +{
> >>> +	if (matrix_mdev->kvm) {
> >>> +		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> >>> +		matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> >>> +		vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
> >> This reset probably does not belong here since there is no
> >> reason to reset the queues in the group notifier (see below).  
> > What about kvm_s390_gisc_unregister()? That needs a valid kvm
> > pointer, right? Or is it OK to not pair a kvm_s390_gisc_register()
> > with a kvm_s390_gisc_unregister()?
> 
> I probably should have been more specific about what I meant.
> I was thinking that the reset should not be dependent upon
> whether there is a KVM pointer or not since this function is
> also called from the release callback. On the other hand,
> the vfio_ap_mdev_reset_queues function calls the
> vfio_ap_irq_disable (AQIC) function after each queue is reset.
> The vfio_ap_irq_disable function also cleans up the AQIC
> resources, which requires that the KVM pointer is valid, so if
> the vfio_ap_reset_queues function is not called with a
> valid KVM pointer, that could result in an exception.
> 
> The thing is, it is unnecessary to disable interrupts after
> resetting a queue because the reset disables interrupts,
> so I think I should include a patch for this fix that does the
> following:
> 
> 1. Removes the disabling of interrupts subsequent to resetting
>    a queue.
> 2. Includes the cleanup of AQIC resources when a queue is
>    reset if a KVM pointer is present.

Sounds like a plan. I see that in your v2, vfio_ap_mdev_unset_kvm()
does call vfio_ap_mdev_reset_queues() even when called from the
group notifier. I also like that the cleanup of AQIC resources is
part of queue_reset. In fact, I asked a while ago (Message-ID:
<20201027074846.30ee0ddc.pa...@linux.ibm.com> in October) to make
vfio_ap_mdev_reset_queue() call vfio_ap_free_aqic_resources(q).
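The plan discussed here (no separate interrupt disable after a reset, and
AQIC cleanup folded into the queue reset when a KVM pointer is present)
could look roughly like the model below. It is a hypothetical sketch: the
struct fields and toy_* functions are illustrative stand-ins rather than
vfio_ap_mdev_reset_queue() or vfio_ap_free_aqic_resources(), and
unpinning and GISC unregistration are reduced to flags and a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_kvm {
	int gisc_registrations;
};

struct toy_queue {
	bool irq_enabled;	/* interrupts enabled via AQIC */
	bool nib_pinned;	/* notification indicator byte page pinned */
	bool gisc_registered;	/* guest ISC registered with KVM */
};

/* Hypothetical AQIC cleanup; requires a valid kvm pointer. */
static void toy_free_aqic_resources(struct toy_queue *q, struct toy_kvm *kvm)
{
	if (q->gisc_registered) {
		kvm->gisc_registrations--;
		q->gisc_registered = false;
	}
	q->nib_pinned = false;	/* stands in for unpinning the page */
}

static void toy_reset_queue(struct toy_queue *q, struct toy_kvm *kvm)
{
	/* The reset itself disables interrupts, so no separate AQIC
	 * "disable" request is issued afterwards. */
	q->irq_enabled = false;

	/* Clean up AQIC resources only if there is a KVM pointer. */
	if (kvm)
		toy_free_aqic_resources(q, kvm);
}
```

Note that in this model a reset without a KVM pointer leaves the pin and
the GISC registration behind, which is exactly the pairing question
raised above for kvm_s390_gisc_unregister().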

Regards,
Halil 



Re: [PATCH] s390/vfio-ap: Clean up vfio_ap resources when KVM pointer invalidated

2020-12-13 Thread Halil Pasic
On Fri, 11 Dec 2020 15:52:55 -0500
Tony Krowiak  wrote:

> 
> 
> On 12/7/20 7:01 PM, Halil Pasic wrote:
> > On Mon, 7 Dec 2020 13:50:36 -0500
> > Tony Krowiak  wrote:
> >
> >> On 12/4/20 2:05 PM, Halil Pasic wrote:
> >>> On Fri, 4 Dec 2020 09:43:59 -0500
> >>> Tony Krowiak  wrote:
> >>>   
> >>>>>> +{
> >>>>>> +	if (matrix_mdev->kvm) {
> >>>>>> +		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> >>>>>> +		matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> >>>>> Is a plain assignment to arch.crypto.pqap_hook appropriate, or do we need
> >>>>> to take more care?
> >>>>>
> >>>>> For instance kvm_arch_crypto_set_masks() takes kvm->lock before poking
> >>>>> kvm->arch.crypto.crycb.
> >>>> I do not think so. The CRYCB is used by KVM to provide crypto resources
> >>>> to the guest, so it makes sense to protect it from changes while passing
> >>>> the AP devices through to the guest. The hook is used only when an AQIC
> >>>> executed on the guest is intercepted by KVM. If the notifier is being
> >>>> invoked to notify vfio_ap that KVM has been set to NULL, this means the
> >>>> guest is gone, in which case there will be no AP instructions to
> >>>> intercept.
> >>> If the update to pqap_hook isn't observed as atomic we still have a
> >>> problem. With torn writes or reads we would try to use a corrupt function
> >>> pointer. While the compiler probably ain't likely to generate silly code
> >>> for the above assignment (multiple write instructions less than
> >>> quadword wide), I know of nothing that would prohibit the compiler from
> >>> doing so.
> >> I'm sorry, but I still don't understand why you think this is a
> >> problem given what I stated above.
> > I assume you are specifically referring to 'the guest is gone in which
> > case there will be no AP instructions to intercept'.  I assume by 'guest
> > is gone' you mean that the VM is being destroyed, and the vcpus are out
> > of SIE. You are probably right for the invocation of
> > kvm_vfio_group_set_kvm() in kvm_vfio_destroy(), but is that true for
> > the invocation in the KVM_DEV_VFIO_GROUP_DEL case in
> > kvm_vfio_set_group()? I.e. can't we get the notifier called when the
> > qemu device is hot unplugged (modulo remove, which unregisters the
> > notifier and usually precludes the notifier being called with NULL at
> > all)?
> 
> I am assuming from your question that the qemu device you are
> talking about is the '-device vfio-ap' specified on the qemu command
> line or attached via qemu device_add.

Yes.

> When an mdev is hot unplugged, the vfio_ap driver's release callback
> gets invoked when the mdev fd is closed. The release callback
> unregisters the notifier, so it does not get called when the guest
> subsequently shuts down.
> 

That is what I meant by 'modulo remove which unregisters the notifier
and usually precludes the notifier being with NULL called at all', but
unfortunately I mixed up remove and release.

AFAIU release should be called before the notifier gets invoked
regardless of whether we have a hot-unplug of '-device vfio-ap' or
a shutdown. The whole effort is about what happens if userspace does
not adhere to this. If I apply the logic of your last response to the
whole situation, then there is nothing to do (AFAIU).

The point I'm trying to make is that, in the case of a hot-unplug, the
guest may outlive both the call to the notifier and the vfio_mdev device
it was associated with at some point. So your argument that 'the guest is
gone in which case there will be no AP instructions to intercept' does
not hold.

Regards,
Halil



Re: [PATCH] s390/vfio-ap: Clean up vfio_ap resources when KVM pointer invalidated

2020-12-07 Thread Halil Pasic
On Mon, 7 Dec 2020 14:05:55 -0500
Tony Krowiak  wrote:

> 
> 
> On 12/2/20 6:41 PM, Tony Krowiak wrote:
> > The vfio_ap device driver registers a group notifier with VFIO when the
> > file descriptor for a VFIO mediated device for a KVM guest is opened to
> > receive notification that the KVM pointer is set (VFIO_GROUP_NOTIFY_SET_KVM
> > event). When the KVM pointer is set, the vfio_ap driver stashes the pointer
> > and calls the kvm_get_kvm() function to increment its reference counter.
> > When the notifier is called to make notification that the KVM pointer has
> > been set to NULL, the driver should clean up any resources associated with
> > the KVM pointer and decrement its reference counter. The current
> > implementation does not take care of this clean up.
> >
> > Signed-off-by: Tony Krowiak 
> > ---
> >   drivers/s390/crypto/vfio_ap_ops.c | 21 +
> >   1 file changed, 13 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> > index e0bde8518745..eeb9c9130756 100644
> > --- a/drivers/s390/crypto/vfio_ap_ops.c
> > +++ b/drivers/s390/crypto/vfio_ap_ops.c
> > @@ -1083,6 +1083,17 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
> >  	return NOTIFY_DONE;
> >  }
> >  
> > +static void vfio_ap_mdev_put_kvm(struct ap_matrix_mdev *matrix_mdev)
> > +{
> > +	if (matrix_mdev->kvm) {
> > +		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> > +		matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> > +		vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
> 
> This reset probably does not belong here since there is no
> reason to reset the queues in the group notifier (see below).

What about kvm_s390_gisc_unregister()? That needs a valid kvm
pointer, right? Or is it OK to not pair a kvm_s390_gisc_register()
with a kvm_s390_gisc_unregister()?

Regards,
Halil

> The reset should be done in the release callback only regardless
> of whether the KVM pointer exists or not.
> 
> > +		kvm_put_kvm(matrix_mdev->kvm);
> > +		matrix_mdev->kvm = NULL;
> > +	}
> > +}
> > +
> >  static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
> >  				       unsigned long action, void *data)
> >  {
> > @@ -1095,7 +1106,7 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
> >  	matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
> >  
> >  	if (!data) {
> > -		matrix_mdev->kvm = NULL;
> > +		vfio_ap_mdev_put_kvm(matrix_mdev);
> >  		return NOTIFY_OK;
> >  	}
> >  
> > @@ -1222,13 +1233,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
> >  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> >  
> >  	mutex_lock(&matrix_dev->lock);
> > -	if (matrix_mdev->kvm) {
> > -		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> > -		matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> > -		vfio_ap_mdev_reset_queues(mdev);
> 
> This reset should be moved outside of the block and
> performed regardless of whether the KVM pointer exists or
> not.
> 
> > -		kvm_put_kvm(matrix_mdev->kvm);
> > -		matrix_mdev->kvm = NULL;
> > -	}
> > +	vfio_ap_mdev_put_kvm(matrix_mdev);
> >  	mutex_unlock(&matrix_dev->lock);
> >   
> > vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> 



Re: [PATCH] s390/vfio-ap: Clean up vfio_ap resources when KVM pointer invalidated

2020-12-07 Thread Halil Pasic
On Mon, 7 Dec 2020 13:50:36 -0500
Tony Krowiak  wrote:

> On 12/4/20 2:05 PM, Halil Pasic wrote:
> > On Fri, 4 Dec 2020 09:43:59 -0500
> > Tony Krowiak  wrote:
> >  
> >>>> +{
> >>>> +	if (matrix_mdev->kvm) {
> >>>> +		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> >>>> +		matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> >>> Is a plain assignment to arch.crypto.pqap_hook appropriate, or do we need
> >>> to take more care?
> >>>
> >>> For instance kvm_arch_crypto_set_masks() takes kvm->lock before poking
> >>> kvm->arch.crypto.crycb.  
> >> I do not think so. The CRYCB is used by KVM to provide crypto resources
> >> to the guest, so it makes sense to protect it from changes while passing
> >> the AP devices through to the guest. The hook is used only when an AQIC
> >> executed on the guest is intercepted by KVM. If the notifier is being
> >> invoked to notify vfio_ap that KVM has been set to NULL, this means the
> >> guest is gone, in which case there will be no AP instructions to
> >> intercept.
> > If the update to pqap_hook isn't observed as atomic we still have a
> > problem. With torn writes or reads we would try to use a corrupt function
> > pointer. While the compiler probably ain't likely to generate silly code
> > for the above assignment (multiple write instructions less than
> > quadword wide), I know of nothing that would prohibit the compiler from
> > doing so.
> 
> I'm sorry, but I still don't understand why you think this is a
> problem given what I stated above.

I assume you are specifically referring to 'the guest is gone in which
case there will be no AP instructions to intercept'.  I assume by 'guest
is gone' you mean that the VM is being destroyed, and the vcpus are out
of SIE. You are probably right for the invocation of
kvm_vfio_group_set_kvm() in kvm_vfio_destroy(), but is that true for
the invocation in the KVM_DEV_VFIO_GROUP_DEL case in
kvm_vfio_set_group()? I.e. can't we get the notifier called when the
qemu device is hot unplugged (modulo remove, which unregisters the
notifier and usually precludes the notifier being called with NULL at
all)?

Regards,
Halil


Re: [PATCH] s390/vfio-ap: Clean up vfio_ap resources when KVM pointer invalidated

2020-12-07 Thread Halil Pasic
On Fri, 4 Dec 2020 11:48:24 -0500
Tony Krowiak  wrote:

> On 12/3/20 12:55 PM, Halil Pasic wrote:
> > On Wed,  2 Dec 2020 18:41:01 -0500
> > Tony Krowiak  wrote:
> >  
> >> The vfio_ap device driver registers a group notifier with VFIO when the
> >> file descriptor for a VFIO mediated device for a KVM guest is opened to
> >> receive notification that the KVM pointer is set (VFIO_GROUP_NOTIFY_SET_KVM
> >> event). When the KVM pointer is set, the vfio_ap driver stashes the pointer
> >> and calls the kvm_get_kvm() function to increment its reference counter.
> >> When the notifier is called to make notification that the KVM pointer has
> >> been set to NULL, the driver should clean up any resources associated with
> >> the KVM pointer and decrement its reference counter. The current
> >> implementation does not take care of this clean up.
> >>
> >> Signed-off-by: Tony Krowiak   
> > Do we need a Fixes tag? Do we need this backported? In my opinion
> > this is necessary since the interrupt patches.  
> 
> I'll put in a fixes tag:
> Fixes: 258287c994de (s390: vfio-ap: implement mediated device open callback)
> 
> Yes, this should probably be backported.

I changed my mind regarding the severity of this issue. I was paranoid
about post-mortem interrupts, and the resulting notifier byte updates by
the machine. What I overlooked is that the pin is going to prevent the
memory from getting repurposed. I.e. if we have something like vmalloc(),
vfio_pin(notifier_page), vfree(), I believe the notifier_page is not free
(available for allocation). So the worst-case scenario is IMHO a resource
leak and not corruption. So I'm not sure this must be backported.
Opinions?
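A toy model of the reasoning above, assuming (as the argument does) that
a pin holds its own reference on the page: the free operation drops only
the allocator's reference, so the page cannot be handed out again until
the pin is released. The worst case is then a leak rather than reuse of
memory the machine may still write to. struct toy_page and the toy_*
helpers are hypothetical and much simpler than the real page refcounting.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page: it returns to the free list only when its count hits 0. */
struct toy_page {
	int refcount;
	bool on_free_list;
};

static void toy_alloc(struct toy_page *p)	/* allocation: 1 reference */
{
	p->refcount = 1;
	p->on_free_list = false;
}

static void toy_get(struct toy_page *p)		/* pinning: +1 reference */
{
	p->refcount++;
}

static void toy_put(struct toy_page *p)		/* free or unpin: -1 */
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		p->on_free_list = true;	/* only now can it be reused */
}
```

In the vmalloc()/pin/vfree() sequence, the page ends up with one
reference left (held by the pin) and stays off the free list until the
unpin, which is never issued in the scenario being discussed.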

Regards,
Halil




Re: [PATCH] s390/vfio-ap: Clean up vfio_ap resources when KVM pointer invalidated

2020-12-04 Thread Halil Pasic
On Fri, 4 Dec 2020 14:46:30 -0500
Tony Krowiak  wrote:

> On 12/4/20 2:05 PM, Halil Pasic wrote:
> > On Fri, 4 Dec 2020 09:43:59 -0500
> > Tony Krowiak  wrote:
> >  
> >>>> +{
> >>>> +	if (matrix_mdev->kvm) {
> >>>> +		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> >>>> +		matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> >>> Is a plain assignment to arch.crypto.pqap_hook appropriate, or do we need
> >>> to take more care?
> >>>
> >>> For instance kvm_arch_crypto_set_masks() takes kvm->lock before poking
> >>> kvm->arch.crypto.crycb.  
> >> I do not think so. The CRYCB is used by KVM to provide crypto resources
> >> to the guest, so it makes sense to protect it from changes while passing
> >> the AP devices through to the guest. The hook is used only when an AQIC
> >> executed on the guest is intercepted by KVM. If the notifier is being
> >> invoked to notify vfio_ap that KVM has been set to NULL, this means the
> >> guest is gone, in which case there will be no AP instructions to
> >> intercept.
> > If the update to pqap_hook isn't observed as atomic we still have a
> > problem. With torn writes or reads we would try to use a corrupt function
> > pointer. While the compiler probably ain't likely to generate silly code
> > for the above assignment (multiple write instructions less than
> > quadword wide), I know of nothing that would prohibit the compiler from
> > doing so.
> 
> I see that in the handle_pqap() function in arch/s390/kvm/priv.c
> that gets called when the AQIC instruction is intercepted,
> the pqap_hook is protected by locking the owner of the hook:
> 
> 	if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
> 		return -EOPNOTSUPP;
> 	ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
> 	module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
> 
> Maybe that is what we should do when the kvm->arch.crypto.pqap_hook
> is set to NULL?

To the best of my knowledge, that isn't locking but mere refcounting. The
purpose of that is probably to prevent the owner module, and the code
pointed to by the 'hook' function pointer, from being unloaded while we
are executing that very same code.

Why that is necessary, frankly, I have no idea. We do tend to invalidate
the callback before doing our module_put in vfio_ap_mdev_release(). Maybe
the case you are handling right now is the reason (because the
callback is invalidated in vfio_ap_mdev_release() only if !!kvm).
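To illustrate the difference between refcounting and locking, the sketch
below models try_module_get()/module_put() as a counter that can be
"killed" only while it is zero. That keeps the owner from going away
while the hook runs, but it does nothing to serialize a concurrent update
of the hook pointer itself. This is a hypothetical user-space model using
C11 atomics; toy_module, toy_try_get() and the other names are not kernel
APIs, and the kernel's module refcounting is implemented differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* refs >= 0: module live with that many users; -1: unloaded. */
struct toy_module {
	atomic_int refs;
};

struct toy_hook {
	struct toy_module *owner;
	int (*hook)(int);
};

static bool toy_try_get(struct toy_module *m)
{
	int old = atomic_load(&m->refs);

	do {
		if (old < 0)
			return false;	/* owner already gone */
	} while (!atomic_compare_exchange_weak(&m->refs, &old, old + 1));
	return true;
}

static void toy_put(struct toy_module *m)
{
	atomic_fetch_sub(&m->refs, 1);
}

/* Unloading succeeds only while nobody holds a reference. */
static bool toy_try_unload(struct toy_module *m)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&m->refs, &expected, -1);
}

/* Same shape as the quoted handle_pqap() code: pin the owner, call the
 * hook, unpin. No lock is taken at any point. */
static int toy_call_hook(struct toy_hook *h, int arg, int *ret)
{
	if (!toy_try_get(h->owner))
		return -1;	/* the quoted code returns -EOPNOTSUPP */
	*ret = h->hook(arg);
	toy_put(h->owner);
	return 0;
}

static int toy_double(int x)
{
	return 2 * x;
}
```

While a caller is inside the hook, unloading the owner fails; once the
owner is gone, further calls are refused. Whether h itself is still valid
is a separate question that this counter does not answer.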

Regards,
Halil



Re: [PATCH] s390/vfio-ap: Clean up vfio_ap resources when KVM pointer invalidated

2020-12-04 Thread Halil Pasic
On Fri, 4 Dec 2020 09:43:59 -0500
Tony Krowiak  wrote:

> >> +{
> >> +	if (matrix_mdev->kvm) {
> >> +		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> >> +		matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> > Is a plain assignment to arch.crypto.pqap_hook appropriate, or do we need
> > to take more care?
> >
> > For instance kvm_arch_crypto_set_masks() takes kvm->lock before poking
> > kvm->arch.crypto.crycb.  
> 
> I do not think so. The CRYCB is used by KVM to provide crypto resources
> to the guest, so it makes sense to protect it from changes while passing
> the AP devices through to the guest. The hook is used only when an AQIC
> executed on the guest is intercepted by KVM. If the notifier is being
> invoked to notify vfio_ap that KVM has been set to NULL, this means the
> guest is gone, in which case there will be no AP instructions to
> intercept.

If the update to pqap_hook isn't observed as atomic we still have a
problem. With torn writes or reads we would try to use a corrupt function
pointer. While the compiler probably ain't likely to generate silly code
for the above assignment (multiple write instructions less than
quadword wide), I know of nothing that would prohibit the compiler from
doing so.
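What a torn store would look like can be shown with a deterministic model
in which a 64-bit pointer value is written as two 32-bit halves. Between
the two halves, an observer sees a value that is neither the old nor the
new pointer, i.e. the "half zero, half old" case. This is purely
illustrative; struct split64 and its helpers are hypothetical, and
nothing here claims that any compiler actually emits such code for the
pqap_hook assignment.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value kept as two 32-bit halves, modelling a store that a
 * compiler split into two narrower instructions. Illustrative only. */
struct split64 {
	uint32_t hi;
	uint32_t lo;
};

static uint64_t split64_read(const struct split64 *s)
{
	return ((uint64_t)s->hi << 32) | s->lo;
}

/* First half of a torn store: only the high word is updated. */
static void split64_store_hi(struct split64 *s, uint64_t v)
{
	s->hi = (uint32_t)(v >> 32);
}

/* Second half of the torn store. */
static void split64_store_lo(struct split64 *s, uint64_t v)
{
	s->lo = (uint32_t)v;
}
```

Storing NULL over 0x000003ff12345678 this way briefly exposes
0x0000000012345678, which a concurrent reader would then call through.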

I'm not certain about the scope of the kvm->lock (whether it's supposed to
protect the whole sub-tree of objects). Maybe Janosch can help us out.
@Janosch: what do you think?

Regards,
Halil


Re: [PATCH] s390/vfio-ap: Clean up vfio_ap resources when KVM pointer invalidated

2020-12-03 Thread Halil Pasic
On Wed,  2 Dec 2020 18:41:01 -0500
Tony Krowiak  wrote:

> The vfio_ap device driver registers a group notifier with VFIO when the
> file descriptor for a VFIO mediated device for a KVM guest is opened to
> receive notification that the KVM pointer is set (VFIO_GROUP_NOTIFY_SET_KVM
> event). When the KVM pointer is set, the vfio_ap driver stashes the pointer
> and calls the kvm_get_kvm() function to increment its reference counter.
> When the notifier is called to make notification that the KVM pointer has
> been set to NULL, the driver should clean up any resources associated with
> the KVM pointer and decrement its reference counter. The current
> implementation does not take care of this clean up.
> 
> Signed-off-by: Tony Krowiak 

Do we need a Fixes tag? Do we need this backported? In my opinion
this is necessary since the interrupt patches.

> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 21 +
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index e0bde8518745..eeb9c9130756 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1083,6 +1083,17 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
>  	return NOTIFY_DONE;
>  }
>  
> +static void vfio_ap_mdev_put_kvm(struct ap_matrix_mdev *matrix_mdev)

I don't like the name. The function does more than put_kvm. Maybe
something like _disconnect_kvm()?

> +{
> +	if (matrix_mdev->kvm) {
> +		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> +		matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;

Is a plain assignment to arch.crypto.pqap_hook appropriate, or do we need
to take more care?

For instance kvm_arch_crypto_set_masks() takes kvm->lock before poking
kvm->arch.crypto.crycb.

> +		vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
> +		kvm_put_kvm(matrix_mdev->kvm);
> +		matrix_mdev->kvm = NULL;
> +	}
> +}
> +
>  static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>  				       unsigned long action, void *data)
>  {
> @@ -1095,7 +1106,7 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>  	matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
>  
>  	if (!data) {
> -		matrix_mdev->kvm = NULL;
> +		vfio_ap_mdev_put_kvm(matrix_mdev);

The lock question was already raised.

What are the exact circumstances under which this branch can be taken?

>   return NOTIFY_OK;
>   }
>  
> @@ -1222,13 +1233,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
>   struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  
>   mutex_lock(&matrix_dev->lock);
> - if (matrix_mdev->kvm) {
> - kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> - matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> - vfio_ap_mdev_reset_queues(mdev);
> - kvm_put_kvm(matrix_mdev->kvm);
> - matrix_mdev->kvm = NULL;
> - }
> + vfio_ap_mdev_put_kvm(matrix_mdev);
>   mutex_unlock(&matrix_dev->lock);
>  
>   vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,

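The lifetime rule described in the commit message (take a reference when the KVM pointer is set, undo everything when it is set to NULL or the mdev is released) can be modelled in userspace. This is a minimal sketch under stated assumptions: `toy_kvm`, `toy_mdev`, `toy_disconnect_kvm()` and the other names are hypothetical stand-ins (the helper name follows the `_disconnect_kvm()` suggestion above), an integer counter stands in for `kvm_get_kvm()`/`kvm_put_kvm()`, and a pthread mutex stands in for `matrix_dev->lock`. It does not model any interaction with `kvm->lock`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real struct kvm / struct ap_matrix_mdev. */
struct toy_kvm {
	int users;        /* models the kvm reference count */
	void *pqap_hook;  /* models kvm->arch.crypto.pqap_hook */
};

struct toy_mdev {
	struct toy_kvm *kvm;
	int queue_resets; /* counts calls standing in for vfio_ap_mdev_reset_queues() */
};

/* Models matrix_dev->lock. */
static pthread_mutex_t toy_matrix_lock = PTHREAD_MUTEX_INITIALIZER;

/* Undo everything done when the KVM pointer was set.
 * Safe to call twice; the caller holds toy_matrix_lock. */
static void toy_disconnect_kvm(struct toy_mdev *m)
{
	if (!m->kvm)
		return;
	m->kvm->pqap_hook = NULL;  /* the real driver also clears the crypto masks */
	m->queue_resets++;
	m->kvm->users--;           /* models kvm_put_kvm() */
	m->kvm = NULL;
}

/* Models the SET_KVM notifier: data == NULL means the KVM pointer was cleared. */
static void toy_notify_set_kvm(struct toy_mdev *m, struct toy_kvm *data, void *hook)
{
	pthread_mutex_lock(&toy_matrix_lock);
	if (!data) {
		toy_disconnect_kvm(m);
	} else if (!m->kvm) {
		data->users++;           /* models kvm_get_kvm() */
		data->pqap_hook = hook;
		m->kvm = data;
	}
	pthread_mutex_unlock(&toy_matrix_lock);
}

/* Models the release path reusing the same helper under the same lock. */
static void toy_release(struct toy_mdev *m)
{
	pthread_mutex_lock(&toy_matrix_lock);
	toy_disconnect_kvm(m);
	pthread_mutex_unlock(&toy_matrix_lock);
}
```

Because the helper returns early when no KVM pointer is stashed, the notifier and the release path can both call it without dropping the reference twice.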


Re: [PATCH] s390/vfio-ap: Clean up vfio_ap resources when KVM pointer invalidated

2020-12-03 Thread Halil Pasic
On Thu, 3 Dec 2020 11:19:07 +0100
Cornelia Huck  wrote:

> > @@ -1095,7 +1106,7 @@ static int vfio_ap_mdev_group_notifier(struct 
> > notifier_block *nb,
> > matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
> >  
> > if (!data) {
> > -   matrix_mdev->kvm = NULL;
> > +   vfio_ap_mdev_put_kvm(matrix_mdev);  
> 
> Hm. I'm wondering whether you need to hold the matrix_dev lock here as
> well?

In v12 we eventually did come along: patch "s390/vfio-ap: allow hot
plug/unplug of AP resources using mdev device" made this part of a
critical section protected by matrix_dev->lock.

IMHO the cleanup should definitely happen with the matrix_dev->lock held.

Regards,
Halil


Re: [PATCH v12 12/17] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device

2020-12-01 Thread Halil Pasic
On Tue, 1 Dec 2020 17:12:56 -0500
Tony Krowiak  wrote:

> On 12/1/20 12:56 PM, Halil Pasic wrote:
> > On Tue, 1 Dec 2020 00:32:27 +0100
> > Halil Pasic  wrote:
> >  
> >>>
> >>> On 11/28/20 8:52 PM, Halil Pasic wrote:  
> >> [..]  
> >>>>> * Unassign a domain from mdev's matrix:
> >>>>>
> >>>>> The domain will be hot unplugged from the KVM guest if it is
> >>>>> assigned to the guest's matrix.
> >>>>>
> >>>>> * Assign a control domain:
> >>>>>
> >>>>> The control domain will be hot plugged into the KVM guest if it is not
> >>>>> assigned to the guest's APCB. The AP architecture ensures a guest will
> >>>>> only get access to the control domain if it is in the host's AP
> >>>>> configuration, so there is no risk in hot plugging it; however, it will
> >>>>> become automatically available to the guest when it is added to the host
> >>>>> configuration.
> >>>>>
> >>>>> * Unassign a control domain:
> >>>>>
> >>>>> The control domain will be hot unplugged from the KVM guest if it is
> >>>>> assigned to the guest's APCB.
> >>>> This is where things start getting tricky. E.g. do we need to revise
> >>>> filtering after an unassign? (For example an assign_adapter X didn't
> >>>> change the shadow, because queue XY was missing, but now we unplug domain
> >>>> Y. Should the adapter X pop up? I guess it should.)  
> >>> I suppose that makes sense at the expense of making the code
> >>> more complex. It is essentially what we had in the prior version
> >>> which used the same filtering code for assignment as well as
> >>> host AP configuration changes.
> >>>  
> >> Will have to think about it some more. Making the user unplug and
> >> replug an adapter because it got filtered at some point, even though
> >> there is no longer any need to filter it, does not feel right. On the
> >> other hand, I'm afraid I'm complaining in circles.
> > I did some thinking. The following statements are about the state of
> > affairs, when all 17 patches are applied. I'm commenting here, because
> > I believe this is the patch that introduces the most controversial code.
> >
> > First, about low-level problems with the current code/design. The
> > 'other is empty' handling in vfio_ap_assign_apid_to_apcb() (and
> > vfio_ap_assign_apqi_to_apcb()) is troublesome. The final product
> > allows for over-commitment, i.e. assignment of e.g. domains that
> > are not in the crypto host config. Let's assume the host LPAR
> > has usage domains 1 and 2, and adapters 1, 2, and 3. The apmask
> > and aqmask are both 0 (everything is available to vfio_ap), and all
> > queues are bound. We start with an empty mdev that is tied to a
> > running guest:
> > assign_adapter 1
> > assign_adapter 2
> > assign_adapter 3
> > assign_adapter 4
> > All of these will work. The resulting shadow_apcb is completely empty;
> > no commit_apcb.
> > assign_domain 1
> > assign_domain 2
> > assign_domain 3
> > All of these will work, but again the shadow_apcb is completely empty at
> > the end: we did get to the loop that checks whether the queues are
> > bound, but please note that we are checking against matrix.apm, and
> > adapter 4 is not in the config of the host.
> >
> > I've hacked up a fixup patch for these problems that simplifies the
> > code considerably, but there are design-level issues that run deeper,
> > so I'm not sure the fixups are the way to go.
> >
> > Now let's talk about design-level stuff. Currently the assignment
> > operations are designed to accommodate the FCFS (first come, first
> > served) principle. This is a blessing and a curse at the same time.
> >
> > Consider the following scenarios. We have an empty mdev (nothing
> > assigned) and the following queues are bound to the vfio_ap driver:
> > 0.0
> > 0.1
> > 1.0
> > If we do
> > assign_adapter 0
> > assign_domain 0
> > assign_domain 1
> > assign_adapter 1
> > We end up with the guest_matrix
> > 0.0
> > 0.1
> > and the matrix
> > 0.0
> > 0.1
> > 1.0
> > 1.1
> >
> > That is a different result compared to
> > assign_adapter 0
> > assign_domain 0
> > assign_adapt

Re: [PATCH v12 12/17] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device

2020-12-01 Thread Halil Pasic
On Tue, 1 Dec 2020 00:32:27 +0100
Halil Pasic  wrote:

> > 
> > 
> > On 11/28/20 8:52 PM, Halil Pasic wrote:  
> [..]
> > >> * Unassign a domain from mdev's matrix:
> > >>
> > >>The domain will be hot unplugged from the KVM guest if it is
> > >>assigned to the guest's matrix.
> > >>
> > >> * Assign a control domain:
> > >>
> > >>The control domain will be hot plugged into the KVM guest if it is not
> > >>assigned to the guest's APCB. The AP architecture ensures a guest will
> > >>only get access to the control domain if it is in the host's AP
> > >>configuration, so there is no risk in hot plugging it; however, it will
> > >>become automatically available to the guest when it is added to the host
> > >>configuration.
> > >>
> > >> * Unassign a control domain:
> > >>
> > >>The control domain will be hot unplugged from the KVM guest if it is
> > >>assigned to the guest's APCB.  
> > > This is where things start getting tricky. E.g. do we need to revise
> > > filtering after an unassign? (For example an assign_adapter X didn't
> > > change the shadow, because queue XY was missing, but now we unplug domain
> > > Y. Should the adapter X pop up? I guess it should.)  
> > 
> > I suppose that makes sense at the expense of making the code
> > more complex. It is essentially what we had in the prior version
> > which used the same filtering code for assignment as well as
> > host AP configuration changes.
> >   
> 
> Will have to think about it some more. Making the user unplug and
> replug an adapter because it got filtered at some point, even though
> there is no longer any need to filter it, does not feel right. On the
> other hand, I'm afraid I'm complaining in circles.

I did some thinking. The following statements are about the state of
affairs, when all 17 patches are applied. I'm commenting here, because
I believe this is the patch that introduces the most controversial code.

First, about low-level problems with the current code/design. The
'other is empty' handling in vfio_ap_assign_apid_to_apcb() (and
vfio_ap_assign_apqi_to_apcb()) is troublesome. The final product
allows for over-commitment, i.e. assignment of e.g. domains that
are not in the crypto host config. Let's assume the host LPAR
has usage domains 1 and 2, and adapters 1, 2, and 3. The apmask
and aqmask are both 0 (everything is available to vfio_ap), and all
queues are bound. We start with an empty mdev that is tied to a
running guest:
assign_adapter 1
assign_adapter 2
assign_adapter 3
assign_adapter 4
All of these will work. The resulting shadow_apcb is completely empty;
no commit_apcb.
assign_domain 1
assign_domain 2
assign_domain 3
All of these will work, but again the shadow_apcb is completely empty at
the end: we did get to the loop that checks whether the queues are
bound, but please note that we are checking against matrix.apm, and
adapter 4 is not in the config of the host.
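The over-commitment scenario above can be replayed with a small userspace model. This is a sketch only: the masks are 64-bit integers with ordinary bit numbering (the driver uses inverted bit numbering and larger bitmaps), `bound()` hard-codes the example host, and `assign_apqi_to_apcb()` is assumed to mirror the quoted `vfio_ap_assign_apid_to_apcb()`, since the domain variant is not quoted in this thread. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t mask_t;               /* toy mask: bit i = index i */
#define M(i) ((mask_t)1 << (i))

struct apcb { mask_t apm, aqm; };

struct toy_mdev {
	struct apcb matrix;   /* what was assigned via sysfs */
	struct apcb shadow;   /* what the guest gets */
	unsigned commits;     /* how often the shadow would be committed */
};

/* Example host: adapters 1-3, usage domains 1-2, all bound to vfio_ap. */
static bool bound(unsigned apid, unsigned apqi)
{
	return apid >= 1 && apid <= 3 && apqi >= 1 && apqi <= 2;
}

/* Follows the quoted vfio_ap_assign_apid_to_apcb(). */
static bool assign_apid_to_apcb(struct toy_mdev *m, unsigned apid)
{
	mask_t aqm = m->shadow.aqm;

	if (m->shadow.apm & M(apid))
		return false;
	if (!m->shadow.aqm) {            /* the 'other is empty' case */
		if (!m->matrix.aqm)
			return false;
		aqm = m->matrix.aqm;
	}
	for (unsigned apqi = 0; apqi < 64; apqi++)
		if ((aqm & M(apqi)) && !bound(apid, apqi))
			return false;
	m->shadow.apm |= M(apid);
	if (!m->shadow.aqm)
		m->shadow.aqm = m->matrix.aqm;
	return true;
}

/* ASSUMPTION: vfio_ap_assign_apqi_to_apcb() is the mirror image. */
static bool assign_apqi_to_apcb(struct toy_mdev *m, unsigned apqi)
{
	mask_t apm = m->shadow.apm;

	if (m->shadow.aqm & M(apqi))
		return false;
	if (!m->shadow.apm) {
		if (!m->matrix.apm)
			return false;
		apm = m->matrix.apm;
	}
	for (unsigned apid = 0; apid < 64; apid++)
		if ((apm & M(apid)) && !bound(apid, apqi))
			return false;
	m->shadow.aqm |= M(apqi);
	if (!m->shadow.apm)
		m->shadow.apm = m->matrix.apm;
	return true;
}

/* The sysfs write itself always succeeds; only the shadow update may not. */
static void assign_adapter(struct toy_mdev *m, unsigned apid)
{
	m->matrix.apm |= M(apid);
	if (assign_apid_to_apcb(m, apid))
		m->commits++;
}

static void assign_domain(struct toy_mdev *m, unsigned apqi)
{
	m->matrix.aqm |= M(apqi);
	if (assign_apqi_to_apcb(m, apqi))
		m->commits++;
}
```

In this model, leaving out `assign_adapter 4` lets `assign_domain 1` and `assign_domain 2` through, ending with adapters 1-3 and domains 1-2 in the shadow; the one adapter that is not in the host config is what keeps the shadow empty.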

I've hacked up a fixup patch for these problems that simplifies the
code considerably, but there are design-level issues that run deeper,
so I'm not sure the fixups are the way to go.

Now let's talk about design-level stuff. Currently the assignment
operations are designed to accommodate the FCFS (first come, first
served) principle. This is a blessing and a curse at the same time.

Consider the following scenarios. We have an empty mdev (nothing
assigned) and the following queues are bound to the vfio_ap driver:
0.0
0.1
1.0
If we do
assign_adapter 0
assign_domain 0
assign_domain 1
assign_adapter 1
We end up with the guest_matrix
0.0
0.1
and the matrix
0.0
0.1
1.0
1.1

That is a different result compared to
assign_adapter 0
assign_domain 0
assign_adapter 1
assign_domain 1
or the situation where we have 0.0, 0.1, 1.0 and 1.1 bound to vfio_ap
and then 1.1 gets unbound.

For the same system state (bound, config, ap_perm, matrix) you get
different outcomes (guest_matrix), because the outcomes depend on
history.
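The history dependence can be reproduced with a toy model as well. Again a sketch under assumptions: 64-bit masks with plain bit numbering, a hard-coded `bound()` for queues 0.0, 0.1 and 1.0, and a single hypothetical `add_to_shadow()` routine that follows the quoted `vfio_ap_assign_apid_to_apcb()` for adapters and is assumed to be its mirror image for domains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t mask_t;               /* toy mask: bit i = index i */
#define M(i) ((mask_t)1 << (i))

struct apcb { mask_t apm, aqm; };
struct toy_mdev { struct apcb matrix, shadow; };

/* Queues bound to vfio_ap in the example: 0.0, 0.1 and 1.0 (1.1 is not). */
static bool bound(unsigned apid, unsigned apqi)
{
	return (apid == 0 && apqi <= 1) || (apid == 1 && apqi == 0);
}

/* Incremental shadow update; 'mine' is the mask being extended,
 * 'other' the orthogonal one. */
static bool add_to_shadow(struct toy_mdev *m, bool adapter, unsigned idx)
{
	mask_t *mine = adapter ? &m->shadow.apm : &m->shadow.aqm;
	mask_t *other = adapter ? &m->shadow.aqm : &m->shadow.apm;
	mask_t other_matrix = adapter ? m->matrix.aqm : m->matrix.apm;
	mask_t check = *other;

	if (*mine & M(idx))
		return false;
	if (!*other) {                 /* 'other is empty': fall back to the matrix */
		if (!other_matrix)
			return false;
		check = other_matrix;
	}
	for (unsigned j = 0; j < 64; j++)
		if ((check & M(j)) && !(adapter ? bound(idx, j) : bound(j, idx)))
			return false;
	*mine |= M(idx);
	if (!*other)
		*other = other_matrix;
	return true;
}

struct op { bool adapter; unsigned idx; };

/* Replays assign_adapter/assign_domain writes on an empty mdev. */
static struct apcb replay(const struct op *ops, int n)
{
	struct toy_mdev m = { { 0, 0 }, { 0, 0 } };

	for (int i = 0; i < n; i++) {
		if (ops[i].adapter)
			m.matrix.apm |= M(ops[i].idx);
		else
			m.matrix.aqm |= M(ops[i].idx);
		add_to_shadow(&m, ops[i].adapter, ops[i].idx);
	}
	return m.shadow;
}
```

Both replays end with the same matrix (adapters 0-1, domains 0-1), yet in this model the first order yields the guest matrix 0.0, 0.1 and the second 0.0, 1.0.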

Another thing is recovery. I believe the main idea behind shadow_apcb
is that we should auto-recover once the resources are available again.
The current design choices make recovery more difficult to think about,
because we may end up with either the apid or the apqi filtered on
a 'hole' (a queue missing for reasons other than belonging to the
default drivers or not being in the host config).

I still think for these cases filtering out the apid is the lesser
evil. Yes, hot plugging a domain causing an adapter to be hot unplugged
is ugly, but at least I can describe that. So I propose the following:
let me hack up a fixup that morphs things in this direction. Maybe
I will run into unexpected problems, but if I don't, then we will
have an alternative design you can run your testcases against. How about
that?

Regards,
Halil


Re: [PATCH v12 12/17] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device

2020-12-01 Thread Halil Pasic
On Mon, 30 Nov 2020 19:18:30 -0500
Tony Krowiak  wrote:

>  +static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
>  +					unsigned long apid)
>  +{
>  +	unsigned long apqi, apqn;
>  +	unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
>  +
>  +	/*
>  +	 * If the APID is already assigned to the guest's shadow APCB, there is
>  +	 * no need to assign it.
>  +	 */
>  +	if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
>  +		return false;
>  +
>  +	/*
>  +	 * If no domains have yet been assigned to the shadow APCB and one or
>  +	 * more domains have been assigned to the matrix mdev, then use
>  +	 * the domains assigned to the matrix mdev; otherwise, there is nothing
>  +	 * to assign to the shadow APCB.
>  +	 */
>  +	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS)) {
>  +		if (bitmap_empty(matrix_mdev->matrix.aqm, AP_DOMAINS))
>  +			return false;
>  +
>  +		aqm = matrix_mdev->matrix.aqm;
>  +	}
>  +
>  +	/* Make sure all APQNs are bound to the vfio_ap driver */
>  +	for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
>  +		apqn = AP_MKQID(apid, apqi);
>  +
>  +		if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
>  +			return false;
>  +	}
>  +
>  +	set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>  +
>  +	/*
>  +	 * If we verified APQNs using the domains assigned to the matrix mdev,
>  +	 * then copy the APQIs of those domains into the guest's APCB
>  +	 */
>  +	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
>  +		bitmap_copy(matrix_mdev->shadow_apcb.aqm,
>  +			    matrix_mdev->matrix.aqm, AP_DOMAINS);
>  +
>  +	return true;
>  +}
> >>> What is the rationale behind the shadow aqm empty special handling?  
> >> The rationale was to avoid taking the VCPUs
> >> out of SIE in order to make an update to the guest's APCB
> >> unnecessarily. For example, suppose the guest is started
> >> without access to any APQNs (i.e., all matrix and shadow_apcb
> >> masks are zeros). Now suppose the administrator proceeds to
> >> start assigning AP resources to the mdev. Let's say he starts
> >> by assigning adapters 1 through 100. The code below will return
> >> true indicating the shadow_apcb was updated. Consequently,
> >> the calling code will commit the changes to the guest's
> >> APCB. The problem there is that in order to update the guest's
> >> VCPUs, they will have to be taken out of SIE, yet the guest will
> >> not get access to the adapter since no domains have yet been
> >> assigned to the APCB. Doing this 100 times - once for each
> >> adapter 1-100 - is probably a bad idea.
> >>  
> > Not yanking the VCPUs out of SIE does make a lot of sense. At least
> > I understand your motivation now. I will think some more about this,
> > but in the meanwhile, please try to answer one more question (see
> > below).
> > 
> >>> I.e. why not simply:
> >>>
> >>> static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
> >>> 					unsigned long apid)
> >>> {
> >>> 	unsigned long apqi, apqn;
> >>> 	unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
> >>>
> >>> 	/*
> >>> 	 * If the APID is already assigned to the guest's shadow APCB, there is
> >>> 	 * no need to assign it.
> >>> 	 */
> >>> 	if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
> >>> 		return false;
> >>>
> >>> 	/* Make sure all APQNs are bound to the vfio_ap driver */
> >>> 	for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
> >>> 		apqn = AP_MKQID(apid, apqi);
> >>>
> >>> 		if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
> >>> 			return false;
> >>> 	}
> >>>
> >>> 	set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
> >>>
> >>> 	return true;
> > Would
> > s/return true/return !bitmap_empty(matrix_mdev->shadow_apcb.aqm,
> > 
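The trade-off between the special case and the simplified variant can be made concrete by counting how often each would ask for a commit (and thus take the VCPUs out of SIE) while adapters 1 to 100 are assigned to an mdev without domains. In this sketch the three policy names are hypothetical labels, all queues are assumed to be bound, and the last policy assumes the truncated `s/return true/.../` suggestion above means returning whether the shadow aqm is non-empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_DEVS 256   /* toy sizes standing in for AP_DEVICES / AP_DOMAINS */
#define TOY_DOMS 256

struct apcb { bool apm[TOY_DEVS]; bool aqm[TOY_DOMS]; };
struct toy_mdev { struct apcb matrix, shadow; };

enum policy {
	SPECIAL_CASE,  /* v12: fall back to matrix.aqm when the shadow aqm is empty */
	PLAIN_TRUE,    /* simplified variant ending in "return true" */
	NONEMPTY_AQM,  /* simplified variant returning "shadow aqm is non-empty" */
};

static bool any_set(const bool *mask, int n)
{
	for (int i = 0; i < n; i++)
		if (mask[i])
			return true;
	return false;
}

/* ASSUMPTION for this sketch: every queue is bound to vfio_ap. */
static bool queue_bound(unsigned apid, unsigned apqi)
{
	(void)apid;
	(void)apqi;
	return true;
}

/* Returns true when the caller would commit the shadow APCB. */
static bool assign_apid(struct toy_mdev *m, unsigned apid, enum policy p)
{
	const bool *aqm = m->shadow.aqm;

	if (m->shadow.apm[apid])
		return false;
	if (p == SPECIAL_CASE && !any_set(m->shadow.aqm, TOY_DOMS)) {
		if (!any_set(m->matrix.aqm, TOY_DOMS))
			return false;
		aqm = m->matrix.aqm;
	}
	for (unsigned apqi = 0; apqi < TOY_DOMS; apqi++)
		if (aqm[apqi] && !queue_bound(apid, apqi))
			return false;
	m->shadow.apm[apid] = true;
	if (p == SPECIAL_CASE && !any_set(m->shadow.aqm, TOY_DOMS))
		memcpy(m->shadow.aqm, m->matrix.aqm, sizeof(m->shadow.aqm));
	if (p == NONEMPTY_AQM)
		return any_set(m->shadow.aqm, TOY_DOMS);
	return true;
}

/* Assigns adapters first..last to an mdev with no domains; counts commits. */
static unsigned commits_for(enum policy p, unsigned first, unsigned last)
{
	struct toy_mdev m;
	unsigned commits = 0;

	memset(&m, 0, sizeof(m));
	for (unsigned apid = first; apid <= last; apid++) {
		m.matrix.apm[apid] = true;
		if (assign_apid(&m, apid, p))
			commits++;
	}
	return commits;
}
```

Under NONEMPTY_AQM the adapters are still recorded in shadow.apm; only the commit is skipped.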

Re: [PATCH v12 12/17] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device

2020-11-30 Thread Halil Pasic
On Mon, 30 Nov 2020 14:36:10 -0500
Tony Krowiak  wrote:

> 
> 
> On 11/28/20 8:52 PM, Halil Pasic wrote:
[..]
> >> * Unassign a domain from mdev's matrix:
> >>
> >>The domain will be hot unplugged from the KVM guest if it is
> >>assigned to the guest's matrix.
> >>
> >> * Assign a control domain:
> >>
> >>The control domain will be hot plugged into the KVM guest if it is not
> >>assigned to the guest's APCB. The AP architecture ensures a guest will
> >>only get access to the control domain if it is in the host's AP
> >>configuration, so there is no risk in hot plugging it; however, it will
> >>become automatically available to the guest when it is added to the host
> >>configuration.
> >>
> >> * Unassign a control domain:
> >>
> >>The control domain will be hot unplugged from the KVM guest if it is
> >>assigned to the guest's APCB.
> > This is where things start getting tricky. E.g. do we need to revise
> > filtering after an unassign? (For example an assign_adapter X didn't
> > change the shadow, because queue XY was missing, but now we unplug domain
> > Y. Should the adapter X pop up? I guess it should.)
> 
> I suppose that makes sense at the expense of making the code
> more complex. It is essentially what we had in the prior version
> which used the same filtering code for assignment as well as
> host AP configuration changes.
> 

Will have to think about it some more. Making the user unplug and
replug an adapter because it got filtered at some point, even though
there is no longer any need to filter it, does not feel right. On the
other hand, I'm afraid I'm complaining in circles.

> >
> >
> >> Note: Now that hot plug/unplug is implemented, there is the possibility
> >>that an assignment/unassignment of an adapter, domain or control
> >>domain could be initiated while the guest is starting, so the
> >>matrix device lock will be taken for the group notification callback
> >>that initializes the guest's APCB when the KVM pointer is made
> >>available to the vfio_ap device driver.
> >>
> >> Signed-off-by: Tony Krowiak 
> >> ---
> >>   drivers/s390/crypto/vfio_ap_ops.c | 190 +-
> >>   1 file changed, 159 insertions(+), 31 deletions(-)
> >>
> >> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> >> index 586ec5776693..4f96b7861607 100644
> >> --- a/drivers/s390/crypto/vfio_ap_ops.c
> >> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> >> @@ -631,6 +631,60 @@ static void vfio_ap_mdev_manage_qlinks(struct ap_matrix_mdev *matrix_mdev,
> >>}
> >>   }
> >>   
> >> +static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
> >> +  unsigned long apid)
> >> +{
> >> +  unsigned long apqi, apqn;
> >> +  unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
> >> +
> >> +  /*
> >> +   * If the APID is already assigned to the guest's shadow APCB, there is
> >> +   * no need to assign it.
> >> +   */
> >> +  if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
> >> +  return false;
> >> +
> >> +  /*
> >> +   * If no domains have yet been assigned to the shadow APCB and one or
> >> +   * more domains have been assigned to the matrix mdev, then use
> >> +   * the domains assigned to the matrix mdev; otherwise, there is nothing
> >> +   * to assign to the shadow APCB.
> >> +   */
> >> +  if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS)) {
> >> +  if (bitmap_empty(matrix_mdev->matrix.aqm, AP_DOMAINS))
> >> +  return false;
> >> +
> >> +  aqm = matrix_mdev->matrix.aqm;
> >> +  }
> >> +
> >> +  /* Make sure all APQNs are bound to the vfio_ap driver */
> >> +  for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
> >> +  apqn = AP_MKQID(apid, apqi);
> >> +
> >> +  if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
> >> +  return false;
> >> +  }
> >> +
> >> +  set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
> >> +
> >> +  /*
> >> +   * If we verified APQNs using the domains assigned to the matrix mdev,
> >> +   * then copy the APQIs of those domains into the guest's APCB
> >

Re: [PATCH v3] s390/pci: fix CPU address in MSI for directed IRQ

2020-11-30 Thread Halil Pasic
On Mon, 30 Nov 2020 09:30:33 +0100
Niklas Schnelle  wrote:

> I'm not really familiar with it, but I think this is closely related
> to what I asked Bernd Nerz. I fear that if CPUs go away we might already
> be in trouble at the firmware/hardware/platform level, because the CPU
> address is "programmed into the device", so to speak. Thus a directed
> interrupt from a device may race with anything reordering/removing CPUs,
> even if CPU addresses of dead CPUs are not reused and the mapping is stable.

From your answer, I read that CPU hot-unplug is supported for LPAR. 
> 
> Furthermore our floating fallback path will try to send a SIGP
> to the target CPU which clearly doesn't work when that is permanently
> gone. Either way I think these issues are out of scope for this fix
> so I will go ahead and merge this.

I agree, it makes no sense to delay this fix.

But if CPU hot-unplug is supported, I believe we should react when a CPU
that is the target of directed interrupts is unplugged. My guess is that
in this scenario transient hiccups are unavoidable, and thus should be
accepted, but we should make sure that we recover.

Regards,
Halil


Re: [PATCH v12 12/17] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device

2020-11-28 Thread Halil Pasic
On Tue, 24 Nov 2020 16:40:11 -0500
Tony Krowiak  wrote:

> Let's hot plug/unplug adapters, domains and control domains assigned to or
> unassigned from an AP matrix mdev device while it is in use by a guest per
> the following rules:
> 
> * Assign an adapter to mdev's matrix:
> 
>   The adapter will be hot plugged into the guest under the following
>   conditions:
>   1. The adapter is not yet assigned to the guest's matrix
>   2. At least one domain is assigned to the guest's matrix
>   3. Each APQN derived from the APID of the newly assigned adapter and
>  the APQIs of the domains already assigned to the guest's
>  matrix references a queue device bound to the vfio_ap device driver.
> 
>   The adapter and each domain assigned to the mdev's matrix will be hot
>   plugged into the guest under the following conditions:
>   1. The adapter is not yet assigned to the guest's matrix
>   2. No domains are assigned to the guest's matrix
>   3. At least one domain is assigned to the mdev's matrix
>   4. Each APQN derived from the APID of the newly assigned adapter and
>  the APQIs of the domains assigned to the mdev's matrix references a
>  queue device bound to the vfio_ap device driver.
> 
> * Unassign an adapter from mdev's matrix:
> 
>   The adapter will be hot unplugged from the KVM guest if it is
>   assigned to the guest's matrix.
> 
> * Assign a domain to mdev's matrix:
> 
>   The domain will be hot plugged into the guest under the following
>   conditions:
>   1. The domain is not yet assigned to the guest's matrix
>   2. At least one adapter is assigned to the guest's matrix
>   3. Each APQN derived from the APQI of the newly assigned domain and
>  the APIDs of the adapters already assigned to the guest's
>  matrix references a queue device bound to the vfio_ap device driver.
> 
>   The domain and each adapter assigned to the mdev's matrix will be hot
>   plugged into the guest under the following conditions:
>   1. The domain is not yet assigned to the guest's matrix
>   2. No adapters are assigned to the guest's matrix
>   3. At least one adapter is assigned to the mdev's matrix
>   4. Each APQN derived from the APQI of the newly assigned domain and
>  the APIDs of the adapters assigned to the mdev's matrix references a
>  queue device bound to the vfio_ap device driver.
> 
> * Unassign a domain from mdev's matrix:
> 
>   The domain will be hot unplugged from the KVM guest if it is
>   assigned to the guest's matrix.
> 
> * Assign a control domain:
> 
>   The control domain will be hot plugged into the KVM guest if it is not
>   assigned to the guest's APCB. The AP architecture ensures a guest will
>   only get access to the control domain if it is in the host's AP
>   configuration, so there is no risk in hot plugging it; however, it will
>   become automatically available to the guest when it is added to the host
>   configuration.
> 
> * Unassign a control domain:
> 
>   The control domain will be hot unplugged from the KVM guest if it is
>   assigned to the guest's APCB.

This is where things start getting tricky. E.g. do we need to revise
filtering after an unassign? (For example an assign_adapter X didn't
change the shadow, because queue XY was missing, but now we unplug domain
Y. Should the adapter X pop up? I guess it should.)


> 
> Note: Now that hot plug/unplug is implemented, there is the possibility
>   that an assignment/unassignment of an adapter, domain or control
>   domain could be initiated while the guest is starting, so the
>   matrix device lock will be taken for the group notification callback
>   that initializes the guest's APCB when the KVM pointer is made
>   available to the vfio_ap device driver.
> 
> Signed-off-by: Tony Krowiak 
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 190 +-
>  1 file changed, 159 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 586ec5776693..4f96b7861607 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -631,6 +631,60 @@ static void vfio_ap_mdev_manage_qlinks(struct ap_matrix_mdev *matrix_mdev,
>   }
>  }
>  
> +static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
> + unsigned long apid)
> +{
> + unsigned long apqi, apqn;
> + unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
> +
> + /*
> +  * If the APID is already assigned to the guest's shadow APCB, there is
> +  * no need to assign it.
> +  */
> + if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
> + return false;
> +
> + /*
> +  * If no domains have yet been assigned to the shadow APCB and one or
> +  * more domains have been assigned to the matrix mdev, then use
> +  * the domains assigned to the matrix mdev; otherwise, there is nothing
> +  * to assign to the 

Re: [PATCH v12 11/17] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device

2020-11-28 Thread Halil Pasic
On Tue, 24 Nov 2020 16:40:10 -0500
Tony Krowiak  wrote:

> The current implementation does not allow assignment of an AP adapter or
> domain to an mdev device if each APQN resulting from the assignment
> does not reference an AP queue device that is bound to the vfio_ap device
> driver. This patch allows assignment of AP resources to the matrix mdev as
> long as the APQNs resulting from the assignment:
>1. Are not reserved by the AP BUS for use by the zcrypt device drivers.
>2. Are not assigned to another matrix mdev.
> 
> The rationale behind this is twofold:
>1. The AP architecture does not preclude assignment of APQNs to an AP
>   configuration that are not available to the system.
>2. APQNs that do not reference a queue device bound to the vfio_ap
>   device driver will not be assigned to the guest's CRYCB, so the
>   guest will not get access to queues not bound to the vfio_ap driver.
> 
> Signed-off-by: Tony Krowiak 

Again code looks good. I'm still worried about all the incremental
changes (good for review) and their testability.


Re: [PATCH v12 10/17] s390/vfio-ap: initialize the guest apcb

2020-11-28 Thread Halil Pasic
On Tue, 24 Nov 2020 16:40:09 -0500
Tony Krowiak  wrote:

> The APCB is a control block containing the masks that specify the adapters,
> domains and control domains to which a KVM guest is granted access. When
> the vfio_ap device driver is notified that the KVM pointer has been set,
> the guest's APCB is initialized from the AP configuration of adapters,
> domains and control domains assigned to the matrix mdev. The Linux device
> model, however, precludes passing through to a guest any devices that
> are not bound to the device driver facilitating the pass-through.
> Consequently, APQNs assigned to the matrix mdev that do not reference
> AP queue devices must be filtered before assigning them to the KVM guest's
> APCB; however, the AP architecture precludes filtering individual APQNs, so
> the APQNs will be filtered by APID. That is, if a given APQN does not
> reference a queue device bound to the vfio_ap driver, its APID will not
> get assigned to the guest's APCB. For example:
> 
> Queues bound to vfio_ap:
> 04.0004
> 04.0022
> 04.0035
> 05.0004
> 05.0022
> 
> Adapters/domains assigned to the matrix mdev:
> 04 0004
>    0022
>    0035
> 05 0004
>    0022
>    0035
> 
> APQNs assigned to APCB:
> 04.0004
> 04.0022
> 04.0035
> 
> The APID 05 was filtered from the matrix mdev's matrix because
> queue device 05.0035 is not bound to the vfio_ap device driver.
> 
> Signed-off-by: Tony Krowiak 

This adds filtering. So from here on, guest_matrix may differ from
matrix even for an mdev that is associated with a guest. I'm still
grappling with the big picture. Have you thought about testability?
How is a testcase supposed to figure out which behavior is
to be deemed correct?
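One possible answer for a testcase is a reference model that derives the expected guest matrix from the final state alone. The sketch below implements the filter-by-APID rule from the commit message and reproduces its example; `MKQID` is assumed to match the kernel's `AP_MKQID()` encoding, the `aa.dddd` output format simply copies the example, and none of these names are driver API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed to mirror the kernel's AP_MKQID(card, queue) encoding. */
#define MKQID(card, queue) ((uint16_t)((((card) & 0xff) << 8) | ((queue) & 0xff)))

struct toy_matrix { bool apm[256]; bool aqm[256]; };

static bool is_bound(const uint16_t *bound, size_t n, uint16_t apqn)
{
	for (size_t i = 0; i < n; i++)
		if (bound[i] == apqn)
			return true;
	return false;
}

/* Guest APCB = mdev matrix minus every APID with at least one unbound APQN. */
static void filter_by_apid(const struct toy_matrix *mdev, const uint16_t *bound,
			   size_t nbound, struct toy_matrix *guest)
{
	memset(guest, 0, sizeof(*guest));
	memcpy(guest->aqm, mdev->aqm, sizeof(guest->aqm));
	for (unsigned apid = 0; apid < 256; apid++) {
		if (!mdev->apm[apid])
			continue;
		guest->apm[apid] = true;
		for (unsigned apqi = 0; apqi < 256; apqi++) {
			if (mdev->aqm[apqi] &&
			    !is_bound(bound, nbound, MKQID(apid, apqi))) {
				guest->apm[apid] = false;  /* drop the whole adapter */
				break;
			}
		}
	}
}

/* Lists APQNs as "aa.dddd" lines, like the example in the commit message. */
static void format_apqns(const struct toy_matrix *m, char *buf, size_t len)
{
	size_t off = 0;

	buf[0] = '\0';
	for (unsigned apid = 0; apid < 256; apid++) {
		if (!m->apm[apid])
			continue;
		for (unsigned apqi = 0; apqi < 256; apqi++) {
			if (!m->aqm[apqi])
				continue;
			int w = snprintf(buf + off, len - off, "%02x.%04x\n", apid, apqi);
			if (w < 0 || (size_t)w >= len - off)
				return;  /* out of space; buf stays NUL-terminated */
			off += (size_t)w;
		}
	}
}
```

Since `filter_by_apid()` only looks at the final matrix and the set of bound queues, its result cannot depend on the order of assign_adapter and assign_domain writes.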

I don't like the title line. It implies that the guest APCB was
uninitialized before, which is not the case.


Re: [PATCH v12 09/17] s390/vfio-ap: sysfs attribute to display the guest's matrix

2020-11-28 Thread Halil Pasic
On Tue, 24 Nov 2020 16:40:08 -0500
Tony Krowiak  wrote:

> The matrix of adapters and domains configured in a guest's APCB may
> differ from the matrix of adapters and domains assigned to the matrix mdev,
> so this patch introduces a sysfs attribute to display the matrix of
> adapters and domains that are or will be assigned to the APCB of a guest
> that is or will be using the matrix mdev. For a matrix mdev denoted by
> $uuid, the guest matrix can be displayed as follows:
> 
>cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix
> 
> Signed-off-by: Tony Krowiak 

Code looks good, but it may be a little early, since the treatment of
guest_matrix is changed by the following patches.


Re: [PATCH v12 08/17] s390/vfio-ap: introduce shadow APCB

2020-11-28 Thread Halil Pasic
On Tue, 24 Nov 2020 16:40:07 -0500
Tony Krowiak  wrote:

> The APCB is a field within the CRYCB that provides the AP configuration
> to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
> maintain it for the lifespan of the guest.
> 
> Signed-off-by: Tony Krowiak 
> Reviewed-by: Halil Pasic 

Still LGTM


Re: [PATCH v3] s390/pci: fix CPU address in MSI for directed IRQ

2020-11-27 Thread Halil Pasic
On Fri, 27 Nov 2020 11:08:10 +0100
Niklas Schnelle  wrote:

> 
> 
> On 11/27/20 9:56 AM, Halil Pasic wrote:
> > On Thu, 26 Nov 2020 18:00:37 +0100
> > Alexander Gordeev  wrote:
> > 
> >> The directed MSIs are delivered to CPUs whose address is
> >> written to the MSI message data. The current code assumes
> >> that a CPU logical number (as it is seen by the kernel)
> >> is also that CPU address.
> >>
> >> The above assumption is not correct, as the CPU address
> >> is rather the value returned by STAP instruction. That
> >> value does not necessarily match the kernel logical CPU
> >> number.
> >>
> >> Fixes: e979ce7bced2 ("s390/pci: provide support for CPU directed interrupts")
> >> Signed-off-by: Alexander Gordeev 
> >> ---
> >>  arch/s390/pci/pci_irq.c | 14 +++---
> >>  1 file changed, 11 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
> >> index 743f257cf2cb..75217fb63d7b 100644
> >> --- a/arch/s390/pci/pci_irq.c
> >> +++ b/arch/s390/pci/pci_irq.c
> >> @@ -103,9 +103,10 @@ static int zpci_set_irq_affinity(struct irq_data *data, const struct cpumask *de
> >>  {
> >>struct msi_desc *entry = irq_get_msi_desc(data->irq);
> >>struct msi_msg msg = entry->msg;
> >> +  int cpu_addr = smp_cpu_get_cpu_address(cpumask_first(dest));
> >>  
> >>msg.address_lo &= 0xffff;
> >> -  msg.address_lo |= (cpumask_first(dest) << 8);
> >> +  msg.address_lo |= (cpu_addr << 8);
> >>pci_write_msi_msg(data->irq, &msg);
> >>  
> >>return IRQ_SET_MASK_OK;
> >> @@ -238,6 +239,7 @@ int arch_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
> >>unsigned long bit;
> >>struct msi_desc *msi;
> >>struct msi_msg msg;
> >> +  int cpu_addr;
> >>int rc, irq;
> >>  
> >>zdev->aisb = -1UL;
> >> @@ -287,9 +289,15 @@ int arch_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
> >> handle_percpu_irq);
> >>msg.data = hwirq - bit;
> >>if (irq_delivery == DIRECTED) {
> >> +  if (msi->affinity)
> >> +      cpu = cpumask_first(&msi->affinity->mask);
> >> +  else
> >> +  cpu = 0;
> >> +  cpu_addr = smp_cpu_get_cpu_address(cpu);
> >> +
> > 
> > I think, style-wise, I would prefer keeping the ternary operator instead
> > of rewriting it as an if-then-else, i.e.:
> > 	cpu_addr = smp_cpu_get_cpu_address(msi->affinity ?
> > 		cpumask_first(&msi->affinity->mask) : 0);
> > but either way:
> > 
> > Reviewed-by: Halil Pasic  
> 
> Thanks for your review, let's keep the if/else; it's certainly not less
> readable, even if it may be less pretty.
> 
> Found another thing (not directly in the touched code) that I'm now
> wondering about. In zpci_handle_cpu_local_irq()
> we do
>   struct airq_iv *dibv = zpci_ibv[smp_processor_id()];
> 
> does that also need to use some _address() variant? If it does, that
> then dictates that the CPU addresses must start at 0.
> 

I didn't get to the bottom of this, but my understanding is that it
does not need an _address() variant. What we probably need is for the
mapping between the 'id' and the 'address' to be stable.

Please note that cpu_enable_directed_irq() is called on each CPU. That
establishes the mapping between the id and the address, because the
machine cares about the address, while cpu_enable_directed_irq() cares
about the id:
static void __init cpu_enable_directed_irq(void *unused)
{
	union zpci_sic_iib iib = {{0}};

	iib.cdiib.dibv_addr = (u64) zpci_ibv[smp_processor_id()]->vector;

	__zpci_set_irq_ctrl(SIC_IRQ_MODE_SET_CPU, 0, &iib);
	zpci_set_irq_ctrl(SIC_IRQ_MODE_D_SINGLE, PCI_ISC);
}

Now, were the id <-> address mapping to change, we would be in trouble.
Whether that is possible, I don't know. My guess is that it would require
CPU hot unplug. Niklas, are you familiar with that stuff? Should we ask
Heiko and Vasily?
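To make the id versus address distinction concrete, here is a minimal userspace sketch. It assumes a hypothetical four-CPU table in which the logical id (the index, as used for zpci_ibv[]) differs from the CPU address (as STAP would report it); `model_cpu_address[]`, `model_cpu_get_cpu_address()`, `model_msi_address_lo()` and `model_cpu_id_from_address()` are illustrative names, not kernel API. The address computation only mirrors the two `msg.address_lo` statements in the quoted patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical id -> address table: index = logical CPU id,
 * value = CPU address. The values are made up for illustration;
 * in the kernel, smp_cpu_get_cpu_address() plays this role.
 */
static const uint16_t model_cpu_address[] = { 0x00, 0x02, 0x04, 0x06 };
#define MODEL_NR_CPUS (sizeof(model_cpu_address) / sizeof(model_cpu_address[0]))

static uint16_t model_cpu_get_cpu_address(int cpu)
{
	return model_cpu_address[cpu];
}

/* Mirrors the patch: keep the low 16 bits, then OR in (address << 8). */
static uint32_t model_msi_address_lo(uint32_t msi_addr, int cpu)
{
	uint32_t lo = msi_addr & 0xffff;

	lo |= (uint32_t)model_cpu_get_cpu_address(cpu) << 8;
	return lo;
}

/*
 * The machine targets an address, while the per-CPU vector is indexed
 * by logical id; this reverse lookup only stays valid while the table
 * is stable. Returns -1 if no CPU has the given address.
 */
static int model_cpu_id_from_address(uint16_t addr)
{
	for (size_t cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (model_cpu_address[cpu] == addr)
			return (int)cpu;
	return -1;
}
```

With this table, CPU 1 is encoded as address 0x02, not 0x01, which is the bug the patch fixes. If the table changed after cpu_enable_directed_irq() ran on each CPU, the vector registered for an address would no longer match the id-indexed slot, which is the stability concern raised above.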

Regards,
Halil

> > 
> >>msg.address_lo = zdev->msi_addr & 0xffff;
> >> -  msg.address_lo |= msi->affinity ?
> >> -  (cpumask_first(&msi->affinity->mask) << 8) : 0;
> >> +  msg.address_lo |= (cpu_addr << 8);
> >> +
> >>for_each_possible_cpu(cpu) {
> >>airq_iv_set_data(zpci_ibv[cpu], hwirq, irq);
> >>}
> > 



Re: [PATCH v3] s390/pci: fix CPU address in MSI for directed IRQ

2020-11-27 Thread Halil Pasic
On Thu, 26 Nov 2020 18:00:37 +0100
Alexander Gordeev  wrote:

> The directed MSIs are delivered to CPUs whose address is
> written to the MSI message data. The current code assumes
> that a CPU logical number (as it is seen by the kernel)
> is also that CPU address.
> 
> The above assumption is not correct, as the CPU address
> is rather the value returned by STAP instruction. That
> value does not necessarily match the kernel logical CPU
> number.
> 
> Fixes: e979ce7bced2 ("s390/pci: provide support for CPU directed interrupts")
> Signed-off-by: Alexander Gordeev 
> ---
>  arch/s390/pci/pci_irq.c | 14 +++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
> index 743f257cf2cb..75217fb63d7b 100644
> --- a/arch/s390/pci/pci_irq.c
> +++ b/arch/s390/pci/pci_irq.c
> @@ -103,9 +103,10 @@ static int zpci_set_irq_affinity(struct irq_data *data, const struct cpumask *de
>  {
>   struct msi_desc *entry = irq_get_msi_desc(data->irq);
>   struct msi_msg msg = entry->msg;
> + int cpu_addr = smp_cpu_get_cpu_address(cpumask_first(dest));
>  
>   msg.address_lo &= 0xffff;
> - msg.address_lo |= (cpumask_first(dest) << 8);
> + msg.address_lo |= (cpu_addr << 8);
>   pci_write_msi_msg(data->irq, &msg);
>  
>   return IRQ_SET_MASK_OK;
> @@ -238,6 +239,7 @@ int arch_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
>   unsigned long bit;
>   struct msi_desc *msi;
>   struct msi_msg msg;
> + int cpu_addr;
>   int rc, irq;
>  
>   zdev->aisb = -1UL;
> @@ -287,9 +289,15 @@ int arch_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
>handle_percpu_irq);
>   msg.data = hwirq - bit;
>   if (irq_delivery == DIRECTED) {
> + if (msi->affinity)
> + cpu = cpumask_first(&msi->affinity->mask);
> + else
> + cpu = 0;
> + cpu_addr = smp_cpu_get_cpu_address(cpu);
> +

I think, style-wise, I would prefer keeping the ternary operator instead
of rewriting it as an if-then-else, i.e.:
cpu_addr = smp_cpu_get_cpu_address(msi->affinity ?
		cpumask_first(&msi->affinity->mask) : 0);
but either way:

Reviewed-by: Halil Pasic  

>   msg.address_lo = zdev->msi_addr & 0xffff;
> - msg.address_lo |= msi->affinity ?
> - (cpumask_first(&msi->affinity->mask) << 8) : 0;
> + msg.address_lo |= (cpu_addr << 8);
> +
>   for_each_possible_cpu(cpu) {
>   airq_iv_set_data(zpci_ibv[cpu], hwirq, irq);
>   }



Re: [PATCH v12 07/17] s390/vfio-ap: implement in-use callback for vfio_ap driver

2020-11-26 Thread Halil Pasic
On Tue, 24 Nov 2020 16:40:06 -0500
Tony Krowiak  wrote:

> Let's implement the callback to indicate when an APQN
> is in use by the vfio_ap device driver. The callback is
> invoked whenever a change to the apmask or aqmask would
> result in one or more queue devices being removed from the driver. The
> vfio_ap device driver will indicate a resource is in use
> if the APQN of any of the queue devices to be removed are assigned to
> any of the matrix mdevs under the driver's control.
> 
> There is potential for a deadlock condition between the matrix_dev->lock
> used to lock the matrix device during assignment of adapters and domains
> and the ap_perms_mutex locked by the AP bus when changes are made to the
> sysfs apmask/aqmask attributes.
> 
> Consider following scenario (courtesy of Halil Pasic):
> 1) apmask_store() takes ap_perms_mutex
> 2) assign_adapter_store() takes matrix_dev->lock
> 3) apmask_store() calls vfio_ap_mdev_resource_in_use() which tries
>to take matrix_dev->lock
> 4) assign_adapter_store() calls ap_apqn_in_matrix_owned_by_def_drv
>which tries to take ap_perms_mutex
> 
> BANG!
> 
> To resolve this issue, instead of using the mutex_lock(&matrix_dev->lock)
> function to lock the matrix device during assignment of an adapter or
> domain to a matrix_mdev as well as during the in_use callback, the
> mutex_trylock(&matrix_dev->lock) function will be used. If the lock is not
> obtained, then the assignment and in_use functions will terminate with
> -EBUSY.

The good news is that the final product is OK with regard to in_use(). The
bad news is that this patch does not do enough. At this stage we are still racy.

The problem is that the assign operations don't bother to take the
ap_perms_mutex lock under the matrix_dev->lock.

The scenario is the following:
1) apmask_store() takes ap_perms_mutex
2) apmask_store() calls vfio_ap_mdev_resource_in_use(), which
   takes matrix_dev->lock
3) vfio_ap_mdev_resource_in_use() releases matrix_dev->lock
   and returns 0
4) assign_adapter_store() takes matrix_dev->lock, does the
   assign (the queues are still bound to vfio_ap) and releases
   matrix_dev->lock
5) apmask_store() carries on, does the update to apmask and releases
   ap_perms_mutex
6) The queues get 'stolen' from vfio_ap while in use.

This gets fixed with "s390/vfio-ap: allow assignment of unavailable AP
queues to mdev device". Maybe we can reorder these patches. I didn't
look into that.

We could also just ignore the problem, because it is just for a couple
of commits, but I would prefer it gone.
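A minimal sketch of one way to close that window, in line with the remark that the assign operations should take ap_perms_mutex under matrix_dev->lock: the assign path holds both locks across the check and the update, and takes the inner lock with a trylock so the reversed lock order cannot deadlock against apmask_store(). The locks are modelled with POSIX mutexes; `model_assign_adapter()`, `model_apqn_owned_by_default_driver()`, the state arrays and the `-EADDRNOTAVAIL` error code are hypothetical stand-ins, not the vfio_ap code.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t ap_perms_mutex = PTHREAD_MUTEX_INITIALIZER;  /* AP bus */
static pthread_mutex_t matrix_dev_lock = PTHREAD_MUTEX_INITIALIZER; /* vfio_ap */

/* Hypothetical state: adapters still owned by the default (host) drivers. */
static bool host_owns_adapter[256];
/* Hypothetical state: adapters assigned to the mdev. */
static bool mdev_apm[256];

/* Stand-in for the AP bus ownership check; caller holds ap_perms_mutex. */
static bool model_apqn_owned_by_default_driver(unsigned int apid)
{
	return host_owns_adapter[apid];
}

static int model_assign_adapter(unsigned int apid)
{
	int rc = 0;

	pthread_mutex_lock(&matrix_dev_lock);
	/*
	 * Reverse order compared to apmask_store(), so only try the lock and
	 * back off with -EBUSY. Holding both locks until the bit is set means
	 * apmask_store() cannot change the masks between check and update.
	 */
	if (pthread_mutex_trylock(&ap_perms_mutex) != 0) {
		rc = -EBUSY;
		goto out;
	}
	if (model_apqn_owned_by_default_driver(apid))
		rc = -EADDRNOTAVAIL;	/* error code chosen for illustration */
	else
		mdev_apm[apid] = true;
	pthread_mutex_unlock(&ap_perms_mutex);
out:
	pthread_mutex_unlock(&matrix_dev_lock);
	return rc;
}
```

If apmask_store() is in progress (ap_perms_mutex held), the assignment fails with -EBUSY instead of slipping in between the in-use check and the mask update.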

Regards,
Halil
   




Re: [PATCH v12 05/17] s390/vfio-ap: manage link between queue struct and matrix mdev

2020-11-26 Thread Halil Pasic
On Tue, 24 Nov 2020 16:40:04 -0500
Tony Krowiak  wrote:

> Let's create links between each queue device bound to the vfio_ap device
> driver and the matrix mdev to which the queue is assigned. The idea is to
> facilitate efficient retrieval of the objects representing the queue
> devices and matrix mdevs as well as to verify that a queue assigned to
> a matrix mdev is bound to the driver.
> 
> The links will be created as follows:
> 
>* When the queue device is probed, if its APQN is assigned to a matrix
>  mdev, the structures representing the queue device and the matrix mdev
>  will be linked.
> 
>* When an adapter or domain is assigned to a matrix mdev, for each new
>  APQN assigned that references a queue device bound to the vfio_ap
>  device driver, the structures representing the queue device and the
>  matrix mdev will be linked.
> 
> The links will be removed as follows:
> 
>* When the queue device is removed, if its APQN is assigned to a matrix
>  mdev, the structures representing the queue device and the matrix mdev
>  will be unlinked.
> 
>* When an adapter or domain is unassigned from a matrix mdev, for each
>  APQN unassigned that references a queue device bound to the vfio_ap
>  device driver, the structures representing the queue device and the
>  matrix mdev will be unlinked.
> 
> Signed-off-by: Tony Krowiak 

Some aspects of this actually look much better than last time,
but I'm afraid there is one new issue that must be corrected -- see below.

> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 161 +++---
>  drivers/s390/crypto/vfio_ap_private.h |   3 +
>  2 files changed, 146 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index dc699fd54505..07caf871943c 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -28,7 +28,6 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>  
>  /**
>   * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
> - * @matrix_mdev: the associated mediated matrix
>   * @apqn: The queue APQN
>   *
>   * Retrieve a queue with a specific APQN from the AP queue devices attached to
> @@ -36,32 +35,36 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>   *
>   * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
>   */
> -static struct vfio_ap_queue *vfio_ap_get_queue(
> - struct ap_matrix_mdev *matrix_mdev,
> - int apqn)
> +static struct vfio_ap_queue *vfio_ap_get_queue(int apqn)
>  {
>   struct ap_queue *queue;
>   struct vfio_ap_queue *q = NULL;
>  
> - if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
> - return NULL;
> - if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
> - return NULL;
> -
>   queue = ap_get_qdev(apqn);
>   if (!queue)
>   return NULL;
>  
>   put_device(&queue->ap_dev.device);
>  
> - if (queue->ap_dev.device.driver == &matrix_dev->vfio_ap_drv->driver) {
> + if (queue->ap_dev.device.driver == &matrix_dev->vfio_ap_drv->driver)
>   q = dev_get_drvdata(&queue->ap_dev.device);
> - q->matrix_mdev = matrix_mdev;
> - }
>  
>   return q;
>  }
>  
> +static struct vfio_ap_queue *
> +vfio_ap_mdev_get_queue(struct ap_matrix_mdev *matrix_mdev, unsigned long apqn)
> +{
> + struct vfio_ap_queue *q;
> +
> + hash_for_each_possible(matrix_mdev->qtable, q, mdev_qnode, apqn) {
> + if (q && (q->apqn == apqn))
> + return q;
> + }
> +
> + return NULL;
> +}
> +
>  /**
>   * vfio_ap_wait_for_irqclear
>   * @apqn: The AP Queue number
> @@ -172,7 +175,6 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
> status.response_code);
>  end_free:
>   vfio_ap_free_aqic_resources(q);
> - q->matrix_mdev = NULL;
>   return status;
>  }
>  
> @@ -288,7 +290,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>   matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
>  struct ap_matrix_mdev, pqap_hook);
>  
> - q = vfio_ap_get_queue(matrix_mdev, apqn);
> + q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
>   if (!q)
>   goto out_unlock;
>  
> @@ -331,6 +333,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>  
>   matrix_mdev->mdev = mdev;
>   vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
> + hash_init(matrix_mdev->qtable);
>   mdev_set_drvdata(mdev, matrix_mdev);
>   matrix_mdev->pqap_hook.hook = handle_pqap;
>   matrix_mdev->pqap_hook.owner = THIS_MODULE;
> @@ -559,6 +562,87 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>   return 0;
>  }
>  
> +enum qlink_action {
> + 

Re: [PATCH v12 05/17] s390/vfio-ap: manage link between queue struct and matrix mdev

2020-11-26 Thread Halil Pasic
On Tue, 24 Nov 2020 16:40:04 -0500
Tony Krowiak  wrote:

> @@ -1155,6 +1243,11 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>matrix_mdev->matrix.apm_max + 1) {
>   for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
>matrix_mdev->matrix.aqm_max + 1) {
> + q = vfio_ap_mdev_get_queue(matrix_mdev,
> +AP_MKQID(apid, apqi));
> + if (!q)
> + continue;
> +
>   ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
>   /*
>* Regardless whether a queue turns out to be busy, or
> @@ -1164,9 +1257,7 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>   if (ret)
>   rc = ret;
>  
> - q = vfio_ap_get_queue(matrix_mdev, AP_MKQID(apid, apqi);
> - if (q)
> - vfio_ap_free_aqic_resources(q);
> + vfio_ap_free_aqic_resources(q);
>   }
>   }

We discussed this during the review of v11. Introducing it one way, just
to change it in the next patch (which should deal with something
different), makes no sense to me.

BTW, I've provided a ton of feedback for '[PATCH v11 03/14]
s390/vfio-ap: manage link between queue struct and matrix mdev', but I
can't find your response to it. Some of the issues resurface here, and
I don't feel like repeating myself. Could you please respond to my
feedback on the v11 version?


Re: [PATCH v12 04/17] s390/vfio-ap: No need to disable IRQ after queue reset

2020-11-26 Thread Halil Pasic
On Tue, 24 Nov 2020 16:40:03 -0500
Tony Krowiak  wrote:

> The queues assigned to a matrix mediated device are currently reset when:
> 
> * The VFIO_DEVICE_RESET ioctl is invoked
> * The mdev fd is closed by userspace (QEMU)
> * The mdev is removed from sysfs.
> 
> Immediately after the reset of a queue, a call is made to disable
> interrupts for the queue. This is entirely unnecessary because the reset of
> a queue disables interrupts, so this will be removed.
> 
> Signed-off-by: Tony Krowiak 

Reviewed-by: Halil Pasic 

As I said previously, I would prefer the cleanup of the airq
resources being part of reset_queue(), but I can propose that
later.

> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 28 +---
>  1 file changed, 5 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 8e6972495daa..dc699fd54505 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -26,14 +26,6 @@
>  
>  static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>  
> -static int match_apqn(struct device *dev, const void *data)
> -{
> - struct vfio_ap_queue *q = dev_get_drvdata(dev);
> -
> - return (q->apqn == *(int *)(data)) ? 1 : 0;
> -}
> -
> -
>  /**
>   * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
>   * @matrix_mdev: the associated mediated matrix
> @@ -1121,20 +1113,6 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>   return NOTIFY_OK;
>  }
>  
> -static void vfio_ap_irq_disable_apqn(int apqn)
> -{
> - struct device *dev;
> - struct vfio_ap_queue *q;
> -
> - dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> -  &apqn, match_apqn);
> - if (dev) {
> - q = dev_get_drvdata(dev);
> - vfio_ap_irq_disable(q);
> - put_device(dev);
> - }
> -}
> -
>  static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
>   unsigned int retry)
>  {
> @@ -1169,6 +1147,7 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>  {
>   int ret;
>   int rc = 0;
> + struct vfio_ap_queue *q;
>   unsigned long apid, apqi;
>   struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  
> @@ -1184,7 +1163,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>*/
>   if (ret)
>   rc = ret;
> - vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
> +
> + q = vfio_ap_get_queue(matrix_mdev, AP_MKQID(apid, apqi);
> + if (q)
> + vfio_ap_free_aqic_resources(q);
>   }
>   }
>  



Re: [PATCH v12 03/17] 390/vfio-ap: use new AP bus interface to search for queue devices

2020-11-26 Thread Halil Pasic
On Tue, 24 Nov 2020 16:40:02 -0500
Tony Krowiak  wrote:

A nit: for all other patches the title prefix is  s390/vfio-ap, here you
have 390/vfio-ap.


Re: [PATCH v12 02/17] s390/vfio-ap: decrement reference count to KVM

2020-11-26 Thread Halil Pasic
On Tue, 24 Nov 2020 16:40:01 -0500
Tony Krowiak  wrote:

> Decrement the reference count to KVM when notified that KVM pointer is
> invalidated via the vfio group notifier.

Can you please explain this more thoroughly? Is this a bug you found? If
yes, do we need to backport it (Cc stable, Fixes tag)?

It doesn't seem related to the objective of the series. If it is not
related, why not spin it separately?


> 
> Signed-off-by: Tony Krowiak 

This s-o-b is probably by accident.

> Signed-off-by: Tony Krowiak 
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 66fd9784a156..31e39c1f6e56 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1095,7 +1095,11 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>   matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
>  
>   if (!data) {
> + if (matrix_mdev->kvm)
> + kvm_put_kvm(matrix_mdev->kvm);
> +
>   matrix_mdev->kvm = NULL;
> +
>   return NOTIFY_OK;
>   }
>  



Re: [PATCH v12 01/17] s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c

2020-11-26 Thread Halil Pasic
On Tue, 24 Nov 2020 16:40:00 -0500
Tony Krowiak  wrote:

> Let's move the probe and remove callbacks into the vfio_ap_ops.c
> file to keep all code related to managing queues in a single file. This
> way, all functions related to queue management can be removed from the
> vfio_ap_private.h header file defining the public interfaces for the
> vfio_ap device driver.
> 
> Signed-off-by: Tony Krowiak 

Reviewed-by: Halil Pasic 


Re: [PATCH v11 04/14] s390/zcrypt: driver callback to indicate resource in use

2020-11-13 Thread Halil Pasic
On Fri, 13 Nov 2020 16:30:31 -0500
Tony Krowiak  wrote:

> We will be using the mutex_trylock() function in our sysfs assignment
> interfaces which make the call to the AP bus to check permissions (which also
> locks ap_perms). If the mutex_trylock() fails, we return from the assignment
> function with -EBUSY. This should resolve that potential deadlock issue.

It resolves the deadlock issue only if in_use() also does
mutex_trylock(). But if in_use() doesn't get the lock, it
needs to back off (and so does its client code), i.e. a boolean
return value won't do.
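A minimal sketch of that point, under stated assumptions: the in-use callback (the hypothetical `model_resource_in_use()`) is entered with ap_perms_mutex already held, so it may only trylock matrix_dev->lock, and it needs a third outcome meaning "could not tell". Returning an int (-EBUSY, 0 or 1) instead of a bool lets the caller, here a hypothetical `model_apmask_store()`, back off rather than treating "busy" as "not in use". The locks are POSIX mutexes, and the error codes are chosen for illustration only.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t ap_perms_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t matrix_dev_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical state: adapters assigned to some mdev. */
static bool assigned_to_mdev[256];

/*
 * Called with ap_perms_mutex held.
 * Returns -EBUSY (could not tell), 0 (not in use) or 1 (in use).
 */
static int model_resource_in_use(unsigned int apid)
{
	int in_use;

	if (pthread_mutex_trylock(&matrix_dev_lock) != 0)
		return -EBUSY;
	in_use = assigned_to_mdev[apid] ? 1 : 0;
	pthread_mutex_unlock(&matrix_dev_lock);
	return in_use;
}

/* Hypothetical apmask update: proceeds only on a definite "not in use". */
static int model_apmask_store(unsigned int apid)
{
	int rc;

	pthread_mutex_lock(&ap_perms_mutex);
	rc = model_resource_in_use(apid);
	if (rc == 1)
		rc = -EADDRINUSE;
	/* rc == 0: the mask update would happen here, under ap_perms_mutex. */
	pthread_mutex_unlock(&ap_perms_mutex);
	return rc;
}
```

With a bool, a failed trylock has to be reported as either "in use" (refusing a valid update) or "not in use" (letting the update race with an assignment); the third value avoids that guess.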

Regards,
Halil


Re: [PATCH v11 05/14] s390/vfio-ap: implement in-use callback for vfio_ap driver

2020-11-13 Thread Halil Pasic
On Fri, 13 Nov 2020 12:14:22 -0500
Tony Krowiak  wrote:
[..]
> >>   }
> >>   
> >> +#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
> >> +   "already assigned to %s"
> >> +
> >> +static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
> >> +   unsigned long *apm,
> >> +   unsigned long *aqm)
> >> +{
> >> +  unsigned long apid, apqi;
> >> +
> >> +  for_each_set_bit_inv(apid, apm, AP_DEVICES)
> >> +  for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
> >> +  pr_err(MDEV_SHARING_ERR, apid, apqi, mdev_name);
> > Isn't error level rather severe for this? For my taste, even a warning
> > would be too severe for this.
> 
> The user only sees a EADDRINUSE returned from the sysfs interface,
> so Conny asked if I could log a message to indicate which APQNs are
> in use by which mdev. I can change this to an info message, but it
> will be missed if the log level is set higher. Maybe Conny can put in
> her two cents here since she asked for this.
> 

I'm looking forward to Conny's opinion. :)

[..]
> >>   
> >> @@ -708,18 +732,18 @@ static ssize_t assign_adapter_store(struct device *dev,
> >>if (ret)
> >>goto done;
> >>   
> >> -  set_bit_inv(apid, matrix_mdev->matrix.apm);
> >> +  memset(apm, 0, sizeof(apm));
> >> +  set_bit_inv(apid, apm);
> >>   
> >> -  ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
> >> +  ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev, apm,
> >> +   matrix_mdev->matrix.aqm);
> > What is the benefit of using a copy here? I mean we have the vfio_ap lock
> > so nobody can see the bit we speculatively flipped.
> 
> The vfio_ap_mdev_verify_no_sharing() function definition was changed
> so that it can also be re-used by the vfio_ap_mdev_resource_in_use()
> function rather than duplicating that code for the in_use callback. The
> in-use callback is invoked by the AP bus which has no concept of
> a mediated device, so I made this change to accommodate that fact.

It seems I was not clear enough with my question. Here you pass a local
apm which has every bit 0 except the one corresponding to the
adapter we are trying to assign. The matrix.apm may actually have
more apm bits set. What we used to do is set the matrix.apm bit,
verify, and clear it if verification fails. I think that
would still work.

The computational complexity is currently the same. For
some reason unknown to me, ap_apqn_in_matrix_owned_by_def_drv() uses loops
instead of bitmap operations. But it won't do any less work
if the apm argument is sparse. The same is true if bitmap ops are used.

What you do here is not wrong: if the invariants that should be
maintained are in fact maintained, performing the check with the other
bits set in the apm is superfluous. But as I said before, it isn't
actually extra work, and if there was a bug, it could help us detect
it (because an assignment that should have worked would fail).

Preparing the local apm isn't much extra work either, but I still
don't understand the change. Why can't you pass in matrix.apm
after set_bit_inv(apid, ...) like we used to do before?

Again, no big deal, but I just prefer to understand the whys.
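The "set the bit, verify, roll back on failure" approach described above can be sketched with plain bitmap operations. This is a simplified userspace model, not the vfio_ap implementation: two hypothetical matrices, 64-bit masks instead of the kernel's 256-bit apm/aqm, and made-up helper names. Two matrices share an APQN exactly when both their apm and their aqm intersect, which a bitmap AND answers directly without nested per-APQN loops.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified matrix: bit n set means APID/APQI n is assigned (n < 64). */
struct model_matrix {
	uint64_t apm;
	uint64_t aqm;
};

/* Two matrices share an APQN iff both their apm and aqm intersect. */
static bool model_shares_apqn(const struct model_matrix *a,
			      const struct model_matrix *b)
{
	return (a->apm & b->apm) && (a->aqm & b->aqm);
}

/*
 * Speculatively set the adapter bit in the mdev's own matrix, verify the
 * whole matrix against the other mdev, and clear the bit again if the
 * check fails. Assumed to run under the vfio_ap lock, so nobody observes
 * the intermediate state. Checking the full apm (not just the new bit)
 * would also flag a pre-existing invariant violation.
 */
static int model_assign_adapter(struct model_matrix *mine,
				const struct model_matrix *other,
				unsigned int apid)
{
	uint64_t bit = UINT64_C(1) << apid;
	bool was_set = mine->apm & bit;

	mine->apm |= bit;
	if (model_shares_apqn(mine, other)) {
		if (!was_set)
			mine->apm &= ~bit;	/* roll back */
		return -EADDRINUSE;
	}
	return 0;
}
```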

> 
> >
> > I've also pointed out in the previous patch that in_use() isn't
> > perfectly reliable (at least in theory) because of a race.
> 
> We discussed that privately and determined that the sysfs assignment
> interfaces will use mutex_trylock() to avoid races.

I don't think, what we discussed is going to fix the race I'm referring
to here. But I do look forward to v12.

Regards,
Halil


Re: [PATCH v11 07/14] s390/vfio-ap: sysfs attribute to display the guest's matrix

2020-11-13 Thread Halil Pasic
On Fri, 13 Nov 2020 12:27:32 -0500
Tony Krowiak  wrote:

> 
> 
> On 10/28/20 4:17 AM, Halil Pasic wrote:
> > On Thu, 22 Oct 2020 13:12:02 -0400
> > Tony Krowiak  wrote:
> >
> >> +static ssize_t guest_matrix_show(struct device *dev,
> >> +   struct device_attribute *attr, char *buf)
> >> +{
> >> +  ssize_t nchars;
> >> +  struct mdev_device *mdev = mdev_from_dev(dev);
> >> +  struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> >> +
> >> +  if (!vfio_ap_mdev_has_crycb(matrix_mdev))
> >> +  return -ENODEV;
> > I'm wondering, would it make sense to have guest_matrix display the
> > would-be guest matrix when we don't have a KVM? With the filtering in
> > place, the question of what guest_matrix my (assigned) matrix would
> > result in right now, if I were to hook up my vfio_ap_mdev to a guest,
> > seems a legitimate one.
> 
> A couple of thoughts here:
> * The ENODEV informs the user that there is no guest running
>     which makes sense to me given this interface displays the
>     guest matrix. The alternative, which I considered, was to
>     display an empty matrix (i.e., nothing).
> * This would be a pretty drastic change to the design because
>     the shadow_apcb - which is what is displayed via this interface - is
>     only updated when the guest is started and while it is running (i.e.,
>     hot plug of new adapters/domains). Making this change would
>     require changing that entire design concept which I am reluctant
>     to do at this point in the game.
> 
> 

No problem. My thinking was that, because we can do the
assign/unassign ops for the running guest too, we already have
the code to maintain the shadow_apcb. In this
series, that code is conditional on vfio_ap_mdev_has_crycb().
E.g.

static ssize_t assign_adapter_store(struct device *dev,
				    struct device_attribute *attr,
				    const char *buf, size_t count)
{
	[..]

	if (vfio_ap_mdev_has_crycb(matrix_mdev))
		if (vfio_ap_mdev_filter_guest_matrix(matrix_mdev, true))
			vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);

If one were to move the vfio_ap_mdev_has_crycb() check into
vfio_ap_mdev_commit_shadow_apcb(), then we would have an always
up-to-date shadow_apcb that we could display.

I don't feel strongly about this. It was just an idea, because if the
result of the filtering is surprising, currently the only way to see it,
without knowing the algorithm (and possibly the state and the history of
the system), is to actually start a guest.
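A minimal sketch of that alternative, with all names hypothetical: the filtering step always recomputes the shadow, and only the commit step checks whether a guest (crycb) is present, so the shadow is always current and could be displayed before a guest starts. To keep the focus on the control flow, the "filter" is reduced to masking the assigned adapters against the adapters bound to vfio_ap; the real filtering rules are more involved.

```c
#include <stdbool.h>
#include <stdint.h>

struct model_mdev {
	uint64_t assigned_apm;	/* what the admin assigned */
	uint64_t shadow_apm;	/* what the guest would get */
	uint64_t guest_apm;	/* what was last committed to a running guest */
	bool has_crycb;		/* stands in for vfio_ap_mdev_has_crycb() */
	int commits;		/* how many times the guest was updated */
};

/* Hypothetical: adapters bound to vfio_ap (bit n = APID n, n < 64). */
static uint64_t bound_apm;

/* Always runs; returns true if the shadow changed. */
static bool model_filter(struct model_mdev *m)
{
	uint64_t new_shadow = m->assigned_apm & bound_apm;
	bool changed = new_shadow != m->shadow_apm;

	m->shadow_apm = new_shadow;
	return changed;
}

/* The crycb check lives here instead of at every call site. */
static void model_commit(struct model_mdev *m)
{
	if (!m->has_crycb)
		return;
	m->guest_apm = m->shadow_apm;
	m->commits++;
}

static void model_assign_adapter(struct model_mdev *m, unsigned int apid)
{
	m->assigned_apm |= UINT64_C(1) << apid;
	if (model_filter(m))
		model_commit(m);
}
```

A guest start would then simply commit whatever the shadow already holds; this sketch does not model that step.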

Regards,
Halil



Re: [PATCH v11 08/14] s390/vfio-ap: hot plug/unplug queues on bind/unbind of queue device

2020-11-05 Thread Halil Pasic
On Wed, 4 Nov 2020 16:20:26 -0500
Tony Krowiak  wrote:

> > But I'm sure the code suggests it can, because
> > vfio_ap_mdev_filter_guest_matrix() has a third parameter called filter_apid,
> > which governs whether the apm or the aqm bit should be removed. And
> > vfio_ap_mdev_filter_guest_matrix() does get called with filter_apid=false in
> > assign_domain_store(), and I don't see subsequent unlink operations that
> > would sever q->matrix_mdev.
> 
> I think you may be conflating two different things. The q in q->matrix_mdev
> represents a queue device bound to the driver. The link to matrix_mdev
> indicates the APQN of the queue device is assigned to the matrix_mdev.
> When a new domain is assigned to matrix_mdev, we know that
> all APQNS currently assigned to the shadow_apcb  are bound to the vfio driver
> because of previous filtering, so we are only concerned with those APQNs
> with the APQI of the new domain being assigned.
> 
> 1. Queues bound to vfio_ap:
>      04.0004
>      04.0047
> 2. APQNs assigned to matrix_mdev:
>      04.0004
>      04.0047
> 3. shadow_apcb:
>      04.0004
>      04.0047
> 4. Assign domain 0054 to matrix_mdev
> 5. APQI 0054 gets filtered because 04.0054 not bound to vfio_ap
> 6. no change to shadow_apcb:
>      04.0004
>      04.0047

Let me expand on your example. For reference, see the filtering
code after the example.

1. Queues bound to vfio_ap:
     04.0004
     04.0047
     05.0004
     05.0047
     05.0054
2. APQNs assigned to matrix_mdev:
     04.0004
     04.0047
3. shadow_apcb:
     04.0004
     04.0047
4. Assign domain 0054 to matrix_mdev
5. APQNs assigned to matrix_mdev:
     04.0004
     04.0047
     04.0054
6. APQI 0054 gets filtered because 04.0054 is not bound to vfio_ap
7. No change to shadow_apcb:
     04.0004
     04.0047
8. Assign adapter 05
9. APQNs assigned to matrix_mdev:
     04.0004
     04.0047
     04.0054
     05.0004
     05.0047
     05.0054
10. shadow_apcb changes to:
     05.0004
     05.0047
     05.0054
    because now vfio_ap_mdev_filter_guest_matrix() is called with
    filter_apid=true
11. Assign domain 0053
12. APQNs assigned to matrix_mdev:
     04.0004
     04.0047
     04.0053
     04.0054
     05.0004
     05.0047
     05.0053
     05.0054
13. shadow_apcb changes to:
     04.0004
     04.0047
     05.0004
     05.0047
    because now vfio_ap_mdev_filter_guest_matrix() is called with
    filter_apid=false, and APQIs 0053 and 0054 get filtered
14. 05.0054 gets removed (unbound)
15. With your current code, we unplug adapter 05 from the shadow_apcb,
    despite the fact that 05.0054 was not in the shadow_apcb in
    the first place
16. Unassign adapter 05
17. Unassign domain 0053
18. APQNs assigned to matrix_mdev:
     04.0004
     04.0047
     04.0054
19. shadow_apcb is:
     04.0004
     04.0047
20. Assign adapter 05
21. APQNs assigned to matrix_mdev:
     04.0004
     04.0047
     04.0054
     05.0004
     05.0047
     05.0054
22. shadow_apcb changes to:
     (empty)
    because now vfio_ap_mdev_filter_guest_matrix() is called with
    filter_apid=true: APID 04 gets filtered because queue 04.0054 is
    not bound, and APID 05 gets filtered because queue 05.0054 is
    not bound
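To check the example mechanically, here is a simplified userspace model of the filtering. It follows the loop structure of vfio_ap_mdev_filter_guest_matrix() as quoted below, but what it does for an unbound APQN (clear the APID bit if filter_apid, otherwise the APQI bit) is an assumption based on the description of filter_apid, since the quoted function is truncated; it also omits the host config_info checks. `model_filter()`, `bound[][]` and `model_count()` are hypothetical names. APQN 04.0054 means APID 0x04, APQI 0x54.

```c
#include <stdbool.h>

#define MODEL_NR 256	/* APIDs and APQIs are 0..255 */

struct model_matrix {
	bool apm[MODEL_NR];
	bool aqm[MODEL_NR];
};

/* bound[apid][apqi]: the queue device is bound to vfio_ap (hypothetical). */
static bool bound[MODEL_NR][MODEL_NR];

/*
 * Start from the assigned matrix and, for every assigned APQN whose queue
 * is not bound, clear either its APID (filter_apid == true) or its APQI.
 */
static void model_filter(const struct model_matrix *assigned,
			 struct model_matrix *shadow, bool filter_apid)
{
	*shadow = *assigned;
	for (int apid = 0; apid < MODEL_NR; apid++) {
		if (!assigned->apm[apid])
			continue;
		for (int apqi = 0; apqi < MODEL_NR; apqi++) {
			if (!assigned->aqm[apqi] || bound[apid][apqi])
				continue;
			if (filter_apid)
				shadow->apm[apid] = false;
			else
				shadow->aqm[apqi] = false;
		}
	}
}

/* Number of bits set in an apm or aqm. */
static int model_count(const bool *bits)
{
	int n = 0;

	for (int i = 0; i < MODEL_NR; i++)
		n += bits[i];
	return n;
}
```

Replaying the "assign domain 0053" step with filter_apid=false keeps both adapters and drops APQIs 0053 and 0054; replaying the final "assign adapter 05" step (after 05.0054 was unbound and 0053 unassigned) with filter_apid=true leaves no adapters at all, even though adapter 04 was in the shadow just before.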

static int vfio_ap_mdev_filter_guest_matrix(struct ap_matrix_mdev *matrix_mdev,
					    bool filter_apid)
{
	struct ap_matrix shadow_apcb;
	unsigned long apid, apqi, apqn;

	memcpy(&shadow_apcb, &matrix_mdev->matrix, sizeof(struct ap_matrix));

	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
		/*
		 * If the APID is not assigned to the host AP configuration,
		 * we can not assign it to the guest's AP configuration
		 */
		if (!test_bit_inv(apid, (unsigned long *)
				  matrix_dev->config_info.apm)) {
			clear_bit_inv(apid, shadow_apcb.apm);
			continue;
		}

		for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
				     AP_DOMAINS) {
			/*
			 * 

Re: [PATCH v11 08/14] s390/vfio-ap: hot plug/unplug queues on bind/unbind of queue device

2020-11-04 Thread Halil Pasic
On Tue, 3 Nov 2020 17:49:21 -0500
Tony Krowiak  wrote:

> > We do this to show the no queues but bits set output in show? We could
> > get rid of some code if we were to not z  

I managed to delete "eroize" from "zeroize".

> 
> I'm not sure what you are saying/asking here. The reason for this
> is because there is no point in setting bits in the APCB if no queues
> will be made available to the guest which is the case if the APM or
> AQM are cleared.

Exactly my train of thought! There is no point doing work (here
zeroizing) that has no effect.

Also, I'm leaning towards incremental updates to the shadow_apcb (instead
of basically recomputing it from scratch each time). One thing I'm
particularly worried about is that, because of the third argument of
vfio_ap_mdev_filter_guest_matrix(), called filter_apid, we could end up
with a different filtering decision than previously. E.g. we decided to
filter the card on removal of a single queue, but then somebody
does an assign domain, and suddenly we unplug the domain and plug the
card. With incremental changes to the shadow_apcb, we could do less work
(revise only what needs to be revised), and it would be more
straightforward to reason about the absence of inconsistent filtering.

Regards,
Halil


Re: [PATCH v11 08/14] s390/vfio-ap: hot plug/unplug queues on bind/unbind of queue device

2020-11-04 Thread Halil Pasic
On Tue, 3 Nov 2020 17:49:21 -0500
Tony Krowiak  wrote:

> >>   
> >> +void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
> >> +{
> >> +  unsigned long apid = AP_QID_CARD(q->apqn);
> >> +
> >> +  if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
> >> +  return;
> >> +
> >> +  /*
> >> +   * If the APID is assigned to the guest, then let's
> >> +   * go ahead and unplug the adapter since the
> >> +   * architecture does not provide a means to unplug
> >> +   * an individual queue.
> >> +   */
> >> +  if (test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm)) {
> >> +  clear_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm);  
> > Shouldn't we check aqm as well? I mean, it may be clear at this point
> > because of info->aqm. If the bit is clear, we don't have to remove
> > the apm bit.  
> 
> The rule we agreed upon is that if a queue is removed, we unplug
> the card because we can't unplug an individual queue, so this code
> is consistent with the stated rule.

All I'm asking for is to verify that the queue is actually plugged. The
queue is actually plugged iff
test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm) &&
test_bit_inv(apqi, q->matrix_mdev->shadow_apcb.aqm).

There is no point in unplugging the whole card if the removed queue
was not plugged in the first place.
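A minimal sketch of the requested check, using a hypothetical simplified shadow APCB (64-bit masks, so APIDs and APQIs below 64) and made-up helper names: the whole adapter is unplugged, because a single queue cannot be, but only when the queue was actually plugged, i.e. both its apm and aqm bits are set.

```c
#include <stdbool.h>
#include <stdint.h>

struct model_apcb {
	uint64_t apm;	/* simplified: bit n = APID n, n < 64 */
	uint64_t aqm;	/* simplified: bit n = APQI n, n < 64 */
};

static bool model_test_bit(unsigned int nr, uint64_t map)
{
	return (map >> nr) & 1;
}

/*
 * Hypothetical hot-unplug on queue removal. Returns true if the
 * adapter was unplugged from the shadow APCB.
 */
static bool model_hot_unplug_queue(struct model_apcb *shadow,
				   unsigned int apid, unsigned int apqi)
{
	/* The queue is plugged iff both its apm and aqm bits are set. */
	if (!model_test_bit(apid, shadow->apm) ||
	    !model_test_bit(apqi, shadow->aqm))
		return false;	/* not plugged: leave the card alone */
	shadow->apm &= ~(UINT64_C(1) << apid);
	return true;
}
```

In the broken-card scenario, the first removal clears the apm bit and later removals return false. If the aqm bit can never be clear at this point, the aqm test could become an assertion instead of a condition.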

> Typically, a queue is unplugged
> because the adapter has been deconfigured or is broken which means
> that all queues for that adapter will be removed in succession. On the
> other hand, that situation would be handled when the last queue is
> removed if we check the AQM, so I'm not adverse to making that
> check if you insist. 

I don't agree. Let's detail your scenario. We have a nicely
operating card which, as a whole, is passed through to our guest. It
breaks, and the AP bus decides to deconstruct the queues.
Already the first queue removed would unplug the card, because
both the apm and the aqm bits are set at this point. Subsequent removals
then see that the apm bit is cleared. IMHO everything actually works
as it would without the extra check on aqm (in this scenario).

It would make reasoning about the code much easier for me, so, sorry, I do
insist.

> Of course, if the queue is manually unbound from
> the vfio driver, what you are asking for makes sense I suppose. I'll have
> to think about this one some more, but feel free to respond to this.

I'm not sure the situation where the queue's ->matrix_mdev pointer is set
but the apqi is not in the shadow_apcb can actually happen (races not
considered). But I'm sure the code suggests it can, because
vfio_ap_mdev_filter_guest_matrix() has a third parameter called filter_apid,
which governs whether the apm or the aqm bit should be removed. And
vfio_ap_mdev_filter_guest_matrix() does get called with filter_apid=false in
assign_domain_store(), and I don't see subsequent unlink operations that would
sever q->matrix_mdev.

Another case where the aqm may get filtered in
vfio_ap_mdev_filter_guest_matrix() is when the info->aqm bit is not set, as I
mentioned in my previous mail. If that cannot happen, we should turn
that into an assert.

Actually, if you are convinced that the apqi bit is always set in
q->matrix_mdev->shadow_apcb.aqm, I would agree to turning that into an
assertion instead of a condition. Then, if I'm not completely convinced, I
could at least try to trigger the assert :).

Regards,
Halil


Re: [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset

2020-10-30 Thread Halil Pasic
On Fri, 30 Oct 2020 16:37:04 -0400
Tony Krowiak  wrote:

> On 10/30/20 1:42 PM, Halil Pasic wrote:
> > On Thu, 29 Oct 2020 19:29:35 -0400
> > Tony Krowiak  wrote:
> >  
> >>>> @@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> >>>>   */
> >>>>  if (ret)
> >>>>  rc = ret;
> >>>> -vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
> >>>> +q = vfio_ap_get_queue(matrix_mdev,
> >>>> +  AP_MKQID(apid, apqi));
> >>>> +if (q)
> >>>> +vfio_ap_free_aqic_resources(q);  
> >>> Is it safe to do vfio_ap_free_aqic_resources() at this point? I don't
> >>> think so. I mean, does the current code (and vfio_ap_mdev_reset_queue()
> >>> in particular) guarantee that the reset is actually done when we arrive
> >>> here? BTW, I think we have a similar problem with the current code as
> >>> well.  
> >> If the return code from the vfio_ap_mdev_reset_queue() function
> >> is zero, then yes, we are guaranteed the reset was done and the
> >> queue is empty.  
> > I've read up on this and I disagree. We should discuss this offline.  
> 
> Maybe you are confusing things here; my statement is specific to the return
> code from the vfio_ap_mdev_reset_queue() function, not the response code
> from the PQAP(ZAPQ) instruction. The vfio_ap_mdev_reset_queue()
> function issues the PQAP(ZAPQ) instruction and if the status response code
> is 0 indicating the reset was successfully initiated, it waits for the
> queue to empty. When the queue is empty, it returns 0 to indicate
> the queue is reset. If the queue does not become empty after a period of
> time, it will issue a warning (WARN_ON_ONCE) and return 0. In that case, I suppose
> there is no guarantee the reset was done, so maybe a change needs to be
> made there such as a non-zero return code.
>

I've overlooked the wait for empty. Maybe that return 0 had a part in
it. I now remember insisting on having the wait code added when the
interrupt support was in the making. Sorry!

If we have given up because we ran out of retries, we are in trouble anyway.
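A minimal user-space sketch of the direction Tony suggests, i.e. returning non-zero when the queue never drains instead of returning 0 after the warning. The polling callback, the retry count and the choice of -EBUSY are assumptions for illustration only; the real vfio_ap_mdev_reset_queue() issues PQAP(ZAPQ) and sleeps between polls.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical probe: returns true once the queue reports it is empty. */
typedef bool (*sketch_is_empty_fn)(void *ctx);

/*
 * Wait for a queue to drain after a reset has been successfully initiated.
 * Returns 0 only if the queue was observed empty; otherwise -EBUSY, so a
 * caller cannot mistake "gave up waiting" for "reset done".
 */
static int sketch_wait_for_empty(sketch_is_empty_fn is_empty, void *ctx,
				 int retries)
{
	while (retries-- > 0) {
		if (is_empty(ctx))
			return 0;
		/* A kernel implementation would sleep here (e.g. msleep()). */
	}
	return -EBUSY;
}

/* Fake queue for exercising the sketch: reports empty after N polls. */
struct sketch_fake_queue {
	int polls_until_empty;
};

static bool sketch_fake_is_empty(void *ctx)
{
	struct sketch_fake_queue *q = ctx;

	if (q->polls_until_empty > 0) {
		q->polls_until_empty--;
		return false;
	}
	return true;
}
```

With this shape, the caller can decide explicitly what to do about the AQIC resources on a timeout rather than being handed a misleading 0.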
 
> >  
> >>    The function returns a non-zero return code if
> >> the reset fails or the reset did not complete within a given
> >> amount of time, so maybe we shouldn't free AQIC resources when
> >> we get a non-zero return code from the reset function?
> >>  
> > If the queue is gone, or broken, it won't produce interrupts or poke the
> > notifier bit, and we should clean up the AQIC resources.  
> 
> True, which is what the code provided by this patch does; however,
> the AQIC resources should be cleaned up only if the KVM pointer is
> not NULL for reasons discussed elsewhere.

Yes, but these should be cleaned up before the KVM pointer becomes
NULL. We don't want to keep the page with the notifier byte pinned
forever, do we?


Re: [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset

2020-10-30 Thread Halil Pasic
On Thu, 29 Oct 2020 19:29:35 -0400
Tony Krowiak  wrote:

> >> +void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
> >> +{
> >> +  struct vfio_ap_queue *q;
> >> +  struct ap_queue *queue;
> >> +  int apid, apqi;
> >> +
> >> +  queue = to_ap_queue(&apdev->device);  
> > What is the benefit of rewriting this? You introduced
> > queue just to do queue->ap_dev to get to the apdev you
> > have in hand in the first place.  
> 
> I'm not quite sure what you're asking. This function is
> the callback specified via the remove field of struct
> ap_driver when the vfio_ap device driver is registered
> with the AP bus. That callback function takes a struct
> ap_device as a parameter. What am I missing here?

Please compare the removed function vfio_ap_queue_dev_remove() with the
added function vfio_ap_mdev_remove_queue() line by line. It should
become clear.

Regards,
Halil


Re: [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset

2020-10-30 Thread Halil Pasic
On Thu, 29 Oct 2020 19:29:35 -0400
Tony Krowiak  wrote:

> >> @@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> >> */
> >>if (ret)
> >>rc = ret;
> >> -  vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
> >> +  q = vfio_ap_get_queue(matrix_mdev,
> >> +AP_MKQID(apid, apqi));
> >> +  if (q)
> >> +  vfio_ap_free_aqic_resources(q);  

[..]

> >
> > Under what circumstances do we expect !q? If we don't, then we need to
> > complain one way or another.  
> 
> In the current code (i.e., prior to introducing the subsequent hot
> plug patches), an APQN can not be assigned to an mdev unless it
> references a queue device bound to the vfio_ap device driver; however,
> there is nothing preventing a queue device from getting unbound
> while the guest is running (one of the problems mostly resolved by this
> series). In that case, q would be NULL.

But if the queue does not belong to us any more, it does not make sense
to call vfio_ap_mdev_reset_queue() on its APQN, does it?

I think we should have

	if (!q)
		continue;

at the very beginning of the loop body, or else we need to be sure that q
is not NULL.
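The early `continue` can be sketched in user-space C as follows. `SKETCH_MKQID()` is a hypothetical copy of the APQN encoding (card number in bits 8-15, queue index in bits 0-7, which is our understanding of how the AP bus's AP_MKQID() works), and `struct sketch_queue`/`sketch_get_queue()` are hypothetical stand-ins for the vfio_ap queue lookup, not the driver's real names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical APQN encoding: APID in bits 8-15, APQI in bits 0-7. */
#define SKETCH_MKQID(apid, apqi) ((((apid) & 0xffUL) << 8) | ((apqi) & 0xffUL))

struct sketch_queue {
	unsigned long apqn;
	int resets;		/* how often the queue was reset */
	int aqic_freed;		/* whether AQIC resources were released */
};

/* Stand-in lookup: returns the queue still bound to the driver, or NULL. */
static struct sketch_queue *sketch_get_queue(struct sketch_queue *bound,
					     size_t nbound, unsigned long apqn)
{
	for (size_t i = 0; i < nbound; i++)
		if (bound[i].apqn == apqn)
			return &bound[i];
	return NULL;
}

/* Reset every assigned APQN that still belongs to us; returns how many. */
static int sketch_reset_queues(const unsigned long *apids, size_t napids,
			       const unsigned long *apqis, size_t napqis,
			       struct sketch_queue *bound, size_t nbound)
{
	int nreset = 0;

	for (size_t i = 0; i < napids; i++) {
		for (size_t j = 0; j < napqis; j++) {
			struct sketch_queue *q = sketch_get_queue(
				bound, nbound, SKETCH_MKQID(apids[i], apqis[j]));

			if (!q)
				continue;	/* not ours any more: don't touch it */
			q->resets++;		/* stands in for the ZAPQ reset */
			q->aqic_freed = 1;	/* and for freeing AQIC resources */
			nreset++;
		}
	}
	return nreset;
}
```

Skipping at the top of the loop body keeps both the reset and the AQIC cleanup away from APQNs that no longer belong to the driver.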



Re: [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset

2020-10-30 Thread Halil Pasic
On Thu, 29 Oct 2020 19:29:35 -0400
Tony Krowiak  wrote:

> >> @@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> >> */
> >>if (ret)
> >>rc = ret;
> >> -  vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
> >> +  q = vfio_ap_get_queue(matrix_mdev,
> >> +AP_MKQID(apid, apqi));
> >> +  if (q)
> >> +  vfio_ap_free_aqic_resources(q);  
> > Is it safe to do vfio_ap_free_aqic_resources() at this point? I don't
> > think so. I mean, does the current code (and vfio_ap_mdev_reset_queue()
> > in particular) guarantee that the reset is actually done when we arrive
> > here? BTW, I think we have a similar problem with the current code as
> > well.  
> 
> If the return code from the vfio_ap_mdev_reset_queue() function
> is zero, then yes, we are guaranteed the reset was done and the
> queue is empty.

I've read up on this and I disagree. We should discuss this offline.

>  The function returns a non-zero return code if
> the reset fails or the reset did not complete within a given
> amount of time, so maybe we shouldn't free AQIC resources when
> we get a non-zero return code from the reset function?
> 

If the queue is gone, or broken, it won't produce interrupts or poke the
notifier bit, and we should clean up the AQIC resources.


> There are three occasions when the vfio_ap_mdev_reset_queues()
> function is called:
> 1. When the VFIO_DEVICE_RESET ioctl is invoked from userspace
>      (i.e., when the guest is started)
> 2. When the mdev fd is closed (vfio_ap_mdev_release())
> 3. When the mdev is removed (vfio_ap_mdev_remove())
> 
> The IRQ resources are initialized when the PQAP(AQIC)
> is intercepted to enable interrupts. This would occur after
> the guest boots and the AP bus initializes. So, 1 would
> presumably occur before that happens. I couldn't find
> anywhere in the AP bus or zcrypt code where a PQAP(AQIC)
> is executed to disable interrupts, so my assumption is
> that IRQ disablement is accomplished by a reset on
> the guest. I'll have to ask Harald about that. So, 2 would
> occur when the guest is about to terminate and 3
> would occur only after the guest is terminated. In any
> case, it seems that IRQ resources should be cleaned up.
> Maybe it would be more appropriate to do that in the
> vfio_ap_mdev_release() and vfio_ap_mdev_remove()
> functions themselves?

I'm a bit confused, but I think you are wrong. What happens when the
guest re-IPLs? I guess the subsystem reset should also do the
VFIO_DEVICE_RESET ioctl, and that has to reset the queues and disable
the interrupts, right?

Regards,
Halil



Re: [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset

2020-10-30 Thread Halil Pasic
On Thu, 29 Oct 2020 19:29:35 -0400
Tony Krowiak  wrote:

> On 10/27/20 2:48 AM, Halil Pasic wrote:
> > On Thu, 22 Oct 2020 13:11:56 -0400
> > Tony Krowiak  wrote:
> >  
> >> The queues assigned to a matrix mediated device are currently reset when:
> >>
> >> * The VFIO_DEVICE_RESET ioctl is invoked
> >> * The mdev fd is closed by userspace (QEMU)
> >> * The mdev is removed from sysfs.  
> > What about the situation when vfio_ap_mdev_group_notifier() is called to
> > tell us that our pointer to KVM is about to become invalid? Do we need to
> > clean up the IRQ stuff there?  
> 
> After reading this question, I decided to do some tracing using
> printks and learned that the vfio_ap_mdev_group_notifier()
> function does not get called when the guest is shut down. This is
> because the vfio_ap_mdev_release() function, which is called
> before the KVM pointer is invalidated, unregisters the group notifier.
> 
> I took a look at some of the other drivers that register a group
> notifier in the mdev_parent_ops.open callback and each unregistered
> the notifier in the mdev_parent_ops.release callback.
> 
> So, to answer your question, there is no need to cleanup the IRQ
> stuff in the vfio_ap_mdev_group_notifier() function since it will
> not get called when the KVM pointer is invalidated. The cleanup
> should be done in the vfio_ap_mdev_release() function that gets
> called when the mdev fd is closed.

So you are saying that if vfio_ap_mdev_group_notifier() is called to tell
us that KVM is going away, then it is a bug?

If that is the case, I would like that reflected in the code! By that I
mean logging an error at least (if not a BUG_ON).

Regards,
Halil


Re: [PATCH v11 07/14] s390/vfio-ap: sysfs attribute to display the guest's matrix

2020-10-28 Thread Halil Pasic
On Thu, 22 Oct 2020 13:12:02 -0400
Tony Krowiak  wrote:

> +static ssize_t guest_matrix_show(struct device *dev,
> +  struct device_attribute *attr, char *buf)
> +{
> + ssize_t nchars;
> + struct mdev_device *mdev = mdev_from_dev(dev);
> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> +
> + if (!vfio_ap_mdev_has_crycb(matrix_mdev))
> + return -ENODEV;

I'm wondering, would it make sense to have guest_matrix display the
would-be guest matrix when we don't have a KVM? With the filtering in
place, the question of which guest_matrix my assigned matrix would result
in right now, if I were to hook up my vfio_ap_mdev to a guest, seems a
legitimate one.


> +
> + mutex_lock(&matrix_dev->lock);
> + nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->shadow_apcb, buf);
> + mutex_unlock(&matrix_dev->lock);
> +
> + return nchars;
> +}
> +static DEVICE_ATTR_RO(guest_matrix);


Re: [PATCH v11 08/14] s390/vfio-ap: hot plug/unplug queues on bind/unbind of queue device

2020-10-28 Thread Halil Pasic
On Thu, 22 Oct 2020 13:12:03 -0400
Tony Krowiak  wrote:

> In response to the probe or remove of a queue device, if a KVM guest is
> using the matrix mdev to which the APQN of the queue device is assigned,
> the vfio_ap device driver must respond accordingly. In an ideal world, the
> queue device being probed would be hot plugged into the guest. Likewise,
> the queue corresponding to the queue device being removed would
> be hot unplugged from the guest. Unfortunately, the AP architecture
> precludes plugging or unplugging individual queues. We must also
> consider the fact that the Linux device model precludes us from passing a
> queue device through to a KVM guest that is not bound to the driver
> facilitating the pass-through. Consequently, we are left with the choice of
> plugging/unplugging the adapter or the domain. In the latter case, this
> would result in taking access to the domain away for each adapter the
> guest is using. In either case, the operation will alter a KVM guest's
> access to one or more queues, so let's plug/unplug the adapter on
> bind/unbind of the queue device since this corresponds to the hardware
> entity that may be physically plugged/unplugged - i.e., a domain is not
> a piece of hardware.
> 
> Example:
> ===
> Queue devices bound to vfio_ap device driver:
>   04.0004
>   04.0047
>   04.0054
> 
>   05.0004
>   05.0047
> 
> Adapters and domains assigned to matrix mdev:
>   Adapters  Domains  -> Queues
>   04        0004        04.0004
>   05        0047        04.0047
>             0054        04.0054
>                         05.0004
>                         05.0047
>                         05.0054
> 
> KVM guest matrix at startup:
>   Adapters  Domains  -> Queues
>   04        0004        04.0004
>             0047        04.0047
>             0054        04.0054
> 
>   Adapter 05 is filtered because queue 05.0054 is not bound.
> 
> KVM guest matrix after queue 05.0054 is bound to the vfio_ap driver:
>   Adapters  Domains  -> Queues
>   04        0004        04.0004
>   05        0047        04.0047
>             0054        04.0054
>                         05.0004
>                         05.0047
>                         05.0054
> 
>   All queues assigned to the matrix mdev are now bound.
> 
> KVM guest matrix after queue 04.0004 is unbound:
> 
>   Adapters  Domains  -> Queues
>   05        0004        05.0004
>             0047        05.0047
>             0054        05.0054
> 
>   Adapter 04 is filtered because 04.0004 is no longer bound.
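The filtering rule illustrated by the example above (an adapter is withheld from the guest unless every queue it forms with the assigned domains is bound to vfio_ap) can be sketched in user-space C. This is a simplification under stated assumptions: plain bool arrays stand in for the APM/AQM bitmaps, `bound[apid][apqi]` stands in for asking the AP bus whether a queue is bound, the names are hypothetical, and the host-configuration check the patch also performs is omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_ID 256	/* APIDs and APQIs are both 8-bit here */

struct sketch_matrix {
	bool apm[SKETCH_MAX_ID];	/* adapters (APIDs) */
	bool aqm[SKETCH_MAX_ID];	/* domains (APQIs) */
};

/*
 * Copy the assigned matrix into the guest matrix, then drop every adapter
 * for which some assigned domain yields a queue that is not bound.
 */
static void sketch_filter_guest_matrix(const struct sketch_matrix *assigned,
				       bool bound[SKETCH_MAX_ID][SKETCH_MAX_ID],
				       struct sketch_matrix *guest)
{
	*guest = *assigned;
	for (int apid = 0; apid < SKETCH_MAX_ID; apid++) {
		if (!assigned->apm[apid])
			continue;
		for (int apqi = 0; apqi < SKETCH_MAX_ID; apqi++) {
			if (assigned->aqm[apqi] && !bound[apid][apqi]) {
				guest->apm[apid] = false;	/* filter the adapter */
				break;
			}
		}
	}
}
```

Filtering by adapter rather than by domain mirrors the rationale in the commit message: an adapter corresponds to a piece of hardware, a domain does not.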
> 
> Signed-off-by: Tony Krowiak 
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 158 +-
>  1 file changed, 155 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 7bad70d7bcef..5b34bc8fca31 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -312,6 +312,13 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>   return 0;
>  }
>  
> +static void vfio_ap_matrix_clear_masks(struct ap_matrix *matrix)
> +{
> + bitmap_clear(matrix->apm, 0, AP_DEVICES);
> + bitmap_clear(matrix->aqm, 0, AP_DOMAINS);
> + bitmap_clear(matrix->adm, 0, AP_DOMAINS);
> +}
> +
>  static void vfio_ap_matrix_init(struct ap_config_info *info,
>   struct ap_matrix *matrix)
>  {
> @@ -601,6 +608,104 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
>   return 0;
>  }
>  
> +static bool vfio_ap_mdev_matrixes_equal(struct ap_matrix *matrix1,
> + struct ap_matrix *matrix2)
> +{
> + return (bitmap_equal(matrix1->apm, matrix2->apm, AP_DEVICES) &&
> + bitmap_equal(matrix1->aqm, matrix2->aqm, AP_DOMAINS) &&
> + bitmap_equal(matrix1->adm, matrix2->adm, AP_DOMAINS));
> +}
> +
> +/**
> + * vfio_ap_mdev_filter_matrix
> + *
> + * Filters the matrix of adapters, domains, and control domains assigned to
> + * a matrix mdev's AP configuration and stores the result in the shadow copy of
> + * the APCB used to supply a KVM guest's AP configuration.
> + *
> + * @matrix_mdev:  the matrix mdev whose AP configuration is to be filtered
> + *
> + * Returns true if filtering has changed the shadow copy of the APCB used
> + * to supply a KVM guest's AP configuration; otherwise, returns false.
> + */
> +static int vfio_ap_mdev_filter_guest_matrix(struct ap_matrix_mdev *matrix_mdev)
> +{
> + struct ap_matrix shadow_apcb;
> + unsigned long apid, apqi, apqn;
> +
> + memcpy(&shadow_apcb, &matrix_mdev->matrix, sizeof(struct ap_matrix));
> +
> + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
> + /*
> +  * If the APID is not assigned to the host AP configuration,
> +  * we can not assign it to the guest's AP configuration
> +  */
> + if (!test_bit_inv(apid,
> +   

Re: [PATCH v11 06/14] s390/vfio-ap: introduce shadow APCB

2020-10-28 Thread Halil Pasic
On Thu, 22 Oct 2020 13:12:01 -0400
Tony Krowiak  wrote:

> The APCB is a field within the CRYCB that provides the AP configuration
> to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
> maintain it for the lifespan of the guest.
> 
> Signed-off-by: Tony Krowiak 
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 24 +++-
>  drivers/s390/crypto/vfio_ap_private.h |  2 ++
>  2 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 9e9fad560859..9791761aa7fd 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -320,6 +320,19 @@ static void vfio_ap_matrix_init(struct ap_config_info *info,
>   matrix->adm_max = info->apxa ? info->Nd : 15;
>  }
>  
> +static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
> +{
> + return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
> +}
> +
> +static void vfio_ap_mdev_commit_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
> +{
> + kvm_arch_crypto_set_masks(matrix_mdev->kvm,
> +   matrix_mdev->shadow_apcb.apm,
> +   matrix_mdev->shadow_apcb.aqm,
> +   matrix_mdev->shadow_apcb.adm);
> +}
> +
>  static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>  {
>   struct ap_matrix_mdev *matrix_mdev;
> @@ -335,6 +348,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>  
>   matrix_mdev->mdev = mdev;
>   vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
> + vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
>   hash_init(matrix_mdev->qtable);
>   mdev_set_drvdata(mdev, matrix_mdev);
>   matrix_mdev->pqap_hook.hook = handle_pqap;
> @@ -1213,13 +1227,12 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>   if (ret)
>   return NOTIFY_DONE;
>  
> - /* If there is no CRYCB pointer, then we can't copy the masks */
> - if (!matrix_mdev->kvm->arch.crypto.crycbd)
> + if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>   return NOTIFY_DONE;
>  
> - kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
> -   matrix_mdev->matrix.aqm,
> -   matrix_mdev->matrix.adm);
> + memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
> +sizeof(matrix_mdev->shadow_apcb));
> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>  
>   return NOTIFY_OK;
>  }
> @@ -1329,6 +1342,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
>   kvm_put_kvm(matrix_mdev->kvm);
>   matrix_mdev->kvm = NULL;
>   }
> +

Unrelated change.

Otherwise patch looks OK.

Reviewed-by: Halil Pasic 

>   mutex_unlock(&matrix_dev->lock);
>  
>   vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index c1d8b5507610..fc8634cee485 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -75,6 +75,7 @@ struct ap_matrix {
>   * @list:allows the ap_matrix_mdev struct to be added to a list
>   * @matrix:  the adapters, usage domains and control domains assigned to the
>   *   mediated matrix device.
> + * @shadow_apcb:the shadow copy of the APCB field of the KVM guest's CRYCB
>   * @group_notifier: notifier block used for specifying callback function for
>   *   handling the VFIO_GROUP_NOTIFY_SET_KVM event
>   * @kvm: the struct holding guest's state
> @@ -82,6 +83,7 @@ struct ap_matrix {
>  struct ap_matrix_mdev {
>   struct list_head node;
>   struct ap_matrix matrix;
> + struct ap_matrix shadow_apcb;
>   struct notifier_block group_notifier;
>   struct notifier_block iommu_notifier;
>   struct kvm *kvm;



Re: [PATCH v11 09/14] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device

2020-10-28 Thread Halil Pasic
On Thu, 22 Oct 2020 13:12:04 -0400
Tony Krowiak  wrote:

> +static int vfio_ap_mdev_validate_masks(struct ap_matrix_mdev *matrix_mdev,
> +unsigned long *mdev_apm,
> +unsigned long *mdev_aqm)
> +{
> + if (ap_apqn_in_matrix_owned_by_def_drv(mdev_apm, mdev_aqm))
> + return -EADDRNOTAVAIL;
> +
> + return vfio_ap_mdev_verify_no_sharing(matrix_mdev, mdev_apm, mdev_aqm);
> +}
> +
>  static bool vfio_ap_mdev_matrixes_equal(struct ap_matrix *matrix1,
>   struct ap_matrix *matrix2)
>  {
> @@ -840,33 +734,21 @@ static ssize_t assign_adapter_store(struct device *dev,
>   if (apid > matrix_mdev->matrix.apm_max)
>   return -ENODEV;
>  
> - /*
> -  * Set the bit in the AP mask (APM) corresponding to the AP adapter
> -  * number (APID). The bits in the mask, from most significant to least
> -  * significant bit, correspond to APIDs 0-255.
> -  */
> - mutex_lock(&matrix_dev->lock);
> -
> - ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
> - if (ret)
> - goto done;
> -
>   memset(apm, 0, sizeof(apm));
>   set_bit_inv(apid, apm);
>  
> - ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev, apm,
> -  matrix_mdev->matrix.aqm);
> - if (ret)
> - goto done;
> -
> + mutex_lock(&matrix_dev->lock);
> + ret = vfio_ap_mdev_validate_masks(matrix_mdev, apm,
> +   matrix_mdev->matrix.aqm);

Is this a potential deadlock?

Consider the following scenario:
1) apmask_store() takes ap_perms_mutex
2) assign_adapter_store() takes matrix_dev->lock
3) apmask_store() calls vfio_ap_mdev_resource_in_use() which tries
   to take matrix_dev->lock
4) assign_adapter_store() calls ap_apqn_in_matrix_owned_by_def_drv
   which tries to take ap_perms_mutex

BANG!

I think using mutex_trylock(&matrix_dev->lock) and bailing out with -EBUSY
if we don't manage to acquire the lock would be a good idea anyway, to
prevent a bunch of mdev management operations from piling up on the mutex
and starving in_use().
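For the fail-fast part, a user-space analog using POSIX pthread_mutex_trylock() could look like this. `sketch_matrix_lock` and `sketch_assign_adapter_store()` are hypothetical stand-ins for matrix_dev->lock and assign_adapter_store(); in the kernel the equivalent call would be mutex_trylock(), which returns 1 when it acquires the lock. Note that failing fast only keeps mdev operations from piling up on the lock. On its own it does not remove the lock-order inversion in the scenario above, since assign_adapter_store() can still win the trylock and then block on ap_perms_mutex.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>

/* Hypothetical stand-in for matrix_dev->lock. */
static pthread_mutex_t sketch_matrix_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for assign_adapter_store(): returns count or -errno. */
static ssize_t sketch_assign_adapter_store(size_t count)
{
	/* Fail fast instead of queueing up behind other lock holders. */
	if (pthread_mutex_trylock(&sketch_matrix_lock) != 0)
		return -EBUSY;

	/* ... validate the masks and assign the adapter here ... */

	pthread_mutex_unlock(&sketch_matrix_lock);
	return (ssize_t)count;
}
```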

Regards,
Halil

 
> + if (ret) {
> + mutex_unlock(&matrix_dev->lock);
> + return ret;
> + }
>   set_bit_inv(apid, matrix_mdev->matrix.apm);
>   vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
> - ret = count;
> -
> -done:
>   mutex_unlock(&matrix_dev->lock);
>  
> - return ret;
> + return count;


Re: [PATCH v11 05/14] s390/vfio-ap: implement in-use callback for vfio_ap driver

2020-10-27 Thread Halil Pasic
On Thu, 22 Oct 2020 13:12:00 -0400
Tony Krowiak  wrote:

> Let's implement the callback to indicate when an APQN
> is in use by the vfio_ap device driver. The callback is
> invoked whenever a change to the apmask or aqmask would
> result in one or more queue devices being removed from the driver. The
> vfio_ap device driver will indicate a resource is in use
> if the APQN of any of the queue devices to be removed is assigned to
> any of the matrix mdevs under the driver's control.
> 
> Signed-off-by: Tony Krowiak 
> ---
>  drivers/s390/crypto/vfio_ap_drv.c |  1 +
>  drivers/s390/crypto/vfio_ap_ops.c | 78 +++
>  drivers/s390/crypto/vfio_ap_private.h |  2 +
>  3 files changed, 60 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index 73bd073fd5d3..8934471b7944 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -147,6 +147,7 @@ static int __init vfio_ap_init(void)
>   memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
>   vfio_ap_drv.probe = vfio_ap_mdev_probe_queue;
>   vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
> + vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
>   vfio_ap_drv.ids = ap_queue_ids;
>  
>   ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 1357f8f8b7e4..9e9fad560859 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -522,18 +522,40 @@ vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
>   return 0;
>  }
>  
> +#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
> +  "already assigned to %s"
> +
> +static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
> +  unsigned long *apm,
> +  unsigned long *aqm)
> +{
> + unsigned long apid, apqi;
> +
> + for_each_set_bit_inv(apid, apm, AP_DEVICES)
> + for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
> + pr_err(MDEV_SHARING_ERR, apid, apqi, mdev_name);

Isn't pr_err() rather severe for this? For my taste, even a warning would be
too severe.

> +}
> +
>  /**
>   * vfio_ap_mdev_verify_no_sharing
>   *
> - * Verifies that the APQNs derived from the cross product of the AP adapter IDs
> - * and AP queue indexes comprising the AP matrix are not configured for another
> + * Verifies that each APQN derived from the cross product of the AP adapter IDs
> + * and AP queue indexes comprising an AP matrix is not assigned to a
>   * mediated device. AP queue sharing is not allowed.
>   *
> - * @matrix_mdev: the mediated matrix device
> + * @matrix_mdev: the mediated matrix device to which the APQNs being verified
> + *are assigned. If the value is not NULL, then verification will
> + *proceed for all other matrix mediated devices; otherwise, all
> + *matrix mediated devices will be verified.
> + * @mdev_apm: mask indicating the APIDs of the APQNs to be verified
> + * @mdev_aqm: mask indicating the APQIs of the APQNs to be verified
>   *
> - * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
> + * Returns 0 if no APQNs are shared; otherwise, returns -EADDRINUSE if one
> + * or more APQNs are shared.
>   */
> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
> +static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
> +   unsigned long *mdev_apm,
> +   unsigned long *mdev_aqm)
>  {
>   struct ap_matrix_mdev *lstdev;
>   DECLARE_BITMAP(apm, AP_DEVICES);
> @@ -550,14 +572,15 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>* We work on full longs, as we can only exclude the leftover
>* bits in non-inverse order. The leftover is all zeros.
>*/
> - if (!bitmap_and(apm, matrix_mdev->matrix.apm,
> - lstdev->matrix.apm, AP_DEVICES))
> + if (!bitmap_and(apm, mdev_apm, lstdev->matrix.apm, AP_DEVICES))
>   continue;
>  
> - if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
> - lstdev->matrix.aqm, AP_DOMAINS))
> + if (!bitmap_and(aqm, mdev_aqm, lstdev->matrix.aqm, AP_DOMAINS))
>   continue;
>  
> + vfio_ap_mdev_log_sharing_err(dev_name(mdev_dev(lstdev->mdev)),
> +  apm, aqm);
> +
>   return -EADDRINUSE;
>   }
>  
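The core of vfio_ap_mdev_verify_no_sharing() is that two matrices share an APQN exactly when their adapter masks intersect and their domain masks intersect: any APID from the first intersection combined with any APQI from the second is an APQN present in both cross products. A user-space sketch, assuming a hypothetical `sketch_bitmap_and()` that behaves like the kernel's bitmap_and() (stores the AND and reports whether it is non-empty). Bit order does not matter for this test, which is why the patch can work on full longs.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of longs needed for a 256-bit AP mask. */
#define SKETCH_WORDS (256 / (sizeof(unsigned long) * CHAR_BIT))

/* Like the kernel's bitmap_and(): dst = a & b, true if dst is non-empty. */
static bool sketch_bitmap_and(unsigned long *dst, const unsigned long *a,
			      const unsigned long *b, size_t words)
{
	bool any = false;

	for (size_t i = 0; i < words; i++) {
		dst[i] = a[i] & b[i];
		if (dst[i])
			any = true;
	}
	return any;
}

/* True if the (apm1 x aqm1) and (apm2 x aqm2) cross products share an APQN. */
static bool sketch_matrices_share_apqn(const unsigned long *apm1,
				       const unsigned long *aqm1,
				       const unsigned long *apm2,
				       const unsigned long *aqm2)
{
	unsigned long apm[SKETCH_WORDS];
	unsigned long aqm[SKETCH_WORDS];

	if (!sketch_bitmap_and(apm, apm1, apm2, SKETCH_WORDS))
		return false;	/* no common adapter */
	return sketch_bitmap_and(aqm, aqm1, aqm2, SKETCH_WORDS);
}
```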
> @@ -683,6 +706,7 @@ static ssize_t assign_adapter_store(struct device *dev,
>  {
>   int ret;
>   unsigned long apid;
> + DECLARE_BITMAP(apm, 

Re: [PATCH v11 04/14] s390/zcrypt: driver callback to indicate resource in use

2020-10-27 Thread Halil Pasic
On Thu, 22 Oct 2020 13:11:59 -0400
Tony Krowiak  wrote:

> Introduces a new driver callback to prevent a root user from unbinding
> an AP queue from its device driver if the queue is in use. The callback
> will be invoked whenever a change to the AP bus's sysfs apmask or aqmask
> attributes would result in one or more AP queues being removed from its
> driver. If the callback responds in the affirmative for any driver
> queried, the change to the apmask or aqmask will be rejected with a device
> in use error.

As discussed last time, there seems to be nothing that would prevent
a resource from becoming in use between the in_use() callback returning
false and the resource being removed as a result of ap_bus_revise_bindings().

Another thing that may be of interest is that we now hold the
ap_perms_mutex for the in_use() checks. The ap_perms_mutex is used
in ap_device_probe(), and I don't quite understand some of its
usages in zcrypt_api.c. My feeling is that the extra pressure on that
lock should not be a problem, unless in_use() were to not return
because of some deadlock.

With all that said if Harald is fine with it, so am I.

Acked-by: Halil Pasic 

> 
> For this patch, only non-default drivers will be queried. Currently,
> there is only one non-default driver, the vfio_ap device driver. The
> vfio_ap device driver facilitates pass-through of an AP queue to a
> guest. The idea here is that a guest may be administered by a different
> sysadmin than the host and we don't want AP resources to unexpectedly
> disappear from a guest's AP configuration (i.e., adapters and domains
> assigned to the matrix mdev). This will enforce the proper procedure for
> removing AP resources intended for guest usage which is to
> first unassign them from the matrix mdev, then unbind them from the
> vfio_ap device driver.
> 
> Signed-off-by: Tony Krowiak 
>


Re: [PATCH v11 03/14] s390/vfio-ap: manage link between queue struct and matrix mdev

2020-10-27 Thread Halil Pasic
On Thu, 22 Oct 2020 13:11:58 -0400
Tony Krowiak  wrote:

> Let's create links between each queue device bound to the vfio_ap device
> driver and the matrix mdev to which the queue is assigned. The idea is to
> facilitate efficient retrieval of the objects representing the queue
> devices and matrix mdevs as well as to verify that a queue assigned to
> a matrix mdev is bound to the driver.
> 
> The links will be created as follows:
> 
>* When the queue device is probed, if its APQN is assigned to a matrix
>  mdev, the structures representing the queue device and the matrix mdev
>  will be linked.
> 
>* When an adapter or domain is assigned to a matrix mdev, for each new
>  APQN assigned that references a queue device bound to the vfio_ap
>  device driver, the structures representing the queue device and the
>  matrix mdev will be linked.
> 
> The links will be removed as follows:
> 
>* When the queue device is removed, if its APQN is assigned to a matrix
>  mdev, the structures representing the queue device and the matrix mdev
>  will be unlinked.
> 
>* When an adapter or domain is unassigned from a matrix mdev, for each
>  APQN unassigned that references a queue device bound to the vfio_ap
>  device driver, the structures representing the queue device and the
>  matrix mdev will be unlinked.
> 

I would prefer if the changes to the q->matrix_mdev link were restricted
to this patch. Patches 1 and 2 do some of that stuff as well. See my
comments in the code. 

> Signed-off-by: Tony Krowiak 
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 146 +++---
>  drivers/s390/crypto/vfio_ap_private.h |   3 +
>  2 files changed, 135 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 049b97d7444c..1357f8f8b7e4 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -28,7 +28,6 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>  
>  /**
>   * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
> - * @matrix_mdev: the associated mediated matrix
>   * @apqn: The queue APQN
>   *
>   * Retrieve a queue with a specific APQN from the AP queue devices attached to
> @@ -36,18 +35,11 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>   *
>   * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
>   */
> -static struct vfio_ap_queue *vfio_ap_get_queue(
> - struct ap_matrix_mdev *matrix_mdev,
> - unsigned long apqn)
> +static struct vfio_ap_queue *vfio_ap_get_queue(unsigned long apqn)
>  {
>   struct ap_queue *queue;
>   struct vfio_ap_queue *q = NULL;
>  
> - if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
> - return NULL;
> - if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
> - return NULL;
> -
>   queue = ap_get_qdev(apqn);
>   if (!queue)
>   return NULL;

Patch 2 removed
q->matrix_mdev = matrix_mdev;
because patch 1 made it redundant. But patch 1 should not have made it
redundant in the first place.

It should be removed in this patch.

> @@ -60,6 +52,19 @@ static struct vfio_ap_queue *vfio_ap_get_queue(
>   return q;
>  }
>  
> +static struct vfio_ap_queue *
> +vfio_ap_mdev_get_queue(struct ap_matrix_mdev *matrix_mdev, unsigned long apqn)
> +{
> + struct vfio_ap_queue *q;
> +
> + hash_for_each_possible(matrix_mdev->qtable, q, mdev_qnode, apqn) {
> + if (q && (q->apqn == apqn))
> + return q;
> + }
> +
> + return NULL;
> +}
> +
>  /**
>   * vfio_ap_wait_for_irqclear
>   * @apqn: The AP Queue number
> @@ -171,7 +176,6 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
> status.response_code);
>  end_free:
>   vfio_ap_free_aqic_resources(q);
> - q->matrix_mdev = NULL;
>   return status;
>  }
>  
> @@ -284,14 +288,14 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>  
>   if (!vcpu->kvm->arch.crypto.pqap_hook)
>   goto out_unlock;
> +
>   matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
>  struct ap_matrix_mdev, pqap_hook);
>  
> - q = vfio_ap_get_queue(matrix_mdev, apqn);
> + q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
>   if (!q)
>   goto out_unlock;
>  
> - q->matrix_mdev = matrix_mdev;

This was unnecessarily added in patch 1; now it is removed.

>   status = vcpu->run->s.regs.gprs[1];
>  
>   /* If IR bit(16) is set we enable the interrupt */
> @@ -331,6 +335,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>  
>   matrix_mdev->mdev = mdev;
>   vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
> + hash_init(matrix_mdev->qtable);
>  

Re: [PATCH v11 02/14] 390/vfio-ap: use new AP bus interface to search for queue devices

2020-10-27 Thread Halil Pasic
On Thu, 22 Oct 2020 13:11:57 -0400
Tony Krowiak  wrote:

> This patch refactors the vfio_ap device driver to use the AP bus's
> ap_get_qdev() function to retrieve the vfio_ap_queue struct containing
> information about a queue that is bound to the vfio_ap device driver.
> The bus's ap_get_qdev() function retrieves the queue device from a
> hashtable keyed by APQN. This is much more efficient than looping over
> the list of devices attached to the AP bus by several orders of
> magnitude.
> 
> Signed-off-by: Tony Krowiak 

Reviewed-by: Halil Pasic 

> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 35 +--
>  1 file changed, 14 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index c471832f0a30..049b97d7444c 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -26,43 +26,36 @@
>  
>  static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>  
> -static int match_apqn(struct device *dev, const void *data)
> -{
> - struct vfio_ap_queue *q = dev_get_drvdata(dev);
> -
> - return (q->apqn == *(int *)(data)) ? 1 : 0;
> -}
> -
>  /**
> - * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
> + * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
>   * @matrix_mdev: the associated mediated matrix
>   * @apqn: The queue APQN
>   *
> - * Retrieve a queue with a specific APQN from the list of the
> - * devices of the vfio_ap_drv.
> - * Verify that the APID and the APQI are set in the matrix.
> + * Retrieve a queue with a specific APQN from the AP queue devices attached to
> + * the AP bus.
>   *
> - * Returns the pointer to the associated vfio_ap_queue
> + * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
>   */
>  static struct vfio_ap_queue *vfio_ap_get_queue(
>   struct ap_matrix_mdev *matrix_mdev,
> - int apqn)
> + unsigned long apqn)
>  {
> - struct vfio_ap_queue *q;
> - struct device *dev;
> + struct ap_queue *queue;
> + struct vfio_ap_queue *q = NULL;
>  
>   if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
>   return NULL;
>   if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
>   return NULL;
>  
> - dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> -  &apqn, match_apqn);
> - if (!dev)
> + queue = ap_get_qdev(apqn);
> + if (!queue)
>   return NULL;
> - q = dev_get_drvdata(dev);
> - q->matrix_mdev = matrix_mdev;
> - put_device(dev);
> +
> + if (queue->ap_dev.device.driver == &matrix_dev->vfio_ap_drv->driver)
> + q = dev_get_drvdata(&queue->ap_dev.device);
> +

Needs to be called with the vfio_ap lock held, right? Otherwise the queue could
get unbound while we are working with it as a vfio_ap_queue... Nothing
new, but might be worth documenting.
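One way to both document and check that requirement is an assertion at the top of the lookup, which in the kernel would presumably be lockdep_assert_held(&matrix_dev->lock). The sketch below is a single-threaded userspace model: the lock is only a "held" flag, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for matrix_dev->lock: only records whether it is held. */
struct model_lock { bool held; };

static struct model_lock matrix_lock;

static void model_lock_acquire(struct model_lock *l)
{
	assert(!l->held); /* single-threaded model: re-acquiring is a bug */
	l->held = true;
}

static void model_lock_release(struct model_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Hypothetical stand-in for a vfio_ap_queue and its binding state. */
struct vq { unsigned long apqn; bool bound; };

/*
 * Caller must hold matrix_lock; otherwise the queue could be unbound
 * while the caller is still using the returned pointer.
 */
static struct vq *get_queue_locked(struct vq *qs, size_t n, unsigned long apqn)
{
	size_t i;

	assert(matrix_lock.held); /* plays the role of lockdep_assert_held() */
	for (i = 0; i < n; i++)
		if (qs[i].bound && qs[i].apqn == apqn)
			return &qs[i];
	return NULL;
}

/* The unbind path takes the same lock; in this model, calling it while
 * the lock is already held trips the assertion in model_lock_acquire(). */
static void unbind_queue(struct vq *q)
{
	model_lock_acquire(&matrix_lock);
	q->bound = false;
	model_lock_release(&matrix_lock);
}
```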

> + put_device(&queue->ap_dev.device);
>  
>   return q;
>  }
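The efficiency claim in the commit message comes down to scanning one hash bucket instead of every device bound to the driver. A minimal sketch of a chained table keyed by APQN, assuming a fixed bucket count and a simple modulo hash; this is neither the AP bus's internal table nor the kernel's <linux/hashtable.h> API, and all names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define QTABLE_BUCKETS 64 /* hypothetical bucket count */

/* Hypothetical stand-in for struct vfio_ap_queue. */
struct vq {
	unsigned long apqn;
	struct vq *next; /* next entry in the same bucket */
};

struct qtable {
	struct vq *bucket[QTABLE_BUCKETS];
};

static size_t qtable_hash(unsigned long apqn)
{
	return apqn % QTABLE_BUCKETS;
}

/* Scans only the bucket apqn hashes to, re-checking the key on each entry. */
static struct vq *qtable_find(const struct qtable *t, unsigned long apqn)
{
	struct vq *q;

	for (q = t->bucket[qtable_hash(apqn)]; q; q = q->next)
		if (q->apqn == apqn)
			return q;
	return NULL;
}

static void qtable_add(struct qtable *t, struct vq *q)
{
	size_t b = qtable_hash(q->apqn);

	assert(qtable_find(t, q->apqn) == NULL); /* APQNs are unique keys */
	q->next = t->bucket[b];
	t->bucket[b] = q;
}
```

Because different APQNs can share a bucket, the key has to be compared again on every hit; the hash_for_each_possible() loop in vfio_ap_mdev_get_queue() does the same with q->apqn == apqn.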



Re: [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset

2020-10-27 Thread Halil Pasic
On Thu, 22 Oct 2020 13:11:56 -0400
Tony Krowiak  wrote:

> The queues assigned to a matrix mediated device are currently reset when:
> 
> * The VFIO_DEVICE_RESET ioctl is invoked
> * The mdev fd is closed by userspace (QEMU)
> * The mdev is removed from sysfs.

What about the situation when vfio_ap_mdev_group_notifier() is called to
tell us that our pointer to KVM is about to become invalid? Do we need to
clean up the IRQ stuff there?

> 
> Immediately after the reset of a queue, a call is made to disable
> interrupts for the queue. This is entirely unnecessary because the reset of
> a queue disables interrupts, so this will be removed.

Makes sense.

> 
> Since interrupt processing may have been enabled by the guest, it may also
> be necessary to clean up the resources used for interrupt processing. Part
> of the cleanup operation requires a reference to KVM, so a check is also
> being added to ensure the reference to KVM exists. The reason is because
> the release callback - invoked when userspace closes the mdev fd - removes
> the reference to KVM. When the remove callback - called when the mdev is
> removed from sysfs - is subsequently invoked, there will be no reference to
> KVM when the cleanup is performed.

Please see below in the code.

> 
> This patch will also do a bit of refactoring due to the fact that the
> remove callback, implemented in vfio_ap_drv.c, disables the queue after
> resetting it. Instead of the remove callback making a call into the
> vfio_ap_ops.c to clean up the resources used for interrupt processing,
> let's move the probe and remove callbacks into the vfio_ap_ops.c
> file to keep all code related to managing queues in a single file.
>

It would have been helpful to split out the refactoring as a separate
patch. As it is, the moved code is harder to review, because
it is intermingled with the changes that intend to change behavior.
 
> Signed-off-by: Tony Krowiak 
> ---
>  drivers/s390/crypto/vfio_ap_drv.c | 45 +--
>  drivers/s390/crypto/vfio_ap_ops.c | 63 +++
>  drivers/s390/crypto/vfio_ap_private.h |  7 +--
>  3 files changed, 52 insertions(+), 63 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index be2520cc010b..73bd073fd5d3 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -43,47 +43,6 @@ static struct ap_device_id ap_queue_ids[] = {
>  
>  MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
>  
> -/**
> - * vfio_ap_queue_dev_probe:
> - *
> - * Allocate a vfio_ap_queue structure and associate it
> - * with the device as driver_data.
> - */
> -static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
> -{
> - struct vfio_ap_queue *q;
> -
> - q = kzalloc(sizeof(*q), GFP_KERNEL);
> - if (!q)
> - return -ENOMEM;
> - dev_set_drvdata(&apdev->device, q);
> - q->apqn = to_ap_queue(&apdev->device)->qid;
> - q->saved_isc = VFIO_AP_ISC_INVALID;
> - return 0;
> -}
> -
> -/**
> - * vfio_ap_queue_dev_remove:
> - *
> - * Takes the matrix lock to avoid actions on this device while removing
> - * Free the associated vfio_ap_queue structure
> - */
> -static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
> -{
> - struct vfio_ap_queue *q;
> - int apid, apqi;
> -
> - mutex_lock(&matrix_dev->lock);
> - q = dev_get_drvdata(&apdev->device);
> - dev_set_drvdata(&apdev->device, NULL);
> - apid = AP_QID_CARD(q->apqn);
> - apqi = AP_QID_QUEUE(q->apqn);
> - vfio_ap_mdev_reset_queue(apid, apqi, 1);
> - vfio_ap_irq_disable(q);
> - kfree(q);
> - mutex_unlock(&matrix_dev->lock);
> -}
> -
>  static void vfio_ap_matrix_dev_release(struct device *dev)
>  {
>   struct ap_matrix_dev *matrix_dev = dev_get_drvdata(dev);
> @@ -186,8 +145,8 @@ static int __init vfio_ap_init(void)
>   return ret;
>  
>   memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
> - vfio_ap_drv.probe = vfio_ap_queue_dev_probe;
> - vfio_ap_drv.remove = vfio_ap_queue_dev_remove;
> + vfio_ap_drv.probe = vfio_ap_mdev_probe_queue;
> + vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
>   vfio_ap_drv.ids = ap_queue_ids;
>  
>   ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index e0bde8518745..c471832f0a30 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -119,7 +119,8 @@ static void vfio_ap_wait_for_irqclear(int apqn)
>   */
>  static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
>  {
> - if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev)
> + if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev &&
> + q->matrix_mdev->kvm)

Here is the check for the KVM reference that you mentioned in the
cover letter. You make only the gisc_unregister depend on it, because
that's what is going to explode.
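To make the sequence concrete: the release callback drops the KVM reference, and the remove callback later still runs the IRQ cleanup, so only the step that dereferences KVM (the GISC unregistration) needs the extra check. A userspace model loosely based on vfio_ap_free_aqic_resources(); the structs, ISC_INVALID and the counter standing in for kvm_s390_gisc_unregister() are hypothetical, and the saved_pfn/pinned_pages step is an assumed unpinning step that needs the mdev but not KVM:

```c
#include <assert.h>
#include <stddef.h>

#define ISC_INVALID 0xff /* hypothetical stand-in for VFIO_AP_ISC_INVALID */

struct kvm_model  { int gisc_registrations; };
struct mdev_model { struct kvm_model *kvm; int pinned_pages; };
struct queue_model {
	unsigned char saved_isc;
	unsigned long saved_pfn;
	struct mdev_model *matrix_mdev;
};

/* Stand-in for kvm_s390_gisc_unregister(): needs a valid KVM pointer. */
static void gisc_unregister(struct kvm_model *kvm)
{
	assert(kvm != NULL);
	kvm->gisc_registrations--;
}

static void free_aqic_resources(struct queue_model *q)
{
	/* Only this step dereferences KVM, so only it checks for KVM. */
	if (q->saved_isc != ISC_INVALID && q->matrix_mdev && q->matrix_mdev->kvm)
		gisc_unregister(q->matrix_mdev->kvm);
	/* Assumed unpin step: needs the mdev, not KVM. */
	if (q->saved_pfn && q->matrix_mdev)
		q->matrix_mdev->pinned_pages--;
	q->saved_pfn = 0;
	q->saved_isc = ISC_INVALID;
}
```

In the release-then-remove order, the model skips only the GISC step and still clears the saved state.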

But I'm actually 

Re: [RFCv2 05/16] x86/kvm: Make VirtIO use DMA API in KVM guest

2020-10-21 Thread Halil Pasic
On Tue, 20 Oct 2020 09:18:48 +0300
"Kirill A. Shutemov"  wrote:

> VirtIO for KVM is a primary way to provide IO. All memory that is used for
> communication with the host has to be marked as shared.
> 
> The easiest way to achieve that is to use DMA API that already knows how
> to deal with shared memory.
> 
> Signed-off-by: Kirill A. Shutemov 
> ---
>  drivers/virtio/virtio_ring.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index becc77697960..ace733845d5d 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -12,6 +12,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #ifdef DEBUG
>  /* For development, we want to crash whenever the ring is screwed. */
> @@ -255,6 +256,9 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
>   if (xen_domain())
>   return true;
>  
> + if (kvm_mem_protected())
> + return true;
> +

I guess it does not matter because of Christoph's comment, but this breaks
the build for s390, because there is no kvm_mem_protected() for s390.

Regards,
Halil

>   return false;
>  }
>  
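A common way to keep a helper such as kvm_mem_protected() from breaking architectures that do not provide it is a static inline fallback in the common header, selected by a config symbol. A userspace sketch; CONFIG_HAVE_KVM_PROTECTED_MEMORY is a hypothetical symbol, not one this series is known to define, and vring_use_dma_api_model() is a simplified stand-in for vring_use_dma_api():

```c
#include <assert.h>
#include <stdbool.h>

/*
 * An architecture that implements the helper would select the
 * (hypothetical) CONFIG_HAVE_KVM_PROTECTED_MEMORY and provide the
 * out-of-line definition; everyone else gets the inline fallback.
 */
#ifdef CONFIG_HAVE_KVM_PROTECTED_MEMORY
bool kvm_mem_protected(void);
#else
static inline bool kvm_mem_protected(void)
{
	return false; /* architectures without the helper, e.g. s390 here */
}
#endif

/* Simplified stand-in for vring_use_dma_api(). */
static bool vring_use_dma_api_model(bool xen_domain)
{
	if (xen_domain)
		return true;
	if (kvm_mem_protected())
		return true;
	return false;
}
```

With the fallback in place, common code such as virtio_ring.c compiles everywhere, and the check simply evaluates to false where the feature does not exist.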



Re: [RFCv2 14/16] KVM: Handle protected memory in __kvm_map_gfn()/__kvm_unmap_gfn()

2020-10-21 Thread Halil Pasic
On Tue, 20 Oct 2020 09:18:57 +0300
"Kirill A. Shutemov"  wrote:

> We cannot access protected pages directly. Use ioremap() to
> create a temporary mapping of the page. The mapping is destroyed
> on __kvm_unmap_gfn().
> 
> The new interface gfn_to_pfn_memslot_protected() is used to detect if
> the page is protected.
> 
> ioremap_cache_force() is a hack to bypass IORES_MAP_SYSTEM_RAM check in
> the x86 ioremap code. We need a better solution.
> 
> Signed-off-by: Kirill A. Shutemov 
> ---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c|  2 +-
>  arch/powerpc/kvm/book3s_64_mmu_radix.c |  2 +-
>  arch/x86/include/asm/io.h  |  2 +
>  arch/x86/include/asm/pgtable_types.h   |  1 +
>  arch/x86/kvm/mmu/mmu.c |  6 ++-
>  arch/x86/mm/ioremap.c  | 16 ++--
>  include/linux/kvm_host.h   |  3 +-
>  include/linux/kvm_types.h  |  1 +
>  virt/kvm/kvm_main.c| 52 +++---
>  9 files changed, 63 insertions(+), 22 deletions(-)
> 

You declare ioremap_cache_force() in arch/x86/include/asm/io.h and
define it in arch/x86/mm/ioremap.c, which is architecture-specific code,
but use it in __kvm_map_gfn() in virt/kvm/kvm_main.c, which is common
code.

Thus your series breaks the build for the s390 architecture. Have you
tried to (cross) compile for s390?

Regards,
Halil
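Another common kernel idiom for this kind of problem is an architecture override: the arch header declares its version and defines a macro with the same name, and common code supplies a fallback only when that macro is absent. A userspace sketch; the simplified signature, the NULL-returning fallback and map_protected_page() are assumptions for illustration, not the series' code:

```c
#include <assert.h>
#include <stddef.h>

/*
 * An architecture header that implements the hook would contain:
 *   void *ioremap_cache_force(unsigned long phys, size_t size);
 *   #define ioremap_cache_force ioremap_cache_force
 * Common code only supplies the fallback when that macro is absent.
 */
#ifndef ioremap_cache_force
static inline void *ioremap_cache_force(unsigned long phys, size_t size)
{
	(void)phys;
	(void)size;
	return NULL; /* no forced mapping available on this architecture */
}
#endif

/* Hypothetical common-code caller in the style of __kvm_map_gfn(). */
static int map_protected_page(unsigned long phys, void **out)
{
	void *p;

	assert(out != NULL);
	p = ioremap_cache_force(phys, 4096);
	if (!p)
		return -1; /* caller must handle "cannot map" */
	*out = p;
	return 0;
}
```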


Re: [PATCH v10 11/16] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device

2020-10-05 Thread Halil Pasic
On Mon, 5 Oct 2020 12:24:39 -0400
Tony Krowiak  wrote:

> 
> 
> On 9/27/20 9:01 PM, Halil Pasic wrote:
> > On Fri, 21 Aug 2020 15:56:11 -0400
> > Tony Krowiak  wrote:
> >
> >> Let's hot plug/unplug adapters, domains and control domains assigned to or
> >> unassigned from an AP matrix mdev device while it is in use by a guest per
> >> the following:
> >>
> >> * When the APID of an adapter is assigned to a matrix mdev in use by a KVM
> >>guest, the adapter will be hot plugged into the KVM guest as long as each
> >>APQN derived from the Cartesian product of the APID being assigned and
> >>the APQIs already assigned to the guest's CRYCB references a queue device
> >>bound to the vfio_ap device driver.
> >>
> >> * When the APID of an adapter is unassigned from a matrix mdev in use by a
> >>KVM guest, the adapter will be hot unplugged from the KVM guest.
> >>
> >> * When the APQI of a domain is assigned to a matrix mdev in use by a KVM
> >>guest, the domain will be hot plugged into the KVM guest as long as each
> >>APQN derived from the Cartesian product of the APQI being assigned and
> >>the APIDs already assigned to the guest's CRYCB references a queue device
> >>bound to the vfio_ap device driver.
> >>
> >> * When the APQI of a domain is unassigned from a matrix mdev in use by a
> >>KVM guest, the domain will be hot unplugged from the KVM guest
> > Hm, I suppose this means that what your guest effectively gets may depend
> > on whether assign_domain or assign_adapter is done first.
> >
> > Suppose we have the queues
> > 0.0 0.1
> > 1.0
> > bound to vfio_ap, i.e. 1.1 is missing for a reason different than
> > belonging to the default drivers (for what exact reason no idea).
> 
> I'm not quite sure what you mean by "we have queue". I will
> assume you mean those queues are bound to the vfio_ap
> device driver. 

Yes, this is exactly what I've meant.


> The only way this could happen is if somebody
> manually unbinds queue 1.1.
> 

Assuming that:
1) every time we observe ap_perm, the ap subsystem is in a settled state
(i.e. not in the middle of pushing things left and right
because of an ap_perm change),
2) the only non-default driver is vfio_ap, and that
3) queues handle non-operational states by other means than disappearing
(should be the case with the latest reworks),
I agree what is left is manual unbind, which I lean towards considering
an edge case.

If this is indeed just about that edge case, maybe we can live with a
simpler algorithm than this one.


> > Let's suppose we started with the matrix containing only adapter
> > 0 (0.) and domain 0 (.0).
> >
> > After echo 1 > assign_adapter && echo 1 > assign_domain we end up with
> > matrix:
> > 0.0 0.1
> > 1.0 1.1
> > guest_matrix:
> > 0.0 0.1
> > while after echo 1 > assign_domain && echo 1 > assign_adapter we end up
> > with:
> > matrix:
> > 0.0 0.1
> > 1.0 1.1
> > guest_matrix:
> > 0.0
> > 0.1
> >
> > That means, the set of bound queues and the set of assigned resources do
> > not fully determine the set of resources passed through to the guest.
> >
> > Is that a deliberate design choice?
> 
> Yes, it is a deliberate choice to only allow guest access to queues
> represented by queue devices bound to the vfio_ap device driver.
> The idea here is to adhere to the linux device model.
> 

This is not what I've asked. My question was about the fact that
reordering assignments gives different results. Well, this was kind
of the case before as well, with the notable difference that in the
past we always had an error. So if a full sequence of assignments could
be performed without an error, then any permutation would be performed
with the exact same result.

I'm all for only allowing guest access to queues represented by queue
devices bound to the vfio_ap device driver. I'm concerned with the
permutation (and calculus).
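The order dependence can be reproduced with a small model of the hot-plug rule from the patch description: a newly assigned adapter is plugged only if every APQN it forms with the domains already in the guest is bound to vfio_ap, and symmetrically for domains. A userspace sketch over a 4x4 matrix, assuming the guest starts with adapter 0 and domain 0 and that queues 0.0, 0.1 and 1.0 are bound while 1.1 is not; the names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

#define N 4 /* small matrix for illustration */

struct guest { bool apid[N]; bool apqi[N]; };

/* bound[a][d] is true if queue a.d is bound to vfio_ap. */
static bool can_plug_adapter(const struct guest *g, bool bound[N][N], int a)
{
	for (int d = 0; d < N; d++)
		if (g->apqi[d] && !bound[a][d])
			return false;
	return true;
}

static bool can_plug_domain(const struct guest *g, bool bound[N][N], int d)
{
	for (int a = 0; a < N; a++)
		if (g->apid[a] && !bound[a][d])
			return false;
	return true;
}

static void assign_adapter(struct guest *g, bool bound[N][N], int a)
{
	if (can_plug_adapter(g, bound, a))
		g->apid[a] = true;
}

static void assign_domain(struct guest *g, bool bound[N][N], int d)
{
	if (can_plug_domain(g, bound, d))
		g->apqi[d] = true;
}

/* Number of queues the guest can actually use. */
static int guest_queues(const struct guest *g)
{
	int n = 0;

	for (int a = 0; a < N; a++)
		for (int d = 0; d < N; d++)
			n += g->apid[a] && g->apqi[d];
	return n;
}
```

Assigning adapter 1 first leaves the guest with adapters {0, 1} and domain {0}; assigning domain 1 first leaves it with adapter {0} and domains {0, 1}. Both orders end with the same assigned matrix, but the guest gets a different pair of queues.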

> >
> >> * When the domain number of a control domain is assigned to a matrix mdev
> >>in use by a KVM guest, the control domain will be hot plugged into the
> >>KVM guest.
> >>
> >> * When the domain number of a control domain is unassigned from a matrix
> >>mdev in use by a KVM guest, the control domain will be hot unplugged
> >>from the KVM guest.
> >>
> >> Signed-off-by: Tony Krowiak
> >> ---

[..]

> >> +static bool vfio_ap_mdev_assign_guest_apid(struct ap_matrix_mdev 
> >>
