Re: How to resolve an issue in swiotlb environment?

2019-06-19 Thread Alan Stern
On Wed, 19 Jun 2019, shuah wrote:

> I missed a lot of the thread info and went looking for it, and found the
> following summary of the problem:
> 
> ==
> The issue which prompted the commit this thread is about arose in a
> situation where the block layer set up a scatterlist containing buffer
> sizes something like:
> 
>   4096 4096 1536 1024
> 
> and the maximum packet size was 1024.  The situation was a little
> unusual, because it involved vhci-hcd (a virtual HCD).  This doesn't
> matter much in normal practice because:
> 
>   Block devices normally have a block size of 512 bytes or more.
>   Smaller values are very uncommon.  So scatterlist element sizes
>   are always divisible by 512.
> 
>   xHCI is the only USB host controller type with a maximum packet
>   size larger than 512, and xHCI hardware can do full
>   scatter-gather so it doesn't care what the buffer sizes are.
> 
> So another approach would be to fix vhci-hcd and then trust that the
> problem won't arise again, for the reasons above.  We would be okay so
> long as nobody tried to use a USB-SCSI device with a block size of 256
> bytes or less.
> ===
> 
> Out of the summary, the following gives me pause:
> 
> "xHCI hardware can do full scatter-gather so it doesn't care what the
> buffer sizes are."
> 
> vhci-hcd won't be able to count on hardware being able to do full
> scatter-gather. It has to deal with a variety of hardware with
> varying speeds.

Sure.  But you can test whether the server's HCD is able to handle 
scatter-gather transfers, and if it is then you can say that the 
client-side vhci-hcd is able to handle them as well.  Then all you 
would have to do is preserve the scatterlist information describing the 
transfer when you go between the client and the server.

The point is to make sure that the client-side vhci-hcd doesn't claim
to be _less_ capable than the server-side actual HCD.  That's what
leads to the problem described above.
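
For illustration, a minimal sketch of the client-side change, assuming
the hook point (vhci_start()) and the decision to claim unlimited SG;
both are placeholders, not a statement of how vhci-hcd actually does it:

	#include <linux/usb.h>
	#include <linux/usb/hcd.h>

	/*
	 * Sketch: once the usbip transport preserves scatterlists, the
	 * client-side HCD can advertise SG support and leave the real
	 * constraints to the server-side HCD.
	 */
	static int vhci_start(struct usb_hcd *hcd)
	{
		/* ... existing vhci_start() setup ... */

		/* accept any number of SG entries per URB */
		hcd->self.sg_tablesize = ~0;
		hcd->self.no_sg_constraint = 1;

		return 0;
	}

usb-storage and uas would then pick this up through the usual
bus->sg_tablesize path when they configure their SCSI hosts.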

> "We would be okay so long as nobody tried to use a USB-SCSI device with
> a block size of 256 bytes or less."
> 
> At least the USB storage device I test with reports a 512-byte block
> size. Can we count on not seeing a device with block size <= 256 bytes?

Yes, we can.  In fact, the SCSI core doesn't handle devices with block 
size < 512.

> In any case, I am looking into adding SG support to vhci-hcd at the moment.
> 
> Looks like the following is the repo I should be working with?
> 
> git://git.infradead.org/users/hch/misc.git

It doesn't matter.  Your work should end up being independent of 
Christoph's, so you can base it on any repo.

Alan Stern



Re: How to resolve an issue in swiotlb environment?

2019-06-19 Thread shuah

Hi Alan,

> On 6/18/19 9:28 AM, shuah wrote:
>> On 6/14/19 8:44 AM, Alan Stern wrote:
>>> On Thu, 13 Jun 2019, shuah wrote:
>>>
>>> Great!  So all we have to do is fix vhci-hcd.  Then we can remove all
>>> the virt_boundary_mask stuff from usb-storage and uas entirely.
>>>
>>> (I'm assuming wireless USB isn't a genuine issue.  As far as I know, it
>>> is pretty much abandoned at this point.)
>>>
>>> Valentina and Shuah: Adding SG support to vhci-hcd shouldn't be too
>>> hard.  It ought to be possible even without changing the network
>>> protocol.
>>
>> I will start taking a look at this. Is there a target release in plan
>> to drop the virt_boundary_mask stuff?
>
> Not yet.  But since it doesn't do what we want anyway, this should be
> fixed quickly.





I missed a lot of the thread info and went looking for it, and found the
following summary of the problem:

==
The issue which prompted the commit this thread is about arose in a
situation where the block layer set up a scatterlist containing buffer
sizes something like:

4096 4096 1536 1024

and the maximum packet size was 1024.  The situation was a little
unusual, because it involved vhci-hcd (a virtual HCD).  This doesn't
matter much in normal practice because:

Block devices normally have a block size of 512 bytes or more.
Smaller values are very uncommon.  So scatterlist element sizes
are always divisible by 512.

xHCI is the only USB host controller type with a maximum packet
size larger than 512, and xHCI hardware can do full
scatter-gather so it doesn't care what the buffer sizes are.

So another approach would be to fix vhci-hcd and then trust that the
problem won't arise again, for the reasons above.  We would be okay so
long as nobody tried to use a USB-SCSI device with a block size of 256
bytes or less.
===

Out of the summary, the following gives me pause:

"xHCI hardware can do full scatter-gather so it doesn't care what the
buffer sizes are."

vhci-hcd won't be able to count on hardware being able to do full
scatter-gather. It has to deal with a variety of hardware with
varying speeds.

"We would be okay so long as nobody tried to use a USB-SCSI device with
a block size of 256 bytes or less."

At least the USB storage device I test with reports a 512-byte block
size. Can we count on not seeing a device with block size <= 256 bytes?

In any case, I am looking into adding SG support to vhci-hcd at the moment.

Looks like the following is the repo I should be working with?

git://git.infradead.org/users/hch/misc.git

thanks,
-- Shuah


Re: [PATCH v4 03/22] iommu: Introduce device fault report API

2019-06-19 Thread Jonathan Cameron
On Sun, 9 Jun 2019 06:44:03 -0700
Jacob Pan  wrote:

> Traditionally, device specific faults are detected and handled within
> their own device drivers. When the IOMMU is enabled, faults such as
> those on DMA transactions are detected by the IOMMU. There is no
> generic reporting mechanism to report faults back to the in-kernel
> device driver or the guest OS in the case of assigned devices.
> 
> This patch introduces a registration API for device specific fault
> handlers. This differs from the existing iommu_set_fault_handler/
> report_iommu_fault infrastructure in several ways:
> - it allows reporting more sophisticated fault events (both
>   unrecoverable faults and page request faults) due to the nature
>   of the iommu_fault struct
> - it is device specific rather than domain specific.
> 
> The current iommu_report_device_fault() implementation only handles
> the "shoot and forget" unrecoverable fault case. Handling of page
> request faults or stalled faults will come later.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Eric Auger 

A few nitpicks and minor suggestions inline.

> ---
>  drivers/iommu/iommu.c | 127 +-
>  include/linux/iommu.h |  33 -
>  2 files changed, 157 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 67ee662..7955184 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -644,6 +644,13 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
>   goto err_free_name;
>   }
>  
> + dev->iommu_param = kzalloc(sizeof(*dev->iommu_param), GFP_KERNEL);
> + if (!dev->iommu_param) {
> + ret = -ENOMEM;
> + goto err_free_name;
> + }
> + mutex_init(&dev->iommu_param->lock);
> +
>   kobject_get(group->devices_kobj);
>  
>   dev->iommu_group = group;
> @@ -674,6 +681,7 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
>   mutex_unlock(&group->mutex);
>   dev->iommu_group = NULL;
>   kobject_put(group->devices_kobj);
> + kfree(dev->iommu_param);
>  err_free_name:
>   kfree(device->name);
>  err_remove_link:
> @@ -720,7 +728,7 @@ void iommu_group_remove_device(struct device *dev)
>   sysfs_remove_link(&dev->kobj, "iommu_group");
>  
>   trace_remove_device_from_group(group->id, dev);
> -
> + kfree(dev->iommu_param);
>   kfree(device->name);
>   kfree(device);
>   dev->iommu_group = NULL;
> @@ -855,6 +863,123 @@ int iommu_group_unregister_notifier(struct iommu_group *group,
>  EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
>  
>  /**
> + * iommu_register_device_fault_handler() - Register a device fault handler
> + * @dev: the device
> + * @handler: the fault handler
> + * @data: private data passed as argument to the handler
> + *
> + * When an IOMMU fault event is received, this handler gets called with the
> + * fault event and data as argument.
> + *
> + * Return 0 if the fault handler was installed successfully, or an error.
> + */
> +int iommu_register_device_fault_handler(struct device *dev,
> + iommu_dev_fault_handler_t handler,
> + void *data)
> +{
> + struct iommu_param *param = dev->iommu_param;
> + int ret = 0;
> +
> + /*
> +  * Device iommu_param should have been allocated when device is
> +  * added to its iommu_group.
> +  */
> + if (!param)
> + return -EINVAL;
> +
> + mutex_lock(&param->lock);
> + /* Only allow one fault handler registered for each device */
> + if (param->fault_param) {
> + ret = -EBUSY;
> + goto done_unlock;
> + }
> +
> + get_device(dev);
> + param->fault_param =
> + kzalloc(sizeof(struct iommu_fault_param), GFP_KERNEL);
> + if (!param->fault_param) {
> + put_device(dev);
> + ret = -ENOMEM;
> + goto done_unlock;
> + }
> + param->fault_param->handler = handler;
> + param->fault_param->data = data;
> +
> +done_unlock:
> + mutex_unlock(&param->lock);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_register_device_fault_handler);
> +
> +/**
> + * iommu_unregister_device_fault_handler() - Unregister the device fault handler
> + * @dev: the device
> + *
> + * Remove the device fault handler installed with
> + * iommu_register_device_fault_handler().
> + *
> + * Return 0 on success, or an error.
> + */
> +int iommu_unregister_device_fault_handler(struct device *dev)
> +{
> + struct iommu_param *param = dev->iommu_param;
> + int ret = 0;
> +
> + if (!param)
> + return -EINVAL;
> +
> + mutex_lock(&param->lock);
> +
> + if (!param->fault_param)
> + goto unlock;
> +
> + kfree(param->fault_param);
> + param->fault_param = NULL;
> + put_device(dev);
> +unlock:
> + 
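
For context, a minimal sketch of how a consumer driver would use the
API above. The handler prototype follows this series (it receives a
struct iommu_fault_event); the my_ctx structure and function names are
made up:

	#include <linux/device.h>
	#include <linux/iommu.h>

	struct my_ctx {
		struct device *dev;
	};

	/* called by the IOMMU core with the fault event and our data */
	static int my_fault_handler(struct iommu_fault_event *evt, void *data)
	{
		struct my_ctx *ctx = data;

		dev_warn(ctx->dev, "unrecoverable IOMMU fault\n");
		return 0;
	}

	static int my_probe(struct device *dev, struct my_ctx *ctx)
	{
		ctx->dev = dev;
		/* only one handler may be registered per device (-EBUSY) */
		return iommu_register_device_fault_handler(dev, my_fault_handler,
							   ctx);
	}

	static void my_remove(struct device *dev)
	{
		iommu_unregister_device_fault_handler(dev);
	}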

Re: use exact allocation for dma coherent memory

2019-06-19 Thread Jason Gunthorpe
On Mon, Jun 17, 2019 at 10:33:42AM +0200, Christoph Hellwig wrote:
> > drivers/infiniband/hw/cxgb4/qp.c
> >129  static int alloc_host_sq(struct c4iw_rdev *rdev, struct t4_sq *sq)
> >130  {
> >131  sq->queue = dma_alloc_coherent(&(rdev->lldi.pdev->dev), 
> > sq->memsize,
> >132 &(sq->dma_addr), GFP_KERNEL);
> >133  if (!sq->queue)
> >134  return -ENOMEM;
> >135  sq->phys_addr = virt_to_phys(sq->queue);
> >136  dma_unmap_addr_set(sq, mapping, sq->dma_addr);
> >137  return 0;
> >138  }
> > 
> > Is this a bug?
> 
> Yes.  This will blow up badly on many platforms, as sq->queue
> might be vmapped, ioremapped, or come from a pool without page backing.

Gah, this addr gets fed into io_remap_pfn_range/remap_pfn_range too..

Potnuri, you should fix this.. 

You probably need to use dma_mmap_from_dev_coherent() in the mmap?
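
For illustration, something like the following in the driver's mmap
path. A sketch only, not tested against cxgb4 (the function name is
made up, and it assumes the vma covers the whole queue); if I read the
dma-mapping code right, dma_mmap_coherent() handles the per-device
coherent case via dma_mmap_from_dev_coherent() internally:

	#include <linux/dma-mapping.h>
	#include <linux/mm.h>
	#include "iw_cxgb4.h"	/* c4iw_rdev, t4_sq (driver-internal) */

	/*
	 * Sketch: map a dma_alloc_coherent() buffer to user space without
	 * assuming it has a struct page or a valid virt_to_phys() result.
	 */
	static int c4iw_mmap_sq(struct c4iw_rdev *rdev, struct t4_sq *sq,
				struct vm_area_struct *vma)
	{
		return dma_mmap_coherent(&rdev->lldi.pdev->dev, vma,
					 sq->queue, sq->dma_addr,
					 vma->vm_end - vma->vm_start);
	}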

Jason


Re: [RFC PATCH v4 20/21] iommu/vt-d: hpet: Reserve an interrupt remapping table entry for watchdog

2019-06-19 Thread Jacob Pan
On Tue, 18 Jun 2019 01:08:06 +0200 (CEST)
Thomas Gleixner  wrote:

> Stephane,
> 
> On Mon, 17 Jun 2019, Stephane Eranian wrote:
> > On Mon, Jun 17, 2019 at 1:25 AM Thomas Gleixner
> >  wrote:  
> > > Great that there is no trace of any mail from Andi or Stephane
> > > about this on LKML. There is no problem with talking offlist
> > > about this stuff, but then you should at least provide a
> > > rationale for those who were not part of the private conversation.
> > >  
> > Let me add some context to this whole patch series. The pressure on
> > the core PMU counters is increasing as more people want to use them
> > to measure always more events. When the PMU is overcommitted, i.e.,
> > more events than counters for them, there is multiplexing. It comes
> > with an overhead that is too high for certain applications. One way
> > to avoid this is to lower the multiplexing frequency, which is by
> > default 1ms, but that comes with loss of accuracy. Another approach
> > is to measure only a small number of events at a time and use
> > multiple runs, but then you lose consistent event view. Another
> > approach is to push for increasing the number of counters. But
> > getting new hardware counters takes time. Short term, we can
> > investigate what it would take to free one cycle-capable counter
> > which is commandeered by the hard lockup detector on all X86
> > processors today. The functionality of the watchdog, being able to
> > get a crash dump on kernel deadlocks, is important and we cannot
> > simply disable it. At scale, many bugs are exposed and thus
> > machines deadlock. Therefore, we want to investigate what it would
> > take to move the detector to another NMI-capable source, such as
> > the HPET, because the detector does not need a high-granularity
> > timer and interrupts only every 2s.  
> 
> I'm well aware about the reasons for this.
> 
> > Furthermore, recent Intel errata, e.g., the TSX issue forcing the
> > TFA code in perf_events, have increased the pressure even more, with
> > only 3 generic counters left. Thus, it is time to look at
> > alternative ways of getting a hard lockup detector (NMI watchdog)
> > from an NMI source other than the PMU. To that end, I have been
> > discussing alternatives.
> >
> > Intel suggested using the HPET and Ricardo has been working on
> > producing this patch series. It is clear from your review
> > that the patches have issues, but I am hoping that they can be
> > resolved with constructive feedback knowing what the end goal is.  
> 
> Well, I gave constructive feedback from the very first version on. But
> essential parts of that feedback have been ignored for whatever
> reasons.
> 
> > As for the round-robin changes, yes, we discussed this as an
> > alternative to avoid overloading CPU0 with handling all of the work
> > to broadcasting IPI to 100+ other CPUs.  
> 
> I can understand the reason why you don't want to do that, but again,
> I said way before this was tried that changing affinity from NMI
> context with the IOMMU cannot work by just calling into the iommu
> code and it needs some deep investigation with the IOMMU wizards
> whether a preallocated entry can be used lockless (including the
> subsequently required flush).
> 
> The outcome is that the change was implemented by simply calling into
> functions which I told that they cannot be called from NMI context.
> 
> Unless this problem is solved, and I doubt it can be solved after
> talking to IOMMU people and studying manuals, 
I agree. Modifying the IRTE might be done with cmpxchg_double(), but the
queued invalidation interface for IRTE cache flush is shared with DMA and
requires holding a spinlock for enqueuing descriptors, QI tail updates, etc.

Also, reserving and manipulating an IRTE slot for the HPET via a backdoor
might not be needed if the HPET PCI BDF (found in ACPI) can be utilized.
But it might need more work to add a fake PCI device for the HPET.
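
For reference, the lock-free IRTE update itself could look like the
posted-interrupt update path. A sketch, assuming the 128-bit IRTE is
laid out as two u64 halves (struct irte from linux/dmar.h); the cache
flush stays the unsolved part:

	#include <linux/dmar.h>

	/* Sketch: atomically swap a 128-bit IRTE, both u64 halves at once */
	static bool irte_try_replace(struct irte *irte, const struct irte *new)
	{
		/* fails and returns false if the IRTE changed under us */
		return cmpxchg_double(&irte->low, &irte->high,
				      irte->low, irte->high,
				      new->low, new->high);
	}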

> the round robin
> mechanics in the current form are not going to happen. We'd need a
> SMI based lockup detector to debug the resulting livelock wreckage.
> 
> There are two possible options:
> 
>   1) Back to the IPI approach
> 
>  The problem with broadcast is that it sends IPIs one by one to
> each online CPU, which adds up on systems with a large number of CPUs.
> 
>  The interesting question is why the kernel does not utilize the
> all excluding self destination shorthand for this. The SDM is not
> giving any information.
> 
>  But there is a historic commit which is related and gives a hint:
> 
> commit e77deacb7b078156fcadf27b838a4ce1a65eda04
> Author: Keith Owens 
> Date:   Mon Jun 26 13:59:56 2006 +0200
> 
> [PATCH] x86_64: Avoid broadcasting NMI IPIs
> 
> On some i386/x86_64 systems, sending an NMI IPI as a
> broadcast will reset the system.  This seems to be a BIOS bug which
> affects machines where one or more cpus are not under OS control.  It
>   occurs on HT systems with a version of the 

Re: [PATCH] swiotlb: fix phys_addr_t overflow warning

2019-06-19 Thread Konrad Rzeszutek Wilk
On Mon, Jun 17, 2019 at 09:13:16AM -0700, Stefano Stabellini wrote:
> On Mon, 17 Jun 2019, Arnd Bergmann wrote:
> > On architectures that have a larger dma_addr_t than phys_addr_t,
> > the swiotlb_tbl_map_single() function truncates its return code
> > in the failure path, making it impossible to identify the error
> > later, as we compare to the original value:
> > 
> > kernel/dma/swiotlb.c:551:9: error: implicit conversion from 'dma_addr_t' 
> > (aka 'unsigned long long') to 'phys_addr_t' (aka 'unsigned int') changes 
> > value from 18446744073709551615 to 4294967295 
> > [-Werror,-Wconstant-conversion]
> > return DMA_MAPPING_ERROR;
> > 
> > Use an explicit typecast here to convert it to the narrower type,
> > and use the same expression in the error handling later.
> > 
> > Fixes: b907e20508d0 ("swiotlb: remove SWIOTLB_MAP_ERROR")
> > Signed-off-by: Arnd Bergmann 
> 
> Acked-by: Stefano Stabellini 

queued.
> 
> 
> > ---
> > I still think that reverting the original commit would have
> > provided clearer semantics for this corner case, but at least
> > this patch restores the correct behavior.
> > ---
> >  drivers/xen/swiotlb-xen.c | 2 +-
> >  kernel/dma/swiotlb.c  | 4 ++--
> >  2 files changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > index d53f3493a6b9..cfbe46785a3b 100644
> > --- a/drivers/xen/swiotlb-xen.c
> > +++ b/drivers/xen/swiotlb-xen.c
> > @@ -402,7 +402,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
> >  
> > map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir,
> >  attrs);
> > -   if (map == DMA_MAPPING_ERROR)
> > +   if (map == (phys_addr_t)DMA_MAPPING_ERROR)
> > return DMA_MAPPING_ERROR;
> >  
> > dev_addr = xen_phys_to_bus(map);
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index e906ef2e6315..a3be651973ad 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -548,7 +548,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
> > if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
> > dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n",
> >  size, io_tlb_nslabs, tmp_io_tlb_used);
> > -   return DMA_MAPPING_ERROR;
> > +   return (phys_addr_t)DMA_MAPPING_ERROR;
> >  found:
> > io_tlb_used += nslots;
> > spin_unlock_irqrestore(&io_tlb_lock, flags);
> > @@ -666,7 +666,7 @@ bool swiotlb_map(struct device *dev, phys_addr_t *phys, dma_addr_t *dma_addr,
> > /* Oh well, have to allocate and map a bounce buffer. */
> > *phys = swiotlb_tbl_map_single(dev, __phys_to_dma(dev, io_tlb_start),
> > *phys, size, dir, attrs);
> > -   if (*phys == DMA_MAPPING_ERROR)
> > +   if (*phys == (phys_addr_t)DMA_MAPPING_ERROR)
> > return false;
> >  
> > /* Ensure that the address returned is DMA'ble */
> > -- 
> > 2.20.0
> > 
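
To make the truncation concrete, here is a standalone sketch with
typedefs mirroring a 32-bit phys_addr_t / 64-bit dma_addr_t
configuration (the kernel's real types depend on the arch config):

	#include <stdint.h>
	#include <stdio.h>

	typedef uint32_t phys_addr_t;	/* e.g. 32-bit ARM without LPAE */
	typedef uint64_t dma_addr_t;	/* with 64-bit DMA addressing */

	#define DMA_MAPPING_ERROR (~(dma_addr_t)0)

	static phys_addr_t map_single_fail(void)
	{
		/* old failure path: 0xffffffffffffffff truncated to 0xffffffff */
		return DMA_MAPPING_ERROR;
	}

	int main(void)
	{
		phys_addr_t map = map_single_fail();

		/* the compare promotes map back to 64 bits, so it never matches */
		printf("plain compare: %s\n",
		       map == DMA_MAPPING_ERROR ? "caught" : "missed (the bug)");

		/* the fix compares against the truncated constant instead */
		printf("cast compare:  %s\n",
		       map == (phys_addr_t)DMA_MAPPING_ERROR ? "caught" : "missed");
		return 0;
	}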


Re: [PATCH 1/8] iommu: Add I/O ASID allocator

2019-06-19 Thread Jean-Philippe Brucker
On 18/06/2019 18:05, Jacob Pan wrote:
> On Tue, 18 Jun 2019 15:22:20 +0100
> Jean-Philippe Brucker  wrote:
> 
>>>>> On 11/06/2019 19:13, Jacob Pan wrote:
>>>>>> +/**
>>>>>> + * ioasid_find - Find IOASID data
>>>>>> + * @set: the IOASID set
>>>>>> + * @ioasid: the IOASID to find
>>>>>> + * @getter: function to call on the found object
>>>>>> + *
>>>>>> + * The optional getter function allows to take a reference to the found object
>>>>>> + * under the rcu lock. The function can also check if the object is still valid:
>>>>>> + * if @getter returns false, then the object is invalid and NULL is returned.
>>>>>> + *
>>>>>> + * If the IOASID has been allocated for this set, return the private pointer
>>>>>> + * passed to ioasid_alloc. Private data can be NULL if not set. Return an error
>>>>>> + * if the IOASID is not found or does not belong to the set.
>>>>>
>>>>> Perhaps should make it clear that @set can be null.
>>>>
>>>> Indeed. But I'm not sure allowing @set to be NULL is such a good
>>>> idea, because the data type associated with an ioasid depends on its
>>>> set. For example SVA will put an mm_struct in there, and auxiliary
>>>> domains use some structure private to the IOMMU domain.
>>>>
>>> I am not sure we need to count on @set to decipher the data type.
>>> Whoever does the allocation and owns the IOASID should know its
>>> own data type. My thought was that @set is only used to group IDs,
>>> permission check etc.
>>>
>>>> Jacob, could we make @set mandatory, or do you see a use for a
>>>> global search? If @set is NULL, then callers can check if the
>>>> return pointer is NULL, but will run into trouble if they try to
>>>> dereference it.
>>> A global search use case can be for PRQ. IOMMU driver gets a IOASID
>>> (first interrupt then retrieve from a queue), it has no idea which
>>> @set it belongs to. But the data types are the same for all IOASIDs
>>> used by the IOMMU.  
>>
>> They aren't when we use a generic SVA handler. Following a call to
>> iommu_sva_bind_device(), iommu-sva.c allocates an IOASID and stores an
>> mm_struct. If auxiliary domains are also enabled for the device,
>> following a call to iommu_aux_attach_device() the IOMMU driver
>> allocates an IOASID and stores some private object.
>>
>> Now for example the IOMMU driver receives a PPR and calls
>> ioasid_find() with @set = NULL. ioasid_find() may return either an
>> mm_struct or a private object, and the driver cannot know which it is
>> so the returned value is unusable.
> I think we might have a misunderstanding of what ioasid_set is. Or I
> have misused your original intention for it :) So my understanding of an
> ioasid_set is that it identifies a subset of IOASIDs that
> 1. shares the same data type
> 2. belongs to the same permission group, owner.

In my case it's mostly #1. The two IOASID sets (SVA and AUX) are managed
by different modules (iommu-sva and arm-smmu-v3). Since we don't do
scalable IOV, the two sets are shared_ioasid and private_ioasid, with
either an mm or NULL as private data (but we might need to store a
domain for private IOASIDs at some point). So at the moment passing a
NULL set to ioasid_find() would work for us as well.

However in a non-virtualization scenario, a device driver could define
its own set and allocate an IOASID for its own use, associating some
private structure with it. If it somehow causes a PRQ on that IOASID,
the IOMMU driver shouldn't attempt to access the device driver's private
structure. So I do think we need to be careful with global
ioasid_find(). Given that any driver in the system can use the
allocator, we won't be able to keep assuming that all objects stored in
there are of one type.
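
A sketch of that hazard, using the ioasid_alloc()/ioasid_find()
signatures from patch 1/8 (the two sets, the my_object type and the
max value are made up):

	#include <linux/ioasid.h>
	#include <linux/mm_types.h>

	struct my_object { int id; };	/* hypothetical driver-private type */

	/* two owners store different data types under their own sets */
	static void ioasid_type_hazard(struct ioasid_set *sva_set,
				       struct ioasid_set *drv_set,
				       struct mm_struct *mm,
				       struct my_object *obj)
	{
		ioasid_t a, b;
		void *data;

		/* iommu-sva style: the private data is an mm_struct */
		a = ioasid_alloc(sva_set, 1, 1U << 20, mm);
		/* device-driver style: the private data is its own struct */
		b = ioasid_alloc(drv_set, 1, 1U << 20, obj);

		/*
		 * A global search cannot tell the two apart: with a NULL
		 * set, either an mm_struct or a my_object may come back,
		 * and the caller has no way to know which.
		 */
		data = ioasid_find(NULL, a, NULL);
		(void)data;
		(void)b;
	}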

> Our usage is #2: we put an mm_struct as the set to do permission
> checks. E.g. VFIO can check guest PASID ownership based on the QEMU
> process mm. This will make sure an IOASID allocated by one guest can
> only be used by the same guest.
> 
> For IOASIDs used for SVA bind, be it native or guest SVA, we always
> have the same data type. Therefore, when the page request handler gets
> called to search with ioasid_find(), it knows the data type. An
> additional flag will tell if it is a guest bind or a native bind.
> 
> I was under the assumption that an aux domain and its IOASID have a 1:1
> mapping, so there is no need for a search. Or even if it needs to search,
> it will be searched by the same caller that did the allocation, which
> therefore knows what the private data type is.
>
> In addition, the aux domain is used for requests w/o PASID, and PPR for
> requests w/o PASID is not to be supported. So there would be no need
> to search from the page request handler.
> 
> Now if we take the above assumption away and use ioasid_set strictly
> to differentiate the data types (usage #1), then I agree we can get
> rid of the NULL set and global search.
> 
> That would require we converge on the generic sva_bind for both
> native and guest. The additional implication is that VFIO layer has to
> 

Re: [PATCH v2 02/12] iommu/mediatek: Add probe_defer for smi-larb

2019-06-19 Thread Matthias Brugger



On 10/06/2019 14:55, Yong Wu wrote:
> The iommu consumer should use device_link to connect with the
> smi-larb (supplier), so that the smi-larb runs before the iommu
> consumer. Here we defer probing of the iommu driver until the smi
> driver is ready, so that all iommu consumers probe after the smi
> driver.
> 
> Without this patch, if some consumer drivers run before smi-larb,
> the supplier link_status is DL_DEV_NO_DRIVER (0) in device_link_add,
> and then device_links_driver_bound will use WARN_ON to complain that
> the link_status of the supplier is not right.
> 
> This is a preparatory patch for adding device_link.
> 
> Signed-off-by: Yong Wu 
> ---
>  drivers/iommu/mtk_iommu.c| 2 +-
>  drivers/iommu/mtk_iommu_v1.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 6fe3369..f7599d8 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -664,7 +664,7 @@ static int mtk_iommu_probe(struct platform_device *pdev)
>   id = i;
>  
>   plarbdev = of_find_device_by_node(larbnode);
> - if (!plarbdev) {
> + if (!plarbdev || !plarbdev->dev.driver) {

can't we use:
device_lock()
device_is_bound(struct device *dev)
device_unlock()
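
i.e. something like this (untested sketch, reusing the existing error
path):

	bool bound;

	plarbdev = of_find_device_by_node(larbnode);
	if (!plarbdev) {
		of_node_put(larbnode);
		return -EPROBE_DEFER;
	}

	device_lock(&plarbdev->dev);
	bound = device_is_bound(&plarbdev->dev);
	device_unlock(&plarbdev->dev);
	if (!bound) {
		of_node_put(larbnode);
		return -EPROBE_DEFER;
	}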

>   of_node_put(larbnode);
>   return -EPROBE_DEFER;
>   }
> diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
> index 0b0908c..c43c4a0 100644
> --- a/drivers/iommu/mtk_iommu_v1.c
> +++ b/drivers/iommu/mtk_iommu_v1.c
> @@ -604,7 +604,7 @@ static int mtk_iommu_probe(struct platform_device *pdev)
>   plarbdev = of_platform_device_create(
>   larb_spec.np, NULL,
>   platform_bus_type.dev_root);
> - if (!plarbdev) {
> + if (!plarbdev || !plarbdev->dev.driver) {
>   of_node_put(larb_spec.np);
>   return -EPROBE_DEFER;
>   }
> 


Re: [PATCH 3/8] iommu/arm-smmu-v3: Support platform SSID

2019-06-19 Thread Jean-Philippe Brucker
On 18/06/2019 19:08, Will Deacon wrote:
>> +/*
>> + * If the SMMU doesn't support 2-stage CD, limit the linear
>> + * tables to a reasonable number of contexts, let's say
>> + * 64kB / sizeof(ctx_desc) = 1024 = 2^10
>> + */
>> +if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
>> +master->ssid_bits = min(master->ssid_bits, 10U);
> 
> Please introduce a #define for the 10, so that it is computed in the way
> you describe in the comment (a bit like we do for things like queue sizes).

Ok
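
Something like this, perhaps (a sketch; the names and the derivation
from the CD size are assumptions):

	/* 64kB / sizeof(ctx_desc) = 1024 contexts for a linear CD table */
	#define CTXDESC_CD_DWORDS	8
	#define CTXDESC_LINEAR_CDMAX	ilog2(SZ_64K / (CTXDESC_CD_DWORDS << 3))

	if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
		master->ssid_bits = min_t(unsigned int, master->ssid_bits,
					  CTXDESC_LINEAR_CDMAX);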

>> +
>>  group = iommu_group_get_for_dev(dev);
>>  if (!IS_ERR(group)) {
>>  iommu_group_put(group);
>> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
>> index f04a6df65eb8..04f4f6b95d82 100644
>> --- a/drivers/iommu/of_iommu.c
>> +++ b/drivers/iommu/of_iommu.c
>> @@ -206,8 +206,12 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
>>  if (err)
>>  break;
>>  }
>> -}
>>  
>> +fwspec = dev_iommu_fwspec_get(dev);
>> +if (!err && fwspec)
>> +of_property_read_u32(master_np, "pasid-num-bits",
>> + &fwspec->num_pasid_bits);
>> +}
> 
> Hmm. Do you know if there's anything in ACPI for this?

Yes, IORT version D introduced a "substream width" field for the Named
component node (platform device). I don't think it existed last time I
checked, so I'll see about supporting it.

Thanks,
Jean


Re: [PATCH v8 26/29] vfio-pci: Register an iommu fault handler

2019-06-19 Thread Jean-Philippe Brucker
On 19/06/2019 01:19, Jacob Pan wrote:
>>> I see this as a future extension due to limited testing,   
>>
>> I'm wondering how we deal with:
>> (1) old userspace that won't fill the new private_data field in
>> page_response. A new kernel still has to support it.
>> (2) old kernel that won't recognize the new PRIVATE_DATA flag.
>> Currently iommu_page_response() rejects page responses with unknown
>> flags.
>>
>> I guess we'll need a two-way negotiation, where userspace queries
>> whether the kernel supports the flag (2), and the kernel learns
>> whether it should expect the private data to come back (1).
>>
> I am not sure case (1) exists, in that there is no existing user space
> that supports PRQ w/o private data. Am I missing something?
> 
> For VT-d emulation, private data is always part of the scalable mode
> PASID capability. If the vIOMMU query shows the host supports PASID and
> scalable mode, it will always support private data once PRQ is enabled.

Right if VT-d won't ever support page_response without private data then
I don't think we have to worry about (1).

> So I think we only need to negotiate (2) which should be covered by
> VT-d PASID cap.
> 
>>> perhaps for
>>> now, can you add padding similar to the page request? Make it 64B as
>>> well.  
>>
>> I don't think padding is necessary, because iommu_page_response is
>> sent by userspace to the kernel, unlike iommu_fault which is
>> allocated by userspace and filled by the kernel.
>>
>> Page response looks a lot more like existing VFIO mechanisms, so I
>> suppose we'll wrap the iommu_page_response structure and include an
>> argsz parameter at the top:
>>
>>  struct vfio_iommu_page_response {
>>  u32 argsz;
>>  struct iommu_page_response pr;
>>  };
>>
>>  struct vfio_iommu_page_response vpr = {
>>  .argsz = sizeof(vpr),
>>  .pr = ...
>>  ...
>>  };
>>
>>  ioctl(devfd, VFIO_IOMMU_PAGE_RESPONSE, &vpr);
>>
>> In that case supporting private data can be done by simply appending a
>> field at the end (plus the negotiation above).
>>
> Do you mean at the end of struct vfio_iommu_page_response{}? Or at
> the end of struct iommu_page_response{}?
> 
> The consumer of the private data is the iommu driver, not vfio. So I think
> you want to add the new field at the end of struct iommu_page_response,
> right?

Yes that's what I meant

Thanks,
Jean


Re: [PATCH v3 1/4] firmware: qcom_scm-64: Add atomic version of qcom_scm_call

2019-06-19 Thread Vivek Gautam
On Tue, Jun 18, 2019 at 11:25 PM Will Deacon  wrote:
>
> On Wed, Jun 12, 2019 at 12:45:51PM +0530, Vivek Gautam wrote:
> > There are scenarios where drivers are required to make an
> > scm call in atomic context, such as in one of qcom's
> > arm-smmu-500 errata [1].
> >
> > [1] ("https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/
> >   drivers/iommu/arm-smmu.c?h=CogSystems-msm-49/
> >   msm-4.9=da765c6c75266b38191b38ef086274943f353ea7")
> >
> > Signed-off-by: Vivek Gautam 
> > Reviewed-by: Bjorn Andersson 
> > ---
> >  drivers/firmware/qcom_scm-64.c | 136 +-
> >  1 file changed, 92 insertions(+), 44 deletions(-)
> >
> > diff --git a/drivers/firmware/qcom_scm-64.c b/drivers/firmware/qcom_scm-64.c
> > index 91d5ad7cf58b..b6dca32c5ac4 100644
> > --- a/drivers/firmware/qcom_scm-64.c
> > +++ b/drivers/firmware/qcom_scm-64.c

[snip]

> > +
> > +static void qcom_scm_call_do(const struct qcom_scm_desc *desc,
> > +  struct arm_smccc_res *res, u32 fn_id,
> > +  u64 x5, bool atomic)
> > +{
>
> Maybe pass in the call type (ARM_SMCCC_FAST_CALL vs ARM_SMCCC_STD_CALL)
> instead of "bool atomic"? Would certainly make the callsites easier to
> understand.

Sure, will do that.

>
> > + int retry_count = 0;
> > +
> > + if (!atomic) {
> > + do {
> > + mutex_lock(&qcom_scm_lock);
> > +
> > + __qcom_scm_call_do(desc, res, fn_id, x5,
> > +ARM_SMCCC_STD_CALL);
> > +
> > + mutex_unlock(&qcom_scm_lock);
> > +
> > + if (res->a0 == QCOM_SCM_V2_EBUSY) {
> > + if (retry_count++ > QCOM_SCM_EBUSY_MAX_RETRY)
> > + break;
> > + msleep(QCOM_SCM_EBUSY_WAIT_MS);
> > + }
> > + }  while (res->a0 == QCOM_SCM_V2_EBUSY);
> > + } else {
> > + __qcom_scm_call_do(desc, res, fn_id, x5, ARM_SMCCC_FAST_CALL);
> > + }
>
> Is it safe to make concurrent FAST calls?

I better add a spinlock here.
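
Something along these lines, probably (a sketch; the lock name is made
up, and irqsave seems safest since atomic callers may have interrupts
disabled already):

	static DEFINE_SPINLOCK(qcom_scm_fast_lock);

	...
	} else {
		unsigned long flags;

		spin_lock_irqsave(&qcom_scm_fast_lock, flags);
		__qcom_scm_call_do(desc, res, fn_id, x5, ARM_SMCCC_FAST_CALL);
		spin_unlock_irqrestore(&qcom_scm_fast_lock, flags);
	}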

Thanks & regards
Vivek

>
> Will



-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation