Re: [Freedreno] [PATCH v7 6/6] drm/msm: iommu: Replace runtime calls with runtime suppliers

2018-02-22 Thread Rob Clark
On Thu, Feb 22, 2018 at 3:13 AM, Tomasz Figa  wrote:
> On Fri, Feb 16, 2018 at 9:13 AM, Tomasz Figa  wrote:
>> On Fri, Feb 16, 2018 at 2:14 AM, Robin Murphy  wrote:
>>> On 15/02/18 04:17, Tomasz Figa wrote:
>>> [...]
>
> Could you elaborate on what kind of locking you are concerned about?
> As I explained before, the normally happening fast path would lock
> dev->power_lock only for the brief moment of incrementing the runtime
> PM usage counter.


 My bad, that's not even it.

 The atomic usage counter is incremented beforehands, without any
 locking [1] and the spinlock is acquired only for the sake of
 validating that device's runtime PM state remained valid indeed [2],
 which would be the case in the fast path of the same driver doing two
 mappings in parallel, with the master powered on (and so the SMMU,
 through device links; if master was not powered on already, powering
 on the SMMU is unavoidable anyway and it would add much more latency
 than the spinlock itself).
>>>
>>>
>>> We now have no locking at all in the map path, and only a per-domain lock
>>> around TLB sync in unmap which is unfortunately necessary for correctness;
>>> the latter isn't too terrible, since in "serious" hardware it should only be
>>> serialising a few cpus serving the same device against each other (e.g. for
>>> multiple queues on a single NIC).
>>>
>>> Putting in a global lock which serialises *all* concurrent map and unmap
>>> calls for *all* unrelated devices makes things worse. Period. Even if the
>>> lock itself were held for the minimum possible time, i.e. trivially
>>> "spin_lock(); spin_unlock()", the cost of repeatedly bouncing that
>>> one cache line around between 96 CPUs across two sockets is not negligible.
>>
>> Fair enough. Note that we're in a quite interesting situation now:
>>  a) We need to have runtime PM enabled on Qualcomm SoC to have power
>> properly managed,
>>  b) We need to have lock-free map/unmap on such distributed systems,
>>  c) If runtime PM is enabled, we need to call into runtime PM from any
>> code that does hardware accesses, otherwise the IOMMU API (and so DMA
>> API and then any V4L2 driver) becomes unusable.
>>
>> I can see one more way that could potentially let us have all the
>> three. How about enabling runtime PM only on selected implementations
>> (e.g. qcom,smmu) and then having all the runtime PM calls surrounded
>> with if (pm_runtime_enabled()), which is lockless?
>>
>
> Sorry for pinging, but any opinion on this kind of approach?
>

It is ok by me, for whatever that is worth

BR,
-R
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Freedreno] [PATCH v7 6/6] drm/msm: iommu: Replace runtime calls with runtime suppliers

2018-02-15 Thread Rob Clark
On Wed, Feb 14, 2018 at 11:09 PM, Tomasz Figa  wrote:
> On Thu, Feb 15, 2018 at 1:12 AM, Rob Clark  wrote:
>> On Wed, Feb 14, 2018 at 10:48 AM, Jordan Crouse  
>> wrote:
>>> On Wed, Feb 14, 2018 at 12:31:29PM +0900, Tomasz Figa wrote:

 - When submitting commands to the GPU, the GPU driver will
 pm_runtime_get_sync() on the GPU device, which will automatically do
 the same on all the linked suppliers, which would also include the
 SMMU itself. The role of device links here is exactly that the GPU
 driver doesn't have to care which other devices need to be brought up.
>>>
>>> This is true.  Assuming that the device link works correctly we would not 
>>> need
>>> to explicitly power the SMMU which makes my point entirely moot.
>>
>> Just to point out what motivated this patchset, the biggest problem is
>> iommu_unmap() because that can happen when GPU is not powered on (or
>> in the v4l2 case, because some other device dropped it's reference to
>> the dma-buf allowing it to be free'd).  Currently we pm get/put the
>> GPU device around unmap, but it is kinda silly to boot up the GPU just
>> to unmap a buffer.
>
> Note that in V4L2 both mapping and unmapping can happen completely
> without involving the driver. So AFAICT the approach being implemented
> by this patchset will not work, because there will be no one to power
> up the IOMMU before the operation. Moreover, there are platforms for
> which there is no reason to power up the IOMMU just for map/unmap,
> because the hardware state is lost anyway and the only real work
> needed is updating the page tables in memory. (I feel like this is
> actually true for most of the platforms in the wild, but this is based
> purely on the not so small number of platforms I worked with, haven't
> bothered looking for more general evidence.)
>

At least as far as drm/msm/adreno, I'm not terribly concerned about
other platforms that don't need to power up iommu.  It's not really
the same situation as a IP block that shows up in all different
vendor's SoCs.

But if you can convince Robin to go for get/put_sync() calls inside
the iommu driver, I'm fine with that approach too.  That is what I do
in qcom_iommu already.  But if not I'd like to at least solve this for
some platforms if we can't solve for all.

BR,
-R

>>
>> (Semi-related, I would also like to batch map/unmap's, I just haven't
>> gotten around to implementing it yet.. but that would be another case
>> where a single get_supplier()/put_supplier() outside of the iommu
>> would make sense instead of pm_get/put() inside the iommu driver's
>> ->unmap().)
>>
>> If you really dislike the get/put_supplier() approach, then perhaps we
>> need iommu_pm_get()/iommu_pm_put() operations that the iommu user
>> could use to accomplish the same thing?
>
> I'm afraid this wouldn't work for V4L2 either. And I still haven't
> been given any evidence that the approach I'm suggesting, which relies
> only on existing pieces of infrastructure and which worked for both
> Exynos and Rockchip, including V4L2, wouldn't work for SMMU and/or QC
> SoCs.
>
> Best regards,
> Tomasz
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Freedreno] [PATCH v7 6/6] drm/msm: iommu: Replace runtime calls with runtime suppliers

2018-02-14 Thread Tomasz Figa
On Thu, Feb 15, 2018 at 1:12 AM, Rob Clark  wrote:
> On Wed, Feb 14, 2018 at 10:48 AM, Jordan Crouse  
> wrote:
>> On Wed, Feb 14, 2018 at 12:31:29PM +0900, Tomasz Figa wrote:
>>>
>>> - When submitting commands to the GPU, the GPU driver will
>>> pm_runtime_get_sync() on the GPU device, which will automatically do
>>> the same on all the linked suppliers, which would also include the
>>> SMMU itself. The role of device links here is exactly that the GPU
>>> driver doesn't have to care which other devices need to be brought up.
>>
>> This is true.  Assuming that the device link works correctly we would not 
>> need
>> to explicitly power the SMMU which makes my point entirely moot.
>
> Just to point out what motivated this patchset, the biggest problem is
> iommu_unmap() because that can happen when GPU is not powered on (or
> in the v4l2 case, because some other device dropped it's reference to
> the dma-buf allowing it to be free'd).  Currently we pm get/put the
> GPU device around unmap, but it is kinda silly to boot up the GPU just
> to unmap a buffer.

Note that in V4L2 both mapping and unmapping can happen completely
without involving the driver. So AFAICT the approach being implemented
by this patchset will not work, because there will be no one to power
up the IOMMU before the operation. Moreover, there are platforms for
which there is no reason to power up the IOMMU just for map/unmap,
because the hardware state is lost anyway and the only real work
needed is updating the page tables in memory. (I feel like this is
actually true for most of the platforms in the wild, but this is based
purely on the not so small number of platforms I worked with, haven't
bothered looking for more general evidence.)

>
> (Semi-related, I would also like to batch map/unmap's, I just haven't
> gotten around to implementing it yet.. but that would be another case
> where a single get_supplier()/put_supplier() outside of the iommu
> would make sense instead of pm_get/put() inside the iommu driver's
> ->unmap().)
>
> If you really dislike the get/put_supplier() approach, then perhaps we
> need iommu_pm_get()/iommu_pm_put() operations that the iommu user
> could use to accomplish the same thing?

I'm afraid this wouldn't work for V4L2 either. And I still haven't
been given any evidence that the approach I'm suggesting, which relies
only on existing pieces of infrastructure and which worked for both
Exynos and Rockchip, including V4L2, wouldn't work for SMMU and/or QC
SoCs.

Best regards,
Tomasz
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Freedreno] [PATCH v7 6/6] drm/msm: iommu: Replace runtime calls with runtime suppliers

2018-02-14 Thread Rob Clark
On Wed, Feb 14, 2018 at 10:48 AM, Jordan Crouse  wrote:
> On Wed, Feb 14, 2018 at 12:31:29PM +0900, Tomasz Figa wrote:
>>
>> - When submitting commands to the GPU, the GPU driver will
>> pm_runtime_get_sync() on the GPU device, which will automatically do
>> the same on all the linked suppliers, which would also include the
>> SMMU itself. The role of device links here is exactly that the GPU
>> driver doesn't have to care which other devices need to be brought up.
>
> This is true.  Assuming that the device link works correctly we would not need
> to explicitly power the SMMU which makes my point entirely moot.

Just to point out what motivated this patchset, the biggest problem is
iommu_unmap() because that can happen when GPU is not powered on (or
in the v4l2 case, because some other device dropped it's reference to
the dma-buf allowing it to be free'd).  Currently we pm get/put the
GPU device around unmap, but it is kinda silly to boot up the GPU just
to unmap a buffer.

(Semi-related, I would also like to batch map/unmap's, I just haven't
gotten around to implementing it yet.. but that would be another case
where a single get_supplier()/put_supplier() outside of the iommu
would make sense instead of pm_get/put() inside the iommu driver's
->unmap().)

If you really dislike the get/put_supplier() approach, then perhaps we
need iommu_pm_get()/iommu_pm_put() operations that the iommu user
could use to accomplish the same thing?

BR,
-R
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel