Re: [Freedreno] [PATCH v7 6/6] drm/msm: iommu: Replace runtime calls with runtime suppliers
On Thu, Feb 22, 2018 at 3:13 AM, Tomasz Figawrote: > On Fri, Feb 16, 2018 at 9:13 AM, Tomasz Figa wrote: >> On Fri, Feb 16, 2018 at 2:14 AM, Robin Murphy wrote: >>> On 15/02/18 04:17, Tomasz Figa wrote: >>> [...] > > Could you elaborate on what kind of locking you are concerned about? > As I explained before, the normally happening fast path would lock > dev->power_lock only for the brief moment of incrementing the runtime > PM usage counter. My bad, that's not even it. The atomic usage counter is incremented beforehands, without any locking [1] and the spinlock is acquired only for the sake of validating that device's runtime PM state remained valid indeed [2], which would be the case in the fast path of the same driver doing two mappings in parallel, with the master powered on (and so the SMMU, through device links; if master was not powered on already, powering on the SMMU is unavoidable anyway and it would add much more latency than the spinlock itself). >>> >>> >>> We now have no locking at all in the map path, and only a per-domain lock >>> around TLB sync in unmap which is unfortunately necessary for correctness; >>> the latter isn't too terrible, since in "serious" hardware it should only be >>> serialising a few cpus serving the same device against each other (e.g. for >>> multiple queues on a single NIC). >>> >>> Putting in a global lock which serialises *all* concurrent map and unmap >>> calls for *all* unrelated devices makes things worse. Period. Even if the >>> lock itself were held for the minimum possible time, i.e. trivially >>> "spin_lock(); spin_unlock()", the cost of repeatedly bouncing that >>> one cache line around between 96 CPUs across two sockets is not negligible. >> >> Fair enough. Note that we're in a quite interesting situation now: >> a) We need to have runtime PM enabled on Qualcomm SoC to have power >> properly managed, >> b) We need to have lock-free map/unmap on such distributed systems, >> c) If runtime PM is enabled, we need to call into runtime PM from any >> code that does hardware accesses, otherwise the IOMMU API (and so DMA >> API and then any V4L2 driver) becomes unusable. >> >> I can see one more way that could potentially let us have all the >> three. How about enabling runtime PM only on selected implementations >> (e.g. qcom,smmu) and then having all the runtime PM calls surrounded >> with if (pm_runtime_enabled()), which is lockless? >> > > Sorry for pinging, but any opinion on this kind of approach? > It is ok by me, for whatever that is worth BR, -R ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [Freedreno] [PATCH v7 6/6] drm/msm: iommu: Replace runtime calls with runtime suppliers
On Wed, Feb 14, 2018 at 11:09 PM, Tomasz Figawrote: > On Thu, Feb 15, 2018 at 1:12 AM, Rob Clark wrote: >> On Wed, Feb 14, 2018 at 10:48 AM, Jordan Crouse >> wrote: >>> On Wed, Feb 14, 2018 at 12:31:29PM +0900, Tomasz Figa wrote: - When submitting commands to the GPU, the GPU driver will pm_runtime_get_sync() on the GPU device, which will automatically do the same on all the linked suppliers, which would also include the SMMU itself. The role of device links here is exactly that the GPU driver doesn't have to care which other devices need to be brought up. >>> >>> This is true. Assuming that the device link works correctly we would not >>> need >>> to explicitly power the SMMU which makes my point entirely moot. >> >> Just to point out what motivated this patchset, the biggest problem is >> iommu_unmap() because that can happen when GPU is not powered on (or >> in the v4l2 case, because some other device dropped it's reference to >> the dma-buf allowing it to be free'd). Currently we pm get/put the >> GPU device around unmap, but it is kinda silly to boot up the GPU just >> to unmap a buffer. > > Note that in V4L2 both mapping and unmapping can happen completely > without involving the driver. So AFAICT the approach being implemented > by this patchset will not work, because there will be no one to power > up the IOMMU before the operation. Moreover, there are platforms for > which there is no reason to power up the IOMMU just for map/unmap, > because the hardware state is lost anyway and the only real work > needed is updating the page tables in memory. (I feel like this is > actually true for most of the platforms in the wild, but this is based > purely on the not so small number of platforms I worked with, haven't > bothered looking for more general evidence.) > At least as far as drm/msm/adreno, I'm not terribly concerned about other platforms that don't need to power up iommu. It's not really the same situation as a IP block that shows up in all different vendor's SoCs. But if you can convince Robin to go for get/put_sync() calls inside the iommu driver, I'm fine with that approach too. That is what I do in qcom_iommu already. But if not I'd like to at least solve this for some platforms if we can't solve for all. BR, -R >> >> (Semi-related, I would also like to batch map/unmap's, I just haven't >> gotten around to implementing it yet.. but that would be another case >> where a single get_supplier()/put_supplier() outside of the iommu >> would make sense instead of pm_get/put() inside the iommu driver's >> ->unmap().) >> >> If you really dislike the get/put_supplier() approach, then perhaps we >> need iommu_pm_get()/iommu_pm_put() operations that the iommu user >> could use to accomplish the same thing? > > I'm afraid this wouldn't work for V4L2 either. And I still haven't > been given any evidence that the approach I'm suggesting, which relies > only on existing pieces of infrastructure and which worked for both > Exynos and Rockchip, including V4L2, wouldn't work for SMMU and/or QC > SoCs. > > Best regards, > Tomasz ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [Freedreno] [PATCH v7 6/6] drm/msm: iommu: Replace runtime calls with runtime suppliers
On Thu, Feb 15, 2018 at 1:12 AM, Rob Clarkwrote: > On Wed, Feb 14, 2018 at 10:48 AM, Jordan Crouse > wrote: >> On Wed, Feb 14, 2018 at 12:31:29PM +0900, Tomasz Figa wrote: >>> >>> - When submitting commands to the GPU, the GPU driver will >>> pm_runtime_get_sync() on the GPU device, which will automatically do >>> the same on all the linked suppliers, which would also include the >>> SMMU itself. The role of device links here is exactly that the GPU >>> driver doesn't have to care which other devices need to be brought up. >> >> This is true. Assuming that the device link works correctly we would not >> need >> to explicitly power the SMMU which makes my point entirely moot. > > Just to point out what motivated this patchset, the biggest problem is > iommu_unmap() because that can happen when GPU is not powered on (or > in the v4l2 case, because some other device dropped it's reference to > the dma-buf allowing it to be free'd). Currently we pm get/put the > GPU device around unmap, but it is kinda silly to boot up the GPU just > to unmap a buffer. Note that in V4L2 both mapping and unmapping can happen completely without involving the driver. So AFAICT the approach being implemented by this patchset will not work, because there will be no one to power up the IOMMU before the operation. Moreover, there are platforms for which there is no reason to power up the IOMMU just for map/unmap, because the hardware state is lost anyway and the only real work needed is updating the page tables in memory. (I feel like this is actually true for most of the platforms in the wild, but this is based purely on the not so small number of platforms I worked with, haven't bothered looking for more general evidence.) > > (Semi-related, I would also like to batch map/unmap's, I just haven't > gotten around to implementing it yet.. but that would be another case > where a single get_supplier()/put_supplier() outside of the iommu > would make sense instead of pm_get/put() inside the iommu driver's > ->unmap().) > > If you really dislike the get/put_supplier() approach, then perhaps we > need iommu_pm_get()/iommu_pm_put() operations that the iommu user > could use to accomplish the same thing? I'm afraid this wouldn't work for V4L2 either. And I still haven't been given any evidence that the approach I'm suggesting, which relies only on existing pieces of infrastructure and which worked for both Exynos and Rockchip, including V4L2, wouldn't work for SMMU and/or QC SoCs. Best regards, Tomasz ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [Freedreno] [PATCH v7 6/6] drm/msm: iommu: Replace runtime calls with runtime suppliers
On Wed, Feb 14, 2018 at 10:48 AM, Jordan Crousewrote: > On Wed, Feb 14, 2018 at 12:31:29PM +0900, Tomasz Figa wrote: >> >> - When submitting commands to the GPU, the GPU driver will >> pm_runtime_get_sync() on the GPU device, which will automatically do >> the same on all the linked suppliers, which would also include the >> SMMU itself. The role of device links here is exactly that the GPU >> driver doesn't have to care which other devices need to be brought up. > > This is true. Assuming that the device link works correctly we would not need > to explicitly power the SMMU which makes my point entirely moot. Just to point out what motivated this patchset, the biggest problem is iommu_unmap() because that can happen when GPU is not powered on (or in the v4l2 case, because some other device dropped it's reference to the dma-buf allowing it to be free'd). Currently we pm get/put the GPU device around unmap, but it is kinda silly to boot up the GPU just to unmap a buffer. (Semi-related, I would also like to batch map/unmap's, I just haven't gotten around to implementing it yet.. but that would be another case where a single get_supplier()/put_supplier() outside of the iommu would make sense instead of pm_get/put() inside the iommu driver's ->unmap().) If you really dislike the get/put_supplier() approach, then perhaps we need iommu_pm_get()/iommu_pm_put() operations that the iommu user could use to accomplish the same thing? BR, -R ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel