Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free
Hi,

On 4/24/19 10:45 PM, Christoph Hellwig wrote:
> On Tue, Apr 23, 2019 at 03:32:16PM +0800, Lu Baolu wrote:
>> When we add the bounce buffer between IOVA and physical buffer, the
>> bounce buffer must start at the same offset in a page, otherwise the
>> IOMMU can't work here.
>
> Why?  Even with the odd hardware descriptors typical in Intel land that
> only allow offsets in the first page, not the following ones, having a
> zero offset where we previously had one should be fine.

This is not VT-d specific. It's just how a generic IOMMU works.

Normally, an IOMMU works in paging mode. So if a driver issues DMA with IOVA 0x0123, the IOMMU can remap it to a physical address ending in the same page offset 0x123. But we should never expect the IOMMU to remap 0x0123 to a physical address with a different page offset. That's the reason why I said that the IOMMU will not work there.

[ASCII diagram, garbled in the archive: three columns — IOVA space, the swiotlb bounce page, and system memory. buffer_start in IOVA shares its intra-page offset with buffer_start in the bounce page (swiotlb mapping on one side, IOMMU page mappings on the other), and the same holds at buffer_end; the IOMMU mappings are page-granular at each IOMMU page boundary.]

This is the whole view of the IOMMU bounce page. I expect the buffer_start returned by swiotlb_tbl_map_single() to start at the same page offset as buffer_start in IOVA.

Best regards,
Lu Baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 3/7] iommu/vt-d: Expose ISA direct mapping region via iommu_get_resv_regions
Hi Tom,

On 4/25/19 7:47 AM, Tom Murphy wrote:
> I can see two potential problems with these patches that should be
> addressed:
>
> The default domain of a group can be changed to type IOMMU_DOMAIN_IDENTITY
> via the command line. With these patches we are returning the si_domain for
> type IOMMU_DOMAIN_IDENTITY. There's a chance the shared si_domain could be
> freed while it is still being used when a group is freed. For example here
> in iommu_group_release:
> https://github.com/torvalds/linux/blob/cd8dead0c39457e58ec1d36db93aedca811d48f1/drivers/iommu/iommu.c#L376
> "if (group->default_domain) iommu_domain_free(group->default_domain);"
>
> Also, now that a domain is attached to a device earlier, we should
> implement the is_attach_deferred callback and use it to defer the domain
> attach from iommu driver init to device driver init when the iommu is
> pre-enabled in a kdump kernel, like in this commit:
> https://github.com/torvalds/linux/commit/df3f7a6e8e855e4ff533508807cd7c3723faa51f

Both agreed and should be considered during development.

Best regards,
Lu Baolu

> On Tue, Apr 16, 2019 at 3:24 AM Lu Baolu wrote:
>> Hi James,
>>
>> On 4/15/19 10:16 PM, James Sewart wrote:
>>> Hey Lu,
>>>
>>>> On 10 Apr 2019, at 06:22, Lu Baolu wrote:
>>>>
>>>> Hi James,
>>>>
>>>> On 4/6/19 2:02 AM, James Sewart wrote:
>>>>> Hey Lu,
>>>>>
>>>>> My bad, did some debugging on my end. The issue was swapping out
>>>>> find_domain for iommu_get_domain_for_dev. It seems in some situations
>>>>> the domain is not attached to the group but the device is expected to
>>>>> have the domain still stored in its archdata.
>>>>>
>>>>> I've attached the final patch with find_domain unremoved which seems
>>>>> to work in my testing.
>>>>
>>>> Just looked into your v3 patch set and some thoughts from my end posted
>>>> here just for your information.
>>>>
>>>> Let me post the problems we want to address.
>>>>
>>>> 1. When allocating a new group for a device, how should we determine
>>>> the type of the default domain?
>>>>
>>>> 2. If we need to put a device into an existing group which uses a
>>>> different type of domain from what the device desires to use, we might
>>>> break the functionality of the device.
>>>>
>>>> My new thought is letting the generic IOMMU code determine the default
>>>> domain type (hence my proposed vendor-specific default domain type
>>>> patches could be dropped).
>>>>
>>>> If the default domain type is dynamic mapping, and later in
>>>> iommu_no_mapping() we determine that we must use an identity domain,
>>>> we then call iommu_request_dm_for_dev(dev).
>>>>
>>>> If the default domain type is identity mapping, and later in
>>>> iommu_no_mapping() we determine that we must use a dynamic domain,
>>>> we then call iommu_request_dma_domain_for_dev(dev).
>>>>
>>>> We already have iommu_request_dm_for_dev() in iommu.c. We only need to
>>>> implement iommu_request_dma_domain_for_dev().
>>>>
>>>> With this done, your patch titled "Create an IOMMU group for devices
>>>> that require an identity map" could also be dropped.
>>>>
>>>> Any thoughts?
>>>
>>> There's a couple of issues I can think of. iommu_request_dm_for_dev
>>> changes the domain for all devices within the device's group; if we may
>>> have devices with different domain requirements in the same group, then
>>> only the last initialised device will get the domain it wants. If we
>>> want to add iommu_request_dma_domain_for_dev, then we would still need
>>> another IOMMU group for identity domain devices.
>>
>> The current solution (a.k.a. v3) for this is to assign a new group to
>> the device which requires an identity mapped domain. This will break the
>> functionality of device direct pass-through (to user level). The iommu
>> group represents the minimum granularity of iommu isolation and
>> protection. The requirement of an identity mapped domain should not be a
>> factor for a new group.
>>
>> Both iommu_request_dma_domain_for_dev() and iommu_request_dm_for_dev()
>> only support groups with a single device in them.
>>
>> The users could choose between domains of DMA type or IDENTITY type for
>> a group. If multiple devices share a single group and neither type works
>> for them, they have to disable the IOMMU. This isn't a valid
>> configuration from the IOMMU's point of view.
>>
>>> Both with v3 and the proposed method here, iommu_should_identity_map
>>> determines whether the device requires an identity map. In v3 this is
>>> called once up front by the generic code to determine for the IOMMU
>>> group which domain type to use. In the proposed method I think this
>>> would happen after initial domain allocation and would trigger the
>>> domain to be replaced with a different domain.
>>>
>>> I prefer the solution in v3. What do you think?
>>
>> The only concern with the solution in v3, from my point of view, is some
>> kind of duplication. Even if we can ask the Intel IOMMU driver to tell
>> the default domain type, we might change it after boot. For example, for
>> 32-bit PCI devices, we don't know whether they are 64-bit or 32-bit
>> capable during IOMMU initialization, so we might tell iommu.c to use an
>> identity mapped domain. After boot, we check again and find out that the
>> device is only 32-bit capable (hence can only access physical memory
>> below 4G) and the default identity domain doesn't work. And we have to
>> change it to a DMA domain.
Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted
On Wed, Apr 24, 2019 at 10:01:56PM -0300, Thiago Jung Bauermann wrote:
> Michael S. Tsirkin writes:
> > On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote:
> >> Michael S. Tsirkin writes:
> >> > On Thu, Mar 21, 2019 at 09:05:04PM -0300, Thiago Jung Bauermann wrote:
> >> >> Michael S. Tsirkin writes:
> >> >> > On Wed, Mar 20, 2019 at 01:13:41PM -0300, Thiago Jung Bauermann wrote:
> >> >> >> From what I understand of the ACCESS_PLATFORM definition, the host
> >> >> >> will only ever try to access memory addresses that are supplied to
> >> >> >> it by the guest, so all of the secure guest memory that the host
> >> >> >> cares about is accessible:
> >> >> >>
> >> >> >>     If this feature bit is set to 0, then the device has same access
> >> >> >>     to memory addresses supplied to it as the driver has. In
> >> >> >>     particular, the device will always use physical addresses
> >> >> >>     matching addresses used by the driver (typically meaning
> >> >> >>     physical addresses used by the CPU) and not translated further,
> >> >> >>     and can access any address supplied to it by the driver. When
> >> >> >>     clear, this overrides any platform-specific description of
> >> >> >>     whether device access is limited or translated in any way, e.g.
> >> >> >>     whether an IOMMU may be present.
> >> >> >>
> >> >> >> All of the above is true for POWER guests, whether they are secure
> >> >> >> guests or not.
> >> >> >>
> >> >> >> Or are you saying that a virtio device may want to access memory
> >> >> >> addresses that weren't supplied to it by the driver?
> >> >> >
> >> >> > Your logic would apply to IOMMUs as well. For your mode, there are
> >> >> > specific encrypted memory regions that driver has access to but
> >> >> > device does not. That seems to violate the constraint.
> >> >>
> >> >> Right, if there's a pre-configured 1:1 mapping in the IOMMU such that
> >> >> the device can ignore the IOMMU for all practical purposes I would
> >> >> indeed say that the logic would apply to IOMMUs as well. :-)
> >> >>
> >> >> I guess I'm still struggling with the purpose of signalling to the
> >> >> driver that the host may not have access to memory addresses that it
> >> >> will never try to access.
> >> >
> >> > For example, one of the benefits is to signal to host that driver does
> >> > not expect ability to access all memory. If it does, host can
> >> > fail initialization gracefully.
> >>
> >> But why would the ability to access all memory be necessary or even
> >> useful? When would the host access memory that the driver didn't tell it
> >> to access?
> >
> > When I say all memory I mean even memory not allowed by the IOMMU.
>
> Yes, but why? How is that memory relevant?

It's relevant when the driver is not trusted to only supply correct
addresses. The feature was originally designed to support userspace
drivers within guests.

> >> >> >> >> > But the name "sev_active" makes me scared because at least AMD
> >> >> >> >> > guys who were doing the sensible thing and setting
> >> >> >> >> > ACCESS_PLATFORM
> >> >> >> >>
> >> >> >> >> My understanding is, AMD guest-platform knows in advance that
> >> >> >> >> their guest will run in secure mode and hence sets the flag at
> >> >> >> >> the time of VM instantiation. Unfortunately we don't have that
> >> >> >> >> luxury on our platforms.
> >> >> >> >
> >> >> >> > Well you do have that luxury. It looks like that there are
> >> >> >> > existing guests that already acknowledge ACCESS_PLATFORM and you
> >> >> >> > are not happy with how that path is slow. So you are trying to
> >> >> >> > optimize for them by clearing ACCESS_PLATFORM and then you have
> >> >> >> > lost ability to invoke DMA API.
> >> >> >> >
> >> >> >> > For example if there was another flag just like ACCESS_PLATFORM
> >> >> >> > just not yet used by anyone, you would be all fine using that
> >> >> >> > right?
> >> >> >>
> >> >> >> Yes, a new flag sounds like a great idea. What about the definition
> >> >> >> below?
> >> >> >>
> >> >> >>     VIRTIO_F_ACCESS_PLATFORM_NO_IOMMU This feature has the same
> >> >> >>     meaning as VIRTIO_F_ACCESS_PLATFORM both when set and when not
> >> >> >>     set, with the exception that the IOMMU is explicitly defined to
> >> >> >>     be off or bypassed when accessing memory addresses supplied to
> >> >> >>     the device by the driver. This flag should be set by the guest
> >> >> >>     if offered, but to allow for backward-compatibility device
> >> >> >>     implementations allow for it to be left unset by the guest. It
> >> >> >>     is an error to set both this flag and VIRTIO_F_ACCESS_PLATFORM.
> >> >> >
> >> >> > It looks kind of narrow but it's an option.
> >> >>
> >> >> Great!
> >> >>
> >> >> > I wonder how we'll define what's an iommu though.
Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted
Michael S. Tsirkin writes: > On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote: >> >> Michael S. Tsirkin writes: >> >> > On Thu, Mar 21, 2019 at 09:05:04PM -0300, Thiago Jung Bauermann wrote: >> >> >> >> Michael S. Tsirkin writes: >> >> >> >> > On Wed, Mar 20, 2019 at 01:13:41PM -0300, Thiago Jung Bauermann wrote: >> >> >> >From what I understand of the ACCESS_PLATFORM definition, the host >> >> >> >will >> >> >> only ever try to access memory addresses that are supplied to it by the >> >> >> guest, so all of the secure guest memory that the host cares about is >> >> >> accessible: >> >> >> >> >> >> If this feature bit is set to 0, then the device has same access to >> >> >> memory addresses supplied to it as the driver has. In particular, >> >> >> the device will always use physical addresses matching addresses >> >> >> used by the driver (typically meaning physical addresses used by >> >> >> the >> >> >> CPU) and not translated further, and can access any address >> >> >> supplied >> >> >> to it by the driver. When clear, this overrides any >> >> >> platform-specific description of whether device access is limited >> >> >> or >> >> >> translated in any way, e.g. whether an IOMMU may be present. >> >> >> >> >> >> All of the above is true for POWER guests, whether they are secure >> >> >> guests or not. >> >> >> >> >> >> Or are you saying that a virtio device may want to access memory >> >> >> addresses that weren't supplied to it by the driver? >> >> > >> >> > Your logic would apply to IOMMUs as well. For your mode, there are >> >> > specific encrypted memory regions that driver has access to but device >> >> > does not. that seems to violate the constraint. >> >> >> >> Right, if there's a pre-configured 1:1 mapping in the IOMMU such that >> >> the device can ignore the IOMMU for all practical purposes I would >> >> indeed say that the logic would apply to IOMMUs as well. 
:-) >> >> >> >> I guess I'm still struggling with the purpose of signalling to the >> >> driver that the host may not have access to memory addresses that it >> >> will never try to access. >> > >> > For example, one of the benefits is to signal to host that driver does >> > not expect ability to access all memory. If it does, host can >> > fail initialization gracefully. >> >> But why would the ability to access all memory be necessary or even >> useful? When would the host access memory that the driver didn't tell it >> to access? > > When I say all memory I mean even memory not allowed by the IOMMU. Yes, but why? How is that memory relevant? >> >> >> >> > But the name "sev_active" makes me scared because at least AMD >> >> >> >> > guys who >> >> >> >> > were doing the sensible thing and setting ACCESS_PLATFORM >> >> >> >> >> >> >> >> My understanding is, AMD guest-platform knows in advance that their >> >> >> >> guest will run in secure mode and hence sets the flag at the time >> >> >> >> of VM >> >> >> >> instantiation. Unfortunately we dont have that luxury on our >> >> >> >> platforms. >> >> >> > >> >> >> > Well you do have that luxury. It looks like that there are existing >> >> >> > guests that already acknowledge ACCESS_PLATFORM and you are not happy >> >> >> > with how that path is slow. So you are trying to optimize for >> >> >> > them by clearing ACCESS_PLATFORM and then you have lost ability >> >> >> > to invoke DMA API. >> >> >> > >> >> >> > For example if there was another flag just like ACCESS_PLATFORM >> >> >> > just not yet used by anyone, you would be all fine using that right? >> >> >> >> >> >> Yes, a new flag sounds like a great idea. What about the definition >> >> >> below? 
>> >> >> >> >> >> VIRTIO_F_ACCESS_PLATFORM_NO_IOMMU This feature has the same meaning as >> >> >> VIRTIO_F_ACCESS_PLATFORM both when set and when not set, with the >> >> >> exception that the IOMMU is explicitly defined to be off or >> >> >> bypassed >> >> >> when accessing memory addresses supplied to the device by the >> >> >> driver. This flag should be set by the guest if offered, but to >> >> >> allow for backward-compatibility device implementations allow for >> >> >> it >> >> >> to be left unset by the guest. It is an error to set both this flag >> >> >> and VIRTIO_F_ACCESS_PLATFORM. >> >> > >> >> > It looks kind of narrow but it's an option. >> >> >> >> Great! >> >> >> >> > I wonder how we'll define what's an iommu though. >> >> >> >> Hm, it didn't occur to me it could be an issue. I'll try. >> >> I rephrased it in terms of address translation. What do you think of >> this version? The flag name is slightly different too: >> >> >> VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION This feature has the same >> meaning as VIRTIO_F_ACCESS_PLATFORM both when set and when not set, >> with the exception that address translation is guaranteed to be >> unnecessary when accessing memory addresses supplied to the device >> by the driver. Which
Re: [PATCH v2 3/7] iommu/vt-d: Expose ISA direct mapping region via iommu_get_resv_regions
I can see two potential problems with these patches that should be addressed: The default domain of a group can be changed to type IOMMU_DOMAIN_IDENTITY via the command line. With these patches we are returning the si_domain for type IOMMU_DOMAIN_IDENTITY. There's a chance the shared si_domain could be freed while it is still being used when a group is freed. For example here in the iommu_group_release: https://github.com/torvalds/linux/blob/cd8dead0c39457e58ec1d36db93aedca811d48f1/drivers/iommu/iommu.c#L376 "if (group->default_domain) iommu_domain_free(group->default_domain);" Also now that a domain is attached to a device earlier we should implement the is_attach_deferred call-back and use it to defer the domain attach from iommu driver init to device driver init when iommu is pre-enabled in kdump kernel. like in this commit: https://github.com/torvalds/linux/commit/df3f7a6e8e855e4ff533508807cd7c3723faa51f On Tue, Apr 16, 2019 at 3:24 AM Lu Baolu wrote: > > Hi James, > > On 4/15/19 10:16 PM, James Sewart wrote: > > Hey Lu, > > > >> On 10 Apr 2019, at 06:22, Lu Baolu wrote: > >> > >> Hi James, > >> > >> On 4/6/19 2:02 AM, James Sewart wrote: > >>> Hey Lu, > >>> My bad, did some debugging on my end. The issue was swapping out > >>> find_domain for iommu_get_domain_for_dev. It seems in some situations the > >>> domain is not attached to the group but the device is expected to have the > >>> domain still stored in its archdata. > >>> I’ve attached the final patch with find_domain unremoved which seems to > >>> work in my testing. > >> > >> Just looked into your v3 patch set and some thoughts from my end posted > >> here just for your information. > >> > >> Let me post the problems we want to address. > >> > >> 1. When allocating a new group for a device, how should we determine the > >> type of the default domain? > >> > >> 2. 
If we need to put a device into an existing group which uses a > >> different type of domain from what the device desires to use, we might > >> break the functionality of the device. > >> > >> My new thought is letting the iommu generic code to determine the > >> default domain type (hence my proposed vendor specific default domain > >> type patches could be dropped). > >> > >> If the default domain type is dynamical mapping, and later in > >> iommu_no_mapping(), we determines that we must use an identity domain, > >> we then call iommu_request_dm_for_dev(dev). > >> > >> If the default domain type is identity mapping, and later in > >> iommu_no_mapping(), we determined that we must use a dynamical domain, > >> we then call iommu_request_dma_domain_for_dev(dev). > >> > >> We already have iommu_request_dm_for_dev() in iommu.c. We only need to > >> implement iommu_request_dma_domain_for_dev(). > >> > >> With this done, your patch titled "Create an IOMMU group for devices > >> that require an identity map" could also be dropped. > >> > >> Any thoughts? > > > > Theres a couple issues I can think of. iommu_request_dm_for_dev changes > > the domain for all devices within the devices group, if we may have > > devices with different domain requirements in the same group then only the > > last initialised device will get the domain it wants. If we want to add > > iommu_request_dma_domain_for_dev then we would still need another IOMMU > > group for identity domain devices. > > Current solution (a.k.a. v3) for this is to assign a new group to the > device which requires an identity mapped domain. This will break the > functionality of device direct pass-through (to user level). The iommu > group represents the minimum granularity of iommu isolation and > protection. The requirement of identity mapped domain should not be a > factor for a new group. > > Both iomm_request_dma_domain_for_dev() or iommu_request_dm_for_dev() > only support groups with a single device in it. 
> > The users could choose between domains of DMA type or IDENTITY type for > a group. If multiple devices share a single group and both don't work > for them, they have to disable the IOMMU. This isn't a valid > configuration from IOMMU's point of view. > > > > > Both with v3 and the proposed method here, iommu_should_identity_map is > > determining whether the device requires an identity map. In v3 this is > > called once up front by the generic code to determine for the IOMMU group > > which domain type to use. In the proposed method I think this would happen > > after initial domain allocation and would trigger the domain to be > > replaced with a different domain. > > > > I prefer the solution in v3. What do you think? > > The only concern of solution in v3 from my point of view is some kind of > duplication. Even we can ask the Intel IOMMU driver to tell the default > domain type, we might change it after boot up. For example, for 32-bit > pci devices, we don't know whether it's 64-bit or 32-bit capable during > IOMMU initialization, so we might tell iommu.c to use identity mapped > domain. After boot up, we check
[PATCH v2 9/9] platform/chrome: chromeos_laptop: use helper pci_dev_id
Use new helper pci_dev_id() to simplify the code.

Signed-off-by: Heiner Kallweit
---
 drivers/platform/chrome/chromeos_laptop.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/platform/chrome/chromeos_laptop.c b/drivers/platform/chrome/chromeos_laptop.c
index 24326eecd..7abbb6167 100644
--- a/drivers/platform/chrome/chromeos_laptop.c
+++ b/drivers/platform/chrome/chromeos_laptop.c
@@ -125,7 +125,7 @@ static bool chromeos_laptop_match_adapter_devid(struct device *dev, u32 devid)
 		return false;
 
 	pdev = to_pci_dev(dev);
-	return devid == PCI_DEVID(pdev->bus->number, pdev->devfn);
+	return devid == pci_dev_id(pdev);
 }
 
 static void chromeos_laptop_check_adapter(struct i2c_adapter *adapter)
-- 
2.21.0
[PATCH v2 8/9] stmmac: pci: use helper pci_dev_id
Use new helper pci_dev_id() to simplify the code.

Signed-off-by: Heiner Kallweit
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
index cc1e887e4..0e87a9596 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
@@ -208,7 +208,7 @@ static int quark_default_data(struct pci_dev *pdev,
 		ret = 1;
 	}
 
-	plat->bus_id = PCI_DEVID(pdev->bus->number, pdev->devfn);
+	plat->bus_id = pci_dev_id(pdev);
 	plat->phy_addr = ret;
 	plat->interface = PHY_INTERFACE_MODE_RMII;
-- 
2.21.0
[PATCH v2 7/9] iommu/vt-d: use helper pci_dev_id
Use new helper pci_dev_id() to simplify the code.

Signed-off-by: Heiner Kallweit
---
 drivers/iommu/intel-iommu.c         | 2 +-
 drivers/iommu/intel_irq_remapping.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index d93c4bd7d..3f475a23a 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1391,7 +1391,7 @@ static void iommu_enable_dev_iotlb(struct device_domain_info *info)
 
 		/* pdev will be returned if device is not a vf */
 		pf_pdev = pci_physfn(pdev);
-		info->pfsid = PCI_DEVID(pf_pdev->bus->number, pf_pdev->devfn);
+		info->pfsid = pci_dev_id(pf_pdev);
 	}
 
 #ifdef CONFIG_INTEL_IOMMU_SVM
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 634d8f059..4160aa9f3 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -424,7 +424,7 @@ static int set_msi_sid(struct irte *irte, struct pci_dev *dev)
 		set_irte_sid(irte, SVT_VERIFY_SID_SQ, SQ_ALL_16, data.alias);
 	else
 		set_irte_sid(irte, SVT_VERIFY_SID_SQ, SQ_ALL_16,
			     pci_dev_id(dev));
 
 	return 0;
 }
-- 
2.21.0
[PATCH v2 6/9] iommu/amd: use helper pci_dev_id
Use new helper pci_dev_id() to simplify the code.

Signed-off-by: Heiner Kallweit
---
 drivers/iommu/amd_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index f467cc4b4..5cb201422 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -165,7 +165,7 @@ static inline u16 get_pci_device_id(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 
-	return PCI_DEVID(pdev->bus->number, pdev->devfn);
+	return pci_dev_id(pdev);
 }
 
 static inline int get_acpihid_device_id(struct device *dev,
-- 
2.21.0
[PATCH v2 5/9] drm/amdkfd: use helper pci_dev_id
Use new helper pci_dev_id() to simplify the code.

Signed-off-by: Heiner Kallweit
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 2cb09e088..769dbc7be 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1272,8 +1272,7 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 	dev->node_props.vendor_id = gpu->pdev->vendor;
 	dev->node_props.device_id = gpu->pdev->device;
-	dev->node_props.location_id = PCI_DEVID(gpu->pdev->bus->number,
-			gpu->pdev->devfn);
+	dev->node_props.location_id = pci_dev_id(gpu->pdev);
 	dev->node_props.max_engine_clk_fcompute =
 		amdgpu_amdkfd_get_max_engine_clock_in_mhz(dev->gpu->kgd);
 	dev->node_props.max_engine_clk_ccompute =
-- 
2.21.0
[PATCH v2 4/9] powerpc/powernv/npu: use helper pci_dev_id
Use new helper pci_dev_id() to simplify the code.

Signed-off-by: Heiner Kallweit
---
 arch/powerpc/platforms/powernv/npu-dma.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index dc23d9d2a..495550432 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -1213,9 +1213,8 @@ int pnv_npu2_map_lpar_dev(struct pci_dev *gpdev, unsigned int lparid,
 	 * Currently we only support radix and non-zero LPCR only makes sense
 	 * for hash tables so skiboot expects the LPCR parameter to be a zero.
 	 */
-	ret = opal_npu_map_lpar(nphb->opal_id,
-			PCI_DEVID(gpdev->bus->number, gpdev->devfn), lparid,
-			0 /* LPCR bits */);
+	ret = opal_npu_map_lpar(nphb->opal_id, pci_dev_id(gpdev), lparid,
+				0 /* LPCR bits */);
 	if (ret) {
 		dev_err(&gpdev->dev, "Error %d mapping device to LPAR\n", ret);
 		return ret;
@@ -1224,7 +1223,7 @@ int pnv_npu2_map_lpar_dev(struct pci_dev *gpdev, unsigned int lparid,
 	dev_dbg(&gpdev->dev, "init context opalid=%llu msr=%lx\n",
 			nphb->opal_id, msr);
 	ret = opal_npu_init_context(nphb->opal_id, 0/*__unused*/, msr,
-				    PCI_DEVID(gpdev->bus->number, gpdev->devfn));
+				    pci_dev_id(gpdev));
 	if (ret < 0)
 		dev_err(&gpdev->dev, "Failed to init context: %d\n", ret);
 	else
@@ -1258,7 +1257,7 @@ int pnv_npu2_unmap_lpar_dev(struct pci_dev *gpdev)
 	dev_dbg(&gpdev->dev, "destroy context opalid=%llu\n",
 			nphb->opal_id);
 	ret = opal_npu_destroy_context(nphb->opal_id, 0/*__unused*/,
-				       PCI_DEVID(gpdev->bus->number, gpdev->devfn));
+				       pci_dev_id(gpdev));
 	if (ret < 0) {
 		dev_err(&gpdev->dev, "Failed to destroy context: %d\n", ret);
 		return ret;
 	}
 
 	/* Set LPID to 0 anyway, just to be safe */
 	dev_dbg(&gpdev->dev, "Map LPAR opalid=%llu lparid=0\n", nphb->opal_id);
-	ret = opal_npu_map_lpar(nphb->opal_id,
-			PCI_DEVID(gpdev->bus->number, gpdev->devfn), 0 /*LPID*/,
-			0 /* LPCR bits */);
+	ret = opal_npu_map_lpar(nphb->opal_id, pci_dev_id(gpdev), 0 /*LPID*/,
+				0 /* LPCR bits */);
 	if (ret)
 		dev_err(&gpdev->dev, "Error %d mapping device to LPAR\n", ret);
-- 
2.21.0
[PATCH v2 0/9] PCI: add helper pci_dev_id
In several places in the kernel we find PCI_DEVID used like this:

    PCI_DEVID(dev->bus->number, dev->devfn)

Therefore create a helper for it.

v2:
- apply the change to all affected places in the kernel

Heiner Kallweit (9):
  PCI: add helper pci_dev_id
  PCI: use helper pci_dev_id
  r8169: use new helper pci_dev_id
  powerpc/powernv/npu: use helper pci_dev_id
  drm/amdkfd: use helper pci_dev_id
  iommu/amd: use helper pci_dev_id
  iommu/vt-d: use helper pci_dev_id
  stmmac: pci: use helper pci_dev_id
  platform/chrome: chromeos_laptop: use helper pci_dev_id

 arch/powerpc/platforms/powernv/npu-dma.c         | 14 ++++++--------
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c        |  3 +--
 drivers/iommu/amd_iommu.c                        |  2 +-
 drivers/iommu/intel-iommu.c                      |  2 +-
 drivers/iommu/intel_irq_remapping.c              |  2 +-
 drivers/net/ethernet/realtek/r8169.c             |  3 +--
 drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c |  2 +-
 drivers/pci/msi.c                                |  6 +++---
 drivers/pci/search.c                             | 10 +++-------
 drivers/platform/chrome/chromeos_laptop.c        |  2 +-
 include/linux/pci.h                              |  5 +++++
 11 files changed, 24 insertions(+), 27 deletions(-)
-- 
2.21.0
[PATCH v2 3/9] r8169: use new helper pci_dev_id
Use new helper pci_dev_id() to simplify the code.

Signed-off-by: Heiner Kallweit
---
 drivers/net/ethernet/realtek/r8169.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index efaea1a0a..ae476fe8d 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -7020,8 +7020,7 @@ static int r8169_mdio_register(struct rtl8169_private *tp)
 	new_bus->priv = tp;
 	new_bus->parent = &pdev->dev;
 	new_bus->irq[0] = PHY_IGNORE_INTERRUPT;
-	snprintf(new_bus->id, MII_BUS_ID_SIZE, "r8169-%x",
-		 PCI_DEVID(pdev->bus->number, pdev->devfn));
+	snprintf(new_bus->id, MII_BUS_ID_SIZE, "r8169-%x", pci_dev_id(pdev));
 
 	new_bus->read = r8169_mdio_read_reg;
 	new_bus->write = r8169_mdio_write_reg;
-- 
2.21.0
[PATCH v2 2/9] PCI: use helper pci_dev_id
Use new helper pci_dev_id() to simplify the code.

Signed-off-by: Heiner Kallweit
---
 drivers/pci/msi.c    |  6 +++---
 drivers/pci/search.c | 10 +++-------
 2 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 73986825d..e039b740f 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1338,7 +1338,7 @@ irq_hw_number_t pci_msi_domain_calc_hwirq(struct pci_dev *dev,
 					  struct msi_desc *desc)
 {
 	return (irq_hw_number_t)desc->msi_attrib.entry_nr |
-		PCI_DEVID(dev->bus->number, dev->devfn) << 11 |
+		pci_dev_id(dev) << 11 |
 		(pci_domain_nr(dev->bus) & 0xFFFFFFFF) << 27;
 }
 
@@ -1508,7 +1508,7 @@ static int get_msi_id_cb(struct pci_dev *pdev, u16 alias, void *data)
 u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev)
 {
 	struct device_node *of_node;
-	u32 rid = PCI_DEVID(pdev->bus->number, pdev->devfn);
+	u32 rid = pci_dev_id(pdev);
 
 	pci_for_each_dma_alias(pdev, get_msi_id_cb, &rid);
 
@@ -1531,7 +1531,7 @@ u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev)
 struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev)
 {
 	struct irq_domain *dom;
-	u32 rid = PCI_DEVID(pdev->bus->number, pdev->devfn);
+	u32 rid = pci_dev_id(pdev);
 
 	pci_for_each_dma_alias(pdev, get_msi_id_cb, &rid);
 	dom = of_msi_map_get_device_domain(&pdev->dev, rid);
diff --git a/drivers/pci/search.c b/drivers/pci/search.c
index 2b5f72086..5c7922612 100644
--- a/drivers/pci/search.c
+++ b/drivers/pci/search.c
@@ -33,7 +33,7 @@ int pci_for_each_dma_alias(struct pci_dev *pdev,
 	struct pci_bus *bus;
 	int ret;
 
-	ret = fn(pdev, PCI_DEVID(pdev->bus->number, pdev->devfn), data);
+	ret = fn(pdev, pci_dev_id(pdev), data);
 	if (ret)
 		return ret;
 
@@ -88,9 +88,7 @@ int pci_for_each_dma_alias(struct pci_dev *pdev,
 				return ret;
 			continue;
 		case PCI_EXP_TYPE_PCIE_BRIDGE:
-			ret = fn(tmp,
-				 PCI_DEVID(tmp->bus->number,
-					   tmp->devfn), data);
+			ret = fn(tmp, pci_dev_id(tmp), data);
 			if (ret)
 				return ret;
 			continue;
@@ -101,9 +99,7 @@ int pci_for_each_dma_alias(struct pci_dev *pdev,
 					 PCI_DEVID(tmp->subordinate->number,
 						   PCI_DEVFN(0, 0)), data);
 			else
-				ret = fn(tmp,
-					 PCI_DEVID(tmp->bus->number,
-						   tmp->devfn), data);
+				ret = fn(tmp, pci_dev_id(tmp), data);
 			if (ret)
 				return ret;
 		}
-- 
2.21.0
[PATCH v2 1/9] PCI: add helper pci_dev_id
In several places in the kernel we find PCI_DEVID used like this:
PCI_DEVID(dev->bus->number, dev->devfn)
Therefore create a helper for it.

Signed-off-by: Heiner Kallweit
---
 include/linux/pci.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 2c056a7a7..28d0313a3 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -598,6 +598,11 @@ struct pci_bus {
 #define to_pci_bus(n)	container_of(n, struct pci_bus, dev)

+static inline u16 pci_dev_id(struct pci_dev *dev)
+{
+	return PCI_DEVID(dev->bus->number, dev->devfn);
+}
+
 /*
  * Returns true if the PCI bus is root (behind host-PCI bridge),
  * false otherwise
--
2.21.0
Re: [patch V2 18/29] lockdep: Move stack trace logic into check_prev_add()
On Wed, 24 Apr 2019, Peter Zijlstra wrote:
> On Thu, Apr 18, 2019 at 10:41:37AM +0200, Thomas Gleixner wrote:
> > There is only one caller of check_prev_add() which hands in a zeroed struct
> > stack trace and a function pointer to save_stack(). Inside check_prev_add()
> > the stack_trace struct is checked for being empty, which is always
> > true. Based on that one code path stores a stack trace which is unused. The
> > comment there does not make sense either. It's all leftovers from
> > historical lockdep code (cross release).
>
> I was more or less expecting a revert of:
>
>   ce07a9415f26 ("locking/lockdep: Make check_prev_add() able to handle external stack_trace")
>
> And then I read the comment that went with the "static struct
> stack_trace trace" that got removed (in the above commit) and realized
> that your patch will consume more stack entries.
>
> The problem is when the held lock stack in check_prevs_add() has multiple
> trylock entries on top; in that case we call check_prev_add() multiple
> times, and this patch will then save the exact same stack trace multiple
> times, consuming static resources.
>
> Possibly we should copy what stackdepot does (but we cannot use it
> directly because stackdepot uses locks; possibly we can share bits),
> but that is a patch for another day I think.
>
> So while convoluted, perhaps we should retain this code for now.

Uurg, what a mess.
Re: [patch V2 19/29] lockdep: Simplify stack trace handling
On Thu, Apr 18, 2019 at 10:41:38AM +0200, Thomas Gleixner wrote:
> Replace the indirection through struct stack_trace by using the storage
> array based interfaces and storing the information in a small lockdep
> specific data structure.

Acked-by: Peter Zijlstra (Intel)
Re: [patch V2 18/29] lockdep: Move stack trace logic into check_prev_add()
On Thu, Apr 18, 2019 at 10:41:37AM +0200, Thomas Gleixner wrote:
> There is only one caller of check_prev_add() which hands in a zeroed struct
> stack trace and a function pointer to save_stack(). Inside check_prev_add()
> the stack_trace struct is checked for being empty, which is always
> true. Based on that one code path stores a stack trace which is unused. The
> comment there does not make sense either. It's all leftovers from
> historical lockdep code (cross release).

I was more or less expecting a revert of:

  ce07a9415f26 ("locking/lockdep: Make check_prev_add() able to handle external stack_trace")

And then I read the comment that went with the "static struct
stack_trace trace" that got removed (in the above commit) and realized
that your patch will consume more stack entries.

The problem is when the held lock stack in check_prevs_add() has multiple
trylock entries on top; in that case we call check_prev_add() multiple
times, and this patch will then save the exact same stack trace multiple
times, consuming static resources.

Possibly we should copy what stackdepot does (but we cannot use it
directly because stackdepot uses locks; possibly we can share bits),
but that is a patch for another day I think.

So while convoluted, perhaps we should retain this code for now.
Re: [PATCH v2 RFC/RFT 1/5] ARM: dma-mapping: Add fallback normal page allocations
On Wed, Apr 24, 2019 at 09:26:52PM +0200, Christoph Hellwig wrote:
> On Wed, Apr 24, 2019 at 11:33:11AM -0700, Nicolin Chen wrote:
> > I feel it's similar to my previous set, which did most of these
> > internally except the renaming part. But Catalin had a concern
> > that some platforms might have limits on CMA range [1]. Will it
> > still be okay to do the fallback internally?
> >
> > [1: https://www.spinics.net/lists/arm-kernel/msg714295.html ]
>
> Catalin's statement is correct, but I don't see how it applies to
> your patch. Your patch just ensures that the fallback we have
> in most callers is uniformly applied everywhere. The non-iommu
> callers will still need to select a specific zone and/or retry
> just the page allocator with other flags if the CMA (or fallback)
> page doesn't match what they need. dma-direct does this correctly
> and I think the arm32 allocator does as well, although it is a bit
> hard to follow sometimes.

Okay. I will revise and submit the patches then. I think we can
still discuss this topic once we have the changes.

Thanks
Re: [PATCH v2 RFC/RFT 1/5] ARM: dma-mapping: Add fallback normal page allocations
On Wed, Apr 24, 2019 at 11:33:11AM -0700, Nicolin Chen wrote:
> I feel it's similar to my previous set, which did most of these
> internally except the renaming part. But Catalin had a concern
> that some platforms might have limits on CMA range [1]. Will it
> still be okay to do the fallback internally?
>
> [1: https://www.spinics.net/lists/arm-kernel/msg714295.html ]

Catalin's statement is correct, but I don't see how it applies to
your patch. Your patch just ensures that the fallback we have
in most callers is uniformly applied everywhere. The non-iommu
callers will still need to select a specific zone and/or retry
just the page allocator with other flags if the CMA (or fallback)
page doesn't match what they need. dma-direct does this correctly
and I think the arm32 allocator does as well, although it is a bit
hard to follow sometimes.
Re: [PATCH v2 RFC/RFT 1/5] ARM: dma-mapping: Add fallback normal page allocations
Hi Christoph,

On Wed, Apr 24, 2019 at 05:06:38PM +0200, Christoph Hellwig wrote:
> On Tue, Mar 26, 2019 at 04:01:27PM -0700, Nicolin Chen wrote:
> > 	page = dma_alloc_from_contiguous(dev, count, order, gfp & __GFP_NOWARN);
> > +	if (!page)
> > +		page = alloc_pages(gfp, order);
>
> We have this fallback in most callers already. And with me adding
> it to the dma-iommu code in one series, and you to arm here, I think
> we really need to take a step back and think of a better way
> to handle this, and the general mess that dma_alloc_from_contiguous is.
>
> So what about:

Thanks for the suggestion!

> (1) change the dma_alloc_from_contiguous prototype to be:
>
> 	struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp);
>
>     that is: calculate order and count internally, pass the full gfp_t
>     and mask it internally, and drop the pointless "from" in the name.
>     I'd also use the opportunity to forbid a NULL dev argument and
>     open-code those uses.
>
> (2) handle the alloc_pages fallback internally. Note that we should
>     use alloc_pages_node as we do in dma-direct.

I feel it's similar to my previous set, which did most of these
internally except the renaming part. But Catalin had a concern
that some platforms might have limits on CMA range [1]. Will it
still be okay to do the fallback internally?

[1: https://www.spinics.net/lists/arm-kernel/msg714295.html ]
Re: [PATCH v3 08/10] iommu/vt-d: Check whether device requires bounce buffer
On Tue, Apr 23, 2019 at 03:35:51PM +0800, Lu Baolu wrote:
> Hi,
>
> On 4/23/19 2:08 PM, Christoph Hellwig wrote:
> > > > Again, this and the option should not be in a specific iommu driver.
> > >
> > > The option of whether bounce is ignored should be in the specific iommu
> > > driver.
> >
> > Why? As a user I could not care less which IOMMU driver my particular
> > system uses.
>
> Looks reasonable to me. Let's listen to more opinions.

I agree with Christoph here.

> Best regards,
> Lu Baolu
Re: [PATCH v2 09/19] iommu/vt-d: Enlightened PASID allocation
Hi Jacob, On 4/24/19 1:31 AM, Jacob Pan wrote: > From: Lu Baolu > > If Intel IOMMU runs in caching mode, a.k.a. virtual IOMMU, the > IOMMU driver should rely on the emulation software to allocate > and free PASID IDs. Do we make the decision depending on the CM or depending on the VCCAP_REG? VCCAP_REG description says: If Set, software must use Virtual Command Register interface to allocate and free PASIDs. The Intel vt-d spec revision 3.0 defines a > register set to support this. This includes a capability register, > a virtual command register and a virtual response register. Refer > to section 10.4.42, 10.4.43, 10.4.44 for more information. > > This patch adds the enlightened PASID allocation/free interfaces For mu curiosity why is it called "enlightened"? > via the virtual command register. > > Cc: Ashok Raj > Cc: Jacob Pan > Cc: Kevin Tian > Signed-off-by: Liu Yi L > Signed-off-by: Lu Baolu > --- > drivers/iommu/intel-pasid.c | 70 > + > drivers/iommu/intel-pasid.h | 13 - > include/linux/intel-iommu.h | 2 ++ > 3 files changed, 84 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c > index 03b12d2..5b1d3be 100644 > --- a/drivers/iommu/intel-pasid.c > +++ b/drivers/iommu/intel-pasid.c > @@ -63,6 +63,76 @@ void *intel_pasid_lookup_id(int pasid) > return p; > } > > +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid) > +{ > + u64 res; > + u64 cap; > + u8 err_code; > + unsigned long flags; > + int ret = 0; > + > + if (!ecap_vcs(iommu->ecap)) { > + pr_warn("IOMMU: %s: Hardware doesn't support virtual command\n", > + iommu->name); nit: other pr_* messages don't have the "IOMMU: %s:" prefix. 
> + return -ENODEV; > + } > + > + cap = dmar_readq(iommu->reg + DMAR_VCCAP_REG); > + if (!(cap & DMA_VCS_PAS)) { > + pr_warn("IOMMU: %s: Emulation software doesn't support PASID > allocation\n", > + iommu->name); > + return -ENODEV; > + } > + > + raw_spin_lock_irqsave(>register_lock, flags); > + dmar_writeq(iommu->reg + DMAR_VCMD_REG, VCMD_CMD_ALLOC); > + IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq, > + !(res & VCMD_VRSP_IP), res); > + raw_spin_unlock_irqrestore(>register_lock, flags); > + > + err_code = VCMD_VRSP_EC(res); > + switch (err_code) { > + case VCMD_VRSP_EC_SUCCESS: > + *pasid = VCMD_VRSP_RESULE(res); > + break; > + case VCMD_VRSP_EC_UNAVAIL: > + pr_info("IOMMU: %s: No PASID available\n", iommu->name); > + ret = -ENOMEM; > + break; > + default: > + ret = -ENODEV; > + pr_warn("IOMMU: %s: Unkonwn error code %d\n", unknown > + iommu->name, err_code); > + } > + > + return ret; > +} > + > +void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid) > +{ > + u64 res; > + u8 err_code; > + unsigned long flags; Shall we check as well the cap is set? 
> + > + raw_spin_lock_irqsave(>register_lock, flags); > + dmar_writeq(iommu->reg + DMAR_VCMD_REG, (pasid << 8) | VCMD_CMD_FREE); > + IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq, > + !(res & VCMD_VRSP_IP), res); > + raw_spin_unlock_irqrestore(>register_lock, flags); > + > + err_code = VCMD_VRSP_EC(res); > + switch (err_code) { > + case VCMD_VRSP_EC_SUCCESS: > + break; > + case VCMD_VRSP_EC_INVAL: > + pr_info("IOMMU: %s: Invalid PASID\n", iommu->name); > + break; > + default: > + pr_warn("IOMMU: %s: Unkonwn error code %d\n", unknown > + iommu->name, err_code); > + } > +} > + > /* > * Per device pasid table management: > */ > diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h > index 23537b3..0999dfe 100644 > --- a/drivers/iommu/intel-pasid.h > +++ b/drivers/iommu/intel-pasid.h > @@ -19,6 +19,16 @@ > #define PASID_PDE_SHIFT 6 > #define MAX_NR_PASID_BITS20 > > +/* Virtual command interface for enlightened pasid management. */ > +#define VCMD_CMD_ALLOC 0x1 > +#define VCMD_CMD_FREE0x2 > +#define VCMD_VRSP_IP 0x1 > +#define VCMD_VRSP_EC(e) (((e) >> 1) & 0x3) s/EC/SC? for Status Code and below > +#define VCMD_VRSP_EC_SUCCESS 0 > +#define VCMD_VRSP_EC_UNAVAIL 1 nit: _NO_VALID_PASID > +#define VCMD_VRSP_EC_INVAL 1 nit: _INVALID_PASID > +#define VCMD_VRSP_RESULE(e) (((e) >> 8) & 0xf) nit: s/RESULE/RSLT? > + > /* > * Domain ID reserved for pasid entries programmed for first-level > * only and pass-through transfer modes. > @@ -69,5 +79,6 @@ int intel_pasid_setup_pass_through(struct intel_iommu > *iommu, >
Re: [PATCH v2 10/19] iommu/vt-d: Add custom allocator for IOASID
Hi Jacob, On 4/24/19 1:31 AM, Jacob Pan wrote: > When VT-d driver runs in the guest, PASID allocation must be > performed via virtual command interface. This patch register a registers > custom IOASID allocator which takes precedence over the default > IDR based allocator. nit: s/IDR based// . It is xarray based now. The resulting IOASID allocation will always > come from the host. This ensures that PASID namespace is system- > wide. > > Signed-off-by: Lu Baolu > Signed-off-by: Liu, Yi L > Signed-off-by: Jacob Pan > --- > drivers/iommu/intel-iommu.c | 58 > + > include/linux/intel-iommu.h | 2 ++ > 2 files changed, 60 insertions(+) > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c > index d93c4bd..ec6f22d 100644 > --- a/drivers/iommu/intel-iommu.c > +++ b/drivers/iommu/intel-iommu.c > @@ -1711,6 +1711,8 @@ static void free_dmar_iommu(struct intel_iommu *iommu) > if (ecap_prs(iommu->ecap)) > intel_svm_finish_prq(iommu); > } > + ioasid_unregister_allocator(>pasid_allocator); > + > #endif > } > > @@ -4811,6 +4813,46 @@ static int __init platform_optin_force_iommu(void) > return 1; > } > > +static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data) > +{ > + struct intel_iommu *iommu = data; > + ioasid_t ioasid; > + > + /* > + * VT-d virtual command interface always uses the full 20 bit > + * PASID range. Host can partition guest PASID range based on > + * policies but it is out of guest's control. > + */ The above comment does not exactly relate to the check below > + if (min < PASID_MIN || max > PASID_MAX) > + return -EINVAL; > + > + if (vcmd_alloc_pasid(iommu, )) > + return INVALID_IOASID; > + > + return ioasid; > +} > + > +static int intel_ioasid_free(ioasid_t ioasid, void *data) > +{ > + struct iommu_pasid_alloc_info *svm; > + struct intel_iommu *iommu = data; > + > + if (!iommu || !cap_caching_mode(iommu->cap)) > + return -EINVAL; can !cap_caching_mode(iommu->cap) be true as the allocator only is set if CM? 
> + /* > + * Sanity check the ioasid owner is done at upper layer, e.g. VFIO > + * We can only free the PASID when all the devices are unbond. > + */ > + svm = ioasid_find(NULL, ioasid, NULL); > + if (!svm) { you can avoid using the local svm variable. > + pr_warn("Freeing unbond IOASID %d\n", ioasid); unbound > + return -EBUSY; -EINVAL? > + } > + vcmd_free_pasid(iommu, ioasid); > + > + return 0; > +} > + > int __init intel_iommu_init(void) > { > int ret = -ENODEV; > @@ -4912,6 +4954,22 @@ int __init intel_iommu_init(void) > "%s", iommu->name); > iommu_device_set_ops(>iommu, _iommu_ops); > iommu_device_register(>iommu); > + if (cap_caching_mode(iommu->cap) && sm_supported(iommu)) { so shouldn't you test VCCAP_REG as well? > + /* > + * Register a custom ASID allocator if we are running > + * in a guest, the purpose is to have a system wide > PASID > + * namespace among all PASID users. > + * There can be multiple vIOMMUs in each guest but only > + * one allocator is active. All vIOMMU allocators will > + * eventually be calling the same host allocator. 
> + */ > + iommu->pasid_allocator.alloc = intel_ioasid_alloc; > + iommu->pasid_allocator.free = intel_ioasid_free; > + iommu->pasid_allocator.pdata = (void *)iommu; > + ret = > ioasid_register_allocator(>pasid_allocator); > + if (ret) > + pr_warn("Custom PASID allocator registeration > failed\n"); registration > + } > } > > bus_set_iommu(_bus_type, _iommu_ops); > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h > index bff907b..c24c8aa 100644 > --- a/include/linux/intel-iommu.h > +++ b/include/linux/intel-iommu.h > @@ -31,6 +31,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -549,6 +550,7 @@ struct intel_iommu { > #ifdef CONFIG_INTEL_IOMMU_SVM > struct page_req_dsc *prq; > unsigned char prq_name[16];/* Name for PRQ interrupt */ > + struct ioasid_allocator pasid_allocator; /* Custom allocator for PASIDs > */ > #endif > struct q_inval *qi;/* Queued invalidation info */ > u32 *iommu_state; /* Store iommu states between suspend and resume.*/ > Thanks Eric ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 12/19] iommu/vt-d: Move domain helper to header
Hi Jacob On 4/24/19 1:31 AM, Jacob Pan wrote: > Move domainer helper to header to be used by SVA code. s/domainer/domain > > Signed-off-by: Jacob Pan Reviewed-by: Eric Auger Eric > --- > drivers/iommu/intel-iommu.c | 6 -- > include/linux/intel-iommu.h | 6 ++ > 2 files changed, 6 insertions(+), 6 deletions(-) > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c > index 785330a..77bbe1b 100644 > --- a/drivers/iommu/intel-iommu.c > +++ b/drivers/iommu/intel-iommu.c > @@ -427,12 +427,6 @@ static void init_translation_status(struct intel_iommu > *iommu) > iommu->flags |= VTD_FLAG_TRANS_PRE_ENABLED; > } > > -/* Convert generic 'struct iommu_domain to private struct dmar_domain */ > -static struct dmar_domain *to_dmar_domain(struct iommu_domain *dom) > -{ > - return container_of(dom, struct dmar_domain, domain); > -} > - > static int __init intel_iommu_setup(char *str) > { > if (!str) > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h > index c24c8aa..48fa164 100644 > --- a/include/linux/intel-iommu.h > +++ b/include/linux/intel-iommu.h > @@ -597,6 +597,12 @@ static inline void __iommu_flush_cache( > clflush_cache_range(addr, size); > } > > +/* Convert generic 'struct iommu_domain to private struct dmar_domain */ > +static inline struct dmar_domain *to_dmar_domain(struct iommu_domain *dom) > +{ > + return container_of(dom, struct dmar_domain, domain); > +} > + > /* > * 0: readable > * 1: writable > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v1] iommu/amd: flush not present cache in iommu_map_page
check if there is a not-present cache present and flush it if there is.

Signed-off-by: Tom Murphy
---
 drivers/iommu/amd_iommu.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index f7cdd2ab7f11..91fe5cb10f50 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1637,6 +1637,11 @@ static int iommu_map_page(struct protection_domain *dom,

 	update_domain(dom);

+	if (unlikely(amd_iommu_np_cache && !dom->updated)) {
+		domain_flush_pages(dom, bus_addr, page_size);
+		domain_flush_complete(dom);
+	}
+
 	/* Everything flushed out, free pages now */
 	free_page_list(freelist);
--
2.17.1
Re: [PATCH v2 0/2] iommu/arm-smmu-v3: make sure the kdump kernel can work well when smmu is enabled
On 16/04/2019 11:14, Will Deacon wrote: > On Mon, Apr 08, 2019 at 10:31:47AM +0800, Leizhen (ThunderTown) wrote: >> On 2019/4/4 23:30, Will Deacon wrote: >>> On Mon, Mar 18, 2019 at 09:12:41PM +0800, Zhen Lei wrote: v1 --> v2: 1. Drop part2. Now, we only use the SMMUv3 hardware feature STE.config=0b000 (Report abort to device, no event recorded) to suppress the event messages caused by the unexpected devices. 2. rewrite the patch description. >>> >>> This issue came up a while back: >>> >>> https://lore.kernel.org/linux-pci/20180302103032.gb19...@arm.com/ >>> >>> and I'd still prefer to solve it using the disable_bypass logic which we >>> already have. Something along the lines of the diff below? >> >> Yes, my patches also use disable_bypass=1(set ste.config=0b000). If >> SMMU_IDR0.ST_LEVEL=0(Linear Stream table supported), then all STE entries >> are allocated and initialized(set ste.config=0b000). But if >> SMMU_IDR0.ST_LEVEL=1 >> (2-level Stream Table), we only allocated and initialized the first level >> tables, >> but leave level 2 tables dynamic allocated. That means, >> C_BAD_STREAMID(eventid=0x2) >> will be reported, if an unexpeted device access memory without reinitialized >> in >> kdump kernel. > > So is your problem just that the C_BAD_STREAMID events are noisy? If so, > perhaps we should be disabling fault reporting entirely in the kdump kernel. > > How about the update diff below? I'm keen to have this as simple as > possible, so we don't end up introducing rarely tested, complex code on > the crash path. 
> > Will > > --->8 > > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c > index d3880010c6cf..d8b73da6447d 100644 > --- a/drivers/iommu/arm-smmu-v3.c > +++ b/drivers/iommu/arm-smmu-v3.c > @@ -2454,13 +2454,9 @@ static int arm_smmu_device_reset(struct > arm_smmu_device *smmu, bool bypass) > /* Clear CR0 and sync (disables SMMU and queue processing) */ > reg = readl_relaxed(smmu->base + ARM_SMMU_CR0); > if (reg & CR0_SMMUEN) { > - if (is_kdump_kernel()) { > - arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0); > - arm_smmu_device_disable(smmu); > - return -EBUSY; > - } > - > dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n"); > + WARN_ON(is_kdump_kernel() && !disable_bypass); > + arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0); > } > > ret = arm_smmu_device_disable(smmu); > @@ -2553,6 +2549,8 @@ static int arm_smmu_device_reset(struct arm_smmu_device > *smmu, bool bypass) > return ret; > } > > + if (is_kdump_kernel()) > + enables &= ~(CR0_EVTQEN | CR0_PRIQEN); > > /* Enable the SMMU interface, or ensure bypass */ > if (!bypass || disable_bypass) { > Same here I tested the patch and it works for me. Feel free to add: Tested-by: Matthias Brugger Regards, Matthias ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 56/79] docs: Documentation/*.txt: rename all ReST files to *.rst
Em Wed, 24 Apr 2019 16:54:10 +0200 Borislav Petkov escreveu: > On Wed, Apr 24, 2019 at 07:40:07AM -0300, Mauro Carvalho Chehab wrote: > > Personally, I don't care much with monospaced fonts on this table. After > > all, if I want to see it monospaced, I can simply click at the > > "View page source" at the browser, and it will display the file as a > > plain old monospaced text file. > > Goes to show why kernel people wouldn't want to look at that in > the browser. Long hex numbers are hard to read as it is - that's > why there's even the 4-digit separator in some docs, for example: > 0x__8100_. IMHO, even the 0x and _ would make it harder to read. This is a way more easy for my eyes: 8100 > Not having it monospaced makes the whole thing even less readable. Yeah, I see your point and agree with it. Just saying that, if all I want is to check if addresses that start with 80 belongs to the guard hole, or just to copy a value from a table into some C code, the font doesn't matter much, and, if I care, a simple click would show it in monospaced fonts. Looking from your PoV, something like: |8000 | -2GB | 9fff | 512 MB | kernel text mapping, mapped to physical address 0 | is very hard to be parsed by a human eye, even with monospaced fonts. In order to make it easier, I would replace it by: | 8000 | -2GB | 9fff | 512 MB | kernel text mapping, mapped to physical address 0 | > > That's why it is important for the markup not to get in the way of > people looking at those files in an editor. Fully agreed. the markups should make things easier and not harder for people to read its contents. Thanks, Mauro ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/amd: flush not present cache in iommu_map_page
On Wed, Apr 24, 2019 at 4:55 PM Joerg Roedel wrote:
> On Wed, Apr 24, 2019 at 07:58:19AM -0700, Christoph Hellwig wrote:
> > I'd be tempted to do that. But lets just ask Joerg if he has
> > any opinion..
>
> The reason was that it is an unlikely path, as hardware implementations
> are not allowed to set this bit. It is purely for emulated AMD IOMMUs.
> I have not measured whether this annotation has any performance
> benefit, but I find it more readable at least.

In that case I will keep it in.

> Regards,
>
> Joerg
>
> PS: Why did you drop me from the Cc list of the previous replies?
Re: [PATCH] iommu/amd: flush not present cache in iommu_map_page
On Wed, Apr 24, 2019 at 07:58:19AM -0700, Christoph Hellwig wrote:
> I'd be tempted to do that. But lets just ask Joerg if he has
> any opinion..

The reason was that it is an unlikely path, as hardware implementations
are not allowed to set this bit. It is purely for emulated AMD IOMMUs.
I have not measured whether this annotation has any performance
benefit, but I find it more readable at least.

Regards,

Joerg

PS: Why did you drop me from the Cc list of the previous replies?
Re: [PATCH v2 RFC/RFT 1/5] ARM: dma-mapping: Add fallback normal page allocations
On Wed, Apr 24, 2019 at 05:06:38PM +0200, Christoph Hellwig wrote:
> I'd also use the opportunity to forbid a NULL dev argument and
> open-code those uses.

Actually, looking at your last patch again the NULL argument might
still fit in ok, so maybe we should keep it.
Re: [PATCH] iommu/amd: flush not present cache in iommu_map_page
On Wed, Apr 24, 2019 at 03:47:39PM +0100, Tom Murphy wrote:
> > And I'd really like to understand the unlikely - amd_iommu_np_cache
> > is set based on a hardware capability, so it seems rather odd to mark
> > it unlikely. Dynamic branch prediction really should do the right thing
> > here usually.
>
> Here is the commit which added it without any explanation:
> https://github.com/torvalds/linux/commit/270cab2426cdc6307725e4f1f46ecf8ab8e69193
>
> Should we remove it, seeing as there's no explanation given?

I'd be tempted to do that. But let's just ask Joerg if he has
any opinion..
Re: [RFC PATCH] dma-mapping: create iommu mapping for newly allocated dma coherent mem
I'd be happy to offload all of the mentioned tasks to you if you volunteer.

I think the first step is to move the two USB controllers that can only
DMA to their own BAR off the existing DMA coherent infrastructure. The
controllers are already identified using the HCD_LOCAL_MEM flag, so we
just need to key off that in the hcd_buffer_* routines and call into a
genalloc that has been fed using the BAR, replacing the current
dma_declare_coherent usage. Take a look at drivers/pci/p2pdma.c for
another example of allocating bits of a BAR using genalloc.

Another issue in that area just popped up, which is remoteproc_virtio.c,
which looks like a complete trainwreck. I'll need to spend time to figure
out what the hell they are trying to do first, though.

Then we need to kill the dma_declare_coherent_memory and
dma_release_declared_memory exports ASAP to avoid growing more users.

Next is figuring out how we want to plumb the OF / early boot coherent
regions into the iommu drivers. I think I want to handle them similar to
the per-device CMA regions, that is, each of the DMA ops has to manually
call into it instead of the page allocator. Fortunately we are down to
only about a handful of instances that are relevant for the reserved
memory now.
Re: [PATCH v2 56/79] docs: Documentation/*.txt: rename all ReST files to *.rst
On Wed, Apr 24, 2019 at 07:40:07AM -0300, Mauro Carvalho Chehab wrote: > Personally, I don't care much with monospaced fonts on this table. After > all, if I want to see it monospaced, I can simply click at the > "View page source" at the browser, and it will display the file as a > plain old monospaced text file. Goes to show why kernel people wouldn't want to look at that in the browser. Long hex numbers are hard to read as it is - that's why there's even the 4-digit separator in some docs, for example: 0x__8100_. Not having it monospaced makes the whole thing even less readable. That's why it is important for the markup not to get in the way of people looking at those files in an editor. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/amd: flush not present cache in iommu_map_page
>The two conditions can go into one if statement to make this a little >more clear. Ah, yeah of course >And I'd really like to understand the unlikely - amd_iommu_np_cache >is set based on a hardware capability, so it seems rather odd to mark >it unlikely. Dynamic branch prediction really should do the right thing >here usually. Here is the commit which added it without any explanation: https://github.com/torvalds/linux/commit/270cab2426cdc6307725e4f1f46ecf8ab8e69193 should we remove it seen as there's no explanation given ? On Wed, Apr 24, 2019 at 3:32 PM Christoph Hellwig wrote: > > On Wed, Apr 24, 2019 at 03:18:59PM +0100, Tom Murphy via iommu wrote: > > check if there is a not-present cache present and flush it if there is. > > > > Signed-off-by: Tom Murphy > > --- > > drivers/iommu/amd_iommu.c | 6 ++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c > > index f7cdd2ab7f11..8ef43224aae0 100644 > > --- a/drivers/iommu/amd_iommu.c > > +++ b/drivers/iommu/amd_iommu.c > > @@ -1636,6 +1636,12 @@ static int iommu_map_page(struct protection_domain > > *dom, > > pte[i] = __pte; > > > > update_domain(dom); > > + if (!dom->updated) { > > + if (unlikely(amd_iommu_np_cache)) { > > + domain_flush_pages(dom, bus_addr, page_size); > > + domain_flush_complete(dom); > > + } > > + } > > The two conditions can go into one if statement to make this a little > more clear. > > And I'd really like to understand the unlikely - amd_iommu_np_cache > is set based on a hardware capability, so it seems rather odd to mark > it unlikely. Dynamic branch prediction really should do the right thing > here usually. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free
On Tue, Apr 23, 2019 at 03:32:16PM +0800, Lu Baolu wrote: > When we add the bounce buffer between IOVA and physical buffer, the > bounced buffer must starts from the same offset in a page, otherwise, > IOMMU can't work here. Why? Even with the odd hardware descriptors typical in Intel land that only allow offsets in the first page, not the following ones, having a zero offset where we previously had one should be fine.
Re: [PATCH] dma-mapping: remove an unnecessary NULL check
On Wed, Apr 24, 2019 at 05:24:37PM +0300, Dan Carpenter wrote: > We already dereferenced "dev" when we called get_dma_ops() so this NULL > check is too late. We're not supposed to pass NULL "dev" pointers to > dma_alloc_attrs(). Thanks, applied to the dma-mapping for-next tree. > Signed-off-by: Dan Carpenter > --- > There are still at least two drivers which do pass a NULL unfortunately. > > drivers/staging/comedi/drivers/comedi_isadma.c:195 comedi_isadma_alloc() > error: NULL dereference inside function 'dma_alloc_coherent()' > drivers/staging/comedi/drivers/comedi_isadma.c:227 comedi_isadma_free() > error: NULL dereference inside function 'dma_free_coherent()' This is staging code. Per official decree from Linus we can just ignore it, and I tend to do so to keep my sanity. > drivers/tty/synclink.c:3667 mgsl_alloc_buffer_list_memory() error: NULL > dereference inside function 'dma_alloc_coherent()' > drivers/tty/synclink.c:3738 mgsl_free_buffer_list_memory() error: NULL > dereference inside function 'dma_free_coherent()' > drivers/tty/synclink.c:3777 mgsl_alloc_frame_memory() error: NULL dereference > inside function 'dma_alloc_coherent()' > drivers/tty/synclink.c:3811 mgsl_free_frame_memory() error: NULL dereference > inside function 'dma_free_coherent()' The !PCI case there is dead since I removed PCI support a while ago. Looks like it is still too convoluted for static checkers to notice that, though. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu/amd: flush not present cache in iommu_map_page
check if there is a not-present cache present and flush it if there is. Signed-off-by: Tom Murphy --- drivers/iommu/amd_iommu.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index f7cdd2ab7f11..8ef43224aae0 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -1636,6 +1636,12 @@ static int iommu_map_page(struct protection_domain *dom, pte[i] = __pte; update_domain(dom); + if (!dom->updated) { + if (unlikely(amd_iommu_np_cache)) { + domain_flush_pages(dom, bus_addr, page_size); + domain_flush_complete(dom); + } + } /* Everything flushed out, free pages now */ free_page_list(freelist); -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 3/6] iommu/ipmmu-vmsa: Make IPMMU_CTX_MAX unsigned
Make the IPMMU_CTX_MAX constant unsigned, to match the type of ipmmu_features.number_of_contexts. This allows to use plain min() instead of type-casting min_t(). Signed-off-by: Geert Uytterhoeven Reviewed-by: Laurent Pinchart Reviewed-by: Simon Horman --- v3: - Add Reviewed-by, v2: - Add Reviewed-by. --- drivers/iommu/ipmmu-vmsa.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c index f2061bd1dc7b8852..87acf86f295fac0d 100644 --- a/drivers/iommu/ipmmu-vmsa.c +++ b/drivers/iommu/ipmmu-vmsa.c @@ -36,7 +36,7 @@ #define arm_iommu_detach_device(...) do {} while (0) #endif -#define IPMMU_CTX_MAX 8 +#define IPMMU_CTX_MAX 8U struct ipmmu_features { bool use_ns_alias_offset; @@ -1060,8 +1060,7 @@ static int ipmmu_probe(struct platform_device *pdev) if (mmu->features->use_ns_alias_offset) mmu->base += IM_NS_ALIAS_OFFSET; - mmu->num_ctx = min_t(unsigned int, IPMMU_CTX_MAX, -mmu->features->number_of_contexts); + mmu->num_ctx = min(IPMMU_CTX_MAX, mmu->features->number_of_contexts); irq = platform_get_irq(pdev, 0); -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 4/6] iommu/ipmmu-vmsa: Move num_utlbs to SoC-specific features
The maximum number of micro-TLBs per IPMMU instance is not fixed, but depends on the SoC type. Hence move it from struct ipmmu_vmsa_device to struct ipmmu_features, and set up the correct value for both R-Car Gen2 and Gen3 SoCs. Note that currently no code uses this value. Signed-off-by: Geert Uytterhoeven Reviewed-by: Laurent Pinchart Reviewed-by: Simon Horman --- v3: - Add Reviewed-by, v2: - Add Reviewed-by. --- drivers/iommu/ipmmu-vmsa.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c index 87acf86f295fac0d..3fa57627b1e35562 100644 --- a/drivers/iommu/ipmmu-vmsa.c +++ b/drivers/iommu/ipmmu-vmsa.c @@ -42,6 +42,7 @@ struct ipmmu_features { bool use_ns_alias_offset; bool has_cache_leaf_nodes; unsigned int number_of_contexts; + unsigned int num_utlbs; bool setup_imbuscr; bool twobit_imttbcr_sl0; bool reserved_context; @@ -53,7 +54,6 @@ struct ipmmu_vmsa_device { struct iommu_device iommu; struct ipmmu_vmsa_device *root; const struct ipmmu_features *features; - unsigned int num_utlbs; unsigned int num_ctx; spinlock_t lock;/* Protects ctx and domains[] */ DECLARE_BITMAP(ctx, IPMMU_CTX_MAX); @@ -972,6 +972,7 @@ static const struct ipmmu_features ipmmu_features_default = { .use_ns_alias_offset = true, .has_cache_leaf_nodes = false, .number_of_contexts = 1, /* software only tested with one context */ + .num_utlbs = 32, .setup_imbuscr = true, .twobit_imttbcr_sl0 = false, .reserved_context = false, @@ -981,6 +982,7 @@ static const struct ipmmu_features ipmmu_features_rcar_gen3 = { .use_ns_alias_offset = false, .has_cache_leaf_nodes = true, .number_of_contexts = 8, + .num_utlbs = 48, .setup_imbuscr = false, .twobit_imttbcr_sl0 = true, .reserved_context = true, @@ -1033,7 +1035,6 @@ static int ipmmu_probe(struct platform_device *pdev) } mmu->dev = >dev; - mmu->num_utlbs = 48; spin_lock_init(>lock); bitmap_zero(mmu->ctx, IPMMU_CTX_MAX); mmu->features = of_device_get_match_data(>dev); -- 2.17.1 ___ 
[PATCH v3 0/6] iommu/ipmmu-vmsa: Suspend/resume support and assorted cleanups
Hi Jörg, Magnus, On R-Car Gen3 systems with PSCI, PSCI may power down the SoC during system suspend, thus losing all IOMMU state. Hence after s2ram, devices behind an IPMMU (e.g. SATA), and configured to use it, will fail to complete their I/O operations. This patch series adds suspend/resume support to the Renesas IPMMU-VMSA IOMMU driver, and performs some smaller cleanups and fixes during the process. Most patches are fairly independent, except for patch 6/6, which depends on patches 4/6 and 5/6. Changes compared to v2: - Fix sysfs path typo in patch description, - Add Reviewed-by. Changes compared to v1: - Dropped "iommu/ipmmu-vmsa: Call ipmmu_ctx_write_root() instead of open coding", - Add Reviewed-by, - Merge IMEAR/IMELAR, - s/ipmmu_context_init/ipmmu_domain_setup_context/, - Drop PSCI checks. This has been tested on Salvator-XS with R-Car H3 ES2.0, with IPMMU support for SATA enabled. To play safe, the resume operation has also been tested on R-Car M2-W. Thanks! Geert Uytterhoeven (6): iommu/ipmmu-vmsa: Link IOMMUs and devices in sysfs iommu/ipmmu-vmsa: Prepare to handle 40-bit error addresses iommu/ipmmu-vmsa: Make IPMMU_CTX_MAX unsigned iommu/ipmmu-vmsa: Move num_utlbs to SoC-specific features iommu/ipmmu-vmsa: Extract hardware context initialization iommu/ipmmu-vmsa: Add suspend/resume support drivers/iommu/ipmmu-vmsa.c | 185 + 1 file changed, 124 insertions(+), 61 deletions(-) -- 2.17.1 Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
[PATCH v3 6/6] iommu/ipmmu-vmsa: Add suspend/resume support
During PSCI system suspend, R-Car Gen3 SoCs are powered down, and all IPMMU state is lost. Hence after s2ram, devices wired behind an IPMMU, and configured to use it, will see their DMA operations hang. To fix this, restore all IPMMU contexts, and re-enable all active micro-TLBs during system resume. Signed-off-by: Geert Uytterhoeven --- This patch takes a different approach than the BSP, which implements a bulk save/restore of all registers during system suspend/resume. v3: - No changes, v2: - Drop PSCI checks. --- drivers/iommu/ipmmu-vmsa.c | 47 +- 1 file changed, 46 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c index 56e84bcc9532e1ce..408ad0b2591925e0 100644 --- a/drivers/iommu/ipmmu-vmsa.c +++ b/drivers/iommu/ipmmu-vmsa.c @@ -36,7 +36,10 @@ #define arm_iommu_detach_device(...) do {} while (0) #endif -#define IPMMU_CTX_MAX 8U +#define IPMMU_CTX_MAX 8U +#define IPMMU_CTX_INVALID -1 + +#define IPMMU_UTLB_MAX 48U struct ipmmu_features { bool use_ns_alias_offset; @@ -58,6 +61,7 @@ struct ipmmu_vmsa_device { spinlock_t lock;/* Protects ctx and domains[] */ DECLARE_BITMAP(ctx, IPMMU_CTX_MAX); struct ipmmu_vmsa_domain *domains[IPMMU_CTX_MAX]; + s8 utlb_ctx[IPMMU_UTLB_MAX]; struct iommu_group *group; struct dma_iommu_mapping *mapping; @@ -335,6 +339,7 @@ static void ipmmu_utlb_enable(struct ipmmu_vmsa_domain *domain, ipmmu_write(mmu, IMUCTR(utlb), IMUCTR_TTSEL_MMU(domain->context_id) | IMUCTR_FLUSH | IMUCTR_MMUEN); + mmu->utlb_ctx[utlb] = domain->context_id; } /* @@ -346,6 +351,7 @@ static void ipmmu_utlb_disable(struct ipmmu_vmsa_domain *domain, struct ipmmu_vmsa_device *mmu = domain->mmu; ipmmu_write(mmu, IMUCTR(utlb), 0); + mmu->utlb_ctx[utlb] = IPMMU_CTX_INVALID; } static void ipmmu_tlb_flush_all(void *cookie) @@ -1043,6 +1049,7 @@ static int ipmmu_probe(struct platform_device *pdev) spin_lock_init(>lock); bitmap_zero(mmu->ctx, IPMMU_CTX_MAX); mmu->features = of_device_get_match_data(>dev); + memset(mmu->utlb_ctx, 
IPMMU_CTX_INVALID, mmu->features->num_utlbs); dma_set_mask_and_coherent(>dev, DMA_BIT_MASK(40)); /* Map I/O memory and request IRQ. */ @@ -1158,10 +1165,48 @@ static int ipmmu_remove(struct platform_device *pdev) return 0; } +#ifdef CONFIG_PM_SLEEP +static int ipmmu_resume_noirq(struct device *dev) +{ + struct ipmmu_vmsa_device *mmu = dev_get_drvdata(dev); + unsigned int i; + + /* Reset root MMU and restore contexts */ + if (ipmmu_is_root(mmu)) { + ipmmu_device_reset(mmu); + + for (i = 0; i < mmu->num_ctx; i++) { + if (!mmu->domains[i]) + continue; + + ipmmu_domain_setup_context(mmu->domains[i]); + } + } + + /* Re-enable active micro-TLBs */ + for (i = 0; i < mmu->features->num_utlbs; i++) { + if (mmu->utlb_ctx[i] == IPMMU_CTX_INVALID) + continue; + + ipmmu_utlb_enable(mmu->root->domains[mmu->utlb_ctx[i]], i); + } + + return 0; +} + +static const struct dev_pm_ops ipmmu_pm = { + SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(NULL, ipmmu_resume_noirq) +}; +#define DEV_PM_OPS _pm +#else +#define DEV_PM_OPS NULL +#endif /* CONFIG_PM_SLEEP */ + static struct platform_driver ipmmu_driver = { .driver = { .name = "ipmmu-vmsa", .of_match_table = of_match_ptr(ipmmu_of_ids), + .pm = DEV_PM_OPS, }, .probe = ipmmu_probe, .remove = ipmmu_remove, -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 1/6] iommu/ipmmu-vmsa: Link IOMMUs and devices in sysfs
As of commit 7af9a5fdb9e0ca33 ("iommu/ipmmu-vmsa: Use iommu_device_sysfs_add()/remove()"), IOMMU devices show up under /sys/class/iommu/, but their "devices" subdirectories are empty. Likewise, devices tied to an IOMMU do not have an "iommu" backlink. Make sure all links are created, on both arm32 and arm64. Signed-off-by: Geert Uytterhoeven Reviewed-by: Laurent Pinchart --- v3: - Fix sysfs path typo in patch description, v2: - Add Reviewed-by. --- drivers/iommu/ipmmu-vmsa.c | 24 +--- 1 file changed, 17 insertions(+), 7 deletions(-) diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c index 9a380c10655e182d..9f2b781e20a0eba6 100644 --- a/drivers/iommu/ipmmu-vmsa.c +++ b/drivers/iommu/ipmmu-vmsa.c @@ -885,27 +885,37 @@ static int ipmmu_init_arm_mapping(struct device *dev) static int ipmmu_add_device(struct device *dev) { + struct ipmmu_vmsa_device *mmu = to_ipmmu(dev); struct iommu_group *group; + int ret; /* * Only let through devices that have been verified in xlate() */ - if (!to_ipmmu(dev)) + if (!mmu) return -ENODEV; - if (IS_ENABLED(CONFIG_ARM) && !IS_ENABLED(CONFIG_IOMMU_DMA)) - return ipmmu_init_arm_mapping(dev); + if (IS_ENABLED(CONFIG_ARM) && !IS_ENABLED(CONFIG_IOMMU_DMA)) { + ret = ipmmu_init_arm_mapping(dev); + if (ret) + return ret; + } else { + group = iommu_group_get_for_dev(dev); + if (IS_ERR(group)) + return PTR_ERR(group); - group = iommu_group_get_for_dev(dev); - if (IS_ERR(group)) - return PTR_ERR(group); + iommu_group_put(group); + } - iommu_group_put(group); + iommu_device_link(>iommu, dev); return 0; } static void ipmmu_remove_device(struct device *dev) { + struct ipmmu_vmsa_device *mmu = to_ipmmu(dev); + + iommu_device_unlink(>iommu, dev); arm_iommu_detach_device(dev); iommu_group_remove_device(dev); } -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 2/6] iommu/ipmmu-vmsa: Prepare to handle 40-bit error addresses
On R-Car Gen3, the faulting virtual address is a 40-bit address, and comprised of two registers. Read the upper address part, and combine both parts, when running on a 64-bit system. Signed-off-by: Geert Uytterhoeven Reviewed-by: Simon Horman --- Apart from this, the driver doesn't support 40-bit IOVA addresses yet. v3: - Add Reviewed-by, v2: - Merge IMEAR/IMELAR. --- drivers/iommu/ipmmu-vmsa.c | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c index 9f2b781e20a0eba6..f2061bd1dc7b8852 100644 --- a/drivers/iommu/ipmmu-vmsa.c +++ b/drivers/iommu/ipmmu-vmsa.c @@ -186,7 +186,8 @@ static struct ipmmu_vmsa_device *to_ipmmu(struct device *dev) #define IMMAIR_ATTR_IDX_WBRWA 1 #define IMMAIR_ATTR_IDX_DEV2 -#define IMEAR 0x0030 +#define IMELAR 0x0030 /* IMEAR on R-Car Gen2 */ +#define IMEUAR 0x0034 /* R-Car Gen3 only */ #define IMPCTR 0x0200 #define IMPSTR 0x0208 @@ -522,14 +523,16 @@ static irqreturn_t ipmmu_domain_irq(struct ipmmu_vmsa_domain *domain) { const u32 err_mask = IMSTR_MHIT | IMSTR_ABORT | IMSTR_PF | IMSTR_TF; struct ipmmu_vmsa_device *mmu = domain->mmu; + unsigned long iova; u32 status; - u32 iova; status = ipmmu_ctx_read_root(domain, IMSTR); if (!(status & err_mask)) return IRQ_NONE; - iova = ipmmu_ctx_read_root(domain, IMEAR); + iova = ipmmu_ctx_read_root(domain, IMELAR); + if (IS_ENABLED(CONFIG_64BIT)) + iova |= (u64)ipmmu_ctx_read_root(domain, IMEUAR) << 32; /* * Clear the error status flags. Unlike traditional interrupt flag @@ -541,10 +544,10 @@ static irqreturn_t ipmmu_domain_irq(struct ipmmu_vmsa_domain *domain) /* Log fatal errors. 
*/ if (status & IMSTR_MHIT) - dev_err_ratelimited(mmu->dev, "Multiple TLB hits @0x%08x\n", + dev_err_ratelimited(mmu->dev, "Multiple TLB hits @0x%lx\n", iova); if (status & IMSTR_ABORT) - dev_err_ratelimited(mmu->dev, "Page Table Walk Abort @0x%08x\n", + dev_err_ratelimited(mmu->dev, "Page Table Walk Abort @0x%lx\n", iova); if (!(status & (IMSTR_PF | IMSTR_TF))) @@ -560,7 +563,7 @@ static irqreturn_t ipmmu_domain_irq(struct ipmmu_vmsa_domain *domain) return IRQ_HANDLED; dev_err_ratelimited(mmu->dev, - "Unhandled fault: status 0x%08x iova 0x%08x\n", + "Unhandled fault: status 0x%08x iova 0x%lx\n", status, iova); return IRQ_HANDLED; -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 5/6] iommu/ipmmu-vmsa: Extract hardware context initialization
ipmmu_domain_init_context() takes care of (1) initializing the software domain, and (2) initializing the hardware context for the domain. Extract the code to initialize the hardware context into a new subroutine ipmmu_domain_setup_context(), to prepare for later reuse. Signed-off-by: Geert Uytterhoeven Reviewed-by: Simon Horman --- v3: - Add Reviewed-by, v2: - s/ipmmu_context_init/ipmmu_domain_setup_context/. --- drivers/iommu/ipmmu-vmsa.c | 91 -- 1 file changed, 48 insertions(+), 43 deletions(-) diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c index 3fa57627b1e35562..56e84bcc9532e1ce 100644 --- a/drivers/iommu/ipmmu-vmsa.c +++ b/drivers/iommu/ipmmu-vmsa.c @@ -404,52 +404,10 @@ static void ipmmu_domain_free_context(struct ipmmu_vmsa_device *mmu, spin_unlock_irqrestore(>lock, flags); } -static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain) +static void ipmmu_domain_setup_context(struct ipmmu_vmsa_domain *domain) { u64 ttbr; u32 tmp; - int ret; - - /* -* Allocate the page table operations. -* -* VMSA states in section B3.6.3 "Control of Secure or Non-secure memory -* access, Long-descriptor format" that the NStable bit being set in a -* table descriptor will result in the NStable and NS bits of all child -* entries being ignored and considered as being set. The IPMMU seems -* not to comply with this, as it generates a secure access page fault -* if any of the NStable and NS bits isn't set when running in -* non-secure mode. -*/ - domain->cfg.quirks = IO_PGTABLE_QUIRK_ARM_NS; - domain->cfg.pgsize_bitmap = SZ_1G | SZ_2M | SZ_4K; - domain->cfg.ias = 32; - domain->cfg.oas = 40; - domain->cfg.tlb = _gather_ops; - domain->io_domain.geometry.aperture_end = DMA_BIT_MASK(32); - domain->io_domain.geometry.force_aperture = true; - /* -* TODO: Add support for coherent walk through CCI with DVM and remove -* cache handling. For now, delegate it to the io-pgtable code. 
-*/ - domain->cfg.iommu_dev = domain->mmu->root->dev; - - /* -* Find an unused context. -*/ - ret = ipmmu_domain_allocate_context(domain->mmu->root, domain); - if (ret < 0) - return ret; - - domain->context_id = ret; - - domain->iop = alloc_io_pgtable_ops(ARM_32_LPAE_S1, >cfg, - domain); - if (!domain->iop) { - ipmmu_domain_free_context(domain->mmu->root, - domain->context_id); - return -EINVAL; - } /* TTBR0 */ ttbr = domain->cfg.arm_lpae_s1_cfg.ttbr[0]; @@ -495,7 +453,54 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain) */ ipmmu_ctx_write_all(domain, IMCTR, IMCTR_INTEN | IMCTR_FLUSH | IMCTR_MMUEN); +} + +static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain) +{ + int ret; + + /* +* Allocate the page table operations. +* +* VMSA states in section B3.6.3 "Control of Secure or Non-secure memory +* access, Long-descriptor format" that the NStable bit being set in a +* table descriptor will result in the NStable and NS bits of all child +* entries being ignored and considered as being set. The IPMMU seems +* not to comply with this, as it generates a secure access page fault +* if any of the NStable and NS bits isn't set when running in +* non-secure mode. +*/ + domain->cfg.quirks = IO_PGTABLE_QUIRK_ARM_NS; + domain->cfg.pgsize_bitmap = SZ_1G | SZ_2M | SZ_4K; + domain->cfg.ias = 32; + domain->cfg.oas = 40; + domain->cfg.tlb = _gather_ops; + domain->io_domain.geometry.aperture_end = DMA_BIT_MASK(32); + domain->io_domain.geometry.force_aperture = true; + /* +* TODO: Add support for coherent walk through CCI with DVM and remove +* cache handling. For now, delegate it to the io-pgtable code. +*/ + domain->cfg.iommu_dev = domain->mmu->root->dev; + + /* +* Find an unused context. 
+*/ + ret = ipmmu_domain_allocate_context(domain->mmu->root, domain); + if (ret < 0) + return ret; + + domain->context_id = ret; + + domain->iop = alloc_io_pgtable_ops(ARM_32_LPAE_S1, >cfg, + domain); + if (!domain->iop) { + ipmmu_domain_free_context(domain->mmu->root, + domain->context_id); + return -EINVAL; + } + ipmmu_domain_setup_context(domain); return 0; } -- 2.17.1 ___ iommu mailing list
[RFC v3 3/3] samples: add vfio-mdev-pci driver
This patch adds sample driver named vfio-mdev-pci. It is to wrap a PCI device as a mediated device. For a pci device, once bound to vfio-mdev-pci driver, user space access of this device will go through vfio mdev framework. The usage of the device follows mdev management method. e.g. user should create a mdev before exposing the device to user-space. Benefit of this new driver would be acting as a sample driver for recent changes from "vfio/mdev: IOMMU aware mediated device" patchset. Also it could be a good experiment driver for future device specific mdev migration support. To use this driver: a) build and load vfio-mdev-pci.ko module execute "make menuconfig" and config CONFIG_SAMPLE_VFIO_MDEV_PCI then load it with following command > sudo modprobe vfio > sudo modprobe vfio-pci > sudo insmod drivers/vfio/pci/vfio-mdev-pci.ko b) unbind original device driver e.g. use following command to unbind its original driver > echo $dev_bdf > /sys/bus/pci/devices/$dev_bdf/driver/unbind c) bind vfio-mdev-pci driver to the physical device > echo $vend_id $dev_id > /sys/bus/pci/drivers/vfio-mdev-pci/new_id d) check the supported mdev instances > ls /sys/bus/pci/devices/$dev_bdf/mdev_supported_types/ vfio-mdev-pci-type1 > ls /sys/bus/pci/devices/$dev_bdf/mdev_supported_types/\ vfio-mdev-pci-type1/ available_instances create device_api devices name e) create mdev on this physical device (only 1 instance) > echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1003" > \ /sys/bus/pci/devices/$dev_bdf/mdev_supported_types/\ vfio-mdev-pci-type1/create f) passthru the mdev to guest add the following line in Qemu boot command -device vfio-pci,\ sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1003 g) destroy mdev > echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1003/\ remove Cc: Kevin Tian Cc: Lu Baolu Cc: Masahiro Yamada Suggested-by: Alex Williamson Signed-off-by: Liu, Yi L --- drivers/vfio/pci/Makefile| 5 + drivers/vfio/pci/vfio_mdev_pci.c | 386 +++ samples/Kconfig | 
11 ++ 3 files changed, 402 insertions(+) create mode 100644 drivers/vfio/pci/vfio_mdev_pci.c diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile index 813f6b3..6a05393 100644 --- a/drivers/vfio/pci/Makefile +++ b/drivers/vfio/pci/Makefile @@ -3,4 +3,9 @@ vfio-pci-y := vfio_pci.o common.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_conf vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o vfio-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o +vfio-mdev-pci-y := vfio_mdev_pci.o common.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o +vfio-mdev-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o +vfio-mdev-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o + obj-$(CONFIG_VFIO_PCI) += vfio-pci.o +obj-$(CONFIG_SAMPLE_VFIO_MDEV_PCI) += vfio-mdev-pci.o diff --git a/drivers/vfio/pci/vfio_mdev_pci.c b/drivers/vfio/pci/vfio_mdev_pci.c new file mode 100644 index 000..aec7a5b --- /dev/null +++ b/drivers/vfio/pci/vfio_mdev_pci.c @@ -0,0 +1,386 @@ +/* + * Copyright © 2019 Intel Corporation. + * Author: Liu, Yi L + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Derived from original vfio_pci.c: + * Copyright (C) 2012 Red Hat, Inc. All rights reserved. + * Author: Alex Williamson + * + * Derived from original vfio: + * Copyright 2010 Cisco Systems, Inc. All rights reserved. 
+ * Author: Tom Lyon, p...@cisco.com + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "vfio_pci_private.h" + +#define DRIVER_VERSION "0.1" +#define DRIVER_AUTHOR "Liu, Yi L " +#define DRIVER_DESC "VFIO Mdev PCI - Sample driver for PCI device as a mdev" + +#define VFIO_MDEV_PCI_NAME "vfio-mdev-pci" + +static char ids[1024] __initdata; +module_param_string(ids, ids, sizeof(ids), 0); +MODULE_PARM_DESC(ids, "Initial PCI IDs to add to the vfio-mdev-pci driver, format is \"vendor:device[:subvendor[:subdevice[:class[:class_mask\" and multiple comma separated entries can be specified"); + +static bool nointxmask; +module_param_named(nointxmask, nointxmask, bool, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(nointxmask, + "Disable support for PCI 2.3 style INTx masking. If this resolves problems for specific devices, report lspci -vvvxxx to linux-...@vger.kernel.org so the device can be fixed automatically via the broken_intx_masking flag."); + +#ifdef CONFIG_VFIO_PCI_VGA +static bool disable_vga; +module_param(disable_vga, bool, S_IRUGO); +MODULE_PARM_DESC(disable_vga, "Disable VGA resource access through vfio-mdev-pci");
[RFC v3 1/3] vfio_pci: split vfio_pci.c into two source files
This patch splits the non-module specific codes from original drivers/vfio/pci/vfio_pci.c into a common.c under drivers/vfio/pci. This is for potential code sharing. e.g. vfio-mdev-pci driver Cc: Kevin Tian Cc: Lu Baolu Signed-off-by: Liu, Yi L --- drivers/vfio/pci/Makefile |2 +- drivers/vfio/pci/common.c | 1511 +++ drivers/vfio/pci/vfio_pci.c | 1476 +- drivers/vfio/pci/vfio_pci_private.h | 27 + 4 files changed, 1551 insertions(+), 1465 deletions(-) create mode 100644 drivers/vfio/pci/common.c diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile index 9662c06..813f6b3 100644 --- a/drivers/vfio/pci/Makefile +++ b/drivers/vfio/pci/Makefile @@ -1,5 +1,5 @@ -vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o +vfio-pci-y := vfio_pci.o common.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o vfio-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o diff --git a/drivers/vfio/pci/common.c b/drivers/vfio/pci/common.c new file mode 100644 index 000..847e2e4 --- /dev/null +++ b/drivers/vfio/pci/common.c @@ -0,0 +1,1511 @@ +/* + * Copyright © 2019 Intel Corporation. + * Author: Liu, Yi L + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Derived from original vfio_pci.c: + * Copyright (C) 2012 Red Hat, Inc. All rights reserved. + * Author: Alex Williamson + * + * Derived from original vfio: + * Copyright 2010 Cisco Systems, Inc. All rights reserved. 
+ * Author: Tom Lyon, p...@cisco.com + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "vfio_pci_private.h" + +inline bool vfio_vga_disabled(struct vfio_pci_device *vdev) +{ +#ifdef CONFIG_VFIO_PCI_VGA + return vdev->disable_vga; +#else + return true; +#endif +} + +/* + * Our VGA arbiter participation is limited since we don't know anything + * about the device itself. However, if the device is the only VGA device + * downstream of a bridge and VFIO VGA support is disabled, then we can + * safely return legacy VGA IO and memory as not decoded since the user + * has no way to get to it and routing can be disabled externally at the + * bridge. + */ +static unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga) +{ + struct vfio_pci_device *vdev = opaque; + struct pci_dev *tmp = NULL, *pdev = vdev->pdev; + unsigned char max_busnr; + unsigned int decodes; + + if (single_vga || !vfio_vga_disabled(vdev) || + pci_is_root_bus(pdev->bus)) + return VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM | + VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM; + + max_busnr = pci_bus_max_busnr(pdev->bus); + decodes = VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM; + + while ((tmp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, tmp)) != NULL) { + if (tmp == pdev || + pci_domain_nr(tmp->bus) != pci_domain_nr(pdev->bus) || + pci_is_root_bus(tmp->bus)) + continue; + + if (tmp->bus->number >= pdev->bus->number && + tmp->bus->number <= max_busnr) { + pci_dev_put(tmp); + decodes |= VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM; + break; + } + } + + return decodes; +} + +inline bool vfio_pci_is_vga(struct pci_dev *pdev) +{ + return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA; +} + +void vfio_pci_vga_probe(struct vfio_pci_device *vdev) +{ + vga_client_register(vdev->pdev, vdev, NULL, vfio_pci_set_vga_decode); + 
vga_set_legacy_decoding(vdev->pdev, + vfio_pci_set_vga_decode(vdev, false)); +} + +void vfio_pci_vga_remove(struct vfio_pci_device *vdev) +{ + vga_client_register(vdev->pdev, NULL, NULL, NULL); + vga_set_legacy_decoding(vdev->pdev, + VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM | + VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM); +} + +static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev) +{ + struct resource *res; + int bar; + struct vfio_pci_dummy_resource *dummy_res; + + INIT_LIST_HEAD(>dummy_resources_list); + + for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; bar++) { + res = vdev->pdev->resource + bar; + + if (!IS_ENABLED(CONFIG_VFIO_PCI_MMAP)) + goto no_mmap; + + if (!(res->flags & IORESOURCE_MEM)) + goto no_mmap; + +
[RFC v3 2/3] vfio/pci: protect cap/ecap_perm bits alloc/free with atomic op
There is a case in which cap_perms and ecap_perms can be reallocated by different modules. e.g. the vfio-mdev-pci sample driver. To secure the initialization of cap_perms and ecap_perms, this patch adds an atomic variable to track the user of cap/ecap_perms bits. First caller of vfio_pci_init_perm_bits() will initialize the bits. While the last caller of vfio_pci_uninit_perm_bits() will free the bits. Cc: Kevin Tian Cc: Lu Baolu Suggested-by: Alex Williamson Signed-off-by: Liu, Yi L --- drivers/vfio/pci/vfio_pci_config.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c index e82b511..913fca6 100644 --- a/drivers/vfio/pci/vfio_pci_config.c +++ b/drivers/vfio/pci/vfio_pci_config.c @@ -996,11 +996,17 @@ static int __init init_pci_ext_cap_pwr_perm(struct perm_bits *perm) return 0; } +/* Track the user number of the cap/ecap perm_bits */ +atomic_t vfio_pci_perm_bits_users = ATOMIC_INIT(0); + /* * Initialize the shared permission tables */ void vfio_pci_uninit_perm_bits(void) { + if (atomic_dec_return(_pci_perm_bits_users)) + return; + free_perm_bits(_perms[PCI_CAP_ID_BASIC]); free_perm_bits(_perms[PCI_CAP_ID_PM]); @@ -1017,6 +1023,9 @@ int __init vfio_pci_init_perm_bits(void) { int ret; + if (atomic_inc_return(_pci_perm_bits_users) != 1) + return 0; + /* Basic config space */ ret = init_pci_cap_basic_perm(_perms[PCI_CAP_ID_BASIC]); -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC v3 0/3] vfio_pci: wrap pci device as a mediated device
This patchset aims to add a vfio-pci-like meta driver as a demo user of the vfio changes introduced in the "vfio/mdev: IOMMU aware mediated device" patchset from Baolu Lu. The previous RFC v1 gave two proposals, and the discussion can be found at the following link. Per the comments, this patchset adds a separate driver named vfio-mdev-pci. It is a sample driver, but it is located in drivers/vfio/pci due to code-sharing considerations. The corresponding Kconfig definition is in samples/Kconfig. https://lkml.org/lkml/2019/3/4/529 Besides the test purpose, per Alex's comments, it could also be a good base driver for experimenting with device-specific mdev migration. Specific interface tested in this proposal: *) int mdev_set_iommu_device(struct device *dev, struct device *iommu_device) introduced in the patch below: "[PATCH v5 6/8] vfio/mdev: Add iommu related member in mdev_device" Links: *) Link of "vfio/mdev: IOMMU aware mediated device" https://lwn.net/Articles/780522/ Please feel free to give your comments. Thanks, Yi Liu Change log: v2->v3: - use vfio-mdev-pci instead of vfio-pci-mdev - place the new driver under drivers/vfio/pci while defining its Kconfig in samples/Kconfig to clarify that it is a sample driver v1->v2: - instead of adding a kernel option to the existing vfio-pci module in v1, v2 follows Alex's suggestion to add a separate vfio-pci-mdev module.
- new patchset subject: "vfio/pci: wrap pci device as a mediated device"

Liu, Yi L (3):
  vfio_pci: split vfio_pci.c into two source files
  vfio/pci: protect cap/ecap_perm bits alloc/free with atomic op
  samples: add vfio-mdev-pci driver

 drivers/vfio/pci/Makefile           |    7 +-
 drivers/vfio/pci/common.c           | 1511 +++
 drivers/vfio/pci/vfio_mdev_pci.c    |  386 +
 drivers/vfio/pci/vfio_pci.c         | 1476 +-
 drivers/vfio/pci/vfio_pci_config.c  |    9 +
 drivers/vfio/pci/vfio_pci_private.h |   27 +
 samples/Kconfig                     |   11 +
 7 files changed, 1962 insertions(+), 1465 deletions(-)
 create mode 100644 drivers/vfio/pci/common.c
 create mode 100644 drivers/vfio/pci/vfio_mdev_pci.c

--
2.7.4
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[GIT PULL] iommu/arm-smmu: Updates for 5.2
Hi Joerg,

Please can you pull these ARM SMMU updates for 5.2? The highlight is ATS
support in the SMMUv3 driver from Jean-Philippe. I have collected the
appropriate Acks from the PCI and IORT sides.

There is a trivial conflict with your api-features branch due to the diff
context in linux/iommu.h.

Cheers,

Will

--->8

The following changes since commit 79a3aaa7b82e3106be97842dedfd8429248896e6:

  Linux 5.1-rc3 (2019-03-31 14:39:29 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git for-joerg/arm-smmu/updates

for you to fetch changes up to bc580b56cb7888d1f09fff8a50270228fb834ae8:

  iommu/arm-smmu: Log CBFRSYNRA register on context fault (2019-04-23 12:23:16 +0100)

Douglas Anderson (1):
      iommu/arm-smmu: Break insecure users by disabling bypass by default

Jean-Philippe Brucker (9):
      PCI: Move ATS declarations outside of CONFIG_PCI
      PCI: Add a stub for pci_ats_disabled()
      ACPI/IORT: Check ATS capability in root complex nodes
      iommu/arm-smmu-v3: Rename arm_smmu_master_data to arm_smmu_master
      iommu/arm-smmu-v3: Store SteamIDs in master
      iommu/arm-smmu-v3: Add a master->domain pointer
      iommu/arm-smmu-v3: Link domains and devices
      iommu/arm-smmu-v3: Add support for PCI ATS
      iommu/arm-smmu-v3: Disable tagged pointers

Vivek Gautam (1):
      iommu/arm-smmu: Log CBFRSYNRA register on context fault

Will Deacon (1):
      iommu/arm-smmu-v3: Don't disable SMMU in kdump kernel

 drivers/acpi/arm64/iort.c     |  11 ++
 drivers/iommu/Kconfig         |  25 +++
 drivers/iommu/arm-smmu-regs.h |   2 +
 drivers/iommu/arm-smmu-v3.c   | 355 +-
 drivers/iommu/arm-smmu.c      |  11 +-
 include/linux/iommu.h         |   4 +
 include/linux/pci.h           |  31 ++--
 7 files changed, 344 insertions(+), 95 deletions(-)
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 56/79] docs: Documentation/*.txt: rename all ReST files to *.rst
On Tue, 23 Apr 2019 23:38:16 +0200, Borislav Petkov wrote:

> On Tue, Apr 23, 2019 at 05:05:02PM -0300, Mauro Carvalho Chehab wrote:
> > That's my view about how that specific file would be after
> > converted to ReST:
> >
> > https://git.linuxtv.org/mchehab/experimental.git/tree/Documentation/x86/x86_64/mm.rst?h=convert_rst_renames
> >
> > I don't have any troubles reading/understanding it as a plain text
> > file,
>
> If that is all the changes it would need, then I guess that's ok. Btw,
> those rst-conversion patches don't really show what got changed. Dunno
> if git can even show that properly. I diffed the two files by hand to
> see what got changed, see end of mail.

Well, you can use git show -M01 and it will likely show what changed.
The thing is that plain diff is not very good at showing diffs on text
files. I suspect that using some tool like wdiff would give a better
view of such changes.

> So I guess if table in rst means, one needs to draw rows and columns, I
> guess that's ok. It's not like I have to do it every day.

Yes, for complex tables, one needs to draw rows/columns. For simple
tables, all you need to do is something like:

  ====== ======== ==== ======== ===
  -      CONT PTE PMD  CONT PMD PUD
  ====== ======== ==== ======== ===
  4K:         64K   2M      32M  1G
  16K:         2M  32M       1G
  64K:         2M 512M      16G
  ====== ======== ==== ======== ===

in order to teach Sphinx where each column starts/stops, and
(optionally) show the table titles in bold.

(that's from the Documentation/arm64/hugetlbpage.rst conversion)

> But exactly this - *having* to do rst formatting would mean a lot of
> getting used to and people writing something which is not necessarily
> correct rst and someone else fixing up after them.

Yeah, one has to take on the conversion effort, but once done, it
should be easy to keep it updated.
> > and its html output is also nice (although Sphinx 1.7.8 seems to
> > have some issues when parsing some cells - probably due to some bug):
> >
> > https://www.infradead.org/~mchehab/rst_conversion/x86/x86_64/mm.html
>
> I don't know how that looks in your browser but in mine those addresses
> are not in monospaced font and there's no properly reading them.
>
> And yap, the cells parsing fun I see too.

Font selection is one of the things that would require some markup, as a
plain text file doesn't have font changes.

There are several ways to make it use a monospaced font. The
straightforward way would be to place everything that is monospaced
inside ``double quotes``, which is the ReST way to mark a literal inside
a text. IMHO, that would add too much "noise" at the tables.

Another possibility would be to do:

  .. raw:: html

     <style>
     td {
         font-family: monospace, monospace;
     }
     </style>

(the double monospace here is not a mistake - it is due to a known
bug^H^H^Hfeature on some browsers [1])

[1] https://stackoverflow.com/questions/38781089/font-family-monospace-monospace

IMO, the best alternative would be to add a new class to the css file,
and use it whenever we need a table with a monospaced font, e.g.:

diff --git a/Documentation/sphinx-static/theme_overrides.css b/Documentation/sphinx-static/theme_overrides.css
index e21e36cd6761..0948de6651f8 100644
--- a/Documentation/sphinx-static/theme_overrides.css
+++ b/Documentation/sphinx-static/theme_overrides.css
@@ -125,3 +125,7 @@ div[class^="highlight"] pre {
 		color: inherit;
 	}
 }
+
+table.monospaced {
+	font-family: monospace, monospace;
+}
diff --git a/Documentation/x86/x86_64/mm.rst b/Documentation/x86/x86_64/mm.rst
index e8a92fa0f9b2..704bad5c5130 100644
--- a/Documentation/x86/x86_64/mm.rst
+++ b/Documentation/x86/x86_64/mm.rst
@@ -18,6 +18,8 @@ Notes:
    notation than "16 EB", which few will recognize at first sight as 16
    exabytes. It also shows it nicely how incredibly large 64-bit address
    space is.
+
+.. cssclass:: monospaced
 
 +------------+--------+------------+---------+---------------------+
 | Start addr | Offset | End addr   | Size    | VM area description |
 +============+========+============+=========+=====================+

(patch on the top of this tree:
https://git.linuxtv.org/mchehab/experimental.git/tree/Documentation/x86/x86_64/mm.rst?h=convert_rst_renames)

The .. cssclass:: markup in the above example will be applied just to
the table below it. So, with that, it is possible to have normal and
monospaced tables mixed (if you apply the above patch, you'll see that
just the first table will use monospaced fonts).

-

Personally, I don't care much about monospaced fonts on this table.
After all, if I want to see it monospaced, I can simply click "View page
source" at the browser, and it will display the file as a plain
Re: [PATCH v2 56/79] docs: Documentation/*.txt: rename all ReST files to *.rst
On Tue, Apr 23, 2019 at 04:06:40PM -0600, Jonathan Corbet wrote:
> Remember that most of our docs are 99% RST even though they were written
> by people who had never even heard of RST. I really don't think it's a
> big deal - a far smaller cognitive load than trying to keep up with any
> given subsystem's variable-declaration-ordering rules, for example :)

Tztztz, this thing seems to have hit a nerve with people. Which means, I
will enforce that even more now so that I annoy submitters more! :-P

See, I can do my own "RST" too.

:-P

Srsly: ok, good. Sounds like we're on the same page then.

> I'm trying to do the same in Documentation/, with an attempt to be
> sympathetic toward our readers, sort things by intended audience,
> and create (someday) a coherent whole. I agree that moving docs is
> a short-term annoyance, but I'm hoping that it brings a long-term
> benefit.

Ok, that's fair. I've been moving files too, in the past.

> Minimal markup is the policy (it's even documented :). Automating stuff
> that can be automated is an area that has definitely not received
> enough attention; hopefully some things can be done there in the very
> near future.

Sounds nice, thanks Jon!

--
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 56/79] docs: Documentation/*.txt: rename all ReST files to *.rst
On Tue, Apr 23, 2019 at 11:38:16PM +0200, Borislav Petkov wrote:
> If that is all the changes it would need, then I guess that's ok. Btw,
> those rst-conversion patches don't really show what got changed. Dunno
> if git can even show that properly. I diffed the two files by hand to
> see what got changed, see end of mail.

That is not a happy diff; that table has gotten waay worse to read due
to all that extra table crap.

> ---
> --- mm.old	2019-04-23 23:18:55.954335784 +0200
> +++ mm.new	2019-04-23 23:18:48.122335821 +0200
> @@ -18,51 +18,68 @@ Notes:
>     notation than "16 EB", which few will recognize at first sight as 16
>     exabytes.
>     It also shows it nicely how incredibly large 64-bit address space is.
>
> -========================================================================================================================
> -    Start addr    |   Offset   |     End addr     |  Size   | VM area description
> -========================================================================================================================
> -                  |            |                  |         |
> - 0000000000000000 |    0       | 00007fffffffffff |  128 TB | user-space virtual memory, different per mm
> -__________________|____________|__________________|_________|___________________________________________________________
> -                  |            |                  |         |
> - 0000800000000000 | +128    TB | ffff7fffffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical
> -                  |            |                  |         |     virtual memory addresses up to the -128 TB
> -                  |            |                  |         |     starting offset of kernel mappings.
> -__________________|____________|__________________|_________|___________________________________________________________
> -                                                            |
> -                                                            | Kernel-space virtual memory, shared between all processes:
> -____________________________________________________________|___________________________________________________________
> -                  |            |                  |         |
> - ffff800000000000 | -128    TB | ffff87ffffffffff |    8 TB | ... guard hole, also reserved for hypervisor
> - ffff880000000000 | -120    TB | ffff887fffffffff |  0.5 TB | LDT remap for PTI
> - ffff888000000000 | -119.5  TB | ffffc87fffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
> - ffffc88000000000 |  -55.5  TB | ffffc8ffffffffff |  0.5 TB | ... unused hole
> - ffffc90000000000 |  -55    TB | ffffe8ffffffffff |   32 TB | vmalloc/ioremap space (vmalloc_base)
> - ffffe90000000000 |  -23    TB | ffffe9ffffffffff |    1 TB | ... unused hole
> - ffffea0000000000 |  -22    TB | ffffeaffffffffff |    1 TB | virtual memory map (vmemmap_base)
> - ffffeb0000000000 |  -21    TB | ffffebffffffffff |    1 TB | ... unused hole
> - ffffec0000000000 |  -20    TB | fffffbffffffffff |   16 TB | KASAN shadow memory
> -__________________|____________|__________________|_________|___________________________________________________________
> -                                                            |
> -                                                            | Identical layout to the 56-bit one from here on:
> -____________________________________________________________|___________________________________________________________
> -                  |            |                  |         |
> - fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
> -                  |            |                  |         | vaddr_end for KASLR
> - fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
> - fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
> - ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
> - ffffff8000000000 | -512    GB | ffffffeeffffffff |  444 GB | ... unused hole
> - ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
> - ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | ... unused hole
> - ffffffff80000000 |   -2    GB | ffffffff9fffffff |  512 MB | kernel text mapping, mapped to physical address 0
> - ffffffff80000000 |-2048    MB |                  |         |
> - ffffffffa0000000 |-1536    MB | fffffffffeffffff | 1520 MB | module mapping space
> - ffffffffff000000 |  -16    MB |                  |         |
> -    FIXADDR_START | ~-11    MB | ffffffffff5fffff | ~0.5 MB | kernel-internal fixmap range, variable size and offset
> - ffffffffff600000 |  -10    MB | ffffffffff600fff |    4 kB | legacy vsyscall ABI
> - ffffffffffe00000 |   -2    MB | ffffffffffffffff |    2 MB | ... unused hole
>
Re: [PATCH v2 06/19] drivers core: Add I/O ASID allocator
On Tue, Apr 23, 2019 at 04:31:06PM -0700, Jacob Pan wrote:
> The allocator doesn't really belong in drivers/iommu because some
> drivers would like to allocate PASIDs for devices that aren't managed by
> an IOMMU, using the same ID space as the IOMMU. It doesn't really belong
> in drivers/pci either, since platform devices also support PASID. Add
> the allocator in drivers/base.

I'd still add it to drivers/iommu, just selectable separately from the
core iommu code.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
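The point under discussion is where the allocator lives, not how it works; the requirement itself is just one shared ID space that both IOMMU-managed and non-IOMMU devices draw PASIDs from. As a rough standalone sketch of that idea (all names — `pasid_alloc`, `pasid_used`, `PASID_MAX` — and the bitmap approach are illustrative assumptions; the posted patches build on the kernel's ID-allocation infrastructure instead):

```c
#include <stdbool.h>

#define PASID_MAX 64             /* toy limit; real PASIDs are up to 20 bits */

/* One global ID space, shared by every caller regardless of whether
 * the device sits behind an IOMMU. */
static bool pasid_used[PASID_MAX];

/* First-fit allocation; PASID 0 is conventionally reserved. */
static int pasid_alloc(void)
{
	for (int i = 1; i < PASID_MAX; i++) {
		if (!pasid_used[i]) {
			pasid_used[i] = true;
			return i;
		}
	}
	return -1;               /* ID space exhausted */
}

static void pasid_free(int pasid)
{
	if (pasid > 0 && pasid < PASID_MAX)
		pasid_used[pasid] = false;
}
```

Because the table is global, two different drivers can never hand out the same PASID, which is the property the shared allocator exists to guarantee; placing it in drivers/iommu behind a separate Kconfig symbol (as suggested above) keeps that guarantee without forcing non-IOMMU users to pull in the whole IOMMU core.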