Re: [PATCH] kvm-userspace: set pci mem to start at 0xc100000 and vesa to 0xc000000
Chris Wright wrote:
* Izik Eidus (iei...@redhat.com) wrote:
> This patch makes the PCI memory region larger (1 GiB now). This is needed for PCI devices that require a large amount of memory, such as video cards. For PAE guests this patch is not an issue because the guest OS will map the rest of the RAM after 0x1..., for 32-bit guests that aren't PAE, it means the maximum memory available is now 3 GiB.

Seems a little heavy handed.

a) Given the size... the code could be cleaned up so that a simple constant change doesn't need to touch so much code.

Yeah, it probably can...

b) It is brute force. I'm not sure it really matters all that much to limit a 32-bit (non-PAE) guest to 3G, but it's a little extreme for the cases that don't care about the large hole. Is there any way to make it dynamic, based on the requirements of the devices that are part of the launched VM?

There is (you need to transfer data to the BIOS, but it is possible...). The thing is, there was concern that it would make Windows crazy if you keep changing the devices' physical mappings.

Avi, what do you think?

thanks,
-chris
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2494730 ] Guests stalling on kvm-82
Bugs item #2494730, was opened at 2009-01-09 09:59
Message generated for change (Comment added) made by kmshanah
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2494730&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Kevin Shanahan (kmshanah)
Assigned to: Nobody/Anonymous (nobody)
Summary: Guests stalling on kvm-82

Initial Comment:
I am seeing periodic stalls in Linux and Windows guests with kvm-82 on an IBM X3550 server with 2 x Xeon 5130 CPUs and 32GB RAM. I am *reasonably* certain that this is a regression somewhere between kvm-72 and kvm-82. We had been running kvm-72 (actually, the Debian kvm-source package) up until now and never noticed the problem. Now the stalls are very obvious. When the guest stalls, at least one kvm process on the host gobbles up 100% CPU.

I'll do my debugging with the Linux guest, as that's sure to be easier to deal with. As a simple demonstration that the guest is unresponsive, here is the result of me pinging the guest from another machine on the (very quiet) LAN:

--- hermes-old.wumi.org.au ping statistics ---
600 packets transmitted, 600 received, 0% packet loss, time 599659ms
rtt min/avg/max/mdev = 0.255/181.211/6291.871/558.706 ms, pipe 7

The latency varies pretty badly, with spikes up to several seconds as you can see. The problem is not reproducible on other VT-capable hardware that I have - e.g. my desktop has an E8400 CPU which runs the VMs just fine. Does knowing that make it any easier to guess where the problem might be? The Xeon 5130 does not have the smx, est, sse4_1, xsave, vnmi and flexpriority CPU flags that the E8400 does.
Because this server is the only hardware I have which exhibits the problem and it's a production machine, I have limited times when I can do testing. However, I will try to confirm that kvm-72 is okay and then bisect. Currently the host is running a 2.6.28 kernel with the kvm-82 modules. I guess I'm likely to have problems compiling the older kvm releases against this kernel, so I'll have to drop back to 2.6.27.something to run the tests.

CPU Vendor: Intel
CPU Type: Xeon 5130
Number of CPUs: 2
Host distribution: Debian Lenny/Sid
KVM version: kvm-82
Host kernel: Linux 2.6.28 x86_64
Guest Distribution: Debian Etch
Guest kernel: Linux 2.6.27.10 i686

Host's /proc/cpuinfo:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
stepping        : 6
cpu MHz         : 1995.117
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips        : 3990.23
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
stepping        : 6
cpu MHz         : 1995.117
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips        : 3989.96
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
stepping        : 6
cpu MHz         : 1995.117
cache size      : 4096 KB
physical id     : 3
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 6
initial apicid  : 6
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
cirrus-vga is not properly reset
Hi,

when rebooting from non-text mode, the text output of BIOS and bootloader is unreadable. This only happens with cirrus and the KVM tree; std-vga is fine, and cirrus with upstream QEMU is fine, too. Moreover, -no-kvm makes no difference.

What's very strange about this: CirrusVGAState does not differ after issuing a reset from text mode compared to a reset from graphic mode (except for a needless re-init of some io_memory slots - will post a cleanup patch upstream). Currently I have no clue where to look next.

Jan
Re: cirrus-vga is not properly reset
Jan Kiszka wrote:
> Hi,
> when rebooting from non-text mode, the text output of BIOS and bootloader is unreadable. This only happens with cirrus and the KVM tree; std-vga is fine, and cirrus with upstream QEMU is fine, too. Moreover, -no-kvm makes no difference.

Looked at it again, and I was able to reproduce this problem with latest QEMU, too. It just doesn't trigger as reliably as with kvm-userspace. So I bet it has something to do with recent QEMU reset changes for vga and cirrus (the display changes are not yet merged into kvm, so they can't contribute to the reason). Moreover, the error pattern suggests that correct text is written to the VGA RAM and dumped to the screen, but the font that should have been written to VGA RAM by the BIOS is corrupted.

> What's very strange about this: CirrusVGAState does not differ after issuing a reset from text mode compared to a reset from graphic mode (except for a needless re-init of some io_memory slots - will post a cleanup patch upstream). Currently I have no clue where to look next.

Jan
[Resolved] XP Guest Clock Slow
Update in case someone else fights the same issue. Replaced the Lenny kernel with a 2.6.28 kernel.org compile. All is well now - at least to within a few seconds per day.

--
Marty

-----Original Message-----
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of kvm-u...@goodbit.net
Sent: Friday, January 16, 2009 10:15 AM
To: kvm@vger.kernel.org
Subject: XP Guest Clock Slow

Hello,

I just upgraded my hardware to a VT-capable CPU, and am getting up to speed with kvm. The clock on my XP guest significantly lags (e.g. by 50%) behind the host. I lose several hours overnight in the guest clock (not just a few seconds). System load seems to aggravate the problem. I have seen similar behavior in quick tests on a Vista client. My Ubuntu client is fine, presumably due to clocksource==kvm-clock in the guest.

Everything else works fine in all guest VMs (amazingly well, actually). Performance is equal to or better than what I have been used to with VMware Server.

Host CPU is an Athlon 64 X2 with the SB700 chipset. System dumps (lspci, cpuinfo, etc.) available on request (I don't want to flood). Host environment is Debian Lenny, with the distribution 2.6.26 kernel. Currently running a locally compiled/installed kvm-82; I also saw the same issue with the kvm-72 Debian package.

My host clock is set to 'hpet':

| /sys/devices/system/clocksource/clocksource0$ cat *
| hpet acpi_pm jiffies tsc
| hpet

Is there a paravirtual clock driver for M$ clients available? What is the current best practice to work around this problem?

Thanks,
--
Marty
[PATCH v2 0/6] ATS capability support for Intel IOMMU
This patch series implements Address Translation Services support for the Intel IOMMU. ATS allows a PCI Endpoint to request DMA address translations from the IOMMU and cache them in the Endpoint, thus alleviating IOMMU pressure and improving hardware performance in I/O virtualization environments.

Changelog: v1 -> v2
  added 'static' prefix to a local LIST_HEAD (Andrew Morton)

Yu Zhao (6):
  PCI: support the ATS capability
  VT-d: parse ATSR in DMA Remapping Reporting Structure
  VT-d: add queue invalidation fault status support
  VT-d: add device IOTLB invalidation support
  VT-d: cleanup iommu_flush_iotlb_psi and flush_unmaps
  VT-d: support the device IOTLB

 drivers/pci/dmar.c           |  226 ++++++++++++++++++++++++++++++++++++-----
 drivers/pci/intel-iommu.c    |  137 ++++++++++++++++++++++---
 drivers/pci/intr_remapping.c |   21 ++--
 drivers/pci/pci.c            |   68 ++++++++++++
 include/linux/dmar.h         |    9 ++
 include/linux/intel-iommu.h  |   19 +++-
 include/linux/pci.h          |   15 +++
 include/linux/pci_regs.h     |   10 ++
 8 files changed, 450 insertions(+), 55 deletions(-)
[PATCH v2 2/6] VT-d: parse ATSR in DMA Remapping Reporting Structure
Parse the Root Port ATS Capability Reporting Structure in the DMA Remapping Reporting Structure ACPI table.

Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/dmar.c          | 112 ++++++++++++++++++++++++++++++++++++++++--
 include/linux/dmar.h        |   9 ++++
 include/linux/intel-iommu.h |   1 +
 3 files changed, 116 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index f5a662a..bd37b3c 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -254,6 +254,84 @@ rmrr_parse_dev(struct dmar_rmrr_unit *rmrru)
 	}
 	return ret;
 }
+
+static LIST_HEAD(dmar_atsr_units);
+
+static int __init dmar_parse_one_atsr(struct acpi_dmar_header *hdr)
+{
+	struct acpi_dmar_atsr *atsr;
+	struct dmar_atsr_unit *atsru;
+
+	atsr = container_of(hdr, struct acpi_dmar_atsr, header);
+	atsru = kzalloc(sizeof(*atsru), GFP_KERNEL);
+	if (!atsru)
+		return -ENOMEM;
+
+	atsru->hdr = hdr;
+	atsru->include_all = atsr->flags & 0x1;
+
+	list_add(&atsru->list, &dmar_atsr_units);
+
+	return 0;
+}
+
+static int __init atsr_parse_dev(struct dmar_atsr_unit *atsru)
+{
+	int rc;
+	struct acpi_dmar_atsr *atsr;
+
+	if (atsru->include_all)
+		return 0;
+
+	atsr = container_of(atsru->hdr, struct acpi_dmar_atsr, header);
+	rc = dmar_parse_dev_scope((void *)(atsr + 1),
+				  (void *)atsr + atsr->header.length,
+				  &atsru->devices_cnt, &atsru->devices,
+				  atsr->segment);
+	if (rc || !atsru->devices_cnt) {
+		list_del(&atsru->list);
+		kfree(atsru);
+	}
+
+	return rc;
+}
+
+int dmar_find_matched_atsr_unit(struct pci_dev *dev)
+{
+	int i;
+	struct pci_bus *bus;
+	struct acpi_dmar_atsr *atsr;
+	struct dmar_atsr_unit *atsru;
+
+	list_for_each_entry(atsru, &dmar_atsr_units, list) {
+		atsr = container_of(atsru->hdr, struct acpi_dmar_atsr, header);
+		if (atsr->segment == pci_domain_nr(dev->bus))
+			goto found;
+	}
+
+	return 0;
+
+found:
+	for (bus = dev->bus; bus; bus = bus->parent) {
+		struct pci_dev *bridge = bus->self;
+
+		if (!bridge || !bridge->is_pcie ||
+		    bridge->pcie_type == PCI_EXP_TYPE_PCI_BRIDGE)
+			return 0;
+
+		if (bridge->pcie_type == PCI_EXP_TYPE_ROOT_PORT) {
+			for (i = 0; i < atsru->devices_cnt; i++)
+				if (atsru->devices[i] == bridge)
+					return 1;
+			break;
+		}
+	}
+
+	if (atsru->include_all)
+		return 1;
+
+	return 0;
+}
 #endif

 static void __init
@@ -261,22 +339,28 @@ dmar_table_print_dmar_entry(struct acpi_dmar_header *header)
 {
 	struct acpi_dmar_hardware_unit *drhd;
 	struct acpi_dmar_reserved_memory *rmrr;
+	struct acpi_dmar_atsr *atsr;

 	switch (header->type) {
 	case ACPI_DMAR_TYPE_HARDWARE_UNIT:
-		drhd = (struct acpi_dmar_hardware_unit *)header;
+		drhd = container_of(header, struct acpi_dmar_hardware_unit,
+				    header);
 		printk (KERN_INFO PREFIX
-			"DRHD (flags: 0x%08x)base: 0x%016Lx\n",
-			drhd->flags, (unsigned long long)drhd->address);
+			"DRHD base: %#016Lx flags: %#x\n",
+			(unsigned long long)drhd->address, drhd->flags);
 		break;
 	case ACPI_DMAR_TYPE_RESERVED_MEMORY:
-		rmrr = (struct acpi_dmar_reserved_memory *)header;
-
+		rmrr = container_of(header, struct acpi_dmar_reserved_memory,
+				    header);
 		printk (KERN_INFO PREFIX
-			"RMRR base: 0x%016Lx end: 0x%016Lx\n",
+			"RMRR base: %#016Lx end: %#016Lx\n",
 			(unsigned long long)rmrr->base_address,
 			(unsigned long long)rmrr->end_address);
 		break;
+	case ACPI_DMAR_TYPE_ATSR:
+		atsr = container_of(header, struct acpi_dmar_atsr, header);
+		printk(KERN_INFO PREFIX "ATSR flags: %#x\n", atsr->flags);
+		break;
 	}
 }

@@ -341,6 +425,11 @@ parse_dmar_table(void)
 			ret = dmar_parse_one_rmrr(entry_header);
 #endif
 			break;
+		case ACPI_DMAR_TYPE_ATSR:
+#ifdef CONFIG_DMAR
+			ret = dmar_parse_one_atsr(entry_header);
+#endif
+			break;
 		default:
 			printk(KERN_WARNING PREFIX
				"Unknown DMAR structure type\n");
@@ -409,11 +498,19 @@ int __init dmar_dev_scope_init(void)
 #ifdef CONFIG_DMAR
 	{
 		struct
[PATCH v2 3/6] VT-d: add queue invalidation fault status support
Check the fault register after submitting a queued invalidation request.

Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/dmar.c           | 59 +++++++++++++++++++++++++++++++----------
 drivers/pci/intr_remapping.c | 21 ++++++++++-----
 include/linux/intel-iommu.h  |  4 ++-
 3 files changed, 59 insertions(+), 25 deletions(-)

diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index bd37b3c..0c87ebd 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -671,19 +671,49 @@ static inline void reclaim_free_desc(struct q_inval *qi)
 	}
 }

+static int qi_check_fault(struct intel_iommu *iommu, int index)
+{
+	u32 fault;
+	int head;
+	struct q_inval *qi = iommu->qi;
+	int wait_index = (index + 1) % QI_LENGTH;
+
+	fault = readl(iommu->reg + DMAR_FSTS_REG);
+
+	/*
+	 * If IQE happens, the head points to the descriptor associated
+	 * with the error. No new descriptors are fetched until the IQE
+	 * is cleared.
+	 */
+	if (fault & DMA_FSTS_IQE) {
+		head = readl(iommu->reg + DMAR_IQH_REG);
+		if ((head >> DMAR_IQ_OFFSET) == index) {
+			memcpy(&qi->desc[index], &qi->desc[wait_index],
+			       sizeof(struct qi_desc));
+			__iommu_flush_cache(iommu, &qi->desc[index],
+					    sizeof(struct qi_desc));
+			writel(DMA_FSTS_IQE, iommu->reg + DMAR_FSTS_REG);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
 /*
  * Submit the queued invalidation descriptor to the remapping
  * hardware unit and wait for its completion.
  */
-void qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
+int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 {
+	int rc = 0;
 	struct q_inval *qi = iommu->qi;
 	struct qi_desc *hw, wait_desc;
 	int wait_index, index;
 	unsigned long flags;

 	if (!qi)
-		return;
+		return 0;

 	hw = qi->desc;

@@ -701,7 +731,8 @@ void qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 	hw[index] = *desc;

-	wait_desc.low = QI_IWD_STATUS_DATA(2) | QI_IWD_STATUS_WRITE | QI_IWD_TYPE;
+	wait_desc.low = QI_IWD_STATUS_DATA(QI_DONE) |
+			QI_IWD_STATUS_WRITE | QI_IWD_TYPE;
 	wait_desc.high = virt_to_phys(&qi->desc_status[wait_index]);

 	hw[wait_index] = wait_desc;

@@ -712,13 +743,11 @@ void qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 	qi->free_head = (qi->free_head + 2) % QI_LENGTH;
 	qi->free_cnt -= 2;

-	spin_lock(&iommu->register_lock);
 	/*
 	 * update the HW tail register indicating the presence of
 	 * new descriptors.
 	 */
-	writel(qi->free_head << 4, iommu->reg + DMAR_IQT_REG);
-	spin_unlock(&iommu->register_lock);
+	writel(qi->free_head << DMAR_IQ_OFFSET, iommu->reg + DMAR_IQT_REG);

 	while (qi->desc_status[wait_index] != QI_DONE) {
 		/*
@@ -728,6 +757,10 @@ void qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 		 * a deadlock where the interrupt context can wait indefinitely
 		 * for free slots in the queue.
 		 */
+		rc = qi_check_fault(iommu, index);
+		if (rc)
+			break;
+
 		spin_unlock(&qi->q_lock);
 		cpu_relax();
 		spin_lock(&qi->q_lock);
@@ -737,6 +770,8 @@ void qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 	reclaim_free_desc(qi);
 	spin_unlock_irqrestore(&qi->q_lock, flags);
+
+	return rc;
 }

 /*
@@ -749,13 +784,13 @@ void qi_global_iec(struct intel_iommu *iommu)
 	desc.low = QI_IEC_TYPE;
 	desc.high = 0;

+	/* should never fail */
 	qi_submit_sync(&desc, iommu);
 }

 int qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid, u8 fm,
		     u64 type, int non_present_entry_flush)
 {
-
 	struct qi_desc desc;

 	if (non_present_entry_flush) {
@@ -769,10 +804,7 @@ int qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid, u8 fm,
		| QI_CC_GRAN(type) | QI_CC_TYPE;
 	desc.high = 0;

-	qi_submit_sync(&desc, iommu);
-
-	return 0;
-
+	return qi_submit_sync(&desc, iommu);
 }

 int qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
@@ -802,10 +834,7 @@ int qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
 	desc.high = QI_IOTLB_ADDR(addr) | QI_IOTLB_IH(ih)
		| QI_IOTLB_AM(size_order);

-	qi_submit_sync(&desc, iommu);
-
-	return 0;
-
+	return qi_submit_sync(&desc, iommu);
 }

 /*
diff --git a/drivers/pci/intr_remapping.c b/drivers/pci/intr_remapping.c
index f78371b..45effc5 100644
--- a/drivers/pci/intr_remapping.c
+++ b/drivers/pci/intr_remapping.c
@@
[PATCH v2 6/6] VT-d: support the device IOTLB
Support device IOTLB (i.e. ATS) for both native and KVM environments.

Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/intel-iommu.c   | 97 ++++++++++++++++++++++++++++++++++++++++-
 include/linux/intel-iommu.h |  1 +
 2 files changed, 95 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index df92764..fb84d82 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -125,6 +125,7 @@ static inline void context_set_fault_enable(struct context_entry *context)
 }

 #define CONTEXT_TT_MULTI_LEVEL 0
+#define CONTEXT_TT_DEV_IOTLB 1

 static inline void context_set_translation_type(struct context_entry *context,
						unsigned long value)
@@ -240,6 +241,8 @@ struct device_domain_info {
 	struct list_head global; /* link to global list */
 	u8 bus;			/* PCI bus numer */
 	u8 devfn;		/* PCI devfn number */
+	int qdep;		/* invalidate queue depth */
+	struct intel_iommu *iommu; /* IOMMU used by this device */
 	struct pci_dev *dev; /* it's NULL for PCIE-to-PCI bridge */
 	struct dmar_domain *domain; /* pointer to domain */
 };
@@ -914,6 +917,75 @@ static int __iommu_flush_iotlb(struct intel_iommu *iommu, u16 did,
 	return 0;
 }

+static struct device_domain_info *
+iommu_support_dev_iotlb(struct dmar_domain *domain, u8 bus, u8 devfn)
+{
+	int found = 0;
+	unsigned long flags;
+	struct device_domain_info *info;
+	struct intel_iommu *iommu = device_to_iommu(bus, devfn);
+
+	if (!ecap_dev_iotlb_support(iommu->ecap))
+		return NULL;
+
+	if (!iommu->qi)
+		return NULL;
+
+	spin_lock_irqsave(&device_domain_lock, flags);
+	list_for_each_entry(info, &domain->devices, link)
+		if (info->dev && info->bus == bus && info->devfn == devfn) {
+			found = 1;
+			break;
+		}
+	spin_unlock_irqrestore(&device_domain_lock, flags);
+
+	if (!found)
+		return NULL;
+
+	if (!dmar_find_matched_atsr_unit(info->dev))
+		return NULL;
+
+	info->iommu = iommu;
+	info->qdep = pci_ats_qdep(info->dev);
+	if (!info->qdep)
+		return NULL;
+
+	return info;
+}
+
+static void iommu_enable_dev_iotlb(struct device_domain_info *info)
+{
+	pci_enable_ats(info->dev, VTD_PAGE_SHIFT);
+}
+
+static void iommu_disable_dev_iotlb(struct device_domain_info *info)
+{
+	if (info->dev && pci_ats_enabled(info->dev))
+		pci_disable_ats(info->dev);
+}
+
+static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
+				  u64 addr, unsigned int mask)
+{
+	int rc;
+	u16 sid;
+	unsigned long flags;
+	struct device_domain_info *info;
+
+	spin_lock_irqsave(&device_domain_lock, flags);
+	list_for_each_entry(info, &domain->devices, link) {
+		if (!info->dev || !pci_ats_enabled(info->dev))
+			continue;
+
+		sid = info->bus << 8 | info->devfn;
+		rc = qi_flush_dev_iotlb(info->iommu, sid,
+					info->qdep, addr, mask);
+		if (rc)
+			printk(KERN_ERR "IOMMU: flush device IOTLB failed\n");
+	}
+	spin_unlock_irqrestore(&device_domain_lock, flags);
+}
+
 static int iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did,
 	u64 addr, unsigned int pages, int non_present_entry_flush)
 {
@@ -937,6 +1009,9 @@ static int iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did,
 	rc = iommu->flush.flush_iotlb(iommu, did, addr, mask,
				      DMA_TLB_PSI_FLUSH,
				      non_present_entry_flush);
+	if (!rc && !non_present_entry_flush)
+		iommu_flush_dev_iotlb(iommu->domains[did], addr, mask);
+
 	return rc;
 }

@@ -1461,6 +1536,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 	unsigned long ndomains;
 	int id;
 	int agaw;
+	struct device_domain_info *info;

 	pr_debug("Set context mapping for %02x:%02x.%d\n",
		 bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
@@ -1526,7 +1602,11 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 	context_set_domain_id(context, id);
 	context_set_address_width(context, iommu->agaw);
 	context_set_address_root(context, virt_to_phys(pgd));
-	context_set_translation_type(context, CONTEXT_TT_MULTI_LEVEL);
+	info = iommu_support_dev_iotlb(domain, bus, devfn);
+	if (info)
+		context_set_translation_type(context, CONTEXT_TT_DEV_IOTLB);
+	else
+		context_set_translation_type(context, CONTEXT_TT_MULTI_LEVEL);
 	context_set_fault_enable(context);
 	context_set_present(context);
[PATCH v2 4/6] VT-d: add device IOTLB invalidation support
Support device IOTLB invalidation to flush the translation cached in the Endpoint.

Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/dmar.c          | 63 +++++++++++++++++++++++++++++++++++++++--
 include/linux/intel-iommu.h | 13 ++++++++-
 2 files changed, 72 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index 0c87ebd..4fea360 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -664,7 +664,8 @@ void free_iommu(struct intel_iommu *iommu)
  */
 static inline void reclaim_free_desc(struct q_inval *qi)
 {
-	while (qi->desc_status[qi->free_tail] == QI_DONE) {
+	while (qi->desc_status[qi->free_tail] == QI_DONE ||
+	       qi->desc_status[qi->free_tail] == QI_ABORT) {
 		qi->desc_status[qi->free_tail] = QI_FREE;
 		qi->free_tail = (qi->free_tail + 1) % QI_LENGTH;
 		qi->free_cnt++;
@@ -674,10 +675,13 @@ static inline void reclaim_free_desc(struct q_inval *qi)
 static int qi_check_fault(struct intel_iommu *iommu, int index)
 {
 	u32 fault;
-	int head;
+	int head, tail;
 	struct q_inval *qi = iommu->qi;
 	int wait_index = (index + 1) % QI_LENGTH;

+	if (qi->desc_status[wait_index] == QI_ABORT)
+		return -EAGAIN;
+
 	fault = readl(iommu->reg + DMAR_FSTS_REG);

 	/*
@@ -697,6 +701,32 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
 		}
 	}

+	/*
+	 * If ITE happens, all pending wait_desc commands are aborted.
+	 * No new descriptors are fetched until the ITE is cleared.
+	 */
+	if (fault & DMA_FSTS_ITE) {
+		head = readl(iommu->reg + DMAR_IQH_REG);
+		head = ((head >> DMAR_IQ_OFFSET) - 1 + QI_LENGTH) % QI_LENGTH;
+		head |= 1;
+		tail = readl(iommu->reg + DMAR_IQT_REG);
+		tail = ((tail >> DMAR_IQ_OFFSET) - 1 + QI_LENGTH) % QI_LENGTH;
+
+		writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
+
+		do {
+			if (qi->desc_status[head] == QI_IN_USE)
+				qi->desc_status[head] = QI_ABORT;
+			head = (head - 2 + QI_LENGTH) % QI_LENGTH;
+		} while (head != tail);
+
+		if (qi->desc_status[wait_index] == QI_ABORT)
+			return -EAGAIN;
+	}
+
+	if (fault & DMA_FSTS_ICE)
+		writel(DMA_FSTS_ICE, iommu->reg + DMAR_FSTS_REG);
+
 	return 0;
 }

@@ -706,7 +736,7 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
  */
 int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 {
-	int rc = 0;
+	int rc;
 	struct q_inval *qi = iommu->qi;
 	struct qi_desc *hw, wait_desc;
 	int wait_index, index;
@@ -717,6 +747,9 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)

 	hw = qi->desc;

+restart:
+	rc = 0;
+
 	spin_lock_irqsave(&qi->q_lock, flags);
 	while (qi->free_cnt < 3) {
 		spin_unlock_irqrestore(&qi->q_lock, flags);
@@ -771,6 +804,9 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 	reclaim_free_desc(qi);
 	spin_unlock_irqrestore(&qi->q_lock, flags);

+	if (rc == -EAGAIN)
+		goto restart;
+
 	return rc;
 }

@@ -837,6 +873,27 @@ int qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
 	return qi_submit_sync(&desc, iommu);
 }

+int qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, int qdep,
+		       u64 addr, unsigned int mask)
+{
+	struct qi_desc desc;
+
+	if (mask) {
+		BUG_ON(addr & ((1 << (VTD_PAGE_SHIFT + mask)) - 1));
+		addr |= (1 << (VTD_PAGE_SHIFT + mask - 1)) - 1;
+		desc.high = QI_DEV_IOTLB_ADDR(addr) | QI_DEV_IOTLB_SIZE;
+	} else
+		desc.high = QI_DEV_IOTLB_ADDR(addr);
+
+	if (qdep >= QI_DEV_IOTLB_MAX_INVS)
+		qdep = 0;
+
+	desc.low = QI_DEV_IOTLB_SID(sid) | QI_DEV_IOTLB_QDEP(qdep) |
+		   QI_DIOTLB_TYPE;
+
+	return qi_submit_sync(&desc, iommu);
+}
+
 /*
  * Enable Queued Invalidation interface. This is a must to support
  * interrupt-remapping. Also used by DMA-remapping, which replaces
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 0a220c9..d82bdac 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -196,6 +196,8 @@ static inline void dmar_writeq(void __iomem *addr, u64 val)
 #define DMA_FSTS_PPF ((u32)2)
 #define DMA_FSTS_PFO ((u32)1)
 #define DMA_FSTS_IQE (1 << 4)
+#define DMA_FSTS_ICE (1 << 5)
+#define DMA_FSTS_ITE (1 << 6)
 #define dma_fsts_fault_record_index(s) (((s) >> 8) & 0xff)

 /* FRCD_REG, 32 bits access */
@@ -224,7 +226,8 @@ do {									\
 enum {
 	QI_FREE,
 	QI_IN_USE,
-	QI_DONE
+	QI_DONE,
+	QI_ABORT
 };

 #define
[PATCH v2 5/6] VT-d: cleanup iommu_flush_iotlb_psi and flush_unmaps
Make iommu_flush_iotlb_psi() and flush_unmaps() easier to read.

Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/intel-iommu.c | 46 +++++++++++++++++++++-----------------------
 1 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 3dfecb2..df92764 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -917,30 +917,27 @@ static int __iommu_flush_iotlb(struct intel_iommu *iommu, u16 did,
 static int iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did,
 	u64 addr, unsigned int pages, int non_present_entry_flush)
 {
-	unsigned int mask;
+	int rc;
+	unsigned int mask = ilog2(__roundup_pow_of_two(pages));

 	BUG_ON(addr & (~VTD_PAGE_MASK));
 	BUG_ON(pages == 0);

-	/* Fallback to domain selective flush if no PSI support */
-	if (!cap_pgsel_inv(iommu->cap))
-		return iommu->flush.flush_iotlb(iommu, did, 0, 0,
-						DMA_TLB_DSI_FLUSH,
-						non_present_entry_flush);
-
 	/*
+	 * Fallback to domain selective flush if no PSI support or the size is
+	 * too big.
 	 * PSI requires page size to be 2 ^ x, and the base address is naturally
 	 * aligned to the size
 	 */
-	mask = ilog2(__roundup_pow_of_two(pages));
-	/* Fallback to domain selective flush if size is too big */
-	if (mask > cap_max_amask_val(iommu->cap))
-		return iommu->flush.flush_iotlb(iommu, did, 0, 0,
-			DMA_TLB_DSI_FLUSH, non_present_entry_flush);
-
-	return iommu->flush.flush_iotlb(iommu, did, addr, mask,
-					DMA_TLB_PSI_FLUSH,
-					non_present_entry_flush);
+	if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu->cap))
+		rc = iommu->flush.flush_iotlb(iommu, did, 0, 0,
+					      DMA_TLB_DSI_FLUSH,
+					      non_present_entry_flush);
+	else
+		rc = iommu->flush.flush_iotlb(iommu, did, addr, mask,
+					      DMA_TLB_PSI_FLUSH,
+					      non_present_entry_flush);
+	return rc;
 }

 static void iommu_disable_protect_mem_regions(struct intel_iommu *iommu)
@@ -2293,15 +2290,16 @@ static void flush_unmaps(void)
 		if (!iommu)
 			continue;

-		if (deferred_flush[i].next) {
-			iommu->flush.flush_iotlb(iommu, 0, 0, 0,
-						 DMA_TLB_GLOBAL_FLUSH, 0);
-			for (j = 0; j < deferred_flush[i].next; j++) {
-				__free_iova(&deferred_flush[i].domain[j]->iovad,
-					    deferred_flush[i].iova[j]);
-			}
-			deferred_flush[i].next = 0;
+		if (!deferred_flush[i].next)
+			continue;
+
+		iommu->flush.flush_iotlb(iommu, 0, 0, 0,
+					 DMA_TLB_GLOBAL_FLUSH, 0);
+		for (j = 0; j < deferred_flush[i].next; j++) {
+			__free_iova(&deferred_flush[i].domain[j]->iovad,
+				    deferred_flush[i].iova[j]);
 		}
+		deferred_flush[i].next = 0;
 	}

 	list_size = 0;
--
1.5.6.4