Re: [PATCH] kvm-userspace: set pci mem to start at 0xc100000 and vesa to 0xc000000

2009-01-17 Thread Izik Eidus

Chris Wright wrote:

* Izik Eidus (iei...@redhat.com) wrote:
  

This patch makes the PCI mem region larger (1 GiB now).
This is needed for PCI devices that require large amounts of memory,
such as video cards.

For PAE guests this patch is not an issue because the guest OS will map
the rest of the RAM after 0x1...,
for 32-bit guests that are not PAE, it means the maximum memory that would be
available now is 3 GiB.
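As a rough back-of-the-envelope check (the numbers are assumed from the patch description above, not taken from the code — 0xc0000000 as the start of the hole is an assumption), the 3 GiB limit follows directly from the address-space layout:

```shell
# Sketch: why a 1 GiB PCI hole below 4 GiB caps a non-PAE 32-bit guest at 3 GiB RAM.
pci_hole_start=$(( 0xc0000000 ))   # assumed: PCI/VESA hole starts at 3 GiB
addr_space=$(( 1 << 32 ))          # non-PAE guests can address at most 4 GiB
hole_size=$(( addr_space - pci_hole_start ))
echo "hole: $(( hole_size >> 20 )) MiB, max RAM below hole: $(( pci_hole_start >> 20 )) MiB"
```

A PAE guest avoids the limit because the remaining RAM can be remapped above 4 GiB.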



Seems a little heavy-handed.

a) Given the size...code could be cleaned up so that a simple constant
change doesn't need to touch so much code.
  


Yea it probably can...


b) It is brute force.  I'm not sure it really matters all that much to
limit a 32-bit (non-PAE) guest to 3G, but it's a little extreme for the
cases that don't care about the large hole. 


Is there any way to make it dynamic based on the requirements of the
devices that are part of the launched VM?
  


There is (you need to transfer data to the BIOS, but it is possible...);
the thing is, there was concern that

it would make Windows go crazy if you keep changing the devices' physical mappings.

Avi what do you think?


thanks,
-chris
  


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[ kvm-Bugs-2494730 ] Guests stalling on kvm-82

2009-01-17 Thread SourceForge.net
Bugs item #2494730, was opened at 2009-01-09 09:59
Message generated for change (Comment added) made by kmshanah
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2494730&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Kevin Shanahan (kmshanah)
Assigned to: Nobody/Anonymous (nobody)
Summary: Guests stalling on kvm-82

Initial Comment:
I am seeing periodic stalls in Linux and Windows guests with kvm-82 on an IBM 
X3550 server with 2 x Xeon 5130 CPUs and 32GB RAM.

I am *reasonably* certain that this is a regression somewhere between kvm-72 
and kvm-82. We had been running kvm-72 (actually, the debian kvm-source 
package) up until now and never noticed the problem. Now the stalls are very 
obvious. When the guest stalls, at least one kvm process on the host 
gobbles up 100% CPU. I'll do my debugging with the Linux guest, as that's sure 
to be easier to deal with.

As a simple demonstration that the guest is unresponsive, here is the result of 
me pinging the guest from another machine on the (very quiet) LAN:

--- hermes-old.wumi.org.au ping statistics ---
600 packets transmitted, 600 received, 0% packet loss, time 599659ms
rtt min/avg/max/mdev = 0.255/181.211/6291.871/558.706 ms, pipe 7

The latency varies pretty badly, with spikes up to several seconds as you can 
see.

The problem is not reproducible on other VT-capable hardware that I have - e.g. 
my desktop has a E8400 CPU which runs the VMs just fine. Does knowing that make 
it any easier to guess where the problem might be?

The Xeon 5130 does not have the smx, est, sse4_1, xsave, vnmi and 
flexpriority CPU flags that the E8400 does.

Because this server is the only hardware I have which exhibits the problem and 
it's a production machine, I have limited times where I can do testing. 
However, I will try to confirm that kvm-72 is okay and then bisect.

Currently the host is running a 2.6.28 kernel with the kvm-82 modules. I guess 
I'm likely to have problems compiling the older kvm releases against this 
kernel, so I'll have to drop back to 2.6.27.something to run the tests.

CPU Vendor: Intel
CPU Type: Xeon 5130
Number of CPUs: 2
Host distribution: Debian Lenny/Sid
KVM version: kvm-82
Host kernel: Linux 2.6.28 x86_64
Guest Distribution: Debian Etch
Guest kernel: Linux 2.6.27.10 i686

Host's /proc/cpuinfo:

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Xeon(R) CPU5130  @ 2.00GHz
stepping: 6
cpu MHz : 1995.117
cache size  : 4096 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm 
constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx tm2 
ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips: 3990.23
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Xeon(R) CPU5130  @ 2.00GHz
stepping: 6
cpu MHz : 1995.117
cache size  : 4096 KB
physical id : 0
siblings: 2
core id : 1
cpu cores   : 2
apicid  : 1
initial apicid  : 1
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm 
constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx tm2 
ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips: 3989.96
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Xeon(R) CPU5130  @ 2.00GHz
stepping: 6
cpu MHz : 1995.117
cache size  : 4096 KB
physical id : 3
siblings: 2
core id : 0
cpu cores   : 2
apicid  : 6
initial apicid  : 6
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm 
constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx tm2 
ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow

cirrus-vga is not properly reset

2009-01-17 Thread Jan Kiszka
Hi,

when rebooting from non-text mode, the text output of BIOS and
bootloader is unreadable. This only happens with cirrus and the KVM
tree, std-vga is fine, cirrus with upstream QEMU is fine, too. Moreover,
-no-kvm makes no difference.

What's very strange about this: CirrusVGAState does not differ after
issuing a reset from text mode compared to a reset from graphic mode
(except for a needless re-init of some io_memory slots - will post a
cleanup patch upstream). Currently I have no clue where to look next.

Jan



signature.asc
Description: OpenPGP digital signature


Re: cirrus-vga is not properly reset

2009-01-17 Thread Jan Kiszka
Jan Kiszka wrote:
 Hi,
 
 when rebooting from non-text mode, the text output of BIOS and
 bootloader is unreadable. This only happens with cirrus and the KVM
 tree, std-vga is fine, cirrus with upstream QEMU is fine, too. Moreover,
 -no-kvm makes no difference.

Looked at it again, and I was able to reproduce this problem with latest
QEMU, too. It just doesn't trigger as reliably as with kvm-userspace.

So I bet it has something to do with recent QEMU reset changes for vga
and cirrus (the display changes are not yet merged into kvm, so they
can't contribute to the reason). Moreover, the error pattern suggests
that correct text is written to the VGA RAM and dumped to the
screen, but the font that should have been written to VGA RAM by the
BIOS is corrupted.

 
 What's very strange about this: CirrusVGAState does not differ after
 issuing a reset from text mode compared to a reset from graphic mode
 (except for a needless re-init of some io_memory slots - will post a
 cleanup patch upstream). Currently I have no clue where to look next.
 
 Jan
 

Jan



signature.asc
Description: OpenPGP digital signature


[Resolved] XP Guest Clock Slow

2009-01-17 Thread kvm-user
Update in case someone else fights the same issue.

Replaced Lenny kernel with 2.6.28 kernel.org compile.
All is well now - At least within a few seconds per day.

--
Marty

-Original Message-
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf
Of kvm-u...@goodbit.net
Sent: Friday, January 16, 2009 10:15 AM
To: kvm@vger.kernel.org
Subject: XP Guest Clock Slow

Hello,

I just upgraded my hardware to a VT-capable CPU, and am getting up-to-speed
with kvm.

The clock on my XP guest significantly lags (e.g. by 50%) behind the host.
I lose several hours overnight in the guest clock.  (Not just a few seconds)
System load seems to aggravate the problem.

I have seen similar behavior in quick tests on a Vista client.
My Ubuntu client is fine, presumably due to clocksource==kvm-clock in guest.


Everything else works fine in all guest VMs.  (Amazingly well, actually)
Performance is equal to or better than what I have been used to with VMware
Server.

Host CPU is Athlon 64 X2 with SB700 chipset.
System dumps (lspci, cpuinfo, etc) available on request (I don't want to
flood)

Host environment is Debian Lenny, with the distribution 2.6.26 kernel.
Currently running a locally compiled/installed kvm-82
I also saw the same issue with the kvm-72 Debian package.

My host clock is set to 'hpet'
| /sys/devices/system/clocksource/clocksource0$ cat *
| hpet acpi_pm jiffies tsc
| hpet

Is there a paravirtual clock driver for M$ clients available?
What is current best practice to work around this problem?
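For anyone else fighting this: the host clocksource mentioned above is read and switched through sysfs. A minimal sketch (paths as on 2.6.x kernels, guarded in case the interface is absent):

```shell
# Sketch: inspect the host clocksource, and switch it as root if desired.
cs=/sys/devices/system/clocksource/clocksource0
if [ -d "$cs" ]; then
    echo "available: $(cat "$cs"/available_clocksource)"
    echo "current:   $(cat "$cs"/current_clocksource)"
    # To switch (root only), e.g.:
    # echo hpet > "$cs"/current_clocksource
fi
```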

Thanks,

--
Marty




[PATCH v2 0/6] ATS capability support for Intel IOMMU

2009-01-17 Thread Yu Zhao
This patch series implements Address Translation Service support for
the Intel IOMMU. ATS allows a PCI Endpoint to request DMA address
translations from the IOMMU and cache them in the Endpoint, which
alleviates IOMMU pressure and improves hardware performance in the
I/O virtualization environment.

Changelog: v1 -> v2
  added 'static' prefix to a local LIST_HEAD (Andrew Morton)


Yu Zhao (6):
  PCI: support the ATS capability
  VT-d: parse ATSR in DMA Remapping Reporting Structure
  VT-d: add queue invalidation fault status support
  VT-d: add device IOTLB invalidation support
  VT-d: cleanup iommu_flush_iotlb_psi and flush_unmaps
  VT-d: support the device IOTLB

 drivers/pci/dmar.c   |  226 ++
 drivers/pci/intel-iommu.c|  137 +-
 drivers/pci/intr_remapping.c |   21 +++--
 drivers/pci/pci.c|   68 +
 include/linux/dmar.h |9 ++
 include/linux/intel-iommu.h  |   19 +++-
 include/linux/pci.h  |   15 +++
 include/linux/pci_regs.h |   10 ++
 8 files changed, 450 insertions(+), 55 deletions(-)



[PATCH v2 2/6] VT-d: parse ATSR in DMA Remapping Reporting Structure

2009-01-17 Thread Yu Zhao
Parse the Root Port ATS Capability Reporting Structure in DMA Remapping
Reporting Structure ACPI table.

Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/dmar.c  |  112 --
 include/linux/dmar.h|9 
 include/linux/intel-iommu.h |1 +
 3 files changed, 116 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index f5a662a..bd37b3c 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -254,6 +254,84 @@ rmrr_parse_dev(struct dmar_rmrr_unit *rmrru)
}
return ret;
 }
+
+static LIST_HEAD(dmar_atsr_units);
+
+static int __init dmar_parse_one_atsr(struct acpi_dmar_header *hdr)
+{
+   struct acpi_dmar_atsr *atsr;
+   struct dmar_atsr_unit *atsru;
+
+   atsr = container_of(hdr, struct acpi_dmar_atsr, header);
+   atsru = kzalloc(sizeof(*atsru), GFP_KERNEL);
+   if (!atsru)
+       return -ENOMEM;
+
+   atsru->hdr = hdr;
+   atsru->include_all = atsr->flags & 0x1;
+
+   list_add(&atsru->list, &dmar_atsr_units);
+
+   return 0;
+}
+
+static int __init atsr_parse_dev(struct dmar_atsr_unit *atsru)
+{
+   int rc;
+   struct acpi_dmar_atsr *atsr;
+
+   if (atsru->include_all)
+       return 0;
+
+   atsr = container_of(atsru->hdr, struct acpi_dmar_atsr, header);
+   rc = dmar_parse_dev_scope((void *)(atsr + 1),
+                             (void *)atsr + atsr->header.length,
+                             &atsru->devices_cnt, &atsru->devices,
+                             atsr->segment);
+   if (rc || !atsru->devices_cnt) {
+       list_del(&atsru->list);
+       kfree(atsru);
+   }
+
+   return rc;
+}
+
+int dmar_find_matched_atsr_unit(struct pci_dev *dev)
+{
+   int i;
+   struct pci_bus *bus;
+   struct acpi_dmar_atsr *atsr;
+   struct dmar_atsr_unit *atsru;
+
+   list_for_each_entry(atsru, &dmar_atsr_units, list) {
+       atsr = container_of(atsru->hdr, struct acpi_dmar_atsr, header);
+       if (atsr->segment == pci_domain_nr(dev->bus))
+           goto found;
+   }
+
+   return 0;
+
+found:
+   for (bus = dev->bus; bus; bus = bus->parent) {
+       struct pci_dev *bridge = bus->self;
+
+       if (!bridge || !bridge->is_pcie ||
+           bridge->pcie_type == PCI_EXP_TYPE_PCI_BRIDGE)
+           return 0;
+
+       if (bridge->pcie_type == PCI_EXP_TYPE_ROOT_PORT) {
+           for (i = 0; i < atsru->devices_cnt; i++)
+               if (atsru->devices[i] == bridge)
+                   return 1;
+           break;
+       }
+   }
+
+   if (atsru->include_all)
+       return 1;
+
+   return 0;
+}
 #endif
 
 static void __init
@@ -261,22 +339,28 @@ dmar_table_print_dmar_entry(struct acpi_dmar_header *header)
{
    struct acpi_dmar_hardware_unit *drhd;
    struct acpi_dmar_reserved_memory *rmrr;
+   struct acpi_dmar_atsr *atsr;

    switch (header->type) {
    case ACPI_DMAR_TYPE_HARDWARE_UNIT:
-       drhd = (struct acpi_dmar_hardware_unit *)header;
+       drhd = container_of(header, struct acpi_dmar_hardware_unit,
+                           header);
        printk (KERN_INFO PREFIX
-               "DRHD (flags: 0x%08x)base: 0x%016Lx\n",
-               drhd->flags, (unsigned long long)drhd->address);
+               "DRHD base: %#016Lx flags: %#x\n",
+               (unsigned long long)drhd->address, drhd->flags);
        break;
    case ACPI_DMAR_TYPE_RESERVED_MEMORY:
-       rmrr = (struct acpi_dmar_reserved_memory *)header;
-
+       rmrr = container_of(header, struct acpi_dmar_reserved_memory,
+                           header);
        printk (KERN_INFO PREFIX
-               "RMRR base: 0x%016Lx end: 0x%016Lx\n",
+               "RMRR base: %#016Lx end: %#016Lx\n",
                (unsigned long long)rmrr->base_address,
                (unsigned long long)rmrr->end_address);
        break;
+   case ACPI_DMAR_TYPE_ATSR:
+       atsr = container_of(header, struct acpi_dmar_atsr, header);
+       printk(KERN_INFO PREFIX "ATSR flags: %#x\n", atsr->flags);
+       break;
    }
 }
 
@@ -341,6 +425,11 @@ parse_dmar_table(void)
ret = dmar_parse_one_rmrr(entry_header);
 #endif
break;
+   case ACPI_DMAR_TYPE_ATSR:
+#ifdef CONFIG_DMAR
+   ret = dmar_parse_one_atsr(entry_header);
+#endif
+   break;
default:
printk(KERN_WARNING PREFIX
       "Unknown DMAR structure type\n");
@@ -409,11 +498,19 @@ int __init dmar_dev_scope_init(void)
 #ifdef CONFIG_DMAR
{
struct 

[PATCH v2 3/6] VT-d: add queue invalidation fault status support

2009-01-17 Thread Yu Zhao
Check the fault register after submitting a queued invalidation request.

Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/dmar.c   |   59 +++--
 drivers/pci/intr_remapping.c |   21 --
 include/linux/intel-iommu.h  |4 ++-
 3 files changed, 59 insertions(+), 25 deletions(-)

diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index bd37b3c..0c87ebd 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -671,19 +671,49 @@ static inline void reclaim_free_desc(struct q_inval *qi)
}
 }
 
+static int qi_check_fault(struct intel_iommu *iommu, int index)
+{
+   u32 fault;
+   int head;
+   struct q_inval *qi = iommu->qi;
+   int wait_index = (index + 1) % QI_LENGTH;
+
+   fault = readl(iommu->reg + DMAR_FSTS_REG);
+
+   /*
+    * If IQE happens, the head points to the descriptor associated
+    * with the error. No new descriptors are fetched until the IQE
+    * is cleared.
+    */
+   if (fault & DMA_FSTS_IQE) {
+       head = readl(iommu->reg + DMAR_IQH_REG);
+       if ((head >> DMAR_IQ_OFFSET) == index) {
+           memcpy(&qi->desc[index], &qi->desc[wait_index],
+                  sizeof(struct qi_desc));
+           __iommu_flush_cache(iommu, &qi->desc[index],
+                               sizeof(struct qi_desc));
+           writel(DMA_FSTS_IQE, iommu->reg + DMAR_FSTS_REG);
+           return -EINVAL;
+       }
+   }
+
+   return 0;
+}
+
 /*
  * Submit the queued invalidation descriptor to the remapping
  * hardware unit and wait for its completion.
  */
-void qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
+int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 {
+   int rc = 0;
struct q_inval *qi = iommu->qi;
struct qi_desc *hw, wait_desc;
int wait_index, index;
unsigned long flags;
 
if (!qi)
-   return;
+   return 0;
 
hw = qi->desc;
 
@@ -701,7 +731,8 @@ void qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 
hw[index] = *desc;
 
-   wait_desc.low = QI_IWD_STATUS_DATA(2) | QI_IWD_STATUS_WRITE | QI_IWD_TYPE;
+   wait_desc.low = QI_IWD_STATUS_DATA(QI_DONE) |
+                   QI_IWD_STATUS_WRITE | QI_IWD_TYPE;
    wait_desc.high = virt_to_phys(&qi->desc_status[wait_index]);
 
hw[wait_index] = wait_desc;
@@ -712,13 +743,11 @@ void qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
qi->free_head = (qi->free_head + 2) % QI_LENGTH;
qi->free_cnt -= 2;
 
-   spin_lock(&iommu->register_lock);
/*
 * update the HW tail register indicating the presence of
 * new descriptors.
 */
-   writel(qi->free_head << 4, iommu->reg + DMAR_IQT_REG);
-   spin_unlock(&iommu->register_lock);
+   writel(qi->free_head << DMAR_IQ_OFFSET, iommu->reg + DMAR_IQT_REG);
 
while (qi->desc_status[wait_index] != QI_DONE) {
/*
@@ -728,6 +757,10 @@ void qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 * a deadlock where the interrupt context can wait indefinitely
 * for free slots in the queue.
 */
+   rc = qi_check_fault(iommu, index);
+   if (rc)
+   break;
+
spin_unlock(&qi->q_lock);
cpu_relax();
spin_lock(&qi->q_lock);
@@ -737,6 +770,8 @@ void qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 
reclaim_free_desc(qi);
spin_unlock_irqrestore(&qi->q_lock, flags);
+
+   return rc;
 }
 
 /*
@@ -749,13 +784,13 @@ void qi_global_iec(struct intel_iommu *iommu)
desc.low = QI_IEC_TYPE;
desc.high = 0;
 
+   /* should never fail */
qi_submit_sync(&desc, iommu);
 }
 
 int qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid, u8 fm,
 u64 type, int non_present_entry_flush)
 {
-
struct qi_desc desc;
 
if (non_present_entry_flush) {
@@ -769,10 +804,7 @@ int qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid, u8 fm,
| QI_CC_GRAN(type) | QI_CC_TYPE;
desc.high = 0;
 
-   qi_submit_sync(&desc, iommu);
-
-   return 0;
-
+   return qi_submit_sync(&desc, iommu);
 }
 
 int qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
@@ -802,10 +834,7 @@ int qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
desc.high = QI_IOTLB_ADDR(addr) | QI_IOTLB_IH(ih)
| QI_IOTLB_AM(size_order);
 
-   qi_submit_sync(&desc, iommu);
-
-   return 0;
-
+   return qi_submit_sync(&desc, iommu);
 }
 
 /*
diff --git a/drivers/pci/intr_remapping.c b/drivers/pci/intr_remapping.c
index f78371b..45effc5 100644
--- a/drivers/pci/intr_remapping.c
+++ b/drivers/pci/intr_remapping.c
@@ 

[PATCH v2 6/6] VT-d: support the device IOTLB

2009-01-17 Thread Yu Zhao
Support device IOTLB (i.e. ATS) for both native and KVM environments.

Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/intel-iommu.c   |   97 +-
 include/linux/intel-iommu.h |1 +
 2 files changed, 95 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index df92764..fb84d82 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -125,6 +125,7 @@ static inline void context_set_fault_enable(struct context_entry *context)
 }
 
 #define CONTEXT_TT_MULTI_LEVEL 0
+#define CONTEXT_TT_DEV_IOTLB   1
 
 static inline void context_set_translation_type(struct context_entry *context,
unsigned long value)
@@ -240,6 +241,8 @@ struct device_domain_info {
struct list_head global; /* link to global list */
u8 bus; /* PCI bus number */
u8 devfn;   /* PCI devfn number */
+   int qdep;   /* invalidate queue depth */
+   struct intel_iommu *iommu; /* IOMMU used by this device */
struct pci_dev *dev; /* it's NULL for PCIE-to-PCI bridge */
struct dmar_domain *domain; /* pointer to domain */
 };
@@ -914,6 +917,75 @@ static int __iommu_flush_iotlb(struct intel_iommu *iommu, u16 did,
return 0;
 }
 
+static struct device_domain_info *
+iommu_support_dev_iotlb(struct dmar_domain *domain, u8 bus, u8 devfn)
+{
+   int found = 0;
+   unsigned long flags;
+   struct device_domain_info *info;
+   struct intel_iommu *iommu = device_to_iommu(bus, devfn);
+
+   if (!ecap_dev_iotlb_support(iommu->ecap))
+       return NULL;
+
+   if (!iommu->qi)
+       return NULL;
+
+   spin_lock_irqsave(&device_domain_lock, flags);
+   list_for_each_entry(info, &domain->devices, link)
+       if (info->dev && info->bus == bus && info->devfn == devfn) {
+           found = 1;
+           break;
+       }
+   spin_unlock_irqrestore(&device_domain_lock, flags);
+
+   if (!found)
+       return NULL;
+
+   if (!dmar_find_matched_atsr_unit(info->dev))
+       return NULL;
+
+   info->iommu = iommu;
+   info->qdep = pci_ats_qdep(info->dev);
+   if (!info->qdep)
+       return NULL;
+
+   return info;
+}
+
+static void iommu_enable_dev_iotlb(struct device_domain_info *info)
+{
+   pci_enable_ats(info->dev, VTD_PAGE_SHIFT);
+}
+
+static void iommu_disable_dev_iotlb(struct device_domain_info *info)
+{
+   if (info->dev && pci_ats_enabled(info->dev))
+       pci_disable_ats(info->dev);
+}
+
+static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
+                                 u64 addr, unsigned int mask)
+{
+   int rc;
+   u16 sid;
+   unsigned long flags;
+   struct device_domain_info *info;
+
+   spin_lock_irqsave(&device_domain_lock, flags);
+   list_for_each_entry(info, &domain->devices, link) {
+       if (!info->dev || !pci_ats_enabled(info->dev))
+           continue;
+
+       sid = info->bus << 8 | info->devfn;
+       rc = qi_flush_dev_iotlb(info->iommu, sid,
+                               info->qdep, addr, mask);
+       if (rc)
+           printk(KERN_ERR "IOMMU: flush device IOTLB failed\n");
+   }
+   spin_unlock_irqrestore(&device_domain_lock, flags);
+}
+
 static int iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did,
u64 addr, unsigned int pages, int non_present_entry_flush)
 {
@@ -937,6 +1009,9 @@ static int iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did,
    rc = iommu->flush.flush_iotlb(iommu, did, addr, mask,
                                  DMA_TLB_PSI_FLUSH,
                                  non_present_entry_flush);
+   if (!rc && !non_present_entry_flush)
+       iommu_flush_dev_iotlb(iommu->domains[did], addr, mask);
+
return rc;
 }
 
@@ -1461,6 +1536,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
unsigned long ndomains;
int id;
int agaw;
+   struct device_domain_info *info;
 
pr_debug("Set context mapping for %02x:%02x.%d\n",
         bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
@@ -1526,7 +1602,11 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
context_set_domain_id(context, id);
context_set_address_width(context, iommu->agaw);
context_set_address_root(context, virt_to_phys(pgd));
-   context_set_translation_type(context, CONTEXT_TT_MULTI_LEVEL);
+   info = iommu_support_dev_iotlb(domain, bus, devfn);
+   if (info)
+   context_set_translation_type(context, CONTEXT_TT_DEV_IOTLB);
+   else
+   context_set_translation_type(context, CONTEXT_TT_MULTI_LEVEL);
context_set_fault_enable(context);
context_set_present(context);

[PATCH v2 4/6] VT-d: add device IOTLB invalidation support

2009-01-17 Thread Yu Zhao
Support device IOTLB invalidation to flush the translation cached in the
Endpoint.

Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/dmar.c  |   63 --
 include/linux/intel-iommu.h |   13 -
 2 files changed, 72 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index 0c87ebd..4fea360 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -664,7 +664,8 @@ void free_iommu(struct intel_iommu *iommu)
  */
 static inline void reclaim_free_desc(struct q_inval *qi)
 {
-   while (qi->desc_status[qi->free_tail] == QI_DONE) {
+   while (qi->desc_status[qi->free_tail] == QI_DONE ||
+          qi->desc_status[qi->free_tail] == QI_ABORT) {
        qi->desc_status[qi->free_tail] = QI_FREE;
        qi->free_tail = (qi->free_tail + 1) % QI_LENGTH;
        qi->free_cnt++;
@@ -674,10 +675,13 @@ static inline void reclaim_free_desc(struct q_inval *qi)
 static int qi_check_fault(struct intel_iommu *iommu, int index)
 {
u32 fault;
-   int head;
+   int head, tail;
struct q_inval *qi = iommu->qi;
int wait_index = (index + 1) % QI_LENGTH;
 
+   if (qi->desc_status[wait_index] == QI_ABORT)
+       return -EAGAIN;
+
fault = readl(iommu->reg + DMAR_FSTS_REG);
 
/*
@@ -697,6 +701,32 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
}
}
 
+   /*
+* If ITE happens, all pending wait_desc commands are aborted.
+* No new descriptors are fetched until the ITE is cleared.
+*/
+   if (fault & DMA_FSTS_ITE) {
+       head = readl(iommu->reg + DMAR_IQH_REG);
+       head = ((head >> DMAR_IQ_OFFSET) - 1 + QI_LENGTH) % QI_LENGTH;
+       head |= 1;
+       tail = readl(iommu->reg + DMAR_IQT_REG);
+       tail = ((tail >> DMAR_IQ_OFFSET) - 1 + QI_LENGTH) % QI_LENGTH;
+
+       writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
+
+       do {
+           if (qi->desc_status[head] == QI_IN_USE)
+               qi->desc_status[head] = QI_ABORT;
+           head = (head - 2 + QI_LENGTH) % QI_LENGTH;
+       } while (head != tail);
+
+       if (qi->desc_status[wait_index] == QI_ABORT)
+           return -EAGAIN;
+   }
+
+   if (fault & DMA_FSTS_ICE)
+       writel(DMA_FSTS_ICE, iommu->reg + DMAR_FSTS_REG);
+
return 0;
 }
 
@@ -706,7 +736,7 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
  */
 int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 {
-   int rc = 0;
+   int rc;
struct q_inval *qi = iommu->qi;
struct qi_desc *hw, wait_desc;
int wait_index, index;
@@ -717,6 +747,9 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 
hw = qi->desc;
 
+restart:
+   rc = 0;
+
spin_lock_irqsave(&qi->q_lock, flags);
while (qi->free_cnt < 3) {
    spin_unlock_irqrestore(&qi->q_lock, flags);
@@ -771,6 +804,9 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu 
*iommu)
reclaim_free_desc(qi);
spin_unlock_irqrestore(&qi->q_lock, flags);
 
+   if (rc == -EAGAIN)
+   goto restart;
+
return rc;
 }
 
@@ -837,6 +873,27 @@ int qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
return qi_submit_sync(desc, iommu);
 }
 
+int qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, int qdep,
+   u64 addr, unsigned int mask)
+{
+   struct qi_desc desc;
+
+   if (mask) {
+       BUG_ON(addr & ((1 << (VTD_PAGE_SHIFT + mask)) - 1));
+       addr |= (1 << (VTD_PAGE_SHIFT + mask - 1)) - 1;
+       desc.high = QI_DEV_IOTLB_ADDR(addr) | QI_DEV_IOTLB_SIZE;
+   } else
+       desc.high = QI_DEV_IOTLB_ADDR(addr);
+
+   if (qdep >= QI_DEV_IOTLB_MAX_INVS)
+       qdep = 0;
+
+   desc.low = QI_DEV_IOTLB_SID(sid) | QI_DEV_IOTLB_QDEP(qdep) |
+              QI_DIOTLB_TYPE;
+
+   return qi_submit_sync(&desc, iommu);
+}
+
 /*
  * Enable Queued Invalidation interface. This is a must to support
  * interrupt-remapping. Also used by DMA-remapping, which replaces
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 0a220c9..d82bdac 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -196,6 +196,8 @@ static inline void dmar_writeq(void __iomem *addr, u64 val)
 #define DMA_FSTS_PPF ((u32)2)
 #define DMA_FSTS_PFO ((u32)1)
#define DMA_FSTS_IQE (1 << 4)
+#define DMA_FSTS_ICE (1 << 5)
+#define DMA_FSTS_ITE (1 << 6)
#define dma_fsts_fault_record_index(s) (((s) >> 8) & 0xff)
 
 /* FRCD_REG, 32 bits access */
@@ -224,7 +226,8 @@ do {
\
 enum {
QI_FREE,
QI_IN_USE,
-   QI_DONE
+   QI_DONE,
+   QI_ABORT
 };
 
 #define 

[PATCH v2 5/6] VT-d: cleanup iommu_flush_iotlb_psi and flush_unmaps

2009-01-17 Thread Yu Zhao
Make iommu_flush_iotlb_psi() and flush_unmaps() easier to read.

Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/intel-iommu.c |   46 +---
 1 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 3dfecb2..df92764 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -917,30 +917,27 @@ static int __iommu_flush_iotlb(struct intel_iommu *iommu, u16 did,
 static int iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did,
u64 addr, unsigned int pages, int non_present_entry_flush)
 {
-   unsigned int mask;
+   int rc;
+   unsigned int mask = ilog2(__roundup_pow_of_two(pages));
 
BUG_ON(addr & (~VTD_PAGE_MASK));
BUG_ON(pages == 0);
 
-   /* Fallback to domain selective flush if no PSI support */
-   if (!cap_pgsel_inv(iommu->cap))
-       return iommu->flush.flush_iotlb(iommu, did, 0, 0,
-                                       DMA_TLB_DSI_FLUSH,
-                                       non_present_entry_flush);
-
/*
+* Fallback to domain selective flush if no PSI support or the size is
+* too big.
 * PSI requires page size to be 2 ^ x, and the base address is naturally
 * aligned to the size
 */
-   mask = ilog2(__roundup_pow_of_two(pages));
-   /* Fallback to domain selective flush if size is too big */
-   if (mask > cap_max_amask_val(iommu->cap))
-       return iommu->flush.flush_iotlb(iommu, did, 0, 0,
-           DMA_TLB_DSI_FLUSH, non_present_entry_flush);
-
-   return iommu->flush.flush_iotlb(iommu, did, addr, mask,
-                                   DMA_TLB_PSI_FLUSH,
-                                   non_present_entry_flush);
+   if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu->cap))
+       rc = iommu->flush.flush_iotlb(iommu, did, 0, 0,
+                                     DMA_TLB_DSI_FLUSH,
+                                     non_present_entry_flush);
+   else
+       rc = iommu->flush.flush_iotlb(iommu, did, addr, mask,
+                                     DMA_TLB_PSI_FLUSH,
+                                     non_present_entry_flush);
+   return rc;
 }
 
 static void iommu_disable_protect_mem_regions(struct intel_iommu *iommu)
@@ -2293,15 +2290,16 @@ static void flush_unmaps(void)
if (!iommu)
continue;
 
-   if (deferred_flush[i].next) {
-   iommu->flush.flush_iotlb(iommu, 0, 0, 0,
-                            DMA_TLB_GLOBAL_FLUSH, 0);
-   for (j = 0; j < deferred_flush[i].next; j++) {
-       __free_iova(&deferred_flush[i].domain[j]->iovad,
-                   deferred_flush[i].iova[j]);
-   }
-   deferred_flush[i].next = 0;
+   if (!deferred_flush[i].next)
+   continue;
+
+   iommu->flush.flush_iotlb(iommu, 0, 0, 0,
+                            DMA_TLB_GLOBAL_FLUSH, 0);
+   for (j = 0; j < deferred_flush[i].next; j++) {
+       __free_iova(&deferred_flush[i].domain[j]->iovad,
+                   deferred_flush[i].iova[j]);
}
+   deferred_flush[i].next = 0;
}
 
list_size = 0;
-- 
1.5.6.4
