date:20100901

Re: [PATCH 5/5] virtio-net: Switch default to new bottom half TX handler for iothread

2010-09-01 Thread Stefan Hajnoczi

On Tue, Aug 31, 2010 at 11:32 PM, Alex Williamson
alex.william...@redhat.com wrote:
 On Tue, 2010-08-31 at 23:25 +0300, Michael S. Tsirkin wrote:
 On Fri, Aug 27, 2010 at 04:37:45PM -0600, Alex Williamson wrote:
  The bottom half handler shows big improvements over the timer
  with few downsides, default to it when the iothread is enabled.
 
  Using the following tests, with the guest and host connected
  via tap+bridge:
 
  guest netperf -t TCP_STREAM -H $HOST
  host netperf -t TCP_STREAM -H $GUEST
  guest netperf -t UDP_STREAM -H $HOST
  host netperf -t UDP_STREAM -H $GUEST
  guest netperf -t TCP_RR -H $HOST
 
  Results: base throughput, exits/throughput -
                     patched throughput, exits/throughput
 
  --enable-io-thread
  TCP guest-host 2737.77, 47.82  - 6767.09, 29.15 = 247%, 61%
  TCP host-guest 2231.33, 74.00  - 4125.80, 67.61 = 185%, 91%
  UDP guest-host 6281.68, 14.66  - 12569.27, 1.98 = 200%, 14%
  UDP host-guest 275.91,  289.22 - 264.80, 293.53 = 96%, 101%
  interations/s   1949.65, 82.97  - 7417.56, 84.31 = 380%, 102%
 
  No --enable-io-thread
  TCP guest-host 3041.57, 55.11 - 1038.93, 517.57 = 34%, 939%
  TCP host-guest 2416.03, 76.67 - 5655.92, 55.52  = 234%, 72%
  UDP guest-host 12255.82, 6.11 - 7775.87, 31.32  = 63%, 513%
  UDP host-guest 587.92, 245.95 - 611.88, 239.92  = 104%, 98%
  interations/s   1975.59, 83.21 - 8935.50, 88.18  = 452%, 106%
 
  Signed-off-by: Alex Williamson alex.william...@redhat.com

 parameter having different settings based on config
 options might surprise some users. I don't think
 we really need a parameter here ...

 I'm not a bit fan of this either, but I'd also prefer not to introduce a
 regression for a performance difference we know about in advance.  It
 gets even more complicated when we factor in qemu-kvm, as it doesn't
 build with iothread enabled, but seems to get and even better boost in
 performance across the board thanks largely to the kvm-irqchip.  Should
 we instead make this a configure option?  --enable-virtio-net-txbh?
 Thanks,

 Alex

qemu-kvm uses its own iothread implementation by default.  It doesn't
need --enable-io-thread because it already uses a similar model.

Stefan


  ---
 
   hw/s390-virtio-bus.c |    3 ++-
   hw/syborg_virtio.c   |    3 ++-
   hw/virtio-pci.c      |    3 ++-
   hw/virtio.h          |    6 ++
   4 files changed, 12 insertions(+), 3 deletions(-)
 
  diff --git a/hw/s390-virtio-bus.c b/hw/s390-virtio-bus.c
  index 1483362..985f99a 100644
  --- a/hw/s390-virtio-bus.c
  +++ b/hw/s390-virtio-bus.c
  @@ -328,7 +328,8 @@ static VirtIOS390DeviceInfo s390_virtio_net = {
       .qdev.size = sizeof(VirtIOS390Device),
       .qdev.props = (Property[]) {
           DEFINE_NIC_PROPERTIES(VirtIOS390Device, nic),
  -        DEFINE_PROP_UINT32(txtimer, VirtIOS390Device, txtimer, 1),
  +        DEFINE_PROP_UINT32(txtimer, VirtIOS390Device, txtimer,
  +                           TXTIMER_DEFAULT),
           DEFINE_PROP_INT32(txburst, VirtIOS390Device, txburst, 256),
           DEFINE_PROP_END_OF_LIST(),
       },
  diff --git a/hw/syborg_virtio.c b/hw/syborg_virtio.c
  index 7b76972..ee5746d 100644
  --- a/hw/syborg_virtio.c
  +++ b/hw/syborg_virtio.c
  @@ -300,7 +300,8 @@ static SysBusDeviceInfo syborg_virtio_net_info = {
       .qdev.props = (Property[]) {
           DEFINE_NIC_PROPERTIES(SyborgVirtIOProxy, nic),
           DEFINE_VIRTIO_NET_FEATURES(SyborgVirtIOProxy, host_features),
  -        DEFINE_PROP_UINT32(txtimer, SyborgVirtIOProxy, txtimer, 1),
  +        DEFINE_PROP_UINT32(txtimer, SyborgVirtIOProxy, txtimer,
  +                           TXTIMER_DEFAULT),
           DEFINE_PROP_INT32(txburst, SyborgVirtIOProxy, txburst, 256),
           DEFINE_PROP_END_OF_LIST(),
       }
  diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
  index e025c09..9740f57 100644
  --- a/hw/virtio-pci.c
  +++ b/hw/virtio-pci.c
  @@ -695,7 +695,8 @@ static PCIDeviceInfo virtio_info[] = {
               DEFINE_PROP_UINT32(vectors, VirtIOPCIProxy, nvectors, 3),
               DEFINE_VIRTIO_NET_FEATURES(VirtIOPCIProxy, host_features),
               DEFINE_NIC_PROPERTIES(VirtIOPCIProxy, nic),
  -            DEFINE_PROP_UINT32(txtimer, VirtIOPCIProxy, txtimer, 1),
  +            DEFINE_PROP_UINT32(txtimer, VirtIOPCIProxy, txtimer,
  +                               TXTIMER_DEFAULT),
               DEFINE_PROP_INT32(txburst, VirtIOPCIProxy, txburst, 256),
               DEFINE_PROP_END_OF_LIST(),
           },
  diff --git a/hw/virtio.h b/hw/virtio.h
  index 4051889..a1a17a2 100644
  --- a/hw/virtio.h
  +++ b/hw/virtio.h
  @@ -183,6 +183,12 @@ void virtio_update_irq(VirtIODevice *vdev);
   void virtio_bind_device(VirtIODevice *vdev, const VirtIOBindings *binding,
                           void *opaque);
 
  +#ifdef CONFIG_IOTHREAD
  + #define TXTIMER_DEFAULT 0
  +#else
  + #define TXTIMER_DEFAULT 1
  +#endif
  +

 Add a comment explaning that this is just a performance optimization?

   /* Base devices.  */
   VirtIODevice

[PATCH 0/2] Improve kvm_stat delta display

2010-09-01 Thread Avi Kivity

Make the first sleep shorter, so numbers come up quicker, and scale the
delta so it is a rate-of-change instead of a meaningless number.

Avi Kivity (2):
  kvm_stat: make the initial sleep shorter
  kvm_stat: scale delta column to make it a rate

 kvm/kvm_stat |   10 ++
 1 files changed, 6 insertions(+), 4 deletions(-)

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/2] kvm_stat: make the initial sleep shorter

2010-09-01 Thread Avi Kivity

So we can see results faster

Signed-off-by: Avi Kivity a...@redhat.com
---
 kvm/kvm_stat |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/kvm/kvm_stat b/kvm/kvm_stat
index e68ca4e..b031d48 100755
--- a/kvm/kvm_stat
+++ b/kvm/kvm_stat
@@ -325,9 +325,11 @@ def tui(screen, stats):
 row += 1
 screen.refresh()
 
+sleeptime = 0.25
 while True:
 refresh()
-curses.halfdelay(30)
+curses.halfdelay(int(sleeptime * 10))
+sleeptime = 3
 try:
 c = screen.getkey()
 if c == 'q':
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/2] kvm_stat: scale delta column to make it a rate

2010-09-01 Thread Avi Kivity

Scale the delta column by 1/sleeptime, so its units are
events per second.

Signed-off-by: Avi Kivity a...@redhat.com
---
 kvm/kvm_stat |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kvm/kvm_stat b/kvm/kvm_stat
index b031d48..e32df14 100755
--- a/kvm/kvm_stat
+++ b/kvm/kvm_stat
@@ -301,7 +301,7 @@ number_width = 10
 def tui(screen, stats):
 curses.use_default_colors()
 curses.noecho()
-def refresh():
+def refresh(sleeptime):
 screen.erase()
 screen.addstr(0, 0, 'kvm statistics')
 row = 2
@@ -321,13 +321,13 @@ def tui(screen, stats):
 screen.addstr(row, col, '%10d' % (values[0],))
 col += number_width
 if values[1] is not None:
-screen.addstr(row, col, '%8d' % (values[1],))
+screen.addstr(row, col, '%8d' % (values[1] / sleeptime,))
 row += 1
 screen.refresh()
 
 sleeptime = 0.25
 while True:
-refresh()
+refresh(sleeptime)
 curses.halfdelay(int(sleeptime * 10))
 sleeptime = 3
 try:
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch 0/2] SVM: fix cpu onlining

2010-09-01 Thread Avi Kivity


 On 09/01/2010 01:13 AM, Marcelo Tosatti wrote:

applied, thanks.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] kvm_stat: use total count as a secondary sort key

2010-09-01 Thread Avi Kivity

This shows events with history in preference to events with none.

Signed-off-by: Avi Kivity a...@redhat.com
---
 kvm/kvm_stat |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kvm/kvm_stat b/kvm/kvm_stat
index e32df14..4a16277 100755
--- a/kvm/kvm_stat
+++ b/kvm/kvm_stat
@@ -308,9 +308,9 @@ def tui(screen, stats):
 s = stats.get()
 def sortkey(x):
 if s[x][1]:
-return -s[x][1]
+return (-s[x][1], -s[x][0])
 else:
-return x
+return (0, -s[x][0])
 for key in sorted(s.keys(), key = sortkey):
 if row = screen.getmaxyx()[0]:
 break
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] kvm_stat: ignore events that have never occured

2010-09-01 Thread Avi Kivity

Less cluttered display.

Signed-off-by: Avi Kivity a...@redhat.com
---
 kvm/kvm_stat |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kvm/kvm_stat b/kvm/kvm_stat
index 4a16277..d373c60 100755
--- a/kvm/kvm_stat
+++ b/kvm/kvm_stat
@@ -315,6 +315,8 @@ def tui(screen, stats):
 if row = screen.getmaxyx()[0]:
 break
 values = s[key]
+if not values[0] and not values[1]:
+break
 col = 1
 screen.addstr(row, col, key)
 col += label_width
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/6] * SUBJECT HERE *

2010-09-01 Thread Gerd Hoffmann


On 08/31/10 18:13, Anthony Liguori wrote:

Just as an aside, does anyone have a good way to maintain the 00 patches
in series with repeated submissions?


/me uses cut+paste from email folder or list archive.  I suspect you are 
looking for something better though ...


cheers,
  Gerd

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/2] KVM: Fix bogus value returned for MSR_IA32_PERF_STATUS

2010-09-01 Thread Avi Kivity

The returned value is completely bogus, and sets reserved bits.
Return zero instead.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/x86.c |5 +
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1cbf168..a2c03f1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1641,10 +1641,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, 
u64 *pdata)
data = vcpu-arch.ia32_misc_enable_msr;
break;
case MSR_IA32_PERF_STATUS:
-   /* TSC increment by tick */
-   data = 1000ULL;
-   /* CPU multiplier */
-   data |= (((uint64_t)4ULL)  40);
+   data = 0;
break;
case MSR_EFER:
data = vcpu-arch.efer;
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/2] KVM: Don't save/restore MSR_IA32_PERF_STATUS

2010-09-01 Thread Avi Kivity

It is read/only; restoring it only results in annoying messages.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/x86.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cc6d6cd..1cbf168 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -739,7 +739,7 @@ static u32 msrs_to_save[] = {
 #ifdef CONFIG_X86_64
MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
 #endif
-   MSR_IA32_TSC, MSR_IA32_PERF_STATUS, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA
+   MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA
 };
 
 static unsigned num_msrs_to_save;
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/2] Fix MSR_IA32_PERF_STATUS

2010-09-01 Thread Avi Kivity

MSR_IA32_PERF_STATUS is read-only, yet exposed for save/restore.  This
generates annoying printk()s when userspace attempts to initialize it.

The value returned is also bogus, the comments appear to refer to another msr.

Fix both issues.  Alex, please test with a guest Mac.

Avi Kivity (2):
  KVM: Don't save/restore MSR_IA32_PERF_STATUS
  KVM: Fix bogus value returned for MSR_IA32_PERF_STATUS

 arch/x86/kvm/x86.c |7 ++-
 1 files changed, 2 insertions(+), 5 deletions(-)

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Write to read-only msr MSR_IA32_PERF_STATUS is harmless, ignore it!

2010-09-01 Thread Avi Kivity


 On 08/31/2010 11:02 PM, Marcelo Tosatti wrote:


kvm_reset_msrs in qemu-kvm-x86.c.


I posted a patch to fix this.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PULL 00/35] KVM: PPC: End-August patch queue

2010-09-01 Thread Avi Kivity


 On 08/31/2010 05:31 AM, Alexander Graf wrote:

Howdy,

This is my local patch queue with stuff that has accumulated over the last
weeks on KVM for PPC with some last minute fixes, speedups and debugging help
that I needed for the KVM Forum ;-).

The highlights of this set are:

   - Converted most important debug points to tracepoints
   - Flush less PTEs (speedup)
   - Go back to our own hash (less duplicates)
   - Make SRs guest settable (speedup for 32 bit guests)
   - Remove r30/r31 restrictions from PV hooks (speedup!)
   - Fix random breakages
   - Fix random guest stalls
   - 440GP host support (Thanks Hollis!)
   - Reliable interrupt injection

Keep in mind that this is the first version that is stable on PPC32 hosts.
All versions prior to this could occupy otherwise used segment entries and
thus crash your machine :-).

It is also the first version that is stable with PPC64 guests, because they
require more sophisticated interrupt injection logic for which qemu patches
are also required.

Please pull this tree from:

 git://github.com/agraf/linux-2.6.git kvm-ppc-next

Have fun with more accurate, faster and less buggy KVM on PowerPC!


Pulled, thanks.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Write to read-only msr MSR_IA32_PERF_STATUS is harmless, ignore it!

2010-09-01 Thread Jes Sorensen

On 08/31/10 22:02, Marcelo Tosatti wrote:
 On Tue, Aug 31, 2010 at 06:33:34PM +0200, Jes Sorensen wrote:
 which makes no sense to me, but given it's x86, I am not sure if it
 could have come from the BIOS or something during reboot?
 
 kvm_reset_msrs in qemu-kvm-x86.c.

Interesting, I hadn't thought KVM would be able to cause itself to trap
writes like this. Avi's patch looks good though :)

Cheers,
Jes
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

ftrace/perf_event leak

2010-09-01 Thread Avi Kivity

 I recently added perf_event support to kvm_stat, to display kvm 
tracepoints as statistics (I'd like to fold this to tools/perf 
eventually, but that's another story).  However I'm seeing a resource 
leak - after I quit the tool, there are quite a few references into the 
kvm module:


  kvm_intel  43655  0
  kvm   272984  269 kvm_intel

The tool is just a python script that reads 
/sys/kernel/debug/tracing/events/kvm to find out which events are 
available, uses perf_event_open() to create one group per cpu to which a 
lot of events are attached.  The only special thing I can think of is 
that we use an ioctl to attach a filter to many perf_event descriptors.


You can find the source at 
http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=blob_plain;f=kvm/kvm_stat;hb=5bd5f131b50cb373ff4e2a3632c6dad00a1f0b55.  
All it needs are the kvm modules loaded; no need to actually run a 
guest.  Run as root.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] KVM: Fix bogus value returned for MSR_IA32_PERF_STATUS

2010-09-01 Thread Alexander Graf


On 01.09.2010, at 09:34, Avi Kivity wrote:

 The returned value is completely bogus, and sets reserved bits.
 Return zero instead.
 
 Signed-off-by: Avi Kivity a...@redhat.com
 ---
 arch/x86/kvm/x86.c |5 +
 1 files changed, 1 insertions(+), 4 deletions(-)
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 1cbf168..a2c03f1 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -1641,10 +1641,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, 
 u64 *pdata)
   data = vcpu-arch.ia32_misc_enable_msr;
   break;
   case MSR_IA32_PERF_STATUS:
 - /* TSC increment by tick */
 - data = 1000ULL;
 - /* CPU multiplier */
 - data |= (((uint64_t)4ULL)  40);
 + data = 0;
   break;
   case MSR_EFER:
   data = vcpu-arch.efer;


This is the respective code snippet from xnu:

/*
 * Get the TSC increment.  The TSC is incremented by this
 * on every bus tick.  Calculate the TSC conversion factors
 * to and from nano-seconds.
 */
if (cpuid_info()-cpuid_family == CPU_FAMILY_PENTIUM_M) {
uint64_tprfsts;

prfsts = rdmsr64(IA32_PERF_STS);
tscGranularity = (uint32_t)bitfield(prfsts, 44, 40);
N_by_2_bus_ratio = prfsts  bit(46);

} else {
panic(rtclock_init: unknown CPU family: 0x%X\n,
cpuid_info()-cpuid_family);
}

if (N_by_2_bus_ratio)
tscFCvtt2n = busFCvtt2n * 2 / (uint64_t)tscGranularity;
else
tscFCvtt2n = busFCvtt2n / (uint64_t)tscGranularity;

tscFreq = ((1 * Giga)   32) / tscFCvtt2n;
tscFCvtn2t = 0xULL / tscFCvtt2n;


So by passing in 0 here, you effectively make that code divide something by 0 
which results in a panic.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 5/5] virtio-net: Switch default to new bottom half TX handler for iothread

2010-09-01 Thread Michael S. Tsirkin

On Tue, Aug 31, 2010 at 04:32:54PM -0600, Alex Williamson wrote:
 On Tue, 2010-08-31 at 23:25 +0300, Michael S. Tsirkin wrote:
  On Fri, Aug 27, 2010 at 04:37:45PM -0600, Alex Williamson wrote:
   The bottom half handler shows big improvements over the timer
   with few downsides, default to it when the iothread is enabled.
   
   Using the following tests, with the guest and host connected
   via tap+bridge:
   
   guest netperf -t TCP_STREAM -H $HOST
   host netperf -t TCP_STREAM -H $GUEST
   guest netperf -t UDP_STREAM -H $HOST
   host netperf -t UDP_STREAM -H $GUEST
   guest netperf -t TCP_RR -H $HOST
   
   Results: base throughput, exits/throughput -
  patched throughput, exits/throughput
   
   --enable-io-thread
   TCP guest-host 2737.77, 47.82  - 6767.09, 29.15 = 247%, 61%
   TCP host-guest 2231.33, 74.00  - 4125.80, 67.61 = 185%, 91%
   UDP guest-host 6281.68, 14.66  - 12569.27, 1.98 = 200%, 14%
   UDP host-guest 275.91,  289.22 - 264.80, 293.53 = 96%, 101%
   interations/s   1949.65, 82.97  - 7417.56, 84.31 = 380%, 102%
   
   No --enable-io-thread
   TCP guest-host 3041.57, 55.11 - 1038.93, 517.57 = 34%, 939%
   TCP host-guest 2416.03, 76.67 - 5655.92, 55.52  = 234%, 72%
   UDP guest-host 12255.82, 6.11 - 7775.87, 31.32  = 63%, 513%
   UDP host-guest 587.92, 245.95 - 611.88, 239.92  = 104%, 98%
   interations/s   1975.59, 83.21 - 8935.50, 88.18  = 452%, 106%
   
   Signed-off-by: Alex Williamson alex.william...@redhat.com
  
  parameter having different settings based on config
  options might surprise some users. I don't think
  we really need a parameter here ...
 
 I'm not a bit fan of this either, but I'd also prefer not to introduce a
 regression for a performance difference we know about in advance.

I agree. The comment was mainly about a parameter. If we make it a hidden
parameter (mainbe with x- prefix), this won't be an issue.

 It gets even more complicated when we factor in qemu-kvm, as it doesn't
 build with iothread enabled, but seems to get and even better boost in
 performance across the board thanks largely to the kvm-irqchip.  Should
 we instead make this a configure option?  --enable-virtio-net-txbh?

That'll work too.

 Thanks,
 
 Alex
 
   ---
   
hw/s390-virtio-bus.c |3 ++-
hw/syborg_virtio.c   |3 ++-
hw/virtio-pci.c  |3 ++-
hw/virtio.h  |6 ++
4 files changed, 12 insertions(+), 3 deletions(-)
   
   diff --git a/hw/s390-virtio-bus.c b/hw/s390-virtio-bus.c
   index 1483362..985f99a 100644
   --- a/hw/s390-virtio-bus.c
   +++ b/hw/s390-virtio-bus.c
   @@ -328,7 +328,8 @@ static VirtIOS390DeviceInfo s390_virtio_net = {
.qdev.size = sizeof(VirtIOS390Device),
.qdev.props = (Property[]) {
DEFINE_NIC_PROPERTIES(VirtIOS390Device, nic),
   -DEFINE_PROP_UINT32(txtimer, VirtIOS390Device, txtimer, 1),
   +DEFINE_PROP_UINT32(txtimer, VirtIOS390Device, txtimer,
   +   TXTIMER_DEFAULT),
DEFINE_PROP_INT32(txburst, VirtIOS390Device, txburst, 256),
DEFINE_PROP_END_OF_LIST(),
},
   diff --git a/hw/syborg_virtio.c b/hw/syborg_virtio.c
   index 7b76972..ee5746d 100644
   --- a/hw/syborg_virtio.c
   +++ b/hw/syborg_virtio.c
   @@ -300,7 +300,8 @@ static SysBusDeviceInfo syborg_virtio_net_info = {
.qdev.props = (Property[]) {
DEFINE_NIC_PROPERTIES(SyborgVirtIOProxy, nic),
DEFINE_VIRTIO_NET_FEATURES(SyborgVirtIOProxy, host_features),
   -DEFINE_PROP_UINT32(txtimer, SyborgVirtIOProxy, txtimer, 1),
   +DEFINE_PROP_UINT32(txtimer, SyborgVirtIOProxy, txtimer,
   +   TXTIMER_DEFAULT),
DEFINE_PROP_INT32(txburst, SyborgVirtIOProxy, txburst, 256),
DEFINE_PROP_END_OF_LIST(),
}
   diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
   index e025c09..9740f57 100644
   --- a/hw/virtio-pci.c
   +++ b/hw/virtio-pci.c
   @@ -695,7 +695,8 @@ static PCIDeviceInfo virtio_info[] = {
DEFINE_PROP_UINT32(vectors, VirtIOPCIProxy, nvectors, 3),
DEFINE_VIRTIO_NET_FEATURES(VirtIOPCIProxy, host_features),
DEFINE_NIC_PROPERTIES(VirtIOPCIProxy, nic),
   -DEFINE_PROP_UINT32(txtimer, VirtIOPCIProxy, txtimer, 1),
   +DEFINE_PROP_UINT32(txtimer, VirtIOPCIProxy, txtimer,
   +   TXTIMER_DEFAULT),
DEFINE_PROP_INT32(txburst, VirtIOPCIProxy, txburst, 256),
DEFINE_PROP_END_OF_LIST(),
},
   diff --git a/hw/virtio.h b/hw/virtio.h
   index 4051889..a1a17a2 100644
   --- a/hw/virtio.h
   +++ b/hw/virtio.h
   @@ -183,6 +183,12 @@ void virtio_update_irq(VirtIODevice *vdev);
void virtio_bind_device(VirtIODevice *vdev, const VirtIOBindings 
   *binding,
void *opaque);

   +#ifdef CONFIG_IOTHREAD
   + #define TXTIMER_DEFAULT 0
   +#else
   + #define TXTIMER_DEFAULT 1
   +#endif
   +
  
  Add a

Re: [PATCH 2/2] KVM: Fix bogus value returned for MSR_IA32_PERF_STATUS

2010-09-01 Thread Avi Kivity


 On 09/01/2010 11:49 AM, Alexander Graf wrote:

On 01.09.2010, at 09:34, Avi Kivity wrote:


The returned value is completely bogus, and sets reserved bits.
Return zero instead.

Signed-off-by: Avi Kivitya...@redhat.com
---
arch/x86/kvm/x86.c |5 +
1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1cbf168..a2c03f1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1641,10 +1641,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, 
u64 *pdata)
data = vcpu-arch.ia32_misc_enable_msr;
break;
case MSR_IA32_PERF_STATUS:
-   /* TSC increment by tick */
-   data = 1000ULL;
-   /* CPU multiplier */
-   data |= (((uint64_t)4ULL)  40);
+   data = 0;
break;
case MSR_EFER:
data = vcpu-arch.efer;


This is the respective code snippet from xnu:

 /*
  * Get the TSC increment.  The TSC is incremented by this
  * on every bus tick.  Calculate the TSC conversion factors
  * to and from nano-seconds.
  */
 if (cpuid_info()-cpuid_family == CPU_FAMILY_PENTIUM_M) {
 uint64_tprfsts;

 prfsts = rdmsr64(IA32_PERF_STS);
 tscGranularity = (uint32_t)bitfield(prfsts, 44, 40);
 N_by_2_bus_ratio = prfsts  bit(46);

 } else {
 panic(rtclock_init: unknown CPU family: 0x%X\n,
 cpuid_info()-cpuid_family);
 }

 if (N_by_2_bus_ratio)
 tscFCvtt2n = busFCvtt2n * 2 / (uint64_t)tscGranularity;
 else
 tscFCvtt2n = busFCvtt2n / (uint64_t)tscGranularity;

 tscFreq = ((1 * Giga)  32) / tscFCvtt2n;
 tscFCvtn2t = 0xULL / tscFCvtt2n;


So by passing in 0 here, you effectively make that code divide something by 0 
which results in a panic.




Right.  Searching again I find that this is actually documented, (under 
MSR_PERF_STATU

s instead of IA32_PERF_STATUS).  I'll just drop this patch.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: ftrace/perf_event leak

2010-09-01 Thread Peter Zijlstra

On Wed, 2010-09-01 at 11:42 +0300, Avi Kivity wrote:
 I recently added perf_event support to kvm_stat, to display kvm 
 tracepoints as statistics (I'd like to fold this to tools/perf 
 eventually, but that's another story).  However I'm seeing a resource 
 leak - after I quit the tool, there are quite a few references into the 
 kvm module:
 
kvm_intel  43655  0
kvm   272984  269 kvm_intel
 
 The tool is just a python script that reads 
 /sys/kernel/debug/tracing/events/kvm to find out which events are 
 available, uses perf_event_open() to create one group per cpu to which a 
 lot of events are attached.  The only special thing I can think of is 
 that we use an ioctl to attach a filter to many perf_event descriptors.
 
 You can find the source at 
 http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=blob_plain;f=kvm/kvm_stat;hb=5bd5f131b50cb373ff4e2a3632c6dad00a1f0b55.
   
 All it needs are the kvm modules loaded; no need to actually run a 
 guest.  Run as root.
 

Does something like the below cure that?

I seem to remember C doesn't make any promises about the order of logic
statements, hence we need to explicitly pull out that try_module_get()
so that it evaluates after the rest of the conditions.

---
 kernel/trace/trace_event_perf.c |   10 ++
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 000e6e8..35051f2 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -88,10 +88,11 @@ int perf_trace_init(struct perf_event *p_event)
mutex_lock(event_mutex);
list_for_each_entry(tp_event, ftrace_events, list) {
if (tp_event-event.type == event_id 
-   tp_event-class  tp_event-class-reg 
-   try_module_get(tp_event-mod)) {
-   ret = perf_trace_event_init(tp_event, p_event);
-   break;
+   tp_event-class  tp_event-class-reg) {
+   if (try_module_get(tp_event-mod)) {
+   ret = perf_trace_event_init(tp_event, p_event);
+   break;
+   }
}
}
mutex_unlock(event_mutex);
@@ -138,6 +139,7 @@ void perf_trace_destroy(struct perf_event *p_event)
 
free_percpu(tp_event-perf_events);
tp_event-perf_events = NULL;
+   module_put(tp_event-mod);
 
if (!--total_ref_count) {
for (i = 0; i  4; i++) {

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 5/5] KVM: MMU: lower the aduit frequency

2010-09-01 Thread Avi Kivity


 On 08/31/2010 05:27 AM, Xiao Guangrong wrote:



I've always seen progress from the guest while running with audit
enabled (its slow, but its not supposed to be fast anyway).

Did you experience a freeze?


There is a simply test in the guest if it's not rate limit:

# time ls
anaconda-ks.cfg  Documents  install.log Music Public Videos
Desktop  Downloads  install.log.syslog  Pictures  Templates

real1m26.053s
user0m0.311s
sys 0m1.813s

'ls' command cost about 1.5 minute, if we run the memory test program, i think
the time/delay is unacceptable.. :-(


Marcelo, would making the ratelimit optional help?  personally I think 
without ratelimit audit is useless, and with ratelimit it is a lot less 
useful but can still point out problems.  But I haven't used it in a while.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: ftrace/perf_event leak

2010-09-01 Thread Avi Kivity


 On 09/01/2010 12:04 PM, Peter Zijlstra wrote:


Does something like the below cure that?



Unfortunately not.


I seem to remember C doesn't make any promises about the order of logic
statements, hence we need to explicitly pull out that try_module_get()
so that it evaluates after the rest of the conditions.


 and || are executed in order (so you can write

   if (p  p-x)

).

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: ftrace/perf_event leak

2010-09-01 Thread Li Zefan

Avi Kivity wrote:
  On 09/01/2010 12:04 PM, Peter Zijlstra wrote:

 Does something like the below cure that?

 
 Unfortunately not.
 

Then try this:

The bug should be caused by commit 1c024eca51fdc965290acf342ae16a476c2189d0.

---
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 8a2b73f..9d1d1f2 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -161,6 +161,7 @@ void perf_trace_destroy(struct perf_event *p_event)
}
}
 out:
+   module_put(tp_event-mod);
mutex_unlock(event_mutex);
 }
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: ftrace/perf_event leak

2010-09-01 Thread Peter Zijlstra

On Wed, 2010-09-01 at 12:26 +0300, Avi Kivity wrote:
 On 09/01/2010 12:04 PM, Peter Zijlstra wrote:
 
  Does something like the below cure that?
 
 
 Unfortunately not.

Hrmm,.. bugger, don't use modules is my motto.. still I'll have another
look.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: ftrace/perf_event leak

2010-09-01 Thread Peter Zijlstra

On Wed, 2010-09-01 at 17:38 +0800, Li Zefan wrote:

 ---
 diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
 index 8a2b73f..9d1d1f2 100644
 --- a/kernel/trace/trace_event_perf.c
 +++ b/kernel/trace/trace_event_perf.c
 @@ -161,6 +161,7 @@ void perf_trace_destroy(struct perf_event *p_event)
   }
   }
  out:
 + module_put(tp_event-mod);
   mutex_unlock(event_mutex);
  }

Ah, indeed, we so that try_module_get() for each reference to the
tracepoint, I guess we also should do the below, in case the
registration fails.

diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 000e6e8..31cc4cb 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -91,6 +91,8 @@ int perf_trace_init(struct perf_event *p_event)
tp_event-class  tp_event-class-reg 
try_module_get(tp_event-mod)) {
ret = perf_trace_event_init(tp_event, p_event);
+   if (ret)
+   module_put(tp_event-mod);
break;
}
}
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/1] Fix guest kernel crash when hitting AMD errata

2010-09-01 Thread Jes . Sorensen

From: Jes Sorensen jes.soren...@redhat.com

This fix gets around the case where a user specifies a CPU model
triggering an old Opteron errata. The patch returns the clock timing
the errata expects to see and ignores writes to make sure a guest
doesn't get an exception from reading or writing an MSR that is not
supported, since the kernel doesn't use safe MSR access for this
errata workaround.

Jes Sorensen (1):
  Fix guest kernel crash when hitting AMD errata

 arch/x86/kvm/x86.c |   22 ++
 1 files changed, 22 insertions(+), 0 deletions(-)

-- 
1.7.2.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/1] Fix guest kernel crash when hitting AMD errata

2010-09-01 Thread Jes . Sorensen

From: Jes Sorensen jes.soren...@redhat.com

MSR_K7_CLK_CTL is a no longer documented MSR, which is only relevant
on said old AMD CPU models. This change returns the expected value,
which the Linux kernel is expecting to avoid writing back the MSR,
plus it ignores all writes to the MSR.

Signed-off-by: Jes Sorensen jes.soren...@redhat.com
---
 arch/x86/kvm/x86.c |   22 ++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4014d6c..1810a80 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1439,6 +1439,16 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, 
u64 data)
pr_unimpl(vcpu, unimplemented perfctr wrmsr: 
0x%x data 0x%llx\n, msr, data);
break;
+   case MSR_K7_CLK_CTL:
+   /*
+* Ignore all writes to this no longer documented MSR.
+* Writes are only relevant for old K7 processors,
+* all pre-dating SVM, but a recommended workaround from
+* AMD for these chips. It is possible to speicify the
+* affected processor models on the command line, hence
+* the need to ignore the workaround.
+*/
+   break;
case HV_X64_MSR_GUEST_OS_ID ... HV_X64_MSR_SINT15:
if (kvm_hv_msr_partition_wide(msr)) {
int r;
@@ -1664,6 +1674,18 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, 
u64 *pdata)
case MSR_IA32_MCG_STATUS:
case MSR_IA32_MC0_CTL ... MSR_IA32_MC0_CTL + 4 * KVM_MAX_MCE_BANKS - 1:
return get_msr_mce(vcpu, msr, pdata);
+   case MSR_K7_CLK_CTL:
+   /*
+* Provide expected ramp-up count for K7. All other
+* are set to zero, indicating minimum divisors for
+* every field.
+*
+* This prevents guest kernels on AMD host with CPU
+* type 6, model 8 and higher from exploding due to
+* the rdmsr failing.
+*/
+   data = 0x2000;
+   break;
case HV_X64_MSR_GUEST_OS_ID ... HV_X64_MSR_SINT15:
if (kvm_hv_msr_partition_wide(msr)) {
int r;
-- 
1.7.2.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/5] virtio-net: Introduce a new bottom half packet TX

2010-09-01 Thread Michael S. Tsirkin

On Tue, Aug 31, 2010 at 02:33:46PM -0600, Alex Williamson wrote:
 On Tue, 2010-08-31 at 23:14 +0300, Michael S. Tsirkin wrote:
  On Fri, Aug 27, 2010 at 04:37:36PM -0600, Alex Williamson wrote:
   Based on a patch from Mark McLoughlin, this patch introduces a new
   bottom half packet transmitter that avoids the latency imposed by
   the tx_timer approach.  Rather than scheduling a timer when a TX
   packet comes in, schedule a bottom half to be run from the iothread.
   The bottom half handler first attempts to flush the queue with
   notification disabled (this is where we could race with a guest
   without txburst).  If we flush a full burst, reschedule immediately.
   If we send short of a full burst, try to re-enable notification.
   To avoid a race with TXs that may have occurred, we must then
   flush again.  If we find some packets to send, the guest it probably
   active, so we can reschedule again.
   
   tx_timer and tx_bh are mutually exclusive, so we can re-use the
   tx_waiting flag to indicate one or the other needs to be setup.
   This allows us to seamlessly migrate between timer and bh TX
   handling.
   
   Signed-off-by: Alex Williamson alex.william...@redhat.com
   ---
   
hw/virtio-net.c |   81 
   ++-
1 files changed, 68 insertions(+), 13 deletions(-)
   
   diff --git a/hw/virtio-net.c b/hw/virtio-net.c
   index 8b652f2..3288c77 100644
   --- a/hw/virtio-net.c
   +++ b/hw/virtio-net.c
   @@ -36,6 +36,7 @@ typedef struct VirtIONet
VirtQueue *ctrl_vq;
NICState *nic;
QEMUTimer *tx_timer;
   +QEMUBH *tx_bh;
uint32_t tx_timeout;
int32_t tx_burst;
int tx_waiting;
   @@ -704,16 +705,25 @@ static void virtio_net_handle_tx(VirtIODevice 
   *vdev, VirtQueue *vq)
{
VirtIONet *n = to_virtio_net(vdev);

   -if (n-tx_waiting) {
   -virtio_queue_set_notification(vq, 1);
   -qemu_del_timer(n-tx_timer);
   -n-tx_waiting = 0;
   -virtio_net_flush_tx(n, vq);
   +if (n-tx_timer) {
   +if (n-tx_waiting) {
   +virtio_queue_set_notification(vq, 1);
   +qemu_del_timer(n-tx_timer);
   +n-tx_waiting = 0;
   +virtio_net_flush_tx(n, vq);
   +} else {
   +qemu_mod_timer(n-tx_timer,
   +   qemu_get_clock(vm_clock) + n-tx_timeout);
   +n-tx_waiting = 1;
   +virtio_queue_set_notification(vq, 0);
   +}
} else {
   -qemu_mod_timer(n-tx_timer,
   -   qemu_get_clock(vm_clock) + n-tx_timeout);
   +if (unlikely(n-tx_waiting)) {
   +return;
   +}
   +virtio_queue_set_notification(n-tx_vq, 0);
   +qemu_bh_schedule(n-tx_bh);
n-tx_waiting = 1;
   -virtio_queue_set_notification(vq, 0);
}
}

   @@ -731,6 +741,41 @@ static void virtio_net_tx_timer(void *opaque)
virtio_net_flush_tx(n, n-tx_vq);
}

   +static void virtio_net_tx_bh(void *opaque)
   +{
   +VirtIONet *n = opaque;
   +int32_t ret;
   +
   +n-tx_waiting = 0;
   +
   +/* Just in case the driver is not ready on more */
   +if (unlikely(!(n-vdev.status  VIRTIO_CONFIG_S_DRIVER_OK)))
   +return;
   +
   +ret = virtio_net_flush_tx(n, n-tx_vq);
   +if (ret == -EBUSY) {
   +return; /* Notification re-enable handled by tx_complete */
   +}
   +
   +/* If we flush a full burst of packets, assume there are
   + * more coming and immediately reschedule */
   +if (ret = n-tx_burst) {
   +qemu_bh_schedule(n-tx_bh);
   +n-tx_waiting = 1;
   +return;
   +}
   +
   +/* If less than a full burst, re-enable notification and flush
   + * anything that may have come in while we weren't looking.  If
   + * we find something, assume the guest is still active and 
   reschedule */
   +virtio_queue_set_notification(n-tx_vq, 1);
   +if (virtio_net_flush_tx(n, n-tx_vq)  0) {
  
  Shouldn't this be virtio_net_flush_tx(n, n-tx_vq) = n-tx_burst?
  If we get less than tx_burst, the ring is empty now so no need to
  reschedule.
  Right?
 
 I suppose it depends on how aggressive we want to be.  If the guest put
 something on the queue between the first flush and this one, then it
 might be actively transmitting, and if we want to optimize latency, we
 anticipate that it might continue to transmit and re-schedule.  This is
 taken straight from markmc's rhel5 patch.  I wouldn't argue that it's
 wrong to not reschedule here, but it's clearly less aggressive.  Thanks,
 
 Alex

I'm a bit concerned that we are aggressive but not consistently aggressive.
For example if the guest adds a packet before we do disable
notification, we do not reschedule bh, but if it adds a packet
after this, we do. If we get 255 packets, then another 255 packets,
we poll without rescheduling an

Re: ftrace/perf_event leak

2010-09-01 Thread Avi Kivity


 On 09/01/2010 12:38 PM, Li Zefan wrote:


Then try this:


Tested-by: Avi Kivity a...@redhat.com

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/7] pci: expand tabs to spaces in pci_regs.h

2010-09-01 Thread Michael S. Tsirkin

On Wed, Sep 01, 2010 at 01:58:30AM +0300, Eduard - Gabriel Munteanu wrote:
 On Tue, Aug 31, 2010 at 11:29:53PM +0300, Michael S. Tsirkin wrote:
  On Sat, Aug 28, 2010 at 05:54:52PM +0300, Eduard - Gabriel Munteanu wrote:
   The conversion was done using the GNU 'expand' tool (default settings)
   to make it obey the QEMU coding style.
   
   Signed-off-by: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro
  
  I'm not really interested in this: we copied pci_regs.h from linux
  to help non-linux hosts, and keeping the code consistent
  with the original makes detecting bugs and adding new stuff
  from linux/pci_regs.h easier.
  
   ---
hw/pci_regs.h | 1330 
   
1 files changed, 665 insertions(+), 665 deletions(-)
rewrite hw/pci_regs.h (90%)
 
 Ok, I'll drop it. The only reason I did it was one of my additions to
 this file made the patch look indented awkwardly.
 
 I'll use tabs and merge it into Linux as well.
 
 
   Thanks,
   Eduard

Good idea, this way more people with pci knowledge check it.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/5] virtio-net: More configurability and bh handling for tx

2010-09-01 Thread Michael S. Tsirkin

On Tue, Aug 31, 2010 at 04:26:07PM -0600, Alex Williamson wrote:
 On Tue, 2010-08-31 at 16:33 -0500, Anthony Liguori wrote:
  On 08/31/2010 03:27 PM, Michael S. Tsirkin wrote:
   On Fri, Aug 27, 2010 at 04:36:59PM -0600, Alex Williamson wrote:
  
   Add the ability to configure the tx_timer timeout and add a bottom
   half tx handler that typically shows a nice perf boost over the
   time based approach.  See last patch for perf details.  Make this
   the new default when the iothread is enabled.  Thanks,
  
   Alex

   As a further thought, maybe it would help to have a
   separate namespace for unsupported, developer-only parameters.
   Thoughts?
  
  
  We already have an undocumented one, just prefix with 'x-' to indicate 
  that it's experimental.
 
 Any objection then to x-txburst and x-txtimer then?  Easy enough to
 rename.  Thanks,
 
 Alex

No objection. Let's put the 256 constant in a header,
with a comment explaining where it came from?

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: ftrace/perf_event leak

2010-09-01 Thread Peter Zijlstra

On Wed, 2010-09-01 at 13:38 +0300, Avi Kivity wrote:
 On 09/01/2010 12:38 PM, Li Zefan wrote:
 
  Then try this:
 
 Tested-by: Avi Kivity a...@redhat.com
 

Thanks, queued as:

---
Subject: perf, trace: Fix module leak
From: Li Zefan l...@cn.fujitsu.com
Date: Wed Sep 01 12:58:43 CEST 2010

Commit 1c024eca (perf, trace: Optimize tracepoints by using
per-tracepoint-per-cpu hlist to track events) caused a module refcount
leak.

Tested-by: Avi Kivity a...@redhat.com
Signed-off-by: Peter Zijlstra a.p.zijls...@chello.nl
LKML-Reference: 4c7e1f12.8030...@cn.fujitsu.com
---
 kernel/trace/trace_event_perf.c |3 +++
 1 file changed, 3 insertions(+)

Index: linux-2.6/kernel/trace/trace_event_perf.c
===
--- linux-2.6.orig/kernel/trace/trace_event_perf.c
+++ linux-2.6/kernel/trace/trace_event_perf.c
@@ -91,6 +91,8 @@ int perf_trace_init(struct perf_event *p
tp_event-class  tp_event-class-reg 
try_module_get(tp_event-mod)) {
ret = perf_trace_event_init(tp_event, p_event);
+   if (ret)
+   module_put(tp_event-mod);
break;
}
}
@@ -147,6 +149,7 @@ void perf_trace_destroy(struct perf_even
}
}
 out:
+   module_put(tp_event-mod);
mutex_unlock(event_mutex);
 }
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: ftrace/perf_event leak

2010-09-01 Thread Ingo Molnar


* Peter Zijlstra pet...@infradead.org wrote:

 On Wed, 2010-09-01 at 13:38 +0300, Avi Kivity wrote:
  On 09/01/2010 12:38 PM, Li Zefan wrote:
  
   Then try this:
  
  Tested-by: Avi Kivity a...@redhat.com
  
 
 Thanks, queued as:
 
 ---
 Subject: perf, trace: Fix module leak
 From: Li Zefan l...@cn.fujitsu.com
 Date: Wed Sep 01 12:58:43 CEST 2010
 
 Commit 1c024eca (perf, trace: Optimize tracepoints by using
 per-tracepoint-per-cpu hlist to track events) caused a module refcount
 leak.
 
 Tested-by: Avi Kivity a...@redhat.com
 Signed-off-by: Peter Zijlstra a.p.zijls...@chello.nl
 LKML-Reference: 4c7e1f12.8030...@cn.fujitsu.com

Do we need this for -stable too?

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: ftrace/perf_event leak

2010-09-01 Thread Frederic Weisbecker

On Wed, Sep 01, 2010 at 01:02:57PM +0200, Peter Zijlstra wrote:
 On Wed, 2010-09-01 at 13:38 +0300, Avi Kivity wrote:
  On 09/01/2010 12:38 PM, Li Zefan wrote:
  
   Then try this:
  
  Tested-by: Avi Kivity a...@redhat.com
  
 
 Thanks, queued as:
 
 ---
 Subject: perf, trace: Fix module leak
 From: Li Zefan l...@cn.fujitsu.com
 Date: Wed Sep 01 12:58:43 CEST 2010
 
 Commit 1c024eca (perf, trace: Optimize tracepoints by using
 per-tracepoint-per-cpu hlist to track events) caused a module refcount
 leak.
 
 Tested-by: Avi Kivity a...@redhat.com
 Signed-off-by: Peter Zijlstra a.p.zijls...@chello.nl
 LKML-Reference: 4c7e1f12.8030...@cn.fujitsu.com
 ---
  kernel/trace/trace_event_perf.c |3 +++
  1 file changed, 3 insertions(+)
 
 Index: linux-2.6/kernel/trace/trace_event_perf.c
 ===
 --- linux-2.6.orig/kernel/trace/trace_event_perf.c
 +++ linux-2.6/kernel/trace/trace_event_perf.c
 @@ -91,6 +91,8 @@ int perf_trace_init(struct perf_event *p
   tp_event-class  tp_event-class-reg 
   try_module_get(tp_event-mod)) {
   ret = perf_trace_event_init(tp_event, p_event);
 + if (ret)
 + module_put(tp_event-mod);
   break;
   }
   }
 @@ -147,6 +149,7 @@ void perf_trace_destroy(struct perf_even
   }
   }
  out:
 + module_put(tp_event-mod);
   mutex_unlock(event_mutex);
  }
  
 



Thanks for fixing this.

However, can we split this in two patches to ease the backport?

The lack of a module_put() after perf_trace_init() failure is there for a while
(the backport needs to start in 2.6.32).

But the lack of a module_put in the destroy path needs a .35 backport only.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] 10Gb Ethernet Adapters (with TOE ) port

2010-09-01 Thread Jes Sorensen

On 08/18/10 08:12, Onkar Mahajan wrote:
 Is there any benefit gained by  porting 10Gb Ethernet adapters to
 Qemu/KVM ? The question may be naive. Please forgive me if this doesn't
 make any sense.

Creating a new QEMU driver to mimic one of the 10Gbit adapters is
probably a waste of time, you should be able to get the speed by using
virtio.

Alternatively you can use SRIOV to get direct access to the  10Gbit
adapter in the guest if you want.

Jes
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: ftrace/perf_event leak

2010-09-01 Thread Steven Rostedt

On Wed, 2010-09-01 at 14:15 +0200, Frederic Weisbecker wrote:

 Thanks for fixing this.
 
 However, can we split this in two patches to ease the backport?
 
 The lack of a module_put() after perf_trace_init() failure is there for a 
 while
 (the backport needs to start in 2.6.32).
 
 But the lack of a module_put in the destroy path needs a .35 backport only.

I don't think it really needs two patches. Just notify stable (and 
Greg KH in particular) about the backport requirements. Greg can handle
it ;)

-- Steve


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 5/5] KVM: MMU: lower the aduit frequency

2010-09-01 Thread Marcelo Tosatti

On Wed, Sep 01, 2010 at 12:06:37PM +0300, Avi Kivity wrote:
  On 08/31/2010 05:27 AM, Xiao Guangrong wrote:
 
 I've always seen progress from the guest while running with audit
 enabled (its slow, but its not supposed to be fast anyway).
 
 Did you experience a freeze?
 
 There is a simply test in the guest if it's not rate limit:
 
 # time ls
 anaconda-ks.cfg  Documents  install.log Music Public Videos
 Desktop  Downloads  install.log.syslog  Pictures  Templates
 
 real1m26.053s
 user0m0.311s
 sys 0m1.813s
 
 'ls' command cost about 1.5 minute, if we run the memory test program, i 
 think
 the time/delay is unacceptable.. :-(
 
 Marcelo, would making the ratelimit optional help?  personally I
 think without ratelimit audit is useless, and with ratelimit it is a
 lot less useful but can still point out problems.  But I haven't
 used it in a while.

Was just wondering whether there was a freeze. Patchset looks good
(ratelimit can be disabled manually).

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: ftrace/perf_event leak

2010-09-01 Thread Ingo Molnar


* Steven Rostedt rost...@goodmis.org wrote:

 On Wed, 2010-09-01 at 14:15 +0200, Frederic Weisbecker wrote:
 
  Thanks for fixing this.
  
  However, can we split this in two patches to ease the backport?
  
  The lack of a module_put() after perf_trace_init() failure is there for a 
  while
  (the backport needs to start in 2.6.32).
  
  But the lack of a module_put in the destroy path needs a .35 backport only.
 
 I don't think it really needs two patches. Just notify stable (and 
 Greg KH in particular) about the backport requirements. Greg can 
 handle it ;)

Well, Greg certainly has more than enough to handle, so if there's 
different chunks with different -stable vectors then it would be most 
helpful to him to split things up!

Manually trying to split up patches is both error-prone and 
stress-inducing.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support

2010-09-01 Thread Stefan Weil


Please see my comments at the end of this mail.


Am 30.08.2010 00:08, schrieb Eduard - Gabriel Munteanu:

PCI devices should access memory through pci_memory_*() instead of
cpu_physical_memory_*(). This also provides support for translation and
access checking in case an IOMMU is emulated.

Memory maps are treated as remote IOTLBs (that is, translation caches
belonging to the IOMMU-aware device itself). Clients (devices) must
provide callbacks for map invalidation in case these maps are
persistent beyond the current I/O context, e.g. AIO DMA transfers.

Signed-off-by: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro
---
hw/pci.c | 191 +++-
hw/pci.h | 69 +++
hw/pci_internals.h | 12 +++
qemu-common.h | 1 +
4 files changed, 272 insertions(+), 1 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 2dc1577..afcb33c 100644
--- a/hw/pci.c
+++ b/hw/pci.c

...

diff --git a/hw/pci.h b/hw/pci.h
index c551f96..c95863a 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -172,6 +172,8 @@ struct PCIDevice {
char *romfile;
ram_addr_t rom_offset;
uint32_t rom_bar;
+
+ QLIST_HEAD(, PCIMemoryMap) memory_maps;
};

PCIDevice *pci_register_device(PCIBus *bus, const char *name,
@@ -391,4 +393,71 @@ static inline int ranges_overlap(uint64_t first1, 
uint64_t len1,

return !(last2  first1 || last1  first2);
}

+/*
+ * Memory I/O and PCI IOMMU definitions.
+ */
+
+#define IOMMU_PERM_READ (1  0)
+#define IOMMU_PERM_WRITE (1  1)
+#define IOMMU_PERM_RW (IOMMU_PERM_READ | IOMMU_PERM_WRITE)
+
+typedef int PCIInvalidateMapFunc(void *opaque);
+typedef int PCITranslateFunc(PCIDevice *iommu,
+ PCIDevice *dev,
+ pcibus_t addr,
+ target_phys_addr_t *paddr,
+ target_phys_addr_t *len,
+ unsigned perms);
+
+void pci_memory_rw(PCIDevice *dev,
+ pcibus_t addr,
+ uint8_t *buf,
+ pcibus_t len,
+ int is_write);
+void *pci_memory_map(PCIDevice *dev,
+ PCIInvalidateMapFunc *cb,
+ void *opaque,
+ pcibus_t addr,
+ target_phys_addr_t *len,
+ int is_write);
+void pci_memory_unmap(PCIDevice *dev,
+ void *buffer,
+ target_phys_addr_t len,
+ int is_write,
+ target_phys_addr_t access_len);
+void pci_register_iommu(PCIDevice *dev, PCITranslateFunc *translate);
+void pci_memory_invalidate_range(PCIDevice *dev, pcibus_t addr, 
pcibus_t len);

+
+#define DECLARE_PCI_LD(suffix, size) \
+uint##size##_t pci_ld##suffix(PCIDevice *dev, pcibus_t addr);
+
+#define DECLARE_PCI_ST(suffix, size) \
+void pci_st##suffix(PCIDevice *dev, pcibus_t addr, uint##size##_t val);
+
+DECLARE_PCI_LD(ub, 8)
+DECLARE_PCI_LD(uw, 16)
+DECLARE_PCI_LD(l, 32)
+DECLARE_PCI_LD(q, 64)
+
+DECLARE_PCI_ST(b, 8)
+DECLARE_PCI_ST(w, 16)
+DECLARE_PCI_ST(l, 32)
+DECLARE_PCI_ST(q, 64)
+
+static inline void pci_memory_read(PCIDevice *dev,
+ pcibus_t addr,
+ uint8_t *buf,
+ pcibus_t len)
+{
+ pci_memory_rw(dev, addr, buf, len, 0);
+}
+
+static inline void pci_memory_write(PCIDevice *dev,
+ pcibus_t addr,
+ const uint8_t *buf,
+ pcibus_t len)
+{
+ pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
+}
+
#endif


The functions pci_memory_read and pci_memory_write not only read
or write byte data but many different data types which leads to
a lot of type casts in your other patches.

I'd prefer void *buf and const void *buf in the argument lists.
Then all those type casts could be removed.

Regards
Stefan Weil

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: Degrading Network performance as KVM/kernel version increases

2010-09-01 Thread matthew . r . rohrer

I can't say anything about the kernel version making things worse. At 
least for the qemu-kvm version, you should be using -device and -netdev

instead of -net nic -net tap (see 
*http://git.qemu.org/qemu.git/tree/docs/qdev-device-use.txt since it's 
not in the 0.12 tree).*

Thanks for your suggestion Brian I was not doing this correctly. After
changing to -device and -netdev I did get a significant performance
increase although overall performance is still much worse than what I
was getting in the 2.3.31.5 kernel with version 11.  Below is part of my
startup script. Any chance you notice anything else wrong?  

Before your suggestion (what was working well with version 11  2.6.31):
  modprobe kvm
  modprobe kvm_intel
  modprobe tun
  echo -e Setting up bridge device br0 \r
  brctl addbr br0
  ifconfig br0 192.168.100.254 netmask 255.255.255.0 up
  brctl addif br0 eth7
  ifconfig eth7 down
  ifconfig eth7 0.0.0.0
  for ((i=0; i  NUM_OF_DEVICES ; i++)); do
   echo -e Setting up  \r
   tunctl -b -g ${KVMNET_GID} -t kvmnet$i
   brctl addif br0 kvmnet$i
   ifconfig kvmnet$i up 0.0.0.0 promisc
 done
 echo 1  /proc/sys/net/ipv4/ip_forward
 iptables -t nat -A POSTROUTING -s 192.168.100.0/24 -o eth7 -j
MASQUERADE
 for ((i=0; i  NUM_OF_DEVICES ; i++)); do
echo -e Creating Virtual disk $i \r
qemu-img create -f qcow2 vdisk_$i.img $VDISK_SIZE
echo -e Starting Virtual Machine $i \r
/root/bin/qemu-system-x86_64 -cpu host -drive
file=./vdisk_$i.img,if=virtio,boot=on -cdrom ./$2 -boot d \
-net nic,model=virtio,macaddr=52:54:00:12:34:3$i \
-net tap,ifname=kvmnet$i,script=no \
-m 1024 \
-smp 2 \
-usb \
-usbdevice tablet \
-localtime \
-daemonize \
-vga std


After changing to -device and -netdev(working better with the latest
stuff but still much worse than 2.6.31):
modprobe kvm
  modprobe kvm_intel
  modprobe tun
  echo -e Setting up bridge device br0 \r
  brctl addbr br0
  ifconfig br0 192.168.100.254 netmask 255.255.255.0 up
  brctl addif br0 eth7
  ifconfig eth7 down
  ifconfig eth7 0.0.0.0
  for ((i=0; i  NUM_OF_DEVICES ; i++)); do
   echo -e Setting up  \r
   tunctl -b -g ${KVMNET_GID} -t kvmnet$i
   brctl addif br0 kvmnet$i
   ifconfig kvmnet$i up 0.0.0.0 promisc
   done
echo 1  /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -s 192.168.100.0/24 -o eth7 -j
MASQUERADE
for ((i=0; i  NUM_OF_DEVICES ; i++)); do
echo -e Creating Virtual disk $i \r
qemu-img create -f qcow2 vdisk_$i.img $VDISK_SIZE
echo -e Starting Virtual Machine $i \r
/root/bin/qemu-system-x86_64 -cpu host -drive
file=./vdisk_$i.img,if=virtio,boot=on -cdrom ./$2 -boot d \
-netdev type=tap,id=tap.0,script=no,ifname=kvmnet$i \
-device
virtio-net-pci,netdev=tap.0,mac=52:54:00:12:34:3$i \
-m 1024 \
-smp 2 \
-usb \
-usbdevice tablet \
-localtime \
-daemonize \
-vga std



Vhost-net is probably your best bet for maximizing throughput. You
might 
try a separate post just for the vhost error if nobody chimes in about 
it here.

I will do that.

Thanks again for the input! --Matt
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC 1/8] Implement getnsboottime kernel API

2010-09-01 Thread Zachary Amsden


On 08/30/2010 06:06 AM, Glauber Costa wrote:

From: Zachary Amsdenzams...@redhat.com

Add a kernel call to get the number of nanoseconds since boot.  This
is generally useful enough to make it a generic call.

Signed-off-by: Zachary Amsdenzams...@redhat.com
---
  include/linux/time.h  |1 +
  kernel/time/timekeeping.c |   27 +++
  2 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/include/linux/time.h b/include/linux/time.h
index ea3559f..5d04108 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -145,6 +145,7 @@ extern void getnstimeofday(struct timespec *tv);
  extern void getrawmonotonic(struct timespec *ts);
  extern void getboottime(struct timespec *ts);
  extern void monotonic_to_bootbased(struct timespec *ts);
+extern s64 getnsboottime(void);

  extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
  extern int timekeeping_valid_for_hres(void);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index caf8d4d..d250f0a 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -285,6 +285,33 @@ void ktime_get_ts(struct timespec *ts)
  }
  EXPORT_SYMBOL_GPL(ktime_get_ts);

+
+/**
+ * getnsboottime - get the bootbased clock in nsec format
+ *
+ * The function calculates the bootbased clock from the realtime
+ * clock and the wall_to_monotonic offset and stores the result
+ * in normalized timespec format in the variable pointed to by @ts.
+ */
+s64 getnsboottime(void)
+{
+   unsigned int seq;
+   s64 secs, nsecs;
+
+   WARN_ON(timekeeping_suspended);
+
+   do {
+   seq = read_seqbegin(xtime_lock);
+   secs = xtime.tv_sec + wall_to_monotonic.tv_sec;
+   secs += total_sleep_time.tv_sec;
+   nsecs = xtime.tv_nsec + wall_to_monotonic.tv_nsec;
+   nsecs += total_sleep_time.tv_nsec + timekeeping_get_ns();
+
+   } while (read_seqretry(xtime_lock, seq));
+   return nsecs + (secs * NSEC_PER_SEC);
+}
+EXPORT_SYMBOL_GPL(getnsboottime);
+
  /**
   * do_gettimeofday - Returns the time of day in a timeval
   * @tv:   pointer to the timeval to be set
   


FWIW, I sent a patch to drop this yesterday, in favor of this approach:

+static inline u64 get_kernel_ns(void)
+{
+struct timespec ts;
+
+WARN_ON(preemptible());
+ktime_get_ts(ts);
+monotonic_to_bootbased(ts);
+return timespec_to_ns(ts);
+}
+

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: ftrace/perf_event leak

2010-09-01 Thread Li Zefan

 Subject: perf, trace: Fix module leak
 From: Li Zefan l...@cn.fujitsu.com
 Date: Wed Sep 01 12:58:43 CEST 2010

 Commit 1c024eca (perf, trace: Optimize tracepoints by using
 per-tracepoint-per-cpu hlist to track events) caused a module refcount
 leak.

 Tested-by: Avi Kivity a...@redhat.com
 Signed-off-by: Peter Zijlstra a.p.zijls...@chello.nl
 LKML-Reference: 4c7e1f12.8030...@cn.fujitsu.com
 ---
  kernel/trace/trace_event_perf.c |3 +++
  1 file changed, 3 insertions(+)

 Index: linux-2.6/kernel/trace/trace_event_perf.c
 ===
 --- linux-2.6.orig/kernel/trace/trace_event_perf.c
 +++ linux-2.6/kernel/trace/trace_event_perf.c
 @@ -91,6 +91,8 @@ int perf_trace_init(struct perf_event *p
  tp_event-class  tp_event-class-reg 
  try_module_get(tp_event-mod)) {
  ret = perf_trace_event_init(tp_event, p_event);
 +if (ret)
 +module_put(tp_event-mod);
  break;
  }
  }
 @@ -147,6 +149,7 @@ void perf_trace_destroy(struct perf_even
  }
  }
  out:
 +module_put(tp_event-mod);
  mutex_unlock(event_mutex);
  }

 Thanks for fixing this.

 However, can we split this in two patches to ease the backport?

 The lack of a module_put() after perf_trace_init() failure is there for a 
 while
 (the backport needs to start in 2.6.32).

The failure should be a rare case, I don't think this has to be backported?

 But the lack of a module_put in the destroy path needs a .35 backport only.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM test: virtio_console test v2

2010-09-01 Thread Lucas Meneghel Rodrigues

From: Lukáš Doktor ldok...@redhat.com

1) Starts VMs with the specified number of virtio console devices
2) Start smoke test
3) Start loopback test
4) Start performance test

This test uses an auxiliary script, console_switch.py, that is
copied to guests. This script has functions to send and write
data to virtio console ports. Details of each test can be found
on the docstrings for the test_* functions.

Changes from v1:
 * Style fixes
 * Whitespace cleanup
 * Docstring and message fixes
 * ID for char devices can't be a simple number, fix it

Tested with Fedora 13 guest/host, -smp 1.

Signed-off-by: Lukas Doktor ldok...@redhat.com
Signed-off-by: Jiri Zupka jzu...@redhat.com
Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/scripts/console_switch.py |  218 
 client/tests/kvm/tests/virtio_console.py   |  774 
 client/tests/kvm/tests_base.cfg.sample |   11 +
 3 files changed, 1003 insertions(+), 0 deletions(-)
 create mode 100644 client/tests/kvm/scripts/console_switch.py
 create mode 100644 client/tests/kvm/tests/virtio_console.py

diff --git a/client/tests/kvm/scripts/console_switch.py 
b/client/tests/kvm/scripts/console_switch.py
new file mode 100644
index 000..05edbc2
--- /dev/null
+++ b/client/tests/kvm/scripts/console_switch.py
@@ -0,0 +1,218 @@
+#!/usr/bin/python
+# -*- coding: utf-8 -*-
+
+Auxiliary script used to send data between ports on guests.
+
+...@copyright: 2008-2009 Red Hat Inc.
+...@author: Jiri Zupka (jzu...@redhat.com)
+...@author: Lukas Doktor (ldok...@redhat.com)
+
+import threading
+from threading import Thread
+import os,time,select,re,random,sys,array
+
+files = dict()
+ev = threading.Event()
+threads = []
+
+DEBUGPATH=/sys/kernel/debug
+
+
+class Switch(Thread):
+
+Create a thread which sends data between ports.
+
+def __init__(self, exitevent, in_files, out_files, cachesize=1):
+
+@param exitevent: Event to end switch.
+@param in_files: Array of input files.
+@param out_files: Array of output files.
+@param cachesize: Block to receive and send.
+
+Thread.__init__(self)
+
+self.in_files = in_files
+self.out_files = out_files
+
+self.cachesize = cachesize
+self.exitevent = exitevent
+
+
+def run(self):
+while not self.exitevent.isSet():
+#TODO: Why select causes trouble? :-(
+#ret = select.select(self.in_files,[],[],1.0)
+data = 
+#if not ret[0] == []:
+for desc in self.in_files:
+data += os.read(desc, self.cachesize)
+for desc in self.out_files:
+os.write(desc, data)
+
+
+class Sender(Thread):
+
+Creates thread which sends random blocks of data to the destination port.
+
+def __init__(self, port, length):
+
+@param port: Destination port.
+@param length: Length of the random data block.
+
+Thread.__init__(self)
+self.port = port
+self.data = array.array('L')
+for i in range(max(length/self.data.itemsize, 1)):
+self.data.append(random.randrange(sys.maxint))
+
+
+def run(self):
+while True:
+os.write(self.port, self.data)
+del threads[:]
+
+
+def get_port_status():
+
+Get info about ports from kernel debugfs.
+
+@return: ports dictionary of port properties
+
+ports = {}
+
+not_present_msg = FAIL: There's no virtio-ports dir in debugfs
+if not os.path.ismount(DEBUGPATH):
+os.system('mount -t debugfs none %s' % DEBUGPATH)
+try:
+if not os.path.isdir('%s/virtio-ports' % DEBUGPATH):
+print not_present_msg
+except:
+print not_present_msg
+else:
+viop_names = os.listdir('%s/virtio-ports' % DEBUGPATH)
+for name in viop_names:
+f = open(%s/virtio-ports/%s % (DEBUGPATH, name), 'r')
+port = {}
+for line in iter(f):
+m = re.match((\S+): (\S+),line)
+port[m.group(1)] = m.group(2)
+
+if (port['is_console'] == yes):
+port[path] = /dev/hvc%s % port[console_vtermno]
+# Console works like a serialport
+else:
+port[path] = /dev/%s % name
+ports[port['name']] = port
+f.close()
+
+return ports
+
+
+def open_device(in_files, ports):
+
+Open devices and return array of descriptors
+
+@param in_files: files array
+@return: array of descriptor
+
+f = []
+
+for item in in_files:
+name = ports[item[0]][path]
+if (not item[1] == ports[item[0]][is_console]):
+print ports
+print FAIL: Host console is not like console on guest side\n
+
+if (name in files):
+f.append(files[name])
+else:
+try:
+files[name] = os.open(name,

[PATCH v2] KVM-TEST: Use cdrom for Linux guests in unattended_install

2010-09-01 Thread Amos Kong


Bug 615839  - kernel panic when install guest with -fda

There exists a bug of floppy driver, it blocked our installation tests.
So replace floppy with cdrom for Linux guestsin unattended_install.

Changes from v1:
- Format floppy before mount
- Mount install ISO to self.cd2_iso

Signed-off-by: Amos Kong ak...@redhat.com
---
 client/tests/kvm/scripts/unattended.py |  127 ++--
 client/tests/kvm/tests_base.cfg.sample |   88 --
 2 files changed, 119 insertions(+), 96 deletions(-)

diff --git a/client/tests/kvm/scripts/unattended.py 
b/client/tests/kvm/scripts/unattended.py
index a0f96e7..edb0abf 100755
--- a/client/tests/kvm/scripts/unattended.py
+++ b/client/tests/kvm/scripts/unattended.py
@@ -81,8 +81,14 @@ class UnattendedInstall(object):
 
 self.kernel_args = os.environ.get('KVM_TEST_kernel_args', '')
 self.finish_program = os.environ.get('KVM_TEST_finish_program', '')
-cdrom_iso = os.environ.get('KVM_TEST_cdrom_cd1')
 self.unattended_file = os.environ.get('KVM_TEST_unattended_file')
+cdrom_cd1 = os.environ.get('KVM_TEST_cdrom_cd1', '')
+cdrom_cd2 = os.environ.get('KVM_TEST_cdrom_cd2', '')
+self.cd1_iso = os.path.join(kvm_test_dir, cdrom_cd1)
+self.cd2_iso = os.path.join(kvm_test_dir, cdrom_cd2)
+cd1_dir = os.path.dirname(self.cd1_iso)
+if not os.path.isdir(cd1_dir):
+os.makedirs(cd1_dir)
 
 install_virtio = os.environ.get('KVM_TEST_install_virtio', 'no')
 if install_virtio == 'yes':
@@ -120,15 +126,19 @@ class UnattendedInstall(object):
 self.qemu_img_bin = os.environ.get('KVM_TEST_qemu_img_binary')
 if not os.path.isabs(self.qemu_img_bin):
 self.qemu_img_bin = os.path.join(kvm_test_dir, self.qemu_img_bin)
-self.cdrom_iso = os.path.join(kvm_test_dir, cdrom_iso)
-self.floppy_mount = tempfile.mkdtemp(prefix='floppy_', dir='/tmp')
-self.cdrom_mount = tempfile.mkdtemp(prefix='cdrom_', dir='/tmp')
+
+self.cd1_mount = tempfile.mkdtemp(prefix='cd1_', dir='/tmp')
+self.cd2_mount = tempfile.mkdtemp(prefix='cd2_', dir='/tmp')
+self.boot_img_mount = tempfile.mkdtemp(prefix='boot_img_', dir='/tmp')
 self.nfs_mount = tempfile.mkdtemp(prefix='nfs_', dir='/tmp')
-floppy_name = os.environ['KVM_TEST_floppy']
-self.floppy_img = os.path.join(kvm_test_dir, floppy_name)
-floppy_dir = os.path.dirname(self.floppy_img)
-if not os.path.isdir(floppy_dir):
-os.makedirs(floppy_dir)
+
+floppy_name = os.environ.get('KVM_TEST_floppy', '')
+self.floppy_img = ''
+if floppy_name:
+self.floppy_img = os.path.join(kvm_test_dir, floppy_name)
+floppy_dir = os.path.dirname(self.floppy_img)
+if not os.path.isdir(floppy_dir):
+os.makedirs(floppy_dir)
 
 self.pxe_dir = os.environ.get('KVM_TEST_pxe_dir', '')
 self.pxe_image = os.environ.get('KVM_TEST_pxe_image', '')
@@ -162,7 +172,7 @@ class UnattendedInstall(object):
 path_list = glob.glob('*')
 for path in path_list:
 src = os.path.join(self.virtio_floppy_mount, path)
-dst = os.path.join(self.floppy_mount, path)
+dst = os.path.join(self.boot_img_mount, path)
 if os.path.isdir(path):
 shutil.copytree(src, dst)
 elif os.path.isfile(path):
@@ -205,7 +215,7 @@ class UnattendedInstall(object):
 4) Re-write the config file to the disk
 
 self.copy_virtio_drivers_floppy()
-txtsetup_oem = os.path.join(self.floppy_mount, 'txtsetup.oem')
+txtsetup_oem = os.path.join(self.boot_img_mount, 'txtsetup.oem')
 if not os.path.isfile(txtsetup_oem):
 raise SetupError('File txtsetup.oem not found on the install '
  'floppy. Please verify if your floppy virtio '
@@ -223,37 +233,39 @@ class UnattendedInstall(object):
 fp.close()
 
 
-def create_boot_floppy(self):
+def create_boot_img(self):
 
-Prepares a boot floppy by creating a floppy image file, mounting it and
-copying an answer file (kickstarts for RH based distros, answer files
-for windows) to it. After that the image is umounted.
+Prepares a boot floppy/iso image by creating a floppy/iso image file,
+it contains an answer file (kickstarts for RH based distros, answer
+files for windows). After that the image is umounted.
 
-print Creating boot floppy
+try:
+if self.floppy_img:
+print Creating boot floppy
 
-if os.path.exists(self.floppy_img):
-os.remove(self.floppy_img)
+if os.path.exists(self.floppy_img):
+os.remove(self.floppy_img)
+c_cmd = '%s create -f raw %s 1440k' %

Re: [PATCH 4/7] ide: use the PCI memory access interface

2010-09-01 Thread Michael S. Tsirkin

On Sat, Aug 28, 2010 at 05:54:55PM +0300, Eduard - Gabriel Munteanu wrote:
 Emulated PCI IDE controllers now use the memory access interface. This
 also allows an emulated IOMMU to translate and check accesses.
 
 Map invalidation results in cancelling DMA transfers. Since the guest OS
 can't properly recover the DMA results in case the mapping is changed,
 this is a fairly good approximation.
 
 Signed-off-by: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro
 ---
  dma-helpers.c |   46 +-
  dma.h |   21 -
  hw/ide/core.c |   15 ---
  hw/ide/internal.h |   39 +++
  hw/ide/macio.c|4 ++--
  hw/ide/pci.c  |7 +++
  6 files changed, 117 insertions(+), 15 deletions(-)
 
 diff --git a/dma-helpers.c b/dma-helpers.c
 index 712ed89..a0dcdb8 100644
 --- a/dma-helpers.c
 +++ b/dma-helpers.c
 @@ -10,12 +10,36 @@
  #include dma.h
  #include block_int.h
  
 -void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint)
 +static void *qemu_sglist_default_map(void *opaque,
 + QEMUSGInvalMapFunc *inval_cb,
 + void *inval_opaque,
 + target_phys_addr_t addr,
 + target_phys_addr_t *len,
 + int is_write)
 +{
 +return cpu_physical_memory_map(addr, len, is_write);
 +}
 +
 +static void qemu_sglist_default_unmap(void *opaque,
 +  void *buffer,
 +  target_phys_addr_t len,
 +  int is_write,
 +  target_phys_addr_t access_len)
 +{
 +cpu_physical_memory_unmap(buffer, len, is_write, access_len);
 +}
 +
 +void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint,
 +  QEMUSGMapFunc *map, QEMUSGUnmapFunc *unmap, void 
 *opaque)
  {
  qsg-sg = qemu_malloc(alloc_hint * sizeof(ScatterGatherEntry));
  qsg-nsg = 0;
  qsg-nalloc = alloc_hint;
  qsg-size = 0;
 +
 +qsg-map = map ? map : qemu_sglist_default_map;
 +qsg-unmap = unmap ? unmap : qemu_sglist_default_unmap;
 +qsg-opaque = opaque;
  }
  
  void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t base,
 @@ -73,12 +97,23 @@ static void dma_bdrv_unmap(DMAAIOCB *dbs)
  int i;
  
  for (i = 0; i  dbs-iov.niov; ++i) {
 -cpu_physical_memory_unmap(dbs-iov.iov[i].iov_base,
 -  dbs-iov.iov[i].iov_len, !dbs-is_write,
 -  dbs-iov.iov[i].iov_len);
 +dbs-sg-unmap(dbs-sg-opaque,
 +   dbs-iov.iov[i].iov_base,
 +   dbs-iov.iov[i].iov_len, !dbs-is_write,
 +   dbs-iov.iov[i].iov_len);
  }
  }
  
 +static void dma_bdrv_cancel(void *opaque)
 +{
 +DMAAIOCB *dbs = opaque;
 +
 +bdrv_aio_cancel(dbs-acb);
 +dma_bdrv_unmap(dbs);
 +qemu_iovec_destroy(dbs-iov);
 +qemu_aio_release(dbs);
 +}
 +
  static void dma_bdrv_cb(void *opaque, int ret)
  {
  DMAAIOCB *dbs = (DMAAIOCB *)opaque;
 @@ -100,7 +135,8 @@ static void dma_bdrv_cb(void *opaque, int ret)
  while (dbs-sg_cur_index  dbs-sg-nsg) {
  cur_addr = dbs-sg-sg[dbs-sg_cur_index].base + dbs-sg_cur_byte;
  cur_len = dbs-sg-sg[dbs-sg_cur_index].len - dbs-sg_cur_byte;
 -mem = cpu_physical_memory_map(cur_addr, cur_len, !dbs-is_write);
 +mem = dbs-sg-map(dbs-sg-opaque, dma_bdrv_cancel, dbs,
 +   cur_addr, cur_len, !dbs-is_write);
  if (!mem)
  break;
  qemu_iovec_add(dbs-iov, mem, cur_len);
 diff --git a/dma.h b/dma.h
 index f3bb275..d48f35c 100644
 --- a/dma.h
 +++ b/dma.h
 @@ -15,6 +15,19 @@
  #include hw/hw.h
  #include block.h
  
 +typedef void QEMUSGInvalMapFunc(void *opaque);
 +typedef void *QEMUSGMapFunc(void *opaque,
 +QEMUSGInvalMapFunc *inval_cb,
 +void *inval_opaque,
 +target_phys_addr_t addr,
 +target_phys_addr_t *len,
 +int is_write);
 +typedef void QEMUSGUnmapFunc(void *opaque,
 + void *buffer,
 + target_phys_addr_t len,
 + int is_write,
 + target_phys_addr_t access_len);
 +
  typedef struct {
  target_phys_addr_t base;
  target_phys_addr_t len;
 @@ -25,9 +38,15 @@ typedef struct {
  int nsg;
  int nalloc;
  target_phys_addr_t size;
 +
 +QEMUSGMapFunc *map;
 +QEMUSGUnmapFunc *unmap;
 +void *opaque;
  } QEMUSGList;
  
 -void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint);
 +void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint,
 +  QEMUSGMapFunc *map, QEMUSGUnmapFunc *unmap,
 +

Re: [PATCH 2/7] pci: memory access API and IOMMU support

2010-09-01 Thread Michael S. Tsirkin

On Sat, Aug 28, 2010 at 05:54:53PM +0300, Eduard - Gabriel Munteanu wrote:
 PCI devices should access memory through pci_memory_*() instead of
 cpu_physical_memory_*(). This also provides support for translation and
 access checking in case an IOMMU is emulated.
 
 Memory maps are treated as remote IOTLBs (that is, translation caches
 belonging to the IOMMU-aware device itself). Clients (devices) must
 provide callbacks for map invalidation in case these maps are
 persistent beyond the current I/O context, e.g. AIO DMA transfers.
 
 Signed-off-by: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro


I am concerned about adding more pointer chaising on data path.
Could we have
1. an iommu pointer in a device, inherited by secondary buses
   when they are created and by devices from buses when they are attached.
2. translation pointer in the iommu instead of the bus
3. pci_memory_XX functions inline, doing fast path for non-iommu case:
   
if (__builtin_expect(!dev-iommu, 1)
return cpu_memory_rw



 ---
  hw/pci.c   |  185 
 +++-
  hw/pci.h   |   74 +
  hw/pci_internals.h |   12 
  qemu-common.h  |1 +
  4 files changed, 271 insertions(+), 1 deletions(-)

Almost nothing here is PCI specific.
Can this code go into dma.c/dma.h?
We would have struct DMADevice, APIs like device_dma_write etc.
This would help us get rid of the void * stuff as well?

 
 diff --git a/hw/pci.c b/hw/pci.c
 index 2dc1577..b460905 100644
 --- a/hw/pci.c
 +++ b/hw/pci.c
 @@ -158,6 +158,19 @@ static void pci_device_reset(PCIDevice *dev)
  pci_update_mappings(dev);
  }
  
 +static int pci_no_translate(PCIDevice *iommu,
 +PCIDevice *dev,
 +pcibus_t addr,
 +target_phys_addr_t *paddr,
 +target_phys_addr_t *len,
 +unsigned perms)
 +{
 +*paddr = addr;
 +*len = -1;
 +
 +return 0;
 +}
 +
  static void pci_bus_reset(void *opaque)
  {
  PCIBus *bus = opaque;
 @@ -220,7 +233,10 @@ void pci_bus_new_inplace(PCIBus *bus, DeviceState 
 *parent,
  {
  qbus_create_inplace(bus-qbus, pci_bus_info, parent, name);
  assert(PCI_FUNC(devfn_min) == 0);
 -bus-devfn_min = devfn_min;
 +
 +bus-devfn_min  = devfn_min;
 +bus-iommu  = NULL;
 +bus-translate  = pci_no_translate;
  
  /* host bridge */
  QLIST_INIT(bus-child);
 @@ -1789,3 +1805,170 @@ static char *pcibus_get_dev_path(DeviceState *dev)
  return strdup(path);
  }
  
 +void pci_register_iommu(PCIDevice *iommu,
 +PCITranslateFunc *translate)
 +{
 +iommu-bus-iommu = iommu;
 +iommu-bus-translate = translate;
 +}
 +

The above seems broken for secondary buses, right?  Also, can we use
qdev for initialization in some way, instead of adding more APIs?  E.g.
I think it would be nice if we could just use qdev command line flags to
control which bus is behind iommu and which isn't.


 +void pci_memory_rw(PCIDevice *dev,
 +   pcibus_t addr,
 +   uint8_t *buf,
 +   pcibus_t len,
 +   int is_write)
 +{
 +int err;
 +unsigned perms;
 +PCIDevice *iommu = dev-bus-iommu;
 +target_phys_addr_t paddr, plen;
 +
 +perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
 +
 +while (len) {
 +err = dev-bus-translate(iommu, dev, addr, paddr, plen, perms);
 +if (err)
 +return;
 +
 +/* The translation might be valid for larger regions. */
 +if (plen  len)
 +plen = len;
 +
 +cpu_physical_memory_rw(paddr, buf, plen, is_write);
 +
 +len -= plen;
 +addr += plen;
 +buf += plen;
 +}
 +}
 +
 +static void pci_memory_register_map(PCIDevice *dev,
 +pcibus_t addr,
 +pcibus_t len,
 +target_phys_addr_t paddr,
 +PCIInvalidateMapFunc *invalidate,
 +void *invalidate_opaque)
 +{
 +PCIMemoryMap *map;
 +
 +map = qemu_malloc(sizeof(PCIMemoryMap));
 +map-addr   = addr;
 +map-len= len;
 +map-paddr  = paddr;
 +map-invalidate = invalidate;
 +map-invalidate_opaque  = invalidate_opaque;
 +
 +QLIST_INSERT_HEAD(dev-memory_maps, map, list);
 +}
 +
 +static void pci_memory_unregister_map(PCIDevice *dev,
 +  target_phys_addr_t paddr,
 +  target_phys_addr_t len)
 +{
 +PCIMemoryMap *map;
 +
 +QLIST_FOREACH(map, dev-memory_maps, list) {
 +if (map-paddr == paddr  map-len == len) {
 +QLIST_REMOVE(map, list);
 +free(map);
 +}
 +}
 +}
 +
 +void

Re: [PULL 00/35] KVM: PPC: End-August patch queue

2010-09-01 Thread Avi Kivity


 On 08/31/2010 05:31 AM, Alexander Graf wrote:

Howdy,

This is my local patch queue with stuff that has accumulated over the last
weeks on KVM for PPC with some last minute fixes, speedups and debugging help
that I needed for the KVM Forum ;-).

The highlights of this set are:

   - Converted most important debug points to tracepoints
   - Flush less PTEs (speedup)
   - Go back to our own hash (less duplicates)
   - Make SRs guest settable (speedup for 32 bit guests)
   - Remove r30/r31 restrictions from PV hooks (speedup!)
   - Fix random breakages
   - Fix random guest stalls
   - 440GP host support (Thanks Hollis!)
   - Reliable interrupt injection

Keep in mind that this is the first version that is stable on PPC32 hosts.
All versions prior to this could occupy otherwise used segment entries and
thus crash your machine :-).

It is also the first version that is stable with PPC64 guests, because they
require more sophisticated interrupt injection logic for which qemu patches
are also required.

Please pull this tree from:

 git://github.com/agraf/linux-2.6.git kvm-ppc-next

Have fun with more accurate, faster and less buggy KVM on PowerPC!


Pulled, thanks.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

46 matches

Mail list logo