Re: Perf trace event parse errors for KVM events

2010-05-29 Thread Stefan Hajnoczi
On Fri, May 28, 2010 at 05:45:57PM -0400, Steven Rostedt wrote:
 On Fri, 2010-05-28 at 17:42 +0100, Stefan Hajnoczi wrote:
  I get parse errors when using Steven Rostedt's trace-cmd tool, too.
  
  Any ideas what is going on here?  I can provide more info (e.g. trace
  files) if necessary.
 
 Does trace-cmd fail on the same tracepoints? Have you checked out the
 latest code?

$ sudo trace-cmd record -e kvm:kvm_pio
$ trace-cmd report
version = 6
  bad op token {
  failed to read event print fmt for kvm_mmu_get_page
  bad op token {
  failed to read event print fmt for kvm_mmu_sync_page
  bad op token {
  failed to read event print fmt for kvm_mmu_unsync_page
  bad op token {
  failed to read event print fmt for kvm_mmu_zap_page
  Error: expected type 4 but read 7
  Error: expected type 5 but read 0
  failed to read event print fmt for kvm_apic
  function ftrace_print_symbols_seq not defined
  failed to read event print fmt for kvm_exit
  Error: expected type 4 but read 7
  Error: expected type 5 but read 0
  failed to read event print fmt for kvm_inj_exception
  function ftrace_print_symbols_seq not defined
  failed to read event print fmt for kvm_nested_vmexit
  function ftrace_print_symbols_seq not defined
  failed to read event print fmt for kvm_nested_vmexit_inject
  bad op token {
  failed to read event print fmt for kvm_emulate_insn

These are different from those reported by perf.

Yes, I use trace-cmd.git master branch (currently built from
b530a23f0442be322b1717e6dbce2bd502634cb4).

My kernel is 2.6.34 based.

 I do know it fails on some of the KVM tracepoints since the formatting
 they use is obnoxious.
 
 Could you show the print-fmt of the failing events?

Here are the details along with my amateur comments on what might have gone 
wrong:

$ for event in kvmmmu/kvm_mmu_get_page kvmmmu/kvm_mmu_sync_page \
    kvmmmu/kvm_mmu_unsync_page kvmmmu/kvm_mmu_zap_page kvm/kvm_apic kvm/kvm_exit \
    kvm/kvm_inj_exception kvm/kvm_nested_vmexit kvm/kvm_nested_vmexit_inject \
    kvm/kvm_emulate_insn; do echo -n $event: ; grep 'print fmt:' \
    /sys/kernel/debug/tracing/events/$event/format; done

kvmmmu/kvm_mmu_get_page: print fmt: "%s %s", ({ const char *ret = p->buffer +
p->len; static const char *access_str[] = { "---", "--x", "w--", "w-x", "-u-",
"-ux", "wu-", "wux" }; union kvm_mmu_page_role role; role.word = REC->role;
trace_seq_printf(p, "sp gfn %llx %u%s q%u%s %s%s" " %snxe root %u %s%c",
REC->gfn, role.level, role.cr4_pae ? " pae" : "", role.quadrant,
role.direct ? " direct" : "", access_str[role.access],
role.invalid ? " invalid" : "", role.nxe ? "" : "!", REC->root_count,
REC->unsync ? "unsync" : "sync", 0); ret; }), REC->created ? "new" : "existing"
kvmmmu/kvm_mmu_sync_page: print fmt: "%s", ({ const char *ret = p->buffer +
p->len; static const char *access_str[] = { "---", "--x", "w--", "w-x", "-u-",
"-ux", "wu-", "wux" }; union kvm_mmu_page_role role; role.word = REC->role;
trace_seq_printf(p, "sp gfn %llx %u%s q%u%s %s%s" " %snxe root %u %s%c",
REC->gfn, role.level, role.cr4_pae ? " pae" : "", role.quadrant,
role.direct ? " direct" : "", access_str[role.access],
role.invalid ? " invalid" : "", role.nxe ? "" : "!", REC->root_count,
REC->unsync ? "unsync" : "sync", 0); ret; })
kvmmmu/kvm_mmu_unsync_page: print fmt: "%s", ({ const char *ret = p->buffer +
p->len; static const char *access_str[] = { "---", "--x", "w--", "w-x", "-u-",
"-ux", "wu-", "wux" }; union kvm_mmu_page_role role; role.word = REC->role;
trace_seq_printf(p, "sp gfn %llx %u%s q%u%s %s%s" " %snxe root %u %s%c",
REC->gfn, role.level, role.cr4_pae ? " pae" : "", role.quadrant,
role.direct ? " direct" : "", access_str[role.access],
role.invalid ? " invalid" : "", role.nxe ? "" : "!", REC->root_count,
REC->unsync ? "unsync" : "sync", 0); ret; })
kvmmmu/kvm_mmu_zap_page: print fmt: "%s", ({ const char *ret = p->buffer +
p->len; static const char *access_str[] = { "---", "--x", "w--", "w-x", "-u-",
"-ux", "wu-", "wux" }; union kvm_mmu_page_role role; role.word = REC->role;
trace_seq_printf(p, "sp gfn %llx %u%s q%u%s %s%s" " %snxe root %u %s%c",
REC->gfn, role.level, role.cr4_pae ? " pae" : "", role.quadrant,
role.direct ? " direct" : "", access_str[role.access],
role.invalid ? " invalid" : "", role.nxe ? "" : "!", REC->root_count,
REC->unsync ? "unsync" : "sync", 0); ret; })
kvm/kvm_emulate_insn: print fmt: "%x:%llx:%s (%s)%s", REC->csbase, REC->rip,
({ int i; const char *ret = p->buffer + p->len; for (i = 0; i < REC->len; ++i)
trace_seq_printf(p, " %02x", REC->insn[i]); trace_seq_printf(p, "%c", 0); ret;
}), __print_symbolic(REC->flags, { 0, "real" }, { (1 << 0) | (1 << 1), "vm16" },
{ (1 << 0), "prot16" }, { (1 << 0) | (1 << 2), "prot32" },
{ (1 << 0) | (1 << 3), "prot64" }), REC->failed ? " failed" : ""

Looks like a macro expanded into C code when it shouldn't have?

kvm/kvm_apic: print fmt: "apic_%s %s = 0x%x", REC->rw ? "write" : "read",
__print_symbolic(REC->reg, { 0x20, "APIC_" "ID" }, { 0x30, "APIC_" "LVR" },
{ 0x80, "APIC_" "TASKPRI" }, { 0x90, "APIC_" "ARBPRI" },
{ 0xA0, "APIC_" "PROCPRI" }, { 0xB0, "APIC_" "EOI" }, { 0xC0, "APIC_" "RRR" },
{ 0xD0, "APIC_" "LDR" }, { 0xE0, "APIC_" "DFR" }, { 0xF0, "APIC_" "SPIV" },
{ 0x100, "APIC_" "ISR" }, { 0x180, "APIC_" "TMR" }, { 0x200, "APIC_" "IRR" },
{ 0x280, "APIC_" "ESR" },

Re: raw disks no longer work in latest kvm (kvm-88 was fine)

2010-05-29 Thread Antoine Martin
On 05/24/2010 12:47 AM, Stefan Hajnoczi wrote:
 On Sun, May 23, 2010 at 5:18 PM, Antoine Martin anto...@nagafix.co.uk wrote:
 Why does it work in a chroot for the other options (aio=native, if=ide, etc)
 but not for aio!=native??
 Looks like I am misunderstanding the semantics of chroot...
 
 It might not be the chroot() semantics but the environment inside that
 chroot, like the glibc.  Have you compared strace inside and outside
 the chroot?
Reverting to a static build also fixes the issue: aio=threads works.
Definitely something fishy going on with glibc library loading.
(I've checked that glibc and libaio were up to date in the chroot - nothing
blatant in the strace.)

Can someone explain the aio options?
All I can find is this:
# qemu-system-x86_64 -h | grep -i aio
   [,addr=A][,id=name][,aio=threads|native]
I assume it means the aio=threads emulates the kernel's aio with
separate threads? And is therefore likely to be slower, right?
Is there a reason why aio=native is not the default? Shouldn't
aio=threads be the fallback?

Cheers
Antoine



 
 Stefan



Re: raw disks no longer work in latest kvm (kvm-88 was fine)

2010-05-29 Thread Stefan Hajnoczi
On Sat, May 29, 2010 at 10:42 AM, Antoine Martin anto...@nagafix.co.uk wrote:
 Can someone explain the aio options?
 All I can find is this:
 # qemu-system-x86_64 -h | grep -i aio
       [,addr=A][,id=name][,aio=threads|native]
 I assume it means the aio=threads emulates the kernel's aio with
 separate threads? And is therefore likely to be slower, right?
 Is there a reason why aio=native is not the default? Shouldn't
 aio=threads be the fallback?

aio=threads uses posix-aio-compat.c, a POSIX AIO-like implementation
using a thread pool.  Each thread services queued I/O requests using
blocking syscalls (e.g. preadv()/pwritev()).

aio=native uses Linux libaio, the native (non-POSIX) AIO interface.

I would expect that aio=native is faster but benchmarks show that this
isn't true for all workloads.
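
To make that concrete, here is a rough standalone sketch of the two
submission styles -- a worker thread blocking in pread() vs. submitting
through the kernel AIO interface with libaio.  This is only an
illustration, not qemu's actual code; the buffer size, alignment, and
error handling are simplified.  Build with
'gcc aio-sketch.c -o aio-sketch -laio -lpthread' and pass a file or block
device that accepts O_DIRECT:

#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <libaio.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BUF_SIZE 4096

/* "aio=threads" style: a worker thread just blocks in an ordinary syscall. */
static void *threaded_read(void *opaque)
{
    int fd = *(int *)opaque;
    void *buf = NULL;

    if (posix_memalign(&buf, 512, BUF_SIZE))
        return NULL;
    printf("threaded read returned %zd\n", pread(fd, buf, BUF_SIZE, 0));
    free(buf);
    return NULL;
}

/* "aio=native" style: queue the request with the kernel and reap it later. */
static void native_read(int fd)
{
    io_context_t ctx = 0;
    struct iocb cb, *cbs[1] = { &cb };
    struct io_event ev;
    void *buf = NULL;

    if (posix_memalign(&buf, 512, BUF_SIZE) || io_setup(1, &ctx) < 0)
        return;
    io_prep_pread(&cb, fd, buf, BUF_SIZE, 0);
    io_submit(ctx, 1, cbs);              /* submit without blocking for data */
    io_getevents(ctx, 1, 1, &ev, NULL);  /* collect the completion */
    printf("native aio read returned %ld\n", (long)ev.res);
    io_destroy(ctx);
    free(buf);
}

int main(int argc, char **argv)
{
    pthread_t tid;
    int fd;

    if (argc < 2 || (fd = open(argv[1], O_RDONLY | O_DIRECT)) < 0) {
        fprintf(stderr, "usage: %s <file or block device>\n", argv[0]);
        return 1;
    }
    pthread_create(&tid, NULL, threaded_read, &fd);
    pthread_join(tid, NULL);
    native_read(fd);
    close(fd);
    return 0;
}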

Stefan


Re: raw disks no longer work in latest kvm (kvm-88 was fine)

2010-05-29 Thread Christoph Hellwig
On Sat, May 29, 2010 at 04:42:59PM +0700, Antoine Martin wrote:
 Can someone explain the aio options?
 All I can find is this:
 # qemu-system-x86_64 -h | grep -i aio
[,addr=A][,id=name][,aio=threads|native]
 I assume it means the aio=threads emulates the kernel's aio with
 separate threads? And is therefore likely to be slower, right?
 Is there a reason why aio=native is not the default? Shouldn't
 aio=threads be the fallback?

The kernel AIO support is unfortunately not a very generic API.
It only supports O_DIRECT I/O (cache=none for qemu), and if used on
a filesystem it might still block if we need to perform block
allocations.  We could probably make it the default for block devices,
but I'm not a big fan of this kind of conditional default.
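
As an illustration only (the device path below is made up), the case where
aio=native is usable today is a raw block device opened without the host
page cache:

  qemu-system-x86_64 -drive file=/dev/mapper/vg0-guest,if=virtio,cache=none,aio=native ...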



Re: raw disks no longer work in latest kvm (kvm-88 was fine)

2010-05-29 Thread Christoph Hellwig
On Sat, May 29, 2010 at 10:55:18AM +0100, Stefan Hajnoczi wrote:
 I would expect that aio=native is faster but benchmarks show that this
 isn't true for all workloads.

In what benchmark do you see worse results for aio=native compared to
aio=threads?



Re: raw disks no longer work in latest kvm (kvm-88 was fine)

2010-05-29 Thread Stefan Hajnoczi
On Sat, May 29, 2010 at 11:34 AM, Christoph Hellwig h...@infradead.org wrote:
 In what benchmark do you see worse results for aio=native compared to
 aio=threads?

Sequential reads using 4 concurrent dd if=/dev/vdb iflag=direct
of=/dev/null bs=8k processes.  2 vcpu guest with 4 GB RAM, virtio
block devices, cache=none.  Host storage is a striped LVM volume.
Host kernel kvm.git and qemu-kvm.git userspace.
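
Roughly, the guest-side workload for each run was the equivalent of:

  for i in 1 2 3 4; do
      dd if=/dev/vdb iflag=direct of=/dev/null bs=8k &
  done
  wait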

aio=native and aio=threads each run 3 times.

Result: aio=native has 15% lower throughput than aio=threads.

I haven't looked into this yet, so I don't know what causes these results.

Stefan


Re: Perf trace event parse errors for KVM events

2010-05-29 Thread Avi Kivity

On 05/29/2010 12:45 AM, Steven Rostedt wrote:

On Fri, 2010-05-28 at 17:42 +0100, Stefan Hajnoczi wrote:
   

I get parse errors when using Steven Rostedt's trace-cmd tool, too.

Any ideas what is going on here?  I can provide more info (e.g. trace
files) if necessary.
 

Does trace-cmd fail on the same tracepoints? Have you checked out the
latest code?

I do know it fails on some of the KVM tracepoints since the formatting
they use is obnoxious.

   


Isn't there a binary trace for this?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers

2010-05-29 Thread Arnd Bergmann
On Saturday 29 May 2010, Tom Lyon wrote:
 +/*
 + * Structure for DMA mapping of user buffers
 + * vaddr, dmaaddr, and size must all be page aligned
 + * buffer may only be larger than 1 page if (a) there is
 + * an iommu in the system, or (b) buffer is part of a huge page
 + */
 +struct vfio_dma_map {
 +   __u64   vaddr;  /* process virtual addr */
 +   __u64   dmaaddr;/* desired and/or returned dma address */
 +   __u64   size;   /* size in bytes */
 +   int rdwr;   /* bool: 0 for r/o; 1 for r/w */
 +};

Please add a 32-bit padding word at the end of this, otherwise the
size of the data structure is incompatible between 32-bit x86 applications
and 64-bit kernels.

Arnd


Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers

2010-05-29 Thread Avi Kivity

On 05/29/2010 02:55 PM, Arnd Bergmann wrote:

On Saturday 29 May 2010, Tom Lyon wrote:
   

+/*
+ * Structure for DMA mapping of user buffers
+ * vaddr, dmaaddr, and size must all be page aligned
+ * buffer may only be larger than 1 page if (a) there is
+ * an iommu in the system, or (b) buffer is part of a huge page
+ */
+struct vfio_dma_map {
+   __u64   vaddr;  /* process virtual addr */
+   __u64   dmaaddr;/* desired and/or returned dma address */
+   __u64   size;   /* size in bytes */
+   int rdwr;   /* bool: 0 for r/o; 1 for r/w */
+};
 

Please add a 32-bit padding word at the end of this, otherwise the
size of the data structure is incompatible between 32-bit x86 applications
and 64-bit kernels.
   


Might as well call it 'flags' and reserve a bit more space (keeping the
size 64-bit aligned) for future expansion.


rdwr can be folded into it.
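
Something along these lines, presumably (just a sketch of the suggested
layout, not an actual patch -- the flag name is made up):

struct vfio_dma_map {
	__u64	vaddr;		/* process virtual addr */
	__u64	dmaaddr;	/* desired and/or returned dma address */
	__u64	size;		/* size in bytes */
	__u64	flags;		/* bit 0 = write access, rest reserved */
};

#define VFIO_DMA_MAP_FLAG_WRITE	(1ULL << 0)	/* replaces the rdwr field */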

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: Perf trace event parse errors for KVM events

2010-05-29 Thread Steven Rostedt
On Sat, 2010-05-29 at 14:50 +0300, Avi Kivity wrote:
 On 05/29/2010 12:45 AM, Steven Rostedt wrote:
  On Fri, 2010-05-28 at 17:42 +0100, Stefan Hajnoczi wrote:
 
  I get parse errors when using Steven Rostedt's trace-cmd tool, too.
 
  Any ideas what is going on here?  I can provide more info (e.g. trace
  files) if necessary.
   
  Does trace-cmd fail on the same tracepoints? Have you checked out the
  latest code?
 
  I do know it fails on some of the KVM tracepoints since the formatting
  they use is obnoxious.
 
 
 
 Isn't there a binary trace for this?
 

The pretty printing from the kernel handles this fine. But there's
pressure to pass the format to userspace in binary and have the tool
parse it. Currently it uses the print fmt to figure out how to parse.

Using one of the examples that Stefan showed:

kvmmmu/kvm_mmu_get_page: print fmt: "%s %s", ({ const char *ret =
p->buffer + p->len; static const char *access_str[] = { "---", "--x",
"w--", "w-x", "-u-", "-ux", "wu-", "wux" }; union kvm_mmu_page_role
role; role.word = REC->role; trace_seq_printf(p, "sp gfn %llx %u%s q%u%s
%s%s" " %snxe root %u %s%c", REC->gfn, role.level,
role.cr4_pae ? " pae" : "", role.quadrant, role.direct ? " direct" : "",
access_str[role.access], role.invalid ? " invalid" : "",
role.nxe ? "" : "!", REC->root_count, REC->unsync ? "unsync" : "sync",
0); ret; }), REC->created ? "new" : "existing"


You need a full C parser/interpreter to understand the above.
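
For comparison, a tracepoint whose TP_printk() sticks to plain format
specifiers (and at most helpers like __print_symbolic()) produces a
print fmt that the tools can parse.  Something along these lines -- a
simplified sketch, not the actual KVM source:

TRACE_EVENT(kvm_pio,
	TP_PROTO(unsigned int rw, unsigned int port, unsigned int size,
		 unsigned int count),
	TP_ARGS(rw, port, size, count),

	TP_STRUCT__entry(
		__field(unsigned int, rw)
		__field(unsigned int, port)
		__field(unsigned int, size)
		__field(unsigned int, count)
	),

	TP_fast_assign(
		__entry->rw    = rw;
		__entry->port  = port;
		__entry->size  = size;
		__entry->count = count;
	),

	TP_printk("pio_%s at 0x%x size %d count %d",
		  __entry->rw ? "write" : "read",
		  __entry->port, __entry->size, __entry->count)
);

The ({ ... trace_seq_printf(p, ...); ret; }) blocks above, on the other
hand, are arbitrary C.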

-- Steve




how to unbind the device from guest to host with vt-d

2010-05-29 Thread 贺鹏
Hi, everyone,
I'm here to ask how to unbind a device from the guest and give it back to the host with VT-d.
I use the /sys/bus/pci/driver/pci-stub/remove_id
and the /sys/bus/pci/driver/pci-stub/unbind

but I can't find the directory /sys/bus/pci/device/:09:00.0/driver.
So I can't bind the driver back to the host system any more,
and the device is not available in my host system.
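
As far as I understand, the usual sequence to hand the device back to the
host is roughly the following (the PCI address and host driver name here
are only examples):

# echo 0000:09:00.0 > /sys/bus/pci/drivers/pci-stub/unbind
# echo 0000:09:00.0 > /sys/bus/pci/drivers/e1000/bind

(or write the address to /sys/bus/pci/drivers_probe to let the PCI core
pick a driver again)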

By the way, I use full virtio; when my guest OS shut down, it gave a
kernel panic saying the e1000_shutdown function had a bug.




--
hepeng
ICT


Re: [QEMU-KVM]: Megasas + TCM_Loop + SG_IO into Windows XP guests

2010-05-29 Thread Nicholas A. Bellinger
On Tue, 2010-05-18 at 04:18 -0700, Nicholas A. Bellinger wrote:
 On Tue, 2010-05-18 at 11:43 +0200, Hannes Reinecke wrote:
  Nicholas A. Bellinger wrote:
   On Fri, 2010-05-14 at 02:42 -0700, Nicholas A. Bellinger wrote:
   Greetings Hannes,
   
   So I spent some more time with XP guests this weekend, and I noticed two
   things immediately when using hw/lsi53c895a.c instead of hw/megasas.c
   with the same two TCM_Loop SAS LUNs via SG_IO from last week:
   
   1) With lsi53c895a, XP guests are able to boot successfully w/ out the
   synchronous SG_IO hack that is currently required to get past the first
   36-byte INQUIRY for megasas + XP SP2
   
   2) With lsi53c895a, XP is able to successfully create and mount a NTFS
   filesystem, reboot, and read blocks appear to be functioning properly.
   FYI I have not run any 'write known pattern then read-back and compare
   blocks' data integrity tests from with in the XP guests just yet, but I
   am confident that TCM scatterlist - se_mem_t mapping is working as
   expected on the KVM Host.
   
    Furthermore, after formatting a 5 GB TCM/FILEIO LUN with lsi53c895a, and
   then rebooting with megasas with the same two configured TCM_Loop SG_IO
   devices, it appears to be able to mount and read blocks successfully.
   Attempting to write new blocks on the mounted filesystem also appears to
   work to some degree, but throughput slows down to a crawl during XP
   guest buffer cache flush, which is likely attributed to the use of my
   quick SYNC SG_IO hack.
   
    So it appears that there are two separate issues here, and AFAICT they
   both look to be XP and megasas specific.  For #2, it may be something
   about the format of the incoming scatterlists generated during XP's
   mkfs.ntfs that is causing some issues.  While watching output during fs
   creation, I noticed the following WRITE_10s with a starting 4088 byte
   scatterlist and a trailing 8 byte scatterlist:
   
   megasas: writel mmio 40: 2b0b003
   megasas: Found mapped frame 2 context 82b0b000 pa 2b0b000
   megasas: Enqueue frame context 82b0b000 tail 493 busy 1
   megasas: LD SCSI dev 2 lun 0 sdev 0xdc0230 xfer 16384
    scsi-generic: Using cur_addr: 0x0ff6c008 cur_len: 0x0ff8
    scsi-generic: Adding iovec for mem: 0x7f1783b96008 len: 0x0ff8
    scsi-generic: Using cur_addr: 0x0fd6e000 cur_len: 0x1000
    scsi-generic: Adding iovec for mem: 0x7f1783998000 len: 0x1000
    scsi-generic: Using cur_addr: 0x0fe2f000 cur_len: 0x1000
    scsi-generic: Adding iovec for mem: 0x7f1783a59000 len: 0x1000
    scsi-generic: Using cur_addr: 0x0fdf cur_len: 0x1000
    scsi-generic: Adding iovec for mem: 0x7f1783a1a000 len: 0x1000
    scsi-generic: Using cur_addr: 0x0fded000 cur_len: 0x0008
    scsi-generic: Adding iovec for mem: 0x7f1783a17000 len: 0x0008
    scsi-generic: execute IOV: iovec_count: 5, dxferp: 0xd92420, dxfer_len: 16384
    scsi-generic: --- Issuing SG_IO CDB len 10: 0x2a 00 00 00 fa be 00 00 20 00
   scsi-generic: scsi_write_complete() ret = 0
   scsi-generic: Command complete 0x0xd922c0 tag=0x82b0b000 status=0
   megasas: LD SCSI req 0xd922c0 cmd 0xda92c0 lun 0xdc0230 finished with 
   status 0 len 16384
   megasas: Complete frame context 82b0b000 tail 493 busy 0 doorbell 0
   
   Also, the final READ_10 that produces the 'could not create filesystem'
    exception is for LBA 63, with XP looking for the first FS blocks after
    the GPT.
   
    Could there be some breakage in megasas with a length < PAGE_SIZE for
    the scatterlist..?  As lsi53c895a seems to work OK for this case, is
   there something about the logic of parsing the incoming struct
   scatterlists that is different between the two HBA drivers..?  AFAICT
   both are using Gerd's common code in hw/scsi-bus.c, unless there is
   something about megasas_map_sgl() that is causing issues with the
   above..?
   
  
  The usual disclaimer here: I'm less than happy with the current SCSI disk 
  handling.
  Currently we have the two options:
  - Using 'scsi-disk', which will _emulate_ a SCSI disk internally, but allow
    to use asynchronous I/O using normal read/write syscalls
  - Using 'scsi-generic', which will allow you to pass-through any SCSI
    device, but disallow asynchronous I/O and requires you to use the SG_IO
    interface.
 
 Well, this is only true so far for the SYNC SG_IO patch with KVM XP
 guests.  The asynchronous I/O still works as expected for Linux KVM
 guests for 10 Gb/sec throughput.
 
  The latter also implies that the host will mark _all_ I/O commands as 
  'block_pc',
  so the code path within the kernel is quite different from those taken by 
  I/Os
  coming in via the 'scsi-disk' emulation.
  Guess it's time to have a 'scsi-passthrough' device ...
 
 Currently with QEMU-KVM hw/scsi-generic.c and STGT