Re: Perf trace event parse errors for KVM events
On Fri, May 28, 2010 at 05:45:57PM -0400, Steven Rostedt wrote:
> On Fri, 2010-05-28 at 17:42 +0100, Stefan Hajnoczi wrote:
>> I get parse errors when using Steven Rostedt's trace-cmd tool, too.
>>
>> Any ideas what is going on here? I can provide more info (e.g. trace files) if necessary.
>
> Does trace-cmd fail on the same tracepoints? Have you checked out the latest code?

$ sudo trace-cmd record -e kvm:kvm_pio
$ trace-cmd report
version = 6
bad op token {
failed to read event print fmt for kvm_mmu_get_page
bad op token {
failed to read event print fmt for kvm_mmu_sync_page
bad op token {
failed to read event print fmt for kvm_mmu_unsync_page
bad op token {
failed to read event print fmt for kvm_mmu_zap_page
Error: expected type 4 but read 7
Error: expected type 5 but read 0
failed to read event print fmt for kvm_apic
function ftrace_print_symbols_seq not defined
failed to read event print fmt for kvm_exit
Error: expected type 4 but read 7
Error: expected type 5 but read 0
failed to read event print fmt for kvm_inj_exception
function ftrace_print_symbols_seq not defined
failed to read event print fmt for kvm_nested_vmexit
function ftrace_print_symbols_seq not defined
failed to read event print fmt for kvm_nested_vmexit_inject
bad op token {
failed to read event print fmt for kvm_emulate_insn

These are different from those reported by perf.

Yes, I use the trace-cmd.git master branch (currently built from b530a23f0442be322b1717e6dbce2bd502634cb4). My kernel is 2.6.34 based.

> I do know it fails on some of the KVM tracepoints since the formatting they use is obnoxious. Could you show the print-fmt of the failing events?
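Several of the failures are `function ftrace_print_symbols_seq not defined`. That helper exists only inside the kernel. In kernels of this era, `__print_symbolic()` in a print fmt expands to a call to `ftrace_print_symbols_seq()`, so a userspace parser has to provide its own equivalent. The block below is a minimal sketch of that lookup. It assumes the kernel semantics are an exact-match search that falls back to the raw value in hex. The table reproduces the `kvm_emulate_insn` flags from the print fmt Stefan posted in this thread. `struct sym_entry` and `print_symbolic()` are hypothetical names, not trace-cmd's API.

```c
#include <stdio.h>

/* Hypothetical userspace mirror of the kernel's { value, name } pairs. */
struct sym_entry {
    unsigned long value;
    const char *name;
};

/* kvm_emulate_insn flags, as listed in the __print_symbolic() call. */
static const struct sym_entry emul_flags[] = {
    { 0,                   "real"   },
    { (1 << 0) | (1 << 1), "vm16"   },
    { (1 << 0),            "prot16" },
    { (1 << 0) | (1 << 2), "prot32" },
    { (1 << 0) | (1 << 3), "prot64" },
    { 0, NULL }                        /* terminator */
};

/*
 * Exact-match lookup in the spirit of __print_symbolic(): return the name
 * of the first entry whose value equals val. If nothing matches, format
 * the raw value in hex into buf and return buf instead.
 */
static const char *print_symbolic(char *buf, size_t len, unsigned long val,
                                  const struct sym_entry *table)
{
    for (const struct sym_entry *e = table; e->name; e++)
        if (e->value == val)
            return e->name;
    snprintf(buf, len, "0x%lx", val);
    return buf;
}
```

The match is an equality test, not a bit test. A value of 5 therefore maps to `prot32` and not to `prot16`, and 6 (no entry) prints as `0x6`.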
Here are the details along with my amateur comments on what might have gone wrong:

$ for event in kvmmmu/kvm_mmu_get_page kvmmmu/kvm_mmu_sync_page kvmmmu/kvm_mmu_unsync_page kvmmmu/kvm_mmu_zap_page kvm/kvm_apic kvm/kvm_exit kvm/kvm_inj_exception kvm/kvm_nested_vmexit kvm/kvm_nested_vmexit_inject kvm/kvm_emulate_insn; do echo -n $event: ; grep 'print fmt:' /sys/kernel/debug/tracing/events/$event/format; done

kvmmmu/kvm_mmu_get_page: print fmt: "%s %s", ({ const char *ret = p->buffer + p->len; static const char *access_str[] = { "---", "--x", "w--", "w-x", "-u-", "-ux", "wu-", "wux" }; union kvm_mmu_page_role role; role.word = REC->role; trace_seq_printf(p, "sp gfn %llx %u%s q%u%s %s%s %snxe root %u %s%c", REC->gfn, role.level, role.cr4_pae ? " pae" : "", role.quadrant, role.direct ? " direct" : "", access_str[role.access], role.invalid ? " invalid" : "", role.nxe ? "" : "!", REC->root_count, REC->unsync ? "unsync" : "sync", 0); ret; }), REC->created ? "new" : "existing"

kvmmmu/kvm_mmu_sync_page: print fmt: "%s", ({ const char *ret = p->buffer + p->len; static const char *access_str[] = { "---", "--x", "w--", "w-x", "-u-", "-ux", "wu-", "wux" }; union kvm_mmu_page_role role; role.word = REC->role; trace_seq_printf(p, "sp gfn %llx %u%s q%u%s %s%s %snxe root %u %s%c", REC->gfn, role.level, role.cr4_pae ? " pae" : "", role.quadrant, role.direct ? " direct" : "", access_str[role.access], role.invalid ? " invalid" : "", role.nxe ? "" : "!", REC->root_count, REC->unsync ? "unsync" : "sync", 0); ret; })

kvmmmu/kvm_mmu_unsync_page: print fmt: "%s", ({ const char *ret = p->buffer + p->len; static const char *access_str[] = { "---", "--x", "w--", "w-x", "-u-", "-ux", "wu-", "wux" }; union kvm_mmu_page_role role; role.word = REC->role; trace_seq_printf(p, "sp gfn %llx %u%s q%u%s %s%s %snxe root %u %s%c", REC->gfn, role.level, role.cr4_pae ? " pae" : "", role.quadrant, role.direct ? " direct" : "", access_str[role.access], role.invalid ? " invalid" : "", role.nxe ? "" : "!", REC->root_count, REC->unsync ? "unsync" : "sync", 0); ret; })

kvmmmu/kvm_mmu_zap_page: print fmt: "%s", ({ const char *ret = p->buffer + p->len; static const char *access_str[] = { "---", "--x", "w--", "w-x", "-u-", "-ux", "wu-", "wux" }; union kvm_mmu_page_role role; role.word = REC->role; trace_seq_printf(p, "sp gfn %llx %u%s q%u%s %s%s %snxe root %u %s%c", REC->gfn, role.level, role.cr4_pae ? " pae" : "", role.quadrant, role.direct ? " direct" : "", access_str[role.access], role.invalid ? " invalid" : "", role.nxe ? "" : "!", REC->root_count, REC->unsync ? "unsync" : "sync", 0); ret; })

kvm/kvm_emulate_insn: print fmt: "%x:%llx:%s (%s)%s", REC->csbase, REC->rip, ({ int i; const char *ret = p->buffer + p->len; for (i = 0; i < REC->len; ++i) trace_seq_printf(p, " %02x", REC->insn[i]); trace_seq_printf(p, "%c", 0); ret; }), __print_symbolic(REC->flags, { 0, "real" }, { (1 << 0) | (1 << 1), "vm16" }, { (1 << 0), "prot16" }, { (1 << 0) | (1 << 2), "prot32" }, { (1 << 0) | (1 << 3), "prot64" }), REC->failed ? " failed" : ""

Macro expanded into C code that shouldn't have?

kvm/kvm_apic: print fmt: "apic_%s %s = 0x%x", REC->rw ? "write" : "read", __print_symbolic(REC->reg, { 0x20, APIC_ ID }, { 0x30, APIC_ LVR }, { 0x80, APIC_ TASKPRI }, { 0x90, APIC_ ARBPRI }, { 0xA0, APIC_ PROCPRI }, { 0xB0, APIC_ EOI }, { 0xC0, APIC_ RRR }, { 0xD0, APIC_ LDR }, { 0xE0, APIC_ DFR }, { 0xF0, APIC_ SPIV }, { 0x100, APIC_ ISR }, { 0x180, APIC_ TMR }, { 0x200, APIC_ IRR }, { 0x280, APIC_ ESR },
Re: raw disks no longer work in latest kvm (kvm-88 was fine)
On 05/24/2010 12:47 AM, Stefan Hajnoczi wrote:
> On Sun, May 23, 2010 at 5:18 PM, Antoine Martin <anto...@nagafix.co.uk> wrote:
>> Why does it work in a chroot for the other options (aio=native, if=ide, etc.) but not for aio!=native?? Looks like I am misunderstanding the semantics of chroot...
>
> It might not be the chroot() semantics but the environment inside that chroot, like the glibc. Have you compared strace inside and outside the chroot?

Reverting to a static build also fixes the issue: aio=threads works. Definitely something fishy going on with glibc library loading. (I've checked that glibc and libaio were up to date in the chroot; nothing blatant in the strace.)

Can someone explain the aio options? All I can find is this:
# qemu-system-x86_64 -h | grep -i aio
[,addr=A][,id=name][,aio=threads|native]

I assume it means aio=threads emulates the kernel's aio with separate threads, and is therefore likely to be slower, right? Is there a reason why aio=native is not the default? Shouldn't aio=threads be the fallback?

Cheers
Antoine

--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: raw disks no longer work in latest kvm (kvm-88 was fine)
On Sat, May 29, 2010 at 10:42 AM, Antoine Martin <anto...@nagafix.co.uk> wrote:
> Can someone explain the aio options? All I can find is this:
> # qemu-system-x86_64 -h | grep -i aio
> [,addr=A][,id=name][,aio=threads|native]
> I assume it means aio=threads emulates the kernel's aio with separate threads, and is therefore likely to be slower, right? Is there a reason why aio=native is not the default? Shouldn't aio=threads be the fallback?

aio=threads uses posix-aio-compat.c, a POSIX AIO-like implementation using a thread pool. Each thread services queued I/O requests using blocking syscalls (e.g. preadv()/pwritev()).

aio=native uses Linux libaio, the native (non-POSIX) AIO interface.

I would expect that aio=native is faster, but benchmarks show that this isn't true for all workloads.

Stefan
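The block below is a minimal pthreads sketch of the idea behind aio=threads. Requests go onto a queue, and each worker thread services one at a time with a blocking preadv(). It is not QEMU's posix-aio-compat.c. The names (`io_pool`, `io_req`, `pool_submit()`, `pool_wait()` and so on) are hypothetical, and the sketch only handles reads. With aio=native, requests would instead go straight to the kernel through libaio's io_submit() and be reaped with io_getevents(), with no helper threads involved.

```c
#define _GNU_SOURCE           /* preadv() */
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

#define POOL_THREADS 4

struct io_req {               /* one queued read (hypothetical type) */
    int fd;
    void *buf;
    size_t len;
    off_t offset;
    ssize_t ret;              /* result of the blocking preadv() */
    int done;
    struct io_req *next;
};

struct io_pool {
    pthread_mutex_t lock;
    pthread_cond_t work;      /* request queued, or stop requested */
    pthread_cond_t done;      /* some request completed */
    struct io_req *head, *tail;
    int stopping, nthreads;
    pthread_t threads[POOL_THREADS];
};

static void *pool_worker(void *arg)
{
    struct io_pool *pool = arg;
    pthread_mutex_lock(&pool->lock);
    for (;;) {
        while (!pool->head && !pool->stopping)
            pthread_cond_wait(&pool->work, &pool->lock);
        if (!pool->head)                    /* stopping and queue drained */
            break;
        struct io_req *req = pool->head;    /* dequeue the oldest request */
        pool->head = req->next;
        if (!pool->head)
            pool->tail = NULL;
        pthread_mutex_unlock(&pool->lock);
        /* The blocking syscall runs with the lock dropped. */
        struct iovec iov = { .iov_base = req->buf, .iov_len = req->len };
        ssize_t ret = preadv(req->fd, &iov, 1, req->offset);
        pthread_mutex_lock(&pool->lock);
        req->ret = ret;
        req->done = 1;
        pthread_cond_broadcast(&pool->done);
    }
    pthread_mutex_unlock(&pool->lock);
    return NULL;
}

static int pool_start(struct io_pool *pool)
{
    memset(pool, 0, sizeof(*pool));
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->work, NULL);
    pthread_cond_init(&pool->done, NULL);
    for (; pool->nthreads < POOL_THREADS; pool->nthreads++)
        if (pthread_create(&pool->threads[pool->nthreads], NULL,
                           pool_worker, pool) != 0)
            return -1;                      /* caller should pool_stop() */
    return 0;
}

static void pool_submit(struct io_pool *pool, struct io_req *req)
{
    req->next = NULL;
    req->done = 0;
    pthread_mutex_lock(&pool->lock);
    if (pool->tail)
        pool->tail->next = req;
    else
        pool->head = req;
    pool->tail = req;
    pthread_cond_signal(&pool->work);
    pthread_mutex_unlock(&pool->lock);
}

static void pool_wait(struct io_pool *pool, struct io_req *req)
{
    pthread_mutex_lock(&pool->lock);
    while (!req->done)
        pthread_cond_wait(&pool->done, &pool->lock);
    pthread_mutex_unlock(&pool->lock);
}

static void pool_stop(struct io_pool *pool)
{
    pthread_mutex_lock(&pool->lock);
    pool->stopping = 1;
    pthread_cond_broadcast(&pool->work);
    pthread_mutex_unlock(&pool->lock);
    for (int i = 0; i < pool->nthreads; i++)
        pthread_join(pool->threads[i], NULL);
}
```

Each worker blocks inside preadv(), so this design has at most POOL_THREADS requests in flight. That is the basic trade-off against handing requests directly to the kernel.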
Re: raw disks no longer work in latest kvm (kvm-88 was fine)
On Sat, May 29, 2010 at 04:42:59PM +0700, Antoine Martin wrote:
> Can someone explain the aio options? All I can find is this:
> # qemu-system-x86_64 -h | grep -i aio
> [,addr=A][,id=name][,aio=threads|native]
> I assume it means aio=threads emulates the kernel's aio with separate threads, and is therefore likely to be slower, right? Is there a reason why aio=native is not the default? Shouldn't aio=threads be the fallback?

The kernel AIO support is unfortunately not a very generic API. It only supports O_DIRECT I/O (cache=none for qemu), and if used on a filesystem it might still block if we need to perform block allocations. We could probably make it the default for block devices, but I'm not a big fan of these kinds of conditional defaults.
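A conditional default like the one Christoph describes would amount to choosing the AIO backend per image. The block below is a minimal sketch of that policy. It assumes the rule is native AIO only for block devices opened with cache=none (O_DIRECT), and the thread pool otherwise. `pick_aio_mode()`, `pick_aio_mode_fd()` and `enum aio_mode` are hypothetical and do not come from QEMU.

```c
#define _GNU_SOURCE
#include <sys/stat.h>

enum aio_mode { AIO_THREADS, AIO_NATIVE };

/*
 * Hypothetical policy: Linux native AIO only helps with O_DIRECT
 * (cache=none), and on a filesystem it may still block on block
 * allocation, so only choose it for block devices with cache=none.
 */
static enum aio_mode pick_aio_mode(mode_t st_mode, int cache_none)
{
    if (S_ISBLK(st_mode) && cache_none)
        return AIO_NATIVE;
    return AIO_THREADS;
}

/* Convenience wrapper for an already-open image file descriptor. */
static enum aio_mode pick_aio_mode_fd(int fd, int cache_none)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return AIO_THREADS;   /* when in doubt, use the generic fallback */
    return pick_aio_mode(st.st_mode, cache_none);
}
```

The sketch also shows the downside Christoph mentions. With this policy, the same -drive options would quietly select different I/O paths depending on what kind of file the image is.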
Re: raw disks no longer work in latest kvm (kvm-88 was fine)
On Sat, May 29, 2010 at 10:55:18AM +0100, Stefan Hajnoczi wrote:
> I would expect that aio=native is faster but benchmarks show that this isn't true for all workloads.

In what benchmark do you see worse results for aio=native compared to aio=threads?
Re: raw disks no longer work in latest kvm (kvm-88 was fine)
On Sat, May 29, 2010 at 11:34 AM, Christoph Hellwig <h...@infradead.org> wrote:
> In what benchmark do you see worse results for aio=native compared to aio=threads?

Sequential reads using 4 concurrent dd if=/dev/vdb iflag=direct of=/dev/null bs=8k processes. 2 vcpu guest with 4 GB RAM, virtio block devices, cache=none. Host storage is a striped LVM volume. Host kernel kvm.git and qemu-kvm.git userspace.

aio=native and aio=threads were each run 3 times.

Result: aio=native has 15% lower throughput than aio=threads.

I haven't looked into this, so I don't know what causes these results.

Stefan
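To reproduce the workload without dd, the block below sketches what a single `dd iflag=direct bs=8k` reader does: open with O_DIRECT and read sequentially into an aligned buffer until EOF. `direct_read_all()` is a hypothetical helper. The 4096-byte buffer alignment is an assumption, since O_DIRECT needs alignment to the device's logical block size. The `use_direct` switch exists only so the sketch also runs on filesystems, such as tmpfs, that reject O_DIRECT.

```c
#define _GNU_SOURCE           /* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Read PATH sequentially in bs-sized chunks, roughly like
 *   dd if=PATH iflag=direct of=/dev/null bs=8k
 * Returns the total number of bytes read, or -1 on error.
 * bs should be a multiple of the device's logical block size.
 */
static long long direct_read_all(const char *path, size_t bs, int use_direct)
{
    void *buf;
    long long total = 0;
    int fd = open(path, O_RDONLY | (use_direct ? O_DIRECT : 0));

    if (fd < 0)
        return -1;
    /* O_DIRECT needs an aligned buffer; 4096 is an assumed alignment. */
    if (posix_memalign(&buf, 4096, bs) != 0) {
        close(fd);
        return -1;
    }
    for (;;) {
        ssize_t n = read(fd, buf, bs);
        if (n < 0) {
            total = -1;
            break;
        }
        if (n == 0)
            break;            /* EOF */
        total += n;
    }
    free(buf);
    close(fd);
    return total;
}
```

Running four of these concurrently against /dev/vdb inside the guest, with use_direct set, approximates the setup described above.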
Re: Perf trace event parse errors for KVM events
On 05/29/2010 12:45 AM, Steven Rostedt wrote:
> On Fri, 2010-05-28 at 17:42 +0100, Stefan Hajnoczi wrote:
>> I get parse errors when using Steven Rostedt's trace-cmd tool, too.
>>
>> Any ideas what is going on here? I can provide more info (e.g. trace files) if necessary.
>
> Does trace-cmd fail on the same tracepoints? Have you checked out the latest code?
>
> I do know it fails on some of the KVM tracepoints since the formatting they use is obnoxious.

Isn't there a binary trace for this?

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Saturday 29 May 2010, Tom Lyon wrote:
> +/*
> + * Structure for DMA mapping of user buffers
> + * vaddr, dmaaddr, and size must all be page aligned
> + * buffer may only be larger than 1 page if (a) there is
> + * an iommu in the system, or (b) buffer is part of a huge page
> + */
> +struct vfio_dma_map {
> +    __u64 vaddr;     /* process virtual addr */
> +    __u64 dmaaddr;   /* desired and/or returned dma address */
> +    __u64 size;      /* size in bytes */
> +    int   rdwr;      /* bool: 0 for r/o; 1 for r/w */
> +};

Please add a 32 bit padding word at the end of this, otherwise the size of the data structure is incompatible between 32 bit x86 applications and 64 bit kernels.

Arnd
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On 05/29/2010 02:55 PM, Arnd Bergmann wrote:
> On Saturday 29 May 2010, Tom Lyon wrote:
>> +/*
>> + * Structure for DMA mapping of user buffers
>> + * vaddr, dmaaddr, and size must all be page aligned
>> + * buffer may only be larger than 1 page if (a) there is
>> + * an iommu in the system, or (b) buffer is part of a huge page
>> + */
>> +struct vfio_dma_map {
>> +    __u64 vaddr;     /* process virtual addr */
>> +    __u64 dmaaddr;   /* desired and/or returned dma address */
>> +    __u64 size;      /* size in bytes */
>> +    int   rdwr;      /* bool: 0 for r/o; 1 for r/w */
>> +};
>
> Please add a 32 bit padding word at the end of this, otherwise the size of the data structure is incompatible between 32 bit x86 applications and 64 bit kernels.

Might as well call it 'flags' and reserve a bit more space (keeping a 64-bit aligned size) for future expansion. rdwr can be folded into it.
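Arnd's point is about alignment. On i386, a 64-bit member inside a struct is only 4-byte aligned, so the original struct is 3 × 8 + 4 = 28 bytes there. On x86-64 the same struct is padded to 32 bytes, so a 32-bit process and a 64-bit kernel disagree about its size. The block below sketches Avi's variant. It uses `<stdint.h>` types in place of the kernel's `__u64`/`__u32` so it compiles in userspace. The field layout, the reserved word and the `VFIO_DMA_MAP_FLAG_WRITE` bit are hypothetical, not a merged interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit replacing the old 'int rdwr'. */
#define VFIO_DMA_MAP_FLAG_WRITE (1u << 0)

/*
 * Only fixed-size types, and the total size is a multiple of 8, so i386
 * and x86-64 agree on the layout without any compat translation.
 */
struct vfio_dma_map {
    uint64_t vaddr;      /* process virtual addr */
    uint64_t dmaaddr;    /* desired and/or returned dma address */
    uint64_t size;       /* size in bytes */
    uint32_t flags;      /* VFIO_DMA_MAP_FLAG_*; replaces rdwr */
    uint32_t reserved;   /* must be zero; room for future expansion */
};

_Static_assert(sizeof(struct vfio_dma_map) == 32, "same size on 32/64-bit");
_Static_assert(offsetof(struct vfio_dma_map, flags) == 24, "stable offset");

static int dma_map_is_writable(const struct vfio_dma_map *m)
{
    return (m->flags & VFIO_DMA_MAP_FLAG_WRITE) != 0;
}
```

The static assertions encode the compatibility requirement, so a later layout change that breaks 32/64-bit agreement fails at compile time.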
Re: Perf trace event parse errors for KVM events
On Sat, 2010-05-29 at 14:50 +0300, Avi Kivity wrote:
> On 05/29/2010 12:45 AM, Steven Rostedt wrote:
>> On Fri, 2010-05-28 at 17:42 +0100, Stefan Hajnoczi wrote:
>>> I get parse errors when using Steven Rostedt's trace-cmd tool, too.
>>>
>>> Any ideas what is going on here? I can provide more info (e.g. trace files) if necessary.
>>
>> Does trace-cmd fail on the same tracepoints? Have you checked out the latest code?
>>
>> I do know it fails on some of the KVM tracepoints since the formatting they use is obnoxious.
>
> Isn't there a binary trace for this?

The pretty printing from the kernel handles this fine. But there's pressure to pass the format to userspace in binary and have the tool parse it. Currently it uses the print fmt to figure out how to parse.

Using one of the examples that Stefan showed:

kvmmmu/kvm_mmu_get_page: print fmt: "%s %s", ({ const char *ret = p->buffer + p->len; static const char *access_str[] = { "---", "--x", "w--", "w-x", "-u-", "-ux", "wu-", "wux" }; union kvm_mmu_page_role role; role.word = REC->role; trace_seq_printf(p, "sp gfn %llx %u%s q%u%s %s%s %snxe root %u %s%c", REC->gfn, role.level, role.cr4_pae ? " pae" : "", role.quadrant, role.direct ? " direct" : "", access_str[role.access], role.invalid ? " invalid" : "", role.nxe ? "" : "!", REC->root_count, REC->unsync ? "unsync" : "sync", 0); ret; }), REC->created ? "new" : "existing"

You need a full C parser/interpreter to understand the above.

-- Steve
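What makes these formats hard is that the `({ ... })` argument is a GCC statement expression that runs code at print time. It remembers where the scratch trace_seq `p` currently ends, appends formatted text, NUL-terminates it via `"%c", 0`, and evaluates to a pointer to that text, which the outer `%s` then prints. The block below is a minimal userspace sketch of the idiom. It assumes, as in the kernel's generated output functions, that `p` is a scratch buffer separate from the output buffer. `struct seq`, `seq_printf()` and `print_mmu_page()` are hypothetical simplified stand-ins for trace_seq and friends. Statement expressions are a GNU extension, so it needs GCC or Clang.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct trace_seq (hypothetical). */
struct seq {
    char buffer[256];
    size_t len;
};

static void seq_init(struct seq *p)
{
    memset(p, 0, sizeof(*p));
}

/* Append formatted text; "%c" with 0 stores an embedded NUL and counts it. */
static void seq_printf(struct seq *p, const char *fmt, ...)
{
    size_t room = sizeof(p->buffer) - p->len;
    va_list ap;
    int n;

    if (room == 0)
        return;
    va_start(ap, fmt);
    n = vsnprintf(p->buffer + p->len, room, fmt, ap);
    va_end(ap);
    if (n > 0)
        p->len += ((size_t)n < room) ? (size_t)n : room - 1;
}

/*
 * Same shape as the kvm_mmu_get_page print fmt, minus the role decoding:
 * the ({ ... }) block runs first, writes its text into the scratch seq p,
 * and evaluates to a pointer to that text for the outer "%s".
 */
static void print_mmu_page(struct seq *s, struct seq *p,
                           unsigned long long gfn, int created)
{
    seq_printf(s, "%s %s",
               ({
                   const char *ret = p->buffer + p->len;
                   seq_printf(p, "sp gfn %llx", gfn);
                   seq_printf(p, "%c", 0);   /* terminate this piece */
                   ret;
               }),
               created ? "new" : "existing");
}
```

A tool that sees only the print fmt string, and not compiled code like this, has to interpret every statement inside the braces itself.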
how to unbind the device from guest to host with vt-d
Hi, everyone,

I'd like to ask how to unbind a device from the guest and give it back to the host with VT-d.

I used /sys/bus/pci/driver/pci-stub/remove_id and /sys/bus/pci/driver/pci-stub/unbind, but I can't find the directory /sys/bus/pci/device/:09:00.0/driver, so I can't bind the driver back to the host system any more. The device is not available in my host system.

I use full virtio. By the way, when my guest OS shut down, it gave a kernel panic saying the e1000_shutdown function had a bug.

--
hepeng
ICT
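One commonly used sequence for handing an assigned device back to a host driver uses the PCI sysfs attributes (under `/sys/bus/pci/drivers/...`, plural). First remove the vendor/device ID from pci-stub, then unbind the device from pci-stub, then bind it to the host driver. The block below is a minimal C sketch of that sequence. The sysfs root is a parameter so the sketch can be pointed at a scratch directory. The device address, the "vendor device" ID string and the host driver name are placeholders for the caller to fill in; the e1000 in the panic message suggests an e1000-family driver, but that is a guess. `sysfs_write()` and `rebind_to_host()` are hypothetical helpers.

```c
#include <stdio.h>
#include <string.h>

/* Write one value to a sysfs attribute, like `echo -n VALUE > PATH`. */
static int sysfs_write(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    size_t len = strlen(val);
    int ok;

    if (!f)
        return -1;
    ok = fwrite(val, 1, len, f) == len;
    if (fclose(f) != 0)   /* sysfs errors may only surface when flushing */
        ok = 0;
    return ok ? 0 : -1;
}

/*
 * Hand a PCI device back from pci-stub to a host driver.
 *   root:     normally "/sys/bus/pci"
 *   bdf:      full device address (domain:bus:slot.function)
 *   ids:      "vendor device" in hex, as previously written to new_id
 *   host_drv: host driver directory name under drivers/
 */
static int rebind_to_host(const char *root, const char *bdf,
                          const char *ids, const char *host_drv)
{
    char path[512];

    /* 1. Stop pci-stub from claiming this ID again. */
    snprintf(path, sizeof(path), "%s/drivers/pci-stub/remove_id", root);
    if (sysfs_write(path, ids) != 0)
        return -1;
    /* 2. Release the device from pci-stub. */
    snprintf(path, sizeof(path), "%s/drivers/pci-stub/unbind", root);
    if (sysfs_write(path, bdf) != 0)
        return -1;
    /* 3. Attach it to the host driver. */
    snprintf(path, sizeof(path), "%s/drivers/%s/bind", root, host_drv);
    return sysfs_write(path, bdf);
}
```

If the device has already been unbound from pci-stub, as described above, only step 3 should still be needed.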
Re: [QEMU-KVM]: Megasas + TCM_Loop + SG_IO into Windows XP guests
On Tue, 2010-05-18 at 04:18 -0700, Nicholas A. Bellinger wrote:
On Tue, 2010-05-18 at 11:43 +0200, Hannes Reinecke wrote:
Nicholas A. Bellinger wrote:
On Fri, 2010-05-14 at 02:42 -0700, Nicholas A. Bellinger wrote:

Greetings Hannes,

So I spent some more time with XP guests this weekend, and I noticed two things immediately when using hw/lsi53c895a.c instead of hw/megasas.c with the same two TCM_Loop SAS LUNs via SG_IO from last week:

1) With lsi53c895a, XP guests are able to boot successfully without the synchronous SG_IO hack that is currently required to get past the first 36-byte INQUIRY for megasas + XP SP2.

2) With lsi53c895a, XP is able to successfully create and mount an NTFS filesystem and reboot, and block reads appear to be functioning properly.

FYI, I have not run any 'write known pattern, then read back and compare blocks' data integrity tests from within the XP guests just yet, but I am confident that the TCM scatterlist -> se_mem_t mapping is working as expected on the KVM host.

Furthermore, after formatting a 5 GB TCM/FILEIO LUN with lsi53c895a and then rebooting with megasas with the same two configured TCM_Loop SG_IO devices, it appears to be able to mount and read blocks successfully. Attempting to write new blocks on the mounted filesystem also appears to work to some degree, but throughput slows to a crawl during the XP guest buffer cache flush, which is likely attributable to my quick SYNC SG_IO hack.

So it appears that there are two separate issues here, and AFAICT they both look to be XP and megasas specific. For #2, it may be something about the format of the incoming scatterlists generated during XP's mkfs.ntfs that is causing some issues.
While watching output during fs creation, I noticed the following WRITE_10s with a starting 4088-byte scatterlist and a trailing 8-byte scatterlist:

megasas: writel mmio 40: 2b0b003
megasas: Found mapped frame 2 context 82b0b000 pa 2b0b000
megasas: Enqueue frame context 82b0b000 tail 493 busy 1
megasas: LD SCSI dev 2 lun 0 sdev 0xdc0230 xfer 16384
scsi-generic: Using cur_addr: 0x0ff6c008 cur_len: 0x0ff8
scsi-generic: Adding iovec for mem: 0x7f1783b96008 len: 0x0ff8
scsi-generic: Using cur_addr: 0x0fd6e000 cur_len: 0x1000
scsi-generic: Adding iovec for mem: 0x7f1783998000 len: 0x1000
scsi-generic: Using cur_addr: 0x0fe2f000 cur_len: 0x1000
scsi-generic: Adding iovec for mem: 0x7f1783a59000 len: 0x1000
scsi-generic: Using cur_addr: 0x0fdf cur_len: 0x1000
scsi-generic: Adding iovec for mem: 0x7f1783a1a000 len: 0x1000
scsi-generic: Using cur_addr: 0x0fded000 cur_len: 0x0008
scsi-generic: Adding iovec for mem: 0x7f1783a17000 len: 0x0008
scsi-generic: execute IOV: iovec_count: 5, dxferp: 0xd92420, dxfer_len: 16384
scsi-generic: --- Issuing SG_IO CDB len 10: 0x2a 00 00 00 fa be 00 00 20 00
scsi-generic: scsi_write_complete() ret = 0
scsi-generic: Command complete 0x0xd922c0 tag=0x82b0b000 status=0
megasas: LD SCSI req 0xd922c0 cmd 0xda92c0 lun 0xdc0230 finished with status 0 len 16384
megasas: Complete frame context 82b0b000 tail 493 busy 0 doorbell 0

Also, the final READ_10 that produces the 'could not create filesystem' exception is for LBA 63, with XP looking for the first FS blocks after GPT. Could there be some breakage in megasas with a length < PAGE_SIZE for the scatterlist..? As lsi53c895a seems to work OK for this case, is there something about the logic of parsing the incoming struct scatterlists that is different between the two HBA drivers..? AFAICT both are using Gerd's common code in hw/scsi-bus.c, unless there is something about megasas_map_sgl() that is causing issues with the above..?
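The numbers in this trace are at least self-consistent. The CDB `2a 00 00 00 fa be 00 00 20 00` is a WRITE(10): LBA in bytes 2-5 (0xfabe) and transfer length in bytes 7-8 (0x20 = 32 blocks). That is 16384 bytes, assuming 512-byte blocks, which matches xfer and dxfer_len. Likewise, a 16384-byte buffer that starts 8 bytes into a 4 KiB page (the first cur_addr ends in 008) splits into exactly the five chunks logged: 4088, 4096, 4096, 4096 and 8 bytes. So sub-page first and last entries are what page-by-page mapping of an unaligned buffer would produce. Whether megasas handles them correctly is a separate question. The block below is a small C sketch of both calculations. `decode_rw10()` and `split_by_page()` are hypothetical helpers, and the 512-byte block and 4 KiB page sizes are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of a SCSI READ(10)/WRITE(10) CDB. */
struct rw10 {
    uint8_t opcode;
    uint32_t lba;      /* bytes 2-5, big-endian */
    uint16_t blocks;   /* bytes 7-8, big-endian */
};

static struct rw10 decode_rw10(const uint8_t cdb[10])
{
    struct rw10 r;

    r.opcode = cdb[0];
    r.lba = (uint32_t)cdb[2] << 24 | (uint32_t)cdb[3] << 16 |
            (uint32_t)cdb[4] << 8 | cdb[5];
    r.blocks = (uint16_t)(cdb[7] << 8 | cdb[8]);
    return r;
}

/*
 * Split len bytes starting at guest address start into per-page chunks,
 * as happens when each guest page is mapped separately. Stores up to max
 * chunk lengths in lens[] and returns the number of chunks.
 */
static int split_by_page(uint64_t start, size_t len, size_t page,
                         size_t *lens, int max)
{
    int n = 0;

    while (len > 0 && n < max) {
        size_t in_page = page - (size_t)(start % page); /* bytes left in page */
        size_t chunk = len < in_page ? len : in_page;

        lens[n++] = chunk;
        start += chunk;
        len -= chunk;
    }
    return n;
}
```

For example, split_by_page(0x0ff6c008, 16384, 4096, lens, 8) yields the same five lengths as the iovec lines above.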
The usual disclaimer here: I'm less than happy with the current SCSI disk handling. Currently we have two options:
- Using 'scsi-disk', which will _emulate_ a SCSI disk internally, but allows asynchronous I/O using normal read/write syscalls
- Using 'scsi-generic', which will allow you to pass through any SCSI device, but disallows asynchronous I/O and requires you to use the SG_IO interface.

Well, this is only true so far for the SYNC SG_IO patch with KVM XP guests. Asynchronous I/O still works as expected for Linux KVM guests, with 10 Gb/sec throughput.

The latter also implies that the host will mark _all_ I/O commands as 'block_pc', so the code path within the kernel is quite different from the one taken by I/Os coming in via the 'scsi-disk' emulation. Guess it's time to have a 'scsi-passthrough' device ...

Currently with QEMU-KVM hw/scsi-generic.c and STGT