Re: [PATCH] kvm-autotest: add object addressing in sample cfg
Ryan Harper ry...@us.ibm.com wrote:
> The wiki documents[1] object addressing quite well, but we should include it in the example config file as well.
>
> 1. http://www.linux-kvm.org/page/KVM-Autotest/Parameters#Addressing_objects_.28VMs.2C_images.2C_NICs_etc.29
>
> --
> Ryan Harper
> Software Engineer; Linux Technology Center
> IBM Corp., Austin, Tx
> ry...@us.ibm.com
>
> diffstat output:
>  kvm_tests.cfg.sample |    4
>  1 files changed, 4 insertions(+)
>
> Signed-off-by: Ryan Harper ry...@us.ibm.com
> ---
> diff --git a/client/tests/kvm_runtest_2/kvm_tests.cfg.sample b/client/tests/kvm_runtest_2/kvm_tests.cfg.sample
> index 5619fa8..64f8e4b 100644
> --- a/client/tests/kvm_runtest_2/kvm_tests.cfg.sample
> +++ b/client/tests/kvm_runtest_2/kvm_tests.cfg.sample
> @@ -19,6 +19,10 @@ image_size = 10G
>  ssh_port = 22
>  display = vnc
>
> +# specify specific values for vm1 and nic1
> +mem_vm1 = 256
> +nic_model_nic1 = rtl8139
> +
>  # Port redirections
>  redirs = ssh
>  guest_port_ssh = 22

This may not be a good idea, because we'll end up using only rtl8139. Further down in the file we define virtio and e1000 variants. The e1000 one, for example, specifies 'nic_model = e1000'. So you'll get a dict that contains:

    nic_model_vm1 = rtl8139
    nic_model = e1000

and the second statement will have no effect on vm1, because object-specific statements take precedence over general ones, regardless of order (as mentioned in the wiki). Also, we'll end up always using mem = 256 (isn't that too little for some guests?).

Soon we'll try to implement parsing of statements like 'nic_model.* ?= e1000', which will apply to any key that matches the regex 'nic_model.*'. This will make things a little easier.

On the other hand, nic_model represents the default value for all VMs that don't have their own values. It makes sense to work mainly with this parameter, and give specific values only to VMs whose values we don't want to change.
For example, when we implement a load test that brings up numerous VMs in the background, we may choose to always give them their own specific nic_model or mem or anything, as well as their own specific guest OS which excels at producing load, and leave our main_vm with the main OS we're testing (which depends on the current variant). Thanks, Michael -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
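The precedence rule described in this message can be sketched in a few lines of Python. This is only an illustration under assumptions: `get_param` and `apply_regex_assignment` are hypothetical helper names, not functions from kvm-autotest, and the use of a full-key regex match for the proposed '?=' statement is a guess at semantics that had not been implemented when this message was written.

```python
import re


def get_param(params, key, obj_name):
    """Return the value of `key` for object `obj_name` (hypothetical helper).

    An object-specific entry such as 'nic_model_vm1' wins over the
    general 'nic_model' entry, regardless of definition order.
    """
    specific = f"{key}_{obj_name}"
    if specific in params:
        return params[specific]
    return params.get(key)


def apply_regex_assignment(params, pattern, value):
    """Model of a 'pattern ?= value' statement (assumed semantics):
    overwrite every existing key that fully matches the regex."""
    regex = re.compile(pattern)
    for key in params:
        if regex.fullmatch(key):
            params[key] = value


params = {"nic_model_vm1": "rtl8139", "nic_model": "e1000"}
print(get_param(params, "nic_model", "vm1"))  # object-specific value
print(get_param(params, "nic_model", "vm2"))  # falls back to the default

apply_regex_assignment(params, r"nic_model.*", "e1000")
print(get_param(params, "nic_model", "vm1"))  # now overridden as well
```

With a lookup like this, a variant that sets only 'nic_model' cannot change vm1 once 'nic_model_vm1' exists, which is the concern raised above; a regex-style assignment would reach both keys.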
How to Use LibVirt to manage KVM virtual machine
Hi guys,

I am trying to manage my KVM VM with libvirt, but I am running into trouble. If you have experience with this, could you take a look at my issue? Before sending this email I searched libvirt.org and Google, but found nothing useful. If you have a step-by-step document, or know the URL of one, please forward it to me. I would very much appreciate your help.

I am working on RHEL5u3 with the libvirt RPM packages installed:

    # rpm -qa | grep libvirt
    libvirt-python-0.3.3-14.el5
    libvirt-0.3.3-14.el5
    libvirt-cim-0.5.1-4.el5
    libvirt-devel-0.3.3-14.el5
    libvirt-devel-0.3.3-14.el5
    libvirt-0.3.3-14.el5

I built the host kernel with upstream kvm (2.6.29); the kvm modules are also built. I can successfully boot my Linux guest with the following command:

    qemu-system-x86_64 -m 256 -smp 2 -no-acpi -net nic,macaddr=00:16:3e:11:1d:c5,model=rtl8139 -net tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/tmp-img_gbp23_1238745859_1

I can run virsh with the following command:

    virsh --connect qemu:///system

But when I run the 'list' command at the virsh command line, it cannot find my VM, which is very strange:

    virsh # list
     Id Name                 State
    ----------------------------------

I then tried to boot the guest from an XML file with the following content:

    [r...@vt-mv1 libvirt]# cat /share/xvs/var/kvm.conf
    <domain type='kvm'>
      <name>demo2</name>
      <uuid>4dea24b3-1d52-d8f3-2516-782e98a23fa0</uuid>
      <memory>131072</memory>
      <vcpu>1</vcpu>
      <os>
        <type arch='i686'>hvm</type>
      </os>
      <clock sync='localtime'/>
      <devices>
        <emulator>/usr/bin/kvm</emulator>
        <disk type='file' device='disk'>
          <source file='/share/ia32p_rhel5u1.img'/>
          <target dev='hda'/>
        </disk>
        <interface type='network'>
          <source network='default'/>
          <mac address='24:42:53:21:52:45'/>
        </interface>
        <graphics type='vnc' port='-1'/>
      </devices>
    </domain>

I ran the following commands:

    virsh # define /share/xvs/var/kvm.conf
    Domain demo2 defined from /share/xvs/var/kvm.conf

    virsh # start demo2
    error: Failed to start domain demo2

I checked the qemu log /var/log/libvirt/qemu/demo2.log; the file is empty. I know that I am missing some steps, but I cannot find them. The libvirt documentation is very rough; I did not find any docs about KVM on that website. I look forward to your help. Thanks.

Best Regards
Shaohui Zheng
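As a quick sanity check of a domain definition like the one in this message, the XML can be parsed with the Python standard library before it is handed to virsh. The sketch below is an assumption-laden illustration: `summarize_domain` is a hypothetical helper, it does not call libvirt at all, and it only reports what the file says. It relies on the libvirt convention that `<memory>` is given in KiB.

```python
import xml.etree.ElementTree as ET

# Domain definition reconstructed from the message above.
DOMAIN_XML = """
<domain type='kvm'>
  <name>demo2</name>
  <uuid>4dea24b3-1d52-d8f3-2516-782e98a23fa0</uuid>
  <memory>131072</memory>
  <vcpu>1</vcpu>
  <os>
    <type arch='i686'>hvm</type>
  </os>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='file' device='disk'>
      <source file='/share/ia32p_rhel5u1.img'/>
      <target dev='hda'/>
    </disk>
  </devices>
</domain>
"""


def summarize_domain(xml_text):
    """Parse a libvirt domain definition and return its key settings.

    Hypothetical helper for illustration; raises ET.ParseError if the
    XML is not well-formed and ValueError if the root is not <domain>.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "domain":
        raise ValueError("root element is not <domain>")
    memory_kib = int(root.findtext("memory"))  # <memory> is in KiB
    return {
        "type": root.get("type"),
        "name": root.findtext("name"),
        "memory_mib": memory_kib // 1024,
        "vcpus": int(root.findtext("vcpu")),
        "emulator": root.findtext("./devices/emulator"),
        "disks": [s.get("file") for s in root.findall("./devices/disk/source")],
    }


print(summarize_domain(DOMAIN_XML))
```

Run against this definition, the summary shows 128 MiB of memory, an emulator path of /usr/bin/kvm, and a disk image that differs from the one used in the working qemu-system-x86_64 command line, so those are reasonable things to compare first.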
Re: [PATCH 5/4] update ksm userspace interfaces
Gerd Hoffmann wrote:
> Izik Eidus wrote:
>> The main problem that ksm will face when removing the fd interface is: right now when you register memory into ksm, you open an fd, and then ksm does get_task_mm(); we will do mmput() when the file is closed.
>
> Did you test whether it really cleans up in case you kill -9 qemu? I recently did something similar with the result that the extra reference held on mm_struct prevented the process memory from being zapped ...
>
> cheers, Gerd

Did you use mmput() after you called get_task_mm()? get_task_mm() does nothing besides atomic_inc(&mm->mm_users); and mmput() does nothing besides decrementing this counter and checking whether any references remain.

Am I missing anything?
Re: [RFC PATCH 00/17] virtual-bus
Avi Kivity wrote:
> There is no choice. Exiting from the guest to the kernel to userspace is prohibitively expensive, you can't do that on every packet.

I didn't look at virtio-net very closely yet. I wonder why the notification is that big an issue though. It is easy to keep the number of notifications low without increasing latency:

Check shared ring status when stuffing a request. If there are requests not (yet) consumed by the other end there is no need to send a notification. That scheme can even span multiple rings (nics with rx and tx for example).

The host backend can put a limit on the number of requests it takes out of the queue at once, i.e. the block backend can take out some requests, throw them at the block layer, check whether any request in flight is done, if so send back replies, and start over again. The guest can put more requests into the queue meanwhile without having to notify the host. I've seen the number of notifications going down to zero when running disk benchmarks in the guest ;)

Of course that works best with one or more I/O threads, so the vcpu doesn't have to stop running anyway to get the I/O work done ...

cheers, Gerd
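The notification-suppression scheme described in this message can be modelled with a toy ring. This is a sketch under assumptions, not virtio code: `NotifyingRing` is a made-up class, a Python deque stands in for the shared ring, and a counter stands in for the actual guest-to-host notification.

```python
from collections import deque


class NotifyingRing:
    """Toy producer/consumer ring that only notifies when needed."""

    def __init__(self):
        self.pending = deque()   # requests not yet consumed by the other end
        self.notifications = 0   # how many notifications the producer sent

    def push(self, request):
        # Check ring status when stuffing a request: if the consumer
        # still has unconsumed work, it will find this request anyway.
        need_notify = not self.pending
        self.pending.append(request)
        if need_notify:
            self.notifications += 1

    def consume(self, limit):
        # The backend takes at most `limit` requests out per pass.
        batch = []
        while self.pending and len(batch) < limit:
            batch.append(self.pending.popleft())
        return batch


ring = NotifyingRing()
for req in ("a", "b", "c"):
    ring.push(req)               # only the first push notifies
print(ring.notifications)
print(ring.consume(limit=2))     # backend handles a bounded batch
ring.push("d")                   # ring still non-empty: no notification
print(ring.notifications)
```

In this model the saving depends entirely on the producer finding the ring non-empty; if the consumer drains every request before the next push arrives, each push notifies again.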
Re: [RFC PATCH 00/17] virtual-bus
Gerd Hoffmann wrote:
> Avi Kivity wrote:
>> There is no choice. Exiting from the guest to the kernel to userspace is prohibitively expensive, you can't do that on every packet.
>
> I didn't look at virtio-net very closely yet. I wonder why the notification is that a big issue though. It is easy to keep the number of notifications low without increasing latency:
>
> Check shared ring status when stuffing a request. If there are requests not (yet) consumed by the other end there is no need to send a notification. That scheme can even span multiple rings (nics with rx and tx for example).

If the host is able to consume a request immediately, and the guest is not able to batch requests, this breaks down. And that is the current situation.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: Commit 3d28613c225ba94062950dacbb2304b2d2024abc breaks linux boot
Sheng Yang wrote:
>> tip is still broken for me, did a fix go in for this?
>
> Yes. The fix has already been picked up by Avi, please wait a while for the push.

Currently my queue is broken due to some qemu display regression. You can find my queue in the 'pending' branch on kernel.org.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: Qemu process in Guest
Kumar, Venkat wrote:
> Thanks for the reply. I had the wrong understanding that Qemu runs in the Guest. But now I understand that *ioctl(fd, KVM_RUN, 0);* will tell KVM to load the guest and whenever there is an exception in the guest, KVM traps it and executes the host code post ioctl depending on the reason for exit.
>
> Can you point me to the code where KVM traps the exception and loads the host to execute the post ioctl code?

That's what vmx.c and svm.c in the kernel are about, look at vmx_vcpu_run() and svm_vcpu_run().

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
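The control flow discussed in this exchange, where userspace calls KVM_RUN and then acts on the reason the guest exited, can be illustrated with a simulated loop. This is a toy, not real KVM usage: `FakeVcpu` is an invented stand-in that replays scripted exits instead of issuing ioctl(fd, KVM_RUN, 0), and real userspace reads the exit reason from the mmap'ed struct kvm_run rather than from a return value. Only the numeric exit-reason constants are taken from <linux/kvm.h>.

```python
# Exit-reason values as defined in <linux/kvm.h>.
KVM_EXIT_IO = 2
KVM_EXIT_HLT = 5
KVM_EXIT_MMIO = 6


class FakeVcpu:
    """Invented stand-in for a vcpu fd; run() mimics one KVM_RUN call."""

    def __init__(self, scripted_exits):
        self._exits = iter(scripted_exits)

    def run(self):
        # In real KVM the guest executes here until the kernel side
        # (vmx_vcpu_run() / svm_vcpu_run()) hands control back.
        return next(self._exits)


def run_loop(vcpu):
    """Dispatch on each exit reason until the (simulated) guest halts."""
    handled = []
    while True:
        reason = vcpu.run()
        if reason == KVM_EXIT_IO:
            handled.append("io")      # emulate a port I/O access
        elif reason == KVM_EXIT_MMIO:
            handled.append("mmio")    # emulate a memory-mapped access
        elif reason == KVM_EXIT_HLT:
            handled.append("hlt")     # this toy stops at the first halt
            break
        else:
            raise RuntimeError(f"unhandled exit reason {reason}")
    return handled


print(run_loop(FakeVcpu([KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_IO, KVM_EXIT_HLT])))
```

The point of the sketch is only the shape of the loop: the host-side code after the ioctl is ordinary userspace code that runs each time the kernel returns from guest mode.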
Re: [PATCH 5/4] update ksm userspace interfaces
Izik Eidus wrote:
> Gerd Hoffmann wrote:
>> Did you test whether it really cleans up in case you kill -9 qemu? I recently did something similar with the result that the extra reference held on mm_struct prevented the process memory from being zapped ...
>>
>> cheers, Gerd
>
> Did you use mmput() after you called get_task_mm()? get_task_mm() does nothing besides atomic_inc(&mm->mm_users);

The mmput() call was in the ->release() callback; ->release() in turn was never called, because the kernel didn't zap the mappings because of the reference ...

The driver *also* created mappings, which ksmctl doesn't, so it could be you don't run into this issue.

cheers, Gerd
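The reference cycle described in this message can be reproduced with a small counting model. Everything here is a simplification under assumptions: `ToyMm` and `ToyFile` are invented classes, not kernel structures, and the model keeps only the counters mentioned in the thread (mm_users on the address space, and a file reference count that is raised by a mapping).

```python
class ToyMm:
    """Toy address space: torn down only when mm_users reaches zero."""

    def __init__(self):
        self.mm_users = 1        # reference held by the owning process
        self.mapped_files = []   # files kept alive by mappings in this mm
        self.torn_down = False

    def get(self):               # models get_task_mm()
        self.mm_users += 1

    def put(self):               # models mmput()
        self.mm_users -= 1
        if self.mm_users == 0:
            self.torn_down = True
            for f in self.mapped_files:   # zapping mappings drops file refs
                f.drop_ref()

    def mmap(self, f):
        f.refs += 1
        self.mapped_files.append(f)


class ToyFile:
    """Toy driver file: takes an mm reference on open, drops it on release."""

    def __init__(self, mm):
        self.refs = 1            # the open file descriptor
        self.released = False
        self.mm = mm
        mm.get()

    def drop_ref(self):
        self.refs -= 1
        if self.refs == 0:       # last reference gone: ->release() runs
            self.released = True
            self.mm.put()


def simulate_exit(with_mapping):
    mm = ToyMm()
    f = ToyFile(mm)
    if with_mapping:
        mm.mmap(f)
    mm.put()       # process exit drops its own mm reference
    f.drop_ref()   # process exit closes the fd
    return mm.torn_down, f.released


print(simulate_exit(with_mapping=False))  # no mapping: everything is freed
print(simulate_exit(with_mapping=True))   # mapping: neither side lets go
```

Without a mapping, closing the fd triggers the release, which drops the last mm reference. With a mapping, the file waits for the mappings to be zapped while the mm waits for the file to be released, which matches the leak reported above.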
Re: [RFC PATCH 00/17] virtual-bus
> Check shared ring status when stuffing a request. If there are requests

That means you're bouncing cache lines all the time. Probably not a big issue on single socket but could be on larger systems.

-Andi
Re: [RFC PATCH 00/17] virtual-bus
Gerd Hoffmann wrote:
> Avi Kivity wrote:
>> There is no choice. Exiting from the guest to the kernel to userspace is prohibitively expensive, you can't do that on every packet.
>
> I didn't look at virtio-net very closely yet. I wonder why the notification is that a big issue though. It is easy to keep the number of notifications low without increasing latency:
>
> Check shared ring status when stuffing a request. If there are requests not (yet) consumed by the other end there is no need to send a notification. That scheme can even span multiple rings (nics with rx and tx for example).

FWIW: I employ this scheme. The shm-signal construct has a dirty and pending flag (all on the same cacheline, which may or may not address Andi's later point). The first time you dirty the shm, it sets both flags. The consumer side has to clear pending before any subsequent signals are sent. Normally the consumer side will also clear enabled (as part of the bidir napi thing) to further disable signals.

-Greg
Re: [PATCH -tip 0/6 V4] tracing: kprobe-based event tracer
> I'm wondering about something i suggested many moons ago: to look into the KVM decoder+emulator (arch/x86/kvm/x86_emulate.c).

Hi Ingo,

Me and Masami just discussed this a few emails ago in this thread :)

-Andi

--
a...@linux.intel.com -- Speaking for myself only.
Re: [RFC PATCH 00/17] virtual-bus
On Fri, Apr 03, 2009 at 01:18:54PM +0200, Andi Kleen wrote:
>> Check shared ring status when stuffing a request. If there are requests
>
> That means you're bouncing cache lines all the time. Probably not a big issue on single socket but could be on larger systems.

If the backend is running on a core that doesn't share caches with the guest queue then you've got bigger problems.

Right, this is unavoidable for guests with many CPUs, but that should go away once we support multiqueue in virtio-net.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [RFC PATCH 00/17] virtual-bus
Gregory Haskins wrote:
>>> Yes, but the important thing to point out is it doesn't *replace* PCI. It's simply an alternative.
>>
>> Does it offer substantial benefits over PCI? If not, it's just extra code.
>
> First of all, do you think I would spend time designing it if I didn't think so? :)

I'll rephrase. What are the substantial benefits that this offers over PCI?

> Second of all, I want to use vbus for other things that do not speak PCI natively (like userspace for instance...and if I am gleaning this correctly, lguest doesn't either).

And virtio supports lguest and s390. virtio is not PCI specific.

However, for the PC platform, PCI has distinct advantages. What advantages does vbus have for the PC platform?

> PCI sounds good at first, but I believe it's a false economy. It was designed, of course, to be a hardware solution, so it carries all this baggage derived from hardware constraints that simply do not exist in a pure software world and that have to be emulated. Things like the fixed length and centrally managed PCI-IDs,

Not a problem in practice.

> PIO config cycles, BARs, pci-irq-routing, etc.

What are the problems with these?

> While emulation of PCI is invaluable for executing unmodified guests, it's not strictly necessary from a paravirtual software perspective...PV software is inherently already aware of its context and can therefore use the best mechanism appropriate from a broader selection of choices.

It's also not necessary to invent a new bus. We need a positive advantage, we don't do things just because we can (and then lose the real advantages PCI has).

> If we insist that PCI is the only interface we can support and we want to do something, say, in the kernel for instance, we have to have either something like the ICH model in the kernel (and really all of the pci chipset models that qemu supports), or a hacky hybrid userspace/kernel solution. I think this is what you are advocating, but I'm sorry. IMO that's just gross and unnecessary gunk.

If we go for a kernel solution, a hybrid solution is the best IMO. I have no idea what's wrong with it.

The guest would discover and configure the device using normal PCI methods. Qemu emulates the requests, and configures the kernel part using normal Linux syscalls. The nice thing is, kvm and the kernel part don't even know about each other, except for a way for hypercalls to reach the device and a way for interrupts to reach kvm.

> Let's stop beating around the bush and just define the 4-5 hypercall verbs we need and be done with it. :)
>
> FYI: The guest support for this is not really *that* much code IMO.
>
>  drivers/vbus/proxy/Makefile |    2
>  drivers/vbus/proxy/kvm.c    |  726 +

Does it support device hotplug and hotunplug? Can vbus interrupts be load balanced by irqbalance? Can guest userspace enumerate devices? Module autoloading support? pxe booting?

Plus a port to Windows, enterprise Linux distros based on 2.6.dead, and possibly less mainstream OSes.

> and plus, I'll gladly maintain it :)
>
> I mean, it's not like new buses do not get defined from time to time. Should the computing industry stop coming up with new bus types because they are afraid that the windows ABI only speaks PCI? No, they just develop a new driver for whatever the bus is and be done with it. This is really no different.

As a matter of fact, a new bus was developed recently called PCI express. It uses new slots, new electricals, it's not even a bus (routers + point-to-point links), new everything except that the software model was 100% compatible with traditional PCI. That's how much people are afraid of the Windows ABI.

>> Note that virtio is not tied to PCI, so "vbus is generic" doesn't count.
>
> Well, preserving the existing virtio-net on x86 ABI is tied to PCI, which is what I was referring to. Sorry for the confusion.

virtio-net knows nothing about PCI. If you have a problem with PCI, write virtio-blah for a new bus. Though I still don't understand why.

>> I meant, move the development effort, testing, installed base, Windows drivers.
>
> Again, I will maintain this feature, and it's completely off to the side. Turn it off in the config, or do not enable it in qemu and it's like it never existed. Worst case is it gets reverted if you don't like it. Aside from the last few kvm specific patches, the rest is no different than the greater linux environment. E.g. if I update the venet driver upstream, it's conceptually no different than someone else updating e1000, right?

I have no objections to you maintaining vbus, though I'd much prefer if we can pool our efforts and cooperate on having one good set of drivers.

I think you're integrating too tightly with kvm, which is likely to cause problems when kvm evolves. The way I'd do it is:

- drop all mmu integration; instead, have your devices maintain their own slots layout and use
Re: [RFC PATCH 00/17] virtual-bus
Herbert Xu wrote:
> On Fri, Apr 03, 2009 at 02:03:45PM +0300, Avi Kivity wrote:
>> If the host is able to consume a request immediately, and the guest is not able to batch requests, this breaks down. And that is the current situation.
>
> Hang on, why is the host consuming the request immediately? It has to write the packet to tap, which then calls netif_rx_ni so it should actually go all the way, no?

The host writes the packet to tap, at which point it is consumed from its point of view. The host would like to mention that if there was an API to notify it when the packet was actually consumed, then it would gladly use it. Bonus points if this involves not copying the packet.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [RFC PATCH 00/17] virtual-bus
Andi Kleen wrote:
>> Check shared ring status when stuffing a request. If there are requests
>
> That means you're bouncing cache lines all the time. Probably not a big issue on single socket but could be on larger systems.

That's why I'd like requests to be handled on the vcpu thread rather than an auxiliary thread.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [RFC PATCH 00/17] virtual-bus
On Fri, Apr 03, 2009 at 02:46:04PM +0300, Avi Kivity wrote:
> The host writes the packet to tap, at which point it is consumed from its point of view. The host would like to mention that if there was an API to notify it when the packet was actually consumed, then it would gladly use it. Bonus points if this involves not copying the packet.

We're using write(2) for this, no? That should invoke netif_rx_ni which blocks until the packet is processed, which usually means that it's placed on the NIC's hardware queue.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
KVM performance
Hello,

as I want to switch from XEN to KVM, I've run some performance tests to see whether KVM is as performant as XEN. But tests with a VMU that receives a streamed video, adds a small logo to the video and streams it to a client have shown that XEN performs much better than KVM. In XEN the vlc process (VideoLAN client, used to receive, process and send the video) within the VMU has a CPU load of 33.8%, whereas in KVM the vlc process has a CPU load of 99.9%. I'm not sure why. Does anybody know some settings to improve the KVM performance?

Thank you.
Regards, Stefanie.

Used hardware and settings:

In the tests I've used the same host hardware for XEN and KVM:
- Dual Core AMD 2.2 GHz, 8 GB RAM
- Tested OSes for KVM host: Fedora 10, 2.6.27.5-117.fc10.x86_64 with kvm version 10.fc10 version 74; also tested in January: compiled kernel with kvm-83
- KVM guest settings:
    OS: Fedora 9 2.6.25-14.fc9.x86_64 (i386 also tested)
    RAM: 256 MB (same for XEN VMU)
    CPU: 1 core with 2.2 GHz (same for XEN VMU)
    Tested NIC models: rtl8139, e1000, virtio

Tested scenario: the VMU receives a streamed video, adds a logo (watermark) to the video stream and then streams it to a client.

Results:

XEN:
    Host CPU load (virt-manager): 23%
    VMU CPU load (virt-manager): 18%
    VLC process within VMU (top): 33.8%

KVM (no virt-manager CPU load, as I started the VMU with the kvm command):
    Host CPU load: 52%
    qemu-kvm process (top): 77-100%
    VLC process within VMU (top): 80-99.9%

KVM command to start the VMU:

    /usr/bin/qemu-kvm -boot c -hda /images/vmu01.raw -m 256 -net nic,vlan=0,macaddr=aa:bb:cc:dd:ee:10,model=virtio -net tap,ifname=tap0,vlan=0,script=/etc/kvm/qemu-ifup,downscript=/etc/kvm/qemu-ifdown -vnc 127.0.0.1:1 -k de --daemonize

Alcatel-Lucent Deutschland AG
Bell Labs Germany
Service Infrastructure, ZFZ-SI
Stefanie Braun
Phone: +49.711.821-34865
Fax: +49.711.821-32453
Postal address: Alcatel-Lucent Deutschland AG, Lorenzstrasse 10, D-70435 STUTTGART
Mail: stefanie.br...@alcatel-lucent.de

Alcatel-Lucent Deutschland AG
Registered office: Stuttgart - District Court Stuttgart HRB 4026
Chairman of the Supervisory Board: Michael Oppenhoff
Executive Board: Alf Henryk Wulf (Chairman), Dr. Rainer Fechner
Re: [PATCH -tip 0/6 V4] tracing: kprobe-based event tracer
2009/4/3 Ingo Molnar mi...@elte.hu:
> * Avi Kivity a...@redhat.com wrote:
>> Ingo Molnar wrote:
>>>> kvm has three requirements not needed by kprobes:
>>>> - it wants to execute instructions, not just decode them, including generating faults where appropriate
>>>> - it is performance critical
>>>> - it needs to support 16-bit, 32-bit, and 64-bit instructions simultaneously
>>>>
>>>> If an arch/x86/ decoder/emulator gives me these I'll gladly switch to it. x86_emulate.c is high on my list of most disliked code.
>>>
>>> Well, this has to be driven from the KVM side as the kprobes use will only be for decoding so if it's modified from the kprobes side the KVM-only functionality might regress.
>>>
>>> So ... we can do the library decoder for kprobes purposes, and someone versed in the KVM emulator can then combine the two.
>>
>> Problem is, anyone versed in the kvm emulator will want to run as far away from this work as possible.
>
> Are you suggesting that the KVM emulator should never have been merged in the first place? ;-)
>
> Anyway, we'll make sure the kprobes/library decoder is as clean as possible - so it ought to be hackable and extensible without the risk of permanent brain damage.
>
> Mmiotrace and kmemcheck has decoding smarts too, and i think the sw-breakpoint injection code of KGDB could use it as well - so there's broader utility in all this.

(Sorry in advance for jumping in -- my post may be irrelevant)

For the record, kmemcheck requirements for an instruction decoder are these:

For any instruction with memory operands, we need to know which are the operands (so for movl %eax, (%ebx) we need to combine the instruction with a struct pt_regs to get the actual address dereferenced, i.e. the contents of %ebx), and their sizes (for movzbl, the source operand is 8 bits, destination operand is 32 bits). For things like movsb, we need to be able to get both %esi and %edi.

mmiotrace additionally needs to know what the actual values read/written were, for instructions that read/write to memory (again, combined with a struct pt_regs).

Maybe this doesn't really say much, since this is what a generic instruction decoder would be able to do anyway. But kmemcheck and mmiotrace both have very special-purpose decoders. I don't really know what other decoders look like, but what I would wish for is this:

Some macros for iterating the operands, where each operand has a type (e.g. input (for reads), output (for writes), target (for jumps), immediate address, immediate value, etc.), a size (in bits), and a way to evaluate the operand. So eval(op, regs) for op=%eax, it will return regs->eax; for op=4(%eax), it will return regs->eax + 4; for op=4 it will return 4, etc.

Both kmemcheck and mmiotrace could gain SMP support with instruction emulation, though it is strictly not necessary. In that case, though, we would not want to emulate fault handling, etc. (i.e. the fault should always be generated by the CPU itself).

Please do put me on Cc for future discussions, though.

Vegard

--
The animistic metaphor of the bug that maliciously sneaked in while the programmer was not looking is intellectually dishonest as it disguises that the error is the programmer's own creation.
    -- E. W. Dijkstra, EWD1036
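The operand interface wished for in this message (a type, a size in bits, and a way to evaluate each operand against saved registers) might look roughly like the following. This is a sketch under assumptions: the class names `Reg`, `Mem` and `Imm`, the function `eval_operand`, and the use of a plain dict in place of struct pt_regs are all invented for illustration and do not correspond to any existing kernel decoder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reg:
    """Register operand such as %eax."""
    name: str
    size_bits: int = 32


@dataclass(frozen=True)
class Mem:
    """Memory operand such as 4(%eax): displacement plus base register."""
    base: str
    disp: int = 0
    size_bits: int = 32


@dataclass(frozen=True)
class Imm:
    """Immediate operand such as 4."""
    value: int
    size_bits: int = 32


def eval_operand(op, regs):
    """Evaluate an operand against saved registers (dict stands in for pt_regs).

    Registers yield their contents, memory operands yield the effective
    address, and immediates yield their own value.
    """
    if isinstance(op, Reg):
        return regs[op.name]
    if isinstance(op, Mem):
        return regs[op.base] + op.disp
    if isinstance(op, Imm):
        return op.value
    raise TypeError(f"unknown operand type: {op!r}")


regs = {"eax": 0x1000, "ebx": 0x2000}
print(hex(eval_operand(Reg("eax"), regs)))          # contents of %eax
print(hex(eval_operand(Mem("eax", disp=4), regs)))  # address of 4(%eax)
print(eval_operand(Imm(4), regs))                   # the immediate itself
# movzbl-style size information: 8-bit source, 32-bit destination
print(Mem("ebx", size_bits=8).size_bits, Reg("eax").size_bits)
```

A tool like kmemcheck would then only need the effective address and `size_bits` of each memory operand, while a tracer could additionally read the value at that address.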
Re: [RFC PATCH 00/17] virtual-bus
Gregory Haskins wrote:
> Avi Kivity wrote:
>> Gregory Haskins wrote:
>>> So again, I am proposing for consideration of accepting my work (either in its current form, or something we agree on after the normal review process) not only on the basis of the future development of the platform, but also to keep current components in their running to their full potential. I will again point out that the code is almost completely off to the side, can be completely disabled with config options, and I will maintain it. Therefore the only real impact is to people who care to even try it, and to me.
>>
>> Your work is a whole stack. Let's look at the constituents.
>>
>> - a new virtual bus for enumerating devices.
>>
>> Sorry, I still don't see the point. It will just make writing drivers more difficult. The only advantage I've heard from you is that it gets rid of the gunk. Well, we still have to support the gunk for non-pv devices so the gunk is basically free. The clean version is expensive since we need to port it to all guests and implement exciting features like hotplug.
>
> My real objection to PCI is fast-path related. I don't object, per se, to using PCI for discovery and hotplug. If you use PCI just for these types of things, but then allow fastpath to use more hypercall oriented primitives, then I would agree with you. We can leave PCI emulation in user-space, and we get it for free, and things are relatively tidy.

PCI has very little to do with the fast path (nothing, if we use MSI).

> It's once you start requiring that we stay ABI compatible with something like the existing virtio-net in x86 KVM where I think it starts to get ugly when you try to move it into the kernel. So that is what I had a real objection to. I think as long as we are not talking about trying to make something like that work, it's a much more viable prospect.

I don't see why the fast path of virtio-net would be bad. Can you elaborate?

Obviously all the pci glue stays in userspace.

> So what I propose is the following:
>
> 1) The core vbus design stays the same (or close to it)

Sorry, I still don't see what advantage this has over PCI, and how you deal with the disadvantages.

> 2) the vbus-proxy and kvm-guest patch go away
> 3) the kvm-host patch changes to work with coordination from the userspace-pci emulation for things like MSI routing
> 4) qemu will know to create some MSI shim 1:1 with whatever it instantiates on the bus (and can communicate changes

Don't understand. What's this MSI shim?

> 5) any drivers that are written for these new PCI-IDs that might be present are allowed to use a hypercall ABI to talk after they have been probed for that ID (e.g. they are not limited to PIO or MMIO BAR type access methods).

The way we'd do it with virtio is to add a feature bit that says you can hypercall here instead of pio. This way old drivers continue to work.

Note that nothing prevents us from trapping pio in the kernel (in fact, we do) and forwarding it to the device. It shouldn't be any slower than hypercalls.

> Once I get here, I might have greater clarity to see how hard it would be to emulate fast path components as well. It might be easier than I think.
>
> This is all off the cuff so it might need some fine tuning before it's actually workable.
>
> Does that sound reasonable?

The vbus part (I assume you mean device enumeration) worries me. I don't think you've yet set down what its advantages are. Being pure and clean doesn't count, unless you rip out PCI from all existing installed hardware and from Windows.

>> - finer-grained point-to-point communication abstractions
>>
>> Where virtio has ring+signalling together, you layer the two. For networking, it doesn't matter. For other applications, it may be helpful, perhaps you have something in mind.
>
> Yeah, actually. Thanks for bringing that up.
>
> So the reason why signaling and the ring are distinct constructs in the design is to facilitate constructs other than rings. For instance, there may be some models where having a flat shared page is better than a ring. A ring will naturally preserve all values in flight, whereas a flat shared page would not (last update is always current). There are some algorithms where a previously posted value is obsoleted by an update, and therefore rings are inherently bad for this update model. And as we know, there are plenty of algorithms where a ring works perfectly. So I wanted that flexibility to be able to express both.

I agree that there is significant potential here.

> One of the things I have in mind for the flat page model is that RT vcpu priority thing. Another thing I am thinking of is coming up with a PV LAPIC type replacement (where we can avoid doing the EOI trap by having the PICs state shared).

You keep falling into the "paravirtualize the entire universe" trap. If you look deep down, you can see Jeremy struggling in there trying to bring dom0 support to
Re: [PATCH -tip 0/6 V4] tracing: kprobe-based event tracer
Avi Kivity wrote:
> Ingo Molnar wrote:
>> ok, the structure and concept looks quite good now, really nice!
>>
>> I'm wondering about something i suggested many moons ago: to look into the KVM decoder+emulator (arch/x86/kvm/x86_emulate.c).
>>
>> I remember there were some issues with that (one problem being that the KVM decoder is a special-purpose thing covering specific range of execution environments - not a near-full integer-ops decoder like the one we are aiming for here) - are there any other fundamental problems beyond 'it has to be done' ?
>>
>> Conceptually we want just a single piece of decoder logic in arch/x86/. If the KVM folks are cool with it we could factor out the KVM one into arch/x86/lib/. But ... if there are compelling reasons to leave the KVM one alone in its limited environment we can do that too.
>
> kvm has three requirements not needed by kprobes:
> - it wants to execute instructions, not just decode them, including generating faults where appropriate
> - it is performance critical
> - it needs to support 16-bit, 32-bit, and 64-bit instructions simultaneously

Hmm, I'd like to know whether kvm actually aims to emulate all kinds of instructions. If so, I might find some bugs in x86_emulate.c. However, I don't know all the bugs. To find all of them, we have to port x86_emulate.c to user-space, decode binaries with it, and compare its output with another decoder, as Jim had done with insn.c.

https://www.redhat.com/archives/utrace-devel/2009-March/msg00031.html

Thank you,

--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division
e-mail: mhira...@redhat.com
Re: [PATCH -tip 0/6 V4] tracing: kprobe-based event tracer
* Masami Hiramatsu mhira...@redhat.com wrote:
> Hmm, I'd like to know whether kvm actually aims to emulate all kinds of instructions. If so, I might find some bugs in x86_emulate.c. However, I don't know all the bugs. To find all of them, we have to port x86_emulate.c to user-space, decode binaries with it, and compare its output with another decoder, as Jim had done with insn.c.
>
> https://www.redhat.com/archives/utrace-devel/2009-March/msg00031.html

btw., i'd suggest we put a build time check for this into the kernel version as well. For example, decode the vmlinux via objdump, run it through your decoder as well and compare the results. Put it under a CONFIG_DEBUG_X86_DECODER_TEST kind of (default-off) build-time self-test.

This would ensure that the kernel we are running is fully supported by the decoder - even as GCC/GAS starts using new instructions, etc.

How does this sound to you?

	Ingo
Re: [PATCH -tip 0/6 V4] tracing: kprobe-based event tracer
Masami Hiramatsu wrote:
> Hmm, I'd like to know whether kvm actually aims to emulate all kinds of instructions.

We're less interested in fpu/sse. The interesting instructions are those used for page table management, mmio, and real mode execution.

> If so, I might find some bugs in x86_emulate.c. However, I don't know all the bugs. To find all of them, we have to port x86_emulate.c to user-space, decode binaries with it, and compare its output with another decoder, as Jim had done with insn.c.

That would be very useful.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
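
As a rough illustration of the cross-check discussed in this thread (walk the same bytes with the decoder under test and compare each instruction length against a reference decoder such as objdump), here is a minimal user-space sketch. Everything in it is an assumption made for illustration: `toy_insn_length` is a hypothetical stand-in that knows only a few fixed-length opcodes, not x86_emulate.c or insn.c, and the reference lengths are assumed to have been parsed from the other decoder's output beforehand.

```c
#include <stddef.h>

/*
 * Toy length decoder (hypothetical, NOT x86_emulate.c or insn.c): it only
 * knows a few fixed-length opcodes as they decode in 32/64-bit mode.
 * Returns the instruction length in bytes, or 0 if the opcode is unknown.
 */
static size_t toy_insn_length(const unsigned char *p)
{
	if (*p == 0x90 || *p == 0xc3)		/* nop, ret */
		return 1;
	if (*p >= 0x50 && *p <= 0x5f)		/* push/pop reg */
		return 1;
	if (*p >= 0xb8 && *p <= 0xbf)		/* mov imm32 to reg */
		return 5;
	return 0;
}

/*
 * Walk `code` with the decoder under test and compare every length with
 * the reference lengths. Returns the index of the first instruction on
 * which the two disagree, or -1 if they agree over the whole buffer.
 */
static long first_length_mismatch(const unsigned char *code, size_t code_len,
				  const size_t *ref_len, size_t ref_count)
{
	size_t off = 0, i;

	for (i = 0; i < ref_count && off < code_len; i++) {
		size_t len = toy_insn_length(code + off);

		if (len == 0 || len != ref_len[i])
			return (long)i;		/* unknown opcode or disagreement */
		if (off + len > code_len)
			return (long)i;		/* instruction runs past the buffer */
		off += len;
	}
	/* Both sides must be consumed exactly for a full match. */
	return (i == ref_count && off == code_len) ? -1 : (long)i;
}
```

A build-time self-test along the lines Ingo suggests would feed such a comparison with the real decoder and the real objdump output and fail the build on the first mismatch.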
Re: [RFC PATCH 00/17] virtual-bus
Hi Avi,

I think we have since covered these topics later in the thread, but in case you wanted to know my thoughts here:

Avi Kivity wrote:
> Gregory Haskins wrote:
>>>> Yes, but the important thing to point out is it doesn't *replace* PCI. It is simply an alternative.
>>>
>>> Does it offer substantial benefits over PCI? If not, it's just extra code.
>>
>> First of all, do you think I would spend time designing it if I didn't think so? :)
>
> I'll rephrase. What are the substantial benefits that this offers over PCI?

Simplicity and optimization. You don't need most of the junk that comes with PCI. It's all overhead and artificial constraints. You really only need things like a handful of hypercall verbs and that's it.

>> Second of all, I want to use vbus for other things that do not speak PCI natively (like userspace for instance...and if I am gleaning this correctly, lguest doesn't either).
>
> And virtio supports lguest and s390. virtio is not PCI specific.

I understand that. We keep getting wrapped around the axle on this one. At some point in the discussion we were talking about supporting the existing guest ABI without changing the guest at all. So while I totally understand that virtio can work over various transports, I am referring to what would be needed to have existing ABI guests work with an in-kernel version. This may or may not be an actual requirement.

> However, for the PC platform, PCI has distinct advantages. What advantages does vbus have for the PC platform?

To reiterate: IMO simplicity and optimization. It's designed specifically for PV use, which is software to software.

>> PCI sounds good at first, but I believe it's a false economy. It was designed, of course, to be a hardware solution, so it carries all this baggage derived from hardware constraints that simply do not exist in a pure software world and that have to be emulated. Things like the fixed length and centrally managed PCI-IDs,
>
> Not a problem in practice.

Perhaps, but it's just one more constraint that isn't actually needed. It's like the cvs vs git debate. Why have it centrally managed when you don't technically need it? Sure, centrally managed works, but I'd rather not deal with it if there was a better option.

>> PIO config cycles, BARs, pci-irq-routing, etc.
>
> What are the problems with these?

1) PIOs are still less efficient to decode than a hypercall vector. We don't need to pretend we are hardware..the guest already knows what's underneath them. Use the most efficient call method.

2) BARs? No one in their right mind should use an MMIO BAR for PV. :) The last thing we want to do is cause page faults here. Don't use them, period. (This is where something like the vbus::shm() interface comes in)

3) pci-irq routing was designed to accommodate etch constraints on a piece of silicon that doesn't actually exist in kvm. Why would I want to pretend I have PCI A,B,C,D lines that route to a pin on an IOAPIC? Forget all that stuff and just inject an IRQ directly. This gets much better with MSI, I admit, but you hopefully catch my drift now.

One of my primary design objectives with vbus was to a) reduce the signaling as much as possible, and b) reduce the cost of signaling. That is why I do things like use explicit hypercalls, aggregated interrupts, bidir napi to mitigate signaling, the shm_signal::pending mitigation, and avoiding going to userspace by running in the kernel. All of these things together help to form what I envision would be a maximum performance transport. Not all of these tricks are interdependent (for instance, the bidir + full-duplex threading that I do can be done in userspace too, as discussed). They are just the collective design elements that I think we need to make a guest perform very close to its peak. That is what I am after.

>> While emulation of PCI is invaluable for executing unmodified guests, it's not strictly necessary from a paravirtual software perspective...PV software is inherently already aware of its context and can therefore use the best mechanism appropriate from a broader selection of choices.
>
> It's also not necessary to invent a new bus.

You are right, it's not strictly necessary to work. It just presents the opportunity to optimize as much as possible and to move away from legacy constraints that no longer apply. And since PV's sole purpose is about optimization, I was not really interested in going half-way.

> We need a positive advantage, we don't do things just because we can (and then lose the real advantages PCI has).

Agreed, but I assert there are advantages. You may not think they outweigh the cost, and that's your prerogative, but I think they are still there nonetheless.

>> If we insist that PCI is the only interface we can support and we want to do something, say, in the kernel for instance, we have to have either something like the ICH model in the kernel (and really all of the pci chipset models that qemu
VM cpuTime discrepancy
The cpuTime of a VM reported by kvm-72 is OK (real seconds), while that reported by kvm-84 is not. Are you aware of this? Was it fixed in the latest kvm releases since 84? I access cpuTime via libvirt (same version in both cases).

thanks

Zvi Dubitzky
Virtualization and System Architecture
Email: d...@il.ibm.com
IBM Haifa Research Laboratory
Phone: +972-4-8296182
Haifa, 31905, ISRAEL
Re: [RFC PATCH 00/17] virtual-bus
Gregory Haskins wrote:
>> I'll rephrase. What are the substantial benefits that this offers over PCI?
>
> Simplicity and optimization. You don't need most of the junk that comes with PCI. It's all overhead and artificial constraints. You really only need things like a handful of hypercall verbs and that's it.

Simplicity:

The guest already supports PCI. It has to, since it was written to the PC platform, and since today it is fashionable to run kernels that support both bare metal and a hypervisor. So you can't remove PCI from the guest.

The host also already supports PCI. It has to, since it must support guests which do not support vbus. We can't remove PCI from the host.

You don't gain simplicity by adding things. Sure, lguest is simple because it doesn't support PCI. But Linux will forever support PCI, and Qemu will always support PCI. You aren't simplifying anything by adding vbus.

Optimization:

Most of PCI (in our context) deals with configuration. So removing it doesn't optimize anything, unless you're counting hotplugs-per-second or something.

>>> Second of all, I want to use vbus for other things that do not speak PCI natively (like userspace for instance...and if I am gleaning this correctly, lguest doesn't either).
>>
>> And virtio supports lguest and s390. virtio is not PCI specific.
>
> I understand that. We keep getting wrapped around the axle on this one. At some point in the discussion we were talking about supporting the existing guest ABI without changing the guest at all. So while I totally understand that virtio can work over various transports, I am referring to what would be needed to have existing ABI guests work with an in-kernel version. This may or may not be an actual requirement.

There would be no problem supporting an in-kernel host virtio endpoint with the existing guest/host ABI. Nothing in the ABI assumes the host endpoint is in userspace. Nothing in the implementation requires us to move any of the PCI stuff into the kernel.

In fact, we already have in-kernel sources of PCI interrupts, these are assigned PCI devices (obviously, these have to use PCI).

>> However, for the PC platform, PCI has distinct advantages. What advantages does vbus have for the PC platform?
>
> To reiterate: IMO simplicity and optimization. It's designed specifically for PV use, which is software to software.

To avoid reiterating, please be specific about these advantages.

>>> PCI sounds good at first, but I believe it's a false economy. It was designed, of course, to be a hardware solution, so it carries all this baggage derived from hardware constraints that simply do not exist in a pure software world and that have to be emulated. Things like the fixed length and centrally managed PCI-IDs,
>>
>> Not a problem in practice.
>
> Perhaps, but it's just one more constraint that isn't actually needed. It's like the cvs vs git debate. Why have it centrally managed when you don't technically need it? Sure, centrally managed works, but I'd rather not deal with it if there was a better option.

We've allocated 3 PCI device IDs so far. It's not a problem. There are enough real problems out there.

>>> PIO config cycles, BARs, pci-irq-routing, etc.
>>
>> What are the problems with these?
>
> 1) PIOs are still less efficient to decode than a hypercall vector. We don't need to pretend we are hardware..the guest already knows what's underneath them. Use the most efficient call method.

Last time we measured, hypercall overhead was the same as pio overhead. Both vmx and svm decode pio completely (except for string pio ...)

> 2) BARs? No one in their right mind should use an MMIO BAR for PV. :) The last thing we want to do is cause page faults here. Don't use them, period. (This is where something like the vbus::shm() interface comes in)

So don't use BARs for your fast path. virtio places the ring in guest memory (like most real NICs).

> 3) pci-irq routing was designed to accommodate etch constraints on a piece of silicon that doesn't actually exist in kvm. Why would I want to pretend I have PCI A,B,C,D lines that route to a pin on an IOAPIC? Forget all that stuff and just inject an IRQ directly. This gets much better with MSI, I admit, but you hopefully catch my drift now.

True, PCI interrupts suck. But this was fixed with MSI. Why fix it again?

> One of my primary design objectives with vbus was to a) reduce the signaling as much as possible, and b) reduce the cost of signaling. That is why I do things like use explicit hypercalls, aggregated interrupts, bidir napi to mitigate signaling, the shm_signal::pending mitigation, and avoiding going to userspace by running in the kernel. All of these things together help to form what I envision would be a maximum performance transport. Not all of these tricks are interdependent (for instance, the bidir + full-duplex threading that I do can be done
[PATCH -tip 2/6 V4.1] x86: add arch-dep register and stack access API to ptrace
Add the following APIs for accessing registers and stack entries from pt_regs.

- query_register_offset(const char *name)
   Query the offset of the "name" register.

- query_register_name(unsigned offset)
   Query the name of a register by its offset.

- get_register(struct pt_regs *regs, unsigned offset)
   Get the value of a register by its offset.

- valid_stack_address(struct pt_regs *regs, unsigned long addr)
   Check that the address is in the stack.

- get_stack_nth(struct pt_regs *reg, unsigned nth)
   Get the Nth entry of the stack. (N >= 0)

- get_argument_nth(struct pt_regs *reg, unsigned nth)
   Get the Nth argument at function call. (N >= 0)

changes from v4:
 - support querying ss register.
 - remove unneeded cast.

Signed-off-by: Masami Hiramatsu <mhira...@redhat.com>
Cc: Steven Rostedt <rost...@goodmis.org>
Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com>
Cc: Ingo Molnar <mi...@elte.hu>
Cc: Frederic Weisbecker <fweis...@gmail.com>
---

 arch/x86/include/asm/ptrace.h |   66 +
 arch/x86/kernel/ptrace.c      |   60 +
 2 files changed, 126 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index aed0894..51e5844 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -7,6 +7,7 @@
 #ifdef __KERNEL__
 #include <asm/segment.h>
+#include <asm/page_types.h>
 #endif
 #ifndef __ASSEMBLY__
@@ -215,6 +216,71 @@ static inline unsigned long user_stack_pointer(struct pt_regs *regs)
 	return regs->sp;
 }
+/* Query offset/name of register from its name/offset */
+extern int query_register_offset(const char *name);
+extern const char *query_register_name(unsigned offset);
+#define MAX_REG_OFFSET (offsetof(struct pt_regs, ss))
+
+/* Get register value from its offset */
+static inline unsigned long get_register(struct pt_regs *regs, unsigned offset)
+{
+	if (unlikely(offset > MAX_REG_OFFSET))
+		return 0;
+	return *(unsigned long *)((unsigned long)regs + offset);
+}
+
+/* Check the address in the stack */
+static inline int valid_stack_address(struct pt_regs *regs, unsigned long addr)
+{
+	return ((addr & ~(THREAD_SIZE - 1)) ==
+		(kernel_trap_sp(regs) & ~(THREAD_SIZE - 1)));
+}
+
+/* Get Nth entry of the stack */
+static inline unsigned long get_stack_nth(struct pt_regs *regs, unsigned n)
+{
+	unsigned long *addr = (unsigned long *)kernel_trap_sp(regs);
+	addr += n;
+	if (valid_stack_address(regs, (unsigned long)addr))
+		return *addr;
+	else
+		return 0;
+}
+
+/* Get Nth argument at function call */
+static inline unsigned long get_argument_nth(struct pt_regs *regs, unsigned n)
+{
+#ifdef CONFIG_X86_32
+#define NR_REGPARMS 3
+	if (n < NR_REGPARMS) {
+		switch (n) {
+		case 0: return regs->ax;
+		case 1: return regs->dx;
+		case 2: return regs->cx;
+		}
+		return 0;
+#else /* CONFIG_X86_64 */
+#define NR_REGPARMS 6
+	if (n < NR_REGPARMS) {
+		switch (n) {
+		case 0: return regs->di;
+		case 1: return regs->si;
+		case 2: return regs->dx;
+		case 3: return regs->cx;
+		case 4: return regs->r8;
+		case 5: return regs->r9;
+		}
+		return 0;
+#endif
+	} else {
+		/*
+		 * The typical case: arg n is on the stack.
+		 * (Note: stack[0] = return address, so skip it)
+		 */
+		return get_stack_nth(regs, 1 + n - NR_REGPARMS);
+	}
+}
+
 /*
  * These are defined as per linux/ptrace.h, which see.
  */
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 5c6e463..3f504fd 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -46,6 +46,66 @@ enum x86_regset {
 	REGSET_IOPERM32,
 };
+struct pt_regs_offset {
+	const char *name;
+	int offset;
+};
+
+#define REG_OFFSET(r) offsetof(struct pt_regs, r)
+#define REG_OFFSET_NAME(r) {.name = #r, .offset = REG_OFFSET(r)}
+#define REG_OFFSET_END {.name = NULL, .offset = 0}
+
+static struct pt_regs_offset regoffset_table[] = {
+#ifdef CONFIG_X86_64
+	REG_OFFSET_NAME(r15),
+	REG_OFFSET_NAME(r14),
+	REG_OFFSET_NAME(r13),
+	REG_OFFSET_NAME(r12),
+	REG_OFFSET_NAME(r11),
+	REG_OFFSET_NAME(r10),
+	REG_OFFSET_NAME(r9),
+	REG_OFFSET_NAME(r8),
+#endif
+	REG_OFFSET_NAME(bx),
+	REG_OFFSET_NAME(cx),
+	REG_OFFSET_NAME(dx),
+	REG_OFFSET_NAME(si),
+	REG_OFFSET_NAME(di),
+	REG_OFFSET_NAME(bp),
+	REG_OFFSET_NAME(ax),
+#ifdef CONFIG_X86_32
+	REG_OFFSET_NAME(ds),
+	REG_OFFSET_NAME(es),
+	REG_OFFSET_NAME(fs),
+	REG_OFFSET_NAME(gs),
+#endif
+	REG_OFFSET_NAME(orig_ax),
+	REG_OFFSET_NAME(ip),
+	REG_OFFSET_NAME(cs),
+
Re: [PATCH 5/4] update ksm userspace interfaces
* Gerd Hoffmann (kra...@redhat.com) wrote:
> mmput() call was in ->release() callback, ->release() in turn never was called because the kernel didn't zap the mappings because of the reference ...

Don't have this issue. That mmput() is not tied to zapping mappings, rather zapping files. IOW, I think you're saying exit_mmap() wasn't running due to your get_task_mm() (quite understandable, you still hold a ref), whereas this ref is tied to exit_files().

So do_exit would do:

exit_mm
  mmput  <-- not dropped yet
exit_files
  ->release
    mmput  <-- dropped here

thanks,
-chris
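
The ordering argument above can be modeled with a plain reference counter. The following is only a sketch under stated assumptions: `struct mm_model`, `mm_get`, `mm_put` and `model_do_exit` are hypothetical names invented for illustration, not the kernel's `mm_struct`, `get_task_mm()` or `mmput()`; the model just shows that a reference tied to a file's release is dropped during the exit_files step, after the exit_mm step has already run.

```c
#include <stdbool.h>

/* Minimal stand-in for mm reference counting (hypothetical model). */
struct mm_model {
	int users;		/* models the user count of the mm */
	bool torn_down;		/* set when the last reference is dropped */
};

static void mm_get(struct mm_model *mm)
{
	mm->users++;
}

static void mm_put(struct mm_model *mm)
{
	if (--mm->users == 0)
		mm->torn_down = true;	/* where the address space teardown would run */
}

/*
 * Models the exit path of a task whose open file holds one extra mm
 * reference. Reports whether the mm was already torn down right after
 * the exit_mm step.
 */
static void model_do_exit(struct mm_model *mm, bool *torn_down_after_exit_mm)
{
	/* exit_mm step: drops the task's own reference, not the last one */
	mm_put(mm);
	*torn_down_after_exit_mm = mm->torn_down;

	/* exit_files step: the file's release drops the reference it holds */
	mm_put(mm);
}
```

In this model the teardown happens only in the second step, which matches the sequence sketched in the message: the reference is not lost, it is simply dropped later.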
Re: [RFC PATCH 00/17] virtual-bus
Avi Kivity wrote:
> Gregory Haskins wrote:
>> Avi Kivity wrote:
>>> Gregory Haskins wrote:
>>>> So again, I am proposing for consideration of accepting my work (either in its current form, or something we agree on after the normal review process) not only on the basis of the future development of the platform, but also to keep current components running to their full potential. I will again point out that the code is almost completely off to the side, can be completely disabled with config options, and I will maintain it. Therefore the only real impact is to people who care to even try it, and to me.
>>>
>>> Your work is a whole stack. Let's look at the constituents.
>>>
>>> - a new virtual bus for enumerating devices.
>>>
>>> Sorry, I still don't see the point. It will just make writing drivers more difficult. The only advantage I've heard from you is that it gets rid of the gunk. Well, we still have to support the gunk for non-pv devices so the gunk is basically free. The clean version is expensive since we need to port it to all guests and implement exciting features like hotplug.
>>
>> My real objection to PCI is fast-path related. I don't object, per se, to using PCI for discovery and hotplug. If you use PCI just for these types of things, but then allow fastpath to use more hypercall oriented primitives, then I would agree with you. We can leave PCI emulation in user-space, and we get it for free, and things are relatively tidy.
>
> PCI has very little to do with the fast path (nothing, if we use MSI).

At the very least, PIOs are slightly slower than hypercalls. Perhaps not enough to care, but the last time I measured them they were slower, and therefore my clean slate design doesn't use them.

But I digress. I think I was actually kind of agreeing with you that we could do this. :P

>> It's once you start requiring that we stay ABI compatible with something like the existing virtio-net in x86 KVM where I think it starts to get ugly when you try to move it into the kernel. So that is what I had a real objection to. I think as long as we are not talking about trying to make something like that work, it's a much more viable prospect.
>
> I don't see why the fast path of virtio-net would be bad. Can you elaborate?

I'm not. I am saying I think we might be able to do this.

> Obviously all the pci glue stays in userspace.
>
>> So what I propose is the following:
>>
>> 1) The core vbus design stays the same (or close to it)
>
> Sorry, I still don't see what advantage this has over PCI, and how you deal with the disadvantages.

I think you are confusing the vbus-proxy (guest side) with the vbus backend. (1) is saying "keep the vbus backend" and (2) is saying drop the guest side stuff. In this proposal, the guest would speak a PCI ABI as far as it's concerned. Devices in the vbus backend would render as PCI objects in the ICH (or whatever) model in userspace.

>> 2) the vbus-proxy and kvm-guest patch go away
>> 3) the kvm-host patch changes to work with coordination from the userspace-pci emulation for things like MSI routing
>> 4) qemu will know to create some MSI shim 1:1 with whatever it instantiates on the bus (and can communicate changes
>
> Don't understand. What's this MSI shim?

Well, if the device model was an object in vbus down in the kernel, yet PCI emulation was up in qemu, presumably we would want something to handle things like PCI config-cycles up in userspace. Like, for instance, if the guest re-routes the MSI. The shim/proxy would handle the config-cycle, and then turn around and do an ioctl to the kernel to configure the change with the in-kernel device model (or the irq infrastructure, as required).

But, TBH, I haven't really looked into what's actually required to make this work yet. I am just spitballing to try to find a compromise.

>> 5) any drivers that are written for these new PCI-IDs that might be present are allowed to use a hypercall ABI to talk after they have been probed for that ID (e.g. they are not limited to PIO or MMIO BAR type access methods).
>
> The way we'd do it with virtio is to add a feature bit that says you can hypercall here instead of pio. This way old drivers continue to work.

Yep, agreed. This is what I was thinking we could do. But now that I have the possibility that I just need to write a virtio-vbus module to co-exist with virtio-pci, perhaps it doesn't even need to be explicit.

> Note that nothing prevents us from trapping pio in the kernel (in fact, we do) and forwarding it to the device. It shouldn't be any slower than hypercalls.

Sure, it's just slightly slower, so I would prefer pure hypercalls if at all possible.

>> Once I get here, I might have greater clarity to see how hard it would be to emulate fast path components as well. It might be easier than I think.
>>
>> This is all off the cuff so it might need some fine tuning before it's actually workable.
>>
>> Does that sound reasonable?
>
> The
Re: How to Use LibVirt to mange KVM virtual machine
You should ask about this on the libvirt mailing list and IRC channel, not here. That said, a few quick points:

1. The libvirt you're running is very old.
2. You might consider setting the emulator to point to a shell script which records the command line it's called with to a file before doing an exec /usr/bin/kvm "$@"
Re: [PATCH] Add shared memory PCI device that shares a memory object betweens VMs
Avi Kivity wrote:
> Cam Macdonell wrote:
>> I think there is value for static memory sharing. It can be used for fast, simple synchronization and communication between guests (and the host) that need to share data that needs to be updated frequently (such as a simple cache or notification system). It may not be a common task, but I think static sharing has its place and that's what this device is for at this point.
>
> It would be good to detail a use case for reference.

I'll try my best...

We are using the (static) shared memory region for fast, interprocess communications (IPC). Of course, shared-memory IPC is an old idea, and the IPC here is actually between VMs (i.e., ivshmem), not processes inside a single VM. But, fast IPC is useful for shared caches, OS bypass (guest-to-guest, and host-to-guest), and low-latency IPC use-cases.

For example, one use of ivshmem is as a file cache between VMs. Note that, unlike stream-oriented IPC, this file cache can be shared between, say, four VMs simultaneously. In using VMs as sandboxes for distributed computing (condor, cloud, etc.), if two (or more) VMs are co-located on the same server, they can effectively share a single, unified cache. Any VM can bring in the data, and other VMs can use it. Otherwise, two VMs might transfer (over the WAN, in the worst case, as in a cloud) and buffer cache the same file in multiple VMs. In some ways, the configuration would look like an in-memory cluster file system, but instead of shared disks, we have shared memory.

Alternative forms of file sharing between VMs (e.g., via SAMBA or NFS) are possible, but also result in multiple cached copies of the same file data on the same physical server. Furthermore, ivshmem has the (usual, planned) latency (e.g., for file metadata stats) and bandwidth advantages over most forms of stream-oriented IPC for file sharing protocols.

Other (related) use cases include bulk-data movement between the host and guest VMs, due to the OS bypass properties of ivshmem. Since static shared memory shares a file (or memory object) on the host, host-guest sharing is simpler than with dynamic shared memory.

We acknowledge that work has to be done with thread/process scheduling to truly gain low IPC latency; that is to come, possibly with PCI interrupts. And, as the VMware experience shows (see below), VM migration *is* affected by ivshmem, but we think a good (but non-trivial) attach-to-ivshmem and detach-from-ivshmem protocol (in the future) can mostly address that issue.

As an aside, VMware ditched shared memory as part of their VMCI interface. We emailed with some of their people, who suggested using sockets since shared memory de-virtualizes the VM (i.e. it breaks migration). But on their forums there were users that used shared memory for their work and were disappointed to see it go. One person I emailed with used shared memory for simulations running across VMs. Using shared memory freed him from having to come up with a protocol to exchange updates and having a central VM responsible for receiving and broadcasting updates. When he did try to use a socket-based approach, the performance dropped substantially due to the communication overhead.

>>> Then you need a side channel to communicate the information to the guest.
>>
>> Couldn't one of the registers in BAR0 be used to store the actual (non-power-of-two) size?
>
> The PCI config space (where the BARs reside) is a good place for it. Registers 0x40+ are device specific IIRC.

Ok.

Cam
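
The size question at the end of this message comes from the fact that a PCI BAR can only describe a power-of-two sized region, so a shared object of any other size needs its real size reported somewhere else. A minimal sketch of that bookkeeping follows; the struct and function names are hypothetical and are not taken from the ivshmem patch, and where the actual size would be exposed (a device-specific config register or a BAR0 register) is left open, as it is in the discussion.

```c
#include <stdint.h>

/* Round up to the next power of two (v must be nonzero and at most 2^63). */
static uint64_t roundup_pow2(uint64_t v)
{
	uint64_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* Hypothetical description of what the device exposes for one shared object. */
struct shm_dev_model {
	uint64_t bar_size;	/* size the memory BAR has to advertise */
	uint64_t actual_size;	/* real object size, reported via a separate register */
};

static struct shm_dev_model shm_dev_model_init(uint64_t object_size)
{
	struct shm_dev_model d;

	d.bar_size = roundup_pow2(object_size);
	d.actual_size = object_size;
	return d;
}
```

A guest driver written against such a device would map the BAR but limit its accesses to `actual_size` bytes.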
Re: [PATCH -tip 0/6 V4] tracing: kprobe-based event tracer
Ingo Molnar wrote:
> * Masami Hiramatsu mhira...@redhat.com wrote:
>> Hmm, I'd like to know whether kvm actually aims to emulate all kinds of instructions. If so, I might find some bugs in x86_emulate.c. However, I don't know all the bugs. To find all of them, we have to port x86_emulate.c to user-space, decode binaries with it, and compare its output with another decoder, as Jim had done with insn.c.
>>
>> https://www.redhat.com/archives/utrace-devel/2009-March/msg00031.html
>
> btw., i'd suggest we put a build time check for this into the kernel version as well. For example, decode the vmlinux via objdump, run it through your decoder as well and compare the results. Put it under a CONFIG_DEBUG_X86_DECODER_TEST kind of (default-off) build-time self-test.
>
> This would ensure that the kernel we are running is fully supported by the decoder - even as GCC/GAS starts using new instructions, etc.
>
> How does this sound to you?

Thanks! That is a good idea.

Jim, do you think you could port your script into the kernel tree?

Thank you,

--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division
e-mail: mhira...@redhat.com
PCI device assignment to Guest
Does anyone have experience assigning a PCI-E based InfiniBand card to a guest OS (RHEL5U2 with kernel 2.6.18-92) on the latest AMD hardware with IOMMU support? The host OS has kernel 2.6.29.

Steps I used:

$ echo -n "8086 10de" > /sys/bus/pci/drivers/pci-stub/new_id
$ echo -n :00:19.0 > /sys/bus/pci/drivers/e1000e/unbind
$ echo -n :00:19.0 > /sys/bus/pci/drivers/pci-stub/bind

Then I started the guest with -pcidevice host=id. After the guest started, it detected the PCI device with the lspci command, but the kernel can't bring up the device. dmesg shows that the InfiniBand kernel module can't detect the card's firmware properly and then aborts.

I think it is a KVM issue rather than an InfiniBand kernel module issue. Can anyone suggest something?

Thanks
Eric
Re: [RFC PATCH 00/17] virtual-bus
* Gregory Haskins (ghask...@novell.com) wrote:
> Let me ask you this: If you had a clean slate and were designing a hypervisor and a guest OS from scratch: What would you make the bus look like?

Well, virtio did have a relatively clean slate. And PCI (as _one_ transport option) is what it looks like. It's not the only transport (as Avi already mentioned it works for s390, for example).

BTW, from my brief look at vbus, it seems pretty similar to xenbus.

thanks,
-chris
Re: PCI device assignment to Guest
* Eric Liu (ericliu2...@hotmail.com) wrote:
> Does anyone have experience assigning a PCI-E based InfiniBand card to a guest OS (RHEL5U2 with kernel 2.6.18-92) on the latest AMD hardware with IOMMU support? The host OS has kernel 2.6.29.
>
> Steps I used:
>
> $ echo -n "8086 10de" > /sys/bus/pci/drivers/pci-stub/new_id
> $ echo -n :00:19.0 > /sys/bus/pci/drivers/e1000e/unbind
> $ echo -n :00:19.0 > /sys/bus/pci/drivers/pci-stub/bind

The steps above are specific to an e1000e device.

> Then I started the guest with -pcidevice host=id. After the guest started, it detected the PCI device with the lspci command, but the kernel can't bring up the device. dmesg shows that the InfiniBand kernel module can't detect the card's firmware properly and then aborts.
>
> I think it is a KVM issue rather than an InfiniBand kernel module issue. Can anyone suggest something?

Sounds like you may have two drivers for this device. Can you include (on the host) lspci -tv and lspci -vvv?

thanks,
-chris
Re: [PATCH -tip 4/6 V4.1] x86: kprobes checks safeness of insertion address.
On Fri, 2009-04-03 at 12:02 -0400, Masami Hiramatsu wrote:
> Ensure safeness of inserting kprobes by checking whether the specified address is at the first byte of a instruction. This is done by decoding probed function from its head to the probe point.
>
> changes from v4:
>  - change a comment according to Ananth's suggestion.
>
> Signed-off-by: Masami Hiramatsu <mhira...@redhat.com>
> Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com>
> Cc: Jim Keniston <jkeni...@us.ibm.com>
> Cc: Ingo Molnar <mi...@elte.hu>
> ---
>
>  arch/x86/kernel/kprobes.c |   51 +
>  1 files changed, 51 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
...
> +/* Recover original instruction */

/* Recover the probed instruction at addr for further analysis. */

See below.

> +static int recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr)
> +{
> +	struct kprobe *kp;
> +	kp = get_kprobe((void *)addr);
> +	if (!kp)
> +		return -EINVAL;
> +
> +	/* Don't use p->ainsn.insn; which will be modified by fix_riprel */

fix_riprel doesn't affect the instruction's length, which is what concerns this patch. But we want this function to be useful for unforeseen uses as well, so I like the code you have. Just consider the suggested comment changes.

	/*
	 * Don't use p->ainsn.insn, which could be modified -- e.g.,
	 * by fix_riprel().
	 */

> +	memcpy(buf, kp->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
> +	buf[0] = kp->opcode;
> +	return 0;
> +}

Jim Keniston
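
To make the recovery step concrete: while a probe is armed, the first byte at the probed address holds a breakpoint, so the original instruction has to be rebuilt from the in-memory bytes plus the saved first byte. Below is a user-space model of that idea. It is a sketch only: `struct model_kprobe`, `model_recover_insn` and the `MODEL_*` constants are hypothetical stand-ins rather than the kernel's `struct kprobe`, `get_kprobe()` or `MAX_INSN_SIZE`, and the lookup by address is replaced by passing the probe directly.

```c
#include <string.h>

#define MODEL_MAX_INSN_SIZE 16		/* assumed upper bound for one instruction */
#define MODEL_BREAKPOINT    0xcc	/* x86 int3 opcode */

typedef unsigned char model_opcode_t;

/* Hypothetical stand-in for struct kprobe: only the fields the sketch needs. */
struct model_kprobe {
	model_opcode_t *addr;	/* probed address; first byte currently holds int3 */
	model_opcode_t opcode;	/* original first byte, saved when the probe was set */
};

/*
 * Copy the bytes at the probed address into buf and put the saved
 * opcode back into the copy. The probed memory itself is left untouched.
 * Returns 0 on success, -1 if there is no probe.
 */
static int model_recover_insn(model_opcode_t *buf, const struct model_kprobe *kp)
{
	if (!kp)
		return -1;
	memcpy(buf, kp->addr, MODEL_MAX_INSN_SIZE * sizeof(model_opcode_t));
	buf[0] = kp->opcode;
	return 0;
}
```

Working on the copy rather than on a separately stored and possibly rewritten instruction buffer is the design point the review comment is about: the copy always reflects the bytes as they were originally laid out in the probed function.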
RE: PCI device assignment to Guest
Here are the exact steps I used: 1. lspci -n on host: 06:00.0 0c06: 15b3:634a (rev a0) I want to assign this device to guest. 2. Uninstall driver for this device. 3. Unbind device with the following commands: echo 15b3 634a /sys/bus/pci/drivers/pci-stub/new_id echo :06:00.0 /sys/bus/pci/devices/:06:00.0/driver/unbind echo :06:00.0 /sys/bus/pci/drivers/pci-stub/bind 4. start guest with... -pcidevice host=06:00.0 5. Guest os detects device with lspci command but failed to start. lspci -tv on host:(last device here is what i want to assign) -[:00]-+-01.0-[:03-04]--+-0d.0-[:04]-- |\-0e.0 Broadcom BCM5785 [HT1000] SATA (PATA/IDE Mode) +-02.0 Broadcom BCM5785 [HT1000] Legacy South Bridge +-02.1 Broadcom BCM5785 [HT1000] IDE +-02.2 Broadcom BCM5785 [HT1000] LPC +-03.0 Broadcom BCM5785 [HT1000] USB +-03.1 Broadcom BCM5785 [HT1000] USB +-03.2 Broadcom BCM5785 [HT1000] USB +-04.0 ATI Technologies Inc ES1000 +-07.0-[:05]-- +-08.0-[:01]00.0 Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express +-09.0-[:02]00.0 Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express +-0a.0-[:06]00.0 Mellanox Technologies MT25418 [ConnectX IB DDR] lspci -vvv: (only related portion) 06:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX IB DDR] (rev a0) Subsystem: Mellanox Technologies Unknown device 0007 Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fastTAbort- SERR- Date: Fri, 3 Apr 2009 10:13:22 -0700 From: chr...@sous-sol.org To: ericliu2...@hotmail.com CC: kvm@vger.kernel.org Subject: Re: PCI device assignment to Guest * Eric Liu (ericliu2...@hotmail.com) wrote: Anyone has experience to assign PCI-E based InfiniBand card to guest OS(RHEL5U2 with kernel 2.6.18-92) on latest AMD with IOMMU support. Host OS has kernel 2.6.29. 
Steps I used: $ echo -n 8086 10de /sys/bus/pci/drivers/pci-stub/new_id $ echo -n :00:19.0 /sys/bus/pci/drivers/e1000e/unbind $ echo -n :00:19.0 /sys/bus/pci/drivers/pci-stub/bind The steps above are specific to an e1000e device. Then I started guest with -pcidevice host=id. After guest is started, it successfully detected pci device with lspci command, however kernel can't bring up the device. dmesg shows infiniband kernel module can't detect infiniband card firmware properly then aborted. I think it is KVM issue rather than infiniband kernel module issue. Can anyone suggest? Sounds like you may have two drivers for this device. Can you include (on the host) lspci -tv and lspci -vvv? thanks, -chris _ Windows Live™: Keep your life in sync. http://windowslive.com/explore?ocid=TXT_TAGLM_WL_allup_1a_explore_042009-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
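The rebinding sequence discussed in this thread (register the ID with pci-stub, unbind the device from its current driver, bind it to pci-stub) can be sketched as a helper that only computes which sysfs file receives which string. This is a minimal sketch, not something posted in the thread: the name pci_stub_writes and the 0000 PCI domain prefix are assumptions, and the function performs no writes itself.

```python
def pci_stub_writes(vendor, device, bdf):
    """Return the (sysfs path, value) pairs for handing a device to pci-stub.

    vendor/device are the hex IDs shown by `lspci -n` (e.g. "15b3", "634a").
    bdf is the full address including the PCI domain, e.g. "0000:06:00.0"
    (the "0000" domain is an assumption for illustration).
    """
    stub = "/sys/bus/pci/drivers/pci-stub"
    return [
        # 1. Teach pci-stub about this vendor/device ID pair.
        (stub + "/new_id", "%s %s" % (vendor, device)),
        # 2. Detach the device from whatever driver currently owns it.
        ("/sys/bus/pci/devices/%s/driver/unbind" % bdf, bdf),
        # 3. Attach the device to pci-stub so no host driver claims it.
        (stub + "/bind", bdf),
    ]


if __name__ == "__main__":
    # Print the equivalent shell commands for the ConnectX card in the thread.
    for path, value in pci_stub_writes("15b3", "634a", "0000:06:00.0"):
        print("echo -n %s > %s" % (value, path))
```

The ID pair passed to new_id has to be the one `lspci -n` reports for the device being assigned (15b3:634a for the Mellanox card here), not the pair from the e1000e example.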
Re: [PATCH -tip 0/6 V4] tracing: kprobe-based event tracer
On Fri, 2009-04-03 at 12:55 -0400, Masami Hiramatsu wrote:
> Ingo Molnar wrote:
> > * Masami Hiramatsu <mhira...@redhat.com> wrote:
> > > Hmm, I'd like to know actually kvm aims to emulate all kinds of instructions. If so, I might find some bugs in x86_emulate.c. However, I don't know all bugs. To find all of them, we have to port x86_emulate.c to user-space, decode binaries with it, and compare its output with another decoder, as Jim had done with insn.c.
> > > https://www.redhat.com/archives/utrace-devel/2009-March/msg00031.html
> >
> > btw., i'd suggest we put a build time check for this into the kernel version as well. For example to decode the vmlinux via objdump, run it through your decoder as well and compare the results. Put under a CONFIG_DEBUG_X86_DECODER_TEST kind of (default-off) build-time self-test.
> >
> > This would ensure that the kernel we are running is fully supported by the decoder - even as GCC/GAS starts using new instructions, etc. How does this sound to you?
>
> Thanks! That is a good idea. Jim, would you think you can port your script into kernel tree?
...

I'd be happy to do what's needed to make it happen, and maintain it in the face of x86 changes. The script itself is practically nothing (~100 lines of awk and C), but what I don't know about the kernel build is a lot, so I'd need some help from a kernel-build expert.

Jim
Re: [RFC PATCH 00/17] virtual-bus
Avi Kivity wrote: Gregory Haskins wrote: I'll rephrase. What are the substantial benefits that this offers over PCI? Simplicity and optimization. You don't need most of the junk that comes with PCI. Its all overhead and artificial constraints. You really only need things like a handful of hypercall verbs and thats it. Simplicity: The guest already supports PCI. It has to, since it was written to the PC platform, and since today it is fashionable to run kernels that support both bare metal and a hypervisor. So you can't remove PCI from the guest. Agreed The host also already supports PCI. It has to, since it must supports guests which do not support vbus. We can't remove PCI from the host. Agreed You don't gain simplicity by adding things. But you are failing to account for the fact that we still have to add something for PCI if we go with something like the in-kernel model. Its nice for the userspace side because a) it was already in qemu, and b) we need it for proper guest support. But we don't presumably have it for this new thing, so something has to be created (unless this support is somehow already there and I don't know it?) Sure, lguest is simple because it doesn't support PCI. But Linux will forever support PCI, and Qemu will always support PCI. You aren't simplifying anything by adding vbus. Optimization: Most of PCI (in our context) deals with configuration. So removing it doesn't optimize anything, unless you're counting hotplugs-per-second or something. Most, but not all ;) (Sorry, you left the window open on that one). What about IRQ routing? What if I want to coalesce interrupts to minimize injection overhead? How do I do that in PCI? How do I route those interrupts in an arbitrarily nested fashion, say, to a guest userspace? What about scale? What if Herbet decides to implement a 2048 ring MQ device ;) Theres no great way to do that in x86 with PCI, yet I can do it in vbus. 
(And yes, I know, this is ridiculous..just wanting to get you thinking) Second of all, I want to use vbus for other things that do not speak PCI natively (like userspace for instance...and if I am gleaning this correctly, lguest doesnt either). And virtio supports lguest and s390. virtio is not PCI specific. I understand that. We keep getting wrapped around the axle on this one. At some point in the discussion we were talking about supporting the existing guest ABI without changing the guest at all. So while I totally understand the virtio can work over various transports, I am referring to what would be needed to have existing ABI guests work with an in-kernel version. This may or may not be an actual requirement. There is be no problem supporting an in-kernel host virtio endpoint with the existing guest/host ABI. Nothing in the ABI assumes the host endpoint is in userspace. Nothing in the implementation requires us to move any of the PCI stuff into the kernel. Well, thats not really true. If the device is a PCI device, there is *some* stuff that has to go into the kernel. Not an ICH model or anything, but at least an ability to interact with userspace for config-space changes, etc. In fact, we already have in-kernel sources of PCI interrupts, these are assigned PCI devices (obviously, these have to use PCI). This will help. However, for the PC platform, PCI has distinct advantages. What advantages does vbus have for the PC platform? To reiterate: IMO simplicity and optimization. Its designed specifically for PV use, which is software to software. To avoid reiterating, please be specific about these advantages. We are both reading the same thread, right? PCI sounds good at first, but I believe its a false economy. It was designed, of course, to be a hardware solution, so it carries all this baggage derived from hardware constraints that simply do not exist in a pure software world and that have to be emulated. 
Things like the fixed length and centrally managed PCI-IDs, Not a problem in practice. Perhaps, but its just one more constraint that isn't actually needed. Its like the cvs vs git debate. Why have it centrally managed when you don't technically need it. Sure, centrally managed works, but I'd rather not deal with it if there was a better option. We've allocated 3 PCI device IDs so far. It's not a problem. There are enough real problems out there. PIO config cycles, BARs, pci-irq-routing, etc. What are the problems with these? 1) PIOs are still less efficient to decode than a hypercall vector. We dont need to pretend we are hardware..the guest already knows whats underneath them. Use the most efficient call method. Last time we measured, hypercall overhead was the same as pio overhead. Both vmx and svm decode pio completely (except for string pio ...) Not on my woodcrests last time I looked, but
Re: [RFC PATCH 00/17] virtual-bus
Chris Wright wrote: * Gregory Haskins (ghask...@novell.com) wrote: Let me ask you this: If you had a clean slate and were designing a hypervisor and a guest OS from scratch: What would you make the bus look like? Well, virtio did have a relatively clean slate. And PCI (as _one_ transport option) is what it looks like. It's not the only transport (as Avi already mentioned it works for s390, for example). Got it. Thanks. BTW, from my brief look at vbus, it seems pretty similar to xenbus. If you are referring to the guest side interface, it was actually inspired by lguest's bus (I forget what Rusty called it now, though). I think I actually declared that in the original patch series I put out 1.5 years ago, but I might have inadvertently omitted that on this go-round. I think XenBus is more of an event channel infrastructure, isn't it? But in any case, I think the nature of getting PV drivers into a guest is relatively similar, so I wouldn't be surprised if there were parallels in quite a few of the implementations. In fact, I chose a generic name like vbus in hopes that it could be used across different hypervisors. :) -Greg signature.asc Description: OpenPGP digital signature
Re: [PATCH -tip 4/6 V4.1] x86: kprobes checks safeness of insertion address.
Jim Keniston wrote: On Fri, 2009-04-03 at 12:02 -0400, Masami Hiramatsu wrote: Ensure safeness of inserting kprobes by checking whether the specified address is at the first byte of a instruction. This is done by decoding probed function from its head to the probe point. changes from v4: - change a comment according to Ananth's suggestion. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Jim Keniston jkeni...@us.ibm.com Cc: Ingo Molnar mi...@elte.hu --- arch/x86/kernel/kprobes.c | 51 + 1 files changed, 51 insertions(+), 0 deletions(-) diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c ... +/* Recover original instruction */ /* Recover the probed instruction at addr for further analysis. */ See below. Sure. +static int recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr) +{ +struct kprobe *kp; +kp = get_kprobe((void *)addr); +if (!kp) +return -EINVAL; + +/* Don't use p-ainsn.insn; which will be modified by fix_riprel */ fix_riprel doesn't affect the instruction's length, which is what concerns this patch. But we want this function to be useful for unforeseen uses as well, so I like the code you have. Just consider the suggested comment changes. /* * Don't use p-ainsn.insn, which could be modified -- e.g., * by fix_riprel(). */ Thanks, I'll update comments then! +memcpy(buf, kp-addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t)); +buf[0] = kp-opcode; +return 0; +} Jim Keniston -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America) Inc. 
Software Solutions Division e-mail: mhira...@redhat.com
Re: PCI passthrough Intel 82574L can't boot from disk
On Thursday 02 April 2009 18:58:26 you wrote: It is my understanding that you need vt-d/iommu support. I didn't think any existing amd chipsets had iommu support. You may want to look into that. Hi Brian, thanks for you response. I found a tool [1] from Intel to disable the Boot ROM on the nic. Thats resolves the boot problem. Regards Hauke [1] http://downloadcenter.intel.com/Detail_Desc.aspx?agr=YProductID=412DwnldID=8242 --Brian Jackson On Thursday 02 April 2009 07:00:07 Hauke Hoffmann wrote: Hi, qemu-system-x86_64 runs well and i can boot and run the guest system. Thats works very well. Command: /usr/local/kvm/bin/qemu-system-x86_64 -m 512 -hda /var/VM/roadrunner.local/hda.qcow2 -smp 1 -vnc 192.168.2.30: -net nic,macaddr=DE:AD:BE:EF:90:26 -net tap,ifname=tap0,script=no,downscript=no -boot c Then i tried to add an intel 82574L network adapter to the guest. Just the same command with addtionally -pcidevice host=07:00.0 Then i connected via VNC and see BIOS startpage and the following lines: Initializing Intel(r) boot agent ge v1.3.21 pxe 2.1 build 086 (WfM 2.0) Press f12 for moot menu You can see a screenshot at http://nxt7.de/download/qemu.png The guests keep on this point and nothing changes. (I have wait hours.) I tried to press F12 in ThightVNC but no action. I must say that ThightVNC has problems with special chars (in my case). At this point, i need your help. 
Here are some details of my system Kernel: 2.6.29 form kernel.org (self compiled) kvm userspace: kvm-84 (self compiled) OS: Ubuntu 8.04.2 server r...@ls:~# lspci 00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2) 00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3) 00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3) 00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1) 00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2) 00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1) 00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2) 00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0e.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 01:09.0 Ethernet controller: Lite-On Communications Inc LNE100TX [Linksys EtherFast 10/100] (rev 25) 01:0a.0 VGA compatible controller: XGI Technology Inc. 
(eXtreme Graphics Innovation) Volari Z7 06:00.0 SATA controller: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 03) 06:00.1 IDE interface: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 03) 07:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection r...@ls:~# lspci -tvvv -[:00]-+-00.0 nVidia Corporation MCP55 Memory Controller +-01.0 nVidia Corporation MCP55 LPC Bridge +-01.1 nVidia Corporation MCP55 SMBus +-02.0 nVidia Corporation MCP55 USB Controller +-02.1 nVidia Corporation MCP55 USB Controller +-04.0 nVidia Corporation MCP55 IDE +-05.0 nVidia Corporation MCP55 SATA Controller +-05.1 nVidia Corporation MCP55 SATA Controller +-05.2 nVidia Corporation MCP55 SATA Controller +-06.0-[:01]--+-09.0 Lite-On Communications Inc LNE100TX [Linksys EtherFast 10/100] | \-0a.0 XGI Technology Inc. (eXtreme | Graphics Innovation) Volari Z7 +-08.0 nVidia Corporation MCP55 Ethernet +-09.0 nVidia Corporation MCP55 Ethernet +-0a.0-[:02]-- +-0b.0-[:03]-- +-0c.0-[:04]-- +-0d.0-[:05]-- +-0e.0-[:06]--+-00.0 JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller | \-00.1 JMicron Technologies, Inc.
Re: [kvm] [PATCH 06/16] Support for device capability
On Tue, 2009-03-17 at 11:50 +0800, Sheng Yang wrote:
> This framework can be easily extended to support device capability, like MSI/MSI-x.

Sheng,

Are you already looking at adding support for PM and EXP capabilities? The bnx2 driver is an example that won't claim the device if these capabilities aren't present.

Thanks,
Alex
KVM cpuTime discrepancy
The cpuTime of a VM reported by kvm-72 is correct (real seconds), while the value reported by kvm-84 is not. Are you aware of this? Has it been fixed in any KVM release since kvm-84? I read the cpuTime via libvirt (same libvirt version in both cases).

Thanks,
Zvi Dubitzky
Virtualization and System Architecture
Email: d...@il.ibm.com
IBM Haifa Research Laboratory
Phone: +972-4-8296182
Haifa, 31905, ISRAEL
[RFC PATCH] PCI pass-through fixups
I'm wondering if we need a spot for device specific fixups for PCI pass-through. In the example below, I want to expose a single port of an Intel 82571EB quad port copper NIC to a guest. It works great until I shutdown the guest, at which point the guest e1000e driver knows by the device ID that the NIC is a quad port, and blindly attempts to twiddle some bits on the bridge above it (that doesn't exist). Obviously some robustness could be added to the driver, but would it make sense to do something like below and automatically remap these devices to identical single port device IDs?

Thanks,
Alex

Signed-off-by: Alex Williamson <alex.william...@hp.com>
--

diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index b7f9fa6..1c6d1e8 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -427,6 +427,35 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
     return 0;
 }
 
+static void assigned_device_fixup(AssignedDevice *pci_dev)
+{
+    uint16_t vendor_id, device_id;
+
+    vendor_id = pci_dev->dev.config[0] | pci_dev->dev.config[1] << 8;
+    device_id = pci_dev->dev.config[2] | pci_dev->dev.config[3] << 8;
+
+    switch (vendor_id) {
+    case 0x8086:
+        switch (device_id) {
+        case 0x10A4:
+        case 0x10BC:
+            /* quad port copper -> single port copper */
+            pci_dev->dev.config[2] = 0x5E;
+            break;
+        case 0x10A5:
+            /* quad port fiber -> single port fiber */
+            pci_dev->dev.config[2] = 0x5F;
+            break;
+        case 0x10DA:
+        case 0x10D9:
+            /* dual/quad port serdes -> single port serdes */
+            pci_dev->dev.config[2] = 0x60;
+            break;
+        }
+        break;
+    }
+}
+
 static int get_real_device(AssignedDevice *pci_dev, uint8_t r_bus,
                            uint8_t r_dev, uint8_t r_func)
 {
@@ -524,6 +553,8 @@ again:
     }
     fclose(f);
 
+    assigned_device_fixup(pci_dev);
+
     /* dealing with virtual function device */
     snprintf(name, sizeof(name), "%sphysfn/", dir);
     if (!stat(name, &statbuf))
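The remapping proposed in this RFC can be modelled outside qemu as a table lookup on the first four bytes of PCI config space. The following is a toy sketch only: the names REMAP, read_le16 and fixup are hypothetical, and unlike the patch (which rewrites just the low byte at config[2]) it writes both bytes of the device ID.

```python
# Toy model of the proposed assigned_device_fixup(): rewrite the device ID
# in a copy of PCI config space so the guest sees a single-port part.
INTEL = 0x8086

# Multi-port device ID -> single-port device ID, as listed in the RFC patch.
REMAP = {
    0x10A4: 0x105E,  # quad port copper -> single port copper
    0x10BC: 0x105E,
    0x10A5: 0x105F,  # quad port fiber -> single port fiber
    0x10DA: 0x1060,  # dual/quad port serdes -> single port serdes
    0x10D9: 0x1060,
}


def read_le16(config, offset):
    """Read a little-endian 16-bit field from the config-space bytes."""
    return config[offset] | (config[offset + 1] << 8)


def fixup(config):
    """Apply the remap in place; return True if the device ID was changed."""
    vendor_id = read_le16(config, 0)   # bytes 0-1: vendor ID
    device_id = read_le16(config, 2)   # bytes 2-3: device ID
    if vendor_id != INTEL or device_id not in REMAP:
        return False
    new_id = REMAP[device_id]
    config[2] = new_id & 0xFF
    config[3] = new_id >> 8
    return True
```

For example, a bytearray starting with 86 80 A4 10 (vendor 0x8086, device 0x10A4) is rewritten so that the device ID reads 0x105E, while devices from other vendors are left untouched.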
Re: [PATCH -tip 2/6 V4.1] x86: add arch-dep register and stack access API to ptrace
> +static struct pt_regs_offset regoffset_table[] = {
          ^ const
Re: kvm-autotest: weird memory error during stepmaker test
Excerpts from Ryan Harper's message of Qua Abr 01 12:55:58 -0300 2009: Wondering if anyone else using kvm-autotest stepmaker has ever seen this error: Traceback (most recent call last): File /home/rharper/work/git/build/kvm-autotest/client/tests/kvm_runtest_2/stepmaker. py, line 146, in update self.set_image_from_file(self.screendump_filename) File /home/rharper/work/git/build/kvm-autotest/client/tests/kvm_runtest_2/stepeditor .py, line 499, in set_image_from_file self.set_image(w, h, data) File /home/rharper/work/git/build/kvm-autotest/client/tests/kvm_runtest_2/stepeditor .py, line 485, in set_image w, h, w*3)) MemoryError I've seen this error twice today, while trying to create a step file to install a Windows 2008 R2 64-bit guest (the Win2008-64 step file available on the git repository doesn't work for me). This happened when the guest was being rebooted by the windows installer. The contents of the screen dump file are this: $ cat /home/ehabkost/autotest/kvm-autotest/client/results/default/kvm_runtest_2.Win2008.64.install/debug/scrdump.ppm P6 0 0 255 $ And the 0x0 pixmap really makes gdk panic: (w, h, data) = ppm_utils.image_read_from_ppm_file('/home/ehabkost/autotest/kvm-autotest/client/results/default/kvm_runtest_2.Win2008.64.install/debug/scrdump.ppm') w,h,data (0, 0, '') gtk.gdk.pixbuf_new_from_data(data, gtk.gdk.COLORSPACE_RGB, False, 8, w, h, w*3) Traceback (most recent call last): File stdin, line 1, in ? MemoryError The guest is still running, but stepmaker isn't recording any more so it's boned at that point. And of course, it's near the end of a guest install so one has lost a decent amount of time... -- Eduardo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH -tip 4/6 V4.2] x86: kprobes checks safeness of insertion address.
x86: kprobes checks safeness of insertion address.

From: Masami Hiramatsu <mhira...@redhat.com>

Ensure safeness of inserting kprobes by checking whether the specified address is at the first byte of an instruction. This is done by decoding the probed function from its head to the probe point.

changes from v4.1:
 - update comments according to Jim's suggestion.
 - s/lookup_symbol_attrs/kallsyms_lookup/ in a comment.

Signed-off-by: Masami Hiramatsu <mhira...@redhat.com>
Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com>
Cc: Jim Keniston <jkeni...@us.ibm.com>
Cc: Ingo Molnar <mi...@elte.hu>
---

 arch/x86/kernel/kprobes.c |   54 +
 1 files changed, 54 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index 7b5169d..3d5e85f 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -48,12 +48,14 @@
 #include <linux/preempt.h>
 #include <linux/module.h>
 #include <linux/kdebug.h>
+#include <linux/kallsyms.h>
 
 #include <asm/cacheflush.h>
 #include <asm/desc.h>
 #include <asm/pgtable.h>
 #include <asm/uaccess.h>
 #include <asm/alternative.h>
+#include <asm/insn.h>
 
 void jprobe_return_end(void);
 
@@ -244,6 +246,56 @@ retry:
 	}
 }
 
+/* Recover the probed instruction at addr for further analysis. */
+static int recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr)
+{
+	struct kprobe *kp;
+	kp = get_kprobe((void *)addr);
+	if (!kp)
+		return -EINVAL;
+
+	/*
+	 * Don't use p->ainsn.insn, which could be modified -- e.g.,
+	 * by fix_riprel().
+	 */
+	memcpy(buf, kp->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
+	buf[0] = kp->opcode;
+	return 0;
+}
+
+/* Dummy buffers for kallsyms_lookup */
+static char __dummy_buf[KSYM_NAME_LEN];
+
+/* Check if paddr is at an instruction boundary */
+static int __kprobes can_probe(unsigned long paddr)
+{
+	int ret;
+	unsigned long addr, offset = 0;
+	struct insn insn;
+	kprobe_opcode_t buf[MAX_INSN_SIZE];
+
+	/* Lookup symbol including addr */
+	if (!kallsyms_lookup(paddr, NULL, &offset, NULL, __dummy_buf))
+		return 0;
+
+	/* Decode instructions */
+	addr = paddr - offset;
+	while (addr < paddr) {
+		insn_init_kernel(&insn, (void *)addr);
+		insn_get_opcode(&insn);
+		if (OPCODE1(&insn) == BREAKPOINT_INSTRUCTION) {
+			ret = recover_probed_instruction(buf, addr);
+			if (ret)
+				return 0;
+			insn_init_kernel(&insn, buf);
+		}
+		insn_get_length(&insn);
+		addr += insn.length;
+	}
+
+	return (addr == paddr);
+}
+
 /*
  * Returns non-zero if opcode modifies the interrupt flag.
  */
@@ -359,6 +411,8 @@ static void __kprobes arch_copy_kprobe(struct kprobe *p)
 
 int __kprobes arch_prepare_kprobe(struct kprobe *p)
 {
+	if (!can_probe((unsigned long)p->addr))
+		return -EILSEQ;
 	/* insn: must be on special executable page on x86. */
 	p->ainsn.insn = get_insn_slot();
 	if (!p->ainsn.insn)

-- 
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division
e-mail: mhira...@redhat.com
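The boundary check in can_probe() comes down to walking instruction lengths from the start of the function and seeing whether the walk lands exactly on the probe address. Below is a toy model of that idea, not kernel code: insn_length_at is a hypothetical stand-in for the x86 decoder, the guard against non-positive lengths is an addition for the sketch, and the recovery of bytes overwritten by an existing kprobe breakpoint is left out.

```python
def can_probe(func_start, probe_addr, insn_length_at):
    """Return True if probe_addr is the first byte of an instruction.

    insn_length_at(addr) is a hypothetical decoder callback returning the
    length in bytes of the instruction that starts at addr.
    """
    addr = func_start
    # Walk instruction by instruction from the function entry point.
    while addr < probe_addr:
        length = insn_length_at(addr)
        if length <= 0:
            return False  # decoder gave up; treat the address as unsafe
        addr += length
    # Landing exactly on probe_addr means it is an instruction boundary;
    # overshooting means probe_addr lies inside an instruction.
    return addr == probe_addr


# Toy "function" at 0x1000 made of instructions of 1, 3, 2 and 5 bytes.
LENGTHS = {0x1000: 1, 0x1001: 3, 0x1004: 2, 0x1006: 5}


def toy_decoder(addr):
    """Look up the instruction length; 0 means 'unknown address'."""
    return LENGTHS.get(addr, 0)
```

With this layout, 0x1004 is accepted because the walk goes 0x1000, 0x1001, 0x1004, while 0x1002 is rejected because the walk steps over it.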
Re: [kvm] [PATCH 13/16] kvm: enable MSI-X capabilty for assigned device
On Tue, 2009-03-17 at 11:50 +0800, Sheng Yang wrote:
> +    if (*ctrl_word & PCI_MSIX_ENABLE) {
> +        if (assigned_dev_update_msix_mmio(pci_dev) < 0) {
> +            perror("assigned_dev_update_msix_mmio");
> +            return;
> +        }
> +        if (kvm_assign_irq(kvm_context, &assigned_irq_data) < 0) {
> +            perror("assigned_dev_enable_msix: assign irq");
> +            return;
> +        }
> +        assigned_dev->irq_requested_type = assigned_irq_data.flags;
> +    }
> +}

Do we need some disable logic here? If I toggle a bnx2 NIC in a guest, I get the following when it attempts to come back up:

MSI-X entry number is zero!
assigned_dev_update_msix_mmio: No such device or address

Thanks,
Alex
Re: kvm-autotest: weird memory error during stepmaker test
- Eduardo Habkost ehabk...@raisama.net wrote: Excerpts from Ryan Harper's message of Qua Abr 01 12:55:58 -0300 2009: Wondering if anyone else using kvm-autotest stepmaker has ever seen this error: Traceback (most recent call last): File /home/rharper/work/git/build/kvm-autotest/client/tests/kvm_runtest_2/stepmaker. py, line 146, in update self.set_image_from_file(self.screendump_filename) File /home/rharper/work/git/build/kvm-autotest/client/tests/kvm_runtest_2/stepeditor .py, line 499, in set_image_from_file self.set_image(w, h, data) File /home/rharper/work/git/build/kvm-autotest/client/tests/kvm_runtest_2/stepeditor .py, line 485, in set_image w, h, w*3)) MemoryError I've seen this error twice today, while trying to create a step file to install a Windows 2008 R2 64-bit guest (the Win2008-64 step file available on the git repository doesn't work for me). This happened when the guest was being rebooted by the windows installer. The contents of the screen dump file are this: $ cat /home/ehabkost/autotest/kvm-autotest/client/results/default/kvm_runtest_2.Win2008.64.install/debug/scrdump.ppm P6 0 0 255 $ And the 0x0 pixmap really makes gdk panic: (w, h, data) = ppm_utils.image_read_from_ppm_file('/home/ehabkost/autotest/kvm-autotest/client/results/default/kvm_runtest_2.Win2008.64.install/debug/scrdump.ppm') w,h,data (0, 0, '') gtk.gdk.pixbuf_new_from_data(data, gtk.gdk.COLORSPACE_RGB, False, 8, w, h, w*3) Traceback (most recent call last): File stdin, line 1, in ? MemoryError This is very useful information. I've seen qemu/kvm produce 0x0 screendumps before, but it's never happened to me while working with stepmaker. A reasonable solution would be to make sure a screendump is OK before feeding it to gdk. I'll try to commit this ASAP so it doesn't bother people any more. Thanks, Michael -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
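The check suggested above (make sure a screendump is OK before feeding it to gdk) could look roughly like the following. This is a sketch under assumptions: it is written for Python 3 rather than the Python 2 of the original tree, the function names read_ppm_header and screendump_is_usable are hypothetical, it does not use ppm_utils, and it ignores PPM header comments and assumes one byte per colour channel.

```python
def read_ppm_header(data):
    """Parse the header of a binary PPM (P6) image held in a bytes object.

    Returns (width, height, maxval, pixel_offset). Header comments are not
    handled, which is a simplification.
    """
    fields = []
    pos = 0
    while len(fields) < 4:
        # Skip whitespace before the next token.
        while pos < len(data) and data[pos:pos + 1].isspace():
            pos += 1
        start = pos
        # Consume the token itself.
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        if start == pos:
            raise ValueError("truncated PPM header")
        fields.append(data[start:pos])
    if fields[0] != b"P6":
        raise ValueError("not a binary PPM")
    width, height, maxval = (int(f) for f in fields[1:])
    # Exactly one whitespace byte separates the header from the pixels.
    return width, height, maxval, pos + 1


def screendump_is_usable(data):
    """True if the image has a non-empty size and enough pixel bytes."""
    try:
        width, height, _maxval, offset = read_ppm_header(data)
    except ValueError:
        return False
    if width <= 0 or height <= 0:
        return False
    # Assumes maxval < 256, i.e. 3 bytes per pixel.
    return len(data) - offset >= width * height * 3
```

The 0x0 file shown earlier in the thread (P6, 0 0, 255) is rejected by the width and height test before any pixel data would be handed to the image widget.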
Re: [PATCH -tip 2/6 V4.1] x86: add arch-dep register and stack access API to ptrace
Roland McGrath wrote:
> > +static struct pt_regs_offset regoffset_table[] = {
>           ^ const

Oops, exactly. Perhaps, I need to update insn.c to make bitmap tables static const too.

Thank you,

-- 
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division
e-mail: mhira...@redhat.com
[PATCH] kvm-autotest: stepeditor: clear image if width, height, or data are invalid
This patch fixes the following issue: Excerpts from Eduardo Habkost's message of Fri Apr 03 17:37:56 -0300 2009: Excerpts from Ryan Harper's message of Wed Apr 01 12:55:58 -0300 2009: Wondering if anyone else using kvm-autotest stepmaker has ever seen this error: Traceback (most recent call last): File /home/rharper/work/git/build/kvm-autotest/client/tests/kvm_runtest_2/stepmaker. py, line 146, in update self.set_image_from_file(self.screendump_filename) File /home/rharper/work/git/build/kvm-autotest/client/tests/kvm_runtest_2/stepeditor .py, line 499, in set_image_from_file self.set_image(w, h, data) File /home/rharper/work/git/build/kvm-autotest/client/tests/kvm_runtest_2/stepeditor .py, line 485, in set_image w, h, w*3)) MemoryError I've seen this error twice today, while trying to create a step file to install a Windows 2008 R2 64-bit guest (the Win2008-64 step file available on the git repository doesn't work for me). This happened when the guest was being rebooted by the windows installer. The contents of the screen dump file are this: $ cat /home/ehabkost/autotest/kvm-autotest/client/results/default/kvm_runtest_2.Win200 8.64.install/debug/scrdump.ppm P6 0 0 255 $ And the 0x0 pixmap really makes gdk panic: (w, h, data) = ppm_utils.image_read_from_ppm_file('/home/ehabkost/autotest/kvm-autotest/client/results/default/kvm_runtest_2.Win2008.64.install/debug/scrdump.ppm') w,h,data (0, 0, '') gtk.gdk.pixbuf_new_from_data(data, gtk.gdk.COLORSPACE_RGB, False, 8, w, h, w*3) Traceback (most recent call last): File stdin, line 1, in ? 
MemoryError

Signed-off-by: Eduardo Habkost <ehabk...@redhat.com>
---
 client/tests/kvm_runtest_2/stepeditor.py |   14 +-
 1 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/client/tests/kvm_runtest_2/stepeditor.py b/client/tests/kvm_runtest_2/stepeditor.py
index caaf47b..383834b 100755
--- a/client/tests/kvm_runtest_2/stepeditor.py
+++ b/client/tests/kvm_runtest_2/stepeditor.py
@@ -488,14 +488,18 @@ Utilities
         vscrollbar = self.scrolledwindow.get_vscrollbar()
         vscrollbar.set_range(0, h)
 
+    def clear_image(self):
+        self.image.clear()
+        self.image_width = 0
+        self.image_height = 0
+        self.image_data = ""
+
     def set_image_from_file(self, filename):
         if not filename or not os.path.exists(filename):
-            self.image.clear()
-            self.image_width = 0
-            self.image_height = 0
-            self.image_data = ""
-            return
+            return self.clear_image()
         (w, h, data) = ppm_utils.image_read_from_ppm_file(filename)
+        if w <= 0 or h <= 0 or not data:
+            return self.clear_image()
         self.set_image(w, h, data)
 
     def get_step_lines(self, output_dir=None, current_step=None):
-- 
1.5.5.6

-- 
Eduardo
Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest
On Tue, Mar 24, 2009 at 11:47:33AM +0200, Avi Kivity wrote:
>> index 2ea8262..48169d7 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3109,6 +3109,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>>  			kvm_write_guest_time(vcpu);
>>  		if (test_and_clear_bit(KVM_REQ_MMU_SYNC, &vcpu->requests))
>>  			kvm_mmu_sync_roots(vcpu);
>> +		if (test_and_clear_bit(KVM_REQ_MMU_GLOBAL_SYNC, &vcpu->requests))
>> +			kvm_mmu_sync_global(vcpu);
>>  		if (test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests))
>>  			kvm_x86_ops->tlb_flush(vcpu);
>>  		if (test_and_clear_bit(KVM_REQ_REPORT_TPR_ACCESS
>
> Windows will (I think) write a PDE on every context switch, so this
> effectively disables global unsync for that guest.
>
> What about recursively syncing the newly linked page in FNAME(fetch)()?
> If the page isn't global, this becomes a no-op, so no new overhead. The
> only question is the expense when linking a populated top-level page,
> especially in long mode.

How about this?

KVM: MMU: sync global pages on fetch()

If an unsync global page becomes unreachable via the shadow tree, which
can happen if one of its parent pages is zapped, invlpg will fail to
invalidate translations for gvas contained in such unreachable pages.

So sync global pages in fetch().

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 09782a9..728be72 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -308,8 +308,14 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			break;
 		}
 
-		if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
+		if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) {
+			if (level-1 == PT_PAGE_TABLE_LEVEL) {
+				shadow_page = page_header(__pa(sptep));
+				if (shadow_page->unsync && shadow_page->global)
+					kvm_sync_page(vcpu, shadow_page);
+			}
 			continue;
+		}
 
 		if (is_large_pte(*sptep)) {
 			rmap_remove(vcpu->kvm, sptep);
[PATCH -tip 2/6 V4.2] x86: add arch-dep register and stack access API to ptrace
Add the following APIs for accessing registers and stack entries from pt_regs.

- query_register_offset(const char *name)
  Query the offset of the "name" register.

- query_register_name(unsigned offset)
  Query the name of a register by its offset.

- get_register(struct pt_regs *regs, unsigned offset)
  Get the value of a register by its offset.

- valid_stack_address(struct pt_regs *regs, unsigned long addr)
  Check whether the address is in the stack.

- get_stack_nth(struct pt_regs *reg, unsigned nth)
  Get the Nth entry of the stack. (N >= 0)

- get_argument_nth(struct pt_regs *reg, unsigned nth)
  Get the Nth argument at function call. (N >= 0)

changes from v4.1:
 - make regoffset_table constant.
 - remove needless local variable initialization in query_register_*.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Roland McGrath rol...@redhat.com
---
 arch/x86/include/asm/ptrace.h |   66 +
 arch/x86/kernel/ptrace.c      |   60 +
 2 files changed, 126 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index aed0894..51e5844 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -7,6 +7,7 @@
 
 #ifdef __KERNEL__
 #include <asm/segment.h>
+#include <asm/page_types.h>
 #endif
 
 #ifndef __ASSEMBLY__
@@ -215,6 +216,71 @@ static inline unsigned long user_stack_pointer(struct pt_regs *regs)
 	return regs->sp;
 }
 
+/* Query offset/name of register from its name/offset */
+extern int query_register_offset(const char *name);
+extern const char *query_register_name(unsigned offset);
+#define MAX_REG_OFFSET (offsetof(struct pt_regs, ss))
+
+/* Get register value from its offset */
+static inline unsigned long get_register(struct pt_regs *regs, unsigned offset)
+{
+	if (unlikely(offset > MAX_REG_OFFSET))
+		return 0;
+	return *(unsigned long *)((unsigned long)regs + offset);
+}
+
+/* Check the address in the stack */
+static inline int valid_stack_address(struct pt_regs *regs, unsigned long addr)
+{
+	return ((addr & ~(THREAD_SIZE - 1)) ==
+		(kernel_trap_sp(regs) & ~(THREAD_SIZE - 1)));
+}
+
+/* Get Nth entry of the stack */
+static inline unsigned long get_stack_nth(struct pt_regs *regs, unsigned n)
+{
+	unsigned long *addr = (unsigned long *)kernel_trap_sp(regs);
+	addr += n;
+	if (valid_stack_address(regs, (unsigned long)addr))
+		return *addr;
+	else
+		return 0;
+}
+
+/* Get Nth argument at function call */
+static inline unsigned long get_argument_nth(struct pt_regs *regs, unsigned n)
+{
+#ifdef CONFIG_X86_32
+#define NR_REGPARMS 3
+	if (n < NR_REGPARMS) {
+		switch (n) {
+		case 0: return regs->ax;
+		case 1: return regs->dx;
+		case 2: return regs->cx;
+		}
+		return 0;
+#else /* CONFIG_X86_64 */
+#define NR_REGPARMS 6
+	if (n < NR_REGPARMS) {
+		switch (n) {
+		case 0: return regs->di;
+		case 1: return regs->si;
+		case 2: return regs->dx;
+		case 3: return regs->cx;
+		case 4: return regs->r8;
+		case 5: return regs->r9;
+		}
+		return 0;
+#endif
+	} else {
+		/*
+		 * The typical case: arg n is on the stack.
+		 * (Note: stack[0] = return address, so skip it)
+		 */
+		return get_stack_nth(regs, 1 + n - NR_REGPARMS);
+	}
+}
+
 /*
  * These are defined as per linux/ptrace.h, which see.
  */
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index fe9345c..8a8af27 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -46,6 +46,66 @@ enum x86_regset {
 	REGSET_IOPERM32,
 };
 
+struct pt_regs_offset {
+	const char *name;
+	int offset;
+};
+
+#define REG_OFFSET(r) offsetof(struct pt_regs, r)
+#define REG_OFFSET_NAME(r) {.name = #r, .offset = REG_OFFSET(r)}
+#define REG_OFFSET_END {.name = NULL, .offset = 0}
+
+static const struct pt_regs_offset regoffset_table[] = {
+#ifdef CONFIG_X86_64
+	REG_OFFSET_NAME(r15),
+	REG_OFFSET_NAME(r14),
+	REG_OFFSET_NAME(r13),
+	REG_OFFSET_NAME(r12),
+	REG_OFFSET_NAME(r11),
+	REG_OFFSET_NAME(r10),
+	REG_OFFSET_NAME(r9),
+	REG_OFFSET_NAME(r8),
+#endif
+	REG_OFFSET_NAME(bx),
+	REG_OFFSET_NAME(cx),
+	REG_OFFSET_NAME(dx),
+	REG_OFFSET_NAME(si),
+	REG_OFFSET_NAME(di),
+	REG_OFFSET_NAME(bp),
+	REG_OFFSET_NAME(ax),
+#ifdef CONFIG_X86_32
+	REG_OFFSET_NAME(ds),
+	REG_OFFSET_NAME(es),
+	REG_OFFSET_NAME(fs),
+	REG_OFFSET_NAME(gs),
+#endif
+
[PATCH -tip 3/6 V4.1] x86: instruction decorder API
Add x86 instruction decoder to arch-specific libraries. This decoder
can decode all x86 instructions into prefix, opcode, modrm, sib,
displacement and immediates. This can also show the length of
instructions.

changes from v4:
 - make bitmap tables static.

Signed-off-by: Jim Keniston jkeni...@us.ibm.com
Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Andi Kleen a...@linux.intel.com
---
 arch/x86/include/asm/insn.h |  130 +
 arch/x86/lib/Makefile       |    1
 arch/x86/lib/insn.c         |  627 +++
 3 files changed, 758 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/insn.h
 create mode 100644 arch/x86/lib/insn.c

diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
new file mode 100644
index 000..488001f
--- /dev/null
+++ b/arch/x86/include/asm/insn.h
@@ -0,0 +1,130 @@
+#ifndef _ASM_X86_INSN_H
+#define _ASM_X86_INSN_H
+/*
+ * x86 instruction analysis
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2009
+ */
+
+#include <linux/types.h>
+
+/* legacy instruction prefixes */
+#define X86_PFX_OPNDSZ	0x1	/* 0x66 */
+#define X86_PFX_ADDRSZ	0x2	/* 0x67 */
+#define X86_PFX_CS	0x4	/* 0x2E */
+#define X86_PFX_DS	0x8	/* 0x3E */
+#define X86_PFX_ES	0x10	/* 0x26 */
+#define X86_PFX_FS	0x20	/* 0x64 */
+#define X86_PFX_GS	0x40	/* 0x65 */
+#define X86_PFX_SS	0x80	/* 0x36 */
+#define X86_PFX_LOCK	0x100	/* 0xF0 */
+#define X86_PFX_REPE	0x200	/* 0xF3 */
+#define X86_PFX_REPNE	0x400	/* 0xF2 */
+/* REX prefix */
+#define X86_PFX_REX	0x800	/* 0x4X */
+/* REX prefix dissected */
+#define X86_PFX_REX_BASE 0x1000
+#define X86_PFX_REXB	0x1000	/* 0x41 bit */
+#define X86_PFX_REXX	0x2000	/* 0x42 bit */
+#define X86_PFX_REXR	0x4000	/* 0x44 bit */
+#define X86_PFX_REXW	0x8000	/* 0x48 bit */
+
+struct insn_field {
+	union {
+		s32 value;
+		u8 bytes[4];
+	};
+	bool got;	/* true if we've run insn_get_xxx() for this field */
+	u8 nbytes;
+};
+
+struct insn {
+	struct insn_field prefixes;	/* prefixes.value is a bitmap */
+	struct insn_field opcode;	/*
+					 * opcode.bytes[0]: opcode1
+					 * opcode.bytes[1]: opcode2
+					 * opcode.bytes[2]: opcode3
+					 */
+	struct insn_field modrm;
+	struct insn_field sib;
+	struct insn_field displacement;
+	union {
+		struct insn_field immediate;
+		struct insn_field moffset1;	/* for 64bit MOV */
+		struct insn_field immediate1;	/* for 64bit imm or off16/32 */
+	};
+	union {
+		struct insn_field moffset2;	/* for 64bit MOV */
+		struct insn_field immediate2;	/* for 64bit imm or seg16 */
+	};
+
+	u8 opnd_bytes;
+	u8 addr_bytes;
+	u8 length;
+	bool x86_64;
+
+	const u8 *kaddr;	/* kernel address of insn (copy) to analyze */
+	const u8 *next_byte;
+};
+
+#define OPCODE1(insn) ((insn)->opcode.bytes[0])
+#define OPCODE2(insn) ((insn)->opcode.bytes[1])
+#define OPCODE3(insn) ((insn)->opcode.bytes[2])
+
+#define MODRM_MOD(insn) (((insn)->modrm.value & 0xc0) >> 6)
+#define MODRM_REG(insn) (((insn)->modrm.value & 0x38) >> 3)
+#define MODRM_RM(insn) ((insn)->modrm.value & 0x07)
+
+#define SIB_SCALE(insn) (((insn)->sib.value & 0xc0) >> 6)
+#define SIB_INDEX(insn) (((insn)->sib.value & 0x38) >> 3)
+#define SIB_BASE(insn) ((insn)->sib.value & 0x07)
+
+#define MOFFSET64(insn)	(((u64)((insn)->moffset2.value) << 32) | \
+			  (u32)((insn)->moffset1.value))
+
+#define IMMEDIATE64(insn)	(((u64)((insn)->immediate2.value) << 32) | \
+				  (u32)((insn)->immediate1.value))
+
+extern void insn_init(struct insn *insn, const u8 *kaddr, bool x86_64);
+extern void insn_get_prefixes(struct insn *insn);
+extern void insn_get_opcode(struct insn *insn);
+extern void insn_get_modrm(struct
Re: Can't boot guest with more than 3585MB when using large pages
On Tue, Mar 24, 2009 at 04:57:46PM -0500, Ryan Harper wrote:
> * Alex Williamson alex.william...@hp.com [2009-03-24 16:07]:
> > On a 2.6.29, x86_64 host/guest, what's special about specifying a guest
> > size of -m 3586 when using -mem-path backed by hugetlbfs? 3585 works,
> > 3586 hangs here:
> >
> > ...
> > PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> > Placing 64MB software IO TLB between 88002000 - 88002400
> > software IO TLB at phys 0x2000 - 0x2400
> > Memory: 3504832k/4196352k available (2926k kernel code, 524740k absent, 166780k reserved, 1260k data, 496k init)
> >
> > I can back -mem-path by tmpfs or disk and it works fine. Also works
> > with no -mem-path, but it would obviously be nice to benefit from large
> > pages on big guests. The system has plenty of huge pages to back the
> > request, and booting with -mem-prealloc makes no difference. Tested on
> > latest git as of today. Thanks,
>
> I've seen this as well, haven't had a chance to dig into the issue yet
> either. Certainly can test patches if anyone has an idea of what's wrong
> here.

Can you please try the following

--

qemu: kvm: fixup 4GB+ memslot large page alignment

Need to align the 4GB+ memslot after we know its address, not before.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/qemu/hw/pc.c b/qemu/hw/pc.c
index d4a4320..cc84772 100644
--- a/qemu/hw/pc.c
+++ b/qemu/hw/pc.c
@@ -866,6 +866,7 @@ static void pc_init1(ram_addr_t ram_size, int vga_ram_size,
 
     /* above 4giga memory allocation */
     if (above_4g_mem_size > 0) {
+        ram_addr = qemu_ram_alloc(above_4g_mem_size);
         if (hpagesize) {
             if (ram_addr & (hpagesize-1)) {
                 unsigned long aligned_addr;
@@ -874,7 +875,6 @@ static void pc_init1(ram_addr_t ram_size, int vga_ram_size,
                 ram_addr = aligned_addr;
             }
         }
-        ram_addr = qemu_ram_alloc(above_4g_mem_size);
         cpu_register_physical_memory(0x1ULL, above_4g_mem_size, ram_addr);
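Why the order matters can be shown with plain arithmetic. This is a small sketch only, not qemu code: the 2 MiB huge page size, the `align_up` helper, and the two `place_*` functions with their stand-in allocator are all assumptions made for illustration. Aligning a stale address and then overwriting it with the allocator's result discards the alignment; aligning the allocator's result keeps it.

```python
HPAGE_SIZE = 2 * 1024 * 1024  # assumed huge page size (2 MiB)


def align_up(addr, alignment):
    """Round addr up to the next multiple of a power-of-two alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return (addr + alignment - 1) & ~(alignment - 1)


def place_align_then_alloc(stale_addr, alloc, size):
    """Old order: the aligned value is thrown away by the allocation."""
    addr = align_up(stale_addr, HPAGE_SIZE)
    addr = alloc(size)  # overwrites the aligned address
    return addr


def place_alloc_then_align(alloc, size):
    """Patched order: align the address the allocator actually returned."""
    addr = alloc(size)
    return align_up(addr, HPAGE_SIZE)


def fake_alloc(size):
    """Stand-in allocator that returns an offset not aligned to 2 MiB."""
    return 0x00345000


old = place_align_then_alloc(0x00200000, fake_alloc, 64 * 1024 * 1024)
new = place_alloc_then_align(fake_alloc, 64 * 1024 * 1024)
```

In this model `old` stays at the unaligned offset while `new` is a multiple of the huge page size, which matches the one-line move in the patch.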
Re: [PATCH -tip 3/6 V4.1] x86: instruction decorder API
Masami Hiramatsu wrote:
> Add x86 instruction decoder to arch-specific libraries. This decoder
> can decode all x86 instructions into prefix, opcode, modrm, sib,
> displacement and immediates. This can also show the length of
> instructions.
>
> changes from v4:
>  - make bitmap tables static.

Hi Masami,

On the surface the overall structure looks fine, but I have a couple of
concerns:

1. is this meant to be able to decode userspace code or just kernel
code? If it is supposed to be able to decode userspace code, is there a
reason you're not dealing with 16-bit or V86 mode code at all? If not,
why are you including the 32-bit decoder in a 64-bit kernel (as well as
instructions which we're pretty much guaranteed to never use in the
kernel, such as ENTER.)

2. you're already not dealing with all existing three-byte opcode
spaces, nor with DREX or VEX encodings for upcoming processors. This
doesn't matter so much for the kernel, but it does matter if this is
supposed to be used for user-space code.

3. is there any need to deal with instruction set differences among
processors? (Again, this depends on the usage model.)

4. you have a bunch of magic opcode constants all over the place. This
means that as new instructions come in -- and they're going to be coming
in -- this is going to be hard to update. It would be cleaner if we
could have an intermediate format that preprocesses down to all the
relevant tables and perhaps even some of the code rather than
open-coding everything in C.

This matters... for example you have:

> +	} else if (opcode == 0xea /* jmp far seg:offs */) {
> +		__get_immptr(insn);

... but nothing similar for opcode 0x9a. This is extremely hard to spot
with this kind of structure.

The more data-driven we can make it (without bloating the code too much)
the better off we are, I believe.

	-hpa
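To make point 4 concrete, here is a minimal sketch of what a data-driven table could look like. It is not the proposed decoder and not the INAT format: the flag names, `ONE_BYTE_OPCODES`, and `immediate_length` are hypothetical, the table covers only five one-byte opcodes, and it models 16/32-bit modes only (in 64-bit mode 0x9a and 0xea are invalid and the relative forms are not sized by the operand-size prefix in the same way). With both far-pointer opcodes listed side by side, an omission like the 0x9a one is much easier to notice.

```python
from enum import IntFlag


class OpFlag(IntFlag):
    """Hypothetical attribute bits describing an opcode's immediate."""
    NONE = 0
    REL8 = 1        # 8-bit relative displacement
    REL_OPSIZE = 2  # relative displacement of operand size
    FAR_PTR = 4     # far pointer: 16-bit selector plus operand-size offset


# Illustrative subset of the one-byte opcode map.
ONE_BYTE_OPCODES = {
    0x9A: ("call far seg:offs", OpFlag.FAR_PTR),
    0xE8: ("call rel", OpFlag.REL_OPSIZE),
    0xE9: ("jmp rel", OpFlag.REL_OPSIZE),
    0xEA: ("jmp far seg:offs", OpFlag.FAR_PTR),
    0xEB: ("jmp short", OpFlag.REL8),
}


def immediate_length(opcode, opnd_bytes=4):
    """Return the immediate size in bytes for an opcode in the table.

    opnd_bytes is the effective operand size (2 or 4); opcodes missing
    from the table raise KeyError.
    """
    _name, flags = ONE_BYTE_OPCODES[opcode]
    if flags & OpFlag.FAR_PTR:
        return opnd_bytes + 2  # offset plus 16-bit segment selector
    if flags & OpFlag.REL_OPSIZE:
        return opnd_bytes
    if flags & OpFlag.REL8:
        return 1
    return 0
```

Because the length rule is keyed on the flag rather than on the opcode value, 0x9a and 0xea cannot diverge: `immediate_length(0x9A)` and `immediate_length(0xEA)` go through the same branch.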
Re: [PATCH -tip 3/6 V4.1] x86: instruction decorder API
Hi Peter,

H. Peter Anvin wrote:
> Masami Hiramatsu wrote:
>> Add x86 instruction decoder to arch-specific libraries. This decoder
>> can decode all x86 instructions into prefix, opcode, modrm, sib,
>> displacement and immediates. This can also show the length of
>> instructions.
>>
>> changes from v4:
>>  - make bitmap tables static.
>
> Hi Masami,
>
> On the surface the overall structure looks fine, but I have a couple of
> concerns:
>
> 1. is this meant to be able to decode userspace code or just kernel
> code? If it is supposed to be able to decode userspace code, is there a
> reason you're not dealing with 16-bit or V86 mode code at all? If not,
> why are you including the 32-bit decoder in a 64-bit kernel (as well as
> instructions which we're pretty much guaranteed to never use in the
> kernel, such as ENTER.)

Actually, this aims to decode both user-space and kernel code.
At this point, it just needs to cover kernel code, because kprobes
only wants to decode the kernel binary. However, this is just a starting
point; uprobe developers want to use it to decode user-space code.
In that case, it needs to be enhanced.

> 2. you're already not dealing with all existing three-byte opcode
> spaces, nor with DREX or VEX encodings for upcoming processors. This
> doesn't matter so much for the kernel, but it does matter if this is
> supposed to be used for user-space code.
>
> 3. is there any need to deal with instruction set differences among
> processors? (Again, this depends on the usage model.)

Agreed. When it supports decoding user-space code, it should support
all kinds of instructions.

> 4. you have a bunch of magic opcode constants all over the place. This
> means that as new instructions come in -- and they're going to be coming
> in -- this is going to be hard to update. It would be cleaner if we
> could have an intermediate format that preprocesses down to all the
> relevant tables and perhaps even some of the code rather than
> open-coding everything in C.
>
> This matters... for example you have:
>
>> +	} else if (opcode == 0xea /* jmp far seg:offs */) {
>> +		__get_immptr(insn);
>
> ... but nothing similar for opcode 0x9a. This is extremely hard to spot
> with this kind of structure.

Oops, that should be a bug.
Hmm, I think we'd better use bit-flag tables for classifying opcodes.

Jim, can your INAT idea help this situation?
http://sources.redhat.com/ml/systemtap/2009-q2/msg00109.html

> The more data-driven we can make it (without bloating the code too much)
> the better off we are, I believe.
>
> 	-hpa

Thank you for the good advice!

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhira...@redhat.com
[ kvm-Bugs-2729621 ] usb_add on garmin gps fails
Bugs item #2729621, was opened at 2009-04-03 21:02
Message generated for change (Tracker Item Submitted) made by byron_clark
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2729621&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: qemu
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Byron Clark (byron_clark)
Assigned to: Nobody/Anonymous (nobody)
Summary: usb_add on garmin gps fails

Initial Comment:
When I attempt to usb_add my Garmin Venture HC GPS with this command:

usb_add host:091e:0003

I get the following error:

usb_linux_update_endp_table: Broken pipe

strace log:
write(1, "husb: open device 5.3\n", 22) = 22
open("/proc/bus/usb/005/003", O_RDWR|O_NONBLOCK) = 26
read(26, "\22\1\20\1\377\377\...@\36\t\3\0\1\0\0\0\0\1\t\2'\0\1\1\0\300\0\t\4\0\0\3"..., 1024) = 57
write(1, "husb: config #1 need -1\n", 24) = 24
ioctl(26, USBDEVFS_IOCTL, 0x7fff0bb74300) = -1 ENODATA (No data available)
ioctl(26, USBDEVFS_CLAIMINTERFACE, 0x7fff0bb7431c) = 0
write(1, "husb: 1 interfaces claimed for c"..., 47) = 47
ioctl(26, USBDEVFS_CONNECTINFO, 0x7fff0bb74750) = 0
write(1, "husb: grabbed usb device 5.3\n", 29) = 29
ioctl(26, USBDEVFS_CONTROL, 0x7fff0bb742f0) = -1 EPIPE (Broken pipe)
dup(2) = 27
fcntl(27, F_GETFL) = 0x8802 (flags O_RDWR|O_NONBLOCK|O_LARGEFILE)
fstat(27, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 9), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4e03b67000
lseek(27, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
write(27, "usb_linux_update_endp_table: Bro"..., 41) = 41
close(27) = 0

cpu:
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz
stepping        : 11
cpu MHz         : 2194.427
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida tpr_shadow vnmi flexpriority
bogomips        : 4388.85
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

host
distribution: debian sid
bitness: 64

guest
distribution: windows xp
bitness: 32
[ kvm-Bugs-2729621 ] usb_add on garmin gps fails
Bugs item #2729621, was opened at 2009-04-03 21:02
Message generated for change (Comment added) made by byron_clark
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2729621&group_id=180599

Category: qemu
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Byron Clark (byron_clark)
Assigned to: Nobody/Anonymous (nobody)
Summary: usb_add on garmin gps fails

Comment By: Byron Clark (byron_clark)
Date: 2009-04-03 21:08

Message:
kvm-84, kernel 2.6.29.
Re: NetBSD and device trees
On Fri, Apr 3, 2009 at 3:32 AM, Hollis Blanchard holl...@us.ibm.com wrote:
> (I'll address the MMU issue in a separate mail.)
>
> On Thu, 2009-04-02 at 11:56 -0700, Rahul Kulkarni wrote:
>>> Another potential issue could be the initial environment (described
>>> earlier as option 2) not being what BSD expects. Do you use u-boot? You
>>> can see the initial environment set up in kvm_arch_vcpu_setup() in KVM
>>> and mpc8544ds_init() in Qemu.
>>
>> Rahul: Yes, I will look into those functions. We do use u-boot. Are you
>> hinting to go with option 1?
>
> If you use u-boot then you might not have much work to do (option 2 will
> probably work for you with few changes).
>
>>> Does NetBSD use flattened device trees at all? KVM (Qemu) supplies a
>>> stripped-down device tree to the guest so that the guest won't try to
>>> access IO devices not currently emulated by qemu. If BSD has a
>>> hardcoded device configuration system (e.g. we built for 8544,
>>> therefore we always have the following SoC devices) that will be an
>>> issue.
>>
>> Rahul: The device config is hardcoded in our NetBSD code base (more so
>> because of the embedded nature, it's a preferred way), but since I see
>> NetBSD supported on Qemu, I would think there is support available for
>> a flattened device tree to be passed in from qemu. I'll look at x86
>> implementations.
>
> Really quick history:
>
> Traditionally, desktop/server PowerPC had Open Firmware (IEEE1275). Open
> Firmware provides runtime services (sometimes including IP stack, disk
> drivers, filesystems, etc), and those services allow the kernel to
> retrieve a device tree describing the physical topology of the system.
>
> The runtime services (callbacks) are relatively high overhead for
> embedded systems, so traditionally embedded PowerPC systems used
> something simpler (ppcboot/u-boot, redboot, CFE, homebrew, etc). These
> systems usually hardcoded the expected set of IO devices at build time.
>
> However, in recent years Linux developers have found that the
> flexibility granted by the device tree is invaluable, even without the
> runtime services. So they developed a flat device tree data structure
> (flat because it's a contiguous in-memory format representing a tree),
> and had firmware (especially u-boot) pass that tree to the kernel as a
> binary blob.
>
> The takeaway here is that the flat device tree is so far mostly a
> PowerPC Linux specific concept. Although the idea is beginning to catch
> on with other architectures and kernels, I expect that NetBSD doesn't
> know anything about it, and x86 Linux doesn't either.

Hmm, learnt a lot. Thanks.
Seems qemu is going to adopt the flat device tree. :)

> So since PowerPC NetBSD has build-time tables describing the hardware it
> will try to use, I see the following options:
>
> 1) Teach NetBSD about flat device trees. Probably a lot of work.
>
> 2) Emulate more 85xx hardware in qemu. Maybe an easy to medium amount of
> work, depending on the complexity and number of the IO devices.
>
> 3) Build a special NetBSD kernel with modified tables appropriate for
> qemu. Probably the easiest/quickest way, but if your long-term goal is
> to run unmodified NetBSD kernels built for real hardware, this is only a
> prototyping step.
>
> If you have more than one person playing with this, #2 could be done in
> conjunction with #3, until you've emulated all the necessary devices.
>
> Also, if you do #2, you could actually use qemu (without KVM) as a
> development environment on normal x86 Linux or Windows workstations (I
> think virtual prototyping or virtual platforms is the buzzword these
> days). This might be a benefit for your internal software development
> processes.

btw, why did you give up virtio-net and change to e1000 on guest 440?

> If there is interest (or maybe even existing work) in the NetBSD
> community for flat device tree support, you may be able to team up with
> other developers to tackle problem #1. To find out, I would post to
> devicetree-disc...@ozlabs.org asking if they've heard of NetBSD work,
> and also NetBSD/PowerPC mailing lists to see if they've heard of device
> tree work.

It will be better to cc here...
-- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: MMU tricks for NetBSD guests
On Fri, 2009-04-03 at 00:52 +0200, Alexander Graf wrote:
> That sounds a lot like what I implemented for real mode on 970. I
> assume the PID is similar to a full SLB context and AS=1/AS=0 is just
> another bit that could as well be in the PID?

Mostly... however, when an interrupt occurs, AS is set to 0 and PID
remains unchanged. Also, AS can have different settings for instruction
and data fetches. (I've been abbreviating as MSR[AS], but technically I
should be writing MSR[IS] for instructions or MSR[DS] for data).

> So what we do on 970[1] is we treat real mode as yet another vsid. 970
> translates EA -> VA -> RA. It looks like booke does the same, with the
> VSID coming from the PID.

Exactly -- Book E uses AS | PID to provide the VSID, while Book S uses
the SLB. The Book E way is much simpler, and also avoids the effective
address collision problem we ran into on 970, because AS/PID don't
depend on the EA.

-- 
Hollis Blanchard
IBM Linux Technology Center
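The AS | PID idea can be modelled in a few lines. This is a rough sketch under stated assumptions, not KVM code: the 8-bit PID width, the function names, and the way the two values are packed into one integer are all illustrative choices. It only shows the two properties mentioned above: the address-space identifier does not depend on the effective address, and an interrupt clears the AS bits while leaving the PID alone.

```python
PID_BITS = 8  # assumed PID width; the real width is implementation-specific


def booke_space_id(as_bit, pid):
    """Combine the address-space bit and the PID into one identifier.

    The effective address is not an input, so two different EAs in the
    same context always share the same identifier.
    """
    if as_bit not in (0, 1):
        raise ValueError("AS must be 0 or 1")
    if not 0 <= pid < (1 << PID_BITS):
        raise ValueError("PID out of range")
    return (as_bit << PID_BITS) | pid


def take_interrupt(msr_is, msr_ds, pid):
    """Model interrupt entry: MSR[IS] and MSR[DS] become 0, PID is kept."""
    return 0, 0, pid


# A context running with AS=1 for both instruction and data fetches.
before = booke_space_id(1, 5)
msr_is, msr_ds, pid = take_interrupt(1, 1, 5)
after = booke_space_id(msr_is, pid)
```

In this model the interrupt handler runs in a different space (`after`) from the interrupted context (`before`) even though the PID register was never written.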