Re: [PATCH 3/4] KVM: Switch to srcu-less get_dirty_log()
On Fri, 16 Mar 2012 13:03:48 +0800 Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:

> For my quickly review, mmu_lock can not protect everything, if the guest page

Yes and ...

> is written out of the shadow page/ept table, dirty page will be lost.

No.

> There is a example:
>
> CPU A                                      CPU B
> guest page is written by write-emulation
>                                            hold mmu-lock and see dirty-bitmap
>                                            is not be changed, then migration
>                                            is completed.

We do not allow this break.

> call mark_page_dirty() to set dirty_bit map
>
> Right?

As you pointed out, we cannot assume mutual exclusion by mmu_lock. That is why we are using atomic bitmap operations: xchg and set_bit.

In this sense we are at least guaranteed to get the dirty page information in dirty_bitmap - the current one or next one.

So what we should care about is to not miss the information written in the next bitmap at the time we actually migrate the guest.

Actually the userspace stops the guest at the final stage and then send the remaining pages found in the bitmap. So the above break between write and mark_page_dirty() cannot happen IIUC.

Thanks,
Takuya
--
To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
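The xchg/set_bit argument above can be sketched in user space with C11 atomics. This is only an illustration under assumptions, not the KVM code: mark_dirty() and harvest_dirty() are hypothetical names, atomic_fetch_or stands in for set_bit and atomic_exchange for xchg. Because both operations are atomic on the same word, a bit set concurrently lands either in the snapshot being taken or in the bitmap seen by the next harvest; it is never dropped.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Writer side (hypothetical): mark one page dirty with an atomic OR,
 * playing the role of set_bit(). */
static void mark_dirty(_Atomic unsigned long *bitmap, size_t page)
{
    assert(bitmap != NULL);
    atomic_fetch_or(&bitmap[page / BITS_PER_WORD],
                    1UL << (page % BITS_PER_WORD));
}

/* Reader side (hypothetical): fetch and clear each word in one atomic
 * step, playing the role of xchg(). Bits set after the exchange of a
 * word stay in the bitmap and are picked up by the next call. */
static void harvest_dirty(_Atomic unsigned long *bitmap,
                          unsigned long *snapshot, size_t nwords)
{
    assert(bitmap != NULL && snapshot != NULL);
    for (size_t i = 0; i < nwords; i++)
        snapshot[i] = atomic_exchange(&bitmap[i], 0UL);
}
```

In this model the "final" harvest is only complete if no writer is still running, which is the role the stopped guest plays in the discussion.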
Re: [PATCH 3/4] KVM: Switch to srcu-less get_dirty_log()
On 03/16/2012 02:55 PM, Takuya Yoshikawa wrote:

> On Fri, 16 Mar 2012 13:03:48 +0800 Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:
>
>> For my quickly review, mmu_lock can not protect everything, if the guest page
>
> Yes and ...
>
>> is written out of the shadow page/ept table, dirty page will be lost.
>
> No.
>
>> There is a example:
>>
>> CPU A                                      CPU B
>> guest page is written by write-emulation
>>                                            hold mmu-lock and see dirty-bitmap
>>                                            is not be changed, then migration
>>                                            is completed.
>
> We do not allow this break.

Hmm? what can avoid this? Could you please point it out?

>> call mark_page_dirty() to set dirty_bit map
>>
>> Right?
>
> As you pointed out, we cannot assume mutual exclusion by mmu_lock. That is why we are using atomic bitmap operations: xchg and set_bit.
>
> In this sense we are at least guaranteed to get the dirty page information in dirty_bitmap - the current one or next one.

The problem is the guest page is written before dirty-bitmap is set, we may log the dirty page in this window like above case...

> So what we should care about is to not miss the information written in the next bitmap at the time we actually migrate the guest.

Actually, the way log dirty page in MMU page-table is tricky:

  set dirty-bitmap
  allow spte to be writeable
  page can be written

That means we always set dirty-bitmap _before_ page become dirty that is the reason why your bitmap-way can work.

> Actually the userspace stops the guest at the final stage and then send the remaining pages found in the bitmap. So the above break between write and mark_page_dirty() cannot happen IIUC.

Maybe i'd better firstly understand why We do not allow this break :)
Re: [PATCH 3/4] KVM: Switch to srcu-less get_dirty_log()
On Fri, 16 Mar 2012 15:30:45 +0800 Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:

>>> There is a example:
>>>
>>> CPU A                                      CPU B
>>> guest page is written by write-emulation
>>>                                            hold mmu-lock and see dirty-bitmap
>>>                                            is not be changed, then migration
>>>                                            is completed.
>>
>> We do not allow this break.
>
> Hmm? what can avoid this? Could you please point it out?

Stopping the guest before actually migrating the guest means VCPU threads must be back in the userspace at the moment, no?

So when the final GET_DIRTY_LOG is being executed, thread A cannot be in KVM.

> The problem is the guest page is written before dirty-bitmap is set, we may log the dirty page in this window like above case...

Exactly, but the next GET_DIRTY_LOG call can take that because, as I wrote above, at this time the GET_DIRTY_LOG must not be the final one.

Makes sense?

Takuya
[PATCH/RFC] kvm/powerpc: Add new ioctl to retreive support page sizes and encodings
This is necessary for qemu to be able to pass the right information to the guest, as the supported sizes and encodings can vary depending on the machine, the type of KVM used (PR vs HV) and the version of KVM.

Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org
---

Please comment ASAP. I'm tired of the qemu side never working properly because of that and the nasty out-of-tree patches we've been carrying internally, so I'd like to get something like that in real quick :-)

I have the qemu side patches that use this to generate the appropriate device-tree when available, and use heuristics for the fallback. I'll post them later, let's agree on the kernel interfaces first.

The heuristics work as long as we have a reasonable guarantee that this kernel patch will get in -before- any patch that enables the PVINFO ioctl on HV KVM; that way I can rely on the latter not working as a way to differentiate PR and HV KVM if this new ioctl is not supported.

Note: We probably want another ioctl for getting other types of MMU info, such as whether we support 1T segments etc... but I didn't want to try to kill too many birds at once and end up in bike shed painting on the mailing list for the next 6 months...

Cheers,
Ben.

 arch/powerpc/include/asm/kvm_ppc.h |    3 ++-
 arch/powerpc/kvm/book3s_hv.c       |   35 +++
 arch/powerpc/kvm/book3s_pr.c       |   22 ++
 arch/powerpc/kvm/powerpc.c         |   18 +-
 include/linux/kvm.h                |   29 +
 5 files changed, 105 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index c1069f6..bf530fd 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -140,7 +140,8 @@ extern int kvmppc_core_prepare_memory_region(struct kvm *kvm,
 				struct kvm_userspace_memory_region *mem);
 extern void kvmppc_core_commit_memory_region(struct kvm *kvm,
 				struct kvm_userspace_memory_region *mem);
-
+extern int kvm_vm_ioctl_get_page_sizes(struct kvm *kvm,
+				       struct kvm_ppc_page_sizes *ps);
 extern int kvmppc_bookehv_init(void);
 extern void kvmppc_bookehv_exit(void);
 
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8ee46b9..c7f7f20 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1174,6 +1174,37 @@ long kvm_vm_ioctl_allocate_rma(struct kvm *kvm, struct kvm_allocate_rma *ret)
 	return fd;
 }
 
+static void kvmppc_add_seg_page_size(struct kvm_ppc_one_seg_page_size **sps,
+				     int linux_psize)
+{
+	struct mmu_psize_def *def = &mmu_psize_defs[linux_psize];
+
+	if (!def->shift)
+		return;
+	*sps->page_shift = def->shift;
+	*sps->slb_enc = def->sllp;
+	*sps->enc[0].page_shift = def->shift;
+	*sps->enc[0].pte_enc = def->penc;
+	*sps++;
+}
+
+int kvm_vm_ioctl_get_page_sizes(struct kvm *kvm, struct kvm_ppc_page_sizes *ps)
+{
+	struct kvm_ppc_one_seg_page_size *sps;
+	int i;
+
+	/* Page sizes limited by backing store */
+	ps->flags = KVM_PPC_PAGE_SIZES_REAL;
+
+	/* We only support these sizes for now, and no muti-size segments */
+	sps = &ps->sps[0];
+	kvmppc_add_seg_page_size(&sps, MMU_PAGE_4K);
+	kvmppc_add_seg_page_size(&sps, MMU_PAGE_64K);
+	kvmppc_add_seg_page_size(&sps, MMU_PAGE_16M);
+
+	return 0;
+}
+
 /*
  * Get (and clear) the dirty memory log for a memory slot.
  */
@@ -1211,6 +1242,10 @@ out:
 	return r;
 }
 
+int kvm_vm_ioctl_get_page_sizes(struct kvm *kvm, struct kvm_ppc_page_sizes *ps)
+{
+}
+
 static unsigned long slb_pgsize_encoding(unsigned long psize)
 {
 	unsigned long senc = 0;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 5f0ee48..3c823ed 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1154,6 +1154,28 @@ out:
 	return r;
 }
 
+#ifdef CONFIG_PPC64
+int kvm_vm_ioctl_get_page_sizes(struct kvm *kvm, struct kvm_ppc_page_sizes *ps)
+{
+	/* No flags */
+	ps->flags = 0;
+
+	/* Standard 4k base page size segment */
+	ps->sps[0].page_shift = 12;
+	ps->sps[0].slb_enc = 0;
+	ps->sps[0].enc[0].page_shift = 12;
+	ps->sps[0].enc[0].pte_enc = 0;
+
+	/* Standard 16M large page size segment */
+	ps->sps[1].page_shift = 24;
+	ps->sps[1].slb_enc = SLB_VSID_L;
+	ps->sps[1].enc[0].page_shift = 24;
+	ps->sps[1].enc[0].pte_enc = 0;
+
+	return 0;
+}
+#endif /* CONFIG_PPC64 */
+
 int kvmppc_core_prepare_memory_region(struct kvm *kvm,
 				      struct kvm_userspace_memory_region *mem)
 {
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 6ac3115..6f0c066 100644
--- a/arch/powerpc/kvm/powerpc.c
+++
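A note on the HV helper above, which fills one entry through a pointer-to-pointer cursor and then advances it. As the code reads in this archive (which has lost some characters, so the posted patch may have differed), the cursor is dereferenced without parentheses. In C, `->` and postfix `++` bind tighter than unary `*`, so a parenthesized form is needed for the caller's cursor to move. The sketch below shows that form; the struct layouts are simplified, hypothetical stand-ins, since the real definitions live in the include/linux/kvm.h hunk that is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the structures in the patch. */
struct one_enc {
    unsigned int page_shift;
    unsigned int pte_enc;
};

struct one_seg_page_size {
    unsigned int page_shift;
    unsigned int slb_enc;
    struct one_enc enc[1];
};

struct psize_def {
    unsigned int shift;   /* 0 means "size not supported" */
    unsigned int sllp;
    unsigned int penc;
};

/* Fill the slot the cursor points at, then advance the caller's cursor. */
static void add_seg_page_size(struct one_seg_page_size **sps,
                              const struct psize_def *def)
{
    assert(sps != NULL && *sps != NULL && def != NULL);
    if (!def->shift)
        return;                      /* unsupported: cursor stays put */
    (*sps)->page_shift = def->shift;
    (*sps)->slb_enc = def->sllp;
    (*sps)->enc[0].page_shift = def->shift;
    (*sps)->enc[0].pte_enc = def->penc;
    (*sps)++;                        /* moves the caller's pointer */
}
```

A caller would start with a cursor at the first array element and pass its address on every call, so unsupported sizes simply leave no gap in the output array.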
[PATCH v2] kvm/book3s: Make kernel emulated H_PUT_TCE available for PR KVM
There is nothing in the code for emulating TCE tables in the kernel that prevents it from working on PR KVM... other than ifdef's and location of the code. This renames book3s_64_vio_hv.c to book3s_64_vio.c and moves the bulk of the code there. This speeds things up a bit on my G5. --- v2. Changed the ifdef as per discussion with Alex. I still didn't manage to get git to figure out the rename but that's no big deal, the old file had only one small function in it. There's no code change, you can trust me on that one, It's really just moving things around :-) arch/powerpc/include/asm/kvm_host.h |4 +- arch/powerpc/include/asm/kvm_ppc.h |2 + arch/powerpc/kvm/Makefile |3 +- arch/powerpc/kvm/book3s_64_vio.c| 187 +++ arch/powerpc/kvm/book3s_64_vio_hv.c | 73 -- arch/powerpc/kvm/book3s_hv.c| 109 arch/powerpc/kvm/book3s_pr.c|3 + arch/powerpc/kvm/book3s_pr_papr.c | 18 arch/powerpc/kvm/powerpc.c |8 +- 9 files changed, 221 insertions(+), 186 deletions(-) create mode 100644 arch/powerpc/kvm/book3s_64_vio.c delete mode 100644 arch/powerpc/kvm/book3s_64_vio_hv.c diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 42a527e..d848cdc 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -237,7 +237,6 @@ struct kvm_arch { unsigned long vrma_slb_v; int rma_setup_done; int using_mmu_notifiers; - struct list_head spapr_tce_tables; spinlock_t slot_phys_lock; unsigned long *slot_phys[KVM_MEM_SLOTS_NUM]; int slot_npages[KVM_MEM_SLOTS_NUM]; @@ -245,6 +244,9 @@ struct kvm_arch { struct kvmppc_vcore *vcores[KVM_MAX_VCORES]; struct kvmppc_linear_info *hpt_li; #endif /* CONFIG_KVM_BOOK3S_64_HV */ +#ifdef CONFIG_PPC_BOOK3S_64 + struct list_head spapr_tce_tables; +#endif }; /* diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 7f0a3da..c1069f6 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -126,6 +126,8 @@ extern void 
kvmppc_map_vrma(struct kvm_vcpu *vcpu, extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu); extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm, struct kvm_create_spapr_tce *args); +extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn, +unsigned long ioba, unsigned long tce); extern long kvm_vm_ioctl_allocate_rma(struct kvm *kvm, struct kvm_allocate_rma *rma); extern struct kvmppc_linear_info *kvm_alloc_rma(void); diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile index 25225ae..8c95def 100644 --- a/arch/powerpc/kvm/Makefile +++ b/arch/powerpc/kvm/Makefile @@ -54,6 +54,7 @@ kvm-book3s_64-objs-$(CONFIG_KVM_BOOK3S_64_PR) := \ book3s_paired_singles.o \ book3s_pr.o \ book3s_pr_papr.o \ + book3s_64_vio.o \ book3s_emulate.o \ book3s_interrupts.o \ book3s_mmu_hpte.o \ @@ -70,7 +71,7 @@ kvm-book3s_64-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \ book3s_hv_rmhandlers.o \ book3s_hv_rm_mmu.o \ - book3s_64_vio_hv.o \ + book3s_64_vio.o \ book3s_hv_builtin.o kvm-book3s_64-module-objs := \ diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c new file mode 100644 index 000..193ba68 --- /dev/null +++ b/arch/powerpc/kvm/book3s_64_vio.c @@ -0,0 +1,187 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + * Copyright 2010 Paul Mackerras, IBM Corp. 
pau...@au1.ibm.com + * Copyright 2011 David Gibson, IBM Corporation d...@au1.ibm.com + */ + +#include linux/types.h +#include linux/string.h +#include linux/kvm.h +#include linux/kvm_host.h +#include linux/highmem.h +#include linux/gfp.h +#include linux/slab.h +#include linux/hugetlb.h +#include linux/list.h +#include linux/anon_inodes.h + +#include asm/tlbflush.h +#include asm/kvm_ppc.h +#include asm/kvm_book3s.h +#include asm/mmu-hash64.h +#include asm/hvcall.h +#include asm/synch.h +#include asm/ppc-opcode.h +#include asm/kvm_host.h +#include asm/udbg.h + +#define TCES_PER_PAGE
Re: [Qemu-devel] pci-assign can not work
At 03/16/2012 04:27 PM, Jan Kiszka Wrote:
> On 2012-03-16 03:38, Wen Congyang wrote:
>> At 03/15/2012 06:21 PM, Wen Congyang Wrote:
>>> Hi all
>>>
>>> When I use pci-assign, I meet the following error:
>>> Failed to assign irq for hostdev0: Input/output error
>>> Perhaps you are assigning a device that shares an IRQ with another device?
>>>
>>> Is it a bug or I miss something?
>>
>> Hi, Jan
>>
>> This problem is caused by your patch:
>> commit 6919115a8715c34cd80baa08422d90496f11f5d7
>> Author: Jan Kiszka jan.kis...@siemens.com
>> Date:   Thu Mar 8 11:10:27 2012 +0100
>>
>>     pci_assign: Flip defaults of prefer_msi and share_intx
>>
>>     INTx sharing is a bit more expensive than exclusive host interrupts, but this channel is not supposed to be used for high-performance scenarios anyway. Modern devices support MSI/MSI-X and do not depend on using INTx under critical workload, real old devices do not support INTx sharing anyway.
>>
>>     For those in the middle, the user experience is much better if they just work even when IRQ sharing is required. If there is nothing to share, share_intx=off can still be applied as tuning parameter.
>>
>>     With INTx sharing as default, the primary reason for prefer_msi=on is gone. Make it default off, specifically as it is known to cause troubles with devices that have incomplete/broken MSI support or otherwise stumble if host IRQ configuration does not match guest driver expectation.
>>
>>     Acked-by: Alex Williamson alex.william...@redhat.com
>>     Signed-off-by: Jan Kiszka jan.kis...@siemens.com
>>     Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
>>
>> If I revert this commit. qemu can work.
>
> This should be solvable by passing prefer_msi=on to the pci-assign device, or likely by updating your host kernel to latest kvm.git (to enable INTx sharing).

Is there some way to find out if the kernel supports to enable INTx sharing?

Thanks
Wen Congyang

> Hmm, unfortunate. We needed a conditional default for the prefer_msi property here. If INTx sharing doesn't work for some reason AND the user did not ask for disabling the host-side MSI usage, we should fall back to it again.
>
> Markus, is there some easy way to find out if a specific qdev property was set due to a command line switch or was defined by the default value?
>
> Jan
Re: [PATCH 3/4] KVM: Switch to srcu-less get_dirty_log()
On Fri, 16 Mar 2012 16:28:56 +0800 Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:

> Thanks for your explanation, maybe you are right, i do not know migration much.
>
> What i worried about is, you have changed the behaviour of GET_DIRTY_LOG, in the current one, it can get all the dirty pages when it is called; after your change, GET_DIRTY_LOG can get a empty dirty bitmap but dirty page exists.

The current code also see the same situation because nothing prevents the guest from writing to pages before GET_DIRTY_LOG returns and the userspace checks the bitmap. Everything is running.

> Migration may work correctly depends on the final GET_DIRTY_LOG, in that time, guest is stopped. But i am not sure whether other components using GET_DIRTY_LOG are happy, e.g. frame-buffer.

Ah, you are probably worrying about what I discussed with Avi before.

In the end we cannot get a complete snapshot without stopping the guest like migration does. So that cannot be guaranteed.

The only thing it can promise is to make it possible to get the log after mark_page_dirty(). Even when the bit is not marked at Nth GET_DIRTY_LOG time, we should be able to get it at (N+1)th call - maybe N+2.

For VGA, the display continues to update endlessly and each page will be updated at some time: of course there may be a bit of time lag.

BTW marking framebuffer pages is through MMU and mmu_lock protected.

Thanks,
Takuya
Re: [Qemu-devel] pci-assign can not work
On 2012-03-16 09:38, Wen Congyang wrote:
> At 03/16/2012 04:27 PM, Jan Kiszka Wrote:
>> On 2012-03-16 03:38, Wen Congyang wrote:
>>> At 03/15/2012 06:21 PM, Wen Congyang Wrote:
>>>> Hi all
>>>>
>>>> When I use pci-assign, I meet the following error:
>>>> Failed to assign irq for hostdev0: Input/output error
>>>> Perhaps you are assigning a device that shares an IRQ with another device?
>>>>
>>>> Is it a bug or I miss something?
>>>
>>> Hi, Jan
>>>
>>> This problem is caused by your patch:
>>> commit 6919115a8715c34cd80baa08422d90496f11f5d7
>>> Author: Jan Kiszka jan.kis...@siemens.com
>>> Date:   Thu Mar 8 11:10:27 2012 +0100
>>>
>>>     pci_assign: Flip defaults of prefer_msi and share_intx
>>>
>>>     INTx sharing is a bit more expensive than exclusive host interrupts, but this channel is not supposed to be used for high-performance scenarios anyway. Modern devices support MSI/MSI-X and do not depend on using INTx under critical workload, real old devices do not support INTx sharing anyway.
>>>
>>>     For those in the middle, the user experience is much better if they just work even when IRQ sharing is required. If there is nothing to share, share_intx=off can still be applied as tuning parameter.
>>>
>>>     With INTx sharing as default, the primary reason for prefer_msi=on is gone. Make it default off, specifically as it is known to cause troubles with devices that have incomplete/broken MSI support or otherwise stumble if host IRQ configuration does not match guest driver expectation.
>>>
>>>     Acked-by: Alex Williamson alex.william...@redhat.com
>>>     Signed-off-by: Jan Kiszka jan.kis...@siemens.com
>>>     Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
>>>
>>> If I revert this commit. qemu can work.
>>
>> This should be solvable by passing prefer_msi=on to the pci-assign device, or likely by updating your host kernel to latest kvm.git (to enable INTx sharing).
>
> Is there some way to find out if the kernel supports to enable INTx sharing?

QEMU does a feature check, but as a user you simply have to know which kernel version includes it (will be 3.4 or 3.5). Of course, that's not really handy.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: performance trouble
hello, sorry for the delay,

On Thu, Feb 23, 2012 at 10:38:07AM +0200, Gleb Natapov wrote:
> Ah, I guess the reason is that it records events only of IO thread. You need to trace all vcpu threads too. Not sure trace-cmd allows more than one -P option though.

I managed to set up the physical server with only one VM running the slow function, and took a trace while the slow function was running.

I uploaded the traces to http://www.roullier.net/Report/ :

 o report.txt.3.1.gz : with kernel 3.1
 o report.txt.3.2.gz : with kernel 3.2
 o report.txt.vhost-net-3.1.gz : with kernel 3.1 and vhost-net
 o report.txt.vhost-net.3.2.gz : with kernel 3.2 and vhost-net

With 3.2 + vhost-net we get 10.5s (for comparison, 8s with vmware esxi 4).

David.
Re: [Qemu-devel] [PATCH v3 1/9] net: introduce tcp_server_start()
On 14/03/12 22:58, Michael Roth wrote: On Wed, Mar 14, 2012 at 04:33:14PM +0800, Amos Kong wrote: On 14/03/12 00:39, Michael Roth wrote: On Wed, Mar 07, 2012 at 06:47:45AM +0800, Amos Kong wrote: Introduce tcp_server_start() by moving original code in tcp_start_incoming_migration(). Signed-off-by: Amos Kongak...@redhat.com --- net.c | 28 qemu_socket.h |2 ++ 2 files changed, 30 insertions(+), 0 deletions(-) +int tcp_server_start(const char *str, int *fd) +{ I would combine this with patch 2, since it provides context for why this function is being added. Would also do the same for 3 and 4. I see client the client implementation you need to pass fd back by reference since ret can be set to EINPROGRESS/EWOULDBLOCK on success, ret restores 0 or -socket_error() success: 0, -EINPROGRESS fail : ret 0 ret !=-EINTR ret != -EWOULDBLOCK , it should be -EINPROGRESS I see, I think I was confued by patch #4 where you do a +ret = tcp_client_start(host_port,s-fd); +if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) { +DPRINTF(connect in progress); +qemu_set_fd_handler2(s-fd, NULL, NULL, tcp_wait_for_connect, s); If ret == EWOULDBLOCK is a failure (or if the call isn't supposed to return EWOULDBLOCK), we should fail it there rather than passing it on to tcp_wait_for_connect(). You are right, it should be : if (ret == -EINPROGRESS) { Also, is there any reason we can't re-use qemu-sockets.c:inet_listen()/qemu-sockets.c:inet_connect()? AFAICT they serve the same purpose, and already include some of the work from your PATCH #6. We could not directly use it, there are some difference, such as tcp_start_incoming_migration() doesn't set socket no-blocked, but net_socket_listen_init() sets socket no-blocked. I think adding a common function with blocking/non-blocking flag and having inet_listen_opts()/socket_listen_opts() call it with a wrapper would be reasonable. A lot of code is being introduced here to solve problems that are already handled in qemu-sockets.c. 
inet_listen()/inet_connect() already handles bracket-enclosed ipv6 addrs, getting port numbers when there's more than one colon, getaddrinfo()-based connections, and most importantly it's had ipv6 support from day 1. Not 100% sure it'll work for what you're doing, but qemu-sockets.c was specifically added for this type of use-case and is heavily used currently (vnc, nbd, Chardev users), so I think we should use it unless there's a good reason not to.

There are many special requirements for migration which are not implemented in inet_listen_opts()/socket_listen_opts(), but much of the code can be reused. I will rewrite the patches.

Thanks, Amos
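The return convention settled on earlier in this mail (0 means connected, -EINPROGRESS means the non-blocking connect is still pending, anything else including -EWOULDBLOCK is a failure) can be written down as a small classifier. This is a sketch under assumptions: classify_connect() and the enum are hypothetical names that do not appear in the patch series, and the input is assumed to be 0 or a negative errno value as described for tcp_client_start().

```c
#include <assert.h>
#include <errno.h>

enum connect_state {
    CONNECT_DONE,      /* connected immediately */
    CONNECT_PENDING,   /* wait for the fd to become writable */
    CONNECT_FAILED     /* give up and report the error */
};

/* Hypothetical helper: map a "0 or -errno" result to the next action. */
static enum connect_state classify_connect(int ret)
{
    assert(ret <= 0);              /* convention: 0 or negative errno */
    if (ret == 0)
        return CONNECT_DONE;
    if (ret == -EINPROGRESS)
        return CONNECT_PENDING;
    return CONNECT_FAILED;         /* includes -EWOULDBLOCK */
}
```

Only the CONNECT_PENDING case would go on to register a write handler such as tcp_wait_for_connect(); the failure case is handled at the call site.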
[PATCH 3/4] kgdb: Respect that flush op is optional
Not all kgdb I/O drivers implement a flush operation. Adjust gdbstub_exit accordingly.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 kernel/debug/gdbstub.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/debug/gdbstub.c b/kernel/debug/gdbstub.c
index 5d7ed0a..c174ea3 100644
--- a/kernel/debug/gdbstub.c
+++ b/kernel/debug/gdbstub.c
@@ -1132,5 +1132,6 @@ void gdbstub_exit(int status)
 	dbg_io_ops->write_char(hex_asc_lo(checksum));
 
 	/* make sure the output is flushed, lest the bootloader clobber it */
-	dbg_io_ops->flush();
+	if (dbg_io_ops->flush)
+		dbg_io_ops->flush();
 }
--
1.7.3.4
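The pattern in this patch, an ops table where one callback is mandatory and another may be left NULL, can be sketched outside the kernel as follows. The struct and function names here (io_ops, emit_and_flush, the demo_* driver) are hypothetical and only mirror the shape of dbg_io_ops; they are not the kgdb definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table: write_char is mandatory, flush is optional. */
struct io_ops {
    void (*write_char)(unsigned char c);
    void (*flush)(void);            /* may be NULL */
};

/* Tiny recording "driver" so the behaviour can be observed. */
static unsigned char last_char;
static int flush_calls;

static void demo_write_char(unsigned char c) { last_char = c; }
static void demo_flush(void) { flush_calls++; }

/* Emit one character, then flush only if the driver provides a flush op. */
static void emit_and_flush(const struct io_ops *ops, unsigned char c)
{
    assert(ops != NULL && ops->write_char != NULL);
    ops->write_char(c);
    if (ops->flush)
        ops->flush();
}
```

A driver without a flush op is then initialized as `{ demo_write_char, NULL }` and works unchanged.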
[PATCH 1/4] kgdb: x86: Return all segment registers also in 64-bit mode
Even if the content is always 0, gdb expects us to return also ds, es, fs, and gs while in x86-64 mode. Do this to avoid ugly errors on info registers.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 arch/x86/include/asm/kgdb.h |    6 +-
 arch/x86/kernel/kgdb.c      |    6 --
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kgdb.h b/arch/x86/include/asm/kgdb.h
index 77e95f5..e857f1a 100644
--- a/arch/x86/include/asm/kgdb.h
+++ b/arch/x86/include/asm/kgdb.h
@@ -64,9 +64,13 @@ enum regnames {
 	GDB_PS,			/* 17 */
 	GDB_CS,			/* 18 */
 	GDB_SS,			/* 19 */
+	GDB_DS,			/* 20 */
+	GDB_ES,			/* 21 */
+	GDB_FS,			/* 22 */
+	GDB_GS,			/* 23 */
 };
 #define GDB_ORIG_AX		57
-#define DBG_MAX_REG_NUM		20
+#define DBG_MAX_REG_NUM		24
 /* 17 64 bit regs and 3 32 bit regs */
 #define NUMREGBYTES		((17 * 8) + (3 * 4))
 #endif /* ! CONFIG_X86_32 */
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index faba577..fdc37b3 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -67,8 +67,6 @@ struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] =
 	{ "ss", 4, offsetof(struct pt_regs, ss) },
 	{ "ds", 4, offsetof(struct pt_regs, ds) },
 	{ "es", 4, offsetof(struct pt_regs, es) },
-	{ "fs", 4, -1 },
-	{ "gs", 4, -1 },
 #else
 	{ "ax", 8, offsetof(struct pt_regs, ax) },
 	{ "bx", 8, offsetof(struct pt_regs, bx) },
@@ -90,7 +88,11 @@ struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] =
 	{ "flags", 4, offsetof(struct pt_regs, flags) },
 	{ "cs", 4, offsetof(struct pt_regs, cs) },
 	{ "ss", 4, offsetof(struct pt_regs, ss) },
+	{ "ds", 4, -1 },
+	{ "es", 4, -1 },
 #endif
+	{ "fs", 4, -1 },
+	{ "gs", 4, -1 },
 };
 
 int dbg_set_reg(int regno, void *mem, struct pt_regs *regs)
--
1.7.3.4
[PATCH 4/4] kgdb: x86: Detach gdb if machine shuts down or reboots
Hook into machine restart/power-off/halt handlers and call gdbstub_exit so that an attached gdb frontend is properly informed. If kgdb is disabled or no frontend attached, gdbstub_exit will do nothing.

CC: Thomas Gleixner t...@linutronix.de
CC: Ingo Molnar mi...@redhat.com
CC: H. Peter Anvin h...@zytor.com
CC: x...@kernel.org
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 arch/x86/kernel/reboot.c |    6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index d840e69..926ac17 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -7,6 +7,7 @@
 #include <linux/sched.h>
 #include <linux/tboot.h>
 #include <linux/delay.h>
+#include <linux/kgdb.h>
 #include <acpi/reboot.h>
 #include <asm/io.h>
 #include <asm/apic.h>
@@ -683,6 +684,7 @@ void native_machine_shutdown(void)
 static void __machine_emergency_restart(int emergency)
 {
 	reboot_emergency = emergency;
+	gdbstub_exit(1);
 	machine_ops.emergency_restart();
 }
 
@@ -730,6 +732,7 @@ struct machine_ops machine_ops = {
 
 void machine_power_off(void)
 {
+	gdbstub_exit(0);
 	machine_ops.power_off();
 }
 
@@ -745,17 +748,20 @@ void machine_emergency_restart(void)
 
 void machine_restart(char *cmd)
 {
+	gdbstub_exit(0);
 	machine_ops.restart(cmd);
 }
 
 void machine_halt(void)
 {
+	gdbstub_exit(0);
 	machine_ops.halt();
 }
 
 #ifdef CONFIG_KEXEC
 void machine_crash_shutdown(struct pt_regs *regs)
 {
+	gdbstub_exit(1);
 	machine_ops.crash_shutdown(regs);
 }
 #endif
--
1.7.3.4
[PATCH 0/4] kgdb: Small usability improvements for x86
This cleans up the info register result on x86 and adds gdb detach on reboot/shutdown for this target arch. See patches for details.

CC: H. Peter Anvin h...@zytor.com
CC: Ingo Molnar mi...@redhat.com
CC: Thomas Gleixner t...@linutronix.de
CC: x...@kernel.org

Jan Kiszka (4):
  kgdb: x86: Return all segment registers also in 64-bit mode
  kgdb: Make gdbstub_exit a nop unless gdb is attached
  kgdb: Respect that flush op is optional
  kgdb: x86: Detach gdb if machine shuts down or reboots

 arch/x86/include/asm/kgdb.h |    6 +-
 arch/x86/kernel/kgdb.c      |    6 --
 arch/x86/kernel/reboot.c    |    6 ++
 include/linux/kgdb.h        |    1 +
 kernel/debug/gdbstub.c      |    6 +-
 5 files changed, 21 insertions(+), 4 deletions(-)

--
1.7.3.4
[PATCH 2/4] kgdb: Make gdbstub_exit a nop unless gdb is attached
This allows calling gdbstub_exit without worrying whether
 - CONFIG_KGDB is enabled
 - a kgdb I/O driver is loaded
 - a gdb frontend is currently attached

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 include/linux/kgdb.h   |    1 +
 kernel/debug/gdbstub.c |    3 +++
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
index fa39183..16410e2 100644
--- a/include/linux/kgdb.h
+++ b/include/linux/kgdb.h
@@ -306,6 +306,7 @@ extern atomic_t kgdb_active;
 extern bool dbg_is_early;
 extern void __init dbg_late_init(void);
 #else /* ! CONFIG_KGDB */
+static inline void gdbstub_exit(int status) { }
 #define in_dbg_master() (0)
 #define dbg_late_init()
 #endif /* ! CONFIG_KGDB */
diff --git a/kernel/debug/gdbstub.c b/kernel/debug/gdbstub.c
index c22d8c2..5d7ed0a 100644
--- a/kernel/debug/gdbstub.c
+++ b/kernel/debug/gdbstub.c
@@ -,6 +,9 @@ void gdbstub_exit(int status)
 	unsigned char checksum, ch, buffer[3];
 	int loop;
 
+	if (!dbg_io_ops || !kgdb_connected)
+		return;
+
 	buffer[0] = 'W';
 	buffer[1] = hex_asc_hi(status);
 	buffer[2] = hex_asc_lo(status);
--
1.7.3.4
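For readers unfamiliar with what gdbstub_exit sends, here is a user-space sketch of the early return plus the 'W' exit packet, assuming the usual gdb remote protocol framing ($, payload, #, two hex digits of the payload's byte sum modulo 256). build_exit_packet() and its connected flag are hypothetical; the kernel function writes characters through dbg_io_ops instead of filling a buffer.

```c
#include <assert.h>
#include <stddef.h>

static const char hex_digits[] = "0123456789abcdef";

/* Hypothetical helper: build "$Wxx#cc" for an exit status, or do nothing
 * when no debugger is attached. Returns the number of bytes written
 * (excluding the terminating NUL), i.e. 7 or 0. */
static size_t build_exit_packet(int connected, unsigned char status,
                                char out[8])
{
    unsigned char checksum = 0;
    char body[3];
    size_t n = 0;

    assert(out != NULL);
    if (!connected)
        return 0;                   /* nop, like the guard in the patch */

    body[0] = 'W';
    body[1] = hex_digits[status >> 4];
    body[2] = hex_digits[status & 0xf];

    out[n++] = '$';
    for (size_t i = 0; i < sizeof(body); i++) {
        checksum += (unsigned char)body[i];   /* sum modulo 256 */
        out[n++] = body[i];
    }
    out[n++] = '#';
    out[n++] = hex_digits[checksum >> 4];
    out[n++] = hex_digits[checksum & 0xf];
    out[n] = '\0';
    return n;
}
```

With the guard in place, callers such as the reboot hooks do not need to know whether a frontend is attached.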
Re: [PATCH 0/4] kgdb: Small usability improvements for x86
On 2012-03-16 12:40, Jan Kiszka wrote:
> This cleans up the info register result on x86 and adds gdb detach on reboot/shutdown for this target arch. See patches for details.
>
> CC: H. Peter Anvin h...@zytor.com
> CC: Ingo Molnar mi...@redhat.com
> CC: Thomas Gleixner t...@linutronix.de
> CC: x...@kernel.org
>
> Jan Kiszka (4):
>   kgdb: x86: Return all segment registers also in 64-bit mode
>   kgdb: Make gdbstub_exit a nop unless gdb is attached
>   kgdb: Respect that flush op is optional
>   kgdb: x86: Detach gdb if machine shuts down or reboots
>
>  arch/x86/include/asm/kgdb.h |    6 +-
>  arch/x86/kernel/kgdb.c      |    6 --
>  arch/x86/kernel/reboot.c    |    6 ++
>  include/linux/kgdb.h        |    1 +
>  kernel/debug/gdbstub.c      |    6 +-
>  5 files changed, 21 insertions(+), 4 deletions(-)

Sorry, wrong CC. Should have gone to LKML instead of the kvm list. I will repost.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: [PATCH][SeaBIOS] memory hotplug
Hi, On Thu, Mar 15, 2012 at 02:01:38PM +0200, Gleb Natapov wrote: Commenting a little bit late, but since you've said that you are working on a new version of the patch... better late than never. On Thu, Aug 11, 2011 at 04:39:38PM +0200, Vasilis Liaskovitis wrote: Hi, I am testing a set of experimental patches for memory-hotplug on x86_64 host / guest combinations. I have implemented this in a similar way to cpu-hotplug. A dynamic SSDT table with all memory devices is created at boot time. This table calls static methods from the DSDT. A byte array indicates which memory device is online or not. This array is kept in sync with a qemu-kvm bitmap array through ioport 0xaf20. Qemu-kvm updates this table on a mem_set command and an ACPI event is triggered. Memory devices are 128MB in size (to match /sys/devices/memory/block_size_bytes in x86_64). They are constructed dynamically in src/ssdt-mem.asl , similarly to hotpluggable-CPUs. The _CRS memstart-memend attribute for each memory device is defined accordingly, skipping the hole at 0xe000 - 0x1. Hotpluggable memory is always located above 4GB. What is the reason for this limitation? We currently model a PCI hole from below_4g_mem_size to 4GB, see i440fx_init call in pc_init1. The decision was discussed here: http://patchwork.ozlabs.org/patch/105892/ afaict because there was no clear resolution on using a top-of-memory register. So, hotplugging will start at 4GB + above_4g_mem_size. Unless we can model the pci hole more accurately hardware-wise. Qemu-kvm sets the upper bound of hotpluggable memory with maxmem = [totalmemory in MB] on the command line. Maxmem is an argument for -m similar to maxcpus for smp. E.g. -m 1024,maxmem=2048 on the qemu command line will create memory devices for 2GB of RAM, enabling only 1GB initially. Qemu_monitor triggers a memory hotplug with: (qemu) mem_set [memory range in MBs] online As far as I see mem_set does not get memory range as a parameter. 
The parameter is the amount of memory to add/remove, and memory is added/removed to/from the top. This is not flexible enough. Fine-grained control for memory slots is needed. What about exposing memory slot configuration to the command line like this: -memslot mem=size,populated=yes|no adding one of those for each slot. Yes, I agree we need this. Is the idea to model all physical DIMMs? For initial system RAM does it make sense to explicitly specify slots at the command line, or infer them? I think we can allocate a new qemu ram MemoryRegion for each new hotplugged slot/DIMM, so there will be a 1-1 mapping between new populated slots and qemu memory ram regions. Perhaps we want initial memory allocation to also comply with physical slot/DIMM modeling. Initial (cold) RAM is created as a single MemoryRegion pc.ram. Also in kvm we can easily run out of kvm_memory_slots (10 slots per VM and 32 system-wide I think). mem_set will get slot id to populate/depopulate just like cpu_set gets cpu slot number to remove and not just yanks cpus with highest slot id. Right, but I think for upstream qemu, people would like to eventually use device_add, instead of a new mem_set command. Pretty much the same way as cpu hotplug? For this to happen, memory devices should be modeled in QOM/qdev. Are we planning on keeping a CPUSocket structure for CPUs? Or perhaps modelling a memory controller is the right way. What type should the memory controller/devices be a child of? I'll try to resubmit in a few weeks' time, though depending on feedback qom/qdev of memory devices will probably take longer. thanks, - Vasilis
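The sizing arithmetic described in this thread (128MB memory devices, a byte array marking which device is online, and -m 1024,maxmem=2048 creating devices for 2GB with 1GB enabled) can be sketched as follows. This is only an illustration under stated assumptions: the helper name mem_devices_init, its signature and its error convention are hypothetical and are not taken from the SeaBIOS or qemu-kvm patches, and the placement of hotpluggable memory above 4GB is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_DEVICE_MB 128u /* device size stated in the proposal */

/*
 * Hypothetical helper, not taken from the patches: fill online[] with one
 * byte per 128MB memory device (1 = online) and return the device count.
 * Returns 0 if the sizes are not multiples of 128MB, if initial_mb exceeds
 * maxmem_mb, or if online[] is too small.
 */
static size_t mem_devices_init(uint32_t initial_mb, uint32_t maxmem_mb,
                               uint8_t *online, size_t capacity)
{
    size_t total, enabled, i;

    assert(online != NULL);
    if (initial_mb % MEM_DEVICE_MB || maxmem_mb % MEM_DEVICE_MB ||
        initial_mb > maxmem_mb)
        return 0;

    total = maxmem_mb / MEM_DEVICE_MB;    /* devices created at boot */
    enabled = initial_mb / MEM_DEVICE_MB; /* devices online at boot */
    if (total > capacity)
        return 0;

    for (i = 0; i < total; i++)
        online[i] = (i < enabled) ? 1 : 0;
    return total;
}
```

With the -m 1024,maxmem=2048 example from the thread this yields 16 devices, of which the first 8 are marked online.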
Re: [PATCH][SeaBIOS] memory hotplug
On 03/16/2012 03:09 PM, Vasilis Liaskovitis wrote: Hi, On Thu, Mar 15, 2012 at 02:01:38PM +0200, Gleb Natapov wrote: Commenting a little bit late, but since you've said that you are working on a new version of the patch... better late than never. On Thu, Aug 11, 2011 at 04:39:38PM +0200, Vasilis Liaskovitis wrote: Hi, I am testing a set of experimental patches for memory-hotplug on x86_64 host / guest combinations. I have implemented this in a similar way to cpu-hotplug. A dynamic SSDT table with all memory devices is created at boot time. This table calls static methods from the DSDT. A byte array indicates which memory device is online or not. This array is kept in sync with a qemu-kvm bitmap array through ioport 0xaf20. Qemu-kvm updates this table on a mem_set command and an ACPI event is triggered. Memory devices are 128MB in size (to match /sys/devices/memory/block_size_bytes in x86_64). They are constructed dynamically in src/ssdt-mem.asl , similarly to hotpluggable-CPUs. The _CRS memstart-memend attribute for each memory device is defined accordingly, skipping the hole at 0xe000 - 0x1. Hotpluggable memory is always located above 4GB. What is the reason for this limitation? We currently model a PCI hole from below_4g_mem_size to 4GB, see i440fx_init call in pc_init1. The decision was discussed here: http://patchwork.ozlabs.org/patch/105892/ afaict because there was no clear resolution on using a top-of-memory register. So, hotplugging will start at 4GB + above_4g_mem_size. Unless we can model the pci hole more accurately hardware-wise. Qemu-kvm sets the upper bound of hotpluggable memory with maxmem = [totalmemory in MB] on the command line. Maxmem is an argument for -m similar to maxcpus for smp. E.g. -m 1024,maxmem=2048 on the qemu command line will create memory devices for 2GB of RAM, enabling only 1GB initially. 
Qemu_monitor triggers a memory hotplug with: (qemu) mem_set [memory range in MBs] online As far as I see mem_set does not get memory range as a parameter. The parameter is the amount of memory to add/remove, and memory is added/removed to/from the top. This is not flexible enough. Fine-grained control for memory slots is needed. What about exposing memory slot configuration to the command line like this: -memslot mem=size,populated=yes|no adding one of those for each slot. Yes, I agree we need this. Is the idea to model all physical DIMMs? For initial system RAM does it make sense to explicitly specify slots at the command line, or infer them? I think we can allocate a new qemu ram MemoryRegion for each new hotplugged slot/DIMM, so there will be a 1-1 mapping between new populated slots and qemu memory ram regions. Perhaps we want initial memory allocation to also comply with physical slot/DIMM modeling. Initial (cold) RAM is created as a single MemoryRegion pc.ram. Also in kvm we can easily run out of kvm_memory_slots (10 slots per VM and 32 system-wide I think). mem_set will get slot id to populate/depopulate just like cpu_set gets cpu slot number to remove and not just yanks cpus with highest slot id. Right, but I think for upstream qemu, people would like to eventually use device_add, instead of a new mem_set command. Pretty much the same way as cpu hotplug? For this to happen, memory devices should be modeled in QOM/qdev. Are we planning on keeping a CPUSocket structure for CPUs? Or perhaps modelling a memory controller I'd rather dump the CPUSocket structure unless it's really required; it was introduced just to provide a hotplug-able icc bus for cpus, since hot-plug on sysbus was disabled. is the right way. What type should the memory controller/devices be a child of? I'll try to resubmit in a few weeks' time, though depending on feedback qom/qdev of memory devices will probably take longer.
thanks, - Vasilis -- - Igor
QEMU was not selected for Google Summer of Code this year
Sad news - QEMU was not accepted for Google Summer of Code 2012. Students can consider other organizations in the accepted organizations list here: http://www.google-melange.com/gsoc/accepted_orgs/google/gsoc2012 The list is currently not complete but should be finalized over the next few days as organizations complete their profiles. Students and mentors who wanted to participate with QEMU will be disappointed. I am too, but organizations are judged against many factors, and we have not received information on why QEMU was rejected this year. QEMU will certainly try again next year because Google Summer of Code is a great program both for students and for QEMU. Stefan
Re: [PATCH 1/2] Isolation groups
On Fri, 2012-03-16 at 14:45 +1100, David Gibson wrote: On Thu, Mar 15, 2012 at 02:15:01PM -0600, Alex Williamson wrote: On Wed, 2012-03-14 at 20:58 +1100, David Gibson wrote: On Tue, Mar 13, 2012 at 10:49:47AM -0600, Alex Williamson wrote: On Wed, 2012-03-14 at 01:33 +1100, David Gibson wrote: On Mon, Mar 12, 2012 at 04:32:54PM -0600, Alex Williamson wrote: +/* + * Add a device to an isolation group. Isolation groups start empty and + * must be told about the devices they contain. Expect this to be called + * from isolation group providers via notifier. + */ Doesn't necessarily have to be from a notifier, particularly if the provider is integrated into host bridge code. Sure, a provider could do this on it's own if it wants. This just provides some infrastructure for a common path. Also note that this helps to eliminate all the #ifdef CONFIG_ISOLATION in the provider. Yet to be seen whether that can reasonably be the case once isolation groups are added to streaming DMA paths. Right, but other than the #ifdef safety, which could be achieved more simply, I'm not seeing what benefit the infrastructure provides over directly calling the bus notifier function. The infrastructure groups the notifiers by bus type internally, but AFAICT exactly one bus notifier call would become exactly one isolation notifier call, and the notifier callback itself would be almost identical. I guess I don't see this as a fundamental design point of the proposal, it's just a convenient way to initialize groups as a side-band addition until isolation groups become a more fundamental part of the iommu infrastructure. If you want to do that level of integration in your provider, do it and make the callbacks w/o the notifier. If nobody ends up using them, we'll remove them. Maybe it will just end up being a bootstrap. In the typical case, yes, one bus notifier is one isolation notifier. 
It does however also allow one bus notifier to become multiple isolation notifiers, and includes some filtering that would just be duplicated if every provider decided to implement their own bus notifier. Uh.. I didn't notice any filtering? That's why I'm asking. Not much, but a little:

+	switch (action) {
+	case BUS_NOTIFY_ADD_DEVICE:
+		if (!dev->isolation_group)
+			blocking_notifier_call_chain(notifier->notifier,
+				ISOLATION_NOTIFY_ADD_DEVICE, dev);
+		break;
+	case BUS_NOTIFY_DEL_DEVICE:
+		if (dev->isolation_group)
+			blocking_notifier_call_chain(notifier->notifier,
+				ISOLATION_NOTIFY_DEL_DEVICE, dev);
+		break;
+	}

... So, somewhere, I think we need a fallback path, but I'm not sure exactly where. If an isolation provider doesn't explicitly put a device into a group, the device should go into the group of its parent bridge. This covers the case of a bus with IOMMU which has below it a bridge to a different type of DMA capable bus (which the IOMMU isn't explicitly aware of). DMAs from devices on the subordinate bus can be translated by the top-level IOMMU (assuming it sees them as coming from the bridge), but they all need to be treated as one group. Why would the top level IOMMU provider not set the isolation group in this case. Because it knows nothing about the subordinate bus. For example imagine a VT-d system, with a wacky PCI card into which you plug some other type of DMA capable device. The PCI card is acting as a bridge from PCI to this, let's call it FooBus. Obviously the VT-d code won't have a FooBus notifier, since it knows nothing about FooBus. But the FooBus devices still need to end up in the group of the PCI bridge device, since their DMA operations will appear as coming from the PCI bridge card to the IOMMU, and can be isolated from the rest of the system (but not each other) on that basis. I guess I was imagining that it's ok to have devices without an isolation group.
It is, but having a NULL isolation group has a pretty specific meaning - it means it's never safe to give that device to userspace, but it also means that normal kernel driver operation of that device must not interfere with anything in any iso group (otherwise we can never know that those iso groups _are_ safe to hand out). Likewise userspace operation of any isolation group can't mess with no-group devices. This is where wanting to use isolation groups as the working unit for an iommu ops layer and also wanting to use iommu ops to replace dma ops seem to collide a bit. Do we want two interfaces for dma, one group based and one for non-isolated devices? Isolation providers like intel-iommu would always use one,
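The fallback discussed in this thread (a device that no provider has claimed ends up in the group of its parent bridge, as in the FooBus-behind-a-PCI-bridge example) can be sketched as a walk up the parent chain. This is a minimal illustration only: struct demo_device, struct demo_group and the function name are hypothetical stand-ins, not the kernel's struct device or anything from the posted patch, and the thread does not settle where such a fallback would actually live.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures discussed above. */
struct demo_group {
    int id;
};

struct demo_device {
    struct demo_device *parent;         /* NULL at the top of the tree */
    struct demo_group *isolation_group; /* NULL if no provider claimed it */
};

/*
 * Return the group a device should be treated as belonging to: its own
 * group if a provider set one, otherwise the group of the nearest
 * ancestor that has one. Returns NULL if no ancestor has a group.
 */
static struct demo_group *demo_effective_group(const struct demo_device *dev)
{
    assert(dev != NULL);
    for (; dev != NULL; dev = dev->parent) {
        if (dev->isolation_group != NULL)
            return dev->isolation_group;
    }
    return NULL;
}
```

In the FooBus example, the FooBus device has no group of its own, so the walk stops at the PCI bridge card and returns the bridge's group. A device with no grouped ancestor yields NULL, which matches the "never safe to give to userspace" meaning described for a NULL group.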
Re: [Qemu-devel] QEMU was not selected for Google Summer of Code this year
Really sad news :( On 16/03/2012, at 19:29, Stefan Hajnoczi wrote: Sad news - QEMU was not accepted for Google Summer of Code 2012. Students can consider other organizations in the accepted organizations list here: http://www.google-melange.com/gsoc/accepted_orgs/google/gsoc2012 The list is currently not complete but should be finalized over the next few days as organizations complete their profiles. Students and mentors who wanted to participate with QEMU will be disappointed. I am too but there are many factors that organizations are considered against, we have not received information why QEMU was rejected this year. I've noted this year there are half the approved organizations than last year... QEMU will try again for sure next year because Google Summer of Code is a great program both for students and for QEMU. Stefan Natalia Portillo
Re: [Qemu-devel] QEMU was not selected for Google Summer of Code this year
On Fri, Mar 16, 2012 at 7:44 PM, Natalia Portillo clau...@claunia.com wrote: Really sad news :( On 16/03/2012, at 19:29, Stefan Hajnoczi wrote: Sad news - QEMU was not accepted for Google Summer of Code 2012. Students can consider other organizations in the accepted organizations list here: http://www.google-melange.com/gsoc/accepted_orgs/google/gsoc2012 The list is currently not complete but should be finalized over the next few days as organizations complete their profiles. Students and mentors who wanted to participate with QEMU will be disappointed. I am too but there are many factors that organizations are considered against, we have not received information why QEMU was rejected this year. I've noted this year there are half the approved organizations than last year... The list of accepted orgs is not complete yet, more will be displayed as they fill out their details in the http://google-melange.com/ web app. Stefan
Re: [Qemu-devel] QEMU was not selected for Google Summer of Code this year
QEMU hosted on Haiku would be interesting. On 16/03/2012, at 22:30, Stefan Hajnoczi wrote: On Fri, Mar 16, 2012 at 7:44 PM, Natalia Portillo clau...@claunia.com wrote: Really sad news :( On 16/03/2012, at 19:29, Stefan Hajnoczi wrote: Sad news - QEMU was not accepted for Google Summer of Code 2012. Students can consider other organizations in the accepted organizations list here: http://www.google-melange.com/gsoc/accepted_orgs/google/gsoc2012 The list is currently not complete but should be finalized over the next few days as organizations complete their profiles. Students and mentors who wanted to participate with QEMU will be disappointed. I am too but there are many factors that organizations are considered against, we have not received information why QEMU was rejected this year. I've noted this year there are half the approved organizations than last year... The list of accepted orgs is not complete yet, more will be displayed as they fill out their details in the http://google-melange.com/ web app. Stefan
[PATCH] fix multiboot loading if load_end_addr == 0
The previous code did not treat the case where load_end_addr was 0 specially. The multiboot specification says the following:

* load_end_addr
  Contains the physical address of the end of the data segment. (load_end_addr - load_addr) specifies how much data to load. This implies that the text and data segments must be consecutive in the OS image; this is true for existing a.out executable formats. If this field is zero, the boot loader assumes that the text and data segments occupy the whole OS image file.

This was raised initially as launchpad bug https://bugs.launchpad.net/qemu/+bug/957622

diff --git a/hw/multiboot.c b/hw/multiboot.c
index b4484a3..b1e04c5 100644
--- a/hw/multiboot.c
+++ b/hw/multiboot.c
@@ -202,10 +202,16 @@ int load_multiboot(void *fw_cfg,
         uint32_t mh_bss_end_addr = ldl_p(header+i+24);
         mh_load_addr = ldl_p(header+i+16);
         uint32_t mb_kernel_text_offset = i - (mh_header_addr - mh_load_addr);
-        uint32_t mb_load_size = mh_load_end_addr - mh_load_addr;
-
+        uint32_t mb_load_size = 0;
         mh_entry_addr = ldl_p(header+i+28);
-        mb_kernel_size = mh_bss_end_addr - mh_load_addr;
+
+        if (mh_load_end_addr) {
+            mb_kernel_size = mh_bss_end_addr - mh_load_addr;
+            mb_load_size = mh_load_end_addr - mh_load_addr;
+        } else {
+            mb_kernel_size = kernel_file_size - mb_kernel_text_offset;
+            mb_load_size = mb_kernel_size;
+        }
 
         /* Valid if mh_flags sets MULTIBOOT_HEADER_HAS_VBE.
         uint32_t mh_mode_type = ldl_p(header+i+32);
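The size selection made by this patch can be restated as a small stand-alone function, which makes the two cases easy to check by hand. This is a sketch, not QEMU code: struct mb_sizes and mb_compute_sizes are hypothetical names introduced here, and the parameters simply mirror the patch's variables (mh_load_addr, mh_load_end_addr, mh_bss_end_addr, kernel_file_size, mb_kernel_text_offset).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two sizes the patch computes. */
struct mb_sizes {
    uint32_t load_size;   /* bytes to copy from the image file */
    uint32_t kernel_size; /* bytes the kernel occupies in memory */
};

/*
 * Illustrative restatement of the patch logic. When load_end_addr is
 * non-zero the header fields decide both sizes; when it is zero the
 * rest of the image file after the text offset is loaded.
 */
static struct mb_sizes mb_compute_sizes(uint32_t load_addr,
                                        uint32_t load_end_addr,
                                        uint32_t bss_end_addr,
                                        uint32_t file_size,
                                        uint32_t text_offset)
{
    struct mb_sizes s;

    assert(text_offset <= file_size);
    if (load_end_addr != 0) {
        s.kernel_size = bss_end_addr - load_addr;
        s.load_size = load_end_addr - load_addr;
    } else {
        s.kernel_size = file_size - text_offset;
        s.load_size = s.kernel_size;
    }
    return s;
}
```

For example, with load_addr 0x100000, load_end_addr 0x104000 and bss_end_addr 0x108000 the function reports a load size of 0x4000 and a kernel size of 0x8000. With load_end_addr 0, a 0x5000-byte file and a text offset of 0x1000, both sizes are 0x4000.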
Re: [PATCH 1/2] Isolation groups
On Fri, Mar 16, 2012 at 01:31:18PM -0600, Alex Williamson wrote: On Fri, 2012-03-16 at 14:45 +1100, David Gibson wrote: On Thu, Mar 15, 2012 at 02:15:01PM -0600, Alex Williamson wrote: On Wed, 2012-03-14 at 20:58 +1100, David Gibson wrote: On Tue, Mar 13, 2012 at 10:49:47AM -0600, Alex Williamson wrote: On Wed, 2012-03-14 at 01:33 +1100, David Gibson wrote: On Mon, Mar 12, 2012 at 04:32:54PM -0600, Alex Williamson wrote: +/* + * Add a device to an isolation group. Isolation groups start empty and + * must be told about the devices they contain. Expect this to be called + * from isolation group providers via notifier. + */ Doesn't necessarily have to be from a notifier, particularly if the provider is integrated into host bridge code. Sure, a provider could do this on it's own if it wants. This just provides some infrastructure for a common path. Also note that this helps to eliminate all the #ifdef CONFIG_ISOLATION in the provider. Yet to be seen whether that can reasonably be the case once isolation groups are added to streaming DMA paths. Right, but other than the #ifdef safety, which could be achieved more simply, I'm not seeing what benefit the infrastructure provides over directly calling the bus notifier function. The infrastructure groups the notifiers by bus type internally, but AFAICT exactly one bus notifier call would become exactly one isolation notifier call, and the notifier callback itself would be almost identical. I guess I don't see this as a fundamental design point of the proposal, it's just a convenient way to initialize groups as a side-band addition until isolation groups become a more fundamental part of the iommu infrastructure. If you want to do that level of integration in your provider, do it and make the callbacks w/o the notifier. If nobody ends up using them, we'll remove them. Maybe it will just end up being a bootstrap. In the typical case, yes, one bus notifier is one isolation notifier. 
It does however also allow one bus notifier to become multiple isolation notifiers, and includes some filtering that would just be duplicated if every provider decided to implement their own bus notifier. Uh.. I didn't notice any filtering? That's why I'm asking. Not much, but a little:

+	switch (action) {
+	case BUS_NOTIFY_ADD_DEVICE:
+		if (!dev->isolation_group)
+			blocking_notifier_call_chain(notifier->notifier,
+				ISOLATION_NOTIFY_ADD_DEVICE, dev);
+		break;
+	case BUS_NOTIFY_DEL_DEVICE:
+		if (dev->isolation_group)
+			blocking_notifier_call_chain(notifier->notifier,
+				ISOLATION_NOTIFY_DEL_DEVICE, dev);
+		break;
+	}

Ah, I see, fair enough. A couple of tangential observations. First, I suspect using BUS_NOTIFY_DEL_DEVICE is a very roundabout way of handling hot-unplug, it might be better to have an unplug callback in the group instead. Second, I don't think aborting the call chain early for hot-plug is actually a good idea. I can't see a clear guarantee on the order, so individual providers couldn't rely on that short-cut behaviour. Which means that if two providers would have attempted to claim the same device, something is seriously wrong and we should probably report that. ... So, somewhere, I think we need a fallback path, but I'm not sure exactly where. If an isolation provider doesn't explicitly put a device into a group, the device should go into the group of its parent bridge. This covers the case of a bus with IOMMU which has below it a bridge to a different type of DMA capable bus (which the IOMMU isn't explicitly aware of). DMAs from devices on the subordinate bus can be translated by the top-level IOMMU (assuming it sees them as coming from the bridge), but they all need to be treated as one group. Why would the top level IOMMU provider not set the isolation group in this case. Because it knows nothing about the subordinate bus. For example imagine a VT-d system, with a wacky PCI card into which you plug some other type of DMA capable device.
The PCI card is acting as a bridge from PCI to this, let's call it FooBus. Obviously the VT-d code won't have a FooBus notifier, since it knows nothing about FooBus. But the FooBus devices still need to end up in the group of the PCI bridge device, since their DMA operations will appear as coming from the PCI bridge card to the IOMMU, and can be isolated from the rest of the system (but not each other) on that basis. I guess I was imagining that it's ok to have devices
Re: [Qemu-devel] QEMU was not selected for Google Summer of Code this year
* Natalia Portillo (clau...@claunia.com) wrote: QEMU hosted on Haiku would be interesting. The fun of Haiku especially when it is hosting QEMU
[PATCH/RFC] kvm/powerpc: Add new ioctl to retreive support page sizes and encodings
This is necessary for qemu to be able to pass the right information to the guest, as the supported sizes and encodings can vary depending on the machine, the type of KVM used (PR vs HV) and the version of KVM. Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org --- Please comment ASAP. I'm tired of the qemu side never working properly because of that and our out-of-tree nasty patches we've been carrying internally, so I'd like to get something like that in real quick :-) I have the qemu side patches that use this to generate the appropriate device-tree when available, and use heuristics for the fallback. I'll post them later, let's agree on the kernel interfaces first. The heuristics work as long as we have a reasonable guarantee that this kernel patch will get in -before- any patch that enables the PVINFO ioctl on HV KVM, that way I can rely on the latter not working as a way to differentiate PR and HV KVM if this new ioctl is not supported. Note: We probably want another ioctl for getting other types of MMU info, such as whether we support 1T segments etc... but I didn't want to try to kill too many birds at once and end up in bike shed painting on the mailing list for the next 6 months... Cheers, Ben.
 arch/powerpc/include/asm/kvm_ppc.h |    3 ++-
 arch/powerpc/kvm/book3s_hv.c       |   35 +++
 arch/powerpc/kvm/book3s_pr.c       |   22 ++
 arch/powerpc/kvm/powerpc.c         |   18 +-
 include/linux/kvm.h                |   29 +
 5 files changed, 105 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index c1069f6..bf530fd 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -140,7 +140,8 @@ extern int kvmppc_core_prepare_memory_region(struct kvm *kvm,
 				struct kvm_userspace_memory_region *mem);
 extern void kvmppc_core_commit_memory_region(struct kvm *kvm,
 				struct kvm_userspace_memory_region *mem);
-
+extern int kvm_vm_ioctl_get_page_sizes(struct kvm *kvm,
+				       struct kvm_ppc_page_sizes *ps);
 extern int kvmppc_bookehv_init(void);
 extern void kvmppc_bookehv_exit(void);
 
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8ee46b9..c7f7f20 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1174,6 +1174,37 @@ long kvm_vm_ioctl_allocate_rma(struct kvm *kvm, struct kvm_allocate_rma *ret)
 	return fd;
 }
 
+static void kvmppc_add_seg_page_size(struct kvm_ppc_one_seg_page_size **sps,
+				     int linux_psize)
+{
+	struct mmu_psize_def *def = mmu_psize_defs[linux_psize];
+
+	if (!def->shift)
+		return;
+	*sps->page_shift = def->shift;
+	*sps->slb_enc = def->sllp;
+	*sps->enc[0].page_shift = def->shift;
+	*sps->enc[0].pte_enc = def->penc;
+	*sps++;
+}
+
+int kvm_vm_ioctl_get_page_sizes(struct kvm *kvm, struct kvm_ppc_page_sizes *ps)
+{
+	struct kvm_ppc_one_seg_page_size *sps;
+	int i;
+
+	/* Page sizes limited by backing store */
+	ps->flags = KVM_PPC_PAGE_SIZES_REAL;
+
+	/* We only support these sizes for now, and no muti-size segments */
+	sps = ps->sps[0];
+	kvmppc_add_seg_page_size(sps, MMU_PAGE_4K);
+	kvmppc_add_seg_page_size(sps, MMU_PAGE_64K);
+	kvmppc_add_seg_page_size(sps, MMU_PAGE_16M);
+
+	return 0;
+}
+
 /*
  * Get (and clear) the dirty memory log for a memory slot.
  */
@@ -1211,6 +1242,10 @@ out:
 	return r;
 }
 
+int kvm_vm_ioctl_get_page_sizes(struct kvm *kvm, struct kvm_ppc_page_sizes *ps)
+{
+}
+
 static unsigned long slb_pgsize_encoding(unsigned long psize)
 {
 	unsigned long senc = 0;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 5f0ee48..3c823ed 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1154,6 +1154,28 @@ out:
 	return r;
 }
 
+#ifdef CONFIG_PPC64
+int kvm_vm_ioctl_get_page_sizes(struct kvm *kvm, struct kvm_ppc_page_sizes *ps)
+{
+	/* No flags */
+	ps->flags = 0;
+
+	/* Standard 4k base page size segment */
+	ps->sps[0].page_shift = 12;
+	ps->sps[0].slb_enc = 0;
+	ps->sps[0].enc[0].page_shift = 12;
+	ps->sps[0].enc[0].pte_enc = 0;
+
+	/* Standard 16M large page size segment */
+	ps->sps[1].page_shift = 24;
+	ps->sps[1].slb_enc = SLB_VSID_L;
+	ps->sps[1].enc[0].page_shift = 24;
+	ps->sps[1].enc[0].pte_enc = 0;
+
+	return 0;
+}
+#endif /* CONFIG_PPC64 */
+
 int kvmppc_core_prepare_memory_region(struct kvm *kvm,
 				      struct kvm_userspace_memory_region *mem)
 {
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 6ac3115..6f0c066 100644
--- a/arch/powerpc/kvm/powerpc.c
+++
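The HV side of this RFC fills an array of segment page size entries through a cursor that is advanced by a helper, skipping page sizes whose shift is zero. A stand-alone sketch of that pattern follows. Everything here is an assumption for illustration: the demo_* structures only mirror the field names visible in the patch (page_shift, slb_enc, enc[].page_shift, enc[].pte_enc, shift, sllp, penc), since the real definitions in include/linux/kvm.h are not shown in this archive. The sketch writes (*sps)->field and (*sps)++ so that the caller's cursor is the thing being dereferenced and advanced, and the caller is expected to pass the address of its cursor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ENC 8 /* hypothetical; the real array size is not shown */

/* Hypothetical mirrors of the structures named in the patch. */
struct demo_enc {
    uint32_t page_shift;
    uint32_t pte_enc;
};

struct demo_seg_page_size {
    uint32_t page_shift;
    uint32_t slb_enc;
    struct demo_enc enc[DEMO_MAX_ENC];
};

struct demo_psize_def {
    uint32_t shift; /* 0 means this page size is unsupported */
    uint32_t sllp;
    uint32_t penc;
};

/*
 * Append one segment page size at *sps and advance the caller's cursor.
 * Unsupported sizes (shift == 0) are skipped without moving the cursor.
 */
static void demo_add_seg_page_size(struct demo_seg_page_size **sps,
                                   const struct demo_psize_def *def)
{
    assert(sps != NULL && *sps != NULL && def != NULL);
    if (def->shift == 0)
        return;
    (*sps)->page_shift = def->shift;
    (*sps)->slb_enc = def->sllp;
    (*sps)->enc[0].page_shift = def->shift;
    (*sps)->enc[0].pte_enc = def->penc;
    (*sps)++;
}
```

A caller would keep a cursor such as `struct demo_seg_page_size *cur = &table[0];` and call `demo_add_seg_page_size(&cur, &def)` once per candidate page size. The number of entries filled is then `cur - table`.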
[PATCH v2] kvm/book3s: Make kernel emulated H_PUT_TCE available for PR KVM
There is nothing in the code for emulating TCE tables in the kernel that prevents it from working on PR KVM... other than ifdef's and location of the code. This renames book3s_64_vio_hv.c to book3s_64_vio.c and moves the bulk of the code there. This speeds things up a bit on my G5. --- v2. Changed the ifdef as per discussion with Alex. I still didn't manage to get git to figure out the rename but that's no big deal, the old file had only one small function in it. There's no code change, you can trust me on that one, It's really just moving things around :-) arch/powerpc/include/asm/kvm_host.h |4 +- arch/powerpc/include/asm/kvm_ppc.h |2 + arch/powerpc/kvm/Makefile |3 +- arch/powerpc/kvm/book3s_64_vio.c| 187 +++ arch/powerpc/kvm/book3s_64_vio_hv.c | 73 -- arch/powerpc/kvm/book3s_hv.c| 109 arch/powerpc/kvm/book3s_pr.c|3 + arch/powerpc/kvm/book3s_pr_papr.c | 18 arch/powerpc/kvm/powerpc.c |8 +- 9 files changed, 221 insertions(+), 186 deletions(-) create mode 100644 arch/powerpc/kvm/book3s_64_vio.c delete mode 100644 arch/powerpc/kvm/book3s_64_vio_hv.c diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 42a527e..d848cdc 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -237,7 +237,6 @@ struct kvm_arch { unsigned long vrma_slb_v; int rma_setup_done; int using_mmu_notifiers; - struct list_head spapr_tce_tables; spinlock_t slot_phys_lock; unsigned long *slot_phys[KVM_MEM_SLOTS_NUM]; int slot_npages[KVM_MEM_SLOTS_NUM]; @@ -245,6 +244,9 @@ struct kvm_arch { struct kvmppc_vcore *vcores[KVM_MAX_VCORES]; struct kvmppc_linear_info *hpt_li; #endif /* CONFIG_KVM_BOOK3S_64_HV */ +#ifdef CONFIG_PPC_BOOK3S_64 + struct list_head spapr_tce_tables; +#endif }; /* diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 7f0a3da..c1069f6 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -126,6 +126,8 @@ extern void 
kvmppc_map_vrma(struct kvm_vcpu *vcpu, extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu); extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm, struct kvm_create_spapr_tce *args); +extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn, +unsigned long ioba, unsigned long tce); extern long kvm_vm_ioctl_allocate_rma(struct kvm *kvm, struct kvm_allocate_rma *rma); extern struct kvmppc_linear_info *kvm_alloc_rma(void); diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile index 25225ae..8c95def 100644 --- a/arch/powerpc/kvm/Makefile +++ b/arch/powerpc/kvm/Makefile @@ -54,6 +54,7 @@ kvm-book3s_64-objs-$(CONFIG_KVM_BOOK3S_64_PR) := \ book3s_paired_singles.o \ book3s_pr.o \ book3s_pr_papr.o \ + book3s_64_vio.o \ book3s_emulate.o \ book3s_interrupts.o \ book3s_mmu_hpte.o \ @@ -70,7 +71,7 @@ kvm-book3s_64-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \ book3s_hv_rmhandlers.o \ book3s_hv_rm_mmu.o \ - book3s_64_vio_hv.o \ + book3s_64_vio.o \ book3s_hv_builtin.o kvm-book3s_64-module-objs := \ diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c new file mode 100644 index 000..193ba68 --- /dev/null +++ b/arch/powerpc/kvm/book3s_64_vio.c @@ -0,0 +1,187 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + * Copyright 2010 Paul Mackerras, IBM Corp. 
pau...@au1.ibm.com
+ * Copyright 2011 David Gibson, IBM Corporation d...@au1.ibm.com
+ */
+
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/highmem.h>
+#include <linux/gfp.h>
+#include <linux/slab.h>
+#include <linux/hugetlb.h>
+#include <linux/list.h>
+#include <linux/anon_inodes.h>
+
+#include <asm/tlbflush.h>
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+#include <asm/mmu-hash64.h>
+#include <asm/hvcall.h>
+#include <asm/synch.h>
+#include <asm/ppc-opcode.h>
+#include <asm/kvm_host.h>
+#include <asm/udbg.h>
+
+#define TCES_PER_PAGE