Re: [Qemu-devel] Use getaddrinfo for migration
On 02/03/12 18:41, Daniel P. Berrange wrote:
> On Fri, Mar 02, 2012 at 02:25:36PM +0400, Michael Tokarev wrote:
>> Not a reply to the patch but a general observation.
>>
>> I noticed that the tcp migration uses gethostbyname
>> (or getaddrinfo after this patch) from the main
>> thread - is it really the way to go? Note that a DNS
>> query may block for a long time. Is it really safe in
>> this context? Should it resolve the name in a separate
>> thread, allowing the guest to run while it is doing that?
>>
>> This question is important for me because right now I'm
>> evaluating a network-connected block device driver which
>> should do failover, so it will have to resolve alternative
>> name(s) at runtime (especially since the list of available
>> targets is dynamic).
>>
>> On the one hand, _usually_, the delay there is very small,
>> since it is unlikely you'll do migration or failover
>> overseas, so most likely you'll have the answer from DNS
>> handy. On the other hand, if the DNS is malfunctioning
>> right at that time (e.g. one of the two DNS resolvers is
>> being rebooted), the delay even from local DNS may be
>> noticeable.
>
> Yes, I think you are correct - QEMU should take care to ensure that
> DNS resolution can not block the QEMU event loop thread. There is
> the glibc extension (getaddrinfo_a) which does async DNS resolution,
> but for the sake of portability it is probably better to use a thread
> to do it.

I've prepared a V2 according to Kevin's comments:
https://github.com/kongove/qemu/commits/master

But I don't know how to handle the getaddrinfo issue. Which steps
should be done in a thread? Can anyone give a hint? Thanks.

== migrate steps ==
0. main_loop, qemu_iohandler_poll
1. get migration command from qemu monitor
2. parse host/port, get an address list by getaddrinfo()
3. connect server
4. check status and return to main_loop (step 0)
   (VMstate data is transmitted in background)

main_loop_wait()
 ...
 \- do_migrate()
     \- tcp_start_outgoing_migration()
         \- tcp_client_start()
             \- parse_host_port_info()
                 \- getaddrinfo()

--
Amos.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
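One way to picture the "use a thread" suggestion from this thread is the sketch below. It assumes POSIX threads and a pipe whose read end the main loop already polls. The names resolve_req, resolve_worker and resolve_start are hypothetical and do not exist in QEMU; this is an illustration of the idea, not the approach the series eventually took.

```c
#include <assert.h>
#include <netdb.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request object; none of these names come from QEMU. */
struct resolve_req {
    const char *host;        /* node name or numeric address */
    const char *service;     /* port number or service name */
    struct addrinfo *result; /* filled in by the worker on success */
    int gai_err;             /* return code of getaddrinfo() */
    int notify_fd;           /* write end of a pipe watched by the main loop */
};

/* Worker thread: the only place where the blocking lookup happens. */
static void *resolve_worker(void *opaque)
{
    struct resolve_req *req = opaque;
    struct addrinfo hints;
    char done = 1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM;

    req->gai_err = getaddrinfo(req->host, req->service, &hints, &req->result);

    /* Wake the main loop; it reads the byte and then inspects req. */
    if (write(req->notify_fd, &done, 1) != 1) {
        /* A sketch has nothing better to do; the caller can still join. */
    }
    return NULL;
}

/* Start the lookup and return at once; 0 on success, an errno value otherwise. */
static int resolve_start(struct resolve_req *req, pthread_t *tid)
{
    assert(req != NULL && tid != NULL);
    req->result = NULL;
    req->gai_err = 0;
    return pthread_create(tid, NULL, resolve_worker, req);
}
```

In terms of the migrate steps listed above, only step 2 would move into the worker; steps 3 and 4 would run from the handler that fires when the pipe becomes readable, so the guest keeps running while the lookup is in flight.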
[RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait
This is quite ugly.  Two threads, one running main_loop_wait and
one running qemu_aio_wait, can race with each other on running the
same iohandler.  The result is that an iohandler could run while the
underlying socket is not readable or writable, with possibly ill effects.

This shows as a failure to boot an IDE disk using the NBD device.
We can consider it a bug in NBD or in the main loop.  The patch fixes
this in main_loop_wait, which is always going to lose the race because
qemu_aio_wait runs select with the global lock held.

Reported-by: Laurent Vivier laur...@vivier.eu
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
        Anthony, if you think this is too ugly tell me and I can
        post an NBD fix too.

 main-loop.c |    7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/main-loop.c b/main-loop.c
index db23de0..3beccff 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -458,6 +458,13 @@ int main_loop_wait(int nonblocking)

     if (timeout > 0) {
         qemu_mutex_lock_iothread();
+
+        /* Poll again.  A qemu_aio_wait() on another thread
+         * could have made the fdsets stale.
+         */
+        tv.tv_sec = 0;
+        tv.tv_usec = 0;
+        ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv);
     }

     glib_select_poll(&rfds, &wfds, &xfds, (ret < 0));
--
1.7.7.6
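The "poll again with a zero timeout" step in the patch above can be tried in isolation. The following is a stand-alone sketch, assuming a POSIX system; fd_readable_now is a hypothetical helper name, not a QEMU function. It shows that select() with a zeroed struct timeval returns immediately and reports the current state of the descriptor, which is what the patch relies on to refresh stale fd sets.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Return 1 if fd is readable right now, 0 if it is not, -1 on error.
 * A zero timeout makes select() return immediately instead of blocking.
 */
static int fd_readable_now(int fd)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };
    int ret;

    assert(fd >= 0 && fd < FD_SETSIZE);

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    ret = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (ret < 0) {
        return -1;
    }
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}
```

If another thread has consumed the data between the first select() and the moment the lock is retaken, a second call like this one reports the descriptor as not readable, so the handler is not run on a socket that has nothing to offer.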
Re: [Qemu-devel] [PATCH 3/4] net: split hostname and service by last colon
On 02.03.2012 20:54, Laine Stump wrote:
> On 03/02/2012 05:35 AM, Kevin Wolf wrote:
>> On 02.03.2012 10:58, Amos Kong wrote:
>>> On 02/03/12 11:38, Amos Kong wrote:
>>>>> --- a/net.c
>>>>> +++ b/net.c
>>>>> @@ -84,7 +84,7 @@ static int get_str_sep(char *buf, int buf_size, const char **pp, int sep)
>>>>>      const char *p, *p1;
>>>>>      int len;
>>>>>      p = *pp;
>>>>> -    p1 = strchr(p, sep);
>>>>> +    p1 = strrchr(p, sep);
>>>>>      if (!p1)
>>>>>          return -1;
>>>>>      len = p1 - p;
>>>>
>>>> And what if the port isn't specified? I think you would erroneously
>>>> interpret the last part of the IP address as port.
>>>
>>> Hi Kevin, port must be specified in '-incoming' parameters and
>>> migrate monitor cmd.
>>>
>>> qemu-kvm ... -incoming tcp:$host:$port
>>> (qemu) migrate -d tcp:$host:$port
>>>
>>> If the guest is booted with a wrong command line, qemu will report
>>> an error message.
>>>
>>> # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -boot n -incoming tcp:2312::8272 -monitor stdio
>>> qemu-system-x86_64: qemu: getaddrinfo: Name or service not known
>>> tcp_server_start: Invalid argument
>>> Migration failed. Exit code tcp:2312::8272(-22), exiting.
>>
>> Which is because 2312: isn't a valid IP address, right? But what if
>> you have something like 2312::1234:8272? If you misinterpret the 8272
>> as a port number, the remaining address is still a valid IPv6 address.
>
> This is made irrelevant by PATCH 4/4, which allows for the IP address
> to be placed inside brackets:
>
>   [2312::8272]:port
>
> (at least it's irrelevant if your documentation *requires* brackets for
> all numeric ipv6-address:port pairs, which is strongly recommended by
> RFC 5952). It really is impossible to disambiguate the meaning of the
> final : unless you require these brackets (or 1) require full
> specification of all potential colons in the IPv6 address or 2) require
> that the port *always* be specified, neither of which seem acceptable
> to me).

Here you're actually explaining why it's not irrelevant. You don't want
to enforce port numbers, so 2312::1234:8272 must be interpreted as an
IPv6 address without a port.

This code however would take 8272 as the port and 2312::1234 as the
IPv6 address, which is not what you expected (even after brackets are
allowed - they don't make a difference because the example doesn't use
brackets).

Kevin
Re: [Qemu-devel] [PATCH 3/4] net: split hostname and service by last colon
----- Original Message -----
> On 02.03.2012 20:54, Laine Stump wrote:
>> On 03/02/2012 05:35 AM, Kevin Wolf wrote:
>>> On 02.03.2012 10:58, Amos Kong wrote:
>>>> On 02/03/12 11:38, Amos Kong wrote:
>>>>>> --- a/net.c
>>>>>> +++ b/net.c
>>>>>> @@ -84,7 +84,7 @@ static int get_str_sep(char *buf, int buf_size, const char **pp, int sep)
>>>>>>      const char *p, *p1;
>>>>>>      int len;
>>>>>>      p = *pp;
>>>>>> -    p1 = strchr(p, sep);
>>>>>> +    p1 = strrchr(p, sep);
>>>>>>      if (!p1)
>>>>>>          return -1;
>>>>>>      len = p1 - p;
>>>>>
>>>>> And what if the port isn't specified? I think you would erroneously
>>>>> interpret the last part of the IP address as port.
>>>>
>>>> Hi Kevin, port must be specified in '-incoming' parameters and
>>>> migrate monitor cmd.
>>>>
>>>> qemu-kvm ... -incoming tcp:$host:$port
>>>> (qemu) migrate -d tcp:$host:$port
>>>>
>>>> If the guest is booted with a wrong command line, qemu will report
>>>> an error message.
>>>>
>>>> # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -boot n -incoming tcp:2312::8272 -monitor stdio
>>>> qemu-system-x86_64: qemu: getaddrinfo: Name or service not known
>>>> tcp_server_start: Invalid argument
>>>> Migration failed. Exit code tcp:2312::8272(-22), exiting.
>>>
>>> Which is because 2312: isn't a valid IP address, right? But what if
>>> you have something like 2312::1234:8272? If you misinterpret the 8272
>>> as a port number, the remaining address is still a valid IPv6 address.
>>
>> This is made irrelevant by PATCH 4/4, which allows for the IP address
>> to be placed inside brackets:
>>
>>   [2312::8272]:port
>>
>> (at least it's irrelevant if your documentation *requires* brackets for
>> all numeric ipv6-address:port pairs, which is strongly recommended by
>> RFC 5952). It really is impossible to disambiguate the meaning of the
>> final : unless you require these brackets (or 1) require full
>> specification of all potential colons in the IPv6 address or 2) require
>> that the port *always* be specified, neither of which seem acceptable
>> to me).
>
> Here you're actually explaining why it's not irrelevant. You don't want
> to enforce port numbers, so 2312::1234:8272 must be interpreted as an
> IPv6 address without a port.
>
> This code however would take 8272 as the port and 2312::1234 as the
> IPv6 address, which is not what you expected (even after brackets are
> allowed - they don't make a difference because the example doesn't use
> brackets).

In the migration context, host and port are both required, so it's
right to parse 8272 as a port. However, for IPv6, brackets must be
mandatory if you require a port.

BTW, the DNS delay issue already existed in the past (gethostbyname());
it should be fixed by another patchset. I will post my V2 (without a
fix for the DNS delay) later.

> Kevin
Re: [Qemu-devel] [PATCH 3/4] net: split hostname and service by last colon
On 05.03.2012 09:59, Amos Kong wrote:
> ----- Original Message -----
>> On 02.03.2012 20:54, Laine Stump wrote:
>>> On 03/02/2012 05:35 AM, Kevin Wolf wrote:
>>>> On 02.03.2012 10:58, Amos Kong wrote:
>>>>> On 02/03/12 11:38, Amos Kong wrote:
>>>>>>> --- a/net.c
>>>>>>> +++ b/net.c
>>>>>>> @@ -84,7 +84,7 @@ static int get_str_sep(char *buf, int buf_size, const char **pp, int sep)
>>>>>>>      const char *p, *p1;
>>>>>>>      int len;
>>>>>>>      p = *pp;
>>>>>>> -    p1 = strchr(p, sep);
>>>>>>> +    p1 = strrchr(p, sep);
>>>>>>>      if (!p1)
>>>>>>>          return -1;
>>>>>>>      len = p1 - p;
>>>>>>
>>>>>> And what if the port isn't specified? I think you would erroneously
>>>>>> interpret the last part of the IP address as port.
>>>>>
>>>>> Hi Kevin, port must be specified in '-incoming' parameters and
>>>>> migrate monitor cmd.
>>>>>
>>>>> qemu-kvm ... -incoming tcp:$host:$port
>>>>> (qemu) migrate -d tcp:$host:$port
>>>>>
>>>>> If the guest is booted with a wrong command line, qemu will report
>>>>> an error message.
>>>>>
>>>>> # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -boot n -incoming tcp:2312::8272 -monitor stdio
>>>>> qemu-system-x86_64: qemu: getaddrinfo: Name or service not known
>>>>> tcp_server_start: Invalid argument
>>>>> Migration failed. Exit code tcp:2312::8272(-22), exiting.
>>>>
>>>> Which is because 2312: isn't a valid IP address, right? But what if
>>>> you have something like 2312::1234:8272? If you misinterpret the 8272
>>>> as a port number, the remaining address is still a valid IPv6 address.
>>>
>>> This is made irrelevant by PATCH 4/4, which allows for the IP address
>>> to be placed inside brackets:
>>>
>>>   [2312::8272]:port
>>>
>>> (at least it's irrelevant if your documentation *requires* brackets for
>>> all numeric ipv6-address:port pairs, which is strongly recommended by
>>> RFC 5952). It really is impossible to disambiguate the meaning of the
>>> final : unless you require these brackets (or 1) require full
>>> specification of all potential colons in the IPv6 address or 2) require
>>> that the port *always* be specified, neither of which seem acceptable
>>> to me).
>>
>> Here you're actually explaining why it's not irrelevant. You don't want
>> to enforce port numbers, so 2312::1234:8272 must be interpreted as an
>> IPv6 address without a port.
>>
>> This code however would take 8272 as the port and 2312::1234 as the
>> IPv6 address, which is not what you expected (even after brackets are
>> allowed - they don't make a difference because the example doesn't use
>> brackets).
>
> In the migration context, host and port are both required, so it's
> right to parse 8272 as a port. However, for IPv6, brackets must be
> mandatory if you require a port.

Makes sense.

> BTW, the DNS delay issue already existed in the past (gethostbyname());
> it should be fixed by another patchset. I will post my V2 (without a
> fix for the DNS delay) later.

Yes, I agree.

Kevin
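The rule the thread converges on (the port is mandatory, a bare string is split at the last colon, and an IPv6 literal that carries a port should be wrapped in brackets) can be sketched as below. This is an illustration only: split_host_port is a hypothetical helper and is not the parse_host_port_info() function from the series, whose details may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split "host:port" or "[ipv6]:port" into separate host and port strings.
 * Returns 0 on success, -1 on malformed input or if a buffer is too small.
 * Hypothetical helper for illustration; not the function from the series.
 */
static int split_host_port(const char *str, char *host, size_t host_len,
                           char *port, size_t port_len)
{
    const char *host_start, *host_end, *colon;
    size_t n;

    assert(str != NULL && host != NULL && port != NULL);

    if (str[0] == '[') {
        /* Bracketed literal: the host is everything up to the closing ']'. */
        host_start = str + 1;
        host_end = strchr(host_start, ']');
        if (!host_end || host_end[1] != ':') {
            return -1;
        }
        colon = host_end + 1;
    } else {
        /* No brackets: split at the last colon, so a port is mandatory. */
        colon = strrchr(str, ':');
        if (!colon) {
            return -1;
        }
        host_start = str;
        host_end = colon;
    }

    n = (size_t)(host_end - host_start);
    if (n >= host_len || colon[1] == '\0' || strlen(colon + 1) >= port_len) {
        return -1;
    }
    memcpy(host, host_start, n);
    host[n] = '\0';
    strcpy(port, colon + 1);
    return 0;
}
```

With this rule the unbracketed string 2312::1234:8272 is read as host 2312::1234 and port 8272, which is exactly the ambiguity Kevin points out; the bracketed form [2312::1234:8272]:4444 is the only way to say that 8272 belongs to the address.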
Re: [RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait
On 2012-03-05 09:34, Paolo Bonzini wrote:
> This is quite ugly.  Two threads, one running main_loop_wait and
> one running qemu_aio_wait, can race with each other on running the
> same iohandler.  The result is that an iohandler could run while the
> underlying socket is not readable or writable, with possibly ill effects.

Hmm, isn't it a problem already that a socket is polled by two threads
at the same time? Can't that be avoided?

Long-term, I'd like to cut out certain file descriptors from the main
loop and process them completely in separate threads (for separate
locking, prioritization etc.). Dunno how NBD works, but maybe it should
be reworked like this already.

Jan

> This shows as a failure to boot an IDE disk using the NBD device.
> We can consider it a bug in NBD or in the main loop.  The patch fixes
> this in main_loop_wait, which is always going to lose the race because
> qemu_aio_wait runs select with the global lock held.
>
> Reported-by: Laurent Vivier laur...@vivier.eu
> Signed-off-by: Paolo Bonzini pbonz...@redhat.com
> ---
>         Anthony, if you think this is too ugly tell me and I can
>         post an NBD fix too.
>
>  main-loop.c |    7 +++
>  1 files changed, 7 insertions(+), 0 deletions(-)
>
> diff --git a/main-loop.c b/main-loop.c
> index db23de0..3beccff 100644
> --- a/main-loop.c
> +++ b/main-loop.c
> @@ -458,6 +458,13 @@ int main_loop_wait(int nonblocking)
>
>      if (timeout > 0) {
>          qemu_mutex_lock_iothread();
> +
> +        /* Poll again.  A qemu_aio_wait() on another thread
> +         * could have made the fdsets stale.
> +         */
> +        tv.tv_sec = 0;
> +        tv.tv_usec = 0;
> +        ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv);
>      }
>
>      glib_select_poll(&rfds, &wfds, &xfds, (ret < 0));

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: [RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait
On 05/03/2012 10:07, Jan Kiszka wrote:
>> This is quite ugly.  Two threads, one running main_loop_wait and
>> one running qemu_aio_wait, can race with each other on running the
>> same iohandler.  The result is that an iohandler could run while the
>> underlying socket is not readable or writable, with possibly ill effects.
>
> Hmm, isn't it a problem already that a socket is polled by two threads
> at the same time? Can't that be avoided?

We still have synchronous I/O in the device models.  That's the root
cause of the bug, I suppose.

> Long-term, I'd like to cut out certain file descriptors from the main
> loop and process them completely in separate threads (for separate
> locking, prioritization etc.). Dunno how NBD works, but maybe it should
> be reworked like this already.

Me too, I even made a very simple proof of concept a couple of weeks
ago (search for a thread switching the block layer from coroutines to
threads).  It worked, though it is obviously not upstreamable in any way.

In that world order EventNotifiers would replace
qemu_aio_set_fd_handler, and socket-based protocols such as NBD would
run with blocking I/O in their own thread.  In addition to one thread
per I/O request (from a thread pool), there would be one arbiter thread
that reads replies and dispatches them to the appropriate I/O request
thread.  The arbiter thread replaces the read callback in
qemu_aio_set_fd_handler.

The problem is, even though it worked, making this thread-safe is
another story.  I suspect that in practice it is very difficult to do
without resurrecting RCU patches.

Paolo
Re: [PATCH 13/38] KVM: PPC: booke: category E.HV (GS-mode) support
> > +    /*
> > +     * Host interrupt handlers may have clobbered these guest-readable
> > +     * SPRGs, so we need to reload them here with the guest's values.
> > +     */
> > +    lwz     r3, VCPU_VRSAVE(r4)
> > +    lwz     r5, VCPU_SHARED_SPRG4(r11)
> > +    mtspr   SPRN_VRSAVE, r3
> > +    lwz     r6, VCPU_SHARED_SPRG5(r11)
> > +    mtspr   SPRN_SPRG4W, r5
> > +    lwz     r7, VCPU_SHARED_SPRG6(r11)
> > +    mtspr   SPRN_SPRG5W, r6
> > +    lwz     r8, VCPU_SHARED_SPRG7(r11)
> > +    mtspr   SPRN_SPRG6W, r7
> > +    mtspr   SPRN_SPRG7W, r8
> > +

That should be here.

> > +    /* Load some guest volatiles. */
> > +    PPC_LL  r3, VCPU_LR(r4)
> > +    PPC_LL  r5, VCPU_XER(r4)
> > +    PPC_LL  r6, VCPU_CTR(r4)
> > +    PPC_LL  r7, VCPU_CR(r4)
> > +    PPC_LL  r8, VCPU_PC(r4)
> > +#ifndef CONFIG_64BIT
> > +    lwz     r9, (VCPU_SHARED_MSR + 4)(r11)
> > +#else
> > +    ld      r9, (VCPU_SHARED_MSR)(r11)
> > +#endif
> > +    PPC_LL  r0, VCPU_GPR(r0)(r4)
> > +    PPC_LL  r1, VCPU_GPR(r1)(r4)
> > +    PPC_LL  r2, VCPU_GPR(r2)(r4)
> > +    PPC_LL  r10, VCPU_GPR(r10)(r4)
> > +    PPC_LL  r11, VCPU_GPR(r11)(r4)
> > +    PPC_LL  r12, VCPU_GPR(r12)(r4)
> > +    PPC_LL  r13, VCPU_GPR(r13)(r4)
> > +    mtlr    r3
> > +    mtxer   r5
> > +    mtctr   r6
> > +    mtcr    r7
> > +    mtsrr0  r8
> > +    mtsrr1  r9
> > +
> > +#ifdef CONFIG_KVM_EXIT_TIMING
> > +    /* save enter time */
> > +1:
> > +    mfspr   r6, SPRN_TBRU
> > +    mfspr   r7, SPRN_TBRL
> > +    mfspr   r8, SPRN_TBRU
> > +    cmpw    r8, r6
>
> Shouldn't we save guest CR after this? Otherwise this can corrupt it.

I think this should be a typo, since in our previous kvm implementation
we always collected kvm exit timing at the above location :)

Tiejun

> Thanks
> -Bharat
>
> > +    PPC_STL r7, VCPU_TIMING_LAST_ENTER_TBL(r4)
> > +    bne     1b
> > +    PPC_STL r8, VCPU_TIMING_LAST_ENTER_TBU(r4)
> > +#endif
> > +
> > +    /* Finish loading guest volatiles and jump to guest. */
> > +    PPC_LL  r5, VCPU_GPR(r5)(r4)
> > +    PPC_LL  r6, VCPU_GPR(r6)(r4)
> > +    PPC_LL  r7, VCPU_GPR(r7)(r4)
> > +    PPC_LL  r8, VCPU_GPR(r8)(r4)
> > +    PPC_LL  r9, VCPU_GPR(r9)(r4)
> > +
> > +    PPC_LL  r3, VCPU_GPR(r3)(r4)
> > +    PPC_LL  r4, VCPU_GPR(r4)(r4)
> > +    rfi
[PATCH 1/1 v3] PCI: Device specific reset function
---
 drivers/pci/pci.h    |    1 +
 drivers/pci/quirks.c |   33 +++--
 include/linux/pci.h  |    1 +
 3 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 1009a5e..4d10479 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -315,6 +315,7 @@ struct pci_dev_reset_methods {
        u16 vendor;
        u16 device;
        int (*reset)(struct pci_dev *dev, int probe);
+       struct list_head list;
 };

 #ifdef CONFIG_PCI_QUIRKS
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6476547..f423d2f 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3070,26 +3070,47 @@ static int reset_intel_82599_sfp_virtfn(struct pci_dev *dev, int probe)
 }

 #define PCI_DEVICE_ID_INTEL_82599_SFP_VF   0x10ed
-
-static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
+static struct pci_dev_reset_methods pci_dev_reset_methods[] = {
        { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82599_SFP_VF,
-                reset_intel_82599_sfp_virtfn },
+         reset_intel_82599_sfp_virtfn },
        { PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
                reset_intel_generic_dev },
-       { 0 }
 };

+static LIST_HEAD(reset_list);
+
+void pci_dev_specific_reset_add(struct pci_dev_reset_methods *reset_method)
+{
+       INIT_LIST_HEAD(&reset_method->list);
+       list_add(&reset_method->list, &reset_list);
+}
+
+static int __init pci_dev_specific_reset_init(void)
+{
+       int i;
+
+       for (i = 0; i < ARRAY_SIZE(pci_dev_reset_methods); i++) {
+               pci_dev_specific_reset_add(&pci_dev_reset_methods[i]);
+       }
+       return 0;
+}
+
+late_initcall(pci_dev_specific_reset_init);
+
 int pci_dev_specific_reset(struct pci_dev *dev, int probe)
 {
        const struct pci_dev_reset_methods *i;
+       struct pci_driver *drv = dev->driver;
+
+       if (drv && drv->reset)
+               return drv->reset(dev, probe);

-       for (i = pci_dev_reset_methods; i->reset; i++) {
+       list_for_each_entry(i, &reset_list, list) {
                if ((i->vendor == dev->vendor ||
                     i->vendor == (u16)PCI_ANY_ID) &&
                    (i->device == dev->device ||
                     i->device == (u16)PCI_ANY_ID))
                        return i->reset(dev, probe);
        }
-
        return -ENOTTY;
 }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index a16b1df..a3a0bc5 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -560,6 +560,7 @@ struct pci_driver {
        int  (*resume_early) (struct pci_dev *dev);
        int  (*resume) (struct pci_dev *dev);   /* Device woken up */
        void (*shutdown) (struct pci_dev *dev);
+       int (*reset) (struct pci_dev *dev, int probe);  /* Device specific reset */
        struct pci_error_handlers *err_handler;
        struct device_driver    driver;
        struct pci_dynids dynids;
--
1.7.7.6

--
Intel Shannon Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263
Business address: Dromore House, East Park, Shannon, Co. Clare

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
[PATCH 0/1] PCI: Device specific reset function
Hi,

I have a use case where I need to clean up resources allocated for
Virtual Functions after a guest OS that used them has crashed. This
cleanup needs to be done before the VF is FLRed. The only possible way
to do this seems to be the pci_dev_specific_reset() function.
Unfortunately, this function only works for devices defined in a static
table in drivers/pci/quirks.c. This patch changes that so that a
device-specific reset handler is part of struct pci_driver.

 drivers/pci/pci.h    |    1 +
 drivers/pci/quirks.c |   33 +++--
 include/linux/pci.h  |    1 +
 3 files changed, 29 insertions(+), 6 deletions(-)
[PATCH v2 0/9] support to migrate with IPv6 address
These patches make migration over IPv6 addresses work. The old code can
only parse an IPv4 address/port; this series uses getaddrinfo() to
obtain the socket address information.

The last two patches deal with splitting an IPv6 host/port.

Changes from v1:
- split the different changes into small patches, which are easier to
  review
- fixed some problems according to Kevin's comments

---

Amos Kong (9):
      net: introduce tcp_server_start()
      net: use tcp_server_start() for tcp server creation
      net: introduce tcp_client_start()
      net: use tcp_client_start for tcp client creation
      net: refector tcp_*_start functions
      net: use getaddrinfo() in tcp_start_common
      net: introduce parse_host_port_info()
      net: split hostname and service by last colon
      net: support to include ipv6 address by brackets

 migration-tcp.c |   62 +++--
 net.c           |  137 +++
 net/socket.c    |   64 ++
 qemu_socket.h   |    3 +
 4 files changed, 171 insertions(+), 95 deletions(-)

--
Amos Kong
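For readers who have not used getaddrinfo() on the listening side, the general pattern the cover letter refers to looks roughly like the sketch below, which follows the usual RFC 3493 idiom of walking the returned address list. It is a stand-alone illustration for POSIX systems; tcp_bind_any_family is a hypothetical name and the code is not taken from the series, which uses QEMU's own wrappers such as qemu_socket() and closesocket().

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a TCP socket bound to host:service, for IPv4 or IPv6.
 * Returns the socket on success, -1 if no address could be bound.
 * Illustrative helper; the name and signature are not from the series.
 */
static int tcp_bind_any_family(const char *host, const char *service)
{
    struct addrinfo hints, *result, *rp;
    int fd = -1;
    int on = 1;

    assert(service != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;     /* wildcard address if host is NULL */

    if (getaddrinfo(host, service, &hints, &result) != 0) {
        return -1;
    }

    /* Try each returned address until one can be bound. */
    for (rp = result; rp != NULL; rp = rp->ai_next) {
        fd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
        if (fd < 0) {
            continue;
        }
        /* allow fast reuse, as the original code does */
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
        if (bind(fd, rp->ai_addr, rp->ai_addrlen) == 0) {
            break;                   /* bound successfully */
        }
        close(fd);
        fd = -1;
    }

    freeaddrinfo(result);
    return fd;
}
```

The caller would still have to call listen() on the returned descriptor. Note that the descriptor itself is checked after socket(), and that struct addrinfo carries the address length, so no sizeof(struct sockaddr_in) assumption remains.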
[PATCH v2 1/9] net: introduce tcp_server_start()
Introduce tcp_server_start() by moving original code in
tcp_start_incoming_migration().

Signed-off-by: Amos Kong ak...@redhat.com
---
 net.c         |   27 +++
 qemu_socket.h |    2 ++
 2 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/net.c b/net.c
index c34474f..0260968 100644
--- a/net.c
+++ b/net.c
@@ -99,6 +99,33 @@ static int get_str_sep(char *buf, int buf_size, const char **pp, int sep)
     return 0;
 }

+int tcp_server_start(const char *str, int *fd)
+{
+    int val, ret;
+    struct sockaddr_in saddr;
+
+    if (parse_host_port(&saddr, str) < 0) {
+        return -1;
+    }
+
+    *fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
+    if (fd < 0) {
+        perror("socket");
+        return -1;
+    }
+    socket_set_nonblock(*fd);
+
+    /* allow fast reuse */
+    val = 1;
+    setsockopt(*fd, SOL_SOCKET, SO_REUSEADDR, (const char *)&val, sizeof(val));
+
+    ret = bind(*fd, (struct sockaddr *)&saddr, sizeof(saddr));
+    if (ret < 0) {
+        closesocket(*fd);
+    }
+    return ret;
+}
+
 int parse_host_port(struct sockaddr_in *saddr, const char *str)
 {
     char buf[512];
diff --git a/qemu_socket.h b/qemu_socket.h
index fe4cf6c..d612793 100644
--- a/qemu_socket.h
+++ b/qemu_socket.h
@@ -54,6 +54,8 @@ int unix_listen(const char *path, char *ostr, int olen);
 int unix_connect_opts(QemuOpts *opts);
 int unix_connect(const char *path);

+int tcp_server_start(const char *str, int *fd);
+
 /* Old, ipv4 only bits.  Don't use for new code. */
 int parse_host_port(struct sockaddr_in *saddr, const char *str);
 int socket_init(void);
[PATCH v2 2/9] net: use tcp_server_start() for tcp server creation
Use tcp_server_start in those two functions:
 tcp_start_incoming_migration()
 net_socket_listen_init()

Signed-off-by: Amos Kong ak...@redhat.com
---
 migration-tcp.c |   21 +
 net/socket.c    |   23 +++
 2 files changed, 8 insertions(+), 36 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index 35a5781..ecadd10 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -157,28 +157,17 @@ out2:

 int tcp_start_incoming_migration(const char *host_port)
 {
-    struct sockaddr_in addr;
-    int val;
+    int ret;
     int s;

     DPRINTF("Attempting to start an incoming migration\n");

-    if (parse_host_port(&addr, host_port) < 0) {
-        fprintf(stderr, "invalid host/port combination: %s\n", host_port);
-        return -EINVAL;
-    }
-
-    s = qemu_socket(PF_INET, SOCK_STREAM, 0);
-    if (s == -1) {
-        return -socket_error();
+    ret = tcp_server_start(host_port, &s);
+    if (ret < 0) {
+        fprintf(stderr, "tcp_server_start: %s\n", strerror(-ret));
+        return ret;
     }

-    val = 1;
-    setsockopt(s, SOL_SOCKET, SO_REUSEADDR, (const char *)&val, sizeof(val));
-
-    if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
-        goto err;
-    }
     if (listen(s, 1) == -1) {
         goto err;
     }
diff --git a/net/socket.c b/net/socket.c
index 0bcf229..5feb3d2 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -403,31 +403,14 @@ static int net_socket_listen_init(VLANState *vlan,
                                   const char *host_str)
 {
     NetSocketListenState *s;
-    int fd, val, ret;
-    struct sockaddr_in saddr;
-
-    if (parse_host_port(&saddr, host_str) < 0)
-        return -1;
+    int fd, ret;

     s = g_malloc0(sizeof(NetSocketListenState));

-    fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
-    if (fd < 0) {
-        perror("socket");
-        g_free(s);
-        return -1;
-    }
-    socket_set_nonblock(fd);
-
-    /* allow fast reuse */
-    val = 1;
-    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, (const char *)&val, sizeof(val));
-
-    ret = bind(fd, (struct sockaddr *)&saddr, sizeof(saddr));
+    ret = tcp_server_start(host_str, &fd);
     if (ret < 0) {
-        perror("bind");
+        error_report("tcp_server_start: %s", strerror(-ret));
         g_free(s);
-        closesocket(fd);
         return -1;
     }
     ret = listen(fd, 0);
[PATCH v2 3/9] net: introduce tcp_client_start()
Introduce tcp_client_start() by moving original code in
tcp_start_outgoing_migration().

Signed-off-by: Amos Kong ak...@redhat.com
---
 net.c         |   39 +++
 qemu_socket.h |    1 +
 2 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/net.c b/net.c
index 0260968..5c20e22 100644
--- a/net.c
+++ b/net.c
@@ -126,6 +126,45 @@ int tcp_server_start(const char *str, int *fd)
     return ret;
 }

+int tcp_client_start(const char *str, int *fd)
+{
+    struct sockaddr_in saddr;
+    int ret;
+
+    if (parse_host_port(&saddr, str) < 0) {
+        return -EINVAL;
+    }
+
+    *fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
+    if (fd < 0) {
+        perror("socket");
+        return -1;
+    }
+    socket_set_nonblock(*fd);
+
+    for (;;) {
+        ret = connect(*fd, (struct sockaddr *)&saddr, sizeof(saddr));
+        if (ret < 0) {
+            ret = -socket_error();
+            if (ret == -EINPROGRESS) {
+                break;
+#ifdef _WIN32
+            } else if (ret == -WSAEALREADY || ret == -WSAEINVAL) {
+                break;
+#endif
+            } else if (ret != -EINTR && ret != -EWOULDBLOCK) {
+                perror("connect");
+                closesocket(*fd);
+                return -1;
+            }
+        } else {
+            break;
+        }
+    }
+
+    return ret;
+}
+
 int parse_host_port(struct sockaddr_in *saddr, const char *str)
 {
     char buf[512];
diff --git a/qemu_socket.h b/qemu_socket.h
index d612793..9246578 100644
--- a/qemu_socket.h
+++ b/qemu_socket.h
@@ -55,6 +55,7 @@ int unix_connect_opts(QemuOpts *opts);
 int unix_connect(const char *path);

 int tcp_server_start(const char *str, int *fd);
+int tcp_client_start(const char *str, int *fd);

 /* Old, ipv4 only bits.  Don't use for new code. */
 int parse_host_port(struct sockaddr_in *saddr, const char *str);
[PATCH v2 4/9] net: use tcp_client_start for tcp client creation
Use tcp_client_start() in those two functions:
 tcp_start_outgoing_migration()
 net_socket_connect_init()

Signed-off-by: Amos Kong ak...@redhat.com
---
 migration-tcp.c |   41 +
 net/socket.c    |   41 +++--
 2 files changed, 24 insertions(+), 58 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index ecadd10..4f89bff 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -81,43 +81,28 @@ static void tcp_wait_for_connect(void *opaque)

 int tcp_start_outgoing_migration(MigrationState *s, const char *host_port)
 {
-    struct sockaddr_in addr;
     int ret;
-
-    ret = parse_host_port(&addr, host_port);
-    if (ret < 0) {
-        return ret;
-    }
+    int fd;

     s->get_error = socket_errno;
     s->write = socket_write;
     s->close = tcp_close;

-    s->fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
-    if (s->fd == -1) {
-        DPRINTF("Unable to open socket");
-        return -socket_error();
-    }
-
-    socket_set_nonblock(s->fd);
-
-    do {
-        ret = connect(s->fd, (struct sockaddr *)&addr, sizeof(addr));
-        if (ret == -1) {
-            ret = -socket_error();
-        }
-        if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
-            qemu_set_fd_handler2(s->fd, NULL, NULL, tcp_wait_for_connect, s);
-            return 0;
-        }
-    } while (ret == -EINTR);
-
-    if (ret < 0) {
+    ret = tcp_client_start(host_port, &fd);
+    s->fd = fd;
+    if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
+        DPRINTF("connect in progress");
+        qemu_set_fd_handler2(s->fd, NULL, NULL, tcp_wait_for_connect, s);
+    } else if (ret < 0) {
         DPRINTF("connect failed\n");
-        migrate_fd_error(s);
+        if (ret != -EINVAL) {
+            migrate_fd_error(s);
+        }
         return ret;
+    } else {
+        migrate_fd_connect(s);
     }
-    migrate_fd_connect(s);
+
     return 0;
 }

diff --git a/net/socket.c b/net/socket.c
index 5feb3d2..b7cd8ec 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -434,41 +434,22 @@ static int net_socket_connect_init(VLANState *vlan,
                                    const char *host_str)
 {
     NetSocketState *s;
-    int fd, connected, ret, err;
+    int fd, connected, ret;
     struct sockaddr_in saddr;

-    if (parse_host_port(&saddr, host_str) < 0)
-        return -1;
-
-    fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
-    if (fd < 0) {
-        perror("socket");
-        return -1;
-    }
-    socket_set_nonblock(fd);
-
-    connected = 0;
-    for(;;) {
-        ret = connect(fd, (struct sockaddr *)&saddr, sizeof(saddr));
-        if (ret < 0) {
-            err = socket_error();
-            if (err == EINTR || err == EWOULDBLOCK) {
-            } else if (err == EINPROGRESS) {
-                break;
+    ret = tcp_client_start(host_str, &fd);
+    if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
+        connected = 0;
 #ifdef _WIN32
-            } else if (err == WSAEALREADY || err == WSAEINVAL) {
-                break;
+    } else if (ret == -WSAEALREADY || ret == -WSAEINVAL) {
+        connected = 0;
 #endif
-            } else {
-                perror("connect");
-                closesocket(fd);
-                return -1;
-            }
-        } else {
-            connected = 1;
-            break;
-        }
+    } else if (ret < 0) {
+        return -1;
+    } else {
+        connected = 1;
     }
+
     s = net_socket_fd_init(vlan, model, name, fd, connected);
     if (!s)
         return -1;
[PATCH v2 5/9] net: refector tcp_*_start functions
There is some repeated code in tcp_server_start() and tcp_client_start().

Signed-off-by: Amos Kong ak...@redhat.com
---
 net.c |   82 -
 1 files changed, 46 insertions(+), 36 deletions(-)

diff --git a/net.c b/net.c
index 5c20e22..da2a8d4 100644
--- a/net.c
+++ b/net.c
@@ -99,37 +99,41 @@ static int get_str_sep(char *buf, int buf_size, const char **pp, int sep)
     return 0;
 }

-int tcp_server_start(const char *str, int *fd)
+static int tcp_server_bind(int fd, struct sockaddr_in *saddr)
 {
-    int val, ret;
-    struct sockaddr_in saddr;
+    int ret;
+    int val = 1;

-    if (parse_host_port(&saddr, str) < 0) {
-        return -1;
-    }
+    /* allow fast reuse */
+    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, (const char *)&val, sizeof(val));

-    *fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
-    if (fd < 0) {
-        perror("socket");
-        return -1;
+    ret = bind(fd, (struct sockaddr *)saddr, sizeof(*saddr));
+
+    if (ret == -1) {
+        ret = -socket_error();
     }
-    socket_set_nonblock(*fd);
+    return ret;

-    /* allow fast reuse */
-    val = 1;
-    setsockopt(*fd, SOL_SOCKET, SO_REUSEADDR, (const char *)&val, sizeof(val));
+}
+
+static int tcp_client_connect(int fd, struct sockaddr_in *saddr)
+{
+    int ret;
+
+    do {
+        ret = connect(fd, (struct sockaddr *)saddr, sizeof(*saddr));
+        if (ret == -1) {
+            ret = -socket_error();
+        }
+    } while (ret == -EINTR || ret == -EWOULDBLOCK);

-    ret = bind(*fd, (struct sockaddr *)&saddr, sizeof(saddr));
-    if (ret < 0) {
-        closesocket(*fd);
-    }
     return ret;
 }

-int tcp_client_start(const char *str, int *fd)
+static int tcp_start_common(const char *str, int *fd, bool server)
 {
+    int ret = -EINVAL;
     struct sockaddr_in saddr;
-    int ret;

     if (parse_host_port(&saddr, str) < 0) {
         return -EINVAL;
@@ -142,29 +146,35 @@ int tcp_client_start(const char *str, int *fd)
     }
     socket_set_nonblock(*fd);

-    for (;;) {
-        ret = connect(*fd, (struct sockaddr *)&saddr, sizeof(saddr));
-        if (ret < 0) {
-            ret = -socket_error();
-            if (ret == -EINPROGRESS) {
-                break;
+    if (server) {
+        ret = tcp_server_bind(*fd, &saddr);
+    } else {
+        ret = tcp_client_connect(*fd, &saddr);
+    }
+
 #ifdef _WIN32
-            } else if (ret == -WSAEALREADY || ret == -WSAEINVAL) {
-                break;
+    if (ret == -WSAEALREADY || ret == -WSAEINVAL) {
+        return ret;             /* Success */
+    }
 #endif
-            } else if (ret != -EINTR && ret != -EWOULDBLOCK) {
-                perror("connect");
-                closesocket(*fd);
-                return -1;
-            }
-        } else {
-            break;
-        }
+    if (ret >= 0 || ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
+        return ret;             /* Success */
     }

+    closesocket(*fd);
     return ret;
 }

+int tcp_server_start(const char *str, int *fd)
+{
+    return tcp_start_common(str, fd, true);
+}
+
+int tcp_client_start(const char *str, int *fd)
+{
+    return tcp_start_common(str, fd, false);
+}
+
 int parse_host_port(struct sockaddr_in *saddr, const char *str)
 {
     char buf[512];
[PATCH v2 6/9] net: use getaddrinfo() in tcp_start_common
Migration over an IPv6 address does not work: gethostbyname()/inet_aton()
cannot translate an IPv6 address and port, so use getaddrinfo() in
tcp_start_common() to translate the network address and service.
getaddrinfo() returns a list of addresses.

Userlevel IPv6 Programming Introduction:
http://www.akkadia.org/drepper/userapi-ipv6.html

Reference: RFC 3493, Basic Socket Interface Extensions for IPv6

Signed-off-by: Amos Kong <ak...@redhat.com>
---
 net.c |   81
 1 files changed, 60 insertions(+), 21 deletions(-)

diff --git a/net.c b/net.c
index da2a8d4..de1db8c 100644
--- a/net.c
+++ b/net.c
@@ -99,7 +99,7 @@ static int get_str_sep(char *buf, int buf_size, const char **pp, int sep)
     return 0;
 }
 
-static int tcp_server_bind(int fd, struct sockaddr_in *saddr)
+static int tcp_server_bind(int fd, struct addrinfo *rp)
 {
     int ret;
     int val = 1;
@@ -107,7 +107,7 @@ static int tcp_server_bind(int fd, struct sockaddr_in *saddr)
     /* allow fast reuse */
     setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, (const char *)&val, sizeof(val));
 
-    ret = bind(fd, (struct sockaddr *)saddr, sizeof(*saddr));
+    ret = bind(fd, rp->ai_addr, rp->ai_addrlen);
 
     if (ret == -1) {
         ret = -socket_error();
@@ -116,12 +116,12 @@ static int tcp_server_bind(int fd, struct sockaddr_in *saddr)
 
 }
 
-static int tcp_client_connect(int fd, struct sockaddr_in *saddr)
+static int tcp_client_connect(int fd, struct addrinfo *rp)
 {
     int ret;
 
     do {
-        ret = connect(fd, (struct sockaddr *)saddr, sizeof(*saddr));
+        ret = connect(fd, rp->ai_addr, rp->ai_addrlen);
         if (ret == -1) {
             ret = -socket_error();
         }
@@ -132,36 +132,75 @@ static int tcp_client_connect(int fd, struct sockaddr_in *saddr)
 
 static int tcp_start_common(const char *str, int *fd, bool server)
 {
+    char hostname[512];
+    const char *service;
+    const char *name;
+    struct addrinfo hints;
+    struct addrinfo *result, *rp;
+    int s;
+    int sfd;
     int ret = -EINVAL;
-    struct sockaddr_in saddr;
 
-    if (parse_host_port(&saddr, str) < 0) {
+    *fd = -1;
+    service = str;
+
+    if (get_str_sep(hostname, sizeof(hostname), &service, ':') < 0) {
+        error_report("invalid host/port combination: %s", str);
         return -EINVAL;
     }
-
-    *fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
-    if (fd < 0) {
-        perror("socket");
-        return -1;
+    if (server && strlen(hostname) == 0) {
+        name = NULL;
+    } else {
+        name = hostname;
     }
-    socket_set_nonblock(*fd);
+
+    /* Obtain address(es) matching host/port */
+
+    memset(&hints, 0, sizeof(struct addrinfo));
+    hints.ai_family = AF_UNSPEC;     /* Allow IPv4 or IPv6 */
+    hints.ai_socktype = SOCK_STREAM; /* Datagram socket */
 
     if (server) {
-        ret = tcp_server_bind(*fd, &saddr);
-    } else {
-        ret = tcp_client_connect(*fd, &saddr);
+        hints.ai_flags = AI_PASSIVE;
     }
 
-#ifdef _WIN32
-    if (ret == -WSAEALREADY || ret == -WSAEINVAL) {
-        return ret;                  /* Success */
+    s = getaddrinfo(name, service, &hints, &result);
+    if (s != 0) {
+        error_report("qemu: getaddrinfo: %s", gai_strerror(s));
+        return -EINVAL;
     }
+
+    /* getaddrinfo() returns a list of address structures.
+       Try each address until we successfully bind/connect).
+       If socket(2) (or bind/connect(2)) fails, we (close the socket
+       and) try the next address. */
+
+    for (rp = result; rp != NULL; rp = rp->ai_next) {
+        sfd = qemu_socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
+        if (sfd == -1) {
+            ret = -errno;
+            continue;
+        }
+        socket_set_nonblock(sfd);
+        if (server) {
+            ret = tcp_server_bind(sfd, rp);
+        } else {
+            ret = tcp_client_connect(sfd, rp);
+        }
+#ifdef _WIN32
+        if (ret == -WSAEALREADY || ret == -WSAEINVAL) {
+            *fd = sfd;
+            break;                  /* Success */
+        }
 #endif
-    if (ret >= 0 || ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
-        return ret;                  /* Success */
+        if (ret >= 0 || ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
+            *fd = sfd;
+            break;                  /* Success */
+        }
+        closesocket(sfd);
     }
 
-    closesocket(*fd);
+    freeaddrinfo(result);
     return ret;
 }
 
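The address-list loop in this patch can be tried outside QEMU. The following is a minimal sketch, assuming a POSIX system; bind_first_address() is a hypothetical name, and plain socket()/close() stand in for QEMU's qemu_socket()/closesocket(). It resolves a host/service pair with getaddrinfo() and binds to the first address that works. Note that SOCK_STREAM requests a stream socket, although the comment in the patch labels it a datagram socket.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch: resolve host/service and
 * bind a stream socket to the first usable address.
 * Returns the bound fd, -EINVAL if resolution fails, or the negative
 * errno of the last socket()/bind() failure.
 */
static int bind_first_address(const char *host, const char *service)
{
    struct addrinfo hints, *result, *rp;
    int fd;
    int ret = -EINVAL;
    int on = 1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* allow IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* stream (TCP) socket */
    hints.ai_flags = AI_PASSIVE;      /* wildcard address if host is NULL */

    if (getaddrinfo(host, service, &hints, &result) != 0) {
        return -EINVAL;
    }

    for (rp = result; rp != NULL; rp = rp->ai_next) {
        fd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
        if (fd == -1) {
            ret = -errno;
            continue;                 /* try the next address */
        }
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
        if (bind(fd, rp->ai_addr, rp->ai_addrlen) == 0) {
            ret = fd;                 /* success */
            break;
        }
        ret = -errno;                 /* save before close() can change it */
        close(fd);
    }

    freeaddrinfo(result);
    return ret;
}
```

Calling bind_first_address("127.0.0.1", "0") should return a descriptor bound to an ephemeral port on the loopback interface; the caller owns it and must close it.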
[PATCH v2 7/9] net: introduce parse_host_port_info()
int parse_host_port(struct sockaddr_in *saddr, const char *str)
  The parsed address is stored in 'saddr'; it supports only IPv4.
  This function is used by net_socket_mcast_init() and
  net_socket_udp_init().

int parse_host_port_info(struct addrinfo **result, const char *str,
                         bool server)
  The parsed address info is stored in 'result', which is an address
  list. It can be used to parse an IPv6 address/port.

Signed-off-by: Amos Kong <ak...@redhat.com>
---
 net.c |   26 ++++++++++++++++++++------
 1 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/net.c b/net.c
index de1db8c..2518e5f 100644
--- a/net.c
+++ b/net.c
@@ -130,18 +130,15 @@ static int tcp_client_connect(int fd, struct addrinfo *rp)
     return ret;
 }
 
-static int tcp_start_common(const char *str, int *fd, bool server)
+static int parse_host_port_info(struct addrinfo **result, const char *str,
+                                bool server)
 {
     char hostname[512];
     const char *service;
     const char *name;
     struct addrinfo hints;
-    struct addrinfo *result, *rp;
     int s;
-    int sfd;
-    int ret = -EINVAL;
 
-    *fd = -1;
     service = str;
 
     if (get_str_sep(hostname, sizeof(hostname), &service, ':') < 0) {
@@ -164,12 +161,29 @@ static int tcp_start_common(const char *str, int *fd, bool server)
         hints.ai_flags = AI_PASSIVE;
     }
 
-    s = getaddrinfo(name, service, &hints, &result);
+    s = getaddrinfo(name, service, &hints, result);
     if (s != 0) {
         error_report("qemu: getaddrinfo: %s", gai_strerror(s));
         return -EINVAL;
     }
 
+    return 0;
+}
+
+static int tcp_start_common(const char *str, int *fd, bool server)
+{
+    struct addrinfo *rp;
+    int sfd;
+    int ret = -EINVAL;
+    struct addrinfo *result;
+
+    *fd = -1;
+
+    ret = parse_host_port_info(&result, str, server);
+    if (ret < 0) {
+        return -EINVAL;
+    }
+
     /* getaddrinfo() returns a list of address structures.
        Try each address until we successfully bind/connect).
        If socket(2) (or bind/connect(2)) fails, we (close the socket
[PATCH v2 8/9] net: split hostname and service by last colon
An IPv6 address contains colons, so splitting host and port at the
first colon parses it incorrectly, e.g.:

    [2312::8274]:5200

Split at the last colon instead.

Signed-off-by: Amos Kong <ak...@redhat.com>
---
 net.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net.c b/net.c
index 2518e5f..d6ce1fa 100644
--- a/net.c
+++ b/net.c
@@ -84,7 +84,7 @@ static int get_str_sep(char *buf, int buf_size, const char **pp, int sep)
     const char *p, *p1;
     int len;
     p = *pp;
-    p1 = strchr(p, sep);
+    p1 = strrchr(p, sep);
     if (!p1)
         return -1;
     len = p1 - p;
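To illustrate why the last colon is the right split point, here is a small standalone sketch. split_last_colon() is a hypothetical helper written for this example, not QEMU's get_str_sep(); it copies everything before the last colon into a buffer and returns a pointer to the port part.

```c
#include <string.h>

/*
 * Hypothetical helper: split "host:port" at the LAST colon.
 * Copies the host part into buf and returns a pointer to the port part,
 * or NULL if there is no colon or buf is too small.
 */
static const char *split_last_colon(const char *str, char *buf, size_t buf_size)
{
    const char *colon = strrchr(str, ':');
    size_t len;

    if (colon == NULL) {
        return NULL;
    }
    len = (size_t)(colon - str);
    if (len >= buf_size) {
        return NULL;
    }
    memcpy(buf, str, len);
    buf[len] = '\0';
    return colon + 1;
}
```

With the input "[2312::8274]:5200" the host part is "[2312::8274]" and the port part is "5200". Splitting at the first colon with strchr() would instead cut the string after "[2312".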
[PATCH v2 9/9] net: support to include ipv6 address by brackets
Writing an IPv6 address with a port but without brackets is discouraged
because it is ambiguous. According to RFC 5952, the recommended format
is:

    [2312::8274]:5200

For IPv6, brackets are mandatory when a port is given.

Test status: succeeded
listen side: qemu-kvm -incoming tcp:[2312::8274]:5200
client side: qemu-kvm ...
             (qemu) migrate -d tcp:[2312::8274]:5200

Signed-off-by: Amos Kong <ak...@redhat.com>
---
 net.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/net.c b/net.c
index d6ce1fa..499ed1d 100644
--- a/net.c
+++ b/net.c
@@ -88,6 +88,12 @@ static int get_str_sep(char *buf, int buf_size, const char **pp, int sep)
     if (!p1)
         return -1;
     len = p1 - p;
+    /* remove brackets which includes hostname */
+    if (*p == '[' && *(p1-1) == ']') {
+        p += 1;
+        len -= 2;
+    }
+
     p1++;
     if (buf_size > 0) {
         if (len > buf_size - 1)
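The bracket handling can be shown in isolation. This is a sketch only: strip_brackets() is a hypothetical helper that works in place on an already separated host string, whereas the patch adjusts the pointer and length inside get_str_sep().

```c
#include <string.h>

/*
 * Hypothetical helper: remove one pair of enclosing square brackets
 * from a host string in place, so "[2312::8274]" becomes "2312::8274".
 * Strings without a matching pair are left unchanged.
 */
static void strip_brackets(char *host)
{
    size_t len = strlen(host);

    if (len >= 2 && host[0] == '[' && host[len - 1] == ']') {
        memmove(host, host + 1, len - 2);
        host[len - 2] = '\0';
    }
}
```

The explicit length check keeps the helper safe for empty and one-character strings. The bracket-free result is the form a resolver such as getaddrinfo() expects for a numeric IPv6 host.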
KVM call agenda for Tuesday 6th
Hi Please send in any agenda items you are interested in covering. Cheers, Juan.
[PATCH] Restore guest CR after exit timing calculation
No instruction which can change Condition Register (CR) should be
executed after Guest CR is loaded. So the guest CR is restored after
the Exit Timing in lightweight_exit executes cmpw, which can clobber CR.

Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
---
This patch is against e500mc branch.

 arch/powerpc/kvm/bookehv_interrupts.S |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index 63fc5f0..6b9389f 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -574,7 +574,6 @@ lightweight_exit:
        mtlr    r3
        mtxer   r5
        mtctr   r6
-       mtcr    r7
        mtsrr0  r8
        mtsrr1  r9
 
@@ -582,14 +581,20 @@ lightweight_exit:
        /* save enter time */
 1:
        mfspr   r6, SPRN_TBRU
-       mfspr   r7, SPRN_TBRL
+       mfspr   r9, SPRN_TBRL
        mfspr   r8, SPRN_TBRU
        cmpw    r8, r6
-       PPC_STL r7, VCPU_TIMING_LAST_ENTER_TBL(r4)
+       PPC_STL r9, VCPU_TIMING_LAST_ENTER_TBL(r4)
        bne     1b
        PPC_STL r8, VCPU_TIMING_LAST_ENTER_TBU(r4)
 #endif
 
+       /*
+        * Don't execute any instruction which can change CR after
+        * below instruction.
+        */
+       mtcr    r7
+
        /* Finish loading guest volatiles and jump to guest. */
        PPC_LL  r5, VCPU_GPR(r5)(r4)
        PPC_LL  r6, VCPU_GPR(r6)(r4)
--
1.7.0.4
[PATCH] KVM: Ensure all vcpus are consistent with in-kernel irqchip settings
If some vcpus are created before KVM_CREATE_IRQCHIP, then
irqchip_in_kernel() and vcpu->arch.apic will be inconsistent, leading
to potential NULL pointer dereferences.

Fix by:
- ensuring that no vcpus are installed when KVM_CREATE_IRQCHIP is called
- ensuring that a vcpu has an apic if it is installed after
  KVM_CREATE_IRQCHIP

This is somewhat long winded because vcpu->arch.apic is created without
kvm->lock held.

Based on earlier patch by Michael Ellerman.

Signed-off-by: Michael Ellerman <mich...@ellerman.id.au>
Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/ia64/kvm/kvm-ia64.c |    5 +++++
 arch/x86/kvm/x86.c       |    8 ++++++++
 include/linux/kvm_host.h |    7 +++++++
 virt/kvm/kvm_main.c      |    4 ++++
 4 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index d8ddbba..f5104b7 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1172,6 +1172,11 @@ static enum hrtimer_restart hlt_timer_fn(struct hrtimer *data)
 
 #define PALE_RESET_ENTRY    0x8000ffb0UL
 
+bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu)
+{
+        return irqchip_in_kernel(vcpu->kvm) == (vcpu->arch.apic != NULL);
+}
+
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
         struct kvm_vcpu *v;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3ee008f..be9594a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3199,6 +3199,9 @@ long kvm_arch_vm_ioctl(struct file *filp,
                 r = -EEXIST;
                 if (kvm->arch.vpic)
                         goto create_irqchip_unlock;
+                r = -EINVAL;
+                if (atomic_read(&kvm->online_vcpus))
+                        goto create_irqchip_unlock;
                 r = -ENOMEM;
                 vpic = kvm_create_pic(kvm);
                 if (vpic) {
@@ -6107,6 +6110,11 @@ void kvm_arch_check_processor_compat(void *rtn)
         kvm_x86_ops->check_processor_compatibility(rtn);
 }
 
+bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu)
+{
+        return irqchip_in_kernel(vcpu->kvm) == (vcpu->arch.apic != NULL);
+}
+
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
         struct page *page;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 355e445..759fa26 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -805,6 +805,13 @@ static inline bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu)
 {
         return vcpu->kvm->bsp_vcpu_id == vcpu->vcpu_id;
 }
+
+bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu);
+
+#else
+
+static bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu) { return true; }
+
 #endif
 
 #ifdef __KVM_HAVE_DEVICE_ASSIGNMENT
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e4431ad..94e148e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1651,6 +1651,10 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
                 goto vcpu_destroy;
 
         mutex_lock(&kvm->lock);
+        if (!kvm_vcpu_compatible(vcpu)) {
+                r = -EINVAL;
+                goto unlock_vcpu_destroy;
+        }
         if (atomic_read(&kvm->online_vcpus) == KVM_MAX_VCPUS) {
                 r = -EINVAL;
                 goto unlock_vcpu_destroy;
--
1.7.9
Re: [Qemu-devel] [PATCH v2 1/9] net: introduce tcp_server_start()
On 03/05/2012 12:03 PM, Amos Kong wrote: Introduce tcp_server_start() by moving original code in tcp_start_incoming_migration(). Signed-off-by: Amos Kong ak...@redhat.com --- net.c | 27 +++ qemu_socket.h |2 ++ 2 files changed, 29 insertions(+), 0 deletions(-) diff --git a/net.c b/net.c index c34474f..0260968 100644 --- a/net.c +++ b/net.c @@ -99,6 +99,33 @@ static int get_str_sep(char *buf, int buf_size, const char **pp, int sep) return 0; } +int tcp_server_start(const char *str, int *fd) +{ +int val, ret; +struct sockaddr_in saddr; + +if (parse_host_port(saddr, str) 0) { error message would be nice +return -1; +} + +*fd = qemu_socket(PF_INET, SOCK_STREAM, 0); +if (fd 0) { +perror(socket); +return -1; +} this is actually net_socket_listen_init version tcp_start_incoming_migration returns the error -socket_error(). I prefer not to lose the errno. I know that when calling net_socket_listen_init for some unknown reason there is an explict check for -1 if (net_socket_listen_init(vlan, socket, name, listen) == -1) { I think it is a good opportunity to change this check. Orit +socket_set_nonblock(*fd); + +/* allow fast reuse */ +val = 1; +setsockopt(*fd, SOL_SOCKET, SO_REUSEADDR, (const char *)val, sizeof(val)); + +ret = bind(*fd, (struct sockaddr *)saddr, sizeof(saddr)); +if (ret 0) { +closesocket(*fd); +} +return ret; +} + int parse_host_port(struct sockaddr_in *saddr, const char *str) { char buf[512]; diff --git a/qemu_socket.h b/qemu_socket.h index fe4cf6c..d612793 100644 --- a/qemu_socket.h +++ b/qemu_socket.h @@ -54,6 +54,8 @@ int unix_listen(const char *path, char *ostr, int olen); int unix_connect_opts(QemuOpts *opts); int unix_connect(const char *path); +int tcp_server_start(const char *str, int *fd); + /* Old, ipv4 only bits. Don't use for new code. 
*/ int parse_host_port(struct sockaddr_in *saddr, const char *str); int socket_init(void); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v2 3/9] net: introduce tcp_client_start()
On 03/05/2012 12:03 PM, Amos Kong wrote: Introduce tcp_client_start() by moving original code in tcp_start_outgoing_migration(). Signed-off-by: Amos Kong ak...@redhat.com --- net.c | 39 +++ qemu_socket.h |1 + 2 files changed, 40 insertions(+), 0 deletions(-) diff --git a/net.c b/net.c index 0260968..5c20e22 100644 --- a/net.c +++ b/net.c @@ -126,6 +126,45 @@ int tcp_server_start(const char *str, int *fd) return ret; } +int tcp_client_start(const char *str, int *fd) +{ +struct sockaddr_in saddr; +int ret; + +if (parse_host_port(saddr, str) 0) { +return -EINVAL; +} + +*fd = qemu_socket(PF_INET, SOCK_STREAM, 0); +if (fd 0) { +perror(socket); +return -1; +} +socket_set_nonblock(*fd); + +for (;;) { +ret = connect(*fd, (struct sockaddr *)saddr, sizeof(saddr)); +if (ret 0) { +ret = -socket_error(); +if (ret == -EINPROGRESS) { +break; +#ifdef _WIN32 +} else if (ret == -WSAEALREADY || ret == -WSAEINVAL) { +break; +#endif +} else if (ret != -EINTR ret != -EWOULDBLOCK) { +perror(connect); +closesocket(*fd); +return -1; I think it should be: return ret (otherwise you lose the error code). And you need it. Orit +} +} else { +break; +} +} + +return ret; +} + int parse_host_port(struct sockaddr_in *saddr, const char *str) { char buf[512]; diff --git a/qemu_socket.h b/qemu_socket.h index d612793..9246578 100644 --- a/qemu_socket.h +++ b/qemu_socket.h @@ -55,6 +55,7 @@ int unix_connect_opts(QemuOpts *opts); int unix_connect(const char *path); int tcp_server_start(const char *str, int *fd); +int tcp_client_start(const char *str, int *fd); /* Old, ipv4 only bits. Don't use for new code. */ int parse_host_port(struct sockaddr_in *saddr, const char *str); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v2 2/9] net: use tcp_server_start() for tcp server creation
On 03/05/2012 12:03 PM, Amos Kong wrote: Use tcp_server_start in those two functions: tcp_start_incoming_migration() net_socket_listen_init() Signed-off-by: Amos Kong ak...@redhat.com --- migration-tcp.c | 21 + net/socket.c| 23 +++ 2 files changed, 8 insertions(+), 36 deletions(-) diff --git a/migration-tcp.c b/migration-tcp.c index 35a5781..ecadd10 100644 --- a/migration-tcp.c +++ b/migration-tcp.c @@ -157,28 +157,17 @@ out2: int tcp_start_incoming_migration(const char *host_port) { -struct sockaddr_in addr; -int val; +int ret; int s; DPRINTF(Attempting to start an incoming migration\n); -if (parse_host_port(addr, host_port) 0) { -fprintf(stderr, invalid host/port combination: %s\n, host_port); -return -EINVAL; -} - -s = qemu_socket(PF_INET, SOCK_STREAM, 0); -if (s == -1) { -return -socket_error(); +ret = tcp_server_start(host_port, s); +if (ret 0) { +fprintf(stderr, tcp_server_start: %s\n, strerror(-ret)); +return ret; } -val = 1; -setsockopt(s, SOL_SOCKET, SO_REUSEADDR, (const char *)val, sizeof(val)); - -if (bind(s, (struct sockaddr *)addr, sizeof(addr)) == -1) { -goto err; -} if (listen(s, 1) == -1) { goto err; } diff --git a/net/socket.c b/net/socket.c index 0bcf229..5feb3d2 100644 --- a/net/socket.c +++ b/net/socket.c @@ -403,31 +403,14 @@ static int net_socket_listen_init(VLANState *vlan, const char *host_str) { NetSocketListenState *s; -int fd, val, ret; -struct sockaddr_in saddr; - -if (parse_host_port(saddr, host_str) 0) -return -1; +int fd, ret; s = g_malloc0(sizeof(NetSocketListenState)); -fd = qemu_socket(PF_INET, SOCK_STREAM, 0); -if (fd 0) { -perror(socket); -g_free(s); -return -1; -} -socket_set_nonblock(fd); - -/* allow fast reuse */ -val = 1; -setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, (const char *)val, sizeof(val)); - -ret = bind(fd, (struct sockaddr *)saddr, sizeof(saddr)); +ret = tcp_server_start(host_str, fd); if (ret 0) { -perror(bind); +error_report(tcp_server_start: %s, strerror(-ret)); If the return value is always -1 this has no meaning 
Orit g_free(s); -closesocket(fd); return -1; } ret = listen(fd, 0); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v2 3/9] net: introduce tcp_client_start()
On 03/05/2012 12:03 PM, Amos Kong wrote: Introduce tcp_client_start() by moving original code in tcp_start_outgoing_migration(). Signed-off-by: Amos Kong ak...@redhat.com --- net.c | 39 +++ qemu_socket.h |1 + 2 files changed, 40 insertions(+), 0 deletions(-) diff --git a/net.c b/net.c index 0260968..5c20e22 100644 --- a/net.c +++ b/net.c @@ -126,6 +126,45 @@ int tcp_server_start(const char *str, int *fd) return ret; } +int tcp_client_start(const char *str, int *fd) +{ +struct sockaddr_in saddr; +int ret; + +if (parse_host_port(saddr, str) 0) { +return -EINVAL; You use this in order to know when to call migrate_fd_error this is problematic as another error can return this error code. I think that setting *fd = -1 in the beginning of the function would be enough. Orit +} + +*fd = qemu_socket(PF_INET, SOCK_STREAM, 0); +if (fd 0) { +perror(socket); +return -1; +} +socket_set_nonblock(*fd); + +for (;;) { +ret = connect(*fd, (struct sockaddr *)saddr, sizeof(saddr)); +if (ret 0) { +ret = -socket_error(); +if (ret == -EINPROGRESS) { +break; +#ifdef _WIN32 +} else if (ret == -WSAEALREADY || ret == -WSAEINVAL) { +break; +#endif +} else if (ret != -EINTR ret != -EWOULDBLOCK) { +perror(connect); +closesocket(*fd); +return -1; should be return ret; +} +} else { +break; +} +} + +return ret; +} + int parse_host_port(struct sockaddr_in *saddr, const char *str) { char buf[512]; diff --git a/qemu_socket.h b/qemu_socket.h index d612793..9246578 100644 --- a/qemu_socket.h +++ b/qemu_socket.h @@ -55,6 +55,7 @@ int unix_connect_opts(QemuOpts *opts); int unix_connect(const char *path); int tcp_server_start(const char *str, int *fd); +int tcp_client_start(const char *str, int *fd); /* Old, ipv4 only bits. Don't use for new code. */ int parse_host_port(struct sockaddr_in *saddr, const char *str); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] KVM call agenda for tuesday 31
On 02/21/2012 07:33 PM, Peter Maydell wrote:
On 9 February 2012 22:23, Peter Maydell <peter.mayd...@linaro.org> wrote:
Ping re the VMState and variable sized arrays issue. I don't see any
consensus in this discussion for a different approach, so should we
just commit Mitsyanko's patchset?

From an IRC conversation I just had with Anthony and Juan:

===begin==
14:51 pm215 quintela: do you have an opinion on the vmstate variable-length array stuff (needed for sd card) ?
14:51 quintela pm havent looked, email title?
14:52 pm215 KVM call agenda for tuesday 31 is the most recent discussion :-)
14:53 pm215 http://patchwork.ozlabs.org/patch/137732/ and http://patchwork.ozlabs.org/patch/137733/ are the relevant patches
14:54 quintela pm215: found it, that it is a difficult thing to do (TM)
14:54 quintela it should be on the card file, or whatever :-(
14:55 quintela notice the should part.
14:55 pm215 I'm not sure what you mean, can you elaborate?
14:57 quintela pm215: sect is number of sectors, right?
14:58 pm215 yes
14:59 quintela so, a 1GB card would have around 8MB array?
14:59 quintela took or left some byties here andthere.
14:59 quintela bytes indeed.
14:59 quintela I _think_ that we should put that in a save_live section, but that is just me (TM)
15:00 quintela I guess that at some point, people are going to need bigger SD cards (16GB are already on the wild, right)
15:00 quintela and that would make live migration just impossible
15:00 quintela or very slow, that is completely equivalent.
15:01 quintela it is my understanding that AHCI is using similar code, or did I missread some of the information?
15:03 pm215 I think alex would like ahci to use a similar 'variable length array' thing, but in that case the array is much smaller
15:03 aliguori pm215, it's large but of bounded size, right?
15:03 aliguori for SD cards?
15:03 quintela aliguori: number of sectors * 4 bytes
15:03 quintela aliguori: so hugee
15:04 quintela 8MB array for each 1GB of disk.
15:04 quintela but or take some bytes.
15:04 aliguori quintela, you cannot save that much data via savevm
15:04 quintela this is more than all other devices combined in a normal instalaltion.
15:04 aliguori that's too much
15:04 quintela aliguori: see my answer, we need a save_live section, really.
15:04 aliguori it will screw up the live migration downtime algorithm
15:04 aliguori pm215, ^
15:04 aliguori or just treat it as a ram section
15:05 aliguori qemu_ram_alloc() it, and call it a day
15:05 pm215 the spec isn't very clear, but I think technically this info should go in the sd card image, except there's no way to tack additional info into a raw file
15:05 aliguori pm215, yeah, qemu_ram_alloc() is the way to go I think, that makes it effectively volatile on-card RAM
15:05 quintela pm215: I fully agree that it should go into the card image, but . no space for it :-(
15:05 aliguori which i think makes sense
15:05 quintela pm215: another thing, forgetting about migration at all.
15:06 quintela how does this work if you stop marchine and restart it again?
15:06 * quintela guess that it is stored somewhere?
15:06 quintela s/marchine/machine/
15:06 pm215 no, we just assume that any fresh sd card image has no write-protect set up
15:07 quintela pm215: what is stored on that image? /me would have assumed that wearing information
15:07 quintela but that is without reading the whole code.
15:09 quintela humm, it looks like only 1 bit is used for each sector, why are we storing 32 bits if we only use 1 bit?
15:09 pm215 it's write-protect : you can set parts of the sd card to not be writeable (with a granularity of a write-protect group size)
15:09 pm215 we probably don't implement fantastically efficiently
15:10 quintela pm215: ok, only 1 bit is needed.
15:10 quintela we can move to 1bit/sector (8 times smaller)
15:10 quintela but I still think that doing the qemu_ram_alloc() trick that aliguori pointed is the easiest way to fix it
15:11 quintela you can create a ram_save_live section, but it is going to be more complex for almost no gain
15:11 pm215 ah, so we qemu_ram_alloc() it and then the contents get transferred in the same way as main memory ?
15:12 pm215 ...that is in exec-obsolete.h and marked as to be removed soon...
15:13 aliguori pm215, yeah, so you'll need to create it using whatever the new fancy memory api interface is
15:13 aliguori pm215, note that whenever you touch that memory, you have to set the dirty bits appropriately
15:13 aliguori or else live migration won't work
15:14 quintela aliguori: if they have to touch the dirty bits, it is equivalent to do their own save_live section.
15:14 quintela but I agree that this is the only easy solution.
15:17 pm215 doesn't sound too hard...
15:18 quintela as usual with vmstate, problem is testing (althought shouldn't be
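The size estimate in the log can be checked with a short sketch. It assumes 512-byte sectors and one entry per sector, as in the discussion; the wp_* names are hypothetical and do not come from the QEMU SD card code. With those assumptions a 1 GiB card has 2^21 sectors, so a 32-bit entry per sector takes 8 MiB, while one bit per sector takes 256 KiB, which is 32 times smaller rather than 8.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helpers for a one-bit-per-sector write-protect map. */

/* Number of bytes needed to hold one bit for each of nsectors sectors. */
static size_t wp_bitmap_bytes(size_t nsectors)
{
    return (nsectors + CHAR_BIT - 1) / CHAR_BIT;
}

/* Set (on != 0) or clear (on == 0) the write-protect bit of a sector. */
static void wp_set(unsigned char *map, size_t sector, int on)
{
    unsigned char mask = (unsigned char)(1u << (sector % CHAR_BIT));

    if (on) {
        map[sector / CHAR_BIT] |= mask;
    } else {
        map[sector / CHAR_BIT] &= (unsigned char)~mask;
    }
}

/* Return 1 if the sector is write-protected, 0 otherwise. */
static int wp_get(const unsigned char *map, size_t sector)
{
    return (map[sector / CHAR_BIT] >> (sector % CHAR_BIT)) & 1;
}
```

If write protection is tracked per write-protect group rather than per sector, as pm215 describes, the map shrinks further by the group size.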
Re: [Qemu-devel] [PATCH v2 4/9] net: use tcp_client_start for tcp client creation
On 03/05/2012 12:03 PM, Amos Kong wrote: Use tcp_client_start() in those two functions: tcp_start_outgoing_migration() net_socket_connect_init() Signed-off-by: Amos Kong ak...@redhat.com --- migration-tcp.c | 41 + net/socket.c| 41 +++-- 2 files changed, 24 insertions(+), 58 deletions(-) diff --git a/migration-tcp.c b/migration-tcp.c index ecadd10..4f89bff 100644 --- a/migration-tcp.c +++ b/migration-tcp.c @@ -81,43 +81,28 @@ static void tcp_wait_for_connect(void *opaque) int tcp_start_outgoing_migration(MigrationState *s, const char *host_port) { -struct sockaddr_in addr; int ret; - -ret = parse_host_port(addr, host_port); -if (ret 0) { -return ret; -} +int fd; s-get_error = socket_errno; s-write = socket_write; s-close = tcp_close; -s-fd = qemu_socket(PF_INET, SOCK_STREAM, 0); -if (s-fd == -1) { -DPRINTF(Unable to open socket); -return -socket_error(); -} - -socket_set_nonblock(s-fd); - -do { -ret = connect(s-fd, (struct sockaddr *)addr, sizeof(addr)); -if (ret == -1) { -ret = -socket_error(); -} -if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) { -qemu_set_fd_handler2(s-fd, NULL, NULL, tcp_wait_for_connect, s); -return 0; -} -} while (ret == -EINTR); - -if (ret 0) { +ret = tcp_client_start(host_port, fd); +s-fd = fd; you don't need fd you can pass s-fd to the function. 
Orit +if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) { +DPRINTF(connect in progress); +qemu_set_fd_handler2(s-fd, NULL, NULL, tcp_wait_for_connect, s); +} else if (ret 0) { DPRINTF(connect failed\n); -migrate_fd_error(s); +if (ret != -EINVAL) { +migrate_fd_error(s); +} return ret; +} else { +migrate_fd_connect(s); } -migrate_fd_connect(s); + return 0; } diff --git a/net/socket.c b/net/socket.c index 5feb3d2..b7cd8ec 100644 --- a/net/socket.c +++ b/net/socket.c @@ -434,41 +434,22 @@ static int net_socket_connect_init(VLANState *vlan, const char *host_str) { NetSocketState *s; -int fd, connected, ret, err; +int fd, connected, ret; struct sockaddr_in saddr; -if (parse_host_port(saddr, host_str) 0) -return -1; - -fd = qemu_socket(PF_INET, SOCK_STREAM, 0); -if (fd 0) { -perror(socket); -return -1; -} -socket_set_nonblock(fd); - -connected = 0; -for(;;) { -ret = connect(fd, (struct sockaddr *)saddr, sizeof(saddr)); -if (ret 0) { -err = socket_error(); -if (err == EINTR || err == EWOULDBLOCK) { -} else if (err == EINPROGRESS) { -break; +ret = tcp_client_start(host_str, fd); +if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) { +connected = 0; #ifdef _WIN32 -} else if (err == WSAEALREADY || err == WSAEINVAL) { -break; +} else if (ret == -WSAEALREADY || ret == -WSAEINVAL) { +connected = 0; #endif -} else { -perror(connect); -closesocket(fd); -return -1; -} -} else { -connected = 1; -break; -} +} else if (ret 0) { +return -1; +} else { +connected = 1; } + s = net_socket_fd_init(vlan, model, name, fd, connected); if (!s) return -1; -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
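The three-way handling of the tcp_client_start() result in this patch (connected, still connecting, failed) can be summarised in a small sketch. The enum and function names are hypothetical, and the Windows-specific WSAEALREADY/WSAEINVAL cases are left out.

```c
#include <errno.h>

/* Hypothetical classification of a non-blocking connect result. */
enum connect_state {
    CONNECT_DONE,     /* connected immediately */
    CONNECT_PENDING,  /* in progress; wait for the fd to become writable */
    CONNECT_FAILED    /* real error; ret holds the negative errno */
};

/* Map a "0 or negative errno" return value onto the three cases. */
static enum connect_state classify_connect(int ret)
{
    if (ret >= 0) {
        return CONNECT_DONE;
    }
    if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
        return CONNECT_PENDING;
    }
    return CONNECT_FAILED;
}
```

In the patch, the pending case registers tcp_wait_for_connect() as a write handler, the done case calls migrate_fd_connect(), and the failed case returns the error to the caller.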
Re: [Qemu-devel] [PATCH v2 2/9] net: use tcp_server_start() for tcp server creation
On 05/03/12 21:27, Orit Wasserman wrote: On 03/05/2012 12:03 PM, Amos Kong wrote: Use tcp_server_start in those two functions: tcp_start_incoming_migration() net_socket_listen_init() Signed-off-by: Amos Kongak...@redhat.com --- migration-tcp.c | 21 + net/socket.c| 23 +++ 2 files changed, 8 insertions(+), 36 deletions(-) diff --git a/migration-tcp.c b/migration-tcp.c index 35a5781..ecadd10 100644 --- a/migration-tcp.c +++ b/migration-tcp.c @@ -157,28 +157,17 @@ out2: int tcp_start_incoming_migration(const char *host_port) { -struct sockaddr_in addr; -int val; +int ret; int s; DPRINTF(Attempting to start an incoming migration\n); -if (parse_host_port(addr, host_port) 0) { -fprintf(stderr, invalid host/port combination: %s\n, host_port); -return -EINVAL; -} - -s = qemu_socket(PF_INET, SOCK_STREAM, 0); -if (s == -1) { -return -socket_error(); +ret = tcp_server_start(host_port,s); +if (ret 0) { +fprintf(stderr, tcp_server_start: %s\n, strerror(-ret)); +return ret; } -val = 1; -setsockopt(s, SOL_SOCKET, SO_REUSEADDR, (const char *)val, sizeof(val)); - -if (bind(s, (struct sockaddr *)addr, sizeof(addr)) == -1) { -goto err; -} if (listen(s, 1) == -1) { goto err; } diff --git a/net/socket.c b/net/socket.c index 0bcf229..5feb3d2 100644 --- a/net/socket.c +++ b/net/socket.c @@ -403,31 +403,14 @@ static int net_socket_listen_init(VLANState *vlan, const char *host_str) { NetSocketListenState *s; -int fd, val, ret; -struct sockaddr_in saddr; - -if (parse_host_port(saddr, host_str) 0) -return -1; +int fd, ret; s = g_malloc0(sizeof(NetSocketListenState)); -fd = qemu_socket(PF_INET, SOCK_STREAM, 0); -if (fd 0) { -perror(socket); -g_free(s); -return -1; -} -socket_set_nonblock(fd); - -/* allow fast reuse */ -val = 1; -setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, (const char *)val, sizeof(val)); - -ret = bind(fd, (struct sockaddr *)saddr, sizeof(saddr)); +ret = tcp_server_start(host_str,fd); if (ret 0) { -perror(bind); +error_report(tcp_server_start: %s, strerror(-ret)); If the return 
value is always -1 this has no meaning Hi Orit, return -1; is the original code, net_socket_listen_init() is only used once in net_init_socket() if (net_socket_connect_init(vlan, socket, name, connect) == -1) { return -1; } This patch just replace the server creation code by tcp_server_start(). Amos. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v2 1/9] net: introduce tcp_server_start()
On 05/03/12 21:25, Orit Wasserman wrote: On 03/05/2012 12:03 PM, Amos Kong wrote: Introduce tcp_server_start() by moving original code in tcp_start_incoming_migration(). Signed-off-by: Amos Kongak...@redhat.com --- net.c | 27 +++ qemu_socket.h |2 ++ 2 files changed, 29 insertions(+), 0 deletions(-) diff --git a/net.c b/net.c index c34474f..0260968 100644 --- a/net.c +++ b/net.c @@ -99,6 +99,33 @@ static int get_str_sep(char *buf, int buf_size, const char **pp, int sep) return 0; } +int tcp_server_start(const char *str, int *fd) +{ +int val, ret; +struct sockaddr_in saddr; + +if (parse_host_port(saddr, str) 0) { error message would be nice +return -1; +} + +*fd = qemu_socket(PF_INET, SOCK_STREAM, 0); +if (fd 0) { +perror(socket); +return -1; +} this is actually net_socket_listen_init version tcp_start_incoming_migration returns the error -socket_error(). I prefer not to lose the errno. agree. I know that when calling net_socket_listen_init for some unknown reason there is an explict check for -1 if (net_socket_listen_init(vlan, socket, name, listen) == -1) { I think it is a good opportunity to change this check. nod. Orit +socket_set_nonblock(*fd); + +/* allow fast reuse */ +val = 1; +setsockopt(*fd, SOL_SOCKET, SO_REUSEADDR, (const char *)val, sizeof(val)); + +ret = bind(*fd, (struct sockaddr *)saddr, sizeof(saddr)); +if (ret 0) { +closesocket(*fd); +} +return ret; +} + int parse_host_port(struct sockaddr_in *saddr, const char *str) { char buf[512]; diff --git a/qemu_socket.h b/qemu_socket.h index fe4cf6c..d612793 100644 --- a/qemu_socket.h +++ b/qemu_socket.h @@ -54,6 +54,8 @@ int unix_listen(const char *path, char *ostr, int olen); int unix_connect_opts(QemuOpts *opts); int unix_connect(const char *path); +int tcp_server_start(const char *str, int *fd); + /* Old, ipv4 only bits. Don't use for new code. */ int parse_host_port(struct sockaddr_in *saddr, const char *str); int socket_init(void); -- Amos. 
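A minimal sketch of the errno point raised in this review, assuming plain POSIX sockets instead of QEMU's qemu_socket()/closesocket() wrappers. The name tcp_server_start_sketch and its sockaddr_in parameter are hypothetical (the patch itself takes a host:port string). It only illustrates two things: returning -errno instead of a bare -1, and testing the descriptor *fd rather than the pointer fd, which the extracted hunk appears to compare.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical stand-in for tcp_server_start(): create a TCP socket and
 * bind it. Returns 0 on success or a negative errno value on failure. */
int tcp_server_start_sketch(const struct sockaddr_in *saddr, int *fd)
{
    int val = 1;
    int err;

    assert(saddr != NULL && fd != NULL);

    *fd = socket(PF_INET, SOCK_STREAM, 0);
    if (*fd < 0) {              /* test the descriptor, not the pointer */
        return -errno;
    }

    /* allow fast reuse */
    if (setsockopt(*fd, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val)) < 0) {
        err = errno;            /* save before close() can overwrite it */
        close(*fd);
        *fd = -1;
        return -err;
    }

    if (bind(*fd, (const struct sockaddr *)saddr, sizeof(*saddr)) < 0) {
        err = errno;
        close(*fd);
        *fd = -1;
        return -err;
    }
    return 0;
}
```

With this shape a caller can print strerror(-ret) and get a meaningful message, and the explicit == -1 check in the caller would become a plain ret < 0 check.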
Re: [Qemu-devel] KVM call agenda for tuesday 31
On 03/05/2012 03:38 PM, Igor Mitsyanko wrote: Short summary: * switch wp groups to bitfield rather than int array * convert sd.c to use memory_region_init_ram() to allocate the wp groups (being careful to use memory_region_set_dirty() when we touch them) * we don't need variable-length fields for sd.c any more * rest of the vmstate conversion is straightforward OK, it turned out to be not so simple, we can't use memory API in sd.c because TARGET_PHYS_ADDR_BITS value (and, consequently, target_phys_addr_t) is not defined for common objects. Well, can't you make sd.c target dependent? It's not so nice, but it does solve the problem. -- error compiling committee.c: too many arguments to function
Re: [PATCH 0/3] pmu emulation fixes
On 02/26/2012 04:55 PM, Gleb Natapov wrote: Gleb Natapov (3): KVM: x86 emulator: warn when pin control is set in eventsel msr KVM: x86 emulator: Fix raw event check KVM: x86 emulator: add proper support for fixed counter 2 Thanks, applied (s/emulator/pmu/...) -- error compiling committee.c: too many arguments to function
Re: [PATCH 2/2 unit-test] Add PMU test
On 02/26/2012 05:20 PM, Gleb Natapov wrote: Add unit test to test architectural PMU emulation in kvm. Thanks, applied. -- error compiling committee.c: too many arguments to function
Re: [RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait
On 03/05/2012 11:07 AM, Jan Kiszka wrote: On 2012-03-05 09:34, Paolo Bonzini wrote: This is quite ugly. Two threads, one running main_loop_wait and one running qemu_aio_wait, can race with each other on running the same iohandler. The result is that an iohandler could run while the underlying socket is not readable or writable, with possibly ill effects. Hmm, isn't it a problem already that a socket is polled by two threads at the same time? Can't that be avoided? Could it be done simply by adding a mutex there? It's hardly a clean fix, but it's not a clean problem. Long-term, I'd like to cut out certain file descriptors from the main loop and process them completely in separate threads (for separate locking, prioritization etc.). Dunno how NBD works, but maybe it should be reworked like this already. Ideally qemu_set_fd_handler2() should be made thread local, and each device thread would run a copy of the main loop, just working on different data. -- error compiling committee.c: too many arguments to function
Re: [RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait
On 05/03/2012 15:24, Avi Kivity wrote: On 03/05/2012 11:07 AM, Jan Kiszka wrote: On 2012-03-05 09:34, Paolo Bonzini wrote: This is quite ugly. Two threads, one running main_loop_wait and one running qemu_aio_wait, can race with each other on running the same iohandler. The result is that an iohandler could run while the underlying socket is not readable or writable, with possibly ill effects. Hmm, isn't it a problem already that a socket is polled by two threads at the same time? Can't that be avoided? Could it be done simply by adding a mutex there? It's hardly a clean fix, but it's not a clean problem. Hmm, I don't think so. It would need to protect execution of the iohandlers too, and pretty much everything can happen there including a nested loop. Of course recursive mutexes exist, but it sounds like too big an axe. I could add a generation count updated by qemu_aio_wait(), and rerun the select() only if the generation count changes during its execution. Or we can call it an NBD bug. I'm not against that, but it seemed to me that the problem is more general. Paolo
Re: [RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait
On 2012-03-05 15:24, Avi Kivity wrote: Long-term, I'd like to cut out certain file descriptors from the main loop and process them completely in separate threads (for separate locking, prioritization etc.). Dunno how NBD works, but maybe it should be reworked like this already. Ideally qemu_set_fd_handler2() should be made thread local, and each device thread would run a copy of the main loop, just working on different data. qemu_set_fd_handler2 may not only be called over an iothread. Rather, we need an object and associated lock that is related to the io-path (i.e. frontend device + backend driver). That object has to be passed to services like qemu_set_fd_handler2. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] KVM call agenda for tuesday 31
On 03/05/2012 06:13 PM, Avi Kivity wrote: On 03/05/2012 03:38 PM, Igor Mitsyanko wrote: Short summary: * switch wp groups to bitfield rather than int array * convert sd.c to use memory_region_init_ram() to allocate the wp groups (being careful to use memory_region_set_dirty() when we touch them) * we don't need variable-length fields for sd.c any more * rest of the vmstate conversion is straightforward OK, it turned out to be not so simple, we can't use memory API in sd.c because TARGET_PHYS_ADDR_BITS value (and, consequently, target_phys_addr_t) is not defined for common objects. Well, can't you make sd.c target dependent? It's not so nice, but it does solve the problem. OK, but it will turn qemu away from its long-term path to suppress *all* target specific code :) -- Mitsyanko Igor ASWG, Moscow RD center, Samsung Electronics email: i.mitsya...@samsung.com
Re: [PATCH 0/3] pmu emulation fixes
On Mon, Mar 05, 2012 at 04:18:21PM +0200, Avi Kivity wrote: On 02/26/2012 04:55 PM, Gleb Natapov wrote: Gleb Natapov (3): KVM: x86 emulator: warn when pin control is set in eventsel msr KVM: x86 emulator: Fix raw event check KVM: x86 emulator: add proper support for fixed counter 2 Thanks, applied (s/emulator/pmu/...) You, maintainers, are hard to please! My previous fix to PMU ee3f9f114bdc8b315eed7b1c651ca6c9b8251cf7 was prefixed KVM: x86 emulator: so I followed suit :) -- Gleb.
Re: [Qemu-devel] KVM call agenda for tuesday 31
On 03/05/2012 04:37 PM, Igor Mitsyanko wrote: Well, can't you make sd.c target dependent? It's not so nice, but it does solve the problem. OK, but it will turn qemu away from its long-term path to suppress *all* target specific code :) The other alternative is to s/target_phys_addr_t/uint64_t/ in the memory API. I think 32-on-32 is quite rare these days, so it wouldn't be much of a performance issue. -- error compiling committee.c: too many arguments to function
Re: [RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait
On 03/05/2012 04:30 PM, Paolo Bonzini wrote: On 05/03/2012 15:24, Avi Kivity wrote: On 03/05/2012 11:07 AM, Jan Kiszka wrote: On 2012-03-05 09:34, Paolo Bonzini wrote: This is quite ugly. Two threads, one running main_loop_wait and one running qemu_aio_wait, can race with each other on running the same iohandler. The result is that an iohandler could run while the underlying socket is not readable or writable, with possibly ill effects. Hmm, isn't it a problem already that a socket is polled by two threads at the same time? Can't that be avoided? Could it be done simply by adding a mutex there? It's hardly a clean fix, but it's not a clean problem. Hmm, I don't think so. It would need to protect execution of the iohandlers too, and pretty much everything can happen there including a nested loop. Of course recursive mutexes exist, but it sounds like too big an axe. The I/O handlers would still use the qemu mutex, no? We'd just protect the select() (taking the mutex from before releasing the global lock, and reacquiring it afterwards). I could add a generation count updated by qemu_aio_wait(), and rerun the select() only if the generation count changes during its execution. Or we can call it an NBD bug. I'm not against that, but it seemed to me that the problem is more general. What about making sure all callers of qemu_aio_wait() run from coroutines (or threads)? Then they just ask the main thread to wake them up, instead of dispatching completions themselves. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] KVM call agenda for tuesday 31
On 03/05/2012 09:10 AM, Avi Kivity wrote: On 03/05/2012 04:37 PM, Igor Mitsyanko wrote: Well, can't you make sd.c target dependent? It's not so nice, but it does solve the problem. OK, but it will turn qemu away from its long-term path to suppress *all* target specific code :) The other alternative is to s/target_phys_addr_t/uint64_t/ in the memory API. I think 32-on-32 is quite rare these days, so it wouldn't be much of a performance issue. I think this makes sense independent of other discussions regarding fixing target_phys_addr_t size. Hardware addresses should be independent of the target. If we wanted to use a hw_addr_t that would be okay too. Regards, Anthony Liguori
Re: [Qemu-devel] KVM call agenda for tuesday 31
On 03/05/2012 05:15 PM, Anthony Liguori wrote: The other alternative is to s/target_phys_addr_t/uint64_t/ in the memory API. I think 32-on-32 is quite rare these days, so it wouldn't be much of a performance issue. I think this makes sense independent of other discussions regarding fixing target_phys_addr_t size. Hardware addresses should be independent of the target. If we wanted to use a hw_addr_t that would be okay too. Would this hw_addr (s/_t$//, or you'll be Blued) be fixed at uint64_t (and thus only documentary), or also subject to multiple compilation? -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] KVM call agenda for tuesday 31
On 5 March 2012 15:10, Avi Kivity a...@redhat.com wrote: I think 32-on-32 is quite rare these days, so it wouldn't be much of a performance issue. 32-on-32 will be the standard case for KVM on ARM I think... -- PMM
Re: [Qemu-devel] KVM call agenda for tuesday 31
On 03/05/2012 05:20 PM, Peter Maydell wrote: On 5 March 2012 15:10, Avi Kivity a...@redhat.com wrote: I think 32-on-32 is quite rare these days, so it wouldn't be much of a performance issue. 32-on-32 will be the standard case for KVM on ARM I think... Won't we be virtualizing LPAE by default? -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] KVM call agenda for tuesday 31
On 5 March 2012 15:21, Avi Kivity a...@redhat.com wrote: On 03/05/2012 05:20 PM, Peter Maydell wrote: On 5 March 2012 15:10, Avi Kivity a...@redhat.com wrote: I think 32-on-32 is quite rare these days, so it wouldn't be much of a performance issue. 32-on-32 will be the standard case for KVM on ARM I think... Won't we be virtualizing LPAE by default? Mmm, I guess that would give you 64-on-32. -- PMM
Re: [Qemu-devel] KVM call agenda for tuesday 31
On 05.03.2012 16:10, Avi Kivity wrote: On 03/05/2012 04:37 PM, Igor Mitsyanko wrote: Well, can't you make sd.c target dependent? It's not so nice, but it does solve the problem. OK, but it will turn qemu away from its long-term path to suppress *all* target specific code :) The other alternative is to s/target_phys_addr_t/uint64_t/ in the memory API. I think 32-on-32 is quite rare these days, so it wouldn't be much of a performance issue. Maybe rare, but 32-bit ARM netbooks and tablets are gaining market share. Mid-term also depends on how we want to proceed with LPAE softmmu-wise (bump arm to 64-bit target_phys_addr_t, or do LPAE and AArch64 in a new arm64). i386 is 64-on-32 these days already; most of the embedded targets are still at most 32-bit though (xtensa, mblaze, ...). Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] KVM call agenda for tuesday 31
On 03/05/2012 05:43 PM, Andreas Färber wrote: On 05.03.2012 16:10, Avi Kivity wrote: On 03/05/2012 04:37 PM, Igor Mitsyanko wrote: Well, can't you make sd.c target dependent? It's not so nice, but it does solve the problem. OK, but it will turn qemu away from its long-term path to suppress *all* target specific code :) The other alternative is to s/target_phys_addr_t/uint64_t/ in the memory API. I think 32-on-32 is quite rare these days, so it wouldn't be much of a performance issue. Maybe rare, but 32-bit ARM netbooks and tablets are gaining market share. Mid-term also depends on how we want to proceed with LPAE softmmu-wise (bump arm to 64-bit target_phys_addr_t, or do LPAE and AArch64 in a new arm64). I was counting on LPAE to make 32-on-32 rare. i386 is 64-on-32 these days already; most of the embedded targets are still at most 32-bit though (xtensa, mblaze, ...). These would be 32-on-64, since the host would usually be x86. I guess it would be even more true when the w64 port is complete. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] KVM call agenda for tuesday 31
On 5 March 2012 15:43, Andreas Färber afaer...@suse.de wrote: Mid-term also depends on how we want to proceed with LPAE softmmu-wise (bump arm to 64-bit target_phys_addr_t, or do LPAE and AArch64 in a new arm64). For LPAE I would have thought we want to make arm go to a 64 bit target_phys_addr_t, since that's exactly what it is: same old ARM architecture with wider physical addresses :-) I notice that for the architectures we currently have that have 32 and 64 bit versions we have separate {i386,x86_64}-softmmu, {ppc,ppc64}-softmmu, {mips,mips64}-softmmu. What's the advantage of separating out the 64 bit flavours that way rather than having everything be a single binary? -- PMM
[PATCH] kvm: mmu: make use of -root_level in reset_rsvds_bits_mask
From: Davidlohr Bueso d...@gnu.org The reset_rsvds_bits_mask() function can use the guest walker's root level number instead of using a separate 'level' variable. Signed-off-by: Davidlohr Bueso d...@gnu.org --- arch/x86/kvm/mmu.c | 31 +++ 1 files changed, 15 insertions(+), 16 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index ff053ca..4cb1642 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -3185,15 +3185,14 @@ static bool sync_mmio_spte(u64 *sptep, gfn_t gfn, unsigned access, #undef PTTYPE static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, - struct kvm_mmu *context, - int level) + struct kvm_mmu *context) { int maxphyaddr = cpuid_maxphyaddr(vcpu); u64 exb_bit_rsvd = 0; if (!context-nx) exb_bit_rsvd = rsvd_bits(63, 63); - switch (level) { + switch (context-root_level) { case PT32_ROOT_LEVEL: /* no rsvd bits for 2 level 4K page table entries */ context-rsvd_bits_mask[0][1] = 0; @@ -3251,8 +3250,9 @@ static int paging64_init_context_common(struct kvm_vcpu *vcpu, int level) { context-nx = is_nx(vcpu); + context-root_level = level; - reset_rsvds_bits_mask(vcpu, context, level); + reset_rsvds_bits_mask(vcpu, context); ASSERT(is_pae(vcpu)); context-new_cr3 = paging_new_cr3; @@ -3262,7 +3262,6 @@ static int paging64_init_context_common(struct kvm_vcpu *vcpu, context-invlpg = paging64_invlpg; context-update_pte = paging64_update_pte; context-free = paging_free; - context-root_level = level; context-shadow_root_level = level; context-root_hpa = INVALID_PAGE; context-direct_map = false; @@ -3279,8 +3278,9 @@ static int paging32_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *context) { context-nx = false; + context-root_level = PT32_ROOT_LEVEL; - reset_rsvds_bits_mask(vcpu, context, PT32_ROOT_LEVEL); + reset_rsvds_bits_mask(vcpu, context); context-new_cr3 = paging_new_cr3; context-page_fault = paging32_page_fault; @@ -3289,7 +3289,6 @@ static int paging32_init_context(struct kvm_vcpu *vcpu, context-sync_page = paging32_sync_page; 
context-invlpg = paging32_invlpg; context-update_pte = paging32_update_pte; - context-root_level = PT32_ROOT_LEVEL; context-shadow_root_level = PT32E_ROOT_LEVEL; context-root_hpa = INVALID_PAGE; context-direct_map = false; @@ -3327,19 +3326,19 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu) context-root_level = 0; } else if (is_long_mode(vcpu)) { context-nx = is_nx(vcpu); - reset_rsvds_bits_mask(vcpu, context, PT64_ROOT_LEVEL); - context-gva_to_gpa = paging64_gva_to_gpa; context-root_level = PT64_ROOT_LEVEL; + reset_rsvds_bits_mask(vcpu, context); + context-gva_to_gpa = paging64_gva_to_gpa; } else if (is_pae(vcpu)) { context-nx = is_nx(vcpu); - reset_rsvds_bits_mask(vcpu, context, PT32E_ROOT_LEVEL); - context-gva_to_gpa = paging64_gva_to_gpa; context-root_level = PT32E_ROOT_LEVEL; + reset_rsvds_bits_mask(vcpu, context); + context-gva_to_gpa = paging64_gva_to_gpa; } else { context-nx = false; - reset_rsvds_bits_mask(vcpu, context, PT32_ROOT_LEVEL); - context-gva_to_gpa = paging32_gva_to_gpa; context-root_level = PT32_ROOT_LEVEL; + reset_rsvds_bits_mask(vcpu, context); + context-gva_to_gpa = paging32_gva_to_gpa; } return 0; @@ -3402,18 +3401,18 @@ static int init_kvm_nested_mmu(struct kvm_vcpu *vcpu) g_context-gva_to_gpa = nonpaging_gva_to_gpa_nested; } else if (is_long_mode(vcpu)) { g_context-nx = is_nx(vcpu); - reset_rsvds_bits_mask(vcpu, g_context, PT64_ROOT_LEVEL); g_context-root_level = PT64_ROOT_LEVEL; + reset_rsvds_bits_mask(vcpu, g_context); g_context-gva_to_gpa = paging64_gva_to_gpa_nested; } else if (is_pae(vcpu)) { g_context-nx = is_nx(vcpu); - reset_rsvds_bits_mask(vcpu, g_context, PT32E_ROOT_LEVEL); g_context-root_level = PT32E_ROOT_LEVEL; + reset_rsvds_bits_mask(vcpu, g_context); g_context-gva_to_gpa = paging64_gva_to_gpa_nested; } else { g_context-nx = false; - reset_rsvds_bits_mask(vcpu, g_context, PT32_ROOT_LEVEL); g_context-root_level = PT32_ROOT_LEVEL; +
[PATCH] KVM: PPC: Save/Restore CR over vcpu_run
On PPC, CR2-CR4 are nonvolatile, thus have to be saved across function calls. We didn't respect that for any architecture until Paul spotted it in his patch for Book3S-HV. This patch saves/restores CR for all KVM capable PPC hosts. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/book3s_interrupts.S |7 +++ arch/powerpc/kvm/booke_interrupts.S |7 ++- arch/powerpc/kvm/bookehv_interrupts.S |8 +++- 3 files changed, 20 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/book3s_interrupts.S b/arch/powerpc/kvm/book3s_interrupts.S index 0a8515a..3e35383 100644 --- a/arch/powerpc/kvm/book3s_interrupts.S +++ b/arch/powerpc/kvm/book3s_interrupts.S @@ -84,6 +84,10 @@ kvm_start_entry: /* Save non-volatile registers (r14 - r31) */ SAVE_NVGPRS(r1) + /* Save CR */ + mfcrr14 + stw r14, _CCR(r1) + /* Save LR */ PPC_STL r0, _LINK(r1) @@ -165,6 +169,9 @@ kvm_exit_loop: PPC_LL r4, _LINK(r1) mtlrr4 + lwz r14, _CCR(r1) + mtcrr14 + /* Restore non-volatile host registers (r14 - r31) */ REST_NVGPRS(r1) diff --git a/arch/powerpc/kvm/booke_interrupts.S b/arch/powerpc/kvm/booke_interrupts.S index 10d8ef6..c8c4b87 100644 --- a/arch/powerpc/kvm/booke_interrupts.S +++ b/arch/powerpc/kvm/booke_interrupts.S @@ -34,7 +34,8 @@ /* r2 is special: it holds 'current', and it made nonvolatile in the * kernel with the -ffixed-r2 gcc option. */ #define HOST_R2 12 -#define HOST_NV_GPRS16 +#define HOST_CR 16 +#define HOST_NV_GPRS20 #define HOST_NV_GPR(n) (HOST_NV_GPRS + ((n - 14) * 4)) #define HOST_MIN_STACK_SIZE (HOST_NV_GPR(31) + 4) #define HOST_STACK_SIZE (((HOST_MIN_STACK_SIZE + 15) / 16) * 16) /* Align. */ @@ -296,8 +297,10 @@ heavyweight_exit: /* Return to kvm_vcpu_run(). */ lwz r4, HOST_STACK_LR(r1) + lwz r5, HOST_CR(r1) addir1, r1, HOST_STACK_SIZE mtlrr4 + mtcrr5 /* r3 still contains the return code from kvmppc_handle_exit(). 
*/ blr @@ -314,6 +317,8 @@ _GLOBAL(__kvmppc_vcpu_run) stw r3, HOST_RUN(r1) mflrr3 stw r3, HOST_STACK_LR(r1) + mfcrr5 + stw r5, HOST_CR(r1) /* Save host non-volatile register state to stack. */ stw r14, HOST_NV_GPR(r14)(r1) diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S index 63fc5f0..3989b5a 100644 --- a/arch/powerpc/kvm/bookehv_interrupts.S +++ b/arch/powerpc/kvm/bookehv_interrupts.S @@ -49,7 +49,8 @@ * kernel with the -ffixed-r2 gcc option. */ #define HOST_R2 (3 * LONGBYTES) -#define HOST_NV_GPRS(4 * LONGBYTES) +#define HOST_CR (4 * LONGBYTES) +#define HOST_NV_GPRS(5 * LONGBYTES) #define HOST_NV_GPR(n) (HOST_NV_GPRS + ((n - 14) * LONGBYTES)) #define HOST_MIN_STACK_SIZE (HOST_NV_GPR(31) + LONGBYTES) #define HOST_STACK_SIZE ((HOST_MIN_STACK_SIZE + 15) ~15) /* Align. */ @@ -396,6 +397,7 @@ skip_nv_load: heavyweight_exit: /* Not returning to guest. */ PPC_LL r5, HOST_STACK_LR(r1) + lwz r6, HOST_CR(r1) /* * We already saved guest volatile register state; now save the @@ -442,6 +444,7 @@ heavyweight_exit: /* Return to kvm_vcpu_run(). */ mtlrr5 + mtcrr6 addir1, r1, HOST_STACK_SIZE /* r3 still contains the return code from kvmppc_handle_exit(). */ blr @@ -459,6 +462,9 @@ _GLOBAL(__kvmppc_vcpu_run) mflrr3 PPC_STL r3, HOST_STACK_LR(r1) + mfcrr5 + stw r5, HOST_CR(r1) + /* Save host non-volatile register state to stack. */ PPC_STL r14, HOST_NV_GPR(r14)(r1) PPC_STL r15, HOST_NV_GPR(r15)(r1) -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virtio-blk performance regression and qemu-kvm
On 10.02.2012 15:36, Dongsu Park wrote: Recently I observed performance regression regarding virtio-blk, especially different IO bandwidths between qemu-kvm 0.14.1 and 1.0. So I want to share the benchmark results, and ask you what the reason would be.

Hi, I think I found the problem; there is no regression in the code. I think the problem is that qemu-kvm with the IO thread enabled doesn't produce enough CPU load to get the core to a higher CPU frequency, because the load is distributed across two threads. If I change the CPU governor to performance, the result from the master branch is better than from the v0.14.1 branch. I get the same results on a server system without power management activated. @Dongsu Could you confirm those findings?

1. Test on i7 laptop with CPU governor ondemand.
v0.14.1
bw=63492KB/s iops=15873
bw=63221KB/s iops=15805
v1.0
bw=36696KB/s iops=9173
bw=37404KB/s iops=9350
master
bw=36396KB/s iops=9099
bw=34182KB/s iops=8545
Change the CPU governor to performance
master
bw=81756KB/s iops=20393
bw=81453KB/s iops=20257

2. Test on AMD Istanbul without power management activated.
v0.14.1
bw=53167KB/s iops=13291
bw=61386KB/s iops=15346
v1.0
bw=43599KB/s iops=10899
bw=46288KB/s iops=11572
master
bw=60678KB/s iops=15169
bw=62733KB/s iops=15683

-martin
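Since the numbers depend so strongly on the governor, it may help to record it alongside each run. A small sketch, assuming the usual Linux cpufreq sysfs layout; read_governor() is a hypothetical helper, and the path is passed in by the caller so that nothing is hard-coded.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Read the first line of a cpufreq governor file into buf, without the
 * trailing newline. Returns 0 on success or a negative errno value. */
int read_governor(const char *path, char *buf, size_t len)
{
    FILE *f;

    assert(path != NULL && buf != NULL && len > 0);

    f = fopen(path, "r");
    if (f == NULL) {
        return -errno;          /* e.g. -ENOENT when cpufreq is absent */
    }
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -EIO;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

A typical call would pass /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor and expect a string such as ondemand or performance. On a machine without frequency scaling the file is missing and the call returns a negative value, which is itself worth noting in the benchmark log.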
Re: [RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait
On 05/03/2012 16:14, Avi Kivity wrote: Hmm, I don't think so. It would need to protect execution of the iohandlers too, and pretty much everything can happen there including a nested loop. Of course recursive mutexes exist, but it sounds like too big an axe. The I/O handlers would still use the qemu mutex, no? We'd just protect the select() (taking the mutex from before releasing the global lock, and reacquiring it afterwards). Yes, that could work, but it is _really_ ugly. I still prefer this patch or fixing NBD. At least both contain the hack in a single place. I could add a generation count updated by qemu_aio_wait(), and rerun the select() only if the generation count changes during its execution. Or we can call it an NBD bug. I'm not against that, but it seemed to me that the problem is more general. What about making sure all callers of qemu_aio_wait() run from coroutines (or threads)? Then they just ask the main thread to wake them up, instead of dispatching completions themselves. That would open another Pandora's box. The point of having a separate main loop is that only AIO can happen during qemu_aio_wait() or qemu_aio_flush(). In particular you don't want the monitor to process input while you're running another monitor command. Paolo
Question on qemu-kvm 1.0
Hi, We have been using qemu/kvm 0.12.5 (unchanged with stock kernel 2.6.32). I just upgraded to qemu/kvm-1.0 and see a noticeable difference in packet I/O. I want to understand the enhancements in 1.0 that lead to better performance. Can you give me some pointers? Off the bat I see new event code. From observation, I see that the qemu-kvm process is taking a whole lot less CPU. Thanks /a
Re: Question on qemu-kvm 1.0
Side note: I am not using vhost-net yet. I am reading from the blogs that vhost-net gives much better performance. I am putting another system up with vhost-net support to measure this. I would appreciate pointers for the previous question. /a On Mon, Mar 5, 2012 at 11:17 AM, Al Patel alps@gmail.com wrote: Hi, We have been using qemu/kvm 0.12.5 (unchanged with stock kernel 2.6.32). I just upgraded to qemu/kvm-1.0 and see a noticeable difference in packet I/O. I want to understand the enhancements in 1.0 that lead to better performance. Can you give me some pointers? Off the bat I see new event code. From observation, I see that the qemu-kvm process is taking a whole lot less CPU. Thanks /a
Re: virtio-blk performance regression and qemu-kvm
On Mon, Mar 5, 2012 at 4:13 PM, Martin Mailand mar...@tuxadero.com wrote: On 10.02.2012 15:36, Dongsu Park wrote: Recently I observed performance regression regarding virtio-blk, especially different IO bandwidths between qemu-kvm 0.14.1 and 1.0. So I want to share the benchmark results, and ask you what the reason would be. Hi, I think I found the problem; there is no regression in the code. I think the problem is that qemu-kvm with the IO thread enabled doesn't produce enough CPU load to get the core to a higher CPU frequency, because the load is distributed across two threads. If I change the CPU governor to performance, the result from the master branch is better than from the v0.14.1 branch. I get the same results on a server system without power management activated. @Dongsu Could you confirm those findings?

1. Test on i7 laptop with CPU governor ondemand.
v0.14.1
bw=63492KB/s iops=15873
bw=63221KB/s iops=15805
v1.0
bw=36696KB/s iops=9173
bw=37404KB/s iops=9350
master
bw=36396KB/s iops=9099
bw=34182KB/s iops=8545
Change the CPU governor to performance
master
bw=81756KB/s iops=20393
bw=81453KB/s iops=20257

Interesting finding. Did you show the 0.14.1 results with performance governor? Stefan
Re: [PATCH 0/4 V13] Avoid soft lockup message when KVM is stopped by host
On Tue, Feb 14, 2012 at 04:17:20PM -0500, Eric B Munson wrote:
> On Tue, 14 Feb 2012, Marcelo Tosatti wrote:
>> On Tue, Feb 14, 2012 at 10:50:13AM -0500, Eric B Munson wrote:
>>> On Tue, 14 Feb 2012, Marcelo Tosatti wrote:
>>>> On Tue, Feb 14, 2012 at 10:29:31AM -0500, Eric B Munson wrote:
>>>>> On Wed, 08 Feb 2012, Eric B Munson wrote:
>>>>>> When a guest kernel is stopped by the host hypervisor it can look like a soft lockup to the guest kernel. This false warning can mask later soft lockup warnings which may be real. This patch series adds a method for a host hypervisor to communicate to a guest kernel that it is being stopped. The final patch in the series has the watchdog check this flag when it goes to issue a soft lockup warning and skip the warning if the guest knows it was stopped.
>>>>>>
>>>>>> It was attempted to solve this in Qemu, but the side effects of saving and restoring the clock and tsc for each vcpu put the wall clock of the guest behind by the amount of time of the pause. This forces a guest to have ntp running in order to keep the wall clock accurate.
>>>>>
>>>>> Avi,
>>>>>
>>>>> Is this set fit for merging or is there something else you want changed?
>>>>
>>>> Eric,
>>>>
>>>> On Message-ID: 20120210160536.ga23...@amt.cnet, i asked:
>>>>
>>>> How is the stub getting included for other architectures again?
>>>
>>> Marcelo,
>>>
>>> Sorry, I put out V13 to answer that. There is a stub in asm-generic that was lost in the V11-V12 rebase. This stub has been included in the V13 set.
>>>
>>> Eric
>>
>> Eric,
>>
>> I know the stub has been included in the series. But i am asking how it is #include'ed for other architectures? (can't see that).
>
> Marcelo,
>
> kernel/watchdog.c now includes linux/kvm_para.h which includes asm/kvm_para.h. The check_and_clear function is defined in arch include/asm/kvm_para.h or in asm-generic/kvm_para.h for any arch lacking the specific header in their asm include dir. If I have misunderstood how these headers work, please let me know and I will fix it.

There is no automatic inclusion of asm-generic/ headers. You must create kvm_para.h in each architecture's include/asm/ directory, #including asm-generic/kvm_para.h.

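Note on the mechanism discussed in this thread: a user-space sketch of the check-and-clear idea from the cover letter, under stated assumptions. The flag, the function names and the use of C11 atomics are illustrative only and are not taken from the patch series. On an architecture without guest support, the stub in the generic header would presumably just return false.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the flag shared between host and guest. */
static atomic_bool guest_paused_flag;

/* Host side: record that the hypervisor stopped the guest. */
void host_mark_guest_paused(void)
{
	atomic_store(&guest_paused_flag, true);
}

/* Guest side: report a pending pause exactly once, clearing the flag. */
bool check_and_clear_guest_paused(void)
{
	return atomic_exchange(&guest_paused_flag, false);
}

/* Watchdog decision: warn only if the stall is not explained by a pause. */
bool should_warn_soft_lockup(bool stalled)
{
	if (!stalled)
		return false;
	if (check_and_clear_guest_paused())
		return false;	/* the host pause explains this stall */
	return true;
}
```

Because the flag is cleared when it is read, one host pause suppresses one warning; a later stall with no pause in between is reported again, which is the property the cover letter asks for.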
Re: virtio-blk performance regression and qemu-kvm
On 05.03.2012 at 17:35, Stefan Hajnoczi wrote:
>> 1. Test on i7 laptop with CPU governor ondemand.
>>
>> v0.14.1
>> bw=63492KB/s iops=15873
>> bw=63221KB/s iops=15805
>>
>> v1.0
>> bw=36696KB/s iops=9173
>> bw=37404KB/s iops=9350
>>
>> master
>> bw=36396KB/s iops=9099
>> bw=34182KB/s iops=8545
>>
>> Change the CPU governor to performance
>>
>> master
>> bw=81756KB/s iops=20393
>> bw=81453KB/s iops=20257
>
> Interesting finding. Did you show the 0.14.1 results with performance governor?

Hi Stefan,

all results are with ondemand, except the one where I changed it to performance.

Do you want a v0.14.1 test with the governor on performance?

-martin

Re: [PATCH 1/1 v3] PCI: Device specific reset function
On Mon, Mar 05, 2012 at 10:00:49AM +0000, Tadeusz Struk wrote:
> ---
>  drivers/pci/pci.h    |    1 +
>  drivers/pci/quirks.c |   33 +++--
>  include/linux/pci.h  |    1 +
>  3 files changed, 29 insertions(+), 6 deletions(-)

Please read Documentation/SubmittingPatches for how to properly create, and send, patches that are in a format that can be accepted.

Hint, also run your patch through scripts/checkpatch.pl to find the obvious problems that are in it, to keep us from having to do that for you...

> This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.

I have been told that such email footers require that the patch be deleted and never be accepted. Please fix this if you expect your patches to be able to be applied.

greg k-h

Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
On Tue, Feb 28, 2012 at 08:40:06PM -0800, John Fastabend wrote:
> Also if there are embedded switches with learning capabilities they might want to trigger events to user space. In this case having a protocol type makes user space a bit easier to manage. I've added Lennert so maybe he can comment I think the Marvell chipsets might support something along these lines. The SR-IOV chipsets I'm aware of _today_ don't do learning. Learning makes the event model more plausible.

net/dsa currently configures any switch chips in the system to do auto-learning. However, I would much prefer to disable that, and have the switch chip just pass up packets for new source addresses, have Linux do the learning, and then mirror the Linux software FDB into the hardware instead -- that avoids having to manually flush the hardware FDB on certain STP state transitions or having to configure the hardware to use a shorter address learning timeout when we're in the middle of an STP topology change, which are problems we are running into in practice.

Just curious -- while your patches allow propagating FDB entries into the hardware, do you also have hooks to tell the hardware which ports are to share address databases? For net/dsa, we currently have:

http://patchwork.ozlabs.org/patch/16578/

While I think this is conceptually sound, the implementation is hacky, and I wonder how you've solved it for your setup, and if DSA can piggy-back off that.

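Note on the scheme Lennert describes: a toy sketch of "learn in software, then mirror into hardware", assuming a fixed-size table and a single hardware hook. Everything here (sw_fdb, hw_fdb_set, fdb_learn, the table size) is hypothetical and is neither the bridge code nor the net/dsa code. The point is only that the hardware table changes when the software table does, so both can be flushed or aged from one place.

```c
#include <stdint.h>
#include <string.h>

#define FDB_SIZE 8	/* arbitrary size for the sketch */

struct fdb_entry {
	uint8_t mac[6];
	int port;
	int used;
};

static struct fdb_entry sw_fdb[FDB_SIZE];	/* software FDB */
int hw_updates;					/* calls made to the hardware hook */

/* Hypothetical driver hook: program one address into the switch chip. */
void hw_fdb_set(const uint8_t mac[6], int port)
{
	(void)mac;
	(void)port;
	hw_updates++;
}

/*
 * Learn a source address in software and mirror new or moved entries
 * into hardware. Returns 0 on success, -1 if the table is full.
 */
int fdb_learn(const uint8_t mac[6], int port)
{
	int i, free_slot = -1;

	for (i = 0; i < FDB_SIZE; i++) {
		if (!sw_fdb[i].used) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		if (memcmp(sw_fdb[i].mac, mac, 6) == 0) {
			if (sw_fdb[i].port != port) {
				sw_fdb[i].port = port;	/* station moved */
				hw_fdb_set(mac, port);
			}
			return 0;
		}
	}
	if (free_slot < 0)
		return -1;

	memcpy(sw_fdb[free_slot].mac, mac, 6);
	sw_fdb[free_slot].port = port;
	sw_fdb[free_slot].used = 1;
	hw_fdb_set(mac, port);
	return 0;
}
```

Seeing the same address again on the same port touches neither table, so the hardware is written only on a new or moved entry.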
Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
On Thu, Mar 01, 2012 at 08:36:20AM -0500, Jamal Hadi Salim wrote:
> I want to see a unified API so that user space control applications (RSTP, TRILL?) can use one set of netlink calls for both software bridge and hardware offloaded bridges. Does this proposal meet that requirement?
>
> I don't see any issues with those requirements being met. Jamal, so why do they have to be different calls? I'm not so sure anymore... moving to RTM_FDB_XXXENTRY saved some refactoring in the bridge module but that is just cosmetic.
>
> I may not want to use the s/ware bridge, i.e. I may want to use h/ware bridge. I may want to use both.

This is a rather common case in embedded wireless routers/access points, where you want to have the 4 LAN ports bridged together with the wlan0 interface.

In this scenario, the bridging between the LAN ports is typically done in hardware, and the bridging between the LAN ports and wlan0 in software, but here you have to be careful when you send the packet from the switch chip up the stack to be forwarded to the wlan0 interface to not re-send it to the hardware switch chip ports other than the one that the packet came from.

net/dsa currently solves this by not having the hardware handle broadcast packets at all, which circumvents the problem, but for multicast traffic you would still like to be able to do at least the forwarding that can be done in hardware in hardware. (Unicast doesn't have this problem as long as the kernel and the switch chip agree on their view of the FDB.)

Re: [PATCH 0/4 V13] Avoid soft lockup message when KVM is stopped by host
On Mon, 5 Mar 2012 13:39:43 -0300, Marcelo Tosatti wrote:
> On Tue, Feb 14, 2012 at 04:17:20PM -0500, Eric B Munson wrote:
>> On Tue, 14 Feb 2012, Marcelo Tosatti wrote:
>>> On Tue, Feb 14, 2012 at 10:50:13AM -0500, Eric B Munson wrote:
>>>> On Tue, 14 Feb 2012, Marcelo Tosatti wrote:
>>>>> On Tue, Feb 14, 2012 at 10:29:31AM -0500, Eric B Munson wrote:
>>>>>> On Wed, 08 Feb 2012, Eric B Munson wrote:
>>>>>>> When a guest kernel is stopped by the host hypervisor it can look like a soft lockup to the guest kernel. This false warning can mask later soft lockup warnings which may be real. This patch series adds a method for a host hypervisor to communicate to a guest kernel that it is being stopped. The final patch in the series has the watchdog check this flag when it goes to issue a soft lockup warning and skip the warning if the guest knows it was stopped.
>>>>>>>
>>>>>>> It was attempted to solve this in Qemu, but the side effects of saving and restoring the clock and tsc for each vcpu put the wall clock of the guest behind by the amount of time of the pause. This forces a guest to have ntp running in order to keep the wall clock accurate.
>>>>>>
>>>>>> Avi,
>>>>>>
>>>>>> Is this set fit for merging or is there something else you want changed?
>>>>>
>>>>> Eric,
>>>>>
>>>>> On Message-ID: 20120210160536.ga23...@amt.cnet, i asked:
>>>>>
>>>>> How is the stub getting included for other architectures again?
>>>>
>>>> Marcelo,
>>>>
>>>> Sorry, I put out V13 to answer that. There is a stub in asm-generic that was lost in the V11-V12 rebase. This stub has been included in the V13 set.
>>>>
>>>> Eric
>>>
>>> Eric,
>>>
>>> I know the stub has been included in the series. But i am asking how it is #include'ed for other architectures? (can't see that).
>>
>> Marcelo,
>>
>> kernel/watchdog.c now includes linux/kvm_para.h which includes asm/kvm_para.h. The check_and_clear function is defined in arch include/asm/kvm_para.h or in asm-generic/kvm_para.h for any arch lacking the specific header in their asm include dir. If I have misunderstood how these headers work, please let me know and I will fix it.
>
> There is no automatic inclusion of asm-generic/ headers. You must create kvm_para.h in each architecture's include/asm/ directory, #including asm-generic/kvm_para.h.

Okay, that will go into V16 then...

Eric

Re: [Qemu-devel] KVM call agenda for tuesday 31
On 03/05/2012 05:50 PM, Peter Maydell wrote:
> On 5 March 2012 15:43, Andreas Färber afaer...@suse.de wrote:
>> Mid-term also depends on how we want to proceed with LPAE softmmu-wise (bump arm to 64-bit target_phys_addr_t, or do LPAE and AArch64 in a new arm64).
>
> For LPAE I would have thought we want to make arm go to a 64 bit target_phys_addr_t, since that's exactly what it is: same old ARM architecture with wider physical addresses :-)
>
> I notice that for the architectures we currently have that have 32 and 64 bit versions we have separate {i386,x86_64}-softmmu, {ppc,ppc64}-softmmu, {mips,mips64}-softmmu. What's the advantage of separating out the 64 bit flavours that way rather than having everything be a single binary?

The registers are smaller; if target_ulong fits in a long then everything is faster.

Although, you could pretend that target_ulong is 32-bit when in 32-bit mode, and zero the high half when switching modes, if the target allows it (I believe i386->x86_64 does, but 8086->i386 does not).

--
error compiling committee.c: too many arguments to function

Re: [RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait
On 03/05/2012 06:14 PM, Paolo Bonzini wrote:
On 05/03/2012 16:14, Avi Kivity wrote:

Hmm, I don't think so. It would need to protect execution of the iohandlers too, and pretty much everything can happen there including a nested loop. Of course recursive mutexes exist, but it sounds like too big an axe.

The I/O handlers would still use the qemu mutex, no? we'd just protect the select() (taking the mutex from before releasing the global lock, and reacquiring it afterwards).

Yes, that could work, but it is _really_ ugly.

Yes, it is...

I still prefer this patch or fixing NBD. At least both contain the hack in a single place.

I could add a generation count updated by qemu_aio_wait(), and rerun the select() only if the generation count changes during its execution.

Or we can call it an NBD bug.

I'm not against that, but it seemed to me that the problem is more general.

What about making sure all callers of qemu_aio_wait() run from coroutines (or threads)? Then they just ask the main thread to wake them up, instead of dispatching completions themselves.

That would open another Pandora's box. The point of having a separate main loop is that only AIO can happen during qemu_aio_wait() or qemu_aio_flush(). In particular you don't want the monitor to process input while you're running another monitor command.

Hmm, yes, we're abusing the type of completion here as a kind of weird locking. It's conceptually broken since an aio completion could trigger anything. Usually it just involves block format driver and device code, but in theory, it can affect the state of whoever's running qemu_aio_wait().

--
error compiling committee.c: too many arguments to function

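Note on the generation-count idea mentioned in this message: a single-threaded sketch, assuming a plain counter and a callback standing in for the select() step. None of these names exist in QEMU; aio_generation, note_aio_wait_ran, poll_until_stable and demo_poll are invented for illustration, and a real version would also have to deal with locking and with what select() had already reported.

```c
/* Hypothetical counter, bumped every time the nested AIO loop runs. */
unsigned int aio_generation;

/* Stand-in for the point where qemu_aio_wait() dispatched handlers. */
void note_aio_wait_ran(void)
{
	aio_generation++;
}

/*
 * Run poll_fn (a stand-in for the select() step) and repeat it only if
 * the generation changed while it ran, meaning its result may be stale.
 * Returns the result of the last poll_fn call.
 */
int poll_until_stable(int (*poll_fn)(void *opaque), void *opaque)
{
	unsigned int seen;
	int ret;

	do {
		seen = aio_generation;
		ret = poll_fn(opaque);
	} while (seen != aio_generation);

	return ret;
}

/* Demo poll step: pretends the nested loop ran during the first call only. */
int demo_poll(void *opaque)
{
	int *calls = opaque;

	if ((*calls)++ == 0)
		note_aio_wait_ran();
	return *calls;
}
```

With demo_poll the loop body runs twice: the first pass observes a generation change and is retried, the second pass sees a stable counter and its result is returned.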
Re: [RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait
On 03/05/2012 04:30 PM, Jan Kiszka wrote:
> On 2012-03-05 15:24, Avi Kivity wrote:
>>> Long-term, I'd like to cut out certain file descriptors from the main loop and process them completely in separate threads (for separate locking, prioritization etc.). Dunno how NBD works, but maybe it should be reworked like this already.
>>
>> Ideally qemu_set_fd_handler2() should be made thread local, and each device thread would run a copy of the main loop, just working on different data.
>
> qemu_set_fd_handler2 may not only be called over an iothread. Rather, we need an object and associated lock that is related to the io-path (i.e. frontend device + backend driver). That object has to be passed to services like qemu_set_fd_handler2.

Not sure I like implicit lock-taking. In particular, how does it interact with unregistering an fd_handler?

--
error compiling committee.c: too many arguments to function

Re: weird packet loss between two VMs
Hi Simon,

you are using a 100 Mbit/s NIC and you try to send at 600M; try a 1000 Mbit/s NIC on the sending side as well.

-martin

On 05.03.2012 at 00:57, Simon Chen wrote:
> For the two VMs, one is using 100M VNIC, the other is using 1000M one. The vnet interfaces for the two VMs are put on two bridges on the two servers, both tap into the second vlan.
>
> I then run iperf to send UDP packets from the 100M VM to the 1000M VM using the following parameter:
>
> iperf -c 10.6.6.17 -t 30 -i 2 -r -b 600M

Re: [RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait
On 2012-03-05 18:39, Avi Kivity wrote:
> On 03/05/2012 04:30 PM, Jan Kiszka wrote:
>> On 2012-03-05 15:24, Avi Kivity wrote:
>>>> Long-term, I'd like to cut out certain file descriptors from the main loop and process them completely in separate threads (for separate locking, prioritization etc.). Dunno how NBD works, but maybe it should be reworked like this already.
>>>
>>> Ideally qemu_set_fd_handler2() should be made thread local, and each device thread would run a copy of the main loop, just working on different data.
>>
>> qemu_set_fd_handler2 may not only be called over an iothread. Rather, we need an object and associated lock that is related to the io-path (i.e. frontend device + backend driver). That object has to be passed to services like qemu_set_fd_handler2.
>
> Not sure I like implicit lock-taking. In particular, how does it interact with unregistering an fd_handler?

I wasn't suggesting implicit lock taking, just decoupling from our infamous global lock. My point is that thread-local won't help here.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [PATCH 1/1 v3] PCI: Device specific reset function
On Mon, Mar 5, 2012 at 3:00 AM, Tadeusz Struk tadeusz.st...@intel.com wrote:
> ---
>  drivers/pci/pci.h    |    1 +
>  drivers/pci/quirks.c |   33 +++--
>  include/linux/pci.h  |    1 +
>  3 files changed, 29 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 1009a5e..4d10479 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -315,6 +315,7 @@ struct pci_dev_reset_methods {
>  	u16 vendor;
>  	u16 device;
>  	int (*reset)(struct pci_dev *dev, int probe);
> +	struct list_head list;
>  };
>
>  #ifdef CONFIG_PCI_QUIRKS
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 6476547..f423d2f 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3070,26 +3070,47 @@ static int reset_intel_82599_sfp_virtfn(struct pci_dev *dev, int probe)
>  }
>
>  #define PCI_DEVICE_ID_INTEL_82599_SFP_VF   0x10ed
> -
> -static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
> +static struct pci_dev_reset_methods pci_dev_reset_methods[] = {
>  	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82599_SFP_VF,
> -		 reset_intel_82599_sfp_virtfn },
> +		reset_intel_82599_sfp_virtfn },
>  	{ PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
>  		reset_intel_generic_dev },
> -	{ 0 }
>  };
>
> +static LIST_HEAD(reset_list);
> +
> +void pci_dev_specific_reset_add(struct pci_dev_reset_methods *reset_method)
> +{
> +	INIT_LIST_HEAD(&reset_method->list);
> +	list_add(&reset_method->list, &reset_list);
> +}
> +
> +static int __init pci_dev_specific_reset_init(void)
> +{
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(pci_dev_reset_methods); i++) {
> +		pci_dev_specific_reset_add(&pci_dev_reset_methods[i]);
> +	}
> +	return 0;
> +}
> +
> +late_initcall(pci_dev_specific_reset_init);
> +
>  int pci_dev_specific_reset(struct pci_dev *dev, int probe)
>  {
>  	const struct pci_dev_reset_methods *i;
> +	struct pci_driver *drv = dev->driver;
> +
> +	if (drv && drv->reset)
> +		return drv->reset(dev, probe);
>
> -	for (i = pci_dev_reset_methods; i->reset; i++) {
> +	list_for_each_entry(i, &reset_list, list) {
>  		if ((i->vendor == dev->vendor ||
>  		     i->vendor == (u16)PCI_ANY_ID) &&
>  		    (i->device == dev->device ||
>  		     i->device == (u16)PCI_ANY_ID))
>  			return i->reset(dev, probe);
>  	}
> -
>  	return -ENOTTY;
>  }
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index a16b1df..a3a0bc5 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -560,6 +560,7 @@ struct pci_driver {
>  	int  (*resume_early) (struct pci_dev *dev);
>  	int  (*resume) (struct pci_dev *dev);	/* Device woken up */
>  	void (*shutdown) (struct pci_dev *dev);
> +	int  (*reset) (struct pci_dev *dev, int probe);	/* Device specific reset */
>  	struct pci_error_handlers *err_handler;
>  	struct device_driver	driver;
>  	struct pci_dynids dynids;

This patch now consists of two pieces:

1) Convert the reset method table to a list, and

2) Add the reset function pointer in struct pci_driver and the "if (drv && drv->reset)" block.

These should be split into two patches. After you split them, I don't think you even need part 1, so you should probably just drop it.

Common practice would be to also include your driver patch that actually uses the pci_driver.reset pointer as an additional patch in the same series. That gives us more confidence that this solution actually works and will be used.

Bjorn

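Note on Bjorn's suggestion: a user-space sketch of what "part 2 without part 1" could look like, assuming mock structures in place of struct pci_dev, struct pci_driver and struct pci_dev_reset_methods. The driver hook is tried first, and the existing sentinel-terminated table is kept as the fallback. All mock_* names, ANY_ID and the marker return values 1 and 2 are invented for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_ID 0xffffu	/* stand-in for (u16)PCI_ANY_ID */

struct mock_dev;

/* Simplified stand-in for struct pci_driver with the proposed hook. */
struct mock_driver {
	int (*reset)(struct mock_dev *dev, int probe);
};

struct mock_dev {
	uint16_t vendor;
	uint16_t device;
	struct mock_driver *driver;	/* NULL when no driver is bound */
};

/* Simplified stand-in for struct pci_dev_reset_methods. */
struct mock_reset_method {
	uint16_t vendor;
	uint16_t device;
	int (*reset)(struct mock_dev *dev, int probe);
};

/* Marker return values so the caller can see which path was taken. */
int quirk_reset(struct mock_dev *dev, int probe)
{
	(void)dev;
	(void)probe;
	return 1;
}

int driver_reset(struct mock_dev *dev, int probe)
{
	(void)dev;
	(void)probe;
	return 2;
}

static const struct mock_reset_method reset_methods[] = {
	{ 0x8086, ANY_ID, quirk_reset },
	{ 0 }	/* sentinel: a NULL reset pointer ends the table */
};

int dev_specific_reset(struct mock_dev *dev, int probe)
{
	const struct mock_reset_method *i;
	struct mock_driver *drv = dev->driver;

	/* A bound driver that provides a reset hook takes precedence. */
	if (drv && drv->reset)
		return drv->reset(dev, probe);

	for (i = reset_methods; i->reset; i++) {
		if ((i->vendor == dev->vendor || i->vendor == ANY_ID) &&
		    (i->device == dev->device || i->device == ANY_ID))
			return i->reset(dev, probe);
	}
	return -ENOTTY;
}
```

In this shape the table stays const and needs no list head or initcall, which is presumably why the list conversion looks unnecessary once the driver hook exists.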
Re: [Qemu-devel] KVM call agenda for tuesday 31
On Mon, Mar 5, 2012 at 15:17, Avi Kivity a...@redhat.com wrote:
> On 03/05/2012 05:15 PM, Anthony Liguori wrote:
>>> The other alternative is to s/target_phys_addr_t/uint64_t/ in the memory API. I think 32-on-32 is quite rare these days, so it wouldn't be much of a performance issue.
>>
>> I think this makes sense independent of other discussions regarding fixing target_phys_addr_t size. Hardware addresses should be independent of the target.
>>
>> If we wanted to use a hw_addr_t that would be okay too.
>
> Would this hw_addr (s/_t$//, or you'll be Blued) be fixed at uint64_t

Malced? Posixed?

> (and thus only documentary), or also subject to multiple compilation?

In real world CPU physical addresses, bus addresses and device addresses need not have anything in common. The best would be if we could have devices with 10-bit addresses mixing freely with 32 bit buses and 36 bit CPU physical addresses.

The next best thing probably is to fix all of them to shortest possible reasonable value, like now. Fixing all of them to 64 bits would simplify things a lot if we no longer care about the small performance loss on 32 bit hosts.

> --
> error compiling committee.c: too many arguments to function

Re: [Qemu-devel] KVM call agenda for tuesday 31
On Mon, 5 Mar 2012, Blue Swirl wrote:
> On Mon, Mar 5, 2012 at 15:17, Avi Kivity a...@redhat.com wrote:
>> On 03/05/2012 05:15 PM, Anthony Liguori wrote:
>>>> The other alternative is to s/target_phys_addr_t/uint64_t/ in the memory API. I think 32-on-32 is quite rare these days, so it wouldn't be much of a performance issue.
>>>
>>> I think this makes sense independent of other discussions regarding fixing target_phys_addr_t size. Hardware addresses should be independent of the target.
>>>
>>> If we wanted to use a hw_addr_t that would be okay too.
>>
>> Would this hw_addr (s/_t$//, or you'll be Blued) be fixed at uint64_t
>
> Malced? Posixed?

Heh, a_moo would be Malced, no _t is Posixed indeed.

--
mailto:av1...@comtv.ru

Re: [PATCH] KVM: PPC: Save/Restore CR over vcpu_run
On 03/05/2012 10:02 AM, Alexander Graf wrote:
> @@ -442,6 +444,7 @@ heavyweight_exit:
>
>  	/* Return to kvm_vcpu_run(). */
>  	mtlr	r5
> +	mtcr	r6
>  	addi	r1, r1, HOST_STACK_SIZE
>  	/* r3 still contains the return code from kvmppc_handle_exit(). */
>  	blr
> @@ -459,6 +462,9 @@ _GLOBAL(__kvmppc_vcpu_run)
>  	mflr	r3
>  	PPC_STL	r3, HOST_STACK_LR(r1)
>
> +	mfcr	r5
> +	stw	r5, HOST_CR(r1)

If you move the mfcr before the PPC_STL they should be able to run in parallel. Otherwise on e500mc mfcr will wait for PPC_STL to take its 3 cycles and then mfcr will take 5 cycles before the stw of HOST_CR.

Alternatively, consider using mcrf/mtocrf three times.

Similar issues in booke_interrupts.S (except we can't assume mtocrf exists there), but I'm less worried about that one as it still needs an optimization pass in general.

-Scott

[Bug 42829] KVM Guest with virtio network driver loses network connectivity
https://bugzilla.kernel.org/show_bug.cgi?id=42829

Steve stefan.bo...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Kernel Version|v3.0-rc5                    |v3.0-rc1

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.

[Bug 42829] KVM Guest with virtio network driver loses network connectivity
https://bugzilla.kernel.org/show_bug.cgi?id=42829

Steve stefan.bo...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Kernel Version|v3.0-rc1                    |v3.0-rc1+
           Severity|high                        |blocking

--- Comment #11 from Steve stefan.bo...@gmail.com 2012-03-06 02:26:16 ---
I found bad commit.

git bisect log:
---
git bisect start
# bad: [550cf00dbc8ee402bef71628cb71246493dd4500] Merge tag 'mmc-fixes-for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc
git bisect bad 550cf00dbc8ee402bef71628cb71246493dd4500
# good: [61c4f2c81c61f73549928dfd9f3e8f26aa36a8cf] Linux 2.6.39
git bisect good 61c4f2c81c61f73549928dfd9f3e8f26aa36a8cf
# bad: [8a9ea3237e7eb5c25f09e429ad242ae5a3d5ea22] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect bad 8a9ea3237e7eb5c25f09e429ad242ae5a3d5ea22
# bad: [95a943c162d74b20d869917bdf5df11293c35b63] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
git bisect bad 95a943c162d74b20d869917bdf5df11293c35b63
# good: [98b98d316349e9a028e632629fe813d07fa5afdd] Merge branch 'drm-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
git bisect good 98b98d316349e9a028e632629fe813d07fa5afdd
# bad: [1f6e44a6dc21a5d2abb068063acbbf64f8cee548] pxa168_eth: enable transmit time stamping.
git bisect bad 1f6e44a6dc21a5d2abb068063acbbf64f8cee548
# good: [19de85ef574c3a2182e3ccad9581805052f14946] bitops: add #ifndef for each of find bitops
git bisect good 19de85ef574c3a2182e3ccad9581805052f14946
# good: [c320afe965bf3f857249d223801d8f2fc95615c2] Blackfin: debug-mmrs: include RSI_PID[4567] MMRs
git bisect good c320afe965bf3f857249d223801d8f2fc95615c2
# bad: [23c79d31a3dd2602ee1a5ff31303b2d7a2d3c159] Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
git bisect bad 23c79d31a3dd2602ee1a5ff31303b2d7a2d3c159
# good: [cd1acdf1723d71b28175f95b04305f1cc74ce363] Merge branch 'pnfs-submit' of git://git.open-osd.org/linux-open-osd
git bisect good cd1acdf1723d71b28175f95b04305f1cc74ce363
# bad: [cd4ecf877a4d629c38571405fd649077c12dec50] Merge branch 'rmobile-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6
git bisect bad cd4ecf877a4d629c38571405fd649077c12dec50
# bad: [5c6cce92bc8aee751aafe82c5d9caf7553226a3d] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
git bisect bad 5c6cce92bc8aee751aafe82c5d9caf7553226a3d
# bad: [8ea8cf89e19aeb596b818ee5f2bec8a8b0586b60] vhost: support event index
git bisect bad 8ea8cf89e19aeb596b818ee5f2bec8a8b0586b60
# good: [bc805a03c26e1e25171bc627c6264553d27f746c] lguest: fix up compilation after move
git bisect good bc805a03c26e1e25171bc627c6264553d27f746c
# good: [bf50e69f63d21091e525185c3ae761412be0ba72] virtio balloon: kill tell-host-first logic
git bisect good bf50e69f63d21091e525185c3ae761412be0ba72
# good: [770b31a85e000b0194974922f238a30ade4246b6] virtio: event index interface
git bisect good 770b31a85e000b0194974922f238a30ade4246b6
# bad: [a5c262c5fd83ece01bd649fb08416c501d4c59d7] virtio_ring: support event idx feature
git bisect bad a5c262c5fd83ece01bd649fb08416c501d4c59d7
# good: [bf7035bf20563a6cadcb9e870406e7b21daf5e30] virtio ring: inline function to check for events
git bisect good bf7035bf20563a6cadcb9e870406e7b21daf5e30

git bisect message:
===
a5c262c5fd83ece01bd649fb08416c501d4c59d7 is the first bad commit
commit a5c262c5fd83ece01bd649fb08416c501d4c59d7
Author: Michael S. Tsirkin m...@redhat.com
Date:   Fri May 20 02:10:44 2011 +0300

    virtio_ring: support event idx feature

    Support for the new event idx feature:
    1. When enabling interrupts, publish the current avail index value to the host to get interrupts on the next update.
    2. Use the new avail_event feature to reduce the number of exits from the guest.

    Simple test with the simulator:

    [virtio]# time ./virtio_test
    spurious wakeus: 0x7

    real    0m0.169s
    user    0m0.140s
    sys     0m0.019s

    [virtio]# time ./virtio_test --no-event-idx
    spurious wakeus: 0x11

    real    0m0.649s
    user    0m0.295s
    sys     0m0.335s

    Signed-off-by: Michael S. Tsirkin m...@redhat.com
    Signed-off-by: Rusty Russell ru...@rustcorp.com.au

:040000 040000 933903414419858cf7402aa3fb8c3f675d6ab7cc 0ed603da4671eef88e0702e6438e903b56688b62 M drivers

I found bug in include/linux/virtio_ring.h:
===
virtio: event index interface
author     Michael S. Tsirkin m...@redhat.com  Thu, 19 May 2011 23:10:17 +0000 (02:10 +0300)
committer  Rusty Russell ru...@rustcorp.com.au  Mon, 30 May 2011 01:44:14 +0000 (10:44 +0930)
commit     770b31a85e000b0194974922f238a30ade4246b6
tree       eed81e23f3116858b49af76bcc5831c38662de96  tree |

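Note on the event index feature named in the bisected commit: a sketch of the wrap-safe comparison that decides whether the other side asked to be notified, written from memory of the helper in include/linux/virtio_ring.h and renamed need_event here; check it against the header in your own tree before relying on it. It is included only to show what "event idx" means and makes no claim about where the reported bug lies.

```c
#include <stdint.h>

/*
 * event_idx: index the other side published ("notify me once you pass this")
 * new_idx:   ring index after the current batch was added
 * old_idx:   ring index before the current batch was added
 *
 * All arithmetic is modulo 2^16, so the test keeps working when the
 * 16-bit ring indices wrap around. Returns nonzero if a notification
 * is needed.
 */
int need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}
```

A notification is due only when event_idx falls inside the batch just added, that is in the half-open range from old_idx to new_idx. Skipping the rest is what reduces the number of guest exits, as the commit message says.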
[Bug 42829] KVM Guest with virtio network driver loses network connectivity
https://bugzilla.kernel.org/show_bug.cgi?id=42829

Jason Wang jasow...@redhat.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jasow...@redhat.com

--- Comment #12 from Jason Wang jasow...@redhat.com 2012-03-06 02:56:08 ---
(In reply to comment #11)
> I found bad commit.
>
> git bisect log:
> ---
> git bisect start
> # bad: [550cf00dbc8ee402bef71628cb71246493dd4500] Merge tag 'mmc-fixes-for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc
> git bisect bad 550cf00dbc8ee402bef71628cb71246493dd4500
> # good: [61c4f2c81c61f73549928dfd9f3e8f26aa36a8cf] Linux 2.6.39
> git bisect good 61c4f2c81c61f73549928dfd9f3e8f26aa36a8cf
> # bad: [8a9ea3237e7eb5c25f09e429ad242ae5a3d5ea22] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
> git bisect bad 8a9ea3237e7eb5c25f09e429ad242ae5a3d5ea22
> # bad: [95a943c162d74b20d869917bdf5df11293c35b63] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
> git bisect bad 95a943c162d74b20d869917bdf5df11293c35b63
> # good: [98b98d316349e9a028e632629fe813d07fa5afdd] Merge branch 'drm-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
> git bisect good 98b98d316349e9a028e632629fe813d07fa5afdd
> # bad: [1f6e44a6dc21a5d2abb068063acbbf64f8cee548] pxa168_eth: enable transmit time stamping.
> git bisect bad 1f6e44a6dc21a5d2abb068063acbbf64f8cee548
> # good: [19de85ef574c3a2182e3ccad9581805052f14946] bitops: add #ifndef for each of find bitops
> git bisect good 19de85ef574c3a2182e3ccad9581805052f14946
> # good: [c320afe965bf3f857249d223801d8f2fc95615c2] Blackfin: debug-mmrs: include RSI_PID[4567] MMRs
> git bisect good c320afe965bf3f857249d223801d8f2fc95615c2
> # bad: [23c79d31a3dd2602ee1a5ff31303b2d7a2d3c159] Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
> git bisect bad 23c79d31a3dd2602ee1a5ff31303b2d7a2d3c159
> # good: [cd1acdf1723d71b28175f95b04305f1cc74ce363] Merge branch 'pnfs-submit' of git://git.open-osd.org/linux-open-osd
> git bisect good cd1acdf1723d71b28175f95b04305f1cc74ce363
> # bad: [cd4ecf877a4d629c38571405fd649077c12dec50] Merge branch 'rmobile-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6
> git bisect bad cd4ecf877a4d629c38571405fd649077c12dec50
> # bad: [5c6cce92bc8aee751aafe82c5d9caf7553226a3d] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
> git bisect bad 5c6cce92bc8aee751aafe82c5d9caf7553226a3d
> # bad: [8ea8cf89e19aeb596b818ee5f2bec8a8b0586b60] vhost: support event index
> git bisect bad 8ea8cf89e19aeb596b818ee5f2bec8a8b0586b60
> # good: [bc805a03c26e1e25171bc627c6264553d27f746c] lguest: fix up compilation after move
> git bisect good bc805a03c26e1e25171bc627c6264553d27f746c
> # good: [bf50e69f63d21091e525185c3ae761412be0ba72] virtio balloon: kill tell-host-first logic
> git bisect good bf50e69f63d21091e525185c3ae761412be0ba72
> # good: [770b31a85e000b0194974922f238a30ade4246b6] virtio: event index interface
> git bisect good 770b31a85e000b0194974922f238a30ade4246b6
> # bad: [a5c262c5fd83ece01bd649fb08416c501d4c59d7] virtio_ring: support event idx feature
> git bisect bad a5c262c5fd83ece01bd649fb08416c501d4c59d7
> # good: [bf7035bf20563a6cadcb9e870406e7b21daf5e30] virtio ring: inline function to check for events
> git bisect good bf7035bf20563a6cadcb9e870406e7b21daf5e30
>
> git bisect message:
> ===
> a5c262c5fd83ece01bd649fb08416c501d4c59d7 is the first bad commit
> commit a5c262c5fd83ece01bd649fb08416c501d4c59d7
> Author: Michael S. Tsirkin m...@redhat.com
> Date:   Fri May 20 02:10:44 2011 +0300
>
>     virtio_ring: support event idx feature
>
>     Support for the new event idx feature:
>     1. When enabling interrupts, publish the current avail index value to the host to get interrupts on the next update.
>     2. Use the new avail_event feature to reduce the number of exits from the guest.
>
>     Simple test with the simulator:
>
>     [virtio]# time ./virtio_test
>     spurious wakeus: 0x7
>
>     real    0m0.169s
>     user    0m0.140s
>     sys     0m0.019s
>
>     [virtio]# time ./virtio_test --no-event-idx
>     spurious wakeus: 0x11
>
>     real    0m0.649s
>     user    0m0.295s
>     sys     0m0.335s
>
>     Signed-off-by: Michael S. Tsirkin m...@redhat.com
>     Signed-off-by: Rusty Russell ru...@rustcorp.com.au
>
> :040000 040000 933903414419858cf7402aa3fb8c3f675d6ab7cc 0ed603da4671eef88e0702e6438e903b56688b62 M drivers
>
> I found bug in include/linux/virtio_ring.h:
> ===
> virtio: event index interface
> author     Michael S. Tsirkin m...@redhat.com  Thu, 19 May 2011 23:10:17 +0000 (02:10 +0300)
> committer  Rusty Russell ru...@rustcorp.com.au  Mon, 30 May 2011 01:44:14 +0000 (10:44 +0930)
> commit

Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
On 3/5/2012 8:53 AM, Lennert Buytenhek wrote: On Tue, Feb 28, 2012 at 08:40:06PM -0800, John Fastabend wrote: Also if there are embedded switches with learning capabilities they might want to trigger events to user space. In this case having a protocol type makes user space a bit easier to manage. I've added Lennert so maybe he can comment I think the Marvell chipsets might support something along these lines. The SR-IOV chipsets I'm aware of _today_ don't do learning. Learning makes the event model more plausible. net/dsa currently configures any switch chips in the system to do auto-learning. However, I would much prefer to disable that, and have the switch chip just pass up packets for new source addresses, have Linux do the learning, and then mirror the Linux software FDB into the hardware instead -- that avoids having to manually flush the hardware FDB on certain STP state transitions or having to configure the hardware to use a shorter address learning timeout when we're in the middle of an STP topology change, which are problems we are running into in practice. Great. And the plan is we should be able to use the same daemon with minimal changes (currently a flag) to control both sw and hw bridges. Just curious -- while your patches allow propagating FDB entries into the hardware, do you also have hooks to tell the hardware which ports are to share address databases? Not in the current patches. I don't have hardware right now that can instantiate multiple bridges. When I get some I was hoping to do something similar to this patch and use netlink commands to create/delete bridges and add/remove ports to them. This would be modifying the existing commands to work for both software and hardware bridges. By a bridge instantiation I mean a shared address database in this case. 
For net/dsa, we currently have: http://patchwork.ozlabs.org/patch/16578/ While I think this is conceptually sound, the implementation is hacky, and I wonder how you've solved it for your setup, and if DSA can piggy-back off that. Yep anything we come up with should work in both cases. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] KVM: expose Intel cpu new features to guest
Avi,

Any comments?

Thanks,
Jinsong

Liu, Jinsong wrote:

From ecd8be962f69393c183f941bfdbd7a7d3876d442 Mon Sep 17 00:00:00 2001
From: Liu, Jinsong jinsong@intel.com
Date: Mon, 27 Feb 2012 05:19:32 +0800
Subject: [PATCH] KVM: expose Intel cpu new features to guest

Intel recently released two new features, HLE and RTM. Refer to http://software.intel.com/file/41417. This patch exposes them to the guest.

Signed-off-by: Liu, Jinsong jinsong@intel.com
---
 arch/x86/include/asm/cpufeature.h |    2 ++
 arch/x86/kvm/cpuid.c              |    3 ++-
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 17c5d4b..e8d12a8 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -198,10 +198,12 @@
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */
 #define X86_FEATURE_FSGSBASE (9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/
 #define X86_FEATURE_BMI1     (9*32+ 3) /* 1st group bit manipulation extensions */
+#define X86_FEATURE_HLE      (9*32+ 4) /* Hardware Lock Elision */
 #define X86_FEATURE_AVX2     (9*32+ 5) /* AVX2 instructions */
 #define X86_FEATURE_SMEP     (9*32+ 7) /* Supervisor Mode Execution Protection */
 #define X86_FEATURE_BMI2     (9*32+ 8) /* 2nd group bit manipulation extensions */
 #define X86_FEATURE_ERMS     (9*32+ 9) /* Enhanced REP MOVSB/STOSB */
+#define X86_FEATURE_RTM      (9*32+11) /* Restricted Transactional Memory */

 #if defined(__KERNEL__) && !defined(__ASSEMBLY__)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 9fed5be..c2134b8 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -247,7 +247,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,

     /* cpuid 7.0.ebx */
     const u32 kvm_supported_word9_x86_features =
-        F(FSGSBASE) | F(BMI1) | F(AVX2) | F(SMEP) | F(BMI2) | F(ERMS);
+        F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
+        F(BMI2) | F(ERMS) | F(RTM);

     /* all calls to cpuid_count() should be made on the same cpu */
     get_cpu();
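The feature constants in the patch above pack a word index and a bit position into one number (word * 32 + bit), and word 9 corresponds to CPUID.7.0:EBX. The following is only a rough standalone sketch of that encoding, not the kernel's F() macro; the `toy_` helpers are names assumed here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Feature numbers as in the patch: word index * 32 + bit position. */
#define X86_FEATURE_FSGSBASE (9 * 32 + 0)
#define X86_FEATURE_BMI1     (9 * 32 + 3)
#define X86_FEATURE_HLE      (9 * 32 + 4)
#define X86_FEATURE_AVX2     (9 * 32 + 5)
#define X86_FEATURE_SMEP     (9 * 32 + 7)
#define X86_FEATURE_BMI2     (9 * 32 + 8)
#define X86_FEATURE_ERMS     (9 * 32 + 9)
#define X86_FEATURE_RTM      (9 * 32 + 11)

/* Hypothetical helper: which 32-bit feature word a feature lives in. */
static unsigned toy_feature_word(unsigned feature)
{
    return feature / 32;
}

/* Hypothetical helper: single-bit mask of a word-9 feature. */
static uint32_t toy_word9_bit(unsigned feature)
{
    assert(toy_feature_word(feature) == 9);
    return UINT32_C(1) << (feature % 32);
}

/* Mask of the word-9 features listed in the patched cpuid.c hunk. */
static uint32_t toy_supported_word9(void)
{
    return toy_word9_bit(X86_FEATURE_FSGSBASE) | toy_word9_bit(X86_FEATURE_BMI1) |
           toy_word9_bit(X86_FEATURE_HLE)      | toy_word9_bit(X86_FEATURE_AVX2) |
           toy_word9_bit(X86_FEATURE_SMEP)     | toy_word9_bit(X86_FEATURE_BMI2) |
           toy_word9_bit(X86_FEATURE_ERMS)     | toy_word9_bit(X86_FEATURE_RTM);
}
```

With these definitions HLE maps to bit 4 and RTM to bit 11 of the word, so adding both to the supported mask sets exactly those two extra bits.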
Re: [PATCH] KVM: Ensure all vcpus are consistent with in-kernel irqchip settings
On Mon, 2012-03-05 at 14:29 +0200, Avi Kivity wrote:
> If some vcpus are created before KVM_CREATE_IRQCHIP, then
> irqchip_in_kernel() and vcpu->arch.apic will be inconsistent, leading
> to potential NULL pointer dereferences.
>
> Fix by:
> - ensuring that no vcpus are installed when KVM_CREATE_IRQCHIP is called
> - ensuring that a vcpu has an apic if it is installed after KVM_CREATE_IRQCHIP
>
> This is somewhat long winded because vcpu->arch.apic is created without
> kvm->lock held.

Hi Avi,

Thanks for following up on this. This looks OK to me.

I wonder if we will end up needing to add other sanity tests at the same point, i.e. when we install the vcpu, in which case we might need a generic sanity hook. But better to keep it specific until we need something generalised.

When we do irqchip-in-kernel on powerpc we'll need to rework the #ifdef in kvm_host.h, because we don't want CONFIG_KVM_APIC_ARCHITECTURE, but we will need our own kvm_vcpu_compatible(). But again we'll do that at the time.

cheers
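The two checks described in the quoted patch can be sketched in plain C. This is a simplified model under stated assumptions, not the KVM code: the `toy_` structures and functions are hypothetical stand-ins, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; all names are hypothetical. */
struct toy_apic { int id; };
struct toy_vm   { bool irqchip_in_kernel; int nr_vcpus; };
struct toy_vcpu { struct toy_vm *vm; struct toy_apic *apic; };

/* A vcpu is consistent when it has an APIC exactly if the VM has an
 * in-kernel irqchip. */
static bool toy_vcpu_compatible(const struct toy_vcpu *vcpu)
{
    return vcpu->vm->irqchip_in_kernel == (vcpu->apic != NULL);
}

/* Refuse to create the irqchip once any vcpu is installed.
 * Returns 0 on success, -1 otherwise. */
static int toy_create_irqchip(struct toy_vm *vm)
{
    if (vm->nr_vcpus > 0)
        return -1;
    vm->irqchip_in_kernel = true;
    return 0;
}

/* Install a vcpu only if it matches the VM's irqchip setting. */
static int toy_install_vcpu(struct toy_vm *vm, struct toy_vcpu *vcpu)
{
    assert(vcpu->vm == vm);
    if (!toy_vcpu_compatible(vcpu))
        return -1;
    vm->nr_vcpus++;
    return 0;
}
```

Checking at install time, rather than at creation time, is what lets the consistency test run at a point where the VM state is settled.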
[Bug 42829] KVM Guest with virtio network driver loses network connectivity
https://bugzilla.kernel.org/show_bug.cgi?id=42829

--- Comment #13 from Steve stefan.bo...@gmail.com 2012-03-06 07:28:59 ---
Hello.

I started testing from the latest master branch (v3.3-rc6+) on both host and guest. During all tests the host kept the same kernel and other components; on the guest I changed only the kernel version, by git bisecting. I did not change any code; my suggestion is only a hint and could be wrong. I believe I have provided sufficient information to locate the bug in the code.

Answer to your question about whether the tree contains the fixes:
---
git branch --contains=a72caae21803b74e04e2afda5e035f149d4ea118
* master
git branch --contains=4dbc5d9f4f791df8a5879f4a655f517adc7f56d1
* master

Let me know how I can help (if needed) to fix this issue as soon as possible.

Thank you for your time.
[PATCH] KVM: PPC: Book 3S: Fix compilation for !HV configs
Commits 2f5cdd5487 (KVM: PPC: Book3S HV: Make secondary threads more robust against stray IPIs) and 1c2066b0f7 (KVM: PPC: Book3S HV: Make virtual processor area registration more robust) added fields to struct kvm_vcpu_arch inside #ifdef CONFIG_KVM_BOOK3S_64_HV regions, and added lines to arch/powerpc/kernel/asm-offsets.c to generate assembler constants for their offsets. Unfortunately this led to compile errors on Book 3S machines for configs that had KVM enabled but not CONFIG_KVM_BOOK3S_64_HV. This fixes the problem by moving the offending lines inside #ifdef CONFIG_KVM_BOOK3S_64_HV regions. Signed-off-by: Paul Mackerras pau...@samba.org --- This is against the kvm-ppc-next branch of git://github.com/agraf/linux-2.6.git. arch/powerpc/kernel/asm-offsets.c |6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 8c8b2ce..86d43cc 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -464,6 +464,7 @@ int main(void) DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v)); DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr)); DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar)); + DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr)); #endif #ifdef CONFIG_PPC_BOOK3S DEFINE(VCPU_VCPUID, offsetof(struct kvm_vcpu, vcpu_id)); @@ -480,7 +481,6 @@ int main(void) DEFINE(VCPU_PENDING_EXC, offsetof(struct kvm_vcpu, arch.pending_exceptions)); DEFINE(VCPU_CEDED, offsetof(struct kvm_vcpu, arch.ceded)); DEFINE(VCPU_PRODDED, offsetof(struct kvm_vcpu, arch.prodded)); - DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr)); DEFINE(VCPU_MMCR, offsetof(struct kvm_vcpu, arch.mmcr)); DEFINE(VCPU_PMC, offsetof(struct kvm_vcpu, arch.pmc)); DEFINE(VCPU_SLB, offsetof(struct kvm_vcpu, arch.slb)); @@ -554,10 +554,10 @@ int main(void) HSTATE_FIELD(HSTATE_IN_GUEST, in_guest); HSTATE_FIELD(HSTATE_RESTORE_HID5, 
restore_hid5); HSTATE_FIELD(HSTATE_NAPPING, napping); - HSTATE_FIELD(HSTATE_HWTHREAD_REQ, hwthread_req); - HSTATE_FIELD(HSTATE_HWTHREAD_STATE, hwthread_state); #ifdef CONFIG_KVM_BOOK3S_64_HV + HSTATE_FIELD(HSTATE_HWTHREAD_REQ, hwthread_req); + HSTATE_FIELD(HSTATE_HWTHREAD_STATE, hwthread_state); HSTATE_FIELD(HSTATE_KVM_VCPU, kvm_vcpu); HSTATE_FIELD(HSTATE_KVM_VCORE, kvm_vcore); HSTATE_FIELD(HSTATE_XICS_PHYS, xics_phys); -- 1.7.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
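The failure mode fixed above is a field that exists only under a config option while the line exporting its offset is compiled unconditionally. Here is a small userspace sketch of keeping both under the same guard; it uses offsetof in a table rather than the kernel's DEFINE() mechanism, and `TOY_CONFIG_HV` and the `toy_` names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_CONFIG_HV 1  /* remove this line to model a !HV build */

struct toy_vcpu_arch {
    unsigned long pending_exceptions;
#ifdef TOY_CONFIG_HV
    unsigned long vpa_pinned_addr;  /* only present in HV builds */
#endif
};

struct toy_offset { const char *name; size_t value; };

/* Each entry sits under the same guard as the field it describes, so a
 * !HV build never refers to a member that does not exist. */
static const struct toy_offset toy_offsets[] = {
    { "VCPU_PENDING_EXC", offsetof(struct toy_vcpu_arch, pending_exceptions) },
#ifdef TOY_CONFIG_HV
    { "VCPU_VPA", offsetof(struct toy_vcpu_arch, vpa_pinned_addr) },
#endif
};

/* Look up an exported offset by name; returns NULL if it is not exported. */
static const struct toy_offset *toy_find_offset(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(toy_offsets) / sizeof(toy_offsets[0]); i++)
        if (strcmp(toy_offsets[i].name, name) == 0)
            return &toy_offsets[i];
    return NULL;
}
```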
RE: [Qemu-devel] [PATCH 2/2] Expose tsc deadline timer cpuid to guest
Jan,

Any comments? I am somewhat confused about your point 'disable cpuid feature for older machine types by default': are you planning a common approach for this common issue, or are you asking me for a specific solution for the tsc deadline timer case?

Thanks,
Jinsong

Liu, Jinsong wrote:
>> My point is that qemu-version-A [-cpu whatever] should provide the same VM as qemu-version-B -machine pc-A [-cpu whatever], specifically if you leave out the cpu specification. So the compat machine could establish a feature mask (e.g. append some -tsc_deadline in this case). But, indeed, we need a new channel for this.
>
> Yes, if such a requirement needs to be satisfied, I agree we need a new channel to solve this kind of common issue.
>
> As for exposing the tsc deadline timer feature, I wrote an updated patch, attached.
> 1). It exposes the tsc deadline timer feature to the guest if the in-kernel irqchip is used and kvm has emulated the tsc deadline timer;
> 2). It also lets the user control whether the feature is exposed via a cpu feature flag;
>
> Thanks,
> Jinsong

From 5b7d5f459b621686e78e437010ce34748bcb9e8e Mon Sep 17 00:00:00 2001
From: Liu, Jinsong jinsong@intel.com
Date: Wed, 29 Feb 2012 01:53:15 +0800
Subject: [PATCH] Expose tsc deadline timer feature to guest

It exposes the tsc deadline timer feature to the guest if the in-kernel irqchip is used and kvm has emulated the tsc deadline timer. It also lets the user control whether the feature is exposed via a cpu feature flag.

Signed-off-by: Liu, Jinsong jinsong@intel.com
---
 target-i386/cpu.h   |    1 +
 target-i386/cpuid.c |    2 +-
 target-i386/kvm.c   |    4 ++++
 3 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index d92be5d..3409afe 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -399,6 +399,7 @@
 #define CPUID_EXT_X2APIC   (1 << 21)
 #define CPUID_EXT_MOVBE    (1 << 22)
 #define CPUID_EXT_POPCNT   (1 << 23)
+#define CPUID_EXT_TSC_DEADLINE_TIMER (1 << 24)
 #define CPUID_EXT_XSAVE    (1 << 26)
 #define CPUID_EXT_OSXSAVE  (1 << 27)
 #define CPUID_EXT_HYPERVISOR  (1 << 31)
diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
index b9bfeaf..ac4b79c 100644
--- a/target-i386/cpuid.c
+++ b/target-i386/cpuid.c
@@ -50,7 +50,7 @@ static const char *ext_feature_name[] = {
     "fma", "cx16", "xtpr", "pdcm",
     NULL, NULL, "dca", "sse4.1|sse4_1",
     "sse4.2|sse4_2", "x2apic", "movbe", "popcnt",
-    NULL, "aes", "xsave", "osxsave",
+    "tsc_deadline", "aes", "xsave", "osxsave",
     "avx", NULL, NULL, "hypervisor",
 };
 static const char *ext2_feature_name[] = {
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 7079e87..2639699 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -370,6 +370,10 @@ int kvm_arch_init_vcpu(CPUState *env)
     i = env->cpuid_ext_features & CPUID_EXT_HYPERVISOR;
     env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
     env->cpuid_ext_features |= i;
+    if (!kvm_irqchip_in_kernel() ||
+        !kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER)) {
+        env->cpuid_ext_features &= ~CPUID_EXT_TSC_DEADLINE_TIMER;
+    }

     env->cpuid_ext2_features &= kvm_arch_get_supported_cpuid(s, 0x80000001,
                                                              0, R_EDX);
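The masking logic in the kvm.c hunk above can be modelled on its own. The sketch below is an approximation under stated assumptions: `toy_filter_ext_features` is a hypothetical helper, and the two booleans stand in for the results of kvm_irqchip_in_kernel() and the KVM_CAP_TSC_DEADLINE_TIMER capability check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID_EXT_TSC_DEADLINE_TIMER (UINT32_C(1) << 24)
#define CPUID_EXT_HYPERVISOR         (UINT32_C(1) << 31)

/* Hypothetical helper: compute the CPUID.1:ECX bits shown to the guest. */
static uint32_t toy_filter_ext_features(uint32_t requested,
                                        uint32_t host_supported,
                                        bool irqchip_in_kernel,
                                        bool has_tsc_deadline_cap)
{
    /* The hypervisor bit is preserved regardless of host support. */
    uint32_t hypervisor = requested & CPUID_EXT_HYPERVISOR;
    uint32_t features = (requested & host_supported) | hypervisor;

    /* The timer needs both the in-kernel irqchip and the capability. */
    if (!irqchip_in_kernel || !has_tsc_deadline_cap)
        features &= ~CPUID_EXT_TSC_DEADLINE_TIMER;
    return features;
}

/* Small usage example: the bit survives only when both conditions hold. */
void toy_filter_demo(void)
{
    uint32_t tsc = CPUID_EXT_TSC_DEADLINE_TIMER;

    assert(toy_filter_ext_features(tsc, tsc, true, true) == tsc);
    assert(toy_filter_ext_features(tsc, tsc, false, true) == 0);
}
```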
Re: [PATCH 13/38] KVM: PPC: booke: category E.HV (GS-mode) support
+/*
+ * Host interrupt handlers may have clobbered these guest-readable
+ * SPRGs, so we need to reload them here with the guest's values.
+ */
+    lwz     r3, VCPU_VRSAVE(r4)
+    lwz     r5, VCPU_SHARED_SPRG4(r11)
+    mtspr   SPRN_VRSAVE, r3
+    lwz     r6, VCPU_SHARED_SPRG5(r11)
+    mtspr   SPRN_SPRG4W, r5
+    lwz     r7, VCPU_SHARED_SPRG6(r11)
+    mtspr   SPRN_SPRG5W, r6
+    lwz     r8, VCPU_SHARED_SPRG7(r11)
+    mtspr   SPRN_SPRG6W, r7
+    mtspr   SPRN_SPRG7W, r8
+

That should be here.

+/* Load some guest volatiles. */
+    PPC_LL  r3, VCPU_LR(r4)
+    PPC_LL  r5, VCPU_XER(r4)
+    PPC_LL  r6, VCPU_CTR(r4)
+    PPC_LL  r7, VCPU_CR(r4)
+    PPC_LL  r8, VCPU_PC(r4)
+#ifndef CONFIG_64BIT
+    lwz     r9, (VCPU_SHARED_MSR + 4)(r11)
+#else
+    ld      r9, (VCPU_SHARED_MSR)(r11)
+#endif
+    PPC_LL  r0, VCPU_GPR(r0)(r4)
+    PPC_LL  r1, VCPU_GPR(r1)(r4)
+    PPC_LL  r2, VCPU_GPR(r2)(r4)
+    PPC_LL  r10, VCPU_GPR(r10)(r4)
+    PPC_LL  r11, VCPU_GPR(r11)(r4)
+    PPC_LL  r12, VCPU_GPR(r12)(r4)
+    PPC_LL  r13, VCPU_GPR(r13)(r4)
+    mtlr    r3
+    mtxer   r5
+    mtctr   r6
+    mtcr    r7
+    mtsrr0  r8
+    mtsrr1  r9
+
+#ifdef CONFIG_KVM_EXIT_TIMING
+/* save enter time */
+1:
+    mfspr   r6, SPRN_TBRU
+    mfspr   r7, SPRN_TBRL
+    mfspr   r8, SPRN_TBRU
+    cmpw    r8, r6

Shouldn't we restore the guest CR after this? Otherwise this can corrupt it.

I think this is a typo, since in our previous KVM implementation we always collected the KVM exit timing at the location above :)

Tiejun

Thanks
-Bharat

+    PPC_STL r7, VCPU_TIMING_LAST_ENTER_TBL(r4)
+    bne     1b
+    PPC_STL r8, VCPU_TIMING_LAST_ENTER_TBU(r4)
+#endif
+
+/* Finish loading guest volatiles and jump to guest. */
+    PPC_LL  r5, VCPU_GPR(r5)(r4)
+    PPC_LL  r6, VCPU_GPR(r6)(r4)
+    PPC_LL  r7, VCPU_GPR(r7)(r4)
+    PPC_LL  r8, VCPU_GPR(r8)(r4)
+    PPC_LL  r9, VCPU_GPR(r9)(r4)
+
+    PPC_LL  r3, VCPU_GPR(r3)(r4)
+    PPC_LL  r4, VCPU_GPR(r4)(r4)
+    rfi

-- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Restore guest CR after exit timing calculation
No instruction which can change Condition Register (CR) should be executed after Guest CR is loaded. So the guest CR is restored after the Exit Timing in lightweight_exit executes cmpw, which can clobber CR.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
This patch is against e500mc branch.

 arch/powerpc/kvm/bookehv_interrupts.S |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index 63fc5f0..6b9389f 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -574,7 +574,6 @@ lightweight_exit:
     mtlr    r3
     mtxer   r5
     mtctr   r6
-    mtcr    r7
     mtsrr0  r8
     mtsrr1  r9

@@ -582,14 +581,20 @@ lightweight_exit:
     /* save enter time */
 1:
     mfspr   r6, SPRN_TBRU
-    mfspr   r7, SPRN_TBRL
+    mfspr   r9, SPRN_TBRL
     mfspr   r8, SPRN_TBRU
     cmpw    r8, r6
-    PPC_STL r7, VCPU_TIMING_LAST_ENTER_TBL(r4)
+    PPC_STL r9, VCPU_TIMING_LAST_ENTER_TBL(r4)
     bne     1b
     PPC_STL r8, VCPU_TIMING_LAST_ENTER_TBU(r4)
 #endif

+    /*
+     * Don't execute any instruction which can change CR after
+     * below instruction.
+     */
+    mtcr    r7
+
     /* Finish loading guest volatiles and jump to guest. */
     PPC_LL  r5, VCPU_GPR(r5)(r4)
     PPC_LL  r6, VCPU_GPR(r6)(r4)
--
1.7.0.4
[PATCH] KVM: PPC: Save/Restore CR over vcpu_run
On PPC, CR2-CR4 are nonvolatile, thus have to be saved across function calls. We didn't respect that for any architecture until Paul spotted it in his patch for Book3S-HV. This patch saves/restores CR for all KVM capable PPC hosts.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_interrupts.S  |    7 +++++++
 arch/powerpc/kvm/booke_interrupts.S   |    7 ++++++-
 arch/powerpc/kvm/bookehv_interrupts.S |    8 +++++++-
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_interrupts.S b/arch/powerpc/kvm/book3s_interrupts.S
index 0a8515a..3e35383 100644
--- a/arch/powerpc/kvm/book3s_interrupts.S
+++ b/arch/powerpc/kvm/book3s_interrupts.S
@@ -84,6 +84,10 @@ kvm_start_entry:
     /* Save non-volatile registers (r14 - r31) */
     SAVE_NVGPRS(r1)

+    /* Save CR */
+    mfcr    r14
+    stw     r14, _CCR(r1)
+
     /* Save LR */
     PPC_STL r0, _LINK(r1)

@@ -165,6 +169,9 @@ kvm_exit_loop:
     PPC_LL  r4, _LINK(r1)
     mtlr    r4

+    lwz     r14, _CCR(r1)
+    mtcr    r14
+
     /* Restore non-volatile host registers (r14 - r31) */
     REST_NVGPRS(r1)

diff --git a/arch/powerpc/kvm/booke_interrupts.S b/arch/powerpc/kvm/booke_interrupts.S
index 10d8ef6..c8c4b87 100644
--- a/arch/powerpc/kvm/booke_interrupts.S
+++ b/arch/powerpc/kvm/booke_interrupts.S
@@ -34,7 +34,8 @@
 /* r2 is special: it holds 'current', and it made nonvolatile in the
  * kernel with the -ffixed-r2 gcc option. */
 #define HOST_R2         12
-#define HOST_NV_GPRS    16
+#define HOST_CR         16
+#define HOST_NV_GPRS    20
 #define HOST_NV_GPR(n)  (HOST_NV_GPRS + ((n - 14) * 4))
 #define HOST_MIN_STACK_SIZE (HOST_NV_GPR(31) + 4)
 #define HOST_STACK_SIZE (((HOST_MIN_STACK_SIZE + 15) / 16) * 16) /* Align. */
@@ -296,8 +297,10 @@ heavyweight_exit:

     /* Return to kvm_vcpu_run(). */
     lwz     r4, HOST_STACK_LR(r1)
+    lwz     r5, HOST_CR(r1)
     addi    r1, r1, HOST_STACK_SIZE
     mtlr    r4
+    mtcr    r5
     /* r3 still contains the return code from kvmppc_handle_exit(). */
     blr

@@ -314,6 +317,8 @@ _GLOBAL(__kvmppc_vcpu_run)
     stw     r3, HOST_RUN(r1)
     mflr    r3
     stw     r3, HOST_STACK_LR(r1)
+    mfcr    r5
+    stw     r5, HOST_CR(r1)

     /* Save host non-volatile register state to stack. */
     stw     r14, HOST_NV_GPR(r14)(r1)
diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index 63fc5f0..3989b5a 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -49,7 +49,8 @@
  * kernel with the -ffixed-r2 gcc option.
  */
 #define HOST_R2         (3 * LONGBYTES)
-#define HOST_NV_GPRS    (4 * LONGBYTES)
+#define HOST_CR         (4 * LONGBYTES)
+#define HOST_NV_GPRS    (5 * LONGBYTES)
 #define HOST_NV_GPR(n)  (HOST_NV_GPRS + ((n - 14) * LONGBYTES))
 #define HOST_MIN_STACK_SIZE (HOST_NV_GPR(31) + LONGBYTES)
 #define HOST_STACK_SIZE ((HOST_MIN_STACK_SIZE + 15) & ~15) /* Align. */
@@ -396,6 +397,7 @@ skip_nv_load:
 heavyweight_exit:
     /* Not returning to guest. */
     PPC_LL  r5, HOST_STACK_LR(r1)
+    lwz     r6, HOST_CR(r1)

     /*
      * We already saved guest volatile register state; now save the
@@ -442,6 +444,7 @@ heavyweight_exit:

     /* Return to kvm_vcpu_run(). */
     mtlr    r5
+    mtcr    r6
     addi    r1, r1, HOST_STACK_SIZE
     /* r3 still contains the return code from kvmppc_handle_exit(). */
     blr
@@ -459,6 +462,9 @@ _GLOBAL(__kvmppc_vcpu_run)
     mflr    r3
     PPC_STL r3, HOST_STACK_LR(r1)

+    mfcr    r5
+    stw     r5, HOST_CR(r1)
+
     /* Save host non-volatile register state to stack. */
     PPC_STL r14, HOST_NV_GPR(r14)(r1)
     PPC_STL r15, HOST_NV_GPR(r15)(r1)
--
1.6.0.2
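Adding the HOST_CR slot shifts the saved non-volatile GPRs up by one word and may grow the aligned frame. The arithmetic of the bookehv_interrupts.S macros after the patch can be checked with a small C sketch; the `toy_` function names are made up for illustration, and `long_bytes` plays the role of LONGBYTES (4 on 32-bit, 8 on 64-bit).

```c
#include <assert.h>
#include <stddef.h>

/* Offset of the new CR save slot: HOST_CR = 4 * LONGBYTES. */
static size_t toy_host_cr(size_t long_bytes)
{
    return 4 * long_bytes;
}

/* Offset of saved GPR n (r14..r31): HOST_NV_GPRS + (n - 14) * LONGBYTES,
 * with HOST_NV_GPRS = 5 * LONGBYTES after the patch. */
static size_t toy_host_nv_gpr(size_t long_bytes, unsigned n)
{
    assert(n >= 14 && n <= 31);
    return 5 * long_bytes + (n - 14) * long_bytes;
}

/* Frame size rounded up to a multiple of 16: (min + 15) & ~15. */
static size_t toy_host_stack_size(size_t long_bytes)
{
    size_t min_size = toy_host_nv_gpr(long_bytes, 31) + long_bytes;

    return (min_size + 15) & ~(size_t)15;
}
```

For LONGBYTES = 4 this gives HOST_CR = 16 and r14 saved at offset 20, matching the literal values in the booke_interrupts.S hunk.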
Re: [PATCH] KVM: PPC: check error return of kvmppc_core_vcpu_create first
On 02/21/2012 05:30 AM, Ben Collins wrote: The result of kvmppc_core_vcpu_create() was being manipulated before it was checked for IS_ERR(). Did not see the bug occur, but caught it when looking through the code. Nice catch, but this has already been fixed by Matt: commit c6f3830e7313eea47b526b597aadc5b18c69ad55 Author: Matt Evans m...@ozlabs.org Date: Tue Dec 6 21:19:42 2011 + KVM: PPC: Fix vcpu_create dereference before validity check. Fix usage of vcpu struct before check that it's actually valid. Signed-off-by: Matt Evans m...@ozlabs.org Signed-off-by: Alexander Graf ag...@suse.de Thanks a lot for sending the patch nevertheless! Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
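The bug class discussed here is touching a returned pointer before testing whether it encodes an error. Below is a userspace sketch of the error-pointer convention and the check-first ordering; the `toy_` helpers imitate the idea behind the kernel's ERR_PTR/IS_ERR/PTR_ERR but are written from scratch for illustration and are not the kernel definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer, and decode it again. */
static void *toy_err_ptr(long err) { return (void *)(intptr_t)err; }
static long toy_ptr_err(const void *p) { return (long)(intptr_t)p; }

/* Error pointers occupy the top TOY_MAX_ERRNO values of the address range. */
static int toy_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_vcpu { int id; int ready; };

/* Stand-in for the low-level create call: returns a vcpu or an error pointer. */
static struct toy_vcpu *toy_core_vcpu_create(int id)
{
    struct toy_vcpu *v;

    if (id < 0)
        return toy_err_ptr(-EINVAL);
    v = malloc(sizeof(*v));
    if (v == NULL)
        return toy_err_ptr(-ENOMEM);
    v->id = id;
    v->ready = 0;
    return v;
}

/* Check the result first, and only then touch the structure. */
static long toy_vcpu_create(int id, struct toy_vcpu **out)
{
    struct toy_vcpu *v = toy_core_vcpu_create(id);

    assert(out != NULL);
    if (toy_is_err(v))
        return toy_ptr_err(v);
    v->ready = 1;  /* safe: v is known to be a real pointer here */
    *out = v;
    return 0;
}
```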
Re: [PATCH 1/4] KVM: PPC: Book3S HV: Save and restore CR in __kvmppc_vcore_entry
On 02/03/2012 11:53 AM, Paul Mackerras wrote:
> The ABI specifies that CR fields CR2--CR4 are nonvolatile across function calls. Currently __kvmppc_vcore_entry doesn't save and restore the CR, leading to CR2--CR4 getting corrupted with guest values, possibly leading to incorrect behaviour in its caller. This adds instructions to save and restore CR at the points where we save and restore the nonvolatile GPRs.
>
> Signed-off-by: Paul Mackerras pau...@samba.org

Thanks, applied all to kvm-ppc-next. Please CC kvm@vger when you send patches. Failing to do so might mean the whole pull request gets blocked by Avi when it gets to him, because he doesn't read kvm-ppc@vger.

Alex
Re: [PATCH] KVM: PPC: Book3s: PR: Add SPAPR H_BULK_REMOVE support
On 01/31/2012 07:25 AM, Matt Evans wrote:
> SPAPR support includes various in-kernel hypercalls, improving performance by cutting out the exit to userspace. H_BULK_REMOVE is implemented in this patch.
>
> Signed-off-by: Matt Evans m...@ozlabs.org

Thanks, applied to kvm-ppc-next.

Alex
Emulating lwarx and stwcx instructions in PowerPc BOOKE e500
Hi,

I'm working on the powerpc booke architecture and my project requires me to remove read and write privileges on some pages. Because of this, any instruction accessing these pages traps, and I'm trying to emulate the behavior of these instructions. I've emulated the lwarx and stwcx instructions, but I think stwcx is not working correctly. The emulation I've written is below:

case OP_31_XOP_LWARX:
{
    ulong ret;
    ulong addr;
    int eh = inst & 0x0001;
    kvm_gva_to_hva(vcpu, ea, &addr);
    /* lwarx RT RA RB EH */
    if (eh == 0)
        __asm__ __volatile__("lwarx %0,0,%1,0; isync" : "=r" (ret) : "r" (addr));
    else
        __asm__ __volatile__("lwarx %0,0,%1,1; isync" : "=r" (ret) : "r" (addr));
    kvmppc_set_gpr(vcpu, rt, ret);
}
case OP_31_XOP_STWCX:
{
    ulong tmp;
    ulong addr;
    ulong data;
    kvm_gva_to_hva(vcpu, ea, &addr);
    kvmppc_read_guest(vcpu, ea, &data, sizeof(data));
    __asm__ __volatile__("stwcx. %1,0,%2; isync" : "=r" (tmp) : "r" (data), "r" (addr) : "memory");
}

Here the kvm_gva_to_hva function converts a guest effective address to a host virtual address.

void kvm_gva_to_hva(struct kvm_vcpu *vcpu, ulong ea, ulong *hva)
{
    gfn_t gfn;
    gpa_t gpa;
    int gtlb_index;
    int offset;
    ulong addr;
    struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);

    gtlb_index = kvmppc_mmu_itlb_index(vcpu, ea);
    gpa = kvmppc_mmu_xlate(vcpu, gtlb_index, ea);
    gfn = gpa >> PAGE_SHIFT;
    addr = (ulong)gfn_to_hva(vcpu_e500->vcpu.kvm, gfn);
    offset = offset_in_page(gpa);
    *hva = addr + offset;
    return;
}

The guest just hangs once it encounters a stwcx instruction. Does anybody have any idea why this is not working and what's wrong with the emulation code?

Also, I'm working on the linux-3.0-rc4 kernel.

Thanks in advance
Re: Emulating lwarx and stwcx instructions in PowerPc BOOKE e500
On 03/05/2012 02:37 PM, Aashish Mittal wrote:
> Hi,
>
> I'm working on the powerpc booke architecture and my project requires me to remove read and write privileges on some pages. Because of this, any instruction accessing these pages traps, and I'm trying to emulate the behavior of these instructions. I've emulated the lwarx and stwcx instructions, but I think stwcx is not working correctly. The emulation I've written is below:

What is it you're emulating that needs lwarx/stwcx to work?

> case OP_31_XOP_LWARX:
> {
>     ulong ret;
>     ulong addr;
>     int eh = inst & 0x0001;
>     kvm_gva_to_hva(vcpu, ea, &addr);
>     /* lwarx RT RA RB EH */
>     if (eh == 0)
>         __asm__ __volatile__("lwarx %0,0,%1,0; isync" : "=r" (ret) : "r" (addr));
>     else
>         __asm__ __volatile__("lwarx %0,0,%1,1; isync" : "=r" (ret) : "r" (addr));
>     kvmppc_set_gpr(vcpu, rt, ret);
> }
> case OP_31_XOP_STWCX:
> {
>     ulong tmp;
>     ulong addr;
>     ulong data;
>     kvm_gva_to_hva(vcpu, ea, &addr);
>     kvmppc_read_guest(vcpu, ea, &data, sizeof(data));
>     __asm__ __volatile__("stwcx. %1,0,%2; isync" : "=r" (tmp) : "r" (data), "r" (addr) : "memory");
> }
>
> Here the kvm_gva_to_hva function converts a guest effective address to a host virtual address.
>
> void kvm_gva_to_hva(struct kvm_vcpu *vcpu, ulong ea, ulong *hva)
> {
>     gfn_t gfn;
>     gpa_t gpa;
>     int gtlb_index;
>     int offset;
>     ulong addr;
>     struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
>
>     gtlb_index = kvmppc_mmu_itlb_index(vcpu, ea);
>     gpa = kvmppc_mmu_xlate(vcpu, gtlb_index, ea);
>     gfn = gpa >> PAGE_SHIFT;
>     addr = (ulong)gfn_to_hva(vcpu_e500->vcpu.kvm, gfn);
>     offset = offset_in_page(gpa);
>     *hva = addr + offset;
>     return;
> }
>
> The guest just hangs once it encounters a stwcx instruction. Does anybody have any idea why this is not working and what's wrong with the emulation code?

You're losing the reservation somewhere. Any lock or atomic operation along the emulation path will do this.

Even if this didn't happen by accident, we really don't want to leave a reservation when we return to the guest -- it could have belonged to a previously running guest operating on shared memory, for example.

Perhaps we should have a dummy stwcx on KVM guest entry code, similar to the one on interrupt return?

> Also, I'm working on the linux-3.0-rc4 kernel.

Why are you working on something other than the current code or a stable release?

-Scott
Re: [PATCH] KVM: PPC: Save/Restore CR over vcpu_run
On 03/05/2012 10:02 AM, Alexander Graf wrote:
> @@ -442,6 +444,7 @@ heavyweight_exit:
>      /* Return to kvm_vcpu_run(). */
>      mtlr    r5
> +    mtcr    r6
>      addi    r1, r1, HOST_STACK_SIZE
>      /* r3 still contains the return code from kvmppc_handle_exit(). */
>      blr
> @@ -459,6 +462,9 @@ _GLOBAL(__kvmppc_vcpu_run)
>      mflr    r3
>      PPC_STL r3, HOST_STACK_LR(r1)
>
> +    mfcr    r5
> +    stw     r5, HOST_CR(r1)

If you move the mfcr before the PPC_STL they should be able to run in parallel. Otherwise on e500mc mfcr will wait for PPC_STL to take its 3 cycles and then mfcr will take 5 cycles before the stw of HOST_CR.

Alternatively, consider using mcrf/mtocrf three times.

Similar issues in booke_interrupts.S (except we can't assume mtocrf exists there), but I'm less worried about that one as it still needs an optimization pass in general.

-Scott