Re: [Qemu-devel] Use getaddrinfo for migration

2012-03-05 Thread Amos Kong

On 02/03/12 18:41, Daniel P. Berrange wrote:

On Fri, Mar 02, 2012 at 02:25:36PM +0400, Michael Tokarev wrote:

Not a reply to the patch but a general observation.

I noticed that the tcp migration uses gethostbyname
(or getaddrinfo after this patch) from the main
thread - is it really the way to go?  Note that
DNS query which is done may block for a large amount
of time.  Is it really safe in this context?  Should
it resolve the name in a separate thread, allowing
guest to run while it is doing that?

This question is important for me because right now
I'm evaluating a network-connected block device driver
which should do failover, so it will have to resolve
alternative name(s) at runtime (especially since list
of available targets is dynamic).

From one point, _usually_, the delay there is very
small since it is unlikely you'll do migration or
failover overseas, so most likely you'll have the
answer from DNS handy.  But from another point, if
the DNS is malfunctioning right at that time (eg,
one of the two DNS resolvers is being rebooted),
the delay even from local DNS may be noticeable.


Yes, I think you are correct - QEMU should take care to ensure that
DNS resolution can not block the QEMU event loop thread.

There is the glibc extension (getaddrinfo_a) which does async DNS
resolution, but for the sake of portability it is probably better
to use a thread to do it.



I've prepared a V2 according to Kevin's comment,
https://github.com/kongove/qemu/commits/master

But I don't know how to handle the getaddrinfo issue:
which steps should be done by a thread?
Can anyone give a hint? Thanks.

== migrate steps ==
0. main_loop, qemu_iohandler_poll
1. get migration command from qemu monitor
2. parse host/port, get an address list by getaddrinfo()
3. connect server
4. check status and return to main_loop (step 0)

(VMstate data is transmitted in background)

main_loop_wait()
...
 \- do_migrate()
   \- tcp_start_outgoing_migration()
     \- tcp_client_start()
       \- parse_host_port_info()
         \- getaddrinfo()
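
For illustration only, here is a minimal sketch of how step 2 (the getaddrinfo() call) could be moved off the main loop thread. It assumes POSIX threads and a pipe whose read end the main loop would watch. The names resolve_req, resolve_worker and resolve_async are hypothetical and are not QEMU API; the connect (step 3) would then run from the fd handler once the pipe becomes readable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request object; none of these names exist in QEMU. */
struct resolve_req {
    const char *host;
    const char *port;
    int flags;              /* extra ai_flags, e.g. AI_NUMERICHOST */
    int notify_fd;          /* write end of a pipe watched by the main loop */
    int gai_err;            /* getaddrinfo() result, 0 on success */
    struct addrinfo *res;   /* owner frees with freeaddrinfo() */
};

/* Worker thread: the only place that is allowed to block on DNS. */
static void *resolve_worker(void *opaque)
{
    struct resolve_req *req = opaque;
    struct addrinfo hints;
    char done = 1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;        /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = req->flags;

    req->gai_err = getaddrinfo(req->host, req->port, &hints, &req->res);

    /* Wake the main loop; it reads the result after the pipe fires. */
    if (write(req->notify_fd, &done, 1) != 1 && req->gai_err == 0) {
        freeaddrinfo(req->res);
        req->res = NULL;
        req->gai_err = EAI_SYSTEM;
    }
    return NULL;
}

/* Start the lookup and return at once; 0 on success, else an errno value. */
static int resolve_async(struct resolve_req *req, pthread_t *tid)
{
    req->res = NULL;
    req->gai_err = 0;
    return pthread_create(tid, NULL, resolve_worker, req);
}
```

The caller keeps running its event loop after resolve_async() returns and only touches req->res once the notification byte has arrived, so no extra locking is needed in this simple one-shot case.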


--
Amos.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait

2012-03-05 Thread Paolo Bonzini
This is quite ugly.  Two threads, one running main_loop_wait and
one running qemu_aio_wait, can race with each other on running the
same iohandler.  The result is that an iohandler could run while the
underlying socket is not readable or writable, with possibly ill effects.

This shows as a failure to boot an IDE disk using the NBD device.
We can consider it a bug in NBD or in the main loop.  The patch fixes
this in main_loop_wait, which is always going to lose the race because
qemu_aio_wait runs select with the global lock held.

Reported-by: Laurent Vivier laur...@vivier.eu
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
Anthony, if you think this is too ugly tell me and I can
post an NBD fix too.

 main-loop.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/main-loop.c b/main-loop.c
index db23de0..3beccff 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -458,6 +458,13 @@ int main_loop_wait(int nonblocking)
 
     if (timeout > 0) {
         qemu_mutex_lock_iothread();
+
+        /* Poll again.  A qemu_aio_wait() on another thread
+         * could have made the fdsets stale.
+         */
+        tv.tv_sec = 0;
+        tv.tv_usec = 0;
+        ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv);
     }
 
     glib_select_poll(&rfds, &wfds, &xfds, (ret < 0));
-- 
1.7.7.6
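
The idea behind the extra select() call can be shown in isolation. The following is a small sketch, not QEMU code: fd_readable_now is a hypothetical helper that re-checks a single descriptor with a zero timeout, so a result computed before another thread consumed the data is not trusted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Re-check one descriptor for readability without blocking.
 * Returns 1 if it is readable right now, 0 if not, -1 on error.
 */
static int fd_readable_now(int fd)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };   /* zero timeout: poll and return */

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    return select(fd + 1, &rfds, NULL, NULL, &tv);
}
```

A descriptor that was reported readable before the lock was reacquired may report 0 here if another thread drained it in the meantime, which is exactly the stale-fdset case the patch guards against.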



Re: [Qemu-devel] [PATCH 3/4] net: split hostname and service by last colon

2012-03-05 Thread Kevin Wolf
On 02.03.2012 20:54, Laine Stump wrote:
 On 03/02/2012 05:35 AM, Kevin Wolf wrote:
 On 02.03.2012 10:58, Amos Kong wrote:
 On 02/03/12 11:38, Amos Kong wrote:
 --- a/net.c
 +++ b/net.c
 @@ -84,7 +84,7 @@ static int get_str_sep(char *buf, int buf_size,
 const char **pp, int sep)
   const char *p, *p1;
   int len;
   p = *pp;
 -p1 = strchr(p, sep);
 +p1 = strrchr(p, sep);
   if (!p1)
   return -1;
   len = p1 - p;
 And what if the port isn't specified? I think you would erroneously
 interpret the last part of the IP address as port.
 Hi Kevin, port must be specified in '-incoming' parameters and migrate 
 monitor cmd.

   qemu-kvm ... -incoming tcp:$host:$port
   (qemu) migrate -d tcp:$host:$port


 If the guest is booted with a wrong command line, qemu will report an error message.

 # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -boot n -incoming 
 tcp:2312::8272 -monitor stdio
 qemu-system-x86_64: qemu: getaddrinfo: Name or service not known
 tcp_server_start: Invalid argument
 Migration failed. Exit code tcp:2312::8272(-22), exiting.
 Which is because 2312: isn't a valid IP address, right? But what if you
 have something like 2312::1234:8272? If you misinterpret the 8272 as a
 port number, the remaining address is still a valid IPv6 address.
 
 This is made irrelevant by PATCH 4/4, which allows for the IP address to
 be placed inside brackets:
 
[2312::8272]:port
 
 (at least it's irrelevant if your documentation *requires* brackets for
 all numeric ipv6-address:port pairs, which is strongly recommended by
 RFC 5952). It really is impossible to disambiguate the meaning of the
 final : unless you require these brackets (or 1) require full
 specification of all potential colons in the IPv6 address or 2) require
 that the port *always* be specified, neither of which seem acceptable to
 me).

Here you're actually explaining why it's not irrelevant. You don't want
to enforce port numbers, so 2312::1234:8272 must be interpreted as an
IPv6 address without a port. This code however would take 8272 as the
port and 2312::1234 as the IPv6 address, which is not what you expected
(even after brackets are allowed - they don't make a difference because
the example doesn't use brackets).

Kevin
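
To make the ambiguity discussed above concrete, here is a minimal sketch of a splitter that accepts both "host:port" and "[v6addr]:port". It is not the code from the patch series; split_host_port is a hypothetical name. It assumes the port is always present, as in the migration case, and falls back to the last colon when there are no brackets.

```c
#include <assert.h>
#include <string.h>

/*
 * Split "host:port" or "[v6addr]:port" into host and port strings.
 * The host may be empty. Returns 0 on success, -1 if the port is
 * missing, the brackets are malformed, or a buffer is too small.
 */
static int split_host_port(const char *str, char *host, size_t host_size,
                           char *port, size_t port_size)
{
    const char *host_start = str, *host_end, *colon;
    size_t host_len;

    if (str[0] == '[') {
        /* Bracketed form: everything up to ']' is the address. */
        host_start = str + 1;
        host_end = strchr(host_start, ']');
        if (!host_end || host_end[1] != ':') {
            return -1;
        }
        colon = host_end + 1;
    } else {
        /* Unbracketed form: the last colon separates host and port. */
        colon = strrchr(str, ':');
        if (!colon) {
            return -1;
        }
        host_end = colon;
    }

    host_len = (size_t)(host_end - host_start);
    if (host_len >= host_size ||
        colon[1] == '\0' || strlen(colon + 1) >= port_size) {
        return -1;
    }
    memcpy(host, host_start, host_len);
    host[host_len] = '\0';
    strcpy(port, colon + 1);
    return 0;
}
```

With this sketch, "2312::1234:8272" yields host "2312::1234" and port "8272", which is only the intended result if a port is mandatory; "[2312::1234]:8272" gives the same split without any guessing.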


Re: [Qemu-devel] [PATCH 3/4] net: split hostname and service by last colon

2012-03-05 Thread Amos Kong
In the migration context, host and port are both required, so it's right to
parse 8272 as the port.
However, for IPv6, brackets must be mandatory if you require a port.


BTW, the DNS delay issue already existed in the past (with gethostbyname());
it should be fixed by a separate patchset.
I will post my V2 (without a fix for the DNS delay) later.

 


Re: [Qemu-devel] [PATCH 3/4] net: split hostname and service by last colon

2012-03-05 Thread Kevin Wolf
On 05.03.2012 09:59, Amos Kong wrote:
 In the migration context, host/port are all necessary, so it's right to parse 
 8272 to a port.
 However, for IPv6 brackets must be mandatory if you require a port.

Makes sense.

 BTW, the DNS delay issue existed in the past (gethostbyname()), it should be 
 fixed by another patchset.
 I will post my V2 (without fix of DNS delay) later.

Yes, I agree.

Kevin


Re: [RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait

2012-03-05 Thread Jan Kiszka
On 2012-03-05 09:34, Paolo Bonzini wrote:
 This is quite ugly.  Two threads, one running main_loop_wait and
 one running qemu_aio_wait, can race with each other on running the
 same iohandler.  The result is that an iohandler could run while the
 underlying socket is not readable or writable, with possibly ill effects.

Hmm, isn't it a problem already that a socket is polled by two threads
at the same time? Can't that be avoided?

Long-term, I'd like to cut out certain file descriptors from the main
loop and process them completely in separate threads (for separate
locking, prioritization etc.). Dunno how NBD works, but maybe it should
be reworked like this already.

Jan

 
-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait

2012-03-05 Thread Paolo Bonzini
On 05/03/2012 10:07, Jan Kiszka wrote:
  This is quite ugly.  Two threads, one running main_loop_wait and
  one running qemu_aio_wait, can race with each other on running the
  same iohandler.  The result is that an iohandler could run while the
  underlying socket is not readable or writable, with possibly ill effects.
 
 Hmm, isn't it a problem already that a socket is polled by two threads
 at the same time? Can't that be avoided?

We still have synchronous I/O in the device models.  That's the root
cause of the bug, I suppose.

 Long-term, I'd like to cut out certain file descriptors from the main
 loop and process them completely in separate threads (for separate
 locking, prioritization etc.). Dunno how NBD works, but maybe it should
 be reworked like this already.

Me too, I even made a very simple proof of concept a couple of weeks ago
(search for a thread switching the block layer from coroutines to
threads).  It worked, though it is obviously not upstreamable in any way.

In that world order EventNotifiers would replace
qemu_aio_set_fd_handler, and socket-based protocols such as NBD would
run with blocking I/O in their own thread.  In addition to one thread
per I/O request (from a thread pool), there would be one arbiter thread
that reads replies and dispatches them to the appropriate I/O request
thread.  The arbiter thread replaces the read callback in
qemu_aio_set_fd_handler.

The problem is, even though it worked, making this thread-safe is
another story.  I suspect that in practice it is very difficult to do
without resurrecting RCU patches.

Paolo


Re: [PATCH 13/38] KVM: PPC: booke: category E.HV (GS-mode) support

2012-03-05 Thread tiejun.chen
 +/*
 + * Host interrupt handlers may have clobbered these guest-readable
 + * SPRGs, so we need to reload them here with the guest's values.
 + */
 +lwz r3, VCPU_VRSAVE(r4)
 +lwz r5, VCPU_SHARED_SPRG4(r11)
 +mtspr   SPRN_VRSAVE, r3
 +lwz r6, VCPU_SHARED_SPRG5(r11)
 +mtspr   SPRN_SPRG4W, r5
 +lwz r7, VCPU_SHARED_SPRG6(r11)
 +mtspr   SPRN_SPRG5W, r6
 +lwz r8, VCPU_SHARED_SPRG7(r11)
 +mtspr   SPRN_SPRG6W, r7
 +mtspr   SPRN_SPRG7W, r8
 +

That should be here.

 +/* Load some guest volatiles. */
 +PPC_LL  r3, VCPU_LR(r4)
 +PPC_LL  r5, VCPU_XER(r4)
 +PPC_LL  r6, VCPU_CTR(r4)
 +PPC_LL  r7, VCPU_CR(r4)
 +PPC_LL  r8, VCPU_PC(r4)
 +#ifndef CONFIG_64BIT
 +lwz r9, (VCPU_SHARED_MSR + 4)(r11)
 +#else
 +ld  r9, (VCPU_SHARED_MSR)(r11)
 +#endif
 +PPC_LL  r0, VCPU_GPR(r0)(r4)
 +PPC_LL  r1, VCPU_GPR(r1)(r4)
 +PPC_LL  r2, VCPU_GPR(r2)(r4)
 +PPC_LL  r10, VCPU_GPR(r10)(r4)
 +PPC_LL  r11, VCPU_GPR(r11)(r4)
 +PPC_LL  r12, VCPU_GPR(r12)(r4)
 +PPC_LL  r13, VCPU_GPR(r13)(r4)
 +mtlr    r3
 +mtxer   r5
 +mtctr   r6
 +mtcr    r7
 +mtsrr0  r8
 +mtsrr1  r9
 +
 +#ifdef CONFIG_KVM_EXIT_TIMING
 +/* save enter time */
 +1:
 +mfspr   r6, SPRN_TBRU
 +mfspr   r7, SPRN_TBRL
 +mfspr   r8, SPRN_TBRU
 +cmpw    r8, r6
 
 Shouldn't we save the guest CR after this? Otherwise this can corrupt it.

I think this should be a typo since in our previous kvm implementation, we
always did collect kvm exit timing at the above location :)

Tiejun

 
 Thanks
 -Bharat
 
 +PPC_STL r7, VCPU_TIMING_LAST_ENTER_TBL(r4)
 +bne 1b  
 +PPC_STL r8, VCPU_TIMING_LAST_ENTER_TBU(r4)
 +#endif
 +
 +/* Finish loading guest volatiles and jump to guest. */
 +PPC_LL  r5, VCPU_GPR(r5)(r4)
 +PPC_LL  r6, VCPU_GPR(r6)(r4)
 +PPC_LL  r7, VCPU_GPR(r7)(r4)
 +PPC_LL  r8, VCPU_GPR(r8)(r4)
 +PPC_LL  r9, VCPU_GPR(r9)(r4)
 +
 +PPC_LL  r3, VCPU_GPR(r3)(r4)
 +PPC_LL  r4, VCPU_GPR(r4)(r4)
 +rfi


[PATCH 1/1 v3] PCI: Device specific reset function

2012-03-05 Thread Tadeusz Struk

---
 drivers/pci/pci.h|1 +
 drivers/pci/quirks.c |   33 +++--
 include/linux/pci.h  |1 +
 3 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 1009a5e..4d10479 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -315,6 +315,7 @@ struct pci_dev_reset_methods {
u16 vendor;
u16 device;
int (*reset)(struct pci_dev *dev, int probe);
+   struct list_head list;
 };
 
 #ifdef CONFIG_PCI_QUIRKS
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6476547..f423d2f 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3070,26 +3070,47 @@ static int reset_intel_82599_sfp_virtfn(struct pci_dev 
*dev, int probe)
 }
 
 #define PCI_DEVICE_ID_INTEL_82599_SFP_VF   0x10ed
-
-static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
+static struct pci_dev_reset_methods pci_dev_reset_methods[] = {
{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82599_SFP_VF,
-reset_intel_82599_sfp_virtfn },
+   reset_intel_82599_sfp_virtfn },
{ PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
reset_intel_generic_dev },
-   { 0 }
 };
 
+static LIST_HEAD(reset_list);
+
+void pci_dev_specific_reset_add(struct pci_dev_reset_methods *reset_method)
+{
+       INIT_LIST_HEAD(&reset_method->list);
+       list_add(&reset_method->list, &reset_list);
+}
+
+static int __init pci_dev_specific_reset_init(void)
+{
+       int i;
+
+       for (i = 0; i < ARRAY_SIZE(pci_dev_reset_methods); i++) {
+               pci_dev_specific_reset_add(&pci_dev_reset_methods[i]);
+       }
+       return 0;
+}
+
+late_initcall(pci_dev_specific_reset_init);
+
 int pci_dev_specific_reset(struct pci_dev *dev, int probe)
 {
        const struct pci_dev_reset_methods *i;
+       struct pci_driver *drv = dev->driver;
+
+       if (drv && drv->reset)
+               return drv->reset(dev, probe);
 
-       for (i = pci_dev_reset_methods; i->reset; i++) {
+       list_for_each_entry(i, &reset_list, list) {
                if ((i->vendor == dev->vendor ||
                     i->vendor == (u16)PCI_ANY_ID) &&
                    (i->device == dev->device ||
                     i->device == (u16)PCI_ANY_ID))
                        return i->reset(dev, probe);
        }
-
        return -ENOTTY;
 }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index a16b1df..a3a0bc5 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -560,6 +560,7 @@ struct pci_driver {
int  (*resume_early) (struct pci_dev *dev);
int  (*resume) (struct pci_dev *dev);   /* Device woken 
up */
void (*shutdown) (struct pci_dev *dev);
+   int  (*reset) (struct pci_dev *dev, int probe); /* Device specific 
reset */
struct pci_error_handlers *err_handler;
struct device_driverdriver;
struct pci_dynids dynids;
-- 
1.7.7.6

--
Intel Shannon Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263
Business address: Dromore House, East Park, Shannon, Co. Clare

This e-mail and any attachments may contain confidential material for the sole 
use of the intended recipient(s). Any review or distribution by others is 
strictly prohibited. If you are not the intended recipient, please contact the 
sender and delete all copies.




[PATCH 0/1] PCI: Device specific reset function

2012-03-05 Thread Tadeusz Struk
Hi,
I have a use case where I need to clean up resources allocated for Virtual
Functions after a guest OS that used them has crashed. This cleanup needs to
be done before the VF is FLRed. The only possible way to do this
seems to be the pci_dev_specific_reset() function. Unfortunately,
this function only works for devices defined in a static table in the
drivers/pci/quirks.c file. This patch changes that so that a device-specific
reset handler is part of the pci_driver struct.
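
As a rough user-space model of the lookup order described above (driver hook first, then the quirk table with wildcard matching), consider the sketch below. It is not kernel code: struct dev, struct reset_method, ANY_ID, the example handlers and the table contents are all hypothetical stand-ins, and the handlers' return values only identify which handler ran.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_ID 0xffffu   /* stand-in for (u16)PCI_ANY_ID */

struct dev;
typedef int (*reset_fn)(struct dev *dev, int probe);

struct dev {
    uint16_t vendor, device;
    reset_fn driver_reset;   /* models the proposed pci_driver reset hook */
};

struct reset_method {
    uint16_t vendor, device;
    reset_fn reset;
};

/* Example handlers; the return value only tells the caller which one ran. */
static int example_vf_reset(struct dev *d, int probe)      { (void)d; (void)probe; return 1; }
static int example_driver_reset(struct dev *d, int probe)  { (void)d; (void)probe; return 2; }
static int example_generic_reset(struct dev *d, int probe) { (void)d; (void)probe; return 3; }

/* Specific entries come before wildcard entries, as in the quirk table. */
static const struct reset_method example_methods[] = {
    { 0x8086, 0x10ed, example_vf_reset },
    { 0x8086, ANY_ID, example_generic_reset },
};

static int dev_specific_reset(struct dev *dev, int probe,
                              const struct reset_method *tbl, size_t n)
{
    size_t i;

    /* A handler supplied by the bound driver takes precedence. */
    if (dev->driver_reset) {
        return dev->driver_reset(dev, probe);
    }
    for (i = 0; i < n; i++) {
        int vendor_ok = tbl[i].vendor == dev->vendor || tbl[i].vendor == ANY_ID;
        int device_ok = tbl[i].device == dev->device || tbl[i].device == ANY_ID;

        if (vendor_ok && device_ok) {
            return tbl[i].reset(dev, probe);
        }
    }
    return -ENOTTY;   /* no device-specific method known */
}
```

The first matching entry wins, so the order of the table matters in the same way as it does for the static array in quirks.c.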


 drivers/pci/pci.h|1 +
 drivers/pci/quirks.c |   33 +++--
 include/linux/pci.h  |1 +
 3 files changed, 29 insertions(+), 6 deletions(-)



[PATCH v2 0/9] support to migrate with IPv6 address

2012-03-05 Thread Amos Kong
These patches make migration over IPv6 addresses work. The old code
only supports parsing an IPv4 address/port; this series uses getaddrinfo()
to get the socket address information.
The last two patches are about splitting the IPv6 host/port.

Changes from v1:
- split the different changes into small patches, which are
  easier to review
- fixed some problems according to Kevin's comments
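
For readers unfamiliar with getaddrinfo(), the following is a minimal sketch of a family-independent lookup of the kind the series moves to. It is not taken from the patches; resolve_tcp and first_family are hypothetical helper names, and AI_NUMERICHOST is used in the helper only so that the example never touches DNS.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Resolve host/port for a TCP stream socket, IPv4 or IPv6.
 * Returns 0 and stores a list in *res (free it with freeaddrinfo()),
 * or a getaddrinfo() error code usable with gai_strerror().
 */
static int resolve_tcp(const char *host, const char *port, int extra_flags,
                       struct addrinfo **res)
{
    struct addrinfo hints;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;       /* accept AF_INET and AF_INET6 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = extra_flags;
    return getaddrinfo(host, port, &hints, res);
}

/* Address family of the first result for a numeric host, or -1 on error. */
static int first_family(const char *host, const char *port)
{
    struct addrinfo *res;
    int family;

    if (resolve_tcp(host, port, AI_NUMERICHOST, &res) != 0) {
        return -1;
    }
    family = res->ai_family;
    freeaddrinfo(res);
    return family;
}
```

A caller would normally walk the whole ai_next list and try socket()/connect() or bind() on each entry, using ai_family, ai_addr and ai_addrlen instead of a hard-coded struct sockaddr_in.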

---

Amos Kong (9):
  net: introduce tcp_server_start()
  net: use tcp_server_start() for tcp server creation
  net: introduce tcp_client_start()
  net: use tcp_client_start for tcp client creation
  net: refector tcp_*_start functions
  net: use getaddrinfo() in tcp_start_common
  net: introduce parse_host_port_info()
  net: split hostname and service by last colon
  net: support to include ipv6 address by brackets


 migration-tcp.c |   62 +++--
 net.c   |  137 +++
 net/socket.c|   64 ++
 qemu_socket.h   |3 +
 4 files changed, 171 insertions(+), 95 deletions(-)

-- 
Amos Kong


[PATCH v2 1/9] net: introduce tcp_server_start()

2012-03-05 Thread Amos Kong
Introduce tcp_server_start() by moving original code in
tcp_start_incoming_migration().

Signed-off-by: Amos Kong ak...@redhat.com
---
 net.c |   27 +++
 qemu_socket.h |2 ++
 2 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/net.c b/net.c
index c34474f..0260968 100644
--- a/net.c
+++ b/net.c
@@ -99,6 +99,33 @@ static int get_str_sep(char *buf, int buf_size, const char 
**pp, int sep)
 return 0;
 }
 
+int tcp_server_start(const char *str, int *fd)
+{
+    int val, ret;
+    struct sockaddr_in saddr;
+
+    if (parse_host_port(&saddr, str) < 0) {
+        return -1;
+    }
+
+    *fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
+    if (*fd < 0) {
+        perror("socket");
+        return -1;
+    }
+    socket_set_nonblock(*fd);
+
+    /* allow fast reuse */
+    val = 1;
+    setsockopt(*fd, SOL_SOCKET, SO_REUSEADDR, (const char *)&val, sizeof(val));
+
+    ret = bind(*fd, (struct sockaddr *)&saddr, sizeof(saddr));
+    if (ret < 0) {
+        closesocket(*fd);
+    }
+    return ret;
+}
+
 int parse_host_port(struct sockaddr_in *saddr, const char *str)
 {
 char buf[512];
diff --git a/qemu_socket.h b/qemu_socket.h
index fe4cf6c..d612793 100644
--- a/qemu_socket.h
+++ b/qemu_socket.h
@@ -54,6 +54,8 @@ int unix_listen(const char *path, char *ostr, int olen);
 int unix_connect_opts(QemuOpts *opts);
 int unix_connect(const char *path);
 
+int tcp_server_start(const char *str, int *fd);
+
 /* Old, ipv4 only bits.  Don't use for new code. */
 int parse_host_port(struct sockaddr_in *saddr, const char *str);
 int socket_init(void);



[PATCH v2 2/9] net: use tcp_server_start() for tcp server creation

2012-03-05 Thread Amos Kong
Use tcp_server_start() in these two functions:
 tcp_start_incoming_migration()
 net_socket_listen_init()

Signed-off-by: Amos Kong ak...@redhat.com
---
 migration-tcp.c |   21 +
 net/socket.c|   23 +++
 2 files changed, 8 insertions(+), 36 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index 35a5781..ecadd10 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -157,28 +157,17 @@ out2:
 
 int tcp_start_incoming_migration(const char *host_port)
 {
-    struct sockaddr_in addr;
-    int val;
+    int ret;
     int s;
 
     DPRINTF("Attempting to start an incoming migration\n");
 
-    if (parse_host_port(&addr, host_port) < 0) {
-        fprintf(stderr, "invalid host/port combination: %s\n", host_port);
-        return -EINVAL;
-    }
-
-    s = qemu_socket(PF_INET, SOCK_STREAM, 0);
-    if (s == -1) {
-        return -socket_error();
+    ret = tcp_server_start(host_port, &s);
+    if (ret < 0) {
+        fprintf(stderr, "tcp_server_start: %s\n", strerror(-ret));
+        return ret;
     }
 
-    val = 1;
-    setsockopt(s, SOL_SOCKET, SO_REUSEADDR, (const char *)&val, sizeof(val));
-
-    if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
-        goto err;
-    }
     if (listen(s, 1) == -1) {
         goto err;
     }
diff --git a/net/socket.c b/net/socket.c
index 0bcf229..5feb3d2 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -403,31 +403,14 @@ static int net_socket_listen_init(VLANState *vlan,
   const char *host_str)
 {
     NetSocketListenState *s;
-    int fd, val, ret;
-    struct sockaddr_in saddr;
-
-    if (parse_host_port(&saddr, host_str) < 0)
-        return -1;
+    int fd, ret;
 
     s = g_malloc0(sizeof(NetSocketListenState));
 
-    fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
-    if (fd < 0) {
-        perror("socket");
-        g_free(s);
-        return -1;
-    }
-    socket_set_nonblock(fd);
-
-    /* allow fast reuse */
-    val = 1;
-    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, (const char *)&val, sizeof(val));
-
-    ret = bind(fd, (struct sockaddr *)&saddr, sizeof(saddr));
+    ret = tcp_server_start(host_str, &fd);
     if (ret < 0) {
-        perror("bind");
+        error_report("tcp_server_start: %s", strerror(-ret));
         g_free(s);
-        closesocket(fd);
         return -1;
     }
     ret = listen(fd, 0);



[PATCH v2 3/9] net: introduce tcp_client_start()

2012-03-05 Thread Amos Kong
Introduce tcp_client_start() by moving original code in
tcp_start_outgoing_migration().

Signed-off-by: Amos Kong ak...@redhat.com
---
 net.c |   39 +++
 qemu_socket.h |1 +
 2 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/net.c b/net.c
index 0260968..5c20e22 100644
--- a/net.c
+++ b/net.c
@@ -126,6 +126,45 @@ int tcp_server_start(const char *str, int *fd)
 return ret;
 }
 
+int tcp_client_start(const char *str, int *fd)
+{
+    struct sockaddr_in saddr;
+    int ret;
+
+    if (parse_host_port(&saddr, str) < 0) {
+        return -EINVAL;
+    }
+
+    *fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
+    if (*fd < 0) {
+        perror("socket");
+        return -1;
+    }
+    socket_set_nonblock(*fd);
+
+    for (;;) {
+        ret = connect(*fd, (struct sockaddr *)&saddr, sizeof(saddr));
+        if (ret < 0) {
+            ret = -socket_error();
+            if (ret == -EINPROGRESS) {
+                break;
+#ifdef _WIN32
+            } else if (ret == -WSAEALREADY || ret == -WSAEINVAL) {
+                break;
+#endif
+            } else if (ret != -EINTR && ret != -EWOULDBLOCK) {
+                perror("connect");
+                closesocket(*fd);
+                return -1;
+            }
+        } else {
+            break;
+        }
+    }
+
+    return ret;
+}
+
 int parse_host_port(struct sockaddr_in *saddr, const char *str)
 {
 char buf[512];
diff --git a/qemu_socket.h b/qemu_socket.h
index d612793..9246578 100644
--- a/qemu_socket.h
+++ b/qemu_socket.h
@@ -55,6 +55,7 @@ int unix_connect_opts(QemuOpts *opts);
 int unix_connect(const char *path);
 
 int tcp_server_start(const char *str, int *fd);
+int tcp_client_start(const char *str, int *fd);
 
 /* Old, ipv4 only bits.  Don't use for new code. */
 int parse_host_port(struct sockaddr_in *saddr, const char *str);
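
For comparison, here is a stand-alone POSIX sketch of the non-blocking connect pattern that tcp_client_start() wraps. It is not the QEMU code: connect_nonblock and make_loopback are hypothetical names, the Windows cases are left out, and EINTR is treated like EINPROGRESS because POSIX lets an interrupted connect() complete asynchronously.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill in an IPv4 loopback address; port is given in host byte order. */
static void make_loopback(struct sockaddr_in *a, uint16_t port)
{
    memset(a, 0, sizeof(*a));
    a->sin_family = AF_INET;
    a->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a->sin_port = htons(port);
}

/*
 * Start a non-blocking TCP connect.
 * Returns the socket on success and sets *in_progress to 1 if the
 * connection is still being established; returns -errno on failure.
 */
static int connect_nonblock(const struct sockaddr *addr, socklen_t len,
                            int *in_progress)
{
    int fd, flags, err;

    fd = socket(addr->sa_family, SOCK_STREAM, 0);
    if (fd < 0) {
        return -errno;
    }
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        err = errno;
        close(fd);
        return -err;
    }
    if (connect(fd, addr, len) == 0) {
        *in_progress = 0;           /* connected immediately */
        return fd;
    }
    if (errno == EINPROGRESS || errno == EINTR) {
        *in_progress = 1;           /* completion is reported later */
        return fd;
    }
    err = errno;
    close(fd);
    return -err;
}
```

When *in_progress is set, the caller waits for the socket to become writable and then reads SO_ERROR with getsockopt() to learn whether the connection succeeded, which is the role of tcp_wait_for_connect() in the migration code.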



[PATCH v2 4/9] net: use tcp_client_start for tcp client creation

2012-03-05 Thread Amos Kong
Use tcp_client_start() in these two functions:
 tcp_start_outgoing_migration()
 net_socket_connect_init()

Signed-off-by: Amos Kong ak...@redhat.com
---
 migration-tcp.c |   41 +
 net/socket.c|   41 +++--
 2 files changed, 24 insertions(+), 58 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index ecadd10..4f89bff 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -81,43 +81,28 @@ static void tcp_wait_for_connect(void *opaque)
 
 int tcp_start_outgoing_migration(MigrationState *s, const char *host_port)
 {
-    struct sockaddr_in addr;
     int ret;
-
-    ret = parse_host_port(&addr, host_port);
-    if (ret < 0) {
-        return ret;
-    }
+    int fd;
 
     s->get_error = socket_errno;
     s->write = socket_write;
     s->close = tcp_close;
 
-    s->fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
-    if (s->fd == -1) {
-        DPRINTF("Unable to open socket");
-        return -socket_error();
-    }
-
-    socket_set_nonblock(s->fd);
-
-    do {
-        ret = connect(s->fd, (struct sockaddr *)&addr, sizeof(addr));
-        if (ret == -1) {
-            ret = -socket_error();
-        }
-        if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
-            qemu_set_fd_handler2(s->fd, NULL, NULL, tcp_wait_for_connect, s);
-            return 0;
-        }
-    } while (ret == -EINTR);
-
-    if (ret < 0) {
+    ret = tcp_client_start(host_port, &fd);
+    s->fd = fd;
+    if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
+        DPRINTF("connect in progress");
+        qemu_set_fd_handler2(s->fd, NULL, NULL, tcp_wait_for_connect, s);
+    } else if (ret < 0) {
         DPRINTF("connect failed\n");
-        migrate_fd_error(s);
+        if (ret != -EINVAL) {
+            migrate_fd_error(s);
+        }
         return ret;
+    } else {
+        migrate_fd_connect(s);
     }
-    migrate_fd_connect(s);
+
     return 0;
 }
 
diff --git a/net/socket.c b/net/socket.c
index 5feb3d2..b7cd8ec 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -434,41 +434,22 @@ static int net_socket_connect_init(VLANState *vlan,
                                    const char *host_str)
 {
     NetSocketState *s;
-    int fd, connected, ret, err;
+    int fd, connected, ret;
     struct sockaddr_in saddr;
 
-    if (parse_host_port(&saddr, host_str) < 0)
-        return -1;
-
-    fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
-    if (fd < 0) {
-        perror("socket");
-        return -1;
-    }
-    socket_set_nonblock(fd);
-
-    connected = 0;
-    for(;;) {
-        ret = connect(fd, (struct sockaddr *)&saddr, sizeof(saddr));
-        if (ret < 0) {
-            err = socket_error();
-            if (err == EINTR || err == EWOULDBLOCK) {
-            } else if (err == EINPROGRESS) {
-                break;
+    ret = tcp_client_start(host_str, &fd);
+    if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
+        connected = 0;
 #ifdef _WIN32
-            } else if (err == WSAEALREADY || err == WSAEINVAL) {
-                break;
+    } else if (ret == -WSAEALREADY || ret == -WSAEINVAL) {
+        connected = 0;
 #endif
-            } else {
-                perror("connect");
-                closesocket(fd);
-                return -1;
-            }
-        } else {
-            connected = 1;
-            break;
-        }
+    } else if (ret < 0) {
+        return -1;
+    } else {
+        connected = 1;
     }
+
     s = net_socket_fd_init(vlan, model, name, fd, connected);
     if (!s)
         return -1;
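
Both callers in the patch above branch on the same three outcomes of tcp_client_start(): connected at once, connect still in progress, or a hard failure. A minimal sketch of that dispatch follows, assuming the function returns 0 or a negative errno value. The enum and function names are hypothetical and are not part of QEMU.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names, for illustration only. */
enum connect_state {
    CONNECT_DONE,        /* connect() completed immediately */
    CONNECT_PENDING,     /* non-blocking connect still in progress */
    CONNECT_FAILED       /* hard error; the caller should report it */
};

/* Map a tcp_client_start()-style return value (0 or a negative errno)
 * onto the three cases the callers in the patch distinguish. */
static enum connect_state classify_connect_result(int ret)
{
    if (ret >= 0) {
        return CONNECT_DONE;
    }
    if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
        return CONNECT_PENDING;
    }
    return CONNECT_FAILED;
}
```

In the pending case a caller would register a write handler on the socket and finish the connection later, as tcp_start_outgoing_migration() does with tcp_wait_for_connect.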

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 5/9] net: refactor tcp_*_start functions

2012-03-05 Thread Amos Kong
tcp_server_start() and tcp_client_start() contain some duplicated code;
move the shared part into tcp_start_common().

Signed-off-by: Amos Kong <ak...@redhat.com>
---
 net.c |   82 -
 1 files changed, 46 insertions(+), 36 deletions(-)

diff --git a/net.c b/net.c
index 5c20e22..da2a8d4 100644
--- a/net.c
+++ b/net.c
@@ -99,37 +99,41 @@ static int get_str_sep(char *buf, int buf_size, const char **pp, int sep)
     return 0;
 }
 
-int tcp_server_start(const char *str, int *fd)
+static int tcp_server_bind(int fd, struct sockaddr_in *saddr)
 {
-    int val, ret;
-    struct sockaddr_in saddr;
+    int ret;
+    int val = 1;
 
-    if (parse_host_port(&saddr, str) < 0) {
-        return -1;
-    }
+    /* allow fast reuse */
+    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, (const char *)&val, sizeof(val));
 
-    *fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
-    if (fd < 0) {
-        perror("socket");
-        return -1;
+    ret = bind(fd, (struct sockaddr *)saddr, sizeof(*saddr));
+
+    if (ret == -1) {
+        ret = -socket_error();
     }
-    socket_set_nonblock(*fd);
+    return ret;
 
-    /* allow fast reuse */
-    val = 1;
-    setsockopt(*fd, SOL_SOCKET, SO_REUSEADDR, (const char *)&val, sizeof(val));
+}
+
+static int tcp_client_connect(int fd, struct sockaddr_in *saddr)
+{
+    int ret;
+
+    do {
+        ret = connect(fd, (struct sockaddr *)saddr, sizeof(*saddr));
+        if (ret == -1) {
+            ret = -socket_error();
+        }
+    } while (ret == -EINTR || ret == -EWOULDBLOCK);
 
-    ret = bind(*fd, (struct sockaddr *)&saddr, sizeof(saddr));
-    if (ret < 0) {
-        closesocket(*fd);
-    }
     return ret;
 }
 
-int tcp_client_start(const char *str, int *fd)
+static int tcp_start_common(const char *str, int *fd, bool server)
 {
+    int ret = -EINVAL;
     struct sockaddr_in saddr;
-    int ret;
 
     if (parse_host_port(&saddr, str) < 0) {
         return -EINVAL;
@@ -142,29 +146,35 @@ int tcp_client_start(const char *str, int *fd)
     }
     socket_set_nonblock(*fd);
 
-    for (;;) {
-        ret = connect(*fd, (struct sockaddr *)&saddr, sizeof(saddr));
-        if (ret < 0) {
-            ret = -socket_error();
-            if (ret == -EINPROGRESS) {
-                break;
+    if (server) {
+        ret = tcp_server_bind(*fd, &saddr);
+    } else {
+        ret = tcp_client_connect(*fd, &saddr);
+    }
+
 #ifdef _WIN32
-            } else if (ret == -WSAEALREADY || ret == -WSAEINVAL) {
-                break;
+    if (ret == -WSAEALREADY || ret == -WSAEINVAL) {
+        return ret;  /* Success */
+    }
 #endif
-            } else if (ret != -EINTR && ret != -EWOULDBLOCK) {
-                perror("connect");
-                closesocket(*fd);
-                return -1;
-            }
-        } else {
-            break;
-        }
+    if (ret >= 0 || ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
+        return ret;  /* Success */
     }
 
+    closesocket(*fd);
     return ret;
 }
 
+int tcp_server_start(const char *str, int *fd)
+{
+    return tcp_start_common(str, fd, true);
+}
+
+int tcp_client_start(const char *str, int *fd)
+{
+    return tcp_start_common(str, fd, false);
+}
+
 int parse_host_port(struct sockaddr_in *saddr, const char *str)
 {
     char buf[512];
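
The helpers introduced by this patch turn a failed bind()/connect() into a negative errno value rather than a bare -1, so callers can still see why the call failed. A minimal POSIX-only sketch of that convention is shown below. It uses errno directly where QEMU uses socket_error(), it retries only on EINTR, and the helper name connect_neg_errno is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: call connect() and return 0 on success or a negative
 * errno on failure, retrying only when a signal interrupted the call. */
static int connect_neg_errno(int fd, const struct sockaddr *addr,
                             socklen_t addrlen)
{
    int ret;

    do {
        ret = connect(fd, addr, addrlen);
        if (ret == -1) {
            ret = -errno;   /* keep the real cause instead of a bare -1 */
        }
    } while (ret == -EINTR);

    return ret;
}
```

A caller can then pass the negated result to strerror() when reporting the failure, which only works if the error code has not been collapsed to -1 on the way up.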



[PATCH v2 6/9] net: use getaddrinfo() in tcp_start_common

2012-03-05 Thread Amos Kong
Migration to an IPv6 address currently fails: gethostbyname()/inet_aton()
cannot translate an IPv6 address and port, so use getaddrinfo() in
tcp_start_common() to translate the network address and service.
getaddrinfo() returns a list of addresses.

Userlevel IPv6 Programming Introduction:
http://www.akkadia.org/drepper/userapi-ipv6.html

See also RFC 3493, Basic Socket Interface Extensions for IPv6.

Signed-off-by: Amos Kong <ak...@redhat.com>
---
 net.c |   81 -
 1 files changed, 60 insertions(+), 21 deletions(-)

diff --git a/net.c b/net.c
index da2a8d4..de1db8c 100644
--- a/net.c
+++ b/net.c
@@ -99,7 +99,7 @@ static int get_str_sep(char *buf, int buf_size, const char **pp, int sep)
     return 0;
 }
 
-static int tcp_server_bind(int fd, struct sockaddr_in *saddr)
+static int tcp_server_bind(int fd, struct addrinfo *rp)
 {
     int ret;
     int val = 1;
@@ -107,7 +107,7 @@ static int tcp_server_bind(int fd, struct sockaddr_in *saddr)
     /* allow fast reuse */
     setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, (const char *)&val, sizeof(val));
 
-    ret = bind(fd, (struct sockaddr *)saddr, sizeof(*saddr));
+    ret = bind(fd, rp->ai_addr, rp->ai_addrlen);
 
     if (ret == -1) {
         ret = -socket_error();
@@ -116,12 +116,12 @@ static int tcp_server_bind(int fd, struct sockaddr_in *saddr)
 
 }
 
-static int tcp_client_connect(int fd, struct sockaddr_in *saddr)
+static int tcp_client_connect(int fd, struct addrinfo *rp)
 {
     int ret;
 
     do {
-        ret = connect(fd, (struct sockaddr *)saddr, sizeof(*saddr));
+        ret = connect(fd, rp->ai_addr, rp->ai_addrlen);
         if (ret == -1) {
             ret = -socket_error();
         }
@@ -132,36 +132,75 @@ static int tcp_client_connect(int fd, struct sockaddr_in *saddr)
 
 static int tcp_start_common(const char *str, int *fd, bool server)
 {
+    char hostname[512];
+    const char *service;
+    const char *name;
+    struct addrinfo hints;
+    struct addrinfo *result, *rp;
+    int s;
+    int sfd;
     int ret = -EINVAL;
-    struct sockaddr_in saddr;
 
-    if (parse_host_port(&saddr, str) < 0) {
+    *fd = -1;
+    service = str;
+
+    if (get_str_sep(hostname, sizeof(hostname), &service, ':') < 0) {
+        error_report("invalid host/port combination: %s", str);
         return -EINVAL;
     }
-
-    *fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
-    if (fd < 0) {
-        perror("socket");
-        return -1;
+    if (server && strlen(hostname) == 0) {
+        name = NULL;
+    } else {
+        name = hostname;
     }
-    socket_set_nonblock(*fd);
+
+    /* Obtain address(es) matching host/port */
+
+    memset(&hints, 0, sizeof(struct addrinfo));
+    hints.ai_family = AF_UNSPEC;     /* Allow IPv4 or IPv6 */
+    hints.ai_socktype = SOCK_STREAM; /* Stream socket */
 
     if (server) {
-        ret = tcp_server_bind(*fd, &saddr);
-    } else {
-        ret = tcp_client_connect(*fd, &saddr);
+        hints.ai_flags = AI_PASSIVE;
     }
 
-#ifdef _WIN32
-    if (ret == -WSAEALREADY || ret == -WSAEINVAL) {
-        return ret;  /* Success */
+    s = getaddrinfo(name, service, &hints, &result);
+    if (s != 0) {
+        error_report("qemu: getaddrinfo: %s", gai_strerror(s));
+        return -EINVAL;
     }
+
+    /* getaddrinfo() returns a list of address structures.
+       Try each address until we successfully bind/connect).
+       If socket(2) (or bind/connect(2)) fails, we (close the socket
+       and) try the next address. */
+
+    for (rp = result; rp != NULL; rp = rp->ai_next) {
+        sfd = qemu_socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
+        if (sfd == -1) {
+            ret = -errno;
+            continue;
+        }
+        socket_set_nonblock(sfd);
+        if (server) {
+            ret = tcp_server_bind(sfd, rp);
+        } else {
+            ret = tcp_client_connect(sfd, rp);
+        }
+#ifdef _WIN32
+        if (ret == -WSAEALREADY || ret == -WSAEINVAL) {
+            *fd = sfd;
+            break;  /* Success */
+        }
 #endif
-    if (ret >= 0 || ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
-        return ret;  /* Success */
+        if (ret >= 0 || ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
+            *fd = sfd;
+            break;  /* Success */
+        }
+        closesocket(sfd);
     }
 
-    closesocket(*fd);
+    freeaddrinfo(result);
     return ret;
 }
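
The hints setup used in this patch can be tried in isolation. The sketch below resolves a numeric address with getaddrinfo() and reports the address family of the first result; AF_UNSPEC lets the same call handle IPv4 and IPv6. It passes AI_NUMERICHOST and AI_NUMERICSERV so that no DNS query is made, which the patch itself does not do. The helper name first_family is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve a numeric host and port and return the
 * address family of the first result, or -1 if resolution fails.
 * AI_NUMERICHOST/AI_NUMERICSERV keep the lookup local (no DNS query). */
static int first_family(const char *host, const char *port, int server)
{
    struct addrinfo hints;
    struct addrinfo *result;
    int family;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;       /* allow IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;   /* TCP */
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;
    if (server) {
        hints.ai_flags |= AI_PASSIVE;  /* address is meant for bind() */
    }

    if (getaddrinfo(host, port, &hints, &result) != 0) {
        return -1;
    }
    family = result->ai_family;        /* a real caller would walk ai_next */
    freeaddrinfo(result);
    return family;
}
```

A real caller walks the whole list through ai_next, as tcp_start_common() does, because the first address is not guaranteed to be reachable.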
 



[PATCH v2 7/9] net: introduce parse_host_port_info()

2012-03-05 Thread Amos Kong
int parse_host_port(struct sockaddr_in *saddr, const char *str)
The parsed address is stored in 'saddr'; this function supports IPv4 only.
It is used by net_socket_mcast_init() and net_socket_udp_init().

int parse_host_port_info(struct addrinfo **result, const char *str, bool server)
The parsed addresses are stored in 'result' as an address list.
It can be used to parse an IPv6 address/port.

Signed-off-by: Amos Kong <ak...@redhat.com>
---
 net.c |   26 --
 1 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/net.c b/net.c
index de1db8c..2518e5f 100644
--- a/net.c
+++ b/net.c
@@ -130,18 +130,15 @@ static int tcp_client_connect(int fd, struct addrinfo *rp)
     return ret;
 }
 
-static int tcp_start_common(const char *str, int *fd, bool server)
+static int parse_host_port_info(struct addrinfo **result, const char *str,
+                                bool server)
 {
     char hostname[512];
     const char *service;
     const char *name;
     struct addrinfo hints;
-    struct addrinfo *result, *rp;
     int s;
-    int sfd;
-    int ret = -EINVAL;
 
-    *fd = -1;
     service = str;
 
     if (get_str_sep(hostname, sizeof(hostname), &service, ':') < 0) {
@@ -164,12 +161,29 @@ static int tcp_start_common(const char *str, int *fd, bool server)
         hints.ai_flags = AI_PASSIVE;
     }
 
-    s = getaddrinfo(name, service, &hints, &result);
+    s = getaddrinfo(name, service, &hints, result);
     if (s != 0) {
         error_report("qemu: getaddrinfo: %s", gai_strerror(s));
         return -EINVAL;
     }
 
+    return 0;
+}
+
+static int tcp_start_common(const char *str, int *fd, bool server)
+{
+    struct addrinfo *rp;
+    int sfd;
+    int ret = -EINVAL;
+    struct addrinfo *result;
+
+    *fd = -1;
+
+    ret = parse_host_port_info(&result, str, server);
+    if (ret < 0) {
+        return -EINVAL;
+    }
+
     /* getaddrinfo() returns a list of address structures.
        Try each address until we successfully bind/connect).
        If socket(2) (or bind/connect(2)) fails, we (close the socket



[PATCH v2 8/9] net: split hostname and service by last colon

2012-03-05 Thread Amos Kong
An IPv6 address contains colons, so splitting host and port at the first
colon parses it incorrectly. Split at the last colon instead, for example:

[2312::8274]:5200

Signed-off-by: Amos Kong <ak...@redhat.com>
---
 net.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net.c b/net.c
index 2518e5f..d6ce1fa 100644
--- a/net.c
+++ b/net.c
@@ -84,7 +84,7 @@ static int get_str_sep(char *buf, int buf_size, const char **pp, int sep)
     const char *p, *p1;
     int len;
     p = *pp;
-    p1 = strchr(p, sep);
+    p1 = strrchr(p, sep);
     if (!p1)
         return -1;
     len = p1 - p;



[PATCH v2 9/9] net: support to include ipv6 address by brackets

2012-03-05 Thread Amos Kong
Writing an IPv6 address and a port without brackets is discouraged
because the result is ambiguous. According to RFC 5952, the
recommended format is:

 [2312::8274]:5200

For IPv6, brackets are mandatory when a port is given.

Test status: succeeded
listen side: qemu-kvm  -incoming tcp:[2312::8274]:5200
client side: qemu-kvm ...
 (qemu) migrate -d tcp:[2312::8274]:5200

Signed-off-by: Amos Kong <ak...@redhat.com>
---
 net.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/net.c b/net.c
index d6ce1fa..499ed1d 100644
--- a/net.c
+++ b/net.c
@@ -88,6 +88,12 @@ static int get_str_sep(char *buf, int buf_size, const char **pp, int sep)
     if (!p1)
         return -1;
     len = p1 - p;
+    /* remove the brackets that enclose the hostname */
+    if (*p == '[' && *(p1-1) == ']') {
+        p += 1;
+        len -= 2;
+    }
+
     p1++;
     if (buf_size > 0) {
         if (len > buf_size - 1)
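
The last two patches combine into one parsing rule: split at the last colon, then strip one pair of enclosing brackets from the host part. A self-contained sketch of that rule follows. It is not the QEMU get_str_sep() function: the name split_host_port and its signature are hypothetical, and unlike get_str_sep() it rejects a host that does not fit in the buffer instead of truncating it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split "host:port" at the LAST colon and strip one pair
 * of enclosing brackets from the host. Returns 0 on success, -1 if there is
 * no colon or the host does not fit in 'host'. '*port' points into 'str'. */
static int split_host_port(const char *str, char *host, size_t host_size,
                           const char **port)
{
    const char *colon = strrchr(str, ':');
    const char *start = str;
    size_t len;

    if (colon == NULL) {
        return -1;
    }
    len = (size_t)(colon - str);
    if (len >= 2 && start[0] == '[' && colon[-1] == ']') {
        start += 1;     /* skip '[' */
        len -= 2;       /* drop '[' and ']' */
    }
    if (len >= host_size) {
        return -1;
    }
    memcpy(host, start, len);
    host[len] = '\0';
    *port = colon + 1;
    return 0;
}
```

With this rule "[2312::8274]:5200" yields host "2312::8274" and port "5200", while an unbracketed "2312::8274" would be split into "2312:" and "8274", which is why the brackets are required when a port is given.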



KVM call agenda for Tuesday 6th

2012-03-05 Thread Juan Quintela

Hi

Please send in any agenda items you are interested in covering.

Cheers, Juan.


[PATCH] Restore guest CR after exit timing calculation

2012-03-05 Thread Bharat Bhushan
No instruction that can change the Condition Register (CR) should be
executed after the guest CR is loaded. The exit timing code in
lightweight_exit executes cmpw, which can clobber CR, so restore the
guest CR after it.

Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
---
This patch is against e500mc branch.

 arch/powerpc/kvm/bookehv_interrupts.S |   11 ---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index 63fc5f0..6b9389f 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -574,7 +574,6 @@ lightweight_exit:
        mtlr    r3
        mtxer   r5
        mtctr   r6
-       mtcr    r7
        mtsrr0  r8
        mtsrr1  r9
 
@@ -582,14 +581,20 @@ lightweight_exit:
        /* save enter time */
 1:
        mfspr   r6, SPRN_TBRU
-       mfspr   r7, SPRN_TBRL
+       mfspr   r9, SPRN_TBRL
        mfspr   r8, SPRN_TBRU
        cmpw    r8, r6
-       PPC_STL r7, VCPU_TIMING_LAST_ENTER_TBL(r4)
+       PPC_STL r9, VCPU_TIMING_LAST_ENTER_TBL(r4)
        bne     1b
        PPC_STL r8, VCPU_TIMING_LAST_ENTER_TBU(r4)
 #endif
 
+       /*
+        * Don't execute any instruction which can change CR after
+        * below instruction.
+        */
+       mtcr    r7
+
        /* Finish loading guest volatiles and jump to guest. */
        PPC_LL  r5, VCPU_GPR(r5)(r4)
        PPC_LL  r6, VCPU_GPR(r6)(r4)
-- 
1.7.0.4




[PATCH] KVM: Ensure all vcpus are consistent with in-kernel irqchip settings

2012-03-05 Thread Avi Kivity
If some vcpus are created before KVM_CREATE_IRQCHIP, then
irqchip_in_kernel() and vcpu->arch.apic will be inconsistent, leading
to potential NULL pointer dereferences.

Fix by:
- ensuring that no vcpus are installed when KVM_CREATE_IRQCHIP is called
- ensuring that a vcpu has an apic if it is installed after KVM_CREATE_IRQCHIP

This is somewhat long-winded because vcpu->arch.apic is created without
kvm->lock held.

Based on earlier patch by Michael Ellerman.

Signed-off-by: Michael Ellerman <mich...@ellerman.id.au>
Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/ia64/kvm/kvm-ia64.c |5 +
 arch/x86/kvm/x86.c   |8 
 include/linux/kvm_host.h |7 +++
 virt/kvm/kvm_main.c  |4 
 4 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index d8ddbba..f5104b7 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1172,6 +1172,11 @@ static enum hrtimer_restart hlt_timer_fn(struct hrtimer *data)
 
 #define PALE_RESET_ENTRY    0x8000ffb0UL
 
+bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu)
+{
+       return irqchip_in_kernel(vcpu->kvm) == (vcpu->arch.apic != NULL);
+}
+
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
        struct kvm_vcpu *v;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3ee008f..be9594a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3199,6 +3199,9 @@ long kvm_arch_vm_ioctl(struct file *filp,
                r = -EEXIST;
                if (kvm->arch.vpic)
                        goto create_irqchip_unlock;
+               r = -EINVAL;
+               if (atomic_read(&kvm->online_vcpus))
+                       goto create_irqchip_unlock;
                r = -ENOMEM;
                vpic = kvm_create_pic(kvm);
                if (vpic) {
@@ -6107,6 +6110,11 @@ void kvm_arch_check_processor_compat(void *rtn)
        kvm_x86_ops->check_processor_compatibility(rtn);
 }
 
+bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu)
+{
+       return irqchip_in_kernel(vcpu->kvm) == (vcpu->arch.apic != NULL);
+}
+
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
        struct page *page;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 355e445..759fa26 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -805,6 +805,13 @@ static inline bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu)
 {
        return vcpu->kvm->bsp_vcpu_id == vcpu->vcpu_id;
 }
+
+bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu);
+
+#else
+
+static bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu) { return true; }
+
 #endif
 
 #ifdef __KVM_HAVE_DEVICE_ASSIGNMENT
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e4431ad..94e148e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1651,6 +1651,10 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
                goto vcpu_destroy;
 
        mutex_lock(&kvm->lock);
+       if (!kvm_vcpu_compatible(vcpu)) {
+               r = -EINVAL;
+               goto unlock_vcpu_destroy;
+       }
        if (atomic_read(&kvm->online_vcpus) == KVM_MAX_VCPUS) {
                r = -EINVAL;
                goto unlock_vcpu_destroy;
-- 
1.7.9
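
The two checks in this patch enforce one invariant: a vcpu has an APIC exactly when the VM has an in-kernel irqchip. The user-space model below is only a sketch of that logic. The toy_* structures and functions are hypothetical stand-ins, not the kernel's types, and locking is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; not the real layout. */
struct toy_vm {
    bool irqchip_in_kernel;   /* set once the irqchip has been created */
    int online_vcpus;
};

struct toy_vcpu {
    struct toy_vm *vm;
    void *apic;               /* non-NULL only if built with an in-kernel APIC */
};

/* The invariant from the patch: a vcpu has an APIC if and only if the VM
 * has an in-kernel irqchip. */
static bool toy_vcpu_compatible(const struct toy_vcpu *vcpu)
{
    return vcpu->vm->irqchip_in_kernel == (vcpu->apic != NULL);
}

/* Creating the irqchip is refused once any vcpu is online. */
static int toy_create_irqchip(struct toy_vm *vm)
{
    if (vm->online_vcpus > 0) {
        return -EINVAL;
    }
    vm->irqchip_in_kernel = true;
    return 0;
}

/* Installing a vcpu is refused if it was built under a different setting. */
static int toy_install_vcpu(struct toy_vcpu *vcpu)
{
    if (!toy_vcpu_compatible(vcpu)) {
        return -EINVAL;
    }
    vcpu->vm->online_vcpus++;
    return 0;
}
```

The second check matters because, as the commit message notes, the APIC is created without the VM lock held, so a vcpu can be built before the irqchip exists and installed after it.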



Re: [Qemu-devel] [PATCH v2 1/9] net: introduce tcp_server_start()

2012-03-05 Thread Orit Wasserman
On 03/05/2012 12:03 PM, Amos Kong wrote:
> Introduce tcp_server_start() by moving original code in
> tcp_start_incoming_migration().
>
> Signed-off-by: Amos Kong <ak...@redhat.com>
> ---
>  net.c |   27 +++
>  qemu_socket.h |2 ++
>  2 files changed, 29 insertions(+), 0 deletions(-)
>
> diff --git a/net.c b/net.c
> index c34474f..0260968 100644
> --- a/net.c
> +++ b/net.c
> @@ -99,6 +99,33 @@ static int get_str_sep(char *buf, int buf_size, const char **pp, int sep)
>      return 0;
>  }
>
> +int tcp_server_start(const char *str, int *fd)
> +{
> +    int val, ret;
> +    struct sockaddr_in saddr;
> +
> +    if (parse_host_port(&saddr, str) < 0) {

An error message would be nice here.

> +        return -1;
> +    }
> +
> +    *fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
> +    if (fd < 0) {
> +        perror("socket");
> +        return -1;
> +    }

This is actually the net_socket_listen_init() version;
tcp_start_incoming_migration() returns the error -socket_error().
I prefer not to lose the errno.

I know that, for some unknown reason, the call to net_socket_listen_init()
is followed by an explicit check for -1:
    if (net_socket_listen_init(vlan, "socket", name, listen) == -1) {
I think this is a good opportunity to change that check.

Orit
> +    socket_set_nonblock(*fd);
> +
> +    /* allow fast reuse */
> +    val = 1;
> +    setsockopt(*fd, SOL_SOCKET, SO_REUSEADDR, (const char *)&val, sizeof(val));
> +
> +    ret = bind(*fd, (struct sockaddr *)&saddr, sizeof(saddr));
> +    if (ret < 0) {
> +        closesocket(*fd);
> +    }
> +    return ret;
> +}
> +
>  int parse_host_port(struct sockaddr_in *saddr, const char *str)
>  {
>      char buf[512];
> diff --git a/qemu_socket.h b/qemu_socket.h
> index fe4cf6c..d612793 100644
> --- a/qemu_socket.h
> +++ b/qemu_socket.h
> @@ -54,6 +54,8 @@ int unix_listen(const char *path, char *ostr, int olen);
>  int unix_connect_opts(QemuOpts *opts);
>  int unix_connect(const char *path);
>
> +int tcp_server_start(const char *str, int *fd);
> +
>  /* Old, ipv4 only bits.  Don't use for new code. */
>  int parse_host_port(struct sockaddr_in *saddr, const char *str);
>  int socket_init(void);
>
>



Re: [Qemu-devel] [PATCH v2 3/9] net: introduce tcp_client_start()

2012-03-05 Thread Orit Wasserman
On 03/05/2012 12:03 PM, Amos Kong wrote:
> Introduce tcp_client_start() by moving original code in
> tcp_start_outgoing_migration().
>
> Signed-off-by: Amos Kong <ak...@redhat.com>
> ---
>  net.c |   39 +++
>  qemu_socket.h |1 +
>  2 files changed, 40 insertions(+), 0 deletions(-)
>
> diff --git a/net.c b/net.c
> index 0260968..5c20e22 100644
> --- a/net.c
> +++ b/net.c
> @@ -126,6 +126,45 @@ int tcp_server_start(const char *str, int *fd)
>      return ret;
>  }
>
> +int tcp_client_start(const char *str, int *fd)
> +{
> +    struct sockaddr_in saddr;
> +    int ret;
> +
> +    if (parse_host_port(&saddr, str) < 0) {
> +        return -EINVAL;
> +    }
> +
> +    *fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
> +    if (fd < 0) {
> +        perror("socket");
> +        return -1;
> +    }
> +    socket_set_nonblock(*fd);
> +
> +    for (;;) {
> +        ret = connect(*fd, (struct sockaddr *)&saddr, sizeof(saddr));
> +        if (ret < 0) {
> +            ret = -socket_error();
> +            if (ret == -EINPROGRESS) {
> +                break;
> +#ifdef _WIN32
> +            } else if (ret == -WSAEALREADY || ret == -WSAEINVAL) {
> +                break;
> +#endif
> +            } else if (ret != -EINTR && ret != -EWOULDBLOCK) {
> +                perror("connect");
> +                closesocket(*fd);
> +                return -1;

I think it should be: return ret (otherwise you lose the error code).
And you need it.

Orit
> +            }
> +        } else {
> +            break;
> +        }
> +    }
> +
> +    return ret;
> +}
> +
>  int parse_host_port(struct sockaddr_in *saddr, const char *str)
>  {
>      char buf[512];
> diff --git a/qemu_socket.h b/qemu_socket.h
> index d612793..9246578 100644
> --- a/qemu_socket.h
> +++ b/qemu_socket.h
> @@ -55,6 +55,7 @@ int unix_connect_opts(QemuOpts *opts);
>  int unix_connect(const char *path);
>
>  int tcp_server_start(const char *str, int *fd);
> +int tcp_client_start(const char *str, int *fd);
>
>  /* Old, ipv4 only bits.  Don't use for new code. */
>  int parse_host_port(struct sockaddr_in *saddr, const char *str);
>
>



Re: [Qemu-devel] [PATCH v2 2/9] net: use tcp_server_start() for tcp server creation

2012-03-05 Thread Orit Wasserman
On 03/05/2012 12:03 PM, Amos Kong wrote:
> Use tcp_server_start in those two functions:
>  tcp_start_incoming_migration()
>  net_socket_listen_init()
>
> Signed-off-by: Amos Kong <ak...@redhat.com>
> ---
>  migration-tcp.c |   21 +
>  net/socket.c|   23 +++
>  2 files changed, 8 insertions(+), 36 deletions(-)
>
> diff --git a/migration-tcp.c b/migration-tcp.c
> index 35a5781..ecadd10 100644
> --- a/migration-tcp.c
> +++ b/migration-tcp.c
> @@ -157,28 +157,17 @@ out2:
>
>  int tcp_start_incoming_migration(const char *host_port)
>  {
> -    struct sockaddr_in addr;
> -    int val;
> +    int ret;
>      int s;
>
>      DPRINTF("Attempting to start an incoming migration\n");
>
> -    if (parse_host_port(&addr, host_port) < 0) {
> -        fprintf(stderr, "invalid host/port combination: %s\n", host_port);
> -        return -EINVAL;
> -    }
> -
> -    s = qemu_socket(PF_INET, SOCK_STREAM, 0);
> -    if (s == -1) {
> -        return -socket_error();
> +    ret = tcp_server_start(host_port, &s);
> +    if (ret < 0) {
> +        fprintf(stderr, "tcp_server_start: %s\n", strerror(-ret));
> +        return ret;
>      }
>
> -    val = 1;
> -    setsockopt(s, SOL_SOCKET, SO_REUSEADDR, (const char *)&val, sizeof(val));
> -
> -    if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
> -        goto err;
> -    }
>      if (listen(s, 1) == -1) {
>          goto err;
>      }
> diff --git a/net/socket.c b/net/socket.c
> index 0bcf229..5feb3d2 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -403,31 +403,14 @@ static int net_socket_listen_init(VLANState *vlan,
>                                    const char *host_str)
>  {
>      NetSocketListenState *s;
> -    int fd, val, ret;
> -    struct sockaddr_in saddr;
> -
> -    if (parse_host_port(&saddr, host_str) < 0)
> -        return -1;
> +    int fd, ret;
>
>      s = g_malloc0(sizeof(NetSocketListenState));
>
> -    fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
> -    if (fd < 0) {
> -        perror("socket");
> -        g_free(s);
> -        return -1;
> -    }
> -    socket_set_nonblock(fd);
> -
> -    /* allow fast reuse */
> -    val = 1;
> -    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, (const char *)&val, sizeof(val));
> -
> -    ret = bind(fd, (struct sockaddr *)&saddr, sizeof(saddr));
> +    ret = tcp_server_start(host_str, &fd);
>      if (ret < 0) {
> -        perror("bind");
> +        error_report("tcp_server_start: %s", strerror(-ret));

If the return value is always -1, this has no meaning.

Orit
>          g_free(s);
> -        closesocket(fd);
>          return -1;
>      }
>      ret = listen(fd, 0);
>
>



Re: [Qemu-devel] [PATCH v2 3/9] net: introduce tcp_client_start()

2012-03-05 Thread Orit Wasserman
On 03/05/2012 12:03 PM, Amos Kong wrote:
> Introduce tcp_client_start() by moving original code in
> tcp_start_outgoing_migration().
>
> Signed-off-by: Amos Kong <ak...@redhat.com>
> ---
>  net.c |   39 +++
>  qemu_socket.h |1 +
>  2 files changed, 40 insertions(+), 0 deletions(-)
>
> diff --git a/net.c b/net.c
> index 0260968..5c20e22 100644
> --- a/net.c
> +++ b/net.c
> @@ -126,6 +126,45 @@ int tcp_server_start(const char *str, int *fd)
>      return ret;
>  }
>
> +int tcp_client_start(const char *str, int *fd)
> +{
> +    struct sockaddr_in saddr;
> +    int ret;
> +
> +    if (parse_host_port(&saddr, str) < 0) {
> +        return -EINVAL;

You use this return value to decide when to call migrate_fd_error(). That is
problematic, because another error can return the same error code.
I think that setting *fd = -1 at the beginning of the function would be enough.

Orit
> +    }
> +
> +    *fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
> +    if (fd < 0) {
> +        perror("socket");
> +        return -1;
> +    }
> +    socket_set_nonblock(*fd);
> +
> +    for (;;) {
> +        ret = connect(*fd, (struct sockaddr *)&saddr, sizeof(saddr));
> +        if (ret < 0) {
> +            ret = -socket_error();
> +            if (ret == -EINPROGRESS) {
> +                break;
> +#ifdef _WIN32
> +            } else if (ret == -WSAEALREADY || ret == -WSAEINVAL) {
> +                break;
> +#endif
> +            } else if (ret != -EINTR && ret != -EWOULDBLOCK) {
> +                perror("connect");
> +                closesocket(*fd);
> +                return -1;

This should be: return ret;

> +            }
> +        } else {
> +            break;
> +        }
> +    }
> +
> +    return ret;
> +}
> +
>  int parse_host_port(struct sockaddr_in *saddr, const char *str)
>  {
>      char buf[512];
> diff --git a/qemu_socket.h b/qemu_socket.h
> index d612793..9246578 100644
> --- a/qemu_socket.h
> +++ b/qemu_socket.h
> @@ -55,6 +55,7 @@ int unix_connect_opts(QemuOpts *opts);
>  int unix_connect(const char *path);
>
>  int tcp_server_start(const char *str, int *fd);
> +int tcp_client_start(const char *str, int *fd);
>
>  /* Old, ipv4 only bits.  Don't use for new code. */
>  int parse_host_port(struct sockaddr_in *saddr, const char *str);
>
>



Re: [Qemu-devel] KVM call agenda for tuesday 31

2012-03-05 Thread Igor Mitsyanko

On 02/21/2012 07:33 PM, Peter Maydell wrote:

On 9 February 2012 22:23, Peter Maydell <peter.mayd...@linaro.org> wrote:

Ping re the VMState and variable sized arrays issue. I don't
see any consensus in this discussion for a different approach,
so should we just commit Mitsyanko's patchset?


 From an IRC conversation I just had with Anthony and Juan:
===begin==
14:51  pm215  quintela: do you have an opinion on the vmstate variable-
length array stuff (needed for sd card) ?
14:51  quintela  pm havent looked, email title?
14:52  pm215  KVM call agenda for tuesday 31 is the most recent
discussion :-)
14:53  pm215  http://patchwork.ozlabs.org/patch/137732/ and
  http://patchwork.ozlabs.org/patch/137733/ are the relevant patches
14:54  quintela  pm215: found it, that it is a difficult thing to do (TM)
14:54  quintela  it should be on the card file, or whatever :-(
14:55  quintela  notice the should part.
14:55  pm215  I'm not sure what you mean, can you elaborate?
14:57  quintela  pm215: sect is number of sectors, right?
14:58  pm215  yes
14:59  quintela  so, a 1GB card would have around 8MB array?
14:59  quintela  took or left some byties here andthere.
14:59  quintela  bytes indeed.
14:59  quintela  I _think_ that we should put that in a save_live
  section, but that is just me (TM)
15:00  quintela  I guess that at some point, people are going to need
  bigger SD cards (16GB are already on the wild, right)
15:00  quintela  and that would make live migration just impossible
15:00  quintela  or very slow, that is completely equivalent.
15:01  quintela  it is my understanding that AHCI is using similar code,
  or did I missread some of the information?
15:03  pm215  I think alex would like ahci to use a similar 'variable
  length array' thing, but in that case the array is much smaller
15:03  aliguori  pm215, it's large but of bounded size, right?
15:03  aliguori  for SD cards?
15:03  quintela  aliguori: number of sectors * 4 bytes
15:03  quintela  aliguori: so hugee
15:04  quintela  8MB array for each 1GB of disk.
15:04  quintela  but or take some bytes.
15:04  aliguori  quintela, you cannot save that much data via savevm
15:04  quintela  this is more than all other devices combined in a
  normal instalaltion.
15:04  aliguori  that's too much
15:04  quintela  aliguori: see my answer, we need a save_live section,
  really.
15:04  aliguori  it will screw up the live migration downtime algorithm
15:04  aliguori  pm215, ^
15:04  aliguori  or just treat it as a ram section
15:05  aliguori  qemu_ram_alloc() it, and call it a day
15:05  pm215  the spec isn't very clear, but I think technically this
  info should go in the sd card image, except there's no way to tack
  additional info into a raw file
15:05  aliguori  pm215, yeah, qemu_ram_alloc() is the way to go I think,
  that makes it effectively volatile on-card RAM
15:05  quintela  pm215: I fully agree that it should go into the card
  image, but . no space for it :-(
15:05  aliguori  which i think makes sense
15:05  quintela  pm215: another thing, forgetting about migration at all.
15:06  quintela  how does this work if you stop marchine and restart
  it again?
15:06  * quintela guess that it is stored somewhere?
15:06  quintela  s/marchine/machine/
15:06  pm215  no, we just assume that any fresh sd card image has no
  write-protect set up
15:07  quintela  pm215: what is stored on that image? /me would have
  assumed that wearing information
15:07  quintela  but that is without reading the whole code.
15:09  quintela  humm, it looks like only 1 bit is used for each sector,
  why are we storing 32 bits if we only use 1 bit?
15:09  pm215  it's write-protect : you can set parts of the sd card to
  not be writeable (with a granularity of a write-protect group size)
15:09  pm215  we probably don't implement fantastically efficiently
15:10  quintela  pm215: ok, only 1 bit is needed.
15:10  quintela  we can move to 1bit/sector (8 times smaller)
15:10  quintela  but I still think that doing the qemu_ram_alloc()
  trick that aliguori pointed is the easiest way to fix it
15:11  quintela  you can create a ram_save_live section, but it is
  going to be more complex for almost no gain
15:11  pm215  ah, so we qemu_ram_alloc() it and then the contents get
  transferred in the same way as main memory ?
15:12  pm215  ...that is in exec-obsolete.h and marked as to be
  removed soon...
15:13  aliguori  pm215, yeah, so you'll need to create it using
  whatever the new fancy memory api interface is
15:13  aliguori  pm215, note that whenever you touch that memory, you
  have to set the dirty bits appropriately
15:13  aliguori  or else live migration won't work
15:14  quintela  aliguori: if they have to touch the dirty bits, it is
  equivalent to do their own save_live section.
15:14  quintela  but I agree that this is the only easy solution.
15:17  pm215  doesn't sound too hard...
15:18  quintela  as usual with vmstate, problem is testing (althought
  shouldn't be 

Re: [Qemu-devel] [PATCH v2 4/9] net: use tcp_client_start for tcp client creation

2012-03-05 Thread Orit Wasserman
On 03/05/2012 12:03 PM, Amos Kong wrote:
 Use tcp_client_start() in those two functions:
  tcp_start_outgoing_migration()
  net_socket_connect_init()
 
 Signed-off-by: Amos Kong ak...@redhat.com
 ---
  migration-tcp.c |   41 +
  net/socket.c|   41 +++--
  2 files changed, 24 insertions(+), 58 deletions(-)
 
 diff --git a/migration-tcp.c b/migration-tcp.c
 index ecadd10..4f89bff 100644
 --- a/migration-tcp.c
 +++ b/migration-tcp.c
 @@ -81,43 +81,28 @@ static void tcp_wait_for_connect(void *opaque)
  
  int tcp_start_outgoing_migration(MigrationState *s, const char *host_port)
  {
 -    struct sockaddr_in addr;
      int ret;
 -
 -    ret = parse_host_port(&addr, host_port);
 -    if (ret < 0) {
 -        return ret;
 -    }
 +    int fd;
  
      s->get_error = socket_errno;
      s->write = socket_write;
      s->close = tcp_close;
  
 -    s->fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
 -    if (s->fd == -1) {
 -        DPRINTF("Unable to open socket");
 -        return -socket_error();
 -    }
 -
 -    socket_set_nonblock(s->fd);
 -
 -    do {
 -        ret = connect(s->fd, (struct sockaddr *)&addr, sizeof(addr));
 -        if (ret == -1) {
 -            ret = -socket_error();
 -        }
 -        if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
 -            qemu_set_fd_handler2(s->fd, NULL, NULL, tcp_wait_for_connect, s);
 -            return 0;
 -        }
 -    } while (ret == -EINTR);
 -
 -    if (ret < 0) {
 +    ret = tcp_client_start(host_port, &fd);
 +    s->fd = fd;

You don't need fd; you can pass s->fd to the function.

Orit

 +    if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
 +        DPRINTF("connect in progress");
 +        qemu_set_fd_handler2(s->fd, NULL, NULL, tcp_wait_for_connect, s);
 +    } else if (ret < 0) {
          DPRINTF("connect failed\n");
 -        migrate_fd_error(s);
 +        if (ret != -EINVAL) {
 +            migrate_fd_error(s);
 +        }
          return ret;
 +    } else {
 +        migrate_fd_connect(s);
      }
 -    migrate_fd_connect(s);
 +
      return 0;
  }
  
 diff --git a/net/socket.c b/net/socket.c
 index 5feb3d2..b7cd8ec 100644
 --- a/net/socket.c
 +++ b/net/socket.c
 @@ -434,41 +434,22 @@ static int net_socket_connect_init(VLANState *vlan,
 const char *host_str)
  {
      NetSocketState *s;
 -    int fd, connected, ret, err;
 +    int fd, connected, ret;
      struct sockaddr_in saddr;
  
 -    if (parse_host_port(&saddr, host_str) < 0)
 -        return -1;
 -
 -    fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
 -    if (fd < 0) {
 -        perror("socket");
 -        return -1;
 -    }
 -    socket_set_nonblock(fd);
 -
 -    connected = 0;
 -    for(;;) {
 -        ret = connect(fd, (struct sockaddr *)&saddr, sizeof(saddr));
 -        if (ret < 0) {
 -            err = socket_error();
 -            if (err == EINTR || err == EWOULDBLOCK) {
 -            } else if (err == EINPROGRESS) {
 -                break;
 +    ret = tcp_client_start(host_str, &fd);
 +    if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
 +        connected = 0;
  #ifdef _WIN32
 -            } else if (err == WSAEALREADY || err == WSAEINVAL) {
 -                break;
 +    } else if (ret == -WSAEALREADY || ret == -WSAEINVAL) {
 +        connected = 0;
  #endif
 -            } else {
 -                perror("connect");
 -                closesocket(fd);
 -                return -1;
 -            }
 -        } else {
 -            connected = 1;
 -            break;
 -        }
 +    } else if (ret < 0) {
 +        return -1;
 +    } else {
 +        connected = 1;
      }
 +
      s = net_socket_fd_init(vlan, model, name, fd, connected);
      if (!s)
          return -1;
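
Both callers in this patch treat the value returned by tcp_client_start()
the same way: -EINPROGRESS or -EWOULDBLOCK means the connect is still
pending, any other negative value is a failure, and zero means the socket
is already connected. A minimal sketch of that three-way split, assuming
the 0 / -errno return convention seen in the hunks above; the enum and the
function name are hypothetical and not part of the patch:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names; only the three-way split mirrors the patch. */
enum connect_outcome {
    CONNECT_PENDING,    /* register a write handler and wait */
    CONNECT_FAILED,     /* report the error to the caller */
    CONNECT_DONE        /* socket is already connected */
};

/* ret is assumed to be 0 on success or a negated errno value on failure. */
enum connect_outcome classify_connect_result(int ret)
{
    assert(ret <= 0);

    if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
        return CONNECT_PENDING;
    }
    if (ret < 0) {
        return CONNECT_FAILED;
    }
    return CONNECT_DONE;
}
```

A caller would register its connect handler on CONNECT_PENDING, propagate
the error on CONNECT_FAILED, and continue immediately on CONNECT_DONE.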
 
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v2 2/9] net: use tcp_server_start() for tcp server creation

2012-03-05 Thread Amos Kong

On 05/03/12 21:27, Orit Wasserman wrote:

On 03/05/2012 12:03 PM, Amos Kong wrote:

Use tcp_server_start in those two functions:
  tcp_start_incoming_migration()
  net_socket_listen_init()

Signed-off-by: Amos Kongak...@redhat.com
---
  migration-tcp.c |   21 +
  net/socket.c|   23 +++
  2 files changed, 8 insertions(+), 36 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index 35a5781..ecadd10 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -157,28 +157,17 @@ out2:

  int tcp_start_incoming_migration(const char *host_port)
  {
-    struct sockaddr_in addr;
-    int val;
+    int ret;
     int s;

     DPRINTF("Attempting to start an incoming migration\n");

-    if (parse_host_port(&addr, host_port) < 0) {
-        fprintf(stderr, "invalid host/port combination: %s\n", host_port);
-        return -EINVAL;
-    }
-
-    s = qemu_socket(PF_INET, SOCK_STREAM, 0);
-    if (s == -1) {
-        return -socket_error();
+    ret = tcp_server_start(host_port, &s);
+    if (ret < 0) {
+        fprintf(stderr, "tcp_server_start: %s\n", strerror(-ret));
+        return ret;
     }

-    val = 1;
-    setsockopt(s, SOL_SOCKET, SO_REUSEADDR, (const char *)&val, sizeof(val));
-
-    if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
-        goto err;
-    }
     if (listen(s, 1) == -1) {
         goto err;
     }
diff --git a/net/socket.c b/net/socket.c
index 0bcf229..5feb3d2 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -403,31 +403,14 @@ static int net_socket_listen_init(VLANState *vlan,
const char *host_str)
  {
     NetSocketListenState *s;
-    int fd, val, ret;
-    struct sockaddr_in saddr;
-
-    if (parse_host_port(&saddr, host_str) < 0)
-        return -1;
+    int fd, ret;

     s = g_malloc0(sizeof(NetSocketListenState));

-    fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
-    if (fd < 0) {
-        perror("socket");
-        g_free(s);
-        return -1;
-    }
-    socket_set_nonblock(fd);
-
-    /* allow fast reuse */
-    val = 1;
-    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, (const char *)&val, sizeof(val));
-
-    ret = bind(fd, (struct sockaddr *)&saddr, sizeof(saddr));
+    ret = tcp_server_start(host_str, &fd);
     if (ret < 0) {
-        perror("bind");
+        error_report("tcp_server_start: %s", strerror(-ret));


If the return value is always -1, this has no meaning.


Hi Orit,

"return -1;" is from the original code; net_socket_listen_init() is only
used once, in net_init_socket():


    if (net_socket_connect_init(vlan, "socket", name, connect) == -1) {
        return -1;
    }

This patch just replaces the server creation code with tcp_server_start().

Amos.


Re: [Qemu-devel] [PATCH v2 1/9] net: introduce tcp_server_start()

2012-03-05 Thread Amos Kong

On 05/03/12 21:25, Orit Wasserman wrote:

On 03/05/2012 12:03 PM, Amos Kong wrote:

Introduce tcp_server_start() by moving original code in
tcp_start_incoming_migration().

Signed-off-by: Amos Kongak...@redhat.com
---
  net.c |   27 +++
  qemu_socket.h |2 ++
  2 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/net.c b/net.c
index c34474f..0260968 100644
--- a/net.c
+++ b/net.c
@@ -99,6 +99,33 @@ static int get_str_sep(char *buf, int buf_size, const char 
**pp, int sep)
  return 0;
  }

+int tcp_server_start(const char *str, int *fd)
+{
+    int val, ret;
+    struct sockaddr_in saddr;
+
+    if (parse_host_port(&saddr, str) < 0) {


error message would be nice


+        return -1;
+    }
+
+    *fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
+    if (fd < 0) {
+        perror("socket");
+        return -1;
+    }


This is actually the net_socket_listen_init() version;
tcp_start_incoming_migration() returns the error as -socket_error().
I prefer not to lose the errno.


agree.


I know that when calling net_socket_listen_init(), for some unknown reason
there is an explicit check for -1:
 if (net_socket_listen_init(vlan, "socket", name, listen) == -1) {
I think this is a good opportunity to change that check.


nod.



Orit

+    socket_set_nonblock(*fd);
+
+    /* allow fast reuse */
+    val = 1;
+    setsockopt(*fd, SOL_SOCKET, SO_REUSEADDR, (const char *)&val, sizeof(val));
+
+    ret = bind(*fd, (struct sockaddr *)&saddr, sizeof(saddr));
+    if (ret < 0) {
+        closesocket(*fd);
+    }
+    return ret;
+}
+
+
  int parse_host_port(struct sockaddr_in *saddr, const char *str)
  {
  char buf[512];
diff --git a/qemu_socket.h b/qemu_socket.h
index fe4cf6c..d612793 100644
--- a/qemu_socket.h
+++ b/qemu_socket.h
@@ -54,6 +54,8 @@ int unix_listen(const char *path, char *ostr, int olen);
  int unix_connect_opts(QemuOpts *opts);
  int unix_connect(const char *path);

+int tcp_server_start(const char *str, int *fd);
+
  /* Old, ipv4 only bits.  Don't use for new code. */
  int parse_host_port(struct sockaddr_in *saddr, const char *str);
  int socket_init(void);
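
The review points above (keep the errno instead of returning a bare -1,
and test the descriptor itself after the socket call) can be sketched with
plain POSIX calls. This is only an illustration under stated assumptions:
tcp_server_start_sketch() is a hypothetical stand-in, it takes a numeric
IPv4 address and a port instead of a "host:port" string, and it uses
socket()/close() where QEMU would use its own wrappers.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for tcp_server_start(); not the QEMU function.
 * Returns 0 on success or a negated errno value, so the caller can pass
 * -ret to strerror() and get a meaningful message. */
int tcp_server_start_sketch(const char *ipv4, unsigned short port, int *fd)
{
    struct sockaddr_in saddr;
    int val = 1;
    int err;

    assert(fd != NULL);
    *fd = -1;

    memset(&saddr, 0, sizeof(saddr));
    saddr.sin_family = AF_INET;
    saddr.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &saddr.sin_addr) != 1) {
        return -EINVAL;
    }

    *fd = socket(PF_INET, SOCK_STREAM, 0);
    if (*fd < 0) {              /* test the descriptor, not the pointer */
        return -errno;
    }

    /* allow fast reuse, then bind; on failure keep errno before close() */
    if (setsockopt(*fd, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val)) < 0 ||
        bind(*fd, (struct sockaddr *)&saddr, sizeof(saddr)) < 0) {
        err = errno;
        close(*fd);
        *fd = -1;
        return -err;
    }
    return 0;
}
```

With this convention a caller can print strerror(-ret) directly, which is
the case where the error_report()/fprintf() calls in patch 2/9 carry real
information.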




--
Amos.


Re: [Qemu-devel] KVM call agenda for tuesday 31

2012-03-05 Thread Avi Kivity
On 03/05/2012 03:38 PM, Igor Mitsyanko wrote:
 Short summary:
   * switch wp groups to bitfield rather than int array
   * convert sd.c to use memory_region_init_ram() to allocate the wp
 groups
 (being careful to use memory_region_set_dirty() when we touch them)
   * we don't need variable-length fields for sd.c any more
   * rest of the vmstate conversion is straightforward
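
A minimal sketch of the first item in the summary, one bit per
write-protect group instead of one int per group. The names are
hypothetical and this is not the sd.c code; a real conversion would also
have to mark the backing memory dirty on every update, as the summary
notes.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers: one bit per write-protect group. */
#define WP_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define WP_WORDS(ngroups) \
    (((ngroups) + WP_BITS_PER_WORD - 1) / WP_BITS_PER_WORD)

/* Set or clear the write-protect bit of one group. */
void wp_group_set(unsigned long *map, size_t ngroups, size_t group, bool on)
{
    unsigned long mask;

    assert(group < ngroups);
    mask = 1UL << (group % WP_BITS_PER_WORD);
    if (on) {
        map[group / WP_BITS_PER_WORD] |= mask;
    } else {
        map[group / WP_BITS_PER_WORD] &= ~mask;
    }
}

/* Return true if the group is write protected. */
bool wp_group_test(const unsigned long *map, size_t ngroups, size_t group)
{
    assert(group < ngroups);
    return (map[group / WP_BITS_PER_WORD] >> (group % WP_BITS_PER_WORD)) & 1UL;
}
```

The storage for ngroups groups is WP_WORDS(ngroups) words, so its size is
fixed by the card geometry rather than by a variable-length int array.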


 OK, it turned out to be not so simple, we can't use memory API in sd.c
 because TARGET_PHYS_ADDR_BITS value (and, consequently,
 target_phys_addr_t) is not defined for common objects.


Well, can't you make sd.c target dependent?  It's not so nice, but it
does solve the problem.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 0/3] pmu emulation fixes

2012-03-05 Thread Avi Kivity
On 02/26/2012 04:55 PM, Gleb Natapov wrote:
 Gleb Natapov (3):
   KVM: x86 emulator: warn when pin control is set in eventsel msr
   KVM: x86 emulator: Fix raw event check
   KVM: x86 emulator: add proper support for fixed counter 2


Thanks, applied (s/emulator/pmu/...)

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 2/2 unit-test] Add PMU test

2012-03-05 Thread Avi Kivity
On 02/26/2012 05:20 PM, Gleb Natapov wrote:
 Add unit test to test architectural PMU emulation in kvm.


Thanks, applied.

-- 
error compiling committee.c: too many arguments to function



Re: [RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait

2012-03-05 Thread Avi Kivity
On 03/05/2012 11:07 AM, Jan Kiszka wrote:
 On 2012-03-05 09:34, Paolo Bonzini wrote:
  This is quite ugly.  Two threads, one running main_loop_wait and
  one running qemu_aio_wait, can race with each other on running the
  same iohandler.  The result is that an iohandler could run while the
  underlying socket is not readable or writable, with possibly ill effects.

 Hmm, isn't it a problem already that a socket is polled by two threads
 at the same time? Can't that be avoided?

Could it be done simply by adding a mutex there?  It's hardly a clean
fix, but it's not a clean problem.

 Long-term, I'd like to cut out certain file descriptors from the main
 loop and process them completely in separate threads (for separate
 locking, prioritization etc.). Dunno how NBD works, but maybe it should
 be reworked like this already.

Ideally qemu_set_fd_handler2() should be made thread local, and each
device thread would run a copy of the main loop, just working on
different data.

-- 
error compiling committee.c: too many arguments to function



Re: [RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait

2012-03-05 Thread Paolo Bonzini
Il 05/03/2012 15:24, Avi Kivity ha scritto:
 On 03/05/2012 11:07 AM, Jan Kiszka wrote:
 On 2012-03-05 09:34, Paolo Bonzini wrote:
 This is quite ugly.  Two threads, one running main_loop_wait and
 one running qemu_aio_wait, can race with each other on running the
 same iohandler.  The result is that an iohandler could run while the
 underlying socket is not readable or writable, with possibly ill effects.

 Hmm, isn't it a problem already that a socket is polled by two threads
 at the same time? Can't that be avoided?
 
 Could it be done simply by adding a mutex there?  It's hardly a clean
 fix, but it's not a clean problem.

Hmm, I don't think so.  It would need to protect execution of the
iohandlers too, and pretty much everything can happen there including a
nested loop.  Of course recursive mutexes exist, but it sounds like too
big an axe.

I could add a generation count updated by qemu_aio_wait(), and rerun the
select() only if the generation count changes during its execution.

Or we can call it an NBD bug.  I'm not against that, but it seemed to me
that the problem is more general.

Paolo


Re: [RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait

2012-03-05 Thread Jan Kiszka
On 2012-03-05 15:24, Avi Kivity wrote:
 Long-term, I'd like to cut out certain file descriptors from the main
 loop and process them completely in separate threads (for separate
 locking, prioritization etc.). Dunno how NBD works, but maybe it should
 be reworked like this already.
 
 Ideally qemu_set_fd_handler2() should be made thread local, and each
 device thread would run a copy of the main loop, just working on
 different data.

qemu_set_fd_handler2 may not only be called over an iothread. Rather, we
need an object and associated lock that is related to the io-path (i.e.
frontend device + backend driver). That object has to be passed to
services like qemu_set_fd_handler2.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [Qemu-devel] KVM call agenda for tuesday 31

2012-03-05 Thread Igor Mitsyanko

On 03/05/2012 06:13 PM, Avi Kivity wrote:

On 03/05/2012 03:38 PM, Igor Mitsyanko wrote:

Short summary:
   * switch wp groups to bitfield rather than int array
   * convert sd.c to use memory_region_init_ram() to allocate the wp
groups
(being careful to use memory_region_set_dirty() when we touch them)
   * we don't need variable-length fields for sd.c any more
   * rest of the vmstate conversion is straightforward



OK, it turned out to be not so simple, we can't use memory API in sd.c
because TARGET_PHYS_ADDR_BITS value (and, consequently,
target_phys_addr_t) is not defined for common objects.



Well, can't you make sd.c target dependent?  It's not so nice, but it
does solve the problem.

OK, but it will turn qemu away from its long-term path to suppress *all* 
target-specific code :)


--
Mitsyanko Igor
ASWG, Moscow RD center, Samsung Electronics
email: i.mitsya...@samsung.com


Re: [PATCH 0/3] pmu emulation fixes

2012-03-05 Thread Gleb Natapov
On Mon, Mar 05, 2012 at 04:18:21PM +0200, Avi Kivity wrote:
 On 02/26/2012 04:55 PM, Gleb Natapov wrote:
  Gleb Natapov (3):
KVM: x86 emulator: warn when pin control is set in eventsel msr
KVM: x86 emulator: Fix raw event check
KVM: x86 emulator: add proper support for fixed counter 2
 
 
 Thanks, applied (s/emulator/pmu/...)
 
You, maintainers, are hard to please! My previous fix to PMU 
ee3f9f114bdc8b315eed7b1c651ca6c9b8251cf7 was prefixed KVM: x86
emulator: so I followed suit :)

--
Gleb.


Re: [Qemu-devel] KVM call agenda for tuesday 31

2012-03-05 Thread Avi Kivity
On 03/05/2012 04:37 PM, Igor Mitsyanko wrote:
 Well, can't you make sd.c target dependent?  It's not so nice, but it
 does solve the problem.


 OK, but it will turn qemu from it's long term path to suppress *all*
 target specific code :)


The other alternative is to s/target_phys_addr_t/uint64_t/ in the memory
API.  I think 32-on-32 is quite rare these days, so it wouldn't be much
of a performance issue.

-- 
error compiling committee.c: too many arguments to function



Re: [RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait

2012-03-05 Thread Avi Kivity
On 03/05/2012 04:30 PM, Paolo Bonzini wrote:
 Il 05/03/2012 15:24, Avi Kivity ha scritto:
  On 03/05/2012 11:07 AM, Jan Kiszka wrote:
  On 2012-03-05 09:34, Paolo Bonzini wrote:
  This is quite ugly.  Two threads, one running main_loop_wait and
  one running qemu_aio_wait, can race with each other on running the
  same iohandler.  The result is that an iohandler could run while the
  underlying socket is not readable or writable, with possibly ill effects.
 
  Hmm, isn't it a problem already that a socket is polled by two threads
  at the same time? Can't that be avoided?
  
  Could it be done simply by adding a mutex there?  It's hardly a clean
  fix, but it's not a clean problem.

 Hmm, I don't think so.  It would need to protect execution of the
 iohandlers too, and pretty much everything can happen there including a
 nested loop.  Of course recursive mutexes exist, but it sounds like too
 big an axe.

The I/O handlers would still use the qemu mutex, no?  we'd just protect
the select() (taking the mutex from before releasing the global lock,
and reacquiring it afterwards).

 I could add a generation count updated by qemu_aio_wait(), and rerun the
 select() only if the generation count changes during its execution.

 Or we can call it an NBD bug.  I'm not against that, but it seemed to me
 that the problem is more general.

What about making sure all callers of qemu_aio_wait() run from
coroutines (or threads)?  Then they just ask the main thread to wake
them up, instead of dispatching completions themselves.

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] KVM call agenda for tuesday 31

2012-03-05 Thread Anthony Liguori

On 03/05/2012 09:10 AM, Avi Kivity wrote:

On 03/05/2012 04:37 PM, Igor Mitsyanko wrote:

Well, can't you make sd.c target dependent?  It's not so nice, but it
does solve the problem.



OK, but it will turn qemu from it's long term path to suppress *all*
target specific code :)



The other alternative is to s/target_phys_addr_t/uint64_t/ in the memory
API.  I think 32-on-32 is quite rare these days, so it wouldn't be much
of a performance issue.


I think this makes sense independent of other discussions regarding fixing 
target_phys_addr_t size.


Hardware addresses should be independent of the target.  If we wanted to use a 
hw_addr_t that would be okay too.
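
A minimal sketch of what a target-independent address type could look
like, assuming it is simply fixed at 64 bits. The typedef name follows
the suggestion in this thread; the helper and its phys_bits parameter are
hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a hardware address is always 64 bits wide, whatever the
 * target, so common objects need no per-target definition. */
typedef uint64_t hw_addr;

/* Does [addr, addr + len) fit in an address space of phys_bits bits?
 * The target's width becomes run-time data instead of a compile-time type. */
bool hw_range_fits(hw_addr addr, hw_addr len, unsigned phys_bits)
{
    hw_addr limit;

    assert(phys_bits >= 1 && phys_bits <= 64);
    limit = phys_bits == 64 ? UINT64_MAX : ((hw_addr)1 << phys_bits) - 1;
    if (len == 0) {
        return addr <= limit;
    }
    /* written to avoid overflow in addr + len */
    return addr <= limit && len - 1 <= limit - addr;
}
```

For example, hw_range_fits(0x100000000ULL, 1, 32) is false while the same
range fits once phys_bits is 40.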


Regards,

Anthony Liguori







Re: [Qemu-devel] KVM call agenda for tuesday 31

2012-03-05 Thread Avi Kivity
On 03/05/2012 05:15 PM, Anthony Liguori wrote:
 The other alternative is to s/target_phys_addr_t/uint64_t/ in the memory
 API.  I think 32-on-32 is quite rare these days, so it wouldn't be much
 of a performance issue.


 I think this makes sense independent of other discussions regarding
 fixing target_phys_addr_t size.

 Hardware addresses should be independent of the target.  If we wanted
 to use a hw_addr_t that would be okay too.


Would this hw_addr (s/_t$//, or you'll be Blued) be fixed at uint64_t
(and thus only documentary), or also subject to multiple compilation?

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] KVM call agenda for tuesday 31

2012-03-05 Thread Peter Maydell
On 5 March 2012 15:10, Avi Kivity a...@redhat.com wrote:
 I think 32-on-32 is quite rare these days, so it wouldn't be much
 of a performance issue.

32-on-32 will be the standard case for KVM on ARM I think...

-- PMM


Re: [Qemu-devel] KVM call agenda for tuesday 31

2012-03-05 Thread Avi Kivity
On 03/05/2012 05:20 PM, Peter Maydell wrote:
 On 5 March 2012 15:10, Avi Kivity a...@redhat.com wrote:
  I think 32-on-32 is quite rare these days, so it wouldn't be much
  of a performance issue.

 32-on-32 will be the standard case for KVM on ARM I think...

Won't we be virtualizing LPAE per default?

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] KVM call agenda for tuesday 31

2012-03-05 Thread Peter Maydell
On 5 March 2012 15:21, Avi Kivity a...@redhat.com wrote:
 On 03/05/2012 05:20 PM, Peter Maydell wrote:
 On 5 March 2012 15:10, Avi Kivity a...@redhat.com wrote:
  I think 32-on-32 is quite rare these days, so it wouldn't be much
  of a performance issue.

 32-on-32 will be the standard case for KVM on ARM I think...

 Won't we be virtualizing LPAE per default?

Mmm, I guess that would give you 64-on-32.

-- PMM


Re: [Qemu-devel] KVM call agenda for tuesday 31

2012-03-05 Thread Andreas Färber
Am 05.03.2012 16:10, schrieb Avi Kivity:
 On 03/05/2012 04:37 PM, Igor Mitsyanko wrote:
 Well, can't you make sd.c target dependent?  It's not so nice, but it
 does solve the problem.


 OK, but it will turn qemu from it's long term path to suppress *all*
 target specific code :)

 
 The other alternative is to s/target_phys_addr_t/uint64_t/ in the memory
 API.  I think 32-on-32 is quite rare these days, so it wouldn't be much
 of a performance issue.

Maybe rare, but 32-bit ARM netbooks and tablets are gaining marketshare.

Mid-term also depends on how we want to proceed with LPAE softmmu-wise
(bump arm to 64-bit target_phys_addr_t, or do LPAE and AArch64 in a
new arm64).

i386 is 64-on-32 these days already; most of the embedded targets are
still at most 32-bit though (xtensa, mblaze, ...).

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg


Re: [Qemu-devel] KVM call agenda for tuesday 31

2012-03-05 Thread Avi Kivity
On 03/05/2012 05:43 PM, Andreas Färber wrote:
 Am 05.03.2012 16:10, schrieb Avi Kivity:
  On 03/05/2012 04:37 PM, Igor Mitsyanko wrote:
  Well, can't you make sd.c target dependent?  It's not so nice, but it
  does solve the problem.
 
 
  OK, but it will turn qemu from it's long term path to suppress *all*
  target specific code :)
 
  
  The other alternative is to s/target_phys_addr_t/uint64_t/ in the memory
  API.  I think 32-on-32 is quite rare these days, so it wouldn't be much
  of a performance issue.

 Maybe rare, but 32-bit ARM netbooks and tablets are gaining marketshare.

 Mid-term also depends on how me want to proceed with LPAE softmmu-wise
 (bump arm to 64-bit target_phys_addr_t, or do LPAE and AArch64 in a
 new arm64).

I was counting on LPAE to make 32-on-32 rare.

 i386 is 64-on-32 these days already; most of the embedded targets are
 still at most 32-bit though (xtensa, mblaze, ...).

These would be 32-on-64, since the host would usually be x86.  I guess
it would be even more true when the w64 port is complete.

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] KVM call agenda for tuesday 31

2012-03-05 Thread Peter Maydell
On 5 March 2012 15:43, Andreas Färber afaer...@suse.de wrote:
 Mid-term also depends on how me want to proceed with LPAE softmmu-wise
 (bump arm to 64-bit target_phys_addr_t, or do LPAE and AArch64 in a
 new arm64).

For LPAE I would have thought we want to make arm go to a 64 bit
target_phys_addr_t, since that's exactly what it is: same old
ARM architecture with wider physical addresses :-)

I notice that for the architectures we currently have that have
32 and 64 bit versions we have separate {i386,x86_64}-softmmu,
{ppc,ppc64}-softmmu, {mips,mips64}-softmmu. What's the advantage
of separating out the 64 bit flavours that way rather than
having everything be a single binary?

-- PMM


[PATCH] kvm: mmu: make use of ->root_level in reset_rsvds_bits_mask

2012-03-05 Thread Davidlohr Bueso
From: Davidlohr Bueso d...@gnu.org

The reset_rsvds_bits_mask() function can use the guest walker's root level
number instead of using a separate 'level' variable.

Signed-off-by: Davidlohr Bueso d...@gnu.org
---
 arch/x86/kvm/mmu.c |   31 +++
 1 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ff053ca..4cb1642 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3185,15 +3185,14 @@ static bool sync_mmio_spte(u64 *sptep, gfn_t gfn, 
unsigned access,
 #undef PTTYPE
 
 static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
- struct kvm_mmu *context,
- int level)
+ struct kvm_mmu *context)
 {
    int maxphyaddr = cpuid_maxphyaddr(vcpu);
    u64 exb_bit_rsvd = 0;
 
    if (!context->nx)
        exb_bit_rsvd = rsvd_bits(63, 63);
-   switch (level) {
+   switch (context->root_level) {
    case PT32_ROOT_LEVEL:
        /* no rsvd bits for 2 level 4K page table entries */
        context->rsvd_bits_mask[0][1] = 0;
@@ -3251,8 +3250,9 @@ static int paging64_init_context_common(struct kvm_vcpu 
*vcpu,
int level)
 {
    context->nx = is_nx(vcpu);
+   context->root_level = level;
 
-   reset_rsvds_bits_mask(vcpu, context, level);
+   reset_rsvds_bits_mask(vcpu, context);
 
    ASSERT(is_pae(vcpu));
    context->new_cr3 = paging_new_cr3;
@@ -3262,7 +3262,6 @@ static int paging64_init_context_common(struct kvm_vcpu *vcpu,
    context->invlpg = paging64_invlpg;
    context->update_pte = paging64_update_pte;
    context->free = paging_free;
-   context->root_level = level;
    context->shadow_root_level = level;
    context->root_hpa = INVALID_PAGE;
    context->direct_map = false;
@@ -3279,8 +3278,9 @@ static int paging32_init_context(struct kvm_vcpu *vcpu,
 struct kvm_mmu *context)
 {
    context->nx = false;
+   context->root_level = PT32_ROOT_LEVEL;
 
-   reset_rsvds_bits_mask(vcpu, context, PT32_ROOT_LEVEL);
+   reset_rsvds_bits_mask(vcpu, context);
 
    context->new_cr3 = paging_new_cr3;
    context->page_fault = paging32_page_fault;
@@ -3289,7 +3289,6 @@ static int paging32_init_context(struct kvm_vcpu *vcpu,
    context->sync_page = paging32_sync_page;
    context->invlpg = paging32_invlpg;
    context->update_pte = paging32_update_pte;
-   context->root_level = PT32_ROOT_LEVEL;
    context->shadow_root_level = PT32E_ROOT_LEVEL;
    context->root_hpa = INVALID_PAGE;
    context->direct_map = false;
@@ -3327,19 +3326,19 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
        context->root_level = 0;
    } else if (is_long_mode(vcpu)) {
        context->nx = is_nx(vcpu);
-       reset_rsvds_bits_mask(vcpu, context, PT64_ROOT_LEVEL);
-       context->gva_to_gpa = paging64_gva_to_gpa;
        context->root_level = PT64_ROOT_LEVEL;
+       reset_rsvds_bits_mask(vcpu, context);
+       context->gva_to_gpa = paging64_gva_to_gpa;
    } else if (is_pae(vcpu)) {
        context->nx = is_nx(vcpu);
-       reset_rsvds_bits_mask(vcpu, context, PT32E_ROOT_LEVEL);
-       context->gva_to_gpa = paging64_gva_to_gpa;
        context->root_level = PT32E_ROOT_LEVEL;
+       reset_rsvds_bits_mask(vcpu, context);
+       context->gva_to_gpa = paging64_gva_to_gpa;
    } else {
        context->nx = false;
-       reset_rsvds_bits_mask(vcpu, context, PT32_ROOT_LEVEL);
-       context->gva_to_gpa = paging32_gva_to_gpa;
        context->root_level = PT32_ROOT_LEVEL;
+       reset_rsvds_bits_mask(vcpu, context);
+       context->gva_to_gpa = paging32_gva_to_gpa;
    }
 
return 0;
@@ -3402,18 +3401,18 @@ static int init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
        g_context->gva_to_gpa = nonpaging_gva_to_gpa_nested;
    } else if (is_long_mode(vcpu)) {
        g_context->nx = is_nx(vcpu);
-       reset_rsvds_bits_mask(vcpu, g_context, PT64_ROOT_LEVEL);
        g_context->root_level = PT64_ROOT_LEVEL;
+       reset_rsvds_bits_mask(vcpu, g_context);
        g_context->gva_to_gpa = paging64_gva_to_gpa_nested;
    } else if (is_pae(vcpu)) {
        g_context->nx = is_nx(vcpu);
-       reset_rsvds_bits_mask(vcpu, g_context, PT32E_ROOT_LEVEL);
        g_context->root_level = PT32E_ROOT_LEVEL;
+       reset_rsvds_bits_mask(vcpu, g_context);
        g_context->gva_to_gpa = paging64_gva_to_gpa_nested;
    } else {
        g_context->nx = false;
-       reset_rsvds_bits_mask(vcpu, g_context, PT32_ROOT_LEVEL);
        g_context->root_level = PT32_ROOT_LEVEL;
+   

[PATCH] KVM: PPC: Save/Restore CR over vcpu_run

2012-03-05 Thread Alexander Graf
On PPC, CR2-CR4 are nonvolatile, thus have to be saved across function calls.
We didn't respect that for any architecture until Paul spotted it in his
patch for Book3S-HV. This patch saves/restores CR for all KVM capable PPC hosts.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_interrupts.S  |7 +++
 arch/powerpc/kvm/booke_interrupts.S   |7 ++-
 arch/powerpc/kvm/bookehv_interrupts.S |8 +++-
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_interrupts.S 
b/arch/powerpc/kvm/book3s_interrupts.S
index 0a8515a..3e35383 100644
--- a/arch/powerpc/kvm/book3s_interrupts.S
+++ b/arch/powerpc/kvm/book3s_interrupts.S
@@ -84,6 +84,10 @@ kvm_start_entry:
/* Save non-volatile registers (r14 - r31) */
SAVE_NVGPRS(r1)
 
+   /* Save CR */
+   mfcr    r14
+   stw r14, _CCR(r1)
+
/* Save LR */
PPC_STL r0, _LINK(r1)
 
@@ -165,6 +169,9 @@ kvm_exit_loop:
PPC_LL  r4, _LINK(r1)
mtlr    r4
 
+   lwz r14, _CCR(r1)
+   mtcr    r14
+
/* Restore non-volatile host registers (r14 - r31) */
REST_NVGPRS(r1)
 
diff --git a/arch/powerpc/kvm/booke_interrupts.S 
b/arch/powerpc/kvm/booke_interrupts.S
index 10d8ef6..c8c4b87 100644
--- a/arch/powerpc/kvm/booke_interrupts.S
+++ b/arch/powerpc/kvm/booke_interrupts.S
@@ -34,7 +34,8 @@
 /* r2 is special: it holds 'current', and it made nonvolatile in the
  * kernel with the -ffixed-r2 gcc option. */
 #define HOST_R2 12
-#define HOST_NV_GPRS    16
+#define HOST_CR         16
+#define HOST_NV_GPRS    20
 #define HOST_NV_GPR(n)  (HOST_NV_GPRS + ((n - 14) * 4))
 #define HOST_MIN_STACK_SIZE (HOST_NV_GPR(31) + 4)
 #define HOST_STACK_SIZE (((HOST_MIN_STACK_SIZE + 15) / 16) * 16) /* Align. */
@@ -296,8 +297,10 @@ heavyweight_exit:
 
/* Return to kvm_vcpu_run(). */
lwz r4, HOST_STACK_LR(r1)
+   lwz r5, HOST_CR(r1)
addi    r1, r1, HOST_STACK_SIZE
mtlr    r4
+   mtcr    r5
/* r3 still contains the return code from kvmppc_handle_exit(). */
blr
 
@@ -314,6 +317,8 @@ _GLOBAL(__kvmppc_vcpu_run)
stw r3, HOST_RUN(r1)
mflr    r3
stw r3, HOST_STACK_LR(r1)
+   mfcr    r5
+   stw r5, HOST_CR(r1)
 
/* Save host non-volatile register state to stack. */
stw r14, HOST_NV_GPR(r14)(r1)
diff --git a/arch/powerpc/kvm/bookehv_interrupts.S 
b/arch/powerpc/kvm/bookehv_interrupts.S
index 63fc5f0..3989b5a 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -49,7 +49,8 @@
  * kernel with the -ffixed-r2 gcc option.
  */
 #define HOST_R2 (3 * LONGBYTES)
-#define HOST_NV_GPRS    (4 * LONGBYTES)
+#define HOST_CR         (4 * LONGBYTES)
+#define HOST_NV_GPRS    (5 * LONGBYTES)
 #define HOST_NV_GPR(n)  (HOST_NV_GPRS + ((n - 14) * LONGBYTES))
 #define HOST_MIN_STACK_SIZE (HOST_NV_GPR(31) + LONGBYTES)
 #define HOST_STACK_SIZE ((HOST_MIN_STACK_SIZE + 15) & ~15) /* Align. */
@@ -396,6 +397,7 @@ skip_nv_load:
 heavyweight_exit:
/* Not returning to guest. */
PPC_LL  r5, HOST_STACK_LR(r1)
+   lwz r6, HOST_CR(r1)
 
/*
 * We already saved guest volatile register state; now save the
@@ -442,6 +444,7 @@ heavyweight_exit:
 
/* Return to kvm_vcpu_run(). */
mtlr    r5
+   mtcr    r6
addi    r1, r1, HOST_STACK_SIZE
/* r3 still contains the return code from kvmppc_handle_exit(). */
blr
@@ -459,6 +462,9 @@ _GLOBAL(__kvmppc_vcpu_run)
mflr    r3
PPC_STL r3, HOST_STACK_LR(r1)
 
+   mfcr    r5
+   stw r5, HOST_CR(r1)
+
/* Save host non-volatile register state to stack. */
PPC_STL r14, HOST_NV_GPR(r14)(r1)
PPC_STL r15, HOST_NV_GPR(r15)(r1)
-- 
1.6.0.2
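
The stack-slot arithmetic in the booke_interrupts.S hunk can be checked in isolation. The following is a minimal sketch, assuming 4-byte slots as in that file; frame_size() is a hypothetical helper added for illustration and is not part of the patch. Under these assumptions the new HOST_CR slot is absorbed by the existing alignment padding, so the rounded frame size comes out the same for the old and new layouts.

```c
#include <assert.h>
#include <stddef.h>

/* Offsets as defined by the patched booke_interrupts.S (32-bit, 4-byte slots). */
#define HOST_R2             12
#define HOST_CR             16
#define HOST_NV_GPRS        20
#define HOST_NV_GPR(n)      (HOST_NV_GPRS + (((n) - 14) * 4))
#define HOST_MIN_STACK_SIZE (HOST_NV_GPR(31) + 4)
#define HOST_STACK_SIZE     (((HOST_MIN_STACK_SIZE + 15) / 16) * 16) /* Align. */

/*
 * Hypothetical helper (not in the patch): frame size for a given offset
 * of the first non-volatile GPR slot, rounded up to 16 bytes.
 */
static size_t frame_size(size_t nv_gprs_base)
{
    size_t min_size;

    assert(nv_gprs_base % 4 == 0);
    min_size = nv_gprs_base + (31 - 14) * 4 + 4;  /* slots for r14..r31 */
    return ((min_size + 15) / 16) * 16;
}
```

Calling frame_size(16) models the layout before the patch and frame_size(20) the layout after it.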

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: virtio-blk performance regression and qemu-kvm

2012-03-05 Thread Martin Mailand

On 10.02.2012 15:36, Dongsu Park wrote:

Recently I observed performance regression regarding virtio-blk,
especially different IO bandwidths between qemu-kvm 0.14.1 and 1.0.
So I want to share the benchmark results, and ask you what the reason
would be.



Hi,
I think I found the problem; there is no regression in the code.
I think the problem is that qemu-kvm with the I/O thread enabled doesn't
produce enough CPU load to push the core to a higher CPU frequency,
because the load is distributed across two threads.
If I change the CPU governor to performance, the result from the master
branch is better than from the v0.14.1 branch.

I get the same results on a server system without power management activated.

@Dongsu Could you confirm those findings?


1. Test on i7 Laptop with Cpu governor ondemand.

v0.14.1
bw=63492KB/s iops=15873
bw=63221KB/s iops=15805

v1.0
bw=36696KB/s iops=9173
bw=37404KB/s iops=9350

master
bw=36396KB/s iops=9099
bw=34182KB/s iops=8545

Change the Cpu governor to performance
master
bw=81756KB/s iops=20393
bw=81453KB/s iops=20257


2. Test on AMD Istanbul without powermanagement activated.

v0.14.1
bw=53167KB/s iops=13291
bw=61386KB/s iops=15346

v1.0
bw=43599KB/s iops=10899
bw=46288KB/s iops=11572

master
bw=60678KB/s iops=15169
bw=62733KB/s iops=15683

-martin
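
Since the numbers above depend on the CPU frequency governor, it can help to record the active governor alongside each benchmark run. The following is a minimal sketch, assuming the usual Linux sysfs file /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor; read_governor() is a hypothetical helper name and takes the path as a parameter so it can be pointed at any CPU.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a cpufreq governor file into buf, with the
 * trailing newline stripped.  Returns 0 on success, -1 if the file
 * cannot be opened or read.
 */
static int read_governor(const char *path, char *buf, size_t len)
{
    FILE *f;

    assert(buf != NULL && len > 0);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

A benchmark wrapper could call this once per CPU before starting the run and print the result next to the bw/iops figures.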


Re: [RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait

2012-03-05 Thread Paolo Bonzini
On 05/03/2012 16:14, Avi Kivity wrote:
  Hmm, I don't think so.  It would need to protect execution of the
  iohandlers too, and pretty much everything can happen there including a
  nested loop.  Of course recursive mutexes exist, but it sounds like too
  big an axe.
 The I/O handlers would still use the qemu mutex, no?  we'd just protect
 the select() (taking the mutex from before releasing the global lock,
 and reacquiring it afterwards).

Yes, that could work, but it is _really_ ugly.  I still prefer this
patch or fixing NBD.  At least both contain the hack in a single place.

  I could add a generation count updated by qemu_aio_wait(), and rerun the
  select() only if the generation count changes during its execution.
 
  Or we can call it an NBD bug.  I'm not against that, but it seemed to me
  that the problem is more general.
 What about making sure all callers of qemu_aio_wait() run from
 coroutines (or threads)?  Then they just ask the main thread to wake
 them up, instead of dispatching completions themselves.

That would open another Pandora's box.  The point of having a separate
main loop is that only AIO can happen during qemu_aio_wait() or
qemu_aio_flush().  In particular you don't want the monitor to process
input while you're running another monitor command.

Paolo


Question on qemu-kvm 1.0

2012-03-05 Thread Al Patel
Hi,

We have been using qemu/kvm 0.12.5 (unchanged with stock kernel 2.6.32).

I just upgraded to qemu/kvm-1.0 and see a noticeable difference in packet I/O.

I want to understand the enhancements in 1.0 that lead to better performance.

Can you give me some pointers?

Off the bat I see new event code. From observation, I see that the
qemu-kvm process is taking a whole lot less CPU.

Thanks
/a


Re: Question on qemu-kvm 1.0

2012-03-05 Thread Al Patel
Side note: I am not using vhost-net yet. I am reading from the blogs
that vhost-net gives much better performance.
I am putting another system up with vhost-net support to measure this.

I would appreciate any pointers on my previous question.

/a

On Mon, Mar 5, 2012 at 11:17 AM, Al Patel alps@gmail.com wrote:
 Hi,

 We have been using qemu/kvm 0.12.5 (unchanged with stock kernel 2.6.32).

 I just upgraded to qemu/kvm-1.0 and see a noticeable difference in packet I/O.

 I want to understand the enhancements in 1.0 that lead to better performance.

 Can you give me some pointers?

 Off the bat I see new event code. From observation, I see that the
 qemu-kvm process is taking a whole lot less CPU.

 Thanks
 /a


Re: virtio-blk performance regression and qemu-kvm

2012-03-05 Thread Stefan Hajnoczi
On Mon, Mar 5, 2012 at 4:13 PM, Martin Mailand mar...@tuxadero.com wrote:
 On 10.02.2012 15:36, Dongsu Park wrote:

 Recently I observed performance regression regarding virtio-blk,
 especially different IO bandwidths between qemu-kvm 0.14.1 and 1.0.
 So I want to share the benchmark results, and ask you what the reason
 would be.



 Hi,
 I think I found the problem; there is no regression in the code.
 I think the problem is that qemu-kvm with the I/O thread enabled doesn't
 produce enough CPU load to push the core to a higher CPU frequency, because
 the load is distributed across two threads.
 If I change the CPU governor to performance, the result from the master
 branch is better than from the v0.14.1 branch.
 I get the same results on a server system without power management activated.

 @Dongsu Could you confirm those findings?


 1. Test on i7 Laptop with Cpu governor ondemand.

 v0.14.1
 bw=63492KB/s iops=15873
 bw=63221KB/s iops=15805

 v1.0
 bw=36696KB/s iops=9173
 bw=37404KB/s iops=9350

 master
 bw=36396KB/s iops=9099
 bw=34182KB/s iops=8545

 Change the Cpu governor to performance
 master
 bw=81756KB/s iops=20393
 bw=81453KB/s iops=20257

Interesting finding.  Did you show the 0.14.1 results with
performance governor?

Stefan


Re: [PATCH 0/4 V13] Avoid soft lockup message when KVM is stopped by host

2012-03-05 Thread Marcelo Tosatti
On Tue, Feb 14, 2012 at 04:17:20PM -0500, Eric B Munson wrote:
 On Tue, 14 Feb 2012, Marcelo Tosatti wrote:
 
  On Tue, Feb 14, 2012 at 10:50:13AM -0500, Eric B Munson wrote:
   On Tue, 14 Feb 2012, Marcelo Tosatti wrote:
   
On Tue, Feb 14, 2012 at 10:29:31AM -0500, Eric B Munson wrote:
 On Wed, 08 Feb 2012, Eric B Munson wrote:
 
  
  When a guest kernel is stopped by the host hypervisor it can look like a soft
  lockup to the guest kernel.  This false warning can mask later soft lockup
  warnings which may be real.  This patch series adds a method for a host
  hypervisor to communicate to a guest kernel that it is being stopped.  The
  final patch in the series has the watchdog check this flag when it goes to
  issue a soft lockup warning and skip the warning if the guest knows it was
  stopped.
  
  It was attempted to solve this in Qemu, but the side effects of saving and
  restoring the clock and tsc for each vcpu put the wall clock of the guest behind
  by the amount of time of the pause.  This forces a guest to have ntp running
  in order to keep the wall clock accurate.
 
 Avi,
 
 Is this set fit for merging or is there something else you want changed?

Eric,

On Message-ID: 20120210160536.ga23...@amt.cnet, i asked:

How is the stub getting included for other architectures again?

   
   Marcelo,
   
   Sorry, I put out V13 to answer that.  There is a stub in asm-generic that was
   lost in the V11-V12 rebase.  This stub has been included in the V13 set.
   
   Eric
  
  Eric, 
  
  I know the stub has been included in the series. But i am asking how 
  it is #include'ed for other architectures? (can't see that).
 
 Marcelo,
 
 kernel/watchdog.c now includes linux/kvm_para.h which includes asm/kvm_para.h.
 The check_and_clear function is defined in arch include/asm/kvm_para.h or in
 asm-generic/kvm_para.h for any arch lacking the specific header in their asm
 include dir.  If I have misunderstood how these headers work, please let me
 know and I will fix it.

There is no automatic inclusion of asm-generic/ headers. You must create
kvm_para.h in each architecture's include/asm/ directory, #including
asm-generic/kvm_para.h.



Re: virtio-blk performance regression and qemu-kvm

2012-03-05 Thread Martin Mailand

On 05.03.2012 17:35, Stefan Hajnoczi wrote:

1. Test on i7 Laptop with Cpu governor ondemand.

  v0.14.1
  bw=63492KB/s iops=15873
  bw=63221KB/s iops=15805

  v1.0
  bw=36696KB/s iops=9173
  bw=37404KB/s iops=9350

  master
  bw=36396KB/s iops=9099
  bw=34182KB/s iops=8545

  Change the Cpu governor to performance
  master
  bw=81756KB/s iops=20393
  bw=81453KB/s iops=20257

Interesting finding.  Did you show the 0.14.1 results with
performance governor?



Hi Stefan,
all results are with ondemand, except the one where I changed it to
performance.


Do you want a v0.14.1 test with the governor on performance?

-martin



Re: [PATCH 1/1 v3] PCI: Device specific reset function

2012-03-05 Thread Greg KH
On Mon, Mar 05, 2012 at 10:00:49AM +, Tadeusz Struk wrote:
 
 ---
  drivers/pci/pci.h|1 +
  drivers/pci/quirks.c |   33 +++--
  include/linux/pci.h  |1 +
  3 files changed, 29 insertions(+), 6 deletions(-)

Please read Documentation/SubmittingPatches for how to properly create,
and send, patches that are in a format that can be accepted.

Hint, also run your patch through scripts/checkpatch.pl to find the
obvious problems that are in it, to keep us from having to do that for
you...

 
 This e-mail and any attachments may contain confidential material for the 
 sole use of the intended recipient(s). Any review or distribution by others 
 is strictly prohibited. If you are not the intended recipient, please contact 
 the sender and delete all copies.

I have been told that such email footers require that the patch be
deleted and never be accepted.  Please fix this if you expect your
patches to be able to be applied.

greg k-h


Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware

2012-03-05 Thread Lennert Buytenhek
On Tue, Feb 28, 2012 at 08:40:06PM -0800, John Fastabend wrote:

 Also if there are embedded switches with learning capabilities they
 might want to trigger events to user space. In this case having
 a protocol type makes user space a bit easier to manage. I've
 added Lennert so maybe he can comment I think the Marvell chipsets
 might support something along these lines. The SR-IOV chipsets I'm
 aware of _today_ don't do learning. Learning makes the event model
 more plausible.

net/dsa currently configures any switch chips in the system to do
auto-learning.  However, I would much prefer to disable that, and have
the switch chip just pass up packets for new source addresses, have
Linux do the learning, and then mirror the Linux software FDB into
the hardware instead -- that avoids having to manually flush the
hardware FDB on certain STP state transitions or having to configure
the hardware to use a shorter address learning timeout when we're in
the middle of an STP topology change, which are problems we are
running into in practice.

Just curious -- while your patches allow propagating FDB entries
into the hardware, do you also have hooks to tell the hardware which
ports are to share address databases?

For net/dsa, we currently have:

http://patchwork.ozlabs.org/patch/16578/

While I think this is conceptually sound, the implementation is hacky,
and I wonder how you've solved it for your setup, and if DSA can
piggy-back off that.


Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware

2012-03-05 Thread Lennert Buytenhek
On Thu, Mar 01, 2012 at 08:36:20AM -0500, Jamal Hadi Salim wrote:

   I want to see a unified API so that user space control applications (RSTP, TRILL?)
   can use one set of netlink calls for both software bridge and hardware offloaded
   bridges.  Does this proposal meet that requirement?
   
 
 I don't see any issues with those requirements being met.
 
  Jamal, so why do they have to be different calls? I'm not so sure anymore...
  moving to RTM_FDB_XXXENTRY saved some refactoring in the bridge module but that
  is just cosmetic.
 
 I may not want to use the s/ware bridge i.e I may want to use h/ware
 bridge. I may want to use both.

This is a rather common case in embedded wireless routers/access points,
where you want to have the 4 LAN ports bridged together with the wlan0
interface.

In this scenario, the bridging between the LAN ports is typically done
in hardware, and the bridging between the LAN ports and wlan0 in
software, but here you have to be careful when you send the packet from
the switch chip up the stack to be forwarded to the wlan0 interface to
not re-send it to the hardware switch chip ports other than the one
that the packet came from.

net/dsa currently solves this by not having the hardware handle
broadcast packets at all, which circumvents the problem, but for
multicast traffic you would still like to be able to do at least the
forwarding that can be done in hardware in hardware.  (Unicast doesn't
have this problem as long as the kernel and the switch chip agree on
their view of the FDB.)
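
The duplicate-forwarding concern in the LAN plus wlan0 scenario can be illustrated with a port-mask calculation. This is a rough sketch under the assumption that the switch chip has already flooded the frame to its own ports; sw_flood_mask() and the bitmask representation are hypothetical and do not reflect how net/dsa or the bridge code are implemented.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Ports the software bridge should flood a frame to, given the ingress
 * port.  Ports on the same hardware switch as the ingress port are
 * skipped, on the assumption that the switch chip has already forwarded
 * the frame to them.
 */
static uint32_t sw_flood_mask(uint32_t bridge_ports, uint32_t hw_switch_ports,
                              unsigned int ingress_port)
{
    uint32_t ingress_bit;
    uint32_t mask;

    assert(ingress_port < 32);
    ingress_bit = UINT32_C(1) << ingress_port;
    mask = bridge_ports & ~ingress_bit;       /* never send back to ingress */
    if (hw_switch_ports & ingress_bit)
        mask &= ~hw_switch_ports;             /* hardware handled these */
    return mask;
}
```

With ports 0 to 3 as the hardware LAN ports and port 4 as wlan0, a frame arriving on a LAN port is flooded in software only to wlan0, while a frame arriving on wlan0 is flooded to all four LAN ports.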


Re: [PATCH 0/4 V13] Avoid soft lockup message when KVM is stopped by host

2012-03-05 Thread Eric B Munson

On Mon, 5 Mar 2012 13:39:43 -0300, Marcelo Tosatti wrote:

On Tue, Feb 14, 2012 at 04:17:20PM -0500, Eric B Munson wrote:

On Tue, 14 Feb 2012, Marcelo Tosatti wrote:

 On Tue, Feb 14, 2012 at 10:50:13AM -0500, Eric B Munson wrote:
  On Tue, 14 Feb 2012, Marcelo Tosatti wrote:

   On Tue, Feb 14, 2012 at 10:29:31AM -0500, Eric B Munson wrote:
On Wed, 08 Feb 2012, Eric B Munson wrote:

 When a guest kernel is stopped by the host hypervisor it can look like a soft
 lockup to the guest kernel.  This false warning can mask later soft lockup
 warnings which may be real.  This patch series adds a method for a host
 hypervisor to communicate to a guest kernel that it is being stopped.  The
 final patch in the series has the watchdog check this flag when it goes to
 issue a soft lockup warning and skip the warning if the guest knows it was
 stopped.

 It was attempted to solve this in Qemu, but the side effects of saving and
 restoring the clock and tsc for each vcpu put the wall clock of the guest behind
 by the amount of time of the pause.  This forces a guest to have ntp running
 in order to keep the wall clock accurate.

Avi,

Is this set fit for merging or is there something else you want changed?

   Eric,

   On Message-ID: 20120210160536.ga23...@amt.cnet, i asked:

   How is the stub getting included for other architectures again?

  Marcelo,

  Sorry, I put out V13 to answer that.  There is a stub in asm-generic that was
  lost in the V11-V12 rebase.  This stub has been included in the V13 set.

  Eric

 Eric,

 I know the stub has been included in the series. But i am asking how
 it is #include'ed for other architectures? (can't see that).

Marcelo,

kernel/watchdog.c now includes linux/kvm_para.h which includes asm/kvm_para.h.
The check_and_clear function is defined in arch include/asm/kvm_para.h or in
asm-generic/kvm_para.h for any arch lacking the specific header in their asm
include dir.  If I have misunderstood how these headers work, please let me
know and I will fix it.


There is no automatic inclusion of asm-generic/ headers. You must create
kvm_para.h in each architecture's include/asm/ directory, #including
asm-generic/kvm_para.h.



Okay, that will go into V16 then...

Eric


Re: [Qemu-devel] KVM call agenda for tuesday 31

2012-03-05 Thread Avi Kivity
On 03/05/2012 05:50 PM, Peter Maydell wrote:
 On 5 March 2012 15:43, Andreas Färber afaer...@suse.de wrote:
  Mid-term also depends on how we want to proceed with LPAE softmmu-wise
  (bump arm to 64-bit target_phys_addr_t, or do LPAE and AArch64 in a
  new arm64).

 For LPAE I would have thought we want to make arm go to a 64 bit
 target_phys_addr_t, since that's exactly what it is: same old
 ARM architecture with wider physical addresses :-)

 I notice that for the architectures we currently have that have
 32 and 64 bit versions we have separate {i386,x86_64}-softmmu,
 {ppc,ppc64}-softmmu, {mips,mips64}-softmmu. What's the advantage
 of separating out the 64 bit flavours that way rather than
 having everything be a single binary?

The registers are smaller; if target_ulong fits in a long then
everything is faster.

Although, you could pretend that target_ulong is 32-bit when in 32-bit
mode, and zero the high half when switching modes, if the target allows
it (I believe i386-x86_64 does, but 8086-i386 does not).
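
The "zero the high half when switching modes" step amounts to masking each register to its low 32 bits. A minimal sketch, assuming the registers are held as 64-bit values; zero_high_halves() is a hypothetical name and not a QEMU function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clear the upper 32 bits of each register on a switch to 32-bit mode. */
static void zero_high_halves(uint64_t *regs, size_t n)
{
    size_t i;

    assert(regs != NULL || n == 0);
    for (i = 0; i < n; i++)
        regs[i] &= UINT64_C(0xffffffff);
}
```

After this step, code running in 32-bit mode can treat the registers as 32-bit quantities without stale upper bits leaking into later 64-bit operations.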

-- 
error compiling committee.c: too many arguments to function



Re: [RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait

2012-03-05 Thread Avi Kivity
On 03/05/2012 06:14 PM, Paolo Bonzini wrote:
 On 05/03/2012 16:14, Avi Kivity wrote:
   Hmm, I don't think so.  It would need to protect execution of the
   iohandlers too, and pretty much everything can happen there including a
   nested loop.  Of course recursive mutexes exist, but it sounds like too
   big an axe.
  The I/O handlers would still use the qemu mutex, no?  we'd just protect
  the select() (taking the mutex from before releasing the global lock,
  and reacquiring it afterwards).

 Yes, that could work, but it is _really_ ugly. 

Yes, it is...

  I still prefer this
 patch or fixing NBD.  At least both contain the hack in a single place.



   I could add a generation count updated by qemu_aio_wait(), and rerun the
   select() only if the generation count changes during its execution.
  
   Or we can call it an NBD bug.  I'm not against that, but it seemed to me
   that the problem is more general.
  What about making sure all callers of qemu_aio_wait() run from
  coroutines (or threads)?  Then they just ask the main thread to wake
  them up, instead of dispatching completions themselves.

 That would open another Pandora's box.  The point of having a separate
 main loop is that only AIO can happen during qemu_aio_wait() or
 qemu_aio_flush().  In particular you don't want the monitor to process
 input while you're running another monitor command.

Hmm, yes, we're abusing the type of completion here as a kind of weird
locking.  It's conceptually broken since an aio completion could trigger
anything.  Usually it just involves block format driver and device code,
but in theory, it can affect the state of whoever's running qemu_aio_wait().

-- 
error compiling committee.c: too many arguments to function



Re: [RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait

2012-03-05 Thread Avi Kivity
On 03/05/2012 04:30 PM, Jan Kiszka wrote:
 On 2012-03-05 15:24, Avi Kivity wrote:
  Long-term, I'd like to cut out certain file descriptors from the main
  loop and process them completely in separate threads (for separate
  locking, prioritization etc.). Dunno how NBD works, but maybe it should
  be reworked like this already.
  
  Ideally qemu_set_fd_handler2() should be made thread local, and each
  device thread would run a copy of the main loop, just working on
  different data.

 qemu_set_fd_handler2 may not only be called over an iothread. Rather, we
 need an object and associated lock that is related to the io-path (i.e.
 frontend device + backend driver). That object has to be passed to
 services like qemu_set_fd_handler2.

Not sure I like implicit lock-taking.  In particular, how does it
interact with unregistering an fd_handler?

-- 
error compiling committee.c: too many arguments to function



Re: weird packet loss between two VMs

2012-03-05 Thread Martin Mailand

Hi Simon,
you are using a 100 Mbit/s NIC and you are trying to send at 600M; try a
1000 Mbit/s NIC on the sending side as well.


-martin

On 05.03.2012 00:57, Simon Chen wrote:

For the two VMs, one is using 100M VNIC, the other is using 1000M one.
The vnet interfaces for the two VMs are put on two bridges on the two
servers, both tap into the second vlan. I then run iperf to send UDP
packets from the 100M VM to the 1000M VM using the following
parameter:
iperf -c 10.6.6.17 -t 30 -i 2 -r -b 600M




Re: [RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait

2012-03-05 Thread Jan Kiszka
On 2012-03-05 18:39, Avi Kivity wrote:
 On 03/05/2012 04:30 PM, Jan Kiszka wrote:
 On 2012-03-05 15:24, Avi Kivity wrote:
 Long-term, I'd like to cut out certain file descriptors from the main
 loop and process them completely in separate threads (for separate
 locking, prioritization etc.). Dunno how NBD works, but maybe it should
 be reworked like this already.

 Ideally qemu_set_fd_handler2() should be made thread local, and each
 device thread would run a copy of the main loop, just working on
 different data.

 qemu_set_fd_handler2 may not only be called over an iothread. Rather, we
 need an object and associated lock that is related to the io-path (i.e.
 frontend device + backend driver). That object has to be passed to
 services like qemu_set_fd_handler2.
 
 Not sure I like implicit lock-taking.  In particular, how does it
 interact with unregistering an fd_handler?

I wasn't suggesting implicit lock taking, just decoupling from our
infamous global lock. My point is that thread-local won't help here.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH 1/1 v3] PCI: Device specific reset function

2012-03-05 Thread Bjorn Helgaas
On Mon, Mar 5, 2012 at 3:00 AM, Tadeusz Struk tadeusz.st...@intel.com wrote:

 ---
  drivers/pci/pci.h    |    1 +
  drivers/pci/quirks.c |   33 +++--
  include/linux/pci.h  |    1 +
  3 files changed, 29 insertions(+), 6 deletions(-)

 diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
 index 1009a5e..4d10479 100644
 --- a/drivers/pci/pci.h
 +++ b/drivers/pci/pci.h
 @@ -315,6 +315,7 @@ struct pci_dev_reset_methods {
        u16 vendor;
        u16 device;
        int (*reset)(struct pci_dev *dev, int probe);
 +       struct list_head list;
  };

  #ifdef CONFIG_PCI_QUIRKS
 diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
 index 6476547..f423d2f 100644
 --- a/drivers/pci/quirks.c
 +++ b/drivers/pci/quirks.c
 @@ -3070,26 +3070,47 @@ static int reset_intel_82599_sfp_virtfn(struct pci_dev *dev, int probe)
  }

  #define PCI_DEVICE_ID_INTEL_82599_SFP_VF   0x10ed
 -
 -static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
 +static struct pci_dev_reset_methods pci_dev_reset_methods[] = {
        { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82599_SFP_VF,
 -                reset_intel_82599_sfp_virtfn },
 +               reset_intel_82599_sfp_virtfn },
        { PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
                reset_intel_generic_dev },
 -       { 0 }
  };

 +static LIST_HEAD(reset_list);
 +
 +void pci_dev_specific_reset_add(struct pci_dev_reset_methods *reset_method)
 +{
 +       INIT_LIST_HEAD(&reset_method->list);
 +       list_add(&reset_method->list, &reset_list);
 +}
 +
 +static int __init pci_dev_specific_reset_init(void)
 +{
 +       int i;
 +
 +       for (i = 0; i < ARRAY_SIZE(pci_dev_reset_methods); i++) {
 +               pci_dev_specific_reset_add(&pci_dev_reset_methods[i]);
 +       }
 +       return 0;
 +}
 +
 +late_initcall(pci_dev_specific_reset_init);
 +
  int pci_dev_specific_reset(struct pci_dev *dev, int probe)
  {
        const struct pci_dev_reset_methods *i;
 +       struct pci_driver *drv = dev->driver;
 +
 +       if (drv && drv->reset)
 +               return drv->reset(dev, probe);

 -       for (i = pci_dev_reset_methods; i->reset; i++) {
 +       list_for_each_entry(i, &reset_list, list) {
                if ((i->vendor == dev->vendor ||
                     i->vendor == (u16)PCI_ANY_ID) &&
                    (i->device == dev->device ||
                     i->device == (u16)PCI_ANY_ID))
                        return i->reset(dev, probe);
        }
 -
        return -ENOTTY;
  }
 diff --git a/include/linux/pci.h b/include/linux/pci.h
 index a16b1df..a3a0bc5 100644
 --- a/include/linux/pci.h
 +++ b/include/linux/pci.h
 @@ -560,6 +560,7 @@ struct pci_driver {
        int  (*resume_early) (struct pci_dev *dev);
        int  (*resume) (struct pci_dev *dev);                   /* Device woken up */
        void (*shutdown) (struct pci_dev *dev);
 +       int  (*reset) (struct pci_dev *dev, int probe); /* Device specific reset */
        struct pci_error_handlers *err_handler;
        struct device_driver    driver;
        struct pci_dynids dynids;

This patch now consists of two pieces:
  1) Convert the reset method table to a list, and
  2) Add the reset function pointer in struct pci_driver and the if
(drv && drv->reset) block.

These should be split into two patches.

After you split them, I don't think you even need part 1, so you
should probably just drop it.

Common practice would be to also include your driver patch that
actually uses the pci_driver.reset pointer as an additional patch in
the same series.  That gives us more confidence that this solution
actually works and will be used.
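
The dispatch order that remains if part 1 is dropped (driver hook first, then the existing static table) can be modelled in user space as follows. This is a sketch for illustration only: the *_sim types, the three reset stubs and their return values are hypothetical, ANY_ID stands in for (u16)PCI_ANY_ID, and 0x8086 is the Intel PCI vendor ID.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_ID 0xffffu  /* stand-in for (u16)PCI_ANY_ID */

struct dev_sim;
typedef int (*reset_fn)(struct dev_sim *dev, int probe);

struct drv_sim { reset_fn reset; };
struct dev_sim { uint16_t vendor, device; struct drv_sim *driver; };
struct reset_method { uint16_t vendor, device; reset_fn reset; };

/* Stub reset handlers; the return values only identify which one ran. */
static int reset_specific(struct dev_sim *d, int probe) { (void)d; (void)probe; return 1; }
static int reset_generic(struct dev_sim *d, int probe)  { (void)d; (void)probe; return 2; }
static int reset_by_driver(struct dev_sim *d, int probe) { (void)d; (void)probe; return 3; }

/* Static table, terminated by an entry with a NULL reset pointer. */
static const struct reset_method methods[] = {
    { 0x8086, 0x10ed, reset_specific },
    { 0x8086, ANY_ID, reset_generic },
    { 0, 0, NULL }
};

static int dev_specific_reset(struct dev_sim *dev, int probe)
{
    const struct reset_method *m;

    assert(dev != NULL);
    /* A reset hook supplied by the bound driver takes precedence. */
    if (dev->driver && dev->driver->reset)
        return dev->driver->reset(dev, probe);

    /* Otherwise fall back to the first matching table entry. */
    for (m = methods; m->reset; m++) {
        if ((m->vendor == dev->vendor || m->vendor == ANY_ID) &&
            (m->device == dev->device || m->device == ANY_ID))
            return m->reset(dev, probe);
    }
    return -ENOTTY;
}
```

In this model the table stays a plain array, so no list conversion or initcall is needed to support the driver hook.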

Bjorn


Re: [Qemu-devel] KVM call agenda for tuesday 31

2012-03-05 Thread Blue Swirl
On Mon, Mar 5, 2012 at 15:17, Avi Kivity a...@redhat.com wrote:
 On 03/05/2012 05:15 PM, Anthony Liguori wrote:
 The other alternative is to s/target_phys_addr_t/uint64_t/ in the memory
 API.  I think 32-on-32 is quite rare these days, so it wouldn't be much
 of a performance issue.


 I think this makes sense independent of other discussions regarding
 fixing target_phys_addr_t size.

 Hardware addresses should be independent of the target.  If we wanted
 to use a hw_addr_t that would be okay too.


 Would this hw_addr (s/_t$//, or you'll be Blued) be fixed at uint64_t

Malced? Posixed?

 (and thus only documentary), or also subject to multiple compilation?

In real world CPU physical addresses, bus addresses and device
addresses need not have anything in common. The best would be if we
could have devices with 10-bit addresses mixing freely with 32 bit
buses and 36 bit CPU physical addresses. The next best thing probably
is to fix all of them to shortest possible reasonable value, like now.
Fixing all of them to 64 bits would simplify things a lot if we no
longer care about the small performance loss on 32 bit hosts.

 --
 error compiling committee.c: too many arguments to function




Re: [Qemu-devel] KVM call agenda for tuesday 31

2012-03-05 Thread malc
On Mon, 5 Mar 2012, Blue Swirl wrote:

 On Mon, Mar 5, 2012 at 15:17, Avi Kivity a...@redhat.com wrote:
  On 03/05/2012 05:15 PM, Anthony Liguori wrote:
  The other alternative is to s/target_phys_addr_t/uint64_t/ in the memory
  API.  I think 32-on-32 is quite rare these days, so it wouldn't be much
  of a performance issue.
 
 
  I think this makes sense independent of other discussions regarding
  fixing target_phys_addr_t size.
 
  Hardware addresses should be independent of the target.  If we wanted
  to use a hw_addr_t that would be okay too.
 
 
  Would this hw_addr (s/_t$//, or you'll be Blued) be fixed at uint64_t
 
 Malced? Posixed?

Heh, a_moo would be Malced, no _t is Posixed indeed.

-- 
mailto:av1...@comtv.ru

Re: [PATCH] KVM: PPC: Save/Restore CR over vcpu_run

2012-03-05 Thread Scott Wood
On 03/05/2012 10:02 AM, Alexander Graf wrote:
 @@ -442,6 +444,7 @@ heavyweight_exit:
  
   /* Return to kvm_vcpu_run(). */
   mtlr    r5
 + mtcr    r6
   addi    r1, r1, HOST_STACK_SIZE
   /* r3 still contains the return code from kvmppc_handle_exit(). */
   blr
 @@ -459,6 +462,9 @@ _GLOBAL(__kvmppc_vcpu_run)
   mflr    r3
   PPC_STL r3, HOST_STACK_LR(r1)
  
 + mfcr    r5
 + stw r5, HOST_CR(r1)

If you move the mfcr before the PPC_STL they should be able to run in
parallel.  Otherwise on e500mc mfcr will wait for PPC_STL to take its 3
cycles and then mfcr will take 5 cycles before the stw of HOST_CR.
Alternatively, consider using mcrf/mtocrf three times.

Similar issues in booke_interrupts.S (except we can't assume mtocrf
exists there), but I'm less worried about that one as it still needs an
optimization pass in general.

-Scott



[Bug 42829] KVM Guest with virtio network driver loses network connectivity

2012-03-05 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42829


Steve stefan.bo...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Kernel Version|v3.0-rc5                    |v3.0-rc1




-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.


[Bug 42829] KVM Guest with virtio network driver loses network connectivity

2012-03-05 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42829


Steve stefan.bo...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Kernel Version|v3.0-rc1                    |v3.0-rc1+
           Severity|high                        |blocking




--- Comment #11 from Steve stefan.bo...@gmail.com  2012-03-06 02:26:16 ---
I found the bad commit.

git bisect log:
---
git bisect start
# bad: [550cf00dbc8ee402bef71628cb71246493dd4500] Merge tag 'mmc-fixes-for-3.3'
of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc
git bisect bad 550cf00dbc8ee402bef71628cb71246493dd4500
# good: [61c4f2c81c61f73549928dfd9f3e8f26aa36a8cf] Linux 2.6.39
git bisect good 61c4f2c81c61f73549928dfd9f3e8f26aa36a8cf
# bad: [8a9ea3237e7eb5c25f09e429ad242ae5a3d5ea22] Merge
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect bad 8a9ea3237e7eb5c25f09e429ad242ae5a3d5ea22
# bad: [95a943c162d74b20d869917bdf5df11293c35b63] Merge branch 'master' of
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into
for-davem
git bisect bad 95a943c162d74b20d869917bdf5df11293c35b63
# good: [98b98d316349e9a028e632629fe813d07fa5afdd] Merge branch 'drm-core-next'
of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
git bisect good 98b98d316349e9a028e632629fe813d07fa5afdd
# bad: [1f6e44a6dc21a5d2abb068063acbbf64f8cee548] pxa168_eth: enable transmit
time stamping.
git bisect bad 1f6e44a6dc21a5d2abb068063acbbf64f8cee548
# good: [19de85ef574c3a2182e3ccad9581805052f14946] bitops: add #ifndef for each
of find bitops
git bisect good 19de85ef574c3a2182e3ccad9581805052f14946
# good: [c320afe965bf3f857249d223801d8f2fc95615c2] Blackfin: debug-mmrs:
include RSI_PID[4567] MMRs
git bisect good c320afe965bf3f857249d223801d8f2fc95615c2
# bad: [23c79d31a3dd2602ee1a5ff31303b2d7a2d3c159] Merge
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
git bisect bad 23c79d31a3dd2602ee1a5ff31303b2d7a2d3c159
# good: [cd1acdf1723d71b28175f95b04305f1cc74ce363] Merge branch 'pnfs-submit'
of git://git.open-osd.org/linux-open-osd
git bisect good cd1acdf1723d71b28175f95b04305f1cc74ce363
# bad: [cd4ecf877a4d629c38571405fd649077c12dec50] Merge branch
'rmobile-fixes-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6
git bisect bad cd4ecf877a4d629c38571405fd649077c12dec50
# bad: [5c6cce92bc8aee751aafe82c5d9caf7553226a3d] Merge branch 'for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
git bisect bad 5c6cce92bc8aee751aafe82c5d9caf7553226a3d
# bad: [8ea8cf89e19aeb596b818ee5f2bec8a8b0586b60] vhost: support event index
git bisect bad 8ea8cf89e19aeb596b818ee5f2bec8a8b0586b60
# good: [bc805a03c26e1e25171bc627c6264553d27f746c] lguest: fix up compilation
after move
git bisect good bc805a03c26e1e25171bc627c6264553d27f746c
# good: [bf50e69f63d21091e525185c3ae761412be0ba72] virtio balloon: kill
tell-host-first logic
git bisect good bf50e69f63d21091e525185c3ae761412be0ba72
# good: [770b31a85e000b0194974922f238a30ade4246b6] virtio: event index
interface
git bisect good 770b31a85e000b0194974922f238a30ade4246b6
# bad: [a5c262c5fd83ece01bd649fb08416c501d4c59d7] virtio_ring: support event
idx feature
git bisect bad a5c262c5fd83ece01bd649fb08416c501d4c59d7
# good: [bf7035bf20563a6cadcb9e870406e7b21daf5e30] virtio ring: inline function
to check for events
git bisect good bf7035bf20563a6cadcb9e870406e7b21daf5e30


git bisect message:
===
a5c262c5fd83ece01bd649fb08416c501d4c59d7 is the first bad commit
commit a5c262c5fd83ece01bd649fb08416c501d4c59d7
Author: Michael S. Tsirkin m...@redhat.com
Date:   Fri May 20 02:10:44 2011 +0300

virtio_ring: support event idx feature

Support for the new event idx feature:
1. When enabling interrupts, publish the current avail index
   value to the host to get interrupts on the next update.
2. Use the new avail_event feature to reduce the number
   of exits from the guest.

Simple test with the simulator:

[virtio]# time ./virtio_test
spurious wakeus: 0x7

real    0m0.169s
user    0m0.140s
sys     0m0.019s
[virtio]# time ./virtio_test --no-event-idx
spurious wakeus: 0x11

real    0m0.649s
user    0m0.295s
sys     0m0.335s

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Rusty Russell ru...@rustcorp.com.au

:040000 040000 933903414419858cf7402aa3fb8c3f675d6ab7cc
0ed603da4671eef88e0702e6438e903b56688b62 M  drivers



I found the bug in include/linux/virtio_ring.h:
===

virtio: event index interface
author     Michael S. Tsirkin m...@redhat.com
           Thu, 19 May 2011 23:10:17 +0000 (02:10 +0300)
committer  Rusty Russell ru...@rustcorp.com.au
           Mon, 30 May 2011 01:44:14 +0000 (10:44 +0930)
commit     770b31a85e000b0194974922f238a30ade4246b6
tree       eed81e23f3116858b49af76bcc5831c38662de96

[Bug 42829] KVM Guest with virtio network driver loses network connectivity

2012-03-05 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42829


Jason Wang jasow...@redhat.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jasow...@redhat.com




--- Comment #12 from Jason Wang jasow...@redhat.com  2012-03-06 02:56:08 ---
(In reply to comment #11)

Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware

2012-03-05 Thread John Fastabend
On 3/5/2012 8:53 AM, Lennert Buytenhek wrote:
 On Tue, Feb 28, 2012 at 08:40:06PM -0800, John Fastabend wrote:
 
 Also if there are embedded switches with learning capabilities they
 might want to trigger events to user space. In this case having
 a protocol type makes user space a bit easier to manage. I've
 added Lennert so maybe he can comment I think the Marvell chipsets
 might support something along these lines. The SR-IOV chipsets I'm
 aware of _today_ don't do learning. Learning makes the event model
 more plausible.
 
 net/dsa currently configures any switch chips in the system to do
 auto-learning.  However, I would much prefer to disable that, and have
 the switch chip just pass up packets for new source addresses, have
 Linux do the learning, and then mirror the Linux software FDB into
 the hardware instead -- that avoids having to manually flush the
 hardware FDB on certain STP state transitions or having to configure
 the hardware to use a shorter address learning timeout when we're in
 the middle of an STP topology change, which are problems we are
 running into in practice.
 

Great. And the plan is we should be able to use the same daemon with
minimal changes (currently a flag) to control both sw and hw bridges.

 Just curious -- while your patches allow propagating FDB entries
 into the hardware, do you also have hooks to tell the hardware which
 ports are to share address databases?
 

Not in the current patches. I don't have hardware right now
that can instantiate multiple bridges. When I get some I was hoping
to do something similar to this patch and use netlink commands
to create/delete bridges and add/remove ports to them. This would
be modifying the existing commands to work for both software and
hardware bridges.

By a bridge instantiation I mean a shared address database in this case.

 For net/dsa, we currently have:
 
   http://patchwork.ozlabs.org/patch/16578/
 
 While I think this is conceptually sound, the implementation is hacky,
 and I wonder how you've solved it for your setup, and if DSA can
 piggy-back off that.

Yep anything we come up with should work in both cases.


RE: [PATCH] KVM: expose Intel cpu new features to guest

2012-03-05 Thread Liu, Jinsong
Avi,

Any comments?

Thanks,
Jinsong

Liu, Jinsong wrote:
 From ecd8be962f69393c183f941bfdbd7a7d3876d442 Mon Sep 17 00:00:00 2001
 From: Liu, Jinsong jinsong@intel.com
 Date: Mon, 27 Feb 2012 05:19:32 +0800
 Subject: [PATCH] KVM: expose Intel cpu new features to guest
 
 Intel recently released two new features, HLE and RTM.
 Refer to http://software.intel.com/file/41417.
 This patch exposes them to the guest.
 
 Signed-off-by: Liu, Jinsong jinsong@intel.com
 ---
  arch/x86/include/asm/cpufeature.h |    2 ++
  arch/x86/kvm/cpuid.c              |    3 ++-
  2 files changed, 4 insertions(+), 1 deletions(-)
 
 diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
 index 17c5d4b..e8d12a8 100644
 --- a/arch/x86/include/asm/cpufeature.h
 +++ b/arch/x86/include/asm/cpufeature.h
 @@ -198,10 +198,12 @@
  /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */
  #define X86_FEATURE_FSGSBASE (9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/
  #define X86_FEATURE_BMI1     (9*32+ 3) /* 1st group bit manipulation extensions */
 +#define X86_FEATURE_HLE      (9*32+ 4) /* Hardware Lock Elision */
  #define X86_FEATURE_AVX2     (9*32+ 5) /* AVX2 instructions */
  #define X86_FEATURE_SMEP     (9*32+ 7) /* Supervisor Mode Execution Protection */
  #define X86_FEATURE_BMI2     (9*32+ 8) /* 2nd group bit manipulation extensions */
  #define X86_FEATURE_ERMS     (9*32+ 9) /* Enhanced REP MOVSB/STOSB */
 +#define X86_FEATURE_RTM      (9*32+11) /* Restricted Transactional Memory */
  
  #if defined(__KERNEL__) && !defined(__ASSEMBLY__)
 
 diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
 index 9fed5be..c2134b8 100644
 --- a/arch/x86/kvm/cpuid.c
 +++ b/arch/x86/kvm/cpuid.c
 @@ -247,7 +247,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
   /* cpuid 7.0.ebx */
   const u32 kvm_supported_word9_x86_features =
 - F(FSGSBASE) | F(BMI1) | F(AVX2) | F(SMEP) | F(BMI2) | F(ERMS);
 + F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
 + F(BMI2) | F(ERMS) | F(RTM);
 
   /* all calls to cpuid_count() should be made on the same cpu */
   get_cpu();
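
For readers not familiar with the encoding used in the patch above: each X86_FEATURE_* constant packs a word index and a bit position as word*32 + bit, and word 9 corresponds to CPUID leaf 7, subleaf 0, register EBX. The following is only a standalone sketch of that arithmetic, not kernel code; the helper names feature_word, feature_bit and word9_mask are made up for illustration.

```c
#include <stdint.h>

/* Values copied from the patch: word 9 is CPUID.(EAX=7,ECX=0):EBX. */
#define X86_FEATURE_HLE (9 * 32 + 4)   /* Hardware Lock Elision */
#define X86_FEATURE_RTM (9 * 32 + 11)  /* Restricted Transactional Memory */

/* Hypothetical helpers, for illustration only. */
static inline unsigned feature_word(unsigned feature) { return feature / 32; }
static inline unsigned feature_bit(unsigned feature)  { return feature % 32; }

/* Mask of a feature inside its 32-bit CPUID register value. */
static inline uint32_t word9_mask(unsigned feature)
{
    return (uint32_t)1 << feature_bit(feature);
}
```

With these definitions, HLE maps to mask 0x10 and RTM to mask 0x800 within EBX, which is what OR-ing F(HLE) and F(RTM) into kvm_supported_word9_x86_features amounts to.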



Re: [PATCH] KVM: Ensure all vcpus are consistent with in-kernel irqchip settings

2012-03-05 Thread Michael Ellerman
On Mon, 2012-03-05 at 14:29 +0200, Avi Kivity wrote:
 If some vcpus are created before KVM_CREATE_IRQCHIP, then
 irqchip_in_kernel() and vcpu->arch.apic will be inconsistent, leading
 to potential NULL pointer dereferences.
 
 Fix by:
 - ensuring that no vcpus are installed when KVM_CREATE_IRQCHIP is called
 - ensuring that a vcpu has an apic if it is installed after KVM_CREATE_IRQCHIP
 
 This is somewhat long winded because vcpu->arch.apic is created without
 kvm->lock held.

Hi Avi,

Thanks for following up on this. This looks OK to me.

I wonder if we will end up needing to add other sanity tests at the same
point, ie. when we install the vcpu, in which case we might need a
generic sanity hook. But better to keep it specific until we need
something generalised.

When we do irqchip-in-kernel on powerpc we'll need to rework the #ifdef
in kvm_host.h, because we don't want CONFIG_KVM_APIC_ARCHITECTURE, but
we will need our own kvm_vcpu_compatible(). But again we'll do that at
the time.
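
To make the consistency rule under discussion concrete, here is a minimal sketch of the kind of check a kvm_vcpu_compatible() hook could perform. It is a simplified model under stated assumptions, not the code from Avi's patch: struct model_kvm, struct model_vcpu and model_vcpu_compatible are hypothetical stand-ins for the kernel structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; not the real definitions. */
struct model_kvm  { bool irqchip_in_kernel; };
struct model_vcpu { struct model_kvm *kvm; void *apic; };

/*
 * A vcpu is consistent with its VM when it has an APIC exactly if the VM
 * uses an in-kernel irqchip. Any mismatch is the state that could lead to
 * a NULL pointer dereference later.
 */
static bool model_vcpu_compatible(const struct model_vcpu *vcpu)
{
    return (vcpu->apic != NULL) == vcpu->kvm->irqchip_in_kernel;
}
```

A powerpc variant would presumably keep the same shape and swap the APIC test for whatever per-vcpu interrupt controller state applies there.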

cheers



signature.asc
Description: This is a digitally signed message part


[Bug 42829] KVM Guest with virtio network driver loses network connectivity

2012-03-05 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42829





--- Comment #13 from Steve stefan.bo...@gmail.com  2012-03-06 07:28:59 ---
Hello.

I started testing from the latest master branch (v3.3-rc6+) on both host and
guest. During all tests the host kept the same kernel and other software; on
the guest I changed only the kernel version by git bisecting. I did not change
any code; my suggestion is only a hint and could be wrong. I believe I have
provided sufficient information to locate the bug in the code.

Answer to your question about containing the fixes:
---
git branch --contains=a72caae21803b74e04e2afda5e035f149d4ea118
* master

git branch --contains=4dbc5d9f4f791df8a5879f4a655f517adc7f56d1
* master

Let me know how I can help (if needed) to fix this issue
as soon as possible.

Thank you for your time.



[PATCH] KVM: PPC: Book 3S: Fix compilation for !HV configs

2012-03-05 Thread Paul Mackerras
Commits 2f5cdd5487 (KVM: PPC: Book3S HV: Make secondary threads more
robust against stray IPIs) and 1c2066b0f7 (KVM: PPC: Book3S HV: Make
virtual processor area registration more robust) added fields to
struct kvm_vcpu_arch inside #ifdef CONFIG_KVM_BOOK3S_64_HV regions,
and added lines to arch/powerpc/kernel/asm-offsets.c to generate
assembler constants for their offsets.  Unfortunately this led to
compile errors on Book 3S machines for configs that had KVM enabled
but not CONFIG_KVM_BOOK3S_64_HV.  This fixes the problem by moving
the offending lines inside #ifdef CONFIG_KVM_BOOK3S_64_HV regions.

Signed-off-by: Paul Mackerras pau...@samba.org
---
This is against the kvm-ppc-next branch of
git://github.com/agraf/linux-2.6.git.

 arch/powerpc/kernel/asm-offsets.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 8c8b2ce..86d43cc 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -464,6 +464,7 @@ int main(void)
DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
+   DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
 #endif
 #ifdef CONFIG_PPC_BOOK3S
DEFINE(VCPU_VCPUID, offsetof(struct kvm_vcpu, vcpu_id));
@@ -480,7 +481,6 @@ int main(void)
DEFINE(VCPU_PENDING_EXC, offsetof(struct kvm_vcpu, 
arch.pending_exceptions));
DEFINE(VCPU_CEDED, offsetof(struct kvm_vcpu, arch.ceded));
DEFINE(VCPU_PRODDED, offsetof(struct kvm_vcpu, arch.prodded));
-   DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
DEFINE(VCPU_MMCR, offsetof(struct kvm_vcpu, arch.mmcr));
DEFINE(VCPU_PMC, offsetof(struct kvm_vcpu, arch.pmc));
DEFINE(VCPU_SLB, offsetof(struct kvm_vcpu, arch.slb));
@@ -554,10 +554,10 @@ int main(void)
HSTATE_FIELD(HSTATE_IN_GUEST, in_guest);
HSTATE_FIELD(HSTATE_RESTORE_HID5, restore_hid5);
HSTATE_FIELD(HSTATE_NAPPING, napping);
-   HSTATE_FIELD(HSTATE_HWTHREAD_REQ, hwthread_req);
-   HSTATE_FIELD(HSTATE_HWTHREAD_STATE, hwthread_state);
 
 #ifdef CONFIG_KVM_BOOK3S_64_HV
+   HSTATE_FIELD(HSTATE_HWTHREAD_REQ, hwthread_req);
+   HSTATE_FIELD(HSTATE_HWTHREAD_STATE, hwthread_state);
HSTATE_FIELD(HSTATE_KVM_VCPU, kvm_vcpu);
HSTATE_FIELD(HSTATE_KVM_VCORE, kvm_vcore);
HSTATE_FIELD(HSTATE_XICS_PHYS, xics_phys);
-- 
1.7.9.1



RE: [Qemu-devel] [PATCH 2/2] Expose tsc deadline timer cpuid to guest

2012-03-05 Thread Liu, Jinsong
Jan,

Any comments? I am somewhat confused by your point 'disable cpuid feature for 
older machine types by default': are you planning a common approach for this 
general issue, or are you asking me for a specific solution for the tsc 
deadline timer case?

Thanks,
Jinsong


Liu, Jinsong wrote:
 My point is that
 
   qemu-version-A [-cpu whatever]
 
 should provide the same VM as
 
   qemu-version-B -machine pc-A [-cpu whatever]
 
 specifically if you leave out the cpu specification.
 
 So the compat machine could establish a feature mask (e.g. append
 some -tsc_deadline in this case). But, indeed, we need a new
 channel for this. 
 
 
 Yes, if such a requirement needs to be satisfied, I agree we need a new
 channel to solve this kind of common issue. 
 
 As for exposing the tsc deadline timer feature, I wrote an updated patch,
 attached. 1) It exposes the tsc deadline timer feature to the guest if
 the in-kernel irqchip is used and kvm emulates the tsc deadline timer;
 2) it also lets the user control exposure of the feature via a cpu
 feature flag.
 
 Thanks,
 Jinsong
 
 
 From 5b7d5f459b621686e78e437010ce34748bcb9e8e Mon Sep 17 00:00:00 2001
 From: Liu, Jinsong jinsong@intel.com
 Date: Wed, 29 Feb 2012 01:53:15 +0800
 Subject: [PATCH] Expose tsc deadline timer feature to guest
 
 It exposes tsc deadline timer feature to guest if in-kernel irqchip
 is used 
 and kvm has emulated tsc deadline timer.
 It also authorizes user to control the feature exposing via a cpu
 feature flag. 
 
 Signed-off-by: Liu, Jinsong jinsong@intel.com
 ---
  target-i386/cpu.h   |    1 +
  target-i386/cpuid.c |    2 +-
  target-i386/kvm.c   |    4 ++++
  3 files changed, 6 insertions(+), 1 deletions(-)
 
 diff --git a/target-i386/cpu.h b/target-i386/cpu.h
 index d92be5d..3409afe 100644
 --- a/target-i386/cpu.h
 +++ b/target-i386/cpu.h
 @@ -399,6 +399,7 @@
  #define CPUID_EXT_X2APIC   (1 << 21)
  #define CPUID_EXT_MOVBE    (1 << 22)
  #define CPUID_EXT_POPCNT   (1 << 23)
 +#define CPUID_EXT_TSC_DEADLINE_TIMER (1 << 24)
  #define CPUID_EXT_XSAVE    (1 << 26)
  #define CPUID_EXT_OSXSAVE  (1 << 27)
  #define CPUID_EXT_HYPERVISOR  (1 << 31)
 diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
 index b9bfeaf..ac4b79c 100644
 --- a/target-i386/cpuid.c
 +++ b/target-i386/cpuid.c
 @@ -50,7 +50,7 @@ static const char *ext_feature_name[] = {
      "fma", "cx16", "xtpr", "pdcm",
      NULL, NULL, "dca", "sse4.1|sse4_1",
      "sse4.2|sse4_2", "x2apic", "movbe", "popcnt",
 -    NULL, "aes", "xsave", "osxsave",
 +    "tsc_deadline", "aes", "xsave", "osxsave",
      "avx", NULL, NULL, "hypervisor",
  };
  static const char *ext2_feature_name[] = {
 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index 7079e87..2639699 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -370,6 +370,10 @@ int kvm_arch_init_vcpu(CPUState *env)
      i = env->cpuid_ext_features & CPUID_EXT_HYPERVISOR;
      env->cpuid_ext_features = kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
      env->cpuid_ext_features |= i;
 +    if (!kvm_irqchip_in_kernel() ||
 +        !kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER)) {
 +        env->cpuid_ext_features &= ~CPUID_EXT_TSC_DEADLINE_TIMER;
 +    }
  
      env->cpuid_ext2_features = kvm_arch_get_supported_cpuid(s, 0x80000001,
                                                              0, R_EDX);
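
The filtering step in the kvm.c hunk can be modelled in isolation as below. This is a sketch only: filter_tsc_deadline is a hypothetical helper name, and the two booleans stand in for kvm_irqchip_in_kernel() and the KVM_CAP_TSC_DEADLINE_TIMER check. The bit position (CPUID.01H:ECX bit 24) is the one the patch defines.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_EXT_TSC_DEADLINE_TIMER (1u << 24)  /* CPUID.01H:ECX bit 24 */

/*
 * Hypothetical helper: clear the TSC deadline bit from the ECX feature
 * word unless both the in-kernel irqchip and the KVM emulation of the
 * timer are available. All other feature bits pass through untouched.
 */
static uint32_t filter_tsc_deadline(uint32_t ext_features,
                                    bool irqchip_in_kernel,
                                    bool kvm_has_tsc_deadline)
{
    if (!irqchip_in_kernel || !kvm_has_tsc_deadline) {
        ext_features &= ~CPUID_EXT_TSC_DEADLINE_TIMER;
    }
    return ext_features;
}
```

Note that this only ever removes the bit. Whether it is set in the first place still depends on the supported-cpuid query and on any user-supplied cpu feature flag.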



Re: [PATCH 13/38] KVM: PPC: booke: category E.HV (GS-mode) support

2012-03-05 Thread tiejun.chen
 +/*
 + * Host interrupt handlers may have clobbered these guest-readable
 + * SPRGs, so we need to reload them here with the guest's values.
 + */
 +lwz r3, VCPU_VRSAVE(r4)
 +lwz r5, VCPU_SHARED_SPRG4(r11)
 +mtspr   SPRN_VRSAVE, r3
 +lwz r6, VCPU_SHARED_SPRG5(r11)
 +mtspr   SPRN_SPRG4W, r5
 +lwz r7, VCPU_SHARED_SPRG6(r11)
 +mtspr   SPRN_SPRG5W, r6
 +lwz r8, VCPU_SHARED_SPRG7(r11)
 +mtspr   SPRN_SPRG6W, r7
 +mtspr   SPRN_SPRG7W, r8
 +

That should be here.

 +/* Load some guest volatiles. */
 +PPC_LL  r3, VCPU_LR(r4)
 +PPC_LL  r5, VCPU_XER(r4)
 +PPC_LL  r6, VCPU_CTR(r4)
 +PPC_LL  r7, VCPU_CR(r4)
 +PPC_LL  r8, VCPU_PC(r4)
 +#ifndef CONFIG_64BIT
 +lwz r9, (VCPU_SHARED_MSR + 4)(r11)
 +#else
 +ld  r9, (VCPU_SHARED_MSR)(r11)
 +#endif
 +PPC_LL  r0, VCPU_GPR(r0)(r4)
 +PPC_LL  r1, VCPU_GPR(r1)(r4)
 +PPC_LL  r2, VCPU_GPR(r2)(r4)
 +PPC_LL  r10, VCPU_GPR(r10)(r4)
 +PPC_LL  r11, VCPU_GPR(r11)(r4)
 +PPC_LL  r12, VCPU_GPR(r12)(r4)
 +PPC_LL  r13, VCPU_GPR(r13)(r4)
 +mtlr    r3
 +mtxer   r5
 +mtctr   r6
 +mtcr    r7
 +mtsrr0  r8
 +mtsrr1  r9
 +
 +#ifdef CONFIG_KVM_EXIT_TIMING
 +/* save enter time */
 +1:
 +mfspr   r6, SPRN_TBRU
 +mfspr   r7, SPRN_TBRL
 +mfspr   r8, SPRN_TBRU
 +cmpw    r8, r6
 
 Shouldn't we load the guest CR after this? Otherwise this can corrupt it.

I think this is a typo, since in our previous kvm implementation we
always collected the kvm exit timing at the location marked above :)

Tiejun

 
 Thanks
 -Bharat
 
 +PPC_STL r7, VCPU_TIMING_LAST_ENTER_TBL(r4)
 +bne 1b  
 +PPC_STL r8, VCPU_TIMING_LAST_ENTER_TBU(r4)
 +#endif
 +
 +/* Finish loading guest volatiles and jump to guest. */
 +PPC_LL  r5, VCPU_GPR(r5)(r4)
 +PPC_LL  r6, VCPU_GPR(r6)(r4)
 +PPC_LL  r7, VCPU_GPR(r7)(r4)
 +PPC_LL  r8, VCPU_GPR(r8)(r4)
 +PPC_LL  r9, VCPU_GPR(r9)(r4)
 +
 +PPC_LL  r3, VCPU_GPR(r3)(r4)
 +PPC_LL  r4, VCPU_GPR(r4)(r4)
 +rfi
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Restore guest CR after exit timing calculation

2012-03-05 Thread Bharat Bhushan
No instruction that can change the Condition Register (CR) should be executed 
after the guest CR is loaded. The exit timing code in lightweight_exit executes 
cmpw, which can clobber CR, so restore the guest CR after it.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
This patch is against e500mc branch.

 arch/powerpc/kvm/bookehv_interrupts.S |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S 
b/arch/powerpc/kvm/bookehv_interrupts.S
index 63fc5f0..6b9389f 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -574,7 +574,6 @@ lightweight_exit:
mtlr    r3
mtxer   r5
mtctr   r6
-   mtcr    r7
mtsrr0  r8
mtsrr1  r9
 
@@ -582,14 +581,20 @@ lightweight_exit:
/* save enter time */
 1:
mfspr   r6, SPRN_TBRU
-   mfspr   r7, SPRN_TBRL
+   mfspr   r9, SPRN_TBRL
mfspr   r8, SPRN_TBRU
cmpw    r8, r6
-   PPC_STL r7, VCPU_TIMING_LAST_ENTER_TBL(r4)
+   PPC_STL r9, VCPU_TIMING_LAST_ENTER_TBL(r4)
bne 1b
PPC_STL r8, VCPU_TIMING_LAST_ENTER_TBU(r4)
 #endif
 
+   /*
+* Don't execute any instruction which can change CR after
+* below instruction.
+*/
+   mtcr    r7
+
/* Finish loading guest volatiles and jump to guest. */
PPC_LL  r5, VCPU_GPR(r5)(r4)
PPC_LL  r6, VCPU_GPR(r6)(r4)
-- 
1.7.0.4
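
For context on why the timing code needs a compare at all: on a 32-bit core the 64-bit timebase is read as two halves, so the sequence reads upper, lower, upper again and retries if the upper half changed in between. A rough C model of that loop follows. It is a sketch under assumptions: the two pointers stand in for mfspr reads of SPRN_TBRU and SPRN_TBRL, and read_timebase is a made-up name.

```c
#include <stdint.h>

/*
 * Read a 64-bit counter exposed as two 32-bit halves. If the upper half
 * changes between the two reads of it, the lower half wrapped in between
 * and the sample is retried.
 */
static uint64_t read_timebase(const volatile uint32_t *tbu,
                              const volatile uint32_t *tbl)
{
    uint32_t hi, lo, hi2;

    do {
        hi  = *tbu;
        lo  = *tbl;
        hi2 = *tbu;
    } while (hi != hi2);  /* corresponds to the cmpw/bne pair, which writes CR */

    return ((uint64_t)hi2 << 32) | lo;
}
```

The comparison in the loop condition is the step that modifies CR in the assembly version, which is why the patch moves the mtcr of the guest value after it.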




[PATCH] KVM: PPC: Save/Restore CR over vcpu_run

2012-03-05 Thread Alexander Graf
On PPC, CR2-CR4 are nonvolatile, thus have to be saved across function calls.
We didn't respect that for any architecture until Paul spotted it in his
patch for Book3S-HV. This patch saves/restores CR for all KVM capable PPC hosts.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_interrupts.S  |    7 +++++++
 arch/powerpc/kvm/booke_interrupts.S   |    7 ++++++-
 arch/powerpc/kvm/bookehv_interrupts.S |    8 +++++++-
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_interrupts.S 
b/arch/powerpc/kvm/book3s_interrupts.S
index 0a8515a..3e35383 100644
--- a/arch/powerpc/kvm/book3s_interrupts.S
+++ b/arch/powerpc/kvm/book3s_interrupts.S
@@ -84,6 +84,10 @@ kvm_start_entry:
/* Save non-volatile registers (r14 - r31) */
SAVE_NVGPRS(r1)
 
+   /* Save CR */
+   mfcr    r14
+   stw r14, _CCR(r1)
+
/* Save LR */
PPC_STL r0, _LINK(r1)
 
@@ -165,6 +169,9 @@ kvm_exit_loop:
PPC_LL  r4, _LINK(r1)
mtlr    r4
 
+   lwz r14, _CCR(r1)
+   mtcr    r14
+
/* Restore non-volatile host registers (r14 - r31) */
REST_NVGPRS(r1)
 
diff --git a/arch/powerpc/kvm/booke_interrupts.S 
b/arch/powerpc/kvm/booke_interrupts.S
index 10d8ef6..c8c4b87 100644
--- a/arch/powerpc/kvm/booke_interrupts.S
+++ b/arch/powerpc/kvm/booke_interrupts.S
@@ -34,7 +34,8 @@
 /* r2 is special: it holds 'current', and it made nonvolatile in the
  * kernel with the -ffixed-r2 gcc option. */
 #define HOST_R2 12
-#define HOST_NV_GPRS    16
+#define HOST_CR         16
+#define HOST_NV_GPRS    20
 #define HOST_NV_GPR(n)  (HOST_NV_GPRS + ((n - 14) * 4))
 #define HOST_MIN_STACK_SIZE (HOST_NV_GPR(31) + 4)
 #define HOST_STACK_SIZE (((HOST_MIN_STACK_SIZE + 15) / 16) * 16) /* Align. */
@@ -296,8 +297,10 @@ heavyweight_exit:
 
/* Return to kvm_vcpu_run(). */
lwz r4, HOST_STACK_LR(r1)
+   lwz r5, HOST_CR(r1)
addi    r1, r1, HOST_STACK_SIZE
mtlr    r4
+   mtcr    r5
/* r3 still contains the return code from kvmppc_handle_exit(). */
blr
 
@@ -314,6 +317,8 @@ _GLOBAL(__kvmppc_vcpu_run)
stw r3, HOST_RUN(r1)
mflr    r3
stw r3, HOST_STACK_LR(r1)
+   mfcr    r5
+   stw r5, HOST_CR(r1)
 
/* Save host non-volatile register state to stack. */
stw r14, HOST_NV_GPR(r14)(r1)
diff --git a/arch/powerpc/kvm/bookehv_interrupts.S 
b/arch/powerpc/kvm/bookehv_interrupts.S
index 63fc5f0..3989b5a 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -49,7 +49,8 @@
  * kernel with the -ffixed-r2 gcc option.
  */
 #define HOST_R2 (3 * LONGBYTES)
-#define HOST_NV_GPRS    (4 * LONGBYTES)
+#define HOST_CR         (4 * LONGBYTES)
+#define HOST_NV_GPRS    (5 * LONGBYTES)
 #define HOST_NV_GPR(n)  (HOST_NV_GPRS + ((n - 14) * LONGBYTES))
 #define HOST_MIN_STACK_SIZE (HOST_NV_GPR(31) + LONGBYTES)
 #define HOST_STACK_SIZE ((HOST_MIN_STACK_SIZE + 15) & ~15) /* Align. */
@@ -396,6 +397,7 @@ skip_nv_load:
 heavyweight_exit:
/* Not returning to guest. */
PPC_LL  r5, HOST_STACK_LR(r1)
+   lwz r6, HOST_CR(r1)
 
/*
 * We already saved guest volatile register state; now save the
@@ -442,6 +444,7 @@ heavyweight_exit:
 
/* Return to kvm_vcpu_run(). */
mtlr    r5
+   mtcr    r6
addi    r1, r1, HOST_STACK_SIZE
/* r3 still contains the return code from kvmppc_handle_exit(). */
blr
@@ -459,6 +462,9 @@ _GLOBAL(__kvmppc_vcpu_run)
mflr    r3
PPC_STL r3, HOST_STACK_LR(r1)
 
+   mfcr    r5
+   stw r5, HOST_CR(r1)
+
/* Save host non-volatile register state to stack. */
PPC_STL r14, HOST_NV_GPR(r14)(r1)
PPC_STL r15, HOST_NV_GPR(r15)(r1)
-- 
1.6.0.2
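
As a quick check of the booke_interrupts.S stack layout change above, the offset arithmetic can be worked through in plain C. This is only a sketch of the macro arithmetic for the 32-bit case (4-byte slots); host_nv_gpr_offset and host_stack_size are made-up helper names.

```c
/* Offset of saved non-volatile GPR n (r14..r31), given the offset of r14. */
static int host_nv_gpr_offset(int nv_gprs_base, int n)
{
    return nv_gprs_base + (n - 14) * 4;           /* HOST_NV_GPR(n) */
}

/* Frame size for a given offset of the first saved non-volatile GPR. */
static int host_stack_size(int nv_gprs_base)
{
    int min_size = host_nv_gpr_offset(nv_gprs_base, 31) + 4;  /* HOST_MIN_STACK_SIZE */
    return ((min_size + 15) / 16) * 16;           /* round up to 16 bytes */
}
```

With HOST_NV_GPRS at 16 (before the patch) the minimum size is 88 bytes, and with 20 (after, making room for HOST_CR at 16) it is 92 bytes. Both round up to a 96-byte frame, so in the 32-bit booke case the new CR slot fits into what was alignment padding.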



Re: [PATCH] KVM: PPC: check error return of kvmppc_core_vcpu_create first

2012-03-05 Thread Alexander Graf

On 02/21/2012 05:30 AM, Ben Collins wrote:

The result of kvmppc_core_vcpu_create() was being manipulated before it was 
checked for IS_ERR(). Did not see the bug occur, but caught it when looking 
through the code.


Nice catch, but this has already been fixed by Matt:

commit c6f3830e7313eea47b526b597aadc5b18c69ad55
Author: Matt Evans m...@ozlabs.org
Date:   Tue Dec 6 21:19:42 2011 +

KVM: PPC: Fix vcpu_create dereference before validity check.

Fix usage of vcpu struct before check that it's actually valid.

Signed-off-by: Matt Evans m...@ozlabs.org
Signed-off-by: Alexander Graf ag...@suse.de


Thanks a lot for sending the patch nevertheless!

Alex



Re: [PATCH 1/4] KVM: PPC: Book3S HV: Save and restore CR in __kvmppc_vcore_entry

2012-03-05 Thread Alexander Graf

On 02/03/2012 11:53 AM, Paul Mackerras wrote:

The ABI specifies that CR fields CR2--CR4 are nonvolatile across function
calls.  Currently __kvmppc_vcore_entry doesn't save and restore the CR,
leading to CR2--CR4 getting corrupted with guest values, possibly leading
to incorrect behaviour in its caller.  This adds instructions to save
and restore CR at the points where we save and restore the nonvolatile
GPRs.
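
To make the scope of the ABI requirement concrete, here is a small C sketch of which bits of the 32-bit CR the nonvolatile fields CR2 to CR4 occupy. The patch itself saves and restores the whole CR in assembly; the helper names below are hypothetical and only model the bit layout (CR0 is the most significant nibble).

```c
#include <stdint.h>

/* CR field n (0..7) is 4 bits wide; CR0 is the most significant nibble. */
static inline uint32_t cr_field_mask(unsigned int n)
{
    return (uint32_t)0xF << (28 - 4 * n);
}

/* Mask covering the nonvolatile fields CR2, CR3 and CR4. */
static inline uint32_t cr_nonvolatile_mask(void)
{
    return cr_field_mask(2) | cr_field_mask(3) | cr_field_mask(4);
}

/*
 * Model of what the caller expects on return: nonvolatile fields come
 * from the CR value saved on entry, volatile fields may be anything.
 */
static inline uint32_t cr_restore_nonvolatile(uint32_t current_cr,
                                              uint32_t saved_host_cr)
{
    uint32_t mask = cr_nonvolatile_mask();

    return (current_cr & ~mask) | (saved_host_cr & mask);
}
```

Restoring the full CR, as the patch does, trivially satisfies this, since it restores a superset of the masked bits.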

Signed-off-by: Paul Mackerras <pau...@samba.org>


Thanks, applied all to kvm-ppc-next. Please CC kvm@vger when you send 
patches. Failing to do so might mean the whole pull request gets blocked 
by Avi when it gets to him, because he doesn't read kvm-ppc@vger.



Alex



Re: [PATCH] KVM: PPC: Book3s: PR: Add SPAPR H_BULK_REMOVE support

2012-03-05 Thread Alexander Graf

On 01/31/2012 07:25 AM, Matt Evans wrote:

SPAPR support includes various in-kernel hypercalls, improving performance
by cutting out the exit to userspace.  H_BULK_REMOVE is implemented in this
patch.

Signed-off-by: Matt Evans <m...@ozlabs.org>


Thanks, applied to kvm-ppc-next.


Alex



Emulating lwarx and stwcx instructions in PowerPc BOOKE e500

2012-03-05 Thread Aashish Mittal
Hi
I'm working on the PowerPC Book E architecture, and my project requires me to
remove read and write privileges on some pages. Any instruction accessing
these pages therefore traps, and I'm trying to emulate the behavior of these
instructions.

I've emulated the lwarx and stwcx instructions, but I think stwcx is not
working correctly. My emulation code is below.

case OP_31_XOP_LWARX:
{
  ulong ret;
  ulong addr;
  int eh = inst & 0x0001;
  kvm_gva_to_hva(vcpu, ea, &addr);
  /* lwarx RT RA RB EH */
  if (eh == 0)
    __asm__ __volatile__("lwarx %0,0,%1,0; isync" : "=r" (ret) : "r" (addr));
  else
    __asm__ __volatile__("lwarx %0,0,%1,1; isync" : "=r" (ret) : "r" (addr));

  kvmppc_set_gpr(vcpu, rt, ret);
}

case OP_31_XOP_STWCX:
{
  ulong tmp;
  ulong addr;
  ulong data;
  kvm_gva_to_hva(vcpu, ea, &addr);
  kvmppc_read_guest(vcpu, ea, &data, sizeof(data));
  __asm__ __volatile__("stwcx. %1,0,%2; isync"
  : "=r" (tmp) : "r" (data), "r" (addr) : "memory");

}

Here the kvm_gva_to_hva function converts a guest effective address to a host
virtual address.

void kvm_gva_to_hva(struct kvm_vcpu *vcpu, ulong ea,ulong* hva)
{
  gfn_t gfn;
  gpa_t gpa ;
  int gtlb_index;
  int offset;
  ulong addr;
  struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);

  gtlb_index = kvmppc_mmu_itlb_index(vcpu, ea);
  gpa = kvmppc_mmu_xlate(vcpu,gtlb_index, ea);
  gfn = gpa >> PAGE_SHIFT;
  addr = (ulong)gfn_to_hva(vcpu_e500->vcpu.kvm, gfn);
  offset = offset_in_page(gpa);
  
  *hva = addr + offset;
  return;
}
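
The arithmetic at the end of that function (split the guest physical address into a frame number and an in-page offset, then add the offset back onto the host address of the page) can be sketched standalone. This is a minimal sketch assuming 4 KiB pages; the DEMO_ names and helpers are hypothetical and do not touch any KVM API.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12                          /* assumption: 4 KiB pages */
#define DEMO_PAGE_SIZE  ((uint64_t)1 << DEMO_PAGE_SHIFT)

/* Guest frame number: the guest physical address with the offset dropped. */
static inline uint64_t demo_gpa_to_gfn(uint64_t gpa)
{
    return gpa >> DEMO_PAGE_SHIFT;
}

/* Byte offset of the address inside its page. */
static inline uint64_t demo_offset_in_page(uint64_t gpa)
{
    return gpa & (DEMO_PAGE_SIZE - 1);
}

/* Host address = host address of the page start + the in-page offset. */
static inline uint64_t demo_hva(uint64_t page_hva, uint64_t gpa)
{
    return page_hva + demo_offset_in_page(gpa);
}
```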

The guest just hangs once it encounters a stwcx instruction. Does anybody have
any idea why this is not working and what's wrong with the emulation code?

I'm also working on the linux-3.0-rc4 kernel.

Thanks in advance




Re: Emulating lwarx and stwcx instructions in PowerPc BOOKE e500

2012-03-05 Thread Scott Wood
On 03/05/2012 02:37 PM, Aashish Mittal wrote:
> Hi
> I'm working on the PowerPC Book E architecture, and my project requires me to
> remove read and write privileges on some pages. Any instruction accessing
> these pages therefore traps, and I'm trying to emulate the behavior of these
> instructions.
>
> I've emulated the lwarx and stwcx instructions, but I think stwcx is not
> working correctly. My emulation code is below.

What is it you're emulating that needs lwarx/stwcx to work?

> case OP_31_XOP_LWARX:
> {
>   ulong ret;
>   ulong addr;
>   int eh = inst & 0x0001;
>   kvm_gva_to_hva(vcpu, ea, &addr);
>   /* lwarx RT RA RB EH */
>   if (eh == 0)
>     __asm__ __volatile__("lwarx %0,0,%1,0; isync" : "=r" (ret) : "r" (addr));
>   else
>     __asm__ __volatile__("lwarx %0,0,%1,1; isync" : "=r" (ret) : "r" (addr));
>
>   kvmppc_set_gpr(vcpu, rt, ret);
> }
>
> case OP_31_XOP_STWCX:
> {
>   ulong tmp;
>   ulong addr;
>   ulong data;
>   kvm_gva_to_hva(vcpu, ea, &addr);
>   kvmppc_read_guest(vcpu, ea, &data, sizeof(data));
>   __asm__ __volatile__("stwcx. %1,0,%2; isync"
>   : "=r" (tmp) : "r" (data), "r" (addr) : "memory");
>
> }
>
> Here the kvm_gva_to_hva function converts a guest effective address to a host
> virtual address.
>
> void kvm_gva_to_hva(struct kvm_vcpu *vcpu, ulong ea,ulong* hva)
> {
>   gfn_t gfn;
>   gpa_t gpa ;
>   int gtlb_index;
>   int offset;
>   ulong addr;
>   struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
>
>   gtlb_index = kvmppc_mmu_itlb_index(vcpu, ea);
>   gpa = kvmppc_mmu_xlate(vcpu,gtlb_index, ea);
>   gfn = gpa >> PAGE_SHIFT;
>   addr = (ulong)gfn_to_hva(vcpu_e500->vcpu.kvm, gfn);
>   offset = offset_in_page(gpa);
>
>   *hva = addr + offset;
>   return;
> }
>
> The guest just hangs once it encounters a stwcx instruction. Does anybody have
> any idea why this is not working and what's wrong with the emulation code?

You're losing the reservation somewhere.  Any lock or atomic operation
along the emulation path will do this.

Even if this didn't happen by accident, we really don't want to leave a
reservation when we return to the guest -- it could have belonged to a
previously running guest operating on shared memory, for example.
Perhaps we should have a dummy stwcx on KVM guest entry code, similar to
the one on interrupt return?
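
A toy C model may help show why the reservation gets lost. It is a deliberate simplification, not real hardware behaviour: one reservation per CPU, cleared by every stwcx., and a store-conditional succeeds only if the reservation is still held for the same address. All demo_ names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one CPU's load-reserve state. */
struct demo_cpu {
    bool      reserved;     /* is a reservation currently held? */
    uint32_t *reserved_at;  /* address the reservation covers */
};

/* lwarx: load the word and establish a reservation on its address. */
static uint32_t demo_lwarx(struct demo_cpu *cpu, uint32_t *addr)
{
    cpu->reserved = true;
    cpu->reserved_at = addr;
    return *addr;
}

/* stwcx.: store only if the reservation still holds; always clears it. */
static bool demo_stwcx(struct demo_cpu *cpu, uint32_t *addr, uint32_t value)
{
    bool ok = cpu->reserved && cpu->reserved_at == addr;

    cpu->reserved = false;
    if (ok)
        *addr = value;
    return ok;
}

/* A host lock or atomic on the emulation path runs its own lwarx/stwcx. pair. */
static void demo_host_atomic_inc(struct demo_cpu *cpu, uint32_t *counter)
{
    uint32_t old;

    do {
        old = demo_lwarx(cpu, counter);
    } while (!demo_stwcx(cpu, counter, old + 1));
}

/* Dummy stwcx. on guest entry: drop any stale reservation. */
static void demo_guest_entry(struct demo_cpu *cpu)
{
    uint32_t scratch = 0;

    (void)demo_stwcx(cpu, &scratch, 0);
}
```

In this model, a guest lwarx followed by demo_host_atomic_inc() leaves no reservation behind, so the emulated stwcx. fails every time and a guest retry loop never makes progress, which would look like a hang.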

> I'm also working on the linux-3.0-rc4 kernel.

Why are you working on something other than the current code or a stable
release?

-Scott



Re: [PATCH] KVM: PPC: Save/Restore CR over vcpu_run

2012-03-05 Thread Scott Wood
On 03/05/2012 10:02 AM, Alexander Graf wrote:
> @@ -442,6 +444,7 @@ heavyweight_exit:
>
>    /* Return to kvm_vcpu_run(). */
>    mtlr r5
> +  mtcr r6
>    addi r1, r1, HOST_STACK_SIZE
>    /* r3 still contains the return code from kvmppc_handle_exit(). */
>    blr
> @@ -459,6 +462,9 @@ _GLOBAL(__kvmppc_vcpu_run)
>    mflr r3
>    PPC_STL r3, HOST_STACK_LR(r1)
>
> +  mfcr r5
> +  stw r5, HOST_CR(r1)

If you move the mfcr before the PPC_STL they should be able to run in
parallel.  Otherwise on e500mc mfcr will wait for PPC_STL to take its 3
cycles and then mfcr will take 5 cycles before the stw of HOST_CR.
Alternatively, consider using mcrf/mtocrf three times.

Similar issues in booke_interrupts.S (except we can't assume mtocrf
exists there), but I'm less worried about that one as it still needs an
optimization pass in general.

-Scott
