Re: [Qemu-devel] [PATCH V3 10/11] vcpu: introduce lockmap

2012-09-12 Thread liu ping fan
On Tue, Sep 11, 2012 at 8:41 PM, Avi Kivity  wrote:
> On 09/11/2012 03:24 PM, Avi Kivity wrote:
>> On 09/11/2012 12:57 PM, Jan Kiszka wrote:
>>> On 2012-09-11 11:44, liu ping fan wrote:
 On Tue, Sep 11, 2012 at 4:35 PM, Avi Kivity  wrote:
> On 09/11/2012 10:51 AM, Liu Ping Fan wrote:
>> From: Liu Ping Fan 
>>
>> The func call chain can suffer from recursively hold
>> qemu_mutex_lock_iothread. We introduce lockmap to record the
>> lock depth.
>
> What is the root cause?  io handlers initiating I/O?
>
 cpu_physical_memory_rw() can be called nested, and when called, it can
 be protected by no-lock/device lock/ big-lock.
 I think without big lock, io-dispatcher should face the same issue.
 As to main-loop, have not carefully consider, but at least, dma-helper
 will call cpu_physical_memory_rw() with big-lock.
>>>
>>> That is our core problem: inconsistent invocation of existing services
>>> /wrt locking. For portio, I was lucky that there is no nesting and I was
>>> able to drop the big lock around all (x86) call sites. But MMIO is way
>>> more tricky due to DMA nesting.
>>
>> Maybe we need to switch to a continuation style.  Instead of expecting
>> cpu_physical_memory_rw() to complete synchronously, it becomes an
>> asynchronous call and you provide it with a completion.  That means
>> devices which use it are forced to drop the lock in between.  Block and
>> network clients will be easy to convert since they already use APIs that
>> drop the lock (except for accessing the descriptors).
>>
>>> We could try to introduce a different version of cpu_physical_memory_rw,
>>> cpu_physical_memory_rw_unlocked. But the problem remains that an MMIO
>>> request can trigger the very same access in a nested fashion, and we
>>> will have to detect this to avoid locking up QEMU (locking up the guest
>>> might be OK).
>>
>> An async version of c_p_m_rw() will just cause a completion to bounce
>> around, consuming cpu but not deadlocking anything.  If we can keep a
>> count of the bounces, we might be able to stall it indefinitely or at
>> least ratelimit it.
>>
>
> Another option is to require all users of c_p_m_rw() and related to use
> a coroutine or thread.  That makes the programming easier (but still
> required a revalidation after the dropped lock).
>
For the nested cpu_physical_memory_rw(), we change its internals but
keep its sync API as it is.  (Wrap the current cpu_physical_memory_rw()
into cpu_physical_memory_rw_internal().)


LOCK()  // can be device lock or big lock or both, depending on caller
..
cpu_physical_memory_rw()
{
    UNLOCK()  // unlock all of the locks held
    queue_work_on_thread(cpu_physical_memory_rw_internal, completion);
    // cpu_physical_memory_rw_internal can take locks (device, big lock) again
    wait_for_completion(completion)
    LOCK()
}
..
UNLOCK()

Although cpu_physical_memory_rw_internal() is then free to take locks,
there is a precondition: we still need to trace the lock stack taken by
cpu_physical_memory_rw(), so that it can return to its caller with the
same locks held.
Is that OK?
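For readers following along, the lock-depth idea in the patch's subject ("lockmap") can be sketched as a per-thread recursion counter around the big lock, so nested acquisitions do not deadlock. This is an illustrative sketch only, not the series' actual code; the names big_lock_acquire()/big_lock_release() are invented for the example:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical sketch of a "lockmap": record the big-lock recursion
 * depth per thread so a nested caller neither deadlocks nor unlocks
 * a mutex its outer caller still needs. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static __thread int lock_depth;   /* per-thread recursion depth, starts at 0 */

void big_lock_acquire(void)
{
    if (lock_depth++ == 0) {
        pthread_mutex_lock(&big_lock);   /* only the outermost entry locks */
    }
}

void big_lock_release(void)
{
    assert(lock_depth > 0);
    if (--lock_depth == 0) {
        pthread_mutex_unlock(&big_lock); /* only the outermost exit unlocks */
    }
}

int big_lock_depth(void)
{
    return lock_depth;
}
```

With this, a nested cpu_physical_memory_rw()-style path can re-enter the lock safely, at the cost of having to track the depth, which is exactly the bookkeeping the thread is debating.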

Regards,
pingfan

> --
> error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH V3 10/11] vcpu: introduce lockmap

2012-09-12 Thread liu ping fan
On Tue, Sep 11, 2012 at 10:54 PM, Marcelo Tosatti  wrote:
> On Tue, Sep 11, 2012 at 03:41:15PM +0300, Avi Kivity wrote:
>> On 09/11/2012 03:24 PM, Avi Kivity wrote:
>> > On 09/11/2012 12:57 PM, Jan Kiszka wrote:
>> >> On 2012-09-11 11:44, liu ping fan wrote:
>> >>> On Tue, Sep 11, 2012 at 4:35 PM, Avi Kivity  wrote:
>>  On 09/11/2012 10:51 AM, Liu Ping Fan wrote:
>> > From: Liu Ping Fan 
>> >
>> > The func call chain can suffer from recursively hold
>> > qemu_mutex_lock_iothread. We introduce lockmap to record the
>> > lock depth.
>> 
>>  What is the root cause?  io handlers initiating I/O?
>> 
>> >>> cpu_physical_memory_rw() can be called nested, and when called, it can
>> >>> be protected by no-lock/device lock/ big-lock.
>> >>> I think without big lock, io-dispatcher should face the same issue.
>> >>> As to main-loop, have not carefully consider, but at least, dma-helper
>> >>> will call cpu_physical_memory_rw() with big-lock.
>> >>
>> >> That is our core problem: inconsistent invocation of existing services
>> >> /wrt locking. For portio, I was lucky that there is no nesting and I was
>> >> able to drop the big lock around all (x86) call sites. But MMIO is way
>> >> more tricky due to DMA nesting.
>> >
>> > Maybe we need to switch to a continuation style.  Instead of expecting
>> > cpu_physical_memory_rw() to complete synchronously, it becomes an
>> > asynchronous call and you provide it with a completion.  That means
>> > devices which use it are forced to drop the lock in between.  Block and
>> > network clients will be easy to convert since they already use APIs that
>> > drop the lock (except for accessing the descriptors).
>> >
>> >> We could try to introduce a different version of cpu_physical_memory_rw,
>> >> cpu_physical_memory_rw_unlocked. But the problem remains that an MMIO
>> >> request can trigger the very same access in a nested fashion, and we
>> >> will have to detect this to avoid locking up QEMU (locking up the guest
>> >> might be OK).
>> >
>> > An async version of c_p_m_rw() will just cause a completion to bounce
>> > around, consuming cpu but not deadlocking anything.  If we can keep a
>> > count of the bounces, we might be able to stall it indefinitely or at
>> > least ratelimit it.
>> >
>>
>> Another option is to require all users of c_p_m_rw() and related to use
>> a coroutine or thread.  That makes the programming easier (but still
>> required a revalidation after the dropped lock).
>
> Unless i am misunderstanding this thread, the iothread flow section of
> http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg04315.html
> contains possible solutions for this.
>
> Basically use trylock() and if it fails, then choose a bailout option
> (which there are more than one possibilities listed).
>
I think in that thread the original goal of trylock() is to handle a
lock held by another thread, while here we want to resolve nested
locking.  But yes, there is a similar point: it can be solved
asynchronously if we do not allow nested locks.
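The trylock-and-bailout pattern Marcelo refers to can be sketched roughly like this. It is illustrative only: dispatch_access(), handle_access() and defer_access() are invented names, and the deferral mechanism stands in for whichever bailout option the linked thread chooses:

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;
static int handled, deferred;       /* counters, just for demonstration */

static void handle_access(void) { handled++;  }   /* the real MMIO work */
static void defer_access(void)  { deferred++; }   /* e.g. queue for iothread */

/* Returns 1 if the access was handled inline, 0 if it had to bail out
 * because the lock was already held (possibly by ourselves, nested). */
int dispatch_access(void)
{
    if (pthread_mutex_trylock(&dev_lock) != 0) {
        defer_access();             /* bailout option instead of blocking */
        return 0;
    }
    handle_access();
    pthread_mutex_unlock(&dev_lock);
    return 1;
}
```

The design trade-off matches the discussion above: trylock() avoids both deadlock on nesting and blocking on contention, but the deferred path must still revalidate state when it finally runs.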

Regards,
pingfan
>



Re: [Qemu-devel] [PATCH V3 08/11] qom: introduce reclaimer to release obj in async

2012-09-12 Thread liu ping fan
On Tue, Sep 11, 2012 at 5:37 PM, Avi Kivity  wrote:
> On 09/11/2012 12:32 PM, liu ping fan wrote:
>> On Tue, Sep 11, 2012 at 4:32 PM, Avi Kivity  wrote:
>>> On 09/11/2012 10:51 AM, Liu Ping Fan wrote:
 From: Liu Ping Fan 

 DeviceState will be protected by refcnt from disappearing during
 dispatching. But when refcnt comes down to zero, DeviceState may
 be still in use by iohandler, timer etc in main loop, we just delay
 its free until there is no reader.

>>>
>>> How can this be?  We elevate the refcount while dispatching I/O.  If we
>>> have similar problems with the timer, we need to employ a similar solution.
>>>
>> Yes, as the next step, I plan to convert iohandler, timer etc to use
>> refcount as memory does.  This is just a temporary solution.
>
> I prefer not to ever introduce it.
>
> What we can do is introduce a sub-region for e1000's mmio that will take
> only the device lock, and let original region use the old dispatch path
> (and also take the device lock).  As we thread the various subsystems
> e1000 uses, we can expand the sub-region until it covers all of e1000's
> functions, then fold it back into the main region.
>
Introducing a new sub-region for e1000 seems to be no help in resolving
this issue: it cannot tell whether the main loop still uses the device
or not.
I think the key point is that the original code synchronously eliminates
all readers of the DeviceState at acpi_piix_eject_slot() via
dev->uninit()/exit(), so no subsystem will access it afterwards.  But
now we can delete the DeviceState asynchronously.
Currently we can just use e1000->unmap() to detach the device from each
subsystem (not implemented in this patch series for timers, ...) to
achieve the goal, because the readers are still under the protection of
the big lock; but once they are out of the big lock, we will need extra
effort like the memory system's.
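The scheme under discussion, delaying the free until the last reader drops its reference, can be sketched as follows. This is an illustrative sketch using GCC atomic builtins, not the actual code of the series; Obj, obj_ref() and obj_unref() are invented names:

```c
#include <assert.h>

/* Minimal refcounted object: each reader (dispatcher, timer, iohandler,
 * ...) takes a reference before use; the object is reclaimed only when
 * the count reaches zero, never while a reader may still touch it. */
typedef struct Obj Obj;
struct Obj {
    int refcnt;                    /* starts at 1 for the creator */
    void (*free_cb)(Obj *);        /* called when the last ref drops */
};

static int freed;                          /* demo flag */
static void mark_freed(Obj *o) { (void)o; freed = 1; }

void obj_ref(Obj *o)
{
    __sync_fetch_and_add(&o->refcnt, 1);
}

void obj_unref(Obj *o)
{
    if (__sync_sub_and_fetch(&o->refcnt, 1) == 0) {
        o->free_cb(o);             /* last user gone: safe to reclaim */
    }
}
```

This is the same shape the thread wants for DeviceState: unplug drops the creator's reference, but in-flight users keep the object alive until they unref.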

Regards,
pingfan
> To start with the sub-region can only include registers that call no
> qemu infrastructure code: simple read/writes.
>
>
> --
> error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH V3 01/11] atomic: introduce atomic operations

2012-09-12 Thread liu ping fan
On Tue, Sep 11, 2012 at 4:04 PM, Avi Kivity  wrote:
> On 09/11/2012 10:51 AM, Liu Ping Fan wrote:
>> From: Liu Ping Fan 
>>
>> If out of global lock, we will be challenged by SMP in low level,
>> so need atomic ops.
>>
>> This file is a wrapper of GCC atomic builtin.
>>
>> Signed-off-by: Liu Ping Fan 
>> ---
>>  include/qemu/atomic.h |   63 +
>>  1 files changed, 63 insertions(+), 0 deletions(-)
>>  create mode 100644 include/qemu/atomic.h
>>
>> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
>> new file mode 100644
>> index 000..f17145d
>> --- /dev/null
>> +++ b/include/qemu/atomic.h
>> @@ -0,0 +1,63 @@
>> +/*
>> + * Simple wrapper of gcc Atomic-Builtins
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + *
>> + */
>> +#ifndef __QEMU_ATOMIC_H
>> +#define __QEMU_ATOMIC_H
>> +
>> +typedef struct Atomic {
>> +int counter;
>> +} Atomic;
>
> Best to mark counter 'volatile'.
>
>> +
>> +static inline void atomic_set(Atomic *v, int i)
>> +{
>> +v->counter = i;
>> +}
>> +
>> +static inline int atomic_read(Atomic *v)
>> +{
>> +return v->counter;
>> +}
>>
>
> So these two operations don't get mangled by the optimizer.
>
Browsing the Linux code and reading lkml, I found some similar material,
but they have moved the volatile from ->counter into the accessor
function, atomic_read().  For atomic_read() I think it is needed to
prevent the optimizer from caching the value instead of refetching it;
but as to atomic_set(), do we need it?
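The Linux style pingfan mentions, a plain int field with the volatile qualifier moved into the accessors (in the spirit of the kernel's ACCESS_ONCE()), looks roughly like this. A sketch only, not the patch's code:

```c
#include <assert.h>

/* Counter stays a plain int; only the accessors go through a volatile
 * lvalue, so ordinary uses of the struct are not pessimized, yet every
 * atomic_read()/atomic_set() forces a real load/store rather than a
 * value the optimizer cached in a register. */
typedef struct Atomic {
    int counter;
} Atomic;

static inline void atomic_set(Atomic *v, int i)
{
    *(volatile int *)&v->counter = i;
}

static inline int atomic_read(Atomic *v)
{
    return *(volatile int *)&v->counter;
}
```

On the question in the mail: the volatile store in atomic_set() keeps the compiler from reordering or eliding the write within the optimizer's view, which is why the kernel applies the cast on both sides, though the read side is where refetching bugs usually bite.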

Regards,
pingfan
>
>
>
> --
> error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH V3 01/11] atomic: introduce atomic operations

2012-09-12 Thread liu ping fan
On Tue, Sep 11, 2012 at 4:15 PM, Peter Maydell  wrote:
> On 11 September 2012 08:51, Liu Ping Fan  wrote:
>> +
>> +/**
>> + *  * atomic_inc - increment atomic variable
>> + *   * @v: pointer of type Atomic
>> + **
>> + * * Atomically increments @v by 1.
>> + *  */
>
> Your editor has done something weird with these comments.
>
Yes, will fix it. Thanx, pingfan
> -- PMM



Re: [Qemu-devel] [PATCH 0/3] client monitors config support

2012-09-12 Thread Gerd Hoffmann
On 09/12/12 15:13, Alon Levy wrote:
>  - no addition of guest capabilities, use interrupt mask instead, ignore
>0 or ~0 that are set by current windows driver.
>  - use crc to solve possible write while read.
>  - limit heads to 64, statically allocated on rom by host.
>  - some misc trace fixes.

Patch series added to spice patch queue.

thanks,
  Gerd



Re: [Qemu-devel] 'qemu-nbd' explicite flush to disk

2012-09-12 Thread Paolo Bonzini
On 12/09/2012 23:28, Mark Trumpold wrote:
> So, I've been experimenting with 'qemu-nbd --cache=writeback ..'
> This nicely eliminates the 'checkpoint' issue; however, I have as
> yet been unable to explicitly flush things to disk -- which I would like to
> do just before a 'nilfs' snapshot.

The Linux kernel driver for NBD does not support flushes.  Patches were
sent to the maintainer, but he never applied them.

You can get them at
http://thread.gmane.org/gmane.linux.drivers.nbd.general/1108 and try them.

> Subsequently I've been trying to call 'bdrv_co_flush(bs)' directly, but I 
> can't figure out how to dereference 'bs' for the call.
> 
> I'm probably out in the weeds on this one.
> Any guidance would be greatly appreciated.
> 
> I am running:
> qemu-1.2.0
> linux kernel 3.3.1
> 
> Thank you,
> Mark Trumpold
> 
> 
> Confidentiality Notice: The information contained in this electronic
> e-mail and any accompanying attachment(s) is intended only for the use
> of the intended recipient and is confidential and/or privileged.

I doubt so, since this is a public mailing list.

Paolo



Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Paolo Bonzini
On 12/09/2012 20:03, Clemens Kolbitsch wrote:
> 
> not much that I can contribute to solving the problem, but I have a
> bunch of VMs where this happens _every_ time I resume a snapshot (but
> without hibernating). In case this could be a connected problem and
> you need help testing a patch, I'm more than happy to help.

Resuming from a snapshot is the same as hibernating from the guest POV.
 To fix it, you need to suspend the VM to S3 before saving a snapshot
and/or hibernating.

Paolo




Re: [Qemu-devel] [PATCH] hw: Add support for new LSI Logic devices.

2012-09-12 Thread Paolo Bonzini
On 12/09/2012 15:58, Anthony Liguori wrote:
> Why would someone use this verses megasas vs. LSI vs virtio-scsi?

LSI is dead.  Compare it to IDE.

virtio-scsi has the highest performance, but it is not supported on all
guests.  Compare it to virtio-blk.

This vs. megasas is a good question; both can be compared to AHCI, they
have good performance and have the advantage of compatibility with real
hardware.  If this had gone in first, I would probably have rejected
megasas, but this has the advantage of being used in VMware and
VirtualBox.  Plus, unlike megasas Don said he'd work on SeaBIOS support,
so it has further merit.

Paolo



[Qemu-devel] [PATCH v3 1/2] slirp: Handle more than 65535 blocks in TFTP transfers

2012-09-12 Thread Hervé Poussineau
RFC 1350 does not mention block count roll-over. However, a lot of TFTP servers
implement it to be able to transmit big files, so we do it also.

Current block size is 512 bytes, so TFTP files were limited to 32 MB.
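To make the roll-over concrete: the session keeps a 32-bit counter and only the low 16 bits go on the wire, so block numbers wrap past 65535 instead of capping the transfer at 65535 * 512 bytes (~32 MB). A sketch with an invented helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Wire block number for a 32-bit session counter: the TFTP DATA packet
 * carries (block_nr + 1) truncated to 16 bits, wrapping modulo 2^16,
 * mirroring the "& 0xffff" in the patch above. */
uint16_t tftp_wire_block(uint32_t block_nr)
{
    return (uint16_t)((block_nr + 1) & 0xffff);
}
```

Since RFC 1350 is silent on roll-over, whether the wrapped value restarts at 0 or 1 varies between implementations; this sketch simply truncates, as the patch does.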

Signed-off-by: Hervé Poussineau 
---
 slirp/tftp.c |   24 ++--
 slirp/tftp.h |1 +
 2 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/slirp/tftp.c b/slirp/tftp.c
index 520dbd6..c6a5df2 100644
--- a/slirp/tftp.c
+++ b/slirp/tftp.c
@@ -97,7 +97,7 @@ static int tftp_session_find(Slirp *slirp, struct tftp_t *tp)
   return -1;
 }
 
-static int tftp_read_data(struct tftp_session *spt, uint16_t block_nr,
+static int tftp_read_data(struct tftp_session *spt, uint32_t block_nr,
   uint8_t *buf, int len)
 {
 int bytes_read = 0;
@@ -197,19 +197,14 @@ out:
   tftp_session_terminate(spt);
 }
 
-static int tftp_send_data(struct tftp_session *spt,
-  uint16_t block_nr,
- struct tftp_t *recv_tp)
+static int tftp_send_next_block(struct tftp_session *spt,
+struct tftp_t *recv_tp)
 {
   struct sockaddr_in saddr, daddr;
   struct mbuf *m;
   struct tftp_t *tp;
   int nobytes;
 
-  if (block_nr < 1) {
-return -1;
-  }
-
   m = m_get(spt->slirp);
 
   if (!m) {
@@ -223,7 +218,7 @@ static int tftp_send_data(struct tftp_session *spt,
   m->m_data += sizeof(struct udpiphdr);
 
   tp->tp_op = htons(TFTP_DATA);
-  tp->x.tp_data.tp_block_nr = htons(block_nr);
+  tp->x.tp_data.tp_block_nr = htons((spt->block_nr + 1) & 0xffff);
 
   saddr.sin_addr = recv_tp->ip.ip_dst;
   saddr.sin_port = recv_tp->udp.uh_dport;
@@ -231,7 +226,7 @@ static int tftp_send_data(struct tftp_session *spt,
   daddr.sin_addr = spt->client_ip;
   daddr.sin_port = spt->client_port;
 
-  nobytes = tftp_read_data(spt, block_nr - 1, tp->x.tp_data.tp_buf, 512);
+  nobytes = tftp_read_data(spt, spt->block_nr, tp->x.tp_data.tp_buf, 512);
 
   if (nobytes < 0) {
 m_free(m);
@@ -255,6 +250,7 @@ static int tftp_send_data(struct tftp_session *spt,
 tftp_session_terminate(spt);
   }
 
+  spt->block_nr++;
   return 0;
 }
 
@@ -373,7 +369,8 @@ static void tftp_handle_rrq(Slirp *slirp, struct tftp_t 
*tp, int pktlen)
   }
   }
 
-  tftp_send_data(spt, 1, tp);
+  spt->block_nr = 0;
+  tftp_send_next_block(spt, tp);
 }
 
 static void tftp_handle_ack(Slirp *slirp, struct tftp_t *tp, int pktlen)
@@ -386,9 +383,8 @@ static void tftp_handle_ack(Slirp *slirp, struct tftp_t 
*tp, int pktlen)
 return;
   }
 
-  if (tftp_send_data(&slirp->tftp_sessions[s],
-ntohs(tp->x.tp_data.tp_block_nr) + 1,
-tp) < 0) {
+  if (tftp_send_next_block(&slirp->tftp_sessions[s],
+   tp) < 0) {
 return;
   }
 }
diff --git a/slirp/tftp.h b/slirp/tftp.h
index 9c364ea..51704e4 100644
--- a/slirp/tftp.h
+++ b/slirp/tftp.h
@@ -37,6 +37,7 @@ struct tftp_session {
 
 struct in_addr client_ip;
 uint16_t client_port;
+uint32_t block_nr;
 
 int timestamp;
 };
-- 
1.7.10.4




[Qemu-devel] [PATCH v3 2/2] slirp: Implement TFTP Blocksize option

2012-09-12 Thread Hervé Poussineau
This option is described in RFC 1783. As this is only an optional field,
we may ignore it in some situations and handle it in some others.

However, MS Windows 2003 PXE boot client requests a block size of the MTU
(most of the times 1472 bytes), and doesn't work if the option is not
acknowledged (with whatever value).

According to the RFC 1783, we cannot acknowledge the option with a bigger
value than the requested one.

As the current implementation uses 512-byte blocks, accept the option
with a value of 512 if it was specified, and don't acknowledge it
if it is not present or requests less than 512 bytes.
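The acknowledgment rule described above can be sketched as a small helper (illustrative only; negotiate_blksize() is not the patch's code). RFC 1783 forbids acknowledging a bigger value than the client requested, which is why the answer is clamped to our fixed 512:

```c
#include <assert.h>

/* Decide the OACK response for a client's "blksize" request.
 * Returns 1 and sets *acked when the option should be acknowledged,
 * 0 when we should behave as if the option were absent. */
int negotiate_blksize(int requested, int *acked)
{
    if (requested >= 512) {
        *acked = 512;    /* never more than requested; we emit 512-byte blocks */
        return 1;
    }
    return 0;            /* smaller than our block size: don't acknowledge */
}
```

This covers the MS Windows 2003 PXE case from the commit message: it asks for the MTU-sized 1472, and gets an acknowledged 512 back, which it accepts.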

Signed-off-by: Hervé Poussineau 
---
 slirp/tftp.c |   42 +-
 1 file changed, 33 insertions(+), 9 deletions(-)

diff --git a/slirp/tftp.c b/slirp/tftp.c
index c6a5df2..37b0387 100644
--- a/slirp/tftp.c
+++ b/slirp/tftp.c
@@ -120,13 +120,13 @@ static int tftp_read_data(struct tftp_session *spt, 
uint32_t block_nr,
 }
 
 static int tftp_send_oack(struct tftp_session *spt,
-  const char *key, uint32_t value,
+  const char *keys[], uint32_t values[], int nb,
   struct tftp_t *recv_tp)
 {
 struct sockaddr_in saddr, daddr;
 struct mbuf *m;
 struct tftp_t *tp;
-int n = 0;
+int i, n = 0;
 
 m = m_get(spt->slirp);
 
@@ -140,10 +140,12 @@ static int tftp_send_oack(struct tftp_session *spt,
 m->m_data += sizeof(struct udpiphdr);
 
 tp->tp_op = htons(TFTP_OACK);
-n += snprintf(tp->x.tp_buf + n, sizeof(tp->x.tp_buf) - n, "%s",
-  key) + 1;
-n += snprintf(tp->x.tp_buf + n, sizeof(tp->x.tp_buf) - n, "%u",
-  value) + 1;
+for (i = 0; i < nb; i++) {
+n += snprintf(tp->x.tp_buf + n, sizeof(tp->x.tp_buf) - n, "%s",
+  keys[i]) + 1;
+n += snprintf(tp->x.tp_buf + n, sizeof(tp->x.tp_buf) - n, "%u",
+  values[i]) + 1;
+}
 
 saddr.sin_addr = recv_tp->ip.ip_dst;
 saddr.sin_port = recv_tp->udp.uh_dport;
@@ -260,6 +262,9 @@ static void tftp_handle_rrq(Slirp *slirp, struct tftp_t 
*tp, int pktlen)
   int s, k;
   size_t prefix_len;
   char *req_fname;
+  const char *option_name[2];
+  uint32_t option_value[2];
+  int nb_options = 0;
 
   /* check if a session already exists and if so terminate it */
   s = tftp_session_find(slirp, tp);
@@ -337,7 +342,7 @@ static void tftp_handle_rrq(Slirp *slirp, struct tftp_t 
*tp, int pktlen)
   return;
   }
 
-  while (k < pktlen) {
+  while (k < pktlen && nb_options < ARRAY_SIZE(option_name)) {
   const char *key, *value;
 
   key = &tp->x.tp_buf[k];
@@ -364,11 +369,30 @@ static void tftp_handle_rrq(Slirp *slirp, struct tftp_t 
*tp, int pktlen)
  }
  }
 
- tftp_send_oack(spt, "tsize", tsize, tp);
- return;
+  option_name[nb_options] = "tsize";
+  option_value[nb_options] = tsize;
+  nb_options++;
+  } else if (strcasecmp(key, "blksize") == 0) {
+  int blksize = atoi(value);
+
+  /* If blksize option is bigger than what we will
+   * emit, accept the option with our packet size.
+   * Otherwise, simply act as if we didn't see the option.
+   */
+  if (blksize >= 512) {
+  option_name[nb_options] = "blksize";
+  option_value[nb_options] = 512;
+  nb_options++;
+  }
   }
   }
 
+  if (nb_options > 0) {
+  assert(nb_options <= ARRAY_SIZE(option_name));
+  tftp_send_oack(spt, option_name, option_value, nb_options, tp);
+  return;
+  }
+
   spt->block_nr = 0;
   tftp_send_next_block(spt, tp);
 }
-- 
1.7.10.4




[Qemu-devel] [PATCH v3 0/2] slirp: tftp server improvements

2012-09-12 Thread Hervé Poussineau
These patches have already been sent in April 2011, and contain some fixes for 
the internal TFTP server.
With these patches, MS Windows PE can be booted via PXE, and 32MB file 
limitation has been removed.

This has been tested with MS Windows 2003 PXE boot client, PXELINUX and gPXE.

Indentation seems a little bit off in patch 2/2, because surrounding code 
indentation is one tab followed by spaces. I've indented new lines by replacing 
the tab by 8 spaces.
Patch 1/2 also has indentation errors reported by checkpatch.pl, but I kept 
indentation consistent.

Changes v1 -> v2
- rebased
- updated with Aurélien Jarno remarks
- improved commit messages

Changes v2 -> v3
- removed patch 1/3, already applied to slirp queue
- updated with Jan Kiszka remarks

Hervé Poussineau (2):
  slirp: Handle more than 65535 blocks in TFTP transfers
  slirp: Implement TFTP Blocksize option

 slirp/tftp.c |   66 ++
 slirp/tftp.h |1 +
 2 files changed, 44 insertions(+), 23 deletions(-)

-- 
1.7.10.4




Re: [Qemu-devel] [PATCH 2/4] tcg/i386: Remove unused registers from tcg_target_call_iarg_regs

2012-09-12 Thread Stefan Weil

Am 12.09.2012 23:18, schrieb Peter Maydell:

On 12 September 2012 21:44, Stefan Weil  wrote:

--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -75,9 +75,7 @@ static const int tcg_target_call_iarg_regs[] = {
  TCG_REG_R8,
  TCG_REG_R9,
  #else
-TCG_REG_EAX,
-TCG_REG_EDX,
-TCG_REG_ECX
+/* 32 bit mode uses stack based calling convention (GCC default). */
  #endif
  };

This makes the array zero-length for 32 bit targets, but functions
like tcg_out_tlb_load() and tcg_out_qemu_ld() still unconditionally
access elements in it...

-- PMM


Thanks for the hint. I'm afraid there are a lot more functions
of that kind in i386/tcg-target.c.

I could use conditional compilation for those accesses,
but first I'd like to understand why this works at all.

- sw




Re: [Qemu-devel] [PATCH 1/2] KVM: fix i8259 interrupt high to low transition logic

2012-09-12 Thread Matthew Ogilvie
On Wed, Sep 12, 2012 at 10:57:57AM +0200, Jan Kiszka wrote:
> On 2012-09-12 10:51, Avi Kivity wrote:
> > On 09/12/2012 11:48 AM, Jan Kiszka wrote:
> >> On 2012-09-12 10:01, Avi Kivity wrote:
> >>> On 09/10/2012 04:29 AM, Matthew Ogilvie wrote:
>  Intel's definition of "edge triggered" means: "asserted with a
>  low-to-high transition at the time an interrupt is registered
>  and then kept high until the interrupt is served via one of the
>  EOI mechanisms or goes away unhandled."
> 
>  So the only difference between edge triggered and level triggered
>  is in the leading edge, with no difference in the trailing edge.
> 
>  This bug manifested itself when the guest was Microport UNIX
>  System V/386 v2.1 (ca. 1987), because it would sometimes mask
>  off IRQ14 in the slave IMR after it had already been asserted.
>  The master would still try to deliver an interrupt even though
>  IRQ2 had dropped again, resulting in a spurious interrupt
>  (IRQ15) and a panicked UNIX kernel.
>  diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
>  index adba28f..5cbba99 100644
>  --- a/arch/x86/kvm/i8254.c
>  +++ b/arch/x86/kvm/i8254.c
>  @@ -302,8 +302,12 @@ static void pit_do_work(struct kthread_work *work)
>   }
>   spin_unlock(&ps->inject_lock);
>   if (inject) {
>  -kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
>  +/* Clear previous interrupt, then create a rising
>  + * edge to request another interrupt, and leave it at
>  + * level=1 until time to inject another one.
>  + */
>   kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
>  +kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
>   
>   /*
> >>>
> >>> I thought I understood this, now I'm not sure.  How can this be correct?
> >>>  Real hardware doesn't act like this.
> >>>
> >>> What if the PIT is disabled after this?  You're injecting a spurious
> >>> interrupt then.
> >>
> >> Yes, the PIT has to raise the output as long as specified, i.e.
> >> according to the datasheet. That's important now due to the corrections
> >> to the PIC. We can then carefully check if there is room for
> >> simplifications / optimizations. I also cannot imagine that the above
> >> already fulfills these requirements.
> >>
> >> And if the PIT is disabled by the HPET, we need to clear the output
> >> explicitly as we inject the IRQ#0 under a different source ID than
> >> userspace HPET does (which will logically take over IRQ#0 control). The
> >> kernel would otherwise OR both sources to an incorrect result.
> >>
> > 
> > I guess we need to double the hrtimer rate then in order to generate a
> > square wave.  It's getting ridiculous how accurate our model needs to be.
> 
> I would suggest to solve this for the userspace model first, ensure that
> it works properly in all modes, maybe optimize it, and then decide how
> to map all this on kernel space. As long as we have two models, we can
> also make use of them.

Thoughts about the 8254 PIT:

First, this summary of (real) 8254 PIT behavior seems fairly
good, as far it goes:

On Tue, Sep 04, 2012 at 07:27:38PM +0100, Maciej W. Rozycki wrote:
>  * The 8254 PIT is normally configured in mode 2 or 3 in the PC/AT
>architecture.  In the former its output is high (active) all the time
>except from one (last) clock cycle.  In the latter a wave that has a
>duty cycle close or equal to 0.5 (depending on whether the divider is
>odd or even) is produced, so no short pulses either.  I don't remember
>the other four modes -- have a look at the datasheet if interested, but
>I reckon they're not really compatible with the wiring anyway, e.g. the
>gate is hardwired enabled.

I've also just skimmed parts of the 8254 section of "The Indispensable PC
Hardware Book", by Hans-Peter Messmer, Copyright 1994 Addison-Wesley,
although I probably ought to read it more carefully.

Under "normal" conditions, the 8254 part of the patch above should be
indistinguishable from previous behavior.  The 8259's IRR will
still show up as 1 until the interrupt is actually serviced,
and no new interrupt will be serviced after one is serviced until
another edge is injected via the high-low-high transition of the new
code.  (Unless the guest resets the 8259 or maybe messes with IMR,
but real hardware would generate extra interrupts in such cases as
well.)

The new code sounds much closer to mode 2 described by
Maciej, compared to the old code - except the duty cycle is
effectively 100 percent instead of 99.[some number of 9's] percent.
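For reference, the mode 2 output Maciej describes, high all the time except one clock cycle per period, can be modeled with a toy function (this is only an illustration of the waveform, not QEMU or KVM code):

```c
#include <assert.h>

/* 8254 mode 2 (rate generator) toy model: for a divider of n input
 * clocks, OUT is high for clocks 0..n-2 of each period and goes low
 * only on the last clock (n-1), giving a duty cycle of (n-1)/n. */
int pit_mode2_out(unsigned divider, unsigned clock)
{
    return (clock % divider) != divider - 1;
}
```

Against this model, the old KVM code's single rising edge and the new code's "leave it at 1" behavior differ only in that missing one-clock low pulse, which is the 100 vs. 99.x percent duty cycle point above.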

-
But there might be some concerns in abnormal conditions:

   * If some guest is actually depending on a 50 percent duty cycle
 (maybe some kind of polling rather than interrupts), I would
 expect it to

Re: [Qemu-devel] [PATCH 3/4] tcg: Move tcg_target_get_call_iarg_regs_count to tcg.c

2012-09-12 Thread Stefan Weil

Am 12.09.2012 22:59, schrieb Aurelien Jarno:

On Wed, Sep 12, 2012 at 10:44:37PM +0200, Stefan Weil wrote:

The TCG targets no longer need individual implementations.

Signed-off-by: Stefan Weil 
---
  tcg/arm/tcg-target.c   |6 --
  tcg/hppa/tcg-target.c  |6 --
  tcg/i386/tcg-target.c  |6 --
  tcg/ia64/tcg-target.c  |6 --
  tcg/mips/tcg-target.c  |6 --
  tcg/ppc/tcg-target.c   |6 --
  tcg/ppc64/tcg-target.c |6 --
  tcg/s390/tcg-target.c  |5 -
  tcg/sparc/tcg-target.c |6 --
  tcg/tcg.c  |7 ++-
  tcg/tci/tcg-target.c   |6 --
  11 files changed, 6 insertions(+), 60 deletions(-)




[...]


diff --git a/tcg/tcg.c b/tcg/tcg.c
index a4e7f42..53316f6 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -89,7 +89,6 @@ static void tcg_out_st(TCGContext *s, TCGType type, TCGReg 
arg, TCGReg arg1,
 tcg_target_long arg2);
  static int tcg_target_const_match(tcg_target_long val,
const TCGArgConstraint *arg_ct);
-static int tcg_target_get_call_iarg_regs_count(int flags);
  
  TCGOpDef tcg_op_defs[] = {

  #define DEF(s, oargs, iargs, cargs, flags) { #s, oargs, iargs, cargs, iargs + 
oargs + cargs, flags },
@@ -182,6 +181,12 @@ int gen_new_label(void)
  
  #include "tcg-target.c"
  
+/* Maximum number of registers used for input function arguments. */

+static inline int tcg_target_get_call_iarg_regs_count(int flags)
+{
+return ARRAY_SIZE(tcg_target_call_iarg_regs);
+}
+

Do we really need a function for that, given that it has only one
caller?

For me ARRAY_SIZE(tcg_target_call_iarg_regs) is even more understandable
than tcg_target_get_call_iarg_regs_count().



I agree. tcg_target_get_call_iarg_regs_count can be removed
completely, but only if we find a solution for the problem
which Peter Maydell found: some code accesses
tcg_target_call_iarg_regs unconditionally even for x86 32 bit,
so for the moment, we cannot reduce its size to 0.
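For illustration, the replacement Aurelien suggests is just the ARRAY_SIZE macro applied to the register array; the example array contents below are made up, not the real TCG register encodings:

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Stand-in for tcg_target_call_iarg_regs: six hypothetical register ids,
 * as on x86-64 (RDI, RSI, RDX, RCX, R8, R9). */
static const int call_iarg_regs_example[] = { 7, 6, 2, 1, 8, 9 };
```

With this, every former call to tcg_target_get_call_iarg_regs_count() becomes ARRAY_SIZE(tcg_target_call_iarg_regs), and, as Peter's objection shows, a zero-length array makes any remaining unconditional element access a bug the count function used to hide.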

Stefan




Re: [Qemu-devel] [PATCH 1/9] ehci: Don't set seen to 0 when removing unseen queue-heads

2012-09-12 Thread Gerd Hoffmann
On 09/12/12 15:08, Hans de Goede wrote:
> When removing unseen queue-heads from the async queue list, we should not
> set the seen flag to 0, as this may cause them to be removed by
> ehci_queues_rip_unused() during the next call to ehci_advance_async_state()
> if the timer is late or running at a low frequency.

Patch series added to usb patch queue.

thanks,
  Gerd



Re: [Qemu-devel] [libvirt] [PATCH] snapshot: fix rollback failure in transaction mode

2012-09-12 Thread Guannan Ren

On 09/13/2012 01:47 AM, Eric Blake wrote:

On 09/12/2012 09:22 AM, Guannan Ren wrote:

After a failure of the qemu 'transaction' command, the snapshot file is
still there with a non-zero size.  In order to unlink the file, the
patch removes the file size check.

Can you give some exact steps to reproduce this, so that I know who is
making the file have non-zero size?  I'm worried that unlinking a
non-empty file is discarding data, so the commit message needs a lot
more details about how we are proving that the only way the file can be
non-zero size is because qemu happened to put data into a previously
empty file prior to the failed 'transaction' attempt.


 qemu left a non-empty file.
 Steps:
 1, Create a qemu instance with two drive images of qcow2 type (root user)

  /usr/libexec/qemu-kvm -m 1024 -smp 1 -name "rhel6u1" \
   -drive file=/var/lib/libvirt/images/firstqcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none \
   -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
   -drive file=/var/lib/libvirt/images/secondqcow2,if=none,id=drive-virtio-disk1,format=qcow2,cache=none \
   -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1 -qmp stdio


 2, Initialize qemu qmp
 {"execute":"qmp_capabilities"}

 3, Remove the second drive image file
 rm -f /var/lib/libvirt/images/secondqcow2

 4, Run 'transaction' command with snapshot qemu commands in.
 {"execute":"transaction","arguments":
   {"actions":
[{"type":"blockdev-snapshot-sync","data":{"device":"drive-virtio-disk0","snapshot-file":"/var/lib/libvirt/images/firstqcow2-snapshot.img","format":"qcow2"}},
{"type":"blockdev-snapshot-sync","data":{"device":"drive-virtio-disk1","snapshot-file":"/var/lib/libvirt/images/secondqcow2-snapshot.img","format":"qcow2"}}]
   },
 "id":"libvirt-6"}

 5, Got the error as follows:
 {"id": "libvirt-6", "error": {"class": "OpenFileFailed", "desc": "Could not open '/var/lib/libvirt/images/secondqcow2-snapshot.img'", "data": {"filename": "/var/lib/libvirt/images/secondqcow2-snapshot.img"}}}


 6, List the first newly-created snapshot file:
 -rw-r--r--. 1 root root 262144 Sep 13 11:43 firstqcow2-snapshot.img




That is, after re-reading context code just now, I'm fairly confident
that this code can only be reached when qemu supports the 'transaction'
monitor command, and libvirt's --reuse-ext flag was not specified, and
therefore libvirt must have just created the file.  But I want that in
the commit message rather than having to re-read the code every time I
visit this commit in future reads of the git log.



That's OK, I will add this information to the commit message in v2.




It may also be that
qemu has a bug in that the 'transaction' command is modifying files even
when it fails, so even while this works around the bug, I'm cc'ing Jeff
to see if qemu also needs a bug fix.





Re: [Qemu-devel] [PATCH V2 2/6] libqblock type and structure defines

2012-09-12 Thread Eric Blake
On 09/12/2012 09:33 PM, Eric Blake wrote:
>>  OK ,then I think
>> #if __GNUC__ >= 4
>> 
>> #else
>>   [warn name space pollution may happen]
>> #endif
>> would be better.
> 
> It may be shorter, but it is definitely not better, at least not in the
> current context of qemu.  Using the short form will fail a -Werror
> build, unless you also write a patch to qemu's configure to quit
> supplying -Wundef during builds.  But as touching configure has a bigger
> impact to the overall qemu project, you're going to need a lot more
> buy-in from other developers that -Wundef is not helping qemu gain any
> portability, and that it is safe to ditch it (or get enough
> counter-arguments from other developers why qemu insists on the
> anachronistic style enforced by -Wundef, at which point you must comply
> and use the longer form).

On second thought, this _particular_ usage will never fail a -Wundef
-Werror build, precisely because -Wundef is a gcc warning, which implies
the warning is only ever useful in the same scenarios that the __GNUC__
macro is always defined (that is, __GNUC__ is undefined only on a
non-gcc compiler, but what non-gcc compiler supports -Wundef -Werror?).

But why should this line be the one exemption to the rules?  Either qemu
insists on the -Wundef style of coding (and you should use the long form
to conform to that style, on the off-chance that someone ever wants to
port to a non-gcc compiler, even in this one place where gcc can't warn
you about the violation of that style), or we should change the qemu
style (at which point, the short form is nicer here, but it also implies
the potential for cleaning up lots of other places to also use short
forms and rely on preprocessor 0 computation).

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





[Qemu-devel] [PATCH 01/13] ppc: Make kvm_arch_put_registers() put *all* the registers

2012-09-12 Thread David Gibson
At least when invoked with high enough 'level' arguments,
kvm_arch_put_registers() is supposed to copy essentially all the cpu state
as encoded in qemu's internal structures into the kvm state.  Currently
the ppc version does not do this - it never calls KVM_SET_SREGS, for
example, and therefore never sets the SDR1 and various other important
though rarely changed registers.

Instead, the code paths which need to set these registers need to
explicitly make (conditional) kvm calls which transfer the changes to kvm.
This breaks the usual model of handling state updates in qemu, where code
just changes the internal model and has it flushed out to kvm automatically
at some later point.

This patch fixes this for Book S ppc CPUs by adding a suitable call to
KVM_SET_SREGS and also to KVM_SET_ONE_REG to set the HIOR (the only register
that is set with that call so far).  This lets us remove the hacks to
explicitly set these registers from the kvmppc_set_papr() function.

The problem still exists for Book E CPUs (which use a different version of
the kvm_sregs structure).  But fixing that has some complications of its
own so can be left to another day.

Likewise, there is still some ugly code for setting the PVR through special
calls to SET_SREGS which is left in for now.  The PVR needs to be set
especially early because it can affect what other features are available
on the CPU, so I need to do more thinking to see if it can be integrated
into the normal paths or not.

Signed-off-by: David Gibson 
---
 target-ppc/kvm.c |   89 ++
 1 file changed, 50 insertions(+), 39 deletions(-)

diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index a31d278..1a7489b 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -60,6 +60,7 @@ static int cap_booke_sregs;
 static int cap_ppc_smt;
 static int cap_ppc_rma;
 static int cap_spapr_tce;
+static int cap_hior;
 
 /* XXX We have a race condition where we actually have a level triggered
  * interrupt, but the infrastructure can't expose that yet, so the guest
@@ -86,6 +87,7 @@ int kvm_arch_init(KVMState *s)
 cap_ppc_smt = kvm_check_extension(s, KVM_CAP_PPC_SMT);
 cap_ppc_rma = kvm_check_extension(s, KVM_CAP_PPC_RMA);
 cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE);
+cap_hior = kvm_check_extension(s, KVM_CAP_PPC_HIOR);
 
 if (!cap_interrupt_level) {
 fprintf(stderr, "KVM: Couldn't find level irq capability. Expect the "
@@ -469,6 +471,53 @@ int kvm_arch_put_registers(CPUPPCState *env, int level)
 env->tlb_dirty = false;
 }
 
+if (cap_segstate && (level >= KVM_PUT_RESET_STATE)) {
+struct kvm_sregs sregs;
+
+sregs.pvr = env->spr[SPR_PVR];
+
+sregs.u.s.sdr1 = env->spr[SPR_SDR1];
+
+/* Sync SLB */
+#ifdef TARGET_PPC64
+for (i = 0; i < 64; i++) {
+sregs.u.s.ppc64.slb[i].slbe = env->slb[i].esid;
+sregs.u.s.ppc64.slb[i].slbv = env->slb[i].vsid;
+}
+#endif
+
+/* Sync SRs */
+for (i = 0; i < 16; i++) {
+sregs.u.s.ppc32.sr[i] = env->sr[i];
+}
+
+/* Sync BATs */
+for (i = 0; i < 8; i++) {
+sregs.u.s.ppc32.dbat[i] = ((uint64_t)env->DBAT[1][i] << 32)
+| env->DBAT[0][i];
+sregs.u.s.ppc32.ibat[i] = ((uint64_t)env->IBAT[1][i] << 32)
+| env->IBAT[0][i];
+}
+
+ret = kvm_vcpu_ioctl(env, KVM_SET_SREGS, &sregs);
+if (ret) {
+return ret;
+}
+}
+
+if (cap_hior && (level >= KVM_PUT_RESET_STATE)) {
+uint64_t hior = env->spr[SPR_HIOR];
+struct kvm_one_reg reg = {
+.id = KVM_REG_PPC_HIOR,
+.addr = (uintptr_t) &hior,
+};
+
+ret = kvm_vcpu_ioctl(env, KVM_SET_ONE_REG, &reg);
+if (ret) {
+return ret;
+}
+}
+
 return ret;
 }
 
@@ -946,52 +995,14 @@ int kvmppc_get_hypercall(CPUPPCState *env, uint8_t *buf, 
int buf_len)
 void kvmppc_set_papr(CPUPPCState *env)
 {
 struct kvm_enable_cap cap = {};
-struct kvm_one_reg reg = {};
-struct kvm_sregs sregs = {};
 int ret;
-uint64_t hior = env->spr[SPR_HIOR];
 
 cap.cap = KVM_CAP_PPC_PAPR;
 ret = kvm_vcpu_ioctl(env, KVM_ENABLE_CAP, &cap);
 
 if (ret) {
-goto fail;
-}
-
-/*
- * XXX We set HIOR here. It really should be a qdev property of
- * the CPU node, but we don't have CPUs converted to qdev yet.
- *
- * Once we have qdev CPUs, move HIOR to a qdev property and
- * remove this chunk.
- */
-reg.id = KVM_REG_PPC_HIOR;
-reg.addr = (uintptr_t)&hior;
-ret = kvm_vcpu_ioctl(env, KVM_SET_ONE_REG, &reg);
-if (ret) {
-fprintf(stderr, "Couldn't set HIOR. Maybe you're running an old \n"
-"kernel with support for HV KVM but no PAPR PR \n"
-"KVM in which case things will work. If they don't \n"
-

[Qemu-devel] [PATCH 06/13] pseries: Reset emulated PCI TCE tables on system reset

2012-09-12 Thread David Gibson
The emulated PCI host bridge on the pseries machine incorporates an IOMMU
(PAPR TCE table).  Currently the mappings in this IOMMU are not cleared
when we reset the system.  This patch fixes this bug.  To do this it adds
a new reset function to the IOMMU emulation code.  The VIO devices already
reset their TCE tables, but they do so by destroying and re-creating their
DMA context.  This doesn't work for the PCI host bridge, because the
infrastructure for PCI IOMMUs has already copied/cached the DMA pointer
context into the subordinate PCI device structures.

Signed-off-by: David Gibson 
---
 hw/spapr.h   |1 +
 hw/spapr_iommu.c |   11 +++
 hw/spapr_pci.c   |   10 ++
 3 files changed, 22 insertions(+)

diff --git a/hw/spapr.h b/hw/spapr.h
index f1fb646..f9a7b0f 100644
--- a/hw/spapr.h
+++ b/hw/spapr.h
@@ -338,6 +338,7 @@ typedef struct sPAPRTCE {
 void spapr_iommu_init(void);
 DMAContext *spapr_tce_new_dma_context(uint32_t liobn, size_t window_size);
 void spapr_tce_free(DMAContext *dma);
+void spapr_tce_reset(DMAContext *dma);
 int spapr_dma_dt(void *fdt, int node_off, const char *propname,
  uint32_t liobn, uint64_t window, uint32_t size);
 int spapr_tcet_dma_dt(void *fdt, int node_off, const char *propname,
diff --git a/hw/spapr_iommu.c b/hw/spapr_iommu.c
index 53b7317..216aa06 100644
--- a/hw/spapr_iommu.c
+++ b/hw/spapr_iommu.c
@@ -162,6 +162,17 @@ void spapr_tce_free(DMAContext *dma)
 }
 }
 
+void spapr_tce_reset(DMAContext *dma)
+{
+if (dma) {
+sPAPRTCETable *tcet = DO_UPCAST(sPAPRTCETable, dma, dma);
+size_t table_size = (tcet->window_size >> SPAPR_TCE_PAGE_SHIFT)
+* sizeof(sPAPRTCE);
+
+memset(tcet->table, 0, table_size);
+}
+}
+
 static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
 target_ulong tce)
 {
diff --git a/hw/spapr_pci.c b/hw/spapr_pci.c
index 661c05b..203155e 100644
--- a/hw/spapr_pci.c
+++ b/hw/spapr_pci.c
@@ -595,6 +595,15 @@ static int spapr_phb_init(SysBusDevice *s)
 return 0;
 }
 
+static void spapr_phb_reset(DeviceState *qdev)
+{
+SysBusDevice *s = sysbus_from_qdev(qdev);
+sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(s);
+
+/* Reset the IOMMU state */
+spapr_tce_reset(sphb->dma);
+}
+
 static Property spapr_phb_properties[] = {
 DEFINE_PROP_HEX64("buid", sPAPRPHBState, buid, 0),
 DEFINE_PROP_STRING("busname", sPAPRPHBState, busname),
@@ -613,6 +622,7 @@ static void spapr_phb_class_init(ObjectClass *klass, void 
*data)
 
 sdc->init = spapr_phb_init;
 dc->props = spapr_phb_properties;
+dc->reset = spapr_phb_reset;
 }
 
 static const TypeInfo spapr_phb_info = {
-- 
1.7.10.4




[Qemu-devel] [PATCH 10/13] pseries: Remove XICS irq type enum type

2012-09-12 Thread David Gibson
Currently the XICS interrupt controller emulation uses a custom enum to
specify whether a given interrupt is level-sensitive or message-triggered.
This enum makes life awkward for saving the state, and isn't particularly
useful since there are only two possibilities.  This patch replaces the
enum with a simple bool.

Signed-off-by: David Gibson 
---
 hw/spapr.c |8 
 hw/spapr.h |8 
 hw/spapr_pci.c |2 +-
 hw/xics.c  |   16 +++-
 hw/xics.h  |8 +---
 5 files changed, 17 insertions(+), 25 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index 0a0e9cd..1177efa 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -89,7 +89,7 @@
 
 sPAPREnvironment *spapr;
 
-int spapr_allocate_irq(int hint, enum xics_irq_type type)
+int spapr_allocate_irq(int hint, bool lsi)
 {
 int irq;
 
@@ -105,13 +105,13 @@ int spapr_allocate_irq(int hint, enum xics_irq_type type)
 return 0;
 }
 
-xics_set_irq_type(spapr->icp, irq, type);
+xics_set_irq_type(spapr->icp, irq, lsi);
 
 return irq;
 }
 
 /* Allocate block of consequtive IRQs, returns a number of the first */
-int spapr_allocate_irq_block(int num, enum xics_irq_type type)
+int spapr_allocate_irq_block(int num, bool lsi)
 {
 int first = -1;
 int i;
@@ -119,7 +119,7 @@ int spapr_allocate_irq_block(int num, enum xics_irq_type 
type)
 for (i = 0; i < num; ++i) {
 int irq;
 
-irq = spapr_allocate_irq(0, type);
+irq = spapr_allocate_irq(0, lsi);
 if (!irq) {
 return -1;
 }
diff --git a/hw/spapr.h b/hw/spapr.h
index f9a7b0f..51a966b 100644
--- a/hw/spapr.h
+++ b/hw/spapr.h
@@ -291,17 +291,17 @@ void spapr_register_hypercall(target_ulong opcode, 
spapr_hcall_fn fn);
 target_ulong spapr_hypercall(CPUPPCState *env, target_ulong opcode,
  target_ulong *args);
 
-int spapr_allocate_irq(int hint, enum xics_irq_type type);
-int spapr_allocate_irq_block(int num, enum xics_irq_type type);
+int spapr_allocate_irq(int hint, bool lsi);
+int spapr_allocate_irq_block(int num, bool lsi);
 
 static inline int spapr_allocate_msi(int hint)
 {
-return spapr_allocate_irq(hint, XICS_MSI);
+return spapr_allocate_irq(hint, false);
 }
 
 static inline int spapr_allocate_lsi(int hint)
 {
-return spapr_allocate_irq(hint, XICS_LSI);
+return spapr_allocate_irq(hint, true);
 }
 
 static inline uint32_t rtas_ld(target_ulong phys, int n)
diff --git a/hw/spapr_pci.c b/hw/spapr_pci.c
index 203155e..b628f89 100644
--- a/hw/spapr_pci.c
+++ b/hw/spapr_pci.c
@@ -351,7 +351,7 @@ static void rtas_ibm_change_msi(sPAPREnvironment *spapr,
 
 /* There is no cached config, allocate MSIs */
 if (!phb->msi_table[ndev].nvec) {
-irq = spapr_allocate_irq_block(req_num, XICS_MSI);
+irq = spapr_allocate_irq_block(req_num, true);
 if (irq < 0) {
 fprintf(stderr, "Cannot allocate MSIs for device#%d", ndev);
 rtas_st(rets, 0, -1); /* Hardware error */
diff --git a/hw/xics.c b/hw/xics.c
index 648af25..75c8cca 100644
--- a/hw/xics.c
+++ b/hw/xics.c
@@ -170,7 +170,7 @@ struct ics_irq_state {
 #define XICS_STATUS_REJECTED   0x4
 #define XICS_STATUS_MASKED_PENDING 0x8
 uint8_t status;
-enum xics_irq_type type;
+bool lsi;
 };
 
 struct ics_state {
@@ -244,7 +244,7 @@ static void ics_set_irq(void *opaque, int srcno, int val)
 struct ics_state *ics = (struct ics_state *)opaque;
 struct ics_irq_state *irq = ics->irqs + srcno;
 
-if (irq->type == XICS_LSI) {
+if (irq->lsi) {
 set_irq_lsi(ics, srcno, val);
 } else {
 set_irq_msi(ics, srcno, val);
@@ -278,7 +278,7 @@ static void ics_write_xive(struct ics_state *ics, int nr, 
int server,
 irq->server = server;
 irq->priority = priority;
 
-if (irq->type == XICS_LSI) {
+if (irq->lsi) {
 write_xive_lsi(ics, srcno);
 } else {
 write_xive_msi(ics, srcno);
@@ -301,7 +301,7 @@ static void ics_resend(struct ics_state *ics)
 struct ics_irq_state *irq = ics->irqs + i;
 
 /* FIXME: filter by server#? */
-if (irq->type == XICS_LSI) {
+if (irq->lsi) {
 resend_lsi(ics, i);
 } else {
 resend_msi(ics, i);
@@ -314,7 +314,7 @@ static void ics_eoi(struct ics_state *ics, int nr)
 int srcno = nr - ics->offset;
 struct ics_irq_state *irq = ics->irqs + srcno;
 
-if (irq->type == XICS_LSI) {
+if (irq->lsi) {
 irq->status &= ~XICS_STATUS_SENT;
 }
 }
@@ -333,14 +333,12 @@ qemu_irq xics_get_qirq(struct icp_state *icp, int irq)
 return icp->ics->qirqs[irq - icp->ics->offset];
 }
 
-void xics_set_irq_type(struct icp_state *icp, int irq,
-   enum xics_irq_type type)
+void xics_set_irq_type(struct icp_state *icp, int irq, bool lsi)
 {
 assert((irq >= icp->ics->offset)
&& (irq < (icp->ics->offset + icp->ics->nr_irqs)));
-assert((type == XICS_MSI) || (type == XIC

[Qemu-devel] [PATCH 07/13] pseries: Fix XICS reset

2012-09-12 Thread David Gibson
The XICS interrupt controller used on the pseries machine currently has no
reset handler.  We can get away with this under some circumstances, but
it's not correct, and can cause failures if the XICS happens to be in the
wrong state at the time of reset.

This patch adds a hook to properly reset the XICS state.

Signed-off-by: David Gibson 
---
 hw/xics.c |   38 --
 1 file changed, 28 insertions(+), 10 deletions(-)

diff --git a/hw/xics.c b/hw/xics.c
index b674771..a8a08ce 100644
--- a/hw/xics.c
+++ b/hw/xics.c
@@ -489,11 +489,36 @@ static void rtas_int_on(sPAPREnvironment *spapr, uint32_t 
token,
 rtas_st(rets, 0, 0); /* Success */
 }
 
+static void xics_reset(void *opaque)
+{
+struct icp_state *icp = (struct icp_state *)opaque;
+struct ics_state *ics = icp->ics;
+int i;
+
+for (i = 0; i < icp->nr_servers; i++) {
+icp->ss[i].xirr = 0;
+icp->ss[i].pending_priority = 0;
+icp->ss[i].mfrr = 0xff;
+/* Make all outputs are deasserted */
+qemu_set_irq(icp->ss[i].output, 0);
+}
+
+for (i = 0; i < ics->nr_irqs; i++) {
+/* Reset everything *except* the type */
+ics->irqs[i].server = 0;
+ics->irqs[i].asserted = 0;
+ics->irqs[i].sent = 0;
+ics->irqs[i].rejected = 0;
+ics->irqs[i].masked_pending = 0;
+ics->irqs[i].priority = 0xff;
+ics->irqs[i].saved_priority = 0xff;
+}
+}
+
 struct icp_state *xics_system_init(int nr_irqs)
 {
 CPUPPCState *env;
 int max_server_num;
-int i;
 struct icp_state *icp;
 struct ics_state *ics;
 
@@ -508,10 +533,6 @@ struct icp_state *xics_system_init(int nr_irqs)
 icp->nr_servers = max_server_num + 1;
 icp->ss = g_malloc0(icp->nr_servers*sizeof(struct icp_server_state));
 
-for (i = 0; i < icp->nr_servers; i++) {
-icp->ss[i].mfrr = 0xff;
-}
-
 for (env = first_cpu; env != NULL; env = env->next_cpu) {
 struct icp_server_state *ss = &icp->ss[env->cpu_index];
 
@@ -539,11 +560,6 @@ struct icp_state *xics_system_init(int nr_irqs)
 icp->ics = ics;
 ics->icp = icp;
 
-for (i = 0; i < nr_irqs; i++) {
-ics->irqs[i].priority = 0xff;
-ics->irqs[i].saved_priority = 0xff;
-}
-
 ics->qirqs = qemu_allocate_irqs(ics_set_irq, ics, nr_irqs);
 
 spapr_register_hypercall(H_CPPR, h_cppr);
@@ -556,5 +572,7 @@ struct icp_state *xics_system_init(int nr_irqs)
 spapr_rtas_register("ibm,int-off", rtas_int_off);
 spapr_rtas_register("ibm,int-on", rtas_int_on);
 
+qemu_register_reset(xics_reset, icp);
+
 return icp;
 }
-- 
1.7.10.4




[Qemu-devel] [PATCH 02/13] pseries: Fix and cleanup CPU initialization and reset

2012-09-12 Thread David Gibson
The current pseries machine init function iterates over the CPUs at several
points, doing various bits of initialization.  This is messy; these can
and should be merged into a single iteration doing all the necessary per
cpu initialization.  Worse, some of these initializations were setting up
state which should be set on every reset, not just at machine init time.
A few of the initializations simply weren't necessary at all.

This patch, therefore, moves those things that need to be to the
per-cpu reset handler, and combines the remainder into two loops over
the cpus (which also creates them).  The second loop is for setting up
hash table information, and will be removed in a subsequent patch also
making other fixes to the hash table setup.

This exposes a bug in our start-cpu RTAS routine (called by the guest to
start up CPUs other than CPU0) under kvm.  Previously, this function did
not make a call to ensure that its changes to the new cpu's state were
pushed into KVM in-kernel state.  We sort-of got away with this because
some of the initializations had already placed the secondary CPUs into the
right starting state for the sorts of Linux guests we've been running.

Nonetheless the start-cpu RTAS call's behaviour was not correct and could
easily have been broken by guest changes.  This patch also fixes it.

Signed-off-by: David Gibson 
---
 hw/spapr.c  |   34 --
 hw/spapr_rtas.c |5 +
 2 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index c34b767..d88525a 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -581,8 +581,16 @@ static void spapr_reset(void *opaque)
 static void spapr_cpu_reset(void *opaque)
 {
 PowerPCCPU *cpu = opaque;
+CPUPPCState *env = &cpu->env;
 
 cpu_reset(CPU(cpu));
+
+/* All CPUs start halted.  CPU0 is unhalted from the machine level
+ * reset code and the rest are explicitly started up by the guest
+ * using an RTAS call */
+env->halted = 1;
+
+env->spr[SPR_HIOR] = 0;
 }
 
 /* Returns whether we want to use VGA or not */
@@ -665,11 +673,16 @@ static void ppc_spapr_init(ram_addr_t ram_size,
 
 /* Set time-base frequency to 512 MHz */
 cpu_ppc_tb_init(env, TIMEBASE_FREQ);
-qemu_register_reset(spapr_cpu_reset, cpu);
 
-env->hreset_vector = 0x60;
+/* PAPR always has exception vectors in RAM not ROM */
 env->hreset_excp_prefix = 0;
-env->gpr[3] = env->cpu_index;
+
+/* Tell KVM that we're in PAPR mode */
+if (kvm_enabled()) {
+kvmppc_set_papr(env);
+}
+
+qemu_register_reset(spapr_cpu_reset, cpu);
 }
 
 /* allocate RAM */
@@ -685,7 +698,10 @@ static void ppc_spapr_init(ram_addr_t ram_size,
 
 /* allocate hash page table.  For now we always make this 16mb,
  * later we should probably make it scale to the size of guest
- * RAM */
+ * RAM.  FIXME: setting the htab information in the CPU env really
+ * belongs at CPU reset time, but we can get away with it for now
+ * because the PAPR guest is not permitted to write SDR1 so in
+ * fact these settings will never change during the run */
 spapr->htab_size = 1ULL << (pteg_shift + 7);
 spapr->htab = qemu_memalign(spapr->htab_size, spapr->htab_size);
 
@@ -697,11 +713,6 @@ static void ppc_spapr_init(ram_addr_t ram_size,
 /* Tell KVM that we're in PAPR mode */
 env->spr[SPR_SDR1] = (unsigned long)spapr->htab |
  ((pteg_shift + 7) - 18);
-env->spr[SPR_HIOR] = 0;
-
-if (kvm_enabled()) {
-kvmppc_set_papr(env);
-}
 }
 
 filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, "spapr-rtas.bin");
@@ -827,11 +838,6 @@ static void ppc_spapr_init(ram_addr_t ram_size,
 
 spapr->entry_point = 0x100;
 
-/* SLOF will startup the secondary CPUs using RTAS */
-for (env = first_cpu; env != NULL; env = env->next_cpu) {
-env->halted = 1;
-}
-
 /* Prepare the device tree */
 spapr->fdt_skel = spapr_create_fdt_skel(cpu_model, rma_size,
 initrd_base, initrd_size,
diff --git a/hw/spapr_rtas.c b/hw/spapr_rtas.c
index ae18595..b808f80 100644
--- a/hw/spapr_rtas.c
+++ b/hw/spapr_rtas.c
@@ -184,6 +184,11 @@ static void rtas_start_cpu(sPAPREnvironment *spapr,
 return;
 }
 
+/* This will make sure qemu state is up to date with kvm, and
+ * mark it dirty so our changes get flushed back before the
+ * new cpu enters */
+kvm_cpu_synchronize_state(env);
+
 env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME);
 env->nip = start;
 env->gpr[3] = r3;
-- 
1.7.10.4




[Qemu-devel] [PATCH 05/13] pseries: Clear TCE and signal state when resetting PAPR VIO devices

2012-09-12 Thread David Gibson
When we reset the system, the reset method for VIO bus devices resets
the state of their request queue (if present) as it should.  However
it was not resetting the state of their TCE table (DMA translation) if
present.  It was also not resetting the state of the per-device signal
mask set with H_VIO_SIGNAL.  This patch corrects both bugs, and also
removes some small code duplication in the reset paths.

Signed-off-by: David Gibson 
---
 hw/spapr_vio.c |   11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c
index 7ca4452..752836e 100644
--- a/hw/spapr_vio.c
+++ b/hw/spapr_vio.c
@@ -324,9 +324,7 @@ static void spapr_vio_quiesce_one(VIOsPAPRDevice *dev)
 }
 dev->dma = spapr_tce_new_dma_context(liobn, pc->rtce_window_size);
 
-dev->crq.qladdr = 0;
-dev->crq.qsize = 0;
-dev->crq.qnext = 0;
+free_crq(dev);
 }
 
 static void rtas_set_tce_bypass(sPAPREnvironment *spapr, uint32_t token,
@@ -409,9 +407,10 @@ static void spapr_vio_busdev_reset(DeviceState *qdev)
 VIOsPAPRDevice *dev = DO_UPCAST(VIOsPAPRDevice, qdev, qdev);
 VIOsPAPRDeviceClass *pc = VIO_SPAPR_DEVICE_GET_CLASS(dev);
 
-if (dev->crq.qsize) {
-free_crq(dev);
-}
+/* Shut down the request queue and TCEs if necessary */
+spapr_vio_quiesce_one(dev);
+
+dev->signal_state = 0;
 
 if (pc->reset) {
 pc->reset(dev);
-- 
1.7.10.4




[Qemu-devel] [0/13] pseries patch queue

2012-09-12 Thread David Gibson
Hi Alex,

Here's my current set of ready-to-merge pseries related patches.  This
includes the latest revisions of the patches to fix system reset which
I posted before, plus a number of other small bugfixes and cleanups.




Re: [Qemu-devel] [PATCH V2 2/6] libqblock type and structure defines

2012-09-12 Thread Eric Blake
On 09/12/2012 09:24 PM, Wenchao Xia wrote:
> 于 2012-9-12 20:59, Eric Blake 写道:
>> On 09/11/2012 09:05 PM, Wenchao Xia wrote:
 Seriously?  We require a C99-compliant compiler, which is required to
 treat the more compact version identically (all undefined names
 evaluate
 to 0 in the preprocessor), and HACKING doesn't mandate that we spell
 out
 a defined-ness check first.  Okay, so configure adds -Wundef to the set
 of CFLAGS, but I fail to see why we want -Wundef (that's an
 anachronistic warning, only there to help you if you are writing code
 portable to K&R).

>>>So if the preprocessor replaced __GNUC__ to 0, is there difference
>>> between these two kinds of macros?
>>> #if __GNUC__ >= 4
>>> #if defined(__GNUC__) && __GNUC__ >=4
>>
>> The only difference is whether -Wundef will warn, and I'm trying to
>> argue that qemu's current use of -Wundef is pointless, as that warning
>> exists solely for K&R compatibility, not for modern standard-compliant
>> code correctness.
>>
>  OK ,then I think
> #if __GNUC__ >= 4
> 
> #else
>   [warn name space pollution may happen]
> #endif
> would be better.

It may be shorter, but it is definitely not better, at least not in the
current context of qemu.  Using the short form will fail a -Werror
build, unless you also write a patch to qemu's configure to quit
supplying -Wundef during builds.  But as touching configure has a bigger
impact to the overall qemu project, you're going to need a lot more
buy-in from other developers that -Wundef is not helping qemu gain any
portability, and that it is safe to ditch it (or get enough
counter-arguments from other developers why qemu insists on the
anachronistic style enforced by -Wundef, at which point you must comply
and use the longer form).

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





[Qemu-devel] [PATCH 11/13] pseries: Remove never used flags field from spapr vio devices

2012-09-12 Thread David Gibson
The general device state structure for PAPR VIO emulated devices includes a
'flags' field which was never used.  This patch removes it.

Signed-off-by: David Gibson 
---
 hw/spapr_vio.h |1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/spapr_vio.h b/hw/spapr_vio.h
index ea6aa43..acef65e 100644
--- a/hw/spapr_vio.h
+++ b/hw/spapr_vio.h
@@ -60,7 +60,6 @@ typedef struct VIOsPAPRDeviceClass {
 struct VIOsPAPRDevice {
 DeviceState qdev;
 uint32_t reg;
-uint32_t flags;
 uint32_t irq;
 target_ulong signal_state;
 VIOsPAPR_CRQ crq;
-- 
1.7.10.4




Re: [Qemu-devel] [PATCH V2 2/6] libqblock type and structure defines

2012-09-12 Thread Wenchao Xia

于 2012-9-12 20:59, Eric Blake 写道:

On 09/11/2012 09:05 PM, Wenchao Xia wrote:

Seriously?  We require a C99-compliant compiler, which is required to
treat the more compact version identically (all undefined names evaluate
to 0 in the preprocessor), and HACKING doesn't mandate that we spell out
a defined-ness check first.  Okay, so configure adds -Wundef to the set
of CFLAGS, but I fail to see why we want -Wundef (that's an
anachronistic warning, only there to help you if you are writing code
portable to K&R).


   So if the preprocessor replaced __GNUC__ to 0, is there difference
between these two kinds of macros?
#if __GNUC__ >= 4
#if defined(__GNUC__) && __GNUC__ >=4


The only difference is whether -Wundef will warn, and I'm trying to
argue that qemu's current use of -Wundef is pointless, as that warning
exists solely for K&R compatibility, not for modern standard-compliant
code correctness.


 OK ,then I think
#if __GNUC__ >= 4

#else
  [warn name space pollution may happen]
#endif
would be better.
--
Best Regards

Wenchao Xia




[Qemu-devel] [PATCH 09/13] pseries: Remove C bitfields from xics code

2012-09-12 Thread David Gibson
The XICS interrupt controller emulation uses some C bitfield variables in
its internal state structure.  This makes life awkward for saving the state
because we don't have easy VMSTATE helpers for bitfields.

This patch removes the bitfields, instead using explicit bit masking in a
single status variable.

Signed-off-by: David Gibson 
---
 hw/xics.c |   43 ---
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/hw/xics.c b/hw/xics.c
index a8a08ce..648af25 100644
--- a/hw/xics.c
+++ b/hw/xics.c
@@ -165,11 +165,12 @@ struct ics_irq_state {
 int server;
 uint8_t priority;
 uint8_t saved_priority;
+#define XICS_STATUS_ASSERTED   0x1
+#define XICS_STATUS_SENT   0x2
+#define XICS_STATUS_REJECTED   0x4
+#define XICS_STATUS_MASKED_PENDING 0x8
+uint8_t status;
 enum xics_irq_type type;
-int asserted:1;
-int sent:1;
-int rejected:1;
-int masked_pending:1;
 };
 
 struct ics_state {
@@ -191,8 +192,8 @@ static void resend_msi(struct ics_state *ics, int srcno)
 struct ics_irq_state *irq = ics->irqs + srcno;
 
 /* FIXME: filter by server#? */
-if (irq->rejected) {
-irq->rejected = 0;
+if (irq->status & XICS_STATUS_REJECTED) {
+irq->status &= ~XICS_STATUS_REJECTED;
 if (irq->priority != 0xff) {
 icp_irq(ics->icp, irq->server, srcno + ics->offset,
 irq->priority);
@@ -204,8 +205,10 @@ static void resend_lsi(struct ics_state *ics, int srcno)
 {
 struct ics_irq_state *irq = ics->irqs + srcno;
 
-if ((irq->priority != 0xff) && irq->asserted && !irq->sent) {
-irq->sent = 1;
+if ((irq->priority != 0xff)
+&& (irq->status & XICS_STATUS_ASSERTED)
+&& !(irq->status & XICS_STATUS_SENT)) {
+irq->status |= XICS_STATUS_SENT;
 icp_irq(ics->icp, irq->server, srcno + ics->offset, irq->priority);
 }
 }
@@ -216,7 +219,7 @@ static void set_irq_msi(struct ics_state *ics, int srcno, 
int val)
 
 if (val) {
 if (irq->priority == 0xff) {
-irq->masked_pending = 1;
+irq->status |= XICS_STATUS_MASKED_PENDING;
 /* masked pending */ ;
 } else  {
 icp_irq(ics->icp, irq->server, srcno + ics->offset, irq->priority);
@@ -228,7 +231,11 @@ static void set_irq_lsi(struct ics_state *ics, int srcno, 
int val)
 {
 struct ics_irq_state *irq = ics->irqs + srcno;
 
-irq->asserted = val;
+if (val) {
+irq->status |= XICS_STATUS_ASSERTED;
+} else {
+irq->status &= ~XICS_STATUS_ASSERTED;
+}
 resend_lsi(ics, srcno);
 }
 
@@ -248,11 +255,12 @@ static void write_xive_msi(struct ics_state *ics, int 
srcno)
 {
 struct ics_irq_state *irq = ics->irqs + srcno;
 
-if (!irq->masked_pending || (irq->priority == 0xff)) {
+if (!(irq->status & XICS_STATUS_MASKED_PENDING)
+|| (irq->priority == 0xff)) {
 return;
 }
 
-irq->masked_pending = 0;
+irq->status &= ~XICS_STATUS_MASKED_PENDING;
 icp_irq(ics->icp, irq->server, srcno + ics->offset, irq->priority);
 }
 
@@ -281,8 +289,8 @@ static void ics_reject(struct ics_state *ics, int nr)
 {
 struct ics_irq_state *irq = ics->irqs + nr - ics->offset;
 
-irq->rejected = 1; /* Irrelevant but harmless for LSI */
-irq->sent = 0; /* Irrelevant but harmless for MSI */
+irq->status |= XICS_STATUS_REJECTED; /* Irrelevant but harmless for LSI */
+irq->status &= ~XICS_STATUS_SENT; /* Irrelevant but harmless for MSI */
 }
 
 static void ics_resend(struct ics_state *ics)
@@ -307,7 +315,7 @@ static void ics_eoi(struct ics_state *ics, int nr)
 struct ics_irq_state *irq = ics->irqs + srcno;
 
 if (irq->type == XICS_LSI) {
-irq->sent = 0;
+irq->status &= ~XICS_STATUS_SENT;
 }
 }
 
@@ -506,10 +514,7 @@ static void xics_reset(void *opaque)
 for (i = 0; i < ics->nr_irqs; i++) {
 /* Reset everything *except* the type */
 ics->irqs[i].server = 0;
-ics->irqs[i].asserted = 0;
-ics->irqs[i].sent = 0;
-ics->irqs[i].rejected = 0;
-ics->irqs[i].masked_pending = 0;
+ics->irqs[i].status = 0;
 ics->irqs[i].priority = 0xff;
 ics->irqs[i].saved_priority = 0xff;
 }
-- 
1.7.10.4




[Qemu-devel] [PATCH 12/13] pseries: Rework implementation of TCE bypass

2012-09-12 Thread David Gibson
On the pseries machine the IOMMU (aka TCE tables) is always active for all
PCI and VIO devices.  Mostly to simplify the SLOF firmware, we implement an
extension which allows the IOMMU to be temporarily disabled for certain
devices.

Currently this is implemented by setting the device's DMAContext pointer to
NULL (thus reverting to qemu's default no-IOMMU DMA behaviour), then
replacing it when bypass mode is disabled.

This approach causes a bunch of complications though.  It complicates the
management of the DMAContext lifetimes, it's problematic for savevm/loadvm,
and it means that while bypass is active we have nowhere to store the
device's LIOBN (Logical IO Bus Number, used to identify DMA address
spaces).  At present we regenerate the LIOBN from other address information
but this restricts how we can allocate LIOBNs.

This patch gives up on this approach, replacing it with the much simpler
one of having a 'bypass' boolean flag in the TCE state structure.

Signed-off-by: David Gibson 
---
 hw/spapr.h   |1 +
 hw/spapr_iommu.c |   25 +++--
 hw/spapr_vio.c   |   26 ++
 hw/spapr_vio.h   |1 -
 4 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/hw/spapr.h b/hw/spapr.h
index 51a966b..e984e3f 100644
--- a/hw/spapr.h
+++ b/hw/spapr.h
@@ -339,6 +339,7 @@ void spapr_iommu_init(void);
 DMAContext *spapr_tce_new_dma_context(uint32_t liobn, size_t window_size);
 void spapr_tce_free(DMAContext *dma);
 void spapr_tce_reset(DMAContext *dma);
+void spapr_tce_set_bypass(DMAContext *dma, bool bypass);
 int spapr_dma_dt(void *fdt, int node_off, const char *propname,
  uint32_t liobn, uint64_t window, uint32_t size);
 int spapr_tcet_dma_dt(void *fdt, int node_off, const char *propname,
diff --git a/hw/spapr_iommu.c b/hw/spapr_iommu.c
index 216aa06..38034c0 100644
--- a/hw/spapr_iommu.c
+++ b/hw/spapr_iommu.c
@@ -42,6 +42,7 @@ struct sPAPRTCETable {
 uint32_t liobn;
 uint32_t window_size;
 sPAPRTCE *table;
+bool bypass;
 int fd;
 QLIST_ENTRY(sPAPRTCETable) list;
 };
@@ -78,6 +79,12 @@ static int spapr_tce_translate(DMAContext *dma,
 DMA_ADDR_FMT "\n", tcet->liobn, addr);
 #endif
 
+if (tcet->bypass) {
+*paddr = addr;
+*len = (target_phys_addr_t)-1;
+return 0;
+}
+
 /* Check if we are in bound */
 if (addr >= tcet->window_size) {
 #ifdef DEBUG_TCE
@@ -162,15 +169,21 @@ void spapr_tce_free(DMAContext *dma)
 }
 }
 
+void spapr_tce_set_bypass(DMAContext *dma, bool bypass)
+{
+sPAPRTCETable *tcet = DO_UPCAST(sPAPRTCETable, dma, dma);
+
+tcet->bypass = bypass;
+}
+
 void spapr_tce_reset(DMAContext *dma)
 {
-if (dma) {
-sPAPRTCETable *tcet = DO_UPCAST(sPAPRTCETable, dma, dma);
-size_t table_size = (tcet->window_size >> SPAPR_TCE_PAGE_SHIFT)
-* sizeof(sPAPRTCE);
+sPAPRTCETable *tcet = DO_UPCAST(sPAPRTCETable, dma, dma);
+size_t table_size = (tcet->window_size >> SPAPR_TCE_PAGE_SHIFT)
+* sizeof(sPAPRTCE);
 
-memset(tcet->table, 0, table_size);
-}
+tcet->bypass = false;
+memset(tcet->table, 0, table_size);
 }
 
 static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c
index 752836e..848806d 100644
--- a/hw/spapr_vio.c
+++ b/hw/spapr_vio.c
@@ -316,14 +316,9 @@ int spapr_vio_send_crq(VIOsPAPRDevice *dev, uint8_t *crq)
 
 static void spapr_vio_quiesce_one(VIOsPAPRDevice *dev)
 {
-VIOsPAPRDeviceClass *pc = VIO_SPAPR_DEVICE_GET_CLASS(dev);
-uint32_t liobn = SPAPR_VIO_BASE_LIOBN | dev->reg;
-
 if (dev->dma) {
-spapr_tce_free(dev->dma);
+spapr_tce_reset(dev->dma);
 }
-dev->dma = spapr_tce_new_dma_context(liobn, pc->rtce_window_size);
-
 free_crq(dev);
 }
 
@@ -346,16 +341,14 @@ static void rtas_set_tce_bypass(sPAPREnvironment *spapr, uint32_t token,
 rtas_st(rets, 0, -3);
 return;
 }
-if (enable) {
-spapr_tce_free(dev->dma);
-dev->dma = NULL;
-} else {
-VIOsPAPRDeviceClass *pc = VIO_SPAPR_DEVICE_GET_CLASS(dev);
-uint32_t liobn = SPAPR_VIO_BASE_LIOBN | dev->reg;
 
-dev->dma = spapr_tce_new_dma_context(liobn, pc->rtce_window_size);
+if (!dev->dma) {
+rtas_st(rets, 0, -3);
+return;
 }
 
+spapr_tce_set_bypass(dev->dma, !!enable);
+
 rtas_st(rets, 0, 0);
 }
 
@@ -421,7 +414,6 @@ static int spapr_vio_busdev_init(DeviceState *qdev)
 {
 VIOsPAPRDevice *dev = (VIOsPAPRDevice *)qdev;
 VIOsPAPRDeviceClass *pc = VIO_SPAPR_DEVICE_GET_CLASS(dev);
-uint32_t liobn;
 char *id;
 
 if (dev->reg != -1) {
@@ -463,8 +455,10 @@ static int spapr_vio_busdev_init(DeviceState *qdev)
 return -1;
 }
 
-liobn = SPAPR_VIO_BASE_LIOBN | dev->reg;
-dev->dma = spapr_tce_new_dma_context(liobn, pc->rtce_window_size);
+if (pc->rtce_window_size) {
+uint32_t liobn = SPAPR_VIO_BASE_LIOBN | dev->reg;

[Qemu-devel] [PATCH 13/13] pseries: Fix semantics of RTAS int-on, int-off and set-xive functions

2012-09-12 Thread David Gibson
Currently the ibm,int-on and ibm,int-off RTAS functions are implemented as
no-ops.  This is because when implemented as specified in PAPR they caused
Linux (which calls both int-on/off and set-xive) to end up with interrupts
masked when they should not be.  Since Linux's set-xive calls make the
int-on/off calls redundant, making them nops worked around the problem.

In fact, the problem was caused by a subtle bug in set-xive:
PAPR specifies that as well as updating the current priority, it also needs
to update the saved priority used by int-on/off.  With this bug fixed the
problem goes away.  This patch implements that more correct fix.

Signed-off-by: David Gibson 
---
 hw/xics.c |   25 -
 1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/hw/xics.c b/hw/xics.c
index 75c8cca..ce88aa7 100644
--- a/hw/xics.c
+++ b/hw/xics.c
@@ -270,13 +270,14 @@ static void write_xive_lsi(struct ics_state *ics, int srcno)
 }
 
 static void ics_write_xive(struct ics_state *ics, int nr, int server,
-   uint8_t priority)
+   uint8_t priority, uint8_t saved_priority)
 {
 int srcno = nr - ics->offset;
 struct ics_irq_state *irq = ics->irqs + srcno;
 
 irq->server = server;
 irq->priority = priority;
+irq->saved_priority = saved_priority;
 
 if (irq->lsi) {
 write_xive_lsi(ics, srcno);
@@ -405,7 +406,7 @@ static void rtas_set_xive(sPAPREnvironment *spapr, uint32_t token,
 return;
 }
 
-ics_write_xive(ics, nr, server, priority);
+ics_write_xive(ics, nr, server, priority, priority);
 
 rtas_st(rets, 0, 0); /* Success */
 }
@@ -453,14 +454,8 @@ static void rtas_int_off(sPAPREnvironment *spapr, uint32_t token,
 return;
 }
 
-/* This is a NOP for now, since the described PAPR semantics don't
- * seem to gel with what Linux does */
-#if 0
-struct ics_irq_state *irq = xics->irqs + (nr - xics->offset);
-
-irq->saved_priority = irq->priority;
-ics_write_xive_msi(xics, nr, irq->server, 0xff);
-#endif
+ics_write_xive(ics, nr, ics->irqs[nr - ics->offset].server, 0xff,
+   ics->irqs[nr - ics->offset].priority);
 
 rtas_st(rets, 0, 0); /* Success */
 }
@@ -484,13 +479,9 @@ static void rtas_int_on(sPAPREnvironment *spapr, uint32_t token,
 return;
 }
 
-/* This is a NOP for now, since the described PAPR semantics don't
- * seem to gel with what Linux does */
-#if 0
-struct ics_irq_state *irq = xics->irqs + (nr - xics->offset);
-
-ics_write_xive_msi(xics, nr, irq->server, irq->saved_priority);
-#endif
+ics_write_xive(ics, nr, ics->irqs[nr - ics->offset].server,
+   ics->irqs[nr - ics->offset].saved_priority,
+   ics->irqs[nr - ics->offset].saved_priority);
 
 rtas_st(rets, 0, 0); /* Success */
 }
-- 
1.7.10.4




[Qemu-devel] [PATCH 08/13] pseries: Small cleanup to H_CEDE implementation

2012-09-12 Thread David Gibson
The H_CEDE hypercall implementation for the pseries machine doesn't trigger
quite the right path in the main cpu exec loop.  We should set exit_request
to pop up one extra level and recheck state, and we should set the
exception_index to EXCP_HLT (H_CEDE is roughly equivalent to the hlt
instruction on x86).

In practice, this doesn't really matter except for KVM, and KVM implements
H_CEDE internally so we never hit this code path.  But we might as well
get it right, just in case it matters some day.

Signed-off-by: David Gibson 
---
 hw/spapr_hcall.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c
index a5990a9..407791d 100644
--- a/hw/spapr_hcall.c
+++ b/hw/spapr_hcall.c
@@ -545,6 +545,8 @@ static target_ulong h_cede(CPUPPCState *env, sPAPREnvironment *spapr,
 hreg_compute_hflags(env);
 if (!cpu_has_work(env)) {
 env->halted = 1;
+env->exception_index = EXCP_HLT;
+env->exit_request = 1;
 }
 return H_SUCCESS;
 }
-- 
1.7.10.4




[Qemu-devel] [PATCH 04/13] pseries: Add support for new KVM hash table control call

2012-09-12 Thread David Gibson
This adds support for the new "reset htab" ioctl, which allows qemu
to properly clean up the MMU hash table when the guest is reset. With
the corresponding kernel support, reset of a guest now works properly.

This also paves the way for indicating a different size hash table
to the kernel and for the kernel to be able to impose limits on
the requested size.

Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: David Gibson 
---
 hw/spapr.c   |  274 +-
 hw/spapr.h   |4 +-
 target-ppc/kvm.c |   29 ++
 target-ppc/kvm_ppc.h |   19 
 4 files changed, 213 insertions(+), 113 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index 68542e8..0a0e9cd 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -85,6 +85,8 @@
 
 #define PHANDLE_XICP0x
 
+#define HTAB_SIZE(spapr)(1ULL << ((spapr)->htab_shift))
+
 sPAPREnvironment *spapr;
 
 int spapr_allocate_irq(int hint, enum xics_irq_type type)
@@ -134,12 +136,13 @@ int spapr_allocate_irq_block(int num, enum xics_irq_type type)
 return first;
 }
 
-static int spapr_set_associativity(void *fdt, sPAPREnvironment *spapr)
+static int spapr_fixup_cpu_dt(void *fdt, sPAPREnvironment *spapr)
 {
 int ret = 0, offset;
 CPUPPCState *env;
 char cpu_model[32];
 int smt = kvmppc_smt_threads();
+uint32_t pft_size_prop[] = {0, cpu_to_be32(spapr->htab_shift)};
 
 assert(spapr->cpu_model);
 
@@ -163,8 +166,16 @@ static int spapr_set_associativity(void *fdt, sPAPREnvironment *spapr)
 return offset;
 }
 
-ret = fdt_setprop(fdt, offset, "ibm,associativity", associativity,
-  sizeof(associativity));
+if (nb_numa_nodes > 1) {
+ret = fdt_setprop(fdt, offset, "ibm,associativity", associativity,
+  sizeof(associativity));
+if (ret < 0) {
+return ret;
+}
+}
+
+ret = fdt_setprop(fdt, offset, "ibm,pft-size",
+  pft_size_prop, sizeof(pft_size_prop));
 if (ret < 0) {
 return ret;
 }
@@ -206,45 +217,36 @@ static size_t create_page_sizes_prop(CPUPPCState *env, uint32_t *prop,
 return (p - prop) * sizeof(uint32_t);
 }
 
+#define _FDT(exp) \
+do { \
+int ret = (exp);   \
+if (ret < 0) { \
+fprintf(stderr, "qemu: error creating device tree: %s: %s\n", \
+#exp, fdt_strerror(ret));  \
+exit(1);   \
+}  \
+} while (0)
+
+
 static void *spapr_create_fdt_skel(const char *cpu_model,
-   target_phys_addr_t rma_size,
target_phys_addr_t initrd_base,
target_phys_addr_t initrd_size,
target_phys_addr_t kernel_size,
const char *boot_device,
-   const char *kernel_cmdline,
-   long hash_shift)
+   const char *kernel_cmdline)
 {
 void *fdt;
 CPUPPCState *env;
-uint64_t mem_reg_property[2];
 uint32_t start_prop = cpu_to_be32(initrd_base);
 uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
-uint32_t pft_size_prop[] = {0, cpu_to_be32(hash_shift)};
 char hypertas_prop[] = "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt"
 "\0hcall-tce\0hcall-vio\0hcall-splpar\0hcall-bulk";
 char qemu_hypertas_prop[] = "hcall-memop1";
+uint32_t refpoints[] = {cpu_to_be32(0x4), cpu_to_be32(0x4)};
 uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
-int i;
 char *modelname;
-int smt = kvmppc_smt_threads();
+int i, smt = kvmppc_smt_threads();
 unsigned char vec5[] = {0x0, 0x0, 0x0, 0x0, 0x0, 0x80};
-uint32_t refpoints[] = {cpu_to_be32(0x4), cpu_to_be32(0x4)};
-uint32_t associativity[] = {cpu_to_be32(0x4), cpu_to_be32(0x0),
-cpu_to_be32(0x0), cpu_to_be32(0x0),
-cpu_to_be32(0x0)};
-char mem_name[32];
-target_phys_addr_t node0_size, mem_start;
-
-#define _FDT(exp) \
-do { \
-int ret = (exp);   \
-if (ret < 0) { \
-fprintf(stderr, "qemu: error creating device tree: %s: %s\n", \
-#exp, fdt_strerror(ret));  \
-exit(1);   \
-}  \
-} while (0)
 
 fdt = g_malloc0(FDT_MAX_SIZE);
 _FDT((fdt_create(fdt, FDT_MAX_SIZE)));
@@

[Qemu-devel] [PATCH 03/13] pseries: Use new method to correct reset sequence

2012-09-12 Thread David Gibson
A number of things need to occur during reset of the PAPR
paravirtualized platform in a specific order.  For example, the hash
table needs to be cleared before the CPUs are reset, so that they
initialize their register state correctly, and the CPUs need to have
their main reset called before we set up the entry point state on the
boot cpu.  We also need to have the main qdev reset happen before the
creation and installation of the device tree for the new boot, because
we need the state of the devices settled to correctly construct the
device tree.

We currently do the pseries once-per-reset initializations from a
reset handler.  However we can't adequately control when this handler
is called during the reset - in particular we can't guarantee it
happens after all the qdev resets (since qdevs might be registered
after the machine init function has executed).

This patch uses the new QEMUMachine reset method to fix this
problem, ensuring the various order dependent reset steps happen in
the correct order.

Signed-off-by: David Gibson 
---
 hw/spapr.c |9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index d88525a..68542e8 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -559,13 +559,13 @@ static void emulate_spapr_hypercall(CPUPPCState *env)
 env->gpr[3] = spapr_hypercall(env, env->gpr[3], &env->gpr[4]);
 }
 
-static void spapr_reset(void *opaque)
+static void ppc_spapr_reset(void)
 {
-sPAPREnvironment *spapr = (sPAPREnvironment *)opaque;
-
 /* flush out the hash table */
 memset(spapr->htab, 0, spapr->htab_size);
 
+qemu_devices_reset();
+
 /* Load the fdt */
 spapr_finalize_fdt(spapr, spapr->fdt_addr, spapr->rtas_addr,
spapr->rtas_size);
@@ -845,14 +845,13 @@ static void ppc_spapr_init(ram_addr_t ram_size,
 boot_device, kernel_cmdline,
 pteg_shift + 7);
 assert(spapr->fdt_skel != NULL);
-
-qemu_register_reset(spapr_reset, spapr);
 }
 
 static QEMUMachine spapr_machine = {
 .name = "pseries",
 .desc = "pSeries Logical Partition (PAPR compliant)",
 .init = ppc_spapr_init,
+.reset = ppc_spapr_reset,
 .max_cpus = MAX_CPUS,
 .no_parallel = 1,
 .use_scsi = 1,
-- 
1.7.10.4




Re: [Qemu-devel] [RFC v3 ATCH 0/5] char: expose CirMemCharDriver and provide QMP interface

2012-09-12 Thread Anthony Liguori
"Daniel P. Berrange"  writes:

> On Wed, Sep 12, 2012 at 07:57:21PM +0800, Lei Li wrote:
>> This RFC series attempts to convert the MemCharDriver to use a circular
>> buffer for input and output, expose it to users by introducing QMP commands
>> memchar_write and memchar_read and via the command line like the other
>> CharDriverStates.
>> 
>> Serial ports in qemu always use CharDriverStates as their backends.
>> Right now, all of our backends always try to write the data from the
>> guest to a socket or file. The concern from OpenStack is that this could
>> lead to unbounded disk space usage since they log the serial output.
>> For more detail of the background info:
>> https://bugs.launchpad.net/nova/+bug/832507
>
> Unbounded disk usage is only relevant if telling QEMU to write directly
> to its file backend. If you use a socket backend the mgmt app can provide
> whatever policy it desires.
>
>> So we want to use a circular buffer in QEMU instead, and then OpenStack
>> can periodically read the buffer in QEMU and log it.
>
> With any circular buffer you obviously have a race condition where
> the buffer may overflow before the application can consume the data.
> By implementing the circular buffer in QEMU you are imposing a
> specific policy for overflow on the applications using QEMU, namely
> that data gets overwritten/lost.
>
> If the circular buffering is done outside QEMU, then the application
> can implement a variety of policies on overflow. At the very least
> they can detect when overflow would occur, and insert a marker to
> the effect that there is a log discontinuity. Or they can pause the
> VM for a period of time, or reduce its scheduling priority, or any
> number of different things.
>
> The further advantage of doing it outside QEMU, is that OpenStack will
> work with all existing QEMU/KVM/libvirt versions.

I'm not sure what is the best solution for libvirt and OpenStack but I
think you're missing a few key points.

CDS doesn't propagate flow control to the guest (in some cases, it
simply can't because hardware doesn't).  That means that there has to be
some policy in QEMU about what to do with data that cannot be written to
a socket.

Today we simply drop new data.  This adds a mechanism where we drop old
data.  For some cases, dropping old data is much nicer than dropping new
data.

But there are other compelling use-cases for this beyond libvirt.  This
will allow us to implement a 'console' command which will be pretty nice
within HMP.

It also provides a very nice way to write tests.  It's much easier to
use something like this from qtest than it is to set up a socket in
listen mode and then queue incoming data to be read.

Regards,

Anthony Liguori


> I think most of the discussion in the quoted OpenStack bug is rather
> short sighted only focusing on the immediate needs for their current
> 'get_log_output' API. I don't think this will be satisfactory for
> OpenStack in the medium-long term. IMHO OpenStack needs to provide
> a more comprehensive logging capability than what it does today, and
> thus this proposed QEMU feature would have a short lifetime of usefulness
> to OpenStack. I've already recommended that OpenStack take advantage
> of conserver to setup logging of all VMs, though there were some
> issues around that. It is entirely possible for OpenStack to provide
> its own logging system to process VM logs into fixed size files, as
> well as satisfying the needs of its get_log_output API for circular
> buffers.
>
> Regards,
> Daniel
> -- 
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org  -o- http://virt-manager.org :|
> |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|




Re: [Qemu-devel] Does TCG IR use static single assignment (SSA) form?

2012-09-12 Thread Wei-Ren Chen
> Excuse me for asking, does TCG-IR  use static single assignment (SSA) form?
> 
> I just wanna know how to translate a register-based bytecode to TCG-IR.

  Sounds like you need to take a look at target-xxx/translate.c ?

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj



[Qemu-devel] 'qemu-nbd' explicite flush to disk

2012-09-12 Thread Mark Trumpold
Hello,

I posted a similar message to 'qemu-discuss', and was thinking my question may 
require a developer level of expertise.  I will try to be as brief as possible.

My context is using 'qemu-nbd' and 'qemu-img' to manage 'qcow2' loop filesystem 
images being hosted on a 'nilfs' filesystem.  This way I can 'nilfs snapshot' 
the images.

The problem is 'nilfs' creates a 'checkpoint' for every synchronous write, and 
'qemu-nbd's default cache mode is 'writethrough'.   Disk space is consumed 
quickly with the 'checkpoint' overhead in this case.

So, I've been experimenting with 'qemu-nbd --cache=writeback ..'
This nicely eliminates the 'checkpoint' issue; however, I have so far been 
unable to explicitly flush things to disk -- which I would like to do just 
before a 'nilfs' snapshot.

I've tried the 'bdrv_drain_all() / bdrv_flush_all()'.   I instrumented and 
found the 'bdrv_co_flush(bs)' doesn't get called as expected in the loop.  I 
had one image connected as follows for this test:
qemu-nbd -v --cache=writeback -c /dev/nbd5 /images/root/snap.img

Subsequently I've been trying to call 'bdrv_co_flush(bs)' directly, but I can't 
figure out how to dereference 'bs' for the call.

I'm probably out in the weeds on this one.
Any guidance would be greatly appreciated.

I am running:
qemu-1.2.0
linux kernel 3.3.1

Thank you,
Mark Trumpold






[Qemu-devel] [PATCH] Revert 455aa1e08 and c3767ed0eb

2012-09-12 Thread Anthony Liguori
commit c3767ed0eb5d0bb25fe409ae5dec06e3411ff1b6
qemu-char: (Re-)connect for tcp_chr_write() unconnected writing

Has no hope of working because tcp_chr_connect() does not actually connect.

455aa1e08 just fixes the SEGV with server() but the attempt to connect a client
socket is still completely broken.

This patch reverts both.

Reported-by: Richard W.M. Jones 
Signed-off-by: Anthony Liguori 
---
 qemu-char.c |7 +--
 1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 767da93..10d1504 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2141,18 +2141,13 @@ typedef struct {
 
 static void tcp_chr_accept(void *opaque);
 
-static void tcp_chr_connect(void *opaque);
-
 static int tcp_chr_write(CharDriverState *chr, const uint8_t *buf, int len)
 {
 TCPCharDriver *s = chr->opaque;
 if (s->connected) {
 return send_all(s->fd, buf, len);
-} else if (s->listen_fd == -1) {
-/* (Re-)connect for unconnected writing */
-tcp_chr_connect(chr);
-return 0;
 } else {
+/* XXX: indicate an error ? */
 return len;
 }
 }
-- 
1.7.5.4




Re: [Qemu-devel] Enabling DLPAR capacity on QEMU pSeries

2012-09-12 Thread Benjamin Herrenschmidt
On Wed, 2012-09-12 at 17:53 +0200, Alexander Graf wrote:
> On 09/12/2012 04:54 PM, Erlon Cruz wrote:
> > Hi all,
> >
> > We are planning to implement DLPAR capacity on QEMU pSeries. As we
> 
> What is DLPAR? Hotplug support?

Yes.

> > lack of experience in the internals of the arch we would like you guys
> > to give us some design directions
> > and confirm if we going in the right direction. Our first idea is:
> >
> >  1 - to patch 'spapr.c' so it can dynamically insert/remove basic
> > items into the device tree.
> 
> What exactly would you like to patch into it? We already do have support 
> for dynamic dt creation with the spapr target.

No we don't. We don't have the necessary bits and pieces to pass the DT
updates down to the guest. PAPR defines a mechanism using RTAS calls
which we need to implement, but there are some issues remaining:

 - We don't have a way to "initiate" a DLPAR operation. This is
currently done by proprietary tools that communicate with the HMC. We
want to invent some kind of hotplug "interrupt" (using existing RTAS
event facilities). All it needs to do is indicate the DT path (ie.
connector) where something is to be plugged to or unplugged, which can
then trigger the relevant configure-connector calls to retrieve the DT
bits.

 - We have a problem with PCI. Currently, the content of the PCI
bus(ses) is discovered by SLOF running inside the guest. Not by qemu.
It's SLOF that assigns the BARs and create the device-tree nodes for the
various PCI devices. However, with hotplug, the guest expects to get
fully populated DT nodes for hotplugged PCI devices and fully assigned
BARS. Under pHyp that works because under the hood, RTAS contains an OFW
implementation which does all the assignment before passing the stuff to
the OS, but under qemu, RTAS is actually in qemu. This means we'll
probably have to move back the PCI device node creation and resource
assignment to qemu (like it was in the very early versions of the spapr
support).

> >  2 - create a host side device that will be used with a guest side
> > driver to perform guest side operations and communicate changes from
> > host to the guest (like DynamicRM does in PowerVM LPARs). We are not
>
> Why not just use hypercalls?

Actually there are existing RTAS calls to use for the actual passing of
the device-tree bits, the problem is purely how to "initiate" an
operation to trigger the guest code that will then perform the
appropriate calls.

qemu-ga is an option. But I was thinking more along the lines of adding
some new RTAS events, maybe EPOW style, a bit like ACPI does.

> > planning to use powerpc-tools and want to make resource management
> > transparent (i.e. no need to run daemons or userspace programs in the
> > guest, only this kernel driver).
> >  3 - create bindings to support adding/removal  ibmvscsi devices
> >  4 - create bindings to support adding/removal  ibmveth devices
> >  5 - create bindings to support adding/removal PCI devices
> >  6 - create bindings to support adding/removal of memory

There's already large parts of the necessary bits using RTAS in the
kernel (in recent kernels that is, older stuff really needed it all done
in userspace). The trigger mostly is missing.

> This is going to be the hardest part. I don't think QEMU supports memory 
> hotplug yet.

Missing from the above list is also CPU hotplug.

> >  - Do we need to do this the way PowerVM does? We have tested
> > virtio ballooning and it can work with a few endianness corrections.
> 
> I don't know how PowerVM works. But if normal ballooning is all you 
> need, you should certainly just enable virtio-balloon.

Does virtio-balloon need endian fixes? We thought it was just working!
Feel free to submit patches :)

> >  7 - create bindings to support adding/removal  CPUs
> >  - is SMP supported already? I tried to run SMP in a x86 host
> > and the guest stuck when SMP is enabled
> 
> SMP should work just fine, yes. Where exactly does it get stuck?

Right, it works fine as far as I can tell.

> >  - would be possible to work on this without a P7 baremetal
> > machine?
> 
> At least for device hotplug, it should be perfectly possible to use an 
> old G5 with PR KVM. I haven't gotten around to patch all the pieces of 
> the puzzle to make -M pseries work with PR KVM when it's running on top 
> of pHyp yet, so that won't work.
> 
> > We have a P7 8205-E6B, is that possible to kick PHYP out?
> 
> Ben?

Probably not. You need a 7R2.

> > Any idea on how much effort (time/people) the whole thing would take?
> > Any consideration about this is much appreciated :)
> 
> Phew. It's hard to tell. Depends heavily on how good your people are :).
> 

Cheers,
Ben.





Re: [Qemu-devel] Enabling DLPAR capacity on QEMU pSeries

2012-09-12 Thread Alexander Graf


On 12.09.2012, at 22:56, Erlon Cruz  wrote:

> On Wed, Sep 12, 2012 at 12:53 PM, Alexander Graf  wrote:
>> On 09/12/2012 04:54 PM, Erlon Cruz wrote:
>>> 
>>> Hi all,
>>> 
>>> We are planning to implement DLPAR capacity on QEMU pSeries. As we
>> 
>> 
>> What is DLPAR? Hotplug support?
> 
> Yes, basically the mechanism PowerVM uses to dynamically add memory, cpu,
> and I/O slots to logical partitions (LPARs)
> 
>> 
>>> lack of experience in the internals of the arch we would like you guys
>>> to give us some design directions
>>> and confirm if we going in the right direction. Our first idea is:
>>> 
>>> 1 - to patch 'spapr.c' so it can dynamically insert/remove basic
>>> items into the device tree.
>> 
>> 
>> What exactly would you like to patch into it? We already do have support for
>> dynamic dt creation with the spapr target.
>> 
> 
> Actually we were not sure if the machine could do that. So we can add
> things to the tree after booting it?

Linux gets a dt blob on boot, so that one is fixed. IIRC the user space tools 
for pHyp mess with the device tree through special hooks from the Linux side.

> 
>>> 2 - create a host side device that will be used with a guest side
>>> driver to perform guest side operations and communicate changes from
>>> host to the guest (like DynamicRM does in PowerVM LPARs). We are not
>> 
>> 
>> Why not just use hypercalls?
> 
> The hypercalls are initiated from the guest side, right? We also need a
> way for the host to send things to the guest. Using hypercalls would also
> require a guest-side kernel module.

Ah, you need some interrupt mechanism. I see.

> 
>>> planning to use powerpc-tools and want to make resource management
>>> transparent (i.e. no need to run daemons or userspace programs in the
>>> guest, only this kernel driver).
>>> 3 - create bindings to support adding/removal  ibmvscsi devices
>>> 4 - create bindings to support adding/removal  ibmveth devices
>>> 5 - create bindings to support adding/removal PCI devices
>>> 6 - create bindings to support adding/removal of memory
>> 
>> 
>> This is going to be the hardest part. I don't think QEMU supports memory
>> hotplug yet.
> 
> AFAIC ballooning is what QEMU provides so far, which is fine for x86.

Yes. But ballooning can only shrink the memory footprint of a VM, not increase its 
memory size.

> 
>> 
>>> - Do we need to do this the way PowerVM does? We have tested
>>> virtio ballooning and it can works with a few endiannes corrections.
>> 
>> 
>> I don't know how PowerVM works. But if normal ballooning is all you need,
>> you should certainly just enable virtio-balloon.
> 
> PowerVM works with Logical Memory Blocks (LMB). The hypervisor
> hotplugs memory blocks to guest's memory. Not only a 'borrowing' from
> the guests, right Ben?
> 
>>> 7 - create bindings to support adding/removal  CPUs
>>> - is SMP supported already? I tried to run SMP in a x86 host
>>> and the guest stuck when SMP is enabled
>> 
>> 
>> SMP should work just fine, yes. Where exactly does it get stuck?
> 
> I think that is right after the guest kernel enables SMP
> 
> [7.478259] Faulting instruction address: 0xc053bbec
> [7.479521] Oops: Kernel access of bad area, sig: 11 [#1]
> [7.479694] SMP NR_CPUS=1024 NUMA pSeries

Mind to attach gdb to the vm and break at that address? Then print a bt on all 
threads (vcpus) :).

> 
> http://pastebin.com/VMtRyaTE
> 
>> 
>> 
>>> - would be possible to work on this without a P7 baremetal
>>> machine?
>> 
>> 
>> At least for device hotplug, it should be perfectly possible to use an old
>> G5 with PR KVM. I haven't gotten around to patch all the pieces of the
>> puzzle to make -M pseries work with PR KVM when it's running on top of pHyp
>> yet, so that won't work.
>> 
>> 
>>> We have a P7 8205-E6B, is that possible to kick PHYP out?
>> 
>> 
>> Ben?
>> 
>> 
>>> Any idea on how much effort (time/people) the whole thing would take?
>>> Any consideration about this is much appreciated :)
>> 
>> 
>> Phew. It's hard to tell. Depends heavily on how good your people are :).
> 
> Well, considering someone like you :p, so we would just need to
> multiply it by 5 :P

It's still hard to guess. Full-time work probably about a month for a working 
prototype. But you might run into additional issues. And then there's the 
upstream review process which will take at least as long until the thing is 
clean. And then quite some time to get it bug free ;).


Alex

> 
>> 
>> 
>> Alex
>> 
>> 



Re: [Qemu-devel] [PATCH 2/4] tcg/i386: Remove unused registers from tcg_target_call_iarg_regs

2012-09-12 Thread Peter Maydell
On 12 September 2012 21:44, Stefan Weil  wrote:
> --- a/tcg/i386/tcg-target.c
> +++ b/tcg/i386/tcg-target.c
> @@ -75,9 +75,7 @@ static const int tcg_target_call_iarg_regs[] = {
>  TCG_REG_R8,
>  TCG_REG_R9,
>  #else
> -TCG_REG_EAX,
> -TCG_REG_EDX,
> -TCG_REG_ECX
> +/* 32 bit mode uses stack based calling convention (GCC default). */
>  #endif
>  };

This makes the array zero-length for 32 bit targets, but functions
like tcg_out_tlb_load() and tcg_out_qemu_ld() still unconditionally
access elements in it...

-- PMM



[Qemu-devel] [PATCH 1/4] w64: Fix TCG helper functions with 5 arguments

2012-09-12 Thread Stefan Weil
TCG uses 6 registers for function arguments on 64 bit Linux hosts,
but only 4 registers on W64 hosts.

Commit 2999a0b20074a7e4a58f56572bb1436749368f59 increased the number
of arguments for some important helper functions from 4 to 5
which triggered a bug for W64 hosts: QEMU aborts when executing
helper_lcall_real in the guest's BIOS because function
tcg_target_get_call_iarg_regs_count always returned 6.

As W64 has only 4 registers for arguments, the 5th argument must be
passed on the stack using a correct stack offset.

Signed-off-by: Stefan Weil 
---
 tcg/i386/tcg-target.c |2 +-
 tcg/i386/tcg-target.h |4 
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index da17bba..43b5572 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -118,7 +118,7 @@ static void patch_reloc(uint8_t *code_ptr, int type,
 static inline int tcg_target_get_call_iarg_regs_count(int flags)
 {
 if (TCG_TARGET_REG_BITS == 64) {
-return 6;
+return ARRAY_SIZE(tcg_target_call_iarg_regs);
 }
 
 return 0;
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index c3cfe05..87417d0 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -67,7 +67,11 @@ typedef enum {
 /* used for function call generation */
 #define TCG_REG_CALL_STACK TCG_REG_ESP 
 #define TCG_TARGET_STACK_ALIGN 16
+#if defined(_WIN64)
+#define TCG_TARGET_CALL_STACK_OFFSET 32
+#else
 #define TCG_TARGET_CALL_STACK_OFFSET 0
+#endif
 
 /* optional instructions */
 #define TCG_TARGET_HAS_div2_i32 1
-- 
1.7.10
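For context (not part of the patch): the 32-byte offset matches the Win64 calling convention's "shadow space". The caller reserves an 8-byte home slot for each of the four register arguments, so the first stack argument starts 32 bytes up. A sketch of that arithmetic, with a made-up helper name:

```c
/* Returns the caller-side stack offset of argument `argno` (0-based),
 * or -1 if it is passed in a register.
 * Win64:      4 register args, plus a 32-byte shadow area on the stack.
 * SysV x86-64: 6 register args, no shadow area. */
static int stack_offset_of_arg(int argno, int nreg_args, int shadow_bytes)
{
    if (argno < nreg_args)
        return -1;                 /* passed in a register */
    return shadow_bytes + (argno - nreg_args) * 8;
}
```

On Win64 the fifth argument (argno 4) therefore lands at offset 32, the value the patch sets for TCG_TARGET_CALL_STACK_OFFSET.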




[Qemu-devel] [PATCH 3/4] tcg: Move tcg_target_get_call_iarg_regs_count to tcg.c

2012-09-12 Thread Stefan Weil
The TCG targets no longer need individual implementations.

Signed-off-by: Stefan Weil 
---
 tcg/arm/tcg-target.c   |6 --
 tcg/hppa/tcg-target.c  |6 --
 tcg/i386/tcg-target.c  |6 --
 tcg/ia64/tcg-target.c  |6 --
 tcg/mips/tcg-target.c  |6 --
 tcg/ppc/tcg-target.c   |6 --
 tcg/ppc64/tcg-target.c |6 --
 tcg/s390/tcg-target.c  |5 -
 tcg/sparc/tcg-target.c |6 --
 tcg/tcg.c  |7 ++-
 tcg/tci/tcg-target.c   |6 --
 11 files changed, 6 insertions(+), 60 deletions(-)

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index cf0ca3d..38e2057 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -145,12 +145,6 @@ static void patch_reloc(uint8_t *code_ptr, int type,
 }
 }
 
-/* maximum number of register used for input function arguments */
-static inline int tcg_target_get_call_iarg_regs_count(int flags)
-{
-return 4;
-}
-
 /* parse target specific constraints */
 static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
 {
diff --git a/tcg/hppa/tcg-target.c b/tcg/hppa/tcg-target.c
index 4f17928..985476a 100644
--- a/tcg/hppa/tcg-target.c
+++ b/tcg/hppa/tcg-target.c
@@ -175,12 +175,6 @@ static void patch_reloc(uint8_t *code_ptr, int type,
 *insn_ptr = insn;
 }
 
-/* maximum number of register used for input function arguments */
-static inline int tcg_target_get_call_iarg_regs_count(int flags)
-{
-return 4;
-}
-
 /* parse target specific constraints */
 static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
 {
diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index d60bab8..3c25acc 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -112,12 +112,6 @@ static void patch_reloc(uint8_t *code_ptr, int type,
 }
 }
 
-/* maximum number of register used for input function arguments */
-static inline int tcg_target_get_call_iarg_regs_count(int flags)
-{
-return ARRAY_SIZE(tcg_target_call_iarg_regs);
-}
-
 /* parse target specific constraints */
 static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
 {
diff --git a/tcg/ia64/tcg-target.c b/tcg/ia64/tcg-target.c
index dc588db..7f31876 100644
--- a/tcg/ia64/tcg-target.c
+++ b/tcg/ia64/tcg-target.c
@@ -176,12 +176,6 @@ static const int tcg_target_call_oarg_regs[] = {
 TCG_REG_R8
 };
 
-/* maximum number of register used for input function arguments */
-static inline int tcg_target_get_call_iarg_regs_count(int flags)
-{
-return 8;
-}
-
 /*
  * opcode formation
  */
diff --git a/tcg/mips/tcg-target.c b/tcg/mips/tcg-target.c
index 1006e28..aad590c 100644
--- a/tcg/mips/tcg-target.c
+++ b/tcg/mips/tcg-target.c
@@ -185,12 +185,6 @@ static void patch_reloc(uint8_t *code_ptr, int type,
 }
 }
 
-/* maximum number of register used for input function arguments */
-static inline int tcg_target_get_call_iarg_regs_count(int flags)
-{
-return 4;
-}
-
 /* parse target specific constraints */
 static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
 {
diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
index 0cff181..b889e34 100644
--- a/tcg/ppc/tcg-target.c
+++ b/tcg/ppc/tcg-target.c
@@ -221,12 +221,6 @@ static void patch_reloc(uint8_t *code_ptr, int type,
 }
 }
 
-/* maximum number of register used for input function arguments */
-static int tcg_target_get_call_iarg_regs_count(int flags)
-{
-return ARRAY_SIZE (tcg_target_call_iarg_regs);
-}
-
 /* parse target specific constraints */
 static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
 {
diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index 27a0ae8..08c5c0f 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -208,12 +208,6 @@ static void patch_reloc (uint8_t *code_ptr, int type,
 }
 }
 
-/* maximum number of register used for input function arguments */
-static int tcg_target_get_call_iarg_regs_count (int flags)
-{
-return ARRAY_SIZE (tcg_target_call_iarg_regs);
-}
-
 /* parse target specific constraints */
 static int target_parse_constraint (TCGArgConstraint *ct, const char **pct_str)
 {
diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 99b5339..287f010 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -376,11 +376,6 @@ static void patch_reloc(uint8_t *code_ptr, int type,
 }
 }
 
-static int tcg_target_get_call_iarg_regs_count(int flags)
-{
-return sizeof(tcg_target_call_iarg_regs) / sizeof(int);
-}
-
 /* parse target specific constraints */
 static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
 {
diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 247a278..3b6c108 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -137,12 +137,6 @@ static void patch_reloc(uint8_t *code_ptr, int type,
 }
 }
 
-/* maximum number of register used for input function arguments */
-static inline int tcg_target_get_call_iarg

Re: [Qemu-devel] [PATCH 3/4] tcg: Move tcg_target_get_call_iarg_regs_count to tcg.c

2012-09-12 Thread Aurelien Jarno
On Wed, Sep 12, 2012 at 10:44:37PM +0200, Stefan Weil wrote:
> The TCG targets no longer need individual implementations.
> 
> Signed-off-by: Stefan Weil 
> ---
>  tcg/arm/tcg-target.c   |6 --
>  tcg/hppa/tcg-target.c  |6 --
>  tcg/i386/tcg-target.c  |6 --
>  tcg/ia64/tcg-target.c  |6 --
>  tcg/mips/tcg-target.c  |6 --
>  tcg/ppc/tcg-target.c   |6 --
>  tcg/ppc64/tcg-target.c |6 --
>  tcg/s390/tcg-target.c  |5 -
>  tcg/sparc/tcg-target.c |6 --
>  tcg/tcg.c  |7 ++-
>  tcg/tci/tcg-target.c   |6 --
>  11 files changed, 6 insertions(+), 60 deletions(-)
> 
> diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
> index cf0ca3d..38e2057 100644
> --- a/tcg/arm/tcg-target.c
> +++ b/tcg/arm/tcg-target.c
> @@ -145,12 +145,6 @@ static void patch_reloc(uint8_t *code_ptr, int type,
>  }
>  }
>  
> -/* maximum number of register used for input function arguments */
> -static inline int tcg_target_get_call_iarg_regs_count(int flags)
> -{
> -return 4;
> -}
> -
>  /* parse target specific constraints */
>  static int target_parse_constraint(TCGArgConstraint *ct, const char 
> **pct_str)
>  {
> diff --git a/tcg/hppa/tcg-target.c b/tcg/hppa/tcg-target.c
> index 4f17928..985476a 100644
> --- a/tcg/hppa/tcg-target.c
> +++ b/tcg/hppa/tcg-target.c
> @@ -175,12 +175,6 @@ static void patch_reloc(uint8_t *code_ptr, int type,
>  *insn_ptr = insn;
>  }
>  
> -/* maximum number of register used for input function arguments */
> -static inline int tcg_target_get_call_iarg_regs_count(int flags)
> -{
> -return 4;
> -}
> -
>  /* parse target specific constraints */
>  static int target_parse_constraint(TCGArgConstraint *ct, const char 
> **pct_str)
>  {
> diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
> index d60bab8..3c25acc 100644
> --- a/tcg/i386/tcg-target.c
> +++ b/tcg/i386/tcg-target.c
> @@ -112,12 +112,6 @@ static void patch_reloc(uint8_t *code_ptr, int type,
>  }
>  }
>  
> -/* maximum number of register used for input function arguments */
> -static inline int tcg_target_get_call_iarg_regs_count(int flags)
> -{
> -return ARRAY_SIZE(tcg_target_call_iarg_regs);
> -}
> -
>  /* parse target specific constraints */
>  static int target_parse_constraint(TCGArgConstraint *ct, const char 
> **pct_str)
>  {
> diff --git a/tcg/ia64/tcg-target.c b/tcg/ia64/tcg-target.c
> index dc588db..7f31876 100644
> --- a/tcg/ia64/tcg-target.c
> +++ b/tcg/ia64/tcg-target.c
> @@ -176,12 +176,6 @@ static const int tcg_target_call_oarg_regs[] = {
>  TCG_REG_R8
>  };
>  
> -/* maximum number of register used for input function arguments */
> -static inline int tcg_target_get_call_iarg_regs_count(int flags)
> -{
> -return 8;
> -}
> -
>  /*
>   * opcode formation
>   */
> diff --git a/tcg/mips/tcg-target.c b/tcg/mips/tcg-target.c
> index 1006e28..aad590c 100644
> --- a/tcg/mips/tcg-target.c
> +++ b/tcg/mips/tcg-target.c
> @@ -185,12 +185,6 @@ static void patch_reloc(uint8_t *code_ptr, int type,
>  }
>  }
>  
> -/* maximum number of register used for input function arguments */
> -static inline int tcg_target_get_call_iarg_regs_count(int flags)
> -{
> -return 4;
> -}
> -
>  /* parse target specific constraints */
>  static int target_parse_constraint(TCGArgConstraint *ct, const char 
> **pct_str)
>  {
> diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
> index 0cff181..b889e34 100644
> --- a/tcg/ppc/tcg-target.c
> +++ b/tcg/ppc/tcg-target.c
> @@ -221,12 +221,6 @@ static void patch_reloc(uint8_t *code_ptr, int type,
>  }
>  }
>  
> -/* maximum number of register used for input function arguments */
> -static int tcg_target_get_call_iarg_regs_count(int flags)
> -{
> -return ARRAY_SIZE (tcg_target_call_iarg_regs);
> -}
> -
>  /* parse target specific constraints */
>  static int target_parse_constraint(TCGArgConstraint *ct, const char 
> **pct_str)
>  {
> diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
> index 27a0ae8..08c5c0f 100644
> --- a/tcg/ppc64/tcg-target.c
> +++ b/tcg/ppc64/tcg-target.c
> @@ -208,12 +208,6 @@ static void patch_reloc (uint8_t *code_ptr, int type,
>  }
>  }
>  
> -/* maximum number of register used for input function arguments */
> -static int tcg_target_get_call_iarg_regs_count (int flags)
> -{
> -return ARRAY_SIZE (tcg_target_call_iarg_regs);
> -}
> -
>  /* parse target specific constraints */
>  static int target_parse_constraint (TCGArgConstraint *ct, const char 
> **pct_str)
>  {
> diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
> index 99b5339..287f010 100644
> --- a/tcg/s390/tcg-target.c
> +++ b/tcg/s390/tcg-target.c
> @@ -376,11 +376,6 @@ static void patch_reloc(uint8_t *code_ptr, int type,
>  }
>  }
>  
> -static int tcg_target_get_call_iarg_regs_count(int flags)
> -{
> -return sizeof(tcg_target_call_iarg_regs) / sizeof(int);
> -}
> -
>  /* parse target specific constraints */
>  static int target_parse_constraint(TCGArgConstr

Re: [Qemu-devel] Enablig DLPAR capacity on QEMU pSeries

2012-09-12 Thread Erlon Cruz
On Wed, Sep 12, 2012 at 12:53 PM, Alexander Graf  wrote:
> On 09/12/2012 04:54 PM, Erlon Cruz wrote:
>>
>> Hi all,
>>
>> We are planning to implement DLPAR capacity on QEMU pSeries. As we
>
>
> What is DLPAR? Hotplug support?

Yes, basically it is the mechanism PowerVM uses to dynamically add memory,
CPUs, and I/O slots to logical partitions (LPARs)

>
>> lack of experience in the internals of the arch we would like you guys
>> to give us some design directions
>> and confirm if we are going in the right direction. Our first idea is:
>>
>>  1 - to patch 'spapr.c' so it can dynamically insert/remove basic
>> items into the device tree.
>
>
> What exactly would you like to patch into it? We already do have support for
> dynamic dt creation with the spapr target.
>

Actually we were not sure if the machine could do that. So we can add
things to the tree after booting it?

>>  2 - create a host side device that will be used with a guest side
>> driver to perform guest side operations and communicate changes from
>> host to the guest (like DynamicRM does in PowerVM LPARs). We are not
>
>
> Why not just use hypercalls?

The hypercalls are initiated from the guest side, right? We also need a
way for the host to send things to the guest. Using hypercalls would also
require a guest-side kernel module (KM).

>> planning to use powerpc-tools and want to make resource management
>> transparent (i.e. no need to run daemons or userspace programs in the
>> guest, only this kernel driver).
>>  3 - create bindings to support adding/removal  ibmvscsi devices
>>  4 - create bindings to support adding/removal  ibmveth devices
>>  5 - create bindings to support adding/removal PCI devices
>>  6 - create bindings to support adding/removal of memory
>
>
> This is going to be the hardest part. I don't think QEMU supports memory
> hotplug yet.

AFAIK, ballooning is what QEMU provides so far, which is fine for x86.

>
>>  - Do we need to do this the way PowerVM does? We have tested
>> virtio ballooning and it can work with a few endianness corrections.
>
>
> I don't know how PowerVM works. But if normal ballooning is all you need,
> you should certainly just enable virtio-balloon.

PowerVM works with Logical Memory Blocks (LMBs). The hypervisor
hotplugs memory blocks into the guest's memory. It is not just a
'borrowing' from the guests, right Ben?

>>  7 - create bindings to support adding/removal  CPUs
>>  - is SMP supported already? I tried to run SMP in an x86 host
>> and the guest got stuck when SMP is enabled
>
>
> SMP should work just fine, yes. Where exactly does it get stuck?

I think that is right after the guest kernel enables SMP

[7.478259] Faulting instruction address: 0xc053bbec
[7.479521] Oops: Kernel access of bad area, sig: 11 [#1]
[7.479694] SMP NR_CPUS=1024 NUMA pSeries

http://pastebin.com/VMtRyaTE

>
>
>>  - would be possible to work on this without a P7 baremetal
>> machine?
>
>
> At least for device hotplug, it should be perfectly possible to use an old
> G5 with PR KVM. I haven't gotten around to patch all the pieces of the
> puzzle to make -M pseries work with PR KVM when it's running on top of pHyp
> yet, so that won't work.
>
>
>> We have a P7 8205-E6B; is it possible to kick PHYP out?
>
>
> Ben?
>
>
>> Any idea on how much effort (time/people) the whole thing would take?
>> Any consideration about this is much appreciated :)
>
>
> Phew. It's hard to tell. Depends heavily on how good your people are :).

Well, considering someone like you :p, we would just need to
multiply it by 5 :P

>
>
> Alex
>
>



[Qemu-devel] [PATCH 0/4] Fix and clean tcg_target_get_call_iarg_regs_count

2012-09-12 Thread Stefan Weil
Function tcg_target_get_call_iarg_regs_count was wrong for w64 hosts
and can be simplified for all TCG targets:

[PATCH 1/4] w64: Fix TCG helper functions with 5 arguments
[PATCH 2/4] tcg/i386: Remove unused registers from
[PATCH 3/4] tcg: Move tcg_target_get_call_iarg_regs_count to tcg.c
[PATCH 4/4] tcg: Remove unused parameter from

The first patch is also needed for stable-1.2. It was already sent
today as a single patch.

Regards,

Stefan Weil



[Qemu-devel] [PATCH 2/4] tcg/i386: Remove unused registers from tcg_target_call_iarg_regs

2012-09-12 Thread Stefan Weil
32 bit x86 hosts don't need registers for helper function arguments
because they use the default stack based calling convention.

Removing the registers allows simpler code for function
tcg_target_get_call_iarg_regs_count.

Signed-off-by: Stefan Weil 
---
 tcg/i386/tcg-target.c |   10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index 43b5572..d60bab8 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -75,9 +75,7 @@ static const int tcg_target_call_iarg_regs[] = {
 TCG_REG_R8,
 TCG_REG_R9,
 #else
-TCG_REG_EAX,
-TCG_REG_EDX,
-TCG_REG_ECX
+/* 32 bit mode uses stack based calling convention (GCC default). */
 #endif
 };
 
@@ -117,11 +115,7 @@ static void patch_reloc(uint8_t *code_ptr, int type,
 /* maximum number of register used for input function arguments */
 static inline int tcg_target_get_call_iarg_regs_count(int flags)
 {
-if (TCG_TARGET_REG_BITS == 64) {
-return ARRAY_SIZE(tcg_target_call_iarg_regs);
-}
-
-return 0;
+return ARRAY_SIZE(tcg_target_call_iarg_regs);
 }
 
 /* parse target specific constraints */
-- 
1.7.10




[Qemu-devel] [PATCH 4/4] tcg: Remove unused parameter from tcg_target_get_call_iarg_regs_count

2012-09-12 Thread Stefan Weil
Since commit 6a18ae2d2947532d5c26439548afa0481c4529f9,
'flags' is no longer used in tcg_target_get_call_iarg_regs_count.

Signed-off-by: Stefan Weil 
---
 tcg/tcg.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 53316f6..0f0a8d0 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -182,7 +182,7 @@ int gen_new_label(void)
 #include "tcg-target.c"
 
 /* Maximum number of register used for input function arguments. */
-static inline int tcg_target_get_call_iarg_regs_count(int flags)
+static inline int tcg_target_get_call_iarg_regs_count(void)
 {
 return ARRAY_SIZE(tcg_target_call_iarg_regs);
 }
@@ -1871,7 +1871,7 @@ static int tcg_reg_alloc_call(TCGContext *s, const 
TCGOpDef *def,
 
 flags = args[nb_oargs + nb_iargs];
 
-nb_regs = tcg_target_get_call_iarg_regs_count(flags);
+nb_regs = tcg_target_get_call_iarg_regs_count();
 if (nb_regs > nb_params)
 nb_regs = nb_params;
 
-- 
1.7.10




Re: [Qemu-devel] [PATCH v2 0/3] nonblocking connect address handling cleanup

2012-09-12 Thread Michael S. Tsirkin
On Wed, Sep 12, 2012 at 02:12:55PM +0300, Orit Wasserman wrote:
> getaddrinfo can give us a list of addresses, but we only try to
> connect to the first one. If that fails we never proceed to
> the next one.  This is common on desktop setups that often have ipv6
> configured but not actually working.
> A simple way to reproduce the problem is migration:
> for the destination use -incoming tcp:0:, run migrate -d 
> tcp:localhost:
> migration will fail on hosts that have both IPv4 and IPV6 address for 
> localhost.
> 
> To fix this, refactor address resolution code and make 
> inet_nonblocking_connect
> retry connection with a different address.

Nice improvement over my patch, thanks.

For the series:
Acked-by: Michael S. Tsirkin 

Anthony, we got reports offlist from people with such
setups - migration under 1.2 regressed for them.
So I think we need this in 1.2.X as well, if
that release ever happens.

> Orit Wasserman (3):
>   Refactor inet_connect_opts function
>   Separate inet_connect into inet_connect (blocking) and
> inet_nonblocking_connect
>   Fix address handling in inet_nonblocking_connect
> 
>  migration-tcp.c |   29 ++-
>  nbd.c   |2 +-
>  qemu-sockets.c  |  254 
> +--
>  qemu_socket.h   |9 ++-
>  ui/vnc.c|2 +-
>  5 files changed, 208 insertions(+), 88 deletions(-)
> 
> -- 
> 1.7.7.6
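The fix Orit's series implements follows the standard getaddrinfo() retry loop; a minimal blocking sketch (a hypothetical helper, not the actual qemu-sockets.c code) looks like this:

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try every address getaddrinfo() returns until one connects.
 * Returns the connected fd, or -1 if all addresses fail.  This is
 * what avoids getting stuck on a configured-but-broken IPv6 address. */
static int connect_any(const char *host, const char *port)
{
    struct addrinfo hints, *res, *e;
    int fd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;        /* both IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    for (e = res; e != NULL; e = e->ai_next) {
        fd = socket(e->ai_family, e->ai_socktype, e->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, e->ai_addr, e->ai_addrlen) == 0)
            break;                      /* connected */
        close(fd);                      /* failed: try the next address */
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

The non-blocking variant in the series has to spread this loop across callbacks, retrying the next address from the connect-failure path instead of from a simple for loop.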



Re: [Qemu-devel] [libvirt] [PATCH] snapshot: fix rollback failure in transaction mode

2012-09-12 Thread Eric Blake
On 09/12/2012 12:33 PM, Jeff Cody wrote:

> If QEMU creates the snapshot file, upon transaction failure it is
> possible to have a newly created image file left, depending on when the
> failure occurs.  The running QEMU instance, however, will not be
> affected.
> 
> For instance, if we are performing qcow2 snapshots on 2 drives using the
> transaction command (e.g. drive0 and drive1), we will:
> 
> 1. create qcow2 image file for drive0 new active layer
> 2. create qcow2 image file for drive1 new active layer
> 3. If 1 & 2 were successful, then we modify the live image chain
>structure in memory to use the newly created files.  Otherwise, we
>abandon the change, notify libvirt of the error, and leave any newly
>created files intact.
> 
> That means, on a snapshot failure, QEMU's live running operation will
> not be affected, but the management software (libvirt) should clean up
> any resulting image files, if appropriate.
> 
> It sounds like you expect QEMU to unlink any of the newly created
> snapshot files on failure - is that an accurate statement?

A non-empty file is being left behind by the transaction command, and it
seems like whoever did the successful open(O_CREAT) of that file should
then be tasked with the unlink() of the same file on the failure path.
But it's not quite as simple as having qemu unlink() the file.
Remember, with libvirt, we have to pre-create a 0-byte file, in order to
set appropriate SELinux labels, prior to telling qemu to make the
snapshot.  Remember, qemu is doing open(O_CREAT) but not
open(O_CREAT|O_EXCL), and has no idea whether it created a new file or
is merely reusing (and truncating) an existing file, so in a way, I'm
arguing that libvirt should be doing the unlink() if it handed a
pre-created file into qemu.
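The O_CREAT versus O_CREAT|O_EXCL distinction Eric draws can be sketched with a hypothetical helper (not qemu code): by attempting the exclusive create first, the caller learns whether it created the file, and therefore whether unlink() on the failure path is its responsibility:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open `path` for writing, setting *created to 1 only if this call
 * actually created the file.  With plain O_CREAT (what qemu uses)
 * that information is lost, which is the crux of the cleanup problem. */
static int open_tracking_creation(const char *path, int *created)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd >= 0) {
        *created = 1;
        return fd;
    }
    if (errno != EEXIST)
        return -1;
    *created = 0;               /* pre-created, e.g. by libvirt */
    return open(path, O_WRONLY, 0);
}
```

On a rollback, the caller could then unlink() only when *created is set, and merely ftruncate() back to 0 bytes otherwise.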

But even if unlink() on 'transaction' failure in qemu is asking too
much, it might be nice of qemu to ftruncate() the file back to 0 bytes
to return it to the state it had pre-'transaction'; any operation that
claims to have full rollback, but leaves altered state behind (even if
the altered state doesn't affect qemu operation), just seems odd.  And
while libvirt always hands in a pre-created empty file, there's still
the question of how qemu should behave when operated manually without
SELinux in the way and without libvirt driving things, where an unlink()
in qemu may actually make sense.

At any rate, you are confirming that qemu 1.1 leaves a non-empty file
behind, so even if we change qemu 1.3 to truncate the file back to 0
bytes on failure, libvirt still has to cope with the older behavior.  So
it's now up to the qemu folks to decide whether this is a bug worth
fixing, or merely an annoyance worth documenting, or even something
worth completely ignoring and throwing back at libvirt.  On libvirt's
side, Guannan's patch looks correct, although it needs better commentary
in the commit message and/or code before being applied.

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Michael Roth
On Wed, Sep 12, 2012 at 07:30:08PM +0200, Stefan Weil wrote:
> Am 12.09.2012 18:45, schrieb Gleb Natapov:
> >On Wed, Sep 12, 2012 at 06:27:14PM +0200, Stefan Weil wrote:
> >>Am 12.09.2012 15:54, schrieb Anthony Liguori:
> >>>Hi,
> >>>
> >>>We've been running into a lot of problems lately with Windows guests and
> >>>I think they all ultimately could be addressed by revisiting the missed
> >>>tick catchup algorithms that we use.  Mike and I spent a while talking
> >>>about it yesterday and I wanted to take the discussion to the list to
> >>>get some additional input.
> >>>
> >>>Here are the problems we're seeing:
> >>>
> >>>1) Rapid reinjection can lead to time moving faster for short bursts of
> >>>time.  We've seen a number of RTC watchdog BSoDs and it's possible
> >>>that at least one cause is reinjection speed.
> >>>
> >>>2) When hibernating a host system, the guest gets is essentially paused
> >>>for a long period of time.  This results in a very large tick catchup
> >>>while also resulting in a large skew in guest time.
> >>>
> >>>I've gotten reports of the tick catchup consuming a lot of CPU time
> >>>from rapid delivery of interrupts (although I haven't reproduced this
> >>>yet).
> >>>
> >>>3) Windows appears to have a service that periodically syncs the guest
> >>>time with the hardware clock.  I've been told the resync period is an
> >>>hour.  For large clock skews, this can compete with reinjection
> >>>resulting in a positive skew in time (the guest can be ahead of the
> >>>host).
> >>Nearly each modern OS (including Windows) uses NTP
> >>or some other protocol to get the time via a TCP network.
> >>
> >The drifts we are talking about will take ages for NTP to fix.
> >
> >>If a guest OS detects a small difference of time, it will usually
> >>accelerate or decelerate the OS clock until the time is
> >>synchronised again.
> >>
> >>Large jumps in network time will make the OS time jump, too.
> >>With a little bad luck, QEMU's reinjection will add the
> >>positive skew, no matter whether the guest is Linux or Windows.
> >>
> >As far as I know NTP will never make OS clock jump. The purpose of NTP
> >is to fix time gradually, so apps will not notice. npdate is used to
> >force clock synchronization, but it should be run manually.
> 
> s/npdate/ntpdate. Yes, some Linux distros run it at system start,
> and it's also usual to call it every hour (poor man's NTP, uses
> less resources).

Windows at least seems to generally default to a max correction of ±15
hours using this approach. The relevant registry values are listed here:

http://support.microsoft.com/kb/816042#method4 (under More Information)

On my Win7 instance I have:

MaxPosPhaseCorrection: 54000 (15 hours)
MaxNegPhaseCorrection: 54000 (15 hours)

So there are definitely situations where guests won't correct themselves
even with NTP or ntpdate-like services running.

Also:

MaxAllowedPhaseOffset: 1 (1 second)

So Windows won't attempt to "catch-up" via an increased tick rate if the
delta is greater than 1 second, and will instead try to reset the clock
directly, which is basically the policy we're looking to implement,
except from the host side.
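The registry values above amount to a three-way policy; here is a sketch with hypothetical names (thresholds in seconds, taken from the Win7 values quoted, everything else invented):

```c
/* Decide how to correct a clock offset of `delta` seconds:
 *  - slew (run the clock slightly fast/slow) for tiny offsets,
 *  - step (set the clock directly) for moderate ones,
 *  - give up once the offset exceeds the allowed phase correction. */
enum clock_action { CLOCK_SLEW, CLOCK_STEP, CLOCK_IGNORE };

static enum clock_action choose_correction(long long delta,
                                           long long max_slew,
                                           long long max_phase)
{
    long long mag = delta < 0 ? -delta : delta;
    if (mag <= max_slew)
        return CLOCK_SLEW;
    if (mag <= max_phase)
        return CLOCK_STEP;
    return CLOCK_IGNORE;
}
```

With MaxAllowedPhaseOffset = 1 and Max{Pos,Neg}PhaseCorrection = 54000, a 2-second skew gets stepped while a 16-hour skew is silently ignored, which is why some guests never recover on their own.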


> 
> >
> >>>I've been thinking about an algorithm like this to address these
> >>>problems:
> >>>
> >>>A) Limit the number of interrupts that we reinject to the equivalent of
> >>>a small period of wallclock time.  Something like 60 seconds.
> >>>
> >>>B) In the event of (A), trigger a notification in QEMU.  This is easy
> >>>for the RTC but harder for the in-kernel PIT.  Maybe it's a good time 
> >>> to
> >>>revisit usage of the in-kernel PIT?
> >>>
> >>>C) On acculumated tick overflow, rely on using a qemu-ga command to
> >>>force a resync of the guest's time to the hardware wallclock time.
> >>>
> >>>D) Whenever the guest reads the wallclock time from the RTC, reset all
> >>>accumulated ticks.
> >>D) makes no sense, see my comment above.
> >>
> >>Injection of additional timer interrupts should not be needed
> >>after a hibernation. The guest must handle that situation
> >>by reading either the hw clock (which must be updated
> >>by QEMU when it resumes from hibernate) or by using
> >>another time reference (like NTP, for example).
> >>
> >He is talking about host hibernation, not guest.
> >
> 
> I also meant host hibernation.
> 
> Maybe the host should tell the guest that it is going to
> hibernate (ACPI event), then the guest can use its
> normal hibernate entry and recovery code, too.
> 

I think doing that would be useful either way, but aren't there other
scenarios where big time jumps can occur? What about live migration?
Presumably we'd complete within the 15 hour limit above, but for other
operating systems or particular configurations thereof we might still
fall outside the threshold they're willing to correct for. At least with
an approach like this we can clearly define the requirements for proper
time-keeping.




Re: [Qemu-devel] [PATCH] w64: Fix calls of TCG helper functions with 5 arguments

2012-09-12 Thread Stefan Weil

Am 12.09.2012 21:14, schrieb Aurelien Jarno:

On Wed, Sep 12, 2012 at 07:12:47PM +0100, Peter Maydell wrote:

On 12 September 2012 19:03, Stefan Weil  wrote:

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index da17bba..43b5572 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -118,7 +118,7 @@ static void patch_reloc(uint8_t *code_ptr, int type,
  static inline int tcg_target_get_call_iarg_regs_count(int flags)
  {
  if (TCG_TARGET_REG_BITS == 64) {
-return 6;
+return ARRAY_SIZE(tcg_target_call_iarg_regs);
  }

  return 0;


Hmm. Why can't we just return the array size in all cases?
Is there something special about 32 bit x86? I checked, and
all our other TCG targets return the same value as the size of
the iarg_regs array (either using ARRAY_SIZE or by just returning
the right number)...



On 32-bit x86, all arguments are now being passed on the stack; that's
why the function returns 0. On the other hand, when the change was
made, the registers were not removed from tcg_target_call_iarg_regs.

I think this patch is fine enough for 1.2, but a better patch is needed
for master.


As soon as the 32-bit x86 special case is fixed or eliminated,
it should be possible for all TCG targets to share the same code for
tcg_target_get_call_iarg_regs_count. That function could then be
removed from the target-specific implementations and moved
to tcg.c.




Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Stefan Weil

Am 12.09.2012 20:13, schrieb Gleb Natapov:

On Wed, Sep 12, 2012 at 07:30:08PM +0200, Stefan Weil wrote:

I also meant host hibernation.

Then I don't see how the guest can handle the situation, since it has
no idea that it was stopped. Qemu has no idea about host hibernation
either.


The guest can compare its internal timers (which stop
when the host hibernates) with external time references
(NTP, hw clock or any other clock reference).
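One way a guest can notice such a gap is to compare how far two clocks advanced between samples; a sketch of that comparison, with hypothetical clock samples passed in as plain seconds (it assumes the reference clock keeps running across the suspend while the local timer does not):

```c
/* Difference between how much the external reference advanced and how
 * much the local (stopped-during-suspend) timer advanced, in seconds.
 * A large positive result suggests the system, or its host, was
 * suspended in between the two samples. */
static double detect_jump(double local_before, double local_after,
                          double ref_before, double ref_after)
{
    double local_delta = local_after - local_before;
    double ref_delta = ref_after - ref_before;
    return ref_delta - local_delta;
}
```

A guest-side daemon could poll something like this and resync the clock whenever the result exceeds some threshold, instead of waiting for reinjected ticks to catch up.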




Re: [Qemu-devel] [PATCH] w64: Fix calls of TCG helper functions with 5 arguments

2012-09-12 Thread Stefan Weil

Am 12.09.2012 21:14, schrieb Aurelien Jarno:

On Wed, Sep 12, 2012 at 07:12:47PM +0100, Peter Maydell wrote:

On 12 September 2012 19:03, Stefan Weil  wrote:

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index da17bba..43b5572 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -118,7 +118,7 @@ static void patch_reloc(uint8_t *code_ptr, int type,
  static inline int tcg_target_get_call_iarg_regs_count(int flags)
  {
  if (TCG_TARGET_REG_BITS == 64) {
-return 6;
+return ARRAY_SIZE(tcg_target_call_iarg_regs);
  }

  return 0;


Hmm. Why can't we just return the array size in all cases?
Is there something special about 32 bit x86? I checked, and
all our other TCG targets return the same value as the size of
the iarg_regs array (either using ARRAY_SIZE or by just returning
the right number)...



On 32-bit x86, all arguments are now being passed on the stack; that's
why the function returns 0. On the other hand, when the change was
made, the registers were not removed from tcg_target_call_iarg_regs.

I think this patch is fine enough for 1.2, but a better patch is needed
for master.


I noticed that Blue switched from register arguments to
arguments on the stack, but I don't know the reason for that
change.

Maybe 32 bit x86 can also use a mixture of register / stack
arguments. This needs more testing and is the main reason
why I did not change tcg_target_call_iarg_regs for 32 bit
and return ARRAY_SIZE for both 32 and 64 bit.

I'd prefer to get the patch in master soon because it is
a minimalistic change which fixes the now unusable
64 bit mode on Windows. An additional patch can still
be applied on top.

Of course any better patch which also fixes 64 bit Windows
and which comes soon would also be very acceptable.

Regards

Stefan




Re: [Qemu-devel] [PATCH] socket: don't attempt to reconnect a TCP socket in server mode

2012-09-12 Thread Anthony Liguori
"Richard W.M. Jones"  writes:

> On Wed, Sep 05, 2012 at 02:01:36PM -0500, Anthony Liguori wrote:
>> Commit c3767ed0eb5d0bb25fe409ae5dec06e3411ff1b6 introduced a possible SEGV 
>> when
>> using a socket chardev with server=on because it assumes that all TCP sockets
>> are in client mode.
>> 
>> This patch adds a check to only reconnect when in client mode.
>> 
>> Cc: Lei Li 
>> Reported-by: Michael Roth 
>> Signed-off-by: Anthony Liguori 
>> ---
>>  qemu-char.c |4 +++-
>>  1 files changed, 3 insertions(+), 1 deletions(-)
>> 
>> diff --git a/qemu-char.c b/qemu-char.c
>> index 398baf1..767da93 100644
>> --- a/qemu-char.c
>> +++ b/qemu-char.c
>> @@ -2148,10 +2148,12 @@ static int tcp_chr_write(CharDriverState *chr, const 
>> uint8_t *buf, int len)
>>  TCPCharDriver *s = chr->opaque;
>>  if (s->connected) {
>>  return send_all(s->fd, buf, len);
>> -} else {
>> +} else if (s->listen_fd == -1) {
>>  /* (Re-)connect for unconnected writing */
>>  tcp_chr_connect(chr);
>>  return 0;
>> +} else {
>> +return len;
>>  }
>>  }
>
> Hi Anthony,
>
> I just came across this patch when I was trying to fix this
> bug: https://bugzilla.redhat.com/show_bug.cgi?id=853408
> qemu segfaults when trying to write to a serial socket which
> is *not* a server socket and has been closed by the other end.
>
> Unfortunately your patch above does not fix it.  Only a
> complete revert of c3767ed0eb5d0 fixes it.
>
> I don't understand the purpose of c3767ed0eb5d0 at all.  It
> seems to set the s->connected flag and carries on regardless,
> happily calling write (-1, ...), which is completely broken.
>
> The other end closed the socket.  There's no one listening on the
> other end, and setting the s->connected flag will not help that.

You're 100% correct.  I was only attempting to fix the server SEGV, I
didn't notice that client was hopelessly broken too.  Will send a patch
reverting both commits.

Regards,

Anthony Liguori

>
> Rich.
>
> -- 
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming blog: http://rwmj.wordpress.com
> Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
> http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora




Re: [Qemu-devel] [PATCH] w64: Fix calls of TCG helper functions with 5 arguments

2012-09-12 Thread Aurelien Jarno
On Wed, Sep 12, 2012 at 07:12:47PM +0100, Peter Maydell wrote:
> On 12 September 2012 19:03, Stefan Weil  wrote:
> > diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
> > index da17bba..43b5572 100644
> > --- a/tcg/i386/tcg-target.c
> > +++ b/tcg/i386/tcg-target.c
> > @@ -118,7 +118,7 @@ static void patch_reloc(uint8_t *code_ptr, int type,
> >  static inline int tcg_target_get_call_iarg_regs_count(int flags)
> >  {
> >  if (TCG_TARGET_REG_BITS == 64) {
> > -return 6;
> > +return ARRAY_SIZE(tcg_target_call_iarg_regs);
> >  }
> >
> >  return 0;
> 
> Hmm. Why can't we just return the array size in all cases?
> Is there something special about 32 bit x86? I checked, and
> all our other TCG targets return the same value as the size of
> the iarg_regs array (either using ARRAY_SIZE or by just returning
> the right number)...
> 

On 32-bit x86, all arguments are now passed on the stack; that's
why the function returns 0. However, when that change was made, the
registers were not removed from tcg_target_call_iarg_regs.

I think this patch is fine enough for 1.2, but a better patch is needed
for master.


-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net



Re: [Qemu-devel] [PATCH] socket: don't attempt to reconnect a TCP socket in server mode

2012-09-12 Thread Richard W.M. Jones
On Wed, Sep 05, 2012 at 02:01:36PM -0500, Anthony Liguori wrote:
> Commit c3767ed0eb5d0bb25fe409ae5dec06e3411ff1b6 introduced a possible SEGV 
> when
> using a socket chardev with server=on because it assumes that all TCP sockets
> are in client mode.
> 
> This patch adds a check to only reconnect when in client mode.
> 
> Cc: Lei Li 
> Reported-by: Michael Roth 
> Signed-off-by: Anthony Liguori 
> ---
>  qemu-char.c |4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/qemu-char.c b/qemu-char.c
> index 398baf1..767da93 100644
> --- a/qemu-char.c
> +++ b/qemu-char.c
> @@ -2148,10 +2148,12 @@ static int tcp_chr_write(CharDriverState *chr, const 
> uint8_t *buf, int len)
>  TCPCharDriver *s = chr->opaque;
>  if (s->connected) {
>  return send_all(s->fd, buf, len);
> -} else {
> +} else if (s->listen_fd == -1) {
>  /* (Re-)connect for unconnected writing */
>  tcp_chr_connect(chr);
>  return 0;
> +} else {
> +return len;
>  }
>  }

Hi Anthony,

I just came across this patch when I was trying to fix this
bug: https://bugzilla.redhat.com/show_bug.cgi?id=853408
qemu segfaults when trying to write to a serial socket which
is *not* a server socket and has been closed by the other end.

Unfortunately your patch above does not fix it.  Only a
complete revert of c3767ed0eb5d0 fixes it.

I don't understand the purpose of c3767ed0eb5d0 at all.  It
seems to set the s->connected flag and carries on regardless,
happily calling write (-1, ...), which is completely broken.

The other end closed the socket.  There's no one listening on the
other end, and setting the s->connected flag will not help that.
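The failure mode is easy to demonstrate outside of QEMU. A minimal sketch (my own illustration, not QEMU code) of what tcp_chr_write effectively does once the descriptor is gone:

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Stand-in for what tcp_chr_write ends up doing once the peer has
 * closed the connection: writing to fd -1 (no socket behind it)
 * always fails with EBADF, so setting a "connected" flag cannot
 * make the write succeed. */
static ssize_t broken_write(const void *buf, size_t len)
{
    return write(-1, buf, len);
}
```

Flag juggling in the chardev layer cannot paper over a dead descriptor; the write has to be avoided, not retried.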

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora



Re: [Qemu-devel] [libvirt] [PATCH] snapshot: fix rollback failure in transaction mode

2012-09-12 Thread Jeff Cody
On 09/12/2012 01:47 PM, Eric Blake wrote:
> On 09/12/2012 09:22 AM, Guannan Ren wrote:
>> After a failure of the qemu transaction command, the snapshot file is
>> still there with non-zero size. In order to unlink the file, the
>> patch removes the file size check.
> 
> Can you give some exact steps to reproduce this, so that I know who is
> making the file have non-zero size?  I'm worried that unlinking a
> non-empty file is discarding data, so the commit message needs a lot
> more details about how we are proving that the only way the file can be
> non-zero size is because qemu happened to put data into a previously
> empty file prior to the failed 'transaction' attempt.
> 
> That is, after re-reading context code just now, I'm fairly confident
> that this code can only be reached when qemu supports the 'transaction'
> monitor command, and libvirt's --reuse-ext flag was not specified, and
> therefore libvirt must have just created the file.  But I want that in
> the commit message rather than having to re-read the code every time I
> visit this commit in future reads of the git log.  It may also be that
> qemu has a bug in that the 'transaction' command is modifying files even
> when it fails, so even while this works around the bug, I'm cc'ing Jeff
> to see if qemu also needs a bug fix.

If QEMU creates the snapshot file, upon transaction failure it is
possible to have a newly created image file left, depending on when the
failure occurs.  The running QEMU instance, however, will not be
affected.

For instance, if we are performing qcow2 snapshots on 2 drives using the
transaction command (e.g. drive0 and drive1), we will:

1. create qcow2 image file for drive0 new active layer
2. create qcow2 image file for drive1 new active layer
3. If 1 & 2 were successful, then we modify the live image chain
   structure in memory to use the newly created files.  Otherwise, we
   abandon the change, notify libvirt of the error, and leave any newly
   created files intact.

That means, on a snapshot failure, QEMU's live running operation will
not be affected, but the management software (libvirt) should clean up
any resulting image files, if appropriate.
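In other words, the commit point comes only after every file creation has succeeded; a failure at any earlier step leaves the already-created files behind for management to collect. A toy model of those semantics (my own sketch, not QEMU code):

```c
#include <assert.h>

/* create_ok[i] says whether creating the snapshot file for drive i
 * succeeds.  On the first failure the transaction aborts: the live
 * image chain is left untouched (return -1) and *orphans reports how
 * many newly created files remain on disk for libvirt to clean up.
 * On success the new files become the active layer, so no orphans. */
static int run_snapshot_transaction(const int *create_ok, int n, int *orphans)
{
    for (int i = 0; i < n; i++) {
        if (!create_ok[i]) {
            *orphans = i;  /* files created before the failing step */
            return -1;
        }
    }
    *orphans = 0;
    return 0;
}
```

This matches the behavior described above: the running guest is never affected, but a failed transaction can leave up to n-1 orphaned image files.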

It sounds like you expect QEMU to unlink any of the newly created
snapshot files on failure - is that an accurate statement?

> 
>> ---
>>  src/qemu/qemu_driver.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>> index 8e8e00c..1fedfb8 100644
>> --- a/src/qemu/qemu_driver.c
>> +++ b/src/qemu/qemu_driver.c
>> @@ -10833,7 +10833,7 @@ qemuDomainSnapshotUndoSingleDiskActive(struct 
>> qemud_driver *driver,
>>  if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0)
>>  VIR_WARN("Unable to release lock on %s", disk->src);
>>  if (need_unlink && stat(disk->src, &st) == 0 &&
>> -st.st_size == 0 && S_ISREG(st.st_mode) && unlink(disk->src) < 0)
>> +S_ISREG(st.st_mode) && unlink(disk->src) < 0)
>>  VIR_WARN("Unable to remove just-created %s", disk->src);
>>  
>>  /* Update vm in place to match changes.  */
>>
> 




Re: [Qemu-devel] [PATCH v2 3/4] target-i386: Allow changing of Hypervisor CPUIDs.

2012-09-12 Thread Marcelo Tosatti

The problem with integrating this is that it has little or
no backing from documentation. The Linux kernel source is a good
source; in that case, say "according to the VMware guest support code in
version xyz" in the changelog.

Also, extracting this information into a text file (or a comment in the
code) would be better than just adding code.

On Tue, Sep 11, 2012 at 10:07:46AM -0400, Don Slutz wrote:
> This is primarily done so that the guest will think it is running
> under vmware when hypervisor-vendor=vmware is specified as a
> property of a cpu.
> 
> Signed-off-by: Don Slutz 
> ---
>  target-i386/cpu.c |  214 
> +
>  target-i386/cpu.h |   21 +
>  target-i386/kvm.c |   33 +++--
>  3 files changed, 262 insertions(+), 6 deletions(-)
> 
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index 5f9866a..9f1f390 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -1135,6 +1135,36 @@ static void x86_cpuid_set_model_id(Object *obj, const 
> char *model_id,
>  }
>  }
>  
> +static void x86_cpuid_set_vmware_extra(Object *obj)
> +{
> +X86CPU *cpu = X86_CPU(obj);
> +
> +if ((cpu->env.tsc_khz != 0) &&
> +(cpu->env.cpuid_hv_level == CPUID_HV_LEVEL_VMARE_4) &&
> +(cpu->env.cpuid_hv_vendor1 == CPUID_HV_VENDOR_VMWARE_1) &&
> +(cpu->env.cpuid_hv_vendor2 == CPUID_HV_VENDOR_VMWARE_2) &&
> +(cpu->env.cpuid_hv_vendor3 == CPUID_HV_VENDOR_VMWARE_3)) {
>> +const uint32_t apic_khz = 1000000L;
> +
> +/*
> + * From article.gmane.org/gmane.comp.emulators.kvm.devel/22643
> + *
>> + *Leaf 0x40000010, Timing Information.
> + *
> + *VMware has defined the first generic leaf to provide timing
> + *information.  This leaf returns the current TSC frequency and
> + *current Bus frequency in kHz.
> + *
> + *# EAX: (Virtual) TSC frequency in kHz.
> + *# EBX: (Virtual) Bus (local apic timer) frequency in kHz.
> + *# ECX, EDX: RESERVED (Per above, reserved fields are set to 
> zero).
> + */
>> +cpu->env.cpuid_hv_extra = 0x40000010;
> +cpu->env.cpuid_hv_extra_a = (uint32_t)cpu->env.tsc_khz;
> +cpu->env.cpuid_hv_extra_b = apic_khz;
> +}
> +}

What happens in case you migrate the vmware guest to a host
with different frequency? How is that transmitted to the
vmware-guest-running-on-kvm ? Or is migration not supported?

> +static void x86_cpuid_set_hv_level(Object *obj, Visitor *v, void *opaque,
> +const char *name, Error **errp)
> +{
> +X86CPU *cpu = X86_CPU(obj);
> +uint32_t value;
> +
> +visit_type_uint32(v, &value, name, errp);
> +if (error_is_set(errp)) {
> +return;
> +}
> +
>> +if ((value != 0) && (value < 0x40000000)) {
>> +value += 0x40000000;
> +}
> +cpu->env.cpuid_hv_level = value;
> +}
> +
> +static char *x86_cpuid_get_hv_vendor(Object *obj, Error **errp)
> +{
> +X86CPU *cpu = X86_CPU(obj);
> +CPUX86State *env = &cpu->env;
> +char *value;
> +int i;
> +
> +value = (char *)g_malloc(CPUID_VENDOR_SZ + 1);
> +for (i = 0; i < 4; i++) {
> +value[i + 0] = env->cpuid_hv_vendor1 >> (8 * i);
> +value[i + 4] = env->cpuid_hv_vendor2 >> (8 * i);
> +value[i + 8] = env->cpuid_hv_vendor3 >> (8 * i);
> +}
> +value[CPUID_VENDOR_SZ] = '\0';
> +
> +/* Convert known names */
> +if (!strcmp(value, CPUID_HV_VENDOR_VMWARE)) {
> +if (env->cpuid_hv_level == CPUID_HV_LEVEL_VMARE_4) {
> +pstrcpy(value, sizeof(value), "vmware4");
> +} else if (env->cpuid_hv_level == CPUID_HV_LEVEL_VMARE_3) {
> +pstrcpy(value, sizeof(value), "vmware3");
> +}
> +} else if (!strcmp(value, CPUID_HV_VENDOR_XEN) &&
> +   env->cpuid_hv_level == CPUID_HV_LEVEL_XEN) {
> +pstrcpy(value, sizeof(value), "xen");
> +} else if (!strcmp(value, CPUID_HV_VENDOR_KVM) &&
> +   env->cpuid_hv_level == 0) {
> +pstrcpy(value, sizeof(value), "kvm");
> +}
> +return value;
> +}
> +
> +static void x86_cpuid_set_hv_vendor(Object *obj, const char *value,
> + Error **errp)
> +{
> +X86CPU *cpu = X86_CPU(obj);
> +CPUX86State *env = &cpu->env;
> +int i;
> +char adj_value[CPUID_VENDOR_SZ + 1];
> +
> +memset(adj_value, 0, sizeof(adj_value));
> +
> +/* Convert known names */
> +if (!strcmp(value, "vmware") || !strcmp(value, "vmware4")) {
> +if (env->cpuid_hv_level == 0) {
> +env->cpuid_hv_level = CPUID_HV_LEVEL_VMARE_4;
> +}
> +pstrcpy(adj_value, sizeof(adj_value), CPUID_HV_VENDOR_VMWARE);
> +} else if (!strcmp(value, "vmware3")) {
> +if (env->cpuid_hv_level == 0) {
> +env->cpuid_hv_level = CPUID_HV_LEVEL_VMARE_3;
> +}
> +pstrcpy(adj_value, sizeof(adj_value), CPUID_HV_VE

Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Anthony Liguori
Gleb Natapov  writes:

> On Wed, Sep 12, 2012 at 08:54:26AM -0500, Anthony Liguori wrote:
>> 
>> Hi,
>> 
>> We've been running into a lot of problems lately with Windows guests and
>> I think they all ultimately could be addressed by revisiting the missed
>> tick catchup algorithms that we use.  Mike and I spent a while talking
>> about it yesterday and I wanted to take the discussion to the list to
>> get some additional input.
>> 
>> Here are the problems we're seeing:
>> 
>> 1) Rapid reinjection can lead to time moving faster for short bursts of
>>time.  We've seen a number of RTC watchdog BSoDs and it's possible
>>that at least one cause is reinjection speed.
>> 
>> 2) When hibernating a host system, the guest is essentially paused
>>for a long period of time.  This results in a very large tick catchup
>>while also resulting in a large skew in guest time.
>> 
>>I've gotten reports of the tick catchup consuming a lot of CPU time
>>from rapid delivery of interrupts (although I haven't reproduced this
>>yet).
>> 
>> 3) Windows appears to have a service that periodically syncs the guest
>>time with the hardware clock.  I've been told the resync period is an
>>hour.  For large clock skews, this can compete with reinjection
>>resulting in a positive skew in time (the guest can be ahead of the
>>host).
>> 
>> I've been thinking about an algorithm like this to address these
>> problems:
>> 
>> A) Limit the number of interrupts that we reinject to the equivalent of
>>a small period of wallclock time.  Something like 60 seconds.
>> 
> How will this fix the BSOD problem, for instance? 60 seconds is long enough
> to cause all the problems you are talking about above. We could make the
> amount of accumulated ticks easily configurable, though, to play with and
> see.

It won't, but the goal of an upper limit is to cap time correction at
something reasonably caused by overcommit, not by suspend/resume.

60 seconds is probably way too long.  Maybe 5 seconds?  We can try
various amounts as you said.

What do you think about slowing down the catchup rate?  I think now we
increase wallclock time by 100-700%.

This is very fast.  I wonder if this makes sense anymore since hr timers
are pretty much ubiquitous.

I think we could probably even just increase wallclock time by as little
as 10-20%.  That should avoid false watchdog alerts but still give us a
chance to inject enough interrupts.
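A capped backlog with a gentle catch-up rate could look roughly like this (a sketch under assumed numbers, not a patch):

```c
#include <assert.h>

/* Sketch of the proposed policy (the numbers are placeholders, not
 * QEMU's): missed ticks accumulate up to a cap; hitting the cap would
 * raise the QMP notification so management can resync guest time via
 * qemu-ga.  Reinjection then drains the backlog by at most `extra`
 * ticks per timer period -- one extra tick every 5-10 periods
 * approximates the 10-20% speed-up discussed above, instead of the
 * current 100-700%. */
typedef struct {
    long backlog;    /* missed ticks not yet reinjected */
    long cap;        /* upper bound on the backlog */
    int overflowed;  /* set once the cap has been hit */
} TickCatchup;

static void miss_ticks(TickCatchup *tc, long n)
{
    tc->backlog += n;
    if (tc->backlog > tc->cap) {
        tc->backlog = tc->cap;
        tc->overflowed = 1;  /* would trigger the notification / resync */
    }
}

/* Ticks to deliver this period: the regular tick plus a bounded
 * share of the backlog. */
static long ticks_this_period(TickCatchup *tc, long extra)
{
    long take = tc->backlog < extra ? tc->backlog : extra;
    tc->backlog -= take;
    return 1 + take;
}
```

The key property is that time correction is bounded on both ends: the cap limits how much skew reinjection will ever try to repair, and the per-period share limits how fast guest time advances while repairing it.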

>
>> B) In the event of (A), trigger a notification in QEMU.  This is easy
>>for the RTC but harder for the in-kernel PIT.  Maybe it's a good time to
>>revisit usage of the in-kernel PIT?
>> 
> PIT does not matter for Windows guests.
>
>> C) On accumulated tick overflow, rely on using a qemu-ga command to
>>force a resync of the guest's time to the hardware wallclock time.
>> 
> Needs guest cooperation.

Yes, hence qemu-ga.  But is there any other choice?  Hibernation can
cause us to miss an unbounded number of ticks.   Days' worth of time.  It
seems unreasonable to gradually catch up that much time.

>> D) Whenever the guest reads the wallclock time from the RTC, reset all
>>accumulated ticks.
>>
>> In order to do (C), we'll need to plumb qemu-ga through QMP.  Mike and I
>> discussed a low-impact way of doing this (having a separate dispatch
>> path for guest agent commands) and I'm confident we could do this for
>> 1.3.
>> 
>> This would mean that management tools would need to consume qemu-ga
>> through QMP.  Not sure if this is a problem for anyone.
>> 
>> I'm not sure whether it's worth trying to support this with the
>> in-kernel PIT or not either.
>> 
>> Are there other issues with reinjection that people are aware of?  Does
>> anything seem obviously wrong with the above?
>> 
> It looks like you are trying to solve only pathologically big time drift
> problems. Those do not happen normally.

They do if you hibernate your laptop.

Regards,

Anthony Liguori

>
> --
>   Gleb.



Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Gleb Natapov
On Wed, Sep 12, 2012 at 07:30:08PM +0200, Stefan Weil wrote:
> Am 12.09.2012 18:45, schrieb Gleb Natapov:
> >On Wed, Sep 12, 2012 at 06:27:14PM +0200, Stefan Weil wrote:
> >>Am 12.09.2012 15:54, schrieb Anthony Liguori:
> >>>Hi,
> >>>
> >>>We've been running into a lot of problems lately with Windows guests and
> >>>I think they all ultimately could be addressed by revisiting the missed
> >>>tick catchup algorithms that we use.  Mike and I spent a while talking
> >>>about it yesterday and I wanted to take the discussion to the list to
> >>>get some additional input.
> >>>
> >>>Here are the problems we're seeing:
> >>>
> >>>1) Rapid reinjection can lead to time moving faster for short bursts of
> >>>time.  We've seen a number of RTC watchdog BSoDs and it's possible
> >>>that at least one cause is reinjection speed.
> >>>
> >>>2) When hibernating a host system, the guest is essentially paused
> >>>for a long period of time.  This results in a very large tick catchup
> >>>while also resulting in a large skew in guest time.
> >>>
> >>>I've gotten reports of the tick catchup consuming a lot of CPU time
> >>>from rapid delivery of interrupts (although I haven't reproduced this
> >>>yet).
> >>>
> >>>3) Windows appears to have a service that periodically syncs the guest
> >>>time with the hardware clock.  I've been told the resync period is an
> >>>hour.  For large clock skews, this can compete with reinjection
> >>>resulting in a positive skew in time (the guest can be ahead of the
> >>>host).
> >>Nearly every modern OS (including Windows) uses NTP
> >>or some other protocol to get the time via a TCP network.
> >>
> >The drifts we are talking about will take ages for NTP to fix.
> >
> >>If a guest OS detects a small difference of time, it will usually
> >>accelerate or decelerate the OS clock until the time is
> >>synchronised again.
> >>
> >>Large jumps in network time will make the OS time jump, too.
> >>With a little bad luck, QEMU's reinjection will add the
> >>positive skew, no matter whether the guest is Linux or Windows.
> >>
> >As far as I know NTP will never make OS clock jump. The purpose of NTP
> >is to fix time gradually, so apps will not notice. npdate is used to
> >force clock synchronization, but it should be run manually.
> 
> s/npdate/ntpdate. Yes, some Linux distros run it at system start,
Yes, typo.

> and it's also usual to call it every hour (poor man's NTP, uses
> less resources).
> 
> >
> >>>I've been thinking about an algorithm like this to address these
> >>>problems:
> >>>
> >>>A) Limit the number of interrupts that we reinject to the equivalent of
> >>>a small period of wallclock time.  Something like 60 seconds.
> >>>
> >>>B) In the event of (A), trigger a notification in QEMU.  This is easy
> >>>for the RTC but harder for the in-kernel PIT.  Maybe it's a good time 
> >>> to
> >>>revisit usage of the in-kernel PIT?
> >>>
> >>>C) On accumulated tick overflow, rely on using a qemu-ga command to
> >>>force a resync of the guest's time to the hardware wallclock time.
> >>>
> >>>D) Whenever the guest reads the wallclock time from the RTC, reset all
> >>>accumulated ticks.
> >>D) makes no sense, see my comment above.
> >>
> >>Injection of additional timer interrupts should not be needed
> >>after a hibernation. The guest must handle that situation
> >>by reading either the hw clock (which must be updated
> >>by QEMU when it resumes from hibernate) or by using
> >>another time reference (like NTP, for example).
> >>
> >He is talking about host hibernation, not guest.
> >
> 
> I also meant host hibernation.
Then I don't see how the guest can handle the situation, since it has
no idea that it was stopped. QEMU has no idea about host hibernation
either.

> 
> Maybe the host should tell the guest that it is going to
> hibernate (ACPI event), then the guest can use its
> normal hibernate entry and recovery code, too.
QEMU does not emulate a Sleep button, but even if it did, the guest may
ignore it. AFAIK libvirt migrates the VM into a file during host hibernation.
While this does not require guest cooperation, it has timekeeping issues.

--
Gleb.



Re: [Qemu-devel] [PATCH] w64: Fix calls of TCG helper functions with 5 arguments

2012-09-12 Thread Peter Maydell
On 12 September 2012 19:03, Stefan Weil  wrote:
> diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
> index da17bba..43b5572 100644
> --- a/tcg/i386/tcg-target.c
> +++ b/tcg/i386/tcg-target.c
> @@ -118,7 +118,7 @@ static void patch_reloc(uint8_t *code_ptr, int type,
>  static inline int tcg_target_get_call_iarg_regs_count(int flags)
>  {
>  if (TCG_TARGET_REG_BITS == 64) {
> -return 6;
> +return ARRAY_SIZE(tcg_target_call_iarg_regs);
>  }
>
>  return 0;

Hmm. Why can't we just return the array size in all cases?
Is there something special about 32 bit x86? I checked, and
all our other TCG targets return the same value as the size of
the iarg_regs array (either using ARRAY_SIZE or by just returning
the right number)...

-- PMM



Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Clemens Kolbitsch
> On 2012-09-12 15:54, Anthony Liguori wrote:
>>
>> Hi,
>>
>> We've been running into a lot of problems lately with Windows guests and
>> I think they all ultimately could be addressed by revisiting the missed
>> tick catchup algorithms that we use.  Mike and I spent a while talking
>> about it yesterday and I wanted to take the discussion to the list to
>> get some additional input.
>>
>> Here are the problems we're seeing:
>>
>> 1) Rapid reinjection can lead to time moving faster for short bursts of
>>time.  We've seen a number of RTC watchdog BSoDs and it's possible
>>that at least one cause is reinjection speed.
>>
>> 2) When hibernating a host system, the guest is essentially paused
>>for a long period of time.  This results in a very large tick catchup
>>while also resulting in a large skew in guest time.
>>
>>I've gotten reports of the tick catchup consuming a lot of CPU time
>>from rapid delivery of interrupts (although I haven't reproduced this
>>yet).

Guys,

not much that I can contribute to solving the problem, but I have a
bunch of VMs where this happens _every_ time I resume a snapshot (but
without hibernating). In case this could be a related problem and
you need help testing a patch, I'm more than happy to help.

-Clemens



[Qemu-devel] [PATCH] w64: Fix calls of TCG helper functions with 5 arguments

2012-09-12 Thread Stefan Weil
TCG uses 6 registers for function arguments on 64 bit Linux hosts,
but only 4 registers on W64 hosts.

Commit 2999a0b20074a7e4a58f56572bb1436749368f59 increased the number
of arguments for some important helper functions from 4 to 5
which triggered a bug for W64 hosts: QEMU aborts when executing
helper_lcall_real in the guest's BIOS because function
tcg_target_get_call_iarg_regs_count always returned 6.

As W64 has only 4 registers for arguments, the 5th argument must be
passed on the stack using a correct stack offset.
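For reference, the 32-byte offset chosen in the patch falls out of the Win64 calling convention: the caller always reserves a 4 × 8-byte "shadow space" for the register arguments, so the first stack argument (the 5th) lives at offset 32. A small sketch of the argument placement (my own illustration, not TCG code):

```c
#include <assert.h>

/* Where does integer argument n (0-based) live at the moment of the
 * call?  Returns -1 for arguments passed in registers.
 * - SysV x86-64 (Linux): 6 register args; further args at rsp+0, +8...
 * - Win64: 4 register args, plus a 32-byte shadow space the caller
 *   reserves even for those register args, so the 5th argument
 *   (n == 4) sits at rsp+32 -- the TCG_TARGET_CALL_STACK_OFFSET
 *   value used for _WIN64 in the patch. */
static int win64_arg_stack_offset(int n)
{
    return n < 4 ? -1 : 4 * 8 + (n - 4) * 8;
}

static int sysv_arg_stack_offset(int n)
{
    return n < 6 ? -1 : (n - 6) * 8;
}
```

This is why a 5-argument helper works on Linux hosts (all five fit in registers) but was broken on W64 until the stack offset was corrected.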

Signed-off-by: Stefan Weil 
---

Without that patch, QEMU 1.2 is unusable on W64 hosts.
Please apply it to the stable versions.

Thanks,

Stefan W.

 tcg/i386/tcg-target.c |2 +-
 tcg/i386/tcg-target.h |4 
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index da17bba..43b5572 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -118,7 +118,7 @@ static void patch_reloc(uint8_t *code_ptr, int type,
 static inline int tcg_target_get_call_iarg_regs_count(int flags)
 {
 if (TCG_TARGET_REG_BITS == 64) {
-return 6;
+return ARRAY_SIZE(tcg_target_call_iarg_regs);
 }
 
 return 0;
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index c3cfe05..87417d0 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -67,7 +67,11 @@ typedef enum {
 /* used for function call generation */
 #define TCG_REG_CALL_STACK TCG_REG_ESP 
 #define TCG_TARGET_STACK_ALIGN 16
+#if defined(_WIN64)
+#define TCG_TARGET_CALL_STACK_OFFSET 32
+#else
 #define TCG_TARGET_CALL_STACK_OFFSET 0
+#endif
 
 /* optional instructions */
 #define TCG_TARGET_HAS_div2_i32 1
-- 
1.7.10




Re: [Qemu-devel] [libvirt] [PATCH] snapshot: fix rollback failure in transaction mode

2012-09-12 Thread Eric Blake
On 09/12/2012 09:22 AM, Guannan Ren wrote:
> After a failure of the qemu transaction command, the snapshot file is
> still there with non-zero size. In order to unlink the file, the
> patch removes the file size check.

Can you give some exact steps to reproduce this, so that I know who is
making the file have non-zero size?  I'm worried that unlinking a
non-empty file is discarding data, so the commit message needs a lot
more details about how we are proving that the only way the file can be
non-zero size is because qemu happened to put data into a previously
empty file prior to the failed 'transaction' attempt.

That is, after re-reading context code just now, I'm fairly confident
that this code can only be reached when qemu supports the 'transaction'
monitor command, and libvirt's --reuse-ext flag was not specified, and
therefore libvirt must have just created the file.  But I want that in
the commit message rather than having to re-read the code every time I
visit this commit in future reads of the git log.  It may also be that
qemu has a bug in that the 'transaction' command is modifying files even
when it fails, so even while this works around the bug, I'm cc'ing Jeff
to see if qemu also needs a bug fix.

> ---
>  src/qemu/qemu_driver.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index 8e8e00c..1fedfb8 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -10833,7 +10833,7 @@ qemuDomainSnapshotUndoSingleDiskActive(struct 
> qemud_driver *driver,
>  if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0)
>  VIR_WARN("Unable to release lock on %s", disk->src);
>  if (need_unlink && stat(disk->src, &st) == 0 &&
> -st.st_size == 0 && S_ISREG(st.st_mode) && unlink(disk->src) < 0)
> +S_ISREG(st.st_mode) && unlink(disk->src) < 0)
>  VIR_WARN("Unable to remove just-created %s", disk->src);
>  
>  /* Update vm in place to match changes.  */
> 

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Stefan Weil

Am 12.09.2012 18:45, schrieb Gleb Natapov:
> On Wed, Sep 12, 2012 at 06:27:14PM +0200, Stefan Weil wrote:
>> Am 12.09.2012 15:54, schrieb Anthony Liguori:
>>> Hi,
>>>
>>> We've been running into a lot of problems lately with Windows guests and
>>> I think they all ultimately could be addressed by revisiting the missed
>>> tick catchup algorithms that we use.  Mike and I spent a while talking
>>> about it yesterday and I wanted to take the discussion to the list to
>>> get some additional input.
>>>
>>> Here are the problems we're seeing:
>>>
>>> 1) Rapid reinjection can lead to time moving faster for short bursts of
>>>    time.  We've seen a number of RTC watchdog BSoDs and it's possible
>>>    that at least one cause is reinjection speed.
>>>
>>> 2) When hibernating a host system, the guest is essentially paused
>>>    for a long period of time.  This results in a very large tick catchup
>>>    while also resulting in a large skew in guest time.
>>>
>>>    I've gotten reports of the tick catchup consuming a lot of CPU time
>>>    from rapid delivery of interrupts (although I haven't reproduced this
>>>    yet).
>>>
>>> 3) Windows appears to have a service that periodically syncs the guest
>>>    time with the hardware clock.  I've been told the resync period is an
>>>    hour.  For large clock skews, this can compete with reinjection
>>>    resulting in a positive skew in time (the guest can be ahead of the
>>>    host).
>>
>> Nearly every modern OS (including Windows) uses NTP
>> or some other protocol to get the time via a TCP network.
>>
> The drifts we are talking about will take ages for NTP to fix.
>
>> If a guest OS detects a small difference of time, it will usually
>> accelerate or decelerate the OS clock until the time is
>> synchronised again.
>>
>> Large jumps in network time will make the OS time jump, too.
>> With a little bad luck, QEMU's reinjection will add the
>> positive skew, no matter whether the guest is Linux or Windows.
>>
> As far as I know NTP will never make OS clock jump. The purpose of NTP
> is to fix time gradually, so apps will not notice. npdate is used to
> force clock synchronization, but it should be run manually.

s/npdate/ntpdate. Yes, some Linux distros run it at system start,
and it's also usual to call it every hour (poor man's NTP, uses
less resources).

>>> I've been thinking about an algorithm like this to address these
>>> problems:
>>>
>>> A) Limit the number of interrupts that we reinject to the equivalent of
>>>    a small period of wallclock time.  Something like 60 seconds.
>>>
>>> B) In the event of (A), trigger a notification in QEMU.  This is easy
>>>    for the RTC but harder for the in-kernel PIT.  Maybe it's a good time to
>>>    revisit usage of the in-kernel PIT?
>>>
>>> C) On accumulated tick overflow, rely on using a qemu-ga command to
>>>    force a resync of the guest's time to the hardware wallclock time.
>>>
>>> D) Whenever the guest reads the wallclock time from the RTC, reset all
>>>    accumulated ticks.
>>
>> D) makes no sense, see my comment above.
>>
>> Injection of additional timer interrupts should not be needed
>> after a hibernation. The guest must handle that situation
>> by reading either the hw clock (which must be updated
>> by QEMU when it resumes from hibernate) or by using
>> another time reference (like NTP, for example).
>>
> He is talking about host hibernation, not guest.

I also meant host hibernation.

Maybe the host should tell the guest that it is going to
hibernate (ACPI event), then the guest can use its
normal hibernate entry and recovery code, too.




Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Luiz Capitulino
On Wed, 12 Sep 2012 08:54:26 -0500
Anthony Liguori  wrote:

> 
> Hi,
> 
> We've been running into a lot of problems lately with Windows guests and
> I think they all ultimately could be addressed by revisiting the missed
> tick catchup algorithms that we use.  Mike and I spent a while talking
> about it yesterday and I wanted to take the discussion to the list to
> get some additional input.
> 
> Here are the problems we're seeing:
> 
> 1) Rapid reinjection can lead to time moving faster for short bursts of
>time.  We've seen a number of RTC watchdog BSoDs and it's possible
>that at least one cause is reinjection speed.
> 
> 2) When hibernating a host system, the guest is essentially paused
>for a long period of time.  This results in a very large tick catchup
>while also resulting in a large skew in guest time.
> 
>I've gotten reports of the tick catchup consuming a lot of CPU time
>from rapid delivery of interrupts (although I haven't reproduced this
>yet).
> 
> 3) Windows appears to have a service that periodically syncs the guest
>time with the hardware clock.  I've been told the resync period is an
>hour.  For large clock skews, this can compete with reinjection
>resulting in a positive skew in time (the guest can be ahead of the
>host).
> 
> I've been thinking about an algorithm like this to address these
> problems:
> 
> A) Limit the number of interrupts that we reinject to the equivalent of
>a small period of wallclock time.  Something like 60 seconds.
> 
> B) In the event of (A), trigger a notification in QEMU.  This is easy
>for the RTC but harder for the in-kernel PIT.  Maybe it's a good time to
>revisit usage of the in-kernel PIT?
> 
> C) On accumulated tick overflow, rely on using a qemu-ga command to
>force a resync of the guest's time to the hardware wallclock time.
> 
> D) Whenever the guest reads the wallclock time from the RTC, reset all
>accumulated ticks.
> 
> In order to do (C), we'll need to plumb qemu-ga through QMP.  Mike and I
> discussed a low-impact way of doing this (having a separate dispatch
> path for guest agent commands) and I'm confident we could do this for
> 1.3.

Fine with me, but note that we're only two or three commands away from
having the qapi conversion done. So, it's possible that we'll merge this
and re-do it a few weeks later.

> This would mean that management tools would need to consume qemu-ga
> through QMP.  Not sure if this is a problem for anyone.

Shouldn't be a problem I think.

> 
> I'm not sure whether it's worth trying to support this with the
> in-kernel PIT or not either.
> 
> Are there other issues with reinjection that people are aware of?  Does
> anything seem obviously wrong with the above?
> 
> Regards,
> 
> Anthony Liguori
> 




Re: [Qemu-devel] [PATCH] tcg: Fix MAX_OPC_PARAM_IARGS

2012-09-12 Thread Richard Henderson
On 09/12/2012 10:18 AM, Stefan Weil wrote:
> DEF_HELPER_FLAGS_5 was added some time ago without adjusting
> MAX_OPC_PARAM_IARGS.
> 
> Fixing the definition becomes more important as QEMU is using
> an increasing number of helper functions called with 5 arguments.
> 
> Add also a comment to avoid future problems when DEF_HELPER_FLAGS_6
> will be added.
> 
> Signed-off-by: Stefan Weil 

Reviewed-by: Richard Henderson 


r~



[Qemu-devel] [PATCH] tcg: Fix MAX_OPC_PARAM_IARGS

2012-09-12 Thread Stefan Weil
DEF_HELPER_FLAGS_5 was added some time ago without adjusting
MAX_OPC_PARAM_IARGS.

Fixing the definition becomes more important as QEMU is using
an increasing number of helper functions called with 5 arguments.

Add also a comment to avoid future problems when DEF_HELPER_FLAGS_6
will be added.

Signed-off-by: Stefan Weil 
---

Hi,

I think this patch should be added to the latest stable versions, too.

Please note that this patch breaks compilation with --enable-tcg-interpreter.

TCI code is designed for up to 4 arguments and needs modifications.
The current TCI binaries crash at runtime, so the patch just makes it
obvious that TCI needs to be fixed.

Regards,
Stefan Weil

 def-helper.h |2 ++
 exec-all.h   |2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/def-helper.h b/def-helper.h
index b98ff69..022a9ce 100644
--- a/def-helper.h
+++ b/def-helper.h
@@ -128,6 +128,8 @@
 #define DEF_HELPER_5(name, ret, t1, t2, t3, t4, t5) \
 DEF_HELPER_FLAGS_5(name, 0, ret, t1, t2, t3, t4, t5)
 
+/* MAX_OPC_PARAM_IARGS must be set to n if last entry is DEF_HELPER_FLAGS_n. */
+
 #endif /* DEF_HELPER_H */
 
 #ifndef GEN_HELPER
diff --git a/exec-all.h b/exec-all.h
index ac19c02..8977729 100644
--- a/exec-all.h
+++ b/exec-all.h
@@ -51,7 +51,7 @@ typedef struct TranslationBlock TranslationBlock;
 #else
 #define MAX_OPC_PARAM_PER_ARG 1
 #endif
-#define MAX_OPC_PARAM_IARGS 4
+#define MAX_OPC_PARAM_IARGS 5
 #define MAX_OPC_PARAM_OARGS 1
 #define MAX_OPC_PARAM_ARGS (MAX_OPC_PARAM_IARGS + MAX_OPC_PARAM_OARGS)
 
-- 
1.7.10




Re: [Qemu-devel] TCG questions

2012-09-12 Thread Xin Tong
I have the code on http://code.google.com/p/qemu-trace/. I currently
have memory traces, branch traces and some special instruction traces
ready (unverified, though). We should discuss what the best way to do
this is, btw.

Xin


On Wed, Sep 12, 2012 at 10:09 AM, Xin Tong  wrote:
> On Wed, Sep 12, 2012 at 6:14 AM, Lluís Vilanova  wrote:
>> Xin Tong writes:
>>
>>> i do not know. could be similar. I am doing architecture research. i
>>> need traces of memory access for programs running under a full
>>> system environment, so i wrote this.
>>
>>> i do not seem to be able to access the link provided from the page
>>> you gave me, though.
>>
>>> https://projects.gso.ac.upc.edu/projects/qemu-dbi/wiki
>>
>> Well, if you tried to access it during the last few days, we've been having 
>> some
>> issues with the server.
>>
>> It should all work now.
>>
>> The main idea is to have an API similar in spirit to that of PIN [1]. You can
>> have a look at the instrumentation docs [2] for some simple examples.
>>
>> I had some plans to modify QEMU's address translation mechanism to provide a
>> performant mechanism to retrieve physical addresses (traces of virtual 
>> addresses
>> are already supported), but that will have to wait until I finish some other
>> unrelated tasks.
>>
>>
>> [1] http://pintool.org
>> [2] 
>> https://projects.gso.ac.upc.edu/projects/qemu-dbi/repository/entry/docs/instrumentation.txt#L202
>
> By the way, Lluís, this is exactly what I am doing. I am writing up an
> ISPASS paper for this as well. I am also using PIN to verify the
> instrumentation interface.
>
> Xin
>
>>
>> Lluis
>>
>>
>>
>>> On Tue, Sep 11, 2012 at 6:52 PM, 陳韋任 (Wei-Ren Chen)
>>>  wrote:
> I have created a set of instrumentation APIs on QEMU. One can write client
> programs that compile into a shared library. The shared library is then
> loaded into QEMU and extracts statistics out of it.

 Instrument API? Same as what Liuis did?

 Regards,
 chenwj

 [1] http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg00379.html

 --
 Wei-Ren Chen (陳韋任)
 Computer Systems Lab, Institute of Information Science,
 Academia Sinica, Taiwan (R.O.C.)
 Tel:886-2-2788-3799 #1667
 Homepage: http://people.cs.nctu.edu.tw/~chenwj
>>
>>
>> --
>>  "And it's much the same thing with knowledge, for whenever you learn
>>  something new, the whole world becomes that much richer."
>>  -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
>>  Tollbooth



Re: [Qemu-devel] TCG questions

2012-09-12 Thread Xin Tong
On Wed, Sep 12, 2012 at 6:14 AM, Lluís Vilanova  wrote:
> Xin Tong writes:
>
>> i do not know. could be similar. I am doing architecture research. i
>> need traces of memory access for programs running under a full
>> system environment, so i wrote this.
>
>> i do not seem to be able to access the link provided from the page
>> you gave me, though.
>
>> https://projects.gso.ac.upc.edu/projects/qemu-dbi/wiki
>
> Well, if you tried to access it during the last few days, we've been having 
> some
> issues with the server.
>
> It should all work now.
>
> The main idea is to have an API similar in spirit to that of PIN [1]. You can
> have a look at the instrumentation docs [2] for some simple examples.
>
> I had some plans to modify QEMU's address translation mechanism to provide a
> performant mechanism to retrieve physical addresses (traces of virtual 
> addresses
> are already supported), but that will have to wait until I finish some other
> unrelated tasks.
>
>
> [1] http://pintool.org
> [2] 
> https://projects.gso.ac.upc.edu/projects/qemu-dbi/repository/entry/docs/instrumentation.txt#L202

By the way, Lluís, this is exactly what I am doing. I am writing up an
ISPASS paper for this as well. I am also using PIN to verify the
instrumentation interface.

Xin

>
> Lluis
>
>
>
>> On Tue, Sep 11, 2012 at 6:52 PM, 陳韋任 (Wei-Ren Chen)
>>  wrote:
 I have created a set of instrumentation APIs on QEMU. One can write client
 programs that compile into a shared library. The shared library is then
 loaded into QEMU and extracts statistics out of it.
>>>
>>> Instrument API? Same as what Liuis did?
>>>
>>> Regards,
>>> chenwj
>>>
>>> [1] http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg00379.html
>>>
>>> --
>>> Wei-Ren Chen (陳韋任)
>>> Computer Systems Lab, Institute of Information Science,
>>> Academia Sinica, Taiwan (R.O.C.)
>>> Tel:886-2-2788-3799 #1667
>>> Homepage: http://people.cs.nctu.edu.tw/~chenwj
>
>
> --
>  "And it's much the same thing with knowledge, for whenever you learn
>  something new, the whole world becomes that much richer."
>  -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
>  Tollbooth



Re: [Qemu-devel] Does TCG IR use static single assignment (SSA) form?

2012-09-12 Thread Richard Henderson
On 09/12/2012 02:45 AM, (Yichen Yang)楊逸臣 wrote:
> Excuse me for asking, does TCG-IR  use static single assignment (SSA) form?

No.


r~



Re: [Qemu-devel] Does TCG IR use static single assignment (SSA) form?

2012-09-12 Thread Peter Maydell
On 12 September 2012 10:45, (Yichen Yang)楊逸臣
 wrote:
> Excuse me for asking, does TCG-IR  use static single assignment (SSA) form?

No. The TCG IR is documented in tcg/README -- this should be enough to
be able to translate something else into it.

-- PMM



Re: [Qemu-devel] [PATCH v3 01/17] vga: rename pci_vga_init() into pci_std_vga_init()

2012-09-12 Thread Richard Henderson
On 09/11/2012 12:10 PM, Aurelien Jarno wrote:
> This better explains what is this function about. Adjust all callers.
> 
> Cc: Richard Henderson 
> Cc: Alexander Graf 
> Cc: Andreas Färber 
> Cc: David Gibson 
> Cc: Anthony Liguori 
> Acked-by: Blue Swirl 
> Acked-by: Andreas Färber 
> Signed-off-by: Aurelien Jarno 

Acked-by: Richard Henderson 


r~



Re: [Qemu-devel] Windows VM slow boot

2012-09-12 Thread Richard Davies
Hi Mel - thanks for replying to my underhand bcc!

Mel Gorman wrote:
> I see that this is an old-ish bug but I did not read the full history.
> Is it now booting faster than 3.5.0 was? I'm asking because I'm
> interested to see if commit c67fe375 helped your particular case.

Yes, I think 3.6.0-rc5 is already better than 3.5.x but can still be
improved, as discussed.

> A follow-on from commit c67fe375 was the following patch (author cc'd)
> which addresses lock contention in isolate_migratepages_range where your
> perf report indicates that we're spending 95% of the time. Would you be
> willing to test it please?
>
> ---8<---
> From: Shaohua Li 
> Subject: mm: compaction: check lock contention first before taking lock
>
> isolate_migratepages_range will take zone->lru_lock first and check if the
> lock is contented, if yes, it will release the lock.  This isn't
> efficient.  If the lock is truly contented, a lock/unlock pair will
> increase the lock contention.  We'd better check if the lock is contended
> first.  compact_trylock_irqsave perfectly meets the requirement.
>
> Signed-off-by: Shaohua Li 
> Acked-by: Mel Gorman 
> Acked-by: Minchan Kim 
> Signed-off-by: Andrew Morton 
> ---
>
>  mm/compaction.c |5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff -puN 
> mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock 
> mm/compaction.c
> --- 
> a/mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock
> +++ a/mm/compaction.c
> @@ -349,8 +349,9 @@ isolate_migratepages_range(struct zone *
> 
>   /* Time to isolate some pages for migration */
>   cond_resched();
> - spin_lock_irqsave(&zone->lru_lock, flags);
> - locked = true;
> + locked = compact_trylock_irqsave(&zone->lru_lock, &flags, cc);
> + if (!locked)
> + return 0;
>   for (; low_pfn < end_pfn; low_pfn++) {
>   struct page *page;

I have applied and tested again - perf results below.

isolate_migratepages_range is indeed much reduced.

There is now a lot of time in isolate_freepages_block and still quite a lot
of lock contention, although in a different place.


# 
# captured on: Wed Sep 12 16:00:52 2012
# os release : 3.6.0-rc5-elastic+
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131973280 kB
# cmdline : /home/root/bin/perf record -g -a 
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, 
excl_usr = 0, excl_kern = 0, id = { 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 
76, 77, 78, 79, 80 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# 
#
# Samples: 1M of event 'cycles'
# Event count (approx.): 560365005583
#
# Overhead  Command Shared Object   
   Symbol
#   ...    
..
#
43.95% qemu-kvm  [kernel.kallsyms] [k] isolate_freepages_block  
 
   |
   --- isolate_freepages_block
  |  
  |--99.99%-- compaction_alloc
  |  migrate_pages
  |  compact_zone
  |  compact_zone_order
  |  try_to_compact_pages
  |  __alloc_pages_direct_compact
  |  __alloc_pages_nodemask
  |  alloc_pages_vma
  |  do_huge_pmd_anonymous_page
  |  handle_mm_fault
  |  __get_user_pages
  |  get_user_page_nowait
  |  hva_to_pfn.isra.17
  |  __gfn_to_pfn
  |  gfn_to_pfn_async
  |  try_async_pf
  |  tdp_page_fault
  |  kvm_mmu_page_fault
  |  pf_interception
  |  handle_exit
  |  kvm_arch_vcpu_ioctl_run
  |  kvm_vcpu_ioctl
  |  do_vfs_ioctl
  |  sys_ioctl
  |  system_call_fastpath
  |  ioctl
  |  |  
  |  |--95.17%-- 0x1010006
  |  |  
  |   --4.83%-- 0x1010002
   --0.01%-- [...]
15.98% qemu-kvm  [kernel.kallsyms] [k] _raw_spin_lock_irqsave   
 
   |
   --- _raw_spin_lock_irqsave
   

Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Gleb Natapov
On Wed, Sep 12, 2012 at 06:27:14PM +0200, Stefan Weil wrote:
> Am 12.09.2012 15:54, schrieb Anthony Liguori:
> >
> >Hi,
> >
> >We've been running into a lot of problems lately with Windows guests and
> >I think they all ultimately could be addressed by revisiting the missed
> >tick catchup algorithms that we use.  Mike and I spent a while talking
> >about it yesterday and I wanted to take the discussion to the list to
> >get some additional input.
> >
> >Here are the problems we're seeing:
> >
> >1) Rapid reinjection can lead to time moving faster for short bursts of
> >time.  We've seen a number of RTC watchdog BSoDs and it's possible
> >that at least one cause is reinjection speed.
> >
> >2) When hibernating a host system, the guest is essentially paused
> >for a long period of time.  This results in a very large tick catchup
> >while also resulting in a large skew in guest time.
> >
> >I've gotten reports of the tick catchup consuming a lot of CPU time
> >from rapid delivery of interrupts (although I haven't reproduced this
> >yet).
> >
> >3) Windows appears to have a service that periodically syncs the guest
> >time with the hardware clock.  I've been told the resync period is an
> >hour.  For large clock skews, this can compete with reinjection
> >resulting in a positive skew in time (the guest can be ahead of the
> >host).
> 
> Nearly every modern OS (including Windows) uses NTP
> or some other protocol to get the time over the network.
> 
The drifts we are talking about will take ages for NTP to fix.

> If a guest OS detects a small difference of time, it will usually
> accelerate or decelerate the OS clock until the time is
> synchronised again.
> 
> Large jumps in network time will make the OS time jump, too.
> With a little bad luck, QEMU's reinjection will add the
> positive skew, no matter whether the guest is Linux or Windows.
> 
As far as I know NTP will never make the OS clock jump. The purpose of NTP
is to fix time gradually, so apps will not notice. ntpdate is used to
force clock synchronization, but it should be run manually.

> >
> >I've been thinking about an algorithm like this to address these
> >problems:
> >
> >A) Limit the number of interrupts that we reinject to the equivalent of
> >a small period of wallclock time.  Something like 60 seconds.
> >
> >B) In the event of (A), trigger a notification in QEMU.  This is easy
> >for the RTC but harder for the in-kernel PIT.  Maybe it's a good time to
> >revisit usage of the in-kernel PIT?
> >
> >C) On accumulated tick overflow, rely on using a qemu-ga command to
> >force a resync of the guest's time to the hardware wallclock time.
> >
> >D) Whenever the guest reads the wallclock time from the RTC, reset all
> >accumulated ticks.
> 
> D) makes no sense, see my comment above.
> 
> Injection of additional timer interrupts should not be needed
> after a hibernation. The guest must handle that situation
> by reading either the hw clock (which must be updated
> by QEMU when it resumes from hibernate) or by using
> another time reference (like NTP, for example).
> 
He is talking about host hibernation, not guest.

> >
> >In order to do (C), we'll need to plumb qemu-ga through QMP.  Mike and I
> >discussed a low-impact way of doing this (having a separate dispatch
> >path for guest agent commands) and I'm confident we could do this for
> >1.3.
> >
> >This would mean that management tools would need to consume qemu-ga
> >through QMP.  Not sure if this is a problem for anyone.
> >
> >I'm not sure whether it's worth trying to support this with the
> >in-kernel PIT or not either.
> >
> >Are there other issues with reinjection that people are aware of?  Does
> >anything seem obviously wrong with the above?
> >
> >Regards,
> >
> >Anthony Liguori

--
Gleb.



Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Stefan Weil

Am 12.09.2012 15:54, schrieb Anthony Liguori:


Hi,

We've been running into a lot of problems lately with Windows guests and
I think they all ultimately could be addressed by revisiting the missed
tick catchup algorithms that we use.  Mike and I spent a while talking
about it yesterday and I wanted to take the discussion to the list to
get some additional input.

Here are the problems we're seeing:

1) Rapid reinjection can lead to time moving faster for short bursts of
time.  We've seen a number of RTC watchdog BSoDs and it's possible
that at least one cause is reinjection speed.

2) When hibernating a host system, the guest is essentially paused
for a long period of time.  This results in a very large tick catchup
while also resulting in a large skew in guest time.

I've gotten reports of the tick catchup consuming a lot of CPU time
from rapid delivery of interrupts (although I haven't reproduced this
yet).

3) Windows appears to have a service that periodically syncs the guest
time with the hardware clock.  I've been told the resync period is an
hour.  For large clock skews, this can compete with reinjection
resulting in a positive skew in time (the guest can be ahead of the
host).


Nearly every modern OS (including Windows) uses NTP
or some other protocol to get the time over the network.

If a guest OS detects a small difference of time, it will usually
accelerate or decelerate the OS clock until the time is
synchronised again.

Large jumps in network time will make the OS time jump, too.
With a little bad luck, QEMU's reinjection will add the
positive skew, no matter whether the guest is Linux or Windows.



I've been thinking about an algorithm like this to address these
problems:

A) Limit the number of interrupts that we reinject to the equivalent of
a small period of wallclock time.  Something like 60 seconds.

B) In the event of (A), trigger a notification in QEMU.  This is easy
for the RTC but harder for the in-kernel PIT.  Maybe it's a good time to
revisit usage of the in-kernel PIT?

C) On accumulated tick overflow, rely on using a qemu-ga command to
force a resync of the guest's time to the hardware wallclock time.

D) Whenever the guest reads the wallclock time from the RTC, reset all
accumulated ticks.


D) makes no sense, see my comment above.

Injection of additional timer interrupts should not be needed
after a hibernation. The guest must handle that situation
by reading either the hw clock (which must be updated
by QEMU when it resumes from hibernate) or by using
another time reference (like NTP, for example).



In order to do (C), we'll need to plumb qemu-ga through QMP.  Mike and I
discussed a low-impact way of doing this (having a separate dispatch
path for guest agent commands) and I'm confident we could do this for
1.3.

This would mean that management tools would need to consume qemu-ga
through QMP.  Not sure if this is a problem for anyone.

I'm not sure whether it's worth trying to support this with the
in-kernel PIT or not either.

Are there other issues with reinjection that people are aware of?  Does
anything seem obviously wrong with the above?

Regards,

Anthony Liguori




Re: [Qemu-devel] [PATCH 005/126] target-s390: Fix gdbstub

2012-09-12 Thread Richard Henderson
On 09/12/2012 08:54 AM, Alexander Graf wrote:
> So the CC pseudo-register is never written to?

They do handle a write to cc_regnum in s390_pseudo_register_write.
They modify psw.mask as one would expect.


r~



Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Gleb Natapov
On Wed, Sep 12, 2012 at 06:06:47PM +0300, Gleb Natapov wrote:
> On Wed, Sep 12, 2012 at 09:44:10AM -0500, Anthony Liguori wrote:
> > Jan Kiszka  writes:
> > 
> > > On 2012-09-12 15:54, Anthony Liguori wrote:
> > >> 
> > >> Hi,
> > >> 
> > >> We've been running into a lot of problems lately with Windows guests and
> > >> I think they all ultimately could be addressed by revisiting the missed
> > >> tick catchup algorithms that we use.  Mike and I spent a while talking
> > >> about it yesterday and I wanted to take the discussion to the list to
> > >> get some additional input.
> > >> 
> > >> Here are the problems we're seeing:
> > >> 
> > >> 1) Rapid reinjection can lead to time moving faster for short bursts of
> > >>time.  We've seen a number of RTC watchdog BSoDs and it's possible
> > >>that at least one cause is reinjection speed.
> > >> 
> > >> 2) When hibernating a host system, the guest is essentially paused
> > >>for a long period of time.  This results in a very large tick catchup
> > >>while also resulting in a large skew in guest time.
> > >> 
> > >>I've gotten reports of the tick catchup consuming a lot of CPU time
> > >>from rapid delivery of interrupts (although I haven't reproduced this
> > >>yet).
> > >> 
> > >> 3) Windows appears to have a service that periodically syncs the guest
> > >>time with the hardware clock.  I've been told the resync period is an
> > >>hour.  For large clock skews, this can compete with reinjection
> > >>resulting in a positive skew in time (the guest can be ahead of the
> > >>host).
> > >> 
> > >> I've been thinking about an algorithm like this to address these
> > >> problems:
> > >> 
> > >> A) Limit the number of interrupts that we reinject to the equivalent of
> > >>a small period of wallclock time.  Something like 60 seconds.
> > >> 
> > >> B) In the event of (A), trigger a notification in QEMU.  This is easy
> > >>for the RTC but harder for the in-kernel PIT.  Maybe it's a good time 
> > >> to
> > >>revisit usage of the in-kernel PIT?
> > >> 
> > >> C) On accumulated tick overflow, rely on using a qemu-ga command to
> > >>force a resync of the guest's time to the hardware wallclock time.
> > >> 
> > >> D) Whenever the guest reads the wallclock time from the RTC, reset all
> > >>accumulated ticks.
> > >> 
> > >> In order to do (C), we'll need to plumb qemu-ga through QMP.  Mike and I
> > >> discussed a low-impact way of doing this (having a separate dispatch
> > >> path for guest agent commands) and I'm confident we could do this for
> > >> 1.3.
> > >> 
> > >> This would mean that management tools would need to consume qemu-ga
> > >> through QMP.  Not sure if this is a problem for anyone.
> > >> 
> > >> I'm not sure whether it's worth trying to support this with the
> > >> in-kernel PIT or not either.
> > >
> > > As with our current discussion around fixing the PIC and its impact on
> > > the PIT, we should try on the userspace model first and then check if
> > > the design can be adapted to support in-kernel as well.
> > >
> > > For which guests is the PIT important again? Old Linux kernels? Windows
> > > should be mostly happy with the RTC - or the HPET.
> > 
> > I thought that only 64-bit Win2k8+ used the RTC.
> > 
> > I thought win2k3 and even 32-bit win2k8 still used the PIT.
> > 
> Only the Windows XP non-ACPI HAL uses the PIT. Any other Windows uses the
> RTC. In other words, we do not care about the PIT.
> 
Small clarification. They use RTC if HPET is not present. I don't know
at what version Windows started to prefer HPET over RTC.

--
Gleb.



Re: [Qemu-devel] [RFC v3 ATCH 0/5] char: expose CirMemCharDriver and provide QMP interface

2012-09-12 Thread Daniel P. Berrange
On Wed, Sep 12, 2012 at 07:57:21PM +0800, Lei Li wrote:
> This RFC series attempts to convert the MemCharDriver to use a circular
> buffer for input and output, expose it to users by introducing QMP commands
> memchar_write and memchar_read and via the command line like the other
> CharDriverStates.
> 
> Serial ports in qemu always use CharDriverStates as there backends,
> Right now, all of our backends always try to write the data from the
> guest to a socket or file. The concern from OpenStack is that this could
> lead to unbounded disk space usage since they log the serial output.
> For more detail of the background info:
> https://bugs.launchpad.net/nova/+bug/832507

Unbounded disk usage is only relevant if telling QEMU to write directly
to its file backend. If you use a socket backend the mgmt app can provide
whatever policy it desires.

> So we want to use a circular buffer in QEMU instead, and then OpenStack
> can periodically read the buffer in QEMU and log it.

With any circular buffer you obviously have a race condition where
the buffer may overflow before the application can consume the data.
By implementing the circular buffer in QEMU you are imposing a
specific policy for overflow on the applications using QEMU, namely
that data gets overwritten/lost.

If the circular buffering is done outside QEMU, then the application
can implement a variety of policies on overflow. At the very least
they can detect when overflow would occur, and insert a marker to
the effect that there is a log discontinuity. Or they can pause the
VM for a period of time, or reduce its scheduling priority, or any
number of different things.

The further advantage of doing it outside QEMU, is that OpenStack will
work with all existing QEMU/KVM/libvirt versions.

I think most of the discussion in the quoted OpenStack bug is rather
short-sighted, only focusing on the immediate needs of their current
'get_log_output' API. I don't think this will be satisfactory for
OpenStack in the medium-long term. IMHO OpenStack needs to provide
a more comprehensive logging capability than what it does today, and
thus this proposed QEMU feature would have a short lifetime of usefulness
to OpenStack. I've already recommended that OpenStack take advantage
of conserver to set up logging of all VMs, though there were some
issues around that. It is entirely possible for OpenStack to provide
its own logging system to process VM logs into fixed-size files, as
well as satisfying the needs of its get_log_output API for circular
buffers.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] Windows VM slow boot

2012-09-12 Thread Mel Gorman
On Wed, Sep 12, 2012 at 11:56:59AM +0100, Richard Davies wrote:
> [ adding linux-mm - previously at http://marc.info/?t=13451150943 ]
> 
> Hi Rik,
> 

I'm not Rik but hi anyway.

> Since qemu-kvm 1.2.0 and Linux 3.6.0-rc5 came out, I thought that I would
> retest with these.
> 

Ok. 3.6.0-rc5 contains [c67fe375: mm: compaction: Abort async compaction
if locks are contended or taking too long] that should have helped mitigate
some of the lock contention problem but not all of it as we'll see later.

> The typical symptom now appears to be that the Windows VMs boot reasonably
> fast,

I see that this is an old-ish bug but I did not read the full history.
Is it now booting faster than 3.5.0 was? I'm asking because I'm
interested to see if commit c67fe375 helped your particular case.

> but then there is high CPU use and load for many minutes afterwards -
> the high CPU use is both for the qemu-kvm processes themselves and also for
> % sys.
> 

Ok, I cannot comment on the userspace portion of things but the kernel
portion still indicates that there is a high percentage of time on what
appears to be lock contention.

> I attach a perf report which seems to show that the high CPU use is in the
> memory manager.
> 

A follow-on from commit c67fe375 was the following patch (author cc'd)
which addresses lock contention in isolate_migratepages_range where your
perf report indicates that we're spending 95% of the time. Would you be
willing to test it please?

---8<---
From: Shaohua Li 
Subject: mm: compaction: check lock contention first before taking lock

isolate_migratepages_range will take zone->lru_lock first and check if the
lock is contented, if yes, it will release the lock.  This isn't
efficient.  If the lock is truly contented, a lock/unlock pair will
increase the lock contention.  We'd better check if the lock is contended
first.  compact_trylock_irqsave perfectly meets the requirement.

Signed-off-by: Shaohua Li 
Acked-by: Mel Gorman 
Acked-by: Minchan Kim 
Signed-off-by: Andrew Morton 
---

 mm/compaction.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff -puN 
mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock 
mm/compaction.c
--- 
a/mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock
+++ a/mm/compaction.c
@@ -349,8 +349,9 @@ isolate_migratepages_range(struct zone *
 
/* Time to isolate some pages for migration */
cond_resched();
-   spin_lock_irqsave(&zone->lru_lock, flags);
-   locked = true;
+   locked = compact_trylock_irqsave(&zone->lru_lock, &flags, cc);
+   if (!locked)
+   return 0;
for (; low_pfn < end_pfn; low_pfn++) {
struct page *page;
 



[Qemu-devel] Does TCG IR use static single assignment (SSA) form?

2012-09-12 Thread Yichen Yang
Dear all,

Excuse me for asking, does TCG-IR  use static single assignment (SSA) form?

I just wanna know how to translate a register-based bytecode to TCG-IR.

thanks :)

Yichen


Re: [Qemu-devel] [PATCH 005/126] target-s390: Fix gdbstub

2012-09-12 Thread Alexander Graf

On 09/12/2012 05:11 PM, Richard Henderson wrote:

On 09/12/2012 06:25 AM, Alexander Graf wrote:

+case S390_PSWM_REGNUM:
+env->psw.mask = tmpl;
+env->cc_op = (tmpl >> 13) & 3;

Are you sure this is correct? I thought gdbstub would just ignore the cc bits.

Well... no it won't ignore the cc bits.


So the CC pseudo-register is never written to?


  But it would appear that I've got
them at the wrong location.  From gdb/s390-tdep.c:

   if (regnum == tdep->cc_regnum)
 {
   enum register_status status;

   status = regcache_raw_read_unsigned (regcache, S390_PSWM_REGNUM, &val);
   if (status == REG_VALID)
 {
   if (register_size (gdbarch, S390_PSWA_REGNUM) == 4)
 val = (val >> 12) & 3;


Oops :)


Alex




Re: [Qemu-devel] [RFC v3 ATCH 0/5] char: expose CirMemCharDriver and provide QMP interface

2012-09-12 Thread Avi Kivity
On 09/12/2012 02:57 PM, Lei Li wrote:
> This RFC series attempts to convert the MemCharDriver to use a circular
> buffer for input and output, expose it to users by introducing QMP commands
> memchar_write and memchar_read and via the command line like the other
> CharDriverStates.
> 
> Serial ports in qemu always use CharDriverStates as their backends.
> Right now, all of our backends always try to write the data from the
> guest to a socket or file. The concern from OpenStack is that this could
> lead to unbounded disk space usage since they log the serial output.
> For more detail of the background info:
> https://bugs.launchpad.net/nova/+bug/832507
> 
> So we want to use a circular buffer in QEMU instead, and then OpenStack
> can periodically read the buffer in QEMU and log it.

Can't they do it themselves?  Have qemu write to a pipe, and on the
other side, do whatever rate limiting is needed.

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] Enabling DLPAR capacity on QEMU pSeries

2012-09-12 Thread Alexander Graf

On 09/12/2012 04:54 PM, Erlon Cruz wrote:

Hi all,

We are planning to implement DLPAR capacity on QEMU pSeries. As we


What is DLPAR? Hotplug support?


lack experience in the internals of the arch, we would like you guys
to give us some design directions
and confirm if we are going in the right direction. Our first idea is:

 1 - to patch 'spapr.c' so it can dynamically insert/remove basic
items into the device tree.


What exactly would you like to patch into it? We already do have support 
for dynamic dt creation with the spapr target.



 2 - create a host side device that will be used with a guest side
driver to perform guest side operations and communicate changes from
host to the guest (like DynamicRM does in PowerVM LPARs). We are not


Why not just use hypercalls?


planning to use powerpc-tools and want to make resource management
transparent (i.e. no need to run daemons or userspace programs in the
guest, only this kernel driver).
 3 - create bindings to support adding/removal  ibmvscsi devices
 4 - create bindings to support adding/removal  ibmveth devices
 5 - create bindings to support adding/removal PCI devices
 6 - create bindings to support adding/removal of memory


This is going to be the hardest part. I don't think QEMU supports memory 
hotplug yet.



     - Do we need to do this the way PowerVM does? We have tested
virtio ballooning and it can work with a few endianness corrections.


I don't know how PowerVM works. But if normal ballooning is all you 
need, you should certainly just enable virtio-balloon.



 7 - create bindings to support adding/removal of CPUs
     - is SMP supported already? I tried to run SMP on an x86 host
and the guest got stuck when SMP is enabled


SMP should work just fine, yes. Where exactly does it get stuck?


     - would it be possible to work on this without a P7 bare-metal
machine?


At least for device hotplug, it should be perfectly possible to use an 
old G5 with PR KVM. I haven't gotten around to patch all the pieces of 
the puzzle to make -M pseries work with PR KVM when it's running on top 
of pHyp yet, so that won't work.



We have a P7 8205-E6B, is that possible to kick PHYP out?


Ben?


Any idea on how much effort (time/people) the whole thing would take?
Any consideration about this is much appreciated :)


Phew. It's hard to tell. Depends heavily on how good your people are :).


Alex




Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Gleb Natapov
On Wed, Sep 12, 2012 at 05:42:58PM +0200, Jan Kiszka wrote:
> On 2012-09-12 17:06, Gleb Natapov wrote:
>  Are there other issues with reinjection that people are aware of?  Does
>  anything seem obviously wrong with the above?
> >>>
> >>> We should take the chance and design everything in a way that the HPET
> >>> can finally be (left) enabled.
> >>
> >> I thought the issue with the HPET was access frequency and the cost of
> >> heavy weight exits.
> >>
> >> I don't have concrete data here.  I've only heard it second hand.  Can
> >> anyone comment more?
> >>
> > There is no reason whatsoever to emulate HPET for Windows. It will
> > make it slower. Hyper-V does not emulate it. For proper time support in
> > Windows we need to implement the relevant part of the Hyper-V spec.
> 
> There are two reasons to do it nevertheless:
> 
>  - QEMU is not Hyper-V. We are emulating the HPET already, and we
>expose it by default. So we should do it properly.
> 
>  - The time drift fix for the RTC is still a hack. Adding a second user
>would force us to finally clean it up.
> 
I am not saying we should not emulate HPET in QEMU, I am saying there
is no reason to emulate it for Windows :)

--
Gleb.



Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Jan Kiszka
On 2012-09-12 17:06, Gleb Natapov wrote:
 Are there other issues with reinjection that people are aware of?  Does
 anything seem obviously wrong with the above?
>>>
>>> We should take the chance and design everything in a way that the HPET
>>> can finally be (left) enabled.
>>
>> I thought the issue with the HPET was access frequency and the cost of
>> heavy weight exits.
>>
>> I don't have concrete data here.  I've only heard it second hand.  Can
>> anyone comment more?
>>
> There is no reason whatsoever to emulate HPET for Windows. It will
> make it slower. Hyper-V does not emulate it. For proper time support in
> Windows we need to implement the relevant part of the Hyper-V spec.

There are two reasons to do it nevertheless:

 - QEMU is not Hyper-V. We are emulating the HPET already, and we
   expose it by default. So we should do it properly.

 - The time drift fix for the RTC is still a hack. Adding a second user
   would force us to finally clean it up.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH] hw: Add support for new LSI Logic devices.

2012-09-12 Thread Avi Kivity
On 09/11/2012 08:00 PM, Don Slutz wrote:
> Add LSI53C1030, SAS1068, SAS1068e.  Based on code from "VirtualBox Open 
> Source Edition".
> Based on QEMU MegaRAID SAS 8708EM2.
> 
> This is a common VMware disk controller.
> 
> SEABIOS change for booting is in the works.
> 
> Tested with Fedora 16, 17.  CentOS 6. Windows 2003R2 and 2008R2.
> 

Is the spec for these devices freely available?  Please provide a link
in the source.


-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Gleb Natapov
On Wed, Sep 12, 2012 at 08:54:26AM -0500, Anthony Liguori wrote:
> 
> Hi,
> 
> We've been running into a lot of problems lately with Windows guests and
> I think they all ultimately could be addressed by revisiting the missed
> tick catchup algorithms that we use.  Mike and I spent a while talking
> about it yesterday and I wanted to take the discussion to the list to
> get some additional input.
> 
> Here are the problems we're seeing:
> 
> 1) Rapid reinjection can lead to time moving faster for short bursts of
>time.  We've seen a number of RTC watchdog BSoDs and it's possible
>that at least one cause is reinjection speed.
> 
> 2) When hibernating a host system, the guest is essentially paused
>for a long period of time.  This results in a very large tick catchup
>while also resulting in a large skew in guest time.
> 
>I've gotten reports of the tick catchup consuming a lot of CPU time
>from rapid delivery of interrupts (although I haven't reproduced this
>yet).
> 
> 3) Windows appears to have a service that periodically syncs the guest
>time with the hardware clock.  I've been told the resync period is an
>hour.  For large clock skews, this can compete with reinjection
>resulting in a positive skew in time (the guest can be ahead of the
>host).
> 
> I've been thinking about an algorithm like this to address these
> problems:
> 
> A) Limit the number of interrupts that we reinject to the equivalent of
>a small period of wallclock time.  Something like 60 seconds.
> 
How will this fix the BSOD problem, for instance? 60 seconds is long enough
to cause all the problems you are talking about above. We could make the
amount of accumulated ticks easily configurable, though, to play with and
see.

> B) In the event of (A), trigger a notification in QEMU.  This is easy
>for the RTC but harder for the in-kernel PIT.  Maybe it's a good time to
>revisit usage of the in-kernel PIT?
> 
PIT does not matter for Windows guests.

> C) On accumulated tick overflow, rely on using a qemu-ga command to
>force a resync of the guest's time to the hardware wallclock time.
> 
Needs guest cooperation.

> D) Whenever the guest reads the wallclock time from the RTC, reset all
>accumulated ticks.
>
> In order to do (C), we'll need to plumb qemu-ga through QMP.  Mike and I
> discussed a low-impact way of doing this (having a separate dispatch
> path for guest agent commands) and I'm confident we could do this for
> 1.3.
> 
> This would mean that management tools would need to consume qemu-ga
> through QMP.  Not sure if this is a problem for anyone.
> 
> I'm not sure whether it's worth trying to support this with the
> in-kernel PIT or not either.
> 
> Are there other issues with reinjection that people are aware of?  Does
> anything seem obviously wrong with the above?
> 
It looks like you are trying to solve only pathologically big time drift
problems. Those do not happen normally.

--
Gleb.



Re: [Qemu-devel] [PATCH 005/126] target-s390: Fix gdbstub

2012-09-12 Thread Richard Henderson
On 09/12/2012 06:25 AM, Alexander Graf wrote:
>> +case S390_PSWM_REGNUM:
>> +env->psw.mask = tmpl;
>> +env->cc_op = (tmpl >> 13) & 3;
> 
> Are you sure this is correct? I thought gdbstub would just ignore the cc bits.

Well... no it won't ignore the cc bits.  But it would appear that I've got
them at the wrong location.  From gdb/s390-tdep.c:

  if (regnum == tdep->cc_regnum)
{
  enum register_status status;

  status = regcache_raw_read_unsigned (regcache, S390_PSWM_REGNUM, &val);
  if (status == REG_VALID)
{
  if (register_size (gdbarch, S390_PSWA_REGNUM) == 4)
val = (val >> 12) & 3;
  else
val = (val >> 44) & 3;
  store_unsigned_integer (buf, regsize, byte_order, val);
}
  return status;
}


r~



Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Gleb Natapov
On Wed, Sep 12, 2012 at 09:44:10AM -0500, Anthony Liguori wrote:
> Jan Kiszka  writes:
> 
> > On 2012-09-12 15:54, Anthony Liguori wrote:
> >> 
> >> Hi,
> >> 
> >> We've been running into a lot of problems lately with Windows guests and
> >> I think they all ultimately could be addressed by revisiting the missed
> >> tick catchup algorithms that we use.  Mike and I spent a while talking
> >> about it yesterday and I wanted to take the discussion to the list to
> >> get some additional input.
> >> 
> >> Here are the problems we're seeing:
> >> 
> >> 1) Rapid reinjection can lead to time moving faster for short bursts of
> >>time.  We've seen a number of RTC watchdog BSoDs and it's possible
> >>that at least one cause is reinjection speed.
> >> 
> >> 2) When hibernating a host system, the guest is essentially paused
> >>for a long period of time.  This results in a very large tick catchup
> >>while also resulting in a large skew in guest time.
> >> 
> >>I've gotten reports of the tick catchup consuming a lot of CPU time
> >>from rapid delivery of interrupts (although I haven't reproduced this
> >>yet).
> >> 
> >> 3) Windows appears to have a service that periodically syncs the guest
> >>time with the hardware clock.  I've been told the resync period is an
> >>hour.  For large clock skews, this can compete with reinjection
> >>resulting in a positive skew in time (the guest can be ahead of the
> >>host).
> >> 
> >> I've been thinking about an algorithm like this to address these
> >> problems:
> >> 
> >> A) Limit the number of interrupts that we reinject to the equivalent of
> >>a small period of wallclock time.  Something like 60 seconds.
> >> 
> >> B) In the event of (A), trigger a notification in QEMU.  This is easy
> >>for the RTC but harder for the in-kernel PIT.  Maybe it's a good time to
> >>revisit usage of the in-kernel PIT?
> >> 
> >> C) On accumulated tick overflow, rely on using a qemu-ga command to
> >>force a resync of the guest's time to the hardware wallclock time.
> >> 
> >> D) Whenever the guest reads the wallclock time from the RTC, reset all
> >>accumulated ticks.
> >> 
> >> In order to do (C), we'll need to plumb qemu-ga through QMP.  Mike and I
> >> discussed a low-impact way of doing this (having a separate dispatch
> >> path for guest agent commands) and I'm confident we could do this for
> >> 1.3.
> >> 
> >> This would mean that management tools would need to consume qemu-ga
> >> through QMP.  Not sure if this is a problem for anyone.
> >> 
> >> I'm not sure whether it's worth trying to support this with the
> >> in-kernel PIT or not either.
> >
> > As with our current discussion around fixing the PIC and its impact on
> > the PIT, we should try on the userspace model first and then check if
> > the design can be adapted to support in-kernel as well.
> >
> > For which guests is the PIT important again? Old Linux kernels? Windows
> > should be mostly happy with the RTC - or the HPET.
> 
> I thought that only 64-bit Win2k8+ used the RTC.
> 
> I thought win2k3 and even 32-bit win2k8 still used the PIT.
> 
Only the Windows XP non-ACPI HAL uses the PIT. Any other Windows uses the RTC.
In other words, we do not care about the PIT.

> >> Are there other issues with reinjection that people are aware of?  Does
> >> anything seem obviously wrong with the above?
> >
> > We should take the chance and design everything in a way that the HPET
> > can finally be (left) enabled.
> 
> I thought the issue with the HPET was access frequency and the cost of
> heavy weight exits.
> 
> I don't have concrete data here.  I've only heard it second hand.  Can
> anyone comment more?
> 
There is no reason whatsoever to emulate HPET for Windows. It will
make it slower. Hyper-V does not emulate it. For proper time support in
Windows we need to implement the relevant part of the Hyper-V spec.

--
Gleb.



Re: [Qemu-devel] [PATCH 4/5] QAPI: Introduce memchar-read QMP command

2012-09-12 Thread Eric Blake
On 09/12/2012 05:57 AM, Lei Li wrote:
> Signed-off-by: Lei Li 
> ---
>  hmp-commands.hx  |   25 +
>  hmp.c|   18 ++
>  hmp.h|1 +
>  qapi-schema.json |   27 +++
>  qemu-char.c  |   48 
>  qmp-commands.hx  |   37 +
>  6 files changed, 156 insertions(+), 0 deletions(-)
> 

> +++ b/qapi-schema.json
> @@ -305,6 +305,33 @@
> '*control': 'CongestionControl'} }
>  
>  ##
> +# @memchar-read:
> +#
> +# Provide read interface for CirMemCharDriver. Read from cirmemchar
> +# char device and return the data.
> +#
> +# @chardev: the name of the cirmemchar char device.
> +#
> +# @size: the size to read in bytes.
> +#
> +# @format: #optional the format of the data want to read from
> +#  CirMemCharDriver, by default is 'utf8'.
> +#
> +# @control: #optional options for read and write command that specifies
> +#   behavior when the queue is full/empty.
> +#
> +# Returns: The data read from cirmemchar as string.
> +#  If @chardev is not a valid memchar device, DeviceNotFound
> +#  If an I/O error occurs while reading, IOError
> +#
> +# Since: 1.3
> +##
> +{ 'command': 'memchar-read',
> +  'data': {'chardev': 'str', 'size': 'int', '*format': 'DataFormat',
> +   '*control': 'CongestionControl'},
> +   'returns': 'str' }

What happens if the data to be read contains embedded NUL, but the
requested 'format' can't express that?  What happens if there is less
data available than the maximum requested size?  I'm wondering if the
return should be a JSON struct, { 'data':'str', 'size':'int' }, in order
to allow for the case of short read returns.

> +- "chardev": the name of the char device, must be unique (json-string)
> +- "size": the memory size in bytes, init size of the cirmemchar
> +  by default (json-int)
> - "format": the data format to write to CirMemCharDriver, default is
> +utf8. (json-string, optional)
> +  - Possible values: "utf8", "base64"

Also, you probably want to make it crystal-clear whether size is
referring to the unencoded size of the raw data, or the encoded size
after conversion to utf8 or base64.

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





[Qemu-devel] Enablig DLPAR capacity on QEMU pSeries

2012-09-12 Thread Erlon Cruz
Hi all,

We are planning to implement DLPAR capacity on QEMU pSeries. As we
lack experience in the internals of the arch, we would like you guys
to give us some design directions
and to confirm whether we are going in the right direction. Our first idea is:

1 - to patch 'spapr.c' so it can dynamically insert/remove basic
items into the device tree.
2 - create a host side device that will be used with a guest side
driver to perform guest side operations and communicate changes from
host to the guest (like DynamicRM does in PowerVM LPARs). We are not
planning to use powerpc-tools and want to make resource management
transparent (i.e. no need to run daemons or userspace programs in the
guest, only this kernel driver).
3 - create bindings to support adding/removal  ibmvscsi devices
4 - create bindings to support adding/removal  ibmveth devices
5 - create bindings to support adding/removal PCI devices
6 - create bindings to support adding/removal of memory
- Do we need to do this the way PowerVM does? We have tested
virtio ballooning and it can work with a few endianness corrections.
7 - create bindings to support adding/removal of CPUs
- is SMP supported already? I tried to run SMP on an x86 host
and the guest got stuck when SMP is enabled
- would it be possible to work on this without a P7 bare-metal
machine? We have a P7 8205-E6B, is that possible to kick PHYP out?

Any idea on how much effort (time/people) the whole thing would take?
Any consideration about this is much appreciated :)

Kind regards,
Erlon



Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Jan Kiszka
On 2012-09-12 16:44, Anthony Liguori wrote:
> Jan Kiszka  writes:
> 
>> On 2012-09-12 15:54, Anthony Liguori wrote:
>>>
>>> Hi,
>>>
>>> We've been running into a lot of problems lately with Windows guests and
>>> I think they all ultimately could be addressed by revisiting the missed
>>> tick catchup algorithms that we use.  Mike and I spent a while talking
>>> about it yesterday and I wanted to take the discussion to the list to
>>> get some additional input.
>>>
>>> Here are the problems we're seeing:
>>>
>>> 1) Rapid reinjection can lead to time moving faster for short bursts of
>>>time.  We've seen a number of RTC watchdog BSoDs and it's possible
>>>that at least one cause is reinjection speed.
>>>
>>> 2) When hibernating a host system, the guest is essentially paused
>>>for a long period of time.  This results in a very large tick catchup
>>>while also resulting in a large skew in guest time.
>>>
>>>I've gotten reports of the tick catchup consuming a lot of CPU time
>>>from rapid delivery of interrupts (although I haven't reproduced this
>>>yet).
>>>
>>> 3) Windows appears to have a service that periodically syncs the guest
>>>time with the hardware clock.  I've been told the resync period is an
>>>hour.  For large clock skews, this can compete with reinjection
>>>resulting in a positive skew in time (the guest can be ahead of the
>>>host).
>>>
>>> I've been thinking about an algorithm like this to address these
>>> problems:
>>>
>>> A) Limit the number of interrupts that we reinject to the equivalent of
>>>a small period of wallclock time.  Something like 60 seconds.
>>>
>>> B) In the event of (A), trigger a notification in QEMU.  This is easy
>>>for the RTC but harder for the in-kernel PIT.  Maybe it's a good time to
>>>revisit usage of the in-kernel PIT?
>>>
>>> C) On accumulated tick overflow, rely on using a qemu-ga command to
>>>force a resync of the guest's time to the hardware wallclock time.
>>>
>>> D) Whenever the guest reads the wallclock time from the RTC, reset all
>>>accumulated ticks.
>>>
>>> In order to do (C), we'll need to plumb qemu-ga through QMP.  Mike and I
>>> discussed a low-impact way of doing this (having a separate dispatch
>>> path for guest agent commands) and I'm confident we could do this for
>>> 1.3.
>>>
>>> This would mean that management tools would need to consume qemu-ga
>>> through QMP.  Not sure if this is a problem for anyone.
>>>
>>> I'm not sure whether it's worth trying to support this with the
>>> in-kernel PIT or not either.
>>
>> As with our current discussion around fixing the PIC and its impact on
>> the PIT, we should try on the userspace model first and then check if
>> the design can be adapted to support in-kernel as well.
>>
>> For which guests is the PIT important again? Old Linux kernels? Windows
>> should be mostly happy with the RTC - or the HPET.
> 
> I thought that only 64-bit Win2k8+ used the RTC.
> 
> I thought win2k3 and even 32-bit win2k8 still used the PIT.

Hmm, might be true.

> 
>>> Are there other issues with reinjection that people are aware of?  Does
>>> anything seem obviously wrong with the above?
>>
>> We should take the chance and design everything in a way that the HPET
>> can finally be (left) enabled.
> 
> I thought the issue with the HPET was access frequency and the cost of
> heavy weight exits.

Well, with common Win7-64 you can choose between RTC and HPET. The
former works well, the latter breaks timing. Both require userspace
exits. But the HPET is enabled by default, so it needs manual tuning to
get things right.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Anthony Liguori
Jan Kiszka  writes:

> On 2012-09-12 15:54, Anthony Liguori wrote:
>> 
>> Hi,
>> 
>> We've been running into a lot of problems lately with Windows guests and
>> I think they all ultimately could be addressed by revisiting the missed
>> tick catchup algorithms that we use.  Mike and I spent a while talking
>> about it yesterday and I wanted to take the discussion to the list to
>> get some additional input.
>> 
>> Here are the problems we're seeing:
>> 
>> 1) Rapid reinjection can lead to time moving faster for short bursts of
>>time.  We've seen a number of RTC watchdog BSoDs and it's possible
>>that at least one cause is reinjection speed.
>> 
>> 2) When hibernating a host system, the guest is essentially paused
>>for a long period of time.  This results in a very large tick catchup
>>while also resulting in a large skew in guest time.
>> 
>>I've gotten reports of the tick catchup consuming a lot of CPU time
>>from rapid delivery of interrupts (although I haven't reproduced this
>>yet).
>> 
>> 3) Windows appears to have a service that periodically syncs the guest
>>time with the hardware clock.  I've been told the resync period is an
>>hour.  For large clock skews, this can compete with reinjection
>>resulting in a positive skew in time (the guest can be ahead of the
>>host).
>> 
>> I've been thinking about an algorithm like this to address these
>> problems:
>> 
>> A) Limit the number of interrupts that we reinject to the equivalent of
>>a small period of wallclock time.  Something like 60 seconds.
>> 
>> B) In the event of (A), trigger a notification in QEMU.  This is easy
>>for the RTC but harder for the in-kernel PIT.  Maybe it's a good time to
>>revisit usage of the in-kernel PIT?
>> 
>> C) On accumulated tick overflow, rely on using a qemu-ga command to
>>force a resync of the guest's time to the hardware wallclock time.
>> 
>> D) Whenever the guest reads the wallclock time from the RTC, reset all
>>accumulated ticks.
>> 
>> In order to do (C), we'll need to plumb qemu-ga through QMP.  Mike and I
>> discussed a low-impact way of doing this (having a separate dispatch
>> path for guest agent commands) and I'm confident we could do this for
>> 1.3.
>> 
>> This would mean that management tools would need to consume qemu-ga
>> through QMP.  Not sure if this is a problem for anyone.
>> 
>> I'm not sure whether it's worth trying to support this with the
>> in-kernel PIT or not either.
>
> As with our current discussion around fixing the PIC and its impact on
> the PIT, we should try on the userspace model first and then check if
> the design can be adapted to support in-kernel as well.
>
> For which guests is the PIT important again? Old Linux kernels? Windows
> should be mostly happy with the RTC - or the HPET.

I thought that only 64-bit Win2k8+ used the RTC.

I thought win2k3 and even 32-bit win2k8 still used the PIT.

>> Are there other issues with reinjection that people are aware of?  Does
>> anything seem obviously wrong with the above?
>
> We should take the chance and design everything in a way that the HPET
> can finally be (left) enabled.

I thought the issue with the HPET was access frequency and the cost of
heavy weight exits.

I don't have concrete data here.  I've only heard it second hand.  Can
anyone comment more?

Regards,

Anthony Liguori

>
> Jan
>
> -- 
> Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
> Corporate Competence Center Embedded Linux



Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Jan Kiszka
On 2012-09-12 15:54, Anthony Liguori wrote:
> 
> Hi,
> 
> We've been running into a lot of problems lately with Windows guests and
> I think they all ultimately could be addressed by revisiting the missed
> tick catchup algorithms that we use.  Mike and I spent a while talking
> about it yesterday and I wanted to take the discussion to the list to
> get some additional input.
> 
> Here are the problems we're seeing:
> 
> 1) Rapid reinjection can lead to time moving faster for short bursts of
>time.  We've seen a number of RTC watchdog BSoDs and it's possible
>that at least one cause is reinjection speed.
> 
> 2) When hibernating a host system, the guest is essentially paused
>for a long period of time.  This results in a very large tick catchup
>while also resulting in a large skew in guest time.
> 
>I've gotten reports of the tick catchup consuming a lot of CPU time
>from rapid delivery of interrupts (although I haven't reproduced this
>yet).
> 
> 3) Windows appears to have a service that periodically syncs the guest
>time with the hardware clock.  I've been told the resync period is an
>hour.  For large clock skews, this can compete with reinjection
>resulting in a positive skew in time (the guest can be ahead of the
>host).
> 
> I've been thinking about an algorithm like this to address these
> problems:
> 
> A) Limit the number of interrupts that we reinject to the equivalent of
>a small period of wallclock time.  Something like 60 seconds.
> 
> B) In the event of (A), trigger a notification in QEMU.  This is easy
>for the RTC but harder for the in-kernel PIT.  Maybe it's a good time to
>revisit usage of the in-kernel PIT?
> 
> C) On accumulated tick overflow, rely on using a qemu-ga command to
>force a resync of the guest's time to the hardware wallclock time.
> 
> D) Whenever the guest reads the wallclock time from the RTC, reset all
>accumulated ticks.
> 
> In order to do (C), we'll need to plumb qemu-ga through QMP.  Mike and I
> discussed a low-impact way of doing this (having a separate dispatch
> path for guest agent commands) and I'm confident we could do this for
> 1.3.
> 
> This would mean that management tools would need to consume qemu-ga
> through QMP.  Not sure if this is a problem for anyone.
> 
> I'm not sure whether it's worth trying to support this with the
> in-kernel PIT or not either.

As with our current discussion around fixing the PIC and its impact on
the PIT, we should try on the userspace model first and then check if
the design can be adapted to support in-kernel as well.

For which guests is the PIT important again? Old Linux kernels? Windows
should be mostly happy with the RTC - or the HPET.

> 
> Are there other issues with reinjection that people are aware of?  Does
> anything seem obviously wrong with the above?

We should take the chance and design everything in a way that the HPET
can finally be (left) enabled.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH v2 0/3] nonblocking connect address handling cleanup

2012-09-12 Thread Amos Kong

On 12/09/12 19:12, Orit Wasserman wrote:

getaddrinfo can give us a list of addresses, but we only try to
connect to the first one. If that fails we never proceed to
the next one.  This is common on desktop setups that often have IPv6
configured but not actually working.
A simple way to reproduce the problem is migration:
for the destination use -incoming tcp:0:, run migrate -d tcp:localhost:
migration will fail on hosts that have both IPv4 and IPv6 addresses for localhost.

To fix this, refactor address resolution code and make inet_nonblocking_connect
retry connection with a different address.
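The retry-over-the-result-list pattern being described is essentially the classic getaddrinfo() loop (a generic sketch, not the actual qemu-sockets.c refactoring):

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <unistd.h>

/* Try each address getaddrinfo() returns until one connect() succeeds.
 * Returns the connected fd, or -1 if every address failed. */
static int inet_connect_any(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* accept both IPv4 and IPv6 results */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, port, &hints, &res) != 0) {
        return -1;
    }
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0) {
            continue;
        }
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
            break;                    /* success: stop at the first address that works */
        }
        close(fd);                    /* e.g. broken IPv6: fall through to the next one */
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

int main(void)
{
    /* Port 1 is almost certainly closed, so every address should fail. */
    int fd = inet_connect_any("localhost", "1");
    printf("connect %s\n", fd < 0 ? "failed on all addresses" : "succeeded");
    if (fd >= 0) {
        close(fd);
    }
    return 0;
}
```

The nonblocking variant additionally has to remember the rest of the list so it can move on to the next address when the asynchronous connect completes with an error, which is what the third patch adds.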



Looks good to me.

Reviewed-by: Amos Kong 
Tested-by: Amos Kong 



Orit Wasserman (3):
   Refactor inet_connect_opts function
   Separate inet_connect into inet_connect (blocking) and
 inet_nonblocking_connect
   Fix address handling in inet_nonblocking_connect

  migration-tcp.c |   29 ++-
  nbd.c   |2 +-
  qemu-sockets.c  |  254 +--
  qemu_socket.h   |9 ++-
  ui/vnc.c|2 +-
  5 files changed, 208 insertions(+), 88 deletions(-)



--
Amos.



Re: [Qemu-devel] [PATCH] hw: Add support for new LSI Logic devices.

2012-09-12 Thread Anthony Liguori
Kevin Wolf  writes:

> Am 12.09.2012 01:50, schrieb Michael S. Tsirkin:
>> On Tue, Sep 11, 2012 at 01:00:13PM -0400, Don Slutz wrote:
>>> Add LSI53C1030, SAS1068, SAS1068e.  Based on code from "VirtualBox Open 
>>> Source Edition".
>>> Based on QEMU MegaRAID SAS 8708EM2.
>>>
>>> This is a common VMware disk controller.
>> 
>> I think you mean VMware emulates this controller too,
>> pls say it explicitly in the commit log.
>> 
>>> SEABIOS change for booting is in the works.
>>>
>>> Tested with Fedora 16, 17.  CentOS 6. Windows 2003R2 and 2008R2.
>>>
>>> Signed-off-by: Don Slutz 
>> 
>> Minor comments below.
>> 
>> Coding style does not adhere to qemu standards,
>> I guess you know that :)
>> 
>> Otherwise, from pci side of things this looks OK.
>> I did not look at the scsi side of things.
>> 
>>> ---
>>>  default-configs/pci.mak |1 +
>>>  hw/Makefile.objs|1 +
>>>  hw/lsilogic.c   | 2743 ++
>>>  hw/lsilogic.h   | 3365 
>>> +++
>>>  hw/pci_ids.h|4 +
>>>  trace-events|   26 +
>>>  6 files changed, 6140 insertions(+), 0 deletions(-)
>>>  create mode 100644 hw/lsilogic.c
>>>  create mode 100644 hw/lsilogic.h
>>>
>>> diff --git a/default-configs/pci.mak b/default-configs/pci.mak
>>> index 69e18f1..ae4873d 100644
>>> --- a/default-configs/pci.mak
>>> +++ b/default-configs/pci.mak
>>> @@ -11,6 +11,7 @@ CONFIG_PCNET_PCI=y
>>>  CONFIG_PCNET_COMMON=y
>>>  CONFIG_LSI_SCSI_PCI=y
>>>  CONFIG_MEGASAS_SCSI_PCI=y
>>> +CONFIG_LSILOGIC_SCSI_PCI=y
>>>  CONFIG_RTL8139_PCI=y
>>>  CONFIG_E1000_PCI=y
>>>  CONFIG_IDE_CORE=y
>>> diff --git a/hw/Makefile.objs b/hw/Makefile.objs
>>> index 6dfebd2..e5f939c 100644
>>> --- a/hw/Makefile.objs
>>> +++ b/hw/Makefile.objs
>>> @@ -115,6 +115,7 @@ hw-obj-$(CONFIG_OPENCORES_ETH) += opencores_eth.o
>>>  # SCSI layer
>>>  hw-obj-$(CONFIG_LSI_SCSI_PCI) += lsi53c895a.o
>>>  hw-obj-$(CONFIG_MEGASAS_SCSI_PCI) += megasas.o
>>> +hw-obj-$(CONFIG_LSILOGIC_SCSI_PCI) += lsilogic.o
>>>  hw-obj-$(CONFIG_ESP) += esp.o
>>>  hw-obj-$(CONFIG_ESP_PCI) += esp-pci.o
>>>  
>>> diff --git a/hw/lsilogic.c b/hw/lsilogic.c
>>> new file mode 100644
>>> index 000..1c0a54f
>>> --- /dev/null
>>> +++ b/hw/lsilogic.c
>>> @@ -0,0 +1,2743 @@
>>> +/*
>>> + * QEMU LSILOGIC LSI53C1030 SCSI and SAS1068 Host Bus Adapter emulation
>>> + * Based on the QEMU Megaraid emulator and the VirtualBox LsiLogic
>>> + * LSI53c1030 SCSI controller
>>> + *
>>> + * Copyright (c) 2009-2012 Hannes Reinecke, SUSE Labs
>>> + *
>>> + * This library is free software; you can redistribute it and/or
>>> + * modify it under the terms of the GNU Lesser General Public
>>> + * License as published by the Free Software Foundation; either
>>> + * version 2 of the License, or (at your option) any later version.
>>> + *
>>> + * This library is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>> + * Lesser General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU Lesser General Public
>>> + * License along with this library; if not, see 
>>> .
>>> + */
>>> +
>>> +/* Id: DevLsiLogicSCSI.cpp 40642 2012-03-26 13:14:08Z vboxsync $ */
>>> +/** @file
>>> + * VBox storage devices: LsiLogic LSI53c1030 SCSI controller.
>>> + */
>>> +
>>> +/*
>>> + * Copyright (C) 2006-2009 Oracle Corporation
>>> + *
>>> + * This file is part of VirtualBox Open Source Edition (OSE), as
>>> + * available from http://www.virtualbox.org. This file is free software;
>>> + * you can redistribute it and/or modify it under the terms of the GNU
>>> + * General Public License (GPL) as published by the Free Software
>>> + * Foundation, in version 2 as it comes in the "COPYING" file of the
>>> + * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
>>> + * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
>>> + */
>>> +
>>> +
>> 
>> I suspect you need to rewrite above: probably add
>> all copyrights in 1st header and make it v2 only.
>
> Do we even accept new GPLv2-only code?

Yes.

I've got some concern about the maintainability of this though.  This is
code copied from another project and then heavily modified.

Are we prepared to independently fork this device?  How are we going to test it
regularly?

We seem to be growing SCSI controllers like weeds.  Why would someone
use this versus megasas, LSI, or virtio-scsi?

Regards,

Anthony Liguori

>
> Kevin



[Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Anthony Liguori

Hi,

We've been running into a lot of problems lately with Windows guests and
I think they all ultimately could be addressed by revisiting the missed
tick catchup algorithms that we use.  Mike and I spent a while talking
about it yesterday and I wanted to take the discussion to the list to
get some additional input.

Here are the problems we're seeing:

1) Rapid reinjection can lead to time moving faster for short bursts of
   time.  We've seen a number of RTC watchdog BSoDs and it's possible
   that at least one cause is reinjection speed.

2) When hibernating a host system, the guest is essentially paused
   for a long period of time.  This results in a very large tick catchup
   while also resulting in a large skew in guest time.

   I've gotten reports of the tick catchup consuming a lot of CPU time
   from rapid delivery of interrupts (although I haven't reproduced this
   yet).

3) Windows appears to have a service that periodically syncs the guest
   time with the hardware clock.  I've been told the resync period is an
   hour.  For large clock skews, this can compete with reinjection
   resulting in a positive skew in time (the guest can be ahead of the
   host).

I've been thinking about an algorithm like this to address these
problems:

A) Limit the number of interrupts that we reinject to the equivalent of
   a small period of wallclock time.  Something like 60 seconds.

B) In the event of (A), trigger a notification in QEMU.  This is easy
   for the RTC but harder for the in-kernel PIT.  Maybe it's a good time to
   revisit usage of the in-kernel PIT?

C) On accumulated tick overflow, rely on using a qemu-ga command to
   force a resync of the guest's time to the hardware wallclock time.

D) Whenever the guest reads the wallclock time from the RTC, reset all
   accumulated ticks.
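
Steps (A)-(D) above could be sketched roughly as follows.  This is a
minimal, self-contained illustration only -- the names (TickState,
tick_missed, etc.), the 64 Hz rate, and the 60-second cap are all
assumptions for the sake of the example, not actual QEMU code or APIs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TICK_HZ            64   /* assumed periodic RTC rate */
#define MAX_CATCHUP_SECS   60   /* step (A): cap backlog at ~60s wallclock */
#define MAX_CATCHUP_TICKS  ((int64_t)TICK_HZ * MAX_CATCHUP_SECS)

typedef struct TickState {
    int64_t missed_ticks;    /* ticks owed to the guest */
    bool overflow_notified;  /* step (B): notification raised in QEMU */
} TickState;

/* Account one missed tick.  Once the backlog hits the cap, stop
 * accumulating and raise the notification that would trigger a
 * qemu-ga time resync instead (step (C)). */
static void tick_missed(TickState *s)
{
    if (s->missed_ticks >= MAX_CATCHUP_TICKS) {
        s->overflow_notified = true;
        return;
    }
    s->missed_ticks++;
}

/* Reinject one pending tick, if any; returns whether one was injected. */
static bool tick_reinject(TickState *s)
{
    if (s->missed_ticks > 0) {
        s->missed_ticks--;
        return true;
    }
    return false;
}

/* Step (D): the guest read the wallclock time from the RTC, so any
 * remaining backlog is stale -- drop it. */
static void wallclock_read(TickState *s)
{
    s->missed_ticks = 0;
    s->overflow_notified = false;
}
```

The point of the cap is that past ~60 seconds of backlog, reinjection
only burns CPU without converging, so a one-shot resync via qemu-ga is
cheaper and more accurate.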

In order to do (C), we'll need to plumb qemu-ga through QMP.  Mike and I
discussed a low-impact way of doing this (having a separate dispatch
path for guest agent commands) and I'm confident we could do this for
1.3.

This would mean that management tools would need to consume qemu-ga
through QMP.  Not sure if this is a problem for anyone.

I'm not sure whether it's worth trying to support this with the
in-kernel PIT or not either.

Are there other issues with reinjection that people are aware of?  Does
anything seem obviously wrong with the above?

Regards,

Anthony Liguori



[Qemu-devel] [PATCH 8/9] usb-redir: Revert usb-redir part of commit 93bfef4c

2012-09-12 Thread Hans de Goede
Commit 93bfef4c6e4b23caea9d51e1099d06433d8835a4 makes qemu-devices
which report the qemu version string to the guest in some way use a
qemu_get_version function which reports a machine-specific version string.

However usb-redir does not expose the qemu version to the guest, only to
the usbredir-host as part of the initial handshake. This can then be logged
on the usbredir-host side for debugging purposes and is otherwise completely
unused! For debugging it is important to have the real qemu version
in there, rather than the machine-specific version.

Signed-off-by: Hans de Goede 
---
 hw/usb/redirect.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
index e438fd1..f327b94 100644
--- a/hw/usb/redirect.c
+++ b/hw/usb/redirect.c
@@ -142,6 +142,8 @@ static void usbredir_interrupt_packet(void *priv, uint64_t 
id,
 static int usbredir_handle_status(USBRedirDevice *dev,
int status, int actual_len);
 
+#define VERSION "qemu usb-redir guest " QEMU_VERSION
+
 /*
  * Logging stuff
  */
@@ -863,7 +865,6 @@ static void usbredir_chardev_close_bh(void *opaque)
 static void usbredir_chardev_open(USBRedirDevice *dev)
 {
 uint32_t caps[USB_REDIR_CAPS_SIZE] = { 0, };
-char version[32];
 int flags = 0;
 
 /* Make sure any pending closes are handled (no-op if none pending) */
@@ -872,9 +873,6 @@ static void usbredir_chardev_open(USBRedirDevice *dev)
 
 DPRINTF("creating usbredirparser\n");
 
-strcpy(version, "qemu usb-redir guest ");
-pstrcat(version, sizeof(version), qemu_get_version());
-
 dev->parser = qemu_oom_check(usbredirparser_create());
 dev->parser->priv = dev;
 dev->parser->log_func = usbredir_log;
@@ -906,7 +904,7 @@ static void usbredir_chardev_open(USBRedirDevice *dev)
 if (runstate_check(RUN_STATE_INMIGRATE)) {
 flags |= usbredirparser_fl_no_hello;
 }
-usbredirparser_init(dev->parser, version, caps, USB_REDIR_CAPS_SIZE,
+usbredirparser_init(dev->parser, VERSION, caps, USB_REDIR_CAPS_SIZE,
 flags);
 usbredirparser_do_write(dev->parser);
 }
-- 
1.7.12


