Re: [Qemu-devel] default slot used for vga device on q35 machines

2013-08-05 Thread Gerd Hoffmann
  Hi,

 1) Is this difference intentional, or a bug?
 The vga simply goes into the first free slot.  That happens to be #2
 with i440fx and #1 with q35.
 
 It sounds like it's a safe bet to assume that -vga will put the device
 on slot 2 for pc machinetypes and slot 1 for q35, no matter what other
 devices there are, since -vga is always initialized first. Correct?

Yes.

 Do
 you see any chance that might change in the future? (e.g. due to some
 other new device that needs to be initialized even before vga)

Highly unlikely.  Also libvirt will switch over to use -device for vga
cards, which will make this a moot point.

 3) Does the qxl multihead support really require that the device be at
 slot 2 (as stated in the above bugzilla commend)? Or is that just a
 misunderstanding/overstatement?
 It's not required at all.  The problem is that on older qemu versions
 (pre-memory-api basically) it was impossible to create functional vga
 devices via -device due to an initialization order issue.  Which implies
 you have to use -vga instead, which in turn implies the vga ends up in
 slot #2.  On i440fx, but q35 didn't exist yet back then ;)
 
 Are you saying that it's still required to use -vga instead of -device
 qxl-vga for pc machinetype? Or that it was needed at one time, but that
 is no longer the case? If the latter, is there a reliable way to make
 the decision whether or not we need to use -vga?

libvirt has started using -device for vga cards, to be able to place
them in any pci slot.  IIRC it's used for qemu 1.1+ or 1.2+.  That works
fine for all vga cards except qxl.  The bug which broke it for qxl will
be fixed in 1.6 (and probably 1.5.3 too).

cheers,
  Gerd




[Qemu-devel] [PATCH 0/4]: timers thread-safe stuff

2013-08-05 Thread Liu Ping Fan
The patches has been rebased onto Alex's [RFC] [PATCHv5 00/16] aio / timers: 
Add AioContext timers and use ppoll
 permalink.gmane.org/gmane.comp.emulators.qemu/226333

For some other complied error issue, I can not finish compiling, will fix it 
later.

Changes since last version:
 1. drop the overlap partition and leave only thread-safe stuff
 2. For timers_state, since currently, only vm_clock can be read outside BQL.
There is no protection with ticks(since the protection will cost more in 
read_tsc path).
 3. use light weight QemuEvent to re-implement the qemu_clock_enable(foo,false)


Liu Ping Fan (2):
  timer: protect timers_state's clock with seqlock
  timer: make qemu_clock_enable sync between disable and timer's cb

Paolo Bonzini (2):
  seqlock: introduce read-write seqlock
  qemu-thread: add QemuEvent

 cpus.c  |  36 +++---
 include/qemu/seqlock.h  |  72 +++
 include/qemu/thread-posix.h |   8 +++
 include/qemu/thread-win32.h |   4 ++
 include/qemu/thread.h   |   7 +++
 include/qemu/timer.h|   1 +
 qemu-timer.c|  11 +
 util/qemu-thread-posix.c| 116 
 util/qemu-thread-win32.c|  26 ++
 9 files changed, 274 insertions(+), 7 deletions(-)
 create mode 100644 include/qemu/seqlock.h

-- 
1.8.1.4




[Qemu-devel] [PATCH 2/4] timer: protect timers_state's clock with seqlock

2013-08-05 Thread Liu Ping Fan
In kvm mode, vm_clock may be read outside BQL. This will make
timers_state --the foundation of vm_clock exposed to race condition.
Using private lock to protect it.

Note in tcg mode, vm_clock still read inside BQL, so icount is
left without change. As for cpu_ticks in timers_state, it
is still protected by BQL.

Lock rule: private lock innermost, ie BQL-this lock

Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
---
 cpus.c | 36 +---
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/cpus.c b/cpus.c
index 85e743d..ab92db9 100644
--- a/cpus.c
+++ b/cpus.c
@@ -107,12 +107,17 @@ static int64_t qemu_icount;
 typedef struct TimersState {
 int64_t cpu_ticks_prev;
 int64_t cpu_ticks_offset;
+/* QemuClock will be read out of BQL, so protect is with private lock.
+ * As for cpu_ticks, no requirement to read it outside BQL.
+ * Lock rule: innermost
+ */
+QemuSeqLock clock_seqlock;
 int64_t cpu_clock_offset;
 int32_t cpu_ticks_enabled;
 int64_t dummy;
 } TimersState;
 
-TimersState timers_state;
+static TimersState timers_state;
 
 /* Return the virtual CPU time, based on the instruction counter.  */
 int64_t cpu_get_icount(void)
@@ -132,6 +137,7 @@ int64_t cpu_get_icount(void)
 }
 
 /* return the host CPU cycle counter and handle stop/restart */
+/* cpu_ticks is safely if holding BQL */
 int64_t cpu_get_ticks(void)
 {
 if (use_icount) {
@@ -156,33 +162,46 @@ int64_t cpu_get_ticks(void)
 int64_t cpu_get_clock(void)
 {
 int64_t ti;
-if (!timers_state.cpu_ticks_enabled) {
-return timers_state.cpu_clock_offset;
-} else {
-ti = get_clock();
-return ti + timers_state.cpu_clock_offset;
-}
+unsigned start;
+
+do {
+start = seqlock_read_begin(timers_state.clock_seqlock);
+if (!timers_state.cpu_ticks_enabled) {
+ti = timers_state.cpu_clock_offset;
+} else {
+ti = get_clock();
+ti += timers_state.cpu_clock_offset;
+}
+} while (seqlock_read_check(timers_state.clock_seqlock, start);
+
+return ti;
 }
 
 /* enable cpu_get_ticks() */
 void cpu_enable_ticks(void)
 {
+/* Here, the really thing protected by seqlock is cpu_clock. */
+seqlock_write_lock(timers_state.clock_seqlock);
 if (!timers_state.cpu_ticks_enabled) {
 timers_state.cpu_ticks_offset -= cpu_get_real_ticks();
 timers_state.cpu_clock_offset -= get_clock();
 timers_state.cpu_ticks_enabled = 1;
 }
+seqlock_write_unlock(timers_state.clock_seqlock);
 }
 
 /* disable cpu_get_ticks() : the clock is stopped. You must not call
cpu_get_ticks() after that.  */
 void cpu_disable_ticks(void)
 {
+/* Here, the really thing protected by seqlock is cpu_clock. */
+seqlock_write_lock(timers_state.clock_seqlock);
 if (timers_state.cpu_ticks_enabled) {
 timers_state.cpu_ticks_offset = cpu_get_ticks();
 timers_state.cpu_clock_offset = cpu_get_clock();
 timers_state.cpu_ticks_enabled = 0;
 }
+seqlock_write_unlock(timers_state.clock_seqlock);
 }
 
 /* Correlation between real and virtual time is always going to be
@@ -364,6 +383,9 @@ static const VMStateDescription vmstate_timers = {
 
 void configure_icount(const char *option)
 {
+QemuMutex *mutex = g_malloc0(sizeof(QemuMutex));
+qemu_mutex_init(mutex);
+seqlock_init(timers_state.clock_seqlock, mutex);
 vmstate_register(NULL, 0, vmstate_timers, timers_state);
 if (!option) {
 return;
-- 
1.8.1.4




[Qemu-devel] [PATCH 1/4] seqlock: introduce read-write seqlock

2013-08-05 Thread Liu Ping Fan
From: Paolo Bonzini pbonz...@redhat.com

This lets the read-side access run outside the BQL.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 include/qemu/seqlock.h | 72 ++
 1 file changed, 72 insertions(+)
 create mode 100644 include/qemu/seqlock.h

diff --git a/include/qemu/seqlock.h b/include/qemu/seqlock.h
new file mode 100644
index 000..8f1c89f
--- /dev/null
+++ b/include/qemu/seqlock.h
@@ -0,0 +1,72 @@
+/*
+ * Seqlock implementation for QEMU
+ *
+ * Copyright Red Hat, Inc. 2013
+ *
+ * Author:
+ *  Paolo Bonzini pbonz...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+#ifndef QEMU_SEQLOCK_H
+#define QEMU_SEQLOCK_H 1
+
+#include qemu/atomic.h
+#include qemu/thread.h
+
+typedef struct QemuSeqLock QemuSeqLock;
+
+struct QemuSeqLock {
+QemuMutex *mutex;
+unsigned sequence;
+};
+
+static inline void seqlock_init(QemuSeqLock *sl, QemuMutex *mutex)
+{
+sl-mutex = mutex;
+sl-sequence = 0;
+}
+
+/* Lock out other writers and update the count.  */
+static inline void seqlock_write_lock(QemuSeqLock *sl)
+{
+if (sl-mutex) {
+qemu_mutex_lock(sl-mutex);
+}
+++sl-sequence;
+
+/* Write sequence before updating other fields.  */
+smp_wmb();
+}
+
+static inline void seqlock_write_unlock(QemuSeqLock *sl)
+{
+/* Write other fields before finalizing sequence.  */
+smp_wmb();
+
+++sl-sequence;
+if (sl-mutex) {
+qemu_mutex_unlock(sl-mutex);
+}
+}
+
+static inline unsigned seqlock_read_begin(QemuSeqLock *sl)
+{
+/* Always fail if a write is in progress.  */
+unsigned ret = sl-sequence  ~1;
+
+/* Read sequence before reading other fields.  */
+smp_rmb();
+return ret;
+}
+
+static int seqlock_read_check(const QemuSeqLock *sl, unsigned start)
+{
+/* Read other fields before reading final sequence.  */
+smp_rmb();
+return unlikely(sl-sequence != start);
+}
+
+#endif
-- 
1.8.1.4




[Qemu-devel] [PATCH 4/4] timer: make qemu_clock_enable sync between disable and timer's cb

2013-08-05 Thread Liu Ping Fan
After disabling the QemuClock, we should make sure that no QemuTimers
are still in flight. To implement that with light overhead, we resort
to QemuEvent. The caller of disabling will wait on QemuEvent of each
timerlist.

Note, qemu_clock_enable(foo,false) can _not_ be called from timer's cb.
And the callers of qemu_clock_enable() should be sync by themselves,
not protected by this patch.

Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
---
 include/qemu/timer.h |  1 +
 qemu-timer.c | 11 +++
 2 files changed, 12 insertions(+)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 1363316..ca09ba2 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -85,6 +85,7 @@ int64_t timerlistgroup_deadline_ns(QEMUTimerListGroup tlg);
 
 int qemu_timeout_ns_to_ms(int64_t ns);
 int qemu_poll_ns(GPollFD *fds, uint nfds, int64_t timeout);
+/* The disable of clock can not be called in timer's cb */
 void qemu_clock_enable(QEMUClock *clock, bool enabled);
 void qemu_clock_warp(QEMUClock *clock);
 
diff --git a/qemu-timer.c b/qemu-timer.c
index ebe7597..5828107 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -71,6 +71,8 @@ struct QEMUTimerList {
 QLIST_ENTRY(QEMUTimerList) list;
 QEMUTimerListNotifyCB *notify_cb;
 void *notify_opaque;
+/* light weight method to mark the end of timerlist's running */
+QemuEvent ev;
 };
 
 struct QEMUTimer {
@@ -92,6 +94,7 @@ static QEMUTimerList *timerlist_new_from_clock(QEMUClock 
*clock)
 QEMUTimerList *tl;
 
 tl = g_malloc0(sizeof(QEMUTimerList));
+qemu_event_init(tl-ev, false);
 tl-clock = clock;
 QLIST_INSERT_HEAD(clock-timerlists, tl, list);
 return tl;
@@ -145,12 +148,18 @@ void qemu_clock_notify(QEMUClock *clock)
 }
 }
 
+/* The disable of clock can _not_ be called from timer's cb */
 void qemu_clock_enable(QEMUClock *clock, bool enabled)
 {
+QEMUTimerList *tl;
 bool old = clock-enabled;
 clock-enabled = enabled;
 if (enabled  !old) {
 qemu_clock_notify(clock);
+} else if (!enabled  old) {
+QLIST_FOREACH(tl, clock-timerlists, list) {
+qemu_event_wait(tl-ev);
+}
 }
 }
 
@@ -419,6 +428,7 @@ bool timerlist_run_timers(QEMUTimerList *tl)
 }
 
 current_time = qemu_get_clock_ns(tl-clock);
+qemu_event_reset(tl-ev);
 for(;;) {
 ts = tl-active_timers;
 if (!qemu_timer_expired_ns(ts, current_time)) {
@@ -432,6 +442,7 @@ bool timerlist_run_timers(QEMUTimerList *tl)
 ts-cb(ts-opaque);
 progress = true;
 }
+qemu_event_set(tl-ev);
 return progress;
 }
 
-- 
1.8.1.4




[Qemu-devel] [PATCH 3/4] qemu-thread: add QemuEvent

2013-08-05 Thread Liu Ping Fan
From: Paolo Bonzini pbonz...@redhat.com

This emulates Win32 manual-reset events using futexes or conditional
variables.  Typical ways to use them are with multi-producer,
single-consumer data structures, to test for a complex condition whose
elements come from different threads:

for (;;) {
qemu_event_reset(ev);
... test complex condition ...
if (condition is true) {
break;
}
qemu_event_wait(ev);
}

Or more efficiently (but with some duplication):

... evaluate condition ...
while (!condition) {
qemu_event_reset(ev);
... evaluate condition ...
if (!condition) {
qemu_event_wait(ev);
... evaluate condition ...
}
}

QemuEvent provides a very fast userspace path in the common case when
no other thread is waiting, or the event is not changing state.  It
is used to report RCU quiescent states to the thread calling
synchronize_rcu (the latter being the single consumer), and to report
call_rcu invocations to the thread that receives them.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 include/qemu/thread-posix.h |   8 +++
 include/qemu/thread-win32.h |   4 ++
 include/qemu/thread.h   |   7 +++
 util/qemu-thread-posix.c| 116 
 util/qemu-thread-win32.c|  26 ++
 5 files changed, 161 insertions(+)

diff --git a/include/qemu/thread-posix.h b/include/qemu/thread-posix.h
index 0f30dcc..916b2a7 100644
--- a/include/qemu/thread-posix.h
+++ b/include/qemu/thread-posix.h
@@ -21,6 +21,14 @@ struct QemuSemaphore {
 #endif
 };
 
+struct QemuEvent {
+#ifndef __linux__
+pthread_mutex_t lock;
+pthread_cond_t cond;
+#endif
+unsigned value;
+};
+
 struct QemuThread {
 pthread_t thread;
 };
diff --git a/include/qemu/thread-win32.h b/include/qemu/thread-win32.h
index 13adb95..3d58081 100644
--- a/include/qemu/thread-win32.h
+++ b/include/qemu/thread-win32.h
@@ -17,6 +17,10 @@ struct QemuSemaphore {
 HANDLE sema;
 };
 
+struct QemuEvent {
+HANDLE event;
+};
+
 typedef struct QemuThreadData QemuThreadData;
 struct QemuThread {
 QemuThreadData *data;
diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index c02404b..3e32c65 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -7,6 +7,7 @@
 typedef struct QemuMutex QemuMutex;
 typedef struct QemuCond QemuCond;
 typedef struct QemuSemaphore QemuSemaphore;
+typedef struct QemuEvent QemuEvent;
 typedef struct QemuThread QemuThread;
 
 #ifdef _WIN32
@@ -45,6 +46,12 @@ void qemu_sem_wait(QemuSemaphore *sem);
 int qemu_sem_timedwait(QemuSemaphore *sem, int ms);
 void qemu_sem_destroy(QemuSemaphore *sem);
 
+void qemu_event_init(QemuEvent *ev, bool init);
+void qemu_event_set(QemuEvent *ev);
+void qemu_event_reset(QemuEvent *ev);
+void qemu_event_wait(QemuEvent *ev);
+void qemu_event_destroy(QemuEvent *ev);
+
 void qemu_thread_create(QemuThread *thread,
 void *(*start_routine)(void *),
 void *arg, int mode);
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index 4489abf..8178f9b 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -20,7 +20,12 @@
 #include limits.h
 #include unistd.h
 #include sys/time.h
+#ifdef __linux__
+#include sys/syscall.h
+#include linux/futex.h
+#endif
 #include qemu/thread.h
+#include qemu/atomic.h
 
 static void error_exit(int err, const char *msg)
 {
@@ -268,6 +273,117 @@ void qemu_sem_wait(QemuSemaphore *sem)
 #endif
 }
 
+#ifdef __linux__
+#define futex(...)  syscall(__NR_futex, __VA_ARGS__)
+
+static inline void futex_wake(QemuEvent *ev, int n)
+{
+futex(ev, FUTEX_WAKE, n, NULL, NULL, 0);
+}
+
+static inline void futex_wait(QemuEvent *ev, unsigned val)
+{
+futex(ev, FUTEX_WAIT, (int) val, NULL, NULL, 0);
+}
+#else
+static inline void futex_wake(QemuEvent *ev, int n)
+{
+if (n == 1) {
+pthread_cond_signal(ev-cond);
+} else {
+pthread_cond_broadcast(ev-cond);
+}
+}
+
+static inline void futex_wait(QemuEvent *ev, unsigned val)
+{
+pthread_mutex_lock(ev-lock);
+if (ev-value == val) {
+pthread_cond_wait(ev-cond, ev-lock);
+}
+pthread_mutex_unlock(ev-lock);
+}
+#endif
+
+/* Valid transitions:
+ * - free-set, when setting the event
+ * - busy-set, when setting the event, followed by futex_wake
+ * - set-free, when resetting the event
+ * - free-busy, when waiting
+ *
+ * set-busy does not happen (it can be observed from the outside but
+ * it really is set-free-busy).
+ *
+ * busy-free provably cannot happen; to enforce it, the set-free transition
+ * is done with an OR, which becomes a no-op if the event has concurrently
+ * transitioned to free or busy.
+ */
+
+#define EV_SET 0
+#define EV_FREE1
+#define EV_BUSY   -1
+
+void qemu_event_init(QemuEvent *ev, bool init)
+{
+#ifndef __linux__
+pthread_mutex_init(ev-lock, NULL);
+

Re: [Qemu-devel] [PATCH] target-mips: fix decoding of microMIPS POOL32Axf instructions

2013-08-05 Thread Leon Alrae
On 03/08/13 23:01, Aurelien Jarno wrote:
 On Thu, Aug 01, 2013 at 11:02:27AM +0100, Leon Alrae wrote:
 These are not DSP instructions, thus there is no ac field.

 For more details please refer to instruction encoding of
 MULT, MULTU, MADD, MADDU, MSUB, MSUBU, MFHI, MFLO, MTHI, MTLO in
 MIPS Architecture for Programmers Volume II-B: The microMIPS32 Instruction 
 Set
 
 These instructions have both a non DSP version without the ac field,
 and a DSP version with the ac field. The encoding of the ac field is
 done in bits 14 and 15, so the current QEMU code is correct.
 
 Please refer to the  MIPS32® Architecture Manual Volume IV-e: The
 MIPS® DSP Application-Specific Extension to the microMIPS32®
 Architecture (document MD00764).
  

Please note that the patch modifies non-DSP version of listed
instructions, where ac field doesn't exist. In this case reading bits
14 and 15 and passing it as acc argument to gen_muldiv() or gen_HILO()
is not correct. If those bits are not 0 (and for most of the listed
instructions this is true) then check_dsp() is called which will cause
an exception if DSP is not supported / disabled.




Re: [Qemu-devel] [PATCH 0/4] dump-guest-memory: correct the vmcores

2013-08-05 Thread Laszlo Ersek
On 08/01/13 16:31, Luiz Capitulino wrote:
 On Thu, 1 Aug 2013 09:41:07 -0400
 Luiz Capitulino lcapitul...@redhat.com wrote:
 
 Applied to the qmp branch, thanks.
 
 Hmm, it brakes the build. Dropping it from the queue for now:
 
 /home/lcapitulino/work/src/upstream/qmp-unstable/target-s390x/arch_dump.c:179:5:
  error: conflicting types for ‘cpu_get_dump_info’
  int cpu_get_dump_info(ArchDumpInfo *info)
  ^
 In file included from 
 /home/lcapitulino/work/src/upstream/qmp-unstable/target-s390x/arch_dump.c:17:0:
 /home/lcapitulino/work/src/upstream/qmp-unstable/include/sysemu/dump.h:24:5: 
 note: previous declaration of ‘cpu_get_dump_info’ was here
  int cpu_get_dump_info(ArchDumpInfo *info,
  ^
 make[1]: *** [target-s390x/arch_dump.o] Error 1
 make: *** [subdir-s390x-softmmu] Error 2
 make: *** Waiting for unfinished jobs
 

My series was based on

  Author: Aurelien Jarno aurel...@aurel32.net  2013-07-29 09:03:23
  Committer: Aurelien Jarno aurel...@aurel32.net  2013-07-29 09:03:23

  Merge branch 'trivial-patches' of git://git.corpit.ru/qemu

and it compiled then just fine.

target-s390x/arch_dump.c was added in

  Author: Ekaterina Tumanova tuman...@linux.vnet.ibm.com  2013-07-10 15:26:46
  Committer: Christian Borntraeger borntrae...@de.ibm.com  2013-07-30 16:12:25

  s390: Implement dump-guest-memory support for target s390x

See the commit date: 2013-07-30.

I'll refresh the series.

Laszlo



Re: [Qemu-devel] [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Bug 1207686]

2013-08-05 Thread Stefan Hajnoczi
On Sun, Aug 04, 2013 at 03:36:52PM +0200, Oliver Francke wrote:
 Am 02.08.2013 um 23:47 schrieb Mike Dawson mike.daw...@cloudapt.com:
  We can un-wedge the guest by opening a NoVNC session or running a 'virsh 
  screenshot' command. After that, the guest resumes and runs as expected. At 
  that point we can examine the guest. Each time we'll see:

If virsh screenshot works then this confirms that QEMU itself is still
responding.  Its main loop cannot be blocked since it was able to
process the screendump command.

This supports Josh's theory that a callback is not being invoked.  The
virtio-blk I/O request would be left in a pending state.

Now here is where the behavior varies between configurations:

On a Windows guest with 1 vCPU, you may see the symptom that the guest no
longer responds to ping.

On a Linux guest with multiple vCPUs, you may see the hung task message
from the guest kernel because other vCPUs are still making progress.
Just the vCPU that issued the I/O request and whose task is in
UNINTERRUPTIBLE state would really be stuck.

Basically, the symptoms depend not just on how QEMU is behaving but also
on the guest kernel and how many vCPUs you have configured.

I think this can explain how both problems you are observing, Oliver and
Mike, are a result of the same bug.  At least I hope they are :).

Stefan



[Qemu-devel] [PATCH] block: Produce zeros when protocols reading beyond end of file

2013-08-05 Thread Asias He
From: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp

While Asias is debugging an issue creating qcow2 images on top of
non-file protocols.  It boils down to this example using NBD:

$ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'

Notice the open -g option to set bs-growable.  This means you can
read/write beyond end of file.  Reading beyond end of file is supposed
to produce zeroes.

We rely on this behavior in qcow2_create2() during qcow2 image
creation.  We create a new file and then write the qcow2 header
structure using bdrv_pwrite().  Since QCowHeader is not a multiple of
sector size, block.c first uses bdrv_read() on the empty file to fetch
the first sector (should be all zeroes).

Here is the output from the qemu-io NBD example above:

$ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'
:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  
0010:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  
0020:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  
...

We are not zeroing the buffer!  As a result qcow2 image creation on top
of protocols is not guaranteed to work even when file creation is
supported by the protocol.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
Signed-off-by: Asias He as...@redhat.com
---
 block.c | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 01b66d8..deaf0a0 100644
--- a/block.c
+++ b/block.c
@@ -2544,7 +2544,35 @@ static int coroutine_fn 
bdrv_co_do_readv(BlockDriverState *bs,
 }
 }
 
-ret = drv-bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
+if (!bs-drv-protocol_name) {
+ret = drv-bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
+} else {
+/* NBD doesn't support reading beyond end of file. */
+int64_t len, total_sectors, max_nb_sectors;
+
+len = bdrv_getlength(bs);
+if (len  0) {
+ret = len;
+goto out;
+}
+
+total_sectors = len  BDRV_SECTOR_BITS;
+max_nb_sectors = MAX(0, total_sectors - sector_num);
+if (max_nb_sectors  0) {
+ret = drv-bdrv_co_readv(bs, sector_num,
+ MIN(nb_sectors, max_nb_sectors), qiov);
+} else {
+ret = 0;
+}
+
+/* Reading beyond end of file is supposed to produce zeroes */
+if (ret == 0  total_sectors  sector_num + nb_sectors) {
+size_t offset = MAX(0, total_sectors - sector_num);
+size_t bytes = (sector_num + nb_sectors - offset) *
+BDRV_SECTOR_SIZE;
+qemu_iovec_memset(qiov, offset * BDRV_SECTOR_SIZE, 0, bytes);
+}
+}
 
 out:
 tracked_request_end(req);
-- 
1.8.3.1




[Qemu-devel] [PATCH v2 for-qmp-1.6 0/4] dump-guest-memory: correct the vmcores

2013-08-05 Thread Laszlo Ersek
v1-v2 [Luiz]:
- fix up the cpu_get_dump_info() prototype in target-s390x/arch_dump.c
  that has been introduced between my posting of v1 and Luiz's applying
  it to qmp-1.6. (Patch #4.)

v1 blurb:

Conceptually, the dump-guest-memory command works as follows:
(a) pause the guest,
(b) get a snapshot of the guest's physical memory map, as provided by
qemu,
(c) retrieve the guest's virtual mappings, as seen by the guest (this is
where paging=true vs. paging=false makes a difference),
(d) filter (c) as requested by the QMP caller,
(e) write ELF headers, keying off (b) -- the guest's physmap -- and (d)
-- the filtered guest mappings.
(f) dump RAM contents, keying off the same (b) and (d),
(g) unpause the guest (if necessary).

Patch #1 affects step (e); specifically, how (d) is matched against (b),
when paging is true, and the guest kernel maps more guest-physical
RAM than it actually has.

This can be done by non-malicious, clean-state guests (eg. a pristine
RHEL-6.4 guest), and may cause libbfd errors due to PT_LOAD entries
(coming directly from the guest page tables) exceeding the vmcore file's
size.

Patches #2 to #4 are independent of the paging option (or, more
precisely, affect them equally); they affect (b). Currently input
parameter (b), that is, the guest's physical memory map as provided by
qemu, is implicitly represented by ram_list.blocks. As a result, steps
and outputs dependent on (b) will refer to qemu-internal offsets.

Unfortunately, this breaks when the guest-visible physical addresses
diverge from the qemu-internal, RAMBlock based representation. This can
happen eg. for guests  3.5 GB, due to the 32-bit PCI hole; see patch #4
for a diagram.

Patch #2 introduces input parameter (b) explicitly, as a reasonably
minimal map of guest-physical address ranges. (Minimality is not a hard
requirement here, it just decreases the number of PT_LOAD entries
written to the vmcore header.) Patch #3 populates this map. Patch #4
rebases the dump-guest-memory command to it, so that steps (e) and (f)
work with guest-phys addresses.

As a result, the crash utility can parse vmcores dumped for big x86_64
guests (paging=false).

Please refer to Red Hat Bugzilla 981582
https://bugzilla.redhat.com/show_bug.cgi?id=981582.

Laszlo Ersek (4):
  dump: clamp guest-provided mapping lengths to ramblock sizes
  dump: introduce GuestPhysBlockList
  dump: populate guest_phys_blocks
  dump: rebase from host-private RAMBlock offsets to guest-physical
addresses

 include/sysemu/dump.h   |4 +-
 include/sysemu/memory_mapping.h |   30 ++-
 dump.c  |  171 +-
 memory_mapping.c|  174 +--
 stubs/dump.c|3 +-
 target-i386/arch_dump.c |   10 ++-
 target-s390x/arch_dump.c|3 +-
 7 files changed, 302 insertions(+), 93 deletions(-)




[Qemu-devel] [PATCH v2 for-qmp-1.6 2/4] dump: introduce GuestPhysBlockList

2013-08-05 Thread Laszlo Ersek
The vmcore must use physical addresses that are visible to the guest, not
addresses that point into linear RAMBlocks. As first step, introduce the
list type into which we'll collect the physical mappings in effect at the
time of the dump.

Related RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=981582

Signed-off-by: Laszlo Ersek ler...@redhat.com
---
 include/sysemu/memory_mapping.h |   22 ++
 dump.c  |   31 +++
 memory_mapping.c|   17 +
 3 files changed, 58 insertions(+), 12 deletions(-)

diff --git a/include/sysemu/memory_mapping.h b/include/sysemu/memory_mapping.h
index 6dfb68d..53c2cd5 100644
--- a/include/sysemu/memory_mapping.h
+++ b/include/sysemu/memory_mapping.h
@@ -17,6 +17,25 @@
 #include qemu/queue.h
 #include qemu/typedefs.h
 
+typedef struct GuestPhysBlock {
+/* visible to guest, reflects PCI hole, etc */
+hwaddr target_start;
+
+/* implies size */
+hwaddr target_end;
+
+/* points into host memory */
+uint8_t *host_addr;
+
+QTAILQ_ENTRY(GuestPhysBlock) next;
+} GuestPhysBlock;
+
+/* point-in-time snapshot of guest-visible physical mappings */
+typedef struct GuestPhysBlockList {
+unsigned num;
+QTAILQ_HEAD(, GuestPhysBlock) head;
+} GuestPhysBlockList;
+
 /* The physical and virtual address in the memory mapping are contiguous. */
 typedef struct MemoryMapping {
 hwaddr phys_addr;
@@ -45,6 +64,9 @@ void memory_mapping_list_free(MemoryMappingList *list);
 
 void memory_mapping_list_init(MemoryMappingList *list);
 
+void guest_phys_blocks_free(GuestPhysBlockList *list);
+void guest_phys_blocks_init(GuestPhysBlockList *list);
+
 void qemu_get_guest_memory_mapping(MemoryMappingList *list, Error **errp);
 
 /* get guest's memory mapping without do paging(virtual address is 0). */
diff --git a/dump.c b/dump.c
index 9a2f939..716fb1d 100644
--- a/dump.c
+++ b/dump.c
@@ -59,6 +59,7 @@ static uint64_t cpu_convert_to_target64(uint64_t val, int 
endian)
 }
 
 typedef struct DumpState {
+GuestPhysBlockList guest_phys_blocks;
 ArchDumpInfo dump_info;
 MemoryMappingList list;
 uint16_t phdr_num;
@@ -81,6 +82,7 @@ static int dump_cleanup(DumpState *s)
 {
 int ret = 0;
 
+guest_phys_blocks_free(s-guest_phys_blocks);
 memory_mapping_list_free(s-list);
 if (s-fd != -1) {
 close(s-fd);
@@ -728,31 +730,34 @@ static int dump_init(DumpState *s, int fd, bool paging, 
bool has_filter,
 s-resume = false;
 }
 
+/* If we use KVM, we should synchronize the registers before we get dump
+ * info or physmap info.
+ */
+cpu_synchronize_all_states();
+nr_cpus = 0;
+for (cpu = first_cpu; cpu != NULL; cpu = cpu-next_cpu) {
+nr_cpus++;
+}
+
 s-errp = errp;
 s-fd = fd;
 s-has_filter = has_filter;
 s-begin = begin;
 s-length = length;
+
+guest_phys_blocks_init(s-guest_phys_blocks);
+/* FILL LIST */
+
 s-start = get_start_block(s);
 if (s-start == -1) {
 error_set(errp, QERR_INVALID_PARAMETER, begin);
 goto cleanup;
 }
 
-/*
- * get dump info: endian, class and architecture.
+/* get dump info: endian, class and architecture.
  * If the target architecture is not supported, cpu_get_dump_info() will
  * return -1.
- *
- * If we use KVM, we should synchronize the registers before we get dump
- * info.
  */
-cpu_synchronize_all_states();
-nr_cpus = 0;
-for (cpu = first_cpu; cpu != NULL; cpu = cpu-next_cpu) {
-nr_cpus++;
-}
-
 ret = cpu_get_dump_info(s-dump_info);
 if (ret  0) {
 error_set(errp, QERR_UNSUPPORTED);
@@ -827,6 +832,8 @@ static int dump_init(DumpState *s, int fd, bool paging, 
bool has_filter,
 return 0;
 
 cleanup:
+guest_phys_blocks_free(s-guest_phys_blocks);
+
 if (s-resume) {
 vm_start();
 }
@@ -874,7 +881,7 @@ void qmp_dump_guest_memory(bool paging, const char *file, 
bool has_begin,
 return;
 }
 
-s = g_malloc(sizeof(DumpState));
+s = g_malloc0(sizeof(DumpState));
 
 ret = dump_init(s, fd, paging, has_begin, begin, length, errp);
 if (ret  0) {
diff --git a/memory_mapping.c b/memory_mapping.c
index 515a984..c70505b 100644
--- a/memory_mapping.c
+++ b/memory_mapping.c
@@ -165,6 +165,23 @@ void memory_mapping_list_init(MemoryMappingList *list)
 QTAILQ_INIT(list-head);
 }
 
+void guest_phys_blocks_free(GuestPhysBlockList *list)
+{
+GuestPhysBlock *p, *q;
+
+QTAILQ_FOREACH_SAFE(p, list-head, next, q) {
+QTAILQ_REMOVE(list-head, p, next);
+g_free(p);
+}
+list-num = 0;
+}
+
+void guest_phys_blocks_init(GuestPhysBlockList *list)
+{
+list-num = 0;
+QTAILQ_INIT(list-head);
+}
+
 static CPUState *find_paging_enabled_cpu(CPUState *start_cpu)
 {
 CPUState *cpu;
-- 
1.7.1





[Qemu-devel] [PATCH v2 for-qmp-1.6 1/4] dump: clamp guest-provided mapping lengths to ramblock sizes

2013-08-05 Thread Laszlo Ersek
Even a trusted  clean-state guest can map more memory than what it was
given. Since the vmcore contains RAMBlocks, mapping sizes should be
clamped to RAMBlock sizes. Otherwise such oversized mappings can exceed
the entire file size, and ELF parsers might refuse even the valid portion
of the PT_LOAD entry.

Related RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=981582

Signed-off-by: Laszlo Ersek ler...@redhat.com
---
 dump.c |   65 +++
 1 files changed, 40 insertions(+), 25 deletions(-)

diff --git a/dump.c b/dump.c
index 6a3a72a..9a2f939 100644
--- a/dump.c
+++ b/dump.c
@@ -187,7 +187,8 @@ static int write_elf32_header(DumpState *s)
 }
 
 static int write_elf64_load(DumpState *s, MemoryMapping *memory_mapping,
-int phdr_index, hwaddr offset)
+int phdr_index, hwaddr offset,
+hwaddr filesz)
 {
 Elf64_Phdr phdr;
 int ret;
@@ -197,15 +198,12 @@ static int write_elf64_load(DumpState *s, MemoryMapping 
*memory_mapping,
 phdr.p_type = cpu_convert_to_target32(PT_LOAD, endian);
 phdr.p_offset = cpu_convert_to_target64(offset, endian);
 phdr.p_paddr = cpu_convert_to_target64(memory_mapping-phys_addr, endian);
-if (offset == -1) {
-/* When the memory is not stored into vmcore, offset will be -1 */
-phdr.p_filesz = 0;
-} else {
-phdr.p_filesz = cpu_convert_to_target64(memory_mapping-length, 
endian);
-}
+phdr.p_filesz = cpu_convert_to_target64(filesz, endian);
 phdr.p_memsz = cpu_convert_to_target64(memory_mapping-length, endian);
 phdr.p_vaddr = cpu_convert_to_target64(memory_mapping-virt_addr, endian);
 
+assert(memory_mapping-length = filesz);
+
 ret = fd_write_vmcore(phdr, sizeof(Elf64_Phdr), s);
 if (ret  0) {
 dump_error(s, dump: failed to write program header table.\n);
@@ -216,7 +214,8 @@ static int write_elf64_load(DumpState *s, MemoryMapping 
*memory_mapping,
 }
 
 static int write_elf32_load(DumpState *s, MemoryMapping *memory_mapping,
-int phdr_index, hwaddr offset)
+int phdr_index, hwaddr offset,
+hwaddr filesz)
 {
 Elf32_Phdr phdr;
 int ret;
@@ -226,15 +225,12 @@ static int write_elf32_load(DumpState *s, MemoryMapping 
*memory_mapping,
 phdr.p_type = cpu_convert_to_target32(PT_LOAD, endian);
 phdr.p_offset = cpu_convert_to_target32(offset, endian);
 phdr.p_paddr = cpu_convert_to_target32(memory_mapping-phys_addr, endian);
-if (offset == -1) {
-/* When the memory is not stored into vmcore, offset will be -1 */
-phdr.p_filesz = 0;
-} else {
-phdr.p_filesz = cpu_convert_to_target32(memory_mapping-length, 
endian);
-}
+phdr.p_filesz = cpu_convert_to_target32(filesz, endian);
 phdr.p_memsz = cpu_convert_to_target32(memory_mapping-length, endian);
 phdr.p_vaddr = cpu_convert_to_target32(memory_mapping-virt_addr, endian);
 
+assert(memory_mapping-length = filesz);
+
 ret = fd_write_vmcore(phdr, sizeof(Elf32_Phdr), s);
 if (ret  0) {
 dump_error(s, dump: failed to write program header table.\n);
@@ -418,17 +414,24 @@ static int write_memory(DumpState *s, RAMBlock *block, 
ram_addr_t start,
 return 0;
 }
 
-/* get the memory's offset in the vmcore */
-static hwaddr get_offset(hwaddr phys_addr,
- DumpState *s)
+/* get the memory's offset and size in the vmcore */
+static void get_offset_range(hwaddr phys_addr,
+ ram_addr_t mapping_length,
+ DumpState *s,
+ hwaddr *p_offset,
+ hwaddr *p_filesz)
 {
 RAMBlock *block;
 hwaddr offset = s-memory_offset;
 int64_t size_in_block, start;
 
+/* When the memory is not stored into vmcore, offset will be -1 */
+*p_offset = -1;
+*p_filesz = 0;
+
 if (s-has_filter) {
 if (phys_addr  s-begin || phys_addr = s-begin + s-length) {
-return -1;
+return;
 }
 }
 
@@ -457,18 +460,26 @@ static hwaddr get_offset(hwaddr phys_addr,
 }
 
 if (phys_addr = start  phys_addr  start + size_in_block) {
-return phys_addr - start + offset;
+*p_offset = phys_addr - start + offset;
+
+/* The offset range mapped from the vmcore file must not spill over
+ * the RAMBlock, clamp it. The rest of the mapping will be
+ * zero-filled in memory at load time; see
+ * http://refspecs.linuxbase.org/elf/gabi4+/ch5.pheader.html.
+ */
+*p_filesz = phys_addr + mapping_length = start + size_in_block ?
+mapping_length :
+size_in_block - (phys_addr - start);
+return;
 }
 
 offset += size_in_block;
 }
-
-

[Qemu-devel] [PATCH v2 for-qmp-1.6 4/4] dump: rebase from host-private RAMBlock offsets to guest-physical addresses

2013-08-05 Thread Laszlo Ersek
RAMBlock.offset   -- GuestPhysBlock.target_start
RAMBlock.offset + RAMBlock.length -- GuestPhysBlock.target_end
RAMBlock.length   -- GuestPhysBlock.target_end -
  GuestPhysBlock.target_start

GuestPhysBlock.host_addr is only used when writing the dump contents.

This patch enables crash to work with the vmcore by rebasing the vmcore
from the left side of the following diagram to the right side:

host-private
offset
relative
to ram_addr   RAMBlock  guest-visible paddrs
0 +---+.+---+ 0
  | ^ | |^  |
  |   640 KB  | |  640 KB   |
  | v | |v  |
  0xa +---+.+---+ 0xa
  | ^ | |XXX|
  |   384 KB  | |XXX|
  | v | |XXX|
  0x00010 +---+.+---+ 0x00010
  | ^ | |^  |
  |   3583 MB | |  3583 MB  |
  | v | |v  |
  0x0e000 +---+.+---+ 0x0e000
  | ^ |.|XXX|
  | above_4g_mem_size | .   | PCI hole X|
  | v |  .  |  X|
 ram_size +---+   . |  512 MB  X|
   .   .|XXX|
.   +---+ 0x1
 .  | ^ |
  . | above_4g_mem_size |
   .| v |
+---+ ram_size
  + 512 MB

Related RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=981582

Signed-off-by: Laszlo Ersek ler...@redhat.com
---
 v1-v2:
 - fix up the cpu_get_dump_info() prototype in target-s390x/arch_dump.c
   that has been introduced between my posting of v1 and Luiz's applying
   it to qmp-1.6.

 include/sysemu/dump.h   |4 ++-
 include/sysemu/memory_mapping.h |7 +++-
 dump.c  |   77 +++
 memory_mapping.c|   22 +++-
 stubs/dump.c|3 +-
 target-i386/arch_dump.c |   10 +++--
 target-s390x/arch_dump.c|3 +-
 7 files changed, 69 insertions(+), 57 deletions(-)

diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
index b8c770f..19fafb2 100644
--- a/include/sysemu/dump.h
+++ b/include/sysemu/dump.h
@@ -20,7 +20,9 @@ typedef struct ArchDumpInfo {
 int d_class;/* ELFCLASS32 or ELFCLASS64 */
 } ArchDumpInfo;
 
-int cpu_get_dump_info(ArchDumpInfo *info);
+struct GuestPhysBlockList; /* memory_mapping.h */
+int cpu_get_dump_info(ArchDumpInfo *info,
+  const struct GuestPhysBlockList *guest_phys_blocks);
 ssize_t cpu_get_note_size(int class, int machine, int nr_cpus);
 
 #endif
diff --git a/include/sysemu/memory_mapping.h b/include/sysemu/memory_mapping.h
index 6723dc5..2d98a89 100644
--- a/include/sysemu/memory_mapping.h
+++ b/include/sysemu/memory_mapping.h
@@ -68,10 +68,13 @@ void guest_phys_blocks_free(GuestPhysBlockList *list);
 void guest_phys_blocks_init(GuestPhysBlockList *list);
 void guest_phys_blocks_append(GuestPhysBlockList *list);
 
-void qemu_get_guest_memory_mapping(MemoryMappingList *list, Error **errp);
+void qemu_get_guest_memory_mapping(MemoryMappingList *list,
+   const GuestPhysBlockList *guest_phys_blocks,
+   Error **errp);
 
 /* get guest's memory mapping without do paging(virtual address is 0). */
-void qemu_get_guest_simple_memory_mapping(MemoryMappingList *list);
+void qemu_get_guest_simple_memory_mapping(MemoryMappingList *list,
+  const GuestPhysBlockList *guest_phys_blocks);
 
 void memory_mapping_filter(MemoryMappingList *list, int64_t begin,
int64_t length);
diff --git a/dump.c b/dump.c
index 3fa33fc..c0dae2c 100644
--- a/dump.c
+++ b/dump.c
@@ -70,7 +70,7 @@ typedef struct DumpState {
 hwaddr memory_offset;
 int fd;
 
-RAMBlock *block;
+GuestPhysBlock *next_block;
 ram_addr_t start;
 bool has_filter;
 int64_t begin;
@@ -391,14 +391,14 @@ static int write_data(DumpState *s, void *buf, int length)
 }
 
 /* write the memroy to vmcore. 1 page per I/O. */
-static int write_memory(DumpState *s, RAMBlock *block, ram_addr_t start,
+static int write_memory(DumpState *s, GuestPhysBlock *block, ram_addr_t 

[Qemu-devel] [PATCH] virtio: Do not notify virtqueue if no element was pushed back.

2013-08-05 Thread Gal Hammer
Signed-off-by: Gal Hammer gham...@redhat.com
---
 hw/char/virtio-serial-bus.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index da417c7..0d38b4b 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -105,6 +105,7 @@ static void do_flush_queued_data(VirtIOSerialPort *port, 
VirtQueue *vq,
  VirtIODevice *vdev)
 {
 VirtIOSerialPortClass *vsc;
+bool elem_pushed = false;
 
 assert(port);
 assert(virtio_queue_ready(vq));
@@ -145,9 +146,12 @@ static void do_flush_queued_data(VirtIOSerialPort *port, 
VirtQueue *vq,
 break;
 }
 virtqueue_push(vq, port-elem, 0);
+elem_pushed = true;
 port-elem.out_num = 0;
 }
-virtio_notify(vdev, vq);
+if (elem_pushed) {
+virtio_notify(vdev, vq);
+}
 }
 
 static void flush_queued_data(VirtIOSerialPort *port)
-- 
1.8.1.4




[Qemu-devel] [PATCH v2 for-qmp-1.6 3/4] dump: populate guest_phys_blocks

2013-08-05 Thread Laszlo Ersek
While the machine is paused, in guest_phys_blocks_append() we register a
one-shot MemoryListener, solely for the initial collection of the valid
guest-physical memory ranges that happens at client registration time.

For each range that is reported to guest_phys_blocks_set_memory(), we
attempt to merge the range with adjacent (preceding, subsequent, or both)
ranges. We use two hash tables for this purpose, both indexing the same
ranges, just by different keys (guest-phys-start vs. guest-phys-end).

Ranges can only be joined if they are contiguous in both guest-physical
address space, and contiguous in host virtual address space.

The maximal ranges that remain in the end constitute the guest-physical
memory map that the dump will be based on.

Related RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=981582

Signed-off-by: Laszlo Ersek ler...@redhat.com
---
 include/sysemu/memory_mapping.h |1 +
 dump.c  |2 +-
 memory_mapping.c|  135 +++
 3 files changed, 137 insertions(+), 1 deletions(-)

diff --git a/include/sysemu/memory_mapping.h b/include/sysemu/memory_mapping.h
index 53c2cd5..6723dc5 100644
--- a/include/sysemu/memory_mapping.h
+++ b/include/sysemu/memory_mapping.h
@@ -66,6 +66,7 @@ void memory_mapping_list_init(MemoryMappingList *list);
 
 void guest_phys_blocks_free(GuestPhysBlockList *list);
 void guest_phys_blocks_init(GuestPhysBlockList *list);
+void guest_phys_blocks_append(GuestPhysBlockList *list);
 
 void qemu_get_guest_memory_mapping(MemoryMappingList *list, Error **errp);
 
diff --git a/dump.c b/dump.c
index 716fb1d..3fa33fc 100644
--- a/dump.c
+++ b/dump.c
@@ -746,7 +746,7 @@ static int dump_init(DumpState *s, int fd, bool paging, 
bool has_filter,
 s-length = length;
 
 guest_phys_blocks_init(s-guest_phys_blocks);
-/* FILL LIST */
+guest_phys_blocks_append(s-guest_phys_blocks);
 
 s-start = get_start_block(s);
 if (s-start == -1) {
diff --git a/memory_mapping.c b/memory_mapping.c
index c70505b..efaabf8 100644
--- a/memory_mapping.c
+++ b/memory_mapping.c
@@ -11,9 +11,13 @@
  *
  */
 
+#include glib.h
+
 #include cpu.h
 #include exec/cpu-all.h
 #include sysemu/memory_mapping.h
+#include exec/memory.h
+#include exec/address-spaces.h
 
 static void memory_mapping_list_add_mapping_sorted(MemoryMappingList *list,
MemoryMapping *mapping)
@@ -182,6 +186,137 @@ void guest_phys_blocks_init(GuestPhysBlockList *list)
 QTAILQ_INIT(list-head);
 }
 
+typedef struct GuestPhysListener {
+GHashTable *by_target_start;
+GHashTable *by_target_end;
+MemoryListener listener;
+} GuestPhysListener;
+
+static void guest_phys_blocks_region_add(MemoryListener *listener,
+ MemoryRegionSection *section)
+{
+GuestPhysListener *g;
+uint64_t section_size;
+hwaddr target_start, target_end;
+uint8_t *host_addr;
+GuestPhysBlock *predecessor, *successor, *block;
+bool found;
+
+/* we only care about RAM */
+if (!memory_region_is_ram(section-mr)) {
+return;
+}
+
+g= container_of(listener, GuestPhysListener, listener);
+section_size = int128_get64(section-size);
+target_start = section-offset_within_address_space;
+target_end   = target_start + section_size;
+host_addr= memory_region_get_ram_ptr(section-mr) +
+   section-offset_within_region;
+
+/* find continuity in guest physical address space */
+predecessor = g_hash_table_lookup(g-by_target_end,   target_start);
+successor   = g_hash_table_lookup(g-by_target_start, target_end);
+
+/* we require continuity in host memory too */
+if (predecessor != NULL) {
+hwaddr predecessor_size = predecessor-target_end -
+  predecessor-target_start;
+if (predecessor-host_addr + predecessor_size != host_addr) {
+predecessor = NULL;
+}
+}
+if (successor != NULL
+ host_addr + section_size != successor-host_addr) {
+successor = NULL;
+}
+
+if (predecessor == NULL) {
+if (successor == NULL) {
+/* Isolated mapping, allocate it and add it to both tables. */
+block = g_malloc0(sizeof *block);
+
+block-target_end = target_end;
+g_hash_table_insert(g-by_target_end, block-target_end, block);
+} else {
+/* Mapping has successor only. Merge current into successor by
+ * modifying successor's start. Successor's end doesn't change.
+ */
+block = successor;
+found = g_hash_table_steal(g-by_target_start,
+   block-target_start);
+g_assert(found);
+}
+block-target_start = target_start;
+block-host_addr= host_addr;
+g_hash_table_insert(g-by_target_start, block-target_start, 

Re: [Qemu-devel] [PATCH] target-ppc: Add POWER7+ CPU model

2013-08-05 Thread Andreas Färber
Am 05.08.2013 07:43, schrieb Paul Mackerras:
 On Fri, Aug 02, 2013 at 06:14:46PM +0200, Andreas Färber wrote:
 Am 02.08.2013 04:59, schrieb Alexey Kardashevskiy:
 This patch adds CPU PVR definition for POWER7+.

 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
  target-ppc/cpu-models.c | 2 ++
  target-ppc/cpu-models.h | 1 +
  2 files changed, 3 insertions(+)

 diff --git a/target-ppc/cpu-models.c b/target-ppc/cpu-models.c
 index 9578ed8..c97c183 100644
 --- a/target-ppc/cpu-models.c
 +++ b/target-ppc/cpu-models.c
 @@ -1143,6 +1143,8 @@
  POWER7 v2.1)
  POWERPC_DEF(POWER7_v2.3,   CPU_POWERPC_POWER7_v23, 
 POWER7,
  POWER7 v2.3)
 +POWERPC_DEF(POWER7P,   CPU_POWERPC_POWER7P,
 POWER7,
 +POWER7P)

 Subject says POWER7+ rather than POWER7P. Since this is a string there's
 nothing wrong with +. How should it show up in SLOF?
 
 Do you mean in the device tree?  The children of the /cpus node should
 be named like PowerPC,POWER7+@xxx.

Thanks.

 See also below.

  POWERPC_DEF(POWER8_v1.0,   CPU_POWERPC_POWER8_v10, 
 POWER8,
  POWER8 v1.0)
  POWERPC_DEF(970,   CPU_POWERPC_970,970,
 diff --git a/target-ppc/cpu-models.h b/target-ppc/cpu-models.h
 index 01e488f..c3c78d1 100644
 --- a/target-ppc/cpu-models.h
 +++ b/target-ppc/cpu-models.h
 @@ -556,6 +556,7 @@ enum {
  CPU_POWERPC_POWER7_v20 = 0x003F0200,
  CPU_POWERPC_POWER7_v21 = 0x003F0201,
  CPU_POWERPC_POWER7_v23 = 0x003F0203,
 +CPU_POWERPC_POWER7P= 0x004A0201,

 Shouldn't this be ..._POWER7P_v21 to align with the surrounding models?
 Ditto for the model name then, POWER7+ being an alias to the latest
 version only.
 
 Does the fact that all these minor revisions are enumerated imply that
 QEMU is always matching all the bits of the PVR value?

Yes, see target-ppc/kvm.c.

  If so then
 that seems very fragile to me, given that QEMU looks at the host's PVR
 value when using KVM.  All it takes is for IBM to release a new minor
 revision of a CPU to break existing compiled versions of qemu
 (e.g. from a distro).

  Wouldn't it be better to be able to match only
 the top 16 bits, at least for the situations where there is no
 architectural or significant behavioural change between the versions?

Not being the ppc maintainer, I can only point out that this is how QEMU
works today and that, independently, it makes sense to name the PVR and
model after the revision they encode for self-documentation.

Ben has pointed out that POWER5++ PVR-wise was a POWER5+ v3.0+ or so
and, according to you, POWER5++ is not just a revision with
hardware-only changes and a new marketing name. So no, it's not always
just the upper 16 bits.

For e500v1/v2 the revision appears to be encoded in the two least
significant nibbles rather than in the two least significant bytes.
For e300 it's in the second most significant byte.
For the 405/440 models it's not linear from a to b to c at all.
Cf. target-ppc/cpu-models.h.

If we wanted to make our matching less precise, we'd need to specify an
explicit wildcard per model. And we can only match known models, i.e. up
to v2.1 as long as we don't know whether v2.2 or v3.0 will have the same
guest-exposed/emulation features (in which case we could've picked them
instead), because otherwise we'd need a versioning mechanism like x86
has, to avoid guest-visible CPU features changing between releases once
we do know what features new revisions have.

Also, if we stopped modeling exact revisions, it may be a good idea to
add properties to the PowerPCCPU to specify a particular major/minor
revision within the wildcard. We may need to limit such properties to
IBM's CPU models since apparently that scheme does not apply to all
PVRs. CC'ing Scott and Stuart.

Examples:
-device POWER7+-powerpc64-cpu,major-revision=2,minor-revision=0
-device POWER7+-powerpc64-cpu,pvr=0x12345678

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-08-05 Thread Zhanghaoyu (A)
   hi all,
   
   I met similar problem to these, while performing live migration or 
   save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, 
   guest:suse11sp2), running tele-communication software suite in 
   guest, 
   https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
   http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
   http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
   https://bugzilla.kernel.org/show_bug.cgi?id=58771
   
   After live migration or virsh restore [savefile], one process's CPU 
   utilization went up by about 30%, resulted in throughput 
   degradation of this process.
   
   If EPT disabled, this problem gone.
   
   I suspect that kvm hypervisor has business with this problem.
   Based on above suspect, I want to find the two adjacent versions of 
   kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1), 
   and analyze the differences between this two versions, or apply the 
   patches between this two versions by bisection method, finally find 
   the key patches.
   
   Any better ideas?
   
   Thanks,
   Zhang Haoyu
  
  I've attempted to duplicate this on a number of machines that are as 
  similar to yours as I am able to get my hands on, and so far have not 
  been able to see any performance degradation. And from what I've read in 
  the above links, huge pages do not seem to be part of the problem.
  
  So, if you are in a position to bisect the kernel changes, that would 
  probably be the best avenue to pursue in my opinion.
  
  Bruce
  
  I found the first bad 
  commit([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: propagate fault 
  r/w information to gup(), allow read-only memory) which triggers this 
  problem by git bisecting the kvm kernel (download from 
  https://git.kernel.org/pub/scm/virt/kvm/kvm.git) changes.
  
  And,
  git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p  
  612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
  git diff 
  612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc4
  02f13b1b63f7e4  612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff
  
  Then, I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log and 
  612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff,
  came to a conclusion that all of the differences between 
  612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and 
  612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
  are contributed by no other than 
  612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, so this commit is the 
  peace-breaker which directly or indirectly causes the degradation.
  
  Does the map_writable flag passed to mmu_set_spte() function have effect 
  on PTE's PAT flag or increase the VMEXITs induced by that guest tried to 
  write read-only memory?
  
  Thanks,
  Zhang Haoyu
  
 
 There should be no read-only memory maps backing guest RAM.
 
 Can you confirm map_writable = false is being passed to __direct_map? (this 
 should not happen, for guest RAM).
 And if it is false, please capture the associated GFN.
 
 I added below check and printk at the start of __direct_map() at the fist 
 bad commit version,
 --- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c 
 2013-07-26 18:44:05.0 +0800
 +++ kvm-612819/arch/x86/kvm/mmu.c   2013-07-31 00:05:48.0 +0800
 @@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu
 int pt_write = 0;
 gfn_t pseudo_gfn;
 
 +if (!map_writable)
 +printk(KERN_ERR %s: %s: gfn = %llu \n, __FILE__, 
 __func__, gfn);
 +
 for_each_shadow_entry(vcpu, (u64)gfn  PAGE_SHIFT, iterator) {
 if (iterator.level == level) {
 unsigned pte_access = ACC_ALL;
 
 I virsh-save the VM, and then virsh-restore it, so many GFNs were printed, 
 you can absolutely describe it as flooding.
 
The flooding you see happens during migrate to file stage because of dirty
page tracking. If you clear dmesg after virsh-save you should not see any
flooding after virsh-restore. I just checked with latest tree, I do not.

I made a verification again.
I virsh-save the VM, during the saving stage, I run 'dmesg', no GFN printed, 
maybe the switching from running stage to pause stage takes so short time, 
no guest-write happens during this switching period.
After the completion of saving operation, I run 'demsg -c' to clear the buffer 
all the same, then I virsh-restore the VM, so many GFNs are printed by running 
'dmesg',
and I also run 'tail -f /var/log/messages' during the restoring stage, so many 
GFNs are flooded dynamically too.
I'm sure that the flooding happens during the virsh-restore stage, not the 
migration stage.

On VM's normal starting stage, only very few GFNs are printed, shown as below
gfn = 16
gfn = 604
gfn = 605
gfn = 606
gfn = 607
gfn = 608
gfn = 609

but on the VM's restoring stage, so many GFNs are printed, taking some examples 
shown as below,
2042600
279
2797778
2797779
2797780
2797781
2797782
2797783
2797784

Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-08-05 Thread Gleb Natapov
On Mon, Aug 05, 2013 at 08:35:09AM +, Zhanghaoyu (A) wrote:
hi all,

I met similar problem to these, while performing live migration or 
save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, 
guest:suse11sp2), running tele-communication software suite in 
guest, 
https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
https://bugzilla.kernel.org/show_bug.cgi?id=58771

After live migration or virsh restore [savefile], one process's CPU 
utilization went up by about 30%, resulted in throughput 
degradation of this process.

If EPT disabled, this problem gone.

I suspect that kvm hypervisor has business with this problem.
Based on above suspect, I want to find the two adjacent versions of 
kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1), 
and analyze the differences between this two versions, or apply the 
patches between this two versions by bisection method, finally find 
the key patches.

Any better ideas?

Thanks,
Zhang Haoyu
   
   I've attempted to duplicate this on a number of machines that are as 
   similar to yours as I am able to get my hands on, and so far have not 
   been able to see any performance degradation. And from what I've read 
   in the above links, huge pages do not seem to be part of the problem.
   
   So, if you are in a position to bisect the kernel changes, that would 
   probably be the best avenue to pursue in my opinion.
   
   Bruce
   
   I found the first bad 
   commit([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: propagate fault 
   r/w information to gup(), allow read-only memory) which triggers this 
   problem by git bisecting the kvm kernel (download from 
   https://git.kernel.org/pub/scm/virt/kvm/kvm.git) changes.
   
   And,
   git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p  
   612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
   git diff 
   612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc4
   02f13b1b63f7e4  612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff
   
   Then, I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log and 
   612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff,
   came to a conclusion that all of the differences between 
   612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and 
   612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
   are contributed by no other than 
   612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, so this commit is the 
   peace-breaker which directly or indirectly causes the degradation.
   
   Does the map_writable flag passed to mmu_set_spte() function have 
   effect on PTE's PAT flag or increase the VMEXITs induced by that guest 
   tried to write read-only memory?
   
   Thanks,
   Zhang Haoyu
   
  
  There should be no read-only memory maps backing guest RAM.
  
  Can you confirm map_writable = false is being passed to __direct_map? 
  (this should not happen, for guest RAM).
  And if it is false, please capture the associated GFN.
  
  I added below check and printk at the start of __direct_map() at the fist 
  bad commit version,
  --- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c 
  2013-07-26 18:44:05.0 +0800
  +++ kvm-612819/arch/x86/kvm/mmu.c   2013-07-31 00:05:48.0 +0800
  @@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu
  int pt_write = 0;
  gfn_t pseudo_gfn;
  
  +if (!map_writable)
  +printk(KERN_ERR %s: %s: gfn = %llu \n, __FILE__, 
  __func__, gfn);
  +
  for_each_shadow_entry(vcpu, (u64)gfn  PAGE_SHIFT, iterator) {
  if (iterator.level == level) {
  unsigned pte_access = ACC_ALL;
  
  I virsh-save the VM, and then virsh-restore it, so many GFNs were printed, 
  you can absolutely describe it as flooding.
  
 The flooding you see happens during migrate to file stage because of dirty
 page tracking. If you clear dmesg after virsh-save you should not see any
 flooding after virsh-restore. I just checked with latest tree, I do not.
 
 I made a verification again.
 I virsh-save the VM, during the saving stage, I run 'dmesg', no GFN printed, 
 maybe the switching from running stage to pause stage takes so short time, 
 no guest-write happens during this switching period.
 After the completion of saving operation, I run 'demsg -c' to clear the 
 buffer all the same, then I virsh-restore the VM, so many GFNs are printed by 
 running 'dmesg',
 and I also run 'tail -f /var/log/messages' during the restoring stage, so 
 many GFNs are flooded dynamically too.
 I'm sure that the flooding happens during the virsh-restore stage, not the 
 migration stage.
 
Interesting, is this with upstream kernel? For me the situation is
exactly the opposite. What is your command line?
 
--
   

Re: [Qemu-devel] [PATCH for-1.6] target-mips: do not raise exceptions when accessing invalid memory

2013-08-05 Thread Andreas Färber
Am 05.08.2013 00:04, schrieb Aurélien Jarno:
 On Mon, Jul 29, 2013 at 10:35:31PM +0200, Stefan Weil wrote:
 Am 27.07.2013 22:58, schrieb Stefan Weil:
 Am 27.07.2013 22:43, schrieb Andreas Färber:
 Am 27.07.2013 21:37, schrieb Stefan Weil:
 Am 27.07.2013 19:43, schrieb Peter Maydell:
 On 27 July 2013 17:18, Hervé Poussineau hpous...@reactos.org wrote:
 Another solution would be to add a big dummy memory regions on all MIPS 
 boards
 to catch memory accesses and not raise an exception. However, this 
 means that
 each MIPS board will have its own unassigned memory handler, different 
 from the
 global QEMU one.
 Better would be to at least provide fake RAZ/WI implementations of
 devices for the boards, rather than making the dummy region cover
 the whole of the address space. Not 1.6 material, though.


 For MIPS Malta, Linux boot can be fixed by handling read access for two
 addresses:

 0x1fbf8008
 0x1bc80110

 The corresponding definitions in the Linux kernel code seem to be these
 lines:

 #define GCMP_BASE_ADDR  0x1fbf8000
 #define GCMP_ADDRSPACE_SZ   (256 * 1024)
 #define GCMP_GCB_GCMPB_OFS  0x0008  /* Global GCMP
 Base */

 #define MSC01_BIU_REG_BASE  0x1bc8
 #define MSC01_BIU_ADDRSPACE_SZ  (256 * 1024)
 #define MSC01_SC_CFG_OFS0x0110

 = mips_malta.c needs a handler for reads of
 (GCMP_BASE_ADDR + GCMP_GCB_GCMPB_OFS) and
 (MSC01_BIU_REG_BASE + MSC01_SC_CFG_OFS).
 
 I don't think it would be correct to emulate them as this are not
 present on the real Malta board, at least for the model emulated by
 QEMU. Theses addresses correspond to the SMP controller, and is
 therefore only present when an SMP daughter card is installed.
 
 The Linux kernel probes theses addresses, and look if they return
 something consistent. If not the corresponding devices are latter
 ignored.
 
 The real hardware probably returns all 1 or all 0 for addresses not
 decoded to a device. This is what QEMU should model, and it should
 not trigger a DBE or IBE exception.
[snip]

Have you tested Jan's patches limiting the new unassigned read value -1
to PIO?

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] [PATCH for-1.6] target-mips: do not raise exceptions when accessing invalid memory

2013-08-05 Thread Jan Kiszka
On 2013-08-05 10:45, Andreas Färber wrote:
 Am 05.08.2013 00:04, schrieb Aurélien Jarno:
 On Mon, Jul 29, 2013 at 10:35:31PM +0200, Stefan Weil wrote:
 Am 27.07.2013 22:58, schrieb Stefan Weil:
 Am 27.07.2013 22:43, schrieb Andreas Färber:
 Am 27.07.2013 21:37, schrieb Stefan Weil:
 Am 27.07.2013 19:43, schrieb Peter Maydell:
 On 27 July 2013 17:18, Hervé Poussineau hpous...@reactos.org wrote:
 Another solution would be to add a big dummy memory regions on all 
 MIPS boards
 to catch memory accesses and not raise an exception. However, this 
 means that
 each MIPS board will have its own unassigned memory handler, different 
 from the
 global QEMU one.
 Better would be to at least provide fake RAZ/WI implementations of
 devices for the boards, rather than making the dummy region cover
 the whole of the address space. Not 1.6 material, though.


 For MIPS Malta, Linux boot can be fixed by handling read access for two
 addresses:

 0x1fbf8008
 0x1bc80110

 The corresponding definitions in the Linux kernel code seem to be these
 lines:

 #define GCMP_BASE_ADDR  0x1fbf8000
 #define GCMP_ADDRSPACE_SZ   (256 * 1024)
 #define GCMP_GCB_GCMPB_OFS  0x0008  /* Global GCMP
 Base */

 #define MSC01_BIU_REG_BASE  0x1bc8
 #define MSC01_BIU_ADDRSPACE_SZ  (256 * 1024)
 #define MSC01_SC_CFG_OFS0x0110

 = mips_malta.c needs a handler for reads of
 (GCMP_BASE_ADDR + GCMP_GCB_GCMPB_OFS) and
 (MSC01_BIU_REG_BASE + MSC01_SC_CFG_OFS).

 I don't think it would be correct to emulate them as this are not
 present on the real Malta board, at least for the model emulated by
 QEMU. Theses addresses correspond to the SMP controller, and is
 therefore only present when an SMP daughter card is installed.

 The Linux kernel probes theses addresses, and look if they return
 something consistent. If not the corresponding devices are latter
 ignored.

 The real hardware probably returns all 1 or all 0 for addresses not
 decoded to a device. This is what QEMU should model, and it should
 not trigger a DBE or IBE exception.
 [snip]
 
 Have you tested Jan's patches limiting the new unassigned read value -1
 to PIO?

Yeah, this sounds a lot like another side effect of using
unassigned_mem_ops for PIO...

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH 0/12 RFC v2] Localhost migration

2013-08-05 Thread Lei Li

On 07/26/2013 05:41 PM, Paolo Bonzini wrote:

Il 25/07/2013 22:18, Lei Li ha scritto:

Hi,

This patch series tries to add localhost migration support to
Qemu.

When doing localhost migration, the host memory will balloon
up during the period, might consume double memories for some time.
So we want to add a new live migration mechanism localhost
migration.

Following I copied from last version that Anthony added for the
benefit of the other reviewers:

The goal here is to allow live upgrade of a running QEMU instance.
The work flow would look like this:

1) Guests are running QEMU release 1.6.1
2) Admin installs QEMU release 1.6.2 via RPM or deb
3) Admin does localhost migration with page flipping to use new version
of QEMU.

Page flipping is used in order to avoid requiring that there is enough
free memory to fit an additional copy of the largest guest which is the
requirement today with localhost migration.

You can also read from the link below:
http://lists.gnu.org/archive/html/qemu-devel/2013-06/msg02577.html

The plan is:

1) Add new command to do localhost migration.

The qmp interface introduced like:

{ 'command': 'localhost-migrate', 'data': {'uri': 'str'} }

2) Use different mechanism than current live migration.

The very basic work flow like:

qemu on the source (the source and destination are both on localhost)
   |
   V
Stop VM
   |
   V
   Create threads
   |
   V
Page flipping through vmspice
   |
   V
MADV_DONTNEED the ram pages which are already flipped
   |
   V
Migration completes

As stopping VM first, we expect/resume the page flipping through vmspice
is fast enough to meet *live migration (low downtime).

Notes:
Currently the work flow is not exactly the same as description
above. For the first step, the work flow we implemented is:
stop VM and copy ram pages via unix domain socket, MADV_DONTNEED
ram pages that already copied. After that, will replace to vmsplice
mechanism instead of copying pages.

Now it's still a draft version, and as it is implemented separately
to the current migration code for a easy start, the next step will
be trying to integrate it into the current migration implementation
closely. To make sure we are on the right direction that should be
headed, please let me know your suggestions on this.

For the interface, as Anthony has suggested using a flag or a
capability, which one would you prefer or any ideas?

Using a capability on the source makes sense; I don't think anything
special is needed on the destination.  The destination just sees a
special packet telling it that a pipe is available via SCM_RIGHTS; then
it fetches the file descriptor and uses it for the new protocol.

It looks like this could reuse a lot of the RAM copying hooks that we
introduced for RDMA.  You shouldn't need to change anything in savevm.c
or arch_init.c


Hi Paolo,

Thanks for your suggestions!

And sorry for the late reply, I am looking into RDMA implementation and
trying to figure out the way to integrate with its approach as you suggested.

For the interface on the source, I am OK with using a capability.
For the destination, currently we are copying ram pages via unix
domain socket as first step, will replace to pipe later. How about
add a new URI prefix as Michael R.Hines suggested?




Paolo


Your comments are very welcome!

TODO:
- Integrate to the current implement of migration closely?
- Introduce a mechanism to exchange a PIPE via SCM_RIGHTS.
- benchmark/evaluation.

Lei Li (12):
   migration: export MIG_STATE_xxx flags
   savevm: export qemu_save_device_state()
   rename is_active to is_block_active
   arch_init: introduce ram_page_save()
   arch_init: introduce ram_save_local()
   arch_init: add save_local_setup to savevm_ram_handlers
   savevm: introduce qemu_savevm_local()
   savevm: adjust is_ram check in register_savevm_live()
   migration-local: implementation of outgoing part
   migration-local: implementation of incoming part
   migration-local: add option to command line for local incoming
   hmp:add hmp_localhost_migration interface

  Makefile.objs |1 +
  arch_init.c   |   110 +++
  block-migration.c |2 +-
  hmp-commands.hx   |   17 
  hmp.c |   13 +++
  hmp.h |1 +
  include/migration/migration.h |   32 +++
  include/sysemu/sysemu.h   |1 +
  migration-local.c |  228 +
  migration-unix.c  |   60 
  migration.c   |8 --
  qapi-schema.json  |   14 +++
  qemu-options.hx   |9 ++
  qmp-commands.hx   |   22 +
  savevm.c  |   100 +++--
  vl.c

[Qemu-devel] [PATCH for-1.6] qemu-img: Error out for excess arguments

2013-08-05 Thread Kevin Wolf
Don't silently ignore excess arguments at the end of the command line,
but error out instead. This can catch typos like 'resize test.img + 1G',
which doesn't increase the image size by 1G as intended, but truncates
the image to 1G. Even for less dangerous commands, the old behaviour is
confusing.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 qemu-img.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index c55ca5c..dece1b3 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -396,6 +396,9 @@ static int img_create(int argc, char **argv)
 }
 img_size = (uint64_t)sval;
 }
+if (optind != argc) {
+help();
+}
 
 if (options  is_help_option(options)) {
 return print_block_option_help(filename, fmt);
@@ -573,7 +576,7 @@ static int img_check(int argc, char **argv)
 break;
 }
 }
-if (optind = argc) {
+if (optind != argc - 1) {
 help();
 }
 filename = argv[optind++];
@@ -684,7 +687,7 @@ static int img_commit(int argc, char **argv)
 break;
 }
 }
-if (optind = argc) {
+if (optind != argc - 1) {
 help();
 }
 filename = argv[optind++];
@@ -930,7 +933,7 @@ static int img_compare(int argc, char **argv)
 }
 
 
-if (optind  argc - 2) {
+if (optind != argc - 2) {
 help();
 }
 filename1 = argv[optind++];
@@ -1741,7 +1744,7 @@ static int img_info(int argc, char **argv)
 break;
 }
 }
-if (optind = argc) {
+if (optind != argc - 1) {
 help();
 }
 filename = argv[optind++];
@@ -1842,7 +1845,7 @@ static int img_snapshot(int argc, char **argv)
 }
 }
 
-if (optind = argc) {
+if (optind != argc - 1) {
 help();
 }
 filename = argv[optind++];
@@ -1953,7 +1956,7 @@ static int img_rebase(int argc, char **argv)
 progress = 0;
 }
 
-if ((optind = argc) || (!unsafe  !out_baseimg)) {
+if ((optind != argc - 1) || (!unsafe  !out_baseimg)) {
 help();
 }
 filename = argv[optind++];
@@ -2232,7 +2235,7 @@ static int img_resize(int argc, char **argv)
 break;
 }
 }
-if (optind = argc) {
+if (optind != argc - 1) {
 help();
 }
 filename = argv[optind++];
-- 
1.8.1.4




Re: [Qemu-devel] [PATCH] savevm: set right return value for qemu_file_rate_limit

2013-08-05 Thread Lei Li

PING? any comments?

On 07/24/2013 11:48 PM, Lei Li wrote:
In the logic of ram_save_iterate(), it checks the ret of 
qemu_file_rate_limit(f)
by ret  0 if there has been an error. But now it will never return 
negative value

because qemu_file_rate_limit() return 1 if qemu_file_get_error().

Also the original implementation of qemu_file_rate_limit() as function
buffered_rate_limit() did return negative for this situation.


On 07/21/2013 08:56 PM, Lei Li wrote:

Commit 1964a397063967acc5ce71a2a24ed26e74824ee1 refactors rate
limiting to QEMUFile, but set the return value for qemu_file_rate_limit
to 1 in the case of qemu_file_get_error. It is wrong and should be 
negative
compared to the original function buffered_rate_limit and the current 
logic
in ram_save_iterate. As qemu_file_rate_limit is called manually to 
determine
if it has to exit, add the defination of the meaning of the return 
values

as well.


Signed-off-by: Lei Li li...@linux.vnet.ibm.com
---
  savevm.c |   14 --
  1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/savevm.c b/savevm.c
index e0491e7..f406790 100644
--- a/savevm.c
+++ b/savevm.c
@@ -904,10 +904,20 @@ int64_t qemu_ftell(QEMUFile *f)
  return f-pos;
  }

+/*
+ * The meaning of the return values is:
+ *   0: We can continue sending
+ *   1: Time to stop
+ *   negative: There has been an error
+ */
+
  int qemu_file_rate_limit(QEMUFile *f)
  {
-if (qemu_file_get_error(f)) {
-return 1;
+int ret;
+
+ret = qemu_file_get_error(f);
+if (ret) {
+return ret;
  }
  if (f-xfer_limit  0  f-bytes_xfer  f-xfer_limit) {
  return 1;






--
Lei




Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-08-05 Thread Zhanghaoyu (A)
hi all,

I met similar problem to these, while performing live migration or 
save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, 
guest:suse11sp2), running tele-communication software suite in 
guest, 
https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
https://bugzilla.kernel.org/show_bug.cgi?id=58771

After live migration or virsh restore [savefile], one process's CPU 
utilization went up by about 30%, resulted in throughput 
degradation of this process.

If EPT disabled, this problem gone.

I suspect that kvm hypervisor has business with this problem.
Based on above suspect, I want to find the two adjacent versions of 
kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1), 
and analyze the differences between this two versions, or apply the 
patches between this two versions by bisection method, finally find 
the key patches.

Any better ideas?

Thanks,
Zhang Haoyu
   
   I've attempted to duplicate this on a number of machines that are as 
   similar to yours as I am able to get my hands on, and so far have not 
   been able to see any performance degradation. And from what I've read 
   in the above links, huge pages do not seem to be part of the problem.
   
   So, if you are in a position to bisect the kernel changes, that would 
   probably be the best avenue to pursue in my opinion.
   
   Bruce
   
   I found the first bad 
   commit([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: propagate fault 
   r/w information to gup(), allow read-only memory) which triggers this 
   problem by git bisecting the kvm kernel (download from 
   https://git.kernel.org/pub/scm/virt/kvm/kvm.git) changes.
   
   And,
   git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p  
   612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
   git diff 
   612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc4
   02f13b1b63f7e4  612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff
   
   Then, I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log and 
   612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff,
   came to a conclusion that all of the differences between 
   612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and 
   612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
   are contributed by no other than 
   612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, so this commit is the 
   peace-breaker which directly or indirectly causes the degradation.
   
   Does the map_writable flag passed to mmu_set_spte() function have 
   effect on PTE's PAT flag or increase the VMEXITs induced by that guest 
   tried to write read-only memory?
   
   Thanks,
   Zhang Haoyu
   
  
  There should be no read-only memory maps backing guest RAM.
  
  Can you confirm map_writable = false is being passed to __direct_map? 
  (this should not happen, for guest RAM).
  And if it is false, please capture the associated GFN.
  
  I added below check and printk at the start of __direct_map() at the fist 
  bad commit version,
  --- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c 
  2013-07-26 18:44:05.0 +0800
  +++ kvm-612819/arch/x86/kvm/mmu.c   2013-07-31 00:05:48.0 
  +0800
  @@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu
  int pt_write = 0;
  gfn_t pseudo_gfn;
  
  +if (!map_writable)
  +printk(KERN_ERR %s: %s: gfn = %llu \n, __FILE__, 
  __func__, gfn);
  +
  for_each_shadow_entry(vcpu, (u64)gfn  PAGE_SHIFT, iterator) {
  if (iterator.level == level) {
  unsigned pte_access = ACC_ALL;
  
  I virsh-save the VM, and then virsh-restore it, so many GFNs were 
  printed, you can absolutely describe it as flooding.
  
 The flooding you see happens during migrate to file stage because of dirty
 page tracking. If you clear dmesg after virsh-save you should not see any
 flooding after virsh-restore. I just checked with latest tree, I do not.
 
 I made a verification again.
 I virsh-save the VM, during the saving stage, I run 'dmesg', no GFN printed, 
 maybe the switching from running stage to pause stage takes so short time, 
 no guest-write happens during this switching period.
 After the completion of saving operation, I run 'demsg -c' to clear the 
 buffer all the same, then I virsh-restore the VM, so many GFNs are printed 
 by running 'dmesg',
 and I also run 'tail -f /var/log/messages' during the restoring stage, so 
 many GFNs are flooded dynamically too.
 I'm sure that the flooding happens during the virsh-restore stage, not the 
 migration stage.
 
Interesting, is this with upstream kernel? For me the situation is
exactly the opposite. What is your command line?
 
I made the verification on the first bad commit 

Re: [Qemu-devel] [PATCH] target-ppc: Add POWER7+ CPU model

2013-08-05 Thread Benjamin Herrenschmidt
On Mon, 2013-08-05 at 10:24 +0200, Andreas Färber wrote:
 Ben has pointed out that POWER5++ PVR-wise was a POWER5+ v3.0+ or so
 and, according to you, POWER5++ is not just a revision with
 hardware-only changes and a new marketing name. So no, it's not always
 just the upper 16 bits.

Also 4xx uses a different bit split etc... that's why qemu should do
like the kernel and use a mask/value pair to do the matching.

Cheers,
Ben.





Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-08-05 Thread Andreas Färber
Hi,

Am 05.08.2013 11:09, schrieb Zhanghaoyu (A):
 When I build the upstream, encounter a problem that I compile and install the 
 upstream(commit: e769ece3b129698d2b09811a6f6d304e4eaa8c29) on sles11sp2 
 environment via below command
 cp /boot/config-3.0.13-0.27-default ./.config
 yes  | make oldconfig
 make  make modules_install  make install
 then, I reboot the host, and select the upstream kernel, but during the 
 starting stage, below problem happened,
 Could not find /dev/disk/by-id/scsi-3600508e0864407c5b8f7ad01-part3 
 
 I'm trying to resolve it.

Possibly you need to enable loading unsupported kernel modules?
At least that's needed when testing a kmod with a SUSE kernel.

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



[Qemu-devel] [PATCH] pci: fix i82801b11 bridge

2013-08-05 Thread Gerd Hoffmann
Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
 hw/pci-bridge/i82801b11.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/pci-bridge/i82801b11.c b/hw/pci-bridge/i82801b11.c
index 8a5e426..14cd7fd 100644
--- a/hw/pci-bridge/i82801b11.c
+++ b/hw/pci-bridge/i82801b11.c
@@ -90,6 +90,7 @@ static void i82801b11_bridge_class_init(ObjectClass *klass, 
void *data)
 k-device_id = PCI_DEVICE_ID_INTEL_82801BA_11;
 k-revision = ICH9_D2P_A2_REVISION;
 k-init = i82801b11_bridge_initfn;
+k-config_write = pci_bridge_write_config;
 set_bit(DEVICE_CATEGORY_BRIDGE, dc-categories);
 }
 
-- 
1.7.9.7




Re: [Qemu-devel] [Patch] ARM: Add an L2 cache controller to KZM

2013-08-05 Thread Peter Maydell
On 5 August 2013 02:21, Peter Chubb peter.ch...@nicta.com.au wrote:
 Reads to unassigned memory now return non-zero (since patch
 9b8c69243585). This breaks guests runnong on i.MX31 that use the
 cache controller --- they poll forever waiting for the L2CC cache
 invalidate regsiter to be zero.

That commit claims it was just restoring the previous
behaviour, so it shouldn't break guests, surely?

-- PMM



Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-08-05 Thread Zhanghaoyu (A)
Hi,

Am 05.08.2013 11:09, schrieb Zhanghaoyu (A):
 When I build the upstream, encounter a problem that I compile and 
 install the upstream(commit: e769ece3b129698d2b09811a6f6d304e4eaa8c29) 
 on sles11sp2 environment via below command cp 
 /boot/config-3.0.13-0.27-default ./.config yes  | make oldconfig 
 make  make modules_install  make install then, I reboot the host, 
 and select the upstream kernel, but during the starting stage, below 
 problem happened, Could not find 
 /dev/disk/by-id/scsi-3600508e0864407c5b8f7ad01-part3
 
 I'm trying to resolve it.

Possibly you need to enable loading unsupported kernel modules?
At least that's needed when testing a kmod with a SUSE kernel.

I have tried to set  allow_unsupported_modules 1 in 
/etc/modprobe.d/unsupported-modules, but the problem still happened.
I replace the whole kernel with the kvm kernel, not only the kvm modules.

Regards,
Andreas


Re: [Qemu-devel] [Patch] ARM: Add an L2 cache controller to KZM

2013-08-05 Thread Andreas Färber
Am 05.08.2013 11:18, schrieb Peter Maydell:
 On 5 August 2013 02:21, Peter Chubb peter.ch...@nicta.com.au wrote:
 Reads to unassigned memory now return non-zero (since patch
 9b8c69243585). This breaks guests runnong on i.MX31 that use the
 cache controller --- they poll forever waiting for the L2CC cache
 invalidate regsiter to be zero.
 
 That commit claims it was just restoring the previous
 behaviour, so it shouldn't break guests, surely?

See Jan's patches on the list. PReP was reported affected, too.

Peter Ch., if you know the exact differences, why don't you just derive
an imx-l2cc type (or so) derived from ARM's type, overriding the values
mentioned above? Sounds trivial to me.

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] [PATCH v6] e1000: add interrupt mitigation support

2013-08-05 Thread Stefan Hajnoczi
On Fri, Aug 02, 2013 at 06:30:52PM +0200, Vincenzo Maffione wrote:
 This patch partially implements the e1000 interrupt mitigation mechanisms.
 Using a single QEMUTimer, it emulates the ITR register (which is the newer
 mitigation register, recommended by Intel) and approximately emulates
 RADV and TADV registers. TIDV and RDTR register functionalities are not
 emulated (RDTR is only used to validate RADV, according to the e1000 specs).
 
 RADV, TADV, TIDV and RDTR registers make up the older e1000 mitigation
 mechanism and would need a timer each to be completely emulated. However,
 a single timer has been used in order to reach a good compromise between
 emulation accuracy and simplicity/efficiency.
 
 The implemented mechanism can be enabled/disabled specifying the command
 line e1000-specific boolean parameter mitigation, e.g.
 
 qemu-system-x86_64 -device e1000,mitigation=on,... ...
 
 For more information, see the Software developer's manual at
 http://download.intel.com/design/network/manuals/8254x_GBe_SDM.pdf.
 
 Interrupt mitigation boosts performance when the guest suffers from
 an high interrupt rate (i.e. receiving short UDP packets at high packet
 rate). For some numerical results see the following link
 http://info.iet.unipi.it/~luigi/papers/20130520-rizzo-vm.pdf
 
 Signed-off-by: Vincenzo Maffione v.maffi...@gmail.com
 ---
 Added pc-*-1.7 machines (default machine moved to pc-i440fx-1.7).
 
  hw/i386/pc_piix.c|  18 ++-
  hw/i386/pc_q35.c |  16 ++-
  hw/net/e1000.c   | 131 
 +--
  include/hw/i386/pc.h |   8 
  4 files changed, 167 insertions(+), 6 deletions(-)

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com



Re: [Qemu-devel] [PATCH v6] e1000: add interrupt mitigation support

2013-08-05 Thread Stefan Hajnoczi
On Fri, Aug 02, 2013 at 06:30:52PM +0200, Vincenzo Maffione wrote:
 This patch partially implements the e1000 interrupt mitigation mechanisms.
 Using a single QEMUTimer, it emulates the ITR register (which is the newer
 mitigation register, recommended by Intel) and approximately emulates
 RADV and TADV registers. TIDV and RDTR register functionalities are not
 emulated (RDTR is only used to validate RADV, according to the e1000 specs).
 
 RADV, TADV, TIDV and RDTR registers make up the older e1000 mitigation
 mechanism and would need a timer each to be completely emulated. However,
 a single timer has been used in order to reach a good compromise between
 emulation accuracy and simplicity/efficiency.
 
 The implemented mechanism can be enabled/disabled specifying the command
 line e1000-specific boolean parameter mitigation, e.g.
 
 qemu-system-x86_64 -device e1000,mitigation=on,... ...
 
 For more information, see the Software developer's manual at
 http://download.intel.com/design/network/manuals/8254x_GBe_SDM.pdf.
 
 Interrupt mitigation boosts performance when the guest suffers from
 an high interrupt rate (i.e. receiving short UDP packets at high packet
 rate). For some numerical results see the following link
 http://info.iet.unipi.it/~luigi/papers/20130520-rizzo-vm.pdf
 
 Signed-off-by: Vincenzo Maffione v.maffi...@gmail.com
 ---
 Added pc-*-1.7 machines (default machine moved to pc-i440fx-1.7).
 
  hw/i386/pc_piix.c|  18 ++-
  hw/i386/pc_q35.c |  16 ++-
  hw/net/e1000.c   | 131 
 +--
  include/hw/i386/pc.h |   8 
  4 files changed, 167 insertions(+), 6 deletions(-)

I will merge this into my net tree soon but I'd like to wait for
Andreas' review to make sure he's comfortable with the machine type
changes too.

Stefan



Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses

2013-08-05 Thread Andreas Färber
Am 03.08.2013 10:31, schrieb Jan Kiszka:
 From: Jan Kiszka jan.kis...@siemens.com
 
 Accesses to unassigned io ports shall return -1 on read and be ignored
 on write. Ensure these properties via dedicated ops, decoupling us from
 the memory core's handling of unassigned accesses.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  exec.c|3 ++-
  include/exec/ioport.h |2 ++
  ioport.c  |   16 
  3 files changed, 20 insertions(+), 1 deletions(-)
 
 diff --git a/exec.c b/exec.c
 index 3ca9381..9ed598f 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -1820,7 +1820,8 @@ static void memory_map_init(void)
  address_space_init(address_space_memory, system_memory, memory);
  
  system_io = g_malloc(sizeof(*system_io));
 -memory_region_init(system_io, NULL, io, 65536);
 +memory_region_init_io(system_io, NULL, unassigned_io_ops, NULL, io,
 +  65536);

It was reported that there may be some machines/PHBs that have
overlapping PIO/MMIO. Unless we use priorities, this ..._io MemoryRegion
will shadow or conflict with any ..._io MemoryRegion added to the memory
address space, wouldn't it?

Regards,
Andreas

  address_space_init(address_space_io, system_io, I/O);
  
  memory_listener_register(core_memory_listener, address_space_memory);
 diff --git a/include/exec/ioport.h b/include/exec/ioport.h
 index bdd4e96..84f7f85 100644
 --- a/include/exec/ioport.h
 +++ b/include/exec/ioport.h
 @@ -45,6 +45,8 @@ typedef struct MemoryRegionPortio {
  
  #define PORTIO_END_OF_LIST() { }
  
 +extern const MemoryRegionOps unassigned_io_ops;
 +
  void cpu_outb(pio_addr_t addr, uint8_t val);
  void cpu_outw(pio_addr_t addr, uint16_t val);
  void cpu_outl(pio_addr_t addr, uint32_t val);
 diff --git a/ioport.c b/ioport.c
 index 79b7f1a..9765588 100644
 --- a/ioport.c
 +++ b/ioport.c
 @@ -44,6 +44,22 @@ typedef struct MemoryRegionPortioList {
  MemoryRegionPortio ports[];
  } MemoryRegionPortioList;
  
 +static uint64_t unassigned_io_read(void *opaque, hwaddr addr, unsigned size)
 +{
 +return -1UL;
 +}
 +
 +static void unassigned_io_write(void *opaque, hwaddr addr, uint64_t val,
 +unsigned size)
 +{
 +}
 +
 +const MemoryRegionOps unassigned_io_ops = {
 +.read = unassigned_io_read,
 +.write = unassigned_io_write,
 +.endianness = DEVICE_NATIVE_ENDIAN,
 +};
 +
  void cpu_outb(pio_addr_t addr, uint8_t val)
  {
  LOG_IOPORT(outb: %04FMT_pioaddr %02PRIx8\n, addr, val);
 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-08-05 Thread Gleb Natapov
On Mon, Aug 05, 2013 at 09:09:56AM +, Zhanghaoyu (A) wrote:
 The QEMU command line (/var/log/libvirt/qemu/[domain name].log),
 LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ QEMU_AUDIO_DRV=none 
 /usr/local/bin/qemu-system-x86_64 -name ATS1 -S -M pc-0.12 -cpu qemu32 
 -enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 -uuid 
 0505ec91-382d-800e-2c79-e5b286eb60b5 -no-user-config -nodefaults -chardev 
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/ATS1.monitor,server,nowait 
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime 
 -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
 file=/opt/ne/vm/ATS1.img,if=none,id=drive-virtio-disk0,format=raw,cache=none 
 -device 
 virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
  -netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device 
 virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bus=pci.0,addr=0x3,bootindex=2
  -netdev tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 -device 
 virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:00,bus=pci.0,addr=0x4
  -netdev tap,fd=24,id=hostnet2,vhost=on,vhostfd=25 -device 
 virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:00,bus=pci.0,addr=0x5
  -netdev tap,fd=26,id=hostnet3,vhost=on,vhostfd=27 -device 
 virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:00,bus=pci.0,addr=0x6
  -netdev tap,fd=28,id=hostnet4,vhost=on,vhostfd=29 -device 
 virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:00,bus=pci.0,addr=0x7
  -netdev tap,fd=30,id=hostnet5,vhost=on,vhostfd=31 -device 
 virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:00,bus=pci.0,addr=0x9
  -chardev pty,id=charserial0 -device 
 isa-serial,chardev=charserial0,id=serial0 -vnc *:0 -k en-us -vga cirrus 
 -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb -watchdog-action poweroff 
 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa
 
Which QEMU version is this? Can you try with e1000 NICs instead of
virtio?

--
Gleb.



Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses

2013-08-05 Thread Jan Kiszka
On 2013-08-05 11:34, Andreas Färber wrote:
 Am 03.08.2013 10:31, schrieb Jan Kiszka:
 From: Jan Kiszka jan.kis...@siemens.com

 Accesses to unassigned io ports shall return -1 on read and be ignored
 on write. Ensure these properties via dedicated ops, decoupling us from
 the memory core's handling of unassigned accesses.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  exec.c|3 ++-
  include/exec/ioport.h |2 ++
  ioport.c  |   16 
  3 files changed, 20 insertions(+), 1 deletions(-)

 diff --git a/exec.c b/exec.c
 index 3ca9381..9ed598f 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -1820,7 +1820,8 @@ static void memory_map_init(void)
  address_space_init(address_space_memory, system_memory, memory);
  
  system_io = g_malloc(sizeof(*system_io));
 -memory_region_init(system_io, NULL, io, 65536);
 +memory_region_init_io(system_io, NULL, unassigned_io_ops, NULL, io,
 +  65536);
 
 It was reported that there may be some machines/PHBs that have
 overlapping PIO/MMIO. Unless we use priorities, this ..._io MemoryRegion
 will shadow or conflict with any ..._io MemoryRegion added to the memory
 address space, wouldn't it?

I cannot follow yet. This is the base mem-region for the PIO space. If
some region from the MMIO space forwards request here and there is no
backing device, this base region will handle it. So, even on arch with
MMIO-based PIO, both address space should remain separate. Or am I
missing something?

Jan




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v6] e1000: add interrupt mitigation support

2013-08-05 Thread Andreas Färber
Am 05.08.2013 11:27, schrieb Stefan Hajnoczi:
 On Fri, Aug 02, 2013 at 06:30:52PM +0200, Vincenzo Maffione wrote:
 This patch partially implements the e1000 interrupt mitigation mechanisms.
 Using a single QEMUTimer, it emulates the ITR register (which is the newer
 mitigation register, recommended by Intel) and approximately emulates
 RADV and TADV registers. TIDV and RDTR register functionalities are not
 emulated (RDTR is only used to validate RADV, according to the e1000 specs).

 RADV, TADV, TIDV and RDTR registers make up the older e1000 mitigation
 mechanism and would need a timer each to be completely emulated. However,
 a single timer has been used in order to reach a good compromise between
 emulation accuracy and simplicity/efficiency.

 The implemented mechanism can be enabled/disabled specifying the command
 line e1000-specific boolean parameter mitigation, e.g.

 qemu-system-x86_64 -device e1000,mitigation=on,... ...

 For more information, see the Software developer's manual at
 http://download.intel.com/design/network/manuals/8254x_GBe_SDM.pdf.

 Interrupt mitigation boosts performance when the guest suffers from
 an high interrupt rate (i.e. receiving short UDP packets at high packet
 rate). For some numerical results see the following link
 http://info.iet.unipi.it/~luigi/papers/20130520-rizzo-vm.pdf

 Signed-off-by: Vincenzo Maffione v.maffi...@gmail.com
 ---
 Added pc-*-1.7 machines (default machine moved to pc-i440fx-1.7).

  hw/i386/pc_piix.c|  18 ++-
  hw/i386/pc_q35.c |  16 ++-
  hw/net/e1000.c   | 131 
 +--
  include/hw/i386/pc.h |   8 
  4 files changed, 167 insertions(+), 6 deletions(-)
 
 I will merge this into my net tree soon but I'd like to wait for
 Andreas' review to make sure he's comfortable with the machine type
 changes too.

I'm okay with the machine changes, just not comfortable reviewing the
net code. Feel free to add a Reviewed-by: Andreas Färber
afaer...@suse.de (for pc-* machines) if you wish.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] [PATCH] pci: fix i82801b11 bridge

2013-08-05 Thread Andreas Färber
Am 05.08.2013 10:50, schrieb Gerd Hoffmann:
 Signed-off-by: Gerd Hoffmann kra...@redhat.com
 ---
  hw/pci-bridge/i82801b11.c |1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/hw/pci-bridge/i82801b11.c b/hw/pci-bridge/i82801b11.c
 index 8a5e426..14cd7fd 100644
 --- a/hw/pci-bridge/i82801b11.c
 +++ b/hw/pci-bridge/i82801b11.c
 @@ -90,6 +90,7 @@ static void i82801b11_bridge_class_init(ObjectClass *klass, 
 void *data)
  k-device_id = PCI_DEVICE_ID_INTEL_82801BA_11;
  k-revision = ICH9_D2P_A2_REVISION;
  k-init = i82801b11_bridge_initfn;
 +k-config_write = pci_bridge_write_config;
  set_bit(DEVICE_CATEGORY_BRIDGE, dc-categories);
  }
  

We have an abstract TYPE_PCI_BRIDGE now, suggest to set it in its
class_init instead if that's a generic bridge function.

Probably should be for-1.6?

Cheers,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses

2013-08-05 Thread Peter Maydell
On 5 August 2013 10:34, Andreas Färber afaer...@suse.de wrote:
 Am 03.08.2013 10:31, schrieb Jan Kiszka:
 From: Jan Kiszka jan.kis...@siemens.com

 Accesses to unassigned io ports shall return -1 on read and be ignored
 on write. Ensure these properties via dedicated ops, decoupling us from
 the memory core's handling of unassigned accesses.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  exec.c|3 ++-
  include/exec/ioport.h |2 ++
  ioport.c  |   16 
  3 files changed, 20 insertions(+), 1 deletions(-)

 diff --git a/exec.c b/exec.c
 index 3ca9381..9ed598f 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -1820,7 +1820,8 @@ static void memory_map_init(void)
  address_space_init(address_space_memory, system_memory, memory);

  system_io = g_malloc(sizeof(*system_io));
 -memory_region_init(system_io, NULL, io, 65536);
 +memory_region_init_io(system_io, NULL, unassigned_io_ops, NULL, io,
 +  65536);

 It was reported that there may be some machines/PHBs that have
 overlapping PIO/MMIO. Unless we use priorities, this ..._io MemoryRegion
 will shadow or conflict with any ..._io MemoryRegion added to the memory
 address space, wouldn't it?

Priorities only apply between different subregions within a
container. This is adding IO operations to the container itself,
so there's no priority issue here: the container's operations
always come last, behind any subregions it has.

(Do we have any existing examples of container regions with their
own default IO operations? The memory.c code clearly expects them
to be OK, though - eg render_memory_region() specifically does
render subregions; then render the region itself into any gaps.)

Or do you mean that if we had:

 [ system memory region, with its own default read/write ops ]
 [ io region mapped into it ]
 [ io ]   [ io ][io]

that now if you access the bit of system memory corresponding
to the I/O region at some address with no specific IO port,
you'll get the IO region's defaults, rather than the system
memory region's defaults? I think that's true and possibly
a change in behaviour. Do we have any boards that do that?

-- PMM



Re: [Qemu-devel] [PATCH 0/4]: timers thread-safe stuff

2013-08-05 Thread Alex Bligh

Pingfan,

--On 5 August 2013 15:33:22 +0800 Liu Ping Fan 
pingf...@linux.vnet.ibm.com wrote:



The patches has been rebased onto Alex's [RFC] [PATCHv5 00/16] aio /
timers: Add AioContext timers and use ppoll
permalink.gmane.org/gmane.comp.emulators.qemu/226333

For some other complied error issue, I can not finish compiling, will fix
it later.

Changes since last version:
 1. drop the overlap partition and leave only thread-safe stuff
 2. For timers_state, since currently, only vm_clock can be read outside
BQL. There is no protection with ticks(since the protection will cost
more in read_tsc path).  3. use light weight QemuEvent to re-implement
the qemu_clock_enable(foo,false)


I think you may need to protect a little more.

For instance, do we need to take a lock whilst traversing a QEMUTimerList?
I hope the answer to this is no. The design idea of my stuff was that only
one thread would be manipulating or traversing this list. As the notify
function is a property of the QEMUTimerList itself, no traversal is
necessary for that. However, the icount stuff (per my patch 15) does
look at the deadlines for the vm_clock QEMUTimerLists (which is the
first entry). Is that always going to be thread safe? Before the icount
stuff, I was pretty certain this did not need a lock, but perhaps
it does now. If the soonest deadline on any QEMUTimerList was stored
in a 64 bit atomic variable, this might remove the need for a lock;
it's possible that putting some memory barrier operations in might
be enough.

Also, we maintain a per-clock list of QEMUTimerLists. This list is
traversed by the icount stuff, by things adding and removing timers
(e.g. creation/deletion of AioContext) and whenever a QEMUClock is
reenabled. I think this DOES need a lock. Aside from the icount
stuff, usage is very infrequent. It's far more frequent with the
icount stuff, so it should probably be optimised for that case.

--
Alex Bligh



Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses

2013-08-05 Thread Andreas Färber
Am 05.08.2013 11:59, schrieb Peter Maydell:
 On 5 August 2013 10:34, Andreas Färber afaer...@suse.de wrote:
 Am 03.08.2013 10:31, schrieb Jan Kiszka:
 From: Jan Kiszka jan.kis...@siemens.com

 Accesses to unassigned io ports shall return -1 on read and be ignored
 on write. Ensure these properties via dedicated ops, decoupling us from
 the memory core's handling of unassigned accesses.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  exec.c|3 ++-
  include/exec/ioport.h |2 ++
  ioport.c  |   16 
  3 files changed, 20 insertions(+), 1 deletions(-)

 diff --git a/exec.c b/exec.c
 index 3ca9381..9ed598f 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -1820,7 +1820,8 @@ static void memory_map_init(void)
  address_space_init(address_space_memory, system_memory, memory);

  system_io = g_malloc(sizeof(*system_io));
 -memory_region_init(system_io, NULL, io, 65536);
 +memory_region_init_io(system_io, NULL, unassigned_io_ops, NULL, io,
 +  65536);

 It was reported that there may be some machines/PHBs that have
 overlapping PIO/MMIO. Unless we use priorities, this ..._io MemoryRegion
 will shadow or conflict with any ..._io MemoryRegion added to the memory
 address space, wouldn't it?
 
 Priorities only apply between different subregions within a
 container. This is adding IO operations to the container itself,
 so there's no priority issue here: the container's operations
 always come last, behind any subregions it has.
 
 (Do we have any existing examples of container regions with their
 own default IO operations? The memory.c code clearly expects them
 to be OK, though - eg render_memory_region() specifically does
 render subregions; then render the region itself into any gaps.)
 
 Or do you mean that if we had:
 
  [ system memory region, with its own default read/write ops ]
  [ io region mapped into it ]
  [ io ]   [ io ][io]
 
 that now if you access the bit of system memory corresponding
 to the I/O region at some address with no specific IO port,
 you'll get the IO region's defaults, rather than the system
 memory region's defaults? I think that's true and possibly
 a change in behaviour.

Yes, that's what I thought someone brought up in one of those lengthy
memory discussions: Accesses between the first two [io]s would now seem
to go to [io region] rather than into [system memory region] subregions
at the same level as [io region].

 Do we have any boards that do that?

Sorry, I don't remember who brought it up, hopefully Paolo remembers?
Possibly sPAPR?

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses

2013-08-05 Thread Jan Kiszka
On 2013-08-05 11:59, Peter Maydell wrote:
 On 5 August 2013 10:34, Andreas Färber afaer...@suse.de wrote:
 Am 03.08.2013 10:31, schrieb Jan Kiszka:
 From: Jan Kiszka jan.kis...@siemens.com

 Accesses to unassigned io ports shall return -1 on read and be ignored
 on write. Ensure these properties via dedicated ops, decoupling us from
 the memory core's handling of unassigned accesses.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  exec.c|3 ++-
  include/exec/ioport.h |2 ++
  ioport.c  |   16 
  3 files changed, 20 insertions(+), 1 deletions(-)

 diff --git a/exec.c b/exec.c
 index 3ca9381..9ed598f 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -1820,7 +1820,8 @@ static void memory_map_init(void)
  address_space_init(address_space_memory, system_memory, memory);

  system_io = g_malloc(sizeof(*system_io));
 -memory_region_init(system_io, NULL, io, 65536);
 +memory_region_init_io(system_io, NULL, unassigned_io_ops, NULL, io,
 +  65536);

 It was reported that there may be some machines/PHBs that have
 overlapping PIO/MMIO. Unless we use priorities, this ..._io MemoryRegion
 will shadow or conflict with any ..._io MemoryRegion added to the memory
 address space, wouldn't it?
 
 Priorities only apply between different subregions within a
 container. This is adding IO operations to the container itself,
 so there's no priority issue here: the container's operations
 always come last, behind any subregions it has.
 
 (Do we have any existing examples of container regions with their
 own default IO operations? The memory.c code clearly expects them
 to be OK, though - eg render_memory_region() specifically does
 render subregions; then render the region itself into any gaps.)
 
 Or do you mean that if we had:
 
  [ system memory region, with its own default read/write ops ]

I cannot imagine how this could work. The system memory region has no
clue about what the regions below it can handle and what not. So it has
to pass through the io window.

Jan

  [ io region mapped into it ]
  [ io ]   [ io ][io]
 
 that now if you access the bit of system memory corresponding
 to the I/O region at some address with no specific IO port,
 you'll get the IO region's defaults, rather than the system
 memory region's defaults? I think that's true and possibly
 a change in behaviour. Do we have any boards that do that?
 
 -- PMM
 




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH] pci: fix i82801b11 bridge

2013-08-05 Thread Gerd Hoffmann
On 08/05/13 11:45, Andreas Färber wrote:
 Am 05.08.2013 10:50, schrieb Gerd Hoffmann:
 Signed-off-by: Gerd Hoffmann kra...@redhat.com
 ---
  hw/pci-bridge/i82801b11.c |1 +
  1 file changed, 1 insertion(+)

 diff --git a/hw/pci-bridge/i82801b11.c b/hw/pci-bridge/i82801b11.c
 index 8a5e426..14cd7fd 100644
 --- a/hw/pci-bridge/i82801b11.c
 +++ b/hw/pci-bridge/i82801b11.c
 @@ -90,6 +90,7 @@ static void i82801b11_bridge_class_init(ObjectClass 
 *klass, void *data)
  k-device_id = PCI_DEVICE_ID_INTEL_82801BA_11;
  k-revision = ICH9_D2P_A2_REVISION;
  k-init = i82801b11_bridge_initfn;
 +k-config_write = pci_bridge_write_config;
  set_bit(DEVICE_CATEGORY_BRIDGE, dc-categories);
  }
  
 
 We have an abstract TYPE_PCI_BRIDGE now, suggest to set it in its
 class_init instead if that's a generic bridge function.

Well, depends on the bridge device.  When they have stuff beside core
pci bridge functionality (such as shpc hotplug support like the
'pci-bridge' device) they need their private function.  But those can
still override config_write in the device init I guess.

/me leaves that to mst to decide.  Beside that it should most likely go
into a separate cleanup patch as there are other devices which hook
pci_bridge_write_config the same way.

 Probably should be for-1.6?

Yes.

cheers,
  Gerd





Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses

2013-08-05 Thread Peter Maydell
On 5 August 2013 11:30, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-08-05 11:59, Peter Maydell wrote:
 Or do you mean that if we had:

  [ system memory region, with its own default read/write ops ]

 I cannot imagine how this could work. The system memory region has no
 clue about what the regions below it can handle and what not. So it has
 to pass through the io window.

The system memory region's just a container, you can add a
background region to it at lowest-possible-priority, which
then takes accesses which are either (a) not in any subregion
or (b) in a subregion but that container doesn't specify
its own io ops and nothing in that container handles the
access. (Or you could create the system memory region with
its own IO ops, which would have the same effect.)

This works because the memory subsystem flattens the whole
tree, and at flattening time render_memory_region() effectively
knows if there are gaps in the subcontainer or not. At runtime
we never pass through anything (except for the special case
of an IOMMU) -- the tree of regions has been preflattened
and the target of the memory access is known in advance.

-- PMM



Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses

2013-08-05 Thread Jan Kiszka
On 2013-08-05 12:36, Peter Maydell wrote:
 On 5 August 2013 11:30, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-08-05 11:59, Peter Maydell wrote:
 Or do you mean that if we had:

  [ system memory region, with its own default read/write ops ]

 I cannot imagine how this could work. The system memory region has no
 clue about what the regions below it can handle and what not. So it has
 to pass through the io window.
 
 The system memory region's just a container, you can add a
 background region to it at lowest-possible-priority, which
 then takes accesses which are either (a) not in any subregion
 or (b) in a subregion but that container doesn't specify
 its own io ops and nothing in that container handles the
 access. (Or you could create the system memory region with
 its own IO ops, which would have the same effect.)

First, we do not render MMIO and IO within the same address space so
far. But even if we would, the IO container now catches all accesses, so
the system memory region will never have its default handler run for
that window.

Jan




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH] target-mips: fix decoding of microMIPS POOL32Axf instructions

2013-08-05 Thread Aurelien Jarno
On Mon, Aug 05, 2013 at 08:41:52AM +0100, Leon Alrae wrote:
 On 03/08/13 23:01, Aurelien Jarno wrote:
  On Thu, Aug 01, 2013 at 11:02:27AM +0100, Leon Alrae wrote:
  These are not DSP instructions, thus there is no ac field.
 
  For more details please refer to instruction encoding of
  MULT, MULTU, MADD, MADDU, MSUB, MSUBU, MFHI, MFLO, MTHI, MTLO in
  MIPS Architecture for Programmers Volume II-B: The microMIPS32 Instruction 
  Set
  
  These instructions have both a non DSP version without the ac field,
  and a DSP version with the ac field. The encoding of the ac field is
  done in bits 14 and 15, so the current QEMU code is correct.
  
  Please refer to the  MIPS32® Architecture Manual Volume IV-e: The
  MIPS® DSP Application-Specific Extension to the microMIPS32®
  Architecture (document MD00764).
   
 
 Please note that the patch modifies non-DSP version of listed
 instructions, where ac field doesn't exist. In this case reading bits

This is correct, except for MFHI, MFLO, MTHI, MTLO which share the same
opcode for the DSP and non-DSP version. Theses instructions have bits 14
and 15 set to 0 for the non-DSP version, so they are working as expected
and thus your patch should not modify them.

 14 and 15 and passing it as acc argument to gen_muldiv() or gen_HILO()
 is not correct. If those bits are not 0 (and for most of the listed
 instructions this is true) then check_dsp() is called which will cause
 an exception if DSP is not supported / disabled.
 

This is correct. The current code assumes that all instructions using
gen_muldiv() have the same opcode for non-DSP and DSP version (actually
it's weird that they have a different encoding, it's just wasting some
opcode space).

Therefore what should be done:
- The part concerning MFHI, MFLO, MTHI, MTLO should be dropped from your
  patch as it already works correctly.
- The part of your patch concerning instructions calling gen_muldiv() is
  correct and should be kept to handle the non-DSP version of these
  instructions.
- An entry with for the DSP version of these instructions should be
  created, calling gen_muldiv() passing the ac field from bits 14 and
  15. This is basically the same code than now, just with extension 0x32
  instead of 0x2c.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net



Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses

2013-08-05 Thread Peter Maydell
On 5 August 2013 11:44, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-08-05 12:36, Peter Maydell wrote:
 On 5 August 2013 11:30, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-08-05 11:59, Peter Maydell wrote:
 Or do you mean that if we had:

  [ system memory region, with its own default read/write ops ]

 I cannot imagine how this could work. The system memory region has no
 clue about what the regions below it can handle and what not. So it has
 to pass through the io window.

 The system memory region's just a container, you can add a
 background region to it at lowest-possible-priority, which
 then takes accesses which are either (a) not in any subregion
 or (b) in a subregion but that container doesn't specify
 its own io ops and nothing in that container handles the
 access. (Or you could create the system memory region with
 its own IO ops, which would have the same effect.)

 First, we do not render MMIO and IO within the same address space so
 far.

Is this a statement made because you've checked all the boards
and know that nobody's mapping the system-io memory region into
the system-memory region? (It is technically trivial, you
just need to call memory_region_add_subregion() directly
or indirectly...)

 But even if we would, the IO container now catches all accesses, so
 the system memory region will never have its default handler run for
 that window.

Yes, that is the point -- before the IO container caught all
accesses, the default handler that would run would be the
system memory region, but now that the IO container has
default read/write ops they will take precedence. So this
would be a behaviour change if there are any boards set up
like this. (I'm not sure there are any, though.)

-- PMM



Re: [Qemu-devel] [PATCH 4/4] timer: make qemu_clock_enable sync between disable and timer's cb

2013-08-05 Thread Paolo Bonzini
 On Aug 05 2013, Liu Ping Fan wrote:
 After disabling the QemuClock, we should make sure that no QemuTimers
 are still in flight. To implement that with light overhead, we resort
 to QemuEvent. The caller of disabling will wait on QemuEvent of each
 timerlist.
 
 Note, qemu_clock_enable(foo,false) can _not_ be called from timer's cb.
 And the callers of qemu_clock_enable() should be sync by themselves,
 not protected by this patch.
 
 Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
 ---
  include/qemu/timer.h |  1 +
  qemu-timer.c | 11 +++
  2 files changed, 12 insertions(+)
 
 diff --git a/include/qemu/timer.h b/include/qemu/timer.h
 index 1363316..ca09ba2 100644
 --- a/include/qemu/timer.h
 +++ b/include/qemu/timer.h
 @@ -85,6 +85,7 @@ int64_t timerlistgroup_deadline_ns(QEMUTimerListGroup tlg);
  
  int qemu_timeout_ns_to_ms(int64_t ns);
  int qemu_poll_ns(GPollFD *fds, uint nfds, int64_t timeout);
 +/* The disable of clock can not be called in timer's cb */

See below for a more verbose version of the comment.  For
now leave it only in the .c file, we should add comments to
all of timer.h.

  void qemu_clock_enable(QEMUClock *clock, bool enabled);
  void qemu_clock_warp(QEMUClock *clock);
  
 diff --git a/qemu-timer.c b/qemu-timer.c
 index ebe7597..5828107 100644
 --- a/qemu-timer.c
 +++ b/qemu-timer.c
 @@ -71,6 +71,8 @@ struct QEMUTimerList {
  QLIST_ENTRY(QEMUTimerList) list;
  QEMUTimerListNotifyCB *notify_cb;
  void *notify_opaque;
 +/* light weight method to mark the end of timerlist's running */
 +QemuEvent ev;
  };
  
  struct QEMUTimer {
 @@ -92,6 +94,7 @@ static QEMUTimerList *timerlist_new_from_clock(QEMUClock 
 *clock)
  QEMUTimerList *tl;
  
  tl = g_malloc0(sizeof(QEMUTimerList));
 +qemu_event_init(tl-ev, false);

The event should start as set, since set means not inside
qemu_run_timers.

  tl-clock = clock;
  QLIST_INSERT_HEAD(clock-timerlists, tl, list);
  return tl;
 @@ -145,12 +148,18 @@ void qemu_clock_notify(QEMUClock *clock)
  }
  }
  
 +/* The disable of clock can _not_ be called from timer's cb */

/* Disabling the clock will wait for related timerlists to stop
 * executing qemu_run_timers.  Thus, this functions should not
 * be used from the callback of a timer that is based on @clock.
 * Doing so would cause a deadlock.
 */

  void qemu_clock_enable(QEMUClock *clock, bool enabled)
  {
 +QEMUTimerList *tl;
  bool old = clock-enabled;
  clock-enabled = enabled;
  if (enabled  !old) {
  qemu_clock_notify(clock);
 +} else if (!enabled  old) {
 +QLIST_FOREACH(tl, clock-timerlists, list) {
 +qemu_event_wait(tl-ev);
 +}
  }
  }
  
 @@ -419,6 +428,7 @@ bool timerlist_run_timers(QEMUTimerList *tl)
  }
  
  current_time = qemu_get_clock_ns(tl-clock);
 +qemu_event_reset(tl-ev);

Race condition here.  You need to test clock-enabled while the
event is reset.  Otherwise you get:

 -
 thread 1 is runningthread 2 is running
 qemu_clock_enable(foo, false)  qemu_run_timers(tl);
 -
 ** event is initially set **
  if (!clock-enabled) return;
clock-enabled = false;
qemu_event_wait(tl-ev);
return;
  qemu_event_reset(tl-ev);
  invokes callback
  qemu_event_set(tl-ev);
 -

violating the invariant that no callbacks are invoked after the return from
qemu_clock_enable(foo, false).

Paolo

  for(;;) {
  ts = tl-active_timers;
  if (!qemu_timer_expired_ns(ts, current_time)) {
 @@ -432,6 +442,7 @@ bool timerlist_run_timers(QEMUTimerList *tl)
  ts-cb(ts-opaque);
  progress = true;
  }
 +qemu_event_set(tl-ev);
  return progress;
  }
  
 -- 
 1.8.1.4
 



Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses

2013-08-05 Thread Jan Kiszka
On 2013-08-05 12:51, Peter Maydell wrote:
 On 5 August 2013 11:44, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-08-05 12:36, Peter Maydell wrote:
 On 5 August 2013 11:30, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-08-05 11:59, Peter Maydell wrote:
 Or do you mean that if we had:

  [ system memory region, with its own default read/write ops ]

 I cannot imagine how this could work. The system memory region has no
 clue about what the regions below it can handle and what not. So it has
 to pass through the io window.

 The system memory region's just a container, you can add a
 background region to it at lowest-possible-priority, which
 then takes accesses which are either (a) not in any subregion
 or (b) in a subregion but that container doesn't specify
 its own io ops and nothing in that container handles the
 access. (Or you could create the system memory region with
 its own IO ops, which would have the same effect.)

 First, we do not render MMIO and IO within the same address space so
 far.
 
 Is this a statement made because you've checked all the boards
 and know that nobody's mapping the system-io memory region into
 the system-memory region? (It is technically trivial, you
 just need to call memory_region_add_subregion() directly
 or indirectly...)

I know this because I just recently wrote the patch that enables this
trivial step, i.e. converted PIO dispatching to the memory subsystem.

Jan




signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH v2] target-arm: Implement 'int' loglevel

2013-08-05 Thread Peter Maydell
The 'int' loglevel for recording interrupts and exceptions
requires support in the target-specific code. Implement
it for ARM. This improves debug logging in some situations
that were otherwise pretty opaque, such as when we fault
trying to execute at an exception vector address, which
would otherwise cause an infinite loop of taking exceptions
without any indication in the debug log of what was going on.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
Changes v1-v2: added extra 'const' to excnames[] definition
as per rth review.

 target-arm/helper.c |   42 ++
 1 file changed, 42 insertions(+)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 4968391..6d9026d 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -1974,6 +1974,37 @@ static void do_v7m_exception_exit(CPUARMState *env)
pointer.  */
 }
 
+/* Exception names for debug logging; note that not all of these
+ * precisely correspond to architectural exceptions.
+ */
+static const char * const excnames[] = {
+[EXCP_UDEF] = Undefined Instruction,
+[EXCP_SWI] = SVC,
+[EXCP_PREFETCH_ABORT] = Prefetch Abort,
+[EXCP_DATA_ABORT] = Data Abort,
+[EXCP_IRQ] = IRQ,
+[EXCP_FIQ] = FIQ,
+[EXCP_BKPT] = Breakpoint,
+[EXCP_EXCEPTION_EXIT] = QEMU v7M exception exit,
+[EXCP_KERNEL_TRAP] = QEMU intercept of kernel commpage,
+[EXCP_STREX] = QEMU intercept of STREX,
+};
+
+static inline void arm_log_exception(int idx)
+{
+if (qemu_loglevel_mask(CPU_LOG_INT)) {
+const char *exc = NULL;
+
+if (idx = 0  idx  ARRAY_SIZE(excnames)) {
+exc = excnames[idx];
+}
+if (!exc) {
+exc = unknown;
+}
+qemu_log_mask(CPU_LOG_INT, Taking exception %d [%s]\n, idx, exc);
+}
+}
+
 void arm_v7m_cpu_do_interrupt(CPUState *cs)
 {
 ARMCPU *cpu = ARM_CPU(cs);
@@ -1982,6 +2013,8 @@ void arm_v7m_cpu_do_interrupt(CPUState *cs)
 uint32_t lr;
 uint32_t addr;
 
+arm_log_exception(env-exception_index);
+
 lr = 0xfff1;
 if (env-v7m.current_sp)
 lr |= 4;
@@ -2011,6 +2044,7 @@ void arm_v7m_cpu_do_interrupt(CPUState *cs)
 if (nr == 0xab) {
 env-regs[15] += 2;
 env-regs[0] = do_arm_semihosting(env);
+qemu_log_mask(CPU_LOG_INT, ...handled as semihosting call\n);
 return;
 }
 }
@@ -2064,6 +2098,8 @@ void arm_cpu_do_interrupt(CPUState *cs)
 
 assert(!IS_M(env));
 
+arm_log_exception(env-exception_index);
+
 /* TODO: Vectored interrupt controller.  */
 switch (env-exception_index) {
 case EXCP_UDEF:
@@ -2091,6 +2127,7 @@ void arm_cpu_do_interrupt(CPUState *cs)
 || (mask == 0xab  env-thumb))
(env-uncached_cpsr  CPSR_M) != ARM_CPU_MODE_USR) {
 env-regs[0] = do_arm_semihosting(env);
+qemu_log_mask(CPU_LOG_INT, ...handled as semihosting call\n);
 return;
 }
 }
@@ -2108,18 +2145,23 @@ void arm_cpu_do_interrupt(CPUState *cs)
(env-uncached_cpsr  CPSR_M) != ARM_CPU_MODE_USR) {
 env-regs[15] += 2;
 env-regs[0] = do_arm_semihosting(env);
+qemu_log_mask(CPU_LOG_INT, ...handled as semihosting call\n);
 return;
 }
 }
 env-cp15.c5_insn = 2;
 /* Fall through to prefetch abort.  */
 case EXCP_PREFETCH_ABORT:
+qemu_log_mask(CPU_LOG_INT, ...with IFSR 0x%x IFAR 0x%x\n,
+  env-cp15.c5_insn, env-cp15.c6_insn);
 new_mode = ARM_CPU_MODE_ABT;
 addr = 0x0c;
 mask = CPSR_A | CPSR_I;
 offset = 4;
 break;
 case EXCP_DATA_ABORT:
+qemu_log_mask(CPU_LOG_INT, ...with DFSR 0x%x DFAR 0x%x\n,
+  env-cp15.c5_data, env-cp15.c6_data);
 new_mode = ARM_CPU_MODE_ABT;
 addr = 0x10;
 mask = CPSR_A | CPSR_I;
-- 
1.7.9.5




Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses

2013-08-05 Thread Paolo Bonzini
 On Aug 05 2013, Jan Kiszka wrote:
 On 2013-08-05 12:36, Peter Maydell wrote:
  On 5 August 2013 11:30, Jan Kiszka jan.kis...@web.de wrote:
  On 2013-08-05 11:59, Peter Maydell wrote:
  Or do you mean that if we had:
 
   [ system memory region, with its own default read/write ops ]
 
  I cannot imagine how this could work. The system memory region has no
  clue about what the regions below it can handle and what not. So it has
  to pass through the io window.
  
  The system memory region's just a container, you can add a
  background region to it at lowest-possible-priority, which
  then takes accesses which are either (a) not in any subregion
  or (b) in a subregion but that container doesn't specify
  its own io ops and nothing in that container handles the
  access. (Or you could create the system memory region with
  its own IO ops, which would have the same effect.)
 
 First, we do not render MMIO and IO within the same address space so
 far.

This is not true, we do render the I/O address space in system memory
on machines that support PCI or ISA but are not x86.  We do that through
an alias of get_system_io(), but that doesn't change the picture.

 But even if we would, the IO container now catches all accesses, so
 the system memory region will never have its default handler run for
 that window.

This however is true, and is the right thing to do.  It is also
very similar to the workaround that Richard did on Alpha (commit
3661049fec64ffd7ab008e57e396881c6a4b53a4) and it should be possible
to revert that one indeed.

Paolo



Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses

2013-08-05 Thread Paolo Bonzini
On Aug 05 2013, Peter Maydell wrote:
  But even if we would, the IO container now catches all accesses, so
  the system memory region will never have its default handler run for
  that window.
 
 Yes, that is the point -- before the IO container caught all
 accesses, the default handler that would run would be the
 system memory region, but now that the IO container has
 default read/write ops they will take precedence. So this
 would be a behaviour change if there are any boards set up
 like this. (I'm not sure there are any, though.)

Before 1.6 this would have happened anyway.  Before 1.6, all
I/O accesses would go through cpu_in/cpu_out, and would never
hit the system memory region.  So Jan's patch is effectively
restoring old behavior.

Paolo



[Qemu-devel] [PATCH v4 0/2] ARM: add 'virt' platform

2013-08-05 Thread Peter Maydell
This patch series adds a 'virt' platform which uses the
kernel's mach-virt (fully device-tree driven) support
to create a simple minimalist platform intended for
use for KVM VM guests. It's based on John Rigby's
patches, but I've overhauled it a lot:

 * renamed user-facing machine to just virt
 * removed the A9 support (it can't work since the A9 has no
   generic timers)
 * added virtio-mmio transports instead of random set of 'soc' devices
 * instead of updating io_base as we step through adding devices,
   define a memory map with an array (similar to vexpress)
 * folded in some minor fixes from John's aarch64-support patch
 * rather than explicitly doing endian-swapping on FDT cells,
   use fdt APIs that let us just pass in host-endian values
   and let the fdt layer take care of the swapping
 * miscellaneous minor code cleanups and style fixes

If you want to test this with TCG QEMU you'll also need the
generic-timers implementation patches I posted recently.
A branch with generic-timers plus these patches is here:
https://git.linaro.org/gitweb?p=people/pmaydell/qemu-arm.git;a=shortlog;h=refs/heads/mach-virt

(The kernel in pure mach-virt mode requires generic timers;
it can't deal with getting its clock source from an sp804
timer specified by the device tree. This might be fixed in
a future kernel, but dropping all the soc-device support
from mach-virt makes it simpler anyway.)

An obvious thing this machine does not provide is a serial
port. I would rather just use virtio-console (and we
should implement the 'emergency console/earlyprintk' bit of
the virtio spec).

John Rigby (2):
  ARM: Allow boards to provide an fdt blob
  ARM: Add 'virt' platform

 hw/arm/Makefile.objs |2 +-
 hw/arm/boot.c|   32 +++--
 hw/arm/virt.c|  363 ++
 include/hw/arm/arm.h |7 +
 4 files changed, 391 insertions(+), 13 deletions(-)
 create mode 100644 hw/arm/virt.c

-- 
1.7.9.5




[Qemu-devel] [PATCH v4 1/2] ARM: Allow boards to provide an fdt blob

2013-08-05 Thread Peter Maydell
From: John Rigby john.ri...@linaro.org

If no fdt is provided on command line and the new field
get_dtb in struct arm_boot_info is set then call it to
get a device tree blob.

Signed-off-by: John Rigby john.ri...@linaro.org
[PMM: minor tweaks and cleanup]
Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 hw/arm/boot.c|   32 
 include/hw/arm/arm.h |7 +++
 2 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 2cbeefd..9d790b7 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -228,23 +228,31 @@ static void set_kernel_args_old(const struct 
arm_boot_info *info)
 static int load_dtb(hwaddr addr, const struct arm_boot_info *binfo)
 {
 void *fdt = NULL;
-char *filename;
 int size, rc;
 uint32_t acells, scells;
 
-filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, binfo-dtb_filename);
-if (!filename) {
-fprintf(stderr, Couldn't open dtb file %s\n, binfo-dtb_filename);
-goto fail;
-}
+if (binfo-dtb_filename) {
+char *filename;
+filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, binfo-dtb_filename);
+if (!filename) {
+fprintf(stderr, Couldn't open dtb file %s\n, 
binfo-dtb_filename);
+goto fail;
+}
 
-fdt = load_device_tree(filename, size);
-if (!fdt) {
-fprintf(stderr, Couldn't open dtb file %s\n, filename);
+fdt = load_device_tree(filename, size);
+if (!fdt) {
+fprintf(stderr, Couldn't open dtb file %s\n, filename);
+g_free(filename);
+goto fail;
+}
 g_free(filename);
-goto fail;
+} else if (binfo-get_dtb) {
+fdt = binfo-get_dtb(binfo, size);
+if (!fdt) {
+fprintf(stderr, Board was unable to create a dtb blob\n);
+goto fail;
+}
 }
-g_free(filename);
 
 acells = qemu_devtree_getprop_cell(fdt, /, #address-cells);
 scells = qemu_devtree_getprop_cell(fdt, /, #size-cells);
@@ -436,7 +444,7 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info 
*info)
 /* for device tree boot, we pass the DTB directly in r2. Otherwise
  * we point to the kernel args.
  */
-if (info-dtb_filename) {
+if (info-dtb_filename || info-get_dtb) {
 /* Place the DTB after the initrd in memory. Note that some
  * kernels will trash anything in the 4K page the initrd
  * ends in, so make sure the DTB isn't caught up in that.
diff --git a/include/hw/arm/arm.h b/include/hw/arm/arm.h
index bae87c6..4e51a9e 100644
--- a/include/hw/arm/arm.h
+++ b/include/hw/arm/arm.h
@@ -55,6 +55,13 @@ struct arm_boot_info {
  const struct arm_boot_info *info);
 void (*secondary_cpu_reset_hook)(ARMCPU *cpu,
  const struct arm_boot_info *info);
+/* if a board is able to create a dtb without a dtb file then it
+ * sets get_dtb. This will only be used if no dtb file is provided
+ * by the user. On success, sets *size to the length of the created
+ * dtb, and returns a pointer to it. (The caller must free this memory
+ * with g_free() when it has finished with it.) On failure, returns NULL.
+ */
+void *(*get_dtb)(const struct arm_boot_info *info, int *size);
 /* if a board needs to be able to modify a device tree provided by
  * the user it should implement this hook.
  */
-- 
1.7.9.5




Re: [Qemu-devel] [PATCH v4 2/2] ARM: Add 'virt' platform

2013-08-05 Thread Peter Maydell
On 5 August 2013 12:18, Peter Maydell peter.mayd...@linaro.org wrote:
 From: John Rigby john.ri...@linaro.org

 Add 'virt' platform support corresponding to arch/arm/mach-virt
 in the Linux kernel tree. This has no platform-specific code but
 can use any device whose kernel driver is is able to work purely
 from a device tree node. We use this to instantiate a minimal
 set of devices: a GIC and some virtio-mmio transports.

 TODO:
  * MAX_MEM for A15 could be higher if we shuffle things about
  * RAM or flash at addr 0??

Doh. I addressed these TODO items in the code but forgot to
remove them from the commit message. Please ignore for
purposes of code review :-)

-- PMM



[Qemu-devel] [PATCH v4 2/2] ARM: Add 'virt' platform

2013-08-05 Thread Peter Maydell
From: John Rigby john.ri...@linaro.org

Add 'virt' platform support corresponding to arch/arm/mach-virt
in the Linux kernel tree. This has no platform-specific code but
can use any device whose kernel driver is is able to work purely
from a device tree node. We use this to instantiate a minimal
set of devices: a GIC and some virtio-mmio transports.

TODO:
 * MAX_MEM for A15 could be higher if we shuffle things about
 * RAM or flash at addr 0??

Signed-off-by: John Rigby john.ri...@linaro.org
[PMM:
 Significantly overhauled:
 * renamed user-facing machine to just virt
 * removed the A9 support (it can't work since the A9 has no
   generic timers)
 * added virtio-mmio transports instead of random set of 'soc' devices
 * instead of updating io_base as we step through adding devices,
   define a memory map with an array (similar to vexpress)
 * folded in some minor fixes from John's aarch64-support patch
 * rather than explicitly doing endian-swapping on FDT cells,
   use fdt APIs that let us just pass in host-endian values
   and let the fdt layer take care of the swapping
 * miscellaneous minor code cleanups and style fixes
]
Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 hw/arm/Makefile.objs |2 +-
 hw/arm/virt.c|  363 ++
 2 files changed, 364 insertions(+), 1 deletion(-)
 create mode 100644 hw/arm/virt.c

diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
index 9e3a06f..744484f 100644
--- a/hw/arm/Makefile.objs
+++ b/hw/arm/Makefile.objs
@@ -1,7 +1,7 @@
 obj-y += boot.o collie.o exynos4_boards.o gumstix.o highbank.o
 obj-y += integratorcp.o kzm.o mainstone.o musicpal.o nseries.o
 obj-y += omap_sx1.o palm.o pic_cpu.o realview.o spitz.o stellaris.o
-obj-y += tosa.o versatilepb.o vexpress.o xilinx_zynq.o z2.o
+obj-y += tosa.o versatilepb.o vexpress.o virt.o xilinx_zynq.o z2.o
 
 obj-y += armv7m.o exynos4210.o pxa2xx.o pxa2xx_gpio.o pxa2xx_pic.o
 obj-y += omap1.o omap2.o strongarm.o
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
new file mode 100644
index 000..2a743b7
--- /dev/null
+++ b/hw/arm/virt.c
@@ -0,0 +1,363 @@
+/*
+ * ARM mach-virt emulation
+ *
+ * Copyright (c) 2013 Linaro
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see http://www.gnu.org/licenses/.
+ *
+ * Emulate a virtual board which works by passing Linux all the information
+ * it needs about what devices are present via the device tree.
+ * There are some restrictions about what we can do here:
+ *  + we can only present devices whose Linux drivers will work based
+ *purely on the device tree with no platform data at all
+ *  + we want to present a very stripped-down minimalist platform,
+ *both because this reduces the security attack surface from the guest
+ *and also because it reduces our exposure to being broken when
+ *the kernel updates its device tree bindings and requires further
+ *information in a device binding that we aren't providing.
+ * This is essentially the same approach kvmtool uses.
+ */
+
+#include hw/sysbus.h
+#include hw/arm/arm.h
+#include hw/arm/primecell.h
+#include hw/devices.h
+#include net/net.h
+#include sysemu/device_tree.h
+#include sysemu/sysemu.h
+#include sysemu/kvm.h
+#include hw/boards.h
+#include exec/address-spaces.h
+#include qemu/bitops.h
+#include qemu/error-report.h
+
+#define NUM_VIRTIO_TRANSPORTS 32
+
+#define GIC_FDT_IRQ_TYPE_SPI 0
+#define GIC_FDT_IRQ_TYPE_PPI 1
+
+#define GIC_FDT_IRQ_FLAGS_EDGE_LO_HI 1
+#define GIC_FDT_IRQ_FLAGS_EDGE_HI_LO 2
+#define GIC_FDT_IRQ_FLAGS_LEVEL_HI 4
+#define GIC_FDT_IRQ_FLAGS_LEVEL_LO 8
+
+#define GIC_FDT_IRQ_PPI_CPU_START 8
+#define GIC_FDT_IRQ_PPI_CPU_WIDTH 8
+
+enum {
+VIRT_FLASH,
+VIRT_MEM,
+VIRT_CPUPERIPHS,
+VIRT_GIC_DIST,
+VIRT_GIC_CPU,
+VIRT_MMIO,
+};
+
+typedef struct MemMapEntry {
+hwaddr base;
+hwaddr size;
+} MemMapEntry;
+
+typedef struct VirtBoardInfo {
+struct arm_boot_info bootinfo;
+const char *cpu_model;
+const char *cpu_compatible;
+const char *qdevname;
+const char *gic_compatible;
+const MemMapEntry *memmap;
+int smp_cpus;
+void *fdt;
+int fdt_size;
+} VirtBoardInfo;
+
+/* Addresses and sizes of our components.
+ * We leave the first 64K free for possible use later for
+ * flash (for running boot code such as UEFI); following
+ * that is I/O, and then everything else is RAM (which may
+ * happily spill over into the high memory region beyond 4GB).
+ */

Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses

2013-08-05 Thread Andreas Färber
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Am 05.08.2013 13:03, schrieb Jan Kiszka:
 On 2013-08-05 12:51, Peter Maydell wrote:
 On 5 August 2013 11:44, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-08-05 12:36, Peter Maydell wrote:
 On 5 August 2013 11:30, Jan Kiszka jan.kis...@web.de
 wrote:
 On 2013-08-05 11:59, Peter Maydell wrote:
 Or do you mean that if we had:
 
 [ system memory region, with its own default read/write
 ops ]
 
 I cannot imagine how this could work. The system memory
 region has no clue about what the regions below it can
 handle and what not. So it has to pass through the io
 window.
 
 The system memory region's just a container, you can add a 
 background region to it at lowest-possible-priority, which 
 then takes accesses which are either (a) not in any
 subregion or (b) in a subregion but that container doesn't
 specify its own io ops and nothing in that container handles
 the access. (Or you could create the system memory region
 with its own IO ops, which would have the same effect.)
 
 First, we do not render MMIO and IO within the same address
 space so far.
 
 Is this a statement made because you've checked all the boards 
 and know that nobody's mapping the system-io memory region into 
 the system-memory region? (It is technically trivial, you just
 need to call memory_region_add_subregion() directly or
 indirectly...)
 
 I know this because I just recently wrote the patch that enables
 this trivial step, i.e. converted PIO dispatching to the memory
 subsystem.

Several patches have been applied since, e.g.

sPAPR PHB:
http://git.qemu.org/?p=qemu.git;a=commit;h=66aab867cedd2a2d81b4d64eff7c3e0f6f272bbf
- - aliases system_io()

PReP i82378 PCI-ISA bridge:
http://git.qemu.org/?p=qemu.git;a=commit;h=5c9736789b79ea49cd236ac326f0a414f63b1015
- - uses pci_address_space_io()

Alpha Typhoon:
http://git.qemu.org/?p=qemu.git;a=commit;h=056e6bae1c91f47165d962564f82f5176bae47f0
http://git.qemu.org/?p=qemu.git;a=commit;h=3661049fec64ffd7ab008e57e396881c6a4b53a4

[For those joining late, this discussion is about whether making PIO
MemoryRegion ..._io rather than just container might hurt some use
case. If you have a concrete test case that would be appreciated; a
we-don't-care-about-such-a-fringe-case would help as well.]

Andreas

- -- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.19 (GNU/Linux)

iQIcBAEBAgAGBQJR/43lAAoJEPou0S0+fgE/EiMQAJP7YfqF+YYKvS5NUonLMo26
ncsY7E9IikpWSCcL/QVwmfFTfDjptR0iI987+4OcE3Sf0nH3pv5FFge4aR965Of1
sBvqfAVD9ihnV2sO+s2rImgw9tyC/kadA6k0hLB8h/+QDgl/J5wjpI0A29OHT8Uc
BYv210POLqzU+nAO83t9w3xZA3Q60+X8YxI3GqAd6CEMS7i2yga0SK2Q3U1P7o+G
sE6hqYE7sWKtaQlosTWuAxYnOOXcY2u+EEnqS6SXqZbp4NcitRYCamNu+8ARDAab
JNIbbmCt1GTzdQ6PxVu5paPRUJbETQ2rDWxAq4Ryr37Gi5vb9FFPNMKpSncNu28b
o46Fr6QMDTMNgk0Ac7F4WKKICGsCMuHuzot7JLoRkfhPQunqwQR5yEyvUKwYtV5p
pAl1Frop+88X6HEmUkNJHR1FnhI5pVLImyxJTLXVRGgZqx7i1J8cvVPM6PhWa8tU
NFFVKy7aWNj5MOVE8oSGK+LWilikb7i0b7YxDI/Ky/OI62niy3CK+XQgYfSFXEZ/
s8ErAfI5CMKy0O1/KbUy6V3cGpGM2xRizBSDCy3GqP3XD3If3tG8siRSarEG2FxO
TO73OWtYarAtZ5cIRIsefV3uT/CA6g8p/dXgfp1WxDSIZcqZIMz5FIGgIDoIT7i4
6fPuo2wYiZXLS3/g6tuP
=egTc
-END PGP SIGNATURE-



Re: [Qemu-devel] net/tap.c: Possibly a way to stall tap input

2013-08-05 Thread Stefan Hajnoczi
On Fri, Aug 02, 2013 at 09:41:24PM +0200, Jan Kiszka wrote:
 On 2013-08-02 14:45, Jan Kiszka wrote:
  On 2013-08-02 13:46, Stefan Hajnoczi wrote:
  On Thu, Aug 01, 2013 at 07:15:54PM +0200, Jan Kiszka wrote:
  I was digging into the involved code and found something fishy:
 
  net/tap.c:
  static void tap_send(void *opaque)
  {
  ...
  size = qemu_send_packet_async(s-nc, buf, size,
tap_send_completed);
  if (size == 0) {
  tap_read_poll(s, false);
  }
 
  So, if tap_send is registered for the mainloop polling (ie. can_receive
  returned true before starting to poll) but qemu_send_packet_async
  returns 0 now as qemu_can_send_packet/can_receive happens to report
  false in the meantime, we will disable read polling. If also write
  polling is off, the fd will be completely removed from the iohandler
  list. But even if write polling remains on, I wonder what should bring
  read polling back?
 
  This behavior seems fine to me.  Once the peer (pcnet) is able to
  receive again it must flush the queue, this will re-enable
  tap_read_poll().
 
  Can you explain a bit more why this would be a problem?
  
  The problem is that I don't see at all what will call tap_read_poll(s,
  1), neither in theory nor in reality.
  
  As long as the real test case is out of reach, I tried to emulate the
  faulty behaviour by letting tap_can_send always return 1. Result:
  reception stalls during boot as even qemu_flush_queued_packets cannot
  get it running again once tap_read_poll(s, 0) was called.
 
 OK, false alarm. The issue was most likely fixed by commit 199ee608
 (net: fix qemu_flush_queued_packets() in presence of a hub) which is
 present in 1.5.x but not 1.3.x. We initially tried to test on 1.5 but
 had to role back to 1.3 due to other issues - and missed this fix.
 
 My understanding of the networking maze was confused by the unfortunate
 naming of the incoming net client queues (send_queue) - will propose a
 renaming.
 
 This still requires a confirmation on the target, but I'm quite
 optimistic now.

Okay, good to hear.  It makes more sense now and I agree that
send_queue is not a great name.

Stefan



Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses

2013-08-05 Thread Jan Kiszka
On 2013-08-05 13:35, Andreas Färber wrote:
 Am 05.08.2013 13:03, schrieb Jan Kiszka:
 On 2013-08-05 12:51, Peter Maydell wrote:
 On 5 August 2013 11:44, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-08-05 12:36, Peter Maydell wrote:
 On 5 August 2013 11:30, Jan Kiszka jan.kis...@web.de
 wrote:
 On 2013-08-05 11:59, Peter Maydell wrote:
 Or do you mean that if we had:

 [ system memory region, with its own default read/write
 ops ]

 I cannot imagine how this could work. The system memory
 region has no clue about what the regions below it can
 handle and what not. So it has to pass through the io
 window.

 The system memory region's just a container, you can add a 
 background region to it at lowest-possible-priority, which 
 then takes accesses which are either (a) not in any
 subregion or (b) in a subregion but that container doesn't
 specify its own io ops and nothing in that container handles
 the access. (Or you could create the system memory region
 with its own IO ops, which would have the same effect.)

 First, we do not render MMIO and IO within the same address
 space so far.

 Is this a statement made because you've checked all the boards 
 and know that nobody's mapping the system-io memory region into 
 the system-memory region? (It is technically trivial, you just
 need to call memory_region_add_subregion() directly or
 indirectly...)
 
 I know this because I just recently wrote the patch that enables
 this trivial step, i.e. converted PIO dispatching to the memory
 subsystem.
 
 Several patches have been applied since, e.g.
 
 sPAPR PHB:
 http://git.qemu.org/?p=qemu.git;a=commit;h=66aab867cedd2a2d81b4d64eff7c3e0f6f272bbf
 - aliases system_io()
 
 PReP i82378 PCI-ISA bridge:
 http://git.qemu.org/?p=qemu.git;a=commit;h=5c9736789b79ea49cd236ac326f0a414f63b1015
 - uses pci_address_space_io()
 
 Alpha Typhoon:
 http://git.qemu.org/?p=qemu.git;a=commit;h=056e6bae1c91f47165d962564f82f5176bae47f0
 http://git.qemu.org/?p=qemu.git;a=commit;h=3661049fec64ffd7ab008e57e396881c6a4b53a4
 
 [For those joining late, this discussion is about whether making PIO
 MemoryRegion ..._io rather than just container might hurt some use
 case. If you have a concrete test case that would be appreciated; a
 we-don't-care-about-such-a-fringe-case would help as well.]

OK, one assumption became outdated, but the other will remain true once
the patch is applied. So let's close this discussion.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH for-1.6] qemu-img: Error out for excess arguments

2013-08-05 Thread Laszlo Ersek
On 08/05/13 10:57, Kevin Wolf wrote:
 Don't silently ignore excess arguments at the end of the command line,
 but error out instead. This can catch typos like 'resize test.img + 1G',
 which doesn't increase the image size by 1G as intended, but truncates
 the image to 1G. Even for less dangerous commands, the old behaviour is
 confusing.
 
 Signed-off-by: Kevin Wolf kw...@redhat.com
 ---
  qemu-img.c | 17 ++---
  1 file changed, 10 insertions(+), 7 deletions(-)
 
 diff --git a/qemu-img.c b/qemu-img.c
 index c55ca5c..dece1b3 100644
 --- a/qemu-img.c
 +++ b/qemu-img.c
 @@ -396,6 +396,9 @@ static int img_create(int argc, char **argv)
  }
  img_size = (uint64_t)sval;
  }
 +if (optind != argc) {
 +help();
 +}
  
  if (options  is_help_option(options)) {
  return print_block_option_help(filename, fmt);
 @@ -573,7 +576,7 @@ static int img_check(int argc, char **argv)
  break;
  }
  }
 -if (optind = argc) {
 +if (optind != argc - 1) {
  help();
  }
  filename = argv[optind++];
 @@ -684,7 +687,7 @@ static int img_commit(int argc, char **argv)
  break;
  }
  }
 -if (optind = argc) {
 +if (optind != argc - 1) {
  help();
  }
  filename = argv[optind++];
 @@ -930,7 +933,7 @@ static int img_compare(int argc, char **argv)
  }
  
  
 -if (optind  argc - 2) {
 +if (optind != argc - 2) {
  help();
  }
  filename1 = argv[optind++];
 @@ -1741,7 +1744,7 @@ static int img_info(int argc, char **argv)
  break;
  }
  }
 -if (optind = argc) {
 +if (optind != argc - 1) {
  help();
  }
  filename = argv[optind++];
 @@ -1842,7 +1845,7 @@ static int img_snapshot(int argc, char **argv)
  }
  }
  
 -if (optind = argc) {
 +if (optind != argc - 1) {
  help();
  }
  filename = argv[optind++];
 @@ -1953,7 +1956,7 @@ static int img_rebase(int argc, char **argv)
  progress = 0;
  }
  
 -if ((optind = argc) || (!unsafe  !out_baseimg)) {
 +if ((optind != argc - 1) || (!unsafe  !out_baseimg)) {
  help();
  }
  filename = argv[optind++];
 @@ -2232,7 +2235,7 @@ static int img_resize(int argc, char **argv)
  break;
  }
  }
 -if (optind = argc) {
 +if (optind != argc - 1) {
  help();
  }
  filename = argv[optind++];
 

Reviewed-by: Laszlo Ersek ler...@redhat.com



Re: [Qemu-devel] [PATCH v2 2/9] block: vhdx - add header update capability.

2013-08-05 Thread Kevin Wolf
Am 01.08.2013 um 05:23 hat Jeff Cody geschrieben:
 This adds the ability to update the headers in a VHDX image, including
 generating a new MS-compatible GUID.
 
 As VHDX depends on uuid.h, VHDX is now a configurable build option.  If
 VHDX support is enabled, that will also enable uuid as well.  The
 default is to have VHDX enabled.
 
 To enable/disable VHDX:  --enable-vhdx, --disable-vhdx
 
 Signed-off-by: Jeff Cody jc...@redhat.com

 diff --git a/configure b/configure
 index 877a821..821b790 100755
 --- a/configure
 +++ b/configure
 @@ -244,6 +244,7 @@ gtk=
  gtkabi=2.0
  tpm=no
  libssh2=
 +vhdx=yes
  
  # parse CC options first
  for opt do
 @@ -950,6 +951,11 @@ for opt do
;;
--enable-libssh2) libssh2=yes
;;
 +  --enable-vhdx) vhdx=yes ;
 + uuid=yes
 +  ;;

What's this? Shouldn't auto-detecting uuid and checking later whether
it's available do the right thing?

Test cases:

1. configure --disable-uuid --enable-vhdx
   This should result in an error because the options contradict each
   other. With this patch, the disable flag is overridden.

2. No libuuid headers available; ./configure
   Should build without VHDX. With this patch, uuid is left out, vhdx is
   configured anyway, the build fails.

Kevin



Re: [Qemu-devel] [PATCH] virtio: Do not notify virtqueue if no element was pushed back.

2013-08-05 Thread Laszlo Ersek
On 08/05/13 10:18, Gal Hammer wrote:
 Signed-off-by: Gal Hammer gham...@redhat.com
 ---
  hw/char/virtio-serial-bus.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)
 
 diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
 index da417c7..0d38b4b 100644
 --- a/hw/char/virtio-serial-bus.c
 +++ b/hw/char/virtio-serial-bus.c
 @@ -105,6 +105,7 @@ static void do_flush_queued_data(VirtIOSerialPort *port, 
 VirtQueue *vq,
   VirtIODevice *vdev)
  {
  VirtIOSerialPortClass *vsc;
 +bool elem_pushed = false;
  
  assert(port);
  assert(virtio_queue_ready(vq));
 @@ -145,9 +146,12 @@ static void do_flush_queued_data(VirtIOSerialPort *port, 
 VirtQueue *vq,
  break;
  }
  virtqueue_push(vq, port-elem, 0);
 +elem_pushed = true;
  port-elem.out_num = 0;
  }
 -virtio_notify(vdev, vq);
 +if (elem_pushed) {
 +virtio_notify(vdev, vq);
 +}
  }
  
  static void flush_queued_data(VirtIOSerialPort *port)
 

I could be missing something, but it looks good to me.

BTW the subject should say: virtio-serial: ..., not generic virtio: 

How did you catch this? Is this a performance problem, or did it expose
a guest driver bug? (The guest driver should be prepared for spurious
wakeups.)

Reviewed-by: Laszlo Ersek ler...@redhat.com



Re: [Qemu-devel] [PATCH v4 2/2] ARM: Add 'virt' platform

2013-08-05 Thread Anup Patel
On Mon, Aug 5, 2013 at 4:48 PM, Peter Maydell peter.mayd...@linaro.org wrote:
 From: John Rigby john.ri...@linaro.org

 Add 'virt' platform support corresponding to arch/arm/mach-virt
 in the Linux kernel tree. This has no platform-specific code but
 can use any device whose kernel driver is is able to work purely
 from a device tree node. We use this to instantiate a minimal
 set of devices: a GIC and some virtio-mmio transports.

 TODO:
  * MAX_MEM for A15 could be higher if we shuffle things about
  * RAM or flash at addr 0??

 Signed-off-by: John Rigby john.ri...@linaro.org
 [PMM:
  Significantly overhauled:
  * renamed user-facing machine to just virt
  * removed the A9 support (it can't work since the A9 has no
generic timers)
  * added virtio-mmio transports instead of random set of 'soc' devices
  * instead of updating io_base as we step through adding devices,
define a memory map with an array (similar to vexpress)
  * folded in some minor fixes from John's aarch64-support patch
  * rather than explicitly doing endian-swapping on FDT cells,
use fdt APIs that let us just pass in host-endian values
and let the fdt layer take care of the swapping
  * miscellaneous minor code cleanups and style fixes
 ]
 Signed-off-by: Peter Maydell peter.mayd...@linaro.org
 ---
  hw/arm/Makefile.objs |2 +-
  hw/arm/virt.c|  363 
 ++
  2 files changed, 364 insertions(+), 1 deletion(-)
  create mode 100644 hw/arm/virt.c

 diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
 index 9e3a06f..744484f 100644
 --- a/hw/arm/Makefile.objs
 +++ b/hw/arm/Makefile.objs
 @@ -1,7 +1,7 @@
  obj-y += boot.o collie.o exynos4_boards.o gumstix.o highbank.o
  obj-y += integratorcp.o kzm.o mainstone.o musicpal.o nseries.o
  obj-y += omap_sx1.o palm.o pic_cpu.o realview.o spitz.o stellaris.o
 -obj-y += tosa.o versatilepb.o vexpress.o xilinx_zynq.o z2.o
 +obj-y += tosa.o versatilepb.o vexpress.o virt.o xilinx_zynq.o z2.o

  obj-y += armv7m.o exynos4210.o pxa2xx.o pxa2xx_gpio.o pxa2xx_pic.o
  obj-y += omap1.o omap2.o strongarm.o
 diff --git a/hw/arm/virt.c b/hw/arm/virt.c
 new file mode 100644
 index 000..2a743b7
 --- /dev/null
 +++ b/hw/arm/virt.c
 @@ -0,0 +1,363 @@
 +/*
 + * ARM mach-virt emulation
 + *
 + * Copyright (c) 2013 Linaro
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms and conditions of the GNU General Public License,
 + * version 2 or later, as published by the Free Software Foundation.
 + *
 + * This program is distributed in the hope it will be useful, but WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
 + * more details.
 + *
 + * You should have received a copy of the GNU General Public License along 
 with
 + * this program.  If not, see http://www.gnu.org/licenses/.
 + *
 + * Emulate a virtual board which works by passing Linux all the information
 + * it needs about what devices are present via the device tree.
 + * There are some restrictions about what we can do here:
 + *  + we can only present devices whose Linux drivers will work based
 + *purely on the device tree with no platform data at all
 + *  + we want to present a very stripped-down minimalist platform,
 + *both because this reduces the security attack surface from the guest
 + *and also because it reduces our exposure to being broken when
 + *the kernel updates its device tree bindings and requires further
 + *information in a device binding that we aren't providing.
 + * This is essentially the same approach kvmtool uses.
 + */
 +
 +#include hw/sysbus.h
 +#include hw/arm/arm.h
 +#include hw/arm/primecell.h
 +#include hw/devices.h
 +#include net/net.h
 +#include sysemu/device_tree.h
 +#include sysemu/sysemu.h
 +#include sysemu/kvm.h
 +#include hw/boards.h
 +#include exec/address-spaces.h
 +#include qemu/bitops.h
 +#include qemu/error-report.h
 +
 +#define NUM_VIRTIO_TRANSPORTS 32
 +
 +#define GIC_FDT_IRQ_TYPE_SPI 0
 +#define GIC_FDT_IRQ_TYPE_PPI 1
 +
 +#define GIC_FDT_IRQ_FLAGS_EDGE_LO_HI 1
 +#define GIC_FDT_IRQ_FLAGS_EDGE_HI_LO 2
 +#define GIC_FDT_IRQ_FLAGS_LEVEL_HI 4
 +#define GIC_FDT_IRQ_FLAGS_LEVEL_LO 8
 +
 +#define GIC_FDT_IRQ_PPI_CPU_START 8
 +#define GIC_FDT_IRQ_PPI_CPU_WIDTH 8
 +
 +enum {
 +VIRT_FLASH,
 +VIRT_MEM,
 +VIRT_CPUPERIPHS,
 +VIRT_GIC_DIST,
 +VIRT_GIC_CPU,
 +VIRT_MMIO,
 +};
 +
 +typedef struct MemMapEntry {
 +hwaddr base;
 +hwaddr size;
 +} MemMapEntry;
 +
 +typedef struct VirtBoardInfo {
 +struct arm_boot_info bootinfo;
 +const char *cpu_model;
 +const char *cpu_compatible;
 +const char *qdevname;
 +const char *gic_compatible;
 +const MemMapEntry *memmap;
 +int smp_cpus;
 +void *fdt;
 +int fdt_size;
 +} VirtBoardInfo;
 +
 +/* Addresses and sizes of our components.
 + * We leave the first 64K free for 

Re: [Qemu-devel] [PATCH for-1.6 01/11] ignore SIGPIPE in qemu-img and qemu-io

2013-08-05 Thread Kevin Wolf
Am 03.08.2013 um 05:52 hat Doug Goldstein geschrieben:
 On Tue, Jul 23, 2013 at 4:19 AM, Paolo Bonzini pbonz...@redhat.com wrote:
  Il 23/07/2013 10:30, MORITA Kazutaka ha scritto:
  This prevents the tools from being stopped when they write data to a
  closed connection in the other side.
 
  Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
  ---
   qemu-img.c | 4 
   qemu-io.c  | 4 
   2 files changed, 8 insertions(+)
 
  diff --git a/qemu-img.c b/qemu-img.c
  index c55ca5c..919d464 100644
  --- a/qemu-img.c
  +++ b/qemu-img.c
  @@ -2319,6 +2319,10 @@ int main(int argc, char **argv)
   const img_cmd_t *cmd;
   const char *cmdname;
 
  +#ifdef CONFIG_POSIX
  +signal(SIGPIPE, SIG_IGN);
  +#endif
  +
   error_set_progname(argv[0]);
 
   qemu_init_main_loop();
  diff --git a/qemu-io.c b/qemu-io.c
  index cb9def5..d54dc86 100644
  --- a/qemu-io.c
  +++ b/qemu-io.c
  @@ -335,6 +335,10 @@ int main(int argc, char **argv)
   int opt_index = 0;
   int flags = BDRV_O_UNMAP;
 
  +#ifdef CONFIG_POSIX
  +signal(SIGPIPE, SIG_IGN);
  +#endif
  +
   progname = basename(argv[0]);
 
   while ((c = getopt_long(argc, argv, sopt, lopt, opt_index)) != -1) {
 
 
  Reviewed-by: Paolo Bonzini pbonz...@redhat.com
 
  and adding qemu-stable for this one.
 
 
 Nudge so this isn't forgotten about since it hasn't hit master yet.

Thanks, applied to the block branch.

Kevin



Re: [Qemu-devel] [PATCH] tap: Use numbered tap/tun devices on all *BSD OS's

2013-08-05 Thread Stefan Hajnoczi
On Sat, Aug 03, 2013 at 10:20:41PM -0400, Brad Smith wrote:
 The following patch simplifies the *BSD tap/tun code and makes use of numbered
 tap/tun interfaces on all *BSD OS's. NetBSD has a patch in their pkgsrc tree
 to make use of this feature and DragonFly also supports this as well.
 
 Signed-off-by: Brad Smith b...@comstyle.com

I confirmed that the NetBSD tun driver does use /dev/tap%d by default.
There are not other CONFIG_BSD=y targets listed in ./configure besides
FreeBSD/kFreeBSD, OpenBSD, and Darwin.  Therefore this patch is safe.

Thanks, applied to my net-next tree:
https://github.com/stefanha/qemu/commits/net-next

Stefan



Re: [Qemu-devel] [ANNOUNCE] QEMU 1.6.0-rc1 is now available

2013-08-05 Thread Laszlo Ersek
On 08/04/13 14:17, Mark Cave-Ayland wrote:
 On 01/08/13 23:08, Anthony Liguori wrote:
 
 Hi,

 On behalf of the QEMU Team, I'd like to announce the availability of the
 second release candidate for the QEMU 1.6 release.  This release is meant
 for testing purposes and should not be used in a production environment.

 http://wiki.qemu.org/download/qemu-1.6.0-rc1.tar.bz2
 
 Hi Anthony,
 
 There are still two outstanding issues from my testing here - patches
 have been submitted for both, but not applied yet.
 
 - qemu-system-sparc64 broken
 See https://lists.gnu.org/archive/html/qemu-devel/2013-07/msg05201.html.
 
 - OpenBIOS images have not been updated
 See https://lists.gnu.org/archive/html/qemu-devel/2013-07/msg05923.html.

Also, my series

  [PATCH 0/4] dump-guest-memory: correct the vmcores
  http://thread.gmane.org/gmane.comp.emulators.qemu/225360

was dropped from qmp-1.6 (and consequently didn't make the d5a2bcf7
merge) due to an asynchronous change to another part of the tree that I
could not have foreseen.

IOW it was not my fault -- I did exercise due diligence and my series
was building and working fine at the then-fresh master when I posted it.
A conflicting function was introduced between my posting and Luiz's
application of my series to his qmp-1.6 branch, and since I was on PTO
from last Thursday, I could only refresh the series only today.

The update from v1 to v2 is trivial:


diff --git a/target-s390x/arch_dump.c b/target-s390x/arch_dump.c
index f3e5144..9d36116 100644
--- a/target-s390x/arch_dump.c
+++ b/target-s390x/arch_dump.c
@@ -176,7 +176,8 @@ int s390_cpu_write_elf64_note(WriteCoreDumpFunction f, 
CPUState *cs,
 return s390x_write_all_elf64_notes(CORE, f, cpu, cpuid, opaque);
 }

-int cpu_get_dump_info(ArchDumpInfo *info)
+int cpu_get_dump_info(ArchDumpInfo *info,
+  const struct GuestPhysBlockList *guest_phys_blocks)
 {
 info-d_machine = EM_S390;
 info-d_endian = ELFDATA2MSB;


Please do consider it for 1.6, it addresses a serious bug (= vmcores saved
with dump-guest-memory/paging=false are broken for  3.5G x86(_64) guests).

  [PATCH v2 for-qmp-1.6 0/4] dump-guest-memory: correct the vmcores
  http://thread.gmane.org/gmane.comp.emulators.qemu/226378

Thanks
Laszlo




Re: [Qemu-devel] [PATCH] net: Rename send_queue to incoming_queue

2013-08-05 Thread Stefan Hajnoczi
On Fri, Aug 02, 2013 at 09:47:08PM +0200, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com
 
 Each networking client has a queue for packets that could not yet be
 delivered to that client. Calling this queue send_queue is highly
 confusing as it has nothing to to with packets send from this client but
 to it. Avoid this confusing by renaming it to incoming_queue.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  include/net/net.h |2 +-
  net/hub.c |2 +-
  net/net.c |   14 +++---
  3 files changed, 9 insertions(+), 9 deletions(-)

Taking this for QEMU 1.7 in the net-next branch.

Thanks, applied to my net-next tree:
https://github.com/stefanha/qemu/commits/net-next

Stefan



Re: [Qemu-devel] [PATCH] virtio: Do not notify virtqueue if no element was pushed back.

2013-08-05 Thread Gal Hammer

On 05/08/2013 14:49, Laszlo Ersek wrote:

On 08/05/13 10:18, Gal Hammer wrote:

Signed-off-by: Gal Hammer gham...@redhat.com
---
  hw/char/virtio-serial-bus.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index da417c7..0d38b4b 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -105,6 +105,7 @@ static void do_flush_queued_data(VirtIOSerialPort *port, 
VirtQueue *vq,
   VirtIODevice *vdev)
  {
  VirtIOSerialPortClass *vsc;
+bool elem_pushed = false;

  assert(port);
  assert(virtio_queue_ready(vq));
@@ -145,9 +146,12 @@ static void do_flush_queued_data(VirtIOSerialPort *port, 
VirtQueue *vq,
  break;
  }
  virtqueue_push(vq, port-elem, 0);
+elem_pushed = true;
  port-elem.out_num = 0;
  }
-virtio_notify(vdev, vq);
+if (elem_pushed) {
+virtio_notify(vdev, vq);
+}
  }

  static void flush_queued_data(VirtIOSerialPort *port)



I could be missing something, but it looks good to me.


Thanks.


BTW the subject should say: virtio-serial: ..., not generic virtio: 


OK. I can change that.


How did you catch this? Is this a performance problem, or did it expose
a guest driver bug? (The guest driver should be prepared for spurious
wakeups.)


I had a bug in the Windows' driver that caused a buffer duplication when 
the char device didn't read the whole message. It was found when working 
with SPICE client over WAN (i.e. slow connection) and trying to 
copypaste a large image.


The driver was fixed. However I still think that this is needed although 
I'm not sure if there is a major performance penalty because of this 
redundant notification.


Gal.


Reviewed-by: Laszlo Ersek ler...@redhat.com






Re: [Qemu-devel] [PATCH for-1.6] target-i386: Fix X86CPU error handling

2013-08-05 Thread Andreas Färber
Am 02.08.2013 19:22, schrieb Andreas Färber:
 Error **errp argument is not for emitting warnings, it means an error
 has occurred and the caller should not make any assumptions about the
 state of other return values (unless otherwise documented).
 
 Therefore cpu_x86_create() must unref the new X86CPU itself, and
 pc_new_cpu() must check for an Error rather than NULL return value.
 
 While at it, clean up a superfluous NULL check.
 
 Reported-by: Jan Kiszka jan.kis...@siemens.com
 Cc: qemu-sta...@nongnu.org
 Cc: Igor Mammedov imamm...@redhat.com
 Signed-off-by: Andreas Färber afaer...@suse.de

Ping! Jan, does this address your concerns / use cases?
-rc2 is on Wednesday.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] [PATCH for-1.6] target-i386: Fix X86CPU error handling

2013-08-05 Thread Jan Kiszka
On 2013-08-05 14:06, Andreas Färber wrote:
 Am 02.08.2013 19:22, schrieb Andreas Färber:
 Error **errp argument is not for emitting warnings, it means an error
 has occurred and the caller should not make any assumptions about the
 state of other return values (unless otherwise documented).

 Therefore cpu_x86_create() must unref the new X86CPU itself, and
 pc_new_cpu() must check for an Error rather than NULL return value.

 While at it, clean up a superfluous NULL check.

 Reported-by: Jan Kiszka jan.kis...@siemens.com
 Cc: qemu-sta...@nongnu.org
 Cc: Igor Mammedov imamm...@redhat.com
 Signed-off-by: Andreas Färber afaer...@suse.de
 
 Ping! Jan, does this address your concerns / use cases?
 -rc2 is on Wednesday.

Yes, thanks, this looks good.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH] pcnet: Flush queued packets on end of STOP state

2013-08-05 Thread Stefan Hajnoczi
On Fri, Aug 02, 2013 at 09:48:18PM +0200, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com
 
 Analogously to other NICs, we have to inform the network layer when
 the can_receive handler will no longer report 0. Without this, we may
 get stuck waiting on queued incoming packets.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  hw/net/pcnet.c |4 
  1 files changed, 4 insertions(+), 0 deletions(-)

Great, let's get this into the next QEMU 1.6-rc.

Thanks, applied to my net tree:
https://github.com/stefanha/qemu/commits/net

Stefan



Re: [Qemu-devel] [PATCH v4 2/2] ARM: Add 'virt' platform

2013-08-05 Thread Peter Maydell
On 5 August 2013 12:48, Anup Patel a...@brainfault.org wrote:
 +static const MemMapEntry a15memmap[] = {
 +[VIRT_FLASH] = { 0, 0x10 },

 IMHO, 1 MB of flash is small for possible future expansion. If mach-virt
 becomes popular then we can expect people running UBoot or UEFI or
  from this flash.

 I think having 16 MB of flash would be good because it is multiple of
 2 MB hence we can also create section entries for it in Stage2 TTBL.

Seems reasonable.

 +[VIRT_CPUPERIPHS] = { 0x10, 0x8000 },

 I would suggest to start peripheral space at 2 MB boundary and also
 have its total size in multiple of 2 MB.

The total size here is fixed by the CPU hardware -- an A15's
private peripheral space is only 0x8000 in size.

 This will enable us to create 2 MB
 entries in Stage2 TTBL for trapping which in-turn can help in performance
 by reducing Stage2 TTBL walks.

 I am sure there won't be too many peripherals in mach-virt so, it would
 even better if we can have base address of each peripheral to be at
 2 MB boundary.

Why does each individual peripheral have to be at a 2MB boundary?
I would expect the kernel to just have a single nothing mapped
here stage 2 table entry (or entries) covering the whole of the
mmio peripheral region regardless of how many devices happen
to be inside it.

thanks
-- PMM



[Qemu-devel] [PULL 1/1] pcnet: Flush queued packets on end of STOP state

2013-08-05 Thread Stefan Hajnoczi
From: Jan Kiszka jan.kis...@siemens.com

Analogously to other NICs, we have to inform the network layer when
the can_receive handler will no longer report 0. Without this, we may
get stuck waiting on queued incoming packets.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/net/pcnet.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/net/pcnet.c b/hw/net/pcnet.c
index b606d2b..63aa73a 100644
--- a/hw/net/pcnet.c
+++ b/hw/net/pcnet.c
@@ -861,6 +861,8 @@ static void pcnet_init(PCNetState *s)
 
 s-csr[0] |= 0x0101;
 s-csr[0] = ~0x0004;   /* clear STOP bit */
+
+qemu_flush_queued_packets(qemu_get_queue(s-nic));
 }
 
 static void pcnet_start(PCNetState *s)
@@ -878,6 +880,8 @@ static void pcnet_start(PCNetState *s)
 s-csr[0] = ~0x0004;   /* clear STOP bit */
 s-csr[0] |= 0x0002;
 pcnet_poll_timer(s);
+
+qemu_flush_queued_packets(qemu_get_queue(s-nic));
 }
 
 static void pcnet_stop(PCNetState *s)
-- 
1.8.1.4




[Qemu-devel] [PULL for-1.6 0/1] Net patches

2013-08-05 Thread Stefan Hajnoczi
This fix allows the pcnet NIC to work properly with the tap device.

The following changes since commit b9ac5d923b820a0f0152a2df56067e55ce34f487:

  target-mips: fix 34Kf configuration for DSP ASE (2013-08-03 23:33:17 +0200)

are available in the git repository at:

  git://github.com/stefanha/qemu.git net

for you to fetch changes up to ee76c1f821e75550644e084dea85743bbc934f91:

  pcnet: Flush queued packets on end of STOP state (2013-08-05 14:11:17 +0200)


Jan Kiszka (1):
  pcnet: Flush queued packets on end of STOP state

 hw/net/pcnet.c | 4 
 1 file changed, 4 insertions(+)

-- 
1.8.1.4




Re: [Qemu-devel] [PATCH v4 2/2] ARM: Add 'virt' platform

2013-08-05 Thread Anup Patel
On Mon, Aug 5, 2013 at 5:31 PM, Peter Maydell peter.mayd...@linaro.org wrote:
 On 5 August 2013 12:48, Anup Patel a...@brainfault.org wrote:
 +static const MemMapEntry a15memmap[] = {
 +[VIRT_FLASH] = { 0, 0x10 },

 IMHO, 1 MB of flash is small for possible future expansion. If mach-virt
 becomes popular then we can expect people running UBoot or UEFI or
  from this flash.

 I think having 16 MB of flash would be good because it is multiple of
 2 MB hence we can also create section entries for it in Stage2 TTBL.

 Seems reasonable.

 +[VIRT_CPUPERIPHS] = { 0x10, 0x8000 },

 I would suggest to start peripheral space at 2 MB boundary and also
 have its total size in multiple of 2 MB.

 The total size here is fixed by the CPU hardware -- an A15's
 private peripheral space is only 0x8000 in size.

Does this mean, mach-virt address space is Cortex-A15 specific ?
What about Cortex-A7 and Cortex-A12  ?

If above is a mandatory constraint then we can have peripheral space divide
into two parts:
1. Essential (or MPCore) peripherals: For now only GIC belongs here. Other
things that fit here is Watchdog and Local timers but this are redundant for
mach-virt. This space can be only 0x8000 in size as required by Cortex-A15.
2. General peripherals: All VirtIO devices would belong here. In addition,
users can also add devices to this space using QEMU command line.


 This will enable us to create 2 MB
 entries in Stage2 TTBL for trapping which in-turn can help in performance
 by reducing Stage2 TTBL walks.

 I am sure there won't be too many peripherals in mach-virt so, it would
 even better if we can have base address of each peripheral to be at
 2 MB boundary.

 Why does each individual peripheral have to be at a 2MB boundary?
 I would expect the kernel to just have a single nothing mapped
 here stage 2 table entry (or entries) covering the whole of the
 mmio peripheral region regardless of how many devices happen
 to be inside it.

Yes, this seem a little over-kill but we can atleast have peripheral space to
be 2MB aligned with total size in multiple of 2MB.


 thanks
 -- PMM



Re: [Qemu-devel] [PATCH for-1.6] target-mips: do not raise exceptions when accessing invalid memory

2013-08-05 Thread Aurélien Jarno
On Mon, Aug 05, 2013 at 07:19:08AM +0200, Stefan Weil wrote:
 Am 05.08.2013 00:37, schrieb Peter Maydell:
  On 4 August 2013 23:04, Aurélien Jarno aurel...@aurel32.net wrote:
  The real hardware probably returns all 1 or all 0 for addresses not
  decoded to a device. This is what QEMU should model, and it should
  not trigger a DBE or IBE exception. Looking at the current MIPS
  documentation, Bus Error is defined as:
 
A bus error exception occurs when an instruction or data access makes a
bus request (due to a cache miss or an  uncacheable reference) and
that request terminates in an error.
 
  Older CPU documentation like the R4000 have a more precise definition:
 
A Bus Error exception is raised by board-level circuitry for events such
as bus time-out, backplane bus parity errors, and invalid physical memory
addresses or access types.
 
  As we don't model this kind of errors, we should definitely just not
  trigger an exception in that case, and even logging the event as
  unimplemented is probably wrong.
  Well, we certainly can model invalid-physical-address and
  bus-timeout where that's what the board does for accesses
  to non-decoded addresses; but presumably in this case it
  doesn't...
 
  -- PMM
 
 Is there anybody who has access to physical Malta hardware?
 It would be interesting to see whether there is an exception
 during the gcmp test or not.
 
 With latest QEMU, the MIPS Malta system emulation starts
 handling the exception caused by the gcmp test, but then
 gets a second exception which is fatal (see below).
 
 There might be something missing in our very simple bios
 emulation.

Booting YAMON in QEMU also shows the same behaviour, that is trying to
access to the 1fbf8008 address and getting a DBE exception, causing it
to fail. So it is clearly not due to our simple bios emulation, but
rather to the way the I/O are emulated.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net



Re: [Qemu-devel] [PATCH v4 2/2] ARM: Add 'virt' platform

2013-08-05 Thread Peter Maydell
On 5 August 2013 13:22, Anup Patel a...@brainfault.org wrote:
 On Mon, Aug 5, 2013 at 5:31 PM, Peter Maydell peter.mayd...@linaro.org 
 wrote:
 On 5 August 2013 12:48, Anup Patel a...@brainfault.org wrote:
 +static const MemMapEntry a15memmap[] = {
 +[VIRT_FLASH] = { 0, 0x10 },

 IMHO, 1 MB of flash is small for possible future expansion. If mach-virt
 becomes popular then we can expect people running UBoot or UEFI or
  from this flash.

 I think having 16 MB of flash would be good because it is multiple of
 2 MB hence we can also create section entries for it in Stage2 TTBL.

 Seems reasonable.

 +[VIRT_CPUPERIPHS] = { 0x10, 0x8000 },

 I would suggest to start peripheral space at 2 MB boundary and also
 have its total size in multiple of 2 MB.

 The total size here is fixed by the CPU hardware -- an A15's
 private peripheral space is only 0x8000 in size.

 Does this mean, mach-virt address space is Cortex-A15 specific ?

The patch has support for an array of VirtBoardInfo structs,
each of which has its own memmap. The only entry at the
moment is for the A15.

 What about Cortex-A7 and Cortex-A12  ?

QEMU supports neither of these CPUs. The A7's the same as
the A15, though.

 If above is a mandatory constraint then we can have peripheral space divide
 into two parts:
 1. Essential (or MPCore) peripherals: For now only GIC belongs here. Other
 things that fit here is Watchdog and Local timers but this are redundant for
 mach-virt. This space can be only 0x8000 in size as required by Cortex-A15.
 2. General peripherals: All VirtIO devices would belong here. In addition,
 users can also add devices to this space using QEMU command line.

Er, this is what the patch already does. The CPUPERIPHS
bit is specifically for the CPU's private peripheral region.

You can't add an MMIO device on the QEMU command line, there's
no way to wire up the memory regions and IRQs.

 Why does each individual peripheral have to be at a 2MB boundary?
 I would expect the kernel to just have a single nothing mapped
 here stage 2 table entry (or entries) covering the whole of the
 mmio peripheral region regardless of how many devices happen
 to be inside it.

 Yes, this seem a little over-kill but we can atleast have peripheral space to
 be 2MB aligned with total size in multiple of 2MB.

I really don't want to eat 2MB for each virtio-mmio transport
in a 32 bit address space, it seems hugely wasteful unless
there's a good reason to do it.

-- PMM



Re: [Qemu-devel] [PATCH 02/11] iov: handle EOF in iov_send_recv

2013-08-05 Thread Kevin Wolf
Am 03.08.2013 um 05:48 hat Doug Goldstein geschrieben:
 On Tue, Jul 23, 2013 at 6:28 AM, Paolo Bonzini pbonz...@redhat.com wrote:
  Il 23/07/2013 10:30, MORITA Kazutaka ha scritto:
  Without this patch, iov_send_recv() never returns when do_send_recv()
  returns zero.
 
  Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
  ---
   util/iov.c | 6 ++
   1 file changed, 6 insertions(+)
 
  diff --git a/util/iov.c b/util/iov.c
  index cc6e837..f705586 100644
  --- a/util/iov.c
  +++ b/util/iov.c
  @@ -202,6 +202,12 @@ ssize_t iov_send_recv(int sockfd, struct iovec *iov, 
  unsigned iov_cnt,
   return -1;
   }
 
  +if (ret == 0  !do_send) {
  +/* recv returns 0 when the peer has performed an orderly
  + * shutdown. */
  +break;
  +}
  +
   /* Prepare for the next iteration */
   offset += ret;
   total += ret;
 
 
  Reviewed-by: Paolo Bonzini pbonz...@redhat.com
 
  ... and should also be in 1.5.2.
 
  Paolo
 
 
 Nudge so this doesn't get forgotten about. It hasn't hit master yet.

Thanks, applied to the block branch.

Kevin



Re: [Qemu-devel] [PATCH v4 2/2] ARM: Add 'virt' platform

2013-08-05 Thread Anup Patel
On Mon, Aug 5, 2013 at 5:58 PM, Peter Maydell peter.mayd...@linaro.org wrote:
 On 5 August 2013 13:22, Anup Patel a...@brainfault.org wrote:
 On Mon, Aug 5, 2013 at 5:31 PM, Peter Maydell peter.mayd...@linaro.org 
 wrote:
 On 5 August 2013 12:48, Anup Patel a...@brainfault.org wrote:
 +static const MemMapEntry a15memmap[] = {
 +[VIRT_FLASH] = { 0, 0x10 },

 IMHO, 1 MB of flash is small for possible future expansion. If mach-virt
 becomes popular then we can expect people running UBoot or UEFI or
  from this flash.

 I think having 16 MB of flash would be good because it is multiple of
 2 MB hence we can also create section entries for it in Stage2 TTBL.

 Seems reasonable.

 +[VIRT_CPUPERIPHS] = { 0x10, 0x8000 },

 I would suggest to start peripheral space at 2 MB boundary and also
 have its total size in multiple of 2 MB.

 The total size here is fixed by the CPU hardware -- an A15's
 private peripheral space is only 0x8000 in size.

 Does this mean, mach-virt address space is Cortex-A15 specific ?

 The patch has support for an array of VirtBoardInfo structs,
 each of which has its own memmap. The only entry at the
 moment is for the A15.

 What about Cortex-A7 and Cortex-A12  ?

 QEMU supports neither of these CPUs. The A7's the same as
 the A15, though.

 If above is a mandatory constraint then we can have peripheral space divide
 into two parts:
 1. Essential (or MPCore) peripherals: For now only GIC belongs here. Other
 things that fit here is Watchdog and Local timers but this are redundant for
 mach-virt. This space can be only 0x8000 in size as required by Cortex-A15.
 2. General peripherals: All VirtIO devices would belong here. In addition,
 users can also add devices to this space using QEMU command line.

 Er, this is what the patch already does. The CPUPERIPHS
 bit is specifically for the CPU's private peripheral region.

 You can't add an MMIO device on the QEMU command line, there's
 no way to wire up the memory regions and IRQs.

 Why does each individual peripheral have to be at a 2MB boundary?
 I would expect the kernel to just have a single nothing mapped
 here stage 2 table entry (or entries) covering the whole of the
 mmio peripheral region regardless of how many devices happen
 to be inside it.

 Yes, this seem a little over-kill but we can atleast have peripheral space to
 be 2MB aligned with total size in multiple of 2MB.

 I really don't want to eat 2MB for each virtio-mmio transport
 in a 32 bit address space, it seems hugely wasteful unless
 there's a good reason to do it.

I am not suggesting to give 2MB space to each virtio-mmio transport.

What I really meant was to start VIRT_MMIO space (where all the
virtio-mmio transport would be added) to start at 2MB aligned address
and have total size (include all virtio-mmio transports) to be in-multiple
of 2MB so that we can trap access to all virtio-mmio transports using
1-2 2MB entries in Stage2.


 -- PMM

--Anup



Re: [Qemu-devel] [PATCH] block: Produce zeros when protocols reading beyond end of file

2013-08-05 Thread Kevin Wolf
Am 05.08.2013 um 10:11 hat Asias He geschrieben:
 From: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
 
 While Asias is debugging an issue creating qcow2 images on top of
 non-file protocols.  It boils down to this example using NBD:
 
 $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'
 
 Notice the open -g option to set bs-growable.  This means you can
 read/write beyond end of file.  Reading beyond end of file is supposed
 to produce zeroes.
 
 We rely on this behavior in qcow2_create2() during qcow2 image
 creation.  We create a new file and then write the qcow2 header
 structure using bdrv_pwrite().  Since QCowHeader is not a multiple of
 sector size, block.c first uses bdrv_read() on the empty file to fetch
 the first sector (should be all zeroes).
 
 Here is the output from the qemu-io NBD example above:
 
 $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'
 :  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  
 0010:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  
 0020:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  
 ...
 
 We are not zeroing the buffer!  As a result qcow2 image creation on top
 of protocols is not guaranteed to work even when file creation is
 supported by the protocol.
 
 Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
 Signed-off-by: Asias He as...@redhat.com
 ---
  block.c | 30 +-
  1 file changed, 29 insertions(+), 1 deletion(-)
 
 diff --git a/block.c b/block.c
 index 01b66d8..deaf0a0 100644
 --- a/block.c
 +++ b/block.c
 @@ -2544,7 +2544,35 @@ static int coroutine_fn 
 bdrv_co_do_readv(BlockDriverState *bs,
  }
  }
  
 -ret = drv-bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
 +if (!bs-drv-protocol_name) {

I think !bs-growable is the right check.

Checking for the protocol name is always a hack and most times wrong.

 +ret = drv-bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
 +} else {
 +/* NBD doesn't support reading beyond end of file. */

This is not only for NBD, make it a neutral comment like:

/* Read zeros after EOF of growable BDSes */

 +int64_t len, total_sectors, max_nb_sectors;
 +
 +len = bdrv_getlength(bs);
 +if (len  0) {
 +ret = len;
 +goto out;
 +}
 +
 +total_sectors = len  BDRV_SECTOR_BITS;
 +max_nb_sectors = MAX(0, total_sectors - sector_num);
 +if (max_nb_sectors  0) {
 +ret = drv-bdrv_co_readv(bs, sector_num,
 + MIN(nb_sectors, max_nb_sectors), qiov);
 +} else {
 +ret = 0;
 +}
 +
 +/* Reading beyond end of file is supposed to produce zeroes */
 +if (ret == 0  total_sectors  sector_num + nb_sectors) {
 +size_t offset = MAX(0, total_sectors - sector_num);
 +size_t bytes = (sector_num + nb_sectors - offset) *
 +BDRV_SECTOR_SIZE;

uint64_t for both offset and bytes, size_t can be 32 bits.

 +qemu_iovec_memset(qiov, offset * BDRV_SECTOR_SIZE, 0, bytes);
 +}
 +}
  
  out:
  tracked_request_end(req);

Kevin



Re: [Qemu-devel] [PATCH v4 2/2] ARM: Add 'virt' platform

2013-08-05 Thread Peter Maydell
On 5 August 2013 13:37, Anup Patel a...@brainfault.org wrote:
 On Mon, Aug 5, 2013 at 5:58 PM, Peter Maydell peter.mayd...@linaro.org 
 wrote:
 I really don't want to eat 2MB for each virtio-mmio transport
 in a 32 bit address space, it seems hugely wasteful unless
 there's a good reason to do it.

 I am not suggesting to give 2MB space to each virtio-mmio transport.

 What I really meant was to start VIRT_MMIO space (where all the
 virtio-mmio transport would be added) to start at 2MB aligned address
 and have total size (include all virtio-mmio transports) to be in-multiple
 of 2MB so that we can trap access to all virtio-mmio transports using
 1-2 2MB entries in Stage2.

Yes, that's fine. That would give us:

static const MemMapEntry a15memmap[] = {
[VIRT_FLASH] = { 0, 0x100 },
[VIRT_CPUPERIPHS] = { 0x100, 0x8000 },
/* GIC distributor and CPU interfaces sit inside the CPU peripheral space */
[VIRT_GIC_DIST] = { 0x1001000, 0x1000 },
[VIRT_GIC_CPU] = { 0x1002000, 0x1000 },
[VIRT_MMIO] = { 0x120, 0x200 },
/* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
[VIRT_MEM] = { 0x800, 30ULL * 1024 * 1024 * 1024 },
};

-- PMM



[Qemu-devel] Versioned machine types for ARM/non-x86 ? (Was Re: [PATCH v4 0/2] ARM: add 'virt' platform)

2013-08-05 Thread Daniel P. Berrange
On Mon, Aug 05, 2013 at 12:18:10PM +0100, Peter Maydell wrote:
 This patch series adds a 'virt' platform which uses the
 kernel's mach-virt (fully device-tree driven) support
 to create a simple minimalist platform intended for
 use for KVM VM guests. It's based on John Rigby's
 patches, but I've overhauled it a lot:

On x86, we've long had versioned machine names, so that we can
make changes in future QEMU releases without breaking guest ABI
compatibility. AFAICT, the problem has basically been ignored
on non-x86 platforms in QEMU. Given the increased interest in
ARM in particular, should we use the addition of this new 'virt'
machine type, as an opportunity to introduce versioning for
ARM too. eg make this machine be called 'virt-1.0.6' and then
have 'virt' simply be an alias that points to the most recent
version.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] [PATCH] target-mips: fix decoding of microMIPS POOL32Axf instructions

2013-08-05 Thread Leon Alrae
On 05/08/13 11:50, Aurelien Jarno wrote:
 On Mon, Aug 05, 2013 at 08:41:52AM +0100, Leon Alrae wrote:
 On 03/08/13 23:01, Aurelien Jarno wrote:
 On Thu, Aug 01, 2013 at 11:02:27AM +0100, Leon Alrae wrote:
 These are not DSP instructions, thus there is no ac field.

 For more details please refer to instruction encoding of
 MULT, MULTU, MADD, MADDU, MSUB, MSUBU, MFHI, MFLO, MTHI, MTLO in
 MIPS Architecture for Programmers Volume II-B: The microMIPS32 Instruction 
 Set

 These instructions have both a non DSP version without the ac field,
 and a DSP version with the ac field. The encoding of the ac field is
 done in bits 14 and 15, so the current QEMU code is correct.

 Please refer to the  MIPS32® Architecture Manual Volume IV-e: The
 MIPS® DSP Application-Specific Extension to the microMIPS32®
 Architecture (document MD00764).
  

 Please note that the patch modifies non-DSP version of listed
 instructions, where ac field doesn't exist. In this case reading bits
 
 This is correct, except for MFHI, MFLO, MTHI, MTLO which share the same
 opcode for the DSP and non-DSP version. Theses instructions have bits 14
 and 15 set to 0 for the non-DSP version, so they are working as expected
 and thus your patch should not modify them.
 

As far as microMIPS is concerned - MFHI, MFLO, MTHI, MTLO don't seem to
share the same opcode for the DSP and non-DSP version. It is true that
non-DSP version has bits 14 and 15 set to zero. However, according to
already mentioned documents, bits 11..6 (extension) are different in the
DSP (set to 0x35) and non-DSP version (set to 0x1). Therefore, we
shouldn't read bits 14 and 15 in case of non-DSP version, and we should
have a separate entry for DSP version.

Just to make sure that we are refering to the same thing :)
For MFHI:
page no. 303 in MIPS Architecture for Programmers Volume II-B: The
microMIPS32 Instruction Set (document MD00764)
page no. 140 in MIPS Architecture for Programmers Volume IV-e: The MIPS
DSP Module for the microMIPS32 Architecture (document MD00582)

 14 and 15 and passing it as acc argument to gen_muldiv() or gen_HILO()
 is not correct. If those bits are not 0 (and for most of the listed
 instructions this is true) then check_dsp() is called which will cause
 an exception if DSP is not supported / disabled.

 
 This is correct. The current code assumes that all instructions using
 gen_muldiv() have the same opcode for non-DSP and DSP version (actually
 it's weird that they have a different encoding, it's just wasting some
 opcode space).
 
 Therefore what should be done:
 - The part concerning MFHI, MFLO, MTHI, MTLO should be dropped from your
   patch as it already works correctly.
 - The part of your patch concerning instructions calling gen_muldiv() is
   correct and should be kept to handle the non-DSP version of these
   instructions.
 - An entry with for the DSP version of these instructions should be
   created, calling gen_muldiv() passing the ac field from bits 14 and
   15. This is basically the same code than now, just with extension 0x32
   instead of 0x2c.
 

OK - I will update the patch, but I would leave the part concerning
MFHI, MFLO, MTHI, MTLO as it seems to be correct, but the new entry will
be added for DSP version.

Regards,
Leon




[Qemu-devel] [PATCH for-1.6] qemu-iotests: filter QEMU version in monitor banner

2013-08-05 Thread Stefan Hajnoczi
Filter out the QEMU monitor version banner so that tests do not break
when the QEMU version number is changed.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 tests/qemu-iotests/051.out   | 64 
 tests/qemu-iotests/common.filter |  3 +-
 2 files changed, 34 insertions(+), 33 deletions(-)

diff --git a/tests/qemu-iotests/051.out b/tests/qemu-iotests/051.out
index 9588d0c..5582ed3 100644
--- a/tests/qemu-iotests/051.out
+++ b/tests/qemu-iotests/051.out
@@ -23,11 +23,11 @@ QEMU_PROG: -drive 
file=TEST_DIR/t.qcow2,format=qcow2,unknown_opt=foo: could not
 === Enable and disable lazy refcounting on the command line, plus some invalid 
values ===
 
 Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=on
-QEMU 1.5.50 monitor - type 'help' for more information
+QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) qququiquit
 
 Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=off
-QEMU 1.5.50 monitor - type 'help' for more information
+QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) qququiquit
 
 Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=
@@ -51,72 +51,72 @@ QEMU_PROG: -drive 
file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=on: Lazy ref
 QEMU_PROG: -drive file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=on: could 
not open disk image TEST_DIR/t.qcow2: Invalid argument
 
 Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=off
-QEMU 1.5.50 monitor - type 'help' for more information
+QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) qququiquit
 
 
 === No medium ===
 
 Testing: -drive if=floppy
-QEMU 1.5.50 monitor - type 'help' for more information
+QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) qququiquit
 
 Testing: -drive if=ide,media=cdrom
-QEMU 1.5.50 monitor - type 'help' for more information
+QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) qququiquit
 
 Testing: -drive if=scsi,media=cdrom
-QEMU 1.5.50 monitor - type 'help' for more information
+QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) qququiquit
 
 Testing: -drive if=ide
-QEMU 1.5.50 monitor - type 'help' for more information
+QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) QEMU_PROG: Device needs media, but drive is empty
 QEMU_PROG: Device initialization failed.
 QEMU_PROG: Initialization of device ide-hd failed
 
 Testing: -drive if=virtio
-QEMU 1.5.50 monitor - type 'help' for more information
+QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) QEMU_PROG: -drive if=virtio: Device needs media, but drive is empty
 QEMU_PROG: -drive if=virtio: Device initialization failed.
 QEMU_PROG: -drive if=virtio: Device initialization failed.
 QEMU_PROG: -drive if=virtio: Device 'virtio-blk-pci' could not be initialized
 
 Testing: -drive if=scsi
-QEMU 1.5.50 monitor - type 'help' for more information
+QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) QEMU_PROG: -drive if=scsi: Device needs media, but drive is empty
 QEMU_PROG: -drive if=scsi: Device initialization failed.
 QEMU_PROG: Device initialization failed.
 QEMU_PROG: Initialization of device lsi53c895a failed
 
 Testing: -drive if=none,id=disk -device ide-cd,drive=disk
-QEMU 1.5.50 monitor - type 'help' for more information
+QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) qququiquit
 
 Testing: -drive if=none,id=disk -device lsi53c895a -device scsi-cd,drive=disk
-QEMU 1.5.50 monitor - type 'help' for more information
+QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) qququiquit
 
 Testing: -drive if=none,id=disk -device ide-drive,drive=disk
-QEMU 1.5.50 monitor - type 'help' for more information
+QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) QEMU_PROG: -device ide-drive,drive=disk: Device needs media, but drive 
is empty
 QEMU_PROG: -device ide-drive,drive=disk: Device initialization failed.
 QEMU_PROG: -device ide-drive,drive=disk: Device 'ide-drive' could not be 
initialized
 
 Testing: -drive if=none,id=disk -device ide-hd,drive=disk
-QEMU 1.5.50 monitor - type 'help' for more information
+QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) QEMU_PROG: -device ide-hd,drive=disk: Device needs media, but drive is 
empty
 QEMU_PROG: -device ide-hd,drive=disk: Device initialization failed.
 QEMU_PROG: -device ide-hd,drive=disk: Device 'ide-hd' could not be initialized
 
 Testing: -drive if=none,id=disk -device lsi53c895a -device scsi-disk,drive=disk
-QEMU 1.5.50 monitor - type 'help' for more information
+QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) QEMU_PROG: -device scsi-disk,drive=disk: Device needs media, but drive 
is empty
 QEMU_PROG: -device 

Re: [Qemu-devel] Versioned machine types for ARM/non-x86 ? (Was Re: [PATCH v4 0/2] ARM: add 'virt' platform)

2013-08-05 Thread Peter Maydell
On 5 August 2013 13:49, Daniel P. Berrange berra...@redhat.com wrote:
 On x86, we've long had versioned machine names, so that we can
 make changes in future QEMU releases without breaking guest ABI
 compatibility. AFAICT, the problem has basically been ignored
 on non-x86 platforms in QEMU.

Yes; this is deliberate on the basis that starting to do this
is accepting a huge pile of maintenance workload (ie checking
for things which change, keeping around a pile of old version
machine models, retaining migration compatibility between
old and new versions). Which isn't to say I'm against it
but it means I'm not doing it until the pushback from users
that it's necessary is pretty strong.

 Given the increased interest in
 ARM in particular, should we use the addition of this new 'virt'
 machine type, as an opportunity to introduce versioning for
 ARM too. eg make this machine be called 'virt-1.0.6' and then
 have 'virt' simply be an alias that points to the most recent
 version.

I'm not convinced we're at the point where we need to do this
yet.

-- PMM



[Qemu-devel] FW: [Xen-devel]Cirrus VGA slow screen update, show blank screen last 13s or so for windows XP guest

2013-08-05 Thread Gonglei (Arei)
Hi,
 -Original Message-
 From: Gonglei (Arei)
 Sent: Tuesday, July 30, 2013 10:01 AM
 To: 'Pasi Kärkkäinen'
 Cc: Gerd Hoffmann; Andreas Färber; Hanweidong; Luonengjun;
 qemu-devel@nongnu.org; xen-de...@lists.xen.org; Anthony Liguori;
 Huangweidong (Hardware); 'ian.jack...@eu.citrix.com'; Anthony Liguori;
 'aligu...@us.ibm.com'
 Subject: RE: [Xen-devel] [Qemu-devel] Cirrus VGA slow screen update, show
 blank screen last 13s or so for windows XP guest
 
  On Mon, Jul 29, 2013 at 08:48:54AM +, Gonglei (Arei) wrote:
-Original Message-
From: Pasi Kärkkäinen [mailto:pa...@iki.fi]
Sent: Saturday, July 27, 2013 7:51 PM
To: Gerd Hoffmann
Cc: Andreas Färber; Hanweidong; Luonengjun; qemu-devel@nongnu.org;
xen-de...@lists.xen.org; Gonglei (Arei); Anthony Liguori; Huangweidong
(Hardware)
Subject: Re: [Xen-devel] [Qemu-devel] Cirrus VGA slow screen update,
 show
blank screen last 13s or so for windows XP guest
   
On Fri, Jul 26, 2013 at 12:19:16PM +0200, Gerd Hoffmann wrote:

 Maybe the xen guys did some optimizations in qemu-dm which where
 not
 merged upstream.  Try asking @ xen-devel.

   
Yeah, xen qemu-dm must have some optimization for cirrus that isn't in
upstream qemu.
  
   Hi, Pasi. Would you give me some more details? Thanks!
  
 
  Unfortunately I don't have any more details.. I was just assuming if xen
  qemu-dm is fast,
  but upstream qemu is slow, there might be some optimization in xen
  qemu-dm ..
 
  Did you check the xen qemu-dm (traditional) history / commits?
 
  -- Pasi
 
 Yes, I did. I tried to reproduce the issue with qemu-dm by fall backing 
 history
 commit.
 But the qemu-dm mainline merged other branches, such as branch 'upstream'
 and branch 'qemu',
 so that I can not complied the xen-4.1.2/tools project well. CC'ing Ian and
 Anthony.
 
By analyzing xentrace data, I found that there is lot's of VMEXIT between 
memory region 0xa~0xa with upstream qemu,
but only 256 VMEXIT on traditional qemu-dm, when Windows XP guest booting up or 
change the method connecting the VM from VNC(RDP) to RDP(VNC):

linux-sAGhxH:# cat qemu-upstrem.log |grep 0x000a|wc -l 
640654
linux-sAGhxH:# cat qemu-dm.log |grep 0x000a|wc -l   
256
And the 256 VMEXIT of qemu-dm as the same as the top 256 VMEXIT of qemu-upstream

linux-sAGhxH:# cat qemu-upstream.log |grep 0x000a| tail  
CPU0  0 (+   0)  NPF [ gpa = 0x000a0b08 mfn = 
0x qual = 0x0182 p2mt = 0x0004 ]
CPU0  0 (+   0)  NPF [ gpa = 0x000a0b0a mfn = 
0x qual = 0x0182 p2mt = 0x0004 ]
CPU0  0 (+   0)  NPF [ gpa = 0x000a0b0c mfn = 
0x qual = 0x0182 p2mt = 0x0004 ]
CPU0  0 (+   0)  NPF [ gpa = 0x000a0b0e mfn = 
0x qual = 0x0182 p2mt = 0x0004 ]
CPU0  0 (+   0)  NPF [ gpa = 0x000a0b10 mfn = 
0x qual = 0x0182 p2mt = 0x0004 ]
CPU0  0 (+   0)  NPF [ gpa = 0x000a0b12 mfn = 
0x qual = 0x0182 p2mt = 0x0004 ]
CPU0  0 (+   0)  NPF [ gpa = 0x000a0b14 mfn = 
0x qual = 0x0182 p2mt = 0x0004 ]
CPU0  0 (+   0)  NPF [ gpa = 0x000a0b16 mfn = 
0x qual = 0x0182 p2mt = 0x0004 ]
CPU0  0 (+   0)  NPF [ gpa = 0x000a0b18 mfn = 
0x qual = 0x0182 p2mt = 0x0004 ]
CPU0  0 (+   0)  NPF [ gpa = 0x000a0b1a mfn = 
0x qual = 0x0182 p2mt = 0x0004 ]
linux-sAGhxH:# cat qemu-dm.log |grep 0x000a| tail
CPU2  0 (+   0)  NPF [ gpa = 0x000a1ec0 mfn = 
0x qual = 0x0182 p2mt = 0x0004 ]
CPU2  0 (+   0)  NPF [ gpa = 0x000a1ee0 mfn = 
0x qual = 0x0182 p2mt = 0x0004 ]
CPU2  0 (+   0)  NPF [ gpa = 0x000a1f00 mfn = 
0x qual = 0x0182 p2mt = 0x0004 ]
CPU2  0 (+   0)  NPF [ gpa = 0x000a1f20 mfn = 
0x qual = 0x0182 p2mt = 0x0004 ]
CPU2  0 (+   0)  NPF [ gpa = 0x000a1f40 mfn = 
0x qual = 0x0182 p2mt = 0x0004 ]
CPU2  0 (+   0)  NPF [ gpa = 0x000a1f60 mfn = 
0x qual = 0x0182 p2mt = 0x0004 ]
CPU2  0 (+   0)  NPF [ gpa = 0x000a1f80 mfn = 
0x qual = 0x0182 p2mt = 0x0004 ]
CPU2  0 (+   0)  NPF [ gpa = 0x000a1fa0 mfn = 
0x qual = 0x0182 p2mt = 0x0004 ]
CPU2  0 (+   0)  NPF [ gpa = 0x000a1fc0 mfn = 
0x qual = 0x0182 p2mt = 0x0004 ]
CPU2  0 (+   0)  NPF [ gpa = 0x000a1fe0 mfn = 
0x qual = 0x0182 p2mt = 0x0004 ]

Please see attachment for more information.

Because of the lot's of VMEXIT, the cirrus_vga_ioport_write funciton receive 
little scheduling, 
and cirrus_vga_write_gr execution interval of 

Re: [Qemu-devel] [PATCH v5 00/21] AArch64 preparation patchset

2013-08-05 Thread Peter Maydell
On 18 July 2013 14:31, Peter Maydell peter.mayd...@linaro.org wrote:
 On 1 July 2013 20:07, Peter Maydell peter.mayd...@linaro.org wrote:
 On 1 July 2013 18:34, Peter Maydell peter.mayd...@linaro.org wrote:
 This patchset is v5 of the preparation patchset that started
 off with Alex, was passed to John Rigby and now to me.

 Also available via git:

 git://git.linaro.org/people/pmaydell/qemu-arm.git aarch64
 (caution: branch likely to rebase in future)

 I'm not going to bother reposting the whole patchset since
 the changes are trivial, but for the benefit of people basing
 other patchsets on top of this, I've rebased it on current
 master and updated the git branch above.

Git branch updated again on current master.

-- PMM



Re: [Qemu-devel] [ANNOUNCE] QEMU 1.6.0-rc1 is now available

2013-08-05 Thread Luiz Capitulino
On Mon, 05 Aug 2013 14:01:10 +0200
Laszlo Ersek ler...@redhat.com wrote:

 Please do consider it for 1.6, it addresses a serious bug (= vmcores saved
 with dump-guest-memory/paging=false are broken for  3.5G x86(_64) guests).

No worries, it is going to be included (unless someone has a serious objection,
of course).



[Qemu-devel] [PATCH for-1.6? v2 00/21] qtest: Test all targets

2013-08-05 Thread Andreas Färber
Hello Anthony/Aurélien,

This series extends test coverage to all 16 targets.
For now it tests that QOM type changes do not lead to QOM cast assertions.

v2 extends it to cover virtually all machines (except Xen and pc*-x.y).
Where an fprintf() is touched, use error_report() instead.

Regards,
Andreas

v1 - v2:
* gumstix, z2: Avoided conditionalizing use of pflash device in favor of NULL 
bdrv.
* puv3: Limited qtest workaround to a NULL kernel_filename.
* Added error workarounds for milkymist, ppc405, shix and leon3.
* Cleaned up debug output for ppc405 and shix.
* Extended qom-test to cover virtually all machines, including n800 and pc.
* Moved machine names to arrays wherever sensible, to aid with extensibility.
* Adopted error_report() for armv7m, too.

Cc: Anthony Liguori anth...@codemonkey.ws
Cc: Aurélien Jarno aurel...@aurel32.net
Cc: Peter Maydell peter.mayd...@linaro.org

Andreas Färber (21):
  mips_mipssim: Silence BIOS loading warning for qtest
  arm/boot: Turn arm_load_kernel() into no-op for qtest without -kernel
  puv3: Turn puv3_load_kernel() into a no-op for qtest without -kernel
  mainstone: Don't enforce use of -pflash for qtest
  gumstix: Don't enforce use of -pflash for qtest
  z2: Don't enforce use of -pflash for qtest
  palm: Don't enforce loading ROM or kernel for qtest
  omap_sx1: Don't enforce use of kernel or flash for qtest
  exynos4_boards: Silence lack of -smp 2 warning for qtest
  armv7m: Don't enforce use of kernel for qtest
  axis_dev88: Don't enforce use of kernel for qtest
  mcf5208: Don't enforce use of kernel for qtest
  an5206: Don't enforce use of kernel for qtest
  milkymist: Suppress -kernel/-bios/-drive error for qtest
  ppc405_boards: Disable debug output
  ppc405_uc: Disable debug output
  ppc405_boards: Don't enforce presence of firmware for qtest
  shix: Drop debug output
  shix: Don't require firmware presence for qtest
  leon3: Don't enforce use of -bios with qtest
  qtest: Prepare QOM machine tests

 hw/arm/armv7m.c |  25 +++--
 hw/arm/boot.c   |   4 +
 hw/arm/exynos4_boards.c |   3 +-
 hw/arm/gumstix.c|  11 +-
 hw/arm/mainstone.c  |   5 +-
 hw/arm/omap_sx1.c   |   3 +-
 hw/arm/palm.c   |   3 +-
 hw/arm/z2.c |   5 +-
 hw/block/tc58128.c  |  10 +-
 hw/cris/axis_dev88.c|  11 +-
 hw/lm32/milkymist.c |   3 +-
 hw/m68k/an5206.c|   4 +
 hw/m68k/mcf5208.c   |   4 +
 hw/mips/mips_mipssim.c  |   4 +-
 hw/ppc/ppc405_boards.c  |  39 ---
 hw/ppc/ppc405_uc.c  |  16 +--
 hw/sh4/shix.c   |  16 +--
 hw/sparc/leon3.c|   3 +-
 hw/unicore32/puv3.c |   4 +
 tests/Makefile  |  26 +
 tests/qom-test.c| 280 
 21 files changed, 410 insertions(+), 69 deletions(-)
 create mode 100644 tests/qom-test.c

-- 
1.8.1.4




[Qemu-devel] [PATCH for-1.6? v2 06/21] z2: Don't enforce use of -pflash for qtest

2013-08-05 Thread Andreas Färber
Signed-off-by: Andreas Färber afaer...@suse.de
---
 hw/arm/z2.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/arm/z2.c b/hw/arm/z2.c
index 07a127b..a4cbaeb 100644
--- a/hw/arm/z2.c
+++ b/hw/arm/z2.c
@@ -24,6 +24,7 @@
 #include ui/console.h
 #include audio/audio.h
 #include exec/address-spaces.h
+#include sysemu/qtest.h
 
 #ifdef DEBUG_Z2
 #define DPRINTF(fmt, ...) \
@@ -323,7 +324,7 @@ static void z2_init(QEMUMachineInitArgs *args)
 be = 0;
 #endif
 dinfo = drive_get(IF_PFLASH, 0, 0);
-if (!dinfo) {
+if (!dinfo  !qtest_enabled()) {
 fprintf(stderr, Flash image must be given with the 
 'pflash' parameter\n);
 exit(1);
@@ -331,7 +332,7 @@ static void z2_init(QEMUMachineInitArgs *args)
 
 if (!pflash_cfi01_register(Z2_FLASH_BASE,
NULL, z2.flash0, Z2_FLASH_SIZE,
-   dinfo-bdrv, sector_len,
+   dinfo ? dinfo-bdrv : NULL, sector_len,
Z2_FLASH_SIZE / sector_len, 4, 0, 0, 0, 0,
be)) {
 fprintf(stderr, qemu: Error registering flash memory.\n);
-- 
1.8.1.4




[Qemu-devel] [PATCH for-1.6? v2 02/21] arm/boot: Turn arm_load_kernel() into no-op for qtest without -kernel

2013-08-05 Thread Andreas Färber
Signed-off-by: Andreas Färber afaer...@suse.de
---
 hw/arm/boot.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 2cbeefd..0c6d611 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -15,6 +15,7 @@
 #include hw/loader.h
 #include elf.h
 #include sysemu/device_tree.h
+#include sysemu/qtest.h
 #include qemu/config-file.h
 
 #define KERNEL_ARGS_ADDR 0x100
@@ -354,6 +355,9 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info 
*info)
 
 /* Load the kernel.  */
 if (!info-kernel_filename) {
+if (qtest_enabled()) {
+return;
+}
 fprintf(stderr, Kernel image must be specified\n);
 exit(1);
 }
-- 
1.8.1.4




[Qemu-devel] [PATCH for-1.6? v2 01/21] mips_mipssim: Silence BIOS loading warning for qtest

2013-08-05 Thread Andreas Färber
Signed-off-by: Andreas Färber afaer...@suse.de
---
 hw/mips/mips_mipssim.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/mips/mips_mipssim.c b/hw/mips/mips_mipssim.c
index fea1a15..d8c4347 100644
--- a/hw/mips/mips_mipssim.c
+++ b/hw/mips/mips_mipssim.c
@@ -37,6 +37,7 @@
 #include elf.h
 #include hw/sysbus.h
 #include exec/address-spaces.h
+#include sysemu/qtest.h
 
 static struct _loaderparams {
 int ram_size;
@@ -189,7 +190,8 @@ mips_mipssim_init(QEMUMachineInitArgs *args)
 } else {
 bios_size = -1;
 }
-if ((bios_size  0 || bios_size  BIOS_SIZE)  !kernel_filename) {
+if ((bios_size  0 || bios_size  BIOS_SIZE) 
+!kernel_filename  !qtest_enabled()) {
 /* Bail out if we have neither a kernel image nor boot vector code. */
 fprintf(stderr,
 qemu: Warning, could not load MIPS bios '%s', and no -kernel 
argument was specified\n,
-- 
1.8.1.4




[Qemu-devel] [PATCH for-1.6? v2 15/21] ppc405_boards: Disable debug output

2013-08-05 Thread Andreas Färber
Also move one stray debug output into an #ifdef.

Signed-off-by: Andreas Färber afaer...@suse.de
---
 hw/ppc/ppc405_boards.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c
index f74e5e5..807ada0 100644
--- a/hw/ppc/ppc405_boards.c
+++ b/hw/ppc/ppc405_boards.c
@@ -42,7 +42,7 @@
 
 #define USE_FLASH_BIOS
 
-#define DEBUG_BOARD_INIT
+//#define DEBUG_BOARD_INIT
 
 /*/
 /* PPC405EP reference board (IBM) */
@@ -353,9 +353,9 @@ static void ref405ep_init(QEMUMachineInitArgs *args)
 bdloc = 0;
 }
 #ifdef DEBUG_BOARD_INIT
+printf(bdloc  RAM_ADDR_FMT \n, bdloc);
 printf(%s: Done\n, __func__);
 #endif
-printf(bdloc  RAM_ADDR_FMT \n, bdloc);
 }
 
 static QEMUMachine ref405ep_machine = {
-- 
1.8.1.4




[Qemu-devel] [PATCH for-1.6? v2 11/21] axis_dev88: Don't enforce use of kernel for qtest

2013-08-05 Thread Andreas Färber
Signed-off-by: Andreas Färber afaer...@suse.de
---
 hw/cris/axis_dev88.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/hw/cris/axis_dev88.c b/hw/cris/axis_dev88.c
index 9104d61..6673c66 100644
--- a/hw/cris/axis_dev88.c
+++ b/hw/cris/axis_dev88.c
@@ -32,6 +32,7 @@
 #include boot.h
 #include sysemu/blockdev.h
 #include exec/address-spaces.h
+#include sysemu/qtest.h
 
 #define D(x)
 #define DNAND(x)
@@ -340,14 +341,14 @@ void axisdev88_init(QEMUMachineInitArgs *args)
  irq[0x14 + i]);
 }
 
-if (!kernel_filename) {
+if (kernel_filename) {
+li.image_filename = kernel_filename;
+li.cmdline = kernel_cmdline;
+cris_load_image(cpu, li);
+} else if (!qtest_enabled()) {
 fprintf(stderr, Kernel image must be specified\n);
 exit(1);
 }
-
-li.image_filename = kernel_filename;
-li.cmdline = kernel_cmdline;
-cris_load_image(cpu, li);
 }
 
 static QEMUMachine axisdev88_machine = {
-- 
1.8.1.4




[Qemu-devel] [PATCH for-1.6? v2 05/21] gumstix: Don't enforce use of -pflash for qtest

2013-08-05 Thread Andreas Färber
Signed-off-by: Andreas Färber afaer...@suse.de
---
 hw/arm/gumstix.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/hw/arm/gumstix.c b/hw/arm/gumstix.c
index b8cab10..a9015ed 100644
--- a/hw/arm/gumstix.c
+++ b/hw/arm/gumstix.c
@@ -42,6 +42,7 @@
 #include hw/boards.h
 #include sysemu/blockdev.h
 #include exec/address-spaces.h
+#include sysemu/qtest.h
 
 static const int sector_len = 128 * 1024;
 
@@ -58,7 +59,7 @@ static void connex_init(QEMUMachineInitArgs *args)
 cpu = pxa255_init(address_space_mem, connex_ram);
 
 dinfo = drive_get(IF_PFLASH, 0, 0);
-if (!dinfo) {
+if (!dinfo  !qtest_enabled()) {
 fprintf(stderr, A flash image must be given with the 
 'pflash' parameter\n);
 exit(1);
@@ -70,7 +71,8 @@ static void connex_init(QEMUMachineInitArgs *args)
 be = 0;
 #endif
 if (!pflash_cfi01_register(0x, NULL, connext.rom, connex_rom,
-   dinfo-bdrv, sector_len, connex_rom / 
sector_len,
+   dinfo ? dinfo-bdrv : NULL,
+   sector_len, connex_rom / sector_len,
2, 0, 0, 0, 0, be)) {
 fprintf(stderr, qemu: Error registering flash memory.\n);
 exit(1);
@@ -95,7 +97,7 @@ static void verdex_init(QEMUMachineInitArgs *args)
 cpu = pxa270_init(address_space_mem, verdex_ram, cpu_model ?: pxa270-c0);
 
 dinfo = drive_get(IF_PFLASH, 0, 0);
-if (!dinfo) {
+if (!dinfo  !qtest_enabled()) {
 fprintf(stderr, A flash image must be given with the 
 'pflash' parameter\n);
 exit(1);
@@ -107,7 +109,8 @@ static void verdex_init(QEMUMachineInitArgs *args)
 be = 0;
 #endif
 if (!pflash_cfi01_register(0x, NULL, verdex.rom, verdex_rom,
-   dinfo-bdrv, sector_len, verdex_rom / 
sector_len,
+   dinfo ? dinfo-bdrv : NULL,
+   sector_len, verdex_rom / sector_len,
2, 0, 0, 0, 0, be)) {
 fprintf(stderr, qemu: Error registering flash memory.\n);
 exit(1);
-- 
1.8.1.4




[Qemu-devel] [PATCH for-1.6? v2 04/21] mainstone: Don't enforce use of -pflash for qtest

2013-08-05 Thread Andreas Färber
Simply skip flash setup for now.

Also drop useless debug output.

Signed-off-by: Andreas Färber afaer...@suse.de
---
 hw/arm/mainstone.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/arm/mainstone.c b/hw/arm/mainstone.c
index 8e5fc26..e9d4b98 100644
--- a/hw/arm/mainstone.c
+++ b/hw/arm/mainstone.c
@@ -21,6 +21,7 @@
 #include sysemu/blockdev.h
 #include hw/sysbus.h
 #include exec/address-spaces.h
+#include sysemu/qtest.h
 
 /* Device addresses */
 #define MST_FPGA_PHYS  0x0800
@@ -127,6 +128,9 @@ static void mainstone_common_init(MemoryRegion 
*address_space_mem,
 for (i = 0; i  2; i ++) {
 dinfo = drive_get(IF_PFLASH, 0, i);
 if (!dinfo) {
+if (qtest_enabled()) {
+break;
+}
 fprintf(stderr, Two flash images must be given with the 
 'pflash' parameter\n);
 exit(1);
@@ -147,7 +151,6 @@ static void mainstone_common_init(MemoryRegion 
*address_space_mem,
 qdev_get_gpio_in(mpu-gpio, 0));
 
 /* setup keypad */
-printf(map addr %p\n, map);
 pxa27x_register_keypad(mpu-kp, map, 0xe0);
 
 /* MMC/SD host */
-- 
1.8.1.4




[Qemu-devel] [PATCH for-1.6? v2 07/21] palm: Don't enforce loading ROM or kernel for qtest

2013-08-05 Thread Andreas Färber
Signed-off-by: Andreas Färber afaer...@suse.de
---
 hw/arm/palm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/arm/palm.c b/hw/arm/palm.c
index cdc3c3a..dc4bbcd 100644
--- a/hw/arm/palm.c
+++ b/hw/arm/palm.c
@@ -19,6 +19,7 @@
 #include hw/hw.h
 #include audio/audio.h
 #include sysemu/sysemu.h
+#include sysemu/qtest.h
 #include ui/console.h
 #include hw/arm/omap.h
 #include hw/boards.h
@@ -255,7 +256,7 @@ static void palmte_init(QEMUMachineInitArgs *args)
 }
 }
 
-if (!rom_loaded  !kernel_filename) {
+if (!rom_loaded  !kernel_filename  !qtest_enabled()) {
 fprintf(stderr, Kernel or ROM image must be specified\n);
 exit(1);
 }
-- 
1.8.1.4




Re: [Qemu-devel] Versioned machine types for ARM/non-x86 ? (Was Re: [PATCH v4 0/2] ARM: add 'virt' platform)

2013-08-05 Thread Anthony Liguori
Daniel P. Berrange berra...@redhat.com writes:

 On Mon, Aug 05, 2013 at 12:18:10PM +0100, Peter Maydell wrote:
 This patch series adds a 'virt' platform which uses the
 kernel's mach-virt (fully device-tree driven) support
 to create a simple minimalist platform intended for
 use for KVM VM guests. It's based on John Rigby's
 patches, but I've overhauled it a lot:

 On x86, we've long had versioned machine names, so that we can
 make changes in future QEMU releases without breaking guest ABI
 compatibility. AFAICT, the problem has basically been ignored
 on non-x86 platforms in QEMU. Given the increased interest in
 ARM in particular, should we use the addition of this new 'virt'
 machine type, as an opportunity to introduce versioning for
 ARM too. eg make this machine be called 'virt-1.0.6' and then
 have 'virt' simply be an alias that points to the most recent
 version.

I've been thinking about this for SPAPR too.  Like virt, I'm not sure
the platform is stable enough for this but I expect it to be soon.

However, unlike PC, I'd like to do linear versioning and avoid bumping
at every release.

IOW, spapr-1, spapr-2, spapr-3, etc.

I think virt ought to try to do the same.

Regards,

Anthony Liguori


 Regards,
 Daniel
 -- 
 |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
 |: http://libvirt.org  -o- http://virt-manager.org :|
 |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
 |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|




[Qemu-devel] [PATCH for-1.6? v2 14/21] milkymist: Suppress -kernel/-bios/-drive error for qtest

2013-08-05 Thread Andreas Färber
Signed-off-by: Andreas Färber afaer...@suse.de
---
 hw/lm32/milkymist.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/lm32/milkymist.c b/hw/lm32/milkymist.c
index 7ceedb8..a8e7230 100644
--- a/hw/lm32/milkymist.c
+++ b/hw/lm32/milkymist.c
@@ -21,6 +21,7 @@
 #include hw/hw.h
 #include hw/block/flash.h
 #include sysemu/sysemu.h
+#include sysemu/qtest.h
 #include hw/devices.h
 #include hw/boards.h
 #include hw/loader.h
@@ -143,7 +144,7 @@ milkymist_init(QEMUMachineInitArgs *args)
 reset_info-bootstrap_pc = BIOS_OFFSET;
 
 /* if no kernel is given no valid bios rom is a fatal error */
-if (!kernel_filename  !dinfo  !bios_filename) {
+if (!kernel_filename  !dinfo  !bios_filename  !qtest_enabled()) {
 fprintf(stderr, qemu: could not load Milkymist One bios '%s'\n,
 bios_name);
 exit(1);
-- 
1.8.1.4




[Qemu-devel] [PATCH for-1.6? v2 17/21] ppc405_boards: Don't enforce presence of firmware for qtest

2013-08-05 Thread Andreas Färber
Adopt error_report() while at it.

Signed-off-by: Andreas Färber afaer...@suse.de
---
 hw/ppc/ppc405_boards.c | 35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c
index 807ada0..75b2177 100644
--- a/hw/ppc/ppc405_boards.c
+++ b/hw/ppc/ppc405_boards.c
@@ -27,9 +27,11 @@
 #include hw/timer/m48t59.h
 #include hw/block/flash.h
 #include sysemu/sysemu.h
+#include sysemu/qtest.h
 #include block/block.h
 #include hw/boards.h
 #include qemu/log.h
+#include qemu/error-report.h
 #include hw/loader.h
 #include sysemu/blockdev.h
 #include exec/address-spaces.h
@@ -252,17 +254,20 @@ static void ref405ep_init(QEMUMachineInitArgs *args)
 if (filename) {
 bios_size = load_image(filename, memory_region_get_ram_ptr(bios));
 g_free(filename);
+if (bios_size  0 || bios_size  BIOS_SIZE) {
+error_report(Could not load PowerPC BIOS '%s', bios_name);
+exit(1);
+}
+bios_size = (bios_size + 0xfff)  ~0xfff;
+memory_region_add_subregion(sysmem, (uint32_t)(-bios_size), bios);
+} else if (!qtest_enabled() || kernel_filename != NULL) {
+error_report(Could not load PowerPC BIOS '%s', bios_name);
+exit(1);
 } else {
+/* Avoid an uninitialized variable warning */
 bios_size = -1;
 }
-if (bios_size  0 || bios_size  BIOS_SIZE) {
-fprintf(stderr, qemu: could not load PowerPC bios '%s'\n,
-bios_name);
-exit(1);
-}
-bios_size = (bios_size + 0xfff)  ~0xfff;
 memory_region_set_readonly(bios, true);
-memory_region_add_subregion(sysmem, (uint32_t)(-bios_size), bios);
 }
 /* Register FPGA */
 #ifdef DEBUG_BOARD_INIT
@@ -569,17 +574,17 @@ static void taihu_405ep_init(QEMUMachineInitArgs *args)
 if (filename) {
 bios_size = load_image(filename, memory_region_get_ram_ptr(bios));
 g_free(filename);
-} else {
-bios_size = -1;
-}
-if (bios_size  0 || bios_size  BIOS_SIZE) {
-fprintf(stderr, qemu: could not load PowerPC bios '%s'\n,
-bios_name);
+if (bios_size  0 || bios_size  BIOS_SIZE) {
+error_report(Could not load PowerPC BIOS '%s', bios_name);
+exit(1);
+}
+bios_size = (bios_size + 0xfff)  ~0xfff;
+memory_region_add_subregion(sysmem, (uint32_t)(-bios_size), bios);
+} else if (!qtest_enabled()) {
+error_report(Could not load PowerPC BIOS '%s', bios_name);
 exit(1);
 }
-bios_size = (bios_size + 0xfff)  ~0xfff;
 memory_region_set_readonly(bios, true);
-memory_region_add_subregion(sysmem, (uint32_t)(-bios_size), bios);
 }
 /* Register Linux flash */
 dinfo = drive_get(IF_PFLASH, 0, fl_idx);
-- 
1.8.1.4




  1   2   3   >