Re: [Qemu-devel] default slot used for vga device on q35 machines
Hi, 1) Is this difference intentional, or a bug? The vga simply goes into the first free slot. That happens to be #2 with i440fx and #1 with q35. It sounds like it's a safe bet to assume that -vga will put the device on slot 2 for pc machinetypes and slot 1 for q35, no matter what other devices there are, since -vga is always initialized first. Correct? Yes. Do you see any chance that might change in the future? (e.g. due to some other new device that needs to be initialized even before vga) Highly unlikely. Also libvirt will switch over to use -device for vga cards, which will make this a moot point. 3) Does the qxl multihead support really require that the device be at slot 2 (as stated in the above bugzilla commend)? Or is that just a misunderstanding/overstatement? It's not required at all. The problem is that on older qemu versions (pre-memory-api basically) it was impossible to create functional vga devices via -device due to an initialization order issue. Which implies you have to use -vga instead, which in turn implies the vga ends up in slot #2. On i440fx, but q35 didn't exist yet back then ;) Are you saying that it's still required to use -vga instead of -device qxl-vga for pc machinetype? Or that it was needed at one time, but that is no longer the case? If the latter, is there a reliable way to make the decision whether or not we need to use -vga? libvirt has started using -device for vga cards, to be able to place them in any pci slot. IIRC it's used for qemu 1.1+ or 1.2+. That works fine for all vga cards except qxl. The bug which broke it for qxl will be fixed in 1.6 (and probably 1.5.3 too). cheers, Gerd
[Qemu-devel] [PATCH 0/4]: timers thread-safe stuff
The patches has been rebased onto Alex's [RFC] [PATCHv5 00/16] aio / timers: Add AioContext timers and use ppoll permalink.gmane.org/gmane.comp.emulators.qemu/226333 For some other complied error issue, I can not finish compiling, will fix it later. Changes since last version: 1. drop the overlap partition and leave only thread-safe stuff 2. For timers_state, since currently, only vm_clock can be read outside BQL. There is no protection with ticks(since the protection will cost more in read_tsc path). 3. use light weight QemuEvent to re-implement the qemu_clock_enable(foo,false) Liu Ping Fan (2): timer: protect timers_state's clock with seqlock timer: make qemu_clock_enable sync between disable and timer's cb Paolo Bonzini (2): seqlock: introduce read-write seqlock qemu-thread: add QemuEvent cpus.c | 36 +++--- include/qemu/seqlock.h | 72 +++ include/qemu/thread-posix.h | 8 +++ include/qemu/thread-win32.h | 4 ++ include/qemu/thread.h | 7 +++ include/qemu/timer.h| 1 + qemu-timer.c| 11 + util/qemu-thread-posix.c| 116 util/qemu-thread-win32.c| 26 ++ 9 files changed, 274 insertions(+), 7 deletions(-) create mode 100644 include/qemu/seqlock.h -- 1.8.1.4
[Qemu-devel] [PATCH 2/4] timer: protect timers_state's clock with seqlock
In kvm mode, vm_clock may be read outside BQL. This will make timers_state --the foundation of vm_clock exposed to race condition. Using private lock to protect it. Note in tcg mode, vm_clock still read inside BQL, so icount is left without change. As for cpu_ticks in timers_state, it is still protected by BQL. Lock rule: private lock innermost, ie BQL-this lock Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com --- cpus.c | 36 +--- 1 file changed, 29 insertions(+), 7 deletions(-) diff --git a/cpus.c b/cpus.c index 85e743d..ab92db9 100644 --- a/cpus.c +++ b/cpus.c @@ -107,12 +107,17 @@ static int64_t qemu_icount; typedef struct TimersState { int64_t cpu_ticks_prev; int64_t cpu_ticks_offset; +/* QemuClock will be read out of BQL, so protect is with private lock. + * As for cpu_ticks, no requirement to read it outside BQL. + * Lock rule: innermost + */ +QemuSeqLock clock_seqlock; int64_t cpu_clock_offset; int32_t cpu_ticks_enabled; int64_t dummy; } TimersState; -TimersState timers_state; +static TimersState timers_state; /* Return the virtual CPU time, based on the instruction counter. */ int64_t cpu_get_icount(void) @@ -132,6 +137,7 @@ int64_t cpu_get_icount(void) } /* return the host CPU cycle counter and handle stop/restart */ +/* cpu_ticks is safely if holding BQL */ int64_t cpu_get_ticks(void) { if (use_icount) { @@ -156,33 +162,46 @@ int64_t cpu_get_ticks(void) int64_t cpu_get_clock(void) { int64_t ti; -if (!timers_state.cpu_ticks_enabled) { -return timers_state.cpu_clock_offset; -} else { -ti = get_clock(); -return ti + timers_state.cpu_clock_offset; -} +unsigned start; + +do { +start = seqlock_read_begin(timers_state.clock_seqlock); +if (!timers_state.cpu_ticks_enabled) { +ti = timers_state.cpu_clock_offset; +} else { +ti = get_clock(); +ti += timers_state.cpu_clock_offset; +} +} while (seqlock_read_check(timers_state.clock_seqlock, start); + +return ti; } /* enable cpu_get_ticks() */ void cpu_enable_ticks(void) { +/* Here, the really thing protected by seqlock is cpu_clock. */ +seqlock_write_lock(timers_state.clock_seqlock); if (!timers_state.cpu_ticks_enabled) { timers_state.cpu_ticks_offset -= cpu_get_real_ticks(); timers_state.cpu_clock_offset -= get_clock(); timers_state.cpu_ticks_enabled = 1; } +seqlock_write_unlock(timers_state.clock_seqlock); } /* disable cpu_get_ticks() : the clock is stopped. You must not call cpu_get_ticks() after that. */ void cpu_disable_ticks(void) { +/* Here, the really thing protected by seqlock is cpu_clock. */ +seqlock_write_lock(timers_state.clock_seqlock); if (timers_state.cpu_ticks_enabled) { timers_state.cpu_ticks_offset = cpu_get_ticks(); timers_state.cpu_clock_offset = cpu_get_clock(); timers_state.cpu_ticks_enabled = 0; } +seqlock_write_unlock(timers_state.clock_seqlock); } /* Correlation between real and virtual time is always going to be @@ -364,6 +383,9 @@ static const VMStateDescription vmstate_timers = { void configure_icount(const char *option) { +QemuMutex *mutex = g_malloc0(sizeof(QemuMutex)); +qemu_mutex_init(mutex); +seqlock_init(timers_state.clock_seqlock, mutex); vmstate_register(NULL, 0, vmstate_timers, timers_state); if (!option) { return; -- 1.8.1.4
[Qemu-devel] [PATCH 1/4] seqlock: introduce read-write seqlock
From: Paolo Bonzini pbonz...@redhat.com This lets the read-side access run outside the BQL. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- include/qemu/seqlock.h | 72 ++ 1 file changed, 72 insertions(+) create mode 100644 include/qemu/seqlock.h diff --git a/include/qemu/seqlock.h b/include/qemu/seqlock.h new file mode 100644 index 000..8f1c89f --- /dev/null +++ b/include/qemu/seqlock.h @@ -0,0 +1,72 @@ +/* + * Seqlock implementation for QEMU + * + * Copyright Red Hat, Inc. 2013 + * + * Author: + * Paolo Bonzini pbonz...@redhat.com + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + * + */ +#ifndef QEMU_SEQLOCK_H +#define QEMU_SEQLOCK_H 1 + +#include qemu/atomic.h +#include qemu/thread.h + +typedef struct QemuSeqLock QemuSeqLock; + +struct QemuSeqLock { +QemuMutex *mutex; +unsigned sequence; +}; + +static inline void seqlock_init(QemuSeqLock *sl, QemuMutex *mutex) +{ +sl-mutex = mutex; +sl-sequence = 0; +} + +/* Lock out other writers and update the count. */ +static inline void seqlock_write_lock(QemuSeqLock *sl) +{ +if (sl-mutex) { +qemu_mutex_lock(sl-mutex); +} +++sl-sequence; + +/* Write sequence before updating other fields. */ +smp_wmb(); +} + +static inline void seqlock_write_unlock(QemuSeqLock *sl) +{ +/* Write other fields before finalizing sequence. */ +smp_wmb(); + +++sl-sequence; +if (sl-mutex) { +qemu_mutex_unlock(sl-mutex); +} +} + +static inline unsigned seqlock_read_begin(QemuSeqLock *sl) +{ +/* Always fail if a write is in progress. */ +unsigned ret = sl-sequence ~1; + +/* Read sequence before reading other fields. */ +smp_rmb(); +return ret; +} + +static int seqlock_read_check(const QemuSeqLock *sl, unsigned start) +{ +/* Read other fields before reading final sequence. */ +smp_rmb(); +return unlikely(sl-sequence != start); +} + +#endif -- 1.8.1.4
[Qemu-devel] [PATCH 4/4] timer: make qemu_clock_enable sync between disable and timer's cb
After disabling the QemuClock, we should make sure that no QemuTimers are still in flight. To implement that with light overhead, we resort to QemuEvent. The caller of disabling will wait on QemuEvent of each timerlist. Note, qemu_clock_enable(foo,false) can _not_ be called from timer's cb. And the callers of qemu_clock_enable() should be sync by themselves, not protected by this patch. Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com --- include/qemu/timer.h | 1 + qemu-timer.c | 11 +++ 2 files changed, 12 insertions(+) diff --git a/include/qemu/timer.h b/include/qemu/timer.h index 1363316..ca09ba2 100644 --- a/include/qemu/timer.h +++ b/include/qemu/timer.h @@ -85,6 +85,7 @@ int64_t timerlistgroup_deadline_ns(QEMUTimerListGroup tlg); int qemu_timeout_ns_to_ms(int64_t ns); int qemu_poll_ns(GPollFD *fds, uint nfds, int64_t timeout); +/* The disable of clock can not be called in timer's cb */ void qemu_clock_enable(QEMUClock *clock, bool enabled); void qemu_clock_warp(QEMUClock *clock); diff --git a/qemu-timer.c b/qemu-timer.c index ebe7597..5828107 100644 --- a/qemu-timer.c +++ b/qemu-timer.c @@ -71,6 +71,8 @@ struct QEMUTimerList { QLIST_ENTRY(QEMUTimerList) list; QEMUTimerListNotifyCB *notify_cb; void *notify_opaque; +/* light weight method to mark the end of timerlist's running */ +QemuEvent ev; }; struct QEMUTimer { @@ -92,6 +94,7 @@ static QEMUTimerList *timerlist_new_from_clock(QEMUClock *clock) QEMUTimerList *tl; tl = g_malloc0(sizeof(QEMUTimerList)); +qemu_event_init(tl-ev, false); tl-clock = clock; QLIST_INSERT_HEAD(clock-timerlists, tl, list); return tl; @@ -145,12 +148,18 @@ void qemu_clock_notify(QEMUClock *clock) } } +/* The disable of clock can _not_ be called from timer's cb */ void qemu_clock_enable(QEMUClock *clock, bool enabled) { +QEMUTimerList *tl; bool old = clock-enabled; clock-enabled = enabled; if (enabled !old) { qemu_clock_notify(clock); +} else if (!enabled old) { +QLIST_FOREACH(tl, clock-timerlists, list) { +qemu_event_wait(tl-ev); +} } } @@ -419,6 +428,7 @@ bool timerlist_run_timers(QEMUTimerList *tl) } current_time = qemu_get_clock_ns(tl-clock); +qemu_event_reset(tl-ev); for(;;) { ts = tl-active_timers; if (!qemu_timer_expired_ns(ts, current_time)) { @@ -432,6 +442,7 @@ bool timerlist_run_timers(QEMUTimerList *tl) ts-cb(ts-opaque); progress = true; } +qemu_event_set(tl-ev); return progress; } -- 1.8.1.4
[Qemu-devel] [PATCH 3/4] qemu-thread: add QemuEvent
From: Paolo Bonzini pbonz...@redhat.com This emulates Win32 manual-reset events using futexes or conditional variables. Typical ways to use them are with multi-producer, single-consumer data structures, to test for a complex condition whose elements come from different threads: for (;;) { qemu_event_reset(ev); ... test complex condition ... if (condition is true) { break; } qemu_event_wait(ev); } Or more efficiently (but with some duplication): ... evaluate condition ... while (!condition) { qemu_event_reset(ev); ... evaluate condition ... if (!condition) { qemu_event_wait(ev); ... evaluate condition ... } } QemuEvent provides a very fast userspace path in the common case when no other thread is waiting, or the event is not changing state. It is used to report RCU quiescent states to the thread calling synchronize_rcu (the latter being the single consumer), and to report call_rcu invocations to the thread that receives them. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- include/qemu/thread-posix.h | 8 +++ include/qemu/thread-win32.h | 4 ++ include/qemu/thread.h | 7 +++ util/qemu-thread-posix.c| 116 util/qemu-thread-win32.c| 26 ++ 5 files changed, 161 insertions(+) diff --git a/include/qemu/thread-posix.h b/include/qemu/thread-posix.h index 0f30dcc..916b2a7 100644 --- a/include/qemu/thread-posix.h +++ b/include/qemu/thread-posix.h @@ -21,6 +21,14 @@ struct QemuSemaphore { #endif }; +struct QemuEvent { +#ifndef __linux__ +pthread_mutex_t lock; +pthread_cond_t cond; +#endif +unsigned value; +}; + struct QemuThread { pthread_t thread; }; diff --git a/include/qemu/thread-win32.h b/include/qemu/thread-win32.h index 13adb95..3d58081 100644 --- a/include/qemu/thread-win32.h +++ b/include/qemu/thread-win32.h @@ -17,6 +17,10 @@ struct QemuSemaphore { HANDLE sema; }; +struct QemuEvent { +HANDLE event; +}; + typedef struct QemuThreadData QemuThreadData; struct QemuThread { QemuThreadData *data; diff --git a/include/qemu/thread.h b/include/qemu/thread.h index c02404b..3e32c65 100644 --- a/include/qemu/thread.h +++ b/include/qemu/thread.h @@ -7,6 +7,7 @@ typedef struct QemuMutex QemuMutex; typedef struct QemuCond QemuCond; typedef struct QemuSemaphore QemuSemaphore; +typedef struct QemuEvent QemuEvent; typedef struct QemuThread QemuThread; #ifdef _WIN32 @@ -45,6 +46,12 @@ void qemu_sem_wait(QemuSemaphore *sem); int qemu_sem_timedwait(QemuSemaphore *sem, int ms); void qemu_sem_destroy(QemuSemaphore *sem); +void qemu_event_init(QemuEvent *ev, bool init); +void qemu_event_set(QemuEvent *ev); +void qemu_event_reset(QemuEvent *ev); +void qemu_event_wait(QemuEvent *ev); +void qemu_event_destroy(QemuEvent *ev); + void qemu_thread_create(QemuThread *thread, void *(*start_routine)(void *), void *arg, int mode); diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c index 4489abf..8178f9b 100644 --- a/util/qemu-thread-posix.c +++ b/util/qemu-thread-posix.c @@ -20,7 +20,12 @@ #include limits.h #include unistd.h #include sys/time.h +#ifdef __linux__ +#include sys/syscall.h +#include linux/futex.h +#endif #include qemu/thread.h +#include qemu/atomic.h static void error_exit(int err, const char *msg) { @@ -268,6 +273,117 @@ void qemu_sem_wait(QemuSemaphore *sem) #endif } +#ifdef __linux__ +#define futex(...) syscall(__NR_futex, __VA_ARGS__) + +static inline void futex_wake(QemuEvent *ev, int n) +{ +futex(ev, FUTEX_WAKE, n, NULL, NULL, 0); +} + +static inline void futex_wait(QemuEvent *ev, unsigned val) +{ +futex(ev, FUTEX_WAIT, (int) val, NULL, NULL, 0); +} +#else +static inline void futex_wake(QemuEvent *ev, int n) +{ +if (n == 1) { +pthread_cond_signal(ev-cond); +} else { +pthread_cond_broadcast(ev-cond); +} +} + +static inline void futex_wait(QemuEvent *ev, unsigned val) +{ +pthread_mutex_lock(ev-lock); +if (ev-value == val) { +pthread_cond_wait(ev-cond, ev-lock); +} +pthread_mutex_unlock(ev-lock); +} +#endif + +/* Valid transitions: + * - free-set, when setting the event + * - busy-set, when setting the event, followed by futex_wake + * - set-free, when resetting the event + * - free-busy, when waiting + * + * set-busy does not happen (it can be observed from the outside but + * it really is set-free-busy). + * + * busy-free provably cannot happen; to enforce it, the set-free transition + * is done with an OR, which becomes a no-op if the event has concurrently + * transitioned to free or busy. + */ + +#define EV_SET 0 +#define EV_FREE1 +#define EV_BUSY -1 + +void qemu_event_init(QemuEvent *ev, bool init) +{ +#ifndef __linux__ +pthread_mutex_init(ev-lock, NULL); +
Re: [Qemu-devel] [PATCH] target-mips: fix decoding of microMIPS POOL32Axf instructions
On 03/08/13 23:01, Aurelien Jarno wrote: On Thu, Aug 01, 2013 at 11:02:27AM +0100, Leon Alrae wrote: These are not DSP instructions, thus there is no ac field. For more details please refer to instruction encoding of MULT, MULTU, MADD, MADDU, MSUB, MSUBU, MFHI, MFLO, MTHI, MTLO in MIPS Architecture for Programmers Volume II-B: The microMIPS32 Instruction Set These instructions have both a non DSP version without the ac field, and a DSP version with the ac field. The encoding of the ac field is done in bits 14 and 15, so the current QEMU code is correct. Please refer to the MIPS32® Architecture Manual Volume IV-e: The MIPS® DSP Application-Specific Extension to the microMIPS32® Architecture (document MD00764). Please note that the patch modifies non-DSP version of listed instructions, where ac field doesn't exist. In this case reading bits 14 and 15 and passing it as acc argument to gen_muldiv() or gen_HILO() is not correct. If those bits are not 0 (and for most of the listed instructions this is true) then check_dsp() is called which will cause an exception if DSP is not supported / disabled.
Re: [Qemu-devel] [PATCH 0/4] dump-guest-memory: correct the vmcores
On 08/01/13 16:31, Luiz Capitulino wrote: On Thu, 1 Aug 2013 09:41:07 -0400 Luiz Capitulino lcapitul...@redhat.com wrote: Applied to the qmp branch, thanks. Hmm, it brakes the build. Dropping it from the queue for now: /home/lcapitulino/work/src/upstream/qmp-unstable/target-s390x/arch_dump.c:179:5: error: conflicting types for ‘cpu_get_dump_info’ int cpu_get_dump_info(ArchDumpInfo *info) ^ In file included from /home/lcapitulino/work/src/upstream/qmp-unstable/target-s390x/arch_dump.c:17:0: /home/lcapitulino/work/src/upstream/qmp-unstable/include/sysemu/dump.h:24:5: note: previous declaration of ‘cpu_get_dump_info’ was here int cpu_get_dump_info(ArchDumpInfo *info, ^ make[1]: *** [target-s390x/arch_dump.o] Error 1 make: *** [subdir-s390x-softmmu] Error 2 make: *** Waiting for unfinished jobs My series was based on Author: Aurelien Jarno aurel...@aurel32.net 2013-07-29 09:03:23 Committer: Aurelien Jarno aurel...@aurel32.net 2013-07-29 09:03:23 Merge branch 'trivial-patches' of git://git.corpit.ru/qemu and it compiled then just fine. target-s390x/arch_dump.c was added in Author: Ekaterina Tumanova tuman...@linux.vnet.ibm.com 2013-07-10 15:26:46 Committer: Christian Borntraeger borntrae...@de.ibm.com 2013-07-30 16:12:25 s390: Implement dump-guest-memory support for target s390x See the commit date: 2013-07-30. I'll refresh the series. Laszlo
Re: [Qemu-devel] [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Bug 1207686]
On Sun, Aug 04, 2013 at 03:36:52PM +0200, Oliver Francke wrote: Am 02.08.2013 um 23:47 schrieb Mike Dawson mike.daw...@cloudapt.com: We can un-wedge the guest by opening a NoVNC session or running a 'virsh screenshot' command. After that, the guest resumes and runs as expected. At that point we can examine the guest. Each time we'll see: If virsh screenshot works then this confirms that QEMU itself is still responding. Its main loop cannot be blocked since it was able to process the screendump command. This supports Josh's theory that a callback is not being invoked. The virtio-blk I/O request would be left in a pending state. Now here is where the behavior varies between configurations: On a Windows guest with 1 vCPU, you may see the symptom that the guest no longer responds to ping. On a Linux guest with multiple vCPUs, you may see the hung task message from the guest kernel because other vCPUs are still making progress. Just the vCPU that issued the I/O request and whose task is in UNINTERRUPTIBLE state would really be stuck. Basically, the symptoms depend not just on how QEMU is behaving but also on the guest kernel and how many vCPUs you have configured. I think this can explain how both problems you are observing, Oliver and Mike, are a result of the same bug. At least I hope they are :). Stefan
[Qemu-devel] [PATCH] block: Produce zeros when protocols reading beyond end of file
From: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp While Asias is debugging an issue creating qcow2 images on top of non-file protocols. It boils down to this example using NBD: $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512' Notice the open -g option to set bs-growable. This means you can read/write beyond end of file. Reading beyond end of file is supposed to produce zeroes. We rely on this behavior in qcow2_create2() during qcow2 image creation. We create a new file and then write the qcow2 header structure using bdrv_pwrite(). Since QCowHeader is not a multiple of sector size, block.c first uses bdrv_read() on the empty file to fetch the first sector (should be all zeroes). Here is the output from the qemu-io NBD example above: $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512' : ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab 0010: ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab 0020: ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ... We are not zeroing the buffer! As a result qcow2 image creation on top of protocols is not guaranteed to work even when file creation is supported by the protocol. Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp Signed-off-by: Asias He as...@redhat.com --- block.c | 30 +- 1 file changed, 29 insertions(+), 1 deletion(-) diff --git a/block.c b/block.c index 01b66d8..deaf0a0 100644 --- a/block.c +++ b/block.c @@ -2544,7 +2544,35 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs, } } -ret = drv-bdrv_co_readv(bs, sector_num, nb_sectors, qiov); +if (!bs-drv-protocol_name) { +ret = drv-bdrv_co_readv(bs, sector_num, nb_sectors, qiov); +} else { +/* NBD doesn't support reading beyond end of file. */ +int64_t len, total_sectors, max_nb_sectors; + +len = bdrv_getlength(bs); +if (len 0) { +ret = len; +goto out; +} + +total_sectors = len BDRV_SECTOR_BITS; +max_nb_sectors = MAX(0, total_sectors - sector_num); +if (max_nb_sectors 0) { +ret = drv-bdrv_co_readv(bs, sector_num, + MIN(nb_sectors, max_nb_sectors), qiov); +} else { +ret = 0; +} + +/* Reading beyond end of file is supposed to produce zeroes */ +if (ret == 0 total_sectors sector_num + nb_sectors) { +size_t offset = MAX(0, total_sectors - sector_num); +size_t bytes = (sector_num + nb_sectors - offset) * +BDRV_SECTOR_SIZE; +qemu_iovec_memset(qiov, offset * BDRV_SECTOR_SIZE, 0, bytes); +} +} out: tracked_request_end(req); -- 1.8.3.1
[Qemu-devel] [PATCH v2 for-qmp-1.6 0/4] dump-guest-memory: correct the vmcores
v1-v2 [Luiz]: - fix up the cpu_get_dump_info() prototype in target-s390x/arch_dump.c that has been introduced between my posting of v1 and Luiz's applying it to qmp-1.6. (Patch #4.) v1 blurb: Conceptually, the dump-guest-memory command works as follows: (a) pause the guest, (b) get a snapshot of the guest's physical memory map, as provided by qemu, (c) retrieve the guest's virtual mappings, as seen by the guest (this is where paging=true vs. paging=false makes a difference), (d) filter (c) as requested by the QMP caller, (e) write ELF headers, keying off (b) -- the guest's physmap -- and (d) -- the filtered guest mappings. (f) dump RAM contents, keying off the same (b) and (d), (g) unpause the guest (if necessary). Patch #1 affects step (e); specifically, how (d) is matched against (b), when paging is true, and the guest kernel maps more guest-physical RAM than it actually has. This can be done by non-malicious, clean-state guests (eg. a pristine RHEL-6.4 guest), and may cause libbfd errors due to PT_LOAD entries (coming directly from the guest page tables) exceeding the vmcore file's size. Patches #2 to #4 are independent of the paging option (or, more precisely, affect them equally); they affect (b). Currently input parameter (b), that is, the guest's physical memory map as provided by qemu, is implicitly represented by ram_list.blocks. As a result, steps and outputs dependent on (b) will refer to qemu-internal offsets. Unfortunately, this breaks when the guest-visible physical addresses diverge from the qemu-internal, RAMBlock based representation. This can happen eg. for guests 3.5 GB, due to the 32-bit PCI hole; see patch #4 for a diagram. Patch #2 introduces input parameter (b) explicitly, as a reasonably minimal map of guest-physical address ranges. (Minimality is not a hard requirement here, it just decreases the number of PT_LOAD entries written to the vmcore header.) Patch #3 populates this map. Patch #4 rebases the dump-guest-memory command to it, so that steps (e) and (f) work with guest-phys addresses. As a result, the crash utility can parse vmcores dumped for big x86_64 guests (paging=false). Please refer to Red Hat Bugzilla 981582 https://bugzilla.redhat.com/show_bug.cgi?id=981582. Laszlo Ersek (4): dump: clamp guest-provided mapping lengths to ramblock sizes dump: introduce GuestPhysBlockList dump: populate guest_phys_blocks dump: rebase from host-private RAMBlock offsets to guest-physical addresses include/sysemu/dump.h |4 +- include/sysemu/memory_mapping.h | 30 ++- dump.c | 171 +- memory_mapping.c| 174 +-- stubs/dump.c|3 +- target-i386/arch_dump.c | 10 ++- target-s390x/arch_dump.c|3 +- 7 files changed, 302 insertions(+), 93 deletions(-)
[Qemu-devel] [PATCH v2 for-qmp-1.6 2/4] dump: introduce GuestPhysBlockList
The vmcore must use physical addresses that are visible to the guest, not addresses that point into linear RAMBlocks. As first step, introduce the list type into which we'll collect the physical mappings in effect at the time of the dump. Related RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=981582 Signed-off-by: Laszlo Ersek ler...@redhat.com --- include/sysemu/memory_mapping.h | 22 ++ dump.c | 31 +++ memory_mapping.c| 17 + 3 files changed, 58 insertions(+), 12 deletions(-) diff --git a/include/sysemu/memory_mapping.h b/include/sysemu/memory_mapping.h index 6dfb68d..53c2cd5 100644 --- a/include/sysemu/memory_mapping.h +++ b/include/sysemu/memory_mapping.h @@ -17,6 +17,25 @@ #include qemu/queue.h #include qemu/typedefs.h +typedef struct GuestPhysBlock { +/* visible to guest, reflects PCI hole, etc */ +hwaddr target_start; + +/* implies size */ +hwaddr target_end; + +/* points into host memory */ +uint8_t *host_addr; + +QTAILQ_ENTRY(GuestPhysBlock) next; +} GuestPhysBlock; + +/* point-in-time snapshot of guest-visible physical mappings */ +typedef struct GuestPhysBlockList { +unsigned num; +QTAILQ_HEAD(, GuestPhysBlock) head; +} GuestPhysBlockList; + /* The physical and virtual address in the memory mapping are contiguous. */ typedef struct MemoryMapping { hwaddr phys_addr; @@ -45,6 +64,9 @@ void memory_mapping_list_free(MemoryMappingList *list); void memory_mapping_list_init(MemoryMappingList *list); +void guest_phys_blocks_free(GuestPhysBlockList *list); +void guest_phys_blocks_init(GuestPhysBlockList *list); + void qemu_get_guest_memory_mapping(MemoryMappingList *list, Error **errp); /* get guest's memory mapping without do paging(virtual address is 0). */ diff --git a/dump.c b/dump.c index 9a2f939..716fb1d 100644 --- a/dump.c +++ b/dump.c @@ -59,6 +59,7 @@ static uint64_t cpu_convert_to_target64(uint64_t val, int endian) } typedef struct DumpState { +GuestPhysBlockList guest_phys_blocks; ArchDumpInfo dump_info; MemoryMappingList list; uint16_t phdr_num; @@ -81,6 +82,7 @@ static int dump_cleanup(DumpState *s) { int ret = 0; +guest_phys_blocks_free(s-guest_phys_blocks); memory_mapping_list_free(s-list); if (s-fd != -1) { close(s-fd); @@ -728,31 +730,34 @@ static int dump_init(DumpState *s, int fd, bool paging, bool has_filter, s-resume = false; } +/* If we use KVM, we should synchronize the registers before we get dump + * info or physmap info. + */ +cpu_synchronize_all_states(); +nr_cpus = 0; +for (cpu = first_cpu; cpu != NULL; cpu = cpu-next_cpu) { +nr_cpus++; +} + s-errp = errp; s-fd = fd; s-has_filter = has_filter; s-begin = begin; s-length = length; + +guest_phys_blocks_init(s-guest_phys_blocks); +/* FILL LIST */ + s-start = get_start_block(s); if (s-start == -1) { error_set(errp, QERR_INVALID_PARAMETER, begin); goto cleanup; } -/* - * get dump info: endian, class and architecture. +/* get dump info: endian, class and architecture. * If the target architecture is not supported, cpu_get_dump_info() will * return -1. - * - * If we use KVM, we should synchronize the registers before we get dump - * info. */ -cpu_synchronize_all_states(); -nr_cpus = 0; -for (cpu = first_cpu; cpu != NULL; cpu = cpu-next_cpu) { -nr_cpus++; -} - ret = cpu_get_dump_info(s-dump_info); if (ret 0) { error_set(errp, QERR_UNSUPPORTED); @@ -827,6 +832,8 @@ static int dump_init(DumpState *s, int fd, bool paging, bool has_filter, return 0; cleanup: +guest_phys_blocks_free(s-guest_phys_blocks); + if (s-resume) { vm_start(); } @@ -874,7 +881,7 @@ void qmp_dump_guest_memory(bool paging, const char *file, bool has_begin, return; } -s = g_malloc(sizeof(DumpState)); +s = g_malloc0(sizeof(DumpState)); ret = dump_init(s, fd, paging, has_begin, begin, length, errp); if (ret 0) { diff --git a/memory_mapping.c b/memory_mapping.c index 515a984..c70505b 100644 --- a/memory_mapping.c +++ b/memory_mapping.c @@ -165,6 +165,23 @@ void memory_mapping_list_init(MemoryMappingList *list) QTAILQ_INIT(list-head); } +void guest_phys_blocks_free(GuestPhysBlockList *list) +{ +GuestPhysBlock *p, *q; + +QTAILQ_FOREACH_SAFE(p, list-head, next, q) { +QTAILQ_REMOVE(list-head, p, next); +g_free(p); +} +list-num = 0; +} + +void guest_phys_blocks_init(GuestPhysBlockList *list) +{ +list-num = 0; +QTAILQ_INIT(list-head); +} + static CPUState *find_paging_enabled_cpu(CPUState *start_cpu) { CPUState *cpu; -- 1.7.1
[Qemu-devel] [PATCH v2 for-qmp-1.6 1/4] dump: clamp guest-provided mapping lengths to ramblock sizes
Even a trusted clean-state guest can map more memory than what it was given. Since the vmcore contains RAMBlocks, mapping sizes should be clamped to RAMBlock sizes. Otherwise such oversized mappings can exceed the entire file size, and ELF parsers might refuse even the valid portion of the PT_LOAD entry. Related RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=981582 Signed-off-by: Laszlo Ersek ler...@redhat.com --- dump.c | 65 +++ 1 files changed, 40 insertions(+), 25 deletions(-) diff --git a/dump.c b/dump.c index 6a3a72a..9a2f939 100644 --- a/dump.c +++ b/dump.c @@ -187,7 +187,8 @@ static int write_elf32_header(DumpState *s) } static int write_elf64_load(DumpState *s, MemoryMapping *memory_mapping, -int phdr_index, hwaddr offset) +int phdr_index, hwaddr offset, +hwaddr filesz) { Elf64_Phdr phdr; int ret; @@ -197,15 +198,12 @@ static int write_elf64_load(DumpState *s, MemoryMapping *memory_mapping, phdr.p_type = cpu_convert_to_target32(PT_LOAD, endian); phdr.p_offset = cpu_convert_to_target64(offset, endian); phdr.p_paddr = cpu_convert_to_target64(memory_mapping-phys_addr, endian); -if (offset == -1) { -/* When the memory is not stored into vmcore, offset will be -1 */ -phdr.p_filesz = 0; -} else { -phdr.p_filesz = cpu_convert_to_target64(memory_mapping-length, endian); -} +phdr.p_filesz = cpu_convert_to_target64(filesz, endian); phdr.p_memsz = cpu_convert_to_target64(memory_mapping-length, endian); phdr.p_vaddr = cpu_convert_to_target64(memory_mapping-virt_addr, endian); +assert(memory_mapping-length = filesz); + ret = fd_write_vmcore(phdr, sizeof(Elf64_Phdr), s); if (ret 0) { dump_error(s, dump: failed to write program header table.\n); @@ -216,7 +214,8 @@ static int write_elf64_load(DumpState *s, MemoryMapping *memory_mapping, } static int write_elf32_load(DumpState *s, MemoryMapping *memory_mapping, -int phdr_index, hwaddr offset) +int phdr_index, hwaddr offset, +hwaddr filesz) { Elf32_Phdr phdr; int ret; @@ -226,15 +225,12 @@ static int write_elf32_load(DumpState *s, MemoryMapping *memory_mapping, phdr.p_type = cpu_convert_to_target32(PT_LOAD, endian); phdr.p_offset = cpu_convert_to_target32(offset, endian); phdr.p_paddr = cpu_convert_to_target32(memory_mapping-phys_addr, endian); -if (offset == -1) { -/* When the memory is not stored into vmcore, offset will be -1 */ -phdr.p_filesz = 0; -} else { -phdr.p_filesz = cpu_convert_to_target32(memory_mapping-length, endian); -} +phdr.p_filesz = cpu_convert_to_target32(filesz, endian); phdr.p_memsz = cpu_convert_to_target32(memory_mapping-length, endian); phdr.p_vaddr = cpu_convert_to_target32(memory_mapping-virt_addr, endian); +assert(memory_mapping-length = filesz); + ret = fd_write_vmcore(phdr, sizeof(Elf32_Phdr), s); if (ret 0) { dump_error(s, dump: failed to write program header table.\n); @@ -418,17 +414,24 @@ static int write_memory(DumpState *s, RAMBlock *block, ram_addr_t start, return 0; } -/* get the memory's offset in the vmcore */ -static hwaddr get_offset(hwaddr phys_addr, - DumpState *s) +/* get the memory's offset and size in the vmcore */ +static void get_offset_range(hwaddr phys_addr, + ram_addr_t mapping_length, + DumpState *s, + hwaddr *p_offset, + hwaddr *p_filesz) { RAMBlock *block; hwaddr offset = s-memory_offset; int64_t size_in_block, start; +/* When the memory is not stored into vmcore, offset will be -1 */ +*p_offset = -1; +*p_filesz = 0; + if (s-has_filter) { if (phys_addr s-begin || phys_addr = s-begin + s-length) { -return -1; +return; } } @@ -457,18 +460,26 @@ static hwaddr get_offset(hwaddr phys_addr, } if (phys_addr = start phys_addr start + size_in_block) { -return phys_addr - start + offset; +*p_offset = phys_addr - start + offset; + +/* The offset range mapped from the vmcore file must not spill over + * the RAMBlock, clamp it. The rest of the mapping will be + * zero-filled in memory at load time; see + * http://refspecs.linuxbase.org/elf/gabi4+/ch5.pheader.html. + */ +*p_filesz = phys_addr + mapping_length = start + size_in_block ? +mapping_length : +size_in_block - (phys_addr - start); +return; } offset += size_in_block; } - -
[Qemu-devel] [PATCH v2 for-qmp-1.6 4/4] dump: rebase from host-private RAMBlock offsets to guest-physical addresses
RAMBlock.offset -- GuestPhysBlock.target_start RAMBlock.offset + RAMBlock.length -- GuestPhysBlock.target_end RAMBlock.length -- GuestPhysBlock.target_end - GuestPhysBlock.target_start GuestPhysBlock.host_addr is only used when writing the dump contents. This patch enables crash to work with the vmcore by rebasing the vmcore from the left side of the following diagram to the right side: host-private offset relative to ram_addr RAMBlock guest-visible paddrs 0 +---+.+---+ 0 | ^ | |^ | | 640 KB | | 640 KB | | v | |v | 0xa +---+.+---+ 0xa | ^ | |XXX| | 384 KB | |XXX| | v | |XXX| 0x00010 +---+.+---+ 0x00010 | ^ | |^ | | 3583 MB | | 3583 MB | | v | |v | 0x0e000 +---+.+---+ 0x0e000 | ^ |.|XXX| | above_4g_mem_size | . | PCI hole X| | v | . | X| ram_size +---+ . | 512 MB X| . .|XXX| . +---+ 0x1 . | ^ | . | above_4g_mem_size | .| v | +---+ ram_size + 512 MB Related RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=981582 Signed-off-by: Laszlo Ersek ler...@redhat.com --- v1-v2: - fix up the cpu_get_dump_info() prototype in target-s390x/arch_dump.c that has been introduced between my posting of v1 and Luiz's applying it to qmp-1.6. include/sysemu/dump.h |4 ++- include/sysemu/memory_mapping.h |7 +++- dump.c | 77 +++ memory_mapping.c| 22 +++- stubs/dump.c|3 +- target-i386/arch_dump.c | 10 +++-- target-s390x/arch_dump.c|3 +- 7 files changed, 69 insertions(+), 57 deletions(-) diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h index b8c770f..19fafb2 100644 --- a/include/sysemu/dump.h +++ b/include/sysemu/dump.h @@ -20,7 +20,9 @@ typedef struct ArchDumpInfo { int d_class;/* ELFCLASS32 or ELFCLASS64 */ } ArchDumpInfo; -int cpu_get_dump_info(ArchDumpInfo *info); +struct GuestPhysBlockList; /* memory_mapping.h */ +int cpu_get_dump_info(ArchDumpInfo *info, + const struct GuestPhysBlockList *guest_phys_blocks); ssize_t cpu_get_note_size(int class, int machine, int nr_cpus); #endif diff --git a/include/sysemu/memory_mapping.h b/include/sysemu/memory_mapping.h index 6723dc5..2d98a89 100644 --- a/include/sysemu/memory_mapping.h +++ b/include/sysemu/memory_mapping.h @@ -68,10 +68,13 @@ void guest_phys_blocks_free(GuestPhysBlockList *list); void guest_phys_blocks_init(GuestPhysBlockList *list); void guest_phys_blocks_append(GuestPhysBlockList *list); -void qemu_get_guest_memory_mapping(MemoryMappingList *list, Error **errp); +void qemu_get_guest_memory_mapping(MemoryMappingList *list, + const GuestPhysBlockList *guest_phys_blocks, + Error **errp); /* get guest's memory mapping without do paging(virtual address is 0). */ -void qemu_get_guest_simple_memory_mapping(MemoryMappingList *list); +void qemu_get_guest_simple_memory_mapping(MemoryMappingList *list, + const GuestPhysBlockList *guest_phys_blocks); void memory_mapping_filter(MemoryMappingList *list, int64_t begin, int64_t length); diff --git a/dump.c b/dump.c index 3fa33fc..c0dae2c 100644 --- a/dump.c +++ b/dump.c @@ -70,7 +70,7 @@ typedef struct DumpState { hwaddr memory_offset; int fd; -RAMBlock *block; +GuestPhysBlock *next_block; ram_addr_t start; bool has_filter; int64_t begin; @@ -391,14 +391,14 @@ static int write_data(DumpState *s, void *buf, int length) } /* write the memroy to vmcore. 1 page per I/O. */ -static int write_memory(DumpState *s, RAMBlock *block, ram_addr_t start, +static int write_memory(DumpState *s, GuestPhysBlock *block, ram_addr_t
[Qemu-devel] [PATCH] virtio: Do not notify virtqueue if no element was pushed back.
Signed-off-by: Gal Hammer gham...@redhat.com --- hw/char/virtio-serial-bus.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c index da417c7..0d38b4b 100644 --- a/hw/char/virtio-serial-bus.c +++ b/hw/char/virtio-serial-bus.c @@ -105,6 +105,7 @@ static void do_flush_queued_data(VirtIOSerialPort *port, VirtQueue *vq, VirtIODevice *vdev) { VirtIOSerialPortClass *vsc; +bool elem_pushed = false; assert(port); assert(virtio_queue_ready(vq)); @@ -145,9 +146,12 @@ static void do_flush_queued_data(VirtIOSerialPort *port, VirtQueue *vq, break; } virtqueue_push(vq, port-elem, 0); +elem_pushed = true; port-elem.out_num = 0; } -virtio_notify(vdev, vq); +if (elem_pushed) { +virtio_notify(vdev, vq); +} } static void flush_queued_data(VirtIOSerialPort *port) -- 1.8.1.4
[Qemu-devel] [PATCH v2 for-qmp-1.6 3/4] dump: populate guest_phys_blocks
While the machine is paused, in guest_phys_blocks_append() we register a one-shot MemoryListener, solely for the initial collection of the valid guest-physical memory ranges that happens at client registration time. For each range that is reported to guest_phys_blocks_set_memory(), we attempt to merge the range with adjacent (preceding, subsequent, or both) ranges. We use two hash tables for this purpose, both indexing the same ranges, just by different keys (guest-phys-start vs. guest-phys-end). Ranges can only be joined if they are contiguous in both guest-physical address space, and contiguous in host virtual address space. The maximal ranges that remain in the end constitute the guest-physical memory map that the dump will be based on. Related RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=981582 Signed-off-by: Laszlo Ersek ler...@redhat.com --- include/sysemu/memory_mapping.h |1 + dump.c |2 +- memory_mapping.c| 135 +++ 3 files changed, 137 insertions(+), 1 deletions(-) diff --git a/include/sysemu/memory_mapping.h b/include/sysemu/memory_mapping.h index 53c2cd5..6723dc5 100644 --- a/include/sysemu/memory_mapping.h +++ b/include/sysemu/memory_mapping.h @@ -66,6 +66,7 @@ void memory_mapping_list_init(MemoryMappingList *list); void guest_phys_blocks_free(GuestPhysBlockList *list); void guest_phys_blocks_init(GuestPhysBlockList *list); +void guest_phys_blocks_append(GuestPhysBlockList *list); void qemu_get_guest_memory_mapping(MemoryMappingList *list, Error **errp); diff --git a/dump.c b/dump.c index 716fb1d..3fa33fc 100644 --- a/dump.c +++ b/dump.c @@ -746,7 +746,7 @@ static int dump_init(DumpState *s, int fd, bool paging, bool has_filter, s-length = length; guest_phys_blocks_init(s-guest_phys_blocks); -/* FILL LIST */ +guest_phys_blocks_append(s-guest_phys_blocks); s-start = get_start_block(s); if (s-start == -1) { diff --git a/memory_mapping.c b/memory_mapping.c index c70505b..efaabf8 100644 --- a/memory_mapping.c +++ b/memory_mapping.c @@ -11,9 +11,13 @@ * */ +#include glib.h + #include cpu.h #include exec/cpu-all.h #include sysemu/memory_mapping.h +#include exec/memory.h +#include exec/address-spaces.h static void memory_mapping_list_add_mapping_sorted(MemoryMappingList *list, MemoryMapping *mapping) @@ -182,6 +186,137 @@ void guest_phys_blocks_init(GuestPhysBlockList *list) QTAILQ_INIT(list-head); } +typedef struct GuestPhysListener { +GHashTable *by_target_start; +GHashTable *by_target_end; +MemoryListener listener; +} GuestPhysListener; + +static void guest_phys_blocks_region_add(MemoryListener *listener, + MemoryRegionSection *section) +{ +GuestPhysListener *g; +uint64_t section_size; +hwaddr target_start, target_end; +uint8_t *host_addr; +GuestPhysBlock *predecessor, *successor, *block; +bool found; + +/* we only care about RAM */ +if (!memory_region_is_ram(section-mr)) { +return; +} + +g= container_of(listener, GuestPhysListener, listener); +section_size = int128_get64(section-size); +target_start = section-offset_within_address_space; +target_end = target_start + section_size; +host_addr= memory_region_get_ram_ptr(section-mr) + + section-offset_within_region; + +/* find continuity in guest physical address space */ +predecessor = g_hash_table_lookup(g-by_target_end, target_start); +successor = g_hash_table_lookup(g-by_target_start, target_end); + +/* we require continuity in host memory too */ +if (predecessor != NULL) { +hwaddr predecessor_size = predecessor-target_end - + predecessor-target_start; +if (predecessor-host_addr + predecessor_size != host_addr) { +predecessor = NULL; +} +} +if (successor != NULL + host_addr + section_size != successor-host_addr) { +successor = NULL; +} + +if (predecessor == NULL) { +if (successor == NULL) { +/* Isolated mapping, allocate it and add it to both tables. */ +block = g_malloc0(sizeof *block); + +block-target_end = target_end; +g_hash_table_insert(g-by_target_end, block-target_end, block); +} else { +/* Mapping has successor only. Merge current into successor by + * modifying successor's start. Successor's end doesn't change. + */ +block = successor; +found = g_hash_table_steal(g-by_target_start, + block-target_start); +g_assert(found); +} +block-target_start = target_start; +block-host_addr= host_addr; +g_hash_table_insert(g-by_target_start, block-target_start,
Re: [Qemu-devel] [PATCH] target-ppc: Add POWER7+ CPU model
Am 05.08.2013 07:43, schrieb Paul Mackerras: On Fri, Aug 02, 2013 at 06:14:46PM +0200, Andreas Färber wrote: Am 02.08.2013 04:59, schrieb Alexey Kardashevskiy: This patch adds CPU PVR definition for POWER7+. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- target-ppc/cpu-models.c | 2 ++ target-ppc/cpu-models.h | 1 + 2 files changed, 3 insertions(+) diff --git a/target-ppc/cpu-models.c b/target-ppc/cpu-models.c index 9578ed8..c97c183 100644 --- a/target-ppc/cpu-models.c +++ b/target-ppc/cpu-models.c @@ -1143,6 +1143,8 @@ POWER7 v2.1) POWERPC_DEF(POWER7_v2.3, CPU_POWERPC_POWER7_v23, POWER7, POWER7 v2.3) +POWERPC_DEF(POWER7P, CPU_POWERPC_POWER7P, POWER7, +POWER7P) Subject says POWER7+ rather than POWER7P. Since this is a string there's nothing wrong with +. How should it show up in SLOF? Do you mean in the device tree? The children of the /cpus node should be named like PowerPC,POWER7+@xxx. Thanks. See also below. POWERPC_DEF(POWER8_v1.0, CPU_POWERPC_POWER8_v10, POWER8, POWER8 v1.0) POWERPC_DEF(970, CPU_POWERPC_970,970, diff --git a/target-ppc/cpu-models.h b/target-ppc/cpu-models.h index 01e488f..c3c78d1 100644 --- a/target-ppc/cpu-models.h +++ b/target-ppc/cpu-models.h @@ -556,6 +556,7 @@ enum { CPU_POWERPC_POWER7_v20 = 0x003F0200, CPU_POWERPC_POWER7_v21 = 0x003F0201, CPU_POWERPC_POWER7_v23 = 0x003F0203, +CPU_POWERPC_POWER7P= 0x004A0201, Shouldn't this be ..._POWER7P_v21 to align with the surrounding models? Ditto for the model name then, POWER7+ being an alias to the latest version only. Does the fact that all these minor revisions are enumerated imply that QEMU is always matching all the bits of the PVR value? Yes, see target-ppc/kvm.c. If so then that seems very fragile to me, given that QEMU looks at the host's PVR value when using KVM. All it takes is for IBM to release a new minor revision of a CPU to break existing compiled versions of qemu (e.g. from a distro). Wouldn't it be better to be able to match only the top 16 bits, at least for the situations where there is no architectural or significant behavioural change between the versions? Not being the ppc maintainer, I can only point out that this is how QEMU works today and that, independently, it makes sense to name the PVR and model after the revision they encode for self-documentation. Ben has pointed out that POWER5++ PVR-wise was a POWER5+ v3.0+ or so and, according to you, POWER5++ is not just a revision with hardware-only changes and a new marketing name. So no, it's not always just the upper 16 bits. For e500v1/v2 the revision appears to be encoded in the two least significant nibbles rather than in the two least significant bytes. For e300 it's in the second most significant byte. For the 405/440 models it's not linear from a to b to c at all. Cf. target-ppc/cpu-models.h. If we wanted to make our matching less precise, we'd need to specify an explicit wildcard per model. And we can only match known models, i.e. up to v2.1 as long as we don't know whether v2.2 or v3.0 will have the same guest-exposed/emulation features (in which case we could've picked them instead), because otherwise we'd need a versioning mechanism like x86 has, to avoid guest-visible CPU features changing between releases once we do know what features new revisions have. Also, if we stopped modeling exact revisions, it may be a good idea to add properties to the PowerPCCPU to specify a particular major/minor revision within the wildcard. We may need to limit such properties to IBM's CPU models since apparently that scheme does not apply to all PVRs. CC'ing Scott and Stuart. Examples: -device POWER7+-powerpc64-cpu,major-revision=2,minor-revision=0 -device POWER7+-powerpc64-cpu,pvr=0x12345678 Regards, Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
hi all, I met similar problem to these, while performing live migration or save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, guest:suse11sp2), running tele-communication software suite in guest, https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506 http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592 https://bugzilla.kernel.org/show_bug.cgi?id=58771 After live migration or virsh restore [savefile], one process's CPU utilization went up by about 30%, resulted in throughput degradation of this process. If EPT disabled, this problem gone. I suspect that kvm hypervisor has business with this problem. Based on above suspect, I want to find the two adjacent versions of kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1), and analyze the differences between this two versions, or apply the patches between this two versions by bisection method, finally find the key patches. Any better ideas? Thanks, Zhang Haoyu I've attempted to duplicate this on a number of machines that are as similar to yours as I am able to get my hands on, and so far have not been able to see any performance degradation. And from what I've read in the above links, huge pages do not seem to be part of the problem. So, if you are in a position to bisect the kernel changes, that would probably be the best avenue to pursue in my opinion. Bruce I found the first bad commit([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: propagate fault r/w information to gup(), allow read-only memory) which triggers this problem by git bisecting the kvm kernel (download from https://git.kernel.org/pub/scm/virt/kvm/kvm.git) changes. And, git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log git diff 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc4 02f13b1b63f7e4 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff Then, I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log and 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff, came to a conclusion that all of the differences between 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 are contributed by no other than 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, so this commit is the peace-breaker which directly or indirectly causes the degradation. Does the map_writable flag passed to mmu_set_spte() function have effect on PTE's PAT flag or increase the VMEXITs induced by that guest tried to write read-only memory? Thanks, Zhang Haoyu There should be no read-only memory maps backing guest RAM. Can you confirm map_writable = false is being passed to __direct_map? (this should not happen, for guest RAM). And if it is false, please capture the associated GFN. I added below check and printk at the start of __direct_map() at the fist bad commit version, --- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c 2013-07-26 18:44:05.0 +0800 +++ kvm-612819/arch/x86/kvm/mmu.c 2013-07-31 00:05:48.0 +0800 @@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu int pt_write = 0; gfn_t pseudo_gfn; +if (!map_writable) +printk(KERN_ERR %s: %s: gfn = %llu \n, __FILE__, __func__, gfn); + for_each_shadow_entry(vcpu, (u64)gfn PAGE_SHIFT, iterator) { if (iterator.level == level) { unsigned pte_access = ACC_ALL; I virsh-save the VM, and then virsh-restore it, so many GFNs were printed, you can absolutely describe it as flooding. The flooding you see happens during migrate to file stage because of dirty page tracking. If you clear dmesg after virsh-save you should not see any flooding after virsh-restore. I just checked with latest tree, I do not. I made a verification again. I virsh-save the VM, during the saving stage, I run 'dmesg', no GFN printed, maybe the switching from running stage to pause stage takes so short time, no guest-write happens during this switching period. After the completion of saving operation, I run 'demsg -c' to clear the buffer all the same, then I virsh-restore the VM, so many GFNs are printed by running 'dmesg', and I also run 'tail -f /var/log/messages' during the restoring stage, so many GFNs are flooded dynamically too. I'm sure that the flooding happens during the virsh-restore stage, not the migration stage. On VM's normal starting stage, only very few GFNs are printed, shown as below gfn = 16 gfn = 604 gfn = 605 gfn = 606 gfn = 607 gfn = 608 gfn = 609 but on the VM's restoring stage, so many GFNs are printed, taking some examples shown as below, 2042600 279 2797778 2797779 2797780 2797781 2797782 2797783 2797784
Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
On Mon, Aug 05, 2013 at 08:35:09AM +, Zhanghaoyu (A) wrote: hi all, I met similar problem to these, while performing live migration or save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, guest:suse11sp2), running tele-communication software suite in guest, https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506 http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592 https://bugzilla.kernel.org/show_bug.cgi?id=58771 After live migration or virsh restore [savefile], one process's CPU utilization went up by about 30%, resulted in throughput degradation of this process. If EPT disabled, this problem gone. I suspect that kvm hypervisor has business with this problem. Based on above suspect, I want to find the two adjacent versions of kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1), and analyze the differences between this two versions, or apply the patches between this two versions by bisection method, finally find the key patches. Any better ideas? Thanks, Zhang Haoyu I've attempted to duplicate this on a number of machines that are as similar to yours as I am able to get my hands on, and so far have not been able to see any performance degradation. And from what I've read in the above links, huge pages do not seem to be part of the problem. So, if you are in a position to bisect the kernel changes, that would probably be the best avenue to pursue in my opinion. Bruce I found the first bad commit([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: propagate fault r/w information to gup(), allow read-only memory) which triggers this problem by git bisecting the kvm kernel (download from https://git.kernel.org/pub/scm/virt/kvm/kvm.git) changes. And, git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log git diff 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc4 02f13b1b63f7e4 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff Then, I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log and 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff, came to a conclusion that all of the differences between 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 are contributed by no other than 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, so this commit is the peace-breaker which directly or indirectly causes the degradation. Does the map_writable flag passed to mmu_set_spte() function have effect on PTE's PAT flag or increase the VMEXITs induced by that guest tried to write read-only memory? Thanks, Zhang Haoyu There should be no read-only memory maps backing guest RAM. Can you confirm map_writable = false is being passed to __direct_map? (this should not happen, for guest RAM). And if it is false, please capture the associated GFN. I added below check and printk at the start of __direct_map() at the fist bad commit version, --- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c 2013-07-26 18:44:05.0 +0800 +++ kvm-612819/arch/x86/kvm/mmu.c 2013-07-31 00:05:48.0 +0800 @@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu int pt_write = 0; gfn_t pseudo_gfn; +if (!map_writable) +printk(KERN_ERR %s: %s: gfn = %llu \n, __FILE__, __func__, gfn); + for_each_shadow_entry(vcpu, (u64)gfn PAGE_SHIFT, iterator) { if (iterator.level == level) { unsigned pte_access = ACC_ALL; I virsh-save the VM, and then virsh-restore it, so many GFNs were printed, you can absolutely describe it as flooding. The flooding you see happens during migrate to file stage because of dirty page tracking. If you clear dmesg after virsh-save you should not see any flooding after virsh-restore. I just checked with latest tree, I do not. I made a verification again. I virsh-save the VM, during the saving stage, I run 'dmesg', no GFN printed, maybe the switching from running stage to pause stage takes so short time, no guest-write happens during this switching period. After the completion of saving operation, I run 'demsg -c' to clear the buffer all the same, then I virsh-restore the VM, so many GFNs are printed by running 'dmesg', and I also run 'tail -f /var/log/messages' during the restoring stage, so many GFNs are flooded dynamically too. I'm sure that the flooding happens during the virsh-restore stage, not the migration stage. Interesting, is this with upstream kernel? For me the situation is exactly the opposite. What is your command line? --
Re: [Qemu-devel] [PATCH for-1.6] target-mips: do not raise exceptions when accessing invalid memory
Am 05.08.2013 00:04, schrieb Aurélien Jarno: On Mon, Jul 29, 2013 at 10:35:31PM +0200, Stefan Weil wrote: Am 27.07.2013 22:58, schrieb Stefan Weil: Am 27.07.2013 22:43, schrieb Andreas Färber: Am 27.07.2013 21:37, schrieb Stefan Weil: Am 27.07.2013 19:43, schrieb Peter Maydell: On 27 July 2013 17:18, Hervé Poussineau hpous...@reactos.org wrote: Another solution would be to add a big dummy memory regions on all MIPS boards to catch memory accesses and not raise an exception. However, this means that each MIPS board will have its own unassigned memory handler, different from the global QEMU one. Better would be to at least provide fake RAZ/WI implementations of devices for the boards, rather than making the dummy region cover the whole of the address space. Not 1.6 material, though. For MIPS Malta, Linux boot can be fixed by handling read access for two addresses: 0x1fbf8008 0x1bc80110 The corresponding definitions in the Linux kernel code seem to be these lines: #define GCMP_BASE_ADDR 0x1fbf8000 #define GCMP_ADDRSPACE_SZ (256 * 1024) #define GCMP_GCB_GCMPB_OFS 0x0008 /* Global GCMP Base */ #define MSC01_BIU_REG_BASE 0x1bc8 #define MSC01_BIU_ADDRSPACE_SZ (256 * 1024) #define MSC01_SC_CFG_OFS0x0110 = mips_malta.c needs a handler for reads of (GCMP_BASE_ADDR + GCMP_GCB_GCMPB_OFS) and (MSC01_BIU_REG_BASE + MSC01_SC_CFG_OFS). I don't think it would be correct to emulate them as this are not present on the real Malta board, at least for the model emulated by QEMU. Theses addresses correspond to the SMP controller, and is therefore only present when an SMP daughter card is installed. The Linux kernel probes theses addresses, and look if they return something consistent. If not the corresponding devices are latter ignored. The real hardware probably returns all 1 or all 0 for addresses not decoded to a device. This is what QEMU should model, and it should not trigger a DBE or IBE exception. [snip] Have you tested Jan's patches limiting the new unassigned read value -1 to PIO? Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH for-1.6] target-mips: do not raise exceptions when accessing invalid memory
On 2013-08-05 10:45, Andreas Färber wrote: Am 05.08.2013 00:04, schrieb Aurélien Jarno: On Mon, Jul 29, 2013 at 10:35:31PM +0200, Stefan Weil wrote: Am 27.07.2013 22:58, schrieb Stefan Weil: Am 27.07.2013 22:43, schrieb Andreas Färber: Am 27.07.2013 21:37, schrieb Stefan Weil: Am 27.07.2013 19:43, schrieb Peter Maydell: On 27 July 2013 17:18, Hervé Poussineau hpous...@reactos.org wrote: Another solution would be to add a big dummy memory regions on all MIPS boards to catch memory accesses and not raise an exception. However, this means that each MIPS board will have its own unassigned memory handler, different from the global QEMU one. Better would be to at least provide fake RAZ/WI implementations of devices for the boards, rather than making the dummy region cover the whole of the address space. Not 1.6 material, though. For MIPS Malta, Linux boot can be fixed by handling read access for two addresses: 0x1fbf8008 0x1bc80110 The corresponding definitions in the Linux kernel code seem to be these lines: #define GCMP_BASE_ADDR 0x1fbf8000 #define GCMP_ADDRSPACE_SZ (256 * 1024) #define GCMP_GCB_GCMPB_OFS 0x0008 /* Global GCMP Base */ #define MSC01_BIU_REG_BASE 0x1bc8 #define MSC01_BIU_ADDRSPACE_SZ (256 * 1024) #define MSC01_SC_CFG_OFS0x0110 = mips_malta.c needs a handler for reads of (GCMP_BASE_ADDR + GCMP_GCB_GCMPB_OFS) and (MSC01_BIU_REG_BASE + MSC01_SC_CFG_OFS). I don't think it would be correct to emulate them as this are not present on the real Malta board, at least for the model emulated by QEMU. Theses addresses correspond to the SMP controller, and is therefore only present when an SMP daughter card is installed. The Linux kernel probes theses addresses, and look if they return something consistent. If not the corresponding devices are latter ignored. The real hardware probably returns all 1 or all 0 for addresses not decoded to a device. This is what QEMU should model, and it should not trigger a DBE or IBE exception. [snip] Have you tested Jan's patches limiting the new unassigned read value -1 to PIO? Yeah, this sounds a lot like another side effect of using unassigned_mem_ops for PIO... Jan -- Siemens AG, Corporate Technology, CT RTC ITP SES-DE Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH 0/12 RFC v2] Localhost migration
On 07/26/2013 05:41 PM, Paolo Bonzini wrote: Il 25/07/2013 22:18, Lei Li ha scritto: Hi, This patch series tries to add localhost migration support to Qemu. When doing localhost migration, the host memory will balloon up during the period, might consume double memories for some time. So we want to add a new live migration mechanism localhost migration. Following I copied from last version that Anthony added for the benefit of the other reviewers: The goal here is to allow live upgrade of a running QEMU instance. The work flow would look like this: 1) Guests are running QEMU release 1.6.1 2) Admin installs QEMU release 1.6.2 via RPM or deb 3) Admin does localhost migration with page flipping to use new version of QEMU. Page flipping is used in order to avoid requiring that there is enough free memory to fit an additional copy of the largest guest which is the requirement today with localhost migration. You can also read from the link below: http://lists.gnu.org/archive/html/qemu-devel/2013-06/msg02577.html The plan is: 1) Add new command to do localhost migration. The qmp interface introduced like: { 'command': 'localhost-migrate', 'data': {'uri': 'str'} } 2) Use different mechanism than current live migration. The very basic work flow like: qemu on the source (the source and destination are both on localhost) | V Stop VM | V Create threads | V Page flipping through vmspice | V MADV_DONTNEED the ram pages which are already flipped | V Migration completes As stopping VM first, we expect/resume the page flipping through vmspice is fast enough to meet *live migration (low downtime). Notes: Currently the work flow is not exactly the same as description above. For the first step, the work flow we implemented is: stop VM and copy ram pages via unix domain socket, MADV_DONTNEED ram pages that already copied. After that, will replace to vmsplice mechanism instead of copying pages. Now it's still a draft version, and as it is implemented separately to the current migration code for a easy start, the next step will be trying to integrate it into the current migration implementation closely. To make sure we are on the right direction that should be headed, please let me know your suggestions on this. For the interface, as Anthony has suggested using a flag or a capability, which one would you prefer or any ideas? Using a capability on the source makes sense; I don't think anything special is needed on the destination. The destination just sees a special packet telling it that a pipe is available via SCM_RIGHTS; then it fetches the file descriptor and uses it for the new protocol. It looks like this could reuse a lot of the RAM copying hooks that we introduced for RDMA. You shouldn't need to change anything in savevm.c or arch_init.c Hi Paolo, Thanks for your suggestions! And sorry for the late reply, I am looking into RDMA implementation and trying to figure out the way to integrate with its approach as you suggested. For the interface on the source, I am OK with using a capability. For the destination, currently we are copying ram pages via unix domain socket as first step, will replace to pipe later. How about add a new URI prefix as Michael R.Hines suggested? Paolo Your comments are very welcome! TODO: - Integrate to the current implement of migration closely? - Introduce a mechanism to exchange a PIPE via SCM_RIGHTS. - benchmark/evaluation. Lei Li (12): migration: export MIG_STATE_xxx flags savevm: export qemu_save_device_state() rename is_active to is_block_active arch_init: introduce ram_page_save() arch_init: introduce ram_save_local() arch_init: add save_local_setup to savevm_ram_handlers savevm: introduce qemu_savevm_local() savevm: adjust is_ram check in register_savevm_live() migration-local: implementation of outgoing part migration-local: implementation of incoming part migration-local: add option to command line for local incoming hmp:add hmp_localhost_migration interface Makefile.objs |1 + arch_init.c | 110 +++ block-migration.c |2 +- hmp-commands.hx | 17 hmp.c | 13 +++ hmp.h |1 + include/migration/migration.h | 32 +++ include/sysemu/sysemu.h |1 + migration-local.c | 228 + migration-unix.c | 60 migration.c |8 -- qapi-schema.json | 14 +++ qemu-options.hx |9 ++ qmp-commands.hx | 22 + savevm.c | 100 +++-- vl.c
[Qemu-devel] [PATCH for-1.6] qemu-img: Error out for excess arguments
Don't silently ignore excess arguments at the end of the command line, but error out instead. This can catch typos like 'resize test.img + 1G', which doesn't increase the image size by 1G as intended, but truncates the image to 1G. Even for less dangerous commands, the old behaviour is confusing. Signed-off-by: Kevin Wolf kw...@redhat.com --- qemu-img.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index c55ca5c..dece1b3 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -396,6 +396,9 @@ static int img_create(int argc, char **argv) } img_size = (uint64_t)sval; } +if (optind != argc) { +help(); +} if (options is_help_option(options)) { return print_block_option_help(filename, fmt); @@ -573,7 +576,7 @@ static int img_check(int argc, char **argv) break; } } -if (optind = argc) { +if (optind != argc - 1) { help(); } filename = argv[optind++]; @@ -684,7 +687,7 @@ static int img_commit(int argc, char **argv) break; } } -if (optind = argc) { +if (optind != argc - 1) { help(); } filename = argv[optind++]; @@ -930,7 +933,7 @@ static int img_compare(int argc, char **argv) } -if (optind argc - 2) { +if (optind != argc - 2) { help(); } filename1 = argv[optind++]; @@ -1741,7 +1744,7 @@ static int img_info(int argc, char **argv) break; } } -if (optind = argc) { +if (optind != argc - 1) { help(); } filename = argv[optind++]; @@ -1842,7 +1845,7 @@ static int img_snapshot(int argc, char **argv) } } -if (optind = argc) { +if (optind != argc - 1) { help(); } filename = argv[optind++]; @@ -1953,7 +1956,7 @@ static int img_rebase(int argc, char **argv) progress = 0; } -if ((optind = argc) || (!unsafe !out_baseimg)) { +if ((optind != argc - 1) || (!unsafe !out_baseimg)) { help(); } filename = argv[optind++]; @@ -2232,7 +2235,7 @@ static int img_resize(int argc, char **argv) break; } } -if (optind = argc) { +if (optind != argc - 1) { help(); } filename = argv[optind++]; -- 1.8.1.4
Re: [Qemu-devel] [PATCH] savevm: set right return value for qemu_file_rate_limit
PING? any comments? On 07/24/2013 11:48 PM, Lei Li wrote: In the logic of ram_save_iterate(), it checks the ret of qemu_file_rate_limit(f) by ret 0 if there has been an error. But now it will never return negative value because qemu_file_rate_limit() return 1 if qemu_file_get_error(). Also the original implementation of qemu_file_rate_limit() as function buffered_rate_limit() did return negative for this situation. On 07/21/2013 08:56 PM, Lei Li wrote: Commit 1964a397063967acc5ce71a2a24ed26e74824ee1 refactors rate limiting to QEMUFile, but set the return value for qemu_file_rate_limit to 1 in the case of qemu_file_get_error. It is wrong and should be negative compared to the original function buffered_rate_limit and the current logic in ram_save_iterate. As qemu_file_rate_limit is called manually to determine if it has to exit, add the defination of the meaning of the return values as well. Signed-off-by: Lei Li li...@linux.vnet.ibm.com --- savevm.c | 14 -- 1 files changed, 12 insertions(+), 2 deletions(-) diff --git a/savevm.c b/savevm.c index e0491e7..f406790 100644 --- a/savevm.c +++ b/savevm.c @@ -904,10 +904,20 @@ int64_t qemu_ftell(QEMUFile *f) return f-pos; } +/* + * The meaning of the return values is: + * 0: We can continue sending + * 1: Time to stop + * negative: There has been an error + */ + int qemu_file_rate_limit(QEMUFile *f) { -if (qemu_file_get_error(f)) { -return 1; +int ret; + +ret = qemu_file_get_error(f); +if (ret) { +return ret; } if (f-xfer_limit 0 f-bytes_xfer f-xfer_limit) { return 1; -- Lei
Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
hi all, I met similar problem to these, while performing live migration or save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, guest:suse11sp2), running tele-communication software suite in guest, https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506 http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592 https://bugzilla.kernel.org/show_bug.cgi?id=58771 After live migration or virsh restore [savefile], one process's CPU utilization went up by about 30%, resulted in throughput degradation of this process. If EPT disabled, this problem gone. I suspect that kvm hypervisor has business with this problem. Based on above suspect, I want to find the two adjacent versions of kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1), and analyze the differences between this two versions, or apply the patches between this two versions by bisection method, finally find the key patches. Any better ideas? Thanks, Zhang Haoyu I've attempted to duplicate this on a number of machines that are as similar to yours as I am able to get my hands on, and so far have not been able to see any performance degradation. And from what I've read in the above links, huge pages do not seem to be part of the problem. So, if you are in a position to bisect the kernel changes, that would probably be the best avenue to pursue in my opinion. Bruce I found the first bad commit([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: propagate fault r/w information to gup(), allow read-only memory) which triggers this problem by git bisecting the kvm kernel (download from https://git.kernel.org/pub/scm/virt/kvm/kvm.git) changes. And, git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log git diff 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc4 02f13b1b63f7e4 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff Then, I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log and 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff, came to a conclusion that all of the differences between 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 are contributed by no other than 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, so this commit is the peace-breaker which directly or indirectly causes the degradation. Does the map_writable flag passed to mmu_set_spte() function have effect on PTE's PAT flag or increase the VMEXITs induced by that guest tried to write read-only memory? Thanks, Zhang Haoyu There should be no read-only memory maps backing guest RAM. Can you confirm map_writable = false is being passed to __direct_map? (this should not happen, for guest RAM). And if it is false, please capture the associated GFN. I added below check and printk at the start of __direct_map() at the fist bad commit version, --- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c 2013-07-26 18:44:05.0 +0800 +++ kvm-612819/arch/x86/kvm/mmu.c 2013-07-31 00:05:48.0 +0800 @@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu int pt_write = 0; gfn_t pseudo_gfn; +if (!map_writable) +printk(KERN_ERR %s: %s: gfn = %llu \n, __FILE__, __func__, gfn); + for_each_shadow_entry(vcpu, (u64)gfn PAGE_SHIFT, iterator) { if (iterator.level == level) { unsigned pte_access = ACC_ALL; I virsh-save the VM, and then virsh-restore it, so many GFNs were printed, you can absolutely describe it as flooding. The flooding you see happens during migrate to file stage because of dirty page tracking. If you clear dmesg after virsh-save you should not see any flooding after virsh-restore. I just checked with latest tree, I do not. I made a verification again. I virsh-save the VM, during the saving stage, I run 'dmesg', no GFN printed, maybe the switching from running stage to pause stage takes so short time, no guest-write happens during this switching period. After the completion of saving operation, I run 'demsg -c' to clear the buffer all the same, then I virsh-restore the VM, so many GFNs are printed by running 'dmesg', and I also run 'tail -f /var/log/messages' during the restoring stage, so many GFNs are flooded dynamically too. I'm sure that the flooding happens during the virsh-restore stage, not the migration stage. Interesting, is this with upstream kernel? For me the situation is exactly the opposite. What is your command line? I made the verification on the first bad commit
Re: [Qemu-devel] [PATCH] target-ppc: Add POWER7+ CPU model
On Mon, 2013-08-05 at 10:24 +0200, Andreas Färber wrote: Ben has pointed out that POWER5++ PVR-wise was a POWER5+ v3.0+ or so and, according to you, POWER5++ is not just a revision with hardware-only changes and a new marketing name. So no, it's not always just the upper 16 bits. Also 4xx uses a different bit split etc... that's why qemu should do like the kernel and use a mask/value pair to do the matching. Cheers, Ben.
Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
Hi, Am 05.08.2013 11:09, schrieb Zhanghaoyu (A): When I build the upstream, encounter a problem that I compile and install the upstream(commit: e769ece3b129698d2b09811a6f6d304e4eaa8c29) on sles11sp2 environment via below command cp /boot/config-3.0.13-0.27-default ./.config yes | make oldconfig make make modules_install make install then, I reboot the host, and select the upstream kernel, but during the starting stage, below problem happened, Could not find /dev/disk/by-id/scsi-3600508e0864407c5b8f7ad01-part3 I'm trying to resolve it. Possibly you need to enable loading unsupported kernel modules? At least that's needed when testing a kmod with a SUSE kernel. Regards, Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
[Qemu-devel] [PATCH] pci: fix i82801b11 bridge
Signed-off-by: Gerd Hoffmann kra...@redhat.com --- hw/pci-bridge/i82801b11.c |1 + 1 file changed, 1 insertion(+) diff --git a/hw/pci-bridge/i82801b11.c b/hw/pci-bridge/i82801b11.c index 8a5e426..14cd7fd 100644 --- a/hw/pci-bridge/i82801b11.c +++ b/hw/pci-bridge/i82801b11.c @@ -90,6 +90,7 @@ static void i82801b11_bridge_class_init(ObjectClass *klass, void *data) k-device_id = PCI_DEVICE_ID_INTEL_82801BA_11; k-revision = ICH9_D2P_A2_REVISION; k-init = i82801b11_bridge_initfn; +k-config_write = pci_bridge_write_config; set_bit(DEVICE_CATEGORY_BRIDGE, dc-categories); } -- 1.7.9.7
Re: [Qemu-devel] [Patch] ARM: Add an L2 cache controller to KZM
On 5 August 2013 02:21, Peter Chubb peter.ch...@nicta.com.au wrote: Reads to unassigned memory now return non-zero (since patch 9b8c69243585). This breaks guests runnong on i.MX31 that use the cache controller --- they poll forever waiting for the L2CC cache invalidate regsiter to be zero. That commit claims it was just restoring the previous behaviour, so it shouldn't break guests, surely? -- PMM
Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
Hi, Am 05.08.2013 11:09, schrieb Zhanghaoyu (A): When I build the upstream, encounter a problem that I compile and install the upstream(commit: e769ece3b129698d2b09811a6f6d304e4eaa8c29) on sles11sp2 environment via below command cp /boot/config-3.0.13-0.27-default ./.config yes | make oldconfig make make modules_install make install then, I reboot the host, and select the upstream kernel, but during the starting stage, below problem happened, Could not find /dev/disk/by-id/scsi-3600508e0864407c5b8f7ad01-part3 I'm trying to resolve it. Possibly you need to enable loading unsupported kernel modules? At least that's needed when testing a kmod with a SUSE kernel. I have tried to set allow_unsupported_modules 1 in /etc/modprobe.d/unsupported-modules, but the problem still happened. I replace the whole kernel with the kvm kernel, not only the kvm modules. Regards, Andreas
Re: [Qemu-devel] [Patch] ARM: Add an L2 cache controller to KZM
Am 05.08.2013 11:18, schrieb Peter Maydell: On 5 August 2013 02:21, Peter Chubb peter.ch...@nicta.com.au wrote: Reads to unassigned memory now return non-zero (since patch 9b8c69243585). This breaks guests runnong on i.MX31 that use the cache controller --- they poll forever waiting for the L2CC cache invalidate regsiter to be zero. That commit claims it was just restoring the previous behaviour, so it shouldn't break guests, surely? See Jan's patches on the list. PReP was reported affected, too. Peter Ch., if you know the exact differences, why don't you just derive an imx-l2cc type (or so) derived from ARM's type, overriding the values mentioned above? Sounds trivial to me. Regards, Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH v6] e1000: add interrupt mitigation support
On Fri, Aug 02, 2013 at 06:30:52PM +0200, Vincenzo Maffione wrote: This patch partially implements the e1000 interrupt mitigation mechanisms. Using a single QEMUTimer, it emulates the ITR register (which is the newer mitigation register, recommended by Intel) and approximately emulates RADV and TADV registers. TIDV and RDTR register functionalities are not emulated (RDTR is only used to validate RADV, according to the e1000 specs). RADV, TADV, TIDV and RDTR registers make up the older e1000 mitigation mechanism and would need a timer each to be completely emulated. However, a single timer has been used in order to reach a good compromise between emulation accuracy and simplicity/efficiency. The implemented mechanism can be enabled/disabled specifying the command line e1000-specific boolean parameter mitigation, e.g. qemu-system-x86_64 -device e1000,mitigation=on,... ... For more information, see the Software developer's manual at http://download.intel.com/design/network/manuals/8254x_GBe_SDM.pdf. Interrupt mitigation boosts performance when the guest suffers from an high interrupt rate (i.e. receiving short UDP packets at high packet rate). For some numerical results see the following link http://info.iet.unipi.it/~luigi/papers/20130520-rizzo-vm.pdf Signed-off-by: Vincenzo Maffione v.maffi...@gmail.com --- Added pc-*-1.7 machines (default machine moved to pc-i440fx-1.7). hw/i386/pc_piix.c| 18 ++- hw/i386/pc_q35.c | 16 ++- hw/net/e1000.c | 131 +-- include/hw/i386/pc.h | 8 4 files changed, 167 insertions(+), 6 deletions(-) Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
Re: [Qemu-devel] [PATCH v6] e1000: add interrupt mitigation support
On Fri, Aug 02, 2013 at 06:30:52PM +0200, Vincenzo Maffione wrote: This patch partially implements the e1000 interrupt mitigation mechanisms. Using a single QEMUTimer, it emulates the ITR register (which is the newer mitigation register, recommended by Intel) and approximately emulates RADV and TADV registers. TIDV and RDTR register functionalities are not emulated (RDTR is only used to validate RADV, according to the e1000 specs). RADV, TADV, TIDV and RDTR registers make up the older e1000 mitigation mechanism and would need a timer each to be completely emulated. However, a single timer has been used in order to reach a good compromise between emulation accuracy and simplicity/efficiency. The implemented mechanism can be enabled/disabled specifying the command line e1000-specific boolean parameter mitigation, e.g. qemu-system-x86_64 -device e1000,mitigation=on,... ... For more information, see the Software developer's manual at http://download.intel.com/design/network/manuals/8254x_GBe_SDM.pdf. Interrupt mitigation boosts performance when the guest suffers from an high interrupt rate (i.e. receiving short UDP packets at high packet rate). For some numerical results see the following link http://info.iet.unipi.it/~luigi/papers/20130520-rizzo-vm.pdf Signed-off-by: Vincenzo Maffione v.maffi...@gmail.com --- Added pc-*-1.7 machines (default machine moved to pc-i440fx-1.7). hw/i386/pc_piix.c| 18 ++- hw/i386/pc_q35.c | 16 ++- hw/net/e1000.c | 131 +-- include/hw/i386/pc.h | 8 4 files changed, 167 insertions(+), 6 deletions(-) I will merge this into my net tree soon but I'd like to wait for Andreas' review to make sure he's comfortable with the machine type changes too. Stefan
Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses
Am 03.08.2013 10:31, schrieb Jan Kiszka: From: Jan Kiszka jan.kis...@siemens.com Accesses to unassigned io ports shall return -1 on read and be ignored on write. Ensure these properties via dedicated ops, decoupling us from the memory core's handling of unassigned accesses. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- exec.c|3 ++- include/exec/ioport.h |2 ++ ioport.c | 16 3 files changed, 20 insertions(+), 1 deletions(-) diff --git a/exec.c b/exec.c index 3ca9381..9ed598f 100644 --- a/exec.c +++ b/exec.c @@ -1820,7 +1820,8 @@ static void memory_map_init(void) address_space_init(address_space_memory, system_memory, memory); system_io = g_malloc(sizeof(*system_io)); -memory_region_init(system_io, NULL, io, 65536); +memory_region_init_io(system_io, NULL, unassigned_io_ops, NULL, io, + 65536); It was reported that there may be some machines/PHBs that have overlapping PIO/MMIO. Unless we use priorities, this ..._io MemoryRegion will shadow or conflict with any ..._io MemoryRegion added to the memory address space, wouldn't it? Regards, Andreas address_space_init(address_space_io, system_io, I/O); memory_listener_register(core_memory_listener, address_space_memory); diff --git a/include/exec/ioport.h b/include/exec/ioport.h index bdd4e96..84f7f85 100644 --- a/include/exec/ioport.h +++ b/include/exec/ioport.h @@ -45,6 +45,8 @@ typedef struct MemoryRegionPortio { #define PORTIO_END_OF_LIST() { } +extern const MemoryRegionOps unassigned_io_ops; + void cpu_outb(pio_addr_t addr, uint8_t val); void cpu_outw(pio_addr_t addr, uint16_t val); void cpu_outl(pio_addr_t addr, uint32_t val); diff --git a/ioport.c b/ioport.c index 79b7f1a..9765588 100644 --- a/ioport.c +++ b/ioport.c @@ -44,6 +44,22 @@ typedef struct MemoryRegionPortioList { MemoryRegionPortio ports[]; } MemoryRegionPortioList; +static uint64_t unassigned_io_read(void *opaque, hwaddr addr, unsigned size) +{ +return -1UL; +} + +static void unassigned_io_write(void *opaque, hwaddr addr, uint64_t val, +unsigned size) +{ +} + +const MemoryRegionOps unassigned_io_ops = { +.read = unassigned_io_read, +.write = unassigned_io_write, +.endianness = DEVICE_NATIVE_ENDIAN, +}; + void cpu_outb(pio_addr_t addr, uint8_t val) { LOG_IOPORT(outb: %04FMT_pioaddr %02PRIx8\n, addr, val); -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
On Mon, Aug 05, 2013 at 09:09:56AM +, Zhanghaoyu (A) wrote: The QEMU command line (/var/log/libvirt/qemu/[domain name].log), LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ QEMU_AUDIO_DRV=none /usr/local/bin/qemu-system-x86_64 -name ATS1 -S -M pc-0.12 -cpu qemu32 -enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 -uuid 0505ec91-382d-800e-2c79-e5b286eb60b5 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/ATS1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/opt/ne/vm/ATS1.img,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bus=pci.0,addr=0x3,bootindex=2 -netdev tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:00,bus=pci.0,addr=0x4 -netdev tap,fd=24,id=hostnet2,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:00,bus=pci.0,addr=0x5 -netdev tap,fd=26,id=hostnet3,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:00,bus=pci.0,addr=0x6 -netdev tap,fd=28,id=hostnet4,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:00,bus=pci.0,addr=0x7 -netdev tap,fd=30,id=hostnet5,vhost=on,vhostfd=31 -device virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:00,bus=pci.0,addr=0x9 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc *:0 -k en-us -vga cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb -watchdog-action poweroff -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa Which QEMU version is this? Can you try with e1000 NICs instead of virtio? -- Gleb.
Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses
On 2013-08-05 11:34, Andreas Färber wrote: Am 03.08.2013 10:31, schrieb Jan Kiszka: From: Jan Kiszka jan.kis...@siemens.com Accesses to unassigned io ports shall return -1 on read and be ignored on write. Ensure these properties via dedicated ops, decoupling us from the memory core's handling of unassigned accesses. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- exec.c|3 ++- include/exec/ioport.h |2 ++ ioport.c | 16 3 files changed, 20 insertions(+), 1 deletions(-) diff --git a/exec.c b/exec.c index 3ca9381..9ed598f 100644 --- a/exec.c +++ b/exec.c @@ -1820,7 +1820,8 @@ static void memory_map_init(void) address_space_init(address_space_memory, system_memory, memory); system_io = g_malloc(sizeof(*system_io)); -memory_region_init(system_io, NULL, io, 65536); +memory_region_init_io(system_io, NULL, unassigned_io_ops, NULL, io, + 65536); It was reported that there may be some machines/PHBs that have overlapping PIO/MMIO. Unless we use priorities, this ..._io MemoryRegion will shadow or conflict with any ..._io MemoryRegion added to the memory address space, wouldn't it? I cannot follow yet. This is the base mem-region for the PIO space. If some region from the MMIO space forwards request here and there is no backing device, this base region will handle it. So, even on arch with MMIO-based PIO, both address space should remain separate. Or am I missing something? Jan signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH v6] e1000: add interrupt mitigation support
Am 05.08.2013 11:27, schrieb Stefan Hajnoczi: On Fri, Aug 02, 2013 at 06:30:52PM +0200, Vincenzo Maffione wrote: This patch partially implements the e1000 interrupt mitigation mechanisms. Using a single QEMUTimer, it emulates the ITR register (which is the newer mitigation register, recommended by Intel) and approximately emulates RADV and TADV registers. TIDV and RDTR register functionalities are not emulated (RDTR is only used to validate RADV, according to the e1000 specs). RADV, TADV, TIDV and RDTR registers make up the older e1000 mitigation mechanism and would need a timer each to be completely emulated. However, a single timer has been used in order to reach a good compromise between emulation accuracy and simplicity/efficiency. The implemented mechanism can be enabled/disabled specifying the command line e1000-specific boolean parameter mitigation, e.g. qemu-system-x86_64 -device e1000,mitigation=on,... ... For more information, see the Software developer's manual at http://download.intel.com/design/network/manuals/8254x_GBe_SDM.pdf. Interrupt mitigation boosts performance when the guest suffers from an high interrupt rate (i.e. receiving short UDP packets at high packet rate). For some numerical results see the following link http://info.iet.unipi.it/~luigi/papers/20130520-rizzo-vm.pdf Signed-off-by: Vincenzo Maffione v.maffi...@gmail.com --- Added pc-*-1.7 machines (default machine moved to pc-i440fx-1.7). hw/i386/pc_piix.c| 18 ++- hw/i386/pc_q35.c | 16 ++- hw/net/e1000.c | 131 +-- include/hw/i386/pc.h | 8 4 files changed, 167 insertions(+), 6 deletions(-) I will merge this into my net tree soon but I'd like to wait for Andreas' review to make sure he's comfortable with the machine type changes too. I'm okay with the machine changes, just not comfortable reviewing the net code. Feel free to add a Reviewed-by: Andreas Färber afaer...@suse.de (for pc-* machines) if you wish. Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH] pci: fix i82801b11 bridge
Am 05.08.2013 10:50, schrieb Gerd Hoffmann: Signed-off-by: Gerd Hoffmann kra...@redhat.com --- hw/pci-bridge/i82801b11.c |1 + 1 file changed, 1 insertion(+) diff --git a/hw/pci-bridge/i82801b11.c b/hw/pci-bridge/i82801b11.c index 8a5e426..14cd7fd 100644 --- a/hw/pci-bridge/i82801b11.c +++ b/hw/pci-bridge/i82801b11.c @@ -90,6 +90,7 @@ static void i82801b11_bridge_class_init(ObjectClass *klass, void *data) k-device_id = PCI_DEVICE_ID_INTEL_82801BA_11; k-revision = ICH9_D2P_A2_REVISION; k-init = i82801b11_bridge_initfn; +k-config_write = pci_bridge_write_config; set_bit(DEVICE_CATEGORY_BRIDGE, dc-categories); } We have an abstract TYPE_PCI_BRIDGE now, suggest to set it in its class_init instead if that's a generic bridge function. Probably should be for-1.6? Cheers, Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses
On 5 August 2013 10:34, Andreas Färber afaer...@suse.de wrote: Am 03.08.2013 10:31, schrieb Jan Kiszka: From: Jan Kiszka jan.kis...@siemens.com Accesses to unassigned io ports shall return -1 on read and be ignored on write. Ensure these properties via dedicated ops, decoupling us from the memory core's handling of unassigned accesses. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- exec.c|3 ++- include/exec/ioport.h |2 ++ ioport.c | 16 3 files changed, 20 insertions(+), 1 deletions(-) diff --git a/exec.c b/exec.c index 3ca9381..9ed598f 100644 --- a/exec.c +++ b/exec.c @@ -1820,7 +1820,8 @@ static void memory_map_init(void) address_space_init(address_space_memory, system_memory, memory); system_io = g_malloc(sizeof(*system_io)); -memory_region_init(system_io, NULL, io, 65536); +memory_region_init_io(system_io, NULL, unassigned_io_ops, NULL, io, + 65536); It was reported that there may be some machines/PHBs that have overlapping PIO/MMIO. Unless we use priorities, this ..._io MemoryRegion will shadow or conflict with any ..._io MemoryRegion added to the memory address space, wouldn't it? Priorities only apply between different subregions within a container. This is adding IO operations to the container itself, so there's no priority issue here: the container's operations always come last, behind any subregions it has. (Do we have any existing examples of container regions with their own default IO operations? The memory.c code clearly expects them to be OK, though - eg render_memory_region() specifically does render subregions; then render the region itself into any gaps.) Or do you mean that if we had: [ system memory region, with its own default read/write ops ] [ io region mapped into it ] [ io ] [ io ][io] that now if you access the bit of system memory corresponding to the I/O region at some address with no specific IO port, you'll get the IO region's defaults, rather than the system memory region's defaults? I think that's true and possibly a change in behaviour. Do we have any boards that do that? -- PMM
Re: [Qemu-devel] [PATCH 0/4]: timers thread-safe stuff
Pingfan, --On 5 August 2013 15:33:22 +0800 Liu Ping Fan pingf...@linux.vnet.ibm.com wrote: The patches has been rebased onto Alex's [RFC] [PATCHv5 00/16] aio / timers: Add AioContext timers and use ppoll permalink.gmane.org/gmane.comp.emulators.qemu/226333 For some other complied error issue, I can not finish compiling, will fix it later. Changes since last version: 1. drop the overlap partition and leave only thread-safe stuff 2. For timers_state, since currently, only vm_clock can be read outside BQL. There is no protection with ticks(since the protection will cost more in read_tsc path). 3. use light weight QemuEvent to re-implement the qemu_clock_enable(foo,false) I think you may need to protect a little more. For instance, do we need to take a lock whilst traversing a QEMUTimerList? I hope the answer to this is no. The design idea of my stuff was that only one thread would be manipulating or traversing this list. As the notify function is a property of the QEMUTimerList itself, no traversal is necessary for that. However, the icount stuff (per my patch 15) does look at the deadlines for the vm_clock QEMUTimerLists (which is the first entry). Is that always going to be thread safe? Before the icount stuff, I was pretty certain this did not need a lock, but perhaps it does now. If the soonest deadline on any QEMUTimerList was stored in a 64 bit atomic variable, this might remove the need for a lock; it's possible that putting some memory barrier operations in might be enough. Also, we maintain a per-clock list of QEMUTimerLists. This list is traversed by the icount stuff, by things adding and removing timers (e.g. creation/deletion of AioContext) and whenever a QEMUClock is reenabled. I think this DOES need a lock. Aside from the icount stuff, usage is very infrequent. It's far more frequent with the icount stuff, so it should probably be optimised for that case. -- Alex Bligh
Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses
Am 05.08.2013 11:59, schrieb Peter Maydell: On 5 August 2013 10:34, Andreas Färber afaer...@suse.de wrote: Am 03.08.2013 10:31, schrieb Jan Kiszka: From: Jan Kiszka jan.kis...@siemens.com Accesses to unassigned io ports shall return -1 on read and be ignored on write. Ensure these properties via dedicated ops, decoupling us from the memory core's handling of unassigned accesses. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- exec.c|3 ++- include/exec/ioport.h |2 ++ ioport.c | 16 3 files changed, 20 insertions(+), 1 deletions(-) diff --git a/exec.c b/exec.c index 3ca9381..9ed598f 100644 --- a/exec.c +++ b/exec.c @@ -1820,7 +1820,8 @@ static void memory_map_init(void) address_space_init(address_space_memory, system_memory, memory); system_io = g_malloc(sizeof(*system_io)); -memory_region_init(system_io, NULL, io, 65536); +memory_region_init_io(system_io, NULL, unassigned_io_ops, NULL, io, + 65536); It was reported that there may be some machines/PHBs that have overlapping PIO/MMIO. Unless we use priorities, this ..._io MemoryRegion will shadow or conflict with any ..._io MemoryRegion added to the memory address space, wouldn't it? Priorities only apply between different subregions within a container. This is adding IO operations to the container itself, so there's no priority issue here: the container's operations always come last, behind any subregions it has. (Do we have any existing examples of container regions with their own default IO operations? The memory.c code clearly expects them to be OK, though - eg render_memory_region() specifically does render subregions; then render the region itself into any gaps.) Or do you mean that if we had: [ system memory region, with its own default read/write ops ] [ io region mapped into it ] [ io ] [ io ][io] that now if you access the bit of system memory corresponding to the I/O region at some address with no specific IO port, you'll get the IO region's defaults, rather than the system memory region's defaults? I think that's true and possibly a change in behaviour. Yes, that's what I thought someone brought up in one of those lengthy memory discussions: Accesses between the first two [io]s would now seem to go to [io region] rather than into [system memory region] subregions at the same level as [io region]. Do we have any boards that do that? Sorry, I don't remember who brought it up, hopefully Paolo remembers? Possibly sPAPR? Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses
On 2013-08-05 11:59, Peter Maydell wrote: On 5 August 2013 10:34, Andreas Färber afaer...@suse.de wrote: Am 03.08.2013 10:31, schrieb Jan Kiszka: From: Jan Kiszka jan.kis...@siemens.com Accesses to unassigned io ports shall return -1 on read and be ignored on write. Ensure these properties via dedicated ops, decoupling us from the memory core's handling of unassigned accesses. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- exec.c|3 ++- include/exec/ioport.h |2 ++ ioport.c | 16 3 files changed, 20 insertions(+), 1 deletions(-) diff --git a/exec.c b/exec.c index 3ca9381..9ed598f 100644 --- a/exec.c +++ b/exec.c @@ -1820,7 +1820,8 @@ static void memory_map_init(void) address_space_init(address_space_memory, system_memory, memory); system_io = g_malloc(sizeof(*system_io)); -memory_region_init(system_io, NULL, io, 65536); +memory_region_init_io(system_io, NULL, unassigned_io_ops, NULL, io, + 65536); It was reported that there may be some machines/PHBs that have overlapping PIO/MMIO. Unless we use priorities, this ..._io MemoryRegion will shadow or conflict with any ..._io MemoryRegion added to the memory address space, wouldn't it? Priorities only apply between different subregions within a container. This is adding IO operations to the container itself, so there's no priority issue here: the container's operations always come last, behind any subregions it has. (Do we have any existing examples of container regions with their own default IO operations? The memory.c code clearly expects them to be OK, though - eg render_memory_region() specifically does render subregions; then render the region itself into any gaps.) Or do you mean that if we had: [ system memory region, with its own default read/write ops ] I cannot imagine how this could work. The system memory region has no clue about what the regions below it can handle and what not. So it has to pass through the io window. Jan [ io region mapped into it ] [ io ] [ io ][io] that now if you access the bit of system memory corresponding to the I/O region at some address with no specific IO port, you'll get the IO region's defaults, rather than the system memory region's defaults? I think that's true and possibly a change in behaviour. Do we have any boards that do that? -- PMM signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH] pci: fix i82801b11 bridge
On 08/05/13 11:45, Andreas Färber wrote: Am 05.08.2013 10:50, schrieb Gerd Hoffmann: Signed-off-by: Gerd Hoffmann kra...@redhat.com --- hw/pci-bridge/i82801b11.c |1 + 1 file changed, 1 insertion(+) diff --git a/hw/pci-bridge/i82801b11.c b/hw/pci-bridge/i82801b11.c index 8a5e426..14cd7fd 100644 --- a/hw/pci-bridge/i82801b11.c +++ b/hw/pci-bridge/i82801b11.c @@ -90,6 +90,7 @@ static void i82801b11_bridge_class_init(ObjectClass *klass, void *data) k-device_id = PCI_DEVICE_ID_INTEL_82801BA_11; k-revision = ICH9_D2P_A2_REVISION; k-init = i82801b11_bridge_initfn; +k-config_write = pci_bridge_write_config; set_bit(DEVICE_CATEGORY_BRIDGE, dc-categories); } We have an abstract TYPE_PCI_BRIDGE now, suggest to set it in its class_init instead if that's a generic bridge function. Well, depends on the bridge device. When they have stuff beside core pci bridge functionality (such as shpc hotplug support like the 'pci-bridge' device) they need their private function. But those can still override config_write in the device init I guess. /me leaves that to mst to decide. Beside that it should most likely go into a separate cleanup patch as there are other devices which hook pci_bridge_write_config the same way. Probably should be for-1.6? Yes. cheers, Gerd
Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses
On 5 August 2013 11:30, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-05 11:59, Peter Maydell wrote: Or do you mean that if we had: [ system memory region, with its own default read/write ops ] I cannot imagine how this could work. The system memory region has no clue about what the regions below it can handle and what not. So it has to pass through the io window. The system memory region's just a container, you can add a background region to it at lowest-possible-priority, which then takes accesses which are either (a) not in any subregion or (b) in a subregion but that container doesn't specify its own io ops and nothing in that container handles the access. (Or you could create the system memory region with its own IO ops, which would have the same effect.) This works because the memory subsystem flattens the whole tree, and at flattening time render_memory_region() effectively knows if there are gaps in the subcontainer or not. At runtime we never pass through anything (except for the special case of an IOMMU) -- the tree of regions has been preflattened and the target of the memory access is known in advance. -- PMM
Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses
On 2013-08-05 12:36, Peter Maydell wrote: On 5 August 2013 11:30, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-05 11:59, Peter Maydell wrote: Or do you mean that if we had: [ system memory region, with its own default read/write ops ] I cannot imagine how this could work. The system memory region has no clue about what the regions below it can handle and what not. So it has to pass through the io window. The system memory region's just a container, you can add a background region to it at lowest-possible-priority, which then takes accesses which are either (a) not in any subregion or (b) in a subregion but that container doesn't specify its own io ops and nothing in that container handles the access. (Or you could create the system memory region with its own IO ops, which would have the same effect.) First, we do not render MMIO and IO within the same address space so far. But even if we would, the IO container now catches all accesses, so the system memory region will never have its default handler run for that window. Jan signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH] target-mips: fix decoding of microMIPS POOL32Axf instructions
On Mon, Aug 05, 2013 at 08:41:52AM +0100, Leon Alrae wrote: On 03/08/13 23:01, Aurelien Jarno wrote: On Thu, Aug 01, 2013 at 11:02:27AM +0100, Leon Alrae wrote: These are not DSP instructions, thus there is no ac field. For more details please refer to instruction encoding of MULT, MULTU, MADD, MADDU, MSUB, MSUBU, MFHI, MFLO, MTHI, MTLO in MIPS Architecture for Programmers Volume II-B: The microMIPS32 Instruction Set These instructions have both a non DSP version without the ac field, and a DSP version with the ac field. The encoding of the ac field is done in bits 14 and 15, so the current QEMU code is correct. Please refer to the MIPS32® Architecture Manual Volume IV-e: The MIPS® DSP Application-Specific Extension to the microMIPS32® Architecture (document MD00764). Please note that the patch modifies non-DSP version of listed instructions, where ac field doesn't exist. In this case reading bits This is correct, except for MFHI, MFLO, MTHI, MTLO which share the same opcode for the DSP and non-DSP version. Theses instructions have bits 14 and 15 set to 0 for the non-DSP version, so they are working as expected and thus your patch should not modify them. 14 and 15 and passing it as acc argument to gen_muldiv() or gen_HILO() is not correct. If those bits are not 0 (and for most of the listed instructions this is true) then check_dsp() is called which will cause an exception if DSP is not supported / disabled. This is correct. The current code assumes that all instructions using gen_muldiv() have the same opcode for non-DSP and DSP version (actually it's weird that they have a different encoding, it's just wasting some opcode space). Therefore what should be done: - The part concerning MFHI, MFLO, MTHI, MTLO should be dropped from your patch as it already works correctly. - The part of your patch concerning instructions calling gen_muldiv() is correct and should be kept to handle the non-DSP version of these instructions. - An entry with for the DSP version of these instructions should be created, calling gen_muldiv() passing the ac field from bits 14 and 15. This is basically the same code than now, just with extension 0x32 instead of 0x2c. -- Aurelien Jarno GPG: 1024D/F1BCDB73 aurel...@aurel32.net http://www.aurel32.net
Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses
On 5 August 2013 11:44, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-05 12:36, Peter Maydell wrote: On 5 August 2013 11:30, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-05 11:59, Peter Maydell wrote: Or do you mean that if we had: [ system memory region, with its own default read/write ops ] I cannot imagine how this could work. The system memory region has no clue about what the regions below it can handle and what not. So it has to pass through the io window. The system memory region's just a container, you can add a background region to it at lowest-possible-priority, which then takes accesses which are either (a) not in any subregion or (b) in a subregion but that container doesn't specify its own io ops and nothing in that container handles the access. (Or you could create the system memory region with its own IO ops, which would have the same effect.) First, we do not render MMIO and IO within the same address space so far. Is this a statement made because you've checked all the boards and know that nobody's mapping the system-io memory region into the system-memory region? (It is technically trivial, you just need to call memory_region_add_subregion() directly or indirectly...) But even if we would, the IO container now catches all accesses, so the system memory region will never have its default handler run for that window. Yes, that is the point -- before the IO container caught all accesses, the default handler that would run would be the system memory region, but now that the IO container has default read/write ops they will take precedence. So this would be a behaviour change if there are any boards set up like this. (I'm not sure there are any, though.) -- PMM
Re: [Qemu-devel] [PATCH 4/4] timer: make qemu_clock_enable sync between disable and timer's cb
On Aug 05 2013, Liu Ping Fan wrote: After disabling the QemuClock, we should make sure that no QemuTimers are still in flight. To implement that with light overhead, we resort to QemuEvent. The caller of disabling will wait on QemuEvent of each timerlist. Note, qemu_clock_enable(foo,false) can _not_ be called from timer's cb. And the callers of qemu_clock_enable() should be sync by themselves, not protected by this patch. Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com --- include/qemu/timer.h | 1 + qemu-timer.c | 11 +++ 2 files changed, 12 insertions(+) diff --git a/include/qemu/timer.h b/include/qemu/timer.h index 1363316..ca09ba2 100644 --- a/include/qemu/timer.h +++ b/include/qemu/timer.h @@ -85,6 +85,7 @@ int64_t timerlistgroup_deadline_ns(QEMUTimerListGroup tlg); int qemu_timeout_ns_to_ms(int64_t ns); int qemu_poll_ns(GPollFD *fds, uint nfds, int64_t timeout); +/* The disable of clock can not be called in timer's cb */ See below for a more verbose version of the comment. For now leave it only in the .c file, we should add comments to all of timer.h. void qemu_clock_enable(QEMUClock *clock, bool enabled); void qemu_clock_warp(QEMUClock *clock); diff --git a/qemu-timer.c b/qemu-timer.c index ebe7597..5828107 100644 --- a/qemu-timer.c +++ b/qemu-timer.c @@ -71,6 +71,8 @@ struct QEMUTimerList { QLIST_ENTRY(QEMUTimerList) list; QEMUTimerListNotifyCB *notify_cb; void *notify_opaque; +/* light weight method to mark the end of timerlist's running */ +QemuEvent ev; }; struct QEMUTimer { @@ -92,6 +94,7 @@ static QEMUTimerList *timerlist_new_from_clock(QEMUClock *clock) QEMUTimerList *tl; tl = g_malloc0(sizeof(QEMUTimerList)); +qemu_event_init(tl-ev, false); The event should start as set, since set means not inside qemu_run_timers. tl-clock = clock; QLIST_INSERT_HEAD(clock-timerlists, tl, list); return tl; @@ -145,12 +148,18 @@ void qemu_clock_notify(QEMUClock *clock) } } +/* The disable of clock can _not_ be called from timer's cb */ /* Disabling the clock will wait for related timerlists to stop * executing qemu_run_timers. Thus, this functions should not * be used from the callback of a timer that is based on @clock. * Doing so would cause a deadlock. */ void qemu_clock_enable(QEMUClock *clock, bool enabled) { +QEMUTimerList *tl; bool old = clock-enabled; clock-enabled = enabled; if (enabled !old) { qemu_clock_notify(clock); +} else if (!enabled old) { +QLIST_FOREACH(tl, clock-timerlists, list) { +qemu_event_wait(tl-ev); +} } } @@ -419,6 +428,7 @@ bool timerlist_run_timers(QEMUTimerList *tl) } current_time = qemu_get_clock_ns(tl-clock); +qemu_event_reset(tl-ev); Race condition here. You need to test clock-enabled while the event is reset. Otherwise you get: - thread 1 is runningthread 2 is running qemu_clock_enable(foo, false) qemu_run_timers(tl); - ** event is initially set ** if (!clock-enabled) return; clock-enabled = false; qemu_event_wait(tl-ev); return; qemu_event_reset(tl-ev); invokes callback qemu_event_set(tl-ev); - violating the invariant that no callbacks are invoked after the return from qemu_clock_enable(foo, false). Paolo for(;;) { ts = tl-active_timers; if (!qemu_timer_expired_ns(ts, current_time)) { @@ -432,6 +442,7 @@ bool timerlist_run_timers(QEMUTimerList *tl) ts-cb(ts-opaque); progress = true; } +qemu_event_set(tl-ev); return progress; } -- 1.8.1.4
Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses
On 2013-08-05 12:51, Peter Maydell wrote: On 5 August 2013 11:44, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-05 12:36, Peter Maydell wrote: On 5 August 2013 11:30, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-05 11:59, Peter Maydell wrote: Or do you mean that if we had: [ system memory region, with its own default read/write ops ] I cannot imagine how this could work. The system memory region has no clue about what the regions below it can handle and what not. So it has to pass through the io window. The system memory region's just a container, you can add a background region to it at lowest-possible-priority, which then takes accesses which are either (a) not in any subregion or (b) in a subregion but that container doesn't specify its own io ops and nothing in that container handles the access. (Or you could create the system memory region with its own IO ops, which would have the same effect.) First, we do not render MMIO and IO within the same address space so far. Is this a statement made because you've checked all the boards and know that nobody's mapping the system-io memory region into the system-memory region? (It is technically trivial, you just need to call memory_region_add_subregion() directly or indirectly...) I know this because I just recently wrote the patch that enables this trivial step, i.e. converted PIO dispatching to the memory subsystem. Jan signature.asc Description: OpenPGP digital signature
[Qemu-devel] [PATCH v2] target-arm: Implement 'int' loglevel
The 'int' loglevel for recording interrupts and exceptions requires support in the target-specific code. Implement it for ARM. This improves debug logging in some situations that were otherwise pretty opaque, such as when we fault trying to execute at an exception vector address, which would otherwise cause an infinite loop of taking exceptions without any indication in the debug log of what was going on. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- Changes v1-v2: added extra 'const' to excnames[] definition as per rth review. target-arm/helper.c | 42 ++ 1 file changed, 42 insertions(+) diff --git a/target-arm/helper.c b/target-arm/helper.c index 4968391..6d9026d 100644 --- a/target-arm/helper.c +++ b/target-arm/helper.c @@ -1974,6 +1974,37 @@ static void do_v7m_exception_exit(CPUARMState *env) pointer. */ } +/* Exception names for debug logging; note that not all of these + * precisely correspond to architectural exceptions. + */ +static const char * const excnames[] = { +[EXCP_UDEF] = Undefined Instruction, +[EXCP_SWI] = SVC, +[EXCP_PREFETCH_ABORT] = Prefetch Abort, +[EXCP_DATA_ABORT] = Data Abort, +[EXCP_IRQ] = IRQ, +[EXCP_FIQ] = FIQ, +[EXCP_BKPT] = Breakpoint, +[EXCP_EXCEPTION_EXIT] = QEMU v7M exception exit, +[EXCP_KERNEL_TRAP] = QEMU intercept of kernel commpage, +[EXCP_STREX] = QEMU intercept of STREX, +}; + +static inline void arm_log_exception(int idx) +{ +if (qemu_loglevel_mask(CPU_LOG_INT)) { +const char *exc = NULL; + +if (idx = 0 idx ARRAY_SIZE(excnames)) { +exc = excnames[idx]; +} +if (!exc) { +exc = unknown; +} +qemu_log_mask(CPU_LOG_INT, Taking exception %d [%s]\n, idx, exc); +} +} + void arm_v7m_cpu_do_interrupt(CPUState *cs) { ARMCPU *cpu = ARM_CPU(cs); @@ -1982,6 +2013,8 @@ void arm_v7m_cpu_do_interrupt(CPUState *cs) uint32_t lr; uint32_t addr; +arm_log_exception(env-exception_index); + lr = 0xfff1; if (env-v7m.current_sp) lr |= 4; @@ -2011,6 +2044,7 @@ void arm_v7m_cpu_do_interrupt(CPUState *cs) if (nr == 0xab) { env-regs[15] += 2; env-regs[0] = do_arm_semihosting(env); +qemu_log_mask(CPU_LOG_INT, ...handled as semihosting call\n); return; } } @@ -2064,6 +2098,8 @@ void arm_cpu_do_interrupt(CPUState *cs) assert(!IS_M(env)); +arm_log_exception(env-exception_index); + /* TODO: Vectored interrupt controller. */ switch (env-exception_index) { case EXCP_UDEF: @@ -2091,6 +2127,7 @@ void arm_cpu_do_interrupt(CPUState *cs) || (mask == 0xab env-thumb)) (env-uncached_cpsr CPSR_M) != ARM_CPU_MODE_USR) { env-regs[0] = do_arm_semihosting(env); +qemu_log_mask(CPU_LOG_INT, ...handled as semihosting call\n); return; } } @@ -2108,18 +2145,23 @@ void arm_cpu_do_interrupt(CPUState *cs) (env-uncached_cpsr CPSR_M) != ARM_CPU_MODE_USR) { env-regs[15] += 2; env-regs[0] = do_arm_semihosting(env); +qemu_log_mask(CPU_LOG_INT, ...handled as semihosting call\n); return; } } env-cp15.c5_insn = 2; /* Fall through to prefetch abort. */ case EXCP_PREFETCH_ABORT: +qemu_log_mask(CPU_LOG_INT, ...with IFSR 0x%x IFAR 0x%x\n, + env-cp15.c5_insn, env-cp15.c6_insn); new_mode = ARM_CPU_MODE_ABT; addr = 0x0c; mask = CPSR_A | CPSR_I; offset = 4; break; case EXCP_DATA_ABORT: +qemu_log_mask(CPU_LOG_INT, ...with DFSR 0x%x DFAR 0x%x\n, + env-cp15.c5_data, env-cp15.c6_data); new_mode = ARM_CPU_MODE_ABT; addr = 0x10; mask = CPSR_A | CPSR_I; -- 1.7.9.5
Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses
On Aug 05 2013, Jan Kiszka wrote: On 2013-08-05 12:36, Peter Maydell wrote: On 5 August 2013 11:30, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-05 11:59, Peter Maydell wrote: Or do you mean that if we had: [ system memory region, with its own default read/write ops ] I cannot imagine how this could work. The system memory region has no clue about what the regions below it can handle and what not. So it has to pass through the io window. The system memory region's just a container, you can add a background region to it at lowest-possible-priority, which then takes accesses which are either (a) not in any subregion or (b) in a subregion but that container doesn't specify its own io ops and nothing in that container handles the access. (Or you could create the system memory region with its own IO ops, which would have the same effect.) First, we do not render MMIO and IO within the same address space so far. This is not true, we do render the I/O address space in system memory on machines that support PCI or ISA but are not x86. We do that through an alias of get_system_io(), but that doesn't change the picture. But even if we would, the IO container now catches all accesses, so the system memory region will never have its default handler run for that window. This however is true, and is the right thing to do. It is also very similar to the workaround that Richard did on Alpha (commit 3661049fec64ffd7ab008e57e396881c6a4b53a4) and it should be possible to revert that one indeed. Paolo
Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses
On Aug 05 2013, Peter Maydell wrote: But even if we would, the IO container now catches all accesses, so the system memory region will never have its default handler run for that window. Yes, that is the point -- before the IO container caught all accesses, the default handler that would run would be the system memory region, but now that the IO container has default read/write ops they will take precedence. So this would be a behaviour change if there are any boards set up like this. (I'm not sure there are any, though.) Before 1.6 this would have happened anyway. Before 1.6, all I/O accesses would go through cpu_in/cpu_out, and would never hit the system memory region. So Jan's patch is effectively restoring old behavior. Paolo
[Qemu-devel] [PATCH v4 0/2] ARM: add 'virt' platform
This patch series adds a 'virt' platform which uses the kernel's mach-virt (fully device-tree driven) support to create a simple minimalist platform intended for use for KVM VM guests. It's based on John Rigby's patches, but I've overhauled it a lot: * renamed user-facing machine to just virt * removed the A9 support (it can't work since the A9 has no generic timers) * added virtio-mmio transports instead of random set of 'soc' devices * instead of updating io_base as we step through adding devices, define a memory map with an array (similar to vexpress) * folded in some minor fixes from John's aarch64-support patch * rather than explicitly doing endian-swapping on FDT cells, use fdt APIs that let us just pass in host-endian values and let the fdt layer take care of the swapping * miscellaneous minor code cleanups and style fixes If you want to test this with TCG QEMU you'll also need the generic-timers implementation patches I posted recently. A branch with generic-timers plus these patches is here: https://git.linaro.org/gitweb?p=people/pmaydell/qemu-arm.git;a=shortlog;h=refs/heads/mach-virt (The kernel in pure mach-virt mode requires generic timers; it can't deal with getting its clock source from an sp804 timer specified by the device tree. This might be fixed in a future kernel, but dropping all the soc-device support from mach-virt makes it simpler anyway.) An obvious thing this machine does not provide is a serial port. I would rather just use virtio-console (and we should implement the 'emergency console/earlyprintk' bit of the virtio spec). John Rigby (2): ARM: Allow boards to provide an fdt blob ARM: Add 'virt' platform hw/arm/Makefile.objs |2 +- hw/arm/boot.c| 32 +++-- hw/arm/virt.c| 363 ++ include/hw/arm/arm.h |7 + 4 files changed, 391 insertions(+), 13 deletions(-) create mode 100644 hw/arm/virt.c -- 1.7.9.5
[Qemu-devel] [PATCH v4 1/2] ARM: Allow boards to provide an fdt blob
From: John Rigby john.ri...@linaro.org If no fdt is provided on command line and the new field get_dtb in struct arm_boot_info is set then call it to get a device tree blob. Signed-off-by: John Rigby john.ri...@linaro.org [PMM: minor tweaks and cleanup] Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- hw/arm/boot.c| 32 include/hw/arm/arm.h |7 +++ 2 files changed, 27 insertions(+), 12 deletions(-) diff --git a/hw/arm/boot.c b/hw/arm/boot.c index 2cbeefd..9d790b7 100644 --- a/hw/arm/boot.c +++ b/hw/arm/boot.c @@ -228,23 +228,31 @@ static void set_kernel_args_old(const struct arm_boot_info *info) static int load_dtb(hwaddr addr, const struct arm_boot_info *binfo) { void *fdt = NULL; -char *filename; int size, rc; uint32_t acells, scells; -filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, binfo-dtb_filename); -if (!filename) { -fprintf(stderr, Couldn't open dtb file %s\n, binfo-dtb_filename); -goto fail; -} +if (binfo-dtb_filename) { +char *filename; +filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, binfo-dtb_filename); +if (!filename) { +fprintf(stderr, Couldn't open dtb file %s\n, binfo-dtb_filename); +goto fail; +} -fdt = load_device_tree(filename, size); -if (!fdt) { -fprintf(stderr, Couldn't open dtb file %s\n, filename); +fdt = load_device_tree(filename, size); +if (!fdt) { +fprintf(stderr, Couldn't open dtb file %s\n, filename); +g_free(filename); +goto fail; +} g_free(filename); -goto fail; +} else if (binfo-get_dtb) { +fdt = binfo-get_dtb(binfo, size); +if (!fdt) { +fprintf(stderr, Board was unable to create a dtb blob\n); +goto fail; +} } -g_free(filename); acells = qemu_devtree_getprop_cell(fdt, /, #address-cells); scells = qemu_devtree_getprop_cell(fdt, /, #size-cells); @@ -436,7 +444,7 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info) /* for device tree boot, we pass the DTB directly in r2. Otherwise * we point to the kernel args. */ -if (info-dtb_filename) { +if (info-dtb_filename || info-get_dtb) { /* Place the DTB after the initrd in memory. Note that some * kernels will trash anything in the 4K page the initrd * ends in, so make sure the DTB isn't caught up in that. diff --git a/include/hw/arm/arm.h b/include/hw/arm/arm.h index bae87c6..4e51a9e 100644 --- a/include/hw/arm/arm.h +++ b/include/hw/arm/arm.h @@ -55,6 +55,13 @@ struct arm_boot_info { const struct arm_boot_info *info); void (*secondary_cpu_reset_hook)(ARMCPU *cpu, const struct arm_boot_info *info); +/* if a board is able to create a dtb without a dtb file then it + * sets get_dtb. This will only be used if no dtb file is provided + * by the user. On success, sets *size to the length of the created + * dtb, and returns a pointer to it. (The caller must free this memory + * with g_free() when it has finished with it.) On failure, returns NULL. + */ +void *(*get_dtb)(const struct arm_boot_info *info, int *size); /* if a board needs to be able to modify a device tree provided by * the user it should implement this hook. */ -- 1.7.9.5
Re: [Qemu-devel] [PATCH v4 2/2] ARM: Add 'virt' platform
On 5 August 2013 12:18, Peter Maydell peter.mayd...@linaro.org wrote: From: John Rigby john.ri...@linaro.org Add 'virt' platform support corresponding to arch/arm/mach-virt in the Linux kernel tree. This has no platform-specific code but can use any device whose kernel driver is is able to work purely from a device tree node. We use this to instantiate a minimal set of devices: a GIC and some virtio-mmio transports. TODO: * MAX_MEM for A15 could be higher if we shuffle things about * RAM or flash at addr 0?? Doh. I addressed these TODO items in the code but forgot to remove them from the commit message. Please ignore for purposes of code review :-) -- PMM
[Qemu-devel] [PATCH v4 2/2] ARM: Add 'virt' platform
From: John Rigby john.ri...@linaro.org Add 'virt' platform support corresponding to arch/arm/mach-virt in the Linux kernel tree. This has no platform-specific code but can use any device whose kernel driver is is able to work purely from a device tree node. We use this to instantiate a minimal set of devices: a GIC and some virtio-mmio transports. TODO: * MAX_MEM for A15 could be higher if we shuffle things about * RAM or flash at addr 0?? Signed-off-by: John Rigby john.ri...@linaro.org [PMM: Significantly overhauled: * renamed user-facing machine to just virt * removed the A9 support (it can't work since the A9 has no generic timers) * added virtio-mmio transports instead of random set of 'soc' devices * instead of updating io_base as we step through adding devices, define a memory map with an array (similar to vexpress) * folded in some minor fixes from John's aarch64-support patch * rather than explicitly doing endian-swapping on FDT cells, use fdt APIs that let us just pass in host-endian values and let the fdt layer take care of the swapping * miscellaneous minor code cleanups and style fixes ] Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- hw/arm/Makefile.objs |2 +- hw/arm/virt.c| 363 ++ 2 files changed, 364 insertions(+), 1 deletion(-) create mode 100644 hw/arm/virt.c diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs index 9e3a06f..744484f 100644 --- a/hw/arm/Makefile.objs +++ b/hw/arm/Makefile.objs @@ -1,7 +1,7 @@ obj-y += boot.o collie.o exynos4_boards.o gumstix.o highbank.o obj-y += integratorcp.o kzm.o mainstone.o musicpal.o nseries.o obj-y += omap_sx1.o palm.o pic_cpu.o realview.o spitz.o stellaris.o -obj-y += tosa.o versatilepb.o vexpress.o xilinx_zynq.o z2.o +obj-y += tosa.o versatilepb.o vexpress.o virt.o xilinx_zynq.o z2.o obj-y += armv7m.o exynos4210.o pxa2xx.o pxa2xx_gpio.o pxa2xx_pic.o obj-y += omap1.o omap2.o strongarm.o diff --git a/hw/arm/virt.c b/hw/arm/virt.c new file mode 100644 index 000..2a743b7 --- /dev/null +++ b/hw/arm/virt.c @@ -0,0 +1,363 @@ +/* + * ARM mach-virt emulation + * + * Copyright (c) 2013 Linaro + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2 or later, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program. If not, see http://www.gnu.org/licenses/. + * + * Emulate a virtual board which works by passing Linux all the information + * it needs about what devices are present via the device tree. + * There are some restrictions about what we can do here: + * + we can only present devices whose Linux drivers will work based + *purely on the device tree with no platform data at all + * + we want to present a very stripped-down minimalist platform, + *both because this reduces the security attack surface from the guest + *and also because it reduces our exposure to being broken when + *the kernel updates its device tree bindings and requires further + *information in a device binding that we aren't providing. + * This is essentially the same approach kvmtool uses. + */ + +#include hw/sysbus.h +#include hw/arm/arm.h +#include hw/arm/primecell.h +#include hw/devices.h +#include net/net.h +#include sysemu/device_tree.h +#include sysemu/sysemu.h +#include sysemu/kvm.h +#include hw/boards.h +#include exec/address-spaces.h +#include qemu/bitops.h +#include qemu/error-report.h + +#define NUM_VIRTIO_TRANSPORTS 32 + +#define GIC_FDT_IRQ_TYPE_SPI 0 +#define GIC_FDT_IRQ_TYPE_PPI 1 + +#define GIC_FDT_IRQ_FLAGS_EDGE_LO_HI 1 +#define GIC_FDT_IRQ_FLAGS_EDGE_HI_LO 2 +#define GIC_FDT_IRQ_FLAGS_LEVEL_HI 4 +#define GIC_FDT_IRQ_FLAGS_LEVEL_LO 8 + +#define GIC_FDT_IRQ_PPI_CPU_START 8 +#define GIC_FDT_IRQ_PPI_CPU_WIDTH 8 + +enum { +VIRT_FLASH, +VIRT_MEM, +VIRT_CPUPERIPHS, +VIRT_GIC_DIST, +VIRT_GIC_CPU, +VIRT_MMIO, +}; + +typedef struct MemMapEntry { +hwaddr base; +hwaddr size; +} MemMapEntry; + +typedef struct VirtBoardInfo { +struct arm_boot_info bootinfo; +const char *cpu_model; +const char *cpu_compatible; +const char *qdevname; +const char *gic_compatible; +const MemMapEntry *memmap; +int smp_cpus; +void *fdt; +int fdt_size; +} VirtBoardInfo; + +/* Addresses and sizes of our components. + * We leave the first 64K free for possible use later for + * flash (for running boot code such as UEFI); following + * that is I/O, and then everything else is RAM (which may + * happily spill over into the high memory region beyond 4GB). + */
Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Am 05.08.2013 13:03, schrieb Jan Kiszka: On 2013-08-05 12:51, Peter Maydell wrote: On 5 August 2013 11:44, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-05 12:36, Peter Maydell wrote: On 5 August 2013 11:30, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-05 11:59, Peter Maydell wrote: Or do you mean that if we had: [ system memory region, with its own default read/write ops ] I cannot imagine how this could work. The system memory region has no clue about what the regions below it can handle and what not. So it has to pass through the io window. The system memory region's just a container, you can add a background region to it at lowest-possible-priority, which then takes accesses which are either (a) not in any subregion or (b) in a subregion but that container doesn't specify its own io ops and nothing in that container handles the access. (Or you could create the system memory region with its own IO ops, which would have the same effect.) First, we do not render MMIO and IO within the same address space so far. Is this a statement made because you've checked all the boards and know that nobody's mapping the system-io memory region into the system-memory region? (It is technically trivial, you just need to call memory_region_add_subregion() directly or indirectly...) I know this because I just recently wrote the patch that enables this trivial step, i.e. converted PIO dispatching to the memory subsystem. Several patches have been applied since, e.g. sPAPR PHB: http://git.qemu.org/?p=qemu.git;a=commit;h=66aab867cedd2a2d81b4d64eff7c3e0f6f272bbf - - aliases system_io() PReP i82378 PCI-ISA bridge: http://git.qemu.org/?p=qemu.git;a=commit;h=5c9736789b79ea49cd236ac326f0a414f63b1015 - - uses pci_address_space_io() Alpha Typhoon: http://git.qemu.org/?p=qemu.git;a=commit;h=056e6bae1c91f47165d962564f82f5176bae47f0 http://git.qemu.org/?p=qemu.git;a=commit;h=3661049fec64ffd7ab008e57e396881c6a4b53a4 [For those joining late, this discussion is about whether making PIO MemoryRegion ..._io rather than just container might hurt some use case. If you have a concrete test case that would be appreciated; a we-don't-care-about-such-a-fringe-case would help as well.] Andreas - -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAEBAgAGBQJR/43lAAoJEPou0S0+fgE/EiMQAJP7YfqF+YYKvS5NUonLMo26 ncsY7E9IikpWSCcL/QVwmfFTfDjptR0iI987+4OcE3Sf0nH3pv5FFge4aR965Of1 sBvqfAVD9ihnV2sO+s2rImgw9tyC/kadA6k0hLB8h/+QDgl/J5wjpI0A29OHT8Uc BYv210POLqzU+nAO83t9w3xZA3Q60+X8YxI3GqAd6CEMS7i2yga0SK2Q3U1P7o+G sE6hqYE7sWKtaQlosTWuAxYnOOXcY2u+EEnqS6SXqZbp4NcitRYCamNu+8ARDAab JNIbbmCt1GTzdQ6PxVu5paPRUJbETQ2rDWxAq4Ryr37Gi5vb9FFPNMKpSncNu28b o46Fr6QMDTMNgk0Ac7F4WKKICGsCMuHuzot7JLoRkfhPQunqwQR5yEyvUKwYtV5p pAl1Frop+88X6HEmUkNJHR1FnhI5pVLImyxJTLXVRGgZqx7i1J8cvVPM6PhWa8tU NFFVKy7aWNj5MOVE8oSGK+LWilikb7i0b7YxDI/Ky/OI62niy3CK+XQgYfSFXEZ/ s8ErAfI5CMKy0O1/KbUy6V3cGpGM2xRizBSDCy3GqP3XD3If3tG8siRSarEG2FxO TO73OWtYarAtZ5cIRIsefV3uT/CA6g8p/dXgfp1WxDSIZcqZIMz5FIGgIDoIT7i4 6fPuo2wYiZXLS3/g6tuP =egTc -END PGP SIGNATURE-
Re: [Qemu-devel] net/tap.c: Possibly a way to stall tap input
On Fri, Aug 02, 2013 at 09:41:24PM +0200, Jan Kiszka wrote: On 2013-08-02 14:45, Jan Kiszka wrote: On 2013-08-02 13:46, Stefan Hajnoczi wrote: On Thu, Aug 01, 2013 at 07:15:54PM +0200, Jan Kiszka wrote: I was digging into the involved code and found something fishy: net/tap.c: static void tap_send(void *opaque) { ... size = qemu_send_packet_async(s-nc, buf, size, tap_send_completed); if (size == 0) { tap_read_poll(s, false); } So, if tap_send is registered for the mainloop polling (ie. can_receive returned true before starting to poll) but qemu_send_packet_async returns 0 now as qemu_can_send_packet/can_receive happens to report false in the meantime, we will disable read polling. If also write polling is off, the fd will be completely removed from the iohandler list. But even if write polling remains on, I wonder what should bring read polling back? This behavior seems fine to me. Once the peer (pcnet) is able to receive again it must flush the queue, this will re-enable tap_read_poll(). Can you explain a bit more why this would be a problem? The problem is that I don't see at all what will call tap_read_poll(s, 1), neither in theory nor in reality. As long as the real test case is out of reach, I tried to emulate the faulty behaviour by letting tap_can_send always return 1. Result: reception stalls during boot as even qemu_flush_queued_packets cannot get it running again once tap_read_poll(s, 0) was called. OK, false alarm. The issue was most likely fixed by commit 199ee608 (net: fix qemu_flush_queued_packets() in presence of a hub) which is present in 1.5.x but not 1.3.x. We initially tried to test on 1.5 but had to role back to 1.3 due to other issues - and missed this fix. My understanding of the networking maze was confused by the unfortunate naming of the incoming net client queues (send_queue) - will propose a renaming. This still requires a confirmation on the target, but I'm quite optimistic now. Okay, good to hear. It makes more sense now and I agree that send_queue is not a great name. Stefan
Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses
On 2013-08-05 13:35, Andreas Färber wrote: Am 05.08.2013 13:03, schrieb Jan Kiszka: On 2013-08-05 12:51, Peter Maydell wrote: On 5 August 2013 11:44, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-05 12:36, Peter Maydell wrote: On 5 August 2013 11:30, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-05 11:59, Peter Maydell wrote: Or do you mean that if we had: [ system memory region, with its own default read/write ops ] I cannot imagine how this could work. The system memory region has no clue about what the regions below it can handle and what not. So it has to pass through the io window. The system memory region's just a container, you can add a background region to it at lowest-possible-priority, which then takes accesses which are either (a) not in any subregion or (b) in a subregion but that container doesn't specify its own io ops and nothing in that container handles the access. (Or you could create the system memory region with its own IO ops, which would have the same effect.) First, we do not render MMIO and IO within the same address space so far. Is this a statement made because you've checked all the boards and know that nobody's mapping the system-io memory region into the system-memory region? (It is technically trivial, you just need to call memory_region_add_subregion() directly or indirectly...) I know this because I just recently wrote the patch that enables this trivial step, i.e. converted PIO dispatching to the memory subsystem. Several patches have been applied since, e.g. sPAPR PHB: http://git.qemu.org/?p=qemu.git;a=commit;h=66aab867cedd2a2d81b4d64eff7c3e0f6f272bbf - aliases system_io() PReP i82378 PCI-ISA bridge: http://git.qemu.org/?p=qemu.git;a=commit;h=5c9736789b79ea49cd236ac326f0a414f63b1015 - uses pci_address_space_io() Alpha Typhoon: http://git.qemu.org/?p=qemu.git;a=commit;h=056e6bae1c91f47165d962564f82f5176bae47f0 http://git.qemu.org/?p=qemu.git;a=commit;h=3661049fec64ffd7ab008e57e396881c6a4b53a4 [For those joining late, this discussion is about whether making PIO MemoryRegion ..._io rather than just container might hurt some use case. If you have a concrete test case that would be appreciated; a we-don't-care-about-such-a-fringe-case would help as well.] OK, one assumption became outdated, but the other will remain true once the patch is applied. So let's close this discussion. Jan signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH for-1.6] qemu-img: Error out for excess arguments
On 08/05/13 10:57, Kevin Wolf wrote: Don't silently ignore excess arguments at the end of the command line, but error out instead. This can catch typos like 'resize test.img + 1G', which doesn't increase the image size by 1G as intended, but truncates the image to 1G. Even for less dangerous commands, the old behaviour is confusing. Signed-off-by: Kevin Wolf kw...@redhat.com --- qemu-img.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index c55ca5c..dece1b3 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -396,6 +396,9 @@ static int img_create(int argc, char **argv) } img_size = (uint64_t)sval; } +if (optind != argc) { +help(); +} if (options is_help_option(options)) { return print_block_option_help(filename, fmt); @@ -573,7 +576,7 @@ static int img_check(int argc, char **argv) break; } } -if (optind = argc) { +if (optind != argc - 1) { help(); } filename = argv[optind++]; @@ -684,7 +687,7 @@ static int img_commit(int argc, char **argv) break; } } -if (optind = argc) { +if (optind != argc - 1) { help(); } filename = argv[optind++]; @@ -930,7 +933,7 @@ static int img_compare(int argc, char **argv) } -if (optind argc - 2) { +if (optind != argc - 2) { help(); } filename1 = argv[optind++]; @@ -1741,7 +1744,7 @@ static int img_info(int argc, char **argv) break; } } -if (optind = argc) { +if (optind != argc - 1) { help(); } filename = argv[optind++]; @@ -1842,7 +1845,7 @@ static int img_snapshot(int argc, char **argv) } } -if (optind = argc) { +if (optind != argc - 1) { help(); } filename = argv[optind++]; @@ -1953,7 +1956,7 @@ static int img_rebase(int argc, char **argv) progress = 0; } -if ((optind = argc) || (!unsafe !out_baseimg)) { +if ((optind != argc - 1) || (!unsafe !out_baseimg)) { help(); } filename = argv[optind++]; @@ -2232,7 +2235,7 @@ static int img_resize(int argc, char **argv) break; } } -if (optind = argc) { +if (optind != argc - 1) { help(); } filename = argv[optind++]; Reviewed-by: Laszlo Ersek ler...@redhat.com
Re: [Qemu-devel] [PATCH v2 2/9] block: vhdx - add header update capability.
Am 01.08.2013 um 05:23 hat Jeff Cody geschrieben: This adds the ability to update the headers in a VHDX image, including generating a new MS-compatible GUID. As VHDX depends on uuid.h, VHDX is now a configurable build option. If VHDX support is enabled, that will also enable uuid as well. The default is to have VHDX enabled. To enable/disable VHDX: --enable-vhdx, --disable-vhdx Signed-off-by: Jeff Cody jc...@redhat.com diff --git a/configure b/configure index 877a821..821b790 100755 --- a/configure +++ b/configure @@ -244,6 +244,7 @@ gtk= gtkabi=2.0 tpm=no libssh2= +vhdx=yes # parse CC options first for opt do @@ -950,6 +951,11 @@ for opt do ;; --enable-libssh2) libssh2=yes ;; + --enable-vhdx) vhdx=yes ; + uuid=yes + ;; What's this? Shouldn't auto-detecting uuid and checking later whether it's available do the right thing? Test cases: 1. configure --disable-uuid --enable-vhdx This should result in an error because the options contradict each other. With this patch, the disable flag is overridden. 2. No libuuid headers available; ./configure Should build without VHDX. With this patch, uuid is left out, vhdx is configured anyway, the build fails. Kevin
Re: [Qemu-devel] [PATCH] virtio: Do not notify virtqueue if no element was pushed back.
On 08/05/13 10:18, Gal Hammer wrote: Signed-off-by: Gal Hammer gham...@redhat.com --- hw/char/virtio-serial-bus.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c index da417c7..0d38b4b 100644 --- a/hw/char/virtio-serial-bus.c +++ b/hw/char/virtio-serial-bus.c @@ -105,6 +105,7 @@ static void do_flush_queued_data(VirtIOSerialPort *port, VirtQueue *vq, VirtIODevice *vdev) { VirtIOSerialPortClass *vsc; +bool elem_pushed = false; assert(port); assert(virtio_queue_ready(vq)); @@ -145,9 +146,12 @@ static void do_flush_queued_data(VirtIOSerialPort *port, VirtQueue *vq, break; } virtqueue_push(vq, port-elem, 0); +elem_pushed = true; port-elem.out_num = 0; } -virtio_notify(vdev, vq); +if (elem_pushed) { +virtio_notify(vdev, vq); +} } static void flush_queued_data(VirtIOSerialPort *port) I could be missing something, but it looks good to me. BTW the subject should say: virtio-serial: ..., not generic virtio: How did you catch this? Is this a performance problem, or did it expose a guest driver bug? (The guest driver should be prepared for spurious wakeups.) Reviewed-by: Laszlo Ersek ler...@redhat.com
Re: [Qemu-devel] [PATCH v4 2/2] ARM: Add 'virt' platform
On Mon, Aug 5, 2013 at 4:48 PM, Peter Maydell peter.mayd...@linaro.org wrote: From: John Rigby john.ri...@linaro.org Add 'virt' platform support corresponding to arch/arm/mach-virt in the Linux kernel tree. This has no platform-specific code but can use any device whose kernel driver is is able to work purely from a device tree node. We use this to instantiate a minimal set of devices: a GIC and some virtio-mmio transports. TODO: * MAX_MEM for A15 could be higher if we shuffle things about * RAM or flash at addr 0?? Signed-off-by: John Rigby john.ri...@linaro.org [PMM: Significantly overhauled: * renamed user-facing machine to just virt * removed the A9 support (it can't work since the A9 has no generic timers) * added virtio-mmio transports instead of random set of 'soc' devices * instead of updating io_base as we step through adding devices, define a memory map with an array (similar to vexpress) * folded in some minor fixes from John's aarch64-support patch * rather than explicitly doing endian-swapping on FDT cells, use fdt APIs that let us just pass in host-endian values and let the fdt layer take care of the swapping * miscellaneous minor code cleanups and style fixes ] Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- hw/arm/Makefile.objs |2 +- hw/arm/virt.c| 363 ++ 2 files changed, 364 insertions(+), 1 deletion(-) create mode 100644 hw/arm/virt.c diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs index 9e3a06f..744484f 100644 --- a/hw/arm/Makefile.objs +++ b/hw/arm/Makefile.objs @@ -1,7 +1,7 @@ obj-y += boot.o collie.o exynos4_boards.o gumstix.o highbank.o obj-y += integratorcp.o kzm.o mainstone.o musicpal.o nseries.o obj-y += omap_sx1.o palm.o pic_cpu.o realview.o spitz.o stellaris.o -obj-y += tosa.o versatilepb.o vexpress.o xilinx_zynq.o z2.o +obj-y += tosa.o versatilepb.o vexpress.o virt.o xilinx_zynq.o z2.o obj-y += armv7m.o exynos4210.o pxa2xx.o pxa2xx_gpio.o pxa2xx_pic.o obj-y += omap1.o omap2.o strongarm.o diff --git a/hw/arm/virt.c b/hw/arm/virt.c new file mode 100644 index 000..2a743b7 --- /dev/null +++ b/hw/arm/virt.c @@ -0,0 +1,363 @@ +/* + * ARM mach-virt emulation + * + * Copyright (c) 2013 Linaro + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2 or later, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program. If not, see http://www.gnu.org/licenses/. + * + * Emulate a virtual board which works by passing Linux all the information + * it needs about what devices are present via the device tree. + * There are some restrictions about what we can do here: + * + we can only present devices whose Linux drivers will work based + *purely on the device tree with no platform data at all + * + we want to present a very stripped-down minimalist platform, + *both because this reduces the security attack surface from the guest + *and also because it reduces our exposure to being broken when + *the kernel updates its device tree bindings and requires further + *information in a device binding that we aren't providing. + * This is essentially the same approach kvmtool uses. + */ + +#include hw/sysbus.h +#include hw/arm/arm.h +#include hw/arm/primecell.h +#include hw/devices.h +#include net/net.h +#include sysemu/device_tree.h +#include sysemu/sysemu.h +#include sysemu/kvm.h +#include hw/boards.h +#include exec/address-spaces.h +#include qemu/bitops.h +#include qemu/error-report.h + +#define NUM_VIRTIO_TRANSPORTS 32 + +#define GIC_FDT_IRQ_TYPE_SPI 0 +#define GIC_FDT_IRQ_TYPE_PPI 1 + +#define GIC_FDT_IRQ_FLAGS_EDGE_LO_HI 1 +#define GIC_FDT_IRQ_FLAGS_EDGE_HI_LO 2 +#define GIC_FDT_IRQ_FLAGS_LEVEL_HI 4 +#define GIC_FDT_IRQ_FLAGS_LEVEL_LO 8 + +#define GIC_FDT_IRQ_PPI_CPU_START 8 +#define GIC_FDT_IRQ_PPI_CPU_WIDTH 8 + +enum { +VIRT_FLASH, +VIRT_MEM, +VIRT_CPUPERIPHS, +VIRT_GIC_DIST, +VIRT_GIC_CPU, +VIRT_MMIO, +}; + +typedef struct MemMapEntry { +hwaddr base; +hwaddr size; +} MemMapEntry; + +typedef struct VirtBoardInfo { +struct arm_boot_info bootinfo; +const char *cpu_model; +const char *cpu_compatible; +const char *qdevname; +const char *gic_compatible; +const MemMapEntry *memmap; +int smp_cpus; +void *fdt; +int fdt_size; +} VirtBoardInfo; + +/* Addresses and sizes of our components. + * We leave the first 64K free for
Re: [Qemu-devel] [PATCH for-1.6 01/11] ignore SIGPIPE in qemu-img and qemu-io
Am 03.08.2013 um 05:52 hat Doug Goldstein geschrieben: On Tue, Jul 23, 2013 at 4:19 AM, Paolo Bonzini pbonz...@redhat.com wrote: Il 23/07/2013 10:30, MORITA Kazutaka ha scritto: This prevents the tools from being stopped when they write data to a closed connection in the other side. Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp --- qemu-img.c | 4 qemu-io.c | 4 2 files changed, 8 insertions(+) diff --git a/qemu-img.c b/qemu-img.c index c55ca5c..919d464 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -2319,6 +2319,10 @@ int main(int argc, char **argv) const img_cmd_t *cmd; const char *cmdname; +#ifdef CONFIG_POSIX +signal(SIGPIPE, SIG_IGN); +#endif + error_set_progname(argv[0]); qemu_init_main_loop(); diff --git a/qemu-io.c b/qemu-io.c index cb9def5..d54dc86 100644 --- a/qemu-io.c +++ b/qemu-io.c @@ -335,6 +335,10 @@ int main(int argc, char **argv) int opt_index = 0; int flags = BDRV_O_UNMAP; +#ifdef CONFIG_POSIX +signal(SIGPIPE, SIG_IGN); +#endif + progname = basename(argv[0]); while ((c = getopt_long(argc, argv, sopt, lopt, opt_index)) != -1) { Reviewed-by: Paolo Bonzini pbonz...@redhat.com and adding qemu-stable for this one. Nudge so this isn't forgotten about since it hasn't hit master yet. Thanks, applied to the block branch. Kevin
Re: [Qemu-devel] [PATCH] tap: Use numbered tap/tun devices on all *BSD OS's
On Sat, Aug 03, 2013 at 10:20:41PM -0400, Brad Smith wrote: The following patch simplifies the *BSD tap/tun code and makes use of numbered tap/tun interfaces on all *BSD OS's. NetBSD has a patch in their pkgsrc tree to make use of this feature and DragonFly also supports this as well. Signed-off-by: Brad Smith b...@comstyle.com I confirmed that the NetBSD tun driver does use /dev/tap%d by default. There are not other CONFIG_BSD=y targets listed in ./configure besides FreeBSD/kFreeBSD, OpenBSD, and Darwin. Therefore this patch is safe. Thanks, applied to my net-next tree: https://github.com/stefanha/qemu/commits/net-next Stefan
Re: [Qemu-devel] [ANNOUNCE] QEMU 1.6.0-rc1 is now available
On 08/04/13 14:17, Mark Cave-Ayland wrote: On 01/08/13 23:08, Anthony Liguori wrote: Hi, On behalf of the QEMU Team, I'd like to announce the availability of the second release candidate for the QEMU 1.6 release. This release is meant for testing purposes and should not be used in a production environment. http://wiki.qemu.org/download/qemu-1.6.0-rc1.tar.bz2 Hi Anthony, There are still two outstanding issues from my testing here - patches have been submitted for both, but not applied yet. - qemu-system-sparc64 broken See https://lists.gnu.org/archive/html/qemu-devel/2013-07/msg05201.html. - OpenBIOS images have not been updated See https://lists.gnu.org/archive/html/qemu-devel/2013-07/msg05923.html. Also, my series [PATCH 0/4] dump-guest-memory: correct the vmcores http://thread.gmane.org/gmane.comp.emulators.qemu/225360 was dropped from qmp-1.6 (and consequently didn't make the d5a2bcf7 merge) due to an asynchronous change to another part of the tree that I could not have foreseen. IOW it was not my fault -- I did exercise due diligence and my series was building and working fine at the then-fresh master when I posted it. A conflicting function was introduced between my posting and Luiz's application of my series to his qmp-1.6 branch, and since I was on PTO from last Thursday, I could only refresh the series only today. The update from v1 to v2 is trivial: diff --git a/target-s390x/arch_dump.c b/target-s390x/arch_dump.c index f3e5144..9d36116 100644 --- a/target-s390x/arch_dump.c +++ b/target-s390x/arch_dump.c @@ -176,7 +176,8 @@ int s390_cpu_write_elf64_note(WriteCoreDumpFunction f, CPUState *cs, return s390x_write_all_elf64_notes(CORE, f, cpu, cpuid, opaque); } -int cpu_get_dump_info(ArchDumpInfo *info) +int cpu_get_dump_info(ArchDumpInfo *info, + const struct GuestPhysBlockList *guest_phys_blocks) { info-d_machine = EM_S390; info-d_endian = ELFDATA2MSB; Please do consider it for 1.6, it addresses a serious bug (= vmcores saved with dump-guest-memory/paging=false are broken for 3.5G x86(_64) guests). [PATCH v2 for-qmp-1.6 0/4] dump-guest-memory: correct the vmcores http://thread.gmane.org/gmane.comp.emulators.qemu/226378 Thanks Laszlo
Re: [Qemu-devel] [PATCH] net: Rename send_queue to incoming_queue
On Fri, Aug 02, 2013 at 09:47:08PM +0200, Jan Kiszka wrote: From: Jan Kiszka jan.kis...@siemens.com Each networking client has a queue for packets that could not yet be delivered to that client. Calling this queue send_queue is highly confusing as it has nothing to to with packets send from this client but to it. Avoid this confusing by renaming it to incoming_queue. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- include/net/net.h |2 +- net/hub.c |2 +- net/net.c | 14 +++--- 3 files changed, 9 insertions(+), 9 deletions(-) Taking this for QEMU 1.7 in the net-next branch. Thanks, applied to my net-next tree: https://github.com/stefanha/qemu/commits/net-next Stefan
Re: [Qemu-devel] [PATCH] virtio: Do not notify virtqueue if no element was pushed back.
On 05/08/2013 14:49, Laszlo Ersek wrote: On 08/05/13 10:18, Gal Hammer wrote: Signed-off-by: Gal Hammer gham...@redhat.com --- hw/char/virtio-serial-bus.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c index da417c7..0d38b4b 100644 --- a/hw/char/virtio-serial-bus.c +++ b/hw/char/virtio-serial-bus.c @@ -105,6 +105,7 @@ static void do_flush_queued_data(VirtIOSerialPort *port, VirtQueue *vq, VirtIODevice *vdev) { VirtIOSerialPortClass *vsc; +bool elem_pushed = false; assert(port); assert(virtio_queue_ready(vq)); @@ -145,9 +146,12 @@ static void do_flush_queued_data(VirtIOSerialPort *port, VirtQueue *vq, break; } virtqueue_push(vq, port-elem, 0); +elem_pushed = true; port-elem.out_num = 0; } -virtio_notify(vdev, vq); +if (elem_pushed) { +virtio_notify(vdev, vq); +} } static void flush_queued_data(VirtIOSerialPort *port) I could be missing something, but it looks good to me. Thanks. BTW the subject should say: virtio-serial: ..., not generic virtio: OK. I can change that. How did you catch this? Is this a performance problem, or did it expose a guest driver bug? (The guest driver should be prepared for spurious wakeups.) I had a bug in the Windows' driver that caused a buffer duplication when the char device didn't read the whole message. It was found when working with SPICE client over WAN (i.e. slow connection) and trying to copypaste a large image. The driver was fixed. However I still think that this is needed although I'm not sure if there is a major performance penalty because of this redundant notification. Gal. Reviewed-by: Laszlo Ersek ler...@redhat.com
Re: [Qemu-devel] [PATCH for-1.6] target-i386: Fix X86CPU error handling
Am 02.08.2013 19:22, schrieb Andreas Färber: Error **errp argument is not for emitting warnings, it means an error has occurred and the caller should not make any assumptions about the state of other return values (unless otherwise documented). Therefore cpu_x86_create() must unref the new X86CPU itself, and pc_new_cpu() must check for an Error rather than NULL return value. While at it, clean up a superfluous NULL check. Reported-by: Jan Kiszka jan.kis...@siemens.com Cc: qemu-sta...@nongnu.org Cc: Igor Mammedov imamm...@redhat.com Signed-off-by: Andreas Färber afaer...@suse.de Ping! Jan, does this address your concerns / use cases? -rc2 is on Wednesday. Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH for-1.6] target-i386: Fix X86CPU error handling
On 2013-08-05 14:06, Andreas Färber wrote: Am 02.08.2013 19:22, schrieb Andreas Färber: Error **errp argument is not for emitting warnings, it means an error has occurred and the caller should not make any assumptions about the state of other return values (unless otherwise documented). Therefore cpu_x86_create() must unref the new X86CPU itself, and pc_new_cpu() must check for an Error rather than NULL return value. While at it, clean up a superfluous NULL check. Reported-by: Jan Kiszka jan.kis...@siemens.com Cc: qemu-sta...@nongnu.org Cc: Igor Mammedov imamm...@redhat.com Signed-off-by: Andreas Färber afaer...@suse.de Ping! Jan, does this address your concerns / use cases? -rc2 is on Wednesday. Yes, thanks, this looks good. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SES-DE Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH] pcnet: Flush queued packets on end of STOP state
On Fri, Aug 02, 2013 at 09:48:18PM +0200, Jan Kiszka wrote: From: Jan Kiszka jan.kis...@siemens.com Analogously to other NICs, we have to inform the network layer when the can_receive handler will no longer report 0. Without this, we may get stuck waiting on queued incoming packets. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/net/pcnet.c |4 1 files changed, 4 insertions(+), 0 deletions(-) Great, let's get this into the next QEMU 1.6-rc. Thanks, applied to my net tree: https://github.com/stefanha/qemu/commits/net Stefan
Re: [Qemu-devel] [PATCH v4 2/2] ARM: Add 'virt' platform
On 5 August 2013 12:48, Anup Patel a...@brainfault.org wrote: +static const MemMapEntry a15memmap[] = { +[VIRT_FLASH] = { 0, 0x10 }, IMHO, 1 MB of flash is small for possible future expansion. If mach-virt becomes popular then we can expect people running UBoot or UEFI or from this flash. I think having 16 MB of flash would be good because it is multiple of 2 MB hence we can also create section entries for it in Stage2 TTBL. Seems reasonable. +[VIRT_CPUPERIPHS] = { 0x10, 0x8000 }, I would suggest to start peripheral space at 2 MB boundary and also have its total size in multiple of 2 MB. The total size here is fixed by the CPU hardware -- an A15's private peripheral space is only 0x8000 in size. This will enable us to create 2 MB entries in Stage2 TTBL for trapping which in-turn can help in performance by reducing Stage2 TTBL walks. I am sure there won't be too many peripherals in mach-virt so, it would even better if we can have base address of each peripheral to be at 2 MB boundary. Why does each individual peripheral have to be at a 2MB boundary? I would expect the kernel to just have a single nothing mapped here stage 2 table entry (or entries) covering the whole of the mmio peripheral region regardless of how many devices happen to be inside it. thanks -- PMM
[Qemu-devel] [PULL 1/1] pcnet: Flush queued packets on end of STOP state
From: Jan Kiszka jan.kis...@siemens.com Analogously to other NICs, we have to inform the network layer when the can_receive handler will no longer report 0. Without this, we may get stuck waiting on queued incoming packets. Signed-off-by: Jan Kiszka jan.kis...@siemens.com Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- hw/net/pcnet.c | 4 1 file changed, 4 insertions(+) diff --git a/hw/net/pcnet.c b/hw/net/pcnet.c index b606d2b..63aa73a 100644 --- a/hw/net/pcnet.c +++ b/hw/net/pcnet.c @@ -861,6 +861,8 @@ static void pcnet_init(PCNetState *s) s-csr[0] |= 0x0101; s-csr[0] = ~0x0004; /* clear STOP bit */ + +qemu_flush_queued_packets(qemu_get_queue(s-nic)); } static void pcnet_start(PCNetState *s) @@ -878,6 +880,8 @@ static void pcnet_start(PCNetState *s) s-csr[0] = ~0x0004; /* clear STOP bit */ s-csr[0] |= 0x0002; pcnet_poll_timer(s); + +qemu_flush_queued_packets(qemu_get_queue(s-nic)); } static void pcnet_stop(PCNetState *s) -- 1.8.1.4
[Qemu-devel] [PULL for-1.6 0/1] Net patches
This fix allows the pcnet NIC to work properly with the tap device. The following changes since commit b9ac5d923b820a0f0152a2df56067e55ce34f487: target-mips: fix 34Kf configuration for DSP ASE (2013-08-03 23:33:17 +0200) are available in the git repository at: git://github.com/stefanha/qemu.git net for you to fetch changes up to ee76c1f821e75550644e084dea85743bbc934f91: pcnet: Flush queued packets on end of STOP state (2013-08-05 14:11:17 +0200) Jan Kiszka (1): pcnet: Flush queued packets on end of STOP state hw/net/pcnet.c | 4 1 file changed, 4 insertions(+) -- 1.8.1.4
Re: [Qemu-devel] [PATCH v4 2/2] ARM: Add 'virt' platform
On Mon, Aug 5, 2013 at 5:31 PM, Peter Maydell peter.mayd...@linaro.org wrote: On 5 August 2013 12:48, Anup Patel a...@brainfault.org wrote: +static const MemMapEntry a15memmap[] = { +[VIRT_FLASH] = { 0, 0x10 }, IMHO, 1 MB of flash is small for possible future expansion. If mach-virt becomes popular then we can expect people running UBoot or UEFI or from this flash. I think having 16 MB of flash would be good because it is multiple of 2 MB hence we can also create section entries for it in Stage2 TTBL. Seems reasonable. +[VIRT_CPUPERIPHS] = { 0x10, 0x8000 }, I would suggest to start peripheral space at 2 MB boundary and also have its total size in multiple of 2 MB. The total size here is fixed by the CPU hardware -- an A15's private peripheral space is only 0x8000 in size. Does this mean, mach-virt address space is Cortex-A15 specific ? What about Cortex-A7 and Cortex-A12 ? If above is a mandatory constraint then we can have peripheral space divide into two parts: 1. Essential (or MPCore) peripherals: For now only GIC belongs here. Other things that fit here is Watchdog and Local timers but this are redundant for mach-virt. This space can be only 0x8000 in size as required by Cortex-A15. 2. General peripherals: All VirtIO devices would belong here. In addition, users can also add devices to this space using QEMU command line. This will enable us to create 2 MB entries in Stage2 TTBL for trapping which in-turn can help in performance by reducing Stage2 TTBL walks. I am sure there won't be too many peripherals in mach-virt so, it would even better if we can have base address of each peripheral to be at 2 MB boundary. Why does each individual peripheral have to be at a 2MB boundary? I would expect the kernel to just have a single nothing mapped here stage 2 table entry (or entries) covering the whole of the mmio peripheral region regardless of how many devices happen to be inside it. Yes, this seem a little over-kill but we can atleast have peripheral space to be 2MB aligned with total size in multiple of 2MB. thanks -- PMM
Re: [Qemu-devel] [PATCH for-1.6] target-mips: do not raise exceptions when accessing invalid memory
On Mon, Aug 05, 2013 at 07:19:08AM +0200, Stefan Weil wrote: Am 05.08.2013 00:37, schrieb Peter Maydell: On 4 August 2013 23:04, Aurélien Jarno aurel...@aurel32.net wrote: The real hardware probably returns all 1 or all 0 for addresses not decoded to a device. This is what QEMU should model, and it should not trigger a DBE or IBE exception. Looking at the current MIPS documentation, Bus Error is defined as: A bus error exception occurs when an instruction or data access makes a bus request (due to a cache miss or an uncacheable reference) and that request terminates in an error. Older CPU documentation like the R4000 have a more precise definition: A Bus Error exception is raised by board-level circuitry for events such as bus time-out, backplane bus parity errors, and invalid physical memory addresses or access types. As we don't model this kind of errors, we should definitely just not trigger an exception in that case, and even logging the event as unimplemented is probably wrong. Well, we certainly can model invalid-physical-address and bus-timeout where that's what the board does for accesses to non-decoded addresses; but presumably in this case it doesn't... -- PMM Is there anybody who has access to physical Malta hardware? It would be interesting to see whether there is an exception during the gcmp test or not. With latest QEMU, the MIPS Malta system emulation starts handling the exception caused by the gcmp test, but then gets a second exception which is fatal (see below). There might be something missing in our very simple bios emulation. Booting YAMON in QEMU also shows the same behaviour, that is trying to access to the 1fbf8008 address and getting a DBE exception, causing it to fail. So it is clearly not due to our simple bios emulation, but rather to the way the I/O are emulated. -- Aurelien Jarno GPG: 1024D/F1BCDB73 aurel...@aurel32.net http://www.aurel32.net
Re: [Qemu-devel] [PATCH v4 2/2] ARM: Add 'virt' platform
On 5 August 2013 13:22, Anup Patel a...@brainfault.org wrote: On Mon, Aug 5, 2013 at 5:31 PM, Peter Maydell peter.mayd...@linaro.org wrote: On 5 August 2013 12:48, Anup Patel a...@brainfault.org wrote: +static const MemMapEntry a15memmap[] = { +[VIRT_FLASH] = { 0, 0x10 }, IMHO, 1 MB of flash is small for possible future expansion. If mach-virt becomes popular then we can expect people running UBoot or UEFI or from this flash. I think having 16 MB of flash would be good because it is multiple of 2 MB hence we can also create section entries for it in Stage2 TTBL. Seems reasonable. +[VIRT_CPUPERIPHS] = { 0x10, 0x8000 }, I would suggest to start peripheral space at 2 MB boundary and also have its total size in multiple of 2 MB. The total size here is fixed by the CPU hardware -- an A15's private peripheral space is only 0x8000 in size. Does this mean, mach-virt address space is Cortex-A15 specific ? The patch has support for an array of VirtBoardInfo structs, each of which has its own memmap. The only entry at the moment is for the A15. What about Cortex-A7 and Cortex-A12 ? QEMU supports neither of these CPUs. The A7's the same as the A15, though. If above is a mandatory constraint then we can have peripheral space divide into two parts: 1. Essential (or MPCore) peripherals: For now only GIC belongs here. Other things that fit here is Watchdog and Local timers but this are redundant for mach-virt. This space can be only 0x8000 in size as required by Cortex-A15. 2. General peripherals: All VirtIO devices would belong here. In addition, users can also add devices to this space using QEMU command line. Er, this is what the patch already does. The CPUPERIPHS bit is specifically for the CPU's private peripheral region. You can't add an MMIO device on the QEMU command line, there's no way to wire up the memory regions and IRQs. Why does each individual peripheral have to be at a 2MB boundary? I would expect the kernel to just have a single nothing mapped here stage 2 table entry (or entries) covering the whole of the mmio peripheral region regardless of how many devices happen to be inside it. Yes, this seem a little over-kill but we can atleast have peripheral space to be 2MB aligned with total size in multiple of 2MB. I really don't want to eat 2MB for each virtio-mmio transport in a 32 bit address space, it seems hugely wasteful unless there's a good reason to do it. -- PMM
Re: [Qemu-devel] [PATCH 02/11] iov: handle EOF in iov_send_recv
Am 03.08.2013 um 05:48 hat Doug Goldstein geschrieben: On Tue, Jul 23, 2013 at 6:28 AM, Paolo Bonzini pbonz...@redhat.com wrote: Il 23/07/2013 10:30, MORITA Kazutaka ha scritto: Without this patch, iov_send_recv() never returns when do_send_recv() returns zero. Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp --- util/iov.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/util/iov.c b/util/iov.c index cc6e837..f705586 100644 --- a/util/iov.c +++ b/util/iov.c @@ -202,6 +202,12 @@ ssize_t iov_send_recv(int sockfd, struct iovec *iov, unsigned iov_cnt, return -1; } +if (ret == 0 !do_send) { +/* recv returns 0 when the peer has performed an orderly + * shutdown. */ +break; +} + /* Prepare for the next iteration */ offset += ret; total += ret; Reviewed-by: Paolo Bonzini pbonz...@redhat.com ... and should also be in 1.5.2. Paolo Nudge so this doesn't get forgotten about. It hasn't hit master yet. Thanks, applied to the block branch. Kevin
Re: [Qemu-devel] [PATCH v4 2/2] ARM: Add 'virt' platform
On Mon, Aug 5, 2013 at 5:58 PM, Peter Maydell peter.mayd...@linaro.org wrote: On 5 August 2013 13:22, Anup Patel a...@brainfault.org wrote: On Mon, Aug 5, 2013 at 5:31 PM, Peter Maydell peter.mayd...@linaro.org wrote: On 5 August 2013 12:48, Anup Patel a...@brainfault.org wrote: +static const MemMapEntry a15memmap[] = { +[VIRT_FLASH] = { 0, 0x10 }, IMHO, 1 MB of flash is small for possible future expansion. If mach-virt becomes popular then we can expect people running UBoot or UEFI or from this flash. I think having 16 MB of flash would be good because it is multiple of 2 MB hence we can also create section entries for it in Stage2 TTBL. Seems reasonable. +[VIRT_CPUPERIPHS] = { 0x10, 0x8000 }, I would suggest to start peripheral space at 2 MB boundary and also have its total size in multiple of 2 MB. The total size here is fixed by the CPU hardware -- an A15's private peripheral space is only 0x8000 in size. Does this mean, mach-virt address space is Cortex-A15 specific ? The patch has support for an array of VirtBoardInfo structs, each of which has its own memmap. The only entry at the moment is for the A15. What about Cortex-A7 and Cortex-A12 ? QEMU supports neither of these CPUs. The A7's the same as the A15, though. If above is a mandatory constraint then we can have peripheral space divide into two parts: 1. Essential (or MPCore) peripherals: For now only GIC belongs here. Other things that fit here is Watchdog and Local timers but this are redundant for mach-virt. This space can be only 0x8000 in size as required by Cortex-A15. 2. General peripherals: All VirtIO devices would belong here. In addition, users can also add devices to this space using QEMU command line. Er, this is what the patch already does. The CPUPERIPHS bit is specifically for the CPU's private peripheral region. You can't add an MMIO device on the QEMU command line, there's no way to wire up the memory regions and IRQs. Why does each individual peripheral have to be at a 2MB boundary? I would expect the kernel to just have a single nothing mapped here stage 2 table entry (or entries) covering the whole of the mmio peripheral region regardless of how many devices happen to be inside it. Yes, this seem a little over-kill but we can atleast have peripheral space to be 2MB aligned with total size in multiple of 2MB. I really don't want to eat 2MB for each virtio-mmio transport in a 32 bit address space, it seems hugely wasteful unless there's a good reason to do it. I am not suggesting to give 2MB space to each virtio-mmio transport. What I really meant was to start VIRT_MMIO space (where all the virtio-mmio transport would be added) to start at 2MB aligned address and have total size (include all virtio-mmio transports) to be in-multiple of 2MB so that we can trap access to all virtio-mmio transports using 1-2 2MB entries in Stage2. -- PMM --Anup
Re: [Qemu-devel] [PATCH] block: Produce zeros when protocols reading beyond end of file
Am 05.08.2013 um 10:11 hat Asias He geschrieben: From: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp While Asias is debugging an issue creating qcow2 images on top of non-file protocols. It boils down to this example using NBD: $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512' Notice the open -g option to set bs-growable. This means you can read/write beyond end of file. Reading beyond end of file is supposed to produce zeroes. We rely on this behavior in qcow2_create2() during qcow2 image creation. We create a new file and then write the qcow2 header structure using bdrv_pwrite(). Since QCowHeader is not a multiple of sector size, block.c first uses bdrv_read() on the empty file to fetch the first sector (should be all zeroes). Here is the output from the qemu-io NBD example above: $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512' : ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab 0010: ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab 0020: ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ... We are not zeroing the buffer! As a result qcow2 image creation on top of protocols is not guaranteed to work even when file creation is supported by the protocol. Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp Signed-off-by: Asias He as...@redhat.com --- block.c | 30 +- 1 file changed, 29 insertions(+), 1 deletion(-) diff --git a/block.c b/block.c index 01b66d8..deaf0a0 100644 --- a/block.c +++ b/block.c @@ -2544,7 +2544,35 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs, } } -ret = drv-bdrv_co_readv(bs, sector_num, nb_sectors, qiov); +if (!bs-drv-protocol_name) { I think !bs-growable is the right check. Checking for the protocol name is always a hack and most times wrong. +ret = drv-bdrv_co_readv(bs, sector_num, nb_sectors, qiov); +} else { +/* NBD doesn't support reading beyond end of file. */ This is not only for NBD, make it a neutral comment like: /* Read zeros after EOF of growable BDSes */ +int64_t len, total_sectors, max_nb_sectors; + +len = bdrv_getlength(bs); +if (len 0) { +ret = len; +goto out; +} + +total_sectors = len BDRV_SECTOR_BITS; +max_nb_sectors = MAX(0, total_sectors - sector_num); +if (max_nb_sectors 0) { +ret = drv-bdrv_co_readv(bs, sector_num, + MIN(nb_sectors, max_nb_sectors), qiov); +} else { +ret = 0; +} + +/* Reading beyond end of file is supposed to produce zeroes */ +if (ret == 0 total_sectors sector_num + nb_sectors) { +size_t offset = MAX(0, total_sectors - sector_num); +size_t bytes = (sector_num + nb_sectors - offset) * +BDRV_SECTOR_SIZE; uint64_t for both offset and bytes, size_t can be 32 bits. +qemu_iovec_memset(qiov, offset * BDRV_SECTOR_SIZE, 0, bytes); +} +} out: tracked_request_end(req); Kevin
Re: [Qemu-devel] [PATCH v4 2/2] ARM: Add 'virt' platform
On 5 August 2013 13:37, Anup Patel a...@brainfault.org wrote: On Mon, Aug 5, 2013 at 5:58 PM, Peter Maydell peter.mayd...@linaro.org wrote: I really don't want to eat 2MB for each virtio-mmio transport in a 32 bit address space, it seems hugely wasteful unless there's a good reason to do it. I am not suggesting to give 2MB space to each virtio-mmio transport. What I really meant was to start VIRT_MMIO space (where all the virtio-mmio transport would be added) to start at 2MB aligned address and have total size (include all virtio-mmio transports) to be in-multiple of 2MB so that we can trap access to all virtio-mmio transports using 1-2 2MB entries in Stage2. Yes, that's fine. That would give us: static const MemMapEntry a15memmap[] = { [VIRT_FLASH] = { 0, 0x100 }, [VIRT_CPUPERIPHS] = { 0x100, 0x8000 }, /* GIC distributor and CPU interfaces sit inside the CPU peripheral space */ [VIRT_GIC_DIST] = { 0x1001000, 0x1000 }, [VIRT_GIC_CPU] = { 0x1002000, 0x1000 }, [VIRT_MMIO] = { 0x120, 0x200 }, /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */ [VIRT_MEM] = { 0x800, 30ULL * 1024 * 1024 * 1024 }, }; -- PMM
[Qemu-devel] Versioned machine types for ARM/non-x86 ? (Was Re: [PATCH v4 0/2] ARM: add 'virt' platform)
On Mon, Aug 05, 2013 at 12:18:10PM +0100, Peter Maydell wrote: This patch series adds a 'virt' platform which uses the kernel's mach-virt (fully device-tree driven) support to create a simple minimalist platform intended for use for KVM VM guests. It's based on John Rigby's patches, but I've overhauled it a lot: On x86, we've long had versioned machine names, so that we can make changes in future QEMU releases without breaking guest ABI compatibility. AFAICT, the problem has basically been ignored on non-x86 platforms in QEMU. Given the increased interest in ARM in particular, should we use the addition of this new 'virt' machine type, as an opportunity to introduce versioning for ARM too. eg make this machine be called 'virt-1.0.6' and then have 'virt' simply be an alias that points to the most recent version. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] [PATCH] target-mips: fix decoding of microMIPS POOL32Axf instructions
On 05/08/13 11:50, Aurelien Jarno wrote: On Mon, Aug 05, 2013 at 08:41:52AM +0100, Leon Alrae wrote: On 03/08/13 23:01, Aurelien Jarno wrote: On Thu, Aug 01, 2013 at 11:02:27AM +0100, Leon Alrae wrote: These are not DSP instructions, thus there is no ac field. For more details please refer to instruction encoding of MULT, MULTU, MADD, MADDU, MSUB, MSUBU, MFHI, MFLO, MTHI, MTLO in MIPS Architecture for Programmers Volume II-B: The microMIPS32 Instruction Set These instructions have both a non DSP version without the ac field, and a DSP version with the ac field. The encoding of the ac field is done in bits 14 and 15, so the current QEMU code is correct. Please refer to the MIPS32® Architecture Manual Volume IV-e: The MIPS® DSP Application-Specific Extension to the microMIPS32® Architecture (document MD00764). Please note that the patch modifies non-DSP version of listed instructions, where ac field doesn't exist. In this case reading bits This is correct, except for MFHI, MFLO, MTHI, MTLO which share the same opcode for the DSP and non-DSP version. Theses instructions have bits 14 and 15 set to 0 for the non-DSP version, so they are working as expected and thus your patch should not modify them. As far as microMIPS is concerned - MFHI, MFLO, MTHI, MTLO don't seem to share the same opcode for the DSP and non-DSP version. It is true that non-DSP version has bits 14 and 15 set to zero. However, according to already mentioned documents, bits 11..6 (extension) are different in the DSP (set to 0x35) and non-DSP version (set to 0x1). Therefore, we shouldn't read bits 14 and 15 in case of non-DSP version, and we should have a separate entry for DSP version. Just to make sure that we are refering to the same thing :) For MFHI: page no. 303 in MIPS Architecture for Programmers Volume II-B: The microMIPS32 Instruction Set (document MD00764) page no. 140 in MIPS Architecture for Programmers Volume IV-e: The MIPS DSP Module for the microMIPS32 Architecture (document MD00582) 14 and 15 and passing it as acc argument to gen_muldiv() or gen_HILO() is not correct. If those bits are not 0 (and for most of the listed instructions this is true) then check_dsp() is called which will cause an exception if DSP is not supported / disabled. This is correct. The current code assumes that all instructions using gen_muldiv() have the same opcode for non-DSP and DSP version (actually it's weird that they have a different encoding, it's just wasting some opcode space). Therefore what should be done: - The part concerning MFHI, MFLO, MTHI, MTLO should be dropped from your patch as it already works correctly. - The part of your patch concerning instructions calling gen_muldiv() is correct and should be kept to handle the non-DSP version of these instructions. - An entry with for the DSP version of these instructions should be created, calling gen_muldiv() passing the ac field from bits 14 and 15. This is basically the same code than now, just with extension 0x32 instead of 0x2c. OK - I will update the patch, but I would leave the part concerning MFHI, MFLO, MTHI, MTLO as it seems to be correct, but the new entry will be added for DSP version. Regards, Leon
[Qemu-devel] [PATCH for-1.6] qemu-iotests: filter QEMU version in monitor banner
Filter out the QEMU monitor version banner so that tests do not break when the QEMU version number is changed. Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- tests/qemu-iotests/051.out | 64 tests/qemu-iotests/common.filter | 3 +- 2 files changed, 34 insertions(+), 33 deletions(-) diff --git a/tests/qemu-iotests/051.out b/tests/qemu-iotests/051.out index 9588d0c..5582ed3 100644 --- a/tests/qemu-iotests/051.out +++ b/tests/qemu-iotests/051.out @@ -23,11 +23,11 @@ QEMU_PROG: -drive file=TEST_DIR/t.qcow2,format=qcow2,unknown_opt=foo: could not === Enable and disable lazy refcounting on the command line, plus some invalid values === Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=on -QEMU 1.5.50 monitor - type 'help' for more information +QEMU X.Y.Z monitor - type 'help' for more information (qemu) q[K[Dqu[K[D[Dqui[K[D[D[Dquit[K Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=off -QEMU 1.5.50 monitor - type 'help' for more information +QEMU X.Y.Z monitor - type 'help' for more information (qemu) q[K[Dqu[K[D[Dqui[K[D[D[Dquit[K Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts= @@ -51,72 +51,72 @@ QEMU_PROG: -drive file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=on: Lazy ref QEMU_PROG: -drive file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=on: could not open disk image TEST_DIR/t.qcow2: Invalid argument Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=off -QEMU 1.5.50 monitor - type 'help' for more information +QEMU X.Y.Z monitor - type 'help' for more information (qemu) q[K[Dqu[K[D[Dqui[K[D[D[Dquit[K === No medium === Testing: -drive if=floppy -QEMU 1.5.50 monitor - type 'help' for more information +QEMU X.Y.Z monitor - type 'help' for more information (qemu) q[K[Dqu[K[D[Dqui[K[D[D[Dquit[K Testing: -drive if=ide,media=cdrom -QEMU 1.5.50 monitor - type 'help' for more information +QEMU X.Y.Z monitor - type 'help' for more information (qemu) q[K[Dqu[K[D[Dqui[K[D[D[Dquit[K Testing: -drive if=scsi,media=cdrom -QEMU 1.5.50 monitor - type 'help' for more information +QEMU X.Y.Z monitor - type 'help' for more information (qemu) q[K[Dqu[K[D[Dqui[K[D[D[Dquit[K Testing: -drive if=ide -QEMU 1.5.50 monitor - type 'help' for more information +QEMU X.Y.Z monitor - type 'help' for more information (qemu) QEMU_PROG: Device needs media, but drive is empty QEMU_PROG: Device initialization failed. QEMU_PROG: Initialization of device ide-hd failed Testing: -drive if=virtio -QEMU 1.5.50 monitor - type 'help' for more information +QEMU X.Y.Z monitor - type 'help' for more information (qemu) QEMU_PROG: -drive if=virtio: Device needs media, but drive is empty QEMU_PROG: -drive if=virtio: Device initialization failed. QEMU_PROG: -drive if=virtio: Device initialization failed. QEMU_PROG: -drive if=virtio: Device 'virtio-blk-pci' could not be initialized Testing: -drive if=scsi -QEMU 1.5.50 monitor - type 'help' for more information +QEMU X.Y.Z monitor - type 'help' for more information (qemu) QEMU_PROG: -drive if=scsi: Device needs media, but drive is empty QEMU_PROG: -drive if=scsi: Device initialization failed. QEMU_PROG: Device initialization failed. QEMU_PROG: Initialization of device lsi53c895a failed Testing: -drive if=none,id=disk -device ide-cd,drive=disk -QEMU 1.5.50 monitor - type 'help' for more information +QEMU X.Y.Z monitor - type 'help' for more information (qemu) q[K[Dqu[K[D[Dqui[K[D[D[Dquit[K Testing: -drive if=none,id=disk -device lsi53c895a -device scsi-cd,drive=disk -QEMU 1.5.50 monitor - type 'help' for more information +QEMU X.Y.Z monitor - type 'help' for more information (qemu) q[K[Dqu[K[D[Dqui[K[D[D[Dquit[K Testing: -drive if=none,id=disk -device ide-drive,drive=disk -QEMU 1.5.50 monitor - type 'help' for more information +QEMU X.Y.Z monitor - type 'help' for more information (qemu) QEMU_PROG: -device ide-drive,drive=disk: Device needs media, but drive is empty QEMU_PROG: -device ide-drive,drive=disk: Device initialization failed. QEMU_PROG: -device ide-drive,drive=disk: Device 'ide-drive' could not be initialized Testing: -drive if=none,id=disk -device ide-hd,drive=disk -QEMU 1.5.50 monitor - type 'help' for more information +QEMU X.Y.Z monitor - type 'help' for more information (qemu) QEMU_PROG: -device ide-hd,drive=disk: Device needs media, but drive is empty QEMU_PROG: -device ide-hd,drive=disk: Device initialization failed. QEMU_PROG: -device ide-hd,drive=disk: Device 'ide-hd' could not be initialized Testing: -drive if=none,id=disk -device lsi53c895a -device scsi-disk,drive=disk -QEMU 1.5.50 monitor - type 'help' for more information +QEMU X.Y.Z monitor - type 'help' for more information (qemu) QEMU_PROG: -device scsi-disk,drive=disk: Device needs media, but drive is empty QEMU_PROG: -device
Re: [Qemu-devel] Versioned machine types for ARM/non-x86 ? (Was Re: [PATCH v4 0/2] ARM: add 'virt' platform)
On 5 August 2013 13:49, Daniel P. Berrange berra...@redhat.com wrote: On x86, we've long had versioned machine names, so that we can make changes in future QEMU releases without breaking guest ABI compatibility. AFAICT, the problem has basically been ignored on non-x86 platforms in QEMU. Yes; this is deliberate on the basis that starting to do this is accepting a huge pile of maintenance workload (ie checking for things which change, keeping around a pile of old version machine models, retaining migration compatibility between old and new versions). Which isn't to say I'm against it but it means I'm not doing it until the pushback from users that it's necessary is pretty strong. Given the increased interest in ARM in particular, should we use the addition of this new 'virt' machine type, as an opportunity to introduce versioning for ARM too. eg make this machine be called 'virt-1.0.6' and then have 'virt' simply be an alias that points to the most recent version. I'm not convinced we're at the point where we need to do this yet. -- PMM
[Qemu-devel] FW: [Xen-devel]Cirrus VGA slow screen update, show blank screen last 13s or so for windows XP guest
Hi, -Original Message- From: Gonglei (Arei) Sent: Tuesday, July 30, 2013 10:01 AM To: 'Pasi Kärkkäinen' Cc: Gerd Hoffmann; Andreas Färber; Hanweidong; Luonengjun; qemu-devel@nongnu.org; xen-de...@lists.xen.org; Anthony Liguori; Huangweidong (Hardware); 'ian.jack...@eu.citrix.com'; Anthony Liguori; 'aligu...@us.ibm.com' Subject: RE: [Xen-devel] [Qemu-devel] Cirrus VGA slow screen update, show blank screen last 13s or so for windows XP guest On Mon, Jul 29, 2013 at 08:48:54AM +, Gonglei (Arei) wrote: -Original Message- From: Pasi Kärkkäinen [mailto:pa...@iki.fi] Sent: Saturday, July 27, 2013 7:51 PM To: Gerd Hoffmann Cc: Andreas Färber; Hanweidong; Luonengjun; qemu-devel@nongnu.org; xen-de...@lists.xen.org; Gonglei (Arei); Anthony Liguori; Huangweidong (Hardware) Subject: Re: [Xen-devel] [Qemu-devel] Cirrus VGA slow screen update, show blank screen last 13s or so for windows XP guest On Fri, Jul 26, 2013 at 12:19:16PM +0200, Gerd Hoffmann wrote: Maybe the xen guys did some optimizations in qemu-dm which where not merged upstream. Try asking @ xen-devel. Yeah, xen qemu-dm must have some optimization for cirrus that isn't in upstream qemu. Hi, Pasi. Would you give me some more details? Thanks! Unfortunately I don't have any more details.. I was just assuming if xen qemu-dm is fast, but upstream qemu is slow, there might be some optimization in xen qemu-dm .. Did you check the xen qemu-dm (traditional) history / commits? -- Pasi Yes, I did. I tried to reproduce the issue with qemu-dm by fall backing history commit. But the qemu-dm mainline merged other branches, such as branch 'upstream' and branch 'qemu', so that I can not complied the xen-4.1.2/tools project well. CC'ing Ian and Anthony. By analyzing xentrace data, I found that there is lot's of VMEXIT between memory region 0xa~0xa with upstream qemu, but only 256 VMEXIT on traditional qemu-dm, when Windows XP guest booting up or change the method connecting the VM from VNC(RDP) to RDP(VNC): linux-sAGhxH:# cat qemu-upstrem.log |grep 0x000a|wc -l 640654 linux-sAGhxH:# cat qemu-dm.log |grep 0x000a|wc -l 256 And the 256 VMEXIT of qemu-dm as the same as the top 256 VMEXIT of qemu-upstream linux-sAGhxH:# cat qemu-upstream.log |grep 0x000a| tail CPU0 0 (+ 0) NPF [ gpa = 0x000a0b08 mfn = 0x qual = 0x0182 p2mt = 0x0004 ] CPU0 0 (+ 0) NPF [ gpa = 0x000a0b0a mfn = 0x qual = 0x0182 p2mt = 0x0004 ] CPU0 0 (+ 0) NPF [ gpa = 0x000a0b0c mfn = 0x qual = 0x0182 p2mt = 0x0004 ] CPU0 0 (+ 0) NPF [ gpa = 0x000a0b0e mfn = 0x qual = 0x0182 p2mt = 0x0004 ] CPU0 0 (+ 0) NPF [ gpa = 0x000a0b10 mfn = 0x qual = 0x0182 p2mt = 0x0004 ] CPU0 0 (+ 0) NPF [ gpa = 0x000a0b12 mfn = 0x qual = 0x0182 p2mt = 0x0004 ] CPU0 0 (+ 0) NPF [ gpa = 0x000a0b14 mfn = 0x qual = 0x0182 p2mt = 0x0004 ] CPU0 0 (+ 0) NPF [ gpa = 0x000a0b16 mfn = 0x qual = 0x0182 p2mt = 0x0004 ] CPU0 0 (+ 0) NPF [ gpa = 0x000a0b18 mfn = 0x qual = 0x0182 p2mt = 0x0004 ] CPU0 0 (+ 0) NPF [ gpa = 0x000a0b1a mfn = 0x qual = 0x0182 p2mt = 0x0004 ] linux-sAGhxH:# cat qemu-dm.log |grep 0x000a| tail CPU2 0 (+ 0) NPF [ gpa = 0x000a1ec0 mfn = 0x qual = 0x0182 p2mt = 0x0004 ] CPU2 0 (+ 0) NPF [ gpa = 0x000a1ee0 mfn = 0x qual = 0x0182 p2mt = 0x0004 ] CPU2 0 (+ 0) NPF [ gpa = 0x000a1f00 mfn = 0x qual = 0x0182 p2mt = 0x0004 ] CPU2 0 (+ 0) NPF [ gpa = 0x000a1f20 mfn = 0x qual = 0x0182 p2mt = 0x0004 ] CPU2 0 (+ 0) NPF [ gpa = 0x000a1f40 mfn = 0x qual = 0x0182 p2mt = 0x0004 ] CPU2 0 (+ 0) NPF [ gpa = 0x000a1f60 mfn = 0x qual = 0x0182 p2mt = 0x0004 ] CPU2 0 (+ 0) NPF [ gpa = 0x000a1f80 mfn = 0x qual = 0x0182 p2mt = 0x0004 ] CPU2 0 (+ 0) NPF [ gpa = 0x000a1fa0 mfn = 0x qual = 0x0182 p2mt = 0x0004 ] CPU2 0 (+ 0) NPF [ gpa = 0x000a1fc0 mfn = 0x qual = 0x0182 p2mt = 0x0004 ] CPU2 0 (+ 0) NPF [ gpa = 0x000a1fe0 mfn = 0x qual = 0x0182 p2mt = 0x0004 ] Please see attachment for more information. Because of the lot's of VMEXIT, the cirrus_vga_ioport_write funciton receive little scheduling, and cirrus_vga_write_gr execution interval of
Re: [Qemu-devel] [PATCH v5 00/21] AArch64 preparation patchset
On 18 July 2013 14:31, Peter Maydell peter.mayd...@linaro.org wrote: On 1 July 2013 20:07, Peter Maydell peter.mayd...@linaro.org wrote: On 1 July 2013 18:34, Peter Maydell peter.mayd...@linaro.org wrote: This patchset is v5 of the preparation patchset that started off with Alex, was passed to John Rigby and now to me. Also available via git: git://git.linaro.org/people/pmaydell/qemu-arm.git aarch64 (caution: branch likely to rebase in future) I'm not going to bother reposting the whole patchset since the changes are trivial, but for the benefit of people basing other patchsets on top of this, I've rebased it on current master and updated the git branch above. Git branch updated again on current master. -- PMM
Re: [Qemu-devel] [ANNOUNCE] QEMU 1.6.0-rc1 is now available
On Mon, 05 Aug 2013 14:01:10 +0200 Laszlo Ersek ler...@redhat.com wrote: Please do consider it for 1.6, it addresses a serious bug (= vmcores saved with dump-guest-memory/paging=false are broken for 3.5G x86(_64) guests). No worries, it is going to be included (unless someone has a serious objection, of course).
[Qemu-devel] [PATCH for-1.6? v2 00/21] qtest: Test all targets
Hello Anthony/Aurélien, This series extends test coverage to all 16 targets. For now it tests that QOM type changes do not lead to QOM cast assertions. v2 extends it to cover virtually all machines (except Xen and pc*-x.y). Where an fprintf() is touched, use error_report() instead. Regards, Andreas v1 - v2: * gumstix, z2: Avoided conditionalizing use of pflash device in favor of NULL bdrv. * puv3: Limited qtest workaround to a NULL kernel_filename. * Added error workarounds for milkymist, ppc405, shix and leon3. * Cleaned up debug output for ppc405 and shix. * Extended qom-test to cover virtually all machines, including n800 and pc. * Moved machine names to arrays wherever sensible, to aid with extensibility. * Adopted error_report() for armv7m, too. Cc: Anthony Liguori anth...@codemonkey.ws Cc: Aurélien Jarno aurel...@aurel32.net Cc: Peter Maydell peter.mayd...@linaro.org Andreas Färber (21): mips_mipssim: Silence BIOS loading warning for qtest arm/boot: Turn arm_load_kernel() into no-op for qtest without -kernel puv3: Turn puv3_load_kernel() into a no-op for qtest without -kernel mainstone: Don't enforce use of -pflash for qtest gumstix: Don't enforce use of -pflash for qtest z2: Don't enforce use of -pflash for qtest palm: Don't enforce loading ROM or kernel for qtest omap_sx1: Don't enforce use of kernel or flash for qtest exynos4_boards: Silence lack of -smp 2 warning for qtest armv7m: Don't enforce use of kernel for qtest axis_dev88: Don't enforce use of kernel for qtest mcf5208: Don't enforce use of kernel for qtest an5206: Don't enforce use of kernel for qtest milkymist: Suppress -kernel/-bios/-drive error for qtest ppc405_boards: Disable debug output ppc405_uc: Disable debug output ppc405_boards: Don't enforce presence of firmware for qtest shix: Drop debug output shix: Don't require firmware presence for qtest leon3: Don't enforce use of -bios with qtest qtest: Prepare QOM machine tests hw/arm/armv7m.c | 25 +++-- hw/arm/boot.c | 4 + hw/arm/exynos4_boards.c | 3 +- hw/arm/gumstix.c| 11 +- hw/arm/mainstone.c | 5 +- hw/arm/omap_sx1.c | 3 +- hw/arm/palm.c | 3 +- hw/arm/z2.c | 5 +- hw/block/tc58128.c | 10 +- hw/cris/axis_dev88.c| 11 +- hw/lm32/milkymist.c | 3 +- hw/m68k/an5206.c| 4 + hw/m68k/mcf5208.c | 4 + hw/mips/mips_mipssim.c | 4 +- hw/ppc/ppc405_boards.c | 39 --- hw/ppc/ppc405_uc.c | 16 +-- hw/sh4/shix.c | 16 +-- hw/sparc/leon3.c| 3 +- hw/unicore32/puv3.c | 4 + tests/Makefile | 26 + tests/qom-test.c| 280 21 files changed, 410 insertions(+), 69 deletions(-) create mode 100644 tests/qom-test.c -- 1.8.1.4
[Qemu-devel] [PATCH for-1.6? v2 06/21] z2: Don't enforce use of -pflash for qtest
Signed-off-by: Andreas Färber afaer...@suse.de --- hw/arm/z2.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/hw/arm/z2.c b/hw/arm/z2.c index 07a127b..a4cbaeb 100644 --- a/hw/arm/z2.c +++ b/hw/arm/z2.c @@ -24,6 +24,7 @@ #include ui/console.h #include audio/audio.h #include exec/address-spaces.h +#include sysemu/qtest.h #ifdef DEBUG_Z2 #define DPRINTF(fmt, ...) \ @@ -323,7 +324,7 @@ static void z2_init(QEMUMachineInitArgs *args) be = 0; #endif dinfo = drive_get(IF_PFLASH, 0, 0); -if (!dinfo) { +if (!dinfo !qtest_enabled()) { fprintf(stderr, Flash image must be given with the 'pflash' parameter\n); exit(1); @@ -331,7 +332,7 @@ static void z2_init(QEMUMachineInitArgs *args) if (!pflash_cfi01_register(Z2_FLASH_BASE, NULL, z2.flash0, Z2_FLASH_SIZE, - dinfo-bdrv, sector_len, + dinfo ? dinfo-bdrv : NULL, sector_len, Z2_FLASH_SIZE / sector_len, 4, 0, 0, 0, 0, be)) { fprintf(stderr, qemu: Error registering flash memory.\n); -- 1.8.1.4
[Qemu-devel] [PATCH for-1.6? v2 02/21] arm/boot: Turn arm_load_kernel() into no-op for qtest without -kernel
Signed-off-by: Andreas Färber afaer...@suse.de --- hw/arm/boot.c | 4 1 file changed, 4 insertions(+) diff --git a/hw/arm/boot.c b/hw/arm/boot.c index 2cbeefd..0c6d611 100644 --- a/hw/arm/boot.c +++ b/hw/arm/boot.c @@ -15,6 +15,7 @@ #include hw/loader.h #include elf.h #include sysemu/device_tree.h +#include sysemu/qtest.h #include qemu/config-file.h #define KERNEL_ARGS_ADDR 0x100 @@ -354,6 +355,9 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info) /* Load the kernel. */ if (!info-kernel_filename) { +if (qtest_enabled()) { +return; +} fprintf(stderr, Kernel image must be specified\n); exit(1); } -- 1.8.1.4
[Qemu-devel] [PATCH for-1.6? v2 01/21] mips_mipssim: Silence BIOS loading warning for qtest
Signed-off-by: Andreas Färber afaer...@suse.de --- hw/mips/mips_mipssim.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/hw/mips/mips_mipssim.c b/hw/mips/mips_mipssim.c index fea1a15..d8c4347 100644 --- a/hw/mips/mips_mipssim.c +++ b/hw/mips/mips_mipssim.c @@ -37,6 +37,7 @@ #include elf.h #include hw/sysbus.h #include exec/address-spaces.h +#include sysemu/qtest.h static struct _loaderparams { int ram_size; @@ -189,7 +190,8 @@ mips_mipssim_init(QEMUMachineInitArgs *args) } else { bios_size = -1; } -if ((bios_size 0 || bios_size BIOS_SIZE) !kernel_filename) { +if ((bios_size 0 || bios_size BIOS_SIZE) +!kernel_filename !qtest_enabled()) { /* Bail out if we have neither a kernel image nor boot vector code. */ fprintf(stderr, qemu: Warning, could not load MIPS bios '%s', and no -kernel argument was specified\n, -- 1.8.1.4
[Qemu-devel] [PATCH for-1.6? v2 15/21] ppc405_boards: Disable debug output
Also move one stray debug output into an #ifdef. Signed-off-by: Andreas Färber afaer...@suse.de --- hw/ppc/ppc405_boards.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c index f74e5e5..807ada0 100644 --- a/hw/ppc/ppc405_boards.c +++ b/hw/ppc/ppc405_boards.c @@ -42,7 +42,7 @@ #define USE_FLASH_BIOS -#define DEBUG_BOARD_INIT +//#define DEBUG_BOARD_INIT /*/ /* PPC405EP reference board (IBM) */ @@ -353,9 +353,9 @@ static void ref405ep_init(QEMUMachineInitArgs *args) bdloc = 0; } #ifdef DEBUG_BOARD_INIT +printf(bdloc RAM_ADDR_FMT \n, bdloc); printf(%s: Done\n, __func__); #endif -printf(bdloc RAM_ADDR_FMT \n, bdloc); } static QEMUMachine ref405ep_machine = { -- 1.8.1.4
[Qemu-devel] [PATCH for-1.6? v2 11/21] axis_dev88: Don't enforce use of kernel for qtest
Signed-off-by: Andreas Färber afaer...@suse.de --- hw/cris/axis_dev88.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/hw/cris/axis_dev88.c b/hw/cris/axis_dev88.c index 9104d61..6673c66 100644 --- a/hw/cris/axis_dev88.c +++ b/hw/cris/axis_dev88.c @@ -32,6 +32,7 @@ #include boot.h #include sysemu/blockdev.h #include exec/address-spaces.h +#include sysemu/qtest.h #define D(x) #define DNAND(x) @@ -340,14 +341,14 @@ void axisdev88_init(QEMUMachineInitArgs *args) irq[0x14 + i]); } -if (!kernel_filename) { +if (kernel_filename) { +li.image_filename = kernel_filename; +li.cmdline = kernel_cmdline; +cris_load_image(cpu, li); +} else if (!qtest_enabled()) { fprintf(stderr, Kernel image must be specified\n); exit(1); } - -li.image_filename = kernel_filename; -li.cmdline = kernel_cmdline; -cris_load_image(cpu, li); } static QEMUMachine axisdev88_machine = { -- 1.8.1.4
[Qemu-devel] [PATCH for-1.6? v2 05/21] gumstix: Don't enforce use of -pflash for qtest
Signed-off-by: Andreas Färber afaer...@suse.de --- hw/arm/gumstix.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/hw/arm/gumstix.c b/hw/arm/gumstix.c index b8cab10..a9015ed 100644 --- a/hw/arm/gumstix.c +++ b/hw/arm/gumstix.c @@ -42,6 +42,7 @@ #include hw/boards.h #include sysemu/blockdev.h #include exec/address-spaces.h +#include sysemu/qtest.h static const int sector_len = 128 * 1024; @@ -58,7 +59,7 @@ static void connex_init(QEMUMachineInitArgs *args) cpu = pxa255_init(address_space_mem, connex_ram); dinfo = drive_get(IF_PFLASH, 0, 0); -if (!dinfo) { +if (!dinfo !qtest_enabled()) { fprintf(stderr, A flash image must be given with the 'pflash' parameter\n); exit(1); @@ -70,7 +71,8 @@ static void connex_init(QEMUMachineInitArgs *args) be = 0; #endif if (!pflash_cfi01_register(0x, NULL, connext.rom, connex_rom, - dinfo-bdrv, sector_len, connex_rom / sector_len, + dinfo ? dinfo-bdrv : NULL, + sector_len, connex_rom / sector_len, 2, 0, 0, 0, 0, be)) { fprintf(stderr, qemu: Error registering flash memory.\n); exit(1); @@ -95,7 +97,7 @@ static void verdex_init(QEMUMachineInitArgs *args) cpu = pxa270_init(address_space_mem, verdex_ram, cpu_model ?: pxa270-c0); dinfo = drive_get(IF_PFLASH, 0, 0); -if (!dinfo) { +if (!dinfo !qtest_enabled()) { fprintf(stderr, A flash image must be given with the 'pflash' parameter\n); exit(1); @@ -107,7 +109,8 @@ static void verdex_init(QEMUMachineInitArgs *args) be = 0; #endif if (!pflash_cfi01_register(0x, NULL, verdex.rom, verdex_rom, - dinfo-bdrv, sector_len, verdex_rom / sector_len, + dinfo ? dinfo-bdrv : NULL, + sector_len, verdex_rom / sector_len, 2, 0, 0, 0, 0, be)) { fprintf(stderr, qemu: Error registering flash memory.\n); exit(1); -- 1.8.1.4
[Qemu-devel] [PATCH for-1.6? v2 04/21] mainstone: Don't enforce use of -pflash for qtest
Simply skip flash setup for now. Also drop useless debug output. Signed-off-by: Andreas Färber afaer...@suse.de --- hw/arm/mainstone.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/hw/arm/mainstone.c b/hw/arm/mainstone.c index 8e5fc26..e9d4b98 100644 --- a/hw/arm/mainstone.c +++ b/hw/arm/mainstone.c @@ -21,6 +21,7 @@ #include sysemu/blockdev.h #include hw/sysbus.h #include exec/address-spaces.h +#include sysemu/qtest.h /* Device addresses */ #define MST_FPGA_PHYS 0x0800 @@ -127,6 +128,9 @@ static void mainstone_common_init(MemoryRegion *address_space_mem, for (i = 0; i 2; i ++) { dinfo = drive_get(IF_PFLASH, 0, i); if (!dinfo) { +if (qtest_enabled()) { +break; +} fprintf(stderr, Two flash images must be given with the 'pflash' parameter\n); exit(1); @@ -147,7 +151,6 @@ static void mainstone_common_init(MemoryRegion *address_space_mem, qdev_get_gpio_in(mpu-gpio, 0)); /* setup keypad */ -printf(map addr %p\n, map); pxa27x_register_keypad(mpu-kp, map, 0xe0); /* MMC/SD host */ -- 1.8.1.4
[Qemu-devel] [PATCH for-1.6? v2 07/21] palm: Don't enforce loading ROM or kernel for qtest
Signed-off-by: Andreas Färber afaer...@suse.de --- hw/arm/palm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/hw/arm/palm.c b/hw/arm/palm.c index cdc3c3a..dc4bbcd 100644 --- a/hw/arm/palm.c +++ b/hw/arm/palm.c @@ -19,6 +19,7 @@ #include hw/hw.h #include audio/audio.h #include sysemu/sysemu.h +#include sysemu/qtest.h #include ui/console.h #include hw/arm/omap.h #include hw/boards.h @@ -255,7 +256,7 @@ static void palmte_init(QEMUMachineInitArgs *args) } } -if (!rom_loaded !kernel_filename) { +if (!rom_loaded !kernel_filename !qtest_enabled()) { fprintf(stderr, Kernel or ROM image must be specified\n); exit(1); } -- 1.8.1.4
Re: [Qemu-devel] Versioned machine types for ARM/non-x86 ? (Was Re: [PATCH v4 0/2] ARM: add 'virt' platform)
Daniel P. Berrange berra...@redhat.com writes: On Mon, Aug 05, 2013 at 12:18:10PM +0100, Peter Maydell wrote: This patch series adds a 'virt' platform which uses the kernel's mach-virt (fully device-tree driven) support to create a simple minimalist platform intended for use for KVM VM guests. It's based on John Rigby's patches, but I've overhauled it a lot: On x86, we've long had versioned machine names, so that we can make changes in future QEMU releases without breaking guest ABI compatibility. AFAICT, the problem has basically been ignored on non-x86 platforms in QEMU. Given the increased interest in ARM in particular, should we use the addition of this new 'virt' machine type, as an opportunity to introduce versioning for ARM too. eg make this machine be called 'virt-1.0.6' and then have 'virt' simply be an alias that points to the most recent version. I've been thinking about this for SPAPR too. Like virt, I'm not sure the platform is stable enough for this but I expect it to be soon. However, unlike PC, I'd like to do linear versioning and avoid bumping at every release. IOW, spapr-1, spapr-2, spapr-3, etc. I think virt ought to try to do the same. Regards, Anthony Liguori Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
[Qemu-devel] [PATCH for-1.6? v2 14/21] milkymist: Suppress -kernel/-bios/-drive error for qtest
Signed-off-by: Andreas Färber afaer...@suse.de --- hw/lm32/milkymist.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/hw/lm32/milkymist.c b/hw/lm32/milkymist.c index 7ceedb8..a8e7230 100644 --- a/hw/lm32/milkymist.c +++ b/hw/lm32/milkymist.c @@ -21,6 +21,7 @@ #include hw/hw.h #include hw/block/flash.h #include sysemu/sysemu.h +#include sysemu/qtest.h #include hw/devices.h #include hw/boards.h #include hw/loader.h @@ -143,7 +144,7 @@ milkymist_init(QEMUMachineInitArgs *args) reset_info-bootstrap_pc = BIOS_OFFSET; /* if no kernel is given no valid bios rom is a fatal error */ -if (!kernel_filename !dinfo !bios_filename) { +if (!kernel_filename !dinfo !bios_filename !qtest_enabled()) { fprintf(stderr, qemu: could not load Milkymist One bios '%s'\n, bios_name); exit(1); -- 1.8.1.4
[Qemu-devel] [PATCH for-1.6? v2 17/21] ppc405_boards: Don't enforce presence of firmware for qtest
Adopt error_report() while at it. Signed-off-by: Andreas Färber afaer...@suse.de --- hw/ppc/ppc405_boards.c | 35 --- 1 file changed, 20 insertions(+), 15 deletions(-) diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c index 807ada0..75b2177 100644 --- a/hw/ppc/ppc405_boards.c +++ b/hw/ppc/ppc405_boards.c @@ -27,9 +27,11 @@ #include hw/timer/m48t59.h #include hw/block/flash.h #include sysemu/sysemu.h +#include sysemu/qtest.h #include block/block.h #include hw/boards.h #include qemu/log.h +#include qemu/error-report.h #include hw/loader.h #include sysemu/blockdev.h #include exec/address-spaces.h @@ -252,17 +254,20 @@ static void ref405ep_init(QEMUMachineInitArgs *args) if (filename) { bios_size = load_image(filename, memory_region_get_ram_ptr(bios)); g_free(filename); +if (bios_size 0 || bios_size BIOS_SIZE) { +error_report(Could not load PowerPC BIOS '%s', bios_name); +exit(1); +} +bios_size = (bios_size + 0xfff) ~0xfff; +memory_region_add_subregion(sysmem, (uint32_t)(-bios_size), bios); +} else if (!qtest_enabled() || kernel_filename != NULL) { +error_report(Could not load PowerPC BIOS '%s', bios_name); +exit(1); } else { +/* Avoid an uninitialized variable warning */ bios_size = -1; } -if (bios_size 0 || bios_size BIOS_SIZE) { -fprintf(stderr, qemu: could not load PowerPC bios '%s'\n, -bios_name); -exit(1); -} -bios_size = (bios_size + 0xfff) ~0xfff; memory_region_set_readonly(bios, true); -memory_region_add_subregion(sysmem, (uint32_t)(-bios_size), bios); } /* Register FPGA */ #ifdef DEBUG_BOARD_INIT @@ -569,17 +574,17 @@ static void taihu_405ep_init(QEMUMachineInitArgs *args) if (filename) { bios_size = load_image(filename, memory_region_get_ram_ptr(bios)); g_free(filename); -} else { -bios_size = -1; -} -if (bios_size 0 || bios_size BIOS_SIZE) { -fprintf(stderr, qemu: could not load PowerPC bios '%s'\n, -bios_name); +if (bios_size 0 || bios_size BIOS_SIZE) { +error_report(Could not load PowerPC BIOS '%s', bios_name); +exit(1); +} +bios_size = (bios_size + 0xfff) ~0xfff; +memory_region_add_subregion(sysmem, (uint32_t)(-bios_size), bios); +} else if (!qtest_enabled()) { +error_report(Could not load PowerPC BIOS '%s', bios_name); exit(1); } -bios_size = (bios_size + 0xfff) ~0xfff; memory_region_set_readonly(bios, true); -memory_region_add_subregion(sysmem, (uint32_t)(-bios_size), bios); } /* Register Linux flash */ dinfo = drive_get(IF_PFLASH, 0, fl_idx); -- 1.8.1.4