Re: [Qemu-devel] [PATCH 1/8] Add spent time for migration
On 06/22/2012 04:46 PM, Juan Quintela wrote: We add time spent for migration to the output of info migrate command. 'total_time' means time since the start fo migration if migration is 'active', and total time of migration if migration is completed. As we are also interested in transferred ram when migration completes, adding all ram statistics Signed-off-by: Juan Quintela quint...@redhat.com --- hmp.c|2 ++ migration.c | 11 +++ migration.h |1 + qapi-schema.json | 12 +--- 4 files changed, 23 insertions(+), 3 deletions(-) diff --git a/hmp.c b/hmp.c index b9cec1d..4c6d4ae 100644 --- a/hmp.c +++ b/hmp.c @@ -145,6 +145,8 @@ void hmp_info_migrate(Monitor *mon) info-ram-remaining 10); monitor_printf(mon, total ram: % PRIu64 kbytes\n, info-ram-total 10); +monitor_printf(mon, total time: % PRIu64 milliseconds\n, + info-ram-total_time); } if (info-has_disk) { diff --git a/migration.c b/migration.c index 3f485d3..599bb6c 100644 --- a/migration.c +++ b/migration.c @@ -131,6 +131,8 @@ MigrationInfo *qmp_query_migrate(Error **errp) info-ram-transferred = ram_bytes_transferred(); info-ram-remaining = ram_bytes_remaining(); info-ram-total = ram_bytes_total(); +info-ram-total_time = qemu_get_clock_ms(rt_clock) +- s-total_time; if (blk_mig_active()) { info-has_disk = true; @@ -143,6 +145,13 @@ MigrationInfo *qmp_query_migrate(Error **errp) case MIG_STATE_COMPLETED: info-has_status = true; info-status = g_strdup(completed); + +info-has_ram = true; +info-ram = g_malloc0(sizeof(*info-ram)); +info-ram-transferred = ram_bytes_transferred(); +info-ram-remaining = 0; +info-ram-total = ram_bytes_total(); +info-ram-total_time = s-total_time; break; case MIG_STATE_ERROR: info-has_status = true; @@ -260,6 +269,7 @@ static void migrate_fd_put_ready(void *opaque) } else { migrate_fd_completed(s); } +s-total_time = qemu_get_clock_ms(rt_clock) - s-total_time; if (s-state != MIG_STATE_COMPLETED) { if (old_vm_running) { vm_start(); @@ -373,6 +383,7 @@ static MigrationState *migrate_init(int blk, int inc) s-bandwidth_limit = bandwidth_limit; s-state = MIG_STATE_SETUP; +s-total_time = qemu_get_clock_ms(rt_clock); return s; } diff --git a/migration.h b/migration.h index 2e9ca2e..165b27b 100644 --- a/migration.h +++ b/migration.h @@ -33,6 +33,7 @@ struct MigrationState void *opaque; int blk; int shared; +int64_t total_time; }; void process_incoming_migration(QEMUFile *f); diff --git a/qapi-schema.json b/qapi-schema.json index 3b6e346..1ab5dbd 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -260,10 +260,15 @@ # # @total: total amount of bytes involved in the migration process # +# @total_time: tota0l amount of ms since migration started. If +#migration has ended, it returns the total migration +#time. (since 1.2) +# # Since: 0.14.0. ## { 'type': 'MigrationStats', - 'data': {'transferred': 'int', 'remaining': 'int', 'total': 'int' } } + 'data': {'transferred': 'int', 'remaining': 'int', 'total': 'int' , + 'total_time': 'int' } } ## # @MigrationInfo @@ -275,8 +280,9 @@ # 'cancelled'. If this field is not returned, no migration process # has been initiated # -# @ram: #optional @MigrationStats containing detailed migration status, -# only returned if status is 'active' +# @ram: #optional @MigrationStats containing detailed migration +# status, only returned if status is 'active' or +# 'completed'. 'comppleted' (since 1.2) # # @disk: #optional @MigrationStats containing detailed disk migration #status, only returned if status is 'active' and it is a block Reviewed-by: Orit Wasserman owass...@redhat.com
Re: [Qemu-devel] [PATCHv3 02/14] unicore32-softmmu: Add coprocessor 0(sysctrl) and 1(ocd) instruction support
On Mon, Jun 25, 2012 at 11:23:25AM +0800, guanxue...@mprc.pku.edu.cn wrote: [snip] +unrecognized: + ?? ??cpu_abort(env, Wrong register (%d) or wrong operation (%d) in cp0_set!\n, + ?? ?? ?? ?? ?? ??creg, cop); The call to cpu_abort() would mean that the guest is able to terminate QEMU at will, which is not OK. What does real HW do? In my opinion, I just want to terminate qemu when any unhandled or unknown operations happen. This can make the emulator vulnerable in the security sense. Probably Unicore CPUs are not used now in an environment where the guest can not be trusted (like cloud computing), but who knows the future? Is it proper to print such information to monitor? by using monitor_printf(). What if user doesn't open a monitor? Regards, chenwj -- Wei-Ren Chen (陳韋任) Computer Systems Lab, Institute of Information Science, Academia Sinica, Taiwan (R.O.C.) Tel:886-2-2788-3799 #1667 Homepage: http://people.cs.nctu.edu.tw/~chenwj
Re: [Qemu-devel] [PATCH 2/8] Add tracepoints for savevm section start/end
On 06/22/2012 04:46 PM, Juan Quintela wrote: This allows to know how long each section takes to save. An awk script like this tells us sections that takes more that 10ms $1 ~ /savevm_state_iterate_end/ { /* Print savevm_section_end line when 10ms duration */ if ($2 1) { printf(%s times_missing=%u\n, $0, times_missing++); } } Signed-off-by: Juan Quintela quint...@redhat.com fix ws tracepoints Signed-off-by: Juan Quintela quint...@redhat.com --- savevm.c |8 trace-events |5 + 2 files changed, 13 insertions(+) diff --git a/savevm.c b/savevm.c index faa8145..40320be 100644 --- a/savevm.c +++ b/savevm.c @@ -85,6 +85,7 @@ #include cpus.h #include memory.h #include qmp-commands.h +#include trace.h #define SELF_ANNOUNCE_ROUNDS 5 @@ -1624,11 +1625,14 @@ int qemu_savevm_state_iterate(QEMUFile *f) if (se-save_live_state == NULL) continue; +trace_savevm_section_start(); /* Section type */ qemu_put_byte(f, QEMU_VM_SECTION_PART); qemu_put_be32(f, se-section_id); ret = se-save_live_state(f, QEMU_VM_SECTION_PART, se-opaque); +trace_savevm_section_end(se-section_id); + if (ret = 0) { /* Do not proceed to the next vmstate before this one reported completion of the current stage. This serializes the migration @@ -1658,11 +1662,13 @@ int qemu_savevm_state_complete(QEMUFile *f) if (se-save_live_state == NULL) continue; +trace_savevm_section_start(); /* Section type */ qemu_put_byte(f, QEMU_VM_SECTION_END); qemu_put_be32(f, se-section_id); ret = se-save_live_state(f, QEMU_VM_SECTION_END, se-opaque); +trace_savevm_section_end(se-section_id); if (ret 0) { return ret; } @@ -1674,6 +1680,7 @@ int qemu_savevm_state_complete(QEMUFile *f) if (se-save_state == NULL se-vmsd == NULL) continue; +trace_savevm_section_start(); /* Section type */ qemu_put_byte(f, QEMU_VM_SECTION_FULL); qemu_put_be32(f, se-section_id); @@ -1687,6 +1694,7 @@ int qemu_savevm_state_complete(QEMUFile *f) qemu_put_be32(f, se-version_id); vmstate_save(f, se); +trace_savevm_section_end(se-section_id); } qemu_put_byte(f, QEMU_VM_EOF); diff --git a/trace-events b/trace-events index 5c82b3a..82c7619 100644 --- a/trace-events +++ b/trace-events @@ -782,6 +782,11 @@ displaysurface_resize(void *display_state, void *display_surface, int width, int # vga.c ppm_save(const char *filename, void *display_surface) %s surface=%p +# savevm.c + +savevm_section_start(void) +savevm_section_end(unsigned int section_id) section_id %u + # hw/qxl.c disable qxl_interface_set_mm_time(int qid, uint32_t mm_time) %d %d disable qxl_io_write_vga(int qid, const char *mode, uint32_t addr, uint32_t val) %d %s addr=%u val=%u Reviewed-by: Orit Wasserman owass...@redhat.com
Re: [Qemu-devel] [PATCH 3/8] No need to iterate if we already are over the limit
On 06/22/2012 04:46 PM, Juan Quintela wrote: If buffers are full, don't iterate, just exit. Signed-off-by: Juan Quintela quint...@redhat.com --- savevm.c |3 +++ 1 file changed, 3 insertions(+) diff --git a/savevm.c b/savevm.c index 40320be..9101bfb 100644 --- a/savevm.c +++ b/savevm.c @@ -1625,6 +1625,9 @@ int qemu_savevm_state_iterate(QEMUFile *f) if (se-save_live_state == NULL) continue; +if (qemu_file_rate_limit(f)) { +return 0; +} trace_savevm_section_start(); /* Section type */ qemu_put_byte(f, QEMU_VM_SECTION_PART); Reviewed-by: Orit Wasserman owass...@redhat.com
Re: [Qemu-devel] [PATCH 4/8] Only TCG needs TLB handling
On 06/22/2012 04:46 PM, Juan Quintela wrote: Refactor the code that is only needed for tcg to an static function. Call that only when tcg is enabled. We can't refactor to a dummy function in the kvm case, as qemu can be compiled at the same time with tcg and kvm. Signed-off-by: Juan Quintela quint...@redhat.com --- exec.c | 31 +-- 1 file changed, 21 insertions(+), 10 deletions(-) diff --git a/exec.c b/exec.c index 8244d54..a68b65c 100644 --- a/exec.c +++ b/exec.c @@ -1824,11 +1824,29 @@ void tb_flush_jmp_cache(CPUArchState *env, target_ulong addr) TB_JMP_PAGE_SIZE * sizeof(TranslationBlock *)); } +static void tlb_reset_dirty_range_all(ram_addr_t start, ram_addr_t end, + uintptr_t length) +{ +uintptr_t start1; + +/* we modify the TLB cache so that the dirty bit will be set again + when accessing the range */ +start1 = (uintptr_t)qemu_safe_ram_ptr(start); +/* Check that we don't span multiple blocks - this breaks the + address comparisons below. */ +if ((uintptr_t)qemu_safe_ram_ptr(end - 1) - start1 +!= (end - 1) - start) { +abort(); +} +cpu_tlb_reset_dirty_all(start1, length); + +} + /* Note: start and end must be within the same ram block. */ void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end, int dirty_flags) { -uintptr_t length, start1; +uintptr_t length; start = TARGET_PAGE_MASK; end = TARGET_PAGE_ALIGN(end); @@ -1838,16 +1856,9 @@ void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end, return; cpu_physical_memory_mask_dirty_range(start, length, dirty_flags); -/* we modify the TLB cache so that the dirty bit will be set again - when accessing the range */ -start1 = (uintptr_t)qemu_safe_ram_ptr(start); -/* Check that we don't span multiple blocks - this breaks the - address comparisons below. */ -if ((uintptr_t)qemu_safe_ram_ptr(end - 1) - start1 -!= (end - 1) - start) { -abort(); +if (tcg_enabled()) { +tlb_reset_dirty_range_all(start, end, length); } -cpu_tlb_reset_dirty_all(start1, length); } int cpu_physical_memory_set_dirty_tracking(int enable) Reviewed-by: Orit Wasserman owass...@redhat.com
Re: [Qemu-devel] [PATCH 6/8] Exit loop if we have been there too long
On 06/22/2012 04:46 PM, Juan Quintela wrote: Checking each 64 pages is a random magic number as good as any other. We don't want to test too many times, but on the other hand, qemu_get_clock_ns() is not so expensive either. We want to be sure that we spent less than 50ms (half of buffered_file timer), if we spent more than 100ms, all the accounting got wrong. Signed-off-by: Juan Quintela quint...@redhat.com --- arch_init.c | 17 + 1 file changed, 17 insertions(+) diff --git a/arch_init.c b/arch_init.c index 424efe7..7de1abf 100644 --- a/arch_init.c +++ b/arch_init.c @@ -294,12 +294,15 @@ static void sort_ram_list(void) g_free(blocks); } +#define MAX_WAIT 50 /* ms, half buffered_file limit */ + int ram_save_live(QEMUFile *f, int stage, void *opaque) { ram_addr_t addr; uint64_t bytes_transferred_last; double bwidth = 0; int ret; +int i; if (stage 0) { memory_global_dirty_log_stop(); @@ -339,6 +342,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque) bytes_transferred_last = bytes_transferred; bwidth = qemu_get_clock_ns(rt_clock); +i = 0; while ((ret = qemu_file_rate_limit(f)) == 0) { int bytes_sent; @@ -347,6 +351,19 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque) if (bytes_sent == 0) { /* no more blocks */ break; } +/* we want to check in the 1st loop, just in case it was the 1st time + and we had to sync the dirty bitmap. + qemu_get_clock_ns() is a bit expensive, so we only check each some + iterations +*/ +if ((i 63) == 0) { +uint64_t t1 = (qemu_get_clock_ns(rt_clock) - bwidth) / 100; +if (t1 MAX_WAIT) { +DPRINTF(big wait: %ld milliseconds, %d iterations\n, t1, i); +break; +} +} +i++; } if (ret 0) { Reviewed-by: Orit Wasserman owass...@redhat.com
[Qemu-devel] buildbot failure in qemu on default_openbsd_4.9
The Buildbot has detected a new failure on builder default_openbsd_4.9 while building qemu. Full details are available at: http://buildbot.b1-systems.de/qemu/builders/default_openbsd_4.9/builds/298 Buildbot URL: http://buildbot.b1-systems.de/qemu/ Buildslave for this Build: kraxel_openbsd49 Build Reason: The Nightly scheduler named 'nightly_default' triggered this build Build Source Stamp: [branch master] HEAD Blamelist: BUILD FAILED: failed compile sincerely, -The Buildbot
Re: [Qemu-devel] [PATCH 5/8] Only calculate expected_time for stage 2
On 06/22/2012 04:46 PM, Juan Quintela wrote: ram_save_remaining() is an expensive operation when there is a lot of memory. So we only call the function when we need it. Signed-off-by: Juan Quintela quint...@redhat.com --- arch_init.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/arch_init.c b/arch_init.c index a9e8b74..424efe7 100644 --- a/arch_init.c +++ b/arch_init.c @@ -299,7 +299,6 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque) ram_addr_t addr; uint64_t bytes_transferred_last; double bwidth = 0; -uint64_t expected_time = 0; int ret; if (stage 0) { @@ -376,9 +375,12 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque) qemu_put_be64(f, RAM_SAVE_FLAG_EOS); -expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth; - -return (stage == 2) (expected_time = migrate_max_downtime()); +if (stage == 2) { +uint64_t expected_time; +expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth; +return expected_time = migrate_max_downtime(); +} +return 0; } static inline void *host_from_stream_offset(QEMUFile *f, Reviewed-by: Orit Wasserman owass...@redhat.com
Re: [Qemu-devel] [PATCH 7/8] dirty bitmap: abstract its use
On 06/22/2012 04:46 PM, Juan Quintela wrote: Always use accessors to read/set the dirty bitmap. Signed-off-by: Juan Quintela quint...@redhat.com --- exec-obsolete.h | 40 exec.c |3 +-- 2 files changed, 21 insertions(+), 22 deletions(-) diff --git a/exec-obsolete.h b/exec-obsolete.h index 792c831..f8ffce6 100644 --- a/exec-obsolete.h +++ b/exec-obsolete.h @@ -45,15 +45,15 @@ int cpu_physical_memory_set_dirty_tracking(int enable); #define CODE_DIRTY_FLAG 0x02 #define MIGRATION_DIRTY_FLAG 0x08 -/* read dirty bit (return 0 or 1) */ -static inline int cpu_physical_memory_is_dirty(ram_addr_t addr) +static inline int cpu_physical_memory_get_dirty_flags(ram_addr_t addr) { -return ram_list.phys_dirty[addr TARGET_PAGE_BITS] == 0xff; +return ram_list.phys_dirty[addr TARGET_PAGE_BITS]; } -static inline int cpu_physical_memory_get_dirty_flags(ram_addr_t addr) +/* read dirty bit (return 0 or 1) */ +static inline int cpu_physical_memory_is_dirty(ram_addr_t addr) { -return ram_list.phys_dirty[addr TARGET_PAGE_BITS]; +return cpu_physical_memory_get_dirty_flags(addr) == 0xff; } Juan, you changed the order of the functions , can your restore the order as it was. static inline int cpu_physical_memory_get_dirty(ram_addr_t start, @@ -61,41 +61,45 @@ static inline int cpu_physical_memory_get_dirty(ram_addr_t start, int dirty_flags) { int ret = 0; -uint8_t *p; ram_addr_t addr, end; end = TARGET_PAGE_ALIGN(start + length); start = TARGET_PAGE_MASK; -p = ram_list.phys_dirty + (start TARGET_PAGE_BITS); for (addr = start; addr end; addr += TARGET_PAGE_SIZE) { -ret |= *p++ dirty_flags; +ret |= cpu_physical_memory_get_dirty_flags(addr) dirty_flags; } return ret; } +static inline int cpu_physical_memory_set_dirty_flags(ram_addr_t addr, + int dirty_flags) +{ +return ram_list.phys_dirty[addr TARGET_PAGE_BITS] |= dirty_flags; +} + static inline void cpu_physical_memory_set_dirty(ram_addr_t addr) { -ram_list.phys_dirty[addr TARGET_PAGE_BITS] = 0xff; +cpu_physical_memory_set_dirty_flags(addr, 0xff); } -static inline int cpu_physical_memory_set_dirty_flags(ram_addr_t addr, - int dirty_flags) +static inline int cpu_physical_memory_clear_dirty_flags(ram_addr_t addr, +int dirty_flags) { -return ram_list.phys_dirty[addr TARGET_PAGE_BITS] |= dirty_flags; +int mask = ~dirty_flags; + +return ram_list.phys_dirty[addr TARGET_PAGE_BITS] = mask; } static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start, ram_addr_t length, int dirty_flags) { -uint8_t *p; ram_addr_t addr, end; end = TARGET_PAGE_ALIGN(start + length); start = TARGET_PAGE_MASK; -p = ram_list.phys_dirty + (start TARGET_PAGE_BITS); for (addr = start; addr end; addr += TARGET_PAGE_SIZE) { -*p++ |= dirty_flags; +cpu_physical_memory_set_dirty_flags(addr, dirty_flags); } } @@ -103,16 +107,12 @@ static inline void cpu_physical_memory_mask_dirty_range(ram_addr_t start, ram_addr_t length, int dirty_flags) { -int mask; -uint8_t *p; ram_addr_t addr, end; end = TARGET_PAGE_ALIGN(start + length); start = TARGET_PAGE_MASK; -mask = ~dirty_flags; -p = ram_list.phys_dirty + (start TARGET_PAGE_BITS); for (addr = start; addr end; addr += TARGET_PAGE_SIZE) { -*p++ = mask; +cpu_physical_memory_clear_dirty_flags(addr, dirty_flags); } } diff --git a/exec.c b/exec.c index a68b65c..dd4833d 100644 --- a/exec.c +++ b/exec.c @@ -2565,8 +2565,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host, ram_list.phys_dirty = g_realloc(ram_list.phys_dirty, last_ram_offset() TARGET_PAGE_BITS); -memset(ram_list.phys_dirty + (new_block-offset TARGET_PAGE_BITS), - 0xff, size TARGET_PAGE_BITS); +cpu_physical_memory_set_dirty_range(new_block-offset, size, 0xff); This will be much slower than memset , do we use it much ? Orit if (kvm_enabled()) kvm_setup_guest_memory(new_block-host, size);
Re: [Qemu-devel] [PATCH 8/8] Maintain the number of dirty pages
On 06/22/2012 04:46 PM, Juan Quintela wrote: Calculate the number of dirty pages takes a lot on hosts with lots of memory. Just maintain how many pages are dirty. Signed-off-by: Juan Quintela quint...@redhat.com --- arch_init.c | 15 +-- cpu-all.h |1 + exec-obsolete.h | 10 ++ 3 files changed, 12 insertions(+), 14 deletions(-) diff --git a/arch_init.c b/arch_init.c index 7de1abf..545cf8f 100644 --- a/arch_init.c +++ b/arch_init.c @@ -228,20 +228,7 @@ static uint64_t bytes_transferred; static ram_addr_t ram_save_remaining(void) { -RAMBlock *block; -ram_addr_t count = 0; - -QLIST_FOREACH(block, ram_list.blocks, next) { -ram_addr_t addr; -for (addr = 0; addr block-length; addr += TARGET_PAGE_SIZE) { -if (memory_region_get_dirty(block-mr, addr, TARGET_PAGE_SIZE, -DIRTY_MEMORY_MIGRATION)) { -count++; -} -} -} - -return count; +return ram_list.dirty_pages; } uint64_t ram_bytes_remaining(void) diff --git a/cpu-all.h b/cpu-all.h index 50c8b62..88cedba 100644 --- a/cpu-all.h +++ b/cpu-all.h @@ -477,6 +477,7 @@ typedef struct RAMBlock { typedef struct RAMList { uint8_t *phys_dirty; QLIST_HEAD(, RAMBlock) blocks; +uint64_t dirty_pages; } RAMList; extern RAMList ram_list; diff --git a/exec-obsolete.h b/exec-obsolete.h index f8ffce6..c099256 100644 --- a/exec-obsolete.h +++ b/exec-obsolete.h @@ -74,6 +74,11 @@ static inline int cpu_physical_memory_get_dirty(ram_addr_t start, static inline int cpu_physical_memory_set_dirty_flags(ram_addr_t addr, int dirty_flags) { +if ((dirty_flags MIGRATION_DIRTY_FLAG) +!cpu_physical_memory_get_dirty(addr, TARGET_PAGE_SIZE, + MIGRATION_DIRTY_FLAG)) { +ram_list.dirty_pages++; +} return ram_list.phys_dirty[addr TARGET_PAGE_BITS] |= dirty_flags; } @@ -87,6 +92,11 @@ static inline int cpu_physical_memory_clear_dirty_flags(ram_addr_t addr, { int mask = ~dirty_flags; +if ((dirty_flags MIGRATION_DIRTY_FLAG) +cpu_physical_memory_get_dirty(addr, TARGET_PAGE_SIZE, + MIGRATION_DIRTY_FLAG)) { +ram_list.dirty_pages--; +} return ram_list.phys_dirty[addr TARGET_PAGE_BITS] = mask; } Reviewed-by: Orit Wasserman owass...@redhat.com
Re: [Qemu-devel] [PATCH 1/5] i82378: Remove bogus MMIO coalescing
Jan Kiszka a écrit : This MMIO area is an entry gate to legacy PC ISA devices, addressed via PIO over there. Quite a few of the PIO ports have side effects on access like starting/stopping timers that must be executed properly ordered /wrt the CPU. So we have to remove the coalescing mark. CC: Andreas Färber andreas.faer...@web.de CC: Hervé Poussineau hpous...@reactos.org Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/i82378.c |1 - 1 files changed, 0 insertions(+), 1 deletions(-) Acked-by: Hervé Poussineau hpous...@reactos.org
Re: [Qemu-devel] [PATCH 1/5] i82378: Remove bogus MMIO coalescing
Am 25.06.2012 09:11, schrieb Hervé Poussineau: Jan Kiszka a écrit : This MMIO area is an entry gate to legacy PC ISA devices, addressed via PIO over there. Quite a few of the PIO ports have side effects on access like starting/stopping timers that must be executed properly ordered /wrt the CPU. So we have to remove the coalescing mark. CC: Andreas Färber andreas.faer...@web.de CC: Hervé Poussineau hpous...@reactos.org Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/i82378.c |1 - 1 files changed, 0 insertions(+), 1 deletions(-) Acked-by: Hervé Poussineau hpous...@reactos.org Fine with me then. Andreas
[Qemu-devel] [PATCH 0/5] kvm: Get coalesced MMIO flushing out of the hot-path
We currently flush the coalesced MMIO buffer on every vmexit to userspace. KVM only provides a single buffer per VM, so a central lock is required to read from it. This is a contention point given a large enough VCPU set. Moreover, we need to hold the BQL while replaying the queued requests, probably for a long time until there is more fine grained locking available. Good reasons to overcome the unconditional flush. The series achieves this by flushing only on selected memory region accesses, either generically via the memory access dispatcher or directly on certain VGA PIO accesses that are not yet fully converted. Another reason to flush are remappings or other relevant region state changes. Please review carefully. CC: Andreas Färber andreas.faer...@web.de CC: Hervé Poussineau hpous...@reactos.org Jan Kiszka (5): i82378: Remove bogus MMIO coalescing memory: Flush coalesced MMIO on selected region access memory: Flush coalesced MMIO on mapping and state changes VGA: Flush coalesced MMIO on related MMIO/PIO accesses kvm: Stop flushing coalesced MMIO on vmexit hw/cirrus_vga.c |7 +++ hw/i82378.c |1 - hw/qxl.c|1 + hw/vga-isa-mm.c |1 + hw/vga.c|5 + hw/vmware_vga.c |1 + kvm-all.c |2 -- memory.c| 36 memory.h| 13 + 9 files changed, 64 insertions(+), 3 deletions(-) -- 1.7.3.4
[Qemu-devel] [PATCH 4/5] VGA: Flush coalesced MMIO on related MMIO/PIO accesses
In preparation of stopping to flush coalesced MMIO unconditionally on vmexits, mark VGA MMIO and PIO regions as synchronous /wrt coalesced MMIO and flush the buffer explicitly on PIO accesses that do not use generic memory regions yet. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/cirrus_vga.c |7 +++ hw/qxl.c|1 + hw/vga-isa-mm.c |1 + hw/vga.c|5 + hw/vmware_vga.c |1 + 5 files changed, 15 insertions(+), 0 deletions(-) diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c index afedaa4..cf551a3 100644 --- a/hw/cirrus_vga.c +++ b/hw/cirrus_vga.c @@ -2439,6 +2439,8 @@ static uint32_t cirrus_vga_ioport_read(void *opaque, uint32_t addr) VGACommonState *s = c-vga; int val, index; +qemu_flush_coalesced_mmio_buffer(); + if (vga_ioport_invalid(s, addr)) { val = 0xff; } else { @@ -2532,6 +2534,8 @@ static void cirrus_vga_ioport_write(void *opaque, uint32_t addr, uint32_t val) VGACommonState *s = c-vga; int index; +qemu_flush_coalesced_mmio_buffer(); + /* check port range access depending on color/monochrome mode */ if (vga_ioport_invalid(s, addr)) { return; @@ -2852,6 +2856,7 @@ static void cirrus_init_common(CirrusVGAState * s, int device_id, int is_pci, /* I/O handler for LFB */ memory_region_init_io(s-cirrus_linear_io, cirrus_linear_io_ops, s, cirrus-linear-io, VGA_RAM_SIZE); +memory_region_set_flush_coalesced(s-cirrus_linear_io); /* I/O handler for LFB */ memory_region_init_io(s-cirrus_linear_bitblt_io, @@ -2859,10 +2864,12 @@ static void cirrus_init_common(CirrusVGAState * s, int device_id, int is_pci, s, cirrus-bitblt-mmio, 0x40); +memory_region_set_flush_coalesced(s-cirrus_linear_bitblt_io); /* I/O handler for memory-mapped I/O */ memory_region_init_io(s-cirrus_mmio_io, cirrus_mmio_io_ops, s, cirrus-mmio, CIRRUS_PNPMMIO_SIZE); +memory_region_set_flush_coalesced(s-cirrus_mmio_io); s-real_vram_size = (s-device_id == CIRRUS_ID_CLGD5446) ? 4096 * 1024 : 2048 * 1024; diff --git a/hw/qxl.c b/hw/qxl.c index 3da3399..ef21176 100644 --- a/hw/qxl.c +++ b/hw/qxl.c @@ -1778,6 +1778,7 @@ static int qxl_init_common(PCIQXLDevice *qxl) if (qxl-id == 0) { vga_dirty_log_start(qxl-vga); } +memory_region_set_flush_coalesced(qxl-io_bar); pci_register_bar(qxl-pci, QXL_IO_RANGE_INDEX, diff --git a/hw/vga-isa-mm.c b/hw/vga-isa-mm.c index f8984c6..aab78de 100644 --- a/hw/vga-isa-mm.c +++ b/hw/vga-isa-mm.c @@ -105,6 +105,7 @@ static void vga_mm_init(ISAVGAMMState *s, target_phys_addr_t vram_base, s_ioport_ctrl = g_malloc(sizeof(*s_ioport_ctrl)); memory_region_init_io(s_ioport_ctrl, vga_mm_ctrl_ops, s, vga-mm-ctrl, 0x10); +memory_region_set_flush_coalesced(s_ioport_ctrl); vga_io_memory = g_malloc(sizeof(*vga_io_memory)); /* XXX: endianness? */ diff --git a/hw/vga.c b/hw/vga.c index d784df7..2a6f040 100644 --- a/hw/vga.c +++ b/hw/vga.c @@ -357,6 +357,8 @@ uint32_t vga_ioport_read(void *opaque, uint32_t addr) VGACommonState *s = opaque; int val, index; +qemu_flush_coalesced_mmio_buffer(); + if (vga_ioport_invalid(s, addr)) { val = 0xff; } else { @@ -449,6 +451,8 @@ void vga_ioport_write(void *opaque, uint32_t addr, uint32_t val) VGACommonState *s = opaque; int index; +qemu_flush_coalesced_mmio_buffer(); + /* check port range access depending on color/monochrome mode */ if (vga_ioport_invalid(s, addr)) { return; @@ -2320,6 +2324,7 @@ MemoryRegion *vga_init_io(VGACommonState *s, vga_mem = g_malloc(sizeof(*vga_mem)); memory_region_init_io(vga_mem, vga_mem_ops, s, vga-lowmem, 0x2); +memory_region_set_flush_coalesced(vga_mem); return vga_mem; } diff --git a/hw/vmware_vga.c b/hw/vmware_vga.c index 142d9f4..cf2638d 100644 --- a/hw/vmware_vga.c +++ b/hw/vmware_vga.c @@ -1182,6 +1182,7 @@ static int pci_vmsvga_initfn(PCIDevice *dev) memory_region_init_io(s-io_bar, vmsvga_io_ops, s-chip, vmsvga-io, 0x10); +memory_region_set_flush_coalesced(s-io_bar); pci_register_bar(s-card, 0, PCI_BASE_ADDRESS_SPACE_IO, s-io_bar); vmsvga_init(s-chip, VGA_RAM_SIZE, pci_address_space(dev), -- 1.7.3.4
[Qemu-devel] [PATCH 2/5] memory: Flush coalesced MMIO on selected region access
Instead of flushing pending coalesced MMIO requests on every vmexit, this provides a mechanism to selectively flush when memory regions related to the coalesced one are accessed. This first of all includes the coalesced region itself but can also applied to other regions, e.g. of the same device, by calling memory_region_set_flush_coalesced. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- memory.c | 13 + memory.h | 13 + 2 files changed, 26 insertions(+), 0 deletions(-) diff --git a/memory.c b/memory.c index aab4a31..ba55b3e 100644 --- a/memory.c +++ b/memory.c @@ -311,6 +311,9 @@ static void memory_region_read_accessor(void *opaque, MemoryRegion *mr = opaque; uint64_t tmp; +if (mr-flush_coalesced_mmio) { +qemu_flush_coalesced_mmio_buffer(); +} tmp = mr-ops-read(mr-opaque, addr, size); *value |= (tmp mask) shift; } @@ -325,6 +328,9 @@ static void memory_region_write_accessor(void *opaque, MemoryRegion *mr = opaque; uint64_t tmp; +if (mr-flush_coalesced_mmio) { +qemu_flush_coalesced_mmio_buffer(); +} tmp = (*value shift) mask; mr-ops-write(mr-opaque, addr, tmp, size); } @@ -826,6 +832,7 @@ void memory_region_init(MemoryRegion *mr, mr-dirty_log_mask = 0; mr-ioeventfd_nb = 0; mr-ioeventfds = NULL; +mr-flush_coalesced_mmio = false; } static bool memory_region_access_valid(MemoryRegion *mr, @@ -1176,6 +1183,7 @@ void memory_region_add_coalescing(MemoryRegion *mr, cmr-addr = addrrange_make(int128_make64(offset), int128_make64(size)); QTAILQ_INSERT_TAIL(mr-coalesced, cmr, link); memory_region_update_coalesced_range(mr); +memory_region_set_flush_coalesced(mr); } void memory_region_clear_coalescing(MemoryRegion *mr) @@ -1190,6 +1198,11 @@ void memory_region_clear_coalescing(MemoryRegion *mr) memory_region_update_coalesced_range(mr); } +void memory_region_set_flush_coalesced(MemoryRegion *mr) +{ +mr-flush_coalesced_mmio = true; +} + void memory_region_add_eventfd(MemoryRegion *mr, target_phys_addr_t addr, unsigned size, diff --git a/memory.h b/memory.h index 740c48e..dca7a86 100644 --- a/memory.h +++ b/memory.h @@ -133,6 +133,7 @@ struct MemoryRegion { bool enabled; bool rom_device; bool warning_printed; /* For reservations */ +bool flush_coalesced_mmio; MemoryRegion *alias; target_phys_addr_t alias_offset; unsigned priority; @@ -521,6 +522,18 @@ void memory_region_add_coalescing(MemoryRegion *mr, void memory_region_clear_coalescing(MemoryRegion *mr); /** + * memory_region_set_flush_coalesced: Enforce memory coalescing flush before + *accesses. + * + * Ensure that pending coalesced MMIO request are flushed before the memory + * region is accessed. This property is automatically enabled for all regions + * passed to memory_region_set_coalescing() and memory_region_add_coalescing(). + * + * @mr: the memory region to be updated. + */ +void memory_region_set_flush_coalesced(MemoryRegion *mr); + +/** * memory_region_add_eventfd: Request an eventfd to be triggered when a word *is written to a location. * -- 1.7.3.4
Re: [Qemu-devel] [RFC PATCH V1 1/2] arm_boot: added linux switch
Am 25.06.2012 03:51, schrieb Peter Crosthwaite: Ping! @PMM You rejected my workaround in V2 (i.e. using -dtb to force is_linux) on the grounds that youd accept some reasonable way of saying this ELF file is a Linux kernel, That brings us back to V1 - this patch. Any objections? I won't object to it, but I don't like it: If it's really needed, it should be os=linux or something (enum?) for extensibility or at some point we'll need windows=on, vxworks=on, aix=on, etc. Andreas Regards, Peter On Fri, Jun 1, 2012 at 11:41 AM, Peter Maydell peter.mayd...@linaro.org wrote: On 1 June 2012 02:16, Peter A. G. Crosthwaite peter.crosthwa...@petalogix.com wrote: Added a switch to tell the bootloader that the image is linux and should be bootstrapped as such. This is needed to boot an elf that is linux. Syntax would be: qemu-system-arm ... -kernel linux.elf -machine linux=on This whole area is a mess. Personally I think we should have a standard way to say load this ELF file into memory which isn't target-dependent, and then we could make -kernel actually load a kernel rather than its current load a kernel unless it's an ELF file in which case do something else. There are some back-compatibility issues there, though. -- PMM -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
[Qemu-devel] buildbot failure in qemu on default_openbsd_current
The Buildbot has detected a new failure on builder default_openbsd_current while building qemu. Full details are available at: http://buildbot.b1-systems.de/qemu/builders/default_openbsd_current/builds/298 Buildbot URL: http://buildbot.b1-systems.de/qemu/ Buildslave for this Build: brad_openbsd_current Build Reason: The Nightly scheduler named 'nightly_default' triggered this build Build Source Stamp: [branch master] HEAD Blamelist: BUILD FAILED: failed compile sincerely, -The Buildbot
[Qemu-devel] [PATCH 5/5] kvm: Stop flushing coalesced MMIO on vmexit
The memory subsystem will now take care of flushing whenever affected regions are accessed or the memory mapping changes. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- kvm-all.c |2 -- 1 files changed, 0 insertions(+), 2 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index f8e4328..a1d32f6 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -1483,8 +1483,6 @@ int kvm_cpu_exec(CPUArchState *env) qemu_mutex_lock_iothread(); kvm_arch_post_run(env, run); -kvm_flush_coalesced_mmio_buffer(); - if (run_ret 0) { if (run_ret == -EINTR || run_ret == -EAGAIN) { DPRINTF(io window exit\n); -- 1.7.3.4
Re: [Qemu-devel] [RFC PATCH V1 1/2] arm_boot: added linux switch
On Mon, Jun 25, 2012 at 5:31 PM, Andreas Färber afaer...@suse.de wrote: Am 25.06.2012 03:51, schrieb Peter Crosthwaite: Ping! @PMM You rejected my workaround in V2 (i.e. using -dtb to force is_linux) on the grounds that youd accept some reasonable way of saying this ELF file is a Linux kernel, That brings us back to V1 - this patch. Any objections? I won't object to it, but I don't like it: If it's really needed, it should be os=linux String machine opt os=foo work? or something (enum?) for extensibility or at some point we'll need windows=on, vxworks=on, aix=on, etc. Andreas
[Qemu-devel] [PATCH 1/5] i82378: Remove bogus MMIO coalescing
This MMIO area is an entry gate to legacy PC ISA devices, addressed via PIO over there. Quite a few of the PIO ports have side effects on access like starting/stopping timers that must be executed properly ordered /wrt the CPU. So we have to remove the coalescing mark. CC: Andreas Färber andreas.faer...@web.de CC: Hervé Poussineau hpous...@reactos.org Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/i82378.c |1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/hw/i82378.c b/hw/i82378.c index 9b11d90..2123c14 100644 --- a/hw/i82378.c +++ b/hw/i82378.c @@ -225,7 +225,6 @@ static int pci_i82378_init(PCIDevice *dev) pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, s-io); memory_region_init_io(s-mem, i82378_mem_ops, s, i82378-mem, 0x0100); -memory_region_set_coalescing(s-mem); pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY, s-mem); /* Make I/O address read only */ -- 1.7.3.4
Re: [Qemu-devel] [PATCH 2/2] virtio-scsi: Implement hotplug support for virtio-scsi
Quoting Stefan Hajnoczi stefa...@gmail.com: On Wed, Jun 20, 2012 at 7:47 AM, Cong Meng m...@linux.vnet.ibm.com wrote: Implement the hotplug() and hot_unplug() interfaces in virtio-scsi, by signal the virtio_scsi.ko in guest kernel via event virtual queue. The counterpart patch of virtio_scsi.ko will be sent soon in another thread. Signed-off-by: Cong Meng m...@linux.vnet.ibm.com Signed-off-by: Sen Wang senw...@linux.vnet.ibm.com --- hw/virtio-scsi.c | 72 +++-- 1 files changed, 69 insertions(+), 3 deletions(-) I compared against the virtio-scsi specification and this looks good: http://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf Dropped events and event throttling are not implemented by this patch. This means that the guest can miss events if it runs out of event queue elements. A scenario that might be able to trigger this is if multiple LUNs are hotplugged in a single QEMU monitor callback. Implementing dropped events is easy in hw/virtio-scsi.c. Keep a bool or counter of dropped events and report them when the guest kicks us with a free event element (virtio_scsi_handle_event). Yes. It's easy to do this in qemu. But I'm not sure what should be done in virtio-scsi.ko to respond the VIRTIO_SCSI_T_EVENTS_MISSED event. The spec says poll the logical units for unit attention conditions, or just a whole bus rescan? Stefan
[Qemu-devel] [PATCH 3/5] memory: Flush coalesced MMIO on mapping and state changes
Flush pending coalesced MMIO before performing mapping or state changes that could affect the event orderings or route the buffered requests to a wrong region. Signed-off-by: Jan Kiszka jan.kis...@siemens.com In addition, we also have to --- memory.c | 23 +++ 1 files changed, 23 insertions(+), 0 deletions(-) diff --git a/memory.c b/memory.c index ba55b3e..141f92b 100644 --- a/memory.c +++ b/memory.c @@ -759,6 +759,7 @@ static void memory_region_update_topology(MemoryRegion *mr) void memory_region_transaction_begin(void) { +qemu_flush_coalesced_mmio_buffer(); ++memory_region_transaction_depth; } @@ -1109,6 +1110,9 @@ void memory_region_sync_dirty_bitmap(MemoryRegion *mr) void memory_region_set_readonly(MemoryRegion *mr, bool readonly) { +if (!QTAILQ_EMPTY(mr-coalesced)) { +qemu_flush_coalesced_mmio_buffer(); +} if (mr-readonly != readonly) { mr-readonly = readonly; memory_region_update_topology(mr); @@ -1117,6 +1121,9 @@ void memory_region_set_readonly(MemoryRegion *mr, bool readonly) void memory_region_rom_device_set_readable(MemoryRegion *mr, bool readable) { +if (!QTAILQ_EMPTY(mr-coalesced)) { +qemu_flush_coalesced_mmio_buffer(); +} if (mr-readable != readable) { mr-readable = readable; memory_region_update_topology(mr); @@ -1190,6 +1197,8 @@ void memory_region_clear_coalescing(MemoryRegion *mr) { CoalescedMemoryRange *cmr; +qemu_flush_coalesced_mmio_buffer(); + while (!QTAILQ_EMPTY(mr-coalesced)) { cmr = QTAILQ_FIRST(mr-coalesced); QTAILQ_REMOVE(mr-coalesced, cmr, link); @@ -1219,6 +1228,9 @@ void memory_region_add_eventfd(MemoryRegion *mr, }; unsigned i; +if (!QTAILQ_EMPTY(mr-coalesced)) { +qemu_flush_coalesced_mmio_buffer(); +} for (i = 0; i mr-ioeventfd_nb; ++i) { if (memory_region_ioeventfd_before(mrfd, mr-ioeventfds[i])) { break; @@ -1249,6 +1261,9 @@ void memory_region_del_eventfd(MemoryRegion *mr, }; unsigned i; +if (!QTAILQ_EMPTY(mr-coalesced)) { +qemu_flush_coalesced_mmio_buffer(); +} for (i = 0; i mr-ioeventfd_nb; ++i) { if (memory_region_ioeventfd_equal(mrfd, mr-ioeventfds[i])) { break; @@ -1269,6 +1284,8 @@ static void memory_region_add_subregion_common(MemoryRegion *mr, { MemoryRegion *other; +qemu_flush_coalesced_mmio_buffer(); + assert(!subregion-parent); subregion-parent = mr; subregion-addr = offset; @@ -1327,6 +1344,8 @@ void memory_region_add_subregion_overlap(MemoryRegion *mr, void memory_region_del_subregion(MemoryRegion *mr, MemoryRegion *subregion) { +qemu_flush_coalesced_mmio_buffer(); + assert(subregion-parent == mr); subregion-parent = NULL; QTAILQ_REMOVE(mr-subregions, subregion, subregions_link); @@ -1335,6 +1354,8 @@ void memory_region_del_subregion(MemoryRegion *mr, void memory_region_set_enabled(MemoryRegion *mr, bool enabled) { +qemu_flush_coalesced_mmio_buffer(); + if (enabled == mr-enabled) { return; } @@ -1367,6 +1388,8 @@ void memory_region_set_alias_offset(MemoryRegion *mr, target_phys_addr_t offset) { target_phys_addr_t old_offset = mr-alias_offset; +qemu_flush_coalesced_mmio_buffer(); + assert(mr-alias); mr-alias_offset = offset; -- 1.7.3.4
Re: [Qemu-devel] [RFC PATCH V1 1/2] arm_boot: added linux switch
On 25 June 2012 02:51, Peter Crosthwaite peter.crosthwa...@petalogix.com wrote: You rejected my workaround in V2 (i.e. using -dtb to force is_linux) on the grounds that youd accept some reasonable way of saying this ELF file is a Linux kernel, That brings us back to V1 - this patch. Any objections? I'd really like to see some input from people who deal with the other QEMU architectures on whether we can come up with something coherent as a plan for handling these loader command line options. I don't really like here's a switch but it happens to only work on ARM targets if we can avoid it. -- PMM
Re: [Qemu-devel] [PATCH 2/5] memory: Flush coalesced MMIO on selected region access
On 06/25/2012 10:00 AM, Jan Kiszka wrote: Instead of flushing pending coalesced MMIO requests on every vmexit, this provides a mechanism to selectively flush when memory regions related to the coalesced one are accessed. This first of all includes the coalesced region itself but can also applied to other regions, e.g. @@ -521,6 +522,18 @@ void memory_region_add_coalescing(MemoryRegion *mr, void memory_region_clear_coalescing(MemoryRegion *mr); /** + * memory_region_set_flush_coalesced: Enforce memory coalescing flush before + *accesses. + * + * Ensure that pending coalesced MMIO request are flushed before the memory + * region is accessed. This property is automatically enabled for all regions + * passed to memory_region_set_coalescing() and memory_region_add_coalescing(). + * + * @mr: the memory region to be updated. + */ +void memory_region_set_flush_coalesced(MemoryRegion *mr); + +/** Please provide a way to clear the flag (and autoclear on clear_coalesced). -- error compiling committee.c: too many arguments to function
[Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)
Hi all, Here is something I've been tinkering with the past few weeks and now have it in a state where the basic idea makes sense, it works, and could use some feedback from the community. This is what I've been calling QDES or QEMU Distributed Ethernet Switch. I first had the idea when I was playing with the udp and mcast socket network backends while exploring how to build a VM infrastructure. I liked the idea of using the sockets backends cause it doesn't require escalated permissions to configure and run as well as the ability to talk over IP networks. But the built in socket backends either allowed for only 2 guests talking directly or for multiple guests where all traffic is sent to all. So one can either have two guests talking or have bandwidth wasted with multiple guests. There wasn't something that could talk to multiple guests but also utilize unicast traffic. So I made a backend that can do this. It takes the basics of how the udp and mcast socket backends work and combines them with some switching based on the ethernet packets. The result is multiple guests can talk to each other but not waste bandwidth by delivering unicast traffic to all guests. The backend also adds some header data to each packet. This header includes a network identifier so multiple logical networks can be created using the same multicast configuration but still have separation in the guests. Its kind of like VXLAN or NVGRE but replace the GRE tunnels with UDP packets. There are a couple advantages that I see to this. It allows for multiple guests in multiple locations to talk to each other while keeping unicast traffic to just between two hosts. It doesn't require root permissions to run. It can operate over non-ethernet networks (like IPoIB). It doesn't require changing network configuration on the host. It allows for a ton of logical networks to be created (currently 65536 per multicast address and port combination). There are a few disadvantages as well. It does add some more processing to the QEMU process but not much (I saw it go as fast as the socket backends). It is encapsulating an Ethernet frame inside a UDP packet so there is the overhead of the IP and UDP headers as well as the transport medium headers (most likely Ethernet again). Because there is additional header data and MTU of the guest could be limited depending on the ability to send larger multicast packet from the host. (I haven't really looked closely at this last one). There isn't the ability for something besides QEMU processes to communicate using this, though I hope to build a utility to work with a tap device. Overall, I think this is something that's pretty cool. I don't know how much people give any thought to the socket backends for real world use and so I don't know if this would be of much use to anyone. I am looking for some feedback into what the community thinks and for comments about the code. Its only my second time doing more than 20 lines of C so I'm sure I did some stupid things. I have only tested on 64 bit x86 Linux systems so far. Hopefully you all have good things to say. :) mike
[Qemu-devel] [PATCH] Initial commit for QDES - QEMU Distributed Ethernet Switch
This commit adds a new network backend to QEMU. It combines the basic behavior of the unicast udp and multicast socket backends with some intelligence about the source and destination of the packets. It also adds a header to the packets to allow for creating multiple logical networks using the same underlying network infrastructure. (Kind of like GRE Keys). This provides a network backend that acts as a type of Ethernet switch. During initialization, QDES will create two sockets. One that is used for receiving unicast udp traffic and for sending both unicast and multicast udp traffic. The second socket is for receiving multicast udp traffic. A GHashTable is also created that will serve as an address table for where packets should be delivered. A timer is also configured to regularly clean the contents of the address table. When a packet is received by either of the two sockets, the header is parsed and checked to make sure the packet is a member of the correct logical network. Then the ethernet frame that is received is inspected for the source MAC address. The address table is then updated with the source MAC address, the address of where the packets was received from, and the current time. When QEMU delivers a packet to be sent, the destination MAC address is looked for in the address table. If it is found, the packet is sent to the remote address stored in the table along with the approriate header. If the address is not found, the packet and header is sent to the configured multicast address so all members of the network will receive the packet. Unfortunately, only IPv4 is currently supported. IPv6 is on the short list of improvements to be made. Signed-off-by: Mike Lovell m...@dev-zero.net --- hmp-commands.hx |2 +- net.c | 31 +++- net.h |1 + net/Makefile.objs |1 + net/qdes.c| 453 + net/qdes.h| 38 + net/socket.c |3 +- net/socket.h |3 + qemu-options.hx |5 + 9 files changed, 533 insertions(+), 4 deletions(-) create mode 100644 net/qdes.c create mode 100644 net/qdes.h diff --git a/hmp-commands.hx b/hmp-commands.hx index f5d9d91..042bb85 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1007,7 +1007,7 @@ ETEXI { .name = host_net_add, .args_type = device:s,opts:s?, -.params = tap|user|socket|vde|dump [options], +.params = tap|user|socket|qdes|vde|dump [options], .help = add host VLAN client, .mhandler.cmd = net_host_device_add, }, diff --git a/net.c b/net.c index 4aa416c..dbb4a48 100644 --- a/net.c +++ b/net.c @@ -30,6 +30,7 @@ #include net/dump.h #include net/slirp.h #include net/vde.h +#include net/qdes.h #include net/util.h #include monitor.h #include qemu-common.h @@ -1016,6 +1017,30 @@ static const struct { { /* end of list */ } }, }, +[NET_CLIENT_TYPE_QDES] = { +.type = qdes, +.init = net_init_qdes, +.desc = { +NET_COMMON_PARAMS_DESC, +{ +.name = timer, +.type = QEMU_OPT_NUMBER, +.help = Seconds between cleaning mac address table +}, { +.name = mcast, +.type = QEMU_OPT_STRING, +.help = UDP multicast address and port number, +}, { +.name = localaddr, +.type = QEMU_OPT_STRING, +.help = source address for multicast and udp packets, +}, { +.name = network, +.type = QEMU_OPT_NUMBER, +.help = qdes network number, +}, +}, +}, #ifdef CONFIG_VDE [NET_CLIENT_TYPE_VDE] = { .type = vde, @@ -1104,7 +1129,8 @@ int net_client_init(QemuOpts *opts, int is_netdev, Error **errp) #ifdef CONFIG_VDE strcmp(type, vde) != 0 #endif -strcmp(type, socket) != 0) { +strcmp(type, socket) != 0 +strcmp(type, qdes) != 0) { error_set(errp, QERR_INVALID_PARAMETER_VALUE, type, a netdev backend type); return -1; @@ -1170,7 +1196,7 @@ int net_client_init(QemuOpts *opts, int is_netdev, Error **errp) static int net_host_check_device(const char *device) { int i; -const char *valid_param_list[] = { tap, socket, dump +const char *valid_param_list[] = { tap, socket, dump, qdes #ifdef CONFIG_NET_BRIDGE , bridge #endif @@ -1405,6 +1431,7 @@ void net_check_clients(void) case NET_CLIENT_TYPE_USER: case NET_CLIENT_TYPE_TAP: case NET_CLIENT_TYPE_SOCKET: +case NET_CLIENT_TYPE_QDES: case NET_CLIENT_TYPE_VDE: has_host_dev = 1; break; diff --git a/net.h b/net.h index bdc2a06..bf932b0 100644 --- a/net.h +++ b/net.h
[Qemu-devel] [Bug 1013467] Re: network in bridge mode doesn't work in SMP Linux guest
The regression is due to the patch of rtl8139: do the network/host communication only in normal operating mode. (Commit ff71f2e8cacefae99179993204172bc65e4303df). However after applying the patch, someone already reported this issue. (http://comments.gmane.org/gmane.comp.emulators.qemu/150064). So revertted this patch. Not sure why this patch revert-revert again? So the bug happened again. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1013467 Title: network in bridge mode doesn't work in SMP Linux guest Status in QEMU: New Bug description: Environment: Host OS (ia32/ia32e/IA64):ia32e Guest OS (ia32/ia32e/IA64):ia32e Guest OS Type (Linux/Windows):Linux kvm.git Commit:25e531a988ea5a64bd97a72dc9d3c65ad5850120 qemu-kvm Commit:0a948cbb1835e5f36990e173966d30bc4c8cc038 Host Kernel Version:3.5.0-rc1 Hardware:WSM-EP, Romley-EP Bug detailed description: -- Network in bridge mode doesn't work in SMP RHEL6.1/6.2 guest. After booting up, use “dhclient ethX” to get IP (we use DHCP in the network), guest can not get IP. Note: 1. UP guest(-smp 1) doesn’t have this issue. 2. Windows(e.g. Win7) guest(-smp 4) has no issue. This should be a qemu-kvm bug. kvm + qemu-kvm = result 25e531a9 + 0a948cbb= bad 25e531a9 + 18b01275 = good Reproduce steps: 1.start up a host with kvm(25e531a9) + qemu-kvm (0a948cbb) 2.qemu-system-x86_64 -m 1024 -smp 4 –net nic,model=rt18139 –net tap,script=/etc/kvm/qemu-ifup –hda /root/rhel6u1.qcow 3. ifconfig 4. ping 192.168.199.98 (a IP in the same subnet) 5. dhclient ethx (can't get dynamic IP address) Current result: The guest can not get IP Expected result: The guest get IP and work fine. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1013467/+subscriptions
Re: [Qemu-devel] [PATCH] Initial commit for QDES - QEMU Distributed Ethernet Switch
I'm not sure why but it looks like my intro email for this got eaten by something. Here it is again and sorry if it shows up twice. This is my first time posting to the list and submitting a patch and I guess something doesn't like the way I did it. - Hi all, Here is something I've been tinkering with the past few weeks and now have it in a state where the basic idea makes sense, it works, and could use some feedback from the community. This is what I've been calling QDES or QEMU Distributed Ethernet Switch. I first had the idea when I was playing with the udp and mcast socket network backends while exploring how to build a VM infrastructure. I liked the idea of using the sockets backends cause it doesn't require escalated permissions to configure and run as well as the ability to talk over IP networks. But the built in socket backends either allowed for only 2 guests talking directly or for multiple guests where all traffic is sent to all. So one can either have two guests talking or have bandwidth wasted with multiple guests. There wasn't something that could talk to multiple guests but also utilize unicast traffic. So I made a backend that can do this. It takes the basics of how the udp and mcast socket backends work and combines them with some switching based on the ethernet packets. The result is multiple guests can talk to each other but not waste bandwidth by delivering unicast traffic to all guests. The backend also adds some header data to each packet. This header includes a network identifier so multiple logical networks can be created using the same multicast configuration but still have separation in the guests. There are a couple advantages that I see to this. It allows for multiple guests in multiple locations to talk to each other while keeping unicast traffic to just between two hosts. It doesn't require root permissions to run. It can operate over non-ethernet networks (like IPoIB). It doesn't require changing network configuration on the host. It allows for a ton of logical networks to be created (currently 65536 per multicast address and port combination). There are a few disadvantages as well. It does add some more processing to the QEMU process but not much (I saw it go as fast as the socket backends). It is encapsulating an Ethernet frame inside a UDP packet so there is the overhead of the IP and UDP headers as well as the transport medium headers (most likely Ethernet again). Because there is additional header data and MTU of the guest could be limited depending on the ability to send larger multicast packet from the host. (I haven't really looked closely at this last one). There isn't the ability for something besides QEMU processes to communicate using this, though I hope to build a utility to work with a tap device. Overall, I think this is something that's pretty cool. I don't know how much people give any thought to the socket backends for real world use and so I don't know if this would be of much use to anyone. I am looking for some feedback into what the community thinks and for comments about the code. Its only my second time doing more than 20 lines of C so I'm sure I did some stupid things. I have only tested on 64 bit x86 Linux systems so far. Hopefully you all have good things to say. :) mike
[Qemu-devel] [PATCH] Exynos4: added RTC device
This patch add RTC device Oleg Ogurtsov (1): Exynos4: added RTC device hw/arm/Makefile.objs |1 + hw/exynos4210.c |8 + hw/exynos4210_rtc.c | 607 ++ 3 files changed, 616 insertions(+), 0 deletions(-) create mode 100644 hw/exynos4210_rtc.c -- 1.7.5.4
[Qemu-devel] [PATCH] Exynos4: added RTC device
Signed-off-by: Oleg Ogurtsov o.ogurt...@samsung.com --- hw/arm/Makefile.objs |1 + hw/exynos4210.c |8 + hw/exynos4210_rtc.c | 607 ++ 3 files changed, 616 insertions(+), 0 deletions(-) create mode 100644 hw/exynos4210_rtc.c diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs index 88ff47d..0fdb832 100644 --- a/hw/arm/Makefile.objs +++ b/hw/arm/Makefile.objs @@ -11,6 +11,7 @@ obj-y += realview_gic.o realview.o arm_sysctl.o arm11mpcore.o a9mpcore.o obj-y += exynos4210_gic.o exynos4210_combiner.o exynos4210.o obj-y += exynos4_boards.o exynos4210_uart.o exynos4210_pwm.o obj-y += exynos4210_pmu.o exynos4210_mct.o exynos4210_fimd.o +obj-y += exynos4210_rtc.o obj-y += arm_l2x0.o obj-y += arm_mptimer.o a15mpcore.o obj-y += armv7m.o armv7m_nvic.o stellaris.o pl022.o stellaris_enet.o diff --git a/hw/exynos4210.c b/hw/exynos4210.c index 9c20b3f..6e105e0 100644 --- a/hw/exynos4210.c +++ b/hw/exynos4210.c @@ -33,6 +33,9 @@ /* PWM */ #define EXYNOS4210_PWM_BASE_ADDR 0x139D +/* RTC */ +#define EXYNOS4210_RTC_BASE_ADDR 0x1007 + /* MCT */ #define EXYNOS4210_MCT_BASE_ADDR 0x1005 @@ -258,6 +261,11 @@ Exynos4210State *exynos4210_init(MemoryRegion *system_mem, s-irq_table[exynos4210_get_irq(22, 3)], s-irq_table[exynos4210_get_irq(22, 4)], NULL); +/* RTC */ +sysbus_create_varargs(exynos4210.rtc, EXYNOS4210_RTC_BASE_ADDR, + s-irq_table[exynos4210_get_irq(23, 0)], + s-irq_table[exynos4210_get_irq(23, 1)], + NULL); /* Multi Core Timer */ dev = qdev_create(NULL, exynos4210.mct); diff --git a/hw/exynos4210_rtc.c b/hw/exynos4210_rtc.c new file mode 100644 index 000..2ac6301 --- /dev/null +++ b/hw/exynos4210_rtc.c @@ -0,0 +1,607 @@ +/* + * Samsung exynos4210 Real Time Clock + * + * Copyright (c) 2012 Samsung Electronics Co., Ltd. + * Ogurtsov Oleg o.ogurt...@samsung.com + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see http://www.gnu.org/licenses/. + * + */ + +/* Description: + * Register RTCCON: + * CLKSEL Bit[1] not used + * CLKOUTEN Bit[9] not used + */ + +#include sysbus.h +#include qemu-timer.h +#include qemu-common.h +#include ptimer.h + +#include hw.h +#include qemu-timer.h +#include sysemu.h + +#include exynos4210.h + +#define DEBUG_RTC 0 + +#if DEBUG_RTC +#define DPRINTF(fmt, ...) \ +do { fprintf(stdout, RTC: [%24s:%5d] fmt, __func__, __LINE__, \ +## __VA_ARGS__); } while (0) +#else +#define DPRINTF(fmt, ...) do {} while (0) +#endif + +#define EXYNOS4210_RTC_REG_MEM_SIZE 0x0100 + +#define INTP0x0030 +#define RTCCON 0x0040 +#define TICCNT 0x0044 +#define RTCALM 0x0050 +#define ALMSEC 0x0054 +#define ALMMIN 0x0058 +#define ALMHOUR 0x005C +#define ALMDAY 0x0060 +#define ALMMON 0x0064 +#define ALMYEAR 0x0068 +#define BCDSEC 0x0070 +#define BDCMIN 0x0074 +#define BCDHOUR 0x0078 +#define BCDDAY 0x007C +#define BCDDAYWEEK 0x0080 +#define BCDMON 0x0084 +#define BCDYEAR 0x0088 +#define CURTICNT0x0090 + +#define TICK_TIMER_ENABLE 0x0100 +#define TICNT_THRESHHOLD2 + + +#define RTC_ENABLE 0x0001 + +#define INTP_TICK_ENABLE0x0001 +#define INTP_ALM_ENABLE 0x0002 + +#define ALARM_INT_ENABLE0x0040 + +#define RTC_BASE_FREQ 32768 + +typedef struct Exynos4210RTCState { +SysBusDevice busdev; +MemoryRegion iomem; + +/* registers */ +uint32_treg_intp; +uint32_treg_rtccon; +uint32_treg_ticcnt; +uint32_treg_rtcalm; +uint32_treg_almsec; +uint32_treg_almmin; +uint32_treg_almhour; +uint32_treg_almday; +uint32_treg_almmon; +uint32_treg_almyear; +uint32_treg_curticcnt; + +ptimer_state*ptimer;/* tick timer */ +ptimer_state*ptimer_1Hz;/* clock timer */ +uint32_tfreq; + +qemu_irqtick_irq; /* Time Tick Generator irq */ +qemu_irqalm_irq;/* alarm irq */ + +struct tm current_tm; /* current time
Re: [Qemu-devel] [PATCH 3/5] memory: Flush coalesced MMIO on mapping and state changes
Am 25.06.2012 09:01, schrieb Jan Kiszka: Flush pending coalesced MMIO before performing mapping or state changes that could affect the event orderings or route the buffered requests to a wrong region. Signed-off-by: Jan Kiszka jan.kis...@siemens.com In addition, we also have to Stray paste or missing addition to the above? Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH 0/7] esp: add AMD PCscsi emulation
Am 24.06.2012 19:15, schrieb Hervé Poussineau: This patchset adds a PCI connection to the 53c9x emulation already present in QEMU. The emulated card is the AMD PCscsi adapter, and has been tested on multiple Microsoft operating systems. Changes v1 - v2: - use qemu_log_mask(LOG_UNIMP, ...) instead of error_report() - use prefix esp_pci_* for functions related to PCI emulation - add missing break - fix DMA start for Select and Transfer Information commands Hervé Poussineau (9): esp: execute select commands immediately when it is a non-dma command esp: delay Transfer Information command if dma is not enabled esp: implement Disable selection command esp: implement Reset ATN command esp: support future change of chip_id esp: use hba_private field instead of a complex cast esp: split esp code into generic chip emulation and sysbus layer esp: use trace framework instead of stderr output esp: add AMD PCscsi emulation (PCI SCSI adapter) default-configs/i386-softmmu.mak |1 + hw/esp.c | 544 -- hw/pci_ids.h |1 + trace-events | 17 ++ 4 files changed, 481 insertions(+), 82 deletions(-) Please keep Paolo CCed, he is the SCSI maintainer (though he's still on vacation this week, iirc). Kevin
Re: [Qemu-devel] [PATCH v6 5/6] fdc_test: update media_change test
Am 24.06.2012 08:16, schrieb Blue Swirl: On Fri, Jun 22, 2012 at 10:33 AM, Pavel Hrdina phrd...@redhat.com wrote: After rewrite DSKCHG bit handling the test has to be updated. Now is needed to seek to different track to clear DSKCHG bit. Signed-off-by: Pavel Hrdina phrd...@redhat.com --- tests/fdc-test.c | 29 + 1 files changed, 21 insertions(+), 8 deletions(-) diff --git a/tests/fdc-test.c b/tests/fdc-test.c index 610e2f1..5280eff 100644 --- a/tests/fdc-test.c +++ b/tests/fdc-test.c @@ -156,19 +156,20 @@ static uint8_t send_read_command(void) return ret; } -static void send_step_pulse(void) +static void send_step_pulse(bool chg_cyl) { int drive = 0; int head = 0; -static int cyl = 0; +int cyl = 0; + +if (chg_cyl) +cyl = (cyl + 1) % 4; Missing braces, please use checkpatch.pl to avoid these issues. % 4 could be turned into 3, maybe with a separate patch. I wouldn't do that. It's not something like a mask for a bitfield, but just a way to let counting start from the beginning after a while. If you have two way to express the same, choose the one that expresses the clearest what the real intention is. The compiler will optimise it (not that this optimisation in a test case like this really mattered...) Kevin
Re: [Qemu-devel] [PATCH] Exynos4: added RTC device
On 25.06.2012 11:55, Oleg Ogurtsov wrote: Signed-off-by: Oleg Ogurtsovo.ogurt...@samsung.com --- hw/arm/Makefile.objs |1 + hw/exynos4210.c |8 + hw/exynos4210_rtc.c | 607 ++ 3 files changed, 616 insertions(+), 0 deletions(-) create mode 100644 hw/exynos4210_rtc.c diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs index 88ff47d..0fdb832 100644 --- a/hw/arm/Makefile.objs +++ b/hw/arm/Makefile.objs @@ -11,6 +11,7 @@ obj-y += realview_gic.o realview.o arm_sysctl.o arm11mpcore.o a9mpcore.o obj-y += exynos4210_gic.o exynos4210_combiner.o exynos4210.o obj-y += exynos4_boards.o exynos4210_uart.o exynos4210_pwm.o obj-y += exynos4210_pmu.o exynos4210_mct.o exynos4210_fimd.o +obj-y += exynos4210_rtc.o obj-y += arm_l2x0.o obj-y += arm_mptimer.o a15mpcore.o obj-y += armv7m.o armv7m_nvic.o stellaris.o pl022.o stellaris_enet.o diff --git a/hw/exynos4210.c b/hw/exynos4210.c index 9c20b3f..6e105e0 100644 --- a/hw/exynos4210.c +++ b/hw/exynos4210.c @@ -33,6 +33,9 @@ /* PWM */ #define EXYNOS4210_PWM_BASE_ADDR 0x139D +/* RTC */ +#define EXYNOS4210_RTC_BASE_ADDR 0x1007 + /* MCT */ #define EXYNOS4210_MCT_BASE_ADDR 0x1005 @@ -258,6 +261,11 @@ Exynos4210State *exynos4210_init(MemoryRegion *system_mem, s-irq_table[exynos4210_get_irq(22, 3)], s-irq_table[exynos4210_get_irq(22, 4)], NULL); +/* RTC */ +sysbus_create_varargs(exynos4210.rtc, EXYNOS4210_RTC_BASE_ADDR, + s-irq_table[exynos4210_get_irq(23, 0)], + s-irq_table[exynos4210_get_irq(23, 1)], + NULL); /* Multi Core Timer */ dev = qdev_create(NULL, exynos4210.mct); diff --git a/hw/exynos4210_rtc.c b/hw/exynos4210_rtc.c new file mode 100644 index 000..2ac6301 --- /dev/null +++ b/hw/exynos4210_rtc.c @@ -0,0 +1,607 @@ +/* + * Samsung exynos4210 Real Time Clock + * + * Copyright (c) 2012 Samsung Electronics Co., Ltd. + * Ogurtsov Olego.ogurt...@samsung.com + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, seehttp://www.gnu.org/licenses/. + * + */ + +/* Description: + * Register RTCCON: + * CLKSEL Bit[1] not used + * CLKOUTEN Bit[9] not used + */ + +#include sysbus.h +#include qemu-timer.h +#include qemu-common.h +#include ptimer.h + +#include hw.h +#include qemu-timer.h +#include sysemu.h + +#include exynos4210.h + +#define DEBUG_RTC 0 + +#if DEBUG_RTC +#define DPRINTF(fmt, ...) \ +do { fprintf(stdout, RTC: [%24s:%5d] fmt, __func__, __LINE__, \ +## __VA_ARGS__); } while (0) +#else +#define DPRINTF(fmt, ...) do {} while (0) +#endif + +#define EXYNOS4210_RTC_REG_MEM_SIZE 0x0100 + +#define INTP0x0030 +#define RTCCON 0x0040 +#define TICCNT 0x0044 +#define RTCALM 0x0050 +#define ALMSEC 0x0054 +#define ALMMIN 0x0058 +#define ALMHOUR 0x005C +#define ALMDAY 0x0060 +#define ALMMON 0x0064 +#define ALMYEAR 0x0068 +#define BCDSEC 0x0070 +#define BDCMIN 0x0074 +#define BCDHOUR 0x0078 +#define BCDDAY 0x007C +#define BCDDAYWEEK 0x0080 +#define BCDMON 0x0084 +#define BCDYEAR 0x0088 +#define CURTICNT0x0090 + +#define TICK_TIMER_ENABLE 0x0100 +#define TICNT_THRESHHOLD2 + + +#define RTC_ENABLE 0x0001 + +#define INTP_TICK_ENABLE0x0001 +#define INTP_ALM_ENABLE 0x0002 + +#define ALARM_INT_ENABLE0x0040 + +#define RTC_BASE_FREQ 32768 + +typedef struct Exynos4210RTCState { +SysBusDevice busdev; +MemoryRegion iomem; + +/* registers */ +uint32_treg_intp; +uint32_treg_rtccon; +uint32_treg_ticcnt; +uint32_treg_rtcalm; +uint32_treg_almsec; +uint32_treg_almmin; +uint32_treg_almhour; +uint32_treg_almday; +uint32_treg_almmon; +uint32_treg_almyear; +uint32_treg_curticcnt; + +ptimer_state*ptimer;/* tick timer */ +ptimer_state*ptimer_1Hz;/* clock timer */ +uint32_tfreq; + +qemu_irqtick_irq; /* Time Tick Generator irq */ +qemu_irqalm_irq;/* alarm
Re: [Qemu-devel] [RFC PATCH V1 1/2] arm_boot: added linux switch
On Mon, Jun 25, 2012 at 5:59 PM, Peter Maydell peter.mayd...@linaro.org wrote: On 25 June 2012 02:51, Peter Crosthwaite peter.crosthwa...@petalogix.com wrote: You rejected my workaround in V2 (i.e. using -dtb to force is_linux) on the grounds that youd accept some reasonable way of saying this ELF file is a Linux kernel, That brings us back to V1 - this patch. Any objections? I'd really like to see some input from people who deal with the other QEMU architectures on whether we can come up with something coherent as a plan for handling these loader command line options. I don't really like here's a switch but it happens to only work on ARM targets if we can avoid it. Well, this switch itself is generic. Theres no armisms in the machine opt itself. As for architectures, I can speak for Microblaze and the PPC bootloader for virtex_ml507. The DTB will blob in regardless of whether its an elf or not, so no is_linux switch is needed there for my Linux elfs. There is no functional difference between linux and not linux in those bootloaders. -- PMM
Re: [Qemu-devel] [PATCH 3/5] memory: Flush coalesced MMIO on mapping and state changes
On 06/25/2012 10:01 AM, Jan Kiszka wrote: Flush pending coalesced MMIO before performing mapping or state changes that could affect the event orderings or route the buffered requests to a wrong region. Signed-off-by: Jan Kiszka jan.kis...@siemens.com In addition, we also have to Yes, we do. void memory_region_transaction_begin(void) { +qemu_flush_coalesced_mmio_buffer(); ++memory_region_transaction_depth; } Why is this needed? @@ -1109,6 +1110,9 @@ void memory_region_sync_dirty_bitmap(MemoryRegion *mr) void memory_region_set_readonly(MemoryRegion *mr, bool readonly) { +if (!QTAILQ_EMPTY(mr-coalesced)) { +qemu_flush_coalesced_mmio_buffer(); +} The readonly attribute is inherited by subregions and alias targets, so this check is insufficient. See render_memory_region(). Need to flush unconditionally. if (mr-readonly != readonly) { mr-readonly = readonly; memory_region_update_topology(mr); @@ -1117,6 +1121,9 @@ void memory_region_set_readonly(MemoryRegion *mr, bool readonly) void memory_region_rom_device_set_readable(MemoryRegion *mr, bool readable) { +if (!QTAILQ_EMPTY(mr-coalesced)) { +qemu_flush_coalesced_mmio_buffer(); +} This property is not inherited, but let's flush unconditionally just the same, to reduce fragility. @@ -1219,6 +1228,9 @@ void memory_region_add_eventfd(MemoryRegion *mr, }; unsigned i; +if (!QTAILQ_EMPTY(mr-coalesced)) { +qemu_flush_coalesced_mmio_buffer(); +} Ditto. It's possible that an eventfd overlays a subregion which has coalescing enabled. It's not really defined what happens in this case, and such code and its submitter should be perma-nacked, but let's play it safe here since there isn't much to be gained by avoiding the flush. This code is a very slow path anyway, including and rcu and/or srcu synchronization, and a rebuild of the dispatch radix tree (trees when we dma-enable this code). for (i = 0; i mr-ioeventfd_nb; ++i) { if (memory_region_ioeventfd_before(mrfd, mr-ioeventfds[i])) { break; @@ -1249,6 +1261,9 @@ void memory_region_del_eventfd(MemoryRegion *mr, }; unsigned i; +if (!QTAILQ_EMPTY(mr-coalesced)) { +qemu_flush_coalesced_mmio_buffer(); +} Same. The repetitiveness of this code suggests a different way of doing this: make every API call be its own subtransaction and perform the flush in memory_region_begin_transaction() (maybe that's the answer to my question above). -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH v6 08/16] target-or32: Add instruction tanslation
On Mon, Jun 25, 2012 at 6:50 AM, Jia Liu pro...@gmail.com wrote: Hi Max, On Thu, Jun 21, 2012 at 6:24 PM, Max Filippov jcmvb...@gmail.com wrote: On Thu, Jun 21, 2012 at 6:58 AM, Jia Liu pro...@gmail.com wrote: Add OpenRISC instruction tanslation routines. Signed-off-by: Jia Liu pro...@gmail.com [...] + case 0x0009: + switch (op1) { + case 0x03: /*l.div*/ + LOG_DIS(l.div r%d, r%d, r%d\n, rd, ra, rb); + { + int lab0 = gen_new_label(); + int lab1 = gen_new_label(); + int lab2 = gen_new_label(); + TCGv_i32 sr_ove = tcg_temp_local_new_i32(); + if (rb == 0) { + tcg_gen_ori_tl(cpu_sr, cpu_sr, (SR_OV | SR_CY)); + tcg_gen_brcondi_tl(TCG_COND_NE, sr_ove, SR_OVE, lab0); + tcg_gen_andi_tl(sr_ove, cpu_sr, SR_OVE); + gen_exception(dc, EXCP_RANGE); + gen_set_label(lab0); + } else { + tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_R[rb], + 0x, lab1); + tcg_gen_brcondi_tl(TCG_COND_NE, cpu_R[ra], + 0x, lab2); + tcg_gen_brcondi_tl(TCG_COND_NE, cpu_R[rb], + 0x8000, lab2); + gen_set_label(lab1); + tcg_gen_ori_tl(cpu_sr, cpu_sr, (SR_OV | SR_CY)); + tcg_gen_andi_tl(sr_ove, cpu_sr, SR_OVE); + tcg_gen_brcondi_tl(TCG_COND_NE, sr_ove, SR_OVE, lab2); Causes host division by zero/overflow. I'd suggest to brcond to lab3 set after the final tcg_gen_div. Causes host division by zero/overflow? Can I handle the host code? I'm confused about this. May I get more comment about this? Sorry I didn't understand it. The brcondi in question jumps to the division operation although we know for sure that it's going to be a division by zero or an overflow. The host is not smarter than we are, it cannot magically divide by zero. So, if the result register value is undefined in case of division by zero/overflow than you should skip the division and brcondi to a label lab3, right after it. + gen_exception(dc, EXCP_RANGE); + gen_set_label(lab2); + tcg_gen_div_tl(cpu_R[rd], cpu_R[ra], cpu_R[rb]); + gen_set_label(lab3); + } + tcg_temp_free_i32(sr_ove); + } + break; A picture worth a thousand words is the test for this specific case. I think it can look like this (because SR_OVE is cleared upon reset): int main(void) { int a, b, c; b = 0x8000; c = 0x; __asm (l.div %0, %1, %2\n : =r(a) : r(b), r(c) ); return 0; } I believe that this test with the current div implementation would crash qemu. BTW, shouldn't tests that detect wrong behavior do return 1 or something like this to make make check fully automatic? -- Thanks. -- Max
Re: [Qemu-devel] [RFC PATCH V1 1/2] arm_boot: added linux switch
On 25 June 2012 09:55, Peter Crosthwaite peter.crosthwa...@petalogix.com wrote: As for architectures, I can speak for Microblaze and the PPC bootloader for virtex_ml507. The DTB will blob in regardless of whether its an elf or not, so no is_linux switch is needed there for my Linux elfs. There is no functional difference between linux and not linux in those bootloaders. Yes, it is exactly the on architecture X we behave like this and on architecture Y we do something else that I dislike, and this is why I'm not happy with any patch which is just trying to band-aid a situation on architecture X rather than trying to address the more general situation. -- PMM
Re: [Qemu-devel] [PATCH 2/2] virtio-scsi: Implement hotplug support for virtio-scsi
On Mon, Jun 25, 2012 at 03:51:13AM -0400, m...@linux.vnet.ibm.com wrote: Quoting Stefan Hajnoczi stefa...@gmail.com: On Wed, Jun 20, 2012 at 7:47 AM, Cong Meng m...@linux.vnet.ibm.com wrote: Implement the hotplug() and hot_unplug() interfaces in virtio-scsi, by signal the virtio_scsi.ko in guest kernel via event virtual queue. The counterpart patch of virtio_scsi.ko will be sent soon in another thread. Signed-off-by: Cong Meng m...@linux.vnet.ibm.com Signed-off-by: Sen Wang senw...@linux.vnet.ibm.com --- hw/virtio-scsi.c | 72 +++-- 1 files changed, 69 insertions(+), 3 deletions(-) I compared against the virtio-scsi specification and this looks good: http://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf Dropped events and event throttling are not implemented by this patch. This means that the guest can miss events if it runs out of event queue elements. A scenario that might be able to trigger this is if multiple LUNs are hotplugged in a single QEMU monitor callback. Implementing dropped events is easy in hw/virtio-scsi.c. Keep a bool or counter of dropped events and report them when the guest kicks us with a free event element (virtio_scsi_handle_event). Yes. It's easy to do this in qemu. But I'm not sure what should be done in virtio-scsi.ko to respond the VIRTIO_SCSI_T_EVENTS_MISSED event. The spec says poll the logical units for unit attention conditions, or just a whole bus rescan? I'm not sure what the answer is either, maybe you can find an existing SCSI LLD that does what you need. Stefan
Re: [Qemu-devel] [PATCH v4 1/7] qemu-char: Add MSG_CMSG_CLOEXEC flag to recvmsg
On Fri, Jun 22, 2012 at 01:31:17PM -0600, Eric Blake wrote: On 06/22/2012 12:36 PM, Corey Bryant wrote: This sets the close-on-exec flag for the file descriptor received via SCM_RIGHTS. Signed-off-by: Corey Bryant cor...@linux.vnet.ibm.com --- v4 -This patch is new in v4 (ebl...@redhat.com) qemu-char.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/qemu-char.c b/qemu-char.c index c2aaaee..f890113 100644 --- a/qemu-char.c +++ b/qemu-char.c @@ -2263,7 +2263,7 @@ static ssize_t tcp_chr_recv(CharDriverState *chr, char *buf, size_t len) msg.msg_control = msg_control; msg.msg_controllen = sizeof(msg_control); -ret = recvmsg(s-fd, msg, 0); +ret = recvmsg(s-fd, msg, MSG_CMSG_CLOEXEC); MSG_CMSG_CLOEXEC is not (yet) in POSIX (although it has been proposed for addition); therefore, at the moment, it only exists on Linux and Cygwin. Does this need to have conditional code to allow compilation on BSD, such as: #ifndef MSG_CMSG_CLOEXEC # define MSG_CMSG_CLOEXEC 0 #endif as well as fallback code that sets FD_CLOEXEC manually via fcntl() when MSG_CMSG_CLOEXEC is missing? Good point. I think the answer is yes. Just like qemu_open() we can wrap recvmsg(2) with fd so that platforms with MSG_CMSG_CLOEXEC use that flag and other platforms use qemu_set_cloexec(). Stefan
Re: [Qemu-devel] [RFC PATCH V1 1/2] arm_boot: added linux switch
On Mon, Jun 25, 2012 at 7:03 PM, Peter Maydell peter.mayd...@linaro.org wrote: On 25 June 2012 09:55, Peter Crosthwaite peter.crosthwa...@petalogix.com wrote: As for architectures, I can speak for Microblaze and the PPC bootloader for virtex_ml507. The DTB will blob in regardless of whether its an elf or not, so no is_linux switch is needed there for my Linux elfs. There is no functional difference between linux and not linux in those bootloaders. Yes, it is exactly the on architecture X we behave like this and on architecture Y we do something else that I dislike, and this is why I'm not happy with any patch which is just trying to band-aid a situation on architecture X rather than trying to address the more general situation. Well if X and Y behave differently and you want to unify them, then either X or Y loses backwards compatibility. Someone is going to have their command line change on them. Looking at some of the other bootloaders, I dont see others that assume elfs are not linux, so it seems to me that ARM is the exception, not the rule. So from that unification would involve getting rid of this ARM assumption that elfs are different. Just boot the elf with the linux style bootloader like most other platforms. -- PMM
Re: [Qemu-devel] [PATCH] Exynos4: added RTC device
Am 25.06.2012 09:55, schrieb Oleg Ogurtsov: Signed-off-by: Oleg Ogurtsov o.ogurt...@samsung.com --- hw/arm/Makefile.objs |1 + hw/exynos4210.c |8 + hw/exynos4210_rtc.c | 607 ++ 3 files changed, 616 insertions(+), 0 deletions(-) create mode 100644 hw/exynos4210_rtc.c This RTC like many other Exynos devices has no dependency on the CPU. I have a patch in preparation that moves such devices from hw/arm/Makefile.objs to hw/Makefile.objs. I don't object to this patch, not even minor style nits spotted, compliment, but if you have to respin for some reason, it would be nice if you could consider that improvement. One thing I noted though, is it time to share the static BCD conversion helpers used in the various RTCs? (bcd.c? qemu-common.h?) Or are they slightly different? Regards, Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
[Qemu-devel] [RFC] Attaching EHCI to sysbus.
Hi All, I have a platform (Xilinx Zynq) that has a USB EHCI controller that attaches directly to the system bus and not through PCI. We are looking for a way to disentangle EHCI from PCI - currently it inherits from TYPE_PCI_DEVICE: static TypeInfo ehci_info = { .name = usb-ehci, .parent= TYPE_PCI_DEVICE, .instance_size = sizeof(EHCIState), .class_init= ehci_class_init, }; How would we go about edit EHCI to inherit from TYPE_SYSBUS_DEVICE as well? Regards, Peter
Re: [Qemu-devel] [PATCH] Exynos4: added RTC device
On 25 June 2012 08:55, Oleg Ogurtsov o.ogurt...@samsung.com wrote: Subject: [PATCH] Exynos4: added RTC device add, please. Signed-off-by: Oleg Ogurtsov o.ogurt...@samsung.com (You don't need to send a cover letter for a single patch, by the way.) +#define BCDSEC 0x0070 +#define BDCMIN 0x0074 typo, should be BCDMIN I assume. +#define BCDHOUR 0x0078 +#define BCDDAY 0x007C +#define BCDDAYWEEK 0x0080 +#define BCDMON 0x0084 +#define BCDYEAR 0x0088 +/* + * 1Hz clock handler + */ +static void exynos4210_rtc_1Hz_tick(void *opaque) +{ + Exynos4210RTCState *s = (Exynos4210RTCState *)opaque; + + rtc_next_second(s-current_tm); + /* DPRINTF(1Hz tick\n); */ + + /* raise IRQ */ + if (s-reg_rtcalm ALARM_INT_ENABLE) { + check_alarm_raise(s); + } + + ptimer_set_count(s-ptimer_1Hz, RTC_BASE_FREQ); + ptimer_run(s-ptimer_1Hz, 1); +} You could I think structure this so that instead of running the timer every second you only have it go off at the alarm time (you then recalculate the right values etc if the guest reads/writes the registers). That would be more complicated though and I guess one timer fire every second isn't a big deal. We seem to do it this way in other RTCs too, so this code is OK. -- PMM
[Qemu-devel] [net-next RFC V4 PATCH 3/4] virtio: introduce a method to get the irq of a specific virtqueue
Device specific irq optimizations such as irq affinity may be used by virtio drivers. So this patch introduce a new method to get the irq of a specific virtqueue. After this patch, virtio device drivers could query the irq and do device specific optimizations. First user would be virtio-net. Signed-off-by: Jason Wang jasow...@redhat.com --- drivers/lguest/lguest_device.c |8 drivers/s390/kvm/kvm_virtio.c |6 ++ drivers/virtio/virtio_mmio.c |8 drivers/virtio/virtio_pci.c| 12 include/linux/virtio_config.h |4 5 files changed, 38 insertions(+), 0 deletions(-) diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c index 9e8388e..bcd080f 100644 --- a/drivers/lguest/lguest_device.c +++ b/drivers/lguest/lguest_device.c @@ -392,6 +392,13 @@ static const char *lg_bus_name(struct virtio_device *vdev) return ; } +static int lg_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq) +{ + struct lguest_vq_info *lvq = vq-priv; + + return lvq-config.irq; +} + /* The ops structure which hooks everything together. */ static struct virtio_config_ops lguest_config_ops = { .get_features = lg_get_features, @@ -404,6 +411,7 @@ static struct virtio_config_ops lguest_config_ops = { .find_vqs = lg_find_vqs, .del_vqs = lg_del_vqs, .bus_name = lg_bus_name, + .get_vq_irq = lg_get_vq_irq, }; /* diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c index d74e9ae..a897de2 100644 --- a/drivers/s390/kvm/kvm_virtio.c +++ b/drivers/s390/kvm/kvm_virtio.c @@ -268,6 +268,11 @@ static const char *kvm_bus_name(struct virtio_device *vdev) return ; } +static int kvm_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq) +{ + return 0x2603; +} + /* * The config ops structure as defined by virtio config */ @@ -282,6 +287,7 @@ static struct virtio_config_ops kvm_vq_configspace_ops = { .find_vqs = kvm_find_vqs, .del_vqs = kvm_del_vqs, .bus_name = kvm_bus_name, + .get_vq_irq = kvm_get_vq_irq, }; /* diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index f5432b6..2ba37ed 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -411,6 +411,13 @@ static const char *vm_bus_name(struct virtio_device *vdev) return vm_dev-pdev-name; } +static int vm_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq) +{ + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); + + return platform_get_irq(vm_dev-pdev, 0); +} + static struct virtio_config_ops virtio_mmio_config_ops = { .get= vm_get, .set= vm_set, @@ -422,6 +429,7 @@ static struct virtio_config_ops virtio_mmio_config_ops = { .get_features = vm_get_features, .finalize_features = vm_finalize_features, .bus_name = vm_bus_name, + .get_vq_irq = vm_get_vq_irq, }; diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c index adb24f2..c062227 100644 --- a/drivers/virtio/virtio_pci.c +++ b/drivers/virtio/virtio_pci.c @@ -607,6 +607,17 @@ static const char *vp_bus_name(struct virtio_device *vdev) return pci_name(vp_dev-pci_dev); } +static int vp_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq) +{ + struct virtio_pci_device *vp_dev = to_vp_device(vdev); + struct virtio_pci_vq_info *info = vq-priv; + + if (vp_dev-intx_enabled) + return vp_dev-pci_dev-irq; + else + return vp_dev-msix_entries[info-msix_vector].vector; +} + static struct virtio_config_ops virtio_pci_config_ops = { .get= vp_get, .set= vp_set, @@ -618,6 +629,7 @@ static struct virtio_config_ops virtio_pci_config_ops = { .get_features = vp_get_features, .finalize_features = vp_finalize_features, .bus_name = vp_bus_name, + .get_vq_irq = vp_get_vq_irq, }; static void virtio_pci_release_dev(struct device *_d) diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index fc457f4..acd6930 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -98,6 +98,9 @@ * vdev: the virtio_device * This returns a pointer to the bus name a la pci_name from which * the caller can then copy. + * @get_vq_irq: get the irq numer of the specific virt queue. + * vdev: the virtio_device + * vq: the virtqueue */ typedef void vq_callback_t(struct virtqueue *); struct virtio_config_ops { @@ -116,6 +119,7 @@ struct virtio_config_ops { u32 (*get_features)(struct virtio_device *vdev); void (*finalize_features)(struct virtio_device *vdev); const char *(*bus_name)(struct virtio_device *vdev); + int (*get_vq_irq)(struct virtio_device *vdev, struct virtqueue *vq); }; /* If driver didn't advertise
[Qemu-devel] [net-next RFC V4 PATCH 4/4] virtio_net: multiqueue support
This addes multiqueue support to virtio_net driver. This feature is negotiated through VIRTIO_NET_F_MULTIQUEUE. The driver expects the number of rx/tx queue paris is equal to the number of vcpus. To maximize the performance under this per-cpu rx/tx queue pairs, some optimization were introduced: - Txq selection is based on the processor id in order to avoid contending a lock whose owner may exits to host. - Since the txq/txq were per-cpu, affinity hint were set to the cpu that owns the queue pairs. Signed-off-by: Krishna Kumar krkum...@in.ibm.com Signed-off-by: Jason Wang jasow...@redhat.com --- drivers/net/virtio_net.c | 695 include/linux/virtio_net.h |2 + 2 files changed, 506 insertions(+), 191 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index f18149a..5c719de 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -26,6 +26,7 @@ #include linux/scatterlist.h #include linux/if_vlan.h #include linux/slab.h +#include linux/interrupt.h static int napi_weight = 128; module_param(napi_weight, int, 0444); @@ -51,43 +52,73 @@ struct virtnet_stats { u64 rx_packets; }; -struct virtnet_info { - struct virtio_device *vdev; - struct virtqueue *rvq, *svq, *cvq; - struct net_device *dev; +/* Internal representation of a send virtqueue */ +struct send_queue { + /* Virtqueue associated with this send _queue */ + struct virtqueue *vq; + + /* TX: fragments + linear part + virtio header */ + struct scatterlist sg[MAX_SKB_FRAGS + 2]; + + cpumask_var_t affinity_mask; +}; + +/* Internal representation of a receive virtqueue */ +struct receive_queue { + /* Virtqueue associated with this receive_queue */ + struct virtqueue *vq; + + /* Back pointer to the virtnet_info */ + struct virtnet_info *vi; + struct napi_struct napi; - unsigned int status; /* Number of input buffers, and max we've ever had. */ unsigned int num, max; + /* Work struct for refilling if we run low on memory. */ + struct delayed_work refill; + + /* Chain pages by the private ptr. */ + struct page *pages; + + /* RX: fragments + linear part + virtio header */ + struct scatterlist sg[MAX_SKB_FRAGS + 2]; + + cpumask_var_t affinity_mask; +}; + +struct virtnet_info { + int num_queue_pairs;/* # of RX/TX vq pairs */ + + struct send_queue **sq; + struct receive_queue **rq; + struct virtqueue *cvq; + + struct virtio_device *vdev; + struct net_device *dev; + unsigned int status; + /* I like... big packets and I cannot lie! */ bool big_packets; /* Host will merge rx buffers for big packets (shake it! shake it!) */ bool mergeable_rx_bufs; + /* Has control virtqueue */ + bool has_cvq; + /* enable config space updates */ bool config_enable; /* Active statistics */ struct virtnet_stats __percpu *stats; - /* Work struct for refilling if we run low on memory. */ - struct delayed_work refill; - /* Work struct for config space updates */ struct work_struct config_work; /* Lock for config space updates */ struct mutex config_lock; - - /* Chain pages by the private ptr. */ - struct page *pages; - - /* fragments + linear part + virtio header */ - struct scatterlist rx_sg[MAX_SKB_FRAGS + 2]; - struct scatterlist tx_sg[MAX_SKB_FRAGS + 2]; }; struct skb_vnet_hdr { @@ -108,6 +139,22 @@ struct padded_vnet_hdr { char padding[6]; }; +static inline int txq_get_qnum(struct virtnet_info *vi, struct virtqueue *vq) +{ + int ret = virtqueue_get_queue_index(vq); + + /* skip ctrl vq */ + if (vi-has_cvq) + return (ret - 1) / 2; + else + return ret / 2; +} + +static inline int rxq_get_qnum(struct virtnet_info *vi, struct virtqueue *vq) +{ + return virtqueue_get_queue_index(vq) / 2; +} + static inline struct skb_vnet_hdr *skb_vnet_hdr(struct sk_buff *skb) { return (struct skb_vnet_hdr *)skb-cb; @@ -117,22 +164,22 @@ static inline struct skb_vnet_hdr *skb_vnet_hdr(struct sk_buff *skb) * private is used to chain pages for big packets, put the whole * most recent used list in the beginning for reuse */ -static void give_pages(struct virtnet_info *vi, struct page *page) +static void give_pages(struct receive_queue *rq, struct page *page) { struct page *end; /* Find end of list, sew whole thing into vi-pages. */ for (end = page; end-private; end = (struct page *)end-private); - end-private = (unsigned long)vi-pages; - vi-pages = page; + end-private = (unsigned long)rq-pages; + rq-pages = page; } -static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask) +static struct
Re: [Qemu-devel] [PATCH v6 5/6] fdc_test: update media_change test
Am 22.06.2012 12:33, schrieb Pavel Hrdina: After rewrite DSKCHG bit handling the test has to be updated. Now is needed to seek to different track to clear DSKCHG bit. Signed-off-by: Pavel Hrdina phrd...@redhat.com --- tests/fdc-test.c | 29 + 1 files changed, 21 insertions(+), 8 deletions(-) diff --git a/tests/fdc-test.c b/tests/fdc-test.c index 610e2f1..5280eff 100644 --- a/tests/fdc-test.c +++ b/tests/fdc-test.c @@ -156,19 +156,20 @@ static uint8_t send_read_command(void) return ret; } -static void send_step_pulse(void) +static void send_step_pulse(bool chg_cyl) { int drive = 0; int head = 0; -static int cyl = 0; +int cyl = 0; + +if (chg_cyl) +cyl = (cyl + 1) % 4; Why do you remove the static? Previously, the function would cycle through cylinders 1-4. Now it's always 0 if !chg_cyl and 1 if chg_cyl. I think you need to fix this. I also don't quite understand why you move the increment to here instead of leaving it as the bottom. It doesn't look wrong, but pointless. floppy_send(CMD_SEEK); floppy_send(head 2 | drive); g_assert(!get_irq(FLOPPY_IRQ)); floppy_send(cyl); ack_irq(); - -cyl = (cyl + 1) % 4; } static uint8_t cmos_read(uint8_t reg) @@ -195,8 +196,7 @@ static void test_no_media_on_start(void) assert_bit_set(dir, DSKCHG); dir = inb(FLOPPY_BASE + reg_dir); assert_bit_set(dir, DSKCHG); -send_step_pulse(); -send_step_pulse(); +send_step_pulse(1); This is a bool parameter, so you should pass true/false instead of 0/1. Other than that, the test case additions look good and I'll accept it as sufficient for the DSKCHG handling change. It would be great, though, to add some more cases that test the implicit seeks during commands like READ (ideally one for each command that can seek). Kevin
Re: [Qemu-devel] [RFC] Attaching EHCI to sysbus.
Hi Peter, Am 25.06.2012 11:34, schrieb Peter Crosthwaite: I have a platform (Xilinx Zynq) that has a USB EHCI controller that attaches directly to the system bus and not through PCI. We are looking for a way to disentangle EHCI from PCI - currently it inherits from TYPE_PCI_DEVICE: static TypeInfo ehci_info = { .name = usb-ehci, .parent= TYPE_PCI_DEVICE, .instance_size = sizeof(EHCIState), .class_init= ehci_class_init, }; How would we go about edit EHCI to inherit from TYPE_SYSBUS_DEVICE as well? As mentioned recently, I need that for the Tegra platform as well, please keep me in the loop. Basically it means adding a second TypeInfo with its own class_init. We'd need to have PCIEHCIState with PCIDevice as parent and a SysBusEHCIState with SysBusDevice as parent, sharing a common EHCIState. The m48t59 comes to mind as example with both ISADevice and SysBusDevice. Maintaining VMState compatibility is likely going to be tricky. Following David's DMA refactoring this also means that the PCI DMA helpers used in EHCI need to be replaced by the generic ones now suggested by Ben, as done for OHCI. Regards, Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH 1/2] CPUArchState: introduce per-cpu lock
On Sun, Jun 24, 2012 at 3:05 PM, liu ping fan qemul...@gmail.com wrote: On Fri, Jun 22, 2012 at 8:36 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Thu, Jun 21, 2012 at 4:06 PM, Liu Ping Fan qemul...@gmail.com wrote: diff --git a/cpu-defs.h b/cpu-defs.h index f49e950..7305822 100644 --- a/cpu-defs.h +++ b/cpu-defs.h @@ -30,6 +30,7 @@ #include osdep.h #include qemu-queue.h #include targphys.h +#include qemu-thread-posix.h This breaks Windows, you need qemu-thread.h. #ifndef TARGET_LONG_BITS #error TARGET_LONG_BITS must be defined before including this header @@ -220,6 +221,7 @@ typedef struct CPUWatchpoint { CPU_COMMON_THREAD \ struct QemuCond *halt_cond; \ int thread_kicked; \ + struct QemuMutex *cpu_lock; \ It would be nicer to declare it QemuMutex cpu_lock (no pointer) so that you don't need to worry about malloc/free. Yes :), I have considered about it, and not quite sure. But I figure out the lock for per-device to break BQL will be like this for some reason. After all, have not decided yet. Start simple. If you need it to be a pointer later that's fine but for now I see no reason to do the malloc (especially since there is no free!). diff --git a/main-loop.h b/main-loop.h index dce1cd9..d8d44a4 100644 --- a/main-loop.h +++ b/main-loop.h @@ -323,6 +323,9 @@ void qemu_bh_delete(QEMUBH *bh); int qemu_add_child_watch(pid_t pid); #endif +void qemu_mutex_lock_cpu(void *_env); +void qemu_mutex_unlock_cpu(void *_env); Why void* instead of CPUArchState*? Because we can hide the CPUArchState from apic.c There's probably a nicer way of doing that than void*. Stefan
Re: [Qemu-devel] [PATCH 1/2] CPUArchState: introduce per-cpu lock
Am 25.06.2012 12:04, schrieb Stefan Hajnoczi: On Sun, Jun 24, 2012 at 3:05 PM, liu ping fan qemul...@gmail.com wrote: On Fri, Jun 22, 2012 at 8:36 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Thu, Jun 21, 2012 at 4:06 PM, Liu Ping Fan qemul...@gmail.com wrote: diff --git a/main-loop.h b/main-loop.h index dce1cd9..d8d44a4 100644 --- a/main-loop.h +++ b/main-loop.h @@ -323,6 +323,9 @@ void qemu_bh_delete(QEMUBH *bh); int qemu_add_child_watch(pid_t pid); #endif +void qemu_mutex_lock_cpu(void *_env); +void qemu_mutex_unlock_cpu(void *_env); Why void* instead of CPUArchState*? Because we can hide the CPUArchState from apic.c There's probably a nicer way of doing that than void*. CPUState, as requested. Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
[Qemu-devel] [RFC V2 PATCH 0/4] Multiqueue support for tap and virtio-net/vhost
Hello all: This seires is an update of last version of multiqueue support to add multiqueue capability to both tap and virtio-net. Some kinds of tap backends has (macvatp in linux) or would (tap) support multiqueue. In such kind of tap backend, each file descriptor of a tap is a qeueu and ioctls were prodived to attach an exist tap file descriptor to the tun/tap device. So the patch let qemu to use this kind of backend, and let it can transmit and receving packets through multiple file descriptors. Patch 1 introduce a new help to get all matched options, after this patch, we could pass multiple file descriptors to a signle netdev by: qemu -netdev tap,id=hn0,fd=10,fd=11,... Patch 2 introduce generic helpers in tap to attach or detach a file descriptor from a tap device, emulated nics could use this helper to enable/disable queues. Patch 3 modifies the NICState to allow multiple VLANClientState to be stored in it, with this patch, qemu has basic support of multiple capable tap backend. Patch 4 converts virtio-net/vhost to be multiple capable. The vhost device were created per tx/rx queue pairs as usual. Changes from V1: - rebase to the latest - fix memory leak in parse_netdev - fix guest notifiers assignment/de-assignment - changes the command lines to: qemu -netdev tap,queues=2 -device virtio-net-pci,queues=2 TODO: - netdev_add - bridge helper for multiple queue backend References: - V1 http://comments.gmane.org/gmane.comp.emulators.qemu/100481 Please review and comments. --- Jason Wang (4): option: introduce qemu_get_opt_all() tap: multiqueue support net: multiqueue support virtio-net: add multiqueue support hw/dp8393x.c |2 hw/mcf_fec.c |2 hw/qdev-properties.c | 33 +++- hw/qdev.h|3 hw/vhost.c | 58 -- hw/vhost.h |1 hw/vhost_net.c |7 + hw/vhost_net.h |2 hw/virtio-net.c | 461 +- hw/virtio-net.h |3 net.c| 62 ++- net.h| 16 +- net/tap-aix.c| 13 + net/tap-bsd.c| 13 + net/tap-haiku.c | 13 + net/tap-linux.c | 55 ++ net/tap-linux.h |3 net/tap-solaris.c| 13 + net/tap-win32.c | 11 + net/tap.c| 189 +--- net/tap.h|7 + qemu-option.c| 19 ++ qemu-option.h|2 23 files changed, 714 insertions(+), 274 deletions(-) -- Signature
[Qemu-devel] [RFC V2 PATCH 1/4] option: introduce qemu_get_opt_all()
Sometimes, we need to pass option like -netdev tap,fd=100,fd=101,fd=102 which can not be properly parsed by qemu_find_opt() because it only returns the first matched option. So qemu_get_opt_all() were introduced to return an array of pointers which contains all matched option. Signed-off-by: Jason Wang jasow...@redhat.com --- qemu-option.c | 19 +++ qemu-option.h |2 ++ 2 files changed, 21 insertions(+), 0 deletions(-) diff --git a/qemu-option.c b/qemu-option.c index bb3886c..9263125 100644 --- a/qemu-option.c +++ b/qemu-option.c @@ -545,6 +545,25 @@ static QemuOpt *qemu_opt_find(QemuOpts *opts, const char *name) return NULL; } +int qemu_opt_get_all(QemuOpts *opts, const char *name, const char **optp, + int max) +{ +QemuOpt *opt; +int index = 0; + +QTAILQ_FOREACH_REVERSE(opt, opts-head, QemuOptHead, next) { +if (strcmp(opt-name, name) == 0) { +if (index max) { +optp[index++] = opt-str; +} +if (index == max) { +break; +} +} +} +return index; +} + const char *qemu_opt_get(QemuOpts *opts, const char *name) { QemuOpt *opt = qemu_opt_find(opts, name); diff --git a/qemu-option.h b/qemu-option.h index 951dec3..3c9a273 100644 --- a/qemu-option.h +++ b/qemu-option.h @@ -106,6 +106,8 @@ struct QemuOptsList { QemuOptDesc desc[]; }; +int qemu_opt_get_all(QemuOpts *opts, const char *name, const char **optp, + int max); const char *qemu_opt_get(QemuOpts *opts, const char *name); bool qemu_opt_get_bool(QemuOpts *opts, const char *name, bool defval); uint64_t qemu_opt_get_number(QemuOpts *opts, const char *name, uint64_t defval);
[Qemu-devel] [RFC V2 PATCH 2/4] tap: multiqueue support
This patch adds basic support for the multiple queue capable tap device. When multiqueue were enabled for a tap device, user can attach/detach multiple files (sockets) to the device through TUNATTACHQUEUE/TUNDETACHQUEUE. Two helpers tun_attach() and tun_deatch() were introduced to attach and detach file. Platform-specific helpers were called and only linux helper has its content as multiqueue tap were only supported in linux. Signed-off-by: Jason Wang jasow...@redhat.com --- net.c |4 + net/tap-aix.c | 13 +++- net/tap-bsd.c | 13 +++- net/tap-haiku.c | 13 +++- net/tap-linux.c | 55 +++ net/tap-linux.h |3 + net/tap-solaris.c | 13 +++- net/tap-win32.c | 11 +++ net/tap.c | 189 ++--- net/tap.h |7 ++ 10 files changed, 245 insertions(+), 76 deletions(-) diff --git a/net.c b/net.c index 4aa416c..eabe830 100644 --- a/net.c +++ b/net.c @@ -978,6 +978,10 @@ static const struct { .name = vhostforce, .type = QEMU_OPT_BOOL, .help = force vhost on for non-MSIX virtio guests, +}, { +.name = queues, +.type = QEMU_OPT_NUMBER, +.help = number of queues the backend can provides, }, #endif /* _WIN32 */ { /* end of list */ } diff --git a/net/tap-aix.c b/net/tap-aix.c index e19aaba..f111e0f 100644 --- a/net/tap-aix.c +++ b/net/tap-aix.c @@ -25,7 +25,8 @@ #include net/tap.h #include stdio.h -int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required) +int tap_open(char *ifname, int ifname_size, int *vnet_hdr, + int vnet_hdr_required, int attach) { fprintf(stderr, no tap on AIX\n); return -1; @@ -59,3 +60,13 @@ void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo) { } + +int tap_fd_attach(int fd, const char *ifname) +{ +return -1; +} + +int tap_fd_detach(int fd, const char *ifname) +{ +return -1; +} diff --git a/net/tap-bsd.c b/net/tap-bsd.c index 937a94b..44f3421 100644 --- a/net/tap-bsd.c +++ b/net/tap-bsd.c @@ -33,7 +33,8 @@ #include net/if_tap.h #endif -int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required) +int tap_open(char *ifname, int ifname_size, int *vnet_hdr, + int vnet_hdr_required, int attach) { int fd; #ifdef TAPGIFNAME @@ -145,3 +146,13 @@ void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo) { } + +int tap_fd_attach(int fd, const char *ifname) +{ +return -1; +} + +int tap_fd_detach(int fd, const char *ifname) +{ +return -1; +} diff --git a/net/tap-haiku.c b/net/tap-haiku.c index 91dda8e..6fb6719 100644 --- a/net/tap-haiku.c +++ b/net/tap-haiku.c @@ -25,7 +25,8 @@ #include net/tap.h #include stdio.h -int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required) +int tap_open(char *ifname, int ifname_size, int *vnet_hdr, + int vnet_hdr_required, int attach) { fprintf(stderr, no tap on Haiku\n); return -1; @@ -59,3 +60,13 @@ void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo) { } + +int tap_fd_attach(int fd, const char *ifname) +{ +return -1; +} + +int tap_fd_detach(int fd, const char *ifname) +{ +return -1; +} diff --git a/net/tap-linux.c b/net/tap-linux.c index 41d581b..5d74b53 100644 --- a/net/tap-linux.c +++ b/net/tap-linux.c @@ -35,7 +35,8 @@ #define PATH_NET_TUN /dev/net/tun -int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required) +int tap_open(char *ifname, int ifname_size, int *vnet_hdr, + int vnet_hdr_required, int attach) { struct ifreq ifr; int fd, ret; @@ -47,6 +48,8 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required } memset(ifr, 0, sizeof(ifr)); ifr.ifr_flags = IFF_TAP | IFF_NO_PI; +if (!attach) +ifr.ifr_flags |= IFF_MULTI_QUEUE; if (*vnet_hdr) { unsigned int features; @@ -71,7 +74,10 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required pstrcpy(ifr.ifr_name, IFNAMSIZ, ifname); else pstrcpy(ifr.ifr_name, IFNAMSIZ, tap%d); -ret = ioctl(fd, TUNSETIFF, (void *) ifr); +if (attach) +ret = ioctl(fd, TUNATTACHQUEUE, (void *) ifr); +else +ret = ioctl(fd, TUNSETIFF, (void *) ifr); if (ret != 0) { if (ifname[0] != '\0') { error_report(could not configure %s (%s): %m, PATH_NET_TUN, ifr.ifr_name); @@ -197,3 +203,48 @@ void tap_fd_set_offload(int fd, int csum, int tso4, } } } + +/* Attach a file descriptor to a TUN/TAP device. This descriptor should be + * detached before. + */ +int tap_fd_attach(int fd, const char *ifname) +{ +struct ifreq ifr; +int ret; + +
[Qemu-devel] [RFC V2 PATCH 3/4] net: multiqueue support
This patch adds the multiqueues support for emulated nics. Each VLANClientState pairs are now abstract as a queue instead of a nic, and multiple VLANClientState pointers were stored in the NICState. A queue_index were also introduced to let the emulated nics know which queue the packet were came from or sent out. Virtio-net would be the first user. Signed-off-by: Jason Wang jasow...@redhat.com --- hw/dp8393x.c |2 +- hw/mcf_fec.c |2 +- hw/qdev-properties.c | 33 +++- hw/qdev.h|3 ++- net.c| 58 +++--- net.h| 16 ++ 6 files changed, 93 insertions(+), 21 deletions(-) diff --git a/hw/dp8393x.c b/hw/dp8393x.c index 017d074..483a868 100644 --- a/hw/dp8393x.c +++ b/hw/dp8393x.c @@ -900,7 +900,7 @@ void dp83932_init(NICInfo *nd, target_phys_addr_t base, int it_shift, s-conf.macaddr = nd-macaddr; s-conf.vlan = nd-vlan; -s-conf.peer = nd-netdev; +s-conf.peers[0] = nd-netdev; s-nic = qemu_new_nic(net_dp83932_info, s-conf, nd-model, nd-name, s); diff --git a/hw/mcf_fec.c b/hw/mcf_fec.c index ae37bef..69f508d 100644 --- a/hw/mcf_fec.c +++ b/hw/mcf_fec.c @@ -473,7 +473,7 @@ void mcf_fec_init(MemoryRegion *sysmem, NICInfo *nd, s-conf.macaddr = nd-macaddr; s-conf.vlan = nd-vlan; -s-conf.peer = nd-netdev; +s-conf.peers[0] = nd-netdev; s-nic = qemu_new_nic(net_mcf_fec_info, s-conf, nd-model, nd-name, s); diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c index 9ae3187..d45fcef 100644 --- a/hw/qdev-properties.c +++ b/hw/qdev-properties.c @@ -554,16 +554,37 @@ PropertyInfo qdev_prop_chr = { static int parse_netdev(DeviceState *dev, const char *str, void **ptr) { -VLANClientState *netdev = qemu_find_netdev(str); +VLANClientState ***nc = (VLANClientState ***)ptr; +VLANClientState *vcs[MAX_QUEUE_NUM]; +int queues, i = 0; +int ret; -if (netdev == NULL) { -return -ENOENT; +*nc = g_malloc(MAX_QUEUE_NUM * sizeof(VLANClientState *)); +queues = qemu_find_netdev_all(str, vcs, MAX_QUEUE_NUM); +if (queues == 0) { +ret = -ENOENT; +goto err; } -if (netdev-peer) { -return -EEXIST; + +for (i = 0; i queues; i++) { +if (vcs[i] == NULL) { +ret = -ENOENT; +goto err; +} + +if (vcs[i]-peer) { +ret = -EEXIST; +goto err; +} + +(*nc)[i] = vcs[i]; } -*ptr = netdev; + return 0; + +err: +g_free(*nc); +return ret; } static const char *print_netdev(void *ptr) diff --git a/hw/qdev.h b/hw/qdev.h index 5386b16..1c023b4 100644 --- a/hw/qdev.h +++ b/hw/qdev.h @@ -248,6 +248,7 @@ extern PropertyInfo qdev_prop_blocksize; .defval= (bool)_defval, \ } + #define DEFINE_PROP_UINT8(_n, _s, _f, _d) \ DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_uint8, uint8_t) #define DEFINE_PROP_UINT16(_n, _s, _f, _d) \ @@ -274,7 +275,7 @@ extern PropertyInfo qdev_prop_blocksize; #define DEFINE_PROP_STRING(_n, _s, _f) \ DEFINE_PROP(_n, _s, _f, qdev_prop_string, char*) #define DEFINE_PROP_NETDEV(_n, _s, _f) \ -DEFINE_PROP(_n, _s, _f, qdev_prop_netdev, VLANClientState*) +DEFINE_PROP(_n, _s, _f, qdev_prop_netdev, VLANClientState**) #define DEFINE_PROP_VLAN(_n, _s, _f) \ DEFINE_PROP(_n, _s, _f, qdev_prop_vlan, VLANState*) #define DEFINE_PROP_DRIVE(_n, _s, _f) \ diff --git a/net.c b/net.c index eabe830..026a03a 100644 --- a/net.c +++ b/net.c @@ -238,16 +238,40 @@ NICState *qemu_new_nic(NetClientInfo *info, { VLANClientState *nc; NICState *nic; +int i; assert(info-type == NET_CLIENT_TYPE_NIC); assert(info-size = sizeof(NICState)); -nc = qemu_new_net_client(info, conf-vlan, conf-peer, model, name); +if (conf-peers) { +nc = qemu_new_net_client(info, NULL, conf-peers[0], model, name); +} else { +nc = qemu_new_net_client(info, conf-vlan, NULL, model, name); +} nic = DO_UPCAST(NICState, nc, nc); nic-conf = conf; nic-opaque = opaque; +/* For compatiablity with single queue nic */ +nic-ncs[0] = nc; +nc-opaque = nic; + +for (i = 1 ; i conf-queues; i++) { +VLANClientState *vc = g_malloc0(sizeof(*vc)); +vc-opaque = nic; +nic-ncs[i] = vc; +vc-peer = conf-peers[i]; +vc-info = info; +vc-queue_index = i; +vc-peer-peer = vc; +QTAILQ_INSERT_TAIL(non_vlan_clients, vc, next); + +vc-send_queue = qemu_new_net_queue(qemu_deliver_packet, +qemu_deliver_packet_iov, +vc); +} + return nic; } @@ -283,11 +307,10 @@ void qemu_del_vlan_client(VLANClientState *vc) { /*
[Qemu-devel] [RFC V2 PATCH 4/4] virtio-net: add multiqueue support
This patch let the virtio-net can transmit and recevie packets through multiuple VLANClientStates and abstract them as multiple virtqueues to guest. A new parameter 'queues' were introduced to specify the number of queue pairs. The main goal for vhost support is to let the multiqueue could be used without changes in vhost code. So each vhost_net structure were used to track a single VLANClientState and two virtqueues in the past. As multiple VLANClientState were stored in the NICState, we can infer the correspond VLANClientState from this and queue_index easily. Signed-off-by: Jason Wang jasow...@redhat.com --- hw/vhost.c | 58 --- hw/vhost.h |1 hw/vhost_net.c |7 + hw/vhost_net.h |2 hw/virtio-net.c | 461 +-- hw/virtio-net.h |3 6 files changed, 355 insertions(+), 177 deletions(-) diff --git a/hw/vhost.c b/hw/vhost.c index 43664e7..6318bb2 100644 --- a/hw/vhost.c +++ b/hw/vhost.c @@ -620,11 +620,12 @@ static int vhost_virtqueue_init(struct vhost_dev *dev, { target_phys_addr_t s, l, a; int r; +int vhost_vq_index = (idx 2 ? idx - 1 : idx) % dev-nvqs; struct vhost_vring_file file = { -.index = idx, + .index = vhost_vq_index }; struct vhost_vring_state state = { -.index = idx, +.index = vhost_vq_index }; struct VirtQueue *vvq = virtio_get_queue(vdev, idx); @@ -670,11 +671,12 @@ static int vhost_virtqueue_init(struct vhost_dev *dev, goto fail_alloc_ring; } -r = vhost_virtqueue_set_addr(dev, vq, idx, dev-log_enabled); +r = vhost_virtqueue_set_addr(dev, vq, vhost_vq_index, dev-log_enabled); if (r 0) { r = -errno; goto fail_alloc; } + file.fd = event_notifier_get_fd(virtio_queue_get_host_notifier(vvq)); r = ioctl(dev-control, VHOST_SET_VRING_KICK, file); if (r) { @@ -715,7 +717,7 @@ static void vhost_virtqueue_cleanup(struct vhost_dev *dev, unsigned idx) { struct vhost_vring_state state = { -.index = idx, +.index = (idx 2 ? idx - 1 : idx) % dev-nvqs, }; int r; r = ioctl(dev-control, VHOST_GET_VRING_BASE, state); @@ -829,7 +831,9 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev) } for (i = 0; i hdev-nvqs; ++i) { -r = vdev-binding-set_host_notifier(vdev-binding_opaque, i, true); +r = vdev-binding-set_host_notifier(vdev-binding_opaque, +hdev-start_idx + i, +true); if (r 0) { fprintf(stderr, vhost VQ %d notifier binding failed: %d\n, i, -r); goto fail_vq; @@ -839,7 +843,9 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev) return 0; fail_vq: while (--i = 0) { -r = vdev-binding-set_host_notifier(vdev-binding_opaque, i, false); +r = vdev-binding-set_host_notifier(vdev-binding_opaque, +hdev-start_idx + i, +false); if (r 0) { fprintf(stderr, vhost VQ %d notifier cleanup error: %d\n, i, -r); fflush(stderr); @@ -860,7 +866,9 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev) int i, r; for (i = 0; i hdev-nvqs; ++i) { -r = vdev-binding-set_host_notifier(vdev-binding_opaque, i, false); +r = vdev-binding-set_host_notifier(vdev-binding_opaque, +hdev-start_idx + i, +false); if (r 0) { fprintf(stderr, vhost VQ %d notifier cleanup failed: %d\n, i, -r); fflush(stderr); @@ -874,15 +882,17 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev) { int i, r; if (!vdev-binding-set_guest_notifiers) { -fprintf(stderr, binding does not support guest notifiers\n); +fprintf(stderr, binding does not support guest notifier\n); r = -ENOSYS; goto fail; } -r = vdev-binding-set_guest_notifiers(vdev-binding_opaque, true); -if (r 0) { -fprintf(stderr, Error binding guest notifier: %d\n, -r); -goto fail_notifiers; +if (hdev-start_idx == 0) { +r = vdev-binding-set_guest_notifiers(vdev-binding_opaque, true); +if (r 0) { +fprintf(stderr, Error binding guest notifier: %d\n, -r); +goto fail_notifiers; +} } r = vhost_dev_set_features(hdev, hdev-log_enabled); @@ -898,7 +908,7 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev) r = vhost_virtqueue_init(hdev, vdev, hdev-vqs + i, - i); + hdev-start_idx + i);
Re: [Qemu-devel] [net-next RFC V4 PATCH 3/4] virtio: introduce a method to get the irq of a specific virtqueue
On Mon, Jun 25, 2012 at 05:41:17PM +0800, Jason Wang wrote: Device specific irq optimizations such as irq affinity may be used by virtio drivers. So this patch introduce a new method to get the irq of a specific virtqueue. After this patch, virtio device drivers could query the irq and do device specific optimizations. First user would be virtio-net. Signed-off-by: Jason Wang jasow...@redhat.com --- drivers/lguest/lguest_device.c |8 drivers/s390/kvm/kvm_virtio.c |6 ++ drivers/virtio/virtio_mmio.c |8 drivers/virtio/virtio_pci.c| 12 include/linux/virtio_config.h |4 5 files changed, 38 insertions(+), 0 deletions(-) diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c index 9e8388e..bcd080f 100644 --- a/drivers/lguest/lguest_device.c +++ b/drivers/lguest/lguest_device.c @@ -392,6 +392,13 @@ static const char *lg_bus_name(struct virtio_device *vdev) return ; } +static int lg_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq) +{ + struct lguest_vq_info *lvq = vq-priv; + + return lvq-config.irq; +} + /* The ops structure which hooks everything together. */ static struct virtio_config_ops lguest_config_ops = { .get_features = lg_get_features, @@ -404,6 +411,7 @@ static struct virtio_config_ops lguest_config_ops = { .find_vqs = lg_find_vqs, .del_vqs = lg_del_vqs, .bus_name = lg_bus_name, + .get_vq_irq = lg_get_vq_irq, }; /* diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c index d74e9ae..a897de2 100644 --- a/drivers/s390/kvm/kvm_virtio.c +++ b/drivers/s390/kvm/kvm_virtio.c @@ -268,6 +268,11 @@ static const char *kvm_bus_name(struct virtio_device *vdev) return ; } +static int kvm_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq) +{ + return 0x2603; +} + /* * The config ops structure as defined by virtio config */ @@ -282,6 +287,7 @@ static struct virtio_config_ops kvm_vq_configspace_ops = { .find_vqs = kvm_find_vqs, .del_vqs = kvm_del_vqs, .bus_name = kvm_bus_name, + .get_vq_irq = kvm_get_vq_irq, }; /* diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index f5432b6..2ba37ed 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -411,6 +411,13 @@ static const char *vm_bus_name(struct virtio_device *vdev) return vm_dev-pdev-name; } +static int vm_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq) +{ + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); + + return platform_get_irq(vm_dev-pdev, 0); +} + static struct virtio_config_ops virtio_mmio_config_ops = { .get= vm_get, .set= vm_set, @@ -422,6 +429,7 @@ static struct virtio_config_ops virtio_mmio_config_ops = { .get_features = vm_get_features, .finalize_features = vm_finalize_features, .bus_name = vm_bus_name, + .get_vq_irq = vm_get_vq_irq, }; diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c index adb24f2..c062227 100644 --- a/drivers/virtio/virtio_pci.c +++ b/drivers/virtio/virtio_pci.c @@ -607,6 +607,17 @@ static const char *vp_bus_name(struct virtio_device *vdev) return pci_name(vp_dev-pci_dev); } +static int vp_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq) +{ + struct virtio_pci_device *vp_dev = to_vp_device(vdev); + struct virtio_pci_vq_info *info = vq-priv; + + if (vp_dev-intx_enabled) + return vp_dev-pci_dev-irq; + else + return vp_dev-msix_entries[info-msix_vector].vector; +} + static struct virtio_config_ops virtio_pci_config_ops = { .get= vp_get, .set= vp_set, @@ -618,6 +629,7 @@ static struct virtio_config_ops virtio_pci_config_ops = { .get_features = vp_get_features, .finalize_features = vp_finalize_features, .bus_name = vp_bus_name, + .get_vq_irq = vp_get_vq_irq, }; static void virtio_pci_release_dev(struct device *_d) diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index fc457f4..acd6930 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -98,6 +98,9 @@ * vdev: the virtio_device * This returns a pointer to the bus name a la pci_name from which * the caller can then copy. + * @get_vq_irq: get the irq numer of the specific virt queue. + * vdev: the virtio_device + * vq: the virtqueue What if the vq does not have an IRQ? E.g. control vqs don't. What if the IRQ is shared between VQs? Between devices? The need to cleanup affinity on destroy is also nasty. How about we expose a set_affinity API instead? Then: - non PCI can ignore for now - with a per vq vector we
Re: [Qemu-devel] [PATCH 3/5] memory: Flush coalesced MMIO on mapping and state changes
On 2012-06-25 10:57, Avi Kivity wrote: On 06/25/2012 10:01 AM, Jan Kiszka wrote: Flush pending coalesced MMIO before performing mapping or state changes that could affect the event orderings or route the buffered requests to a wrong region. Signed-off-by: Jan Kiszka jan.kis...@siemens.com In addition, we also have to Yes, we do. void memory_region_transaction_begin(void) { +qemu_flush_coalesced_mmio_buffer(); ++memory_region_transaction_depth; } Why is this needed? @@ -1109,6 +1110,9 @@ void memory_region_sync_dirty_bitmap(MemoryRegion *mr) void memory_region_set_readonly(MemoryRegion *mr, bool readonly) { +if (!QTAILQ_EMPTY(mr-coalesced)) { +qemu_flush_coalesced_mmio_buffer(); +} The readonly attribute is inherited by subregions and alias targets, so this check is insufficient. See render_memory_region(). Need to flush unconditionally. if (mr-readonly != readonly) { mr-readonly = readonly; memory_region_update_topology(mr); @@ -1117,6 +1121,9 @@ void memory_region_set_readonly(MemoryRegion *mr, bool readonly) void memory_region_rom_device_set_readable(MemoryRegion *mr, bool readable) { +if (!QTAILQ_EMPTY(mr-coalesced)) { +qemu_flush_coalesced_mmio_buffer(); +} This property is not inherited, but let's flush unconditionally just the same, to reduce fragility. @@ -1219,6 +1228,9 @@ void memory_region_add_eventfd(MemoryRegion *mr, }; unsigned i; +if (!QTAILQ_EMPTY(mr-coalesced)) { +qemu_flush_coalesced_mmio_buffer(); +} Ditto. It's possible that an eventfd overlays a subregion which has coalescing enabled. It's not really defined what happens in this case, and such code and its submitter should be perma-nacked, but let's play it safe here since there isn't much to be gained by avoiding the flush. This code is a very slow path anyway, including and rcu and/or srcu synchronization, and a rebuild of the dispatch radix tree (trees when we dma-enable this code). for (i = 0; i mr-ioeventfd_nb; ++i) { if (memory_region_ioeventfd_before(mrfd, mr-ioeventfds[i])) { break; @@ -1249,6 +1261,9 @@ void memory_region_del_eventfd(MemoryRegion *mr, }; unsigned i; +if (!QTAILQ_EMPTY(mr-coalesced)) { +qemu_flush_coalesced_mmio_buffer(); +} Same. OK. The repetitiveness of this code suggests a different way of doing this: make every API call be its own subtransaction and perform the flush in memory_region_begin_transaction() (maybe that's the answer to my question above). So you want me to wrap the core of those services in begin/commit_transaction instead? Just to be sure I got the idea. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] running qemu with image file on a block device
On Sun, Jun 24, 2012 at 7:49 AM, udit agarwal fzdu...@gmail.com wrote: I have set up a block device of 20G on my hard drive and is located at /dev/sda2. Now I have created a raw image inside this device via command: hp1:~ # qemu-img create -f raw /dev/sda2:server.img 10G Then I copied the iso file on this device via command: hp1:~# cp ~/CD.iso /dev/sda2:CD,iso and it also worked fine. But when I ran the following qemu command: hp1:~ # qemu -name Gnome-desktop -M pc-0.12 -m 256 -smp 2 -boot d -drive file=/dev/sda2:server.img,if=virtio,index=0,media=disk,format=raw -drive file=/dev/sda2:CD.iso,index=1,media=cdrom -net nic,model=virtio,macaddr=52:54:00:05:11:11 -net user -vga cirrus -balloon virtio I got the following errors: qemu: -drive file=/dev/sda2:server.img,if=virtio,index=0,media=disk,format=raw: could not open disk image /dev/sda2:server.img: No such file or directory Can anybody help me with this?? The commands you pasted seem to be creating files called /dev/sda2:server.img and /dev/sda2:CD.iso. Note that these files are created in /dev, they are not located on the /dev/sda2 device as you intended. The ':' notation you tried to use has no special meaning on Linux, you are creating new files that have nothing to do with /dev/sda2. If you want to put several disk images onto a single block device then you probably need to create a file system on the device, mount it, and then put files into this new file system: # mkfs.ext4 /dev/sda2 # WARNING: any existing data in /dev/sda2 will be lost # mount /dev/sda2 /mnt # cp CD.iso /mnt # qemu-img create /mnt/server.img 10G # ls /mnt CD.iso server.img Stefan
Re: [Qemu-devel] [PATCH 1/2] CPUArchState: introduce per-cpu lock
Am 24.06.2012 16:02, schrieb liu ping fan: On Fri, Jun 22, 2012 at 8:55 PM, Andreas Färber afaer...@suse.de wrote: Am 21.06.2012 17:06, schrieb Liu Ping Fan: introduce a lock for per-cpu to protect agaist accesing from other vcpu thread. Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com --- cpu-defs.h |2 ++ cpus.c | 17 + main-loop.h |3 +++ 3 files changed, 22 insertions(+), 0 deletions(-) diff --git a/cpu-defs.h b/cpu-defs.h index f49e950..7305822 100644 --- a/cpu-defs.h +++ b/cpu-defs.h @@ -30,6 +30,7 @@ #include osdep.h #include qemu-queue.h #include targphys.h +#include qemu-thread-posix.h #ifndef TARGET_LONG_BITS #error TARGET_LONG_BITS must be defined before including this header @@ -220,6 +221,7 @@ typedef struct CPUWatchpoint { CPU_COMMON_THREAD \ struct QemuCond *halt_cond; \ int thread_kicked; \ +struct QemuMutex *cpu_lock; \ struct qemu_work_item *queued_work_first, *queued_work_last;\ const char *cpu_model_str; \ struct KVMState *kvm_state; \ Please don't add stuff to CPU_COMMON. Instead add to CPUState in qom/cpu.c. The QOM CPUState part 4 series moves many of those fields there. My mistake: The struct to add to is in include/qemu/cpu.h obviously. Ok, thanks. diff --git a/cpus.c b/cpus.c index b182b3d..554f7bc 100644 --- a/cpus.c +++ b/cpus.c @@ -735,6 +735,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg) env-thread_id = qemu_get_thread_id(); cpu_single_env = env; + Stray whitespace addition. r = kvm_init_vcpu(env); if (r 0) { fprintf(stderr, kvm_init_vcpu failed: %s\n, strerror(-r)); @@ -891,6 +892,20 @@ int qemu_cpu_is_self(void *_env) return qemu_thread_is_self(env-thread); } +void qemu_mutex_lock_cpu(void *_env) +{ +CPUArchState *env = _env; + +qemu_mutex_lock(env-cpu_lock); +} + +void qemu_mutex_unlock_cpu(void *_env) +{ +CPUArchState *env = _env; + +qemu_mutex_unlock(env-cpu_lock); +} + I don't like these helpers. For one, you are using void * arguments and casting them, for another you are using CPUArchState at all. With my suggestion above these can be CPUState *cpu. For using it in apic.c, we need to hide the CPUArchState structure CPUState serves to hide CPUArchState, with added type safety. :) Regards, Andreas void qemu_mutex_lock_iothread(void) { if (!tcg_enabled()) { @@ -1027,6 +1042,8 @@ void qemu_init_vcpu(void *_env) env-nr_cores = smp_cores; env-nr_threads = smp_threads; env-stopped = 1; +env-cpu_lock = g_malloc0(sizeof(QemuMutex)); +qemu_mutex_init(env-cpu_lock); Are you sure this is not needed for linux-user/bsd-user? If not needed, then the field should be #ifdef'ed in the struct to assure that. Otherwise this function is never called and you need to move the initialization to the initfn in qom/cpu.c and then should also clean it up in a finalizer. It is not needed for linux-user/bsd-user. I will use the macro, Thanks and regards, pingfan -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH 3/5] memory: Flush coalesced MMIO on mapping and state changes
On 2012-06-25 12:15, Jan Kiszka wrote: On 2012-06-25 10:57, Avi Kivity wrote: The repetitiveness of this code suggests a different way of doing this: make every API call be its own subtransaction and perform the flush in memory_region_begin_transaction() (maybe that's the answer to my question above). So you want me to wrap the core of those services in begin/commit_transaction instead? Just to be sure I got the idea. What we would lose this way (transaction_commit) is the ability to skip updates on !mr-enabled. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)
On Mon, Jun 25, 2012 at 6:42 AM, Mike Lovell m...@dev-zero.net wrote: This is what I've been calling QDES or QEMU Distributed Ethernet Switch. I first had the idea when I was playing with the udp and mcast socket network backends while exploring how to build a VM infrastructure. I liked the idea of using the sockets backends cause it doesn't require escalated permissions to configure and run as well as the ability to talk over IP networks. But the built in socket backends either allowed for only 2 guests talking directly or for multiple guests where all traffic is sent to all. So one can either have two guests talking or have bandwidth wasted with multiple guests. There wasn't something that could talk to multiple guests but also utilize unicast traffic. Have you looked at QEMU's net/vde.c backend? Does VDE (http://vde.sourceforge.net/) already do everything that QDES does? Stefan
Re: [Qemu-devel] [PATCH 3/5] memory: Flush coalesced MMIO on mapping and state changes
On 06/25/2012 01:26 PM, Jan Kiszka wrote: On 2012-06-25 12:15, Jan Kiszka wrote: On 2012-06-25 10:57, Avi Kivity wrote: The repetitiveness of this code suggests a different way of doing this: make every API call be its own subtransaction and perform the flush in memory_region_begin_transaction() (maybe that's the answer to my question above). So you want me to wrap the core of those services in begin/commit_transaction instead? Just to be sure I got the idea. What we would lose this way (transaction_commit) is the ability to skip updates on !mr-enabled. We could have an internal memory_region_begin_transaction_mr() which checks mr-enabled and notes it if its clear (but anything else would have to undo this). I don't think it's worth it, let's lose the optimization and see if it shows up anywhere. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH 3/5] memory: Flush coalesced MMIO on mapping and state changes
On 2012-06-25 13:01, Avi Kivity wrote: On 06/25/2012 01:26 PM, Jan Kiszka wrote: On 2012-06-25 12:15, Jan Kiszka wrote: On 2012-06-25 10:57, Avi Kivity wrote: The repetitiveness of this code suggests a different way of doing this: make every API call be its own subtransaction and perform the flush in memory_region_begin_transaction() (maybe that's the answer to my question above). So you want me to wrap the core of those services in begin/commit_transaction instead? Just to be sure I got the idea. What we would lose this way (transaction_commit) is the ability to skip updates on !mr-enabled. We could have an internal memory_region_begin_transaction_mr() which checks mr-enabled and notes it if its clear (but anything else would have to undo this). I don't think it's worth it, let's lose the optimization and see if it shows up anywhere. OK, will send a new series with a conversion of this included. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH stable-1.1 00/26] Initial tree and candidates for stable-1.1
On Sat, 23 Jun 2012, Michael Roth wrote: Hi everyone, I've set up a tree for stable-1.1 at: git://github.com/mdroth/qemu.git stable-1.1 Thanks for doing this. That is simply the initial commit for v1.1.0 I've set up a staging tree for the first batch of commits at: git://github.com/mdroth/qemu.git stable-1.1-staging This should capture everything explicitly tagged for stable/1.1. I've also pulled in relevant build fixes and a handful of applicable bug fixes. Please let me know if I've missed any patches, or if any of these shouldn't be applied. Please backport this one too: commit 64c27e5b1fdb6d94bdc0bda3b1869d7383a35c65 Author: Jan Beulich jbeul...@suse.com Date: Mon Jun 11 09:52:27 2012 + qemu/xendisk: set maximum number of grants to be used Thanks! - Stefano
Re: [Qemu-devel] [PATCH] Exynos4: added RTC device
On 25.06.2012 13:24, Andreas Färber wrote: Am 25.06.2012 09:55, schrieb Oleg Ogurtsov: Signed-off-by: Oleg Ogurtsovo.ogurt...@samsung.com --- hw/arm/Makefile.objs |1 + hw/exynos4210.c |8 + hw/exynos4210_rtc.c | 607 ++ 3 files changed, 616 insertions(+), 0 deletions(-) create mode 100644 hw/exynos4210_rtc.c This RTC like many other Exynos devices has no dependency on the CPU. I have a patch in preparation that moves such devices from hw/arm/Makefile.objs to hw/Makefile.objs. I don't object to this patch, not even minor style nits spotted, compliment, but if you have to respin for some reason, it would be nice if you could consider that improvement. These devices are SOC specific and this SOC is based on ARM only. Do we really need to move them? -- Kind regards, Evgeny Voevodin, Leading Software Engineer, ASWG, Moscow RD center, Samsung Electronics e-mail: e.voevo...@samsung.com
Re: [Qemu-devel] [PATCH v6 6/6] fdc_test: introduce test_sense_interrupt
Am 22.06.2012 12:33, schrieb Pavel Hrdina: Calling sense interrupt status while there is no interrupt should return invalid command (0x80). Read command should always returns in st0 seek_end bit set to 1. Signed-off-by: Pavel Hrdina phrd...@redhat.com --- tests/fdc-test.c | 25 - 1 files changed, 24 insertions(+), 1 deletions(-) diff --git a/tests/fdc-test.c b/tests/fdc-test.c index 5280eff..be14d83 100644 --- a/tests/fdc-test.c +++ b/tests/fdc-test.c @@ -142,7 +142,7 @@ static uint8_t send_read_command(void) } st0 = floppy_recv(); -if (st0 != 0x40) { +if (st0 != 0x60) { ret = 1; } @@ -263,6 +263,28 @@ static void test_media_change(void) assert_bit_set(dir, DSKCHG); } +static void test_sense_interrupt(void) +{ +int drive = 0; +int head = 0; +int cyl = 0; +int ret = 0; + +floppy_send(CMD_SENSE_INT); +ret = floppy_recv(); +g_assert(ret == 0x80); If I'm not mistakes, CMD_SENSE_INT returns two bytes (not just one), the first of which is ST0 and the second one the Present Cylinder Number. 0x80 for ST0 would mean invalid command, which is the right return value if there is no active interrupt condition. So you seem to check the right value and just miss a second floppy_recv(). + +floppy_send(CMD_SEEK); +floppy_send(head 2 | drive); +g_assert(!get_irq(FLOPPY_IRQ)); +floppy_send(cyl); Maybe add another assertion here that the IRQ is now active? + +floppy_send(CMD_SENSE_INT); +ret = floppy_recv(); +g_assert(ret != 0x80); I think we can do better and tell the exact value that should result. I believe g_assert(ret == 0x20); would be right. Kevin
Re: [Qemu-devel] [PATCH] Exynos4: added RTC device
Am 25.06.2012 13:46, schrieb Evgeny Voevodin: On 25.06.2012 13:24, Andreas Färber wrote: Am 25.06.2012 09:55, schrieb Oleg Ogurtsov: Signed-off-by: Oleg Ogurtsovo.ogurt...@samsung.com --- hw/arm/Makefile.objs |1 + hw/exynos4210.c |8 + hw/exynos4210_rtc.c | 607 ++ 3 files changed, 616 insertions(+), 0 deletions(-) create mode 100644 hw/exynos4210_rtc.c This RTC like many other Exynos devices has no dependency on the CPU. I have a patch in preparation that moves such devices from hw/arm/Makefile.objs to hw/Makefile.objs. I don't object to this patch, not even minor style nits spotted, compliment, but if you have to respin for some reason, it would be nice if you could consider that improvement. These devices are SOC specific and this SOC is based on ARM only. Do we really need to move them? For one, they do not need to be rebuilt when cpu.h changes and they should get the usual device poisoning for proper modeling. For another, someone on IRC started work on an armeb-softmmu, for which we would probably not want to compile in the Exynos devices. Or if we do, we certainly don't want to compile everything twice (cf. xilinx). If devices are ARM-specific and need access to the CPU (e.g., machines, PICs) then according to Paolo they should be placed in hw/arm/ with the new scheme. I'm trying to stay away from moving other people's files around, but the Makefile changes are pretty non-intrusive and can go through arm-devs.next. Cheers, Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH v3 1/1] virtio-rng: hardware random number generator device
On Fri, Jun 22, 2012 at 02:59:13PM -0500, Anthony Liguori wrote: On 06/22/2012 01:50 PM, Amit Shah wrote: On (Fri) 22 Jun 2012 [08:44:52], Anthony Liguori wrote: On 06/22/2012 08:34 AM, Daniel P. Berrange wrote: Oh, that's a good point. But easily fixed. Of course, except that now we have to maintain compatibility so some hideous hack goes in. This is what fundamentally makes using events a bad approach. There are more problems lurking. This is the fundamental complexity of having two non-synchronized communication channels when you're trying to synchronize some sort of activity. BTW, despite what danpb mentioned, you are rate limiting entropy requests in this patch series All of these problems are naturally solved using a protocol over a CharDriverState. Can we at least agree on merging a patch which just includes the raw chardev backend support for virtio-rng ? ie drop the QMP event for now, so we can make some step forward. All we need to do to support EGD is instead of doing: +QObject *data; + +data = qobject_from_jsonf({ 'bytes': % PRId64 }, + size); +monitor_protocol_event(QEVENT_ENTROPY_NEEDED, data); +qobject_decref(data); Do: while (size 0) { uint8_t partial_size = MIN(255, size); uint8_t msg[2] = { 0x02, partial_size }; qemu_chr_write(s-chr, msg, sizeof(msg)); size -= partial_size; } And that's it. It's an extremely simple protocol to support. It would actually reduce the total size of the patch. So I now get what your objection is, and that is because of an assumption you're making which is incorrect. An OS asking for 5038 bytes of entropy is doing just that: asking for those bytes. That doesn't mean a hardware device can provide it with that much entropy. So, the hardware device here will just provide whatever is available, and the OS has to be happy with what it got. If it got 200 bytes from the device, the OS is then free to demand even 7638 bytes more, but it may not get anything for quite a while. (This is exactly how things work with /dev/random and /dev/urandom too, btw. And /dev/urandom was invented for apps which can't wait for all the data they're asking to come to them.) As it turns out, the EGD protocol also has a mechanism to ask for ask much entropy as is available at the current moment. It also has a mechanism to query the amount of available entropy. And no, you're assertion that we don't care about how much entropy the guest is requesting is wrong. If we lose entropy requests, then we never know if the guest is in a state where it's expecting entropy and we're not sending it. Consider: - Guest requests entropy (X bytes) - libvirt sees request - libvirt sends X bytes to guest - Guest requests entropy (Y bytes), QEMU filters due to rate limiting The guest will never request entropy again and libvirt will never send entropy again. The device is effectively dead-locked. WRT the impl of rate limiting Amit has used in this patch, you are correct, however, this is not how QEMU should be rate limiting this event. As mentioned in an earlier reply in this thread, any rate limited /must/ work as follows: - Guest sends randomness request for 10 bytes - QMP event gets sent for 10 bytes - Guest sends randomness request for 4 bytes - QMP is dropped - Guest sends randomness request for 8 bytes - QMP event gets sent for 12 bytes ie, the byte count for any events which are dropped, must be added to the byte count in the next emitted event. Also as Amit mentioned in his reply to me, it should take account of multiple devices, or we should limit QEMU to 1 single RNG device per guest, as we do for the balloon driver. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] [PATCH] target-ppc: Fix 2nd parameter for tcg_gen_shri_tl
Am 24.06.2012 17:14, schrieb Alexander Graf: On 24.06.2012, at 16:18, Stefan Weil s...@weilnetz.de wrote: This fixes a compiler error when QEMU was configured with --enable-debug. Signed-off-by: Stefan Weil s...@weilnetz.de --- Alex, please review. It fixes the compiler error, but I did the change simply by comparision with other functions, so I have no idea whether it is really correct. Looks good at a first glance, but will verify tonight. Thanks a lot for catching it! I guess we should add an --enable-debug build to buildbot. Blue/Stefan, could you set up a feature page on the Wiki for the AREG0 refactoring? Then we could document what maintainers/testers need to verify for the next series there. Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH v3 1/1] virtio-rng: hardware random number generator device
On 06/25/2012 07:10 AM, Daniel P. Berrange wrote: On Fri, Jun 22, 2012 at 02:59:13PM -0500, Anthony Liguori wrote: On 06/22/2012 01:50 PM, Amit Shah wrote: On (Fri) 22 Jun 2012 [08:44:52], Anthony Liguori wrote: On 06/22/2012 08:34 AM, Daniel P. Berrange wrote: Oh, that's a good point. But easily fixed. Of course, except that now we have to maintain compatibility so some hideous hack goes in. This is what fundamentally makes using events a bad approach. There are more problems lurking. This is the fundamental complexity of having two non-synchronized communication channels when you're trying to synchronize some sort of activity. BTW, despite what danpb mentioned, you are rate limiting entropy requests in this patch series All of these problems are naturally solved using a protocol over a CharDriverState. Can we at least agree on merging a patch which just includes the raw chardev backend support for virtio-rng ? ie drop the QMP event for now, so we can make some step forward. All we need to do to support EGD is instead of doing: +QObject *data; + +data = qobject_from_jsonf({ 'bytes': % PRId64 }, + size); +monitor_protocol_event(QEVENT_ENTROPY_NEEDED, data); +qobject_decref(data); Do: while (size 0) { uint8_t partial_size = MIN(255, size); uint8_t msg[2] = { 0x02, partial_size }; qemu_chr_write(s-chr, msg, sizeof(msg)); size -= partial_size; } And that's it. It's an extremely simple protocol to support. It would actually reduce the total size of the patch. So I now get what your objection is, and that is because of an assumption you're making which is incorrect. An OS asking for 5038 bytes of entropy is doing just that: asking for those bytes. That doesn't mean a hardware device can provide it with that much entropy. So, the hardware device here will just provide whatever is available, and the OS has to be happy with what it got. If it got 200 bytes from the device, the OS is then free to demand even 7638 bytes more, but it may not get anything for quite a while. (This is exactly how things work with /dev/random and /dev/urandom too, btw. And /dev/urandom was invented for apps which can't wait for all the data they're asking to come to them.) As it turns out, the EGD protocol also has a mechanism to ask for ask much entropy as is available at the current moment. It also has a mechanism to query the amount of available entropy. And no, you're assertion that we don't care about how much entropy the guest is requesting is wrong. If we lose entropy requests, then we never know if the guest is in a state where it's expecting entropy and we're not sending it. Consider: - Guest requests entropy (X bytes) - libvirt sees request - libvirt sends X bytes to guest - Guest requests entropy (Y bytes), QEMU filters due to rate limiting The guest will never request entropy again and libvirt will never send entropy again. The device is effectively dead-locked. WRT the impl of rate limiting Amit has used in this patch, you are correct, however, this is not how QEMU should be rate limiting this event. As mentioned in an earlier reply in this thread, any rate limited /must/ work as follows: - Guest sends randomness request for 10 bytes - QMP event gets sent for 10 bytes - Guest sends randomness request for 4 bytes - QMP is dropped - Guest sends randomness request for 8 bytes - QMP event gets sent for 12 bytes ie, the byte count for any events which are dropped, must be added to the byte count in the next emitted event. Also as Amit mentioned in his reply to me, it should take account of multiple devices, or we should limit QEMU to 1 single RNG device per guest, as we do for the balloon driver. QMP events are not meant to be used like this. They are posted events and since we cannot know if there is something connected to the monitor, it's always possible for the backend to lose the information associated with an event. Events really should just be used to indicate, hey, you should go query information from QEMU now. Even what you suggest above doesn't handle the case where a management application restarts. If you were going to do this via QMP, you'd want to introduce a command to query the total outstanding requested entropy for a given device and then send a event whenever that value changes. But again, it's silly to use QMP for this. Using an inline protocol simplifies things quite a bit. Regards, Anthony Liguori Daniel
Re: [Qemu-devel] [PATCH v3 1/1] virtio-rng: hardware random number generator device
On Mon, Jun 25, 2012 at 07:22:13AM -0500, Anthony Liguori wrote: On 06/25/2012 07:10 AM, Daniel P. Berrange wrote: On Fri, Jun 22, 2012 at 02:59:13PM -0500, Anthony Liguori wrote: On 06/22/2012 01:50 PM, Amit Shah wrote: On (Fri) 22 Jun 2012 [08:44:52], Anthony Liguori wrote: On 06/22/2012 08:34 AM, Daniel P. Berrange wrote: Oh, that's a good point. But easily fixed. Of course, except that now we have to maintain compatibility so some hideous hack goes in. This is what fundamentally makes using events a bad approach. There are more problems lurking. This is the fundamental complexity of having two non-synchronized communication channels when you're trying to synchronize some sort of activity. BTW, despite what danpb mentioned, you are rate limiting entropy requests in this patch series All of these problems are naturally solved using a protocol over a CharDriverState. Can we at least agree on merging a patch which just includes the raw chardev backend support for virtio-rng ? ie drop the QMP event for now, so we can make some step forward. All we need to do to support EGD is instead of doing: +QObject *data; + +data = qobject_from_jsonf({ 'bytes': % PRId64 }, + size); +monitor_protocol_event(QEVENT_ENTROPY_NEEDED, data); +qobject_decref(data); Do: while (size 0) { uint8_t partial_size = MIN(255, size); uint8_t msg[2] = { 0x02, partial_size }; qemu_chr_write(s-chr, msg, sizeof(msg)); size -= partial_size; } And that's it. It's an extremely simple protocol to support. It would actually reduce the total size of the patch. So I now get what your objection is, and that is because of an assumption you're making which is incorrect. An OS asking for 5038 bytes of entropy is doing just that: asking for those bytes. That doesn't mean a hardware device can provide it with that much entropy. So, the hardware device here will just provide whatever is available, and the OS has to be happy with what it got. If it got 200 bytes from the device, the OS is then free to demand even 7638 bytes more, but it may not get anything for quite a while. (This is exactly how things work with /dev/random and /dev/urandom too, btw. And /dev/urandom was invented for apps which can't wait for all the data they're asking to come to them.) As it turns out, the EGD protocol also has a mechanism to ask for ask much entropy as is available at the current moment. It also has a mechanism to query the amount of available entropy. And no, you're assertion that we don't care about how much entropy the guest is requesting is wrong. If we lose entropy requests, then we never know if the guest is in a state where it's expecting entropy and we're not sending it. Consider: - Guest requests entropy (X bytes) - libvirt sees request - libvirt sends X bytes to guest - Guest requests entropy (Y bytes), QEMU filters due to rate limiting The guest will never request entropy again and libvirt will never send entropy again. The device is effectively dead-locked. WRT the impl of rate limiting Amit has used in this patch, you are correct, however, this is not how QEMU should be rate limiting this event. As mentioned in an earlier reply in this thread, any rate limited /must/ work as follows: - Guest sends randomness request for 10 bytes - QMP event gets sent for 10 bytes - Guest sends randomness request for 4 bytes - QMP is dropped - Guest sends randomness request for 8 bytes - QMP event gets sent for 12 bytes ie, the byte count for any events which are dropped, must be added to the byte count in the next emitted event. Also as Amit mentioned in his reply to me, it should take account of multiple devices, or we should limit QEMU to 1 single RNG device per guest, as we do for the balloon driver. QMP events are not meant to be used like this. They are posted events and since we cannot know if there is something connected to the monitor, it's always possible for the backend to lose the information associated with an event. Events really should just be used to indicate, hey, you should go query information from QEMU now. Even what you suggest above doesn't handle the case where a management application restarts. If you were going to do this via QMP, you'd want to introduce a command to query the total outstanding requested entropy for a given device and then send a event whenever that value changes. But again, it's silly to use QMP for this. Using an inline protocol simplifies things quite a bit. Ok, I agree that being rebust against mgmt layer restarts is a good reason for not relying on QMP events for this, and that using the EGD protocol would nicely solve this problem. So lets drop the QMP event. I still think it is key to allow use of raw chardevs in addition to EDGE though, so you
Re: [Qemu-devel] [PATCH v6 4/6] fdc: fix interrupt handling
Am 22.06.2012 12:33, schrieb Pavel Hrdina: If you call the SENSE INTERRUPT STATUS command while there is no interrupt waiting you get as result unknown command. Fixed status0 register handling for read/write/format commands. Signed-off-by: Pavel Hrdina phrd...@redhat.com --- hw/fdc.c | 34 +- 1 files changed, 21 insertions(+), 13 deletions(-) diff --git a/hw/fdc.c b/hw/fdc.c index 0270264..e28841c 100644 --- a/hw/fdc.c +++ b/hw/fdc.c @@ -307,6 +307,9 @@ enum { }; enum { +FD_SR0_DS0 = 0x01, +FD_SR0_DS1 = 0x02, +FD_SR0_HEAD = 0x04, FD_SR0_EQPMT= 0x10, FD_SR0_SEEK = 0x20, FD_SR0_ABNTERM = 0x40, @@ -972,14 +975,15 @@ static void fdctrl_reset_fifo(FDCtrl *fdctrl) } /* Set FIFO status for the host to read */ -static void fdctrl_set_fifo(FDCtrl *fdctrl, int fifo_len, int do_irq) +static void fdctrl_set_fifo(FDCtrl *fdctrl, int fifo_len, uint8_t status0) { fdctrl-data_dir = FD_DIR_READ; fdctrl-data_len = fifo_len; fdctrl-data_pos = 0; fdctrl-msr |= FD_MSR_CMDBUSY | FD_MSR_RQM | FD_MSR_DIO; -if (do_irq) -fdctrl_raise_irq(fdctrl, 0x00); +if (status0) { +fdctrl_raise_irq(fdctrl, status0); +} Is status0 != 0 the real condition or is it just what we have here and what happens to give the right result with our implementation? } /* Set an error: unimplemented/unknown command */ @@ -1044,10 +1048,12 @@ static void fdctrl_stop_transfer(FDCtrl *fdctrl, uint8_t status0, FDrive *cur_drv; cur_drv = get_cur_drv(fdctrl); +fdctrl-status0 = status0 | FD_SR0_SEEK | (cur_drv-head 2) | + GET_CUR_DRV(fdctrl); + FLOPPY_DPRINTF(transfer status: %02x %02x %02x (%02x)\n, - status0, status1, status2, - status0 | (cur_drv-head 2) | GET_CUR_DRV(fdctrl)); -fdctrl-fifo[0] = status0 | (cur_drv-head 2) | GET_CUR_DRV(fdctrl); + status0, status1, status2, fdctrl-status0); +fdctrl-fifo[0] = fdctrl-status0; fdctrl-fifo[1] = status1; fdctrl-fifo[2] = status2; fdctrl-fifo[3] = cur_drv-track; @@ -1060,7 +1066,7 @@ static void fdctrl_stop_transfer(FDCtrl *fdctrl, uint8_t status0, } fdctrl-msr |= FD_MSR_RQM | FD_MSR_DIO; fdctrl-msr = ~FD_MSR_NONDMA; -fdctrl_set_fifo(fdctrl, 7, 1); +fdctrl_set_fifo(fdctrl, 7, fdctrl-status0); } /* Prepare a data transfer (either DMA or FIFO) */ @@ -1175,7 +1181,7 @@ static void fdctrl_start_transfer(FDCtrl *fdctrl, int direction) if (direction != FD_DIR_WRITE) fdctrl-msr |= FD_MSR_DIO; /* IO based transfer: calculate len */ -fdctrl_raise_irq(fdctrl, 0x00); +fdctrl_raise_irq(fdctrl, FD_SR0_SEEK); return; } @@ -1604,16 +1610,18 @@ static void fdctrl_handle_sense_interrupt_status(FDCtrl *fdctrl, int direction) { FDrive *cur_drv = get_cur_drv(fdctrl); -if(fdctrl-reset_sensei 0) { +if (fdctrl-reset_sensei 0) { fdctrl-fifo[0] = FD_SR0_RDYCHG + FD_RESET_SENSEI_COUNT - fdctrl-reset_sensei; fdctrl-reset_sensei--; +} else if (!(fdctrl-sra FD_SRA_INTPEND)) { +fdctrl-fifo[0] = FD_SR0_INVCMD; +fdctrl_set_fifo(fdctrl, 1, 0); +return; } else { -/* XXX: status0 handling is broken for read/write - commands, so we do this hack. It should be suppressed - ASAP */ fdctrl-fifo[0] = -FD_SR0_SEEK | (cur_drv-head 2) | GET_CUR_DRV(fdctrl); +(fdctrl-status0 ~(FD_SR0_HEAD | FD_SR0_DS1 | FD_SR0_DS0)) +| GET_CUR_DRV(fdctrl); Why isn't fdctrl-status0 already right? In any case, I think if you're masking out the head number from status0, you need to or it back, even if from a different source. Kevin
[Qemu-devel] [PATCH 0/4] Define and use PRI*PLX format specifier macros for target_phys_addr_t
This patchset defines a set of macros for use in printf format strings for handling target_phys_addr_t values, as suggested by Andreas Färber (http://lists.gnu.org/archive/html/qemu-devel/2012-01/msg01547.html) We can then use them in some places which were previously either making assumptions about the width of target_phys_addr_t or had an ifdef to handle the 32 and 64 bit cases separately. Patches 2-4 cover the format-string related parts of Anthony's original make target_phys_addr_t 64 bits in all cases patch: http://lists.gnu.org/archive/html/qemu-devel/2012-01/msg01523.html Peter Maydell (4): targphys.h: Define PRI*PLX format specifier macros hw/omap.h: Use PRIxPLX to define OMAP_FMT_plx hw/sh_serial: Use PRIxPLX rather than %x for physaddr monitor: Use PRI*PLX to avoid TARGET_PHYS_ADDR_BITS ifdef hw/omap.h |8 +--- hw/sh_serial.c |5 +++-- monitor.c | 29 - targphys.h | 16 4 files changed, 24 insertions(+), 34 deletions(-)
[Qemu-devel] [PATCH 1/4] targphys.h: Define PRI*PLX format specifier macros
Define a set of PRI*PLX format specifier macros for working with target_phys_addr_t types. These follow the standard pattern for such macros, and are more flexible than TARGET_FMT_plx, which does not allow specification of field widths. Suggested-by: Andreas Färber afaer...@suse.de Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- targphys.h | 16 1 files changed, 16 insertions(+), 0 deletions(-) diff --git a/targphys.h b/targphys.h index 95648d6..d5b5636 100644 --- a/targphys.h +++ b/targphys.h @@ -11,10 +11,26 @@ typedef uint32_t target_phys_addr_t; #define TARGET_PHYS_ADDR_MAX UINT32_MAX #define TARGET_FMT_plx %08x +/* Format strings for printing target_phys_addr_t types. + * These are recommended over the less flexible TARGET_FMT_plx, + * which is retained for the benefit of existing code. + */ +#define PRIdPLX PRId32 +#define PRIiPLX PRIi32 +#define PRIoPLX PRIo32 +#define PRIuPLX PRIu32 +#define PRIxPLX PRIx32 +#define PRIXPLX PRIX32 #elif TARGET_PHYS_ADDR_BITS == 64 typedef uint64_t target_phys_addr_t; #define TARGET_PHYS_ADDR_MAX UINT64_MAX #define TARGET_FMT_plx %016 PRIx64 +#define PRIdPLX PRId64 +#define PRIiPLX PRIi64 +#define PRIoPLX PRIo64 +#define PRIuPLX PRIu64 +#define PRIxPLX PRIx64 +#define PRIXPLX PRIX64 #endif #endif -- 1.7.1
[Qemu-devel] [PATCH 4/4] monitor: Use PRI*PLX to avoid TARGET_PHYS_ADDR_BITS ifdef
Now we have PRX*PLX for printing target_phys_addr_t values, we can use them in monitor.c rather than having duplicate code in two arms of a TARGET_PHYS_ADDR_BITS ifdef. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- monitor.c | 29 - 1 files changed, 4 insertions(+), 25 deletions(-) diff --git a/monitor.c b/monitor.c index f6107ba..6dea2a3 100644 --- a/monitor.c +++ b/monitor.c @@ -1262,45 +1262,24 @@ static void do_print(Monitor *mon, const QDict *qdict) int format = qdict_get_int(qdict, format); target_phys_addr_t val = qdict_get_int(qdict, val); -#if TARGET_PHYS_ADDR_BITS == 32 switch(format) { case 'o': -monitor_printf(mon, %#o, val); +monitor_printf(mon, %# PRIoPLX, val); break; case 'x': -monitor_printf(mon, %#x, val); +monitor_printf(mon, %# PRIxPLX, val); break; case 'u': -monitor_printf(mon, %u, val); +monitor_printf(mon, % PRIuPLX, val); break; default: case 'd': -monitor_printf(mon, %d, val); +monitor_printf(mon, % PRIdPLX, val); break; case 'c': monitor_printc(mon, val); break; } -#else -switch(format) { -case 'o': -monitor_printf(mon, %# PRIo64, val); -break; -case 'x': -monitor_printf(mon, %# PRIx64, val); -break; -case 'u': -monitor_printf(mon, % PRIu64, val); -break; -default: -case 'd': -monitor_printf(mon, % PRId64, val); -break; -case 'c': -monitor_printc(mon, val); -break; -} -#endif monitor_printf(mon, \n); } -- 1.7.1
Re: [Qemu-devel] [PATCH v3 1/1] virtio-rng: hardware random number generator device
On 06/25/2012 07:30 AM, Daniel P. Berrange wrote: On Mon, Jun 25, 2012 at 07:22:13AM -0500, Anthony Liguori wrote: On 06/25/2012 07:10 AM, Daniel P. Berrange wrote: On Fri, Jun 22, 2012 at 02:59:13PM -0500, Anthony Liguori wrote: On 06/22/2012 01:50 PM, Amit Shah wrote: On (Fri) 22 Jun 2012 [08:44:52], Anthony Liguori wrote: On 06/22/2012 08:34 AM, Daniel P. Berrange wrote: Oh, that's a good point. But easily fixed. Of course, except that now we have to maintain compatibility so some hideous hack goes in. This is what fundamentally makes using events a bad approach. There are more problems lurking. This is the fundamental complexity of having two non-synchronized communication channels when you're trying to synchronize some sort of activity. BTW, despite what danpb mentioned, you are rate limiting entropy requests in this patch series All of these problems are naturally solved using a protocol over a CharDriverState. Can we at least agree on merging a patch which just includes the raw chardev backend support for virtio-rng ? ie drop the QMP event for now, so we can make some step forward. All we need to do to support EGD is instead of doing: +QObject *data; + +data = qobject_from_jsonf({ 'bytes': % PRId64 }, + size); +monitor_protocol_event(QEVENT_ENTROPY_NEEDED, data); +qobject_decref(data); Do: while (size0) { uint8_t partial_size = MIN(255, size); uint8_t msg[2] = { 0x02, partial_size }; qemu_chr_write(s-chr, msg, sizeof(msg)); size -= partial_size; } And that's it. It's an extremely simple protocol to support. It would actually reduce the total size of the patch. So I now get what your objection is, and that is because of an assumption you're making which is incorrect. An OS asking for 5038 bytes of entropy is doing just that: asking for those bytes. That doesn't mean a hardware device can provide it with that much entropy. So, the hardware device here will just provide whatever is available, and the OS has to be happy with what it got. If it got 200 bytes from the device, the OS is then free to demand even 7638 bytes more, but it may not get anything for quite a while. (This is exactly how things work with /dev/random and /dev/urandom too, btw. And /dev/urandom was invented for apps which can't wait for all the data they're asking to come to them.) As it turns out, the EGD protocol also has a mechanism to ask for ask much entropy as is available at the current moment. It also has a mechanism to query the amount of available entropy. And no, you're assertion that we don't care about how much entropy the guest is requesting is wrong. If we lose entropy requests, then we never know if the guest is in a state where it's expecting entropy and we're not sending it. Consider: - Guest requests entropy (X bytes) - libvirt sees request - libvirt sends X bytes to guest - Guest requests entropy (Y bytes), QEMU filters due to rate limiting The guest will never request entropy again and libvirt will never send entropy again. The device is effectively dead-locked. WRT the impl of rate limiting Amit has used in this patch, you are correct, however, this is not how QEMU should be rate limiting this event. As mentioned in an earlier reply in this thread, any rate limited /must/ work as follows: - Guest sends randomness request for 10 bytes - QMP event gets sent for 10 bytes - Guest sends randomness request for 4 bytes - QMP is dropped - Guest sends randomness request for 8 bytes - QMP event gets sent for 12 bytes ie, the byte count for any events which are dropped, must be added to the byte count in the next emitted event. Also as Amit mentioned in his reply to me, it should take account of multiple devices, or we should limit QEMU to 1 single RNG device per guest, as we do for the balloon driver. QMP events are not meant to be used like this. They are posted events and since we cannot know if there is something connected to the monitor, it's always possible for the backend to lose the information associated with an event. Events really should just be used to indicate, hey, you should go query information from QEMU now. Even what you suggest above doesn't handle the case where a management application restarts. If you were going to do this via QMP, you'd want to introduce a command to query the total outstanding requested entropy for a given device and then send a event whenever that value changes. But again, it's silly to use QMP for this. Using an inline protocol simplifies things quite a bit. Ok, I agree that being rebust against mgmt layer restarts is a good reason for not relying on QMP events for this, and that using the EGD protocol would nicely solve this problem. So lets drop the QMP event. I still think it is key to allow use of raw chardevs in addition to EDGE though, so you can easily attached QEMU directly to a
[Qemu-devel] [PATCH 2/4] hw/omap.h: Use PRIxPLX to define OMAP_FMT_plx
Use the new PRIxPLX macro to avoid the need to define an OMAP_FMT_plx macro whose expansion depends directly on TARGET_PHYS_ADDR_BITS. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- hw/omap.h |8 +--- 1 files changed, 1 insertions(+), 7 deletions(-) diff --git a/hw/omap.h b/hw/omap.h index 3d98941..2560aba 100644 --- a/hw/omap.h +++ b/hw/omap.h @@ -942,13 +942,7 @@ struct omap_mpu_state_s *omap2420_mpu_init(MemoryRegion *sysmem, unsigned long sdram_size, const char *core); -# if TARGET_PHYS_ADDR_BITS == 32 -# define OMAP_FMT_plx %#08x -# elif TARGET_PHYS_ADDR_BITS == 64 -# define OMAP_FMT_plx %#08 PRIx64 -# else -# error TARGET_PHYS_ADDR_BITS undefined -# endif +#define OMAP_FMT_plx %#08 PRIxPLX uint32_t omap_badwidth_read8(void *opaque, target_phys_addr_t addr); void omap_badwidth_write8(void *opaque, target_phys_addr_t addr, -- 1.7.1
[Qemu-devel] [PATCH 3/4] hw/sh_serial: Use PRIxPLX rather than %x for physaddr
Switch a format string from %x to PRIxPLX so that it will continue to work even if target_phys_addr_t is changed to 64 bits in the future. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- hw/sh_serial.c |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/hw/sh_serial.c b/hw/sh_serial.c index 43b0eb1..6f89696 100644 --- a/hw/sh_serial.c +++ b/hw/sh_serial.c @@ -186,7 +186,7 @@ static void sh_serial_write(void *opaque, target_phys_addr_t offs, } } -fprintf(stderr, sh_serial: unsupported write to 0x%02x\n, offs); +fprintf(stderr, sh_serial: unsupported write to 0x%02 PRIxPLX \n, offs); abort(); } @@ -287,7 +287,8 @@ static uint64_t sh_serial_read(void *opaque, target_phys_addr_t offs, #endif if (ret ~((1 16) - 1)) { -fprintf(stderr, sh_serial: unsupported read from 0x%02x\n, offs); +fprintf(stderr, +sh_serial: unsupported read from 0x%02 PRIxPLX \n, offs); abort(); } -- 1.7.1
Re: [Qemu-devel] [PATCH stable-1.1 00/26] Initial tree and candidates for stable-1.1
Am 23.06.2012 02:33, schrieb Michael Roth: Hi everyone, I've set up a tree for stable-1.1 at: git://github.com/mdroth/qemu.git stable-1.1 That is simply the initial commit for v1.1.0 I've set up a staging tree for the first batch of commits at: git://github.com/mdroth/qemu.git stable-1.1-staging This should capture everything explicitly tagged for stable/1.1. I've also pulled in relevant build fixes and a handful of applicable bug fixes. Please let me know if I've missed any patches, or if any of these shouldn't be applied. Testing so far has been all-target builds on x86, make check, qemu-test, and installation + old(current stable-1.1)-new and new-old migration for fc15/fc16/winxp. I plan on doing a more extensive final round of testing later next week before committing, as that will more or less be QEMU v1.1.1 For the next release I plan on doing weekly or bi-weekly testing of staged patches and emails of this sort before committing to stable-1.1, depending on the number of patches and code churn, and will continue working off my stable-1.1-staging tree on github for those. Please consider as well: 4bb9c939a57103898f5a51aa6a7336eb3320d923 ahci: SATA FIS is 20 bytes... 6f3c714eb7730630241fd0b33b799352d7feb876 sheepdog: fix return value... 47ce9ef7f89032c4079bf5132a12d1bfd4d5bca5 virtio: Fix compiler warning... f085800e245836fed27fddb3b624a29326637657 qemu-img: document qed... Also I think you should add your SoB to each of the patches and document which commit in master you backported (i.e. use git cherry-pick -sx). Kevin
Re: [Qemu-devel] [PATCH v3 1/1] virtio-rng: hardware random number generator device
On Mon, Jun 25, 2012 at 07:54:18AM -0500, Anthony Liguori wrote: On 06/25/2012 07:30 AM, Daniel P. Berrange wrote: On Mon, Jun 25, 2012 at 07:22:13AM -0500, Anthony Liguori wrote: On 06/25/2012 07:10 AM, Daniel P. Berrange wrote: On Fri, Jun 22, 2012 at 02:59:13PM -0500, Anthony Liguori wrote: On 06/22/2012 01:50 PM, Amit Shah wrote: On (Fri) 22 Jun 2012 [08:44:52], Anthony Liguori wrote: On 06/22/2012 08:34 AM, Daniel P. Berrange wrote: Oh, that's a good point. But easily fixed. Of course, except that now we have to maintain compatibility so some hideous hack goes in. This is what fundamentally makes using events a bad approach. There are more problems lurking. This is the fundamental complexity of having two non-synchronized communication channels when you're trying to synchronize some sort of activity. BTW, despite what danpb mentioned, you are rate limiting entropy requests in this patch series All of these problems are naturally solved using a protocol over a CharDriverState. Can we at least agree on merging a patch which just includes the raw chardev backend support for virtio-rng ? ie drop the QMP event for now, so we can make some step forward. All we need to do to support EGD is instead of doing: +QObject *data; + +data = qobject_from_jsonf({ 'bytes': % PRId64 }, + size); +monitor_protocol_event(QEVENT_ENTROPY_NEEDED, data); +qobject_decref(data); Do: while (size0) { uint8_t partial_size = MIN(255, size); uint8_t msg[2] = { 0x02, partial_size }; qemu_chr_write(s-chr, msg, sizeof(msg)); size -= partial_size; } And that's it. It's an extremely simple protocol to support. It would actually reduce the total size of the patch. So I now get what your objection is, and that is because of an assumption you're making which is incorrect. An OS asking for 5038 bytes of entropy is doing just that: asking for those bytes. That doesn't mean a hardware device can provide it with that much entropy. So, the hardware device here will just provide whatever is available, and the OS has to be happy with what it got. If it got 200 bytes from the device, the OS is then free to demand even 7638 bytes more, but it may not get anything for quite a while. (This is exactly how things work with /dev/random and /dev/urandom too, btw. And /dev/urandom was invented for apps which can't wait for all the data they're asking to come to them.) As it turns out, the EGD protocol also has a mechanism to ask for ask much entropy as is available at the current moment. It also has a mechanism to query the amount of available entropy. And no, you're assertion that we don't care about how much entropy the guest is requesting is wrong. If we lose entropy requests, then we never know if the guest is in a state where it's expecting entropy and we're not sending it. Consider: - Guest requests entropy (X bytes) - libvirt sees request - libvirt sends X bytes to guest - Guest requests entropy (Y bytes), QEMU filters due to rate limiting The guest will never request entropy again and libvirt will never send entropy again. The device is effectively dead-locked. WRT the impl of rate limiting Amit has used in this patch, you are correct, however, this is not how QEMU should be rate limiting this event. As mentioned in an earlier reply in this thread, any rate limited /must/ work as follows: - Guest sends randomness request for 10 bytes - QMP event gets sent for 10 bytes - Guest sends randomness request for 4 bytes - QMP is dropped - Guest sends randomness request for 8 bytes - QMP event gets sent for 12 bytes ie, the byte count for any events which are dropped, must be added to the byte count in the next emitted event. Also as Amit mentioned in his reply to me, it should take account of multiple devices, or we should limit QEMU to 1 single RNG device per guest, as we do for the balloon driver. QMP events are not meant to be used like this. They are posted events and since we cannot know if there is something connected to the monitor, it's always possible for the backend to lose the information associated with an event. Events really should just be used to indicate, hey, you should go query information from QEMU now. Even what you suggest above doesn't handle the case where a management application restarts. If you were going to do this via QMP, you'd want to introduce a command to query the total outstanding requested entropy for a given device and then send a event whenever that value changes. But again, it's silly to use QMP for this. Using an inline protocol simplifies things quite a bit. Ok, I agree that being rebust against mgmt layer restarts is a good reason for not relying on QMP events for this, and that using the EGD protocol would nicely solve this
Re: [Qemu-devel] [PATCH 7/8] dirty bitmap: abstract its use
Orit Wasserman owass...@redhat.com wrote: On 06/22/2012 04:46 PM, Juan Quintela wrote: Always use accessors to read/set the dirty bitmap. Signed-off-by: Juan Quintela quint...@redhat.com -static inline int cpu_physical_memory_get_dirty_flags(ram_addr_t addr) +/* read dirty bit (return 0 or 1) */ +static inline int cpu_physical_memory_is_dirty(ram_addr_t addr) { -return ram_list.phys_dirty[addr TARGET_PAGE_BITS]; +return cpu_physical_memory_get_dirty_flags(addr) == 0xff; } Juan, you changed the order of the functions , can your restore the order as it was. There is reason after madness. Now cpu_physical_memory_is_dirty calls cpu_physical_memory_get_dirty_flags(). After this patch, only 4 functions touch the bitmap directly. index a68b65c..dd4833d 100644 --- a/exec.c +++ b/exec.c @@ -2565,8 +2565,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host, ram_list.phys_dirty = g_realloc(ram_list.phys_dirty, last_ram_offset() TARGET_PAGE_BITS); -memset(ram_list.phys_dirty + (new_block-offset TARGET_PAGE_BITS), - 0xff, size TARGET_PAGE_BITS); +cpu_physical_memory_set_dirty_range(new_block-offset, size, 0xff); This will be much slower than memset , do we use it much ? No, we only use it each time that we add another region of RAM (i.e. a very rare event). Next step is to change the bitmap to three bitmaps of 1 bit per page. That way, we only need to look at that 4 functions for changing it. On an un-related note, I can't see how migration + TCG can work at the moment. TCG uses *is_dirty() to see if the page has changed, but if we have migrated that page, is_dirty is going to return _not-dirty_. AKA as bad as it can be :-( Later, Juan.)
Re: [Qemu-devel] [PATCH 1/4] targphys.h: Define PRI*PLX format specifier macros
On 06/25/2012 06:45 AM, Peter Maydell wrote: Define a set of PRI*PLX format specifier macros for working with target_phys_addr_t types. These follow the standard pattern for such macros, and are more flexible than TARGET_FMT_plx, which does not allow specification of field widths. Suggested-by: Andreas Färber afaer...@suse.de Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- targphys.h | 16 1 files changed, 16 insertions(+), 0 deletions(-) diff --git a/targphys.h b/targphys.h index 95648d6..d5b5636 100644 --- a/targphys.h +++ b/targphys.h @@ -11,10 +11,26 @@ typedef uint32_t target_phys_addr_t; #define TARGET_PHYS_ADDR_MAX UINT32_MAX #define TARGET_FMT_plx %08x +/* Format strings for printing target_phys_addr_t types. + * These are recommended over the less flexible TARGET_FMT_plx, + * which is retained for the benefit of existing code. + */ +#define PRIdPLX PRId32 This risks collisions with future POSIX: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/inttypes.h.html#tag_13_20 Macro names beginning with PRI or SCN followed by any lowercase letter or 'X' may be added to the macros defined in the inttypes.h header. The correct way to do this is to first undefine any potential conflicts, per this text in http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_01 If any header in the following table is included, macros with the prefixes shown may be defined. After the last inclusion of a given header, an application may use identifiers with the corresponding prefixes for its own purpose, provided their use is preceded by a #undef of the corresponding macro. ... inttypes.h PRI[Xa-z], SCN[Xa-z] -- Eric Blake ebl...@redhat.com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH 1/8] Add spent time for migration
On 06/25/2012 12:24 AM, Orit Wasserman wrote: # @MigrationInfo @@ -275,8 +280,9 @@ # 'cancelled'. If this field is not returned, no migration process # has been initiated # -# @ram: #optional @MigrationStats containing detailed migration status, -# only returned if status is 'active' +# @ram: #optional @MigrationStats containing detailed migration +# status, only returned if status is 'active' or +# 'completed'. 'comppleted' (since 1.2) s/comppleted/completed/ -- Eric Blake ebl...@redhat.com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [RFC][PATCH 05/11] kvm: Introduce kvm_irqchip_add_msi_route
On Mon, 2012-05-14 at 18:07 -0300, Jan Kiszka wrote: Add a service that establishes a static route from a virtual IRQ line to an MSI message. Will be used for IRQFD and device assignment. As we will use this service outside of CONFIG_KVM protected code, stub it properly. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- kvm-all.c | 31 +++ kvm-stub.c |8 kvm.h | 10 ++ 3 files changed, 45 insertions(+), 4 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 8ab83db..f45b852 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -1080,6 +1080,32 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg) return kvm_irqchip_set_irq(s, route-kroute.gsi, 1); } +int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg) +{ +struct kvm_irq_routing_entry kroute; +int gsi; + +if (!kvm_irqchip_in_kernel()) { +return -ENOSYS; +} + +gsi = kvm_get_pseudo_gsi(s); +if (gsi 0) { +return gsi; +} + +kroute.gsi = gsi; +kroute.type = KVM_IRQ_ROUTING_MSI; +kroute.flags = 0; +kroute.u.msi.address_lo = (uint32_t)msg.address; +kroute.u.msi.address_hi = msg.address 32; +kroute.u.msi.data = msg.data; + +kvm_add_routing_entry(s, kroute); + +return gsi; +} + #else /* !KVM_CAP_IRQ_ROUTING */ static void kvm_init_irq_routing(KVMState *s) @@ -1090,6 +1116,11 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg) { abort(); } + +int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg) +{ +abort(); +} #endif /* !KVM_CAP_IRQ_ROUTING */ Jan, Could we perhaps return a sane error value here? This seems to be the initial function used to setup direct MSI injection and if we have code like: virq = kvm_irqchip_add_msi_route(...) if (virq 0) { slow path... } else { fast path... } It works on x86 w/ or w/o irqchip, works with kvm disabled, but gives an abort on !x86. I really don't want to have to surround the above in a #define or in-kernel ioapic test. Thanks, Alex static int kvm_irqchip_create(KVMState *s) diff --git a/kvm-stub.c b/kvm-stub.c index 47c573d..db3a7dc 100644 --- a/kvm-stub.c +++ b/kvm-stub.c @@ -12,10 +12,13 @@ #include qemu-common.h #include hw/hw.h +#include hw/msi.h #include cpu.h #include gdbstub.h #include kvm.h +KVMState *kvm_state; + int kvm_init_vcpu(CPUArchState *env) { return -ENOSYS; @@ -128,3 +131,8 @@ int kvm_on_sigbus(int code, void *addr) { return 1; } + +int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg) +{ +return -ENOSYS; +} diff --git a/kvm.h b/kvm.h index 8b061bd..67df1f1 100644 --- a/kvm.h +++ b/kvm.h @@ -44,6 +44,10 @@ typedef struct KVMCapabilityInfo { #define KVM_CAP_INFO(CAP) { KVM_CAP_ stringify(CAP), KVM_CAP_##CAP } #define KVM_CAP_LAST_INFO { NULL, 0 } +struct KVMState; +typedef struct KVMState KVMState; +extern KVMState *kvm_state; + /* external API */ int kvm_init(void); @@ -88,10 +92,6 @@ int kvm_on_sigbus(int code, void *addr); /* internal API */ -struct KVMState; -typedef struct KVMState KVMState; -extern KVMState *kvm_state; - int kvm_ioctl(KVMState *s, int type, ...); int kvm_vm_ioctl(KVMState *s, int type, ...); @@ -213,4 +213,6 @@ int kvm_set_ioeventfd_mmio(int fd, uint32_t adr, uint32_t val, bool assign, uint32_t size); int kvm_set_ioeventfd_pio_word(int fd, uint16_t adr, uint16_t val, bool assign); + +int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg); #endif
Re: [Qemu-devel] [PATCH v4 1/7] qemu-char: Add MSG_CMSG_CLOEXEC flag to recvmsg
On 06/25/2012 05:16 AM, Stefan Hajnoczi wrote: On Fri, Jun 22, 2012 at 01:31:17PM -0600, Eric Blake wrote: On 06/22/2012 12:36 PM, Corey Bryant wrote: This sets the close-on-exec flag for the file descriptor received via SCM_RIGHTS. Signed-off-by: Corey Bryant cor...@linux.vnet.ibm.com --- v4 -This patch is new in v4 (ebl...@redhat.com) qemu-char.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/qemu-char.c b/qemu-char.c index c2aaaee..f890113 100644 --- a/qemu-char.c +++ b/qemu-char.c @@ -2263,7 +2263,7 @@ static ssize_t tcp_chr_recv(CharDriverState *chr, char *buf, size_t len) msg.msg_control = msg_control; msg.msg_controllen = sizeof(msg_control); -ret = recvmsg(s-fd, msg, 0); +ret = recvmsg(s-fd, msg, MSG_CMSG_CLOEXEC); MSG_CMSG_CLOEXEC is not (yet) in POSIX (although it has been proposed for addition); therefore, at the moment, it only exists on Linux and Cygwin. Does this need to have conditional code to allow compilation on BSD, such as: #ifndef MSG_CMSG_CLOEXEC # define MSG_CMSG_CLOEXEC 0 #endif as well as fallback code that sets FD_CLOEXEC manually via fcntl() when MSG_CMSG_CLOEXEC is missing? Good point. I think the answer is yes. Just like qemu_open() we can wrap recvmsg(2) with fd so that platforms with MSG_CMSG_CLOEXEC use that flag and other platforms use qemu_set_cloexec(). Stefan Thanks for pointing this out. I'll make the updates in v5. -- Regards, Corey
Re: [Qemu-devel] [PATCH 0/2] introduce bdrv_swap, implement bdrv_append on top
On Fri, Jun 15, 2012 at 10:38 AM, Kevin Wolf kw...@redhat.com wrote: Am 14.06.2012 16:55, schrieb Paolo Bonzini: Yet another tiny bit extracted from block mirroring, looks like it should be useful for block commit too. Paolo Bonzini (2): block: copy over job and dirty bitmap fields in bdrv_append block: introduce bdrv_swap, implement bdrv_append on top of it block.c | 175 +-- block.h | 1 + 2 files changed, 103 insertions(+), 73 deletions(-) I was really hoping we could get rid of bdrv_append() rather than extend it and spread its use... What exactly do we need this for? I'm sure you have good reasons, but with such hackish approaches the justification should be explicit. Opening twice and switching to the new state is an okay approach. It's just made very difficult because BlockDriverState has grown into a monster struct. So the implementation is ugly. If we split BlockDriverState (e.g. guest-visible block device state vs block driver state) can we do away with the in-place swap? If it's possible to implement this using regular pointer assignment instead of swapping fields, then this approach will become reasonable IMO. Stefan
Re: [Qemu-devel] [PATCH 0/4] Define and use PRI*PLX format specifier macros for target_phys_addr_t
Am 25.06.2012 14:45, schrieb Peter Maydell: This patchset defines a set of macros for use in printf format strings for handling target_phys_addr_t values, as suggested by Andreas Färber (http://lists.gnu.org/archive/html/qemu-devel/2012-01/msg01547.html) What is the X for again? I assume PL = Physical Long, X as in hexadecimal would then contradict the d/u and duplicate x/X. Thus I would suggest TARGET_PRIdPHYSADDR, also addressing Eric's comment. Judging from patches 2-4 it's not used to frequently unlike its user TARGET_FMT_plx, so we don't have to badly shorten it for usability. Andreas We can then use them in some places which were previously either making assumptions about the width of target_phys_addr_t or had an ifdef to handle the 32 and 64 bit cases separately. Patches 2-4 cover the format-string related parts of Anthony's original make target_phys_addr_t 64 bits in all cases patch: http://lists.gnu.org/archive/html/qemu-devel/2012-01/msg01523.html Peter Maydell (4): targphys.h: Define PRI*PLX format specifier macros hw/omap.h: Use PRIxPLX to define OMAP_FMT_plx hw/sh_serial: Use PRIxPLX rather than %x for physaddr monitor: Use PRI*PLX to avoid TARGET_PHYS_ADDR_BITS ifdef hw/omap.h |8 +--- hw/sh_serial.c |5 +++-- monitor.c | 29 - targphys.h | 16 4 files changed, 24 insertions(+), 34 deletions(-) -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [RFC][PATCH 05/11] kvm: Introduce kvm_irqchip_add_msi_route
On 2012-06-25 15:38, Alex Williamson wrote: On Mon, 2012-05-14 at 18:07 -0300, Jan Kiszka wrote: Add a service that establishes a static route from a virtual IRQ line to an MSI message. Will be used for IRQFD and device assignment. As we will use this service outside of CONFIG_KVM protected code, stub it properly. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- kvm-all.c | 31 +++ kvm-stub.c |8 kvm.h | 10 ++ 3 files changed, 45 insertions(+), 4 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 8ab83db..f45b852 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -1080,6 +1080,32 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg) return kvm_irqchip_set_irq(s, route-kroute.gsi, 1); } +int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg) +{ +struct kvm_irq_routing_entry kroute; +int gsi; + +if (!kvm_irqchip_in_kernel()) { +return -ENOSYS; +} + +gsi = kvm_get_pseudo_gsi(s); +if (gsi 0) { +return gsi; +} + +kroute.gsi = gsi; +kroute.type = KVM_IRQ_ROUTING_MSI; +kroute.flags = 0; +kroute.u.msi.address_lo = (uint32_t)msg.address; +kroute.u.msi.address_hi = msg.address 32; +kroute.u.msi.data = msg.data; + +kvm_add_routing_entry(s, kroute); + +return gsi; +} + #else /* !KVM_CAP_IRQ_ROUTING */ static void kvm_init_irq_routing(KVMState *s) @@ -1090,6 +1116,11 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg) { abort(); } + +int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg) +{ +abort(); +} #endif /* !KVM_CAP_IRQ_ROUTING */ Jan, Could we perhaps return a sane error value here? This seems to be the initial function used to setup direct MSI injection and if we have code like: virq = kvm_irqchip_add_msi_route(...) if (virq 0) { slow path... } else { fast path... } It works on x86 w/ or w/o irqchip, works with kvm disabled, but gives an abort on !x86. I really don't want to have to surround the above in a #define or in-kernel ioapic test. Thanks, No problems, just send a patch. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH 7/8] dirty bitmap: abstract its use
On 06/25/2012 04:20 PM, Juan Quintela wrote: Orit Wasserman owass...@redhat.com wrote: On 06/22/2012 04:46 PM, Juan Quintela wrote: Always use accessors to read/set the dirty bitmap. Signed-off-by: Juan Quintela quint...@redhat.com -static inline int cpu_physical_memory_get_dirty_flags(ram_addr_t addr) +/* read dirty bit (return 0 or 1) */ +static inline int cpu_physical_memory_is_dirty(ram_addr_t addr) { -return ram_list.phys_dirty[addr TARGET_PAGE_BITS]; +return cpu_physical_memory_get_dirty_flags(addr) == 0xff; } Juan, you changed the order of the functions , can your restore the order as it was. There is reason after madness. Now cpu_physical_memory_is_dirty calls cpu_physical_memory_get_dirty_flags(). After this patch, only 4 functions touch the bitmap directly. ok. index a68b65c..dd4833d 100644 --- a/exec.c +++ b/exec.c @@ -2565,8 +2565,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host, ram_list.phys_dirty = g_realloc(ram_list.phys_dirty, last_ram_offset() TARGET_PAGE_BITS); -memset(ram_list.phys_dirty + (new_block-offset TARGET_PAGE_BITS), - 0xff, size TARGET_PAGE_BITS); +cpu_physical_memory_set_dirty_range(new_block-offset, size, 0xff); This will be much slower than memset , do we use it much ? No, we only use it each time that we add another region of RAM (i.e. a very rare event). if it is rare , than it is fine Next step is to change the bitmap to three bitmaps of 1 bit per page. That way, we only need to look at that 4 functions for changing it. Great. On an un-related note, I can't see how migration + TCG can work at the moment. TCG uses *is_dirty() to see if the page has changed, but if we have migrated that page, is_dirty is going to return _not-dirty_. AKA as bad as it can be :-( Later, Juan.) Reviewed-by: Orit Wasserman owass...@redhat.com
[Qemu-devel] [PATCH 2/4] target-xtensa: drop usage of prev_debug_excp_handler
Chains of exception handlers are currently unused feature. Dropping it to be consistent with target-i386 but it may simplify qom-ifying CPU in future like for target-i386. Signed-off-by: Igor Mammedov imamm...@redhat.com --- target-xtensa/helper.c |8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/target-xtensa/helper.c b/target-xtensa/helper.c index 044ce18..d5bb171 100644 --- a/target-xtensa/helper.c +++ b/target-xtensa/helper.c @@ -54,8 +54,6 @@ static uint32_t check_hw_breakpoints(CPUXtensaState *env) return 0; } -static CPUDebugExcpHandler *prev_debug_excp_handler; - static void breakpoint_handler(CPUXtensaState *env) { if (env-watchpoint_hit) { @@ -70,9 +68,6 @@ static void breakpoint_handler(CPUXtensaState *env) cpu_resume_from_signal(env, NULL); } } -if (prev_debug_excp_handler) { -prev_debug_excp_handler(env); -} } XtensaCPU *cpu_xtensa_init(const char *cpu_model) @@ -105,8 +100,7 @@ XtensaCPU *cpu_xtensa_init(const char *cpu_model) if (!debug_handler_inited tcg_enabled()) { debug_handler_inited = 1; -prev_debug_excp_handler = -cpu_set_debug_excp_handler(breakpoint_handler); +cpu_set_debug_excp_handler(breakpoint_handler); } xtensa_irq_init(env); -- 1.7.10.2
[Qemu-devel] [PATCH 0/4 v2] target-i386: move tcg intialization inside CPU object
v2: - drop usage of prev_debug_excp_handler consistently in all users - split from reset patches to avoid confusion of inter-dependency Compile Run tested: target-i386: tcg and kvm mode i386-linux-user: running of /bin/ls Compile tested: xtensa-softmmu xtensaeb-softmmu git tree for testing: https://github.com/imammedo/qemu/tree/x86cpu_qom_tcg_v2 Igor Mammedov (4): target-i386: drop usage of prev_debug_excp_handler target-xtensa: drop usage of prev_debug_excp_handler cleanup cpu_set_debug_excp_handler target-i386: move tcg initialization into x86_cpu_initfn() cpu-exec.c |5 + exec-all.h |2 +- target-i386/cpu.c | 10 ++ target-i386/cpu.h |1 + target-i386/helper.c | 16 +--- target-xtensa/helper.c |8 +--- 6 files changed, 15 insertions(+), 27 deletions(-) -- 1.7.10.2
[Qemu-devel] [PATCH 3/4] cleanup cpu_set_debug_excp_handler
There are no users left for previous exception handler returned from cpu_set_debug_excp_handler. It should simplify code a little. Signed-off-by: Igor Mammedov imamm...@redhat.com --- cpu-exec.c |5 + exec-all.h |2 +- 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/cpu-exec.c b/cpu-exec.c index 624c409..24607fb 100644 --- a/cpu-exec.c +++ b/cpu-exec.c @@ -156,12 +156,9 @@ static inline TranslationBlock *tb_find_fast(CPUArchState *env) static CPUDebugExcpHandler *debug_excp_handler; -CPUDebugExcpHandler *cpu_set_debug_excp_handler(CPUDebugExcpHandler *handler) +void cpu_set_debug_excp_handler(CPUDebugExcpHandler *handler) { -CPUDebugExcpHandler *old_handler = debug_excp_handler; - debug_excp_handler = handler; -return old_handler; } static void cpu_handle_debug_exception(CPUArchState *env) diff --git a/exec-all.h b/exec-all.h index 9bda7f7..c5ec8e1 100644 --- a/exec-all.h +++ b/exec-all.h @@ -357,7 +357,7 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr); typedef void (CPUDebugExcpHandler)(CPUArchState *env); -CPUDebugExcpHandler *cpu_set_debug_excp_handler(CPUDebugExcpHandler *handler); +void cpu_set_debug_excp_handler(CPUDebugExcpHandler *handler); /* vl.c */ extern int singlestep; -- 1.7.10.2
[Qemu-devel] [PATCH 1/4] target-i386: drop usage of prev_debug_excp_handler
Chains of exception handlers are currently unused feature, drop it for now so as not to expose prev_debug_excp_handler at global scope when moving tcg initialization into target-i386/cpu.c Later we probably could re-invent better interface for this. Signed-off-by: Igor Mammedov imamm...@redhat.com --- target-i386/helper.c |7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/target-i386/helper.c b/target-i386/helper.c index 2cc8097..b9384f6 100644 --- a/target-i386/helper.c +++ b/target-i386/helper.c @@ -941,8 +941,6 @@ int check_hw_breakpoints(CPUX86State *env, int force_dr6_update) return hit_enabled; } -static CPUDebugExcpHandler *prev_debug_excp_handler; - static void breakpoint_handler(CPUX86State *env) { CPUBreakpoint *bp; @@ -965,8 +963,6 @@ static void breakpoint_handler(CPUX86State *env) break; } } -if (prev_debug_excp_handler) -prev_debug_excp_handler(env); } typedef struct MCEInjectionParams { @@ -1166,8 +1162,7 @@ X86CPU *cpu_x86_init(const char *cpu_model) inited = 1; optimize_flags_init(); #ifndef CONFIG_USER_ONLY -prev_debug_excp_handler = -cpu_set_debug_excp_handler(breakpoint_handler); +cpu_set_debug_excp_handler(breakpoint_handler); #endif } if (cpu_x86_register(cpu, cpu_model) 0) { -- 1.7.10.2
[Qemu-devel] [PATCH 4/4] target-i386: move tcg initialization into x86_cpu_initfn()
In order to make cpu object not depended on external ad-hoc initialization routines, move tcg initialization from cpu_x86_init inside cpu object x86_cpu_initfn(). Signed-off-by: Igor Mammedov imamm...@redhat.com Reviewed-by: Andreas Färber afaer...@suse.de --- target-i386/cpu.c| 10 ++ target-i386/cpu.h|1 + target-i386/helper.c | 11 +-- 3 files changed, 12 insertions(+), 10 deletions(-) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index fdd95be..445274c 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -1716,6 +1716,7 @@ static void x86_cpu_initfn(Object *obj) { X86CPU *cpu = X86_CPU(obj); CPUX86State *env = cpu-env; +static int inited; cpu_exec_init(env); @@ -1745,6 +1746,15 @@ static void x86_cpu_initfn(Object *obj) x86_cpuid_set_tsc_freq, NULL, NULL, NULL); env-cpuid_apic_id = env-cpu_index; + +/* init various static tables used in TCG mode */ +if (tcg_enabled() !inited) { +inited = 1; +optimize_flags_init(); +#ifndef CONFIG_USER_ONLY +cpu_set_debug_excp_handler(breakpoint_handler); +#endif +} } static void x86_cpu_common_class_init(ObjectClass *oc, void *data) diff --git a/target-i386/cpu.h b/target-i386/cpu.h index bcf663e..80dcb49 100644 --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -933,6 +933,7 @@ static inline int hw_breakpoint_len(unsigned long dr7, int index) void hw_breakpoint_insert(CPUX86State *env, int index); void hw_breakpoint_remove(CPUX86State *env, int index); int check_hw_breakpoints(CPUX86State *env, int force_dr6_update); +void breakpoint_handler(CPUX86State *env); /* will be suppressed */ void cpu_x86_update_cr0(CPUX86State *env, uint32_t new_cr0); diff --git a/target-i386/helper.c b/target-i386/helper.c index b9384f6..c52ec13 100644 --- a/target-i386/helper.c +++ b/target-i386/helper.c @@ -941,7 +941,7 @@ int check_hw_breakpoints(CPUX86State *env, int force_dr6_update) return hit_enabled; } -static void breakpoint_handler(CPUX86State *env) +void breakpoint_handler(CPUX86State *env) { CPUBreakpoint *bp; @@ -1151,20 +1151,11 @@ X86CPU *cpu_x86_init(const char *cpu_model) { X86CPU *cpu; CPUX86State *env; -static int inited; cpu = X86_CPU(object_new(TYPE_X86_CPU)); env = cpu-env; env-cpu_model_str = cpu_model; -/* init various static tables used in TCG mode */ -if (tcg_enabled() !inited) { -inited = 1; -optimize_flags_init(); -#ifndef CONFIG_USER_ONLY -cpu_set_debug_excp_handler(breakpoint_handler); -#endif -} if (cpu_x86_register(cpu, cpu_model) 0) { object_delete(OBJECT(cpu)); return NULL; -- 1.7.10.2
Re: [Qemu-devel] [PATCH stable-1.1 00/26] Initial tree and candidates for stable-1.1
On Mon, Jun 25, 2012 at 12:46:28PM +0100, Stefano Stabellini wrote: On Sat, 23 Jun 2012, Michael Roth wrote: Hi everyone, I've set up a tree for stable-1.1 at: git://github.com/mdroth/qemu.git stable-1.1 Thanks for doing this. That is simply the initial commit for v1.1.0 I've set up a staging tree for the first batch of commits at: git://github.com/mdroth/qemu.git stable-1.1-staging This should capture everything explicitly tagged for stable/1.1. I've also pulled in relevant build fixes and a handful of applicable bug fixes. Please let me know if I've missed any patches, or if any of these shouldn't be applied. Please backport this one too: commit 64c27e5b1fdb6d94bdc0bda3b1869d7383a35c65 Author: Jan Beulich jbeul...@suse.com Date: Mon Jun 11 09:52:27 2012 + qemu/xendisk: set maximum number of grants to be used Applied to staging, thanks. Thanks! - Stefano
Re: [Qemu-devel] [PATCH 2/4] target-xtensa: drop usage of prev_debug_excp_handler
On Mon, Jun 25, 2012 at 5:55 PM, Igor Mammedov imamm...@redhat.com wrote: Chains of exception handlers are currently unused feature. Dropping it to be consistent with target-i386 but it may simplify qom-ifying CPU in future like for target-i386. Signed-off-by: Igor Mammedov imamm...@redhat.com --- target-xtensa/helper.c | 8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) Acked-by: Max Filippov jcmvb...@gmail.com -- Thanks. -- Max
Re: [Qemu-devel] [RFC 0/7] qcow2: implement lazy refcounts optimization
On Fri, Jun 22, 2012 at 11:08 PM, Stefan Hajnoczi stefa...@linux.vnet.ibm.com wrote: This series aims to improve qcow2 performance with cache=writethrough and cache=directsync. In particular it reduces the impact of metadata updates for What does cache=directsync mean? what is the difference between them? allocating writes. Allocating writes are expensive because they involve updating L2 tables and refcount blocks. In addition they can also cause L2 table allocation and refcount block allocation but these remain unaffected by this optimization. The key insight is that refcounts are not required to access data in the image. Can you elaborate this? why? This means that we can postpone refcount updates without violating the data integrity guarantee that cache=writethrough and cache=directsync give. The trade-off for postponing refcount updates is that the image may not be completely consistent in case of power failure or crash. If the image is dirty then it must be repaired before performing further modifications, in other words we need an fsck-like scan on startup. I don't have performance results to share yet but I wanted to get the code out there. The bigger picture is that this optimization should help make qcow2 a good choice even for cache=writethrough and cache=directsync where QED has traditionally had an advantage due to less metadata - this allows us to converge image format development in QEMU around the qcow2v3 format. Why do we design one new image format? e.g. use btree to manage metadata and disk data layout? Since btree has a lot of advantages than bitmap or tables. Stefan Hajnoczi (7): docs: add dirty bit to qcow2 specification qcow2: introduce dirty bit docs: add lazy refcounts bit to qcow2 specification qemu-iotests: ignore qemu-img create lazy_refcounts output qcow2: implement lazy refcounts qemu-io: add abort command to simulate program crash qemu-iotests: add 039 qcow2 lazy refcounts test block/qcow2-cluster.c | 5 +- block/qcow2.c | 111 +++--- block/qcow2.h | 11 + block_int.h | 26 +- docs/specs/qcow2.txt | 12 - qemu-io.c | 12 + tests/qemu-iotests/039 | 99 + tests/qemu-iotests/039.out | 34 + tests/qemu-iotests/common.rc | 3 +- tests/qemu-iotests/group | 1 + 10 files changed, 292 insertions(+), 22 deletions(-) create mode 100755 tests/qemu-iotests/039 create mode 100644 tests/qemu-iotests/039.out -- 1.7.10 -- Regards, Zhi Yong Wu
Re: [Qemu-devel] [PATCH 0/4] Define and use PRI*PLX format specifier macros for target_phys_addr_t
On 25 June 2012 14:48, Andreas Färber afaer...@suse.de wrote: Am 25.06.2012 14:45, schrieb Peter Maydell: This patchset defines a set of macros for use in printf format strings for handling target_phys_addr_t values, as suggested by Andreas Färber (http://lists.gnu.org/archive/html/qemu-devel/2012-01/msg01547.html) What is the X for again? I assume PL = Physical Long, X as in hexadecimal would then contradict the d/u and duplicate x/X. I'm just following the existing use of 'plx' in TARGET_FMT_plx, but I'm not very attached to the abbreviation. Thus I would suggest TARGET_PRIdPHYSADDR, also addressing Eric's comment. Judging from patches 2-4 it's not used to frequently unlike its user TARGET_FMT_plx, so we don't have to badly shorten it for usability. The hope is that it is used more extensively in the future (ie any device that would have used TARGET_FMT_plx should use these instead); I think TARGET_PRIdPHYSADDR is too long. -- PMM
Re: [Qemu-devel] [PATCH stable-1.1 00/26] Initial tree and candidates for stable-1.1
On Mon, Jun 25, 2012 at 02:58:40PM +0200, Kevin Wolf wrote: Am 23.06.2012 02:33, schrieb Michael Roth: Hi everyone, I've set up a tree for stable-1.1 at: git://github.com/mdroth/qemu.git stable-1.1 That is simply the initial commit for v1.1.0 I've set up a staging tree for the first batch of commits at: git://github.com/mdroth/qemu.git stable-1.1-staging This should capture everything explicitly tagged for stable/1.1. I've also pulled in relevant build fixes and a handful of applicable bug fixes. Please let me know if I've missed any patches, or if any of these shouldn't be applied. Testing so far has been all-target builds on x86, make check, qemu-test, and installation + old(current stable-1.1)-new and new-old migration for fc15/fc16/winxp. I plan on doing a more extensive final round of testing later next week before committing, as that will more or less be QEMU v1.1.1 For the next release I plan on doing weekly or bi-weekly testing of staged patches and emails of this sort before committing to stable-1.1, depending on the number of patches and code churn, and will continue working off my stable-1.1-staging tree on github for those. Please consider as well: 4bb9c939a57103898f5a51aa6a7336eb3320d923 ahci: SATA FIS is 20 bytes... I originally had this in the queue but it looks like the only effect is to avoid allocating/initializing an extra 16 bytes on the stack. I didn't see any other potential side-effects so decided to drop it. Let me know if I'm missing something though. 6f3c714eb7730630241fd0b33b799352d7feb876 sheepdog: fix return value... 47ce9ef7f89032c4079bf5132a12d1bfd4d5bca5 virtio: Fix compiler warning... f085800e245836fed27fddb3b624a29326637657 qemu-img: document qed... Applied to staging, thanks. Also I think you should add your SoB to each of the patches and document which commit in master you backported (i.e. use git cherry-pick -sx). Thanks, corrected in staging now. Didn't know about about the -x option, pretty spiffy :) Kevin
Re: [Qemu-devel] [RFC 2/7] qcow2: introduce dirty bit
Am 22.06.2012 17:08, schrieb Stefan Hajnoczi: This patch adds an incompatible feature bit to mark images that have not been closed cleanly. When a dirty image file is opened a consistency check and repair is performed. Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- block/qcow2.c | 44 ++-- block/qcow2.h |3 +++ 2 files changed, 45 insertions(+), 2 deletions(-) diff --git a/block/qcow2.h b/block/qcow2.h index 455b6d7..5c7cfb6 100644 --- a/block/qcow2.h +++ b/block/qcow2.h @@ -108,6 +108,9 @@ enum { QCOW2_FEAT_TYPE_INCOMPATIBLE= 0, QCOW2_FEAT_TYPE_COMPATIBLE = 1, QCOW2_FEAT_TYPE_AUTOCLEAR = 2, + +QCOW2_INCOMPATIBLE_FEAT_DIRTY = 0x1, +QCOW2_INCOMPATIBLE_FEAT_MASK= QCOW2_INCOMPATIBLE_FEAT_DIRTY, }; I would use a separate enum for each of the three flag categories. Also, you should add a feature table entry for the dirty bit so that older qemu versions can display a useful error message. Kevin