Re: [Qemu-devel] qemu per vm config file support
On (Mon) Apr 12 2010 [21:06:29], Manish Regmi wrote: Hi, These days i am playing with qemu and kvm. The command line parameters can become very long. Is there any technical reason why config file is not implemented in qemu yet. See -readconfig and -writeconfig. There's partial support for config files; not all devices can be initialised via configs yet. Amit
[Qemu-devel] Re: [PATCH v2 3/6] Modifies wrapper functions for byte-based phys_ram_dirty bitmap to bit-based phys_ram_dirty bitmap.
Avi Kivity wrote: On 04/12/2010 01:58 PM, Yoshiaki Tamura wrote: Is it necessary to update migration and vga bitmaps? We can simply update the master bitmap, and update the migration and vga bitmaps only when they need it. That can be done in a different patch. Let me explain the role of the master bitmap here. In the previous discussion, the master bitmap represented at least one dirty type is actually dirty. While implementing this approach, I though there is one downside that upon resetting the bitmap, it needs to check all dirty types whether they are already cleared to unset the master bitmap, and this might hurt the performance in general. The way I see it, the master bitmap is a buffer between the guest and all the other client bitmaps (migration and vga). So, a bit can be set in vga and cleared in master. Let's look at the following scenario: 1. guest touches address 0xa (page 0xa0) 2. qemu updates master bitmap bit 0xa0 3. vga timer fires 4. vga syncs the dirty bitmap for addresses 0xa-0xc 4.1 vga_bitmap[0xa0-0xc0] |= master_bitmap[0xa0-0xc0] 4.2 migration_bitmap[0xa0-0xc0] |= master_bitmap[0xa0-0xc0] 4.3 master_bitmap[0xa0-0xc0] = 0 5. vga draws based on vga_bitmap the nice thing is that step 4.2 can be omitted if migration is not active. so, client bitmaps are always updated when the client requests them, master bitmap is only a buffer. Thanks. I think I'm finally getting your point. At 4.2 and 4.3, is syncing to client necessary? Simply doing OR at get_dirty like in this patch not enough? In this patch, master bitmap represents all types are dirty, similar to existing 0xff. With this approach, resetting the master bitmap can be done without checking the other types. set_dirty_flags is actually taking the burden in this case though. Anyway, IIUC somebody would be unhappy depending on the role of the master bitmap. Yes, the problem is that set_dirty_flags is the common case and also uses random access, we'd want to make it touch only a single bit. client access is rare, and also sequential, and therefore efficient. I see. So set_dirty_flags would be like, 1. set master first regard less of dirty type. 2. if dirty type was CODE, set the client. Note that we should only allocate the migration and vga bitmaps when migration or vga is active. Right. May I do it in a different patch? Sure, more patches is usually better. I think this is about optimization. In this patch, I focused on not breaking the existing function. Yes, that's the best approach. But we do need to see how all the incremental steps end up. I totally agree. Thanks for helping. In summary, I'll prepare two patch sets. 1. v3 of this patch set. - Change FLAGS value to (1,2,4,8), and add IDX (0,1,2,3) - Use ffsl to convert FLAGS to IDX. - Add help function which takes IDX. - Change the behavior of set_dirty_flags as above. - Change dirty bitmap access to a loop. - Add brace after if () - Move some macros to qemu-common.h. 2. Allocate vga and migration dirty bitmap dynamically. Please modify or add items if I were missing.
Re: [Qemu-devel] QEMU regression problems
2010/4/12 Gerhard Wiesinger li...@wiesinger.com: Hello, Checkit reports some problems under DOS: 1.) NPU functions are not correct: NPU Trigonometric Functions: FAILED. Seems to be a problem of the instruction set. 2.) Real-Time Clock Alarm: FAILED (This might be also the reason for the KVM problem, see my previous post). Seems to be that real-time clock is not working correct. 3.) There is also a problem with the reported base memory under QEMM386 (HIMEM.SYS and EMM386.EXE is correct here). It is 646kB instead of 640kB. Therefore base memory test fails. I guess that reporting memory CMOS tables/interrupt functions are not implemented correctly. Details are listed below. All issues are NOT present under VMWare Server 2.0 and with real hardware. QEMU: 0.12.3 under Fedora 11, 2.6.30.10-105.2.23.fc11.x86 on AMD Phenom II Quad Core, x86_64-softmmu. Any comments? Tested with Checkit 3.0 and QEMM 9.0. - The NPU error is confirmed and it seems to be a floating point rounding error in QEMU. This should be a 0.12 regression as it works in 0.11.1 and 0.10.6. - The RTC Alarm failure is confirmed too. Is it unimplemented in QEMU? - The Base Memory 640K error seems to be SeaBIOS related. QEMU Bochs BIOS(tested with both -old-bios hack in 0.12 series and old 0.11.1) will just freeze after QEMU counted RAM.(Tested with ScriptPC and Bochs). Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ Details: 1.) NPU Trigonometric Functions.FAILED *** Step 1, Expected0.42926, received0.42926 Double 'Npu_oldans1' = 0.429259 (3FDB78F91894EFA5h). Double 'Npu_oldans2' = 0.628319 (3FE41B2F769CF0E0h). Double 'Npu_result ' = 0.429259 (3FDB78F91894EFA6h). 2.) Compare Current TimePassed DOS: 16:24:39.89Real-Time Clock: 16:24:39.00 (.89 apart) Compare Current DatePassed DOS: 04/11/2010 Real-Time Clock: 04/11/2010. Real-Time Clock Alarm...FAILED *** Compare Elapsed TimePassed DOS: 11.97 Seconds Real-Time Clock: 12.00 Seconds (.03 apart) 3.) Known Memory: Base646K From 0K to646K (000h to 00A17FFh) Base Memory.FAILED *** ERROR at Address 0Ah, Bits FEDCBA9876543210 ERROR at Address 0A0004h, Bits FEDCBA9876543210 ERROR at Address 0A0006h, Bits FEDCBA9876543210 ERROR at Address 0A0008h, Bits FEDCBA9876543210 ERROR at Address 0A000Ah, Bits FEDCBA9876543210 ERROR at Address 0A000Ch, Bits FEDCBA9876543210 ERROR at Address 0A000Eh, Bits FEDCBA9876543210 ERROR at Address 0A0010h, Bits FEDCBA9876543210 ERROR at Address 0A0012h, Bits FEDCBA9876543210 ERROR at Address 0A0014h, Bits FEDCBA9876543210 ERROR at Address 0A0016h, Bits FEDCBA9876543210 ERROR at Address 0A0018h, Bits FEDCBA9876543210 ERROR at Address 0A001Ah, Bits FEDCBA9876543210 ERROR at Address 0A001Ch, Bits FEDCBA9876543210 ERROR at Address 0A001Eh, Bits FEDCBA9876543210 ERROR at Address 0A0020h, Bits FEDCBA9876543210 ADDITIONAL MEMORY ERRORS WERE NOT LISTED DUE TO LACK OF SPACE.
[Qemu-devel] KVM call agenda for Apr 13
Please send in any agenda items you are interested in covering. thanks, -chris
[Qemu-devel] Re: Missing singlestep for already-translated code?
Hi, So for the already-translated code, we will miss singlestep? At least SH4(and mips?) shows such behaviour. I think a patch below enables single stepping in such case, too. But, I'm not sure if this behaviour is on purpose, nor this patch is correct. /yoshii diff --git a/target-sh4/translate.c b/target-sh4/translate.c index 3537f8c..dfa724a 100644 --- a/target-sh4/translate.c +++ b/target-sh4/translate.c @@ -300,7 +300,7 @@ static void gen_goto_tb(DisasContext * ctx, int n, target_ulong dest) tb = ctx-tb; if ((tb-pc TARGET_PAGE_MASK) == (dest TARGET_PAGE_MASK) - !ctx-singlestep_enabled) { + !ctx-singlestep_enabled !singlestep) { /* Use a direct jump if in same page and singlestep not enabled */ tcg_gen_goto_tb(n); tcg_gen_movi_i32(cpu_pc, dest);
[Qemu-devel] [PATCH] fix migration with large mem
From f881b371e08760a67bf1f5b992a586c3de600f7a Mon Sep 17 00:00:00 2001 From: Izik Eidus iei...@redhat.com Date: Tue, 13 Apr 2010 12:24:57 +0300 Subject: [PATCH] fix migration with large mem In cases of guests with large mem that have pages that all their bytes content are the same, we will spend alot of time reading the memory from the guest (is_dup_page()) It is happening beacuse ram_save_live() function have limit of how much we can send to the dest but not how much we read from it, and in cases we have many is_dup_page() hits, we might read huge amount of data without updating important stuff like the timers... The guest lose all its repsonsibility and have many softlock ups inside itself. this patch add limit on the size we can read from the guest each iteration. Thanks. Signed-off-by: Izik Eidus iei...@redhat.com --- arch_init.c |6 +- 1 files changed, 5 insertions(+), 1 deletions(-) diff --git a/arch_init.c b/arch_init.c index cfc03ea..e27b1a0 100644 --- a/arch_init.c +++ b/arch_init.c @@ -88,6 +88,8 @@ const uint32_t arch_type = QEMU_ARCH; #define RAM_SAVE_FLAG_PAGE 0x08 #define RAM_SAVE_FLAG_EOS 0x10 +#define MAX_SAVE_BLOCK_READ 10 * 1024 * 1024 + static int is_dup_page(uint8_t *page, uint8_t ch) { uint32_t val = ch 24 | ch 16 | ch 8 | ch; @@ -175,6 +177,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque) uint64_t bytes_transferred_last; double bwidth = 0; uint64_t expected_time = 0; +int data_read = 0; if (stage 0) { cpu_physical_memory_set_dirty_tracking(0); @@ -205,10 +208,11 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque) bytes_transferred_last = bytes_transferred; bwidth = qemu_get_clock_ns(rt_clock); -while (!qemu_file_rate_limit(f)) { +while (!qemu_file_rate_limit(f) data_read MAX_SAVE_BLOCK_READ) { int ret; ret = ram_save_block(f); +data_read += ret * TARGET_PAGE_SIZE; bytes_transferred += ret * TARGET_PAGE_SIZE; if (ret == 0) { /* no more blocks */ break; -- 1.6.6.1
[Qemu-devel] [PATCH 1/3] block: Convert first_drv to QLIST
Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- block.c | 22 -- block_int.h |2 +- 2 files changed, 13 insertions(+), 11 deletions(-) diff --git a/block.c b/block.c index 5d087de..12cf434 100644 --- a/block.c +++ b/block.c @@ -58,7 +58,8 @@ static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num, static QTAILQ_HEAD(, BlockDriverState) bdrv_states = QTAILQ_HEAD_INITIALIZER(bdrv_states); -static BlockDriver *first_drv; +static QLIST_HEAD(, BlockDriver) bdrv_drivers = +QLIST_HEAD_INITIALIZER(bdrv_drivers); /* If non-zero, use only whitelisted block drivers */ static int use_bdrv_whitelist; @@ -142,8 +143,7 @@ void bdrv_register(BlockDriver *bdrv) if (!bdrv-bdrv_aio_flush) bdrv-bdrv_aio_flush = bdrv_aio_flush_em; -bdrv-next = first_drv; -first_drv = bdrv; +QLIST_INSERT_HEAD(bdrv_drivers, bdrv, list); } /* create a new block device (by default it is empty) */ @@ -162,9 +162,10 @@ BlockDriverState *bdrv_new(const char *device_name) BlockDriver *bdrv_find_format(const char *format_name) { BlockDriver *drv1; -for(drv1 = first_drv; drv1 != NULL; drv1 = drv1-next) { -if (!strcmp(drv1-format_name, format_name)) +QLIST_FOREACH(drv1, bdrv_drivers, list) { +if (!strcmp(drv1-format_name, format_name)) { return drv1; +} } return NULL; } @@ -265,10 +266,11 @@ static BlockDriver *find_protocol(const char *filename) len = sizeof(protocol) - 1; memcpy(protocol, filename, len); protocol[len] = '\0'; -for(drv1 = first_drv; drv1 != NULL; drv1 = drv1-next) { +QLIST_FOREACH(drv1, bdrv_drivers, list) { if (drv1-protocol_name -!strcmp(drv1-protocol_name, protocol)) +!strcmp(drv1-protocol_name, protocol)) { return drv1; +} } return NULL; } @@ -282,7 +284,7 @@ static BlockDriver *find_hdev_driver(const char *filename) int score_max = 0, score; BlockDriver *drv = NULL, *d; -for (d = first_drv; d; d = d-next) { +QLIST_FOREACH(d, bdrv_drivers, list) { if (d-bdrv_probe_device) { score = d-bdrv_probe_device(filename); if (score score_max) { @@ -317,7 +319,7 @@ static BlockDriver *find_image_format(const char *filename) } score_max = 0; -for(drv1 = first_drv; drv1 != NULL; drv1 = drv1-next) { +QLIST_FOREACH(drv1, bdrv_drivers, list) { if (drv1-bdrv_probe) { score = drv1-bdrv_probe(buf, ret, filename); if (score score_max) { @@ -1157,7 +1159,7 @@ void bdrv_iterate_format(void (*it)(void *opaque, const char *name), { BlockDriver *drv; -for (drv = first_drv; drv != NULL; drv = drv-next) { +QLIST_FOREACH(drv, bdrv_drivers, list) { it(opaque, drv-format_name); } } diff --git a/block_int.h b/block_int.h index 466a38c..d4067ff 100644 --- a/block_int.h +++ b/block_int.h @@ -126,7 +126,7 @@ struct BlockDriver { /* Set if newly created images are not guaranteed to contain only zeros */ int no_zero_init; -struct BlockDriver *next; +QLIST_ENTRY(BlockDriver) list; }; struct BlockDriverState { -- 1.7.0
[Qemu-devel] [PATCH 2/3] qemu-img: Eliminate bdrv_new_open() code duplication
Several commands have code to create a BlockDriverState and open a file. The bdrv_new_open() function can be used to perform these steps. This patch converts the qemu-img commands to actually use bdrv_new_open(). Replaced the bdrv_new_open() 'readonly' argument with bdrv_open()-style flags to support generic flags like BDRV_O_NO_BACKING. Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- qemu-img.c | 83 +++ 1 files changed, 10 insertions(+), 73 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index 18e43a0..b30effa 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -190,12 +190,11 @@ static int read_password(char *buf, int buf_size) static BlockDriverState *bdrv_new_open(const char *filename, const char *fmt, - int readonly) + int flags) { BlockDriverState *bs; BlockDriver *drv; char password[256]; -int flags = BRDV_O_FLAGS; bs = bdrv_new(); if (!bs) @@ -207,9 +206,6 @@ static BlockDriverState *bdrv_new_open(const char *filename, } else { drv = NULL; } -if (!readonly) { -flags |= BDRV_O_RDWR; -} if (bdrv_open(bs, filename, flags, drv) 0) { error(Could not open '%s', filename); } @@ -349,7 +345,7 @@ static int img_create(int argc, char **argv) } } -bs = bdrv_new_open(backing_file-value.s, fmt, 1); +bs = bdrv_new_open(backing_file-value.s, fmt, BRDV_O_FLAGS); bdrv_get_geometry(bs, size); size *= 512; bdrv_delete(bs); @@ -384,7 +380,6 @@ static int img_check(int argc, char **argv) { int c, ret; const char *filename, *fmt; -BlockDriver *drv; BlockDriverState *bs; fmt = NULL; @@ -405,19 +400,7 @@ static int img_check(int argc, char **argv) help(); filename = argv[optind++]; -bs = bdrv_new(); -if (!bs) -error(Not enough memory); -if (fmt) { -drv = bdrv_find_format(fmt); -if (!drv) -error(Unknown file format '%s', fmt); -} else { -drv = NULL; -} -if (bdrv_open(bs, filename, BRDV_O_FLAGS, drv) 0) { -error(Could not open '%s', filename); -} +bs = bdrv_new_open(filename, fmt, BRDV_O_FLAGS); ret = bdrv_check(bs); switch(ret) { case 0: @@ -443,7 +426,6 @@ static int img_commit(int argc, char **argv) { int c, ret; const char *filename, *fmt; -BlockDriver *drv; BlockDriverState *bs; fmt = NULL; @@ -464,19 +446,7 @@ static int img_commit(int argc, char **argv) help(); filename = argv[optind++]; -bs = bdrv_new(); -if (!bs) -error(Not enough memory); -if (fmt) { -drv = bdrv_find_format(fmt); -if (!drv) -error(Unknown file format '%s', fmt); -} else { -drv = NULL; -} -if (bdrv_open(bs, filename, BRDV_O_FLAGS | BDRV_O_RDWR, drv) 0) { -error(Could not open '%s', filename); -} +bs = bdrv_new_open(filename, fmt, BRDV_O_FLAGS | BDRV_O_RDWR); ret = bdrv_commit(bs); switch(ret) { case 0: @@ -633,7 +603,7 @@ static int img_convert(int argc, char **argv) total_sectors = 0; for (bs_i = 0; bs_i bs_n; bs_i++) { -bs[bs_i] = bdrv_new_open(argv[optind + bs_i], fmt, 1); +bs[bs_i] = bdrv_new_open(argv[optind + bs_i], fmt, BRDV_O_FLAGS); if (!bs[bs_i]) error(Could not open '%s', argv[optind + bs_i]); bdrv_get_geometry(bs[bs_i], bs_sectors); @@ -691,7 +661,7 @@ static int img_convert(int argc, char **argv) } } -out_bs = bdrv_new_open(out_filename, out_fmt, 0); +out_bs = bdrv_new_open(out_filename, out_fmt, BRDV_O_FLAGS | BDRV_O_RDWR); bs_i = 0; bs_offset = 0; @@ -889,7 +859,6 @@ static int img_info(int argc, char **argv) { int c; const char *filename, *fmt; -BlockDriver *drv; BlockDriverState *bs; char fmt_name[128], size_buf[128], dsize_buf[128]; uint64_t total_sectors; @@ -916,19 +885,7 @@ static int img_info(int argc, char **argv) help(); filename = argv[optind++]; -bs = bdrv_new(); -if (!bs) -error(Not enough memory); -if (fmt) { -drv = bdrv_find_format(fmt); -if (!drv) -error(Unknown file format '%s', fmt); -} else { -drv = NULL; -} -if (bdrv_open(bs, filename, BRDV_O_FLAGS | BDRV_O_NO_BACKING, drv) 0) { -error(Could not open '%s', filename); -} +bs = bdrv_new_open(filename, fmt, BRDV_O_FLAGS | BDRV_O_NO_BACKING); bdrv_get_format(bs, fmt_name, sizeof(fmt_name)); bdrv_get_geometry(bs, total_sectors); get_human_readable_size(size_buf, sizeof(size_buf), total_sectors * 512); @@ -1028,13 +985,7 @@ static int img_snapshot(int argc, char **argv) filename =
[Qemu-devel] [PATCH 3/3] qemu-img: Fix BRDV_O_FLAGS typo
It should be BDRV_O_FLAGS instead of BRDV_O_FLAGS. Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- qemu-img.c | 20 ++-- 1 files changed, 10 insertions(+), 10 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index b30effa..7203b8b 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -37,7 +37,7 @@ typedef struct img_cmd_t { } img_cmd_t; /* Default to cache=writeback as data integrity is not important for qemu-tcg. */ -#define BRDV_O_FLAGS BDRV_O_CACHE_WB +#define BDRV_O_FLAGS BDRV_O_CACHE_WB static void QEMU_NORETURN error(const char *fmt, ...) { @@ -345,7 +345,7 @@ static int img_create(int argc, char **argv) } } -bs = bdrv_new_open(backing_file-value.s, fmt, BRDV_O_FLAGS); +bs = bdrv_new_open(backing_file-value.s, fmt, BDRV_O_FLAGS); bdrv_get_geometry(bs, size); size *= 512; bdrv_delete(bs); @@ -400,7 +400,7 @@ static int img_check(int argc, char **argv) help(); filename = argv[optind++]; -bs = bdrv_new_open(filename, fmt, BRDV_O_FLAGS); +bs = bdrv_new_open(filename, fmt, BDRV_O_FLAGS); ret = bdrv_check(bs); switch(ret) { case 0: @@ -446,7 +446,7 @@ static int img_commit(int argc, char **argv) help(); filename = argv[optind++]; -bs = bdrv_new_open(filename, fmt, BRDV_O_FLAGS | BDRV_O_RDWR); +bs = bdrv_new_open(filename, fmt, BDRV_O_FLAGS | BDRV_O_RDWR); ret = bdrv_commit(bs); switch(ret) { case 0: @@ -603,7 +603,7 @@ static int img_convert(int argc, char **argv) total_sectors = 0; for (bs_i = 0; bs_i bs_n; bs_i++) { -bs[bs_i] = bdrv_new_open(argv[optind + bs_i], fmt, BRDV_O_FLAGS); +bs[bs_i] = bdrv_new_open(argv[optind + bs_i], fmt, BDRV_O_FLAGS); if (!bs[bs_i]) error(Could not open '%s', argv[optind + bs_i]); bdrv_get_geometry(bs[bs_i], bs_sectors); @@ -661,7 +661,7 @@ static int img_convert(int argc, char **argv) } } -out_bs = bdrv_new_open(out_filename, out_fmt, BRDV_O_FLAGS | BDRV_O_RDWR); +out_bs = bdrv_new_open(out_filename, out_fmt, BDRV_O_FLAGS | BDRV_O_RDWR); bs_i = 0; bs_offset = 0; @@ -885,7 +885,7 @@ static int img_info(int argc, char **argv) help(); filename = argv[optind++]; -bs = bdrv_new_open(filename, fmt, BRDV_O_FLAGS | BDRV_O_NO_BACKING); +bs = bdrv_new_open(filename, fmt, BDRV_O_FLAGS | BDRV_O_NO_BACKING); bdrv_get_format(bs, fmt_name, sizeof(fmt_name)); bdrv_get_geometry(bs, total_sectors); get_human_readable_size(size_buf, sizeof(size_buf), total_sectors * 512); @@ -1075,7 +1075,7 @@ static int img_rebase(int argc, char **argv) * Ignore the old backing file for unsafe rebase in case we want to correct * the reference to a renamed or moved backing file. */ -flags = BRDV_O_FLAGS | BDRV_O_RDWR | (unsafe ? BDRV_O_NO_BACKING : 0); +flags = BDRV_O_FLAGS | BDRV_O_RDWR | (unsafe ? BDRV_O_NO_BACKING : 0); bs = bdrv_new_open(filename, fmt, flags); /* Find the right drivers for the backing files */ @@ -1106,7 +1106,7 @@ static int img_rebase(int argc, char **argv) bs_old_backing = bdrv_new(old_backing); bdrv_get_backing_filename(bs, backing_name, sizeof(backing_name)); -if (bdrv_open(bs_old_backing, backing_name, BRDV_O_FLAGS, +if (bdrv_open(bs_old_backing, backing_name, BDRV_O_FLAGS, old_backing_drv)) { error(Could not open old backing file '%s', backing_name); @@ -1114,7 +1114,7 @@ static int img_rebase(int argc, char **argv) } bs_new_backing = bdrv_new(new_backing); -if (bdrv_open(bs_new_backing, out_baseimg, BRDV_O_FLAGS | BDRV_O_RDWR, +if (bdrv_open(bs_new_backing, out_baseimg, BDRV_O_FLAGS | BDRV_O_RDWR, new_backing_drv)) { error(Could not open new backing file '%s', out_baseimg); -- 1.7.0
[Qemu-devel] [PATCH 0/3] block and qemu-img: Further cleanups
These patches apply to Kevin's block branch.
[Qemu-devel] Re: Missing singlestep for already-translated code?
On Tue, Apr 13, 2010 at 6:21 PM, takas...@ops.dti.ne.jp wrote: Hi, So for the already-translated code, we will miss singlestep? At least SH4(and mips?) shows such behaviour. I think a patch below enables single stepping in such case, too. But, I'm not sure if this behaviour is on purpose, nor this patch is correct. /yoshii diff --git a/target-sh4/translate.c b/target-sh4/translate.c index 3537f8c..dfa724a 100644 --- a/target-sh4/translate.c +++ b/target-sh4/translate.c @@ -300,7 +300,7 @@ static void gen_goto_tb(DisasContext * ctx, int n, target_ulong dest) tb = ctx-tb; if ((tb-pc TARGET_PAGE_MASK) == (dest TARGET_PAGE_MASK) - !ctx-singlestep_enabled) { + !ctx-singlestep_enabled !singlestep) { /* Use a direct jump if in same page and singlestep not enabled */ tcg_gen_goto_tb(n); tcg_gen_movi_i32(cpu_pc, dest); The first glance: because you are patching translate.c, I dont think you fixed the problem. Thanks, J
[Qemu-devel] Re: [PATCH v2 3/6] Modifies wrapper functions for byte-based phys_ram_dirty bitmap to bit-based phys_ram_dirty bitmap.
On 04/13/2010 11:01 AM, Yoshiaki Tamura wrote: Avi Kivity wrote: On 04/12/2010 01:58 PM, Yoshiaki Tamura wrote: Is it necessary to update migration and vga bitmaps? We can simply update the master bitmap, and update the migration and vga bitmaps only when they need it. That can be done in a different patch. Let me explain the role of the master bitmap here. In the previous discussion, the master bitmap represented at least one dirty type is actually dirty. While implementing this approach, I though there is one downside that upon resetting the bitmap, it needs to check all dirty types whether they are already cleared to unset the master bitmap, and this might hurt the performance in general. The way I see it, the master bitmap is a buffer between the guest and all the other client bitmaps (migration and vga). So, a bit can be set in vga and cleared in master. Let's look at the following scenario: 1. guest touches address 0xa (page 0xa0) 2. qemu updates master bitmap bit 0xa0 3. vga timer fires 4. vga syncs the dirty bitmap for addresses 0xa-0xc 4.1 vga_bitmap[0xa0-0xc0] |= master_bitmap[0xa0-0xc0] 4.2 migration_bitmap[0xa0-0xc0] |= master_bitmap[0xa0-0xc0] 4.3 master_bitmap[0xa0-0xc0] = 0 5. vga draws based on vga_bitmap the nice thing is that step 4.2 can be omitted if migration is not active. so, client bitmaps are always updated when the client requests them, master bitmap is only a buffer. Thanks. I think I'm finally getting your point. At 4.2 and 4.3, is syncing to client necessary? Yes. We want to clear the master bitmap, otherwise on the next iteration we will get dirty bits that may be spurious. But if we clear the dirty bitmap, we must copy it to all client bitmaps, not just vga, otherwise we lose data. Simply doing OR at get_dirty like in this patch not enough? If you don't clear the master bitmap, then you will get the same bits at the next iteration. In this patch, master bitmap represents all types are dirty, similar to existing 0xff. With this approach, resetting the master bitmap can be done without checking the other types. set_dirty_flags is actually taking the burden in this case though. Anyway, IIUC somebody would be unhappy depending on the role of the master bitmap. Yes, the problem is that set_dirty_flags is the common case and also uses random access, we'd want to make it touch only a single bit. client access is rare, and also sequential, and therefore efficient. I see. So set_dirty_flags would be like, 1. set master first regard less of dirty type. 2. if dirty type was CODE, set the client. There really should be two APIs, one for general dirtyness and one for code dirty. As Anthony said, the code dirty bitmap is completely separate from the other uses. Consider starting by separating the two uses, it may clear up the code and make things easier later on. In summary, I'll prepare two patch sets. 1. v3 of this patch set. - Change FLAGS value to (1,2,4,8), and add IDX (0,1,2,3) - Use ffsl to convert FLAGS to IDX. - Add help function which takes IDX. - Change the behavior of set_dirty_flags as above. - Change dirty bitmap access to a loop. - Add brace after if () - Move some macros to qemu-common.h. 2. Allocate vga and migration dirty bitmap dynamically. Please modify or add items if I were missing. I would add separation of CODE and the other dirty bitmaps. -- error compiling committee.c: too many arguments to function
[Qemu-devel] [PATCH] block.h: bdrv_create2 doesn't exist any more
The bdrv_create2 implementation has been disappeared long ago. Remove its prototype from the header file, too. Signed-off-by: Kevin Wolf kw...@redhat.com --- block.h |4 1 files changed, 0 insertions(+), 4 deletions(-) diff --git a/block.h b/block.h index 4a57dd5..05ad572 100644 --- a/block.h +++ b/block.h @@ -57,10 +57,6 @@ BlockDriver *bdrv_find_format(const char *format_name); BlockDriver *bdrv_find_whitelisted_format(const char *format_name); int bdrv_create(BlockDriver *drv, const char* filename, QEMUOptionParameter *options); -int bdrv_create2(BlockDriver *drv, - const char *filename, int64_t size_in_sectors, - const char *backing_file, const char *backing_format, - int flags); BlockDriverState *bdrv_new(const char *device_name); void bdrv_delete(BlockDriverState *bs); int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags); -- 1.6.6.1
[Qemu-devel] Re: [PATCH v2 3/6] Modifies wrapper functions for byte-based phys_ram_dirty bitmap to bit-based phys_ram_dirty bitmap.
Avi Kivity wrote: On 04/13/2010 11:01 AM, Yoshiaki Tamura wrote: Avi Kivity wrote: On 04/12/2010 01:58 PM, Yoshiaki Tamura wrote: Is it necessary to update migration and vga bitmaps? We can simply update the master bitmap, and update the migration and vga bitmaps only when they need it. That can be done in a different patch. Let me explain the role of the master bitmap here. In the previous discussion, the master bitmap represented at least one dirty type is actually dirty. While implementing this approach, I though there is one downside that upon resetting the bitmap, it needs to check all dirty types whether they are already cleared to unset the master bitmap, and this might hurt the performance in general. The way I see it, the master bitmap is a buffer between the guest and all the other client bitmaps (migration and vga). So, a bit can be set in vga and cleared in master. Let's look at the following scenario: 1. guest touches address 0xa (page 0xa0) 2. qemu updates master bitmap bit 0xa0 3. vga timer fires 4. vga syncs the dirty bitmap for addresses 0xa-0xc 4.1 vga_bitmap[0xa0-0xc0] |= master_bitmap[0xa0-0xc0] 4.2 migration_bitmap[0xa0-0xc0] |= master_bitmap[0xa0-0xc0] 4.3 master_bitmap[0xa0-0xc0] = 0 5. vga draws based on vga_bitmap the nice thing is that step 4.2 can be omitted if migration is not active. so, client bitmaps are always updated when the client requests them, master bitmap is only a buffer. Thanks. I think I'm finally getting your point. At 4.2 and 4.3, is syncing to client necessary? Yes. We want to clear the master bitmap, otherwise on the next iteration we will get dirty bits that may be spurious. But if we clear the dirty bitmap, we must copy it to all client bitmaps, not just vga, otherwise we lose data. Simply doing OR at get_dirty like in this patch not enough? If you don't clear the master bitmap, then you will get the same bits at the next iteration. OK. Master is just a cache/buffer, and upon get_dirty, we copy it to client bitmaps with for loop, clear the master and finally check actual client bitmap. If we're going to allocate client bitmaps dynamically, we need to check whether it isn't null too. In this patch, master bitmap represents all types are dirty, similar to existing 0xff. With this approach, resetting the master bitmap can be done without checking the other types. set_dirty_flags is actually taking the burden in this case though. Anyway, IIUC somebody would be unhappy depending on the role of the master bitmap. Yes, the problem is that set_dirty_flags is the common case and also uses random access, we'd want to make it touch only a single bit. client access is rare, and also sequential, and therefore efficient. I see. So set_dirty_flags would be like, 1. set master first regard less of dirty type. 2. if dirty type was CODE, set the client. There really should be two APIs, one for general dirtyness and one for code dirty. As Anthony said, the code dirty bitmap is completely separate from the other uses. Consider starting by separating the two uses, it may clear up the code and make things easier later on. That really make sense. A little bit hesitant because the code might look a bit duplicated. In summary, I'll prepare two patch sets. 1. v3 of this patch set. - Change FLAGS value to (1,2,4,8), and add IDX (0,1,2,3) - Use ffsl to convert FLAGS to IDX. - Add help function which takes IDX. - Change the behavior of set_dirty_flags as above. - Change dirty bitmap access to a loop. - Add brace after if () - Move some macros to qemu-common.h. 2. Allocate vga and migration dirty bitmap dynamically. Please modify or add items if I were missing. I would add separation of CODE and the other dirty bitmaps. I'm not sure I should starting separating first or later at this moment, but will consider it.
Re: [Qemu-devel] How to lock-up your tap-based VM network
Paul Brook wrote: A major reason for this deadlock could likely be removed by shutting down the tap (if peered) or dropping packets in user space (in case of vlan) when a NIC is stopped or otherwise shut down. Currently most (if not all) NIC models seem to signal both queue full and RX disabled via !can_receive(). No. A disabled device should return true from can_recieve, then discard the packets in its receive callback. Failure to do so is a bug in the device. It looks like the virtio-net device may be buggy. That's not a virtio-only issue. In fact, we ran into this over pcnet, and a quick check of other popular PCI NIC models (except for rtl8139) revealed the same picture: They only report can_receive if their receiver unit is up and ready (some also include the queue state, but that's an add-on). I think it's clear why: can_receive strongly suggests that a suspended receiver should make the model return false. If we want to keep this handler, it should be refactored to something like queue_full. But before starting any refactoring endeavor: Do we have a consensus on the direction? Refactor can_receive to queue_full? Or even drop it? Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] How to lock-up your tap-based VM network
Jamie Lokier wrote: Paul Brook wrote: A major reason for this deadlock could likely be removed by shutting down the tap (if peered) or dropping packets in user space (in case of vlan) when a NIC is stopped or otherwise shut down. Currently most (if not all) NIC models seem to signal both queue full and RX disabled via !can_receive(). No. A disabled device should return true from can_recieve, then discard the packets in its receive callback. Failure to do so is a bug in the device. It looks like the virtio-net device may be buggy. I agree - or alternatively signal that there's no point sending it packets and they should be dropped without bothering to construct them. But anyway, this flow control mechanism is buggy - what if instead of an interface down, you just have a *slow* guest? That should not push back so much that it makes other guests networking with each other slow down. Indeed. So, instead of the current scheme that tries to stop the sender when some receiver overflows, we must turn it up side down so that this stop can _never_ happen. We may keep a sufficiently long queue to reduce the risk of packet drops, but we can't prevent this for all cases anyway. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] How to lock-up your tap-based VM network
Paul Brook wrote: But anyway, this flow control mechanism is buggy - what if instead of an interface down, you just have a *slow* guest? That should not push back so much that it makes other guests networking with each other slow down. The OP described multiple guests connected via a host bridge. In this case it is entirely the host's responsibility to arbitrate between multiple guests. If one interface can block the bridge simply by failing to respond in a timely manner then this is a serious bug or misconfiguration of your host bridge. It's not the bridge, I think it's limited to the tun driver as bridge participant and how it is configured/used by QEMU. The reason tap_send exists is because slirp does not implement TCP flow control properly. Instead it relies on the can_send hook to only avoid dropped packets. Using this in the tap code is debatable. My guess is that this is a hack to workaround guest network stacks that expect throughput to be limited by line speed (which is a virtual environment is effectively infinite), have poor higher level flow control, and react badly to packet loss. [ CC'ing some people who should have been involved in establishing the TX mitigation for tap devices. Full thread under http://thread.gmane.org/gmane.comp.emulators.qemu/67809 ] Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
[Qemu-devel] [PATCH] SCSI: Add disk reset handler
Ensure that pending requests of an SCSI disk are purged on system reset and also restore max_lba. The latter is now only present in the reset handler as that one is already automatically called during init. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/scsi-disk.c | 32 1 files changed, 24 insertions(+), 8 deletions(-) This should be applied to stable as well. I can provide a corresponding patch once this one is acceptable. diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c index 77cb1da..50d5e9c 100644 --- a/hw/scsi-disk.c +++ b/hw/scsi-disk.c @@ -1010,22 +1010,42 @@ static int32_t scsi_send_command(SCSIDevice *d, uint32_t tag, } } -static void scsi_destroy(SCSIDevice *dev) +static void scsi_disk_purge_requests(SCSIDiskState *s) { -SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev); SCSIDiskReq *r; while (!QTAILQ_EMPTY(s-qdev.requests)) { r = DO_UPCAST(SCSIDiskReq, req, QTAILQ_FIRST(s-qdev.requests)); scsi_remove_request(r); } +} + +static void scsi_disk_reset(DeviceState *dev) +{ +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev.qdev, dev); +uint64_t nb_sectors; + +scsi_disk_purge_requests(s); + +bdrv_get_geometry(s-bs, nb_sectors); +nb_sectors /= s-cluster_size; +if (nb_sectors) { +nb_sectors--; +} +s-max_lba = nb_sectors; +} + +static void scsi_destroy(SCSIDevice *dev) +{ +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev); + +scsi_disk_purge_requests(s); drive_uninit(s-qdev.conf.dinfo); } static int scsi_disk_initfn(SCSIDevice *dev) { SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev); -uint64_t nb_sectors; if (!s-qdev.conf.dinfo || !s-qdev.conf.dinfo-bdrv) { error_report(scsi-disk: drive property not set); @@ -1046,11 +1066,6 @@ static int scsi_disk_initfn(SCSIDevice *dev) s-cluster_size = s-qdev.blocksize / 512; s-qdev.type = TYPE_DISK; -bdrv_get_geometry(s-bs, nb_sectors); -nb_sectors /= s-cluster_size; -if (nb_sectors) -nb_sectors--; -s-max_lba = nb_sectors; qemu_add_vm_change_state_handler(scsi_dma_restart_cb, s); return 0; } @@ -1059,6 +1074,7 @@ static SCSIDeviceInfo scsi_disk_info = { .qdev.name= scsi-disk, .qdev.desc= virtual scsi disk or cdrom, .qdev.size= sizeof(SCSIDiskState), +.qdev.reset = scsi_disk_reset, .init = scsi_disk_initfn, .destroy = scsi_destroy, .send_command = scsi_send_command,
Re: [Qemu-devel] How to lock-up your tap-based VM network
Paul Brook wrote: But anyway, this flow control mechanism is buggy - what if instead of an interface down, you just have a *slow* guest? That should not push back so much that it makes other guests networking with each other slow down. The OP described multiple guests connected via a host bridge. In this case it is entirely the host's responsibility to arbitrate between multiple guests. If one interface can block the bridge simply by failing to respond in a timely manner then this is a serious bug or misconfiguration of your host bridge. It's not the bridge, I think it's limited to the tun driver as bridge participant and how it is configured/used by QEMU. If one tap interface can prevent correct operation of another by failing to read data, then this is clearly a host kernel bug. I've no idea whether it's a bug in the TAP implementation (e.g. shared queue between independent devices), a bug in the bridging implementation (drops unrelated packets when one port is slow), or some interaction between the two, but it's definitely not a qemu bug. I'm sure there are plenty of ways to DoS a qemu instance, and prevent it reading data from its tap interface. How/whether this effects other unrelated processes on the host machine is not something qemu can or should know about. Paul
Re: [Qemu-devel] How to lock-up your tap-based VM network
Paul Brook wrote: Paul Brook wrote: A major reason for this deadlock could likely be removed by shutting down the tap (if peered) or dropping packets in user space (in case of vlan) when a NIC is stopped or otherwise shut down. Currently most (if not all) NIC models seem to signal both queue full and RX disabled via !can_receive(). No. A disabled device should return true from can_recieve, then discard the packets in its receive callback. Failure to do so is a bug in the device. It looks like the virtio-net device may be buggy. That's not a virtio-only issue. In fact, we ran into this over pcnet, and a quick check of other popular PCI NIC models (except for rtl8139) revealed the same picture: They only report can_receive if their receiver unit is up and ready (some also include the queue state, but that's an add-on). If so these are also buggy. I think it's clear why: can_receive strongly suggests that a suspended receiver should make the model return false. If we want to keep this handler, it should be refactored to something like queue_full. I don't see a need to refactor anything. You just need to fix the devices that incorrectly return false when their RX engine is disabled. can_receive is a misleading name. NIC model developers will continue to make mistakes when implementing this function. And I don't think we want such implementations all around: static int rtl8139_can_receive(VLANClientState *nc) { RTL8139State *s = DO_UPCAST(NICState, nc, nc)-opaque; int avail; /* Receive (drop) packets if card is disabled. */ if (!s-clock_enabled) return 1; if (!rtl8139_receiver_enabled(s)) return 1; if (rtl8139_cp_receiver_enabled(s)) { /* ??? Flow control not implemented in c+ mode. This is a hack to work around slirp deficiencies anyway. */ return 1; } else { avail = MOD2(s-RxBufferSize + s-RxBufPtr - s-RxBufAddr, s-RxBufferSize); return (avail == 0 || avail = 1514); } } Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
[Qemu-devel] Re: Missing singlestep for already-translated code?
Jun Koi wrote: Hi, I am looking into the singlestep command in monitor interface, and it seems that we only take into account the singlestep flag when we are translating code. So for the already-translated code, we will miss singlestep? This feature is broken. For TCG, it should at least flush the translation buffer, and for KVM it has to enable single-stepping in the kernel. That's what happens automatically when you call cpu_single_step. I guess 'singlestep' wants to be somehow orthogonal to this. But this is the wrong approach. Does anyone actually used this feature or still does so? It looks fairly redundant to me, kind of a poor-man's gdb front-end as part of the monitor console. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] How to lock-up your tap-based VM network
Paul Brook wrote: A major reason for this deadlock could likely be removed by shutting down the tap (if peered) or dropping packets in user space (in case of vlan) when a NIC is stopped or otherwise shut down. Currently most (if not all) NIC models seem to signal both queue full and RX disabled via !can_receive(). No. A disabled device should return true from can_recieve, then discard the packets in its receive callback. Failure to do so is a bug in the device. It looks like the virtio-net device may be buggy. That's not a virtio-only issue. In fact, we ran into this over pcnet, and a quick check of other popular PCI NIC models (except for rtl8139) revealed the same picture: They only report can_receive if their receiver unit is up and ready (some also include the queue state, but that's an add-on). If so these are also buggy. I think it's clear why: can_receive strongly suggests that a suspended receiver should make the model return false. If we want to keep this handler, it should be refactored to something like queue_full. I don't see a need to refactor anything. You just need to fix the devices that incorrectly return false when their RX engine is disabled. Paul
[Qemu-devel] KVM call minutes for Apr 13
No topics, short call
Re: [Qemu-devel] Re: Missing singlestep for already-translated code?
On 13.04.2010, at 15:36, Jan Kiszka wrote: Jun Koi wrote: Hi, I am looking into the singlestep command in monitor interface, and it seems that we only take into account the singlestep flag when we are translating code. So for the already-translated code, we will miss singlestep? This feature is broken. For TCG, it should at least flush the translation buffer, and for KVM it has to enable single-stepping in the kernel. That's what happens automatically when you call cpu_single_step. I guess 'singlestep' wants to be somehow orthogonal to this. But this is the wrong approach. Does anyone actually used this feature or still does so? It looks fairly redundant to me, kind of a poor-man's gdb front-end as part of the monitor console. Not sure what it does, but I use -singlestep quite a lot to get register dumps for instructions when using -d cpu. Alex
Re: [Qemu-devel] [PATCH 2/2] VirtIO RNG
Paul Brook wrote: Paul Brook wrote: So now everything that looks like a stream of bytes has to use the virtio-serial code... IMO, yes. That's the whole point of virtio-serial. You can handle it however you like in your in your favourite guest OS, but I object to qemu having multiple virtio devices that look and smell exactly like a serial port. I dont think it looks and smells exactly like a serial port. The code is simpler to begin with, nor does it support any out-of-band signalling. Why? Its not like it'll make the rng device any simpler, smaller, faster, or reduce its dependencies. Virtio is simple enough to begin with! I disagree. Past experience shows that it's more than possible to completely screw up something as simple as a serial port. Straw man. Serial ports might be fairly simple things, but my device is simpler still than most of them, and you're talking about emulating a serial port, where you have control lines, line disciplines, etc. to emulate, all of which need to be right. None of that is a concern in my driver. If everything that looks like a serial port provides their own implementation then the chances are they'll all be missing something, and broken in different ways. Better never write any code then, its garunteesd to have a bug in it. I *know* what you're getting at - I removed hundreds of lines of common code copied and randomly, brokenly, modified from the various ASoC kernel drivers a few months back. But this bit of code doesnt really share enough in common with anything else to make it worth handling your way. Alternatively, prove me wrong! Take rate limiting as an example: Your implementation has at least one outright bug (using gettimeofday), Arguably a bug. I'd disagree that it causes any problems in practice (qemu_gettimeofday, btw - which I was specifically requested to use during an earlier review of this patch...) and various questionable characteristics (is a 1-second token bucket appropriate). So what? if people dont like the default policy, or its too limited, they can feel free to contribute patches. Its hardly like this is core code that minute changes will require months of review to ratify! At least I'm doing something about the total-lack-of-any-support-whatsoever situation. The EGD protocol bits are another example. You aren't exposing any of the interesting bits of this protocol to the guest, so it seems entirely reasonable to connect this to a serial port. Except that then we need to have a little daemon to feed rate limited EGD protocol data to said serial port. So now you've got the situation of the USB-serial entropy key, feeding the (potentially) proprietary egd-compatible-but-not-rate-limited daemon that feeds a rate limiting daemon into a serial port on qemu that feeds the guest OS. So this is running in a server farm with 64 or 128 guests sat on it. each one with its own silly rate limiting daemon / serial port mux from the egd daemon the USB stick feeds. What happens when you apt-get upgrade and the egd daemon gets replaced? you really want to notify and restart all 128 noddy mux daemons and reconnect qemu to them? Why dont we just kill the guests and reboot them all too? My patchset handled this case without any problems at all (the reader would simply (and without blocking) reconect to the server). The point is that you can take 'generic support' too far. Sure you CAN model this as a bunch of serial ports, and feed them from rate limiting daemons, which in turn read from some egd-speaking daemon or hardware, but EGD protocol is *SO* simple. Why on earth go to these lengths when it can be implemented and embedded in a handful of lines of code? The chardev reconnect stuff was in the generic qemu chardev layer and thus reuseable by other parts of qemu that need to maintain a persistent state even when connected to other processes via sockets, pipes, fifos etc. That allows it to be used by guests that don't have a virtio-rng driver. Thats already supported by use of TCP tunnels. but thats 1) not efficient and 2) not very secure - few sources of entropy provide crypto to garuntee that the data source hasnt been frobbed. 3) much more vulnerable to attack from processes running on the guest (TCP is much more exposde than the rng-core internals) So, rather than bike-shedding, how about making some kind of decision? Either: 1) Dictate how the code is going to be written (in which case, since you know it so well, write it yourself) or: 2) Accept that not everyone likes your (idea for a) implementation, that prior interfaces exist on the guest side, and that a perfectly good and non-intrusive way of doing it on qemu has beeen written for you. (including a couple of other nice features / fixes along the way, like adding the missing SIZE_ property, socket reconnect support, etc.) Bear in mind that its rather unfair to (as has been in this case) have a patch reviewed, modified based on
Re: [Qemu-devel] [PATCH 2/2] VirtIO RNG
Bear in mind that its rather unfair to (as has been in this case) have a patch reviewed, modified based on criticism, and then rejected after effort has been wasted working on it. I'm pretty sure I raised all these issues the first time this code was submitted. Paul
[Qemu-devel] [PATCH] tun: orphan an skb on tx
The following situation was observed in the field: tap1 sends packets, tap2 does not consume them, as a result tap1 can not be closed. This happens because tun/tap devices can hang on to skbs undefinitely. As noted by Herbert, possible solutions include a timeout followed by a copy/change of ownership of the skb, or always copying/changing ownership if we're going into a hostile device. This patch implements the second approach. Note: one issue still remaining is that since skbs keep reference to tun socket and tun socket has a reference to tun device, we won't flush backlog, instead simply waiting for all skbs to get transmitted. At least this is not user-triggerable, and this was not reported in practice, my assumption is other devices besides tap complete an skb within finite time after it has been queued. A possible solution for the second issue would not to have socket reference the device, instead, implement dev-destructor for tun, and wait for all skbs to complete there, but this needs some thought, probably too risky for 2.6.34. Signed-off-by: Michael S. Tsirkin m...@redhat.com Tested-by: Yan Vugenfirer yvuge...@redhat.com --- Please review the below, and consider for 2.6.34, and stable trees. drivers/net/tun.c |4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 96c39bd..4326520 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -387,6 +387,10 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev) } } + /* Orphan the skb - required as we might hang on to it +* for indefinite time. */ + skb_orphan(skb); + /* Enqueue packet */ skb_queue_tail(tun-socket.sk-sk_receive_queue, skb); dev-trans_start = jiffies; -- 1.7.0.2.280.gc6f05
[Qemu-devel] Re: [PATCH] tun: orphan an skb on tx
On Tue, Apr 13, 2010 at 05:59:44PM +0300, Michael S. Tsirkin wrote: The following situation was observed in the field: tap1 sends packets, tap2 does not consume them, as a result tap1 can not be closed. This happens because tun/tap devices can hang on to skbs undefinitely. As noted by Herbert, possible solutions include a timeout followed by a copy/change of ownership of the skb, or always copying/changing ownership if we're going into a hostile device. This patch implements the second approach. Note: one issue still remaining is that since skbs keep reference to tun socket and tun socket has a reference to tun device, we won't flush backlog, instead simply waiting for all skbs to get transmitted. At least this is not user-triggerable, and this was not reported in practice, my assumption is other devices besides tap complete an skb within finite time after it has been queued. A possible solution for the second issue would not to have socket reference the device, instead, implement dev-destructor for tun, and wait for all skbs to complete there, but this needs some thought, probably too risky for 2.6.34. Signed-off-by: Michael S. Tsirkin m...@redhat.com Tested-by: Yan Vugenfirer yvuge...@redhat.com Acked-by: Herbert Xu herb...@gondor.apana.org.au Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [Qemu-devel] Re: Missing singlestep for already-translated code?
Alexander Graf wrote: On 13.04.2010, at 15:36, Jan Kiszka wrote: Jun Koi wrote: Hi, I am looking into the singlestep command in monitor interface, and it seems that we only take into account the singlestep flag when we are translating code. So for the already-translated code, we will miss singlestep? This feature is broken. For TCG, it should at least flush the translation buffer, and for KVM it has to enable single-stepping in the kernel. That's what happens automatically when you call cpu_single_step. I guess 'singlestep' wants to be somehow orthogonal to this. But this is the wrong approach. Does anyone actually used this feature or still does so? It looks fairly redundant to me, kind of a poor-man's gdb front-end as part of the monitor console. Not sure what it does, but I use -singlestep quite a lot to get register dumps for instructions when using -d cpu. Ah, singlestep is not about stopping the VM after each instruction but about limiting the TB length to a single instruction. Badly named and poorly documented. In that case, the dynamic switch should already be fine by adding a tb_flush() on enable. Still, someone should also patch at least the docs. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH 2/2] VirtIO RNG
So, rather than bike-shedding, how about making some kind of decision? Ok, let me make this simple. Features such as rate limiting and EGD protocol translation should not be part of individual device emulation. They are part of the host interface, not the guest machine. If you really want these in qemu then they should be implemented as part of the chardev layer. It may be a bit more work to implement these properly. However I'm not going to accept a half-assed implementation just because someone wrote it. A direct result of this is that virtio-rng degenerates to a crippled virtual serial port. We already have a virtual serial port implementation designed for exactly this kind of application. IMHO virtio-rng is a complete waste of time and space. Paul
[Qemu-devel] Re: [PATCH] tun: orphan an skb on tx
Michael S. Tsirkin wrote: The following situation was observed in the field: tap1 sends packets, tap2 does not consume them, as a result tap1 can not be closed. And before that, tap1 may not be able to send further packets to anyone else on the bridge as its TX resources were blocked by tap2 - that's what we saw in the field. This happens because tun/tap devices can hang on to skbs undefinitely. As noted by Herbert, possible solutions include a timeout followed by a copy/change of ownership of the skb, or always copying/changing ownership if we're going into a hostile device. This patch implements the second approach. Note: one issue still remaining is that since skbs keep reference to tun socket and tun socket has a reference to tun device, we won't flush backlog, instead simply waiting for all skbs to get transmitted. At least this is not user-triggerable, and this was not reported in practice, my assumption is other devices besides tap complete an skb within finite time after it has been queued. A possible solution for the second issue would not to have socket reference the device, instead, implement dev-destructor for tun, and wait for all skbs to complete there, but this needs some thought, probably too risky for 2.6.34. Signed-off-by: Michael S. Tsirkin m...@redhat.com Tested-by: Yan Vugenfirer yvuge...@redhat.com --- Please review the below, and consider for 2.6.34, and stable trees. drivers/net/tun.c |4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 96c39bd..4326520 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -387,6 +387,10 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev) } } + /* Orphan the skb - required as we might hang on to it + * for indefinite time. */ + skb_orphan(skb); + /* Enqueue packet */ skb_queue_tail(tun-socket.sk-sk_receive_queue, skb); dev-trans_start = jiffies; Nice solution, thanks! Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
[Qemu-devel] Re: Problem with QEMU on KVM
Mulyadi Santosa wrote: Hi Jamie... On Mon, Apr 12, 2010 at 19:07, Jamie Lokier ja...@shareable.org wrote: There are various -no-kvm-XXX options to try: -no-kvm-irqchip disable KVM kernel mode PIC/IOAPIC/LAPIC -no-kvm-pit disable KVM kernel mode PIT -no-kvm-pit-reinjection disable KVM kernel mode PIT interrupt reinjection I try to read the source code without too much luck understanding the meaning of the above parameters, especially the reinjection. Please CMIIW, interrupt reinjection is a way to handle lost ticks, right? Those switches only exist in the qemu-kvm branch. The fact that -enable-kvm has some effect for you makes me think that you tried upstream KVM support (ie. the one that comes with vanilla QEMU), right? It would be interesting to check if you can reproduce the problem also with qemu-kvm(-0.12.3 or git head). As you may also face an issue of the kvm kernel bits, it would be furthermore helpful to check with the latest kernel from kvm.git or the modules build via kvm-kmod as well. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
[Qemu-devel] Re: [PATCH] tun: orphan an skb on tx
Eric Dumazet wrote: Le mardi 13 avril 2010 à 17:36 +0200, Jan Kiszka a écrit : Michael S. Tsirkin wrote: The following situation was observed in the field: tap1 sends packets, tap2 does not consume them, as a result tap1 can not be closed. And before that, tap1 may not be able to send further packets to anyone else on the bridge as its TX resources were blocked by tap2 - that's what we saw in the field. After the patch, tap1 is able to flood tap2, and tap3/tap4 not able to send one single frame. Is it OK ? I think if that's a real issue, you have to apply traffic shaping to the untrusted nodes. The existing flow-control scheme was fragile anyway as you had to translate packet lengths on TX side to packet counts on RX. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
[Qemu-devel] Re: [PATCH] tun: orphan an skb on tx
On Tue, Apr 13, 2010 at 06:40:38PM +0200, Eric Dumazet wrote: Le mardi 13 avril 2010 à 17:36 +0200, Jan Kiszka a écrit : Michael S. Tsirkin wrote: The following situation was observed in the field: tap1 sends packets, tap2 does not consume them, as a result tap1 can not be closed. And before that, tap1 may not be able to send further packets to anyone else on the bridge as its TX resources were blocked by tap2 - that's what we saw in the field. After the patch, tap1 is able to flood tap2, and tap3/tap4 not able to send one single frame. Is it OK ? Yes :) This was always possible. Number of senders needed to flood a receiver might vary depending on send/recv queue size that you set. External sources can also fill your RX queue if you let them. In the end, we need to rely on the scheduler for fairness, or apply packet shaping. Back to the problem : tap1 cannot be closed. Why ? because of refcounts ? Yes. When a socket with inflight tx packets is closed, we dont block the close, we only delay the socket freeing once all packets were delivered and freed. Which is wrong, since this is under userspace control, so you get unkillable processes. -- MST
[Qemu-devel] Re: [PATCH] block: Split bdrv_open
On Mon, Apr 12, 2010 at 04:49:16PM +0200, Kevin Wolf wrote: bdrv_open contains quite some code that is only useful for opening images (as opposed to opening files by a protocol), for example snapshots. This patch splits the code so that we have bdrv_open_file() for files (uses protocols), bdrv_open() for images (uses format drivers) and bdrv_do_open() for the code common for opening both images and files. Signed-off-by: Kevin Wolf kw...@redhat.com --- This patch applies on top of Christoph's RFC for the format/protocol split I like this a lot. A few comments: - why is bdrv_do_open below it's users in the code? I really hate forward declarations of functions and they can usually be easily avoided. - a do a function name is not very meaningfull - what about bdrv_open_common instead? - doesn't the backing device handling only apply to image formats, too?
Re: [Qemu-devel] How to lock-up your tap-based VM network
On 4/12/10, Paul Brook p...@codesourcery.com wrote: A major reason for this deadlock could likely be removed by shutting down the tap (if peered) or dropping packets in user space (in case of vlan) when a NIC is stopped or otherwise shut down. Currently most (if not all) NIC models seem to signal both queue full and RX disabled via !can_receive(). No. A disabled device should return true from can_recieve, then discard the packets in its receive callback. Failure to do so is a bug in the device. It looks like the virtio-net device may be buggy. Awesome, it looks like a longstanding bug with pcnet/lance has is fixed by this change! OpenBSD installer would hang when receiving packages, now it works! diff --git a/hw/pcnet.c b/hw/pcnet.c index 5e63eb5..a04ad09 100644 --- a/hw/pcnet.c +++ b/hw/pcnet.c @@ -1031,8 +1031,6 @@ static int pcnet_tdte_poll(PCNetState *s) int pcnet_can_receive(VLANClientState *nc) { PCNetState *s = DO_UPCAST(NICState, nc, nc)-opaque; -if (CSR_STOP(s) || CSR_SPND(s)) -return 0; return sizeof(s-buffer)-16; }
Re: [Qemu-devel] [GSoC 2010] Pass-through filesystem support.
jvrao wrote: Alexander Graf wrote: On 12.04.2010, at 13:58, Jamie Lokier wrote: Mohammed Gamal wrote: On Mon, Apr 12, 2010 at 12:29 AM, Jamie Lokier ja...@shareable.org wrote: Javier Guerra Giraldez wrote: On Sat, Apr 10, 2010 at 7:42 AM, Mohammed Gamal m.gamal...@gmail.com wrote: On Sat, Apr 10, 2010 at 2:12 PM, Jamie Lokier ja...@shareable.org wrote: To throw a spanner in, the most widely supported filesystem across operating systems is probably NFS, version 2 :-) Remember that Windows usage on a VM is not some rare use case, and it'd be a little bit of a pain from a user's perspective to have to install a third party NFS client for every VM they use. Having something supported on the VM out of the box is a better option IMO. i don't think virtio-CIFS has any more support out of the box (on any system) than virtio-9P. It doesn't, but at least network-CIFS tends to work ok and is the method of choice for Windows VMs - when you can setup Samba on the host (which as previously noted you cannot always do non-disruptively with current Sambas). -- Jamie I think having support for both 9p and CIFS would be the best option. In that case the user will have the option to use either one, depending on how their guests support these filesystems. In that case I'd prefer to work on CIFS support while the 9p effort can still go on. I don't think both efforts are mutually exclusive. What do the rest of you guys think? I only noted NFS because most old OSes do not support CIFS or 9P - especially all the old unixes. I don't think old versions of MS-DOS and Windows (95, 98, ME, Nt4?) even support current CIFS. They need extra server settings to work - such as setting passwords on the server to non-encrypted and other quirks. Meanwhile Windows Vista/2008/7 works better with SMB2, not CIFS, to properly see symlinks and hard links. So there is no really nice out of the box file service which works easily with all guest OSes. I'm guessing that out of all the filesystems, CIFS is the most widely supported in recent OSes (released in the last 10 years). But I'm not really sure what the state of CIFS is for non-Windows, non-Linux, non-BSD guests. So what? If you want to have direct host fs access, install guest drivers. If you can't, set up networking and use CIFS or NFS or whatever. I'm not sure why 9P is being pursued. Does anything much support it, or do all OSes except quite recent Linux need a custom driver for 9P? Even Linux only got the first commit in the kernel 5 years ago, so probably it was only about 3 years ago that it will have begun appearing in stable distros, if at all. Filesystem passthrough to Linux guests installed in the last couple of years is a useful feature, and I know that for many people that is their only use of KVM, but compared with CIFS' broad support it seems like quite a narrow goal. The goal is to have something simple and fast. We can fine-tune 9P to align with the Linux VFS structures, making it really little overhead (and little headache too). For Windows guests, nothing prevents us to expose yet another 9P flavor. That again would be aligned well with Windows's VFS and be slim and fast there. The biggest problem I see with CIFS is that it's a huge beast. There are a lot of corner cases where it just doesn't fit in. See my previous mail for more details. As Alex mentioned, 9P is chosen for its mere simplicity and easy adaptability. NFS and CIFS does not give that flexibility. As we mentioned in the patch series, we are already seeing better numbers with 9P. Looking ahead 9P can embed KVM/QEMU knowledge to share physical resources like memory/cache between the host and the guest. I think looking into the windows side of 9P client would be great option too. The current patch on the mailing list supports .U (unix) protocol and will be introducing .L (Linux) variant as we move forward. Hi Mohammed, Please let us know once you decide on where your interest lies. Will be glad to have you on VirtFS (9P) though. :) - JV - JV Alex
Re: [Qemu-devel] How to lock-up your tap-based VM network
On 4/13/10, Blue Swirl blauwir...@gmail.com wrote: On 4/12/10, Paul Brook p...@codesourcery.com wrote: A major reason for this deadlock could likely be removed by shutting down the tap (if peered) or dropping packets in user space (in case of vlan) when a NIC is stopped or otherwise shut down. Currently most (if not all) NIC models seem to signal both queue full and RX disabled via !can_receive(). No. A disabled device should return true from can_recieve, then discard the packets in its receive callback. Failure to do so is a bug in the device. It looks like the virtio-net device may be buggy. Awesome, it looks like a longstanding bug with pcnet/lance has is fixed by this change! OpenBSD installer would hang when receiving packages, now it works! I spoke too soon, networking works also without the patch now.
Re: [Qemu-devel] Problem with DOS application and 286 DOS Extender application
On Tue, 13 Apr 2010, Roy Tam wrote: 2010/4/13 Gerhard Wiesinger li...@wiesinger.com: Hello, I'm having a problem with a DOS application which uses a 286 DOS Extender, error message is as the following: unable to create task for execution Interrupt 10 (Ah) while creating task: Invalid task segment selector. Happens with QEMM386 and HIMEM.SYS/EMM386.EXE. I guess the application does at this point swithing to 286 protected mode and trying to move conventional memory up to EMS memory. Issue is NOT present under VMWare Server 2.0 and with real hardware. DOS; MS-DOS 6.22 QEMU: 0.12.3 under Fedora 11, 2.6.30.10-105.2.23.fc11.x86 on AMD Phenom II Quad Core, x86_64-softmmu. Any comments or ideas (I guess something with protected mode and MMU might be wrong)? You need to mention the program name so that people can try to reproduce the bug. It is a non public, proprietary application which uses the Ergo Computing 286 DOS Extender. I guess some other application which use the same DOS extender have the same problem. So best thing is to find another application which uses the Ergo Computing 286 DOS Extender, too. Ciao, Gerhard -- http://www.wiesinger.com/
Re: [Qemu-devel] Problem with QEMU on KVM
On Mon, 12 Apr 2010, Jamie Lokier wrote: Mulyadi Santosa wrote: Hi Gerhard... On Sun, Apr 11, 2010 at 20:52, Gerhard Wiesinger li...@wiesinger.com wrote: OK, uses the following ports: Port 0x20: 8259 interrupt controller Port 0x40: 8253 timer Interrupt 0x1A: ah=0x00: fetches system timer counters ah=0x02: reads the clock ah=0x04: fetches date So there must be something wrong with KVM with the above functionality (I guess the timers). Hmm, my silly guess is, maybe the timer is seen as decrementedor at least stuck. I have very little knowledge about git, but maybe you can start doing git bisect to narrow which git commit that introduce such behaviour. Meanwhile, let's wait for comments from one of the KVM developers. There are various -no-kvm-XXX options to try: -no-kvm-irqchip disable KVM kernel mode PIC/IOAPIC/LAPIC -no-kvm-pit disable KVM kernel mode PIT -no-kvm-pit-reinjection disable KVM kernel mode PIT interrupt reinjection Any of them might be causing this problem. They disable in-kernel emulation of the interrupt controller and timer chips, and force the qemu version to be used. It's possible the in-kernel emulation is buggier than the qemu version. Hello Jamie, thanx so far. I'm starting in this way: /root/download/qemu/qemu-0.12.3/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -a_lot_of_other_options All options you mentioned don't work: /root/download/qemu/qemu-0.12.3/x86_64-softmmu/qemu-system-x86_64: invalid option -- '-no-kvm-irqchip' /root/download/qemu/qemu-0.12.3/x86_64-softmmu/qemu-system-x86_64: invalid option -- '-no-kvm-pit' /root/download/qemu/qemu-0.12.3/x86_64-softmmu/qemu-system-x86_64: invalid option -- '-no-kvm-no-kvm-irqchip' Is there another way of startup or is it working only on newer versions? BTW: For me it is very unclear what's the difference between the following commands: qemu qemu -enable-kvm kvm How can I build the kvm executable? Different source tree? Newer version from branch? Maybe someone can explain the different versions of qemu/kvm and branches etc. and possible features in detail. Thnx. Ciao, Gerhard -- http://www.wiesinger.com/
Re: [Qemu-devel] [GSoC 2010] Pass-through filesystem support.
On Tue, Apr 13, 2010 at 9:08 PM, jvrao jv...@linux.vnet.ibm.com wrote: jvrao wrote: Alexander Graf wrote: On 12.04.2010, at 13:58, Jamie Lokier wrote: Mohammed Gamal wrote: On Mon, Apr 12, 2010 at 12:29 AM, Jamie Lokier ja...@shareable.org wrote: Javier Guerra Giraldez wrote: On Sat, Apr 10, 2010 at 7:42 AM, Mohammed Gamal m.gamal...@gmail.com wrote: On Sat, Apr 10, 2010 at 2:12 PM, Jamie Lokier ja...@shareable.org wrote: To throw a spanner in, the most widely supported filesystem across operating systems is probably NFS, version 2 :-) Remember that Windows usage on a VM is not some rare use case, and it'd be a little bit of a pain from a user's perspective to have to install a third party NFS client for every VM they use. Having something supported on the VM out of the box is a better option IMO. i don't think virtio-CIFS has any more support out of the box (on any system) than virtio-9P. It doesn't, but at least network-CIFS tends to work ok and is the method of choice for Windows VMs - when you can setup Samba on the host (which as previously noted you cannot always do non-disruptively with current Sambas). -- Jamie I think having support for both 9p and CIFS would be the best option. In that case the user will have the option to use either one, depending on how their guests support these filesystems. In that case I'd prefer to work on CIFS support while the 9p effort can still go on. I don't think both efforts are mutually exclusive. What do the rest of you guys think? I only noted NFS because most old OSes do not support CIFS or 9P - especially all the old unixes. I don't think old versions of MS-DOS and Windows (95, 98, ME, Nt4?) even support current CIFS. They need extra server settings to work - such as setting passwords on the server to non-encrypted and other quirks. Meanwhile Windows Vista/2008/7 works better with SMB2, not CIFS, to properly see symlinks and hard links. So there is no really nice out of the box file service which works easily with all guest OSes. I'm guessing that out of all the filesystems, CIFS is the most widely supported in recent OSes (released in the last 10 years). But I'm not really sure what the state of CIFS is for non-Windows, non-Linux, non-BSD guests. So what? If you want to have direct host fs access, install guest drivers. If you can't, set up networking and use CIFS or NFS or whatever. I'm not sure why 9P is being pursued. Does anything much support it, or do all OSes except quite recent Linux need a custom driver for 9P? Even Linux only got the first commit in the kernel 5 years ago, so probably it was only about 3 years ago that it will have begun appearing in stable distros, if at all. Filesystem passthrough to Linux guests installed in the last couple of years is a useful feature, and I know that for many people that is their only use of KVM, but compared with CIFS' broad support it seems like quite a narrow goal. The goal is to have something simple and fast. We can fine-tune 9P to align with the Linux VFS structures, making it really little overhead (and little headache too). For Windows guests, nothing prevents us to expose yet another 9P flavor. That again would be aligned well with Windows's VFS and be slim and fast there. The biggest problem I see with CIFS is that it's a huge beast. There are a lot of corner cases where it just doesn't fit in. See my previous mail for more details. As Alex mentioned, 9P is chosen for its mere simplicity and easy adaptability. NFS and CIFS does not give that flexibility. As we mentioned in the patch series, we are already seeing better numbers with 9P. Looking ahead 9P can embed KVM/QEMU knowledge to share physical resources like memory/cache between the host and the guest. I think looking into the windows side of 9P client would be great option too. The current patch on the mailing list supports .U (unix) protocol and will be introducing .L (Linux) variant as we move forward. Hi Mohammed, Please let us know once you decide on where your interest lies. Will be glad to have you on VirtFS (9P) though. :) - JV It seems the community is more keen on getting 9P support merged than getting CIFS supported, and they have made good points to support their argument. I'm not sure whether work on this project could fit in as a GSoC project and if there is much remaining work that could make it fit in that direction. But I'd be glad to volunteer anyway :) Regards, Mohammed
[Qemu-devel] Re: [PATCH] tun: orphan an skb on tx
On Tue, Apr 13, 2010 at 08:31:03PM +0200, Eric Dumazet wrote: Le mardi 13 avril 2010 à 20:39 +0300, Michael S. Tsirkin a écrit : When a socket with inflight tx packets is closed, we dont block the close, we only delay the socket freeing once all packets were delivered and freed. Which is wrong, since this is under userspace control, so you get unkillable processes. We do not get unkillable processes, at least with sockets I was thinking about (TCP/UDP ones). Maybe tun sockets can behave the same ? Looks like that's what my patch does: ip_rcv seems to call skb_orphan too. Herbert Acked your patch, so I guess its OK, but I think it can be dangerous. Anyway my feeling is that we try to add various mechanisms to keep a hostile user flooding another one. For example, UDP got memory accounting quite recently, and we added socket backlog limits very recently. It was considered not needed few years ago.
[Qemu-devel] Re: [PATCH] tun: orphan an skb on tx
On Tue, Apr 13, 2010 at 10:38:06PM +0200, Eric Dumazet wrote: Le mardi 13 avril 2010 à 23:25 +0300, Michael S. Tsirkin a écrit : On Tue, Apr 13, 2010 at 08:31:03PM +0200, Eric Dumazet wrote: Le mardi 13 avril 2010 à 20:39 +0300, Michael S. Tsirkin a écrit : When a socket with inflight tx packets is closed, we dont block the close, we only delay the socket freeing once all packets were delivered and freed. Which is wrong, since this is under userspace control, so you get unkillable processes. We do not get unkillable processes, at least with sockets I was thinking about (TCP/UDP ones). Maybe tun sockets can behave the same ? Looks like that's what my patch does: ip_rcv seems to call skb_orphan too. Well, I was speaking of tx side, you speak of receiving side. Point is, both ip_rcv and my patch call skb_orphan. An external flood (coming from another domain) is another problem. A sender might flood the 'network' inside our domain. How can we reasonably limit the sender ? Maybe the answer is 'We can not', but it should be stated somewhere, so that someone can address this point later. And whatever's done should ideally work for tap to IP and IP to IP sockets as well, not just tap to tap. -- MST
[Qemu-devel] [PATCH v2] Write cmos hd data for ide drives using -device parm
When specifying ide devices using -device, the cmos information which the bios depends on is not written. This patch generalizes the cmos hd data setting for the existing code path and adds the ability to call that code on a per machine, per ide drive basis. This is important for older guests, such as Windows XP, and Windows Server 2003. I needed to add an id to the IDEBus structure since there wasn't a way to identify whether an ide bus was primary or secondary in the lower level code. v2:Fixed same issue for isapc machine type (previous patch addressed pc machine types only). Additionally, the bus naming needed to be fixed in the isa case to be able to reference the secondary ide bus in -device nomenclature. Signed-off-by: Bruce Rogers brog...@novell.com Bruce hw/ide/internal.h |3 +- hw/ide/isa.c |8 +- hw/ide/piix.c |6 +++- hw/ide/qdev.c | 13 +++--- hw/pc.c | 68 -- sysemu.h |1 vl.c |6 7 files changed, 71 insertions(+), 34 deletions(-) diff --git a/hw/ide/internal.h b/hw/ide/internal.h index 027029e..4ea62bd 100644 --- a/hw/ide/internal.h +++ b/hw/ide/internal.h @@ -450,6 +450,7 @@ struct IDEBus { IDEState ifs[2]; uint8_t unit; uint8_t cmd; +uint8_t id; qemu_irq irq; }; @@ -565,7 +566,7 @@ void ide_init2(IDEBus *bus, DriveInfo *hd0, DriveInfo *hd1, void ide_init_ioport(IDEBus *bus, int iobase, int iobase2); /* hw/ide/qdev.c */ -void ide_bus_new(IDEBus *idebus, DeviceState *dev); +void ide_bus_new(IDEBus *idebus, DeviceState *dev, const char *name); IDEDevice *ide_create_drive(IDEBus *bus, int unit, DriveInfo *drive); #endif /* HW_IDE_INTERNAL_H */ diff --git a/hw/ide/isa.c b/hw/ide/isa.c index dff7c79..ac3ee3b 100644 --- a/hw/ide/isa.c +++ b/hw/ide/isa.c @@ -67,7 +67,13 @@ static int isa_ide_initfn(ISADevice *dev) { ISAIDEState *s = DO_UPCAST(ISAIDEState, dev, dev); -ide_bus_new(s-bus, s-dev.qdev); +if (s-iobase == 0x1f0) { +ide_bus_new(s-bus, s-dev.qdev, ide.0); +s-bus.id = 0; +} else { +ide_bus_new(s-bus, s-dev.qdev, ide.1); +s-bus.id = 1; +} ide_init_ioport(s-bus, s-iobase, s-iobase2); isa_init_irq(dev, s-irq, s-isairq); ide_init2(s-bus, NULL, NULL, s-irq); diff --git a/hw/ide/piix.c b/hw/ide/piix.c index 295a93d..377fda8 100644 --- a/hw/ide/piix.c +++ b/hw/ide/piix.c @@ -131,11 +131,13 @@ static int pci_piix_ide_initfn(PCIIDEState *d) vmstate_register(0, vmstate_ide_pci, d); -ide_bus_new(d-bus[0], d-dev.qdev); -ide_bus_new(d-bus[1], d-dev.qdev); +ide_bus_new(d-bus[0], d-dev.qdev, NULL); +ide_bus_new(d-bus[1], d-dev.qdev, NULL); ide_init_ioport(d-bus[0], 0x1f0, 0x3f6); ide_init_ioport(d-bus[1], 0x170, 0x376); +d-bus[0].id = 0; +d-bus[1].id = 1; ide_init2(d-bus[0], NULL, NULL, isa_reserve_irq(14)); ide_init2(d-bus[1], NULL, NULL, isa_reserve_irq(15)); return 0; diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c index b18693d..c868b8e 100644 --- a/hw/ide/qdev.c +++ b/hw/ide/qdev.c @@ -29,9 +29,9 @@ static struct BusInfo ide_bus_info = { .size = sizeof(IDEBus), }; -void ide_bus_new(IDEBus *idebus, DeviceState *dev) +void ide_bus_new(IDEBus *idebus, DeviceState *dev, const char *name) { -qbus_create_inplace(idebus-qbus, ide_bus_info, dev, NULL); +qbus_create_inplace(idebus-qbus, ide_bus_info, dev, name); } static int ide_qdev_init(DeviceState *qdev, DeviceInfo *base) @@ -39,6 +39,7 @@ static int ide_qdev_init(DeviceState *qdev, DeviceInfo *base) IDEDevice *dev = DO_UPCAST(IDEDevice, qdev, qdev); IDEDeviceInfo *info = DO_UPCAST(IDEDeviceInfo, qdev, base); IDEBus *bus = DO_UPCAST(IDEBus, qbus, qdev-parent_bus); +int i; if (!dev-conf.dinfo) { fprintf(stderr, %s: no drive specified\n, qdev-info-name); @@ -65,7 +66,13 @@ static int ide_qdev_init(DeviceState *qdev, DeviceInfo *base) default: goto err; } -return info-init(dev); +dev-dinfo-bus = bus-id; +dev-dinfo-unit = dev-unit; +i = info-init(dev); +if (i = 0) { +machine_ide_finalize(dev-dinfo); +} +return i; err: return -1; diff --git a/hw/pc.c b/hw/pc.c index 69e597f..29f47d4 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -215,6 +215,44 @@ static void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd) rtc_set_memory(s, info_ofs + 8, sectors); } +static void set_cmos_hd_data(DriveInfo *dinfo) +{ +static int hd_type = 0; +static int hd_trans = 0; +RTCState *s = rtc_state; +int cylinders, heads, sectors, translation; + +if (dinfo-bus == 0) { +if (dinfo-unit == 0) { +hd_type |= 0xf0; +cmos_init_hd(0x19, 0x1b, dinfo-bdrv); +} else { +hd_type |= 0x0f; +cmos_init_hd(0x1a, 0x24, dinfo-bdrv); +} +rtc_set_memory(s, 0x12,
[Qemu-devel] Re: [PATCH] tun: orphan an skb on tx
Le mardi 13 avril 2010 à 17:36 +0200, Jan Kiszka a écrit : Michael S. Tsirkin wrote: The following situation was observed in the field: tap1 sends packets, tap2 does not consume them, as a result tap1 can not be closed. And before that, tap1 may not be able to send further packets to anyone else on the bridge as its TX resources were blocked by tap2 - that's what we saw in the field. After the patch, tap1 is able to flood tap2, and tap3/tap4 not able to send one single frame. Is it OK ? Back to the problem : tap1 cannot be closed. Why ? because of refcounts ? When a socket with inflight tx packets is closed, we dont block the close, we only delay the socket freeing once all packets were delivered and freed.
Re: [Qemu-devel] [PATCH] arm: Fix compiler warning (fprintf format string)
On Fri, Apr 09, 2010 at 10:49:50PM +0200, Stefan Weil wrote: When argument checking is enabled, gcc throws this error: error: format not a string literal and no format arguments The patch rewrites the statement to satisfy the compiler. Signed-off-by: Stefan Weil w...@mail.berlios.de Thanks, applied. --- arm-dis.c |8 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arm-dis.c b/arm-dis.c index 4fb899e..5028555 100644 --- a/arm-dis.c +++ b/arm-dis.c @@ -3149,14 +3149,14 @@ print_insn_thumb16 (bfd_vma pc, struct disassemble_info *info, long given) if (started) func (stream, , ); started = 1; - func (stream, arm_regnames[14] /* lr */); + func (stream, %s, arm_regnames[14] /* lr */); } if (domaskpc) { if (started) func (stream, , ); - func (stream, arm_regnames[15] /* pc */); + func (stream, %s, arm_regnames[15] /* pc */); } func (stream, }); @@ -3699,7 +3699,7 @@ print_insn_thumb32 (bfd_vma pc, struct disassemble_info *info, long given) } else { - func (stream, psr_name (given 0xff)); + func (stream, %s, psr_name (given 0xff)); } break; @@ -3707,7 +3707,7 @@ print_insn_thumb32 (bfd_vma pc, struct disassemble_info *info, long given) if ((given 0xff) == 0) func (stream, %cPSR, (given 0x10) ? 'S' : 'C'); else - func (stream, psr_name (given 0xff)); + func (stream, %s, psr_name (given 0xff)); break; case '0': case '1': case '2': case '3': case '4': -- 1.5.6.5 -- Aurelien Jarno GPG: 1024D/F1BCDB73 aurel...@aurel32.net http://www.aurel32.net
Re: [Qemu-devel] [PATCH] m68k: Fix compiler warning (fprintf format string)
On Fri, Apr 09, 2010 at 10:49:51PM +0200, Stefan Weil wrote: When argument checking is enabled, gcc throws this error: error: format not a string literal and no format arguments The patch rewrites the statement to satisfy the compiler. It also removes a type cast which is not needed. Signed-off-by: Stefan Weil w...@mail.berlios.de Thanks, applied. --- m68k-dis.c |6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/m68k-dis.c b/m68k-dis.c index d38d5a2..7cd88f8 100644 --- a/m68k-dis.c +++ b/m68k-dis.c @@ -1104,7 +1104,7 @@ print_insn_arg (const char *d, { static const char *const cacheFieldName[] = { nc, dc, ic, bc }; val = fetch_arg (buffer, place, 2, info); -(*info-fprintf_func) (info-stream, cacheFieldName[val]); +(*info-fprintf_func) (info-stream, %s, cacheFieldName[val]); break; } @@ -1199,7 +1199,7 @@ print_insn_arg (const char *d, { static const char *const scalefactor_name[] = { , }; val = fetch_arg (buffer, place, 1, info); - (*info-fprintf_func) (info-stream, scalefactor_name[val]); + (*info-fprintf_func) (info-stream, %s, scalefactor_name[val]); } else { @@ -1804,7 +1804,7 @@ match_insn_m68k (bfd_vma memaddr, save_p = p; info-print_address_func = dummy_print_address; - info-fprintf_func = (fprintf_ftype) dummy_printer; + info-fprintf_func = dummy_printer; /* We scan the operands twice. The first time we don't print anything, but look for errors. */ -- 1.5.6.5 -- Aurelien Jarno GPG: 1024D/F1BCDB73 aurel...@aurel32.net http://www.aurel32.net
Re: [Qemu-devel] [PATCH] sh4: Fix compiler warning (fprintf format string)
On Fri, Apr 09, 2010 at 10:49:52PM +0200, Stefan Weil wrote: When argument checking is enabled, gcc throws this error: error: format not a string literal and no format arguments The patch rewrites the statement to satisfy the compiler. Signed-off-by: Stefan Weil w...@mail.berlios.de Thanks, applied. --- sh4-dis.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/sh4-dis.c b/sh4-dis.c index 41fd866..078a6b2 100644 --- a/sh4-dis.c +++ b/sh4-dis.c @@ -1493,10 +1493,10 @@ print_insn_ppi (int field_b, struct disassemble_info *info) print_dsp_reg (field_b 0xf, fprintf_fn, stream); break; case DSP_REG_X: - fprintf_fn (stream, sx_tab[(field_b 6) 3]); + fprintf_fn (stream, %s, sx_tab[(field_b 6) 3]); break; case DSP_REG_Y: - fprintf_fn (stream, sy_tab[(field_b 4) 3]); + fprintf_fn (stream, %s, sy_tab[(field_b 4) 3]); break; case A_MACH: fprintf_fn (stream, mach); -- 1.5.6.5 -- Aurelien Jarno GPG: 1024D/F1BCDB73 aurel...@aurel32.net http://www.aurel32.net
Re: [Qemu-devel] [PATCH] sparc: Fix compiler warning (fprintf format string)
On Fri, Apr 09, 2010 at 10:49:53PM +0200, Stefan Weil wrote: When argument checking is enabled, gcc throws this error: error: format not a string literal and no format arguments The patch rewrites the statement to satisfy the compiler. Signed-off-by: Stefan Weil w...@mail.berlios.de Thanks, applied. --- sparc-dis.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/sparc-dis.c b/sparc-dis.c index 83a12ae..611e74f 100644 --- a/sparc-dis.c +++ b/sparc-dis.c @@ -2778,7 +2778,7 @@ print_insn_sparc (bfd_vma memaddr, disassemble_info *info) /* Can't do simple format if source and dest are different. */ continue; - (*info-fprintf_func) (stream, opcode-name); + (*info-fprintf_func) (stream, %s, opcode-name); { const char *s; -- 1.5.6.5 -- Aurelien Jarno GPG: 1024D/F1BCDB73 aurel...@aurel32.net http://www.aurel32.net
Re: [Qemu-devel] [PATCH] linux-user: do_shmdt(): Fix page_set_flags's 2nd arg.
On Sun, Apr 11, 2010 at 02:09:57AM +0900, takas...@ops.dti.ne.jp wrote: 2nd arg of page_set_flags() should be start+size, but size. Signed-off-by: Takashi YOSHII takas...@ops.dti.ne.jp Thanks, applied. --- linux-user/syscall.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index a03e432..26c0fb4 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -2752,7 +2752,7 @@ static inline abi_long do_shmdt(abi_ulong shmaddr) for (i = 0; i N_SHM_REGIONS; ++i) { if (shm_regions[i].start == shmaddr) { shm_regions[i].start = 0; -page_set_flags(shmaddr, shm_regions[i].size, 0); +page_set_flags(shmaddr, shmaddr + shm_regions[i].size, 0); break; } } -- 1.6.5 -- Aurelien Jarno GPG: 1024D/F1BCDB73 aurel...@aurel32.net http://www.aurel32.net
[Qemu-devel] Qemu sharing files help
Hi, I have Ubuntu on my host system and I am using following command on it, to load kernel.. *qemu -kernel kernelimage -initrd initrd.img /dev/zero -append cmdline * Was wondering, how can I share/copy files from host Ubuntu system to the one with above command? Someone mentioned using sshfs, but was hard for me to figure out, how ? Thanks, Arpit
Re: [Qemu-devel] Problem with DOS application and 286 DOS Extender application
Gerhard Wiesinger wrote: It is a non public, proprietary application which uses the Ergo Computing 286 DOS Extender. I guess some other application which use the same DOS extender have the same problem. So best thing is to find another application which uses the Ergo Computing 286 DOS Extender, too. The 286 was obsolete 20 years ago, although code depending on it persisted for some years after. I'm fairly sure the number of people using (or trying to use) Qemu with 286-specific code is very small indeed, so unfortunately for a 286 problem, you will need to help reproduce it as much as you can for it to be fixed. Note that Qemu doesn't emulate segments properly even for 32-bit x86 code, and 16-bit (286) code depends on that all the more. That may be the problem. Or it may be the reset using keyboard controller and BIOS method used to switch from protected mode to real mode on a 286 is not implemented properly, or is not supported by the BIOS properly. Or it may simply be a bug in 16-bit task segment switching or something like that, which is quite complex and so rarely used that it might never have been properly tested. Did you try running the application under Bochs, which has a more accurate emulation of very old x86 CPUs? -- Jamie
[Qemu-devel] Re: ehci update
David S. Ahern wrote: After a month of code refactoring and clean ups, etc, I thought I would send along an update. The attached patch is relative to your ehci branch; I also attached the full usb-ehci.c file for easier reading. Thanks for your work! I applied it and once again merged git head at this chance. Just one request for the future: Please keep a queue of incremental changes. This EHCI beast is sufficiently tricky, and at some point someone (you included) may want to go back in history to find out where some change came from, and why it was applied. E.g.: We apparently regressed further /wrt my favorite test case (as it's self-contained): -usbdevice net. qemu is now entering an infinite loop when you start dhcpcd in the guest on that interface. At this point I can get a Windows XP guest to format a 4GB key and read from and write to it. I can get an FC-12 guest to format a 4GB key and an 8GB key as well as read from and write to both. Write rates are on the order of 8 MB/sec for dd: # dd if=/dev/zero of=test bs=1M count=100 oflag=dsync 100+0 records in 100+0 records out 104857600 bytes (105 MB) copied, 12.1205 s, 8.7 MB/s rsync of text files (e.g., /var/log) is on the order of 2MB/sec. 4GB keys are definitely more stable; the 8GB is not recognized by Windows XP. It still needs a lot of love, but definitely an improvement from the last version. The biggest difference for the performance boost and stability is discovering that the usbfs in linux limits transactions to 16k versus the EHCI spec which allows 20k per qTD. I added a hack to submit which detects 20k requests from a guest and breaks it up into 2 requests through the host (a 16k and then a 4k). Did someone already bring this up on LKML or wherever usbfs is discussed? Should be fixable, I naively guess. Jan signature.asc Description: OpenPGP digital signature
[Qemu-devel] Re: ehci update
On 04/13/2010 05:35 PM, Jan Kiszka wrote: David S. Ahern wrote: After a month of code refactoring and clean ups, etc, I thought I would send along an update. The attached patch is relative to your ehci branch; I also attached the full usb-ehci.c file for easier reading. Thanks for your work! I applied it and once again merged git head at this chance. Just one request for the future: Please keep a queue of incremental changes. This EHCI beast is sufficiently tricky, and at some point someone (you included) may want to go back in history to find out where some change came from, and why it was applied. Agreed. I will submit focused changes from now on. E.g.: We apparently regressed further /wrt my favorite test case (as it's self-contained): -usbdevice net. qemu is now entering an infinite loop when you start dhcpcd in the guest on that interface. I've been focused on a single path -- async, bulk transfers to a USB key. I take a look at the ethernet device as well. I've struggled with the infinite loop part: the async train jumps the track with FC-12 guest rather quickly (ie., the link list is no longer a loop). I put in a loop detector in the advance_state function which helps for storage devices - but clearly something is not right. I've been roaming the ehci_hcd code in the kernel as well looking for clues. A lot of details to gel and the day job keeps getting in the way. :-) At this point I can get a Windows XP guest to format a 4GB key and read from and write to it. I can get an FC-12 guest to format a 4GB key and an 8GB key as well as read from and write to both. Write rates are on the order of 8 MB/sec for dd: # dd if=/dev/zero of=test bs=1M count=100 oflag=dsync 100+0 records in 100+0 records out 104857600 bytes (105 MB) copied, 12.1205 s, 8.7 MB/s rsync of text files (e.g., /var/log) is on the order of 2MB/sec. 4GB keys are definitely more stable; the 8GB is not recognized by Windows XP. It still needs a lot of love, but definitely an improvement from the last version. The biggest difference for the performance boost and stability is discovering that the usbfs in linux limits transactions to 16k versus the EHCI spec which allows 20k per qTD. I added a hack to submit which detects 20k requests from a guest and breaks it up into 2 requests through the host (a 16k and then a 4k). Did someone already bring this up on LKML or wherever usbfs is discussed? Should be fixable, I naively guess. I submitted the patch to linux-usb and it was nack'ed. The response was that memory is allocated in powers of 2 so trying to up the limit from 16k to 20k means it will actually want to find 32k of contiguous memory. The suggestion was to handle it with multiple requests within qemu. I guess libusb does that. Right now there is a hack in the ehci model to do this, but the usb-linux code would be a better place since it might be specific to linux hosts. David Jan
[Qemu-devel] Re: [PATCH] tun: orphan an skb on tx
On Tue, Apr 13, 2010 at 08:31:03PM +0200, Eric Dumazet wrote: Herbert Acked your patch, so I guess its OK, but I think it can be dangerous. The tun socket accounting was never designed to stop it from flooding another tun interface. It's there to stop it from transmitting above a destination interface TX bandwidth and cause unnecessary packet drops. It also limits the total amount of kernel memory that can be pinned down by a single tun interface. In this case, all we're doing is shifting the accounting from the hardware queue to the qdisc queue. So your ability to flood a tun interface is essentially unchanged. BTW we do the same thing in a number of hardware drivers, as well as virtio-net. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [SeaBIOS] [Qemu-devel] QEMU regression problems
On Tue, Apr 13, 2010 at 02:26:10PM +0800, Roy Tam wrote: 2010/4/12 Gerhard Wiesinger li...@wiesinger.com: 3.) There is also a problem with the reported base memory under QEMM386 (HIMEM.SYS and EMM386.EXE is correct here). It is 646kB instead of 640kB. Therefore base memory test fails. I guess that reporting memory CMOS tables/interrupt functions are not implemented correctly. - The Base Memory 640K error seems to be SeaBIOS related. QEMU Bochs BIOS(tested with both -old-bios hack in 0.12 series and old 0.11.1) will just freeze after QEMU counted RAM.(Tested with ScriptPC and Bochs). The SeaBIOS log would really help. This can be done by adding: -chardev stdio,id=seabios -device isa-debugcon,iobase=0x402,chardev=seabios to the qemu command line. The memory can be obtained in several places (int 12, int 1588, int 15e801, int 15e820, and mem 40:13). All look fine to me from looking at the code. -Kevin
Re: [Qemu-devel] Re: ehci update
On 14.04.2010, at 01:50, David S. Ahern wrote: On 04/13/2010 05:35 PM, Jan Kiszka wrote: David S. Ahern wrote: After a month of code refactoring and clean ups, etc, I thought I would send along an update. The attached patch is relative to your ehci branch; I also attached the full usb-ehci.c file for easier reading. Thanks for your work! I applied it and once again merged git head at this chance. Just one request for the future: Please keep a queue of incremental changes. This EHCI beast is sufficiently tricky, and at some point someone (you included) may want to go back in history to find out where some change came from, and why it was applied. Agreed. I will submit focused changes from now on. E.g.: We apparently regressed further /wrt my favorite test case (as it's self-contained): -usbdevice net. qemu is now entering an infinite loop when you start dhcpcd in the guest on that interface. I've been focused on a single path -- async, bulk transfers to a USB key. I take a look at the ethernet device as well. I've struggled with the infinite loop part: the async train jumps the track with FC-12 guest rather quickly (ie., the link list is no longer a loop). I put in a loop detector in the advance_state function which helps for storage devices - but clearly something is not right. I've been roaming the ehci_hcd code in the kernel as well looking for clues. A lot of details to gel and the day job keeps getting in the way. :-) At this point I can get a Windows XP guest to format a 4GB key and read from and write to it. I can get an FC-12 guest to format a 4GB key and an 8GB key as well as read from and write to both. Write rates are on the order of 8 MB/sec for dd: # dd if=/dev/zero of=test bs=1M count=100 oflag=dsync 100+0 records in 100+0 records out 104857600 bytes (105 MB) copied, 12.1205 s, 8.7 MB/s rsync of text files (e.g., /var/log) is on the order of 2MB/sec. 4GB keys are definitely more stable; the 8GB is not recognized by Windows XP. It still needs a lot of love, but definitely an improvement from the last version. The biggest difference for the performance boost and stability is discovering that the usbfs in linux limits transactions to 16k versus the EHCI spec which allows 20k per qTD. I added a hack to submit which detects 20k requests from a guest and breaks it up into 2 requests through the host (a 16k and then a 4k). Did someone already bring this up on LKML or wherever usbfs is discussed? Should be fixable, I naively guess. I submitted the patch to linux-usb and it was nack'ed. The response was that memory is allocated in powers of 2 so trying to up the limit from 16k to 20k means it will actually want to find 32k of contiguous memory. The suggestion was to handle it with multiple requests within qemu. I guess libusb does that. Any reason we're not using libusb? Alex
[Qemu-devel] Re: SeaBIOS error with Juniper FreeBSD kernel
On Tue, Apr 13, 2010 at 08:48:35PM +0200, Bjørn Mork wrote: Gleb Natapov g...@redhat.com writes: On Sun, Feb 28, 2010 at 02:39:04PM -0500, Kevin O'Connor wrote: On Sun, Feb 21, 2010 at 04:18:38PM -0700, Brandon Bennett wrote: I have narrowed it down to SMBIOS. If I disable CONFIG_SMBIOS the image boots up fine. If there is any seabios revision that works then it is possible to bisect to find problematic commit. I'm also bitten by this and have now attempted to bisect it. The result was: c95d2cee36db79c88253d43a90230ceadf0c26cf is first bad commit commit c95d2cee36db79c88253d43a90230ceadf0c26cf It looks like memory layout changes in the f-segment is tickling the underlying bug. I don't think SMBIOS, the above commit, or the other commit identified earlier are the root cause of the problem. Instead, I'd guess these commits just change the memory layout enough to avoid the root cause. This looks like it is going to require some careful study with a debugger and some execution traces. Unfortunately, since this image isn't available for download it makes it difficult to track down. -Kevin
[Qemu-devel] Re: [PULL] pci,vhost-net,eepro100
On 04/11/2010 03:29 PM, Michael S. Tsirkin wrote: The following changes since commit f7e2aca83419dde3c94fa1d5e615581bb4ded9c0: tcg/ppc: Fix typo (2010-04-06 03:10:03 +0400) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mst/qemu.git for_anthony Pulled. Thanks. Regards, Anthony Liguori David L Stevens (1): vhost: fix features ack Marcelo Tosatti (1): vhost.c: includelinux/vhost.h last Michael S. Tsirkin (3): pci: add API to add capability at a known offset eepro100: convert to new capability API vhost-net: disable mergeable buffers Stefan Weil (8): eepro100: Don't allow writing SCBStatus eepro100: Simplify status handling eepro100: Simplified device instantiation eepro100: Add new device variant i82801 eepro100: Set configuration bit for standard TCB eepro100: Set power management capability using pci_reserve_capability eepro100: fix mapping of flash memory eepro100: fix PCI interrupt pin configuration regression hw/eepro100.c | 534 -- hw/pci.c| 17 ++- hw/pci.h|2 + hw/vhost.c |2 +- hw/vhost_net.c |1 + hw/virtio-net.c |8 + 6 files changed, 223 insertions(+), 341 deletions(-)
Re: [Qemu-devel] Re: ehci update
On 04/13/2010 07:20 PM, Alexander Graf wrote: It still needs a lot of love, but definitely an improvement from the last version. The biggest difference for the performance boost and stability is discovering that the usbfs in linux limits transactions to 16k versus the EHCI spec which allows 20k per qTD. I added a hack to submit which detects 20k requests from a guest and breaks it up into 2 requests through the host (a 16k and then a 4k). Did someone already bring this up on LKML or wherever usbfs is discussed? Should be fixable, I naively guess. I submitted the patch to linux-usb and it was nack'ed. The response was that memory is allocated in powers of 2 so trying to up the limit from 16k to 20k means it will actually want to find 32k of contiguous memory. The suggestion was to handle it with multiple requests within qemu. I guess libusb does that. Any reason we're not using libusb? Good question. I was wondering the same. I was going to look at converting usb-linux to use libusb1 when I get some time. David Alex
[Qemu-devel] [PATCH v5 00/17] virtio-serial fixes, new abi for port discovery, flow control
Hello, These patches rework the way ports are announced to the guests. A control message is used to let the guest know a new port is added. Initial port discovery and port hot-plug work via this way now. This was done to have the host and guest port numbering in sync to avoid surprises after several hotplug/unplug operations and migrations. The ability to assign a particular port number to ports is also added so that management software can control the placement of ports. This iteration fixes a few things Juan pointed out. Juan and Gerd have already acked v4 of the patch series, from which it remains largely unchanged. Differences from v4: - Juan: qemu_free() handles NULL fine - Juan: Send ports_map as an array of u32s instead of as a buffer (big/little endian issues) - Juan: iov should be compiled only once (nothing target-specific) - A few fixes from the other submission: discard piled up buffers in the vq after port close or hot-unplug. Overall: - Users can set the port id they wish to instantiate ports at by using the ,nr= parameter to 'virtserialport' and 'virtconsole' devices - Migration fixes: refuse migration when: - number of active ports is different between the src and destination - max_nr_ports a device can support on the src is more - If a qemu chardev connection to a port is closed on the dest while it was open on the src, inform the guest about this. (Also do the same for port closed on src but open on dest.) - Use control messages for relaying new port information instead of config space (changes abi) - Propagate error message from guest in instantiating a port or a device to the user. - Handle scatter/gather for control output and data output from the guest - Fix abuse of virtio api in the virtqueue_push() function - Add an API for the ports for flow control: ports can signal when they're ready to accept data / stop sending data. Amit Shah (17): virtio-serial: save/load: Ensure target has enough ports virtio-serial: save/load: Ensure nr_ports on src and dest are same. virtio-serial: save/load: Ensure we have hot-plugged ports instantiated virtio-serial: save/load: Send target host connection status if different virtio-serial: Use control messages to notify guest of new ports virtio-serial: whitespace: match surrounding code virtio-serial: Remove redundant check for 0-sized write request virtio-serial: Update copyright year to 2010 virtio-serial: Propagate errors in initialising ports / devices in guest virtio-serial: Send out guest data to ports only if port is opened iov: Introduce a new file for helpers around iovs, add iov_from_buf() iov: Add iov_to_buf and iov_size helpers virtio-serial: Handle scatter-gather buffers for control messages virtio-serial: Handle scatter/gather input from the guest virtio-serial: Apps should consume all data that guest sends out / Fix virtio api abuse virtio-serial: Discard data that guest sends us when ports aren't connected virtio-serial: Implement flow control for individual ports Makefile |2 + Makefile.objs |1 + hw/iov.c | 70 ++ hw/iov.h | 19 +++ hw/virtio-balloon.c| 35 +- hw/virtio-console.c| 11 +- hw/virtio-net.c| 20 +--- hw/virtio-serial-bus.c | 333 hw/virtio-serial.h | 34 -- 9 files changed, 377 insertions(+), 148 deletions(-) create mode 100644 hw/iov.c create mode 100644 hw/iov.h
[Qemu-devel] [PATCH v5 01/17] virtio-serial: save/load: Ensure target has enough ports
The target could be started with max_nr_ports for a virtio-serial device lesser than what was available on the source machine. Fail the migration in such a case. Signed-off-by: Amit Shah amit.s...@redhat.com Reported-by: Juan Quintela quint...@redhat.com --- hw/virtio-serial-bus.c | 13 +++-- 1 files changed, 11 insertions(+), 2 deletions(-) diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index 17c1ec1..9a7f0c1 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -374,10 +374,13 @@ static void virtio_serial_save(QEMUFile *f, void *opaque) /* Items in struct VirtIOSerial */ +qemu_put_be32s(f, s-bus-max_nr_ports); + /* Do this because we might have hot-unplugged some ports */ nr_active_ports = 0; -QTAILQ_FOREACH(port, s-ports, next) +QTAILQ_FOREACH(port, s-ports, next) { nr_active_ports++; +} qemu_put_be32s(f, nr_active_ports); @@ -399,7 +402,7 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) { VirtIOSerial *s = opaque; VirtIOSerialPort *port; -uint32_t nr_active_ports; +uint32_t max_nr_ports, nr_active_ports; unsigned int i; if (version_id 2) { @@ -420,6 +423,12 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) /* Items in struct VirtIOSerial */ +qemu_get_be32s(f, max_nr_ports); +if (max_nr_ports s-bus-max_nr_ports) { +/* Source could have more ports than us. Fail migration. */ +return -EINVAL; +} + qemu_get_be32s(f, nr_active_ports); /* Items in struct VirtIOSerialPort */ -- 1.6.2.5
[Qemu-devel] [PATCH v5 02/17] virtio-serial: save/load: Ensure nr_ports on src and dest are same.
The number of ports on the source as well as the destination machines should match. If they don't, it means some ports that got hotplugged on the source aren't instantiated on the destination. Or that ports that were hot-unplugged on the source are created on the destination. Signed-off-by: Amit Shah amit.s...@redhat.com Reported-by: Juan Quintela quint...@redhat.com --- hw/virtio-serial-bus.c | 18 -- 1 files changed, 16 insertions(+), 2 deletions(-) diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index 9a7f0c1..d31e62d 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -402,7 +402,7 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) { VirtIOSerial *s = opaque; VirtIOSerialPort *port; -uint32_t max_nr_ports, nr_active_ports; +uint32_t max_nr_ports, nr_active_ports, nr_ports; unsigned int i; if (version_id 2) { @@ -419,7 +419,21 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) /* The config space */ qemu_get_be16s(f, s-config.cols); qemu_get_be16s(f, s-config.rows); -s-config.nr_ports = qemu_get_be32(f); +nr_ports = qemu_get_be32(f); + +if (nr_ports != s-config.nr_ports) { +/* + * Source hot-plugged/unplugged ports and we don't have all of + * them here. + * + * Note: This condition cannot check for all hotplug/unplug + * events: eg, if one port was hot-plugged and one was + * unplugged, the nr_ports remains the same but the port id's + * would have changed and we won't catch it here. A later + * check for !find_port_by_id() will confirm if this happened. + */ +return -EINVAL; +} /* Items in struct VirtIOSerial */ -- 1.6.2.5
[Qemu-devel] [PATCH v5 06/17] virtio-serial: whitespace: match surrounding code
The virtio-serial code doesn't mix declarations and definitions, so separate them out on different lines. Signed-off-by: Amit Shah amit.s...@redhat.com --- hw/virtio-serial-bus.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index bb11a9b..8efba0b 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -354,7 +354,10 @@ static void handle_input(VirtIODevice *vdev, VirtQueue *vq) static uint32_t get_features(VirtIODevice *vdev, uint32_t features) { -VirtIOSerial *vser = DO_UPCAST(VirtIOSerial, vdev, vdev); +VirtIOSerial *vser; + +vser = DO_UPCAST(VirtIOSerial, vdev, vdev); + if (vser-bus-max_nr_ports 1) { features |= (1 VIRTIO_CONSOLE_F_MULTIPORT); } -- 1.6.2.5
[Qemu-devel] [PATCH v5 03/17] virtio-serial: save/load: Ensure we have hot-plugged ports instantiated
If some ports that were hot-plugged on the source are not available on the destination, fail migration instead of trying to deref a NULL pointer. Signed-off-by: Amit Shah amit.s...@redhat.com Reported-by: Juan Quintela quint...@redhat.com --- hw/virtio-serial-bus.c |7 +++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index d31e62d..5316ef6 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -451,6 +451,13 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) id = qemu_get_be32(f); port = find_port_by_id(s, id); +if (!port) { +/* + * The requested port was hot-plugged on the source but we + * don't have it + */ +return -EINVAL; +} port-guest_connected = qemu_get_byte(f); } -- 1.6.2.5
[Qemu-devel] [PATCH v5 07/17] virtio-serial: Remove redundant check for 0-sized write request
The check for a 0-sized write request to a guest port is not necessary; the while loop below won't be executed in this case and all will be fine. Signed-off-by: Amit Shah amit.s...@redhat.com --- hw/virtio-serial-bus.c |3 --- 1 files changed, 0 insertions(+), 3 deletions(-) diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index 8efba0b..c1d9851 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -91,9 +91,6 @@ static size_t write_to_port(VirtIOSerialPort *port, if (!virtio_queue_ready(vq)) { return 0; } -if (!size) { -return 0; -} while (offset size) { int i; -- 1.6.2.5
[Qemu-devel] [PATCH v5 05/17] virtio-serial: Use control messages to notify guest of new ports
Allow the port 'id's to be set by a user on the command line. This is needed by management apps that will want a stable port numbering scheme for hot-plug/unplug and migration. Since the port numbers are shared with the guest (to identify ports in control messages), we just send a control message to the guest indicating addition of new ports (hot-plug) or notifying the guest of the available ports when the guest sends us a DEVICE_READY control message. Signed-off-by: Amit Shah amit.s...@redhat.com --- hw/virtio-console.c|2 + hw/virtio-serial-bus.c | 184 +++ hw/virtio-serial.h | 17 +++-- 3 files changed, 133 insertions(+), 70 deletions(-) diff --git a/hw/virtio-console.c b/hw/virtio-console.c index bd44ec6..17b221d 100644 --- a/hw/virtio-console.c +++ b/hw/virtio-console.c @@ -99,6 +99,7 @@ static VirtIOSerialPortInfo virtconsole_info = { .exit = virtconsole_exitfn, .qdev.props = (Property[]) { DEFINE_PROP_UINT8(is_console, VirtConsole, port.is_console, 1), +DEFINE_PROP_UINT32(nr, VirtConsole, port.id, VIRTIO_CONSOLE_BAD_ID), DEFINE_PROP_CHR(chardev, VirtConsole, chr), DEFINE_PROP_STRING(name, VirtConsole, port.name), DEFINE_PROP_END_OF_LIST(), @@ -133,6 +134,7 @@ static VirtIOSerialPortInfo virtserialport_info = { .init = virtserialport_initfn, .exit = virtconsole_exitfn, .qdev.props = (Property[]) { +DEFINE_PROP_UINT32(nr, VirtConsole, port.id, VIRTIO_CONSOLE_BAD_ID), DEFINE_PROP_CHR(chardev, VirtConsole, chr), DEFINE_PROP_STRING(name, VirtConsole, port.name), DEFINE_PROP_END_OF_LIST(), diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index 484dc94..bb11a9b 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -41,6 +41,10 @@ struct VirtIOSerial { VirtIOSerialBus *bus; QTAILQ_HEAD(, VirtIOSerialPort) ports; + +/* bitmap for identifying active ports */ +uint32_t *ports_map; + struct virtio_console_config config; }; @@ -48,6 +52,10 @@ static VirtIOSerialPort *find_port_by_id(VirtIOSerial *vser, uint32_t id) { VirtIOSerialPort *port; +if (id == VIRTIO_CONSOLE_BAD_ID) { +return NULL; +} + QTAILQ_FOREACH(port, vser-ports, next) { if (port-id == id) return port; @@ -208,14 +216,25 @@ static void handle_control_message(VirtIOSerial *vser, void *buf) size_t buffer_len; gcpkt = buf; -port = find_port_by_id(vser, ldl_p(gcpkt-id)); -if (!port) -return; cpkt.event = lduw_p(gcpkt-event); cpkt.value = lduw_p(gcpkt-value); +port = find_port_by_id(vser, ldl_p(gcpkt-id)); +if (!port cpkt.event != VIRTIO_CONSOLE_DEVICE_READY) +return; + switch(cpkt.event) { +case VIRTIO_CONSOLE_DEVICE_READY: +/* + * The device is up, we can now tell the device about all the + * ports we have here. + */ +QTAILQ_FOREACH(port, vser-ports, next) { +send_control_event(port, VIRTIO_CONSOLE_PORT_ADD, 1); +} +break; + case VIRTIO_CONSOLE_PORT_READY: /* * Now that we know the guest asked for the port name, we're @@ -363,6 +382,7 @@ static void virtio_serial_save(QEMUFile *f, void *opaque) VirtIOSerial *s = opaque; VirtIOSerialPort *port; uint32_t nr_active_ports; +unsigned int i; /* The virtio device */ virtio_save(s-vdev, f); @@ -370,13 +390,17 @@ static void virtio_serial_save(QEMUFile *f, void *opaque) /* The config space */ qemu_put_be16s(f, s-config.cols); qemu_put_be16s(f, s-config.rows); -qemu_put_be32s(f, s-config.nr_ports); -/* Items in struct VirtIOSerial */ +qemu_put_be32s(f, s-config.max_nr_ports); + +/* The ports map */ + +for (i = 0; i (s-config.max_nr_ports + 31) / 32; i++) { +qemu_put_be32s(f, s-ports_map[i]); +} -qemu_put_be32s(f, s-bus-max_nr_ports); +/* Ports */ -/* Do this because we might have hot-unplugged some ports */ nr_active_ports = 0; QTAILQ_FOREACH(port, s-ports, next) { nr_active_ports++; @@ -388,11 +412,6 @@ static void virtio_serial_save(QEMUFile *f, void *opaque) * Items in struct VirtIOSerialPort. */ QTAILQ_FOREACH(port, s-ports, next) { -/* - * We put the port number because we may not have an active - * port at id 0 that's reserved for a console port, or in case - * of ports that might have gotten unplugged - */ qemu_put_be32s(f, port-id); qemu_put_byte(f, port-guest_connected); qemu_put_byte(f, port-host_connected); @@ -403,7 +422,8 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) { VirtIOSerial *s = opaque; VirtIOSerialPort *port; -uint32_t max_nr_ports, nr_active_ports, nr_ports; +size_t ports_map_size; +
[Qemu-devel] [PATCH v5 08/17] virtio-serial: Update copyright year to 2010
Signed-off-by: Amit Shah amit.s...@redhat.com --- hw/virtio-console.c|2 +- hw/virtio-serial-bus.c |2 +- hw/virtio-serial.h |2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/hw/virtio-console.c b/hw/virtio-console.c index 17b221d..6b8 100644 --- a/hw/virtio-console.c +++ b/hw/virtio-console.c @@ -1,7 +1,7 @@ /* * Virtio Console and Generic Serial Port Devices * - * Copyright Red Hat, Inc. 2009 + * Copyright Red Hat, Inc. 2009, 2010 * * Authors: * Amit Shah amit.s...@redhat.com diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index c1d9851..c77ea4f 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -1,7 +1,7 @@ /* * A bus for connecting virtio serial and console ports * - * Copyright (C) 2009 Red Hat, Inc. + * Copyright (C) 2009, 2010 Red Hat, Inc. * * Author(s): * Amit Shah amit.s...@redhat.com diff --git a/hw/virtio-serial.h b/hw/virtio-serial.h index 0548689..f023873 100644 --- a/hw/virtio-serial.h +++ b/hw/virtio-serial.h @@ -2,7 +2,7 @@ * Virtio Serial / Console Support * * Copyright IBM, Corp. 2008 - * Copyright Red Hat, Inc. 2009 + * Copyright Red Hat, Inc. 2009, 2010 * * Authors: * Christian Ehrhardt ehrha...@linux.vnet.ibm.com -- 1.6.2.5
[Qemu-devel] [PATCH v5 09/17] virtio-serial: Propagate errors in initialising ports / devices in guest
If adding of ports or devices in the guest fails we can send out a QMP event so that management software can deal with it. Signed-off-by: Amit Shah amit.s...@redhat.com --- hw/virtio-serial-bus.c | 10 ++ 1 files changed, 10 insertions(+), 0 deletions(-) diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index c77ea4f..3a09f0d 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -223,6 +223,11 @@ static void handle_control_message(VirtIOSerial *vser, void *buf) switch(cpkt.event) { case VIRTIO_CONSOLE_DEVICE_READY: +if (!cpkt.value) { +error_report(virtio-serial-bus: Guest failure in adding device %s\n, + vser-bus-qbus.name); +break; +} /* * The device is up, we can now tell the device about all the * ports we have here. @@ -233,6 +238,11 @@ static void handle_control_message(VirtIOSerial *vser, void *buf) break; case VIRTIO_CONSOLE_PORT_READY: +if (!cpkt.value) { +error_report(virtio-serial-bus: Guest failure in adding port %u for device %s\n, + port-id, vser-bus-qbus.name); +break; +} /* * Now that we know the guest asked for the port name, we're * sure the guest has initialised whatever state is necessary -- 1.6.2.5