Re: [Qemu-devel] [PATCH v3] network scripts: don't block SIGCHLD before forking
On 12/08/2011 04:48 AM, Michael Roth wrote: This patch fixes a bug where child processes of launch_script() can misbehave due to SIGCHLD being blocked. In the case of `sudo`, this causes a permanent hang. Previously a SIGCHLD handler was added to reap fork_exec()'d zombie processes by calling waitpid(-1, ...). This required other fork()/waitpid() callers to temporarilly block SIGCHILD to avoid having the final wait status being intercepted by the SIGCHLD handler: 7c3370d4fe3fa6cda8655f109e4659afc8ca4269 Since then, the qemu_add_child_watch() interface was added to allow registration of such processes and reap only from that specific set of PIDs: 4d54ec7898bd951007cb6122d5315584bd41d0c4 As a result, we can now avoid blocking SIGCHLD in launch_script(), so drop that behavior. Signed-off-by: Michael Rothmdr...@linux.vnet.ibm.com --- net/tap.c |6 -- 1 files changed, 0 insertions(+), 6 deletions(-) diff --git a/net/tap.c b/net/tap.c index 1f26dc9..6c27a94 100644 --- a/net/tap.c +++ b/net/tap.c @@ -346,15 +346,10 @@ static TAPState *net_tap_fd_init(VLANState *vlan, static int launch_script(const char *setup_script, const char *ifname, int fd) { -sigset_t oldmask, mask; int pid, status; char *args[3]; char **parg; -sigemptyset(mask); -sigaddset(mask, SIGCHLD); -sigprocmask(SIG_BLOCK,mask,oldmask); - /* try to launch network script */ pid = fork(); if (pid == 0) { @@ -378,7 +373,6 @@ static int launch_script(const char *setup_script, const char *ifname, int fd) while (waitpid(pid,status, 0) != pid) { /* loop */ } -sigprocmask(SIG_SETMASK,oldmask, NULL); if (WIFEXITED(status) WEXITSTATUS(status) == 0) { return 0; Reviewed-by: Paolo Bonzini pbonz...@redhat.com Paolo
Re: [Qemu-devel] [BUG] [Seabios] PCI 64bit BARs on Win2008 - unable to start the device. (ACPI lacks the _DSM method)
On 12/05/11 05:29, Alexey Korolev wrote: Hi Gerd, We have very early prototype of data acquisition device, with quite large MMIO buffer. It is an emulated device. We are running the 0.15 release. 0.15 doesn't work correctly with 64bit BARs so I've already added some hacks to Seabios to let OS to choose the memory region. Thus you see bar 1, addr 0 in seabios log. I'd strongly suggest to move forward to qemu 1.0. Memory region handling has seen a major rewrite in 1.0 (memory api patches by avi). Chances are good that the 64bit bar bugs in qemu have been fixed meanwhile. I have experimental patches which add a 64bit bar to the qxl device and seabios handles it just fine (although memory-backed not mmio), except that there is no support yet to map 64bit bars above 4G. It shouldn't be that hard to add the latter though. seabios needs two more pci_region_type (PCI_REGION_TYPE_MEM_64 and PCI_REGION_TYPE_PREFMEM_64) to track and map 64bit bars separately. And a address space window where it can map 64bit bars to. Sorry that I haven't specified all this initially. I just want to make 64bit PCI bar working properly. Linux guests works correctly (except early versions - not investigated this yet). At the moment I have some issues with windows which relies on ACPI _CRS. ... and a _CRS entry for the 64bit bar address space window of course. cheers, Gerd
[Qemu-devel] RFC: Only display help options that are accepted by the architecture
Hi all, As the subject says, this is an RFC. I have a few patches (to follow), that change the help output from QEMU so that we only display options that are accepted by the arch of the running binary. So for example qemu-system-ppc64 will not tell you about i386 options like -no-acpi, -no-hpet etc. We already have almost all the information we need, it just requires actually filtering the options as we print them in the help text - and some related cruft, see the patches for details. I'd argue it's generally a sane change, but it also has the benefit that it makes libvirt's life easier. For better or worse libvirt parses the output of qemu -h to see what options are supported. At the moment that is problematic because the ppc64 emulator claims to support -no-acpi etc. So with this change libvirt could more reliably determine what is supported. cheers signature.asc Description: This is a digitally signed message part
[Qemu-devel] Rappel: Contacts DEFISCALISATION et PLACEMENTS FINANCIERS. Jusqu'à 35 contacts offerts en Décembre. Profitez-en!
Cher(s) CLIENTS, cgp(s) indépendant(s), cabinet(s) de commercilisation, promoteur(s)... Rendez-vous sur www.mes-contacts.com (recommandé par les promoteurs) pour obtenir maintenant les meilleurs Contacts entrants en Défiscalisation et Placements financiers (Ass.vie, SCPI,mutuelle,retraite..) Offre valable tout le mois de Décembre 2011 : CONTACTS DEFISCALISATION :*Forfait 750 #8364; : 6 contacts offerts soit 23 fiches au total au lieu de 17*Forfait 1000 #8364; :9 contacts offerts soit 33 fiches au total au lieu de 24*Forfait 2000 #8364; : 19 contacts offerts soit 70 fiches au total au lieu de 51*Forfait 5000 #8364; : 35 contacts offerts soit 170 fiches au total au lieu de 135 CONTACTS PLACEMENTS FINANCIERS:*Forfait 550 #8364; : 2 contacts offerts soit 17 fiches au total au lieu de 15*Forfait 1000 #8364; : 5 contacts offerts soit 35 fiches au total au lieu de 30*Forfait 2000 #8364; : 10 contacts offerts soit 90 fiches au total au lieu de 80-Contacts Exclusifs-Démarche volontaire des prospects-Livraison des contacts sur votre boîte e-mail en temps réel-Gros volume d'envois possible-Simple, efficace,économique et rentableà partir de 25 #8364;/le contactCordialement,L'équipe Mes-Contacts.comwww.mes-contacts.com Désinscription
[Qemu-devel] [PATCH 1/3] Add arch mask to headings but don't use it yet
Make it possible to specify what architecture a heading in the help doco applies to. Signed-off-by: Michael Ellerman mich...@ellerman.id.au --- A possibly nicer way to do this would be to add a new macro, perhaps ARCHHEADING, that is used for architecture specific headings. That would make the help source nicer, in that most headings would just be DEFHEADINGS - but it would mean we need to #define/undef another set of macros at each location we include the options. --- qemu-options.h |2 +- qemu-options.hx | 42 +- scripts/hxtool |2 +- vl.c|4 ++-- 4 files changed, 25 insertions(+), 25 deletions(-) diff --git a/qemu-options.h b/qemu-options.h index c96f994..c8c3022 100644 --- a/qemu-options.h +++ b/qemu-options.h @@ -31,7 +31,7 @@ enum { #define DEF(option, opt_arg, opt_enum, opt_help, arch_mask) \ opt_enum, -#define DEFHEADING(text) +#define DEFHEADING(text, arch_mask) #include qemu-options.def #undef DEF #undef DEFHEADING diff --git a/qemu-options.hx b/qemu-options.hx index 681eaf1..cf03763 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -6,7 +6,7 @@ HXCOMM construct option structures, enums and help message for specified HXCOMM architectures. HXCOMM HXCOMM can be used for comments, discarded from both texi and C -DEFHEADING(Standard options:) +DEFHEADING(Standard options:, QEMU_ARCH_ALL) STEXI @table @option ETEXI @@ -525,9 +525,9 @@ possible drivers and properties, use @code{-device ?} and @code{-device @var{driver},?}. ETEXI -DEFHEADING() +DEFHEADING(, QEMU_ARCH_ALL) -DEFHEADING(File system options:) +DEFHEADING(File system options:, QEMU_ARCH_ALL) DEF(fsdev, HAS_ARG, QEMU_OPTION_fsdev, -fsdev fsdriver,id=id,path=path,[security_model={mapped|passthrough|none}]\n @@ -583,9 +583,9 @@ Specifies the tag name to be used by the guest to mount this export point ETEXI -DEFHEADING() +DEFHEADING(, QEMU_ARCH_ALL) -DEFHEADING(Virtual File system pass-through options:) +DEFHEADING(Virtual File system pass-through options:, QEMU_ARCH_ALL) DEF(virtfs, HAS_ARG, QEMU_OPTION_virtfs, -virtfs local,path=path,mount_tag=tag,security_model=[mapped|passthrough|none]\n @@ -640,7 +640,7 @@ STEXI Create synthetic file system image ETEXI -DEFHEADING() +DEFHEADING(, QEMU_ARCH_ALL) DEF(name, HAS_ARG, QEMU_OPTION_name, -name string1[,process=string2]\n @@ -669,9 +669,9 @@ STEXI @end table ETEXI -DEFHEADING() +DEFHEADING(, QEMU_ARCH_ALL) -DEFHEADING(Display options:) +DEFHEADING(Display options:, QEMU_ARCH_ALL) STEXI @table @option @@ -1062,9 +1062,9 @@ STEXI @end table ETEXI -DEFHEADING() +DEFHEADING(, QEMU_ARCH_I386) -DEFHEADING(i386 target only:) +DEFHEADING(i386 target only:, QEMU_ARCH_I386) STEXI @table @option ETEXI @@ -1160,12 +1160,12 @@ Specify SMBIOS type 0 fields Specify SMBIOS type 1 fields ETEXI -DEFHEADING() +DEFHEADING(, QEMU_ARCH_ALL) STEXI @end table ETEXI -DEFHEADING(Network options:) +DEFHEADING(Network options:, QEMU_ARCH_ALL) STEXI @table @option ETEXI @@ -1484,9 +1484,9 @@ is activated if no @option{-net} options are provided. @end table ETEXI -DEFHEADING() +DEFHEADING(, QEMU_ARCH_ALL) -DEFHEADING(Character device options:) +DEFHEADING(Character device options:, QEMU_ARCH_ALL) DEF(chardev, HAS_ARG, QEMU_OPTION_chardev, -chardev null,id=id[,mux=on|off]\n @@ -1734,10 +1734,10 @@ Connect to a spice virtual machine channel, such as vdiport. @end table ETEXI -DEFHEADING() +DEFHEADING(, QEMU_ARCH_ALL) STEXI -DEFHEADING(Device URL Syntax:) +DEFHEADING(Device URL Syntax:, QEMU_ARCH_ALL) In addition to using normal file images for the emulated storage devices, QEMU can also use networked resources such as iSCSI devices. These are @@ -1823,7 +1823,7 @@ See also @url{http://http://www.osrg.net/sheepdog/}. @end table ETEXI -DEFHEADING(Bluetooth(R) options:) +DEFHEADING(Bluetooth(R) options:, QEMU_ARCH_ALL) DEF(bt, HAS_ARG, QEMU_OPTION_bt, \ -bt hci,nulldumb bluetooth HCI - doesn't respond to commands\n \ @@ -1893,9 +1893,9 @@ Virtual wireless keyboard implementing the HIDP bluetooth profile. @end table ETEXI -DEFHEADING() +DEFHEADING(, QEMU_ARCH_ALL) -DEFHEADING(Linux/Multiboot boot specific:) +DEFHEADING(Linux/Multiboot boot specific:, QEMU_ARCH_ALL) STEXI When using these options, you can use a given Linux or Multiboot @@ -1941,9 +1941,9 @@ STEXI @end table ETEXI -DEFHEADING() +DEFHEADING(, QEMU_ARCH_ALL) -DEFHEADING(Debug/Expert options:) +DEFHEADING(Debug/Expert options:, QEMU_ARCH_ALL) STEXI @table @option diff --git a/scripts/hxtool b/scripts/hxtool index 7ca83ed..8d07f01 100644 --- a/scripts/hxtool +++ b/scripts/hxtool @@ -45,7 +45,7 @@ hxtotexi() fi ;; DEFHEADING*) -echo $(expr $str : DEFHEADING(\(.*\))) +echo $(expr $str : DEFHEADING(\(.*\),.*)) ;; *) test $flag -eq 1 echo
[Qemu-devel] [PATCH 3/3] In qemu -h output, only print options for the arch we are running as
Only print options in the help output that are accepted by our arch. Signed-off-by: Michael Ellerman mich...@ellerman.id.au --- vl.c | 31 +-- 1 files changed, 17 insertions(+), 14 deletions(-) diff --git a/vl.c b/vl.c index b492f8c..ba8e76d 100644 --- a/vl.c +++ b/vl.c @@ -1492,28 +1492,31 @@ static void version(void) static void help(int exitcode) { -const char *options_help = -#define DEF(option, opt_arg, opt_enum, opt_help, arch_mask) \ -opt_help -#define DEFHEADING(text, arch_mask) stringify(text) \n +version(); +printf(usage: qemu [options] [disk_image]\n + \n + 'disk_image' is a raw hard disk image for IDE hard disk 0\n\n); + +#define DEF(option, opt_arg, opt_enum, opt_help, arch_mask)\ +if ((arch_mask) arch_type) \ +fputs(opt_help, stdout); + +#define DEFHEADING(text, arch_mask) \ +if ((arch_mask) arch_type)\ +puts(stringify(text)); + #include qemu-options.def #undef DEF #undef DEFHEADING #undef GEN_DOCS -; -version(); -printf(usage: qemu [options] [disk_image]\n - \n - 'disk_image' is a raw hard disk image for IDE hard disk 0\n - \n - %s\n - During emulation, the following keys are useful:\n + +printf(\nDuring emulation, the following keys are useful:\n ctrl-alt-f toggle full screen\n ctrl-alt-n switch to virtual console 'n'\n ctrl-alttoggle mouse and keyboard grab\n \n - When using -nographic, press 'ctrl-a h' to get some help.\n, - options_help); + When using -nographic, press 'ctrl-a h' to get some help.\n); + exit(exitcode); } -- 1.7.7.3
[Qemu-devel] [PATCH 2/3] vl.c: Fold constant string into printf rather than using %s
In help() we do what boils down to: printf(%s, qemu); This seems to be an artifact of be995c27640a82c7056b6f53d02ec823570114e5 (removed unused code), which removed some ifdef'ery that used to print a different name depending on CONFIG_SOFTMMU. But now that is gone and we always use qemu we may as well just put that in the format string. Signed-off-by: Michael Ellerman mich...@ellerman.id.au --- vl.c |3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/vl.c b/vl.c index 6a2ca6f..b492f8c 100644 --- a/vl.c +++ b/vl.c @@ -1502,7 +1502,7 @@ static void help(int exitcode) #undef GEN_DOCS ; version(); -printf(usage: %s [options] [disk_image]\n +printf(usage: qemu [options] [disk_image]\n \n 'disk_image' is a raw hard disk image for IDE hard disk 0\n \n @@ -1513,7 +1513,6 @@ static void help(int exitcode) ctrl-alttoggle mouse and keyboard grab\n \n When using -nographic, press 'ctrl-a h' to get some help.\n, - qemu, options_help); exit(exitcode); } -- 1.7.7.3
Re: [Qemu-devel] [PATCH] Fix parse of usb device description with multiple configurations
Hi, +} else if (descriptors[i + 5] != s-configuration) { +fprintf(stderr, not requested configuration %d\n, +s-configuration); +i += (descriptors[i + 3] 8) + descriptors[i + 2]; +continue; +} That message doesn't indicate an error and should be a DPRINTF instead of a fprintf. Otherwise the patch looks fine. cheers, Gerd
Re: [Qemu-devel] [PATCH 2/3] vl.c: Fold constant string into printf rather than using %s
Am 12.12.2011 09:21, schrieb Michael Ellerman: In help() we do what boils down to: printf(%s, qemu); This seems to be an artifact of be995c27640a82c7056b6f53d02ec823570114e5 (removed unused code), which removed some ifdef'ery that used to print a different name depending on CONFIG_SOFTMMU. But now that is gone and we always use qemu we may as well just put that in the format string. I would rather propose to save argv[0] and use that. By now qemu is not even correct for i386. Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] RFC: raw device support for block device targets
Am 11.12.2011 11:45, schrieb Alex Bligh: --On 8 December 2011 13:40:57 +0100 Kevin Wolf kw...@redhat.com wrote: qemu-img convert appears to support block devices as input, but not as output. That is irritating, as when using qemu-img convert to convert qcow to raw on a block partition, an intermediate file has to be used, which slows things down and pointlessly uses disk space. The problem is that ftruncate() is being called on the output file in order ensure it is sufficiently large, and this fails on block devices. ... Creating an image on a block device shouldn't even call raw_create(), but only hdev_create(), which doesn't try to truncate the device, but just uses lseek to make sure that it's large enough. Which qemu version are you using and what's your command line? I was testing on: amb@alex-test:~$ qemu-img --version qemu-img version 0.12.3, Copyright (c) 2004-2008 Fabrice Bellard That's the problem. It should work since 0.13. Kevin
Re: [Qemu-devel] [RFC 0/6] target-i386: TCGv cleanups
Am 12.12.2011 00:42, schrieb Paul Brook: This series makes target-i386 compile with DEBUG_TCGV_TL. What benefit does this provide? It showcases what changes would need to be done to allow type-safe compilation of the first pair of --enable-system targets. Especially my focus has been on how to do that without introducing loads of unnecessary temporaries. Andreas
Re: [Qemu-devel] [Bug 902148] [NEW] qemu-img V1.0 hangs on creating Image (0.15.1 runs)
On Fri, Dec 9, 2011 at 1:10 PM, Michael Niehren 902...@bugs.launchpad.net wrote: Strace on the hanging qemu-img ends on: select(5, [4], [], NULL, NULL) = 1 (in [4]) read(4, \0, 16) = 1 close(3) = 0 open(test.img, O_RDONLY|O_NONBLOCK) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=131072, ...}) = 0 close(3) = 0 open(test.img, O_RDONLY|O_NONBLOCK) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=131072, ...}) = 0 close(3) = 0 stat(test.img, {st_mode=S_IFREG|0644, st_size=131072, ...}) = 0 open(test.img, O_RDWR|O_CLOEXEC) = 3 lseek(3, 0, SEEK_END) = 131072 next line in the strace on working qemu-img V0.15.1 is: pread(3, QFI\373\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0\0\0\0..., 512, 0) = 512 ... When it hangs does it consume CPU? Please attach gdb to the process and capture a backtrace: $ gdb -p $PID_OF_QEMU (gdb) thread apply all bt Stefan
Re: [Qemu-devel] [PATCH] hw/usb-net.c: Fix precedence bug when checking rndis_state
On 14 November 2011 17:21, andrzej zaborowski balr...@gmail.com wrote: On 14 November 2011 09:08, Peter Maydell peter.mayd...@linaro.org wrote: I'm happy that non-rndis works, I tested that. What I don't know is whether the patch breaks rndis Sorry, I misread what you said assuming that you tested a branch affected by this patch. I'm unable to find a guest to test the rndis mode so I reverted the patch until after release. Applying is probably the best way to get someone to test a change like this. Well, we're past release now, I think we could reasonably (re)apply this patch now? thanks -- PMM
Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC
On Sun, Dec 11, 2011 at 10:29 AM, Alon Levy al...@redhat.com wrote: On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote: Hi there, I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out that the command client_migrate_info uses it. That's a legacy interface and has to be dropped, no command should be using it... Something tells me that if I just drop it (and change the command to use the regular interface), bad things will happen. Am I right? :) The monitor command client_migrate_info needs to complete after getting an ACK message from the currently connected spice client (this is the only case where this is required - if there is no client then the MONITOR_CMD_ASYNC API won't be used). This in turn requires the main thread to perform select and call the callback that will process this ACK. That's why the MONITOR_CMD_ASYNC API was used. I'm not aware of any other way to do this, I'll be glad for any help here. Also, I understand this is not what is not true async, since one would expect a true async interface to support multiple in flight monitor commands. If there is any ETA or existing way to do this we could change the implementation of client_migrate_info. Is it possible to use a QMP event to signal completion instead of a MONITOR_CMD_ASYNC command? Stefan
Re: [Qemu-devel] RFC: raw device support for block device targets
--On 12 December 2011 10:48:43 +0100 Kevin Wolf kw...@redhat.com wrote: I was testing on: amb@alex-test:~$ qemu-img --version qemu-img version 0.12.3, Copyright (c) 2004-2008 Fabrice Bellard That's the problem. It should work since 0.13. Thanks -- Alex Bligh
Re: [Qemu-devel] [PATCH 0/4] tcg: Add debug facilities for TCGv
Am 12.12.2011 00:28, schrieb Paul Brook: What mismatches does this catch that the existing debug code doesn't? Cf. patch 4/4: TCGv tmp = tcg_temp_new_i32(); tcg_temp_free_i32(tmp); TCGv_i32 tmp2 = tcg_temp_new(); tcg_temp_free(tmp2); Why is this a problem? If TARGET_LONG_BITS==32 then tcg_temp_free and tcg_temp_free_i32 are synonyms, and everything is happy. If TARGET_LONG_BITS==64 then we already flag this as an error. Try compiling --target-list=arm-softmmu --enable-debug-tcg with my series and DEBUG_TCGV_TL uncommented, and you'll see for yourself. There's too many to mention and for me to actually fix. You'll have to deal with it for ARMv8 at some point and this series hopefully helps. That's exactly why I think this patch is a bad idea. If a target always has TARGET_LONG_BITS==32 then it doesn't matter if we mix TCGv and TCGv_i32. To me and my perfectionism it does matter. This is not about fixing some user-visible runtime bug, it's about making the developer (me) aware of unintended type mixups. Trying to make a 32-bit target 64-bit safe without actually implementing the 64-bit target is a complete waste of time. That's where we disagree. I rather do things right from the start than leaving the cleanup work to someone else later on. You've almost no chance of getting it right. In some cases the correct answer will be to use 32-bit arithmetic, then sign/zero extend the result. In other cases the correct answer will be to perform word size arithmetic. Blindly picking one just makes the bugs harder to find later. This series picks nothing blindly. It provides an optional facility for the developer to be made aware of uses of tl where i32/i64 was intended (or vice versa), nothing changes at runtime. Whether and in which way the developer addresses the issues shown that way is up to the developer, and as any line added to a new target must be decided on a case-by-case basis. For me the most annoying issue was that tcg_gen_qemu_{ld,st}* needs TCGv. If you're trying to add support for targets where the primary word size is neither 32 nor 64 then that's a completely different problem, and probably not one that's worth solving. In practice your port is going to end up using 64- bit arithmetic and explicitly compensating for the excess precision where necessary. That's a different issue and not being addressed here. My point was that I have an inheritance hierarchy where (as opposed to, e.g., ppc and ppc64) both end up as TARGET_LONG_BITS==32, so I do not get a check for free by compiling both, and I do not want to introduce an artificial TARGET_LONG_BITS==64 architecture just to check that my tl==i32 target code is free of typos. Same issue if you pick only --target-list=some-softmmu BTW. Just like with softfloat [u]int16 etc. types it's just too easy to forget _t somewhere (here: _i32) and to end up with a type you don't want. If you have a better proposal how to introduce the checks I want, please let us hear it. If no one has, I don't see how this series would hurt (1-3 refactoring, 4 not enabled by default) while providing useful new debug facilities to developers of new targets. Andreas
Re: [Qemu-devel] [PATCH 2/6] memory: change dirty setting APIs to take a size
Blue Swirl blauwir...@gmail.com wrote: Instead of each target knowing or guessing the guest page size, just pass the desired size of dirtied memory area. This should also improve performance due to memset() optimizations. My understanding last time I looked at this, is that it is as easy basically to change the interface to work on pages, not on addresses. I looked at the interface from the point of view of migration, and I always needed a page. My understanding at the time was that every caller could be change to use a page instead, but that is another set of unfinished patches on my ToDo list. Actually, what I found what I needed was that I wanted a pointer to the ram_block on that functions. Migration and vga were simple to fix, but TCG got me completely lost :-(, and at that point I moved to someting else. Will try to think what were my problems later today. Later, Juan.
[Qemu-devel] [PULL 0/8] arm-devs queue
The following changes since commit f18318eef8b4b263f4e82a5338c9b2875a6c73c8: Merge branch 'master' of git://git.qemu.org/qemu (2011-12-12 04:12:31 +0400) are available in the git repository at: git://git.linaro.org/people/pmaydell/qemu-arm.git arm-devs.for-upstream Peter Chubb (1): Fix sp804 dual-timer Peter Maydell (7): hw/arm_mptimer.c: Turn ARM MPcore private timers into qdev devices hw/arm_gic: Expose GIC CPU interfaces as sysbus memory regions hw/mpcore.c: Use the GIC memory regions for the CPU interface hw/realview_gic: Use GIC memory region for the CPU interface hw/mpcore: Clean up mpcore_priv_read/write as they are now SCU only hw/a9mpcore.c: Implement A9MP peripherals rather than 11MPcore ones hw/mpcore.c: Merge with hw/arm11mpcore.c Makefile.target |1 + hw/a9mpcore.c | 189 -- hw/arm11mpcore.c | 130 +- hw/arm_gic.c | 75 - hw/arm_mptimer.c | 332 + hw/arm_timer.c| 41 ++- hw/mpcore.c | 283 - hw/realview_gic.c | 25 + 8 files changed, 751 insertions(+), 325 deletions(-) create mode 100644 hw/arm_mptimer.c delete mode 100644 hw/mpcore.c
[Qemu-devel] [PATCH 8/8] hw/mpcore.c: Merge with hw/arm11mpcore.c
hw/mpcore.c is now implementing only ARM11MPCore specific peripherals, and is #included only from hw/arm11mpcore.c, so just merge it into that file. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- hw/arm11mpcore.c | 130 ++- hw/mpcore.c | 137 -- 2 files changed, 129 insertions(+), 138 deletions(-) delete mode 100644 hw/mpcore.c diff --git a/hw/arm11mpcore.c b/hw/arm11mpcore.c index 32ecf98..bc0457e 100644 --- a/hw/arm11mpcore.c +++ b/hw/arm11mpcore.c @@ -7,11 +7,139 @@ * This code is licensed under the GPL. */ +#include sysbus.h +#include qemu-timer.h + /* ??? The MPCore TRM says the on-chip controller has 224 external IRQ lines (+ 32 internal). However my test chip only exposes/reports 32. More importantly Linux falls over if more than 32 are present! */ #define GIC_NIRQ 64 -#include mpcore.c + +#define NCPU 4 + +static inline int +gic_get_current_cpu(void) +{ + return cpu_single_env-cpu_index; +} + +#include arm_gic.c + +/* MPCore private memory region. */ + +typedef struct mpcore_priv_state { +gic_state gic; +uint32_t scu_control; +int iomemtype; +uint32_t old_timer_status[8]; +uint32_t num_cpu; +qemu_irq *timer_irq; +MemoryRegion iomem; +MemoryRegion container; +DeviceState *mptimer; +} mpcore_priv_state; + +/* Per-CPU private memory mapped IO. */ + +static uint64_t mpcore_scu_read(void *opaque, target_phys_addr_t offset, +unsigned size) +{ +mpcore_priv_state *s = (mpcore_priv_state *)opaque; +int id; +offset = 0xff; +/* SCU */ +switch (offset) { +case 0x00: /* Control. */ +return s-scu_control; +case 0x04: /* Configuration. */ +id = ((1 s-num_cpu) - 1) 4; +return id | (s-num_cpu - 1); +case 0x08: /* CPU status. */ +return 0; +case 0x0c: /* Invalidate all. */ +return 0; +default: +hw_error(mpcore_priv_read: Bad offset %x\n, (int)offset); +} +} + +static void mpcore_scu_write(void *opaque, target_phys_addr_t offset, + uint64_t value, unsigned size) +{ +mpcore_priv_state *s = (mpcore_priv_state *)opaque; +offset = 0xff; +/* SCU */ +switch (offset) { +case 0: /* Control register. */ +s-scu_control = value 1; +break; +case 0x0c: /* Invalidate all. */ +/* This is a no-op as cache is not emulated. */ +break; +default: +hw_error(mpcore_priv_read: Bad offset %x\n, (int)offset); +} +} + +static const MemoryRegionOps mpcore_scu_ops = { +.read = mpcore_scu_read, +.write = mpcore_scu_write, +.endianness = DEVICE_NATIVE_ENDIAN, +}; + +static void mpcore_timer_irq_handler(void *opaque, int irq, int level) +{ +mpcore_priv_state *s = (mpcore_priv_state *)opaque; +if (level !s-old_timer_status[irq]) { +gic_set_pending_private(s-gic, irq 1, 29 + (irq 1)); +} +s-old_timer_status[irq] = level; +} + +static void mpcore_priv_map_setup(mpcore_priv_state *s) +{ +int i; +SysBusDevice *busdev = sysbus_from_qdev(s-mptimer); +memory_region_init(s-container, mpcode-priv-container, 0x2000); +memory_region_init_io(s-iomem, mpcore_scu_ops, s, mpcore-scu, 0x100); +memory_region_add_subregion(s-container, 0, s-iomem); +/* GIC CPU interfaces: current CPU at 0x100, then specific CPUs + * at 0x200, 0x300... + */ +for (i = 0; i (s-num_cpu + 1); i++) { +target_phys_addr_t offset = 0x100 + (i * 0x100); +memory_region_add_subregion(s-container, offset, s-gic.cpuiomem[i]); +} +/* Add the regions for timer and watchdog for current CPU and + * for each specific CPU. + */ +s-timer_irq = qemu_allocate_irqs(mpcore_timer_irq_handler, + s, (s-num_cpu + 1) * 2); +for (i = 0; i (s-num_cpu + 1) * 2; i++) { +/* Timers at 0x600, 0x700, ...; watchdogs at 0x620, 0x720, ... */ +target_phys_addr_t offset = 0x600 + (i 1) * 0x100 + (i 1) * 0x20; +memory_region_add_subregion(s-container, offset, +sysbus_mmio_get_region(busdev, i)); +} +memory_region_add_subregion(s-container, 0x1000, s-gic.iomem); +/* Wire up the interrupt from each watchdog and timer. */ +for (i = 0; i s-num_cpu * 2; i++) { +sysbus_connect_irq(busdev, i, s-timer_irq[i]); +} +} + +static int mpcore_priv_init(SysBusDevice *dev) +{ +mpcore_priv_state *s = FROM_SYSBUSGIC(mpcore_priv_state, dev); + +gic_init(s-gic, s-num_cpu); +s-mptimer = qdev_create(NULL, arm_mptimer); +qdev_prop_set_uint32(s-mptimer, num-cpu, s-num_cpu); +qdev_init_nofail(s-mptimer); +mpcore_priv_map_setup(s); +sysbus_init_mmio(dev, s-container); +return 0; +} /* Dummy PIC to route IRQ lines. The baseboard has 4 independent IRQ controllers.
[Qemu-devel] [PATCH 2/8] hw/arm_mptimer.c: Turn ARM MPcore private timers into qdev devices
Turn the ARM MPcore private timer/watchdog blocks into separate qdev devices. This will allow us to share them neatly between 11MPCore and A9MPcore. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- Makefile.target |1 + hw/arm_mptimer.c | 332 ++ hw/mpcore.c | 184 +- 3 files changed, 365 insertions(+), 152 deletions(-) create mode 100644 hw/arm_mptimer.c diff --git a/Makefile.target b/Makefile.target index a111521..39b2e5a 100644 --- a/Makefile.target +++ b/Makefile.target @@ -344,6 +344,7 @@ obj-arm-y = integratorcp.o versatilepb.o arm_pic.o arm_timer.o obj-arm-y += arm_boot.o pl011.o pl031.o pl050.o pl080.o pl110.o pl181.o pl190.o obj-arm-y += versatile_pci.o obj-arm-y += realview_gic.o realview.o arm_sysctl.o arm11mpcore.o a9mpcore.o +obj-arm-y += arm_mptimer.o obj-arm-y += armv7m.o armv7m_nvic.o stellaris.o pl022.o stellaris_enet.o obj-arm-y += pl061.o obj-arm-y += arm-semi.o diff --git a/hw/arm_mptimer.c b/hw/arm_mptimer.c new file mode 100644 index 000..455a0aa --- /dev/null +++ b/hw/arm_mptimer.c @@ -0,0 +1,332 @@ +/* + * Private peripheral timer/watchdog blocks for ARM 11MPCore and A9MP + * + * Copyright (c) 2006-2007 CodeSourcery. + * Copyright (c) 2011 Linaro Limited + * Written by Paul Brook, Peter Maydell + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see http://www.gnu.org/licenses/. + */ + +#include sysbus.h +#include qemu-timer.h + +/* This device implements the per-cpu private timer and watchdog block + * which is used in both the ARM11MPCore and Cortex-A9MP. + */ + +#define MAX_CPUS 4 + +/* State of a single timer or watchdog block */ +typedef struct { +uint32_t count; +uint32_t load; +uint32_t control; +uint32_t status; +int64_t tick; +QEMUTimer *timer; +qemu_irq irq; +MemoryRegion iomem; +} timerblock; + +typedef struct { +SysBusDevice busdev; +uint32_t num_cpu; +timerblock timerblock[MAX_CPUS * 2]; +MemoryRegion iomem[2]; +} arm_mptimer_state; + +static inline int get_current_cpu(arm_mptimer_state *s) +{ +if (cpu_single_env-cpu_index = s-num_cpu) { +hw_error(arm_mptimer: num-cpu %d but this cpu is %d!\n, + s-num_cpu, cpu_single_env-cpu_index); +} +return cpu_single_env-cpu_index; +} + +static inline void timerblock_update_irq(timerblock *tb) +{ +qemu_set_irq(tb-irq, tb-status); +} + +/* Return conversion factor from mpcore timer ticks to qemu timer ticks. */ +static inline uint32_t timerblock_scale(timerblock *tb) +{ +return (((tb-control 8) 0xff) + 1) * 10; +} + +static void timerblock_reload(timerblock *tb, int restart) +{ +if (tb-count == 0) { +return; +} +if (restart) { +tb-tick = qemu_get_clock_ns(vm_clock); +} +tb-tick += (int64_t)tb-count * timerblock_scale(tb); +qemu_mod_timer(tb-timer, tb-tick); +} + +static void timerblock_tick(void *opaque) +{ +timerblock *tb = (timerblock *)opaque; +tb-status = 1; +if (tb-control 2) { +tb-count = tb-load; +timerblock_reload(tb, 0); +} else { +tb-count = 0; +} +timerblock_update_irq(tb); +} + +static uint64_t timerblock_read(void *opaque, target_phys_addr_t addr, +unsigned size) +{ +timerblock *tb = (timerblock *)opaque; +int64_t val; +addr = 0x1f; +switch (addr) { +case 0: /* Load */ +return tb-load; +case 4: /* Counter. */ +if (((tb-control 1) == 0) || (tb-count == 0)) { +return 0; +} +/* Slow and ugly, but hopefully won't happen too often. */ +val = tb-tick - qemu_get_clock_ns(vm_clock); +val /= timerblock_scale(tb); +if (val 0) { +val = 0; +} +return val; +case 8: /* Control. */ +return tb-control; +case 12: /* Interrupt status. */ +return tb-status; +default: +return 0; +} +} + +static void timerblock_write(void *opaque, target_phys_addr_t addr, + uint64_t value, unsigned size) +{ +timerblock *tb = (timerblock *)opaque; +int64_t old; +addr = 0x1f; +switch (addr) { +case 0: /* Load */ +tb-load = value; +/* Fall through. */ +case 4: /* Counter. */ +if ((tb-control 1) tb-count) { +/* Cancel
[Qemu-devel] [PATCH] Add a .mailmap to map pre-git-conversion authors to friendly names
Add a .mailmap file so 'git shortlog' can map the unfriendly pre-git-conversion author entries to real names. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- v1-v2: fixed Andrzej's email to match MAINTAINERS file added entries for Fabrice Bellard, Jocelyn Mayer cc'd active maintainers with an entry in the mailmap v2-v3: sorted into alpha order by first column .mailmap | 16 1 files changed, 16 insertions(+), 0 deletions(-) create mode 100644 .mailmap diff --git a/.mailmap b/.mailmap new file mode 100644 index 000..9797802 --- /dev/null +++ b/.mailmap @@ -0,0 +1,16 @@ +# This mailmap just translates the weird addresses from the original import into git +# into proper addresses so that they are counted properly in git shortlog output. +# +Andrzej Zaborowski balr...@gmail.com balrog balrog@c046a42c-6fe2-441c-8c8c-71466251a162 +Anthony Liguori aligu...@us.ibm.com aliguori aliguori@c046a42c-6fe2-441c-8c8c-71466251a162 +Aurelien Jarno aurel...@aurel32.net aurel32 aurel32@c046a42c-6fe2-441c-8c8c-71466251a162 +Blue Swirl blauwir...@gmail.com blueswir1 blueswir1@c046a42c-6fe2-441c-8c8c-71466251a162 +Edgar E. Iglesias edgar.igles...@gmail.com edgar_igl edgar_igl@c046a42c-6fe2-441c-8c8c-71466251a162 +Fabrice Bellard fabr...@bellard.org bellard bellard@c046a42c-6fe2-441c-8c8c-71466251a162 +Jocelyn Mayer l_ind...@magic.fr j_mayer j_mayer@c046a42c-6fe2-441c-8c8c-71466251a162 +Paul Brook p...@codesourcery.com pbrook pbrook@c046a42c-6fe2-441c-8c8c-71466251a162 +Thiemo Seufer t...@networkno.de ths ths@c046a42c-6fe2-441c-8c8c-71466251a162 +malc av1...@comtv.ru malc malc@c046a42c-6fe2-441c-8c8c-71466251a162 +# There is also a: +#(no author) (no author)@c046a42c-6fe2-441c-8c8c-71466251a162 +# for the cvs2svn initialization commit e63c3dc74bf. -- 1.7.1
[Qemu-devel] [PATCH 4/8] hw/mpcore.c: Use the GIC memory regions for the CPU interface
Switch to using the GIC memory regions for the CPU interface rather than hand implementing them as a subcase of mpcore_priv_read() and mpcore_priv_write(). Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- hw/mpcore.c | 35 ++- 1 files changed, 10 insertions(+), 25 deletions(-) diff --git a/hw/mpcore.c b/hw/mpcore.c index 3d64609..a0af1ad 100644 --- a/hw/mpcore.c +++ b/hw/mpcore.c @@ -41,7 +41,7 @@ static uint64_t mpcore_priv_read(void *opaque, target_phys_addr_t offset, { mpcore_priv_state *s = (mpcore_priv_state *)opaque; int id; -offset = 0xfff; +offset = 0xff; if (offset 0x100) { /* SCU */ switch (offset) { @@ -57,17 +57,6 @@ static uint64_t mpcore_priv_read(void *opaque, target_phys_addr_t offset, default: goto bad_reg; } -} else if (offset 0x600) { -/* Interrupt controller. */ -if (offset 0x200) { -id = gic_get_current_cpu(); -} else { -id = (offset - 0x200) 8; -if (id = s-num_cpu) { -return 0; -} -} -return gic_cpu_read(s-gic, id, offset 0xff); } bad_reg: hw_error(mpcore_priv_read: Bad offset %x\n, (int)offset); @@ -78,8 +67,7 @@ static void mpcore_priv_write(void *opaque, target_phys_addr_t offset, uint64_t value, unsigned size) { mpcore_priv_state *s = (mpcore_priv_state *)opaque; -int id; -offset = 0xfff; +offset = 0xff; if (offset 0x100) { /* SCU */ switch (offset) { @@ -92,16 +80,6 @@ static void mpcore_priv_write(void *opaque, target_phys_addr_t offset, default: goto bad_reg; } -} else if (offset 0x600) { -/* Interrupt controller. */ -if (offset 0x200) { -id = gic_get_current_cpu(); -} else { -id = (offset - 0x200) 8; -} -if (id s-num_cpu) { -gic_cpu_write(s-gic, id, offset 0xff, value); -} } return; bad_reg: @@ -129,8 +107,15 @@ static void mpcore_priv_map_setup(mpcore_priv_state *s) SysBusDevice *busdev = sysbus_from_qdev(s-mptimer); memory_region_init(s-container, mpcode-priv-container, 0x2000); memory_region_init_io(s-iomem, mpcore_priv_ops, s, mpcode-priv, - 0x1000); + 0x100); memory_region_add_subregion(s-container, 0, s-iomem); +/* GIC CPU interfaces: current CPU at 0x100, then specific CPUs + * at 0x200, 0x300... + */ +for (i = 0; i (s-num_cpu + 1); i++) { +target_phys_addr_t offset = 0x100 + (i * 0x100); +memory_region_add_subregion(s-container, offset, s-gic.cpuiomem[i]); +} /* Add the regions for timer and watchdog for current CPU and * for each specific CPU. */ -- 1.7.1
[Qemu-devel] [PATCH 7/8] hw/a9mpcore.c: Implement A9MP peripherals rather than 11MPcore ones
Implement the A9MP private peripheral region correctly, rather than piggybacking on the 11MPCore code; the two CPUs are not the same in this area. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- hw/a9mpcore.c | 189 ++--- 1 files changed, 179 insertions(+), 10 deletions(-) diff --git a/hw/a9mpcore.c b/hw/a9mpcore.c index 6f108f4..cd2985f 100644 --- a/hw/a9mpcore.c +++ b/hw/a9mpcore.c @@ -2,28 +2,197 @@ * Cortex-A9MPCore internal peripheral emulation. * * Copyright (c) 2009 CodeSourcery. - * Written by Paul Brook + * Copyright (c) 2011 Linaro Limited. + * Written by Paul Brook, Peter Maydell. * * This code is licensed under the GPL. */ -/* 64 external IRQ lines. */ +#include sysbus.h + +/* Configuration for arm_gic.c: + * number of external IRQ lines, max number of CPUs, how to ID current CPU + */ #define GIC_NIRQ 96 -#include mpcore.c +#define NCPU 4 + +static inline int +gic_get_current_cpu(void) +{ + return cpu_single_env-cpu_index; +} + +#include arm_gic.c + +/* A9MP private memory region. */ + +typedef struct a9mp_priv_state { +gic_state gic; +uint32_t scu_control; +uint32_t old_timer_status[8]; +uint32_t num_cpu; +qemu_irq *timer_irq; +MemoryRegion scu_iomem; +MemoryRegion ptimer_iomem; +MemoryRegion container; +DeviceState *mptimer; +} a9mp_priv_state; + +static uint64_t a9_scu_read(void *opaque, target_phys_addr_t offset, +unsigned size) +{ +a9mp_priv_state *s = (a9mp_priv_state *)opaque; +switch (offset) { +case 0x00: /* Control */ +return s-scu_control; +case 0x04: /* Configuration */ +return (((1 s-num_cpu) - 1) 4) | (s-num_cpu - 1); +case 0x08: /* CPU Power Status */ +return 0; +case 0x0c: /* Invalidate All Registers In Secure State */ +return 0; +case 0x40: /* Filtering Start Address Register */ +case 0x44: /* Filtering End Address Register */ +/* RAZ/WI, like an implementation with only one AXI master */ +return 0; +case 0x50: /* SCU Access Control Register */ +case 0x54: /* SCU Non-secure Access Control Register */ +/* unimplemented, fall through */ +default: +return 0; +} +} + +static void a9_scu_write(void *opaque, target_phys_addr_t offset, + uint64_t value, unsigned size) +{ +a9mp_priv_state *s = (a9mp_priv_state *)opaque; +switch (offset) { +case 0x00: /* Control */ +s-scu_control = value 1; +break; +case 0x4: /* Configuration: RO */ +break; +case 0x0c: /* Invalidate All Registers In Secure State */ +/* no-op as we do not implement caches */ +break; +case 0x40: /* Filtering Start Address Register */ +case 0x44: /* Filtering End Address Register */ +/* RAZ/WI, like an implementation with only one AXI master */ +break; +case 0x8: /* CPU Power Status */ +case 0x50: /* SCU Access Control Register */ +case 0x54: /* SCU Non-secure Access Control Register */ +/* unimplemented, fall through */ +default: +break; +} +} + +static const MemoryRegionOps a9_scu_ops = { +.read = a9_scu_read, +.write = a9_scu_write, +.endianness = DEVICE_NATIVE_ENDIAN, +}; + +static void a9mpcore_timer_irq_handler(void *opaque, int irq, int level) +{ +a9mp_priv_state *s = (a9mp_priv_state *)opaque; +if (level !s-old_timer_status[irq]) { +gic_set_pending_private(s-gic, irq 1, 29 + (irq 1)); +} +s-old_timer_status[irq] = level; +} + +static void a9mp_priv_reset(DeviceState *dev) +{ +a9mp_priv_state *s = FROM_SYSBUSGIC(a9mp_priv_state, sysbus_from_qdev(dev)); +int i; +s-scu_control = 0; +for (i = 0; i ARRAY_SIZE(s-old_timer_status); i++) { +s-old_timer_status[i] = 0; +} +} + +static int a9mp_priv_init(SysBusDevice *dev) +{ +a9mp_priv_state *s = FROM_SYSBUSGIC(a9mp_priv_state, dev); +SysBusDevice *busdev; +int i; + +if (s-num_cpu NCPU) { +hw_error(a9mp_priv_init: num-cpu may not be more than %d\n, NCPU); +} + +gic_init(s-gic, s-num_cpu); + +s-mptimer = qdev_create(NULL, arm_mptimer); +qdev_prop_set_uint32(s-mptimer, num-cpu, s-num_cpu); +qdev_init_nofail(s-mptimer); +busdev = sysbus_from_qdev(s-mptimer); + +/* Memory map (addresses are offsets from PERIPHBASE): + * 0x-0x00ff -- Snoop Control Unit + * 0x0100-0x01ff -- GIC CPU interface + * 0x0200-0x02ff -- Global Timer + * 0x0300-0x05ff -- nothing + * 0x0600-0x06ff -- private timers and watchdogs + * 0x0700-0x0fff -- nothing + * 0x1000-0x1fff -- GIC Distributor + * + * We should implement the global timer but don't currently do so. + */ +memory_region_init(s-container, a9mp-priv-container, 0x2000); +memory_region_init_io(s-scu_iomem, a9_scu_ops, s, a9mp-scu, 0x100); +
Re: [Qemu-devel] [RFC 0/4] virtio-mmio transport
On Mon, 2011-12-12 at 02:52 +, Paul Brook wrote: I've taken a look at the virtion-mmio spec, and it looks fairly reasonable. The only thing I'd change is the GuestPageSize/QueuePFN mess. Seems like just using straight 64-bit addresses would be a better solution. Maybe split into a high/low pair to keep all registers as 32-bit registers. This can be done fairly simple by: 1. Bumping Version register content. 2. Adding two registers: QueueAddrLow (0x44) and QueueAddrHigh (0x48). 3. Describing the QueuePFN as obsolete. Than the driver would just use Addr or PFN depending on the device's version. I can do that, but not this year (on holiday from Friday 16th, without any access to Internet whatsoever :-) One think to be decided is in what order the halfs should be filled? Low first, then high? High then low? Does it matter at all? :-) Cheers! Paweł
[Qemu-devel] [PATCH 6/8] hw/mpcore: Clean up mpcore_priv_read/write as they are now SCU only
The only code left in mpcore_priv_read and mpcore_priv_write is now the implementation of the SCU registers. Clean up by renaming functions and removing some unnecessary conditionals to make this clearer. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- hw/mpcore.c | 73 +-- 1 files changed, 31 insertions(+), 42 deletions(-) diff --git a/hw/mpcore.c b/hw/mpcore.c index a0af1ad..670d7e5 100644 --- a/hw/mpcore.c +++ b/hw/mpcore.c @@ -36,59 +36,49 @@ typedef struct mpcore_priv_state { /* Per-CPU private memory mapped IO. */ -static uint64_t mpcore_priv_read(void *opaque, target_phys_addr_t offset, - unsigned size) +static uint64_t mpcore_scu_read(void *opaque, target_phys_addr_t offset, +unsigned size) { mpcore_priv_state *s = (mpcore_priv_state *)opaque; int id; offset = 0xff; -if (offset 0x100) { -/* SCU */ -switch (offset) { -case 0x00: /* Control. */ -return s-scu_control; -case 0x04: /* Configuration. */ -id = ((1 s-num_cpu) - 1) 4; -return id | (s-num_cpu - 1); -case 0x08: /* CPU status. */ -return 0; -case 0x0c: /* Invalidate all. */ -return 0; -default: -goto bad_reg; -} +/* SCU */ +switch (offset) { +case 0x00: /* Control. */ +return s-scu_control; +case 0x04: /* Configuration. */ +id = ((1 s-num_cpu) - 1) 4; +return id | (s-num_cpu - 1); +case 0x08: /* CPU status. */ +return 0; +case 0x0c: /* Invalidate all. */ +return 0; +default: +hw_error(mpcore_priv_read: Bad offset %x\n, (int)offset); } -bad_reg: -hw_error(mpcore_priv_read: Bad offset %x\n, (int)offset); -return 0; } -static void mpcore_priv_write(void *opaque, target_phys_addr_t offset, - uint64_t value, unsigned size) +static void mpcore_scu_write(void *opaque, target_phys_addr_t offset, + uint64_t value, unsigned size) { mpcore_priv_state *s = (mpcore_priv_state *)opaque; offset = 0xff; -if (offset 0x100) { -/* SCU */ -switch (offset) { -case 0: /* Control register. */ -s-scu_control = value 1; -break; -case 0x0c: /* Invalidate all. */ -/* This is a no-op as cache is not emulated. */ -break; -default: -goto bad_reg; -} +/* SCU */ +switch (offset) { +case 0: /* Control register. */ +s-scu_control = value 1; +break; +case 0x0c: /* Invalidate all. */ +/* This is a no-op as cache is not emulated. */ +break; +default: +hw_error(mpcore_priv_read: Bad offset %x\n, (int)offset); } -return; -bad_reg: -hw_error(mpcore_priv_read: Bad offset %x\n, (int)offset); } -static const MemoryRegionOps mpcore_priv_ops = { -.read = mpcore_priv_read, -.write = mpcore_priv_write, +static const MemoryRegionOps mpcore_scu_ops = { +.read = mpcore_scu_read, +.write = mpcore_scu_write, .endianness = DEVICE_NATIVE_ENDIAN, }; @@ -106,8 +96,7 @@ static void mpcore_priv_map_setup(mpcore_priv_state *s) int i; SysBusDevice *busdev = sysbus_from_qdev(s-mptimer); memory_region_init(s-container, mpcode-priv-container, 0x2000); -memory_region_init_io(s-iomem, mpcore_priv_ops, s, mpcode-priv, - 0x100); +memory_region_init_io(s-iomem, mpcore_scu_ops, s, mpcore-scu, 0x100); memory_region_add_subregion(s-container, 0, s-iomem); /* GIC CPU interfaces: current CPU at 0x100, then specific CPUs * at 0x200, 0x300... -- 1.7.1
[Qemu-devel] [PATCH 5/8] hw/realview_gic: Use GIC memory region for the CPU interface
Use the GIC provided memory region for the CPU interface rather than implementing our own. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- hw/realview_gic.c | 25 + 1 files changed, 1 insertions(+), 24 deletions(-) diff --git a/hw/realview_gic.c b/hw/realview_gic.c index 479f939..8c4d509 100644 --- a/hw/realview_gic.c +++ b/hw/realview_gic.c @@ -23,36 +23,13 @@ gic_get_current_cpu(void) typedef struct { gic_state gic; -MemoryRegion iomem; MemoryRegion container; } RealViewGICState; -static uint64_t realview_gic_cpu_read(void *opaque, target_phys_addr_t offset, - unsigned size) -{ -gic_state *s = (gic_state *)opaque; -return gic_cpu_read(s, gic_get_current_cpu(), offset); -} - -static void realview_gic_cpu_write(void *opaque, target_phys_addr_t offset, - uint64_t value, unsigned size) -{ -gic_state *s = (gic_state *)opaque; -gic_cpu_write(s, gic_get_current_cpu(), offset, value); -} - -static const MemoryRegionOps realview_gic_cpu_ops = { -.read = realview_gic_cpu_read, -.write = realview_gic_cpu_write, -.endianness = DEVICE_NATIVE_ENDIAN, -}; - static void realview_gic_map_setup(RealViewGICState *s) { memory_region_init(s-container, realview-gic-container, 0x2000); -memory_region_init_io(s-iomem, realview_gic_cpu_ops, s-gic, - realview-gic, 0x1000); -memory_region_add_subregion(s-container, 0, s-iomem); +memory_region_add_subregion(s-container, 0, s-gic.cpuiomem[0]); memory_region_add_subregion(s-container, 0x1000, s-gic.iomem); } -- 1.7.1
[Qemu-devel] [PATCH 3/8] hw/arm_gic: Expose GIC CPU interfaces as sysbus memory regions
Expose the ARM GIC CPU interfaces as memory regions, rather than just providing read and write functions for them. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- hw/arm_gic.c | 75 +- 1 files changed, 74 insertions(+), 1 deletions(-) diff --git a/hw/arm_gic.c b/hw/arm_gic.c index 527c9ce..66c48fd 100644 --- a/hw/arm_gic.c +++ b/hw/arm_gic.c @@ -103,7 +103,14 @@ typedef struct gic_state int num_cpu; #endif -MemoryRegion iomem; +MemoryRegion iomem; /* Distributor */ +#ifndef NVIC +/* This is just so we can have an opaque pointer which identifies + * both this GIC and which CPU interface we should be accessing. + */ +struct gic_state *backref[NCPU]; +MemoryRegion cpuiomem[NCPU+1]; /* CPU interfaces */ +#endif } gic_state; /* TODO: Many places that call this routine could be optimized. */ @@ -633,6 +640,54 @@ static void gic_cpu_write(gic_state *s, int cpu, int offset, uint32_t value) } gic_update(s); } + +/* Wrappers to read/write the GIC CPU interface for the current CPU */ +static uint64_t gic_thiscpu_read(void *opaque, target_phys_addr_t addr, + unsigned size) +{ +gic_state *s = (gic_state *)opaque; +return gic_cpu_read(s, gic_get_current_cpu(), addr 0xff); +} + +static void gic_thiscpu_write(void *opaque, target_phys_addr_t addr, + uint64_t value, unsigned size) +{ +gic_state *s = (gic_state *)opaque; +gic_cpu_write(s, gic_get_current_cpu(), addr 0xff, value); +} + +/* Wrappers to read/write the GIC CPU interface for a specific CPU. + * These just decode the opaque pointer into gic_state* + cpu id. + */ +static uint64_t gic_do_cpu_read(void *opaque, target_phys_addr_t addr, +unsigned size) +{ +gic_state **backref = (gic_state **)opaque; +gic_state *s = *backref; +int id = (backref - s-backref); +return gic_cpu_read(s, id, addr 0xff); +} + +static void gic_do_cpu_write(void *opaque, target_phys_addr_t addr, + uint64_t value, unsigned size) +{ +gic_state **backref = (gic_state **)opaque; +gic_state *s = *backref; +int id = (backref - s-backref); +gic_cpu_write(s, id, addr 0xff, value); +} + +static const MemoryRegionOps gic_thiscpu_ops = { +.read = gic_thiscpu_read, +.write = gic_thiscpu_write, +.endianness = DEVICE_NATIVE_ENDIAN, +}; + +static const MemoryRegionOps gic_cpu_ops = { +.read = gic_do_cpu_read, +.write = gic_do_cpu_write, +.endianness = DEVICE_NATIVE_ENDIAN, +}; #endif static void gic_reset(gic_state *s) @@ -752,6 +807,24 @@ static void gic_init(gic_state *s) sysbus_init_irq(s-busdev, s-parent_irq[i]); } memory_region_init_io(s-iomem, gic_dist_ops, s, gic_dist, 0x1000); +#ifndef NVIC +/* Memory regions for the CPU interfaces (NVIC doesn't have these): + * a region for CPU interface for this core, then a region for + * CPU interface for core 0, for core 1, ... + * NB that the memory region size of 0x100 applies for the 11MPCore + * and also cores following the GIC v1 spec (ie A9). + * GIC v2 defines a larger memory region (0x1000) so this will need + * to be extended when we implement A15. + */ +memory_region_init_io(s-cpuiomem[0], gic_thiscpu_ops, s, + gic_cpu, 0x100); +for (i = 0; i NUM_CPU(s); i++) { +s-backref[i] = s; +memory_region_init_io(s-cpuiomem[i+1], gic_cpu_ops, s-backref[i], + gic_cpu, 0x100); +} +#endif + gic_reset(s); register_savevm(NULL, arm_gic, -1, 2, gic_save, gic_load, s); } -- 1.7.1
Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC
On 12/12/11 11:18, Stefan Hajnoczi wrote: On Sun, Dec 11, 2011 at 10:29 AM, Alon Levy al...@redhat.com wrote: On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote: Hi there, I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out that the command client_migrate_info uses it. That's a legacy interface and has to be dropped, no command should be using it... Something tells me that if I just drop it (and change the command to use the regular interface), bad things will happen. Am I right? :) The monitor command client_migrate_info needs to complete after getting an ACK message from the currently connected spice client (this is the only case where this is required - if there is no client then the MONITOR_CMD_ASYNC API won't be used). This in turn requires the main thread to perform select and call the callback that will process this ACK. That's why the MONITOR_CMD_ASYNC API was used. I'm not aware of any other way to do this, I'll be glad for any help here. Also, I understand this is not what is not true async, since one would expect a true async interface to support multiple in flight monitor commands. If there is any ETA or existing way to do this we could change the implementation of client_migrate_info. Is it possible to use a QMP event to signal completion instead of a MONITOR_CMD_ASYNC command? Problem is this breaks the qemu - libvirt interface. cheers, Gerd
[Qemu-devel] [PATCH 1/8] Fix sp804 dual-timer
From: Peter Chubb peter.ch...@nicta.com.au Properly implement dual-timer read/write for the sp804 dual timer module. Based on ARM specs at http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0271d/index.html Signed-off-by: Hans Jang hsj...@ok-labs.com Signed-off-by: David Mirabito david.mirab...@nicta.com.au Signed-off-by: Peter Chubb peter.ch...@nicta.com.au Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- hw/arm_timer.c | 41 +++-- 1 files changed, 35 insertions(+), 6 deletions(-) diff --git a/hw/arm_timer.c b/hw/arm_timer.c index 518bad2..0a5b9d2 100644 --- a/hw/arm_timer.c +++ b/hw/arm_timer.c @@ -170,9 +170,9 @@ static arm_timer_state *arm_timer_init(uint32_t freq) } /* ARM PrimeCell SP804 dual timer module. - Docs for this device don't seem to be publicly available. This - implementation is based on guesswork, the linux kernel sources and the - Integrator/CP timer modules. */ + * Docs at + * http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0271d/index.html +*/ typedef struct { SysBusDevice busdev; @@ -182,6 +182,13 @@ typedef struct { qemu_irq irq; } sp804_state; +static const uint8_t sp804_ids[] = { +/* Timer ID */ +0x04, 0x18, 0x14, 0, +/* PrimeCell ID */ +0xd, 0xf0, 0x05, 0xb1 +}; + /* Merge the IRQs from the two component devices. */ static void sp804_set_irq(void *opaque, int irq, int level) { @@ -196,12 +203,27 @@ static uint64_t sp804_read(void *opaque, target_phys_addr_t offset, { sp804_state *s = (sp804_state *)opaque; -/* ??? Don't know the PrimeCell ID for this device. */ if (offset 0x20) { return arm_timer_read(s-timer[0], offset); -} else { +} +if (offset 0x40) { return arm_timer_read(s-timer[1], offset - 0x20); } + +/* TimerPeriphID */ +if (offset = 0xfe0 offset = 0xffc) { +return sp804_ids[(offset - 0xfe0) 2]; +} + +switch (offset) { +/* Integration Test control registers, which we won't support */ +case 0xf00: /* TimerITCR */ +case 0xf04: /* TimerITOP (strictly write only but..) */ +return 0; +} + +hw_error(%s: Bad offset %x\n, __func__, (int)offset); +return 0; } static void sp804_write(void *opaque, target_phys_addr_t offset, @@ -211,9 +233,16 @@ static void sp804_write(void *opaque, target_phys_addr_t offset, if (offset 0x20) { arm_timer_write(s-timer[0], offset, value); -} else { +return; +} + +if (offset 0x40) { arm_timer_write(s-timer[1], offset - 0x20, value); +return; } + +/* Technically we could be writing to the Test Registers, but not likely */ +hw_error(%s: Bad offset %x\n, __func__, (int)offset); } static const MemoryRegionOps sp804_ops = { -- 1.7.1
Re: [Qemu-devel] [PATCH v3] block : return real error code in cow.c
Am 12.12.2011 06:54, schrieb Li Zhi Hui: v3: modify some errors Signed-off-by: Li Zhi Hui zhihu...@linux.vnet.ibm.com --- block/cow.c | 44 +--- 1 files changed, 29 insertions(+), 15 deletions(-) Thanks, applied to the block branch. Kevin
Re: [Qemu-devel] [PATCH V4 04/13] hw/9pfs: File system helper process for qemu 9p proxy FS
On Fri, Dec 09, 2011 at 10:12:17PM +0530, M. Mohan Kumar wrote: On Friday, December 09, 2011 12:01:14 AM Stefan Hajnoczi wrote: On Mon, Dec 05, 2011 at 09:48:41PM +0530, M. Mohan Kumar wrote: +static int read_request(int sockfd, struct iovec *iovec, ProxyHeader *header) +{ +int retval; + +/* + * read the request header. + */ +iovec-iov_len = 0; +retval = socket_read(sockfd, iovec-iov_base, PROXY_HDR_SZ); +if (retval 0) { +return retval; +} +iovec-iov_len = PROXY_HDR_SZ; +retval = proxy_unmarshal(iovec, 0, dd, header-type, header-size); +if (retval 0) { +return retval; +} +/* + * We can't process message.size PROXY_MAX_IO_SZ, read the complete + * message from the socket and ignore it. This ensures that + * we can correctly handle the next request. We also return + * ENOBUFS as error to indicate we ran out of buffer space. + */ +if (header-size PROXY_MAX_IO_SZ) { +int count, size; +size = header-size; +while (size 0) { +count = MIN(PROXY_MAX_IO_SZ, size); +count = socket_read(sockfd, iovec-iov_base + PROXY_HDR_SZ, count); +if (count 0) { +return count; +} +size -= count; +} I'm not sure recovery attempts are worthwhile here. The client is buggy, perhaps just refuse further work. But whats the issue in trying to recover in this case? This recovery procedure is not robust because it does not always work. In fact it only works in the case where the header-size field was out-of-range but accurate. That's not a likely case since the QEMU-side code that you are writing should handle this. If the nature of the invalid request is different, either a broken or malicious client which does not send a valid header-size then we're stuck in this special-case recovery trying to gobble bytes and we never log an error. A real recovery would be something like disconnecting and re-establishing the connection between QEMU and the helper. This would allow us to get back to a clean state in all cases. Stefan
Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC
On Mon, Dec 12, 2011 at 11:29 AM, Gerd Hoffmann kra...@redhat.com wrote: On 12/12/11 11:18, Stefan Hajnoczi wrote: On Sun, Dec 11, 2011 at 10:29 AM, Alon Levy al...@redhat.com wrote: On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote: Hi there, I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out that the command client_migrate_info uses it. That's a legacy interface and has to be dropped, no command should be using it... Something tells me that if I just drop it (and change the command to use the regular interface), bad things will happen. Am I right? :) The monitor command client_migrate_info needs to complete after getting an ACK message from the currently connected spice client (this is the only case where this is required - if there is no client then the MONITOR_CMD_ASYNC API won't be used). This in turn requires the main thread to perform select and call the callback that will process this ACK. That's why the MONITOR_CMD_ASYNC API was used. I'm not aware of any other way to do this, I'll be glad for any help here. Also, I understand this is not what is not true async, since one would expect a true async interface to support multiple in flight monitor commands. If there is any ETA or existing way to do this we could change the implementation of client_migrate_info. Is it possible to use a QMP event to signal completion instead of a MONITOR_CMD_ASYNC command? Problem is this breaks the qemu - libvirt interface. I had the same issue with the block_job_cancel command, which Adam Litke and Eric Blake helped us fix and find a backward-compatible libvirt solution for: http://www.redhat.com/archives/libvir-list/2011-November/msg01351.html Stefan
Re: [Qemu-devel] [RFC 0/4] virtio-mmio transport
I noticed the virtio-mmio spec has an interrupt status register. On x86 and virtio-pci things are moving towards Message Signalled Interrupts and virtqueues having their own interrupts for better performance and flexibility. Any thoughts on how 1 interrupt per virtqueue works for virtio-mmio? My feeling is that the interrupt details are board-specific and can't be described in virtio-mmio, but it still jumped out at me that it looks like you're only thinking of 1 interrupt for the device. Stefan
Re: [Qemu-devel] [RFC 0/4] virtio-mmio transport
On Mon, 2011-12-12 at 12:14 +, Stefan Hajnoczi wrote: I noticed the virtio-mmio spec has an interrupt status register. On x86 and virtio-pci things are moving towards Message Signalled Interrupts and virtqueues having their own interrupts for better performance and flexibility. Any thoughts on how 1 interrupt per virtqueue works for virtio-mmio? This could be done by either creating devices with more then one interrupt (platform device can take any number of resources) and declaring that first queue uses the first one etc. My feeling is that the interrupt details are board-specific and can't be described in virtio-mmio, It's just the the design pattern in the embedded world that devices usually have one interrupt output, shared between its internal functions. And - of course - there is no in-band signalling (like MSI) possible - interrupt lines are just wires :-) In a boundary case scenario we may face a situation when total amount of interrupts for all queues may actually exceed amount of interrupt inputs available in the interrupt controller... There may be a half-way solution - one interrupt per device but the active queue number notified via the interrupt status register (as a FIFO) so the driver wouldn't have to enumerate all the queues. but it still jumped out at me that it looks like you're only thinking of 1 interrupt for the device. Yes, current version assumes tgat. I'll keep this in mind when planning changes for v2 (next year ;-), thanks for letting me know! Cheers! Paweł
Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC
On 12/12/11 13:10, Stefan Hajnoczi wrote: On Mon, Dec 12, 2011 at 11:29 AM, Gerd Hoffmann kra...@redhat.com wrote: On 12/12/11 11:18, Stefan Hajnoczi wrote: On Sun, Dec 11, 2011 at 10:29 AM, Alon Levy al...@redhat.com wrote: On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote: Hi there, I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out that the command client_migrate_info uses it. That's a legacy interface and has to be dropped, no command should be using it... Something tells me that if I just drop it (and change the command to use the regular interface), bad things will happen. Am I right? :) The monitor command client_migrate_info needs to complete after getting an ACK message from the currently connected spice client (this is the only case where this is required - if there is no client then the MONITOR_CMD_ASYNC API won't be used). This in turn requires the main thread to perform select and call the callback that will process this ACK. That's why the MONITOR_CMD_ASYNC API was used. I'm not aware of any other way to do this, I'll be glad for any help here. Also, I understand this is not what is not true async, since one would expect a true async interface to support multiple in flight monitor commands. If there is any ETA or existing way to do this we could change the implementation of client_migrate_info. Is it possible to use a QMP event to signal completion instead of a MONITOR_CMD_ASYNC command? Problem is this breaks the qemu - libvirt interface. I had the same issue with the block_job_cancel command, which Adam Litke and Eric Blake helped us fix and find a backward-compatible libvirt solution for: http://www.redhat.com/archives/libvir-list/2011-November/msg01351.html Isn't going to fly as waiting for completion isn't optional in that case. Workflow is this: (1) libvirt issues client_migrate_info command. (2) qemu forwards it to spice-server, which in turn forwards it to the spice client (if connected). (3) spice client connects to the migration target machine. (4) spice client signals completion to spice-server, which in turn notifies qemu. (5) qemu calls the monitor completion callback, libvirt gets client_migrate_info result. (6) libvirt issues migrate command. The problem is that (3) must be finished before (6) because qemu on the target side doesn't accept incoming tcp connections any more once the migration started. I don't see a way to switch this to qmp events without breaking old libvirt versions managing new qemu. cheers, Gerd
Re: [Qemu-devel] [RFC] qemu-ga: Introduce guest-hibernate command
On Sun, 11 Dec 2011 12:00:12 +0200 Dor Laor dl...@redhat.com wrote: On 12/08/2011 08:52 PM, Luiz Capitulino wrote: This is basically suspend to disk on a Linux guest. Signed-off-by: Luiz Capitulinolcapitul...@redhat.com --- This is an RFC because I did it as simple as possible and I'm open to suggestions... Now, while testing this or even echo disk /sys/power/state I get several Beyond the previous comments about virtio s4 that Amit solved and pm-hibernate I think it will be nice to add suspend to ram (s3) option too. Currently qemu wakes up immediately after s3 but it can be fixed and it can pause the guest and wait to a keyboard event or monitor command. I thought adding it in this RFC, but I decided to keep it simple. Will add it to v2. funny results. Some times qemu just dies after printing that message: Guest moved used index from 20151 to 1 Some times it doesn't die, but I'm unable to log into the guest: I type username password but the terminal kind of locks (the shell doesn't run). Some times it works... qapi-schema-guest.json | 11 +++ qga/guest-agent-commands.c | 19 +++ 2 files changed, 30 insertions(+), 0 deletions(-) diff --git a/qapi-schema-guest.json b/qapi-schema-guest.json index fde5971..2c5bbcf 100644 --- a/qapi-schema-guest.json +++ b/qapi-schema-guest.json @@ -215,3 +215,14 @@ ## { 'command': 'guest-fsfreeze-thaw', 'returns': 'int' } + +## +# @guest-hibernate +# +# Save RAM contents to disk and powerdown the guest. +# +# Notes: This command doesn't return on success. +# +# Since: 1.1 +## +{ 'command': 'guest-hibernate' } diff --git a/qga/guest-agent-commands.c b/qga/guest-agent-commands.c index 6da9904..9dd4060 100644 --- a/qga/guest-agent-commands.c +++ b/qga/guest-agent-commands.c @@ -550,6 +550,25 @@ int64_t qmp_guest_fsfreeze_thaw(Error **err) } #endif +#define LINUX_SYS_STATE_FILE /sys/power/state + +void qmp_guest_hibernate(Error **err) +{ +int fd; + +fd = open(LINUX_SYS_STATE_FILE, O_WRONLY); +if (fd 0) { +error_set(err, QERR_OPEN_FILE_FAILED, LINUX_SYS_STATE_FILE); +return; +} + +if (write(fd, disk, 4) 0) { +error_set(err, QERR_UNDEFINED_ERROR); +} + +close(fd); +} + /* register init/cleanup routines for stateful command groups */ void ga_command_state_init(GAState *s, GACommandState *cs) {
Re: [Qemu-devel] [PATCH V2 2/5] xen mapcache: Check if a memory space has moved.
On Fri, 9 Dec 2011, Anthony PERARD wrote: This patch change the xen_map_cache behavior. Before trying to map a guest addr, mapcache will look into the list of range of address that have been moved (physmap/set_memory). There is currently one memory space like this, the vram, moved from were it's allocated to were the guest will look into. This help to have a succefull migration. Signed-off-by: Anthony PERARD anthony.per...@citrix.com --- xen-all.c | 18 +- xen-mapcache.c | 22 +++--- xen-mapcache.h |9 +++-- 3 files changed, 43 insertions(+), 6 deletions(-) diff --git a/xen-all.c b/xen-all.c index b5e28ab..b2e9853 100644 --- a/xen-all.c +++ b/xen-all.c @@ -218,6 +218,22 @@ static XenPhysmap *get_physmapping(XenIOState *state, return NULL; } +static target_phys_addr_t xen_phys_offset_to_gaddr(target_phys_addr_t start_addr, + ram_addr_t size, void *opaque) +{ +target_phys_addr_t addr = start_addr TARGET_PAGE_MASK; +XenIOState *xen_io_state = opaque; +XenPhysmap *physmap = NULL; +QLIST_FOREACH(physmap, xen_io_state-physmap, list) { +if (range_covers_byte(physmap-phys_offset, physmap-size, addr)) { +return physmap-start_addr; +} +} + +return start_addr; +} Considering that xen_phys_offset_to_gaddr can be called with addresses that are not page aligned, you should be able to return the translated address with the right offset in the page, rather than just the translated address, that is incorrect. Otherwise if start_addr has to be page aligned, just add a BUG_ON at the beginning of the function, to spot cases in which it is not. #if CONFIG_XEN_CTRL_INTERFACE_VERSION = 340 static int xen_add_to_physmap(XenIOState *state, target_phys_addr_t start_addr, @@ -938,7 +954,7 @@ int xen_hvm_init(void) } /* Init RAM management */ -xen_map_cache_init(); +xen_map_cache_init(xen_phys_offset_to_gaddr, state); xen_ram_init(ram_size); This patch is better than the previous version but I think there is still room for improvement. For example, the only case in which xen_phys_offset_to_gaddr should actually be used is for the videoram on restore, right? In that case, why don't we set xen_phys_offset_to_gaddr only on restore rather than all cases? We could even unset xen_phys_offset_to_gaddr after the videoram address has been translated correctly so that it is going to be called only once. qemu_add_vm_change_state_handler(xen_hvm_change_state_handler, state); diff --git a/xen-mapcache.c b/xen-mapcache.c index 7bcb86e..d5f52b2 100644 --- a/xen-mapcache.c +++ b/xen-mapcache.c @@ -76,6 +76,9 @@ typedef struct MapCache { uint8_t *last_address_vaddr; unsigned long max_mcache_size; unsigned int mcache_bucket_shift; + +phys_offset_to_gaddr_t phys_offset_to_gaddr; +void *opaque; } MapCache; static MapCache *mapcache; @@ -89,13 +92,16 @@ static inline int test_bits(int nr, int size, const unsigned long *addr) return 0; } -void xen_map_cache_init(void) +void xen_map_cache_init(phys_offset_to_gaddr_t f, void *opaque) { unsigned long size; struct rlimit rlimit_as; mapcache = g_malloc0(sizeof (MapCache)); +mapcache-phys_offset_to_gaddr = f; +mapcache-opaque = opaque; + QTAILQ_INIT(mapcache-locked_entries); mapcache-last_address_index = -1; @@ -191,9 +197,14 @@ uint8_t *xen_map_cache(target_phys_addr_t phys_addr, target_phys_addr_t size, uint8_t lock) { MapCacheEntry *entry, *pentry = NULL; -target_phys_addr_t address_index = phys_addr MCACHE_BUCKET_SHIFT; -target_phys_addr_t address_offset = phys_addr (MCACHE_BUCKET_SIZE - 1); +target_phys_addr_t address_index; +target_phys_addr_t address_offset; target_phys_addr_t __size = size; +bool translated = false; + +tryagain: +address_index = phys_addr MCACHE_BUCKET_SHIFT; +address_offset = phys_addr (MCACHE_BUCKET_SIZE - 1); trace_xen_map_cache(phys_addr); @@ -235,6 +246,11 @@ uint8_t *xen_map_cache(target_phys_addr_t phys_addr, target_phys_addr_t size, if(!test_bits(address_offset XC_PAGE_SHIFT, size XC_PAGE_SHIFT, entry-valid_mapping)) { mapcache-last_address_index = -1; +if (!translated mapcache-phys_offset_to_gaddr) { +phys_addr = mapcache-phys_offset_to_gaddr(phys_addr, size, mapcache-opaque); +translated = true; +goto tryagain; +} trace_xen_map_cache_return(NULL); return NULL; } It would be interesting to check how many times we end up needlessly calling phys_offset_to_gaddr during the normal life of a VM. It should be very few, may only one, right?
Re: [Qemu-devel] [PATCH V2 4/5] xen: Change memory access behavior during migration.
On Fri, 9 Dec 2011, Anthony PERARD wrote: Do not allocate RAM during pre-migration runstate. Do not actually do set_memory during migration. Signed-off-by: Anthony PERARD anthony.per...@citrix.com --- xen-all.c | 13 + 1 files changed, 13 insertions(+), 0 deletions(-) diff --git a/xen-all.c b/xen-all.c index b2e9853..c1fed87 100644 --- a/xen-all.c +++ b/xen-all.c @@ -189,6 +189,11 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size) trace_xen_ram_alloc(ram_addr, size); +if (runstate_check(RUN_STATE_PREMIGRATE)) { +/* RAM already populated in Xen */ +return; +} It is probably worth printing out a message about the address and size qemu wanted to allocated nr_pfn = size TARGET_PAGE_BITS; pfn_list = g_malloc(sizeof (*pfn_list) * nr_pfn); @@ -269,6 +274,13 @@ go_physmap: DPRINTF(mapping vram to %llx - %llx, from %llx\n, start_addr, start_addr + size, phys_offset); +if (runstate_check(RUN_STATE_INMIGRATE)) { +/* The mapping should already be done and can not be done a second + * time. So we just add to the physmap list instead. + */ +goto done; +} same here, printing a message would be useful
Re: [Qemu-devel] [PATCH V2 5/5] vga-cirrus: Workaround during restore when using Xen.
On Fri, 9 Dec 2011, Anthony PERARD wrote: During the initialisation of the machine at restore time, the access to the VRAM will fail because QEMU does not know yet the right guest address to map, so the vram_ptr is NULL. So this patch avoid using a NULL pointer during initialisation, and try to get another vram_ptr if the call failed the first time. I think it would be a good idea to split this patch into two patches: one that avoids doing the memset on restore, because it is actually futile in all cases; and another patch that tries to set the vga.vram_ptr again in case it is still NULL.
Re: [Qemu-devel] [RFC 0/4] virtio-mmio transport
On Mon, Dec 12, 2011 at 12:28:27PM +, Pawel Moll wrote: On Mon, 2011-12-12 at 12:14 +, Stefan Hajnoczi wrote: I noticed the virtio-mmio spec has an interrupt status register. On x86 and virtio-pci things are moving towards Message Signalled Interrupts and virtqueues having their own interrupts for better performance and flexibility. Any thoughts on how 1 interrupt per virtqueue works for virtio-mmio? This could be done by either creating devices with more then one interrupt (platform device can take any number of resources) and declaring that first queue uses the first one etc. We currently support mapping from virtqueues to interrupt vectors in virtio core. Only virtio pci uses that but mmio can too. It's better than fixed mapping IMO as driver can control resources per queue. My feeling is that the interrupt details are board-specific and can't be described in virtio-mmio, It's just the the design pattern in the embedded world that devices usually have one interrupt output, shared between its internal functions. And - of course - there is no in-band signalling (like MSI) possible - interrupt lines are just wires :-) In a boundary case scenario we may face a situation when total amount of interrupts for all queues may actually exceed amount of interrupt inputs available in the interrupt controller... There may be a half-way solution - one interrupt per device but the active queue number notified via the interrupt status register (as a FIFO) so the driver wouldn't have to enumerate all the queues. We could use a queue for this certainly. Why do you have so many queues? but it still jumped out at me that it looks like you're only thinking of 1 interrupt for the device. Yes, current version assumes tgat. I'll keep this in mind when planning changes for v2 (next year ;-), thanks for letting me know! Cheers! Paweł
Re: [Qemu-devel] [PATCH V2 5/5] vga-cirrus: Workaround during restore when using Xen.
On Sat, 10 Dec 2011, Jan Kiszka wrote: On 2011-12-09 22:54, Anthony PERARD wrote: During the initialisation of the machine at restore time, the access to the VRAM will fail because QEMU does not know yet the right guest address to map, so the vram_ptr is NULL. So this patch avoid using a NULL pointer during initialisation, and try to get another vram_ptr if the call failed the first time. Signed-off-by: Anthony PERARD anthony.per...@citrix.com --- hw/cirrus_vga.c | 16 +--- 1 files changed, 13 insertions(+), 3 deletions(-) diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c index c7e365b..2e049c9 100644 --- a/hw/cirrus_vga.c +++ b/hw/cirrus_vga.c @@ -32,6 +32,7 @@ #include console.h #include vga_int.h #include loader.h +#include sysemu.h /* * TODO: @@ -2696,6 +2697,13 @@ static int cirrus_post_load(void *opaque, int version_id) s-vga.gr[0x01] = s-cirrus_shadow_gr1 0x0f; cirrus_update_memory_access(s); +if (!s-vga.vram_ptr) { +/* At this point vga.vram_ptr can be invalid on Xen because we need to + * know the position of the videoram in the guest physical address space in + * order to be able to map it. After cirrus_update_memory_access we do know + * where the videoram is, so let's map it now. */ +s-vga.vram_ptr = memory_region_get_ram_ptr(s-vga.vram); +} /* force refresh */ s-vga.graphic_mode = -1; cirrus_update_bank_ptr(s, 0); @@ -2784,9 +2792,11 @@ static void cirrus_reset(void *opaque) } s-vga.cr[0x27] = s-device_id; -/* Win2K seems to assume that the pattern buffer is at 0xff - initially ! */ -memset(s-vga.vram_ptr, 0xff, s-real_vram_size); +if (!runstate_check(RUN_STATE_PREMIGRATE)) { +/* Win2K seems to assume that the pattern buffer is at 0xff + initially ! */ +memset(s-vga.vram_ptr, 0xff, s-real_vram_size); +} s-cirrus_hidden_dac_lockindex = 5; s-cirrus_hidden_dac_data = 0; Is there really no way to fix this properly in the Xen layer? We thought about this issue for some time but we couldn't come up with anything better. To summarize the problem: - on restore the videoram has already been loaded in the guest physical address space by Xen; - we don't know exactly where it is yet, because it has been loaded at the *last* address it was mapped to (see map_linear_vram_bank that moves the videoram); - we want to avoid allocating the videoram twice (actually the second allocation would fail because of memory constraints); So the solution (I acknowledge that it looks more like an hack than a solution) is: - wait for cirrus to load its state so that we know where the videoram is; - map the videoram into qemu's address space at that point. Anything else we came up with was even worse, for example: - independently save the position of the videoram and then when vga_common_init tries to allocate it for a second time give back the old videoram instead; - move back the videoram to the original position and then move it again to the new position. This looks highly fragile as specifically the second hunk appears unrelated to Xen. I think that the second chuck should be in a separate patch because it is unrelated to Xen. On restore it is useless to memset the videoram, so for performance reasons it would be a good idea to avoid it on all platforms. Also it happens to crash Qemu on Xen but that is another story ;-) I think we should also add a comment: useles to memset the videoram on restore, the old videoram is going to be copied over soon anyway Also, is this the only device affected by the shortcoming? What about other VGA adapters e.g.? To my knowledge (in the PC emulator) it is the only device that allocates memory using memory_region_init_ram/memory_region_get_ram_ptr and then moves it around. But maybe QXL does it as well?
Re: [Qemu-devel] [RFC 0/4] virtio-mmio transport
On Mon, 2011-12-12 at 13:12 +, Michael S. Tsirkin wrote: On Mon, Dec 12, 2011 at 12:28:27PM +, Pawel Moll wrote: On Mon, 2011-12-12 at 12:14 +, Stefan Hajnoczi wrote: I noticed the virtio-mmio spec has an interrupt status register. On x86 and virtio-pci things are moving towards Message Signalled Interrupts and virtqueues having their own interrupts for better performance and flexibility. Any thoughts on how 1 interrupt per virtqueue works for virtio-mmio? This could be done by either creating devices with more then one interrupt (platform device can take any number of resources) and declaring that first queue uses the first one etc. We currently support mapping from virtqueues to interrupt vectors in virtio core. Only virtio pci uses that but mmio can too. It's better than fixed mapping IMO as driver can control resources per queue. I'll keep that in mind. My feeling is that the interrupt details are board-specific and can't be described in virtio-mmio, It's just the the design pattern in the embedded world that devices usually have one interrupt output, shared between its internal functions. And - of course - there is no in-band signalling (like MSI) possible - interrupt lines are just wires :-) In a boundary case scenario we may face a situation when total amount of interrupts for all queues may actually exceed amount of interrupt inputs available in the interrupt controller... There may be a half-way solution - one interrupt per device but the active queue number notified via the interrupt status register (as a FIFO) so the driver wouldn't have to enumerate all the queues. We could use a queue for this certainly. Hm, yes, I suppose so :-) This would be a system-level queue rather than the normal one, but I guess we could do that. Why do you have so many queues? I assume this a question about the example I gave above? The answer is: I obviously don't :-) This was just to point out that there _may_ be a problem if we wanted to allocate an interrupt per queue. Cheers! Paweł
Re: [Qemu-devel] [PATCH V2 5/5] vga-cirrus: Workaround during restore when using Xen.
On 2011-12-12 14:18, Stefano Stabellini wrote: On Sat, 10 Dec 2011, Jan Kiszka wrote: On 2011-12-09 22:54, Anthony PERARD wrote: During the initialisation of the machine at restore time, the access to the VRAM will fail because QEMU does not know yet the right guest address to map, so the vram_ptr is NULL. So this patch avoid using a NULL pointer during initialisation, and try to get another vram_ptr if the call failed the first time. Signed-off-by: Anthony PERARD anthony.per...@citrix.com --- hw/cirrus_vga.c | 16 +--- 1 files changed, 13 insertions(+), 3 deletions(-) diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c index c7e365b..2e049c9 100644 --- a/hw/cirrus_vga.c +++ b/hw/cirrus_vga.c @@ -32,6 +32,7 @@ #include console.h #include vga_int.h #include loader.h +#include sysemu.h /* * TODO: @@ -2696,6 +2697,13 @@ static int cirrus_post_load(void *opaque, int version_id) s-vga.gr[0x01] = s-cirrus_shadow_gr1 0x0f; cirrus_update_memory_access(s); +if (!s-vga.vram_ptr) { +/* At this point vga.vram_ptr can be invalid on Xen because we need to + * know the position of the videoram in the guest physical address space in + * order to be able to map it. After cirrus_update_memory_access we do know + * where the videoram is, so let's map it now. */ +s-vga.vram_ptr = memory_region_get_ram_ptr(s-vga.vram); +} /* force refresh */ s-vga.graphic_mode = -1; cirrus_update_bank_ptr(s, 0); @@ -2784,9 +2792,11 @@ static void cirrus_reset(void *opaque) } s-vga.cr[0x27] = s-device_id; -/* Win2K seems to assume that the pattern buffer is at 0xff - initially ! */ -memset(s-vga.vram_ptr, 0xff, s-real_vram_size); +if (!runstate_check(RUN_STATE_PREMIGRATE)) { +/* Win2K seems to assume that the pattern buffer is at 0xff + initially ! */ +memset(s-vga.vram_ptr, 0xff, s-real_vram_size); +} s-cirrus_hidden_dac_lockindex = 5; s-cirrus_hidden_dac_data = 0; Is there really no way to fix this properly in the Xen layer? We thought about this issue for some time but we couldn't come up with anything better. To summarize the problem: - on restore the videoram has already been loaded in the guest physical address space by Xen; - we don't know exactly where it is yet, because it has been loaded at the *last* address it was mapped to (see map_linear_vram_bank that moves the videoram); - we want to avoid allocating the videoram twice (actually the second allocation would fail because of memory constraints); So the solution (I acknowledge that it looks more like an hack than a solution) is: - wait for cirrus to load its state so that we know where the videoram is; Why can't you store this information in some additional Xen-specific vmstate? The fact that memory_region_get_ram_ptr has to return NULL looks bogus to me. It's against the QEMU semantics. Other devices may only be fine as they are not (yet) used with Xen. - map the videoram into qemu's address space at that point. Anything else we came up with was even worse, for example: - independently save the position of the videoram and then when vga_common_init tries to allocate it for a second time give back the old videoram instead; - move back the videoram to the original position and then move it again to the new position. This looks highly fragile as specifically the second hunk appears unrelated to Xen. I think that the second chuck should be in a separate patch because it is unrelated to Xen. On restore it is useless to memset the videoram, so for performance reasons it would be a good idea to avoid it on all platforms. Also it happens to crash Qemu on Xen but that is another story ;-) I think we should also add a comment: useles to memset the videoram on restore, the old videoram is going to be copied over soon anyway That's in fact a different story and maybe worth optimizing due to the size of that buffer. But please do not call the state PREMIGRATE. It's rather INCOMING[_MIGRATION]. Thanks, Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH V2 5/5] vga-cirrus: Workaround during restore when using Xen.
On Mon, 12 Dec 2011, Jan Kiszka wrote: On 2011-12-12 14:18, Stefano Stabellini wrote: On Sat, 10 Dec 2011, Jan Kiszka wrote: On 2011-12-09 22:54, Anthony PERARD wrote: During the initialisation of the machine at restore time, the access to the VRAM will fail because QEMU does not know yet the right guest address to map, so the vram_ptr is NULL. So this patch avoid using a NULL pointer during initialisation, and try to get another vram_ptr if the call failed the first time. Signed-off-by: Anthony PERARD anthony.per...@citrix.com --- hw/cirrus_vga.c | 16 +--- 1 files changed, 13 insertions(+), 3 deletions(-) diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c index c7e365b..2e049c9 100644 --- a/hw/cirrus_vga.c +++ b/hw/cirrus_vga.c @@ -32,6 +32,7 @@ #include console.h #include vga_int.h #include loader.h +#include sysemu.h /* * TODO: @@ -2696,6 +2697,13 @@ static int cirrus_post_load(void *opaque, int version_id) s-vga.gr[0x01] = s-cirrus_shadow_gr1 0x0f; cirrus_update_memory_access(s); +if (!s-vga.vram_ptr) { +/* At this point vga.vram_ptr can be invalid on Xen because we need to + * know the position of the videoram in the guest physical address space in + * order to be able to map it. After cirrus_update_memory_access we do know + * where the videoram is, so let's map it now. */ +s-vga.vram_ptr = memory_region_get_ram_ptr(s-vga.vram); +} /* force refresh */ s-vga.graphic_mode = -1; cirrus_update_bank_ptr(s, 0); @@ -2784,9 +2792,11 @@ static void cirrus_reset(void *opaque) } s-vga.cr[0x27] = s-device_id; -/* Win2K seems to assume that the pattern buffer is at 0xff - initially ! */ -memset(s-vga.vram_ptr, 0xff, s-real_vram_size); +if (!runstate_check(RUN_STATE_PREMIGRATE)) { +/* Win2K seems to assume that the pattern buffer is at 0xff + initially ! */ +memset(s-vga.vram_ptr, 0xff, s-real_vram_size); +} s-cirrus_hidden_dac_lockindex = 5; s-cirrus_hidden_dac_data = 0; Is there really no way to fix this properly in the Xen layer? We thought about this issue for some time but we couldn't come up with anything better. To summarize the problem: - on restore the videoram has already been loaded in the guest physical address space by Xen; - we don't know exactly where it is yet, because it has been loaded at the *last* address it was mapped to (see map_linear_vram_bank that moves the videoram); - we want to avoid allocating the videoram twice (actually the second allocation would fail because of memory constraints); So the solution (I acknowledge that it looks more like an hack than a solution) is: - wait for cirrus to load its state so that we know where the videoram is; Why can't you store this information in some additional Xen-specific vmstate? The fact that memory_region_get_ram_ptr has to return NULL looks bogus to me. It's against the QEMU semantics. Other devices may only be fine as they are not (yet) used with Xen. Unfortunately that cannot work because the allocation is done by vga_common_init before any state has been loaded. So even if we saved the physmap QLIST in a vmstate record, it wouldn't be available yet when vga_common_init tries to allocate the memory.
Re: [Qemu-devel] [RFC 0/4] virtio-mmio transport
I can do that, but not this year (on holiday from Friday 16th, without any access to Internet whatsoever :-) One think to be decided is in what order the halfs should be filled? Low first, then high? High then low? Does it matter at all? :-) My inital though was that you shouldn't be changing this value when the ring is enabled. Unfortunately you disable the ring by setting the address to zero so that argument doesn't work :-/ I suggest that the device to buffer writes to the high part, and construct the actual 64-bit value when the low part is written. That allows 32-bit guests can ignore the high part entirely. Requiring the guest always write high then low also works, though I don't see any benefit to either guest or host. Acting on writes as soon as they occur is not a viable option as it causes the device to be enabled before the full address has bee written. You could say the address takes effect after both halves have been written, writes must come in pairs, but may occur in either order. However there is a risk that host and guest will somehow get out of sync on which values pair together, so IMO this is a bad idea. If you can't stomach the above then I guess changing the condition for ring enablement to QueueNum != 0 and rearanging the configuration sequence appropriately would also do the trick. Having this be different between PCI and mmio is probably not worth the confusion though. Paul
Re: [Qemu-devel] [RFC 0/4] virtio-mmio transport
On Mon, 2011-12-12 at 14:45 +, Paul Brook wrote: I suggest that the device to buffer writes to the high part, and construct the actual 64-bit value when the low part is written. That allows 32-bit guests can ignore the high part entirely. This sounds good to me. If we define the reset value of both registers as 0 (which is consistent with 0 = stop condition) a 32-bit guest will be able to behave as you are suggesting. Cool, I will keep this in mind. Thanks! Paweł
Re: [Qemu-devel] [RFC 0/6] target-i386: TCGv cleanups
Am 12.12.2011 00:42, schrieb Paul Brook: This series makes target-i386 compile with DEBUG_TCGV_TL. What benefit does this provide? It showcases what changes would need to be done to allow type-safe compilation of the first pair of --enable-system targets. How is the existing code not type safe? What benefit does making TCGv a separate type rather than an alias for either _i32 or _i64 give? Paul
Re: [Qemu-devel] [PATCH V2 5/5] vga-cirrus: Workaround during restore when using Xen.
On 2011-12-12 15:41, Stefano Stabellini wrote: On Mon, 12 Dec 2011, Jan Kiszka wrote: On 2011-12-12 14:18, Stefano Stabellini wrote: On Sat, 10 Dec 2011, Jan Kiszka wrote: On 2011-12-09 22:54, Anthony PERARD wrote: During the initialisation of the machine at restore time, the access to the VRAM will fail because QEMU does not know yet the right guest address to map, so the vram_ptr is NULL. So this patch avoid using a NULL pointer during initialisation, and try to get another vram_ptr if the call failed the first time. Signed-off-by: Anthony PERARD anthony.per...@citrix.com --- hw/cirrus_vga.c | 16 +--- 1 files changed, 13 insertions(+), 3 deletions(-) diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c index c7e365b..2e049c9 100644 --- a/hw/cirrus_vga.c +++ b/hw/cirrus_vga.c @@ -32,6 +32,7 @@ #include console.h #include vga_int.h #include loader.h +#include sysemu.h /* * TODO: @@ -2696,6 +2697,13 @@ static int cirrus_post_load(void *opaque, int version_id) s-vga.gr[0x01] = s-cirrus_shadow_gr1 0x0f; cirrus_update_memory_access(s); +if (!s-vga.vram_ptr) { +/* At this point vga.vram_ptr can be invalid on Xen because we need to + * know the position of the videoram in the guest physical address space in + * order to be able to map it. After cirrus_update_memory_access we do know + * where the videoram is, so let's map it now. */ +s-vga.vram_ptr = memory_region_get_ram_ptr(s-vga.vram); +} /* force refresh */ s-vga.graphic_mode = -1; cirrus_update_bank_ptr(s, 0); @@ -2784,9 +2792,11 @@ static void cirrus_reset(void *opaque) } s-vga.cr[0x27] = s-device_id; -/* Win2K seems to assume that the pattern buffer is at 0xff - initially ! */ -memset(s-vga.vram_ptr, 0xff, s-real_vram_size); +if (!runstate_check(RUN_STATE_PREMIGRATE)) { +/* Win2K seems to assume that the pattern buffer is at 0xff + initially ! */ +memset(s-vga.vram_ptr, 0xff, s-real_vram_size); +} s-cirrus_hidden_dac_lockindex = 5; s-cirrus_hidden_dac_data = 0; Is there really no way to fix this properly in the Xen layer? We thought about this issue for some time but we couldn't come up with anything better. To summarize the problem: - on restore the videoram has already been loaded in the guest physical address space by Xen; - we don't know exactly where it is yet, because it has been loaded at the *last* address it was mapped to (see map_linear_vram_bank that moves the videoram); - we want to avoid allocating the videoram twice (actually the second allocation would fail because of memory constraints); So the solution (I acknowledge that it looks more like an hack than a solution) is: - wait for cirrus to load its state so that we know where the videoram is; Why can't you store this information in some additional Xen-specific vmstate? The fact that memory_region_get_ram_ptr has to return NULL looks bogus to me. It's against the QEMU semantics. Other devices may only be fine as they are not (yet) used with Xen. Unfortunately that cannot work because the allocation is done by vga_common_init before any state has been loaded. So even if we saved the physmap QLIST in a vmstate record, it wouldn't be available yet when vga_common_init tries to allocate the memory. If you run two VMs with identical setup, one from cold start to operational mode, the other into an incoming migration, the guest physical memory layout the host sees is different? Hmm, maybe if you reorder devices in the command line. Really, I think this is something inherently incompatible with the current memory API. If Xen has this unfixable special requirement (it's rather a design issue IMHO), adjust the API and adapt all devices. Hot-fixing only a single one this way is no good idea long term. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] New Guess OS Creation Problem
On Mon, Dec 12, 2011 at 2:50 PM, takizo paul...@takizo.com wrote: On Dec 12, 2011, at 6:56 PM, Stefan Hajnoczi wrote: On Sun, Dec 11, 2011 at 4:40 PM, takizo paul...@takizo.com wrote: LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -S -M rhel6.1.0 -enable-kvm -m 4096 -smp 1,sockets=1,cores=1,threads=1 -name database -uuid f3e9f320-7826-7e50-94bb-1833f7fd9dfb -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/database.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-reboot -drive file=/opt/cibai/database,if=none,id=drive-ide0-0-0,format=raw,cache=none,aio=threads -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=2 -drive file=/opt/ISO-Download/FreeBSD-8.2-RELEASE-amd64-disc1.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,aio=threads -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev tap,fd=22,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=52:54:00:77:a5:a6,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:2,password -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 char device redirected to /dev/pts/6 Using CPU model cpu64-rhel6 block I/O error in device 'drive-ide0-0-0': Invalid argument (22) Is /opt/cibai/database on a special filesystem or exotic storage setup? Please check dmesg on the host for any kernel messages regarding I/O errors. Please try reading the file on the host to check whether the I/O error is happening on the host and not related to KVM: $ dd if=/opt/cibai/database of=/dev/null iflag=direct Stefan Hi Stefan, When I quit the installation, further see the error messages ide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pending2011-12-12 22:47:51.427: shutting down Okay, so if I understand correctly you have CentOS 6.1 host running FreeBSD and CentOS guest installs to emulated IDE drives. You're getting failed I/O requests with EINVAL (22). You have verified that dd if=/opt/vm/database.img of=/dev/null iflag=direct reads the image file without errors. Can you please try the CentOS guest install with a virtio-blk drive instead of IDE? This should show whether this problem is related to IDE or a general issue in how QEMU accesses the host file. Stefan
Re: [Qemu-devel] [RFC 0/4] virtio-mmio transport
On Mon, Dec 12, 2011 at 12:28 PM, Pawel Moll pawel.m...@arm.com wrote: On Mon, 2011-12-12 at 12:14 +, Stefan Hajnoczi wrote: I noticed the virtio-mmio spec has an interrupt status register. On x86 and virtio-pci things are moving towards Message Signalled Interrupts and virtqueues having their own interrupts for better performance and flexibility. Any thoughts on how 1 interrupt per virtqueue works for virtio-mmio? This could be done by either creating devices with more then one interrupt (platform device can take any number of resources) and declaring that first queue uses the first one etc. My feeling is that the interrupt details are board-specific and can't be described in virtio-mmio, It's just the the design pattern in the embedded world that devices usually have one interrupt output, shared between its internal functions. And - of course - there is no in-band signalling (like MSI) possible - interrupt lines are just wires :-) In a boundary case scenario we may face a situation when total amount of interrupts for all queues may actually exceed amount of interrupt inputs available in the interrupt controller... There may be a half-way solution - one interrupt per device but the active queue number notified via the interrupt status register (as a FIFO) so the driver wouldn't have to enumerate all the queues. If there aren't already then pretty soon ARM-based systems will deal with PCIe and Message Signalled Interrupts. Are you sure new ARM boards are constraints to a small number of physical interrupts? Stefan
Re: [Qemu-devel] [RFC 0/4] virtio-mmio transport
On 12 December 2011 15:11, Stefan Hajnoczi stefa...@gmail.com wrote: If there aren't already then pretty soon ARM-based systems will deal with PCIe and Message Signalled Interrupts. Are you sure new ARM boards are constraints to a small number of physical interrupts? Depends what you mean by small number. The GIC has up to 224 interrupt lines... -- PMM
Re: [Qemu-devel] New Guess OS Creation Problem
On Dec 12, 2011, at 11:08 PM, Stefan Hajnoczi wrote: On Mon, Dec 12, 2011 at 2:50 PM, takizo paul...@takizo.com wrote: On Dec 12, 2011, at 6:56 PM, Stefan Hajnoczi wrote: On Sun, Dec 11, 2011 at 4:40 PM, takizo paul...@takizo.com wrote: LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -S -M rhel6.1.0 -enable-kvm -m 4096 -smp 1,sockets=1,cores=1,threads=1 -name database -uuid f3e9f320-7826-7e50-94bb-1833f7fd9dfb -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/database.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-reboot -drive file=/opt/cibai/database,if=none,id=drive-ide0-0-0,format=raw,cache=none,aio=threads -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=2 -drive file=/opt/ISO-Download/FreeBSD-8.2-RELEASE-amd64-disc1.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,aio=threads -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev tap,fd=22,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=52:54:00:77:a5:a6,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:2,password -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 char device redirected to /dev/pts/6 Using CPU model cpu64-rhel6 block I/O error in device 'drive-ide0-0-0': Invalid argument (22) Is /opt/cibai/database on a special filesystem or exotic storage setup? Please check dmesg on the host for any kernel messages regarding I/O errors. Please try reading the file on the host to check whether the I/O error is happening on the host and not related to KVM: $ dd if=/opt/cibai/database of=/dev/null iflag=direct Stefan Hi Stefan, When I quit the installation, further see the error messages ide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pending2011-12-12 22:47:51.427: shutting down Hi Stefan Okay, so if I understand correctly you have CentOS 6.1 host running FreeBSD and CentOS guest installs to emulated IDE drives. You're getting failed I/O requests with EINVAL (22). You have verified that dd if=/opt/vm/database.img of=/dev/null iflag=direct reads the image file without errors. That is right, both also emulated on IDE drive Can you please try the CentOS guest install with a virtio-blk drive instead of IDE? This should show whether this problem is related to IDE or a general issue in how QEMU accesses the host file. Took your advice installed Centos on VirtIO drive and it's ok. Does it means it's IDE problem on the OS itself? 2011-12-12 23:15:06.759: starting up LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -S -M rhel6.1.0 -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name centos -uuid 0752c74e-1d51-d529-f66f-8df9a28670ed -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/centos.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-reboot -drive file=/opt/vm/centos.img,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=threads -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -drive file=/opt/ISO-Download/CentOS-6.0-x86_64-minimal.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,aio=threads -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:10:bc:5d,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -device usb-tablet,id=input0 -vnc 127.0.0.1:3,password -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 char device redirected to /dev/pts/9 Using CPU model cpu64-rhel6 Stefan
Re: [Qemu-devel] [RFC 0/4] virtio-mmio transport
On Mon, 2011-12-12 at 15:11 +, Stefan Hajnoczi wrote: If there aren't already then pretty soon ARM-based systems will deal with PCIe and Message Signalled Interrupts. Actually PCI is not an alien in ARM world - we had platforms with PCI long time ago. And new SOCs aimed at servers come with PCIe indeed. It's just that bulk of available systems were designed for mobile phones etc., and I think you agree that in such case not wasting silicon on huge PCI root complex makes at least some sense... And generally, when you have PCI-ed platform, you just use virtio-pci :-) Are you sure new ARM boards are constraints to a small number of physical interrupts? It's about SOC rather then a board. If I remember correctly Cortex-A9's GIC (interrupt controller) can be synthesized with up to 224 interrupt inputs. And I've already seen SOCs with more interrupt sources in the past... Having said all this, I don't think it is really a problem now. At least today virtio-mmio will be used in simple models rather that in huge KVM/Xen/whatever virtualised systems with hundreds of PV devices. Cheers! Paweł
Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC
On Mon, Dec 12, 2011 at 12:39 PM, Gerd Hoffmann kra...@redhat.com wrote: On 12/12/11 13:10, Stefan Hajnoczi wrote: On Mon, Dec 12, 2011 at 11:29 AM, Gerd Hoffmann kra...@redhat.com wrote: On 12/12/11 11:18, Stefan Hajnoczi wrote: On Sun, Dec 11, 2011 at 10:29 AM, Alon Levy al...@redhat.com wrote: On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote: Hi there, I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out that the command client_migrate_info uses it. That's a legacy interface and has to be dropped, no command should be using it... Something tells me that if I just drop it (and change the command to use the regular interface), bad things will happen. Am I right? :) The monitor command client_migrate_info needs to complete after getting an ACK message from the currently connected spice client (this is the only case where this is required - if there is no client then the MONITOR_CMD_ASYNC API won't be used). This in turn requires the main thread to perform select and call the callback that will process this ACK. That's why the MONITOR_CMD_ASYNC API was used. I'm not aware of any other way to do this, I'll be glad for any help here. Also, I understand this is not what is not true async, since one would expect a true async interface to support multiple in flight monitor commands. If there is any ETA or existing way to do this we could change the implementation of client_migrate_info. Is it possible to use a QMP event to signal completion instead of a MONITOR_CMD_ASYNC command? Problem is this breaks the qemu - libvirt interface. I had the same issue with the block_job_cancel command, which Adam Litke and Eric Blake helped us fix and find a backward-compatible libvirt solution for: http://www.redhat.com/archives/libvir-list/2011-November/msg01351.html Isn't going to fly as waiting for completion isn't optional in that case. Workflow is this: (1) libvirt issues client_migrate_info command. (2) qemu forwards it to spice-server, which in turn forwards it to the spice client (if connected). (3) spice client connects to the migration target machine. (4) spice client signals completion to spice-server, which in turn notifies qemu. (5) qemu calls the monitor completion callback, libvirt gets client_migrate_info result. (6) libvirt issues migrate command. The problem is that (3) must be finished before (6) because qemu on the target side doesn't accept incoming tcp connections any more once the migration started. I don't see a way to switch this to qmp events without breaking old libvirt versions managing new qemu. I don't see a solution in this case either. Looks like libvirt supports this command since 0.9.2 so it's not a good idea to just yank it. It might be possible for the QEMU client_migrate_info handler to run a nested event loop in the legacy libvirt case. This would suck since the VM is unresponsive while waiting for spice migration to complete. New libvirt would call the async version of the command which is well-behaved and uses a QMP event to signal completion. Stefan
Re: [Qemu-devel] [PATCH V4 04/13] hw/9pfs: File system helper process for qemu 9p proxy FS
On Mon, 12 Dec 2011 12:08:33 +, Stefan Hajnoczi stefa...@gmail.com wrote: On Fri, Dec 09, 2011 at 10:12:17PM +0530, M. Mohan Kumar wrote: On Friday, December 09, 2011 12:01:14 AM Stefan Hajnoczi wrote: On Mon, Dec 05, 2011 at 09:48:41PM +0530, M. Mohan Kumar wrote: +static int read_request(int sockfd, struct iovec *iovec, ProxyHeader *header) +{ +int retval; + +/* + * read the request header. + */ +iovec-iov_len = 0; +retval = socket_read(sockfd, iovec-iov_base, PROXY_HDR_SZ); +if (retval 0) { +return retval; +} +iovec-iov_len = PROXY_HDR_SZ; +retval = proxy_unmarshal(iovec, 0, dd, header-type, header-size); +if (retval 0) { +return retval; +} +/* + * We can't process message.size PROXY_MAX_IO_SZ, read the complete + * message from the socket and ignore it. This ensures that + * we can correctly handle the next request. We also return + * ENOBUFS as error to indicate we ran out of buffer space. + */ +if (header-size PROXY_MAX_IO_SZ) { +int count, size; +size = header-size; +while (size 0) { +count = MIN(PROXY_MAX_IO_SZ, size); +count = socket_read(sockfd, iovec-iov_base + PROXY_HDR_SZ, count); +if (count 0) { +return count; +} +size -= count; +} I'm not sure recovery attempts are worthwhile here. The client is buggy, perhaps just refuse further work. But whats the issue in trying to recover in this case? This recovery procedure is not robust because it does not always work. In fact it only works in the case where the header-size field was out-of-range but accurate. That's not a likely case since the QEMU-side code that you are writing should handle this. If the nature of the invalid request is different, either a broken or malicious client which does not send a valid header-size then we're stuck in this special-case recovery trying to gobble bytes and we never log an error. A real recovery would be something like disconnecting and re-establishing the connection between QEMU and the helper. This would allow us to get back to a clean state in all cases. Since we are not having any state in the proxy helper, returning ENOBUFS should be similar to the above right ? One of the reason to try to recover as much as possible, is to make sure the guest can umount the file system properly. That is if we hit these error condition due to a bug in proxy FS driver is qemu, we want to make sure we return some valid error, which will atleast enable the guest/client to do an umount. -aneesh
Re: [Qemu-devel] [PATCH V2 5/5] vga-cirrus: Workaround during restore when using Xen.
On Mon, 12 Dec 2011, Jan Kiszka wrote: Is there really no way to fix this properly in the Xen layer? We thought about this issue for some time but we couldn't come up with anything better. To summarize the problem: - on restore the videoram has already been loaded in the guest physical address space by Xen; - we don't know exactly where it is yet, because it has been loaded at the *last* address it was mapped to (see map_linear_vram_bank that moves the videoram); - we want to avoid allocating the videoram twice (actually the second allocation would fail because of memory constraints); So the solution (I acknowledge that it looks more like an hack than a solution) is: - wait for cirrus to load its state so that we know where the videoram is; Why can't you store this information in some additional Xen-specific vmstate? The fact that memory_region_get_ram_ptr has to return NULL looks bogus to me. It's against the QEMU semantics. Other devices may only be fine as they are not (yet) used with Xen. Unfortunately that cannot work because the allocation is done by vga_common_init before any state has been loaded. So even if we saved the physmap QLIST in a vmstate record, it wouldn't be available yet when vga_common_init tries to allocate the memory. If you run two VMs with identical setup, one from cold start to operational mode, the other into an incoming migration, the guest physical memory layout the host sees is different? Hmm, maybe if you reorder devices in the command line. Yes, it is different because the memory allocated for a specific device (Cirrus) has been moved (map_linear_vram_bank). Really, I think this is something inherently incompatible with the current memory API. If Xen has this unfixable special requirement (it's rather a design issue IMHO), adjust the API and adapt all devices. Hot-fixing only a single one this way is no good idea long term. Fair enough. What about introducing a type of savevm state that is going to be restored before machine-init? This way we could save and restore our physmap and we could handle memory maps and allocations transparently.
Re: [Qemu-devel] New Guess OS Creation Problem
On Mon, Dec 12, 2011 at 3:19 PM, takizo paul...@takizo.com wrote: On Dec 12, 2011, at 11:08 PM, Stefan Hajnoczi wrote: On Mon, Dec 12, 2011 at 2:50 PM, takizo paul...@takizo.com wrote: On Dec 12, 2011, at 6:56 PM, Stefan Hajnoczi wrote: On Sun, Dec 11, 2011 at 4:40 PM, takizo paul...@takizo.com wrote: LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -S -M rhel6.1.0 -enable-kvm -m 4096 -smp 1,sockets=1,cores=1,threads=1 -name database -uuid f3e9f320-7826-7e50-94bb-1833f7fd9dfb -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/database.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-reboot -drive file=/opt/cibai/database,if=none,id=drive-ide0-0-0,format=raw,cache=none,aio=threads -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=2 -drive file=/opt/ISO-Download/FreeBSD-8.2-RELEASE-amd64-disc1.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,aio=threads -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev tap,fd=22,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=52:54:00:77:a5:a6,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:2,password -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 char device redirected to /dev/pts/6 Using CPU model cpu64-rhel6 block I/O error in device 'drive-ide0-0-0': Invalid argument (22) Is /opt/cibai/database on a special filesystem or exotic storage setup? Please check dmesg on the host for any kernel messages regarding I/O errors. Please try reading the file on the host to check whether the I/O error is happening on the host and not related to KVM: $ dd if=/opt/cibai/database of=/dev/null iflag=direct Stefan Hi Stefan, When I quit the installation, further see the error messages ide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pending2011-12-12 22:47:51.427: shutting down Hi Stefan Okay, so if I understand correctly you have CentOS 6.1 host running FreeBSD and CentOS guest installs to emulated IDE drives. You're getting failed I/O requests with EINVAL (22). You have verified that dd if=/opt/vm/database.img of=/dev/null iflag=direct reads the image file without errors. That is right, both also emulated on IDE drive Can you please try the CentOS guest install with a virtio-blk drive instead of IDE? This should show whether this problem is related to IDE or a general issue in how QEMU accesses the host file. Took your advice installed Centos on VirtIO drive and it's ok. Does it means it's IDE problem on the OS itself? The physical drive in your machine is probably fine. QEMU emulates an IDE controller and disks for the guest. It looks like there is a bug in the IDE emulation code inside QEMU which leads to these errors. In general virtio-blk is the recommended storage interface because it delivers better performance. I'm not up-to-speed on how stable the FreeBSD virtio drivers are but I think they are available if you search. 2011-12-12 23:15:06.759: starting up LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -S -M rhel6.1.0 -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name centos -uuid 0752c74e-1d51-d529-f66f-8df9a28670ed -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/centos.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-reboot -drive file=/opt/vm/centos.img,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=threads -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -drive file=/opt/ISO-Download/CentOS-6.0-x86_64-minimal.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,aio=threads -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:10:bc:5d,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -device usb-tablet,id=input0 -vnc 127.0.0.1:3,password -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 char device redirected to /dev/pts/9 Using CPU model cpu64-rhel6 At this stage of debugging I would try enabling SystemTap probes in QEMU
Re: [Qemu-devel] [RFC 0/6] target-i386: TCGv cleanups
Am 12.12.2011 15:56, schrieb Paul Brook: Am 12.12.2011 00:42, schrieb Paul Brook: This series makes target-i386 compile with DEBUG_TCGV_TL. What benefit does this provide? It showcases what changes would need to be done to allow type-safe compilation of the first pair of --enable-system targets. How is the existing code not type safe? What benefit does making TCGv a separate type rather than an alias for either _i32 or _i64 give? I've already answered that extensively elsewhere. Look, I really personally don't care about i386 and whether this RFC gets applied to target-i386 or not. This is an example, because the changes that I *do* care about have already been folded into my branch, with the help of my TCGv series now posted for others' benefit. Please read the other thread for details of what it does and doesn't do. Andreas
[Qemu-devel] [PATCH] ARM - Remove fixed map code buffer restriction
On ARM, don't map the code buffer at a fixed location, and fix up the call/goto tcg routines to let it do long jumps. Mapping the code buffer at a fixed address could sometimes result in it being mapped over the top of the heap with pretty random results. This diff is against v1.0. Signed-off-by: Dr. David Alan Gilbert david.gilb...@linaro.org --- exec.c |4 +--- tcg/arm/tcg-target.c | 31 --- 2 files changed, 13 insertions(+), 22 deletions(-) diff --git a/exec.c b/exec.c index 6b92198..ef83da1 100644 --- a/exec.c +++ b/exec.c @@ -497,9 +497,7 @@ static void code_gen_alloc(unsigned long tb_size) if (code_gen_buffer_size (512 * 1024 * 1024)) code_gen_buffer_size = (512 * 1024 * 1024); #elif defined(__arm__) -/* Map the buffer below 32M, so we can use direct calls and branches */ -flags |= MAP_FIXED; -start = (void *) 0x0100UL; +/* Keep the buffer no bigger than 16GB to branch between blocks */ if (code_gen_buffer_size 16 * 1024 * 1024) code_gen_buffer_size = 16 * 1024 * 1024; #elif defined(__s390x__) diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c index e05a64f..730d913 100644 --- a/tcg/arm/tcg-target.c +++ b/tcg/arm/tcg-target.c @@ -842,6 +842,12 @@ static inline void tcg_out_st8(TCGContext *s, int cond, tcg_out_st8_12(s, cond, rd, rn, offset); } +/* The _goto case is normally between TBs within the same code buffer, + and with the code buffer limited to 16GB we shouldn't need the long + case. + + except to the prologue that is in its own buffer. + */ static inline void tcg_out_goto(TCGContext *s, int cond, uint32_t addr) { int32_t val; @@ -855,22 +861,20 @@ static inline void tcg_out_goto(TCGContext *s, int cond, uint32_t addr) if (val - 8 0x01fd val - 8 -0x01fd) tcg_out_b(s, cond, val); else { -#if 1 -tcg_abort(); -#else if (cond == COND_AL) { tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4); -tcg_out32(s, addr); /* XXX: This is l-u.value, can we use it? */ +tcg_out32(s, addr); } else { tcg_out_movi32(s, cond, TCG_REG_R8, val - 8); tcg_out_dat_reg(s, cond, ARITH_ADD, TCG_REG_PC, TCG_REG_PC, TCG_REG_R8, SHIFT_IMM_LSL(0)); } -#endif } } +/* The call case is mostly used for helpers - so it's not unreasonable + for them to be beyond branch range */ static inline void tcg_out_call(TCGContext *s, uint32_t addr) { int32_t val; @@ -887,20 +891,9 @@ static inline void tcg_out_call(TCGContext *s, uint32_t addr) tcg_out_bl(s, COND_AL, val); } } else { -#if 1 -tcg_abort(); -#else -if (cond == COND_AL) { -tcg_out_dat_imm(s, cond, ARITH_ADD, TCG_REG_R14, TCG_REG_PC, 4); -tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4); -tcg_out32(s, addr); /* XXX: This is l-u.value, can we use it? */ -} else { -tcg_out_movi32(s, cond, TCG_REG_R9, addr); -tcg_out_dat_reg(s, cond, ARITH_MOV, TCG_REG_R14, 0, -TCG_REG_PC, SHIFT_IMM_LSL(0)); -tcg_out_bx(s, cond, TCG_REG_R9); -} -#endif +tcg_out_dat_imm(s, COND_AL, ARITH_ADD, TCG_REG_R14, TCG_REG_PC, 4); +tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4); +tcg_out32(s, addr); } }
Re: [Qemu-devel] New Guess OS Creation Problem
On Dec 12, 2011, at 11:33 PM, Stefan Hajnoczi wrote: On Mon, Dec 12, 2011 at 3:19 PM, takizo paul...@takizo.com wrote: On Dec 12, 2011, at 11:08 PM, Stefan Hajnoczi wrote: On Mon, Dec 12, 2011 at 2:50 PM, takizo paul...@takizo.com wrote: On Dec 12, 2011, at 6:56 PM, Stefan Hajnoczi wrote: On Sun, Dec 11, 2011 at 4:40 PM, takizo paul...@takizo.com wrote: LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -S -M rhel6.1.0 -enable-kvm -m 4096 -smp 1,sockets=1,cores=1,threads=1 -name database -uuid f3e9f320-7826-7e50-94bb-1833f7fd9dfb -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/database.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-reboot -drive file=/opt/cibai/database,if=none,id=drive-ide0-0-0,format=raw,cache=none,aio=threads -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=2 -drive file=/opt/ISO-Download/FreeBSD-8.2-RELEASE-amd64-disc1.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,aio=threads -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev tap,fd=22,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=52:54:00:77:a5:a6,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:2,password -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 char device redirected to /dev/pts/6 Using CPU model cpu64-rhel6 block I/O error in device 'drive-ide0-0-0': Invalid argument (22) Is /opt/cibai/database on a special filesystem or exotic storage setup? Please check dmesg on the host for any kernel messages regarding I/O errors. Please try reading the file on the host to check whether the I/O error is happening on the host and not related to KVM: $ dd if=/opt/cibai/database of=/dev/null iflag=direct Stefan Hi Stefan, When I quit the installation, further see the error messages ide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pending2011-12-12 22:47:51.427: shutting down Hi Stefan Okay, so if I understand correctly you have CentOS 6.1 host running FreeBSD and CentOS guest installs to emulated IDE drives. You're getting failed I/O requests with EINVAL (22). You have verified that dd if=/opt/vm/database.img of=/dev/null iflag=direct reads the image file without errors. That is right, both also emulated on IDE drive Can you please try the CentOS guest install with a virtio-blk drive instead of IDE? This should show whether this problem is related to IDE or a general issue in how QEMU accesses the host file. Took your advice installed Centos on VirtIO drive and it's ok. Does it means it's IDE problem on the OS itself? The physical drive in your machine is probably fine. QEMU emulates an IDE controller and disks for the guest. It looks like there is a bug in the IDE emulation code inside QEMU which leads to these errors. I suspect the problem might came from QEMU, because day before it was working before before I upgrade to latest QEMU. In general virtio-blk is the recommended storage interface because it delivers better performance. I'm not up-to-speed on how stable the FreeBSD virtio drivers are but I think they are available if you search. Unfortunately I am running FreeBSD 8.2, it's not natively supported yet. I guess now interim solutions will be duplicate from some current running VM. 2011-12-12 23:15:06.759: starting up LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -S -M rhel6.1.0 -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name centos -uuid 0752c74e-1d51-d529-f66f-8df9a28670ed -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/centos.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-reboot -drive file=/opt/vm/centos.img,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=threads -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -drive file=/opt/ISO-Download/CentOS-6.0-x86_64-minimal.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,aio=threads -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:10:bc:5d,bus=pci.0,addr=0x3
Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC
On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote: Hi there, I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out that the command client_migrate_info uses it. That's a legacy interface and has to be dropped, no command should be using it... Why has it got to be dropped? can't it be declared as deprecated first? Something tells me that if I just drop it (and change the command to use the regular interface), bad things will happen. Am I right? :)
Re: [Qemu-devel] [PATCH] ARM - Remove fixed map code buffer restriction
CC'ing Andrzej, who is the tcg/arm maintainer. On 12 December 2011 15:37, Dr. David Alan Gilbert david.gilb...@linaro.org wrote: On ARM, don't map the code buffer at a fixed location, and fix up the call/goto tcg routines to let it do long jumps. Mapping the code buffer at a fixed address could sometimes result in it being mapped over the top of the heap with pretty random results. This diff is against v1.0. Patches should always be against master, although we're still close enough to 1.0 that this will probably apply anyway. Comments like This diff is against v1.0. go below the '---' so they don't appear in the final commit log. Code generally looks OK to me. Signed-off-by: Dr. David Alan Gilbert david.gilb...@linaro.org --- exec.c | 4 +--- tcg/arm/tcg-target.c | 31 --- 2 files changed, 13 insertions(+), 22 deletions(-) diff --git a/exec.c b/exec.c index 6b92198..ef83da1 100644 --- a/exec.c +++ b/exec.c @@ -497,9 +497,7 @@ static void code_gen_alloc(unsigned long tb_size) if (code_gen_buffer_size (512 * 1024 * 1024)) code_gen_buffer_size = (512 * 1024 * 1024); #elif defined(__arm__) - /* Map the buffer below 32M, so we can use direct calls and branches */ - flags |= MAP_FIXED; - start = (void *) 0x0100UL; + /* Keep the buffer no bigger than 16GB to branch between blocks */ if (code_gen_buffer_size 16 * 1024 * 1024) code_gen_buffer_size = 16 * 1024 * 1024; #elif defined(__s390x__) diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c index e05a64f..730d913 100644 --- a/tcg/arm/tcg-target.c +++ b/tcg/arm/tcg-target.c @@ -842,6 +842,12 @@ static inline void tcg_out_st8(TCGContext *s, int cond, tcg_out_st8_12(s, cond, rd, rn, offset); } +/* The _goto case is normally between TBs within the same code buffer, + and with the code buffer limited to 16GB we shouldn't need the long + case. + + except to the prologue that is in its own buffer. + */ static inline void tcg_out_goto(TCGContext *s, int cond, uint32_t addr) { int32_t val; @@ -855,22 +861,20 @@ static inline void tcg_out_goto(TCGContext *s, int cond, uint32_t addr) if (val - 8 0x01fd val - 8 -0x01fd) tcg_out_b(s, cond, val); else { -#if 1 - tcg_abort(); -#else if (cond == COND_AL) { tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4); - tcg_out32(s, addr); /* XXX: This is l-u.value, can we use it? */ + tcg_out32(s, addr); I see these XXXs have been there since the ARM tcg target was introduced. Does anybody know what they were referring to? Andrzej? } else { tcg_out_movi32(s, cond, TCG_REG_R8, val - 8); tcg_out_dat_reg(s, cond, ARITH_ADD, TCG_REG_PC, TCG_REG_PC, TCG_REG_R8, SHIFT_IMM_LSL(0)); } -#endif } } +/* The call case is mostly used for helpers - so it's not unreasonable + for them to be beyond branch range */ static inline void tcg_out_call(TCGContext *s, uint32_t addr) { int32_t val; @@ -887,20 +891,9 @@ static inline void tcg_out_call(TCGContext *s, uint32_t addr) tcg_out_bl(s, COND_AL, val); } } else { -#if 1 - tcg_abort(); -#else - if (cond == COND_AL) { - tcg_out_dat_imm(s, cond, ARITH_ADD, TCG_REG_R14, TCG_REG_PC, 4); - tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4); - tcg_out32(s, addr); /* XXX: This is l-u.value, can we use it? */ - } else { - tcg_out_movi32(s, cond, TCG_REG_R9, addr); - tcg_out_dat_reg(s, cond, ARITH_MOV, TCG_REG_R14, 0, - TCG_REG_PC, SHIFT_IMM_LSL(0)); - tcg_out_bx(s, cond, TCG_REG_R9); - } -#endif + tcg_out_dat_imm(s, COND_AL, ARITH_ADD, TCG_REG_R14, TCG_REG_PC, 4); + tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4); + tcg_out32(s, addr); } } -- PMM
Re: [Qemu-devel] [PATCH V4 04/13] hw/9pfs: File system helper process for qemu 9p proxy FS
On Mon, Dec 12, 2011 at 3:21 PM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: On Mon, 12 Dec 2011 12:08:33 +, Stefan Hajnoczi stefa...@gmail.com wrote: On Fri, Dec 09, 2011 at 10:12:17PM +0530, M. Mohan Kumar wrote: On Friday, December 09, 2011 12:01:14 AM Stefan Hajnoczi wrote: On Mon, Dec 05, 2011 at 09:48:41PM +0530, M. Mohan Kumar wrote: +static int read_request(int sockfd, struct iovec *iovec, ProxyHeader *header) +{ + int retval; + + /* + * read the request header. + */ + iovec-iov_len = 0; + retval = socket_read(sockfd, iovec-iov_base, PROXY_HDR_SZ); + if (retval 0) { + return retval; + } + iovec-iov_len = PROXY_HDR_SZ; + retval = proxy_unmarshal(iovec, 0, dd, header-type, header-size); + if (retval 0) { + return retval; + } + /* + * We can't process message.size PROXY_MAX_IO_SZ, read the complete + * message from the socket and ignore it. This ensures that + * we can correctly handle the next request. We also return + * ENOBUFS as error to indicate we ran out of buffer space. + */ + if (header-size PROXY_MAX_IO_SZ) { + int count, size; + size = header-size; + while (size 0) { + count = MIN(PROXY_MAX_IO_SZ, size); + count = socket_read(sockfd, iovec-iov_base + PROXY_HDR_SZ, count); + if (count 0) { + return count; + } + size -= count; + } I'm not sure recovery attempts are worthwhile here. The client is buggy, perhaps just refuse further work. But whats the issue in trying to recover in this case? This recovery procedure is not robust because it does not always work. In fact it only works in the case where the header-size field was out-of-range but accurate. That's not a likely case since the QEMU-side code that you are writing should handle this. If the nature of the invalid request is different, either a broken or malicious client which does not send a valid header-size then we're stuck in this special-case recovery trying to gobble bytes and we never log an error. A real recovery would be something like disconnecting and re-establishing the connection between QEMU and the helper. This would allow us to get back to a clean state in all cases. Since we are not having any state in the proxy helper, returning ENOBUFS should be similar to the above right ? One of the reason to try to recover as much as possible, is to make sure the guest can umount the file system properly. That is if we hit these error condition due to a bug in proxy FS driver is qemu, we want to make sure we return some valid error, which will atleast enable the guest/client to do an umount. When the helper detects something outside the protocol specification it needs to terminate the connection. The protocol has no reliable way to skip the junk coming over the socket so we can't process the next message. The flipside to try to recover as much as possible is damage as little as possible. We don't want to mis-interpret requests on this broken connection and corrupt the user's data. I'm happy with any scheme as long as it handles all error cases. The problem with the -ENOBUFS case was that it is pretty artificial (unlikely to happen) and doesn't handle cases where header-size is inaccurate. Stefan
Re: [Qemu-devel] [PATCH 0/4] tcg: Add debug facilities for TCGv
Trying to make a 32-bit target 64-bit safe without actually implementing the 64-bit target is a complete waste of time. That's where we disagree. I rather do things right from the start than leaving the cleanup work to someone else later on. You've almost no chance of getting it right. In some cases the correct answer will be to use 32-bit arithmetic, then sign/zero extend the result. In other cases the correct answer will be to perform word size arithmetic. Blindly picking one just makes the bugs harder to find later. This series picks nothing blindly. Sure it does - or at least it will when you get to ARM and m68k[1]. The whole point of your patch is to force developers to pick between an address sized value and a 32-bit value. On a 32-bit target these are the same thing. Not only are they the same thing, but most of the time the architecture doesn't even have the concept that they may be different. Only when you introduce a variant with 64-bit addresses does the distinction become meaningful. At that point they're actually different, and are trivially detected by the existing checks. Ther are three ways to resolve a mismatch: - Change everything to TCGv_i32. - Change everything to TCGv. - Add an explicit extension/truncation (compiles to no-op on 32-bit targets). There's no way of the developer of a 32-bit architecture to know which of these a [hypothetical future] 64-bit architecture will require. The fact they you've been forced to pick one (rather then leave the mismatch for the 64-bit porter to resolve) actually makes bugs harder to find. For me the most annoying issue was that tcg_gen_qemu_{ld,st}* needs TCGv. You mean the value transferred is always TCGv sized, so ld32u requires an additional truncation before doing 32-bit arithmetic? Fixing that is completely independent of making TCGv a separate type. The whole point of TCGv is that it's an address sized value, so is the only real option for the address. Constructing an address for a different sized value requires a target specific sign/szero extension or truncation/overflow check. If you're trying to add support for targets where the primary word size is neither 32 nor 64 then that's a completely different problem, and probably not one that's worth solving. In practice your port is going to end up using 64- bit arithmetic and explicitly compensating for the excess precision where necessary. That's a different issue and not being addressed here. Good. My point was that I have an inheritance hierarchy where (as opposed to, e.g., ppc and ppc64) both end up as TARGET_LONG_BITS==32, so I do not get a check for free by compiling both, and I do not want to introduce an artificial TARGET_LONG_BITS==64 architecture just to check that my tl==i32 target code is free of typos. Huh? Now I'm completely confused. If all your targets have T_L_B==32 how can mixing TCGv and TCGV_i32 be wrong? How do you know which of the 3 alternatives above is the right answer? Same issue if you pick only --target-list=some-softmmu BTW. If you're only building a small subset of targets then clearly you won't see errors in the targets you omitted. The only sane answer is to make sure you build (and preferably test) all the targets you have effected. This is the same reason I object to code that is disabled by default. If it isn't at least being built by the majority of developers (or at least any working in related areas) then it's going to bitrot rapidly, and probably won't work when someone does decide to turn it on. Just like with softfloat [u]int16 etc. types it's just too easy to forget _t somewhere (here: _i32) and to end up with a type you don't want. I don't see how that is relevant. TCGv (aka target_ulong) is a synonym for either TCGv_i32 (uint32_t) or TCGv_i64 (uint64_t). If this is not true then we're completely screwed. If you have a better proposal how to introduce the checks I want, please let us hear it. I still don't understand how your additional restructions are ever useful. Please give an example of an actual error your checks will catch. Paul [1] m68k may never have a 64-bit variant, but ARM is a good example of a qemu target that is currently 32-bit but will grow a 64-bit variant in the not too distant future.
Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC
On Mon, Dec 12, 2011 at 03:23:35PM +, Stefan Hajnoczi wrote: On Mon, Dec 12, 2011 at 12:39 PM, Gerd Hoffmann kra...@redhat.com wrote: On 12/12/11 13:10, Stefan Hajnoczi wrote: On Mon, Dec 12, 2011 at 11:29 AM, Gerd Hoffmann kra...@redhat.com wrote: On 12/12/11 11:18, Stefan Hajnoczi wrote: On Sun, Dec 11, 2011 at 10:29 AM, Alon Levy al...@redhat.com wrote: On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote: Hi there, I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out that the command client_migrate_info uses it. That's a legacy interface and has to be dropped, no command should be using it... Something tells me that if I just drop it (and change the command to use the regular interface), bad things will happen. Am I right? :) The monitor command client_migrate_info needs to complete after getting an ACK message from the currently connected spice client (this is the only case where this is required - if there is no client then the MONITOR_CMD_ASYNC API won't be used). This in turn requires the main thread to perform select and call the callback that will process this ACK. That's why the MONITOR_CMD_ASYNC API was used. I'm not aware of any other way to do this, I'll be glad for any help here. Also, I understand this is not what is not true async, since one would expect a true async interface to support multiple in flight monitor commands. If there is any ETA or existing way to do this we could change the implementation of client_migrate_info. Is it possible to use a QMP event to signal completion instead of a MONITOR_CMD_ASYNC command? Problem is this breaks the qemu - libvirt interface. I had the same issue with the block_job_cancel command, which Adam Litke and Eric Blake helped us fix and find a backward-compatible libvirt solution for: http://www.redhat.com/archives/libvir-list/2011-November/msg01351.html Isn't going to fly as waiting for completion isn't optional in that case. Workflow is this: (1) libvirt issues client_migrate_info command. (2) qemu forwards it to spice-server, which in turn forwards it to the spice client (if connected). (3) spice client connects to the migration target machine. (4) spice client signals completion to spice-server, which in turn notifies qemu. (5) qemu calls the monitor completion callback, libvirt gets client_migrate_info result. (6) libvirt issues migrate command. The problem is that (3) must be finished before (6) because qemu on the target side doesn't accept incoming tcp connections any more once the migration started. I don't see a way to switch this to qmp events without breaking old libvirt versions managing new qemu. I don't see a solution in this case either. Looks like libvirt supports this command since 0.9.2 so it's not a good idea to just yank it. It might be possible for the QEMU client_migrate_info handler to run a nested event loop in the legacy libvirt case. This would suck since the VM is unresponsive while waiting for spice migration to complete. New libvirt would call the async version of the command which is well-behaved and uses a QMP event to signal completion. I agree that a nested event loop would be a bad solution, the point is to let the guest continue to work while waiting, otherwise you destroy the live migration experience, might as well disconnect the client from the source and have it connect to the target. I thought the whole point of MONITOR_CMD_ASYNC was to allow this scenario. So iiuc QMP is the alternative but it would require a rewrite, i.e. break existing users like libvirt. Hence my suggestion as a reply to Luiz's initial email that we just deprecate MONITOR_CMD_ASYNC right now instead of dropping it, then we can implement a QMP variant of client_migrate_info side by side, libvirt can learn to use it, and then (seeing as we are the only user) M_C_A can be dropped. Stefan
Re: [Qemu-devel] [PATCH V12 1/8] Support for TPM command line options
Please use git-send-email. This series won't apply via git-am as-is. Also, use a single line separator with '---' between the revision history and the commit message so that the revision history doesn't end up in git. You need to move the SoB line before the revision history too. Regards, Anthony Liguori On 10/11/2011 12:32 PM, Stefan Berger wrote:
Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC
On Mon, 12 Dec 2011 17:50:46 +0200 Alon Levy al...@redhat.com wrote: On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote: Hi there, I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out that the command client_migrate_info uses it. That's a legacy interface and has to be dropped, no command should be using it... Why has it got to be dropped? can't it be declared as deprecated first? Well, after this thread looks like it's what we'll have to do... Something tells me that if I just drop it (and change the command to use the regular interface), bad things will happen. Am I right? :)
[Qemu-devel] [PATCH v2 0/4] add qemu_thread_join, use it to fix bug in ccid
Patches introducing qemu_thread_join have floated around multiple times. Now I found a bug that requires it to be fixed, so perhaps this time it will be more successful. For the actual bug, see patch 4. v1-v2: remove spurious submodule change, fix blank lines Jan Kiszka (2): qemu-thread: add API for joinable threads qemu-thread: implement joinable threads for POSIX Paolo Bonzini (2): qemu-thread: implement joinable threads for Win32 ccid: make threads joinable cpus.c |6 ++- hw/ccid-card-emulated.c | 25 +-- qemu-thread-posix.c | 35 ++-- qemu-thread-win32.c | 107 ++ qemu-thread-win32.h |5 +- qemu-thread.h |8 +++- ui/vnc-jobs-async.c |2 +- 8 files changed, 127 insertions(+), 63 deletions(-) -- 1.7.7.1
[Qemu-devel] [PATCH v2 2/4] qemu-thread: implement joinable threads for POSIX
From: Jan Kiszka jan.kis...@siemens.com Allow to control if a QEMU thread is created joinable or not. Make it not joinable by default to avoid that we keep the associated resources around when terminating a thread without joining it (what we couldn't do so far for obvious reasons). The audio subsystem will need the join feature when converting it to QEMU threading/locking abstractions, so provide that service. Signed-off-by: Jan Kiszka jan.kis...@siemens.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- qemu-thread-posix.c | 28 -- 1 files changed, 26 insertions(+), 2 deletions(-) diff --git a/qemu-thread-posix.c b/qemu-thread-posix.c index 49d3388..603ff3d 100644 --- a/qemu-thread-posix.c +++ b/qemu-thread-posix.c @@ -122,17 +122,29 @@ void qemu_thread_create(QemuThread *thread, { sigset_t set, oldset; int err; +pthread_attr_t attr; -assert(mode == QEMU_THREAD_DETACHED); +err = pthread_attr_init(attr); +if (err) { +error_exit(err, __func__); +} +if (mode == QEMU_THREAD_DETACHED) { +err = pthread_attr_setdetachstate(attr, PTHREAD_CREATE_DETACHED); +if (err) { +error_exit(err, __func__); +} +} /* Leave signal handling to the iothread. */ sigfillset(set); pthread_sigmask(SIG_SETMASK, set, oldset); -err = pthread_create(thread-thread, NULL, start_routine, arg); +err = pthread_create(thread-thread, attr, start_routine, arg); if (err) error_exit(err, __func__); pthread_sigmask(SIG_SETMASK, oldset, NULL); + +pthread_attr_destroy(attr); } void qemu_thread_get_self(QemuThread *thread) @@ -148,3 +162,15 @@ void qemu_thread_exit(void *retval) { pthread_exit(retval); } + +void *qemu_thread_join(QemuThread *thread) +{ +int err; +void *ret; + +err = pthread_join(thread-thread, ret); +if (err) { +error_exit(err, __func__); +} +return ret; +} -- 1.7.7.1
[Qemu-devel] [PATCH v2 4/4] ccid: make threads joinable
Destroying a mutex that another thread might have just unlocked is racy. It usually works, but you cannot do that in general and can lead to deadlocks or segfaults. Change ccid to use joinable threads instead. (Also, qemu_mutex_init/qemu_cond_init were missing). Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/ccid-card-emulated.c | 26 +++--- 1 files changed, 11 insertions(+), 15 deletions(-) diff --git a/hw/ccid-card-emulated.c b/hw/ccid-card-emulated.c index 9fe9db5..2d2ebce 100644 --- a/hw/ccid-card-emulated.c +++ b/hw/ccid-card-emulated.c @@ -120,6 +120,7 @@ struct EmulatedState { uint8_t atr_length; QSIMPLEQ_HEAD(event_list, EmulEvent) event_list; QemuMutex event_list_mutex; +QemuThread event_thread_id; VReader *reader; QSIMPLEQ_HEAD(guest_apdu_list, EmulEvent) guest_apdu_list; QemuMutex vreader_mutex; /* and guest_apdu_list mutex */ @@ -127,8 +128,7 @@ struct EmulatedState { QemuCond handle_apdu_cond; int pipe[2]; int quit_apdu_thread; -QemuMutex apdu_thread_quit_mutex; -QemuCond apdu_thread_quit_cond; +QemuThread apdu_thread_id; }; static void emulated_apdu_from_guest(CCIDCardState *base, @@ -271,9 +271,6 @@ static void *handle_apdu_thread(void* arg) } qemu_mutex_unlock(card-vreader_mutex); } -qemu_mutex_lock(card-apdu_thread_quit_mutex); -qemu_cond_signal(card-apdu_thread_quit_cond); -qemu_mutex_unlock(card-apdu_thread_quit_mutex); return NULL; } @@ -489,7 +486,6 @@ static uint32_t parse_enumeration(char *str, static int emulated_initfn(CCIDCardState *base) { EmulatedState *card = DO_UPCAST(EmulatedState, base, base); -QemuThread thread_id; VCardEmulError ret; EnumTable *ptable; @@ -541,9 +537,10 @@ static int emulated_initfn(CCIDCardState *base) printf(%s: failed to initialize vcard\n, EMULATED_DEV_NAME); return -1; } -qemu_thread_create(thread_id, event_thread, card, QEMU_THREAD_DETACHED); -qemu_thread_create(thread_id, handle_apdu_thread, card, - QEMU_THREAD_DETACHED); +qemu_thread_create(card-event_thread_id, event_thread, card, + QEMU_THREAD_JOINABLE); +qemu_thread_create(card-apdu_thread_id, handle_apdu_thread, card, + QEMU_THREAD_JOINABLE); return 0; } @@ -553,15 +550,14 @@ static int emulated_exitfn(CCIDCardState *base) VEvent *vevent = vevent_new(VEVENT_LAST, NULL, NULL); vevent_queue_vevent(vevent); /* stop vevent thread */ -qemu_mutex_lock(card-apdu_thread_quit_mutex); +qemu_thread_join(card-event_thread_id); + card-quit_apdu_thread = 1; /* stop handle_apdu thread */ qemu_cond_signal(card-handle_apdu_cond); -qemu_cond_wait(card-apdu_thread_quit_cond, - card-apdu_thread_quit_mutex); -/* handle_apdu thread stopped, can destroy all of it's mutexes */ +qemu_thread_join(card-apdu_thread_id); + +/* threads exited, can destroy all condvars/mutexes */ qemu_cond_destroy(card-handle_apdu_cond); -qemu_cond_destroy(card-apdu_thread_quit_cond); -qemu_mutex_destroy(card-apdu_thread_quit_mutex); qemu_mutex_destroy(card-handle_apdu_mutex); qemu_mutex_destroy(card-vreader_mutex); qemu_mutex_destroy(card-event_list_mutex); -- 1.7.7.1
[Qemu-devel] [PATCH v2 1/4] qemu-thread: add API for joinable threads
From: Jan Kiszka jan.kis...@siemens.com Split from Jan's original qemu-thread-posix.c patch. No semantic change, just introduce the new API that POSIX and Win32 implementations will conform to. Signed-off-by: Jan Kiszka jan.kis...@siemens.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- cpus.c |6 -- hw/ccid-card-emulated.c |5 +++-- qemu-thread-posix.c |7 --- qemu-thread-win32.c |4 +++- qemu-thread.h |8 ++-- ui/vnc-jobs-async.c |2 +- 6 files changed, 21 insertions(+), 11 deletions(-) diff --git a/cpus.c b/cpus.c index ca46ec6..cbb4617 100644 --- a/cpus.c +++ b/cpus.c @@ -910,7 +910,8 @@ static void qemu_tcg_init_vcpu(void *_env) env-halt_cond = g_malloc0(sizeof(QemuCond)); qemu_cond_init(env-halt_cond); tcg_halt_cond = env-halt_cond; -qemu_thread_create(env-thread, qemu_tcg_cpu_thread_fn, env); +qemu_thread_create(env-thread, qemu_tcg_cpu_thread_fn, env, + QEMU_THREAD_DETACHED); while (env-created == 0) { qemu_cond_wait(qemu_cpu_cond, qemu_global_mutex); } @@ -926,7 +927,8 @@ static void qemu_kvm_start_vcpu(CPUState *env) env-thread = g_malloc0(sizeof(QemuThread)); env-halt_cond = g_malloc0(sizeof(QemuCond)); qemu_cond_init(env-halt_cond); -qemu_thread_create(env-thread, qemu_kvm_cpu_thread_fn, env); +qemu_thread_create(env-thread, qemu_kvm_cpu_thread_fn, env, + QEMU_THREAD_DETACHED); while (env-created == 0) { qemu_cond_wait(qemu_cpu_cond, qemu_global_mutex); } diff --git a/hw/ccid-card-emulated.c b/hw/ccid-card-emulated.c index 092301b..9fe9db5 100644 --- a/hw/ccid-card-emulated.c +++ b/hw/ccid-card-emulated.c @@ -541,8 +541,9 @@ static int emulated_initfn(CCIDCardState *base) printf(%s: failed to initialize vcard\n, EMULATED_DEV_NAME); return -1; } -qemu_thread_create(thread_id, event_thread, card); -qemu_thread_create(thread_id, handle_apdu_thread, card); +qemu_thread_create(thread_id, event_thread, card, QEMU_THREAD_DETACHED); +qemu_thread_create(thread_id, handle_apdu_thread, card, + QEMU_THREAD_DETACHED); return 0; } diff --git a/qemu-thread-posix.c b/qemu-thread-posix.c index ac3c0c9..49d3388 100644 --- a/qemu-thread-posix.c +++ b/qemu-thread-posix.c @@ -117,13 +117,14 @@ void qemu_cond_wait(QemuCond *cond, QemuMutex *mutex) void qemu_thread_create(QemuThread *thread, void *(*start_routine)(void*), - void *arg) + void *arg, int mode) { +sigset_t set, oldset; int err; -/* Leave signal handling to the iothread. */ -sigset_t set, oldset; +assert(mode == QEMU_THREAD_DETACHED); +/* Leave signal handling to the iothread. */ sigfillset(set); pthread_sigmask(SIG_SETMASK, set, oldset); err = pthread_create(thread-thread, NULL, start_routine, arg); diff --git a/qemu-thread-win32.c b/qemu-thread-win32.c index db8e744..ff80e84 100644 --- a/qemu-thread-win32.c +++ b/qemu-thread-win32.c @@ -243,10 +243,12 @@ static inline void qemu_thread_init(void) void qemu_thread_create(QemuThread *thread, void *(*start_routine)(void *), - void *arg) + void *arg, int mode) { HANDLE hThread; +assert(mode == QEMU_THREAD_DETACHED); + struct QemuThreadData *data; qemu_thread_init(); data = g_malloc(sizeof *data); diff --git a/qemu-thread.h b/qemu-thread.h index e008b60..a78a8f2 100644 --- a/qemu-thread.h +++ b/qemu-thread.h @@ -13,6 +13,9 @@ typedef struct QemuThread QemuThread; #include qemu-thread-posix.h #endif +#define QEMU_THREAD_JOINABLE 0 +#define QEMU_THREAD_DETACHED 1 + void qemu_mutex_init(QemuMutex *mutex); void qemu_mutex_destroy(QemuMutex *mutex); void qemu_mutex_lock(QemuMutex *mutex); @@ -35,8 +38,9 @@ void qemu_cond_broadcast(QemuCond *cond); void qemu_cond_wait(QemuCond *cond, QemuMutex *mutex); void qemu_thread_create(QemuThread *thread, - void *(*start_routine)(void*), - void *arg); +void *(*start_routine)(void *), +void *arg, int mode); +void *qemu_thread_join(QemuThread *thread); void qemu_thread_get_self(QemuThread *thread); int qemu_thread_is_self(QemuThread *thread); void qemu_thread_exit(void *retval); diff --git a/ui/vnc-jobs-async.c b/ui/vnc-jobs-async.c index de5ea6b..9b3016c 100644 --- a/ui/vnc-jobs-async.c +++ b/ui/vnc-jobs-async.c @@ -318,7 +318,7 @@ void vnc_start_worker_thread(void) return ; q = vnc_queue_init(); -qemu_thread_create(q-thread, vnc_worker_thread, q); +qemu_thread_create(q-thread, vnc_worker_thread, q, QEMU_THREAD_DETACHED); queue = q; /* Set global queue */ } -- 1.7.7.1
Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC
On 12/11/2011 04:29 AM, Alon Levy wrote: On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote: Hi there, I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out that the command client_migrate_info uses it. That's a legacy interface and has to be dropped, no command should be using it... Something tells me that if I just drop it (and change the command to use the regular interface), bad things will happen. Am I right? :) The monitor command client_migrate_info needs to complete after getting Then it's completely broken. CMD_ASYNC is broken and should have been removed a long time ago. Unfortunately, we're going to have to remove this command until we can move to full QAPI. I see this went in through Gerd's tree: commit edc5cb1a52b2847201acf78b0fba67ab3c2464d5 Author: Yonit Halperin yhalp...@redhat.com Date: Mon Oct 17 10:03:18 2011 +0200 spice: turn client_migrate_info to async Changes to monitor commands really should at least carry an Acked-by from the QMP maintainer (Luiz). That would have prevented this from happening. I'll try to be more diligent about enforcing this in the future. Regards, Anthony Liguori an ACK message from the currently connected spice client (this is the only case where this is required - if there is no client then the MONITOR_CMD_ASYNC API won't be used). This in turn requires the main thread to perform select and call the callback that will process this ACK. That's why the MONITOR_CMD_ASYNC API was used. I'm not aware of any other way to do this, I'll be glad for any help here. Also, I understand this is not what is not true async, since one would expect a true async interface to support multiple in flight monitor commands. If there is any ETA or existing way to do this we could change the implementation of client_migrate_info. Alon
Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC
On 12/12/2011 10:00 AM, Alon Levy wrote: On Mon, Dec 12, 2011 at 03:23:35PM +, Stefan Hajnoczi wrote: On Mon, Dec 12, 2011 at 12:39 PM, Gerd Hoffmannkra...@redhat.com wrote: On 12/12/11 13:10, Stefan Hajnoczi wrote: On Mon, Dec 12, 2011 at 11:29 AM, Gerd Hoffmannkra...@redhat.com wrote: On 12/12/11 11:18, Stefan Hajnoczi wrote: On Sun, Dec 11, 2011 at 10:29 AM, Alon Levyal...@redhat.com wrote: On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote: Hi there, I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out that the command client_migrate_info uses it. That's a legacy interface and has to be dropped, no command should be using it... Something tells me that if I just drop it (and change the command to use the regular interface), bad things will happen. Am I right? :) The monitor command client_migrate_info needs to complete after getting an ACK message from the currently connected spice client (this is the only case where this is required - if there is no client then the MONITOR_CMD_ASYNC API won't be used). This in turn requires the main thread to perform select and call the callback that will process this ACK. That's why the MONITOR_CMD_ASYNC API was used. I'm not aware of any other way to do this, I'll be glad for any help here. Also, I understand this is not what is not true async, since one would expect a true async interface to support multiple in flight monitor commands. If there is any ETA or existing way to do this we could change the implementation of client_migrate_info. Is it possible to use a QMP event to signal completion instead of a MONITOR_CMD_ASYNC command? Problem is this breaks the qemu- libvirt interface. I had the same issue with the block_job_cancel command, which Adam Litke and Eric Blake helped us fix and find a backward-compatible libvirt solution for: http://www.redhat.com/archives/libvir-list/2011-November/msg01351.html Isn't going to fly as waiting for completion isn't optional in that case. Workflow is this: (1) libvirt issues client_migrate_info command. (2) qemu forwards it to spice-server, which in turn forwards it to the spice client (if connected). (3) spice client connects to the migration target machine. (4) spice client signals completion to spice-server, which in turn notifies qemu. (5) qemu calls the monitor completion callback, libvirt gets client_migrate_info result. (6) libvirt issues migrate command. The problem is that (3) must be finished before (6) because qemu on the target side doesn't accept incoming tcp connections any more once the migration started. I don't see a way to switch this to qmp events without breaking old libvirt versions managing new qemu. I don't see a solution in this case either. Looks like libvirt supports this command since 0.9.2 so it's not a good idea to just yank it. It might be possible for the QEMU client_migrate_info handler to run a nested event loop in the legacy libvirt case. This would suck since the VM is unresponsive while waiting for spice migration to complete. New libvirt would call the async version of the command which is well-behaved and uses a QMP event to signal completion. I agree that a nested event loop would be a bad solution, the point is to let the guest continue to work while waiting, otherwise you destroy the live migration experience, might as well disconnect the client from the source and have it connect to the target. I thought the whole point of MONITOR_CMD_ASYNC was to allow this scenario. So iiuc QMP is the alternative but it would require a rewrite, i.e. break existing users like libvirt. Hence my suggestion as a reply to Luiz's initial email that we just deprecate MONITOR_CMD_ASYNC right now instead of dropping it, It *has* to be dropped. Any command using it is fundamentally broken. The command should have never been introduced in the first place to use async. Regards, Anthony Liguori
Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC
On 12/12/2011 10:08 AM, Luiz Capitulino wrote: On Mon, 12 Dec 2011 17:50:46 +0200 Alon Levyal...@redhat.com wrote: On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote: Hi there, I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out that the command client_migrate_info uses it. That's a legacy interface and has to be dropped, no command should be using it... Why has it got to be dropped? can't it be declared as deprecated first? Well, after this thread looks like it's what we'll have to do...\ Nope, it has to be dropped. Commands using CMD_ASYNC may fail in arbitrary ways because of the way error reporting is done. This is an unfixable problem until we eliminate all uses of qerror_report(). We need to take the hit here and force the command to always fail. libvirt will need logic to use a different command with new versions. If we coordinate this with the libvirt folks, we can make the transition as smooth as possible. Regards, Anthony Liguori Something tells me that if I just drop it (and change the command to use the regular interface), bad things will happen. Am I right? :)
Re: [Qemu-devel] [PULL 0/8] arm-devs queue
On 12/12/2011 04:47 AM, Peter Maydell wrote: The following changes since commit f18318eef8b4b263f4e82a5338c9b2875a6c73c8: Merge branch 'master' of git://git.qemu.org/qemu (2011-12-12 04:12:31 +0400) are available in the git repository at: git://git.linaro.org/people/pmaydell/qemu-arm.git arm-devs.for-upstream Pulled. Thanks. Regards, Anthony Liguori Peter Chubb (1): Fix sp804 dual-timer Peter Maydell (7): hw/arm_mptimer.c: Turn ARM MPcore private timers into qdev devices hw/arm_gic: Expose GIC CPU interfaces as sysbus memory regions hw/mpcore.c: Use the GIC memory regions for the CPU interface hw/realview_gic: Use GIC memory region for the CPU interface hw/mpcore: Clean up mpcore_priv_read/write as they are now SCU only hw/a9mpcore.c: Implement A9MP peripherals rather than 11MPcore ones hw/mpcore.c: Merge with hw/arm11mpcore.c Makefile.target |1 + hw/a9mpcore.c | 189 -- hw/arm11mpcore.c | 130 +- hw/arm_gic.c | 75 - hw/arm_mptimer.c | 332 + hw/arm_timer.c| 41 ++- hw/mpcore.c | 283 - hw/realview_gic.c | 25 + 8 files changed, 751 insertions(+), 325 deletions(-) create mode 100644 hw/arm_mptimer.c delete mode 100644 hw/mpcore.c
Re: [Qemu-devel] [PULL 00/22]: QMP queue
On 12/06/2011 11:54 AM, Luiz Capitulino wrote: Anthony, This pull request contains my round 3 QAPI conversion patches, the new QMP visitor tests, some documentation and the qmp test tool. The changes (since 217bfb445b54db618a30f3a39170bebd9fd9dbf2) are available in the following repository: git://repo.or.cz/qemu/qmp-unstable.git queue/qmp Pulled. Thanks. Regards, Anthony Liguori Luiz Capitulino (21): docs: Add writing-qmp-commands.txt configure: Don't mix glib and libcheck tests Introduce test-qmp-output-visitor Introduce test-qmp-input-visitor Drop test-visitor qapi: Complete system_powerdown conversion console: Drop unused prototypes QError: Introduce QERR_IO_ERROR qapi: Convert memsave qapi: Convert pmemsave qapi: Convert cont qapi: Convert inject-nmi qapi: Convert set_link qapi: Convert block_passwd qapi: Convert balloon qapi: Convert block_resize qapi: Convert blockdev_snapshot_sync qapi: Convert human-monitor-command qapi: Convert migrate_cancel qapi: Convert migrate_set_downtime qapi: Convert migrate_set_speed Mark Wu (1): qmp: add test tool for QMP Makefile |9 +- QMP/qmp | 126 balloon.c | 28 +-- balloon.h |3 - blockdev.c| 89 +++ blockdev.h|4 - configure |5 +- console.h |2 - cpus.c| 90 ++ docs/writing-qmp-commands.txt | 642 + hmp-commands.hx | 36 +-- hmp.c | 148 ++ hmp.h | 12 + migration.c | 28 +-- migration.h |7 - monitor.c | 201 ++--- monitor.h |3 + net.c | 10 +- net.h |1 - qapi-schema-test.json |6 + qapi-schema.json | 267 + qerror.c |4 + qerror.h |3 + qmp-commands.hx | 77 + qmp.c | 37 +++ test-qmp-input-visitor.c | 270 + test-qmp-output-visitor.c | 423 +++ test-visitor.c| 338 -- 28 files changed, 2153 insertions(+), 716 deletions(-)
[Qemu-devel] [PATCH v2 3/4] qemu-thread: implement joinable threads for Win32
Rewrite the handshaking between qemu_thread_create and the win32_start_routine, so that the thread can be joined without races. Similar handshaking is done now between qemu_thread_exit and qemu_thread_join. This also simplifies how QemuThreads are initialized. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- qemu-thread-win32.c | 107 +-- qemu-thread-win32.h |5 +- 3 files changed, 73 insertions(+), 41 deletions(-) diff --git a/qemu-thread-win32.c b/qemu-thread-win32.c index ff80e84..a13ffcc 100644 --- a/qemu-thread-win32.c +++ b/qemu-thread-win32.c @@ -193,41 +193,78 @@ void qemu_cond_wait(QemuCond *cond, QemuMutex *mutex) } struct QemuThreadData { -QemuThread *thread; -void *(*start_routine)(void *); -void *arg; +/* Passed to win32_start_routine. */ +void *(*start_routine)(void *); +void *arg; +short mode; + +/* Only used for joinable threads. */ +bool exited; +void *ret; +CRITICAL_SECTION cs; }; static int qemu_thread_tls_index = TLS_OUT_OF_INDEXES; static unsigned __stdcall win32_start_routine(void *arg) { -struct QemuThreadData data = *(struct QemuThreadData *) arg; -QemuThread *thread = data.thread; - -free(arg); -TlsSetValue(qemu_thread_tls_index, thread); - -/* - * Use DuplicateHandle instead of assigning thread-thread in the - * creating thread to avoid races. It's simpler this way than with - * synchronization. - */ -DuplicateHandle(GetCurrentProcess(), GetCurrentThread(), -GetCurrentProcess(), thread-thread, -0, FALSE, DUPLICATE_SAME_ACCESS); - -qemu_thread_exit(data.start_routine(data.arg)); +QemuThreadData *data = (QemuThreadData *) arg; +void *(*start_routine)(void *) = data-start_routine; +void *thread_arg = data-arg; + +if (data-mode == QEMU_THREAD_DETACHED) { +g_free(data); +data = NULL; +} else { +InitializeCriticalSection(data-cs); +} +TlsSetValue(qemu_thread_tls_index, data); +qemu_thread_exit(start_routine(thread_arg)); abort(); } void qemu_thread_exit(void *arg) { -QemuThread *thread = TlsGetValue(qemu_thread_tls_index); -thread-ret = arg; -CloseHandle(thread-thread); -thread-thread = NULL; -ExitThread(0); +QemuThreadData *data = TlsGetValue(qemu_thread_tls_index); +if (data) { +data-ret = arg; +EnterCriticalSection(data-cs); +data-exited = true; +LeaveCriticalSection(data-cs); +} +_endthreadex(0); +} + +void *qemu_thread_join(QemuThread *thread) +{ +QemuThreadData *data; +void *ret; +HANDLE handle; + +data = thread-data; +if (!data) { +return NULL; +} +/* + * Because multiple copies of the QemuThread can exist via + * qemu_thread_get_self, we need to store a value that cannot + * leak there. The simplest, non racy way is to store the TID, + * discard the handle that _beginthreadex gives back, and + * get another copy of the handle here. + */ +EnterCriticalSection(data-cs); +if (!data-exited) { +handle = OpenThread(SYNCHRONIZE, FALSE, thread-tid); +LeaveCriticalSection(data-cs); +WaitForSingleObject(handle, INFINITE); +CloseHandle(handle); +} else { +LeaveCriticalSection(data-cs); +} +ret = data-ret; +DeleteCriticalSection(data-cs); +g_free(data); +return ret; } static inline void qemu_thread_init(void) @@ -247,37 +284,31 @@ void qemu_thread_create(QemuThread *thread, { HANDLE hThread; -assert(mode == QEMU_THREAD_DETACHED); - struct QemuThreadData *data; qemu_thread_init(); data = g_malloc(sizeof *data); -data-thread = thread; data-start_routine = start_routine; data-arg = arg; +data-mode = mode; +data-exited = false; hThread = (HANDLE) _beginthreadex(NULL, 0, win32_start_routine, - data, 0, NULL); + data, 0, thread-tid); if (!hThread) { error_exit(GetLastError(), __func__); } CloseHandle(hThread); +thread-data = (mode == QEMU_THREAD_DETACHED) ? NULL : data; } void qemu_thread_get_self(QemuThread *thread) { -if (!thread-thread) { -/* In the main thread of the process. Initialize the QemuThread - pointer in TLS, and use the dummy GetCurrentThread handle as - the identifier for qemu_thread_is_self. */ -qemu_thread_init(); -TlsSetValue(qemu_thread_tls_index, thread); -thread-thread = GetCurrentThread(); -} +qemu_thread_init(); +thread-data = TlsGetValue(qemu_thread_tls_index); +thread-tid = GetCurrentThreadId(); } int qemu_thread_is_self(QemuThread *thread) { -QemuThread *this_thread =
Re: [Qemu-devel] [PATCH] Mark future contributions to GPLv2-only files as GPLv2+
On 10/21/2011 09:03 AM, Paolo Bonzini wrote: Even for files are licensed GPLv2-only, let's not play catch with ourselves, and explicitly declare that future contributions to those files will also be available as any later version. Signed-off-by: Paolo Bonzinipbonz...@redhat.com So where do we stand with this? Regards, Anthony Liguori --- aio.c |2 ++ block-migration.c |2 ++ block/raw-posix-aio.h |2 ++ block/rbd.c|2 ++ block/sheepdog.c |3 +++ buffered_file.c|2 ++ compatfd.c |2 ++ hmp.c |2 ++ hw/ac97.c |3 +++ hw/acpi.c |3 +++ hw/acpi_piix4.c|3 +++ hw/ads7846.c |3 +++ hw/apm.c |3 +++ hw/bitbang_i2c.c |3 +++ hw/bonito.c|3 +++ hw/collie.c|3 +++ hw/ds1338.c|3 +++ hw/ecc.c |3 +++ hw/event_notifier.c|3 +++ hw/framebuffer.c |3 +++ hw/gumstix.c |3 +++ hw/ivshmem.c |3 +++ hw/kvmclock.c |2 ++ hw/lan9118.c |3 +++ hw/mainstone.c |3 +++ hw/marvell_88w8618_audio.c |3 +++ hw/max111x.c |3 +++ hw/mips_fulong2e.c |3 +++ hw/msix.c |3 +++ hw/mst_fpga.c |3 +++ hw/musicpal.c |3 +++ hw/nand.c |3 +++ hw/pl031.c |2 ++ hw/pxa2xx_keypad.c |3 +++ hw/pxa2xx_lcd.c|3 +++ hw/pxa2xx_mmci.c |3 +++ hw/pxa2xx_pcmcia.c |3 +++ hw/smbios.c|2 ++ hw/spitz.c |3 +++ hw/ssi-sd.c|3 +++ hw/ssi.c |3 +++ hw/strongarm.c |3 +++ hw/tc6393xb.c |3 +++ hw/tosa.c |3 +++ hw/vexpress.c |3 +++ hw/vhost.c |3 +++ hw/vhost_net.c |3 +++ hw/virtio-pci.c|2 ++ hw/virtio-serial-bus.c |3 +++ hw/vt82c686.c |3 +++ hw/xen_backend.c |3 +++ hw/xen_disk.c |3 +++ hw/xen_nic.c |3 +++ hw/z2.c|3 +++ iov.c |3 +++ memory.c |2 ++ migration-exec.c |2 ++ migration-fd.c |2 ++ migration-tcp.c|2 ++ migration-unix.c |2 ++ migration.c|2 ++ module.c |2 ++ net/checksum.c |3 +++ notify.c |2 ++ pflib.c|2 ++ posix-aio-compat.c |2 ++ qemu-tool.c|2 ++ qmp.c |2 ++ roms/SLOF |2 +- xen-all.c |2 ++ xen-mapcache.c |2 ++ xen-stub.c |2 ++ 72 files changed, 188 insertions(+), 1 deletions(-) diff --git a/aio.c b/aio.c index 1239ca7..c6f3cb1 100644 --- a/aio.c +++ b/aio.c @@ -9,6 +9,8 @@ * This work is licensed under the terms of the GNU GPL, version 2. See * the COPYING file in the top-level directory. * + * Contributions after 2011-10-25 are licensed under the terms of the + * GNU GPL, version 2 or (at your option) any later version. */ #include qemu-common.h diff --git a/block-migration.c b/block-migration.c index 0bff075..32c2eea 100644 --- a/block-migration.c +++ b/block-migration.c @@ -9,6 +9,8 @@ * This work is licensed under the terms of the GNU GPL, version 2. See * the COPYING file in the top-level directory. * + * Contributions after 2011-10-25 are licensed under the terms of the + * GNU GPL, version 2 or (at your option) any later version. */ #include qemu-common.h diff --git a/block/raw-posix-aio.h b/block/raw-posix-aio.h index dfc63b8..d6d7275 100644 --- a/block/raw-posix-aio.h +++ b/block/raw-posix-aio.h @@ -9,6 +9,8 @@ * This work is licensed under the terms of the GNU GPL, version 2. See * the COPYING file in the top-level directory. * + * Contributions after 2011-10-25 are licensed under the terms of the + * GNU GPL, version 2 or (at your option) any later version. */ #ifndef QEMU_RAW_POSIX_AIO_H #define QEMU_RAW_POSIX_AIO_H diff --git a/block/rbd.c b/block/rbd.c index 3068c82..b726c80 100644 --- a/block/rbd.c +++ b/block/rbd.c @@ -7,6 +7,8 @@ * This work is licensed under the terms of the GNU GPL, version 2. See * the COPYING file in the top-level directory. * + * Contributions after 2011-10-25 are licensed under the terms of the + * GNU GPL, version 2 or (at your option) any later version. */ #includeinttypes.h diff
Re: [Qemu-devel] [PATCH v4 00/15] uq/master: Introduce basic irqchip support
On Thu, Dec 08, 2011 at 12:52:19PM +0100, Jan Kiszka wrote: Changes in v4: - rebased of current uq/master - fixed stupid bugs that broke bisectability and user space irqchip mode - integrated NMI-over-LINT1 injection logic CC: Lai Jiangshan la...@cn.fujitsu.com Jan Kiszka (15): msi: Generalize msix_supported to msi_supported kvm: Move kvmclock into hw/kvm folder apic: Stop timer on reset apic: Inject external NMI events via LINT1 apic: Introduce backend/frontend infrastructure for KVM reuse apic: Open-code timer save/restore i8259: Introduce backend/frontend infrastructure for KVM reuse ioapic: Introduce backend/frontend infrastructure for KVM reuse memory: Introduce memory_region_init_reservation kvm: Introduce core services for in-kernel irqchip support kvm: x86: Establish IRQ0 override control kvm: x86: Add user space part for in-kernel APIC kvm: x86: Add user space part for in-kernel i8259 kvm: x86: Add user space part for in-kernel IOAPIC kvm: Arm in-kernel irqchip support Looks good to me. Any thoughts on the qemu-kvm merge plan? Sounds painful.
Re: [Qemu-devel] [PATCH 01/11] isa: give ISABus/ISADevice to isa_create(), isa_bus_irqs() and isa_get_irq() functions
On 10/24/2011 03:18 PM, Hervé Poussineau wrote: NULL is a valid bus/device, so there is no change in behaviour. Signed-off-by: Hervé Poussineauhpous...@reactos.org Reviewed-by: Anthony Liguori aligu...@us.ibm.com Regards, Anthony Liguori --- arch_init.c|8 arch_init.h|2 +- hw/adlib.c |2 +- hw/alpha_dp264.c | 10 ++ hw/alpha_typhoon.c |7 --- hw/audiodev.h |8 hw/cs4231a.c |4 ++-- hw/fdc.h |4 ++-- hw/gus.c |4 ++-- hw/i8254.c |2 +- hw/i8259.c |6 +++--- hw/ide.h |2 +- hw/ide/isa.c |4 ++-- hw/ide/piix.c |2 +- hw/ide/via.c |2 +- hw/isa-bus.c | 18 +++--- hw/isa.h | 10 +- hw/m48t59.c|5 +++-- hw/mc146818rtc.c |4 ++-- hw/mc146818rtc.h |2 +- hw/mips_fulong2e.c | 16 +--- hw/mips_jazz.c | 13 +++-- hw/mips_malta.c| 26 ++ hw/mips_r4k.c | 21 +++-- hw/nvram.h |3 ++- hw/pc.c| 30 +++--- hw/pc.h| 35 ++- hw/pc_piix.c | 19 +++ hw/pcspk.c |2 +- hw/ppc_prep.c | 20 +++- hw/sb16.c |4 ++-- hw/sun4u.c | 20 qemu-common.h |1 + 33 files changed, 171 insertions(+), 145 deletions(-) diff --git a/arch_init.c b/arch_init.c index a411fdf..3bc2a41 100644 --- a/arch_init.c +++ b/arch_init.c @@ -473,7 +473,7 @@ struct soundhw { int enabled; int isa; union { -int (*init_isa) (qemu_irq *pic); +int (*init_isa) (ISABus *bus, qemu_irq *pic); int (*init_pci) (PCIBus *bus); } init; }; @@ -628,7 +628,7 @@ void select_soundhw(const char *optarg) } } -void audio_init(qemu_irq *isa_pic, PCIBus *pci_bus) +void audio_init(ISABus *isa_bus, qemu_irq *isa_pic, PCIBus *pci_bus) { struct soundhw *c; @@ -636,7 +636,7 @@ void audio_init(qemu_irq *isa_pic, PCIBus *pci_bus) if (c-enabled) { if (c-isa) { if (isa_pic) { -c-init.init_isa(isa_pic); +c-init.init_isa(isa_bus, isa_pic); } } else { if (pci_bus) { @@ -650,7 +650,7 @@ void audio_init(qemu_irq *isa_pic, PCIBus *pci_bus) void select_soundhw(const char *optarg) { } -void audio_init(qemu_irq *isa_pic, PCIBus *pci_bus) +void audio_init(ISABus *isa_bus, qemu_irq *isa_pic, PCIBus *pci_bus) { } #endif diff --git a/arch_init.h b/arch_init.h index a74187a..074f02a 100644 --- a/arch_init.h +++ b/arch_init.h @@ -27,7 +27,7 @@ void do_acpitable_option(const char *optarg); void do_smbios_option(const char *optarg); void cpudef_init(void); int audio_available(void); -void audio_init(qemu_irq *isa_pic, PCIBus *pci_bus); +void audio_init(ISABus *isa_bus, qemu_irq *isa_pic, PCIBus *pci_bus); int tcg_available(void); int kvm_available(void); int xen_available(void); diff --git a/hw/adlib.c b/hw/adlib.c index e4bfcc6..b5e1564 100644 --- a/hw/adlib.c +++ b/hw/adlib.c @@ -275,7 +275,7 @@ static void Adlib_fini (AdlibState *s) AUD_remove_card (s-card); } -int Adlib_init (qemu_irq *pic) +int Adlib_init (ISABus *bus, qemu_irq *pic) { AdlibState *s =glob_adlib; struct audsettings as; diff --git a/hw/alpha_dp264.c b/hw/alpha_dp264.c index fcc20e9..a87d6ef 100644 --- a/hw/alpha_dp264.c +++ b/hw/alpha_dp264.c @@ -50,6 +50,7 @@ static void clipper_init(ram_addr_t ram_size, { CPUState *cpus[4]; PCIBus *pci_bus; +ISABus *isa_bus; qemu_irq rtc_irq; long size, i; const char *palcode_filename; @@ -68,10 +69,11 @@ static void clipper_init(ram_addr_t ram_size, /* Init the chipset. */ pci_bus = typhoon_init(ram_size,rtc_irq, cpus, clipper_pci_map_irq); +isa_bus = NULL; -rtc_init(1980, rtc_irq); -pit_init(0x40, 0); -isa_create_simple(i8042); +rtc_init(isa_bus, 1980, rtc_irq); +pit_init(isa_bus, 0x40, 0); +isa_create_simple(isa_bus, i8042); /* VGA setup. Don't bother loading the bios. */ alpha_pci_vga_setup(pci_bus); @@ -79,7 +81,7 @@ static void clipper_init(ram_addr_t ram_size, /* Serial code setup. */ for (i = 0; i MAX_SERIAL_PORTS; ++i) { if (serial_hds[i]) { -serial_isa_init(i, serial_hds[i]); +serial_isa_init(isa_bus, i, serial_hds[i]); } } diff --git a/hw/alpha_typhoon.c b/hw/alpha_typhoon.c index c7608bb..113837d 100644 --- a/hw/alpha_typhoon.c +++ b/hw/alpha_typhoon.c @@ -791,11 +791,12 @@ PCIBus *typhoon_init(ram_addr_t ram_size, qemu_irq *p_rtc_irq, /* ??? Technically there should be a cy82c693ub pci-isa bridge. */ {
Re: [Qemu-devel] [PATCH 02/11] isa: move ISABus structure definition to header file
On 10/24/2011 03:18 PM, Hervé Poussineau wrote: Signed-off-by: Hervé Poussineauhpous...@reactos.org Reviewed-by: Anthony Liguori aligu...@us.ibm.com Regards, Anthony Liguori --- hw/isa-bus.c |5 - hw/isa.h |6 ++ 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/hw/isa-bus.c b/hw/isa-bus.c index dcbb134..7c94f0b 100644 --- a/hw/isa-bus.c +++ b/hw/isa-bus.c @@ -22,11 +22,6 @@ #include isa.h #include exec-memory.h -struct ISABus { -BusState qbus; -MemoryRegion *address_space_io; -qemu_irq *irqs; -}; static ISABus *isabus; target_phys_addr_t isa_mem_base = 0; diff --git a/hw/isa.h b/hw/isa.h index 4b58e37..0462521 100644 --- a/hw/isa.h +++ b/hw/isa.h @@ -13,6 +13,12 @@ typedef struct ISABus ISABus; typedef struct ISADevice ISADevice; typedef struct ISADeviceInfo ISADeviceInfo; +struct ISABus { +BusState qbus; +MemoryRegion *address_space_io; +qemu_irq *irqs; +}; + struct ISADevice { DeviceState qdev; uint32_t isairq[2];
Re: [Qemu-devel] [PATCH 03/11] i8259: give ISA device to isa_register_ioport()
On 10/24/2011 03:18 PM, Hervé Poussineau wrote: Signed-off-by: Hervé Poussineauhpous...@reactos.org Reviewed-by: Anthony Liguori aligu...@us.ibm.com Regards, Anthony Liguori --- hw/i8259.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/i8259.c b/hw/i8259.c index 4446339..7331e0e 100644 --- a/hw/i8259.c +++ b/hw/i8259.c @@ -469,9 +469,9 @@ static int pic_initfn(ISADevice *dev) memory_region_init_io(s-base_io,pic_base_ioport_ops, s, pic, 2); memory_region_init_io(s-elcr_io,pic_elcr_ioport_ops, s, elcr, 1); -isa_register_ioport(NULL,s-base_io, s-iobase); +isa_register_ioport(dev,s-base_io, s-iobase); if (s-elcr_addr != -1) { -isa_register_ioport(NULL,s-elcr_io, s-elcr_addr); +isa_register_ioport(dev,s-elcr_io, s-elcr_addr); } qdev_init_gpio_out(dev-qdev, s-int_out, ARRAY_SIZE(s-int_out));
Re: [Qemu-devel] [PATCH 04/11] pc: give ISA bus to ISA methods
On 10/24/2011 03:18 PM, Hervé Poussineau wrote: Signed-off-by: Hervé Poussineauhpous...@reactos.org Reviewed-by: Anthony Liguori aligu...@us.ibm.com Regards, Anthony Liguori --- hw/pc.h |2 +- hw/pc_piix.c |3 +-- hw/piix_pci.c |8 +--- 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/hw/pc.h b/hw/pc.h index c43fa73..127940c 100644 --- a/hw/pc.h +++ b/hw/pc.h @@ -181,7 +181,7 @@ struct PCII440FXState; typedef struct PCII440FXState PCII440FXState; PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int *piix_devfn, -qemu_irq *pic, +ISABus **isa_bus, qemu_irq *pic, MemoryRegion *address_space_mem, MemoryRegion *address_space_io, ram_addr_t ram_size, diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 6bc1f60..be91d3b 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -135,7 +135,7 @@ static void pc_init1(MemoryRegion *system_memory, gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS); if (pci_enabled) { -pci_bus = i440fx_init(i440fx_state,piix3_devfn, gsi, +pci_bus = i440fx_init(i440fx_state,piix3_devfn,isa_bus, gsi, system_memory, system_io, ram_size, below_4g_mem_size, 0x1ULL - below_4g_mem_size, @@ -144,7 +144,6 @@ static void pc_init1(MemoryRegion *system_memory, ? 0 : ((uint64_t)1 62)), pci_memory, ram_memory); -isa_bus = NULL; } else { pci_bus = NULL; i440fx_state = NULL; diff --git a/hw/piix_pci.c b/hw/piix_pci.c index d183443..aef2d7f 100644 --- a/hw/piix_pci.c +++ b/hw/piix_pci.c @@ -263,7 +263,7 @@ static int i440fx_initfn(PCIDevice *dev) static PCIBus *i440fx_common_init(const char *device_name, PCII440FXState **pi440fx_state, int *piix3_devfn, - qemu_irq *pic, + ISABus **isa_bus, qemu_irq *pic, MemoryRegion *address_space_mem, MemoryRegion *address_space_io, ram_addr_t ram_size, @@ -325,6 +325,8 @@ static PCIBus *i440fx_common_init(const char *device_name, PIIX_NUM_PIRQS); } piix3-pic = pic; +*isa_bus = DO_UPCAST(ISABus, qbus, + qdev_get_child_bus(piix3-dev.qdev, isa.0)); (*pi440fx_state)-piix3 = piix3; @@ -341,7 +343,7 @@ static PCIBus *i440fx_common_init(const char *device_name, } PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int *piix3_devfn, -qemu_irq *pic, +ISABus **isa_bus, qemu_irq *pic, MemoryRegion *address_space_mem, MemoryRegion *address_space_io, ram_addr_t ram_size, @@ -354,7 +356,7 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int *piix3_devfn, { PCIBus *b; -b = i440fx_common_init(i440FX, pi440fx_state, piix3_devfn, pic, +b = i440fx_common_init(i440FX, pi440fx_state, piix3_devfn, isa_bus, pic, address_space_mem, address_space_io, ram_size, pci_hole_start, pci_hole_size, pci_hole64_size, pci_hole64_size,
Re: [Qemu-devel] [PATCH 00/11] isa: preliminary work for multiple buses
On 10/24/2011 03:18 PM, Hervé Poussineau wrote: Current patches are a rework of my patches already available at [1]. They don't provide full support for multiple ISA buses (yet), but add a ISABus or ISADevice argument to all ISA functions. They are mostly mechanically touching every instanciation of ISA devices, so number of lines is quite high even if impact is quite low. Some patches don't pass checkpass check due to spaces around parentheses, but malc asked to do so on files he maintains. Some more patches will be provided after Qemu 1.0 to support multiple ISA buses, but will mostly touch ISA bridges and hw/isa-bus.c file. I think that this first step can be applied now (before release), so ISA interface may be considered stable for devices and machine emulations. Please consider applying this before Qemu 1.0. Reviewed-by: Anthony Liguori aligu...@us.ibm.com But could you rebase this series? It doesn't apply very well right now. This is a very nice cleanup, sorry it's taken so long to get it applied. Regards, Anthony Liguori Thanks [1] http://lists.gnu.org/archive/html/qemu-devel/2011-10/msg00094.html Hervé Poussineau (11): isa: give ISABus/ISADevice to isa_create(), isa_bus_irqs() and isa_get_irq() functions isa: move ISABus structure definition to header file i8259: give ISA device to isa_register_ioport() pc: give ISA bus to ISA methods alpha: give ISA bus to ISA methods sun4u: give ISA bus to ISA methods fulong2e: give ISA bus to ISA methods malta: give ISA bus to ISA methods isa: always use provided ISA bus when creating an isa device isa: always use provided ISA bus in isa_bus_irqs() audio: remove unused parameter isa_pic arch_init.c| 10 +- arch_init.h|2 +- hw/adlib.c |2 +- hw/alpha_dp264.c | 12 +++- hw/alpha_sys.h |3 ++- hw/alpha_typhoon.c |9 + hw/audiodev.h |8 hw/cs4231a.c |4 ++-- hw/fdc.h |4 ++-- hw/gus.c |4 ++-- hw/i8254.c |2 +- hw/i8259.c | 10 +- hw/ide.h |2 +- hw/ide/isa.c |4 ++-- hw/ide/piix.c |2 +- hw/ide/via.c |2 +- hw/isa-bus.c | 33 - hw/isa.h | 16 +++- hw/m48t59.c|5 +++-- hw/mc146818rtc.c |4 ++-- hw/mc146818rtc.h |2 +- hw/mips_fulong2e.c | 20 ++-- hw/mips_jazz.c | 13 +++-- hw/mips_malta.c| 27 ++- hw/mips_r4k.c | 21 +++-- hw/nvram.h |3 ++- hw/pc.c| 30 +++--- hw/pc.h| 39 --- hw/pc_piix.c | 20 +++- hw/pcspk.c |2 +- hw/piix4.c |3 ++- hw/piix_pci.c |8 +--- hw/ppc_prep.c | 20 +++- hw/sb16.c |4 ++-- hw/sun4u.c | 24 +++- hw/vt82c686.c |4 ++-- hw/vt82c686.h |2 +- qemu-common.h |1 + 38 files changed, 205 insertions(+), 176 deletions(-)
Re: [Qemu-devel] [PATCH v4 00/15] uq/master: Introduce basic irqchip support
On 2011-12-12 17:37, Marcelo Tosatti wrote: On Thu, Dec 08, 2011 at 12:52:19PM +0100, Jan Kiszka wrote: Changes in v4: - rebased of current uq/master - fixed stupid bugs that broke bisectability and user space irqchip mode - integrated NMI-over-LINT1 injection logic CC: Lai Jiangshan la...@cn.fujitsu.com Jan Kiszka (15): msi: Generalize msix_supported to msi_supported kvm: Move kvmclock into hw/kvm folder apic: Stop timer on reset apic: Inject external NMI events via LINT1 apic: Introduce backend/frontend infrastructure for KVM reuse apic: Open-code timer save/restore i8259: Introduce backend/frontend infrastructure for KVM reuse ioapic: Introduce backend/frontend infrastructure for KVM reuse memory: Introduce memory_region_init_reservation kvm: Introduce core services for in-kernel irqchip support kvm: x86: Establish IRQ0 override control kvm: x86: Add user space part for in-kernel APIC kvm: x86: Add user space part for in-kernel i8259 kvm: x86: Add user space part for in-kernel IOAPIC kvm: Arm in-kernel irqchip support Looks good to me. Any thoughts on the qemu-kvm merge plan? Sounds painful. Pain will be where the existing qemu-kvm extensions collide with these refactored upstream devices (backend/frontend split specifically). That's where we have to merge very carefully. Haven't tried this yet, will give it a spin tomorrow or so. From that point on, disabling the new stuff for now and at some point switching over from the old one should be simple again. BTW, PIT+HPET+speaker will cause similar issues for the same reasons. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC
On Mon, Dec 12, 2011 at 10:24:53AM -0600, Anthony Liguori wrote: On 12/12/2011 10:00 AM, Alon Levy wrote: On Mon, Dec 12, 2011 at 03:23:35PM +, Stefan Hajnoczi wrote: On Mon, Dec 12, 2011 at 12:39 PM, Gerd Hoffmannkra...@redhat.com wrote: On 12/12/11 13:10, Stefan Hajnoczi wrote: On Mon, Dec 12, 2011 at 11:29 AM, Gerd Hoffmannkra...@redhat.com wrote: On 12/12/11 11:18, Stefan Hajnoczi wrote: On Sun, Dec 11, 2011 at 10:29 AM, Alon Levyal...@redhat.com wrote: On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote: Hi there, I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out that the command client_migrate_info uses it. That's a legacy interface and has to be dropped, no command should be using it... Something tells me that if I just drop it (and change the command to use the regular interface), bad things will happen. Am I right? :) The monitor command client_migrate_info needs to complete after getting an ACK message from the currently connected spice client (this is the only case where this is required - if there is no client then the MONITOR_CMD_ASYNC API won't be used). This in turn requires the main thread to perform select and call the callback that will process this ACK. That's why the MONITOR_CMD_ASYNC API was used. I'm not aware of any other way to do this, I'll be glad for any help here. Also, I understand this is not what is not true async, since one would expect a true async interface to support multiple in flight monitor commands. If there is any ETA or existing way to do this we could change the implementation of client_migrate_info. Is it possible to use a QMP event to signal completion instead of a MONITOR_CMD_ASYNC command? Problem is this breaks the qemu- libvirt interface. I had the same issue with the block_job_cancel command, which Adam Litke and Eric Blake helped us fix and find a backward-compatible libvirt solution for: http://www.redhat.com/archives/libvir-list/2011-November/msg01351.html Isn't going to fly as waiting for completion isn't optional in that case. Workflow is this: (1) libvirt issues client_migrate_info command. (2) qemu forwards it to spice-server, which in turn forwards it to the spice client (if connected). (3) spice client connects to the migration target machine. (4) spice client signals completion to spice-server, which in turn notifies qemu. (5) qemu calls the monitor completion callback, libvirt gets client_migrate_info result. (6) libvirt issues migrate command. The problem is that (3) must be finished before (6) because qemu on the target side doesn't accept incoming tcp connections any more once the migration started. I don't see a way to switch this to qmp events without breaking old libvirt versions managing new qemu. I don't see a solution in this case either. Looks like libvirt supports this command since 0.9.2 so it's not a good idea to just yank it. It might be possible for the QEMU client_migrate_info handler to run a nested event loop in the legacy libvirt case. This would suck since the VM is unresponsive while waiting for spice migration to complete. New libvirt would call the async version of the command which is well-behaved and uses a QMP event to signal completion. I agree that a nested event loop would be a bad solution, the point is to let the guest continue to work while waiting, otherwise you destroy the live migration experience, might as well disconnect the client from the source and have it connect to the target. I thought the whole point of MONITOR_CMD_ASYNC was to allow this scenario. So iiuc QMP is the alternative but it would require a rewrite, i.e. break existing users like libvirt. Hence my suggestion as a reply to Luiz's initial email that we just deprecate MONITOR_CMD_ASYNC right now instead of dropping it, It *has* to be dropped. Any command using it is fundamentally broken. Care to explain what is fundamental about it? The command should have never been introduced in the first place to use async. Gerd gave a very good explanation of the requirement, I don't think I can add anything to it. What you suggest is to rewrite it, I just don't understand why we can't leave the MONITOR_CMD_ASYNC around until there is both an alternative and libvirt learns to use it. Regards, Anthony Liguori
Re: [Qemu-devel] [PATCH] ARM - Remove fixed map code buffer restriction
Hi, On 12 December 2011 16:55, Peter Maydell peter.mayd...@linaro.org wrote: On 12 December 2011 15:37, Dr. David Alan Gilbert david.gilb...@linaro.org wrote: On ARM, don't map the code buffer at a fixed location, and fix up the call/goto tcg routines to let it do long jumps. Mapping the code buffer at a fixed address could sometimes result in it being mapped over the top of the heap with pretty random results. This diff is against v1.0. Patches should always be against master, although we're still close enough to 1.0 that this will probably apply anyway. Comments like This diff is against v1.0. go below the '---' so they don't appear in the final commit log. Code generally looks OK to me. To me as well, I'll apply if there are no objections. I remember Peter asked about MAP_FIXED on other hosts, has there been a resolution? Signed-off-by: Dr. David Alan Gilbert david.gilb...@linaro.org --- exec.c | 4 +--- tcg/arm/tcg-target.c | 31 --- 2 files changed, 13 insertions(+), 22 deletions(-) diff --git a/exec.c b/exec.c index 6b92198..ef83da1 100644 --- a/exec.c +++ b/exec.c @@ -497,9 +497,7 @@ static void code_gen_alloc(unsigned long tb_size) if (code_gen_buffer_size (512 * 1024 * 1024)) code_gen_buffer_size = (512 * 1024 * 1024); #elif defined(__arm__) - /* Map the buffer below 32M, so we can use direct calls and branches */ - flags |= MAP_FIXED; - start = (void *) 0x0100UL; + /* Keep the buffer no bigger than 16GB to branch between blocks */ if (code_gen_buffer_size 16 * 1024 * 1024) code_gen_buffer_size = 16 * 1024 * 1024; #elif defined(__s390x__) diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c index e05a64f..730d913 100644 --- a/tcg/arm/tcg-target.c +++ b/tcg/arm/tcg-target.c @@ -842,6 +842,12 @@ static inline void tcg_out_st8(TCGContext *s, int cond, tcg_out_st8_12(s, cond, rd, rn, offset); } +/* The _goto case is normally between TBs within the same code buffer, + and with the code buffer limited to 16GB we shouldn't need the long + case. + + except to the prologue that is in its own buffer. + */ static inline void tcg_out_goto(TCGContext *s, int cond, uint32_t addr) { int32_t val; @@ -855,22 +861,20 @@ static inline void tcg_out_goto(TCGContext *s, int cond, uint32_t addr) if (val - 8 0x01fd val - 8 -0x01fd) tcg_out_b(s, cond, val); else { -#if 1 - tcg_abort(); -#else if (cond == COND_AL) { tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4); - tcg_out32(s, addr); /* XXX: This is l-u.value, can we use it? */ + tcg_out32(s, addr); I see these XXXs have been there since the ARM tcg target was introduced. Does anybody know what they were referring to? Andrzej? David asked me about them, but I couldn't figure it out. (the part in tcg_out_call apparently is just copypasted from tcg_ouy_goto) BTW: I think we can also use the ld branch when we see the goto target is in Thumb mode. Cheers
Re: [Qemu-devel] [PATCH v4 00/15] uq/master: Introduce basic irqchip support
On 12/12/2011 06:51 PM, Jan Kiszka wrote: Any thoughts on the qemu-kvm merge plan? Sounds painful. Pain will be where the existing qemu-kvm extensions collide with these refactored upstream devices (backend/frontend split specifically). That's where we have to merge very carefully. Haven't tried this yet, will give it a spin tomorrow or so. From that point on, disabling the new stuff for now and at some point switching over from the old one should be simple again. BTW, PIT+HPET+speaker will cause similar issues for the same reasons. It's a little late for this, but refactoring qemu-kvm in-tree and then splitting it into patches would have been easier. Let's try it this way for the next batch. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH v4 00/15] uq/master: Introduce basic irqchip support
On 2011-12-12 18:37, Avi Kivity wrote: On 12/12/2011 06:51 PM, Jan Kiszka wrote: Any thoughts on the qemu-kvm merge plan? Sounds painful. Pain will be where the existing qemu-kvm extensions collide with these refactored upstream devices (backend/frontend split specifically). That's where we have to merge very carefully. Haven't tried this yet, will give it a spin tomorrow or so. From that point on, disabling the new stuff for now and at some point switching over from the old one should be simple again. BTW, PIT+HPET+speaker will cause similar issues for the same reasons. It's a little late for this, but refactoring qemu-kvm in-tree and then splitting it into patches would have been easier. Let's try it this way for the next batch. I thought about this, but it definitely takes a clean, qemu-kvm free base as start. The point is to design something free of all the legacy, only looking at the other code base to extract the logic. Moreover, there was and still is quite some upstream cleanup necessary, and that never goes well with the delta of qemu-kvm. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC
On 12/12/2011 11:22 AM, Alon Levy wrote: On Mon, Dec 12, 2011 at 10:24:53AM -0600, Anthony Liguori wrote: On 12/12/2011 10:00 AM, Alon Levy wrote: On Mon, Dec 12, 2011 at 03:23:35PM +, Stefan Hajnoczi wrote: On Mon, Dec 12, 2011 at 12:39 PM, Gerd Hoffmannkra...@redhat.com wrote: On 12/12/11 13:10, Stefan Hajnoczi wrote: On Mon, Dec 12, 2011 at 11:29 AM, Gerd Hoffmannkra...@redhat.com wrote: On 12/12/11 11:18, Stefan Hajnoczi wrote: On Sun, Dec 11, 2011 at 10:29 AM, Alon Levyal...@redhat.com wrote: On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote: Hi there, I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out that the command client_migrate_info uses it. That's a legacy interface and has to be dropped, no command should be using it... Something tells me that if I just drop it (and change the command to use the regular interface), bad things will happen. Am I right? :) The monitor command client_migrate_info needs to complete after getting an ACK message from the currently connected spice client (this is the only case where this is required - if there is no client then the MONITOR_CMD_ASYNC API won't be used). This in turn requires the main thread to perform select and call the callback that will process this ACK. That's why the MONITOR_CMD_ASYNC API was used. I'm not aware of any other way to do this, I'll be glad for any help here. Also, I understand this is not what is not true async, since one would expect a true async interface to support multiple in flight monitor commands. If there is any ETA or existing way to do this we could change the implementation of client_migrate_info. Is it possible to use a QMP event to signal completion instead of a MONITOR_CMD_ASYNC command? Problem is this breaks the qemu- libvirt interface. I had the same issue with the block_job_cancel command, which Adam Litke and Eric Blake helped us fix and find a backward-compatible libvirt solution for: http://www.redhat.com/archives/libvir-list/2011-November/msg01351.html Isn't going to fly as waiting for completion isn't optional in that case. Workflow is this: (1) libvirt issues client_migrate_info command. (2) qemu forwards it to spice-server, which in turn forwards it to the spice client (if connected). (3) spice client connects to the migration target machine. (4) spice client signals completion to spice-server, which in turn notifies qemu. (5) qemu calls the monitor completion callback, libvirt gets client_migrate_info result. (6) libvirt issues migrate command. The problem is that (3) must be finished before (6) because qemu on the target side doesn't accept incoming tcp connections any more once the migration started. I don't see a way to switch this to qmp events without breaking old libvirt versions managing new qemu. I don't see a solution in this case either. Looks like libvirt supports this command since 0.9.2 so it's not a good idea to just yank it. It might be possible for the QEMU client_migrate_info handler to run a nested event loop in the legacy libvirt case. This would suck since the VM is unresponsive while waiting for spice migration to complete. New libvirt would call the async version of the command which is well-behaved and uses a QMP event to signal completion. I agree that a nested event loop would be a bad solution, the point is to let the guest continue to work while waiting, otherwise you destroy the live migration experience, might as well disconnect the client from the source and have it connect to the target. I thought the whole point of MONITOR_CMD_ASYNC was to allow this scenario. So iiuc QMP is the alternative but it would require a rewrite, i.e. break existing users like libvirt. Hence my suggestion as a reply to Luiz's initial email that we just deprecate MONITOR_CMD_ASYNC right now instead of dropping it, It *has* to be dropped. Any command using it is fundamentally broken. Care to explain what is fundamental about it? The commands are broken as they sit today. If you issue an async command and an error occurs, the error will not be associated with the right command. The command should have never been introduced in the first place to use async. Gerd gave a very good explanation of the requirement, I don't think I can add anything to it. What you suggest is to rewrite it, I just don't understand why we can't leave the MONITOR_CMD_ASYNC around until there is both an alternative and libvirt learns to use it. It's broken and more importantly, the longer we leave it there the more likely people will use it and create more problems. We aren't breaking an ABI here. libvirt can detect that this command is gone and probe for the new command. This shouldn't have gone in in the first place and yes, that's my responsibility to enforce. Regards, Anthony Liguori Regards, Anthony Liguori
Re: [Qemu-devel] [PATCH v4 00/15] uq/master: Introduce basic irqchip support
On 12/12/2011 07:42 PM, Jan Kiszka wrote: It's a little late for this, but refactoring qemu-kvm in-tree and then splitting it into patches would have been easier. Let's try it this way for the next batch. I thought about this, but it definitely takes a clean, qemu-kvm free base as start. The point is to design something free of all the legacy, only looking at the other code base to extract the logic. If qemu-kvm was merged into qemu as is, then I'm sure you'd be able to refactor it into good shape (though there might be a little less motivation). Moreover, there was and still is quite some upstream cleanup necessary, and that never goes well with the delta of qemu-kvm. That's a good point. -- error compiling committee.c: too many arguments to function
[Qemu-devel] [Bug 569760] Re: Error in i386 cmpxchg instruction emulation
** Changed in: qemu Status: New = Fix Committed -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/569760 Title: Error in i386 cmpxchg instruction emulation Status in QEMU: Fix Committed Bug description: As reported in http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=42158 programs using pthreads and fork() under NetBSD/i386 hang when the NetBSD system is run within qemu. This problem affects every version of qemu I have tested, including 0.12.3. I have now tracked down the cause of the problem to a bug in qemu's emulation of the cmpxchg instruction. Quoting the above bug report: In a physical i386 CPU, the cmpxchg instruction performs a comparison and read-modify-write memory cycle. In the case where the comparison outcome is unequal, the read-modify-write cycle is an effective no-op, writing back the same value that was read, and the value of the source operand is loaded into the accumulator. Qemu attempts to emulate this behavior including the redundant memory write. To be precise, qemu first loads the accumulator and then does the redundant memory write. If a page fault occurs during the write, the cmpxchg instruction will be restarted after handling the page fault, but because the accumulator has already been changed, the comparison will now incorrectly yield a result of equal, causing the memory write to write the value from the source operand instead of re-writing the original memory contents. I assume fork() triggers the bug because it write protects pages to implement copy-on-write, thereby producing a situation where the read part of the cmpxchg read-modify-write cycle succeeds but the write part causes a page fault. Patching qemu to only change the accumulator after performing the redundant write fixes the problem for me. I will attach a patch against qemu 0.12.3 shortly. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/569760/+subscriptions
Re: [Qemu-devel] [PATCH] ARM - Remove fixed map code buffer restriction
On 12 December 2011 17:24, andrzej zaborowski balr...@gmail.com wrote: BTW: I think we can also use the ld branch when we see the goto target is in Thumb mode. The target of a goto is currently never Thumb (because gotos are always to other TCG generated code and we only generate ARM insns). If we did need to be able to do a goto-Thumb we'd want to support the short-range version too. But I think we might as well leave it until we actually need it. -- PMM
Re: [Qemu-devel] [PATCH] ARM - Remove fixed map code buffer restriction
On 12 December 2011 19:03, Peter Maydell peter.mayd...@linaro.org wrote: On 12 December 2011 17:24, andrzej zaborowski balr...@gmail.com wrote: BTW: I think we can also use the ld branch when we see the goto target is in Thumb mode. The target of a goto is currently never Thumb (because gotos are always to other TCG generated code and we only generate ARM insns). I'm aware of that, I just like functions that can do what their name says well. :) Cheers
Re: [Qemu-devel] [PATCH] ARM - Remove fixed map code buffer restriction
On 12 December 2011 18:10, andrzej zaborowski balr...@gmail.com wrote: On 12 December 2011 19:03, Peter Maydell peter.mayd...@linaro.org wrote: On 12 December 2011 17:24, andrzej zaborowski balr...@gmail.com wrote: BTW: I think we can also use the ld branch when we see the goto target is in Thumb mode. The target of a goto is currently never Thumb (because gotos are always to other TCG generated code and we only generate ARM insns). I'm aware of that, I just like functions that can do what their name says well. :) It does have an assert which will catch it if you try, so no one should get caught out by it, and on ARMv7 the add is apparently an interworking branch, so I think it might even work. Dave
Re: [Qemu-devel] [PATCH 1/7] Remove unnecessary casts from PCI DMA code in eepro100
On 11/03/2011 08:03 PM, David Gibson wrote: This patch removes some unnecessary casts in the eepro100 device, introduced by commit 16ef60c9a8269f7cbc95219a431b1d7cbf29 'eepro100: Use PCI DMA stub functions'. Signed-off-by: David Gibsonda...@gibson.dropbear.id.au Applied all. Thanks. Regards, Anthony Liguori --- hw/eepro100.c |7 +++ 1 files changed, 3 insertions(+), 4 deletions(-) diff --git a/hw/eepro100.c b/hw/eepro100.c index 7d59e71..8769e33 100644 --- a/hw/eepro100.c +++ b/hw/eepro100.c @@ -713,8 +713,7 @@ static void dump_statistics(EEPRO100State * s) * values which really matter. * Number of data should check configuration!!! */ -pci_dma_write(s-dev, s-statsaddr, - (uint8_t *)s-statistics, s-stats_size); +pci_dma_write(s-dev, s-statsaddr,s-statistics, s-stats_size); stl_le_pci_dma(s-dev, s-statsaddr + 0, s-statistics.tx_good_frames); stl_le_pci_dma(s-dev, s-statsaddr + 36, @@ -732,7 +731,7 @@ static void dump_statistics(EEPRO100State * s) static void read_cb(EEPRO100State *s) { -pci_dma_read(s-dev, s-cb_address, (uint8_t *)s-tx, sizeof(s-tx)); +pci_dma_read(s-dev, s-cb_address,s-tx, sizeof(s-tx)); s-tx.status = le16_to_cpu(s-tx.status); s-tx.command = le16_to_cpu(s-tx.command); s-tx.link = le32_to_cpu(s-tx.link); @@ -1707,7 +1706,7 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size /* !!! */ eepro100_rx_t rx; pci_dma_read(s-dev, s-ru_base + s-ru_offset, - (uint8_t *)rx, sizeof(eepro100_rx_t)); +rx, sizeof(eepro100_rx_t)); uint16_t rfd_command = le16_to_cpu(rx.command); uint16_t rfd_size = le16_to_cpu(rx.size);
Re: [Qemu-devel] [PATCH] hw/usb-net.c: Fix precedence bug when checking rndis_state
On 11/09/2011 03:09 PM, Peter Maydell wrote: !X == 2 is always false (spotted by Coverity), so the checks for whether rndis is in the correct state would never fire. Signed-off-by: Peter Maydellpeter.mayd...@linaro.org Applied. Thanks. Regards, Anthony Liguori --- NB that although I tested that this doesn't break non-rndis usb-net, I don't have a test image that uses rndis usb-net, so treat this patch with the appropriate degree of caution. (Probably safer not putting it into 1.0 unless tested.) hw/usb-net.c |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/hw/usb-net.c b/hw/usb-net.c index a8b7c8d..f91fa32 100644 --- a/hw/usb-net.c +++ b/hw/usb-net.c @@ -1268,8 +1268,9 @@ static ssize_t usbnet_receive(VLANClientState *nc, const uint8_t *buf, size_t si if (is_rndis(s)) { msg = (struct rndis_packet_msg_type *) s-in_buf; -if (!s-rndis_state == RNDIS_DATA_INITIALIZED) +if (s-rndis_state != RNDIS_DATA_INITIALIZED) { return -1; +} if (size + sizeof(struct rndis_packet_msg_type) sizeof(s-in_buf)) return -1; @@ -1302,7 +1303,7 @@ static int usbnet_can_receive(VLANClientState *nc) { USBNetState *s = DO_UPCAST(NICState, nc, nc)-opaque; -if (is_rndis(s) !s-rndis_state == RNDIS_DATA_INITIALIZED) { +if (is_rndis(s) s-rndis_state != RNDIS_DATA_INITIALIZED) { return 1; }
Re: [Qemu-devel] [PATCH 00/10] qemu_fclose() error handling fixes (v3)
On 11/10/2011 06:41 AM, Eduardo Habkost wrote: Comments for v3: I am still not sure if this is 1.0 material, but I am more inclined to delay this for post-1.0. Changes v2 - v3: - Only coding style changes for issues detected by checkpatch.pl: - Avoid // comments; - Use braces on if statements. Applied all. Thanks. Regards, Anthony Liguori Comments for v2: I am not sure if this is appropriate post-freeze, I will let the maintainers decide this. Personally I think the code is more reliable with these changes, but on the other hand the only bugs it fixes are on the error paths. Changes v1 - v2: - Patch 2: Cosmetic spelling change on comment text - Patch 5: Add small comment about the need to return previously-spotted errors - Patch 6: On success, keep returning pclose() return value, instead of always 0. (the most relevant change in this new version of the series) Also, this series was tested using ping-pong migration with Autotest, no problems were detected. Original series description: Summary of the problem: - qemu_fclose() calls qemu_fflush() - Writes done by qemu_fflush() can fail - Those errors are lost after qemu_fclose() returns So, this series change qemu_fclose() to return last_error. But to do that we need to make sure all involve code use the -errno convention, hence the large series. Michael, probably this will conflict with your ongoing work. I don't want to delay other work, so I can rebase my patches if needed. This is just a RFC. Juan, maybe you were already working on this. But as I was already fixing this code while auditing the migration handling, I thought it was interesting to send this for review anyway. I hope I didn't duplicate any work. This is still completely untested, I am just using this series as a way to report the issue and get comments so I know I am going through the right path. Detailed description of the changes: Small cleanups: - Always use qemu_file_set_error() to set last_error (patch 1) - Add return value documentation to QEMUFileCloseFunc (patch 2) Actual qemu_fclose() behavior changes are done in 3 steps: - First step: fix qemu_fclose() callers: - exec_close() - Fixed to check for negative values, not -1 (patch 3) - Note: exec_close() is changed in two steps: first on the qemu_fclose() calling code, then on the return value code - migrate_fd_cleanup - Fixed to: - check qemu_fclose() return value for0 (patch 4) - return -errno, not just -1 (patch 4) - Callers: - migrate_fd_completed: - Error checking is done properly, already. - migrate_fd_error: - It ignores migrated_fd_cleanup() return value. - migrate_fd_cancel: - It ignores migrated_fd_cleanup() return value. - exec_accept_incoming_migration(): no return value check (yet) - fd_accept_incoming_migration(): no return value check (yet) - tcp_accept_incoming_migration(): no return value check (yet) - unix_accept_incoming_migration(): no return value check (yet) - do_savevm(): no return value check (yet) - load_vmstate(): no return value check (yet) - Second step: change qemu_fclose() to return last_error (patch 5) - Made sure to return unchanged (positive) success value on success (required by exec_close()) - Third step: change qemu_fclose() implementations (QEMUFileCloseFunc): - stdio_fclose - Fixed to return -errno (patch 6) - stdio_pclose - Fixed to return -errno (patch 7) - buffered_close - Implemented through QEMUFileBuffered.close: - Only implementation is migrate_fd_close(), that calls the following, through MigrationState.close: - exec_close(): - fixed to return original error value, not -1 (patch 8) - fd_close - Fixed to return -errno on close() errors. (patch 9) - tcp_close - Fixed to return -errno on close() errors. (patch 10) - unix_close - Fixed to return -errno on close() errors. (patch 11) - socket_close - No system call is made, returns always 0. - bdrv_fclose - No system call is made, returns always 0. Eduardo Habkost (10): savevm: use qemu_file_set_error() instead of setting last_error directly QEMUFileCloseFunc: add return value documentation (v2) exec_close(): accept any negative value as qemu_fclose() error migrate_fd_cleanup: accept any negative qemu_fclose() value as error qemu_fclose: return last_error if set (v3) stdio_pclose: return -errno on error (v3) stdio_fclose: return -errno on errors (v2) exec_close(): return -errno on errors (v2) tcp_close(): check for close() errors too (v2) unix_close(): check for close() errors too (v2) hw/hw.h |8 +- migration-exec.c |9 ++- migration-tcp.c |7 - migration-unix.c |7 - migration.c |4 +-- savevm.c | 65
Re: [Qemu-devel] [RFC] Device sandboxing
On 12/08/2011 04:51 PM, Blue Swirl wrote: Why limit this to device emulation only? Where in QEMU would this approach not work? That's a good point, and we've thrown this idea around. I don't know if there's any reason why this approach wouldn't work for all of QEMU. The idea for now though is to target the most vulnerable code, devices. -- Regards, Corey
Re: [Qemu-devel] [RFC] Device sandboxing
On Sun, Dec 11, 2011 at 4:50 AM, Dor Laor dl...@redhat.com wrote: On 12/08/2011 11:40 AM, Stefan Hajnoczi wrote: On Wed, Dec 7, 2011 at 8:54 PM, Eric Parisepa...@redhat.com wrote: On Wed, 2011-12-07 at 13:43 -0600, Anthony Liguori wrote: On 12/07/2011 01:32 PM, Corey Bryant wrote: That would seem like the logical approach. I think there may be new mode 2 patches coming soon so we can see how they go over. I'd like to see what the whitelist would need to be for something like QEMU in mode 2. My biggest concern is that the whitelist would need to be so large that the practical security what's all that much improved. When I prototyped my version of seccomp v2 for qemu a while back I did it by only looking at syscalls after inital setup was completed (aka the very last thing before main_loop() was to lock it down). My list was much sorter than even these: + __NR_brk, + __NR_close, + __NR_exit_group, + __NR_futex, + __NR_ioctl, + __NR_madvise, + __NR_mmap, + __NR_munmap, + __NR_read, + __NR_recvfrom, + __NR_recvmsg, + __NR_rt_sigaction, + __NR_select, + __NR_sendto, + __NR_tgkill, + __NR_timer_delete, + __NR_timer_gettime, + __NR_timer_settime, + __NR_write, + __NR_writev, There is simple obvious negligible overhead value here, but every proposal I know of, including mine, has been shot down by Ingo. Ingo's proposal is much more work, more overhead, but clearly more flexible. His suggestions (and code based on those suggestions from others) has been shot down by PeterZ. So I feel like seccomp v2 is between a rock and a hard place. We have something that works really well, and could be a huge win for all sorts of programs, but we don't seem to be able to get anything past the damned if you do, damned if you don't nak's. (There's also a cgroup version of seccomp proposed, but I'm guessing it will go just about as far as all the other versions) Still, these sorts of situations are overcome all the time. Sometimes it takes a while and several LWN.net articles about the drama but at the end things can be worked out. If we want to discuss the specifics of mode 2 and especially what Ingo or Peter think then I think we should do it in a forum where they participate. Maybe their positions have changed. Will, little bird whispered that you'll going to send another iteration w/ higher acceptance chances. Where do we stand w/ it? Can you please elaborate on it chance to get merged? Hi - yup. I keep getting delayed with other work, but I still plan to send it soon. The first plan was to port the updated patchset to linus's tree, then repost. I plan on adding a cover letter (0/x) to the series to see if we can get discussion going again on it. I'm not sure what the merge chance is, to be honest. I believe the patch is desirable to many parties, but Ingo didn't feel it was appropriate given that it would add an ABI that was not part of the trace/perf ABI. Given that grand-unified trace has been held up on stable internal api points and other challenges, I would hope that we could continue on using a prctl for seccomp_filter, even if it becomes a compatibility layer which is eventually switched off for most users. I did spend some time looking at other approaches (making the syscall table a namespace, cgroups syscalls like Łukasz Sowa's patch, etc), but using the seccomp-centered approach still seems to make sense from a process view and a seccomp view - and who wants a new syscall anyway :) The last thing I am looking at before I post is seeing if it'd be possible to avoid touching ftrace filter engine at all for ipc,socketcall, ioctl, etc, but I'm not sure that it makes sense to not use the generic system that exists today. cheers! will