date:20111212

Re: [Qemu-devel] [PATCH v3] network scripts: don't block SIGCHLD before forking

2011-12-12 Thread Paolo Bonzini


On 12/08/2011 04:48 AM, Michael Roth wrote:

This patch fixes a bug where child processes of launch_script() can
misbehave due to SIGCHLD being blocked. In the case of `sudo`, this
causes a permanent hang.

Previously a SIGCHLD handler was added to reap fork_exec()'d zombie
processes by calling waitpid(-1, ...). This required other
fork()/waitpid() callers to temporarilly block SIGCHILD to avoid
having the final wait status being intercepted by the SIGCHLD
handler:

7c3370d4fe3fa6cda8655f109e4659afc8ca4269

Since then, the qemu_add_child_watch() interface was added to allow
registration of such processes and reap only from that specific set
of PIDs:

4d54ec7898bd951007cb6122d5315584bd41d0c4

As a result, we can now avoid blocking SIGCHLD in launch_script(), so
drop that behavior.

Signed-off-by: Michael Rothmdr...@linux.vnet.ibm.com
---
  net/tap.c |6 --
  1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 1f26dc9..6c27a94 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -346,15 +346,10 @@ static TAPState *net_tap_fd_init(VLANState *vlan,

  static int launch_script(const char *setup_script, const char *ifname, int fd)
  {
-sigset_t oldmask, mask;
  int pid, status;
  char *args[3];
  char **parg;

-sigemptyset(mask);
-sigaddset(mask, SIGCHLD);
-sigprocmask(SIG_BLOCK,mask,oldmask);
-
  /* try to launch network script */
  pid = fork();
  if (pid == 0) {
@@ -378,7 +373,6 @@ static int launch_script(const char *setup_script, const 
char *ifname, int fd)
  while (waitpid(pid,status, 0) != pid) {
  /* loop */
  }
-sigprocmask(SIG_SETMASK,oldmask, NULL);

  if (WIFEXITED(status)  WEXITSTATUS(status) == 0) {
  return 0;


Reviewed-by: Paolo Bonzini pbonz...@redhat.com

Paolo

Re: [Qemu-devel] [BUG] [Seabios] PCI 64bit BARs on Win2008 - unable to start the device. (ACPI lacks the _DSM method)

2011-12-12 Thread Gerd Hoffmann

On 12/05/11 05:29, Alexey Korolev wrote:
 Hi Gerd,
 
 We have very early prototype of data acquisition device, with quite
 large MMIO buffer. It is an emulated device.
 We are running the 0.15 release.
 0.15 doesn't work correctly with 64bit BARs so I've already added some
 hacks to Seabios to let OS to choose the memory region.
 Thus you see bar 1, addr 0 in seabios log.

I'd strongly suggest to move forward to qemu 1.0.  Memory region
handling has seen a major rewrite in 1.0 (memory api patches by avi).
Chances are good that the 64bit bar bugs in qemu have been fixed meanwhile.

I have experimental patches which add a 64bit bar to the qxl device and
seabios handles it just fine (although memory-backed not mmio), except
that there is no support yet to map 64bit bars above 4G.

It shouldn't be that hard to add the latter though.  seabios needs two
more pci_region_type (PCI_REGION_TYPE_MEM_64 and
PCI_REGION_TYPE_PREFMEM_64) to track and map 64bit bars separately.  And
a address space window where it can map 64bit bars to.

 Sorry that I haven't specified all this initially. I just want to make
 64bit PCI bar working properly. Linux guests works correctly (except
 early versions - not investigated this yet). At the moment I have some
 issues with windows which relies on ACPI _CRS.

... and a _CRS entry for the 64bit bar address space window of course.

cheers,
  Gerd

[Qemu-devel] RFC: Only display help options that are accepted by the architecture

2011-12-12 Thread Michael Ellerman

Hi all,

As the subject says, this is an RFC.

I have a few patches (to follow), that change the help output from QEMU
so that we only display options that are accepted by the arch of the
running binary.

So for example qemu-system-ppc64 will not tell you about i386 options
like -no-acpi, -no-hpet etc.

We already have almost all the information we need, it just requires
actually filtering the options as we print them in the help text - and
some related cruft, see the patches for details.

I'd argue it's generally a sane change, but it also has the benefit that
it makes libvirt's life easier. For better or worse libvirt parses the
output of qemu -h to see what options are supported. At the moment that
is problematic because the ppc64 emulator claims to support -no-acpi
etc. So with this change libvirt could more reliably determine what is
supported.

cheers


signature.asc
Description: This is a digitally signed message part

[Qemu-devel] Rappel: Contacts DEFISCALISATION et PLACEMENTS FINANCIERS. Jusqu'à 35 contacts offerts en Décembre. Profitez-en!

2011-12-12 Thread Mes-Contacts

Cher(s)  CLIENTS,   cgp(s) indépendant(s), cabinet(s) de commercilisation, 
promoteur(s)... Rendez-vous sur www.mes-contacts.com (recommandé par les 
promoteurs) pour obtenir maintenant les meilleurs Contacts entrants en 
Défiscalisation et Placements financiers (Ass.vie, 
SCPI,mutuelle,retraite..) Offre valable tout le mois de Décembre 2011 
: CONTACTS DEFISCALISATION :*Forfait 750 #8364; :   6 contacts offerts soit 23 
fiches au total au lieu de 17*Forfait 1000 #8364; :9 contacts offerts soit 33 
fiches au total au lieu de 24*Forfait 2000 #8364; : 19 contacts offerts soit 
70 fiches au total au lieu de 51*Forfait 5000 #8364; : 35 contacts offerts 
soit 170 fiches au total au lieu de 135  CONTACTS PLACEMENTS 
FINANCIERS:*Forfait 550 #8364; :   2 contacts offerts soit 17 fiches au total 
au lieu de 15*Forfait 1000 #8364; : 5 contacts offerts soit 35 fiches au total 
au lieu de 30*Forfait 2000 #8364; : 10 contacts offerts soit 90 fiches au 
total au lieu de 80-Contacts Exclusifs-Démarche volontaire des 
prospects-Livraison des contacts sur votre boîte e-mail en temps réel-Gros 
volume d'envois possible-Simple, efficace,économique et rentableà partir de 25 
#8364;/le contactCordialement,L'équipe 
Mes-Contacts.comwww.mes-contacts.com Désinscription

[Qemu-devel] [PATCH 1/3] Add arch mask to headings but don't use it yet

2011-12-12 Thread Michael Ellerman

Make it possible to specify what architecture a heading in the help
doco applies to.

Signed-off-by: Michael Ellerman mich...@ellerman.id.au

---

A possibly nicer way to do this would be to add a new macro, perhaps
ARCHHEADING, that is used for architecture specific headings. That
would make the help source nicer, in that most headings would just
be DEFHEADINGS - but it would mean we need to #define/undef another
set of macros at each location we include the options.
---
 qemu-options.h  |2 +-
 qemu-options.hx |   42 +-
 scripts/hxtool  |2 +-
 vl.c|4 ++--
 4 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/qemu-options.h b/qemu-options.h
index c96f994..c8c3022 100644
--- a/qemu-options.h
+++ b/qemu-options.h
@@ -31,7 +31,7 @@
 enum {
 #define DEF(option, opt_arg, opt_enum, opt_help, arch_mask) \
 opt_enum,
-#define DEFHEADING(text)
+#define DEFHEADING(text, arch_mask)
 #include qemu-options.def
 #undef DEF
 #undef DEFHEADING
diff --git a/qemu-options.hx b/qemu-options.hx
index 681eaf1..cf03763 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -6,7 +6,7 @@ HXCOMM construct option structures, enums and help message for 
specified
 HXCOMM architectures.
 HXCOMM HXCOMM can be used for comments, discarded from both texi and C
 
-DEFHEADING(Standard options:)
+DEFHEADING(Standard options:, QEMU_ARCH_ALL)
 STEXI
 @table @option
 ETEXI
@@ -525,9 +525,9 @@ possible drivers and properties, use @code{-device ?} and
 @code{-device @var{driver},?}.
 ETEXI
 
-DEFHEADING()
+DEFHEADING(, QEMU_ARCH_ALL)
 
-DEFHEADING(File system options:)
+DEFHEADING(File system options:, QEMU_ARCH_ALL)
 
 DEF(fsdev, HAS_ARG, QEMU_OPTION_fsdev,
 -fsdev 
fsdriver,id=id,path=path,[security_model={mapped|passthrough|none}]\n
@@ -583,9 +583,9 @@ Specifies the tag name to be used by the guest to mount 
this export point
 
 ETEXI
 
-DEFHEADING()
+DEFHEADING(, QEMU_ARCH_ALL)
 
-DEFHEADING(Virtual File system pass-through options:)
+DEFHEADING(Virtual File system pass-through options:, QEMU_ARCH_ALL)
 
 DEF(virtfs, HAS_ARG, QEMU_OPTION_virtfs,
 -virtfs 
local,path=path,mount_tag=tag,security_model=[mapped|passthrough|none]\n
@@ -640,7 +640,7 @@ STEXI
 Create synthetic file system image
 ETEXI
 
-DEFHEADING()
+DEFHEADING(, QEMU_ARCH_ALL)
 
 DEF(name, HAS_ARG, QEMU_OPTION_name,
 -name string1[,process=string2]\n
@@ -669,9 +669,9 @@ STEXI
 @end table
 ETEXI
 
-DEFHEADING()
+DEFHEADING(, QEMU_ARCH_ALL)
 
-DEFHEADING(Display options:)
+DEFHEADING(Display options:, QEMU_ARCH_ALL)
 
 STEXI
 @table @option
@@ -1062,9 +1062,9 @@ STEXI
 @end table
 ETEXI
 
-DEFHEADING()
+DEFHEADING(, QEMU_ARCH_I386)
 
-DEFHEADING(i386 target only:)
+DEFHEADING(i386 target only:, QEMU_ARCH_I386)
 STEXI
 @table @option
 ETEXI
@@ -1160,12 +1160,12 @@ Specify SMBIOS type 0 fields
 Specify SMBIOS type 1 fields
 ETEXI
 
-DEFHEADING()
+DEFHEADING(, QEMU_ARCH_ALL)
 STEXI
 @end table
 ETEXI
 
-DEFHEADING(Network options:)
+DEFHEADING(Network options:, QEMU_ARCH_ALL)
 STEXI
 @table @option
 ETEXI
@@ -1484,9 +1484,9 @@ is activated if no @option{-net} options are provided.
 @end table
 ETEXI
 
-DEFHEADING()
+DEFHEADING(, QEMU_ARCH_ALL)
 
-DEFHEADING(Character device options:)
+DEFHEADING(Character device options:, QEMU_ARCH_ALL)
 
 DEF(chardev, HAS_ARG, QEMU_OPTION_chardev,
 -chardev null,id=id[,mux=on|off]\n
@@ -1734,10 +1734,10 @@ Connect to a spice virtual machine channel, such as 
vdiport.
 @end table
 ETEXI
 
-DEFHEADING()
+DEFHEADING(, QEMU_ARCH_ALL)
 
 STEXI
-DEFHEADING(Device URL Syntax:)
+DEFHEADING(Device URL Syntax:, QEMU_ARCH_ALL)
 
 In addition to using normal file images for the emulated storage devices,
 QEMU can also use networked resources such as iSCSI devices. These are
@@ -1823,7 +1823,7 @@ See also @url{http://http://www.osrg.net/sheepdog/}.
 @end table
 ETEXI
 
-DEFHEADING(Bluetooth(R) options:)
+DEFHEADING(Bluetooth(R) options:, QEMU_ARCH_ALL)
 
 DEF(bt, HAS_ARG, QEMU_OPTION_bt, \
 -bt hci,nulldumb bluetooth HCI - doesn't respond to commands\n \
@@ -1893,9 +1893,9 @@ Virtual wireless keyboard implementing the HIDP bluetooth 
profile.
 @end table
 ETEXI
 
-DEFHEADING()
+DEFHEADING(, QEMU_ARCH_ALL)
 
-DEFHEADING(Linux/Multiboot boot specific:)
+DEFHEADING(Linux/Multiboot boot specific:, QEMU_ARCH_ALL)
 STEXI
 
 When using these options, you can use a given Linux or Multiboot
@@ -1941,9 +1941,9 @@ STEXI
 @end table
 ETEXI
 
-DEFHEADING()
+DEFHEADING(, QEMU_ARCH_ALL)
 
-DEFHEADING(Debug/Expert options:)
+DEFHEADING(Debug/Expert options:, QEMU_ARCH_ALL)
 
 STEXI
 @table @option
diff --git a/scripts/hxtool b/scripts/hxtool
index 7ca83ed..8d07f01 100644
--- a/scripts/hxtool
+++ b/scripts/hxtool
@@ -45,7 +45,7 @@ hxtotexi()
 fi
 ;;
 DEFHEADING*)
-echo $(expr $str : DEFHEADING(\(.*\)))
+echo $(expr $str : DEFHEADING(\(.*\),.*))
 ;;
 *)
 test $flag -eq 1  echo

[Qemu-devel] [PATCH 3/3] In qemu -h output, only print options for the arch we are running as

2011-12-12 Thread Michael Ellerman

Only print options in the help output that are accepted by our arch.

Signed-off-by: Michael Ellerman mich...@ellerman.id.au
---
 vl.c |   31 +--
 1 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/vl.c b/vl.c
index b492f8c..ba8e76d 100644
--- a/vl.c
+++ b/vl.c
@@ -1492,28 +1492,31 @@ static void version(void)
 
 static void help(int exitcode)
 {
-const char *options_help =
-#define DEF(option, opt_arg, opt_enum, opt_help, arch_mask) \
-opt_help
-#define DEFHEADING(text, arch_mask) stringify(text) \n
+version();
+printf(usage: qemu [options] [disk_image]\n
+   \n
+   'disk_image' is a raw hard disk image for IDE hard disk 0\n\n);
+
+#define DEF(option, opt_arg, opt_enum, opt_help, arch_mask)\
+if ((arch_mask)  arch_type)   \
+fputs(opt_help, stdout);
+
+#define DEFHEADING(text, arch_mask) \
+if ((arch_mask)  arch_type)\
+puts(stringify(text));
+
 #include qemu-options.def
 #undef DEF
 #undef DEFHEADING
 #undef GEN_DOCS
-;
-version();
-printf(usage: qemu [options] [disk_image]\n
-   \n
-   'disk_image' is a raw hard disk image for IDE hard disk 0\n
-   \n
-   %s\n
-   During emulation, the following keys are useful:\n
+
+printf(\nDuring emulation, the following keys are useful:\n
ctrl-alt-f  toggle full screen\n
ctrl-alt-n  switch to virtual console 'n'\n
ctrl-alttoggle mouse and keyboard grab\n
\n
-   When using -nographic, press 'ctrl-a h' to get some help.\n,
-   options_help);
+   When using -nographic, press 'ctrl-a h' to get some help.\n);
+
 exit(exitcode);
 }
 
-- 
1.7.7.3

[Qemu-devel] [PATCH 2/3] vl.c: Fold constant string into printf rather than using %s

2011-12-12 Thread Michael Ellerman

In help() we do what boils down to:

  printf(%s, qemu);

This seems to be an artifact of be995c27640a82c7056b6f53d02ec823570114e5
(removed unused code), which removed some ifdef'ery that used to print
a different name depending on CONFIG_SOFTMMU.

But now that is gone and we always use qemu we may as well just put
that in the format string.

Signed-off-by: Michael Ellerman mich...@ellerman.id.au
---
 vl.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/vl.c b/vl.c
index 6a2ca6f..b492f8c 100644
--- a/vl.c
+++ b/vl.c
@@ -1502,7 +1502,7 @@ static void help(int exitcode)
 #undef GEN_DOCS
 ;
 version();
-printf(usage: %s [options] [disk_image]\n
+printf(usage: qemu [options] [disk_image]\n
\n
'disk_image' is a raw hard disk image for IDE hard disk 0\n
\n
@@ -1513,7 +1513,6 @@ static void help(int exitcode)
ctrl-alttoggle mouse and keyboard grab\n
\n
When using -nographic, press 'ctrl-a h' to get some help.\n,
-   qemu,
options_help);
 exit(exitcode);
 }
-- 
1.7.7.3

Re: [Qemu-devel] [PATCH] Fix parse of usb device description with multiple configurations

2011-12-12 Thread Gerd Hoffmann

  Hi,

 +} else if (descriptors[i + 5] != s-configuration) {
 +fprintf(stderr, not requested configuration %d\n,
 +s-configuration);
 +i += (descriptors[i + 3]  8) + descriptors[i + 2];
 +continue;
 +}

That message doesn't indicate an error and should be a DPRINTF instead
of a fprintf.  Otherwise the patch looks fine.

cheers,
  Gerd

Re: [Qemu-devel] [PATCH 2/3] vl.c: Fold constant string into printf rather than using %s

2011-12-12 Thread Andreas Färber

Am 12.12.2011 09:21, schrieb Michael Ellerman:
 In help() we do what boils down to:
 
   printf(%s, qemu);
 
 This seems to be an artifact of be995c27640a82c7056b6f53d02ec823570114e5
 (removed unused code), which removed some ifdef'ery that used to print
 a different name depending on CONFIG_SOFTMMU.
 
 But now that is gone and we always use qemu we may as well just put
 that in the format string.

I would rather propose to save argv[0] and use that. By now qemu is
not even correct for i386.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] RFC: raw device support for block device targets

2011-12-12 Thread Kevin Wolf

Am 11.12.2011 11:45, schrieb Alex Bligh:
 
 
 --On 8 December 2011 13:40:57 +0100 Kevin Wolf kw...@redhat.com wrote:
 
 qemu-img convert appears to support block devices as input, but not
 as output. That is irritating, as when using qemu-img convert to
 convert qcow to raw on a block partition, an intermediate file has
 to be used, which slows things down and pointlessly uses disk space.

 The problem is that ftruncate() is being called on the output file
 in order ensure it is sufficiently large, and this fails on
 block devices.
 ...

 Creating an image on a block device shouldn't even call raw_create(),
 but only hdev_create(), which doesn't try to truncate the device, but
 just uses lseek to make sure that it's large enough.

 Which qemu version are you using and what's your command line?
 
 I was testing on:
 
 amb@alex-test:~$ qemu-img --version
 qemu-img version 0.12.3, Copyright (c) 2004-2008 Fabrice Bellard

That's the problem. It should work since 0.13.

Kevin

Re: [Qemu-devel] [RFC 0/6] target-i386: TCGv cleanups

2011-12-12 Thread Andreas Färber

Am 12.12.2011 00:42, schrieb Paul Brook:
 This series makes target-i386 compile with DEBUG_TCGV_TL.
 
 What benefit does this provide?

It showcases what changes would need to be done to allow type-safe
compilation of the first pair of --enable-system targets.

Especially my focus has been on how to do that without introducing loads
of unnecessary temporaries.

Andreas

Re: [Qemu-devel] [Bug 902148] [NEW] qemu-img V1.0 hangs on creating Image (0.15.1 runs)

2011-12-12 Thread Stefan Hajnoczi

On Fri, Dec 9, 2011 at 1:10 PM, Michael Niehren
902...@bugs.launchpad.net wrote:
 Strace on the hanging qemu-img ends on:

 select(5, [4], [], NULL, NULL)          = 1 (in [4])
 read(4, \0, 16)                       = 1
 close(3)                                = 0
 open(test.img, O_RDONLY|O_NONBLOCK)   = 3
 fstat(3, {st_mode=S_IFREG|0644, st_size=131072, ...}) = 0
 close(3)                                = 0
 open(test.img, O_RDONLY|O_NONBLOCK)   = 3
 fstat(3, {st_mode=S_IFREG|0644, st_size=131072, ...}) = 0
 close(3)                                = 0
 stat(test.img, {st_mode=S_IFREG|0644, st_size=131072, ...}) = 0
 open(test.img, O_RDWR|O_CLOEXEC)      = 3
 lseek(3, 0, SEEK_END)                   = 131072

 next line in the strace on working qemu-img V0.15.1 is:
 pread(3, 
 QFI\373\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0\0\0\0..., 512, 
 0) = 512
 ...

When it hangs does it consume CPU?

Please attach gdb to the process and capture a backtrace:
$ gdb -p $PID_OF_QEMU
(gdb) thread apply all bt

Stefan

Re: [Qemu-devel] [PATCH] hw/usb-net.c: Fix precedence bug when checking rndis_state

2011-12-12 Thread Peter Maydell

On 14 November 2011 17:21, andrzej zaborowski balr...@gmail.com wrote:
 On 14 November 2011 09:08, Peter Maydell peter.mayd...@linaro.org wrote:
 I'm happy that non-rndis works, I tested that. What I don't know
 is whether the patch breaks rndis

 Sorry, I misread what you said assuming that you tested a branch
 affected by this patch.  I'm unable to find a guest to test the rndis
 mode so I reverted the patch until after release.  Applying is
 probably the best way to get someone to test a change like this.

Well, we're past release now, I think we could reasonably (re)apply
this patch now?

thanks
-- PMM

Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC

2011-12-12 Thread Stefan Hajnoczi

On Sun, Dec 11, 2011 at 10:29 AM, Alon Levy al...@redhat.com wrote:
 On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote:
 Hi there,

 I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out that
 the command client_migrate_info uses it. That's a legacy interface and has to
 be dropped, no command should be using it...

 Something tells me that if I just drop it (and change the command to use the
 regular interface), bad things will happen. Am I right? :)


 The monitor command client_migrate_info needs to complete after getting
 an ACK message from the currently connected spice client (this is the
 only case where this is required - if there is no client then the
 MONITOR_CMD_ASYNC API won't be used). This in turn requires the main
 thread to perform select and call the callback that will process this
 ACK. That's why the MONITOR_CMD_ASYNC API was used.

 I'm not aware of any other way to do this, I'll be glad for any help
 here. Also, I understand this is not what is not true async, since one
 would expect a true async interface to support multiple in flight
 monitor commands. If there is any ETA or existing way to do this we
 could change the implementation of client_migrate_info.

Is it possible to use a QMP event to signal completion instead of a
MONITOR_CMD_ASYNC command?

Stefan

Re: [Qemu-devel] RFC: raw device support for block device targets

2011-12-12 Thread Alex Bligh




--On 12 December 2011 10:48:43 +0100 Kevin Wolf kw...@redhat.com wrote:


I was testing on:

amb@alex-test:~$ qemu-img --version
qemu-img version 0.12.3, Copyright (c) 2004-2008 Fabrice Bellard


That's the problem. It should work since 0.13.


Thanks

--
Alex Bligh

Re: [Qemu-devel] [PATCH 0/4] tcg: Add debug facilities for TCGv

2011-12-12 Thread Andreas Färber

Am 12.12.2011 00:28, schrieb Paul Brook:
 What mismatches does this catch that the existing debug code doesn't?

 Cf. patch 4/4:

 TCGv tmp = tcg_temp_new_i32();
 tcg_temp_free_i32(tmp);

 TCGv_i32 tmp2 = tcg_temp_new();
 tcg_temp_free(tmp2);
 
 Why is this a problem?  If TARGET_LONG_BITS==32 then tcg_temp_free and 
 tcg_temp_free_i32 are synonyms, and everything is happy.
 
 If TARGET_LONG_BITS==64 then we already flag this as an error.
 
 Try compiling --target-list=arm-softmmu --enable-debug-tcg with my
 series and DEBUG_TCGV_TL uncommented, and you'll see for yourself.
 There's too many to mention and for me to actually fix. You'll have to
 deal with it for ARMv8 at some point and this series hopefully helps.
 
 That's exactly why I think this patch is a bad idea.
 
 If a target always has TARGET_LONG_BITS==32 then it doesn't matter if we mix 
 TCGv and TCGv_i32.

To me and my perfectionism it does matter.

This is not about fixing some user-visible runtime bug, it's about
making the developer (me) aware of unintended type mixups.

 Trying to make a 32-bit target 64-bit safe without actually implementing 
 the 
 64-bit target is a complete waste of time.

That's where we disagree. I rather do things right from the start than
leaving the cleanup work to someone else later on.

 You've almost no chance of getting
 it right.  In some cases the correct answer will be to use 32-bit arithmetic, 
 then sign/zero extend the result. In other cases the correct answer will be 
 to 
 perform word size arithmetic.  Blindly picking one just makes the bugs harder 
 to find later.

This series picks nothing blindly. It provides an optional facility for
the developer to be made aware of uses of tl where i32/i64 was intended
(or vice versa), nothing changes at runtime.

Whether and in which way the developer addresses the issues shown that
way is up to the developer, and as any line added to a new target must
be decided on a case-by-case basis.

For me the most annoying issue was that tcg_gen_qemu_{ld,st}* needs TCGv.

 If you're trying to add support for targets where the primary word size is 
 neither 32 nor 64 then that's a completely different problem, and probably 
 not 
 one that's worth solving.  In practice your port is going to end up using 64-
 bit arithmetic and explicitly compensating for the excess precision where 
 necessary.

That's a different issue and not being addressed here.

My point was that I have an inheritance hierarchy where (as opposed to,
e.g., ppc and ppc64) both end up as TARGET_LONG_BITS==32, so I do not
get a check for free by compiling both, and I do not want to introduce
an artificial TARGET_LONG_BITS==64 architecture just to check that my
tl==i32 target code is free of typos.
Same issue if you pick only --target-list=some-softmmu BTW.

Just like with softfloat [u]int16 etc. types it's just too easy to
forget _t somewhere (here: _i32) and to end up with a type you don't want.

If you have a better proposal how to introduce the checks I want, please
let us hear it.

If no one has, I don't see how this series would hurt (1-3 refactoring,
4 not enabled by default) while providing useful new debug facilities to
developers of new targets.

Andreas

Re: [Qemu-devel] [PATCH 2/6] memory: change dirty setting APIs to take a size

2011-12-12 Thread Juan Quintela

Blue Swirl blauwir...@gmail.com wrote:
 Instead of each target knowing or guessing the guest page size,
 just pass the desired size of dirtied memory area. This should also
 improve performance due to memset() optimizations.

My understanding last time I looked at this, is that it is as easy
basically to change the interface to work on pages, not on addresses.

I looked at the interface from the point of view of migration, and I
always needed a page.  My understanding at the time was that every
caller could be change to use a page instead, but that is another set of
unfinished patches on my ToDo list.

Actually, what I found what I needed was that I wanted a pointer to the
ram_block on that functions.  Migration and vga were simple to fix, but
TCG got me completely lost :-(, and at that point I moved to someting else.

Will try to think what were my problems later today.

Later, Juan.

[Qemu-devel] [PULL 0/8] arm-devs queue

2011-12-12 Thread Peter Maydell

The following changes since commit f18318eef8b4b263f4e82a5338c9b2875a6c73c8:

  Merge branch 'master' of git://git.qemu.org/qemu (2011-12-12 04:12:31 +0400)

are available in the git repository at:

  git://git.linaro.org/people/pmaydell/qemu-arm.git arm-devs.for-upstream

Peter Chubb (1):
  Fix sp804 dual-timer

Peter Maydell (7):
  hw/arm_mptimer.c: Turn ARM MPcore private timers into qdev devices
  hw/arm_gic: Expose GIC CPU interfaces as sysbus memory regions
  hw/mpcore.c: Use the GIC memory regions for the CPU interface
  hw/realview_gic: Use GIC memory region for the CPU interface
  hw/mpcore: Clean up mpcore_priv_read/write as they are now SCU only
  hw/a9mpcore.c: Implement A9MP peripherals rather than 11MPcore ones
  hw/mpcore.c: Merge with hw/arm11mpcore.c

 Makefile.target   |1 +
 hw/a9mpcore.c |  189 --
 hw/arm11mpcore.c  |  130 +-
 hw/arm_gic.c  |   75 -
 hw/arm_mptimer.c  |  332 +
 hw/arm_timer.c|   41 ++-
 hw/mpcore.c   |  283 -
 hw/realview_gic.c |   25 +
 8 files changed, 751 insertions(+), 325 deletions(-)
 create mode 100644 hw/arm_mptimer.c
 delete mode 100644 hw/mpcore.c

[Qemu-devel] [PATCH 8/8] hw/mpcore.c: Merge with hw/arm11mpcore.c

2011-12-12 Thread Peter Maydell

hw/mpcore.c is now implementing only ARM11MPCore specific peripherals,
and is #included only from hw/arm11mpcore.c, so just merge it into that
file.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 hw/arm11mpcore.c |  130 ++-
 hw/mpcore.c  |  137 --
 2 files changed, 129 insertions(+), 138 deletions(-)
 delete mode 100644 hw/mpcore.c

diff --git a/hw/arm11mpcore.c b/hw/arm11mpcore.c
index 32ecf98..bc0457e 100644
--- a/hw/arm11mpcore.c
+++ b/hw/arm11mpcore.c
@@ -7,11 +7,139 @@
  * This code is licensed under the GPL.
  */
 
+#include sysbus.h
+#include qemu-timer.h
+
 /* ??? The MPCore TRM says the on-chip controller has 224 external IRQ lines
(+ 32 internal).  However my test chip only exposes/reports 32.
More importantly Linux falls over if more than 32 are present!  */
 #define GIC_NIRQ 64
-#include mpcore.c
+
+#define NCPU 4
+
+static inline int
+gic_get_current_cpu(void)
+{
+  return cpu_single_env-cpu_index;
+}
+
+#include arm_gic.c
+
+/* MPCore private memory region.  */
+
+typedef struct mpcore_priv_state {
+gic_state gic;
+uint32_t scu_control;
+int iomemtype;
+uint32_t old_timer_status[8];
+uint32_t num_cpu;
+qemu_irq *timer_irq;
+MemoryRegion iomem;
+MemoryRegion container;
+DeviceState *mptimer;
+} mpcore_priv_state;
+
+/* Per-CPU private memory mapped IO.  */
+
+static uint64_t mpcore_scu_read(void *opaque, target_phys_addr_t offset,
+unsigned size)
+{
+mpcore_priv_state *s = (mpcore_priv_state *)opaque;
+int id;
+offset = 0xff;
+/* SCU */
+switch (offset) {
+case 0x00: /* Control.  */
+return s-scu_control;
+case 0x04: /* Configuration.  */
+id = ((1  s-num_cpu) - 1)  4;
+return id | (s-num_cpu - 1);
+case 0x08: /* CPU status.  */
+return 0;
+case 0x0c: /* Invalidate all.  */
+return 0;
+default:
+hw_error(mpcore_priv_read: Bad offset %x\n, (int)offset);
+}
+}
+
+static void mpcore_scu_write(void *opaque, target_phys_addr_t offset,
+ uint64_t value, unsigned size)
+{
+mpcore_priv_state *s = (mpcore_priv_state *)opaque;
+offset = 0xff;
+/* SCU */
+switch (offset) {
+case 0: /* Control register.  */
+s-scu_control = value  1;
+break;
+case 0x0c: /* Invalidate all.  */
+/* This is a no-op as cache is not emulated.  */
+break;
+default:
+hw_error(mpcore_priv_read: Bad offset %x\n, (int)offset);
+}
+}
+
+static const MemoryRegionOps mpcore_scu_ops = {
+.read = mpcore_scu_read,
+.write = mpcore_scu_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+static void mpcore_timer_irq_handler(void *opaque, int irq, int level)
+{
+mpcore_priv_state *s = (mpcore_priv_state *)opaque;
+if (level  !s-old_timer_status[irq]) {
+gic_set_pending_private(s-gic, irq  1, 29 + (irq  1));
+}
+s-old_timer_status[irq] = level;
+}
+
+static void mpcore_priv_map_setup(mpcore_priv_state *s)
+{
+int i;
+SysBusDevice *busdev = sysbus_from_qdev(s-mptimer);
+memory_region_init(s-container, mpcode-priv-container, 0x2000);
+memory_region_init_io(s-iomem, mpcore_scu_ops, s, mpcore-scu, 0x100);
+memory_region_add_subregion(s-container, 0, s-iomem);
+/* GIC CPU interfaces: current CPU at 0x100, then specific CPUs
+ * at 0x200, 0x300...
+ */
+for (i = 0; i  (s-num_cpu + 1); i++) {
+target_phys_addr_t offset = 0x100 + (i * 0x100);
+memory_region_add_subregion(s-container, offset, 
s-gic.cpuiomem[i]);
+}
+/* Add the regions for timer and watchdog for current CPU and
+ * for each specific CPU.
+ */
+s-timer_irq = qemu_allocate_irqs(mpcore_timer_irq_handler,
+  s, (s-num_cpu + 1) * 2);
+for (i = 0; i  (s-num_cpu + 1) * 2; i++) {
+/* Timers at 0x600, 0x700, ...; watchdogs at 0x620, 0x720, ... */
+target_phys_addr_t offset = 0x600 + (i  1) * 0x100 + (i  1) * 0x20;
+memory_region_add_subregion(s-container, offset,
+sysbus_mmio_get_region(busdev, i));
+}
+memory_region_add_subregion(s-container, 0x1000, s-gic.iomem);
+/* Wire up the interrupt from each watchdog and timer. */
+for (i = 0; i  s-num_cpu * 2; i++) {
+sysbus_connect_irq(busdev, i, s-timer_irq[i]);
+}
+}
+
+static int mpcore_priv_init(SysBusDevice *dev)
+{
+mpcore_priv_state *s = FROM_SYSBUSGIC(mpcore_priv_state, dev);
+
+gic_init(s-gic, s-num_cpu);
+s-mptimer = qdev_create(NULL, arm_mptimer);
+qdev_prop_set_uint32(s-mptimer, num-cpu, s-num_cpu);
+qdev_init_nofail(s-mptimer);
+mpcore_priv_map_setup(s);
+sysbus_init_mmio(dev, s-container);
+return 0;
+}
 
 /* Dummy PIC to route IRQ lines.  The baseboard has 4 independent IRQ
controllers.

[Qemu-devel] [PATCH 2/8] hw/arm_mptimer.c: Turn ARM MPcore private timers into qdev devices

2011-12-12 Thread Peter Maydell

Turn the ARM MPcore private timer/watchdog blocks into separate
qdev devices. This will allow us to share them neatly between
11MPCore and A9MPcore.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 Makefile.target  |1 +
 hw/arm_mptimer.c |  332 ++
 hw/mpcore.c  |  184 +-
 3 files changed, 365 insertions(+), 152 deletions(-)
 create mode 100644 hw/arm_mptimer.c

diff --git a/Makefile.target b/Makefile.target
index a111521..39b2e5a 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -344,6 +344,7 @@ obj-arm-y = integratorcp.o versatilepb.o arm_pic.o 
arm_timer.o
 obj-arm-y += arm_boot.o pl011.o pl031.o pl050.o pl080.o pl110.o pl181.o pl190.o
 obj-arm-y += versatile_pci.o
 obj-arm-y += realview_gic.o realview.o arm_sysctl.o arm11mpcore.o a9mpcore.o
+obj-arm-y += arm_mptimer.o
 obj-arm-y += armv7m.o armv7m_nvic.o stellaris.o pl022.o stellaris_enet.o
 obj-arm-y += pl061.o
 obj-arm-y += arm-semi.o
diff --git a/hw/arm_mptimer.c b/hw/arm_mptimer.c
new file mode 100644
index 000..455a0aa
--- /dev/null
+++ b/hw/arm_mptimer.c
@@ -0,0 +1,332 @@
+/*
+ * Private peripheral timer/watchdog blocks for ARM 11MPCore and A9MP
+ *
+ * Copyright (c) 2006-2007 CodeSourcery.
+ * Copyright (c) 2011 Linaro Limited
+ * Written by Paul Brook, Peter Maydell
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see http://www.gnu.org/licenses/.
+ */
+
+#include sysbus.h
+#include qemu-timer.h
+
+/* This device implements the per-cpu private timer and watchdog block
+ * which is used in both the ARM11MPCore and Cortex-A9MP.
+ */
+
+#define MAX_CPUS 4
+
+/* State of a single timer or watchdog block */
+typedef struct {
+uint32_t count;
+uint32_t load;
+uint32_t control;
+uint32_t status;
+int64_t tick;
+QEMUTimer *timer;
+qemu_irq irq;
+MemoryRegion iomem;
+} timerblock;
+
+typedef struct {
+SysBusDevice busdev;
+uint32_t num_cpu;
+timerblock timerblock[MAX_CPUS * 2];
+MemoryRegion iomem[2];
+} arm_mptimer_state;
+
+static inline int get_current_cpu(arm_mptimer_state *s)
+{
+if (cpu_single_env-cpu_index = s-num_cpu) {
+hw_error(arm_mptimer: num-cpu %d but this cpu is %d!\n,
+ s-num_cpu, cpu_single_env-cpu_index);
+}
+return cpu_single_env-cpu_index;
+}
+
+static inline void timerblock_update_irq(timerblock *tb)
+{
+qemu_set_irq(tb-irq, tb-status);
+}
+
+/* Return conversion factor from mpcore timer ticks to qemu timer ticks.  */
+static inline uint32_t timerblock_scale(timerblock *tb)
+{
+return (((tb-control  8)  0xff) + 1) * 10;
+}
+
+static void timerblock_reload(timerblock *tb, int restart)
+{
+if (tb-count == 0) {
+return;
+}
+if (restart) {
+tb-tick = qemu_get_clock_ns(vm_clock);
+}
+tb-tick += (int64_t)tb-count * timerblock_scale(tb);
+qemu_mod_timer(tb-timer, tb-tick);
+}
+
+static void timerblock_tick(void *opaque)
+{
+timerblock *tb = (timerblock *)opaque;
+tb-status = 1;
+if (tb-control  2) {
+tb-count = tb-load;
+timerblock_reload(tb, 0);
+} else {
+tb-count = 0;
+}
+timerblock_update_irq(tb);
+}
+
+static uint64_t timerblock_read(void *opaque, target_phys_addr_t addr,
+unsigned size)
+{
+timerblock *tb = (timerblock *)opaque;
+int64_t val;
+addr = 0x1f;
+switch (addr) {
+case 0: /* Load */
+return tb-load;
+case 4: /* Counter.  */
+if (((tb-control  1) == 0) || (tb-count == 0)) {
+return 0;
+}
+/* Slow and ugly, but hopefully won't happen too often.  */
+val = tb-tick - qemu_get_clock_ns(vm_clock);
+val /= timerblock_scale(tb);
+if (val  0) {
+val = 0;
+}
+return val;
+case 8: /* Control.  */
+return tb-control;
+case 12: /* Interrupt status.  */
+return tb-status;
+default:
+return 0;
+}
+}
+
+static void timerblock_write(void *opaque, target_phys_addr_t addr,
+ uint64_t value, unsigned size)
+{
+timerblock *tb = (timerblock *)opaque;
+int64_t old;
+addr = 0x1f;
+switch (addr) {
+case 0: /* Load */
+tb-load = value;
+/* Fall through.  */
+case 4: /* Counter.  */
+if ((tb-control  1)  tb-count) {
+/* Cancel

[Qemu-devel] [PATCH] Add a .mailmap to map pre-git-conversion authors to friendly names

2011-12-12 Thread Peter Maydell

Add a .mailmap file so 'git shortlog' can map the unfriendly
pre-git-conversion author entries to real names.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
v1-v2:
 fixed Andrzej's email to match MAINTAINERS file
 added entries for Fabrice Bellard, Jocelyn Mayer
 cc'd active maintainers with an entry in the mailmap
v2-v3:
 sorted into alpha order by first column

 .mailmap |   16 
 1 files changed, 16 insertions(+), 0 deletions(-)
 create mode 100644 .mailmap

diff --git a/.mailmap b/.mailmap
new file mode 100644
index 000..9797802
--- /dev/null
+++ b/.mailmap
@@ -0,0 +1,16 @@
+# This mailmap just translates the weird addresses from the original import 
into git
+# into proper addresses so that they are counted properly in git shortlog 
output.
+#
+Andrzej Zaborowski balr...@gmail.com balrog 
balrog@c046a42c-6fe2-441c-8c8c-71466251a162
+Anthony Liguori aligu...@us.ibm.com aliguori 
aliguori@c046a42c-6fe2-441c-8c8c-71466251a162
+Aurelien Jarno aurel...@aurel32.net aurel32 
aurel32@c046a42c-6fe2-441c-8c8c-71466251a162
+Blue Swirl blauwir...@gmail.com blueswir1 
blueswir1@c046a42c-6fe2-441c-8c8c-71466251a162
+Edgar E. Iglesias edgar.igles...@gmail.com edgar_igl 
edgar_igl@c046a42c-6fe2-441c-8c8c-71466251a162
+Fabrice Bellard fabr...@bellard.org bellard 
bellard@c046a42c-6fe2-441c-8c8c-71466251a162
+Jocelyn Mayer l_ind...@magic.fr j_mayer 
j_mayer@c046a42c-6fe2-441c-8c8c-71466251a162
+Paul Brook p...@codesourcery.com pbrook 
pbrook@c046a42c-6fe2-441c-8c8c-71466251a162
+Thiemo Seufer t...@networkno.de ths 
ths@c046a42c-6fe2-441c-8c8c-71466251a162
+malc av1...@comtv.ru malc malc@c046a42c-6fe2-441c-8c8c-71466251a162
+# There is also a:
+#(no author) (no author)@c046a42c-6fe2-441c-8c8c-71466251a162
+# for the cvs2svn initialization commit e63c3dc74bf.
-- 
1.7.1

[Qemu-devel] [PATCH 4/8] hw/mpcore.c: Use the GIC memory regions for the CPU interface

2011-12-12 Thread Peter Maydell

Switch to using the GIC memory regions for the CPU interface
rather than hand implementing them as a subcase of mpcore_priv_read()
and mpcore_priv_write().

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 hw/mpcore.c |   35 ++-
 1 files changed, 10 insertions(+), 25 deletions(-)

diff --git a/hw/mpcore.c b/hw/mpcore.c
index 3d64609..a0af1ad 100644
--- a/hw/mpcore.c
+++ b/hw/mpcore.c
@@ -41,7 +41,7 @@ static uint64_t mpcore_priv_read(void *opaque, 
target_phys_addr_t offset,
 {
 mpcore_priv_state *s = (mpcore_priv_state *)opaque;
 int id;
-offset = 0xfff;
+offset = 0xff;
 if (offset  0x100) {
 /* SCU */
 switch (offset) {
@@ -57,17 +57,6 @@ static uint64_t mpcore_priv_read(void *opaque, 
target_phys_addr_t offset,
 default:
 goto bad_reg;
 }
-} else if (offset  0x600) {
-/* Interrupt controller.  */
-if (offset  0x200) {
-id = gic_get_current_cpu();
-} else {
-id = (offset - 0x200)  8;
-if (id = s-num_cpu) {
-return 0;
-}
-}
-return gic_cpu_read(s-gic, id, offset  0xff);
 }
 bad_reg:
 hw_error(mpcore_priv_read: Bad offset %x\n, (int)offset);
@@ -78,8 +67,7 @@ static void mpcore_priv_write(void *opaque, 
target_phys_addr_t offset,
   uint64_t value, unsigned size)
 {
 mpcore_priv_state *s = (mpcore_priv_state *)opaque;
-int id;
-offset = 0xfff;
+offset = 0xff;
 if (offset  0x100) {
 /* SCU */
 switch (offset) {
@@ -92,16 +80,6 @@ static void mpcore_priv_write(void *opaque, 
target_phys_addr_t offset,
 default:
 goto bad_reg;
 }
-} else if (offset  0x600) {
-/* Interrupt controller.  */
-if (offset  0x200) {
-id = gic_get_current_cpu();
-} else {
-id = (offset - 0x200)  8;
-}
-if (id  s-num_cpu) {
-gic_cpu_write(s-gic, id, offset  0xff, value);
-}
 }
 return;
 bad_reg:
@@ -129,8 +107,15 @@ static void mpcore_priv_map_setup(mpcore_priv_state *s)
 SysBusDevice *busdev = sysbus_from_qdev(s-mptimer);
 memory_region_init(s-container, mpcode-priv-container, 0x2000);
 memory_region_init_io(s-iomem, mpcore_priv_ops, s, mpcode-priv,
-  0x1000);
+  0x100);
 memory_region_add_subregion(s-container, 0, s-iomem);
+/* GIC CPU interfaces: current CPU at 0x100, then specific CPUs
+ * at 0x200, 0x300...
+ */
+for (i = 0; i  (s-num_cpu + 1); i++) {
+target_phys_addr_t offset = 0x100 + (i * 0x100);
+memory_region_add_subregion(s-container, offset, 
s-gic.cpuiomem[i]);
+}
 /* Add the regions for timer and watchdog for current CPU and
  * for each specific CPU.
  */
-- 
1.7.1

[Qemu-devel] [PATCH 7/8] hw/a9mpcore.c: Implement A9MP peripherals rather than 11MPcore ones

2011-12-12 Thread Peter Maydell

Implement the A9MP private peripheral region correctly, rather
than piggybacking on the 11MPCore code; the two CPUs are not the
same in this area.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 hw/a9mpcore.c |  189 ++---
 1 files changed, 179 insertions(+), 10 deletions(-)

diff --git a/hw/a9mpcore.c b/hw/a9mpcore.c
index 6f108f4..cd2985f 100644
--- a/hw/a9mpcore.c
+++ b/hw/a9mpcore.c
@@ -2,28 +2,197 @@
  * Cortex-A9MPCore internal peripheral emulation.
  *
  * Copyright (c) 2009 CodeSourcery.
- * Written by Paul Brook
+ * Copyright (c) 2011 Linaro Limited.
+ * Written by Paul Brook, Peter Maydell.
  *
  * This code is licensed under the GPL.
  */
 
-/* 64 external IRQ lines.  */
+#include sysbus.h
+
+/* Configuration for arm_gic.c:
+ * number of external IRQ lines, max number of CPUs, how to ID current CPU
+ */
 #define GIC_NIRQ 96
-#include mpcore.c
+#define NCPU 4
+
+static inline int
+gic_get_current_cpu(void)
+{
+  return cpu_single_env-cpu_index;
+}
+
+#include arm_gic.c
+
+/* A9MP private memory region.  */
+
+typedef struct a9mp_priv_state {
+gic_state gic;
+uint32_t scu_control;
+uint32_t old_timer_status[8];
+uint32_t num_cpu;
+qemu_irq *timer_irq;
+MemoryRegion scu_iomem;
+MemoryRegion ptimer_iomem;
+MemoryRegion container;
+DeviceState *mptimer;
+} a9mp_priv_state;
+
+static uint64_t a9_scu_read(void *opaque, target_phys_addr_t offset,
+unsigned size)
+{
+a9mp_priv_state *s = (a9mp_priv_state *)opaque;
+switch (offset) {
+case 0x00: /* Control */
+return s-scu_control;
+case 0x04: /* Configuration */
+return (((1  s-num_cpu) - 1)  4) | (s-num_cpu - 1);
+case 0x08: /* CPU Power Status */
+return 0;
+case 0x0c: /* Invalidate All Registers In Secure State */
+return 0;
+case 0x40: /* Filtering Start Address Register */
+case 0x44: /* Filtering End Address Register */
+/* RAZ/WI, like an implementation with only one AXI master */
+return 0;
+case 0x50: /* SCU Access Control Register */
+case 0x54: /* SCU Non-secure Access Control Register */
+/* unimplemented, fall through */
+default:
+return 0;
+}
+}
+
+static void a9_scu_write(void *opaque, target_phys_addr_t offset,
+ uint64_t value, unsigned size)
+{
+a9mp_priv_state *s = (a9mp_priv_state *)opaque;
+switch (offset) {
+case 0x00: /* Control */
+s-scu_control = value  1;
+break;
+case 0x4: /* Configuration: RO */
+break;
+case 0x0c: /* Invalidate All Registers In Secure State */
+/* no-op as we do not implement caches */
+break;
+case 0x40: /* Filtering Start Address Register */
+case 0x44: /* Filtering End Address Register */
+/* RAZ/WI, like an implementation with only one AXI master */
+break;
+case 0x8: /* CPU Power Status */
+case 0x50: /* SCU Access Control Register */
+case 0x54: /* SCU Non-secure Access Control Register */
+/* unimplemented, fall through */
+default:
+break;
+}
+}
+
+static const MemoryRegionOps a9_scu_ops = {
+.read = a9_scu_read,
+.write = a9_scu_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+static void a9mpcore_timer_irq_handler(void *opaque, int irq, int level)
+{
+a9mp_priv_state *s = (a9mp_priv_state *)opaque;
+if (level  !s-old_timer_status[irq]) {
+gic_set_pending_private(s-gic, irq  1, 29 + (irq  1));
+}
+s-old_timer_status[irq] = level;
+}
+
+static void a9mp_priv_reset(DeviceState *dev)
+{
+a9mp_priv_state *s = FROM_SYSBUSGIC(a9mp_priv_state, 
sysbus_from_qdev(dev));
+int i;
+s-scu_control = 0;
+for (i = 0; i  ARRAY_SIZE(s-old_timer_status); i++) {
+s-old_timer_status[i] = 0;
+}
+}
+
+static int a9mp_priv_init(SysBusDevice *dev)
+{
+a9mp_priv_state *s = FROM_SYSBUSGIC(a9mp_priv_state, dev);
+SysBusDevice *busdev;
+int i;
+
+if (s-num_cpu  NCPU) {
+hw_error(a9mp_priv_init: num-cpu may not be more than %d\n, NCPU);
+}
+
+gic_init(s-gic, s-num_cpu);
+
+s-mptimer = qdev_create(NULL, arm_mptimer);
+qdev_prop_set_uint32(s-mptimer, num-cpu, s-num_cpu);
+qdev_init_nofail(s-mptimer);
+busdev = sysbus_from_qdev(s-mptimer);
+
+/* Memory map (addresses are offsets from PERIPHBASE):
+ *  0x-0x00ff -- Snoop Control Unit
+ *  0x0100-0x01ff -- GIC CPU interface
+ *  0x0200-0x02ff -- Global Timer
+ *  0x0300-0x05ff -- nothing
+ *  0x0600-0x06ff -- private timers and watchdogs
+ *  0x0700-0x0fff -- nothing
+ *  0x1000-0x1fff -- GIC Distributor
+ *
+ * We should implement the global timer but don't currently do so.
+ */
+memory_region_init(s-container, a9mp-priv-container, 0x2000);
+memory_region_init_io(s-scu_iomem, a9_scu_ops, s, a9mp-scu, 0x100);
+

Re: [Qemu-devel] [RFC 0/4] virtio-mmio transport

2011-12-12 Thread Pawel Moll

On Mon, 2011-12-12 at 02:52 +, Paul Brook wrote:
 I've taken a look at the virtion-mmio spec, and it looks fairly
 reasonable.
 
 The only thing I'd change is the GuestPageSize/QueuePFN mess.  Seems like 
 just 
 using straight 64-bit addresses would be a better solution.  Maybe split into 
 a high/low pair to keep all registers as 32-bit registers.

This can be done fairly simple by:
1. Bumping Version register content.
2. Adding two registers: QueueAddrLow (0x44) and QueueAddrHigh (0x48).
3. Describing the QueuePFN as obsolete.

Than the driver would just use Addr or PFN depending on the device's
version.

I can do that, but not this year (on holiday from Friday 16th, without
any access to Internet whatsoever :-) One think to be decided is in what
order the halfs should be filled? Low first, then high? High then low?
Does it matter at all? :-)

Cheers!

Paweł

[Qemu-devel] [PATCH 6/8] hw/mpcore: Clean up mpcore_priv_read/write as they are now SCU only

2011-12-12 Thread Peter Maydell

The only code left in mpcore_priv_read and mpcore_priv_write is now
the implementation of the SCU registers. Clean up by renaming functions
and removing some unnecessary conditionals to make this clearer.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 hw/mpcore.c |   73 +--
 1 files changed, 31 insertions(+), 42 deletions(-)

diff --git a/hw/mpcore.c b/hw/mpcore.c
index a0af1ad..670d7e5 100644
--- a/hw/mpcore.c
+++ b/hw/mpcore.c
@@ -36,59 +36,49 @@ typedef struct mpcore_priv_state {
 
 /* Per-CPU private memory mapped IO.  */
 
-static uint64_t mpcore_priv_read(void *opaque, target_phys_addr_t offset,
- unsigned size)
+static uint64_t mpcore_scu_read(void *opaque, target_phys_addr_t offset,
+unsigned size)
 {
 mpcore_priv_state *s = (mpcore_priv_state *)opaque;
 int id;
 offset = 0xff;
-if (offset  0x100) {
-/* SCU */
-switch (offset) {
-case 0x00: /* Control.  */
-return s-scu_control;
-case 0x04: /* Configuration.  */
-id = ((1  s-num_cpu) - 1)  4;
-return id | (s-num_cpu - 1);
-case 0x08: /* CPU status.  */
-return 0;
-case 0x0c: /* Invalidate all.  */
-return 0;
-default:
-goto bad_reg;
-}
+/* SCU */
+switch (offset) {
+case 0x00: /* Control.  */
+return s-scu_control;
+case 0x04: /* Configuration.  */
+id = ((1  s-num_cpu) - 1)  4;
+return id | (s-num_cpu - 1);
+case 0x08: /* CPU status.  */
+return 0;
+case 0x0c: /* Invalidate all.  */
+return 0;
+default:
+hw_error(mpcore_priv_read: Bad offset %x\n, (int)offset);
 }
-bad_reg:
-hw_error(mpcore_priv_read: Bad offset %x\n, (int)offset);
-return 0;
 }
 
-static void mpcore_priv_write(void *opaque, target_phys_addr_t offset,
-  uint64_t value, unsigned size)
+static void mpcore_scu_write(void *opaque, target_phys_addr_t offset,
+ uint64_t value, unsigned size)
 {
 mpcore_priv_state *s = (mpcore_priv_state *)opaque;
 offset = 0xff;
-if (offset  0x100) {
-/* SCU */
-switch (offset) {
-case 0: /* Control register.  */
-s-scu_control = value  1;
-break;
-case 0x0c: /* Invalidate all.  */
-/* This is a no-op as cache is not emulated.  */
-break;
-default:
-goto bad_reg;
-}
+/* SCU */
+switch (offset) {
+case 0: /* Control register.  */
+s-scu_control = value  1;
+break;
+case 0x0c: /* Invalidate all.  */
+/* This is a no-op as cache is not emulated.  */
+break;
+default:
+hw_error(mpcore_priv_read: Bad offset %x\n, (int)offset);
 }
-return;
-bad_reg:
-hw_error(mpcore_priv_read: Bad offset %x\n, (int)offset);
 }
 
-static const MemoryRegionOps mpcore_priv_ops = {
-.read = mpcore_priv_read,
-.write = mpcore_priv_write,
+static const MemoryRegionOps mpcore_scu_ops = {
+.read = mpcore_scu_read,
+.write = mpcore_scu_write,
 .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
@@ -106,8 +96,7 @@ static void mpcore_priv_map_setup(mpcore_priv_state *s)
 int i;
 SysBusDevice *busdev = sysbus_from_qdev(s-mptimer);
 memory_region_init(s-container, mpcode-priv-container, 0x2000);
-memory_region_init_io(s-iomem, mpcore_priv_ops, s, mpcode-priv,
-  0x100);
+memory_region_init_io(s-iomem, mpcore_scu_ops, s, mpcore-scu, 0x100);
 memory_region_add_subregion(s-container, 0, s-iomem);
 /* GIC CPU interfaces: current CPU at 0x100, then specific CPUs
  * at 0x200, 0x300...
-- 
1.7.1

[Qemu-devel] [PATCH 5/8] hw/realview_gic: Use GIC memory region for the CPU interface

2011-12-12 Thread Peter Maydell

Use the GIC provided memory region for the CPU interface rather
than implementing our own.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 hw/realview_gic.c |   25 +
 1 files changed, 1 insertions(+), 24 deletions(-)

diff --git a/hw/realview_gic.c b/hw/realview_gic.c
index 479f939..8c4d509 100644
--- a/hw/realview_gic.c
+++ b/hw/realview_gic.c
@@ -23,36 +23,13 @@ gic_get_current_cpu(void)
 
 typedef struct {
 gic_state gic;
-MemoryRegion iomem;
 MemoryRegion container;
 } RealViewGICState;
 
-static uint64_t realview_gic_cpu_read(void *opaque, target_phys_addr_t offset,
-  unsigned size)
-{
-gic_state *s = (gic_state *)opaque;
-return gic_cpu_read(s, gic_get_current_cpu(), offset);
-}
-
-static void realview_gic_cpu_write(void *opaque, target_phys_addr_t offset,
-   uint64_t value, unsigned size)
-{
-gic_state *s = (gic_state *)opaque;
-gic_cpu_write(s, gic_get_current_cpu(), offset, value);
-}
-
-static const MemoryRegionOps realview_gic_cpu_ops = {
-.read = realview_gic_cpu_read,
-.write = realview_gic_cpu_write,
-.endianness = DEVICE_NATIVE_ENDIAN,
-};
-
 static void realview_gic_map_setup(RealViewGICState *s)
 {
 memory_region_init(s-container, realview-gic-container, 0x2000);
-memory_region_init_io(s-iomem, realview_gic_cpu_ops, s-gic,
-  realview-gic, 0x1000);
-memory_region_add_subregion(s-container, 0, s-iomem);
+memory_region_add_subregion(s-container, 0, s-gic.cpuiomem[0]);
 memory_region_add_subregion(s-container, 0x1000, s-gic.iomem);
 }
 
-- 
1.7.1

[Qemu-devel] [PATCH 3/8] hw/arm_gic: Expose GIC CPU interfaces as sysbus memory regions

2011-12-12 Thread Peter Maydell

Expose the ARM GIC CPU interfaces as memory regions, rather than
just providing read and write functions for them.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 hw/arm_gic.c |   75 +-
 1 files changed, 74 insertions(+), 1 deletions(-)

diff --git a/hw/arm_gic.c b/hw/arm_gic.c
index 527c9ce..66c48fd 100644
--- a/hw/arm_gic.c
+++ b/hw/arm_gic.c
@@ -103,7 +103,14 @@ typedef struct gic_state
 int num_cpu;
 #endif
 
-MemoryRegion iomem;
+MemoryRegion iomem; /* Distributor */
+#ifndef NVIC
+/* This is just so we can have an opaque pointer which identifies
+ * both this GIC and which CPU interface we should be accessing.
+ */
+struct gic_state *backref[NCPU];
+MemoryRegion cpuiomem[NCPU+1]; /* CPU interfaces */
+#endif
 } gic_state;
 
 /* TODO: Many places that call this routine could be optimized.  */
@@ -633,6 +640,54 @@ static void gic_cpu_write(gic_state *s, int cpu, int 
offset, uint32_t value)
 }
 gic_update(s);
 }
+
+/* Wrappers to read/write the GIC CPU interface for the current CPU */
+static uint64_t gic_thiscpu_read(void *opaque, target_phys_addr_t addr,
+ unsigned size)
+{
+gic_state *s = (gic_state *)opaque;
+return gic_cpu_read(s, gic_get_current_cpu(), addr  0xff);
+}
+
+static void gic_thiscpu_write(void *opaque, target_phys_addr_t addr,
+  uint64_t value, unsigned size)
+{
+gic_state *s = (gic_state *)opaque;
+gic_cpu_write(s, gic_get_current_cpu(), addr  0xff, value);
+}
+
+/* Wrappers to read/write the GIC CPU interface for a specific CPU.
+ * These just decode the opaque pointer into gic_state* + cpu id.
+ */
+static uint64_t gic_do_cpu_read(void *opaque, target_phys_addr_t addr,
+unsigned size)
+{
+gic_state **backref = (gic_state **)opaque;
+gic_state *s = *backref;
+int id = (backref - s-backref);
+return gic_cpu_read(s, id, addr  0xff);
+}
+
+static void gic_do_cpu_write(void *opaque, target_phys_addr_t addr,
+ uint64_t value, unsigned size)
+{
+gic_state **backref = (gic_state **)opaque;
+gic_state *s = *backref;
+int id = (backref - s-backref);
+gic_cpu_write(s, id, addr  0xff, value);
+}
+
+static const MemoryRegionOps gic_thiscpu_ops = {
+.read = gic_thiscpu_read,
+.write = gic_thiscpu_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+static const MemoryRegionOps gic_cpu_ops = {
+.read = gic_do_cpu_read,
+.write = gic_do_cpu_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+};
 #endif
 
 static void gic_reset(gic_state *s)
@@ -752,6 +807,24 @@ static void gic_init(gic_state *s)
 sysbus_init_irq(s-busdev, s-parent_irq[i]);
 }
 memory_region_init_io(s-iomem, gic_dist_ops, s, gic_dist, 0x1000);
+#ifndef NVIC
+/* Memory regions for the CPU interfaces (NVIC doesn't have these):
+ * a region for CPU interface for this core, then a region for
+ * CPU interface for core 0, for core 1, ...
+ * NB that the memory region size of 0x100 applies for the 11MPCore
+ * and also cores following the GIC v1 spec (ie A9).
+ * GIC v2 defines a larger memory region (0x1000) so this will need
+ * to be extended when we implement A15.
+ */
+memory_region_init_io(s-cpuiomem[0], gic_thiscpu_ops, s,
+  gic_cpu, 0x100);
+for (i = 0; i  NUM_CPU(s); i++) {
+s-backref[i] = s;
+memory_region_init_io(s-cpuiomem[i+1], gic_cpu_ops, s-backref[i],
+  gic_cpu, 0x100);
+}
+#endif
+
 gic_reset(s);
 register_savevm(NULL, arm_gic, -1, 2, gic_save, gic_load, s);
 }
-- 
1.7.1

Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC

2011-12-12 Thread Gerd Hoffmann

On 12/12/11 11:18, Stefan Hajnoczi wrote:
 On Sun, Dec 11, 2011 at 10:29 AM, Alon Levy al...@redhat.com wrote:
 On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote:
 Hi there,

 I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out 
 that
 the command client_migrate_info uses it. That's a legacy interface and has 
 to
 be dropped, no command should be using it...

 Something tells me that if I just drop it (and change the command to use the
 regular interface), bad things will happen. Am I right? :)


 The monitor command client_migrate_info needs to complete after getting
 an ACK message from the currently connected spice client (this is the
 only case where this is required - if there is no client then the
 MONITOR_CMD_ASYNC API won't be used). This in turn requires the main
 thread to perform select and call the callback that will process this
 ACK. That's why the MONITOR_CMD_ASYNC API was used.

 I'm not aware of any other way to do this, I'll be glad for any help
 here. Also, I understand this is not what is not true async, since one
 would expect a true async interface to support multiple in flight
 monitor commands. If there is any ETA or existing way to do this we
 could change the implementation of client_migrate_info.
 
 Is it possible to use a QMP event to signal completion instead of a
 MONITOR_CMD_ASYNC command?

Problem is this breaks the qemu - libvirt interface.

cheers,
  Gerd

[Qemu-devel] [PATCH 1/8] Fix sp804 dual-timer

2011-12-12 Thread Peter Maydell

From: Peter Chubb peter.ch...@nicta.com.au

Properly implement dual-timer read/write for the sp804 dual timer module.
Based on ARM specs at
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0271d/index.html

Signed-off-by: Hans Jang hsj...@ok-labs.com
Signed-off-by: David Mirabito david.mirab...@nicta.com.au
Signed-off-by: Peter Chubb peter.ch...@nicta.com.au
Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 hw/arm_timer.c |   41 +++--
 1 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/hw/arm_timer.c b/hw/arm_timer.c
index 518bad2..0a5b9d2 100644
--- a/hw/arm_timer.c
+++ b/hw/arm_timer.c
@@ -170,9 +170,9 @@ static arm_timer_state *arm_timer_init(uint32_t freq)
 }
 
 /* ARM PrimeCell SP804 dual timer module.
-   Docs for this device don't seem to be publicly available.  This
-   implementation is based on guesswork, the linux kernel sources and the
-   Integrator/CP timer modules.  */
+ * Docs at
+ * 
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0271d/index.html
+*/
 
 typedef struct {
 SysBusDevice busdev;
@@ -182,6 +182,13 @@ typedef struct {
 qemu_irq irq;
 } sp804_state;
 
+static const uint8_t sp804_ids[] = {
+/* Timer ID */
+0x04, 0x18, 0x14, 0,
+/* PrimeCell ID */
+0xd, 0xf0, 0x05, 0xb1
+};
+
 /* Merge the IRQs from the two component devices.  */
 static void sp804_set_irq(void *opaque, int irq, int level)
 {
@@ -196,12 +203,27 @@ static uint64_t sp804_read(void *opaque, 
target_phys_addr_t offset,
 {
 sp804_state *s = (sp804_state *)opaque;
 
-/* ??? Don't know the PrimeCell ID for this device.  */
 if (offset  0x20) {
 return arm_timer_read(s-timer[0], offset);
-} else {
+}
+if (offset  0x40) {
 return arm_timer_read(s-timer[1], offset - 0x20);
 }
+
+/* TimerPeriphID */
+if (offset = 0xfe0  offset = 0xffc) {
+return sp804_ids[(offset - 0xfe0)  2];
+}
+
+switch (offset) {
+/* Integration Test control registers, which we won't support */
+case 0xf00: /* TimerITCR */
+case 0xf04: /* TimerITOP (strictly write only but..) */
+return 0;
+}
+
+hw_error(%s: Bad offset %x\n, __func__, (int)offset);
+return 0;
 }
 
 static void sp804_write(void *opaque, target_phys_addr_t offset,
@@ -211,9 +233,16 @@ static void sp804_write(void *opaque, target_phys_addr_t 
offset,
 
 if (offset  0x20) {
 arm_timer_write(s-timer[0], offset, value);
-} else {
+return;
+}
+
+if (offset  0x40) {
 arm_timer_write(s-timer[1], offset - 0x20, value);
+return;
 }
+
+/* Technically we could be writing to the Test Registers, but not likely */
+hw_error(%s: Bad offset %x\n, __func__, (int)offset);
 }
 
 static const MemoryRegionOps sp804_ops = {
-- 
1.7.1

Re: [Qemu-devel] [PATCH v3] block : return real error code in cow.c

2011-12-12 Thread Kevin Wolf

Am 12.12.2011 06:54, schrieb Li Zhi Hui:
 v3: modify some errors
 
 Signed-off-by: Li Zhi Hui zhihu...@linux.vnet.ibm.com
 ---
  block/cow.c |   44 +---
  1 files changed, 29 insertions(+), 15 deletions(-)

Thanks, applied to the block branch.

Kevin

Re: [Qemu-devel] [PATCH V4 04/13] hw/9pfs: File system helper process for qemu 9p proxy FS

2011-12-12 Thread Stefan Hajnoczi

On Fri, Dec 09, 2011 at 10:12:17PM +0530, M. Mohan Kumar wrote:
 On Friday, December 09, 2011 12:01:14 AM Stefan Hajnoczi wrote:
  On Mon, Dec 05, 2011 at 09:48:41PM +0530, M. Mohan Kumar wrote:
   +static int read_request(int sockfd, struct iovec *iovec, ProxyHeader
   *header) +{
   +int retval;
   +
   +/*
   + * read the request header.
   + */
   +iovec-iov_len = 0;
   +retval = socket_read(sockfd, iovec-iov_base, PROXY_HDR_SZ);
   +if (retval  0) {
   +return retval;
   +}
   +iovec-iov_len = PROXY_HDR_SZ;
   +retval = proxy_unmarshal(iovec, 0, dd, header-type,
   header-size); +if (retval  0) {
   +return retval;
   +}
   +/*
   + * We can't process message.size  PROXY_MAX_IO_SZ, read the
   complete + * message from the socket and ignore it. This ensures
   that + * we can correctly handle the next request. We also return + 
  * ENOBUFS as error to indicate we ran out of buffer space. + */
   +if (header-size  PROXY_MAX_IO_SZ) {
   +int count, size;
   +size = header-size;
   +while (size  0) {
   +count = MIN(PROXY_MAX_IO_SZ, size);
   +count = socket_read(sockfd, iovec-iov_base + PROXY_HDR_SZ,
   count); +if (count  0) {
   +return count;
   +}
   +size -= count;
   +}
  
  I'm not sure recovery attempts are worthwhile here.  The client is
  buggy, perhaps just refuse further work.
 
 But whats the issue in trying to recover in this case?

This recovery procedure is not robust because it does not always work.
In fact it only works in the case where the header-size field was
out-of-range but accurate.  That's not a likely case since the QEMU-side
code that you are writing should handle this.

If the nature of the invalid request is different, either a broken or
malicious client which does not send a valid header-size then we're
stuck in this special-case recovery trying to gobble bytes and we never
log an error.

A real recovery would be something like disconnecting and
re-establishing the connection between QEMU and the helper.  This would
allow us to get back to a clean state in all cases.

Stefan

Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC

2011-12-12 Thread Stefan Hajnoczi

On Mon, Dec 12, 2011 at 11:29 AM, Gerd Hoffmann kra...@redhat.com wrote:
 On 12/12/11 11:18, Stefan Hajnoczi wrote:
 On Sun, Dec 11, 2011 at 10:29 AM, Alon Levy al...@redhat.com wrote:
 On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote:
 Hi there,

 I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out 
 that
 the command client_migrate_info uses it. That's a legacy interface and has 
 to
 be dropped, no command should be using it...

 Something tells me that if I just drop it (and change the command to use 
 the
 regular interface), bad things will happen. Am I right? :)


 The monitor command client_migrate_info needs to complete after getting
 an ACK message from the currently connected spice client (this is the
 only case where this is required - if there is no client then the
 MONITOR_CMD_ASYNC API won't be used). This in turn requires the main
 thread to perform select and call the callback that will process this
 ACK. That's why the MONITOR_CMD_ASYNC API was used.

 I'm not aware of any other way to do this, I'll be glad for any help
 here. Also, I understand this is not what is not true async, since one
 would expect a true async interface to support multiple in flight
 monitor commands. If there is any ETA or existing way to do this we
 could change the implementation of client_migrate_info.

 Is it possible to use a QMP event to signal completion instead of a
 MONITOR_CMD_ASYNC command?

 Problem is this breaks the qemu - libvirt interface.

I had the same issue with the block_job_cancel command, which Adam
Litke and Eric Blake helped us fix and find a backward-compatible
libvirt solution for:

http://www.redhat.com/archives/libvir-list/2011-November/msg01351.html

Stefan

Re: [Qemu-devel] [RFC 0/4] virtio-mmio transport

2011-12-12 Thread Stefan Hajnoczi

I noticed the virtio-mmio spec has an interrupt status register.  On
x86 and virtio-pci things are moving towards Message Signalled
Interrupts and virtqueues having their own interrupts for better
performance and flexibility.  Any thoughts on how 1 interrupt per
virtqueue works for virtio-mmio?

My feeling is that the interrupt details are board-specific and can't
be described in virtio-mmio, but it still jumped out at me that it
looks like you're only thinking of 1 interrupt for the device.

Stefan

Re: [Qemu-devel] [RFC 0/4] virtio-mmio transport

2011-12-12 Thread Pawel Moll

On Mon, 2011-12-12 at 12:14 +, Stefan Hajnoczi wrote:
 I noticed the virtio-mmio spec has an interrupt status register.  On
 x86 and virtio-pci things are moving towards Message Signalled
 Interrupts and virtqueues having their own interrupts for better
 performance and flexibility.  Any thoughts on how 1 interrupt per
 virtqueue works for virtio-mmio?

This could be done by either creating devices with more then one
interrupt (platform device can take any number of resources) and
declaring that first queue uses the first one etc.

 My feeling is that the interrupt details are board-specific and can't
 be described in virtio-mmio, 

It's just the the design pattern in the embedded world that devices
usually have one interrupt output, shared between its internal
functions. And - of course - there is no in-band signalling (like MSI)
possible - interrupt lines are just wires :-) In a boundary case
scenario we may face a situation when total amount of interrupts for all
queues may actually exceed amount of interrupt inputs available in the
interrupt controller...

There may be a half-way solution - one interrupt per device but the
active queue number notified via the interrupt status register (as a
FIFO) so the driver wouldn't have to enumerate all the queues.

 but it still jumped out at me that it
 looks like you're only thinking of 1 interrupt for the device.

Yes, current version assumes tgat. I'll keep this in mind when planning
changes for v2 (next year ;-), thanks for letting me know!

Cheers!

Paweł

Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC

2011-12-12 Thread Gerd Hoffmann

On 12/12/11 13:10, Stefan Hajnoczi wrote:
 On Mon, Dec 12, 2011 at 11:29 AM, Gerd Hoffmann kra...@redhat.com wrote:
 On 12/12/11 11:18, Stefan Hajnoczi wrote:
 On Sun, Dec 11, 2011 at 10:29 AM, Alon Levy al...@redhat.com wrote:
 On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote:
 Hi there,

 I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out 
 that
 the command client_migrate_info uses it. That's a legacy interface and 
 has to
 be dropped, no command should be using it...

 Something tells me that if I just drop it (and change the command to use 
 the
 regular interface), bad things will happen. Am I right? :)


 The monitor command client_migrate_info needs to complete after getting
 an ACK message from the currently connected spice client (this is the
 only case where this is required - if there is no client then the
 MONITOR_CMD_ASYNC API won't be used). This in turn requires the main
 thread to perform select and call the callback that will process this
 ACK. That's why the MONITOR_CMD_ASYNC API was used.

 I'm not aware of any other way to do this, I'll be glad for any help
 here. Also, I understand this is not what is not true async, since one
 would expect a true async interface to support multiple in flight
 monitor commands. If there is any ETA or existing way to do this we
 could change the implementation of client_migrate_info.

 Is it possible to use a QMP event to signal completion instead of a
 MONITOR_CMD_ASYNC command?

 Problem is this breaks the qemu - libvirt interface.
 
 I had the same issue with the block_job_cancel command, which Adam
 Litke and Eric Blake helped us fix and find a backward-compatible
 libvirt solution for:
 
 http://www.redhat.com/archives/libvir-list/2011-November/msg01351.html

Isn't going to fly as waiting for completion isn't optional in that
case.  Workflow is this:

(1) libvirt issues client_migrate_info command.
(2) qemu forwards it to spice-server, which in turn forwards it to
the spice client (if connected).
(3) spice client connects to the migration target machine.
(4) spice client signals completion to spice-server, which in turn
notifies qemu.
(5) qemu calls the monitor completion callback, libvirt gets
client_migrate_info result.
(6) libvirt issues migrate command.

The problem is that (3) must be finished before (6) because qemu on the
target side doesn't accept incoming tcp connections any more once the
migration started.

I don't see a way to switch this to qmp events without breaking old
libvirt versions managing new qemu.

cheers,
  Gerd

Re: [Qemu-devel] [RFC] qemu-ga: Introduce guest-hibernate command

2011-12-12 Thread Luiz Capitulino

On Sun, 11 Dec 2011 12:00:12 +0200
Dor Laor dl...@redhat.com wrote:

 On 12/08/2011 08:52 PM, Luiz Capitulino wrote:
  This is basically suspend to disk on a Linux guest.
 
  Signed-off-by: Luiz Capitulinolcapitul...@redhat.com
  ---
 
  This is an RFC because I did it as simple as possible and I'm open to
  suggestions...
 
  Now, while testing this or even echo disk  /sys/power/state I get several
 
 Beyond the previous comments about virtio s4 that Amit solved and 
 pm-hibernate I think it will be nice to add suspend to ram (s3) option 
 too. Currently qemu wakes up immediately after s3 but it can be fixed 
 and it can pause the guest and wait to a keyboard event or monitor command.

I thought adding it in this RFC, but I decided to keep it simple. Will add
it to v2.

 
  funny results. Some times qemu just dies after printing that message:
 
Guest moved used index from 20151 to 1
 
  Some times it doesn't die, but I'm unable to log into the guest: I type
  username  password but the terminal kind of locks (the shell doesn't run).
 
  Some times it works...
 
qapi-schema-guest.json |   11 +++
qga/guest-agent-commands.c |   19 +++
2 files changed, 30 insertions(+), 0 deletions(-)
 
  diff --git a/qapi-schema-guest.json b/qapi-schema-guest.json
  index fde5971..2c5bbcf 100644
  --- a/qapi-schema-guest.json
  +++ b/qapi-schema-guest.json
  @@ -215,3 +215,14 @@
##
{ 'command': 'guest-fsfreeze-thaw',
  'returns': 'int' }
  +
  +##
  +# @guest-hibernate
  +#
  +# Save RAM contents to disk and powerdown the guest.
  +#
  +# Notes: This command doesn't return on success.
  +#
  +# Since: 1.1
  +##
  +{ 'command': 'guest-hibernate' }
  diff --git a/qga/guest-agent-commands.c b/qga/guest-agent-commands.c
  index 6da9904..9dd4060 100644
  --- a/qga/guest-agent-commands.c
  +++ b/qga/guest-agent-commands.c
  @@ -550,6 +550,25 @@ int64_t qmp_guest_fsfreeze_thaw(Error **err)
}
#endif
 
  +#define LINUX_SYS_STATE_FILE /sys/power/state
  +
  +void qmp_guest_hibernate(Error **err)
  +{
  +int fd;
  +
  +fd = open(LINUX_SYS_STATE_FILE, O_WRONLY);
  +if (fd  0) {
  +error_set(err, QERR_OPEN_FILE_FAILED, LINUX_SYS_STATE_FILE);
  +return;
  +}
  +
  +if (write(fd, disk, 4)  0) {
  +error_set(err, QERR_UNDEFINED_ERROR);
  +}
  +
  +close(fd);
  +}
  +
/* register init/cleanup routines for stateful command groups */
void ga_command_state_init(GAState *s, GACommandState *cs)
{

Re: [Qemu-devel] [PATCH V2 2/5] xen mapcache: Check if a memory space has moved.

2011-12-12 Thread Stefano Stabellini

On Fri, 9 Dec 2011, Anthony PERARD wrote:
 This patch change the xen_map_cache behavior. Before trying to map a guest
 addr, mapcache will look into the list of range of address that have been 
 moved
 (physmap/set_memory). There is currently one memory space like this, the vram,
 moved from were it's allocated to were the guest will look into.
 
 This help to have a succefull migration.
 
 Signed-off-by: Anthony PERARD anthony.per...@citrix.com
 ---
  xen-all.c  |   18 +-
  xen-mapcache.c |   22 +++---
  xen-mapcache.h |9 +++--
  3 files changed, 43 insertions(+), 6 deletions(-)
 
 diff --git a/xen-all.c b/xen-all.c
 index b5e28ab..b2e9853 100644
 --- a/xen-all.c
 +++ b/xen-all.c
 @@ -218,6 +218,22 @@ static XenPhysmap *get_physmapping(XenIOState *state,
  return NULL;
  }
  
 +static target_phys_addr_t xen_phys_offset_to_gaddr(target_phys_addr_t 
 start_addr,
 +   ram_addr_t size, void 
 *opaque)
 +{
 +target_phys_addr_t addr = start_addr  TARGET_PAGE_MASK;
 +XenIOState *xen_io_state = opaque;
 +XenPhysmap *physmap = NULL;
 +QLIST_FOREACH(physmap, xen_io_state-physmap, list) {
 +if (range_covers_byte(physmap-phys_offset, physmap-size, addr)) {
 +return physmap-start_addr;
 +}
 +}
 +
 +return start_addr;
 +}

Considering that xen_phys_offset_to_gaddr can be called with addresses
that are not page aligned, you should be able to return the translated
address with the right offset in the page, rather than just the
translated address, that is incorrect.
Otherwise if start_addr has to be page aligned, just add a BUG_ON at
the beginning of the function, to spot cases in which it is not.


  #if CONFIG_XEN_CTRL_INTERFACE_VERSION = 340
  static int xen_add_to_physmap(XenIOState *state,
target_phys_addr_t start_addr,
 @@ -938,7 +954,7 @@ int xen_hvm_init(void)
  }
  
  /* Init RAM management */
 -xen_map_cache_init();
 +xen_map_cache_init(xen_phys_offset_to_gaddr, state);
  xen_ram_init(ram_size);

This patch is better than the previous version but I think there is
still room for improvement.
For example, the only case in which xen_phys_offset_to_gaddr should
actually be used is for the videoram on restore, right?
In that case, why don't we set xen_phys_offset_to_gaddr only on restore
rather than all cases?
We could even unset xen_phys_offset_to_gaddr after the videoram address
has been translated correctly so that it is going to be called only once.



  qemu_add_vm_change_state_handler(xen_hvm_change_state_handler, state);
 diff --git a/xen-mapcache.c b/xen-mapcache.c
 index 7bcb86e..d5f52b2 100644
 --- a/xen-mapcache.c
 +++ b/xen-mapcache.c
 @@ -76,6 +76,9 @@ typedef struct MapCache {
  uint8_t *last_address_vaddr;
  unsigned long max_mcache_size;
  unsigned int mcache_bucket_shift;
 +
 +phys_offset_to_gaddr_t phys_offset_to_gaddr;
 +void *opaque;
  } MapCache;
  
  static MapCache *mapcache;
 @@ -89,13 +92,16 @@ static inline int test_bits(int nr, int size, const 
 unsigned long *addr)
  return 0;
  }
  
 -void xen_map_cache_init(void)
 +void xen_map_cache_init(phys_offset_to_gaddr_t f, void *opaque)
  {
  unsigned long size;
  struct rlimit rlimit_as;
  
  mapcache = g_malloc0(sizeof (MapCache));
  
 +mapcache-phys_offset_to_gaddr = f;
 +mapcache-opaque = opaque;
 +
  QTAILQ_INIT(mapcache-locked_entries);
  mapcache-last_address_index = -1;
  
 @@ -191,9 +197,14 @@ uint8_t *xen_map_cache(target_phys_addr_t phys_addr, 
 target_phys_addr_t size,
 uint8_t lock)
  {
  MapCacheEntry *entry, *pentry = NULL;
 -target_phys_addr_t address_index  = phys_addr  MCACHE_BUCKET_SHIFT;
 -target_phys_addr_t address_offset = phys_addr  (MCACHE_BUCKET_SIZE - 1);
 +target_phys_addr_t address_index;
 +target_phys_addr_t address_offset;
  target_phys_addr_t __size = size;
 +bool translated = false;
 +
 +tryagain:
 +address_index  = phys_addr  MCACHE_BUCKET_SHIFT;
 +address_offset = phys_addr  (MCACHE_BUCKET_SIZE - 1);
  
  trace_xen_map_cache(phys_addr);
  
 @@ -235,6 +246,11 @@ uint8_t *xen_map_cache(target_phys_addr_t phys_addr, 
 target_phys_addr_t size,
  if(!test_bits(address_offset  XC_PAGE_SHIFT, size  XC_PAGE_SHIFT,
  entry-valid_mapping)) {
  mapcache-last_address_index = -1;
 +if (!translated  mapcache-phys_offset_to_gaddr) {
 +phys_addr = mapcache-phys_offset_to_gaddr(phys_addr, size, 
 mapcache-opaque);
 +translated = true;
 +goto tryagain;
 +}
  trace_xen_map_cache_return(NULL);
  return NULL;
  }

It would be interesting to check how many times we end up needlessly
calling phys_offset_to_gaddr during the normal life of a VM. It should
be very few, may only one, right?

Re: [Qemu-devel] [PATCH V2 4/5] xen: Change memory access behavior during migration.

2011-12-12 Thread Stefano Stabellini

On Fri, 9 Dec 2011, Anthony PERARD wrote:
 Do not allocate RAM during pre-migration runstate.
 Do not actually do set_memory during migration.
 
 Signed-off-by: Anthony PERARD anthony.per...@citrix.com
 ---
  xen-all.c |   13 +
  1 files changed, 13 insertions(+), 0 deletions(-)
 
 diff --git a/xen-all.c b/xen-all.c
 index b2e9853..c1fed87 100644
 --- a/xen-all.c
 +++ b/xen-all.c
 @@ -189,6 +189,11 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size)
  
  trace_xen_ram_alloc(ram_addr, size);
  
 +if (runstate_check(RUN_STATE_PREMIGRATE)) {
 +/* RAM already populated in Xen */
 +return;
 +}

It is probably worth printing out a message about the address and size
qemu wanted to allocated


  nr_pfn = size  TARGET_PAGE_BITS;
  pfn_list = g_malloc(sizeof (*pfn_list) * nr_pfn);
  
 @@ -269,6 +274,13 @@ go_physmap:
  DPRINTF(mapping vram to %llx - %llx, from %llx\n,
  start_addr, start_addr + size, phys_offset);
  
 +if (runstate_check(RUN_STATE_INMIGRATE)) {
 +/* The mapping should already be done and can not be done a second
 + * time. So we just add to the physmap list instead.
 + */
 +goto done;
 +}

same here, printing a message would be useful

Re: [Qemu-devel] [PATCH V2 5/5] vga-cirrus: Workaround during restore when using Xen.

2011-12-12 Thread Stefano Stabellini

On Fri, 9 Dec 2011, Anthony PERARD wrote:
 During the initialisation of the machine at restore time, the access to the
 VRAM will fail because QEMU does not know yet the right guest address to map,
 so the vram_ptr is NULL.
 
 So this patch avoid using a NULL pointer during initialisation, and try to get
 another vram_ptr if the call failed the first time.

I think it would be a good idea to split this patch into two patches:
one that avoids doing the memset on restore, because it is actually
futile in all cases; and another patch that tries to set the
vga.vram_ptr again in case it is still NULL.

Re: [Qemu-devel] [RFC 0/4] virtio-mmio transport

2011-12-12 Thread Michael S. Tsirkin

On Mon, Dec 12, 2011 at 12:28:27PM +, Pawel Moll wrote:
 On Mon, 2011-12-12 at 12:14 +, Stefan Hajnoczi wrote:
  I noticed the virtio-mmio spec has an interrupt status register.  On
  x86 and virtio-pci things are moving towards Message Signalled
  Interrupts and virtqueues having their own interrupts for better
  performance and flexibility.  Any thoughts on how 1 interrupt per
  virtqueue works for virtio-mmio?
 
 This could be done by either creating devices with more then one
 interrupt (platform device can take any number of resources) and
 declaring that first queue uses the first one etc.

We currently support mapping from virtqueues to interrupt
vectors in virtio core. Only virtio pci uses that
but mmio can too. It's better than fixed mapping
IMO as driver can control resources per queue.

  My feeling is that the interrupt details are board-specific and can't
  be described in virtio-mmio, 
 
 It's just the the design pattern in the embedded world that devices
 usually have one interrupt output, shared between its internal
 functions. And - of course - there is no in-band signalling (like MSI)
 possible - interrupt lines are just wires :-) In a boundary case
 scenario we may face a situation when total amount of interrupts for all
 queues may actually exceed amount of interrupt inputs available in the
 interrupt controller...
 
 There may be a half-way solution - one interrupt per device but the
 active queue number notified via the interrupt status register (as a
 FIFO) so the driver wouldn't have to enumerate all the queues.

We could use a queue for this certainly.
Why do you have so many queues?

  but it still jumped out at me that it
  looks like you're only thinking of 1 interrupt for the device.
 
 Yes, current version assumes tgat. I'll keep this in mind when planning
 changes for v2 (next year ;-), thanks for letting me know!
 
 Cheers!
 
 Paweł

Re: [Qemu-devel] [PATCH V2 5/5] vga-cirrus: Workaround during restore when using Xen.

2011-12-12 Thread Stefano Stabellini

On Sat, 10 Dec 2011, Jan Kiszka wrote:
 On 2011-12-09 22:54, Anthony PERARD wrote:
  During the initialisation of the machine at restore time, the access to the
  VRAM will fail because QEMU does not know yet the right guest address to 
  map,
  so the vram_ptr is NULL.
  
  So this patch avoid using a NULL pointer during initialisation, and try to 
  get
  another vram_ptr if the call failed the first time.
  
  Signed-off-by: Anthony PERARD anthony.per...@citrix.com
  ---
   hw/cirrus_vga.c |   16 +---
   1 files changed, 13 insertions(+), 3 deletions(-)
  
  diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
  index c7e365b..2e049c9 100644
  --- a/hw/cirrus_vga.c
  +++ b/hw/cirrus_vga.c
  @@ -32,6 +32,7 @@
   #include console.h
   #include vga_int.h
   #include loader.h
  +#include sysemu.h
   
   /*
* TODO:
  @@ -2696,6 +2697,13 @@ static int cirrus_post_load(void *opaque, int 
  version_id)
   s-vga.gr[0x01] = s-cirrus_shadow_gr1  0x0f;
   
   cirrus_update_memory_access(s);
  +if (!s-vga.vram_ptr) {
  +/* At this point vga.vram_ptr can be invalid on Xen because we 
  need to
  + * know the position of the videoram in the guest physical address 
  space in
  + * order to be able to map it. After cirrus_update_memory_access 
  we do know
  + * where the videoram is, so let's map it now. */
  +s-vga.vram_ptr = memory_region_get_ram_ptr(s-vga.vram);
  +}
   /* force refresh */
   s-vga.graphic_mode = -1;
   cirrus_update_bank_ptr(s, 0);
  @@ -2784,9 +2792,11 @@ static void cirrus_reset(void *opaque)
   }
   s-vga.cr[0x27] = s-device_id;
   
  -/* Win2K seems to assume that the pattern buffer is at 0xff
  -   initially ! */
  -memset(s-vga.vram_ptr, 0xff, s-real_vram_size);
  +if (!runstate_check(RUN_STATE_PREMIGRATE)) {
  +/* Win2K seems to assume that the pattern buffer is at 0xff
  +   initially ! */
  +memset(s-vga.vram_ptr, 0xff, s-real_vram_size);
  +}
   
   s-cirrus_hidden_dac_lockindex = 5;
   s-cirrus_hidden_dac_data = 0;
 
 Is there really no way to fix this properly in the Xen layer?

We thought about this issue for some time but we couldn't come up with
anything better.
To summarize the problem:

- on restore the videoram has already been loaded in the guest physical
  address space by Xen;

- we don't know exactly where it is yet, because it has been loaded at
  the *last* address it was mapped to (see map_linear_vram_bank that
  moves the videoram);

- we want to avoid allocating the videoram twice (actually the second
  allocation would fail because of memory constraints);



So the solution (I acknowledge that it looks more like an hack than a
solution) is:

- wait for cirrus to load its state so that we know where the videoram
  is;

- map the videoram into qemu's address space at that point.



Anything else we came up with was even worse, for example:

- independently save the position of the videoram and then when
  vga_common_init tries to allocate it for a second time give back the
  old videoram instead;

- move back the videoram to the original position and then move it again
  to the new position.



 This looks
 highly fragile as specifically the second hunk appears unrelated to Xen.

I think that the second chuck should be in a separate patch because it
is unrelated to Xen. On restore it is useless to memset the videoram, so
for performance reasons it would be a good idea to avoid it on all
platforms. Also it happens to crash Qemu on Xen but that is another
story  ;-)

I think we should also add a comment:

useles to memset the videoram on restore, the old videoram is going to
be copied over soon anyway


 Also, is this the only device affected by the shortcoming? What about
 other VGA adapters e.g.?

To my knowledge (in the PC emulator) it is the only device that
allocates memory using memory_region_init_ram/memory_region_get_ram_ptr
and then moves it around. But maybe QXL does it as well?

Re: [Qemu-devel] [RFC 0/4] virtio-mmio transport

2011-12-12 Thread Pawel Moll

On Mon, 2011-12-12 at 13:12 +, Michael S. Tsirkin wrote:
 On Mon, Dec 12, 2011 at 12:28:27PM +, Pawel Moll wrote:
  On Mon, 2011-12-12 at 12:14 +, Stefan Hajnoczi wrote:
   I noticed the virtio-mmio spec has an interrupt status register.  On
   x86 and virtio-pci things are moving towards Message Signalled
   Interrupts and virtqueues having their own interrupts for better
   performance and flexibility.  Any thoughts on how 1 interrupt per
   virtqueue works for virtio-mmio?
  
  This could be done by either creating devices with more then one
  interrupt (platform device can take any number of resources) and
  declaring that first queue uses the first one etc.
 
 We currently support mapping from virtqueues to interrupt
 vectors in virtio core. Only virtio pci uses that
 but mmio can too. It's better than fixed mapping
 IMO as driver can control resources per queue.

I'll keep that in mind.

   My feeling is that the interrupt details are board-specific and can't
   be described in virtio-mmio, 
  
  It's just the the design pattern in the embedded world that devices
  usually have one interrupt output, shared between its internal
  functions. And - of course - there is no in-band signalling (like MSI)
  possible - interrupt lines are just wires :-) In a boundary case
  scenario we may face a situation when total amount of interrupts for all
  queues may actually exceed amount of interrupt inputs available in the
  interrupt controller...
  
  There may be a half-way solution - one interrupt per device but the
  active queue number notified via the interrupt status register (as a
  FIFO) so the driver wouldn't have to enumerate all the queues.
 
 We could use a queue for this certainly.

Hm, yes, I suppose so :-) This would be a system-level queue rather
than the normal one, but I guess we could do that.

 Why do you have so many queues?

I assume this a question about the example I gave above? The answer is:
I obviously don't :-) This was just to point out that there _may_ be a
problem if we wanted to allocate an interrupt per queue.

Cheers!

Paweł

Re: [Qemu-devel] [PATCH V2 5/5] vga-cirrus: Workaround during restore when using Xen.

2011-12-12 Thread Jan Kiszka

On 2011-12-12 14:18, Stefano Stabellini wrote:
 On Sat, 10 Dec 2011, Jan Kiszka wrote:
 On 2011-12-09 22:54, Anthony PERARD wrote:
 During the initialisation of the machine at restore time, the access to the
 VRAM will fail because QEMU does not know yet the right guest address to 
 map,
 so the vram_ptr is NULL.

 So this patch avoid using a NULL pointer during initialisation, and try to 
 get
 another vram_ptr if the call failed the first time.

 Signed-off-by: Anthony PERARD anthony.per...@citrix.com
 ---
  hw/cirrus_vga.c |   16 +---
  1 files changed, 13 insertions(+), 3 deletions(-)

 diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
 index c7e365b..2e049c9 100644
 --- a/hw/cirrus_vga.c
 +++ b/hw/cirrus_vga.c
 @@ -32,6 +32,7 @@
  #include console.h
  #include vga_int.h
  #include loader.h
 +#include sysemu.h
  
  /*
   * TODO:
 @@ -2696,6 +2697,13 @@ static int cirrus_post_load(void *opaque, int 
 version_id)
  s-vga.gr[0x01] = s-cirrus_shadow_gr1  0x0f;
  
  cirrus_update_memory_access(s);
 +if (!s-vga.vram_ptr) {
 +/* At this point vga.vram_ptr can be invalid on Xen because we 
 need to
 + * know the position of the videoram in the guest physical address 
 space in
 + * order to be able to map it. After cirrus_update_memory_access 
 we do know
 + * where the videoram is, so let's map it now. */
 +s-vga.vram_ptr = memory_region_get_ram_ptr(s-vga.vram);
 +}
  /* force refresh */
  s-vga.graphic_mode = -1;
  cirrus_update_bank_ptr(s, 0);
 @@ -2784,9 +2792,11 @@ static void cirrus_reset(void *opaque)
  }
  s-vga.cr[0x27] = s-device_id;
  
 -/* Win2K seems to assume that the pattern buffer is at 0xff
 -   initially ! */
 -memset(s-vga.vram_ptr, 0xff, s-real_vram_size);
 +if (!runstate_check(RUN_STATE_PREMIGRATE)) {
 +/* Win2K seems to assume that the pattern buffer is at 0xff
 +   initially ! */
 +memset(s-vga.vram_ptr, 0xff, s-real_vram_size);
 +}
  
  s-cirrus_hidden_dac_lockindex = 5;
  s-cirrus_hidden_dac_data = 0;

 Is there really no way to fix this properly in the Xen layer?
 
 We thought about this issue for some time but we couldn't come up with
 anything better.
 To summarize the problem:
 
 - on restore the videoram has already been loaded in the guest physical
   address space by Xen;
 
 - we don't know exactly where it is yet, because it has been loaded at
   the *last* address it was mapped to (see map_linear_vram_bank that
   moves the videoram);
 
 - we want to avoid allocating the videoram twice (actually the second
   allocation would fail because of memory constraints);
 
 
 
 So the solution (I acknowledge that it looks more like an hack than a
 solution) is:
 
 - wait for cirrus to load its state so that we know where the videoram
   is;

Why can't you store this information in some additional Xen-specific
vmstate? The fact that memory_region_get_ram_ptr has to return NULL
looks bogus to me. It's against the QEMU semantics. Other devices may
only be fine as they are not (yet) used with Xen.

 
 - map the videoram into qemu's address space at that point.
 
 
 
 Anything else we came up with was even worse, for example:
 
 - independently save the position of the videoram and then when
   vga_common_init tries to allocate it for a second time give back the
   old videoram instead;
 
 - move back the videoram to the original position and then move it again
   to the new position.
 
 
 
 This looks
 highly fragile as specifically the second hunk appears unrelated to Xen.
 
 I think that the second chuck should be in a separate patch because it
 is unrelated to Xen. On restore it is useless to memset the videoram, so
 for performance reasons it would be a good idea to avoid it on all
 platforms. Also it happens to crash Qemu on Xen but that is another
 story  ;-)
 
 I think we should also add a comment:
 
 useles to memset the videoram on restore, the old videoram is going to
 be copied over soon anyway

That's in fact a different story and maybe worth optimizing due to the
size of that buffer. But please do not call the state PREMIGRATE. It's
rather INCOMING[_MIGRATION].

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH V2 5/5] vga-cirrus: Workaround during restore when using Xen.

2011-12-12 Thread Stefano Stabellini

On Mon, 12 Dec 2011, Jan Kiszka wrote:
 On 2011-12-12 14:18, Stefano Stabellini wrote:
  On Sat, 10 Dec 2011, Jan Kiszka wrote:
  On 2011-12-09 22:54, Anthony PERARD wrote:
  During the initialisation of the machine at restore time, the access to 
  the
  VRAM will fail because QEMU does not know yet the right guest address to 
  map,
  so the vram_ptr is NULL.
 
  So this patch avoid using a NULL pointer during initialisation, and try 
  to get
  another vram_ptr if the call failed the first time.
 
  Signed-off-by: Anthony PERARD anthony.per...@citrix.com
  ---
   hw/cirrus_vga.c |   16 +---
   1 files changed, 13 insertions(+), 3 deletions(-)
 
  diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
  index c7e365b..2e049c9 100644
  --- a/hw/cirrus_vga.c
  +++ b/hw/cirrus_vga.c
  @@ -32,6 +32,7 @@
   #include console.h
   #include vga_int.h
   #include loader.h
  +#include sysemu.h
   
   /*
* TODO:
  @@ -2696,6 +2697,13 @@ static int cirrus_post_load(void *opaque, int 
  version_id)
   s-vga.gr[0x01] = s-cirrus_shadow_gr1  0x0f;
   
   cirrus_update_memory_access(s);
  +if (!s-vga.vram_ptr) {
  +/* At this point vga.vram_ptr can be invalid on Xen because we 
  need to
  + * know the position of the videoram in the guest physical 
  address space in
  + * order to be able to map it. After cirrus_update_memory_access 
  we do know
  + * where the videoram is, so let's map it now. */
  +s-vga.vram_ptr = memory_region_get_ram_ptr(s-vga.vram);
  +}
   /* force refresh */
   s-vga.graphic_mode = -1;
   cirrus_update_bank_ptr(s, 0);
  @@ -2784,9 +2792,11 @@ static void cirrus_reset(void *opaque)
   }
   s-vga.cr[0x27] = s-device_id;
   
  -/* Win2K seems to assume that the pattern buffer is at 0xff
  -   initially ! */
  -memset(s-vga.vram_ptr, 0xff, s-real_vram_size);
  +if (!runstate_check(RUN_STATE_PREMIGRATE)) {
  +/* Win2K seems to assume that the pattern buffer is at 0xff
  +   initially ! */
  +memset(s-vga.vram_ptr, 0xff, s-real_vram_size);
  +}
   
   s-cirrus_hidden_dac_lockindex = 5;
   s-cirrus_hidden_dac_data = 0;
 
  Is there really no way to fix this properly in the Xen layer?
  
  We thought about this issue for some time but we couldn't come up with
  anything better.
  To summarize the problem:
  
  - on restore the videoram has already been loaded in the guest physical
address space by Xen;
  
  - we don't know exactly where it is yet, because it has been loaded at
the *last* address it was mapped to (see map_linear_vram_bank that
moves the videoram);
  
  - we want to avoid allocating the videoram twice (actually the second
allocation would fail because of memory constraints);
  
  
  
  So the solution (I acknowledge that it looks more like an hack than a
  solution) is:
  
  - wait for cirrus to load its state so that we know where the videoram
is;
 
 Why can't you store this information in some additional Xen-specific
 vmstate? The fact that memory_region_get_ram_ptr has to return NULL
 looks bogus to me. It's against the QEMU semantics. Other devices may
 only be fine as they are not (yet) used with Xen.

Unfortunately that cannot work because the allocation is done by
vga_common_init before any state has been loaded.
So even if we saved the physmap QLIST in a vmstate record, it wouldn't
be available yet when vga_common_init tries to allocate the memory.

Re: [Qemu-devel] [RFC 0/4] virtio-mmio transport

2011-12-12 Thread Paul Brook

 I can do that, but not this year (on holiday from Friday 16th, without
 any access to Internet whatsoever :-) One think to be decided is in what
 order the halfs should be filled? Low first, then high? High then low?
 Does it matter at all? :-)

My inital though was that you shouldn't be changing this value when the ring 
is enabled.  Unfortunately you disable the ring by setting the address to zero 
so that argument doesn't work :-/

I suggest that the device to buffer writes to the high part, and construct the 
actual 64-bit value when the low part is written.  That allows 32-bit guests 
can ignore the high part entirely.  Requiring the guest always write high then 
low also works, though I don't see any benefit to either guest or host.

Acting on writes as soon as they occur is not a viable option as it causes the 
device to be enabled before the full address has bee written.  You could say 
the address takes effect after both halves have been written, writes must come 
in pairs, but may occur in either order.  However there is a risk that host 
and guest will somehow get out of sync on which values pair together, so IMO 
this is a bad idea.


If you can't stomach the above then I guess changing the condition for ring 
enablement to QueueNum != 0 and rearanging the configuration sequence 
appropriately would also do the trick.  Having this be different between PCI 
and mmio is probably not worth the confusion though.

Paul

Re: [Qemu-devel] [RFC 0/4] virtio-mmio transport

2011-12-12 Thread Pawel Moll

On Mon, 2011-12-12 at 14:45 +, Paul Brook wrote:
 I suggest that the device to buffer writes to the high part, and construct 
 the 
 actual 64-bit value when the low part is written.  That allows 32-bit guests 
 can ignore the high part entirely.

This sounds good to me. If we define the reset value of both registers
as 0 (which is consistent with 0 = stop condition) a 32-bit guest will
be able to behave as you are suggesting.

Cool, I will keep this in mind.

Thanks!

Paweł

Re: [Qemu-devel] [RFC 0/6] target-i386: TCGv cleanups

2011-12-12 Thread Paul Brook

 Am 12.12.2011 00:42, schrieb Paul Brook:
  This series makes target-i386 compile with DEBUG_TCGV_TL.
  
  What benefit does this provide?
 
 It showcases what changes would need to be done to allow type-safe
 compilation of the first pair of --enable-system targets.

How is the existing code not type safe?  What benefit does making TCGv a 
separate type rather than an alias for either _i32 or _i64 give?

Paul

Re: [Qemu-devel] [PATCH V2 5/5] vga-cirrus: Workaround during restore when using Xen.

2011-12-12 Thread Jan Kiszka

On 2011-12-12 15:41, Stefano Stabellini wrote:
 On Mon, 12 Dec 2011, Jan Kiszka wrote:
 On 2011-12-12 14:18, Stefano Stabellini wrote:
 On Sat, 10 Dec 2011, Jan Kiszka wrote:
 On 2011-12-09 22:54, Anthony PERARD wrote:
 During the initialisation of the machine at restore time, the access to 
 the
 VRAM will fail because QEMU does not know yet the right guest address to 
 map,
 so the vram_ptr is NULL.

 So this patch avoid using a NULL pointer during initialisation, and try 
 to get
 another vram_ptr if the call failed the first time.

 Signed-off-by: Anthony PERARD anthony.per...@citrix.com
 ---
  hw/cirrus_vga.c |   16 +---
  1 files changed, 13 insertions(+), 3 deletions(-)

 diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
 index c7e365b..2e049c9 100644
 --- a/hw/cirrus_vga.c
 +++ b/hw/cirrus_vga.c
 @@ -32,6 +32,7 @@
  #include console.h
  #include vga_int.h
  #include loader.h
 +#include sysemu.h
  
  /*
   * TODO:
 @@ -2696,6 +2697,13 @@ static int cirrus_post_load(void *opaque, int 
 version_id)
  s-vga.gr[0x01] = s-cirrus_shadow_gr1  0x0f;
  
  cirrus_update_memory_access(s);
 +if (!s-vga.vram_ptr) {
 +/* At this point vga.vram_ptr can be invalid on Xen because we 
 need to
 + * know the position of the videoram in the guest physical 
 address space in
 + * order to be able to map it. After cirrus_update_memory_access 
 we do know
 + * where the videoram is, so let's map it now. */
 +s-vga.vram_ptr = memory_region_get_ram_ptr(s-vga.vram);
 +}
  /* force refresh */
  s-vga.graphic_mode = -1;
  cirrus_update_bank_ptr(s, 0);
 @@ -2784,9 +2792,11 @@ static void cirrus_reset(void *opaque)
  }
  s-vga.cr[0x27] = s-device_id;
  
 -/* Win2K seems to assume that the pattern buffer is at 0xff
 -   initially ! */
 -memset(s-vga.vram_ptr, 0xff, s-real_vram_size);
 +if (!runstate_check(RUN_STATE_PREMIGRATE)) {
 +/* Win2K seems to assume that the pattern buffer is at 0xff
 +   initially ! */
 +memset(s-vga.vram_ptr, 0xff, s-real_vram_size);
 +}
  
  s-cirrus_hidden_dac_lockindex = 5;
  s-cirrus_hidden_dac_data = 0;

 Is there really no way to fix this properly in the Xen layer?

 We thought about this issue for some time but we couldn't come up with
 anything better.
 To summarize the problem:

 - on restore the videoram has already been loaded in the guest physical
   address space by Xen;

 - we don't know exactly where it is yet, because it has been loaded at
   the *last* address it was mapped to (see map_linear_vram_bank that
   moves the videoram);

 - we want to avoid allocating the videoram twice (actually the second
   allocation would fail because of memory constraints);



 So the solution (I acknowledge that it looks more like an hack than a
 solution) is:

 - wait for cirrus to load its state so that we know where the videoram
   is;

 Why can't you store this information in some additional Xen-specific
 vmstate? The fact that memory_region_get_ram_ptr has to return NULL
 looks bogus to me. It's against the QEMU semantics. Other devices may
 only be fine as they are not (yet) used with Xen.
 
 Unfortunately that cannot work because the allocation is done by
 vga_common_init before any state has been loaded.
 So even if we saved the physmap QLIST in a vmstate record, it wouldn't
 be available yet when vga_common_init tries to allocate the memory.

If you run two VMs with identical setup, one from cold start to
operational mode, the other into an incoming migration, the guest
physical memory layout the host sees is different? Hmm, maybe if you
reorder devices in the command line.

Really, I think this is something inherently incompatible with the
current memory API. If Xen has this unfixable special requirement
(it's rather a design issue IMHO), adjust the API and adapt all devices.
Hot-fixing only a single one this way is no good idea long term.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] New Guess OS Creation Problem

2011-12-12 Thread Stefan Hajnoczi

On Mon, Dec 12, 2011 at 2:50 PM, takizo paul...@takizo.com wrote:

 On Dec 12, 2011, at 6:56 PM, Stefan Hajnoczi wrote:

 On Sun, Dec 11, 2011 at 4:40 PM, takizo paul...@takizo.com wrote:
 LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none 
 /usr/libexec/qemu-kvm -S -M rhel6.1.0 -enable-kvm -m 4096 -smp 
 1,sockets=1,cores=1,threads=1 -name database -uuid 
 f3e9f320-7826-7e50-94bb-1833f7fd9dfb -nodefconfig -nodefaults -chardev 
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/database.monitor,server,nowait
  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-reboot 
 -drive 
 file=/opt/cibai/database,if=none,id=drive-ide0-0-0,format=raw,cache=none,aio=threads
  -device 
 ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=2 
 -drive 
 file=/opt/ISO-Download/FreeBSD-8.2-RELEASE-amd64-disc1.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,aio=threads
  -device 
 ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 
 -netdev tap,fd=22,id=hostnet0 -device 
 e1000,netdev=hostnet0,id=net0,mac=52:54:00:77:a5:a6,bus=pci.0,addr=0x3 
 -chardev pty,id=charserial0 -device 
 isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:2,password 
 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
 char device redirected to /dev/pts/6
 Using CPU model cpu64-rhel6

 block I/O error in device 'drive-ide0-0-0': Invalid argument (22)

 Is /opt/cibai/database on a special filesystem or exotic storage setup?

 Please check dmesg on the host for any kernel messages regarding I/O errors.

 Please try reading the file on the host to check whether the I/O error
 is happening on the host and not related to KVM:

 $ dd if=/opt/cibai/database of=/dev/null iflag=direct

 Stefan

 Hi Stefan,

 When I quit the installation, further see the error messages

 ide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still 
 pendingide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING 
 still pendingide_dma_cancel: aiocb still pendingide_dma_cancel: 
 BM_STATUS_DMAING still pendingide_dma_cancel: aiocb still 
 pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: aiocb 
 still pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: 
 aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pending2011-12-12 
 22:47:51.427: shutting down


Okay, so if I understand correctly you have CentOS 6.1 host running
FreeBSD and CentOS guest installs to emulated IDE drives.  You're
getting failed I/O requests with EINVAL (22).  You have verified that
dd if=/opt/vm/database.img of=/dev/null iflag=direct reads the image
file without errors.

Can you please try the CentOS guest install with a virtio-blk drive
instead of IDE?  This should show whether this problem is related to
IDE or a general issue in how QEMU accesses the host file.

Stefan

Re: [Qemu-devel] [RFC 0/4] virtio-mmio transport

2011-12-12 Thread Stefan Hajnoczi

On Mon, Dec 12, 2011 at 12:28 PM, Pawel Moll pawel.m...@arm.com wrote:
 On Mon, 2011-12-12 at 12:14 +, Stefan Hajnoczi wrote:
 I noticed the virtio-mmio spec has an interrupt status register.  On
 x86 and virtio-pci things are moving towards Message Signalled
 Interrupts and virtqueues having their own interrupts for better
 performance and flexibility.  Any thoughts on how 1 interrupt per
 virtqueue works for virtio-mmio?

 This could be done by either creating devices with more then one
 interrupt (platform device can take any number of resources) and
 declaring that first queue uses the first one etc.

 My feeling is that the interrupt details are board-specific and can't
 be described in virtio-mmio,

 It's just the the design pattern in the embedded world that devices
 usually have one interrupt output, shared between its internal
 functions. And - of course - there is no in-band signalling (like MSI)
 possible - interrupt lines are just wires :-) In a boundary case
 scenario we may face a situation when total amount of interrupts for all
 queues may actually exceed amount of interrupt inputs available in the
 interrupt controller...

 There may be a half-way solution - one interrupt per device but the
 active queue number notified via the interrupt status register (as a
 FIFO) so the driver wouldn't have to enumerate all the queues.

If there aren't already then pretty soon ARM-based systems will deal
with PCIe and Message Signalled Interrupts.  Are you sure new ARM
boards are constraints to a small number of physical interrupts?

Stefan

Re: [Qemu-devel] [RFC 0/4] virtio-mmio transport

2011-12-12 Thread Peter Maydell

On 12 December 2011 15:11, Stefan Hajnoczi stefa...@gmail.com wrote:
 If there aren't already then pretty soon ARM-based systems will deal
 with PCIe and Message Signalled Interrupts.  Are you sure new ARM
 boards are constraints to a small number of physical interrupts?

Depends what you mean by small number. The GIC has up to 224
interrupt lines...

-- PMM

Re: [Qemu-devel] New Guess OS Creation Problem

2011-12-12 Thread takizo


On Dec 12, 2011, at 11:08 PM, Stefan Hajnoczi wrote:

 On Mon, Dec 12, 2011 at 2:50 PM, takizo paul...@takizo.com wrote:
 
 On Dec 12, 2011, at 6:56 PM, Stefan Hajnoczi wrote:
 
 On Sun, Dec 11, 2011 at 4:40 PM, takizo paul...@takizo.com wrote:
 LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none 
 /usr/libexec/qemu-kvm -S -M rhel6.1.0 -enable-kvm -m 4096 -smp 
 1,sockets=1,cores=1,threads=1 -name database -uuid 
 f3e9f320-7826-7e50-94bb-1833f7fd9dfb -nodefconfig -nodefaults -chardev 
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/database.monitor,server,nowait
  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-reboot 
 -drive 
 file=/opt/cibai/database,if=none,id=drive-ide0-0-0,format=raw,cache=none,aio=threads
  -device 
 ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=2 
 -drive 
 file=/opt/ISO-Download/FreeBSD-8.2-RELEASE-amd64-disc1.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,aio=threads
  -device 
 ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 
 -netdev tap,fd=22,id=hostnet0 -device 
 e1000,netdev=hostnet0,id=net0,mac=52:54:00:77:a5:a6,bus=pci.0,addr=0x3 
 -chardev pty,id=charserial0 -device 
 isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:2,password 
 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
 char device redirected to /dev/pts/6
 Using CPU model cpu64-rhel6
 
 block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
 
 Is /opt/cibai/database on a special filesystem or exotic storage setup?
 
 Please check dmesg on the host for any kernel messages regarding I/O errors.
 
 Please try reading the file on the host to check whether the I/O error
 is happening on the host and not related to KVM:
 
 $ dd if=/opt/cibai/database of=/dev/null iflag=direct
 
 Stefan
 
 Hi Stefan,
 
 When I quit the installation, further see the error messages
 
 ide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still 
 pendingide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING 
 still pendingide_dma_cancel: aiocb still pendingide_dma_cancel: 
 BM_STATUS_DMAING still pendingide_dma_cancel: aiocb still 
 pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: aiocb 
 still pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: 
 aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pending2011-12-12 
 22:47:51.427: shutting down
 

Hi Stefan 

 Okay, so if I understand correctly you have CentOS 6.1 host running
 FreeBSD and CentOS guest installs to emulated IDE drives.  You're
 getting failed I/O requests with EINVAL (22).  You have verified that
 dd if=/opt/vm/database.img of=/dev/null iflag=direct reads the image
 file without errors.

That is right, both also emulated on IDE drive

 
 Can you please try the CentOS guest install with a virtio-blk drive
 instead of IDE?  This should show whether this problem is related to
 IDE or a general issue in how QEMU accesses the host file.

Took your advice installed Centos on VirtIO drive and it's ok. 

Does it means it's IDE problem on the OS itself? 

2011-12-12 23:15:06.759: starting up
LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none 
/usr/libexec/qemu-kvm -S -M rhel6.1.0 -enable-kvm -m 1024 -smp 
1,sockets=1,cores=1,threads=1 -name centos -uuid 
0752c74e-1d51-d529-f66f-8df9a28670ed -nodefconfig -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/centos.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-reboot 
-drive 
file=/opt/vm/centos.img,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=threads
 -device 
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
 -drive 
file=/opt/ISO-Download/CentOS-6.0-x86_64-minimal.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,aio=threads
 -device 
ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev 
tap,fd=22,id=hostnet0,vhost=on,vhostfd=26 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:10:bc:5d,bus=pci.0,addr=0x3 
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
-usb -device usb-tablet,id=input0 -vnc 127.0.0.1:3,password -vga cirrus -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
char device redirected to /dev/pts/9
Using CPU model cpu64-rhel6


 
 Stefan

Re: [Qemu-devel] [RFC 0/4] virtio-mmio transport

2011-12-12 Thread Pawel Moll

On Mon, 2011-12-12 at 15:11 +, Stefan Hajnoczi wrote:
 If there aren't already then pretty soon ARM-based systems will deal
 with PCIe and Message Signalled Interrupts. 

Actually PCI is not an alien in ARM world - we had platforms with PCI
long time ago. And new SOCs aimed at servers come with PCIe indeed. It's
just that bulk of available systems were designed for mobile phones
etc., and I think you agree that in such case not wasting silicon on
huge PCI root complex makes at least some sense...

And generally, when you have PCI-ed platform, you just use
virtio-pci :-)

 Are you sure new ARM
 boards are constraints to a small number of physical interrupts?

It's about SOC rather then a board. If I remember correctly Cortex-A9's
GIC (interrupt controller) can be synthesized with up to 224 interrupt
inputs. And I've already seen SOCs with more interrupt sources in the
past...

Having said all this, I don't think it is really a problem now. At least
today virtio-mmio will be used in simple models rather that in huge
KVM/Xen/whatever virtualised systems with hundreds of PV devices.

Cheers!

Paweł

Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC

2011-12-12 Thread Stefan Hajnoczi

On Mon, Dec 12, 2011 at 12:39 PM, Gerd Hoffmann kra...@redhat.com wrote:
 On 12/12/11 13:10, Stefan Hajnoczi wrote:
 On Mon, Dec 12, 2011 at 11:29 AM, Gerd Hoffmann kra...@redhat.com wrote:
 On 12/12/11 11:18, Stefan Hajnoczi wrote:
 On Sun, Dec 11, 2011 at 10:29 AM, Alon Levy al...@redhat.com wrote:
 On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote:
 Hi there,

 I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out 
 that
 the command client_migrate_info uses it. That's a legacy interface and 
 has to
 be dropped, no command should be using it...

 Something tells me that if I just drop it (and change the command to use 
 the
 regular interface), bad things will happen. Am I right? :)


 The monitor command client_migrate_info needs to complete after getting
 an ACK message from the currently connected spice client (this is the
 only case where this is required - if there is no client then the
 MONITOR_CMD_ASYNC API won't be used). This in turn requires the main
 thread to perform select and call the callback that will process this
 ACK. That's why the MONITOR_CMD_ASYNC API was used.

 I'm not aware of any other way to do this, I'll be glad for any help
 here. Also, I understand this is not what is not true async, since one
 would expect a true async interface to support multiple in flight
 monitor commands. If there is any ETA or existing way to do this we
 could change the implementation of client_migrate_info.

 Is it possible to use a QMP event to signal completion instead of a
 MONITOR_CMD_ASYNC command?

 Problem is this breaks the qemu - libvirt interface.

 I had the same issue with the block_job_cancel command, which Adam
 Litke and Eric Blake helped us fix and find a backward-compatible
 libvirt solution for:

 http://www.redhat.com/archives/libvir-list/2011-November/msg01351.html

 Isn't going to fly as waiting for completion isn't optional in that
 case.  Workflow is this:

 (1) libvirt issues client_migrate_info command.
 (2) qemu forwards it to spice-server, which in turn forwards it to
    the spice client (if connected).
 (3) spice client connects to the migration target machine.
 (4) spice client signals completion to spice-server, which in turn
    notifies qemu.
 (5) qemu calls the monitor completion callback, libvirt gets
    client_migrate_info result.
 (6) libvirt issues migrate command.

 The problem is that (3) must be finished before (6) because qemu on the
 target side doesn't accept incoming tcp connections any more once the
 migration started.

 I don't see a way to switch this to qmp events without breaking old
 libvirt versions managing new qemu.

I don't see a solution in this case either.  Looks like libvirt
supports this command since 0.9.2 so it's not a good idea to just yank
it.

It might be possible for the QEMU client_migrate_info handler to run a
nested event loop in the legacy libvirt case.  This would suck since
the VM is unresponsive while waiting for spice migration to complete.
New libvirt would call the async version of the command which is
well-behaved and uses a QMP event to signal completion.

Stefan

Re: [Qemu-devel] [PATCH V4 04/13] hw/9pfs: File system helper process for qemu 9p proxy FS

2011-12-12 Thread Aneesh Kumar K.V

On Mon, 12 Dec 2011 12:08:33 +, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Fri, Dec 09, 2011 at 10:12:17PM +0530, M. Mohan Kumar wrote:
  On Friday, December 09, 2011 12:01:14 AM Stefan Hajnoczi wrote:
   On Mon, Dec 05, 2011 at 09:48:41PM +0530, M. Mohan Kumar wrote:
+static int read_request(int sockfd, struct iovec *iovec, ProxyHeader
*header) +{
+int retval;
+
+/*
+ * read the request header.
+ */
+iovec-iov_len = 0;
+retval = socket_read(sockfd, iovec-iov_base, PROXY_HDR_SZ);
+if (retval  0) {
+return retval;
+}
+iovec-iov_len = PROXY_HDR_SZ;
+retval = proxy_unmarshal(iovec, 0, dd, header-type,
header-size); +if (retval  0) {
+return retval;
+}
+/*
+ * We can't process message.size  PROXY_MAX_IO_SZ, read the
complete + * message from the socket and ignore it. This ensures
that + * we can correctly handle the next request. We also return + 
   * ENOBUFS as error to indicate we ran out of buffer space. + */
+if (header-size  PROXY_MAX_IO_SZ) {
+int count, size;
+size = header-size;
+while (size  0) {
+count = MIN(PROXY_MAX_IO_SZ, size);
+count = socket_read(sockfd, iovec-iov_base + PROXY_HDR_SZ,
count); +if (count  0) {
+return count;
+}
+size -= count;
+}
   
   I'm not sure recovery attempts are worthwhile here.  The client is
   buggy, perhaps just refuse further work.
  
  But whats the issue in trying to recover in this case?
 
 This recovery procedure is not robust because it does not always work.
 In fact it only works in the case where the header-size field was
 out-of-range but accurate.  That's not a likely case since the QEMU-side
 code that you are writing should handle this.
 
 If the nature of the invalid request is different, either a broken or
 malicious client which does not send a valid header-size then we're
 stuck in this special-case recovery trying to gobble bytes and we never
 log an error.
 
 A real recovery would be something like disconnecting and
 re-establishing the connection between QEMU and the helper.  This would
 allow us to get back to a clean state in all cases.
 

Since we are not having any state in the proxy helper, returning ENOBUFS
should be similar to the above right ? One of the reason to try to
recover as much as possible, is to make sure the guest can umount the
file system properly. That is if we hit these error condition due to a
bug in proxy FS driver is qemu, we want to make sure we return some
valid error, which will atleast enable the guest/client to do an umount.

-aneesh

Re: [Qemu-devel] [PATCH V2 5/5] vga-cirrus: Workaround during restore when using Xen.

2011-12-12 Thread Stefano Stabellini

On Mon, 12 Dec 2011, Jan Kiszka wrote:
  Is there really no way to fix this properly in the Xen layer?
 
  We thought about this issue for some time but we couldn't come up with
  anything better.
  To summarize the problem:
 
  - on restore the videoram has already been loaded in the guest physical
address space by Xen;
 
  - we don't know exactly where it is yet, because it has been loaded at
the *last* address it was mapped to (see map_linear_vram_bank that
moves the videoram);
 
  - we want to avoid allocating the videoram twice (actually the second
allocation would fail because of memory constraints);
 
 
 
  So the solution (I acknowledge that it looks more like an hack than a
  solution) is:
 
  - wait for cirrus to load its state so that we know where the videoram
is;
 
  Why can't you store this information in some additional Xen-specific
  vmstate? The fact that memory_region_get_ram_ptr has to return NULL
  looks bogus to me. It's against the QEMU semantics. Other devices may
  only be fine as they are not (yet) used with Xen.
  
  Unfortunately that cannot work because the allocation is done by
  vga_common_init before any state has been loaded.
  So even if we saved the physmap QLIST in a vmstate record, it wouldn't
  be available yet when vga_common_init tries to allocate the memory.
 
 If you run two VMs with identical setup, one from cold start to
 operational mode, the other into an incoming migration, the guest
 physical memory layout the host sees is different? Hmm, maybe if you
 reorder devices in the command line.

Yes, it is different because the memory allocated for a specific device
(Cirrus) has been moved (map_linear_vram_bank).


 Really, I think this is something inherently incompatible with the
 current memory API. If Xen has this unfixable special requirement
 (it's rather a design issue IMHO), adjust the API and adapt all devices.
 Hot-fixing only a single one this way is no good idea long term.

Fair enough.
What about introducing a type of savevm state that is going to be
restored before machine-init?
This way we could save and restore our physmap and we could handle
memory maps and allocations transparently.

Re: [Qemu-devel] New Guess OS Creation Problem

2011-12-12 Thread Stefan Hajnoczi

On Mon, Dec 12, 2011 at 3:19 PM, takizo paul...@takizo.com wrote:

 On Dec 12, 2011, at 11:08 PM, Stefan Hajnoczi wrote:

 On Mon, Dec 12, 2011 at 2:50 PM, takizo paul...@takizo.com wrote:

 On Dec 12, 2011, at 6:56 PM, Stefan Hajnoczi wrote:

 On Sun, Dec 11, 2011 at 4:40 PM, takizo paul...@takizo.com wrote:
 LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none 
 /usr/libexec/qemu-kvm -S -M rhel6.1.0 -enable-kvm -m 4096 -smp 
 1,sockets=1,cores=1,threads=1 -name database -uuid 
 f3e9f320-7826-7e50-94bb-1833f7fd9dfb -nodefconfig -nodefaults -chardev 
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/database.monitor,server,nowait
  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc 
 -no-reboot -drive 
 file=/opt/cibai/database,if=none,id=drive-ide0-0-0,format=raw,cache=none,aio=threads
  -device 
 ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=2 
 -drive 
 file=/opt/ISO-Download/FreeBSD-8.2-RELEASE-amd64-disc1.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,aio=threads
  -device 
 ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 
 -netdev tap,fd=22,id=hostnet0 -device 
 e1000,netdev=hostnet0,id=net0,mac=52:54:00:77:a5:a6,bus=pci.0,addr=0x3 
 -chardev pty,id=charserial0 -device 
 isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:2,password 
 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
 char device redirected to /dev/pts/6
 Using CPU model cpu64-rhel6

 block I/O error in device 'drive-ide0-0-0': Invalid argument (22)

 Is /opt/cibai/database on a special filesystem or exotic storage setup?

 Please check dmesg on the host for any kernel messages regarding I/O 
 errors.

 Please try reading the file on the host to check whether the I/O error
 is happening on the host and not related to KVM:

 $ dd if=/opt/cibai/database of=/dev/null iflag=direct

 Stefan

 Hi Stefan,

 When I quit the installation, further see the error messages

 ide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still 
 pendingide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING 
 still pendingide_dma_cancel: aiocb still pendingide_dma_cancel: 
 BM_STATUS_DMAING still pendingide_dma_cancel: aiocb still 
 pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: aiocb 
 still pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: 
 aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still pending2011-12-12 
 22:47:51.427: shutting down


 Hi Stefan

 Okay, so if I understand correctly you have CentOS 6.1 host running
 FreeBSD and CentOS guest installs to emulated IDE drives.  You're
 getting failed I/O requests with EINVAL (22).  You have verified that
 dd if=/opt/vm/database.img of=/dev/null iflag=direct reads the image
 file without errors.

 That is right, both also emulated on IDE drive


 Can you please try the CentOS guest install with a virtio-blk drive
 instead of IDE?  This should show whether this problem is related to
 IDE or a general issue in how QEMU accesses the host file.

 Took your advice installed Centos on VirtIO drive and it's ok.

 Does it means it's IDE problem on the OS itself?

The physical drive in your machine is probably fine.  QEMU emulates an
IDE controller and disks for the guest.  It looks like there is a bug
in the IDE emulation code inside QEMU which leads to these errors.

In general virtio-blk is the recommended storage interface because it
delivers better performance.  I'm not up-to-speed on how stable the
FreeBSD virtio drivers are but I think they are available if you
search.

 2011-12-12 23:15:06.759: starting up
 LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none 
 /usr/libexec/qemu-kvm -S -M rhel6.1.0 -enable-kvm -m 1024 -smp 
 1,sockets=1,cores=1,threads=1 -name centos -uuid 
 0752c74e-1d51-d529-f66f-8df9a28670ed -nodefconfig -nodefaults -chardev 
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/centos.monitor,server,nowait 
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-reboot 
 -drive 
 file=/opt/vm/centos.img,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=threads
  -device 
 virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
  -drive 
 file=/opt/ISO-Download/CentOS-6.0-x86_64-minimal.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,aio=threads
  -device 
 ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 
 -netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=26 -device 
 virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:10:bc:5d,bus=pci.0,addr=0x3
  -chardev pty,id=charserial0 -device 
 isa-serial,chardev=charserial0,id=serial0 -usb -device usb-tablet,id=input0 
 -vnc 127.0.0.1:3,password -vga cirrus -device 
 virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
 char device redirected to /dev/pts/9
 Using CPU model cpu64-rhel6

At this stage of debugging I would try enabling SystemTap probes in
QEMU

Re: [Qemu-devel] [RFC 0/6] target-i386: TCGv cleanups

2011-12-12 Thread Andreas Färber

Am 12.12.2011 15:56, schrieb Paul Brook:
 Am 12.12.2011 00:42, schrieb Paul Brook:
 This series makes target-i386 compile with DEBUG_TCGV_TL.

 What benefit does this provide?

 It showcases what changes would need to be done to allow type-safe
 compilation of the first pair of --enable-system targets.
 
 How is the existing code not type safe?  What benefit does making TCGv a 
 separate type rather than an alias for either _i32 or _i64 give?

I've already answered that extensively elsewhere.

Look, I really personally don't care about i386 and whether this RFC
gets applied to target-i386 or not. This is an example, because the
changes that I *do* care about have already been folded into my branch,
with the help of my TCGv series now posted for others' benefit.

Please read the other thread for details of what it does and doesn't do.

Andreas

[Qemu-devel] [PATCH] ARM - Remove fixed map code buffer restriction

2011-12-12 Thread Dr. David Alan Gilbert

On ARM, don't map the code buffer at a fixed location, and fix up the
call/goto tcg routines to let it do long jumps.

Mapping the code buffer at a fixed address could sometimes result in it being
mapped over the top of the heap with pretty random results.

This diff is against v1.0.

Signed-off-by: Dr. David Alan Gilbert david.gilb...@linaro.org
---
 exec.c   |4 +---
 tcg/arm/tcg-target.c |   31 ---
 2 files changed, 13 insertions(+), 22 deletions(-)

diff --git a/exec.c b/exec.c
index 6b92198..ef83da1 100644
--- a/exec.c
+++ b/exec.c
@@ -497,9 +497,7 @@ static void code_gen_alloc(unsigned long tb_size)
 if (code_gen_buffer_size  (512 * 1024 * 1024))
 code_gen_buffer_size = (512 * 1024 * 1024);
 #elif defined(__arm__)
-/* Map the buffer below 32M, so we can use direct calls and branches */
-flags |= MAP_FIXED;
-start = (void *) 0x0100UL;
+/* Keep the buffer no bigger than 16GB to branch between blocks */
 if (code_gen_buffer_size  16 * 1024 * 1024)
 code_gen_buffer_size = 16 * 1024 * 1024;
 #elif defined(__s390x__)
diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index e05a64f..730d913 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -842,6 +842,12 @@ static inline void tcg_out_st8(TCGContext *s, int cond,
 tcg_out_st8_12(s, cond, rd, rn, offset);
 }
 
+/* The _goto case is normally between TBs within the same code buffer,
+   and with the code buffer limited to 16GB we shouldn't need the long
+   case.
+
+    except to the prologue that is in its own buffer.
+ */
 static inline void tcg_out_goto(TCGContext *s, int cond, uint32_t addr)
 {
 int32_t val;
@@ -855,22 +861,20 @@ static inline void tcg_out_goto(TCGContext *s, int cond, 
uint32_t addr)
 if (val - 8  0x01fd  val - 8  -0x01fd)
 tcg_out_b(s, cond, val);
 else {
-#if 1
-tcg_abort();
-#else
 if (cond == COND_AL) {
 tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4);
-tcg_out32(s, addr); /* XXX: This is l-u.value, can we use it? */
+tcg_out32(s, addr);
 } else {
 tcg_out_movi32(s, cond, TCG_REG_R8, val - 8);
 tcg_out_dat_reg(s, cond, ARITH_ADD,
 TCG_REG_PC, TCG_REG_PC,
 TCG_REG_R8, SHIFT_IMM_LSL(0));
 }
-#endif
 }
 }
 
+/* The call case is mostly used for helpers - so it's not unreasonable
+   for them to be beyond branch range */
 static inline void tcg_out_call(TCGContext *s, uint32_t addr)
 {
 int32_t val;
@@ -887,20 +891,9 @@ static inline void tcg_out_call(TCGContext *s, uint32_t 
addr)
 tcg_out_bl(s, COND_AL, val);
 }
 } else {
-#if 1
-tcg_abort();
-#else
-if (cond == COND_AL) {
-tcg_out_dat_imm(s, cond, ARITH_ADD, TCG_REG_R14, TCG_REG_PC, 4);
-tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4);
-tcg_out32(s, addr); /* XXX: This is l-u.value, can we use it? */
-} else {
-tcg_out_movi32(s, cond, TCG_REG_R9, addr);
-tcg_out_dat_reg(s, cond, ARITH_MOV, TCG_REG_R14, 0,
-TCG_REG_PC, SHIFT_IMM_LSL(0));
-tcg_out_bx(s, cond, TCG_REG_R9);
-}
-#endif
+tcg_out_dat_imm(s, COND_AL, ARITH_ADD, TCG_REG_R14, TCG_REG_PC, 4);
+tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4);
+tcg_out32(s, addr);
 }
 }

Re: [Qemu-devel] New Guess OS Creation Problem

2011-12-12 Thread takizo


On Dec 12, 2011, at 11:33 PM, Stefan Hajnoczi wrote:

 On Mon, Dec 12, 2011 at 3:19 PM, takizo paul...@takizo.com wrote:
 
 On Dec 12, 2011, at 11:08 PM, Stefan Hajnoczi wrote:
 
 On Mon, Dec 12, 2011 at 2:50 PM, takizo paul...@takizo.com wrote:
 
 On Dec 12, 2011, at 6:56 PM, Stefan Hajnoczi wrote:
 
 On Sun, Dec 11, 2011 at 4:40 PM, takizo paul...@takizo.com wrote:
 LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none 
 /usr/libexec/qemu-kvm -S -M rhel6.1.0 -enable-kvm -m 4096 -smp 
 1,sockets=1,cores=1,threads=1 -name database -uuid 
 f3e9f320-7826-7e50-94bb-1833f7fd9dfb -nodefconfig -nodefaults -chardev 
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/database.monitor,server,nowait
  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc 
 -no-reboot -drive 
 file=/opt/cibai/database,if=none,id=drive-ide0-0-0,format=raw,cache=none,aio=threads
  -device 
 ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=2 
 -drive 
 file=/opt/ISO-Download/FreeBSD-8.2-RELEASE-amd64-disc1.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,aio=threads
  -device 
 ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 
 -netdev tap,fd=22,id=hostnet0 -device 
 e1000,netdev=hostnet0,id=net0,mac=52:54:00:77:a5:a6,bus=pci.0,addr=0x3 
 -chardev pty,id=charserial0 -device 
 isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:2,password 
 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
 char device redirected to /dev/pts/6
 Using CPU model cpu64-rhel6
 
 block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
 
 Is /opt/cibai/database on a special filesystem or exotic storage setup?
 
 Please check dmesg on the host for any kernel messages regarding I/O 
 errors.
 
 Please try reading the file on the host to check whether the I/O error
 is happening on the host and not related to KVM:
 
 $ dd if=/opt/cibai/database of=/dev/null iflag=direct
 
 Stefan
 
 Hi Stefan,
 
 When I quit the installation, further see the error messages
 
 ide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still 
 pendingide_dma_cancel: aiocb still pendingide_dma_cancel: BM_STATUS_DMAING 
 still pendingide_dma_cancel: aiocb still pendingide_dma_cancel: 
 BM_STATUS_DMAING still pendingide_dma_cancel: aiocb still 
 pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: aiocb 
 still pendingide_dma_cancel: BM_STATUS_DMAING still pendingide_dma_cancel: 
 aiocb still pendingide_dma_cancel: BM_STATUS_DMAING still 
 pending2011-12-12 22:47:51.427: shutting down
 
 
 Hi Stefan
 
 Okay, so if I understand correctly you have CentOS 6.1 host running
 FreeBSD and CentOS guest installs to emulated IDE drives.  You're
 getting failed I/O requests with EINVAL (22).  You have verified that
 dd if=/opt/vm/database.img of=/dev/null iflag=direct reads the image
 file without errors.
 
 That is right, both also emulated on IDE drive
 
 
 Can you please try the CentOS guest install with a virtio-blk drive
 instead of IDE?  This should show whether this problem is related to
 IDE or a general issue in how QEMU accesses the host file.
 
 Took your advice installed Centos on VirtIO drive and it's ok.
 
 Does it means it's IDE problem on the OS itself?
 
 The physical drive in your machine is probably fine.  QEMU emulates an
 IDE controller and disks for the guest.  It looks like there is a bug
 in the IDE emulation code inside QEMU which leads to these errors.

I suspect the problem might came from QEMU, because day before it was working 
before before I upgrade to latest QEMU. 
 
 In general virtio-blk is the recommended storage interface because it
 delivers better performance.  I'm not up-to-speed on how stable the
 FreeBSD virtio drivers are but I think they are available if you
 search.

Unfortunately I am running FreeBSD 8.2, it's not natively supported yet. I 
guess now interim solutions will be duplicate from some current running VM. 

 
 2011-12-12 23:15:06.759: starting up
 LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none 
 /usr/libexec/qemu-kvm -S -M rhel6.1.0 -enable-kvm -m 1024 -smp 
 1,sockets=1,cores=1,threads=1 -name centos -uuid 
 0752c74e-1d51-d529-f66f-8df9a28670ed -nodefconfig -nodefaults -chardev 
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/centos.monitor,server,nowait
  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-reboot 
 -drive 
 file=/opt/vm/centos.img,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=threads
  -device 
 virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
  -drive 
 file=/opt/ISO-Download/CentOS-6.0-x86_64-minimal.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,aio=threads
  -device 
 ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 
 -netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=26 -device 
 virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:10:bc:5d,bus=pci.0,addr=0x3

Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC

2011-12-12 Thread Alon Levy

On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote:
 Hi there,
 
 I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out that
 the command client_migrate_info uses it. That's a legacy interface and has to
 be dropped, no command should be using it...
 

Why has it got to be dropped? can't it be declared as deprecated first?

 Something tells me that if I just drop it (and change the command to use the
 regular interface), bad things will happen. Am I right? :)

Re: [Qemu-devel] [PATCH] ARM - Remove fixed map code buffer restriction

2011-12-12 Thread Peter Maydell

CC'ing Andrzej, who is the tcg/arm maintainer.

On 12 December 2011 15:37, Dr. David Alan Gilbert
david.gilb...@linaro.org wrote:
 On ARM, don't map the code buffer at a fixed location, and fix up the
 call/goto tcg routines to let it do long jumps.

 Mapping the code buffer at a fixed address could sometimes result in it being
 mapped over the top of the heap with pretty random results.

 This diff is against v1.0.

Patches should always be against master, although we're still
close enough to 1.0 that this will probably apply anyway.

Comments like This diff is against v1.0. go below the '---' so they
don't appear in the final commit log.

Code generally looks OK to me.

 Signed-off-by: Dr. David Alan Gilbert david.gilb...@linaro.org
 ---
  exec.c               |    4 +---
  tcg/arm/tcg-target.c |   31 ---
  2 files changed, 13 insertions(+), 22 deletions(-)

 diff --git a/exec.c b/exec.c
 index 6b92198..ef83da1 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -497,9 +497,7 @@ static void code_gen_alloc(unsigned long tb_size)
         if (code_gen_buffer_size  (512 * 1024 * 1024))
             code_gen_buffer_size = (512 * 1024 * 1024);
  #elif defined(__arm__)
 -        /* Map the buffer below 32M, so we can use direct calls and branches 
 */
 -        flags |= MAP_FIXED;
 -        start = (void *) 0x0100UL;
 +        /* Keep the buffer no bigger than 16GB to branch between blocks */
         if (code_gen_buffer_size  16 * 1024 * 1024)
             code_gen_buffer_size = 16 * 1024 * 1024;
  #elif defined(__s390x__)
 diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
 index e05a64f..730d913 100644
 --- a/tcg/arm/tcg-target.c
 +++ b/tcg/arm/tcg-target.c
 @@ -842,6 +842,12 @@ static inline void tcg_out_st8(TCGContext *s, int cond,
         tcg_out_st8_12(s, cond, rd, rn, offset);
  }

 +/* The _goto case is normally between TBs within the same code buffer,
 +   and with the code buffer limited to 16GB we shouldn't need the long
 +   case.
 +
 +    except to the prologue that is in its own buffer.
 +     */
  static inline void tcg_out_goto(TCGContext *s, int cond, uint32_t addr)
  {
     int32_t val;
 @@ -855,22 +861,20 @@ static inline void tcg_out_goto(TCGContext *s, int 
 cond, uint32_t addr)
     if (val - 8  0x01fd  val - 8  -0x01fd)
         tcg_out_b(s, cond, val);
     else {
 -#if 1
 -        tcg_abort();
 -#else
         if (cond == COND_AL) {
             tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4);
 -            tcg_out32(s, addr); /* XXX: This is l-u.value, can we use it? */
 +            tcg_out32(s, addr);

I see these XXXs have been there since the ARM tcg target was
introduced. Does anybody know what they were referring to? Andrzej?

         } else {
             tcg_out_movi32(s, cond, TCG_REG_R8, val - 8);
             tcg_out_dat_reg(s, cond, ARITH_ADD,
                             TCG_REG_PC, TCG_REG_PC,
                             TCG_REG_R8, SHIFT_IMM_LSL(0));
         }
 -#endif
     }
  }

 +/* The call case is mostly used for helpers - so it's not unreasonable
 +   for them to be beyond branch range */
  static inline void tcg_out_call(TCGContext *s, uint32_t addr)
  {
     int32_t val;
 @@ -887,20 +891,9 @@ static inline void tcg_out_call(TCGContext *s, uint32_t 
 addr)
             tcg_out_bl(s, COND_AL, val);
         }
     } else {
 -#if 1
 -        tcg_abort();
 -#else
 -        if (cond == COND_AL) {
 -            tcg_out_dat_imm(s, cond, ARITH_ADD, TCG_REG_R14, TCG_REG_PC, 4);
 -            tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4);
 -            tcg_out32(s, addr); /* XXX: This is l-u.value, can we use it? */
 -        } else {
 -            tcg_out_movi32(s, cond, TCG_REG_R9, addr);
 -            tcg_out_dat_reg(s, cond, ARITH_MOV, TCG_REG_R14, 0,
 -                            TCG_REG_PC, SHIFT_IMM_LSL(0));
 -            tcg_out_bx(s, cond, TCG_REG_R9);
 -        }
 -#endif
 +        tcg_out_dat_imm(s, COND_AL, ARITH_ADD, TCG_REG_R14, TCG_REG_PC, 4);
 +        tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4);
 +        tcg_out32(s, addr);
     }
  }


-- PMM

Re: [Qemu-devel] [PATCH V4 04/13] hw/9pfs: File system helper process for qemu 9p proxy FS

2011-12-12 Thread Stefan Hajnoczi

On Mon, Dec 12, 2011 at 3:21 PM, Aneesh Kumar K.V
aneesh.ku...@linux.vnet.ibm.com wrote:
 On Mon, 12 Dec 2011 12:08:33 +, Stefan Hajnoczi stefa...@gmail.com 
 wrote:
 On Fri, Dec 09, 2011 at 10:12:17PM +0530, M. Mohan Kumar wrote:
  On Friday, December 09, 2011 12:01:14 AM Stefan Hajnoczi wrote:
   On Mon, Dec 05, 2011 at 09:48:41PM +0530, M. Mohan Kumar wrote:
+static int read_request(int sockfd, struct iovec *iovec, ProxyHeader
*header) +{
+    int retval;
+
+    /*
+     * read the request header.
+     */
+    iovec-iov_len = 0;
+    retval = socket_read(sockfd, iovec-iov_base, PROXY_HDR_SZ);
+    if (retval  0) {
+        return retval;
+    }
+    iovec-iov_len = PROXY_HDR_SZ;
+    retval = proxy_unmarshal(iovec, 0, dd, header-type,
header-size); +    if (retval  0) {
+        return retval;
+    }
+    /*
+     * We can't process message.size  PROXY_MAX_IO_SZ, read the
complete +     * message from the socket and ignore it. This ensures
that +     * we can correctly handle the next request. We also return +
   * ENOBUFS as error to indicate we ran out of buffer space. +     */
+    if (header-size  PROXY_MAX_IO_SZ) {
+        int count, size;
+        size = header-size;
+        while (size  0) {
+            count = MIN(PROXY_MAX_IO_SZ, size);
+            count = socket_read(sockfd, iovec-iov_base + 
PROXY_HDR_SZ,
count); +            if (count  0) {
+                return count;
+            }
+            size -= count;
+        }
  
   I'm not sure recovery attempts are worthwhile here.  The client is
   buggy, perhaps just refuse further work.
 
  But whats the issue in trying to recover in this case?

 This recovery procedure is not robust because it does not always work.
 In fact it only works in the case where the header-size field was
 out-of-range but accurate.  That's not a likely case since the QEMU-side
 code that you are writing should handle this.

 If the nature of the invalid request is different, either a broken or
 malicious client which does not send a valid header-size then we're
 stuck in this special-case recovery trying to gobble bytes and we never
 log an error.

 A real recovery would be something like disconnecting and
 re-establishing the connection between QEMU and the helper.  This would
 allow us to get back to a clean state in all cases.


 Since we are not having any state in the proxy helper, returning ENOBUFS
 should be similar to the above right ? One of the reason to try to
 recover as much as possible, is to make sure the guest can umount the
 file system properly. That is if we hit these error condition due to a
 bug in proxy FS driver is qemu, we want to make sure we return some
 valid error, which will atleast enable the guest/client to do an umount.

When the helper detects something outside the protocol specification
it needs to terminate the connection.  The protocol has no reliable
way to skip the junk coming over the socket so we can't process the
next message.

The flipside to try to recover as much as possible is damage as
little as possible.  We don't want to mis-interpret requests on this
broken connection and corrupt the user's data.

I'm happy with any scheme as long as it handles all error cases.  The
problem with the -ENOBUFS case was that it is pretty artificial
(unlikely to happen) and doesn't handle cases where header-size is
inaccurate.

Stefan

Re: [Qemu-devel] [PATCH 0/4] tcg: Add debug facilities for TCGv

2011-12-12 Thread Paul Brook

  Trying to make a 32-bit target 64-bit safe without actually
  implementing the 64-bit target is a complete waste of time.
 
 That's where we disagree. I rather do things right from the start than
 leaving the cleanup work to someone else later on.

  You've almost no chance of getting
  it right.  In some cases the correct answer will be to use 32-bit
  arithmetic, then sign/zero extend the result. In other cases the correct
  answer will be to perform word size arithmetic.  Blindly picking one
  just makes the bugs harder to find later.
 
 This series picks nothing blindly. 

Sure it does - or at least it will when you get to ARM and m68k[1].  The whole 
point of your patch is to force developers to pick between an address sized 
value and a 32-bit value.  On a 32-bit target these are the same thing.  Not 
only are they the same thing, but most of the time the architecture doesn't 
even have the concept that they may be different.

Only when you introduce a variant with 64-bit addresses does the distinction 
become meaningful.  At that point they're actually different, and are 
trivially detected by the existing checks.

Ther are three ways to resolve a mismatch:
- Change everything to TCGv_i32.
- Change everything to TCGv.
- Add an explicit extension/truncation (compiles to no-op on 32-bit targets).

There's no way of the developer of a 32-bit architecture to know which of 
these a [hypothetical future] 64-bit architecture will require.  The fact they 
you've been forced to pick one (rather then leave the mismatch for the 64-bit 
porter to resolve) actually makes bugs harder to find.

 For me the most annoying issue was that tcg_gen_qemu_{ld,st}* needs TCGv.

You mean the value transferred is always TCGv sized, so ld32u requires an 
additional truncation before doing 32-bit arithmetic?  Fixing that is 
completely independent of making TCGv a separate type.

The whole point of TCGv is that it's an address sized value, so is the only 
real option for the address.  Constructing an address for a different sized 
value requires a target specific sign/szero extension or truncation/overflow 
check.

  If you're trying to add support for targets where the primary word size
  is neither 32 nor 64 then that's a completely different problem, and
  probably not one that's worth solving.  In practice your port is going
  to end up using 64- bit arithmetic and explicitly compensating for the
  excess precision where necessary.
 
 That's a different issue and not being addressed here.

Good.
 
 My point was that I have an inheritance hierarchy where (as opposed to,
 e.g., ppc and ppc64) both end up as TARGET_LONG_BITS==32, so I do not
 get a check for free by compiling both, and I do not want to introduce
 an artificial TARGET_LONG_BITS==64 architecture just to check that my
 tl==i32 target code is free of typos.

Huh? Now I'm completely confused.  If all your targets have T_L_B==32 how can 
mixing TCGv and TCGV_i32 be wrong? How do you know which of the 3 alternatives 
above is the right answer?

 Same issue if you pick only --target-list=some-softmmu BTW.

If you're only building a small subset of targets then clearly you won't see 
errors in the targets you omitted.  The only sane answer is to make sure you 
build (and preferably test) all the targets you have effected.

This is the same reason I object to code that is disabled by default.  If it 
isn't at least being built by the majority of developers (or at least any 
working in related areas) then it's going to bitrot rapidly, and probably 
won't work when someone does decide to turn it on.

 Just like with softfloat [u]int16 etc. types it's just too easy to
 forget _t somewhere (here: _i32) and to end up with a type you don't want.

I don't see how that is relevant.  TCGv (aka target_ulong) is a synonym for 
either TCGv_i32 (uint32_t) or TCGv_i64 (uint64_t).  If this is not true then 
we're completely screwed.

 If you have a better proposal how to introduce the checks I want, please
 let us hear it.

I still don't understand how your additional restructions are ever useful.
Please give an example of an actual error your checks will catch.

Paul

[1] m68k may never have a 64-bit variant, but ARM is a good example of a qemu 
target that is currently 32-bit but will grow a 64-bit variant in the not too 
distant future.

Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC

2011-12-12 Thread Alon Levy

On Mon, Dec 12, 2011 at 03:23:35PM +, Stefan Hajnoczi wrote:
 On Mon, Dec 12, 2011 at 12:39 PM, Gerd Hoffmann kra...@redhat.com wrote:
  On 12/12/11 13:10, Stefan Hajnoczi wrote:
  On Mon, Dec 12, 2011 at 11:29 AM, Gerd Hoffmann kra...@redhat.com wrote:
  On 12/12/11 11:18, Stefan Hajnoczi wrote:
  On Sun, Dec 11, 2011 at 10:29 AM, Alon Levy al...@redhat.com wrote:
  On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote:
  Hi there,
 
  I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns 
  out that
  the command client_migrate_info uses it. That's a legacy interface and 
  has to
  be dropped, no command should be using it...
 
  Something tells me that if I just drop it (and change the command to 
  use the
  regular interface), bad things will happen. Am I right? :)
 
 
  The monitor command client_migrate_info needs to complete after getting
  an ACK message from the currently connected spice client (this is the
  only case where this is required - if there is no client then the
  MONITOR_CMD_ASYNC API won't be used). This in turn requires the main
  thread to perform select and call the callback that will process this
  ACK. That's why the MONITOR_CMD_ASYNC API was used.
 
  I'm not aware of any other way to do this, I'll be glad for any help
  here. Also, I understand this is not what is not true async, since one
  would expect a true async interface to support multiple in flight
  monitor commands. If there is any ETA or existing way to do this we
  could change the implementation of client_migrate_info.
 
  Is it possible to use a QMP event to signal completion instead of a
  MONITOR_CMD_ASYNC command?
 
  Problem is this breaks the qemu - libvirt interface.
 
  I had the same issue with the block_job_cancel command, which Adam
  Litke and Eric Blake helped us fix and find a backward-compatible
  libvirt solution for:
 
  http://www.redhat.com/archives/libvir-list/2011-November/msg01351.html
 
  Isn't going to fly as waiting for completion isn't optional in that
  case.  Workflow is this:
 
  (1) libvirt issues client_migrate_info command.
  (2) qemu forwards it to spice-server, which in turn forwards it to
     the spice client (if connected).
  (3) spice client connects to the migration target machine.
  (4) spice client signals completion to spice-server, which in turn
     notifies qemu.
  (5) qemu calls the monitor completion callback, libvirt gets
     client_migrate_info result.
  (6) libvirt issues migrate command.
 
  The problem is that (3) must be finished before (6) because qemu on the
  target side doesn't accept incoming tcp connections any more once the
  migration started.
 
  I don't see a way to switch this to qmp events without breaking old
  libvirt versions managing new qemu.
 
 I don't see a solution in this case either.  Looks like libvirt
 supports this command since 0.9.2 so it's not a good idea to just yank
 it.
 
 It might be possible for the QEMU client_migrate_info handler to run a
 nested event loop in the legacy libvirt case.  This would suck since
 the VM is unresponsive while waiting for spice migration to complete.
 New libvirt would call the async version of the command which is
 well-behaved and uses a QMP event to signal completion.

I agree that a nested event loop would be a bad solution, the point is
to let the guest continue to work while waiting, otherwise you destroy
the live migration experience, might as well disconnect the client from
the source and have it connect to the target.

I thought the whole point of MONITOR_CMD_ASYNC was to allow this
scenario. So iiuc QMP is the alternative but it would require a rewrite,
i.e. break existing users like libvirt. Hence my suggestion as a reply
to Luiz's initial email that we just deprecate MONITOR_CMD_ASYNC right
now instead of dropping it, then we can implement a QMP variant of
client_migrate_info side by side, libvirt can learn to use it, and then
(seeing as we are the only user) M_C_A can be dropped.

 
 Stefan

Re: [Qemu-devel] [PATCH V12 1/8] Support for TPM command line options

2011-12-12 Thread Anthony Liguori

Please use git-send-email.  This series won't apply via git-am as-is.  Also, use 
a single line separator with '---' between the revision history and the commit 
message so that the revision history doesn't end up in git.  You need to move 
the SoB line before the revision history too.


Regards,

Anthony Liguori

On 10/11/2011 12:32 PM, Stefan Berger wrote:

Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC

2011-12-12 Thread Luiz Capitulino

On Mon, 12 Dec 2011 17:50:46 +0200
Alon Levy al...@redhat.com wrote:

 On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote:
  Hi there,
  
  I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out 
  that
  the command client_migrate_info uses it. That's a legacy interface and has 
  to
  be dropped, no command should be using it...
  
 
 Why has it got to be dropped? can't it be declared as deprecated first?

Well, after this thread looks like it's what we'll have to do...

 
  Something tells me that if I just drop it (and change the command to use the
  regular interface), bad things will happen. Am I right? :)

[Qemu-devel] [PATCH v2 0/4] add qemu_thread_join, use it to fix bug in ccid

2011-12-12 Thread Paolo Bonzini

Patches introducing qemu_thread_join have floated around multiple times.
Now I found a bug that requires it to be fixed, so perhaps this time
it will be more successful.

For the actual bug, see patch 4.

v1-v2: remove spurious submodule change, fix blank lines

Jan Kiszka (2):
  qemu-thread: add API for joinable threads
  qemu-thread: implement joinable threads for POSIX

Paolo Bonzini (2):
  qemu-thread: implement joinable threads for Win32
  ccid: make threads joinable

 cpus.c  |6 ++-
 hw/ccid-card-emulated.c |   25 +--
 qemu-thread-posix.c |   35 ++--
 qemu-thread-win32.c |  107 ++
 qemu-thread-win32.h |5 +-
 qemu-thread.h   |8 +++-
 ui/vnc-jobs-async.c |2 +-
 8 files changed, 127 insertions(+), 63 deletions(-)

-- 
1.7.7.1

[Qemu-devel] [PATCH v2 2/4] qemu-thread: implement joinable threads for POSIX

2011-12-12 Thread Paolo Bonzini

From: Jan Kiszka jan.kis...@siemens.com

Allow to control if a QEMU thread is created joinable or not. Make it
not joinable by default to avoid that we keep the associated resources
around when terminating a thread without joining it (what we couldn't do
so far for obvious reasons).

The audio subsystem will need the join feature when converting it to
QEMU threading/locking abstractions, so provide that service.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 qemu-thread-posix.c |   28 --
 1 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/qemu-thread-posix.c b/qemu-thread-posix.c
index 49d3388..603ff3d 100644
--- a/qemu-thread-posix.c
+++ b/qemu-thread-posix.c
@@ -122,17 +122,29 @@ void qemu_thread_create(QemuThread *thread,
 {
 sigset_t set, oldset;
 int err;
+pthread_attr_t attr;
 
-assert(mode == QEMU_THREAD_DETACHED);
+err = pthread_attr_init(attr);
+if (err) {
+error_exit(err, __func__);
+}
+if (mode == QEMU_THREAD_DETACHED) {
+err = pthread_attr_setdetachstate(attr, PTHREAD_CREATE_DETACHED);
+if (err) {
+error_exit(err, __func__);
+}
+}
 
 /* Leave signal handling to the iothread.  */
 sigfillset(set);
 pthread_sigmask(SIG_SETMASK, set, oldset);
-err = pthread_create(thread-thread, NULL, start_routine, arg);
+err = pthread_create(thread-thread, attr, start_routine, arg);
 if (err)
 error_exit(err, __func__);
 
 pthread_sigmask(SIG_SETMASK, oldset, NULL);
+
+pthread_attr_destroy(attr);
 }
 
 void qemu_thread_get_self(QemuThread *thread)
@@ -148,3 +162,15 @@ void qemu_thread_exit(void *retval)
 {
 pthread_exit(retval);
 }
+
+void *qemu_thread_join(QemuThread *thread)
+{
+int err;
+void *ret;
+
+err = pthread_join(thread-thread, ret);
+if (err) {
+error_exit(err, __func__);
+}
+return ret;
+}
-- 
1.7.7.1

[Qemu-devel] [PATCH v2 4/4] ccid: make threads joinable

2011-12-12 Thread Paolo Bonzini

Destroying a mutex that another thread might have just unlocked
is racy.  It usually works, but you cannot do that in general and
can lead to deadlocks or segfaults.  Change ccid to use joinable
threads instead.

(Also, qemu_mutex_init/qemu_cond_init were missing).

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/ccid-card-emulated.c |   26 +++---
 1 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/hw/ccid-card-emulated.c b/hw/ccid-card-emulated.c
index 9fe9db5..2d2ebce 100644
--- a/hw/ccid-card-emulated.c
+++ b/hw/ccid-card-emulated.c
@@ -120,6 +120,7 @@ struct EmulatedState {
 uint8_t  atr_length;
 QSIMPLEQ_HEAD(event_list, EmulEvent) event_list;
 QemuMutex event_list_mutex;
+QemuThread event_thread_id;
 VReader *reader;
 QSIMPLEQ_HEAD(guest_apdu_list, EmulEvent) guest_apdu_list;
 QemuMutex vreader_mutex; /* and guest_apdu_list mutex */
@@ -127,8 +128,7 @@ struct EmulatedState {
 QemuCond handle_apdu_cond;
 int  pipe[2];
 int  quit_apdu_thread;
-QemuMutex apdu_thread_quit_mutex;
-QemuCond apdu_thread_quit_cond;
+QemuThread apdu_thread_id;
 };
 
 static void emulated_apdu_from_guest(CCIDCardState *base,
@@ -271,9 +271,6 @@ static void *handle_apdu_thread(void* arg)
 }
 qemu_mutex_unlock(card-vreader_mutex);
 }
-qemu_mutex_lock(card-apdu_thread_quit_mutex);
-qemu_cond_signal(card-apdu_thread_quit_cond);
-qemu_mutex_unlock(card-apdu_thread_quit_mutex);
 return NULL;
 }
 
@@ -489,7 +486,6 @@ static uint32_t parse_enumeration(char *str,
 static int emulated_initfn(CCIDCardState *base)
 {
 EmulatedState *card = DO_UPCAST(EmulatedState, base, base);
-QemuThread thread_id;
 VCardEmulError ret;
 EnumTable *ptable;
 
@@ -541,9 +537,10 @@ static int emulated_initfn(CCIDCardState *base)
 printf(%s: failed to initialize vcard\n, EMULATED_DEV_NAME);
 return -1;
 }
-qemu_thread_create(thread_id, event_thread, card, QEMU_THREAD_DETACHED);
-qemu_thread_create(thread_id, handle_apdu_thread, card,
-   QEMU_THREAD_DETACHED);
+qemu_thread_create(card-event_thread_id, event_thread, card,
+   QEMU_THREAD_JOINABLE);
+qemu_thread_create(card-apdu_thread_id, handle_apdu_thread, card,
+   QEMU_THREAD_JOINABLE);
 return 0;
 }
 
@@ -553,15 +550,14 @@ static int emulated_exitfn(CCIDCardState *base)
 VEvent *vevent = vevent_new(VEVENT_LAST, NULL, NULL);
 
 vevent_queue_vevent(vevent); /* stop vevent thread */
-qemu_mutex_lock(card-apdu_thread_quit_mutex);
+qemu_thread_join(card-event_thread_id);
+
 card-quit_apdu_thread = 1; /* stop handle_apdu thread */
 qemu_cond_signal(card-handle_apdu_cond);
-qemu_cond_wait(card-apdu_thread_quit_cond,
-  card-apdu_thread_quit_mutex);
-/* handle_apdu thread stopped, can destroy all of it's mutexes */
+qemu_thread_join(card-apdu_thread_id);
+
+/* threads exited, can destroy all condvars/mutexes */
 qemu_cond_destroy(card-handle_apdu_cond);
-qemu_cond_destroy(card-apdu_thread_quit_cond);
-qemu_mutex_destroy(card-apdu_thread_quit_mutex);
 qemu_mutex_destroy(card-handle_apdu_mutex);
 qemu_mutex_destroy(card-vreader_mutex);
 qemu_mutex_destroy(card-event_list_mutex);
-- 
1.7.7.1

[Qemu-devel] [PATCH v2 1/4] qemu-thread: add API for joinable threads

2011-12-12 Thread Paolo Bonzini

From: Jan Kiszka jan.kis...@siemens.com

Split from Jan's original qemu-thread-posix.c patch.  No semantic change,
just introduce the new API that POSIX and Win32 implementations will
conform to.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 cpus.c  |6 --
 hw/ccid-card-emulated.c |5 +++--
 qemu-thread-posix.c |7 ---
 qemu-thread-win32.c |4 +++-
 qemu-thread.h   |8 ++--
 ui/vnc-jobs-async.c |2 +-
 6 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/cpus.c b/cpus.c
index ca46ec6..cbb4617 100644
--- a/cpus.c
+++ b/cpus.c
@@ -910,7 +910,8 @@ static void qemu_tcg_init_vcpu(void *_env)
 env-halt_cond = g_malloc0(sizeof(QemuCond));
 qemu_cond_init(env-halt_cond);
 tcg_halt_cond = env-halt_cond;
-qemu_thread_create(env-thread, qemu_tcg_cpu_thread_fn, env);
+qemu_thread_create(env-thread, qemu_tcg_cpu_thread_fn, env,
+   QEMU_THREAD_DETACHED);
 while (env-created == 0) {
 qemu_cond_wait(qemu_cpu_cond, qemu_global_mutex);
 }
@@ -926,7 +927,8 @@ static void qemu_kvm_start_vcpu(CPUState *env)
 env-thread = g_malloc0(sizeof(QemuThread));
 env-halt_cond = g_malloc0(sizeof(QemuCond));
 qemu_cond_init(env-halt_cond);
-qemu_thread_create(env-thread, qemu_kvm_cpu_thread_fn, env);
+qemu_thread_create(env-thread, qemu_kvm_cpu_thread_fn, env,
+   QEMU_THREAD_DETACHED);
 while (env-created == 0) {
 qemu_cond_wait(qemu_cpu_cond, qemu_global_mutex);
 }
diff --git a/hw/ccid-card-emulated.c b/hw/ccid-card-emulated.c
index 092301b..9fe9db5 100644
--- a/hw/ccid-card-emulated.c
+++ b/hw/ccid-card-emulated.c
@@ -541,8 +541,9 @@ static int emulated_initfn(CCIDCardState *base)
 printf(%s: failed to initialize vcard\n, EMULATED_DEV_NAME);
 return -1;
 }
-qemu_thread_create(thread_id, event_thread, card);
-qemu_thread_create(thread_id, handle_apdu_thread, card);
+qemu_thread_create(thread_id, event_thread, card, QEMU_THREAD_DETACHED);
+qemu_thread_create(thread_id, handle_apdu_thread, card,
+   QEMU_THREAD_DETACHED);
 return 0;
 }
 
diff --git a/qemu-thread-posix.c b/qemu-thread-posix.c
index ac3c0c9..49d3388 100644
--- a/qemu-thread-posix.c
+++ b/qemu-thread-posix.c
@@ -117,13 +117,14 @@ void qemu_cond_wait(QemuCond *cond, QemuMutex *mutex)
 
 void qemu_thread_create(QemuThread *thread,
void *(*start_routine)(void*),
-   void *arg)
+   void *arg, int mode)
 {
+sigset_t set, oldset;
 int err;
 
-/* Leave signal handling to the iothread.  */
-sigset_t set, oldset;
+assert(mode == QEMU_THREAD_DETACHED);
 
+/* Leave signal handling to the iothread.  */
 sigfillset(set);
 pthread_sigmask(SIG_SETMASK, set, oldset);
 err = pthread_create(thread-thread, NULL, start_routine, arg);
diff --git a/qemu-thread-win32.c b/qemu-thread-win32.c
index db8e744..ff80e84 100644
--- a/qemu-thread-win32.c
+++ b/qemu-thread-win32.c
@@ -243,10 +243,12 @@ static inline void qemu_thread_init(void)
 
 void qemu_thread_create(QemuThread *thread,
void *(*start_routine)(void *),
-   void *arg)
+   void *arg, int mode)
 {
 HANDLE hThread;
 
+assert(mode == QEMU_THREAD_DETACHED);
+
 struct QemuThreadData *data;
 qemu_thread_init();
 data = g_malloc(sizeof *data);
diff --git a/qemu-thread.h b/qemu-thread.h
index e008b60..a78a8f2 100644
--- a/qemu-thread.h
+++ b/qemu-thread.h
@@ -13,6 +13,9 @@ typedef struct QemuThread QemuThread;
 #include qemu-thread-posix.h
 #endif
 
+#define QEMU_THREAD_JOINABLE 0
+#define QEMU_THREAD_DETACHED 1
+
 void qemu_mutex_init(QemuMutex *mutex);
 void qemu_mutex_destroy(QemuMutex *mutex);
 void qemu_mutex_lock(QemuMutex *mutex);
@@ -35,8 +38,9 @@ void qemu_cond_broadcast(QemuCond *cond);
 void qemu_cond_wait(QemuCond *cond, QemuMutex *mutex);
 
 void qemu_thread_create(QemuThread *thread,
-   void *(*start_routine)(void*),
-   void *arg);
+void *(*start_routine)(void *),
+void *arg, int mode);
+void *qemu_thread_join(QemuThread *thread);
 void qemu_thread_get_self(QemuThread *thread);
 int qemu_thread_is_self(QemuThread *thread);
 void qemu_thread_exit(void *retval);
diff --git a/ui/vnc-jobs-async.c b/ui/vnc-jobs-async.c
index de5ea6b..9b3016c 100644
--- a/ui/vnc-jobs-async.c
+++ b/ui/vnc-jobs-async.c
@@ -318,7 +318,7 @@ void vnc_start_worker_thread(void)
 return ;
 
 q = vnc_queue_init();
-qemu_thread_create(q-thread, vnc_worker_thread, q);
+qemu_thread_create(q-thread, vnc_worker_thread, q, QEMU_THREAD_DETACHED);
 queue = q; /* Set global queue */
 }
 
-- 
1.7.7.1

Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC

2011-12-12 Thread Anthony Liguori


On 12/11/2011 04:29 AM, Alon Levy wrote:

On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote:

Hi there,

I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out that
the command client_migrate_info uses it. That's a legacy interface and has to
be dropped, no command should be using it...

Something tells me that if I just drop it (and change the command to use the
regular interface), bad things will happen. Am I right? :)



The monitor command client_migrate_info needs to complete after getting


Then it's completely broken.  CMD_ASYNC is broken and should have been removed a 
long time ago.


Unfortunately, we're going to have to remove this command until we can move to 
full QAPI.  I see this went in through Gerd's tree:


commit edc5cb1a52b2847201acf78b0fba67ab3c2464d5
Author: Yonit Halperin yhalp...@redhat.com
Date:   Mon Oct 17 10:03:18 2011 +0200

spice: turn client_migrate_info to async

Changes to monitor commands really should at least carry an Acked-by from the 
QMP maintainer (Luiz).  That would have prevented this from happening.


I'll try to be more diligent about enforcing this in the future.

Regards,

Anthony Liguori


an ACK message from the currently connected spice client (this is the
only case where this is required - if there is no client then the
MONITOR_CMD_ASYNC API won't be used). This in turn requires the main
thread to perform select and call the callback that will process this
ACK. That's why the MONITOR_CMD_ASYNC API was used.

I'm not aware of any other way to do this, I'll be glad for any help
here. Also, I understand this is not what is not true async, since one
would expect a true async interface to support multiple in flight
monitor commands. If there is any ETA or existing way to do this we
could change the implementation of client_migrate_info.

Alon

Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC

2011-12-12 Thread Anthony Liguori


On 12/12/2011 10:00 AM, Alon Levy wrote:

On Mon, Dec 12, 2011 at 03:23:35PM +, Stefan Hajnoczi wrote:

On Mon, Dec 12, 2011 at 12:39 PM, Gerd Hoffmannkra...@redhat.com  wrote:

On 12/12/11 13:10, Stefan Hajnoczi wrote:

On Mon, Dec 12, 2011 at 11:29 AM, Gerd Hoffmannkra...@redhat.com  wrote:

On 12/12/11 11:18, Stefan Hajnoczi wrote:

On Sun, Dec 11, 2011 at 10:29 AM, Alon Levyal...@redhat.com  wrote:

On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote:

Hi there,

I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out that
the command client_migrate_info uses it. That's a legacy interface and has to
be dropped, no command should be using it...

Something tells me that if I just drop it (and change the command to use the
regular interface), bad things will happen. Am I right? :)



The monitor command client_migrate_info needs to complete after getting
an ACK message from the currently connected spice client (this is the
only case where this is required - if there is no client then the
MONITOR_CMD_ASYNC API won't be used). This in turn requires the main
thread to perform select and call the callback that will process this
ACK. That's why the MONITOR_CMD_ASYNC API was used.

I'm not aware of any other way to do this, I'll be glad for any help
here. Also, I understand this is not what is not true async, since one
would expect a true async interface to support multiple in flight
monitor commands. If there is any ETA or existing way to do this we
could change the implementation of client_migrate_info.


Is it possible to use a QMP event to signal completion instead of a
MONITOR_CMD_ASYNC command?


Problem is this breaks the qemu-  libvirt interface.


I had the same issue with the block_job_cancel command, which Adam
Litke and Eric Blake helped us fix and find a backward-compatible
libvirt solution for:

http://www.redhat.com/archives/libvir-list/2011-November/msg01351.html


Isn't going to fly as waiting for completion isn't optional in that
case.  Workflow is this:

(1) libvirt issues client_migrate_info command.
(2) qemu forwards it to spice-server, which in turn forwards it to
the spice client (if connected).
(3) spice client connects to the migration target machine.
(4) spice client signals completion to spice-server, which in turn
notifies qemu.
(5) qemu calls the monitor completion callback, libvirt gets
client_migrate_info result.
(6) libvirt issues migrate command.

The problem is that (3) must be finished before (6) because qemu on the
target side doesn't accept incoming tcp connections any more once the
migration started.

I don't see a way to switch this to qmp events without breaking old
libvirt versions managing new qemu.


I don't see a solution in this case either.  Looks like libvirt
supports this command since 0.9.2 so it's not a good idea to just yank
it.

It might be possible for the QEMU client_migrate_info handler to run a
nested event loop in the legacy libvirt case.  This would suck since
the VM is unresponsive while waiting for spice migration to complete.
New libvirt would call the async version of the command which is
well-behaved and uses a QMP event to signal completion.


I agree that a nested event loop would be a bad solution, the point is
to let the guest continue to work while waiting, otherwise you destroy
the live migration experience, might as well disconnect the client from
the source and have it connect to the target.

I thought the whole point of MONITOR_CMD_ASYNC was to allow this
scenario. So iiuc QMP is the alternative but it would require a rewrite,
i.e. break existing users like libvirt. Hence my suggestion as a reply
to Luiz's initial email that we just deprecate MONITOR_CMD_ASYNC right
now instead of dropping it,


It *has* to be dropped.  Any command using it is fundamentally broken.

The command should have never been introduced in the first place to use async.

Regards,

Anthony Liguori

Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC

2011-12-12 Thread Anthony Liguori


On 12/12/2011 10:08 AM, Luiz Capitulino wrote:

On Mon, 12 Dec 2011 17:50:46 +0200
Alon Levyal...@redhat.com  wrote:


On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote:

Hi there,

I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out that
the command client_migrate_info uses it. That's a legacy interface and has to
be dropped, no command should be using it...



Why has it got to be dropped? can't it be declared as deprecated first?


Well, after this thread looks like it's what we'll have to do...\


Nope, it has to be dropped.

Commands using CMD_ASYNC may fail in arbitrary ways because of the way error 
reporting is done.  This is an unfixable problem until we eliminate all uses of 
qerror_report().


We need to take the hit here and force the command to always fail.  libvirt will 
need logic to use a different command with new versions.  If we coordinate this 
with the libvirt folks, we can make the transition as smooth as possible.


Regards,

Anthony Liguori






Something tells me that if I just drop it (and change the command to use the
regular interface), bad things will happen. Am I right? :)

Re: [Qemu-devel] [PULL 0/8] arm-devs queue

2011-12-12 Thread Anthony Liguori


On 12/12/2011 04:47 AM, Peter Maydell wrote:

The following changes since commit f18318eef8b4b263f4e82a5338c9b2875a6c73c8:

   Merge branch 'master' of git://git.qemu.org/qemu (2011-12-12 04:12:31 +0400)

are available in the git repository at:

   git://git.linaro.org/people/pmaydell/qemu-arm.git arm-devs.for-upstream


Pulled.  Thanks.

Regards,

Anthony Liguori



Peter Chubb (1):
   Fix sp804 dual-timer

Peter Maydell (7):
   hw/arm_mptimer.c: Turn ARM MPcore private timers into qdev devices
   hw/arm_gic: Expose GIC CPU interfaces as sysbus memory regions
   hw/mpcore.c: Use the GIC memory regions for the CPU interface
   hw/realview_gic: Use GIC memory region for the CPU interface
   hw/mpcore: Clean up mpcore_priv_read/write as they are now SCU only
   hw/a9mpcore.c: Implement A9MP peripherals rather than 11MPcore ones
   hw/mpcore.c: Merge with hw/arm11mpcore.c

  Makefile.target   |1 +
  hw/a9mpcore.c |  189 --
  hw/arm11mpcore.c  |  130 +-
  hw/arm_gic.c  |   75 -
  hw/arm_mptimer.c  |  332 +
  hw/arm_timer.c|   41 ++-
  hw/mpcore.c   |  283 -
  hw/realview_gic.c |   25 +
  8 files changed, 751 insertions(+), 325 deletions(-)
  create mode 100644 hw/arm_mptimer.c
  delete mode 100644 hw/mpcore.c

Re: [Qemu-devel] [PULL 00/22]: QMP queue

2011-12-12 Thread Anthony Liguori


On 12/06/2011 11:54 AM, Luiz Capitulino wrote:

Anthony,

This pull request contains my round 3 QAPI conversion patches, the new QMP
visitor tests, some documentation and the qmp test tool.

The changes (since 217bfb445b54db618a30f3a39170bebd9fd9dbf2) are available
in the following repository:

 git://repo.or.cz/qemu/qmp-unstable.git queue/qmp


Pulled.  Thanks.

Regards,

Anthony Liguori



Luiz Capitulino (21):
   docs: Add writing-qmp-commands.txt
   configure: Don't mix glib and libcheck tests
   Introduce test-qmp-output-visitor
   Introduce test-qmp-input-visitor
   Drop test-visitor
   qapi: Complete system_powerdown conversion
   console: Drop unused prototypes
   QError: Introduce QERR_IO_ERROR
   qapi: Convert memsave
   qapi: Convert pmemsave
   qapi: Convert cont
   qapi: Convert inject-nmi
   qapi: Convert set_link
   qapi: Convert block_passwd
   qapi: Convert balloon
   qapi: Convert block_resize
   qapi: Convert blockdev_snapshot_sync
   qapi: Convert human-monitor-command
   qapi: Convert migrate_cancel
   qapi: Convert migrate_set_downtime
   qapi: Convert migrate_set_speed

Mark Wu (1):
   qmp: add test tool for QMP

  Makefile  |9 +-
  QMP/qmp   |  126 
  balloon.c |   28 +--
  balloon.h |3 -
  blockdev.c|   89 +++
  blockdev.h|4 -
  configure |5 +-
  console.h |2 -
  cpus.c|   90 ++
  docs/writing-qmp-commands.txt |  642 +
  hmp-commands.hx   |   36 +--
  hmp.c |  148 ++
  hmp.h |   12 +
  migration.c   |   28 +--
  migration.h   |7 -
  monitor.c |  201 ++---
  monitor.h |3 +
  net.c |   10 +-
  net.h |1 -
  qapi-schema-test.json |6 +
  qapi-schema.json  |  267 +
  qerror.c  |4 +
  qerror.h  |3 +
  qmp-commands.hx   |   77 +
  qmp.c |   37 +++
  test-qmp-input-visitor.c  |  270 +
  test-qmp-output-visitor.c |  423 +++
  test-visitor.c|  338 --
  28 files changed, 2153 insertions(+), 716 deletions(-)

[Qemu-devel] [PATCH v2 3/4] qemu-thread: implement joinable threads for Win32

2011-12-12 Thread Paolo Bonzini

Rewrite the handshaking between qemu_thread_create and the
win32_start_routine, so that the thread can be joined without races.
Similar handshaking is done now between qemu_thread_exit and
qemu_thread_join.

This also simplifies how QemuThreads are initialized.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 qemu-thread-win32.c |  107 +--
 qemu-thread-win32.h |5 +-
 3 files changed, 73 insertions(+), 41 deletions(-)

diff --git a/qemu-thread-win32.c b/qemu-thread-win32.c
index ff80e84..a13ffcc 100644
--- a/qemu-thread-win32.c
+++ b/qemu-thread-win32.c
@@ -193,41 +193,78 @@ void qemu_cond_wait(QemuCond *cond, QemuMutex *mutex)
 }
 
 struct QemuThreadData {
-QemuThread *thread;
-void *(*start_routine)(void *);
-void *arg;
+/* Passed to win32_start_routine.  */
+void *(*start_routine)(void *);
+void *arg;
+short mode;
+
+/* Only used for joinable threads. */
+bool  exited;
+void *ret;
+CRITICAL_SECTION  cs;
 };
 
 static int qemu_thread_tls_index = TLS_OUT_OF_INDEXES;
 
 static unsigned __stdcall win32_start_routine(void *arg)
 {
-struct QemuThreadData data = *(struct QemuThreadData *) arg;
-QemuThread *thread = data.thread;
-
-free(arg);
-TlsSetValue(qemu_thread_tls_index, thread);
-
-/*
- * Use DuplicateHandle instead of assigning thread-thread in the
- * creating thread to avoid races.  It's simpler this way than with
- * synchronization.
- */
-DuplicateHandle(GetCurrentProcess(), GetCurrentThread(),
-GetCurrentProcess(), thread-thread,
-0, FALSE, DUPLICATE_SAME_ACCESS);
-
-qemu_thread_exit(data.start_routine(data.arg));
+QemuThreadData *data = (QemuThreadData *) arg;
+void *(*start_routine)(void *) = data-start_routine;
+void *thread_arg = data-arg;
+
+if (data-mode == QEMU_THREAD_DETACHED) {
+g_free(data);
+data = NULL;
+} else {
+InitializeCriticalSection(data-cs);
+}
+TlsSetValue(qemu_thread_tls_index, data);
+qemu_thread_exit(start_routine(thread_arg));
 abort();
 }
 
 void qemu_thread_exit(void *arg)
 {
-QemuThread *thread = TlsGetValue(qemu_thread_tls_index);
-thread-ret = arg;
-CloseHandle(thread-thread);
-thread-thread = NULL;
-ExitThread(0);
+QemuThreadData *data = TlsGetValue(qemu_thread_tls_index);
+if (data) {
+data-ret = arg;
+EnterCriticalSection(data-cs);
+data-exited = true;
+LeaveCriticalSection(data-cs);
+}
+_endthreadex(0);
+}
+
+void *qemu_thread_join(QemuThread *thread)
+{
+QemuThreadData *data;
+void *ret;
+HANDLE handle;
+
+data = thread-data;
+if (!data) {
+return NULL;
+}
+/*
+ * Because multiple copies of the QemuThread can exist via
+ * qemu_thread_get_self, we need to store a value that cannot
+ * leak there.  The simplest, non racy way is to store the TID,
+ * discard the handle that _beginthreadex gives back, and
+ * get another copy of the handle here.
+ */
+EnterCriticalSection(data-cs);
+if (!data-exited) {
+handle = OpenThread(SYNCHRONIZE, FALSE, thread-tid);
+LeaveCriticalSection(data-cs);
+WaitForSingleObject(handle, INFINITE);
+CloseHandle(handle);
+} else {
+LeaveCriticalSection(data-cs);
+}
+ret = data-ret;
+DeleteCriticalSection(data-cs);
+g_free(data);
+return ret;
 }
 
 static inline void qemu_thread_init(void)
@@ -247,37 +284,31 @@ void qemu_thread_create(QemuThread *thread,
 {
 HANDLE hThread;
 
-assert(mode == QEMU_THREAD_DETACHED);
-
 struct QemuThreadData *data;
 qemu_thread_init();
 data = g_malloc(sizeof *data);
-data-thread = thread;
 data-start_routine = start_routine;
 data-arg = arg;
+data-mode = mode;
+data-exited = false;
 
 hThread = (HANDLE) _beginthreadex(NULL, 0, win32_start_routine,
-  data, 0, NULL);
+  data, 0, thread-tid);
 if (!hThread) {
 error_exit(GetLastError(), __func__);
 }
 CloseHandle(hThread);
+thread-data = (mode == QEMU_THREAD_DETACHED) ? NULL : data;
 }
 
 void qemu_thread_get_self(QemuThread *thread)
 {
-if (!thread-thread) {
-/* In the main thread of the process.  Initialize the QemuThread
-   pointer in TLS, and use the dummy GetCurrentThread handle as
-   the identifier for qemu_thread_is_self.  */
-qemu_thread_init();
-TlsSetValue(qemu_thread_tls_index, thread);
-thread-thread = GetCurrentThread();
-}
+qemu_thread_init();
+thread-data = TlsGetValue(qemu_thread_tls_index);
+thread-tid = GetCurrentThreadId();
 }
 
 int qemu_thread_is_self(QemuThread *thread)
 {
-QemuThread *this_thread =

Re: [Qemu-devel] [PATCH] Mark future contributions to GPLv2-only files as GPLv2+

2011-12-12 Thread Anthony Liguori


On 10/21/2011 09:03 AM, Paolo Bonzini wrote:

Even for files are licensed GPLv2-only, let's not play catch with
ourselves, and explicitly declare that future contributions to those
files will also be available as any later version.

Signed-off-by: Paolo Bonzinipbonz...@redhat.com


So where do we stand with this?

Regards,

Anthony Liguori


---
  aio.c  |2 ++
  block-migration.c  |2 ++
  block/raw-posix-aio.h  |2 ++
  block/rbd.c|2 ++
  block/sheepdog.c   |3 +++
  buffered_file.c|2 ++
  compatfd.c |2 ++
  hmp.c  |2 ++
  hw/ac97.c  |3 +++
  hw/acpi.c  |3 +++
  hw/acpi_piix4.c|3 +++
  hw/ads7846.c   |3 +++
  hw/apm.c   |3 +++
  hw/bitbang_i2c.c   |3 +++
  hw/bonito.c|3 +++
  hw/collie.c|3 +++
  hw/ds1338.c|3 +++
  hw/ecc.c   |3 +++
  hw/event_notifier.c|3 +++
  hw/framebuffer.c   |3 +++
  hw/gumstix.c   |3 +++
  hw/ivshmem.c   |3 +++
  hw/kvmclock.c  |2 ++
  hw/lan9118.c   |3 +++
  hw/mainstone.c |3 +++
  hw/marvell_88w8618_audio.c |3 +++
  hw/max111x.c   |3 +++
  hw/mips_fulong2e.c |3 +++
  hw/msix.c  |3 +++
  hw/mst_fpga.c  |3 +++
  hw/musicpal.c  |3 +++
  hw/nand.c  |3 +++
  hw/pl031.c |2 ++
  hw/pxa2xx_keypad.c |3 +++
  hw/pxa2xx_lcd.c|3 +++
  hw/pxa2xx_mmci.c   |3 +++
  hw/pxa2xx_pcmcia.c |3 +++
  hw/smbios.c|2 ++
  hw/spitz.c |3 +++
  hw/ssi-sd.c|3 +++
  hw/ssi.c   |3 +++
  hw/strongarm.c |3 +++
  hw/tc6393xb.c  |3 +++
  hw/tosa.c  |3 +++
  hw/vexpress.c  |3 +++
  hw/vhost.c |3 +++
  hw/vhost_net.c |3 +++
  hw/virtio-pci.c|2 ++
  hw/virtio-serial-bus.c |3 +++
  hw/vt82c686.c  |3 +++
  hw/xen_backend.c   |3 +++
  hw/xen_disk.c  |3 +++
  hw/xen_nic.c   |3 +++
  hw/z2.c|3 +++
  iov.c  |3 +++
  memory.c   |2 ++
  migration-exec.c   |2 ++
  migration-fd.c |2 ++
  migration-tcp.c|2 ++
  migration-unix.c   |2 ++
  migration.c|2 ++
  module.c   |2 ++
  net/checksum.c |3 +++
  notify.c   |2 ++
  pflib.c|2 ++
  posix-aio-compat.c |2 ++
  qemu-tool.c|2 ++
  qmp.c  |2 ++
  roms/SLOF  |2 +-
  xen-all.c  |2 ++
  xen-mapcache.c |2 ++
  xen-stub.c |2 ++
  72 files changed, 188 insertions(+), 1 deletions(-)

diff --git a/aio.c b/aio.c
index 1239ca7..c6f3cb1 100644
--- a/aio.c
+++ b/aio.c
@@ -9,6 +9,8 @@
   * This work is licensed under the terms of the GNU GPL, version 2.  See
   * the COPYING file in the top-level directory.
   *
+ * Contributions after 2011-10-25 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
   */

  #include qemu-common.h
diff --git a/block-migration.c b/block-migration.c
index 0bff075..32c2eea 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -9,6 +9,8 @@
   * This work is licensed under the terms of the GNU GPL, version 2.  See
   * the COPYING file in the top-level directory.
   *
+ * Contributions after 2011-10-25 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
   */

  #include qemu-common.h
diff --git a/block/raw-posix-aio.h b/block/raw-posix-aio.h
index dfc63b8..d6d7275 100644
--- a/block/raw-posix-aio.h
+++ b/block/raw-posix-aio.h
@@ -9,6 +9,8 @@
   * This work is licensed under the terms of the GNU GPL, version 2.  See
   * the COPYING file in the top-level directory.
   *
+ * Contributions after 2011-10-25 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
   */
  #ifndef QEMU_RAW_POSIX_AIO_H
  #define QEMU_RAW_POSIX_AIO_H
diff --git a/block/rbd.c b/block/rbd.c
index 3068c82..b726c80 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -7,6 +7,8 @@
   * This work is licensed under the terms of the GNU GPL, version 2.  See
   * the COPYING file in the top-level directory.
   *
+ * Contributions after 2011-10-25 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
   */

  #includeinttypes.h
diff

Re: [Qemu-devel] [PATCH v4 00/15] uq/master: Introduce basic irqchip support

2011-12-12 Thread Marcelo Tosatti

On Thu, Dec 08, 2011 at 12:52:19PM +0100, Jan Kiszka wrote:
 Changes in v4:
 - rebased of current uq/master
 - fixed stupid bugs that broke bisectability and user space irqchip mode
 - integrated NMI-over-LINT1 injection logic
 
 CC: Lai Jiangshan la...@cn.fujitsu.com
 
 Jan Kiszka (15):
   msi: Generalize msix_supported to msi_supported
   kvm: Move kvmclock into hw/kvm folder
   apic: Stop timer on reset
   apic: Inject external NMI events via LINT1
   apic: Introduce backend/frontend infrastructure for KVM reuse
   apic: Open-code timer save/restore
   i8259: Introduce backend/frontend infrastructure for KVM reuse
   ioapic: Introduce backend/frontend infrastructure for KVM reuse
   memory: Introduce memory_region_init_reservation
   kvm: Introduce core services for in-kernel irqchip support
   kvm: x86: Establish IRQ0 override control
   kvm: x86: Add user space part for in-kernel APIC
   kvm: x86: Add user space part for in-kernel i8259
   kvm: x86: Add user space part for in-kernel IOAPIC
   kvm: Arm in-kernel irqchip support

Looks good to me.

Any thoughts on the qemu-kvm merge plan? Sounds painful.

Re: [Qemu-devel] [PATCH 01/11] isa: give ISABus/ISADevice to isa_create(), isa_bus_irqs() and isa_get_irq() functions

2011-12-12 Thread Anthony Liguori


On 10/24/2011 03:18 PM, Hervé Poussineau wrote:

NULL is a valid bus/device, so there is no change in behaviour.

Signed-off-by: Hervé Poussineauhpous...@reactos.org


Reviewed-by: Anthony Liguori aligu...@us.ibm.com

Regards,

Anthony Liguori


---
  arch_init.c|8 
  arch_init.h|2 +-
  hw/adlib.c |2 +-
  hw/alpha_dp264.c   |   10 ++
  hw/alpha_typhoon.c |7 ---
  hw/audiodev.h  |8 
  hw/cs4231a.c   |4 ++--
  hw/fdc.h   |4 ++--
  hw/gus.c   |4 ++--
  hw/i8254.c |2 +-
  hw/i8259.c |6 +++---
  hw/ide.h   |2 +-
  hw/ide/isa.c   |4 ++--
  hw/ide/piix.c  |2 +-
  hw/ide/via.c   |2 +-
  hw/isa-bus.c   |   18 +++---
  hw/isa.h   |   10 +-
  hw/m48t59.c|5 +++--
  hw/mc146818rtc.c   |4 ++--
  hw/mc146818rtc.h   |2 +-
  hw/mips_fulong2e.c |   16 +---
  hw/mips_jazz.c |   13 +++--
  hw/mips_malta.c|   26 ++
  hw/mips_r4k.c  |   21 +++--
  hw/nvram.h |3 ++-
  hw/pc.c|   30 +++---
  hw/pc.h|   35 ++-
  hw/pc_piix.c   |   19 +++
  hw/pcspk.c |2 +-
  hw/ppc_prep.c  |   20 +++-
  hw/sb16.c  |4 ++--
  hw/sun4u.c |   20 
  qemu-common.h  |1 +
  33 files changed, 171 insertions(+), 145 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index a411fdf..3bc2a41 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -473,7 +473,7 @@ struct soundhw {
  int enabled;
  int isa;
  union {
-int (*init_isa) (qemu_irq *pic);
+int (*init_isa) (ISABus *bus, qemu_irq *pic);
  int (*init_pci) (PCIBus *bus);
  } init;
  };
@@ -628,7 +628,7 @@ void select_soundhw(const char *optarg)
  }
  }

-void audio_init(qemu_irq *isa_pic, PCIBus *pci_bus)
+void audio_init(ISABus *isa_bus, qemu_irq *isa_pic, PCIBus *pci_bus)
  {
  struct soundhw *c;

@@ -636,7 +636,7 @@ void audio_init(qemu_irq *isa_pic, PCIBus *pci_bus)
  if (c-enabled) {
  if (c-isa) {
  if (isa_pic) {
-c-init.init_isa(isa_pic);
+c-init.init_isa(isa_bus, isa_pic);
  }
  } else {
  if (pci_bus) {
@@ -650,7 +650,7 @@ void audio_init(qemu_irq *isa_pic, PCIBus *pci_bus)
  void select_soundhw(const char *optarg)
  {
  }
-void audio_init(qemu_irq *isa_pic, PCIBus *pci_bus)
+void audio_init(ISABus *isa_bus, qemu_irq *isa_pic, PCIBus *pci_bus)
  {
  }
  #endif
diff --git a/arch_init.h b/arch_init.h
index a74187a..074f02a 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -27,7 +27,7 @@ void do_acpitable_option(const char *optarg);
  void do_smbios_option(const char *optarg);
  void cpudef_init(void);
  int audio_available(void);
-void audio_init(qemu_irq *isa_pic, PCIBus *pci_bus);
+void audio_init(ISABus *isa_bus, qemu_irq *isa_pic, PCIBus *pci_bus);
  int tcg_available(void);
  int kvm_available(void);
  int xen_available(void);
diff --git a/hw/adlib.c b/hw/adlib.c
index e4bfcc6..b5e1564 100644
--- a/hw/adlib.c
+++ b/hw/adlib.c
@@ -275,7 +275,7 @@ static void Adlib_fini (AdlibState *s)
  AUD_remove_card (s-card);
  }

-int Adlib_init (qemu_irq *pic)
+int Adlib_init (ISABus *bus, qemu_irq *pic)
  {
  AdlibState *s =glob_adlib;
  struct audsettings as;
diff --git a/hw/alpha_dp264.c b/hw/alpha_dp264.c
index fcc20e9..a87d6ef 100644
--- a/hw/alpha_dp264.c
+++ b/hw/alpha_dp264.c
@@ -50,6 +50,7 @@ static void clipper_init(ram_addr_t ram_size,
  {
  CPUState *cpus[4];
  PCIBus *pci_bus;
+ISABus *isa_bus;
  qemu_irq rtc_irq;
  long size, i;
  const char *palcode_filename;
@@ -68,10 +69,11 @@ static void clipper_init(ram_addr_t ram_size,

  /* Init the chipset.  */
  pci_bus = typhoon_init(ram_size,rtc_irq, cpus, clipper_pci_map_irq);
+isa_bus = NULL;

-rtc_init(1980, rtc_irq);
-pit_init(0x40, 0);
-isa_create_simple(i8042);
+rtc_init(isa_bus, 1980, rtc_irq);
+pit_init(isa_bus, 0x40, 0);
+isa_create_simple(isa_bus, i8042);

  /* VGA setup.  Don't bother loading the bios.  */
  alpha_pci_vga_setup(pci_bus);
@@ -79,7 +81,7 @@ static void clipper_init(ram_addr_t ram_size,
  /* Serial code setup.  */
  for (i = 0; i  MAX_SERIAL_PORTS; ++i) {
  if (serial_hds[i]) {
-serial_isa_init(i, serial_hds[i]);
+serial_isa_init(isa_bus, i, serial_hds[i]);
  }
  }

diff --git a/hw/alpha_typhoon.c b/hw/alpha_typhoon.c
index c7608bb..113837d 100644
--- a/hw/alpha_typhoon.c
+++ b/hw/alpha_typhoon.c
@@ -791,11 +791,12 @@ PCIBus *typhoon_init(ram_addr_t ram_size, qemu_irq 
*p_rtc_irq,
  /* ??? Technically there should be a cy82c693ub pci-isa bridge.  */
  {

Re: [Qemu-devel] [PATCH 02/11] isa: move ISABus structure definition to header file

2011-12-12 Thread Anthony Liguori


On 10/24/2011 03:18 PM, Hervé Poussineau wrote:


Signed-off-by: Hervé Poussineauhpous...@reactos.org


Reviewed-by: Anthony Liguori aligu...@us.ibm.com

Regards,

Anthony Liguori


---
  hw/isa-bus.c |5 -
  hw/isa.h |6 ++
  2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/hw/isa-bus.c b/hw/isa-bus.c
index dcbb134..7c94f0b 100644
--- a/hw/isa-bus.c
+++ b/hw/isa-bus.c
@@ -22,11 +22,6 @@
  #include isa.h
  #include exec-memory.h

-struct ISABus {
-BusState qbus;
-MemoryRegion *address_space_io;
-qemu_irq *irqs;
-};
  static ISABus *isabus;
  target_phys_addr_t isa_mem_base = 0;

diff --git a/hw/isa.h b/hw/isa.h
index 4b58e37..0462521 100644
--- a/hw/isa.h
+++ b/hw/isa.h
@@ -13,6 +13,12 @@ typedef struct ISABus ISABus;
  typedef struct ISADevice ISADevice;
  typedef struct ISADeviceInfo ISADeviceInfo;

+struct ISABus {
+BusState qbus;
+MemoryRegion *address_space_io;
+qemu_irq *irqs;
+};
+
  struct ISADevice {
  DeviceState qdev;
  uint32_t isairq[2];

Re: [Qemu-devel] [PATCH 03/11] i8259: give ISA device to isa_register_ioport()

2011-12-12 Thread Anthony Liguori


On 10/24/2011 03:18 PM, Hervé Poussineau wrote:


Signed-off-by: Hervé Poussineauhpous...@reactos.org


Reviewed-by: Anthony Liguori aligu...@us.ibm.com

Regards,

Anthony Liguori


---
  hw/i8259.c |4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/i8259.c b/hw/i8259.c
index 4446339..7331e0e 100644
--- a/hw/i8259.c
+++ b/hw/i8259.c
@@ -469,9 +469,9 @@ static int pic_initfn(ISADevice *dev)
  memory_region_init_io(s-base_io,pic_base_ioport_ops, s, pic, 2);
  memory_region_init_io(s-elcr_io,pic_elcr_ioport_ops, s, elcr, 1);

-isa_register_ioport(NULL,s-base_io, s-iobase);
+isa_register_ioport(dev,s-base_io, s-iobase);
  if (s-elcr_addr != -1) {
-isa_register_ioport(NULL,s-elcr_io, s-elcr_addr);
+isa_register_ioport(dev,s-elcr_io, s-elcr_addr);
  }

  qdev_init_gpio_out(dev-qdev, s-int_out, ARRAY_SIZE(s-int_out));

Re: [Qemu-devel] [PATCH 04/11] pc: give ISA bus to ISA methods

2011-12-12 Thread Anthony Liguori


On 10/24/2011 03:18 PM, Hervé Poussineau wrote:


Signed-off-by: Hervé Poussineauhpous...@reactos.org


Reviewed-by: Anthony Liguori aligu...@us.ibm.com

Regards,

Anthony Liguori


---
  hw/pc.h   |2 +-
  hw/pc_piix.c  |3 +--
  hw/piix_pci.c |8 +---
  3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/hw/pc.h b/hw/pc.h
index c43fa73..127940c 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -181,7 +181,7 @@ struct PCII440FXState;
  typedef struct PCII440FXState PCII440FXState;

  PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int *piix_devfn,
-qemu_irq *pic,
+ISABus **isa_bus, qemu_irq *pic,
  MemoryRegion *address_space_mem,
  MemoryRegion *address_space_io,
  ram_addr_t ram_size,
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 6bc1f60..be91d3b 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -135,7 +135,7 @@ static void pc_init1(MemoryRegion *system_memory,
  gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);

  if (pci_enabled) {
-pci_bus = i440fx_init(i440fx_state,piix3_devfn, gsi,
+pci_bus = i440fx_init(i440fx_state,piix3_devfn,isa_bus, gsi,
system_memory, system_io, ram_size,
below_4g_mem_size,
0x1ULL - below_4g_mem_size,
@@ -144,7 +144,6 @@ static void pc_init1(MemoryRegion *system_memory,
 ? 0
 : ((uint64_t)1  62)),
pci_memory, ram_memory);
-isa_bus = NULL;
  } else {
  pci_bus = NULL;
  i440fx_state = NULL;
diff --git a/hw/piix_pci.c b/hw/piix_pci.c
index d183443..aef2d7f 100644
--- a/hw/piix_pci.c
+++ b/hw/piix_pci.c
@@ -263,7 +263,7 @@ static int i440fx_initfn(PCIDevice *dev)
  static PCIBus *i440fx_common_init(const char *device_name,
PCII440FXState **pi440fx_state,
int *piix3_devfn,
-  qemu_irq *pic,
+  ISABus **isa_bus, qemu_irq *pic,
MemoryRegion *address_space_mem,
MemoryRegion *address_space_io,
ram_addr_t ram_size,
@@ -325,6 +325,8 @@ static PCIBus *i440fx_common_init(const char *device_name,
  PIIX_NUM_PIRQS);
  }
  piix3-pic = pic;
+*isa_bus = DO_UPCAST(ISABus, qbus,
+ qdev_get_child_bus(piix3-dev.qdev, isa.0));

  (*pi440fx_state)-piix3 = piix3;

@@ -341,7 +343,7 @@ static PCIBus *i440fx_common_init(const char *device_name,
  }

  PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int *piix3_devfn,
-qemu_irq *pic,
+ISABus **isa_bus, qemu_irq *pic,
  MemoryRegion *address_space_mem,
  MemoryRegion *address_space_io,
  ram_addr_t ram_size,
@@ -354,7 +356,7 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int 
*piix3_devfn,
  {
  PCIBus *b;

-b = i440fx_common_init(i440FX, pi440fx_state, piix3_devfn, pic,
+b = i440fx_common_init(i440FX, pi440fx_state, piix3_devfn, isa_bus, pic,
 address_space_mem, address_space_io, ram_size,
 pci_hole_start, pci_hole_size,
 pci_hole64_size, pci_hole64_size,

Re: [Qemu-devel] [PATCH 00/11] isa: preliminary work for multiple buses

2011-12-12 Thread Anthony Liguori


On 10/24/2011 03:18 PM, Hervé Poussineau wrote:

Current patches are a rework of my patches already available at [1].
They don't provide full support for multiple ISA buses (yet), but
add a ISABus or ISADevice argument to all ISA functions.
They are mostly mechanically touching every instanciation of ISA
devices, so number of lines is quite high even if impact is quite low.

Some patches don't pass checkpass check due to spaces around
parentheses, but malc asked to do so on files he maintains.

Some more patches will be provided after Qemu 1.0 to support multiple
ISA buses, but will mostly touch ISA bridges and hw/isa-bus.c file.

I think that this first step can be applied now (before release),
so ISA interface may be considered stable for devices and machine
emulations.

Please consider applying this before Qemu 1.0.


Reviewed-by: Anthony Liguori aligu...@us.ibm.com

But could you rebase this series?  It doesn't apply very well right now.  This 
is a very nice cleanup, sorry it's taken so long to get it applied.


Regards,

Anthony Liguori



Thanks

[1] http://lists.gnu.org/archive/html/qemu-devel/2011-10/msg00094.html

Hervé Poussineau (11):
   isa: give ISABus/ISADevice to isa_create(), isa_bus_irqs() and
 isa_get_irq() functions
   isa: move ISABus structure definition to header file
   i8259: give ISA device to isa_register_ioport()
   pc: give ISA bus to ISA methods
   alpha: give ISA bus to ISA methods
   sun4u: give ISA bus to ISA methods
   fulong2e: give ISA bus to ISA methods
   malta: give ISA bus to ISA methods
   isa: always use provided ISA bus when creating an isa device
   isa: always use provided ISA bus in isa_bus_irqs()
   audio: remove unused parameter isa_pic

  arch_init.c|   10 +-
  arch_init.h|2 +-
  hw/adlib.c |2 +-
  hw/alpha_dp264.c   |   12 +++-
  hw/alpha_sys.h |3 ++-
  hw/alpha_typhoon.c |9 +
  hw/audiodev.h  |8 
  hw/cs4231a.c   |4 ++--
  hw/fdc.h   |4 ++--
  hw/gus.c   |4 ++--
  hw/i8254.c |2 +-
  hw/i8259.c |   10 +-
  hw/ide.h   |2 +-
  hw/ide/isa.c   |4 ++--
  hw/ide/piix.c  |2 +-
  hw/ide/via.c   |2 +-
  hw/isa-bus.c   |   33 -
  hw/isa.h   |   16 +++-
  hw/m48t59.c|5 +++--
  hw/mc146818rtc.c   |4 ++--
  hw/mc146818rtc.h   |2 +-
  hw/mips_fulong2e.c |   20 ++--
  hw/mips_jazz.c |   13 +++--
  hw/mips_malta.c|   27 ++-
  hw/mips_r4k.c  |   21 +++--
  hw/nvram.h |3 ++-
  hw/pc.c|   30 +++---
  hw/pc.h|   39 ---
  hw/pc_piix.c   |   20 +++-
  hw/pcspk.c |2 +-
  hw/piix4.c |3 ++-
  hw/piix_pci.c  |8 +---
  hw/ppc_prep.c  |   20 +++-
  hw/sb16.c  |4 ++--
  hw/sun4u.c |   24 +++-
  hw/vt82c686.c  |4 ++--
  hw/vt82c686.h  |2 +-
  qemu-common.h  |1 +
  38 files changed, 205 insertions(+), 176 deletions(-)

Re: [Qemu-devel] [PATCH v4 00/15] uq/master: Introduce basic irqchip support

2011-12-12 Thread Jan Kiszka

On 2011-12-12 17:37, Marcelo Tosatti wrote:
 On Thu, Dec 08, 2011 at 12:52:19PM +0100, Jan Kiszka wrote:
 Changes in v4:
 - rebased of current uq/master
 - fixed stupid bugs that broke bisectability and user space irqchip mode
 - integrated NMI-over-LINT1 injection logic

 CC: Lai Jiangshan la...@cn.fujitsu.com

 Jan Kiszka (15):
   msi: Generalize msix_supported to msi_supported
   kvm: Move kvmclock into hw/kvm folder
   apic: Stop timer on reset
   apic: Inject external NMI events via LINT1
   apic: Introduce backend/frontend infrastructure for KVM reuse
   apic: Open-code timer save/restore
   i8259: Introduce backend/frontend infrastructure for KVM reuse
   ioapic: Introduce backend/frontend infrastructure for KVM reuse
   memory: Introduce memory_region_init_reservation
   kvm: Introduce core services for in-kernel irqchip support
   kvm: x86: Establish IRQ0 override control
   kvm: x86: Add user space part for in-kernel APIC
   kvm: x86: Add user space part for in-kernel i8259
   kvm: x86: Add user space part for in-kernel IOAPIC
   kvm: Arm in-kernel irqchip support
 
 Looks good to me.
 
 Any thoughts on the qemu-kvm merge plan? Sounds painful.

Pain will be where the existing qemu-kvm extensions collide with these
refactored upstream devices (backend/frontend split specifically).
That's where we have to merge very carefully. Haven't tried this yet,
will give it a spin tomorrow or so.

From that point on, disabling the new stuff for now and at some point
switching over from the old one should be simple again.

BTW, PIT+HPET+speaker will cause similar issues for the same reasons.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC

2011-12-12 Thread Alon Levy

On Mon, Dec 12, 2011 at 10:24:53AM -0600, Anthony Liguori wrote:
 On 12/12/2011 10:00 AM, Alon Levy wrote:
 On Mon, Dec 12, 2011 at 03:23:35PM +, Stefan Hajnoczi wrote:
 On Mon, Dec 12, 2011 at 12:39 PM, Gerd Hoffmannkra...@redhat.com  wrote:
 On 12/12/11 13:10, Stefan Hajnoczi wrote:
 On Mon, Dec 12, 2011 at 11:29 AM, Gerd Hoffmannkra...@redhat.com  wrote:
 On 12/12/11 11:18, Stefan Hajnoczi wrote:
 On Sun, Dec 11, 2011 at 10:29 AM, Alon Levyal...@redhat.com  wrote:
 On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote:
 Hi there,
 
 I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns 
 out that
 the command client_migrate_info uses it. That's a legacy interface 
 and has to
 be dropped, no command should be using it...
 
 Something tells me that if I just drop it (and change the command to 
 use the
 regular interface), bad things will happen. Am I right? :)
 
 
 The monitor command client_migrate_info needs to complete after getting
 an ACK message from the currently connected spice client (this is the
 only case where this is required - if there is no client then the
 MONITOR_CMD_ASYNC API won't be used). This in turn requires the main
 thread to perform select and call the callback that will process this
 ACK. That's why the MONITOR_CMD_ASYNC API was used.
 
 I'm not aware of any other way to do this, I'll be glad for any help
 here. Also, I understand this is not what is not true async, since one
 would expect a true async interface to support multiple in flight
 monitor commands. If there is any ETA or existing way to do this we
 could change the implementation of client_migrate_info.
 
 Is it possible to use a QMP event to signal completion instead of a
 MONITOR_CMD_ASYNC command?
 
 Problem is this breaks the qemu-  libvirt interface.
 
 I had the same issue with the block_job_cancel command, which Adam
 Litke and Eric Blake helped us fix and find a backward-compatible
 libvirt solution for:
 
 http://www.redhat.com/archives/libvir-list/2011-November/msg01351.html
 
 Isn't going to fly as waiting for completion isn't optional in that
 case.  Workflow is this:
 
 (1) libvirt issues client_migrate_info command.
 (2) qemu forwards it to spice-server, which in turn forwards it to
 the spice client (if connected).
 (3) spice client connects to the migration target machine.
 (4) spice client signals completion to spice-server, which in turn
 notifies qemu.
 (5) qemu calls the monitor completion callback, libvirt gets
 client_migrate_info result.
 (6) libvirt issues migrate command.
 
 The problem is that (3) must be finished before (6) because qemu on the
 target side doesn't accept incoming tcp connections any more once the
 migration started.
 
 I don't see a way to switch this to qmp events without breaking old
 libvirt versions managing new qemu.
 
 I don't see a solution in this case either.  Looks like libvirt
 supports this command since 0.9.2 so it's not a good idea to just yank
 it.
 
 It might be possible for the QEMU client_migrate_info handler to run a
 nested event loop in the legacy libvirt case.  This would suck since
 the VM is unresponsive while waiting for spice migration to complete.
 New libvirt would call the async version of the command which is
 well-behaved and uses a QMP event to signal completion.
 
 I agree that a nested event loop would be a bad solution, the point is
 to let the guest continue to work while waiting, otherwise you destroy
 the live migration experience, might as well disconnect the client from
 the source and have it connect to the target.
 
 I thought the whole point of MONITOR_CMD_ASYNC was to allow this
 scenario. So iiuc QMP is the alternative but it would require a rewrite,
 i.e. break existing users like libvirt. Hence my suggestion as a reply
 to Luiz's initial email that we just deprecate MONITOR_CMD_ASYNC right
 now instead of dropping it,
 
 It *has* to be dropped.  Any command using it is fundamentally broken.
 

Care to explain what is fundamental about it?

 The command should have never been introduced in the first place to use async.

Gerd gave a very good explanation of the requirement, I don't think I
can add anything to it. What you suggest is to rewrite it, I just don't
understand why we can't leave the MONITOR_CMD_ASYNC around until there
is both an alternative and libvirt learns to use it.

 
 Regards,
 
 Anthony Liguori

Re: [Qemu-devel] [PATCH] ARM - Remove fixed map code buffer restriction

2011-12-12 Thread andrzej zaborowski

Hi,

On 12 December 2011 16:55, Peter Maydell peter.mayd...@linaro.org wrote:
 On 12 December 2011 15:37, Dr. David Alan Gilbert
 david.gilb...@linaro.org wrote:
 On ARM, don't map the code buffer at a fixed location, and fix up the
 call/goto tcg routines to let it do long jumps.

 Mapping the code buffer at a fixed address could sometimes result in it being
 mapped over the top of the heap with pretty random results.

 This diff is against v1.0.

 Patches should always be against master, although we're still
 close enough to 1.0 that this will probably apply anyway.

 Comments like This diff is against v1.0. go below the '---' so they
 don't appear in the final commit log.

 Code generally looks OK to me.

To me as well, I'll apply if there are no objections.  I remember
Peter asked about MAP_FIXED on other hosts, has there been a
resolution?


 Signed-off-by: Dr. David Alan Gilbert david.gilb...@linaro.org
 ---
  exec.c               |    4 +---
  tcg/arm/tcg-target.c |   31 ---
  2 files changed, 13 insertions(+), 22 deletions(-)

 diff --git a/exec.c b/exec.c
 index 6b92198..ef83da1 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -497,9 +497,7 @@ static void code_gen_alloc(unsigned long tb_size)
         if (code_gen_buffer_size  (512 * 1024 * 1024))
             code_gen_buffer_size = (512 * 1024 * 1024);
  #elif defined(__arm__)
 -        /* Map the buffer below 32M, so we can use direct calls and 
 branches */
 -        flags |= MAP_FIXED;
 -        start = (void *) 0x0100UL;
 +        /* Keep the buffer no bigger than 16GB to branch between blocks */
         if (code_gen_buffer_size  16 * 1024 * 1024)
             code_gen_buffer_size = 16 * 1024 * 1024;
  #elif defined(__s390x__)
 diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
 index e05a64f..730d913 100644
 --- a/tcg/arm/tcg-target.c
 +++ b/tcg/arm/tcg-target.c
 @@ -842,6 +842,12 @@ static inline void tcg_out_st8(TCGContext *s, int cond,
         tcg_out_st8_12(s, cond, rd, rn, offset);
  }

 +/* The _goto case is normally between TBs within the same code buffer,
 +   and with the code buffer limited to 16GB we shouldn't need the long
 +   case.
 +
 +    except to the prologue that is in its own buffer.
 +     */
  static inline void tcg_out_goto(TCGContext *s, int cond, uint32_t addr)
  {
     int32_t val;
 @@ -855,22 +861,20 @@ static inline void tcg_out_goto(TCGContext *s, int 
 cond, uint32_t addr)
     if (val - 8  0x01fd  val - 8  -0x01fd)
         tcg_out_b(s, cond, val);
     else {
 -#if 1
 -        tcg_abort();
 -#else
         if (cond == COND_AL) {
             tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4);
 -            tcg_out32(s, addr); /* XXX: This is l-u.value, can we use it? 
 */
 +            tcg_out32(s, addr);

 I see these XXXs have been there since the ARM tcg target was
 introduced. Does anybody know what they were referring to? Andrzej?

David asked me about them, but I couldn't figure it out. (the part in
tcg_out_call apparently is just copypasted from tcg_ouy_goto)

BTW: I think we can also use the ld branch when we see the goto
target is in Thumb mode.

Cheers

Re: [Qemu-devel] [PATCH v4 00/15] uq/master: Introduce basic irqchip support

2011-12-12 Thread Avi Kivity

On 12/12/2011 06:51 PM, Jan Kiszka wrote:
  
  Any thoughts on the qemu-kvm merge plan? Sounds painful.

 Pain will be where the existing qemu-kvm extensions collide with these
 refactored upstream devices (backend/frontend split specifically).
 That's where we have to merge very carefully. Haven't tried this yet,
 will give it a spin tomorrow or so.

 From that point on, disabling the new stuff for now and at some point
 switching over from the old one should be simple again.

 BTW, PIT+HPET+speaker will cause similar issues for the same reasons.


It's a little late for this, but refactoring qemu-kvm in-tree and then
splitting it into patches would have been easier.  Let's try it this way
for the next batch.


-- 
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH v4 00/15] uq/master: Introduce basic irqchip support

2011-12-12 Thread Jan Kiszka

On 2011-12-12 18:37, Avi Kivity wrote:
 On 12/12/2011 06:51 PM, Jan Kiszka wrote:

 Any thoughts on the qemu-kvm merge plan? Sounds painful.

 Pain will be where the existing qemu-kvm extensions collide with these
 refactored upstream devices (backend/frontend split specifically).
 That's where we have to merge very carefully. Haven't tried this yet,
 will give it a spin tomorrow or so.

 From that point on, disabling the new stuff for now and at some point
 switching over from the old one should be simple again.

 BTW, PIT+HPET+speaker will cause similar issues for the same reasons.

 
 It's a little late for this, but refactoring qemu-kvm in-tree and then
 splitting it into patches would have been easier.  Let's try it this way
 for the next batch.

I thought about this, but it definitely takes a clean, qemu-kvm free
base as start. The point is to design something free of all the legacy,
only looking at the other code base to extract the logic.

Moreover, there was and still is quite some upstream cleanup necessary,
and that never goes well with the delta of qemu-kvm.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] Dropping the MONITOR_CMD_ASYNC

2011-12-12 Thread Anthony Liguori


On 12/12/2011 11:22 AM, Alon Levy wrote:

On Mon, Dec 12, 2011 at 10:24:53AM -0600, Anthony Liguori wrote:

On 12/12/2011 10:00 AM, Alon Levy wrote:

On Mon, Dec 12, 2011 at 03:23:35PM +, Stefan Hajnoczi wrote:

On Mon, Dec 12, 2011 at 12:39 PM, Gerd Hoffmannkra...@redhat.com   wrote:

On 12/12/11 13:10, Stefan Hajnoczi wrote:

On Mon, Dec 12, 2011 at 11:29 AM, Gerd Hoffmannkra...@redhat.com   wrote:

On 12/12/11 11:18, Stefan Hajnoczi wrote:

On Sun, Dec 11, 2011 at 10:29 AM, Alon Levyal...@redhat.com   wrote:

On Thu, Dec 08, 2011 at 05:45:44PM -0200, Luiz Capitulino wrote:

Hi there,

I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out that
the command client_migrate_info uses it. That's a legacy interface and has to
be dropped, no command should be using it...

Something tells me that if I just drop it (and change the command to use the
regular interface), bad things will happen. Am I right? :)



The monitor command client_migrate_info needs to complete after getting
an ACK message from the currently connected spice client (this is the
only case where this is required - if there is no client then the
MONITOR_CMD_ASYNC API won't be used). This in turn requires the main
thread to perform select and call the callback that will process this
ACK. That's why the MONITOR_CMD_ASYNC API was used.

I'm not aware of any other way to do this, I'll be glad for any help
here. Also, I understand this is not what is not true async, since one
would expect a true async interface to support multiple in flight
monitor commands. If there is any ETA or existing way to do this we
could change the implementation of client_migrate_info.


Is it possible to use a QMP event to signal completion instead of a
MONITOR_CMD_ASYNC command?


Problem is this breaks the qemu-   libvirt interface.


I had the same issue with the block_job_cancel command, which Adam
Litke and Eric Blake helped us fix and find a backward-compatible
libvirt solution for:

http://www.redhat.com/archives/libvir-list/2011-November/msg01351.html


Isn't going to fly as waiting for completion isn't optional in that
case.  Workflow is this:

(1) libvirt issues client_migrate_info command.
(2) qemu forwards it to spice-server, which in turn forwards it to
the spice client (if connected).
(3) spice client connects to the migration target machine.
(4) spice client signals completion to spice-server, which in turn
notifies qemu.
(5) qemu calls the monitor completion callback, libvirt gets
client_migrate_info result.
(6) libvirt issues migrate command.

The problem is that (3) must be finished before (6) because qemu on the
target side doesn't accept incoming tcp connections any more once the
migration started.

I don't see a way to switch this to qmp events without breaking old
libvirt versions managing new qemu.


I don't see a solution in this case either.  Looks like libvirt
supports this command since 0.9.2 so it's not a good idea to just yank
it.

It might be possible for the QEMU client_migrate_info handler to run a
nested event loop in the legacy libvirt case.  This would suck since
the VM is unresponsive while waiting for spice migration to complete.
New libvirt would call the async version of the command which is
well-behaved and uses a QMP event to signal completion.


I agree that a nested event loop would be a bad solution, the point is
to let the guest continue to work while waiting, otherwise you destroy
the live migration experience, might as well disconnect the client from
the source and have it connect to the target.

I thought the whole point of MONITOR_CMD_ASYNC was to allow this
scenario. So iiuc QMP is the alternative but it would require a rewrite,
i.e. break existing users like libvirt. Hence my suggestion as a reply
to Luiz's initial email that we just deprecate MONITOR_CMD_ASYNC right
now instead of dropping it,


It *has* to be dropped.  Any command using it is fundamentally broken.



Care to explain what is fundamental about it?


The commands are broken as they sit today.  If you issue an async command and an 
error occurs, the error will not be associated with the right command.





The command should have never been introduced in the first place to use async.


Gerd gave a very good explanation of the requirement, I don't think I
can add anything to it. What you suggest is to rewrite it, I just don't
understand why we can't leave the MONITOR_CMD_ASYNC around until there
is both an alternative and libvirt learns to use it.


It's broken and more importantly, the longer we leave it there the more likely 
people will use it and create more problems.


We aren't breaking an ABI here.  libvirt can detect that this command is gone 
and probe for the new command.


This shouldn't have gone in in the first place and yes, that's my responsibility 
to enforce.


Regards,

Anthony Liguori





Regards,

Anthony Liguori

Re: [Qemu-devel] [PATCH v4 00/15] uq/master: Introduce basic irqchip support

2011-12-12 Thread Avi Kivity

On 12/12/2011 07:42 PM, Jan Kiszka wrote:
  It's a little late for this, but refactoring qemu-kvm in-tree and then
  splitting it into patches would have been easier.  Let's try it this way
  for the next batch.

 I thought about this, but it definitely takes a clean, qemu-kvm free
 base as start. The point is to design something free of all the legacy,
 only looking at the other code base to extract the logic.

If qemu-kvm was merged into qemu as is, then I'm sure you'd be able to
refactor it into good shape (though there might be a little less
motivation).

 Moreover, there was and still is quite some upstream cleanup necessary,
 and that never goes well with the delta of qemu-kvm.

That's a good point.

-- 
error compiling committee.c: too many arguments to function

[Qemu-devel] [Bug 569760] Re: Error in i386 cmpxchg instruction emulation

2011-12-12 Thread Andreas Gustafsson

** Changed in: qemu
   Status: New = Fix Committed

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/569760

Title:
  Error in i386 cmpxchg instruction emulation

Status in QEMU:
  Fix Committed

Bug description:
  As reported in

http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=42158

  programs using pthreads and fork() under NetBSD/i386 hang when the
  NetBSD system is run within qemu.

  This problem affects every version of qemu I have tested, including
  0.12.3.

  I have now tracked down the cause of the problem to a bug in qemu's
  emulation of the cmpxchg instruction.  Quoting the above bug report:

In a physical i386 CPU, the cmpxchg instruction performs a comparison
and read-modify-write memory cycle.  In the case where the comparison
outcome is unequal, the read-modify-write cycle is an effective
no-op, writing back the same value that was read, and the value of the
source operand is loaded into the accumulator.  Qemu attempts to
emulate this behavior including the redundant memory write.
   
To be precise, qemu first loads the accumulator and then does the
redundant memory write.  If a page fault occurs during the write, the
cmpxchg instruction will be restarted after handling the page fault,
but because the accumulator has already been changed, the comparison
will now incorrectly yield a result of equal, causing the memory
write to write the value from the source operand instead of re-writing
the original memory contents.

I assume fork() triggers the bug because it write protects pages to
implement copy-on-write, thereby producing a situation where the read
part of the cmpxchg read-modify-write cycle succeeds but the write
part causes a page fault.

Patching qemu to only change the accumulator after performing the
redundant write fixes the problem for me.

  I will attach a patch against qemu 0.12.3 shortly.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/569760/+subscriptions

Re: [Qemu-devel] [PATCH] ARM - Remove fixed map code buffer restriction

2011-12-12 Thread Peter Maydell

On 12 December 2011 17:24, andrzej zaborowski balr...@gmail.com wrote:
 BTW: I think we can also use the ld branch when we see the goto
 target is in Thumb mode.

The target of a goto is currently never Thumb (because gotos are
always to other TCG generated code and we only generate ARM insns).
If we did need to be able to do a goto-Thumb we'd want to support
the short-range version too. But I think we might as well leave it
until we actually need it.

-- PMM

Re: [Qemu-devel] [PATCH] ARM - Remove fixed map code buffer restriction

2011-12-12 Thread andrzej zaborowski

On 12 December 2011 19:03, Peter Maydell peter.mayd...@linaro.org wrote:
 On 12 December 2011 17:24, andrzej zaborowski balr...@gmail.com wrote:
 BTW: I think we can also use the ld branch when we see the goto
 target is in Thumb mode.

 The target of a goto is currently never Thumb (because gotos are
 always to other TCG generated code and we only generate ARM insns).

I'm aware of that, I just like functions that can do what their name
says well. :)

Cheers

Re: [Qemu-devel] [PATCH] ARM - Remove fixed map code buffer restriction

2011-12-12 Thread David Gilbert

On 12 December 2011 18:10, andrzej zaborowski balr...@gmail.com wrote:
 On 12 December 2011 19:03, Peter Maydell peter.mayd...@linaro.org wrote:
 On 12 December 2011 17:24, andrzej zaborowski balr...@gmail.com wrote:
 BTW: I think we can also use the ld branch when we see the goto
 target is in Thumb mode.

 The target of a goto is currently never Thumb (because gotos are
 always to other TCG generated code and we only generate ARM insns).

 I'm aware of that, I just like functions that can do what their name
 says well. :)

It does have an assert which will catch it if you try, so no one
should get caught out
by it, and on ARMv7 the add is apparently an interworking branch, so I
think it might
even work.

Dave

Re: [Qemu-devel] [PATCH 1/7] Remove unnecessary casts from PCI DMA code in eepro100

2011-12-12 Thread Anthony Liguori


On 11/03/2011 08:03 PM, David Gibson wrote:

This patch removes some unnecessary casts in the eepro100 device,
introduced by commit 16ef60c9a8269f7cbc95219a431b1d7cbf29
'eepro100: Use PCI DMA stub functions'.

Signed-off-by: David Gibsonda...@gibson.dropbear.id.au


Applied all.  Thanks.

Regards,

Anthony Liguori


---
  hw/eepro100.c |7 +++
  1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index 7d59e71..8769e33 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -713,8 +713,7 @@ static void dump_statistics(EEPRO100State * s)
   * values which really matter.
   * Number of data should check configuration!!!
   */
-pci_dma_write(s-dev, s-statsaddr,
-  (uint8_t *)s-statistics, s-stats_size);
+pci_dma_write(s-dev, s-statsaddr,s-statistics, s-stats_size);
  stl_le_pci_dma(s-dev, s-statsaddr + 0,
 s-statistics.tx_good_frames);
  stl_le_pci_dma(s-dev, s-statsaddr + 36,
@@ -732,7 +731,7 @@ static void dump_statistics(EEPRO100State * s)

  static void read_cb(EEPRO100State *s)
  {
-pci_dma_read(s-dev, s-cb_address, (uint8_t *)s-tx, sizeof(s-tx));
+pci_dma_read(s-dev, s-cb_address,s-tx, sizeof(s-tx));
  s-tx.status = le16_to_cpu(s-tx.status);
  s-tx.command = le16_to_cpu(s-tx.command);
  s-tx.link = le32_to_cpu(s-tx.link);
@@ -1707,7 +1706,7 @@ static ssize_t nic_receive(VLANClientState *nc, const 
uint8_t * buf, size_t size
  /* !!! */
  eepro100_rx_t rx;
  pci_dma_read(s-dev, s-ru_base + s-ru_offset,
- (uint8_t *)rx, sizeof(eepro100_rx_t));
+rx, sizeof(eepro100_rx_t));
  uint16_t rfd_command = le16_to_cpu(rx.command);
  uint16_t rfd_size = le16_to_cpu(rx.size);

Re: [Qemu-devel] [PATCH] hw/usb-net.c: Fix precedence bug when checking rndis_state

2011-12-12 Thread Anthony Liguori


On 11/09/2011 03:09 PM, Peter Maydell wrote:

!X == 2 is always false (spotted by Coverity), so the checks
for whether rndis is in the correct state would never fire.

Signed-off-by: Peter Maydellpeter.mayd...@linaro.org


Applied.  Thanks.

Regards,

Anthony Liguori


---
NB that although I tested that this doesn't break non-rndis
usb-net, I don't have a test image that uses rndis usb-net,
so treat this patch with the appropriate degree of caution.
(Probably safer not putting it into 1.0 unless tested.)

  hw/usb-net.c |5 +++--
  1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/usb-net.c b/hw/usb-net.c
index a8b7c8d..f91fa32 100644
--- a/hw/usb-net.c
+++ b/hw/usb-net.c
@@ -1268,8 +1268,9 @@ static ssize_t usbnet_receive(VLANClientState *nc, const 
uint8_t *buf, size_t si

  if (is_rndis(s)) {
  msg = (struct rndis_packet_msg_type *) s-in_buf;
-if (!s-rndis_state == RNDIS_DATA_INITIALIZED)
+if (s-rndis_state != RNDIS_DATA_INITIALIZED) {
  return -1;
+}
  if (size + sizeof(struct rndis_packet_msg_type)  sizeof(s-in_buf))
  return -1;

@@ -1302,7 +1303,7 @@ static int usbnet_can_receive(VLANClientState *nc)
  {
  USBNetState *s = DO_UPCAST(NICState, nc, nc)-opaque;

-if (is_rndis(s)  !s-rndis_state == RNDIS_DATA_INITIALIZED) {
+if (is_rndis(s)  s-rndis_state != RNDIS_DATA_INITIALIZED) {
  return 1;
  }

Re: [Qemu-devel] [PATCH 00/10] qemu_fclose() error handling fixes (v3)

2011-12-12 Thread Anthony Liguori


On 11/10/2011 06:41 AM, Eduardo Habkost wrote:

Comments for v3:

I am still not sure if this is 1.0 material, but I am more inclined to delay
this for post-1.0.

Changes v2 -  v3:

- Only coding style changes for issues detected by checkpatch.pl:
   - Avoid // comments;
   - Use braces on if statements.


Applied all.  Thanks.

Regards,

Anthony Liguori




Comments for v2:

I am not sure if this is appropriate post-freeze, I will let the maintainers
decide this. Personally I think the code is more reliable with these changes,
but on the other hand the only bugs it fixes are on the error paths.

Changes v1 -  v2:
  - Patch 2: Cosmetic spelling change on comment text
  - Patch 5: Add small comment about the need to return previously-spotted 
errors
  - Patch 6: On success, keep returning pclose() return value, instead of 
always 0.
(the most relevant change in this new version of the series)

Also, this series was tested using ping-pong migration with Autotest, no
problems were detected.


Original series description:

Summary of the problem:

- qemu_fclose() calls qemu_fflush()
- Writes done by qemu_fflush() can fail
- Those errors are lost after qemu_fclose() returns

So, this series change qemu_fclose() to return last_error. But to do that we
need to make sure all involve code use the -errno convention, hence the large
series.


Michael, probably this will conflict with your ongoing work. I don't want to
delay other work, so I can rebase my patches if needed. This is just a RFC.

Juan, maybe you were already working on this. But as I was already fixing this
code while auditing the migration handling, I thought it was interesting to
send this for review anyway. I hope I didn't duplicate any work.


This is still completely untested, I am just using this series as a way to
report the issue and get comments so I know I am going through the right path.


Detailed description of the changes:

Small cleanups:

- Always use qemu_file_set_error() to set last_error (patch 1)
- Add return value documentation to QEMUFileCloseFunc (patch 2)

Actual qemu_fclose() behavior changes are done in 3 steps:

- First step: fix qemu_fclose() callers:
   - exec_close()
 - Fixed to check for negative values, not -1 (patch 3)
   - Note: exec_close() is changed in two steps: first on the qemu_fclose()
 calling code, then on the return value code
   - migrate_fd_cleanup
 - Fixed to:
   - check qemu_fclose() return value for0 (patch 4)
   - return -errno, not just -1 (patch 4)
 - Callers:
   - migrate_fd_completed:
 - Error checking is done properly, already.
   - migrate_fd_error:
 - It ignores migrated_fd_cleanup() return value.
   - migrate_fd_cancel:
 - It ignores migrated_fd_cleanup() return value.
   - exec_accept_incoming_migration(): no return value check (yet)
   - fd_accept_incoming_migration(): no return value check (yet)
   - tcp_accept_incoming_migration(): no return value check (yet)
   - unix_accept_incoming_migration(): no return value check (yet)
   - do_savevm(): no return value check (yet)
   - load_vmstate(): no return value check (yet)

- Second step: change qemu_fclose() to return last_error (patch 5)
   - Made sure to return unchanged (positive) success value on success
 (required by exec_close())

- Third step: change qemu_fclose() implementations (QEMUFileCloseFunc):
   - stdio_fclose
 - Fixed to return -errno (patch 6)
   - stdio_pclose
 - Fixed to return -errno (patch 7)
   - buffered_close
 - Implemented through QEMUFileBuffered.close:
   - Only implementation is migrate_fd_close(), that calls the following,
 through MigrationState.close:
 - exec_close():
   - fixed to return original error value, not -1 (patch 8)
 - fd_close
   - Fixed to return -errno on close() errors. (patch 9)
 - tcp_close
   - Fixed to return -errno on close() errors. (patch 10)
 - unix_close
   - Fixed to return -errno on close() errors. (patch 11)
   - socket_close
 - No system call is made, returns always 0.
   - bdrv_fclose
 - No system call is made, returns always 0.

Eduardo Habkost (10):
   savevm: use qemu_file_set_error() instead of setting last_error
 directly
   QEMUFileCloseFunc: add return value documentation (v2)
   exec_close(): accept any negative value as qemu_fclose() error
   migrate_fd_cleanup: accept any negative qemu_fclose() value as error
   qemu_fclose: return last_error if set (v3)
   stdio_pclose: return -errno on error (v3)
   stdio_fclose: return -errno on errors (v2)
   exec_close(): return -errno on errors (v2)
   tcp_close(): check for close() errors too (v2)
   unix_close(): check for close() errors too (v2)

  hw/hw.h  |8 +-
  migration-exec.c |9 ++-
  migration-tcp.c  |7 -
  migration-unix.c |7 -
  migration.c  |4 +--
  savevm.c |   65

Re: [Qemu-devel] [RFC] Device sandboxing

2011-12-12 Thread Corey Bryant




On 12/08/2011 04:51 PM, Blue Swirl wrote:

Why limit this to device emulation only? Where in QEMU would this
approach not work?


That's a good point, and we've thrown this idea around.  I don't know if 
there's any reason why this approach wouldn't work for all of QEMU.  The 
idea for now though is to target the most vulnerable code, devices.


--
Regards,
Corey

Re: [Qemu-devel] [RFC] Device sandboxing

2011-12-12 Thread Will Drewry

On Sun, Dec 11, 2011 at 4:50 AM, Dor Laor dl...@redhat.com wrote:
 On 12/08/2011 11:40 AM, Stefan Hajnoczi wrote:

 On Wed, Dec 7, 2011 at 8:54 PM, Eric Parisepa...@redhat.com  wrote:

 On Wed, 2011-12-07 at 13:43 -0600, Anthony Liguori wrote:

 On 12/07/2011 01:32 PM, Corey Bryant wrote:


 That would seem like the logical approach. I think there may be new
 mode 2
 patches coming soon so we can see how they go over.


 I'd like to see what the whitelist would need to be for something like
 QEMU in
 mode 2.  My biggest concern is that the whitelist would need to be so
 large that
 the practical security what's all that much improved.


 When I prototyped my version of seccomp v2 for qemu a while back I did
 it by only looking at syscalls after inital setup was completed (aka the
 very last thing before main_loop() was to lock it down).  My list was
 much sorter than even these:

 +        __NR_brk,
 +        __NR_close,
 +        __NR_exit_group,
 +        __NR_futex,
 +        __NR_ioctl,
 +        __NR_madvise,
 +        __NR_mmap,
 +        __NR_munmap,
 +        __NR_read,
 +        __NR_recvfrom,
 +        __NR_recvmsg,
 +        __NR_rt_sigaction,
 +        __NR_select,
 +        __NR_sendto,
 +        __NR_tgkill,
 +        __NR_timer_delete,
 +        __NR_timer_gettime,
 +        __NR_timer_settime,
 +        __NR_write,
 +        __NR_writev,

 There is simple obvious negligible overhead value here, but every
 proposal I know of, including mine, has been shot down by Ingo.  Ingo's
 proposal is much more work, more overhead, but clearly more flexible.
 His suggestions (and code based on those suggestions from others) has
 been shot down by PeterZ.

 So I feel like seccomp v2 is between a rock and a hard place.  We have
 something that works really well, and could be a huge win for all sorts
 of programs, but we don't seem to be able to get anything past the
 damned if you do, damned if you don't nak's.

 (There's also a cgroup version of seccomp proposed, but I'm guessing it
 will go just about as far as all the other versions)


 Still, these sorts of situations are overcome all the time.  Sometimes
 it takes a while and several LWN.net articles about the drama but at
 the end things can be worked out.

 If we want to discuss the specifics of mode 2 and especially what Ingo
 or Peter think then I think we should do it in a forum where they
 participate.  Maybe their positions have changed.


 Will, little bird whispered that you'll going to send another iteration w/
 higher acceptance chances. Where do we stand w/ it? Can you please elaborate
 on it chance to get merged?

Hi - yup. I keep getting delayed with other work, but I still plan to
send it soon.  The first plan was to port the updated patchset to
linus's tree, then repost. I plan on adding a cover letter (0/x) to
the series to see if we can get discussion going again on it. I'm not
sure what the merge chance is, to be honest. I believe the patch is
desirable to many parties, but Ingo didn't feel it was appropriate
given that it would add an ABI that was not part of the trace/perf
ABI.  Given that grand-unified trace has been held up on stable
internal api points and other challenges, I would hope that we could
continue on using a prctl for seccomp_filter, even if it becomes a
compatibility layer which is eventually switched off for most users.

I did spend some time looking at other approaches (making the syscall
table a namespace, cgroups syscalls like Łukasz Sowa's patch, etc),
but using the seccomp-centered approach still seems to make sense from
a process view and a seccomp view - and who wants a new syscall anyway
:)  The last thing I am looking at before I post is seeing if  it'd be
possible to avoid touching ftrace filter engine at all for
ipc,socketcall, ioctl, etc, but I'm not sure that it makes sense to
not use the generic system that exists today.

cheers!
will

1 2 3 >

1 - 100 of 261 matches

Mail list logo