date:20111207


Am 08.12.2011 08:03, schrieb 陳韋任:

On Wed, Dec 07, 2011 at 11:31:46PM +0100, Stefan Weil wrote:

TCG_TARGET_REG_BITS is declared in tcg.h for all TCG targets.


Just want to make sure. When we talk about target in TCG, that 
_always_ means

the host, right?

Regards,
chenwj


Yes. See file tcg/README which says this:

   The TCG "target" is the architecture for which we generate the
   code. It is of course not the same as the "target" of QEMU which is
   the emulated architecture. As TCG started as a generic C backend used
   for cross compiling, it is assumed that the TCG target is different
   from the host, although it is never the case for QEMU.

Regards,
Stefan Weil

Re: [Qemu-devel] [PATCH] tcg: Remove redundant declarations of TCG_TARGET_REG_BITS

2011-12-07 Thread 陳韋任

On Wed, Dec 07, 2011 at 11:31:46PM +0100, Stefan Weil wrote:
> TCG_TARGET_REG_BITS is declared in tcg.h for all TCG targets.

  Just want to make sure. When we talk about target in TCG, that _always_ means
the host, right?

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj

[Qemu-devel] [PATCH] Fix parse of usb device description with multiple configurations

2011-12-07 Thread Cao,Bing Bu


When testing ipod on QEMU by He Jie Xu,qemu made a 
assertion.
We found that the ipod with 2 configurations,and the usb-linux did not parse 
the descriptor correctly.
The descr_len returned is the total length of the all configurations,not one 
configuration.
The older version will through the other configurations instead of 
skip,continue parsing the descriptor of interfaces/endpoints in other 
configurations,then went wrong.

This patch will put the configuration descriptor parse in loop outside and 
dispel the other configurations not requested.


Signed-off-by: Cao,Bing Bu 
---
 usb-linux.c |   20 
 1 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/usb-linux.c b/usb-linux.c
index ab4c693..a53b558 100644
--- a/usb-linux.c
+++ b/usb-linux.c
@@ -1141,15 +1141,19 @@ static int usb_linux_update_endp_table(USBHostDevice *s)
 length = s->descr_len - 18;
 i = 0;
 
-if (descriptors[i + 1] != USB_DT_CONFIG ||
-descriptors[i + 5] != s->configuration) {
-fprintf(stderr, "invalid descriptor data - configuration %d\n",
-s->configuration);
-return 1;
-}
-i += descriptors[i];
-
 while (i < length) {
+if (descriptors[i + 1] != USB_DT_CONFIG) {
+fprintf(stderr, "invalid descriptor data\n");
+return 1;
+} else if (descriptors[i + 5] != s->configuration) {
+fprintf(stderr, "not requested configuration %d\n",
+s->configuration);
+i += (descriptors[i + 3] << 8) + descriptors[i + 2];
+continue;
+}
+
+i += descriptors[i];
+
 if (descriptors[i + 1] != USB_DT_INTERFACE ||
 (descriptors[i + 1] == USB_DT_INTERFACE &&
  descriptors[i + 4] == 0)) {
-- 
1.7.1

[Qemu-devel] [PATCH] use pci macro in virtio

2011-12-07 Thread hkran


Signed-off-by: hkran 
---
 hw/virtio-pci.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index c665f5c..f8ee772 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -627,9 +627,9 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice 
*vdev)
 if (proxy->class_code) {
 pci_config_set_class(config, proxy->class_code);
 }
-pci_set_word(config + 0x2c, pci_get_word(config + PCI_VENDOR_ID));
-pci_set_word(config + 0x2e, vdev->device_id);
-config[0x3d] = 1;
+pci_set_word(config + PCI_SUBSYSTEM_VENDOR_ID, pci_get_word(config + 
PCI_VENDOR_ID));
+pci_set_word(config + PCI_SUBSYSTEM_ID, vdev->device_id);
+config[PCI_INTERRUPT_PIN] = 1;
 
 memory_region_init(&proxy->msix_bar, "virtio-msix", 4096);
 if (vdev->nvectors && !msix_init(&proxy->pci_dev, vdev->nvectors,
-- 
1.7.1

[Qemu-devel] [PATCH] block/cow : return real error code in cow.c

2011-12-07 Thread Li Zhi Hui

Signed-off-by: Li Zhi Hui 
---
 block/cow.c |   22 --
 1 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/block/cow.c b/block/cow.c
index 3c52735..383482b 100644
--- a/block/cow.c
+++ b/block/cow.c
@@ -64,10 +64,11 @@ static int cow_open(BlockDriverState *bs, int flags)
 struct cow_header_v2 cow_header;
 int bitmap_size;
 int64_t size;
+int ret;
 
 /* see if it is a cow image */
-if (bdrv_pread(bs->file, 0, &cow_header, sizeof(cow_header)) !=
-sizeof(cow_header)) {
+ret = bdrv_pread(bs->file, 0, &cow_header, sizeof(cow_header));
+if (ret < 0) {
 goto fail;
 }
 
@@ -88,7 +89,7 @@ static int cow_open(BlockDriverState *bs, int flags)
 qemu_co_mutex_init(&s->lock);
 return 0;
  fail:
-return -1;
+return ret;
 }
 
 /*
@@ -182,14 +183,14 @@ static int coroutine_fn cow_read(BlockDriverState *bs, 
int64_t sector_num,
 ret = bdrv_pread(bs->file,
 s->cow_sectors_offset + sector_num * 512,
 buf, n * 512);
-if (ret != n * 512)
-return -1;
+if (ret < 0)
+return ret;
 } else {
 if (bs->backing_hd) {
 /* read from the base image */
 ret = bdrv_read(bs->backing_hd, sector_num, buf, n);
 if (ret < 0)
-return -1;
+return ret;
 } else {
 memset(buf, 0, n * 512);
 }
@@ -220,8 +221,9 @@ static int cow_write(BlockDriverState *bs, int64_t 
sector_num,
 
 ret = bdrv_pwrite(bs->file, s->cow_sectors_offset + sector_num * 512,
   buf, nb_sectors * 512);
-if (ret != nb_sectors * 512)
-return -1;
+if (ret < 0) {
+return ret;
+}
 
 return cow_update_bitmap(bs, sector_num, nb_sectors);
 }
@@ -288,14 +290,14 @@ static int cow_create(const char *filename, 
QEMUOptionParameter *options)
 cow_header.sectorsize = cpu_to_be32(512);
 cow_header.size = cpu_to_be64(image_sectors * 512);
 ret = bdrv_pwrite(cow_bs, 0, &cow_header, sizeof(cow_header));
-if (ret != sizeof(cow_header)) {
+if (ret < 0) {
 goto exit;
 }
 
 /* resize to include at least all the bitmap */
 ret = bdrv_truncate(cow_bs,
 sizeof(cow_header) + ((image_sectors + 7) >> 3));
-if (ret) {
+if (ret < 0) {
 goto exit;
 }
 
-- 
1.7.4.1

[Qemu-devel] [PATCH 3/3] linux-user:Signal handling for MIPS64

From: Khansa Butt 


Signed-off-by: Ehsan Ul Haq 
---
 linux-user/signal.c |  429 +--
 1 files changed, 417 insertions(+), 12 deletions(-)

diff --git a/linux-user/signal.c b/linux-user/signal.c
index 78e3380..0f4091d 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -596,7 +596,13 @@ int do_sigaction(int sig, const struct target_sigaction 
*act,
 if (act) {
 /* FIXME: This is not threadsafe.  */
 k->_sa_handler = tswapal(act->_sa_handler);
+#if defined(TARGET_MIPS64)
+/* tswapal() do 64 bit swap in case of MIPS64 but
+   we need 32 bit swap as sa_flags is 32 bit */
+k->sa_flags = bswap32(act->sa_flags);
+#else
 k->sa_flags = tswapal(act->sa_flags);
+#endif
 #if !defined(TARGET_MIPS)
 k->sa_restorer = tswapal(act->sa_restorer);
 #endif
@@ -2416,31 +2422,430 @@ void sparc64_get_context(CPUSPARCState *env)
 #endif
 #elif defined(TARGET_ABI_MIPSN64)
 
-# warning signal handling not implemented
+struct target_sigcontext {
+uint32_t   sc_regmask; /* Unused */
+uint32_t   sc_status;
+uint64_t   sc_pc;
+uint64_t   sc_regs[32];
+uint64_t   sc_fpregs[32];
+uint32_t   sc_ownedfp; /* Unused */
+uint32_t   sc_fpc_csr;
+uint32_t   sc_fpc_eir; /* Unused */
+uint32_t   sc_used_math;
+uint32_t   sc_dsp; /* dsp status, was sc_ssflags */
+uint32_t   pad0;
+uint64_t   sc_mdhi;
+uint64_t   sc_mdlo;
+target_ulong   sc_hi1; /* Was sc_cause */
+target_ulong   sc_lo1; /* Was sc_badvaddr */
+target_ulong   sc_hi2; /* Was sc_sigset[4] */
+target_ulong   sc_lo2;
+target_ulong   sc_hi3;
+target_ulong   sc_lo3;
+};
+
+struct sigframe {
+uint32_t sf_ass[4]; /* argument save space for o32 */
+uint32_t sf_code[2];/* signal trampoline */
+struct target_sigcontext sf_sc;
+target_sigset_t sf_mask;
+};
+
+struct target_ucontext {
+target_ulong tuc_flags;
+target_ulong tuc_link;
+target_stack_t tuc_stack;
+target_ulong pad0;
+struct target_sigcontext tuc_mcontext;
+target_sigset_t tuc_sigmask;
+};
+
+struct target_rt_sigframe {
+uint32_t rs_ass[4];   /* argument save space for o32 */
+uint32_t rs_code[2];  /* signal trampoline */
+struct target_siginfo rs_info;
+struct target_ucontext rs_uc;
+};
+
+/* Install trampoline to jump back from signal handler */
+static inline int install_sigtramp(unsigned int *tramp,   unsigned int syscall)
+{
+int err;
+
+/*
+ * Set up the return code ...
+ *
+ * li  v0, __NR__foo_sigreturn
+ * syscall
+ */
+
+err = __put_user(0x2402 + syscall, tramp + 0);
+err |= __put_user(0x000c  , tramp + 1);
+/* flush_cache_sigtramp((unsigned long) tramp); */
+return err;
+}
+
+static inline int
+setup_sigcontext(CPUState *regs, struct target_sigcontext *sc)
+{
+int err = 0;
+
+err |= __put_user(regs->active_tc.PC, &sc->sc_pc);
+
+#define save_gp_reg(i) do { \
+err |= __put_user(regs->active_tc.gpr[i], &sc->sc_regs[i]); \
+} while (0)
+__put_user(0, &sc->sc_regs[0]); save_gp_reg(1); save_gp_reg(2);
+save_gp_reg(3); save_gp_reg(4); save_gp_reg(5); save_gp_reg(6);
+save_gp_reg(7); save_gp_reg(8); save_gp_reg(9); save_gp_reg(10);
+save_gp_reg(11); save_gp_reg(12); save_gp_reg(13); save_gp_reg(14);
+save_gp_reg(15); save_gp_reg(16); save_gp_reg(17); save_gp_reg(18);
+save_gp_reg(19); save_gp_reg(20); save_gp_reg(21); save_gp_reg(22);
+save_gp_reg(23); save_gp_reg(24); save_gp_reg(25); save_gp_reg(26);
+save_gp_reg(27); save_gp_reg(28); save_gp_reg(29); save_gp_reg(30);
+save_gp_reg(31);
+#undef save_gp_reg
+
+err |= __put_user(regs->active_tc.HI[0], &sc->sc_mdhi);
+err |= __put_user(regs->active_tc.LO[0], &sc->sc_mdlo);
+
+/* Not used yet, but might be useful if we ever have DSP suppport */
+#if 0
+if (cpu_has_dsp) {
+err |= __put_user(mfhi1(), &sc->sc_hi1);
+err |= __put_user(mflo1(), &sc->sc_lo1);
+err |= __put_user(mfhi2(), &sc->sc_hi2);
+err |= __put_user(mflo2(), &sc->sc_lo2);
+err |= __put_user(mfhi3(), &sc->sc_hi3);
+err |= __put_user(mflo3(), &sc->sc_lo3);
+err |= __put_user(rddsp(DSP_MASK), &sc->sc_dsp);
+}
+/* same with 64 bit */
+#ifdef CONFIG_64BIT
+err |= __put_user(regs->hi, &sc->sc_hi[0]);
+err |= __put_user(regs->lo, &sc->sc_lo[0]);
+if (cpu_has_dsp) {
+err |= __put_user(mfhi1(), &sc->sc_hi[1]);
+err |= __put_user(mflo1(), &sc->sc_lo[1]);
+err |= __put_user(mfhi2(), &sc->sc_hi[2]);
+err |= __put_user(mflo2(), &sc->sc_lo[2]);
+err |= __put_user(mfhi3(), &sc->sc_hi[3]);
+err |= __put_user(mflo3(), &sc->sc_lo[3]);
+err |= __put_user(rddsp(DSP_MASK), &sc->sc_dsp);
+}
+#endif
+#endif

[Qemu-devel] [PATCH 2/3] target-mips:enabling of 64 bit user mode and floating point operations MIPS_HFLAG_UX is included in env->hflags so that the address computation for LD instruction does not tre

From: Khansa Butt 


Signed-off-by: Abdul Qadeer 
---
 target-mips/translate.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index d5b1c76..452a63b 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -12779,6 +12779,10 @@ void cpu_reset (CPUMIPSState *env)
 env->hflags |= MIPS_HFLAG_FPU;
 }
 #ifdef TARGET_MIPS64
+env->hflags |=  MIPS_HFLAG_UX;
+/* if cpu has FPU, MIPS_HFLAG_F64 must be included in env->hflags
+   so that floating point operations can be emulated */
+env->active_fpu.fcr0 = env->cpu_model->CP1_fcr0;
 if (env->active_fpu.fcr0 & (1 << FCR0_F64)) {
 env->hflags |= MIPS_HFLAG_F64;
 }
-- 
1.7.3.4

[Qemu-devel] [PATCH 1/3] linux-user:Support for MIPS64 user mode emulation in QEMU

From: Khansa Butt 


Signed-off-by: Khansa Butt 
---
 configure |1 +
 default-configs/mips64-linux-user.mak |1 +
 linux-user/main.c |   21 +++--
 linux-user/mips64/syscall.h   |2 ++
 4 files changed, 23 insertions(+), 2 deletions(-)
 create mode 100644 default-configs/mips64-linux-user.mak

diff --git a/configure b/configure
index ac4840d..e31229b 100755
--- a/configure
+++ b/configure
@@ -914,6 +914,7 @@ m68k-linux-user \
 microblaze-linux-user \
 microblazeel-linux-user \
 mips-linux-user \
+mips64-linux-user \
 mipsel-linux-user \
 ppc-linux-user \
 ppc64-linux-user \
diff --git a/default-configs/mips64-linux-user.mak 
b/default-configs/mips64-linux-user.mak
new file mode 100644
index 000..1598bfc
--- /dev/null
+++ b/default-configs/mips64-linux-user.mak
@@ -0,0 +1 @@
+# Default configuration for mips64-linux-user
diff --git a/linux-user/main.c b/linux-user/main.c
index d1bbc57..17a74cd 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -2157,7 +2157,8 @@ static int do_store_exclusive(CPUMIPSState *env)
 void cpu_loop(CPUMIPSState *env)
 {
 target_siginfo_t info;
-int trapnr, ret;
+int trapnr;
+abi_long ret;
 unsigned int syscall_num;
 
 for(;;) {
@@ -2166,8 +2167,23 @@ void cpu_loop(CPUMIPSState *env)
 cpu_exec_end(env);
 switch(trapnr) {
 case EXCP_SYSCALL:
-syscall_num = env->active_tc.gpr[2] - 4000;
 env->active_tc.PC += 4;
+#if defined(TARGET_MIPS64)
+syscall_num = env->active_tc.gpr[2] - 5000;
+/* MIPS64 has eight argument registers so there is
+ * no need to get arguments from stack
+ */
+ret = do_syscall(env, env->active_tc.gpr[2],
+ env->active_tc.gpr[4],
+ env->active_tc.gpr[5],
+ env->active_tc.gpr[6],
+ env->active_tc.gpr[7],
+ env->active_tc.gpr[8],
+ env->active_tc.gpr[9],
+ env->active_tc.gpr[10],
+ env->active_tc.gpr[11]);
+#else
+syscall_num = env->active_tc.gpr[2] - 4000;
 if (syscall_num >= sizeof(mips_syscall_args)) {
 ret = -TARGET_ENOSYS;
 } else {
@@ -2205,6 +2221,7 @@ void cpu_loop(CPUMIPSState *env)
  env->active_tc.gpr[7],
  arg5, arg6, arg7, arg8);
 }
+#endif
 done_syscall:
 if (ret == -TARGET_QEMU_ESIGRETURN) {
 /* Returning from a successful sigreturn syscall.
diff --git a/linux-user/mips64/syscall.h b/linux-user/mips64/syscall.h
index 668a2b9..96f03da 100644
--- a/linux-user/mips64/syscall.h
+++ b/linux-user/mips64/syscall.h
@@ -218,4 +218,6 @@ struct target_pt_regs {
 
 
 
+#define TARGET_QEMU_ESIGRETURN 255
+
 #define UNAME_MACHINE "mips64"
-- 
1.7.3.4

[Qemu-devel] [PATCH 0/3] MIPS64 user mode emulation in QEMU

From: Khansa Butt 

This is the team work of Ehsan-ul-Haq, Abdul Qadeer, Abdul Waheed, Khansa Butt
from HPCN Lab KICS UET Lahore.
In previous patch set we were including Cavium specific instructions along with 
Cavium specifc registers in UME. Because of these register fields we had to bump
the cpu version up but I noticed that cpu_save() and cpu_load() are not called 
in
UME so we decided to postpone Octeon specific changes ( registers and 
instructions)
and will include them in our SME work( we are currently working on system mode 
emulation of Octeon board) so we closing the following thread
http://lists.gnu.org/archive/html/qemu-devel/2011-10/msg02665.html
Please review this new patch set which is without cavium instruction support. 

 configure |1 +
 default-configs/mips64-linux-user.mak |1 +
 linux-user/main.c |   21 ++-
 linux-user/mips64/syscall.h   |2 +
 linux-user/signal.c   |  429 -
 target-mips/translate.c   |4 +
 6 files changed, 444 insertions(+), 14 deletions(-)
 create mode 100644 default-configs/mips64-linux-user.mak

-- 
1.7.3.4

Re: [Qemu-devel] [PATCH v2] network scripts: don't block SIGCHLD before forking


Bah, missed a 'git add', will resend. Sorry for the noise.

[Qemu-devel] [PATCH v3] network scripts: don't block SIGCHLD before forking

This patch fixes a bug where child processes of launch_script() can
misbehave due to SIGCHLD being blocked. In the case of `sudo`, this
causes a permanent hang.

Previously a SIGCHLD handler was added to reap fork_exec()'d zombie
processes by calling waitpid(-1, ...). This required other
fork()/waitpid() callers to temporarilly block SIGCHILD to avoid
having the final wait status being intercepted by the SIGCHLD
handler:

7c3370d4fe3fa6cda8655f109e4659afc8ca4269

Since then, the qemu_add_child_watch() interface was added to allow
registration of such processes and reap only from that specific set
of PIDs:

4d54ec7898bd951007cb6122d5315584bd41d0c4

As a result, we can now avoid blocking SIGCHLD in launch_script(), so
drop that behavior.

Signed-off-by: Michael Roth 
---
 net/tap.c |6 --
 1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 1f26dc9..6c27a94 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -346,15 +346,10 @@ static TAPState *net_tap_fd_init(VLANState *vlan,
 
 static int launch_script(const char *setup_script, const char *ifname, int fd)
 {
-sigset_t oldmask, mask;
 int pid, status;
 char *args[3];
 char **parg;
 
-sigemptyset(&mask);
-sigaddset(&mask, SIGCHLD);
-sigprocmask(SIG_BLOCK, &mask, &oldmask);
-
 /* try to launch network script */
 pid = fork();
 if (pid == 0) {
@@ -378,7 +373,6 @@ static int launch_script(const char *setup_script, const 
char *ifname, int fd)
 while (waitpid(pid, &status, 0) != pid) {
 /* loop */
 }
-sigprocmask(SIG_SETMASK, &oldmask, NULL);
 
 if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
 return 0;
-- 
1.7.4.1

[Qemu-devel] [PATCH v2] network scripts: don't block SIGCHLD before forking

This patch fixes a bug where child processes of launch_script() can
misbehave due to SIGCHLD being blocked. In the case of `sudo`, this
causes a permanent hang.

Previously a SIGCHLD handler was added to reap fork_exec()'d zombie
processes by calling waitpid(-1, ...). This required other
fork()/waitpid() callers to temporarilly block SIGCHILD to avoid
having the final wait status being intercepted by the SIGCHLD
handler:

7c3370d4fe3fa6cda8655f109e4659afc8ca4269

Since then, the qemu_add_child_watch() interface was added to allow
registration of such processes and reap only from that specific set
of PIDs:

4d54ec7898bd951007cb6122d5315584bd41d0c4

As a result, we can now avoid blocking SIGCHLD in launch_script(), so
drop that behavior.

Signed-off-by: Michael Roth 
---
 net/tap.c |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 1f26dc9..2cf79e1 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -351,10 +351,6 @@ static int launch_script(const char *setup_script, const 
char *ifname, int fd)
 char *args[3];
 char **parg;
 
-sigemptyset(&mask);
-sigaddset(&mask, SIGCHLD);
-sigprocmask(SIG_BLOCK, &mask, &oldmask);
-
 /* try to launch network script */
 pid = fork();
 if (pid == 0) {
@@ -378,7 +374,6 @@ static int launch_script(const char *setup_script, const 
char *ifname, int fd)
 while (waitpid(pid, &status, 0) != pid) {
 /* loop */
 }
-sigprocmask(SIG_SETMASK, &oldmask, NULL);
 
 if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
 return 0;
-- 
1.7.4.1

[Qemu-devel] [PATCH] network scripts: don't block SIGCHLD before forking

This patch fixes a bug where child processes of launch_script() can
misbehave due to SIGCHLD being blocked. In the case of `sudo`, this
causes a permanent hang.

Previously a SIGCHLD handler was added to reap fork_exec()'d zombie
processes by calling waitpid(-1, ...). This required other
fork()/waitpid() callers to temporarilly block SIGCHILD to avoid
having the final wait status being intercepted by the SIGCHLD
handler:

7c3370d4fe3fa6cda8655f109e4659afc8ca4269

Since then, the qemu_add_child_watch() interface was added to allow
registration of such processes and reap only from that specific set
of PIDs:

4d54ec7898bd951007cb6122d5315584bd41d0c4

As a result, we can now avoid blocking SIGCHLD in launch_script(), so
drop that behavior.

Signed-off-by: Michael Roth 
---
 net/tap.c |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 1f26dc9..2cf79e1 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -351,10 +351,6 @@ static int launch_script(const char *setup_script, const 
char *ifname, int fd)
 char *args[3];
 char **parg;
 
-sigemptyset(&mask);
-sigaddset(&mask, SIGCHLD);
-sigprocmask(SIG_BLOCK, &mask, &oldmask);
-
 /* try to launch network script */
 pid = fork();
 if (pid == 0) {
@@ -378,7 +374,6 @@ static int launch_script(const char *setup_script, const 
char *ifname, int fd)
 while (waitpid(pid, &status, 0) != pid) {
 /* loop */
 }
-sigprocmask(SIG_SETMASK, &oldmask, NULL);
 
 if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
 return 0;
-- 
1.7.4.1

Re: [Qemu-devel] [PATCH] use PCI MACRO in virtio-pci


On 12/07/2011 03:39 AM, hkran wrote:

Signed-off-by: hkran 


A Signed-off-by line needs to use your full name.

Regards,

Anthony Liguori


---
hw/virtio-pci.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index c665f5c..f8ee772 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -627,9 +627,9 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice 
*vdev)
if (proxy->class_code) {
pci_config_set_class(config, proxy->class_code);
}
- pci_set_word(config + 0x2c, pci_get_word(config + PCI_VENDOR_ID));
- pci_set_word(config + 0x2e, vdev->device_id);
- config[0x3d] = 1;
+ pci_set_word(config + PCI_SUBSYSTEM_VENDOR_ID, pci_get_word(config +
PCI_VENDOR_ID));
+ pci_set_word(config + PCI_SUBSYSTEM_ID, vdev->device_id);
+ config[PCI_INTERRUPT_PIN] = 1;

memory_region_init(&proxy->msix_bar, "virtio-msix", 4096);
if (vdev->nvectors && !msix_init(&proxy->pci_dev, vdev->nvectors,

Re: [Qemu-devel] [PATCH] use PCI MACRO in virtio-pci

2011-12-07 Thread hkran


On 12/07/2011 06:18 PM, Stefan Hajnoczi wrote:

On Wed, Dec 7, 2011 at 9:39 AM, hkran  wrote:

Signed-off-by: hkran
---
  hw/virtio-pci.c |6 +++---
  1 files changed, 3 insertions(+), 3 deletions(-)

This patch is line-wrapped and git-am refuses to apply it.

Are you using git-send-email(1)?  It normally sends patches correctly
so I recommend it if you are not yet using it.

Otherwise this looks good and we can take it into the trivial-patches
tree.  Please resend without line wrap.

Stefan


ok,thanks for reviewing. I will resend it with git-publish:)

Re: [Qemu-devel] [PATCH 2/2] configure: remove --enable-cocoa (default), add --disable-cocoa.

2011-12-07 Thread andrzej zaborowski

On 7 December 2011 19:56, Peter Maydell  wrote:
> On 7 December 2011 07:47, Andrzej Zaborowski  wrote:
>> Cocoa can only be enabled on Darwin, and is enabled by default too,
>> making --enable-cocoa redundant, with no way to disable Cocoa.  It
>> also interfered with SDL support in a way that was dependent on
>> the order of commandline switches.
>
> For these --enable/disable pairs I quite like the pattern where
>  * default is "probe and use if available"
>  * --enable-foo is "use foo, fail configure if not available"
>  * --disable-foo is "don't use foo"
>
> (--enable-sdl/--disable-sdl work like this, for instance).

Yep, the difference here is that there's no probing, so --enable is a
little redundant, but maybe for consistency it's better to have
anyway.  One of the issues with current --enable-cocoa though is that
it has a different effect than if Cocoa is enabled by default.

>
> [cf the comment in configure at line 100.]
>
> Incidentally, is it "Cocoa" or "COCOA" ?

It appears as Cocoa in the system libraries, but don't take my word.

Cheers

Re: [Qemu-devel] [PATCH 3/3] configure: add '--disable-cocoa' switch

Am 10.11.2011 19:40, schrieb Pavel Borzenkov:
> When SDL support is disabled, there is no way to build QEMU without
> Cocoa support on MacOS X. This patch adds '--disable-cocoa' switch and
> allows to build QEMU without both SDL and Cocoa frontends.
> 
> Signed-off-by: Pavel Borzenkov 
> ---
>  configure |7 ++-
>  1 files changed, 6 insertions(+), 1 deletions(-)
> 
> diff --git a/configure b/configure
> index 401d9a6..4720bb2 100755
> --- a/configure
> +++ b/configure
> @@ -670,6 +670,8 @@ for opt do
>;;
>--enable-profiler) profiler="yes"
>;;
> +  --disable-cocoa) cocoa="no"
> +  ;;
>--enable-cocoa)
>cocoa="yes" ;
>sdl="no" ;

Tested-by: Andreas Färber 

> @@ -980,7 +982,10 @@ echo "  --disable-sdldisable SDL"
>  echo "  --enable-sdl enable SDL"
>  echo "  --disable-vncdisable VNC"
>  echo "  --enable-vnc enable VNC"
> -echo "  --enable-cocoa   enable COCOA (Mac OS X only)"
> +if test "$darwin" = "yes" ; then
> +echo "  --disable-cocoa  disable COCOA"
> +echo "  --enable-cocoa   enable COCOA (default)"
> +fi
>  echo "  --audio-drv-list=LISTset audio drivers list:"
>  echo "   Available drivers: $audio_possible_drivers"
>  echo "  --audio-card-list=LIST   set list of emulated audio cards 
> [$audio_card_list]"

I see no prior art of any conditional help output in configure. Anthony?
Andrzej?

Andreas

Re: [Qemu-devel] [PATCH 2/3] raw-posix: Do not use CONFIG_COCOA macro

Am 30.11.2011 01:38, schrieb Andreas Färber:
> Am 10.11.2011 19:40, schrieb Pavel Borzenkov:
>> Use __APPLE__ and __MACH__ macros instead of CONFIG_COCOA to detect Mac
>> OS X host. The patch is based on the Ben Leslie's patch:
>> http://patchwork.ozlabs.org/patch/97859/
>>
>> Signed-off-by: Pavel Borzenkov 
> 
> Reviewed-by: Andreas Färber 

Thanks, applied to the cocoa branch with a minor edit (s/the//).

http://repo.or.cz/w/qemu/afaerber.git/shortlog/refs/heads/cocoa-for-upstream

Andreas

>> ---
>>  block/raw-posix.c |8 
>>  1 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>> index 17e1c6f..ee08f80 100644
>> --- a/block/raw-posix.c
>> +++ b/block/raw-posix.c
>> @@ -29,7 +29,7 @@
>>  #include "module.h"
>>  #include "block/raw-posix-aio.h"
>>  
>> -#ifdef CONFIG_COCOA
>> +#if defined(__APPLE__) && (__MACH__)
>>  #include 
>>  #include 
>>  #include 
>> @@ -487,7 +487,7 @@ again:
>>  }
>>  if (size == 0)
>>  #endif
>> -#ifdef CONFIG_COCOA
>> +#if defined(__APPLE__) && defined(__MACH__)
>>  size = LONG_LONG_MAX;
>>  #else
>>  size = lseek(fd, 0LL, SEEK_END);
>> @@ -632,7 +632,7 @@ static BlockDriver bdrv_file = {
>>  /***/
>>  /* host device */
>>  
>> -#ifdef CONFIG_COCOA
>> +#if defined(__APPLE__) && defined(__MACH__)
>>  static kern_return_t FindEjectableCDMedia( io_iterator_t *mediaIterator );
>>  static kern_return_t GetBSDPath( io_iterator_t mediaIterator, char 
>> *bsdPath, CFIndex maxPathSize );
>>  
>> @@ -710,7 +710,7 @@ static int hdev_open(BlockDriverState *bs, const char 
>> *filename, int flags)
>>  {
>>  BDRVRawState *s = bs->opaque;
>>  
>> -#ifdef CONFIG_COCOA
>> +#if defined(__APPLE__) && defined(__MACH__)
>>  if (strstart(filename, "/dev/cdrom", NULL)) {
>>  kern_return_t kernResult;
>>  io_iterator_t mediaIterator;

[Qemu-devel] an error while loading checkpoint

2011-12-07 Thread sparsh mittal

I get this warning while using a checkpoint. I issued the command -m 4G
-icount 0 and other parameters.
Unknown savevm section type 32
qemu-system-x86_64: Error -22 while loading VM state

My computer has 48GB RAM, so I don't know what is the problem?


Thanks and Regards
Sparsh Mittal

Re: [Qemu-devel] [libvirt] virDomainBlockJobAbort and block_job_cancel

2011-12-07 Thread Eric Blake

On 12/07/2011 03:35 PM, Adam Litke wrote:
> Stefan's qemu tree has a block_job_cancel command that always acts
> asynchronously.  In order to provide the synchronous behavior in libvirt (when
> flags is 0), I need to wait for the block job to go away.  I see two options:
> 
> 1) Use the event:
> To implement this I would create an internal event callback function and
> register it to receive the block job events.  When the cancel event comes in, 
> I
> can exit the qemu job context.  One problem I see with this is that events are
> only available when using the json monitor mode.

I like this idea.  We have internally handled events before, and limited
it to just JSON if that made life easier: for example, virDomainReboot
on qemu is rejected if you only have the HMP monitor, and if you have
the JSON monitor, the implementation internally handles the event to
change the domain state.

Can we reliably detect whether qemu is new enough to provide the event,
and if qemu was older and did not provide the event, do we reliably know
that abort was blocking in that case?

It's okay to make things work that used to fail, but it is a regression
to make blocking job cancel fail where it used to work, so rejecting
blocking job cancel with HMP monitor is not a good idea.  If we can
guarantee that all qemu new enough to have async cancel also support the
event, while older qemu was always blocking, and that we can always use
the JSON monitor to newer qemu, then we're set - merely ensure that we
use only the JSON monitor and the event.  But if we can't make the
guarantees, and insist on supporting newer qemu via HMP monitor, then we
may need a hybrid combination of options 1 and 2, or for less code
maintenance, just option 2.

> 
> 2) Poll the qemu monitor
> To do it this way, I would write a function that repeatedly calls
> virDomainGetBlockJobInfo() against the disk in question.  Once the job
> disappears from the list I can return with confidence that the job is gone.
> This is obviously sub-optimal because I need to poll and sleep.

We've done this before, for both HMP and JSON - see
qemuMigrationWaitForCompletion.  I agree that an event is nicer than
polling, but we may be locked into this.

> 
> 3) Ask Stefan to provide a synchronous mode for the qemu monitor command
> This one is the nicest from a libvirt perspective, but I doubt qemu wants to 
> add
> such an interface given that it basically has broken semantics as far as qemu 
> is
> concerned.

Or even:

4) Ask Stefan to make the HMP monitor command synchronous, but only
expose the JSON command as asynchronous.  After all, it is only HMP
where we can't wait for an event.

> 
> If this is all too nasty, we could probably just change the behavior of
> blockJobAbort and make it always synchronous with a 'cancelled' event.

No, we can't change the behavior without breaking back-compat of
existing clients of job cancellation.

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [libvirt] virDomainBlockJobAbort and block_job_cancel

2011-12-07 Thread Adam Litke

On Thu, Nov 24, 2011 at 09:21:42AM +, Stefan Hajnoczi wrote:
> On Thu, Nov 24, 2011 at 5:31 AM, Daniel Veillard  wrote:
> > On Wed, Nov 23, 2011 at 09:04:50AM -0700, Eric Blake wrote:
> >> On 11/23/2011 07:48 AM, Stefan Hajnoczi wrote:
> >> > This means that virDomainBlockJobAbort() returns to the client without
> >> > a guarantee that the job has completed.  If the client enumerates jobs
> >> > it may still see a job that has not finished cancelling.  The client
> >> > must register a handler for the BLOCK_JOB_CANCELLED event if it wants
> >> > to know when the job really goes away.  The BLOCK_JOB_CANCELLED event
> >> > has the same fields as the BLOCK_JOB_COMPLETED event, except it lacks
> >> > the optional "error" message field.
> >> >
> >> > The impact on clients is that they need to add a BLOCK_JOB_CANCELLED
> >> > handler if they really want to wait.  Most clients today (not many
> >> > exist) will be fine without waiting for cancellation.
> >> >
> >> > Any objections or thoughts on this?
> >>
> >> virDomainBlockJobAbort() thankfully has an 'unsigned int flags'
> >> argument.  For backwards-compatibility, I suggest we use it:
> >>
> >> calling virDomainBlockJobAbort(,0) maintains old blocking behavior, and
> >> we document that blocking until things abort may render the rest of
> >> interactions with the domain unresponsive.
> >>
> >> The new virDomainBlockJobAbort(,VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC) would
> >> then implement your new proposed semantics of returning immediately once
> >> the cancellation has been requested, even if it hasn't been acted on yet.
> >>
> >> Maybe you could convince me to swap the flags: have 0 change semantics
> >> to non-blocking, and a new flag to request blocking, where callers that
> >> care have to try the flag, and if the flag is unsupported then they know
> >> they are talking to older libvirtd where the behavior is blocking by
> >> default, but that's a bit riskier.
> >
> >  Agreed, I would rather not change the current call semantic,
> > but an ASYNC flag would be a really good addition. We can document
> > the risk of not using it in the function description and suggest
> > new applications use ASYNC flag.
> 
> Yep, that's a nice suggestion and solves the problem.

I am almost ready to post the code that makes the above change, but before I do,
I need to ask a question about implementing synchronous behavior.

Stefan's qemu tree has a block_job_cancel command that always acts
asynchronously.  In order to provide the synchronous behavior in libvirt (when
flags is 0), I need to wait for the block job to go away.  I see two options:

1) Use the event:
To implement this I would create an internal event callback function and
register it to receive the block job events.  When the cancel event comes in, I
can exit the qemu job context.  One problem I see with this is that events are
only available when using the json monitor mode.

2) Poll the qemu monitor
To do it this way, I would write a function that repeatedly calls
virDomainGetBlockJobInfo() against the disk in question.  Once the job
disappears from the list I can return with confidence that the job is gone.
This is obviously sub-optimal because I need to poll and sleep.

3) Ask Stefan to provide a synchronous mode for the qemu monitor command
This one is the nicest from a libvirt perspective, but I doubt qemu wants to add
such an interface given that it basically has broken semantics as far as qemu is
concerned.

If this is all too nasty, we could probably just change the behavior of
blockJobAbort and make it always synchronous with a 'cancelled' event.

Thoughts?

-- 
Adam Litke 
IBM Linux Technology Center

[Qemu-devel] [PATCH] tcg: Remove redundant declarations of TCG_TARGET_REG_BITS

TCG_TARGET_REG_BITS is declared in tcg.h for all TCG targets.

Signed-off-by: Stefan Weil 
---
 tcg/i386/tcg-target.h |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 7756e7b..adbb036 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -23,11 +23,6 @@
  */
 #define TCG_TARGET_I386 1
 
-#if defined(__x86_64__)
-# define TCG_TARGET_REG_BITS 64
-#else
-# define TCG_TARGET_REG_BITS 32
-#endif
 //#define TCG_TARGET_WORDS_BIGENDIAN
 
 #if TCG_TARGET_REG_BITS == 64
-- 
1.7.2.5

Re: [Qemu-devel] [PATCH v2 0/5] backdoor: lightweight guest-to-QEMU backdoor channel

Lluís Vilanova writes:
[...]
> From my experience, these are the basics I need:

> * Decide what to do when an event is translated (by default - no 
> instrumentation
>   -, it just generates a TCG call to an execution-time tracing routine).

>   This includes:

>   * Deciding whether to generate the TCG call to the execution-time tracing
> routine.

>   * Access to generating other arbitrary TCG code (the API is pretty stable, 
> but
> headers depend on target-specific defines).

> This could also include generating code like incrementing a counter (e.g.,
> when counting instructions), instead of calling the execution-time tracing
> routine.

> My library actually hides this from the user, who just provides conditions
> to establish whether to trace the event. Then the library automatically
> establishes whether it's best to evaluate each condition at translation
> time, using TCG code or when the execution-time tracing routine is 
> invoked.

>   * [still don't have it] Access to a map between target-specific abstractions
> (mainly register names) and TCG values (e.g., to generate code that gets 
> or
> sets the value of a well-known register).

> * Decide what to do when an event is executed, including calls to 
> execution-time
>   event tracing routines generated by TCG code (by default, it just calls the
>   tracing backend).

>   This includes:

>   * Arbitrary user-provided code (the analysis itself when not inlined with
> TCG).

>   * [still don't have it] Access to get and set the values of target-specific
> abstractions (mainly registers).

> * Add a per-vCPU opaque pointer with data private to the analyzer code.

I forgot to add that another nice addition is to let the translation-time
user-code specify which global registers might be accessed in the execution-time
user code (thus the translation can generate TCG_CALL_CONST by default).

Right now, it is established at compile time by having two event properties; one
with and one without TCG_CALL_CONST-enabled TCG helpers.


Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth

Re: [Qemu-devel] [PATCH 1/2] configure: don't check for Cocoa when detecting SDL.

Am 07.12.2011 08:47, schrieb Andrzej Zaborowski:
> The SDL check is supposed to set $sdl to "yes" or "no", but with that
> check it leaves $sdl unset on darwin, unless --enable-cocoa was
> specified (which is not needed to enable cocoa anyway).
> 
> Signed-off-by: Andrzej Zaborowski 
> ---
>  configure |4 +---
>  1 files changed, 1 insertions(+), 3 deletions(-)
> 
> diff --git a/configure b/configure
> index 678b982..fb15bc6 100755
> --- a/configure
> +++ b/configure
> @@ -1492,9 +1492,7 @@ EOF
>  if test "$_sdlversion" -lt 121 ; then
>sdl_too_old=yes
>  else
> -  if test "$cocoa" = "no" ; then
> -sdl=yes
> -  fi
> +  sdl=yes

With no further explanation in the commit description I have doubts if
this is really "safe". Cocoa actually masquerades as SDL in vl.c and has
no display type of its own, so there may well be a real need to suppress
SDL in the --enable-cocoa case. The #ifdef'fery may need to be sanitized
first, at least fully reviewed.

Also please cc me on Cocoa patches.

Thanks,
Andreas

>  fi
>  
>  # static link with sdl ? (note: sdl.pc's --static --libs is broken)

Re: [Qemu-devel] [PATCH 2/2] configure: remove --enable-cocoa (default), add --disable-cocoa.

On 7 December 2011 21:12, Andreas Färber  wrote:
> Note that I have a patch that replaces uint16 with uint_fast16_t,
> properly fixing the Cocoa build. What I don't have yet is all the other
> conversions (Coccinelle doesn't fully do int16 conversion, for example)
> to run the benchmarks Peter asked for.

For the benchmarks surely it suffices to flip the typedefs, ie compare
typedef uint8_t uint8;
typedef int8_t int8;
typedef uint16_t uint16;
typedef int16_t int16;
typedef uint32_t uint32;
typedef int32_t int32;
typedef uint64_t uint64;
typedef int64_t int64;

with
typedef uint_fast8_t uint8;
typedef int_fast8_t int8;
typedef uint_fast16_t uint16;
typedef int_fast16_t int16;
typedef uint_fast32_t uint32;
typedef int_fast32_t int32;
typedef uint_fast64_t uint64;
typedef int_fast64_t int64;

?

We only need to do the full search-n-replace when we've picked
which one we're going for...

-- PMM

Re: [Qemu-devel] [RFC] Device sandboxing

2011-12-07 Thread Paul Moore

On Wednesday, December 07, 2011 12:48:16 PM Anthony Liguori wrote:
> On 12/07/2011 12:25 PM, Corey Bryant wrote:
> > A group of us are starting to work on sandboxing QEMU device emulation
> > code. We're just getting started investigating various approaches, and
> > want to engage the community to gather input.
> 
> > Following are the design points that we are currently considering:
>
> To be perfectly honest, I think prototyping and measuring performance is
> going to be the only way to figure out the right approach here.

Agreed.  I'm currently working on a prototype to play around with some of the 
ideas discussed in this thread.  As soon as it is functional I'll send a 
pointer/patches/etc. to the list.

-- 
paul moore
virtualization @ redhat

Re: [Qemu-devel] [PATCH 2/2] configure: remove --enable-cocoa (default), add --disable-cocoa.

Am 07.12.2011 08:47, schrieb Andrzej Zaborowski:
> Cocoa can only be enabled on Darwin, and is enabled by default too,
> making --enable-cocoa redundant, with no way to disable Cocoa.  It
> also interfered with SDL support in a way that was dependent on
> the order of commandline switches.
> 
> Signed-off-by: Andrzej Zaborowski 

Nack. This not only conflicts with Pavel's patch series but like many
previous patches only does half the job (misses the block layer).
Could you please review his last series instead and rebase onto that if
necessary?

http://patchwork.ozlabs.org/patch/124980/
http://patchwork.ozlabs.org/patch/124979/
http://patchwork.ozlabs.org/patch/124981/

> ---
> Cocoa support seems to be broken at the moment, at least on some
> MacOS X versions.  But qemu builds and runs with SDL.

Many times have I asked how to actually use SDL with QEMU on Mac OS X.
If you've figured it out, please share that knowledge! What SDL download
do you use, what parameters do you pass to configure, etc.?

Note that I have a patch that replaces uint16 with uint_fast16_t,
properly fixing the Cocoa build. What I don't have yet is all the other
conversions (Coccinelle doesn't fully do int16 conversion, for example)
to run the benchmarks Peter asked for.

Andreas

>  configure |7 ++-
>  1 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/configure b/configure
> index fb15bc6..c5d07af 100755
> --- a/configure
> +++ b/configure
> @@ -674,10 +674,7 @@ for opt do
>;;
>--enable-profiler) profiler="yes"
>;;
> -  --enable-cocoa)
> -  cocoa="yes" ;
> -  sdl="no" ;
> -  audio_drv_list="coreaudio `echo $audio_drv_list | sed s,coreaudio,,g`"
> +  --disable-cocoa) cocoa="no"
>;;
>--disable-system) softmmu="no"
>;;
> @@ -986,7 +983,7 @@ echo "  --disable-sdldisable SDL"
>  echo "  --enable-sdl enable SDL"
>  echo "  --disable-vncdisable VNC"
>  echo "  --enable-vnc enable VNC"
> -echo "  --enable-cocoa   enable COCOA (Mac OS X only)"
> +echo "  --disable-cocoa  disable COCOA (Mac OS X only)"
>  echo "  --audio-drv-list=LISTset audio drivers list:"
>  echo "   Available drivers: $audio_possible_drivers"
>  echo "  --audio-card-list=LIST   set list of emulated audio cards 
> [$audio_card_list]"

[Qemu-devel] [PATCH RFC 4/4] trace-instrument: Add a per-vCPU opaque pointer for the instrumentation backend

The 'instrument' opaque pointer can be used by the user-provided trace
instrumentation backend to hold arbitrary data.

Signed-off-by: Lluís Vilanova 
---
 cpu-defs.h |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/cpu-defs.h b/cpu-defs.h
index db48a7a..9e46953 100644
--- a/cpu-defs.h
+++ b/cpu-defs.h
@@ -218,6 +218,8 @@ typedef struct CPUWatchpoint {
 struct KVMState *kvm_state; \
 struct kvm_run *kvm_run;\
 int kvm_fd; \
-int kvm_vcpu_dirty;
-
+int kvm_vcpu_dirty; \
+\
+/* opaque pointer to instrumentation data */\
+void *instrument;
 #endif

[Qemu-devel] [PATCH RFC 3/4] trace-instrument: Add support for user-provided code on a per-event basis

Adds support for the 'instrument' propery into the "trace-events" file.

Events with this property will generate tracing routines named
'${api_name}_backend' (instead of '${api_name}'), and expect the user to provide
its own implementation for the '${api_name}' routines.

Signed-off-by: Lluís Vilanova 
---
 Makefile.objs  |   23 +--
 Makefile.target|2 +-
 configure  |   38 ++
 scripts/simpletrace.py |2 +-
 scripts/tracetool  |   16 
 5 files changed, 77 insertions(+), 4 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index 0012b04..13f9349 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -346,12 +346,12 @@ else
 trace.h: trace.h-timestamp
 endif
 trace.h-timestamp: $(SRC_PATH)/trace-events $(BUILD_DIR)/config-host.mak
-   $(call quiet-command,sh $(SRC_PATH)/scripts/tracetool 
--$(TRACE_BACKEND) -h < $< > $@,"  GEN   trace.h")
+   $(call quiet-command,sh $(SRC_PATH)/scripts/tracetool 
$(TRACETOOL_EXTRA) --$(TRACE_BACKEND) -h < $< > $@,"  GEN   trace.h")
@cmp -s $@ trace.h || cp $@ trace.h
 
 trace.c: trace.c-timestamp
 trace.c-timestamp: $(SRC_PATH)/trace-events $(BUILD_DIR)/config-host.mak
-   $(call quiet-command,sh $(SRC_PATH)/scripts/tracetool 
--$(TRACE_BACKEND) -c < $< > $@,"  GEN   trace.c")
+   $(call quiet-command,sh $(SRC_PATH)/scripts/tracetool 
$(TRACETOOL_EXTRA) --$(TRACE_BACKEND) -c < $< > $@,"  GEN   trace.c")
@cmp -s $@ trace.c || cp $@ trace.c
 
 trace.o: trace.c $(GENERATED_HEADERS)
@@ -399,6 +399,25 @@ trace-obj-y += $(addprefix trace/, $(trace-nested-y))
 $(trace-obj-y): $(GENERATED_HEADERS)
 
 ##
+# trace instrument
+
+ifdef CONFIG_TRACE_INSTRUMENT
+LIBTRACE_INSTRUMENT = libtrace-instrument/libtrace-instrument.a
+
+.PHONY: force
+force:
+
+$(LIBTRACE_INSTRUMENT): QEMU_CFLAGS += -I$(SRC_PATH)/fpu
+$(LIBTRACE_INSTRUMENT): QEMU_CFLAGS += -I$(SRC_PATH)/tcg/$(TARGET_BASE_ARCH)
+$(LIBTRACE_INSTRUMENT): VPATH = $(TRACE_INSTRUMENT_PATH)
+$(LIBTRACE_INSTRUMENT): $(dir $(LIBTRACE_INSTRUMENT))/Makefile force
+   $(call quiet-command,$(MAKE) $(SUBDIR_MAKEFLAGS) -C $(dir $@)   \
+   QEMU_CFLAGS="$(QEMU_CFLAGS)"\
+   TARGET_DIR=$(TARGET_DIR)$(dir $@)/ VPATH=$(VPATH)   \
+   SRC_PATH=$(SRC_PATH) V="$(V)" $(notdir $@))
+endif
+
+##
 # backdoor
 
 backdoor-nested-$(CONFIG_USER_ONLY) += user.o
diff --git a/Makefile.target b/Makefile.target
index a400d92..1b1c71b 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -415,7 +415,7 @@ endif # CONFIG_LINUX_USER
 
 obj-$(CONFIG_GDBSTUB_XML) += gdbstub-xml.o
 
-$(QEMU_PROG): $(obj-y) $(obj-$(TARGET_BASE_ARCH)-y) $(LIBBACKDOOR)
+$(QEMU_PROG): $(obj-y) $(obj-$(TARGET_BASE_ARCH)-y) $(LIBBACKDOOR) 
$(LIBTRACE_INSTRUMENT)
$(call LINK,$^)
 
 
diff --git a/configure b/configure
index 67d4c9a..ab7ced5 100755
--- a/configure
+++ b/configure
@@ -176,6 +176,7 @@ pie=""
 zero_malloc=""
 trace_backend="nop"
 trace_file="trace"
+trace_instrument=""
 backdoor=""
 spice=""
 rbd=""
@@ -550,6 +551,8 @@ for opt do
   ;;
   --with-trace-file=*) trace_file="$optarg"
   ;;
+  --with-trace-instrument=*) trace_instrument="$optarg"
+  ;;
   --with-backdoor=*) backdoor="$optarg"
   ;;
   --enable-gprof) gprof="yes"
@@ -1065,6 +1068,8 @@ echo "  --enable-trace-backend=B Set trace backend"
 echo "   Available backends:" 
$("$source_path"/scripts/tracetool --list-backends)
 echo "  --with-trace-file=NAME   Full PATH,NAME of file to store traces"
 echo "   Default:trace-"
+echo "  --with-trace-instrument=PATH"
+echo "   Directory to build user-provided trace event 
instrumentation library"
 echo "  --with-backdoor=PATH Directory to build user-provided backdoor 
library"
 echo "  --disable-spice  disable spice"
 echo "  --enable-spice   enable spice"
@@ -2699,6 +2704,26 @@ if compile_prog "" "" ; then
 fi
 
 ##
+# check for a valid directory for trace instrumentation
+if test -n "$trace_instrument"; then
+if test ! -f "$trace_instrument/Makefile"; then
+echo
+echo "Error: cannot make into '$trace_instrument'"
+echo "Please choose a directory where I can run 'make'"
+echo
+exit 1
+fi
+if test ! -f "$trace_instrument/trace-instrument.h"; then
+echo
+echo "Error: directory '$trace_instrument' does not contain a 
\"trace-instrument.h\" file"
+echo
+exit 1
+fi
+trace_instrument=`readlink -f "$trace_instrument"`
+fi
+
+
+##
 # End of CC checks
 # After here, no more $cc or $ld runs
 
@@ -2856,6 +2881,9 @@ echo "uuid support  $uuid"
 echo "vhost-net support $vhost_net"
 e

[Qemu-devel] [PATCH RFC 2/4] trace-instrument: Add documentation

Signed-off-by: Lluís Vilanova 
---
 docs/instrumentation.txt |  142 ++
 docs/tracing.txt |9 +++
 2 files changed, 151 insertions(+), 0 deletions(-)
 create mode 100644 docs/instrumentation.txt

diff --git a/docs/instrumentation.txt b/docs/instrumentation.txt
new file mode 100644
index 000..f9f3ce1
--- /dev/null
+++ b/docs/instrumentation.txt
@@ -0,0 +1,142 @@
+= Trace instrumentation =
+
+== Introduction ==
+
+This document describes how the events provided by the tracing infrastructure
+described in the document "tracing.txt" can be extended with user-defined code
+that is compiled into QEMU.
+
+The "instrument" property provides means to statically override and/or wrap the
+backend-specific tracing code on a per-event basis when compiling QEMU
+(regardless of the tracing backend).
+
+As opposed to using tracing backends that can dynamically load user-defined 
code
+into the tracing events (e.g., "dtrace"), trace instrumentation is performed at
+compile time and thus provides means for the tracing hooks to directly interact
+with QEMU internals.
+
+Static trace instrumentation trades the ease and flexibility of existing 
tracing
+backends with the ability to directly interact with the internal routines of
+QEMU, as well as the ability to embed highly performant and ad-hoc analyses
+based on QEMU's tracing events directly into QEMU itself.
+
+
+== Quickstart ==
+
+1. Declare which tracing events to instrument:
+
+sed -i -e "s/qemu_vmalloc(/instrument qemu_vmalloc(/g" trace-events
+
+2. Create the trace instrumentation header ("trace-instrument.h") with the
+   headers for the instrumented tracing events:
+
+cat > /tmp/my-instrument/trace-instrument.h < /tmp/my-intrument/Makefile < /tmp/my-instrument/instrument.c <

[Qemu-devel] [PATCH RFC 1/4] trace: [tracetool] Add 'get_api_name' to construct the name of tracing routines

All backends now use 'get_api_name' to build the name of the routines that are
used by QEMU code when invoking trace points. This is built based on the values
of 'get_api_name_fmt' and 'get_api_name_fmt_default'.

The old 'get_name' is still available to refer to the name of an event.

This allows future tracetool options to change the naming of the tracing API on
a per-event basis without having to modify backend-specific code.

Signed-off-by: Lluís Vilanova 
---
 scripts/tracetool |   41 +
 1 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/scripts/tracetool b/scripts/tracetool
index 701b517..c699801 100755
--- a/scripts/tracetool
+++ b/scripts/tracetool
@@ -57,6 +57,20 @@ get_name()
 echo "${name##* }"
 }
 
+get_api_name_fmt_default="trace_%s"
+
+# Get the tracepoint name of a trace event (the name of the function QEMU uses)
+# The result is 'printf "$fmt" $name', where 'fmt' is 'get_api_name_fmt' if set
+# or 'get_api_name_fmt_default' otherwise
+get_api_name()
+{
+local name fmt
+name=$(get_name "$1")
+fmt="$get_api_name_fmt"
+[ -n "$fmt" ] || fmt=$get_api_name_fmt_default
+printf "$fmt" "$name"
+}
+
 # Get the given property of a trace event
 # 1: trace-events line
 # 2: property name
@@ -133,13 +147,13 @@ linetoh_begin_nop()
 
 linetoh_nop()
 {
-local name args
-name=$(get_name "$1")
+local api args
+api=$(get_api_name "$1")
 args=$(get_args "$1")
 
 # Define an empty function for the trace event
 cat <

[Qemu-devel] [RFC][PATCH RFC 0/4] trace-instrument: Let the user wrap/override specific event tracing routines

XXX: Must be applied on top of the "backdoor" patch series and the patch "trace:
 Provide a per-event status define for conditional compilation", but is
 independent otherwise.

Adds the "instrument" event property. When specified, this event property lets
the user provide her own implementation for that specific tracing event.

Still, in case the user only wants to wrap around the tracing event, the
original tracing implementation is accessible through function
'trace__backend', instead of the original 'trace_' (which
now the user has to provide).

The user-provided tracing functions are expected to be in the static library
"libtrace-instrument.a", identified by the "--with-trace-instrument"
configuration parameter and compiled by each of QEMU's target/frontend pairs.

See the documentation added in the second patch for more information.

Signed-off-by: Lluís Vilanova 
---

Lluís Vilanova (4):
  trace: [tracetool] Add 'get_api_name' to construct the name of tracing 
routines
  trace-instrument: Add documentation
  trace-instrument: Add support for user-provided code on a per-event basis
  trace-instrument: Add a per-vCPU opaque pointer for the instrumentation 
backend


 Makefile.objs|   23 +++
 Makefile.target  |2 -
 configure|   38 
 cpu-defs.h   |6 +-
 docs/instrumentation.txt |  142 ++
 docs/tracing.txt |9 +++
 scripts/simpletrace.py   |2 -
 scripts/tracetool|   57 +++---
 8 files changed, 261 insertions(+), 18 deletions(-)
 create mode 100644 docs/instrumentation.txt

Re: [Qemu-devel] [RFC] Device sandboxing

2011-12-07 Thread Eric Paris

On Wed, 2011-12-07 at 13:43 -0600, Anthony Liguori wrote:
> On 12/07/2011 01:32 PM, Corey Bryant wrote:

> > That would seem like the logical approach. I think there may be new mode 2
> > patches coming soon so we can see how they go over.
> 
> I'd like to see what the whitelist would need to be for something like QEMU 
> in 
> mode 2.  My biggest concern is that the whitelist would need to be so large 
> that 
> the practical security what's all that much improved.

When I prototyped my version of seccomp v2 for qemu a while back I did
it by only looking at syscalls after inital setup was completed (aka the
very last thing before main_loop() was to lock it down).  My list was
much sorter than even these:

+__NR_brk,
+__NR_close,
+__NR_exit_group,
+__NR_futex,
+__NR_ioctl,
+__NR_madvise,
+__NR_mmap,
+__NR_munmap,
+__NR_read,
+__NR_recvfrom,
+__NR_recvmsg,
+__NR_rt_sigaction,
+__NR_select,
+__NR_sendto,
+__NR_tgkill,
+__NR_timer_delete,
+__NR_timer_gettime,
+__NR_timer_settime,
+__NR_write,
+__NR_writev,

There is simple obvious negligible overhead value here, but every
proposal I know of, including mine, has been shot down by Ingo.  Ingo's
proposal is much more work, more overhead, but clearly more flexible.
His suggestions (and code based on those suggestions from others) has
been shot down by PeterZ.

So I feel like seccomp v2 is between a rock and a hard place.  We have
something that works really well, and could be a huge win for all sorts
of programs, but we don't seem to be able to get anything past the
damned if you do, damned if you don't nak's.

(There's also a cgroup version of seccomp proposed, but I'm guessing it
will go just about as far as all the other versions)

-Eric

Re: [Qemu-devel] [PATCH v2 0/5] backdoor: lightweight guest-to-QEMU backdoor channel

Anthony Liguori writes:

> On 12/07/2011 12:51 PM, Peter Maydell wrote:
>> 2011/12/7 Lluís Vilanova:
>>> Anthony Liguori writes:
>>> [...]
 Why should this analyzer live outside of QEMU in the first place?  I fail 
 to see
 the rationale for that other than not wanting to do the work of making it
 suitable for upstream consumption.
>>> 
>>> For the same reason that SystemTap lets you add user-provided code into the
>>> trace hooks, you never know what the user actually wants. The difference is 
>>> that
>>> when dealing with traces that require a high bandwidth, different people 
>>> may be
>>> interested on analyzing different events and under different conditions, and
>>> having a JIT in QEMU comes in handy to optimize the traces. This includes 
>>> the
>>> generation of extra tracing code at translation time (e.g., I generate 
>>> checks in
>>> TCG to establish when I want to trace a specific event, and someone else may
>>> just want to increment a counter using TCG code).
>>> 
>>> Guest trace instrumentation turns QEMU into a highly-performant tool for 
>>> dynamic
>>> binary instrumentation, with all the benefits that stem from QEMU (support 
>>> for a
>>> myriad of target architectures, as well as support for both full-system and
>>> user-level applications).
>> 
>> I think this *is* useful. However I also think that it *is* effectively
>> defining an API for people writing this hook code, and as such we ought
>> to do it properly if we're going to do it. (ie nail down what we are
>> providing for hook authors and don't let them grub around in the internals
>> otherwise).

> I strongly suspect that you could define interfaces in QEMU such that you 
> could
> do most of what you want without needing to link any code against QEMU itself.

> This is probably a case where LUA integration might make a lot of sense.

Sure you can provide an API for instrumentors. It's just that defining a stable
API for that would require me at least an order of magnitude more effort, and
this is time I don't have right now.

The other problem here is that of performance.

During translation, it is basically dependant on TB reusability, plus the
possibility to optimize the analyzer code when it's inlined into QEMU and some
of its arguments are known at compile-time. I just don't have numbers on that.

During execution it is way more sensitive, thus my approach to build the
analyzer code into QEMU. Otherwise you would go through a LUA interpreter (or
anything else) or, in the best case, a conditional indirect call
(dlopen).

>From my experience, these are the basics I need:

* Decide what to do when an event is translated (by default - no instrumentation
  -, it just generates a TCG call to an execution-time tracing routine).

  This includes:

  * Deciding whether to generate the TCG call to the execution-time tracing
routine.

  * Access to generating other arbitrary TCG code (the API is pretty stable, but
headers depend on target-specific defines).

This could also include generating code like incrementing a counter (e.g.,
when counting instructions), instead of calling the execution-time tracing
routine.

My library actually hides this from the user, who just provides conditions
to establish whether to trace the event. Then the library automatically
establishes whether it's best to evaluate each condition at translation
time, using TCG code or when the execution-time tracing routine is invoked.

  * [still don't have it] Access to a map between target-specific abstractions
(mainly register names) and TCG values (e.g., to generate code that gets or
sets the value of a well-known register).

* Decide what to do when an event is executed, including calls to execution-time
  event tracing routines generated by TCG code (by default, it just calls the
  tracing backend).

  This includes:

  * Arbitrary user-provided code (the analysis itself when not inlined with
TCG).

  * [still don't have it] Access to get and set the values of target-specific
abstractions (mainly registers).

* Add a per-vCPU opaque pointer with data private to the analyzer code.

Other things might include the injection of interrupts or anything else related
to the guest.

Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth

Re: [Qemu-devel] [RFC] Device sandboxing




On 12/07/2011 02:43 PM, Anthony Liguori wrote:

On 12/07/2011 01:32 PM, Corey Bryant wrote:


Agreed.


* The untrusted thread would be restricted by seccomp mode 1 and
would contain the device emulation code.


I think the best strategy would allow for a device to run either in the
untrusted thread or the trusted thread. This makes performance testing a
bit easier and it also makes development a bit more natural.



When you refer to the device running in the trusted thread, are you
talking
about the case where you run QEMU without sandboxing support? I think
we would
ideally like to add this new support such that if it is not enabled,
QEMU will
still run as a single process and decomposition wouldn't occur.


* The trusted helper thread would run beside the untrusted thread,
enabling the untrusted thread to make syscalls beyond read(),
write(), exit(), and sigreturn().


I assume you mean process, not thread BTW?



I do mean thread. When making calls on behalf of the seccomp'd thread,
I think
there will be syscalls that must be called from the same address
space. That's
where the the trusted helper thread would come into play.


* IPC communication mechanisms:

* An IPC mechanism will be required to enable communication between
untrusted and trusted threads.

* An IPC mechanism will also be required to enable communication
between the main QEMU process and device processes.


IPC is easy. We have tons of infrastructure in QEMU for IPC (virtio,
QMP, etc.). Please don't reinvent the wheel here.



Ok


* The communication mechanisms must provide secure communication,
be low overhead (easy to generate, parse, and validate), and must
play well with sVirt/LSMs.


I don't see how sVirt/LSM fits into this but all of these requirements
are also true for the other big untrusted thread that we interact with
(the guest itself).

My view is that we should view the untrusted thread as an extension of
the guest and that the interfaces between the trusted thread and the
untrusted thread views it simply as another machine type that presents a
different (simpler) hardware abstraction.



Yes this makes sense. I think our biggest concern with IPC is that we
don't
introduce a TOCTTOU opportunity for a device to change call parameters
after
they've been checked and before the calls is made on behalf of the
sandboxed
thread. Shared memory that is writable by both untrusted/trusted
thread could
introduce this.


This is no different than dealing with a guest. We have to handle this
with virtio already.



Well that's good.




* Some thoughts for IPC mechanisms are Unix sockets, pipes, virtio,
Google Native Client's IMC, and shared memory.


The actual mechanism doesn't really matter I think, but see above
comments.


* If seccomp mode 2 support becomes available, decomposition of device
emulation into untrusted/trusted threads may not be necessary. This
could result in improved performance (no IPC overhead between trusted
and untrusted thread) and reduced complexity (no need for trusted
helper thread).


If mode 2 is the Right Answer, then we shouldn't wait for it to become
available. We should make it available by pushing it into the kernel.

If we all agree that if mode 2 existed, it's what we would use, then
that we have the answer to this discussion and we know what we need to
go off and do.



That would seem like the logical approach. I think there may be new
mode 2
patches coming soon so we can see how they go over.


I'd like to see what the whitelist would need to be for something like
QEMU in mode 2. My biggest concern is that the whitelist would need to
be so large that the practical security what's all that much improved.


This may not tell the whole story.  These are the syscalls found to be 
called with the following execution:   qemu -hda harddrive.raw -boot c 
-m 256


access
brk
clock_gettime
clone
close
connect
dup
eventfd2
execve
fcntl64
fstat64
futex
getegid32
geteuid32
getgid32
getpeername
getrlimit
getsockname
gettimeofday
getuid32
ioctl
_llseek
madvise
mmap2
mprotect
munmap
nanosleep
open
poll
prctl
pread64
read
readlink
rt_sigaction
rt_sigprocmask
select
set_robust_list
set_thread_area
set_tid_address
shmat
shmctl
shmdt
shmget
signalfd
socket
stat64
tgkill
time
timer_create
timer_gettime
timer_settime
uname
write
writev



Regards,

Anthony Liguori



--
Regards,
Corey

Re: [Qemu-devel] Insane virtio-serial semantics


On 12/07/2011 01:44 PM, Michael Roth wrote:

On 12/07/2011 07:49 AM, Anthony Liguori wrote:

On 12/07/2011 02:21 AM, Markus Armbruster wrote:

Anthony Liguori writes:


On 12/06/2011 04:30 PM, Lluís Vilanova wrote:

Anthony Liguori writes:


I really worry about us introducing so many of these one-off
paravirtual devices.
I would much prefer that you look at doing this as an extension to
the ivshmem
device as it already has this sort of scope. You should be able to
do this by
just extending the size of bar 1 and using a well known guest id.


I did in fact look at ivshmem some time ago, and it's true that both
use the
same mechanisms; but each device has a completely different purpose.
To me it
just seems that extending the control BAR in ivshmem to call the
user-provided
backdoor callbacks is just conflating two completely separate
devices into a
single one. Besides, I think that the qemu-side of the backdoor is
simple enough
to avoid being a maintenance burden.


They have the same purpose (which are both vague TBH). The only
reason I'm sympathetic to this device is that virtio-serial has such
insane semantics.


Could you summarize what's wrong? Is it fixable?


I don't think so as it's part of the userspace ABI now.

Mike, please help me make sure I get this all right. A normal
file/socket has the following guest semantics:

1) When a disconnect occurs, you will receive a return of '0' or -EPIPE
depending on the platform. The fd is now unusable and you must
close/reopen.

2) You can setup SIGIO/SIGPIPE to fire off whenever a file descriptor
becomes readable/writable.

virtio serial has the following semantics:

1) When a disconnect occurs, if you read() you will receive an -EPIPE.

2) However, if a reconnect occurs before you issue your read(), the read
will complete with no indication that a disconnect occurred.

3) This makes it impossible to determine whether a disconnect has
occurred which makes it very hard to reset your protocol stream. To deal
with this, virtio-serial can issue a SIGIO signal upon disconnect.

4) Signals are asynchronous, so a reconnect may have occurred by the
time you get the SIGIO signal. It's unclear that you can do anything
useful with this.


That about sums it up. There was a thread about this a while back where there
was some tentative agreement on a way to fix this by introducing QEMU flags that
invoke similar semantics to unix sockets:

http://thread.gmane.org/gmane.comp.emulators.qemu/94721/focus=95496

But at this point we'd need to re-visit, since there's a fair number of
virtio-serial users now. It'd probably need to be something you could switch on
from the guest via an fcntl() or something.



So besides overloading the meaning of SIGIO, there's really no way to
figure out in the guest when a reconnect has occurred. To deal with this
in qemu-ga, we actually only allow 7-bit data transfers and use the 8th
bit as an in-band message to tell the guest that a reset has occurred.


Yup, it's not perfect though, due to a delayed/spurious response from an agent
that sent data before it read/handled the reset sequence. We don't get that
problem with unix sockets since they'd get an -EPIPE and would be blocked from
sending to a newly opened session.

We try to account for this on the host by following up a reset sequences will
the guest-sync RPC, which contains a unique ID that the guest echos back to us.
That way we can throw away stale data on the host until we get the intended
response. In our case, it's not quite perfect since if the agent sent a "{"
before getting reset, subsequent transmission of the guest-sync response can be
lost. We'd need to precede responses to guest-sync with a 0xFF as well, so that
the host flushes it's rcv buffer/parser state...

And, somewhat off-topic, but none of addresses the case where an agent hangs on
an RPC. This would require some additional handling by the agent side where we
might have tie some additional action to the 0xFF sequence.

Previously this scenario was handled by a hard-coded timeout mechanism in the
agent, with a seperate thread handling the RPCs, but we've since dropped the
thread due to potential for memory leaks (with plans to re-introduce using a
child process).

client-induced resets would be much nicer though, and a reserved byte is the
best solution we've been able to come up with given the current virtio-serial
semantics.


Yeah, we really need a "sane reset semantics" flag for virtio-serial that 
provides a guest and host initiated channel close mechanism.


I think you need to do this by using a single ring and using a simple session id 
with an explicit open/close message.  That way there is never ambiguity.


And yes, I can't help but think of Dave Millers comments long ago that any PV 
transport is eventually going to reinvent TCP, poorly.


Regards,

Anthony Liguori





Regards,

Anthony Liguori



[...]

Re: [Qemu-devel] [PATCH] configure: don't try to compile against known broken curses.


Am 07.12.2011 20:06, schrieb andrzej zaborowski:

On 7 December 2011 19:57, Stefan Weil  wrote:

Am 07.12.2011 08:47, schrieb Andrzej Zaborowski:

+#ifdef NCURSES_VERSION
+# if NCURSES_VERSION_PATCH < 20040117
+# error Old ncurses contain dangerous typedefs, break qemu build 
(and are

old)
+# endif
+#endif
int main(void) { resize_term(0, 0); return curses_version(); }
EOF
for curses_lib in $curses_list; do



Is NCURSES_VERSION_PATCH always defined when NCURSES_VERSION is?


I'm not sure, will try to find out. If it isn't then we should check
that NCURSES_VERSION_MINOR < 4 perhaps.

The intent of checking defined(NCURSES_VERSION) is to detect ncurses
because qemu should also build with other implementations of curses
(in theory).

Cheers


Yes, that's right. W32 for example supports pdcurses. So the check
might look like this:

#if defined(NCURSES_VERSION_PATCH) && NCURSES_VERSION_PATCH < 20040117
# error Old ncurses contain dangerous typedefs, break qemu build (and 
are old)

#endif

Cheers,
Stefan

Re: [Qemu-devel] [RFC] Device sandboxing

2011-12-07 Thread Michael Halcrow

On Wed, Dec 7, 2011 at 11:43 AM, Anthony Liguori wrote:

> I'd like to see what the whitelist would need to be for something like
> QEMU in mode 2.  My biggest concern is that the whitelist would need to be
> so large that the practical security what's all that much improved.
>

Based on some prototyping work I've done with VMM ptrace sandboxing, I
estimate a ceiling of about 50 syscalls in the whitelist. This is a
reduction from over 300, and Linux syscalls that have had security
vulnerabilities in the past few years were not needed. Aside from that, if
we can further restrict based on syscall parameters, then we have a
straightforward mechanism for locking down access to things like file
system resources. For instance, a block device can be restricted to only
accessing the host file(s) that back the block device.

Re: [Qemu-devel] Insane virtio-serial semantics


On 12/07/2011 07:49 AM, Anthony Liguori wrote:

On 12/07/2011 02:21 AM, Markus Armbruster wrote:

Anthony Liguori writes:


On 12/06/2011 04:30 PM, Lluís Vilanova wrote:

Anthony Liguori writes:


I really worry about us introducing so many of these one-off
paravirtual devices.
I would much prefer that you look at doing this as an extension to
the ivshmem
device as it already has this sort of scope. You should be able to
do this by
just extending the size of bar 1 and using a well known guest id.


I did in fact look at ivshmem some time ago, and it's true that both
use the
same mechanisms; but each device has a completely different purpose.
To me it
just seems that extending the control BAR in ivshmem to call the
user-provided
backdoor callbacks is just conflating two completely separate
devices into a
single one. Besides, I think that the qemu-side of the backdoor is
simple enough
to avoid being a maintenance burden.


They have the same purpose (which are both vague TBH). The only
reason I'm sympathetic to this device is that virtio-serial has such
insane semantics.


Could you summarize what's wrong? Is it fixable?


I don't think so as it's part of the userspace ABI now.

Mike, please help me make sure I get this all right. A normal
file/socket has the following guest semantics:

1) When a disconnect occurs, you will receive a return of '0' or -EPIPE
depending on the platform. The fd is now unusable and you must
close/reopen.

2) You can setup SIGIO/SIGPIPE to fire off whenever a file descriptor
becomes readable/writable.

virtio serial has the following semantics:

1) When a disconnect occurs, if you read() you will receive an -EPIPE.

2) However, if a reconnect occurs before you issue your read(), the read
will complete with no indication that a disconnect occurred.

3) This makes it impossible to determine whether a disconnect has
occurred which makes it very hard to reset your protocol stream. To deal
with this, virtio-serial can issue a SIGIO signal upon disconnect.

4) Signals are asynchronous, so a reconnect may have occurred by the
time you get the SIGIO signal. It's unclear that you can do anything
useful with this.


That about sums it up. There was a thread about this a while back where 
there was some tentative agreement on a way to fix this by introducing 
QEMU flags that invoke similar semantics to unix sockets:


http://thread.gmane.org/gmane.comp.emulators.qemu/94721/focus=95496

But at this point we'd need to re-visit, since there's a fair number of 
virtio-serial users now. It'd probably need to be something you could 
switch on from the guest via an fcntl() or something.




So besides overloading the meaning of SIGIO, there's really no way to
figure out in the guest when a reconnect has occurred. To deal with this
in qemu-ga, we actually only allow 7-bit data transfers and use the 8th
bit as an in-band message to tell the guest that a reset has occurred.


Yup, it's not perfect though, due to a delayed/spurious response from an 
agent that sent data before it read/handled the reset sequence. We don't 
get that problem with unix sockets since they'd get an -EPIPE and would 
be blocked from sending to a newly opened session.


We try to account for this on the host by following up a reset sequences 
will the guest-sync RPC, which contains a unique ID that the guest echos 
back to us. That way we can throw away stale data on the host until we 
get the intended response. In our case, it's not quite perfect since if 
the agent sent a "{" before getting reset, subsequent transmission of 
the guest-sync response can be lost. We'd need to precede responses to 
guest-sync with a 0xFF as well, so that the host flushes it's rcv 
buffer/parser state...


And, somewhat off-topic, but none of addresses the case where an agent 
hangs on an RPC. This would require some additional handling by the 
agent side where we might have tie some additional action to the 0xFF 
sequence.


Previously this scenario was handled by a hard-coded timeout mechanism 
in the agent, with a seperate thread handling the RPCs, but we've since 
dropped the thread due to potential for memory leaks (with plans to 
re-introduce using a child process).


client-induced resets would be much nicer though, and a reserved byte is 
the best solution we've been able to come up with given the current 
virtio-serial semantics.




Regards,

Anthony Liguori



[...]

Re: [Qemu-devel] [PATCH v5 0/4] -net bridge: rootless bridge support for qemu



On 11/13/2011 10:45 PM, Corey Bryant wrote:

With qemu it is possible to run a guest from an unprivileged user but if
we wanted to communicate with the outside world we had to switch
to root.

We address this problem by introducing a new network backend and a new
network option for -net tap.  This is less flexible when compared to
existing -net tap options because it relies on a helper with elevated
privileges to do the heavy lifting of allocating and attaching a tap
device to a bridge.  We use a special purpose helper because we don't
want to elevate the privileges of more generic tools like brctl.

Qemu can be run with the default network helper as follows (in these cases
attaching the tap device to the default br0 bridge):

   qemu linux.img -net bridge -net nic,model=virtio

   qemu linux.img -net tap,helper=/usr/local/libexec/qemu-bridge-helper
  -net nic,model=virtio

   qemu linux.img -netdev bridge,id=hn0
  -device virtio-net-pci,netdev=hn0,id=nic1

   qemu linux.img -netdev 
tap,helper=/usr/local/libexec/qemu-bridge-helper,id=hn0
  -device virtio-net-pci,netdev=hn0,id=nic1

The default helper uses it's own ACL mechanism for access control, but
future network helpers could be developed, for example, to support PolicyKit
for access control.

More details are included in individual patches.  The helper is broken into
a series of patches to improve reviewabilty.

v2:
  - Updated signed-off-by's
  - Updated author's email
  - Set default bridge to br0
  - Added -net bridge
  - Updated ACL example
  - Moved from libcap to libcap-ng
  - Fail helper when libcap-ng not configured

v3:
  - Use simple queue to store ACLs
  - Added goto cleanup to helper's main
  - Allow helper execution if libcap-ng not configured
  - Completed static analysis and memory analysis on helper

v4:
  - Update has_vnet_hdr() to return bool
  - Update helper's main() to prevent errno clobbering
  - Let Kernel cleanup helper's file descriptors

v5:
  - Removed if statement with TUNGETIFF ioctl() from has_vnet_hdr()
  - Added -netdev examples and udpated qemu -help netdev documentation
  - Disallow vnet_hdr option with -net tap,helper

Corey Bryant (4):
   Add basic version of bridge helper
   Add access control support to qemu bridge helper
   Add cap reduction support to enable use as SUID
   Add support for net bridge

  Makefile |   12 ++-
  configure|   37 +
  net.c|   29 -
  net.h|3 +
  net/tap.c|  191 +++-
  net/tap.h|3 +
  qemu-bridge-helper.c |  402 ++
  qemu-options.hx  |   74 --
  8 files changed, 728 insertions(+), 23 deletions(-)
  create mode 100644 qemu-bridge-helper.c



Does anyone have any comments on this patch series?  I haven't received 
any thus far for this version.  Surely no news can't be good news. :)


--
Regards,
Corey

Re: [Qemu-devel] [RFC] Device sandboxing


On 12/07/2011 01:32 PM, Corey Bryant wrote:


Agreed.


* The untrusted thread would be restricted by seccomp mode 1 and
would contain the device emulation code.


I think the best strategy would allow for a device to run either in the
untrusted thread or the trusted thread. This makes performance testing a
bit easier and it also makes development a bit more natural.



When you refer to the device running in the trusted thread, are you talking
about the case where you run QEMU without sandboxing support? I think we would
ideally like to add this new support such that if it is not enabled, QEMU will
still run as a single process and decomposition wouldn't occur.


* The trusted helper thread would run beside the untrusted thread,
enabling the untrusted thread to make syscalls beyond read(),
write(), exit(), and sigreturn().


I assume you mean process, not thread BTW?



I do mean thread. When making calls on behalf of the seccomp'd thread, I think
there will be syscalls that must be called from the same address space. That's
where the the trusted helper thread would come into play.


* IPC communication mechanisms:

* An IPC mechanism will be required to enable communication between
untrusted and trusted threads.

* An IPC mechanism will also be required to enable communication
between the main QEMU process and device processes.


IPC is easy. We have tons of infrastructure in QEMU for IPC (virtio,
QMP, etc.). Please don't reinvent the wheel here.



Ok


* The communication mechanisms must provide secure communication,
be low overhead (easy to generate, parse, and validate), and must
play well with sVirt/LSMs.


I don't see how sVirt/LSM fits into this but all of these requirements
are also true for the other big untrusted thread that we interact with
(the guest itself).

My view is that we should view the untrusted thread as an extension of
the guest and that the interfaces between the trusted thread and the
untrusted thread views it simply as another machine type that presents a
different (simpler) hardware abstraction.



Yes this makes sense. I think our biggest concern with IPC is that we don't
introduce a TOCTTOU opportunity for a device to change call parameters after
they've been checked and before the calls is made on behalf of the sandboxed
thread. Shared memory that is writable by both untrusted/trusted thread could
introduce this.


This is no different than dealing with a guest.  We have to handle this with 
virtio already.





* Some thoughts for IPC mechanisms are Unix sockets, pipes, virtio,
Google Native Client's IMC, and shared memory.


The actual mechanism doesn't really matter I think, but see above comments.


* If seccomp mode 2 support becomes available, decomposition of device
emulation into untrusted/trusted threads may not be necessary. This
could result in improved performance (no IPC overhead between trusted
and untrusted thread) and reduced complexity (no need for trusted
helper thread).


If mode 2 is the Right Answer, then we shouldn't wait for it to become
available. We should make it available by pushing it into the kernel.

If we all agree that if mode 2 existed, it's what we would use, then
that we have the answer to this discussion and we know what we need to
go off and do.



That would seem like the logical approach. I think there may be new mode 2
patches coming soon so we can see how they go over.


I'd like to see what the whitelist would need to be for something like QEMU in 
mode 2.  My biggest concern is that the whitelist would need to be so large that 
the practical security what's all that much improved.


Regards,

Anthony Liguori

Re: [Qemu-devel] [RFC] Device sandboxing




On 12/07/2011 01:48 PM, Anthony Liguori wrote:

On 12/07/2011 12:25 PM, Corey Bryant wrote:

A group of us are starting to work on sandboxing QEMU device emulation
code. We're just getting started investigating various approaches, and
want to engage the community to gather input.

Following are the design points that we are currently considering:


To be perfectly honest, I think prototyping and measuring performance is
going to be the only way to figure out the right approach here. Here are
some thoughts on the various approaches.



* Decompose QEMU into multiple processes:

* This could be done such that QEMU devices execute in separate
processes based on device type, e.g. all block devices in one
process and all network devices in a second process. Another
alternative is executing a separate process per device.


I don't think that a HIRD of QEMU-replacing daemons is the best approach
to this problem. While I appreciate the academic attraction to such a
proposal, I think practical experience tells us that this isn't the
easiest type of system to get right.



Thanks for the input.

The idea would be to fork() the processes internally, if that is the 
concern.  They wouldn't have to be started separately by the user.



* Decomposition would not only afford a level of security inherent
in process separation, it would also allow development of stricter
sVirt/SELinux policy for the decomposed QEMU processes (e.g. a
block device specific policy). This would enable a true sandbox
with layers of defense.

* Decompose the device emulation process further into an untrusted and
trusted thread:


I think this general approach is the most rationale place to start.



Agreed.


* The untrusted thread would be restricted by seccomp mode 1 and
would contain the device emulation code.


I think the best strategy would allow for a device to run either in the
untrusted thread or the trusted thread. This makes performance testing a
bit easier and it also makes development a bit more natural.



When you refer to the device running in the trusted thread, are you 
talking about the case where you run QEMU without sandboxing support?  I 
think we would ideally like to add this new support such that if it is 
not enabled, QEMU will still run as a single process and decomposition 
wouldn't occur.



* The trusted helper thread would run beside the untrusted thread,
enabling the untrusted thread to make syscalls beyond read(),
write(), exit(), and sigreturn().


I assume you mean process, not thread BTW?



I do mean thread.  When making calls on behalf of the seccomp'd thread, 
I think there will be syscalls that must be called from the same address 
space.  That's where the the trusted helper thread would come into play.



* IPC communication mechanisms:

* An IPC mechanism will be required to enable communication between
untrusted and trusted threads.

* An IPC mechanism will also be required to enable communication
between the main QEMU process and device processes.


IPC is easy. We have tons of infrastructure in QEMU for IPC (virtio,
QMP, etc.). Please don't reinvent the wheel here.



Ok


* The communication mechanisms must provide secure communication,
be low overhead (easy to generate, parse, and validate), and must
play well with sVirt/LSMs.


I don't see how sVirt/LSM fits into this but all of these requirements
are also true for the other big untrusted thread that we interact with
(the guest itself).

My view is that we should view the untrusted thread as an extension of
the guest and that the interfaces between the trusted thread and the
untrusted thread views it simply as another machine type that presents a
different (simpler) hardware abstraction.



Yes this makes sense.  I think our biggest concern with IPC is that we 
don't introduce a TOCTTOU opportunity for a device to change call 
parameters after they've been checked and before the calls is made on 
behalf of the sandboxed thread.  Shared memory that is writable by both 
untrusted/trusted thread could introduce this.



* Some thoughts for IPC mechanisms are Unix sockets, pipes, virtio,
Google Native Client's IMC, and shared memory.


The actual mechanism doesn't really matter I think, but see above comments.


* If seccomp mode 2 support becomes available, decomposition of device
emulation into untrusted/trusted threads may not be necessary. This
could result in improved performance (no IPC overhead between trusted
and untrusted thread) and reduced complexity (no need for trusted
helper thread).


If mode 2 is the Right Answer, then we shouldn't wait for it to become
available. We should make it available by pushing it into the kernel.

If we all agree that if mode 2 existed, it's what we would use, then
that we have the answer to this discussion and we know what we need to
go off and do.



That would seem like the logical approach.  I think there may be new 
mode 2 patches coming soon so we can see how they go over.



* Execution of QEMU with t

Re: [Qemu-devel] vfio / iommu domain attributes

2011-12-07 Thread Stuart Yoder

On Wed, Dec 7, 2011 at 10:38 AM, Joerg Roedel  wrote:
> On Wed, Dec 07, 2011 at 09:54:39AM -0600, Stuart Yoder wrote:
>> Alex, Alexey I'm wondering if you've had any new thoughts on this over
>> the last week.
>>
>> For Freescale, our iommu domain attributes would look something like:
>>     -domain iova base address
>>     -domain iova window size
>
> I agree with that.
>
>>     -domain enable/disable
>>     -number of subwindows
>>     -operation mapping table index
>>     -stash destination CPU
>>     -stash target (cache– L1, L2, L3)
>
> Why does the user of the IOMMU-API need to have control over these
> things?

Our IOMMU complicates things in that it is used for more than just
memory protection
and address translation.  It has a concept of operation translation as well.
Some devices could do a 'write' transaction that when passing through the
iommu gets translated to a a 'write-with-stash'.   Stashed transactions
get pushed directly into some cache.

It's the entity creating and setting up the domain that will have the knowledge
of what cache is to be stashed to.Right now software that uses stashing
is pinned to a CPU, but if in the future it's possible that we'll want to
work without pinning and may need to update stashing attributes on the
fly.

The overall iova window for the domain can be divided into a configurable
number of subwindows (a power of 2, up to 256), which means we can have
a contiguous iova region backed by up to 256 physically dis-contiguous
subwindows.The creator of the iommu domain is in the best position
to know how many subwindows are needed (the fewer the better for
performance reasons).

So, in short, the above list of attributes are the attributes of our
iommu hardware
and  the knowlege of how they should be set is with the domain creator.

>> These are all things that need to be set by the creator of the domain.
>>
>> Since the domain attributes are going to be so different for each platform 
>> does
>> it make sense to define a new iommu_ops call back that just takes a void 
>> pointer
>> that can be implemented in a platform specific way?   For example:
>>
>>     struct iommu_ops {
>>         [cut]
>>         int (*domain_set_attrs)(struct iommu_domain *domain,
>>                               void *attrs);
>>         int (*domain_get_attrs)(struct iommu_domain *domain,
>>                               void *attrs);
>>     }
>
> A void pointer is certainly the worst choice for an interface. I think
> it is better to have at least a set of common attributes. Somthing like
> this:
>
> iommu_domain_set_attr(struct iommu_domain *domain, enum attr_type, void *data)
> iommu_domain_get_attr(struct iommu_domain *domain, enum attr_type, void *data)

That would be fine.

Stuart

Re: [Qemu-devel] [PATCH] configure: don't try to compile against known broken curses.

2011-12-07 Thread andrzej zaborowski

On 7 December 2011 19:57, Stefan Weil  wrote:
> Am 07.12.2011 08:47, schrieb Andrzej Zaborowski:
>> +#ifdef NCURSES_VERSION
>> +# if NCURSES_VERSION_PATCH < 20040117
>> +# error Old ncurses contain dangerous typedefs, break qemu build (and are
>> old)
>> +# endif
>> +#endif
>> int main(void) { resize_term(0, 0); return curses_version(); }
>> EOF
>> for curses_lib in $curses_list; do
>
>
> Is NCURSES_VERSION_PATCH always defined when NCURSES_VERSION is?

I'm not sure, will try to find out.  If it isn't then we should check
that NCURSES_VERSION_MINOR < 4 perhaps.

The intent of checking defined(NCURSES_VERSION) is to detect ncurses
because qemu should also build with other implementations of curses
(in theory).

Cheers

Re: [Qemu-devel] [PATCH] configure: don't try to compile against known broken curses.


Am 07.12.2011 08:47, schrieb Andrzej Zaborowski:

This should resolve a problem noted by Caraman Mihai Claudiu.

Signed-off-by: Andrzej Zaborowski 
---
configure | 5 +
1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/configure b/configure
index 61c43b9..678b982 100755
--- a/configure
+++ b/configure
@@ -1846,6 +1846,11 @@ if test "$curses" != "no" ; then
#ifdef __OpenBSD__
#define resize_term resizeterm
#endif
+#ifdef NCURSES_VERSION
+# if NCURSES_VERSION_PATCH < 20040117
+# error Old ncurses contain dangerous typedefs, break qemu build (and 
are old)

+# endif
+#endif
int main(void) { resize_term(0, 0); return curses_version(); }
EOF
for curses_lib in $curses_list; do


Is NCURSES_VERSION_PATCH always defined when NCURSES_VERSION is?
If not, code like the following is more robust:

#if !defined(NCURSES_VERSION_PATCH) || NCURSES_VERSION_PATCH < 20040117
# error Old ncurses contain dangerous typedefs, break qemu build (and 
are old)

#endif

Otherwise the patch is good.

Reviewed-by: Stefan Weil

Re: [Qemu-devel] [PATCH 2/2] configure: remove --enable-cocoa (default), add --disable-cocoa.

On 7 December 2011 07:47, Andrzej Zaborowski  wrote:
> Cocoa can only be enabled on Darwin, and is enabled by default too,
> making --enable-cocoa redundant, with no way to disable Cocoa.  It
> also interfered with SDL support in a way that was dependent on
> the order of commandline switches.

For these --enable/disable pairs I quite like the pattern where
 * default is "probe and use if available"
 * --enable-foo is "use foo, fail configure if not available"
 * --disable-foo is "don't use foo"

(--enable-sdl/--disable-sdl work like this, for instance).

[cf the comment in configure at line 100.]

Incidentally, is it "Cocoa" or "COCOA" ?

-- PMM

Re: [Qemu-devel] [PATCH v2 0/5] backdoor: lightweight guest-to-QEMU backdoor channel


On 12/07/2011 12:51 PM, Peter Maydell wrote:

2011/12/7 Lluís Vilanova:

Anthony Liguori writes:
[...]

Why should this analyzer live outside of QEMU in the first place?  I fail to see
the rationale for that other than not wanting to do the work of making it
suitable for upstream consumption.


For the same reason that SystemTap lets you add user-provided code into the
trace hooks, you never know what the user actually wants. The difference is that
when dealing with traces that require a high bandwidth, different people may be
interested on analyzing different events and under different conditions, and
having a JIT in QEMU comes in handy to optimize the traces. This includes the
generation of extra tracing code at translation time (e.g., I generate checks in
TCG to establish when I want to trace a specific event, and someone else may
just want to increment a counter using TCG code).

Guest trace instrumentation turns QEMU into a highly-performant tool for dynamic
binary instrumentation, with all the benefits that stem from QEMU (support for a
myriad of target architectures, as well as support for both full-system and
user-level applications).


I think this *is* useful. However I also think that it *is* effectively
defining an API for people writing this hook code, and as such we ought
to do it properly if we're going to do it. (ie nail down what we are
providing for hook authors and don't let them grub around in the internals
otherwise).


I strongly suspect that you could define interfaces in QEMU such that you could 
do most of what you want without needing to link any code against QEMU itself.


This is probably a case where LUA integration might make a lot of sense.

Regards,

Anthony Liguori



-- PMM

Re: [Qemu-devel] [PATCH v2 0/5] backdoor: lightweight guest-to-QEMU backdoor channel

2011/12/7 Lluís Vilanova :
> Anthony Liguori writes:
> [...]
>> Why should this analyzer live outside of QEMU in the first place?  I fail to 
>> see
>> the rationale for that other than not wanting to do the work of making it
>> suitable for upstream consumption.
>
> For the same reason that SystemTap lets you add user-provided code into the
> trace hooks, you never know what the user actually wants. The difference is 
> that
> when dealing with traces that require a high bandwidth, different people may 
> be
> interested on analyzing different events and under different conditions, and
> having a JIT in QEMU comes in handy to optimize the traces. This includes the
> generation of extra tracing code at translation time (e.g., I generate checks 
> in
> TCG to establish when I want to trace a specific event, and someone else may
> just want to increment a counter using TCG code).
>
> Guest trace instrumentation turns QEMU into a highly-performant tool for 
> dynamic
> binary instrumentation, with all the benefits that stem from QEMU (support 
> for a
> myriad of target architectures, as well as support for both full-system and
> user-level applications).

I think this *is* useful. However I also think that it *is* effectively
defining an API for people writing this hook code, and as such we ought
to do it properly if we're going to do it. (ie nail down what we are
providing for hook authors and don't let them grub around in the internals
otherwise).

-- PMM

Re: [Qemu-devel] [RFC] Device sandboxing


On 12/07/2011 12:25 PM, Corey Bryant wrote:

A group of us are starting to work on sandboxing QEMU device emulation
code. We're just getting started investigating various approaches, and
want to engage the community to gather input.

Following are the design points that we are currently considering:


To be perfectly honest, I think prototyping and measuring performance is going 
to be the only way to figure out the right approach here.  Here are some 
thoughts on the various approaches.




* Decompose QEMU into multiple processes:

* This could be done such that QEMU devices execute in separate
processes based on device type, e.g. all block devices in one
process and all network devices in a second process. Another
alternative is executing a separate process per device.


I don't think that a HIRD of QEMU-replacing daemons is the best approach to this 
problem.  While I appreciate the academic attraction to such a proposal, I think 
practical experience tells us that this isn't the easiest type of system to get 
right.



* Decomposition would not only afford a level of security inherent
in process separation, it would also allow development of stricter
sVirt/SELinux policy for the decomposed QEMU processes (e.g. a
block device specific policy). This would enable a true sandbox
with layers of defense.

* Decompose the device emulation process further into an untrusted and
trusted thread:


I think this general approach is the most rationale place to start.


* The untrusted thread would be restricted by seccomp mode 1 and
would contain the device emulation code.


I think the best strategy would allow for a device to run either in the 
untrusted thread or the trusted thread.  This makes performance testing a bit 
easier and it also makes development a bit more natural.



* The trusted helper thread would run beside the untrusted thread,
enabling the untrusted thread to make syscalls beyond read(),
write(), exit(), and sigreturn().


I assume you mean process, not thread BTW?


* IPC communication mechanisms:

* An IPC mechanism will be required to enable communication between
untrusted and trusted threads.

* An IPC mechanism will also be required to enable communication
between the main QEMU process and device processes.


IPC is easy.  We have tons of infrastructure in QEMU for IPC (virtio, QMP, 
etc.).  Please don't reinvent the wheel here.



* The communication mechanisms must provide secure communication,
be low overhead (easy to generate, parse, and validate), and must
play well with sVirt/LSMs.


I don't see how sVirt/LSM fits into this but all of these requirements are also 
true for the other big untrusted thread that we interact with (the guest itself).


My view is that we should view the untrusted thread as an extension of the guest 
and that the interfaces between the trusted thread and the untrusted thread 
views it simply as another machine type that presents a different (simpler) 
hardware abstraction.



* Some thoughts for IPC mechanisms are Unix sockets, pipes, virtio,
Google Native Client's IMC, and shared memory.


The actual mechanism doesn't really matter I think, but see above comments.


* If seccomp mode 2 support becomes available, decomposition of device
emulation into untrusted/trusted threads may not be necessary. This
could result in improved performance (no IPC overhead between trusted
and untrusted thread) and reduced complexity (no need for trusted
helper thread).


If mode 2 is the Right Answer, then we shouldn't wait for it to become 
available.  We should make it available by pushing it into the kernel.


If we all agree that if mode 2 existed, it's what we would use, then that we 
have the answer to this discussion and we know what we need to go off and do.



* Execution of QEMU with the sandboxed device support should be an
optional run-time specification.


Ack with a small exception.  If we can demonstrate that sandboxing has an 
acceptable performance overhead, then we should do it unconditionally to reduce 
our overall test matrix.  It's unclear that that's obtainable though.



* We will be focusing on legacy devices first, both for performance and
risk reasons.

Once we settle on a direction, we will develop a proof of concept to
share with the community.


Proof of concepts are the only way to settle on direction.  Code speaks louder 
than anything else.




We appreciate your input.

Regards,

Ashley Lai
Corey Bryant
Eduardo Otubo
Michael Halcrow
Paul Moore
Richa Marwaha


In the future, I would suggest beginning these type of discussions on the list 
to start with.  Otherwise valuable in information (including discussion and 
debate on directions) are not available to the greater community at large.


Not a big deal in this case, but I want to be on the record here about this.  I 
would have greatly preferred this whole effort start out on qemu-devel from day one.


Regards,

Anthony Liguori

[Qemu-devel] [PATCH 2/2] configure: remove --enable-cocoa (default), add --disable-cocoa.

2011-12-07 Thread Andrzej Zaborowski

Cocoa can only be enabled on Darwin, and is enabled by default too,
making --enable-cocoa redundant, with no way to disable Cocoa.  It
also interfered with SDL support in a way that was dependent on
the order of commandline switches.

Signed-off-by: Andrzej Zaborowski 
---
Cocoa support seems to be broken at the moment, at least on some
MacOS X versions.  But qemu builds and runs with SDL.

 configure |7 ++-
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/configure b/configure
index fb15bc6..c5d07af 100755
--- a/configure
+++ b/configure
@@ -674,10 +674,7 @@ for opt do
   ;;
   --enable-profiler) profiler="yes"
   ;;
-  --enable-cocoa)
-  cocoa="yes" ;
-  sdl="no" ;
-  audio_drv_list="coreaudio `echo $audio_drv_list | sed s,coreaudio,,g`"
+  --disable-cocoa) cocoa="no"
   ;;
   --disable-system) softmmu="no"
   ;;
@@ -986,7 +983,7 @@ echo "  --disable-sdldisable SDL"
 echo "  --enable-sdl enable SDL"
 echo "  --disable-vncdisable VNC"
 echo "  --enable-vnc enable VNC"
-echo "  --enable-cocoa   enable COCOA (Mac OS X only)"
+echo "  --disable-cocoa  disable COCOA (Mac OS X only)"
 echo "  --audio-drv-list=LISTset audio drivers list:"
 echo "   Available drivers: $audio_possible_drivers"
 echo "  --audio-card-list=LIST   set list of emulated audio cards 
[$audio_card_list]"
-- 
1.7.4.4

[Qemu-devel] [PATCH 1/2] configure: don't check for Cocoa when detecting SDL.

2011-12-07 Thread Andrzej Zaborowski

The SDL check is supposed to set $sdl to "yes" or "no", but with that
check it leaves $sdl unset on darwin, unless --enable-cocoa was
specified (which is not needed to enable cocoa anyway).

Signed-off-by: Andrzej Zaborowski 
---
 configure |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/configure b/configure
index 678b982..fb15bc6 100755
--- a/configure
+++ b/configure
@@ -1492,9 +1492,7 @@ EOF
 if test "$_sdlversion" -lt 121 ; then
   sdl_too_old=yes
 else
-  if test "$cocoa" = "no" ; then
-sdl=yes
-  fi
+  sdl=yes
 fi
 
 # static link with sdl ? (note: sdl.pc's --static --libs is broken)
-- 
1.7.4.4

[Qemu-devel] [PATCH] configure: don't try to compile against known broken curses.

2011-12-07 Thread Andrzej Zaborowski

This should resolve a problem noted by Caraman Mihai Claudiu.

Signed-off-by: Andrzej Zaborowski 
---
 configure |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/configure b/configure
index 61c43b9..678b982 100755
--- a/configure
+++ b/configure
@@ -1846,6 +1846,11 @@ if test "$curses" != "no" ; then
 #ifdef __OpenBSD__
 #define resize_term resizeterm
 #endif
+#ifdef NCURSES_VERSION
+# if NCURSES_VERSION_PATCH < 20040117
+#  error Old ncurses contain dangerous typedefs, break qemu build (and are old)
+# endif
+#endif
 int main(void) { resize_term(0, 0); return curses_version(); }
 EOF
   for curses_lib in $curses_list; do
-- 
1.7.4.4

[Qemu-devel] [PULL 00/35] VMState port of all cpus

2011-12-07 Thread Juan Quintela

The following changes since commit 217bfb445b54db618a30f3a39170bebd9fd9dbf2:

  hw/arm_gic.c: Ignore attempts to complete nonexistent IRQs (2011-12-05 
21:38:56 +0100)

are available in the git repository at:
  ssh://repo.or.cz/srv/git/qemu/quintela.git vmstate-cpus-v4-for-anthony

[v4]
- change comment: 
* Bassed on qemu-file code done by:
  to
   * Based on savevm serialization code by:
- dropped sparc/ppc copyright notices.  After two tries, I haven't got
  something that Blae is happy with.


[v3]
- rebase to top
- fix sparc/arm/i386 changes in upstream
- all reviews were positive, Anthony, please pull

[v2] Changes since v1

- preserve arm comment that was missing (pbrook)
- add copyright notice to the files that were empty
- new patches:
  * fix formating for i386
  * remove unneeded includes
  * rename machine.c to vmstate.c

Later, Juan.

[v1]

This series port all cpus to use vmstate.
- 1st patch is a fix of vmstate.
- I discussed the arm changes over irc with Peter, he agreed that some
  simplification could be good, but he didn't saw the patches O:-)
- mips: no pci chipset has been ported, so migration don't work there.
  I have embedded a couple of structs to improve vmstate checking.  Notice
  that they were always allocated, so there shouldn't be any problem.
- sparc: I changed the format a little bit to be able to use normal arrays.
- sparc: If we always send the whole register windows, we don't need
  VMSTATE_VARRAY_MULTIPLY.  As that array is quite big (520 elements), I am not
  sure what is best.
- cpsr_vmstate on arm: I am not sure if I could "abuse" uncached_cpsr for that
  purpose?

I have only tested on x86, for the rest, I double checked, but it is
possible that I missed something.  I expect all patches to be
integrated by Anthony in one go.  Architecture maintainers are CC'd
for an ACK/NACK/comments.

Please, review.

PD. Is there an easy way of creating this "CC" list of mail addresses,
or the only way is to edit comments and write it by hand as I did?


Juan Quintela (35):
  vmstate: Fix VMSTATE_VARRAY_UINT32
  vmstate: Simplify test for CPU_SAVE_VERSION
  vmstate: make all architectures export a way to migrate cpu's
  vmstate: unicore32 don't support cpu migration
  vmstate: use new cpu style for x86
  vmstate: use new style for lm32 cpus
  vmstate: make microblaze cpus not migrateable
  vmstate: port cris cpu to vmstate
  vmstate: machine.c is only compiled for !CONFIG_USER_ONLY
  vmstate: introduce float32 arrays
  vmstate: introduce float64 arrays
  vmstate: introduce CPU_DoubleU arrays
  vmstate: Introduce VMSTATE_STRUCT_VARRAY_INT32_TEST
  vmstate: port ppc cpu
  vmstate: introduce VMSTATE_VARRAY_MULTIPLY
  vmstate: define vmstate_info_uinttls
  vmstate: port sparc cpu
  vmstate: make incompatible change for sparc
  mips_fulong2e: cpu vmstate already registered in cpu_exec_init
  mips: make mvp an embedded struct instead of a pointer
  mips: make tlb an embedded struct instead of a pointer
  mips: bump migration version to 4
  vmstate: port mips cpu
  arm: save always 32 fpu registers
  vmstate: port arm cpu
  vmstate: all cpus converted
  vmstate: fix vmstate formating for i386
  vmstate: remove unneeded includes from target-*/machine.c
  vmstate: rename machine.c to vmstate-cpu.c
  vmstate: Add copyright info for alpha processor
  vmstate: Add copyright info for lm32 processor
  vmstate: Add copyright info for cris processor
  vmstate: Add copyright info for arm processor
  vmstate: Add copyright info for i386 processor
  vmstate: Add copyright info for mips processor

 Makefile.target|3 +-
 exec.c |7 +-
 hw/hw.h|   62 +-
 hw/mips_fulong2e.c |1 -
 hw/mips_malta.c|4 +-
 hw/mips_timer.c|2 +-
 hw/sun4u.c |   20 --
 qemu-common.h  |4 -
 savevm.c   |   92 +
 target-alpha/{machine.c => vmstate-cpu.c}  |   28 ++-
 target-arm/cpu.h   |5 +-
 target-arm/machine.c   |  225 
 target-arm/vmstate-cpu.c   |  188 +
 target-cris/cpu.h  |   13 +-
 target-cris/machine.c  |   90 
 target-cris/vmstate-cpu.c  |   74 +++
 target-i386/cpu.h  |2 -
 target-i386/{machine.c => vmstate-cpu.c}   |   64 ---
 target-lm32/cpu.h  |2 -
 target-lm32/{machine.c => vmstate-cpu.c}   |   32 ++--
 target-m68k/vmstate-cpu.c  |   21 ++
 target-microblaze/cpu.h|2 -
 target-microblaze/machine.c

Re: [Qemu-devel] [PATCH v2 0/5] backdoor: lightweight guest-to-QEMU backdoor channel

Anthony Liguori writes:
[...]
> Why should this analyzer live outside of QEMU in the first place?  I fail to 
> see
> the rationale for that other than not wanting to do the work of making it
> suitable for upstream consumption.

For the same reason that SystemTap lets you add user-provided code into the
trace hooks, you never know what the user actually wants. The difference is that
when dealing with traces that require a high bandwidth, different people may be
interested on analyzing different events and under different conditions, and
having a JIT in QEMU comes in handy to optimize the traces. This includes the
generation of extra tracing code at translation time (e.g., I generate checks in
TCG to establish when I want to trace a specific event, and someone else may
just want to increment a counter using TCG code).

Guest trace instrumentation turns QEMU into a highly-performant tool for dynamic
binary instrumentation, with all the benefits that stem from QEMU (support for a
myriad of target architectures, as well as support for both full-system and
user-level applications).

I can understand if someone thinks this is not a desired feature for QEMU [1],
but not including it into upstream will just turn it into a patch-based feature
that will rapidly fail to apply on top of upstream QEMU.

[1] Of course, I will still contribute support for guest tracing, as well as the
guest events I added.

Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth

[Qemu-devel] [RFC] Device sandboxing


A group of us are starting to work on sandboxing QEMU device emulation
code.  We're just getting started investigating various approaches, and
want to engage the community to gather input.

Following are the design points that we are currently considering:

* Decompose QEMU into multiple processes:

* This could be done such that QEMU devices execute in separate
  processes based on device type, e.g. all block devices in one
  process and all network devices in a second process.  Another
  alternative is executing a separate process per device.

* Decomposition would not only afford a level of security inherent
  in process separation, it would also allow development of stricter
  sVirt/SELinux policy for the decomposed QEMU processes (e.g. a
  block device specific policy).  This would enable a true sandbox
  with layers of defense.

* Decompose the device emulation process further into an untrusted and
  trusted thread:

* The untrusted thread would be restricted by seccomp mode 1 and
  would contain the device emulation code.

* The trusted helper thread would run beside the untrusted thread,
  enabling the untrusted thread to make syscalls beyond read(),
  write(), exit(), and sigreturn().

* IPC communication mechanisms:

* An IPC mechanism will be required to enable communication between
  untrusted and trusted threads.

* An IPC mechanism will also be required to enable communication
  between the main QEMU process and device processes.

* The communication mechanisms must provide secure communication,
  be low overhead (easy to generate, parse, and validate), and must
  play well with sVirt/LSMs.

* Some thoughts for IPC mechanisms are Unix sockets, pipes, virtio,
  Google Native Client's IMC, and shared memory.

* If seccomp mode 2 support becomes available, decomposition of device
  emulation into untrusted/trusted threads may not be necessary.  This
  could result in improved performance (no IPC overhead between trusted
  and untrusted thread) and reduced complexity (no need for trusted
  helper thread).

* Execution of QEMU with the sandboxed device support should be an
  optional run-time specification.

* We will be focusing on legacy devices first, both for performance and
  risk reasons.

Once we settle on a direction, we will develop a proof of concept to
share with the community.

We appreciate your input.

Regards,

Ashley Lai
Corey Bryant
Eduardo Otubo
Michael Halcrow
Paul Moore
Richa Marwaha

Re: [Qemu-devel] ncurses 5.3 conflicts with latest qemu

2011-12-07 Thread andrzej zaborowski

On 5 December 2011 22:44, Stefan Weil  wrote:
> Am 05.12.2011 20:13, schrieb andrzej zaborowski:
>
>> Hi,
>>
>> On 17 November 2011 10:06, Caraman Mihai Claudiu-B02008
>>  wrote:
>>>
>>> A recent patch in qemu conflicts with old ncurses libraries (version
>>> 5.3). You will see this error cause by bool type redefinition in curses.h
>>> (with CONFIG_CURSES configured by default):
>>>
>>> console.c: In function 'text_console_init':
>>> console.c:1550:23: error: assignment from incompatible pointer type
>>>
>>> the qemu patch exposing this problem is:
>>>
>>> curses: fix garbling when chtype != long
>>> author  Devin J. Pohly 
>>>       Wed, 7 Sep 2011 19:44:36 + (15:44 -0400)
>>> committer       Anthony Liguori 
>>>       Fri, 9 Sep 2011 17:58:16 + (12:58 -0500)
>>> commit  df00bed0fa30a6f5712456e7add783e470c534c9
>>>
>>> The problem seems to be fixed in newer versions of ncurses (5.7 and
>>> above). I just looked over the sources, so better if someone can confirm
>>> this.
>>> Here is a qemu patch that solve the conflict with old ncurses:
>>>
>>>
>>> Signed-off-by: Mihai Caraman 
>>> ---
>>> Fix compile errors with old ncurses libraries (version 5.3) caused by
>>> bool type redefinition.
>>>
>>>  qemu-common.h |    3 +++
>>>  console.h     |    1 -
>>>
>>>  2 files changed, 3 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/qemu-common.h b/qemu-common.h index 5e87bdf..9ac15ba 100644
>>> --- a/qemu-common.h
>>> +++ b/qemu-common.h
>>> @@ -23,6 +23,9 @@ typedef struct Monitor Monitor;  #include 
>>>  #include   #include 
>>> +#ifdef CONFIG_CURSES
>>> +#include 
>>> +#endif
>>>  #include 
>>>  #include 
>>>  #include 
>>> diff --git a/console.h b/console.h
>>> index 9c1487e..3327c43 100644
>>> --- a/console.h
>>> +++ b/console.h
>>> @@ -329,7 +329,6 @@ static inline int ds_get_bytes_per_pixel(DisplayState
>>> *ds)  }
>>>
>>>  #ifdef CONFIG_CURSES
>>> -#include 
>>>  typedef chtype console_ch_t;
>>>  #else
>>>  typedef unsigned long console_ch_t;
>>
>>
>> While very fragile it looks like the easiest fix, so I'd like to push
>> this if there's no objection. It builds fine for me.
>>
>> The solution would probably be to include  only from inside
>> ui/curses.c, and define chtype at configure time, or guess it.
>>
>> Cheers
>
>
> Even Debian Lenny already has ncurses 5.7 and does not
> need this patch.
>
> Is there a good reason why very old buggy versions of ncurses
> should be supported? Redefining bool is a bug!
>
> When I had the same problem some time ago, I fixed my local
> installation...

Makes sense.  Turns out that 5.3 was released in 2002 and from the
changelog it looks like the problem was fixed on 2004-01-17 or
earlier.  I'll send a patch for configure to refuse ncurses versions
older than that.

Cheers

Re: [Qemu-devel] [PATCH v2 0/5] backdoor: lightweight guest-to-QEMU backdoor channel


On 12/07/2011 10:59 AM, Lluís Vilanova wrote:

Anthony Liguori writes:

Well, both backdoor and trace instrumentation are implemented using the same
approach (a static library selected at compile-time). The user sets which events
to instrument in the "trace-events" file. This has the effect that the tracetool
script will not generate the tracing functions for those events, and instead the
user must provide the implementation for these events on a static library
provided at compile time.



I don't think this is the right approach to tracing.  What not just use
something like SystemTap to implement logic around tracing?



http://blog.vmsplice.net/2011/03/how-to-write-trace-analysis-scripts-for.html


This would only allow me to post-process all the traces, with no possible
interaction with qemy from the analysis library.

Compiling the analysis library into qemu itself lets me control when TCG code to
actually trace an instruction is generated (with guest code, trace events are
instrumented at both code translation and code execution time):

translate:
 gen_helper_trace_mem(...)
   -- trace instrumentation -->   ... if (...) 
gen_helper_trace_mem_backend(...); ...
   -- otherwise -->   gen_helper_trace_mem_backend(...);

execute:
 helper_trace_mem(...)
   trace_mem(...)
 -- trace instrumentation -->  ... whatever ...
 -- otherwise -->  trace_fetch_backend(...);

One could argue that gen_helper_trace_mem can just check some in-qemu central
structure to see if it must generate the call to helper_trace_mem, but then the
best you can do is an all-or-nothing on a per-event basis.

You must have in mind that we're talking about an extreme tracing bandwidth
here, so the analyzer is willing to filter-out as many events as soon as
possible (e.g., just tracing a specific set of IPs can be optimized at
translation time, and tracing instructions using specific registers can be
optimized by generating extra checks with TCG). This requires some sort of
in-line communication between qemu and the analyzer.


Why should this analyzer live outside of QEMU in the first place?  I fail to see 
the rationale for that other than not wanting to do the work of making it 
suitable for upstream consumption.


Regards,

Anthony Liguori

Re: [Qemu-devel] vfio / iommu domain attributes

2011-12-07 Thread Joerg Roedel

On Wed, Dec 07, 2011 at 09:54:39AM -0600, Stuart Yoder wrote:
> Alex, Alexey I'm wondering if you've had any new thoughts on this over
> the last week.
> 
> For Freescale, our iommu domain attributes would look something like:
> -domain iova base address
> -domain iova window size

I agree with that.

> -domain enable/disable
> -number of subwindows
> -operation mapping table index
> -stash destination CPU
> -stash target (cache– L1, L2, L3)

Why does the user of the IOMMU-API need to have control over these
things?

> These are all things that need to be set by the creator of the domain.
> 
> Since the domain attributes are going to be so different for each platform 
> does
> it make sense to define a new iommu_ops call back that just takes a void 
> pointer
> that can be implemented in a platform specific way?   For example:
> 
> struct iommu_ops {
> [cut]
> int (*domain_set_attrs)(struct iommu_domain *domain,
>   void *attrs);
> int (*domain_get_attrs)(struct iommu_domain *domain,
>   void *attrs);
> }

A void pointer is certainly the worst choice for an interface. I think
it is better to have at least a set of common attributes. Somthing like
this:

iommu_domain_set_attr(struct iommu_domain *domain, enum attr_type, void *data)
iommu_domain_get_attr(struct iommu_domain *domain, enum attr_type, void *data)

The iova base/size options make sense for more IOMMUs than just
Freescale. For example it would allow to manage GART-like IOMMUs with
the IOMMU-API too.

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

Re: [Qemu-devel] [PATCH v2 0/5] backdoor: lightweight guest-to-QEMU backdoor channel

Anthony Liguori writes:
>> Well, both backdoor and trace instrumentation are implemented using the same
>> approach (a static library selected at compile-time). The user sets which 
>> events
>> to instrument in the "trace-events" file. This has the effect that the 
>> tracetool
>> script will not generate the tracing functions for those events, and instead 
>> the
>> user must provide the implementation for these events on a static library
>> provided at compile time.

> I don't think this is the right approach to tracing.  What not just use
> something like SystemTap to implement logic around tracing?

> http://blog.vmsplice.net/2011/03/how-to-write-trace-analysis-scripts-for.html

This would only allow me to post-process all the traces, with no possible
interaction with qemy from the analysis library.

Compiling the analysis library into qemu itself lets me control when TCG code to
actually trace an instruction is generated (with guest code, trace events are
instrumented at both code translation and code execution time):

translate:
gen_helper_trace_mem(...)
  -- trace instrumentation -->  ... if (...) 
gen_helper_trace_mem_backend(...); ...
  -- otherwise -->  gen_helper_trace_mem_backend(...);

execute:
helper_trace_mem(...)
  trace_mem(...)
-- trace instrumentation --> ... whatever ...
-- otherwise --> trace_fetch_backend(...);

One could argue that gen_helper_trace_mem can just check some in-qemu central
structure to see if it must generate the call to helper_trace_mem, but then the
best you can do is an all-or-nothing on a per-event basis.

You must have in mind that we're talking about an extreme tracing bandwidth
here, so the analyzer is willing to filter-out as many events as soon as
possible (e.g., just tracing a specific set of IPs can be optimized at
translation time, and tracing instructions using specific registers can be
optimized by generating extra checks with TCG). This requires some sort of
in-line communication between qemu and the analyzer.

Still, I tried to build this into perfectly separate and independent features:

  * Support for tracing guest code through TCG.
  * Support for different sets of enabled guest tracing events on a per-vCPU
basis.
  * Support for instrumenting tracing events (including those for guest code).

>> The analyses one might want to perform are ad-hoc in nature. For example, 
>> some
>> recent work sent here did coverage analyses through ad-hoc modifications on 
>> the
>> core of qemu, I want to generate detailed traces of the guest code that is
>> executed (although only when certain conditions in the guest are met). There 
>> is
>> no silver bullet here, but the development of analyzers can be simplified by
>> letting users perform static instrumentation of the tracing events, which 
>> act as
>> regular tracing events otherwise.
>> 
>> In this environment, the backdoor is just a handy channel to let the guest
>> communicate with code on that analyzer library, like signalling semanticful
>> events of interest (e.g., I have a trivial linux kernel module that installs 
>> a
>> tracepoint callback in the process switch routine and passes that information
>> down to the analyzer library through the backdoor channel).
>> 
>> As these analyzes are ad-hoc in nature, I don't see any way other than 
>> providing
>> this extension functionality.

> But the effect is that the infrastructure is useless without out-of-tree code
> which means it's impossible to test.  Furthermore, when the ad-hoc analyzer
> library breaks because of an API change in QEMU, it puts us in a very 
> difficult
> situation.

The infrastructure itself is pretty simple (the least trivial part is code in
tracetool), and can be tested by having a testing library that can be shipped
with qemu, as well as can be used to test qemu itself.

I also think that the point of breaking third-party analyzers is less important
if the access to TCG code generation and the tracing events are enough for most
of them (the TCG API is pretty stable). Some analyzers might want to access
register values or something else, but they would still do that even without
instrumentation through ad-hoc changes in a rapidly aging version of qemu. My
only intent is to ease the effort and the performance costs of developing such
analyzers.

>> Does it make sense as a whole? I thought from previous mails that being able 
>> to
>> trace and analyze guest code was something perceived as a useful feature to 
>> have
>> in qemu.

> I think there's wide agreement that extending tracing to guests is very useful
> and something we should do.

> I don't think there's agreement that we should have a mechanism to pull in
> third-party code to implement the logic for these trace hooks within QEMU.

> The right way to do analysis is via something like SystemTap.

Well, as I tried to explain above, this comes at the expense of performance, as
the analyzer is not inlined int

Re: [Qemu-devel] [PATCH 1/2] guest agent: add RPC blacklist command-line option


On 12/07/2011 06:12 AM, Dor Laor wrote:

On 12/07/2011 12:52 PM, Daniel P. Berrange wrote:

On Wed, Dec 07, 2011 at 12:34:01PM +0200, Dor Laor wrote:

On 12/07/2011 06:03 AM, Michael Roth wrote:

This adds a command-line option, -b/--blacklist, that accepts a
comma-seperated list of RPCs to disable, or prints a list of
available RPCs if passed "?".

In consequence this also adds general blacklisting and RPC listing
facilities to the new QMP dispatch/registry facilities, should the
QMP monitor ever have a need for such a thing.


Beyond run time disablement, how easy it is to compile out some of
the general commands such as exec/file-handling?

Security certifications like common criteria usually ask to compile
out anything that might tamper security.


I don't think that's really relevant/needed. As discussed on the
call yesterday, this is security theatre, because nothing can prevent
the host admin from accessing guest RAM or disk data. AFAIK the
virtualization related security certifications acknowledge this
already& don't make any claims about security of guests against
a malicious host admin. In any case, a suitable SELinux policy for
the guest agent could prevent arbitrary file/binary access via
generic 'exec' / 'file-read' commands, in a manner that is sufficient
to satisfy security certications.


I absolutely agree that the hypervisor can tweak the guest in multiple
ways. Nevertheless there are two reasons I asked it:

1. Reduce code and noise from security reviewers eyes.
We were asked to do exactly that for other qemu functionality that
is included but does not run at all. It's just makes the review
faster.


Actually removing the code, or compiling it out?

If it's a matter of compiling it out, the best solution I can think of 
is having the QAPI code generators create a #define  for each RPC, 
then wrapping the implementations inside an #ifdef . That way you 
could compile out the code by simply modifying the schema.


That said, I'd really like to avoid having distros get into the habit of 
extensively modifying their guest agent source outside of bug fixes and 
whatnot, I think it'll cause too many problems down the road. From a 
management perspective, if you're running a cloud with multiple distros, 
it'll be really difficult to account for agents that have been modified 
or crippled in various ways.


Perhaps we only need, say, shutdown, for ovirt, and compile out the 
rest, but maybe a customer wants to run their RHEL guest in home-brewed 
environment where they use qemu-ga file read/write to handle a specific 
set of guest activation procedures. Now they need a new agent package.


It's a whole lot of hassle for host/guest admins for the sake of saving 
a security reviewer a bit of investigating that'll lead right back to 
the general operating premise that you have to trust your host 
administrators before any chain of trust can be established.


At least with this interface we can provide some semblance of relief to 
users with specific security concerns, but don't have to work with 
distros to re-package agents when those concerns collide with 
requirements on the host side. We can just check to see if they disabled 
the functionality and request they re-enable due to  by updating 
their configs.




2. Every piece of code is a risk for exploit
Imagine that a bug/leak/use-after-free in the blacklist command or
the exec command on qemu exists and allows attacked to gain control
of qemu.


A host can never assume that a guest [agent] can be trusted. qemu-ga 
might've been replaced completely by a malicious guest admin, thus 
circumventing any steps a distro has taken to harden it. Fortunately a 
guest can only affect memory outside it's address space by going through 
the virtio-serial/QMP layer. So we can focus our efforts on hardening 
the transport and json parser layers, and a lot of work has gone into 
that already (placing limits on token size, recursion depth, etc). So 
that's more an issue that needs to be addressed on the qemu side, and is 
independent of any particular RPC implementation on the guest side.






Regards,
Daniel

[Qemu-devel] [PATCH] build: Cleanup qga make output

2011-12-07 Thread Adam Litke

Currently the make variable qapi-dir refers to the qapi-generated directory in
absolute terms.  This causes the harmless but ugly make output below.  By
changing this variable to the relative path the output conforms to the norm and
the build works fine.

Before patch:
  CC/home/aglitke/src/qemu/qapi-generated/qga-qapi-types.o
  CC/home/aglitke/src/qemu/qapi-generated/qga-qapi-visit.o
  CC/home/aglitke/src/qemu/qapi-generated/qga-qmp-marshal.o
After patch:
  CCqapi-generated/qga-qapi-types.o
  CCqapi-generated/qga-qapi-visit.o
  CCqapi-generated/qga-qmp-marshal.o

Signed-off-by: Adam Litke 
---
 Makefile |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Makefile b/Makefile
index 301c75e..7c93739 100644
--- a/Makefile
+++ b/Makefile
@@ -168,7 +168,7 @@ check-qjson: check-qjson.o $(qobject-obj-y) $(tools-obj-y)
 test-coroutine: test-coroutine.o qemu-timer-common.o async.o 
$(coroutine-obj-y) $(tools-obj-y)
 
 $(qapi-obj-y): $(GENERATED_HEADERS)
-qapi-dir := $(BUILD_DIR)/qapi-generated
+qapi-dir := qapi-generated
 test-visitor.o test-qmp-commands.o qemu-ga$(EXESUF): QEMU_CFLAGS += -I 
$(qapi-dir)
 qemu-ga$(EXESUF): LIBS = $(LIBS_QGA)
 
-- 
1.7.5.rc1

[Qemu-devel] Error while booting

2011-12-07 Thread sparsh mittal

I use version 0.15 of qemu with marss cycle-accurate simulator.

After making checkpoints in an image, I get this error, while trying to
boot it (i.e. not using loadvm to run checkpoint, but just starting the
image).

This kernel requires an x86-64 CPU, but only detected an i686 CPU. Unable
to boot - please use a kernel appropriate for your CPU

Interesting thing is that when I issue
-loadvm checkpointname

Then it works fine.

Can someone help in this regard?

Thanks a lot in advance.


Thanks and Regards
Sparsh Mittal

Re: [Qemu-devel] [PATCH v2 0/6] Memory API mutators

On 7 December 2011 15:56, Anthony Liguori  wrote:

> Ok, let's add a docs/vmstate.txt and add a section that says "If you use the
> following functions, you probably need to call them again in post_load" and
> put these functions on the list.

We've already got docs/migration.txt which talks about the postload hook.

-- PMM

Re: [Qemu-devel] [PATCH v2 0/6] Memory API mutators


On 12/07/2011 09:54 AM, Avi Kivity wrote:

On 12/07/2011 05:52 PM, Anthony Liguori wrote:

On 12/04/2011 12:09 PM, Avi Kivity wrote:

This patchset introduces memory_region_set_enabled() and
memory_region_set_address() to avoid the requirement on memory
routers to track the internal state of the memory API (so they know
whether they need to add or remove a region).  Instead, they can
simply copy the state of the region from the guest-exposed register
to the memory core, via the new mutator functions.


Based on previous discussions, any time these functions are used, the
caller more than likely needs to call them again in a post_load hook
during restore, correct?

It would be good to document this very clearly in the header docs for
each function.


It's a layering violation, but I'll add something.


Ok, let's add a docs/vmstate.txt and add a section that says "If you use the 
following functions, you probably need to call them again in post_load" and put 
these functions on the list.


Regards,

Anthony Liguori

Re: [Qemu-devel] [PATCH v2 0/6] Memory API mutators

2011-12-07 Thread Avi Kivity

On 12/07/2011 05:52 PM, Anthony Liguori wrote:
> On 12/04/2011 12:09 PM, Avi Kivity wrote:
>> This patchset introduces memory_region_set_enabled() and
>> memory_region_set_address() to avoid the requirement on memory
>> routers to track the internal state of the memory API (so they know
>> whether they need to add or remove a region).  Instead, they can
>> simply copy the state of the region from the guest-exposed register
>> to the memory core, via the new mutator functions.
>
> Based on previous discussions, any time these functions are used, the
> caller more than likely needs to call them again in a post_load hook
> during restore, correct?
>
> It would be good to document this very clearly in the header docs for
> each function.

It's a layering violation, but I'll add something.

-- 
error compiling committee.c: too many arguments to function

[Qemu-devel] vfio / iommu domain attributes

2011-12-07 Thread Stuart Yoder

In the vfio RFC thread there seemed to be convergence that some new
iommu_ops API is needed to set some platform specific aspects of an
iommu domain.

On Wed, Nov 30, 2011 at 10:58 AM, Alex Williamson
 wrote:
[cut]
> In that case, you should definitely be following what Alexey is thinking
> about with an iommu_setup IOMMU API callback.  I think it's shaping up
> to do:
>
> x86:
>  - Report any IOVA range restrictions imposed by hw implementation
> POWER:
>  - Request IOVA window size, report size and base
> powerpc:
>  - Set domain attributes, probably report range as well.

Alex, Alexey I'm wondering if you've had any new thoughts on this over
the last week.

For Freescale, our iommu domain attributes would look something like:
-domain iova base address
-domain iova window size
-domain enable/disable
-number of subwindows
-operation mapping table index
-stash destination CPU
-stash target (cache– L1, L2, L3)

These are all things that need to be set by the creator of the domain.

Since the domain attributes are going to be so different for each platform does
it make sense to define a new iommu_ops call back that just takes a void pointer
that can be implemented in a platform specific way?   For example:

struct iommu_ops {
[cut]
int (*domain_set_attrs)(struct iommu_domain *domain,
  void *attrs);
int (*domain_get_attrs)(struct iommu_domain *domain,
  void *attrs);
}

Whatever this API winds up looking like it needs to be reflected in
the vfio interface to user space as well.

Thanks,
Stuart

Re: [Qemu-devel] [PATCH v2 0/6] Memory API mutators


On 12/04/2011 12:09 PM, Avi Kivity wrote:

This patchset introduces memory_region_set_enabled() and
memory_region_set_address() to avoid the requirement on memory
routers to track the internal state of the memory API (so they know
whether they need to add or remove a region).  Instead, they can
simply copy the state of the region from the guest-exposed register
to the memory core, via the new mutator functions.


Based on previous discussions, any time these functions are used, the caller 
more than likely needs to call them again in a post_load hook during restore, 
correct?


It would be good to document this very clearly in the header docs for each 
function.

Other than Blue's comments, the rest looks good to me.

Regards,

Anthony Liguori



v2:
- fix minor bug in set_address()
- add set_alias_offset()
- two example users

Avi Kivity (6):
   memory: introduce memory_region_set_enabled()
   memory: introduce memory_region_set_address()
   memory: introduce memory_region_set_alias_offset()
   memory: optimize empty transactions due to mutators
   cirrus_vga: adapt to memory mutators API
   piix_pci: adapt smram mapping to use memory mutators

  hw/cirrus_vga.c |   50 +++--
  hw/piix_pci.c   |   20 -
  memory.c|   81 +++---
  memory.h|   39 ++
  4 files changed, 132 insertions(+), 58 deletions(-)

Re: [Qemu-devel] [PATCH v2 0/5] backdoor: lightweight guest-to-QEMU backdoor channel


On 12/07/2011 09:23 AM, Lluís Vilanova wrote:

Anthony Liguori writes:
[...]

If you want to extend QEMU, then send proper patches.  Adding random C files to
the build is unacceptable.


What do you mean by proper patches?



If you want to add "custom functionality" to QEMU, send a patch to upstream 
QEMU.


See below.



The whole point of this is to ease the
process for the user to add custom guest-to-qemu interactions, so there is no
"proper implementation" here.



That's too vague as far as I'm concerned.  The only reason I can see to have a
mechanism like this is 1) to avoid pushing stuff upstream and/or 2) to
circumvent QEMU licensing by shipping a separate file that the user includes
themselves.


AFAIK, GPL licensing restrictions only apply when the code is shipped in a
binary blob. If this is the case, the library is statically included in qemu, so
it must still comply with the licensing requirements of qemu. Still, take this
with a grain of salt as I'm not a licensing expert.

My motivation is more like 1 (I'll release the analyzer library I'm developing
once it's finished), but for the reasons stated below.



I see no reason to encourage either of these things.  It also creates an
inevitable problem of internal API stability.  What happens when the tree
changes and all of these "custom guest-to-qemu interactions" break?



You may claim that it's an unsupported interface, but what will actually happen
is that users will start depending on these custom features long after the
initial developer has stopped caring.



Blue Swirl already mentioned this can be used for testing qemu itself, while my
use case is coupled with the trace instrumentation features (I'll probably send
the first batch today).



In terms of supporting this mechanism, all backend code needs to live in
upstream QEMU.  That's a hard requirement.  No configure flags to pull in random
C files and no dlopens.



If you resent the series doing that, it'd be more likely to be merged but since
I don't think your intention is to push backend code into QEMU, I wonder whether
it's worth really merging this mechanism at all.



If this is just infrastructure that is only used by private forks, I don't think
it belongs upstream.


Well, both backdoor and trace instrumentation are implemented using the same
approach (a static library selected at compile-time). The user sets which events
to instrument in the "trace-events" file. This has the effect that the tracetool
script will not generate the tracing functions for those events, and instead the
user must provide the implementation for these events on a static library
provided at compile time.


I don't think this is the right approach to tracing.  What not just use 
something like SystemTap to implement logic around tracing?


http://blog.vmsplice.net/2011/03/how-to-write-trace-analysis-scripts-for.html


The analyses one might want to perform are ad-hoc in nature. For example, some
recent work sent here did coverage analyses through ad-hoc modifications on the
core of qemu, I want to generate detailed traces of the guest code that is
executed (although only when certain conditions in the guest are met). There is
no silver bullet here, but the development of analyzers can be simplified by
letting users perform static instrumentation of the tracing events, which act as
regular tracing events otherwise.

In this environment, the backdoor is just a handy channel to let the guest
communicate with code on that analyzer library, like signalling semanticful
events of interest (e.g., I have a trivial linux kernel module that installs a
tracepoint callback in the process switch routine and passes that information
down to the analyzer library through the backdoor channel).

As these analyzes are ad-hoc in nature, I don't see any way other than providing
this extension functionality.


But the effect is that the infrastructure is useless without out-of-tree code 
which means it's impossible to test.  Furthermore, when the ad-hoc analyzer 
library breaks because of an API change in QEMU, it puts us in a very difficult 
situation.




Does it make sense as a whole? I thought from previous mails that being able to
trace and analyze guest code was something perceived as a useful feature to have
in qemu.


I think there's wide agreement that extending tracing to guests is very useful 
and something we should do.


I don't think there's agreement that we should have a mechanism to pull in 
third-party code to implement the logic for these trace hooks within QEMU.


The right way to do analysis is via something like SystemTap.

Regards,

Anthony Liguori




Lluis

Re: [Qemu-devel] [PATCH v2 0/5] backdoor: lightweight guest-to-QEMU backdoor channel

Anthony Liguori writes:
[...]
>>> If you want to extend QEMU, then send proper patches.  Adding random C 
>>> files to
>>> the build is unacceptable.
>> 
>> What do you mean by proper patches?

> If you want to add "custom functionality" to QEMU, send a patch to upstream 
> QEMU.

See below.

>> The whole point of this is to ease the
>> process for the user to add custom guest-to-qemu interactions, so there is no
>> "proper implementation" here.

> That's too vague as far as I'm concerned.  The only reason I can see to have a
> mechanism like this is 1) to avoid pushing stuff upstream and/or 2) to
> circumvent QEMU licensing by shipping a separate file that the user includes
> themselves.

AFAIK, GPL licensing restrictions only apply when the code is shipped in a
binary blob. If this is the case, the library is statically included in qemu, so
it must still comply with the licensing requirements of qemu. Still, take this
with a grain of salt as I'm not a licensing expert.

My motivation is more like 1 (I'll release the analyzer library I'm developing
once it's finished), but for the reasons stated below.

> I see no reason to encourage either of these things.  It also creates an
> inevitable problem of internal API stability.  What happens when the tree
> changes and all of these "custom guest-to-qemu interactions" break?

> You may claim that it's an unsupported interface, but what will actually 
> happen
> is that users will start depending on these custom features long after the
> initial developer has stopped caring.

>> Blue Swirl already mentioned this can be used for testing qemu itself, while 
>> my
>> use case is coupled with the trace instrumentation features (I'll probably 
>> send
>> the first batch today).

> In terms of supporting this mechanism, all backend code needs to live in
> upstream QEMU.  That's a hard requirement.  No configure flags to pull in 
> random
> C files and no dlopens.

> If you resent the series doing that, it'd be more likely to be merged but 
> since
> I don't think your intention is to push backend code into QEMU, I wonder 
> whether
> it's worth really merging this mechanism at all.

> If this is just infrastructure that is only used by private forks, I don't 
> think
> it belongs upstream.

Well, both backdoor and trace instrumentation are implemented using the same
approach (a static library selected at compile-time). The user sets which events
to instrument in the "trace-events" file. This has the effect that the tracetool
script will not generate the tracing functions for those events, and instead the
user must provide the implementation for these events on a static library
provided at compile time.

The analyses one might want to perform are ad-hoc in nature. For example, some
recent work sent here did coverage analyses through ad-hoc modifications on the
core of qemu, I want to generate detailed traces of the guest code that is
executed (although only when certain conditions in the guest are met). There is
no silver bullet here, but the development of analyzers can be simplified by
letting users perform static instrumentation of the tracing events, which act as
regular tracing events otherwise.

In this environment, the backdoor is just a handy channel to let the guest
communicate with code on that analyzer library, like signalling semanticful
events of interest (e.g., I have a trivial linux kernel module that installs a
tracepoint callback in the process switch routine and passes that information
down to the analyzer library through the backdoor channel).

As these analyzes are ad-hoc in nature, I don't see any way other than providing
this extension functionality.

Does it make sense as a whole? I thought from previous mails that being able to
trace and analyze guest code was something perceived as a useful feature to have
in qemu.

Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth

[Qemu-devel] [PATCH 2/2] net: take ownership of fd in socket init functions

Today net/socket.c has no consistent policy for closing the socket file
descriptor when initialization fails.  This means we leak the file
descriptor in some cases or we could also try to close it twice.

Make error paths consistent by taking ownership of the file descriptor
and closing it on error.

Signed-off-by: Stefan Hajnoczi 
---
 net/socket.c |   17 +
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index 613a7ef..f999c26 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -266,14 +266,13 @@ static NetSocketState *net_socket_fd_init_dgram(VLANState 
*vlan,
 if (saddr.sin_addr.s_addr == 0) {
 fprintf(stderr, "qemu: error: init_dgram: fd=%d unbound, "
 "cannot setup multicast dst addr\n", fd);
-return NULL;
+goto err;
 }
 /* clone dgram socket */
 newfd = net_socket_mcast_create(&saddr, NULL);
 if (newfd < 0) {
 /* error already reported by net_socket_mcast_create() */
-close(fd);
-return NULL;
+goto err;
 }
 /* clone newfd to fd, close newfd */
 dup2(newfd, fd);
@@ -283,7 +282,7 @@ static NetSocketState *net_socket_fd_init_dgram(VLANState 
*vlan,
 fprintf(stderr,
 "qemu: error: init_dgram: fd=%d failed getsockname(): 
%s\n",
 fd, strerror(errno));
-return NULL;
+goto err;
 }
 }
 
@@ -304,6 +303,10 @@ static NetSocketState *net_socket_fd_init_dgram(VLANState 
*vlan,
 if (is_connected) s->dgram_dst=saddr;
 
 return s;
+
+err:
+closesocket(fd);
+return NULL;
 }
 
 static void net_socket_connect(void *opaque)
@@ -353,6 +356,7 @@ static NetSocketState *net_socket_fd_init(VLANState *vlan,
 (socklen_t *)&optlen)< 0) {
 fprintf(stderr, "qemu: error: getsockopt(SO_TYPE) for fd=%d failed\n",
 fd);
+closesocket(fd);
 return NULL;
 }
 switch(so_type) {
@@ -386,9 +390,7 @@ static void net_socket_accept(void *opaque)
 }
 }
 s1 = net_socket_fd_init(s->vlan, s->model, s->name, fd, 1);
-if (!s1) {
-closesocket(fd);
-} else {
+if (s1) {
 snprintf(s1->nc.info_str, sizeof(s1->nc.info_str),
  "socket: connection from %s:%d",
  inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
@@ -549,7 +551,6 @@ int net_init_socket(QemuOpts *opts,
 }
 
 if (!net_socket_fd_init(vlan, "socket", name, fd, 1)) {
-close(fd);
 return -1;
 }
 } else if (qemu_opt_get(opts, "listen")) {
-- 
1.7.7.3

[Qemu-devel] [PATCH 1/2] net: expand tabs in net/socket.c

In order to make later patches sane, expand the tab characters and
conform to QEMU coding style now.

Signed-off-by: Stefan Hajnoczi 
---
 net/socket.c |   79 ++
 1 files changed, 41 insertions(+), 38 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index e9ef128..613a7ef 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -161,10 +161,11 @@ static int net_socket_mcast_create(struct sockaddr_in 
*mcastaddr, struct in_addr
 #endif
 
 if (!IN_MULTICAST(ntohl(mcastaddr->sin_addr.s_addr))) {
-   fprintf(stderr, "qemu: error: specified mcastaddr \"%s\" (0x%08x) does 
not contain a multicast address\n",
-   inet_ntoa(mcastaddr->sin_addr),
+fprintf(stderr, "qemu: error: specified mcastaddr \"%s\" (0x%08x) "
+"does not contain a multicast address\n",
+inet_ntoa(mcastaddr->sin_addr),
 (int)ntohl(mcastaddr->sin_addr.s_addr));
-   return -1;
+return -1;
 
 }
 fd = qemu_socket(PF_INET, SOCK_DGRAM, 0);
@@ -177,8 +178,8 @@ static int net_socket_mcast_create(struct sockaddr_in 
*mcastaddr, struct in_addr
 ret=setsockopt(fd, SOL_SOCKET, SO_REUSEADDR,
(const char *)&val, sizeof(val));
 if (ret < 0) {
-   perror("setsockopt(SOL_SOCKET, SO_REUSEADDR)");
-   goto fail;
+perror("setsockopt(SOL_SOCKET, SO_REUSEADDR)");
+goto fail;
 }
 
 ret = bind(fd, (struct sockaddr *)mcastaddr, sizeof(*mcastaddr));
@@ -198,8 +199,8 @@ static int net_socket_mcast_create(struct sockaddr_in 
*mcastaddr, struct in_addr
 ret = setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
  (const char *)&imr, sizeof(struct ip_mreq));
 if (ret < 0) {
-   perror("setsockopt(IP_ADD_MEMBERSHIP)");
-   goto fail;
+perror("setsockopt(IP_ADD_MEMBERSHIP)");
+goto fail;
 }
 
 /* Force mcast msgs to loopback (eg. several QEMUs in same host */
@@ -207,8 +208,8 @@ static int net_socket_mcast_create(struct sockaddr_in 
*mcastaddr, struct in_addr
 ret=setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP,
(const char *)&loop, sizeof(loop));
 if (ret < 0) {
-   perror("setsockopt(SOL_IP, IP_MULTICAST_LOOP)");
-   goto fail;
+perror("setsockopt(SOL_IP, IP_MULTICAST_LOOP)");
+goto fail;
 }
 
 /* If a bind address is given, only send packets from that address */
@@ -260,37 +261,38 @@ static NetSocketState *net_socket_fd_init_dgram(VLANState 
*vlan,
  */
 
 if (is_connected) {
-   if (getsockname(fd, (struct sockaddr *) &saddr, &saddr_len) == 0) {
-   /* must be bound */
-   if (saddr.sin_addr.s_addr==0) {
-   fprintf(stderr, "qemu: error: init_dgram: fd=%d unbound, cannot 
setup multicast dst addr\n",
-   fd);
-   return NULL;
-   }
-   /* clone dgram socket */
-   newfd = net_socket_mcast_create(&saddr, NULL);
-   if (newfd < 0) {
-   /* error already reported by net_socket_mcast_create() */
-   close(fd);
-   return NULL;
-   }
-   /* clone newfd to fd, close newfd */
-   dup2(newfd, fd);
-   close(newfd);
-
-   } else {
-   fprintf(stderr, "qemu: error: init_dgram: fd=%d failed 
getsockname(): %s\n",
-   fd, strerror(errno));
-   return NULL;
-   }
+if (getsockname(fd, (struct sockaddr *) &saddr, &saddr_len) == 0) {
+/* must be bound */
+if (saddr.sin_addr.s_addr == 0) {
+fprintf(stderr, "qemu: error: init_dgram: fd=%d unbound, "
+"cannot setup multicast dst addr\n", fd);
+return NULL;
+}
+/* clone dgram socket */
+newfd = net_socket_mcast_create(&saddr, NULL);
+if (newfd < 0) {
+/* error already reported by net_socket_mcast_create() */
+close(fd);
+return NULL;
+}
+/* clone newfd to fd, close newfd */
+dup2(newfd, fd);
+close(newfd);
+
+} else {
+fprintf(stderr,
+"qemu: error: init_dgram: fd=%d failed getsockname(): 
%s\n",
+fd, strerror(errno));
+return NULL;
+}
 }
 
 nc = qemu_new_net_client(&net_dgram_socket_info, vlan, NULL, model, name);
 
 snprintf(nc->info_str, sizeof(nc->info_str),
-   "socket: fd=%d (%s mcast=%s:%d)",
-   fd, is_connected ? "cloned" : "",
-   inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
+"socket: fd=%d (%s mcast=%s:%d)",
+fd, is_connected ? "cloned" : "",
+inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
 
 s = DO_UPCAST(NetSocketState, nc, nc);
 
@@ -349,8 +351,9 @@ static NetSocketState *net_socket_fd_init(VLANState *vlan,
 
 if(

[Qemu-devel] [PATCH 0/2] net: clean up net/socket.c

There is no consistent policy on closing the socket file descriptor when
net/socket.c initialization fails.  This has been on my TODO list for a while
and I had a few minutes to fix it now.

Stefan Hajnoczi (2):
  net: expand tabs in net/socket.c
  net: take ownership of fd in socket init functions

 net/socket.c |   88 ++---
 1 files changed, 46 insertions(+), 42 deletions(-)

-- 
1.7.7.3

[Qemu-devel] ezmlm warning

2011-12-07 Thread 02hdec-help

Bonjour ! Je suis le programme ezmlm. Je m'occupe de la liste
de diffusion 02h...@huiledecrete.com.


Un certain nombre de messages provenant de la liste de diffusion
<#l> n'ont pas pu vous etre remis correctement. En attachement, vous trouverez
une copie du premier message de retour a l'envoyeur que j'ai recu.

Si le message que vous lisez actuellement ne parvient pas non plus a
destination, un dernier message vous sera envoye. Si celui-ci echoue aussi,
votre adresse sera malheureusement retiree de la liste 02hdec.



J'ai conserve une liste des messages de la liste de diffusion 02hdec qui
n'ont temporairement pas pu etre delivres a votre adresse.

Voici les numeros des messages en question :

   18

--- Ci-dessous se trouve une copie du message problematique qui m'est revenu :

Return-Path: 
Received: from b0.ovh.net (HELO queue) (213.186.33.50)
by b0.ovh.net with SMTP; 19 Nov 2011 23:16:45 +0200
Received: from 36.mail-out.ovh.net (213.251.186.44)
  by mx1.ovh.net with SMTP; 19 Nov 2011 23:16:43 +0200
Received: (qmail 6238 invoked for bounce); 19 Nov 2011 20:20:32 -
Date: 19 Nov 2011 20:20:32 -
From: mailer-dae...@36.mail-out.ovh.net
To: 02hdec-return-18-qemu-devel=nongnu@huiledecrete.com
Subject: failure notice
X-Ovh-Tracer-Id: 2070248455189561348
Message-ID: <213.251.186.44.1321737403.22...@mx1.ovh.net>
X-Ovh-Remote: 213.251.186.44 (36.mail-out.ovh.net)
X-Ovh-Local: 213.186.33.29 (mx1.ovh.net)
X-OVH-SPAMSTATE: BOUNCE
X-OVH-SPAMSCORE: 10001
X-OVH-SPAMCAUSE: 
gggruggvucftvghtrhhoucdtuddrfeefhedrudelucetggdotefuucfrrhhofhhilhgvmecuqfggjfenuceurghilhhouhhtmecufedttdenucfpohhtihhfihgtrghtihhonhculddutddttddumd
X-Spam-Check: DONE|U 0.5/N

Hi. This is the qmail-send program at 36.mail-out.ovh.net.
I'm afraid I wasn't able to deliver your message to the following addresses.
This is a permanent error; I've given up. Sorry it didn't work out.

:
140.186.70.92 failed after I sent the message.
[from:213.251.186.44](220_eggs.gnu.org_ESMTP_Exim_4.71_Sat,_19_Nov_2011_16:16:40_-0500?)|HELO_36.mail-out.ovh.net|250_eggs.gnu.org_Hello_36.mail-out.ovh.net_[213.251.186.44]?)|MAIL_FROM:<02hdec-return-18-qemu-devel=nongnu@huiledecrete.com>=250_OK?|RCPT_TO:=(250_Accepted?)|DATA=(354_Enter_message,_ending_with_"."_on_a_line_by_itself?)|.=(550_Invalid_address_in_message_header??|Remote
 host said: 550 Invalid address in message header

--- Below this line is a copy of the message.

Return-Path: <02hdec-return-18-qemu-devel=nongnu@huiledecrete.com>
Received: (qmail 5901 invoked by uid 503); 19 Nov 2011 20:20:30 -
Received: from b3.ovh.net (HELO mail358.ha.ovh.net) (213.186.33.53)
  by 36.mail-out.ovh.net with SMTP; 19 Nov 2011 20:20:28 -
Received: from b0.ovh.net (HELO queueml) (213.186.33.50)
by b0.ovh.net with SMTP; 19 Nov 2011 18:49:47 -
Mailing-List: Pour toute requete administrative, contactez 
02hdec-h...@huiledecrete.com; Liste geree par ezmlm
Precedence: bulk
X-No-Archive: yes
List-Post: 
List-Help: 
List-Unsubscribe: 
List-Subscribe: 
Reply-To: 02h...@huiledecrete.com
Delivered-To: mailing list 02h...@huiledecrete.com
Delivered-To: moderator for 02h...@huiledecrete.com
Received: from b0.ovh.net (HELO queue) (213.186.33.50)
by b0.ovh.net with SMTP; 19 Nov 2011 15:46:25 -
Received: from localhost (HELO mail358.ha.ovh.net) (127.0.0.1)
  by localhost with SMTP; 19 Nov 2011 15:46:25 -
Received: from b0.ovh.net (HELO queueout) (213.186.33.50)
by b0.ovh.net with SMTP; 19 Nov 2011 15:46:25 -
Received: from ns0.ovh.net (HELO webmail.ovh.net) (213.186.33.20)
  by ns0.ovh.net with SMTP; 19 Nov 2011 15:46:25 -
MIME-Version: 1.0
Content-Type: multipart/alternative;
 boundary="=_a163dab1da9e410324b5b8092a513934"
Date: Sat, 19 Nov 2011 16:46:25 +0100
From: 
To: <02h...@huiledecrete.com>>
Message-ID: <7e51fcd2ca05b3cf986b037de9449395@localhost>
X-Sender: postmas...@huiledecrete.com
User-Agent: RoundCube Webmail/0.4
X-Ovh-Tracer-Id: 14938721439454360119
Subject: [02hdec] 10 jours pour en profiter  " en novembre,  2
 =?UTF-8?Q?op=C3=A9rations=20=2E=2E=2E=20=20=2E=2E=2E=20=20=2E=2E=2E=20=20?=
 =?UTF-8?Q?=2E=2E=2E?=

--=_a163dab1da9e410324b5b8092a513934
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=UTF-8



NOS "CORBEILLES SAVEURS" COMPOSÃES POUR VOUS + PROMO AOP NESOS
34.95EUR / 5 LITRES 5 - (JUSQU'AU 30/11/20011)

 [1] [2] [3]


[4]

DÃ©couvrir, offrir, partager... le plaisir d'instants surprenants,

grÃ¢ce Ã  nos "corbeilles saveurs" composÃ©es pour vous. 
A dÃ©couvrir ici
pour prÃ©parer "les
fÃªtes"!
---

RETROUVEZ
- NOUS :

_SALON GOURMETS ET VINS Ã BRUXELLES - PARC EXPO, DU 24 AU 27
NOVEMBRE 2011_

_SALON CUISIN'& VOUS Ã BIARRITZ - HALL D' IRATY, DU 03
AU 05 DÃCEMBRE 2011_

 [5]

 Cliquez ici pour vous dÃ©sab

Re: [Qemu-devel] [PATCH] Documentation: Add qemu-img -t parameter in man page

On Wed, Dec 7, 2011 at 12:58 PM, Kevin Wolf  wrote:
> Signed-off-by: Kevin Wolf 
> ---
>  qemu-img-cmds.hx |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)

Please also document '-t' under "Command parameters:".  So far this
patch just lists -t but doesn't explain what it does or the valid
values.

Stefan

Re: [Qemu-devel] [PATCH v2 0/5] backdoor: lightweight guest-to-QEMU backdoor channel


On 12/07/2011 06:21 AM, Lluís Vilanova wrote:

Anthony Liguori writes:


On 12/05/2011 04:22 PM, Lluís Vilanova wrote:

Provides the ability for the guest to communicate with user-provided code inside
QEMU itself, using a lightweight mechanism.

See first commit for a full description.

Signed-off-by: Lluís Vilanova



This whole series:



Nacked-by: Anthony Liguori



If you want to extend QEMU, then send proper patches.  Adding random C files to
the build is unacceptable.


What do you mean by proper patches?


If you want to add "custom functionality" to QEMU, send a patch to upstream 
QEMU.


The whole point of this is to ease the
process for the user to add custom guest-to-qemu interactions, so there is no
"proper implementation" here.


That's too vague as far as I'm concerned.  The only reason I can see to have a 
mechanism like this is 1) to avoid pushing stuff upstream and/or 2) to 
circumvent QEMU licensing by shipping a separate file that the user includes 
themselves.


I see no reason to encourage either of these things.  It also creates an 
inevitable problem of internal API stability.  What happens when the tree 
changes and all of these "custom guest-to-qemu interactions" break?


You may claim that it's an unsupported interface, but what will actually happen 
is that users will start depending on these custom features long after the 
initial developer has stopped caring.



Blue Swirl already mentioned this can be used for testing qemu itself, while my
use case is coupled with the trace instrumentation features (I'll probably send
the first batch today).


In terms of supporting this mechanism, all backend code needs to live in 
upstream QEMU.  That's a hard requirement.  No configure flags to pull in random 
C files and no dlopens.


If you resent the series doing that, it'd be more likely to be merged but since 
I don't think your intention is to push backend code into QEMU, I wonder whether 
it's worth really merging this mechanism at all.


If this is just infrastructure that is only used by private forks, I don't think 
it belongs upstream.


Regards,

Anthony Liguori




Lluis

Re: [Qemu-devel] Insane virtio-serial semantics


On 12/07/2011 02:21 AM, Markus Armbruster wrote:

Anthony Liguori  writes:


On 12/06/2011 04:30 PM, Lluís Vilanova wrote:

Anthony Liguori writes:


I really worry about us introducing so many of these one-off paravirtual 
devices.
I would much prefer that you look at doing this as an extension to the ivshmem
device as it already has this sort of scope.  You should be able to do this by
just extending the size of bar 1 and using a well known guest id.


I did in fact look at ivshmem some time ago, and it's true that both use the
same mechanisms; but each device has a completely different purpose. To me it
just seems that extending the control BAR in ivshmem to call the user-provided
backdoor callbacks is just conflating two completely separate devices into a
single one. Besides, I think that the qemu-side of the backdoor is simple enough
to avoid being a maintenance burden.


They have the same purpose (which are both vague TBH).  The only
reason I'm sympathetic to this device is that virtio-serial has such
insane semantics.


Could you summarize what's wrong?  Is it fixable?


I don't think so as it's part of the userspace ABI now.

Mike, please help me make sure I get this all right.  A normal file/socket has 
the following guest semantics:


1) When a disconnect occurs, you will receive a return of '0' or -EPIPE 
depending on the platform.  The fd is now unusable and you must close/reopen.


2) You can setup SIGIO/SIGPIPE to fire off whenever a file descriptor becomes 
readable/writable.


virtio serial has the following semantics:

1) When a disconnect occurs, if you read() you will receive an -EPIPE.

2) However, if a reconnect occurs before you issue your read(), the read will 
complete with no indication that a disconnect occurred.


3) This makes it impossible to determine whether a disconnect has occurred which 
makes it very hard to reset your protocol stream.  To deal with this, 
virtio-serial can issue a SIGIO signal upon disconnect.


4) Signals are asynchronous, so a reconnect may have occurred by the time you 
get the SIGIO signal.  It's unclear that you can do anything useful with this.


So besides overloading the meaning of SIGIO, there's really no way to figure out 
in the guest when a reconnect has occurred.  To deal with this in qemu-ga, we 
actually only allow 7-bit data transfers and use the 8th bit as an in-band 
message to tell the guest that a reset has occurred.


Regards,

Anthony Liguori



[...]

[Qemu-devel] [PATCH] Documentation: Add qemu-img -t parameter in man page

Signed-off-by: Kevin Wolf 
---
 qemu-img-cmds.hx |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 4be00a5..49dce7c 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -24,13 +24,13 @@ ETEXI
 DEF("commit", img_commit,
 "commit [-f fmt] [-t cache] filename")
 STEXI
-@item commit [-f @var{fmt}] @var{filename}
+@item commit [-f @var{fmt}] [-t @var{cache}] @var{filename}
 ETEXI
 
 DEF("convert", img_convert,
 "convert [-c] [-p] [-f fmt] [-t cache] [-O output_fmt] [-o options] [-s 
snapshot_name] [-S sparse_size] filename [filename2 [...]] output_filename")
 STEXI
-@item convert [-c] [-p] [-f @var{fmt}] [-O @var{output_fmt}] [-o 
@var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] @var{filename} 
[@var{filename2} [...]] @var{output_filename}
+@item convert [-c] [-p] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] 
[-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] 
@var{filename} [@var{filename2} [...]] @var{output_filename}
 ETEXI
 
 DEF("info", img_info,
@@ -48,7 +48,7 @@ ETEXI
 DEF("rebase", img_rebase,
 "rebase [-f fmt] [-t cache] [-p] [-u] -b backing_file [-F backing_fmt] 
filename")
 STEXI
-@item rebase [-f @var{fmt}] [-p] [-u] -b @var{backing_file} [-F 
@var{backing_fmt}] @var{filename}
+@item rebase [-f @var{fmt}] [-t @var{cache}] [-p] [-u] -b @var{backing_file} 
[-F @var{backing_fmt}] @var{filename}
 ETEXI
 
 DEF("resize", img_resize,
-- 
1.7.6.4

Re: [Qemu-devel] [PATCH] Convert keymap file to UTF-8 encoding

2011-12-07 Thread Zhi Yong Wu

On Wed, Dec 7, 2011 at 8:31 PM, Peter Maydell  wrote:
> On 7 December 2011 12:13, Zhi Yong Wu  wrote:
>> Can you let me know how you see that it is ISO-8859-1 coding, not
>> UTF-8? They look same to me.
>
> This gets a bit confusing because mail clients and
> web browsers tend to try to fix up what they think are
> wrongly labelled encodings, so for example in my mail
> client the diff looks like it is not changing anything.
> However if you look at the file in the current git repo
> with a hex editor:
>   23 20 32 30 30 34 2d 30  33 2d 31 36 20 48 61 6c  |# 2004-03-16 Hal|
> 0010  6c 64 f3 72 20 47 75 f0  6d 75 6e 64 73 73 6f 6e  |ld.r Gu.mundsson|
>
> you can see that there is an 0xf3 at offset 0x12, which
> is LATIN SMALL LETTER O WITH ACUTE in ISO-8859-1. ISO-8859-1
> is a one-byte-per-character encoding which is why it has a
> raw 0xf3 here. However although the character is at Unicode
> codepoint 0xf3 as well, the encoding of this in UTF-8 is
> the two byte sequence 0xc3 0xb3. Similarly the 0xf0 LATIN
> SMALL LETTER ETH has to be encoded as 0xc3 0xb0.
>
> If you look at the raw text of Stefan's email it reads:
> -# 2004-03-16 Halld=F3r Gu=F0mundsson and Morten Lange
> +# 2004-03-16 Halld=C3=B3r Gu=C3=B0mundsson and Morten Lange
>
> which is the quoted-printable encoding of this change.
thanks for your explain.

>
> -- PMM



-- 
Regards,

Zhi Yong Wu

Re: [Qemu-devel] [PATCH] Convert keymap file to UTF-8 encoding

On 7 December 2011 12:13, Zhi Yong Wu  wrote:
> Can you let me know how you see that it is ISO-8859-1 coding, not
> UTF-8? They look same to me.

This gets a bit confusing because mail clients and
web browsers tend to try to fix up what they think are
wrongly labelled encodings, so for example in my mail
client the diff looks like it is not changing anything.
However if you look at the file in the current git repo
with a hex editor:
  23 20 32 30 30 34 2d 30  33 2d 31 36 20 48 61 6c  |# 2004-03-16 Hal|
0010  6c 64 f3 72 20 47 75 f0  6d 75 6e 64 73 73 6f 6e  |ld.r Gu.mundsson|

you can see that there is an 0xf3 at offset 0x12, which
is LATIN SMALL LETTER O WITH ACUTE in ISO-8859-1. ISO-8859-1
is a one-byte-per-character encoding which is why it has a
raw 0xf3 here. However although the character is at Unicode
codepoint 0xf3 as well, the encoding of this in UTF-8 is
the two byte sequence 0xc3 0xb3. Similarly the 0xf0 LATIN
SMALL LETTER ETH has to be encoded as 0xc3 0xb0.

If you look at the raw text of Stefan's email it reads:
-# 2004-03-16 Halld=F3r Gu=F0mundsson and Morten Lange
+# 2004-03-16 Halld=C3=B3r Gu=C3=B0mundsson and Morten Lange

which is the quoted-printable encoding of this change.

-- PMM

Re: [Qemu-devel] [PATCH v2 0/5] backdoor: lightweight guest-to-QEMU backdoor channel

Anthony Liguori writes:

> On 12/05/2011 04:22 PM, Lluís Vilanova wrote:
>> Provides the ability for the guest to communicate with user-provided code 
>> inside
>> QEMU itself, using a lightweight mechanism.
>> 
>> See first commit for a full description.
>> 
>> Signed-off-by: Lluís Vilanova

> This whole series:

> Nacked-by: Anthony Liguori 

> If you want to extend QEMU, then send proper patches.  Adding random C files 
> to
> the build is unacceptable.

What do you mean by proper patches? The whole point of this is to ease the
process for the user to add custom guest-to-qemu interactions, so there is no
"proper implementation" here.

Blue Swirl already mentioned this can be used for testing qemu itself, while my
use case is coupled with the trace instrumentation features (I'll probably send
the first batch today).

Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth

Re: [Qemu-devel] [PATCH] Convert keymap file to UTF-8 encoding

2011-12-07 Thread Zhi Yong Wu

On Sat, Dec 3, 2011 at 5:45 PM, Stefan Weil  wrote:
> Most QEMU files either are pure ASCII or use UTF-8.
> Convert this keymap file which still used ISO-8859-1 to UTF-8.
>
> Signed-off-by: Stefan Weil 
> ---
>  pc-bios/keymaps/is |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/pc-bios/keymaps/is b/pc-bios/keymaps/is
> index 21dc1fd..935ac1d 100644
> --- a/pc-bios/keymaps/is
> +++ b/pc-bios/keymaps/is
> @@ -1,4 +1,4 @@
> -# 2004-03-16 Halldór Guðmundsson and Morten Lange
Can you let me know how you see that it is ISO-8859-1 coding, not
UTF-8? They look same to me.
> +# 2004-03-16 Halldór Guðmundsson and Morten Lange
>  # Keyboard definition file for the Icelandic keyboard
>  # to be used in rdesktop 1.3.x ( See rdesktop.org)
>  # generated from XKB map de, and changed manually
> --
> 1.7.0.4
>
>



-- 
Regards,

Zhi Yong Wu

Re: [Qemu-devel] [PATCH] qemu-img rebase: Fix for undersized backing files

Am 07.12.2011 13:06, schrieb Stefan Hajnoczi:
> On Wed, Dec 7, 2011 at 12:02 PM, Kevin Wolf  wrote:
>> Am 07.12.2011 12:50, schrieb Stefan Hajnoczi:
>>> On Wed, Dec 7, 2011 at 11:45 AM, Kevin Wolf  wrote:
 Backing files may be smaller than the corresponding COW file. When
 reading directly from the backing file, qemu-img rebase must consider
 this and assume zero sectors after the end of backing files.

 Signed-off-by: Kevin Wolf 
 ---
  qemu-img.c |   42 +-
  1 files changed, 33 insertions(+), 9 deletions(-)
>>>
>>> Reviewed-by: Stefan Hajnoczi 
>>>
>>> It almost feels like we want a bdrv_read() variation that does the
>>> zeroing beyond end of image instead of duplicating this.
>>
>> Almost. If we have a third user, it's probably worth it.
>>
>> Actually, if we make a bdrv_co_readv_backing() instead of a synchronous
>> one, qcow2 would be the third user (and maybe VMDK could use it, too).
>> That would require moving qemu-img into a coroutine first, which I think
>> you wanted to do anyway?
> 
> If we can make qemu-img the last user of synchronous interfaces then
> it would be nice to convert it.  Right now I'm happy that we're
> solidifying coroutines and converting synchronous hardware emulation,
> hopefully we'll get to the point where there is no real reason to keep
> the synchronous API around anymore.

The point is, if I were to introduce a bdrv_co_readv_backing() function,
I certainly wouldn't like to add all the bdrv_(aio_)read/write emulation
code for it, too.

Kevin

Re: [Qemu-devel] [PATCH 1/2] guest agent: add RPC blacklist command-line option

2011-12-07 Thread Dor Laor


On 12/07/2011 12:52 PM, Daniel P. Berrange wrote:

On Wed, Dec 07, 2011 at 12:34:01PM +0200, Dor Laor wrote:

On 12/07/2011 06:03 AM, Michael Roth wrote:

This adds a command-line option, -b/--blacklist, that accepts a
comma-seperated list of RPCs to disable, or prints a list of
available RPCs if passed "?".

In consequence this also adds general blacklisting and RPC listing
facilities to the new QMP dispatch/registry facilities, should the
QMP monitor ever have a need for such a thing.


Beyond run time disablement, how easy it is to compile out some of
the general commands such as exec/file-handling?

Security certifications like common criteria usually ask to compile
out anything that might tamper security.


I don't think that's really relevant/needed. As discussed on the
call yesterday, this is security theatre, because nothing can prevent
the host admin from accessing guest RAM or disk data. AFAIK the
virtualization related security certifications acknowledge this
already&  don't make any claims about security of guests against
a malicious host admin. In any case, a suitable SELinux policy for
the guest agent could prevent arbitrary file/binary access via
generic 'exec' / 'file-read' commands, in a manner that is sufficient
to satisfy security certications.


I absolutely agree that the hypervisor can tweak the guest in multiple 
ways. Nevertheless there are two reasons I asked it:


 1. Reduce code and noise from security reviewers eyes.
We were asked to do exactly that for other qemu functionality that
is included but does not run at all. It's just makes the review
faster.

 2. Every piece of code is a risk for exploit
Imagine that a bug/leak/use-after-free in the blacklist command or
the exec command on qemu exists and allows attacked to gain control
of qemu.



Regards,
Daniel

[Qemu-devel] [PATCH v2 1/3] block: add zero write detection interface

Some image formats can represent zero regions efficiently even when a
backing file is present.  In order to use this feature they need to
detect zero writes and handle them specially.

Since zero write detection consumes CPU cycles it is disabled by default
and must be explicitly enabled.  This patch adds an interface to do so.

Currently no block drivers actually support zero write detection yet.
This is addressed in follow-up patches.

Signed-off-by: Stefan Hajnoczi 
---
 block.c |   16 
 block.h |3 +++
 block_int.h |9 +
 3 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 3f072f6..5dcffb1 100644
--- a/block.c
+++ b/block.c
@@ -554,6 +554,21 @@ void bdrv_disable_copy_on_read(BlockDriverState *bs)
 bs->copy_on_read--;
 }
 
+/**
+ * Multiple users may use this feature without worrying about clobbering its
+ * previous state.  It stays enabled until all users have called to disable it.
+ */
+void bdrv_enable_zero_detection(BlockDriverState *bs)
+{
+bs->zero_detection++;
+}
+
+void bdrv_disable_zero_detection(BlockDriverState *bs)
+{
+assert(bs->zero_detection > 0);
+bs->zero_detection--;
+}
+
 /*
  * Common part for opening disk images and files
  */
@@ -574,6 +589,7 @@ static int bdrv_open_common(BlockDriverState *bs, const 
char *filename,
 bs->open_flags = flags;
 bs->growable = 0;
 bs->buffer_alignment = 512;
+bs->zero_detection = 0;
 
 assert(bs->copy_on_read == 0); /* bdrv_new() and bdrv_close() make it so */
 if ((flags & BDRV_O_RDWR) && (flags & BDRV_O_COPY_ON_READ)) {
diff --git a/block.h b/block.h
index 1790f99..30b72e9 100644
--- a/block.h
+++ b/block.h
@@ -317,6 +317,9 @@ int64_t bdrv_get_dirty_count(BlockDriverState *bs);
 void bdrv_enable_copy_on_read(BlockDriverState *bs);
 void bdrv_disable_copy_on_read(BlockDriverState *bs);
 
+void bdrv_enable_zero_detection(BlockDriverState *bs);
+void bdrv_disable_zero_detection(BlockDriverState *bs);
+
 void bdrv_set_in_use(BlockDriverState *bs, int in_use);
 int bdrv_in_use(BlockDriverState *bs);
 
diff --git a/block_int.h b/block_int.h
index 311bd2a..89a860c 100644
--- a/block_int.h
+++ b/block_int.h
@@ -247,6 +247,15 @@ struct BlockDriverState {
 /* do we need to tell the quest if we have a volatile write cache? */
 int enable_write_cache;
 
+/*
+ * If zero write detection is enabled this field is != 0.
+ *
+ * Block drivers that support zero detection should check this field for
+ * each write request to decide whether or not to perform detection.  Since
+ * zero detection consumes CPU cycles it is disabled by default.
+ */
+int zero_detection;
+
 /* NOTE: the following infos are only hints for real hardware
drivers. They are not used by the block driver */
 int cyls, heads, secs, translation;
-- 
1.7.7.3

[Qemu-devel] [PATCH v2 2/3] qed: add zero write detection support

The QED image format is able to efficiently represent clusters
containing zeroes with a magic offset value.  This patch implements zero
write detection for allocating writes so that image streaming can copy
over zero clusters from a backing file without expanding the image file
unnecessarily.

This is based code by Anthony Liguori .

Signed-off-by: Stefan Hajnoczi 
---
 block/qed.c |   80 +--
 1 files changed, 72 insertions(+), 8 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index 8da3ebe..db4246a 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -941,9 +941,8 @@ static void qed_aio_write_l1_update(void *opaque, int ret)
 /**
  * Update L2 table with new cluster offsets and write them out
  */
-static void qed_aio_write_l2_update(void *opaque, int ret)
+static void qed_aio_write_l2_update(QEDAIOCB *acb, int ret, uint64_t offset)
 {
-QEDAIOCB *acb = opaque;
 BDRVQEDState *s = acb_to_s(acb);
 bool need_alloc = acb->find_cluster_ret == QED_CLUSTER_L1;
 int index;
@@ -959,7 +958,7 @@ static void qed_aio_write_l2_update(void *opaque, int ret)
 
 index = qed_l2_index(s, acb->cur_pos);
 qed_update_l2_table(s, acb->request.l2_table->table, index, 
acb->cur_nclusters,
- acb->cur_cluster);
+ offset);
 
 if (need_alloc) {
 /* Write out the whole new L2 table */
@@ -976,6 +975,51 @@ err:
 qed_aio_complete(acb, ret);
 }
 
+static void qed_aio_write_l2_update_cb(void *opaque, int ret)
+{
+QEDAIOCB *acb = opaque;
+qed_aio_write_l2_update(acb, ret, acb->cur_cluster);
+}
+
+/**
+ * Determine if we have a zero write to a block of clusters
+ *
+ * We validate that the write is aligned to a cluster boundary, and that it's
+ * a multiple of cluster size with all zeros.
+ */
+static bool qed_is_zero_write(QEDAIOCB *acb)
+{
+BDRVQEDState *s = acb_to_s(acb);
+int i;
+
+if (!qed_offset_is_cluster_aligned(s, acb->cur_pos)) {
+return false;
+}
+
+if (!qed_offset_is_cluster_aligned(s, acb->cur_qiov.size)) {
+return false;
+}
+
+for (i = 0; i < acb->cur_qiov.niov; i++) {
+struct iovec *iov = &acb->cur_qiov.iov[i];
+uint64_t *v;
+int j;
+
+if ((iov->iov_len & 0x07)) {
+return false;
+}
+
+v = iov->iov_base;
+for (j = 0; j < iov->iov_len; j += sizeof(v[0])) {
+if (v[j >> 3]) {
+return false;
+}
+}
+}
+
+return true;
+}
+
 /**
  * Flush new data clusters before updating the L2 table
  *
@@ -990,7 +1034,7 @@ static void qed_aio_write_flush_before_l2_update(void 
*opaque, int ret)
 QEDAIOCB *acb = opaque;
 BDRVQEDState *s = acb_to_s(acb);
 
-if (!bdrv_aio_flush(s->bs->file, qed_aio_write_l2_update, opaque)) {
+if (!bdrv_aio_flush(s->bs->file, qed_aio_write_l2_update_cb, opaque)) {
 qed_aio_complete(acb, -EIO);
 }
 }
@@ -1019,7 +1063,7 @@ static void qed_aio_write_main(void *opaque, int ret)
 if (s->bs->backing_hd) {
 next_fn = qed_aio_write_flush_before_l2_update;
 } else {
-next_fn = qed_aio_write_l2_update;
+next_fn = qed_aio_write_l2_update_cb;
 }
 }
 
@@ -1081,6 +1125,18 @@ static bool qed_should_set_need_check(BDRVQEDState *s)
 return !(s->header.features & QED_F_NEED_CHECK);
 }
 
+static void qed_aio_write_zero_cluster(void *opaque, int ret)
+{
+QEDAIOCB *acb = opaque;
+
+if (ret) {
+qed_aio_complete(acb, ret);
+return;
+}
+
+qed_aio_write_l2_update(acb, 0, 1);
+}
+
 /**
  * Write new data cluster
  *
@@ -1092,6 +1148,7 @@ static bool qed_should_set_need_check(BDRVQEDState *s)
 static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
 {
 BDRVQEDState *s = acb_to_s(acb);
+BlockDriverCompletionFunc *cb;
 
 /* Cancel timer when the first allocating request comes in */
 if (QSIMPLEQ_EMPTY(&s->allocating_write_reqs)) {
@@ -1109,14 +1166,21 @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t 
len)
 
 acb->cur_nclusters = qed_bytes_to_clusters(s,
 qed_offset_into_cluster(s, acb->cur_pos) + len);
-acb->cur_cluster = qed_alloc_clusters(s, acb->cur_nclusters);
 qemu_iovec_copy(&acb->cur_qiov, acb->qiov, acb->qiov_offset, len);
 
+/* Zero write detection */
+if (s->bs->zero_detection && qed_is_zero_write(acb)) {
+cb = qed_aio_write_zero_cluster;
+} else {
+cb = qed_aio_write_prefill;
+acb->cur_cluster = qed_alloc_clusters(s, acb->cur_nclusters);
+}
+
 if (qed_should_set_need_check(s)) {
 s->header.features |= QED_F_NEED_CHECK;
-qed_write_header(s, qed_aio_write_prefill, acb);
+qed_write_header(s, cb, acb);
 } else {
-qed_aio_write_prefill(acb, 0);
+cb(acb, 0);
 }
 }
 
-- 
1.7.7.3

[Qemu-devel] [PATCH v2 3/3] qemu-io: add zero write detection option

Add a -z option to qemu-io and the 'open' command to enable zero write
detection.  This is used by the qemu-iotests 029 test case and allows
scripts to exercise zero write detection.

Signed-off-by: Stefan Hajnoczi 
---
 qemu-io.c |   29 ++---
 1 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index ffa62fb..1568870 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -1594,7 +1594,7 @@ static const cmdinfo_t close_cmd = {
 .oneline= "close the current open file",
 };
 
-static int openfile(char *name, int flags, int growable)
+static int openfile(char *name, int flags, int growable, int detect_zeroes)
 {
 if (bs) {
 fprintf(stderr, "file open already, try 'help close'\n");
@@ -1617,6 +1617,10 @@ static int openfile(char *name, int flags, int growable)
 }
 }
 
+if (detect_zeroes) {
+bdrv_enable_zero_detection(bs);
+}
+
 return 0;
 }
 
@@ -1634,6 +1638,7 @@ static void open_help(void)
 " -s, -- use snapshot file\n"
 " -n, -- disable host cache\n"
 " -g, -- allow file to grow (only applies to protocols)"
+" -z  -- use zero write detection (supported formats only)\n"
 "\n");
 }
 
@@ -1646,7 +1651,7 @@ static const cmdinfo_t open_cmd = {
 .argmin = 1,
 .argmax = -1,
 .flags  = CMD_NOFILE_OK,
-.args   = "[-Crsn] [path]",
+.args   = "[-Crsnz] [path]",
 .oneline= "open the file specified by path",
 .help   = open_help,
 };
@@ -1656,9 +1661,10 @@ static int open_f(int argc, char **argv)
 int flags = 0;
 int readonly = 0;
 int growable = 0;
+int detect_zeroes = 0;
 int c;
 
-while ((c = getopt(argc, argv, "snrg")) != EOF) {
+while ((c = getopt(argc, argv, "snrgz")) != EOF) {
 switch (c) {
 case 's':
 flags |= BDRV_O_SNAPSHOT;
@@ -1672,6 +1678,9 @@ static int open_f(int argc, char **argv)
 case 'g':
 growable = 1;
 break;
+case 'z':
+detect_zeroes = 1;
+break;
 default:
 return command_usage(&open_cmd);
 }
@@ -1685,7 +1694,7 @@ static int open_f(int argc, char **argv)
 return command_usage(&open_cmd);
 }
 
-return openfile(argv[optind], flags, growable);
+return openfile(argv[optind], flags, growable, detect_zeroes);
 }
 
 static int init_args_command(int index)
@@ -1712,7 +1721,7 @@ static int init_check_command(const cmdinfo_t *ct)
 static void usage(const char *name)
 {
 printf(
-"Usage: %s [-h] [-V] [-rsnm] [-c cmd] ... [file]\n"
+"Usage: %s [-h] [-V] [-rsnmkz] [-c cmd] ... [file]\n"
 "QEMU Disk exerciser\n"
 "\n"
 "  -c, --cmdcommand to execute\n"
@@ -1722,6 +1731,7 @@ static void usage(const char *name)
 "  -g, --growable   allow file to grow (only applies to protocols)\n"
 "  -m, --misalign   misalign allocations for O_DIRECT\n"
 "  -k, --native-aio use kernel AIO implementation (on Linux only)\n"
+"  -z, --detect-zeroes  use zero write detection (supported formats only)\n"
 "  -h, --help   display this help and exit\n"
 "  -V, --versionoutput version information and exit\n"
 "\n",
@@ -1733,7 +1743,8 @@ int main(int argc, char **argv)
 {
 int readonly = 0;
 int growable = 0;
-const char *sopt = "hVc:rsnmgk";
+int detect_zeroes = 0;
+const char *sopt = "hVc:rsnmgkz";
 const struct option lopt[] = {
 { "help", 0, NULL, 'h' },
 { "version", 0, NULL, 'V' },
@@ -1745,6 +1756,7 @@ int main(int argc, char **argv)
 { "misalign", 0, NULL, 'm' },
 { "growable", 0, NULL, 'g' },
 { "native-aio", 0, NULL, 'k' },
+{ "detect-zeroes", 0, NULL, 'z' },
 { NULL, 0, NULL, 0 }
 };
 int c;
@@ -1776,6 +1788,9 @@ int main(int argc, char **argv)
 case 'k':
 flags |= BDRV_O_NATIVE_AIO;
 break;
+case 'z':
+detect_zeroes = 1;
+break;
 case 'V':
 printf("%s version %s\n", progname, VERSION);
 exit(0);
@@ -1825,7 +1840,7 @@ int main(int argc, char **argv)
 }
 
 if ((argc - optind) == 1) {
-openfile(argv[optind], flags, growable);
+openfile(argv[optind], flags, growable, detect_zeroes);
 }
 command_loop();
 
-- 
1.7.7.3

[Qemu-devel] [PATCH v2 0/3] block: zero write detection

This series adds an interface for optimized writes when data contains all
zeros. If zero detection is enabled a block driver can take extra steps to
represent zero regions efficiently.

The details of optimized zero representations depend on the image format but
the main block layer change is a field to indicate that zero detection should
be performed. (This can be CPU intensive since buffers are scanned for all
zeros and the feature is disabled by default.)

This series includes a patch for the QED image format that writes special "zero
clusters" and keeps the image file compact. In the future qcow2v3 could also
support an efficient zero representation.

My motivation for this feature is efficient image streaming. The destination
file must stay compact and not be bloated with clusters containing only zeroes
from the source file.

Kevin Wolf has suggested zero detection in the source image
to avoid scanning for zeroes when the source image has an efficient zero
representation. This is a complementary feature which can be introduced in the
future but does not work for raw files over NFS, for example. I see the
current series as the minimum support we need with extensions possible in the
future.

Here is a qemu-iotest to verify that zero write detection is working:
http://repo.or.cz/w/qemu-iotests/stefanha.git/commitdiff/226949695eef51bdcdea3e6ce3d7e5a863427f37

Stefan Hajnoczi (3):
block: add zero write detection interface
qed: add zero write detection support
qemu-io: add zero write detection option

--
1.7.7.3

Re: [Qemu-devel] [PATCH] qemu-img rebase: Fix for undersized backing files

On Wed, Dec 7, 2011 at 12:02 PM, Kevin Wolf  wrote:
> Am 07.12.2011 12:50, schrieb Stefan Hajnoczi:
>> On Wed, Dec 7, 2011 at 11:45 AM, Kevin Wolf  wrote:
>>> Backing files may be smaller than the corresponding COW file. When
>>> reading directly from the backing file, qemu-img rebase must consider
>>> this and assume zero sectors after the end of backing files.
>>>
>>> Signed-off-by: Kevin Wolf 
>>> ---
>>>  qemu-img.c |   42 +-
>>>  1 files changed, 33 insertions(+), 9 deletions(-)
>>
>> Reviewed-by: Stefan Hajnoczi 
>>
>> It almost feels like we want a bdrv_read() variation that does the
>> zeroing beyond end of image instead of duplicating this.
>
> Almost. If we have a third user, it's probably worth it.
>
> Actually, if we make a bdrv_co_readv_backing() instead of a synchronous
> one, qcow2 would be the third user (and maybe VMDK could use it, too).
> That would require moving qemu-img into a coroutine first, which I think
> you wanted to do anyway?

If we can make qemu-img the last user of synchronous interfaces then
it would be nice to convert it.  Right now I'm happy that we're
solidifying coroutines and converting synchronous hardware emulation,
hopefully we'll get to the point where there is no real reason to keep
the synchronous API around anymore.

Stefan

Re: [Qemu-devel] [PATCH] qemu-img rebase: Fix for undersized backing files

Am 07.12.2011 12:50, schrieb Stefan Hajnoczi:
> On Wed, Dec 7, 2011 at 11:45 AM, Kevin Wolf  wrote:
>> Backing files may be smaller than the corresponding COW file. When
>> reading directly from the backing file, qemu-img rebase must consider
>> this and assume zero sectors after the end of backing files.
>>
>> Signed-off-by: Kevin Wolf 
>> ---
>>  qemu-img.c |   42 +-
>>  1 files changed, 33 insertions(+), 9 deletions(-)
> 
> Reviewed-by: Stefan Hajnoczi 
> 
> It almost feels like we want a bdrv_read() variation that does the
> zeroing beyond end of image instead of duplicating this.

Almost. If we have a third user, it's probably worth it.

Actually, if we make a bdrv_co_readv_backing() instead of a synchronous
one, qcow2 would be the third user (and maybe VMDK could use it, too).
That would require moving qemu-img into a coroutine first, which I think
you wanted to do anyway?

Kevin

Re: [Qemu-devel] [PATCH] qemu-img rebase: Fix for undersized backing files

On Wed, Dec 7, 2011 at 11:45 AM, Kevin Wolf  wrote:
> Backing files may be smaller than the corresponding COW file. When
> reading directly from the backing file, qemu-img rebase must consider
> this and assume zero sectors after the end of backing files.
>
> Signed-off-by: Kevin Wolf 
> ---
>  qemu-img.c |   42 +-
>  1 files changed, 33 insertions(+), 9 deletions(-)

Reviewed-by: Stefan Hajnoczi 

It almost feels like we want a bdrv_read() variation that does the
zeroing beyond end of image instead of duplicating this.

Stefan

Re: [Qemu-devel] [PATCH 4/4] ccid: make threads joinable

2011-12-07 Thread Alon Levy

On Tue, Dec 06, 2011 at 06:05:55PM +0100, Paolo Bonzini wrote:
> Destroying a mutex that another thread might have just unlocked
> is racy.  It usually works, but you cannot do that in general and
> can lead to deadlocks or segfaults.  Change ccid to use joinable
> threads instead.
> 

Looks good to me.

Reviewed-by: Alon Levy 

> (Also, qemu_mutex_init/qemu_cond_init were missing).
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  hw/ccid-card-emulated.c |   26 +++---
>  1 files changed, 11 insertions(+), 15 deletions(-)
> 
> diff --git a/hw/ccid-card-emulated.c b/hw/ccid-card-emulated.c
> index 9fe9db5..2d2ebce 100644
> --- a/hw/ccid-card-emulated.c
> +++ b/hw/ccid-card-emulated.c
> @@ -120,6 +120,7 @@ struct EmulatedState {
>  uint8_t  atr_length;
>  QSIMPLEQ_HEAD(event_list, EmulEvent) event_list;
>  QemuMutex event_list_mutex;
> +QemuThread event_thread_id;
>  VReader *reader;
>  QSIMPLEQ_HEAD(guest_apdu_list, EmulEvent) guest_apdu_list;
>  QemuMutex vreader_mutex; /* and guest_apdu_list mutex */
> @@ -127,8 +128,7 @@ struct EmulatedState {
>  QemuCond handle_apdu_cond;
>  int  pipe[2];
>  int  quit_apdu_thread;
> -QemuMutex apdu_thread_quit_mutex;
> -QemuCond apdu_thread_quit_cond;
> +QemuThread apdu_thread_id;
>  };
>  
>  static void emulated_apdu_from_guest(CCIDCardState *base,
> @@ -271,9 +271,6 @@ static void *handle_apdu_thread(void* arg)
>  }
>  qemu_mutex_unlock(&card->vreader_mutex);
>  }
> -qemu_mutex_lock(&card->apdu_thread_quit_mutex);
> -qemu_cond_signal(&card->apdu_thread_quit_cond);
> -qemu_mutex_unlock(&card->apdu_thread_quit_mutex);
>  return NULL;
>  }
>  
> @@ -489,7 +486,6 @@ static uint32_t parse_enumeration(char *str,
>  static int emulated_initfn(CCIDCardState *base)
>  {
>  EmulatedState *card = DO_UPCAST(EmulatedState, base, base);
> -QemuThread thread_id;
>  VCardEmulError ret;
>  EnumTable *ptable;
>  
> @@ -541,9 +537,10 @@ static int emulated_initfn(CCIDCardState *base)
>  printf("%s: failed to initialize vcard\n", EMULATED_DEV_NAME);
>  return -1;
>  }
> -qemu_thread_create(&thread_id, event_thread, card, QEMU_THREAD_DETACHED);
> -qemu_thread_create(&thread_id, handle_apdu_thread, card,
> -   QEMU_THREAD_DETACHED);
> +qemu_thread_create(&card->event_thread_id, event_thread, card,
> +   QEMU_THREAD_JOINABLE);
> +qemu_thread_create(&card->apdu_thread_id, handle_apdu_thread, card,
> +   QEMU_THREAD_JOINABLE);
>  return 0;
>  }
>  
> @@ -553,15 +550,14 @@ static int emulated_exitfn(CCIDCardState *base)
>  VEvent *vevent = vevent_new(VEVENT_LAST, NULL, NULL);
>  
>  vevent_queue_vevent(vevent); /* stop vevent thread */
> -qemu_mutex_lock(&card->apdu_thread_quit_mutex);
> +qemu_thread_join(&card->event_thread_id);
> +
>  card->quit_apdu_thread = 1; /* stop handle_apdu thread */
>  qemu_cond_signal(&card->handle_apdu_cond);
> -qemu_cond_wait(&card->apdu_thread_quit_cond,
> -  &card->apdu_thread_quit_mutex);
> -/* handle_apdu thread stopped, can destroy all of it's mutexes */
> +qemu_thread_join(&card->apdu_thread_id);
> +
> +/* threads exited, can destroy all condvars/mutexes */
>  qemu_cond_destroy(&card->handle_apdu_cond);
> -qemu_cond_destroy(&card->apdu_thread_quit_cond);
> -qemu_mutex_destroy(&card->apdu_thread_quit_mutex);
>  qemu_mutex_destroy(&card->handle_apdu_mutex);
>  qemu_mutex_destroy(&card->vreader_mutex);
>  qemu_mutex_destroy(&card->event_list_mutex);
> -- 
> 1.7.7.1
>

[Qemu-devel] [PATCH] qemu-img rebase: Fix for undersized backing files

Backing files may be smaller than the corresponding COW file. When
reading directly from the backing file, qemu-img rebase must consider
this and assume zero sectors after the end of backing files.

Signed-off-by: Kevin Wolf 
---
 qemu-img.c |   42 +-
 1 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 8bdae66..01cc0d3 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1420,6 +1420,8 @@ static int img_rebase(int argc, char **argv)
  */
 if (!unsafe) {
 uint64_t num_sectors;
+uint64_t old_backing_num_sectors;
+uint64_t new_backing_num_sectors;
 uint64_t sector;
 int n;
 uint8_t * buf_old;
@@ -1430,6 +1432,8 @@ static int img_rebase(int argc, char **argv)
 buf_new = qemu_blockalign(bs, IO_BUF_SIZE);
 
 bdrv_get_geometry(bs, &num_sectors);
+bdrv_get_geometry(bs_old_backing, &old_backing_num_sectors);
+bdrv_get_geometry(bs_new_backing, &new_backing_num_sectors);
 
 local_progress = (float)100 /
 (num_sectors / MIN(num_sectors, IO_BUF_SIZE / 512));
@@ -1448,16 +1452,36 @@ static int img_rebase(int argc, char **argv)
 continue;
 }
 
-/* Read old and new backing file */
-ret = bdrv_read(bs_old_backing, sector, buf_old, n);
-if (ret < 0) {
-error_report("error while reading from old backing file");
-goto out;
+/*
+ * Read old and new backing file and take into consideration that
+ * backing files may be smaller than the COW image.
+ */
+if (sector >= old_backing_num_sectors) {
+memset(buf_old, 0, n * BDRV_SECTOR_SIZE);
+} else {
+if (sector + n > old_backing_num_sectors) {
+n = old_backing_num_sectors - sector;
+}
+
+ret = bdrv_read(bs_old_backing, sector, buf_old, n);
+if (ret < 0) {
+error_report("error while reading from old backing file");
+goto out;
+}
 }
-ret = bdrv_read(bs_new_backing, sector, buf_new, n);
-if (ret < 0) {
-error_report("error while reading from new backing file");
-goto out;
+
+if (sector >= new_backing_num_sectors) {
+memset(buf_new, 0, n * BDRV_SECTOR_SIZE);
+} else {
+if (sector + n > new_backing_num_sectors) {
+n = new_backing_num_sectors - sector;
+}
+
+ret = bdrv_read(bs_new_backing, sector, buf_new, n);
+if (ret < 0) {
+error_report("error while reading from new backing file");
+goto out;
+}
 }
 
 /* If they differ, we need to write to the COW file */
-- 
1.7.6.4

Re: [Qemu-devel] [PATCH 12/14] SD card: add query function to check wether SD card currently ready to recieve data Before executing data transfer to card, we must check that previously issued command

On 7 December 2011 09:47, Evgeny Voevodin  wrote:
> From: Mitsyanko Igor 

Shouldn't we have a Signed-off-by: from this person as well?

There's a missing blank line in the commit message which
has put all of it into the Subject.

> Signed-off-by: Evgeny Voevodin 
> ---
>  hw/sd.c |    5 +
>  hw/sd.h |    1 +
>  2 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/hw/sd.c b/hw/sd.c
> index 10e26ad..ddb9d39 100644
> --- a/hw/sd.c
> +++ b/hw/sd.c
> @@ -1670,3 +1670,8 @@ void sd_enable(SDState *sd, int enable)
>  {
>     sd->enable = enable;
>  }
> +
> +int sd_recieve_ready(SDState *sd)
> +{
> +    return sd->state == sd_receivingdata_state;
> +}

"receive" is spelt "ei", not "ie" -- please fix the function
name.

-- PMM

Re: [Qemu-devel] [PATCH 09/14] hw/lan9118.c: Basic byte/word/long access support.

On 7 December 2011 10:58, Evgeny Voevodin  wrote:
> On 12/07/2011 02:09 PM, Peter Maydell wrote:
>>
>> On 7 December 2011 09:47, Evgeny Voevodin  wrote:
>>>
>>> We included this chip into s5pc210 platform because SMDK board holds
>>> lan9215 chip. Difference is that 9215 access is 16-bit wide and some
>>> registers differ. By addition basic 16-bit access to 9118 emulation we
>>> achieved ethernet controller support by Linux lernel on SMDK boards.
>>
>>
>> If it differs then shouldn't we add a new qdev device for 9215 ?
>> (sharing most of the implementation code, obviously)
>>
>
> This patch could be interpreted as lan9118 emulation expansion since this
> chip supports 16-bit access too. These changes don't cover all the
> difference between 9118 and 9215, but it's enough to provide network support
> to Samsung boards. When 9215 support will be added we can easily switch to
> this chip.

If you're adding a legitimate missing feature (16 bit access support) to
lan9118 then your commit message should say so, not talk about 9215.
The right place to say "this should be a 9215 but the 9118 is close enough"
is in the patch adding the network chip to the board model, not in the
commit adding a feature to the network chip model.

We should probably make 16 vs 32 bit be a qdev property of lan9118
(corresponding to the way the hardware is configured by an input pin
on startup) and reflect this properly in the HW_CFG register.

Also, this todo comment is too obscure:
+/* TODO: Implement fair word operation, because this implementation
+ * assumes that any register is accessed as two 16-bit operations. */

I have no idea what it means.

-- PMM

Re: [Qemu-devel] [Bug 899140] Re: Problem with Linux Kernel Traffic Control

2011-12-07 Thread Vincent Autefage

Well

I've compiled the ubuntu package.
When I've launched qemu, I've got this :
*
*$ *qemu-system-x86_64 -hda debian.img -m 512
qemu: could not load PC BIOS 'bios.bin'
**
*I've checked the content of the *pc-bios* directory and no bios are 
generated but I've got strange file like :
**.bin
*.dtb
openbios-*
*
I think that the *configure* interprets the *** as a base character...
Therefore, I've copied the content of*pc-bios* directory of 0.15.1 in 
the *pc-bios* directory of 0.14.0

Finally, the bug of rate have disappeared !!
*Iperf* gave me a rate of 19mbit which is the desired rate.

Vincent

Le 05/12/2011 12:11, Stefan Hajnoczi a écrit :
> On Mon, Dec 5, 2011 at 10:45 AM, Vincent Autefage
> <899...@bugs.launchpad.net>  wrote:
>> So we have another problem...
>> The thing is that the 0.14.0 (and all 0.14.0 rc) built from GIT has the
>> same problem.
>> However, the package 0.14.0 from Ubuntu does not has this bug...
> Okay, that's actually a good thing because the issue is now isolated
> to two similar builds: 0.14.0 from source and 0.14.0 from Ubuntu.
>
> Either there is an environmental difference in the build configuration
> or Ubuntu has applied patches on top of vanilla 0.14.0.
>
> I think the next step is to grab the Ubuntu 0.14.0 source package and
> rebuild it to confirm that it does *not* have the bug.
>
> Then it's just a matter of figuring out what the difference is by a
> (manual) bisection.
>
> Are you using qemu-kvm?  I found Ubuntu's 0.14.0-based package here:
> http://packages.ubuntu.com/natty/qemu-kvm
>
> Stefan
>

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/899140

Title:
  Problem with Linux Kernel Traffic Control

Status in QEMU:
  New

Bug description:
  Hi,

  The last main versions of QEMU (0.14.1, 0.15 and 1.0) have an important 
problem when running on a Linux distribution which running itself a Traffic 
Control (TC) instance.
  Indeed, when TC is configured with a Token Bucket Filter (TBF) with a 
particular rate, the effective rate is very slower than the desired one.

  For instance, lets consider the following configuration :

  # tc qdisc add dev eth0 root tbf rate 20mbit burst 20k latency 50ms

  The effective rate will be about 100kbit/s ! (verified with iperf)
  I've encountered this problem on versions 0.14.1, 0.15 and 1.0 but not with 
the 0.14.0...
  In the 0.14.0, we have a rate of 19.2 mbit/s which is quiet normal.

  I've done the experimentation on several hosts :

  - Debian 32bit core i7, 4GB RAM
  - Debian 64bit core i7, 8GB RAM
  - 3 different high performance servers : Ubuntu 64 bits, 48 AMD Opteron, 
128GB of RAM

  The problem is always the same... The problem is also seen with a
  Class Based Queuing (CBQ) in TC.

  Thanks

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/899140/+subscriptions

Re: [Qemu-devel] [PATCH 01/14] ARM: s5pc210: Basic support of s5pc210 boards