date:20120925

Re: [Qemu-devel] [PATCH 0/2] add pci-serial device.

2012-09-25 Thread Gerd Hoffmann

  Hi,

>> The only reason I ask is whether this is something we can add
>> new features to.  I can't think of one off hand, but it can't
>> hurt to work this out up front.
> 
> Multiport e.g. (to save PCI slots). There was some proposal
> recently to add a model of an real multiport PCI card, just don't
> find the mail right now...

Easy enough to add.  Should be a separate device with its own pci id
though.  According to the linux source code there seem to be two
common ways to implement this:  A single, large pci bar where all
uarts are mapped one after the other.  Or one pci bar for each uart.
Now I need to figure which is easier to handle for windows guests ...

cheers,
  Gerd

Re: [Qemu-devel] [PATCH 09/17] aio: prepare for introducing GSource-based dispatch

2012-09-25 Thread Paolo Bonzini

Il 26/09/2012 00:01, Anthony Liguori ha scritto:
> > +revents = node->pfd.revents & node->pfd.events;
> > +node->pfd.revents &= ~revents;
> 
> This is interesting and I must admit I don't understand why it's
> necessary.  What case are you trying to handle?

That's for the case where you got a write event for fd Y, but disabled
the write handler from the handler of fd X (called before fd Y).  But
what the current code does is just eat the event, so I can do the same
and set node->pfd.revents to 0.

Paolo

Re: [Qemu-devel] [PATCH 0/2] add pci-serial device.

2012-09-25 Thread Gerd Hoffmann

On 09/26/12 01:43, Anthony Liguori wrote:
> Gerd Hoffmann  writes:
> 
>>   Hi,
>>
>> Two patches, first split up serial.c a bit,
>> then actually add the pci-based serial device.
> 
> The series looks good to me.  A couple requests:
> 
> 1) Could you add a spec describing this new PCI device?  Doesn't need to
>be more than a couple paragraphs since the device is super simple.

Well, it is pretty strait forward:  A single IO bar, 8 bytes in size,
where the 16550 uart is mapped to:

[kraxel@fedora ~]$ lspci -vse
00:0e.0 Serial controller: Red Hat, Inc. Device 0002 (rev 01) (prog-if
00 [8250])
Subsystem: Red Hat, Inc Device 1100
Physical Slot: 14
Flags: fast devsel, IRQ 11
I/O ports at c130 [size=8]
Kernel driver in use: serial

But I can surely add a comment about it.

> 2) Could you make the inf file an separate patch and either include
>documentation in the commit message on how to use it with Windows or
>just add a comment to the inf file?

I think a comment is better, easier to find than a commit message.  Will do.

> This is a new PCI space for QEMU too.  

It isn't new, I just followed what the pci bridge is doing (which has
1b36:0001).

> Is this a driver that is "owned"
> by QEMU and Red Hat is donating the PCI id or is this a driver that RH
> controls that we're implementing?

I consider it being owned by qemu.

cheers,
  Gerd

Re: [Qemu-devel] [PATCH 11/17] aio: make AioContexts GSources

2012-09-25 Thread Paolo Bonzini

Il 26/09/2012 00:06, Anthony Liguori ha scritto:
>> >  if (node) {
>> > +g_source_remove_poll(&ctx->source, &node->pfd);
>> > +
> Why remove vs. setting events = 0?

Because otherwise you'd get a dangling pointer to node->pfd. :)

Paolo

> add_poll/remove_poll also comes with an event loop notify which I don't
> think is strictly necessary here.
>

Re: [Qemu-devel] [PATCH] target-xtensa: de-optimize EXTUI

2012-09-25 Thread Aurelien Jarno

On Wed, Sep 26, 2012 at 03:05:18AM +0400, Max Filippov wrote:
> On Wed, Sep 26, 2012 at 2:57 AM, Aurelien Jarno  wrote:
> > Now that and with 0xff, 0x and 0x is optimized in
> > tcg/tcg-op.h, there is no need to do it in target-xtensa/translate.c.
> >
> > Cc: Max Filippov 
> > Signed-off-by: Aurelien Jarno 
> > ---
> >  target-xtensa/translate.c |   15 +--
> >  1 file changed, 1 insertion(+), 14 deletions(-)
> >
> > diff --git a/target-xtensa/translate.c b/target-xtensa/translate.c
> > index ba3ffcb..c1358ee 100644
> > --- a/target-xtensa/translate.c
> > +++ b/target-xtensa/translate.c
> > @@ -1835,20 +1835,7 @@ static void disas_xtensa_insn(DisasContext *dc)
> >  } else {
> >  tcg_gen_mov_i32(tmp, cpu_R[RRR_T]);
> >  }
> 
> I guess shri above may be de-optimized as well.
> In any case Acked-by: Max Filippov 

Good catch, I looked for some patterns in the targets code, and didn't
see this one. I'll send an updated patch.

> > -
> > -switch (maskimm) {
> > -case 0xff:
> > -tcg_gen_ext8u_i32(cpu_R[RRR_R], tmp);
> > -break;
> > -
> > -case 0x:
> > -tcg_gen_ext16u_i32(cpu_R[RRR_R], tmp);
> > -break;
> > -
> > -default:
> > -tcg_gen_andi_i32(cpu_R[RRR_R], tmp, maskimm);
> > -break;
> > -}
> > +tcg_gen_andi_i32(cpu_R[RRR_R], tmp, maskimm);
> >  tcg_temp_free(tmp);
> >  }
> >  break;
> > --
> > 1.7.10.4
> >
> >
> 
> -- 
> Thanks.
> -- Max
> 
> 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH 14/17] main-loop: use GSource to poll AIO file descriptors

2012-09-25 Thread Paolo Bonzini

Il 26/09/2012 00:09, Anthony Liguori ha scritto:
> What do you think about deprecating bottom halves in the !block code in
> favor of idle functions?  I don't see any reason to keep using bottom
> halves...

The ptimer.c code uses bottom halves internally.  Otherwise I'd agree.

Paolo

> Reviewed-by: Anthony Liguori

Re: [Qemu-devel] [PATCH 3/6] target-ppc: Extend FPU state for newer POWER CPUs

2012-09-25 Thread Aurelien Jarno

On Wed, Sep 26, 2012 at 01:12:18PM +1000, David Gibson wrote:
> This patch adds some extra FPU state to CPUPPCState.  Specifically, fpscr
> is extended to 64 bits, since some recent CPUs now have more status bits
> than fit inside 64 bits, and we add the 32 VSR registers present on CPUs
> with VSX (these extend the standard FP regs, which together with the
> Altivec/VMX registers form a 64 x 128bit register file for VSX).
> 
> We don't actually support the instructions using these extra registers in
> TCG yet, but we still a place to store the state so we can sync it with
> KVM and savevm/loadvm it.  This patch updates the savevm code to not
> fail on the extended state, but also does not actually save it - that's
> a project for another patch.
> 
> Signed-off-by: David Gibson 
> ---
>  target-ppc/cpu.h   |4 +++-
>  target-ppc/machine.c   |8 ++--
>  target-ppc/translate.c |2 +-
>  3 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index faf4404..846778f 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -963,7 +963,7 @@ struct CPUPPCState {
>  /* floating point registers */
>  float64 fpr[32];
>  /* floating point status and control register */
> -uint32_t fpscr;
> +uint64_t fpscr;

This will break the TCG code, as fpscr is mapped as an i32 in TCG. Also
if it is 64-bit only on PPC64 machines, it might be a good idea to
change it to target_ulong instead, and use _tl in the TCG code.

>  /* Next instruction pointer */
>  target_ulong nip;
> @@ -1014,6 +1014,8 @@ struct CPUPPCState {
>  /* Altivec registers */
>  ppc_avr_t avr[32];
>  uint32_t vscr;
> +/* VSX registers */
> +uint64_t vsr[32];
>  /* SPE registers */
>  uint64_t spe_acc;
>  uint32_t spe_fscr;
> diff --git a/target-ppc/machine.c b/target-ppc/machine.c
> index 21ce757..5e7bc00 100644
> --- a/target-ppc/machine.c
> +++ b/target-ppc/machine.c
> @@ -6,6 +6,7 @@ void cpu_save(QEMUFile *f, void *opaque)
>  {
>  CPUPPCState *env = (CPUPPCState *)opaque;
>  unsigned int i, j;
> +uint32_t fpscr;
>  
>  for (i = 0; i < 32; i++)
>  qemu_put_betls(f, &env->gpr[i]);
> @@ -30,7 +31,8 @@ void cpu_save(QEMUFile *f, void *opaque)
>  u.d = env->fpr[i];
>  qemu_put_be64(f, u.l);
>  }
> -qemu_put_be32s(f, &env->fpscr);
> +fpscr = env->fpscr;
> +qemu_put_be32s(f, &fpscr);
>  qemu_put_sbe32s(f, &env->access_type);
>  #if defined(TARGET_PPC64)
>  qemu_put_betls(f, &env->asr);
> @@ -90,6 +92,7 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
>  CPUPPCState *env = (CPUPPCState *)opaque;
>  unsigned int i, j;
>  target_ulong sdr1;
> +uint32_t fpscr;
>  
>  for (i = 0; i < 32; i++)
>  qemu_get_betls(f, &env->gpr[i]);
> @@ -114,7 +117,8 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
>  u.l = qemu_get_be64(f);
>  env->fpr[i] = u.d;
>  }
> -qemu_get_be32s(f, &env->fpscr);
> +qemu_get_be32s(f, &fpscr);
> +env->fpscr = fpscr;
>  qemu_get_sbe32s(f, &env->access_type);
>  #if defined(TARGET_PPC64)
>  qemu_get_betls(f, &env->asr);
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index ac915cc..c8122b7 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -9463,7 +9463,7 @@ void cpu_dump_state (CPUPPCState *env, FILE *f, 
> fprintf_function cpu_fprintf,
>  if ((i & (RFPL - 1)) == (RFPL - 1))
>  cpu_fprintf(f, "\n");
>  }
> -cpu_fprintf(f, "FPSCR %08x\n", env->fpscr);
> +cpu_fprintf(f, "FPSCR %08" PRIx64 "\n", env->fpscr);
>  #if !defined(CONFIG_USER_ONLY)
>  cpu_fprintf(f, " SRR0 " TARGET_FMT_lx "  SRR1 " TARGET_FMT_lx
> "PVR " TARGET_FMT_lx " VRSAVE " TARGET_FMT_lx "\n",
> -- 
> 1.7.10.4
> 
> 
> 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH 09/17] aio: prepare for introducing GSource-based dispatch

2012-09-25 Thread Paolo Bonzini

Il 26/09/2012 00:01, Anthony Liguori ha scritto:
>> > +node->pfd.events = G_IO_ERR;
>> > +node->pfd.events |= (io_read ? G_IO_IN | G_IO_HUP : 0);
>> > +node->pfd.events |= (io_write ? G_IO_OUT : 0);
>> >  }
> Should we even set G_IO_ERR?  I think that corresponds to exceptfd

No, that would be G_IO_PRI.

> in select() but we've never set that historically.  I know glib recommends
> it but I don't think it's applicable to how we use it.
> 
> Moreover, the way you do dispatch, if G_IO_ERR did occur, we'd dispatch
> both the read and write handlers which definitely isn't right.

I'm not sure what gives POLLERR.  Probably a connect() that fails, and
in that case dispatching on the write handler would be okay.  But I was
not sure and calling both is safe: handlers have to be ready for
spurious wakeups anyway, it happens if qemu_aio_wait dispatches from a
VCPU thread before the main loop gets hold of the big lock.

> I think it's easiest just to drop it.

That's indeed the case, since the current code never sets either
G_IO_HUP or G_IO_ERR.

Paolo

Re: [Qemu-devel] [PATCH] usb: Fix usb_packet_map() in the presence of IOMMUs

2012-09-25 Thread Gerd Hoffmann

On 09/26/12 04:59, David Gibson wrote:
> With the IOMMU infrastructure introduced before 1.2, we need to use
> dma_memory_map() to obtain a qemu pointer to memory from an IO bus address.
> However, dma_memory_map() alters the given length to reflect the length
> over which the used DMA translation is valid - which could be either more
> or less than the requested length.
> 
> usb_packet_map() does not correctly handle these cases, simply failing if
> dma_memory_map() alters the requested length.  If dma_memory_map()
> increased the length, we just need to use the requested length for the
> qemu_iovec_add().  However, if it decreased the length, it means that a
> single DMA translation is not valid for the whole sglist element, and so
> we need to loop, splitting it up into multiple iovec entries for each
> piece with a DMA translation (in practice >2 pieces is unlikely).
> 
> This patch implements the correct behaviour

Patch added to usb patch queue.

thanks,
  Gerd

Re: [Qemu-devel] [PATCH 06/17] aio: introduce AioContext, move bottom halves there

2012-09-25 Thread Paolo Bonzini

Il 25/09/2012 23:51, Anthony Liguori ha scritto:
>> >  typedef struct QEMUTimer QEMUTimer;
>> >  typedef struct QEMUFile QEMUFile;
>> > +typedef struct QEMUBH QEMUBH;
>> >  typedef struct DeviceState DeviceState;
>> >  
>> >  struct Monitor;
> Any reason to do this here vs. just #include "qemu-aio.h" in
> qemu-common.h?
> 
> I don't see an obvious dependency on qemu-common.h in qemu-aio.h other
> than this typedef.

I thought we were moving away from including everything in
qemu-common.h.  In fact, the only includes from QEMU in qemu-common.h are:

#ifdef _WIN32
#include "qemu-os-win32.h"
#endif

#ifdef CONFIG_POSIX
#include "qemu-os-posix.h"
#endif

#include "osdep.h"
#include "bswap.h"

#include "cpu.h"

where cpu.h could probably be removed---perhaps should.

Paolo

[Qemu-devel] ARM bootloader boot blobbing.

2012-09-25 Thread Peter Crosthwaite

Hi All,

Can anyone think of a reason why the arm primary bootloader cant be
done by just direct interaction with the CPU? Currently we have this
...

/* The worlds second smallest bootloader.  Set r0-r2, then jump to kernel.  */
static uint32_t bootloader[] = {
  0xe3a0, /* mov r0, #0 */
  0xe59f1004, /* ldr r1, [pc, #4] */
  0xe59f2004, /* ldr r2, [pc, #4] */
  0xe59ff004, /* ldr pc, [pc, #4] */
  0, /* Board ID */
  0, /* Address of kernel args.  Set by integratorcp_init.  */
  0  /* Kernel entry point.  Set by integratorcp_init.  */
};

... which gets injected into RAM then we set the PC to this blob and
go. But couldnt we just set R0-2 directly from the bootloader and just
straight to the kernel entry point? Why do we have to blob in a
lightweight bootloader?

Regards,
Peter

Re: [Qemu-devel] [RfC] using pixman in qemu for raster ops

2012-09-25 Thread Gerd Hoffmann

  Hi,

> $ yum -C search pixman
> Loaded plugins: downloadonly, fastestmirror
> === Matched: pixman 
> 
> qpixman.i386 : Modified version of pixman for spice
> qpixman.x86_64 : Modified version of pixman for spice
> qpixman-devel.i386 : Pixel manipulation library development package
> qpixman-devel.x86_64 : Pixel manipulation library development package

Found it, it is in the "RHEL Optional Productivity Apps" channel.

cheers,
  Gerd

[Qemu-devel] [0/6] Pending pseries updates

2012-09-25 Thread David Gibson

Hi Alex,

Here's another batch of updates for pseries, some of which affect
wider target-ppc code.  I have sent a few of these before, but I don't
believe any have made it into ppc-next so far.  5/6 is an important
bugfix we've discussed before, which I've CCed to qemu-stable.

[Qemu-devel] [PATCH 1/6] pseries: Set hash table size based on RAM size

2012-09-25 Thread David Gibson

Currently the pseries machine code always attempts to set the size of the
guests's hash page table to 16MB.  However, because of the way the POWER
MMU works, a suitable hash page table size should really depend on memory
size.  16MB will be excessive for guests with <1GB and RAM, and may not be
enough for guests with >2GB of RAM (depending on guest page size and
other factors).

The usual given rule of thumb is that the hash table should be 1/64 of
the size of memory, but in fact the Linux guests we are aiming at don't
really need that much.  This patch, therefore, changes the hash table
allocation code to aim for 1/128 of the size of RAM (rounding up).  When
using KVM, this size may still be adjusted by the host kernel if it is
unable to allocate a suitable (contiguous) table.

Signed-off-by: David Gibson 

---

This patch can only be safely applied after my earlier patch
"target-ppc: KVM: Fix some kernel version edge cases for
kvmppc_reset_htab()" (already in ppc-next), otherwise the kernel can
end up advertising to the guest a hash table size different from that
actually allocated by the kernel.
---
 hw/spapr.c |   13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index 1177efa..a8bd3c1 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -725,9 +725,16 @@ static void ppc_spapr_init(ram_addr_t ram_size,
 spapr->fdt_addr = spapr->rtas_addr - FDT_MAX_SIZE;
 load_limit = spapr->fdt_addr - FW_OVERHEAD;
 
-/* For now, always aim for a 16MB hash table */
-/* FIXME: we should change this default based on RAM size */
-spapr->htab_shift = 24;
+/* We aim for a hash table of size 1/128 the size of RAM.  The
+ * normal rule of thumb is 1/64 the size of RAM, but that's much
+ * more than needed for the Linux guests we support. */
+spapr->htab_shift = 18; /* Minimum architected size */
+while (spapr->htab_shift <= 46) {
+if ((1ULL << (spapr->htab_shift + 7)) >= ram_size) {
+break;
+}
+spapr->htab_shift++;
+}
 
 /* init CPUs */
 if (cpu_model == NULL) {
-- 
1.7.10.4

[Qemu-devel] [PATCH 4/6] savevm: Add VMSTATE_ helpers for target_phys_addr_t

2012-09-25 Thread David Gibson

The savevm code contains VMSTATE_ helpers for a number of commonly used
types, but not for target_phys_addr_t.  This patch fixes that deficiency
implementing VMSTATE_TPA helpers in terms of VMSTATE_UINT32 or
VMSTATE_UINT64 helpers as appropriate.

Signed-off-by: David Gibson 
---
 targphys.h |   19 +++
 1 file changed, 19 insertions(+)

diff --git a/targphys.h b/targphys.h
index bd4938f..39c1c63 100644
--- a/targphys.h
+++ b/targphys.h
@@ -21,6 +21,13 @@ typedef uint32_t target_phys_addr_t;
 #define TARGET_PRIuPHYS PRIu32
 #define TARGET_PRIxPHYS PRIx32
 #define TARGET_PRIXPHYS PRIX32
+
+#define VMSTATE_TPA_V(_f, _s, _v) \
+VMSTATE_UINT32_V(_f, _s, _v)
+
+#define VMSTATE_TPA_EQUAL_V(_f, _s, _v) \
+VMSTATE_UINT32_EQUAL_V(_f, _s, _v)
+
 #elif TARGET_PHYS_ADDR_BITS == 64
 typedef uint64_t target_phys_addr_t;
 #define TARGET_PHYS_ADDR_MAX UINT64_MAX
@@ -31,7 +38,19 @@ typedef uint64_t target_phys_addr_t;
 #define TARGET_PRIuPHYS PRIu64
 #define TARGET_PRIxPHYS PRIx64
 #define TARGET_PRIXPHYS PRIX64
+
+#define VMSTATE_TPA_V(_f, _s, _v) \
+VMSTATE_UINT64_V(_f, _s, _v)
+
+#define VMSTATE_TPA_EQUAL_V(_f, _s, _v) \
+VMSTATE_UINT64_EQUAL_V(_f, _s, _v)
+
 #endif
 #endif
 
+#define VMSTATE_TPA(_f, _s) \
+VMSTATE_TPA_V(_f, _s, 0)
+#define VMSTATE_TPA_EQUAL(_f, _s) \
+VMSTATE_TPA_EQUAL_V(_f, _s, 0)
+
 #endif
-- 
1.7.10.4

[Qemu-devel] [PATCH 5/6] pseries: Don't test for MSR_PR for hypercalls under KVM

2012-09-25 Thread David Gibson

PAPR hypercalls should only be invoked from the guest kernel, not guest
user programs, that is, with MSR[PR]=0.  Currently we check this in
spapr_hypercall, returning H_PRIVILEGE if MSR[PR]=1.

However, under KVM the state of MSR[PR] is already checked by the host
kernel before passing the hypercall to qemu, making this check redundant.
Worse, however, we don't generally synchronize KVM and qemu state on the
hypercall path, meaning that qemu could incorrectly reject a hypercall
because it has a stale MSR value.

This patch fixes the problem by moving the privilege test exclusively to
the TCG hypercall path.

Cc: qemu-sta...@nongnu.org

Signed-off-by: David Gibson 
---
 hw/spapr.c   |7 ++-
 hw/spapr_hcall.c |5 -
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index 079825a..e6bf522 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -573,7 +573,12 @@ static uint64_t translate_kernel_address(void *opaque, 
uint64_t addr)
 
 static void emulate_spapr_hypercall(CPUPPCState *env)
 {
-env->gpr[3] = spapr_hypercall(env, env->gpr[3], &env->gpr[4]);
+if (msr_pr) {
+hcall_dprintf("Hypercall made with MSR[PR]=1\n");
+env->gpr[3] = H_PRIVILEGE;
+} else {
+env->gpr[3] = spapr_hypercall(env, env->gpr[3], &env->gpr[4]);
+}
 }
 
 static void spapr_reset_htab(sPAPREnvironment *spapr)
diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c
index 826ca67..194d9c2 100644
--- a/hw/spapr_hcall.c
+++ b/hw/spapr_hcall.c
@@ -681,11 +681,6 @@ void spapr_register_hypercall(target_ulong opcode, 
spapr_hcall_fn fn)
 target_ulong spapr_hypercall(CPUPPCState *env, target_ulong opcode,
  target_ulong *args)
 {
-if (msr_pr) {
-hcall_dprintf("Hypercall made with MSR[PR]=1\n");
-return H_PRIVILEGE;
-}
-
 if ((opcode <= MAX_HCALL_OPCODE)
 && ((opcode & 0x3) == 0)) {
 spapr_hcall_fn fn = papr_hypercall_table[opcode / 4];
-- 
1.7.10.4

[Qemu-devel] [PATCH 3/6] target-ppc: Extend FPU state for newer POWER CPUs

2012-09-25 Thread David Gibson

This patch adds some extra FPU state to CPUPPCState.  Specifically, fpscr
is extended to 64 bits, since some recent CPUs now have more status bits
than fit inside 64 bits, and we add the 32 VSR registers present on CPUs
with VSX (these extend the standard FP regs, which together with the
Altivec/VMX registers form a 64 x 128bit register file for VSX).

We don't actually support the instructions using these extra registers in
TCG yet, but we still a place to store the state so we can sync it with
KVM and savevm/loadvm it.  This patch updates the savevm code to not
fail on the extended state, but also does not actually save it - that's
a project for another patch.

Signed-off-by: David Gibson 
---
 target-ppc/cpu.h   |4 +++-
 target-ppc/machine.c   |8 ++--
 target-ppc/translate.c |2 +-
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index faf4404..846778f 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -963,7 +963,7 @@ struct CPUPPCState {
 /* floating point registers */
 float64 fpr[32];
 /* floating point status and control register */
-uint32_t fpscr;
+uint64_t fpscr;
 
 /* Next instruction pointer */
 target_ulong nip;
@@ -1014,6 +1014,8 @@ struct CPUPPCState {
 /* Altivec registers */
 ppc_avr_t avr[32];
 uint32_t vscr;
+/* VSX registers */
+uint64_t vsr[32];
 /* SPE registers */
 uint64_t spe_acc;
 uint32_t spe_fscr;
diff --git a/target-ppc/machine.c b/target-ppc/machine.c
index 21ce757..5e7bc00 100644
--- a/target-ppc/machine.c
+++ b/target-ppc/machine.c
@@ -6,6 +6,7 @@ void cpu_save(QEMUFile *f, void *opaque)
 {
 CPUPPCState *env = (CPUPPCState *)opaque;
 unsigned int i, j;
+uint32_t fpscr;
 
 for (i = 0; i < 32; i++)
 qemu_put_betls(f, &env->gpr[i]);
@@ -30,7 +31,8 @@ void cpu_save(QEMUFile *f, void *opaque)
 u.d = env->fpr[i];
 qemu_put_be64(f, u.l);
 }
-qemu_put_be32s(f, &env->fpscr);
+fpscr = env->fpscr;
+qemu_put_be32s(f, &fpscr);
 qemu_put_sbe32s(f, &env->access_type);
 #if defined(TARGET_PPC64)
 qemu_put_betls(f, &env->asr);
@@ -90,6 +92,7 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
 CPUPPCState *env = (CPUPPCState *)opaque;
 unsigned int i, j;
 target_ulong sdr1;
+uint32_t fpscr;
 
 for (i = 0; i < 32; i++)
 qemu_get_betls(f, &env->gpr[i]);
@@ -114,7 +117,8 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
 u.l = qemu_get_be64(f);
 env->fpr[i] = u.d;
 }
-qemu_get_be32s(f, &env->fpscr);
+qemu_get_be32s(f, &fpscr);
+env->fpscr = fpscr;
 qemu_get_sbe32s(f, &env->access_type);
 #if defined(TARGET_PPC64)
 qemu_get_betls(f, &env->asr);
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index ac915cc..c8122b7 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -9463,7 +9463,7 @@ void cpu_dump_state (CPUPPCState *env, FILE *f, 
fprintf_function cpu_fprintf,
 if ((i & (RFPL - 1)) == (RFPL - 1))
 cpu_fprintf(f, "\n");
 }
-cpu_fprintf(f, "FPSCR %08x\n", env->fpscr);
+cpu_fprintf(f, "FPSCR %08" PRIx64 "\n", env->fpscr);
 #if !defined(CONFIG_USER_ONLY)
 cpu_fprintf(f, " SRR0 " TARGET_FMT_lx "  SRR1 " TARGET_FMT_lx
"PVR " TARGET_FMT_lx " VRSAVE " TARGET_FMT_lx "\n",
-- 
1.7.10.4

[Qemu-devel] [PATCH 4/6] pseries: Implement PAPR NVRAM

2012-09-25 Thread David Gibson

The PAPR specification requires a certain amount of NVRAM, accessed via
RTAS, which we don't currently implement in qemu.  This patch addresses
this deficiency, implementing the NVRAM as a VIO device, with some glue to
instantiate it automatically based on a machine option.

The machine option specifies a drive id, which is used to back the NVRAM,
making it persistent.  If nothing is specified, the driver instead simply
allocates space for the NVRAM, which will not be persistent

Signed-off-by: David Gibson 
---
 hw/ppc/Makefile.objs |1 +
 hw/spapr.c   |3 +
 hw/spapr.h   |3 +
 hw/spapr_nvram.c |  225 ++
 qemu-config.c|4 +
 5 files changed, 236 insertions(+)
 create mode 100644 hw/spapr_nvram.c

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index 951e407..91cbe8c 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -11,6 +11,7 @@ obj-y += ppc_newworld.o
 obj-$(CONFIG_PSERIES) += spapr.o spapr_hcall.o spapr_rtas.o spapr_vio.o
 obj-$(CONFIG_PSERIES) += xics.o spapr_vty.o spapr_llan.o spapr_vscsi.o
 obj-$(CONFIG_PSERIES) += spapr_pci.o pci-hotplug.o spapr_iommu.o
+obj-$(CONFIG_PSERIES) += spapr_nvram.o
 # PowerPC 4xx boards
 obj-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
 obj-y += ppc440_bamboo.o
diff --git a/hw/spapr.c b/hw/spapr.c
index a8bd3c1..079825a 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -804,6 +804,9 @@ static void ppc_spapr_init(ram_addr_t ram_size,
 }
 }
 
+/* We always have at least the nvram device on VIO */
+spapr_create_nvram(spapr);
+
 /* Set up PCI */
 spapr_pci_rtas_init();
 
diff --git a/hw/spapr.h b/hw/spapr.h
index e984e3f..d9c3b4a 100644
--- a/hw/spapr.h
+++ b/hw/spapr.h
@@ -6,11 +6,13 @@
 
 struct VIOsPAPRBus;
 struct sPAPRPHBState;
+struct sPAPRNVRAM;
 struct icp_state;
 
 typedef struct sPAPREnvironment {
 struct VIOsPAPRBus *vio_bus;
 QLIST_HEAD(, sPAPRPHBState) phbs;
+struct sPAPRNVRAM *nvram;
 struct icp_state *icp;
 
 target_phys_addr_t ram_limit;
@@ -336,6 +338,7 @@ typedef struct sPAPRTCE {
 #define SPAPR_PCI_BASE_LIOBN0x8000
 
 void spapr_iommu_init(void);
+void spapr_create_nvram(sPAPREnvironment *spapr);
 DMAContext *spapr_tce_new_dma_context(uint32_t liobn, size_t window_size);
 void spapr_tce_free(DMAContext *dma);
 void spapr_tce_reset(DMAContext *dma);
diff --git a/hw/spapr_nvram.c b/hw/spapr_nvram.c
new file mode 100644
index 000..8cd8a53
--- /dev/null
+++ b/hw/spapr_nvram.c
@@ -0,0 +1,225 @@
+/*
+ * QEMU sPAPR NVRAM emulation
+ *
+ * Copyright (C) 2012 David Gibson, IBM Corporation.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#include 
+#include 
+
+#include "device_tree.h"
+#include "hw/sysbus.h"
+#include "hw/spapr.h"
+#include "hw/spapr_vio.h"
+
+typedef struct sPAPRNVRAM {
+VIOsPAPRDevice sdev;
+uint32_t size;
+uint8_t *buf;
+BlockDriverState *drive;
+} sPAPRNVRAM;
+
+#define MIN_NVRAM_SIZE 8192
+#define DEFAULT_NVRAM_SIZE 16384
+#define MAX_NVRAM_SIZE (UINT16_MAX * 16)
+
+static void rtas_nvram_fetch(sPAPREnvironment *spapr,
+ uint32_t token, uint32_t nargs,
+ target_ulong args,
+ uint32_t nret, target_ulong rets)
+{
+sPAPRNVRAM *nvram = spapr->nvram;
+target_phys_addr_t offset, buffer, len;
+int alen;
+void *membuf;
+
+if ((nargs != 3) || (nret != 2)) {
+rtas_st(rets, 0, -3);
+return;
+}
+
+if (!nvram) {
+rtas_st(rets, 0, -1);
+rtas_st(rets, 1, 0);
+return;
+}
+
+offset = rtas_ld(args, 0);
+buffer = rtas_ld(args, 1);
+len = rtas_ld(args, 2);
+
+if (((offset + len) < offset)
+|| ((offset + len) > nvram->size)) {
+rtas_st(rets, 0, -3);
+rtas_st(rets, 1, 0);
+return;
+}
+
+membuf = cpu_physical_memory_map(buffer, &len

[Qemu-devel] [RESEND PATCH v5 2/4] Update Linux kernel headers

2012-09-25 Thread Alex Williamson

Based on Linux as of 1a95620.

Signed-off-by: Alex Williamson 
Acked-by: Michael S. Tsirkin 
---

 linux-headers/linux/vfio.h |  368 
 1 file changed, 368 insertions(+)
 create mode 100644 linux-headers/linux/vfio.h

diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
new file mode 100644
index 000..f787b72
--- /dev/null
+++ b/linux-headers/linux/vfio.h
@@ -0,0 +1,368 @@
+/*
+ * VFIO API definition
+ *
+ * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
+ * Author: Alex Williamson 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef VFIO_H
+#define VFIO_H
+
+#include 
+#include 
+
+#define VFIO_API_VERSION   0
+
+
+/* Kernel & User level defines for VFIO IOCTLs. */
+
+/* Extensions */
+
+#define VFIO_TYPE1_IOMMU   1
+
+/*
+ * The IOCTL interface is designed for extensibility by embedding the
+ * structure length (argsz) and flags into structures passed between
+ * kernel and userspace.  We therefore use the _IO() macro for these
+ * defines to avoid implicitly embedding a size into the ioctl request.
+ * As structure fields are added, argsz will increase to match and flag
+ * bits will be defined to indicate additional fields with valid data.
+ * It's *always* the caller's responsibility to indicate the size of
+ * the structure passed by setting argsz appropriately.
+ */
+
+#define VFIO_TYPE  (';')
+#define VFIO_BASE  100
+
+/*  IOCTLs for VFIO file descriptor (/dev/vfio/vfio)  */
+
+/**
+ * VFIO_GET_API_VERSION - _IO(VFIO_TYPE, VFIO_BASE + 0)
+ *
+ * Report the version of the VFIO API.  This allows us to bump the entire
+ * API version should we later need to add or change features in incompatible
+ * ways.
+ * Return: VFIO_API_VERSION
+ * Availability: Always
+ */
+#define VFIO_GET_API_VERSION   _IO(VFIO_TYPE, VFIO_BASE + 0)
+
+/**
+ * VFIO_CHECK_EXTENSION - _IOW(VFIO_TYPE, VFIO_BASE + 1, __u32)
+ *
+ * Check whether an extension is supported.
+ * Return: 0 if not supported, 1 (or some other positive integer) if supported.
+ * Availability: Always
+ */
+#define VFIO_CHECK_EXTENSION   _IO(VFIO_TYPE, VFIO_BASE + 1)
+
+/**
+ * VFIO_SET_IOMMU - _IOW(VFIO_TYPE, VFIO_BASE + 2, __s32)
+ *
+ * Set the iommu to the given type.  The type must be supported by an
+ * iommu driver as verified by calling CHECK_EXTENSION using the same
+ * type.  A group must be set to this file descriptor before this
+ * ioctl is available.  The IOMMU interfaces enabled by this call are
+ * specific to the value set.
+ * Return: 0 on success, -errno on failure
+ * Availability: When VFIO group attached
+ */
+#define VFIO_SET_IOMMU _IO(VFIO_TYPE, VFIO_BASE + 2)
+
+/*  IOCTLs for GROUP file descriptors (/dev/vfio/$GROUP)  */
+
+/**
+ * VFIO_GROUP_GET_STATUS - _IOR(VFIO_TYPE, VFIO_BASE + 3,
+ * struct vfio_group_status)
+ *
+ * Retrieve information about the group.  Fills in provided
+ * struct vfio_group_info.  Caller sets argsz.
+ * Return: 0 on succes, -errno on failure.
+ * Availability: Always
+ */
+struct vfio_group_status {
+   __u32   argsz;
+   __u32   flags;
+#define VFIO_GROUP_FLAGS_VIABLE(1 << 0)
+#define VFIO_GROUP_FLAGS_CONTAINER_SET (1 << 1)
+};
+#define VFIO_GROUP_GET_STATUS  _IO(VFIO_TYPE, VFIO_BASE + 3)
+
+/**
+ * VFIO_GROUP_SET_CONTAINER - _IOW(VFIO_TYPE, VFIO_BASE + 4, __s32)
+ *
+ * Set the container for the VFIO group to the open VFIO file
+ * descriptor provided.  Groups may only belong to a single
+ * container.  Containers may, at their discretion, support multiple
+ * groups.  Only when a container is set are all of the interfaces
+ * of the VFIO file descriptor and the VFIO group file descriptor
+ * available to the user.
+ * Return: 0 on success, -errno on failure.
+ * Availability: Always
+ */
+#define VFIO_GROUP_SET_CONTAINER   _IO(VFIO_TYPE, VFIO_BASE + 4)
+
+/**
+ * VFIO_GROUP_UNSET_CONTAINER - _IO(VFIO_TYPE, VFIO_BASE + 5)
+ *
+ * Remove the group from the attached container.  This is the
+ * opposite of the SET_CONTAINER call and returns the group to
+ * an initial state.  All device file descriptors must be released
+ * prior to calling this interface.  When removing the last group
+ * from a container, the IOMMU will be disabled and all state lost,
+ * effectively also returning the VFIO file descriptor to an initial
+ * state.
+ * Return: 0 on success, -errno on failure.
+ * Availability: When attached to container
+ */
+#define VFIO_GROUP_UNSET_CONTAINER _IO(VFIO_TYPE, VFIO_BASE + 5)
+
+/**
+ * VFIO_GROUP_GET_DEVICE_FD - _IOW(VFIO_TYPE, VFIO_BASE + 6, char)
+ *
+ * Return a new file descriptor for the device object described by
+ * the provided string.  The string should match a device listed in
+ * the

[Qemu-devel] [RESEND PATCH v5 4/4] vfio: Enable vfio-pci and mark supported

2012-09-25 Thread Alex Williamson

Enabled for all softmmu guests supporting PCI on Linux hosts.  Note
that currently only x86 hosts have the kernel side VFIO IOMMU support
for this.  PPC (g3beige) is the only non-x86 guest known to work.
ARM (veratile) hangs in firmware, others untested.

Signed-off-by: Alex Williamson 
Acked-by: Michael S. Tsirkin 
---

 MAINTAINERS  |5 +
 configure|6 ++
 hw/Makefile.objs |3 ++-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 25733fc..29aac4f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -474,6 +474,11 @@ M: Gerd Hoffmann 
 S: Maintained
 F: hw/usb*
 
+VFIO
+M: Alex Williamson 
+S: Supported
+F: hw/vfio*
+
 vhost
 M: Michael S. Tsirkin 
 S: Supported
diff --git a/configure b/configure
index 1b86517..c2c0d4f 100755
--- a/configure
+++ b/configure
@@ -165,6 +165,7 @@ attr=""
 libattr=""
 xfs=""
 
+vfio_pci="no"
 vhost_net="no"
 kvm="no"
 gprof="no"
@@ -509,6 +510,7 @@ Haiku)
   usb="linux"
   kvm="yes"
   vhost_net="yes"
+  vfio_pci="yes"
   if [ "$cpu" = "i386" -o "$cpu" = "x86_64" ] ; then
 audio_possible_drivers="$audio_possible_drivers fmod"
   fi
@@ -3174,6 +3176,7 @@ echo "libiscsi support  $libiscsi"
 echo "build guest agent $guest_agent"
 echo "seccomp support   $seccomp"
 echo "coroutine backend $coroutine_backend"
+echo "VFIO PCI support  $vfio_pci"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -3911,6 +3914,9 @@ if test "$target_softmmu" = "yes" ; then
   if test "$smartcard_nss" = "yes" ; then
 echo "subdir-$target: subdir-libcacard" >> $config_host_mak
   fi
+  if test "$vfio_pci" = "yes" ; then
+echo "CONFIG_VFIO_PCI=y" >> $config_target_mak
+  fi
   case "$target_arch2" in
 i386|x86_64)
   echo "CONFIG_HAVE_CORE_DUMP=y" >> $config_target_mak
diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 6dfebd2..7f8d3e4 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -198,7 +198,8 @@ obj-$(CONFIG_VGA) += vga.o
 obj-$(CONFIG_SOFTMMU) += device-hotplug.o
 obj-$(CONFIG_XEN) += xen_domainbuild.o xen_machine_pv.o
 
-# Inter-VM PCI shared memory
+# Inter-VM PCI shared memory & VFIO PCI device assignment
 ifeq ($(CONFIG_PCI), y)
 obj-$(CONFIG_KVM) += ivshmem.o
+obj-$(CONFIG_VFIO_PCI) += vfio_pci.o
 endif

[Qemu-devel] [0/6] Fill gaps in savevm support

2012-09-25 Thread David Gibson

This is a series of fairly straightforward patches to fix some fairly
obvious gaps in the set of helpers for VMStateDescription buffers.
These just extend existing practice to some data types that didn't yet
have helper functions, presumably because there were not yet any
users.

I have draft savevm support for the pseries machine in the works which
need the various extensions here.

[Qemu-devel] [PATCH 6/6] ppc/pseries: Reset VPA registration on CPU reset

2012-09-25 Thread David Gibson

The ppc specific CPU state contains several variables which track the
VPA, SLB shadow and dispatch trace log.  These are structures shared
between OS and hypervisor that are used on the pseries machine to track
various per-CPU quantities.

The address of these structures needs to be registered by the guest on each
boot, however currently this registration is not cleared when we reset the
cpu.  This patch corrects this bug.

Signed-off-by: David Gibson 
---
 target-ppc/translate_init.c |8 
 1 file changed, 8 insertions(+)

diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index fba2b42..a972287 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -10423,6 +10423,14 @@ static void ppc_cpu_reset(CPUState *s)
 env->pending_interrupts = 0;
 env->exception_index = POWERPC_EXCP_NONE;
 env->error_code = 0;
+
+#if defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY)
+env->vpa = 0;
+env->slb_shadow = 0;
+env->dispatch_trace_log = 0;
+env->dtl_size = 0;
+#endif /* TARGET_PPC64 */
+
 /* Flush all TLBs */
 tlb_flush(env, 1);
 }
-- 
1.7.10.4

[Qemu-devel] [PATCH 2/6] target-ppc: Remove unused power_mode field from cpu state

2012-09-25 Thread David Gibson

CPUPPCState includes a variable 'power_mode' which is used nowhere.  This
patch removes it.  This includes saving a dummy zero in its place during
vmsave, to avoid breaking the save format.

Signed-off-by: David Gibson 
---
 target-ppc/cpu.h |1 -
 target-ppc/machine.c |4 ++--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index ca2fc21..faf4404 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -1079,7 +1079,6 @@ struct CPUPPCState {
 int mmu_idx; /* precomputed MMU index to speed up mem accesses */
 
 /* Power management */
-int power_mode;
 int (*check_pow)(CPUPPCState *env);
 
 #if !defined(CONFIG_USER_ONLY)
diff --git a/target-ppc/machine.c b/target-ppc/machine.c
index d6c2ee4..21ce757 100644
--- a/target-ppc/machine.c
+++ b/target-ppc/machine.c
@@ -82,7 +82,7 @@ void cpu_save(QEMUFile *f, void *opaque)
 qemu_put_betls(f, &env->hflags);
 qemu_put_betls(f, &env->hflags_nmsr);
 qemu_put_sbe32s(f, &env->mmu_idx);
-qemu_put_sbe32s(f, &env->power_mode);
+qemu_put_sbe32(f, 0);
 }
 
 int cpu_load(QEMUFile *f, void *opaque, int version_id)
@@ -167,7 +167,7 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
 qemu_get_betls(f, &env->hflags);
 qemu_get_betls(f, &env->hflags_nmsr);
 qemu_get_sbe32s(f, &env->mmu_idx);
-qemu_get_sbe32s(f, &env->power_mode);
+qemu_get_sbe32(f); /* Discard unused power_mode */
 
 return 0;
 }
-- 
1.7.10.4

[Qemu-devel] [PATCH 1/6] savevm: Add VMSTATE_UINT64_EQUAL helpers

2012-09-25 Thread David Gibson

The savevm code already includes a number of *_EQUAL helpers which act as
sanity checks verifying that the configuration of the saved state matches
that of the machine we're loading into to work.  Variants already exist
for 8 bit 16 bit and 32 bit integers, but not 64 bit integers.  This patch
fills that hole, adding a UINT64 version.

Signed-off-by: David Gibson 
---
 savevm.c  |   20 
 vmstate.h |7 +++
 2 files changed, 27 insertions(+)

diff --git a/savevm.c b/savevm.c
index c7fe283..f38e16e 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1043,6 +1043,26 @@ const VMStateInfo vmstate_info_uint64 = {
 .put  = put_uint64,
 };
 
+/* 64 bit unsigned int. See that the received value is the same than the one
+   in the field */
+
+static int get_uint64_equal(QEMUFile *f, void *pv, size_t size)
+{
+uint64_t *v = pv;
+uint64_t v2;
+qemu_get_be64s(f, &v2);
+
+if (*v == v2)
+return 0;
+return -EINVAL;
+}
+
+const VMStateInfo vmstate_info_uint64_equal = {
+.name = "int64 equal",
+.get  = get_uint64_equal,
+.put  = put_uint64,
+};
+
 /* 8 bit int. See that the received value is the same than the one
in the field */
 
diff --git a/vmstate.h b/vmstate.h
index c9c320e..6c7fbe0 100644
--- a/vmstate.h
+++ b/vmstate.h
@@ -129,6 +129,7 @@ extern const VMStateInfo vmstate_info_uint8_equal;
 extern const VMStateInfo vmstate_info_uint16_equal;
 extern const VMStateInfo vmstate_info_int32_equal;
 extern const VMStateInfo vmstate_info_uint32_equal;
+extern const VMStateInfo vmstate_info_uint64_equal;
 extern const VMStateInfo vmstate_info_int32_le;
 
 extern const VMStateInfo vmstate_info_uint8;
@@ -488,6 +489,12 @@ extern const VMStateInfo vmstate_info_unused_buffer;
 #define VMSTATE_UINT32_EQUAL(_f, _s)   \
 VMSTATE_SINGLE(_f, _s, 0, vmstate_info_uint32_equal, uint32_t)
 
+#define VMSTATE_UINT64_EQUAL_V(_f, _s, _v)\
+VMSTATE_SINGLE(_f, _s, _v, vmstate_info_uint64_equal, uint64_t)
+
+#define VMSTATE_UINT64_EQUAL(_f, _s)  \
+VMSTATE_UINT64_EQUAL_V(_f, _s, 0)
+
 #define VMSTATE_INT32_LE(_f, _s)   \
 VMSTATE_SINGLE(_f, _s, 0, vmstate_info_int32_le, int32_t)
 
-- 
1.7.10.4

[Qemu-devel] [PATCH 3/6] savevm: Add VMSTATE_FLOAT64 helpers

2012-09-25 Thread David Gibson

The current savevm code includes VMSTATE helpers for a number of commonly
used data types, but not for the float64 type used by the internal floating
point emulation code.  This patch fixes the deficiency.

Signed-off-by: David Gibson 
---
 savevm.c  |   23 +++
 vmstate.h |   15 +++
 2 files changed, 38 insertions(+)

diff --git a/savevm.c b/savevm.c
index f38e16e..d091488 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1103,6 +1103,29 @@ const VMStateInfo vmstate_info_uint16_equal = {
 .put  = put_uint16,
 };
 
+/* floating point */
+
+static int get_float64(QEMUFile *f, void *pv, size_t size)
+{
+float64 *v = pv;
+
+*v = make_float64(qemu_get_be64(f));
+return 0;
+}
+
+static void put_float64(QEMUFile *f, void *pv, size_t size)
+{
+uint64_t *v = pv;
+
+qemu_put_be64(f, float64_val(*v));
+}
+
+const VMStateInfo vmstate_info_float64 = {
+.name = "float64",
+.get  = get_float64,
+.put  = put_float64,
+};
+
 /* timers  */
 
 static int get_timer(QEMUFile *f, void *pv, size_t size)
diff --git a/vmstate.h b/vmstate.h
index 5d1c4f5..a04561e 100644
--- a/vmstate.h
+++ b/vmstate.h
@@ -137,6 +137,8 @@ extern const VMStateInfo vmstate_info_uint16;
 extern const VMStateInfo vmstate_info_uint32;
 extern const VMStateInfo vmstate_info_uint64;
 
+extern const VMStateInfo vmstate_info_float64;
+
 extern const VMStateInfo vmstate_info_timer;
 extern const VMStateInfo vmstate_info_buffer;
 extern const VMStateInfo vmstate_info_unused_buffer;
@@ -510,6 +512,13 @@ extern const VMStateInfo vmstate_info_unused_buffer;
 #define VMSTATE_UINT32_TEST(_f, _s, _t)  \
 VMSTATE_SINGLE_TEST(_f, _s, _t, 0, vmstate_info_uint32, uint32_t)
 
+
+#define VMSTATE_FLOAT64_V(_f, _s, _v) \
+VMSTATE_SINGLE(_f, _s, _v, vmstate_info_float64, float64)
+
+#define VMSTATE_FLOAT64(_f, _s)   \
+VMSTATE_FLOAT64_V(_f, _s, 0)
+
 #define VMSTATE_TIMER_TEST(_f, _s, _test) \
 VMSTATE_POINTER_TEST(_f, _s, _test, vmstate_info_timer, QEMUTimer *)
 
@@ -576,6 +585,12 @@ extern const VMStateInfo vmstate_info_unused_buffer;
 #define VMSTATE_INT64_ARRAY(_f, _s, _n)   \
 VMSTATE_INT64_ARRAY_V(_f, _s, _n, 0)
 
+#define VMSTATE_FLOAT64_ARRAY_V(_f, _s, _n, _v)   \
+VMSTATE_ARRAY(_f, _s, _n, _v, vmstate_info_float64, float64)
+
+#define VMSTATE_FLOAT64_ARRAY(_f, _s, _n) \
+VMSTATE_FLOAT64_ARRAY_V(_f, _s, _n, 0)
+
 #define VMSTATE_BUFFER_V(_f, _s, _v)  \
 VMSTATE_STATIC_BUFFER(_f, _s, _v, NULL, 0, sizeof(typeof_field(_s, _f)))
 
-- 
1.7.10.4

[Qemu-devel] [RESEND PATCH v5 1/4] Update kernel header script to include vfio

2012-09-25 Thread Alex Williamson

Signed-off-by: Alex Williamson 
Acked-by: Michael S. Tsirkin 
---

 scripts/update-linux-headers.sh |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 53a6f87..67be2ef 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -57,7 +57,7 @@ done
 
 rm -rf "$output/linux-headers/linux"
 mkdir -p "$output/linux-headers/linux"
-for header in kvm.h kvm_para.h vhost.h virtio_config.h virtio_ring.h; do
+for header in kvm.h kvm_para.h vfio.h vhost.h virtio_config.h virtio_ring.h; do
 cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
 done
 rm -rf "$output/linux-headers/asm-generic"

[Qemu-devel] [RESEND PATCH v5 0/4] VFIO-based PCI device assignment

2012-09-25 Thread Alex Williamson

Only change is adding Michael's ack.  Thanks,

Alex

---

Alex Williamson (4):
  vfio: Enable vfio-pci and mark supported
  vfio: vfio-pci device assignment driver
  Update Linux kernel headers
  Update kernel header script to include vfio


 MAINTAINERS |5 
 configure   |6 
 hw/Makefile.objs|3 
 hw/vfio_pci.c   | 1864 +++
 hw/vfio_pci_int.h   |  114 ++
 linux-headers/linux/vfio.h  |  368 
 scripts/update-linux-headers.sh |2 
 7 files changed, 2360 insertions(+), 2 deletions(-)
 create mode 100644 hw/vfio_pci.c
 create mode 100644 hw/vfio_pci_int.h
 create mode 100644 linux-headers/linux/vfio.h

[Qemu-devel] [PATCH 5/6] savevm: Add VMSTATE_STRUCT_VARRAY_POINTER_UINT32

2012-09-25 Thread David Gibson

Currently the savevm code contains a VMSTATE_STRUCT_VARRAY_POINTER_INT32
helper (a variably sized array with the number of elements in an int32_t),
but not VMSTATE_STRUCT_VARRAY_POINTER_UINT32 (... with the number of
elements in a uint32_t).  This patch (trivially) fixes the deficiency.

Signed-off-by: David Gibson 
---
 vmstate.h |   10 ++
 1 file changed, 10 insertions(+)

diff --git a/vmstate.h b/vmstate.h
index a04561e..4b393a0 100644
--- a/vmstate.h
+++ b/vmstate.h
@@ -322,6 +322,16 @@ extern const VMStateInfo vmstate_info_unused_buffer;
 .offset = vmstate_offset_pointer(_state, _field, _type), \
 }
 
+#define VMSTATE_STRUCT_VARRAY_POINTER_UINT32(_field, _state, _field_num, 
_vmsd, _type) { \
+.name   = (stringify(_field)),   \
+.version_id = 0, \
+.num_offset = vmstate_offset_value(_state, _field_num, uint32_t),\
+.size   = sizeof(_type), \
+.vmsd   = &(_vmsd),  \
+.flags  = VMS_POINTER | VMS_VARRAY_INT32 | VMS_STRUCT,   \
+.offset = vmstate_offset_pointer(_state, _field, _type), \
+}
+
 #define VMSTATE_STRUCT_VARRAY_POINTER_UINT16(_field, _state, _field_num, 
_vmsd, _type) { \
 .name   = (stringify(_field)),   \
 .version_id = 0, \
-- 
1.7.10.4

[Qemu-devel] [PATCH 2/6] savevm: Add VMSTATE_UINTTL_EQUAL helper

2012-09-25 Thread David Gibson

This adds an _EQUAL VMSTATE helper for target_ulongs, defined in terms of
VMSTATE_UINT32_EQUAL or VMSTATE_UINT64_EQUAL as appropriate.

Signed-off-by: David Gibson 
---
 hw/hw.h   |6 ++
 vmstate.h |7 +--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/hw.h b/hw/hw.h
index e5cb9bf..3af287f 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -50,16 +50,22 @@ int qemu_boot_set(const char *boot_devices);
 #if TARGET_LONG_BITS == 64
 #define VMSTATE_UINTTL_V(_f, _s, _v)  \
 VMSTATE_UINT64_V(_f, _s, _v)
+#define VMSTATE_UINTTL_EQUAL_V(_f, _s, _v)\
+VMSTATE_UINT64_EQUAL_V(_f, _s, _v)
 #define VMSTATE_UINTTL_ARRAY_V(_f, _s, _n, _v)\
 VMSTATE_UINT64_ARRAY_V(_f, _s, _n, _v)
 #else
 #define VMSTATE_UINTTL_V(_f, _s, _v)  \
 VMSTATE_UINT32_V(_f, _s, _v)
+#define VMSTATE_UINTTL_EQUAL_V(_f, _s, _v)\
+VMSTATE_UINT32_EQUAL_V(_f, _s, _v)
 #define VMSTATE_UINTTL_ARRAY_V(_f, _s, _n, _v)\
 VMSTATE_UINT32_ARRAY_V(_f, _s, _n, _v)
 #endif
 #define VMSTATE_UINTTL(_f, _s)\
 VMSTATE_UINTTL_V(_f, _s, 0)
+#define VMSTATE_UINTTL_EQUAL(_f, _s)  \
+VMSTATE_UINTTL_EQUAL_V(_f, _s, 0)
 #define VMSTATE_UINTTL_ARRAY(_f, _s, _n)  \
 VMSTATE_UINTTL_ARRAY_V(_f, _s, _n, 0)
 
diff --git a/vmstate.h b/vmstate.h
index 6c7fbe0..5d1c4f5 100644
--- a/vmstate.h
+++ b/vmstate.h
@@ -486,8 +486,11 @@ extern const VMStateInfo vmstate_info_unused_buffer;
 #define VMSTATE_INT32_EQUAL(_f, _s)   \
 VMSTATE_SINGLE(_f, _s, 0, vmstate_info_int32_equal, int32_t)
 
-#define VMSTATE_UINT32_EQUAL(_f, _s)   \
-VMSTATE_SINGLE(_f, _s, 0, vmstate_info_uint32_equal, uint32_t)
+#define VMSTATE_UINT32_EQUAL_V(_f, _s, _v)\
+VMSTATE_SINGLE(_f, _s, _v, vmstate_info_uint32_equal, uint32_t)
+
+#define VMSTATE_UINT32_EQUAL(_f, _s)  \
+VMSTATE_UINT32_EQUAL_V(_f, _s, 0)
 
 #define VMSTATE_UINT64_EQUAL_V(_f, _s, _v)\
 VMSTATE_SINGLE(_f, _s, _v, vmstate_info_uint64_equal, uint64_t)
-- 
1.7.10.4

[Qemu-devel] [PATCH 6/6] savevm: Fix bugs in the VMSTATE_VBUFFER_MULTIPLY definition

2012-09-25 Thread David Gibson

The VMSTATE_BUFFER_MULTIPLY macro is misnamed - it actually specifies
a variably sized buffer with VMS_VBUFFER, so should be named
VMSTATE_VBUFFER_MULTIPLY.  This patch fixes this (the macro had no current
users under either name).

In addition, unlike the other VMSTATE_VBUFFER variants, this macro did not
specify VMS_POINTER.  This patch fixes this bug as well.

Signed-off-by: David Gibson 
---
 vmstate.h |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/vmstate.h b/vmstate.h
index 4b393a0..6bfdb6a 100644
--- a/vmstate.h
+++ b/vmstate.h
@@ -372,14 +372,14 @@ extern const VMStateInfo vmstate_info_unused_buffer;
 .offset   = vmstate_offset_buffer(_state, _field) + _start,  \
 }
 
-#define VMSTATE_BUFFER_MULTIPLY(_field, _state, _version, _test, _start, 
_field_size, _multiply) { \
+#define VMSTATE_VBUFFER_MULTIPLY(_field, _state, _version, _test, _start, 
_field_size, _multiply) { \
 .name = (stringify(_field)), \
 .version_id   = (_version),  \
 .field_exists = (_test), \
 .size_offset  = vmstate_offset_value(_state, _field_size, uint32_t),\
 .size = (_multiply),  \
 .info = &vmstate_info_buffer,\
-.flags= VMS_VBUFFER|VMS_MULTIPLY,\
+.flags= VMS_VBUFFER|VMS_POINTER|VMS_MULTIPLY,\
 .offset   = offsetof(_state, _field),\
 .start= (_start),\
 }
-- 
1.7.10.4

Re: [Qemu-devel] [PATCH v2] Align PCI capabilities in pci_find_space

2012-09-25 Thread Alex Williamson

On Tue, 2012-09-25 at 20:01 -0500, m...@cs.wisc.edu wrote:
> From: Matt Renzelmann 
> 
> The current implementation of pci_find_space does not correctly align
> PCI capabilities in the PCI configuration space.  This patch fixes
> this issue.
> 
> Signed-off-by: Matt Renzelmann 
> ---
> 
> Alex Williamson  wrote:
> > I think you could just search every 4th byte.  In fact, this whole used
> > byte-map could be turned into a single uint64_t bitmap for standard
> > config space.  Thanks,
> 
> I've not tested this version of the patch, in contrast to the last, so
> I'm a bit less confident of its correctness.  I did not reimplement it
> as suggested as I'm not that familiar with this code, and instead just
> applied the every 4th byte strategy.
> 
>  hw/pci.c |   12 
>  1 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/pci.c b/hw/pci.c
> index f855cf3..e99866a 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -1631,11 +1631,15 @@ static int pci_find_space(PCIDevice *pdev, uint8_t 
> size)
>  int config_size = pci_config_size(pdev);
>  int offset = PCI_CONFIG_HEADER_SIZE;
>  int i;
> -for (i = PCI_CONFIG_HEADER_SIZE; i < config_size; ++i)
> -if (pdev->used[i])
> -offset = i + 1;
> -else if (i - offset + 1 == size)
> +
> +for (i = PCI_CONFIG_HEADER_SIZE; i < config_size; i += 4) {
> +if (pdev->used[i]) {
> +offset = i + 4;
> +} else if (i - offset + 1 == size) {

This test needs to change as well.  Looks like it should now be:

 (i - offset + 4 >= size)

Whereas we were previously calculating the difference from the offset to
the current pointer plus the current unused byte, we're now assuming the
current dword is empty because we're only handing out dword aligned
offsets and it would be broken for something to not mark the first entry
used.  Probably worthwhile to also add a comment noting the PCI spec
requires dword alignment for capabilities.  Thanks,

Alex

>  return offset;
> +}
> +}
> +
>  return 0;
>  }
>

[Qemu-devel] [PATCH] usb: Fix usb_packet_map() in the presence of IOMMUs

2012-09-25 Thread David Gibson

With the IOMMU infrastructure introduced before 1.2, we need to use
dma_memory_map() to obtain a qemu pointer to memory from an IO bus address.
However, dma_memory_map() alters the given length to reflect the length
over which the used DMA translation is valid - which could be either more
or less than the requested length.

usb_packet_map() does not correctly handle these cases, simply failing if
dma_memory_map() alters the requested length.  If dma_memory_map()
increased the length, we just need to use the requested length for the
qemu_iovec_add().  However, if it decreased the length, it means that a
single DMA translation is not valid for the whole sglist element, and so
we need to loop, splitting it up into multiple iovec entries for each
piece with a DMA translation (in practice >2 pieces is unlikely).

This patch implements the correct behaviour

Signed-off-by: David Gibson 
---
 hw/usb/libhw.c |   24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/hw/usb/libhw.c b/hw/usb/libhw.c
index c0de30e..703e2d2 100644
--- a/hw/usb/libhw.c
+++ b/hw/usb/libhw.c
@@ -28,19 +28,25 @@ int usb_packet_map(USBPacket *p, QEMUSGList *sgl)
 {
 DMADirection dir = (p->pid == USB_TOKEN_IN) ?
 DMA_DIRECTION_FROM_DEVICE : DMA_DIRECTION_TO_DEVICE;
-dma_addr_t len;
 void *mem;
 int i;
 
 for (i = 0; i < sgl->nsg; i++) {
-len = sgl->sg[i].len;
-mem = dma_memory_map(sgl->dma, sgl->sg[i].base, &len, dir);
-if (!mem) {
-goto err;
-}
-qemu_iovec_add(&p->iov, mem, len);
-if (len != sgl->sg[i].len) {
-goto err;
+dma_addr_t base = sgl->sg[i].base;
+dma_addr_t len = sgl->sg[i].len;
+
+while (len) {
+dma_addr_t xlen = len;
+mem = dma_memory_map(sgl->dma, sgl->sg[i].base, &xlen, dir);
+if (!mem) {
+goto err;
+}
+if (xlen > len) {
+xlen = len;
+}
+qemu_iovec_add(&p->iov, mem, xlen);
+len -= xlen;
+base += xlen;
 }
 }
 return 0;
-- 
1.7.10.4

Re: [Qemu-devel] [Qemu-ppc] RFC: NVRAM for pseries machine

2012-09-25 Thread David Gibson

On Wed, Sep 26, 2012 at 03:03:10AM +0200, Alexander Graf wrote:
> On 26.09.2012, at 02:27, David Gibson wrote:
> > On Mon, Sep 24, 2012 at 12:38:59PM +0200, Alexander Graf wrote:
> >> On 24.09.2012, at 02:31, David Gibson wrote:
[snip]
> >>> So, if you look at the patch there is actually a -device form within
> >>> there, the machine option is a wrapper around it.  Without the machine
> >>> option, I don't see how to get the desired properties for the
> >>> configuration that is:
> >>> * NVRAM is always instantiated by default (even if it's
> >>> non-persistent)
> >>> * It's easy to set the drive for that always-present NVRAM
> >> 
> >> I suppose the idea is that when creating a machine from a qtree
> >> dump, we can still recreate it. Or maybe when using -nodefaults? Not
> >> sure. But the way you do it right now is very close to how we want
> >> to model USB too, so I do like it. It's consistent.
> > 
> > I really don't follow what point you're making here.
> > 
> > The problem with -device syntax for my purpose is that with *no* extra
> > command line arguments we should always have some sort of NVRAM - it's
> > mandated by the platform spec, and should always be there, just like
> > the PCI bridge and VIO bridge.  That means instantiating the device
> > from the machine setup code.  But then, using -device will create a
> > second instance of the device, which is no good, because only one can
> > actually be used.
> 
> What I'm trying to say is that the machine file should create a
> device. Always in the case of PAPR. But I suppose pseudo-code is
> easier to read:
> 
> spapr.c:
> 
>   create_device("spapr-nvram", drive=machine_opts["nvram"]);

Ok.  That's what I do now.

> spapr-nvram:
> 
>   if (!drive || checksum_is_bad(drive))
> autogenerate_nvram_contents();

Actually, I'm planning for the initialization of the content to be
done from the guest firmware.


> Then we can later add in vl.c:
> 
>   case OPTION_nvram:
> create_drive("nvram", option);
> machine_opts["nvram"] = drive["nvram"];

Ok, that all works for me.

Blue, does that seem reasonable to you?

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

[Qemu-devel] [PATCH v2] Align PCI capabilities in pci_find_space

2012-09-25 Thread mjr

From: Matt Renzelmann 

The current implementation of pci_find_space does not correctly align
PCI capabilities in the PCI configuration space.  This patch fixes
this issue.

Signed-off-by: Matt Renzelmann 
---

Alex Williamson  wrote:
> I think you could just search every 4th byte.  In fact, this whole used
> byte-map could be turned into a single uint64_t bitmap for standard
> config space.  Thanks,

I've not tested this version of the patch, in contrast to the last, so
I'm a bit less confident of its correctness.  I did not reimplement it
as suggested as I'm not that familiar with this code, and instead just
applied the every 4th byte strategy.

 hw/pci.c |   12 
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index f855cf3..e99866a 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1631,11 +1631,15 @@ static int pci_find_space(PCIDevice *pdev, uint8_t size)
 int config_size = pci_config_size(pdev);
 int offset = PCI_CONFIG_HEADER_SIZE;
 int i;
-for (i = PCI_CONFIG_HEADER_SIZE; i < config_size; ++i)
-if (pdev->used[i])
-offset = i + 1;
-else if (i - offset + 1 == size)
+
+for (i = PCI_CONFIG_HEADER_SIZE; i < config_size; i += 4) {
+if (pdev->used[i]) {
+offset = i + 4;
+} else if (i - offset + 1 == size) {
 return offset;
+}
+}
+
 return 0;
 }

-- 
1.7.5.4

Re: [Qemu-devel] [Qemu-ppc] RFC: NVRAM for pseries machine

2012-09-25 Thread Alexander Graf


On 26.09.2012, at 02:27, David Gibson wrote:

> On Mon, Sep 24, 2012 at 12:38:59PM +0200, Alexander Graf wrote:
>> 
>> On 24.09.2012, at 02:31, David Gibson wrote:
>> 
>>> On Sat, Sep 22, 2012 at 01:31:08PM +, Blue Swirl wrote:
 On Fri, Sep 21, 2012 at 3:08 AM, David Gibson
  wrote:
> Below is a patch which implements the (PAPR mandated) NVRAM for the
> pseries machine.  It raises a couple of generic questions.
> 
> First, this adds a new "nvram" machine option which is used to give a
> block device id to back the NVRAM so it is persistent.  Since some
> sort of NVRAM is quite common, it seems this might be useful on other
> machines one day, although obviously nothing else implements it yet.
 
 Yes, there have been discussions earlier since loading NVRAM contents
 from a file would be useful for many architectures too.
 
> 
> Second, if a block device is not specified, it simply allocates a
> block of memory to make a non-persistent NVRAM.  Obviously that isn't
> really "NV", but it's enough to make many guests happy most of the
> time, and doesn't require setting up an image file and drive.  It does
> mean a different set of code paths in the driver though, and it will
> need special case handling for savevm (not implemented yet).  Is this
> the right approach, or should I be creating a dummy block device for a
> one-run NVRAM of this kind?  I couldn't see an obvious way to do that,
> but maybe I'm missing something.
 
 That was the problem earlier too, it looks like a generic way for all
 NVRAM/flash devices should be obvious but so far nobody has been able
 to propose something.
 
 What if there are two devices which could use this, for example CMOS
 and flash on x86?
 
 This should be extending  -device syntax rather than adding another
 top level option. Something like
 -drive foo,file=nvram.qcow2,format=qcow2,id=main_nvram -device
 spapr-nvram,drive_id=main_nvram
>>> 
>>> So, if you look at the patch there is actually a -device form within
>>> there, the machine option is a wrapper around it.  Without the machine
>>> option, I don't see how to get the desired properties for the
>>> configuration that is:
>>> * NVRAM is always instantiated by default (even if it's
>>> non-persistent)
>>> * It's easy to set the drive for that always-present NVRAM
>> 
>> I suppose the idea is that when creating a machine from a qtree
>> dump, we can still recreate it. Or maybe when using -nodefaults? Not
>> sure. But the way you do it right now is very close to how we want
>> to model USB too, so I do like it. It's consistent.
> 
> I really don't follow what point you're making here.
> 
> The problem with -device syntax for my purpose is that with *no* extra
> command line arguments we should always have some sort of NVRAM - it's
> mandated by the platform spec, and should always be there, just like
> the PCI bridge and VIO bridge.  That means instantiating the device
> from the machine setup code.  But then, using -device will create a
> second instance of the device, which is no good, because only one can
> actually be used.

What I'm trying to say is that the machine file should create a device. Always 
in the case of PAPR. But I suppose pseudo-code is easier to read:

spapr.c:

  create_device("spapr-nvram", drive=machine_opts["nvram"]);

spapr-nvram:

  if (!drive || checksum_is_bad(drive))
autogenerate_nvram_contents();

Then we can later add in vl.c:

  case OPTION_nvram:
create_drive("nvram", option);
machine_opts["nvram"] = drive["nvram"];


Alex

Re: [Qemu-devel] [PATCH 8/9] mm: compaction: Cache if a pageblock was scanned and no pages were isolated

2012-09-25 Thread Minchan Kim

On Tue, Sep 25, 2012 at 10:12:07AM +0100, Mel Gorman wrote:
> On Mon, Sep 24, 2012 at 02:26:44PM -0700, Andrew Morton wrote:
> > On Mon, 24 Sep 2012 10:39:38 +0100
> > Mel Gorman  wrote:
> > 
> > > On Fri, Sep 21, 2012 at 02:36:56PM -0700, Andrew Morton wrote:
> > > 
> > > > Also, what has to be done to avoid the polling altogether?  eg/ie, zap
> > > > a pageblock's PB_migrate_skip synchronously, when something was done to
> > > > that pageblock which justifies repolling it?
> > > > 
> > > 
> > > The "something" event you are looking for is pages being freed or
> > > allocated in the page allocator. A movable page being allocated in block
> > > or a page being freed should clear the PB_migrate_skip bit if it's set.
> > > Unfortunately this would impact the fast path of the alloc and free paths
> > > of the page allocator. I felt that that was too high a price to pay.
> > 
> > We already do a similar thing in the page allocator: clearing of
> > ->all_unreclaimable and ->pages_scanned. 
> 
> That is true but that is a simple write (shared cache line but still) to
> a struct zone. Worse, now that you point it out, that's pretty stupid. It
> should be checking if the value is non-zero before writing to it to avoid
> a cache line bounce.
> 
> Clearing the PG_migrate_skip in this path to avoid the need to ever pool is
> not as cheap as it needs to
> 
> set_pageblock_skip
>   -> set_pageblock_flags_group
> -> page_zone
> -> page_to_pfn
> -> get_pageblock_bitmap
> -> pfn_to_bitidx
> -> __set_bit
> 
> > But that isn't on the "fast
> > path" really - it happens once per pcp unload. 
> 
> That's still an important enough path that I'm wary of making it fatter
> and that only covers the free path. To avoid the polling, the allocation
> side needs to be handled too. It could be shoved down into rmqueue() to
> put it into a slightly colder path but still, it's a price to pay to keep
> compaction happy.
> 
> > Can we do something
> > like that?  Drop some hint into the zone without having to visit each
> > page?
> > 
> 
> Not without incurring a cost, but yes, t is possible to give a hint on when
> PG_migrate_skip should be cleared and move away from that time-based hammer.
> 
> First, we'd introduce a variant of get_pageblock_migratetype() that returns
> all the bits for the pageblock flags and then helpers to extract either the
> migratetype or the PG_migrate_skip. We already are incurring the cost of
> get_pageblock_migratetype() so it will not be much more expensive than what
> is already there. If there is an allocation or free within a pageblock that
> as the PG_migrate_skip bit set then we increment a counter. When the counter
> reaches some to-be-decided "threshold" then compaction may clear all the
> bits. This would match the criteria of the clearing being based on activity.
> 
> There are four potential problems with this
> 
> 1. The logic to retrieve all the bits and split them up will be a little
>convulated but maybe it would not be that bad.
> 
> 2. The counter is a shared-writable cache line but obviously it could
>be moved to vmstat and incremented with inc_zone_page_state to offset
>the cost a little.
> 
> 3. The biggested weakness is that there is not way to know if the
>counter is incremented based on activity in a small subset of blocks.
> 
> 4. What should the threshold be?
> 
> The first problem is minor but the other three are potentially a mess.
> Adding another vmstat counter is bad enough in itself but if the counter
> is incremented based on a small subsets of pageblocks, the hint becomes
> is potentially useless.

Another idea is that we can add two bits(PG_check_migrate/PG_check_free)
in pageblock_flags_group.
In allocation path, we can set PG_check_migrate in a pageblock
In free path, we can set PG_check_free in a pageblock.
And they are cleared by compaction's scan like now.
So we can discard 3 and 4 at least.

Another idea is that let's cure it by fixing fundamental problem.
Make zone's locks more fine-grained.
As time goes by, system uses bigger memory but our lock of zone
isn't scalable. Recently, lru_lock and zone->lock contention report
isn't rare so i think it's good time that we move next step.

How about defining struct sub_zone per 2G or 4G?
so a zone can have several sub_zone as size and subzone can replace
current zone's role and zone is just container of subzones.
Of course, it's not easy to implement but I think someday we should
go that way. Is it a really overkill?

> 
> However, does this match what you have in mind or am I over-complicating
> things?
> 
> > > > >
> > > > > ...
> > > > >
> > > > > +static void reset_isolation_suitable(struct zone *zone)
> > > > > +{
> > > > > + unsigned long start_pfn = zone->zone_start_pfn;
> > > > > + unsigned long end_pfn = zone->zone_start_pfn + 
> > > > > zone->spanned_pages;
> > > > > + unsigned long pfn;
> > > > > +
> > > > > + /*
> > > > > +  * Do not reset more than once every

Re: [Qemu-devel] [Qemu-ppc] RFC: NVRAM for pseries machine

2012-09-25 Thread David Gibson

On Mon, Sep 24, 2012 at 12:38:59PM +0200, Alexander Graf wrote:
> 
> On 24.09.2012, at 02:31, David Gibson wrote:
> 
> > On Sat, Sep 22, 2012 at 01:31:08PM +, Blue Swirl wrote:
> >> On Fri, Sep 21, 2012 at 3:08 AM, David Gibson
> >>  wrote:
> >>> Below is a patch which implements the (PAPR mandated) NVRAM for the
> >>> pseries machine.  It raises a couple of generic questions.
> >>> 
> >>> First, this adds a new "nvram" machine option which is used to give a
> >>> block device id to back the NVRAM so it is persistent.  Since some
> >>> sort of NVRAM is quite common, it seems this might be useful on other
> >>> machines one day, although obviously nothing else implements it yet.
> >> 
> >> Yes, there have been discussions earlier since loading NVRAM contents
> >> from a file would be useful for many architectures too.
> >> 
> >>> 
> >>> Second, if a block device is not specified, it simply allocates a
> >>> block of memory to make a non-persistent NVRAM.  Obviously that isn't
> >>> really "NV", but it's enough to make many guests happy most of the
> >>> time, and doesn't require setting up an image file and drive.  It does
> >>> mean a different set of code paths in the driver though, and it will
> >>> need special case handling for savevm (not implemented yet).  Is this
> >>> the right approach, or should I be creating a dummy block device for a
> >>> one-run NVRAM of this kind?  I couldn't see an obvious way to do that,
> >>> but maybe I'm missing something.
> >> 
> >> That was the problem earlier too, it looks like a generic way for all
> >> NVRAM/flash devices should be obvious but so far nobody has been able
> >> to propose something.
> >> 
> >> What if there are two devices which could use this, for example CMOS
> >> and flash on x86?
> >> 
> >> This should be extending  -device syntax rather than adding another
> >> top level option. Something like
> >> -drive foo,file=nvram.qcow2,format=qcow2,id=main_nvram -device
> >> spapr-nvram,drive_id=main_nvram
> > 
> > So, if you look at the patch there is actually a -device form within
> > there, the machine option is a wrapper around it.  Without the machine
> > option, I don't see how to get the desired properties for the
> > configuration that is:
> > * NVRAM is always instantiated by default (even if it's
> > non-persistent)
> > * It's easy to set the drive for that always-present NVRAM
> 
> I suppose the idea is that when creating a machine from a qtree
> dump, we can still recreate it. Or maybe when using -nodefaults? Not
> sure. But the way you do it right now is very close to how we want
> to model USB too, so I do like it. It's consistent.

I really don't follow what point you're making here.

The problem with -device syntax for my purpose is that with *no* extra
command line arguments we should always have some sort of NVRAM - it's
mandated by the platform spec, and should always be there, just like
the PCI bridge and VIO bridge.  That means instantiating the device
from the machine setup code.  But then, using -device will create a
second instance of the device, which is no good, because only one can
actually be used.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

Re: [Qemu-devel] [PATCH 5/9] mm: compaction: Acquire the zone->lru_lock as late as possible

2012-09-25 Thread Minchan Kim

On Tue, Sep 25, 2012 at 02:39:31PM -0700, Andrew Morton wrote:
> On Tue, 25 Sep 2012 17:13:27 +0900
> Minchan Kim  wrote:
> 
> > I see. To me, your saying is better than current comment.
> > I hope comment could be more explicit.
> > 
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index df01b4e..f1d2cc7 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -542,8 +542,9 @@ isolate_migratepages_range(struct zone *zone, struct 
> > compact_control *cc,
> >  * splitting and collapsing (collapsing has already happened
> >  * if PageLRU is set) but the lock is not necessarily taken
> >  * here and it is wasteful to take it just to check 
> > transhuge.
> > -* Check transhuge without lock and skip if it's either a
> > -* transhuge or hugetlbfs page.
> > +* Check transhuge without lock and *skip* if it's either a
> > +* transhuge or hugetlbfs page because it's not safe to call
> > +* compound_order.
> >  */
> > if (PageTransHuge(page)) {
> > if (!locked)
> 
> Going a bit further:
> 
> --- 
> a/mm/compaction.c~mm-compaction-acquire-the-zone-lru_lock-as-late-as-possible-fix
> +++ a/mm/compaction.c
> @@ -415,7 +415,8 @@ isolate_migratepages_range(struct zone *
>* if PageLRU is set) but the lock is not necessarily taken
>* here and it is wasteful to take it just to check transhuge.
>* Check transhuge without lock and skip if it's either a
> -  * transhuge or hugetlbfs page.
> +  * transhuge or hugetlbfs page because calling compound_order()
> +  * requires lru_lock to exclude isolation and splitting.
>*/
>   if (PageTransHuge(page)) {
>   if (!locked)
> _
> 
> 
> but...  the requirement to hold lru_lock for compound_order() is news
> to me.  It doesn't seem to be written down or explained anywhere, and
> one wonders why the cheerily undocumented compound_lock() doesn't have
> this effect.  What's going on here??

First of all, I don't know why we should mention hugetlbfs in comment.
I don't know hugetlbfs well so I had a time to look through code but
can't find a place setting PG_lru so I'm not sure hugetlbfs page can
reach on this code. Please correct me if I was wrong.

On THP, I think compound_lock you mentioned is okay, too but I think
it's sort of optimization because we don't need both lru_lock and
compound_lock. If we hold lru_lock, we can't prevent race with
__split_huge_page_refcount so that the page couldn't be freed.

Namely, it's safe to call compound_order.

> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: mailto:"d...@kvack.org";> em...@kvack.org 

-- 
Kind regards,
Minchan Kim

Re: [Qemu-devel] [PATCH 0/2] add pci-serial device.

2012-09-25 Thread Jan Kiszka

On 2012-09-26 01:43, Anthony Liguori wrote:
> Gerd Hoffmann  writes:
> 
>>   Hi,
>>
>> Two patches, first split up serial.c a bit,
>> then actually add the pci-based serial device.
> 
> The series looks good to me.  A couple requests:
> 
> 1) Could you add a spec describing this new PCI device?  Doesn't need to
>be more than a couple paragraphs since the device is super simple.
> 
> 2) Could you make the inf file an separate patch and either include
>documentation in the commit message on how to use it with Windows or
>just add a comment to the inf file?
> 
> This is a new PCI space for QEMU too.  Is this a driver that is "owned"
> by QEMU and Red Hat is donating the PCI id or is this a driver that RH
> controls that we're implementing?
> 
> The only reason I ask is whether this is something we can add new
> features to.  I can't think of one off hand, but it can't hurt to work
> this out up front.

Multiport e.g. (to save PCI slots). There was some proposal recently to
add a model of an real multiport PCI card, just don't find the mail
right now...

Jan




signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 0/2] add pci-serial device.

2012-09-25 Thread Anthony Liguori

Gerd Hoffmann  writes:

>   Hi,
>
> Two patches, first split up serial.c a bit,
> then actually add the pci-based serial device.

The series looks good to me.  A couple requests:

1) Could you add a spec describing this new PCI device?  Doesn't need to
   be more than a couple paragraphs since the device is super simple.

2) Could you make the inf file an separate patch and either include
   documentation in the commit message on how to use it with Windows or
   just add a comment to the inf file?

This is a new PCI space for QEMU too.  Is this a driver that is "owned"
by QEMU and Red Hat is donating the PCI id or is this a driver that RH
controls that we're implementing?

The only reason I ask is whether this is something we can add new
features to.  I can't think of one off hand, but it can't hurt to work
this out up front.

Regards,

Anthony Liguori

>
> cheers,
>   Gerd
>
> Gerd Hoffmann (2):
>   serial: split serial.c
>   serial: add pci variant
>
>  default-configs/pci.mak  |2 +
>  docs/pciserial.inf   |   96 +++
>  hw/Makefile.objs |3 +-
>  hw/alpha_dp264.c |1 +
>  hw/kzm.c |2 +-
>  hw/mips_fulong2e.c   |1 +
>  hw/mips_jazz.c   |1 +
>  hw/mips_malta.c  |1 +
>  hw/mips_mipssim.c|2 +-
>  hw/mips_r4k.c|1 +
>  hw/musicpal.c|2 +-
>  hw/omap_uart.c   |3 +-
>  hw/openrisc_sim.c|3 +-
>  hw/pc.c  |1 +
>  hw/pc.h  |   27 -
>  hw/pci_ids.h |1 +
>  hw/petalogix_ml605_mmu.c |2 +-
>  hw/ppc/e500.c|2 +-
>  hw/ppc405_uc.c   |2 +-
>  hw/ppc440_bamboo.c   |2 +-
>  hw/ppc_prep.c|1 +
>  hw/pxa2xx.c  |2 +-
>  hw/serial-isa.c  |  130 +
>  hw/serial-pci.c  |  101 
>  hw/serial.c  |  143 
> ++
>  hw/serial.h  |   73 +++
>  hw/sm501.c   |2 +-
>  hw/sun4u.c   |1 +
>  hw/virtex_ml507.c|2 +-
>  hw/xtensa_lx60.c |3 +-
>  30 files changed, 433 insertions(+), 180 deletions(-)
>  create mode 100644 docs/pciserial.inf
>  create mode 100644 hw/serial-isa.c
>  create mode 100644 hw/serial-pci.c
>  create mode 100644 hw/serial.h

[Qemu-devel] [PATCH v3 2/5] target-arm: convert add_cc and sub_cc helpers to TCG

2012-09-25 Thread Aurelien Jarno

Now that the setcond TCG op is available, it's possible to replace
add_cc and sub_cc helpers by TCG code. The code generated by TCG is
actually very close to the one generated by GCC for the helper, and
this avoid all the consequences of using an helper: globals saved back
to memory, no possible optimization, call overhead, etc.

Reviewed-by: Peter Maydell 
Signed-off-by: Aurelien Jarno 
---
 target-arm/helper.h|2 --
 target-arm/op_helper.c |   20 ---
 target-arm/translate.c |   66 +++-
 3 files changed, 48 insertions(+), 40 deletions(-)

diff --git a/target-arm/helper.h b/target-arm/helper.h
index afdb2b5..7151e28 100644
--- a/target-arm/helper.h
+++ b/target-arm/helper.h
@@ -142,9 +142,7 @@ DEF_HELPER_2(recpe_u32, i32, i32, env)
 DEF_HELPER_2(rsqrte_u32, i32, i32, env)
 DEF_HELPER_5(neon_tbl, i32, env, i32, i32, i32, i32)
 
-DEF_HELPER_3(add_cc, i32, env, i32, i32)
 DEF_HELPER_3(adc_cc, i32, env, i32, i32)
-DEF_HELPER_3(sub_cc, i32, env, i32, i32)
 DEF_HELPER_3(sbc_cc, i32, env, i32, i32)
 
 DEF_HELPER_3(shl, i32, env, i32, i32)
diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
index f13fc3a..6095f24 100644
--- a/target-arm/op_helper.c
+++ b/target-arm/op_helper.c
@@ -323,16 +323,6 @@ uint64_t HELPER(get_cp_reg64)(CPUARMState *env, void *rip)
The only way to do that in TCG is a conditional branch, which clobbers
all our temporaries.  For now implement these as helper functions.  */
 
-uint32_t HELPER (add_cc)(CPUARMState *env, uint32_t a, uint32_t b)
-{
-uint32_t result;
-result = a + b;
-env->NF = env->ZF = result;
-env->CF = result < a;
-env->VF = (a ^ b ^ -1) & (a ^ result);
-return result;
-}
-
 uint32_t HELPER(adc_cc)(CPUARMState *env, uint32_t a, uint32_t b)
 {
 uint32_t result;
@@ -348,16 +338,6 @@ uint32_t HELPER(adc_cc)(CPUARMState *env, uint32_t a, 
uint32_t b)
 return result;
 }
 
-uint32_t HELPER(sub_cc)(CPUARMState *env, uint32_t a, uint32_t b)
-{
-uint32_t result;
-result = a - b;
-env->NF = env->ZF = result;
-env->CF = a >= b;
-env->VF = (a ^ b) & (a ^ result);
-return result;
-}
-
 uint32_t HELPER(sbc_cc)(CPUARMState *env, uint32_t a, uint32_t b)
 {
 uint32_t result;
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 3deb24d..19bb1e8 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -410,6 +410,36 @@ static void gen_sub_carry(TCGv dest, TCGv t0, TCGv t1)
 tcg_gen_subi_i32(dest, dest, 1);
 }
 
+/* dest = T0 + T1. Compute C, N, V and Z flags */
+static void gen_add_CC(TCGv dest, TCGv t0, TCGv t1)
+{
+TCGv tmp;
+tcg_gen_add_i32(cpu_NF, t0, t1);
+tcg_gen_mov_i32(cpu_ZF, cpu_NF);
+tcg_gen_setcond_i32(TCG_COND_LTU, cpu_CF, cpu_NF, t0);
+tcg_gen_xor_i32(cpu_VF, cpu_NF, t0);
+tmp = tcg_temp_new_i32();
+tcg_gen_xor_i32(tmp, t0, t1);
+tcg_gen_andc_i32(cpu_VF, cpu_VF, tmp);
+tcg_temp_free_i32(tmp);
+tcg_gen_mov_i32(dest, cpu_NF);
+}
+
+/* dest = T0 - T1. Compute C, N, V and Z flags */
+static void gen_sub_CC(TCGv dest, TCGv t0, TCGv t1)
+{
+TCGv tmp;
+tcg_gen_sub_i32(cpu_NF, t0, t1);
+tcg_gen_mov_i32(cpu_ZF, cpu_NF);
+tcg_gen_setcond_i32(TCG_COND_GEU, cpu_CF, t0, t1);
+tcg_gen_xor_i32(cpu_VF, cpu_NF, t0);
+tmp = tcg_temp_new_i32();
+tcg_gen_xor_i32(tmp, t0, t1);
+tcg_gen_and_i32(cpu_VF, cpu_VF, tmp);
+tcg_temp_free_i32(tmp);
+tcg_gen_mov_i32(dest, cpu_NF);
+}
+
 /* FIXME:  Implement this natively.  */
 #define tcg_gen_abs_i32(t0, t1) gen_helper_abs(t0, t1)
 
@@ -6970,11 +7000,11 @@ static void disas_arm_insn(CPUARMState * env, 
DisasContext *s)
 if (IS_USER(s)) {
 goto illegal_op;
 }
-gen_helper_sub_cc(tmp, cpu_env, tmp, tmp2);
+gen_sub_CC(tmp, tmp, tmp2);
 gen_exception_return(s, tmp);
 } else {
 if (set_cc) {
-gen_helper_sub_cc(tmp, cpu_env, tmp, tmp2);
+gen_sub_CC(tmp, tmp, tmp2);
 } else {
 tcg_gen_sub_i32(tmp, tmp, tmp2);
 }
@@ -6983,7 +7013,7 @@ static void disas_arm_insn(CPUARMState * env, 
DisasContext *s)
 break;
 case 0x03:
 if (set_cc) {
-gen_helper_sub_cc(tmp, cpu_env, tmp2, tmp);
+gen_sub_CC(tmp, tmp2, tmp);
 } else {
 tcg_gen_sub_i32(tmp, tmp2, tmp);
 }
@@ -6991,7 +7021,7 @@ static void disas_arm_insn(CPUARMState * env, 
DisasContext *s)
 break;
 case 0x04:
 if (set_cc) {
-gen_helper_add_cc(tmp, cpu_env, tmp, tmp2);
+gen_add_CC(tmp, tmp, tmp2);
 } else {
 tcg_gen_add_i32(tmp, tmp, tmp2);
 }
@@ -7037,13 +7067,13 @@ static void disas_arm_insn(CPUARMState * env, 
DisasContext *s)
 break;
 case

[Qemu-devel] [PATCH v3 1/5] target-arm: use globals for CC flags

2012-09-25 Thread Aurelien Jarno

Use globals for CC flags instead of loading/storing them each they are
accessed. This allows some optimizations to be performed by the TCG
optimization passes.

Reviewed-by: Peter Maydell 
Signed-off-by: Aurelien Jarno 
---
 target-arm/translate.c |  127 ++--
 1 file changed, 46 insertions(+), 81 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index f4b447a..3deb24d 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -85,6 +85,7 @@ static TCGv_ptr cpu_env;
 /* We reuse the same 64-bit temporaries for efficiency.  */
 static TCGv_i64 cpu_V0, cpu_V1, cpu_M0;
 static TCGv_i32 cpu_R[16];
+static TCGv_i32 cpu_CF, cpu_NF, cpu_VF, cpu_ZF;
 static TCGv_i32 cpu_exclusive_addr;
 static TCGv_i32 cpu_exclusive_val;
 static TCGv_i32 cpu_exclusive_high;
@@ -115,6 +116,11 @@ void arm_translate_init(void)
   offsetof(CPUARMState, regs[i]),
   regnames[i]);
 }
+cpu_CF = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUARMState, CF), 
"CF");
+cpu_NF = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUARMState, NF), 
"NF");
+cpu_VF = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUARMState, VF), 
"VF");
+cpu_ZF = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUARMState, ZF), 
"ZF");
+
 cpu_exclusive_addr = tcg_global_mem_new_i32(TCG_AREG0,
 offsetof(CPUARMState, exclusive_addr), "exclusive_addr");
 cpu_exclusive_val = tcg_global_mem_new_i32(TCG_AREG0,
@@ -369,53 +375,39 @@ static void gen_add16(TCGv t0, TCGv t1)
 tcg_temp_free_i32(t1);
 }
 
-#define gen_set_CF(var) tcg_gen_st_i32(var, cpu_env, offsetof(CPUARMState, CF))
-
 /* Set CF to the top bit of var.  */
 static void gen_set_CF_bit31(TCGv var)
 {
-TCGv tmp = tcg_temp_new_i32();
-tcg_gen_shri_i32(tmp, var, 31);
-gen_set_CF(tmp);
-tcg_temp_free_i32(tmp);
+tcg_gen_shri_i32(cpu_CF, var, 31);
 }
 
 /* Set N and Z flags from var.  */
 static inline void gen_logic_CC(TCGv var)
 {
-tcg_gen_st_i32(var, cpu_env, offsetof(CPUARMState, NF));
-tcg_gen_st_i32(var, cpu_env, offsetof(CPUARMState, ZF));
+tcg_gen_mov_i32(cpu_NF, var);
+tcg_gen_mov_i32(cpu_ZF, var);
 }
 
 /* T0 += T1 + CF.  */
 static void gen_adc(TCGv t0, TCGv t1)
 {
-TCGv tmp;
 tcg_gen_add_i32(t0, t0, t1);
-tmp = load_cpu_field(CF);
-tcg_gen_add_i32(t0, t0, tmp);
-tcg_temp_free_i32(tmp);
+tcg_gen_add_i32(t0, t0, cpu_CF);
 }
 
 /* dest = T0 + T1 + CF. */
 static void gen_add_carry(TCGv dest, TCGv t0, TCGv t1)
 {
-TCGv tmp;
 tcg_gen_add_i32(dest, t0, t1);
-tmp = load_cpu_field(CF);
-tcg_gen_add_i32(dest, dest, tmp);
-tcg_temp_free_i32(tmp);
+tcg_gen_add_i32(dest, dest, cpu_CF);
 }
 
 /* dest = T0 - T1 + CF - 1.  */
 static void gen_sub_carry(TCGv dest, TCGv t0, TCGv t1)
 {
-TCGv tmp;
 tcg_gen_sub_i32(dest, t0, t1);
-tmp = load_cpu_field(CF);
-tcg_gen_add_i32(dest, dest, tmp);
+tcg_gen_add_i32(dest, dest, cpu_CF);
 tcg_gen_subi_i32(dest, dest, 1);
-tcg_temp_free_i32(tmp);
 }
 
 /* FIXME:  Implement this natively.  */
@@ -423,16 +415,14 @@ static void gen_sub_carry(TCGv dest, TCGv t0, TCGv t1)
 
 static void shifter_out_im(TCGv var, int shift)
 {
-TCGv tmp = tcg_temp_new_i32();
 if (shift == 0) {
-tcg_gen_andi_i32(tmp, var, 1);
+tcg_gen_andi_i32(cpu_CF, var, 1);
 } else {
-tcg_gen_shri_i32(tmp, var, shift);
-if (shift != 31)
-tcg_gen_andi_i32(tmp, tmp, 1);
+tcg_gen_shri_i32(cpu_CF, var, shift);
+if (shift != 31) {
+tcg_gen_andi_i32(cpu_CF, cpu_CF, 1);
+}
 }
-gen_set_CF(tmp);
-tcg_temp_free_i32(tmp);
 }
 
 /* Shift by immediate.  Includes special handling for shift == 0.  */
@@ -449,8 +439,7 @@ static inline void gen_arm_shift_im(TCGv var, int shiftop, 
int shift, int flags)
 case 1: /* LSR */
 if (shift == 0) {
 if (flags) {
-tcg_gen_shri_i32(var, var, 31);
-gen_set_CF(var);
+tcg_gen_shri_i32(cpu_CF, var, 31);
 }
 tcg_gen_movi_i32(var, 0);
 } else {
@@ -474,11 +463,11 @@ static inline void gen_arm_shift_im(TCGv var, int 
shiftop, int shift, int flags)
 shifter_out_im(var, shift - 1);
 tcg_gen_rotri_i32(var, var, shift); break;
 } else {
-TCGv tmp = load_cpu_field(CF);
+TCGv tmp = tcg_temp_new_i32();
 if (flags)
 shifter_out_im(var, 0);
 tcg_gen_shri_i32(var, var, 1);
-tcg_gen_shli_i32(tmp, tmp, 31);
+tcg_gen_shli_i32(tmp, cpu_CF, 31);
 tcg_gen_or_i32(var, var, tmp);
 tcg_temp_free_i32(tmp);
 }
@@ -603,99 +592,75 @@ static void gen_thumb2_parallel_addsub(int op1, int op2, 
TCGv a, TCGv b)
 static void gen_test_cc(int cc, int label)
 {
 TCGv tmp;
-TCGv tmp2;

[Qemu-devel] [PATCH v3 4/5] target-arm: mark a few integer helpers const and pure

2012-09-25 Thread Aurelien Jarno

Reviewed-by: Peter Maydell 
Signed-off-by: Aurelien Jarno 
---
 target-arm/helper.h |   19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/target-arm/helper.h b/target-arm/helper.h
index 794e2b1..8b9adf1 100644
--- a/target-arm/helper.h
+++ b/target-arm/helper.h
@@ -1,8 +1,8 @@
 #include "def-helper.h"
 
-DEF_HELPER_1(clz, i32, i32)
-DEF_HELPER_1(sxtb16, i32, i32)
-DEF_HELPER_1(uxtb16, i32, i32)
+DEF_HELPER_FLAGS_1(clz, TCG_CALL_CONST | TCG_CALL_PURE, i32, i32)
+DEF_HELPER_FLAGS_1(sxtb16, TCG_CALL_CONST | TCG_CALL_PURE, i32, i32)
+DEF_HELPER_FLAGS_1(uxtb16, TCG_CALL_CONST | TCG_CALL_PURE, i32, i32)
 
 DEF_HELPER_3(add_setq, i32, env, i32, i32)
 DEF_HELPER_3(add_saturate, i32, env, i32, i32)
@@ -10,10 +10,10 @@ DEF_HELPER_3(sub_saturate, i32, env, i32, i32)
 DEF_HELPER_3(add_usaturate, i32, env, i32, i32)
 DEF_HELPER_3(sub_usaturate, i32, env, i32, i32)
 DEF_HELPER_2(double_saturate, i32, env, s32)
-DEF_HELPER_2(sdiv, s32, s32, s32)
-DEF_HELPER_2(udiv, i32, i32, i32)
-DEF_HELPER_1(rbit, i32, i32)
-DEF_HELPER_1(abs, i32, i32)
+DEF_HELPER_FLAGS_2(sdiv, TCG_CALL_CONST | TCG_CALL_PURE, s32, s32, s32)
+DEF_HELPER_FLAGS_2(udiv, TCG_CALL_CONST | TCG_CALL_PURE, i32, i32, i32)
+DEF_HELPER_FLAGS_1(rbit, TCG_CALL_CONST | TCG_CALL_PURE, i32, i32)
+DEF_HELPER_FLAGS_1(abs, TCG_CALL_CONST | TCG_CALL_PURE, i32, i32)
 
 #define PAS_OP(pfx)  \
 DEF_HELPER_3(pfx ## add8, i32, i32, i32, ptr) \
@@ -45,11 +45,12 @@ DEF_HELPER_3(usat, i32, env, i32, i32)
 DEF_HELPER_3(ssat16, i32, env, i32, i32)
 DEF_HELPER_3(usat16, i32, env, i32, i32)
 
-DEF_HELPER_2(usad8, i32, i32, i32)
+DEF_HELPER_FLAGS_2(usad8, TCG_CALL_CONST | TCG_CALL_PURE, i32, i32, i32)
 
 DEF_HELPER_1(logicq_cc, i32, i64)
 
-DEF_HELPER_3(sel_flags, i32, i32, i32, i32)
+DEF_HELPER_FLAGS_3(sel_flags, TCG_CALL_CONST | TCG_CALL_PURE,
+   i32, i32, i32, i32)
 DEF_HELPER_2(exception, void, env, i32)
 DEF_HELPER_1(wfi, void, env)
 
-- 
1.7.10.4

Re: [Qemu-devel] [PATCH 7/7] target-s390x: Tidy cpu_dump_state

2012-09-25 Thread Aurelien Jarno

On Mon, Sep 24, 2012 at 02:55:53PM -0700, Richard Henderson wrote:
> The blank lines inside the single dump make it difficult for the
> eye to pick out the block.  Worse, with interior newlines, but
> no blank line following, the PSW line appears to belong to the
> next dump block.
> 
> Cc: Alexander Graf 
> Signed-off-by: Richard Henderson 
> ---
>  target-s390x/translate.c | 22 ++
>  1 file changed, 10 insertions(+), 12 deletions(-)
> 
> diff --git a/target-s390x/translate.c b/target-s390x/translate.c
> index 4cc9225..db464cc 100644
> --- a/target-s390x/translate.c
> +++ b/target-s390x/translate.c
> @@ -79,6 +79,14 @@ void cpu_dump_state(CPUS390XState *env, FILE *f, 
> fprintf_function cpu_fprintf,
>  {
>  int i;
>  
> +if (env->cc_op > 3) {
> +cpu_fprintf(f, "PSW=mask %016" PRIx64 " addr %016" PRIx64 " cc 
> %15s\n",
> +env->psw.mask, env->psw.addr, cc_name(env->cc_op));
> +} else {
> +cpu_fprintf(f, "PSW=mask %016" PRIx64 " addr %016" PRIx64 " cc 
> %02x\n",
> +env->psw.mask, env->psw.addr, env->cc_op);
> +}
> +
>  for (i = 0; i < 16; i++) {
>  cpu_fprintf(f, "R%02d=%016" PRIx64, i, env->regs[i]);
>  if ((i % 4) == 3) {
> @@ -97,8 +105,6 @@ void cpu_dump_state(CPUS390XState *env, FILE *f, 
> fprintf_function cpu_fprintf,
>  }
>  }
>  
> -cpu_fprintf(f, "\n");
> -
>  #ifndef CONFIG_USER_ONLY
>  for (i = 0; i < 16; i++) {
>  cpu_fprintf(f, "C%02d=%016" PRIx64, i, env->cregs[i]);
> @@ -110,22 +116,14 @@ void cpu_dump_state(CPUS390XState *env, FILE *f, 
> fprintf_function cpu_fprintf,
>  }
>  #endif
>  
> -cpu_fprintf(f, "\n");
> -
> -if (env->cc_op > 3) {
> -cpu_fprintf(f, "PSW=mask %016" PRIx64 " addr %016" PRIx64 " cc 
> %15s\n",
> -env->psw.mask, env->psw.addr, cc_name(env->cc_op));
> -} else {
> -cpu_fprintf(f, "PSW=mask %016" PRIx64 " addr %016" PRIx64 " cc 
> %02x\n",
> -env->psw.mask, env->psw.addr, env->cc_op);
> -}
> -
>  #ifdef DEBUG_INLINE_BRANCHES
>  for (i = 0; i < CC_OP_MAX; i++) {
>  cpu_fprintf(f, "  %15s = %10ld\t%10ld\n", cc_name(i),
>  inline_branch_miss[i], inline_branch_hit[i]);
>  }
>  #endif
> +
> +cpu_fprintf(f, "\n");
>  }
>  
>  static TCGv_i64 psw_addr;
> -- 
> 1.7.11.4

Reviewed-by: Aurelien Jarno 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH 5/7] target-s390x: Use CPU_LOG_INT

2012-09-25 Thread Aurelien Jarno

On Mon, Sep 24, 2012 at 02:55:51PM -0700, Richard Henderson wrote:
> Three places in the interrupt code did we not honor the mask.
> 
> Cc: Alexander Graf 
> Signed-off-by: Richard Henderson 
> ---
>  target-s390x/helper.c  | 7 ---
>  target-s390x/misc_helper.c | 3 ++-
>  2 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/target-s390x/helper.c b/target-s390x/helper.c
> index a5741ec..22256b0 100644
> --- a/target-s390x/helper.c
> +++ b/target-s390x/helper.c
> @@ -511,7 +511,8 @@ static void do_program_interrupt(CPUS390XState *env)
>  break;
>  }
>  
> -qemu_log("%s: code=0x%x ilc=%d\n", __func__, env->int_pgm_code, ilc);
> +qemu_log_mask(CPU_LOG_INT, "%s: code=0x%x ilc=%d\n",
> +  __func__, env->int_pgm_code, ilc);
>  
>  lowcore = cpu_physical_memory_map(env->psa, &len, 1);
>  
> @@ -575,8 +576,8 @@ static void do_ext_interrupt(CPUS390XState *env)
>  
>  void do_interrupt(CPUS390XState *env)
>  {
> -qemu_log("%s: %d at pc=%" PRIx64 "\n", __func__, env->exception_index,
> - env->psw.addr);
> +qemu_log_mask(CPU_LOG_INT, "%s: %d at pc=%" PRIx64 "\n",
> +  __func__, env->exception_index, env->psw.addr);
>  
>  s390_add_running_cpu(env);
>  /* handle external interrupts */
> diff --git a/target-s390x/misc_helper.c b/target-s390x/misc_helper.c
> index 2938ac9..e9b3cae 100644
> --- a/target-s390x/misc_helper.c
> +++ b/target-s390x/misc_helper.c
> @@ -53,7 +53,8 @@ void HELPER(exception)(CPUS390XState *env, uint32_t excp)
>  #ifndef CONFIG_USER_ONLY
>  void program_interrupt(CPUS390XState *env, uint32_t code, int ilc)
>  {
> -qemu_log("program interrupt at %#" PRIx64 "\n", env->psw.addr);
> +qemu_log_mask(CPU_LOG_INT, "program interrupt at %#" PRIx64 "\n",
> +  env->psw.addr);
>  
>  if (kvm_enabled()) {
>  #ifdef CONFIG_KVM
> -- 
> 1.7.11.4

Reviewed-by: Aurelien Jarno 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PULL 00/12] NBD patches for 2012-09-22

2012-09-25 Thread Anthony Liguori

Paolo Bonzini  writes:

> The following changes since commit 89c7fd21930de671a6e34793e8b1ee257e2e:
>
>   Remove unused CONFIG_TCG_PASS_AREG0 and dead code (2012-09-15 17:51:14 
> +)
>
> are available in the git repository at:
>
>   git://github.com/bonzini/qemu.git nbd-next
>
> for you to fetch changes up to be3d30e144dbb99cd39ae4cedfb802337b5b172f:
>
>   nbd: add nbd_export_get_blockdev (2012-09-19 14:03:15 +0200)
>
> These patches merge the first part of the embedded NBD server.  The actual
> QEMU implementation needs some refactoring of qemu-sockets, and might even
> go in through Luiz's tree.

Pulled. Thanks.

Regards,

Anthony Liguori

>
> 
> Paolo Bonzini (12):
>   nbd: add more constants
>   nbd: pass NBDClient to nbd_send_negotiate
>   nbd: do not close BlockDriverState in nbd_export_close
>   nbd: make refcount interface public
>   nbd: do not leak nbd_trip coroutines when a connection is torn down
>   nbd: add reference counting to NBDExport
>   nbd: track clients into NBDExport
>   nbd: add notification for closing an NBDExport
>   qemu-nbd: rewrite termination conditions to use a state machine
>   nbd: register named exports
>   nbd: negotiate with named exports
>   nbd: add nbd_export_get_blockdev
>
>  nbd.c  | 396 
> ++---
>  nbd.h  |  15 ++-
>  qemu-nbd.c |  36 --
>  3 file modificati, 367 inserzioni(+), 80 rimozioni(-)
> -- 
> 1.7.12

Re: [Qemu-devel] [PULL 0/7] SCSI patches for 2012-09-21

2012-09-25 Thread Anthony Liguori

Paolo Bonzini  writes:

> Anthony,
>
> The following changes since commit c26032b2c91721245bfec542d94f37a0238e986e:
>
>   target-xtensa: don't emit extra tcg_gen_goto_tb (2012-09-21 03:07:27 +0400)
>
> are available in the git repository at:
>
>   git://github.com/bonzini/qemu.git scsi-next
>
> for you to fetch changes up to 1109c894052751df99962c009fd7dbae397721f5:
>
>   SCSI: Standard INQUIRY data should report HiSup flag as set. (2012-09-21 
> 16:17:49 +0200)
>

Pulled. Thanks.

Regards,

Anthony Liguori

> Boring stuff, only cleanups and bugfixes, except for patch 1.
>
> 
> Paolo Bonzini (4):
>   scsi-disk: introduce check_lba_range
>   scsi-disk: fix check for out-of-range LBA
>   scsi: introduce scsi_cdb_length and scsi_data_cdb_length
>   scsi-disk: use scsi_data_cdb_length
>
> Ronnie Sahlberg (3):
>   iSCSI: We need to support SG_IO also from iscsi_ioctl()
>   iSCSI: We dont need to explicitely call qemu_notify_event() any more
>   SCSI: Standard INQUIRY data should report HiSup flag as set.
>
>  block/iscsi.c  | 23 +--
>  hw/scsi-bus.c  | 23 ++-
>  hw/scsi-disk.c | 44 ++--
>  hw/scsi.h  |  2 ++
>  4 file modificati, 63 inserzioni(+), 29 rimozioni(-)
> -- 
> 1.7.12

Re: [Qemu-devel] [PULL] QOM CPUState patch queue 2012-09-21

2012-09-25 Thread Anthony Liguori

Andreas Färber  writes:

> Hello Anthony,
>
> Please pull a small batch of CPUState-related patches towards x86 CPU hotplug
> and fixing some issues with alpha-linux-user.
>
> The branch is mirrored on both GitHub and repo.or.cz for your convenience. :-)
>
> Regards,
> Andreas
>
> P.S. Will be mostly offline throughout next week.

Pulled. Thanks.

Regards,

Anthony Liguori

>
>
> Cc: Anthony Liguori 
>
> Cc: Eduardo Habkost 
> Cc: Igor Mammedov 
> Cc: Don Slutz 
> Cc: Richard Henderson 
>
> The following changes since commit c26032b2c91721245bfec542d94f37a0238e986e:
>
>   target-xtensa: don't emit extra tcg_gen_goto_tb (2012-09-21 03:07:27 +0400)
>
> are available in the git repository at:
>
>   git://github.com/afaerber/qemu-cpu.git qom-cpu
>
> for you to fetch changes up to eeed53faccb82e17d2a4f31705eb1c5719f6e614:
>
>   target-alpha: Make cpu_alpha_init() reentrant (2012-09-21 15:12:59 +0200)
>
> 
> Andreas Färber (2):
>   MAINTAINERS: Add entry for QOM CPU
>   target-i386: Drop unused setscalar() macro
>
> Eduardo Habkost (5):
>   target-i386: Add missing CPUID_* constants
>   target-i386: Move CPU models from cpus-x86_64.conf to C
>   Eliminate cpus-x86_64.conf file
>   target-i386: x86_cpudef_setup() coding style change
>   target-i386: Kill cpudef config section support
>
> Peter Maydell (2):
>   target-i386: Fold -cpu ?cpuid, ?model output into -cpu help, drop ?dump
>   Drop cpu_list_id macro
>
> Richard Henderson (2):
>   target-alpha: Initialize env->cpu_model_str
>   target-alpha: Make cpu_alpha_init() reentrant
>
>  MAINTAINERS|6 +
>  Makefile   |1 -
>  arch_init.c|1 -
>  cpus.c |6 +-
>  linux-user/main.c  |6 +-
>  sysconfigs/target/cpus-x86_64.conf |  128 ---
>  target-alpha/translate.c   |7 +-
>  target-i386/cpu.c  |  411 
> +---
>  target-i386/cpu.h  |   26 ++-
>  9 Dateien geändert, 281 Zeilen hinzugefügt(+), 311 Zeilen entfernt(-)
>  delete mode 100644 sysconfigs/target/cpus-x86_64.conf

Re: [Qemu-devel] [PULL 00/14] Trivial patches for 15 to 23 September 2012

2012-09-25 Thread Anthony Liguori

Stefan Hajnoczi  writes:

> The following changes since commit 93b6599734f81328ee3d608f57667742cafeea72:
>
>   audio: Fix warning from static code analysis (2012-09-23 01:34:16 +0400)
>
> are available in the git repository at:
>
>   git://github.com/stefanha/qemu.git trivial-patches
>

Pulled. Thanks.

Regards,

Anthony Liguori

> for you to fetch changes up to 95df51a4a02a853af8828c281bce2d4f2a41d6fd:
>
>   w32: Always use standard instead of native format strings (2012-09-23 
> 07:39:22 +0100)
>
> 
> Alon Levy (1):
>   dtrace backend: add function to reserved words
>
> Don Slutz (1):
>   target-i386: Allow tsc-frequency to be larger then 2.147G
>
> Hitoshi Mitake (1):
>   curses: don't initialize curses when qemu is daemonized
>
> Laszlo Ersek (1):
>   TextConsole: saturate escape parameter in TTY_STATE_CSI
>
> Paolo Bonzini (1):
>   qemu-timer: simplify qemu_run_timers
>
> Stefan Weil (9):
>   qemu-ga: Remove unreachable code after g_error
>   qemu-sockets: Fix potential memory leak
>   cadence_uart: Fix buffer overflow
>   lm4549: Fix buffer overflow
>   ioh3420: Remove unreachable code
>   pflash_cfi01: Fix warning caused by unreachable code
>   linux-user: Remove redundant null check and replace free by g_free
>   net/socket: Fix compiler warning (regression for MinGW)
>   w32: Always use standard instead of native format strings
>
>  compiler.h  |5 +
>  console.c   |7 +--
>  hw/cadence_uart.c   |2 +-
>  hw/ioh3420.c|1 -
>  hw/lm4549.c |2 +-
>  hw/pflash_cfi01.c   |8 
>  linux-user/syscall.c|4 +---
>  net/socket.c|6 +++---
>  os-posix.c  |5 +
>  qemu-common.h   |5 +
>  qemu-ga.c   |2 --
>  qemu-os-posix.h |2 ++
>  qemu-os-win32.h |5 +
>  qemu-sockets.c  |2 +-
>  qemu-timer.c|7 +++
>  scripts/tracetool/backend/dtrace.py |2 +-
>  target-i386/cpu.c   |2 +-
>  vl.c|4 +++-
>  18 files changed, 46 insertions(+), 25 deletions(-)
>
> -- 
> 1.7.10.4

Re: [Qemu-devel] [PULL 00/19] Block patches

2012-09-25 Thread Anthony Liguori

Kevin Wolf  writes:

> The following changes since commit d3e8f95753114a827f9cd8e819b1d5cc8333f76b:
>
>   w32: Add implementation of gmtime_r, localtime_r (2012-09-23 17:09:30 +)
>
> are available in the git repository at:
>
>   git://repo.or.cz/qemu/kevin.git for-anthony

Pulled. Thanks.

Regards,

Anthony Liguori

>
> Jeff Cody (18):
>   block: correctly set the keep_read_only flag
>   block: make bdrv_set_enable_write_cache() modify open_flags
>   block: Framework for reopening files safely
>   block: move aio initialization into a helper function
>   block: move open flag parsing in raw block drivers to helper functions
>   block: do not parse BDRV_O_CACHE_WB in block drivers
>   block: use BDRV_O_NOCACHE instead of s->aligned_buf in raw-posix.c
>   block: purge s->aligned_buf and s->aligned_buf_size from raw-posix.c
>   block: raw-posix image file reopen
>   block: raw image file reopen
>   block: qed image file reopen
>   block: qcow2 image file reopen
>   block: qcow image file reopen
>   block: vmdk image file reopen
>   block: vdi image file reopen
>   block: vpc image file reopen
>   block: convert bdrv_commit() to use bdrv_reopen()
>   block: remove keep_read_only flag from BlockDriverState struct
>
> Kevin Shanahan (1):
>   blockdev: preserve readonly and snapshot states across media changes
>
>  block.c   |  299 
> -
>  block.h   |   18 +++
>  block/iscsi.c |4 -
>  block/qcow.c  |   10 ++
>  block/qcow2.c |   10 ++
>  block/qed.c   |9 ++
>  block/raw-posix.c |  225 ++--
>  block/raw-win32.c |   40 
>  block/raw.c   |   10 ++
>  block/rbd.c   |6 -
>  block/sheepdog.c  |   14 +--
>  block/vdi.c   |7 ++
>  block/vmdk.c  |   35 ++
>  block/vpc.c   |7 ++
>  block_int.h   |9 ++-
>  blockdev.c|2 +
>  16 files changed, 563 insertions(+), 142 deletions(-)

Re: [Qemu-devel] linux aio and cache mode

2012-09-25 Thread ching

On 09/25/2012 09:33 PM, Kevin Wolf wrote:
> Am 25.09.2012 00:40, schrieb ching:
>> On 09/24/2012 08:30 PM, Kevin Wolf wrote:
>>> Am 24.09.2012 13:32, schrieb ching:
 Hi all,

 My host is qemu-1.1.1 and x64 kernel 3.5.4. The guest is using aio="native"

 I am trying to use unsafe cache mode to boost i/o performance.
>>> aio=native requires the image to be opened with O_DIRECT, i.e.
>>> cache=none or cache=directsync. If you specify a different cache option,
>>> it will silently fall back to aio=threads.
>>>
>>> Kevin
>>>
>> will qemu log a entry for the silent fallback?
> No, that's why it's silent. :-)
>
>> Reason:
>>
>> I am testing sparse image on btrfs with mount option: 
>> rw,noatime,space_cache,autodefrag,inode_cache
>>
>> i encounter a speed difference (around 2X-3X) between 
>> aio=threads,cache=unsafe and aio=native,cache=unsafe
>>
>> aio=threads is much faster, i guest there is conflict between "autodefrag" 
>> and linux aio
> This is odd. The point is that with cache=unsafe it shouldn't even be
> using Linux AIO in the first place. I can't see why there would be any
> difference between aio=threads and aio=native with cache=unsafe.
>
> Kevin
>

is it possible to check the open mode of file and whether it is using aio at 
runtime?

Re: [Qemu-devel] [PATCH] target-xtensa: de-optimize EXTUI

2012-09-25 Thread Max Filippov

On Wed, Sep 26, 2012 at 2:57 AM, Aurelien Jarno  wrote:
> Now that and with 0xff, 0x and 0x is optimized in
> tcg/tcg-op.h, there is no need to do it in target-xtensa/translate.c.
>
> Cc: Max Filippov 
> Signed-off-by: Aurelien Jarno 
> ---
>  target-xtensa/translate.c |   15 +--
>  1 file changed, 1 insertion(+), 14 deletions(-)
>
> diff --git a/target-xtensa/translate.c b/target-xtensa/translate.c
> index ba3ffcb..c1358ee 100644
> --- a/target-xtensa/translate.c
> +++ b/target-xtensa/translate.c
> @@ -1835,20 +1835,7 @@ static void disas_xtensa_insn(DisasContext *dc)
>  } else {
>  tcg_gen_mov_i32(tmp, cpu_R[RRR_T]);
>  }

I guess shri above may be de-optimized as well.
In any case Acked-by: Max Filippov 

> -
> -switch (maskimm) {
> -case 0xff:
> -tcg_gen_ext8u_i32(cpu_R[RRR_R], tmp);
> -break;
> -
> -case 0x:
> -tcg_gen_ext16u_i32(cpu_R[RRR_R], tmp);
> -break;
> -
> -default:
> -tcg_gen_andi_i32(cpu_R[RRR_R], tmp, maskimm);
> -break;
> -}
> +tcg_gen_andi_i32(cpu_R[RRR_R], tmp, maskimm);
>  tcg_temp_free(tmp);
>  }
>  break;
> --
> 1.7.10.4
>
>

-- 
Thanks.
-- Max

[Qemu-devel] [PATCH] target-xtensa: de-optimize EXTUI

2012-09-25 Thread Aurelien Jarno

Now that and with 0xff, 0x and 0x is optimized in
tcg/tcg-op.h, there is no need to do it in target-xtensa/translate.c.

Cc: Max Filippov 
Signed-off-by: Aurelien Jarno 
---
 target-xtensa/translate.c |   15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/target-xtensa/translate.c b/target-xtensa/translate.c
index ba3ffcb..c1358ee 100644
--- a/target-xtensa/translate.c
+++ b/target-xtensa/translate.c
@@ -1835,20 +1835,7 @@ static void disas_xtensa_insn(DisasContext *dc)
 } else {
 tcg_gen_mov_i32(tmp, cpu_R[RRR_T]);
 }
-
-switch (maskimm) {
-case 0xff:
-tcg_gen_ext8u_i32(cpu_R[RRR_R], tmp);
-break;
-
-case 0x:
-tcg_gen_ext16u_i32(cpu_R[RRR_R], tmp);
-break;
-
-default:
-tcg_gen_andi_i32(cpu_R[RRR_R], tmp, maskimm);
-break;
-}
+tcg_gen_andi_i32(cpu_R[RRR_R], tmp, maskimm);
 tcg_temp_free(tmp);
 }
 break;
-- 
1.7.10.4

[Qemu-devel] [PATCH v3 3/5] target-arm: convert sar, shl and shr helpers to TCG

2012-09-25 Thread Aurelien Jarno

Now that the movcond TCG op is available, it's possible to replace
shl and shr helpers by TCG code. The code generated by TCG is slightly
longer than the code generated by GCC for the helper but is still worth
it as this avoid all the consequences of using an helper: globals saved
back to memory, no possible optimization, call overhead, etc.

Cc: Peter Maydell 
Signed-off-by: Aurelien Jarno 
---
 target-arm/helper.h|3 ---
 target-arm/op_helper.c |   24 
 target-arm/translate.c |   49 ++--
 3 files changed, 43 insertions(+), 33 deletions(-)

diff --git a/target-arm/helper.h b/target-arm/helper.h
index 7151e28..794e2b1 100644
--- a/target-arm/helper.h
+++ b/target-arm/helper.h
@@ -145,9 +145,6 @@ DEF_HELPER_5(neon_tbl, i32, env, i32, i32, i32, i32)
 DEF_HELPER_3(adc_cc, i32, env, i32, i32)
 DEF_HELPER_3(sbc_cc, i32, env, i32, i32)
 
-DEF_HELPER_3(shl, i32, env, i32, i32)
-DEF_HELPER_3(shr, i32, env, i32, i32)
-DEF_HELPER_3(sar, i32, env, i32, i32)
 DEF_HELPER_3(shl_cc, i32, env, i32, i32)
 DEF_HELPER_3(shr_cc, i32, env, i32, i32)
 DEF_HELPER_3(sar_cc, i32, env, i32, i32)
diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
index 6095f24..aef592a 100644
--- a/target-arm/op_helper.c
+++ b/target-arm/op_helper.c
@@ -355,30 +355,6 @@ uint32_t HELPER(sbc_cc)(CPUARMState *env, uint32_t a, 
uint32_t b)
 
 /* Similarly for variable shift instructions.  */
 
-uint32_t HELPER(shl)(CPUARMState *env, uint32_t x, uint32_t i)
-{
-int shift = i & 0xff;
-if (shift >= 32)
-return 0;
-return x << shift;
-}
-
-uint32_t HELPER(shr)(CPUARMState *env, uint32_t x, uint32_t i)
-{
-int shift = i & 0xff;
-if (shift >= 32)
-return 0;
-return (uint32_t)x >> shift;
-}
-
-uint32_t HELPER(sar)(CPUARMState *env, uint32_t x, uint32_t i)
-{
-int shift = i & 0xff;
-if (shift >= 32)
-shift = 31;
-return (int32_t)x >> shift;
-}
-
 uint32_t HELPER(shl_cc)(CPUARMState *env, uint32_t x, uint32_t i)
 {
 int shift = i & 0xff;
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 19bb1e8..953a80e 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -440,6 +440,37 @@ static void gen_sub_CC(TCGv dest, TCGv t0, TCGv t1)
 tcg_gen_mov_i32(dest, cpu_NF);
 }
 
+#define GEN_SHIFT(name)   \
+static void gen_##name(TCGv dest, TCGv t0, TCGv t1)   \
+{ \
+TCGv tmp1, tmp2, tmp3;\
+tmp1 = tcg_temp_new_i32();\
+tcg_gen_andi_i32(tmp1, t1, 0xff); \
+tmp2 = tcg_const_i32(0);  \
+tmp3 = tcg_const_i32(0x1f);   \
+tcg_gen_movcond_i32(TCG_COND_GTU, tmp2, tmp1, tmp3, tmp2, t0);\
+tcg_temp_free_i32(tmp3);  \
+tcg_gen_andi_i32(tmp1, tmp1, 0x1f);   \
+tcg_gen_##name##_i32(dest, tmp2, tmp1);   \
+tcg_temp_free_i32(tmp2);  \
+tcg_temp_free_i32(tmp1);  \
+}
+GEN_SHIFT(shl)
+GEN_SHIFT(shr)
+#undef GEN_SHIFT
+
+static void gen_sar(TCGv dest, TCGv t0, TCGv t1)
+{
+TCGv tmp1, tmp2;
+tmp1 = tcg_temp_new_i32();
+tcg_gen_andi_i32(tmp1, t1, 0xff);
+tmp2 = tcg_const_i32(0x1f);
+tcg_gen_movcond_i32(TCG_COND_GTU, tmp1, tmp1, tmp2, tmp2, tmp1);
+tcg_temp_free_i32(tmp2);
+tcg_gen_sar_i32(dest, t0, tmp1);
+tcg_temp_free_i32(tmp1);
+}
+
 /* FIXME:  Implement this natively.  */
 #define tcg_gen_abs_i32(t0, t1) gen_helper_abs(t0, t1)
 
@@ -516,9 +547,15 @@ static inline void gen_arm_shift_reg(TCGv var, int shiftop,
 }
 } else {
 switch (shiftop) {
-case 0: gen_helper_shl(var, cpu_env, var, shift); break;
-case 1: gen_helper_shr(var, cpu_env, var, shift); break;
-case 2: gen_helper_sar(var, cpu_env, var, shift); break;
+case 0:
+gen_shl(var, var, shift);
+break;
+case 1:
+gen_shr(var, var, shift);
+break;
+case 2:
+gen_sar(var, var, shift);
+break;
 case 3: tcg_gen_andi_i32(shift, shift, 0x1f);
 tcg_gen_rotr_i32(var, var, shift); break;
 }
@@ -9161,7 +9198,7 @@ static void disas_thumb_insn(CPUARMState *env, 
DisasContext *s)
 break;
 case 0x2: /* lsl */
 if (s->condexec_mask) {
-gen_helper_shl(tmp2, cpu_env, tmp2, tmp);
+gen_shl(tmp2, tmp2, tmp);
 } else {
 gen_helper_shl_cc(tmp2, cpu_env, tmp2, tmp);
 gen_logic_CC(tmp2);
@@ -9169,7 +9206,7 @@ static void disas_thumb_insn(CPUARMState

[Qemu-devel] [PATCH v3 5/5] target-arm: use deposit instead of hardcoded version

2012-09-25 Thread Aurelien Jarno

Use the deposit op instead of and hardcoded bit field insertion. It
allows the host to emit the corresponding instruction if available.

Reviewed-by: Peter Maydell 
Signed-off-by: Aurelien Jarno 
---
 target-arm/translate.c |   20 ++--
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index 953a80e..d2d7088 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -277,15 +277,6 @@ static void gen_sbfx(TCGv var, int shift, int width)
 }
 }
 
-/* Bitfield insertion.  Insert val into base.  Clobbers base and val.  */
-static void gen_bfi(TCGv dest, TCGv base, TCGv val, int shift, uint32_t mask)
-{
-tcg_gen_andi_i32(val, val, mask);
-tcg_gen_shli_i32(val, val, shift);
-tcg_gen_andi_i32(base, base, ~(mask << shift));
-tcg_gen_or_i32(dest, base, val);
-}
-
 /* Return (b << 32) + a. Mark inputs as dead */
 static TCGv_i64 gen_addq_msw(TCGv_i64 a, TCGv b)
 {
@@ -2660,12 +2651,12 @@ static int disas_vfp_insn(CPUARMState * env, 
DisasContext *s, uint32_t insn)
 switch (size) {
 case 0:
 tmp2 = neon_load_reg(rn, pass);
-gen_bfi(tmp, tmp2, tmp, offset, 0xff);
+tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 8);
 tcg_temp_free_i32(tmp2);
 break;
 case 1:
 tmp2 = neon_load_reg(rn, pass);
-gen_bfi(tmp, tmp2, tmp, offset, 0x);
+tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 16);
 tcg_temp_free_i32(tmp2);
 break;
 case 2:
@@ -4021,7 +4012,8 @@ static int disas_neon_ls_insn(CPUARMState * env, 
DisasContext *s, uint32_t insn)
 }
 if (size != 2) {
 tmp2 = neon_load_reg(rd, pass);
-gen_bfi(tmp, tmp2, tmp, shift, size ? 0x : 0xff);
+tcg_gen_deposit_i32(tmp, tmp2, tmp,
+shift, size ? 16 : 8);
 tcg_temp_free_i32(tmp2);
 }
 neon_store_reg(rd, pass, tmp);
@@ -7625,7 +7617,7 @@ static void disas_arm_insn(CPUARMState * env, 
DisasContext *s)
 }
 if (i != 32) {
 tmp2 = load_reg(s, rd);
-gen_bfi(tmp, tmp2, tmp, shift, (1u << i) - 1);
+tcg_gen_deposit_i32(tmp, tmp2, tmp, shift, i);
 tcg_temp_free_i32(tmp2);
 }
 store_reg(s, rd, tmp);
@@ -8736,7 +8728,7 @@ static int disas_thumb2_insn(CPUARMState *env, 
DisasContext *s, uint16_t insn_hw
 imm = imm + 1 - shift;
 if (imm != 32) {
 tmp2 = load_reg(s, rd);
-gen_bfi(tmp, tmp2, tmp, shift, (1u << imm) - 1);
+tcg_gen_deposit_i32(tmp, tmp2, tmp, shift, imm);
 tcg_temp_free_i32(tmp2);
 }
 break;
-- 
1.7.10.4

[Qemu-devel] [PATCH v3 0/5] target-arm: misc optimizations

2012-09-25 Thread Aurelien Jarno

This patch series optimizes the ARM target by:
 - using globals instead of ld/st function
 - using TCG code instead of helpers
 - marking some helpers const and pure

--
Changes v3 -> v2
 - removed useless code from patch 3

Changes v1 -> v2
 - updated patch 3 to use movcond instead of setcond. Also converted sar.
 - updated patch 4 following changes in patch 3
 - added patch 5


Aurelien Jarno (5):
  target-arm: use globals for CC flags
  target-arm: convert add_cc and sub_cc helpers to TCG
  target-arm: convert sar, shl and shr helpers to TCG
  target-arm: mark a few integer helpers const and pure
  target-arm: use deposit instead of hardcoded version

 target-arm/helper.h|   24 ++---
 target-arm/op_helper.c |   44 
 target-arm/translate.c |  260 ++--
 3 files changed, 152 insertions(+), 176 deletions(-)

--
1.7.10.4

Re: [Qemu-devel] [PATCH 3/7] tcg-i386: Implement movcond

2012-09-25 Thread Aurelien Jarno

On Mon, Sep 24, 2012 at 02:54:27PM -0700, Richard Henderson wrote:
> On 09/24/2012 02:37 PM, Alex Barcelo wrote:
> > just finished a git-bisect and I found this... and now I do not fully
> > understand why I have the problem.
> > 
> > To replicate the error (in a i386 machine, at least):
> > $ make clean && ./configure --enable-debug && make -j && make install
> > [Note: I tried both ppc and i386 targets, so doesn't seem machine-dependent]
> > $ ./path/to/qemu/bin/qemu-i386 doesntmatter
> > Invalid op definition for movcond_i32
> > /mnt/DATA/DARCO/qemu-git/tcg/tcg.c:1170: tcg fatal error
> > Aborted
> 
> Ah, right.  I only tried with -march=i686.
> 
> >> +{ INDEX_op_movcond_i32, { "r", "r", "ri", "r", "0" } },
> 
> Should be protected by 
> 
> #if TCG_TARGET_HAS_movcond_i32
> 
> If no one beats me to this, I'll submit a patch this evening.
> 
> 

I have applied the following patch to fix the issue.


commit f813cb838f19ee8637d3c365659e6a6bb0c9c974
Author: Aurelien Jarno 
Date:   Wed Sep 26 00:30:12 2012 +0200

tcg/i386: fix build with -march < i686

The movcond_i32 op has to be protected with TCG_TARGET_HAS_movcond_i32
to fix the build with -march < i686.

Thanks to Richard Henderson for the hint.

Reported-by: Alex Barcelo 
Signed-off-by: Aurelien Jarno 

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index 122d636..0e218c8 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -1893,7 +1893,9 @@ static const TCGTargetOpDef x86_op_defs[] = {
 { INDEX_op_setcond_i32, { "q", "r", "ri" } },
 
 { INDEX_op_deposit_i32, { "Q", "0", "Q" } },
+#if TCG_TARGET_HAS_movcond_i32
 { INDEX_op_movcond_i32, { "r", "r", "ri", "r", "0" } },
+#endif
 
 #if TCG_TARGET_REG_BITS == 32
 { INDEX_op_mulu2_i32, { "a", "d", "a", "r" } },

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH 0/8] Misc tcg improvements

2012-09-25 Thread Aurelien Jarno

On Fri, Sep 21, 2012 at 05:18:08PM -0700, Richard Henderson wrote:
> The subject of the andi and assertion patches has come up on this
> list earlier this week, between Max Filippov, malc and myself.
> 
> The posibility of using deposit to implement concat occurred to
> me while working on the MIPS FPU conversion patch.
> 
> 
> r~
> 
> 
> Richard Henderson (8):
>   tcg: Adjust descriptions of *cond opcodes
>   tcg: Emit ANDI as EXTU for appropriate constants
>   tcg: Optimize initial inputs for ori_i64
>   tcg: Emit XORI as NOT for appropriate constants
>   tcg: Implement concat*_i64 with deposit_i64
>   tcg: Add tcg_debug_assert
>   tcg: Sanity check deposit inputs
>   tcg: Sanity check goto_tb input
> 
>  tcg/README   |  10 ++--
>  tcg/tcg-op.h | 182 
> ++-
>  tcg/tcg.c|   4 ++
>  tcg/tcg.h|  10 
>  4 files changed, 149 insertions(+), 57 deletions(-)
> 

Thanks, all applied.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 0/2] Optimize movcond

2012-09-25 Thread Aurelien Jarno

On Mon, Sep 24, 2012 at 01:44:58PM -0700, Richard Henderson wrote:
> Changes v2->v3:
>   Rebase with the first 5 patches committed.
>   Fix 32/64-bit compile problems.  Oops.
> 
> 
> r~
> 
> 
> Richard Henderson (2):
>   tcg: Streamline movcond_i64 using 32-bit arithmetic
>   tcg: Streamline movcond_i64 using movcond_i32
> 
>  tcg/tcg-op.h | 28 
>  1 file changed, 28 insertions(+)
> 

Thanks, both applied.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH 6/7] target-s390x: Avoid double CPU_LOG_TB_CPU

2012-09-25 Thread Aurelien Jarno

On Mon, Sep 24, 2012 at 02:55:52PM -0700, Richard Henderson wrote:
> This is already handled generically in cpu_exec.
> 
> Cc: Alexander Graf 
> Signed-off-by: Richard Henderson 
> ---
>  target-s390x/translate.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/target-s390x/translate.c b/target-s390x/translate.c
> index 6fa76a0..4cc9225 100644
> --- a/target-s390x/translate.c
> +++ b/target-s390x/translate.c
> @@ -5220,7 +5220,6 @@ static inline void 
> gen_intermediate_code_internal(CPUS390XState *env,
>  tb->icount = num_insns;
>  }
>  #if defined(S390X_DEBUG_DISAS)
> -log_cpu_state_mask(CPU_LOG_TB_CPU, env, 0);
>  if (qemu_loglevel_mask(CPU_LOG_TB_IN_ASM)) {
>  qemu_log("IN: %s\n", lookup_symbol(pc_start));
>  log_target_disas(pc_start, dc.pc - pc_start, 1);
> -- 
> 1.7.11.4

Reviewed-by: Aurelien Jarno 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH 4/7] target-unicore32: Call tcg_gen_debug_insn_start

2012-09-25 Thread Aurelien Jarno

On Mon, Sep 24, 2012 at 02:55:50PM -0700, Richard Henderson wrote:
> Cc: Guan Xuetao 
> Signed-off-by: Richard Henderson 
> ---
>  target-unicore32/translate.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/target-unicore32/translate.c b/target-unicore32/translate.c
> index b786a6b..36f4f2f 100644
> --- a/target-unicore32/translate.c
> +++ b/target-unicore32/translate.c
> @@ -1861,6 +1861,10 @@ static void disas_uc32_insn(CPUUniCore32State *env, 
> DisasContext *s)
>  {
>  unsigned int insn;
>  
> +if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP | CPU_LOG_TB_OP_OPT))) {
> +tcg_gen_debug_insn_start(s->pc);
> +}
> +
>  insn = cpu_ldl_code(env, s->pc);
>  s->pc += 4;
>  
> -- 
> 1.7.11.4
> 

Reviewed-by: Aurelien Jarno 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH 3/7] target-s390x: Call tcg_gen_debug_insn_start

2012-09-25 Thread Aurelien Jarno

On Mon, Sep 24, 2012 at 02:55:49PM -0700, Richard Henderson wrote:
> Cc: Alexander Graf 
> Signed-off-by: Richard Henderson 
> ---
>  target-s390x/translate.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/target-s390x/translate.c b/target-s390x/translate.c
> index 3214783..6fa76a0 100644
> --- a/target-s390x/translate.c
> +++ b/target-s390x/translate.c
> @@ -5173,10 +5173,11 @@ static inline void 
> gen_intermediate_code_internal(CPUS390XState *env,
>  if (num_insns + 1 == max_insns && (tb->cflags & CF_LAST_IO)) {
>  gen_io_start();
>  }
> -#if defined(S390X_DEBUG_DISAS_VERBOSE)
> -LOG_DISAS("pc " TARGET_FMT_lx "\n",
> -  dc.pc);
> -#endif
> +
> +if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP | CPU_LOG_TB_OP_OPT))) 
> {
> +tcg_gen_debug_insn_start(dc.pc);
> +}
> +
>  disas_s390_insn(env, &dc);
>  
>  num_insns++;
> -- 
> 1.7.11.4
> 
Reviewed-by: Aurelien Jarno 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH 2/7] target-m68k: Call tcg_gen_debug_insn_start

2012-09-25 Thread Aurelien Jarno

On Mon, Sep 24, 2012 at 02:55:48PM -0700, Richard Henderson wrote:
> Cc: Paul Brook 
> Signed-off-by: Richard Henderson 
> ---
>  target-m68k/translate.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/target-m68k/translate.c b/target-m68k/translate.c
> index fb707f2..451ef74 100644
> --- a/target-m68k/translate.c
> +++ b/target-m68k/translate.c
> @@ -2953,6 +2953,10 @@ static void disas_m68k_insn(CPUM68KState * env, 
> DisasContext *s)
>  {
>  uint16_t insn;
>  
> +if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP | CPU_LOG_TB_OP_OPT))) {
> +tcg_gen_debug_insn_start(s->pc);
> +}
> +
>  insn = cpu_lduw_code(env, s->pc);
>  s->pc += 2;
>  
> -- 
> 1.7.11.4
> 
> 
> 

Reviewed-by: Aurelien Jarno 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH 1/7] Emit debug_insn for CPU_LOG_TB_OP_OPT as well.

2012-09-25 Thread Aurelien Jarno

On Mon, Sep 24, 2012 at 02:55:47PM -0700, Richard Henderson wrote:
> For all targets that currently call tcg_gen_debug_insn_start,
> add CPU_LOG_TB_OP_OPT to the condition that gates it.
> 
> This is useful for comparing optimization dumps, when the
> pre-optimization dump is merely noise.
> 
> Signed-off-by: Richard Henderson 
> ---
>  target-alpha/translate.c  | 2 +-
>  target-arm/translate.c| 2 +-
>  target-cris/translate.c   | 3 ++-
>  target-i386/translate.c   | 3 ++-
>  target-lm32/translate.c   | 2 +-
>  target-microblaze/translate.c | 3 ++-
>  target-mips/translate.c   | 3 ++-
>  target-openrisc/translate.c   | 2 +-
>  target-ppc/translate.c| 3 ++-
>  target-sh4/translate.c| 2 +-
>  target-sparc/translate.c  | 3 ++-
>  target-xtensa/translate.c | 2 +-
>  12 files changed, 18 insertions(+), 12 deletions(-)
> 
> diff --git a/target-alpha/translate.c b/target-alpha/translate.c
> index 4a9011a..4194d6e 100644
> --- a/target-alpha/translate.c
> +++ b/target-alpha/translate.c
> @@ -3421,7 +3421,7 @@ static inline void 
> gen_intermediate_code_internal(CPUAlphaState *env,
>  insn = cpu_ldl_code(env, ctx.pc);
>  num_insns++;
>  
> - if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP))) {
> + if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP | CPU_LOG_TB_OP_OPT))) {
>  tcg_gen_debug_insn_start(ctx.pc);
>  }
>  
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index f4b447a..5fded49 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -9816,7 +9816,7 @@ static inline void 
> gen_intermediate_code_internal(CPUARMState *env,
>  if (num_insns + 1 == max_insns && (tb->cflags & CF_LAST_IO))
>  gen_io_start();
>  
> -if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP))) {
> +if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP | CPU_LOG_TB_OP_OPT))) 
> {
>  tcg_gen_debug_insn_start(dc->pc);
>  }
>  
> diff --git a/target-cris/translate.c b/target-cris/translate.c
> index 19144b5..755de65 100644
> --- a/target-cris/translate.c
> +++ b/target-cris/translate.c
> @@ -3074,8 +3074,9 @@ static unsigned int crisv32_decoder(CPUCRISState *env, 
> DisasContext *dc)
>   int insn_len = 2;
>   int i;
>  
> - if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP)))
> + if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP | CPU_LOG_TB_OP_OPT))) {
>   tcg_gen_debug_insn_start(dc->pc);
> +}
>  
>   /* Load a halfword onto the instruction register.  */
>  dc->ir = cris_fetch(env, dc, dc->pc, 2, 0);
> diff --git a/target-i386/translate.c b/target-i386/translate.c
> index eb0cabc..323869d 100644
> --- a/target-i386/translate.c
> +++ b/target-i386/translate.c
> @@ -4202,8 +4202,9 @@ static target_ulong disas_insn(DisasContext *s, 
> target_ulong pc_start)
>  target_ulong next_eip, tval;
>  int rex_w, rex_r;
>  
> -if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP)))
> +if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP | CPU_LOG_TB_OP_OPT))) {
>  tcg_gen_debug_insn_start(pc_start);
> +}
>  s->pc = pc_start;
>  prefixes = 0;
>  aflag = s->code32;
> diff --git a/target-lm32/translate.c b/target-lm32/translate.c
> index 5f6dcba..77c2866 100644
> --- a/target-lm32/translate.c
> +++ b/target-lm32/translate.c
> @@ -942,7 +942,7 @@ static const DecoderInfo decinfo[] = {
>  
>  static inline void decode(DisasContext *dc, uint32_t ir)
>  {
> -if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP))) {
> +if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP | CPU_LOG_TB_OP_OPT))) {
>  tcg_gen_debug_insn_start(dc->pc);
>  }
>  
> diff --git a/target-microblaze/translate.c b/target-microblaze/translate.c
> index 9c7d77f..7d864b1 100644
> --- a/target-microblaze/translate.c
> +++ b/target-microblaze/translate.c
> @@ -1664,8 +1664,9 @@ static inline void decode(DisasContext *dc, uint32_t ir)
>  {
>  int i;
>  
> -if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP)))
> +if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP | CPU_LOG_TB_OP_OPT))) {
>  tcg_gen_debug_insn_start(dc->pc);
> +}
>  
>  dc->ir = ir;
>  LOG_DIS("%8.8x\t", dc->ir);
> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index fa79d49..454e5cc 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -12124,8 +12124,9 @@ static void decode_opc (CPUMIPSState *env, 
> DisasContext *ctx, int *is_branch)
>  gen_set_label(l1);
>  }
>  
> -if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP)))
> +if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP | CPU_LOG_TB_OP_OPT))) {
>  tcg_gen_debug_insn_start(ctx->pc);
> +}
>  
>  op = MASK_OP_MAJOR(ctx->opcode);
>  rs = (ctx->opcode >> 21) & 0x1f;
> diff --git a/target-openrisc/translate.c b/target-openrisc/translate.c
> index 325ba09..e2cad3a 100644
> --- a/target-openrisc/translate.c
> +++ b/target-openrisc/translate.c
> @@ -1715,7 +

Re: [Qemu-devel] [PATCH v2 3/5] target-arm: convert sar, shl and shr helpers to TCG

2012-09-25 Thread Aurelien Jarno

On Tue, Sep 25, 2012 at 02:57:20PM +0100, Peter Maydell wrote:
> On 21 September 2012 20:33, Aurelien Jarno  wrote:
> > +static void gen_sar(TCGv dest, TCGv t0, TCGv t1)
> > +{
> > +TCGv tmp1, tmp2, tmp3;
> > +tmp1 = tcg_temp_new_i32();
> > +tcg_gen_andi_i32(tmp1, t1, 0xff);
> > +tmp2 = tcg_const_i32(0x1f);
> > +tmp3 = tcg_const_i32(0);
> > +tcg_gen_movcond_i32(TCG_COND_GTU, tmp1, tmp1, tmp2, tmp2, tmp1);
> > +tcg_temp_free_i32(tmp3);
> > +tcg_temp_free_i32(tmp2);
> > +tcg_gen_andi_i32(tmp1, tmp1, 0x1f);
> > +tcg_gen_sar_i32(dest, t0, tmp1);
> > +tcg_temp_free_i32(tmp1);
> > +}
> 
> I don't think you need the "tcg_gen_andi_i32(tmp1, tmp1, 0x1f);"
> for sar, do you? The movcond is doing "shift = (shift > 31) ? 31 : shift"
> so we know it's going to be in [0..31] and the and will do nothing,
> right?
> 

Indeed, that's a leftover from the previous version based on setcond. 
It should be removed.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH 15/17] main-loop: use aio_notify for qemu_notify_event

2012-09-25 Thread Anthony Liguori

Paolo Bonzini  writes:

> Signed-off-by: Paolo Bonzini 
> ---
>  main-loop.c | 106 
> +---
>  1 file modificato, 8 inserzioni(+), 98 rimozioni(-)

diffstats like this are always a good sign :-)

Reviewed-by: Anthony Liguori 

Regards,

Anthony Liguori

>
> diff --git a/main-loop.c b/main-loop.c
> index 209f699..978050a 100644
> --- a/main-loop.c
> +++ b/main-loop.c
> @@ -32,70 +32,6 @@
>  
>  #include "compatfd.h"
>  
> -static int io_thread_fd = -1;
> -
> -void qemu_notify_event(void)
> -{
> -/* Write 8 bytes to be compatible with eventfd.  */
> -static const uint64_t val = 1;
> -ssize_t ret;
> -
> -if (io_thread_fd == -1) {
> -return;
> -}
> -do {
> -ret = write(io_thread_fd, &val, sizeof(val));
> -} while (ret < 0 && errno == EINTR);
> -
> -/* EAGAIN is fine, a read must be pending.  */
> -if (ret < 0 && errno != EAGAIN) {
> -fprintf(stderr, "qemu_notify_event: write() failed: %s\n",
> -strerror(errno));
> -exit(1);
> -}
> -}
> -
> -static void qemu_event_read(void *opaque)
> -{
> -int fd = (intptr_t)opaque;
> -ssize_t len;
> -char buffer[512];
> -
> -/* Drain the notify pipe.  For eventfd, only 8 bytes will be read.  */
> -do {
> -len = read(fd, buffer, sizeof(buffer));
> -} while ((len == -1 && errno == EINTR) || len == sizeof(buffer));
> -}
> -
> -static int qemu_event_init(void)
> -{
> -int err;
> -int fds[2];
> -
> -err = qemu_eventfd(fds);
> -if (err == -1) {
> -return -errno;
> -}
> -err = fcntl_setfl(fds[0], O_NONBLOCK);
> -if (err < 0) {
> -goto fail;
> -}
> -err = fcntl_setfl(fds[1], O_NONBLOCK);
> -if (err < 0) {
> -goto fail;
> -}
> -qemu_set_fd_handler2(fds[0], NULL, qemu_event_read, NULL,
> - (void *)(intptr_t)fds[0]);
> -
> -io_thread_fd = fds[1];
> -return 0;
> -
> -fail:
> -close(fds[0]);
> -close(fds[1]);
> -return err;
> -}
> -
>  /* If we have signalfd, we mask out the signals we want to handle and then
>   * use signalfd to listen for them.  We rely on whatever the current signal
>   * handler is to dispatch the signals when we receive them.
> @@ -165,43 +101,22 @@ static int qemu_signal_init(void)
>  
>  #else /* _WIN32 */
>  
> -static HANDLE qemu_event_handle = NULL;
> -
> -static void dummy_event_handler(void *opaque)
> -{
> -}
> -
> -static int qemu_event_init(void)
> +static int qemu_signal_init(void)
>  {
> -qemu_event_handle = CreateEvent(NULL, FALSE, FALSE, NULL);
> -if (!qemu_event_handle) {
> -fprintf(stderr, "Failed CreateEvent: %ld\n", GetLastError());
> -return -1;
> -}
> -qemu_add_wait_object(qemu_event_handle, dummy_event_handler, NULL);
>  return 0;
>  }
> +#endif
> +
> +static AioContext *qemu_aio_context;
>  
>  void qemu_notify_event(void)
>  {
> -if (!qemu_event_handle) {
> +if (!qemu_aio_context) {
>  return;
>  }
> -if (!SetEvent(qemu_event_handle)) {
> -fprintf(stderr, "qemu_notify_event: SetEvent failed: %ld\n",
> -GetLastError());
> -exit(1);
> -}
> +aio_notify(qemu_aio_context);
>  }
>  
> -static int qemu_signal_init(void)
> -{
> -return 0;
> -}
> -#endif
> -
> -static AioContext *qemu_aio_context;
> -
>  int main_loop_init(void)
>  {
>  int ret;
> @@ -213,12 +128,6 @@ int main_loop_init(void)
>  return ret;
>  }
>  
> -/* Note eventfd must be drained before signalfd handlers run */
> -ret = qemu_event_init();
> -if (ret) {
> -return ret;
> -}
> -
>  qemu_aio_context = aio_context_new();
>  src = aio_get_g_source(qemu_aio_context);
>  g_source_attach(src, NULL);
> @@ -408,7 +317,8 @@ void qemu_del_wait_object(HANDLE handle, WaitObjectFunc 
> *func, void *opaque)
>  
>  void qemu_fd_register(int fd)
>  {
> -WSAEventSelect(fd, qemu_event_handle, FD_READ | FD_ACCEPT | FD_CLOSE |
> +WSAEventSelect(fd, 
> event_notifier_get_handle(&qemu_aio_context->notifier),
> +   FD_READ | FD_ACCEPT | FD_CLOSE |
> FD_CONNECT | FD_WRITE | FD_OOB);
>  }
>  
> -- 
> 1.7.12

Re: [Qemu-devel] [PATCH 16/17] aio: clean up now-unused functions

2012-09-25 Thread Anthony Liguori

Paolo Bonzini  writes:

> Some cleanups can now be made, now that the main loop does not anymore need
> hooks into the bottom half code.
>
> Signed-off-by: Paolo Bonzini 

Reviewed-by: Anthony Liguori 

Regards,

Anthony Liguori

> ---
>  async.c   | 23 +++
>  oslib-posix.c | 31 ---
>  qemu-aio.h|  1 -
>  qemu-common.h |  1 -
>  4 file modificati, 7 inserzioni(+), 49 rimozioni(-)
>
> diff --git a/async.c b/async.c
> index 31c6c76..5a96d11 100644
> --- a/async.c
> +++ b/async.c
> @@ -117,16 +117,20 @@ void qemu_bh_delete(QEMUBH *bh)
>  bh->deleted = 1;
>  }
>  
> -void aio_bh_update_timeout(AioContext *ctx, uint32_t *timeout)
> +static gboolean
> +aio_ctx_prepare(GSource *source, gint*timeout)
>  {
> +AioContext *ctx = (AioContext *) source;
>  QEMUBH *bh;
> +bool scheduled = false;
>  
>  for (bh = ctx->first_bh; bh; bh = bh->next) {
>  if (!bh->deleted && bh->scheduled) {
> +scheduled = true;
>  if (bh->idle) {
>  /* idle bottom halves will be polled at least
>   * every 10ms */
> -*timeout = MIN(10, *timeout);
> +*timeout = 10;
>  } else {
>  /* non-idle bottom halves will be executed
>   * immediately */
> @@ -135,21 +139,8 @@ void aio_bh_update_timeout(AioContext *ctx, uint32_t 
> *timeout)
>  }
>  }
>  }
> -}
> -
> -static gboolean
> -aio_ctx_prepare(GSource *source, gint*timeout)
> -{
> -AioContext *ctx = (AioContext *) source;
> -uint32_t wait = -1;
> -aio_bh_update_timeout(ctx, &wait);
> -
> -if (wait != -1) {
> -*timeout = MIN(*timeout, wait);
> -return wait == 0;
> -}
>  
> -return FALSE;
> +return scheduled;
>  }
>  
>  static gboolean
> diff --git a/oslib-posix.c b/oslib-posix.c
> index dbeb627..9db9c3d 100644
> --- a/oslib-posix.c
> +++ b/oslib-posix.c
> @@ -61,9 +61,6 @@ static int running_on_valgrind = -1;
>  #ifdef CONFIG_LINUX
>  #include 
>  #endif
> -#ifdef CONFIG_EVENTFD
> -#include 
> -#endif
>  
>  int qemu_get_thread_id(void)
>  {
> @@ -183,34 +180,6 @@ int qemu_pipe(int pipefd[2])
>  return ret;
>  }
>  
> -/*
> - * Creates an eventfd that looks like a pipe and has EFD_CLOEXEC set.
> - */
> -int qemu_eventfd(int fds[2])
> -{
> -#ifdef CONFIG_EVENTFD
> -int ret;
> -
> -ret = eventfd(0, 0);
> -if (ret >= 0) {
> -fds[0] = ret;
> -fds[1] = dup(ret);
> -if (fds[1] == -1) {
> -close(ret);
> -return -1;
> -}
> -qemu_set_cloexec(ret);
> -qemu_set_cloexec(fds[1]);
> -return 0;
> -}
> -if (errno != ENOSYS) {
> -return -1;
> -}
> -#endif
> -
> -return qemu_pipe(fds);
> -}
> -
>  int qemu_utimens(const char *path, const struct timespec *times)
>  {
>  struct timeval tv[2], tv_now;
> diff --git a/qemu-aio.h b/qemu-aio.h
> index 2354617..1b7eb6e 100644
> --- a/qemu-aio.h
> +++ b/qemu-aio.h
> @@ -125,7 +125,6 @@ void aio_notify(AioContext *ctx);
>   * These are internal functions used by the QEMU main loop.
>   */
>  int aio_bh_poll(AioContext *ctx);
> -void aio_bh_update_timeout(AioContext *ctx, uint32_t *timeout);
>  
>  /**
>   * qemu_bh_schedule: Schedule a bottom half.
> diff --git a/qemu-common.h b/qemu-common.h
> index ac44657..1ea6ea3 100644
> --- a/qemu-common.h
> +++ b/qemu-common.h
> @@ -219,7 +219,6 @@ ssize_t qemu_recv_full(int fd, void *buf, size_t count, 
> int flags)
>  QEMU_WARN_UNUSED_RESULT;
>  
>  #ifndef _WIN32
> -int qemu_eventfd(int pipefd[2]);
>  int qemu_pipe(int pipefd[2]);
>  #endif
>  
> -- 
> 1.7.12

Re: [Qemu-devel] [PATCH 14/17] main-loop: use GSource to poll AIO file descriptors

2012-09-25 Thread Anthony Liguori

Paolo Bonzini  writes:

> Signed-off-by: Paolo Bonzini 
> ---
>  main-loop.c | 23 ++-
>  main-loop.h |  2 --
>  2 file modificati, 6 inserzioni(+), 19 rimozioni(-)
>
> diff --git a/main-loop.c b/main-loop.c
> index b290c79..209f699 100644
> --- a/main-loop.c
> +++ b/main-loop.c
> @@ -205,6 +205,7 @@ static AioContext *qemu_aio_context;
>  int main_loop_init(void)
>  {
>  int ret;
> +GSource *src;
>  
>  qemu_mutex_lock_iothread();
>  ret = qemu_signal_init();
> @@ -219,6 +220,9 @@ int main_loop_init(void)
>  }
>  
>  qemu_aio_context = aio_context_new();
> +src = aio_get_g_source(qemu_aio_context);
> +g_source_attach(src, NULL);
> +g_source_unref(src);
>  return 0;
>  }
>  
> @@ -481,8 +485,6 @@ int main_loop_wait(int nonblocking)
>  
>  if (nonblocking) {
>  timeout = 0;
> -} else {
> -aio_bh_update_timeout(qemu_aio_context, &timeout);
>  }
>  
>  /* poll any events */
> @@ -505,10 +507,6 @@ int main_loop_wait(int nonblocking)
>  
>  qemu_run_all_timers();
>  
> -/* Check bottom-halves last in case any of the earlier events triggered
> -   them.  */
> -qemu_bh_poll();
> -

This is an awesome cleanup!

What do you think about deprecating bottom halves in the !block code in
favor of idle functions?  I don't see any reason to keep using bottom
halves...

Reviewed-by: Anthony Liguori 

Regards,

Anthony Liguori

>  return ret;
>  }
>  
> @@ -519,11 +517,6 @@ QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque)
>  return aio_bh_new(qemu_aio_context, cb, opaque);
>  }
>  
> -int qemu_bh_poll(void)
> -{
> -return aio_bh_poll(qemu_aio_context);
> -}
> -
>  void qemu_aio_flush(void)
>  {
>  aio_flush(qemu_aio_context);
> @@ -543,16 +536,12 @@ void qemu_aio_set_fd_handler(int fd,
>  {
>  aio_set_fd_handler(qemu_aio_context, fd, io_read, io_write, io_flush,
> opaque);
> -
> -qemu_set_fd_handler2(fd, NULL, io_read, io_write, opaque);
>  }
> +#endif
>  
>  void qemu_aio_set_event_notifier(EventNotifier *notifier,
>   EventNotifierHandler *io_read,
>   AioFlushEventNotifierHandler *io_flush)
>  {
> -qemu_aio_set_fd_handler(event_notifier_get_fd(notifier),
> -(IOHandler *)io_read, NULL,
> -(AioFlushHandler *)io_flush, notifier);
> +aio_set_event_notifier(qemu_aio_context, notifier, io_read, io_flush);
>  }
> -#endif
> diff --git a/main-loop.h b/main-loop.h
> index 47644ce..c58f38b 100644
> --- a/main-loop.h
> +++ b/main-loop.h
> @@ -312,7 +312,5 @@ void qemu_iohandler_poll(fd_set *readfds, fd_set 
> *writefds, fd_set *xfds, int rc
>  
>  QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque);
>  void qemu_bh_schedule_idle(QEMUBH *bh);
> -int qemu_bh_poll(void);
> -void qemu_bh_update_timeout(uint32_t *timeout);
>  
>  #endif
> -- 
> 1.7.12

Re: [Qemu-devel] [PATCH 12/17] aio: add aio_notify

2012-09-25 Thread Anthony Liguori

Paolo Bonzini  writes:

> With this change async.c does not rely anymore on any service from
> main-loop.c, i.e. it is completely self-contained.
>
> Signed-off-by: Paolo Bonzini 

Other than the coding style bits that need to be fixed first in the
previous patch:

Reviewed-by: Anthony Liguori 

Regards,

Anthony Liguori

> ---
>  async.c| 30 ++
>  qemu-aio.h | 18 ++
>  2 file modificati, 44 inserzioni(+), 4 rimozioni(-)
>
> diff --git a/async.c b/async.c
> index ed2bd3f..31c6c76 100644
> --- a/async.c
> +++ b/async.c
> @@ -30,6 +30,7 @@
>  /* bottom halves (can be seen as timers which expire ASAP) */
>  
>  struct QEMUBH {
> +AioContext *ctx;
>  QEMUBHFunc *cb;
>  void *opaque;
>  QEMUBH *next;
> @@ -42,6 +43,7 @@ QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void 
> *opaque)
>  {
>  QEMUBH *bh;
>  bh = g_malloc0(sizeof(QEMUBH));
> +bh->ctx = ctx;
>  bh->cb = cb;
>  bh->opaque = opaque;
>  bh->next = ctx->first_bh;
> @@ -101,8 +103,7 @@ void qemu_bh_schedule(QEMUBH *bh)
>  return;
>  bh->scheduled = 1;
>  bh->idle = 0;
> -/* stop the currently executing CPU to execute the BH ASAP */
> -qemu_notify_event();
> +aio_notify(bh->ctx);
>  }
>  
>  void qemu_bh_cancel(QEMUBH *bh)
> @@ -177,11 +178,20 @@ aio_ctx_dispatch(GSource *source,
>  return TRUE;
>  }
>  
> +static void
> +aio_ctx_finalize(GSource *source)
> +{
> +AioContext *ctx = (AioContext *) source;
> +
> +aio_set_event_notifier(ctx, &ctx->notifier, NULL, NULL);
> +event_notifier_cleanup(&ctx->notifier);
> +}
> +
>  static GSourceFuncs aio_source_funcs = {
>  aio_ctx_prepare,
>  aio_ctx_check,
>  aio_ctx_dispatch,
> -NULL
> +aio_ctx_finalize
>  };
>  
>  GSource *aio_get_g_source(AioContext *ctx)
> @@ -190,9 +200,21 @@ GSource *aio_get_g_source(AioContext *ctx)
>  return &ctx->source;
>  }
>  
> +void aio_notify(AioContext *ctx)
> +{
> +event_notifier_set(&ctx->notifier);
> +}
> +
>  AioContext *aio_context_new(void)
>  {
> -return (AioContext *) g_source_new(&aio_source_funcs, 
> sizeof(AioContext));
> +AioContext *ctx;
> +ctx = (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
> +event_notifier_init(&ctx->notifier, false);
> +aio_set_event_notifier(ctx, &ctx->notifier, 
> +   (EventNotifierHandler *)
> +   event_notifier_test_and_clear, NULL);
> +
> +return ctx;
>  }
>  
>  void aio_context_ref(AioContext *ctx)
> diff --git a/qemu-aio.h b/qemu-aio.h
> index aedf66c..2354617 100644
> --- a/qemu-aio.h
> +++ b/qemu-aio.h
> @@ -62,6 +62,9 @@ typedef struct AioContext {
>   * no callbacks are removed while we're walking and dispatching 
> callbacks.
>   */
>  int walking_bh;
> +
> +/* Used for aio_notify.  */
> +EventNotifier notifier;
>  } AioContext;
>  
>  /* Returns 1 if there are still outstanding AIO requests; 0 otherwise */
> @@ -102,6 +105,21 @@ void aio_context_unref(AioContext *ctx);
>  QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque);
>  
>  /**
> + * aio_notify: Force processing of pending events.
> + *
> + * Similar to signaling a condition variable, aio_notify forces
> + * aio_wait to exit, so that the next call will re-examine pending events.
> + * The caller of aio_notify will usually call aio_wait again very soon,
> + * or go through another iteration of the GLib main loop.  Hence, aio_notify
> + * also has the side effect of recalculating the sets of file descriptors
> + * that the main loop waits for.
> + *
> + * Calling aio_notify is rarely necessary, because for example scheduling
> + * a bottom half calls it already.
> + */
> +void aio_notify(AioContext *ctx);
> +
> +/**
>   * aio_bh_poll: Poll bottom halves for an AioContext.
>   *
>   * These are internal functions used by the QEMU main loop.
> -- 
> 1.7.12

Re: [Qemu-devel] [PATCH 13/17] aio: call aio_notify after setting I/O handlers

2012-09-25 Thread Anthony Liguori

Paolo Bonzini  writes:

> In the current code, this is done by qemu_set_fd_handler2, which is
> called by qemu_aio_set_fd_handler.  We need to keep the same behavior
> even after removing the call to qemu_set_fd_handler2.
>
> Signed-off-by: Paolo Bonzini 

Reviewed-by: Anthony Liguori 

Regards,

Anthony Liguori


> ---
>  aio-posix.c | 2 ++
>  aio-win32.c | 2 ++
>  2 file modificati, 4 inserzioni(+)
>
> diff --git a/aio-posix.c b/aio-posix.c
> index e29ece9..41f638f 100644
> --- a/aio-posix.c
> +++ b/aio-posix.c
> @@ -90,6 +90,8 @@ void aio_set_fd_handler(AioContext *ctx,
>  node->pfd.events |= (io_read ? G_IO_IN | G_IO_HUP : 0);
>  node->pfd.events |= (io_write ? G_IO_OUT : 0);
>  }
> +
> +aio_notify(ctx);
>  }
>  
>  void aio_set_event_notifier(AioContext *ctx,
> diff --git a/aio-win32.c b/aio-win32.c
> index 5057371..78faf69 100644
> --- a/aio-win32.c
> +++ b/aio-win32.c
> @@ -75,6 +75,8 @@ void aio_set_event_notifier(AioContext *ctx,
>  node->io_notify = io_notify;
>  node->io_flush = io_flush;
>  }
> +
> +aio_notify(ctx);
>  }
>  
>  bool aio_pending(AioContext *ctx)
> -- 
> 1.7.12

Re: [Qemu-devel] [PATCH 11/17] aio: make AioContexts GSources

2012-09-25 Thread Anthony Liguori

Paolo Bonzini  writes:

> This lets AioContexts be used (optionally) with a glib main loop.
>
> Signed-off-by: Paolo Bonzini 
> ---
>  aio-posix.c |  4 
>  aio-win32.c |  4 
>  async.c | 65 
> -
>  qemu-aio.h  | 23 ++
>  4 file modificati, 95 inserzioni(+). 1 rimozione(-)
>
> diff --git a/aio-posix.c b/aio-posix.c
> index c848a9f..e29ece9 100644
> --- a/aio-posix.c
> +++ b/aio-posix.c
> @@ -56,6 +56,8 @@ void aio_set_fd_handler(AioContext *ctx,
>  /* Are we deleting the fd handler? */
>  if (!io_read && !io_write) {
>  if (node) {
> +g_source_remove_poll(&ctx->source, &node->pfd);
> +
>  /* If the lock is held, just mark the node as deleted */
>  if (ctx->walking_handlers) {
>  node->deleted = 1;
> @@ -75,6 +77,8 @@ void aio_set_fd_handler(AioContext *ctx,
>  node = g_malloc0(sizeof(AioHandler));
>  node->pfd.fd = fd;
>  QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
> +
> +g_source_add_poll(&ctx->source, &node->pfd);
>  }
>  /* Update handler with latest information */
>  node->io_read = io_read;
> diff --git a/aio-win32.c b/aio-win32.c
> index c46dfb2..5057371 100644
> --- a/aio-win32.c
> +++ b/aio-win32.c
> @@ -45,6 +45,8 @@ void aio_set_event_notifier(AioContext *ctx,
>  /* Are we deleting the fd handler? */
>  if (!io_notify) {
>  if (node) {
> +g_source_remove_poll(&ctx->source, &node->pfd);
> +

Why remove vs. setting events = 0?

add_poll/remove_poll also comes with an event loop notify which I don't
think is strictly necessary here.

>  /* If the lock is held, just mark the node as deleted */
>  if (ctx->walking_handlers) {
>  node->deleted = 1;
> @@ -66,6 +68,8 @@ void aio_set_event_notifier(AioContext *ctx,
>  node->pfd.fd = (uintptr_t)event_notifier_get_handle(e);
>  node->pfd.events = G_IO_IN;
>  QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
> +
> +g_source_add_poll(&ctx->source, &node->pfd);
>  }
>  /* Update handler with latest information */
>  node->io_notify = io_notify;
> diff --git a/async.c b/async.c
> index 513bdd7..ed2bd3f 100644
> --- a/async.c
> +++ b/async.c
> @@ -136,10 +136,73 @@ void aio_bh_update_timeout(AioContext *ctx, uint32_t 
> *timeout)
>  }
>  }
>  
> +static gboolean
> +aio_ctx_prepare(GSource *source, gint*timeout)
> +{
> +AioContext *ctx = (AioContext *) source;
> +uint32_t wait = -1;
> +aio_bh_update_timeout(ctx, &wait);
> +
> +if (wait != -1) {
> +*timeout = MIN(*timeout, wait);
> +return wait == 0;
> +}
> +
> +return FALSE;
> +}
> +
> +static gboolean
> +aio_ctx_check(GSource *source)
> +{
> +AioContext *ctx = (AioContext *) source;
> +QEMUBH *bh;
> +
> +for (bh = ctx->first_bh; bh; bh = bh->next) {
> +if (!bh->deleted && bh->scheduled) {
> +return true;
> + }
> +}
> +return aio_pending(ctx);
> +}

Think you've got some copy/paste leftover glib coding style.  Probably
should use TRUE/true consistently too.  I think using TRUE/FALSE for
gboolean and true/false for bool is reasonable.

> +
> +static gboolean
> +aio_ctx_dispatch(GSource *source,
> + GSourceFunc  callback,
> + gpointer user_data)
> +{
> +AioContext *ctx = (AioContext *) source;
> +
> +assert(callback == NULL);
> +aio_poll(ctx, false);
> +return TRUE;
> +}
> +
> +static GSourceFuncs aio_source_funcs = {
> +aio_ctx_prepare,
> +aio_ctx_check,
> +aio_ctx_dispatch,
> +NULL
> +};
> +
> +GSource *aio_get_g_source(AioContext *ctx)
> +{
> +g_source_ref(&ctx->source);
> +return &ctx->source;
> +}
>  
>  AioContext *aio_context_new(void)
>  {
> -return g_new0(AioContext, 1);
> +return (AioContext *) g_source_new(&aio_source_funcs, 
> sizeof(AioContext));
> +}
> +
> +void aio_context_ref(AioContext *ctx)
> +{
> +g_source_ref(&ctx->source);
> +}
> +
> +void aio_context_unref(AioContext *ctx)
> +{
> +g_source_unref(&ctx->source);
>  }
>  
>  void aio_flush(AioContext *ctx)
> diff --git a/qemu-aio.h b/qemu-aio.h
> index ac24896..aedf66c 100644
> --- a/qemu-aio.h
> +++ b/qemu-aio.h
> @@ -44,6 +44,8 @@ typedef void QEMUBHFunc(void *opaque);
>  typedef void IOHandler(void *opaque);
>  
>  typedef struct AioContext {
> +GSource source;
> +
>  /* The list of registered AIO handlers */
>  QLIST_HEAD(, AioHandler) aio_handlers;
>  
> @@ -75,6 +77,22 @@ typedef int (AioFlushEventNotifierHandler)(EventNotifier 
> *e);
>  AioContext *aio_context_new(void);
>  
>  /**
> + * aio_context_ref:
> + * @ctx: The AioContext to operate on.
> + *
> + * Add a reference to an AioContext.
> + */
> +void aio_context_ref(AioContext *ctx);
> +
> +/**
> + * aio_context_unr

Re: [Qemu-devel] [PATCH 09/17] aio: prepare for introducing GSource-based dispatch

2012-09-25 Thread Anthony Liguori

Paolo Bonzini  writes:

> This adds a GPollFD to each AioHandler.  It will then be possible to
> attach these GPollFDs to a GSource, and from there to the main loop.
> aio_wait examines the GPollFDs and avoids calling select() if any
> is set (similar to what it does if bottom halves are available).
>
> Signed-off-by: Paolo Bonzini 
> ---
>  aio.c  | 82 
> +-
>  qemu-aio.h |  7 ++
>  2 file modificati, 78 inserzioni(+), 11 rimozioni(-)
>
> diff --git a/aio.c b/aio.c
> index 95ad467..c848a9f 100644
> --- a/aio.c
> +++ b/aio.c
> @@ -20,7 +20,7 @@
>  
>  struct AioHandler
>  {
> -int fd;
> +GPollFD pfd;
>  IOHandler *io_read;
>  IOHandler *io_write;
>  AioFlushHandler *io_flush;
> @@ -34,7 +34,7 @@ static AioHandler *find_aio_handler(AioContext *ctx, int fd)
>  AioHandler *node;
>  
>  QLIST_FOREACH(node, &ctx->aio_handlers, node) {
> -if (node->fd == fd)
> +if (node->pfd.fd == fd)
>  if (!node->deleted)
>  return node;
>  }
> @@ -57,9 +57,10 @@ void aio_set_fd_handler(AioContext *ctx,
>  if (!io_read && !io_write) {
>  if (node) {
>  /* If the lock is held, just mark the node as deleted */
> -if (ctx->walking_handlers)
> +if (ctx->walking_handlers) {
>  node->deleted = 1;
> -else {
> +node->pfd.revents = 0;
> +} else {
>  /* Otherwise, delete it for real.  We can't just mark it as
>   * deleted because deleted nodes are only cleaned up after
>   * releasing the walking_handlers lock.
> @@ -72,7 +73,7 @@ void aio_set_fd_handler(AioContext *ctx,
>  if (node == NULL) {
>  /* Alloc and insert if it's not already there */
>  node = g_malloc0(sizeof(AioHandler));
> -node->fd = fd;
> +node->pfd.fd = fd;
>  QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
>  }
>  /* Update handler with latest information */
> @@ -80,6 +81,10 @@ void aio_set_fd_handler(AioContext *ctx,
>  node->io_write = io_write;
>  node->io_flush = io_flush;
>  node->opaque = opaque;
> +
> +node->pfd.events = G_IO_ERR;
> +node->pfd.events |= (io_read ? G_IO_IN | G_IO_HUP : 0);
> +node->pfd.events |= (io_write ? G_IO_OUT : 0);
>  }

Should we even set G_IO_ERR?  I think that corresponds to exceptfd in
select() but we've never set that historically.  I know glib recommends
it but I don't think it's applicable to how we use it.

Moreover, the way you do dispatch, if G_IO_ERR did occur, we'd dispatch
both the read and write handlers which definitely isn't right.

I think it's easiest just to drop it.

>  }
>  
> @@ -93,6 +98,25 @@ void aio_set_event_notifier(AioContext *ctx,
> (AioFlushHandler *)io_flush, notifier);
>  }
>  
> +bool aio_pending(AioContext *ctx)
> +{
> +AioHandler *node;
> +
> +QLIST_FOREACH(node, &ctx->aio_handlers, node) {
> +int revents;
> +
> +revents = node->pfd.revents & node->pfd.events;
> +if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read) {
> +return true;
> +}
> +if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write) {
> +return true;
> +}
> +}
> +
> +return false;
> +}
> +
>  bool aio_poll(AioContext *ctx, bool blocking)
>  {
>  static struct timeval tv0;
> @@ -114,6 +138,42 @@ bool aio_poll(AioContext *ctx, bool blocking)
>  progress = true;
>  }
>  
> +/*
> + * Then dispatch any pending callbacks from the GSource.
> + *
> + * We have to walk very carefully in case qemu_aio_set_fd_handler is
> + * called while we're walking.
> + */
> +node = QLIST_FIRST(&ctx->aio_handlers);
> +while (node) {
> +AioHandler *tmp;
> +int revents;
> +
> +ctx->walking_handlers++;
> +
> +revents = node->pfd.revents & node->pfd.events;
> +node->pfd.revents &= ~revents;

This is interesting and I must admit I don't understand why it's
necessary.  What case are you trying to handle?

> +
> +if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read) {
> +node->io_read(node->opaque);
> +progress = true;
> +}
> +if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write) {
> +node->io_write(node->opaque);
> +progress = true;
> +}
> +
> +tmp = node;
> +node = QLIST_NEXT(node, node);
> +
> +ctx->walking_handlers--;
> +
> +if (!ctx->walking_handlers && tmp->deleted) {
> +QLIST_REMOVE(tmp, node);
> +g_free(tmp);
> +}
> +}
> +
>  if (progress && !blocking) {
>  return true;
>  }
> @@ -137,12 +197,12 @@ bool aio_poll(AioContext *ctx, bool blocking)
>

Re: [Qemu-devel] [PATCH 08/17] aio: add non-blocking variant of aio_wait

2012-09-25 Thread Anthony Liguori

Paolo Bonzini  writes:

> This will be used when polling the GSource attached to an AioContext.
>
> Signed-off-by: Paolo Bonzini 

Reviewed-by: Anthony Liguori 

Regards,

Anthony Liguori

> ---
>  aio.c   | 16 
>  async.c |  2 +-
>  main-loop.c |  2 +-
>  qemu-aio.h  | 21 +++--
>  4 file modificati, 29 inserzioni(+), 12 rimozioni(-)
>
> diff --git a/aio.c b/aio.c
> index c89f1e9..95ad467 100644
> --- a/aio.c
> +++ b/aio.c
> @@ -93,13 +93,16 @@ void aio_set_event_notifier(AioContext *ctx,
> (AioFlushHandler *)io_flush, notifier);
>  }
>  
> -bool aio_wait(AioContext *ctx)
> +bool aio_poll(AioContext *ctx, bool blocking)
>  {
> +static struct timeval tv0;
>  AioHandler *node;
>  fd_set rdfds, wrfds;
>  int max_fd = -1;
>  int ret;
> -bool busy;
> +bool busy, progress;
> +
> +progress = false;
>  
>  /*
>   * If there are callbacks left that have been queued, we need to call 
> then.
> @@ -107,6 +110,11 @@ bool aio_wait(AioContext *ctx)
>   * does not need a complete flush (as is the case for qemu_aio_wait 
> loops).
>   */
>  if (aio_bh_poll(ctx)) {
> +blocking = false;
> +progress = true;
> +}
> +
> +if (progress && !blocking) {
>  return true;
>  }
>  
> @@ -142,11 +150,11 @@ bool aio_wait(AioContext *ctx)
>  
>  /* No AIO operations?  Get us out of here */
>  if (!busy) {
> -return false;
> +return progress;
>  }
>  
>  /* wait until next event */
> -ret = select(max_fd, &rdfds, &wrfds, NULL, NULL);
> +ret = select(max_fd, &rdfds, &wrfds, NULL, blocking ? NULL : &tv0);
>  
>  /* if we have any readable fds, dispatch event */
>  if (ret > 0) {
> diff --git a/async.c b/async.c
> index c99db79..513bdd7 100644
> --- a/async.c
> +++ b/async.c
> @@ -144,5 +144,5 @@ AioContext *aio_context_new(void)
>  
>  void aio_flush(AioContext *ctx)
>  {
> -while (aio_wait(ctx));
> +while (aio_poll(ctx, true));
>  }
> diff --git a/main-loop.c b/main-loop.c
> index 8301fe9..67800fe 100644
> --- a/main-loop.c
> +++ b/main-loop.c
> @@ -531,7 +531,7 @@ void qemu_aio_flush(void)
>  
>  bool qemu_aio_wait(void)
>  {
> -return aio_wait(qemu_aio_context);
> +return aio_poll(qemu_aio_context, true);
>  }
>  
>  void qemu_aio_set_fd_handler(int fd,
> diff --git a/qemu-aio.h b/qemu-aio.h
> index f8a93d8..f19201e 100644
> --- a/qemu-aio.h
> +++ b/qemu-aio.h
> @@ -133,13 +133,22 @@ void qemu_bh_delete(QEMUBH *bh);
>   * outstanding AIO operations have been completed or cancelled. */
>  void aio_flush(AioContext *ctx);
>  
> -/* Wait for a single AIO completion to occur.  This function will wait
> - * until a single AIO event has completed and it will ensure something
> - * has moved before returning. This can issue new pending aio as
> - * result of executing I/O completion or bh callbacks.
> +/* Progress in completing AIO work to occur.  This can issue new pending
> + * aio as a result of executing I/O completion or bh callbacks.
>   *
> - * Return whether there is still any pending AIO operation.  */
> -bool aio_wait(AioContext *ctx);
> + * If there is no pending AIO operation or completion (bottom half),
> + * return false.  If there are pending bottom halves, return true.
> + *
> + * If there are no pending bottom halves, but there are pending AIO
> + * operations, it may not be possible to make any progress without
> + * blocking.  If @blocking is true, this function will wait until one
> + * or more AIO events have completed, to ensure something has moved
> + * before returning.
> + *
> + * If @blocking is false, this function will also return false if the
> + * function cannot make any progress without blocking.
> + */
> +bool aio_poll(AioContext *ctx, bool blocking);
>  
>  #ifdef CONFIG_POSIX
>  /* Returns 1 if there are still outstanding AIO requests; 0 otherwise */
> -- 
> 1.7.12

Re: [Qemu-devel] [PATCH 06/17] aio: introduce AioContext, move bottom halves there

2012-09-25 Thread Anthony Liguori

Paolo Bonzini  writes:

> Start introducing AioContext, which will let us remove globals from
> aio.c/async.c, and introduce multiple I/O threads.
>
> The bottom half functions now take an additional AioContext argument.
> A bottom half is created with a specific AioContext that remains the
> same throughout the lifetime.
>
> Signed-off-by: Paolo Bonzini 
> ---
>  async.c   | 30 +--
>  hw/hw.h   |  1 +
>  iohandler.c   |  1 +
>  main-loop.c   | 18 +++-
>  main-loop.h   | 54 ++-
>  qemu-aio.h| 79 
> ++-
>  qemu-char.h   |  1 +
>  qemu-common.h |  1 +
>  qemu-coroutine-lock.c |  2 +-
>  9 file modificati, 118 inserzioni(+), 69 rimozioni(-)
>
> diff --git a/async.c b/async.c
> index 85cc641..189ee1b 100644
> --- a/async.c
> +++ b/async.c
> @@ -26,9 +26,6 @@
>  #include "qemu-aio.h"
>  #include "main-loop.h"
>  
> -/* Anchor of the list of Bottom Halves belonging to the context */
> -static struct QEMUBH *first_bh;
> -
>  /***/
>  /* bottom halves (can be seen as timers which expire ASAP) */
>  
> @@ -41,27 +38,26 @@ struct QEMUBH {
>  bool deleted;
>  };
>  
> -QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque)
> +QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
>  {
>  QEMUBH *bh;
>  bh = g_malloc0(sizeof(QEMUBH));
>  bh->cb = cb;
>  bh->opaque = opaque;
> -bh->next = first_bh;
> -first_bh = bh;
> +bh->next = ctx->first_bh;
> +ctx->first_bh = bh;
>  return bh;
>  }
>  
> -int qemu_bh_poll(void)
> +int aio_bh_poll(AioContext *ctx)
>  {
>  QEMUBH *bh, **bhp, *next;
>  int ret;
> -static int nesting = 0;
>  
> -nesting++;
> +ctx->walking_bh++;
>  
>  ret = 0;
> -for (bh = first_bh; bh; bh = next) {
> +for (bh = ctx->first_bh; bh; bh = next) {
>  next = bh->next;
>  if (!bh->deleted && bh->scheduled) {
>  bh->scheduled = 0;
> @@ -72,11 +68,11 @@ int qemu_bh_poll(void)
>  }
>  }
>  
> -nesting--;
> +ctx->walking_bh--;
>  
>  /* remove deleted bhs */
> -if (!nesting) {
> -bhp = &first_bh;
> +if (!ctx->walking_bh) {
> +bhp = &ctx->first_bh;
>  while (*bhp) {
>  bh = *bhp;
>  if (bh->deleted) {
> @@ -120,11 +116,11 @@ void qemu_bh_delete(QEMUBH *bh)
>  bh->deleted = 1;
>  }
>  
> -void qemu_bh_update_timeout(uint32_t *timeout)
> +void aio_bh_update_timeout(AioContext *ctx, uint32_t *timeout)
>  {
>  QEMUBH *bh;
>  
> -for (bh = first_bh; bh; bh = bh->next) {
> +for (bh = ctx->first_bh; bh; bh = bh->next) {
>  if (!bh->deleted && bh->scheduled) {
>  if (bh->idle) {
>  /* idle bottom halves will be polled at least
> @@ -140,3 +136,7 @@ void qemu_bh_update_timeout(uint32_t *timeout)
>  }
>  }
>  
> +AioContext *aio_context_new(void)
> +{
> +return g_new0(AioContext, 1);
> +}
> diff --git a/hw/hw.h b/hw/hw.h
> index e5cb9bf..acb718d 100644
> --- a/hw/hw.h
> +++ b/hw/hw.h
> @@ -10,6 +10,7 @@
>  
>  #include "ioport.h"
>  #include "irq.h"
> +#include "qemu-aio.h"
>  #include "qemu-file.h"
>  #include "vmstate.h"
>  
> diff --git a/iohandler.c b/iohandler.c
> index a2d871b..60460a6 100644
> --- a/iohandler.c
> +++ b/iohandler.c
> @@ -26,6 +26,7 @@
>  #include "qemu-common.h"
>  #include "qemu-char.h"
>  #include "qemu-queue.h"
> +#include "qemu-aio.h"
>  #include "main-loop.h"
>  
>  #ifndef _WIN32
> diff --git a/main-loop.c b/main-loop.c
> index eb3b6e6..f0bc515 100644
> --- a/main-loop.c
> +++ b/main-loop.c
> @@ -26,6 +26,7 @@
>  #include "qemu-timer.h"
>  #include "slirp/slirp.h"
>  #include "main-loop.h"
> +#include "qemu-aio.h"
>  
>  #ifndef _WIN32
>  
> @@ -199,6 +200,8 @@ static int qemu_signal_init(void)
>  }
>  #endif
>  
> +static AioContext *qemu_aio_context;
> +
>  int main_loop_init(void)
>  {
>  int ret;
> @@ -215,6 +218,7 @@ int main_loop_init(void)
>  return ret;
>  }
>  
> +qemu_aio_context = aio_context_new();
>  return 0;
>  }
>  
> @@ -478,7 +482,7 @@ int main_loop_wait(int nonblocking)
>  if (nonblocking) {
>  timeout = 0;
>  } else {
> -qemu_bh_update_timeout(&timeout);
> +aio_bh_update_timeout(qemu_aio_context, &timeout);
>  }
>  
>  /* poll any events */
> @@ -507,3 +511,15 @@ int main_loop_wait(int nonblocking)
>  
>  return ret;
>  }
> +
> +/* Functions to operate on the main QEMU AioContext.  */
> +
> +QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque)
> +{
> +return aio_bh_new(qemu_aio_context, cb, opaque);
> +}
> +
> +int qemu_bh_poll(void)
> +{
> +return aio_bh_poll(qemu_aio_context);
> +}
> diff --git a/main-loop.h b/main-loop.h
> index dce1cd9..47644ce 100644
> --- a/main-loop.h
> +++ b/main-loop.h
> @@ -25,6 +25,8 @@
>  #ifndef QEMU_

Re: [Qemu-devel] [PATCH 05/17] aio: provide platform-independent API

2012-09-25 Thread Anthony Liguori

Paolo Bonzini  writes:

> This adds to aio.c a platform-independent API based on EventNotifiers, that
> can be used by both POSIX and Win32.
>
> Signed-off-by: Paolo Bonzini 

Reviewed-by: Anthony Liguori 

Regards,

Anthony Liguori

> ---
>  Makefile.objs |  4 ++--
>  aio.c |  9 +
>  qemu-aio.h| 19 ++-
>  3 file modificati, 29 inserzioni(+), 3 rimozioni(-)
>
> diff --git a/Makefile.objs b/Makefile.objs
> index a99378c..713dd87 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -45,6 +45,8 @@ block-obj-y = iov.o cache-utils.o qemu-option.o module.o 
> async.o
>  block-obj-y += nbd.o block.o aio.o aes.o qemu-config.o qemu-progress.o 
> qemu-sockets.o
>  block-obj-y += $(coroutine-obj-y) $(qobject-obj-y) $(version-obj-y)
>  block-obj-$(CONFIG_POSIX) += posix-aio-compat.o
> +block-obj-$(CONFIG_POSIX) += event_notifier-posix.o
> +block-obj-$(CONFIG_WIN32) += event_notifier-win32.o
>  block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
>  block-obj-y += block/
>  
> @@ -92,8 +94,6 @@ common-obj-y += bt-host.o bt-vhci.o
>  common-obj-y += acl.o
>  common-obj-$(CONFIG_POSIX) += compatfd.o
>  common-obj-y += notify.o
> -common-obj-$(CONFIG_POSIX) += event_notifier-posix.o
> -common-obj-$(CONFIG_WIN32) += event_notifier-win32.o
>  common-obj-y += qemu-timer.o qemu-timer-common.o
>  
>  common-obj-$(CONFIG_SLIRP) += slirp/
> diff --git a/aio.c b/aio.c
> index e062aab..44214e1 100644
> --- a/aio.c
> +++ b/aio.c
> @@ -95,6 +95,15 @@ void qemu_aio_set_fd_handler(int fd,
>  qemu_set_fd_handler2(fd, NULL, io_read, io_write, opaque);
>  }
>  
> +void qemu_aio_set_event_notifier(EventNotifier *notifier,
> + EventNotifierHandler *io_read,
> + AioFlushEventNotifierHandler *io_flush)
> +{
> +qemu_aio_set_fd_handler(event_notifier_get_fd(notifier),
> +(IOHandler *)io_read, NULL,
> +(AioFlushHandler *)io_flush, notifier);
> +}
> +
>  void qemu_aio_flush(void)
>  {
>  while (qemu_aio_wait());
> diff --git a/qemu-aio.h b/qemu-aio.h
> index 27a7e21..dc416a5 100644
> --- a/qemu-aio.h
> +++ b/qemu-aio.h
> @@ -16,6 +16,7 @@
>  
>  #include "qemu-common.h"
>  #include "qemu-char.h"
> +#include "event_notifier.h"
>  
>  typedef struct BlockDriverAIOCB BlockDriverAIOCB;
>  typedef void BlockDriverCompletionFunc(void *opaque, int ret);
> @@ -39,7 +40,7 @@ void *qemu_aio_get(AIOPool *pool, BlockDriverState *bs,
>  void qemu_aio_release(void *p);
>  
>  /* Returns 1 if there are still outstanding AIO requests; 0 otherwise */
> -typedef int (AioFlushHandler)(void *opaque);
> +typedef int (AioFlushEventNotifierHandler)(EventNotifier *e);
>  
>  /* Flush any pending AIO operation. This function will block until all
>   * outstanding AIO operations have been completed or cancelled. */
> @@ -53,6 +54,10 @@ void qemu_aio_flush(void);
>   * Return whether there is still any pending AIO operation.  */
>  bool qemu_aio_wait(void);
>  
> +#ifdef CONFIG_POSIX
> +/* Returns 1 if there are still outstanding AIO requests; 0 otherwise */
> +typedef int (AioFlushHandler)(void *opaque);
> +
>  /* Register a file descriptor and associated callbacks.  Behaves very 
> similarly
>   * to qemu_set_fd_handler2.  Unlike qemu_set_fd_handler2, these callbacks 
> will
>   * be invoked when using either qemu_aio_wait() or qemu_aio_flush().
> @@ -65,5 +70,17 @@ void qemu_aio_set_fd_handler(int fd,
>   IOHandler *io_write,
>   AioFlushHandler *io_flush,
>   void *opaque);
> +#endif
> +
> +/* Register an event notifier and associated callbacks.  Behaves very 
> similarly
> + * to event_notifier_set_handler.  Unlike event_notifier_set_handler, these 
> callbacks
> + * will be invoked when using either qemu_aio_wait() or qemu_aio_flush().
> + *
> + * Code that invokes AIO completion functions should rely on this function
> + * instead of event_notifier_set_handler.
> + */
> +void qemu_aio_set_event_notifier(EventNotifier *notifier,
> + EventNotifierHandler *io_read,
> + AioFlushEventNotifierHandler *io_flush);
>  
>  #endif
> -- 
> 1.7.12

Re: [Qemu-devel] [PATCH 04/17] aio: change qemu_aio_set_fd_handler to return void

2012-09-25 Thread Anthony Liguori

Paolo Bonzini  writes:

> Signed-off-by: Paolo Bonzini 

Reviewed-by: Anthony Liguori 

Regards,

Anthony Liguori

> ---
>  aio.c  | 12 +---
>  qemu-aio.h | 10 +-
>  2 file modificati, 10 inserzioni(+), 12 rimozioni(-)
>
> diff --git a/aio.c b/aio.c
> index c738a4e..e062aab 100644
> --- a/aio.c
> +++ b/aio.c
> @@ -53,11 +53,11 @@ static AioHandler *find_aio_handler(int fd)
>  return NULL;
>  }
>  
> -int qemu_aio_set_fd_handler(int fd,
> -IOHandler *io_read,
> -IOHandler *io_write,
> -AioFlushHandler *io_flush,
> -void *opaque)
> +void qemu_aio_set_fd_handler(int fd,
> + IOHandler *io_read,
> + IOHandler *io_write,
> + AioFlushHandler *io_flush,
> + void *opaque)
>  {
>  AioHandler *node;
>  
> @@ -93,8 +93,6 @@ int qemu_aio_set_fd_handler(int fd,
>  }
>  
>  qemu_set_fd_handler2(fd, NULL, io_read, io_write, opaque);
> -
> -return 0;
>  }
>  
>  void qemu_aio_flush(void)
> diff --git a/qemu-aio.h b/qemu-aio.h
> index bfdd35f..27a7e21 100644
> --- a/qemu-aio.h
> +++ b/qemu-aio.h
> @@ -60,10 +60,10 @@ bool qemu_aio_wait(void);
>   * Code that invokes AIO completion functions should rely on this function
>   * instead of qemu_set_fd_handler[2].
>   */
> -int qemu_aio_set_fd_handler(int fd,
> -IOHandler *io_read,
> -IOHandler *io_write,
> -AioFlushHandler *io_flush,
> -void *opaque);
> +void qemu_aio_set_fd_handler(int fd,
> + IOHandler *io_read,
> + IOHandler *io_write,
> + AioFlushHandler *io_flush,
> + void *opaque);
>  
>  #endif
> -- 
> 1.7.12

Re: [Qemu-devel] [PATCH 5/9] mm: compaction: Acquire the zone->lru_lock as late as possible

2012-09-25 Thread Andrew Morton

On Tue, 25 Sep 2012 17:13:27 +0900
Minchan Kim  wrote:

> I see. To me, your saying is better than current comment.
> I hope comment could be more explicit.
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index df01b4e..f1d2cc7 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -542,8 +542,9 @@ isolate_migratepages_range(struct zone *zone, struct 
> compact_control *cc,
>  * splitting and collapsing (collapsing has already happened
>  * if PageLRU is set) but the lock is not necessarily taken
>  * here and it is wasteful to take it just to check transhuge.
> -* Check transhuge without lock and skip if it's either a
> -* transhuge or hugetlbfs page.
> +* Check transhuge without lock and *skip* if it's either a
> +* transhuge or hugetlbfs page because it's not safe to call
> +* compound_order.
>  */
> if (PageTransHuge(page)) {
> if (!locked)

Going a bit further:

--- 
a/mm/compaction.c~mm-compaction-acquire-the-zone-lru_lock-as-late-as-possible-fix
+++ a/mm/compaction.c
@@ -415,7 +415,8 @@ isolate_migratepages_range(struct zone *
 * if PageLRU is set) but the lock is not necessarily taken
 * here and it is wasteful to take it just to check transhuge.
 * Check transhuge without lock and skip if it's either a
-* transhuge or hugetlbfs page.
+* transhuge or hugetlbfs page because calling compound_order()
+* requires lru_lock to exclude isolation and splitting.
 */
if (PageTransHuge(page)) {
if (!locked)
_


but...  the requirement to hold lru_lock for compound_order() is news
to me.  It doesn't seem to be written down or explained anywhere, and
one wonders why the cheerily undocumented compound_lock() doesn't have
this effect.  What's going on here??

Re: [Qemu-devel] [PATCH] Align PCI capabilities in pci_find_space

2012-09-25 Thread Alex Williamson

On Tue, 2012-09-25 at 15:59 -0500, m...@cs.wisc.edu wrote:
> From: Matt Renzelmann 
> 
> The current implementation of pci_find_space does not properly align
> PCI capabilities in the PCI configuration space.  This patch fixes
> this issue.
> 
> Signed-off-by: Matt Renzelmann 
> ---
> 
> This is my first patch to QEMU so I hope I'm not screwing up too much.
> The purpose of this patch is to mask off the low-order two bits--Linux
> masks these while scanning the PCI configuration space, for example,
> so we need to make sure QEMU's behavior matches the standard.
> 
> No current QEMU hardware is likely using this but it may be important
> later.
> 
>  hw/pci.c |   14 ++
>  1 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/pci.c b/hw/pci.c
> index e149305..8771b7e 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -1571,11 +1571,17 @@ static int pci_find_space(PCIDevice *pdev, uint8_t 
> size)
>  int config_size = pci_config_size(pdev);
>  int offset = PCI_CONFIG_HEADER_SIZE;
>  int i;
> -for (i = PCI_CONFIG_HEADER_SIZE; i < config_size; ++i)
> -if (pdev->used[i])
> -offset = i + 1;
> -else if (i - offset + 1 == size)
> +int masked;
> +
> +for (i = PCI_CONFIG_HEADER_SIZE; i < config_size; ++i) {
> +masked = i & (~3);
> +if (pdev->used[i]) {
> +offset = masked + 4;
> +} else if (i - offset + 1 == size) {
>  return offset;
> +}
> +}
> +
>  return 0;
>  }
>  

I think you could just search every 4th byte.  In fact, this whole used
byte-map could be turned into a single uint64_t bitmap for standard
config space.  Thanks,

Alex

Re: [Qemu-devel] [PATCH v2] stop using stdio for monitor/serial/etc with -daemonize

2012-09-25 Thread Anthony Liguori

Michael Tokarev  writes:

> Current code binds monitor and serial port to the guest console
> unless -nographic is specified, which is okay.  But when there's
> no guest console (-nographic), the code tries to use stdio for
> the same default devices.  But it does not check for -daemonize
> at the same time -- because when -daemonize is given, there's no
> point at using stdin since it will be closed down the line.
> However, when serial port is attached to stdin, tty control
> modes are changed (switching tty to raw mode), and qemu will
> switch to background, leaving the tty in bad state.
>
> Take -daemonize into account too, when assigning default devices,
> and for -nographic -daemonize case, assign them to "null" instead.

Combining -nographic and -daemonize don't make sense.  I'd rather error
out with this combination.

I think what the user is after is -daemonize -vga none OR -daemonize
-display none.

Regards,

Anthony Liguori

>
> This is https://bugs.launchpad.net/qemu/+bug/1024275
> or http://bugs.debian.org/549195 .
>
> While at it, reformat this code a bit to be less of a maze of
> ifs and use a variable to hold common target of devices.
>
> This patch depends on another patch, 995ee2bf469de6,
> "curses: don't initialize curses when qemu is daemonized",
> by Hitoshi Mitake, which creates is_daemonized() routine.
>
> Signed-off-by: Michael Tokarev 
> ---
>  vl.c |   45 +++--
>  1 file changed, 27 insertions(+), 18 deletions(-)
>
> diff --git a/vl.c b/vl.c
> index 48049ef..a210ff9 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -3392,30 +3392,39 @@ int main(int argc, char **argv, char **envp)
>  default_sdcard = 0;
>  }
>  
> -if (display_type == DT_NOGRAPHIC) {
> -if (default_parallel)
> -add_device_config(DEV_PARALLEL, "null");
> +/* Create default monitor, serial, parallel and virtcon devices. */
> +/* reuse optarg variable */
> +optarg = NULL;
> +if (display_type != DT_NOGRAPHIC) {
> +/* regular case, all devices directed to the guest console */
> +optarg = "vc:80Cx24C";
> +} else if (is_daemonized()) {
> +/* nographic and daemonize, everything => null */
> +optarg = "null";
> +} else {
> +/* nographic and no daemonize */
> +/* can't have both serial and virtcon on stdio */
>  if (default_serial && default_monitor) {
>  add_device_config(DEV_SERIAL, "mon:stdio");
>  } else if (default_virtcon && default_monitor) {
>  add_device_config(DEV_VIRTCON, "mon:stdio");
>  } else {
> -if (default_serial)
> -add_device_config(DEV_SERIAL, "stdio");
> -if (default_virtcon)
> -add_device_config(DEV_VIRTCON, "stdio");
> -if (default_monitor)
> -monitor_parse("stdio", "readline");
> +optarg = "stdio";
>  }
> -} else {
> -if (default_serial)
> -add_device_config(DEV_SERIAL, "vc:80Cx24C");
> -if (default_parallel)
> -add_device_config(DEV_PARALLEL, "vc:80Cx24C");
> -if (default_monitor)
> -monitor_parse("vc:80Cx24C", "readline");
> -if (default_virtcon)
> -add_device_config(DEV_VIRTCON, "vc:80Cx24C");
> +}
> +if (optarg && default_serial) {
> +add_device_config(DEV_SERIAL, optarg);
> +}
> +if (default_parallel) {
> +/* parallel port is connected console or to null */
> +add_device_config(DEV_PARALLEL,
> +  display_type == DT_NOGRAPHIC ? "null" : optarg);
> +}
> +if (optarg && default_monitor) {
> +monitor_parse(optarg, "readline");
> +}
> +if (optarg && default_virtcon) {
> +add_device_config(DEV_VIRTCON, optarg);
>  }
>  
>  socket_init();
> -- 
> 1.7.10.4

Re: [Qemu-devel] [RfC] using pixman in qemu for raster ops

2012-09-25 Thread Anthony Liguori

Gerd Hoffmann  writes:

> On 09/25/12 12:48, Peter Maydell wrote:
>> On 25 September 2012 11:37, Gerd Hoffmann  wrote:
>>> On 09/25/12 11:31, Peter Maydell wrote:
 For me "not a standard library package on RHEL5" is a strong argument
 against adding a hard dependency. (For instance, most of the compute
 cluster machines here are RHEL5 and it would be pretty awkward to
 deal with manually building a dependent library.)
>>>
>>> Why it is that a big deal?  Whatever is used to distribute qemu to the
>>> cluster machines (local yum repo?) can be used to distribute pixman too, no?
>> 
>> There's a big leap between "run configure, put the qemu executable
>> in some generally available directory" and "you have to first get
>> and build some third party library, install it somewhere, tell
>> configure where you put it, then build qemu, then make sure that
>> not just the qemu executable but also this third party library
>> are on all the machines". At the moment you can build qemu without
>> any non-system dependencies, and we should have a really good
>> reason for breaking that.
>
> Hmm, we could import a pixman copy into the qemu tree and use that as
> fallback if we don't find pixman installed on the system ...

We can include a pixman submodule in qemu.git and have configure try to
find a local version, if it's not available, build the embedded version.

After RHEL5 becomes ancient, we can simply remove the submodule.  Should
make everyone happy.

Regards,

Anthony Liguori

>
> cheers,
>   Gerd

Re: [Qemu-devel] KVM Call minutes for 2012-09-25

2012-09-25 Thread Anthony Liguori

Luiz Capitulino  writes:

> On Tue, 25 Sep 2012 16:59:00 +0200
> Markus Armbruster  wrote:
>
>> Juan Quintela  writes:
>> 
>> > Hi
>> >
>> > This are this week minutes:
>> >
>> > - URI parsing library for glusterfs: libxml2 vs. in-tree "fork" of the
>> > same code. (Paolo)
>> >  * code hasn't changed in 2 years, it is really stable
>> >  * anthony wants to copy the code
>> >
>> > - there are several commands that do blocking IO
>> >   dump-guest-memory/screen-dump
>> >   convert to asynchronous commands after we move all to QAPI
>> >   only two commands missingto port to QAPI, and one is posted on list
>> >   non-blocking IO to a file is a challenge
>> >   (we have code on the block layer for it)
>> >
>> > - how to give errors from OpenFile to the caller
>> >   putting errno as int: bad idea
>> >   putting as strerrno string: also a bad idea, no warantees
>> 
>> Use the identifiers instead of their non-portable numeric encodings or
>> strerror() descriptions: "EPERM", "EINVAL", ...
>
> Yes, but for me the important point in this discussion is whether
> or not a new class is necessary. I think it it isn't.

If we have a tool that needs to differientiate errors, chances are
another user needs to also.

Regards,

Anthony Liguori

Re: [Qemu-devel] [PATCH v2] Add infrastructure for QIDL-based device serialization

2012-09-25 Thread Anthony Liguori

Michael Roth  writes:

> On Tue, Sep 25, 2012 at 08:37:16AM +0200, Paolo Bonzini wrote:
>> Il 24/09/2012 20:14, Michael Roth ha scritto:
>> >>> > > I went with qUppercase because it avoids all the previous issues with
>> >>> > > using leading underscores, and it's reserved in terms of QEMU coding
>> >>> > > guidelines as far as I can tell (we generally require leading capital
>> >>> > > for typedefs and lowercase for variable names, and can work around
>> >>> > > exceptions on a case by case basis by using QIDL() or some other 
>> >>> > > name).
>> >>> > > I also had it as q_* for a bit but that didn't seem much better on 
>> >>> > > the
>> >>> > > eyes we looking at converted structures.
>> >> > 
>> >> > It looks like Hungarian notation and very much unlike other QEMU code.
>> >> > I'd use q_ or qidl_ prefix instead, or rather QIDL().
>> >> > 
>> > I wanted some way to distinguish from other qemu code to avoid conflicts,
>> > but i think q_* seems reasonable if we reserve the prefix via CODING_STYLE.
>> > Then for conflicts outside our control we can either use a different name
>> > for the annotations or use the long-form QIDL() style depending on the
>> > circumstances.
>> 
>> I'm not sure why we need two ways to say the same thing...  I know it's
>> just bikeshedding to some extent, but I'd really like to standardize on
>> a single form.
>
> QIDL() (or maybe qidl()) should be the One True Form. It's the
> only one that provides both proper namespacing and can be used both for
> simple annotations and for ones that take parameters.
>
> I guess the real question is whether or not it makes sense to provide
> "shortcuts" for the more common annotations to avoid clutter. I've heard
> it both ways, so it's hard to decide.
>
> So let's bikeshed a bit. Maybe to put things into perspective, we're looking
> at (and I'm just gonna go ahead and switch the OTF to qidl() now so we're
> looking at the best case scenarios for both, and include q_* as well):
>
> a) One True Form:
> QIDL_DECLARE(RTCState) {  
>   
> ISADevice dev qidl(immutable);
> MemoryRegion io qidl(immutable);

Just like sparse is a "compiler", so is qidl.  We are free to use the
'_' + lowercase prefix.

  ISADevice _immutable dev;

It's an established practice in wide-use.

Regards,

Anthony Liguori

> uint8_t cmos_data[128];
> uint8_t cmos_index;
> int32_t base_year qidl(property, "base_year", 1980);
> uint64_t base_rtc;
> uint64_t last_update;
> int64_t offset;
> qemu_irq irq qidl(immutable);
> qemu_irq sqw_irq qidl(immutable);
> int it_shift qidl(immutable);
> /* periodic timer */
> QEMUTimer *periodic_timer;
> int64_t next_periodic_time;
> /* update-ended timer */
> QEMUTimer *update_timer;
> uint64_t next_alarm_time;
> uint16_t irq_reinject_on_ack_count qidl(broken);
> uint32_t irq_coalesced;
> uint32_t period;
> bool has_coalesced_timer;
> QEMUTimer *coalesced_timer qidl(optional);
> Notifier clock_reset_notifier qidl(broken);
> LostTickPolicy lost_tick_policy qidl(immutable) \
> qidl(property, "lost_tick_policy", LOST_TICK_DISCARD);
> Notifier suspend_notifier qidl(broken);
> };
>
> b) current simplified form:
> QIDL_DECLARE(RTCState) {  
>   
> ISADevice dev qImmutable;
> MemoryRegion io qImmutable;
> uint8_t cmos_data[128];
> uint8_t cmos_index;
> int32_t base_year qProperty("base_year", 1980);
> uint64_t base_rtc;
> uint64_t last_update;
> int64_t offset;
> qemu_irq irq qImmutable;
> qemu_irq sqw_irq qImmutable;
> int it_shift qImmutable;
> /* periodic timer */
> QEMUTimer *periodic_timer;
> int64_t next_periodic_time;
> /* update-ended timer */
> QEMUTimer *update_timer;
> uint64_t next_alarm_time;
> uint16_t irq_reinject_on_ack_count qBroken;
> uint32_t irq_coalesced;
> uint32_t period;
> bool has_coalesced_timer;
> QEMUTimer *coalesced_timer qOptional;
> Notifier clock_reset_notifier qBroken;
> LostTickPolicy lost_tick_policy qImmutable \
> qProperty("lost_tick_policy", LOST_TICK_DISCARD);
> Notifier suspend_notifier qBroken;
> };
>
> c) proposed simplified form:
> QIDL_DECLARE(RTCState) {  
>   
> ISADevice dev q_immutable;
> MemoryRegion io q_immutable;
> uint8_t cmos_data[128];
> uint8_t cmos_index;
> int32_t base_year q_property("base_year", 1980);
> uint64_t base_rtc;
> uint64_t last_update;
> int64_t offset;
>

Re: [Qemu-devel] [PATCH 18/22] qidl: add lexer library (based on QC parser)

2012-09-25 Thread Anthony Liguori

Michael Roth  writes:

> On Fri, Sep 21, 2012 at 05:18:09PM -0600, Eric Blake wrote:
>> On 09/21/2012 08:07 AM, Michael Roth wrote:
>> > Adds an abstract Lexer class to handle tokenizer via a
>> > peek/pop/peekline/popline interface, along with an implementation for C
>> > based on the lexer from qc.git
>> > 
>> > Signed-off-by: Michael Roth 
>> > ---
>> >  scripts/lexer.py |  306 
>> > ++
>> >  1 file changed, 306 insertions(+)
>> >  create mode 100644 scripts/lexer.py
>> > 
>> > diff --git a/scripts/lexer.py b/scripts/lexer.py
>> > new file mode 100644
>> > index 000..e740e5c
>> > --- /dev/null
>> > +++ b/scripts/lexer.py
>> > @@ -0,0 +1,306 @@
>> > +#
>> > +# QEMU Lexer Library
>> > +#
>> > +# Copyright IBM, Corp. 2012
>> > +#
>> > +# Authors:
>> > +#  Anthony Liguori 
>> > +#  Michael Roth
>> > +#
>> > +# This work is licensed under the terms of the GNU GPLv2.
>> 
>> Any specific reason this is not GPLv2+?
>
> I thought I'd assigned the same license Anthony used for
> https://github.com/aliguori/qidl/blob/master/qc.py, but I
> guess the license wasn't actually specified. If it's fine
> by Anthony I'll switch them to GPLv2+.

Works for me.

Regards,

Anthony Liguori

>
>> 
>> -- 
>> Eric Blake   ebl...@redhat.com+1-919-301-3266
>> Libvirt virtualization library http://libvirt.org
>>

Re: [Qemu-devel] [PATCH 0/9] build cleanups

2012-09-25 Thread Anthony Liguori

Paolo Bonzini  writes:

> Il 17/09/2012 18:44, Paolo Bonzini ha scritto:
>> Here are some cleanups to the build system, removing some unused files
>> from some of the products or fixing some strange behavior that bit me
>> while looking at the tree reorganization.
>> 
>> Paolo
>> 
>> Paolo Bonzini (9):
>>   tools: split monitor-dummy.c, clean up qemu-tool.c
>>   block: move QEMUIOVector functions to iov.c
>>   build: avoid duplicate appearance of files
>>   qemu-ga: do not include the QEMU main loop and dependencies
>>   build: add object directory to QEMU_INCLUDES
>>   build: add $(TARGET_DIR) to "GEN config-target.h" lines
>>   build: use nesting magic for tools-obj-y
>>   libcacard: add include/ to CFLAGS
>>   libcacard: remove dependency on QEMU main loop
>> 
>>  Makefile   |   9 ++---
>>  Makefile.objs  |  19 +++---
>>  cutils.c   | 103 
>> 
>>  iov.c  | 104 
>> +
>>  libcacard/Makefile |   2 +-
>>  monitor-dummy.c|  64 +
>>  qemu-tool.c|  54 
>>  rules.mak  |   5 ++-
>>  8 file modificati, 189 inserzioni(+), 171 rimozioni(-)
>>  create mode 100644 monitor-dummy.c
>> 
>
> Doesn't apply anymore. :(


FWIW, it breaks make check too:

  LINK  tests/check-qdict
iohandler.o: In function `qemu_init_child_watch':
/home/anthony/git/qemu/iohandler.c:172: undefined reference to `qemu_bh_new'
iohandler.o: In function `sigchld_handler':
/home/anthony/git/qemu/iohandler.c:154: undefined reference to 
`qemu_bh_schedule'
main-loop.o: In function `main_loop_wait':
/home/anthony/git/qemu/main-loop.c:506: undefined reference to `qemu_bh_poll'
/home/anthony/git/qemu/main-loop.c:481: undefined reference to 
`qemu_bh_update_timeout'
collect2: ld returned 1 exit status
make: *** [tests/check-qdict] Error 1

Regards,

Anthony Liguori

>
> Paolo

[Qemu-devel] [PATCH] Align PCI capabilities in pci_find_space

2012-09-25 Thread mjr

From: Matt Renzelmann 

The current implementation of pci_find_space does not properly align
PCI capabilities in the PCI configuration space.  This patch fixes
this issue.

Signed-off-by: Matt Renzelmann 
---

This is my first patch to QEMU so I hope I'm not screwing up too much.
The purpose of this patch is to mask off the low-order two bits--Linux
masks these while scanning the PCI configuration space, for example,
so we need to make sure QEMU's behavior matches the standard.

No current QEMU hardware is likely using this but it may be important
later.

 hw/pci.c |   14 ++
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index e149305..8771b7e 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1571,11 +1571,17 @@ static int pci_find_space(PCIDevice *pdev, uint8_t size)
 int config_size = pci_config_size(pdev);
 int offset = PCI_CONFIG_HEADER_SIZE;
 int i;
-for (i = PCI_CONFIG_HEADER_SIZE; i < config_size; ++i)
-if (pdev->used[i])
-offset = i + 1;
-else if (i - offset + 1 == size)
+int masked;
+
+for (i = PCI_CONFIG_HEADER_SIZE; i < config_size; ++i) {
+masked = i & (~3);
+if (pdev->used[i]) {
+offset = masked + 4;
+} else if (i - offset + 1 == size) {
 return offset;
+}
+}
+
 return 0;
 }
 
-- 
1.7.5.4

[Qemu-devel] [PATCH 2/3] monitor: add Error * argument to monitor_get_fd

2012-09-25 Thread Luiz Capitulino

From: Paolo Bonzini 

Signed-off-by: Paolo Bonzini 
Signed-off-by: Luiz Capitulino 
---
 dump.c |  3 +--
 migration-fd.c |  2 +-
 monitor.c  | 15 +--
 monitor.h  |  2 +-
 4 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/dump.c b/dump.c
index 2bf8d8d..1a3c716 100644
--- a/dump.c
+++ b/dump.c
@@ -836,9 +836,8 @@ void qmp_dump_guest_memory(bool paging, const char *file, 
bool has_begin,
 
 #if !defined(WIN32)
 if (strstart(file, "fd:", &p)) {
-fd = monitor_get_fd(cur_mon, p);
+fd = monitor_get_fd(cur_mon, p, errp);
 if (fd == -1) {
-error_set(errp, QERR_FD_NOT_FOUND, p);
 return;
 }
 }
diff --git a/migration-fd.c b/migration-fd.c
index 50138ed..7335167 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -75,7 +75,7 @@ static int fd_close(MigrationState *s)
 
 int fd_start_outgoing_migration(MigrationState *s, const char *fdname)
 {
-s->fd = monitor_get_fd(cur_mon, fdname);
+s->fd = monitor_get_fd(cur_mon, fdname, NULL);
 if (s->fd == -1) {
 DPRINTF("fd_migration: invalid file descriptor identifier\n");
 goto err_after_get_fd;
diff --git a/monitor.c b/monitor.c
index 67064e2..c24235e 100644
--- a/monitor.c
+++ b/monitor.c
@@ -951,7 +951,7 @@ static int add_graphics_client(Monitor *mon, const QDict 
*qdict, QObject **ret_d
 CharDriverState *s;
 
 if (strcmp(protocol, "spice") == 0) {
-int fd = monitor_get_fd(mon, fdname);
+int fd = monitor_get_fd(mon, fdname, NULL);
 int skipauth = qdict_get_try_bool(qdict, "skipauth", 0);
 int tls = qdict_get_try_bool(qdict, "tls", 0);
 if (!using_spice) {
@@ -965,13 +965,13 @@ static int add_graphics_client(Monitor *mon, const QDict 
*qdict, QObject **ret_d
 return 0;
 #ifdef CONFIG_VNC
 } else if (strcmp(protocol, "vnc") == 0) {
-   int fd = monitor_get_fd(mon, fdname);
+   int fd = monitor_get_fd(mon, fdname, NULL);
 int skipauth = qdict_get_try_bool(qdict, "skipauth", 0);
vnc_display_add_client(NULL, fd, skipauth);
return 0;
 #endif
 } else if ((s = qemu_chr_find(protocol)) != NULL) {
-   int fd = monitor_get_fd(mon, fdname);
+   int fd = monitor_get_fd(mon, fdname, NULL);
if (qemu_chr_add_client(s, fd) < 0) {
qerror_report(QERR_ADD_CLIENT_FAILED);
return -1;
@@ -2118,7 +2118,7 @@ static void do_loadvm(Monitor *mon, const QDict *qdict)
 }
 }
 
-int monitor_get_fd(Monitor *mon, const char *fdname)
+int monitor_get_fd(Monitor *mon, const char *fdname, Error **errp)
 {
 mon_fd_t *monfd;
 
@@ -2139,6 +2139,7 @@ int monitor_get_fd(Monitor *mon, const char *fdname)
 return fd;
 }
 
+error_setg(errp, "File descriptor named '%s' has not been found", fdname);
 return -1;
 }
 
@@ -2410,12 +2411,14 @@ int monitor_fdset_dup_fd_remove(int dup_fd)
 int monitor_handle_fd_param(Monitor *mon, const char *fdname)
 {
 int fd;
+Error *local_err = NULL;
 
 if (!qemu_isdigit(fdname[0]) && mon) {
 
-fd = monitor_get_fd(mon, fdname);
+fd = monitor_get_fd(mon, fdname, &local_err);
 if (fd == -1) {
-error_report("No file descriptor named %s found", fdname);
+qerror_report_err(local_err);
+error_free(local_err);
 return -1;
 }
 } else {
diff --git a/monitor.h b/monitor.h
index 64c1561..e240c3f 100644
--- a/monitor.h
+++ b/monitor.h
@@ -66,7 +66,7 @@ int monitor_read_block_device_key(Monitor *mon, const char 
*device,
   BlockDriverCompletionFunc *completion_cb,
   void *opaque);
 
-int monitor_get_fd(Monitor *mon, const char *fdname);
+int monitor_get_fd(Monitor *mon, const char *fdname, Error **errp);
 int monitor_handle_fd_param(Monitor *mon, const char *fdname);
 
 void monitor_vprintf(Monitor *mon, const char *fmt, va_list ap)
-- 
1.7.12.315.g682ce8b

[Qemu-devel] [PATCH 3/3] qapi: convert add_client

2012-09-25 Thread Luiz Capitulino

Also fixes a few issues while there:

 1. The fd returned by monitor_get_fd() leaks in most error conditions
 2. monitor_get_fd() return value is not checked. Best case we get
an error that is not correctly reported, worse case one of the
functions using the fd (with value of -1) will explode
 3. A few error conditions aren't reported
 4. We now "use up" @fdname always.  Before, it was left alone for
invalid @protocol

Signed-off-by: Luiz Capitulino 
---
 monitor.c| 39 ---
 qapi-schema.json | 25 +
 qmp-commands.hx  |  5 +
 qmp.c| 43 +++
 4 files changed, 69 insertions(+), 43 deletions(-)

diff --git a/monitor.c b/monitor.c
index c24235e..c9f460a 100644
--- a/monitor.c
+++ b/monitor.c
@@ -944,45 +944,6 @@ static void do_trace_print_events(Monitor *mon)
 trace_print_events((FILE *)mon, &monitor_fprintf);
 }
 
-static int add_graphics_client(Monitor *mon, const QDict *qdict, QObject 
**ret_data)
-{
-const char *protocol  = qdict_get_str(qdict, "protocol");
-const char *fdname = qdict_get_str(qdict, "fdname");
-CharDriverState *s;
-
-if (strcmp(protocol, "spice") == 0) {
-int fd = monitor_get_fd(mon, fdname, NULL);
-int skipauth = qdict_get_try_bool(qdict, "skipauth", 0);
-int tls = qdict_get_try_bool(qdict, "tls", 0);
-if (!using_spice) {
-/* correct one? spice isn't a device ,,, */
-qerror_report(QERR_DEVICE_NOT_ACTIVE, "spice");
-return -1;
-}
-if (qemu_spice_display_add_client(fd, skipauth, tls) < 0) {
-close(fd);
-}
-return 0;
-#ifdef CONFIG_VNC
-} else if (strcmp(protocol, "vnc") == 0) {
-   int fd = monitor_get_fd(mon, fdname, NULL);
-int skipauth = qdict_get_try_bool(qdict, "skipauth", 0);
-   vnc_display_add_client(NULL, fd, skipauth);
-   return 0;
-#endif
-} else if ((s = qemu_chr_find(protocol)) != NULL) {
-   int fd = monitor_get_fd(mon, fdname, NULL);
-   if (qemu_chr_add_client(s, fd) < 0) {
-   qerror_report(QERR_ADD_CLIENT_FAILED);
-   return -1;
-   }
-   return 0;
-}
-
-qerror_report(QERR_INVALID_PARAMETER, "protocol");
-return -1;
-}
-
 static int client_migrate_info(Monitor *mon, const QDict *qdict,
MonitorCompletion cb, void *opaque)
 {
diff --git a/qapi-schema.json b/qapi-schema.json
index 14e4419..191d921 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -33,6 +33,31 @@
 'MigrationExpected' ] }
 
 ##
+# @add_client
+#
+# Allow client connections for VNC, Spice and socket based
+# character devices to be passed in to QEMU via SCM_RIGHTS.
+#
+# @protocol: protocol name. Valid names are "vnc", "spice" or the
+#name of a character device (eg. from -chardev id=)
+#
+# @fdname: file descriptor name previously passed via 'getfd' command
+#
+# @skipauth: #optional whether to skip authentication. Only applies
+#to "vnc" and "spice" protocols
+#
+# @tls: #optional whether to perform TLS. Only applies to the "spice"
+#   protocol
+#
+# Returns: nothing on success.
+#
+# Since: 0.14.0
+##
+{ 'command': 'add_client',
+  'data': { 'protocol': 'str', 'fdname': 'str', '*skipauth': 'bool',
+'*tls': 'bool' } }
+
+##
 # @NameInfo:
 #
 # Guest name information.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 6e21ddb..36e08d9 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1231,10 +1231,7 @@ EQMP
 {
 .name   = "add_client",
 .args_type  = "protocol:s,fdname:s,skipauth:b?,tls:b?",
-.params = "protocol fdname skipauth tls",
-.help   = "add a graphics client",
-.user_print = monitor_user_noop,
-.mhandler.cmd_new = add_graphics_client,
+.mhandler.cmd_new = qmp_marshal_input_add_client,
 },
 
 SQMP
diff --git a/qmp.c b/qmp.c
index 8463922..36c54c5 100644
--- a/qmp.c
+++ b/qmp.c
@@ -479,3 +479,46 @@ CpuDefinitionInfoList *qmp_query_cpu_definitions(Error 
**errp)
 return arch_query_cpu_definitions(errp);
 }
 
+void qmp_add_client(const char *protocol, const char *fdname,
+bool has_skipauth, bool skipauth, bool has_tls, bool tls,
+Error **errp)
+{
+CharDriverState *s;
+int fd;
+
+fd = monitor_get_fd(cur_mon, fdname, errp);
+if (fd < 0) {
+return;
+}
+
+if (strcmp(protocol, "spice") == 0) {
+if (!using_spice) {
+error_set(errp, QERR_DEVICE_NOT_ACTIVE, "spice");
+close(fd);
+return;
+}
+skipauth = has_skipauth ? skipauth : false;
+tls = has_tls ? tls : false;
+if (qemu_spice_display_add_client(fd, skipauth, tls) < 0) {
+error_setg(errp, "spice failed to add client");
+close(fd);
+}
+return;
+#ifdef CONFIG_VNC
+}

[Qemu-devel] [PATCH 1/3] pci-assign: use monitor_handle_fd_param

2012-09-25 Thread Luiz Capitulino

From: Paolo Bonzini 

There is no need to open-code the choice between a file descriptor
number or a named one.  Just use monitor_handle_fd_param, which
also takes care of printing the error message.

Signed-off-by: Paolo Bonzini 
Signed-off-by: Luiz Capitulino 
---
 hw/kvm/pci-assign.c | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/hw/kvm/pci-assign.c b/hw/kvm/pci-assign.c
index 05b93d9..7a0998c 100644
--- a/hw/kvm/pci-assign.c
+++ b/hw/kvm/pci-assign.c
@@ -579,15 +579,9 @@ static int get_real_device(AssignedDevice *pci_dev, 
uint16_t r_seg,
 snprintf(name, sizeof(name), "%sconfig", dir);
 
 if (pci_dev->configfd_name && *pci_dev->configfd_name) {
-if (qemu_isdigit(pci_dev->configfd_name[0])) {
-dev->config_fd = strtol(pci_dev->configfd_name, NULL, 0);
-} else {
-dev->config_fd = monitor_get_fd(cur_mon, pci_dev->configfd_name);
-if (dev->config_fd < 0) {
-error_report("%s: (%s) unkown", __func__,
- pci_dev->configfd_name);
-return 1;
-}
+dev->config_fd = monitor_handle_fd_param(cur_mon, 
pci_dev->configfd_name);
+if (dev->config_fd < 0) {
+return 1;
 }
 } else {
 dev->config_fd = open(name, O_RDWR);
-- 
1.7.12.315.g682ce8b

[Qemu-devel] [PATCH v4 0/3] qapi: convert add_client

2012-09-25 Thread Luiz Capitulino

v4

 - Fix misworded error message in monitor_get_fd() [Markus]
 - small doc & commit log improvements [Markus]

Luiz Capitulino (1):
  qapi: convert add_client

Paolo Bonzini (2):
  pci-assign: use monitor_handle_fd_param
  monitor: add Error * argument to monitor_get_fd

 dump.c  |  3 +--
 hw/kvm/pci-assign.c | 12 +++-
 migration-fd.c  |  2 +-
 monitor.c   | 48 ++--
 monitor.h   |  2 +-
 qapi-schema.json| 25 +
 qmp-commands.hx |  5 +
 qmp.c   | 43 +++
 8 files changed, 81 insertions(+), 59 deletions(-)

-- 
1.7.12.315.g682ce8b

Re: [Qemu-devel] [PATCH 8/9] mm: compaction: Cache if a pageblock was scanned and no pages were isolated

2012-09-25 Thread Andrew Morton

On Tue, 25 Sep 2012 10:12:07 +0100
Mel Gorman  wrote:

> First, we'd introduce a variant of get_pageblock_migratetype() that returns
> all the bits for the pageblock flags and then helpers to extract either the
> migratetype or the PG_migrate_skip. We already are incurring the cost of
> get_pageblock_migratetype() so it will not be much more expensive than what
> is already there. If there is an allocation or free within a pageblock that
> as the PG_migrate_skip bit set then we increment a counter. When the counter
> reaches some to-be-decided "threshold" then compaction may clear all the
> bits. This would match the criteria of the clearing being based on activity.
> 
> There are four potential problems with this
> 
> 1. The logic to retrieve all the bits and split them up will be a little
>convulated but maybe it would not be that bad.
> 
> 2. The counter is a shared-writable cache line but obviously it could
>be moved to vmstat and incremented with inc_zone_page_state to offset
>the cost a little.
> 
> 3. The biggested weakness is that there is not way to know if the
>counter is incremented based on activity in a small subset of blocks.
> 
> 4. What should the threshold be?
> 
> The first problem is minor but the other three are potentially a mess.
> Adding another vmstat counter is bad enough in itself but if the counter
> is incremented based on a small subsets of pageblocks, the hint becomes
> is potentially useless.
> 
> However, does this match what you have in mind or am I over-complicating
> things?

Sounds complicated.

Using wall time really does suck.  Are you sure you can't think of
something more logical?

How would we demonstrate the suckage?  What would be the observeable downside of
switching that 5 seconds to 5 hours?

> > > > > + for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) 
> > > > > {
> > > > > + struct page *page;
> > > > > + if (!pfn_valid(pfn))
> > > > > + continue;
> > > > > +
> > > > > + page = pfn_to_page(pfn);
> > > > > + if (zone != page_zone(page))
> > > > > + continue;
> > > > > +
> > > > > + clear_pageblock_skip(page);
> > > > > + }
> > > > 
> > > > What's the worst-case loop count here?
> > > > 
> > > 
> > > zone->spanned_pages >> pageblock_order
> > 
> > What's the worst-case value of (zone->spanned_pages >> pageblock_order) :)
> 
> Lets take an unlikely case - 128G single-node machine. That loop count
> on x86-64 would be 65536. It'll be fast enough, particularly in this
> path.

That could easily exceed a millisecond.  Can/should we stick a
cond_resched() in there?

Re: [Qemu-devel] [PATCH v2 5/7] QAPI: add command for live block commit, 'block-commit'

2012-09-25 Thread Jeff Cody

On 09/25/2012 03:42 PM, Eric Blake wrote:
> On 09/25/2012 10:29 AM, Jeff Cody wrote:
>> The command for live block commit is added, which has the following
>> arguments:
>>
>> device: the block device to perform the commit on (mandatory)
>> base:   the base image to commit into; optional (if not specified,
>> it is the underlying original image)
>> top:the top image of the commit - all data from inside top down
>> to base will be committed into base. optional (if not specified,
>> it is one below the active image) - see note below
> 
> This says that for 'base' <- 'mid' <- 'active', omitting 'top' is the
> same as specifying 'top':'mid'.
> 

That is the intended default.

>> speed:  maximum speed, in bytes/sec
>>
>> note: eventually this will support merging down the active layer,
>>   but that code is not yet complete.  If the active layer is passed
>>   in currently as top, or top is left to the default, then an error
>>   will be returned.
> 
> This says that for 'base' <- 'mid' <- 'active', omitting 'top' is an
> error (because it would be the same as an explicit 'top:'active').

This is how I originally had it, and then I changed it to default to
top->backing_hd.  I apparently did not remember to catch the note,
however.

> 
> Let's check the code...
> 
>> +if (top && has_top) {
>> +/* if we want to allow the active layer,
>> + * use 'bdrv_find_image()' here */
>> +top_bs = bdrv_find_backing_image(bs, top);
>> +} else {
>> +/* default is one below the active layer, unless that is
>> + * the base */
>> +top_bs = bs->backing_hd;
> 
> Aha - the former is correct as coded (you default to 'top':'mid' in my
> example), so the 'note' in your commit message needs editing.
> 
> On the other hand, when we ever DO get around to adding live commit,
> which is the saner default?  That is, am I more likely to want to do
> live commit from 'active', or more likely to do commit of the layer
> below 'active'?  If live commit is the more desirable case, then that
> argues that THIS commit should always error out if !has_top, and that
> the future patch will default top_bs = bs, and the rest of your commit
> message and documentation would need tweaking accordingly.
> 
> I don't have a preference either way (libvirt can arrange to always pass
> the top argument, and thus avoid the confusion when top is omitted), but
> if anyone else cares, now is the time to get the API right before we are
> locked in to it.
>

I guess I don't have a strong preference either - I originally had it
the other way, but then that meant the default in the current
implementation was actually an error.

Also, I assumed (danger!) that the most common use of commit would be a
snapshot, followed by a commit of active->backing_hd. With that
assumption, it seemed like a sane default.

>> +++ b/qapi-schema.json
>> @@ -1468,6 +1468,41 @@
>>'returns': 'str' }
>>  
>>  ##
>> +# @block-commit
>> +#
>> +# Live commit of data from overlay image nodes into backing nodes - i.e.,
>> +# writes data between 'top' and 'base' into 'base'.
>> +#
>> +# @device:  the name of the device
>> +#
>> +# @base:   #optional The file name of the backing image to write data into.
>> +#If not specified, this is the deepest backing image
>> +#
>> +# @top:#optional The file name of the backing image within the image 
>> chain,
>> +#which contains the topmost data to be committed down.
>> +#If not specified, this is one layer below the active
>> +#layer (i.e. active->backing_hd).
>> +#
>> +#If top == base, that is an error.
>> +#
> 
> This documentation of @top matches the first paragraph of your commit
> message.
> 
>> +#
>> +# @speed:  #optional the maximum speed, in bytes per second
>> +#
>> +# Returns: Nothing on success
>> +#  If commit or stream is already active on this device, DeviceInUse
>> +#  If @device does not exist, DeviceNotFound
>> +#  If image commit is not supported by this device, NotSupported
>> +#  If @base does not exist, a generic error is returned
>> +#  If @top does not exist, a generic error is returned
> 
> These two lines could be merged, or even made more generic (for example,
> based on the rest of this thread, you will also be erroring out if base
> and top exist, but appear as swapped arguments):
> 
> If @base or @top is invalid, a generic error is returned
> 

OK

Re: [Qemu-devel] [PATCH v2 4/7] block: helper function, to find the base image of a chain

2012-09-25 Thread Jeff Cody

On 09/25/2012 03:13 PM, Eric Blake wrote:
> On 09/25/2012 10:29 AM, Jeff Cody wrote:
>> This is a simple helper function, that will return the base image
>> of a given image chain.
>>
>> Signed-off-by: Jeff Cody 
>> ---
>>  block.c | 16 
>>  block.h |  1 +
>>  2 files changed, 17 insertions(+)
> 
> Bikeshed question: Any reason patch 1/7 adds two helper functions, and
> patch 4/7 adds a third?  Perhaps it would be worth squashing all three
> into one commit, or else having three commits, one per helper?  But
> there's no point in repainting things now, so:
> 
> Reviewed-by: Eric Blake 
> 

The reasoning was the helper functions in 1/7 were to support block
commit (patch 2/7), and the helper function in 4/7 was to support the
block commit QMP command (patch 5/7), so it seemed logical to split
them.

Re: [Qemu-devel] KVM call agenda for September 25th

2012-09-25 Thread Anthony Liguori

Kevin Wolf  writes:

> Am 25.09.2012 14:57, schrieb Anthony Liguori:
>> Paolo Bonzini  writes:
>> 
>>> Il 24/09/2012 13:28, Juan Quintela ha scritto:
 Hi

 Please send in any agenda items you are interested in covering.
>>>
>>> URI parsing library for glusterfs: libxml2 vs. in-tree "fork" of the
>>> same code.
>> 
>> The call is a bit late for Bharata but I think copying is the way to go.
>> 
>> Something I've been thinking about since this discussion started
>> though.  Maybe we could standardize on using URIs as short-hand syntax
>> for backends.
>
> Compared with QemuOpts, it's not really short-hand or even convenient
> for manual use. For management tools it might be nice because URIs have
> a well-known syntax, can escape anything and implementations exist. But
> I think we must still maintain an easy to use syntax for human users.
>
>> For example:
>> 
>> qemu -hda file:///foo.img
>> 
>> Or:
>> 
>> qemu -device virtio-net-pci,netdev=tap:///vnet0?script=/etc/qemu-ifup
>> 
>> Or:
>> 
>> qemu -device \
>>   isa-serial,index=0,chr=tcp://localhost:1025/?server=on&wait=off
>
> Your examples kind of prove this: They aren't much shorter than what
> exists today, but they contain ? and &, which are nasty characters on
> the command line.
>
>> This works particularly well with a "treat unknown options as -device"
>> mechanism so that we could do:
>> 
>> qemu -isa-serial chr=tcp://localhost:1025/?server=on&wait=off
>> 
>> We could even introduce a secondary implied option to shorten this
>> further to:
>> 
>> qemu -isa-serial tcp://localhost:1025/?server=on&wait=off
>
> This is something that I was thinking of in the context of -blockdev a
> while ago (without URLs): Define the block device inside of -device
> specifications. The problem of nesting an option string inside another
> one is solved in theory by URLs because they allow (nested) escaping,
> but in practice we'll need to use some kind of brackets instead if we
> want it to be usable.

qemu -isa-serial 'tcp://localhost:1025/?server=on&wait=off'

I don't think it's really that better.  And yeah, your thoughts are
exactly mine.  Having two syntaxes allows us to use a single option.

Hopefully most options could avoid having query parameters so escaping
wasn't a problem.  It's unfortunate that the TCP character device uses
client mode by default.

Regards,

Anthony Liguori


>
> Kevin

Re: [Qemu-devel] [PATCH v2 5/7] QAPI: add command for live block commit, 'block-commit'

2012-09-25 Thread Eric Blake

On 09/25/2012 10:29 AM, Jeff Cody wrote:
> The command for live block commit is added, which has the following
> arguments:
> 
> device: the block device to perform the commit on (mandatory)
> base:   the base image to commit into; optional (if not specified,
> it is the underlying original image)
> top:the top image of the commit - all data from inside top down
> to base will be committed into base. optional (if not specified,
> it is one below the active image) - see note below

This says that for 'base' <- 'mid' <- 'active', omitting 'top' is the
same as specifying 'top':'mid'.

> speed:  maximum speed, in bytes/sec
> 
> note: eventually this will support merging down the active layer,
>   but that code is not yet complete.  If the active layer is passed
>   in currently as top, or top is left to the default, then an error
>   will be returned.

This says that for 'base' <- 'mid' <- 'active', omitting 'top' is an
error (because it would be the same as an explicit 'top:'active').

Let's check the code...

> +if (top && has_top) {
> +/* if we want to allow the active layer,
> + * use 'bdrv_find_image()' here */
> +top_bs = bdrv_find_backing_image(bs, top);
> +} else {
> +/* default is one below the active layer, unless that is
> + * the base */
> +top_bs = bs->backing_hd;

Aha - the former is correct as coded (you default to 'top':'mid' in my
example), so the 'note' in your commit message needs editing.

On the other hand, when we ever DO get around to adding live commit,
which is the saner default?  That is, am I more likely to want to do
live commit from 'active', or more likely to do commit of the layer
below 'active'?  If live commit is the more desirable case, then that
argues that THIS commit should always error out if !has_top, and that
the future patch will default top_bs = bs, and the rest of your commit
message and documentation would need tweaking accordingly.

I don't have a preference either way (libvirt can arrange to always pass
the top argument, and thus avoid the confusion when top is omitted), but
if anyone else cares, now is the time to get the API right before we are
locked in to it.

> +++ b/qapi-schema.json
> @@ -1468,6 +1468,41 @@
>'returns': 'str' }
>  
>  ##
> +# @block-commit
> +#
> +# Live commit of data from overlay image nodes into backing nodes - i.e.,
> +# writes data between 'top' and 'base' into 'base'.
> +#
> +# @device:  the name of the device
> +#
> +# @base:   #optional The file name of the backing image to write data into.
> +#If not specified, this is the deepest backing image
> +#
> +# @top:#optional The file name of the backing image within the image 
> chain,
> +#which contains the topmost data to be committed down.
> +#If not specified, this is one layer below the active
> +#layer (i.e. active->backing_hd).
> +#
> +#If top == base, that is an error.
> +#

This documentation of @top matches the first paragraph of your commit
message.

> +#
> +# @speed:  #optional the maximum speed, in bytes per second
> +#
> +# Returns: Nothing on success
> +#  If commit or stream is already active on this device, DeviceInUse
> +#  If @device does not exist, DeviceNotFound
> +#  If image commit is not supported by this device, NotSupported
> +#  If @base does not exist, a generic error is returned
> +#  If @top does not exist, a generic error is returned

These two lines could be merged, or even made more generic (for example,
based on the rest of this thread, you will also be erroring out if base
and top exist, but appear as swapped arguments):

If @base or @top is invalid, a generic error is returned

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] Dynamic Binary Instrumentation

2012-09-25 Thread Lluís Vilanova

陳韋任 (Wei-Ren Chen) writes:

> On Fri, Sep 21, 2012 at 03:39:32PM +0200, Lluís Vilanova wrote:
>> Sorry, it's up again. The server is low on RAM and Linux' OOM killer kicks in
>> sometimes.

>   Thanks. I am playing around your qemu-dbi, and has error below.

> $ make
>   LINK  i386-softmmu/qemu-system-i386
> translate.o: In function `trace_tcg_instr__before':
> /tmp/chenwj/qemu-dbi/trace/tcg-instr-internal.h:134: undefined reference to 
> `trace_tcg_vbbl_before'

How did you configure it?

After applying a small build patch, this works for me:

  cp ~/qemu-dbi/trace-events ~/
  sed -i -e "s/disable tcg vmem(/instrument tcg vmem(/g" trace-events
  mkdir -p ~/qemu-dbi-build
  cd ~/qemu-dbi-build
  ~/qemu-src/configure --with-trace-events=../trace-events 
--with-trace-instrument=dynamic --prefix=/tmp/qemu-install 
--enable-trace-backend=stderr --target-list="i386-softmmu"
  make
  make install

I've update the repo with the aforementioned patch.


> Seems some files or functions are missing? Also, do you consider
> move your repo to github or other more reliable server?

I might do it in the future, but for now my todo is full enough with other
tasks :)



> P.S. Attach is a minor patch which fix a typo (I guess).

Right, I should proofread the docs.



Thanks,
  Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth

Re: [Qemu-devel] [PATCH v2 4/7] block: helper function, to find the base image of a chain

2012-09-25 Thread Eric Blake

On 09/25/2012 10:29 AM, Jeff Cody wrote:
> This is a simple helper function, that will return the base image
> of a given image chain.
> 
> Signed-off-by: Jeff Cody 
> ---
>  block.c | 16 
>  block.h |  1 +
>  2 files changed, 17 insertions(+)

Bikeshed question: Any reason patch 1/7 adds two helper functions, and
patch 4/7 adds a third?  Perhaps it would be worth squashing all three
into one commit, or else having three commits, one per helper?  But
there's no point in repainting things now, so:

Reviewed-by: Eric Blake 

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v2 2/7] block: add live block commit functionality

2012-09-25 Thread Eric Blake

On 09/25/2012 12:58 PM, Jeff Cody wrote:
> On 09/25/2012 02:12 PM, Eric Blake wrote:
>> On 09/25/2012 10:29 AM, Jeff Cody wrote:
>>> This adds the live commit coroutine.  This iteration focuses on the
>>> commit only below the active layer, and not the active layer itself.
>>>

>> I think you are missing a check here that base is on the backing chain
>> of top.  See also my comments to 5/7.
>>
> 
> Did you mean your comments on 6/7 (or am I missing an email)?

Shoot.  You're right, and I messed myself up by reviewing out of order :)

> 
> This does get partially validated in patch 5/7, in the
> qmp_block_commit() handler - both base and top are verified to be in the
> chain 'bs'.  What is not validated, however, is that you did not swap
> your 'top' and 'base' arguments.  I'll add a check here for that, to
> make sure that base is reachable from overlay_bs.

At any rate, it sounds like I got my point across, in spite of myself.

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v2 2/7] block: add live block commit functionality

2012-09-25 Thread Jeff Cody

On 09/25/2012 02:12 PM, Eric Blake wrote:
> On 09/25/2012 10:29 AM, Jeff Cody wrote:
>> This adds the live commit coroutine.  This iteration focuses on the
>> commit only below the active layer, and not the active layer itself.
>>
>> The behaviour is similar to block streaming; the sectors are walked
>> through, and anything that exists above 'base' is committed back down
>> into base.  At the end, intermediate images are deleted, and the
>> chain stitched together.  Images are restored to their original open
>> flags upon completion.
>>
>> Signed-off-by: Jeff Cody 
> 
>> +void commit_start(BlockDriverState *bs, BlockDriverState *base,
>> + BlockDriverState *top, int64_t speed,
>> + BlockErrorAction on_error, BlockDriverCompletionFunc *cb,
>> + void *opaque, Error **errp)
>> +{
>> +CommitBlockJob *s;
>> +BlockReopenQueue *reopen_queue = NULL;
>> +int orig_overlay_flags;
>> +int orig_base_flags;
>> +BlockDriverState *overlay_bs;
>> +Error *local_err = NULL;
>> +
>> +if ((on_error == BLOCK_ERR_STOP_ANY ||
>> + on_error == BLOCK_ERR_STOP_ENOSPC) &&
>> +!bdrv_iostatus_is_enabled(bs)) {
>> +error_set(errp, QERR_INVALID_PARAMETER_COMBINATION);
>> +return;
>> +}
>> +
>> +overlay_bs = bdrv_find_overlay(bs, top);
>> +
>> +if (overlay_bs == NULL) {
>> +error_setg(errp, "Could not find overlay image for %s:", 
>> top->filename);
>> +return;
>> +}
>> +
>> +orig_base_flags= bdrv_get_flags(base);
>> +orig_overlay_flags = bdrv_get_flags(overlay_bs);
> 
> I think you are missing a check here that base is on the backing chain
> of top.  See also my comments to 5/7.
> 

Did you mean your comments on 6/7 (or am I missing an email)?

This does get partially validated in patch 5/7, in the
qmp_block_commit() handler - both base and top are verified to be in the
chain 'bs'.  What is not validated, however, is that you did not swap
your 'top' and 'base' arguments.  I'll add a check here for that, to
make sure that base is reachable from overlay_bs.


>> +
>> +/* convert base_bs & overlay_bs to r/w, if necessary */
>> +if (!(orig_base_flags & BDRV_O_RDWR)) {
>> +reopen_queue = bdrv_reopen_queue(reopen_queue, base,
>> + orig_base_flags | BDRV_O_RDWR);
>> +}
>> +if (!(orig_overlay_flags & BDRV_O_RDWR)) {
>> +reopen_queue = bdrv_reopen_queue(reopen_queue, overlay_bs,
>> + orig_overlay_flags | BDRV_O_RDWR);
>> +}
>> +if (reopen_queue) {
>> +bdrv_reopen_multiple(reopen_queue, &local_err);
> 
> Is it valid to make a no-op call, such as:
> 
> { "execute":"block-commit", "arguments":{
>   "device":"drive0", "top":"base", "base":"base" }}
> 
> If so, should we do an early exit here, rather than temporarily changing
> base to R/W just to change it back to R/O?
> 
> If not, should we be rejecting it up front (again, back to the question
> of failing if 'base' is not a backing file of 'top', even if both 'top'
> and 'base' are backing files of 'device').
> 

I'll add a quick check for that as well.  Patch 5 checks for it in one
scenario (default 'top'), and returns a generic error of:
"Invalid files for merge: top and base are the same"

I'll make sure to check for that in all scenarios (or just make the test
mentioned earlier incorporate top == base as well)

Thanks,
Jeff

Re: [Qemu-devel] [PATCH v2 6/7] qemu-iotests: add initial tests for live block commit

2012-09-25 Thread Jeff Cody

On 09/25/2012 02:02 PM, Eric Blake wrote:
> On 09/25/2012 10:29 AM, Jeff Cody wrote:
>> Derived from the streaming test cases (030), this adds the
>> following tests:
>>
>> 1. For the following image chain, commit [mid] into [backing],
>>and use qemu-io to verify [backing] has its original data, as
>>well as the data from [mid]
>>
>>[backing] <-- [mid] <-- [test]
>>
>> 2. Verifies that 'block-commit' with the 'speed' parameter sets the
>>speed parameter, as reported by 'query-block-jobs'
>>
>> 3. Verifies that a bogus 'device' parameter to 'block-commit'
>>results in error
> 
> I think you are missing a test; you should also verify that:
> 
> { "command":"block-commit", "arguments":{
>   "device":"drive0", "base":"mid", "top":"backing" } }
> 
> properly fails, since 'mid' is not a backing file of 'backing'.  I saw
> code in patch 1/7 that bdrv_drop_intermediate() should detect the
> situation, but I'm not confident enough in my reading of patch 2/7 to
> know if that detection point was early enough, or whether the coroutine
> stuff in 2/7 ends up corrupting 'mid' prior to failure.
> 

Good idea.  This seems like a good test to have in place.

Re: [Qemu-devel] KVM Call minutes for 2012-09-25

2012-09-25 Thread Luiz Capitulino

On Tue, 25 Sep 2012 16:59:00 +0200
Markus Armbruster  wrote:

> Juan Quintela  writes:
> 
> > Hi
> >
> > This are this week minutes:
> >
> > - URI parsing library for glusterfs: libxml2 vs. in-tree "fork" of the
> > same code. (Paolo)
> >  * code hasn't changed in 2 years, it is really stable
> >  * anthony wants to copy the code
> >
> > - there are several commands that do blocking IO
> >   dump-guest-memory/screen-dump
> >   convert to asynchronous commands after we move all to QAPI
> >   only two commands missingto port to QAPI, and one is posted on list
> >   non-blocking IO to a file is a challenge
> >   (we have code on the block layer for it)
> >
> > - how to give errors from OpenFile to the caller
> >   putting errno as int: bad idea
> >   putting as strerrno string: also a bad idea, no warantees
> 
> Use the identifiers instead of their non-portable numeric encodings or
> strerror() descriptions: "EPERM", "EINVAL", ...

Yes, but for me the important point in this discussion is whether
or not a new class is necessary. I think it it isn't.

1 2 3 >

1 - 100 of 236 matches

Mail list logo