Re: [PATCH v3 3/6] target/arm: make ARMCPU.ctr 64-bit

2021-01-11 Thread Laurent Desnogues
On Fri, Jan 8, 2021 at 7:51 PM Leif Lindholm  wrote:
>
> When FEAT_MTE is implemented, the AArch64 view of CTR_EL0 adds the
> TminLine field in bits [37:32].
> Extend the ctr field to be able to hold this context.
>
> Signed-off-by: Leif Lindholm 

Reviewed-by: Laurent Desnogues 

Thanks,

Laurent

> ---
>  target/arm/cpu.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index fadd1a47df..063228de2a 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -931,7 +931,7 @@ struct ARMCPU {
>  uint64_t midr;
>  uint32_t revidr;
>  uint32_t reset_fpsid;
> -uint32_t ctr;
> +uint64_t ctr;
>  uint32_t reset_sctlr;
>  uint64_t pmceid0;
>  uint64_t pmceid1;
> --
> 2.20.1
>



Re: [PATCH v3 4/6] target/arm: add descriptions of CLIDR_EL1, CCSIDR_EL1, CTR_EL0 to cpu.h

2021-01-11 Thread Laurent Desnogues
On Fri, Jan 8, 2021 at 7:51 PM Leif Lindholm  wrote:
>
> Signed-off-by: Leif Lindholm 

Reviewed-by: Laurent Desnogues 

Thanks,

Laurent


> ---
>  target/arm/cpu.h | 31 +++
>  1 file changed, 31 insertions(+)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 063228de2a..18c1cb02bb 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -1736,6 +1736,37 @@ FIELD(V7M_FPCCR, ASPEN, 31, 1)
>  /*
>   * System register ID fields.
>   */
> +FIELD(CLIDR_EL1, CTYPE1, 0, 3)
> +FIELD(CLIDR_EL1, CTYPE2, 3, 3)
> +FIELD(CLIDR_EL1, CTYPE3, 6, 3)
> +FIELD(CLIDR_EL1, CTYPE4, 9, 3)
> +FIELD(CLIDR_EL1, CTYPE5, 12, 3)
> +FIELD(CLIDR_EL1, CTYPE6, 15, 3)
> +FIELD(CLIDR_EL1, CTYPE7, 18, 3)
> +FIELD(CLIDR_EL1, LOUIS, 21, 3)
> +FIELD(CLIDR_EL1, LOC, 24, 3)
> +FIELD(CLIDR_EL1, LOUU, 27, 3)
> +FIELD(CLIDR_EL1, ICB, 30, 3)
> +
> +/* When FEAT_CCIDX is implemented */
> +FIELD(CCSIDR_EL1, CCIDX_LINESIZE, 0, 3)
> +FIELD(CCSIDR_EL1, CCIDX_ASSOCIATIVITY, 3, 21)
> +FIELD(CCSIDR_EL1, CCIDX_NUMSETS, 32, 24)
> +
> +/* When FEAT_CCIDX is not implemented */
> +FIELD(CCSIDR_EL1, LINESIZE, 0, 3)
> +FIELD(CCSIDR_EL1, ASSOCIATIVITY, 3, 10)
> +FIELD(CCSIDR_EL1, NUMSETS, 13, 15)
> +
> +FIELD(CTR_EL0,  IMINLINE, 0, 4)
> +FIELD(CTR_EL0,  L1IP, 14, 2)
> +FIELD(CTR_EL0,  DMINLINE, 16, 4)
> +FIELD(CTR_EL0,  ERG, 20, 4)
> +FIELD(CTR_EL0,  CWG, 24, 4)
> +FIELD(CTR_EL0,  IDC, 28, 1)
> +FIELD(CTR_EL0,  DIC, 29, 1)
> +FIELD(CTR_EL0,  TMINLINE, 32, 6)
> +
>  FIELD(MIDR_EL1, REVISION, 0, 4)
>  FIELD(MIDR_EL1, PARTNUM, 4, 12)
>  FIELD(MIDR_EL1, ARCHITECTURE, 16, 4)
> --
> 2.20.1
>



Re: [PATCH v2 3/5] target/arm: add descriptions of CLIDR_EL1, CCSIDR_EL1, CTR_EL0 to cpu.h

2020-12-17 Thread Laurent Desnogues
On Thu, Dec 17, 2020 at 1:10 PM Leif Lindholm  wrote:
[...]
> > > > > +FIELD(CCSIDR_EL1, LINESIZE, 0, 3)
> > > > > +FIELD(CCSIDR_EL1, ASSOCIATIVITY, 3, 21)
> > > > > +FIELD(CCSIDR_EL1, NUMSETS, 32, 24)
> > > >
> > > > The positions and sizes of the ASSOCIATIVITY and NUMSETS CCSIDR fields
> > > > depend on whether the ARMv8.3-CCIDX extension is implemented or not.
> > > > If we really want to define the fields this way, we perhaps should
> > > > define two sets.  Or at the very least, add a comment stating this
> > > > definition is for ARMv8.3-CCIDX.
> > >
> > > Urgh, sorry for this.
> > > I added the fields only to make the CPU definition more readable, so I
> > > think we don't need to worry about runtime handling of this?
> > > But I don't think it makes sense to add only the one form.
> > > Should I use CCIDX_CCSIDR_EL1 for these ones and add
> > >
> > > /* When FEAT_CCIDX is not implemented */
> > > FIELD(CCSIDR_EL1, LINESIZE, 0, 3)
> > > FIELD(CCSIDR_EL1, ASSOCIATIVITY, 3, 10)
> > > FIELD(CCSIDR_EL1, NUMSETS, 13, 15)
> > >
> > > with a comment that
> > > /* When FEAT_CCIDX is implemented */
> > > for the former set
> > > ?
> >
> > Having both would be handy, but you need to have different names for
> > the fields.
>
> Different names for the same field?
> I.e.
> FIELD(CCIDX_CCSIDR_EL1, LINESIZE, 0, 3)
> would need a different name for LINESIZE than
> FIELD(CCSIDR_EL1, LINESIZE, 0, 3)
> ?

I was thinking about changing the field names, not the register name
because the register is the same, only the layout changes.  So
LINESIZE -> CCIDX_LINESIZE, etc.

That's personal preference, Peter might have a different one.

Thanks,

Laurent



Re: [PATCH v2 3/5] target/arm: add descriptions of CLIDR_EL1, CCSIDR_EL1, CTR_EL0 to cpu.h

2020-12-17 Thread Laurent Desnogues
Hi Leif,

On Tue, Dec 15, 2020 at 5:49 PM Leif Lindholm  wrote:
>
> On Tue, Dec 15, 2020 at 13:23:58 +0100, Laurent Desnogues wrote:
> > Hello,
> >
> > On Tue, Dec 15, 2020 at 12:51 PM Leif Lindholm  wrote:
> > >
> > > Signed-off-by: Leif Lindholm 
> > > ---
> > >  target/arm/cpu.h | 24 
> > >  1 file changed, 24 insertions(+)
> > >
> > > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > > index fadd1a47df..90ba707b64 100644
> > > --- a/target/arm/cpu.h
> > > +++ b/target/arm/cpu.h
> > > @@ -1736,6 +1736,30 @@ FIELD(V7M_FPCCR, ASPEN, 31, 1)
> > >  /*
> > >   * System register ID fields.
> > >   */
> > > +FIELD(CLIDR_EL1, CTYPE1, 0, 3)
> > > +FIELD(CLIDR_EL1, CTYPE2, 3, 3)
> > > +FIELD(CLIDR_EL1, CTYPE3, 6, 3)
> > > +FIELD(CLIDR_EL1, CTYPE4, 9, 3)
> > > +FIELD(CLIDR_EL1, CTYPE5, 12, 3)
> > > +FIELD(CLIDR_EL1, CTYPE6, 15, 3)
> > > +FIELD(CLIDR_EL1, CTYPE7, 18, 3)
> > > +FIELD(CLIDR_EL1, LOUIS, 21, 3)
> > > +FIELD(CLIDR_EL1, LOC, 24, 3)
> > > +FIELD(CLIDR_EL1, LOUU, 27, 3)
> > > +FIELD(CLIDR_EL1, ICB, 30, 3)
> > > +
> > > +FIELD(CCSIDR_EL1, LINESIZE, 0, 3)
> > > +FIELD(CCSIDR_EL1, ASSOCIATIVITY, 3, 21)
> > > +FIELD(CCSIDR_EL1, NUMSETS, 32, 24)
> >
> > The positions and sizes of the ASSOCIATIVITY and NUMSETS CCSIDR fields
> > depend on whether the ARMv8.3-CCIDX extension is implemented or not.
> > If we really want to define the fields this way, we perhaps should
> > define two sets.  Or at the very least, add a comment stating this
> > definition is for ARMv8.3-CCIDX.
>
> Urgh, sorry for this.
> I added the fields only to make the CPU definition more readable, so I
> think we don't need to worry about runtime handling of this?
> But I don't think it makes sense to add only the one form.
> Should I use CCIDX_CCSIDR_EL1 for these ones and add
>
> /* When FEAT_CCIDX is not implemented */
> FIELD(CCSIDR_EL1, LINESIZE, 0, 3)
> FIELD(CCSIDR_EL1, ASSOCIATIVITY, 3, 10)
> FIELD(CCSIDR_EL1, NUMSETS, 13, 15)
>
> with a comment that
> /* When FEAT_CCIDX is implemented */
> for the former set
> ?

Having both would be handy, but you need to have different names for
the fields.  For setting fields up in cpu{64}.c that'd be acceptable
as you know if the CPU you define has ARMv8.3-CCIDX. In the rest of
the code the use would be more complicated as you'd have to check for
ARMv8.3-CCIDX before accessing fields.  But the use of those fields
outside of cpu{64}.c would likely be extremely limited so I don't
think that's an issue.

> > > +FIELD(CTR_EL0,  IMINLINE, 0, 4)
> > > +FIELD(CTR_EL0,  L1IP, 14, 2)
> > > +FIELD(CTR_EL0,  DMINLINE, 16, 4)
> > > +FIELD(CTR_EL0,  ERG, 20, 4)
> > > +FIELD(CTR_EL0,  CWG, 24, 4)
> > > +FIELD(CTR_EL0,  IDC, 28, 1)
> > > +FIELD(CTR_EL0,  DIC, 29, 1)
> >
> > There's a missing field:  TminLine which starts at bit 32.
>
> Ack, oops.
>
> > If
> > implemented, that would require to make ctr a 64-bit integer.
>
> As far as I can tell, this will be safe with existing code - should I
> fold in a patch extending the register?

IMHO it'd be better to extend ctr to 64-bit.  But I'm not sure of the
implications in the rest of the code.

Thanks,

Laurent

> Regards,
>
> Leif
>
> > Thanks,
> >
> > Laurent
> >
> > > +
> > >  FIELD(MIDR_EL1, REVISION, 0, 4)
> > >  FIELD(MIDR_EL1, PARTNUM, 4, 12)
> > >  FIELD(MIDR_EL1, ARCHITECTURE, 16, 4)
> > > --
> > > 2.20.1
> > >
> > >



Re: [PATCH v2 5/5] target/arm: add aarch32 ID register fields to cpu.h

2020-12-15 Thread Laurent Desnogues
On Tue, Dec 15, 2020 at 12:52 PM Leif Lindholm  wrote:
>
> Add entries present in ARM DDI 0487F.c (August 2020).
>
> Signed-off-by: Leif Lindholm 

Reviewed-by: Laurent Desnogues 

Thanks,

Laurent

> ---
>  target/arm/cpu.h | 28 
>  1 file changed, 28 insertions(+)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index efa977eaca..fb81eed776 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -1823,6 +1823,8 @@ FIELD(ID_ISAR6, DP, 4, 4)
>  FIELD(ID_ISAR6, FHM, 8, 4)
>  FIELD(ID_ISAR6, SB, 12, 4)
>  FIELD(ID_ISAR6, SPECRES, 16, 4)
> +FIELD(ID_ISAR6, BF16, 20, 4)
> +FIELD(ID_ISAR6, I8MM, 24, 4)
>
>  FIELD(ID_MMFR0, VMSA, 0, 4)
>  FIELD(ID_MMFR0, PMSA, 4, 4)
> @@ -1833,6 +1835,24 @@ FIELD(ID_MMFR0, AUXREG, 20, 4)
>  FIELD(ID_MMFR0, FCSE, 24, 4)
>  FIELD(ID_MMFR0, INNERSHR, 28, 4)
>
> +FIELD(ID_MMFR1, L1HVDVA, 0, 4)
> +FIELD(ID_MMFR1, L1UNIVA, 4, 4)
> +FIELD(ID_MMFR1, L1HVDSW, 8, 4)
> +FIELD(ID_MMFR1, L1UNISW, 12, 4)
> +FIELD(ID_MMFR1, L1HVD, 16, 4)
> +FIELD(ID_MMFR1, L1UNI, 20, 4)
> +FIELD(ID_MMFR1, L1TSTCLN, 24, 4)
> +FIELD(ID_MMFR1, BPRED, 28, 4)
> +
> +FIELD(ID_MMFR2, L1HVDFG, 0, 4)
> +FIELD(ID_MMFR2, L1HVDBG, 4, 4)
> +FIELD(ID_MMFR2, L1HVDRNG, 8, 4)
> +FIELD(ID_MMFR2, HVDTLB, 12, 4)
> +FIELD(ID_MMFR2, UNITLB, 16, 4)
> +FIELD(ID_MMFR2, MEMBARR, 20, 4)
> +FIELD(ID_MMFR2, WFISTALL, 24, 4)
> +FIELD(ID_MMFR2, HWACCFLG, 28, 4)
> +
>  FIELD(ID_MMFR3, CMAINTVA, 0, 4)
>  FIELD(ID_MMFR3, CMAINTSW, 4, 4)
>  FIELD(ID_MMFR3, BPMAINT, 8, 4)
> @@ -1851,6 +1871,8 @@ FIELD(ID_MMFR4, LSM, 20, 4)
>  FIELD(ID_MMFR4, CCIDX, 24, 4)
>  FIELD(ID_MMFR4, EVT, 28, 4)
>
> +FIELD(ID_MMFR5, ETS, 0, 4)
> +
>  FIELD(ID_PFR0, STATE0, 0, 4)
>  FIELD(ID_PFR0, STATE1, 4, 4)
>  FIELD(ID_PFR0, STATE2, 8, 4)
> @@ -1869,6 +1891,10 @@ FIELD(ID_PFR1, SEC_FRAC, 20, 4)
>  FIELD(ID_PFR1, VIRT_FRAC, 24, 4)
>  FIELD(ID_PFR1, GIC, 28, 4)
>
> +FIELD(ID_PFR2, CSV3, 0, 4)
> +FIELD(ID_PFR2, SSBS, 4, 4)
> +FIELD(ID_PFR2, RAS_FRAC, 8, 4)
> +
>  FIELD(ID_AA64ISAR0, AES, 4, 4)
>  FIELD(ID_AA64ISAR0, SHA1, 8, 4)
>  FIELD(ID_AA64ISAR0, SHA2, 12, 4)
> @@ -1983,6 +2009,8 @@ FIELD(ID_DFR0, MPROFDBG, 20, 4)
>  FIELD(ID_DFR0, PERFMON, 24, 4)
>  FIELD(ID_DFR0, TRACEFILT, 28, 4)
>
> +FIELD(ID_DFR1, MTPMU, 0, 4)
> +
>  FIELD(DBGDIDR, SE_IMP, 12, 1)
>  FIELD(DBGDIDR, NSUHD_IMP, 14, 1)
>  FIELD(DBGDIDR, VERSION, 16, 4)
> --
> 2.20.1
>
>



Re: [PATCH v2 2/5] target/arm: make ARMCPU.clidr 64-bit

2020-12-15 Thread Laurent Desnogues
On Tue, Dec 15, 2020 at 12:52 PM Leif Lindholm  wrote:
>
> The AArch64 view of CLIDR_EL1 extends the ICB field to include also bit
> 32, as well as adding a Ttype field when FEAT_MTE is implemented.
> Extend the clidr field to be able to hold this context.
>
> Signed-off-by: Leif Lindholm 

Reviewed-by: Laurent Desnogues 

Thanks,

Laurent

> ---
>  target/arm/cpu.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 5e3cf77ec7..fadd1a47df 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -938,7 +938,7 @@ struct ARMCPU {
>  uint32_t id_afr0;
>  uint64_t id_aa64afr0;
>  uint64_t id_aa64afr1;
> -uint32_t clidr;
> +uint64_t clidr;
>  uint64_t mp_affinity; /* MP ID without feature bits */
>  /* The elements of this array are the CCSIDR values for each cache,
>   * in the order L1DCache, L1ICache, L2DCache, L2ICache, etc.
> --
> 2.20.1
>
>



Re: [PATCH v2 4/5] target/arm: add aarch64 ID register fields to cpu.h

2020-12-15 Thread Laurent Desnogues
On Tue, Dec 15, 2020 at 12:51 PM Leif Lindholm  wrote:
>
> Add entries present in ARM DDI 0487F.c (August 2020).
>
> Signed-off-by: Leif Lindholm 

Reviewed-by: Laurent Desnogues 

Thanks,

Laurent

> ---
>  target/arm/cpu.h | 15 +++
>  1 file changed, 15 insertions(+)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 90ba707b64..efa977eaca 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -1895,6 +1895,9 @@ FIELD(ID_AA64ISAR1, GPI, 28, 4)
>  FIELD(ID_AA64ISAR1, FRINTTS, 32, 4)
>  FIELD(ID_AA64ISAR1, SB, 36, 4)
>  FIELD(ID_AA64ISAR1, SPECRES, 40, 4)
> +FIELD(ID_AA64ISAR1, BF16, 44, 4)
> +FIELD(ID_AA64ISAR1, DGH, 48, 4)
> +FIELD(ID_AA64ISAR1, I8MM, 52, 4)
>
>  FIELD(ID_AA64PFR0, EL0, 0, 4)
>  FIELD(ID_AA64PFR0, EL1, 4, 4)
> @@ -1905,11 +1908,18 @@ FIELD(ID_AA64PFR0, ADVSIMD, 20, 4)
>  FIELD(ID_AA64PFR0, GIC, 24, 4)
>  FIELD(ID_AA64PFR0, RAS, 28, 4)
>  FIELD(ID_AA64PFR0, SVE, 32, 4)
> +FIELD(ID_AA64PFR0, SEL2, 36, 4)
> +FIELD(ID_AA64PFR0, MPAM, 40, 4)
> +FIELD(ID_AA64PFR0, AMU, 44, 4)
> +FIELD(ID_AA64PFR0, DIT, 48, 4)
> +FIELD(ID_AA64PFR0, CSV2, 56, 4)
> +FIELD(ID_AA64PFR0, CSV3, 60, 4)
>
>  FIELD(ID_AA64PFR1, BT, 0, 4)
>  FIELD(ID_AA64PFR1, SSBS, 4, 4)
>  FIELD(ID_AA64PFR1, MTE, 8, 4)
>  FIELD(ID_AA64PFR1, RAS_FRAC, 12, 4)
> +FIELD(ID_AA64PFR1, MPAM_FRAC, 16, 4)
>
>  FIELD(ID_AA64MMFR0, PARANGE, 0, 4)
>  FIELD(ID_AA64MMFR0, ASIDBITS, 4, 4)
> @@ -1923,6 +1933,8 @@ FIELD(ID_AA64MMFR0, TGRAN16_2, 32, 4)
>  FIELD(ID_AA64MMFR0, TGRAN64_2, 36, 4)
>  FIELD(ID_AA64MMFR0, TGRAN4_2, 40, 4)
>  FIELD(ID_AA64MMFR0, EXS, 44, 4)
> +FIELD(ID_AA64MMFR0, FGT, 56, 4)
> +FIELD(ID_AA64MMFR0, ECV, 60, 4)
>
>  FIELD(ID_AA64MMFR1, HAFDBS, 0, 4)
>  FIELD(ID_AA64MMFR1, VMIDBITS, 4, 4)
> @@ -1932,6 +1944,8 @@ FIELD(ID_AA64MMFR1, LO, 16, 4)
>  FIELD(ID_AA64MMFR1, PAN, 20, 4)
>  FIELD(ID_AA64MMFR1, SPECSEI, 24, 4)
>  FIELD(ID_AA64MMFR1, XNX, 28, 4)
> +FIELD(ID_AA64MMFR1, TWED, 32, 4)
> +FIELD(ID_AA64MMFR1, ETS, 36, 4)
>
>  FIELD(ID_AA64MMFR2, CNP, 0, 4)
>  FIELD(ID_AA64MMFR2, UAO, 4, 4)
> @@ -1958,6 +1972,7 @@ FIELD(ID_AA64DFR0, CTX_CMPS, 28, 4)
>  FIELD(ID_AA64DFR0, PMSVER, 32, 4)
>  FIELD(ID_AA64DFR0, DOUBLELOCK, 36, 4)
>  FIELD(ID_AA64DFR0, TRACEFILT, 40, 4)
> +FIELD(ID_AA64DFR0, MTPMU, 48, 4)
>
>  FIELD(ID_DFR0, COPDBG, 0, 4)
>  FIELD(ID_DFR0, COPSDBG, 4, 4)
> --
> 2.20.1
>
>



Re: [PATCH v2 1/5] target/arm: fix typo in cpu.h ID_AA64PFR1 field name

2020-12-15 Thread Laurent Desnogues
On Tue, Dec 15, 2020 at 12:51 PM Leif Lindholm  wrote:
>
> SBSS -> SSBS
>
> Signed-off-by: Leif Lindholm 

Reviewed-by: Laurent Desnogues 

Thanks,

Laurent

> ---
>  target/arm/cpu.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 7e6c881a7e..5e3cf77ec7 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -1883,7 +1883,7 @@ FIELD(ID_AA64PFR0, RAS, 28, 4)
>  FIELD(ID_AA64PFR0, SVE, 32, 4)
>
>  FIELD(ID_AA64PFR1, BT, 0, 4)
> -FIELD(ID_AA64PFR1, SBSS, 4, 4)
> +FIELD(ID_AA64PFR1, SSBS, 4, 4)
>  FIELD(ID_AA64PFR1, MTE, 8, 4)
>  FIELD(ID_AA64PFR1, RAS_FRAC, 12, 4)
>
> --
> 2.20.1
>
>



Re: [PATCH v2 3/5] target/arm: add descriptions of CLIDR_EL1, CCSIDR_EL1, CTR_EL0 to cpu.h

2020-12-15 Thread Laurent Desnogues
Hello,

On Tue, Dec 15, 2020 at 12:51 PM Leif Lindholm  wrote:
>
> Signed-off-by: Leif Lindholm 
> ---
>  target/arm/cpu.h | 24 
>  1 file changed, 24 insertions(+)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index fadd1a47df..90ba707b64 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -1736,6 +1736,30 @@ FIELD(V7M_FPCCR, ASPEN, 31, 1)
>  /*
>   * System register ID fields.
>   */
> +FIELD(CLIDR_EL1, CTYPE1, 0, 3)
> +FIELD(CLIDR_EL1, CTYPE2, 3, 3)
> +FIELD(CLIDR_EL1, CTYPE3, 6, 3)
> +FIELD(CLIDR_EL1, CTYPE4, 9, 3)
> +FIELD(CLIDR_EL1, CTYPE5, 12, 3)
> +FIELD(CLIDR_EL1, CTYPE6, 15, 3)
> +FIELD(CLIDR_EL1, CTYPE7, 18, 3)
> +FIELD(CLIDR_EL1, LOUIS, 21, 3)
> +FIELD(CLIDR_EL1, LOC, 24, 3)
> +FIELD(CLIDR_EL1, LOUU, 27, 3)
> +FIELD(CLIDR_EL1, ICB, 30, 3)
> +
> +FIELD(CCSIDR_EL1, LINESIZE, 0, 3)
> +FIELD(CCSIDR_EL1, ASSOCIATIVITY, 3, 21)
> +FIELD(CCSIDR_EL1, NUMSETS, 32, 24)

The positions and sizes of the ASSOCIATIVITY and NUMSETS CCSIDR fields
depend on whether the ARMv8.3-CCIDX extension is implemented or not.
If we really want to define the fields this way, we perhaps should
define two sets.  Or at the very least, add a comment stating this
definition is for ARMv8.3-CCIDX.

> +FIELD(CTR_EL0,  IMINLINE, 0, 4)
> +FIELD(CTR_EL0,  L1IP, 14, 2)
> +FIELD(CTR_EL0,  DMINLINE, 16, 4)
> +FIELD(CTR_EL0,  ERG, 20, 4)
> +FIELD(CTR_EL0,  CWG, 24, 4)
> +FIELD(CTR_EL0,  IDC, 28, 1)
> +FIELD(CTR_EL0,  DIC, 29, 1)

There's a missing field:  TminLine which starts at bit 32.  If
implemented, that would require to make ctr a 64-bit integer.

Thanks,

Laurent

> +
>  FIELD(MIDR_EL1, REVISION, 0, 4)
>  FIELD(MIDR_EL1, PARTNUM, 4, 12)
>  FIELD(MIDR_EL1, ARCHITECTURE, 16, 4)
> --
> 2.20.1
>
>



Re: what is the symbole "leul" mean on this code?

2020-06-02 Thread Laurent Desnogues
On Tue, Jun 2, 2020 at 3:13 PM tugouxp <13824125...@163.com> wrote:
>
> HI guys:
>i am stucked by the following code,and i can find document illustrations 
> this nowhere.
>what does the "leul" stands for? and the postfix 10 means what?

That should be little-endian unsigned long (where long is 32-bit).


Laurent

> thank you!
>
> 10   6001  
>
> 11  movi_i32 tmp3,$0x60010020pref=all
>
> 12  mov_i32 tmp5,tmp3dead: 1  pref=all
>
> 13  qemu_ld_i32 tmp4,tmp5,leul,10dead: 1  pref=all
>
> 14  movi_i32 tmp3,$0xfffepref=all
>
> 15  and_i32 pc,tmp4,tmp3 sync: 0  dead: 0 2  pref=all
>
> 16  movi_i32 tmp3,$0x1   pref=all
>
> 17  and_i32 tmp4,tmp4,tmp3   dead: 1 2  pref=all
>
> 18  st_i32 tmp4,env,$0x220   dead: 0 1
>
> 19  exit_tb $0x0
>
> 20  set_label $L0
>
> 21  exit_tb $0x7f0d0223c003
>
>
>
>
>
>



Re: [INFO] Some preliminary performance data

2020-05-09 Thread Laurent Desnogues
On Sat, May 9, 2020 at 2:38 PM Aleksandar Markovic
 wrote:
>
> суб, 9. мај 2020. у 13:37 Laurent Desnogues
>  је написао/ла:
> >
> > On Sat, May 9, 2020 at 12:17 PM Aleksandar Markovic
> >  wrote:
> > >  сре, 6. мај 2020. у 13:26 Alex Bennée  је 
> > > написао/ла:
> > >
> > > > This is very much driven by how much code generation vs running you see.
> > > > In most of my personal benchmarks I never really notice code generation
> > > > because I give my machines large amounts of RAM so code tends to stay
> > > > resident so not need to be re-translated. When the optimiser shows up
> > > > it's usually accompanied by high TB flush and invalidate counts in "info
> > > > jit" because we are doing more translation that we usually do.
> > > >
> > >
> > > Yes, I think the machine was setup with only 128MB RAM.
> > >
> > > That would be an interesting experiment for Ahmed actually - to
> > > measure impact of given RAM memory to performance.
> > >
> > > But it looks that at least for machines with small RAM, translation
> > > phase will take significant percentage.
> > >
> > > I am attaching call graph for translation phase for "Hello World" built
> > > for mips, and emulated by QEMU: *tb_gen_code() and its calees)
> >
>
> Hi, Laurent,
>
> "Hello world" was taken as an example where code generation is
> dominant. It was taken to illustrate how performance-wise code
> generation overhead is distributed (illustrating dominance of a
> single function).
>
> While "Hello world" by itself is not a significant example, it conveys
> a useful information: it says how much is the overhead of QEMU
> linux-user executable initialization, and code generation spent on
> emulation of loading target executable and printing a simple
> message. This can be roughly deducted from the result for
> a meaningful benchmark.
>
> Booting of a virtual machine is a legitimate scenario for measuring
> performance, and perhaps even attempting improving it.
>
> Everything should be measured - code generation, JIT-ed code
> execution, and helpers execution - in all cases, and checked
> whether it departs from expected behavior.
>
> Let's say that we emulate a benchmark that basically runs some
> code in a loop, or an algorithm - one would expect that after a
> while, while increasing number of iterations of the loop, or the
> size of data in the algorithm, code generation becomes less and
> less significant, converging to zero. Well, this should be confirmed
> with an experiment, and not taken for granted.
>
> I think limiting measurements only on, let's say, execution of
> JIT-ed code (if that is what you implied) is a logical mistake.
> The right conclusions should be drawn from the complete
> picture, shouldn't it?

I explicitly wrote that you should consider a wide spectrum of
programs so I think we're in violent agreement ;-)

Thanks,

Laurent

> Yours,
> Aleksandar
>
> > Sorry if I'm stating the obvious but both "Hello World" and a
> > Linux boot will exhibit similar behaviors with low reuse of
> > translated blocks, which means translation will show up in
> > profiles as a lot of time is spent in translating blocks that
> > will run once.  If you push in that direction you might reach
> > the conclusion that a non JIST simulator is faster than QEMU.
> >
> > You will have to carefully select the tests you run:  you need
> > a large spectrum from Linux boot, "Hello World" up to synthetic
> > benchmarks.
> >
> > Again sorry if that was too trivial :-)
> >
> > Laurent



Re: [INFO] Some preliminary performance data

2020-05-09 Thread Laurent Desnogues
On Sat, May 9, 2020 at 12:17 PM Aleksandar Markovic
 wrote:
>  сре, 6. мај 2020. у 13:26 Alex Bennée  је написао/ла:
>
> > This is very much driven by how much code generation vs running you see.
> > In most of my personal benchmarks I never really notice code generation
> > because I give my machines large amounts of RAM so code tends to stay
> > resident so not need to be re-translated. When the optimiser shows up
> > it's usually accompanied by high TB flush and invalidate counts in "info
> > jit" because we are doing more translation that we usually do.
> >
>
> Yes, I think the machine was setup with only 128MB RAM.
>
> That would be an interesting experiment for Ahmed actually - to
> measure impact of given RAM memory to performance.
>
> But it looks that at least for machines with small RAM, translation
> phase will take significant percentage.
>
> I am attaching call graph for translation phase for "Hello World" built
> for mips, and emulated by QEMU: *tb_gen_code() and its calees)

Sorry if I'm stating the obvious but both "Hello World" and a
Linux boot will exhibit similar behaviors with low reuse of
translated blocks, which means translation will show up in
profiles as a lot of time is spent in translating blocks that
will run once.  If you push in that direction you might reach
the conclusion that a non JIST simulator is faster than QEMU.

You will have to carefully select the tests you run:  you need
a large spectrum from Linux boot, "Hello World" up to synthetic
benchmarks.

Again sorry if that was too trivial :-)

Laurent



Re: [PATCH v2 4/4] target/arm: Fix tcg_gen_gvec_dup_imm vs DUP (indexed)

2020-05-07 Thread Laurent Desnogues
On Thu, May 7, 2020 at 7:23 PM Richard Henderson
 wrote:
>
> DUP (indexed) can duplicate 128-bit elements, so using esz
> unconditionally can assert in tcg_gen_gvec_dup_imm.
>
> Fixes: 8711e71f9cbb
> Reported-by: Laurent Desnogues 
> Signed-off-by: Richard Henderson 

Reviewed-by: Laurent Desnogues 
Tested-by: Laurent Desnogues 

I had the same fix locally.

Thanks,

Laurent


> ---
>  target/arm/translate-sve.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index c8649283be..83614e9e70 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -2044,7 +2044,11 @@ static bool trans_DUP_x(DisasContext *s, arg_DUP_x *a)
>  unsigned nofs = vec_reg_offset(s, a->rn, index, esz);
>  tcg_gen_gvec_dup_mem(esz, dofs, nofs, vsz, vsz);
>  } else {
> -tcg_gen_gvec_dup_imm(esz, dofs, vsz, vsz, 0);
> +/*
> + * While dup_mem handles 128-bit elements, dup_imm does not.
> + * Thankfully element size doesn't matter for splatting zero.
> + */
> +tcg_gen_gvec_dup_imm(MO_64, dofs, vsz, vsz, 0);
>  }
>  }
>  return true;
> --
> 2.20.1
>



Re: [PULL 04/10] target/arm: Use tcg_gen_gvec_dup_imm

2020-05-07 Thread Laurent Desnogues
Hello,

On Wed, May 6, 2020 at 8:30 PM Richard Henderson
 wrote:
>
> In a few cases, we're able to remove some manual replication.
>
> Reviewed-by: Alex Bennée 
> Signed-off-by: Richard Henderson 
> ---
>  target/arm/translate-a64.c | 10 +-
>  target/arm/translate-sve.c | 12 +---
>  target/arm/translate.c |  9 ++---
>  3 files changed, 16 insertions(+), 15 deletions(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index a896f9c4b8..62e5729904 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
[...]
> @@ -2044,7 +2044,7 @@ static bool trans_DUP_x(DisasContext *s, arg_DUP_x *a)
>  unsigned nofs = vec_reg_offset(s, a->rn, index, esz);
>  tcg_gen_gvec_dup_mem(esz, dofs, nofs, vsz, vsz);
>  } else {
> -tcg_gen_gvec_dup64i(dofs, vsz, vsz, 0);
> +tcg_gen_gvec_dup_imm(esz, dofs, vsz, vsz, 0);

For an indexed DUP, size can be 128-bit so that will fail the first
assert in tcg-op-gvec.c:do_dup.

Thanks,

Laurent



Re: [PATCH v3] target/arm: Use correct variable for setting 'max' cpu's MIDR_EL1

2020-04-29 Thread Laurent Desnogues
On Tue, Apr 28, 2020 at 7:26 PM Philippe Mathieu-Daudé  wrote:
>
> MIDR_EL1 a 64-bit system register with the top 32-bit being RES0.
>
> This fixes when compiling with -Werror=conversion:
>
>   target/arm/cpu64.c: In function ‘aarch64_max_initfn’:
>   target/arm/cpu64.c:628:21: error: conversion from ‘uint64_t’ {aka ‘long 
> unsigned int’} to ‘uint32_t’ {aka ‘unsigned int’} may change value 
> [-Werror=conversion]
> 628 | cpu->midr = t;
> |     ^
>
> Suggested-by: Laurent Desnogues 
> Suggested-by: Peter Maydell 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Laurent Desnogues 

Thanks,

Laurent

> ---
> Since v2: Do not use RESERVED bits.
> Since v1: Follow Laurent and Peter suggestion.
> ---
>  target/arm/cpu.h | 2 +-
>  target/arm/cpu.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 8b9f2961ba..592fb217d6 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -894,7 +894,7 @@ struct ARMCPU {
>  uint64_t id_aa64dfr0;
>  uint64_t id_aa64dfr1;
>  } isar;
> -uint32_t midr;
> +uint64_t midr;
>  uint32_t revidr;
>  uint32_t reset_fpsid;
>  uint32_t ctr;
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index a79f233b17..7ff80894b6 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -2757,7 +2757,7 @@ static const ARMCPUInfo arm_cpus[] = {
>  static Property arm_cpu_properties[] = {
>  DEFINE_PROP_BOOL("start-powered-off", ARMCPU, start_powered_off, false),
>  DEFINE_PROP_UINT32("psci-conduit", ARMCPU, psci_conduit, 0),
> -DEFINE_PROP_UINT32("midr", ARMCPU, midr, 0),
> +DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0),
>  DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
>  mp_affinity, ARM64_AFFINITY_INVALID),
>  DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
> --
> 2.21.1
>



Re: [PATCH v2] target/arm: Use correct variable for setting 'max' cpu's MIDR_EL1

2020-04-28 Thread Laurent Desnogues
On Tue, Apr 28, 2020 at 5:50 PM Philippe Mathieu-Daudé  wrote:
>
> MIDR_EL1 a 64-bit system register with the top 32-bit being RES0.
>
> This fixes when compiling with -Werror=conversion:
>
>   target/arm/cpu64.c: In function ‘aarch64_max_initfn’:
>   target/arm/cpu64.c:628:21: error: conversion from ‘uint64_t’ {aka ‘long 
> unsigned int’} to ‘uint32_t’ {aka ‘unsigned int’} may change value 
> [-Werror=conversion]
> 628 | cpu->midr = t;
> |     ^
>
> Suggested-by: Laurent Desnogues 
> Suggested-by: Peter Maydell 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> I suppose cp15.c0_cpuid register in target/arm/cpu.h as uint32_t is OK.
>
> Since v1: Follow Laurent and Peter suggestion.
> ---
>  target/arm/cpu.h | 3 ++-
>  target/arm/cpu.c | 4 +++-
>  2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 8b9f2961ba..4d1be56df9 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -894,7 +894,7 @@ struct ARMCPU {
>  uint64_t id_aa64dfr0;
>  uint64_t id_aa64dfr1;
>  } isar;
> -uint32_t midr;
> +uint64_t midr;
>  uint32_t revidr;
>  uint32_t reset_fpsid;
>  uint32_t ctr;
> @@ -1685,6 +1685,7 @@ FIELD(MIDR_EL1, PARTNUM, 4, 12)
>  FIELD(MIDR_EL1, ARCHITECTURE, 16, 4)
>  FIELD(MIDR_EL1, VARIANT, 20, 4)
>  FIELD(MIDR_EL1, IMPLEMENTER, 24, 8)
> +FIELD(MIDR_EL1, RESERVED, 32, 32)

If you follow Peter advice to not check these 32-bits, you could
remove that field definition and if you keep it please rename it RES0
:-).

Thanks,

Laurent

>
>  FIELD(ID_ISAR0, SWAP, 0, 4)
>  FIELD(ID_ISAR0, BITCOUNT, 4, 4)
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index a79f233b17..aaa48e06ac 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -1182,6 +1182,8 @@ void arm_cpu_post_init(Object *obj)
>  {
>  ARMCPU *cpu = ARM_CPU(obj);
>
> +assert(FIELD_EX64(cpu->midr, MIDR_EL1, RESERVED) == 0);
> +
>  /* M profile implies PMSA. We have to do this here rather than
>   * in realize with the other feature-implication checks because
>   * we look at the PMSA bit to see if we should add some properties.
> @@ -2757,7 +2759,7 @@ static const ARMCPUInfo arm_cpus[] = {
>  static Property arm_cpu_properties[] = {
>  DEFINE_PROP_BOOL("start-powered-off", ARMCPU, start_powered_off, false),
>  DEFINE_PROP_UINT32("psci-conduit", ARMCPU, psci_conduit, 0),
> -DEFINE_PROP_UINT32("midr", ARMCPU, midr, 0),
> +DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0),
>  DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
>  mp_affinity, ARM64_AFFINITY_INVALID),
>  DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
> --
> 2.21.1
>



Re: ARM SVE issues with non "standard" vector lengths

2020-04-23 Thread Laurent Desnogues
On Thu, Apr 23, 2020 at 5:19 PM Richard Henderson
 wrote:
>
> On 4/23/20 6:59 AM, Laurent Desnogues wrote:
> > 2. sve_zip_p
> >
> > This generates extraneous data in the higher part of the result.
> >
> > I hit this when I got a wrong result on an instruction that ends up
> > using sve_cntp which counts all bits set in each 64-bit chunk. There
> > might be some other instructions beyond ZIP that generate extra data
> > that would break sve_cntp.  So perhaps it'd be easier to fix sve_cmtp
> > (and hope that it's the only function that uses bits beyond vector
> > length...).
>
> There are quite a few places that assume (and some that assert, such as the
> load/store paths) that there are no predicate bits set beyond VL.
>
> I will fix zip.

Looking at the code I think sve_punpk_p is affected too.

Thanks,

Laurent

>
>
> r~



Re: [PATCH] target/arm: Use correct variable for setting 'max' cpu's MIDR_EL1

2020-04-23 Thread Laurent Desnogues
On Thu, Apr 23, 2020 at 2:44 PM Philippe Mathieu-Daudé  wrote:
>
> MIDR_EL1 is a 32-bit register.

In fact MIDR_EL1 a 64-bit system register with the top 32-bit being RES0.

So the right fix might be to change midr field size, just to be future proof :-)

But if we stick to a 32-bit midr then:

Reviewed-by: Laurent Desnogues 

Thanks,

Laurent

> This fixes when compiling with -Werror=conversion:
>
>   target/arm/cpu64.c: In function ‘aarch64_max_initfn’:
>   target/arm/cpu64.c:628:21: error: conversion from ‘uint64_t’ {aka ‘long 
> unsigned int’} to ‘uint32_t’ {aka ‘unsigned int’} may change value 
> [-Werror=conversion]
> 628 | cpu->midr = t;
> | ^
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  target/arm/cpu64.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> index 95d0c8c101..4eb0a9030e 100644
> --- a/target/arm/cpu64.c
> +++ b/target/arm/cpu64.c
> @@ -620,12 +620,12 @@ static void aarch64_max_initfn(Object *obj)
>   * code needs to distinguish this QEMU CPU from other software
>   * implementations, though this shouldn't be needed.
>   */
> -t = FIELD_DP64(0, MIDR_EL1, IMPLEMENTER, 0);
> -t = FIELD_DP64(t, MIDR_EL1, ARCHITECTURE, 0xf);
> -t = FIELD_DP64(t, MIDR_EL1, PARTNUM, 'Q');
> -t = FIELD_DP64(t, MIDR_EL1, VARIANT, 0);
> -t = FIELD_DP64(t, MIDR_EL1, REVISION, 0);
> -cpu->midr = t;
> +u = FIELD_DP32(0, MIDR_EL1, IMPLEMENTER, 0);
> +u = FIELD_DP32(u, MIDR_EL1, ARCHITECTURE, 0xf);
> +u = FIELD_DP32(u, MIDR_EL1, PARTNUM, 'Q');
> +u = FIELD_DP32(u, MIDR_EL1, VARIANT, 0);
> +u = FIELD_DP32(u, MIDR_EL1, REVISION, 0);
> +cpu->midr = u;
>
>  t = cpu->isar.id_aa64isar0;
>  t = FIELD_DP64(t, ID_AA64ISAR0, AES, 2); /* AES + PMULL */
> --
> 2.21.1
>
>



Re: [PATCH] target/arm: Use correct variable for setting 'max' cpu's ID_AA64DFR0

2020-04-23 Thread Laurent Desnogues
On Thu, Apr 23, 2020 at 1:09 PM Peter Maydell  wrote:
>
> In aarch64_max_initfn() we update both 32-bit and 64-bit ID
> registers.  The intended pattern is that for 64-bit ID registers we
> use FIELD_DP64 and the uint64_t 't' register, while 32-bit ID
> registers use FIELD_DP32 and the uint32_t 'u' register.  For
> ID_AA64DFR0 we accidentally used 'u', meaning that the top 32 bits of
> this 64-bit ID register would end up always zero.  Luckily at the
> moment that's what they should be anyway, so this bug has no visible
> effects.
>
> Use the right-sized variable.
>
> Fixes: 3bec78447a958d481991
> Signed-off-by: Peter Maydell 

Reviewed-by: Laurent Desnogues 

Thanks,

Laurent

> ---
>  target/arm/cpu64.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> index 95d0c8c101a..4c7105ea1a1 100644
> --- a/target/arm/cpu64.c
> +++ b/target/arm/cpu64.c
> @@ -708,9 +708,9 @@ static void aarch64_max_initfn(Object *obj)
>  u = FIELD_DP32(u, ID_MMFR4, CNP, 1); /* TTCNP */
>  cpu->isar.id_mmfr4 = u;
>
> -u = cpu->isar.id_aa64dfr0;
> -u = FIELD_DP64(u, ID_AA64DFR0, PMUVER, 5); /* v8.4-PMU */
> -cpu->isar.id_aa64dfr0 = u;
> +t = cpu->isar.id_aa64dfr0;
> +t = FIELD_DP64(t, ID_AA64DFR0, PMUVER, 5); /* v8.4-PMU */
> +cpu->isar.id_aa64dfr0 = t;
>
>  u = cpu->isar.id_dfr0;
>  u = FIELD_DP32(u, ID_DFR0, PERFMON, 5); /* v8.4-PMU */
> --
> 2.20.1
>
>



Re: [PATCH for-5.0?] target/arm: Fix ID_MMFR4 value on AArch64 'max' CPU

2020-04-22 Thread Laurent Desnogues
On Wed, Apr 22, 2020 at 2:45 PM Peter Maydell  wrote:
>
> In commit 41a4bf1feab098da4cd the added code to set the CNP
> field in ID_MMFR4 for the AArch64 'max' CPU had a typo
> where it used the wrong variable name, resulting in ID_MMFR4
> fields AC2, XNX and LSM being wrong. Fix the typo.
>
> Fixes: 41a4bf1feab098da4cd
> Reported-by: Laurent Desnogues 
> Signed-off-by: Peter Maydell 

Reviewed-by: Laurent Desnogues 

Thanks,

Laurent

> ---
> maybe 5.0 just because it's so trivial. I dunno...
>
> There's also an error where we use the uint32_t u variable
> to update the 64-bit ID_AA64DFR0 register, but that's harmless
> because as it happens the top 32 bits of that register are
> all zeroes anyway, so we can just fix that in 5.1.
>
>  target/arm/cpu64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> index 62d36f9e8d3..95d0c8c101a 100644
> --- a/target/arm/cpu64.c
> +++ b/target/arm/cpu64.c
> @@ -705,7 +705,7 @@ static void aarch64_max_initfn(Object *obj)
>  u = cpu->isar.id_mmfr4;
>  u = FIELD_DP32(u, ID_MMFR4, HPDS, 1); /* AA32HPD */
>  u = FIELD_DP32(u, ID_MMFR4, AC2, 1); /* ACTLR2, HACTLR2 */
> -u = FIELD_DP32(t, ID_MMFR4, CNP, 1); /* TTCNP */
> +u = FIELD_DP32(u, ID_MMFR4, CNP, 1); /* TTCNP */
>  cpu->isar.id_mmfr4 = u;
>
>  u = cpu->isar.id_aa64dfr0;
> --
> 2.20.1
>



Re: [PATCH 21/31] target/arm: Implement SVE2 integer absolute difference and accumulate long

2020-04-14 Thread Laurent Desnogues
On Tue, Apr 14, 2020 at 1:19 AM Richard Henderson
 wrote:
>
> On 4/13/20 9:15 AM, Laurent Desnogues wrote:
> > On Fri, Mar 27, 2020 at 12:18 AM Richard Henderson
> >  wrote:
> > [...]
> >> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> >> index a3653007ac..a0995d95c7 100644
> >> --- a/target/arm/sve_helper.c
> >> +++ b/target/arm/sve_helper.c
> >> @@ -1216,6 +1216,30 @@ DO_ZZZ_NTB(sve2_eoril_d, uint64_t, , DO_EOR)
> >>
> >>  #undef DO_ZZZ_NTB
> >>
> >> +#define DO_ABAL(NAME, TYPE, TYPEN) \
> >> +void HELPER(NAME)(void *vd, void *va, void *vn, void *vm, uint32_t desc) \
> >> +{  \
> >> +intptr_t i, opr_sz = simd_oprsz(desc); \
> >> +int sel1 = (simd_data(desc) & 1) * sizeof(TYPE);   \
> >> +int sel2 = (simd_data(desc) & 2) * (sizeof(TYPE) / 2); \
> >> +for (i = 0; i < opr_sz; i += sizeof(TYPE)) {   \
> >> +TYPE nn = (TYPEN)(*(TYPE *)(vn + i) >> sel1);  \
> >> +TYPE mm = (TYPEN)(*(TYPE *)(vm + i) >> sel2);  \
> >> +TYPE aa = *(TYPE *)(va + i);   \
> >> +*(TYPE *)(vd + i) = DO_ABD(nn, mm) + aa;   \
> >> +}  \
> >> +}
> >
> > ABAL is either top or bottom not a mix of two.  So only sel1 is needed
> > and its multiplicand should be the number of bits of TYPEN.
>
> Yep.
>
> > vd is both a source and a destination so a temporary should be used.
>
> In what way am I not?  Both sources are read before the write.  The operands
> are all in columns of the wide type (unlike the addp case you pointed out).

You're right, sorry.

Laurent



Re: [PATCH 20/31] target/arm: Implement SVE2 complex integer add

2020-04-13 Thread Laurent Desnogues
On Fri, Mar 27, 2020 at 12:20 AM Richard Henderson
 wrote:
[...]
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index b5afa34efe..a3653007ac 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -1289,6 +1289,48 @@ DO_BITPERM(sve2_bgrp_d, uint64_t, bitgroup)
>
>  #undef DO_BITPERM
>
> +#define DO_CADD(NAME, TYPE, H, ADD_OP, SUB_OP)  \
> +void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc)  \
> +{   \
> +intptr_t i, opr_sz = simd_oprsz(desc);  \
> +int sub_r = simd_data(desc);\
> +if (sub_r) {\
> +for (i = 0; i < opr_sz; i += 2 * sizeof(TYPE)) {\
> +TYPE acc_r = *(TYPE *)(vn + H(i));  \
> +TYPE acc_i = *(TYPE *)(vn + H(i + sizeof(TYPE)));   \
> +TYPE el2_r = *(TYPE *)(vm + H(i));  \
> +TYPE el2_i = *(TYPE *)(vm + H(i + sizeof(TYPE)));   \
> +acc_r = SUB_OP(acc_r, el2_i);   \
> +acc_i = ADD_OP(acc_i, el2_r);   \
> +*(TYPE *)(vd + H(i)) = acc_r;   \
> +*(TYPE *)(vd + H(i + sizeof(TYPE))) = acc_i;\
> +}   \
> +} else {\
> +for (i = 0; i < opr_sz; i += 2 * sizeof(TYPE)) {\
> +TYPE acc_r = *(TYPE *)(vn + H(i));  \
> +TYPE acc_i = *(TYPE *)(vn + H(i + sizeof(TYPE)));   \
> +TYPE el2_r = *(TYPE *)(vm + H(i));  \
> +TYPE el2_i = *(TYPE *)(vm + H(i + sizeof(TYPE)));   \
> +acc_r = ADD_OP(acc_r, el2_i);   \
> +acc_i = SUB_OP(acc_i, el2_r);   \
> +*(TYPE *)(vd + H(i)) = acc_r;   \
> +*(TYPE *)(vd + H(i + sizeof(TYPE))) = acc_i;\
> +}   \
> +}   \
> +}

The then/else branches of if (sub_r) are swapped.

Laurent



Re: [PATCH 22/31] target/arm: Implement SVE2 integer add/subtract long with carry

2020-04-13 Thread Laurent Desnogues
On Fri, Mar 27, 2020 at 12:17 AM Richard Henderson
 wrote:
[...]
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index a0995d95c7..aa330f75c3 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
[...]
> +void HELPER(sve2_adcl_d)(void *vd, void *va, void *vn, void *vm, uint32_t 
> desc)
> +{
> +intptr_t i, opr_sz = simd_oprsz(desc);
> +int sel = extract32(desc, SIMD_DATA_SHIFT, 1) * 32;
> +uint64_t inv = -(uint64_t)extract32(desc, SIMD_DATA_SHIFT + 1, 1);
> +uint64_t *d = vd, *a = va, *n = vn, *m = vm;
> +
> +for (i = 0; i < opr_sz / 8; i += 2) {
> +Int128 e1 = int128_make64(a[i]);
> +Int128 e2 = int128_make64(n[i + sel] ^ inv);
> +Int128 c = int128_make64(m[i + 1] & 1);
> +Int128 r = int128_add(int128_add(e1, e2), c);
> +d[i + 0] = int128_getlo(r);
> +d[i + 1] = int128_gethi(r);
> +}
> +}

sel should not be multiplied by 32.

Laurent



Re: [PATCH 21/31] target/arm: Implement SVE2 integer absolute difference and accumulate long

2020-04-13 Thread Laurent Desnogues
On Fri, Mar 27, 2020 at 12:18 AM Richard Henderson
 wrote:
[...]
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index a3653007ac..a0995d95c7 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -1216,6 +1216,30 @@ DO_ZZZ_NTB(sve2_eoril_d, uint64_t, , DO_EOR)
>
>  #undef DO_ZZZ_NTB
>
> +#define DO_ABAL(NAME, TYPE, TYPEN) \
> +void HELPER(NAME)(void *vd, void *va, void *vn, void *vm, uint32_t desc) \
> +{  \
> +intptr_t i, opr_sz = simd_oprsz(desc); \
> +int sel1 = (simd_data(desc) & 1) * sizeof(TYPE);   \
> +int sel2 = (simd_data(desc) & 2) * (sizeof(TYPE) / 2); \
> +for (i = 0; i < opr_sz; i += sizeof(TYPE)) {   \
> +TYPE nn = (TYPEN)(*(TYPE *)(vn + i) >> sel1);  \
> +TYPE mm = (TYPEN)(*(TYPE *)(vm + i) >> sel2);  \
> +TYPE aa = *(TYPE *)(va + i);   \
> +*(TYPE *)(vd + i) = DO_ABD(nn, mm) + aa;   \
> +}  \
> +}

ABAL is either top or bottom not a mix of two.  So only sel1 is needed
and its multiplicand should be the number of bits of TYPEN.
vd is both a source and a destination so a temporary should be used.

Laurent



Re: [PATCH 13/31] target/arm: Implement SVE2 integer add/subtract wide

2020-04-13 Thread Laurent Desnogues
On Fri, Mar 27, 2020 at 12:17 AM Richard Henderson
 wrote:
[...]
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index 7d7a59f620..44503626e4 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -1131,6 +1131,36 @@ DO_ZZZ_TB(sve2_uabdl_d, uint64_t, uint32_t, DO_ABD)
>
>  #undef DO_ZZZ_TB
>
> +#define DO_ZZZ_WTB(NAME, TYPE, TYPEN, OP) \
> +void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \
> +{  \
> +intptr_t i, opr_sz = simd_oprsz(desc); \
> +int sel2 = (simd_data(desc) & 1) * sizeof(TYPE);   \
> +for (i = 0; i < opr_sz; i += sizeof(TYPE)) {   \
> +TYPE nn = *(TYPE *)(vn + i);   \
> +TYPE mm = (TYPEN)(*(TYPE *)(vm + i) >> sel2);  \
> +*(TYPE *)(vd + i) = OP(nn, mm);\
> +}  \
> +}

For sel2 the multiplicand should be the number of bits of TYPEN.

Laurent



Re: [PATCH 11/31] target/arm: Implement SVE2 integer add/subtract long

2020-04-13 Thread Laurent Desnogues
On Fri, Mar 27, 2020 at 12:09 AM Richard Henderson
 wrote:
[...]
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index bee00eaa44..7d7a59f620 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -1088,6 +1088,49 @@ DO_ZZW(sve_lsl_zzw_s, uint32_t, uint64_t, H1_4, DO_LSL)
>  #undef DO_ZPZ
>  #undef DO_ZPZ_D
>
> +/*
> + * Three-operand expander, unpredicated, in which the two inputs are
> + * selected from the top or bottom half of the wide column.
> + */
> +#define DO_ZZZ_TB(NAME, TYPE, TYPEN, OP) \
> +void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \
> +{  \
> +intptr_t i, opr_sz = simd_oprsz(desc); \
> +int sel1 = (simd_data(desc) & 1) * sizeof(TYPE);   \
> +int sel2 = (simd_data(desc) & 2) * (sizeof(TYPE) / 2); \
> +for (i = 0; i < opr_sz; i += sizeof(TYPE)) {   \
> +TYPE nn = (TYPEN)(*(TYPE *)(vn + i) >> sel1);  \
> +TYPE mm = (TYPEN)(*(TYPE *)(vm + i) >> sel2);  \
> +*(TYPE *)(vd + i) = OP(nn, mm);\
> +}  \
> +}

For sel1/sel2 the multiplicand should be the number of bits in TYPEN.

Laurent



Re: [PATCH 09/31] target/arm: Implement SVE2 integer pairwise arithmetic

2020-04-13 Thread Laurent Desnogues
On Fri, Mar 27, 2020 at 12:16 AM Richard Henderson
 wrote:
[...]
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index 5d75aed7b7..d7c181ddb8 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -681,6 +681,73 @@ DO_ZPZZ_D(sve2_uhsub_zpzz_d, uint64_t, DO_HSUB_D)
>  #undef DO_ZPZZ
>  #undef DO_ZPZZ_D
>
> +/*
> + * Three operand expander, operating on element pairs.
> + * If the slot I is even, the elements from from VN {I, I+1}.
> + * If the slot I is odd, the elements from from VM {I-1, I}.
> + */
> +#define DO_ZPZZ_PAIR(NAME, TYPE, H, OP) \
> +void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \
> +{   \
> +intptr_t i, opr_sz = simd_oprsz(desc);  \
> +for (i = 0; i < opr_sz; ) { \
> +uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
> +do {\
> +if (pg & 1) {   \
> +void *p = (i & 1 ? vm : vn);\
> +TYPE nn = *(TYPE *)(p + H(i & ~1)); \
> +TYPE mm = *(TYPE *)(p + H(i | 1));  \
> +*(TYPE *)(vd + H(i)) = OP(nn, mm);  \
> +}   \
> +i += sizeof(TYPE), pg >>= sizeof(TYPE); \
> +} while (i & 15);   \
> +}   \
> +}

You should not use 1 as mask but sizeof(TYPE).
A temporary should be used because vd also is a source.

> +/* Similarly, specialized for 64-bit operands.  */
> +#define DO_ZPZZ_PAIR_D(NAME, TYPE, OP)  \
> +void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \
> +{   \
> +intptr_t i, opr_sz = simd_oprsz(desc) / 8;  \
> +TYPE *d = vd, *n = vn, *m = vm; \
> +uint8_t *pg = vg;   \
> +for (i = 0; i < opr_sz; i += 1) {   \
> +if (pg[H1(i)] & 1) {\
> +TYPE *p = (i & 1 ? m : n) + (i & ~1);   \
> +TYPE nn = p[0], mm = p[1];  \
> +d[i] = OP(nn, mm);  \
> +}   \
> +}   \
> +}

A temporary should be used because vd also is a source.

Laurent



Re: QEMU for aarch64 with plugins seems to fail basic consistency checks

2020-01-24 Thread Laurent Desnogues
Hello,

On Fri, Jan 24, 2020 at 1:45 AM Robert Henry  wrote:
>
> I wrote a QEMU plugin for aarch64 where the insn and mem callbacks print out 
> the specifics of the guest instructions as they are "executed".  I expect 
> this trace stream to be well behaved but it is not. By well-behaved, I expect 
> memory insns print out some memory details, non-memory insns don't print 
> anything, and the pc only changes after a control flow instruction.  I don't 
> see that gross correctness about 2% of the time.
>
> I'm using qemu at tag v4.2.0 (or master head; it doesn't matter), running on 
> a x86_64 host.
> I build qemu using   ./configure --disable-sdl --enable-gtk --enable-plugins 
> --enable-debug --target-list=aarch64-softmmu aarch64-linux-user
> I execute qemu from its build area build/aarch64-linux-user/qemu-aarch64, 
> with flags --cpu cortex-a72 and the appropriate args to --plugin ... -d 
> plugin -D .
> I'm emulating a simple C program in linux emulation mode.
> The resulting qemu execution is valgrind clean (eg, I run qemu under 
> valgrind) for my little program save for memory leaks I reported a few days 
> ago.
>
> Below is an example of my trace output (the first int printed is the 
> cpu_index, checked to be always 0). Note that the ldr instruction at 0x41a608 
> sometimes reports a memop, but most of the time it doesn't.  Note that 
> 0x41a608 is seen, by trace, running back to back. Note that (bottom of trace) 
> that the movz instruction reports a memop.  (The executed code comes from 
> glibc _dl_aux_init, executed before main() is called.)
>
> How should this problem be tackled? I can't figure out how to make each tcg 
> block be exactly 1 guest (aarch64) insn, which is where I'd first start out.

To get TBs of a single instruction, you can add -singlestep to the command line.

HTH,

Laurent

> 0 0x0041a784 0x0041a784 0xf1000c3f cmp x1, #3
> 0 0x0041a788 0x0041a788 0x54fff401 b.ne #0xfe80
> 0 0x0041a78c 0x0041a78c 0x52800033 movz w19, #0x1
> 0 0x0041a790 0x0041a790 0xf9400416 ldr x22, [x0, #8] 0 mem  
> {3 0 0 0} 0x004000800618
> 0 0x0041a794 0x0041a794 0x179d b #0xfe74
> 0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!  0 
> mem  {3 0 0 0} 0x004000800620
> 0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
> 0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
> 0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
> 0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
> 0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
> 0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
> 0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
> 0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
> 0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
> 0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
> 0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
> 0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
> 0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
> 0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!  0 
> mem  {3 0 0 0} 0x004000800630
> 0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
> 0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
> 0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
> 0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
> 0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
> 0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
> 0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
> 0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
> 0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
> 0 0x0041a7d8 0x0041a7d8 0x52800035 movz w21, #0x1
> 0 0x0041a7dc 0x0041a7dc 0xf9400418 ldr x24, [x0, #8] 0 mem  
> {3 0 0 0} 0x004000800638
> 0 0x0041a7e0 0x0041a7e0 0x178a b #0xfe28
> 0 0x0041a7d8 0x0041a7d8 0x52800035 movz w21, #0x1 0 mem  {3 0 
> 0 0} 0x004000800640
>
>
>
>
>



Re: [PATCH 0/3] linux-user: Implement x86_64 vsyscalls

2020-01-14 Thread Laurent Desnogues
On Tue, Jan 14, 2020 at 10:09 PM Richard Henderson
 wrote:
[...]
> I vaguely remember someone (Paolo?) implementing something like
> this many years ago, but clearly it never got merged.

That was me back in 2009:

https://lists.gnu.org/archive/html/qemu-devel/2009-07/msg00881.html

Glad it will finally get an official fix.

Thanks,

Laurent



Re: [PATCH v4 04/40] target/arm: Add TTBR1_EL2

2019-12-10 Thread Laurent Desnogues
Hi Richard,

On Tue, Dec 3, 2019 at 3:35 AM Richard Henderson
 wrote:
>
> At the same time, add writefn to TTBR0_EL2 and TCR_EL2.
> A later patch will update any ASID therein.
>
> Reviewed-by: Alex Bennée 
> Signed-off-by: Richard Henderson 
> ---
>  target/arm/helper.c | 18 +-
>  1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index b4d774632d..06ec4641f3 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -3484,6 +3484,12 @@ static void vmsa_ttbr_write(CPUARMState *env, const 
> ARMCPRegInfo *ri,
>  raw_write(env, ri, value);
>  }
>
> +static void vmsa_tcr_ttbr_el2_write(CPUARMState *env, const ARMCPRegInfo *ri,
> +uint64_t value)
> +{
> +raw_write(env, ri, value);
> +}
> +
>  static void vttbr_write(CPUARMState *env, const ARMCPRegInfo *ri,
>  uint64_t value)
>  {
> @@ -4893,10 +4899,8 @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
>.resetvalue = 0 },
>  { .name = "TCR_EL2", .state = ARM_CP_STATE_BOTH,
>.opc0 = 3, .opc1 = 4, .crn = 2, .crm = 0, .opc2 = 2,
> -  .access = PL2_RW,
> -  /* no .writefn needed as this can't cause an ASID change;
> -   * no .raw_writefn or .resetfn needed as we never use mask/base_mask
> -   */
> +  .access = PL2_RW, .writefn = vmsa_tcr_ttbr_el2_write,

Are you sure you should call vmsa_tcr_ttbr_el2_write for a tcr_el2
write?  As far as I can see, tcr_el2 has no ASID field in bits
[63:48]. On the other hand there are various other bits in TCR_EL2
that might require a TLB flush; for instance the A1 bit [22] that
defines where to pick ASID from.

Thanks,

Laurent

> +  /* no .raw_writefn or .resetfn needed as we never use mask/base_mask */
>.fieldoffset = offsetof(CPUARMState, cp15.tcr_el[2]) },
>  { .name = "VTCR", .state = ARM_CP_STATE_AA32,
>.cp = 15, .opc1 = 4, .crn = 2, .crm = 1, .opc2 = 2,
> @@ -4930,7 +4934,7 @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
>.fieldoffset = offsetof(CPUARMState, cp15.tpidr_el[2]) },
>  { .name = "TTBR0_EL2", .state = ARM_CP_STATE_AA64,
>.opc0 = 3, .opc1 = 4, .crn = 2, .crm = 0, .opc2 = 0,
> -  .access = PL2_RW, .resetvalue = 0,
> +  .access = PL2_RW, .resetvalue = 0, .writefn = vmsa_tcr_ttbr_el2_write,
>.fieldoffset = offsetof(CPUARMState, cp15.ttbr0_el[2]) },
>  { .name = "HTTBR", .cp = 15, .opc1 = 4, .crm = 2,
>.access = PL2_RW, .type = ARM_CP_64BIT | ARM_CP_ALIAS,
> @@ -6959,6 +6963,10 @@ void register_cp_regs_for_features(ARMCPU *cpu)
>.opc0 = 3, .opc1 = 4, .crn = 13, .crm = 0, .opc2 = 1,
>.access = PL2_RW,
>.fieldoffset = offsetof(CPUARMState, cp15.contextidr_el[2]) },
> +{ .name = "TTBR1_EL2", .state = ARM_CP_STATE_AA64,
> +  .opc0 = 3, .opc1 = 4, .crn = 2, .crm = 0, .opc2 = 1,
> +  .access = PL2_RW, .writefn = vmsa_tcr_ttbr_el2_write,
> +  .fieldoffset = offsetof(CPUARMState, cp15.ttbr1_el[2]) },
>  REGINFO_SENTINEL
>  };
>  define_arm_cp_regs(cpu, vhe_reginfo);
> --
> 2.17.1
>
>



[Bug 1851095] Re: [feature request] awareness of instructions that are well emulated

2019-11-04 Thread Laurent Desnogues
Can you please provide a binary (preferably statically built or with
required shared libraries attached)?

Thanks,

Laurent

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1851095

Title:
  [feature request] awareness of instructions that are well emulated

Status in QEMU:
  New

Bug description:
  While qemu's scalar emulation tends to be excellent, qemu's SIMD
  emulation tends to be incorrect (except for arm64 from x86_64)--i have
  found this both for mipsel and arm32. Until these code paths are
  audited, which is probably a large job, it would be nice if qemu knew
  its emulation of this class of instructions was not very good, and
  thus it would give up on finding these instructions if a "careful"
  operation is passed.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1851095/+subscriptions



Re: [Qemu-devel] [PATCH] target/arm: Fix sign-extension for SMLAL*

2019-09-13 Thread Laurent Desnogues
Hello,

On Thu, Sep 12, 2019 at 8:31 PM Richard Henderson
 wrote:
>
> The 32-bit product should be sign-extended, not zero-extended.
>
> Fixes: ea96b374641b
> Reported-by: Laurent Desnogues 
> Signed-off-by: Richard Henderson 

No more failures on my tests.

Tested-by: Laurent Desnogues 

Thanks,

Laurent

> ---
>  target/arm/translate.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index 34bb280e3d..fd2f0e3048 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -8045,7 +8045,9 @@ static bool op_smlaxxx(DisasContext *s, arg_ *a,
>  case 2:
>  tl = load_reg(s, a->ra);
>  th = load_reg(s, a->rd);
> -t1 = tcg_const_i32(0);
> +/* Sign-extend the 32-bit product to 64 bits.  */
> +t1 = tcg_temp_new_i32();
> +tcg_gen_sari_i32(t1, t0, 31);
>  tcg_gen_add2_i32(tl, th, tl, th, t0, t1);
>  tcg_temp_free_i32(t0);
>  tcg_temp_free_i32(t1);
> --
> 2.17.1
>



Re: [Qemu-devel] [PATCH v4 10/69] target/arm: Simplify op_smlaxxx for SMLAL*

2019-09-12 Thread Laurent Desnogues
Hi Richard,

On Wed, Sep 4, 2019 at 9:39 PM Richard Henderson
 wrote:
>
> Since all of the inputs and outputs are i32, dispense with
> the intermediate promotion to i64 and use tcg_gen_add2_i32.
>
> Reviewed-by: Peter Maydell 
> Signed-off-by: Richard Henderson 
> ---
>  target/arm/translate.c | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index 5306e93470..37aa873e25 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -8177,8 +8177,7 @@ DO_QADDSUB(QDSUB, false, true)
>  static bool op_smlaxxx(DisasContext *s, arg_ *a,
> int add_long, bool nt, bool mt)
>  {
> -TCGv_i32 t0, t1;
> -TCGv_i64 t64;
> +TCGv_i32 t0, t1, tl, th;
>
>  if (s->thumb
>  ? !arm_dc_feature(s, ARM_FEATURE_THUMB_DSP)
> @@ -8202,12 +8201,14 @@ static bool op_smlaxxx(DisasContext *s, arg_ *a,
>  store_reg(s, a->rd, t0);
>  break;
>  case 2:
> -t64 = tcg_temp_new_i64();
> -tcg_gen_ext_i32_i64(t64, t0);
> +tl = load_reg(s, a->ra);
> +th = load_reg(s, a->rd);
> +t1 = tcg_const_i32(0);

The product is sign-extended before the addition so t1 is not always 0.

Thanks,

Laurent

> +tcg_gen_add2_i32(tl, th, tl, th, t0, t1);
>  tcg_temp_free_i32(t0);
> -gen_addq(s, t64, a->ra, a->rd);
> -gen_storeq_reg(s, a->ra, a->rd, t64);
> -tcg_temp_free_i64(t64);
> +tcg_temp_free_i32(t1);
> +store_reg(s, a->ra, tl);
> +store_reg(s, a->rd, th);
>  break;
>  default:
>  g_assert_not_reached();
> --
> 2.17.1
>
>



Re: [Qemu-devel] [Qemu-arm] [PATCH v5 17/17] target/arm: Rely on hflags correct in cpu_get_tb_cpu_state

2019-09-05 Thread Laurent Desnogues
On Thu, Sep 5, 2019 at 5:24 PM Alex Bennée  wrote:
>
>
> Richard Henderson  writes:
>
> > This is the payoff.
> >
> > From perf record -g data of ubuntu 18 boot and shutdown:
> >
> > BEFORE:
> >
> > -   23.02% 2.82%  qemu-system-aar  [.] helper_lookup_tb_ptr
> >- 20.22% helper_lookup_tb_ptr
> >   + 10.05% tb_htable_lookup
> >   - 9.13% cpu_get_tb_cpu_state
> >3.20% aa64_va_parameters_both
> >0.55% fp_exception_el
> >
> > -   11.66% 4.74%  qemu-system-aar  [.] cpu_get_tb_cpu_state
> >- 6.96% cpu_get_tb_cpu_state
> > 3.63% aa64_va_parameters_both
> > 0.60% fp_exception_el
> > 0.53% sve_exception_el
> >
> > AFTER:
> >
> > -   16.40% 3.40%  qemu-system-aar  [.] helper_lookup_tb_ptr
> >- 13.03% helper_lookup_tb_ptr
> >   + 11.19% tb_htable_lookup
> > 0.55% cpu_get_tb_cpu_state
> >
> >  0.98% 0.71%  qemu-system-aar  [.] cpu_get_tb_cpu_state
> >
> >  0.87% 0.24%  qemu-system-aar  [.] rebuild_hflags_a64
> >
> > Before, helper_lookup_tb_ptr is the second hottest function in the
> > application, consuming almost a quarter of the runtime.  Within the
> > entire execution, cpu_get_tb_cpu_state consumes about 12%.
> >
> > After, helper_lookup_tb_ptr has dropped to the fourth hottest function,
> > with consumption dropping to a sixth of the runtime.  Within the
> > entire execution, cpu_get_tb_cpu_state has dropped below 1%, and the
> > supporting function to rebuild hflags also consumes about 1%.
> >
> > Assertions are retained for --enable-debug-tcg.
> >
> > Tested-by: Alex Bennée 
>
> Hmm something must have been missed for M-profile because:
>
>   make run-tcg-tests-arm-softmmu V=1
>
> Leads to:
>
>   timeout 15  
> /home/alex/lsrc/qemu.git/builds/all.debug/arm-softmmu/qemu-system-arm 
> -monitor none -display none -chardev 
> file,path=test-armv6m-undef.out,id=output -semihosting -M microbit -kernel 
> test-armv6m-undef
>   qemu: fatal: Lockup: can't escalate 3 to HardFault (current priority -1)
>
>   R00= R01= R02= R03=
>   R04= R05= R06= R07=
>   R08= R09= R10= R11=
>   R12= R13=20003fe0 R14=fff9 R15=00c0
>   XPSR=4103 -Z-- T handler
>   FPSCR: 
>   timeout: the monitored command dumped core
>
> But annoyingly not shown up by the debug-tcg verification. The commit
> before works fine.

There's a typo in the patch:  that should not be CONFIG_TCG_DEBUG but
CONFIG_DEBUG_TCG.  With this you should see the assert fire.

I let Richard know that there's an issue with the handling of CPSR E
flag (BE_DATA in hflags).  I don't know if that applies to your test.

Thanks,

Laurent

> > Reviewed-by: Alex Bennée 
> > Signed-off-by: Richard Henderson 
> > ---
> > v2: Retain asserts for future debugging.
> > ---
> >  target/arm/helper.c | 9 ++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/target/arm/helper.c b/target/arm/helper.c
> > index d1bf71a260..5e4f996882 100644
> > --- a/target/arm/helper.c
> > +++ b/target/arm/helper.c
> > @@ -11211,12 +11211,15 @@ void HELPER(rebuild_hflags_a64)(CPUARMState *env, 
> > int el)
> >  void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
> >target_ulong *cs_base, uint32_t *pflags)
> >  {
> > -uint32_t flags, pstate_for_ss;
> > +uint32_t flags = env->hflags;
> > +uint32_t pstate_for_ss;
> >
> >  *cs_base = 0;
> > -flags = rebuild_hflags_internal(env);
> > +#ifdef CONFIG_TCG_DEBUG
> > +assert(flags == rebuild_hflags_internal(env));
> > +#endif
> >
> > -if (is_a64(env)) {
> > +if (FIELD_EX32(flags, TBFLAG_ANY, AARCH64_STATE)) {
> >  *pc = env->pc;
> >  if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
> >  flags = FIELD_DP32(flags, TBFLAG_A64, BTYPE, env->btype);
>
>
> --
> Alex Bennée
>



Re: [Qemu-devel] [PATCH] target/arm: Fix SMMLS argument order

2019-08-29 Thread Laurent Desnogues
Hi,

On Thu, Aug 29, 2019 at 3:33 AM Richard Henderson
 wrote:
>
> The previous simplification got the order of operands to the
> subtraction wrong.  Since the 64-bit product is the subtrahend,
> we must use a 64-bit subtract to properly compute the borrow
> from the low-part of the product.
>
> Fixes: 5f8cd06ebcf5 ("target/arm: Simplify SMMLA, SMMLAR, SMMLS, SMMLSR")
> Reported-by: Laurent Desnogues 
> Signed-off-by: Richard Henderson 

Tested-by: Laurent Desnogues 

Thanks,

Laurent

> ---
>  target/arm/translate.c | 20 ++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index cbe19b7a62..a0f7577f47 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -8824,7 +8824,16 @@ static void disas_arm_insn(DisasContext *s, unsigned 
> int insn)
>  if (rd != 15) {
>  tmp3 = load_reg(s, rd);
>  if (insn & (1 << 6)) {
> -tcg_gen_sub_i32(tmp, tmp, tmp3);
> +/*
> + * For SMMLS, we need a 64-bit subtract.
> + * Borrow caused by a non-zero multiplicand
> + * lowpart, and the correct result lowpart
> + * for rounding.
> + */
> +TCGv_i32 zero = tcg_const_i32(0);
> +tcg_gen_sub2_i32(tmp2, tmp, zero, tmp3,
> + tmp2, tmp);
> +tcg_temp_free_i32(zero);
>  } else {
>  tcg_gen_add_i32(tmp, tmp, tmp3);
>  }
> @@ -10068,7 +10077,14 @@ static void disas_thumb2_insn(DisasContext *s, 
> uint32_t insn)
>  if (insn & (1 << 20)) {
>  tcg_gen_add_i32(tmp, tmp, tmp3);
>  } else {
> -tcg_gen_sub_i32(tmp, tmp, tmp3);
> +/*
> + * For SMMLS, we need a 64-bit subtract.
> + * Borrow caused by a non-zero multiplicand lowpart,
> + * and the correct result lowpart for rounding.
> + */
> +TCGv_i32 zero = tcg_const_i32(0);
> +tcg_gen_sub2_i32(tmp2, tmp, zero, tmp3, tmp2, tmp);
> +tcg_temp_free_i32(zero);
>  }
>  tcg_temp_free_i32(tmp3);
>  }
> --
> 2.17.1
>



Re: [Qemu-devel] [Qemu-arm] [PATCH 6/7] target/arm: Simplify SMMLA, SMMLAR, SMMLS, SMMLSR

2019-08-28 Thread Laurent Desnogues
Hi Richard,

On Thu, Aug 8, 2019 at 10:28 PM Richard Henderson
 wrote:
>
> All of the inputs to these instructions are 32-bits.  Rather than
> extend each input to 64-bits and then extract the high 32-bits of
> the output, use tcg_gen_muls2_i32 and other 32-bit generator functions.
>
> Signed-off-by: Richard Henderson 
> ---
>  target/arm/translate.c | 72 +++---
>  1 file changed, 26 insertions(+), 46 deletions(-)
>
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index ddc54e77e4..77154be743 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -391,34 +391,6 @@ static void gen_revsh(TCGv_i32 var)
>  tcg_gen_ext16s_i32(var, var);
>  }
>
> -/* Return (b << 32) + a. Mark inputs as dead */
> -static TCGv_i64 gen_addq_msw(TCGv_i64 a, TCGv_i32 b)
> -{
> -TCGv_i64 tmp64 = tcg_temp_new_i64();
> -
> -tcg_gen_extu_i32_i64(tmp64, b);
> -tcg_temp_free_i32(b);
> -tcg_gen_shli_i64(tmp64, tmp64, 32);
> -tcg_gen_add_i64(a, tmp64, a);
> -
> -tcg_temp_free_i64(tmp64);
> -return a;
> -}
> -
> -/* Return (b << 32) - a. Mark inputs as dead. */
> -static TCGv_i64 gen_subq_msw(TCGv_i64 a, TCGv_i32 b)
> -{
> -TCGv_i64 tmp64 = tcg_temp_new_i64();
> -
> -tcg_gen_extu_i32_i64(tmp64, b);
> -tcg_temp_free_i32(b);
> -tcg_gen_shli_i64(tmp64, tmp64, 32);
> -tcg_gen_sub_i64(a, tmp64, a);
> -
> -tcg_temp_free_i64(tmp64);
> -return a;
> -}
> -
>  /* 32x32->64 multiply.  Marks inputs as dead.  */
>  static TCGv_i64 gen_mulu_i64_i32(TCGv_i32 a, TCGv_i32 b)
>  {
> @@ -8872,23 +8844,27 @@ static void disas_arm_insn(DisasContext *s, unsigned 
> int insn)
> (SMMUL, SMMLA, SMMLS) */
>  tmp = load_reg(s, rm);
>  tmp2 = load_reg(s, rs);
> -tmp64 = gen_muls_i64_i32(tmp, tmp2);
> +tcg_gen_muls2_i32(tmp2, tmp, tmp, tmp2);
>
>  if (rd != 15) {
> -tmp = load_reg(s, rd);
> +tmp3 = load_reg(s, rd);
>  if (insn & (1 << 6)) {
> -tmp64 = gen_subq_msw(tmp64, tmp);
> +tcg_gen_sub_i32(tmp, tmp, tmp3);

Shouldn't you subtract tmp from tmp3?

>  } else {
> -tmp64 = gen_addq_msw(tmp64, tmp);
> +tcg_gen_add_i32(tmp, tmp, tmp3);
>  }
> +tcg_temp_free_i32(tmp3);
>  }
>  if (insn & (1 << 5)) {
> -tcg_gen_addi_i64(tmp64, tmp64, 0x8000u);
> +/*
> + * Adding 0x8000 to the 64-bit quantity
> + * means that we have carry in to the high
> + * word when the low word has the high bit set.
> + */
> +tcg_gen_shri_i32(tmp2, tmp2, 31);
> +tcg_gen_add_i32(tmp, tmp, tmp2);
>  }
> -tcg_gen_shri_i64(tmp64, tmp64, 32);
> -tmp = tcg_temp_new_i32();
> -tcg_gen_extrl_i64_i32(tmp, tmp64);
> -tcg_temp_free_i64(tmp64);
> +tcg_temp_free_i32(tmp2);
>  store_reg(s, rn, tmp);
>  break;
>  case 0:
> @@ -10114,22 +10090,26 @@ static void disas_thumb2_insn(DisasContext *s, 
> uint32_t insn)
>}
>  break;
>  case 5: case 6: /* 32 * 32 -> 32msb (SMMUL, SMMLA, SMMLS) */
> -tmp64 = gen_muls_i64_i32(tmp, tmp2);
> +tcg_gen_muls2_i32(tmp2, tmp, tmp, tmp2);
>  if (rs != 15) {
> -tmp = load_reg(s, rs);
> +tmp3 = load_reg(s, rs);
>  if (insn & (1 << 20)) {
> -tmp64 = gen_addq_msw(tmp64, tmp);
> +tcg_gen_add_i32(tmp, tmp, tmp3);
>  } else {
> -tmp64 = gen_subq_msw(tmp64, tmp);
> +tcg_gen_sub_i32(tmp, tmp, tmp3);

Same here.

Also the way you do the computation means you don't propagate the
borrow from the lower 32-bit of the 64-bit product when doing the
subtraction.

Thanks,

Laurent

>  }
> +tcg_temp_free_i32(tmp3);
>  }
>  if (insn & (1 << 4)) {
> -tcg_gen_addi_i64(tmp64, tmp64, 0x8000u);
> +/*
> + * Adding 0x8000 to the 64-bit quantity
> + * means that we have carry in to the high
> + * word when the low word has 

Re: [Qemu-devel] [PATCH 2/2] target/arm: Factor out unallocated_encoding for aarch32

2019-08-27 Thread Laurent Desnogues
On Mon, Aug 26, 2019 at 5:15 PM Richard Henderson
 wrote:
>
> Make this a static function private to translate.c.
> Thus we can use the same idiom between aarch64 and aarch32
> without actually sharing function implementations.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Laurent Desnogues 

Thanks,

Laurent

> ---
>  target/arm/translate-vfp.inc.c |  3 +--
>  target/arm/translate.c | 22 --
>  2 files changed, 13 insertions(+), 12 deletions(-)
>
> diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
> index 5065d4524c..3e8ea80493 100644
> --- a/target/arm/translate-vfp.inc.c
> +++ b/target/arm/translate-vfp.inc.c
> @@ -108,8 +108,7 @@ static bool full_vfp_access_check(DisasContext *s, bool 
> ignore_vfp_enabled)
>
>  if (!s->vfp_enabled && !ignore_vfp_enabled) {
>  assert(!arm_dc_feature(s, ARM_FEATURE_M));
> -gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syn_uncategorized(),
> -   default_exception_el(s));
> +unallocated_encoding(s);
>  return false;
>  }
>
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index 2aac9aae68..66311580c0 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -1231,6 +1231,13 @@ static void gen_exception_bkpt_insn(DisasContext *s, 
> uint32_t syn)
>  s->base.is_jmp = DISAS_NORETURN;
>  }
>
> +static void unallocated_encoding(DisasContext *s)
> +{
> +/* Unallocated and reserved encodings are uncategorized */
> +gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syn_uncategorized(),
> +   default_exception_el(s));
> +}
> +
>  /* Force a TB lookup after an instruction that changes the CPU state.  */
>  static inline void gen_lookup_tb(DisasContext *s)
>  {
> @@ -1261,8 +1268,7 @@ static inline void gen_hlt(DisasContext *s, int imm)
>  return;
>  }
>
> -gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syn_uncategorized(),
> -   default_exception_el(s));
> +unallocated_encoding(s);
>  }
>
>  static inline void gen_add_data_offset(DisasContext *s, unsigned int insn,
> @@ -7574,8 +7580,7 @@ static void gen_srs(DisasContext *s,
>  }
>
>  if (undef) {
> -gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syn_uncategorized(),
> -   default_exception_el(s));
> +unallocated_encoding(s);
>  return;
>  }
>
> @@ -9196,8 +9201,7 @@ static void disas_arm_insn(DisasContext *s, unsigned 
> int insn)
>  break;
>  default:
>  illegal_op:
> -gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syn_uncategorized(),
> -   default_exception_el(s));
> +unallocated_encoding(s);
>  break;
>  }
>  }
> @@ -10882,8 +10886,7 @@ static void disas_thumb2_insn(DisasContext *s, 
> uint32_t insn)
>  }
>  return;
>  illegal_op:
> -gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syn_uncategorized(),
> -   default_exception_el(s));
> +unallocated_encoding(s);
>  }
>
>  static void disas_thumb_insn(DisasContext *s, uint32_t insn)
> @@ -11706,8 +11709,7 @@ static void disas_thumb_insn(DisasContext *s, 
> uint32_t insn)
>  return;
>  illegal_op:
>  undef:
> -gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syn_uncategorized(),
> -   default_exception_el(s));
> +unallocated_encoding(s);
>  }
>
>  static bool insn_crosses_page(CPUARMState *env, DisasContext *s)
> --
> 2.17.1
>



Re: [Qemu-devel] [PATCH 1/2] Revert "target/arm: Use unallocated_encoding for aarch32"

2019-08-27 Thread Laurent Desnogues
Hi,

On Mon, Aug 26, 2019 at 5:15 PM Richard Henderson
 wrote:
>
> This reverts commit 3cb36637157088892e9e33ddb1034bffd1251d3b.
>
> Despite the fact that the text for the call to gen_exception_insn
> is identical for aarch64 and aarch32, the implementation inside
> gen_exception_insn is totally different.
>
> This fixes exceptions raised from aarch64.
>
> Reported-by: Laurent Desnogues 
> Signed-off-by: Richard Henderson 

Reviewed-by: Laurent Desnogues 

Thanks,

Laurent

> ---
>  target/arm/translate-a64.h |  2 ++
>  target/arm/translate.h |  2 --
>  target/arm/translate-a64.c |  7 +++
>  target/arm/translate-vfp.inc.c |  3 ++-
>  target/arm/translate.c | 22 ++
>  5 files changed, 21 insertions(+), 15 deletions(-)
>
> diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
> index 12ad8ac6ed..9cd2b3d238 100644
> --- a/target/arm/translate-a64.h
> +++ b/target/arm/translate-a64.h
> @@ -18,6 +18,8 @@
>  #ifndef TARGET_ARM_TRANSLATE_A64_H
>  #define TARGET_ARM_TRANSLATE_A64_H
>
> +void unallocated_encoding(DisasContext *s);
> +
>  #define unsupported_encoding(s, insn)\
>  do { \
>  qemu_log_mask(LOG_UNIMP, \
> diff --git a/target/arm/translate.h b/target/arm/translate.h
> index 92ef790be9..64304c957e 100644
> --- a/target/arm/translate.h
> +++ b/target/arm/translate.h
> @@ -99,8 +99,6 @@ typedef struct DisasCompare {
>  bool value_global;
>  } DisasCompare;
>
> -void unallocated_encoding(DisasContext *s);
> -
>  /* Share the TCG temporaries common between 32 and 64 bit modes.  */
>  extern TCGv_i32 cpu_NF, cpu_ZF, cpu_CF, cpu_VF;
>  extern TCGv_i64 cpu_exclusive_addr;
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 6fd0b779d3..9183f89ba3 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -338,6 +338,13 @@ static inline void gen_goto_tb(DisasContext *s, int n, 
> uint64_t dest)
>  }
>  }
>
> +void unallocated_encoding(DisasContext *s)
> +{
> +/* Unallocated and reserved encodings are uncategorized */
> +gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syn_uncategorized(),
> +   default_exception_el(s));
> +}
> +
>  static void init_tmp_a64_array(DisasContext *s)
>  {
>  #ifdef CONFIG_DEBUG_TCG
> diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
> index 3e8ea80493..5065d4524c 100644
> --- a/target/arm/translate-vfp.inc.c
> +++ b/target/arm/translate-vfp.inc.c
> @@ -108,7 +108,8 @@ static bool full_vfp_access_check(DisasContext *s, bool 
> ignore_vfp_enabled)
>
>  if (!s->vfp_enabled && !ignore_vfp_enabled) {
>  assert(!arm_dc_feature(s, ARM_FEATURE_M));
> -unallocated_encoding(s);
> +gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syn_uncategorized(),
> +   default_exception_el(s));
>  return false;
>  }
>
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index cbe19b7a62..2aac9aae68 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -1231,13 +1231,6 @@ static void gen_exception_bkpt_insn(DisasContext *s, 
> uint32_t syn)
>  s->base.is_jmp = DISAS_NORETURN;
>  }
>
> -void unallocated_encoding(DisasContext *s)
> -{
> -/* Unallocated and reserved encodings are uncategorized */
> -gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syn_uncategorized(),
> -   default_exception_el(s));
> -}
> -
>  /* Force a TB lookup after an instruction that changes the CPU state.  */
>  static inline void gen_lookup_tb(DisasContext *s)
>  {
> @@ -1268,7 +1261,8 @@ static inline void gen_hlt(DisasContext *s, int imm)
>  return;
>  }
>
> -unallocated_encoding(s);
> +gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syn_uncategorized(),
> +   default_exception_el(s));
>  }
>
>  static inline void gen_add_data_offset(DisasContext *s, unsigned int insn,
> @@ -7580,7 +7574,8 @@ static void gen_srs(DisasContext *s,
>  }
>
>  if (undef) {
> -unallocated_encoding(s);
> +gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syn_uncategorized(),
> +   default_exception_el(s));
>  return;
>  }
>
> @@ -9201,7 +9196,8 @@ static void disas_arm_insn(DisasContext *s, unsigned 
> int insn)
>  break;
>  default:
>  illegal_op:
> -unallocated_encoding(s);
> + 

Re: [Qemu-devel] [Qemu-arm] [PATCH 10/11] target/arm: Use unallocated_encoding for aarch32

2019-08-26 Thread Laurent Desnogues
Hi Richard,

On Wed, Aug 7, 2019 at 6:54 AM Richard Henderson
 wrote:
>
> Promote this function from aarch64 to fully general use.
> Use it to unify the code sequences for generating illegal
> opcode exceptions.
>
> Signed-off-by: Richard Henderson 
> ---
>  target/arm/translate-a64.h |  2 --
>  target/arm/translate.h |  2 ++
>  target/arm/translate-a64.c |  7 ---
>  target/arm/translate-vfp.inc.c |  3 +--
>  target/arm/translate.c | 22 --
>  5 files changed, 15 insertions(+), 21 deletions(-)
>
> diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
> index 9cd2b3d238..12ad8ac6ed 100644
> --- a/target/arm/translate-a64.h
> +++ b/target/arm/translate-a64.h
> @@ -18,8 +18,6 @@
>  #ifndef TARGET_ARM_TRANSLATE_A64_H
>  #define TARGET_ARM_TRANSLATE_A64_H
>
> -void unallocated_encoding(DisasContext *s);
> -
>  #define unsupported_encoding(s, insn)\
>  do { \
>  qemu_log_mask(LOG_UNIMP, \
> diff --git a/target/arm/translate.h b/target/arm/translate.h
> index de600073d8..6a65df0b27 100644
> --- a/target/arm/translate.h
> +++ b/target/arm/translate.h
> @@ -98,6 +98,8 @@ typedef struct DisasCompare {
>  bool value_global;
>  } DisasCompare;
>
> +void unallocated_encoding(DisasContext *s);
> +
>  /* Share the TCG temporaries common between 32 and 64 bit modes.  */
>  extern TCGv_i32 cpu_NF, cpu_ZF, cpu_CF, cpu_VF;
>  extern TCGv_i64 cpu_exclusive_addr;
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index d68bfc66d3..9e1ffe9cfb 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -352,13 +352,6 @@ static inline void gen_goto_tb(DisasContext *s, int n, 
> uint64_t dest)
>  }
>  }
>
> -void unallocated_encoding(DisasContext *s)
> -{
> -/* Unallocated and reserved encodings are uncategorized */
> -gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syn_uncategorized(),
> -   default_exception_el(s));
> -}
> -
>  static void init_tmp_a64_array(DisasContext *s)
>  {
>  #ifdef CONFIG_DEBUG_TCG
> diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
> index 5065d4524c..3e8ea80493 100644
> --- a/target/arm/translate-vfp.inc.c
> +++ b/target/arm/translate-vfp.inc.c
> @@ -108,8 +108,7 @@ static bool full_vfp_access_check(DisasContext *s, bool 
> ignore_vfp_enabled)
>
>  if (!s->vfp_enabled && !ignore_vfp_enabled) {
>  assert(!arm_dc_feature(s, ARM_FEATURE_M));
> -gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syn_uncategorized(),
> -   default_exception_el(s));
> +unallocated_encoding(s);
>  return false;
>  }
>
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index d6b0ab7247..2d447d4b90 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -1285,6 +1285,13 @@ static void gen_exception_bkpt_insn(DisasContext *s, 
> uint32_t syn)
>  s->base.is_jmp = DISAS_NORETURN;
>  }
>
> +void unallocated_encoding(DisasContext *s)
> +{
> +/* Unallocated and reserved encodings are uncategorized */
> +gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syn_uncategorized(),
> +   default_exception_el(s));
> +}

Isn't the move of unallocated_encoding from translate-64.c to
translate.c causing issues since gen_exception_insn isn't the same?
In particular in one case the pc is 64-bit while now it's always
32-bit.  Did I miss something?

Thanks,

Laurent

> +
>  /* Force a TB lookup after an instruction that changes the CPU state.  */
>  static inline void gen_lookup_tb(DisasContext *s)
>  {
> @@ -1315,8 +1322,7 @@ static inline void gen_hlt(DisasContext *s, int imm)
>  return;
>  }
>
> -gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syn_uncategorized(),
> -   default_exception_el(s));
> +unallocated_encoding(s);
>  }
>
>  static inline void gen_add_data_offset(DisasContext *s, unsigned int insn,
> @@ -7638,8 +7644,7 @@ static void gen_srs(DisasContext *s,
>  }
>
>  if (undef) {
> -gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syn_uncategorized(),
> -   default_exception_el(s));
> +unallocated_encoding(s);
>  return;
>  }
>
> @@ -9266,8 +9271,7 @@ static void disas_arm_insn(DisasContext *s, unsigned 
> int insn)
>  break;
>  default:
>  illegal_op:
> -gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syn_uncategorized(),
> -   default_exception_el(s));
> +unallocated_encoding(s);
>  break;
>  }
>  }
> @@ -10955,8 +10959,7 @@ static void disas_thumb2_insn(DisasContext *s, 
> uint32_t insn)
>  }
>  return;
>  illegal_op:
> -gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syn_uncategorized(),
> -  

Re: [Qemu-devel] [PATCH for 4.2] target/arm: generate a custom MIDR for -cpu max

2019-07-26 Thread Laurent Desnogues
On Fri, Jul 26, 2019 at 9:24 AM Alex Bennée  wrote:
>
>
> Peter Maydell  writes:
>
> > On Tue, 23 Jul 2019 at 12:33, Alex Bennée  wrote:
[...]
> >  /*
> >   * Reset MIDR so the guest doesn't mistake our 'max' CPU type for a real
> >   * one and try to apply errata workarounds or use impdef features we
> >   * don't provide.
> >   * An IMPLEMENTER field of 0 means "reserved for software use";
> >   * ARCHITECTURE must be 0xf indicating "v7 or later, check ID registers
> >   * to see which features are present";
> >   * the VARIANT, PARTNUM and REVISION fields are all implementation
> >   * defined and we choose to leave them all at zero.
> >   */
> >
> > It's also a bit inconsistent to do an explicit deposit of 0
> > for the IMPLEMENTER field but not for the VARIANT/PARTNUM/REVISION.
> >
> > I wonder if we should put 0x51 (ascii 'Q') in the PARTNUM field;
> > then if somebody really needs to distinguish QEMU from random
> > other software-models they have a way to do it.
>
> Q is reserved for Qualcomm - It would be nice if ARM could assign QEMU a
> code but I suspect that's not part of the business model.

That was my reaction at first too, but that Q is reserved for the
Implementer field, while Peter is proposing to put it in the PartNum
field :-)

Laurent



Re: [Qemu-devel] [PULL v6 10/42] vl.c: Replace smp global variables with smp machine properties

2019-07-08 Thread Laurent Desnogues
Hi,

On Sat, Jul 6, 2019 at 12:40 AM Eduardo Habkost  wrote:
>
> From: Like Xu 
>
> The global smp variables in vl.c are completely replaced with machine 
> properties.
>
> Form this commit, the smp_cpus/smp_cores/smp_threads/max_cpus are deprecated
> and only machine properties within MachineState are fully applied and enabled.

Wouldn't it make sense to now remove the declarations in
include/sysemu/cpus.h and include/sysemu/sysemu.h?

Thanks,

Laurent

> Signed-off-by: Like Xu 
> Reviewed-by: Alistair Francis 
> Message-Id: <20190518205428.90532-11-like...@linux.intel.com>
> Reviewed-by: Eduardo Habkost 
> Signed-off-by: Eduardo Habkost 
> ---
>  vl.c | 53 ++---
>  1 file changed, 26 insertions(+), 27 deletions(-)
>
> diff --git a/vl.c b/vl.c
> index d657faec03..56aa221385 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -163,10 +163,6 @@ static Chardev **serial_hds;
>  Chardev *parallel_hds[MAX_PARALLEL_PORTS];
>  int win2k_install_hack = 0;
>  int singlestep = 0;
> -int smp_cpus;
> -unsigned int max_cpus;
> -int smp_cores = 1;
> -int smp_threads = 1;
>  int acpi_enabled = 1;
>  int no_hpet = 0;
>  int fd_bootchk = 1;
> @@ -1265,8 +1261,9 @@ static void smp_parse(QemuOpts *opts)
>  sockets = sockets > 0 ? sockets : 1;
>  cpus = cores * threads * sockets;
>  } else {
> -max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus);
> -sockets = max_cpus / (cores * threads);
> +current_machine->smp.max_cpus =
> +qemu_opt_get_number(opts, "maxcpus", cpus);
> +sockets = current_machine->smp.max_cpus / (cores * threads);
>  }
>  } else if (cores == 0) {
>  threads = threads > 0 ? threads : 1;
> @@ -1283,34 +1280,37 @@ static void smp_parse(QemuOpts *opts)
>  exit(1);
>  }
>
> -max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus);
> +current_machine->smp.max_cpus =
> +qemu_opt_get_number(opts, "maxcpus", cpus);
>
> -if (max_cpus < cpus) {
> +if (current_machine->smp.max_cpus < cpus) {
>  error_report("maxcpus must be equal to or greater than smp");
>  exit(1);
>  }
>
> -if (sockets * cores * threads > max_cpus) {
> +if (sockets * cores * threads > current_machine->smp.max_cpus) {
>  error_report("cpu topology: "
>   "sockets (%u) * cores (%u) * threads (%u) > "
>   "maxcpus (%u)",
> - sockets, cores, threads, max_cpus);
> + sockets, cores, threads,
> + current_machine->smp.max_cpus);
>  exit(1);
>  }
>
> -if (sockets * cores * threads != max_cpus) {
> +if (sockets * cores * threads != current_machine->smp.max_cpus) {
>  warn_report("Invalid CPU topology deprecated: "
>  "sockets (%u) * cores (%u) * threads (%u) "
>  "!= maxcpus (%u)",
> -sockets, cores, threads, max_cpus);
> +sockets, cores, threads,
> +current_machine->smp.max_cpus);
>  }
>
> -smp_cpus = cpus;
> -smp_cores = cores;
> -smp_threads = threads;
> +current_machine->smp.cpus = cpus;
> +current_machine->smp.cores = cores;
> +current_machine->smp.threads = threads;
>  }
>
> -if (smp_cpus > 1) {
> +if (current_machine->smp.cpus > 1) {
>  Error *blocker = NULL;
>  error_setg(, QERR_REPLAY_NOT_SUPPORTED, "smp");
>  replay_add_blocker(blocker);
> @@ -4009,26 +4009,25 @@ int main(int argc, char **argv, char **envp)
>  machine_class->default_cpus = machine_class->default_cpus ?: 1;
>
>  /* default to machine_class->default_cpus */
> -smp_cpus = machine_class->default_cpus;
> -max_cpus = machine_class->default_cpus;
> +current_machine->smp.cpus = machine_class->default_cpus;
> +current_machine->smp.max_cpus = machine_class->default_cpus;
> +current_machine->smp.cores = 1;
> +current_machine->smp.threads = 1;
>
>  smp_parse(qemu_opts_find(qemu_find_opts("smp-opts"), NULL));
>
> -current_machine->smp.cpus = smp_cpus;
> -current_machine->smp.max_cpus = max_cpus;
> -current_machine->smp.cores = smp_cores;
> -current_machine->smp.threads = smp_threads;
> -
>  /* sanity-check smp_cpus and max_cpus against machine_class */
> -if (smp_cpus < machine_class->min_cpus) {
> +if (current_machine->smp.cpus < machine_class->min_cpus) {
>  error_report("Invalid SMP CPUs %d. The min CPUs "
> - "supported by machine '%s' is %d", smp_cpus,
> + "supported by machine '%s' is %d",
> + current_machine->smp.cpus,
>   machine_class->name, 

Re: [Qemu-devel] [PATCH] target/arm/vfp_helper: Call set_fpscr_to_host before updating to FPSCR

2019-07-05 Thread Laurent Desnogues
On Fri, Jul 5, 2019 at 2:43 PM Philippe Mathieu-Daudé  wrote:
>
> In commit e9d652824b0 we extracted the vfp_set_fpscr_to_host()
> function but failed at calling it in the correct place, we call
> it after xregs[ARM_VFP_FPSCR] is modified.
>
> Fix by calling this function before we update FPSCR.
>
> Reported-by: Laurent Desnogues 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Laurent Desnogues 
Tested-by: Laurent Desnogues 

Thanks,

Laurent

> ---
>  target/arm/vfp_helper.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
> index 46041e3294..9710ef1c3e 100644
> --- a/target/arm/vfp_helper.c
> +++ b/target/arm/vfp_helper.c
> @@ -197,6 +197,8 @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
>  val &= 0xf7c0009f;
>  }
>
> +vfp_set_fpscr_to_host(env, val);
> +
>  /*
>   * We don't implement trapped exception handling, so the
>   * trap enable bits, IDE|IXE|UFE|OFE|DZE|IOE are all RAZ/WI (not RES0!)
> @@ -217,8 +219,6 @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
>  env->vfp.qc[1] = 0;
>  env->vfp.qc[2] = 0;
>  env->vfp.qc[3] = 0;
> -
> -vfp_set_fpscr_to_host(env, val);
>  }
>
>  void vfp_set_fpscr(CPUARMState *env, uint32_t val)
> --
> 2.20.1
>



Re: [Qemu-devel] [PULL 41/46] target/arm/vfp_helper: Extract vfp_set_fpscr_to_host()

2019-07-05 Thread Laurent Desnogues
Hello,

On Tue, Jul 2, 2019 at 4:18 AM Peter Maydell  wrote:
>
> From: Philippe Mathieu-Daudé 
>
> The vfp_set_fpscr() helper contains code specific to the host
> floating point implementation (here the SoftFloat library).
> Extract this code to vfp_set_fpscr_to_host().
>
> Signed-off-by: Philippe Mathieu-Daudé 
> Message-id: 20190701132516.26392-16-phi...@redhat.com
> Reviewed-by: Peter Maydell 
> Signed-off-by: Peter Maydell 
> ---
>  target/arm/vfp_helper.c | 127 +---
>  1 file changed, 66 insertions(+), 61 deletions(-)
>
> diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
> index d54e3253240..b19a395b67d 100644
> --- a/target/arm/vfp_helper.c
> +++ b/target/arm/vfp_helper.c
> @@ -81,71 +81,11 @@ static inline int vfp_exceptbits_to_host(int target_bits)
>  return host_bits;
>  }
>
> -uint32_t HELPER(vfp_get_fpscr)(CPUARMState *env)
> -{
> -uint32_t i, fpscr;
> -
> -fpscr = env->vfp.xregs[ARM_VFP_FPSCR]
> -| (env->vfp.vec_len << 16)
> -| (env->vfp.vec_stride << 20);
> -
> -i = get_float_exception_flags(>vfp.fp_status);
> -i |= get_float_exception_flags(>vfp.standard_fp_status);
> -/* FZ16 does not generate an input denormal exception.  */
> -i |= (get_float_exception_flags(>vfp.fp_status_f16)
> -  & ~float_flag_input_denormal);
> -fpscr |= vfp_exceptbits_from_host(i);
> -
> -i = env->vfp.qc[0] | env->vfp.qc[1] | env->vfp.qc[2] | env->vfp.qc[3];
> -fpscr |= i ? FPCR_QC : 0;
> -
> -return fpscr;
> -}
> -
> -uint32_t vfp_get_fpscr(CPUARMState *env)
> -{
> -return HELPER(vfp_get_fpscr)(env);
> -}
> -
> -void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
> +static void vfp_set_fpscr_to_host(CPUARMState *env, uint32_t val)
>  {
>  int i;
>  uint32_t changed = env->vfp.xregs[ARM_VFP_FPSCR];
>
> -/* When ARMv8.2-FP16 is not supported, FZ16 is RES0.  */
> -if (!cpu_isar_feature(aa64_fp16, env_archcpu(env))) {
> -val &= ~FPCR_FZ16;
> -}
> -
> -if (arm_feature(env, ARM_FEATURE_M)) {
> -/*
> - * M profile FPSCR is RES0 for the QC, STRIDE, FZ16, LEN bits
> - * and also for the trapped-exception-handling bits IxE.
> - */
> -val &= 0xf7c0009f;
> -}
> -
> -/*
> - * We don't implement trapped exception handling, so the
> - * trap enable bits, IDE|IXE|UFE|OFE|DZE|IOE are all RAZ/WI (not RES0!)
> - *
> - * If we exclude the exception flags, IOC|DZC|OFC|UFC|IXC|IDC
> - * (which are stored in fp_status), and the other RES0 bits
> - * in between, then we clear all of the low 16 bits.
> - */
> -env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xf7c8;
> -env->vfp.vec_len = (val >> 16) & 7;
> -env->vfp.vec_stride = (val >> 20) & 3;
> -
> -/*
> - * The bit we set within fpscr_q is arbitrary; the register as a
> - * whole being zero/non-zero is what counts.
> - */
> -env->vfp.qc[0] = val & FPCR_QC;
> -env->vfp.qc[1] = 0;
> -env->vfp.qc[2] = 0;
> -env->vfp.qc[3] = 0;
> -
>  changed ^= val;
>  if (changed & (3 << 22)) {
>  i = (val >> 22) & 3;
> @@ -193,6 +133,71 @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t 
> val)
>  set_float_exception_flags(0, >vfp.standard_fp_status);
>  }
>
> +uint32_t HELPER(vfp_get_fpscr)(CPUARMState *env)
> +{
> +uint32_t i, fpscr;
> +
> +fpscr = env->vfp.xregs[ARM_VFP_FPSCR]
> +| (env->vfp.vec_len << 16)
> +| (env->vfp.vec_stride << 20);
> +
> +i = get_float_exception_flags(>vfp.fp_status);
> +i |= get_float_exception_flags(>vfp.standard_fp_status);
> +/* FZ16 does not generate an input denormal exception.  */
> +i |= (get_float_exception_flags(>vfp.fp_status_f16)
> +  & ~float_flag_input_denormal);
> +fpscr |= vfp_exceptbits_from_host(i);
> +
> +i = env->vfp.qc[0] | env->vfp.qc[1] | env->vfp.qc[2] | env->vfp.qc[3];
> +fpscr |= i ? FPCR_QC : 0;
> +
> +return fpscr;
> +}
> +
> +uint32_t vfp_get_fpscr(CPUARMState *env)
> +{
> +return HELPER(vfp_get_fpscr)(env);
> +}
> +
> +void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
> +{
> +/* When ARMv8.2-FP16 is not supported, FZ16 is RES0.  */
> +if (!cpu_isar_feature(aa64_fp16, env_archcpu(env))) {
> +val &= ~FPCR_FZ16;
> +}
> +
> +if (arm_feature(env, ARM_FEATURE_M)) {
> +/*
> + * M profile FPSCR is RES0 for the QC, STRIDE, FZ16, LEN bits
> + * and also for the trapped-exception-handling bits IxE.
> + */
> +val &= 0xf7c0009f;
> +}
> +
> +/*
> + * We don't implement trapped exception handling, so the
> + * trap enable bits, IDE|IXE|UFE|OFE|DZE|IOE are all RAZ/WI (not RES0!)
> + *
> + * If we exclude the exception flags, IOC|DZC|OFC|UFC|IXC|IDC
> + * (which are stored in fp_status), and the other RES0 bits
> + * in between, then we clear all of the low 16 bits.
> + */
> +

Re: [Qemu-devel] [PATCH] linux-user: avoid treading on gprof's SIGPROF signals

2019-05-02 Thread Laurent Desnogues
On Thu, May 2, 2019 at 6:17 PM Laurent Vivier  wrote:
>
> On 02/05/2019 16:58, Alex Bennée wrote:
> > The guest tends to get confused when it receives signals it doesn't
> > know about. Given the gprof magic has also set up it's own handler we
> > would do well to avoid stomping on it as well.
> >
> > Signed-off-by: Alex Bennée 
> > ---
> >  linux-user/signal.c | 5 +
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/linux-user/signal.c b/linux-user/signal.c
> > index e2c0b37173..44b2d3b35a 100644
> > --- a/linux-user/signal.c
> > +++ b/linux-user/signal.c
> > @@ -508,6 +508,11 @@ void signal_init(void)
> >  act.sa_flags = SA_SIGINFO;
> >  act.sa_sigaction = host_signal_handler;
> >  for(i = 1; i <= TARGET_NSIG; i++) {
> > +#ifdef TARGET_GPROF
> > +if (i == SIGPROF) {
> > +continue;
> > +}
> > +#endif
> >  host_sig = target_to_host_signal(i);
> >  sigaction(host_sig, NULL, );
> >  if (oact.sa_sigaction == (void *)SIG_IGN) {
> >
>
> Perhaps merge this with the previous one and send a v2: it will ease
> bisecting.

I agree it would be better, though it should be noted that the signal
issue has existed for at least 8 years (that's when I had to add a
specific patch in my private repository).

Thanks,

Laurent



Re: [Qemu-devel] [PATCH] linux-user: fix GPROF build failure

2019-05-02 Thread Laurent Desnogues
Hello,

On Thu, May 2, 2019 at 11:31 AM Alex Bennée  wrote:
>
> When linux-user/exit was introduced we failed to move the gprof
> include at the same time. The CI didn't notice because it only builds
> system emulation. Fix it for those that still find gprof useful.
>
> Signed-off-by: Alex Bennée 

Tested-by: Laurent Desnogues 

Thanks,

Laurent

> ---
>  linux-user/exit.c| 3 +++
>  linux-user/syscall.c | 3 ---
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/linux-user/exit.c b/linux-user/exit.c
> index 14e94e28fa..bdda720553 100644
> --- a/linux-user/exit.c
> +++ b/linux-user/exit.c
> @@ -18,6 +18,9 @@
>   */
>  #include "qemu/osdep.h"
>  #include "qemu.h"
> +#ifdef TARGET_GPROF
> +#include 
> +#endif
>
>  #ifdef CONFIG_GCOV
>  extern void __gcov_dump(void);
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 96cd4bf86d..f2d9883aef 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -59,9 +59,6 @@
>  #ifdef CONFIG_TIMERFD
>  #include 
>  #endif
> -#ifdef TARGET_GPROF
> -#include 
> -#endif
>  #ifdef CONFIG_EVENTFD
>  #include 
>  #endif
> --
> 2.20.1
>
>



Re: [Qemu-devel] [PATCH v4 03/22] target/arm: Add MTE system registers

2019-03-08 Thread Laurent Desnogues
On Fri, Mar 8, 2019 at 11:31 AM Laurent Desnogues
 wrote:
>
> Hello,
>
> On Thu, Mar 7, 2019 at 6:09 PM Richard Henderson
>  wrote:
[...]
> > +static CPAccessResult access_mte(CPUARMState *env, const ARMCPRegInfo *ri,
> > + bool isread)
> > +{
> > +int el = arm_current_el(env);
> > +
> > +if (el < 2 &&
> > +arm_feature(env, ARM_FEATURE_EL2) &&
> > +!(arm_hcr_el2_eff(env) & HCR_ATA)) {
> > +return CP_ACCESS_TRAP_EL2;
> > +}
>
> arm_hcr_el2_eff seems to be clearing HCR_ATA bit.  I think it needs to
> be updated.

Forget that.  I read it wrong and that's my test that is buggy!

Perhaps the comment about ARMv8.4 in arm_hcr_el2_eff should be updated?

Sorry,

Laurent



Re: [Qemu-devel] [PATCH v4 03/22] target/arm: Add MTE system registers

2019-03-08 Thread Laurent Desnogues
Hello,

On Thu, Mar 7, 2019 at 6:09 PM Richard Henderson
 wrote:
>
> This is TFSRE0_EL1, TFSR_EL1, TFSR_EL2, TFSR_EL3,
> RGSR_EL1, GCR_EL1, GMID_EL1, and PSTATE.TCO.
>
> Signed-off-by: Richard Henderson 
> ---
> v3: Add GMID; add access_mte.
> ---
>  target/arm/cpu.h   |  3 ++
>  target/arm/internals.h |  6 
>  target/arm/helper.c| 66 ++
>  target/arm/translate-a64.c | 11 +++
>  4 files changed, 86 insertions(+)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 0cf9eacebe..b9b33bc285 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -495,6 +495,9 @@ typedef struct CPUARMState {
>  uint64_t pmccfiltr_el0; /* Performance Monitor Filter Register */
>  uint64_t vpidr_el2; /* Virtualization Processor ID Register */
>  uint64_t vmpidr_el2; /* Virtualization Multiprocessor ID Register */
> +uint64_t tfsr_el[4]; /* tfsrel0_el1 is index 0.  */
> +uint64_t gcr_el1;
> +uint64_t rgsr_el1;
>  } cp15;
>
>  struct {
> diff --git a/target/arm/internals.h b/target/arm/internals.h
> index 2922324f63..fbfa770c23 100644
> --- a/target/arm/internals.h
> +++ b/target/arm/internals.h
> @@ -1002,4 +1002,10 @@ static inline bool 
> allocation_tag_access_enabled(CPUARMState *env, int el,
>  return sctlr != 0;
>  }
>
> +/*
> + * The log2 of the words in the tag block, for GMID_EL1.BS.
> + * The is the maximum, 256 bytes, which manipulates 64-bits of tags.
> + */
> +#define GMID_EL1_BS  6
> +
>  #endif
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index ab8006291b..7b30e1a1a9 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -5732,6 +5732,69 @@ static const ARMCPRegInfo pauth_reginfo[] = {
>.fieldoffset = offsetof(CPUARMState, apib_key.hi) },
>  REGINFO_SENTINEL
>  };
> +
> +static CPAccessResult access_mte(CPUARMState *env, const ARMCPRegInfo *ri,
> + bool isread)
> +{
> +int el = arm_current_el(env);
> +
> +if (el < 2 &&
> +arm_feature(env, ARM_FEATURE_EL2) &&
> +!(arm_hcr_el2_eff(env) & HCR_ATA)) {
> +return CP_ACCESS_TRAP_EL2;
> +}

arm_hcr_el2_eff seems to be clearing HCR_ATA bit.  I think it needs to
be updated.

Thanks,

Laurent

> +if (el < 3 &&
> +arm_feature(env, ARM_FEATURE_EL3) &&
> +!(env->cp15.scr_el3 & SCR_ATA)) {
> +return CP_ACCESS_TRAP_EL3;
> +}
> +return CP_ACCESS_OK;
> +}
> +
> +static uint64_t tco_read(CPUARMState *env, const ARMCPRegInfo *ri)
> +{
> +return env->pstate & PSTATE_TCO;
> +}
> +
> +static void tco_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t val)
> +{
> +env->pstate = (env->pstate & ~PSTATE_TCO) | (val & PSTATE_TCO);
> +}
> +
> +static const ARMCPRegInfo mte_reginfo[] = {
> +{ .name = "TFSRE0_EL1", .state = ARM_CP_STATE_AA64,
> +  .opc0 = 3, .opc1 = 0, .crn = 6, .crm = 6, .opc2 = 1,
> +  .access = PL1_RW, .accessfn = access_mte,
> +  .fieldoffset = offsetof(CPUARMState, cp15.tfsr_el[0]) },
> +{ .name = "TFSR_EL1", .state = ARM_CP_STATE_AA64,
> +  .opc0 = 3, .opc1 = 0, .crn = 6, .crm = 5, .opc2 = 0,
> +  .access = PL1_RW, .accessfn = access_mte,
> +  .fieldoffset = offsetof(CPUARMState, cp15.tfsr_el[1]) },
> +{ .name = "TFSR_EL2", .state = ARM_CP_STATE_AA64,
> +  .opc0 = 3, .opc1 = 4, .crn = 6, .crm = 5, .opc2 = 0,
> +  .access = PL2_RW, .accessfn = access_mte,
> +  .fieldoffset = offsetof(CPUARMState, cp15.tfsr_el[2]) },
> +{ .name = "TFSR_EL3", .state = ARM_CP_STATE_AA64,
> +  .opc0 = 3, .opc1 = 6, .crn = 6, .crm = 6, .opc2 = 0,
> +  .access = PL3_RW,
> +  .fieldoffset = offsetof(CPUARMState, cp15.tfsr_el[3]) },
> +{ .name = "RGSR_EL1", .state = ARM_CP_STATE_AA64,
> +  .opc0 = 3, .opc1 = 0, .crn = 1, .crm = 0, .opc2 = 5,
> +  .access = PL1_RW, .accessfn = access_mte,
> +  .fieldoffset = offsetof(CPUARMState, cp15.rgsr_el1) },
> +{ .name = "GCR_EL1", .state = ARM_CP_STATE_AA64,
> +  .opc0 = 3, .opc1 = 0, .crn = 1, .crm = 0, .opc2 = 6,
> +  .access = PL1_RW, .accessfn = access_mte,
> +  .fieldoffset = offsetof(CPUARMState, cp15.gcr_el1) },
> +{ .name = "TCO", .state = ARM_CP_STATE_AA64,
> +  .opc0 = 3, .opc1 = 3, .crn = 4, .crm = 2, .opc2 = 7,
> +  .type = ARM_CP_NO_RAW,
> +  .access = PL0_RW, .readfn = tco_read, .writefn = tco_write },
> +{ .name = "GMID_EL1", .state = ARM_CP_STATE_AA64,
> +  .opc0 = 3, .opc1 = 1, .crn = 0, .crm = 0, .opc2 = 4,
> +  .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = GMID_EL1_BS },
> +REGINFO_SENTINEL
> +};
>  #endif
>
>  static CPAccessResult access_predinv(CPUARMState *env, const ARMCPRegInfo 
> *ri,
> @@ -6676,6 +6739,9 @@ void register_cp_regs_for_features(ARMCPU *cpu)
>  if (cpu_isar_feature(aa64_pauth, cpu)) {
>  define_arm_cp_regs(cpu, pauth_reginfo);
>  }
> +if (cpu_isar_feature(aa64_mte_insn_reg, cpu)) 

Re: [Qemu-devel] [PATCH 0/2] target/arm: Implement ARMv8.5-FRINT

2019-02-25 Thread Laurent Desnogues
Hi Richard,

On Sat, Feb 23, 2019 at 2:23 AM Richard Henderson
 wrote:
>
> Based-on: the ARMv8.2-FHM patch set, although I don't know that
> this is an actual dependency; it's just the tree I started with.
>
> There is not yet support for this extension within FVP, so I've
> self-tested it against my own understanding of what is supposed
> to go on with the file below.

I ran some tests and found no issue.  Your understanding looks correct :-)

Tested-by: Laurent Desnogues 

Thanks,

Laurent


>
> r~
>
>
> Richard Henderson (2):
>   target/arm: Restructure handle_fp_1src_{single,double}
>   target/arm: Implement ARMv8.5-FRINT
>
>  target/arm/cpu.h   |   5 ++
>  target/arm/helper.h|   5 ++
>  target/arm/cpu64.c |   1 +
>  target/arm/translate-a64.c | 161 ++---
>  target/arm/vfp_helper.c|  96 ++
>  5 files changed, 222 insertions(+), 46 deletions(-)
>
> --
>
> #include 
> #include 
>
> asm(".arch armv8.5-a");
>
> static const float f32_i32_z[][2] = {
> { 0, 0 },
> { -0.0, -0.0 },
> { 0.9, 0.0 },
> { 1.0, 1.0 },
> { 1.1, 1.0 },
> { 1.9, 1.0 },
> { 0x1.fep30, 0x1.fep30 },   /* 2^31 - epsilon */
> { -0x1p31, -0x1.0p31 }, /* -2^31 = INT32_MIN */
> { 0x1.0p31, -0x1.0p31 },/* overflow */
> { 0x1.0p32, -0x1.0p31 },/* overflow */
> { -0x1.0p32, -0x1.0p31 },   /* overflow */
> { __builtin_inf(), -0x1.0p31 }, /* overflow */
> };
>
> static const double f64_i32_z[][2] = {
> { 0, 0 },
> { -0.0, -0.0 },
> { 0.9, 0.0 },
> { 1.0, 1.0 },
> { 1.1, 1.0 },
> { 1.9, 1.0 },
> { 2147483647, 2147483647 }, /* 2^31 - 1 = INT32_MAX */
> { -0x1p31, -0x1.0p31 }, /* -2^31 = INT32_MIN */
> { 0x1.0p31, -0x1.0p31 },/* overflow */
> { 0x1.0p32, -0x1.0p31 },/* overflow */
> { -0x1.0p32, -0x1.0p31 },   /* overflow */
> { __builtin_inf(), -0x1.0p31 }, /* overflow */
> };
>
> static const float f32_i64_z[][2] = {
> { 0, 0 },
> { -0.0, -0.0 },
> { 0.9, 0.0 },
> { 1.0, 1.0 },
> { 1.1, 1.0 },
> { 1.9, 1.0 },
> { 0x1.fep30, 0x1.fep30 },   /* 2^31 - epsilon */
> { -0x1p31, -0x1.0p31 }, /* -2^31 = INT32_MIN */
> { 0x1.0p31, 0x1.0p31 },
> { 0x1.0p32, 0x1.0p32 },
> { -0x1.0p32, -0x1.0p32 },
> { 0x1.0p62, 0x1.0p62 },
> { 0x1.fep62, 0x1.fep62 },   /* 2^63 - epsilon */
> { -0x1.0p63, -0x1.0p63 },   /* -2^63 = INT64_MIN */
> { 0x1.0p63, -0x1.0p63 },/* overflow */
> { 0x1.0p64, -0x1.0p63 },/* overflow */
> { -0x1.0p64, -0x1.0p63 },   /* overflow */
> { __builtin_inf(), -0x1.0p63 }, /* overflow */
> };
>
> static const double f64_i64_z[][2] = {
> { 0, 0 },
> { -0.0, -0.0 },
> { 0.9, 0.0 },
> { 1.0, 1.0 },
> { 1.1, 1.0 },
> { 1.9, 1.0 },
> { 2147483647, 2147483647 }, /* 2^31 - 1 = INT32_MAX */
> { -0x1p31, -0x1.0p31 }, /* -2^31 = INT32_MIN */
> { 0x1.0p31, 0x1.0p31 },
> { 0x1.0p32, 0x1.0p32 },
> { -0x1.0p32, -0x1.0p32 },
> { 0x1.0p62, 0x1.0p62 },
> { 0x1.fp62, 0x1.fp62 }, /* 2^63 - epsilon */
> { -0x1.0p63, -0x1.0p63 },   /* -2^63 = INT64_MIN */
> { 0x1.0p63, -0x1.0p63 },/* overflow */
> { 0x1.0p64, -0x1.0p63 },/* overflow */
> { -0x1.0p64, -0x1.0p63 },   /* overflow */
> { __builtin_inf(), -0x1.0p63 }, /* overflow */
> };
>
> int main()
> {
> int i;
>
> for (i = 0; i < sizeof(f32_i32_z)/sizeof(f32_i32_z[0]); i++) {
> float x;
> asm("frint32z %s0,%s1" : "=w"(x) : "w"(f32_i32_z[i][0]));
> if (x != f32_i32_z[i][1]) {
> printf("%-2d: frint(%a) -> %a != %a\n",
>i, f32_i32_z[i][0], x, f32_i32_z[i][1]);
> }
> }
>
> for (i = 0; i < sizeof(f32_i64_z)/sizeof(f32_i64_z[0]); i++) {
> float x;
> asm("frint64z %s0,%s1" : "=w"(x) : "w"(f32_i64_z[i][0]));
> if (x != f32_i64_z[i][1]) {
> printf("%-2d: frint(%a) -> %a != %a\n",
>i, f32_i64_z[i][0], x, f32_i64_z[i][1]);
> }
>

Re: [Qemu-devel] [PATCH 0/3] target/arm: Implement ARMv8.5-CondM

2019-02-20 Thread Laurent Desnogues
Hi,

On Wed, Feb 20, 2019 at 6:05 AM Richard Henderson
 wrote:
>
> Based-on: https://lists.gnu.org/archive/html/qemu-devel/2019-02/msg02733.html
> aka the v3 ARMv8.5-MemTag patch set,
> or at least some of the early patches that split handle_msr_i.
>
> The v8.4 parts have been tested vs FVP, but there's no released
> version that supports v8.5 yet, so XAFlag and AXFlag are untested.
> But they seem fairly straightforward, unless I've done something silly.

I tested xaflag/axflag (after setting the ID_AA64ISAR0 TS field to 2).

I also tested rmif/setf8/setf16.

Everything passes.

tested-by: Laurent Desnogues 

Thanks,

Laurent

>
> r~
>
>
> Richard Henderson (3):
>   target/arm: Rearrange disas_data_proc_reg
>   target/arm: Implement ARMv8.4-CondM
>   target/arm: Implement ARMv8.5-CondM
>
>  target/arm/cpu.h   |  10 ++
>  linux-user/elfload.c   |   1 +
>  target/arm/cpu64.c |   1 +
>  target/arm/translate-a64.c | 249 +++--
>  4 files changed, 221 insertions(+), 40 deletions(-)
>
> --
> 2.17.2
>
>



Re: [Qemu-devel] [PATCH 1/4] target/arm: Add helpers for FMLAL and FMLSL

2019-02-14 Thread Laurent Desnogues
On Thu, Feb 14, 2019 at 3:56 PM Richard Henderson
 wrote:
>
> On 2/14/19 1:16 AM, Laurent Desnogues wrote:
> > Hello,
> >
> > On Thu, Feb 14, 2019 at 5:00 AM Richard Henderson
> >  wrote:
> >>
> >> Note that float16_to_float32 rightly squashes SNaN to QNaN.
> >> But of course pickNaNMulAdd, for ARM, selects SNaNs first.
> >> So we have to preserve SNaN long enough for the correct NaN
> >> to be selected.  Thus float16_to_float32_by_bits.
> >>
> >> Signed-off-by: Richard Henderson 
> >> ---
> >>  target/arm/helper.h |   9 +++
> >>  target/arm/vec_helper.c | 154 
> >>  2 files changed, 163 insertions(+)
> >>
> >> diff --git a/target/arm/helper.h b/target/arm/helper.h
> >> index 53a38188c6..0302e13604 100644
> >> --- a/target/arm/helper.h
> >> +++ b/target/arm/helper.h
> >> @@ -653,6 +653,15 @@ DEF_HELPER_FLAGS_6(gvec_fmla_idx_s, TCG_CALL_NO_RWG,
> >>  DEF_HELPER_FLAGS_6(gvec_fmla_idx_d, TCG_CALL_NO_RWG,
> >> void, ptr, ptr, ptr, ptr, ptr, i32)
> >>
> >> +DEF_HELPER_FLAGS_5(gvec_fmlal_h, TCG_CALL_NO_RWG,
> >> +   void, ptr, ptr, ptr, ptr, i32)
> >> +DEF_HELPER_FLAGS_5(gvec_fmlsl_h, TCG_CALL_NO_RWG,
> >> +   void, ptr, ptr, ptr, ptr, i32)
> >> +DEF_HELPER_FLAGS_5(gvec_fmlal_idx_h, TCG_CALL_NO_RWG,
> >> +   void, ptr, ptr, ptr, ptr, i32)
> >> +DEF_HELPER_FLAGS_5(gvec_fmlsl_idx_h, TCG_CALL_NO_RWG,
> >> +   void, ptr, ptr, ptr, ptr, i32)
> >> +
> >>  #ifdef TARGET_AARCH64
> >>  #include "helper-a64.h"
> >>  #include "helper-sve.h"
> >> diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
> >> index 37f338732e..0c3b3de961 100644
> >> --- a/target/arm/vec_helper.c
> >> +++ b/target/arm/vec_helper.c
> >> @@ -766,3 +766,157 @@ DO_FMLA_IDX(gvec_fmla_idx_s, float32, H4)
> >>  DO_FMLA_IDX(gvec_fmla_idx_d, float64, )
> >>
> >>  #undef DO_FMLA_IDX
> >> +
> >> +/*
> >> + * Convert float16 to float32, raising no exceptions and
> >> + * preserving exceptional values, including SNaN.
> >> + * This is effectively an unpack+repack operation.
> >> + */
> >> +static float32 float16_to_float32_by_bits(uint32_t f16)
> >> +{
> >> +const int f16_bias = 15;
> >> +const int f32_bias = 127;
> >> +uint32_t sign = extract32(f16, 15, 1);
> >> +uint32_t exp = extract32(f16, 10, 5);
> >> +uint32_t frac = extract32(f16, 0, 10);
> >> +
> >> +if (exp == 0x1f) {
> >> +/* Inf or NaN */
> >> +exp = 0xff;
> >> +} else if (exp == 0) {
> >> +/* Zero or denormal.  */
> >> +if (frac != 0) {
> >> +/*
> >> + * Denormal; these are all normal float32.
> >> + * Shift the fraction so that the msb is at bit 11,
> >> + * then remove bit 11 as the implicit bit of the
> >> + * normalized float32.  Note that we still go through
> >> + * the shift for normal numbers below, to put the
> >> + * float32 fraction at the right place.
> >> + */
> >> +int shift = clz32(frac) - 21;
> >> +frac = (frac << shift) & 0x3ff;
> >> +exp = f32_bias - f16_bias - shift + 1;
> >
> > If FZ16 is set, this should flush to zero.
>
> Ho, hum, yes it should.
>
> > This means you will have to use both fp_status (for the muladd) and
> > fp_status_f16 (for this function) and so you should pass cpu_env to
> > the helpers rather than the fp_status.
>
> It's not quite as simple as that, because aa32 mode would pass
> standard_fp_status.  I'll figure something out...

Ha yes, I only looked at AArch64... as usual :-(

Thanks,

Laurent

>
> r~



Re: [Qemu-devel] [PATCH 0/4] target/arm: Reduce overhead of cpu_get_tb_cpu_state

2019-02-14 Thread Laurent Desnogues
Hi Richard,

On Thu, Feb 14, 2019 at 5:07 AM Richard Henderson
 wrote:
>
> We've talked about this before, caching state to reduce the
> amount of computation that happens looking up each TB.
>
> I know that Peter has been concerned that we would not be able to
> reliably maintain all of the places that need to be updates to
> keep this up-to-date.
>
> Well, modulo dirty tricks within linux-user, it appears as if
> exception delivery and return, plus after every TB-ending write
> to a system register is sufficient.
>
> There seems to be a noticable improvement, although wall-time
> is harder to come by -- all of my system-level measurements
> include user input, and my user-level measurements seem to be
> too small to matter.

FWIW this patch series made a run of linux-user AArch64 176.gcc 166.i
go from 29.5s down to 24.5s (on an E5-2650 v2).  Though that'd need
more benchmarks, that looks quite good to me.

Thanks,

Laurent

>
> r~
>
>
> Richard Henderson (4):
>   target/arm: Split out recompute_hflags et al
>   target/arm: Rebuild hflags at el changes and MSR writes
>   target/arm: Assert hflags is correct in cpu_get_tb_cpu_state
>   target/arm: Rely on hflags correct in cpu_get_tb_cpu_state
>
>  target/arm/cpu.h   |  22 ++-
>  target/arm/helper.h|   3 +
>  target/arm/internals.h |   4 +
>  linux-user/syscall.c   |   1 +
>  target/arm/cpu.c   |   1 +
>  target/arm/helper-a64.c|   3 +
>  target/arm/helper.c| 267 ++---
>  target/arm/machine.c   |   1 +
>  target/arm/op_helper.c |   1 +
>  target/arm/translate-a64.c |   6 +-
>  target/arm/translate.c |  14 +-
>  11 files changed, 204 insertions(+), 119 deletions(-)
>
> --
> 2.17.1
>
>



Re: [Qemu-devel] [PATCH 1/4] target/arm: Add helpers for FMLAL and FMLSL

2019-02-14 Thread Laurent Desnogues
Hello,

On Thu, Feb 14, 2019 at 5:00 AM Richard Henderson
 wrote:
>
> Note that float16_to_float32 rightly squashes SNaN to QNaN.
> But of course pickNaNMulAdd, for ARM, selects SNaNs first.
> So we have to preserve SNaN long enough for the correct NaN
> to be selected.  Thus float16_to_float32_by_bits.
>
> Signed-off-by: Richard Henderson 
> ---
>  target/arm/helper.h |   9 +++
>  target/arm/vec_helper.c | 154 
>  2 files changed, 163 insertions(+)
>
> diff --git a/target/arm/helper.h b/target/arm/helper.h
> index 53a38188c6..0302e13604 100644
> --- a/target/arm/helper.h
> +++ b/target/arm/helper.h
> @@ -653,6 +653,15 @@ DEF_HELPER_FLAGS_6(gvec_fmla_idx_s, TCG_CALL_NO_RWG,
>  DEF_HELPER_FLAGS_6(gvec_fmla_idx_d, TCG_CALL_NO_RWG,
> void, ptr, ptr, ptr, ptr, ptr, i32)
>
> +DEF_HELPER_FLAGS_5(gvec_fmlal_h, TCG_CALL_NO_RWG,
> +   void, ptr, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_5(gvec_fmlsl_h, TCG_CALL_NO_RWG,
> +   void, ptr, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_5(gvec_fmlal_idx_h, TCG_CALL_NO_RWG,
> +   void, ptr, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_5(gvec_fmlsl_idx_h, TCG_CALL_NO_RWG,
> +   void, ptr, ptr, ptr, ptr, i32)
> +
>  #ifdef TARGET_AARCH64
>  #include "helper-a64.h"
>  #include "helper-sve.h"
> diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
> index 37f338732e..0c3b3de961 100644
> --- a/target/arm/vec_helper.c
> +++ b/target/arm/vec_helper.c
> @@ -766,3 +766,157 @@ DO_FMLA_IDX(gvec_fmla_idx_s, float32, H4)
>  DO_FMLA_IDX(gvec_fmla_idx_d, float64, )
>
>  #undef DO_FMLA_IDX
> +
> +/*
> + * Convert float16 to float32, raising no exceptions and
> + * preserving exceptional values, including SNaN.
> + * This is effectively an unpack+repack operation.
> + */
> +static float32 float16_to_float32_by_bits(uint32_t f16)
> +{
> +const int f16_bias = 15;
> +const int f32_bias = 127;
> +uint32_t sign = extract32(f16, 15, 1);
> +uint32_t exp = extract32(f16, 10, 5);
> +uint32_t frac = extract32(f16, 0, 10);
> +
> +if (exp == 0x1f) {
> +/* Inf or NaN */
> +exp = 0xff;
> +} else if (exp == 0) {
> +/* Zero or denormal.  */
> +if (frac != 0) {
> +/*
> + * Denormal; these are all normal float32.
> + * Shift the fraction so that the msb is at bit 11,
> + * then remove bit 11 as the implicit bit of the
> + * normalized float32.  Note that we still go through
> + * the shift for normal numbers below, to put the
> + * float32 fraction at the right place.
> + */
> +int shift = clz32(frac) - 21;
> +frac = (frac << shift) & 0x3ff;
> +exp = f32_bias - f16_bias - shift + 1;

If FZ16 is set, this should flush to zero.

This means you will have to use both fp_status (for the muladd) and
fp_status_f16 (for this function) and so you should pass cpu_env to
the helpers rather than the fp_status.

Thanks,

Laurent

> +}
> +} else {
> +/* Normal number; adjust the bias.  */
> +exp += f32_bias - f16_bias;
> +}
> +sign <<= 31;
> +exp <<= 23;
> +frac <<= 23 - 10;
> +
> +return sign | exp | frac;
> +}
> +
> +static float32 fmlal(float32 a, float16 n16, float16 m16, float_status *fpst)
> +{
> +float32 n = float16_to_float32_by_bits(n16);
> +float32 m = float16_to_float32_by_bits(m16);
> +return float32_muladd(n, m, a, 0, fpst);
> +}
> +
> +static float32 fmlsl(float32 a, float16 n16, float16 m16, float_status *fpst)
> +{
> +float32 n = float16_to_float32_by_bits(n16);
> +float32 m = float16_to_float32_by_bits(m16);
> +return float32_muladd(float32_chs(n), m, a, 0, fpst);
> +}
> +
> +static inline uint64_t load4_f16(uint64_t *ptr, int is_q, int is_2)
> +{
> +/*
> + * Branchless load of u32[0], u64[0], u32[1], or u64[1].
> + * Load the 2nd qword iff is_q & is_2.
> + * Shift to the 2nd dword iff !is_q & is_2.
> + * For !is_q & !is_2, the upper bits of the result are garbage.
> + */
> +return ptr[is_q & is_2] >> ((is_2 & ~is_q) << 5);
> +}
> +
> +/*
> + * Note that FMLAL and FMLSL require oprsz == 8 or oprsz == 16,
> + * as there is not yet SVE versions that might use blocking.
> + */
> +
> +void HELPER(gvec_fmlal_h)(void *vd, void *vn, void *vm,
> +  void *fpst, uint32_t desc)
> +{
> +intptr_t i, oprsz = simd_oprsz(desc);
> +int is_2 = extract32(desc, SIMD_DATA_SHIFT, 1);
> +int is_q = oprsz == 16;
> +float32 *d = vd;
> +uint64_t n_4, m_4;
> +
> +/* Pre-load all of the f16 data, avoiding overlap issues.  */
> +n_4 = load4_f16(vn, is_q, is_2);
> +m_4 = load4_f16(vm, is_q, is_2);
> +
> +for (i = 0; i < oprsz / 4; i++) {
> +d[H4(i)] = fmlal(d[H4(i)], extract64(n_4, i*16, 16),
> + extract64(m_4, i*16, 16), fpst);
> +  

Re: [Qemu-devel] [PATCH v2 3/3] target/arm: Implement ARMv8.3-JSConv

2019-02-05 Thread Laurent Desnogues
Hello,

On Wed, Feb 6, 2019 at 6:32 AM Richard Henderson
 wrote:
>
> Signed-off-by: Richard Henderson 

Tested-by: Laurent Desnogues 

Thanks,

Laurent

> ---
> v2: Return 0 for NaN
> ---
>  target/arm/cpu.h   | 10 +
>  target/arm/helper.h|  2 +
>  target/arm/cpu.c   |  1 +
>  target/arm/cpu64.c |  2 +
>  target/arm/op_helper.c | 76 ++
>  target/arm/translate-a64.c | 26 +
>  target/arm/translate.c | 15 
>  7 files changed, 132 insertions(+)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 47238e4245..bfc532f0ca 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -3227,6 +3227,11 @@ static inline bool isar_feature_aa32_vcma(const 
> ARMISARegisters *id)
>  return FIELD_EX32(id->id_isar5, ID_ISAR5, VCMA) != 0;
>  }
>
> +static inline bool isar_feature_aa32_jscvt(const ARMISARegisters *id)
> +{
> +return FIELD_EX32(id->id_isar6, ID_ISAR6, JSCVT) != 0;
> +}
> +
>  static inline bool isar_feature_aa32_dp(const ARMISARegisters *id)
>  {
>  return FIELD_EX32(id->id_isar6, ID_ISAR6, DP) != 0;
> @@ -3305,6 +3310,11 @@ static inline bool isar_feature_aa64_dp(const 
> ARMISARegisters *id)
>  return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, DP) != 0;
>  }
>
> +static inline bool isar_feature_aa64_jscvt(const ARMISARegisters *id)
> +{
> +return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, JSCVT) != 0;
> +}
> +
>  static inline bool isar_feature_aa64_fcma(const ARMISARegisters *id)
>  {
>  return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, FCMA) != 0;
> diff --git a/target/arm/helper.h b/target/arm/helper.h
> index 53a38188c6..6998f7e8d5 100644
> --- a/target/arm/helper.h
> +++ b/target/arm/helper.h
> @@ -218,6 +218,8 @@ DEF_HELPER_FLAGS_2(rintd_exact, TCG_CALL_NO_RWG, f64, 
> f64, ptr)
>  DEF_HELPER_FLAGS_2(rints, TCG_CALL_NO_RWG, f32, f32, ptr)
>  DEF_HELPER_FLAGS_2(rintd, TCG_CALL_NO_RWG, f64, f64, ptr)
>
> +DEF_HELPER_FLAGS_2(fjcvtzs, TCG_CALL_NO_RWG, i64, f64, ptr)
> +
>  /* neon_helper.c */
>  DEF_HELPER_FLAGS_3(neon_qadd_u8, TCG_CALL_NO_RWG, i32, env, i32, i32)
>  DEF_HELPER_FLAGS_3(neon_qadd_s8, TCG_CALL_NO_RWG, i32, env, i32, i32)
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index edf6e0e1f1..8ea6569088 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -2001,6 +2001,7 @@ static void arm_max_initfn(Object *obj)
>  cpu->isar.id_isar5 = t;
>
>  t = cpu->isar.id_isar6;
> +t = FIELD_DP32(t, ID_ISAR6, JSCVT, 1);
>  t = FIELD_DP32(t, ID_ISAR6, DP, 1);
>  cpu->isar.id_isar6 = t;
>
> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> index eff0f164dd..69e4134f79 100644
> --- a/target/arm/cpu64.c
> +++ b/target/arm/cpu64.c
> @@ -311,6 +311,7 @@ static void aarch64_max_initfn(Object *obj)
>  cpu->isar.id_aa64isar0 = t;
>
>  t = cpu->isar.id_aa64isar1;
> +t = FIELD_DP64(t, ID_AA64ISAR1, JSCVT, 1);
>  t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 1);
>  t = FIELD_DP64(t, ID_AA64ISAR1, APA, 1); /* PAuth, architected only 
> */
>  t = FIELD_DP64(t, ID_AA64ISAR1, API, 0);
> @@ -344,6 +345,7 @@ static void aarch64_max_initfn(Object *obj)
>  cpu->isar.id_isar5 = u;
>
>  u = cpu->isar.id_isar6;
> +u = FIELD_DP32(u, ID_ISAR6, JSCVT, 1);
>  u = FIELD_DP32(u, ID_ISAR6, DP, 1);
>  cpu->isar.id_isar6 = u;
>
> diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
> index c998eadfaa..be555c44e4 100644
> --- a/target/arm/op_helper.c
> +++ b/target/arm/op_helper.c
> @@ -24,6 +24,7 @@
>  #include "internals.h"
>  #include "exec/exec-all.h"
>  #include "exec/cpu_ldst.h"
> +#include "fpu/softfloat.h"
>
>  #define SIGNBIT (uint32_t)0x8000
>  #define SIGNBIT64 ((uint64_t)1 << 63)
> @@ -1376,3 +1377,78 @@ uint32_t HELPER(ror_cc)(CPUARMState *env, uint32_t x, 
> uint32_t i)
>  return ((uint32_t)x >> shift) | (x << (32 - shift));
>  }
>  }
> +
> +/*
> + * Implement float64 to int32_t conversion without saturation;
> + * the result is supplied modulo 2^32.
> + */
> +uint64_t HELPER(fjcvtzs)(float64 value, void *vstatus)
> +{
> +float_status *status = vstatus;
> +uint32_t exp, sign;
> +uint64_t frac;
> +uint32_t inexact = 1; /* !Z */
> +
> +sign = extract64(value, 63, 1);
> +exp = extract64(value, 52, 11);
> +frac = extract64(value, 0, 52);
> +
> +if (exp == 0) {
> +/* While not inexact for IEEE FP, -0.0 i

[Qemu-devel] ARM: another round of missed undefined instructions

2019-02-05 Thread Laurent Desnogues
Hello,

here are some missed undefined instructions I skipped over last time
(I had forgotten to enable FP16 support in QEMU):

- disas_simd_across_lanes: for FP16 FMAX/FMINV/FMAXNMV/FMINNMV it
should be checked that bit 0 of the size field is 0

- disas_simd_mod_imm: FP16 FMOV imm must have bit 29 = 0

- disas_simd_scalar_pairwise: FP16 FMAXP/FADPP/... must have bit 0 of
the size field = 0

- disas_simd_three_reg_same_fp16: opcodes 5, b, c, d, 11, 19, 1b, 1f
are unallocated

- disas_simd_two_reg_misc_fp16: FRECPX is a scalar only instruction

- disas_simd_two_reg_misc_fp16: FABS/FNEG/FSQRT are vector only instructions

- disas_simd_two_reg_misc_fp16: the default clause should undef in the
first switch statement

- disas_simd_two_reg_misc_fp16:  FNEG should be removed from the scalar part.

Hopefully that's all that was left...

Thanks,

Laurent



Re: [Qemu-devel] [PATCH] target/arm: Fix CRn to be 14 for PMEVTYPER/PMEVCNTR

2019-02-05 Thread Laurent Desnogues
On Tue, Feb 5, 2019 at 2:51 PM Aaron Lindsay OS
 wrote:
>
> This bug was introduced in:
> commit 5ecdd3e47cadae83a62dc92b472f1fe163b56f59
> target/arm: Finish implementation of PM[X]EVCNTR and PM[X]EVTYPER
>
> Signed-off-by: Aaron Lindsay 
> Reported-by: Laurent Desnogues 

Reviewed-by: Laurent Desnogues 

Thanks,

Laurent

> ---
>  target/arm/helper.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index d070879894..ec2d17093c 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -5855,25 +5855,25 @@ void register_cp_regs_for_features(ARMCPU *cpu)
>  char *pmevtyper_name = g_strdup_printf("PMEVTYPER%d", i);
>  char *pmevtyper_el0_name = g_strdup_printf("PMEVTYPER%d_EL0", i);
>  ARMCPRegInfo pmev_regs[] = {
> -{ .name = pmevcntr_name, .cp = 15, .crn = 15,
> +{ .name = pmevcntr_name, .cp = 15, .crn = 14,
>.crm = 8 | (3 & (i >> 3)), .opc1 = 0, .opc2 = i & 7,
>.access = PL0_RW, .type = ARM_CP_IO | ARM_CP_ALIAS,
>.readfn = pmevcntr_readfn, .writefn = pmevcntr_writefn,
>.accessfn = pmreg_access },
>  { .name = pmevcntr_el0_name, .state = ARM_CP_STATE_AA64,
> -  .opc0 = 3, .opc1 = 3, .crn = 15, .crm = 8 | (3 & (i >> 3)),
> +  .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 8 | (3 & (i >> 3)),
>.opc2 = i & 7, .access = PL0_RW, .accessfn = pmreg_access,
>.type = ARM_CP_IO,
>.readfn = pmevcntr_readfn, .writefn = pmevcntr_writefn,
>.raw_readfn = pmevcntr_rawread,
>.raw_writefn = pmevcntr_rawwrite },
> -{ .name = pmevtyper_name, .cp = 15, .crn = 15,
> +{ .name = pmevtyper_name, .cp = 15, .crn = 14,
>.crm = 12 | (3 & (i >> 3)), .opc1 = 0, .opc2 = i & 7,
>.access = PL0_RW, .type = ARM_CP_IO | ARM_CP_ALIAS,
>.readfn = pmevtyper_readfn, .writefn = pmevtyper_writefn,
>.accessfn = pmreg_access },
>  { .name = pmevtyper_el0_name, .state = ARM_CP_STATE_AA64,
> -  .opc0 = 3, .opc1 = 3, .crn = 15, .crm = 12 | (3 & (i >> 
> 3)),
> +  .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 12 | (3 & (i >> 
> 3)),
>.opc2 = i & 7, .access = PL0_RW, .accessfn = pmreg_access,
>.type = ARM_CP_IO,
>.readfn = pmevtyper_readfn, .writefn = pmevtyper_writefn,
> --
> 2.20.1
>



Re: [Qemu-devel] [Qemu-arm] [PATCH v10 10/14] target/arm: Finish implementation of PM[X]EVCNTR and PM[X]EVTYPER

2019-02-05 Thread Laurent Desnogues
On Tue, Feb 5, 2019 at 2:41 PM Aaron Lindsay OS
 wrote:
>
> On Feb 04 20:22, Laurent Desnogues wrote:
> > Hello,
> >
> > On Tue, Dec 11, 2018 at 4:25 PM Aaron Lindsay
> >  wrote:
> > >
> > > Add arrays to hold the registers, the definitions themselves, access
> > > functions, and logic to reset counters when PMCR.P is set. Update
> > > filtering code to support counters other than PMCCNTR. Support migration
> > > with raw read/write functions.
> > >
> > > Signed-off-by: Aaron Lindsay 
> > > Signed-off-by: Aaron Lindsay 
> > > Reviewed-by: Richard Henderson 
> > > ---
> > >  target/arm/cpu.h|   3 +
> > >  target/arm/helper.c | 296 +---
> > >  2 files changed, 282 insertions(+), 17 deletions(-)
> > [...]
> > > diff --git a/target/arm/helper.c b/target/arm/helper.c
> > > index fd2923f033..1b851d1689 100644
> > > --- a/target/arm/helper.c
> > > +++ b/target/arm/helper.c
> > [...]
> > > @@ -5301,6 +5526,43 @@ void register_cp_regs_for_features(ARMCPU *cpu)
> > >  };
> > >  define_one_arm_cp_reg(cpu, );
> > >  define_one_arm_cp_reg(cpu, );
> > > +for (i = 0; i < pmcrn; i++) {
> > > +char *pmevcntr_name = g_strdup_printf("PMEVCNTR%d", i);
> > > +char *pmevcntr_el0_name = g_strdup_printf("PMEVCNTR%d_EL0", 
> > > i);
> > > +char *pmevtyper_name = g_strdup_printf("PMEVTYPER%d", i);
> > > +char *pmevtyper_el0_name = 
> > > g_strdup_printf("PMEVTYPER%d_EL0", i);
> > > +ARMCPRegInfo pmev_regs[] = {
> > > +{ .name = pmevcntr_name, .cp = 15, .crn = 15,
> > > +  .crm = 8 | (3 & (i >> 3)), .opc1 = 0, .opc2 = i & 7,
> > > +  .access = PL0_RW, .type = ARM_CP_IO | ARM_CP_ALIAS,
> > > +  .readfn = pmevcntr_readfn, .writefn = pmevcntr_writefn,
> > > +  .accessfn = pmreg_access },
> > > +{ .name = pmevcntr_el0_name, .state = ARM_CP_STATE_AA64,
> > > +  .opc0 = 3, .opc1 = 3, .crn = 15, .crm = 8 | (3 & (i >> 
> > > 3)),
> > > +  .opc2 = i & 7, .access = PL0_RW, .accessfn = 
> > > pmreg_access,
> > > +  .type = ARM_CP_IO,
> > > +  .readfn = pmevcntr_readfn, .writefn = pmevcntr_writefn,
> > > +  .raw_readfn = pmevcntr_rawread,
> > > +  .raw_writefn = pmevcntr_rawwrite },
> > > +{ .name = pmevtyper_name, .cp = 15, .crn = 15,
> > > +  .crm = 12 | (3 & (i >> 3)), .opc1 = 0, .opc2 = i & 7,
> > > +  .access = PL0_RW, .type = ARM_CP_IO | ARM_CP_ALIAS,
> > > +  .readfn = pmevtyper_readfn, .writefn = 
> > > pmevtyper_writefn,
> > > +  .accessfn = pmreg_access },
> > > +{ .name = pmevtyper_el0_name, .state = ARM_CP_STATE_AA64,
> > > +  .opc0 = 3, .opc1 = 3, .crn = 15, .crm = 12 | (3 & (i 
> > > >> 3)),
> >
> > Looking at ARM documentation, I think the value for crn should be 14
> > for PMEVCNTR_EL0 and PMEVTYPER_EL0.
>
> You are correct. I'll post a fix momentarily.
>
> Out of curiosity, how did you discover this? I've been using recent
> Linux kernels for testing, and thought it used these registers, but I
> guess it must only be using PMSELR/PMXEVCNTR/PMXEVTYPER.

There regs are used in a proprietary bare metal bootloader on a
virtual platform my company uses for performance measurements.

Thanks,

Laurent



Re: [Qemu-devel] [Qemu-arm] [PATCH v10 10/14] target/arm: Finish implementation of PM[X]EVCNTR and PM[X]EVTYPER

2019-02-04 Thread Laurent Desnogues
Hello,

On Tue, Dec 11, 2018 at 4:25 PM Aaron Lindsay
 wrote:
>
> Add arrays to hold the registers, the definitions themselves, access
> functions, and logic to reset counters when PMCR.P is set. Update
> filtering code to support counters other than PMCCNTR. Support migration
> with raw read/write functions.
>
> Signed-off-by: Aaron Lindsay 
> Signed-off-by: Aaron Lindsay 
> Reviewed-by: Richard Henderson 
> ---
>  target/arm/cpu.h|   3 +
>  target/arm/helper.c | 296 +---
>  2 files changed, 282 insertions(+), 17 deletions(-)
[...]
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index fd2923f033..1b851d1689 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
[...]
> @@ -5301,6 +5526,43 @@ void register_cp_regs_for_features(ARMCPU *cpu)
>  };
>  define_one_arm_cp_reg(cpu, );
>  define_one_arm_cp_reg(cpu, );
> +for (i = 0; i < pmcrn; i++) {
> +char *pmevcntr_name = g_strdup_printf("PMEVCNTR%d", i);
> +char *pmevcntr_el0_name = g_strdup_printf("PMEVCNTR%d_EL0", i);
> +char *pmevtyper_name = g_strdup_printf("PMEVTYPER%d", i);
> +char *pmevtyper_el0_name = g_strdup_printf("PMEVTYPER%d_EL0", i);
> +ARMCPRegInfo pmev_regs[] = {
> +{ .name = pmevcntr_name, .cp = 15, .crn = 15,
> +  .crm = 8 | (3 & (i >> 3)), .opc1 = 0, .opc2 = i & 7,
> +  .access = PL0_RW, .type = ARM_CP_IO | ARM_CP_ALIAS,
> +  .readfn = pmevcntr_readfn, .writefn = pmevcntr_writefn,
> +  .accessfn = pmreg_access },
> +{ .name = pmevcntr_el0_name, .state = ARM_CP_STATE_AA64,
> +  .opc0 = 3, .opc1 = 3, .crn = 15, .crm = 8 | (3 & (i >> 3)),
> +  .opc2 = i & 7, .access = PL0_RW, .accessfn = pmreg_access,
> +  .type = ARM_CP_IO,
> +  .readfn = pmevcntr_readfn, .writefn = pmevcntr_writefn,
> +  .raw_readfn = pmevcntr_rawread,
> +  .raw_writefn = pmevcntr_rawwrite },
> +{ .name = pmevtyper_name, .cp = 15, .crn = 15,
> +  .crm = 12 | (3 & (i >> 3)), .opc1 = 0, .opc2 = i & 7,
> +  .access = PL0_RW, .type = ARM_CP_IO | ARM_CP_ALIAS,
> +  .readfn = pmevtyper_readfn, .writefn = pmevtyper_writefn,
> +  .accessfn = pmreg_access },
> +{ .name = pmevtyper_el0_name, .state = ARM_CP_STATE_AA64,
> +  .opc0 = 3, .opc1 = 3, .crn = 15, .crm = 12 | (3 & (i >> 
> 3)),

Looking at ARM documentation, I think the value for crn should be 14
for PMEVCNTR_EL0 and PMEVTYPER_EL0.

Thanks,

Laurent

> +  .opc2 = i & 7, .access = PL0_RW, .accessfn = pmreg_access,
> +  .type = ARM_CP_IO,
> +  .readfn = pmevtyper_readfn, .writefn = pmevtyper_writefn,
> +  .raw_writefn = pmevtyper_rawwrite },
> +REGINFO_SENTINEL
> +};
> +define_arm_cp_regs(cpu, pmev_regs);
> +g_free(pmevcntr_name);
> +g_free(pmevcntr_el0_name);
> +g_free(pmevtyper_name);
> +g_free(pmevtyper_el0_name);
> +}
>  #endif
>  ARMCPRegInfo clidr = {
>  .name = "CLIDR", .state = ARM_CP_STATE_BOTH,
> --
> 2.19.2
>
>



Re: [Qemu-devel] [PATCH 3/3] target/arm: Implement ARMv8.3-JSConv

2019-02-04 Thread Laurent Desnogues
Hello,

On Mon, Feb 4, 2019 at 6:38 AM Richard Henderson
 wrote:
>
> Signed-off-by: Richard Henderson 
> ---
>  target/arm/cpu.h   | 10 +
>  target/arm/helper.h|  2 +
>  target/arm/cpu.c   |  1 +
>  target/arm/cpu64.c |  2 +
>  target/arm/op_helper.c | 91 ++
>  target/arm/translate-a64.c | 26 +++
>  target/arm/translate.c | 15 +++
>  7 files changed, 147 insertions(+)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index a68bcc9fed..d2c2e2b0cf 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -3209,6 +3209,11 @@ static inline bool isar_feature_aa32_vcma(const 
> ARMISARegisters *id)
>  return FIELD_EX32(id->id_isar5, ID_ISAR5, VCMA) != 0;
>  }
>
> +static inline bool isar_feature_aa32_jscvt(const ARMISARegisters *id)
> +{
> +return FIELD_EX32(id->id_isar6, ID_ISAR6, JSCVT) != 0;
> +}
> +
>  static inline bool isar_feature_aa32_dp(const ARMISARegisters *id)
>  {
>  return FIELD_EX32(id->id_isar6, ID_ISAR6, DP) != 0;
> @@ -3287,6 +3292,11 @@ static inline bool isar_feature_aa64_dp(const 
> ARMISARegisters *id)
>  return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, DP) != 0;
>  }
>
> +static inline bool isar_feature_aa64_jscvt(const ARMISARegisters *id)
> +{
> +return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, JSCVT) != 0;
> +}
> +
>  static inline bool isar_feature_aa64_fcma(const ARMISARegisters *id)
>  {
>  return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, FCMA) != 0;
> diff --git a/target/arm/helper.h b/target/arm/helper.h
> index 53a38188c6..6998f7e8d5 100644
> --- a/target/arm/helper.h
> +++ b/target/arm/helper.h
> @@ -218,6 +218,8 @@ DEF_HELPER_FLAGS_2(rintd_exact, TCG_CALL_NO_RWG, f64, 
> f64, ptr)
>  DEF_HELPER_FLAGS_2(rints, TCG_CALL_NO_RWG, f32, f32, ptr)
>  DEF_HELPER_FLAGS_2(rintd, TCG_CALL_NO_RWG, f64, f64, ptr)
>
> +DEF_HELPER_FLAGS_2(fjcvtzs, TCG_CALL_NO_RWG, i64, f64, ptr)
> +
>  /* neon_helper.c */
>  DEF_HELPER_FLAGS_3(neon_qadd_u8, TCG_CALL_NO_RWG, i32, env, i32, i32)
>  DEF_HELPER_FLAGS_3(neon_qadd_s8, TCG_CALL_NO_RWG, i32, env, i32, i32)
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 3874dc9875..2eb2ce6c8c 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -1995,6 +1995,7 @@ static void arm_max_initfn(Object *obj)
>  cpu->isar.id_isar5 = t;
>
>  t = cpu->isar.id_isar6;
> +t = FIELD_DP32(t, ID_ISAR6, JSCVT, 1);
>  t = FIELD_DP32(t, ID_ISAR6, DP, 1);
>  cpu->isar.id_isar6 = t;
>
> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> index 7107ec8d7e..43d8ff047c 100644
> --- a/target/arm/cpu64.c
> +++ b/target/arm/cpu64.c
> @@ -311,6 +311,7 @@ static void aarch64_max_initfn(Object *obj)
>  cpu->isar.id_aa64isar0 = t;
>
>  t = cpu->isar.id_aa64isar1;
> +t = FIELD_DP64(t, ID_AA64ISAR1, JSCVT, 1);
>  t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 1);
>  t = FIELD_DP64(t, ID_AA64ISAR1, APA, 1); /* PAuth, architected only 
> */
>  t = FIELD_DP64(t, ID_AA64ISAR1, API, 0);
> @@ -340,6 +341,7 @@ static void aarch64_max_initfn(Object *obj)
>  cpu->isar.id_isar5 = u;
>
>  u = cpu->isar.id_isar6;
> +u = FIELD_DP32(u, ID_ISAR6, JSCVT, 1);
>  u = FIELD_DP32(u, ID_ISAR6, DP, 1);
>  cpu->isar.id_isar6 = u;
>
> diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
> index c998eadfaa..a7259a7194 100644
> --- a/target/arm/op_helper.c
> +++ b/target/arm/op_helper.c
> @@ -24,6 +24,7 @@
>  #include "internals.h"
>  #include "exec/exec-all.h"
>  #include "exec/cpu_ldst.h"
> +#include "fpu/softfloat.h"
>
>  #define SIGNBIT (uint32_t)0x8000
>  #define SIGNBIT64 ((uint64_t)1 << 63)
> @@ -1376,3 +1377,93 @@ uint32_t HELPER(ror_cc)(CPUARMState *env, uint32_t x, 
> uint32_t i)
>  return ((uint32_t)x >> shift) | (x << (32 - shift));
>  }
>  }
> +
> +/*
> + * Implement float64 to int32_t conversion without saturation;
> + * the result is supplied modulo 2^32.
> + */
> +uint64_t HELPER(fjcvtzs)(float64 value, void *vstatus)
> +{
> +float_status *status = vstatus;
> +uint32_t result, exp, sign;
> +uint64_t frac;
> +uint32_t inexact; /* !Z */
> +
> +sign = extract64(value, 63, 1);
> +exp = extract64(value, 52, 11);
> +frac = extract64(value, 0, 52);
> +
> +if (exp == 0) {
> +/* While not inexact for IEEE FP, -0.0 is inexact for JavaScript.  */
> +inexact = sign;
> +result = 0;
> +if (frac != 0) {
> +if (status->flush_inputs_to_zero) {
> +float_raise(float_flag_input_denormal, status);
> +} else {
> +float_raise(float_flag_inexact, status);
> +inexact = 1;
> +}
> +}
> +} else if (exp == 0x7ff) {
> +if (frac == 0) {
> +/* Infinity.  */
> +result = 0;
> +} else {
> +/* NaN */
> +result = 

Re: [Qemu-devel] [PATCH 1/2] target/arm/translate-a64: Fix FCMLA decoding error

2019-01-29 Thread Laurent Desnogues
On Tue, Jan 29, 2019 at 3:04 PM Peter Maydell  wrote:
>
> The FCMLA (by element) instruction exists in the
> "vector x indexed element" encoding group, but not in
> the "scalar x indexed element" group. Correctly UNDEF
> the unallocated encodings.
>
> Reported-by: Laurent Desnogues 
> Signed-off-by: Peter Maydell 

Reviewed-by: Laurent Desnogues 

Thanks,

Laurent

> ---
>  target/arm/translate-a64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 30bc2412fc0..a7b999d2b5a 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -12650,7 +12650,7 @@ static void disas_simd_indexed(DisasContext *s, 
> uint32_t insn)
>  case 0x13: /* FCMLA #90 */
>  case 0x15: /* FCMLA #180 */
>  case 0x17: /* FCMLA #270 */
> -if (!dc_isar_feature(aa64_fcma, s)) {
> +if (is_scalar || !dc_isar_feature(aa64_fcma, s)) {
>  unallocated_encoding(s);
>  return;
>  }
> --
> 2.20.1
>



Re: [Qemu-devel] [PATCH 2/2] target/arm/translate-a64: Fix mishandling of size in FCMLA decode

2019-01-29 Thread Laurent Desnogues
On Tue, Jan 29, 2019 at 3:04 PM Peter Maydell  wrote:
>
> In disas_simd_indexed(), for the case of "complex fp", each indexable
> element is a complex pair, so the total size is twice that indicated
> in the 'size' field in the encoding. We were trying to do this
> "double the size" operation with a left shift by 1, but this is
> incorrect because the 'size' field is a MO_8/MO_16/MO_32/MO_64
> value, and doubling the size should be done by a simple increment.
>
> This meant we were mishandling FCMLA (by element) of values where
> the real and imaginary parts are 32-bit floats, and would incorrectly
> UNDEF this encoding. (No other insns take this code path, and for
> 16-bit floats it happens that 1 << 1 and 1 + 1 are both the same).
>
> Reported-by: Laurent Desnogues 
> Signed-off-by: Peter Maydell 

Reviewed-by: Laurent Desnogues 

Thanks,

Laurent

> ---
>  target/arm/translate-a64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index a7b999d2b5a..06418f0ac3c 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -12680,7 +12680,7 @@ static void disas_simd_indexed(DisasContext *s, 
> uint32_t insn)
>
>  case 2: /* complex fp */
>  /* Each indexable element is a complex pair.  */
> -size <<= 1;
> +size += 1;
>  switch (size) {
>  case MO_32:
>  if (h && !is_q) {
> --
> 2.20.1
>



[Qemu-devel] AArch64: fcmla decoding errors

2019-01-28 Thread Laurent Desnogues
Hello,

I found two issues with FCMLA decoding in AArch64 disas_simd_indexed:

- FCMLA is not available as a scalar instruction
- size is wrongly shifted by one instead of being incremented by one.

Thanks,

Laurent



Re: [Qemu-devel] [PATCH 6/7] target/arm/translate-a64: Don't underdecode FP insns

2019-01-28 Thread Laurent Desnogues
On Fri, Jan 25, 2019 at 7:26 PM Peter Maydell  wrote:
>
> In the encoding groups
>  * floating-point data-processing (1 source)
>  * floating-point data-processing (2 source)
>  * floating-point data-processing (3 source)
>  * floating-point immediate
>  * floating-point compare
>  * floating-ponit conditional compare
>  * floating-point conditional select
>
>
> bit 31 is M and bit 29 is S (and bit 30 is 0, already checked at
> this point in the decode). None of these groups allocate any
> encoding for M=1 or S=1. We checked this in disas_fp_compare(),
> disas_fp_ccomp() and disas_fp_csel(), but missed it in disas_fp_1src(),
> disas_fp_2src(), disas_fp_3src() and disas_fp_imm().
>
> We also missed that in the fp immediate encoding the imm5 field
> must be all zeroes.
>
> Correctly UNDEF the unallocated encodings here.
>
> Reported-by: Laurent Desnogues 
> Signed-off-by: Peter Maydell 

Reviewed-by: Laurent Desnogues 

Thanks,

Laurent

> ---
>  target/arm/translate-a64.c | 22 +-
>  1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index efd2f6490b5..474d9bfb5f0 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -5636,11 +5636,17 @@ static void handle_fp_fcvt(DisasContext *s, int 
> opcode,
>   */
>  static void disas_fp_1src(DisasContext *s, uint32_t insn)
>  {
> +int mos = extract32(insn, 29, 3);
>  int type = extract32(insn, 22, 2);
>  int opcode = extract32(insn, 15, 6);
>  int rn = extract32(insn, 5, 5);
>  int rd = extract32(insn, 0, 5);
>
> +if (mos) {
> +unallocated_encoding(s);
> +return;
> +}
> +
>  switch (opcode) {
>  case 0x4: case 0x5: case 0x7:
>  {
> @@ -5867,13 +5873,14 @@ static void handle_fp_2src_half(DisasContext *s, int 
> opcode,
>   */
>  static void disas_fp_2src(DisasContext *s, uint32_t insn)
>  {
> +int mos = extract32(insn, 29, 3);
>  int type = extract32(insn, 22, 2);
>  int rd = extract32(insn, 0, 5);
>  int rn = extract32(insn, 5, 5);
>  int rm = extract32(insn, 16, 5);
>  int opcode = extract32(insn, 12, 4);
>
> -if (opcode > 8) {
> +if (opcode > 8 || mos) {
>  unallocated_encoding(s);
>  return;
>  }
> @@ -6028,6 +6035,7 @@ static void handle_fp_3src_half(DisasContext *s, bool 
> o0, bool o1,
>   */
>  static void disas_fp_3src(DisasContext *s, uint32_t insn)
>  {
> +int mos = extract32(insn, 29, 3);
>  int type = extract32(insn, 22, 2);
>  int rd = extract32(insn, 0, 5);
>  int rn = extract32(insn, 5, 5);
> @@ -6036,6 +6044,11 @@ static void disas_fp_3src(DisasContext *s, uint32_t 
> insn)
>  bool o0 = extract32(insn, 15, 1);
>  bool o1 = extract32(insn, 21, 1);
>
> +if (mos) {
> +unallocated_encoding(s);
> +return;
> +}
> +
>  switch (type) {
>  case 0:
>  if (!fp_access_check(s)) {
> @@ -6105,12 +6118,19 @@ uint64_t vfp_expand_imm(int size, uint8_t imm8)
>  static void disas_fp_imm(DisasContext *s, uint32_t insn)
>  {
>  int rd = extract32(insn, 0, 5);
> +int imm5 = extract32(insn, 5, 5);
>  int imm8 = extract32(insn, 13, 8);
>  int type = extract32(insn, 22, 2);
> +int mos = extract32(insn, 29, 3);
>  uint64_t imm;
>  TCGv_i64 tcg_res;
>  TCGMemOp sz;
>
> +if (mos || imm5) {
> +unallocated_encoding(s);
> +return;
> +}
> +
>  switch (type) {
>  case 0:
>  sz = MO_32;
> --
> 2.20.1
>



Re: [Qemu-devel] [PATCH 5/7] target/arm/translate-a64: Don't underdecode add/sub extended register

2019-01-28 Thread Laurent Desnogues
On Fri, Jan 25, 2019 at 7:26 PM Peter Maydell  wrote:
>
> In the "add/subtract (extended register)" encoding group, the "opt"
> field in bits [23:22] must be zero. Correctly UNDEF the unallocated
> encodings where this field is not zero.
>
> Reported-by: Laurent Desnogues 
> Signed-off-by: Peter Maydell 
> ---
>  target/arm/translate-a64.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 2cade64ed25..efd2f6490b5 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -4204,12 +4204,13 @@ static void disas_add_sub_ext_reg(DisasContext *s, 
> uint32_t insn)
>  bool setflags = extract32(insn, 29, 1);
>  bool sub_op = extract32(insn, 30, 1);
>  bool sf = extract32(insn, 31, 1);
> +bool opt = extract32(insn, 22, 2);

I'd prefer an int to a bool.

Otherwise:

Reviewed-by: Laurent Desnogues 

Thanks,

Laurent

>  TCGv_i64 tcg_rm, tcg_rn; /* temps */
>  TCGv_i64 tcg_rd;
>  TCGv_i64 tcg_result;
>
> -if (imm3 > 4) {
> +if (imm3 > 4 || opt != 0) {
>  unallocated_encoding(s);
>  return;
>  }
> --
> 2.20.1
>



Re: [Qemu-devel] [PATCH 7/7] target/arm/translate-a64: Don't underdecode SDOT and UDOT

2019-01-28 Thread Laurent Desnogues
On Fri, Jan 25, 2019 at 7:26 PM Peter Maydell  wrote:
>
> In the AdvSIMD scalar x indexed element and vector x indexed element
> encoding group, the SDOT and UDOT instructions are vector only,
> and their opcode is unallocated in the scalar group. Correctly
> UNDEF this unallocated encoding.
>
> Reported-by: Laurent Desnogues 
> Signed-off-by: Peter Maydell 

Reviewed-by: Laurent Desnogues 

Thanks,

Laurent

> ---
>  target/arm/translate-a64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 474d9bfb5f0..30bc2412fc0 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -12641,7 +12641,7 @@ static void disas_simd_indexed(DisasContext *s, 
> uint32_t insn)
>  break;
>  case 0x0e: /* SDOT */
>  case 0x1e: /* UDOT */
> -if (size != MO_32 || !dc_isar_feature(aa64_dp, s)) {
> +if (is_scalar || size != MO_32 || !dc_isar_feature(aa64_dp, s)) {
>  unallocated_encoding(s);
>  return;
>  }
> --
> 2.20.1
>



Re: [Qemu-devel] [PATCH 4/7] target/arm/translate-a64: Don't underdecode SIMD ld/st single

2019-01-28 Thread Laurent Desnogues
On Fri, Jan 25, 2019 at 7:26 PM Peter Maydell  wrote:
>
> In the AdvSIMD load/store single structure encodings, the
> non-post-indexed case should have zeroes in [20:16] (which is the
> Rm field for the post-indexed case). Bit 31 must also be zero
> (a check we got right in ldst_multiple but not here). Correctly
> UNDEF these unallocated encodings.
>
> Reported-by: Laurent Desnogues 
> Signed-off-by: Peter Maydell 

Reviewed-by: Laurent Desnogues 

Thanks,

Laurent

> ---
>  target/arm/translate-a64.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index c1f0cad7691..2cade64ed25 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -3409,6 +3409,7 @@ static void disas_ldst_single_struct(DisasContext *s, 
> uint32_t insn)
>  {
>  int rt = extract32(insn, 0, 5);
>  int rn = extract32(insn, 5, 5);
> +int rm = extract32(insn, 16, 5);
>  int size = extract32(insn, 10, 2);
>  int S = extract32(insn, 12, 1);
>  int opc = extract32(insn, 13, 3);
> @@ -3424,6 +3425,15 @@ static void disas_ldst_single_struct(DisasContext *s, 
> uint32_t insn)
>  int ebytes, xs;
>  TCGv_i64 tcg_addr, tcg_rn, tcg_ebytes;
>
> +if (extract32(insn, 31, 1)) {
> +unallocated_encoding(s);
> +return;
> +}
> +if (!is_postidx && rm != 0) {
> +unallocated_encoding(s);
> +return;
> +}
> +
>  switch (scale) {
>  case 3:
>  if (!is_load || S) {
> @@ -3501,7 +3511,6 @@ static void disas_ldst_single_struct(DisasContext *s, 
> uint32_t insn)
>  }
>
>  if (is_postidx) {
> -int rm = extract32(insn, 16, 5);
>  if (rm == 31) {
>  tcg_gen_mov_i64(tcg_rn, tcg_addr);
>  } else {
> --
> 2.20.1
>



Re: [Qemu-devel] [PATCH 2/7] target/arm/translate-a64: Don't underdecode PRFM

2019-01-28 Thread Laurent Desnogues
On Fri, Jan 25, 2019 at 7:26 PM Peter Maydell  wrote:
>
> The PRFM prefetch insn in the load/store with imm9 encodings
> requires idx field 0b00; we were underdecoding this by
> only checking !is_unpriv (which is equivalent to idx != 2).
> Correctly UNDEF the unallocated encodings where idx == 0b01
> and 0b11 as well as 0b10.
>
> Reported-by: Laurent Desnogues 
> Signed-off-by: Peter Maydell 

Reviewed-by: Laurent Desnogues 

Thanks,

Laurent

> ---
>  target/arm/translate-a64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index e6df303e321..8e081758e03 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -2803,7 +2803,7 @@ static void disas_ldst_reg_imm9(DisasContext *s, 
> uint32_t insn,
>  } else {
>  if (size == 3 && opc == 2) {
>  /* PRFM - prefetch */
> -if (is_unpriv) {
> +if (idx != 0) {
>  unallocated_encoding(s);
>  return;
>  }
> --
> 2.20.1
>



Re: [Qemu-devel] [PATCH 3/7] target/arm/translate-a64: Don't underdecode SIMD ld/st multiple

2019-01-28 Thread Laurent Desnogues
On Fri, Jan 25, 2019 at 7:26 PM Peter Maydell  wrote:
>
> In the AdvSIMD load/store multiple structures encodings,
> the non-post-indexed case should have zeroes in [20:16]
> (which is the Rm field for the post-indexed case).
> Correctly UNDEF the currently unallocated encodings which
> have non-zeroes in those bits.
>
> Reported-by: Laurent Desnogues 
> Signed-off-by: Peter Maydell 

Reviewed-by: Laurent Desnogues 

Thanks,

Laurent

> ---
>  target/arm/translate-a64.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 8e081758e03..c1f0cad7691 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -3249,6 +3249,7 @@ static void disas_ldst_multiple_struct(DisasContext *s, 
> uint32_t insn)
>  {
>  int rt = extract32(insn, 0, 5);
>  int rn = extract32(insn, 5, 5);
> +int rm = extract32(insn, 16, 5);
>  int size = extract32(insn, 10, 2);
>  int opcode = extract32(insn, 12, 4);
>  bool is_store = !extract32(insn, 22, 1);
> @@ -3268,6 +3269,11 @@ static void disas_ldst_multiple_struct(DisasContext 
> *s, uint32_t insn)
>  return;
>  }
>
> +if (!is_postidx && rm != 0) {
> +unallocated_encoding(s);
> +return;
> +}
> +
>  /* From the shared decode logic */
>  switch (opcode) {
>  case 0x0:
> @@ -3367,7 +3373,6 @@ static void disas_ldst_multiple_struct(DisasContext *s, 
> uint32_t insn)
>  }
>
>  if (is_postidx) {
> -int rm = extract32(insn, 16, 5);
>  if (rm == 31) {
>  tcg_gen_mov_i64(tcg_rn, tcg_addr);
>  } else {
> --
> 2.20.1
>



Re: [Qemu-devel] [PATCH 1/7] target/arm/translate-a64: Don't underdecode system instructions

2019-01-28 Thread Laurent Desnogues
On Fri, Jan 25, 2019 at 7:26 PM Peter Maydell  wrote:
>
> The "system instructions" and "system register move" subcategories
> of "branches, exception generating and system instructions" for A64
> only apply if bits [23:22] are zero; other values are currently
> unallocated. Correctly UNDEF these unallocated encodings.
>
> Reported-by: Laurent Desnogues 
> Signed-off-by: Peter Maydell 

Reviewed-by: Laurent Desnogues 

Thanks,

Laurent

> ---
>  target/arm/translate-a64.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 4d28a27c3bd..e6df303e321 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -2144,7 +2144,11 @@ static void disas_b_exc_sys(DisasContext *s, uint32_t 
> insn)
>  break;
>  case 0x6a: /* Exception generation / System */
>  if (insn & (1 << 24)) {
> -disas_system(s, insn);
> +if (extract32(insn, 22, 2) == 0) {
> +disas_system(s, insn);
> +} else {
> +unallocated_encoding(s);
> +}
>  } else {
>  disas_exc(s, insn);
>  }
> --
> 2.20.1
>



[Qemu-devel] AArch64: some missed undefined instructions

2019-01-24 Thread Laurent Desnogues
Hello,

I did exhaustive comparisons against latest binutils and found the
following undefined instructions that QEMU fails to flag:

- in disas_b_exc_sys, before calling disas_system bits [23:22] should
be checked to be 0

- in disas_ldst_reg_imm9, PRFM is wrongly detected:  PRFM is for idx =
0, not for is_unpriv, the rest being undefined

- in disas_ldst_multiple_struct, if the instruction is not
post-indexed, then bits [20:16] should be checked to be 0

- in disas_ldst_single_struct, if the instruction is not post-indexed,
then bits [20:16] should be checked to be 0;  also bit [31] should be
0

- in disas_add_sub_ext_reg, bits [23:22] should be checked to be 0

- in disas_data_proc_1src, there's a missing default that would flag
undefined instructions

- in disas_fp_1src, disas_fp_2src, disas_fp_3src, and disas_fp_imm
bits,  [31:29] should be checked to be 0

- in disas_fp_imm, bits [9:5] should be checked to be 0

- in disas_simd_indexed, SDOT and UDOT are not scalar instructions.

That's all I found.  I hope I didn't make any transcription error :-)

Thanks,

Laurent



Re: [Qemu-devel] [RFC PATCH] fpu: add compile time check for old glibc/libm and fma

2018-12-21 Thread Laurent Desnogues
Hi,

On Thu, Dec 20, 2018 at 12:10 PM Alex Bennée  wrote:
>
> Some versions of glibc have been reported to have problems with
> fused-multiply-accumulate operations. If the underlying fma
> implementation does a two step operation it will instroduce subtle
> rounding errors. Newer versions of the library seem to deal with this
> better and modern hardware has fused operations which the library can
> use.

Thanks for taking care of this issue.  The fix was tested on our
machines and it works as expected.

So:

Tested-by: Laurent Desnogues 

> Reported-by: Laurent Desnogues 
> Signed-off-by: Alex Bennée 
> Cc: Emilio G. Cota 
> ---
>  fpu/softfloat.c | 25 +
>  1 file changed, 25 insertions(+)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 59eac97d10..9c2dbd04b5 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -203,6 +203,25 @@ GEN_INPUT_FLUSH3(float64_input_flush3, float64)
>  # define QEMU_HARDFLOAT_3F64_USE_FP 0
>  #endif
>
> +/*
> + * Choose whether to accelerate fused multiply-accumulate operations
> + * with hard float functions. Some versions of glibc's maths library
> + * have been reported to be broken on x86 without FMA instructions.
> + */
> +#if defined(__x86_64__)
> +/* this was actually reported as glibc-2.12-1.149.el6_6.5.x86_64 was
> + * broken but glibc 2.12-1.209 works but out of caution lets disable
> + * for all older glibcs.
> + */
> +#if defined(__GLIBC__) && (__GLIBC__ == 2 && __GLIBC_MINOR__ <= 12)
> +#define QEMU_HARDFLOAT_USE_FMA 0
> +#else
> +#define QEMU_HARDFLOAT_USE_FMA 1
> +#endif
> +#else
> +#define QEMU_HARDFLOAT_USE_FMA 1
> +#endif
> +
>  /*
>   * QEMU_HARDFLOAT_USE_ISINF chooses whether to use isinf() over
>   * float{32,64}_is_infinity when !USE_FP.
> @@ -1551,6 +1570,9 @@ float32_muladd(float32 xa, float32 xb, float32 xc, int 
> flags, float_status *s)
>  ub.s = xb;
>  uc.s = xc;
>
> +if (!QEMU_HARDFLOAT_USE_FMA) {
> +goto soft;
> +}
>  if (unlikely(!can_use_fpu(s))) {
>  goto soft;
>  }
> @@ -1612,6 +1634,9 @@ float64_muladd(float64 xa, float64 xb, float64 xc, int 
> flags, float_status *s)
>  ub.s = xb;
>  uc.s = xc;
>
> +if (!QEMU_HARDFLOAT_USE_FMA) {
> +goto soft;
> +}
>  if (unlikely(!can_use_fpu(s))) {
>  goto soft;
>  }
> --
> 2.17.1
>



Re: [Qemu-devel] [PATCH 00/12] tcg: Improve register allocation for calls

2018-11-29 Thread Laurent Desnogues
On Fri, Nov 30, 2018 at 4:00 AM Emilio G. Cota  wrote:
>
> On Thu, Nov 29, 2018 at 19:39:15 -0500, Emilio G. Cota wrote:
> > A64 and POWER9 host numbers:
> >
> >   https://imgur.com/a/m6Pss99
> >
> > There's quite a bit of noise in the P9 measurements, but it's
> > a shared machine so I can't do much about that.
> >
> > I'll update the A64 results with error bars later tonight,
> > when I get further results.
>
> Here they are:
>
>   https://imgur.com/a/EAAapSW

What is a X-Gene A57? It's either X-Gene or A57 :-)

Thanks,

Laurent

> The second image is the same results, but zoomed in. I could
> bring the confidence intervals down by running this many times,
> but each run takes 2h and I only have access to the
> machine for a few hours at a time.
>
> Those confidence intervals are generated from only 2 runs per benchmark,
> which explains why they're so large.
>
> E.
>



Re: [Qemu-devel] [PATCH v1 10/12] target/arm: Add the Cortex-A72

2018-10-09 Thread Laurent Desnogues
Hello,

On Tue, Oct 9, 2018 at 3:19 PM Edgar E. Iglesias
 wrote:
>
> Another A72 related thing I wanted to check with you. A month or two ago I was
> looking at an issue with Linux running very slowly on our models.
> Something that popped up was that Linux was running a couple of spectre 
> related
> "workarounds" and "hardening" sequences on the QEMU A72s.
>
> There are a couple of bits in the ID_AARCH64_PFR0 register that
> Linux checks before enabling the sequences but I never found any
> documentation of them in the specs. Bits 56 and 60.
>
> In Linux these are refered to as:
> ID_AA64PFR0_CSV2_SHIFT
> ID_AA64PFR0_CSV3_SHIFT
>
> This is what we have in our tree:
>
> cpu->gic_vprebits = 5;
> define_arm_cp_regs(cpu, cortex_a57_a53_cp_reginfo);
>
> /* Xilinx FIXUPs.  */
> /* These indicate the BP hardening and KPTI aren't needed.  */
> cpu->id_aa64pfr0 |= (uint64_t)1 << 56; /* BP.  */
> cpu->id_aa64pfr0 |= (uint64_t)1 << 60; /* KPTI.  */
> }
>
> Do you know what these are?
> Should we be setting these in QEMU?

These fields are publicly documented in the system register specification:

https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools

These are ARMv8.5 fields, I don't think these should be set by default
for Cortex-A72. Of course nothing prevents you from defining a
specific CPU with these fields set to boot faster :-)  Or perhaps add
a property to override the default value of these registers?


Laurent



Re: [Qemu-devel] [PATCH 07/20] target/arm: Fix is_a64 for user-only

2018-08-17 Thread Laurent Desnogues
Hello,

On Fri, Aug 17, 2018 at 6:04 PM Peter Maydell  wrote:
>
> On 9 August 2018 at 05:21, Richard Henderson
>  wrote:
> > Saves about 8k code size in qemu-aarch64.
> >
> > Signed-off-by: Richard Henderson 
> > ---
> >  target/arm/cpu.h | 8 
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > index aedaf2631e..ed51a2f5aa 100644
> > --- a/target/arm/cpu.h
> > +++ b/target/arm/cpu.h
> > @@ -918,7 +918,15 @@ void aarch64_sync_64_to_32(CPUARMState *env);
> >
> >  static inline bool is_a64(CPUARMState *env)
> >  {
> > +#ifdef CONFIG_USER_ONLY
> > +# ifdef TARGET_AARCH64
> > +return true;
> > +# else
> > +return false;
> > +# endif
> > +#else
> >  return env->aarch64;
> > +#endif
> >  }
>
> And again. I don't want to pepper the code with ifdefs if
> we can do the right thing without them.

FWIW I find it more readable with the ifdef's (here and in the
previous patches) and I guess that helps the compiler too.

Thanks,

Laurent



Re: [Qemu-devel] [PATCH 5/6] target/arm: Fix aa64 FCADD and FCMLA decode

2018-08-11 Thread Laurent Desnogues
On Fri, Aug 10, 2018 at 9:31 PM, Richard Henderson
 wrote:
> These insns require u=1; failed to include that in the switch
> cases.  This probably happened during one of the rebases just
> before final commit.
>
> Fixes: d17b7cdcf4e
> Signed-off-by: Richard Henderson 

Reviewed-by: Laurent Desnogues 

Laurent

> ---
>  target/arm/translate-a64.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 4a0ca8c906..8a24278d79 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -11427,12 +11427,12 @@ static void 
> disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
>  }
>  feature = ARM_FEATURE_V8_DOTPROD;
>  break;
> -case 0x8: /* FCMLA, #0 */
> -case 0x9: /* FCMLA, #90 */
> -case 0xa: /* FCMLA, #180 */
> -case 0xb: /* FCMLA, #270 */
> -case 0xc: /* FCADD, #90 */
> -case 0xe: /* FCADD, #270 */
> +case 0x18: /* FCMLA, #0 */
> +case 0x19: /* FCMLA, #90 */
> +case 0x1a: /* FCMLA, #180 */
> +case 0x1b: /* FCMLA, #270 */
> +case 0x1c: /* FCADD, #90 */
> +case 0x1e: /* FCADD, #270 */
>  if (size == 0
>  || (size == 1 && !arm_dc_feature(s, ARM_FEATURE_V8_FP16))
>  || (size == 3 && !is_q)) {
> --
> 2.17.1
>



Re: [Qemu-devel] [PATCH 6/6] softfloat: Fix missing inexact for floating-point add

2018-08-11 Thread Laurent Desnogues
On Fri, Aug 10, 2018 at 9:31 PM, Richard Henderson
 wrote:
> For 0x1.3p+0 + 0x1.ffep+14 = 0x1.0001fffp+15
> we dropped the sticky bit and so failed to raise inexact.
>
> Reported-by: Laurent Desnogues 
> Signed-off-by: Richard Henderson 

Tested-by: Laurent Desnogues 
Reviewed-by: Laurent Desnogues 

Laurent

> ---
>  fpu/softfloat.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 8cd2400081..7d63cffdeb 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -701,7 +701,7 @@ static FloatParts addsub_floats(FloatParts a, FloatParts 
> b, bool subtract,
>  }
>  a.frac += b.frac;
>  if (a.frac & DECOMPOSED_OVERFLOW_BIT) {
> -a.frac >>= 1;
> +shift64RightJamming(a.frac, 1, );
>  a.exp += 1;
>  }
>  return a;
> --
> 2.17.1
>



Re: [Qemu-devel] [PATCH 4/6] target/arm: Use FZ not FZ16 for SVE FCVT single-half and double-half

2018-08-11 Thread Laurent Desnogues
On Fri, Aug 10, 2018 at 9:31 PM, Richard Henderson
 wrote:
> We were using the wrong flush-to-zero bit for the non-half input.
>
> Fixes: 46d33d1e3c9
> Cc: qemu-sta...@nongnu.org (3.0.1)
> Reported-by: Laurent Desnogues 
> Signed-off-by: Richard Henderson 

Tested-by: Laurent Desnogues 
Reviewed-by: Laurent Desnogues 

Laurent

> ---
>  target/arm/translate-sve.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index 05ba0518c8..fe7aebdc19 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -4093,7 +4093,7 @@ static bool do_zpz_ptr(DisasContext *s, int rd, int rn, 
> int pg,
>
>  static bool trans_FCVT_sh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
>  {
> -return do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_fcvt_sh);
> +return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_sh);
>  }
>
>  static bool trans_FCVT_hs(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
> @@ -4103,7 +4103,7 @@ static bool trans_FCVT_hs(DisasContext *s, arg_rpr_esz 
> *a, uint32_t insn)
>
>  static bool trans_FCVT_dh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
>  {
> -return do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_fcvt_dh);
> +return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_dh);
>  }
>
>  static bool trans_FCVT_hd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
> --
> 2.17.1
>



Re: [Qemu-devel] [PATCH 3/6] target/arm: Use fp_status_fp16 for do_fmpa_zpzzz_h

2018-08-11 Thread Laurent Desnogues
On Fri, Aug 10, 2018 at 9:31 PM, Richard Henderson
 wrote:
> This makes float16_muladd correctly use FZ16 not FZ.
>
> Fixes: 6ceabaad110
> Cc: qemu-sta...@nongnu.org (3.0.1)
> Reported-by: Laurent Desnogues 
> Signed-off-by: Richard Henderson 

Tested-by: Laurent Desnogues 
Reviewed-by: Laurent Desnogues 

Laurent

> ---
>  target/arm/sve_helper.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index 5bae600d17..f8434e68f2 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -3398,7 +3398,7 @@ static void do_fmla_zpzzz_h(CPUARMState *env, void *vg, 
> uint32_t desc,
>  e1 = *(uint16_t *)(vn + H1_2(i)) ^ neg1;
>  e2 = *(uint16_t *)(vm + H1_2(i));
>  e3 = *(uint16_t *)(va + H1_2(i)) ^ neg3;
> -r = float16_muladd(e1, e2, e3, 0, >vfp.fp_status);
> +r = float16_muladd(e1, e2, e3, 0, >vfp.fp_status_f16);
>  *(uint16_t *)(vd + H1_2(i)) = r;
>  }
>  } while (i & 63);
> --
> 2.17.1
>



Re: [Qemu-devel] [PATCH 2/6] target/arm: Ignore float_flag_input_denormal from fp_status_f16

2018-08-11 Thread Laurent Desnogues
On Fri, Aug 10, 2018 at 9:31 PM, Richard Henderson
 wrote:
> When FZ is set, input_denormal exceptions are recognized, but this does
> not happen with FZ16.  The softfloat code has no way to distinguish
> these bits and will raise such exceptions into fp_status_f16.flags,
> so ignore them when computing the accumulated flags.
>
> Cc: qemu-sta...@nongnu.org (3.0.1)
> Reported-by: Laurent Desnogues 
> Signed-off-by: Richard Henderson 

Tested-by: Laurent Desnogues 
Reviewed-by: Laurent Desnogues 

Laurent

> ---
>  target/arm/helper.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 452d5e182a..61454a77ec 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -11314,9 +11314,13 @@ uint32_t HELPER(vfp_get_fpscr)(CPUARMState *env)
>  fpscr = (env->vfp.xregs[ARM_VFP_FPSCR] & 0xffc8)
>  | (env->vfp.vec_len << 16)
>  | (env->vfp.vec_stride << 20);
> +
>  i = get_float_exception_flags(>vfp.fp_status);
>  i |= get_float_exception_flags(>vfp.standard_fp_status);
> -i |= get_float_exception_flags(>vfp.fp_status_f16);
> +/* FZ16 does not generate an input denormal exception.  */
> +i |= (get_float_exception_flags(>vfp.fp_status_f16)
> +  & ~float_flag_input_denormal);
> +
>  fpscr |= vfp_exceptbits_from_host(i);
>  return fpscr;
>  }
> --
> 2.17.1
>



Re: [Qemu-devel] [PATCH 1/6] target/arm: Adjust FPCR_MASK for FZ16

2018-08-11 Thread Laurent Desnogues
On Fri, Aug 10, 2018 at 9:31 PM, Richard Henderson
 wrote:
> When support for FZ16 was added, we failed to include the bit
> within FPCR_MASK, which means that it could never be set.
> Continue to zero FZ16 when ARMv8.2-FP16 is not enabled.
>
> Fixes: d81ce0ef2c4
> Cc: qemu-sta...@nongnu.org (3.0.1)
> Reported-by: Laurent Desnogues 
> Signed-off-by: Richard Henderson 

Tested-by: Laurent Desnogues 
Reviewed-by: Laurent Desnogues 

Laurent

> ---
>  target/arm/cpu.h| 2 +-
>  target/arm/helper.c | 5 +
>  2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 33d06f2340..0176716a70 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -1279,7 +1279,7 @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val);
>   * we store the underlying state in fpscr and just mask on read/write.
>   */
>  #define FPSR_MASK 0xf89f
> -#define FPCR_MASK 0x07f79f00
> +#define FPCR_MASK 0x07ff9f00
>
>  #define FPCR_FZ16   (1 << 19)   /* ARMv8.2+, FP16 flush-to-zero */
>  #define FPCR_FZ (1 << 24)   /* Flush-to-zero enable bit */
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 64ff71b722..452d5e182a 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -11351,6 +11351,11 @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, 
> uint32_t val)
>  int i;
>  uint32_t changed;
>
> +/* When ARMv8.2-FP16 is not supported, FZ16 is RES0.  */
> +if (!arm_feature(env, ARM_FEATURE_V8_FP16)) {
> +val &= ~FPCR_FZ16;
> +}
> +
>  changed = env->vfp.xregs[ARM_VFP_FPSCR];
>  env->vfp.xregs[ARM_VFP_FPSCR] = (val & 0xffc8);
>  env->vfp.vec_len = (val >> 16) & 7;
> --
> 2.17.1
>



Re: [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches

2018-08-08 Thread Laurent Desnogues
Hello,

On Thu, Aug 9, 2018 at 6:21 AM, Richard Henderson
 wrote:
> This is my current set of patches for running SVE in system mode.
>
> The first half deal with the system registers that affect SVE.
> I recall that Peter has said he'd like the first patch to be
> done a different way, but we haven't had a chance to talk about
> what form it should take.  I've left it as-is since it does what
> I need for now.
>
> The second half re-implement the SVE memory operations.
> The FF and NF loads had been stubbed out.  Getting those to work
> requires some infrastructure that can be reused to speed up normal
> loads -- one guest-to-host tlb lookup can be reused for the rest
> of the page.

I did not review every patch individually but tested the whole and
found no issue.

Tested-by: Laurent Desnogues 

Thanks,

Laurent

>
> r~
>
>
> Based-on: <20180809034033.10579-1-richard.hender...@linaro.org>
> Richard Henderson (20):
>   target/arm: Set ISAR bits for -cpu max
>   target/arm: Set ID_AA64PFR0 bits for SVE for -cpu max
>   target/arm: Define ID_AA64ZFR0_EL1
>   target/arm: Adjust sve_exception_el
>   target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only
>   target/arm: Fix arm_current_el for user-only
>   target/arm: Fix is_a64 for user-only
>   target/arm: Pass in current_el to fp and sve_exception_el
>   target/arm: Handle SVE vector length changes in system mode
>   target/arm: Adjust aarch64_cpu_dump_state for system mode SVE
>   target/arm: Clear unused predicate bits for LD1RQ
>   target/arm: Rewrite helper_sve_ld1*_r using pages
>   target/arm: Rewrite helper_sve_ld[234]*_r
>   target/arm: Rewrite helper_sve_st[1234]*_r
>   target/arm: Split contiguous loads for endianness
>   target/arm: Split contiguous stores for endianness
>   target/arm: Rewrite vector gather loads
>   target/arm: Rewrite vector gather stores
>   target/arm: Rewrite vector gather first-fault loads
>   target/arm: Pass TCGMemOpIdx to sve memory helpers
>
>  target/arm/cpu.h   |   47 +-
>  target/arm/helper-sve.h|  385 +--
>  target/arm/internals.h |5 +
>  target/arm/cpu.c   |   24 +-
>  target/arm/cpu64.c |   93 +-
>  target/arm/helper.c|  237 +++--
>  target/arm/op_helper.c |1 +
>  target/arm/sve_helper.c| 2062 +---
>  target/arm/translate-a64.c |8 +-
>  target/arm/translate-sve.c |  670 
>  10 files changed, 2453 insertions(+), 1079 deletions(-)
>
> --
> 2.17.1
>



Re: [Qemu-devel] [PATCH 08/11] target/arm: Fix offset scaling for LD_zprr and ST_zprr

2018-08-08 Thread Laurent Desnogues
On Thu, Aug 9, 2018 at 5:40 AM, Richard Henderson
 wrote:
> The scaling should be solely on the memory operation size; the number
> of registers being loaded does not come in to the initial computation.
>
> Cc: qemu-sta...@nongnu.org (3.0.1)
> Reported-by: Laurent Desnogues 
> Signed-off-by: Richard Henderson 

Tested-by: Laurent Desnogues 
Reviewed-by: Laurent Desnogues 

Laurent

> ---
>  target/arm/translate-sve.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index f635822a61..d27bc8c946 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -4665,8 +4665,7 @@ static bool trans_LD_zprr(DisasContext *s, 
> arg_rprr_load *a, uint32_t insn)
>  }
>  if (sve_access_check(s)) {
>  TCGv_i64 addr = new_tmp_a64(s);
> -tcg_gen_muli_i64(addr, cpu_reg(s, a->rm),
> - (a->nreg + 1) << dtype_msz(a->dtype));
> +tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), dtype_msz(a->dtype));
>  tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
>  do_ld_zpa(s, a->rd, a->pg, addr, a->dtype, a->nreg);
>  }
> @@ -4899,7 +4898,7 @@ static bool trans_ST_zprr(DisasContext *s, 
> arg_rprr_store *a, uint32_t insn)
>  }
>  if (sve_access_check(s)) {
>  TCGv_i64 addr = new_tmp_a64(s);
> -tcg_gen_muli_i64(addr, cpu_reg(s, a->rm), (a->nreg + 1) << a->msz);
> +tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), a->msz);
>  tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
>  do_st_zpa(s, a->rd, a->pg, addr, a->msz, a->esz, a->nreg);
>  }
> --
> 2.17.1
>



Re: [Qemu-devel] [PATCH 07/11] target/arm: Fix offset for LD1R instructions

2018-08-08 Thread Laurent Desnogues
On Thu, Aug 9, 2018 at 5:40 AM, Richard Henderson
 wrote:
> The immediate should be scaled by the size of the memory reference,
> not the size of the elements into which it is loaded.
>
> Cc: qemu-sta...@nongnu.org (3.0.1)
> Reported-by: Laurent Desnogues 
> Signed-off-by: Richard Henderson 

Tested-by: Laurent Desnogues 
Reviewed-by: Laurent Desnogues 

Laurent

> ---
>  target/arm/translate-sve.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index 9e63b5f8e5..f635822a61 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -4819,6 +4819,7 @@ static bool trans_LD1R_zpri(DisasContext *s, 
> arg_rpri_load *a, uint32_t insn)
>  unsigned vsz = vec_full_reg_size(s);
>  unsigned psz = pred_full_reg_size(s);
>  unsigned esz = dtype_esz[a->dtype];
> +unsigned msz = dtype_msz(a->dtype);
>  TCGLabel *over = gen_new_label();
>  TCGv_i64 temp;
>
> @@ -4842,7 +4843,7 @@ static bool trans_LD1R_zpri(DisasContext *s, 
> arg_rpri_load *a, uint32_t insn)
>
>  /* Load the data.  */
>  temp = tcg_temp_new_i64();
> -tcg_gen_addi_i64(temp, cpu_reg_sp(s, a->rn), a->imm << esz);
> +tcg_gen_addi_i64(temp, cpu_reg_sp(s, a->rn), a->imm << msz);
>  tcg_gen_qemu_ld_i64(temp, temp, get_mem_index(s),
>  s->be_data | dtype_mop[a->dtype]);
>
> --
> 2.17.1
>



Re: [Qemu-devel] [PATCH 06/11] target/arm: Fix sign-extension in sve do_ldr/do_str

2018-08-08 Thread Laurent Desnogues
On Thu, Aug 9, 2018 at 5:40 AM, Richard Henderson
 wrote:
> The expression (int) imm + (uint32_t) len_align turns into uint32_t
> and thus with negative imm produces a memory operation at the wrong
> offset.  None of the numbers involved are particularly large, so
> change everything to use int.
>
> Cc: qemu-sta...@nongnu.org (3.0.1)
> Reported-by: Laurent Desnogues 
> Signed-off-by: Richard Henderson 

Tested-by: Laurent Desnogues 
Reviewed-by: Laurent Desnogues 

Laurent

> ---
>  target/arm/translate-sve.c | 18 --
>  1 file changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index 89efc80ee7..9e63b5f8e5 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -4372,12 +4372,11 @@ static bool trans_UCVTF_dd(DisasContext *s, 
> arg_rpr_esz *a, uint32_t insn)
>   * The load should begin at the address Rn + IMM.
>   */
>
> -static void do_ldr(DisasContext *s, uint32_t vofs, uint32_t len,
> -   int rn, int imm)
> +static void do_ldr(DisasContext *s, uint32_t vofs, int len, int rn, int imm)
>  {
> -uint32_t len_align = QEMU_ALIGN_DOWN(len, 8);
> -uint32_t len_remain = len % 8;
> -uint32_t nparts = len / 8 + ctpop8(len_remain);
> +int len_align = QEMU_ALIGN_DOWN(len, 8);
> +int len_remain = len % 8;
> +int nparts = len / 8 + ctpop8(len_remain);
>  int midx = get_mem_index(s);
>  TCGv_i64 addr, t0, t1;
>
> @@ -4458,12 +4457,11 @@ static void do_ldr(DisasContext *s, uint32_t vofs, 
> uint32_t len,
>  }
>
>  /* Similarly for stores.  */
> -static void do_str(DisasContext *s, uint32_t vofs, uint32_t len,
> -   int rn, int imm)
> +static void do_str(DisasContext *s, uint32_t vofs, int len, int rn, int imm)
>  {
> -uint32_t len_align = QEMU_ALIGN_DOWN(len, 8);
> -uint32_t len_remain = len % 8;
> -uint32_t nparts = len / 8 + ctpop8(len_remain);
> +int len_align = QEMU_ALIGN_DOWN(len, 8);
> +int len_remain = len % 8;
> +int nparts = len / 8 + ctpop8(len_remain);
>  int midx = get_mem_index(s);
>  TCGv_i64 addr, t0;
>
> --
> 2.17.1
>



Re: [Qemu-devel] [PATCH] tcg/optimize: Do not skip default processing of dup_vec

2018-08-05 Thread Laurent Desnogues
Hello,

On Mon, Aug 6, 2018 at 1:32 AM, Richard Henderson
 wrote:
> If we do not opimize away dup_vec, we must mark its output as changed.
>
> Fixes: 170ba88f45b
> Reported-by: Laurent Desnogues 
> Signed-off-by: Richard Henderson 

Reviewed-by: Laurent Desnogues 
Tested-by: Laurent Desnogues 

Thanks,

Laurent

> ---
>  tcg/optimize.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index d4ea67e541..5dbe11c3c8 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -1094,9 +1094,9 @@ void tcg_optimize(TCGContext *s)
>  tmp = arg_info(op->args[1])->val;
>  tmp = dup_const(TCGOP_VECE(op), tmp);
>  tcg_opt_gen_movi(s, op, op->args[0], tmp);
> -continue;
> +break;
>  }
> -break;
> +goto do_default;
>
>  CASE_OP_32_64(not):
>  CASE_OP_32_64(neg):
> --
> 2.17.1
>



[Qemu-devel] TCG dup_vec optimization issue

2018-08-02 Thread Laurent Desnogues
Hello Richard,

the case INDEX_op_dup_vec in optimize.c:tcg_optimize is doing a break
when the input arg is not constant instead of jumping to do_default.

The issue can be reproduced by compiling the following SVE code:

.global main
.section .text

main:
mov z2.d, #0
decpz1.d, p0
ret

When run with SVE vector length set to 128-bit, we get this before TCG opt:

  00400524  
 dupi_vec v128,e64,tmp2,$0x0
 st_vec v128,e8,tmp2,env,$0xb00

  00400528  
 ld_i64 tmp3,env,$0x2900
 movi_i64 tmp4,$0x101
 and_i64 tmp3,tmp3,tmp4
 ctpop_i64 tmp3,tmp3
 dup_vec v128,e64,tmp2,tmp3
 ld_vec v128,e8,tmp5,env,$0xa00
 sub_vec v128,e64,tmp6,tmp5,tmp2
 st_vec v128,e8,tmp6,env,$0xa00

After the optimization pass:

   00400524  
 dupi_vec v128,e64,tmp2,$0x0
 st_vec v128,e8,tmp2,env,$0xb00   dead: 0

  00400528  
 ld_vec v128,e8,tmp5,env,$0xa00
 mov_vec v128,e64,tmp6,tmp5   dead: 1
 st_vec v128,e8,tmp6,env,$0xa00   dead: 0

The dupi_vec TCG op correctly flags tmp2 as a 0 constant. This is then
wrongly propagated down to the sub_vec TCG op despite tmp2 being
overwritten by a dup_vec TCG op.

Thanks,

Laurent



Re: [Qemu-devel] [PATCH 2/4] target/arm: Fix typo in do_sat_addsub_64

2018-08-01 Thread Laurent Desnogues
On Wed, Aug 1, 2018 at 2:31 PM, Richard Henderson
 wrote:
> Used the wrong temporary in the computation of subtractive overflow.
>
> Reported-by: Laurent Desnogues 
> Signed-off-by: Richard Henderson 

Reviewed-by: Laurent Desnogues 
Tested-by: Laurent Desnogues 


Laurent

> ---
>  target/arm/translate-sve.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index 374051cd20..9dd4c38bab 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -1625,7 +1625,7 @@ static void do_sat_addsub_64(TCGv_i64 reg, TCGv_i64 
> val, bool u, bool d)
>  /* Detect signed overflow for subtraction.  */
>  tcg_gen_xor_i64(t0, reg, val);
>  tcg_gen_sub_i64(t1, reg, val);
> -tcg_gen_xor_i64(reg, reg, t0);
> +tcg_gen_xor_i64(reg, reg, t1);
>  tcg_gen_and_i64(t0, t0, reg);
>
>  /* Bound the result.  */
> --
> 2.17.1
>



Re: [Qemu-devel] [PATCH 0/4] target/arm sve fixes

2018-08-01 Thread Laurent Desnogues
On Wed, Aug 1, 2018 at 2:31 PM, Richard Henderson
 wrote:
> These four patches are minor, reported by Laurent this week.

This fixes the reported issues, thanks!

> If there happens to be an -rc4 release, it would be nice if
> they were included.  But if not, no biggie.  I suspect that
> other minor issues will be found past these four, so I would
> hope everyone who cares about sve is just as happy working
> from a branch anyway.

I indeed still have some things to look at.


Laurent

>
> r~
>
>
> Richard Henderson (4):
>   target/arm: Fix sign of sve_cmpeq_ppzw/sve_cmpne_ppzw
>   target/arm: Fix typo in do_sat_addsub_64
>   target/arm: Reorganize SVE WHILE
>   target/arm: Fix typo in helper_sve_movz_d
>
>  target/arm/sve_helper.c| 19 ++
>  target/arm/translate-sve.c | 51 --
>  2 files changed, 40 insertions(+), 30 deletions(-)
>
> --
> 2.17.1
>



Re: [Qemu-devel] [PATCH 3/4] target/arm: Reorganize SVE WHILE

2018-08-01 Thread Laurent Desnogues
On Wed, Aug 1, 2018 at 2:31 PM, Richard Henderson
 wrote:
> The pseudocode for this operation is an increment + compare loop,
> so comparing <= the maximum integer produces an all-true predicate.
>
> Rather than bound in both the inline code and the helper, pass the
> helper the number of predicate bits to set instead of the number
> of predicate elements to set.
>
> Reported-by: Laurent Desnogues 
> Signed-off-by: Richard Henderson 

Reviewed-by: Laurent Desnogues 
Tested-by: Laurent Desnogues 


Laurent

> ---
>  target/arm/sve_helper.c|  5 
>  target/arm/translate-sve.c | 49 +-
>  2 files changed, 32 insertions(+), 22 deletions(-)
>
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index 9bd0694d55..87594a8adb 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -2846,11 +2846,6 @@ uint32_t HELPER(sve_while)(void *vd, uint32_t count, 
> uint32_t pred_desc)
>  return flags;
>  }
>
> -/* Scale from predicate element count to bits.  */
> -count <<= esz;
> -/* Bound to the bits in the predicate.  */
> -count = MIN(count, oprsz * 8);
> -
>  /* Set all of the requested bits.  */
>  for (i = 0; i < count / 64; ++i) {
>  d->p[i] = esz_mask;
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index 9dd4c38bab..89efc80ee7 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -3173,19 +3173,19 @@ static bool trans_CTERM(DisasContext *s, arg_CTERM 
> *a, uint32_t insn)
>
>  static bool trans_WHILE(DisasContext *s, arg_WHILE *a, uint32_t insn)
>  {
> -if (!sve_access_check(s)) {
> -return true;
> -}
> -
> -TCGv_i64 op0 = read_cpu_reg(s, a->rn, 1);
> -TCGv_i64 op1 = read_cpu_reg(s, a->rm, 1);
> -TCGv_i64 t0 = tcg_temp_new_i64();
> -TCGv_i64 t1 = tcg_temp_new_i64();
> +TCGv_i64 op0, op1, t0, t1, tmax;
>  TCGv_i32 t2, t3;
>  TCGv_ptr ptr;
>  unsigned desc, vsz = vec_full_reg_size(s);
>  TCGCond cond;
>
> +if (!sve_access_check(s)) {
> +return true;
> +}
> +
> +op0 = read_cpu_reg(s, a->rn, 1);
> +op1 = read_cpu_reg(s, a->rm, 1);
> +
>  if (!a->sf) {
>  if (a->u) {
>  tcg_gen_ext32u_i64(op0, op0);
> @@ -3198,32 +3198,47 @@ static bool trans_WHILE(DisasContext *s, arg_WHILE 
> *a, uint32_t insn)
>
>  /* For the helper, compress the different conditions into a computation
>   * of how many iterations for which the condition is true.
> - *
> - * This is slightly complicated by 0 <= UINT64_MAX, which is nominally
> - * 2**64 iterations, overflowing to 0.  Of course, predicate registers
> - * aren't that large, so any value >= predicate size is sufficient.
>   */
> +t0 = tcg_temp_new_i64();
> +t1 = tcg_temp_new_i64();
>  tcg_gen_sub_i64(t0, op1, op0);
>
> -/* t0 = MIN(op1 - op0, vsz).  */
> -tcg_gen_movi_i64(t1, vsz);
> -tcg_gen_umin_i64(t0, t0, t1);
> +tmax = tcg_const_i64(vsz >> a->esz);
>  if (a->eq) {
>  /* Equality means one more iteration.  */
>  tcg_gen_addi_i64(t0, t0, 1);
> +
> +/* If op1 is max (un)signed integer (and the only time the addition
> + * above could overflow), then we produce an all-true predicate by
> + * setting the count to the vector length.  This is because the
> + * pseudocode is described as an increment + compare loop, and the
> + * max integer would always compare true.
> + */
> +tcg_gen_movi_i64(t1, (a->sf
> +  ? (a->u ? UINT64_MAX : INT64_MAX)
> +  : (a->u ? UINT32_MAX : INT32_MAX)));
> +tcg_gen_movcond_i64(TCG_COND_EQ, t0, op1, t1, tmax, t0);
>  }
>
> -/* t0 = (condition true ? t0 : 0).  */
> +/* Bound to the maximum.  */
> +tcg_gen_umin_i64(t0, t0, tmax);
> +tcg_temp_free_i64(tmax);
> +
> +/* Set the count to zero if the condition is false.  */
>  cond = (a->u
>  ? (a->eq ? TCG_COND_LEU : TCG_COND_LTU)
>  : (a->eq ? TCG_COND_LE : TCG_COND_LT));
>  tcg_gen_movi_i64(t1, 0);
>  tcg_gen_movcond_i64(cond, t0, op0, op1, t0, t1);
> +tcg_temp_free_i64(t1);
>
> +/* Since we're bounded, pass as a 32-bit type.  */
>  t2 = tcg_temp_new_i32();
>  tcg_gen_extrl_i64_i32(t2, t0);
>  tcg_temp_free_i64(t0);
> -tcg_temp_free_i64(t1);
> +
> +/* Scale elements to bits.  */
> +tcg_gen_shli_i32(t2, t2, a->esz);
>
>  desc = (vsz / 8) - 2;
>  desc = deposit32(desc, SIMD_DATA_SHIFT, 2, a->esz);
> --
> 2.17.1
>



Re: [Qemu-devel] [PATCH 1/4] target/arm: Fix sign of sve_cmpeq_ppzw/sve_cmpne_ppzw

2018-08-01 Thread Laurent Desnogues
On Wed, Aug 1, 2018 at 2:31 PM, Richard Henderson
 wrote:
> The normal vector element is sign-extended before
> comparing with the wide vector element.
>
> Reported-by: Laurent Desnogues 
> Signed-off-by: Richard Henderson 

Reviewed-by: Laurent Desnogues 
Tested-by: Laurent Desnogues 


Laurent

> ---
>  target/arm/sve_helper.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index 54795c9194..9bd0694d55 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -2436,13 +2436,13 @@ uint32_t HELPER(NAME)(void *vd, void *vn, void *vm, 
> void *vg, uint32_t desc) \
>  #define DO_CMP_PPZW_S(NAME, TYPE, TYPEW, OP) \
>  DO_CMP_PPZW(NAME, TYPE, TYPEW, OP, H1_4, 0xull)
>
> -DO_CMP_PPZW_B(sve_cmpeq_ppzw_b, uint8_t,  uint64_t, ==)
> -DO_CMP_PPZW_H(sve_cmpeq_ppzw_h, uint16_t, uint64_t, ==)
> -DO_CMP_PPZW_S(sve_cmpeq_ppzw_s, uint32_t, uint64_t, ==)
> +DO_CMP_PPZW_B(sve_cmpeq_ppzw_b, int8_t,  uint64_t, ==)
> +DO_CMP_PPZW_H(sve_cmpeq_ppzw_h, int16_t, uint64_t, ==)
> +DO_CMP_PPZW_S(sve_cmpeq_ppzw_s, int32_t, uint64_t, ==)
>
> -DO_CMP_PPZW_B(sve_cmpne_ppzw_b, uint8_t,  uint64_t, !=)
> -DO_CMP_PPZW_H(sve_cmpne_ppzw_h, uint16_t, uint64_t, !=)
> -DO_CMP_PPZW_S(sve_cmpne_ppzw_s, uint32_t, uint64_t, !=)
> +DO_CMP_PPZW_B(sve_cmpne_ppzw_b, int8_t,  uint64_t, !=)
> +DO_CMP_PPZW_H(sve_cmpne_ppzw_h, int16_t, uint64_t, !=)
> +DO_CMP_PPZW_S(sve_cmpne_ppzw_s, int32_t, uint64_t, !=)
>
>  DO_CMP_PPZW_B(sve_cmpgt_ppzw_b, int8_t,   int64_t, >)
>  DO_CMP_PPZW_H(sve_cmpgt_ppzw_h, int16_t,  int64_t, >)
> --
> 2.17.1
>



Re: [Qemu-devel] [PATCH 4/4] target/arm: Fix typo in helper_sve_movz_d

2018-08-01 Thread Laurent Desnogues
On Wed, Aug 1, 2018 at 2:31 PM, Richard Henderson
 wrote:
> Reported-by: Laurent Desnogues 
> Signed-off-by: Richard Henderson 

Reviewed-by: Laurent Desnogues 
Tested-by: Laurent Desnogues 


Laurent

> ---
>  target/arm/sve_helper.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index 87594a8adb..c3cbec9cf5 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -1042,7 +1042,7 @@ void HELPER(sve_movz_d)(void *vd, void *vn, void *vg, 
> uint32_t desc)
>  uint64_t *d = vd, *n = vn;
>  uint8_t *pg = vg;
>  for (i = 0; i < opr_sz; i += 1) {
> -d[i] = n[1] & -(uint64_t)(pg[H1(i)] & 1);
> +d[i] = n[i] & -(uint64_t)(pg[H1(i)] & 1);
>  }
>  }
>
> --
> 2.17.1
>



[Qemu-devel] ARM: SVE movprfx issue

2018-08-01 Thread Laurent Desnogues
Hello Richard,

there's a typo in sve_movz_d where n[1] is used instead of n[i].

Thanks,

Laurent



Re: [Qemu-devel] ARM: SVE while issue

2018-07-31 Thread Laurent Desnogues
Hello,

On Tue, Jul 31, 2018 at 6:36 PM, Richard Henderson
 wrote:
> On 07/31/2018 09:52 AM, Laurent Desnogues wrote:
>> Hello Richard,
>>
>> according to SVE specification, whilels/whilele instructions have a
>> special case where if the second operand is the maximum (un)signed
>> integer then the result is an all-true predicate.  The current code in
>> trans_WHILE doesn't seem to capture that requirement.  I'm afraid the
>> fix won't be as trivial as for the other two bugs I reported :-)
>
> Still not too bad.
> Testing a fix for my branch now.

I gave it a look and tested it, it works fine!

Thanks,

Laurent

> Annoying that risu didn't randomly generate this case,
> and I totally mis-read the documentation for this.
>
>
> r~
>



[Qemu-devel] ARM: SVE while issue

2018-07-31 Thread Laurent Desnogues
Hello Richard,

according to SVE specification, whilels/whilele instructions have a
special case where if the second operand is the maximum (un)signed
integer then the result is an all-true predicate.  The current code in
trans_WHILE doesn't seem to capture that requirement.  I'm afraid the
fix won't be as trivial as for the other two bugs I reported :-)

Thanks,

Laurent



[Qemu-devel] ARM: SVE issue in saturated subtraction

2018-07-30 Thread Laurent Desnogues
Hello Richard,

the signed overflow computation for 64-bit saturated subtraction in
do_sat_addsub_64 looks wrong:  I think the second xor should use t1
instead of t0.

Thanks,

Laurent



[Qemu-devel] ARM: SVE issue in cmpeq/cmpne

2018-07-30 Thread Laurent Desnogues
Hello Richard,

According to SVE definition, the operands of cmpeq/cmpne SVE
instructions are signed, so I guess the definitions of
sve_cmpeq_ppzw_[bhs] and sve_cmpne_ppzw_[bhs] should use signed types.
The other cmpcc instructions look good.

Thanks,

Laurent



Re: [Qemu-devel] [PATCH for-3.0] target/arm: Fix LD1W and LDFF1W (scalar plus vector)

2018-07-11 Thread Laurent Desnogues
Hello,

On Wed, Jul 11, 2018 at 12:39 PM, Richard Henderson
 wrote:
> 'I' was being double-incremented; correctly within the inner loop
> and incorrectly within the outer loop.
>
> Signed-off-by: Richard Henderson 

I didn't try to apply the patch but line numbers look wrong.

I guess this should apply to DO_LD1_ZPZ_S and DO_LDFF1_ZPZ macros, in
which case you get my review:

Reviewed-by: Laurent Desnogues 

Thanks,

Laurent

> ---
>
> Fixes a SIGSEGV within one of these generated helpers,
> exposed by an armclang vectorized code sample.
>
>
> r~
>
> ---
>  target/arm/sve_helper.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index cec0d3ee54..ddc592ff79 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -4855,7 +4855,7 @@ void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, 
> void *vm,   \
>  intptr_t i, oprsz = simd_oprsz(desc);   \
>  unsigned scale = simd_data(desc);   \
>  uintptr_t ra = GETPC(); \
> -for (i = 0; i < oprsz; i++) {   \
> +for (i = 0; i < oprsz; ) {  \
>  uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
>  do {\
>  TYPEM m = 0;\
> @@ -4936,7 +4936,7 @@ void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, 
> void *vm,   \
>  uintptr_t ra = GETPC(); \
>  bool first = true;  \
>  mmap_lock();\
> -for (i = 0; i < oprsz; i++) {   \
> +for (i = 0; i < oprsz; ) {  \
>  uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
>  do {\
>  TYPEM m = 0;\
> --
> 2.17.1
>
>



Re: [Qemu-devel] [PATCH 2/6] target/arm: Fix SVE system register access checks

2018-06-29 Thread Laurent Desnogues
Hello,

On Fri, Jun 29, 2018 at 2:15 AM, Richard Henderson
 wrote:
> Leave ARM_CP_SVE, removing ARM_CP_FPU; the sve_access_check
> produced by the flag already includes fp_access_check.  If
> we also check ARM_CP_FPU the double fp_access_check asserts.
>
> Reported-by: Laurent Desnogues 
> Signed-off-by: Richard Henderson 

Reviewed-by: Laurent Desnogues 

Thanks for taking care of it,

Laurent

> ---
>  target/arm/helper.c| 8 
>  target/arm/translate-a64.c | 5 ++---
>  2 files changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index b19c7ace78..a855da045b 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -4393,7 +4393,7 @@ static void zcr_write(CPUARMState *env, const 
> ARMCPRegInfo *ri,
>  static const ARMCPRegInfo zcr_el1_reginfo = {
>  .name = "ZCR_EL1", .state = ARM_CP_STATE_AA64,
>  .opc0 = 3, .opc1 = 0, .crn = 1, .crm = 2, .opc2 = 0,
> -.access = PL1_RW, .type = ARM_CP_SVE | ARM_CP_FPU,
> +.access = PL1_RW, .type = ARM_CP_SVE,
>  .fieldoffset = offsetof(CPUARMState, vfp.zcr_el[1]),
>  .writefn = zcr_write, .raw_writefn = raw_write
>  };
> @@ -4401,7 +4401,7 @@ static const ARMCPRegInfo zcr_el1_reginfo = {
>  static const ARMCPRegInfo zcr_el2_reginfo = {
>  .name = "ZCR_EL2", .state = ARM_CP_STATE_AA64,
>  .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 2, .opc2 = 0,
> -.access = PL2_RW, .type = ARM_CP_SVE | ARM_CP_FPU,
> +.access = PL2_RW, .type = ARM_CP_SVE,
>  .fieldoffset = offsetof(CPUARMState, vfp.zcr_el[2]),
>  .writefn = zcr_write, .raw_writefn = raw_write
>  };
> @@ -4409,14 +4409,14 @@ static const ARMCPRegInfo zcr_el2_reginfo = {
>  static const ARMCPRegInfo zcr_no_el2_reginfo = {
>  .name = "ZCR_EL2", .state = ARM_CP_STATE_AA64,
>  .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 2, .opc2 = 0,
> -.access = PL2_RW, .type = ARM_CP_SVE | ARM_CP_FPU,
> +.access = PL2_RW, .type = ARM_CP_SVE,
>  .readfn = arm_cp_read_zero, .writefn = arm_cp_write_ignore
>  };
>
>  static const ARMCPRegInfo zcr_el3_reginfo = {
>  .name = "ZCR_EL3", .state = ARM_CP_STATE_AA64,
>  .opc0 = 3, .opc1 = 6, .crn = 1, .crm = 2, .opc2 = 0,
> -.access = PL3_RW, .type = ARM_CP_SVE | ARM_CP_FPU,
> +.access = PL3_RW, .type = ARM_CP_SVE,
>  .fieldoffset = offsetof(CPUARMState, vfp.zcr_el[3]),
>  .writefn = zcr_write, .raw_writefn = raw_write
>  };
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index f986340832..45a6c2a3aa 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -1633,11 +1633,10 @@ static void handle_sys(DisasContext *s, uint32_t 
> insn, bool isread,
>  default:
>  break;
>  }
> -if ((ri->type & ARM_CP_SVE) && !sve_access_check(s)) {
> -return;
> -}
>  if ((ri->type & ARM_CP_FPU) && !fp_access_check(s)) {
>  return;
> +} else if ((ri->type & ARM_CP_SVE) && !sve_access_check(s)) {
> +return;
>  }
>
>  if ((tb_cflags(s->base.tb) & CF_USE_ICOUNT) && (ri->type & ARM_CP_IO)) {
> --
> 2.17.1
>
>



Re: [Qemu-devel] Stair step trace output since 12fb0ac05

2018-06-06 Thread Laurent Desnogues
On Thu, Jun 7, 2018 at 6:56 AM, Thomas Huth  wrote:
> On 06.06.2018 20:56, Markus Armbruster wrote:
>> BALATON Zoltan  writes:
>>
>>> Hello,
>>>
>>> Since 12fb0ac05 (char: Remove unwanted crlf conversion) trace output
>>> is printed in stair steps when using -trace and -serial stdio
>>> together. E.g.
>>> $ qemu-system-i386 -trace 'pci*' -serial stdio
>>>
>>> Regards,
>>> BALATON Zoltan
>>
>> I cc'ed the people involved with this patch for you.
>
> Is it OK if you revert the change to chardev/char-stdio.c, but not
> chardev/char-serial.c ?

That's what I did, and it was enough to fix the issues I had.

Thanks,

Laurent



Re: [Qemu-devel] [PATCH] tcg: Fix helper function vs host abi for float16

2018-05-24 Thread Laurent Desnogues
On Thu, May 24, 2018 at 2:28 PM, Peter Maydell <peter.mayd...@linaro.org> wrote:
> On 23 May 2018 at 06:10, Laurent Desnogues <laurent.desnog...@gmail.com> 
> wrote:
>> Some AArch64 tests I had that previously failed on a x86-64 host now pass.
>>
>> Tested-by: Laurent Desnogues <laurent.desnog...@gmail.com>
>
> Thanks for the testing.
>
>> Perhaps the two occurrences of the following comment in
>> target/arm/translate-a64.c could be removed along with this patch:
>>
>> /* write_fp_sreg is OK here because top half of tcg_rd is zero */
>
> I think this comment remains correct, doesn't it? At this point
> we've just called a helper routine which is 'f16' return. With
> this patch, that means that (by virtue of being uint16_t return
> type as far as C is concerned), it will have returned a 32 bit
> value into the TCGv_i32 whose bits [31:16] are zero, as the
> comment says. We require that because we're using
> write_fp_sreg() to update the vector register, and that function
> zeroes out bits [127:32] and assumes [31:16] from its input
> should all go into the vector register; but the instruction's
> semantics require [127:16] to be zeroed, and it's only because
> we know [31:16] are zero in the return value that we can
> get away with calling write_fp_sreg() rather than requiring a
> new write_fp_hreg().

I was reading the comment as somehow explaining the assumption of the
ABI clearing the top 16-bit of 16-bit return value.  But I agree it
still makes sense as it is.

Thanks,

Laurent



[Qemu-devel] AArch64: ZCR and ARM_CP_SVE/ARM_CP_FPU flags

2018-05-23 Thread Laurent Desnogues
Hi,

ZCR system registers are both flagged as ARM_CP_SVE and ARM_CP_FPU,
which results in an assertion failure in fp_access_check due to the
check of these flags in handle_sys:

if ((ri->type & ARM_CP_SVE) && !sve_access_check(s)) {
return;
}
if ((ri->type & ARM_CP_FPU) && !fp_access_check(s)) {
return;
}

sve_access_check calls fp_access_check so the assert
!s->fp_access_checked in the second call to fp_access_check will fail.

I took a quick look at sve_exception_el and given that it checks that
FPU is enabled, can't we just remove the ARM_CP_FPU flag from ZCR?

Alternatively the second call to fp_access_check when ARM_CP_FPU is
defined could be skipped if ARM_CP_SVE is set.

Thanks,

Laurent



  1   2   3   4   5   >